[Update: For defense, click here]
With the National Football League combine starting, people are sure to get excited about the players who run the fastest 40-yard dash times, bench the most, and put up numbers that match the “physical specimen” description. Players are measured for their height and weight, and then put through a series of drills that include the 40-yard dash, bench press, vertical leap, broad jump, 20-yard shuttle, and 3-cone drill. This post examines whether these measurements are at all predictive of their future performance in the NFL. To measure future production, I use Career Approximate Value (CAV) per year, which does a good job of valuing players across positions. I regressed the CAV of players who participated in the NFL combine from 1999-2010 against their combine workout stats. If a player did not play in the NFL, I assigned him a CAV of 0. Players are divided by the position they played in college, not the NFL.
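For readers who want to see the mechanics, here is a minimal sketch of the kind of regression described above: ordinary least squares with an intercept, plus the adjusted R² quoted for each model below. It is not the code used for this post. The function names and the toy numbers are hypothetical, and a statistics package would normally do this work; the pure-Python version is only there to keep the sketch self-contained.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_ols(rows, y):
    """Fit y on the columns of rows plus an intercept.

    Returns (coefficients, adjusted R^2); the intercept is the last coefficient.
    """
    design = [list(r) + [1.0] for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    coef = solve_linear(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(c * v for c, v in zip(coef, d)) for d in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - f) ** 2 for t, f in zip(y, fitted))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    n, k = len(y), p - 1
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return coef, adj_r2


# Hypothetical toy rows: (40-yard dash in seconds, weight in pounds).
# These are made-up numbers for illustration, NOT real combine data.
workouts = [(4.45, 210.0), (4.52, 220.0), (4.60, 205.0),
            (4.38, 198.0), (4.71, 230.0), (4.55, 215.0)]
# 0.0 marks a player who never played in the NFL, as described above.
cav_per_year = [3.1, 2.4, 0.0, 4.0, 0.0, 1.8]

coef, adj_r2 = fit_ols(workouts, cav_per_year)
print("coefficients (dash, weight, intercept):", coef)
print("adjusted R^2:", adj_r2)
```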
There are two ways to view the results of these regressions. On one hand, most of the combine measurements do not predict anything at all, and the statistics that are significant (at the p = 0.05 level) explain relatively little of the variance in future performance. Furthermore, the models created from these metrics are very inaccurate and have highly heteroskedastic residuals. On the other hand, I am more surprised that any of these measurements matter: I find it remarkable that how fast a player runs forty yards straight ahead, once, can explain anything. Take a look:
Quarterbacks: two measurements matter: height over 74 inches (6’2”) and shuttle time. This model has a root mean squared error (reported as MSE throughout) of 1.74 and an adjusted R² of 0.05, meaning that this equation explains 5% of the variance in quarterback performance. The p values for height and shuttle time are 0.004 and 0.04 respectively.
CAV = 0.28*inches over 74 – 1.52*shuttle time (seconds) + 7.1
So for every inch over 74 a quarterback is, one would expect his CAV to increase by 0.28. This finding suggests that the stress many analysts put on height is exaggerated: one would expect a 6’6” quarterback to have a CAV only about 1.1 higher than a 6’2” quarterback (4 inches × 0.28 = 1.12), all else equal. Also, more agile quarterbacks are better; speed does not seem to matter as much, as the 40-yard dash time was not a relevant factor here.
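To make the height comparison concrete, the quarterback equation can be written as a small function. This is only a sketch: the function name is made up, the 4.3-second shuttle time is a hypothetical input, and treating quarterbacks at or under 74 inches as zero "inches over 74" is my reading of the equation, not something the regression output confirms.

```python
def qb_cav(height_inches, shuttle_seconds):
    """Predicted CAV per year for a quarterback, using the coefficients above."""
    # Assumption: heights at or below 74 inches contribute nothing to the height term.
    inches_over_74 = max(height_inches - 74.0, 0.0)
    return 0.28 * inches_over_74 - 1.52 * shuttle_seconds + 7.1


# Same hypothetical shuttle time for both players, so only height differs.
tall = qb_cav(78.0, 4.3)      # 6'6"
baseline = qb_cav(74.0, 4.3)  # 6'2"
print("CAV gap from four extra inches:", round(tall - baseline, 2))
```

Because the model is linear, the gap is the same whatever shuttle time is plugged in.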
Running backs: this model is more complex, using height, 40-yard dash time, and weight. (Note: height from here on is not compared to 74; that value is specific to quarterbacks.) It is also more precise than the model for quarterbacks, with an MSE of 1.59 and an adjusted R² of 0.11. The p values for the metrics are 0.03, 0.00, and 0.03 respectively.
CAV = -0.19*height(inches) – 4.88*40-yard dash(seconds) + 0.03*weight(pounds) + 30.9
So heavier and shorter running backs tend to be more successful in the NFL than lighter, taller players; speed is also very important for RBs. No agility metrics proved significant, however: it’s all about speed, height, and weight.
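One way to read these coefficients is as trade-offs between measurements. The sketch below uses the running back equation to ask how many pounds of weight the model treats as equivalent to running the 40-yard dash a tenth of a second faster. The function name and the sample player (70 inches, 4.50 seconds, 215 pounds) are hypothetical.

```python
RB_COEF = {"height": -0.19, "forty": -4.88, "weight": 0.03}
RB_INTERCEPT = 30.9


def rb_cav(height_inches, forty_seconds, weight_pounds):
    """Predicted CAV per year for a running back, using the coefficients above."""
    return (RB_COEF["height"] * height_inches
            + RB_COEF["forty"] * forty_seconds
            + RB_COEF["weight"] * weight_pounds
            + RB_INTERCEPT)


# CAV gained by running 0.1 seconds faster, all else equal.
gain_from_speed = -RB_COEF["forty"] * 0.1
# Extra pounds that would produce the same predicted gain.
pounds_equivalent = gain_from_speed / RB_COEF["weight"]

print("sample player CAV:", round(rb_cav(70.0, 4.50, 215.0), 2))
print("CAV per 0.1 s of speed:", round(gain_from_speed, 3))
print("equivalent weight in pounds:", round(pounds_equivalent, 1))
```

In other words, a tenth of a second in the 40 is worth about as much as 16 pounds in this model, which is one way to see how heavily it leans on speed.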
Wide receivers: none of the metrics measured at the combine significantly predict the performance of wide receivers. The closest statistic is the 40-yard dash, with a p value of 0.08. Some seem to place special importance on the 40-yard dash when deciding which receivers to draft. Perhaps they should rethink that strategy.
Tight ends: here, only the 40-yard dash and bench press matter. This model explains the most variance among offensive player performance and is the most accurate, with an MSE of 1.36 and an adjusted R² of 0.17. The p values for the included statistics are 0.00 and 0.05.
CAV = -3.55*40-yard dash + 0.06*bench press repetitions + 16.9
I found bench press repetitions surprising here. The story should be that tight ends need the upper body strength to block, but bench press is not included in the models for any offensive line position. As with running backs, speed is very important, but agility does not matter at all. Surprisingly, neither height nor weight matters either. Their exclusion seems to contradict the “physically impossible to match up against” storyline of Rob Gronkowski, Aaron Hernandez, and Jermichael Finley. However, it is surely possible that they are just outliers from this model.
Centers: Bench press doesn’t matter. Neither does the 40-yard dash. For centers, what explains the variance in future performance is shuttle time. In fact, this model uses the square and cube of shuttle time to best separate centers.
CAV = 3161.2*shuttle – 688.3*shuttle^2 + 49.8*shuttle^3 – 4825.9
The various shuttle metrics have p values of 0.17, 0.16, and 0.15. This regression has an MSE of 1.63 and an adjusted R² of 0.10. I would guess that centers have similar levels of strength at the NFL level, and so agility is what separates the performance of one center from another. The data do not follow that story though, as bench press repetitions have a significant level of variance. Perhaps the best explanation is that bench press reps are not a good predictor of strength.
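Because the center model is a cubic, its shape is harder to read off than the linear models. The sketch below finds where the fitted curve turns by setting its derivative to zero. The function names are made up. One caveat I am confident about: the coefficients are large, opposite in sign, and rounded to one decimal place, so the absolute CAV values this equation produces are very sensitive to rounding and should be treated with suspicion. The turning points are a steadier summary of the shape.

```python
import math

# Coefficients as printed above: cubic, quadratic, linear, intercept.
A, B, C, D = 49.8, -688.3, 3161.2, -4825.9


def center_cav(shuttle_seconds):
    """Predicted CAV per year for a center from the (rounded) cubic above."""
    s = shuttle_seconds
    return A * s ** 3 + B * s ** 2 + C * s + D


def stationary_points():
    """Shuttle times where the cubic's slope is zero, in ascending order."""
    # Derivative of the cubic: 3A*s^2 + 2B*s + C = 0
    a, b, c = 3 * A, 2 * B, C
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    root = math.sqrt(disc)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])


low, high = stationary_points()
print("local maximum near", round(low, 2), "seconds")
print("local minimum near", round(high, 2), "seconds")
```

With these coefficients the curve peaks at a shuttle time of roughly 4.35 seconds and bottoms out near 4.86 seconds, then rises again, which is a reminder not to extrapolate a cubic beyond the range of the data.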
Guards: While center performance depends on shuttle times, 40-yard dash times determine guard play.
CAV = -2.59*40-yard dash +14.7
It seems that pure speed, not agility, separates guards from each other, at least as far as combine metrics go. The p value here rounds to 0.00, so the coefficient is clearly significant. The model’s MSE and R² are 1.45 and 0.09.
Tackles: For tackles, both speed and size matter: performance depends on weight and 40-yard dash time. This model is not the strongest one put forward here, with a relatively high MSE (1.87) and an adjusted R² of 0.10.
CAV = 0.03*weight – 3.77*40-yard dash +13.0
Here, tackles are rewarded for being big and fast, which makes sense. The surprise here is that no other metrics are significant. The measures of agility and strength at the combine do not influence the expected performance of tackles.
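The guard and tackle equations are both plain linear models, so they can be evaluated from a small table of coefficients. This is an illustrative sketch only: the dictionary layout, the function name, and the sample inputs (a 5.2-second 40-yard dash and 315 pounds) are hypothetical.

```python
# Coefficients copied from the guard and tackle equations above.
LINEAR_MODELS = {
    "guard": {"intercept": 14.7, "forty": -2.59},
    "tackle": {"intercept": 13.0, "forty": -3.77, "weight": 0.03},
}


def predict_cav(position, **measurements):
    """Evaluate one of the linear models for the given measurements."""
    model = LINEAR_MODELS[position]
    total = model["intercept"]
    for name, coefficient in model.items():
        if name == "intercept":
            continue
        total += coefficient * measurements[name]
    return total


# Hypothetical linemen with the same 40-yard dash time.
print("guard:", round(predict_cav("guard", forty=5.2), 3))
print("tackle:", round(predict_cav("tackle", forty=5.2, weight=315.0), 3))
```

Keeping the coefficients in a table makes it easy to see that the tackle model penalizes a slow 40 more steeply than the guard model does (3.77 versus 2.59 CAV per second).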
Full backs: like wide receivers, none of the measurements taken at the combine predict future production of full backs.
It is clear from these models that the measurements taken at the combine do not accurately predict future performance, at least judged by CAV. It is possible that using more position-specific statistics would increase the predictive value of these measurements. However, I doubt that using other statistics would allow one to create significantly better models than those presented here. A more likely scenario would be that certain interactions between these measurements are statistically significant. However, because there are over 5000 permutations of the seven measurements included here, I did not spend time trying to find which of those 5000+ mattered for each position.
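The post does not spell out how the 5000+ figure was counted. One reading that matches it is the number of orderings of seven items, 7! = 5040. For comparison, the sketch below also counts two other ways one might enumerate candidate interaction terms; which of these is the right search space is a modeling choice, and the variable names are mine.

```python
import math

N_MEASUREMENTS = 7  # the count of measurements stated above

# Orderings of all seven measurements (7 factorial).
orderings = math.factorial(N_MEASUREMENTS)
# Two-way interactions: unordered pairs of distinct measurements.
pairwise = math.comb(N_MEASUREMENTS, 2)
# Interactions of any order: subsets containing at least two measurements.
multiway_subsets = 2 ** N_MEASUREMENTS - N_MEASUREMENTS - 1

print("orderings:", orderings)
print("pairwise interactions:", pairwise)
print("subsets of size two or more:", multiway_subsets)
```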
To summarize, the vertical leap, broad jump, and 3-cone drill do not accurately predict production at any position. Bench press repetitions are only significant for tight ends. Height is important for two positions (quarterback and running back), as is the 20-yard shuttle (for quarterbacks and centers). Weight matters for tackles and running backs. The 40-yard dash is the most commonly significant statistic, influencing four positions (running backs, tight ends, guards, and tackles). These results suggest that the combine should experiment with some other measurements that may better predict performance, especially for wide receivers and full backs.
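The tallies in this summary can be reproduced directly from the position-by-position models above. The sketch below lists the measurements each model uses and counts how many positions each measurement appears in. The dictionary is just a restatement of the models in this post, and its name is made up.

```python
from collections import Counter

# Measurements used in each position's model, as reported above.
MODEL_METRICS = {
    "quarterback": ["height", "20-yard shuttle"],
    "running back": ["height", "40-yard dash", "weight"],
    "wide receiver": [],
    "tight end": ["40-yard dash", "bench press"],
    "center": ["20-yard shuttle"],
    "guard": ["40-yard dash"],
    "tackle": ["weight", "40-yard dash"],
    "full back": [],
}

# Number of positions in which each measurement appears.
positions_per_metric = Counter(
    metric for metrics in MODEL_METRICS.values() for metric in metrics
)

for metric, count in positions_per_metric.most_common():
    print(f"{metric}: {count}")
```

Measurements that never show up in the counter (vertical leap, broad jump, 3-cone drill) are the ones that did not make it into any model.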
Check back tomorrow for the report on defense.
They do have other drills for receivers like the streaks, quick outs and the gauntlet.
Maybe introducing a sled pushing and/or pulling drill would be a better indicator of functional strength.
I wonder if the things you found even slightly significant (shuttle time for QBs and 40 for RBs, for instance) are really just products of scout biases. If running backs with faster 40 times are given chance after chance when other guys are not, that could make the results look more significant than they really are when you use average value per year as a proxy for ability. Do you have enough data to remove players that got less than X playing time (set it at something low-ish)? Then we might find that these things aren’t predictive at all…maybe…
Thanks a ton!
I don’t know if it’s already been done, but it would be interesting to see these regressions run with where they were picked in the draft as another variable to see if any combine drills are overvalued or undervalued by NFL teams. It’s possible that when you control for pick number something like WR 40 time would be negatively correlated with CAV. I’m not sure what the best variable would be for pick number, something using that Massey and Thaler study might work, or even the standard chart that teams supposedly use. I would think pick number wouldn’t directly work very well in regressions.
R squared almost all less than 0.1. Seems like poor models/regressions (as you mentioned). Multivariate analysis may be better suited to analyses of these data. It may be that clusters of results, rather than individual measurements, impact CAV.
How do these look for defensive players? I ask b/c in my mind raw athleticism is more important on defense.
I think these were multiple regression models where unnecessary explanatory variables have already been dropped…
Professional scouts place a great deal of emphasis on the 10-yard split of the 40-yard dash. I’ve also been told that vertical leap is a measure of explosion, and for some scouts these two numbers taken together are meaningful in determining an offensive lineman’s ability to get push off the line.
I need to learn more about CAV in regard to O-linemen. Run blocking is all explosion and quick twitch aka “Sumo Wrestling”. Pass blocking is all feet and balance aka “Basketball on Grass”. I believe those can very accurately be described by the numbers. Maybe the CAV is breaking down.
It shouldn’t really be surprising that results aren’t predictive for WRs, because combine results don’t account for a pretty important variable: hands. A great combine performance for a receiver is less relevant if he has poor/mediocre hands, and he may only be at the combine since he makes up for this in other areas.