Does the NFL Combine Matter: Offense

[Update: For defense, click here]

With the National Football League combine starting, people are sure to get excited about the players who run the fastest 40-yard dash times, bench the most, and put up numbers that match the “physical specimen” description. Players are measured for their height and weight, and then put through a series of drills that include the 40-yard dash, bench press, vertical leap, broad jump, 20-yard shuttle, and 3-cone drill.

This post examines whether or not these measurements are at all predictive of their future performance in the NFL. To measure future production, I use Career Approximate Value (CAV) per year, which does a good job of valuing players across positions. I regressed the CAV of players who participated in the NFL combine from 1999-2010 against their combine workout stats. If a player did not play in the NFL, I assigned him a CAV of 0. Players are divided by the position they played in college, not the NFL.
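For readers who want to see what such a regression involves, here is a minimal sketch of ordinary least squares that reports the two fit statistics quoted for each position below: adjusted R² and root mean squared error. The function name `fit_ols` and the toy rows are my own illustration, not the data or software used for this post; the toy CAV values are generated from a known equation purely to check that the fit recovers it.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept.

    X: (n, k) array of combine measurements; y: (n,) array of CAV per year.
    Returns (coefficients with intercept first, adjusted R^2, root MSE).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])        # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    ss_res = float(resid @ resid)                    # residual sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum())      # total sum of squares
    dof = n - k - 1                                  # residual degrees of freedom
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / dof
    root_mse = (ss_res / dof) ** 0.5
    return coef, adj_r2, root_mse

# Synthetic rows (NOT real players): [inches over 74, shuttle time in seconds]
toy_X = [[0, 4.2], [1, 4.3], [2, 4.1], [3, 4.5], [4, 4.4]]
# Noise-free CAV built from a known equation, so the fit should recover it
toy_y = [7.1 + 0.28 * h - 1.52 * s for h, s in toy_X]

coef, adj_r2, root_mse = fit_ols(toy_X, toy_y)
print(np.round(coef, 2), round(adj_r2, 3))
```

With real combine data the residuals are far from zero, which is why the adjusted R² values reported in this post are so much lower than in this noise-free check.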

There are two ways to view the results of these regressions. On one hand, most of the combine measurements do not predict anything at all, and the statistics that are significant (at the p = 0.05 level) explain relatively little of the variance in future performance. Furthermore, the models created from these metrics are very inaccurate and have highly heteroskedastic residuals. On the other hand, I am more surprised that any of these measurements matter at all: I find it remarkable that how fast a player runs forty yards straight ahead, once, can explain anything. Take a look:


Quarterbacks: two measurements matter, height over 74 inches (6’2”) and shuttle time. This model has a root mean squared error (referred to as MSE throughout this post) of 1.74 and an adjusted R² of 0.05, meaning that this equation explains 5% of the variance in quarterback performance. The p values for height and shuttle time are 0.004 and 0.04 respectively.

CAV = 0.28*inches over 74 – 1.52*shuttle time (seconds) + 7.1

So for every inch a quarterback stands above 74 inches, one would expect his CAV to increase by 0.28. This finding suggests that the stress many analysts put on height is exaggerated: one would expect a 6’6” quarterback to have a CAV only about 1.1 higher (4 × 0.28 = 1.12) than a 6’2” quarterback, all else equal. Also, more agile quarterbacks are better; speed does not seem to matter as much, as the 40-yard dash time was not a significant factor here.
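The height comparison can be checked by plugging numbers into the quarterback equation. This is only a sketch: the function name `qb_cav` is mine, the 4.3-second shuttle time is a hypothetical value, and I am assuming that “inches over 74” means quarterbacks at or below 6’2” get no height term at all.

```python
def qb_cav(height_in, shuttle_s):
    """Predicted CAV per year for a quarterback, using the equation above."""
    # Assumption: "inches over 74" is floored at zero for shorter quarterbacks
    inches_over_74 = max(height_in - 74, 0)
    return 0.28 * inches_over_74 - 1.52 * shuttle_s + 7.1

shuttle = 4.3  # hypothetical shuttle time, held equal for both players

height_gap = qb_cav(78, shuttle) - qb_cav(74, shuttle)   # 6'6" vs 6'2"
agility_gap = qb_cav(74, 4.2) - qb_cav(74, 4.3)          # 0.1 s quicker shuttle

print(round(height_gap, 2))   # 1.12
print(round(agility_gap, 3))  # 0.152
```

By this equation, four extra inches of height are worth about as much as shaving roughly three quarters of a second off the shuttle, which is a very large agility difference.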

Running backs: this model is more complex, using height, 40-yard dash time, and weight. (Note: height from here on is not compared to 74; that value is specific to quarterbacks). It is also more precise than the model for quarterbacks, with an MSE of 1.59 and adjusted R² of 0.11. The p values for the metrics are 0.03, 0.00, and 0.03 respectively.

CAV = -0.19*height(inches) – 4.88*40-yard dash(seconds) + 0.03*weight(pounds) + 30.9

So heavier and shorter running backs tend to be more successful in the NFL than lighter, taller players; speed is also very important for RBs. No agility metrics proved significant, however: it’s all about speed, height, and weight.
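To get a feel for the relative size of these three effects, one can change a single input at a time and look at how the prediction moves. The function name `rb_cav` and the baseline prospect (70 inches, 4.50 seconds, 215 pounds) are hypothetical values chosen only for illustration.

```python
def rb_cav(height_in, forty_s, weight_lb):
    """Predicted CAV per year for a running back, using the equation above."""
    return -0.19 * height_in - 4.88 * forty_s + 0.03 * weight_lb + 30.9

# Hypothetical baseline prospect, not a real player
base = rb_cav(70, 4.50, 215)

# Change one measurement at a time and compare with the baseline
faster = rb_cav(70, 4.40, 215) - base    # 0.1 s faster 40-yard dash
heavier = rb_cav(70, 4.50, 225) - base   # 10 pounds heavier
shorter = rb_cav(69, 4.50, 215) - base   # 1 inch shorter

for label, delta in [("0.1 s faster", faster),
                     ("10 lb heavier", heavier),
                     ("1 in shorter", shorter)]:
    print(f"{label}: {delta:+.2f} CAV")
```

A tenth of a second in the 40 moves the prediction more than ten pounds or an inch of height does, which is consistent with speed being the most important of the three.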

Wide receivers: none of the metrics measured at the combine significantly predict the performance of wide receivers. The closest statistic is the 40-yard dash, with a p value of 0.08. Some seem to place special importance on the 40-yard dash when deciding which receivers to draft. Perhaps they should rethink that strategy.

Tight ends: here, only the 40-yard dash and bench press matter. Of all the offensive models, this one explains the most variance in performance and is the most accurate, with an MSE of 1.36 and adjusted R² of 0.17. The p values for the included statistics are 0.00 and 0.05.

CAV = -3.55*40-yard dash + 0.06*bench press repetitions + 16.9

I found bench press repetitions surprising here. The story should be that tight ends need the upper body strength to block, but bench press is not included in the models for any offensive line position. As with running backs, speed is very important, but agility does not matter at all. Surprisingly, neither do height or weight. Their exclusion seems to contradict the “physically impossible to match up against” storyline of Rob Gronkowski, Aaron Hernandez, and Jermichael Finley. However, it is certainly possible that they are simply outliers from this model.
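One way to read the tight end equation is as an exchange rate between speed and strength: how many extra bench press repetitions would offset a 40-yard dash that is a tenth of a second slower? The sketch below works that out from the two coefficients; the function name `te_cav` and the sample inputs are hypothetical.

```python
def te_cav(forty_s, bench_reps):
    """Predicted CAV per year for a tight end, using the equation above."""
    return -3.55 * forty_s + 0.06 * bench_reps + 16.9

# CAV lost by running 0.1 s slower, divided by CAV gained per extra rep
reps_per_tenth = (3.55 * 0.1) / 0.06
print(round(reps_per_tenth, 1))  # 5.9

# Check with two hypothetical prospects: slower but stronger vs. the baseline
baseline = te_cav(4.70, 20)
slower_stronger = te_cav(4.80, 20 + reps_per_tenth)
print(abs(baseline - slower_stronger) < 1e-9)  # True
```

Roughly six extra repetitions per tenth of a second is a steep price, so in this model speed still dominates even though bench press is significant.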

Centers: Bench press doesn’t matter. Neither does the 40-yard dash. For centers, what explains the variance in future performance is shuttle time. In fact, this model uses the square and cube of shuttle time to best separate centers.

CAV = 3161.2*shuttle – 688.3*shuttle² + 49.8*shuttle³ – 4825.9

The shuttle, shuttle-squared, and shuttle-cubed terms have p values of 0.17, 0.16, and 0.15 respectively. This regression has an MSE of 1.63 and adjusted R² of 0.10. I would guess that centers have similar levels of strength at the NFL level, and so agility is what separates the performance of one center from another. The data do not follow that story though, as bench press repetitions vary considerably among centers. Perhaps the best explanation is that bench press reps are not a good indicator of strength.
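Because the center model is a cubic, its shape is harder to read off than the linear equations. The sketch below finds where the fitted curve turns by setting its derivative to zero. A caution: the coefficients are published here rounded to one decimal place, and the individual terms are in the thousands while CAV is in single digits, so absolute predictions computed from these rounded values are not reliable, and the turning points themselves are sensitive to that rounding. Treat this as a rough look at the curve's shape only. The function names are mine.

```python
import math

# Cubic from the post: CAV = A*s + B*s^2 + C*s^3 + D, s = shuttle time (seconds)
A, B, C, D = 3161.2, -688.3, 49.8, -4825.9

def center_cav(shuttle_s):
    """Evaluate the cubic; the level is unreliable with rounded coefficients."""
    return A * shuttle_s + B * shuttle_s**2 + C * shuttle_s**3 + D

def turning_points():
    """Solve dCAV/ds = A + 2*B*s + 3*C*s^2 = 0 with the quadratic formula."""
    qa, qb, qc = 3 * C, 2 * B, A
    disc = qb * qb - 4 * qa * qc
    if disc < 0:
        return []  # no real turning points: the curve would be monotonic
    root = math.sqrt(disc)
    return sorted([(-qb - root) / (2 * qa), (-qb + root) / (2 * qa)])

low, high = turning_points()
print(round(low, 2), round(high, 2))
```

With these rounded coefficients the curve has a local maximum near 4.35 seconds and a local minimum near 4.86 seconds, so the fitted relationship is not simply “quicker is better” across the whole range.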

Guards: While center performance depends on shuttle times, 40-yard dash times determine guard play.

CAV = -2.59*40-yard dash +14.7

It seems that pure speed, not agility, separates guards from each other – at least as far as combine metrics go. The p value here rounds to 0.00, so the effect is clearly significant. The model’s MSE and R² are 1.45 and 0.09.

Tackles: For tackles, both speed and size matter: performance depends on weight and 40-yard dash time. This model is not the strongest one put forward here, with a relatively high MSE (1.87) and an adjusted R² of 0.10.

CAV = 0.03*weight – 3.77*40-yard dash +13.0

Here, tackles are rewarded for being big and fast, which makes sense. The surprise here is that no other metrics are significant. The measures of agility and strength at the combine do not influence the expected performance of tackles.

Full backs: as with wide receivers, none of the measurements taken at the combine predicts the future production of full backs.


It is clear from these models that the measurements taken at the combine do not accurately predict future performance, at least judged by CAV. It is possible that using more position-specific statistics would increase the predictive value of these measurements, but I doubt that other statistics would allow one to create significantly better models than those presented here. A more likely scenario is that certain interactions between these measurements are statistically significant. However, because there are over 5,000 permutations of the seven measurements included here, I did not spend time trying to find which of those 5,000+ mattered for each position.
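The size of that search space is easy to check. The sketch below assumes the “over 5000” figure comes from counting orderings of seven measurements (7 factorial); that reading is my inference from the numbers, not something stated above. For comparison, it also counts interaction terms the way they would usually enter a regression, as unordered combinations of two or more measurements.

```python
import math

n_measurements = 7  # the count used in the paragraph above

# Orderings of all seven measurements: 7! (assumed source of "over 5000")
orderings = math.factorial(n_measurements)

# Unordered pairs, e.g. weight x 40-yard dash
pairwise = math.comb(n_measurements, 2)

# All unordered combinations of two or more measurements
all_interactions = sum(math.comb(n_measurements, k)
                       for k in range(2, n_measurements + 1))

print(orderings, pairwise, all_interactions)  # 5040 21 120
```

If order does not matter, the candidate list per position shrinks to 21 pairwise terms, or 120 counting every higher-order product, which is a far more manageable search, although testing that many terms would raise its own multiple-comparison concerns.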

To summarize, the vertical leap, broad jump, and 3-cone drill do not accurately predict production at any position. Bench press repetitions are only significant for tight ends. Height is important for two positions (quarterback and running back), as is the 20-yard shuttle (for quarterbacks and centers). Weight matters for tackles and running backs. The 40-yard dash is the most commonly significant statistic, influencing four positions (running backs, tight ends, guards, and tackles). These results suggest that the combine should experiment with some other measurements that may better predict performance, especially for wide receivers and full backs.
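The summary above can be tabulated directly from the position-by-position results. This sketch records which measurements each model used, as reported in this post, and counts how many positions each measurement appears in; the variable names are mine.

```python
from collections import Counter

# Measurements included in each position's model, as reported in this post
significant = {
    "quarterback":   ["height", "20-yard shuttle"],
    "running back":  ["height", "40-yard dash", "weight"],
    "wide receiver": [],
    "tight end":     ["40-yard dash", "bench press"],
    "center":        ["20-yard shuttle"],   # entered as linear, squared, cubed
    "guard":         ["40-yard dash"],
    "tackle":        ["weight", "40-yard dash"],
    "full back":     [],
}

# Number of positions in which each measurement appears
counts = Counter(m for metrics in significant.values() for m in metrics)

for metric, n in counts.most_common():
    print(f"{metric}: {n} position(s)")

# Positions for which no combine measurement was predictive
unpredicted = [pos for pos, metrics in significant.items() if not metrics]
print(unpredicted)  # ['wide receiver', 'full back']
```

The vertical leap, broad jump, and 3-cone drill never appear in the table, which is the basis for the suggestion that the combine try other measurements.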

Check back tomorrow for the report on defense.
