By Anthony Zonfrelli

Last February, HSAC member Kevin Meers analyzed which combine events (namely 40-yard dash, bench press, vertical leap, broad jump, 20-yard shuttle, and 3-cone drill) actually translate into future NFL success for each position. The original article used Career Approximate Value (CAV) as a measure of production so that players could be compared across positions.

Following up on the results of his study, I decided to see whether combine event scores were associated with how early players were selected in the NFL draft, using draft records from 1999-2010. If a player was either not drafted or did not participate in the combine, he was left out of the analysis. I assume that NFL teams try to draft players within each position in rank order of their future production: teams try to draft the best tight end first before drafting the second best, etc. Kevin found out which combine events were the best predictors of CAV, but which combine events have been the best predictors of eventual draft pick? In other words, do NFL teams select their rookies based on the combine events that will ultimately lead to their future success?
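As a rough sketch of the sample restriction described above, the filter below keeps only players who were both drafted and present at the combine. The records, field names (`draft_pick`, `attended_combine`), and values are hypothetical placeholders, not the actual data set.

```python
# Hypothetical prospect records; names, fields, and values are made up
# purely to illustrate the inclusion rule.
prospects = [
    {"name": "Player A", "draft_pick": 12, "attended_combine": True},
    {"name": "Player B", "draft_pick": None, "attended_combine": True},   # undrafted
    {"name": "Player C", "draft_pick": 140, "attended_combine": False},   # skipped combine
    {"name": "Player D", "draft_pick": 201, "attended_combine": True},
]


def in_sample(player):
    """Keep a player only if he was drafted AND took part in the combine."""
    return player["draft_pick"] is not None and player["attended_combine"]


sample = [p for p in prospects if in_sample(p)]
print([p["name"] for p in sample])
```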

The accuracy of these models is subject to scrutiny. These regressions are very similar in accuracy to those in Kevin’s article, about which, he said, “There are two ways to view the results of these regressions. On one hand, most of the combine measurements do not predict anything at all, and the statistics that are significant (at the p = 0.05 level) explain relatively little of the variance in future performance.” On the other hand, it’s surprising that one short event can actually hold predictive value. Obviously there are many more factors that come into play when evaluating a draftee besides combine score (for example, “hands” for wide receivers), so it is interesting to see how much these one-time measurements really matter in the evaluation of a new player. Here’s what I found:

**Quarterbacks**: the events that were significant in predicting CAV were height over 74 inches (6’2”) and shuttle time. Poor as that model was as a predictor (adjusted R^2 of 0.05), it makes intuitive sense. Your quarterback will benefit from being taller (even if only a small benefit), and the lateral movement of the shuttle logically translates into skill at scrambling and evading pass rushers better than 40-yard dash time would. After all, how often does your quarterback run 40 yards at a time if his name isn’t Michael Vick? Surprisingly, neither of these made it into the most predictive model for draft pick; even more surprisingly, broad jump was the only event that did.

Pick = -3.64*broad jump (inches) + 526.41

The p-values are 0.002 and 0.000 respectively, with an adjusted R^2 of 0.0859 and a root mean squared error (root MSE, abbreviated MSE below) of 71.712. The adjusted R^2 means that broad jump explains about 8.6% of the variation in quarterback draft selection. The model suggests that for every additional inch a quarterback jumps, he can expect to be selected three or four picks earlier in the draft. It’s possible that broad jump is just a proxy for general athletic competence, rather than something NFL scouts look for directly in quarterbacks.
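To make the "three or four picks per inch" reading concrete, here is a minimal sketch that plugs numbers into the quarterback equation. The coefficients come from the model above; the function name and the two example jump distances are my own and are purely illustrative.

```python
# Quarterback model from the article: Pick = -3.64 * broad_jump + 526.41
QB_SLOPE = -3.64       # change in predicted pick per extra inch of broad jump
QB_INTERCEPT = 526.41  # constant term


def predicted_qb_pick(broad_jump_inches):
    """Return the model's predicted overall pick for a quarterback."""
    return QB_SLOPE * broad_jump_inches + QB_INTERCEPT


# Hypothetical jump distances, chosen only for illustration.
pick_110 = predicted_qb_pick(110)
pick_111 = predicted_qb_pick(111)

print(round(pick_110, 2))             # predicted pick for a 110-inch jump
print(round(pick_111 - pick_110, 2))  # effect of one extra inch
```

The difference between the two predictions is just the slope, which is where the "three or four picks earlier" figure comes from.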

**Running Backs**: heavier, shorter, faster running backs were found to be more successful on average, but NFL scouts tended to ignore height and weight and instead focused on 40-yard dash time and broad jump, with p-values of 0.036 and 0.000 respectively. The model has an R^2 of 0.19 and an MSE of 67.62.

Pick = 150.95*40-yard dash(seconds) – 4.32*broad jump(inches)

The broad jump sneaks its way into this regression too. It makes intuitive sense that NFL teams would look for impressive 40-yard dash times, but perhaps they should pay more attention to size and stature.

**Wide Receivers**: No events were significant predictors of CAV for wide receivers, which makes sense because it is a skill position. While speed and size would seem to be important, it is even more important to have good hands and understand how to run routes. I wouldn’t expect scouts to ignore these metrics entirely, but I was surprised at the only event that made it into the final regression: bench press.

Pick = -6.84*bench press (repetitions) + 230.67

The p-values for these were 0.001 and 0.000 respectively, with an adjusted R^2 of 0.17 and an MSE of 60.96. These findings suggest that each additional bench press rep a wide receiver put up was associated, on average, with being picked about seven selections earlier. Perhaps teams think that stronger wide receivers can fight through jams, but bench press was not found to affect CAV.

**Tight Ends**: 40-yard dash time and bench press were significant in the most accurate model for CAV for any offensive position. The scouts nailed it with the tight ends – both 40-yard dash and bench press were significant in predicting draft pick for tight ends, each with p-values of 0.006.

Pick = 133.67*40-yard dash (seconds) – 4.38*bench press(repetitions)

This model has an adjusted R^2 of 0.17 and an MSE of 62.29. It seems NFL teams have figured out the equation for tight end success – 40-yard dash to go out for passes, and bench press to block. Kudos.

**Centers**: The equation that best predicted success for centers included three transformations of their shuttle time (shuttle, shuttle^2, and shuttle^3). Assuming that nearly every center entering the draft is large and strong, quickness is what sets them apart. NFL teams tend to favor straight-line speed, however, with only the 40-yard dash being significant (p-value = 0.038).

Pick = 132.66*40-yard dash(seconds) – 567.06

This is one of the worst models for any position. The adjusted R^2 is 0.06, the MSE is 64.46, and although the constant is not significant (p-value = 0.087), the model would not make any sense without it. Without the constant, a center who ran a 5.00s dash would expect to be chosen 663rd overall, rather than 96th with the constant. All that can really be taken away from this model is that teams tend to draft faster centers earlier.
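The 663rd-versus-96th comparison is simple arithmetic on the center equation, sketched below. The function name and the `include_constant` flag are my own. Note that this only drops the constant from the fitted equation, mirroring the comparison in the text; a model actually refit without an intercept would generally end up with a different slope.

```python
# Center model from the article: Pick = 132.66 * forty_time - 567.06
CENTER_SLOPE = 132.66
CENTER_CONSTANT = -567.06


def predicted_center_pick(forty_seconds, include_constant=True):
    """Predicted overall pick for a center, with or without the constant."""
    pick = CENTER_SLOPE * forty_seconds
    if include_constant:
        pick += CENTER_CONSTANT
    return pick


with_constant = predicted_center_pick(5.00)
without_constant = predicted_center_pick(5.00, include_constant=False)

print(round(with_constant))     # roughly the 96th pick
print(round(without_constant))  # roughly the 663rd pick, beyond the end of a draft
```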

**Guards**: The CAV models found that only faster 40-yard dash times led to more success as a guard, but I found that weight also played a part in their evaluation.

Pick = 91.10*40-yard dash(seconds) – 1.23*weight

This is the worst model for any position. With coefficient p-values of 0.029 and 0.008 respectively, this model has an adjusted R^2 of only 0.05 and an MSE of 64.56. Again, this just shows that front offices prefer faster, heavier players at guard.

**Offensive Tackles**: for tackles, it appeared that weight mattered more than it did for guards with regard to total production. Luckily, scouts recognize it, favoring heavy tackles with good 40-yard dash times (p-values = 0.027, 0.000, and 0.007 respectively).

Pick = -0.76*weight + 151.39*40-yard dash(seconds) – 436.61

This has an adjusted R^2 of 0.10 and an MSE of 70.05. NFL executives hit it right on the money, drafting heavier, faster offensive tackles.
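With two predictors it is a little harder to eyeball what the tackle equation implies, so here is a small sketch. The coefficients are from the equation above, with weight assumed to be in pounds; the function name and the example prospect (310 pounds, 5.20 seconds) are hypothetical.

```python
# Offensive tackle model from the article:
# Pick = -0.76 * weight + 151.39 * forty_time - 436.61
WEIGHT_COEF = -0.76   # picks per unit of weight (assumed to be pounds)
FORTY_COEF = 151.39   # picks per second of 40-yard dash time
CONSTANT = -436.61


def predicted_tackle_pick(weight, forty_seconds):
    """Predicted overall pick for an offensive tackle."""
    return WEIGHT_COEF * weight + FORTY_COEF * forty_seconds + CONSTANT


# Hypothetical prospect, for illustration only.
base = predicted_tackle_pick(310, 5.20)
heavier = predicted_tackle_pick(320, 5.20)  # 10 units heavier
faster = predicted_tackle_pick(310, 5.10)   # a tenth of a second faster

print(round(base, 1))
print(round(heavier - base, 1))  # change in pick from the extra weight
print(round(faster - base, 1))   # change in pick from the faster 40
```

Under this model a tenth of a second in the 40 moves the prediction by about twice as many picks as ten units of weight.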

**Full Backs**: None of the combine events predicted production for full backs, so it was fitting that NFL teams didn’t tend to favor any particular combine event in drafting them. While teams seem to know that no combine metric is enough to favor a full back, they haven’t found out what is: full back was the only position for which regressing players’ CAV on their draft pick did not produce a significant result.

Conclusion

In summary, vertical leap, height, 20-yard shuttle time, and 3-cone drill were not significant predictors of draft pick for any position. NFL teams looked at broad jump for quarterbacks and running backs, and bench press for tight ends and receivers. Weight was important in evaluating most offensive linemen (guards and tackles), and 40-yard dash time was the most commonly used statistic when evaluating players (running backs, tight ends, centers, guards, and tackles). Teams were right on when choosing which events led to the success of tight ends and offensive tackles, and came pretty close with guards. As for full backs, someone needs to figure out a metric to evaluate them so teams can stop wasting draft picks.

Like the models for CAV, these models for overall draft pick are obviously not perfect. Many different factors go into selecting a draft pick, so the fact that any combine event can influence a draft pick by enough to be statistically significant is interesting. That is not to say that these combine scores directly cause the prospects to be selected higher in the draft – it’s just that these scores are associated with being selected earlier. We can’t pin down a causal relationship given this regression framework, but you don’t need exclusively causal analysis to find predictive relationships.

These models are far from perfect, however. With root MSEs at each position ranging from around 60 to 70, we can predict a player’s overall draft position with 95% confidence only to within about +/-130 picks (roughly two root MSEs, assuming normally distributed errors). Unfortunately, that interval spans just about the entire draft. While the precision (or lack thereof) of our results demonstrates that the combine has very limited predictive value, the fact that an event that the player does one time for less than five seconds on one day can have any impact on their draft selection whatsoever is still astonishing.
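For anyone who wants to check the +/-130 figure, the sketch below turns the error figure (root MSE) reported for each position into an approximate 95% half-width. It assumes normally distributed errors and ignores uncertainty in the fitted coefficients, so treat it as a back-of-the-envelope calculation rather than a proper prediction interval.

```python
from statistics import NormalDist

# Error figures (root MSE) as reported in the article for each position.
ROOT_MSE = {
    "QB": 71.712,
    "RB": 67.62,
    "WR": 60.96,
    "TE": 62.29,
    "C": 64.46,
    "G": 64.56,
    "OT": 70.05,
}

# Two-sided 95% critical value of the standard normal (about 1.96).
z_95 = NormalDist().inv_cdf(0.975)

# Approximate half-width of a 95% interval: z * root MSE.
half_widths = {pos: z_95 * rmse for pos, rmse in ROOT_MSE.items()}

for pos, width in half_widths.items():
    print(f"{pos}: +/- {width:.0f} picks")
```

Every position lands somewhere between roughly 120 and 140 picks either way, consistent with the rough +/-130 quoted above.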

How do you know they’re picking based on these drills, and not merely that, for example, faster football players tend to be better football players and also tend to run the 40 faster?

We don’t know that these drills are the direct cause of their draft selection. However, we don’t need to – this is only a predictive model, so if a good 40 time leads a player to get drafted earlier only because people see them as a fast football player (and therefore more likely to be good), then it still serves its purpose.

I actually just did an analysis on this on my blog from a more subjective data set: Mel Kiper’s pre/post combine predictions. The accuracy of predictions was improved following the combine for the more speed-based positions (WR, DB, TE) but actually impaired for some other positions (QB, OT, DT).

I find the regression angle interesting but I think it leaves out the more skill-based drills, team interviews and preconceptions (e.g., did a player “outperform” on the 40 even with a mediocre time?). While there are certainly flaws with looking at predictions like Kiper’s, the errors should be consistent since he is the guy doing the picking and he has high incentive to stay in line with teams’ thinking so his predictions look accurate. Anyway, give mine a look if you get a chance – would love to have other perspectives on the methodology http://www.sportsplusnumbers.com/2013/02/some-news-is-worse-than-no-news-nfl.html

that looks pretty solid actually, although it might just be that he gets info from real scouts and can more accurately incorporate their thoughts into his rankings

This is a really interesting study… It’s fascinating to me that vertical leap wasn’t a huge factor- especially for DBs and WRs!! Great work guys.

-Bob

These are just straight linear regressions? How many assumptions are you willing to violate? Your “assuming a normal distribution” is telling – the draft pick order is a uniform distribution and the picks are not necessarily equally spaced. So by loosening up those requirements you might also pick up some prediction. Also since you didn’t include all the positions in one model you’re missing main effects for positions (QBs tend to go earlier than WRs or OL, e.g.). So you might try a modeling structure that allows for an ordinal criterion conditioned on position.

Great article. I think the film is what they truly evaluate, though the combine is what, for some GMs, puts certain talent over the top.