By: Evelyn Tjoa, Daniel Bittner, Victor Zeidenfeld, Matt Melucci, Matthew Doctoroff, Jennifer Yu, Andrew Christie, Arthur Macedo, Praveen Kumar, Cyril Leahy
What do Tom Brady, Johnny Manziel, Brian Bosworth, Shannon Sharpe, and Terrell Davis have in common? Whether they exceeded expectations or fell short of them, their performance in the NFL came as a big surprise. It remains partly a mystery why performance in the NCAA doesn’t necessarily correlate with performance in the NFL, and many turn to explanations relating to playing style, injury, and chemistry. In this article, we aim to determine whether certain NCAA metrics can act as predictors of a player’s success in the NFL.
Methodology
We began by selecting the positions we wanted to analyze and chose to focus on linebackers and tight ends. To ensure a like-for-like comparison, we used each player’s college stats from their final NCAA season, taken from PFF. We also added metrics of our own, including height, conference, and climate. As our measure of NFL success, we used a player’s Madden NFL rating from their 4th season if they were a tight end and their 2nd season if they were a linebacker. We chose these seasons because we deemed them to be mostly representative of a player’s success at their respective position.
We then regressed NFL rating on each NCAA metric individually, producing a simple linear model with an intercept, a coefficient, and a p-value for each metric. The intercept represents a player’s expected NFL rating when their statistic for the corresponding metric is 0. The coefficient indicates how much a player’s NFL rating changes, on average, when the corresponding statistic increases by 1. Lastly, the p-value helps determine whether the relationship between the NCAA metric and NFL rating is likely to exist in the larger population. Essentially, the p-value is the probability of observing a relationship at least as strong as the one in our sample, given that there is no true relationship between the NCAA metric and NFL rating. Thus, a p-value closer to 0 provides stronger evidence that a relationship exists between the NCAA variable in question and NFL rating. We used an alpha of 0.05, classifying all p-values less than or equal to 0.05 as statistically significant.
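To make the procedure concrete, here is a minimal sketch of fitting one simple linear model per metric. It is an illustration under stated assumptions, not our actual analysis code: the numbers are made up, the names `fit_line`, `permutation_p_value`, `metrics`, and `nfl_rating` are hypothetical, and the p-value comes from a permutation test (shuffling the ratings) rather than the t-test that standard regression software reports.

```python
import random

def fit_line(x, y):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def permutation_p_value(x, y, n_shuffles=5000, seed=0):
    """Two-sided p-value: share of shuffles whose |slope| is at least
    as large as the |slope| observed in the real pairing."""
    rng = random.Random(seed)
    observed = abs(fit_line(x, y)[1])
    y_shuffled = list(y)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(y_shuffled)  # break any real link between x and y
        if abs(fit_line(x, y_shuffled)[1]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_shuffles + 1)

# Made-up illustrative numbers, NOT the article's data.
metrics = {
    "stops": [20, 35, 41, 48, 52, 60, 66, 73],
    "height_in": [74, 73, 75, 72, 74, 73, 75, 74],
}
nfl_rating = [70, 73, 76, 75, 78, 79, 80, 82]

ALPHA = 0.05
results = {}
for name, values in metrics.items():
    intercept, slope = fit_line(values, nfl_rating)
    p = permutation_p_value(values, nfl_rating)
    results[name] = (intercept, slope, p)
    print(f"{name}: intercept={intercept:.3f} slope={slope:.3f} "
          f"p={p:.4f} significant={p <= ALPHA}")
```

Each metric gets its own model, so every coefficient describes that metric in isolation and does not adjust for any of the others.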
Tables:
Taking the ‘Stops’ variable from the linebacker dataset as an example, the regression returned an intercept of 67.885 and coefficient of 0.180. The corresponding equation would be as follows:
NFL Rating = 67.885 + 0.180x
where x is the number of stops recorded by a player in their final NCAA season. Because ‘Stops’ has a coefficient p-value of 0.004, which is below the alpha of 0.05, we can consider applying this equation to the larger population of linebackers. Because this relationship could be purely correlational rather than causal, we cannot conclude that stops cause a higher NFL rating. However, by observing the coefficients and p-values of each variable, it becomes clearer which metrics are most strongly associated with NFL success.
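As a quick illustration, the ‘Stops’ equation can be evaluated directly. The intercept and coefficient below are the values reported above; the function name `predict_nfl_rating` and the example stop counts are hypothetical, chosen only to show the arithmetic.

```python
# Fitted values reported for the linebacker 'Stops' model.
INTERCEPT = 67.885
SLOPE = 0.180

def predict_nfl_rating(stops):
    """Predicted Madden rating for a given number of final-season stops."""
    return INTERCEPT + SLOPE * stops

# Hypothetical stop counts, not drawn from the dataset.
for stops in (0, 25, 50):
    print(f"{stops} stops -> predicted rating {predict_nfl_rating(stops):.3f}")
```

A prediction is only as reliable as the data behind it, so stop counts far outside the range seen in the dataset should be treated with caution.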
Although there are a number of variables for which we weren’t able to determine a statistically significant relationship, it’s still interesting to examine the trends within our limited dataset. For instance, a tight end’s NFL rating was higher by an average of 6.978 if they played in the Big Ten. Moreover, the tight ends in our dataset had their NFL rating decrease by an average of 1.206 for each additional inch in height.
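For a yes/no variable such as conference membership, the coefficient has a simple reading. Assuming the variable is coded as a 0/1 indicator (an assumption about the encoding, with made-up ratings below), the fitted coefficient equals the difference between the two groups’ average ratings, and the intercept equals the average for the group coded 0. The names `big_ten`, `rating`, and `fit_line` are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up illustrative data: 1 = played in the Big Ten, 0 = did not.
big_ten = [1, 0, 1, 0, 0, 1, 0, 0]
rating = [80, 72, 78, 70, 75, 82, 74, 69]

intercept, slope = fit_line(big_ten, rating)

# Group averages computed directly, for comparison with the fit.
in_group = [r for b, r in zip(big_ten, rating) if b == 1]
out_group = [r for b, r in zip(big_ten, rating) if b == 0]
mean_diff = sum(in_group) / len(in_group) - sum(out_group) / len(out_group)

print(f"slope={slope:.3f} difference in group means={mean_diff:.3f}")
print(f"intercept={intercept:.3f} (average rating of the 0 group)")
```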
The only statistically significant metric for tight ends, however, was fumbles, which had a positive association (coefficient of 4.3) with NFL rating. Obviously, fumbling more should not make a tight end a better player. This paradoxical relationship, however, shows how correlational insights can still be useful. While the statistically significant relationship between fumbles and NFL performance could be due to chance, particularly in a sample of 23 players, it could also have other interesting explanations. More fumbles could reflect a larger role as a receiving rather than blocking tight end, which could correlate with better performance. Fumbles could also be correlated with other attributes, such as harder or more aggressive running.
Overall, this analysis presents a framework for using simple linear models to generate insights about future player performance. With more thorough examination and analysis of larger samples, we can hope to make more sense of the somewhat unpredictable and bumpy path to NFL glory.
Notes: A sample size of 23 was used for the tight end dataset. Madden ratings are out of 100.