In baseball, wins above replacement (WAR) is one of the best statistics for summarizing the total value of any player on the field. Using Expected Points Added (EPA) and Approximate Value (AV), this post explores the possibility of creating a similar statistic for football: wins added. I first walk through the process of creating this statistic, then critique the various flaws in the methodology used here. The final result is a highly imperfect statistic, but I hope it at least moves football analytics closer to a WAR-like metric.
The reasoning behind WAR begins with the insight that creating or preventing runs adds to the number of wins a team should expect. In any given situation (inning, score, base runners, etc.), there is a historical number of runs one should expect. Players can either increase or decrease their team's expected runs by playing well or poorly. Given baseball's formula for Pythagorean expectation, roughly 10 runs prevented or created equal one win. Football has a slightly different formula, as derived by Jim Glass:
Win Percentage = Points Scored^2.67 / (Points Scored^2.67 + Points Allowed^2.67)
Multiplying by 16 (the number of games in a season), we can easily convert this win percentage into wins. To find how many points added equal one win, I plugged the league-average number of points scored and allowed for each season into the formula above, then added one point at a time to points scored until the formula predicted an additional win. Below is a table for the past seven seasons showing how many points a win is worth.
Year | Average Points Scored (Allowed) | Points/Win
2011 | 354.9 | 36
2010 | 352.6 | 35
2009 | 343.5 | 34
2008 | 352.5 | 35
2007 | 347.0 | 35
2006 | 330.5 | 33
2005 | 329.9 | 33
There is a clear pattern here: one win is worth roughly 10% of the average points scored in a given season. For the past few years, that number has centered around 35.
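To make that search concrete, here is a minimal Python sketch of the procedure described above. The function names are mine; the exponent 2.67 and the 16-game season come straight from the formula and text above.

```python
def pythagorean_win_pct(points_scored, points_allowed, exponent=2.67):
    """Jim Glass's Pythagorean expectation for football."""
    ps = points_scored ** exponent
    pa = points_allowed ** exponent
    return ps / (ps + pa)

def points_per_win(avg_points, games=16):
    """Add one point at a time to league-average scoring until the
    Pythagorean formula predicts one additional win."""
    base_wins = games * pythagorean_win_pct(avg_points, avg_points)  # always games / 2
    extra = 0
    while games * pythagorean_win_pct(avg_points + extra, avg_points) - base_wins < 1:
        extra += 1
    return extra

# Reproduces the table above:
print(points_per_win(354.9))  # 36 (2011)
print(points_per_win(329.9))  # 33 (2005)
```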
Just as baseball has an expected number of runs in a given situation, football has an expected number of points given a certain down, distance, score, and time remaining. www.advancednflstats.com calculates expected points added (EPA) for offensive skill players, which makes finding their wins added much easier: we can simply divide their EPA by 10% of the average points scored to get the number of wins they contributed to their team. EPA is a flawed statistic, but the critique comes later.
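As a quick sketch (the function name and the sample EPA figure are hypothetical, purely for illustration):

```python
def wins_added_from_epa(epa, league_avg_points):
    """One win is worth roughly 10% of league-average points scored."""
    return epa / (0.1 * league_avg_points)

# A hypothetical skill player with 70 EPA in 2011 (354.9 average points):
print(wins_added_from_epa(70, 354.9))  # ~1.97 wins
```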
Finding EPA for offensive linemen and defensive players is much trickier. There is no true equivalent to EPA for either group. For defenders, Advanced NFL Stats uses a statistic called +Expected Points Added, which only counts plays where the defender lowered the offense's expected points. This method, however, is far from ideal: because it ignores negative plays, it cannot hold individual defenders accountable for their mistakes. There is also no way to credit linemen looking only at EPA. We need another way to value these players.
To remedy this problem, I collected data on Approximate Value (AV) for every offensive player who had a listed EPA in the past five seasons. AV is not designed to be precise, but it does allow us to cut across positions and seasons. EPA and AV share a strong positive correlation (r = 0.723), implying that there is some relationship between the two. With this data, I modeled EPA as a function of a player's AV.
Expected EPA = -37.98 + 15.24(AV) - 1.87(AV^2) + 0.12(AV^3) - 0.002(AV^4)
This model comes with an R^2 of 0.59 and a standard error of 24. Those are ugly. Really ugly. But the critique comes later.
This model allows us to convert AV for defenders and offensive linemen into EPA, which we can then divide by 35 (or, more precisely, 10% of the league's average points scored that season) to get the number of wins that player contributed to their team. Voila! We have determined how to find the number of wins added by any offensive or defensive player on the field! Hooray!
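Here is that whole pipeline as a minimal sketch, using the rounded coefficients published above. Because of the rounding, its output differs slightly from the unrounded fit (for example, it returns about 27.4 EPA for an AV of 10, where the full model gives 28, as discussed below).

```python
def expected_epa(av):
    """Quartic model of EPA as a function of Approximate Value (AV),
    using the rounded coefficients quoted above."""
    return -37.98 + 15.24 * av - 1.87 * av**2 + 0.12 * av**3 - 0.002 * av**4

def wins_added(av, league_avg_points=354.9):
    """Convert AV to expected EPA, then divide by 10% of average points.
    Default league average is the 2011 figure from the table above."""
    return expected_epa(av) / (0.1 * league_avg_points)

# A defender with an AV of 10:
print(expected_epa(10))  # ~27.4 EPA
print(wins_added(10))    # ~0.77 wins
```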
So why is this only a step in the right direction?
There are numerous problems with the above analysis. Most importantly, EPA and AV do not describe a player's contribution on the field accurately enough. A 99-yard screen pass gives the quarterback the same amount of EPA as the receiver who made the run, but those are not equal contributions. Furthermore, it credits neither the players who blocked for the play nor the defenders responsible for allowing it to happen. AV does not even purport to be precise: "approximate" is right in the name. It is impossible to draw precise conclusions from imprecise data. However, until game-charting data becomes more advanced (and public), we have to make do with EPA and AV.
Another problem stems from modeling EPA as a function of AV. The R^2 (0.59) is high enough to suggest a real relationship, but not high enough to inspire much confidence. We can say that the variation in AV, AV^2, AV^3, and AV^4 explains 59% of the variance in EPA: the majority of the variance, but just barely. More importantly, the standard error is enormous (24). If the data were normally distributed around the model, we could use that standard error to create a confidence interval for the average of players with a given AV.
Let's use J.J. Watt's season this year as an example. Watt had an AV of 10, so the model predicts an EPA of 28 with a 95% confidence interval of (25.5, 30.5). Not bad! However, that interval is for the average of players with an AV of 10. The prediction interval for an individual player is much larger: (-19.2, 75.2). So for any individual player with an AV of 10, we can be 95% confident that they contributed somewhere between -0.5 and 2 wins. To be confident that a player was better than Watt, the lower bound of that player's prediction interval would have to be greater than 75 EPA. The smallest AV with such a high lower bound is 18. Only 12 players in the past five seasons have had an AV of 18 or higher, and theirs are the only seasons we can be 95% confident were better than J.J. Watt's season this year. So after all of this work, we can say with 95% confidence that J.J. Watt was somewhere between a mildly negative factor on the Texans defense and the 13th best player of the past five years. Ugh.
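For anyone who wants to check the arithmetic, here is a rough sketch. It treats the prediction interval as the point prediction plus or minus about 1.97 standard errors (ignoring the small leverage terms), which appears to reproduce the numbers above:

```python
pred_epa = 28.0   # model prediction for AV = 10
se = 24.0         # standard error of the regression
t_crit = 1.967    # ~95% critical value

lo, hi = pred_epa - t_crit * se, pred_epa + t_crit * se
print((lo, hi))            # ~(-19.2, 75.2) EPA
print((lo / 35, hi / 35))  # ~(-0.55, 2.15) wins, i.e. roughly -0.5 to 2
```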
However, the data is not normally distributed around the model, so even the confidence and prediction intervals above are inaccurate. Running a test for heteroskedasticity returns a p-value of 0.0000, meaning we can be more than 99.99% confident that the variance of the model's errors is not constant. And since there are negative values of EPA, a log or square-root transformation would leave a significant amount of data out of the model. As a result, it is impossible to construct accurate confidence or prediction intervals.
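I won't belabor which test was used; as one illustration, here is how a Breusch-Pagan test could be run with statsmodels, assuming `av` and `epa` are NumPy arrays holding the collected AV and EPA values:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# av, epa: NumPy arrays of the collected AV and EPA values (not shown here)
X = sm.add_constant(np.column_stack([av, av**2, av**3, av**4]))
fit = sm.OLS(epa, X).fit()

# Breusch-Pagan tests the null hypothesis of constant error variance;
# a tiny p-value means the errors are heteroskedastic
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(fit.resid, fit.model.exog)
print(lm_pvalue)
```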
These critiques are all valid, and I would not be surprised if there are more I did not think of. That said, there is at least one important takeaway from this study: a win is worth roughly 10% of the average points scored by a team each season. From there on, everything becomes much less accurate, because the statistics we are working with, EPA and AV, are inherently imprecise. While there are fatal flaws here, let's take a step back: we now have a model that can predict the expected wins added of any player on offense or defense. It is not accurate by any means, but creating it is itself an accomplishment. The methodology presented here is not how NFL analysts will compute wins added in the future. It is, however, a start.
This is a great start. Here's another factor that may be impossible to work in: decoys. Unlike football, in baseball you can't distract the defense except with a runner on base. Football is built on misdirection: on fooling defenders into committing to one thing so you can exploit another, on creating mismatches, and on giving the ball to the unexpected ballcarrier. How do you account for that?