Grading the Accuracy of ESPN and Football Outsiders’ Power Rankings

By Cameron Dowd

Hordes of NFL fans obsess over their team’s “Power Ranking”. Every Tuesday, thousands of people flock to the comments section of ESPN’s newest Power Rankings, mostly yelling through their keyboards that their team is actually the best and everyone else should, shall we say, go away. But these rankings are mostly influenced by a team’s recent performance and ESPN’s perception of its overall ability, not rigorous statistical analysis. In sharp contrast, Football Outsiders (FO) releases weekly DVOA ratings, rooted in quantitative analysis of how well every team has performed. So whose rankings are better? Let’s take a look.

This question is tricky, since it’s hard to say which team’s “true” talent level was higher than another’s, even after the season has ended. I examined the accuracy of these ranking systems by comparing higher-ranked teams’ performance against lower-ranked teams. If a power ranking is accurate, its higher-ranked teams should usually beat its lower-ranked teams. Using every regular season game from the past four seasons (except week 1, as FO begins calculating DVOA after week 1), I tested ESPN’s Power Rankings and FO’s weekly DVOA rankings to see how accurately they predicted the winner of head-to-head matchups. A “win” for either ranking consists of a higher-ranked team beating a lower-ranked team, and a “loss” consists of a lower-ranked team defeating a higher-ranked team. Here are the results:

[Table one: winning percentage of ESPN’s Power Rankings and FO’s DVOA rankings in head-to-head matchups]
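The scoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the table: the `Game` record, its field names, and the sample games are all hypothetical, rank 1 is taken to be the best team, and ties are assumed to have been dropped from the input.

```python
from dataclasses import dataclass


@dataclass
class Game:
    # Hypothetical record layout; a lower rank number means a better team.
    week: int
    winner_rank: int  # pre-game rank of the team that won
    loser_rank: int   # pre-game rank of the team that lost


def ranking_record(games):
    """Return (wins, losses, win_pct) for one ranking system.

    A "win" is a game in which the higher-ranked team beat the
    lower-ranked team. Week 1 games are skipped, as in the article.
    """
    wins = losses = 0
    for g in games:
        if g.week == 1:
            continue
        if g.winner_rank < g.loser_rank:
            wins += 1
        else:
            losses += 1
    total = wins + losses
    win_pct = wins / total if total else 0.0
    return wins, losses, win_pct


# Made-up games, only to show the call; not data from the article.
sample = [Game(2, 3, 20), Game(2, 15, 4), Game(3, 1, 32), Game(1, 10, 5)]
print(ranking_record(sample))
```

Running the same function once with ESPN's ranks and once with FO's ranks for each game would give the two records being compared.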

The first table shows that ESPN did not perform quite as well as FO: ESPN’s rankings posted a 62% winning percentage, just behind FO’s 63%. While both performed better than random game-by-game picks, which would be right about half the time, it’s hard to call this result a dominant victory for FO’s rankings.

In addition to measuring overall winning percentage, I wanted to gauge whether the ESPN and Football Outsiders rankings improved throughout the year. My expectation was that the rankings should improve once more games were played and teams had more time to display their true overall ability. To examine this idea, I looked at each set of rankings from the past four seasons and compared first-half results to second-half results.

[Table two: first-half versus second-half winning percentage for the ESPN and FO rankings]

Neither set of rankings consistently shows improvement at predicting wins and losses from the first half to the second over the last four years. For ESPN, the biggest overall change in winning percentage came in 2009 when the ESPN rankings were actually much worse in the second half of the season than the first half of the season. For FO, the biggest overall change in winning percentage came in 2008. This jump in winning percentage appears to be the result of the relatively poor first half winning percentage rather than a significant improvement in the rankings in the second half of the season.
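For readers who want to try the first-half versus second-half comparison themselves, here is a minimal sketch. The tuple layout, the function name, and the sample rows are hypothetical, and the cutoff between halves (week 9) is an assumption, since the article does not say where the season was split.

```python
from collections import defaultdict


def record_by_half(games, last_first_half_week=9):
    """Winning percentage of a ranking per (season, half).

    games: iterable of (season, week, winner_rank, loser_rank) tuples,
    where a lower rank number means a better team. The week-9 cutoff
    is an assumption, not something stated in the article.
    """
    tally = defaultdict(lambda: [0, 0])  # [ranking wins, games counted]
    for season, week, winner_rank, loser_rank in games:
        if week == 1:  # no week 1 rankings are used
            continue
        half = "first" if week <= last_first_half_week else "second"
        tally[(season, half)][0] += winner_rank < loser_rank
        tally[(season, half)][1] += 1
    return {key: wins / n for key, (wins, n) in tally.items()}


# Made-up rows, only to show the output shape.
toy_games = [
    (2009, 2, 3, 20),
    (2009, 5, 18, 7),
    (2009, 12, 2, 9),
    (2009, 15, 1, 30),
]
print(record_by_half(toy_games))
```

Comparing the "first" and "second" entries for each season shows whether a ranking got better at picking winners as the year went on.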

Overall, the FO power rankings derived from advanced analytics predicted wins and losses a little better than ESPN’s subjective rankings. Neither set of rankings consistently improved as more games were played and the rankers gained more information about the teams. Power Rankings can be a lot of fun to look at, but it is important for fans to remember that an especially great or poor ranking neither crowns nor dooms a team.


7 Comments

  • My guess is that FO and ESPN would agree on who the “winner” would be in roughly 80% of games. I’d be curious to see what the winning percentages were in the other 20%. When the Cardinals beat the Patriots, that’s not really a knock on ESPN or FO, IMO, so the more interesting test would be to see what happens when the two sites disagree.

    • By my count, favorites won 67% of games in this time span. But the Vegas line takes into account home field advantage, and this analysis does not, so I would expect Vegas to have an advantage.

  • Rather than assigning Ws & Ls, I might propose a simple regression to see how predictive each ranking was (specifically, how well wins are predicted by the delta between the favored and unfavored teams’ rankings).

    Perhaps you’ve done this already and the results were inconclusive, but I would guess a scatterplot for ESPN and FO along with a regression line would be interesting to see.
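One way to sketch the regression suggested in this comment is an ordinary least-squares line relating the rank gap to whether the higher-ranked team won. Everything below is an assumption for illustration: the function name, the variable names, and the toy numbers are invented, and a straight-line fit is used only because it is the simplest option (a logistic regression is the more usual choice for a win/loss outcome).

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: rank gap between the two teams, and 1 if the
# higher-ranked team won, 0 otherwise. Not data from the article.
rank_gaps = [1, 3, 5, 10, 15, 20, 25, 30]
favorite_won = [0, 1, 0, 1, 1, 0, 1, 1]

slope, intercept = fit_line(rank_gaps, favorite_won)
print(f"estimated win probability = {intercept:.3f} + {slope:.4f} * rank gap")
```

Fitting one line per ranking system and comparing the slopes would show which system's rank gaps carry more information about who wins.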
