How Much Correlation Between Seasons Is There in Each Major North American Sports League?

By Andrew Puopolo

Last week, I wrote a piece about parity in the NFL. Many of the comments on that article argued that because 5 of the 6 NFC playoff teams differed from the previous year, there was plenty of parity in the NFL. Although there is some room for interpretation on the correct definition of parity (turnover from season to season vs. the spread of records within a season), the feedback made me want to answer the following question: which North American sports league has the greatest correlation in team records from year to year?

A few years ago, Michael Lopez wrote a similar article attempting to answer this question. In his article, Lopez computes the z-score of every team's record in each of the four major sports and then correlates it with that team's z-score in the following season. Z-scoring each team's win-loss record puts every sport on a common scale with mean 0 and variance 1, so a like-for-like comparison can be made when examining the year-to-year correlations.
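The standardization step can be sketched in a few lines of Python, assuming one season's winning percentages are held in a plain list. The sketch uses the population standard deviation (`statistics.pstdev`); whether Lopez used the population or sample version is an assumption here, and the four-team season is hypothetical.

```python
from statistics import mean, pstdev

def z_scores(win_pcts):
    """Standardize one season's winning percentages to mean 0, variance 1.

    Assumption: uses the population standard deviation (pstdev); the
    original analysis may have used the sample version instead.
    """
    mu = mean(win_pcts)
    sigma = pstdev(win_pcts)
    return [(p - mu) / sigma for p in win_pcts]

# Hypothetical four-team season, for illustration only.
season = [0.750, 0.600, 0.400, 0.250]
print([round(z, 3) for z in z_scores(season)])
```

Standardizing within each season means a team is measured against its contemporaries, not against an absolute winning percentage.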

To answer this question, I scraped the win-loss records of every MLB, NFL, NBA and NHL team from the 2006 season to the 2017 season from Sports-Reference. The 2006 season was chosen as the starting point for this study because it was the first season under the NHL's current point format (2 points for a win, 1 for an overtime loss), which made it easier to standardize the data from season to season.
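The NHL's points percentage follows directly from that format. The helper name `nhl_points_pct` is hypothetical, and the sketch assumes shootout losses are counted together with overtime losses, since both earn one point.

```python
def nhl_points_pct(wins, ot_losses, games_played):
    """Share of available points earned: 2 per win, 1 per overtime
    (or shootout) loss, 0 per regulation loss, out of 2 per game played."""
    points = 2 * wins + ot_losses
    return points / (2 * games_played)

# Hypothetical record: 45 wins, 10 overtime losses, 82 games played.
print(round(nhl_points_pct(45, 10, 82), 3))
```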

Using this data, I applied Lopez's method, z-scoring each team's winning percentage (or, in the case of the NHL, the percentage of available points earned) in each of the 12 seasons. I then computed the correlation between a team's record in a given season and its record in each of the next five seasons. Our results were as follows:
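The lagged-correlation step might look like the sketch below. It assumes each team's z-scores are stored in chronological order over the same seasons, and that (season t, season t+k) pairs are pooled across all teams for each lag k; the article does not spell out how the pairs were assembled, and `lag_correlations` and the example data are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.
    Raises ZeroDivisionError if either sequence is constant."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def lag_correlations(z_by_team, max_lag=5):
    """For each lag k = 1..max_lag, correlate every team's z-score in
    season t with its z-score in season t + k, pooling all teams.

    z_by_team maps a team name to its per-season z-scores in
    chronological order (assumed: same seasons for every team).
    """
    result = {}
    for k in range(1, max_lag + 1):
        xs, ys = [], []
        for seasons in z_by_team.values():
            for t in range(len(seasons) - k):
                xs.append(seasons[t])
                ys.append(seasons[t + k])
        result[k] = pearson(xs, ys)
    return result

# Hypothetical z-scores for three teams over seven seasons.
example = {
    "Team A": [1.2, 0.9, 0.4, -0.1, -0.6, -0.8, -1.0],
    "Team B": [-1.1, -0.7, -0.2, 0.3, 0.8, 1.1, 1.2],
    "Team C": [0.1, 0.3, -0.2, 0.0, 0.2, -0.1, 0.1],
}
for lag, r in lag_correlations(example).items():
    print(f"lag {lag}: r = {r:.2f}")
```

Pooling pairs means each lag uses fewer pairs as k grows (a 12-season window yields 11 pairs per team at lag 1 but only 7 at lag 5), so the longer-lag estimates are somewhat noisier.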

This leads to quite a number of interesting conclusions.

The first thing to note is that the NBA clearly shows the highest correlation (.64) between records in year x and year x+1, while the NFL clearly shows the lowest (.33), with MLB and the NHL falling somewhere in between.

However, as more seasons of lag are added, MLB's correlation drops off sharply (to -.15 at five seasons), while the NHL and NFL still show some positive correlation five years on. A correlation near zero means that whether your team is good or terrible now says little about whether it will be good or terrible five years from now. In baseball, the negative value suggests that a team that is currently above average is, if anything, somewhat more likely to be below average five years later.

What are the possible explanations for this, and why does it vary by sport? One potential explanation for the NBA is the prevalence of tanking. Bad teams are likely to stay bad the following year (tanking does not turn a team around overnight, and teams like the Warriors will stay good), but over time the draft picks develop and the hierarchy of teams changes.

In the NFL, there is a lot of variance within a season (the season is only 16 games, so a single extra win raises your winning percentage by 1/16, or about .06). In addition, the NFL has significant roster turnover (the average NFL career is only 4 years). It therefore makes sense that we see less correlation from one season to the next. However, as we widen the time scale to five years, some of the factors that make teams good (like a head coach) tend to stay in place, which may help explain why the NFL's correlation remains positive.
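To put the NFL's short season in perspective, the sketch below compares, for each league's regular-season length n, the value of one extra win (1/n) and the standard deviation of winning percentage if every game were an independent coin flip, sqrt(0.25/n). The coin-flip model is a simplifying assumption that ignores team strength and, for the NHL, points for overtime losses; the function names are hypothetical.

```python
from math import sqrt

# Regular-season lengths during the 2006-2017 window studied here.
SEASON_GAMES = {"NFL": 16, "NBA": 82, "NHL": 82, "MLB": 162}

def one_win_swing(games):
    """Change in winning percentage produced by a single extra win."""
    return 1 / games

def coin_flip_sd(games, p=0.5):
    """Std. dev. of winning percentage if every game were an independent
    coin flip with win probability p (a simplifying assumption)."""
    return sqrt(p * (1 - p) / games)

for league, n in SEASON_GAMES.items():
    print(f"{league}: one win = {one_win_swing(n):.4f}, "
          f"luck-only SD = {coin_flip_sd(n):.3f}")
```

Under this model, luck alone gives an NFL team's winning percentage a standard deviation of .125, more than twice the roughly .055 of an 82-game NBA or NHL season, which is consistent with the NFL showing the weakest year-to-year correlation.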

It would be interesting to dig further into the reasons behind the differences in these curves. Potential explanations include the salary cap, average contract length and roster turnover, the effect of aging, and the effect of a head coach. Let us know your hypotheses in the comments below.

If you have any questions for Andrew, please feel free to email him at andrewpuopolo@college.harvard.edu or reach him on Twitter at @andrew_puopolo.