A Machine Learning Analysis Of The NFL: Predicting New Playoff Contenders

By Matty Cheng

NFL teams that performed poorly in the win column one year can rise to Super Bowl champions the next (like the Philadelphia Eagles). An NFL game is sometimes referred to as a “game of inches” in which wins and losses can be determined by chance, hiding the true potential of a team. This could explain the seemingly surprising rise of a team like the Eagles. We can use machine learning to look beyond team record alone to determine which teams that performed poorly last year could compete for a Super Bowl this year. Our goal is to create a machine learning model that groups NFL teams together, predicting a set of playoff teams.

First, let’s take a historical look at the last five NFL seasons (2012-2016) to test how well our model predicts past playoff teams; later we will predict next season’s. We scrape game data for the 1,280 games across those 5 seasons, covering 8 offensive and defensive variables such as yards, points, and turnovers.

Since we want to visualize the groupings of NFL teams, we must reduce the dimensionality of the variable data we collected. To do so, we can use Principal Component Analysis (PCA), a statistical procedure that converts a set of variables into a new, smaller set of variables that still captures the essence of the originals. The first two principal components explain 64% of the variance.
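To make the PCA step concrete, here is a minimal sketch using only NumPy. It assumes the game data has already been aggregated into one row per team-season with 8 numeric columns; that layout, the function name `pca_2d`, and the synthetic data are illustrative assumptions, not the code behind the article.

```python
import numpy as np

def pca_2d(X):
    """Project the rows of X onto the first two principal components.

    Returns (scores, explained_ratio), where explained_ratio holds the share
    of total variance captured by each of the two components.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each column so that yards (hundreds) and turnovers
    # (single digits) contribute on the same scale.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # SVD of the standardized data: the rows of Vt are the principal directions.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    variances = S ** 2
    explained_ratio = variances[:2] / variances.sum()
    scores = Z @ Vt[:2].T
    return scores, explained_ratio

# Synthetic stand-in for the real data: 160 rows (assumed 32 teams x 5 seasons)
# by 8 columns. These are random numbers, not NFL statistics.
rng = np.random.default_rng(0)
latent = rng.normal(size=(160, 2))
X = latent @ rng.normal(size=(2, 8)) + 0.5 * rng.normal(size=(160, 8))

scores, explained_ratio = pca_2d(X)
print(scores.shape)           # one (Dimension 1, Dimension 2) pair per row
print(explained_ratio.sum())  # share of variance kept by the two components
```

The sum of `explained_ratio` is the quantity the article reports as 64% for the real data; the synthetic data here will give a different value.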

Using the K-Means Elbow Method, we found the ideal number of clusters to be 3. Finally, we can use the K-Means Algorithm to determine and plot the clusters of different types of NFL teams, shown below:

The axes labeled ‘Dimension 1’ and ‘Dimension 2’ represent all the variables we have reduced through PCA. For visual clarity, only some of the teams have been labeled.
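The elbow search and the clustering step can be sketched as follows. This is a minimal NumPy version of Lloyd's k-means algorithm run on synthetic two-dimensional points. The farthest-point initialization is swapped in here only to keep the example deterministic, and the function names and data are hypothetical; none of this is taken from the article's own code.

```python
import numpy as np

def farthest_point_init(points, k):
    """Pick k starting centers: the first point, then repeatedly the point
    farthest from all centers chosen so far (a deterministic choice)."""
    centers = [points[0]]
    for _ in range(1, k):
        d2 = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[d2.argmax()])
    return np.array(centers)

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm. Returns (labels, centers, inertia), where inertia
    is the within-cluster sum of squared distances."""
    points = np.asarray(points, dtype=float)
    centers = farthest_point_init(points, k)
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old center if a cluster empties out
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), centers, d2.min(axis=1).sum()

# Synthetic 2-D points in three well-separated groups (not NFL data).
rng = np.random.default_rng(1)
blob_centers = np.array([[-4.0, 0.0], [0.0, 3.0], [4.0, 0.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in blob_centers])

# Elbow method: compute inertia for each k and look for the bend in the curve.
curve = {k: kmeans(points, k)[2] for k in range(1, 7)}
for k, inertia in curve.items():
    print(k, round(inertia, 1))

labels, centers, _ = kmeans(points, 3)
```

On this toy data the inertia drops sharply up to k = 3 and only slightly afterwards, which is the "elbow" the method looks for. With real team data the bend is usually less clean and needs some judgment.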

Clusters represent the quality of teams based on the collected input variables. Cluster 1 represents the playoff-caliber teams, cluster 2 the borderline playoff teams, and cluster 3 the non-playoff-caliber teams. Cluster 1 contained 64% of the playoff teams, cluster 2 contained 28%, and cluster 3 contained only 8%. This indicates that teams in cluster 1 were 2.3 times more likely to make the playoffs than cluster 2 teams and 8 times more likely than cluster 3 teams. Cluster 1 correctly predicts playoff teams for the next season 64% of the time. This model, using the input variables described above, does better than a model based purely on the previous season’s record, which captures only 52% of playoff teams in its Cluster 1. A model using only whether the team made the playoffs the previous year also captures only 52% of playoff teams.
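The ratios quoted above come straight from dividing the cluster shares. The sketch below reproduces that arithmetic and shows one way such shares could be computed from cluster labels. The helper `playoff_share_by_cluster` and its six-team toy input are hypothetical. Note also that a ratio of shares equals a ratio of per-team likelihoods only if the clusters hold similar numbers of teams, which the article does not state.

```python
from collections import Counter

def playoff_share_by_cluster(cluster_ids, made_playoffs):
    """Fraction of all playoff teams that fall in each cluster."""
    playoff_counts = Counter(c for c, p in zip(cluster_ids, made_playoffs) if p)
    total = sum(playoff_counts.values())
    return {c: playoff_counts[c] / total for c in sorted(set(cluster_ids))}

# Shares reported in the article (cluster -> share of playoff teams).
reported = {1: 0.64, 2: 0.28, 3: 0.08}
ratio_1_vs_2 = reported[1] / reported[2]
ratio_1_vs_3 = reported[1] / reported[3]
print(round(ratio_1_vs_2, 1), round(ratio_1_vs_3, 1))  # 2.3 8.0

# Toy input (made up): six teams, three of which made the playoffs.
toy_shares = playoff_share_by_cluster(
    [1, 1, 1, 2, 2, 3],
    [True, True, False, True, False, False],
)
print(toy_shares)
```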

Next, we apply the K-Means algorithm to the 2017 season to predict 2018 playoff teams, shown below:

The axes labeled ‘Dimension 1’ and ‘Dimension 2’ represent all the variables we have reduced through PCA.

The model’s Cluster 1 contains 10 of the 12 playoff teams from 2017, deeming the other two 2017 playoff teams (Buffalo Bills and Tennessee Titans) unworthy of the playoffs. Both of those teams had the worst record among playoff teams (9-7) and needed the last game of the season to clinch a playoff berth. The Bills even needed the Bengals’ miracle 49-yard touchdown on fourth down against the Ravens to win a tiebreaker.

The model also deemed the Baltimore Ravens, Detroit Lions, and Tampa Bay Buccaneers playoff-caliber teams, although they did not actually make the postseason. The Ravens had the same record as the Bills and Titans (9-7), barely missing the AFC playoffs due to tiebreaker rules. If not for the Bengals’ miracle touchdown, the Ravens would have made the playoffs over the Bills. The Lions also had a 9-7 record, but they were in the NFC, so they would have needed one more victory and a tiebreaker win to reach the playoffs. Meanwhile, the Buccaneers played in a division with 3 playoff teams (Falcons, Saints, Panthers), so they were forced to play each of those teams twice, making their schedule extremely difficult to overcome. The model indicates that the Buccaneers’ record was less a reflection of their level of play and more a reflection of the strength of their opponents.
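The comparison between Cluster 1 and the actual 2017 playoff field can be expressed with set operations. The sketch below assumes Cluster 1 holds exactly the 13 teams implied by the text (the 10 matched playoff teams plus the Ravens, Lions, and Buccaneers); if the real cluster contains other teams, the precision figure would be lower.

```python
# The twelve 2017 playoff teams.
playoffs_2017 = {
    "Patriots", "Steelers", "Jaguars", "Chiefs", "Titans", "Bills",  # AFC
    "Eagles", "Vikings", "Rams", "Saints", "Panthers", "Falcons",    # NFC
}

# Assumed Cluster 1 membership, reconstructed from the text: every 2017
# playoff team except the Bills and Titans, plus three non-playoff teams.
cluster_1 = (playoffs_2017 - {"Bills", "Titans"}) | {"Ravens", "Lions", "Buccaneers"}

hits = cluster_1 & playoffs_2017    # playoff teams the model placed in Cluster 1
missed = playoffs_2017 - cluster_1  # playoff teams left out of Cluster 1
extra = cluster_1 - playoffs_2017   # Cluster 1 teams that missed the playoffs

recall = len(hits) / len(playoffs_2017)  # share of playoff teams captured
precision = len(hits) / len(cluster_1)   # share of Cluster 1 that made it

print(sorted(missed))                     # ['Bills', 'Titans']
print(sorted(extra))                      # ['Buccaneers', 'Lions', 'Ravens']
print(f"{recall:.2f} {precision:.2f}")    # 0.83 0.77
```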

Visually, the Raiders, Jets, Chargers, and Redskins also seem to be close to Cluster 1, so perhaps these teams could also earn a playoff berth next season as part of the 28% of playoff teams that come from cluster 2.

If you have any questions for Matthew about this article, please feel free to reach out to him at chengm@college.harvard.edu
