by Alex Rojas
How does one year’s March Madness tournament compare to the results of the previous year? How likely is it to see a repeat champion? Does a team’s success in one year carry over to the next? Since last year’s tourney was cancelled because of COVID-19 days before it was slated to start, no one knows what would have happened. With this year’s March Madness underway, we set out to answer these questions by building a model to simulate the cancelled 2019-2020 tournament, probabilistically predicting the result of each game. Looking back at who would have been successful in the tournament last year, we can see which teams stepped up and which teams fell off in this year’s tournament.
In order to capture the complexity of predicting sports games, we took a two-pronged approach to simulating game scores. First, we implemented a Multilayer Perceptron (MLP) to predict the actual score of each game. Inputs to the model include each team’s strength of schedule, win percentage, average offensive and defensive points per game, and the game location. For example, given team data on Baylor and Villanova, the model might predict a 75-68 Baylor victory.
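To make the shape of the score model concrete, here is a minimal sketch of an MLP forward pass in plain Python. It is not our trained network: the layer sizes, the feature encoding, the `a_is_home` flag, and every number below are assumptions made for illustration, and the weights are random, so the predicted scores are meaningless until the network is trained.

```python
import random

def relu(x):
    """Standard ReLU activation."""
    return max(0.0, x)

def init_layer(n_in, n_out, rng):
    """One dense layer: small random weights and zero biases (untrained)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(inputs, layer, activation=None):
    """Apply one dense layer, optionally followed by an activation."""
    weights, biases = layer
    outputs = [sum(w * x for w, x in zip(row, inputs)) + b
               for row, b in zip(weights, biases)]
    return [activation(o) for o in outputs] if activation else outputs

def matchup_features(team_a, team_b, a_is_home):
    """Concatenate both teams' stats plus a location flag (hypothetical encoding)."""
    keys = ("sos", "win_pct", "off_ppg", "def_ppg")
    return ([team_a[k] for k in keys] + [team_b[k] for k in keys]
            + [1.0 if a_is_home else 0.0])

def predict_scores(features, hidden_layer, output_layer):
    """Forward pass: features -> hidden (ReLU) -> two outputs, one score per team."""
    return dense(dense(features, hidden_layer, relu), output_layer)

# Hypothetical stats for two placeholder teams (not real data).
team_a = {"sos": 0.60, "win_pct": 0.85, "off_ppg": 75.0, "def_ppg": 62.0}
team_b = {"sos": 0.58, "win_pct": 0.75, "off_ppg": 72.0, "def_ppg": 66.0}

rng = random.Random(0)
hidden_layer = init_layer(n_in=9, n_out=8, rng=rng)
output_layer = init_layer(n_in=8, n_out=2, rng=rng)

features = matchup_features(team_a, team_b, a_is_home=False)
scores = predict_scores(features, hidden_layer, output_layer)
```

Training would fit the weights so that the two outputs approximate the final scores of historical games.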
Second, we address the fact that team statistics change over the course of the season based on recent performance. For example, Villanova was ranked #3 in this year’s AP Preseason Poll but is now ranked #18, so the model needs to account for such shifts. To do so, we trained a Random Forest Regression model that, given both teams’ efficiency ratings and a game score, predicts updated team metrics. The model takes the score from this hypothetical Baylor-Villanova game and the previous offensive and defensive efficiencies of both teams, and outputs new offensive and defensive efficiencies. If Baylor beat Villanova, the win would increase Baylor’s team rating and decrease Villanova’s.
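The update step can be sketched as follows. This is not the Random Forest itself: `stand_in_update` is a deliberately simple placeholder rule that shifts ratings in proportion to the scoring margin, and unlike the real model it ignores the opponent's ratings. The `Efficiency` class, the `rate` parameter, the starting ratings, and the convention that a lower defensive number is better are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Efficiency:
    offense: float  # assumed: points scored per 100 possessions (higher is better)
    defense: float  # assumed: points allowed per 100 possessions (lower is better)

    @property
    def net(self):
        """Net rating: offense minus defense."""
        return self.offense - self.defense

def stand_in_update(team, points_for, points_against, rate=0.05):
    """Placeholder for the trained regressor: shift ratings by a fraction of the margin."""
    margin = points_for - points_against
    return Efficiency(offense=team.offense + rate * margin,
                      defense=team.defense - rate * margin)

def apply_result(eff_a, eff_b, score_a, score_b, update=stand_in_update):
    """Return both teams' updated ratings after one simulated game."""
    return update(eff_a, score_a, score_b), update(eff_b, score_b, score_a)

# Hypothetical starting ratings; 75-68 is the example score used above.
winner_before = Efficiency(offense=115.0, defense=92.0)
loser_before = Efficiency(offense=112.0, defense=95.0)
winner_after, loser_after = apply_result(winner_before, loser_before, 75, 68)
```

In a full simulation, the updated ratings would feed back into the score model before the next game is played.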
Yet such a model would not be very robust if it were simulated only once: a single run is one random draw, so its results could easily reflect chance rather than the underlying probabilities. For example, if Baylor had a 70% win probability against Villanova and we simulated the game only once, that simulation could well result in a Villanova win, even though Villanova is probabilistically the weaker team. To address this, we use a Monte Carlo simulation that plays out the tourney thousands of times to create a probability distribution for each game. If Baylor wins 700 out of 1,000 simulations against Villanova, for example, they would have a 70% win probability.
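The effect of repeating the simulation can be illustrated with a short sketch. Here each game is reduced to a single weighted coin flip with a known win probability, which is a simplification of the score-based simulation described above; the function names and the fixed random seed are assumptions made for illustration.

```python
import math
import random

def estimate_win_probability(p_true, n_sims, seed=0):
    """Simulate one matchup n_sims times and return the favorite's share of wins."""
    rng = random.Random(seed)
    wins = sum(rng.random() < p_true for _ in range(n_sims))
    return wins / n_sims

def standard_error(p, n_sims):
    """Standard error of a proportion estimated from n_sims independent trials."""
    return math.sqrt(p * (1 - p) / n_sims)

# A single run can only report 0% or 100% for the favorite.
single_run = estimate_win_probability(0.7, n_sims=1)

# Many runs settle close to the underlying 70%.
many_runs = estimate_win_probability(0.7, n_sims=10_000)
```

With 10,000 simulations the standard error of the estimate is under half a percentage point, while a single simulation tells us only who happened to win that one time.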
While we never got to see who cut down the nets in last year’s tournament, it is still interesting to see which teams likely would have won and where they are today. Using last year’s simulation, we can identify the holdovers, the newcomers, and the drop-offs from last year.
Since the NCAA’s Selection Sunday never happened last year, we had to use a heuristic approach roughly similar to the NCAA’s seeding policy. Because the real selection involves a voting process, we could not quite capture the sentiments that people bring to seeding teams; instead, we picked the conference winners and the best at-large teams based on the available metrics. So, the top results of our model must be read with the caveat that our seeding was relatively blind to region, and region assignments could change from run to run. Two equally strong teams, one in a more competitive region and the other in a less competitive region, would have different probabilities of winning the tournament based on their competition.
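A selection heuristic of this kind can be sketched as follows. This is an illustration rather than our exact procedure: the `Team` fields, the single `rating` number standing in for "the available metrics," and the toy six-team pool are all assumptions, and the teams are placeholders rather than real programs.

```python
from typing import NamedTuple

class Team(NamedTuple):
    name: str
    conference: str
    rating: float          # hypothetical composite metric; higher is better
    won_conference: bool   # True if the team earns an automatic bid

def select_field(teams, field_size):
    """Give automatic bids to conference winners, then fill with the best at-large teams."""
    auto_bids = [t for t in teams if t.won_conference]
    at_large_pool = sorted((t for t in teams if not t.won_conference),
                           key=lambda t: t.rating, reverse=True)
    open_spots = max(0, field_size - len(auto_bids))
    field = auto_bids + at_large_pool[:open_spots]
    # Order the whole field by rating so seed lines can be assigned.
    return sorted(field, key=lambda t: t.rating, reverse=True)

def assign_seed_lines(field, regions=4):
    """Best `regions` teams get seed line 1, the next `regions` get line 2, and so on."""
    return {team.name: index // regions + 1 for index, team in enumerate(field)}

# Toy pool: three conferences, a four-team field, two regions.
pool = [
    Team("A", "X", 30.0, True),
    Team("B", "X", 28.0, False),
    Team("C", "Y", 10.0, True),
    Team("D", "Y", 5.0, False),
    Team("E", "Z", 20.0, True),
    Team("F", "Z", 25.0, False),
]
field = select_field(pool, field_size=4)
seed_lines = assign_seed_lines(field, regions=2)
```

In this toy pool, team F misses the field despite outrating two conference winners, because automatic bids are filled first. The sketch stops at seed lines and does not place teams into specific regions.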
The top teams from last year’s simulation in terms of win frequency were: Kansas, Baylor, San Diego State, Dayton, and Gonzaga.
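For readers curious how a win frequency is tallied, here is a minimal sketch of a bracket-level Monte Carlo run. It replaces our score and rating models with a hypothetical logistic `win_probability` function of a single rating gap, and the four placeholder teams, their ratings, and the `scale` parameter are assumptions made for illustration.

```python
import random
from collections import Counter

def win_probability(rating_a, rating_b, scale=10.0):
    """Hypothetical logistic map from a rating gap to P(team A beats team B)."""
    return 1.0 / (1.0 + 10 ** (-(rating_a - rating_b) / scale))

def simulate_bracket(teams, ratings, rng):
    """Play one single-elimination bracket; adjacent entries meet in each round."""
    field = list(teams)
    while len(field) > 1:
        next_round = []
        for i in range(0, len(field), 2):
            a, b = field[i], field[i + 1]
            p_a = win_probability(ratings[a], ratings[b])
            next_round.append(a if rng.random() < p_a else b)
        field = next_round
    return field[0]

def champion_frequencies(teams, ratings, n_sims=10_000, seed=0):
    """Share of simulated tournaments won by each team."""
    rng = random.Random(seed)
    titles = Counter(simulate_bracket(teams, ratings, rng) for _ in range(n_sims))
    return {team: titles[team] / n_sims for team in teams}

# Placeholder four-team bracket (the team count must be a power of two).
teams = ["A", "B", "C", "D"]
ratings = {"A": 20.0, "B": 10.0, "C": 12.0, "D": 8.0}
frequencies = champion_frequencies(teams, ratings)
```

Ranking teams by these frequencies yields a list like the one above. Because a team's frequency depends on who stands in its path, the same rating can produce different frequencies in different bracket positions.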
2020 Simulation Analysis
What made certain teams do better than others? Our model strongly valued offensive and defensive efficiency (which feed into the NCAA’s NET Ranking) as well as a team’s strength of schedule and opponents’ strength of schedule. Kansas, which stood atop both the rankings and our model at the season’s conclusion, plays in the Big 12 with other dominant teams like Baylor. Strength of schedule is likely why we see Gonzaga fall so low: the Zags annually steamroll teams that are simply not good. They play in the West Coast Conference, where the average KenPom ranking of the top 5 teams this year was 57; the average KenPom ranking of the top 5 Big 12 teams was 21. An interesting riser in our 2020 model is Baylor, which may be due to their strong strength of schedule. For example, they had to play Kansas (as did most of their opponents), which in a sense allowed them to combine their own success with the strength of Kansas to reach a top-2 position. A notable faller, on the other hand, is Florida State. When the season ended, both the AP and Coaches Polls had the Seminoles as a top-5 team; however, they were ranked outside of the top 10 in our forecast, suggesting that they were not outplaying opponents on a per-possession basis as much as teams like Duke, which jumped up in our 2020 model compared to the polls.
Holdovers in 2021
The teams that advanced far in last year’s tournament simulation that are still in the 2021 tournament are Baylor (#2 in the 2020 simulation), Gonzaga, Creighton, Villanova, Florida State, Oregon, and Houston.
Gonzaga has improved upon their dominance from last year, jumping up to the unanimous #1 in the nation. They have held the top spot all season, beginning with the AP Preseason Poll. This is likely due to a more experienced front-court: Drew Timme has nearly doubled his points per game to 18.8 from 9.8 last year. Furthermore, the recruitment of future NBA lottery-pick Jalen Suggs has had a big impact; the Zags freshman guard averages 14.4 points per game and facilitates the offense with 4.3 assists per game. They are led by the WCC Player of the Year Corey Kispert, who scores a more efficient 19.2 points in 30.9 minutes per game compared to 13.9 in 30 last season. It remains to be seen whether the Zags will live up to the hype, but so far they have won their first two games by a combined margin of 59 points. FiveThirtyEight gives them the highest odds of winning the tournament at 32%.
Another notable holdover from last season is Baylor. Last season, the Bears ranked second in our model, and they are second again this year. This success can be chalked up to a team making 3-pointers at an even higher clip: they lead the entire NCAA with 41.5% shooting from beyond the arc. Wooden Award finalist Jared Butler is having an outstanding season, scoring 16.9 points per game on 47.9% shooting from the field. And don’t forget the mullet: Matthew Mayer is shooting more than 42% from deep. The Bears are certainly a team to watch out for, as they are the second favorite left in the tournament.
Newcomers in 2021
It’s strange to call the Michigan Wolverines a “newcomer,” but they were not to be found last year after finishing 9th in the Big Ten. They rebounded this year, finishing first in the Big Ten standings before losing in the conference tournament semifinals to Ohio State. While a historically dominant program, they find themselves in a relatively unfamiliar place this year as a #1 seed for the first time since 1993, when current coach Juwan Howard was still playing. Last year, Michigan was unranked. In our simulations, however, they trended towards a #5 seed, which usually would correspond with a ranked team. So maybe they were indeed a solid sleeper pick last year; a ranking of 21st in our 2020 model only corroborates this. This year, Michigan ranks fourth in the NET and is one of the tournament’s most balanced teams, so they are certainly on the rise.
Unfortunately for them, they will be without Isaiah Livers for the foreseeable future. Livers ranked second on the Wolverines with 13.1 points per game, so third-leading scorer Franz Wagner will have to step up in a big way if the Wolverines are to make a deep run. He’s got big shoes to fill: his older brother Mo took the Wolverines to the Championship Game in 2018 and was a first-round NBA draft pick. With the team in its fourth straight Sweet Sixteen, Michigan fans can be hopeful that it can perform in spite of losing Livers if Eli Brooks and Chaundee Brown continue to score like they did against LSU (21 points each).
The Alabama Crimson Tide are also a solid contender this year, and yes, this is basketball we’re talking about. Bama’s recent success as a #2 seed and a top-5 team in the country can be attributed to the hiring of Coach Nate Oats, whom they picked up from Buffalo. Oats coaches a modern, analytical style of basketball typically associated with the Houston Rockets: in a radio interview, Oats said that a long two-point shot is “punishable by death.” This approach, in addition to improved defense, has boosted the Tide to seventh in the NET. If they can continue to shoot 48.5% from deep as they did in their blowout of Maryland, #1-seeded teams should be frightened. If both the Crimson Tide and Wolverines advance to face each other, it could make for the marquee Elite 8 matchup of the weekend. If the two were to play, FiveThirtyEight gives the Tide a 52% chance of winning, which is basically a toss-up. Could Alabama join fellow SEC powerhouse Florida as the second school in history to win a football and basketball championship in the same year? They are still in the fight.
Dropped Off in 2021
Duke was 11th in the AP Poll last year, but they had the seventh-best championship frequency in our simulation. Kentucky was just ahead of them in the polls (#8 in the AP Poll), but the Wildcats’ win frequency fell to 19th in our model’s simulations. In the context of NET, these numbers make sense, as Duke was ranked sixth and Kentucky 21st. This year, both blue bloods have missed the tournament entirely. One significant difference from most years is the lack of a high-scoring freshman; both teams are used to having one-and-done stars like Zion Williamson or John Wall to carry the team, but no freshman on either team averaged more than 14 points per game. Duke’s No. 1 recruit, Jalen Johnson, also opted out midway through the season to prepare for the NBA draft. Last year, Kentucky featured Tyrese Maxey and Immanuel Quickley. Duke was led by experienced point guard Tre Jones and freshmen Vernon Carey Jr. and Cassius Stanley. All of these players are now in the NBA, so both teams lost a lot of excellent players.
Dayton, led by lottery pick Obi Toppin, ranked fourth in our model, meaning they won the simulated tournament with the fourth-highest frequency. With Toppin’s departure, however, Dayton failed to qualify for the 2021 tournament.
We will never find closure for the many players whose final year in college basketball was cut short, but we can still try to analyze what would have happened and why. We think that last year had a strong Big 12, as Kansas and Baylor were our top two picks, but we were lower on Gonzaga. Teams often lose strength when the bulk of their rosters exit, so it’s not surprising that Duke, Kentucky, and Dayton failed to make the 2021 tournament. In contrast, no Gonzaga player was selected in last year’s NBA draft, so they retained much of their strength and brought in a great freshman in Jalen Suggs. The Zags are at the top of most people’s rankings now, and they are the heavy favorites to win the tournament. But in an already wild tournament with the most first-weekend upsets ever (34%), anything is possible.
Citations and Credits
Thank you to Michael Lee for contributing significantly to this simulation (for the final project in CS 109a at Harvard). Thank you to David Arkow for helping with the editing process.
- 2019-2020 NET Rankings (HeroSports)
- 2019-2020 NCAA Men’s Basketball Rankings Polls (ESPN)
- Michigan earns No. 1 seed in NCAA Tournament, will face play-in winner (Andrew Kahn, Michigan Live)
- How did Alabama basketball get so good, so fast (and how can they be beaten) (Aaron Torres, Kentucky Sports Radio)
Editor’s Note: This article was inspired by a final project for Harvard’s CS 109a course. If you have questions about this article, or if you would like to read the whole report with more analysis of the hypothetical 2020 tournament, please feel free to reach out to Alex at firstname.lastname@example.org.
Thank you for reading!