Survival of the Fittest: Predicting the 2013 NCAA Tournament

The goal of every team in the NCAA tournament is to survive and advance. And, if you want to win your March Madness pool, your goal should be to predict which teams will do just that.

Most prediction systems view the NCAA tournament as an extension of the regular season. While that may be the best way to pick the most games in the tournament correctly, I do not believe it is the way to predict the most important games correctly. Correctly selecting a team to make the Championship Game can more than make up for a relatively poor first round.

That is why, building off of Ken Pomeroy’s great work, for the past two years I have been publishing a model of the NCAA tournament based on Survival Analysis. Academic researchers use Survival Analysis to determine whether new pharmaceutical drugs or treatments are effective. I co-opted the framework to try to discover something truly important: the path to bragging rights over your friends.

The precise details are contained in last year’s post, but the general idea is that the best way to win an office pool is to empirically determine which teams have the traits that best predict their survival to the Final Four, Championship Game, and eventual National Championship. Using Survival Analysis is a natural fit.

Does it work? Over the last six NCAA tournaments, using out-of-sample testing, the model has outperformed the RPI, the BPI (unfortunately, the BPI’s performance is behind ESPN Insider’s paywall), and the Pomeroy rankings:

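[Figure: out-of-sample performance of the Survival Model compared with the RPI, BPI, and Pomeroy rankings over the last six NCAA tournaments]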

The 2013 Model

My model this year is almost exactly the same as last year’s (the only change is that I log-transformed the experience variable), but I have added predicted probabilities for every potential NCAA tournament matchup.

A brief diversion for those of you interested in the math (skip ahead a section if you are not): the Cox Proportional Hazards model yields the predicted probability of any team “dying,” or losing, at a given time. If we know that two teams meet and one must lose, we can calculate the probability of Team A defeating Team B as:

[Equation image not reproduced: probability of Team A defeating Team B]

I used this formula and the predicted hazard rates to predict all possible paths for a team. For instance, if Indiana makes the Sweet 16, they could play any one of UNLV, California, Syracuse, or Montana. The odds of the Hoosiers advancing to the Elite Eight by beating UNLV are the probability that Indiana beats UNLV times the probability that Indiana plays UNLV. Summing those products across all four potential opponents gives the overall odds that Indiana reaches the Elite Eight.
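The path calculation described above can be sketched in a few lines of Python. This is only an illustration, not the author's code. The head-to-head formula is an assumption: the original equation was an image, so the sketch uses a common form for two competing hazards, P(A beats B) = h_B / (h_A + h_B), in which the baseline hazard of a Cox model cancels out. All hazard rates and reach probabilities below are made-up placeholder numbers, and the function names are hypothetical.

```python
from typing import Dict


def win_prob(hazard_a: float, hazard_b: float) -> float:
    """P(A beats B), ASSUMING the loser is whichever team "dies" first.

    With proportional hazards this is h_B / (h_A + h_B): the lower a
    team's hazard relative to its opponent, the likelier it survives.
    """
    return hazard_b / (hazard_a + hazard_b)


def advance_prob(
    team: str,
    p_team_reaches_round: float,
    opponent_reach: Dict[str, float],
    hazards: Dict[str, float],
) -> float:
    """Probability that `team` wins its game in a given round.

    For each possible opponent: P(team plays opponent) * P(team beats
    opponent), summed over all opponents. P(team plays opponent) is
    taken as P(team reaches the round) * P(opponent reaches the round).
    """
    total = 0.0
    for opponent, p_opp_reaches in opponent_reach.items():
        p_matchup = p_team_reaches_round * p_opp_reaches
        total += p_matchup * win_prob(hazards[team], hazards[opponent])
    return total


# Hypothetical relative hazard rates (NOT the model's actual output).
hazards = {
    "Indiana": 0.5,
    "UNLV": 1.0,
    "California": 1.5,
    "Syracuse": 1.0,
    "Montana": 2.0,
}

# Hypothetical chances that each potential opponent reaches the Sweet 16.
# Exactly one of the four gets there, so these sum to 1.
opponent_reach = {"UNLV": 0.3, "California": 0.1, "Syracuse": 0.5, "Montana": 0.1}

# Hypothetical chance that Indiana itself reaches the Sweet 16.
p_indiana_sweet16 = 0.9

p_elite_eight = advance_prob("Indiana", p_indiana_sweet16, opponent_reach, hazards)
print(f"P(Indiana reaches the Elite Eight) = {p_elite_eight:.3f}")
```

Repeating the same calculation round by round, feeding each round's output in as the next round's reach probabilities, would yield a full table of advancement odds for every team.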

The Model Predictions

Without further ado, I present the full predicted tables for each region. Some quick notes: similar to the Upset Model, this Survival Model does not do well with very low probability events. Thus it overrates the chances of the 15 and 16 seeds pulling upsets. If you believe the true chances of a 16 pulling an upset are around two percent, you should adjust the Final Four and Championship probabilities for the 1 seeds up about one percentage point each.

Additionally, the model is fairly chalky this year and is similar to Ken’s final rankings. Despite incorporating experience and regular season wins over NCAA tournament teams, the model still picks Florida to win it all. The Gators, however, are essentially a tossup (51.5 percent) against Louisville in a potential Championship Game matchup.

[Table: predicted advancement probabilities, Midwest region]

Louisville is the clear favorite here, with a predicted 34.5 percent chance of advancing to Atlanta. Six seed Memphis looks vulnerable to either St. Mary’s or MTSU, who are rated almost identically by the model. Also note that if Creighton is able to beat Duke, they have the potential to also beat Michigan State (47 percent chance).

[Table: predicted advancement probabilities, West region]

Gonzaga remains the slight favorite here, but has a very tough potential matchup with Pitt in the Round of 32. If you are looking for a surprise in the first round, Boise State may be a good pick. The model predicts Kansas State to lose to Wisconsin, so if you believe the Badgers are that good, picking the Broncos may be a good risk-reward at 41 percent odds to win.

[Table: predicted advancement probabilities, East region]

As many pundits have pointed out, Indiana appears to have been gifted a weak region. Miami and Marquette are by far the weakest two and three seeds. Bucknell has the best odds of any double-digit seed of reaching either a Sweet 16 or an Elite Eight. Davidson, too, has great odds of pulling an upset for a 14 seed.

[Table: predicted advancement probabilities, South region]

What do you do with a problem like the Gators? The “eye test” says that Florida cannot win close games and will lose early. Most tempo-free rankings put the Gators up top. My predictions have Florida not winning the NCAA title 84 percent of the time, but they are still the overall favorite.

Look out for Michigan in this bracket. Of all the teams in the field, the Wolverines have the highest variance of prediction (i.e., my model predicts the widest distribution of outcomes around the most likely one). This could mean an early exit to Nate Wolters and South Dakota State, or a run to the Elite Eight with a win over Kansas.

Picking straight down the line of predicted probabilities on these picks may not be the best strategy for your pool. If you are in a very large pool, you should pursue a riskier strategy than chalk in order to maximize your expected value. Regardless of your pool size, I believe the Survival Analysis predictions here can help you increase your chances of winning bragging rights over your friends.

(Ed. note: another version of this analysis appears on Sports Illustrated’s website.)

About the author

harvardsports

39 Comments

  • An interesting approach to picking a bracket. An 11 seed has a 31.25% chance of beating a 6 seed according to past history. So you rank every 11 seed historically by their chances of beating a 6 seed according to this model, and if they’re in the top 32% then you pick them (and so on for every seed). This should keep you from picking all chalk. For example, the best 12 seed has a 40.6% chance of winning, so this model on its own would say that you should not pick a single 12 seed, even though I believe about 1.3 twelve seeds advance on average. Just some food for thought on how to apply this model to a large pool.

    • Well, overall there is a 36-40% chance of any given 12 seed winning. Since there are four 12 seeds in the tournament, that means that one or two upsets should occur; we just don’t know which ones they’ll be. And the model rates them all as about equally likely to occur. We know that not every one of them will pull the upset, and that not all of them will lose, but the safest prediction is to go with the favorites and hope a 12 seed doesn’t go on a Cinderella run.

      • Well “safest” depends on the size of the pool and the depth of the payoffs. At some point to win or rank high in a large pool you need to pick some upsets. The survival analysis seems like a handy way to do this. Nice work by John on this the last few years.

        • I think in general, the more people in the pool, the further from “consensus” you need to go with your picks. Of course, knowing your competitors is important. A lot of homers picking North Carolina, Duke and Ohio State leaves the door open for a nice, risk-averse bracket.

          • Also, why is Bucknell favored to beat Butler in this article but only given a 37% chance in the other article (titled Predicting the Madness: 2013 Upset Edition)?

  • I think it’s pretty intuitive that, in a big pool with equal total points per round, you should not pick any of the major favorites as your tournament winner. Too much competition even if you’re right. But in the early rounds, it seems like you don’t want to go chasing too many random upsets unless there is an upset bonus, or an obvious choice (like Minnesota over UCLA).

    At what point in the tournament should you go from risk-averse to risky? Or are there flaws in my thinking?

  • Are hazard rates going to be posted, or should we just use the predicted rates to determine the hazard rates?

  • How do the percentages for the final four teams/championship compare this year to last year? Presumably Kentucky was a more clear-cut favorite, but is the model more or less “confident” this year?

  • I’m in a pool where you multiply seed by round value (1,2,4,8,16,32). My initial inclination is to do an expected value type of selection. So since Davidson has a 42.7% chance of winning and they are a 14 seed, their expected value is 5.978 compared to Marquette’s expected value of 1.719 (3 seed x 57.3%). Any thoughts as to whether there are flaws in this thinking? There are about 200 people in my pool so you will need to hit some of the upsets in order to win.

    • Not a bad play to a certain extent, just don’t put Davidson in your final 4 even though I am sure 3.3% times seed value times 8 is more expected points than any other team in the bracket. Essentially you have to optimize between highest expected value and lowest variance (unless you want a high variance and hope to get lucky).

  • It appears this model differs from other projections in the early rounds, especially with the likelihood of 1 or 2 seeds losing. Given that a #1 has never lost in the opening round, it seems odd that this model predicts that there is a better than 1/3 chance that one of the four will not make the Round of 32. How good have your aggregate numbers been over the past 7 years for 13-16 seeds? 28 games is still a rather small sample per seed, but it would be instructive if the mean probability for a 15 seed advancing is 20%, but only 12% have.

  • It’s only Thursday evening, and no thanks to you my brackets are busted. If this is your work on predicting who will survive and advance, I suggest you change fields, because your results are less than stellar. Your best double-digit school to advance, Bucknell, is out. So is your second best, Davidson. However, double-digit seeds California and Oregon advance. On the games with close seeds, where you would think your results would be most useful, such as 8 Pittsburgh vs. 9 Wichita St., your work bombed as well. This was useless across the board for someone who is attempting to win a March Madness pool or predict who will survive and advance. Your work empirically determined nothing. Your professor might give it an A though; after all, it’s Harvard.

    • Definitely a smart way to judge something is after 12 games (and factoring out the 1 and 2 seed games more like 8) when it has had excellent results in prior years.

    • The only person not getting an “A” around here is you, Mr. Paul McGinnis. If I told you that there was an 83.3% probability of rolling a die and not getting a 6, and then I rolled a 6, would that discredit my probabilistic claim? Of course not. John didn’t say that Pitt would win. He didn’t say that Bucknell would win. He said they might win, with 66% and 52% chances, respectively.

      It should be noted that Nate Silver, Jeff Sagarin and Ken Pomeroy all agreed on Pitt, and agreed that Bucknell/Butler was a tossup.

      Please understand what probability means before taking a rip at someone that is quite obviously smarter than you.

  • The bottom line here is I used this article as a guide, and now my brackets are ruined. I didn’t read this to learn that Louisville has a really good chance of going to Atlanta. Your predictions, which I used, are making mayhem of my brackets. I am not ripping anyone; I am learning that probabilities are virtually useless in a 64-68 team tournament.

  • You suggested that John change fields. I’m pretty sure that’s ripping on someone.

    Feel free to go use your gut instinct. That should be more useful.

    • You said “If this is your work on predicting who will survive and advance, I suggest you change fields, because your results are less than stellar.” This statement seems to be ripping John, although perhaps it was unintentional. John Ezekowitz has been working for the Suns since sophomore year of college and has an offer from Bain Capital according to a Harvard Crimson article, so I think the field is suiting him well.

      In regards to “I am learning that probabilities are virtually useless in a 64-68 team tournament”: probabilities aren’t useless, they just don’t pan out every time. If I told you that a coin had a 60% chance of landing on heads, so you bet heads straight up and the coin landed on tails, was knowing the probability useless? You need to stop with results-oriented thinking and analyze this from a process-oriented mindset. If you want to say that the process was incorrect, that’s one thing, and if you want to say that a large sample of results shows that the process is flawed, that’s also fine, but if you want to argue that one day of March Madness invalidates a whole process, then I just can’t see the logic in that.

      Also Paul, in the future I recommend you go with your gut; even if you found a perfect predictor, it’s more fun doing it on your own.
      Cheers
      Zachary

  • Nice picks egghead. Did your inhaler run out of puffs when you were calculating who would win basketball games being played by people infinitely more talented than you?

  • Mr. Ezekowitz, it does not look like you are going to approach your past pick percentage of 73%. Before Sunday’s games you are at 65%.
