By John Ezekowitz
On Tuesday, I looked at predicting upsets in the NCAA Tournament. Today I’m looking at which favorites are vulnerable to upsets. Again, I created a database of NCAA Tournament favorites’ tempo-free stat profiles using KenPom.com’s invaluable data. My dataset covers the last 6 NCAA Tournaments (compared to ESPN.com’s analysis, which only uses 4 years). I used logistic regressions to see which stats were statistically significant. You still have time to fill out your brackets, and I hope my conclusions can help you fill out a better one.
I found that three stats were statistically significant in predicting which favored teams were vulnerable to upsets: turnover percentage (p<0.001), offensive rebounding percentage (p=0.042), and 3-point field goal defense (p=0.017). An increase of one percentage point in turnover percentage increases the odds of being upset by 42 percent. A decrease of one percentage point in offensive rebounding percentage increases the odds of being upset by 13 percent. Finally, a team that allows one percentage point better 3-point shooting has 28 percent higher odds of being upset.
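For readers who want to check the arithmetic, here is a minimal sketch of how per-point effects like the three quoted above behave if they are read as logistic-regression odds ratios (1.42, 1.13, and 1.28). That reading is an assumption; the function names and the 20 percent baseline upset probability in the usage note are hypothetical and do not come from the regression itself.

```python
import math

# Odds ratios per one-point change, read off the percentages quoted above.
# Treating those percentages as odds ratios is an assumption of this sketch.
ODDS_RATIOS = {
    "turnover_pct": 1.42,      # +1 point of turnover percentage
    "off_reb_pct_drop": 1.13,  # -1 point of offensive rebounding percentage
    "opp_3pt_pct": 1.28,       # +1 point of opponent 3-point percentage
}

def log_odds_coefficient(odds_ratio):
    """Logistic-regression coefficient implied by an odds ratio."""
    return math.log(odds_ratio)

def compounded_odds_ratio(odds_ratio, points):
    """Odds ratio for a change of several points; effects multiply, not add."""
    return odds_ratio ** points

def shift_probability(p, odds_ratio):
    """Apply an odds ratio to a starting probability p (0 < p < 1)."""
    odds = p / (1 - p)
    new_odds = odds * odds_ratio
    return new_odds / (1 + new_odds)
```

As a usage example, `compounded_odds_ratio(1.42, 2)` shows that two extra points of turnover percentage roughly double the odds, and `shift_probability(0.20, 1.42)` shows that a 42 percent increase in odds moves a hypothetical 20 percent upset probability to about 26 percent, not to 28.4 percent.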
These results make intuitive sense; turnover percentage appears to be the best predictive factor of tournament success for both favorites and underdogs. The other two stats are crucial in upset prevention. Teams that don’t get offensive rebounds give underdogs more effective scoring chances, and the 3-pointer is the sort of risky strategy that underdogs can thrive with.
So which teams in the 2010 field are vulnerable to upsets? The first one that jumps out is 4-seeded Vanderbilt. Vandy turns the ball over 19% of the time, rebounds only 32% of its own misses, and is only average at defending the three. According to this formula, the Commodores have a relative risk of being upset of 35% compared to an average favorite. Another, perhaps more surprising, team is Butler, which also turns the ball over 19% of the time and has an offensive rebounding rate of 31.5%. Texas A&M also crashes the offensive glass at only a 34% clip and gives up 34% shooting from beyond the arc. The Aggies’ relative risk of being upset is 24%.
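One way to picture how a team's profile could be turned into a single number relative to an average favorite is to multiply the per-point odds ratios over the team's gaps from a baseline. The sketch below is only an illustration of that idea: the function name and every baseline value are hypothetical placeholders, the result is a ratio of odds rather than the relative-risk figures quoted in this post, and it is not meant to reproduce them.

```python
# Per-point odds ratios from the post, with the direction in which each
# stat hurts a favorite (+1: higher is worse, -1: lower is worse).
EFFECTS = {
    "turnover_pct": (1.42, +1),
    "off_reb_pct": (1.13, -1),
    "opp_3pt_pct": (1.28, +1),
}

def relative_odds(team, baseline, effects=EFFECTS):
    """Odds of being upset relative to a baseline favorite (1.0 = same odds)."""
    rel = 1.0
    for stat, (ratio, direction) in effects.items():
        # Gap in percentage points, signed so that positive means "worse".
        gap = (team[stat] - baseline[stat]) * direction
        rel *= ratio ** gap
    return rel

# Hypothetical baseline favorite: placeholder numbers, NOT taken from the data set.
baseline = {"turnover_pct": 18.0, "off_reb_pct": 33.0, "opp_3pt_pct": 33.0}

# A made-up favorite that is one point worse than the baseline on turnovers
# and one point worse on the offensive glass.
team = {"turnover_pct": 19.0, "off_reb_pct": 32.0, "opp_3pt_pct": 33.0}

print(round(relative_odds(team, baseline), 2))
```

Because the effects multiply, a team that is slightly worse than the baseline in two categories at once ends up with noticeably higher relative odds than either gap alone would suggest.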
Among the 1 and 2 seeds, the team that appears most vulnerable to a second-round upset is Ohio State. The Buckeyes are a very poor offensive rebounding team and allow opponents to shoot better than 34% from beyond the arc. A matchup with sharpshooting (and surprisingly good rebounding) Oklahoma State could produce an upset. The Buckeyes have a 25% relative risk of being upset. None of the 1 seeds appear particularly vulnerable, but Kentucky does turn the ball over on more than 20% of its possessions.
Finally, I have been pleasantly surprised by the interest and comments on my first post. I am planning a follow-up post after this tournament to analyze the magnitude of upsets and see how that affects this data set. I agree that there is a correlation/causation issue, as there almost always is in an observational look like this one. However, when we run the regressions for favorites and underdogs together, the results are the same, and are even magnified.
I’m a fan of more descriptives (if you don’t mind). For instance, I’m assuming this is a logistic regression, so could you post the variables you used and the odds ratios rather than just the p-values? I feel like I could evaluate it a little more easily.
Also, what was the pseudo R2 for the analysis? Are you actually accounting for very much or not? I imagine it is probably pretty low. I liked your previous post on the averages but again, in that post I would have liked to see some SDs.
Not complaining, I like your analyses. Stellar stuff. Keep up the good work, I’ll keep reading.
And Vanderbilt goes down…
Could we have the data set?
DSMok1:
I intend to post the dataset as soon as I have finished inputting this year’s data.