By Kurt Bullard
We are less than 24 hours away from the greatest four-day sporting spectacle known to man. While the games themselves are non-stop, exciting action, the hook that brings in the masses is obviously bracket pools. Everyone looks for any type of edge to take home bragging rights for the next year, from hunting for dark horses who could make a Final Four run to picking the right first-round upsets. In years past, knowing about the quantitative ranking sites run by Ken Pomeroy and Kevin Pauga was an advantage, one that most people who filled out brackets were unaware of. Now, KenPom is part of the common vernacular, thrown about by Jay Bilas and others, and it is simply no longer an edge. Far fewer people will be shocked if a team like Wichita State, ranked eighth by KenPom, makes a deep run as a No. 10 seed.
So, I set out to create a new ranking system, inspired by past HSAC posts. The basic idea is a modified Colley Matrix that ranks teams based on weighted win probability rather than raw wins and losses. To back up a little: most ranking systems assess team strength based on wins against one another. But wins alone don't capture the entire picture. Some teams win by 50, while others squeak out a one-point win in double overtime.
The next logical step is point differential, but even that has issues. A four-point win in which a team led by double digits the entire game and then put in its subs is not the same as a four-point win in which a trailing team staged a late comeback and hit a few free throws to ice it. A good example is the Texas A&M vs. Northern Iowa game from last year's March Madness, in which UNI dominated for most of the game, only to play a really, really awful final 44 seconds and choke away a 12-point lead. UNI was demonstrably the better team in that game, but point differential and wins favored Texas A&M. Looking at win probability averaged throughout the game gives a better picture. If you think of a basketball game as a random walk (an oversimplification), win probability lets you evaluate the entire walk, while point differential and wins only tell you where the walk ended up.
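To make the random-walk intuition concrete, here's a toy sketch in Python. All of the numbers are hypothetical: two wins with the same final margin can have very different averaged win probabilities.

```python
import numpy as np

# Two hypothetical games with the same final margin for the winner.
# Game A: the winner led comfortably all game, then emptied the bench.
# Game B: the winner trailed most of the way and stole it late.
wp_game_a = np.array([0.55, 0.70, 0.85, 0.90, 0.88, 0.75, 1.00])
wp_game_b = np.array([0.45, 0.35, 0.25, 0.20, 0.30, 0.60, 1.00])

# A win/loss column treats both games identically; averaged win
# probability separates the wire-to-wire win from the late steal.
print(np.mean(wp_game_a))   # clearly above 0.5
print(np.mean(wp_game_b))   # below 0.5 despite the win
```

Both teams go 1-0 in a standings table, but the averaged path tells you which win was actually convincing.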
So, here's the process. I scraped play-by-play data from ESPN and built a win probability model on it: for each scoring margin (capped at -20 and +20), I ran a logistic regression of the eventual outcome against time left in the game, yielding a win probability curve per margin. I then assigned a win probability estimate to each play of the 2016-17 season. From there, I calculated a weighted average of win probability throughout each game, using a quadratic weighting function over regulation and ignoring overtime entirely. Finally, with this weighted average for each game as the response, I ran the regression, with indicators for the teams in the game and those teams' conferences.
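The model-fitting step can be sketched roughly like this. This is a hedged reconstruction, not the actual code: the synthetic data, the learning rate, and the exact shape of the quadratic weight (here, elapsed-time fraction squared, which counts late-game states more heavily) are all assumptions on my part.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_logistic(x, y, lr=0.5, steps=5000):
    """Minimal one-feature logistic regression fit by gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w * x + b)))
        w -= lr * np.mean((p - y) * x)
        b -= lr * np.mean(p - y)
    return w, b

# Synthetic play-by-play for a single margin (+5): leads hold more
# often as less time remains (numbers invented for illustration).
secs_left = rng.uniform(0, 2400, 500)      # 40 minutes = 2400 seconds
frac_left = secs_left / 2400.0             # scale feature to [0, 1]
p_true = 1.0 / (1.0 + np.exp(-(2.0 - 2.0 * frac_left)))
won = (rng.uniform(size=500) < p_true).astype(float)
w, b = fit_logistic(frac_left, won)        # fitted curve for margin +5

# Weighted average of win probability over regulation: a quadratic
# weight on elapsed-time fraction, so later game states count more.
def weighted_wp(wp, elapsed_frac):
    weights = elapsed_frac ** 2
    return np.sum(weights * wp) / np.sum(weights)
```

In the fitted curve, `w` comes out negative: the more time left on the clock, the less secure a five-point lead is, which is the shape you'd expect.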
This type of regression causes some issues in comparing teams across conferences, two of which I'll address. First, a conference coefficient may not be a good predictor of every member's success. For example, while Gonzaga and Saint Mary's are in the West Coast Conference, they aren't in the same class as San Francisco, Portland, and Pacific. So, strong teams in bad conferences may be underrated, dragged down by their conference's coefficient. In addition, some conferences have very little overlap with others, which makes the error on team coefficients larger than desired. I'm still working on this, and I didn't want to make comparisons that were shoddy in quality. So, for the purposes of this article, I'm only going to compare teams within each conference, and then make inferences about teams that may be overrated or underrated.
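The rating regression might look something like the following sketch, with made-up teams and values. A ridge penalty is one way (my assumption, not necessarily what the post used) to keep the coefficients identifiable; note how the conference columns cancel for in-conference games, so the conference effects are pinned down only by the rarer cross-conference matchups, which is exactly the overlap problem described above.

```python
import numpy as np

# Hypothetical league: two teams per conference, weighted win
# probabilities invented for illustration.
teams = ["A", "B", "C", "D"]
confs = {"A": "East", "B": "East", "C": "West", "D": "West"}
conf_list = sorted(set(confs.values()))

# games: (team_i, team_j, weighted-average win probability for team_i)
games = [("A", "B", 0.70), ("A", "C", 0.65), ("B", "D", 0.65),
         ("C", "D", 0.60), ("A", "D", 0.85), ("B", "C", 0.50)]

n_t, n_c = len(teams), len(conf_list)
X = np.zeros((len(games), n_t + n_c))
y = np.zeros(len(games))
for row, (ti, tj, wp) in enumerate(games):
    X[row, teams.index(ti)] = 1.0     # team_i's strength helps
    X[row, teams.index(tj)] = -1.0    # team_j's strength hurts
    X[row, n_t + conf_list.index(confs[ti])] += 1.0
    X[row, n_t + conf_list.index(confs[tj])] -= 1.0
    y[row] = wp - 0.5                 # center: 0.5 means an even game

# Ridge-regularized least squares; the penalty resolves the otherwise
# unidentified baseline shared by team and conference coefficients.
lam = 0.1
beta = np.linalg.solve(X.T @ X + lam * np.eye(n_t + n_c), X.T @ y)
ratings = dict(zip(teams, beta[:n_t]))
```

With only four cross-conference games feeding the conference coefficients, their standard errors would be large in practice, which is why the post sticks to within-conference comparisons.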
However, before I delve into the individual conference breakdowns, I will show the conference coefficients. This is the coefficient assigned to each of the Power 6 conferences.
My model loves the ACC and hates the Pac-12. Despite having lived in Massachusetts all of my life, there's no way I could have coded in any East Coast bias. It speaks to the weakness of the conference beyond its top trio of Oregon, Arizona, and UCLA.
Now, I'll break down each conference. The magnitudes of these strength values do not translate across conferences; they are purely for ranking teams within each respective conference.
ACC (Ranking: 8.73)
This analysis actually has Virginia as the best team in the ACC. That may be partly because this model weights all games, November through March, equally. As a result, it heavily penalizes Duke for whatever happened to that program in January and early February (it had to be more than just Coach K's absence), while it rewards Virginia's early crusade through the ACC. There seems to be a top tier of six teams, with a slight dropoff from FSU to Miami, and then a steeper drop-off to the rest. The Cardiac 'Cuse, who seemed to come back from double digits in every game, are heavily penalized in this model, as their win probability throughout games was very low; they never ran away with an ACC game until the season finale. The main takeaways: trust UVA more than its seed suggests, and be ever-so-cautious about UNC. And BC sucks.
Big 12 (Ranking: 5.29)
Just like KenPom, this model has WVU as the best team in the Big 12. There's a clear top two, then a pretty muddled middle tier of K-State, OSU, ISU, Baylor, and Texas Tech. KSU and Baylor stand out the most here. The Wildcats are third in the conference by this measure despite being seeded No. 11 in the tournament, while Baylor is sixth. I'm generally skeptical of Scott Drew and Baylor, so I'm not too surprised they're lower here than in other systems, but I am a bit shocked they're this low.
Big Ten (Ranking: 5.28)
I don't see a lot of surprises here. Maybe the committee would be surprised to see Wisconsin so high, but any sane college basketball fan would have no problem with the Badgers being second in the conference. The model really likes Michigan and is really lukewarm on Maryland, as every quantitative ranking system seems to be (sorry, SVP).
Big East (Ranking: 3.86)
There also aren’t a lot of surprises here. It turns out Villanova is very good at basketball, and Creighton and Marquette are pretty good at basketball. It’s interesting that Butler is so low, considering they’re a No. 4 seed. KenPom has Marquette barely trailing Butler and Creighton, however, so this model is only slightly more down on Butler than KenPom is.
SEC (Ranking: 3.84)
Kentucky and Florida are the cream of the crop in this method, with significant space between them and South Carolina. Vanderbilt is closer to Alabama than it is to South Carolina, so the Commodores may not have deserved a No. 9 seed. (They had 15 losses, so that's not too surprising a finding.) Arkansas, however, ranks behind Alabama, Auburn, and A&M. The Razorbacks look like a very overrated team despite reaching the SEC final; they simply didn't have many convincing wins over the course of the season.
And finally, the Pac-12. There's a clear top three in the conference, but beyond that, it is pretty barren. Such a low conference coefficient should strike fear in anyone who has Arizona, UCLA, or Oregon doing damage in the tournament. Arizona's best non-conference win is a two-point win over Michigan State (which, looking above, is pretty, pretty bad this year). Oregon did not beat a non-conference tournament team. UCLA is a little more proven, with wins over Michigan and Kentucky. Nonetheless, the Pac-12 looks pretty bad this year. I would hesitate to send any of these teams past the Sweet Sixteen, let alone the Elite Eight.
So there is your major conference review. This model is by no means gospel, but before you send UNC-Wilmington past Virginia, or Arizona or UCLA to the Final Four, you may want to do some more research.