19th-century mathematician and genius Francis Galton, the man who invented the regression line (for which we at HSAC are forever indebted), once found himself at a country fair with a peculiar contest: who could guess the exact weight of a slaughtered and dressed ox? While no individual correctly guessed the exact weight, 1198 pounds, Galton noticed that the average of the roughly 800 guesses was 1197 pounds, essentially perfect. This observation of the so-called “wisdom of the crowd” has been expanded upon in the last century with the creation of prediction markets. These markets (Intrade, for example) tend to be more accurate than any particular expert could hope to be over large sample sizes.
One particular set of markets of interest to sports fans and bettors is the sports betting markets of Las Vegas. Sports betting has become a multibillion-dollar industry, a way for shrewd bettors to make a living and squares to lose their money. The spreads and odds set by a combination of Vegas oddsmakers and gamblers are, in large samples, an accurate reflection of team strength and ability.
I think that most people would assume that the accuracy of the Vegas market increases as the season goes on. It makes intuitive sense that with more information about teams, the oddsmakers and bettors should set lines that are closer to the actual outcome of the game. But is this perception grounded in reality? Professional bettors may know better. To test this, I used a dataset of over 30,000 closing lines from college basketball games over the period of 1997 to 2011. I’d like to thank Mike James for doing the heavy data collection lifting.
I numbered each game in every team’s season, using only those team-seasons for which I had at least 20 game lines. I wanted to analyze the actual results of the games compared to their lines for each of the game numbers (i.e., the average deviation from the line for the first game of each team’s season, the second game, etc.). I tried to filter out teams that had played multiple games before their first game with a Vegas line.
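For readers who want to reproduce the numbering step, here is a minimal sketch of how it could be done. The record layout (team, season, date, closing line, actual margin) and the function name are my assumptions for illustration, not the format of the dataset used here, and the line is expressed as the expected margin for the team so that the miss is simply margin minus line.

```python
from collections import defaultdict

def number_games(records, min_games=20):
    """Group line misses by game number within each team-season.

    records: iterable of (team, season, date, closing_line, actual_margin),
    a hypothetical layout. closing_line is the expected margin for `team`.
    Returns {game_number: [actual_margin - closing_line, ...]}.
    """
    by_team_season = defaultdict(list)
    for team, season, date, line, margin in records:
        by_team_season[(team, season)].append((date, line, margin))

    misses = defaultdict(list)
    for games in by_team_season.values():
        if len(games) < min_games:
            continue  # drop team-seasons with too few lined games
        games.sort(key=lambda g: g[0])  # ISO date strings sort chronologically
        for number, (_, line, margin) in enumerate(games, start=1):
            misses[number].append(margin - line)
    return dict(misses)
```

Note that this numbers only the games that have a line, so it does not by itself catch teams that played unlined games before their first lined one.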
You may find the results surprising. We often use the standard deviation of a set of game results from their lines to assess the accuracy of the lines. If the betting lines were getting more accurate with more information, one would expect that standard deviation to get smaller as the season went along. This would reflect fewer games that widely varied from the betting expectation. The SD over the course of the season is charted below:
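As a rough sketch of the calculation behind that series, the function below takes line misses grouped by game number and returns one SD per game number. I am assuming here that "SD from the line" means the root-mean-square miss, which treats the line itself as the expected margin; the input dictionary and the numbers in the example are made up for illustration.

```python
import math

def sd_from_line(misses_by_game):
    """Root-mean-square line miss for each game number.

    misses_by_game: {game_number: [actual_margin - closing_line, ...]}
    (hypothetical structure). Deviations are measured around the line,
    not around the average miss.
    """
    return {
        number: math.sqrt(sum(m * m for m in misses) / len(misses))
        for number, misses in sorted(misses_by_game.items())
    }

# Illustrative numbers only, not real results.
example = {1: [3.0, -4.0], 2: [0.0, 10.0, -10.0]}
series = sd_from_line(example)
```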
While there is some trend lower towards the middle of the season, the series stays remarkably constant. The graph also makes the trend look more significant than it might be. Using a Dickey-Fuller test, a concept borrowed from time-series forecasting, I tested whether the progression of SD over the course of the season exhibited stationarity. Stationarity means that the process reverts to some mean level: unusually large observations tend to be followed by smaller ones, and vice versa. In this case, it would mean that regardless of when the game is in the season, we would expect the accuracy to be around some mean standard deviation.
The Dickey-Fuller test for this series was significant at the 5% level (t-stat of -2.997), supporting the alternative hypothesis of stationarity. This means that at any given point in the season, the accuracy of the closing Vegas line, measured by the standard deviation of the results from the lines, is relatively constant. While there appears to be some increased variation at the beginning of the season, it is not different enough from the rest of the season to show that the sports betting markets are learning more about the teams.
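To make the test concrete, here is a minimal sketch of the simplest Dickey-Fuller regression, which regresses the change in the series on a constant and the previous level and reports the t-statistic on the level term. The exact specification used above is not spelled out, so this version (no lagged differences, no trend term) is an assumption, and the sample series is invented. For this constant-only form the large-sample 5% critical value is about -2.86; short series call for somewhat more negative values.

```python
import math

def dickey_fuller_t(series):
    """t-statistic on gamma in: dy_t = alpha + gamma * y_{t-1} + e_t.

    A strongly negative value is evidence against a unit root,
    i.e. evidence that the series is mean-reverting.
    """
    x = series[:-1]                                   # lagged levels y_{t-1}
    dy = [b - a for a, b in zip(series[:-1], series[1:])]  # first differences
    n = len(x)
    mean_x = sum(x) / n
    mean_dy = sum(dy) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (di - mean_dy) for xi, di in zip(x, dy))
    gamma = sxy / sxx
    alpha = mean_dy - gamma * mean_x
    residuals = [di - alpha - gamma * xi for xi, di in zip(x, dy)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)  # residual variance
    return gamma / math.sqrt(sigma2 / sxx)            # gamma / standard error

# Invented series that bounces around 10, so it should look stationary.
sd_series = [10.5, 9.6, 10.3, 9.8, 10.4, 9.7, 10.2, 9.9, 10.6, 9.5]
t_stat = dickey_fuller_t(sd_series)
```

A statistics package with an augmented Dickey-Fuller routine would also handle lag selection and small-sample critical values, which this sketch leaves out.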
But perhaps by looking at all teams, I’m missing a subset that has increased variability. Perhaps the market isn’t as good at predicting the relative strength of teams with fewer returning players, and thus more unknowns. To look at this aspect, I collected returning minutes data from 2008 to 2011, and looked for correlations between returning minutes and the line miss for early season games. The correlation between returning minutes and line miss in each of the first 10 games of the season was never stronger than -0.14, which is very weak (a correlation of that size explains about 2% of the variance). It seems that the prediction markets are equally good at assessing teams with lots of returners and teams with few.
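The correlation check can be sketched as below for a single game number. I am assuming that "line miss" means the absolute size of the miss and that returning minutes is expressed as a share between 0 and 1; both the helper name and the sample values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented values for one game number: share of minutes returning,
# and the absolute line miss in that team's game.
returning_share = [0.85, 0.40, 0.62, 0.73, 0.55]
abs_line_miss = [6.0, 9.5, 3.0, 11.0, 7.5]
r = pearson(returning_share, abs_line_miss)
```

Repeating this for each of the first 10 game numbers gives one coefficient per game, and the squared coefficient is the share of variance in the miss that returning minutes would explain.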
What is there to conclude from this exercise? While there is more work to be done, it seems that there is some good evidence that even at the beginning of the college basketball season, Las Vegas sports betting markets are close to as accurate as they are all season. Of course, I am talking about large sample averages: there are obviously individual teams whose early season lines are out of whack and that gamblers learn more about as the season goes on. But it seems to me that lines, like the stock market, are a mean-reverting process.
Vegas does not appear to learn much over the course of the season, but I think that is to the market’s credit. There is an inherent amount of randomness in the game of basketball. The betting market seems to be accurate, around that level of randomness, even at the beginning of the season, when one might expect it to be less accurate. This is yet another example of the power of prediction markets and the wisdom of crowds.
I think you’re leaving out a very important point. Remember the goal of Vegas is NOT to set a line that is expected to match the outcome. In fact, Vegas doesn’t care how close their line is to the actual outcome of the game. Their goal is simply to set a line that gets 50% of the wagers on both sides of the game.
Thanks for the comment. I agree with your assessment of how Vegas sets the lines, but I’d make two points: I’m using the closing lines, which should be an accurate reflection of the relative strength of the teams. I believe that if an oddsmaker sets a line that is solely to get 50-50 action and in no way reflects the true strength of the teams, smart money will come in on the difference and the closing line will be much closer to the truth.
The other point is that sports bettors have become extremely sophisticated in the last few years. A sportsbook that set bad opening lines would fairly quickly be hammered by good bettors who arbitrage the advantages.
I think you need to consider “Vegas” and the betting market as two separate entities with the former being solely responsible for the opener and the latter driving the closers from which you are basing your conclusions above. To find out if Vegas is learning as the season goes on from the market, can’t you just chart the lineset SD of the opener to the close on a per week basis? The early season weakness that gets routinely referenced would be that wider exploitable window.
Yes, that makes sense. I wonder if line moves are significantly bigger in the early season. The early opener weakness hypothesis, if true, could be caused by bigger line moves, or by a higher percentage of the line moves moving closer to the actual outcome.
The problem is collecting opening lines. Any suggestions on a data source?
SBR keeps track of openers; an example is here: http://www.sbrforum.com/ncaa-basketball/odds-scores/20111114/ . The odds history is more detailed.
However, I do not know a convenient way to get these in a form where they could be used. If anyone does, that would be wonderful.
This page has NCAA hoops data from 2007 to present. Each season opens as a comma delimited file with the opening line labeled “lineopen”. http://www.thepredictiontracker.com/basketball.php
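With openers in hand, the two explanations raised above (bigger early-season line moves versus a larger share of moves toward the actual result) could be separated with something like the sketch below. It takes plain (opening line, closing line, actual margin) tuples, all expressed as the expected margin for the same team; how those get loaded from a file, and every name in the code, is an assumption for illustration.

```python
def line_move_summary(games):
    """Summarize opener-to-closer moves for a group of games.

    games: iterable of (opening_line, closing_line, actual_margin),
    a hypothetical layout with all three values for the same team.
    Returns (average absolute move, share of moves that ended up
    closer to the actual margin), counting only games whose line moved.
    """
    moved = [(o, c, m) for o, c, m in games if c != o]
    if not moved:
        return 0.0, None  # nothing moved, so no share can be computed
    mean_move = sum(abs(c - o) for o, c, _ in moved) / len(moved)
    toward = sum(1 for o, c, m in moved if abs(m - c) < abs(m - o))
    return mean_move, toward / len(moved)

# Invented games: one move toward the result, one away, one unchanged line.
week_one = [(-3.0, -5.0, -10.0), (2.0, 1.0, 6.0), (4.0, 4.0, 0.0)]
summary = line_move_summary(week_one)
```

Running this separately for each week of the season would show whether the early weeks stand out on either measure.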
Neither Vegas nor the offshore books set lines to get 50-50 action. That is a fallacy. They are risk takers.
Mike is correct that Vegas doesn’t set the line to get 50/50 action. Vegas is instead seeking to set the line to maximize the number of bets, as the vig guarantees them a small profit from each bet placed. Vegas doesn’t care about individual bets being 50/50.
You can get data on line moves. Many newspapers (including USA Today) print the line every day. I’m not sure how many days in advance the books put out a line on college hoops, but the papers do publish it. Newspapers’ lines the day of the game may not be reliable as the final line since lines can change during the day and evening. The closer to game time, the heavier the action.
Line moves are tricky. Las Vegas books move the line in reaction to action. Sometimes that action is from the public: a large number of moderate bets. Sometimes that action is from the smart money: a few large bets by bettors whose views the books respect. You can’t know for sure whether a line move represents the “wisdom” of crowds or the smart money, though I would guess that early line moves are a reaction to smart money. Also, in some games (e.g., NFL) books will adjust the vig rather than the line. That would be really hard to track for old games.
As for James’s assertion that “vig guarantees them a small profit from each bet placed”: I don’t think so. Books collect vig only on losing bets. So generally the books do want a balanced action, but if they do have to take a position, they want the heavier side to be the one they expect not to cover.
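A quick worked example may help with the vig question. Assuming the common pricing where a bettor risks 110 to win 100 on either side (that price is my assumption, not something stated in this thread), the book's result depends on how the money is split and on which side covers:

```python
def book_profit(stake_a, stake_b, winner, price=110):
    """Book's net result when bettors risk `price` to win 100 on either side.

    stake_a, stake_b: total amounts risked on each side.
    winner: "a" or "b", the side that covers.
    The book keeps the losing side's stakes and pays winners 100 per `price` risked.
    """
    if winner == "a":
        return stake_b - stake_a * 100 / price
    return stake_a - stake_b * 100 / price

# Balanced action: the book nets the same amount whichever side covers.
balanced_a = book_profit(1100, 1100, "a")
balanced_b = book_profit(1100, 1100, "b")

# Lopsided action: the book now has a position on the game.
heavy_side_covers = book_profit(2200, 1100, "a")
heavy_side_fails = book_profit(2200, 1100, "b")
```

Under these numbers the balanced book makes 100 either way, while the lopsided book loses 900 if the heavy side covers and makes 1200 if it does not, which is why a book holding a position wants the heavier side to fail.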
Non-Conference/Conference splits would be interesting. I would guess that the lowest ‘trough’ in that graph is somewhere near beginning of the conference schedule.
Wouldn’t looking at how much a line moves from open to close better reflect the accuracy of the line setting?