By Daniel Silberwasser
With legal sports betting becoming a possibility in the U.S., it might be in our financial interest to investigate the track records of various U.K. soccer bookmakers. Using bookmaker data on the English Premier League from the past eleven seasons, available online at football-data.co.uk (over 4,000 matches in total), we can calculate the accuracy of five bookmakers: William Hill, Bet365, Ladbrokes, Interwetten, and Bet&Win. To do so, we convert the pre-game odds set by these five bookmakers into implied probabilities of a home win, draw, and away win, adjusting for the vig, and then calculate the scaled Brier score for each bookmaker. A scaled Brier score penalizes a vector of predictions by their mean squared error and is interpreted much like the R² of a regression: the higher the scaled Brier score, the more accurate the pre-game odds, with a score of 1 indicating perfect accuracy with 100% conviction. Because the betting market is famously competitive, we shouldn’t be surprised to find that over three years, no bookmaker has performed significantly better than any other.
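A minimal sketch of the vig adjustment, assuming decimal odds and the simple proportional method (divide each raw implied probability 1/odds by their total). The post doesn’t say which adjustment was used, and other methods exist; the function name `implied_probabilities` and the sample odds are hypothetical.

```python
from typing import Sequence


def implied_probabilities(decimal_odds: Sequence[float]) -> list[float]:
    """Turn decimal odds for (home, draw, away) into probabilities summing to 1.

    Assumption: the vig is removed proportionally, i.e. each raw implied
    probability 1/odds is divided by their total (the overround).
    """
    if any(odds <= 1.0 for odds in decimal_odds):
        raise ValueError("decimal odds must be greater than 1")
    raw = [1.0 / odds for odds in decimal_odds]
    overround = sum(raw)  # exceeds 1 by the bookmaker's built-in margin
    return [p / overround for p in raw]


if __name__ == "__main__":
    # Hypothetical odds: home 2.00, draw 3.50, away 4.00.
    probs = implied_probabilities([2.0, 3.5, 4.0])
    print([round(p, 3) for p in probs])  # [0.483, 0.276, 0.241]
```

In this made-up example the raw probabilities sum to about 1.036, so roughly 3.6% of the book is the bookmaker’s margin before normalization.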
What should catch our interest, however, is that these bookmakers don’t do equally well at setting odds for home wins, away wins, and draws. As the graph above shows, every bookmaker has a lower scaled Brier score for draws than for home or away wins, implying that they do significantly worse at setting the odds for draws than for home and away wins.
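To score each outcome separately, one can treat home win, draw, and away win as three yes/no forecasts. The sketch below assumes the common definition of the scaled Brier score, 1 - Brier / (base_rate * (1 - base_rate)), where the denominator is the Brier score of always forecasting the observed frequency of that outcome, and assumes results are coded "H", "D", "A". Both function names are hypothetical.

```python
from typing import Sequence


def scaled_brier(probs: Sequence[float], hits: Sequence[int]) -> float:
    """Scaled Brier score of binary forecasts (1.0 = perfect, 0.0 = base rate).

    Assumed definition: 1 - Brier / (base_rate * (1 - base_rate)), where the
    denominator is the Brier score of always forecasting the base rate.
    """
    n = len(probs)
    if n == 0 or n != len(hits):
        raise ValueError("need equally long, non-empty inputs")
    brier = sum((p - y) ** 2 for p, y in zip(probs, hits)) / n
    base_rate = sum(hits) / n
    reference = base_rate * (1 - base_rate)
    if reference == 0:
        raise ValueError("outcome never varies, so the score is undefined")
    return 1 - brier / reference


def scaled_brier_by_outcome(
    prob_triples: Sequence[Sequence[float]], results: Sequence[str]
) -> dict[str, float]:
    """Score home ('H'), draw ('D') and away ('A') probabilities separately."""
    scores = {}
    for column, label in enumerate(("H", "D", "A")):
        probs = [triple[column] for triple in prob_triples]
        hits = [1 if result == label else 0 for result in results]
        scores[label] = scaled_brier(probs, hits)
    return scores
```

A forecast that always states the base rate scores 0, so a low draw score means the draw probabilities add little over simply predicting the average draw frequency.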
Before looking at where bookmakers’ odds go wrong, we must understand the patterns in their probabilities. We can do so by visualizing their probability distributions. Because the probability densities look almost identical for every bookmaker, only William Hill’s is shown below.
As the probability distribution above shows, bookmakers assign higher win probabilities to the home team, which in the last ten seasons has won approximately 46% of the time, than to the away team, which has won approximately 28% of the time. The remaining 26% of games ended in draws, which is also reflected in the probability distribution. It is interesting to note that unlike the distribution for home team wins, which is roughly Gaussian, the away team win distribution is skewed right and the draw distribution is skewed left.
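The direction of skew can be checked numerically. This sketch computes the Fisher-Pearson moment coefficient of skewness: a positive value means a longer right tail (as described for away-win probabilities) and a negative value a longer left tail (as described for draws). The helper name and the sample probabilities are made up for illustration.

```python
from typing import Sequence


def skewness(values: Sequence[float]) -> float:
    """Fisher-Pearson moment coefficient of skewness, g1 = m3 / m2 ** 1.5.

    Positive means a longer right tail; negative means a longer left tail.
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("values are constant, so skewness is undefined")
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    # Made-up away-win probabilities: mostly low, with a few heavy favorites.
    away = [0.20, 0.22, 0.25, 0.25, 0.28, 0.30, 0.45, 0.60]
    print(skewness(away) > 0)  # True: long right tail
```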
We can now compare these pre-game probabilities with actual results. To do so, we cut each bookmaker’s probability vectors into bins and calculate the average probability estimate, or expected probability, for each bin. The bins for the home and away win probabilities are 0.05 wide; the draw distribution, because it has less variance, has been cut into bins 0.025 wide. We can then find the actual frequency of each outcome among the games in each bin. Because the bins at the high and low ends of each distribution contain only a few games and would skew the results, I’ve included only bins with more than 5 games. Below are the results for the bookmaker Ladbrokes. The graphs for the other bookmakers are shown at the bottom of this post and appear to follow the same patterns.
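The binning procedure can be sketched as follows, assuming per-game probabilities for one outcome and 0/1 indicators of whether that outcome happened. The bin widths (0.05, or 0.025 for draws) and the more-than-5-games cutoff follow the text; the function name and the sample data are hypothetical.

```python
import math
from collections import defaultdict
from typing import Sequence


def calibration_table(
    probs: Sequence[float],
    hits: Sequence[int],
    bin_width: float,
    min_games: int = 5,
) -> list[tuple[float, float, int]]:
    """Return (expected probability, observed frequency, games) per bin.

    Games are grouped into bins of `bin_width`; only bins with more than
    `min_games` games are kept. Probabilities lying exactly on a bin edge may
    land in either neighbouring bin because of floating-point rounding.
    """
    bins: dict[int, list[tuple[float, int]]] = defaultdict(list)
    for p, y in zip(probs, hits):
        bins[math.floor(p / bin_width)].append((p, y))

    rows = []
    for key in sorted(bins):
        games = bins[key]
        if len(games) <= min_games:
            continue  # too few games: the observed frequency would be noisy
        expected = sum(p for p, _ in games) / len(games)
        observed = sum(y for _, y in games) / len(games)
        rows.append((expected, observed, len(games)))
    return rows


if __name__ == "__main__":
    # Hypothetical draw probabilities and outcomes (1 = the game was drawn).
    draw_probs = [0.26, 0.27, 0.26, 0.27, 0.26, 0.27, 0.31]
    drawn = [0, 0, 0, 1, 0, 0, 1]
    for expected, observed, games in calibration_table(draw_probs, drawn, 0.025):
        print(f"{expected:.3f} predicted vs {observed:.3f} observed ({games} games)")
```

Each row is one point on a calibration graph; a row whose observed frequency is below its expected probability is a point under the y = x line.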
In every graph shown, the green line depicting draw probabilities lies almost entirely below the y = x line, which represents perfect probability accuracy. This implies that draws happen less often in real life than bookmakers predict, which appears to explain why the scaled Brier scores for draws are so low.
However, there are more patterns in the data left to explore. Most notably, in almost every graph, the home and away team lines sit above the black perfection line in the right tails of their distributions. This implies that when the home or away team is a strong favorite to win, it wins even more often than expected.
But since every game in the right tail of the home team’s distribution is also a game in the left tail of the away team’s distribution, and vice versa, how is the undervaluing of favorites reflected in the left tail? As expected, when the home team wins more often than predicted in the right tail of its probability distribution, the away team wins less often than predicted in the left tail of its probability distribution. In the graph, this shows up as the red line consistently falling below the black line at the far left.
Interestingly, we don’t see the mirror-image pattern when the away team is heavily favored and wins more often than predicted. In that case we would expect the blue line for home team wins to fall beneath the black line in the left tail as well. Although the blue line does fall below it in some graphs, the pattern isn’t consistent enough across bookmakers to conclude that it holds. However, at the risk of reading too much into the graphs, the lack of symmetry isn’t that surprising if we think about home team advantage.
Because of home team advantage, a draw seems less likely when the home team is heavily favored than when the away team is heavily favored. When away teams are favored and win more often than predicted, it’s reasonable to assume that bookmakers have placed too much probability on a draw rather than too much on a home team upset. In other words, when a heavily favored away team wins more often than predicted, the probability missing from the bookmakers’ away-win estimate has been assigned to the draw rather than to a home win. So not only do bookmakers consistently overvalue draws, but they probably overvalue them most when the away team is strongly favored.
But is it the bookmakers who are overvaluing draws and undervaluing favorites, or is it us? As Steven Levitt explained in his 2004 paper, “Why Are Gambling Markets Organized So Differently Than Financial Markets?,” bookmakers can either a) set odds that lead to a balanced book, with half of all gamblers on each side of the bet, b) set efficient odds that equalize the probability of a successful bet on each side, or c) “systematically [set] the ‘wrong’ prices in a manner that takes advantage of bettor preferences.” Strategy C makes the bookmaker the most money, but if the odds stray too far from the true probabilities, bettors who know the right odds can exploit them and cost the bookmaker money.
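A toy calculation shows why a balanced book (strategy A) is attractive. For a three-way soccer market, “balanced” is taken here to mean that the money staked on each outcome is proportional to its vig-inclusive implied probability; the bookmaker then pays out the same amount whatever happens and keeps the margin. The odds, stakes, and function names are made up for illustration and are not taken from Levitt’s paper.

```python
from typing import Sequence


def book_profit(stakes: Sequence[float], decimal_odds: Sequence[float]) -> list[float]:
    """Bookmaker profit if each outcome occurs: all stakes in, winners paid stake * odds."""
    total = sum(stakes)
    return [total - stake * odds for stake, odds in zip(stakes, decimal_odds)]


def expected_profit(
    stakes: Sequence[float], decimal_odds: Sequence[float], true_probs: Sequence[float]
) -> float:
    """Average profit if outcomes occur with the given (assumed known) true probabilities."""
    profits = book_profit(stakes, decimal_odds)
    return sum(p * profit for p, profit in zip(true_probs, profits))


if __name__ == "__main__":
    odds = [2.0, 3.5, 4.0]  # hypothetical home/draw/away odds
    raw = [1 / o for o in odds]
    # Balanced book: 1000 in total, split in proportion to 1/odds.
    balanced = [1000 * r / sum(raw) for r in raw]
    print([round(x, 2) for x in book_profit(balanced, odds)])  # [34.48, 34.48, 34.48]
    # Unbalanced book: all the money on the home team.
    print(book_profit([1000, 0, 0], odds))  # [-1000.0, 1000.0, 1000.0]
```

With an unbalanced book, `expected_profit` depends on how the posted odds compare with the true probabilities, which is exactly the gap that strategies B and C trade on.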
These three strategies put the scaled Brier score inefficiency found earlier in a new light. Levitt concluded that NFL bookmakers most often pursued strategy C, since the betting data he had access to showed that the number of bets taken on each side of the spread was far from equal. I don’t have data on the number of bets taken on each side, but unless the miscalibration described above isn’t (yet) costing bookmakers much money, strategies B and C seem unprofitable here. Strategy A seems the most likely: bookmakers know that people overvalue draws and undervalue favorites, so they set the odds such that the general betting public thinks they are fair. To answer my earlier question, it seems unlikely that the bookies are the ones making these false assumptions. We are.
Below is the gallery for the lines of the five different bookmakers.
The data used for this post can be found here.