Team “Form”, Recency Bias, and Regression to the Mean

By Daniel Silberwasser

The term “form” is often used in soccer to refer to a player’s or a team’s recent performance. A key aspect of the idea of form is that form influences future success. For example, one might say “Typically I would favor Tottenham, but Aston Villa has been in great form since the international break”. This study investigates whether form exists and whether it is incorporated into bookmaker expectations. It uses English Premier League match results and bookmaker odds from the past 11 seasons, made available online by http://www.football-data.co.uk/, totaling over 4,000 games.

The challenge with modeling form in soccer is establishing a metric for match difficulty that adjusts for home and away team strength but is independent of recent performance. This raises the question of whether pre-game odds can serve as such a metric. In other words, while figuring out whether recent performance affects future performance, we must also determine whether pre-game bookmaker odds adjust for recent performance.

Let us establish a null hypothesis that bookmaker odds don’t adjust for form. Since, under this hypothesis, bookmaker odds are purely an estimate of true match difficulty, I propose that form can be captured in the difference between the points a team earns and the points it was expected to earn. A team earns 3 points for a win, 1 for a draw and 0 for a loss. Expected points can be calculated from the bookmaker probabilities, adjusted for vig. For example, if a team has a 45% probability of winning, a 30% probability of losing and a 25% probability of drawing, the team’s expected points are 3(0.45) + 0(0.30) + 1(0.25) = 1.6. Let us define a team’s points above expectation in the current game $t$ as $\mathrm{PAE}_t$, the difference between the points the team earned and the points it was expected to earn. The previous game’s PAE is $\mathrm{PAE}_{t-1}$, the PAE from two games in the past is $\mathrm{PAE}_{t-2}$, and so on. The odds used for this study are from Bet365. As I’ve shown in an earlier post, because the bookmaker market is competitive, odds are virtually identical across bookmakers.
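As a rough illustration of the expected-points and PAE calculation, here is a minimal Python sketch. The post does not say how the vig was removed; this sketch assumes the common proportional method (divide each implied probability 1/odds by their sum), and all function names are hypothetical.

```python
def remove_vig(odds_win: float, odds_draw: float, odds_loss: float) -> tuple[float, float, float]:
    """Convert decimal odds to probabilities with the bookmaker margin removed.

    ASSUMPTION: proportional normalization; the post does not specify the method.
    """
    implied = [1.0 / o for o in (odds_win, odds_draw, odds_loss)]
    overround = sum(implied)  # greater than 1 because of the vig
    p_win, p_draw, p_loss = (p / overround for p in implied)
    return p_win, p_draw, p_loss


def expected_points(p_win: float, p_draw: float, p_loss: float) -> float:
    """3 points for a win, 1 for a draw, 0 for a loss."""
    return 3 * p_win + 1 * p_draw + 0 * p_loss


def points_above_expectation(points_earned: int, p_win: float, p_draw: float, p_loss: float) -> float:
    """PAE for one game: points earned minus points expected from the odds."""
    return points_earned - expected_points(p_win, p_draw, p_loss)
```

With the post’s example probabilities, `expected_points(p_win=0.45, p_draw=0.25, p_loss=0.30)` is 1.6 up to floating-point rounding, so a win gives a PAE of about 1.4 and a loss a PAE of about -1.6.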

If we define a team’s form as its performance relative to expectations over several recent games, we can take a weighted sum of these residuals over the previous $x$ games, giving the most weight to the most recent game and incrementally less weight to games farther in the past. For example, when $x = 3$, the form metric is $1 \cdot \mathrm{PAE}_{t-1} + \tfrac{2}{3} \cdot \mathrm{PAE}_{t-2} + \tfrac{1}{3} \cdot \mathrm{PAE}_{t-3}$. When $x = 4$, the form metric over the last four games is $1 \cdot \mathrm{PAE}_{t-1} + 0.75 \cdot \mathrm{PAE}_{t-2} + 0.5 \cdot \mathrm{PAE}_{t-3} + 0.25 \cdot \mathrm{PAE}_{t-4}$. In general, game $t-k$ receives weight $(x-k+1)/x$ for $k = 1, \dots, x$. To make sure form isn’t calculated using games from previous seasons, form is only calculated once a team has played $x$ games in a season.
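The linearly decaying weighting scheme can be sketched in Python as follows. The function names are hypothetical, and `pae_history` is assumed to hold only the current season’s PAE values, in chronological order, for games before game $t$.

```python
from typing import Optional


def form_weights(x: int) -> list[float]:
    """Weights for the previous x games, most recent first: 1, (x-1)/x, ..., 1/x."""
    return [(x - k) / x for k in range(x)]


def team_form(pae_history: list[float], x: int) -> Optional[float]:
    """Weighted sum of the last x PAE values.

    Returns None until the team has played x games this season, so form is
    never built from games in a previous season.
    """
    if len(pae_history) < x:
        return None
    recent = pae_history[-x:][::-1]  # reorder so the most recent game comes first
    return sum(w * pae for w, pae in zip(form_weights(x), recent))
```

For example, `team_form([0.5, -1.0, 1.2, 0.4], x=4)` is about 0.925, since 0.4 + 0.75(1.2) + 0.5(-1.0) + 0.25(0.5) = 0.925.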

Having established a metric for team form that captures how a team has performed relative to expectations over recent games, we can now see how a team’s performance relative to expectations (the difference between its earned points and its expected points) changes with its form going into the match. If form has a real impact on team performance and bookmakers incorporate it accordingly, or if form has no impact on team performance and bookmakers ignore it, we would expect little to no change in performance relative to expectations as pre-match form changes. If form does affect team performance in the way it is popularly thought to, but bookmakers don’t account for it appropriately, we would expect teams to overperform expectations when in good form and underperform expectations when in bad form. Lastly, if form doesn’t affect performance as it is popularly thought to but bookmakers expect it to, teams would underperform expectations when in good form and overperform expectations when in bad form.

| | Bookmakers Adjust for Form | Bookmakers Don’t Adjust for Form |
|---|---|---|
| Form Impacts Team Performance | A: Teams neither beat nor underperform expectations as pre-match form changes | B: Teams overperform when in good pre-match form and underperform when in bad pre-match form |
| Form Doesn’t Impact Team Performance | C: Teams underperform when in good pre-match form and overperform when in bad pre-match form | D: Teams neither beat nor underperform expectations as pre-match form changes |
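To make the reasoning behind cells C and D concrete, here is a toy Monte Carlo sketch, not the study’s method. It assumes hypothetical, fixed match probabilities (so form has no real effect) and a hypothetical bookmaker that shifts its expected points by `beta` times the previous game’s PAE. A positive `beta` (the bookmaker “chases” form) should produce a negative slope of PAE on pre-match form, as in cell C; `beta = 0` should produce a flat relationship, as in cell D.

```python
import random
from statistics import fmean

# HYPOTHETICAL fixed probabilities: in this toy world form has NO real effect.
TRUE_P = {3: 0.45, 1: 0.25, 0: 0.30}  # points -> probability
TRUE_EP = sum(points * p for points, p in TRUE_P.items())  # 1.6 expected points


def simulate(beta: float, n_games: int = 100_000, seed: int = 0) -> tuple[list[float], list[float]]:
    """Return (pre-match form, PAE) lists for one long run of games.

    Form here is simply the previous game's PAE. The bookmaker's expected
    points are TRUE_EP + beta * form (beta > 0: chases form; beta = 0: ignores it).
    """
    rng = random.Random(seed)
    outcomes, weights = list(TRUE_P), list(TRUE_P.values())
    forms, paes = [], []
    prev_pae = 0.0
    for _ in range(n_games):
        bookie_ep = TRUE_EP + beta * prev_pae
        points = rng.choices(outcomes, weights)[0]  # the result ignores recent form
        pae = points - bookie_ep
        forms.append(prev_pae)
        paes.append(pae)
        prev_pae = pae
    return forms, paes


def ols_slope(xs: list[float], ys: list[float]) -> float:
    """Least-squares slope of ys regressed on xs."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


for beta in (0.0, 0.3):
    print(beta, round(ols_slope(*simulate(beta)), 3))
```

With `beta = 0.3` the fitted slope should come out close to -0.3 (teams in good form underperform the odds); with `beta = 0` it should be close to zero. Because the toy adjustment is linear in form, it yields a straight downward trend rather than an S-shaped curve.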

To see which of the four possibilities is most likely, we must look at how team performance relative to expectations actually changes as pre-match form changes. To do this, I divided the distribution of the form metric described above into 30 bins, with at least 10 games in each bin. For each bin, I calculated the average form metric and the corresponding average difference between the points a team earned and the points it was expected to earn. I plotted the former on the X axis and the latter on the Y axis, along with a trend line fitted by locally weighted regression with a 95% confidence band. Form was calculated over four time frames: the previous game, the previous two games, the previous three games, and the previous four games.
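The binning step can be sketched in Python as below. The post does not say how the 30 bins were constructed; this sketch assumes equal-width bins over the range of form values and drops any bin with fewer than 10 games, which is one possible reading of “at least 10 games in each bin”. The function name is hypothetical.

```python
from statistics import fmean


def binned_means(
    forms: list[float], paes: list[float], n_bins: int = 30, min_games: int = 10
) -> list[tuple[float, float]]:
    """Average form and average PAE for each equal-width form bin.

    ASSUMPTION: bins with fewer than min_games games are dropped.
    """
    lo, hi = min(forms), max(forms)
    if hi == lo:
        raise ValueError("form values must not all be equal")
    width = (hi - lo) / n_bins
    bins: list[list[tuple[float, float]]] = [[] for _ in range(n_bins)]
    for form, pae in zip(forms, paes):
        idx = min(int((form - lo) / width), n_bins - 1)  # the maximum lands in the last bin
        bins[idx].append((form, pae))
    return [
        (fmean(f for f, _ in members), fmean(p for _, p in members))
        for members in bins
        if len(members) >= min_games
    ]
```

Each returned pair is one point on the plot (average form, average PAE). Equal-count (quantile) bins would be a reasonable alternative that guarantees the minimum bin size automatically. A trend line through the points could then be fitted with a LOESS smoother such as statsmodels’ `lowess`, which is not shown here.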

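[Figure: average points above expectation (Y axis) versus average pre-match form (X axis) per bin, with locally weighted regression trend line and 95% confidence band, for form calculated over the previous one, two, three, and four games.]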

The S-shaped curves in the graphs above suggest that bookmakers are susceptible to recency bias when it comes to extreme outcomes. In other words, at the extremes, cell C is the most likely in the matrix above. Between the two extremes, the data points, representing games in which teams have moderately overperformed or underperformed expectations in recent play, scatter randomly around 0. This suggests that when teams are in moderately good or bad form, they neither consistently overperform nor underperform in their next game. This is either because their recent results don’t affect their performance (cell D) or because expectations adjust to the affected performance (cell A). Either way, being in somewhat good or bad form doesn’t make a team any more likely to beat or miss expectations.

However, we do see a consistent pattern at the extremes of the graphs. On the right extreme, teams that significantly outperformed expectations in their most recent game or two most recent games tend to underperform in their following game. Conversely, teams that significantly underperformed expectations in recent games tend to overperform in their next game. This suggests that when a team does exceedingly well, expectations for its next game are on average too high, and when a team does exceedingly poorly, expectations for its next result are on average too low. The memory of the most recent game leads to imperfect odds, as teams are more likely to regress to the mean than to repeat a similarly extreme outcome.

Although the right tail of the graph using the three previous games doesn’t appear to follow this trend, the point in the top right corner, far outside the 95% confidence band, should be considered an outlier. The majority of the data in that right tail is at or below 0.

Interestingly, the S-shaped curve flattens out as we include games from farther and farther in the past. This is further evidence of recency bias, as bookmakers respond most strongly to the most recent results. Additionally, the left tail has a much sharper spike than the right tail has a drop, even as less recent games are added. This suggests that bookmakers react more strongly to teams doing much worse than expected than to teams doing much better than expected, which implies a certain level of negativity bias as well.

If one finds it surprising that professional bookmakers are so vulnerable to recency bias, consider, as I’ve mentioned in a different post, that soccer bookmakers often set odds to reflect public opinion. By doing so, bookmakers can balance their books such that half of the people betting with them will lose. Having recognized the recency bias in the soccer fans around us, hopefully we’ll be in the other half.

About the author

harvardsports
