How Predictable Was The 2018 World Cup?

Yesterday, favorites France defeated surprise package Croatia to win their second World Cup. However, earlier in the tournament we saw some very surprising results. In the group stage, Croatia dismantled Argentina, and Germany were defeated by both Mexico and South Korea. In the knockout stages, Russia eliminated Spain on penalties and Belgium defeated favorites Brazil. One of the semifinals featured relatively unfancied sides England and Croatia, while both Sweden and hosts Russia made the quarterfinals. To the naked eye, it appeared as though this year’s tournament contained more surprises than usual. But was this really the case? Was this tournament more surprising than past tournaments? And if not, which World Cups could be considered the “most” and “least” surprising?

I set out to answer these questions in the following manner. First, I took the pre-tournament Elo rating of each team at every World Cup since 1986. I chose 1986 as the starting point of this analysis because it was the first tournament to have a single group stage followed by a knockout stage. Next, I simulated each match 1 million times using the log-linear regression model fit by Laurie Shaw from TheEightyFivePoints. After careful consideration, I chose not to run these simulations “hot”; that is, I did not update a team’s rating as the tournament progressed. I made this choice because I wanted to magnify the surprise factor of lowly rated teams as they pulled off more and more upsets. Since we are looking for surprises, it makes sense to use each team’s pre-tournament rating at all points in the tournament. From the simulations, I was able to estimate a win/loss/draw probability for each match in each of the 9 World Cups. If a knockout stage match went to extra time, I considered it to be a draw.

I then calculated the Brier score for each match in the nine tournaments studied. For those unfamiliar with the Brier score, it is the quadratic loss of the predictions over the discrete match outcomes. For example, if a model predicts that Team A will win 50% of the time, Team B will win 30% of the time and there will be a draw 20% of the time, and Team A wins, then the Brier score is calculated as (1-0.5)^2+(0-0.3)^2+(0-0.2)^2=0.38. A perfect model would have a Brier score of 0, while the worst possible model would have a score of 2 (in the case of 3 outcomes). Note that higher scores mean that the World Cup was more surprising, while lower scores imply the opposite. Our results were as follows.
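For readers who want to try the scoring step themselves, here is a minimal Python sketch of the Brier score calculation described above. It is not the code used for this analysis: the function names, the outcome labels and the idea of averaging the per-match scores into a single tournament number are all assumptions made for illustration, and the win/loss/draw probabilities are taken as given rather than produced by the simulation model.

```python
from statistics import mean

# Hypothetical labels for the three possible results of a match.
OUTCOMES = ("win_a", "draw", "win_b")


def brier_score(probs, actual):
    """Quadratic loss summed over the three outcomes of one match.

    probs  -- dict mapping each label in OUTCOMES to a predicted probability
    actual -- the label of the outcome that actually happened
    """
    if actual not in OUTCOMES:
        raise ValueError(f"unknown outcome: {actual!r}")
    return sum(
        (probs[o] - (1.0 if o == actual else 0.0)) ** 2 for o in OUTCOMES
    )


def match_outcome(goals_a, goals_b):
    """Outcome from the score after 90 minutes.

    A knockout match that is level after 90 minutes goes to extra time,
    so it is scored as a draw, as described in the text.
    """
    if goals_a > goals_b:
        return "win_a"
    if goals_b > goals_a:
        return "win_b"
    return "draw"


def tournament_surprise(matches):
    """Mean Brier score over (probs, actual) pairs.

    Averaging is an assumption here; higher means more surprising.
    """
    return mean(brier_score(probs, actual) for probs, actual in matches)


# The worked example from the text: 50% Team A, 30% Team B, 20% draw,
# and Team A wins: 0.25 + 0.09 + 0.04.
example = {"win_a": 0.5, "win_b": 0.3, "draw": 0.2}
print(round(brier_score(example, "win_a"), 2))
```

Using the 90-minute score to decide the outcome keeps the extra-time rule in one place, so the same scoring function works for group and knockout matches alike.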

These results were very interesting. Based on this analysis, 2018 was an average World Cup in terms of how surprising the individual match results were. We find two major outliers here. The first is that 2002 was the most surprising World Cup by a very clear margin. Those who were able to wake up for the incredibly early games will remember hosts South Korea and Turkey reaching the semifinals, defending champions France going out without scoring a goal, and the United States defeating highly rated Portugal in the group stage. In fact, runners-up Germany (who have traditionally been a powerhouse) were only the 10th-ranked team entering that World Cup.

On the flip side, we found that 2006 was the least surprising World Cup. That World Cup saw four large European nations (Italy, France, Portugal and Germany) reach the semifinals, while Ukraine were the only non-traditional powerhouse to reach the quarterfinals. In fact, the two World Cups found to be least surprising were the two most recent World Cups to be held in Western Europe. Since European teams are traditionally the best teams at World Cups, this is consistent with the notion that European teams dominate tournaments held in Europe, while tournaments held outside of Europe tend to be more random.

As for the 2018 World Cup, despite the surprising results early in the tournament, it appeared to be more unlikely than it actually was, in large part due to the lopsided nature of the knockout stage draw. The only major shocks of the knockout rounds were Russia over Spain and Belgium over Brazil.