By: Laurie Shaw
On Thursday 14th June, at 6pm local time, the 21st World Cup will kick off in Moscow; the first to be held in Russia, and only the second in Asia. Thirty-one days later, two of the thirty-two participants will contest the final.
There are sure to be shocks and surprises along the way, but what should we expect from the tournament before the first ball has been kicked? Who are the favourites? How likely are we to have a European, South American, Asian or African winner (or perhaps a first-time winner)? Which is the toughest group, or the easiest? How far is each side likely to get?
Based on a simple model for predicting match results, I’ve simulated the World Cup 10,000 times to evaluate the likelihood of various outcomes and investigate some of the quirks of the tournament. If you’re interested in the technical details, scroll down to the Appendix. As the tournament plays out, I’ll be rerunning the simulations and updating my predictions: follow me on Twitter (@EightyFivePoint) to keep up with them.
I have made the Python code for running these simulations public: you can find it here.
But without further ado, here are the results.
Who will win Russia 2018?
Figure 1 shows the proportion of World Cup simulations won by the sixteen most likely victors. Brazil and Germany are the clear favourites, winning 17% and 16% of simulated tournaments respectively. Both therefore have roughly a 1 in 6 chance, which is lower than the historical rate at which they have won the World Cup: 4 in 18 attempts for Germany (or West Germany) and 5 in 20 for Brazil. The bookmakers agree, offering odds of 9/2, an implied probability of 18%, for both teams. Brazil versus Germany is also the most likely final, occurring in 6% of simulations.
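The conversion from bookmakers' fractional odds to a probability can be checked in a few lines. This is a minimal sketch: the function name is my own, and it ignores the bookmaker's margin, so the result is only an approximate probability.

```python
from fractions import Fraction


def implied_probability(numerator: int, denominator: int) -> Fraction:
    """Implied probability of fractional odds numerator/denominator.

    Odds of a/b pay a profit of a for a stake of b, so the break-even
    probability is b / (a + b). The bookmaker's margin is ignored.
    """
    return Fraction(denominator, numerator + denominator)


p = implied_probability(9, 2)
print(p, f"{float(p):.1%}")  # 2/11 18.2%
```

Odds of 9/2 therefore correspond to 2/11, which rounds to the 18% quoted above.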
Figure 1. Probability of winning the 2018 World Cup for each of the top 16 favourites, based on 10,000 simulations of the tournament.
Despite the fact that, between them, Germany and Brazil have won nearly half of all previous tournaments, the model puts the chance that a different country will win at 67%, and the chance of a first-time winner at 36%. The chasing pack include Spain (9% chance), Argentina (8%) and France (7%). After that come Colombia, Belgium and Portugal, each with a 5% chance and all three chasing their first World Cup victory. England have a 3% chance of ending a half century of hurt. Hosts Russia, meanwhile, win only 1% of the simulated tournaments.
Aggregating by continent, South or Central American countries win 39% of the simulated tournaments, European countries 55%, and Asian countries or Australia 4%. The model predicts only a 2% chance of an African country winning the World Cup for the first time.
Predicting the Group Stage
Let’s rewind to the start of the tournament and take a more detailed look at the group stage. Figure 2 shows the probabilities of each country finishing in a given position in their group, from first to fourth, according to the simulations. The numbers down the right-hand side of each group table (labelled ‘Qual’) indicate the probability of the team qualifying for the round of 16, the first knock-out round. Only the top two teams in each group qualify for the round of 16, with the winner playing the runner-up of the neighbouring group (for example, the winner of Group A will play the runner-up in Group B).
Figure 2. Probability (%) of each country finishing in each position in their group table, from 1st to 4th. The ‘Qual’ column indicates the probability of the country finishing in either first or second position and therefore qualifying for the knock-out stage. The winners of each group will play the runner-up of the neighbouring group in the round of 16 (for example, the winner of Group A plays the runner-up in Group B). Figures may not sum due to rounding.
The most evenly matched group is Group H (Colombia, Poland, Japan and Senegal). Colombia and Poland are the favourites to progress, but both Japan and Senegal have about a 33% chance of qualifying for the knock-out stage. In 63% of tournament simulations, at least one of Colombia and Poland fails to qualify for the round of 16.
Group G (England, Belgium, Panama and Tunisia) appears to be the least competitive group, with the two European countries clear favourites to finish in the top two places. In only 42% of simulations does at least one of them fail to qualify for the knock-out stage.
If both Germany and Brazil finish in the same position (1st or 2nd) in their respective groups, they avoid each other until the final. However, in 30% of my simulations they do meet in what would be a momentous round of 16 tie.
How far will each country get?
Figure 3 provides a more comprehensive picture of how far the model thinks each country is likely to progress in the tournament. It shows the probability of each country reaching a given round: the round of 16, the quarter-finals, the semi-finals, the final, and winning the tournament outright. For example, Germany make it to the round of 16 in 84% of simulated tournaments, the quarter-finals in 56%, the semi-finals in 40%, the final in 26% and win the tournament outright in 16%.
Figure 3. The probability (%) of reaching a given stage of the World Cup, from the round of 16 to winning the final, for each participating country. Figures may not sum due to rounding.
The model predicts that the hosts, Russia, have a 62% chance of making it to the round of 16, but their chances of progression thereafter are fairly low. They make it to the quarter-finals in 23% of simulations, and to the semis in less than 10%.
England benefit not only from being drawn in the easiest group, but also from a relatively generous potential round of 16 tie against a team from Group H (most likely either Colombia or Poland). This gives them a 42% chance of reaching the quarter-finals, at which point they typically run into Brazil or Germany and get knocked out. Belgium are actually expected to progress further into the tournament than England, despite England having a higher probability of winning Group G: once we get into the knock-out stages, England’s poor historical performance in penalty shoot-outs makes Belgium the stronger of the two.
Other Interesting Questions
Whom does the draw favour?
The draw does seem to favour some countries. Portugal, Spain, England and Belgium all benefit from avoiding one of the top-5 favourites in their potential round of 16 opponents. However, the advantage gained is small: the probability of each of these teams reaching the final is increased by about 1% relative to completely randomised tournament draws (using the same seedings).
Who might be the surprise package?
The World Cup winner typically comes from one of the pre-tournament favourites; however, at least one of the semi-final teams tends to be a surprise. In 2014, unfancied Holland took Argentina to penalties; Uruguay made it to the semis in 2010, and both South Korea and Turkey made it that far in 2002. In 83% of my simulations at least one country from outside the top-10 tournament favourites shown in Figure 1 makes it to the semi-finals.
If I had to name one team to make surprising progress in the tournament, I would go with Colombia. Thanks to a relatively generous draw, the model estimates that they have a 20% chance of making it to their first World Cup semi-final.
What is England’s most likely route to the final?
England make it to the World Cup final in 8% of simulations. Their most frequently occurring paths to the final typically involve defeating Poland or Colombia in the round of 16, Brazil or Germany in the quarter-finals and one of France, Portugal or Spain in the semi-finals. So the route to the quarter-finals looks reasonable, but England are likely to then play one of the best two teams in the world.
Appendix: Simulation Methodology
The core of the model is the method for simulating match outcomes. The number of goals scored by each team in a match is drawn from a Poisson distribution with the mean, μ, given by a simple linear model:
log μ = β0 + β1 X1 + β2 X2
There are two predictive variables in the model: X1 = ΔElo/100, where ΔElo is the difference between that country’s Elo score and its opponent’s, and X2 is a binary home-advantage indicator equal to one if the team is the host nation (i.e. Russia) and zero otherwise. Note that Elo scores are explicitly designed to be predictive of match outcomes. The initial Elo score for each team is taken from EloRatings (using the average over the last year, rather than the latest score). The method does not use any information on individual players.
The beta coefficients are determined by fitting this model as a Poisson regression to all World Cup matches since 1990, obtaining values β0 = 0.16 ± 0.03, β1 = 0.17 ± 0.02 and β2 = 0.18 ± 0.09. All are significant, as is the change in deviance relative to an intercept-only model. Because β2 is almost equal to β1, home advantage is equivalent to about 100 points of Elo score difference, which equates to a boost of about 0.2 goals per game.
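To make the match model concrete, here is a minimal sketch of how goals could be drawn from it. It uses only the point estimates quoted above and ignores their uncertainties. The function names, the Elo scores of 1800 and the hand-rolled Poisson sampler are my own illustrative choices, not taken from the published code.

```python
import math
import random

# Point estimates of the coefficients quoted in the text.
BETA0, BETA1, BETA2 = 0.16, 0.17, 0.18


def expected_goals(elo_team, elo_opponent, is_host=False):
    """Mean goals for a team: log(mu) = b0 + b1 * dElo / 100 + b2 * host."""
    x1 = (elo_team - elo_opponent) / 100.0
    x2 = 1.0 if is_host else 0.0
    return math.exp(BETA0 + BETA1 * x1 + BETA2 * x2)


def sample_poisson(mean, rng):
    """Draw one Poisson variate (Knuth's method, fine for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_match(elo_a, elo_b, rng, a_is_host=False, b_is_host=False):
    """Return (goals_a, goals_b), each drawn independently."""
    goals_a = sample_poisson(expected_goals(elo_a, elo_b, a_is_host), rng)
    goals_b = sample_poisson(expected_goals(elo_b, elo_a, b_is_host), rng)
    return goals_a, goals_b


# Two evenly matched sides (hypothetical Elo scores of 1800 each).
print(f"{expected_goals(1800, 1800):.2f}")                # 1.17
print(f"{expected_goals(1800, 1800, is_host=True):.2f}")  # 1.40
```

For two evenly matched teams the host's expected goals rise from about 1.17 to about 1.40, which is the boost of roughly 0.2 goals per game mentioned above.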
Extending the regression back to 1954 gives similar results, with the exception of the home advantage coefficient, which becomes significantly larger. Indeed, there is evidence that home advantage is a declining factor in the World Cup (as it is in club competitions). I have also investigated other indicators, such as distance travelled to the tournament, but did not find them to be statistically significant predictors.
Simulations are run ‘hot’, which means that the Elo scores are updated after each simulated match (using the procedure described here). This has the effect of propagating the impact of results to the outcome of future matches, adding a little more variation in the tournament outcomes by slightly increasing the probability that the weaker teams will progress further into the tournament.
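The update procedure itself is not reproduced in this article, so the following is only a sketch of a generic Elo update, not necessarily the one used in the simulations. The weight k = 40 is a placeholder value, and the sketch applies no goal-difference or home-advantage adjustment.

```python
def expected_score(rating, opponent_rating):
    """Standard Elo expected score (win = 1, draw = 0.5, loss = 0)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))


def update_elo(rating_a, rating_b, goals_a, goals_b, k=40.0):
    """Return both teams' ratings after one match.

    k is a placeholder weight, not a value taken from the article.
    """
    if goals_a > goals_b:
        score_a = 1.0
    elif goals_a == goals_b:
        score_a = 0.5
    else:
        score_a = 0.0
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever one team gains the other loses, so the total is conserved.
    return rating_a + delta, rating_b - delta


# Hypothetical example: an underdog rated 1700 beats a side rated 1900.
print(update_elo(1700, 1900, 2, 1))
```

An upset moves the ratings further than an expected result does, which is how running the simulation ‘hot’ lets an early shock carry through to later rounds.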
If a match ends in a draw in the knock-out rounds, a penalty shoot-out is simulated shot by shot. Each team is assigned a penalty ‘strength’: the probability that they score each penalty. This is determined from their performance in previous World Cup penalty shoot-outs, combined with a beta-distributed prior centred on the historical average (73%).
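One way to sketch this step is to take the posterior mean of a beta prior centred on 73% as each team's strength, then play out the kicks. The prior weight of 20 pseudo-penalties, the function names and the example record are my assumptions for illustration. The article does not state how strongly the prior is weighted.

```python
import random

HISTORICAL_RATE = 0.73  # historical average quoted in the text
PRIOR_WEIGHT = 20.0     # assumed prior strength, in pseudo-penalties


def penalty_strength(scored, taken):
    """Posterior mean scoring probability under a Beta(alpha, beta) prior."""
    alpha = HISTORICAL_RATE * PRIOR_WEIGHT
    beta = (1.0 - HISTORICAL_RATE) * PRIOR_WEIGHT
    return (alpha + scored) / (alpha + beta + taken)


def simulate_shootout(strength_a, strength_b, rng):
    """Return "A" or "B": five kicks each, then sudden death.

    All five kicks are always taken here. Stopping early once the result
    is settled would not change the winner. Strengths should lie strictly
    between 0 and 1 so that sudden death is guaranteed to end.
    """
    goals_a = sum(rng.random() < strength_a for _ in range(5))
    goals_b = sum(rng.random() < strength_b for _ in range(5))
    while goals_a == goals_b:
        goals_a += int(rng.random() < strength_a)
        goals_b += int(rng.random() < strength_b)
    return "A" if goals_a > goals_b else "B"


rng = random.Random(2018)
weak = penalty_strength(scored=6, taken=12)  # hypothetical poor record
average = penalty_strength(scored=0, taken=0)  # no history: prior mean
wins = sum(simulate_shootout(weak, average, rng) == "A" for _ in range(10_000))
print(f"strength {weak:.2f} vs {average:.2f}: wins {wins / 10_000:.0%}")
```

A team with no shoot-out history simply gets the prior mean of 73%, while a poor record pulls the strength down by an amount that depends on the assumed prior weight.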
I simulated the tournament 10,000 times, evaluating the outcomes of the group stage, and subsequent knock-out rounds.
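As a rough guide to how precise 10,000 simulations are, the sampling error on any quoted proportion can be estimated with the usual binomial formula. This is a back-of-the-envelope sketch with a function name of my own choosing. It covers Monte Carlo noise only, not uncertainty in the model itself.

```python
import math


def monte_carlo_standard_error(p, n_simulations):
    """Standard error of a proportion p estimated from independent runs."""
    return math.sqrt(p * (1.0 - p) / n_simulations)


# Brazil win 17% of the 10,000 simulated tournaments.
se = monte_carlo_standard_error(0.17, 10_000)
print(f"{se:.4f}")  # 0.0038
```

So a quoted 17% carries a simulation error of roughly 0.4 percentage points, small enough that rounding to whole percentages is reasonable.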
All the code for these simulations can be found on GitHub.
Laurie is currently a Visiting Scholar at Harvard’s Center for Astrophysics. This article was originally run on his blog eightyfivepoints.blogspot.com. If you have any questions or comments about this article, please feel free to reach out to Laurie on Twitter @EightyFivePoint or by email at firstname.lastname@example.org.