Elo Ratings for The Challenge: Rivals III

By Harrison Chase

The opening episode of America’s fifth major sport, the Challenge, premieres on MTV tonight. As a sports analytics blog, it would be almost unethical for us not to provide some statistical insight into what promises to be an entertaining season. If you have not watched the Challenge, then you will probably not understand the point of this blog. But that’s okay, because, as I said before, The Challenge: Rivals III premieres tonight at 10/9c on MTV – so go check it out and then come back and give this blog a try.

Mathematical Details

The Challenge is nearly perfectly suited for a statistical analysis as it basically consists of players competing against each other in – this is probably where the name comes from – challenges. We can therefore use the Elo rating system, of 538 notoriety, to infer Elo ratings for all players involved in the TV show. Now, the Elo system doesn’t fit perfectly with the setup of the Challenge. Elo was developed for chess, a one-on-one sport, while the Challenge involves not only 1-vs-1 competitions, but also team-versus-team competitions and competitions involving more than two players/teams. Therefore, I have had to improvise a little bit in order to utilize all of the challenges and gameplay outcomes to get Elo ratings. I have done this in the following way:

  • 1 vs 1: Same as Elo, no changes
  • Team vs Team: I added up the Elo ratings of all the players on each team, and then got the expected win probability from those sums. This assumes that players’ talent levels add linearly, which may very well not be true. To adjust the Elo ratings after the competition, I assigned equal responsibility to each player on a team (again, may not be true) and adjusted each player’s rating so that the total adjustment for the team was equal to what the Elo rating system would have adjusted it by, treating the teams as individual players.
  • Team vs Team vs Team: For gameplay outcomes where there were more than two teams, there is usually a winning team and often a losing team, with all the other teams in the middle. Therefore, I acted as if the winning team beat all of the other teams, and the losing team lost to all of the other teams, treating each of those as its own individual game.
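The three cases above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for this post: it assumes the usual Elo expected-score formula with a 400-point scale, uses a single K for everyone in a game (the function names, the `k=32.0` default, and the order in which the pairwise games are processed are all made up for the example).

```python
from typing import Dict, List, Optional


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of side A against side B (400-point scale assumed)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_team_match(ratings: Dict[str, float], winners: List[str],
                      losers: List[str], k: float = 32.0) -> None:
    """Treat each team as one 'player' whose rating is the sum of its members,
    then split the resulting adjustment equally among the team's players.
    A 1 vs 1 game is just the case where each team has one member."""
    team_w = sum(ratings[p] for p in winners)
    team_l = sum(ratings[p] for p in losers)
    delta = k * (1.0 - expected_score(team_w, team_l))  # winner's total gain
    for p in winners:
        ratings[p] += delta / len(winners)
    for p in losers:
        ratings[p] -= delta / len(losers)


def update_multi_team(ratings: Dict[str, float], teams: List[List[str]],
                      winner_idx: int, loser_idx: Optional[int] = None,
                      k: float = 32.0) -> None:
    """Winner beats every other team; the loser (if any) loses to every other team.
    The games are processed one after another, which is an assumption."""
    for i, team in enumerate(teams):
        if i != winner_idx:
            update_team_match(ratings, teams[winner_idx], team, k)
    if loser_idx is not None:
        for i, team in enumerate(teams):
            # Winner vs loser was already counted in the loop above.
            if i not in (winner_idx, loser_idx):
                update_team_match(ratings, team, teams[loser_idx], k)


# Usage: three made-up two-person teams, all starting at 1500.
ratings = {name: 1500.0 for name in "ABCDEF"}
update_multi_team(ratings, [["A", "B"], ["C", "D"], ["E", "F"]],
                  winner_idx=0, loser_idx=2)
```

Because every game moves the same number of points from one team to the other, the total number of rating points across all players stays constant under this scheme.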

Finally, details involving hyperparameters: I set each player’s initial rating to 1500, and used a K value (which is basically how much a player’s rating can change based on one game) that varied depending on how many games a player had played previously, ranging from 16 to 160. The idea behind the varying K is that early on you are less sure about how good a player is, so you want to update by larger amounts as that new information is worth more, and subsequently decrease K as new data becomes less informative.
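To make the varying K concrete, here is one possible schedule, sketched in Python. Only the 16 to 160 range comes from the description above; the linear shape and the 30-game cutoff (`decay_games`) are hypothetical choices made for illustration.

```python
def k_factor(games_played: int, k_max: float = 160.0, k_min: float = 16.0,
             decay_games: int = 30) -> float:
    """Hypothetical K schedule: start at k_max for a brand-new player and
    shrink linearly to k_min once the player has decay_games games on record."""
    games_played = max(games_played, 0)
    if games_played >= decay_games:
        return k_min
    fraction_done = games_played / decay_games
    return k_max - fraction_done * (k_max - k_min)


# Usage: a rookie's first result moves the rating far more than a veteran's.
rookie_k = k_factor(0)
veteran_k = k_factor(45)
```

Any schedule that decreases with games played would capture the same idea; a step function or exponential decay would work just as well as the straight line used here.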

Power Ranking The Challenge: Rivals III

Using these Elo ratings, we can rate the teams involved in the upcoming season of the Challenge. I generated the Elo ratings by running the algorithm through the last ten seasons (starting from ‘The Ruins’), mainly because that is all the data I had time to collect. I am hoping to update this to include more seasons later on. This season is a paired challenge, where players are paired up with a rival of the opposite sex, so when ranking teams we can just add their Elo ratings together. Some of the teams include ‘rookies’, players who have never participated in a Challenge before. For the purpose of these initial ratings we will just mention them separately. In no particular order, with Elo rating (when available) in parentheses:

  • Ashley (NA) and Cory (1566): I don’t watch any MTV show other than the Challenge, so I can tell you absolutely nothing about Ashley. Cory, however, is coming off an impressive rookie season in which he finished second, and his rating, well above the average of 1500, makes this team perhaps the most dangerous of all teams involving a rookie.
  • Brianna (NA) and Brandon (NA), Cheyenne (NA) and Devin (NA): Both of these teams consist of two rookies, so I have nothing to really add here.
  • Christina (1465) and Nate (NA): Christina’s rookie campaign (also last season) did not go that well, as she was part of the first team eliminated. There is therefore very little to judge her on, especially because she did okay in gameplay (not losing) but then lost in an elimination to Jenna.

Now onto the teams that do not involve a rookie, so we can properly rank them. In order from worst to best:

  • Vince (1422) and Jenna (1492): This may be a shocking team to have rated so low, with Jenna finishing in the top three both times she competed and Vince being Johnny Bananas’ cousin, meaning that by blood he is part demigod. Still, Jenna was kind of allowed to coast to the Finals both times, finishing third out of three once she got there, and Vince, despite being paired with Bananas himself last year, couldn’t make any noise. This, for me, is actually one of the most intriguing teams to watch this upcoming season.
  • Leroy (1573) and Averey (1373): Leroy, the 4th highest rated male, is dragged down by Averey, the 2nd lowest rated girl. It will be fascinating to watch and see if Leroy can carry Averey the distance.
  • Thomas (1511) and Simone (1436): I think most people would agree this might be one of the weaker teams, with Simone sticking out as particularly unimpressive. Thomas looked like a bust after his first season, but did well enough last season to bump his rating up to above average.
  • Johnny Reilly (1547) and Jessica (1440): Johnny Reilly impressed in his first season, coming in second only to Bananas without facing a single elimination, but suffered a sophomore slump with a fairly mediocre second season. It will be interesting to see which Johnny Reilly his partner, Jessica, brings out.
  • Dario (1508) and Nicole (1490): A relatively untested team, as each has competed in only one Challenge. We will learn a lot more about their strength this season.
  • Tony (1504) and Camila (1506): Both Tony and Camila come in at slightly above average, although I suspect most people would argue Camila should be higher. She has competed in a lot of Challenges so her rating is unlikely to jump much this season, but Tony’s Elo rating could change a lot in these next few weeks.
  • Jamie (1561) and KellyAnne (1536): Now we get to the top three teams, and the decided favorites to win it all. Jamie had an incredible season for a rookie, winning it all last season, and he is joined by KellyAnne, who is also definitely a strong player. However, this team is rated so highly because of Jamie’s miracle run last season – if that proves to have been just a flash in the pan, then perhaps this team isn’t as worthy of a top 3 rating as Elo thinks it is.
  • Wes (1538) and Nany (1566): There is no doubt that this Wes/Nany juggernaut is a top three favorite to win it all, however. Nany is a top 5 female competitor, and Wes brings a decisively above average rating and a wealth of experience to the team. Incredibly strong team, two veterans who have proved themselves many times before.
  • Johnny Bananas (1688) and Sarah (1597): This team seems a little unfair, pairing the 2nd highest rated male with the 3rd highest rated female (both are the top of their respective genders among competitors this season). It is not out of the question that Bananas manipulated his way into a conflict with Sarah so they could be paired together for a challenge like this – classic Bananas move. However, one thing they have going against them (which Elo does not account for) is that neither of these players has a ton of fans in the house. It seems like Bananas manages to piss off people every season, and that, combined with the fact that teams may be gunning for the favorites, may lead to an early exit. Still, there is no doubt about it: this is the strongest team in this season (by far).
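For anyone who wants to check the ordering, the summed ratings can be reproduced with a short Python sketch. The individual ratings are the ones listed above; the head-to-head probability at the end assumes the standard Elo formula with a 400-point scale applied to the summed team ratings, which is an assumption for illustration.

```python
# Individual ratings as listed above, keyed by team.
teams = {
    ("Vince", "Jenna"): (1422, 1492),
    ("Leroy", "Averey"): (1573, 1373),
    ("Thomas", "Simone"): (1511, 1436),
    ("Johnny Reilly", "Jessica"): (1547, 1440),
    ("Dario", "Nicole"): (1508, 1490),
    ("Tony", "Camila"): (1504, 1506),
    ("Jamie", "KellyAnne"): (1561, 1536),
    ("Wes", "Nany"): (1538, 1566),
    ("Johnny Bananas", "Sarah"): (1688, 1597),
}

# Team strength is simply the sum of the two partners' ratings.
team_totals = {names: sum(pair) for names, pair in teams.items()}

# Sort from worst to best, matching the order of the list above.
ranked = sorted(team_totals.items(), key=lambda item: item[1])
for (first, second), total in ranked:
    print(f"{first} and {second}: {total}")


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of side A against side B (400-point scale assumed)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Estimated chance that the top team beats the second-best team head to head.
top_vs_second = expected_score(team_totals[("Johnny Bananas", "Sarah")],
                               team_totals[("Wes", "Nany")])
print(f"Bananas/Sarah over Wes/Nany: {top_vs_second:.2f}")
```

The gap between the top two teams (181 points) is larger than the gap between the second-best team and the fourth-best, which is what "by far" looks like in numbers.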

Overall Elo Ratings

Finally, let us take a look at the top rated players among all who have competed in the past 10 seasons. Here are the top ten competitors by Elo:

[Image: table of the top ten competitors by Elo rating]

My man CT runs away with it, and there is also pretty clearly a dominant top three (CT, Bananas, and Laurel). I don’t think any of the names on here shock anyone (keep in mind this is only based on results from the past ten seasons). These ratings pass the ‘smell test’, so to speak.

And there you have it. The first ever statistical analysis of MTV’s ‘The Challenge’. I think it is safe to say this is probably the most important and groundbreaking work done in sports analytics since Moneyball, and I would not be surprised to hear that a whole lot of young statisticians now want to go into the blossoming and lucrative field of Challenge analytics.
