# What Predicts ATP Tennis Rankings? Hint: It’s Not Break Points

By Andrew Cohen

With the conclusion of the last tennis major of the year, the US Open, and the return of football on Sundays, tennis has returned to its status as an afterthought for American sports fans. Interestingly, for those who look at sports from a quantitative point of view, tennis has also been an afterthought. While plenty of other sports have experienced statistical revolutions, there has been very little in the way of analytical research on pro tennis. With this post, I aim to do my part in changing that status quo.

The world rankings administered by the men’s ATP Tour are used by players and fans around the globe to determine player talent and measure relative success. Behind the world rankings lies a complicated system that rewards players with points for winning matches on a scale relative to the significance of the match. Ideally, the ranking system assigns points in accordance with overall player talent. However, factors such as injuries and playing schedules may keep this from being the case. Match statistics do exist that attempt to quantify talent, such as how often a player aces his opponent or wins in break point (“clutch”) situations. If talent truly is reflected in the rankings, one would hypothesize that better statistics would result in a higher ranking. Is this the case? Using multiple linear regression, I seek to assess the strength of five match statistics in their ability to predict ATP ranking points.

I took match statistic data from tennisinsight.com and looked at the current top 100 players as of the week of April 12th. The five variables studied to predict ranking points were: number of aces per individual service game (aces per game), the percentage of times the player won his service game (service hold %), the percentage of points the player won when not serving (return points won %), the percentage of points won when, on his serve, his opponent had break point (break points saved %), and the percentage of points won when, on his opponent’s serve, he had break point (break points won %). Because of the exponential nature of the ranking system (point values increase exponentially as players progress through tournaments), the ranking points response variable is heavily right skewed and was therefore assessed after a logarithmic transformation. All five variables were assessed at a significance level of P<0.05.
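For readers who want to try this kind of analysis themselves, here is a minimal sketch of a multiple linear regression on log-transformed ranking points. It is not the code used for this post: the data are synthetic stand-ins generated with made-up coefficients, the column names and the `fit_ols` helper are hypothetical, and the cutoff of roughly 1.99 is an approximate two-sided 5% critical value for a t-distribution with 94 degrees of freedom (100 players minus 6 estimated coefficients).

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept.

    Returns (coefficients, standard errors, t-statistics); index 0 is the intercept.
    """
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])          # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)  # least-squares fit
    resid = y - design @ coef
    dof = n - design.shape[1]                          # residual degrees of freedom
    sigma2 = resid @ resid / dof                       # residual variance estimate
    cov = sigma2 * np.linalg.inv(design.T @ design)    # coefficient covariance
    se = np.sqrt(np.diag(cov))
    return coef, se, coef / se

# Hypothetical column names for the five match statistics.
names = ["aces_per_game", "service_hold_pct", "return_points_won_pct",
         "break_points_saved_pct", "break_points_won_pct"]

# Synthetic, standardized stand-in data for 100 players (NOT real ATP data).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Made-up relationship: only service hold and return points drive log points.
log_points = 7.0 + 0.5 * X[:, 1] + 0.4 * X[:, 2] + rng.normal(scale=0.3, size=100)
ranking_points = np.exp(log_points)                    # right-skewed, like real points

# Regress log(ranking points) on the five statistics.
coef, se, t = fit_ols(X, np.log(ranking_points))

T_CRIT = 1.99  # approximate two-sided 5% critical value, 94 degrees of freedom
for name, b, tv in zip(names, coef[1:], t[1:]):
    flag = "significant" if abs(tv) > T_CRIT else "not significant"
    print(f"{name:>24}: coef={b:+.3f}, t={tv:+.2f} ({flag})")
```

Comparing each t-statistic to a fixed critical value is equivalent to the P<0.05 test described above; a statistics package would report the p-values directly.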

The results are summarized below.

Service hold % is the more statistically significant of the two significant predictors (it had the larger t-statistic). As we can see from the graph below, which specifically examines service hold %, the tennis players who can hold serve sit atop the world rankings.

The variables found significant in this study are relatively unsurprising. While some players may not strictly adhere to these rules (like the big-serving but lower-ranked Ivo Karlovic), they are the exceptions. For the most part, the top servers and returners are the big names that make deep runs into the slams.

Perhaps the most interesting result of this study is that the other three variables studied (aces per game, break points saved %, break points won %) were not significant predictors of world ranking. It is certainly surprising that player performances in break point situations are insignificant determiners of world ranking. If you were to look at who currently leads the ATP tour early on this season in break point stats, names such as Rafael Nadal, Roger Federer, and Andy Roddick would frequent the list. Break points are considered the “clutch” moments in tennis, and the players that win them often attain insurmountable advantages in matches. One would assume that winning or fighting off break points (represented by break points won % and break points saved %) would result in match wins, which would increase ranking points. As an avid Andy Roddick fan, I figure I will still have a tough time shaking off his errant shots on break points, despite the knowledge of this study.

It is slightly less surprising that aces per game are not significant predictors of ranking points. While aces are a surefire way of winning points, it is a well-known fact that the best servers in tennis are not always the best players. Having an effective serve helps, but often the best servers are powerful players who are usually larger and therefore lack quickness and other attributes required for success in tennis. A look at the meager top 10 list in this statistical category provides confirmation of the finding (ATP ranking in parentheses): Ivo Karlovic (28), John Isner (22), Ivan Ljubicic (14), Sam Querrey (25), Andy Roddick (7), Ernests Gulbis (44), Michael Llodra (66), Jo-Wilfried Tsonga (10), Feliciano Lopez (35), Rajeev Ram (95).

Perhaps the largest weakness and limitation of this study is its inability to prove causation. This is because the study is simply observational. The significant predictor variables in this study are simply associated with player ranking points. Causation may exist for one or more of these variables, but cannot be proven. A randomized experiment or other studies that confirm the association found are needed to prove causation. Although only two of the five predictor variables are significant, all five variables are correlated with each other to varying degrees, and confounding variables may exist, such as a particular skill or trait that influences multiple predictors. It should also be noted that the two significant predictor variables represent proxies for a tennis pro’s skillset, and they should not be treated as exclusively predicting future performance. Another potential shortcoming of this study is that, other than aces per game, the predictor variables rely heavily on the quality of the player’s opponent. Therefore, a player who does not play the same schedule difficulty as the top players can accumulate inflated statistics.
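One common way to gauge how strongly predictors are correlated with each other is the variance inflation factor (VIF), which was not part of this study. The sketch below is only an illustration on synthetic data: the `variance_inflation_factors` helper, the variable names, and the "serve skill" trait shared by two of the predictors are all hypothetical. It relies on the identity that each predictor's VIF equals the corresponding diagonal entry of the inverse of the predictors' correlation matrix.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF for each column of X: diagonal of the inverse correlation matrix.

    A VIF of 1 means the predictor is uncorrelated with the others;
    larger values mean more of its variance is explained by the others.
    """
    corr = np.corrcoef(X, rowvar=False)  # predictors are in columns
    return np.diag(np.linalg.inv(corr))

# Synthetic stand-in data (NOT real ATP data): a hypothetical latent
# "serve skill" drives both aces per game and service hold %.
rng = np.random.default_rng(0)
n = 100
serve_skill = rng.normal(size=n)
aces_per_game = serve_skill + rng.normal(scale=0.5, size=n)
service_hold_pct = serve_skill + rng.normal(scale=0.5, size=n)
return_points_won_pct = rng.normal(size=n)  # generated independently of serve skill

X = np.column_stack([aces_per_game, service_hold_pct, return_points_won_pct])
vifs = variance_inflation_factors(X)

for name, v in zip(["aces_per_game", "service_hold_pct", "return_points_won_pct"], vifs):
    print(f"{name:>24}: VIF={v:.2f}")
```

High VIFs would suggest that the individual coefficients and their t-statistics are unstable, which is one reason a predictor can look insignificant when a correlated predictor is already in the model.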

Future studies can be conducted that examine other predictor variables, specifically ones that look at player attributes rather than match statistics. The players that constitute the upper tier of today’s tennis rankings are much taller and more highly concentrated in Europe than in past years. It would be interesting to see if any of these player attributes were predictive of ranking points. Logistic regression instead of multiple linear regression would probably provide a better study that could examine the likelihood of winning or advancing in tournaments.

#### harvardsports


• jakefisher723 says:

“If you were to look at who currently leads the ATP tour early on this season in break point stats, names such as Rafael Nadal, Roger Federer, and Andy Roddick would frequent the list.”

This seems to say that the best players in the world are the best in break point situations. But the overall data still says break points are not statistically significant?

• akcohen says:

Yea, and those three are still high up on those lists. It contradicts the data, which overall states that the break points are not statistically significant.

It’s also important to note that “break point stats” refer to two lists: break points saved, and break points won. The top players don’t all appear on both of those lists. The former refers to how well you handle break points when you are serving (Roddick is only great at that), and the latter refers to how well you break opponents when they are serving (Djokovic and Murray are only great at that).

• Vijaylakshmi Shetty says:

Hi Andrew,
I noticed that you studied Hold serve % and Return points won %. For the first variable you took games and for the second variable, you took points. Why not return games won % or why not service points won %?
There must be a reason you did this, could you explain?
I am trying to apply your results to see which of the promising new guys is likely to end up in top 20 or top 10 , so I want to know why you chose games for service but points for return.

Nice work, Andrew. Good to see a tennis post…especially since I read about this in ESPN The Magazine weeks ago. Those guys can predict the future.

• Enjoyed the post very much. I wonder if the break point data would be different if there was a statistic you could use that would state the percentage of games won when a player held a break point. Let’s say a player is up 40-0 on his opponent’s serve and loses three straight points to get to deuce. Then he wins the deuce and ad points. He’s now won 25 percent of break points.
Another player goes up 40-0 against serve and wins the point immediately. He’s won 100 percent of break points, but got the same result.
Similarly, is it better to be 1-2 on break points or 4-17? The raw numbers say the former, but we know it’s the latter. Is there a way to account for this?

• Enrique says:

I don’t think the break points stat is that surprising. Return points % being high should be a much better indicator, as it indicates what happens throughout the whole match. What happens on one particular point (break point chances) has a lot more variance because it is just one point.

It should be evident that creating break point chances when returning and avoiding break point chances when serving are way more important in terms of rankings. Because those things happen more often. For example in the US Open final, Nadal was only 4 for 22 on break point chances against Djokovic. But the fact that he created so many opportunities took its toll on Djokovic and made the match a Nadal dominated match, even though the score didn’t look as lopsided since Nadal couldn’t break more.

• ebh says:

Check break point opportunities. I think that is more important than the actual break point. Also, the magnitude of return points is more significant than service hold. That should be pointed out.

Also, create a joint variable of break points won + bp saved. That should be interesting.

• B.D. says:

ATP points are dependent across observations. Is there a special type of MLR that was used? If not, what are the estimation and inference implications of this dependency? Thanks, enjoyed the article.