“It’s not an easy place to go” is a saying often used by soccer commentators when evaluating a team’s prospects in an upcoming away match. Champions League journeys to Eastern European countries like Russia, Ukraine and Turkey are often met with this kind of analysis. With the Champions League group stage starting tomorrow, this got me wondering: are these places actually that difficult to go to, or are commentators just using such hyperbole to fill airwaves? I decided to investigate this claim and try to determine which place is the most difficult to go to in the UEFA Champions League.

To do this, I analyzed group stage results in the Champions League since the 2003/04 season (the first with the current format of 32 teams followed by a 16-team knockout stage). I only chose Champions League group stage matches because of the differing nature of knockout matches (two-legged aggregate ties) and the fact that the Europa League group stage often includes teams that rest key starters. To measure how strong a team (or country) is at home, I compared the number of points it earned in its home matches to the number earned in its away matches. I aggregated this across all seasons for all countries and measured the percentage of points taken at home. At the country level, I chose all countries that have won at least 50 points in group stage matches over the time period, and for clubs I chose all clubs that have participated in at least five group stages. At the country level, we get the following results:
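As a sketch, the home-strength metric can be computed like this. The match results below are hypothetical, and `points`/`home_points_share` are illustrative helper names, not anything from the original analysis:

```python
def points(goals_for, goals_against):
    """Points awarded for a single match: 3 for a win, 1 for a draw, 0 for a loss."""
    if goals_for > goals_against:
        return 3
    if goals_for == goals_against:
        return 1
    return 0

def home_points_share(matches):
    """matches: list of (venue, goals_for, goals_against), venue 'H' or 'A'.
    Returns the percentage of total group stage points earned at home."""
    home = sum(points(gf, ga) for v, gf, ga in matches if v == "H")
    away = sum(points(gf, ga) for v, gf, ga in matches if v == "A")
    total = home + away
    return 100 * home / total if total else 0.0

# Hypothetical group stage: 3 home matches, 3 away matches
season = [("H", 2, 0), ("H", 1, 1), ("H", 3, 1),
          ("A", 0, 2), ("A", 1, 1), ("A", 0, 0)]
print(home_points_share(season))  # 7 of 9 points at home -> ~77.8%
```

Aggregating these shares across all seasons for a club or country gives the ranking discussed below.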

Scotland comes in at number one. This can be explained by the intimidating atmospheres on hand at Celtic Park (extremely Irish nationalist and Catholic) and Ibrox (extremely pro-Union and Protestant) on Champions League nights. Celtic and Rangers are the only two teams to have represented Scotland in the group stages of the Champions League since 2004. The Netherlands are a very surprising inclusion at number two given the country’s strong economy, high standard of living and lack of renown for especially intimidating stadiums. Greece, Turkey and Russia are not surprising. In addition, the very successful countries are near the bottom because of a high denominator (in addition to winning many matches at home, they also win many matches away from home, thus lowering their ratio).

The club level tells a similar story; I’ve reproduced the top 10 and bottom 10 teams below (out of the 39 teams who met the criteria):

The presence of PSG, Real Madrid and FC Barcelona should not be too concerning. Their inclusion is due to their overwhelming success in away matches, which inflates their denominator. French teams’ struggles at home are interesting to see. On the top end, Celtic Park again overwhelmingly asserts its dominance as a fortress, with Istanbul-based Galatasaray coming in second.

In conclusion, the places that commentators often describe as “difficult to go to” on a Champions League away day (Scotland, Ukraine, Greece, Russia and Turkey) do in fact rank amongst the hardest away days. However, we were also able to uncover other interesting places that are comparatively very strong at home (the Netherlands and the 2 German clubs) that one might not originally think of. We also found that extremely developed countries (England, Spain, France) performed comparatively worse in their home matches. This is likely explained by the extra comfort experienced by away teams when visiting these countries (and, in the case of England, tamer stadiums).

What are your thoughts on this? Is Celtic Park the toughest away day in Europe? Were the results more or less what you were expecting? Let us know in the comments below.

Editor’s Note: If you have any questions about this article, please feel free to reach out to Andrew at andrewpuopolo@college.harvard.edu.

Editor’s Note: This article was originally completed as an assignment for Harvard’s Sophomore Tutorial (Economics 970: Sports Economics) that asked students to conduct an economic evaluation of the Chargers’ move from San Diego to Los Angeles.

After spending 55 years in San Diego, the Chargers are moving back to Los Angeles this upcoming NFL season. Until the Los Angeles Stadium at Hollywood Park is completed in 2019, the Chargers will play their home games at the StubHub Center, with a maximum capacity of 30,000 (Florio 2017). The NFL’s average attendance per game is nearly 70,000 (Florio 2017), and the Chargers are expecting to sell out every home game. The decrease in the supply of seats has forced the franchise to raise ticket prices to a league-high average of $192 per ticket; the league average in 2016 was $93 per ticket (Florio 2017). This paper aims to forecast the revenue and franchise value of the Los Angeles Chargers following their move from San Diego.

To get a wide view of how different variables affect franchises’ revenue and current values, I used 2016 NFL Valuations from Forbes. I analyzed the following variables for all 32 teams: current value (billions), one-year percent value change, debt/value, revenue (millions), operating income (millions), 2016 player salaries (millions), 2016 average attendance, 2016 win percentage, number of Super Bowl wins in franchise history, gate revenue (millions), metropolitan area population (millions), stadium capacity and average ticket price. Using Stata, I regressed every variable against revenue and current value, and narrowed down the variables by eliminating the ones that weren’t statistically significant. I created the following model for revenue: expected revenue (y hat) = -8.397 – .197(one-year percentage value change) + 1.285(operating income) + 1.188(player salaries) + 0.0009(average attendance). For current value my model was: expected value (y hat) = -1.24 + .013(one-year percentage value change) + .006(revenue) + .018(metropolitan area population) + .009(average ticket price). For one-year percentage value change, I used the 2016 change for the Los Angeles Rams (100%), who moved to California last year from St. Louis. This allowed me to factor in the effect that moving to Los Angeles has on the value of a franchise. For average attendance, I used 30,000, because the Chargers announced that they expect to sell out every home game next year. The coefficients of the revenue model can be interpreted as follows: holding all else constant, we would expect a $197,000 decrease in revenue for each one percent increase in one-year percentage value change, a $1,285,000 increase in revenue for every $1,000,000 of operating income, a $1,188,000 increase in revenue for every $1,000,000 of player salaries and a $900 increase in revenue for a one-person increase in average attendance.
We would expect a franchise to have a revenue of -$8,397,000 if they had no one-year value change, operating income, player salaries or attendance at their games. Interpretations for the coefficients of the current value model would be as follows: holding all else constant we would expect a $13,000,000 increase in value for a one-percent increase in one-year percentage value change, a $6,000,000 increase in value for a $1,000,000 increase in revenue, an $18,000,000 increase in value for a 1,000,000 person increase in metropolitan area population and a $9,000,000 increase in current value for a $1 increase in average ticket price. We would expect a franchise to have a current value of -$1,240,000,000 if they had no one-year value change, revenue, metropolitan area population and ticket price. According to my models, the Chargers would earn $330,630,000 in revenue in Los Angeles, which is a $13,370,000 decrease from the $344,000,000 they earned in San Diego in 2016. Additionally, their franchise value would increase $520,000,000 from $2,080,000,000 to $2,600,000,000.
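The two fitted equations above can be coded up directly, as a sketch. Units follow the article (revenue, operating income and salaries in millions of dollars; franchise value in billions; population in millions). In the usage lines, the 100% value change, 30,000 attendance and $192 ticket price come from the article, while the operating income, salary and metro population figures are placeholders, not the actual Chargers inputs:

```python
def expected_revenue(pct_value_change, operating_income_m, player_salaries_m, avg_attendance):
    """Fitted revenue model from the article; output in $ millions."""
    return (-8.397
            - 0.197 * pct_value_change
            + 1.285 * operating_income_m
            + 1.188 * player_salaries_m
            + 0.0009 * avg_attendance)

def expected_value(pct_value_change, revenue_m, metro_population_m, avg_ticket_price):
    """Fitted franchise-value model from the article; output in $ billions."""
    return (-1.24
            + 0.013 * pct_value_change
            + 0.006 * revenue_m
            + 0.018 * metro_population_m
            + 0.009 * avg_ticket_price)

# Placeholder operating income (100M), salaries (150M) and metro population (13.3M)
rev = expected_revenue(100, 100.0, 150.0, 30_000)
val = expected_value(100, rev, 13.3, 192)
print(rev, val)
```

Feeding the model's revenue prediction into the value model, as the second call does, mirrors how the two equations chain together in the article.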

While my models take a lot of variables into account, they still have many limitations. For example, neither of the models take into account the fact that NFL teams collect 60% of the gate revenue from home games and 40% of the gate revenue from away games. Adding this into the model could end up increasing the Chargers’ revenue because they would be gaining revenue from their away games in stadiums that have capacities much greater than 30,000. Also, while I used the one-year percentage value change that the Los Angeles Rams experienced following their move from St. Louis, it is difficult to assume the Chargers will experience the same exact increase. They are moving from a city much closer to Los Angeles, and that could have an effect on their future revenue and value. Lastly, the model does not take into account the fact that there will now be two NFL teams in Los Angeles, and that football fans in the city are likely going to be divided between the two teams (e.g. New York Yankees and New York Mets, Los Angeles Lakers and Los Angeles Clippers). While the models do have their limitations, they provide a good estimate for the revenues and values of NFL franchises, and can assist the Chargers in forecasting the results of their move to Los Angeles.

Regressions:

Works Cited

Florio, Mike. “Chargers Announce Season Ticket Prices for StubHub Center.” ProFootballTalk. N.p., 14 Feb. 2017. Web. 21 Feb. 2017.

“2016 NFL Valuations.” Forbes. Forbes Magazine, 2016. Web. 21 Feb. 2017.

If you have any questions for Alexander, feel free to reach out to him at alexandermeade@college.harvard.edu.

Rhys Hoskins has been off to quite a tear to start his MLB career: 9 Home Runs in his first 54 at bats, 11 in his first 64, and he’s currently at 12 HR in 85 at bats. That got me thinking: how unlikely is it to hit 11 home runs in any 64 at bat sequence?

To measure this, I calculated the probability of a binomial random variable with 64 trials producing at least 11 home runs (the upper tail of its cumulative distribution function). I used quite a few different probabilities for the chance of hitting a Home Run in a particular at bat. They were as follows:

1.) The average number of home runs per at bat for all MLB players in 2016 (5610 Home Runs in 165561 at bats)

2.) Mark McGwire’s 1987 rookie season (record number of Home Runs for a rookie in a season, 49 in 557 at bats)

3.) Barry Bonds’ 2001 season (73 in 476 at bats)

4.) Mark McGwire’s career (best HR/AB ratio in MLB history, 583 Home Runs in 6187 at bats)
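Each of these rates plugs into the upper tail of a Binomial(64, p) distribution. A short sketch using only the standard library:

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): at least k home runs in n at bats,
    treating each at bat as independent with home run probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# The four home run rates listed above
rates = {
    "2016 MLB average":      5610 / 165561,
    "McGwire rookie (1987)": 49 / 557,
    "Bonds (2001)":          73 / 476,
    "McGwire career":        583 / 6187,
}
for name, p in rates.items():
    print(f"{name}: P(>=11 HR in 64 AB) = {prob_at_least(11, 64, p):.3g}")
```

For the league-average rate this tail probability works out to roughly one in a hundred thousand 64-at-bat sequences.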

Assuming that every at bat is independent and identically distributed, the probabilities for each of these players hitting at least 11 Home Runs in any 64 at bat cycle were:

While this does exhibit how absurd Barry Bonds’ 2001 season was, it also shows that even for the best players of all time, this event is still extremely rare.

Looking at the average MLB player, though, shows how absolutely astonishing this feat is. For the average MLB player, we would expect this to happen once in every 104779 sequences of at bats. Assuming the average MLB player takes 600 at bats in a season, this means that for any one player this would happen approximately once every 174 years.

Astonishing.

In tennis (and often other sports) debates, a champion’s path to victory is often debated subjectively – whether they were lucky in the quality of opposition they faced, or vice versa. With Roger Federer, arguably the Greatest of All Time, winning his 19th Grand Slam and record 8th Wimbledon, I decided to inspect the paths to victory in the era of the Big Four. In order to objectively determine the difficulty of winning a Grand Slam, a very simple metric was used: each champion was credited with the cumulative ATP points (as they stood at the time of the Grand Slam in question) of all seven opponents he defeated along the way to the title. There are a few flaws with going about the evaluation this way, but I stuck with it for simplicity and to remove any trace of subjectivity. Other ideas I considered were weighting tournament end-stages higher (due to pressure), or weighting the opponent’s previous results in that tournament higher (e.g. Nadal is stronger on the clay of Roland Garros than at the US Open). Similarly, accounting for the amount of points a No.1 ranked champion has removed from the tour could also be done in a more complicated, albeit slightly subjective, model (Djokovic 2011/2015 or Federer 2006 probably faced a tougher No.2 than the corresponding No.2’s ranking suggested, purely because of the points they themselves denied the No.2 the previous year). Finally, things like a sudden increase in opponents’ form or a strong player coming back from injury (with low ranking points) could not be properly reflected in the difficulty of the draw, but overall, simply choosing ranking points seemed to depict opposition quality quite well.

Now, before the results, some notes about the method:

· Due to the complexity of the ranking point system change in 2009, the points before 2009 were doubled to be comparable, although in actuality there are nuances in the differences of each system.

· The Grand Slams included begin from Wimbledon 2003, as Federer’s first win signified the first GS won by the Big 4. It’s nice to have come full circle with Wimbledon 2017, although the last two slams (French 2017 and Wimbledon 2017) were omitted due to lack of data.
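The tally itself is trivial to compute; here is a minimal sketch, where the seven opponent point totals are hypothetical and `path_difficulty` is an illustrative name:

```python
def path_difficulty(opponent_points, pre_2009=False):
    """Cumulative ATP ranking points of the seven defeated opponents.
    Pre-2009 totals are doubled to be roughly comparable with the
    post-2009 points system, per the note above."""
    assert len(opponent_points) == 7, "a Grand Slam title takes seven wins"
    total = sum(opponent_points)
    return 2 * total if pre_2009 else total

# Hypothetical draw: opponents from an early-round qualifier up to the No.2 seed
print(path_difficulty([855, 1190, 2235, 3615, 5010, 6790, 9145]))  # 28840
```

The higher this number, the stronger (by ranking points) the opposition the champion had to beat.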

So, after analysis, this is what the results throw up:

Notably, out of Wawrinka’s three Grand Slam victories, two ranked as the toughest and third-toughest paths on the list. As the only multiple-slam winner outside of the Big 4 in their era, this is significant and depicts how hard it is for someone outside of that group to win. Nadal’s 2010 French Open win was the easiest slam on record, and it was the only French Open he won between 2005 and 2011 without having to beat Federer. As the grass at Wimbledon is notorious for causing upsets, it is not often that seeds hold their ground all the way through, and although Federer’s 2007 Wimbledon victory is the hardest Wimbledon won, it ranks relatively low on the list when other slams are included. Australian Opens tend to have consistently difficult paths to victory, as a result of seeds playing to their rank and reaching the final stages more often than they do now. Here’s the list of the top 10 hardest paths to victory since Wimbledon 2003:

As mentioned, the fast courts of Wimbledon and the US Open seem to throw up a lot of upsets and ensure the winner doesn’t always run into the expected seeds along the way. Contrast that with the French Open, and to a lesser extent the Australian Open, where the top seeds almost always reach the later stages. Four out of the six non-Big4 Grand Slam wins since Wimbledon 2004 (Cilic being the exception) make up the first 7 spots, just going to show how remarkably difficult it was for the fringe Top 10 players to break through in such a dominant era. Stan Wawrinka, who has only 1 Masters 1000 title yet 3 Grand Slams, proves yet again that when he catches fire he can go through anybody. And Juan Martin del Potro, who flat-forehanded his way to the top and seemed destined for greatness, was unfortunately riddled by wrist injuries. An obvious reason why non-Big4 members dominate the top of this metric is that they are typically lower ranked than Federer, Nadal, Murray and Djokovic, and thus need to pick off one or more of them on the way to winning the title, as both Wawrinka and del Potro memorably did.

Another interesting fact is that certain opponents seem to lose to the eventual winners unusually often. For example, the unlucky Lleyton Hewitt, himself a two-time Grand Slam winner prior to 2003, lost 13 times to the eventual Grand Slam winner, including 6 times in Round 4. This is highly unusual because most of these exits were not finals or semi-finals, meaning he was drawn in the same quarter or eighth as the eventual champion far more often than expected. The usual suspects, of course, reached the end stages often enough to lose to the eventual champion – Federer (18), Djokovic (18) and Murray (13). Interestingly, Nadal has lost only 8 times to the eventual champion, due to early exits in tournaments where he plays poorly or is injured.

So, there you have it, the hardest Grand Slams won over the past decade and change. Wawrinka seems to lead the way when he’s on song, which somewhat explains his mercurial career.

A new season of European soccer is upon us. The Premier League kicked off this past weekend with some exciting matches and a major upset as Burnley defeated defending champions Chelsea at Stamford Bridge. Over the next couple of weeks, the other major European leagues will kick off their new season and fans will be excited to watch their teams battle it out for a title.

Or will they? Many soccer fans are becoming increasingly worried that the game is becoming more and more unequal. The superpower teams have access to a clear majority of the resources, especially the ever-increasing television revenues, and it seems as though those teams will dominate their respective leagues for many years to come. Last season, the top seven teams in England also had the seven highest budgets, with 8th place Southampton closer to the relegation zone than to 7th place Everton. In Italy and Germany, it was business as usual as Juventus and Bayern Munich won their leagues for the 6th and 5th consecutive time respectively. In Spain, the top 3 was the same for the fourth season running.

Two years ago, current HSAC president Brendan Kent took a look at comparing the parity in MLS to the parity in the Big 5 European leagues using Gini coefficients, and came to the conclusion that MLS had much more parity than its European counterparts.

Inspired by Brendan’s work, I decided to expand upon it to test the following:

1.) To confirm Brendan’s work from 2 years ago across a longer time horizon (15 years).

2.) To show that the English League Championship, dubbed “the hardest league in the world,” has levels of parity higher than its top division counterpart.

3.) To try to determine which European league has exhibited the highest levels of parity over the last 15 years.

4.) To see if levels of parity have shifted in certain leagues over time.

To do this, I calculated the Gini coefficient of every season dating back to 2003 in the following six leagues:

1.) English Premier League

2.) Spanish La Liga

3.) Italian Serie A

4.) German Bundesliga

5.) Major League Soccer

6.) English Football League Championship

The first four of these leagues were chosen because they are widely regarded as the four best leagues in the world, having provided every single Champions League finalist since 2004. Major League Soccer was chosen as a control to test my methodology: MLS has a salary cap, so all teams theoretically have equal resources. The English Football League Championship was chosen as a direct comparison to the English Premier League, since it shares the same league structure and sporting culture but has more income equality across the entire division.

For those unfamiliar with economic theory, the Gini coefficient is a measure of income inequality within a country. A country with a Gini coefficient of 100 has one person holding all the economic resources, while a country with a Gini coefficient of 0 has perfect equality. In this analysis, we use the number of points achieved by each team as our unit of income. The higher the Gini coefficient, the more inequality there is across the league.
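As a quick sketch, the Gini coefficient of a final table can be computed from each team's points (the point totals below are hypothetical):

```python
def gini(values):
    """Gini coefficient on the 0-100 scale used above: 0 means every team
    has the same points; values near 100 mean points are concentrated in
    one team (with n teams the maximum is 100 * (n - 1) / n)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    # Standard rank-weighted formula, equivalent to the mean absolute difference form
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 100 * (2 * weighted / (n * total) - (n + 1) / n)

print(gini([10, 10, 10, 10]))  # perfectly equal league -> 0.0
print(gini([0, 0, 0, 40]))     # one team takes everything -> 75.0 (max for n=4)
```

Running this on each season's points column gives the per-season league coefficients plotted below.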

Before running any statistical tests, I will first graph the parities of each of the six leagues on the same set of axes, followed by the identical graph of the four major European Leagues.

Note: When calculating the league points for each team in the league, the number of points earned through results was used instead of the number of points on the final league table (in cases of teams being deducted points).

There are a few important things to note here before we dive deeper with more advanced statistical tests.

1.) For the most part (12 of the 15 seasons), the MLS had the lowest Gini coefficient.

2.) The English League Championship had a lower Gini coefficient than the Premier League in every season recorded, seemingly confirming our hypothesis.

3.) The league with the highest Gini coefficient has changed over time. Italy had the highest coefficient around the time of the infamous Italian match fixing scandal. In the last part of the first decade of this century, the Premier League took over, with the same four teams finishing in the first four places for 6 consecutive years. In recent years, the growing income gap between the 2 Madrid clubs and Barcelona on one side and the rest of La Liga on the other has led to growing levels of inequality.

4.) La Liga’s Gini coefficient has steadily trended upward over the last 15 years while most of the other leagues remained constant.

We also look at the average Gini coefficient of each league.

This would imply that the Premier League has had the lowest levels of parity over the past 15 seasons (contradicting the argument put forward by many proponents of the English game), but we ought to dig deeper into that claim given that the average Gini coefficients of the Big 4 European leagues over the last 15 years are pretty close.

We will now test each of our hypotheses outlined above using statistical methods.

**Question 1: Does the MLS exhibit more parity than the other 5 leagues?**

The first question we are trying to answer is an extension of Brendan’s work applied to a larger time sample. We will first run an analysis of variance test on the six leagues.

This tells us there is strong statistical evidence that the average Gini coefficients of the 6 leagues are not all the same. We now test the MLS against the league with the second lowest coefficient using a t-test.

From this, we can conclude at the 90% confidence level that MLS has demonstrated the highest level of parity over the last 15 years of any of the six leagues we tested.

**Question 2: Does the Championship exhibit more parity than the Premier League?**

Our second question was whether the English League Championship exhibits much greater levels of parity than the English Premier League as a result of more comparable budgets across the league, despite otherwise identical structures (culture, travel, stadiums etc.). We also test this using a t-test.

From this, we can confirm our second hypothesis that the Championship has much higher levels of parity than the Premier League.

**Question 3: Is there a difference in parity between the four biggest leagues?**

For our third question, measuring parity differences between the Big Four leagues, we run an analysis of variance test on the top 4 European leagues.

Here, we find no statistically significant evidence that the Gini coefficients of the four leagues differ.

**Question 4: Has the parity in leagues changed over the years?**

To test this, I regressed each league’s Gini coefficient against time to see if there was a statistically significant slope implying that levels of parity have changed over the last 15 years. After conducting this analysis for all 6 leagues, I found that only Spain’s La Liga exhibited a statistically significant change: on average, La Liga’s Gini coefficient increased by about .5 per year. This makes sense, given the increased dominance of Real Madrid and Barcelona (and now Atletico Madrid) over the last 15 years.
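The trend test boils down to an ordinary least-squares slope of Gini on season. A minimal sketch, with a hypothetical Gini series rising about 0.5 per year as reported for La Liga:

```python
def ols_slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

seasons = list(range(2003, 2018))                 # 15 seasons
ginis = [20 + 0.5 * (s - 2003) for s in seasons]  # hypothetical upward trend
print(ols_slope(seasons, ginis))  # 0.5
```

The real analysis would also check the slope's standard error before calling a trend significant; only La Liga's passed that bar.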

To summarize, we came to the following conclusions:

1.) MLS has higher levels of parity than the major 4 European Leagues

2.) The English League Championship has significantly higher levels of parity than the English Premier League

3.) There is no real difference in parity levels across the four major leagues

4.) Spain’s La Liga has exhibited diminishing levels of parity over the last fifteen seasons.

Sorry Premier League fans, but your league unfortunately does not have any more parity than the “two team leagues” that you so often deride.

Editor’s Note: If you have any questions about this article, please feel free to reach out to Andrew at andrewpuopolo@college.harvard.edu.

Tomorrow night, Arsenal and Leicester City kick off the 25th season of the English Premier League at the Emirates Stadium. Every summer, the vast majority of media attention focuses on the six biggest clubs, and this summer has been no different. Manchester City have bought every expensive goalie and defender under the sun; Arsenal, Manchester United and Chelsea have all signed expensive new starting strikers; Liverpool have signed Mohamed Salah from AS Roma and Tottenham Hotspur have done nothing besides sell Kyle Walker. Every day it seems there is some expert trying to predict the exact permutation of where those six teams will finish, an exercise that is likely futile given how much can happen during a season with injuries. Almost no one predicted Chelsea would win the title last season, let alone with such ease.

However, a main thrill of watching the Premier League is following what happens towards the bottom end of the table. Relegation battles are often very tense and have so much at stake. Every season, three new teams are promoted to the Premier League and add a different flavor to the league. In years past the Premier League has been treated to the likes of free-spending Queens Park Rangers, audacious Blackpool and tenacious Bournemouth, as well as teams that have since become Premier League regulars, like Southampton, Crystal Palace and West Ham United.

This year the three promoted teams have very different profiles. Leading the way is Newcastle United, a big club in the Northeast of England whose legends include Premier League all-time leading scorer Alan Shearer, and who play their home matches at 52,400-seat St. James’ Park. Coming up straight behind them after an epic final day title battle is Brighton and Hove Albion. Albion play on the south coast of England in the historic seaside city of Brighton at the 30,000-seat American Express Community Stadium, which opened in 2011. Brighton last played a top flight season in 1982/83, when they were relegated and lost the FA Cup Final replay 4-0 to Manchester United (after a 2-2 draw in the first game). They’ve since been down to the fourth division (and almost beyond) and back. The last promoted team is Huddersfield Town, who were Champions of England three consecutive years from 1923 to 1925 but have not played a top flight season since 1972. They were promoted after winning successive penalty shootouts against Sheffield Wednesday and Reading in the Championship playoffs.

I decided to build a model to determine each of these three teams’ chances of avoiding relegation this season. To do this, I conducted a logistic regression on many different variables for each club promoted to the Premier League since 1992. To be as complete as possible, I collected almost any variable I could think of that might have an effect on teams staying up. However, since reliable transfer data is very difficult to get hold of, I did not include transfer spending in the summer leading into the Premier League. The variables I decided to use in my initial regression were:

– Finishing position in the previous seasons’ Championship

– Points in the previous seasons’ Championship

– Whether the team came up through the playoffs

– Whether the team entered the Championship from the first or third tier

– Bounce back (teams relegated from Premier League previous year)

– Honeymoon (first top flight season in over 30 years)

– Familiarity (had spent at least one season in Premier League in previous 6 years, but not a bounce back team)

– Stadium Size (in thousands)

– Location (North, South, Midlands, London)

– Whether or not the club is a “big club” *

In addition, I also took interactions of all these variables with Championship points to see if there was hidden correlation. After running my logistic regression (R output at bottom of article), I found that the following three variables were statistically significant at the 10% level.

– Interaction between points and whether the team were promoted from the third tier

– Interaction between points and bounce back

– Interaction between points and big club
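To make the mechanics concrete, here is a sketch of how a fitted logistic model over these three interaction terms turns a promoted team's profile into a survival probability. The coefficients below are placeholders (the fitted values are in the R output at the bottom of the article), chosen only so that a team with none of the binary flags lands near the model's 26% baseline:

```python
from math import exp

def p_stay_up(points, from_third_tier, bounce_back, big_club,
              b0=-1.05, b_third=0.01, b_bounce=0.02, b_big=0.015):
    """Logistic model: linear score over the significant points-x-flag
    interactions, passed through the logistic function. Coefficients
    are illustrative placeholders, not the fitted values."""
    score = (b0
             + b_third * points * from_third_tier
             + b_bounce * points * bounce_back
             + b_big * points * big_club)
    return 1 / (1 + exp(-score))

# A bounce-back "big club" with, say, 94 Championship points (hypothetical inputs)
print(round(p_stay_up(94, from_third_tier=0, bounce_back=1, big_club=1), 3))
```

Note that with placeholder intercept -1.05, any team with all flags at zero gets 1/(1+e^1.05), roughly 26%, regardless of points, which is exactly the flaw discussed below.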

For this year’s crop of teams, Newcastle are considered both a “big club” and a bounce back from the Premier League, while Brighton and Huddersfield entered the Championship from the third tier. Our logistic regression gives the following probabilities of staying up.

These probabilities seem pretty reasonable; however, the main flaw with this model is that all teams that do not hit any of the binary variables above have a 26% chance of staying up. To show the plausibility of this model, here is what the model predicts for the last 15 teams promoted from the Championship, along with whether they stayed up or not.

This season seems sure to provide excitement at the top and bottom of the Premier League, and these three teams will be fighting for their lives right up until the very end. Be sure to tune in!

Editor’s Note: If you have any questions about this article for Andrew, please feel free to reach out at andrewpuopolo@college.harvard.edu

*The designation of big club was somewhat arbitrarily chosen based on my view of the stature of promoted clubs. For this analysis, the teams included were West Ham United, Newcastle United, Sunderland, Derby, Portsmouth, Manchester City, Blackburn Rovers, Ipswich Town and Nottingham Forest.

R Output:

Every sports fan knows the importance of averages. Cricket is no different, with the batting average being the primary yardstick most batsmen are gauged by. A hallmark of a great Test batsman has always been the golden 50, and most modern greats hover around or above there. An interesting anecdotal observation viewers often make is that “so-and-so has to get out before he faces 40 balls, else it is almost impossible”. So I set out to analyze batting averages based on the length of the innings. The methodology is simple. On the x-axis is the number of balls (or more) the innings has lasted. On the y-axis is the average for innings that have lasted at least “X” number of balls. Therefore, at x=0, everyone’s average is their true career average. At x=10, the y-axis indicates the average of innings which have lasted at least 10 balls, and so on. Obviously, as X increases, the trend of averages is to go up, as longer innings indicate bigger scores. The occasional drop in an individual average as x increases is caused by not-out innings: a not-out innings contributes runs without a dismissal, so when it falls below the threshold and is removed, the average drops.
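A minimal sketch of this conditional average, using the cricket convention that average = runs divided by dismissals (so not-outs add runs but no dismissal). The career innings below are hypothetical:

```python
def average_at_least(innings, x):
    """innings: list of (balls_faced, runs, dismissed). Batting average
    restricted to innings that lasted at least x balls."""
    subset = [(b, r, out) for b, r, out in innings if b >= x]
    runs = sum(r for _, r, _ in subset)
    outs = sum(1 for _, _, out in subset if out)
    return runs / outs if outs else float("inf")  # undefined if never dismissed

# Hypothetical career: (balls, runs, dismissed)
career = [(5, 0, True), (30, 12, True), (80, 55, False), (120, 90, True), (200, 150, True)]
print(average_at_least(career, 0))   # (0+12+55+90+150)/4 = 76.75
print(average_at_least(career, 50))  # (55+90+150)/2 = 147.5
```

Sweeping x from 0 upward and plotting `average_at_least` produces exactly the curves analyzed below.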

For Test batsmen, greats from the 1990s to today were included, as well as the modern leaders. While most of these players have career averages between 50 and 60, Australian Steven Smith separates himself from the pack after around 50 balls. Similarly, Rahul Dravid (India) and Joe Root (England) never seem to be really set, and separate themselves towards the bottom. What this means is that Smith is much harder to get out once he has his eye in, after about 50 balls. While this is true of most batsmen, Smith’s case seems extraordinarily so, and bowlers must try to set attacking fields and get him out before he gets stuck in. Dravid and Root paint the opposite picture, never really benefiting from spending time at the crease initially. The major caveat to this analysis is that it ignores batsmen’s strike rates. Dravid is expected to feature lower down due to his slower rate of scoring, so in his case the analysis might be slightly misleading.

The same analysis was conducted for One Day Internationals (ODIs), which threw up some interesting results. Most notably, South African AB de Villiers explodes off the chart after around 40 balls, spelling ominous signs for bowlers around the world. This definitely fits the visual storyline, where he sometimes gifts his wicket away early on in the pursuit of quick runs, but once he spends time batting, is one of the hardest in the world to dismiss.

On April 9, Russell Westbrook broke Oscar Robertson’s purportedly unbreakable regular season record of 41 triple doubles against the Denver Nuggets. Some have called the feat arbitrary, while others say it is enough to guarantee MVP hardware. How important is a triple double if Russ is changing his on-court behavior to pad certain stats, as some suggest? Does Russell Westbrook push teammates away for rebounds and pass excessively for assists when he is close to a triple double? In this post, I’m going to use play-by-play data for the entire 2016-17 season to look for evidence of Russ stat-stuffing.

I’ll rely on the Basketball Reference play-by-play regular season database, which includes Russ’s 867 total rebounds and 840 assists. I’m only going to look at the rebound and assist categories since Russ hasn’t had any trouble reaching double-digit points in any game this year (unlike the Ricky Rubio/Draymond Green-style triple double). I’ll first analyze rebounds and assists up to 10 in each game since that is, of course, the threshold for a triple double. Later in the article, I’ll expand the analysis to test for discontinuities at the double-digit threshold.

In general, I’ll test whether Russ sped up his stat accumulation in situations in which he needed rebounds/assists to reach a triple double. So, unless otherwise noted, the dependent variable is the time since last stat (TSLS), measured in minutes. A lower TSLS implies that Russ is accumulating rebounds/assists faster than normal, and a higher number means he has slowed down in that stat category.

The primary independent variable is Distance, the number of rebounds/assists away from a triple double in that stat category. According to the stat-stuffing hypothesis, Russ is more likely to target rebounds/assists when he is close to achieving double digits in those categories. I also tested nonlinear transformations of Distance and interactions involving it, but these were insignificant and added no predictive ability beyond the linear specification.

I’ve also decided to control for the Time Remaining in the game since Russ is more likely to be on the court in late-game situations. This control does not affect any of the results but does address an important source of omitted variable bias; even when the Time Remaining control is excluded, the results don’t change meaningfully. I’ve also tested for multicollinearity between Distance and Time Remaining using the VIF and found that the variables are far from collinear, so the parameter estimates are stable and their standard errors reliable.
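As a sanity check on this specification, here is a minimal sketch of the model being fit. The data below are synthetic (the real analysis uses the Basketball Reference play-by-play rows), and the linear rule generating TSLS is invented so that the recovered coefficients are known in advance:

```python
import numpy as np

# Synthetic play-by-play observations (illustration only).
rng = np.random.default_rng(0)
distance = rng.integers(0, 10, size=500).astype(float)  # stats short of double digits
time_rem = rng.uniform(0.0, 48.0, size=500)             # minutes left in the game
tsls = 6.31 + 0.87 * distance - 0.23 * time_rem         # invented noiseless rule

# OLS of TSLS on Distance, controlling for Time Remaining.
X = np.column_stack([np.ones_like(tsls), distance, time_rem])
beta, *_ = np.linalg.lstsq(X, tsls, rcond=None)
print(beta.round(2))  # recovers 6.31, 0.87, -0.23 on this noiseless data
```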

__Rebounds:__

I’ll start by looking at Russ’s rebounding. In the simplest regression of TSLS on Distance (controlling for Time Remaining), the coefficient on Distance is positive and significant below the 1% level.

Constant | 6.31*** |

Distance | 0.87*** |

Time Remaining | -0.23*** |

This suggests that Russ is getting rebounds 0.87 of a minute (about 52 seconds) earlier for each rebound closer to double digits. For example, Russ is getting his 9^{th} rebound 52 seconds earlier than his 8^{th} rebound on average, after controlling for time remaining. This effect is both large and highly significant. The coefficient on Time Remaining is negative and statistically significant, as expected: as the time remaining in the game decreases, Russ’s average time before his next rebound also decreases.

Next, I test the idea that Westbrook is more likely to stat-stuff in games in which OKC is far ahead or well behind. If his stat-stuffing behavior is dependent on game state, then maybe this behavior is innocuous from a competitiveness perspective. To test this, I include the absolute value of the score margin at the time the stat occurs and a binary indicator for losing/winning as control variables. I will also include the interaction of the absolute value of the margin with the losing/winning indicator to see if Westbrook’s behavior is asymmetric in an OKC blowout win vs. an OKC blowout loss.

In the results below, we see that the Distance and Time Remaining coefficient estimates have not changed when controlling for game state. Additionally, the absolute value of the score margin is significant below the 1% level. This result is robust in regressions without the Losing and Interaction terms as well.

Constant | 4.97*** |

Distance | 0.90*** |

Time Remaining | -0.21*** |

Abs | 0.10*** |

Losing | 0.24 |

Interaction | 0.00 |

It appears Russ isn’t more likely to stat-stuff rebounds in blowout games. In fact, Russ actually slows down his rebound accumulation when the game gets out of hand (as judged by the coefficient on Abs). Additionally, Losing and the interaction of Abs and Losing are relatively small and insignificant, which indicates that Russ’s rebounding behavior is approximately the same in wins and losses.

__Assist Results:__

I’ll now perform the same analysis on assists. Interestingly, the results are extremely similar to the rebounding case in that there is clear evidence of Russ stat-stuffing with assists.

Constant | 7.88*** |

Distance | 1.04*** |

Time Remaining | -0.30*** |

This suggests that Russ is getting assists 1.04 minutes (or 62 seconds) earlier for each assist closer to double digits. For example, Russ is getting his 9^{th} assist 62 seconds earlier than his 8^{th} assist on average after controlling for time remaining. This effect is large and very significant.

Below I test the same game state variables for assists to see if Russ is conveniently stat-stuffing assists. Unlike rebounds, I find that the absolute value of the score margin is insignificant. This indicates that Russ may be backing off of rebounds in blowout games, but that the margin has no effect on his assist behavior. Like rebounds, the Losing dummy and the interaction of Absolute Margin and Losing are insignificant and small.

Constant | 7.99*** |

Distance | 1.02*** |

Time Remaining | -0.30*** |

Abs | -0.02 |

Losing | -0.13 |

Interaction | 0.03 |

The evidence from the rebound and assist data suggests that Westbrook is *significantly more likely to target rebounds and assists if he happens to be closer to completing a triple double in that statistical category*. Interestingly, Russ appears to stat-stuff more on assists than rebounds, to the tune of roughly 10 seconds per stat away from double digits (62 vs. 52 seconds).

__Discontinuity Over 10:__

I’m now going to include data beyond the double-digit threshold to test whether a discontinuity exists in the time it takes Russ to record his next rebound/assist after reaching double digits. To do this, I turn to a variable I’ll call Game Total, which is simply the total number of rebounds/assists Russ has in the game up to that point (essentially the mirror image of Distance), and a variable called Over 10, a dummy indicating whether Russ is over double digits in that stat category. Below, I’m regressing TSLS on Game Total and Over 10, controlling for Time Remaining, for rebounds.

Constant | 13.99*** |

Game Total | -0.76*** |

Time Remaining | -0.21*** |

Over 10 | 1.02* |

The coefficient on Over 10 is positive and statistically significant at the 10% level. Because Over 10 is a level shift, the slowdown appears at the threshold: Russ’s 11th rebound comes 0.26 of a minute (about 16 seconds) slower than his 10th (1.02 - 0.76), controlling for time remaining. Below, one can see that the assists data supports the same narrative.

Constant | 16.84*** |

Game Total | -0.91*** |

Time Remaining | -0.28*** |

Over 10 | 1.70*** |

For assists, the Over 10 coefficient is large, positive, and statistically significant below the 1% level. This means Russ’s 11th assist comes 0.79 of a minute (about 47 seconds) slower than his 10th (1.70 - 0.91), controlling for Time Remaining.
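Both jumps can be read straight off the fitted coefficients in the two tables above. The sketch below plugs the table values into the model; the Time Remaining term cancels when comparing the 11th stat to the 10th, so the 12-minute value used here is arbitrary:

```python
# Predicted TSLS from the discontinuity models, using the fitted coefficients.
def predicted_tsls(game_total, const, b_total, b_over10, b_time, time_rem=12.0):
    over10 = 1.0 if game_total > 10 else 0.0
    return const + b_total * game_total + b_over10 * over10 + b_time * time_rem

# Rebounds: Constant 13.99, Game Total -0.76, Over 10 1.02, Time Remaining -0.21
reb_jump = (predicted_tsls(11, 13.99, -0.76, 1.02, -0.21)
            - predicted_tsls(10, 13.99, -0.76, 1.02, -0.21))
# Assists: Constant 16.84, Game Total -0.91, Over 10 1.70, Time Remaining -0.28
ast_jump = (predicted_tsls(11, 16.84, -0.91, 1.70, -0.28)
            - predicted_tsls(10, 16.84, -0.91, 1.70, -0.28))

print(round(reb_jump, 2), round(ast_jump, 2))  # 0.26 0.79
```

Note that the per-stat speed-up (-0.76 and -0.91) resumes on either side of the threshold; the dummy contributes only this one-time jump.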

In conclusion, the continuous regression and the discontinuity evidence both point toward Russell Westbrook strategically targeting rebounds/assists to produce triple doubles. Though this isn’t necessarily damning evidence against Westbrook’s MVP candidacy, it should be fodder against the myth of the triple double.

This past Sunday, the NHL released the list of players available for the 2017 expansion draft for the newly minted Las Vegas Golden Knights. In honor of the first expansion draft since 2000, I decided to try to quantitatively determine the optimal expansion draft class for the new franchise. Before beginning this endeavor, let us review the relevant rules for the expansion draft:

**·** Vegas must select 1 player from each of the 30 existing NHL franchises

**·** Vegas must select at least 14 forwards, 9 defensemen, and 3 goalies

**·** Vegas must select at least 20 players who are under contract through 2018

**·** Vegas’ draft class must have an aggregate cap hit between 60-100% of the 2017 salary cap ($73 million)

Furthermore, there are complicated rules about protecting players that each franchise must follow, but those rules are irrelevant to this analysis. However, if you are curious, you can read more about them here.
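The four draft rules above are simple to encode as a validity check. The sketch below uses a made-up player representation (tuples of team, position, cap hit, and a signed-through-2018 flag) rather than any real roster data:

```python
SALARY_CAP = 73_000_000  # 2017 cap; the floor is 60% of this ($43.8M)

def valid_draft(players):
    """players: list of (team, position, cap_hit, signed_thru_2018) tuples."""
    teams = {p[0] for p in players}
    forwards = sum(p[1] == "F" for p in players)
    defensemen = sum(p[1] == "D" for p in players)
    goalies = sum(p[1] == "G" for p in players)
    cap_total = sum(p[2] for p in players)
    signed = sum(bool(p[3]) for p in players)
    return (len(players) == 30 and len(teams) == 30   # one per franchise
            and forwards >= 14 and defensemen >= 9 and goalies >= 3
            and signed >= 20                          # under contract thru 2018
            and 0.60 * SALARY_CAP <= cap_total <= SALARY_CAP)
```

For example, a 30-player class of $1.6 million contracts (18 forwards, 9 defensemen, 3 goalies, one per team, all signed through 2018) totals $48 million and passes every check; dropping any one player fails the one-per-franchise rule.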

Now that the rules have been set, I will explain my methodology. The NHL has a number of stats that can be used to compare players, like Corsi, Fenwick, and point shares. For this analysis I concluded that point shares weighted by salary cap hit would be the best way to compare players. Specifically, I took each available player’s 2016-17 point shares, divided it by his salary cap hit, and then scaled it by a factor of 10^6 to get workable numbers. Point shares are a good statistic for this purpose because they aggregate offensive and defensive point shares, and can therefore be used to compare forwards and defensemen against each other. Weighting this stat by salary cap hit allows us to compare players’ “points per dollar” and thus their efficiency. Furthermore, since the Golden Knights’ class must fall between $43.8 million and $73 million, cap hit is an important data point to consider when making draft decisions. Additionally, I kept track of an indicator variable that tells me whether the player is under contract through 2018 or not, since 20 of the 30 picks must be under contract through 2018. For goaltenders, I simply used save percentage as my comparison statistic, and I ended up selecting goaltenders that both had good save percentages and were on teams that did not have position players that fit well into the draft class.
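The value metric itself is a one-liner. The sketch below ranks a few hypothetical players by it (names and numbers are invented for illustration):

```python
def value_score(point_shares, cap_hit_dollars):
    """Point shares per dollar of cap hit, scaled by 1e6 for readable numbers."""
    return point_shares / cap_hit_dollars * 1e6

# Hypothetical players: (name, 2016-17 point shares, cap hit in dollars).
players = [
    ("Player A", 6.2, 4_000_000),
    ("Player B", 3.1, 1_000_000),
    ("Player C", 7.5, 6_500_000),
]

ranked = sorted(players, key=lambda p: value_score(p[1], p[2]), reverse=True)
print([p[0] for p in ranked])  # ['Player B', 'Player A', 'Player C']
```

Player B’s cheap contract outranks Player C’s higher raw point shares, which is exactly the efficiency trade-off the metric is designed to capture.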

To begin this analysis, I downloaded the most recent spreadsheet of NHL statistics from hockeyabstract.com and reduced the data to only the players available to be drafted, their point shares or save percentage, and their salary cap hit. I then sorted the list in descending order by the scaled score (PS per cap hit × 10^6, or save percentage for goalies) and selected the best defender from each team:

I then did the same for forwards:

Now that the top forwards and top defensemen on each team had been given scores, I selected the 14 best forwards and the 9 best defensemen:

This runs into a few flaws, however. Among these 23 players, 14 are not under contract for 2017-2018, their combined cap hit of $18,834,167 is woefully below the $43.8 million floor, and some teams are represented twice. To address these problems, I first eliminated Patrick Eaves and Brett Connolly from the table because neither is under contract for 2017-2018 and, in each case, the other player representing his team is a better pick. I then removed the forwards Dominic Moore, Erik Haula, and Stefan Noesen and defenseman Xavier Ouellet and replaced them with Darren Helm, Lee Stempniak, William Carrier, and Mark Barberio, respectively. This minimizes the net loss in weighted point shares at -4.10, gains 4 new players under contract through 2018, and adds some larger contracts to start approaching the salary floor, so these moves are all justified given the rules of the draft:

So now we have 21 players: 12 forwards and 9 defensemen, with only 8 players not under contract through 2018. To this set I will now add the 3 requisite goalies and complete the table with one player from each of the 30 teams to satisfy the rules, which yields this set of players:

This is a potential draft class for the Las Vegas Golden Knights, maximized on value as defined by point shares per million dollars. It has 3 goaltenders, 15 forwards, and 12 defensemen. However, we are not finished, because this draft class misses the salary cap floor of $43.8 million. To remedy this problem, I am going to consider the point shares of all of the available players and substitute in players from the same teams who have similar point shares at the same positions but heftier contracts, so that we can meet the salary cap floor:

Eric Staal was substituted for Erik Haula, Matt Moulson was substituted for William Carrier, and Andrei Markov was substituted in for Nikita Nesterov. This draft class has 21 players under contract through 2018 and has a total salary cap of $47,256,667. This newly minted set of selections is the most efficient draft for the Las Vegas Golden Knights.

To conclude this post, I will discuss the benefits and the drawbacks of the approach I took to solving this problem. Although efficiency is important when drafting as an expansion franchise, it is not the be-all and end-all. Other factors like age, team chemistry, coaching scheme, and injury history are important to consider; my approach merely takes into account point shares per million dollars of contract. My method also ran into the problem of undershooting the salary floor, which resulted in my having to manually substitute more expensive (and less efficient) players into the draft. In future iterations of such an endeavor I would likely consider only point shares, because the salary cap seems sufficiently high that the expansion franchise could draft the highest-impact players without too much hesitation. Of course, I have no way of predicting any draft day deals the Golden Knights might make, and I have no idea what sort of system they want to run out on the strip, but this analysis should be at least a little helpful in illuminating the best drafting strategy for the new franchise.

One final thing to note is that the Golden Knights have already publicly decided that they are going to try to build their team as young as possible, and as a result have already lined up four trades with other teams in exchange for draft picks. This might mean that some of the players selected above will be ineligible to be selected due to the terms of those trades.

Editor’s Note: If you have any questions about this article for Mitchell, please feel free to reach out to him at mitchellpleasure@college.harvard.edu.

The ancient Greek philosopher Heraclitus once said that change is the only constant in life. While his teaching is thousands of years old, it applies just as easily to the rules of modern-day sports, which have been constantly tweaked and adjusted for decades. While the NBA has made many rule changes, such as adjusting its defensive rules, banning and unbanning the slam dunk, and implementing a shot clock, the introduction and altering of the 3-point line has had one of the most dramatic impacts on the league of any rule change in its history.

The 3-point shot made its debut in the NBA in the 1979-80 season, and it was originally called a “gimmick” in the New York Times’s season preview. Phoenix Suns coach John MacLeod said, “It may change our game at the end of the quarters, but I’m not going to set up plays for guys to bomb from 23 feet. I think that’s very boring basketball”. Boston Celtics president Red Auerbach stated, “We don’t need it. I say leave our game alone”. In the first season of its existence, the shot was a rarely used weapon, as teams averaged only 2.8 attempts per game. To put that into perspective, teams averaged 27 attempts per game in the 2016-17 season. Three-point attempts per game gradually increased over the 1980s, as coaches realized that a shot worth 50% more pays off even if it is somewhat harder to make. Ex-Nets coach Lawrence Frank said, “Teams have all caught on to the whole points-per-possession argument”.

By 1994, teams were averaging nearly 10 attempts per game. Then the NBA made one of its most impactful rule changes of all time: as a result of below-average scoring in the early 1990s, the league moved the 3-point line closer, hoping the easier shot would produce higher-scoring games. The line was originally 23 feet 9 inches (22 feet in the corners), and it was shortened by 21 inches to a uniform 22 feet at the beginning of the 1994-95 season. Although the average number of 3-point attempts per game increased by over 50%, the line was moved back to its original distance after the 1996-97 season because the shortened line had lowered scoring even further: in the three seasons before the line was moved in, teams averaged 105.6 points per game, while in the three seasons with the shorter line, they averaged only 100.8.

When the line reverted to its original distance, there was a slight decrease in 3-point attempts the following season; however, teams increased their attempts over time, soon surpassing the number taken while the line was moved in. Today, the 3-point shot is far from the “gimmick” it was considered in 1980. The shot plays a huge role in teams’ strategies, and players of all positions are expanding their range past the arc. Considering how significant the 3-point shot is in today’s league, it is important to investigate the effect the 1995-1997 experiment had on the NBA’s teams: did the moving of the line help or hurt good 3-point shooting teams?

To test this, I used team per-game stats from the 1992-93 season through the 1998-99 season and sorted them into the two seasons prior to the shortening of the 3-point line, the three seasons with the closer line, and the two seasons after the line was moved back out to its present distance. I created variables to represent the means of win percentage, 3-point percentage, and 3-point makes per game for each team from the three different eras – e.g. “Average 3pt percentage before line moved in” is the average 3-point percentage for each team in the two years before the line was moved in. The summary statistics for the averages of teams’ 3-point percentages and 3-point makes per game are shown in Table 1 below.
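The era-averaging step can be sketched as follows. The team-season rows here are invented placeholders; only the grouping logic mirrors what was done with the real per-game stats:

```python
# Seasons are labeled by the year the season ended (e.g., 1995 = 1994-95).
ERAS = {"before": (1993, 1994), "short": (1995, 1996, 1997), "after": (1998, 1999)}

def era_means(rows, team, era):
    """Mean (win_pct, fg3_pct, fg3_makes) for one team over one era.

    rows: list of (team, season, win_pct, fg3_pct, fg3_makes) tuples.
    """
    sub = [r for r in rows if r[0] == team and r[1] in ERAS[era]]
    return tuple(sum(r[i] for r in sub) / len(sub) for i in (2, 3, 4))

# Two invented seasons for one team:
rows = [("NYK", 1993, 0.60, 0.33, 4.0), ("NYK", 1994, 0.70, 0.35, 5.0)]
print(tuple(round(v, 2) for v in era_means(rows, "NYK", "before")))  # (0.65, 0.34, 4.5)
```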

__Table 1__

Then I generated variables to represent the difference in teams’ average win percentages, average 3-point makes per game and average 3-point percentages between the different eras. Table 2 below shows the summary statistics for the latter two statistics.

__Table 2__

To test my hypothesis, I regressed the difference in average win percentage when the line was moved in on average 3-point makes per game and average 3-point percentage in the era before the move. Then I regressed the difference in average win percentage when the line was moved back out on average 3-point makes per game and average 3-point percentage while the line was moved in.
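Structurally, each of these regressions looks like the sketch below, shown for the second one (change in win percentage after the line moved back out). The team rows are synthetic and the generating rule is invented so the fit has a known answer; only the specification mirrors the post:

```python
import numpy as np

rng = np.random.default_rng(1)
makes = rng.uniform(4.0, 9.0, size=29)   # avg 3P makes/game, shortened-line era
pct = rng.uniform(0.32, 0.40, size=29)   # avg 3P%, shortened-line era
# Invented rule: heavy, accurate 3-point teams lose more win% after the move.
d_win = 0.20 - 0.01 * makes - 0.30 * pct

# OLS of the win-percentage change on the two shooting variables.
X = np.column_stack([np.ones(29), makes, pct])
beta, *_ = np.linalg.lstsq(X, d_win, rcond=None)
print(beta.round(2))  # recovers the invented coefficients 0.2, -0.01, -0.3
```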

I found that neither how many 3-pointers a team made per game before the line was moved in nor its 3-point percentage over that span had a statistically significant effect on its change in win percentage once the line was moved in. However, when the line was moved back out to its present distance, both 3-point makes per game and 3-point percentage during the period with the closer line had a statistically significant effect on teams’ changes in win percentage.

First, we look at the regressions of difference in average win percentage when the line was moved in on average 3-point percentage and average 3-point makes per game in the two years before the move.

**Table 3**

As shown above, neither 3-point makes per game before the move nor 3-point percentage before the move had a statistically significant effect on a team’s difference in win percentage between the era with the closer line and the era before it was moved.

Next, we look at the regressions of difference in average win percentage after the line was moved back out on average 3-point percentage and average 3-point makes per game during the era with the closer line.

**Table 4**

As shown in Table 4 above, both 3-point makes per game during the three seasons with the closer line and 3-point percentage during those seasons had a statistically significant effect on a team’s difference in win percentage between the era with the line moved back out and the era with the line moved in.

When interpreting the results of this study, it is important to look back at the summary statistics of the differences in average 3-point makes per game and average 3-point percentage between the different eras (Table 2, which is shown again below).

**Table 2**

First, the moving in of the 3-point line had no effect on teams’ differences in average win percentages because every team began making more 3-pointers and shooting them at a higher percentage when the line was moved in. The increase in percentage is what I would expect, assuming a closer shot is easier to make, and the increase in makes would likewise be expected under the assumption that teams were more willing to shoot the easier 3-point shot. As shown in Table 2, the minimum increase in average 3-point makes per game was nearly one make per game, and the maximum increase was just over four. Since the improvement in these two statistics was consistent league-wide, no team gained a comparative advantage.

However, the effect of moving the 3-point line back out is a different story. While some might expect the move back to the original distance to produce the exact opposite of what happened when the line was moved in, this was not the case. Even though the 3-point shot was made more difficult when the line was pushed back 21 inches, not every team began making fewer 3-pointers or shooting them at a lower percentage. Table 2 shows that while the league-average 3-point makes per game and 3-point percentage decreased, this was not the case for every team, as the maximum observations for both variables were positive. While making the 3-point shot easier helped the shooting of every team, making it more difficult did not hurt the shooting of every team. Because the moving back of the line affected teams differently, there was room for a comparative advantage.

According to the regressions, teams that were making a lot of 3-pointers when the line was moved in and shooting them at a high percentage had the largest drop in win percentage when the line was moved back out. One way to interpret this is that teams that had been getting a larger portion of their scoring from threes now had to either get their scoring from a more difficult 3-point shot or change their offensive game plan to include a higher proportion of 2-pointers. Therefore, the teams that weren’t making many threes (or were shooting them at a low percentage) while the line was moved in closer received a comparative advantage when it was moved back out because they didn’t have to change their offensive game plan as much as the teams that were relying heavily on the closer 3-point shot.

In an era when there is debate over whether it is time to move the 3-point line back even further because of how proficient players have become at the shot, this post provides valuable information on how past moves of the line have affected the NBA’s teams.

Editor’s Note: This article is a condensed version of a research paper conducted for the Harvard Sophomore Economics Tutorial Econ 970: Sports Economics. If you would like to read the full-length paper, please contact Robert at robertfeinberg@college.harvard.edu.
