On April 9, Russell Westbrook broke Oscar Robertson’s purportedly unbreakable single-season record of 41 triple doubles against the Denver Nuggets. Some have called the feat arbitrary, while others say it is enough to guarantee MVP hardware. How important is a triple double if Russ is changing his on-court behavior to pad certain stats, as some suggest? Does Russell Westbrook push teammates away for rebounds and excessively pass for assists when he is close to a triple double? In this post, I’m going to use play-by-play data for the entire 2016-17 season to look for evidence of Russ stat-stuffing.

I’ll rely on the Basketball Reference play-by-play regular season database, which includes Russ’s 867 total rebounds and 840 assists. I’m only going to look at the rebound and assist categories, since Russ hasn’t had any trouble reaching double-digit points in any game this year (unlike a Ricky Rubio/Draymond Green-style triple double). I’ll first analyze rebounds and assists up to 10 in each game, since that is, of course, the qualifying threshold for a triple double. Later in the article, I’ll expand the analysis to test for discontinuities at the double-digit threshold.

In general, I’ll test whether Russ sped up his stat accumulation in situations in which he needed rebounds/assists to reach a triple double. So, unless otherwise noted, the dependent variable is the time since last stat (TSLS). A lower TSLS implies that Russ is achieving rebounds/assists faster than normal and a higher number means he has slowed down in that stat category.

The primary independent variable is Distance, or the number of rebounds/assists away from a triple double in that stat category. According to the stat-stuffing hypothesis, Russ is more likely to target rebounds/assists when he is close to achieving double digits in those categories. I also tested nonlinear transformations and interactions including Distance, but these were insignificant and did not add any predictive ability beyond the linear case.

I’ve also decided to control for the Time Remaining in the game, since Russ is more likely to be on the court in late-game situations. Even when excluding the Time Remaining control, the results don’t change meaningfully, but it addresses an important potential source of omitted variable bias. I’ve also tested for multicollinearity between Distance and Time Remaining using the VIF and found that the variables are far from collinear, so the parameter estimates appear stable with reliable standard errors.

__Rebounds:__

I’ll start by looking at Russ’s rebounding. In the simplest regression of TSLS on Distance (controlling for Time Remaining), the coefficient on Distance is positive and significant below the 1% level.

| Variable | Coefficient |
| --- | --- |
| Constant | 6.31*** |
| Distance | 0.87*** |
| Time Remaining | -0.23*** |

This suggests that Russ is getting rebounds 0.87 of a minute (or about 52 seconds) earlier for each rebound closer to double digits. For example, Russ is getting his 9th rebound 52 seconds earlier than his 8th rebound on average, after controlling for time remaining. This effect is thus large and very significant. The coefficient on Time Remaining is negative and statistically significant, as expected: as the time remaining in the game decreases, Russ’s average time before his next rebound also decreases.

Next, I test the idea that Westbrook is more likely to stat-stuff in games in which OKC is far ahead or well behind. If his stat-stuffing behavior is dependent on game state, then maybe this behavior is innocuous from a competitiveness perspective. To test this, I include the absolute value of the score margin at the time the stat occurs and a binary indicator for losing/winning as control variables. I will also include the interaction of the absolute value of the margin with the losing/winning indicator to see if Westbrook’s behavior is asymmetric in an OKC blowout win vs. an OKC blowout loss.
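A hedged sketch of this specification, using statsmodels’ formula interface. The variable names are mine, and the data here is randomly generated just to show the construction of the controls and the interaction:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({
    "tsls": rng.uniform(1, 15, n),           # time since last stat, minutes
    "distance": rng.integers(0, 10, n),
    "time_remaining": rng.uniform(0, 48, n),
    "margin": rng.integers(-25, 26, n),      # OKC score minus opponent score
})
df["abs_margin"] = df["margin"].abs()        # |score margin| at the time of the stat
df["losing"] = (df["margin"] < 0).astype(int)

# `abs_margin * losing` expands to both main effects plus their interaction,
# which tests for asymmetry between blowout wins and blowout losses
fit = smf.ols("tsls ~ distance + time_remaining + abs_margin * losing", data=df).fit()
print(fit.params.index.tolist())
```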

In the results below, we see that the Distance and Time Remaining coefficient estimates barely change when controlling for game state. Additionally, the absolute value of the score margin is significant below the 1% level. This result is robust in regressions without the Losing and Interaction terms as well.

| Variable | Coefficient |
| --- | --- |
| Constant | 4.97*** |
| Distance | 0.90*** |
| Time Remaining | -0.21*** |
| Abs | 0.10*** |
| Losing | 0.24 |
| Interaction | 0.00 |

It appears Russ isn’t more likely to stat-stuff rebounds in blowout games. In fact, Russ actually slows down his rebound accumulation as the game gets out of hand (as judged by the positive coefficient on Abs). Additionally, Losing and the interaction of Abs and Losing are relatively small and insignificant, which indicates that Russ’s rebounding behavior is approximately the same in wins and losses.

__Assist Results:__

I’ll now perform the same analysis on assists. Interestingly, the results are extremely similar to the rebounding case in that there is clear evidence of Russ stat-stuffing with assists.

| Variable | Coefficient |
| --- | --- |
| Constant | 7.88*** |
| Distance | 1.04*** |
| Time Remaining | -0.30*** |

This suggests that Russ is getting assists 1.04 minutes (or about 62 seconds) earlier for each assist closer to double digits. For example, Russ is getting his 9th assist 62 seconds earlier than his 8th assist on average, after controlling for time remaining. This effect is large and very significant.

Below, I test the same game-state variables to see whether Russ selectively stat-stuffs assists in lopsided games. Unlike rebounds, I find that the absolute value of the score margin is insignificant. This indicates that Russ may be backing off of rebounds in blowout games, but that the margin has no effect on his assist behavior. Like rebounds, the Losing dummy and the interaction of Absolute Margin and Losing are small and insignificant.

| Variable | Coefficient |
| --- | --- |
| Constant | 7.99*** |
| Distance | 1.02*** |
| Time Remaining | -0.30*** |
| Abs | -0.02 |
| Losing | -0.13 |
| Interaction | 0.03 |

The evidence from the rebound and assist data suggests that Westbrook is *significantly more likely to target rebounds and assists when he is closer to completing a triple double in that statistical category*. Interestingly, Russ appears to stat-stuff more on assists than rebounds: comparing the Distance coefficients across specifications, the gap works out to roughly 7-10 seconds per stat away from double digits.

__Discontinuity Over 10:__

I’m now going to include data beyond the double-digit threshold to test whether a discontinuity exists in the time it takes Russ to record his next rebound/assist after reaching double digits. To do this, I turn to a variable I’ll call Game Total, which is simply the total number of rebounds/assists Russ has in the game up to that point (essentially the inverse of Distance), and a variable called Over 10, a dummy indicating whether Russ is already in double digits in that stat category. Below, I’m regressing TSLS on Game Total and Over 10, controlling for Time Remaining, for rebounds.

| Variable | Coefficient |
| --- | --- |
| Constant | 13.99*** |
| Game Total | -0.76*** |
| Time Remaining | -0.21*** |
| Over 10 | 1.02* |

The coefficient on Over 10 is positive and statistically significant at the 10% level. Netting the Over 10 jump against the Game Total slope (1.02 - 0.76), each rebound beyond 10 comes 0.26 of a minute (16 seconds) slower than the previous rebound, controlling for time remaining. Below, one can see that the assist data supports the same narrative.

| Variable | Coefficient |
| --- | --- |
| Constant | 16.84*** |
| Game Total | -0.91*** |
| Time Remaining | -0.28*** |
| Over 10 | 1.70*** |

For assists, the Over 10 coefficient is large, positive, and statistically significant below the 1% level. This means each assist beyond 10 comes 0.79 of a minute (47 seconds) slower than the previous assist (1.70-0.91), controlling for Time Remaining.

In conclusion, the continuous regression and discontinuity evidence both point toward Russell Westbrook opportunistically targeting rebounds/assists to complete triple doubles. Though this isn’t necessarily damning evidence against Westbrook’s MVP candidacy, it should be fodder against the myth of the triple double.

This past Sunday, the NHL released the available players for the 2017 expansion draft for the newly minted Las Vegas Golden Knights. In honor of the first expansion draft since 2000, I decided to try to quantitatively determine the optimal expansion draft class for the new franchise. Before beginning on this endeavor, let us review what the relevant rules for the expansion draft are:

- Vegas must select 1 player from each of the 30 existing NHL franchises

- Vegas must select at least 14 forwards, 9 defensemen, and 3 goalies

- Vegas must select at least 20 players who are under contract through 2018

- Vegas’ draft class must have an aggregate cap hit between 60% and 100% of the 2017 salary cap ($73 million)
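The four rules above can be collected into a quick validation function. The roster representation (a list of dicts with team, position, cap hit, and contract status) is an assumption for illustration, not an official format:

```python
SALARY_CAP = 73_000_000  # 2017 salary cap

def valid_draft_class(roster):
    """Check a candidate 30-player draft class against the expansion rules."""
    teams = [p["team"] for p in roster]
    positions = [p["pos"] for p in roster]
    cap_total = sum(p["cap_hit"] for p in roster)
    return (
        len(roster) == 30 and len(set(teams)) == 30       # one player per franchise
        and positions.count("F") >= 14                    # at least 14 forwards
        and positions.count("D") >= 9                     # at least 9 defensemen
        and positions.count("G") >= 3                     # at least 3 goalies
        and sum(p["thru_2018"] for p in roster) >= 20     # contracts through 2018
        and 0.6 * SALARY_CAP <= cap_total <= SALARY_CAP   # 60-100% of the cap
    )
```

A check like this is handy later, when player swaps are made to hit the cap floor without breaking the positional minimums.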

Furthermore, there are complicated rules about protecting players that each franchise must follow, but those rules are irrelevant to this analysis. However, if you are curious, you can read more about them here.

Now that the rules have been set, I will explain my methodology. The NHL has a number of stats that can be used to compare players, like Corsi, Fenwick, and point shares. For this analysis I concluded that point shares weighted by salary cap hit would be the best way to compare players. Specifically, I took each available player’s 2016-17 point shares, divided it by his salary cap hit, and scaled the result by a factor of 10^6 to get workable numbers. Point shares are a good comparison statistic because they aggregate offensive and defensive point shares, so forwards and defensemen can be measured against each other. Weighting by salary cap hit lets us compare players’ “points per dollar” and thus their efficiency. Furthermore, since the Golden Knights’ aggregate cap hit must land between $43.8 million and $73 million, cap hit is an important data point to consider when making draft decisions. I also kept track of an indicator variable for whether each player is under contract through 2018, since at least 20 of the 30 draft picks must be. For goaltenders, I simply used save percentage as my comparison statistic, and I ended up selecting goaltenders who both had a good save percentage and were on teams without position players that fit well into the draft class.
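As a sketch, the weighted metric and the per-team selection look something like this. The players and numbers below are invented; the real analysis uses the full hockeyabstract.com table:

```python
import pandas as pd

players = pd.DataFrame({
    "player":    ["Alpha", "Bravo", "Charlie", "Delta"],
    "team":      ["ANA", "ANA", "BOS", "BOS"],
    "pos":       ["D", "D", "D", "F"],
    "ps":        [5.1, 3.0, 6.3, 4.0],                       # 2016-17 point shares
    "cap_hit":   [4_000_000, 1_000_000, 5_500_000, 925_000],
    "thru_2018": [True, False, True, True],                  # under contract through 2018?
})
# Weighted point shares: "points per dollar", scaled by 1e6 for readability
players["value"] = players["ps"] / players["cap_hit"] * 1e6

# Best available defenseman from each team by the weighted metric
best_d = (players[players["pos"] == "D"]
          .sort_values("value", ascending=False)
          .groupby("team", as_index=False)
          .first())
print(best_d[["team", "player", "value"]])
```

Note that a cheap player with modest point shares (Bravo) can out-rank a better but pricier one (Alpha) under this metric, which is exactly the efficiency trade-off discussed above.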

To begin this analysis, I downloaded the most recent spreadsheet of NHL statistics from hockeyabstract.com and reduced the data to only the players available to be drafted, their point shares or save percentage, and their salary cap hit. I then sorted the list in descending order by the weighted metric (point shares per cap hit, scaled by 10^6, for skaters; save percentage for goalies) and selected the best defender for each team:

I then did the same for forwards:

Now that the top forwards and top defensemen on each team had been given scores, I selected the 14 best forwards and the 9 best defensemen:

This selection runs into a few problems, however. Among these 23 players, 14 are not under contract for 2017-18, the aggregate cap hit is woefully below the $43.8 million floor at $18,834,167, and some teams are represented twice. To address these problems, I first eliminated Patrick Eaves and Brett Connolly from the table because neither is under contract for 2017-18 and the other player available from each of their teams is a better pick. I then removed the forwards Dominic Moore, Erik Haula, and Stefan Noesen and the defenseman Xavier Ouellet and replaced them with Darren Helm, Lee Stempniak, William Carrier, and Mark Barberio, respectively. These swaps minimize the net loss in weighted point shares at -4.10, add 4 players under contract through 2018, and bring in some larger contracts to start approaching the salary cap floor, so they are all necessary given the rules of the draft:

So now we have 21 players: 12 forwards and 9 defensemen, with only 8 players not under contract through 2018. To this set I will now add the 3 requisite goalies and complete the roster with one player from each of the 30 teams to satisfy the rules, giving this set of players:

This is a potential draft class for the Las Vegas Golden Knights that maximizes value as defined by point shares per million dollars of cap hit. It has 3 goaltenders, 15 forwards, and 12 defensemen. However, we are not finished, because this draft class misses the $43.8 million salary cap floor. To remedy this problem, I am going to consider the point shares of all of the available players and substitute in players from the same teams who have similar point shares at the same positions but heftier contracts, so that we can meet the salary cap floor:

Eric Staal was substituted for Erik Haula, Matt Moulson for William Carrier, and Andrei Markov for Nikita Nesterov. This draft class has 21 players under contract through 2018 and a total cap hit of $47,256,667. This newly minted set of selections is the most efficient draft for the Las Vegas Golden Knights.

To conclude this post, I will discuss the benefits and drawbacks of the approach I took to solving this problem. Although efficiency is important when drafting as an expansion franchise, it is not the be-all and end-all. Other factors like age, team chemistry, coaching scheme, and injury history also matter; my approach merely takes into account point shares per million dollars of contract. My method also ran into the problem of undershooting the salary floor, which forced me to manually substitute more expensive (and less efficient) players into the draft. In future iterations of such an endeavor, I would likely consider only point shares, because the salary cap seems sufficiently high that the expansion franchise could draft the highest-impact players without much hesitation. Of course, I have no way of predicting any draft day deals the Golden Knights might make, and I have no idea what sort of system they want to run out on the Strip, but this analysis should be at least a little helpful in illuminating the best drafting strategy for the new franchise.

One final thing to note is that the Golden Knights have already publicly stated that they are going to try to build their team as young as possible, and as a result have already lined up four trades with other teams in exchange for draft picks. This might mean that some of the players selected above will be ineligible due to the terms of those trades.

Editor’s Note: If you have any questions about this article for Mitchell, please feel free to reach out to him at mitchellpleasure@college.harvard.edu.

The ancient Greek philosopher Heraclitus once said that change is the only constant in life. While his teaching is from thousands of years ago, it can easily be applied to the rules of modern-day sports. Rules have been constantly tweaked and adjusted for decades across all sports. While the NBA has made many rule changes such as adjusting its defensive rules, banning and unbanning the slam dunk and implementing a shot clock, the introduction and altering of the 3-point line has had one of the most dramatic impacts on the association out of any rule change in league history.

The 3-point shot made its debut in the NBA in the 1979-80 season, and it was originally called a “gimmick” in the New York Times’s season preview. Phoenix Suns coach John MacLeod said, “It may change our game at the end of the quarters, but I’m not going to set up plays for guys to bomb from 23 feet. I think that’s very boring basketball.” Boston Celtics president Red Auerbach stated, “We don’t need it. I say leave our game alone.” In the first season of its existence, the shot was a rarely used weapon, as teams averaged only 2.8 attempts per game. To put that into perspective, teams averaged 27 attempts per game in the 2016-17 season.

Three-point attempts per game gradually increased over the 1980s, as coaches realized that a shot worth 50% more pays off, even if that shot is a little harder to make. Ex-Nets coach Lawrence Frank said, “Teams have all caught on to the whole points-per-possession argument.” By 1994, teams were averaging nearly 10 attempts per game. Then the NBA made one of its most impactful rule changes of all time. As a result of below-average scoring in the early 1990s, the league moved the 3-point line closer, hoping the easier shot would result in higher-scoring games. The line was originally 23 feet 9 inches (22 feet in the corners), and it was shortened by 21 inches to a uniform 22 feet at the beginning of the 1994-95 season. Although the average number of 3-point attempts per game increased by over 50%, the line was moved back to its original distance after the 1996-97 season because the shortened line had lowered the average score of games even further. In the three seasons before the line was moved in, teams averaged 105.6 points per game, while in the three seasons with the shorter line, teams averaged only 100.8 points per game.

When the line reverted to its original distance, there was a slight decrease in 3-point attempts the following season; however, teams increased their attempts over time, soon surpassing the average number taken when the line was moved in. Today, the 3-point shot is far from the “gimmick” it was considered to be in 1980. The shot plays a huge role in teams’ strategies, and players of all positions are expanding their range past the arc. Considering how significant the 3-point shot is in today’s league, it is important to investigate the effect the 1995-1997 experiment had on the NBA’s teams: did the moving of the line help or hurt good 3-point-shooting teams?

To test this, I used team per-game stats from the 1992-93 season through the 1998-99 season and sorted the stats into the two seasons prior to the shortening of the 3-point line, the three seasons with the closer line, and the two seasons after the line was moved back out to its present distance. I created variables to represent the means of win percentage, 3-point percentage, and 3-point makes per game for each team in the three different eras: e.g., “Average 3pt percentage before line moved in” is the average 3-point percentage for each team in the two years before the line was moved in. The summary statistics for the averages of teams’ 3-point percentages and 3-point makes per game are shown in Table 1 below.

__Table 1__

Then I generated variables to represent the difference in teams’ average win percentages, average 3-point makes per game and average 3-point percentages between the different eras. Table 2 below shows the summary statistics for the latter two statistics.

__Table 2__

To test my hypothesis, I regressed difference in average win percentage when the line was moved in on average 3-point makes per game and average 3-point percentage in the era before the move. Then I regressed difference in average win percentage when the line was moved back out on average 3-point makes per game and average 3-point percentage while the line was moved in.
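A sketch of the second of these regressions (the difference in average win percentage after the line moved back out, regressed on shooting during the shortened-line era). The team rows here are randomly generated placeholders, so this only illustrates the variable construction:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_teams = 27  # illustrative sample size, one row per team

teams = pd.DataFrame({
    # per-team era averages (invented values)
    "win_pct_short": rng.uniform(0.2, 0.8, n_teams),   # win% with the shorter line
    "win_pct_after": rng.uniform(0.2, 0.8, n_teams),   # win% after the move back out
    "fg3m_short": rng.uniform(4, 10, n_teams),         # 3-pt makes/game, shorter line
    "fg3pct_short": rng.uniform(0.30, 0.40, n_teams),  # 3-pt%, shorter line
})
# Dependent variable: change in average win% between the two eras
teams["d_win_pct"] = teams["win_pct_after"] - teams["win_pct_short"]

fit = smf.ols("d_win_pct ~ fg3m_short + fg3pct_short", data=teams).fit()
print(fit.summary().tables[1])
```

The first regression is identical in shape, with the "before the move" era averages on the right-hand side and the moved-in-era change on the left.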

I found that how many 3-pointers a team was making per game before the line was moved in and how high their 3-point percentage was before the line was moved in had no statistically significant effect on their difference in win percentage once the line was moved in. However, when the line was moved back out to its present distance, 3-point makes per game during the period with the closer line and 3-point percentage during this period had a statistically significant effect on teams’ differences in win percentage when the line was moved back out.

First, we look at the regressions of difference in average win percentage when the line was moved in on average 3-point percentage and average 3-point makes per game in the two years before the move.

**Table 3**

As shown above, neither 3-point makes per game before the move nor 3-point percentage before the move had a statistically significant effect on a team’s difference in win percentage between the era with the closer line and the era before it was moved.

Next, we look at the regressions of difference in average win percentage after the line was moved back out on average 3-point percentage and average 3-point makes per game during the era with the closer line.

**Table 4**

As shown in Table 4 above, both 3-point makes per game during the three seasons with the closer line and 3-point percentage during these seasons had a statistically significant effect on a team’s difference in win percentage between the era with the line moved back out and the era with the line moved in.

When interpreting the results of this study, it is important to look back at the summary statistics of the differences in average 3-point makes per game and average 3-point percentage between the different eras (Table 2, which is shown again below).

**Table 2**

First, the moving in of the 3-point line had no effect on teams’ differences in average win percentages because every team began making more 3-pointers and shooting them at a higher percentage when the line was moved in. The increase in percentage is what I would expect, assuming a closer shot is easier to make, and the increase in makes would also be expected under the assumption that teams were more willing to shoot the easier 3-point shot. As shown in Table 2, the minimum increase in average 3-point makes per game was nearly one make per game, and the maximum increase was just over four makes per game. Since the improvement in these two statistics was consistent league-wide, no teams gained a comparative advantage.

However, the effect of moving the 3-point line back out is a different story. While some would expect the move back to the original distance to produce the exact opposite of what happened when the line was moved in, this was not the case. Even though the 3-point shot was made more difficult when it was pushed back 21 inches, not every team began making fewer 3-pointers or shooting them at a lower percentage. Table 2 shows that while the league-average 3-point makes per game and 3-point percentage decreased, this was not true for every team, as the maximum observations for both variables were positive. While making the 3-point shot easier helped the shooting of every team, making it more difficult did not hurt the shooting of every team. Because the moving back of the line affected teams differently, there was room for a comparative advantage.

According to the regressions, teams that were making a lot of 3-pointers when the line was moved in and shooting them at a high percentage had the largest drop in win percentage when the line was moved back out. One way to interpret this is that teams that had been getting a larger portion of their scoring from threes now had to either get their scoring from a more difficult 3-point shot or change their offensive game plan to include a higher proportion of 2-pointers. Therefore, the teams that weren’t making many threes (or were shooting them at a low percentage) while the line was moved in closer received a comparative advantage when it was moved back out because they didn’t have to change their offensive game plan as much as the teams that were relying heavily on the closer 3-point shot.

In an era when there is debate over whether it is time to move the 3-point line back even further because of how proficient players have become at the shot, this post provides valuable information on how moving the line has affected the NBA’s teams in the past.

Editor’s Note: This article is a condensed version of a research paper conducted for the Harvard Sophomore Economics Tutorial Econ 970: Sports Economics. If you would like to read the full-length paper, please contact Robert at robertfeinberg@college.harvard.edu.

In the 2013 NBA Finals, right before Ray Allen hit what will inevitably go down as one of the greatest shots of all time, hundreds of Heat fans were already filing out of the American Airlines Arena, cementing the widespread notion that Miami had the worst fans in the league. Four years later, as the Cavaliers, Warriors, and their questionably bandwagon supporters reach yet another NBA Finals, the age-old question re-emerges: which team has the most loyal fans? It’s about time that we start ranking, as we do with everything else in sports, NBA fans’ dedication to their teams.

To assess fanbase loyalty, I examine how stadium attendance responds to a team’s performance. In theory, a perfectly loyal fanbase’s attendance remains relatively constant from year to year regardless of the team’s performance. More colloquially, fans will stick with their team “through thick and thin.” Conversely, disloyal, “fair-weather” fans will show up to games only when their team is doing well and disappear when their team is underperforming. To quantify this relationship, for each NBA team I calculate the correlation between win percentage and attendance over every season the team has played in its current stadium, going back at most to the 1991-92 season.
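Concretely, the loyalty metric is just a per-team Pearson correlation, along these lines. The two mini-histories below are invented to illustrate a “diehard” pattern and a “fair-weather” pattern:

```python
import pandas as pd

# One row per team-season: win percentage and attendance as a share of capacity
seasons = pd.DataFrame({
    "team":       ["DAL"] * 5 + ["DET"] * 5,
    "win_pct":    [0.67, 0.70, 0.60, 0.40, 0.33,   0.65, 0.73, 0.48, 0.35, 0.30],
    "attendance": [0.99, 0.98, 1.00, 0.99, 1.00,   0.99, 1.00, 0.85, 0.72, 0.70],
})

# Low (or negative) correlation = "diehard" fans; high correlation = "fair-weather"
loyalty = {team: g["win_pct"].corr(g["attendance"])
           for team, g in seasons.groupby("team")}
print(loyalty)
```

In this toy data, the DAL rows mimic near-capacity attendance through good and bad years (correlation near or below zero), while the DET rows mimic attendance that tracks wins closely (correlation near one).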

At the top of the chart are the most loyal, “diehard” fanbases (as signified by low correlations between wins and attendance), and at the bottom are the disloyal, “fair-weather” fans. Leading the pack in loyalty are the Dallas Mavericks (who actually posted a slightly *negative* correlation) and the New Orleans Pelicans. Among the least loyal are the Houston Rockets, Denver Nuggets, and Los Angeles Clippers. Interestingly, despite all the flak that Heat fans received for being fair-weather LeBron fans, they finished in the top third of the index. The results for the Cavaliers and the Warriors, which do not include their recent dominant years, suggest that their fanbases are moderately below average in loyalty. Because of their recent moves to new arenas and the resulting small sample sizes, I omit the Brooklyn Nets, Orlando Magic, and Oklahoma City Thunder from the rankings.

We can visualize the significance of these correlations by plotting a team’s wins and attendance over time. For example, such a chart for the Detroit Pistons demonstrates why their fanbase falls on the disloyal end of the spectrum:

When the mid 2000s Pistons dominated the league, attendance at The Palace of Auburn Hills consistently hovered around 100%. However, when the Pistons fell out of playoff contention, fans fled their home games, leaving The Palace empty.

Similarly, when the Portland Trail Blazers suffered a disastrous few years of play in the mid 2000s, their attendance concurrently dipped, but as the Blazers began to resurge in the following years, their attendance jumped right back up to full capacity:

Other teams show less responsiveness of attendance to wins. For example, despite fluctuations from elite play to more modest performances, attendance percentage at the Dallas Mavericks’ American Airlines Center consistently hovered close to 100%:

There are, of course, a host of threats to the validity of this analysis. One potential source of error is that heterogeneity in the duration of each team’s current stadium stay may bias the reliability of the correlation results. We must also recognize that stadium attendance isn’t a perfect proxy for loyalty, as dedication to a team manifests itself in all sorts of ways. For example, fans show support by watching games on TV, buying merchandise, or just closely following news about their team.

Additionally, stadium attendance will be biased by variables other than wins, such as city size, mean city income, stadium location, and the presence of a superstar on the team. For example, teams in cities with above-median income had an average correlation of .42 vs. .59 for lower-income cities. Similarly, cities with above-median population had an average correlation of .41 vs. .55 for lower-population cities. These disparities are expected, since larger, wealthier cities have an easier time filling seats regardless of their team’s performance. For example, the Knicks, who ranked third in the loyalty index, consistently maintained attendance percentages in the high 90s through strong and poor seasons alike, likely because Madison Square Garden sits in the center of one of the biggest, wealthiest cities in the country:

Another interesting hiccup in the results is the possibility of simultaneous causality, wherein higher attendance causes home teams to perform better, although that hypothesis may be giving fans a little too much credit.

While we cannot extrapolate perfectly from this loyalty index, hopefully it will add fuel to the fire of the everlasting debate over who has the best fans.

Editor’s Note: This article is a condensed version of a research paper conducted for the Harvard Sophomore Economics Tutorial (Econ 970: Strategy and Competitive Advantage). If you would like to read the full-length paper, please contact Nicholas at nheath@college.harvard.edu.

Editor’s note: This analysis was initially conducted as a final project for the Harvard Economics Sophomore Tutorial (Economics 970: Firm Strategy and Purpose in the Online Economy) and has been abbreviated from its original format for the purposes of the HSAC blog.

For my final project for my Economics Tutorial, I conducted an analysis of college football revenues and expenses to determine the effect of money spent on recruiting on the quality of recruits a school signs. For the purposes of this analysis, “major college football” includes only the schools within the Football Bowl Subdivision (FBS). The median FBS program’s football revenue accounts for more than 35% of the median athletic department’s total revenue. Within the FBS, the top 5 conferences (commonly referred to as the Power 5) hold a large portion of the power and revenue. For the purposes of this study, Notre Dame will be included in the Power 5 designation.

In 2014-2015 there were twenty-eight athletics programs that listed revenues of $100 million or more. Of those twenty-eight, twenty-seven belong to one of the Power 5 conferences. Looking specifically at the top earners of college football, the top 10 earners took in less than $300 million combined in 2000; by 2011 that number had grown to $759 million. A large portion of this money comes from TV rights. In 2014 ESPN signed a $610 million-per-year contract with the FBS to broadcast the new 4-team playoff. The members of the Power 5 conferences take home 75% of the money from that contract, while the remaining 25% goes to the non-Power 5 conferences (commonly referred to as the Group of Five). While the top earners are growing at a rapid pace, the lower tiers of the FBS cannot say the same. Conference USA recently signed a new TV contract that will pay each school $300,000-$400,000 per year, about a third of their previous contract.

Even with shrinking revenue numbers, institutions of higher learning may continue to subsidize their college football programs. A 2003 study found that public institutions receive about 6% more in state appropriations if they field an FBS football team, and schools with football teams receive 3%-8% increases in state appropriations the year following a successful season. In a 2012 working paper from the National Bureau of Economic Research, Michael Anderson found that a college football team’s success is correlated with reduced acceptance rates and increased donations, applications, academic reputation, in-state enrollment, and incoming students’ SAT scores. For many people, a school’s football team is their first reference point for an institution of higher learning. Major college football teams and their success have a profound effect on the school as a whole and its public perception.

The effects of a successful major college football team are wide-ranging. To field a successful team, quality recruiting is key: there is a positive correlation between the quality of recruiting classes and wins. Bud Elliott of SBNation found that the majority of recruits on FBS national champions were four- or five-star players. There are only about 300 such recruits every year, so the race for them is very important, and one that major programs are willing to shell out money for.

This study aimed to establish a link between the money a major college football program spends on recruiting and the quality of recruits it acquires. Teams that earn more revenue can subsequently spend more on recruiting. If the revenue gap continues to widen and that relationship holds, non-Power 5 teams could be shut out of competing on a consistent level with upper-level revenue teams. Given the state appropriations and other effects discussed above, such a shutout would have profound effects on those institutions as a whole.

**Data:**

USA Today, The Des Moines Register and ESPN have compiled revenue and expense data for each FBS football program in the country; the aggregation of that data runs from 2008-2013. The Department of Education gathers revenue and expense data on all athletic departments (excluding some private institutions). To extend the recruiting expense numbers, each team’s share of total expenses spent on recruiting was averaged over the past 3 years, and those ratios were then applied to the real-value total expenses from 2014 and 2015. That technique gave an estimate of each team’s recruiting spending in 2014 and 2015.
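As a sketch of that extrapolation step (assuming pandas; the school names and dollar figures below are invented for illustration, not the actual dataset):

```python
import pandas as pd

# Hypothetical expense data: recruiting and total expenses by school-year.
df = pd.DataFrame({
    "school": ["A", "A", "A", "B", "B", "B"],
    "year": [2011, 2012, 2013, 2011, 2012, 2013],
    "recruiting": [900_000, 950_000, 1_000_000, 400_000, 420_000, 440_000],
    "total": [60e6, 62e6, 64e6, 30e6, 31e6, 32e6],
})

# Average each school's recruiting share of total expenses over the last 3 years.
df["share"] = df["recruiting"] / df["total"]
avg_share = df.groupby("school")["share"].mean()

# Apply those shares to known 2014 total expenses to estimate recruiting spend.
totals_2014 = pd.Series({"A": 66e6, "B": 33e6})
est_recruiting_2014 = avg_share * totals_2014
print(est_recruiting_2014.round(0))
```

The same multiplication against 2015 totals yields the 2015 estimates.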

The data on the quality of each recruiting class was gathered from Rivals.com and assigned on a “year minus 1” basis: during 2015, a program was recruiting players in the high school Class of 2016. To control for a school’s and team’s qualities, a fixed effect was crafted for each observation using the following controls:

- Team quality and strength of schedule, using collegefootballreference.com’s SRS and SOS ratings, along with overall winning percentage and conference winning percentage.
- Program prestige, measured with AP poll rankings over the past 25 years from collegefootballreference.com. A team ranked #1 was given 25 points, #2 was given 24 points, and so forth; those points were then scaled by 5 for rankings in the past 5 years, by 4 for the 5 years before that, and so forth.
- Power 5 membership at the time of the recruitment, implemented as a dummy variable using past conference membership data from collegefootballreference.com. This did not differentiate between conferences within the Power 5.
- Blue-chip high school talent located within each state. The Top 200 or 300 ESPN recruits (the most available from each year) between 2009 and 2016 were counted by state, then divided by the number of Power 5 schools in that state.
- The academic ranking of each institution.
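The tiered AP poll prestige scale can be sketched as a small function (the helper name is mine, and this is a reading of the scale as described, not the study’s actual code):

```python
def prestige_points(rankings_by_year, current_year):
    """Score a program's AP poll history on the tiered scale.

    rankings_by_year: dict mapping year -> final AP rank (1-25).
    A #1 finish is worth 25 base points, #2 worth 24, and so on.
    Finishes in the most recent 5 years are weighted x5, the 5 years
    before that x4, down to x1 for 21-25 years ago.
    """
    total = 0
    for year, rank in rankings_by_year.items():
        age = current_year - year            # 0 = the current season
        if not (0 <= age < 25 and 1 <= rank <= 25):
            continue                         # outside the 25-year window
        base = 26 - rank                     # rank 1 -> 25 points
        weight = 5 - age // 5                # ages 0-4 -> 5, 5-9 -> 4, ...
        total += base * weight
    return total

# A #1 finish last year (25 * 5) plus a #10 finish 12 years ago (16 * 3).
print(prestige_points({2014: 1, 2003: 10}, 2015))  # -> 173
```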

**Analysis:**

Three separate regressions were conducted on the data described above. The first used only teams that belonged to a Power 5 conference at the time; because there has been some movement in and out of those conferences, membership was determined year by year.

For the Power 5 conference teams, a stepwise regression was performed. The independent variables that remained statistically significant and stayed in the model were: ln(recruiting money spent), conference winning percentage, overall winning percentage, strength of schedule (SOS), top recruits per school, and total AP poll points (program prestige).

A separate stepwise regression was run on the schools that were not members of a Power 5 conference during each year. That model included the same dependent and independent variables with one exception: the independent variable “top recruits per Power 5 school” was not included. The variables that remained statistically significant through the stepwise regression were: ln(recruiting money spent), total AP poll points (program prestige), strength of schedule (SOS), and the quality of the team (as measured by SRS).

The last stepwise regression was run on all the observations in the dataset. This model included all the same variables used in the non-Power 5 regression. The variables that remained statistically significant were: total AP poll points (program prestige), strength of schedule (SOS), quality of team (SRS), conference winning percentage, and ln(total expenses).

The limitations of this model begin with confounding variables. There are many different factors that may not have been controlled for in this model. The first was the inherent skill coaches at different schools have in recruiting. It is impossible to quantify how good a coach is at establishing a connection with a high school student, and making him feel like he belongs at their school. Some coaches are very gifted at this and can make a huge difference for their program. Another possible confounding variable is the depth chart at football programs. Many players decide where they want to play based on how quickly they can have extensive playing time. That variable would be very tough to control for in this model even with a large amount of time because of the lack of insider knowledge about different players’ true skills at college football programs. One confounding variable that could have impacted the model is the quality of each campus and the extracurricular activities around campus. It may be a small or negligible effect but some players may factor in the social scene at schools as a part of their decision.

For variables that were included, one limitation is certainly the measure of top players per Power 5 school in each state. While this gives some sense of the high school talent within a state, it can be skewed: many states are very large, and a player’s being located in a state does not mean the school there is the closest to him geographically. The other issue with this variable is that a majority of players who sign with Power 5 schools are not within the top 200 or 300. Each school signs approximately 30 players, and in total approximately 1,600 players commit to Power 5 programs each year; the model cannot account for all of them.

The Stata results of the three regressions are as follows:

**Conclusions:**

The first regression can be thought of as the regression for recruiting top high school talent, the second for lower-level talent, and the third for mid-level talent. Where those lines are drawn is a tough call: battles between Power 5 and non-Power 5 teams for a recruit normally occur over mid-level talent. When Power 5 teams are battling each other for high-level talent, the money they spend on recruiting makes a difference, and the same is true when non-Power 5 teams battle each other. The important note for non-Power 5 teams is that money spent on recruiting was not significant in the third model, which means non-Power 5 teams can compete for mid-level talent without matching recruiting budgets. The only damper is that the natural log of total expenses was significant in the third model, so total expenses could still be an inhibitor for non-Power 5 teams competing for mid-level talent.

For Power 5 conference teams, the prevalence of overall and conference winning percentage reinforces that recruiting and winning may create a continuous cycle: when teams win games, their recruits get better, and when their recruits get better, they win more games. This is a great finding for teams already at the top of the college football world, and could explain some teams’ staying power. The cycle certainly holds for Power 5 teams recruiting top talent.

The significance of SOS in the Power 5 regression could point to high-profile games on TV. When two top Power 5 teams play each other, the game is often nationally televised and draws a large audience from across the country; national TV games bring programs notoriety and can help in recruiting. Another continuing cycle might develop for Power 5 teams because of the high number of conference games played each year. At least 2/3 of each team’s schedule is played in conference, and many of those matchups pit two high-level teams against each other, some in rivalry games. For non-Power 5 schools the SOS link also exists, but for a slightly different reason: two non-Power 5 teams playing each other might not attract a large national audience, but one of them playing a strong Power 5 team might, which could account for the significance of SOS.

SRS’s significance in the non-Power 5 regression may reflect the better programs separating themselves within that subgroup. Power 5 teams located in states with a lot of high school talent have a special advantage: they can tap into the state’s resources and keep many recruits close to home. Both regressions showed some significance for past prestige (as measured by AP poll points); it may be more important for non-Power 5 teams as a way to differentiate themselves, just like SRS.

The purpose of this study was to determine the effect of money spent on recruiting on the quality of recruits a team signs. When teams battle for high- or low-level talent, money spent on recruiting makes a significant difference; for mid-level talent it does not, though overall expenses may. As the expense gap across the FBS grows, non-Power 5 teams may need to make up the difference in other ways. One example is the uptick in social media recruiting: there is almost no limit to how much a team can post to get high school players’ attention. Another outside-the-box way non-Power 5 teams might compete is through the new option of stipends for college football players. Scholarships can now cover cost of attendance (COA) for athletes, and non-Power 5 schools may be able to close recruiting gaps through higher stipend payments; players from struggling families could benefit in a big way. Whatever creative ways non-Power 5 teams find to bridge the gap, the need is urgent, and may become more so.


Every February, the NFL hosts its annual scouting combine. This event welcomes approximately 300 college football players who are looking to take the next step and join the NFL. It has been described as a week-long job interview, complete with personal conversations with teams and on/off-field tests. Without fail, the most anticipated drill each year is the 40-yard dash. As NFL.com explains, this drill tests a player’s explosion from a static start. Two other drills employed at the combine are the vertical and broad jumps, both used to test an athlete’s lower-body explosion and power. The shuttle drill tests a player’s lateral quickness and explosion in short areas, and a similar drill, the 3-cone drill, tests an athlete’s ability to change direction at high speed. An attendee’s performance at the combine can help his name rise up draft boards; in some instances, bad times can send a player’s draft stock plummeting. For players on the fringe of being drafted, the combine provides a chance to either show they belong or be exposed.

In 2013, Jonathan Bales analyzed the correlation between the 40-yard dash times of running backs and their other drills, using all running backs drafted between 2008 and 2012. The strongest correlation was between a running back’s 40-yard dash time and his weight (0.51); the second strongest was with his broad jump (-0.46). Correlation values range from -1 to 1, and the closer a value is to -1 or 1, the stronger the relationship. In this article I will use the absolute value of correlations when referring to them in writing.
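As a quick illustration of the kind of correlation Bales computed (with pandas; the combine numbers below are invented for the example, not real results):

```python
import pandas as pd

# Hypothetical combine results for five running backs.
rbs = pd.DataFrame({
    "forty": [4.40, 4.48, 4.55, 4.62, 4.70],   # 40-yard dash (seconds)
    "weight": [195, 205, 215, 222, 230],        # pounds
    "broad": [126, 122, 118, 116, 112],         # broad jump (inches)
})

# Pearson correlation of the 40-yard dash with every other measurement.
corr = rbs.corr()["forty"].drop("forty")
print(corr.round(2))
```

In this toy data, heavier backs run slower (a positive correlation with 40 time) and longer broad jumps go with faster times (a negative one), matching the signs Bales reported.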

I set out to find the correlations between all the athletes’ 40-yard dash times and their performances in other drills. I thought that doing so would give me some insight into why times are different between players within position groups. I analyzed all position groups at the NFL combine from 2005 to 2015 and only included the players who participated in all of the drills. The tables below show each position group and what their top three highest correlated drills are with the 40-yard dash (along with the values).

To further the analysis, I attempted to group the positions together by looking at the top correlations for each position, combined with some personal football knowledge of what each position does and how it is connected with those around it. I grouped the running backs and wide receivers together because of their low number of highly correlated drills and the prevalence of the broad jump; this is the “offensive skill” group. I grouped the cornerbacks, free safeties, and strong safeties together: all three have the broad and vertical jumps as their top two drills, and all are highly connected on the defensive side of the ball. This is the “defensive skill” group. The guards, tackles, defensive tackles, and centers were grouped together; all four have the shuttle or 3-cone correlated with the 40-yard dash at approximately 0.5, which is unique. This is the “interior” group. The next group, made up of defensive ends, inside linebackers, outside linebackers, and tight ends, is called “semi”; all four positions had the broad and vertical jumps in their top three, and three of the four had both jumps at or around 0.5. Quarterbacks are left in a group of their own: they are the only position group with three correlations above 0.52, and they didn’t fit with any of the other groups. In summation:

- Running backs and wide receivers: Offensive Skill
- Cornerbacks, free safeties, and strong safeties: Defensive Skill
- Guards, Tackles, Defensive Tackles, Centers: Interior
- Defensive Ends, Inside Linebackers, Outside Linebackers, Tight Ends: Semi
- Quarterbacks: Outliers

I then combined the individual position data to create data for each grouping. A scatter plot was created (with the corresponding correlation) for each correlation that was either the group’s highest or above 0.5 (excluding the quarterbacks). I separated the data points by whether the player was drafted or undrafted, to show where there may be a cutoff for being drafted based on combine performance. First, the offensive skill group:

The low correlation value with the broad jump for running backs and wide receivers indicates to me that while many “offensive skill” players have differing levels of lower-body explosiveness, many can make up for it, or diminish that advantage, with foot speed. So while one player might be more explosive, another can catch up to his 40 time with stride turnover. This isn’t a foolproof connection, but it could explain some of the differences in times within the offensive skill group. Another thing to note: while the majority of players in the top-left portion of the graph (a slow 40 time and a low broad jump) went undrafted, there are certainly a lot of outliers, which may point to players who are heavier.

Next for the defensive skill:

This low correlation indicates similar results to the offensive skill group. One interesting difference is that there appears to be a little more horizontal spread (across broad jump results) for the undrafted offensive skill players than for the undrafted defensive skill players, which would indicate that it is easier for an offensive skill player to overcome a poor 40-yard dash with a good broad jump and still be drafted. It is worth noting that the major outlier in this group is Byron Jones, the cornerback from Connecticut, who in 2015 turned in one of the greatest combine performances ever.

The next group plotted is the interior. Here is the plot with broad jump:

The interior group also has a high correlation with the 3-cone drill.

The relatively high correlation with the broad jump, paired with the relatively high correlation with the 3-cone, indicates that interior players with a high broad jump will likely run a fast 40. More importantly, the high 3-cone correlation may indicate that interior players’ 40 times depend more on stride rate than other position groups’ do; the 3-cone, in turn, tests an athlete’s stride turnover as he changes direction. This also means an interior player doesn’t have to be especially explosive in the hips to run a fast 40; he can make up for a lack of explosiveness with stride turnover. For this group there certainly seems to be a cutoff point in measured athleticism on the 40 time vs. broad jump graph: above a 5.5-second 40 and below a 95-inch broad jump could be that point, because few players who put up those numbers at the combine went on to be drafted.

The next group analyzed was the semi:

The semi group also has a high correlation with the vertical jump:

The semi group had both the vertical and broad jumps highly correlated with the 40-yard dash, which suggests the overriding factor in the semi group’s 40 time is hip explosiveness. The “stride turnover effect” described above does not seem to be as powerful for these players’ 40-yard dash performances.

The last “group” to analyze was the quarterbacks:

The quarterbacks were the true outliers of the position groups. They had the vertical and broad jumps as their two highest drills, which wasn’t uncommon, but their values were much higher than the other groups’ (both above 0.6). This again suggests that the attribute most predictive of a quarterback’s 40 time is hip explosiveness. However, the shuttle is their third highest correlated drill (0.54), which indicates that both turnover and explosiveness are strongly related to a quarterback’s 40 time. That conclusion is the most consistent with conventional wisdom about how someone runs a fast 40 and what accounts for differences between players.

]]>

The NBA playoffs have arrived at last.

With the first round series set, basketball analysts are looking to regular season results for information about postseason matchups. NESN, for instance, points to the Bulls and Celtics splitting their regular season series 2-2 as cause for concern. USA Today notes that the Clippers’ 3-1 record against the Jazz this season should inspire confidence among Los Angeles fans.

To test whether it’s wise to tie postseason predictions to regular season matchups, I grabbed data from Basketball-Reference going back to the 2006-2007 season. I was interested in home teams’ series winning percentage, as well as the following three regular season variables:

- Home team’s Adjusted Net Rating (ANR), an opponent-adjusted “estimate of point differential per 100 possessions,” as calculated by Basketball-Reference
- Away team’s ANR
- Regular season point differential between the home and away team.

As an example, this year’s Celtics have an ANR of +2.32 compared to the Bulls’ -0.08. The Celtics scored 409 points against the Bulls this season, while they allowed only 389. The regular season point differential, then, would be +20 for the home team Celtics.

A preliminary check suggests that home teams *do* fare worse against teams they’ve been outscored by in regular season matchups:

But we’ll need to control for team strength. After all, a negative differential may mean that, seeding be damned, the away team was actually better than the home team. To control for this, I used logistic regression to model the home team’s probability of winning each series as a function of the three variables outlined above: home team ANR, away team ANR, and regular season point differential in matchups between the two teams.

The results of this regression are as follows:

| Variable | Estimated Coefficient | Test Statistic | p-value |
| --- | --- | --- | --- |
| Intercept | 0.701 | 1.35 | 0.18 |
| Home Team ANR | 0.16 | 1.77 | 0.08 |
| Away Team ANR | -0.24 | -2.82 | 0.005 |
| Regular Season Point Differential | 0.02 | 2.04 | 0.04 |

These numbers indicate that regular season matchups *do* matter a significant amount, even after you’ve controlled for the strength of each team.

As an editorial aside, I should point out that I expected regular season matchups to be irrelevant, and I was fully prepared to point and laugh at the talking heads who suggest otherwise. Perhaps the professionals are professionals for a reason.

While the model is crude and simplistic, it might still be fun to predict the first round of the 2017 playoffs using our three variables of interest.

| Home Team | Probability | Away Team | Probability |
| --- | --- | --- | --- |
| Celtics | 82% | Bulls | 18% |
| Cavaliers | 89% | Pacers | 11% |
| Raptors | 86% | Bucks | 14% |
| Wizards | 95% | Hawks | 5% |
| Warriors | 91% | Trail Blazers | 9% |
| Spurs | 87% | Grizzlies | 13% |
| Rockets | 89% | Thunder | 11% |
| Clippers | 60% | Jazz | 40% |

These predictions should be taken with a grain of salt, especially compared to more advanced models. Whatever the outcomes, though, it’s good to return to the glorious time of year that is the NBA postseason.

After 144 days of college basketball, we’ve come to Gonzaga v. UNC: the mid-major who finally made it over the hump against the conference champion from what people were calling the best conference of all time before the tourney. I’m trying not to fret over what exactly I’ll do with my life after the final buzzer sounds tonight, but until then, I’m doing my best to just soak in the moment.

As Mark Titus often expounds, Gonzaga is the victim of an unfair reputation for choking in the NCAA Tournament. The Zags’ two upsets in the past decade came against an eight-seed Wichita State team that would go on to the Final Four in 2013 and a Steph Curry-led Davidson in the first round in 2008. Besides those two “upsets” (both opponents ended the season in the KenPom Top 20), Gonzaga has not been bounced early in the Tournament since 2006.

Yet everyone was saying this season seemed different—that this wasn’t your father’s (or older sibling’s) Gonzaga. The Zags ran through the WCC with one trip-up against BYU, and beat Arizona and Florida in the regular season. But beyond those two marquee matchups, it is tough to tell how good a team is when it doesn’t regularly play great competition.

In a previous article, I highlighted a new way of ranking college basketball teams that tries to better encapsulate a team’s strength of victory by using weighted probability rather than point differential; it yields a coefficient that represents a team’s strength. I also calculated a proxy for strength of schedule by averaging the coefficients of each team’s opponents during the season. The following plot shows how Gonzaga has performed over the past four years against its SOS.

Gonzaga’s strength of schedule has been worse than in years past, but its performance has been incredibly high. Gonzaga’s dominance over the WCC and its out-of-conference opponents is different from the last four years.

Even looking at all Division 1 teams this year, no team dominated its schedule like Gonzaga did—Wichita State and Villanova come the closest. North Carolina, on the other hand, has a lower coefficient, but achieved it against a much better schedule. Off to the top right, there’s a cluster of ACC teams—Virginia (above UNC, not labelled), UNC, Duke, and Louisville—that are high-achieving teams with strong schedules.

Looking even further back, to the past four years, Gonzaga ran over its opponents with more force than any other team. While the “Dream Team” Kentucky team from 2015 comes close, it doesn’t eclipse Gonzaga’s coefficient from this year. (To be fair, Kentucky’s SOS was much higher that year than Gonzaga’s this year.)

So, for those doubting Gonzaga’s place in the NCAA Championship game, don’t. Calling this team a mid-major is misleading. They can hang with anyone. The question will be whether the Tar Heels—and their insane rebounding—can hang with them.

Sports media’s talking heads have dismissed Tim Tebow’s time with the New York Mets as “unearned” and “an embarrassment.” But the numbers tell a different story.

In fact, according to the numbers, the analytics, the stats, and even a few metrics, Tim Tebow isn’t merely good enough to play baseball—he’s in line to be one of the game’s brightest stars.

Using data from Spring Training (no sites have regular season data for Tebow as of this post), I calculate that the former football star is, by a wide margin, the finest prospect in the Mets organization.

First, I limited the pool to Mets players whose batting averages were among the team’s top 50 this spring. The next step required a statistic even more advanced than batting average. That’s why I created SLugging Over OPS for Fielders (SLOOF). The formula for SLOOF is as follows:

Because the term in the denominator is squared, this stat is officially quadratic. Let’s see how Tebow stacks up compared to other Mets:

Using SLOOF, it’s clear that Tebow deserves not only to make the MLB, but that he should be starting on day one. For some perspective, consider this: the Mets’ next closest SLOOF is 200, more than 5700 away from 5920.

While we know that Tebow in his current form is a premier ballplayer, questions remain about how his abilities will play out in the future. Fortunately, we can use linear regression to project his future numbers.

This spring, Tebow started out with 0 hits in 0 games played. While it may be too early to make such bold comparisons, it’s worth noting that Tebow shares this statline with some legendary seasons, including Barry Bonds 2001, Babe Ruth 1921, Mark Bellhorn 2003, Sammy Sosa 1998, Bellhorn 2004, Bellhorn 2006, and Rogers Hornsby 1924.

Tebow finished the spring with 4 hits in 9 games, a positive, causal relationship illustrated by the following graph:

If we assume this rate keeps up, the sky is the limit for Tebow. Given the proper number of games, there’s no reason Tebow shouldn’t join the 3000-hit club:
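For anyone who wants to check the projection at home, the whole regression fits in a few lines (played entirely straight, as this analysis deserves):

```python
import numpy as np

# Tebow's spring: 0 hits through 0 games, 4 hits through 9 games.
games = np.array([0.0, 9.0])
hits = np.array([0.0, 4.0])

# Fit hits as a linear function of games played (~0.44 hits per game).
slope, intercept = np.polyfit(games, hits, 1)

# Games "needed" to reach the 3000-hit club, extrapolating heroically.
games_needed = (3000 - intercept) / slope
print(round(games_needed))  # -> 6750
```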

So while it has become fashionable in the hot-take crowd to doubt Tim Tebow, the consensus among sabermetricians is that Tebow is a present and future star. Teams would be wise to move any assets in their pursuit of him.

It’s perhaps the best and worst weekend of the year. The cream of the crop has risen in the NCAA Tournament, culminating in the three most high-stakes games of the year. There are a lot of interesting story lines: Gonzaga finally getting over the hump and into the Final Four, and the fact that Frank Martin’s vocal cords are still intact four games into the tourney, among others. But come Monday night, college basketball will all but vanish into thin air for the foreseeable future.

While both games are fascinating matchups, the one that catches my eye the most is the UNC-Oregon tilt. Both teams are uber-athletic, have top-20 offensive and defensive units, and feature the ACC and Pac-12 POYs. In a matchup of elite offensive and defensive units, the ability to control the board may be crucial in deciding who comes out on top. For North Carolina, that may be great news. North Carolina is the best offensive rebounding team and the 22nd-best defensive rebounding team, per KenPom. However, those raw numbers may understate how dominant UNC is on the glass.

Using raw percentages can lead to questionable results. For example, New Hampshire is the “best” defensive rebounding team in the country according to KenPom. However, if any major conference team were to walk into Lundholm Gym in Durham, New Hampshire, they might look like the Monstars compared to the Wildcats.

Thus, I wanted to create an adjusted rebounding metric that took this into account. Using box score data from this past year, I ran two separate regressions—one to determine ORB strength, and one to determine DRB strength. For each game in the 2017 season, I calculated each team’s offensive and defensive rebounding percentage. I then created a dummy matrix for each team, with the team of interest coded as 1, and the opponent coded as -1. I also included a variable that designates whether the game was home, away, or at a neutral site for the team of interest.
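A minimal version of that design (team of interest coded +1, opponent -1, plus a site variable) looks like this; the box-score numbers are invented for illustration, not the actual season data:

```python
import numpy as np
import pandas as pd

# Hypothetical box scores: team of interest, opponent, site, offensive reb %.
games = pd.DataFrame({
    "team": ["UNC", "Duke", "UNC", "Oregon"],
    "opp": ["Duke", "Oregon", "Oregon", "UNC"],
    "site": [1, 0, -1, 1],          # 1 = home, 0 = neutral, -1 = away
    "orb_pct": [0.42, 0.31, 0.38, 0.25],
})

# Dummy matrix: +1 for the team of interest, -1 for the opponent.
teams = sorted(set(games["team"]) | set(games["opp"]))
X = np.zeros((len(games), len(teams) + 1))
for i, row in games.iterrows():
    X[i, teams.index(row["team"])] = 1
    X[i, teams.index(row["opp"])] = -1
    X[i, -1] = row["site"]          # shared home-court coefficient

coefs, *_ = np.linalg.lstsq(X, games["orb_pct"].to_numpy(), rcond=None)
strength = pd.Series(coefs[:-1], index=teams)

# Standardize so 0 is roughly league average.
strength = (strength - strength.mean()) / strength.std()
print(strength.round(2))
```

The same fit on defensive rebounding percentages yields the second coefficient for each team.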

The regression yielded two coefficients for each team, which I standardized and then plotted on the graph below. The further up a team is on the graph, the better the team is at offensive rebounding. The same relationship exists for a team being further right and its defensive rebounding ability.

It’s UNC and then the rest of the country. No team even approaches UNC’s ability to control the boards: its defensive and offensive rebounding coefficients both hover around 3.5 (where 0 is roughly league average), while no other team goes above 2.5 in either category. Oregon, meanwhile, has a defensive coefficient of 0.04 and an offensive coefficient of 0.48, nowhere near UNC’s dominance on the boards.

If Oregon can’t mitigate UNC’s huge rebounding advantage today, it may be a long day for the Ducks.

]]>