MoneyB-ball: An Analysis of Over and Undervalued NBA Statistics

By David Arkow, Ronan Jachuck, Luke Kolar

The NBA’s 30 teams employ more than 400 players each season. Each team’s general manager tries to maximize points scored (offense) and minimize points allowed (defense) in order to improve the team’s winning percentage. Many measurable statistics affect the offensive and defensive sides of the game, and it is the job of each team to identify players who improve these areas given its current roster. This is not an easy task, however, because complex salary regulations limit a team’s ability to spend in the open market. Acquiring players with specific traits that will improve a team’s performance while staying within these salary regulations is therefore a valuable skill. In this article, we analyze which player profiles and statistics are undervalued in the NBA market. This article was adapted from a final project submission for Judd Cramer’s undergraduate Sports Economics course.

Previous Literature

One of the most famous early examples of using statistical methods to build a team that could compete against wealthier franchises is the Oakland Athletics’ playoff run in 2002. The hit film “Moneyball” put the A’s approach in the spotlight, and John Hakes and Raymond Sauer performed an economic evaluation to test its validity (“An Economic Evaluation of the Moneyball Hypothesis”). Using a linear regression of winning percentage on a number of baseball metrics, they confirmed that on-base percentage was significantly more important to a team’s chances of winning than power hitting statistics like slugging percentage. They also estimated a regression of salary on the same player statistics and found that the coefficient for slugging percentage (2.392) was significantly larger than the coefficient for on-base percentage (1.360). In other words, the labor market paid more for slugging than for on-base percentage even though on-base percentage contributed more to winning.

Although the Moneyball analysis focuses on baseball, similar attempts have been made to find undervalued statistics in the NBA. Using publicly available NBA data from SportVU camera tracking, Basketball Insiders identified three underrated statistics for predicting a team’s performance: open shooting numbers, pace metrics, and rim protection (Draper 2022). Similarly, FiveThirtyEight ran a regression on players’ box score stats (points, rebounds, assists, blocks, steals and turnovers) to estimate how many points each stat was worth. The results showed that a steal was worth approximately nine points; that is, a marginal steal is worth roughly nine times as much as a marginal point when predicting a player’s impact on a game. These studies suggest that non-shooting metrics are very important when predicting the success of a team. However, they do not specifically answer our research question of which statistics are undervalued in the NBA market.

Our analysis is inspired by the Moneyball example but focuses on basketball instead of baseball. The Basketball Insiders study identified underrated statistics, but it simply selected three statistics that are relatively new to the public given the NBA’s release of SportVU data. The FiveThirtyEight study estimated the point value of box score statistics but did not take into account the value of these statistics in the open market. Our analysis differs from both: we use regression analysis not only to identify statistics that are important in predicting winning, but also to regress player salary on the same statistics to determine which ones are undervalued relative to their impact on winning.

Data and Methodology

To conduct our analyses, we used three datasets with statistics on player performance and player salaries from the 2021-2022 NBA season. The player performance statistics come from two Basketball Reference datasets and the salary data comes from Spotrac; we merged all three into one dataset to perform our regression analyses. Because the statistics are on different scales, we normalized the statistics and salaries to z-scores. For example, some statistics are recorded as percentages, and an average player might have 4 assists per game while the average points per game is closer to 12. Normalizing makes the coefficients directly comparable and easier to interpret. We ran two separate sets of regressions to see which box score stats and which advanced stats are over- or undervalued when compensating players. Specifically, we regressed salary on several box score stats and VORP (value over replacement player) on the same box score stats, then compared the differences in magnitude and statistical significance between the two sets of coefficients (and repeated the process for advanced stats). For example, if a one standard deviation increase in assists is associated with a one standard deviation increase in VORP but only a 0.5 standard deviation increase in salary, the market undervalues assists when paying players (and vice versa). We control for age, since older players tend to have better stats and get paid more due to CBA rules, and we include team fixed effects, since players on very good teams (e.g. the NBA champion Golden State Warriors) might have inflated stat lines due to peer effects.
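The setup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the article: the data are synthetic, the function names (`zscore`, `solve`, `standardized_ols`) and the data-generating numbers are hypothetical, team fixed effects are left out for brevity, and no significance tests are computed.

```python
import random
from statistics import fmean, pstdev

def zscore(values):
    """Standardize a list to mean 0 and (population) standard deviation 1."""
    mu, sigma = fmean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def standardized_ols(y, columns):
    """Regress z-scored y on z-scored columns; return one slope per column.

    Every variable is centered, so the intercept is zero and is omitted.
    """
    zy = zscore(y)
    zx = [zscore(col) for col in columns]
    k = len(zx)
    xtx = [[sum(a * b for a, b in zip(zx[i], zx[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(a * b for a, b in zip(zx[i], zy)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic players (assumption: invented numbers, not the article's data).
random.seed(0)
n = 400
points = [random.gauss(12, 6) for _ in range(n)]
assists = [random.gauss(4, 2) for _ in range(n)]
age = [random.gauss(27, 4) for _ in range(n)]
# Hypothetical market that pays mostly for points and age ...
salary = [1.0 * p + 0.5 * a + 0.3 * g + random.gauss(0, 3)
          for p, a, g in zip(points, assists, age)]
# ... while on-court value depends relatively more on assists.
vorp = [0.10 * p + 0.15 * a + random.gauss(0, 0.5)
        for p, a in zip(points, assists)]

names = ["points", "assists", "age"]
predictors = [points, assists, age]
salary_coefs = dict(zip(names, standardized_ols(salary, predictors)))
vorp_coefs = dict(zip(names, standardized_ols(vorp, predictors)))

# Side-by-side standardized coefficients: salary regression vs. VORP regression.
for name in names:
    print(name, round(salary_coefs[name], 3), round(vorp_coefs[name], 3))
```

In this made-up example the assists coefficient is larger in the VORP regression than in the salary regression, which is the pattern we would read as "undervalued". A real analysis would typically use a statistics package that also reports standard errors and handles team dummies.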

Results and Analysis

Each coefficient is interpreted as follows: for a one standard deviation increase in a given box score stat, the player’s salary (measured in millions of USD before normalization) or VORP changes by the coefficient’s value in standard deviations, holding all else in the model constant. For example, if a player averages one standard deviation above the mean in points per game, their predicted salary is 0.649 standard deviations higher and their predicted VORP is 0.482 standard deviations higher, holding all else constant. It makes sense that this coefficient is the largest in magnitude and statistically significant at the 1% level, given that points are considered the most important metric in tracking basketball performance. Based on the coefficient differences and statistical significance, we can say whether a statistic seems to be undervalued, overvalued, or indeterminate (likely appropriately valued). If the coefficient in the salary regression is greater than the coefficient in the VORP regression, the stat might be overvalued, and vice versa. Likewise, if the coefficient is statistically significant for salary but not for VORP, the stat might be overvalued, and vice versa. See the full table below for the report of which traditional box score stats are under- or overvalued.
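One way to make this decision rule explicit is the short sketch below. How the two criteria are combined, the `tol` threshold, and the function name `classify` are assumptions made for illustration; the article applies the comparison by inspection rather than with a fixed cutoff. Only the points coefficients (0.649 and 0.482) come from our results; the other inputs are hypothetical.

```python
def classify(salary_coef, vorp_coef, salary_sig, vorp_sig, tol=0.05):
    """Label a stat as overvalued, undervalued, or indeterminate.

    salary_sig / vorp_sig: whether the coefficient is statistically
    significant in the salary / VORP regression.
    tol: hypothetical minimum gap between standardized coefficients
    before the difference is treated as meaningful.
    """
    # Significance mismatch: paid for but not tied to value, or the reverse.
    if salary_sig and not vorp_sig:
        return "overvalued"
    if vorp_sig and not salary_sig:
        return "undervalued"
    if not salary_sig and not vorp_sig:
        return "indeterminate"
    # Both significant: compare the standardized coefficients directly.
    gap = salary_coef - vorp_coef
    if gap > tol:
        return "overvalued"
    if gap < -tol:
        return "undervalued"
    return "indeterminate"

# Points: coefficients reported above, significant in both regressions.
print(classify(0.649, 0.482, True, True))
# Hypothetical stat that is significant for VORP but not for salary.
print(classify(0.05, 0.30, False, True))
```

The rule compares signed coefficients, so a stat with a negative effect, such as turnovers, needs extra care when reading the label.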

The three most undervalued statistics seem to be eFG% (effective field goal percentage), assists, and turnovers. It makes sense that eFG% is undervalued since it is an efficiency stat and is less often reported in traditional box scores. Assists are interesting because they are frequently reported but perhaps play second fiddle to points when owners decide how much to pay players. Potential assists (possessions that end in a shot, foul, or turnover) would likely be a better stat to examine, as it is agnostic of shooter quality (a player might have more assists simply because they pass the ball to Kevin Durant). Turnovers are also less frequently reported in box scores and can be detrimental to a team. As with assists, a better stat would be turnover percentage (the share of times a player gets the ball that end in a turnover), since point guards typically have more turnovers than centers because they handle the ball more.
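For readers unfamiliar with these rate stats, the sketch below computes eFG% and turnover percentage using the formulas commonly published in the Basketball Reference glossary. We assume those definitions here, which express turnover percentage as turnovers per 100 plays; the stat lines are hypothetical and only illustrate why a rate can rank players differently than a raw count.

```python
def effective_fg_pct(fg, three_pt, fga):
    """eFG% = (FG + 0.5 * 3P) / FGA; a made three counts as 1.5 field goals."""
    return (fg + 0.5 * three_pt) / fga

def turnover_pct(tov, fga, fta):
    """TOV% = 100 * TOV / (FGA + 0.44 * FTA + TOV), turnovers per 100 plays."""
    return 100 * tov / (fga + 0.44 * fta + tov)

# Hypothetical per-game lines (not real players).
guard = {"tov": 3.0, "fga": 18.0, "fta": 6.0}   # high-volume ball handler
center = {"tov": 1.5, "fga": 6.0, "fta": 2.0}   # low-volume big man

print(effective_fg_pct(fg=8, three_pt=2, fga=16))
print(round(turnover_pct(**guard), 1), round(turnover_pct(**center), 1))
```

In this example the guard commits twice as many turnovers per game but has the lower turnover percentage, because he uses far more plays.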

Not surprisingly, the two most overvalued stats are blocks and points. While the points coefficient is statistically significant and positive in both regressions, it is larger in the salary regression than in the VORP regression. Points are a counting stat, and while the best players typically score a lot, high point totals sometimes come at the expense of efficiency for players who simply take a lot of shots and miss many of them. Blocks are a “small sample stat” (Jaren Jackson Jr. led the league in the 2021-22 season with 2.3 per game) and, although indicative of a strong rim protector, can be flawed. Similar to chasing points, some players might chase blocks by contesting every shot, which can leave them vulnerable to pump fakes or fouls (Jaren Jackson Jr. also led the league with 3.64 fouls per game).

We repeat the same process with the advanced stats. As expected, there is less deviation between these coefficients, and they are all statistically significant: in today’s analytics revolution, teams rely more on advanced stats than on the box score when deciding how much to compensate players. It would be interesting to look at a different time period (e.g. the 1990s, when Jordan played) to see which stats were valuable for winning in that environment (e.g. threes were not taken as much) and how teams paid for them (stats were probably misvalued more often in the labor market, with larger deviations between contribution to winning and compensation).

PER and Win Shares are the most undervalued stats. Win Shares measure an individual player’s contribution to winning in the context of his team by dividing credit for team wins among its players. PER (player efficiency rating, developed by John Hollinger, a former Memphis Grizzlies vice president) is a measure of a player’s per-minute productivity. PER can be a flawed stat, as it doesn’t capture defensive value and is agnostic of the competition a player faces (bench players who play against second units vs. starters). Nevertheless, both stats are good proxies for contribution to winning and player efficiency, measuring a player’s “opportunity cost”. Usage rate is overvalued, meaning that teams pay players with high usage rates too much in proportion to how much they affect winning. Usage rate simply measures the share of a team’s possessions that end with that specific player touching the ball. A higher usage rate is neither good nor bad but provides context for the player’s efficiency (stars like Luka Doncic or Giannis Antetokounmpo usually lead the league in usage). Since it is overvalued, front offices should account for it more carefully when paying players and penalize those with high usage rates who don’t back them up with efficiency or contributions to winning (e.g. Russell Westbrook).

Conclusion

We have identified both the traditional box score stats and the advanced analytical stats that are over- and undervalued by front offices when compensating players. After regressing salary and VORP on several of them, we compared differences between coefficients and statistical significance. In general, we find traditional stats like points or blocks to be overvalued and analytical or efficiency stats like eFG%, Win Shares, or PER to be undervalued. We recognize there are flaws in some of these metrics, as they don’t fully capture defensive value or adjust for the trade-off between efficiency and volume. Nevertheless, these are solid estimates of how teams compensate their players based on performance and of how they could structure payroll more optimally. These results come only from the 2021-2022 season and likely hold only within a limited time frame, since the overall league environment and the CBA rules about player compensation have evolved over time. For example, if repeated for the 1990s, the regressions might find less of an emphasis on three-point shooting and more value placed on “big-man” stats, since the game allowed centers to dominate more. It is also possible that owners do not care only about winning when signing players and simply want to attract the top players who average the most points, regardless of whether they contribute to winning, but this is beyond our scope. Other interesting applications, besides other NBA eras, include looking at which specific teams allocate their salary most effectively (e.g. the smartest “Moneyball” team, like the Oakland A’s), identifying specific players who are over- or undervalued, and running similar regressions for other sports such as the NFL and NHL. Ultimately, a simple regression analysis can shed light on which statistics are over- or undervalued in the NBA and provides good lessons about what types of players teams should look for, and how much to pay them, to optimize roster construction on a budgeted payroll and compete for a championship.
