By Harrison Chase, Nathaniel Ver Steeg, and Daniel Smith
One of the most important jobs of any NBA team is player development. For as much money as is spent on scouts and coaches to find and develop players' potential, surprisingly little research (at least public research) has been done to find out what factors accelerate player development. In an article bashing the Philadelphia 76ers' blatant tanking attempts a few months ago, Andrew Sharp of Grantland claimed that the Sixers were stunting the growth of their players because "young players learn how to play in the NBA by listening to veterans and playing meaningful games." In this study we look at which measures are correlated with player progression in the NBA and, more specifically, attempt to support with numbers Sharp's claim that veteran leadership and meaningful game experience are important.
The first step in analyzing player skill development is determining how to accurately measure a player's skill. There are many metrics that do this, but we chose BPM (Box Plus-Minus) for two reasons: it is one of the newest metrics, so relatively few people have worked with it and we would have the opportunity to do new research, and, thanks to basketball-reference.com, it is easy to scrape. We then looked for other research about player progression that we could build off of, and found surprisingly little. One area where there is a decent amount of research is how players age. As multiple analyses have shown, players typically improve until the 25-27 age range (due to physical growth, learning new skills, and having more time to play) and then their performance falls off. So in order to look at player progression, we first have to control for aging. We decided to create an aging curve of our own and then calculate the difference between a player's actual progression and their age-expected progression. To create the aging curve we used a process similar to the one described here, and we also created an adjusted aging curve (attempting to account for survivorship bias) by following the procedure outlined here. It turns out that the two aging curves are very similar for young players, as is to be expected, since survivorship bias mostly affects older players. The resulting aging curves are pictured below. As you can see, the shape is very much what we might imagine, with a peak around the 25-26 range.
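To make the aging-curve construction concrete, here is a minimal sketch of the delta method that analyses like this typically use: average the year-over-year BPM change at each age, weighting each consecutive-season pair by minutes. The player-season records below are invented purely for illustration (the real study scraped BPM from basketball-reference.com), and the weighting choice is an assumption, not the authors' exact recipe.

```python
from collections import defaultdict

# Hypothetical player-season records: (player, age, minutes, bpm).
# These numbers are made up to illustrate the procedure.
seasons = [
    ("A", 22, 1500, -1.0), ("A", 23, 1800, 0.2), ("A", 24, 2000, 1.1),
    ("B", 22, 1000, -2.0), ("B", 23, 1200, -1.5),
    ("C", 25, 2200, 2.0), ("C", 26, 2400, 2.3), ("C", 27, 2300, 2.1),
]

def aging_curve(seasons):
    """Delta method: average the year-over-year BPM change at each age,
    weighting each pair by the smaller of the two seasons' minutes."""
    by_player = defaultdict(list)
    for player, age, minutes, bpm in seasons:
        by_player[player].append((age, minutes, bpm))
    deltas = defaultdict(lambda: [0.0, 0.0])  # age -> [weighted sum, weight]
    for rows in by_player.values():
        rows.sort()
        for (a1, m1, b1), (a2, m2, b2) in zip(rows, rows[1:]):
            if a2 == a1 + 1:  # consecutive seasons only
                w = min(m1, m2)
                deltas[a2][0] += w * (b2 - b1)
                deltas[a2][1] += w
    return {age: s / w for age, (s, w) in sorted(deltas.items())}

curve = aging_curve(seasons)  # expected BPM change upon reaching each age
```

Summing the per-age deltas from some baseline age then traces out the curve itself; the survivorship-bias adjustment would reweight or impute the pairs for players who drop out of the league.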
Now, with an estimate of the aging curve, we can find the difference between how much a player actually progressed and how much they were expected to progress based on their age (we call this variable prog1). We can then regress this on several variables. To repeat, each observation in this study is a pair of consecutive seasons by the same player. We therefore have data from both the first year of the pair (henceforth year 1) and the second year of the pair (year 2). For this analysis we were interested in what teams could do in year 1 to cause their players to improve in year 2. The assumption here is that things done in the first year carry over to the second in the form of player progression. It is because of this assumption that we look exclusively at data from the first year of each pair. A table of the variables we considered is below.
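The prog1 construction can be sketched in a few lines, assuming an aging curve mapping the age a player turns in year 2 to the age-expected BPM change. The curve values and function names here are hypothetical placeholders, not the fitted values from the study.

```python
# Hypothetical age-expected BPM changes (NOT the study's fitted curve).
expected_change = {22: 1.0, 23: 0.8, 24: 0.5}

def prog1(bpm_year1, bpm_year2, age_year2):
    """Progression above age expectation: the actual BPM change across the
    consecutive-season pair minus the change expected from age alone."""
    return (bpm_year2 - bpm_year1) - expected_change[age_year2]

# A player who improved from -1.0 to 0.3 BPM while turning 23 beat the
# age-expected gain of 0.8 by half a point of BPM.
improvement_above_expected = prog1(-1.0, 0.3, 23)
```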
(NOTE: for the team stats of players who changed teams midseason, we took the average across the teams they had been on.)
Why did we include all these statistics? We included a player's team winning percentage and whether or not they made the postseason as measures of whether they play in meaningful games, and we tried to measure veteran leadership with a variety of statistics. We looked not only at average team age and the number of players above 32 years of age (who may be considered veterans), but also at the distribution of career minutes, games played, and VORP (Value Over Replacement Player) on a team, as indications of whether the team had veterans who could perhaps mentor its young players.
We ran the regression only on a subset of the data. We looked only at players who were 24 years old or younger before the start of the first season, because we were interested in the development that occurs in the early years. We looked only at players who had played more than 200 minutes in both seasons, to filter out those whose BPM might be unreliable due to small sample size. We looked only at players who had been drafted, so we could try to see whether draft position affected development (announcers often say that players are drafted 'for their potential'). Finally, we considered only pairs where the first year was 1984 or later. We did this because our data goes back only to 1974, so every player's career minutes in 1974 reads as zero, which would throw off all the career stats. This left us with 2,716 data points where, as mentioned before, each data point is a pair of consecutive seasons from the same player.
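The four sample filters above can be expressed as a single predicate. This is a sketch with hypothetical field names (the study's actual column names are not given); each record stands for one consecutive-season pair.

```python
# Hypothetical observations; each dict is one consecutive-season pair.
pairs = [
    {"age": 23, "min1": 900,  "min2": 1100, "draft": 15,   "year1": 1995},
    {"age": 26, "min1": 2000, "min2": 2100, "draft": 3,    "year1": 1995},  # too old
    {"age": 22, "min1": 150,  "min2": 400,  "draft": 40,   "year1": 1990},  # too few minutes
    {"age": 24, "min1": 1200, "min2": 1300, "draft": None, "year1": 1988},  # undrafted
    {"age": 21, "min1": 500,  "min2": 450,  "draft": 1,    "year1": 1983},  # pre-1984
]

def in_sample(p):
    """The four filters described in the text."""
    return (p["age"] <= 24                          # young players only
            and p["min1"] > 200 and p["min2"] > 200  # trim small samples
            and p["draft"] is not None               # drafted players only
            and p["year1"] >= 1984)                  # career minutes valid from 1984 on

sample = [p for p in pairs if in_sample(p)]
```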
We found BPM, minutes played, draft number, and the team making the playoffs, as well as the interaction effects between BPM and minutes, BPM and draft number, and BPM and making the playoffs, to all be significant factors in predicting the progression of young NBA players. Our final model has the form:
prog1 = -0.723 - 0.394*bpm1 + 0.000296*min1 - 0.00834*draft + 0.143*t.p + 0.0001*bpm1*min1 - 0.0018*bpm1*draft - 0.048*bpm1*t.p
That can be rearranged (to make the interpretation easier) to be:
prog1 = -0.723 + bpm1*(-0.394 + 0.0001*min1 - 0.0018*draft - 0.048*t.p) + 0.000296*min1 - 0.00834*draft + 0.143*t.p
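The rearranged form can be evaluated directly from the coefficients reported above. The function below does exactly that; the example player profile is hypothetical, chosen only to show the formula in use.

```python
def predicted_prog1(bpm1, min1, draft, tp):
    """Evaluate the fitted model using the coefficients reported in the text.
    tp is 1 if the player's team made the playoffs in year 1, else 0."""
    return (-0.723
            + bpm1 * (-0.394 + 0.0001 * min1 - 0.0018 * draft - 0.048 * tp)
            + 0.000296 * min1
            - 0.00834 * draft
            + 0.143 * tp)

# Hypothetical example: a below-average young player (BPM of -2.0) who
# logged 2,000 minutes, was the 10th pick, and made the playoffs.
example = predicted_prog1(bpm1=-2.0, min1=2000, draft=10, tp=1)
```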
For those interested, the R output of our regression is below:
For the interpretation, it is important to note that these are all correlations we are observing, since this was an observational study, not a randomized experiment. Still, the correlations we found were pretty interesting. Looking at the first variable, bpm1 (a player's BPM in year 1), we can see that its coefficient is negative and that it appears in three interaction terms. The negative coefficient can be thought of as regression to the mean: players with a high BPM in year 1 are likely to regress back toward the average (which is 0), while players with a low BPM are likely to regress (in a positive direction) back toward zero. The three interaction terms can all be thought of as modifying this regression to the mean. The interaction with minutes has a positive coefficient, showing that players who play more minutes experience less regression to the mean; this makes sense, as they are unlikely to get lucky or unlucky over a large number of minutes, so their BPM estimate is likely more accurate. The interaction with draft pick is negative, showing that players with a high draft pick number, i.e. low draft picks (since 1 is the highest draft pick and 60 is the lowest), experience more regression to the mean than higher draft picks. Finally, the interaction with the playoff indicator shows that players who make the postseason experience more regression to the mean. This isn't that surprising: many teams that make the postseason get lucky, or perform above their true skill, and come to experience regression to the mean the next season, so it makes sense that the same happens to their players.
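One way to see the combined effect of the three interactions is to compute the net coefficient on bpm1 for different player profiles. The two profiles below are hypothetical, chosen to contrast a high-information signal (heavy minutes, top pick, no playoffs) with a noisy one (few minutes, late pick, playoff team); the closer the net coefficient is to zero, the less year-1 BPM is pulled back toward the mean.

```python
def bpm1_coefficient(min1, draft, tp):
    """Net coefficient on bpm1 implied by the fitted model: the baseline
    -0.394 regression-to-the-mean pull plus the three interaction terms."""
    return -0.394 + 0.0001 * min1 - 0.0018 * draft - 0.048 * tp

# Hypothetical profiles: a heavy-minutes No. 1 pick on a lottery team
# versus a low-minutes 45th pick whose team made the playoffs.
strong_signal = bpm1_coefficient(min1=3000, draft=1, tp=0)   # near zero
noisy_signal = bpm1_coefficient(min1=400, draft=45, tp=1)    # strongly negative
```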
The other three terms are the more interesting ones. The coefficient for min1 is positive, showing a correlation between minutes played in year 1 and improvement above expectation. It is tempting to conclude causation (that when players play more minutes, they develop faster), but it is entirely possible the relationship runs the other way (teams recognize which players are going to develop faster, and therefore play them more minutes). The next term involves draft pick: the higher a player's draft pick number (i.e., the later they were selected), the slower they progress. This term likely indicates that teams have done a good job drafting, identifying players with more potential and drafting them higher. Finally, the last term is the most interesting one: it shows a correlation between player progression and t.p, playing in the postseason. This seems to support the claim of Sharp's that we are testing, although it is important to note once again that this is only a correlation between playing significant games and player development, not necessarily causation.
Notice that none of the variables we constructed to measure veteran leadership are significant. Now, this does not necessarily mean that veteran leadership is not significant to a player's development; there are two other possible explanations. First, and perhaps most likely, the variables we constructed may not be accurate measures of veteran leadership. Perhaps we chose the wrong statistics to look at, or perhaps veteran leadership cannot be measured at all. Maybe it only matters for a young player if there is a veteran who plays the same position as them. It is very hard to accurately measure or quantify veteran leadership, and that is perhaps why we didn't find it to have any significance. The other possibility is that the reason we think veteran leadership matters is that it actually allows players to gain postseason experience (older players, up to a certain age, are better, so having them on the roster improves a team's chances of making the postseason). In fact, when we remove whether a team makes the postseason and insert our other statistics for veteran leadership, most of them have a positive (albeit very insignificant) coefficient. It may therefore appear to the public that veteran leadership matters, when in reality what matters is that veterans help young players reach the postseason.
One final thing to mention is that, as much time as we spent on this model, it has several limitations. The first and foremost is that BPM, the measure of player skill we are using, is exactly that: a measure. We can never know a player's true skill; BPM is just one attempt to estimate it, and may itself contain biases or errors. Our model therefore carries all the same qualifiers that come with BPM itself. A second limitation is that we assume player development is due to actions taken by teams in one year and is then realized the next year. Although this very likely happens, it could be inaccurate for several reasons. One is that a player's development could be caused by summer workouts, something we can't even begin to measure. Another is that a player's development might not be realized one season later; it might be realized during the current season, or maybe even one or two seasons down the road.
The reason we measured the effect between one season and the season immediately following it is that, within a single season, several of the statistics would be highly correlated with each other. For example, playing time and player progression would obviously be highly correlated within a season: if a player is playing better, he will get more playing time. Although even looking between seasons we are still limited to correlation, not causation, the argument for causation that can be made is much stronger. It is because of confounding issues like these that we looked only at statistics from a player's first year. A final limitation of our study is that we mostly restricted our analysis to things teams can do to improve their players' development (minutes played, postseason experience, veteran leadership). We didn't even begin to look at what might cause a player himself to have higher potential, whether by looking at other statistics from his first year, his college play, or his combine measurements like height and reach. We did attempt to control for that by looking at draft pick; still, another fascinating research opportunity would be to look at what aspects of a player himself give him high or low potential.
Check back later this week for player progression ratings for teams and coaches!