NBA players come into the league at different ages based on how many years of college they complete. At younger ages they are generally more athletic, while at older ages they tend to get slower and craftier. It is conventional wisdom among NBA analysts that player ability is a function of both age and years in the league, and that age is a pretty solid gauge of how much upward potential a player could have. Because of this, there are a lot of interesting questions about the age composition of the NBA and how it affects player performance.
Composition of the NBA
The most straightforward approach to this problem is to scrape basketball-reference for box score data and players’ dates of birth. I scraped game data from the 2006 season through the 2017 season, covering both the regular season and the playoffs. This yielded about 317,000 box score entries. About 1.5% of the entries had to be thrown out because the player’s name matched more than one player or no date of birth was listed. From this data, the distribution of age for players listed on a box score is as follows:
Not all of these players play the same number of minutes, though. Younger players have a harder time earning minutes since they are generally less skilled, and older players get tired more quickly. Therefore, we can look at the average minutes played for a player of a given age, and at how the minutes-weighted average age has changed since 2006 (using the same methodology for weighted age as was used for height and weight here):
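As a rough illustration of the weighted average, here is a minimal sketch that assumes "weighted age" means each box score entry's age is weighted by the minutes played in that game. The rows and the `weighted_age_by_season` helper are hypothetical, and the linked height and weight post may define the weighting differently.

```python
from collections import defaultdict

# Hypothetical box score rows: (season, player_age_in_years, minutes_played).
box_scores = [
    (2006, 22.0, 30.0),
    (2006, 34.0, 10.0),
    (2007, 25.0, 20.0),
    (2007, 29.0, 20.0),
]

def weighted_age_by_season(rows):
    """Return {season: minutes-weighted mean age} (assumed weighting scheme)."""
    age_minutes = defaultdict(float)  # running sum of age * minutes per season
    minutes = defaultdict(float)      # running sum of minutes per season
    for season, age, mins in rows:
        age_minutes[season] += age * mins
        minutes[season] += mins
    # Skip seasons with no recorded minutes to avoid dividing by zero.
    return {s: age_minutes[s] / minutes[s] for s in minutes if minutes[s] > 0}

weighted = weighted_age_by_season(box_scores)
print(weighted)
```

In the toy 2006 season the 22-year-old plays three times as many minutes as the 34-year-old, so the weighted age (25) sits well below the unweighted mean (28).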
What we can see here is intuitive: younger and older players play fewer minutes, and the average age has not changed very much this decade. Are younger and older players playing less because they play less each game, or because they sit out games more often than players in their mid twenties? It is likely that players who sit out a large number of games pull down the average minutes at their ages. Therefore, it is interesting to restrict the data to games where players log positive minutes, and then rerun the analysis:
This is not drastically different, but it smooths out the shape of the curve a bit. Because players under 20 or over 39 appear so infrequently, the rest of this analysis will be restricted to players 20-39.
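The two versions of the minutes curve differ only in a filter. A minimal sketch, assuming a player who sits is recorded with zero minutes on the box score (the rows and the `mean_minutes_by_age` helper are hypothetical):

```python
from collections import defaultdict

# Hypothetical rows: (age_in_whole_years, minutes_played).
# 0.0 minutes is assumed to mean the player was listed but did not play.
rows = [(19, 0.0), (19, 12.0), (27, 36.0), (27, 30.0),
        (38, 0.0), (38, 0.0), (38, 18.0)]

def mean_minutes_by_age(rows, positive_only=False):
    """Average minutes per box score entry at each age."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for age, mins in rows:
        if positive_only and mins <= 0:
            continue  # drop games the player sat out
        totals[age] += mins
        counts[age] += 1
    return {age: totals[age] / counts[age] for age in counts}

all_entries = mean_minutes_by_age(rows)
played_only = mean_minutes_by_age(rows, positive_only=True)
print(all_entries)
print(played_only)
```

In this toy data the 38-year-old averages 6 minutes over all entries but 18 minutes in games actually played, which is the kind of gap the restriction is meant to remove.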
While we don’t see massive shifts in player time conditional on age, one may also wonder about the change in certain statistical categories based on age. First we will look at cumulative statistics, noting that they are all correlated with minutes played. These plots have 95% confidence intervals to provide insight into the variability of the statistic.
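For readers who want to reproduce intervals like these, here is a minimal sketch using a normal approximation (mean plus or minus 1.96 standard errors). The post does not say how its intervals were computed, so the method, the rows, and the `mean_ci_by_age` helper are all assumptions.

```python
import math
import statistics
from collections import defaultdict

# Hypothetical rows: (age, points_scored_in_one_game).
rows = [(24, 10), (24, 14), (24, 12), (24, 16),
        (31, 8), (31, 8), (31, 11), (31, 9)]

def mean_ci_by_age(rows, z=1.96):
    """Return {age: (lower, mean, upper)} using a normal approximation."""
    by_age = defaultdict(list)
    for age, value in rows:
        by_age[age].append(value)
    out = {}
    for age, values in by_age.items():
        if len(values) < 2:
            continue  # a sample standard deviation needs at least two games
        mean = statistics.fmean(values)
        sem = statistics.stdev(values) / math.sqrt(len(values))
        out[age] = (mean - z * sem, mean, mean + z * sem)
    return out

for age, (lo, mean, hi) in sorted(mean_ci_by_age(rows).items()):
    print(age, round(lo, 2), round(mean, 2), round(hi, 2))
```

Ages with few games get wide intervals, which is one reason the sparse under-20 and over-39 groups are left out.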
There are a lot of interesting trends to talk about here, but most of the statistics peak in the late twenties and are lower for both younger and older players. Three interesting exceptions, though, are blocks, rebounds, and fouls, which seem to decline fairly steadily as players get older. I would also like to note here that none of these graphs should be construed as aging curves. All of this analysis is conditional on the player still playing in the NBA, so there is a selection bias among older players toward players who can stay in the NBA. Similarly, there might be a selection bias among the youngest (20 year old) players toward those who come out of college very early. Perhaps this is highly correlated with not committing many personal fouls. You can find a glossary of the statistics here.
The next interesting frontier here is rate statistics. While we can see how players function due to their increased minutes, it is also interesting to wonder about how players perform given that they are playing:
Note that this is the average of each statistical category conditional on the player playing in the game. For example, if a player was 4/6 from three one night and 2/4 the next night, the average would be (.6666 + .5) / 2 = .583, not 6/10 = .6. For the sake of simplicity, we are not recalculating each rate from the pooled totals at each age (the two should be decently close).
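The difference between the two averages is easy to check directly. This small sketch reuses the 4/6 and 2/4 example; skipping games with zero attempts is an assumption, since the post does not say how those games are handled.

```python
# Per-game three-point lines from the example: (made, attempted).
games = [(4, 6), (2, 4)]

# Average of per-game percentages, the quantity used in the plots.
# Games with zero attempts are skipped here (an assumption).
per_game = [made / att for made, att in games if att > 0]
avg_of_pcts = sum(per_game) / len(per_game)

# Pooled percentage: total makes divided by total attempts.
pooled = sum(made for made, _ in games) / sum(att for _, att in games)

print(round(avg_of_pcts, 3), round(pooled, 3))  # 0.583 0.6
```

The two numbers only coincide when every game has the same number of attempts, so the gap grows for players whose shot volume varies a lot from night to night.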
Back to Backs
While this provides an insight into average player performance by age, not all games are created equal. One common distinction between games is whether they fall on a back to back. To look into this, we can categorize every game as part of a back to back or not, based on whether the team also plays on the previous or the following day. Then we can attempt to model these statistical categories as a function of age and whether the game is a back to back. We will model age as a polynomial of degree two because it seems to have that relationship with most variables. Then, we will assign every game the status of “Game 1 of B2B,” “Game 2 of B2B,” or neither. Then we can take all of the interaction terms and look at the results for every age and game type. We will use a ridge regression, which shrinks the coefficients that are not useful, so that we do not get unwanted curvature. First, let’s look at minutes:
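Before the results, here is a minimal sketch of how the game labels could be assigned from one team's schedule. The `label_back_to_backs` helper and the dates are hypothetical, and treating the middle game of three games in three days as "Game 2 of B2B" is an assumption the post does not address.

```python
from datetime import date, timedelta

def label_back_to_backs(game_dates):
    """Map each game date for one team to a back-to-back label."""
    played = set(game_dates)
    one_day = timedelta(days=1)
    labels = {}
    for d in sorted(played):
        if d - one_day in played:
            labels[d] = "Game 2 of B2B"  # the team also played yesterday
        elif d + one_day in played:
            labels[d] = "Game 1 of B2B"  # the team also plays tomorrow
        else:
            labels[d] = "Neither"
    return labels

# Hypothetical schedule for a single team.
schedule = [date(2016, 11, 1), date(2016, 11, 2), date(2016, 11, 4),
            date(2016, 11, 7), date(2016, 11, 8)]
for d, label in label_back_to_backs(schedule).items():
    print(d.isoformat(), label)
```

The labels would then be joined back onto each player's box score entry for that team and date.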
What we can see from this is that players play more when they are on the first or the second leg of a back to back than when they are not. Because of this, it is not really interesting to look at cumulative statistics. Instead, we will look at the important rate statistics:
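The model described above can be sketched as follows, assuming NumPy is available. This is not the code behind the plots: the data is synthetic, and the penalty strength `alpha`, the centering of age at 27, and the helper names `design_matrix` and `fit_ridge` are all assumptions made for illustration.

```python
import numpy as np

def design_matrix(age, game_type):
    """Columns: age, age^2, G1, G2, and G1/G2 interacted with age and age^2."""
    a = (np.asarray(age, dtype=float) - 27.0) / 10.0  # centering/scale is an assumption
    g1 = (np.asarray(game_type) == 1).astype(float)   # 1 = Game 1 of B2B
    g2 = (np.asarray(game_type) == 2).astype(float)   # 2 = Game 2 of B2B, 0 = neither
    return np.column_stack([a, a**2, g1, g2, g1 * a, g1 * a**2, g2 * a, g2 * a**2])

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return y_mean - x_mean @ coef, coef

# Synthetic stand-in for the box score data (not real NBA numbers).
rng = np.random.default_rng(0)
age = rng.integers(20, 40, size=2000)      # ages 20-39
game_type = rng.integers(0, 3, size=2000)  # 0, 1, or 2
minutes = (24.0 - 8.0 * ((age - 27.0) / 10.0) ** 2
           + 1.0 * (game_type > 0) + rng.normal(0.0, 3.0, size=2000))

X = design_matrix(age, game_type)
intercept, coef = fit_ridge(X, minutes, alpha=1.0)

# Predicted value for every age and game type, as in the plots.
grid_age = np.repeat(np.arange(20, 40), 3)
grid_type = np.tile([0, 1, 2], 20)
pred = intercept + design_matrix(grid_age, grid_type) @ coef
```

Swapping `minutes` for a rate statistic gives one fitted curve per game type; raising `alpha` pulls the interaction coefficients toward zero, which flattens differences between the curves that the data does not strongly support.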
The goal here is to tease out statistics where older players get better or worse on back to backs relative to baseline. What we find is that older players are worse at some types of rebounding and stealing, while they are better than expected at shooting and assisting. This falls in line with expectation: the statistics that are based on physical fitness impact older players more on a back to back, while mental statistics such as finding assists and shooting efficiently favor the older players. In the summary statistics, though, we see no major impacts of being old in a back to back.
Editor’s Note: If you have any comments or questions for Benedict, please feel free to reach out to him at firstname.lastname@example.org.