By Harrison Chase
One of the newest sources of data for the NBA is the SportVU tracking data, which tracks players 25 times each second, recording each player’s position in x, y coordinates as well as the x, y and z coordinates of the ball. This data makes a ton of new information available: check out the post I did on player movement or the one about who the best shooters in the NBA are for context.
A final project that Carlos Pena-Lobel, Daniel Silberwasser, Raymond Cen and I did this fall for a data science class uses this SportVU data in a more in-depth way. Our full project is on GitHub, with (hopefully) well-annotated Python code explaining our thinking, our process, and how to get the data. Here I will give a brief overview of the project and explain some of the insights that I worked on. We may also put up another post or two looking at insights that other members found.
The main concept of our project was to build a model to predict whether a shot would go in. From there, we split up and each pursued our own insights, so I am only going to write up what I did because I don’t want to speak for other people’s work. It is worth noting that there are other, very well done previous attempts to create a shot prediction model: we looked at and built off of the models created by John Ezekowitz et al., Krishna Narsu and YH Chang. Although the data has only been around for a few years, we are definitely not the first to try to do this.
I will not go over most of the gory mathematical details but rather let you read them in the full write up. We tried several different methods, each of which has its own benefits and is tailored towards the insights we were trying to glean. For example, I wanted to figure out which players in the NBA scored the most above what an average player would score given their shot difficulty, and also highlight the players who allowed their teammates to take the easiest shots. For this reason, the model I created tried to be as player blind as possible.
Initially this meant that I just didn’t account for which player was attempting the shot, but there were some problems with this. For example, players who took pull-up jump shots tended to be some of the best shooters – Stephen Curry, Klay Thompson, etc. Therefore the model treated pull-up jumpers as if they were relatively easy shots, because they were going in so often. But they weren’t going in at a high rate because they were easy; they were going in because the players taking them were good!
In order to control for that, I created a separate ridge regression model for each player with enough data points and then averaged the coefficients, in essence trying to create a mixed effects model (Python unfortunately doesn’t have a great module, that I could find, for mixed effects). From this, I was able to get the estimated probability of the shot going in if an average player took it, and then find out who took the hardest (and easiest) shots on average, and who scored the most above (and below) average.
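To make the "fit one ridge regression per player, then average the coefficients" idea concrete, here is a minimal sketch. It is not the project's actual code (that lives in the full write up); the function names, the `min_shots` cutoff, the `alpha` penalty and the use of a plain linear ridge on a made/missed target are all assumptions chosen for illustration.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # centering keeps the intercept out of the penalty
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept

def average_player_model(shots_by_player, min_shots=200, alpha=1.0):
    """Fit one ridge model per player and average the fitted parameters.

    shots_by_player maps a player name to (X, y): X holds shot features
    (hypothetical examples: distance, defender distance, pull-up flag) and
    y is 1 for a make, 0 for a miss. Players with fewer than min_shots
    shots are skipped (the cutoff value is an assumption).
    """
    fits = [fit_ridge(X, y, alpha)
            for X, y in shots_by_player.values()
            if len(y) >= min_shots]
    if not fits:
        raise ValueError("no player has enough shots to fit a model")
    coef = np.mean([c for c, _ in fits], axis=0)
    intercept = float(np.mean([b for _, b in fits]))
    return coef, intercept

def predict_make_probability(X, coef, intercept):
    """Estimated make probability for an 'average' shooter, clipped to [0, 1]."""
    return np.clip(np.asarray(X, dtype=float) @ coef + intercept, 0.0, 1.0)
```

Averaging the per-player coefficients gives every qualifying player equal weight no matter how many shots he took, which is the point: a handful of elite high-volume shooters can no longer make a shot type look easy.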
Below is a Tableau visualization of the results, plotting expected points per shot against points above expected. These are for all shots in the 2014 season.
Walking through the results a little bit, we can see that they make sense and match other work done in this area. Players like Steph Curry, Chris Paul and Kyle Korver take tough shots but make them at an above average rate. On the opposite end of the spectrum are players like Andre Drummond, Andrew Bogut and Omer Asik who take easy shots but don’t make them as often as they should. The players who both take easy shots and make them at above average rates are mostly elite scoring big men like Tyson Chandler, DeAndre Jordan, Hassan Whiteside and Brandan Wright. The unfortunate players who take tough shots and don’t make them that often are mostly bench players who (reasonably) don’t get asked to take a lot of shots, but there are also players like Michael Carter-Williams, Lance Stephenson and Kobe Bryant in that area.
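For anyone who wants to reproduce the two axes from a shot model's output, here is a rough sketch. The tuple layout and the choice to report points above expected on a per-shot basis (rather than as a season total) are assumptions for illustration, and the player labels are placeholders.

```python
def shot_value_summary(shots):
    """Summarize shot difficulty and shot making per player.

    shots is an iterable of (player, make_probability, point_value, made):
    make_probability is the model's estimate for an average shooter,
    point_value is 2 or 3, and made is True or False.
    """
    totals = {}
    for player, p_make, points, made in shots:
        t = totals.setdefault(player, {"n": 0, "expected": 0.0, "actual": 0.0})
        t["n"] += 1
        t["expected"] += p_make * points      # what an average shooter would score
        t["actual"] += points if made else 0  # what this player actually scored
    return {
        player: {
            "expected_pps": t["expected"] / t["n"],
            "points_above_expected_per_shot": (t["actual"] - t["expected"]) / t["n"],
        }
        for player, t in totals.items()
    }

# Placeholder data: two coin-flip two-pointers for A, one tough made three for B.
example = shot_value_summary([
    ("A", 0.5, 2, True),
    ("A", 0.5, 2, False),
    ("B", 0.25, 3, True),
])
```

A low `expected_pps` means a player takes hard shots; a positive `points_above_expected_per_shot` means he scores more on them than an average shooter would.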
This was the main insight I was looking for, and I think the results make a good amount of sense. Next, inspired by Jeremias Engelmann’s talk at NESSIS 2015, I tried to measure how much a player was responsible for not only his own shot difficulty but also his teammates’ and his opponents’ shot difficulty. I followed pretty much the same procedure he laid out, except Python wasn’t letting me use three different constraints for the ridge regression so I just used one. The end product of this massive ridge regression is three numbers for each player: a number quantifying how much they affected their own shot difficulty, how much they affected their teammates’ shot difficulty, and how much they affected their opponents’ shot difficulty.
Below is a graph showing each player’s effect on his own shot difficulty against how he affects his teammates’ shot difficulty.
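One way a regression like this could be set up is sketched below: every shot becomes a row with three blocks of indicator columns (who shot, which teammates were on the floor, which opponents were on the floor), and a single ridge penalty is shared across all three blocks. The column layout, the function names and the `alpha` value are assumptions for illustration; this is not Engelmann's exact procedure or the project's code.

```python
import numpy as np

def build_design_matrix(shots, players):
    """One row per shot, three indicator blocks of len(players) columns each.

    shots is a list of (shooter, teammates_on_floor, opponents_on_floor).
    Block 0 marks the shooter, block 1 his teammates, block 2 the opponents.
    """
    index = {p: i for i, p in enumerate(players)}
    n = len(players)
    X = np.zeros((len(shots), 3 * n))
    for row, (shooter, teammates, opponents) in enumerate(shots):
        X[row, index[shooter]] = 1.0
        for p in teammates:
            X[row, n + index[p]] = 1.0
        for p in opponents:
            X[row, 2 * n + index[p]] = 1.0
    return X

def fit_player_effects(shots, difficulty, players, alpha=1.0):
    """Ridge fit with ONE shared penalty; returns three numbers per player.

    difficulty holds the modeled make probability of each shot, so a
    positive coefficient means easier shots when that player is involved.
    """
    X = build_design_matrix(shots, players)
    y = np.asarray(difficulty, dtype=float)
    yc = y - y.mean()  # effects are measured relative to the average shot
    coef = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ yc)
    n = len(players)
    return {
        p: {
            "own": float(coef[i]),
            "teammates": float(coef[n + i]),
            "opponents": float(coef[2 * n + i]),
        }
        for i, p in enumerate(players)
    }
```

Using a separate penalty for each block would amount to rescaling each block of columns before the fit. With a single `alpha`, all three kinds of effect are shrunk toward zero equally hard, even though they are probably not equally noisy.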
In the bottom right corner we have players who take tough shots but get their teammates good looks. These are pretty much all point guards – Chris Paul, Kyrie Irving, Steph Curry and Kyle Lowry, to name a few. In the upper right hand corner are players who take easier shots and allow their teammates to take easier shots. There aren’t that many players in this quadrant – Giannis Antetokounmpo, Roy Hibbert(?), Carlos Boozer and Jordan Hill stick out the most. The upper left quadrant is players who take easy shots and don’t get their teammates easy looks, and is where we find most of the big men – DeAndre Jordan, Rudy Gobert, Andre Drummond, etc. And finally there is the bottom left quadrant, where no player should want to be. These are players who take hard shots and don’t get good looks for their teammates: OJ Mayo, Eric Bledsoe, Jordan Clarkson and, of course, Kobe Bryant.
Taking a minute to step back and look at the results, they mostly make sense. There are a few question marks: for example, Roy Hibbert allows his teammates to take easier shots when he is on the floor according to this analysis. But overall it seems okay. And just to clarify, affecting your teammates’ shot difficulty can come in two different ways: you can either set them up for easy shots or you can be the one taking all the hard (but necessary) shots at the end of the shot clock and in situations like that.
The final graph I will present shows a player’s ability to affect his teammates’ shot difficulty against his ability to affect his opponents’ shot difficulty.
Of all the graphs, this is the one that I think makes the least sense. Maybe this is because I used the same penalty in the ridge regression for all three aspects, or maybe there just isn’t enough data, or maybe this isn’t the best stat to do this type of analysis on. Either way, the players in the upper right hand corner are those who get their teammates good shots and allow their opponents to take easier shots: Jordan Hill, Carlos Boozer, Ryan Kelly and Austin Rivers are the main ones. In the upper left hand corner, we have players who don’t get their teammates good shots but allow their opponents to get good ones: Tony Wroten, Nerlens Noel, Luol Deng. In the bottom right we have players who get bad shots for both their teammates and their opponents: among them are Rudy Gobert, Tyreke Evans and Ed Davis, which makes some sense – but also Kobe Bryant, Ronnie Price and Damjan Rudez, which makes less. And finally in the quadrant with good defenders and good creators, we find Kawhi Leonard, Marc Gasol and John Wall (makes sense) along with Paul Pierce, Kyrie Irving and Dirk Nowitzki (less sense).
As you can see, I am slightly more skeptical of these results, as they have players like Kobe Bryant causing opponents to take worse shots and runner-up Defensive Player of the Year Draymond Green actually allowing opponents to take above average shots. However, pieces do make sense (Marc Gasol and Kawhi are good defenders, Austin Rivers is bad) so I don’t think this is total garbage.
This is not a finished product by any means (except in the sense that we already submitted it for a grade) and I look forward to bettering the model and sharing more results, hopefully after feedback from you all!