I recently came across a post (http://www.danielforsyth.me/exploring_nba_data_in_python/) demonstrating the ease of gathering data from NBA.com's API. I'm really excited about the way Andrew Wiggins has been playing this year, so I wanted to explore how his play has changed since the beginning of the year.
On Dec 23rd, the Wolves played their first game against the Cavaliers. Wiggins had a huge game, putting up 27 points. Many journalists have depicted this game as a turning point in Wiggins' year.
Here, I plot shooting data from before and after Dec 22nd (that is, the games before Dec 23rd versus the Dec 23rd game and everything after it). I'm not in love with splitting the data into before and after, as this depicts Wiggins' growth as occurring suddenly on Dec 23rd. I'm guessing Wiggins' growth is better modeled as continuous, gradual improvement, so maybe I'll do future posts that depict Wiggins' learning as continuous. For now, I'll stick with before/after as this is easier to graph.
I still have a lot to learn about NBA.com's API, so this post only uses shooting data. Any comments and suggestions would be greatly appreciated.
import numpy as np, sys, scipy.stats, pandas as pd, requests, json, matplotlib
import matplotlib.pyplot as plt
import pylab as pl
%matplotlib inline
plt.style.use('ggplot')  # stands in for pd.options.display.mpl_style = 'default', which newer pandas releases no longer support
Here I more or less copied and pasted the functions from http://www.danielforsyth.me/exploring_nba_data_in_python/.
find_stats gathers shooting data about a given player and finds that player's average number of dribbles before a shot, average amount of time with the ball before shooting, average shot distance, average distance to the closest defender when shooting, and shooting percentage.
find_stats2 is basically the same as find_stats, but it doesn't calculate averages and instead just outputs a dataframe with the raw shot data.
df = None
players = []
player_stats = {'name':None,'avg_dribbles':None,'avg_touch_time':None,'avg_shot_distance':None,'avg_defender_distance':None, 'shooting_%':None}
def find_stats(name,player_id,dateFrom,dateTo):
    #NBA Stats API using selected player ID
    url = 'http://stats.nba.com/stats/playerdashptshotlog?'+ \
        'DateFrom='+dateFrom+'&DateTo='+dateTo+'&GameSegment=&LastNGames=0&LeagueID=00&' + \
        'Location=&Month=0&OpponentTeamID=0&Outcome=&Period=0&' + \
        'PlayerID='+player_id+'&Season=2014-15&SeasonSegment=&' + \
        'SeasonType=Regular+Season&TeamID=0&VsConference=&VsDivision='
    #Create Dict based on JSON response
    response = requests.get(url)
    data = response.json()
    #Create df from data and find averages
    headers = data['resultSets'][0]['headers']
    shot_data = data['resultSets'][0]['rowSet']
    df = pd.DataFrame(shot_data,columns=headers)
    #each column is a Series with a single axis, so mean() takes no axis argument
    avg_def = df['CLOSE_DEF_DIST'].mean()
    avg_dribbles = df['DRIBBLES'].mean()
    avg_shot_distance = df['SHOT_DIST'].mean()
    avg_touch_time = df['TOUCH_TIME'].mean()
    #add Averages to dictionary then to list
    player_stats['name'] = name
    player_stats['avg_defender_distance']=avg_def
    player_stats['avg_shot_distance'] = avg_shot_distance
    player_stats['avg_touch_time'] = avg_touch_time
    player_stats['avg_dribbles'] = avg_dribbles
    #FGM is 1 for a made shot and 0 for a miss, so its mean is the shooting percentage
    player_stats['shooting_%'] = df['FGM'].mean()
    players.append(player_stats.copy())
def find_stats2(name,player_id,dateFrom,dateTo):
    #NBA Stats API using selected player ID
    url = 'http://stats.nba.com/stats/playerdashptshotlog?'+ \
        'DateFrom='+dateFrom+'&DateTo='+dateTo+'&GameSegment=&LastNGames=0&LeagueID=00&' + \
        'Location=&Month=0&OpponentTeamID=0&Outcome=&Period=0&' + \
        'PlayerID='+player_id+'&Season=2014-15&SeasonSegment=&' + \
        'SeasonType=Regular+Season&TeamID=0&VsConference=&VsDivision='
    #Create Dict based on JSON response
    response = requests.get(url)
    data = response.json()
    #Create df from the raw shot data (no averaging here)
    headers = data['resultSets'][0]['headers']
    shot_data = data['resultSets'][0]['rowSet']
    df = pd.DataFrame(shot_data,columns=headers)
    return df
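Both functions build the same long query string by hand. As a minimal sketch of one alternative, the parameters can live in a dictionary and be encoded by the standard library. The helper name build_shotlog_url and the BASE_URL constant are my own, not part of the original functions, and I am assuming the server treats percent-encoded slashes in the dates the same as raw ones.

```python
from urllib.parse import urlencode

# Endpoint taken from find_stats; the constant name is hypothetical.
BASE_URL = 'http://stats.nba.com/stats/playerdashptshotlog'

def build_shotlog_url(player_id, date_from, date_to, season='2014-15'):
    """Assemble the same query parameters that find_stats concatenates by hand."""
    params = {
        'DateFrom': date_from,
        'DateTo': date_to,
        'GameSegment': '',
        'LastNGames': 0,
        'LeagueID': '00',
        'Location': '',
        'Month': 0,
        'OpponentTeamID': 0,
        'Outcome': '',
        'Period': 0,
        'PlayerID': player_id,
        'Season': season,
        'SeasonSegment': '',
        'SeasonType': 'Regular Season',  # urlencode turns the space into '+'
        'TeamID': 0,
        'VsConference': '',
        'VsDivision': '',
    }
    return BASE_URL + '?' + urlencode(params)

url = build_shotlog_url('203952', '8/1/14', '12/22/14')
print(url)
```

With a helper like this, find_stats and find_stats2 could share one URL builder instead of two copies of the same string.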
Let's get down to business. How has Wiggins' shooting performance changed since the Dec 23rd game?
cols = ['name','avg_defender_distance','avg_dribbles','avg_shot_distance','avg_touch_time', 'shooting_%'] #name the columns that I will put data into
dateFrom='8/1/14' #some arbitrary before season date- dec22
dateTo='12/22/14'
find_stats('andrew wiggins','203952', dateFrom, dateTo)
df = pd.DataFrame(players,columns = cols)
dateFrom='12/22/14' #dec 22- through some arbitrary future date
dateTo='12/1/15'
find_stats('andrew wiggins','203952', dateFrom, dateTo)
df = pd.DataFrame(players,columns = cols)
Take a look at the table below.
Looks like defenders are on average a little closer when Wiggins shoots the ball. Not a ton closer though - 0.13 feet is 1.56 inches. I should probably do some stats to decide which of these differences are meaningful. I'll do some of that later. For now, I think we can just agree that 1.56 inches is not much.
Wiggins also takes roughly the same number of dribbles before shooting.
Wiggins does seem to shoot from closer now than he did earlier in the year.
Wiggins holds the ball for about the same amount of time before shooting.
Not surprisingly, Wiggins has shot the ball much better recently.
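As for doing "some stats" on these differences, here is a rough sketch of one option for the shooting percentage: a two-proportion z-test, written with only the standard library. The shot counts are an assumption on my part. I back-calculated 108 of 283 and 97 of 206 from the shooting percentages in this post's summary table (0.381625 and 0.470874), so treat them as inferred rather than reported. The test also assumes shots are independent, which is questionable for shots within the same game.

```python
import math

def two_proportion_z_test(made_a, shots_a, made_b, shots_b):
    """Return (z, two-sided p-value) for the difference between two shooting percentages."""
    p_a = made_a / shots_a
    p_b = made_b / shots_b
    # Pooled proportion under the null hypothesis that both periods share one true rate.
    pooled = (made_a + made_b) / (shots_a + shots_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / shots_a + 1 / shots_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# ASSUMPTION: counts inferred from the table's fractions, not stated in the post.
z, p_value = two_proportion_z_test(108, 283, 97, 206)
print('z = %.2f, p = %.3f' % (z, p_value))
```

The continuous measures (defender distance, dribbles, touch time) would need the raw shot rows rather than the averages, for example with a t-test from scipy.stats.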
df.head()
| | name | avg_defender_distance | avg_dribbles | avg_shot_distance | avg_touch_time | shooting_% |
|---|---|---|---|---|---|---|
| 0 | andrew wiggins | 3.536396 | 1.286219 | 13.110954 | 2.320495 | 0.381625 |
| 1 | andrew wiggins | 3.401699 | 1.305825 | 11.544660 | 2.303155 | 0.470874 |
Nothing too surprising here. Wiggins has made more shots by taking more of them closer to the hoop. Next, let's dig a little deeper into this data. For instance, how has Wiggins managed to take closer shots?
df = None #lets create a new place for the data
dateFrom='8/1/14'
dateTo='12/22/14'
df = find_stats2('andrew wiggins','203952', dateFrom, dateTo);
Here is what the new data looks like. As you can see, there's some cool stuff here.
df.head()
| | GAME_ID | MATCHUP | LOCATION | W | FINAL_MARGIN | SHOT_NUMBER | PERIOD | GAME_CLOCK | SHOT_CLOCK | DRIBBLES | TOUCH_TIME | SHOT_DIST | PTS_TYPE | SHOT_RESULT | CLOSEST_DEFENDER | CLOSEST_DEFENDER_PLAYER_ID | CLOSE_DEF_DIST | FGM | PTS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0021400407 | DEC 21, 2014 - MIN vs. IND | H | L | -4 | 1 | 1 | 11:38 | 3.4 | 1 | 2.0 | 5.7 | 2 | missed | Hibbert, Roy | 201579 | 2.4 | 0 | 0 |
| 1 | 0021400407 | DEC 21, 2014 - MIN vs. IND | H | L | -4 | 2 | 1 | 9:10 | 10.9 | 0 | 3.2 | 19.7 | 2 | missed | Stuckey, Rodney | 201155 | 4.0 | 0 | 0 |
| 2 | 0021400407 | DEC 21, 2014 - MIN vs. IND | H | L | -4 | 3 | 1 | 4:14 | 7.5 | 1 | 1.2 | 17.1 | 2 | missed | Hill, Solomon | 203524 | 1.6 | 0 | 0 |
| 3 | 0021400407 | DEC 21, 2014 - MIN vs. IND | H | L | -4 | 4 | 1 | 3:55 | 19.3 | 0 | 0.8 | 17.9 | 2 | missed | Hill, Solomon | 203524 | 7.2 | 0 | 0 |
| 4 | 0021400407 | DEC 21, 2014 - MIN vs. IND | H | L | -4 | 5 | 2 | 2:04 | 13.9 | 1 | 2.9 | 4.4 | 2 | missed | Stuckey, Rodney | 201155 | 0.8 | 0 | 0 |
First, let's look at how Wiggins' shot distance has changed.
bins = [0,3,6,9,12,15,18,21,24,27]
binlabels = ['0','1-3','4-6','7-9','10-12','13-15','16-18','19-21','22-24','25-27']
fig = matplotlib.pyplot.gcf()
fig.set_size_inches(14,6)
plt.subplot(1,2,1)
n, bins, patches = plt.hist(df['SHOT_DIST'], bins=bins)
plt.xticks(bins+1.5,binlabels)
plt.xlabel('Shot Distance (ft)')
plt.title('Before Dec 22nd')
plt.ylabel('Bin Count');
dateFrom='12/22/14'
dateTo='12/1/15'
df2 = find_stats2('andrew wiggins','203952', dateFrom, dateTo);
plt.subplot(1,2,2)
n, bins, patches = plt.hist(df2['SHOT_DIST'], bins)
plt.xticks(bins+1.5,binlabels)
plt.title('After Dec 22nd')
plt.xlabel('Shot Distance (ft)');
These plots are simply histograms of shot counts by distance, with shot distance on the x-axis in 3-foot bins. Notice that the y scale differs between the two plots. I would have normalized the data, but normalizing with weirdly sized bins was giving me headaches. Nonetheless, it is the SHAPE of the histograms that matters, not the raw numbers.
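On the normalization headache: one way to put both periods on the same y scale is to divide each bin count by the total number of shots, so every bar is a fraction of attempts. The sketch below is only an illustration. The helper name shot_fractions is mine, and the distances are made-up placeholder values, not Wiggins' shots.

```python
import numpy as np

def shot_fractions(distances, edges):
    """Return the fraction of shots that falls in each distance bin."""
    counts, _ = np.histogram(distances, bins=edges)
    total = counts.sum()
    # Avoid dividing by zero when no shot lands inside the bin range.
    return counts / total if total else counts.astype(float)

edges = [0, 3, 6, 9, 12, 15, 18, 21, 24, 27]  # same 3-foot edges as the plots
# HYPOTHETICAL distances in feet, for illustration only.
example = [1.0, 2.5, 4.0, 16.0, 17.5, 23.0, 25.0, 2.0]
fractions = shot_fractions(example, edges)
print(fractions)
```

Note that shots beyond the last edge (27 ft here) are dropped before dividing. The same effect should be available directly in the plot by passing weights=np.ones(len(x)) / len(x) to plt.hist.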
Before Dec 22nd, Wiggins took lots of shots from 1-6 feet. Wiggins also took a ton of shots from 13-18 feet. Eek. Not the most efficient shots.
After Dec 22nd, Wiggins still takes tons of inside shots, but has seriously cut down on that mid-range game. Seems like an easy way to increase your shooting percentage.
Great! But let's look at how this change in shot selection actually changed Wiggins' shooting %.
fig = matplotlib.pyplot.gcf()
fig.set_size_inches(14,6)
made = df['SHOT_RESULT'] == 'made'
missed = df['SHOT_RESULT'] == 'missed'
plt.subplot(1,2,1)
n, bins, patches = plt.hist([df[made]['SHOT_DIST'],df[missed]['SHOT_DIST']], bins=bins, label=['shot made',' shot missed'])
plt.title('Before Dec 22nd')
plt.legend()
plt.xticks(bins+1.5,binlabels)
plt.xlabel('Shot Distance (ft)')
plt.ylabel('Bin Count');
made = df2['SHOT_RESULT'] == 'made'
missed = df2['SHOT_RESULT'] == 'missed'
plt.subplot(1,2,2)
plt.title('After Dec 22nd')
n, bins, patches = plt.hist([df2[made]['SHOT_DIST'],df2[missed]['SHOT_DIST']], bins, label=['shot made',' shot missed'])
plt.xticks(bins+1.5,binlabels)
plt.legend()
plt.xlabel('Shot Distance (ft)');
These are the same histograms as above, but now I am depicting made and missed shots.
These plots show that not only has Wiggins started shooting more inside shots, but he is also making way more of them. I guess he is getting used to drawing contact from NBA-size players.
These plots also show how inefficient those long 2s are. Cutting down on the 13-18 footers definitely helped.
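To put numbers on what the made/missed histograms show, shooting percentage can be computed per distance bin. This is a sketch under assumptions: fg_pct_by_distance is a hypothetical helper, it relies on the SHOT_DIST and FGM columns from the shot log, and the toy rows below are invented for illustration.

```python
import pandas as pd

def fg_pct_by_distance(shots, edges):
    """Attempts and shooting percentage for each distance bin."""
    # right=False makes bins like [0, 3), [3, 6), ...
    binned = pd.cut(shots['SHOT_DIST'], bins=edges, right=False)
    grouped = shots.groupby(binned, observed=False)['FGM']
    # FGM is 1 for a make and 0 for a miss, so its mean is the shooting percentage.
    return pd.DataFrame({'attempts': grouped.count(), 'fg_pct': grouped.mean()})

# HYPOTHETICAL rows, for illustration only.
toy = pd.DataFrame({'SHOT_DIST': [1.0, 2.0, 4.0, 16.0, 17.0],
                    'FGM': [1, 0, 1, 0, 0]})
result = fg_pct_by_distance(toy, [0, 3, 6, 9, 12, 15, 18])
print(result)
```

Running this on df and df2 would give one table per period; bins with no attempts show a count of 0 and a missing percentage.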
Ok, here is where I go a little overboard. Sorry the following plots are messy. I'm not sure what's going on with the bins/grids. They're still fun to look at.
In this first pair of graphs, I look at how Wiggins' dribbling before shooting has changed. Basically, I think the weakest part of Wiggins' game is his dribbling, so I was wondering if he dribbles less before shooting now.
fig = matplotlib.pyplot.gcf()
fig.set_size_inches(16,6)
plt.subplot(1,2,1)
plt.hist2d(df['SHOT_DIST'], df['DRIBBLES'])
plt.colorbar()
plt.title('Before Dec 22nd')
plt.xlabel('Shot Distance (ft)')
plt.ylabel('# Dribbles');
plt.subplot(1,2,2)
plt.hist2d(df2['SHOT_DIST'], df2['DRIBBLES'])
plt.colorbar()
plt.title('After Dec 22nd')
plt.xlabel('Shot Distance (ft)');
In these plots, redder colors depict more frequent events. For instance, Wiggins takes tons of shots from about 1 foot when he doesn't dribble (like tip-ins and alley-oops).
When I look at these plots, what sticks out most to me is that Wiggins has not been spot-up shooting as much recently, as it looks like he doesn't take as many long shots without dribbling. Also, Wiggins seems to take far fewer long 2s after dribbling. Probably a good idea.
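One possible reason the grids look odd: hist2d defaults to 10 equal-width bins per axis, which does not line up with whole-number dribble counts. A minimal sketch of explicit bin edges follows, using the five sample rows from the shot-log table earlier in this post. The helper name shot_dribble_grid and its defaults are my own choices.

```python
import numpy as np

def shot_dribble_grid(shot_dist, dribbles, dist_step=3, max_dist=27):
    """Count shots on a grid of 3-foot distance bins by whole-number dribble bins."""
    dist_edges = np.arange(0, max_dist + dist_step, dist_step)
    # Edges at -0.5, 0.5, 1.5, ... center each bin on an integer dribble count.
    dribble_edges = np.arange(-0.5, int(max(dribbles)) + 1.5, 1.0)
    counts, _, _ = np.histogram2d(shot_dist, dribbles,
                                  bins=[dist_edges, dribble_edges])
    return counts, dist_edges, dribble_edges

# SHOT_DIST and DRIBBLES from the five rows of the shot-log table.
shot_dist = [5.7, 19.7, 17.1, 17.9, 4.4]
dribbles = [1, 0, 1, 0, 1]
counts, dist_edges, dribble_edges = shot_dribble_grid(shot_dist, dribbles)
print(counts)
```

The same edges can be handed to the plot, e.g. plt.hist2d(df['SHOT_DIST'], df['DRIBBLES'], bins=[dist_edges, dribble_edges]), so that each row of cells corresponds to exactly one dribble count.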
I didn't really have a reason to look at closest defender and shooting, but I had the data.
fig = matplotlib.pyplot.gcf()
fig.set_size_inches(16,6)
plt.subplot(1,2,1)
plt.hist2d(df['SHOT_DIST'], df['CLOSE_DEF_DIST'])
plt.colorbar()
plt.title('Before Dec 22nd')
plt.xlabel('Shot Distance (ft)')
plt.ylabel('Closest Defender (ft)');
plt.subplot(1,2,2)
plt.hist2d(df2['SHOT_DIST'], df2['CLOSE_DEF_DIST'])
plt.title('After Dec 22nd')
plt.colorbar()
plt.xlabel('Shot Distance (ft)');
Well, so I guess it looks like teams are leaving Wiggins less open on those long 2s... kinda interesting since Wiggins wasn't exactly making those long 2s at a prodigious rate. I guess this could just be a change in where Flip tries to get Wiggins the ball. Instead of coming off high screens, Wiggins now operates more in the mid post.