import numpy as np
import matplotlib.pyplot as plt
%pylab inline
from scipy import stats
import matplotlib.mlab as mlab
import bisect
Populating the interactive namespace from numpy and matplotlib
Here we use Tom Tango's equation to find the true variance (variance due to skill) of win percentages for each league.
$ var(observed) = var(skill) + var(luck) $
where
$ var(luck)=.5*.5/gp $, where $gp$ is the number of games played (this is the binomial variance of the win percentage of a .500 team over $gp$ games),
and $ var(observed) $ is taken from historical win percentage data for each league. Numbers from:
http://www.insidethebook.com/ee/index.php/site/comments/true_talent_levels_for_sports_leagues/
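As a rough sanity check on this decomposition, the sketch below simulates a hypothetical league and compares the simulated variance of win percentage with $var(skill) + var(luck)$. Everything in it is an assumption made for illustration rather than part of Tango's method: the function name `simulate_observed_variance`, the normally distributed team talent, the independent weighted coin-flip games, and the skill standard deviation of 0.083 (roughly what the NHL figures used in this notebook imply).

```python
import random

def simulate_observed_variance(sd_skill, games, teams=20000, seed=0):
    """Simulate `teams` team-seasons and return the variance of win percentage.

    Assumptions (illustrative only): talent is normal around .500 and
    every game is an independent coin flip weighted by that talent.
    """
    rng = random.Random(seed)
    win_pcts = []
    for _ in range(teams):
        # true talent = probability of winning any single game, kept inside [0, 1]
        talent = min(max(rng.gauss(0.5, sd_skill), 0.0), 1.0)
        wins = sum(rng.random() < talent for _ in range(games))
        win_pcts.append(wins / float(games))
    mean = sum(win_pcts) / len(win_pcts)
    return sum((w - mean) ** 2 for w in win_pcts) / len(win_pcts)

sd_skill = 0.083   # assumed skill standard deviation
games = 82         # assumed season length
observed = simulate_observed_variance(sd_skill, games)
predicted = sd_skill ** 2 + 0.5 * 0.5 / games   # var(skill) + var(luck)
print("simulated var(observed): %.5f" % observed)
print("var(skill) + var(luck):  %.5f" % predicted)
```

The two numbers should agree closely but not exactly: $.5*.5/gp$ is the luck variance of an exactly .500 team, so it slightly overstates the luck variance of teams away from .500.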
#actual (observed) variances of winning percentage, i.e. squared standard deviations
avNHL=0.0997**2
avNBA=0.1449**2
avNFL=0.1899**2
avMLB=0.0717**2
#random variance (variance due to luck) of winning percentage: .5*.5/gp, with gp = games per season
rvNHL=.5*.5/82
rvNBA=.5*.5/82
rvNFL=.5*.5/16
rvMLB=.5*.5/162
#true variance (variance due to skill) of winning percentage
tvNHL=avNHL-rvNHL
tvNBA=avNBA-rvNBA
tvNFL=avNFL-rvNFL
tvMLB=avMLB-rvMLB
avs={'NHL':avNHL, 'NBA':avNBA, 'NFL':avNFL, 'MLB':avMLB}
rvs={'NHL':rvNHL, 'NBA':rvNBA, 'NFL':rvNFL, 'MLB':rvMLB}
tvs={'NHL':tvNHL, 'NBA':tvNBA, 'NFL':tvNFL, 'MLB':tvMLB}
leagues=['NHL','NBA','NFL','MLB']
Tango suggests that once we have the true variance (variance due to skill) for each league, we can vary the number of games played, which changes only the random variance (variance due to luck), to see how the observed variance would change and what share of it luck and skill would each account for. The denominator in the following equations is the new observed variance.
$ luck\% = \frac{.5*.5/gp}{var(skill)+.5*.5/gp} $
$ skill\% = \frac{var(skill)}{var(skill)+.5*.5/gp} $
The plots show $luck\%$ and $skill\%$ as $gp$ is varied from 1 through 100.
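The two curves cross where luck and skill each explain half of the observed variance. Setting the two expressions above equal gives the crossing point directly (a short derivation from those formulas, not something taken from Tango's post):
$ \frac{.5*.5/gp}{var(skill)+.5*.5/gp} = \frac{var(skill)}{var(skill)+.5*.5/gp} \Rightarrow .5*.5/gp = var(skill) \Rightarrow gp = \frac{.25}{var(skill)} $
So a league with a larger $var(skill)$ needs fewer games before skill explains as much of the observed variance as luck does.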
#after how many games does skill ratio equal luck ratio?
games=np.arange(1,101)
eqvarrats={}
#for each league, compute the luck and skill shares of variance over a range of games played and plot them
for league in leagues:
    luckperc=(.5*.5/games)/(tvs[league]+.5*.5/games)
    skillperc=tvs[league]/(tvs[league]+.5*.5/games)
    #the shares are equal when .5*.5/gp = var(skill); this expression simplifies to .25/var(skill)
    eqvarrats[league]=.5*.5*.5/(tvs[league]-.5*tvs[league])
    plt.title(league)
    plt.plot(games,luckperc, label="luck")
    plt.plot(games,skillperc, label="skill")
    plt.xlabel("# of games")
    plt.ylabel("percent of total variance")
    plt.legend()
    plt.show()
    print "'Skill ratio' = 'luck ratio' at",eqvarrats[league],'games'
'Skill ratio' = 'luck ratio' at 36.2775753371 games
'Skill ratio' = 'luck ratio' at 13.9297265815 games
'Skill ratio' = 'luck ratio' at 12.2327091879 games
'Skill ratio' = 'luck ratio' at 69.4892240058 games
Here we just take the intersection points in the graphs from the previous section and make a bar graph. The 1:1 ratio was chosen arbitrarily, just as a way to compare the role of luck/randomness between the leagues.
#in what sport does luck play the biggest role?
#plot games needed for 1:1 ratio
N = len( eqvarrats.values() )
x = np.arange(1, N+1)
y = eqvarrats.values()
labels = eqvarrats.keys()
width = .75
bar1 = plt.bar( x, y, width, color='CornflowerBlue')
plt.ylabel( 'Games needed for 1:1 skill / luck ratio' )
plt.xticks(x + width/2.0, labels )
plt.ylim(0,75)
print "From left to right, the values of the bars in the graph:"
print eqvarrats.values()
plt.show()
From left to right, the values of the bars in the graph:
[69.48922400577133, 36.277575337109816, 13.929726581482969, 12.232709187890006]
Here we divide the variance due to skill by the variance due to luck at each league's actual season length.
$skill/luck \hspace{2mm} ratio = var(skill)/var(luck)$, where $var(luck)$ uses $gp = 162$ (MLB), $82$ (NHL and NBA), or $16$ (NFL).
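Because $var(luck)=.5*.5/gp$, this ratio equals the season length divided by the number of games at which skill and luck are equal, $.25/var(skill)$. A minimal sketch of that check for the NHL, using the standard deviation and season length from earlier in this notebook; the helper name `skill_luck_ratio` is hypothetical:

```python
def skill_luck_ratio(sd_observed, games):
    """Return (skill/luck ratio, break-even games) for one league.

    sd_observed: standard deviation of observed win percentage
    games: number of games in the season
    """
    var_luck = 0.5 * 0.5 / games
    var_skill = sd_observed ** 2 - var_luck
    # number of games at which var(luck) would equal var(skill)
    break_even_games = 0.25 / var_skill
    return var_skill / var_luck, break_even_games

ratio, break_even = skill_luck_ratio(0.0997, 82)
print("skill/luck ratio over a season: %.4f" % ratio)
print("season length / break-even games: %.4f" % (82 / break_even))
```

Both printed values should match, which ties this section's bar graph to the previous one: a league whose season is many times longer than its break-even point has a high skill/luck ratio.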
#in what sport can you best tell who the best team was at the end of the season?
skillluckseason={}
#calculate skill/luck ratio for each league
for league in leagues:
    skillluckseason[league]=tvs[league]/rvs[league]
#plot ratios
N = len( skillluckseason.values() )
x = np.arange(1, N+1)
y = skillluckseason.values()
labels = skillluckseason.keys()
width = .75
bar1 = plt.bar( x, y, width, color='CornflowerBlue')
plt.ylabel( 'skill / luck ratio for a season' )
plt.xticks(x + width/2.0, labels )
plt.ylim(0,6.5)
print "From left to right, the values of the bars in the graph:"
print skillluckseason.values()
plt.show()
From left to right, the values of the bars in the graph:
[2.3312967200000005, 2.2603495199999992, 5.88669128, 1.3079686400000003]