Loading the libraries
import csv
import numpy as np
%matplotlib inline
import matplotlib.pyplot as pl  # pyplot provides imshow() and plot(); importing matplotlib alone does not
Importing the data
baseFileName = '2010.tab'  # dataset: one row per match, tab-separated
with open(baseFileName, newline='') as f:
    reader = csv.DictReader(f, dialect='excel', delimiter='\t')
    base = [dict(row) for row in reader]  # one dict per match, keyed by column name
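Under Python 3, csv files are opened in text mode with newline='', and DictReader already yields one dictionary per row. The minimal sketch below runs on a hypothetical two-row sample rather than the real 2010.tab, and also shows one possible way to convert the WPts and LPts columns to integers at load time, mapping the 'N/A' marker to None. The helper parse_points and the player names are illustrative, not part of the original notebook.

```python
import csv
import io

# Hypothetical two-row sample with the same tab-separated columns as 2010.tab
sample = (
    "Winner\tLoser\tWPts\tLPts\n"
    "Player A\tPlayer B\t1200\t950\n"
    "Player C\tPlayer A\tN/A\t1200\n"
)

def parse_points(value):
    """Convert a points field to int, mapping the 'N/A' marker to None."""
    return None if value == "N/A" else int(value)

reader = csv.DictReader(io.StringIO(sample), delimiter="\t")
matches = [
    {**row, "WPts": parse_points(row["WPts"]), "LPts": parse_points(row["LPts"])}
    for row in reader
]
for m in matches:
    print(m["Winner"], m["WPts"], m["Loser"], m["LPts"])
```

Converting once at load time means later code can compare numbers directly instead of re-checking the 'N/A' string at every use.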
Array of players
winners = [r['Winner'] for r in base]
losers = [r['Loser'] for r in base]
players = np.unique(winners + losers)  # sorted array of distinct player names
n = len(players)
print("number of players:", n)
number of players: 291
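Because np.unique returns the names in sorted order, it can also return, with return_inverse=True, the position of every input name in that sorted array. A minimal sketch on hypothetical match data (players A, B and C) shows how this yields integer codes for all winners and losers in one call, without searching the player list row by row:

```python
import numpy as np

# Hypothetical match list: the k-th match was won by winners[k] against losers[k]
winners = ["B", "A", "C", "A"]
losers = ["A", "C", "B", "B"]

# players is sorted; codes[k] is the index in players of the k-th concatenated name
players, codes = np.unique(winners + losers, return_inverse=True)
win_idx = codes[:len(winners)]    # integer code of each match winner
lose_idx = codes[len(winners):]   # integer code of each match loser
print(players, win_idx, lose_idx)
```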
Define a function that returns the list of all indices at which the value name occurs in the sequence a:
def getindex(name, a):
    return [i for (i, val) in enumerate(a) if val == name]
For example, to get the indices of all matches lost by Williams V.:
getindex('Williams V.',[r['Loser'] for r in base])
[244, 708, 1001, 1063, 1239, 1489, 2065]
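getindex scans the whole sequence on every call, so each lookup costs time proportional to its length. When only the position of a name in players is needed, two common alternatives are a dictionary built once from name to index, or a binary search with np.searchsorted, which is valid only because np.unique returns a sorted array. A minimal sketch on a three-name array chosen for illustration:

```python
import numpy as np

# Sorted array of names, as np.unique would return it (illustrative subset)
players = np.array(["Clijsters K.", "Henin J.", "Williams V."])

# Option 1: build a name -> index dictionary once; each lookup is then O(1)
index_of = {name: i for i, name in enumerate(players)}

# Option 2: binary search, valid only because players is sorted
i = int(np.searchsorted(players, "Henin J."))
# searchsorted returns an insertion point even for absent names, so check the match
assert players[i] == "Henin J."
```

For the list of all matches lost by one player, as in the example above, a scan such as getindex remains the simple choice, since there can be several matching indices.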
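We fill a preference matrix R in which each row holds the results of the corresponding player: R[i,j] = 1 if players[i] beat players[j] at least once, and 0 otherwise. At the same time, pts[i] records the highest ranking-point total listed for players[i] in the WPts and LPts columns ('N/A' entries are skipped).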
R = np.zeros((n, n))
pts = np.zeros((n, 1))
for match in base:
    # indices of the winner and the loser in players
    wind = getindex(match['Winner'], players)[0]
    lind = getindex(match['Loser'], players)[0]
    # winner row, loser column: players[wind] beat players[lind]
    R[wind, lind] = 1
    # keep the highest ranking-point total seen for each player ('N/A' = missing)
    if match['WPts'] != 'N/A':
        pts[wind, 0] = max(pts[wind, 0], int(match['WPts']))
    if match['LPts'] != 'N/A':
        pts[lind, 0] = max(pts[lind, 0], int(match['LPts']))
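Once winners and losers are integer-coded, the loop above can also be written with array operations. In this sketch the match data are hypothetical and missing points ('N/A') are represented by None, then converted to NaN. Fancy assignment with repeated indices does not accumulate, so np.maximum.at, which applies the maximum unbuffered for every occurrence of an index, is used to keep the highest point total per player.

```python
import numpy as np

n = 3
# Hypothetical integer-coded matches: match k was won by win_idx[k] against lose_idx[k]
win_idx = np.array([1, 0, 2, 0])
lose_idx = np.array([0, 2, 1, 1])
# Ranking points per match; None stands for an 'N/A' entry
win_pts = [500, 520, None, 530]
lose_pts = [510, 300, 505, 500]

# Win matrix: R[i, j] = 1 if player i beat player j at least once
R = np.zeros((n, n))
R[win_idx, lose_idx] = 1

# Highest points seen per player; None becomes NaN and is filtered out
all_idx = np.concatenate([win_idx, lose_idx])
all_pts = np.array([np.nan if p is None else p for p in win_pts + lose_pts], dtype=float)
valid = ~np.isnan(all_pts)
pts = np.zeros(n)
np.maximum.at(pts, all_idx[valid], all_pts[valid])  # repeated indices are all taken into account
print(R)
print(pts)
```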
Visually, the matrix is very sparse:
pl.imshow(R, interpolation='none')
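The sparsity seen in the image can also be quantified as the fraction of nonzero entries. A minimal sketch; the helper name density and the 3-player win matrix are hypothetical:

```python
import numpy as np

def density(M):
    """Fraction of nonzero entries of a matrix (1.0 = dense, close to 0 = sparse)."""
    return np.count_nonzero(M) / M.size

# Hypothetical 3-player win matrix
R = np.array([[0, 1, 1],
              [1, 0, 0],
              [0, 1, 0]], dtype=float)
print(f"density: {density(R):.3f}")
```

At a few hundred players a dense array is still small; a sparse format (for example from scipy.sparse) would mainly pay off for much larger or sparser data.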
We approximate the dominant eigenvector of R by power iteration: starting from a vector of ones, multiply by R and renormalize, 10 times.
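r = np.ones((n, 1))
for _ in range(10):
    r = R @ r                  # new score of i = sum of the scores of the players i beat
    r = r / np.linalg.norm(r)  # renormalize so the values neither blow up nor vanish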
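For a nonnegative matrix whose win graph is strongly connected and aperiodic, the Perron-Frobenius theorem guarantees that power iteration converges to a positive dominant eigenvector. The sketch below is a variant of the loop above, not the notebook's code: it stops on a convergence tolerance with a cap on the number of iterations, and it is checked on a hypothetical 4-player win matrix against np.linalg.eig. On the real R, where some players never win and the graph is not strongly connected, convergence is not guaranteed, which is one reason to keep an iteration cap.

```python
import numpy as np

def power_iteration(A, max_iters=1000, tol=1e-12):
    """Approximate the dominant eigenvector of A (unit norm) by power iteration."""
    r = np.ones(A.shape[0])
    for _ in range(max_iters):
        r_next = A @ r
        r_next /= np.linalg.norm(r_next)  # assumes A @ r is never the zero vector
        if np.linalg.norm(r_next - r) < tol:
            return r_next
        r = r_next
    return r  # iteration cap reached without meeting the tolerance

# Hypothetical win matrix for 4 players: A[i, j] = 1 if player i beat player j
A = np.array([[0, 1, 1, 1],
              [0, 0, 1, 1],
              [1, 0, 0, 1],
              [0, 1, 0, 0]], dtype=float)

r = power_iteration(A)
lam = r @ A @ r  # Rayleigh quotient estimate of the dominant eigenvalue (r has unit norm)
print(np.round(r, 4), round(float(lam), 4))
```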
We sort the scores to rank each player by the dominant eigenvector. For comparison, we also rank the players by their WTA ranking points.
# eigenrank[k] is the index of the player in position k+1 (highest eigen score first)
eigenrank = r[:, 0].argsort()[::-1]
# wtp_rank[k]: same ordering, by ranking points
wtp_rank = pts[:, 0].argsort()[::-1]
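argsort()[::-1] gives an ordering (which player occupies each position); the rank of a given player is the inverse permutation, which can be computed once instead of searching the ordering with getindex for every player. A minimal sketch with hypothetical scores; ties, if any, are broken arbitrarily by argsort:

```python
import numpy as np

scores = np.array([0.2, 0.05, 0.9, 0.4])    # hypothetical eigen scores
order = scores.argsort()[::-1]              # order[k] = index of the player in position k+1
rank = np.empty_like(order)
rank[order] = np.arange(1, len(order) + 1)  # rank[i] = position of player i (1 = best)
print(order, rank)
```

With order = eigenrank, rank[i] would give the same value as getindex(i, eigenrank)[0] + 1, in a single pass.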
Top 20 players of 2010 by WTA ranking points:
print(players[wtp_rank[:20]])
['Williams S.' 'Safina D.' 'Wozniacki C.' 'Williams V.' 'Kuznetsova S.' 'Zvonareva V.' 'Jankovic J.' 'Dementieva E.' 'Clijsters K.' 'Azarenka V.' 'Stosur S.' 'Schiavone F.' 'Radwanska A.' 'Li N.' 'Bartoli M.' 'Pennetta F.' 'Sharapova M.' 'Peer S.' 'Petrova N.' 'Wickmayer Y.']
Top 20 players by eigenvector ranking:
print(players[eigenrank[:20]])
['Wozniacki C.' 'Clijsters K.' 'Stosur S.' 'Dementieva E.' 'Henin J.' 'Zvonareva V.' 'Azarenka V.' 'Pennetta F.' 'Jankovic J.' 'Williams V.' 'Peer S.' 'Schiavone F.' 'Williams S.' 'Petrova N.' 'Li N.' 'Radwanska A.' 'Rezai A.' 'Petkovic A.' 'Pavlyuchenkova A.' 'Ivanovic A.']
Henin J. stands out: 5th by eigen score but only 21st by ranking points.
henin = getindex('Henin J.', players)  # one-element list of indices
print('eigen score', r[henin, 0])
print('WTA pts', pts[henin, 0])
print('eigen rank', getindex(henin[0], eigenrank)[0] + 1)
print('WTA rank', getindex(henin[0], wtp_rank)[0] + 1)
eigen score [ 0.19515058]
WTA pts [ 3135.]
eigen rank 5
WTA rank 21
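We can see visually which players are over- or under-rated by their ranking points relative to the eigenvector score, by plotting the eigen score against the points together with a quadratic least-squares fit: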
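pl.plot(pts[:, 0], r[:, 0], '.')
# quadratic least-squares fit of the eigen score as a function of the points,
# drawn over the full range of points
x = np.linspace(0, pts.max(), 200)
pl.plot(x, np.polyval(np.polyfit(pts[:, 0], r[:, 0], 2), x))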
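What the plot shows can also be put into numbers: the residual of each player's eigen score from the quadratic fit. In this sketch the function name rating_gap and the six-player data are hypothetical. A positive residual means an eigen score above the trend for that point total (a player the points under-rate); a negative one means the reverse.

```python
import numpy as np

def rating_gap(points, scores, degree=2):
    """Residuals of scores from a least-squares polynomial fit on points.

    Positive residual: score above the trend for that point total.
    Negative residual: score below the trend.
    """
    coeffs = np.polyfit(points, scores, degree)
    return scores - np.polyval(coeffs, points)

# Hypothetical data for six players
names = np.array(["P1", "P2", "P3", "P4", "P5", "P6"])
points = np.array([100.0, 200.0, 300.0, 400.0, 500.0, 600.0])
scores = np.array([0.01, 0.05, 0.02, 0.08, 0.20, 0.10])

gap = rating_gap(points, scores)
order = np.argsort(gap)[::-1]  # largest positive residual first
for name, g in zip(names[order], gap[order]):
    print(name, round(float(g), 4))
```

Because the fit includes a constant term, the residuals sum to zero, so the list splits players into those above and below the trend rather than measuring an absolute quality.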
We list the players in WTA order and compare with the eigen rank. Each line gives the name, the WTA points, the eigen score, the WTA rank and the eigen rank:
for p in players[wtp_rank]:
    i = getindex(p, players)[0]
    # name, WTA points, eigen score, WTA rank, eigen rank
    print(p, pts[i, 0], r[i, 0], getindex(i, wtp_rank)[0] + 1, getindex(i, eigenrank)[0] + 1)
Williams S. 9195.0 0.162934688277 1 13
Safina D. 7800.0 0.0850149676463 2 43
Wozniacki C. 7270.0 0.227940789639 3 1
Williams V. 6506.0 0.17336456095 4 10
Kuznetsova S. 6141.0 0.118817170422 5 28
Zvonareva V. 6096.0 0.192617814982 6 6
Jankovic J. 5900.0 0.173926696388 7 9
Dementieva E. 5505.0 0.200987779475 8 4
Clijsters K. 5325.0 0.206833451933 9 2
Azarenka V. 5300.0 0.183592321629 10 7
Stosur S. 5045.0 0.205039000396 11 3
Schiavone F. 5005.0 0.16931870299 12 12
Radwanska A. 4190.0 0.153489269953 13 16
Li N. 4015.0 0.153959866245 14 15
Bartoli M. 3455.0 0.119980892622 15 26
Pennetta F. 3450.0 0.17915152632 16 8
Sharapova M. 3450.0 0.119950631636 17 27
Peer S. 3405.0 0.170054310698 18 11
Petrova N. 3345.0 0.160420634536 19 14
Wickmayer Y. 3320.0 0.131043877916 20 23
Henin J. 3135.0 0.195150580222 21 5
Rezai A. 3100.0 0.146211847918 22 17
Pavlyuchenkova A. 2780.0 0.137283533968 23 19
Martinez-Sanchez M.J. 2635.0 0.134455626619 24 22
Kanepi K. 2415.0 0.116532726193 25 29
Zheng J. 2355.0 0.115979471704 26 30
Kirilenko M. 2350.0 0.106819725153 27 31
Razzano V. 2300.0 0.0261742790372 28 83
Hantuchova D. 2285.0 0.128591624394 29 25
Ivanovic A. 2255.0 0.136426365915 30 20
Kleybanova A. 2185.0 0.130884864493 31 24
Safarova L. 2135.0 0.135116804766 32 21
Dulgheru A. 2080.0 0.0990262556965 33 34
Cibulkova D. 2063.0 0.0992560403439 34 33
Lisicki S. 2035.0 0.00731829933214 35 131
Bondarenko A. 2020.0 0.091673782779 36 38
Vesnina E. 2011.0 0.0843308155022 37 44
Medina Garrigues A. 1980.0 0.0498426218504 38 56
Kvitova P. 1917.0 0.0806048114838 39 46
Allgurin E. 1880.0 0.0 40 192
Shvedova Y. 1860.0 0.0888038884084 41 40
Petkovic A. 1820.0 0.139643788355 42 18
Pironkova T. 1805.0 0.0488540378644 43 57
Bondarenko K. 1740.0 0.0312103927924 44 76
Szavay A. 1735.0 0.0917151345198 45 37
Errani S. 1720.0 0.080315011164 46 47
Suarez Navarro C. 1715.0 0.0712097926813 47 50
Dulko G. 1665.0 0.0869074672728 48 41
Wozniak A. 1645.0 0.00961079893445 49 125
Oudin M. 1644.0 0.0258628411211 50 84
Zakopalova K. 1610.0 0.0832681688003 51 45
Cirstea S. 1606.0 0.0397041931034 52 65
Czink M. 1571.0 0.00393270211273 53 145
Benesova I. 1540.0 0.038576120426 54 67
Govortsova O. 1510.0 0.0546106247674 55 55
Dushevina V. 1490.0 0.0645431705541 56 52
Bacsinszky T. 1482.0 0.0620524198429 57 53
Schnyder P. 1440.0 0.0978011039535 58 35
Rybarikova M. 1425.0 0.0223605058953 59 90
Peng S. 1425.0 0.0603257381137 60 54
Bammer S. 1395.0 0.0331873634932 61 73
Groth J. 1389.0 0.0412248626337 62 64
Goerges J. 1361.0 0.0859816349879 63 42
Zahlavova Strycova B. 1348.0 0.0454544609443 64 59
Vinci R. 1340.0 0.106444307052 65 32
Hercog P. 1337.0 0.066526437766 66 51
Sevastova A. 1325.0 0.0932974567646 67 36
Cornet A. 1325.0 0.0414636837656 68 63
Kerber A. 1258.0 0.0719132533831 69 49
Garbin T. 1245.0 0.0307419488284 70 77
Makarova E. 1241.0 0.0785471424733 71 48
Parra Santonja A. 1231.0 0.044288468485 72 60
Date Krumm K. 1200.0 0.0891676328967 73 39
Baltacha E. 1179.0 0.0396453417693 74 66
Arvidsson S. 1176.0 0.0225449967425 75 89
Dokic J. 1153.0 0.00265658804966 76 154
Mirza S. 1148.0 0.00349851561171 77 149
Zahlavova Strycova B.. 1115.0 0.00625170231268 78 134
Rodionova An. 1095.0 0.0379073315394 79 68
Flipkens K. 1089.0 0.0421114669328 80 61
Groenefeld A.L. 1085.0 0.0121812306722 81 112
Kudryavtseva A. 1056.0 0.0288912243754 82 79
Amanmuradova A. 1031.0 0.0338753150448 83 71
Larsson J. 1004.0 0.034112603421 84 70
Domachowska M. 1003.0 0.0 85 195
Chakvetadze A. 1002.0 0.0482831790268 86 58
Tanasugarn T. 991.0 0.0285782870626 87 80
Chan Y.J. 989.0 0.0114848498868 88 118
Barrois K. 989.0 0.0192921237262 89 93
Oprandi R. 985.0 0.0108743359687 90 121
Martic P. 978.0 0.0220490001085 91 91
Hradecka L. 976.0 0.0070731498242 92 132
King V. 975.0 0.0312960237768 93 75
Kulikova R. 974.0 0.0318106812219 94 74
Voegele S. 959.0 0.0251678565463 95 86
Olaru I. 928.0 0.0117372106779 96 116
Malek T. 901.0 0.0301137194753 97 78
Mattek-Sands B. 887.0 0.0360937055562 98 69
Brianti A. 887.0 0.0134931450368 99 105
Radwanska U. 880.0 0.0032353914338 100 151
Zahlavova S. 873.0 0.01306957239 101 108
Coin J. 869.0 0.0121454336747 102 113
Zhang S. 866.0 0.00139034309862 103 169
Voegele 855.0 0.0 104 216
Voracova R. 848.0 0.0104372772996 105 124
Rodionova A. 843.0 0.012623779302 106 109
Morita A. 837.0 0.00837811198058 107 128
Craybas J. 829.0 0.0180162044725 108 95
Chang K.C. 806.0 0.0255534791216 109 85
Niculescu M. 801.0 0.0184003406276 110 94
Meusburger Y. 791.0 0.0173667761759 111 97
Jovanovski B. 776.0 0.02669840765 112 82
Arn G. 759.0 0.0169310845291 113 98
Molik A. 756.0 0.0212837050602 114 92
Scheepers C. 749.0 0.0118492407092 115 115
Sprem K. 748.0 0.0124590112845 116 111
Halep S. 738.0 0.0131392487083 117 106
O'Brien K. 729.0 0.000412228830498 118 178
Kutuzova V. 729.0 0.0 119 278
Gallovits E. 713.0 0.0125904527924 120 110
De Los Rios R. 712.0 0.00324472103179 121 150
Pervak K. 702.0 0.0114568628266 122 119
Parmentier P. 691.0 0.00392510261435 123 146
Camerin M.E. 686.0 0.0115777918437 124 117
Mayr P. 680.0 0.0022268738301 125 157
Lepchenko V. 679.0 0.00813625388738 126 129
Duque Marino M. 677.0 0.0148205318029 127 102
Nara K. 670.0 0.000993114386641 128 172
Lapushchenkova A. 669.0 0.0338689045689 129 72
Perry S. 664.0 0.0 130 227
Rodina E. 646.0 0.00506109112 131 140
Mchale C. 636.0 0.027613843709 132 81
Voskoboeva G. 625.0 0.0 133 250
Dubois S. 623.0 0.00365937727145 134 147
Koryttseva M. 609.0 0.0148227868623 135 101
Ferguson S. 599.0 0.005401186737 136 138
Vandeweghe C. 594.0 0.0418052243147 137 62
Kucova Z. 590.0 0.0 138 279
Bychkova E. 578.0 0.00455936889603 139 142
Yakimova A. 577.0 0.0118534114414 140 114
Keothavong A. 575.0 0.02357297243 141 88
Kucova K. 574.0 0.0 142 280
Hlavackova A. 573.0 5.14445003463e-05 143 185
Zec Peskiric M. 570.0 0.00296769000513 144 153
Larcherde Brito M. 568.0 0.00643780448958 145 133
Lucic M. 564.0 0.00617769856299 146 135
Ondraskova Z. 564.0 0.0131330812608 147 107
Rus A. 558.0 0.00616171041365 148 137
Pivovarova A. 535.0 0.0107819645777 149 122
Panova A. 530.0 9.85496409407e-05 150 183
Cohen Aloro S. 525.0 0.000245210700242 151 181
Han X. 522.0 0.00171230158558 152 162
Fichman S. 522.0 0.00365379420949 153 148
Krajicek M. 519.0 0.000263532786888 154 180
Osterloh L. 517.0 0.000245210700242 155 182
Woerle K. 516.0 0.0 156 226
Llagostera Vives N. 513.0 0.00401636379275 157 144
Kustova D. 510.0 0.00149835071864 158 166
Tetreault V. 510.0 0.00427882844333 159 143
Manasieva V. 506.0 0.0 160 253
Minella M. 492.0 0.00773150538924 161 130
Karatantcheva S. 483.0 0.0136527901095 162 103
Namigata J. 469.0 0.0 163 233
Bratchikova N. 462.0 0.0 164 203
Savchuk O. 458.0 0.00249098294256 165 155
Schruff J. 448.0 0.0 166 249
Bovina E. 436.0 0.0 167 202
Pous Tio L. 434.0 0.0166192739983 168 99
Paszek T. 433.0 0.0248516676788 169 87
Marino R. 432.0 0.00880746883271 170 127
Adamczak M. 430.0 0.00175389790736 171 161
Tatishvili A. 428.0 0.00307660336329 172 152
Floris A. 424.0 0.0 173 269
Ivanova E. 424.0 0.00181447779148 174 160
Riske A. 418.0 0.0135362763278 175 104
Dominguez Lino L. 418.0 0.00616740988907 176 136
Diatchenko V. 415.0 0.0 177 213
Dentoni C. 400.0 0.000775801730785 178 173
Rogowska O. 400.0 0.0 179 237
Johansson M. 400.0 0.0 180 290
Glatch A. 387.0 0.00073559960216 181 175
Hampton J. 386.0 0.0 182 258
Doi M. 384.0 0.0 183 206
Cetkovska P. 384.0 0.0 184 194
Sema Y. 379.0 0.0 185 248
Jones S. 350.0 0.0 186 289
Tsurenko L. 342.0 0.000993114386641 187 171
Fedak Y. 340.0 0.0 188 271
El Tabakh H. 338.0 0.0 189 215
Rodionova Ar. 335.0 0.0 190 218
Gullickson C. 324.0 0.0 191 260
Brengle M. 314.0 0.0 192 196
Wienerova L. 313.0 0.0 193 225
Dzehalevich E. 313.0 0.0 194 190
Castano C. 303.0 0.0111195359442 195 120
Sanchez O. 299.0 0.0 196 251
Daniilidou E. 298.0 0.00887485071475 197 126
Hofmanova N. 298.0 0.0 198 272
Vaidisova N. 295.0 0.0 199 238
Foretz S. 293.0 0.0 200 268
Antoniychuk K. 289.0 0.00477168379324 201 141
Soler Espinosa S. 281.0 0.0 202 244
Robson L. 280.0 0.00168645395692 203 163
Buyukakcay C. 279.0 0.0 204 207
Zhou Y.M. 278.0 0.0 205 277
Garcia-Vidagany B. 271.0 0.0 206 267
Barthel M. 268.0 0.0 207 198
Ungur L. 267.0 0.0 208 239
Fuda R. 266.0 5.19845061446e-05 209 184
Peers S. 263.0 0.000644007625231 210 176
Kim S.J. 260.0 0.0 211 287
Albanese L. 258.0 0.0 212 189
Pliskova Ka. 257.0 0.0 213 223
Chakhnashvili M. 250.0 0.0 214 210
Dellacqua C. 250.0 0.00162912046627 215 165
Diyas Z. 247.0 0.0174780926421 216 96
Begu I. 246.0 0.0 217 199
Mladenovic K. 245.0 0.00168645395692 218 164
Gerasimou A. 245.0 0.0 219 264
Yonemura T. 244.0 0.0 220 283
Feuerstein C. 235.0 0.0 221 270
South M. 232.0 0.0 222 243
Klaffner M. 230.0 0.0 223 285
Tomljanovic A. 228.0 0.0 224 241
Srebotnik K. 228.0 0.00207198201472 225 158
Erakovic M. 227.0 0.00147076343564 226 168
Granville L. 227.0 0.0 227 261
Georgatou E. 221.0 0.0 228 265
Pliskova Kr. 213.0 0.0 229 222
Cabeza-Candela E. 207.0 0.00147748777873 230 167
Sun S. 204.0 0.0 231 242
Lisjak I. 200.0 0.0 232 274
Watson H. 200.0 0.000644007625231 233 177
Xu Y.F. 199.0 0.0 234 229
Lu J.J. 196.0 0.0 235 254
Siegemund L. 192.0 0.0 236 245
Garcia Vidagany B. 188.0 0.0151522993534 237 100
Evtimova D. 184.0 0.00201789312767 238 159
Vrljic A. 182.0 0.00523172407694 239 139
Pochabova M. 180.0 0.0 240 221
Broady N. 177.0 0.0 241 205
Viratprasert S. 172.0 0.0 242 286
Palkina K. 164.0 0.0 243 230
Washington M. 163.0 0.0 244 234
Sfar S. 155.0 0.0 245 247
Falconi I. 154.0 0.0 246 255
De Gubernatis C. 149.0 0.0 247 211
Lertcheewakarn N. 149.0 0.000767708468948 248 174
Hrdinova E. 146.0 0.00226949642967 249 156
Hsieh S.W. 144.0 0.0 250 256
Stephens S. 144.0 0.00137811041987 251 170
Ruano Pascual V. 140.0 0.0 252 252
Kostanic Tosic J. 131.0 0.0 253 281
Poutchek T. 127.0 0.0 254 220
Ishizu S. 126.0 0.0 255 273
Piter K. 123.0 0.0 256 224
Rogers S. 122.0 0.0 257 235
Botto B. 112.0 0.0 258 201
Guskova N. 107.0 0.0 259 259
Ditty J. 106.0 0.0 260 214
Capra B. 102.0 0.0106323226031 261 123
Abduraimova N. 101.0 0.0 262 291
Zabala P. 98.0 0.0 263 275
Cecil M. 98.0 0.0 264 209
Ozgen P. 96.0 0.0 265 231
Babos T. 94.0 0.0 266 193
Luangnam N. 90.0 0.0 267 236
Sharipova S. 89.0 0.0 268 246
Wongteanchai V. 86.0 0.0 269 228
Yan Z. 86.0 0.000263532786888 270 179
Tvaroskova L. 80.0 0.0 271 240
De Lattre M. 65.0 0.0 272 212
Koehler M.J. 63.0 0.0 273 282
Caregaro M. 62.0 0.0 274 208
Brazhnikova A. 61.0 0.0 275 204
Klepac A. 60.0 0.0 276 284
Gavrilova D. 57.0 0.0 277 266
Granillo A. 44.0 0.0 278 262
Lalami N. 40.0 0.0 279 276
Grandin N. 38.0 0.0 280 263
El Allami Zhara F. 36.0 0.0 281 188
Jugic-Salkic M. 28.0 0.0 282 288
Bennani L. 19.0 0.0 283 200
Putintseva Y. 17.0 0.0 284 219
Ejdesgaard M. 16.0 0.0 285 187
Barte H. 7.0 0.0 286 197
Njiric S. 6.0 0.0 287 232
Harkleroad A. 0.0 0.0 288 257
Eraydin B. 0.0 0.0 289 217
Nakamura A. 0.0 6.60236164432e-06 290 186
Abdurakhimova A. 0.0 0.0 291 191
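To summarize the agreement between the two rankings in one number, one option (not used in the original notebook) is Spearman's rank correlation. Because argsort breaks ties arbitrarily, both rankings in the listing above are permutations of 1..n, so the no-ties formula rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)) applies, where d is the difference between the two ranks of each player; scipy.stats.spearmanr computes the same quantity with tie handling. A minimal sketch, with the hypothetical helper name spearman_from_ranks:

```python
import numpy as np

def spearman_from_ranks(rank_a, rank_b):
    """Spearman rank correlation of two rankings given as permutations of 1..n (no ties)."""
    d = np.asarray(rank_a, dtype=float) - np.asarray(rank_b, dtype=float)
    n = len(d)
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))

print(spearman_from_ranks([1, 2, 3, 4], [1, 2, 3, 4]))  # identical rankings
print(spearman_from_ranks([1, 2, 3, 4], [4, 3, 2, 1]))  # reversed rankings
```

Applied to this notebook, rank_a and rank_b would be each player's WTA rank and eigen rank, the last two columns of the listing above; a value of 1 means identical orderings and -1 fully reversed ones.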