This is essentially a raw exploration of the information our corpus provides about the Wikipedia pages linked from the "List of geometry topics"
page. The aim of this study is to build an indicator that lets us order pages by difficulty level.
%run "libraries.ipynb"
%config InlineBackend.figure_formats=['svg']
table = pd.read_csv("data/final.csv", encoding="UTF-8")
table = table.set_index("pagename")
table.head(10)
pagename | average word length | first revision | hidden users | length | page views | unique ip users | unique registered users | unique users | words | Pagerank pro 0.8 | ... | nbcontributorsBot | nbcontributorsIP | nbcontributorsMembers | nbrevisions | nbrevisionsBot | nbrevisionsIP | nbrevisionsMembers | ns | pageid | quality |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2D computer graphics | 5.912321 | 2001-10-13T06:23:27Z | 0 | 24124 | 930542 | 83 | 139 | 222 | 3490 | 2.540488 | ... | 18 | 78 | 119 | 370 | 23 | 124 | 223 | 0 | 35248 | 4 |
2D geometric model | 5.786585 | 2004-03-08T00:49:59Z | 0 | 1113 | 155960 | 13 | 28 | 41 | 164 | 1.846966 | ... | 3 | 13 | 24 | 54 | 3 | 20 | 31 | 0 | 511647 | 1 |
3D computer graphics | 6.360368 | 2007-03-21T05:56:20Z | 0 | 8803 | 3174611 | 258 | 240 | 498 | 1196 | 3.665079 | ... | 28 | 250 | 211 | 893 | 65 | 368 | 460 | 0 | 10175073 | 4 |
3D projection | 5.907763 | 2003-09-07T18:48:57Z | 0 | 8987 | 1141382 | 98 | 113 | 211 | 1301 | 3.430536 | ... | 16 | 95 | 96 | 351 | 25 | 117 | 209 | 0 | 313741 | 2 |
3-sphere | 5.071429 | 2002-02-19T11:12:25Z | 0 | 17595 | 488222 | 44 | 117 | 161 | 2898 | 3.693437 | ... | 12 | 44 | 103 | 277 | 13 | 59 | 205 | 0 | 39792 | 4 |
Absolute geometry | 5.561167 | 2004-06-02T19:31:39Z | 0 | 5846 | 77451 | 22 | 59 | 81 | 891 | 2.870685 | ... | 14 | 20 | 44 | 104 | 15 | 27 | 62 | 0 | 699294 | 2 |
Acute and obtuse triangles | 5.818387 | 2014-10-10T19:11:27Z | 0 | 9048 | 5549 | 1 | 2 | 3 | 1327 | 0.829064 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 44076423 | NaN |
Affine geometry | 5.424357 | 2003-06-11T09:28:43Z | 0 | 15245 | 277910 | 30 | 82 | 112 | 2373 | 3.495030 | ... | 15 | 30 | 64 | 205 | 22 | 40 | 143 | 0 | 243890 | 3 |
Affine space | 5.325827 | 2003-08-18T04:32:19Z | 0 | 13202 | 401482 | 64 | 105 | 169 | 2087 | 4.010550 | ... | 18 | 61 | 84 | 384 | 30 | 91 | 263 | 0 | 298834 | 2 |
Affine transformation | 5.599894 | 2002-02-25T15:51:15Z | 0 | 12454 | 1333126 | 78 | 139 | 217 | 1887 | 4.072621 | ... | 21 | 76 | 111 | 371 | 26 | 107 | 238 | 0 | 38449 | 2 |
10 rows × 32 columns
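The preview above shows two things worth checking before any analysis: the "Acute and obtuse triangles" row has NaN in the contributor, revision and quality columns, and `first revision` is stored as ISO 8601 strings. The following is a minimal sketch of both checks. It assumes a small hand-copied subset of the rows above as a stand-in for `data/final.csv`; the names `sample`, `missing_per_column` and `incomplete_pages` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Three rows and five columns copied from the preview above
# (a stand-in for data/final.csv, which is not reproduced here).
sample = pd.DataFrame(
    {
        "pagename": ["2D computer graphics", "3-sphere", "Acute and obtuse triangles"],
        "first revision": [
            "2001-10-13T06:23:27Z",
            "2002-02-19T11:12:25Z",
            "2014-10-10T19:11:27Z",
        ],
        "words": [3490, 2898, 1327],
        "nbrevisions": [370, 277, None],
        "quality": [4, 4, None],
    }
).set_index("pagename")

# Number of missing values in each column.
missing_per_column = sample.isna().sum()

# Pages that have at least one missing value.
incomplete_pages = sample.index[sample.isna().any(axis=1)].tolist()

# Parse the ISO 8601 strings into timezone-aware timestamps.
sample["first revision"] = pd.to_datetime(sample["first revision"], utc=True)
```

On the full table, the same calls would apply to `table` directly, for example `table.isna().sum()`.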
The specialization indicator (spe(wp,d)) is a first attempt at quantifying the difficulty of pages.
table.hist("specialization")
f, ax = plt.subplots(figsize=(18, 10))
sns.corrplot(table,diag_names=False)
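The `sns.corrplot` call above relies on a function that was deprecated and later removed from seaborn, so it may not run on a recent installation. A rough equivalent, offered only as a sketch, is to compute the correlation matrix with pandas and draw it with `sns.heatmap`. The helper names `correlation_matrix` and `plot_correlations` are hypothetical, and `sample` is a hand-copied subset of the preview rows used so that the block runs without `data/final.csv`.

```python
import pandas as pd


def correlation_matrix(df):
    """Pearson correlations between the numeric columns of df."""
    return df.select_dtypes(include="number").corr()


def plot_correlations(corr, figsize=(18, 10)):
    """Draw a correlation matrix as a heatmap (needs matplotlib and seaborn)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=figsize)
    # Fix the colour scale to [-1, 1] so that 0 sits at the neutral colour.
    sns.heatmap(corr, vmin=-1, vmax=1, center=0, cmap="coolwarm", square=True, ax=ax)
    return ax


# Four rows copied from the preview above; the string column is ignored by
# correlation_matrix because it is not numeric.
sample = pd.DataFrame(
    {
        "pagename": [
            "2D computer graphics",
            "2D geometric model",
            "3D computer graphics",
            "3D projection",
        ],
        "first revision": [
            "2001-10-13T06:23:27Z",
            "2004-03-08T00:49:59Z",
            "2007-03-21T05:56:20Z",
            "2003-09-07T18:48:57Z",
        ],
        "words": [3490, 164, 1196, 1301],
        "length": [24124, 1113, 8803, 8987],
    }
).set_index("pagename")

corr = correlation_matrix(sample)
```

With the full table loaded, `plot_correlations(correlation_matrix(table))` would produce a figure comparable to the original one, although the styling differs from what `corrplot` drew.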