Mining the Git repository of the IPython project itself
Git repository mining: Konark Modi (@konarkmodi)
Concept: @dloss, OpenSSL Repository Mining: http://nbviewer.ipython.org/gist/dloss/11089724
cd /opt/ipython/
/opt/ipython
!git log --reverse | head -40
commit 6f629fcc23ba63342548f61cc7307eeef4f55799
Author: fperez <>
Date:   Wed Jul 6 17:52:32 2005 +0000

    Reorganized the directory for ipython/ to have its own dir, which is a bit more consistent with the SVN book recommended layout.

commit b100d4426001d91a4f11ae0ff9dae53a55e7bc9e
Author: tzanko <>
Date:   Sun Jul 17 01:03:15 2005 +0000

    Add ChangeLog symlink, sync up SVN with my local tree (minimal changes), to start new work off SVN.

commit d7d090d0840239eefe15b308e98b72a55be811b8
Author: tzanko <>
Date:   Sun Jul 17 01:56:45 2005 +0000

    Updated release tag for SVN. Added John Hunter's patch to Shell.py, which fixes problems with certain backends and the OO matplotlib API.

commit 22db71d27259b55134302c8e001d7bde5c756a2c
Author: fperez <>
Date:   Sun Jul 17 02:46:50 2005 +0000

    Close http://www.scipy.net/roundup/ipython/issue34 with slightly modified patch (filter non-strings from tab completions).

commit 79c0cdc8fbeea9fb478426a2b7515e8d657476b5
Author: fperez <>
Date:   Sun Jul 17 03:11:11 2005 +0000

    Make a global variable out of the color scheme table used for coloring exception tracebacks. Thanks to a patch by pabw, see http://www.scipy.net/roundup/ipython/issue35.

commit c83aafa375965e8a1a3115cdcd2e97ca5405e677
Author: fperez <>
Date:   Mon Jul 18 03:01:41 2005 +0000
Most recent commit
!git log -1
commit 783ec915e2928237b5e3aafcf8e383407bc20c1d
Merge: 6c15f0e 508d149
Author: Matthias Bussonnier <bussonniermatthias@gmail.com>
Date:   Sat Apr 26 15:06:12 2014 +0200

    Merge pull request #5731 from takluyver/sysinfo-realpath

    Calculate real path for symlinked IPython package in sysinfo
Total number of commits
!git log --oneline | wc -l
15933
Save the author date (%ai), author name (%an) and full commit hash (%H) of every commit to a file, one comma-separated line per commit
!git log --format=format:"%ai,%an,%H" > ../commits
!tail -n10 ../commits
2005-08-14 06:31:37 +0000,fperez,a1346b76aa53b4ddacf07c729e7debda96be03e5
2005-08-11 18:34:18 +0000,fperez,ad5c5bb0188c4b720a844700798042d3c9788f5a
2005-08-11 18:32:44 +0000,fperez,ee90220465eca51abf230d51c6ed4c87d1801f0c
2005-07-19 01:59:26 +0000,fperez,396feddb2da30c9839644f38c89c0ff30bb2ad2a
2005-07-18 03:01:41 +0000,fperez,c83aafa375965e8a1a3115cdcd2e97ca5405e677
2005-07-17 03:11:11 +0000,fperez,79c0cdc8fbeea9fb478426a2b7515e8d657476b5
2005-07-17 02:46:50 +0000,fperez,22db71d27259b55134302c8e001d7bde5c756a2c
2005-07-17 01:56:45 +0000,tzanko,d7d090d0840239eefe15b308e98b72a55be811b8
2005-07-17 01:03:15 +0000,tzanko,b100d4426001d91a4f11ae0ff9dae53a55e7bc9e
2005-07-06 17:52:32 +0000,fperez,6f629fcc23ba63342548f61cc7307eeef4f55799
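Each line of ../commits has the shape date,author,hash. Because %an is free text, an author name could itself contain a comma, which could break the naive three-column split. A minimal stdlib-only sketch of a safer parser (the names Commit and parse_commit_line are hypothetical, not part of the original notebook): split the date off at the first comma and the hash off at the last one, so whatever remains in the middle is the author.

```python
from datetime import datetime
from typing import NamedTuple


class Commit(NamedTuple):
    """One line of ../commits: author date, author name, full commit hash."""
    time: datetime
    author: str
    sha: str


def parse_commit_line(line: str) -> Commit:
    """Parse a line written by git log --format=format:"%ai,%an,%H"."""
    # The %ai date never contains a comma, so split it off at the first comma
    date_text, rest = line.rstrip("\n").split(",", 1)
    # The %H hash never contains a comma, so split it off at the last comma;
    # whatever is left in between is the author name, commas included
    author, sha = rest.rsplit(",", 1)
    # %ai looks like "2005-07-06 17:52:32 +0000"; %z reads the numeric UTC offset
    time = datetime.strptime(date_text, "%Y-%m-%d %H:%M:%S %z")
    return Commit(time, author, sha)


# Two lines copied from the tail output above
sample_lines = [
    "2005-07-17 01:03:15 +0000,tzanko,b100d4426001d91a4f11ae0ff9dae53a55e7bc9e",
    "2005-07-06 17:52:32 +0000,fperez,6f629fcc23ba63342548f61cc7307eeef4f55799",
]
parsed = [parse_commit_line(line) for line in sample_lines]
for commit in parsed:
    print(commit.time.isoformat(), commit.author, commit.sha[:7])
```

The parsed timestamps keep their UTC offset, so commits from authors in different time zones remain comparable.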
Move back to the presentation directory
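import pandas as pd
df = pd.read_csv("../commits", header=None, names=["time", "author", "id"], index_col="time")
# %ai dates carry each author's own UTC offset; convert them all to UTC
df.index = pd.to_datetime(df.index, utc=True)
# DataFrame.sort() was removed from pandas; sort_index() sorts by the time index
df.sort_index(ascending=True, inplace=True)
df.head()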
time | author | id
---|---|---
2005-07-06 17:52:32+00:00 | fperez | 6f629fcc23ba63342548f61cc7307eeef4f55799
2005-07-17 01:03:15+00:00 | tzanko | b100d4426001d91a4f11ae0ff9dae53a55e7bc9e
2005-07-17 01:56:45+00:00 | tzanko | d7d090d0840239eefe15b308e98b72a55be811b8
2005-07-17 02:46:50+00:00 | fperez | 22db71d27259b55134302c8e001d7bde5c756a2c
2005-07-17 03:11:11+00:00 | fperez | 79c0cdc8fbeea9fb478426a2b7515e8d657476b5
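%ai records each author's local UTC offset, so a real history mixes values such as +0000 and +0200 (the most recent commit above is dated Sat Apr 26 15:06:12 2014 +0200). A small sketch, assuming a reasonably recent pandas, of how pd.to_datetime(..., utc=True) puts such strings on a single timeline before sorting or resampling:

```python
import pandas as pd

# Two author dates in %ai format with different UTC offsets, taken from the log output above
raw_times = pd.Index(
    ["2014-04-26 15:06:12 +0200", "2005-07-06 17:52:32 +0000"], name="time"
)

# utc=True converts every offset to UTC, yielding one tz-aware DatetimeIndex
utc_times = pd.to_datetime(raw_times, utc=True)
print(utc_times)

# After normalisation the index sorts chronologically
print(utc_times.sort_values())
```

The 15:06:12 +0200 commit becomes 13:06:12 UTC, which is why all times in the table above are shown with +00:00.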
Total number of authors
print("Total number of authors: %s" % df.author.nunique())
Total number of authors: 369
Commits per author
commits_per_author=df.author.value_counts()
commits_per_author
MinRK               3037
Thomas Kluyver      1638
Fernando Perez      1480
Jonathan Frederic   1272
Matthias BUSSONNIER  909
Brian E. Granger     834
Brian Granger        811
Min RK               769
Paul Ivanov          594
vivainio             572
epatters             268
fperez               253
Bradley M. Froehle   210
Ville M. Vainio      205
Matthias Bussonnier  200
...
Dale Jung              1
André Matos            1
Rustam Safin           1
Stephan Peijnik        1
muzuiget               1
Steven Bethard         1
Rob Young              1
urielshaolin           1
Bradley Froehle        1
Sathesh Chandra        1
Yung Siang Liau        1
zah                    1
Andrew Mark            1
Kieran O'Mahony        1
stevenJohnson          1
Length: 369, dtype: int64
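The counts above split some contributors across several spellings (MinRK and Min RK, Brian E. Granger and Brian Granger, Matthias BUSSONNIER and Matthias Bussonnier), so 369 likely overstates the number of distinct people. Git's own mechanism for this is a .mailmap file, which the %aN placeholder (instead of %an) honours in git log --format, if the repository provides one. A pandas-side sketch, where the ALIASES table is hypothetical and only covers a few pairs visible in the output:

```python
import pandas as pd

# A few per-name commit counts copied from the value_counts() output above
counts = pd.Series({
    "MinRK": 3037,
    "Min RK": 769,
    "Brian E. Granger": 834,
    "Brian Granger": 811,
    "Matthias BUSSONNIER": 909,
    "Matthias Bussonnier": 200,
    "Thomas Kluyver": 1638,
})

# Hypothetical alias table: maps each variant spelling to one canonical name
ALIASES = {
    "Min RK": "MinRK",
    "Brian Granger": "Brian E. Granger",
    "Matthias BUSSONNIER": "Matthias Bussonnier",
}

# Rename the index labels, then add up the counts that now share a name
merged = (
    counts.rename(index=ALIASES)
    .groupby(level=0)
    .sum()
    .sort_values(ascending=False)
)
print(merged)
print("names before:", counts.size, "after:", merged.size)
```

On the full DataFrame the same idea would be df["author"].replace(ALIASES).value_counts(), with the alias table extended to every variant you can confirm.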
%matplotlib inline
commits_per_author.plot(kind="bar", figsize=(20,6))
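Create a custom filter based on the number of commits (keep authors with more than 30)
def filter_by_commits(counts, threshold=30):
    """Return the entries of `counts` that are greater than `threshold`."""
    return counts[counts > threshold]
df_author_commits = filter_by_commits(df.groupby('author').size())
# Series.sort() was removed from pandas; sort_values() returns a sorted copy
df_author_commits = df_author_commits.sort_values(ascending=False)
df_author_commits[:10]
df_author_commits.plot(kind="bar", figsize=(10, 6))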
Cumulative number of commits over time
df["c"] = 1  # counter: one per commit, so sums and running totals count commits
commits_over_time = df.c.cumsum()
commits_over_time.plot()
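Because every row has c = 1, df.c.cumsum() is simply the running commit number, one point per commit. A coarser alternative (our own variation, not part of the original analysis) is to count commits per calendar month and take the running total of those monthly counts. The sketch below runs on the ten earliest author dates from ../commits:

```python
import pandas as pd

# The ten earliest author dates in ../commits (from the tail output above)
sample_times = pd.to_datetime([
    "2005-07-06 17:52:32 +0000", "2005-07-17 01:03:15 +0000",
    "2005-07-17 01:56:45 +0000", "2005-07-17 02:46:50 +0000",
    "2005-07-17 03:11:11 +0000", "2005-07-18 03:01:41 +0000",
    "2005-07-19 01:59:26 +0000", "2005-08-11 18:32:44 +0000",
    "2005-08-11 18:34:18 +0000", "2005-08-14 06:31:37 +0000",
], utc=True)
sample = pd.DataFrame({"c": 1}, index=sample_times)

# Per-commit running total: the n-th commit gets the value n
per_commit = sample.c.cumsum()

# Commits per calendar month ("MS" labels each bin by its first day), then the running total
per_month = sample.c.resample("MS").sum().cumsum()
print(per_month)
```

July 2005 contributes 7 commits and August 3, so the monthly curve reads 7, then 10, ending at the same total as the per-commit curve.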
Most active commit days (by weekday)
print(df[:10].index.day)
print(df[:10].index.month)
print(df[:10].index.year)
print(df[:10].index.weekday)
# Note: this binds a second name to df (no copy), so 'weekday' is also added to df itself
df_commit_days = df
df_commit_days['weekday'] = df.index.weekday  # 0 = Monday ... 6 = Sunday
# Count rows per weekday; aggregate(sum) would also add up the weekday numbers and string columns
daywise_counts = df_commit_days.groupby('weekday').size()
daywise_counts
daywise_counts.index = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
daywise_counts.plot(kind='bar')
[ 6 17 17 17 17 18 19 11 11 14]
[7 7 7 7 7 7 7 8 8 8]
[2005 2005 2005 2005 2005 2005 2005 2005 2005 2005]
[2 6 6 6 6 0 1 3 3 6]
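Assigning seven day names to daywise_counts.index only works if every weekday occurs at least once; in a shorter history a day can be missing and the assignment fails with a length mismatch. A more defensive sketch (our choice of approach, using DatetimeIndex.day_name() and reindex) that always returns all seven days in Monday-to-Sunday order, shown on the ten earliest commits, whose weekday numbers are printed above:

```python
import pandas as pd

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# The ten earliest author dates in ../commits (from the tail output above)
sample_times = pd.to_datetime([
    "2005-07-06 17:52:32 +0000", "2005-07-17 01:03:15 +0000",
    "2005-07-17 01:56:45 +0000", "2005-07-17 02:46:50 +0000",
    "2005-07-17 03:11:11 +0000", "2005-07-18 03:01:41 +0000",
    "2005-07-19 01:59:26 +0000", "2005-08-11 18:32:44 +0000",
    "2005-08-11 18:34:18 +0000", "2005-08-14 06:31:37 +0000",
], utc=True)

# Count commits per weekday name, then force Monday..Sunday order, filling absent days with 0
weekday_counts = (
    pd.Series(sample_times.day_name())
    .value_counts()
    .reindex(DAYS, fill_value=0)
)
print(weekday_counts)
```

Friday and Saturday have no commits in this sample, yet they still appear (as 0), so the bar chart keeps a fixed seven-bar layout.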
Resample the data by quarter
# Select the counter column first so only commit counts are summed; "QS" = calendar quarters
per_quarter = df['c'].resample("QS").sum()
per_quarter.plot(kind="bar", figsize=(20, 5))
print("Total commits: %s" % per_quarter.sum())
Total commits: 15933
Slice a date range (2013-04-20 to 2014-04-26) and resample weekly
# .ix has been removed from pandas; .loc slices a sorted DatetimeIndex by date strings, both ends inclusive
per_weeks = df.loc['2013-04-20':'2014-04-26']
per_weeks = per_weeks['c'].resample("W").sum()
per_weeks.plot(kind="bar", figsize=(20, 5))
print("Total commits: %s" % per_weeks.sum())
Total commits: 5905
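One property worth knowing when reading these bar charts: resample creates a bin for every period in the covered range, so weeks without commits appear as explicit zeros instead of disappearing. With the "W" alias (weeks ending on Sunday), the ten earliest commits fall into six weekly bins, two of them empty. A small sketch on that sample:

```python
import pandas as pd

# The ten earliest author dates in ../commits (from the tail output above)
sample_times = pd.to_datetime([
    "2005-07-06 17:52:32 +0000", "2005-07-17 01:03:15 +0000",
    "2005-07-17 01:56:45 +0000", "2005-07-17 02:46:50 +0000",
    "2005-07-17 03:11:11 +0000", "2005-07-18 03:01:41 +0000",
    "2005-07-19 01:59:26 +0000", "2005-08-11 18:32:44 +0000",
    "2005-08-11 18:34:18 +0000", "2005-08-14 06:31:37 +0000",
], utc=True)
commit_counter = pd.Series(1, index=sample_times)

# "W" means "W-SUN": each bin covers Monday..Sunday and is labelled by its Sunday
weekly = commit_counter.resample("W").sum()
print(weekly)
```

The weeks ending 2005-07-31 and 2005-08-07 come out as 0, which is what produces the gaps in the weekly bar chart.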