# Version control for fun and profit: the tool you didn't know you needed. From personal workflows to open collaboration

Note: this tutorial was modeled in particular on, and therefore owes a lot to, the excellent materials offered in:

In particular, I've reused the excellent images from the Pro Git book that John had already selected and downloaded, as well as some of his outline. But this version of the tutorial aims to be 100% reproducible by being executed directly as an IPython notebook, and it is itself hosted on GitHub so that others can more easily improve it by collaborating there. Many thanks to John and Emanuele for making their materials available online.

After writing this document, I discovered J.R. Johansson's tutorial on version control that is also written as a fully reproducible notebook and is also aimed at a scientific audience. It has a similar spirit to this one, and is part of his excellent series Lectures on Scientific Computing with Python that is entirely available as IPython Notebooks.

## Wikipedia

“Revision control, also known as version control, source control or software configuration management (SCM), is the management of changes to documents, programs, and other information stored as computer files.”

Reproducibility?

• Tracking and recreating every step of your work
• In the software world: it's called Version Control!

What do (good) version control tools give you?

• Peace of mind (backups)
• Freedom (exploratory branching)
• Collaboration (synchronization)

## Git is an enabling technology: Use version control for everything

• Paper writing (never get paper_v5_john_jane_final_oct22_really_final.tex by email again!)
• Grant writing
• Everyday research
• Teaching (never accept an emailed homework assignment again!)

## The plan for this tutorial

This tutorial is structured as follows: we begin with a brief overview of the key concepts you need to understand for git to really make sense. We then dive into hands-on work: after a brief interlude on the necessary configuration, we discuss 5 "stages of git" with scenarios of increasing sophistication and complexity, introducing the necessary commands for each stage:

1. Local, single-user, linear workflow
2. Single local user, branching
3. Using remotes as a single user
4. Remotes for collaborating in a small team
5. Full-contact GitHub: distributed collaboration with large teams

In reality, this tutorial only covers stages 1-4, since for stage 5 there are many software development-oriented tutorials and documents of very high quality online. But most scientists start out working alone on a few files or with a small team, so I feel it's important to first build the key concepts and practices around problems scientists encounter in their everyday work, without the jargon of the software world. Once you've become familiar with stages 1-4, the excellent existing tutorials about collaborating on open-source projects on GitHub should make sense.

## Very high level picture: an overview of key concepts

The commit: a snapshot of work at a point in time

Credit: ProGit book, by Scott Chacon, CC License.

In [1]:
ls

argv.ipynb               notes.html                      Scratch.ipynb
data.csv                 Notes.ipynb                     stocks/
err.ipynb                notes.md                        test/
fig/                     picogit.ipynb                   Untitled0.ipynb
git-resources.md         QuickTour.ipynb                 Untitled1.ipynb
ibm.csv                  QuickTour.v2.ipynb              Untitled1-new.ipynb
InteractiveMPI.v2.ipynb  reprosw.pdf                     Version Control.html
IntroNumPy.ipynb         reprosw.rst                     Version Control.ipynb
IntroNumPy.pdf           reprosw.tex                     Version Control.v2.ipynb
IPythonIntro.ipynb       scikit-learn-barc-meetup.ipynb



A repository: a group of linked commits

Note: these form a Directed Acyclic Graph (DAG), with nodes identified by their hash.
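
To make the graph picture concrete, here is a minimal sketch of a history stored as a dictionary that maps each commit to its parents. The short ids (`c1`, `c2`, ...) and the `ancestors` helper are hypothetical names chosen for illustration; real Git identifies commits by 40-character hashes and stores much more information per commit.

```python
# A toy history: each commit id maps to the list of its parent ids.
# The ids are made-up placeholders, not real Git hashes.
history = {
    "c1": [],            # root commit, no parent
    "c2": ["c1"],
    "c3": ["c2"],        # work continues on the main line
    "c4": ["c2"],        # a branch that also starts from c2
    "c5": ["c3", "c4"],  # a merge commit has two parents
}


def ancestors(commit, graph):
    """Return every commit reachable from `commit` by following parent links."""
    seen = []
    stack = [commit]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        stack.extend(graph[current])
    return seen


print(sorted(ancestors("c5", history)))
print(sorted(ancestors("c4", history)))
```

Links only point from a commit to its parents, which already existed when the commit was made, so following them can never loop back. That is what makes the graph acyclic.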

A hash: a fingerprint of the content of each commit and its parent

In [2]:
import hashlib

# Our first commit
data1 = 'This is the start of my paper2.'
meta1 = 'date: 1/1/12'
# hashlib.sha1 replaces the old sha module and computes the same SHA-1 digest
hash1 = hashlib.sha1((data1 + meta1).encode('utf-8')).hexdigest()
print('Hash: ' + hash1)

Hash: 7bb695b77966e27cfaebfa59e27a0b91f1d33813


In [3]:
# Our second commit, linked to the first
data2 = 'Some more text in my paper...'
meta2 = 'date: 1/2/12'
# Note we add the parent hash here!
hash2 = hashlib.sha1((data2 + meta2 + hash1).encode('utf-8')).hexdigest()
print('Hash: ' + hash2)

Hash: 543da8bac9f643ba5611897b192a16dea42d2ab7



And this is pretty much the essence of Git!
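
The toy example above concatenates strings by hand. As a rough sketch of what Git itself does for the contents of a file, it prepends a small header (the word `blob`, the size in bytes, and a NUL byte) and takes the SHA-1 of the result. The function name `git_blob_hash` is a hypothetical helper written for this tutorial, not part of Git or of any library.

```python
import hashlib


def git_blob_hash(content):
    """Return the SHA-1 id Git assigns to a file (blob) holding these bytes."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Identical content always gets the same id, in every repository.
print(git_blob_hash(b""))
print(git_blob_hash(b"This is the start of my paper2."))
```

For the empty file this prints `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the same id that `git hash-object` reports for an empty file. Commit objects are hashed in a similar way, with the parent hashes included in the hashed text, which is the linking idea shown in the cells above.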

## First things first: git must be configured before first use

The minimal amount of configuration for git to work without pestering you is to tell it who you are:

In [23]:
%%bash
git config --global user.name "Fernando Perez"
git config --global user.email "Fernando.Perez@berkeley.edu"


And how you will edit text files (it will often ask you to edit messages and other information, and thus wants to know how you like to edit your files):

In [35]:
%%bash
# Put here your preferred editor. If this is not set, git will honor
# the $EDITOR environment variable.

source $HOME/.git-prompt.sh
PS1='[\u@\h \W$(__git_ps1 " (%s)")]\$'   # adjust this to your prompt liking

See the comments in both of those files for lots of extra functionality they offer.

#### Embedding Git information in LaTeX documents

(Sent by Yaroslav Halchenko)

I use a Make rule:

# Helper if interested in providing proper version tag within the manuscript
revision.tex: ../misc/revision.tex.in ../.git/index
	GITID=$$(git log -1 | grep -e '^commit' -e '^Date:' | sed -e 's/^[^ ]* *//g' | tr '\n' ' '); \
	echo $$GITID; \
	sed -e "s/GITID/$$GITID/g" $< >| $@


This rule lives in the top-level Makefile.common, which is included in every subdirectory that actually contains a paper (hence the ../.git paths). The revision.tex.in file is simply:

% Embed GIT ID revision and date
\def\revision{GITID}


The corresponding paper.pdf depends on revision.tex, and the paper's source includes the line \input{revision} to load the actual revision mark.
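
For readers who don't use Make, the same substitution can be sketched in Python. This is only an illustration of what the rule above does, not part of Yaroslav's setup: the function names `gitid_from_log` and `render_revision` and the sample log text are hypothetical, and in practice the log text would come from running `git log -1`.

```python
import re

# Made-up `git log -1` output, so the sketch runs without a repository.
SAMPLE_LOG = """\
commit 0123456789abcdef0123456789abcdef01234567
Author: A. Scientist <a.scientist@example.org>
Date:   Mon Jan 2 03:04:05 2012 -0800

    Fix typo in abstract
"""

# Same content as revision.tex.in
TEMPLATE = "% Embed GIT ID revision and date\n\\def\\revision{GITID}\n"


def gitid_from_log(log_text):
    """Keep the commit and Date lines, drop their first word, join with spaces."""
    kept = [line for line in log_text.splitlines()
            if line.startswith("commit") or line.startswith("Date:")]
    values = [re.sub(r"^[^ ]* *", "", line, count=1) for line in kept]
    return " ".join(values)


def render_revision(template, gitid):
    """Replace every GITID placeholder, like the sed call in the Make rule."""
    return template.replace("GITID", gitid)


gitid = gitid_from_log(SAMPLE_LOG)
print(render_revision(TEMPLATE, gitid))
```

Unlike the `tr` step in the shell pipeline, this version leaves no trailing space after the date.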

#### git export

Git doesn't have a native export command, but this works just fine:

git archive --prefix=fperez.org/  master | gzip > ~/tmp/source.tgz
