I am going to demonstrate how to use the IPython notebook bash_kernel to do reproducible research.

I can run command-line tools in the notebook and take notes along the way. Let's go to the working directory first.

In [1]:
cd playground
ls -alt
total 256
drwxr-xr-x+ 82 Tammy  staff    2788 May  4 22:06 ..
drwxr-xr-x   7 Tammy  staff     238 May  4 21:42 .
-rw-r--r--@  1 Tammy  staff    6148 May  1 22:21 .DS_Store
-rw-r--r--   1 Tammy  staff    4608 May  1 22:00 iris.csv
drwxr-xr-x   3 Tammy  staff     102 May  1 09:40 play
-rw-r--r--@  1 Tammy  staff  114348 Mar 29 22:25 pybamview_example_data.tar.gz
drwxr-xr-x@  7 Tammy  staff     238 Jul 11  2014 examples

We are going to work with the famous iris dataset (iris.csv), which comes with R. First, look at the first several lines of the data.

In [2]:
head -5  iris.csv
sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa

For a better view of the data, use the csvlook command from csvkit. csvkit uses a comma as the default delimiter; if you have a tab-delimited file, use the -t flag. There are many other useful commands in csvkit; check the link above.

In [2]:
cat iris.csv | head | csvlook
|---------------+-------------+--------------+-------------+--------------|
|  sepal_length | sepal_width | petal_length | petal_width | species      |
|---------------+-------------+--------------+-------------+--------------|
|  5.1          | 3.5         | 1.4          | 0.2         | Iris-setosa  |
|  4.9          | 3.0         | 1.4          | 0.2         | Iris-setosa  |
|  4.7          | 3.2         | 1.3          | 0.2         | Iris-setosa  |
|  4.6          | 3.1         | 1.5          | 0.2         | Iris-setosa  |
|  5.0          | 3.6         | 1.4          | 0.2         | Iris-setosa  |
|  5.4          | 3.9         | 1.7          | 0.4         | Iris-setosa  |
|  4.6          | 3.4         | 1.4          | 0.3         | Iris-setosa  |
|  5.0          | 3.4         | 1.5          | 0.2         | Iris-setosa  |
|  4.4          | 2.9         | 1.4          | 0.2         | Iris-setosa  |
|---------------+-------------+--------------+-------------+--------------|
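For the tab-delimited case mentioned above, here is a minimal sketch. It builds a small tab-separated sample from the first rows shown earlier and passes it to csvlook with -t. The variable names (sample, tsv) are my own, and the guard around csvlook is only there so the snippet still runs on a machine without csvkit installed.

```shell
# First rows of iris.csv, copied from the head output above.
sample='sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa'

# Turn the commas into tabs to simulate a tab-delimited file.
tsv=$(printf '%s\n' "$sample" | tr ',' '\t')

# csvlook -t tells csvkit that the input is tab-delimited.
# Fall back to printing the raw TSV if csvkit is not installed.
if command -v csvlook >/dev/null 2>&1; then
  printf '%s\n' "$tsv" | csvlook -t
else
  printf '%s\n' "$tsv"
fi
```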

It is a comma-separated values file. We are going to look at some statistics using datamash. It is a very interesting GNU project, and I like it very much. It is very powerful and, together with awk and sed, enables me to do some very useful things. The link has examples that work with a gene annotation file.

Let's look at the average sepal_length for each species. We could do this easily in R with dplyr, but I am going to use the command line.

In [3]:
cat iris.csv | datamash -t "," -H -s -g 5 mean 1
GroupBy(species),mean(sepal_length)
Iris-setosa,5.006
Iris-versicolor,5.936
Iris-virginica,6.588

The -H flag means there is a header line in iris.csv, -s sorts the input before grouping, and -g 5 groups the data by the fifth column (species). mean 1 then calculates the mean of the first column (sepal_length) within each group.
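To make the group-then-average logic concrete, here is a rough equivalent in plain awk. This is only a sketch of the same idea, not how datamash works internally. The sample rows are made up for illustration and are not taken from iris.csv; the variable names are my own.

```shell
# Tiny made-up sample in the same column layout as iris.csv (illustrative values only).
sample='sepal_length,sepal_width,petal_length,petal_width,species
5.0,3.5,1.4,0.2,Iris-setosa
6.0,3.0,4.5,1.5,Iris-versicolor
5.2,3.4,1.5,0.2,Iris-setosa
6.4,2.9,4.3,1.3,Iris-versicolor'

# Skip the header (NR > 1), accumulate column 1 per value of column 5,
# then print one mean per group. awk does not guarantee the order of
# "for (s in sum)", so the result is sorted afterwards.
group_means=$(printf '%s\n' "$sample" |
  awk -F',' 'NR > 1 { sum[$5] += $1; n[$5]++ }
             END { for (s in sum) printf "%s,%.3f\n", s, sum[s] / n[s] }' |
  sort)

printf '%s\n' "$group_means"
```

Feeding the pipeline with `cat iris.csv` instead of the inline sample should give the same per-species means as the datamash command, without the header line.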

Another very useful tool that I came across is q, which can execute SQL statements on plain text files. By default, q assumes the file is space-delimited. Use -d "," for comma-delimited files and -t for tab-delimited files.

In [4]:
cat iris.csv | q -H -d "," "SELECT AVG(sepal_length), species from - Group BY species"
5.006,Iris-setosa
5.936,Iris-versicolor
6.588,Iris-virginica

We get the same result as with datamash.
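The two tools print their columns in opposite order, so a quick way to confirm that they agree is to swap the q columns and compare. The sketch below does this with the outputs pasted from the two cells above (minus the datamash header line); the variable names are my own.

```shell
# Output of the datamash cell, without its header line.
datamash_out='Iris-setosa,5.006
Iris-versicolor,5.936
Iris-virginica,6.588'

# Output of the q cell.
q_out='5.006,Iris-setosa
5.936,Iris-versicolor
6.588,Iris-virginica'

# Swap the two comma-separated columns so q's layout matches datamash's.
q_swapped=$(printf '%s\n' "$q_out" | awk -F',' -v OFS=',' '{ print $2, $1 }')

if [ "$q_swapped" = "$datamash_out" ]; then
  verdict="outputs match"
else
  verdict="outputs differ"
fi
echo "$verdict"
```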

The IPython bash_kernel can also display figures inline.

I am going to use Rio to interact with R on the command line and show the figure using the display command, following the link here: IBash Notebook

In [3]:
cat iris.csv | Rio -ge "g+geom_point(aes(x=sepal_length,y=sepal_width,colour=species))"| display

We get this figure inline, which I think is awesome!

The IBash kernel still has a number of limitations.

  1. If a command fails, the error persists and you cannot proceed. I have to restart the kernel to continue working in the same notebook.
  2. It cannot display real-time output, so interactive commands such as less will not work. Others can be found in the post here.

Nevertheless, IBash Notebook gives you a way to document your Linux commands as you run them and makes your research reproducible to some extent!