This IPython notebook demonstrates various features in the context of a somewhat contrived cummeRbund analysis. The Cuffdiff data it is based on was generated inside Galaxy, using the data and steps outlined in the great iPlant RNA-seq tutorial.
Go from a Galaxy history to an embedded IPython console in just a couple of seconds. The environment is secured and dynamically proxied with your Galaxy session information.
from IPython.display import Image
Image('https://raw.githubusercontent.com/jmchilton/ipython_notebooks/master/images/launch_ipython_cropped.png')
# Load IPython magic for R integration.
%load_ext rpy2.ipython
%R library(cummeRbund)
The rpy2.ipython extension is already loaded. To reload it, use: %reload_ext rpy2.ipython
array(['mgcv', 'nlme', 'cummeRbund', 'Gviz', 'grid', 'rtracklayer', 'GenomicRanges', 'GenomeInfoDb', 'IRanges', 'S4Vectors', 'stats4', 'fastcluster', 'reshape2', 'ggplot2', 'RSQLite', 'DBI', 'BiocGenerics', 'parallel', 'tools', 'stats', 'graphics', 'grDevices', 'utils', 'datasets', 'methods', 'base'], dtype='|S13')
get(72, True) # Download Galaxy history item number 72 as a file named '72' in the current directory
'/import/72'
%R cuff <- readCufflinks(dbFile='72') # Load the downloaded history item (Cuffdiff output for cummeRbund)
<RS4 - Python:0x26781560 / R:0x1dfb8f50>
%R print(fpkmSCVPlot(genes(cuff))) # Demonstrate plotting with R
Scale for 'x' is already present. Adding another scale for 'x', which will replace the existing scale. geom_smooth: method="auto" and size of largest group is >=1000, so using gam with formula: y ~ s(x, bs = "cs"). Use 'method = x' to change the smoothing method.
# Find all differentially expressed genes at a given alpha
%R sig <- getSig(cuff, alpha=0.01, level='genes')
%R sigGenes <- getGenes(cuff,sig)
%R print(length(sig))
Getting gene information: FPKM Differential Expression Data Annotation Data Replicate FPKMs Counts Getting isoforms information: FPKM Differential Expression Data Annotation Data Replicate FPKMs Counts Getting CDS information: FPKM Differential Expression Data Annotation Data Replicate FPKMs Counts Getting TSS information: FPKM Differential Expression Data Annotation Data Replicate FPKMs Counts Getting promoter information: distData Getting splicing information: distData Getting relCDS information: distData
[1] 783
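Conceptually, selecting significant genes at a given alpha means keeping the features whose multiple-testing-adjusted p-value (q-value) falls below the threshold. The sketch below illustrates that idea in plain Python on made-up values. The function `select_significant`, the strict `<` comparison, and the sample data are all assumptions for illustration; this is not cummeRbund's `getSig` implementation.

```python
def select_significant(q_values, alpha=0.01):
    """Return IDs whose q-value is strictly below alpha (assumed convention)."""
    # Sort by ID so the result order is deterministic.
    return [gene for gene, q in sorted(q_values.items()) if q < alpha]


# Hypothetical q-values; real ones would come from the Cuffdiff output.
example_q = {"geneA": 0.001, "geneB": 0.2, "geneC": 0.009, "geneD": 0.01}
significant = select_significant(example_q, alpha=0.01)
print(significant)
```

Lowering `alpha` shrinks the list, which is why the count printed above depends on the threshold passed to `getSig`.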
# Print a heatmap here and then save as PNG in Galaxy's history.
%R print(csHeatmap(sigGenes, cluster='both'))
%R png(filename = 'siggene_heatmap.png', width = 900, height = 1000, units = 'px'); \
print(csHeatmap(sigGenes, cluster='both')); \
dev.off()
put('siggene_heatmap.png')
Using tracking_id, sample_name as id variables No id variables; using all as measure variables
Using tracking_id, sample_name as id variables No id variables; using all as measure variables
# Describe gene density.
%R dens <- csDensity(genes(cuff))
%R print(dens)
<ListVector - Python:0x267818c0 / R:0x1efb9560> [ListVector, ListVector, ListVector] <ListVector - Python:0x267818c0 / R:0x1efb9560> [ListVector, ListVector, ListVector] <ListVector - Python:0x267818c0 / R:0x1efb9560> [ListVector, ListVector, ListVector] <ListVector - Python:0x267818c0 / R:0x1efb9560> [ListVector, ListVector, ListVector]
# Now pull that dens data structure out of R and make it available as a numpy structure in Python
%Rpull dens
# Iterate through it in Python and create a file 'gene_fpkm.tsv' and upload it to Galaxy history.
with open('gene_fpkm.tsv', "w") as f:
    for val in zip(dens[0]['gene_id'], dens[0]['fpkm']):
        f.write("\t".join(map(str, val)) + "\n")
put("gene_fpkm.tsv")
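The loop above joins each (gene_id, fpkm) pair with tabs by hand. A minimal sketch of the same step using Python's `csv` module follows. The function name `write_gene_fpkm` and the sample values are hypothetical stand-ins for the columns pulled from `dens`, and the upload via `put` is left out because it is only available inside the Galaxy environment.

```python
import csv
import os
import tempfile


def write_gene_fpkm(path, gene_ids, fpkms):
    """Write (gene_id, fpkm) pairs to a tab-separated file; return the row count."""
    rows = 0
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for gene_id, fpkm in zip(gene_ids, fpkms):
            writer.writerow([gene_id, fpkm])
            rows += 1
    return rows


# Hypothetical values standing in for dens[0]['gene_id'] and dens[0]['fpkm'].
example_ids = ["XLOC_000001", "XLOC_000002"]
example_fpkm = [12.5, 0.0]
example_path = os.path.join(tempfile.mkdtemp(), "gene_fpkm.tsv")
written = write_gene_fpkm(example_path, example_ids, example_fpkm)
```

Using `csv.writer` also takes care of quoting if an identifier ever contains a tab or a quote character, which the manual join does not.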
Save the notebook back to your Galaxy history and restore the IPython analysis with a click. The history also lets you view an HTML version of the analysis using IPython's nbviewer application.
from IPython.display import Image
Image('https://raw.githubusercontent.com/jmchilton/ipython_notebooks/master/images/save_notebook_cropped.png')