IPython¶

IPython (Interactive Python) is an enhanced Python shell which provides a more robust and productive development environment for users. There are several key features that set it apart from the standard Python shell.

History¶

In IPython, all your inputs and outputs are saved. Two variables, In (a list of input strings) and Out (a dictionary of results keyed by prompt number), are updated as you work. Each output is also saved automatically to a variable of the form _N, where N is the prompt number, and each input to _iN. This lets you quickly recover the result of a prior computation by referring to its number, even if you forgot to store it in a variable.

In [1]:

import numpy as np
np.sin(4)**2

Out[1]:

0.5727500169043067

In [2]:

_1

Out[2]:

0.5727500169043067

In [3]:

_i1

Out[3]:

'import numpy as np\nnp.sin(4)**2'

In [4]:

_1 / 4.

Out[4]:

0.14318750422607668
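
The bookkeeping behind In, Out, _N and _iN can be pictured with a toy model. The sketch below is not IPython's implementation; the ToyHistory class and its run method are hypothetical names, used only to show how each prompt number maps to a stored input and output.

```python
class ToyHistory:
    """Toy model of IPython-style history (hypothetical, for illustration only)."""

    def __init__(self):
        self.In = [""]   # input strings; index 0 is a placeholder so prompts start at 1
        self.Out = {}    # prompt number -> result, stored only when there is a result
        self.namespace = {"In": self.In, "Out": self.Out}

    def run(self, source):
        """Evaluate one expression and record it under the next prompt number."""
        n = len(self.In)
        self.In.append(source)
        self.namespace[f"_i{n}"] = source
        result = eval(source, self.namespace)
        if result is not None:
            self.Out[n] = result
            self.namespace[f"_{n}"] = result
        return result


history = ToyHistory()
history.run("2 ** 10")   # prompt 1
history.run("_1 / 4")    # prompt 2 reuses the cached output of prompt 1
print(history.Out)       # {1: 1024, 2: 256.0}
print(history.In[1])     # 2 ** 10
```

Because Out is keyed by prompt number, expressions that return None leave no entry, while In keeps every input.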

Output is asynchronous¶

All output is displayed asynchronously as it is generated in the Kernel. If you execute the next cell, you will see the output one piece at a time, not all at the end.

In [5]:

import time
for i in range(8):
    print(i)
    time.sleep(0.5)
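
When a loop like this runs as a plain script with its output redirected to a file or pipe, Python may buffer the text and release it in one batch. Flushing after each line keeps the piece-by-piece behaviour. This is a minimal sketch; count_slowly is a hypothetical helper, and the in-memory buffer is used only so the example runs instantly.

```python
import io
import sys
import time


def count_slowly(n, delay=0.5, stream=None):
    """Print 0..n-1, flushing after each line so it appears immediately."""
    stream = sys.stdout if stream is None else stream
    for i in range(n):
        print(i, file=stream, flush=True)
        time.sleep(delay)


# Write into an in-memory buffer with no delay to keep the demonstration fast.
buffer = io.StringIO()
count_slowly(3, delay=0.0, stream=buffer)
print(repr(buffer.getvalue()))
```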

Introspection¶

If you want details regarding the properties and functionality of any Python objects currently loaded into IPython, you can use the ? to reveal any details that are available:

In [6]:

some_dict = {}
some_dict?

Type:        dict
String form: {}
Length:      0
Docstring:  
dict() -> new empty dictionary
dict(mapping) -> new dictionary initialized from a mapping object's
    (key, value) pairs
dict(iterable) -> new dictionary initialized as if via:
    d = {}
    for k, v in iterable:
        d[k] = v
dict(**kwargs) -> new dictionary initialized with the name=value pairs
    in the keyword argument list.  For example:  dict(one=1, two=2)

If available, additional detail is provided with two question marks, including the source code of the object itself.

In [7]:

from numpy.linalg import cholesky
cholesky??

Signature: cholesky(a)
Source:   
@array_function_dispatch(_unary_dispatcher)
def cholesky(a):
    """
    Cholesky decomposition.

    Return the Cholesky decomposition, `L * L.H`, of the square matrix `a`,
    where `L` is lower-triangular and .H is the conjugate transpose operator
    (which is the ordinary transpose if `a` is real-valued).  `a` must be
    Hermitian (symmetric if real-valued) and positive-definite. No
    checking is performed to verify whether `a` is Hermitian or not.
    In addition, only the lower-triangular and diagonal elements of `a`
    are used. Only `L` is actually returned.

    Parameters
    ----------
    a : (..., M, M) array_like
        Hermitian (symmetric if all elements are real), positive-definite
        input matrix.

    Returns
    -------
    L : (..., M, M) array_like
        Upper or lower-triangular Cholesky factor of `a`.  Returns a
        matrix object if `a` is a matrix object.

    Raises
    ------
    LinAlgError
       If the decomposition fails, for example, if `a` is not
       positive-definite.

    See Also
    --------
    scipy.linalg.cholesky : Similar function in SciPy.
    scipy.linalg.cholesky_banded : Cholesky decompose a banded Hermitian
                                   positive-definite matrix.
    scipy.linalg.cho_factor : Cholesky decomposition of a matrix, to use in
                              `scipy.linalg.cho_solve`.

    Notes
    -----

    .. versionadded:: 1.8.0

    Broadcasting rules apply, see the `numpy.linalg` documentation for
    details.

    The Cholesky decomposition is often used as a fast way of solving

    .. math:: A \\mathbf{x} = \\mathbf{b}

    (when `A` is both Hermitian/symmetric and positive-definite).

    First, we solve for :math:`\\mathbf{y}` in

    .. math:: L \\mathbf{y} = \\mathbf{b},

    and then for :math:`\\mathbf{x}` in

    .. math:: L.H \\mathbf{x} = \\mathbf{y}.

    Examples
    --------
    >>> A = np.array([[1,-2j],[2j,5]])
    >>> A
    array([[ 1.+0.j, -0.-2.j],
           [ 0.+2.j,  5.+0.j]])
    >>> L = np.linalg.cholesky(A)
    >>> L
    array([[1.+0.j, 0.+0.j],
           [0.+2.j, 1.+0.j]])
    >>> np.dot(L, L.T.conj()) # verify that L * L.H = A
    array([[1.+0.j, 0.-2.j],
           [0.+2.j, 5.+0.j]])
    >>> A = [[1,-2j],[2j,5]] # what happens if A is only array_like?
    >>> np.linalg.cholesky(A) # an ndarray object is returned
    array([[1.+0.j, 0.+0.j],
           [0.+2.j, 1.+0.j]])
    >>> # But a matrix object is returned if A is a matrix object
    >>> np.linalg.cholesky(np.matrix(A))
    matrix([[ 1.+0.j,  0.+0.j],
            [ 0.+2.j,  1.+0.j]])

    """
    extobj = get_linalg_error_extobj(_raise_linalgerror_nonposdef)
    gufunc = _umath_linalg.cholesky_lo
    a, wrap = _makearray(a)
    _assert_stacked_2d(a)
    _assert_stacked_square(a)
    t, result_t = _commonType(a)
    signature = 'D->D' if isComplexType(t) else 'd->d'
    r = gufunc(a, signature=signature, extobj=extobj)
    return wrap(r.astype(result_t, copy=False))
File:      ~/miniforge3/envs/bios8366/lib/python3.9/site-packages/numpy/linalg/linalg.py
Type:      function
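
Outside an IPython session, much of what ? and ?? report can be approximated with the standard library's inspect module. The sketch below is an illustration, not how IPython implements introspection; describe is a hypothetical helper, and json.dumps merely stands in for any pure-Python function whose source file is available.

```python
import inspect
import json


def describe(obj):
    """Collect roughly the fields that `obj?` displays."""
    try:
        signature = str(inspect.signature(obj))
    except (TypeError, ValueError):
        signature = None  # not callable, or no signature is available
    return {
        "type": type(obj).__name__,
        "signature": signature,
        "docstring": inspect.getdoc(obj),
    }


info = describe(json.dumps)
print(info["type"], info["signature"])

# Roughly what `obj??` adds: the source code, when a source file can be found.
source = inspect.getsource(json.dumps)
print(source.splitlines()[0])
```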

This syntax can also be used to search namespaces with wildcards (*).

In [8]:

%matplotlib inline
import pylab as plt
plt.*plot*?

plt.Subplot
plt.SubplotSpec
plt.SubplotTool
plt.boxplot
plt.eventplot
plt.get_plot_commands
plt.matplotlib
plt.plot
plt.plot_date
plt.plotting
plt.stackplot
plt.streamplot
plt.subplot
plt.subplot2grid
plt.subplot_mosaic
plt.subplot_tool
plt.subplots
plt.subplots_adjust
plt.triplot
plt.violinplot
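
A similar wildcard search can be approximated in plain Python by filtering dir() with fnmatch. This is a sketch that assumes simple shell-style patterns are enough; search_namespace is a hypothetical helper, not the mechanism IPython uses, and the math module stands in for any namespace.

```python
import fnmatch
import math


def search_namespace(namespace, pattern):
    """Return the attribute names of `namespace` that match a shell-style pattern."""
    return sorted(
        name for name in dir(namespace) if fnmatch.fnmatchcase(name, pattern)
    )


matches = search_namespace(math, "*sin*")
for name in matches:
    print(f"math.{name}")
```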

Tab completion¶

Because IPython can introspect live objects, it can tab-complete commands that have been partially typed. Press the <tab> key at any point while typing a command to see the available completions.

Place your cursor after the partially-completed command below and press tab:

In [ ]:

np.ar

This can even be used to help with specifying arguments to functions, which can sometimes be difficult to remember:

In [ ]:

plt.hist
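
The standard library's rlcompleter module gives a rough picture of how completion candidates can be derived from live objects. This sketch is not IPython's completer, which is considerably richer; the completions helper is hypothetical, and math stands in for any imported module.

```python
import math
import rlcompleter

# The completer looks names up in the namespace it is given.
completer = rlcompleter.Completer({"math": math})


def completions(prefix):
    """Collect every candidate the completer offers for `prefix`."""
    results = []
    state = 0
    while True:
        candidate = completer.complete(prefix, state)
        if candidate is None:
            break
        results.append(candidate)
        state += 1
    return results


found = completions("math.sq")
print(found)
```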

System commands¶

In IPython, you can type ls to see your files or cd to change directories, just like you would at a regular system prompt:

In [9]:

ls ../data

AIS/                          microbiome_missing.csv
baseball-archive-2011.sqlite  mushroom.csv
baseball.csv                  nashville_precip.txt
baseball.dat*                 occupancy.csv
besx97e.dta                   pima-indians-diabetes.data.txt
bikeshare.csv                 pima-indians-diabetes.metadata.txt
bodyfat.dat*                  pitches.csv
brasil_capitals.txt           pitches.md
cancer.csv                    prostate.data.txt
cdystonia.csv                 radon.csv
concrete.csv                  salmon.txt
credit.csv                    srrs2.dat*
cty.dat*                      survey.db
ebola/                        test_scores.csv
heart_rate.csv                titanic.html
heart_rate.txt                titanic.xls
measles.csv                   TNNASHVI.txt
measles.xlsx                  vlbw.csv
melanoma_data.py              walker.txt
microbiome/                   wine.dat*
microbiome.csv                wisconsin_breast_cancer.csv

Virtually any system command can be accessed by prepending !, which passes any subsequent command directly to the OS.

In [16]:

!touch test.txt

You can even use Python variables in commands sent to the OS:

In [17]:

file_type = 'csv'
!ls ../data/*$file_type

../data/baseball.csv	../data/microbiome_missing.csv
../data/bikeshare.csv	../data/mushroom.csv
../data/cancer.csv	../data/occupancy.csv
../data/cdystonia.csv	../data/pitches.csv
../data/concrete.csv	../data/radon.csv
../data/credit.csv	../data/test_scores.csv
../data/heart_rate.csv	../data/vlbw.csv
../data/measles.csv	../data/wisconsin_breast_cancer.csv
../data/microbiome.csv

The output of a system command using the exclamation point syntax can be assigned to a Python variable.

In [18]:

data_files = !ls ../data/microbiome/

In [19]:

data_files

Out[19]:

['metadata.xls',
 'MID1.xls',
 'MID2.xls',
 'MID3.xls',
 'MID4.xls',
 'MID5.xls',
 'MID6.xls',
 'MID7.xls',
 'MID8.xls',
 'MID9.xls']
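
In a plain Python script, where the ! syntax is unavailable, a comparable result can be obtained with subprocess. This is a minimal sketch; run_and_capture is a hypothetical helper, and the child Python process merely stands in for a command such as ls so that the example runs on any platform.

```python
import subprocess
import sys


def run_and_capture(command):
    """Run a command and return its standard output as a list of lines."""
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout.splitlines()


# A child Python process that prints two file names, standing in for `ls`.
lines = run_and_capture(
    [sys.executable, "-c", "print('MID1.xls'); print('MID2.xls')"]
)
print(lines)  # ['MID1.xls', 'MID2.xls']
```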

Qt Console¶

If you type at the system prompt:

$ jupyter qtconsole

instead of opening in a terminal, IPython will start a graphical console that at first sight appears just like a terminal, but which is in fact much more capable than a text-only terminal. This specialized terminal is designed for interactive scientific work: it supports full multi-line editing with color highlighting and graphical calltips for functions, it can keep multiple IPython sessions open simultaneously in tabs, and when scripts run it can display figures inline, directly in the work area. (Older IPython releases launched it with ipython qtconsole.)

[Figure: qtconsole]

Jupyter Notebook¶

Over time, the IPython project grew to include several components, including:

an interactive shell
a REPL protocol
a notebook document format
a notebook document conversion tool
a web-based notebook authoring tool
tools for building interactive UI (widgets)
interactive parallel Python

As the components evolved, several grew to the point that they warranted projects of their own. For example, pieces like the notebook and protocol are not even specific to Python. As a result, the IPython team created Project Jupyter, which is the home of language-agnostic projects that began as part of IPython, such as the notebook in which you are reading this text.

The HTML notebook that is part of the Jupyter project supports interactive data visualization and easy high-performance parallel computing.

In [20]:

import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

def f(x):
    return (x-3)*(x-5)*(x-7)+85

import numpy as np
x = np.linspace(0, 10, 200)
y = f(x)
plt.plot(x,y)

Out[20]:

[<matplotlib.lines.Line2D at 0x7f97e081ffd0>]

The notebook lets you document your workflow using either HTML or Markdown, providing a complete and self-contained record of a computation that can be exported to various formats and shared.

The Jupyter Notebook consists of three interacting components:

A notebook web application: An interactive web application for writing and running code interactively and authoring notebook documents.
Kernels: Separate processes started by the notebook web application that run notebook code and return output to the web application, and that also handle secondary features like interactive widgets, tab completion and introspection.
Notebook documents: JSON documents that contain a representation of all content visible in the notebook web application, including inputs and outputs of the computations, narrative text, equations, images, and rich media representations of objects. They are stored on your filesystem with an .ipynb extension.
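
Because notebook documents are plain JSON, they can be inspected with nothing more than the json module. The dictionary below is a hand-written, minimal example in the style of the version 4 notebook format; it is an assumption for illustration and has not been validated against the official schema.

```python
import json

# A tiny hand-written notebook: one Markdown cell and one executed code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 1,
            "source": ["1 + 1"],
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 1,
                    "metadata": {},
                    "data": {"text/plain": ["2"]},
                }
            ],
        },
    ],
}

# Round-trip through JSON text, as would happen when saving an .ipynb file.
text = json.dumps(notebook, indent=1)
loaded = json.loads(text)

cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)  # ['markdown', 'code']
```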

The Notebook can be used by starting the Notebook server with the command:

$ jupyter notebook

This opens a Jupyter notebook dashboard that acts as a home page for your Jupyter instance. It displays the notebooks and other files in your current directory.

The notebook web application provides a rich computing environment for data science work. For example, you can embed images, videos, or entire websites into notebooks:

In [21]:

from IPython.display import IFrame
IFrame("http://fonnesbeck.github.io/Bios8366", width=700, height=350)

Out[21]:

In [22]:

from IPython.display import YouTubeVideo
YouTubeVideo("GExKsQ-OU78")

Out[22]:

Remote Code¶

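Use the %load magic to pull code from a URL or a local file into a cell. After it runs, the %load line is commented out and the fetched code appears beneath it, ready to execute: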

In [24]:

# %load http://matplotlib.org/mpl_examples/shapes_and_collections/scatter_demo.py
"""
Simple demo of a scatter plot.
"""
import numpy as np
import matplotlib.pyplot as plt


N = 50
x = np.random.rand(N)
y = np.random.rand(N)
colors = np.random.rand(N)
area = np.pi * (15 * np.random.rand(N))**2  # 0 to 15 point radii

plt.scatter(x, y, s=area, c=colors, alpha=0.5)
plt.show()
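
Conceptually, the first step of %load is simply fetching text from a location. The sketch below shows that step with the standard library only; fetch_source is a hypothetical helper, and a temporary local file with a file:// URL stands in for a remote address so the example needs no network. Fetched code should always be read before it is run.

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen


def fetch_source(url, encoding="utf-8"):
    """Download the text at `url` and return it as a string."""
    with urlopen(url) as response:
        return response.read().decode(encoding)


# Create a throwaway script and fetch it back through a file:// URL.
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "demo.py"
    script.write_text("print('hello')\n", encoding="utf-8")
    code = fetch_source(script.as_uri())

print(code)
```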

MathJax Support¶

MathJax is a JavaScript display engine for LaTeX math, such as $\alpha$, that allows equations to be embedded into HTML. For example, this markup:

"""$$ \int_{a}^{b} f(x)\, dx \approx \frac{1}{2} \sum_{k=1}^{N} \left( x_{k} - x_{k-1} \right) \left( f(x_{k}) + f(x_{k-1}) \right). $$"""

becomes this:

$$ \int_{a}^{b} f(x)\, dx \approx \frac{1}{2} \sum_{k=1}^{N} \left( x_{k} - x_{k-1} \right) \left( f(x_{k}) + f(x_{k-1}) \right). $$
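
The equation rendered here is the composite trapezoidal rule. As a quick illustration of what the formula computes, here is a minimal pure-Python sketch; the function name trapezoid and the evenly spaced grid are assumptions made for the example.

```python
def trapezoid(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] using n trapezoids."""
    # Grid points x_0 .. x_n, evenly spaced between a and b.
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    # One half times the sum over k of (x_k - x_{k-1}) * (f(x_k) + f(x_{k-1})).
    return 0.5 * sum(
        (xs[k] - xs[k - 1]) * (f(xs[k]) + f(xs[k - 1])) for k in range(1, n + 1)
    )


# The exact integral of x**2 over [0, 1] is 1/3.
approx = trapezoid(lambda x: x * x, 0.0, 1.0)
print(approx)
```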

SymPy Support¶

SymPy is a Python library for symbolic mathematics. It supports:

polynomials
calculus
solving equations
discrete math
matrices

In [29]:

# Never import like this!
from sympy import *
import warnings
warnings.filterwarnings('ignore')

init_printing()
x, y = symbols("x y")

In [28]:

eq = ((x+y)**2 * (x+1))
eq

Out[28]:

$\displaystyle \left(x + 1\right) \left(x + y\right)^{2}$

In [30]:

expand(eq)

Out[30]:

$\displaystyle x^{3} + 2 x^{2} y + x^{2} + x y^{2} + 2 x y + y^{2}$

In [31]:

(1/cos(x)).series(x, 0, 6)

Out[31]:

$\displaystyle 1 + \frac{x^{2}}{2} + \frac{5 x^{4}}{24} + O\left(x^{6}\right)$

In [32]:

limit((sin(x)-x)/x**3, x, 0)

Out[32]:

$\displaystyle - \frac{1}{6}$

In [33]:

diff(cos(x**2)**2 / (1+x), x)

Out[33]:

$\displaystyle - \frac{4 x \sin{\left(x^{2} \right)} \cos{\left(x^{2} \right)}}{x + 1} - \frac{\cos^{2}{\left(x^{2} \right)}}{\left(x + 1\right)^{2}}$
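
Symbolic results like these are easy to sanity-check numerically. The sketch below uses only the standard library's math module (no SymPy) to compare the series and the limit shown above against direct floating-point evaluation; the sample points 0.1 and 0.001 are arbitrary choices.

```python
import math

# Series check: 1/cos(x) should be close to 1 + x**2/2 + 5*x**4/24 for small x.
x = 0.1
exact = 1.0 / math.cos(x)
series = 1.0 + x**2 / 2.0 + 5.0 * x**4 / 24.0
series_error = abs(exact - series)
print(f"series error at x={x}: {series_error:.2e}")

# Limit check: (sin(x) - x) / x**3 should approach -1/6 as x goes to 0.
h = 1e-3
ratio = (math.sin(h) - h) / h**3
limit_error = abs(ratio - (-1.0 / 6.0))
print(f"limit error at x={h}: {limit_error:.2e}")
```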

Magic functions¶

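IPython has a set of predefined ‘magic functions’ that you can call with a command line style syntax. Line magics are prefixed with % and apply to a single line; cell magics are prefixed with %% and apply to the whole cell. These include: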

%run
%edit
%debug
%timeit
%paste
%load_ext

In [34]:

%lsmagic

Out[34]:

Available line magics:
%alias  %alias_magic  %autoawait  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %conda  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %pip  %popd  %pprint  %precision  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%markdown  %%perl  %%prun  %%pypy  %%python  %%python2  %%python3  %%ruby  %%script  %%sh  %%svg  %%sx  %%system  %%time  %%timeit  %%writefile

Automagic is ON, % prefix IS NOT needed for line magics.

The timeit magic times the execution of code. It exists in both line and cell form; in the cell form, the code on the %%timeit line itself is run as setup and is not timed:

In [35]:

%timeit np.linalg.eigvals(np.random.rand(100,100))

3.48 ms ± 54.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [36]:

%%timeit a = np.random.rand(100, 100)
np.linalg.eigvals(a)

3.36 ms ± 146 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
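
In a plain Python script, the standard library's timeit module offers comparable measurements. This is a sketch: the sorting statement is an arbitrary stand-in for the code being timed, and the repeat and number values are chosen to mirror the "7 runs, 100 loops each" shown above.

```python
import statistics
import timeit

RUNS = 7
LOOPS = 100

# The setup string runs once per run and is excluded from the timing.
times = timeit.repeat(
    stmt="sorted(data)",
    setup="import random; data = [random.random() for _ in range(1000)]",
    repeat=RUNS,
    number=LOOPS,
)

# timeit reports the total for each run, so divide to get the time per loop.
per_loop = [t / LOOPS for t in times]
mean = statistics.mean(per_loop)
stdev = statistics.stdev(per_loop)
print(f"{mean * 1e6:.1f} us +/- {stdev * 1e6:.1f} us per loop "
      f"(mean +/- std. dev. of {RUNS} runs, {LOOPS} loops each)")
```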

IPython also creates aliases for a few common interpreters, such as bash, ruby, and perl.

Each alias is equivalent to %%script <name>.

In [37]:

%%ruby
puts "Hello from Ruby #{RUBY_VERSION}"

Hello from Ruby 2.7.4

In [38]:

%%bash
echo "hello from $BASH"

hello from /bin/bash
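
Roughly speaking, these cell magics hand the body of the cell to the named program on its standard input. The sketch below imitates that idea with subprocess; it is an approximation rather than IPython's implementation, and the Python interpreter itself stands in for bash or ruby so that the example runs anywhere.

```python
import subprocess
import sys

# The text that would follow the %%script line in a notebook cell.
cell_body = (
    "import platform\n"
    "print('hello from', platform.python_implementation())\n"
)

# "python -" reads the program to execute from standard input.
completed = subprocess.run(
    [sys.executable, "-"],
    input=cell_body,
    capture_output=True,
    text=True,
    check=True,
)
output = completed.stdout
print(output, end="")
```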

The rpy2 package provides an IPython extension (formerly known as rmagic) that contains some magic functions for working with R. This extension can be loaded using the %load_ext magic as follows:

In [39]:

%load_ext rpy2.ipython

If the above generates an error, it is likely that you do not have the rpy2 module installed. You can install this now via:

!pip install rpy2

In [40]:

x,y = np.arange(10), np.random.normal(size=10)
%R print(lm(rnorm(10)~rnorm(10)))

Call:
lm(formula = rnorm(10) ~ rnorm(10))

Coefficients:
(Intercept)  
     0.5598

Out[40]:

ListVector with 11 elements.

coefficients

FloatVector with 1 elements.

0.559818

residuals

FloatVector with 10 elements.

-0.539487

-0.752117

-0.765536

...

0.317830

-0.495214

0.527881

effects

FloatVector with 10 elements.

-1.770300

-0.622504

-0.635922

...

0.447443

-0.365601

0.657495

...

call

Call: lm(formula = rnorm(10) ~ rnorm(10)) Coefficients: (Intercept) 0.003158

terms

rnorm(10) ~ rnorm(10) attr(,"variables") list(rnorm(10)) attr(,"factors") rnorm(10) rnorm(10) 1 attr(,"term.labels") [1] "rnorm(10)" attr(,"order") [1] 1 attr(,"intercept") [1] 1 attr(,"response") [1] 1 attr(,".Environment") attr(,"predvars") list(rnorm(10)) attr(,"dataClasses") rnorm(10) "numeric"

model

R/rpy2 DataFrame (10 x 1)

rnorm(10)
...

In [41]:

%%R -i x,y -o XYcoef
lm.fit <- lm(y~x)
par(mfrow=c(2,2))
print(summary(lm.fit))
plot(lm.fit)
XYcoef <- coef(lm.fit)

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.7938 -0.6169 -0.2493  0.6764  1.1001 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.05839    0.45573  -0.128    0.901
x           -0.03019    0.08537  -0.354    0.733

Residual standard error: 0.7754 on 8 degrees of freedom
Multiple R-squared:  0.0154,	Adjusted R-squared:  -0.1077 
F-statistic: 0.1251 on 1 and 8 DF,  p-value: 0.7327

In [42]:

XYcoef

Out[42]:

array([-0.05838551, -0.03019335])

LaTeX¶

In addition to MathJax support, you may declare a LaTeX cell using the %%latex cell magic:

In [43]:

%%latex
\begin{align}
\nabla \times \vec{\mathbf{B}} -\, \frac1c\, \frac{\partial\vec{\mathbf{E}}}{\partial t} & = \frac{4\pi}{c}\vec{\mathbf{j}} \\
\nabla \cdot \vec{\mathbf{E}} & = 4 \pi \rho \\
\nabla \times \vec{\mathbf{E}}\, +\, \frac1c\, \frac{\partial\vec{\mathbf{B}}}{\partial t} & = \vec{\mathbf{0}} \\
\nabla \cdot \vec{\mathbf{B}} & = 0
\end{align}

\begin{align} \nabla \times \vec{\mathbf{B}} -\, \frac1c\, \frac{\partial\vec{\mathbf{E}}}{\partial t} & = \frac{4\pi}{c}\vec{\mathbf{j}} \\ \nabla \cdot \vec{\mathbf{E}} & = 4 \pi \rho \\ \nabla \times \vec{\mathbf{E}}\, +\, \frac1c\, \frac{\partial\vec{\mathbf{B}}}{\partial t} & = \vec{\mathbf{0}} \\ \nabla \cdot \vec{\mathbf{B}} & = 0 \end{align}

JavaScript¶

Jupyter also enables objects to declare a JavaScript representation. At first, this may seem odd, as output is inherently visual and JavaScript is a programming language. However, this opens the door for rich output that leverages the full power of JavaScript and associated libraries such as D3.

In [44]:

%%javascript

alert("Hello world!");

Exporting and Converting Notebooks¶

In Jupyter, one can convert an .ipynb notebook document file into various static formats via the nbconvert tool. Currently, nbconvert is a command line tool, run as a script using Jupyter.

In [46]:

!jupyter nbconvert --to html Section0_1-IPython_and_Jupyter.ipynb

[NbConvertApp] Converting notebook Section0_1-IPython_and_Jupyter.ipynb to html
[NbConvertApp] Writing 747185 bytes to Section0_1-IPython_and_Jupyter.html

Currently, nbconvert supports HTML (default), LaTeX, Markdown, reStructuredText, Python and HTML5 slides for presentations. Some types can be post-processed, such as LaTeX to PDF (this requires Pandoc and a TeX distribution that provides xelatex to be installed, however).

In [50]:

!jupyter nbconvert --to pdf Section2_1-Introduction-to-Pandas.ipynb

[NbConvertApp] Converting notebook Section2_1-Introduction-to-Pandas.ipynb to pdf
[NbConvertApp] Writing 96264 bytes to notebook.tex
[NbConvertApp] Building PDF
[NbConvertApp] Running xelatex 3 times: ['xelatex', 'notebook.tex', '-quiet']
[NbConvertApp] Running bibtex 1 time: ['bibtex', 'notebook']
[NbConvertApp] WARNING | bibtex had problems, most likely because there were no citations
[NbConvertApp] PDF successfully created
[NbConvertApp] Writing 98609 bytes to Section2_1-Introduction-to-Pandas.pdf
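
The same conversion can be scripted from Python by building the command line and handing it to subprocess. This is a sketch: nbconvert_command is a hypothetical helper, and the conversion is attempted only if the jupyter executable and the notebook file are actually present.

```python
import shutil
import subprocess
from pathlib import Path


def nbconvert_command(notebook, to="html"):
    """Build the argument list for `jupyter nbconvert --to <format> <notebook>`."""
    return ["jupyter", "nbconvert", "--to", to, str(notebook)]


notebook_path = Path("Section0_1-IPython_and_Jupyter.ipynb")
command = nbconvert_command(notebook_path, to="html")
print(" ".join(command))

# Run the conversion only when it can succeed in the current environment.
if shutil.which("jupyter") and notebook_path.exists():
    subprocess.run(command, check=True)
```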

A very useful online service is the Jupyter Notebook Viewer, which displays your notebook as a static HTML page that is easy to share with others:

In [49]:

%%html
<iframe src=https://nbviewer.org/2352771 width=700 height=300></iframe>

Also, GitHub supports the rendering of Jupyter Notebooks stored on its repositories.

Reproducible Research¶

Reproducible research means reproducing conclusions from a single experiment based on the measurements from that experiment.

The most basic form of reproducibility is a complete description of the data and associated analyses (including code!) so the results can be exactly reproduced by others.
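
One small, practical piece of such a description is a machine-readable record of what was run and on which data. The sketch below is only one possible approach; describe_run and the fields it records are hypothetical, and a temporary file stands in for a real data set.

```python
import hashlib
import json
import platform
import sys
import tempfile
from pathlib import Path


def describe_run(data_path, seed):
    """Record the interpreter, platform, random seed and a checksum of the data."""
    digest = hashlib.sha256(Path(data_path).read_bytes()).hexdigest()
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "seed": seed,
        "data_sha256": digest,
    }


# A throwaway CSV file stands in for the real input data.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "example.csv"
    data_file.write_bytes(b"x,y\n1,2\n")
    record = describe_run(data_file, seed=42)

print(json.dumps(record, indent=2))
```

Saving such a record next to the results makes it easier to tell later whether a rerun used the same inputs and environment.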

Reproducing calculations can be onerous, even with one's own work!

Scientific data are becoming larger and more complex, making simple descriptions inadequate for reproducibility. As a result, most modern research is irreproducible without tremendous effort.

***Reproducible research is not yet part of the culture of science in general, or scientific computing in particular.***

Scientific Computing Workflow¶

There are a number of steps to scientific endeavors that involve computing:

[Figure: workflow]

Many of the standard tools impose barriers between one or more of these steps. This can make it difficult to iterate on or reproduce work.

The Jupyter notebook eliminates or reduces these barriers to reproducibility.

Links and References¶

IPython Notebook Viewer: displays static HTML versions of notebooks, and includes a gallery of notebook examples.

NotebookCloud: a service that allows you to launch and control IPython Notebook servers on Amazon EC2 from your browser.

A Reference-Free Algorithm for Computational Normalization of Shotgun Sequencing Data: a landmark example of reproducible research in genomics, with Git repo, IPython notebook, data and scripts.

Jacques Ravel and K Eric Wommack. 2014. All Hail Reproducibility in Microbiome Research. Microbiome, 2:8.

Benjamin Ragan-Kelley et al. 2013. Collaborative cloud-enabled tools allow rapid, reproducible biological insights. The ISME Journal, 7, 461–464. doi:10.1038/ismej.2012.123