IPython (Interactive Python) is an enhanced Python shell which provides a more robust and productive development environment for users. There are several key features that set it apart from the standard Python shell.
In IPython, all your inputs and outputs are saved. Two variables, `In` and `Out`, are filled in as you work. Each output is also saved automatically to a variable of the form `_N`, where `N` is the prompt number, and each input to `_iN`. This lets you quickly recover the result of a prior computation by referring to its number, even if you forgot to store it in a variable.
import numpy as np
np.sin(4)**2
0.5727500169043067
_1
0.5727500169043067
_i1
'import numpy as np\nnp.sin(4)**2'
_1 / 4.
0.14318750422607668
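To make the numbering scheme concrete, here is a toy model of numbered input/output caching in plain Python. It is only a sketch: `SessionHistory` and its `run` method are hypothetical names invented for illustration, and this is not how IPython itself is implemented.

```python
class SessionHistory:
    """Toy model of numbered input/output caching (not IPython's implementation)."""

    def __init__(self):
        self.inputs = [""]   # index 0 is unused so numbering starts at 1
        self.outputs = {}    # prompt number -> result

    def run(self, source, namespace):
        """Evaluate one expression, record it, and return its prompt number."""
        number = len(self.inputs)
        self.inputs.append(source)
        result = eval(source, namespace)
        if result is not None:
            self.outputs[number] = result
            namespace[f"_{number}"] = result   # mimic the _N shortcut
        namespace[f"_i{number}"] = source       # mimic the _iN shortcut
        return number


namespace = {}
history = SessionHistory()
first = history.run("2 ** 10", namespace)
second = history.run(f"_{first} / 4", namespace)
print(history.outputs[second])   # 256.0
```

The second call refers to the first result by its number, just as `_1 / 4.` does in the session above.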
All output is displayed asynchronously as it is generated in the Kernel. If you execute the next cell, you will see the output one piece at a time, not all at the end.
import time, sys
for i in range(8):
    print(i)
    time.sleep(0.5)
0 1 2 3 4 5 6 7
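Outside the notebook, standard output may be buffered, so a script that wants the same piece-by-piece behavior usually has to flush explicitly. The following is a minimal sketch of that idea; `emit_progressively` is a hypothetical helper, and the `delay` parameter simply stands in for real work.

```python
import io
import sys
import time


def emit_progressively(items, delay=0.0, stream=None):
    """Write each item as soon as it is ready instead of all at the end."""
    stream = sys.stdout if stream is None else stream
    for item in items:
        stream.write(f"{item}\n")
        stream.flush()       # push this piece out immediately
        time.sleep(delay)    # stand-in for real work between outputs


buffer = io.StringIO()
emit_progressively(range(3), stream=buffer)
print(buffer.getvalue().split())   # ['0', '1', '2']
```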
If you want details about the properties and functionality of any Python object currently loaded into IPython, append `?` to its name to reveal whatever details are available:
some_dict = {}
some_dict?
Type:        dict
String form: {}
Length:      0
Docstring:
dict() -> new empty dictionary
dict(mapping) -> new dictionary initialized from a mapping object's
    (key, value) pairs
dict(iterable) -> new dictionary initialized as if via:
    d = {}
    for k, v in iterable:
        d[k] = v
dict(**kwargs) -> new dictionary initialized with the name=value pairs
    in the keyword argument list.  For example:  dict(one=1, two=2)
If available, additional detail is provided with two question marks, including the source code of the object itself.
from numpy.linalg import cholesky
cholesky??
Signature: cholesky(a)
Source:
@array_function_dispatch(_unary_dispatcher)
def cholesky(a):
    """
    Cholesky decomposition.

    Return the Cholesky decomposition, `L * L.H`, of the square matrix `a`,
    where `L` is lower-triangular and .H is the conjugate transpose operator
    (which is the ordinary transpose if `a` is real-valued).  `a` must be
    Hermitian (symmetric if real-valued) and positive-definite. No
    checking is performed to verify whether `a` is Hermitian or not.
    In addition, only the lower-triangular and diagonal elements of `a`
    are used. Only `L` is actually returned.

    Parameters
    ----------
    a : (..., M, M) array_like
        Hermitian (symmetric if all elements are real), positive-definite
        input matrix.

    Returns
    -------
    L : (..., M, M) array_like
        Upper or lower-triangular Cholesky factor of `a`.  Returns a
        matrix object if `a` is a matrix object.

    Raises
    ------
    LinAlgError
       If the decomposition fails, for example, if `a` is not
       positive-definite.

    See Also
    --------
    scipy.linalg.cholesky : Similar function in SciPy.
    scipy.linalg.cholesky_banded : Cholesky decompose a banded Hermitian
                                   positive-definite matrix.
    scipy.linalg.cho_factor : Cholesky decomposition of a matrix, to use in
                              `scipy.linalg.cho_solve`.

    Notes
    -----

    .. versionadded:: 1.8.0

    Broadcasting rules apply, see the `numpy.linalg` documentation for
    details.

    The Cholesky decomposition is often used as a fast way of solving

    .. math:: A \\mathbf{x} = \\mathbf{b}

    (when `A` is both Hermitian/symmetric and positive-definite).

    First, we solve for :math:`\\mathbf{y}` in

    .. math:: L \\mathbf{y} = \\mathbf{b},

    and then for :math:`\\mathbf{x}` in

    .. math:: L.H \\mathbf{x} = \\mathbf{y}.

    Examples
    --------
    >>> A = np.array([[1,-2j],[2j,5]])
    >>> A
    array([[ 1.+0.j, -0.-2.j],
           [ 0.+2.j,  5.+0.j]])
    >>> L = np.linalg.cholesky(A)
    >>> L
    array([[1.+0.j, 0.+0.j],
           [0.+2.j, 1.+0.j]])
    >>> np.dot(L, L.T.conj()) # verify that L * L.H = A
    array([[1.+0.j, 0.-2.j],
           [0.+2.j, 5.+0.j]])
    >>> A = [[1,-2j],[2j,5]] # what happens if A is only array_like?
    >>> np.linalg.cholesky(A) # an ndarray object is returned
    array([[1.+0.j, 0.+0.j],
           [0.+2.j, 1.+0.j]])
    >>> # But a matrix object is returned if A is a matrix object
    >>> np.linalg.cholesky(np.matrix(A))
    matrix([[ 1.+0.j,  0.+0.j],
            [ 0.+2.j,  1.+0.j]])

    """
    extobj = get_linalg_error_extobj(_raise_linalgerror_nonposdef)
    gufunc = _umath_linalg.cholesky_lo
    a, wrap = _makearray(a)
    _assert_stacked_2d(a)
    _assert_stacked_square(a)
    t, result_t = _commonType(a)
    signature = 'D->D' if isComplexType(t) else 'd->d'
    r = gufunc(a, signature=signature, extobj=extobj)
    return wrap(r.astype(result_t, copy=False))
File:      ~/miniforge3/envs/bios8366/lib/python3.9/site-packages/numpy/linalg/linalg.py
Type:      function
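The `?` and `??` syntax only works inside IPython, but similar information can be gathered in an ordinary script with the standard-library `inspect` module. The sketch below is one rough approximation; `describe` is a hypothetical helper, and the fields it collects are an assumption about what is most useful, not a reproduction of IPython's output.

```python
import inspect
import json


def describe(obj):
    """Collect roughly the kind of fields that `obj?` displays."""
    info = {"type": type(obj).__name__, "doc": inspect.getdoc(obj)}
    try:
        info["signature"] = str(inspect.signature(obj))
    except (TypeError, ValueError):
        info["signature"] = None   # not callable, or no introspectable signature
    return info


summary = describe(json.dumps)           # comparable to json.dumps?
source = inspect.getsource(json.dumps)   # comparable to json.dumps??
print(summary["type"], summary["signature"])
print(source.splitlines()[0])
```

`inspect.getsource` only works for objects defined in Python source files, which is also why `??` shows source for some objects and not for others.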
This syntax can also be used to search namespaces with wildcards (*).
%matplotlib inline
import pylab as plt
plt.*plot*?
plt.Subplot plt.SubplotSpec plt.SubplotTool plt.boxplot plt.eventplot plt.get_plot_commands plt.matplotlib plt.plot plt.plot_date plt.plotting plt.stackplot plt.streamplot plt.subplot plt.subplot2grid plt.subplot_mosaic plt.subplot_tool plt.subplots plt.subplots_adjust plt.triplot plt.violinplot
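A comparable wildcard search can be approximated in plain Python by matching a pattern against the names that `dir()` reports. This is a sketch using the standard-library `fnmatch` module; `search_names` is a hypothetical helper, and `os.path` is used only so the example does not depend on matplotlib.

```python
import fnmatch
import os.path


def search_names(namespace, pattern):
    """Return the attribute names of `namespace` that match a shell-style pattern."""
    return sorted(
        name for name in dir(namespace) if fnmatch.fnmatchcase(name, pattern)
    )


print(search_names(os.path, "*split*"))
```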
Because IPython supports introspection, it can tab-complete commands that have been partially typed. Press the <tab> key at any point while typing a command.
Place your cursor after the partially-completed command below and press tab:
np.ar
This can even be used to help with specifying arguments to functions, which can sometimes be difficult to remember:
plt.hist
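The idea behind completion, looking up which names in a namespace start with what has been typed so far, can be tried outside IPython with the standard-library `rlcompleter` module. This is a sketch of the general technique only; IPython uses its own, more capable completer, and `completions` is a hypothetical helper.

```python
import math
import rlcompleter


def completions(prefix, namespace):
    """List every completion the stdlib completer offers for `prefix`."""
    completer = rlcompleter.Completer(namespace)
    matches = []
    state = 0
    while True:
        match = completer.complete(prefix, state)
        if match is None:   # no more candidates
            break
        matches.append(match)
        state += 1
    return matches


print(completions("math.sq", {"math": math}))
```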
In IPython, you can type `ls` to see your files or `cd` to change directories, just as you would at a regular system prompt:
ls ../data
AIS/ microbiome_missing.csv baseball-archive-2011.sqlite mushroom.csv baseball.csv nashville_precip.txt baseball.dat* occupancy.csv besx97e.dta pima-indians-diabetes.data.txt bikeshare.csv pima-indians-diabetes.metadata.txt bodyfat.dat* pitches.csv brasil_capitals.txt pitches.md cancer.csv prostate.data.txt cdystonia.csv radon.csv concrete.csv salmon.txt credit.csv srrs2.dat* cty.dat* survey.db ebola/ test_scores.csv heart_rate.csv titanic.html heart_rate.txt titanic.xls measles.csv TNNASHVI.txt measles.xlsx vlbw.csv melanoma_data.py walker.txt microbiome/ wine.dat* microbiome.csv wisconsin_breast_cancer.csv
Virtually any system command can be accessed by prepending `!`, which passes the rest of the line directly to the OS.
!touch test.txt
You can even use Python variables in commands sent to the OS:
file_type = 'csv'
!ls ../data/*$file_type
../data/baseball.csv ../data/microbiome_missing.csv ../data/bikeshare.csv ../data/mushroom.csv ../data/cancer.csv ../data/occupancy.csv ../data/cdystonia.csv ../data/pitches.csv ../data/concrete.csv ../data/radon.csv ../data/credit.csv ../data/test_scores.csv ../data/heart_rate.csv ../data/vlbw.csv ../data/measles.csv ../data/wisconsin_breast_cancer.csv ../data/microbiome.csv
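In a script that cannot rely on the `!` syntax, the same listing can be produced with the standard-library `glob` module. This is a minimal sketch; `files_of_type` is a hypothetical helper, and a temporary directory stands in for `../data` so the example is self-contained.

```python
import glob
import os
import tempfile


def files_of_type(directory, file_type):
    """Return the names of files in `directory` whose names end with `file_type`."""
    pattern = os.path.join(directory, f"*{file_type}")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("a.csv", "b.csv", "notes.txt"):
        open(os.path.join(demo_dir, name), "w").close()
    found = files_of_type(demo_dir, "csv")

print(found)   # ['a.csv', 'b.csv']
```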
The output of a system command using the exclamation point syntax can be assigned to a Python variable.
data_files = !ls ../data/microbiome/
data_files
['metadata.xls', 'MID1.xls', 'MID2.xls', 'MID3.xls', 'MID4.xls', 'MID5.xls', 'MID6.xls', 'MID7.xls', 'MID8.xls', 'MID9.xls']
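Capturing command output as a list of lines can also be done in plain Python with `subprocess`. This is a sketch rather than what IPython does internally; `capture_lines` is a hypothetical helper, and the child process is the Python interpreter itself so the example runs on any platform.

```python
import subprocess
import sys


def capture_lines(command):
    """Run a command and return its standard output as a list of lines."""
    completed = subprocess.run(
        command, capture_output=True, text=True, check=True
    )
    return completed.stdout.splitlines()


lines = capture_lines(
    [sys.executable, "-c", "print('MID1.xls'); print('MID2.xls')"]
)
print(lines)   # ['MID1.xls', 'MID2.xls']
```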
If you type at the system prompt:
$ ipython qtconsole
instead of opening in a terminal, IPython will start a graphical console that at first sight appears just like a terminal but is in fact much more capable than a text-only one. It is a specialized terminal designed for interactive scientific work: it supports full multi-line editing with color highlighting and graphical calltips for functions, it can keep multiple IPython sessions open simultaneously in tabs, and when scripts run it can display figures inline directly in the work area.
Over time, the IPython project grew to include several components, including:
As each component evolved, several grew to the point that they warranted projects of their own. For example, pieces like the notebook and protocol are not even specific to Python. As a result, the IPython team created Project Jupyter, which is the new home of language-agnostic projects that began as part of IPython, such as the notebook in which you are reading this text.
The HTML notebook that is part of the Jupyter project supports interactive data visualization and easy high-performance parallel computing.
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
def f(x):
    return (x-3)*(x-5)*(x-7)+85
import numpy as np
x = np.linspace(0, 10, 200)
y = f(x)
plt.plot(x,y)
[<matplotlib.lines.Line2D at 0x7f97e081ffd0>]
The notebook lets you document your workflow using either HTML or Markdown, providing a complete and self-contained record of a computation that can be exported to various formats and shared.
The Jupyter Notebook consists of three interacting components:
`.ipynb` extension.
The Notebook can be used by starting the Notebook server with the command:
$ jupyter notebook
This opens a Jupyter notebook dashboard that acts as a home page for your Jupyter instance. It displays the notebooks and other files in your current directory.
The notebook web application provides a rich computing environment for data science work. For example, you can embed images, videos, or entire websites into notebooks:
from IPython.display import HTML
HTML("<iframe src=http://fonnesbeck.github.io/Bios8366 width=700 height=350></iframe>")
/home/fonnesbeck/miniforge3/envs/bios8366/lib/python3.9/site-packages/IPython/core/display.py:724: UserWarning: Consider using IPython.display.IFrame instead warnings.warn("Consider using IPython.display.IFrame instead")
from IPython.display import YouTubeVideo
YouTubeVideo("GExKsQ-OU78")
Use `%load` to add remote code:
# %load http://matplotlib.org/mpl_examples/shapes_and_collections/scatter_demo.py
"""
Simple demo of a scatter plot.
"""
import numpy as np
import matplotlib.pyplot as plt
N = 50
x = np.random.rand(N)
y = np.random.rand(N)
colors = np.random.rand(N)
area = np.pi * (15 * np.random.rand(N))**2 # 0 to 15 point radii
plt.scatter(x, y, s=area, c=colors, alpha=0.5)
plt.show()
MathJax is a JavaScript display engine for LaTeX math that allows equations, including inline ones such as $\alpha$, to be embedded into HTML. For example, this markup:
"""$$ \int_{a}^{b} f(x)\, dx \approx \frac{1}{2} \sum_{k=1}^{N} \left( x_{k} - x_{k-1} \right) \left( f(x_{k}) + f(x_{k-1}) \right). $$"""
becomes this:
$$ \int_{a}^{b} f(x)\, dx \approx \frac{1}{2} \sum_{k=1}^{N} \left( x_{k} - x_{k-1} \right) \left( f(x_{k}) + f(x_{k-1}) \right). $$
SymPy is a Python library for symbolic mathematics. It supports:
# Never import like this!
from sympy import *
import warnings
warnings.filterwarnings('ignore')
init_printing()
x, y = symbols("x y")
eq = ((x+y)**2 * (x+1))
eq
expand(eq)
(1/cos(x)).series(x, 0, 6)
limit((sin(x)-x)/x**3, x, 0)
diff(cos(x**2)**2 / (1+x), x)
Jupyter has a set of predefined ‘magic functions’ that you can call with a command line style syntax. These include:
%run
%edit
%debug
%timeit
%paste
%load_ext
%lsmagic
Available line magics: %alias %alias_magic %autoawait %autocall %automagic %autosave %bookmark %cat %cd %clear %colors %conda %config %connect_info %cp %debug %dhist %dirs %doctest_mode %ed %edit %env %gui %hist %history %killbgscripts %ldir %less %lf %lk %ll %load %load_ext %loadpy %logoff %logon %logstart %logstate %logstop %ls %lsmagic %lx %macro %magic %man %matplotlib %mkdir %more %mv %notebook %page %pastebin %pdb %pdef %pdoc %pfile %pinfo %pinfo2 %pip %popd %pprint %precision %prun %psearch %psource %pushd %pwd %pycat %pylab %qtconsole %quickref %recall %rehashx %reload_ext %rep %rerun %reset %reset_selective %rm %rmdir %run %save %sc %set_env %store %sx %system %tb %time %timeit %unalias %unload_ext %who %who_ls %whos %xdel %xmode Available cell magics: %%! %%HTML %%SVG %%bash %%capture %%debug %%file %%html %%javascript %%js %%latex %%markdown %%perl %%prun %%pypy %%python %%python2 %%python3 %%ruby %%script %%sh %%svg %%sx %%system %%time %%timeit %%writefile Automagic is ON, % prefix IS NOT needed for line magics.
The `timeit` magic times the execution of code; it exists in both line and cell form:
%timeit np.linalg.eigvals(np.random.rand(100,100))
3.48 ms ± 54.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit a = np.random.rand(100, 100)
np.linalg.eigvals(a)
3.36 ms ± 146 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
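Outside IPython, a similar measurement can be made with the standard-library `timeit` module. The sketch below repeats a statement several times and reports a mean and standard deviation per loop. `time_statement` is a hypothetical helper, and the fixed `repeat` and `number` values are assumptions; `%timeit` chooses the loop count automatically.

```python
import timeit


def time_statement(stmt, setup="pass", repeat=7, number=100):
    """Return (mean, std) of the per-loop time in seconds over `repeat` runs."""
    runs = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
    per_loop = [total / number for total in runs]
    mean = sum(per_loop) / len(per_loop)
    variance = sum((t - mean) ** 2 for t in per_loop) / len(per_loop)
    return mean, variance ** 0.5


mean, std = time_statement(
    "sorted(data)", setup="data = list(range(1000, 0, -1))"
)
print(f"{mean * 1e6:.1f} µs ± {std * 1e6:.1f} µs per loop")
```

As with the cell form of the magic, the `setup` string runs once per repetition and is not included in the timing.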
IPython also creates aliases for a few common interpreters, such as bash, ruby, perl, etc.
These are all equivalent to %%script <name>
%%ruby
puts "Hello from Ruby #{RUBY_VERSION}"
Hello from Ruby 2.7.4
%%bash
echo "hello from $BASH"
hello from /bin/bash
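Conceptually, a script cell hands its body to another interpreter and shows whatever that interpreter prints. The sketch below imitates this with `subprocess`, feeding the text to the child process on standard input. `run_script` is a hypothetical helper, and a second Python interpreter is used in place of Ruby or Bash so the example does not depend on what is installed.

```python
import subprocess
import sys


def run_script(interpreter, body):
    """Send `body` to `interpreter` on stdin and return what it prints."""
    completed = subprocess.run(
        interpreter, input=body, capture_output=True, text=True, check=True
    )
    return completed.stdout


# "python -" tells the interpreter to read the program from standard input.
output = run_script([sys.executable, "-"], "print('hello from a subprocess')\n")
print(output, end="")
```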
IPython has an `rmagic` extension, now distributed with rpy2, that contains some magic functions for working with R. This extension can be loaded using the `%load_ext` magic as follows:
%load_ext rpy2.ipython
If the above generates an error, it is likely that you do not have the `rpy2` module installed. You can install this now via:
!pip install rpy2
x,y = np.arange(10), np.random.normal(size=10)
%R print(lm(rnorm(10)~rnorm(10)))
Call: lm(formula = rnorm(10) ~ rnorm(10)) Coefficients: (Intercept) 0.5598
| | |
|---|---|
| coefficients | FloatVector with 1 elements. |
| residuals | FloatVector with 10 elements. |
| effects | FloatVector with 10 elements. |
| ... | ... |
| call | Call: lm(formula = rnorm(10) ~ rnorm(10)) Coefficients: (Intercept) 0.003158 |
| terms | rnorm(10) ~ rnorm(10), with attributes "variables", "factors", "term.labels", "order", "intercept", "response", ".Environment" |
| model | R/rpy2 DataFrame (10 x 1) |
%%R -i x,y -o XYcoef
lm.fit <- lm(y~x)
par(mfrow=c(2,2))
print(summary(lm.fit))
plot(lm.fit)
XYcoef <- coef(lm.fit)
Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-0.7938 -0.6169 -0.2493  0.6764  1.1001

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.05839    0.45573  -0.128    0.901
x           -0.03019    0.08537  -0.354    0.733

Residual standard error: 0.7754 on 8 degrees of freedom
Multiple R-squared:  0.0154, Adjusted R-squared:  -0.1077
F-statistic: 0.1251 on 1 and 8 DF,  p-value: 0.7327
XYcoef
array([-0.05838551, -0.03019335])
In addition to MathJax support, you may declare a LaTeX cell using the `%%latex` cell magic:
%%latex
\begin{align}
\nabla \times \vec{\mathbf{B}} -\, \frac1c\, \frac{\partial\vec{\mathbf{E}}}{\partial t} & = \frac{4\pi}{c}\vec{\mathbf{j}} \\
\nabla \cdot \vec{\mathbf{E}} & = 4 \pi \rho \\
\nabla \times \vec{\mathbf{E}}\, +\, \frac1c\, \frac{\partial\vec{\mathbf{B}}}{\partial t} & = \vec{\mathbf{0}} \\
\nabla \cdot \vec{\mathbf{B}} & = 0
\end{align}
Jupyter also enables objects to declare a JavaScript representation. At first, this may seem odd, as output is inherently visual and JavaScript is a programming language. However, it opens the door to rich output that leverages the full power of JavaScript and associated libraries such as D3.
%%javascript
alert("Hello world!");
In Jupyter, one can convert an `.ipynb` notebook document file into various static formats via the `nbconvert` tool. Currently, nbconvert is a command line tool, run as a script using Jupyter.
!jupyter nbconvert --to html Section0_1-IPython_and_Jupyter.ipynb
[NbConvertApp] Converting notebook Section0_1-IPython_and_Jupyter.ipynb to html [NbConvertApp] Writing 747185 bytes to Section0_1-IPython_and_Jupyter.html
Currently, `nbconvert` supports HTML (default), LaTeX, Markdown, reStructuredText, Python, and HTML5 slides for presentations. Some types can be post-processed, such as LaTeX to PDF (this requires Pandoc and, as the log below shows, a LaTeX installation providing `xelatex`).
!jupyter nbconvert --to pdf Section2_1-Introduction-to-Pandas.ipynb
[NbConvertApp] Converting notebook Section2_1-Introduction-to-Pandas.ipynb to pdf [NbConvertApp] Writing 96264 bytes to notebook.tex [NbConvertApp] Building PDF [NbConvertApp] Running xelatex 3 times: ['xelatex', 'notebook.tex', '-quiet'] [NbConvertApp] Running bibtex 1 time: ['bibtex', 'notebook'] [NbConvertApp] WARNING | bibtex had problems, most likely because there were no citations [NbConvertApp] PDF successfully created [NbConvertApp] Writing 98609 bytes to Section2_1-Introduction-to-Pandas.pdf
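Conversion is possible because a notebook document is a JSON file whose `cells` list records each cell's type and source. The sketch below pulls the code cells out of a notebook using only the standard library, which is the rough idea behind exporting to Python. It is an illustration, not nbconvert's implementation; `code_cells` is a hypothetical helper and the `demo` notebook is made up for the example.

```python
import json


def code_cells(notebook_json):
    """Return the source text of every code cell in a notebook JSON string."""
    notebook = json.loads(notebook_json)
    return [
        "".join(cell["source"])   # source may be a string or a list of lines
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code"
    ]


demo = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["x = 1\n", "print(x)"],
        },
    ],
})
print(code_cells(demo))   # ['x = 1\nprint(x)']
```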
A very useful online service is the Jupyter Notebook Viewer, which displays your notebook as a static HTML page that is easy to share with others:
%%html
<iframe src=https://nbviewer.org/2352771 width=700 height=300></iframe>
Also, GitHub supports the rendering of Jupyter Notebooks stored on its repositories.
Reproducible research: reproducing conclusions from a single experiment based on the measurements from that experiment.
The most basic form of reproducibility is a complete description of the data and associated analyses (including code!) so the results can be exactly reproduced by others.
Reproducing calculations can be onerous, even with one's own work!
Scientific data are becoming larger and more complex, making simple descriptions inadequate for reproducibility. As a result, most modern research is irreproducible without tremendous effort.
***Reproducible research is not yet part of the culture of science in general, or scientific computing in particular.***
There are a number of steps to scientific endeavors that involve computing:
Many of the standard tools impose barriers between one or more of these steps. This can make it difficult to iterate on or reproduce work.
The Jupyter notebook eliminates or reduces these barriers to reproducibility.
IPython Notebook Viewer Displays static HTML versions of notebooks, and includes a gallery of notebook examples.
NotebookCloud A service that allows you to launch and control IPython Notebook servers on Amazon EC2 from your browser.
A Reference-Free Algorithm for Computational Normalization of Shotgun Sequencing Data A landmark example of reproducible research in genomics: Git repo, IPython notebook, data and scripts.
Jacques Ravel and K Eric Wommack. 2014. All Hail Reproducibility in Microbiome Research. Microbiome, 2:8.
Benjamin Ragan-Kelley et al. 2013. Collaborative cloud-enabled tools allow rapid, reproducible biological insights. The ISME Journal, 7, 461–464; doi:10.1038/ismej.2012.123.