
In [1]:

name = "2020-11-05-ways-of-python"
title = "Many ways to run Python"
tags = "basics, anaconda, command line, hpc, jupyter"
author = "Callum Rollo, Nele Reyniers"

In [2]:

from nb_tools import connect_notebook_to_post
from IPython.core.display import HTML

html = connect_notebook_to_post(name, title, tags, author)

Unlike other languages that have a single programming interface (MATLAB) or a dominant interface du jour (R with RStudio), Python has a whole ecosystem of programs for writing it. With so much choice, this can be confusing at first: what should you use for your project?

This presentation will cover some of the most popular Python interfaces, their pros and cons, and some situations in which one may be preferable to another. We will also discuss some operational details of the Anaconda package management system.

You can see a recording of this presentation here

In [ ]:

from pygments import highlight
from pygments.lexers import PythonLexer
from pygments.formatters import HtmlFormatter
import IPython

1. Jupyter notebooks

These are our primary teaching tool.

Pros

Web based interface, easy to maintain
Inline figures and markdown cells make great workbooks
Encourages self documenting code
"magic" functions to interact with operating system
Can share interactive notebooks online, e.g. via Binder

Cons

Harder to automate the scripts
Makes a mess in git
Requires a GUI to run/efficiently examine the notebooks
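
One reason notebooks make a mess in git is that an .ipynb file is JSON that stores cell outputs and execution counters alongside the code, so re-running a cell changes the file even when the code is unchanged. A minimal sketch, assuming the standard nbformat 4 layout (a top-level "cells" list); `strip_outputs` is a hypothetical helper, and dedicated tools such as nbstripout do this more thoroughly:

```python
import json


def strip_outputs(notebook):
    """Return a copy of a notebook dict with code-cell outputs removed."""
    cleaned = json.loads(json.dumps(notebook))  # cheap deep copy of JSON-like data
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# A tiny in-memory notebook standing in for a real .ipynb file (illustrative only)
example = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "execution_count": 7,
            "metadata": {},
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hi\n"]}],
            "source": ["print('hi')"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

cleaned = strip_outputs(example)
print(json.dumps(cleaned["cells"][1], indent=1))
```

With outputs removed, a diff of the notebook shows only changes to the code and markdown cells.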

Also check out JupyterLab, the new standard interface for Jupyter. It is much more powerful and integrated, and all projects written in notebooks can be continued in JupyterLab with no changes needed.

In [2]:

# Demo some jupyter stuff here

2. Integrated Development Environments (IDEs)

These are full-featured tools for code development. Spyder is very popular among scientists. Especially if you are coming from a MATLAB or RStudio background, the appearance of this IDE is very familiar and comforting. The whole thing is itself made in Python, which is pretty cool.

PyCharm is like Spyder on steroids.

Pros

See variables, file system, command line and code at a glance
Loads of plugins (especially PyCharm)
Smart autocompletion
Code highlighting for e.g. unused imports, missing whitespace
Can handle outside programs like git

Cons

Heavy on OS resources, especially RAM
Can be slow to start

In [3]:

# Demo an IDE, including code hints and autocompletion

3. Python fresh from the command line

Just open up a Python prompt and start coding

This is a fairly rare use case unless you are doing something very short. However, it's good to remember that this is available. On pretty much any Unix-like system (Linux, macOS etc.) you can get straight to Python from the command line. This can be useful if you're logged in to a remote server and need to execute some Python in a hurry.

If you're writing more than a couple of lines, however, you'll want to write some .py files and run them.
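
For a one-off calculation you do not even need the interactive prompt: the interpreter's -c option runs a snippet passed as a string. The sketch below drives it from Python via subprocess only so that it can be shown in a notebook; in a terminal you would type python3 -c "..." directly.

```python
import subprocess
import sys

# sys.executable is the interpreter running this code; in a terminal you
# would type `python3` instead.
result = subprocess.run(
    [sys.executable, "-c", "print(sum(range(10)))"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```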

In [4]:

# Python command line demo

4. Python in files

You can write Python in any text editor. On Unix systems, vim and emacs remain popular after several decades. Atom is a more user-friendly GUI-based option. Windows users can try Notepad++ for Python support.

Pros

Simple and lightweight
Always there for you (especially vim)
Super portable scripts
Easy to automate with tools like cron

Cons

Limited autocompletion and error checking
No easy way to check workspace (variables, path etc)
Working with figures can be difficult (need to save to file and display)
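
A plain .py file is easiest to automate and reuse when the work sits in functions and the entry point is guarded, so that the file can be both run from the shell (or cron) and imported without side effects. A minimal sketch with hypothetical names (`count_words`, `main`):

```python
import sys


def count_words(text):
    """Count whitespace-separated words in a string."""
    return len(text.split())


def main(argv):
    """Entry point: join the arguments and report how many words they contain."""
    text = " ".join(argv)
    print("{} word(s)".format(count_words(text)))
    return 0


if __name__ == "__main__":
    # Only runs when the file is executed as a script, not when it is imported
    main(sys.argv[1:])
```

Saved as, say, a hypothetical count.py, this could be run as python3 count.py trick or treat, while another script could still import count_words from it.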

Providing inputs to Python scripts run from the command line

There are different ways to turn your Python program (.py) into a command line tool. We will demonstrate two of these options below.

sys.argv

The sys module is part of the Python standard library and provides access to variables and functions of the Python runtime environment. In this tutorial, we're only demonstrating one of its attributes: sys.argv, the list of command line arguments.

Let's look at the contents of a Python script called halloween_sysargv.py below. It is a very simple demonstration of how to provide numerical, string (for example filenames!) or list inputs to a Python program.

In [5]:

# Show content of a python script with syntax highlighting. Shamelessly copied from jgosmann's answer on 
# stackoverflow.com/questions/19197931/how-to-show-as-output-cell-the-contents-of-a-py-file-with-syntax-highlighting
with open('halloween_sysargv.py') as f:
    code = f.read()
formatter = HtmlFormatter()
IPython.display.HTML('<style type="text/css">{}</style>{}'.format(
    formatter.get_style_defs('.highlight'), highlight(code, PythonLexer(), formatter)))

Out[5]:

"""
This is a spooky demonstration of using sys.argv to provide a python script with inputs.
"""
import sys

# access inputs with sys.argv and save them in a list:
inputs = sys.argv[1:]
# all elements of sys.argv are strings, but we want these inputs to be of different types:
number = int(inputs[0])
lantern = inputs[1]
animals = inputs[2].split(',')
# check the types of the variables now
print("{} is a {}".format(number, type(number)))
print("{} is a {}".format(lantern, type(lantern)))
print("{} is a {}".format(animals, type(animals)))
# get the name of the program
print("This program is called {}".format(sys.argv[0]))

Now we can run this script, halloween_sysargv.py, in the shell as follows:

In [6]:

! python3 halloween_sysargv.py 13 pumpkin cat,bat,spider  # the exclamation mark tells Jupyter we're running a shell command.

13 is a <class 'int'>
pumpkin is a <class 'str'>
['cat', 'bat', 'spider'] is a <class 'list'>
This program is called halloween_sysargv.py

So, everything after python3 halloween_sysargv.py ends up as a string in the list sys.argv. The first element of sys.argv is always the name of the program that is being run.
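
Because sys.argv is an ordinary list of strings, a script that indexes it directly fails with an IndexError or ValueError when an argument is missing or malformed. A minimal sketch of a more defensive variant; `parse_inputs` is a hypothetical helper, not part of halloween_sysargv.py, and it takes the list as a parameter so it can be tried without a shell:

```python
def parse_inputs(argv):
    """Convert a sys.argv-style list into (number, lantern, animals)."""
    program, inputs = argv[0], argv[1:]
    if len(inputs) != 3:
        raise SystemExit("usage: {} NUMBER LANTERN ANIMAL[,ANIMAL...]".format(program))
    try:
        number = int(inputs[0])
    except ValueError:
        raise SystemExit("NUMBER must be an integer, got {!r}".format(inputs[0]))
    return number, inputs[1], inputs[2].split(",")


# In a real script you would call parse_inputs(sys.argv)
print(parse_inputs(["halloween_sysargv.py", "13", "pumpkin", "cat,bat,spider"]))
```

Raising SystemExit with a message prints that message and stops the script with a non-zero exit status, which is friendlier to shell users than a traceback.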

argparse

With argparse, you can easily supply your Python program with input from the command line in a more user-friendly way. Inputs are supplied to your Python program in the following format:

python myprogram.py -a avalue -b bvalue --option-c cvalue -f

The predecessor of argparse is optparse.

Content of halloween_argparse.py:

In [7]:

# Show content of a python script with syntax highlighting. Shamelessly copied from jgosmann's answer on 
# stackoverflow.com/questions/19197931/how-to-show-as-output-cell-the-contents-of-a-py-file-with-syntax-highlighting
with open('halloween_argparse.py') as f:
    code = f.read()
IPython.display.HTML('<style type="text/css">{}</style>{}'.format(
    formatter.get_style_defs('.highlight'), highlight(code, PythonLexer(), formatter)))

Out[7]:

"""
This is a spooky demonstration of using argparse to provide a python script with inputs.
"""
from argparse import ArgumentParser

parser = ArgumentParser()
parser.add_argument("-n", "--number",
                    action="store", type=float,
                    help="An unlucky number (float).")
parser.add_argument("-l", "--lantern",
                    action="store", type=str, default="pumpkin",
                    help="Material to carve a lantern out of (string, default is pumpkin).")
parser.add_argument("-a", "--animals",
                    action="store", type=str, default='',
                    help=("A comma separated list of animals typically associated with halloween. "
                          "Example: bat,cat,rat (string)"))
parser.add_argument("-c", "--christmas",
                    action="store_true", default=False,
                    help="Indicate whether it is Christmas yet (bool, default is False).")
args = parser.parse_args()

# split the comma separated values in the animals argument
animals = args.animals.split(',')

# check the types of the variables:
print("{} is a {}".format(args.number, type(args.number)))
print("{} is a {}".format(args.lantern, type(args.lantern)))
print("{} is a {}, but {} is a {}".format(args.animals, type(args.animals), animals, type(animals)))
print("{} is a {}".format(args.christmas, type(args.christmas)))

In the terminal, we can provide inputs using the flags we specified:

In [8]:

! python3 halloween_argparse.py -n 13 --animals=cat,bat,spider,wolf -c  # ! running in shell

13.0 is a <class 'float'>
pumpkin is a <class 'str'>
cat,bat,spider,wolf is a <class 'str'>, but ['cat', 'bat', 'spider', 'wolf'] is a <class 'list'>
True is a <class 'bool'>

One of the advantages of argparse is that a help function is automatically generated from the "help" argument you supply when adding options:

In [9]:

! python halloween_argparse.py --help  # ! running in shell

usage: halloween_argparse.py [-h] [-n NUMBER] [-l LANTERN] [-a ANIMALS] [-c]

optional arguments:
  -h, --help            show this help message and exit
  -n NUMBER, --number NUMBER
                        An unlucky number (float).
  -l LANTERN, --lantern LANTERN
                        Material to carve a lantern out of (string, default is
                        pumpkin).
  -a ANIMALS, --animals ANIMALS
                        A comma separated list of animals typically associated
                        with halloween. Example: bat,cat,rat (string)
  -c, --christmas       Indicate whether it is Christmas yet (bool, default is
                        False).

See the documentation and tutorial to find out what else you can do with argparse.
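
Two argparse features pair well with the script above: `type` accepts any callable that converts the raw string, so the comma splitting can happen during parsing, and `parse_args` accepts an explicit list of strings, which is convenient for trying a parser inside a notebook. A minimal sketch; `comma_list` and `build_parser` are hypothetical names, not taken from halloween_argparse.py:

```python
from argparse import ArgumentParser


def comma_list(text):
    """Split 'cat,bat' into ['cat', 'bat'], ignoring empty entries."""
    return [item for item in text.split(",") if item]


def build_parser():
    """Build a parser similar in spirit to the halloween example."""
    parser = ArgumentParser(description="Spooky argparse sketch.")
    parser.add_argument("-n", "--number", type=float, required=True,
                        help="An unlucky number (float).")
    parser.add_argument("-a", "--animals", type=comma_list, default=[],
                        help="Comma separated list of animals, e.g. bat,cat,rat.")
    parser.add_argument("-c", "--christmas", action="store_true",
                        help="Indicate whether it is Christmas yet.")
    return parser


# Passing a list to parse_args bypasses sys.argv
args = build_parser().parse_args(["-n", "13", "--animals=cat,bat,spider"])
print(args.number, args.animals, args.christmas)
```

One design difference worth noting: when --animals is omitted, this version yields an empty list, whereas splitting an empty string with split(',') gives a list containing one empty string.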

5. Python on the HPC

Depending on your research, your data and your computer, you may want to consider running some or most of your analyses and experiments on a High Performance Computer (HPC). While the HPC is running your Python programs, your own machine is not burdened, so you can freely use it for other tasks or shut it off.

UEA has its own HPC for research: the new ADA Cluster. This provides me with an excellent excuse to insert an image of 19th century visionary Ada Lovelace.

Picture of Ada Lovelace

For more introduction on high performance computing and ADA, please see the UEA Research and Specialist Computing Support help pages. The HPC Team offers to meet with all new users to help you get started. You can use Conda to manage Python environments on ADA. Information on how to build and activate conda python environments on ADA can be found here.

On an HPC, you can either work interactively or submit batch jobs.

When submitting batch jobs (after code development and testing locally or in an interactive session), only the fourth way of Python above is available to you. Providing inputs from the command line will come in handy when submitting (array) jobs. Note that in batch jobs, you need to activate conda environments with source activate myenv instead of the otherwise recommended conda activate myenv.
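
In an array job, each task typically runs the same script and learns which piece of work is its own from a task index. A minimal sketch, assuming the scheduler exposes that index in an environment variable; the name SLURM_ARRAY_TASK_ID used here is an assumption about the scheduler (check the ADA documentation), and `pick_input` and the file names are hypothetical:

```python
import os


def pick_input(inputs, environ=os.environ, variable="SLURM_ARRAY_TASK_ID"):
    """Return the input belonging to this array task (index read from the environment)."""
    raw = environ.get(variable)
    if raw is None:
        raise SystemExit("{} is not set; is this an array job?".format(variable))
    index = int(raw)
    if not 0 <= index < len(inputs):
        raise SystemExit("task index {} outside 0..{}".format(index, len(inputs) - 1))
    return inputs[index]


files = ["run_a.nc", "run_b.nc", "run_c.nc"]  # hypothetical input files
# Simulate task number 1 by passing a fake environment
print(pick_input(files, environ={"SLURM_ARRAY_TASK_ID": "1"}))
```

The selected file name could equally be passed to the script on the command line by the job script, using sys.argv or argparse as shown earlier.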

In an interactive session, the recommended ways to work with Python on ADA are options 3 and 4 from above (from the UEA HPC team: "Jupyter Notebooks and IDEs rely on graphical interfaces that have high overheads and therefore generally don't work well on a cluster environment"). The file editors available on ADA are nano, nedit, emacs, Vi and gvim.

Anaconda

If you are not already familiar with Anaconda, it is a distribution of Python geared toward data scientists that aims to make it quick and easy to manage multiple projects with differing dependencies.

With Anaconda you can maintain separate environments for all your projects.

Why would you want to do this? Different projects require different packages, and not all of these packages are able to interoperate. Particularly in science, we often need to use legacy software dependent on older modules. If you want to work on one project built in Python 2.7 and your new stuff in 3.8, you'll need to keep them separate on your system so they don't interfere with each other. Anaconda is a very user-friendly way to achieve this.

Schematic of Anaconda operation

The key to Anaconda is environments. These are collections of Python modules, non-Python programs (like Jupyter notebooks, GDAL or Spyder) and a specific version of Python itself. There is no limit to the number of environments you can have. The only requirement is that each one has a unique name on your system.
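
Since each environment carries its own interpreter, you can check from inside Python which one is active. A minimal sketch; the CONDA_DEFAULT_ENV variable is set by conda when an environment is activated and may be absent otherwise, and `describe_environment` is a hypothetical helper:

```python
import os
import sys


def describe_environment(environ=os.environ):
    """Collect a few facts about the interpreter and environment in use."""
    return {
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        "interpreter": sys.executable,
        "prefix": sys.prefix,
        "conda_env": environ.get("CONDA_DEFAULT_ENV", "(no conda environment active)"),
    }


for key, value in describe_environment().items():
    print("{:>15}: {}".format(key, value))
```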

Here's an example environment from our PPD Python course:

name: ppd_python
channels:
   - defaults
   - conda-forge
dependencies:
   - python=3.8
   - ipython
   - jupyter
   - numpy
   - matplotlib
   - pandas
   - cartopy
   - xarray
   - netcdf4
   - seaborn
   - spyder
   - tqdm
   - scipy
   - iris
   - plotly
   - cftime

The environment is created from a text file. You need to specify a name, the sources (channels) and the modules (dependencies) you need. In this case we specified python=3.8, jupyter to run notebooks and a bunch of modules including numpy, matplotlib and scipy. This should be all anyone needs to replicate the same environment on their machine and run the scripts successfully. If you are sharing code with others, always include an environment file so it runs correctly.
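
Before running shared code it can help to confirm that the active environment actually provides the modules the environment file lists. A minimal sketch using only the standard library; `missing_modules` is a hypothetical helper, and the names are import names, which do not always match the conda package name (the netcdf4 package, for example, is imported as netCDF4):

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the subset of top-level module names that cannot be imported."""
    return [name for name in names if find_spec(name) is None]


wanted = ["numpy", "matplotlib", "pandas", "xarray", "scipy"]
missing = missing_modules(wanted)
if missing:
    print("Missing from this environment:", ", ".join(missing))
else:
    print("All requested modules are available.")
```

This only checks that a module can be found, not that its version matches the one the code was written against.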

We will do a more detailed demo of package management with Anaconda in the future.

How I start a Python project

Masterful flow chart of Python decision making

*Other Hosting Services Are Available

Reading

If you want a good science environment file to start from, try the one from ppd_python. You'll find some handy conda instructions in the repo description. Click to download the zip; you want the environment.yml file. The environment is based on Python 3.8, which will be supported until October 2024.
A solid intro to git by Software Carpentry
A cool trick with conda for bash users by Leo Uieda. N.B. conda activate is preferred to source activate these days.

Sources

Python on the ADA HPC

Images

Conda image: https://www.imperial.ac.uk/admin-services/ict/self-service/research-support/rcs/support/applications/conda/
Ada Lovelace: https://blogs.scientificamerican.com/observations/ada-lovelace-day-honors-the-first-computer-programmer/
Flow chart made with graphviz

In [3]:

HTML(html)

Out[3]:

This post was written as an IPython (Jupyter) notebook. You can view or download it using nbviewer.