name = "2020-11-05-ways-of-python"
title = "Many ways to run Python"
tags = "basics, anaconda, command line, hpc, jupyter"
author = "Callum Rollo, Nele Reyniers"
from nb_tools import connect_notebook_to_post
from IPython.core.display import HTML
html = connect_notebook_to_post(name, title, tags, author)
Unlike other languages that have a single programming interface (MATLAB) or a dominant interface du jour (R with RStudio), Python has a whole ecosystem of programs for writing it. With so much choice, this can be confusing at first: what should you use for your project?
This presentation will cover some of the most popular Python interfaces, their pros and cons, and some situations in which one may be preferable to another. We will also discuss some operational details of the Anaconda package management system.
You can see a recording of this presentation here
from pygments import highlight
from pygments.lexers import PythonLexer
from pygments.formatters import HtmlFormatter
import IPython
These are our primary teaching tool.
Also check out JupyterLab, the new standard interface for Jupyter. It is much more powerful and integrated, and all projects written in notebooks can be continued in JupyterLab with no changes needed.
# Demo some jupyter stuff here
These are full-featured tools for code development. Spyder is very popular among scientists. If you are coming from a MATLAB or RStudio background, the appearance of this IDE will be very familiar and comforting. The whole thing is itself written in Python, which is pretty cool.
PyCharm is like Spyder on steroids.
# Demo an IDE, including code hints and autocompletion
Just open up a Python prompt and start coding.
This is a fairly rare use case unless you are doing something very short. However, it's good to remember that this is available. On pretty much any Unix system (Linux, Mac etc.) you can get straight to Python from the command line. This can be useful if you're logged in to a remote server and need to execute some Python in a hurry.
If you're writing more than a couple of lines, however, you'll want to write some .py files and run them.
# Python command line demo
You can write Python in any text editor. On Unix systems, vim and emacs remain popular after several decades. Atom is a more user-friendly, GUI-based option. Windows users can try Notepad++ for Python support.
There are different ways to turn your Python program (.py) into a command line tool. We will demonstrate two of these options below.
The sys module is part of the Python standard library and provides access to variables and functions of the Python runtime environment. In this tutorial, we're only demonstrating one of its attributes: sys.argv.
Let's look at the contents of a Python script called halloween_sysargv.py below. It is a very simple demonstration of how to provide numerical, string (for example filenames!) or list inputs to a Python program.
# Show content of a python script with syntax highlighting. Shamelessly copied from jgosmann's answer on
# stackoverflow.com/questions/19197931/how-to-show-as-output-cell-the-contents-of-a-py-file-with-syntax-highlighting
with open('halloween_sysargv.py') as f:
    code = f.read()
formatter = HtmlFormatter()
IPython.display.HTML('<style type="text/css">{}</style>{}'.format(
    formatter.get_style_defs('.highlight'), highlight(code, PythonLexer(), formatter)))
"""
This is a spooky demonstration of using sys.argv to provide a python script with inputs.
"""
import sys
# access inputs with sys.argv and save them in a list:
inputs = sys.argv[1:]
# all elements of sys.argv are strings, but we want these inputs to be of different types:
number = int(inputs[0])
lantern = inputs[1]
animals = inputs[2].split(',')
# check the types of the variables now
print("{} is a {}".format(number, type(number)))
print("{} is a {}".format(lantern, type(lantern)))
print("{} is a {}".format(animals, type(animals)))
# get the name of the program
print("This program is called {}".format(sys.argv[0]))
Now we can run this script, called halloween_sysargv.py, in the shell as follows:
! python3 halloween_sysargv.py 13 pumpkin cat,bat,spider # the exclamation mark tells Jupyter we're running a shell command.
13 is a <class 'int'>
pumpkin is a <class 'str'>
['cat', 'bat', 'spider'] is a <class 'list'>
This program is called halloween_sysargv.py
So, everything after python3 on the command line ends up as a string in the list sys.argv. The first element of sys.argv is always the name of the program that is being run, which is why the script takes its inputs from sys.argv[1:].
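Because sys.argv performs no checking of its own, a script that relies on it usually has to validate the inputs itself. The following is a minimal sketch of one way to do that, not part of the original halloween_sysargv.py; the function name parse_inputs and the usage message are assumptions made for illustration.

```python
def parse_inputs(argv):
    """Convert a sys.argv-style list into (number, lantern, animals).

    Raises ValueError with a readable message if the inputs are unusable.
    """
    args = argv[1:]  # argv[0] is the program name
    if len(args) != 3:
        raise ValueError(
            "usage: {} NUMBER LANTERN ANIMAL1,ANIMAL2,...".format(argv[0])
        )
    try:
        number = int(args[0])
    except ValueError:
        raise ValueError(
            "NUMBER must be an integer, got {!r}".format(args[0])
        ) from None
    lantern = args[1]
    animals = args[2].split(",")
    return number, lantern, animals


# An explicit list is used here so the example also runs inside a notebook.
# In a real script you would `import sys` and call parse_inputs(sys.argv).
example_argv = ["halloween_sysargv.py", "13", "pumpkin", "cat,bat,spider"]
print(parse_inputs(example_argv))
# prints (13, 'pumpkin', ['cat', 'bat', 'spider'])
```

Keeping the parsing in a function that takes the argument list as a parameter makes it easy to try out different inputs without leaving the notebook.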
With argparse, you can easily supply your Python program with input from the command line in a more user-friendly way. Inputs are supplied to your Python program in the following format:
python myprogram.py -a avalue -b bvalue --option-c cvalue -f
The predecessor of argparse is optparse.
Content of halloween_argparse.py:
# Show content of a python script with syntax highlighting. Shamelessly copied from jgosmann's answer on
# stackoverflow.com/questions/19197931/how-to-show-as-output-cell-the-contents-of-a-py-file-with-syntax-highlighting
with open('halloween_argparse.py') as f:
    code = f.read()
IPython.display.HTML('<style type="text/css">{}</style>{}'.format(
    formatter.get_style_defs('.highlight'), highlight(code, PythonLexer(), formatter)))
"""
This is a spooky demonstration of using argparse to provide a python script with inputs.
"""
from argparse import ArgumentParser
parser = ArgumentParser()
parser.add_argument("-n", "--number",
                    action="store", type=float,
                    help="An unlucky number (float).")
parser.add_argument("-l", "--lantern",
                    action="store", type=str, default="pumpkin",
                    help="Material to carve a lantern out of (string, default is pumpkin).")
parser.add_argument("-a", "--animals",
                    action="store", type=str, default='',
                    help=("A comma separated list of animals typically associated with halloween. "
                          "Example: bat,cat,rat (string)"))
parser.add_argument("-c", "--christmas",
                    action="store_true", default=False,
                    help="Indicate whether it is Christmas yet (bool, default is False).")
args = parser.parse_args()
# split the comma separated values in the animals argument
animals = args.animals.split(',')
# check the types of the variables:
print("{} is a {}".format(args.number, type(args.number)))
print("{} is a {}".format(args.lantern, type(args.lantern)))
print("{} is a {}, but {} is a {}".format(args.animals, type(args.animals), animals, type(animals)))
print("{} is a {}".format(args.christmas, type(args.christmas)))
In the terminal, we can provide inputs using the flags we specified:
! python3 halloween_argparse.py -n 13 --animals=cat,bat,spider,wolf -c # ! running in shell
13.0 is a <class 'float'>
pumpkin is a <class 'str'>
cat,bat,spider,wolf is a <class 'str'>, but ['cat', 'bat', 'spider', 'wolf'] is a <class 'list'>
True is a <class 'bool'>
One of the advantages of argparse is that a help function is automatically generated from the "help" argument you supply when adding options:
! python halloween_argparse.py --help # ! running in shell
usage: halloween_argparse.py [-h] [-n NUMBER] [-l LANTERN] [-a ANIMALS] [-c]

optional arguments:
  -h, --help            show this help message and exit
  -n NUMBER, --number NUMBER
                        An unlucky number (float).
  -l LANTERN, --lantern LANTERN
                        Material to carve a lantern out of (string, default is pumpkin).
  -a ANIMALS, --animals ANIMALS
                        A comma separated list of animals typically associated with halloween. Example: bat,cat,rat (string)
  -c, --christmas       Indicate whether it is Christmas yet (bool, default is False).
See the documentation and tutorial to find out what else you can do with argparse.
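As one example of what else is possible, argparse accepts any callable as the type of an argument, so the comma separated list can be split during parsing rather than afterwards. This is a sketch under assumptions, not the original halloween_argparse.py: the helper names comma_separated and build_parser are made up for illustration, and the argument list is passed to parse_args explicitly so the cell can run inside a notebook.

```python
from argparse import ArgumentParser


def comma_separated(text):
    """Turn 'bat,cat,rat' into ['bat', 'cat', 'rat']; an empty string gives []."""
    return [item for item in text.split(",") if item]


def build_parser():
    """Build a hypothetical, shortened variant of the halloween parser."""
    parser = ArgumentParser(description="Hypothetical variant of halloween_argparse.py")
    parser.add_argument("-n", "--number", type=float, required=True,
                        help="An unlucky number (float).")
    parser.add_argument("-a", "--animals", type=comma_separated, default=[],
                        help="Comma separated list of animals, e.g. bat,cat,rat")
    return parser


# parse_args() reads sys.argv by default; passing a list overrides that.
args = build_parser().parse_args(["-n", "13", "--animals=cat,bat,spider"])
print(args.number, args.animals)
# prints 13.0 ['cat', 'bat', 'spider']
```

Marking --number as required means argparse exits with a usage message when it is missing, instead of leaving args.number as None.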
Depending on your research, your data and your computer, you may want to consider running some or most of your analyses and experiments on a High Performance Computer (HPC). While the HPC is running your Python programs, your own machine is not burdened, so you can freely use it for other tasks or shut it off.
UEA has its own HPC for research: the new ADA Cluster. This provides me with an excellent excuse to insert an image of 19th century visionary Ada Lovelace.
For a fuller introduction to high performance computing and ADA, please see the UEA Research and Specialist Computing Support help pages. The HPC Team offers to meet with all new users to help you get started. You can use Conda to manage Python environments on ADA. Information on how to build and activate conda Python environments on ADA can be found here.
On an HPC, you can either work interactively or submit batch jobs.
When submitting batch jobs (after code development and testing locally or in an interactive session), only the fourth option above (scripts written in a text editor and run from the command line) is available to you. Providing inputs from the command line will come in handy when submitting (array) jobs. Note that in batch jobs, you need to activate conda environments with source activate myenv instead of the otherwise recommended conda activate myenv.
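To illustrate why command line inputs are handy for array jobs, here is a minimal sketch in which each job in the array passes a different index to the same script, and the script uses that index to pick one parameter combination. Everything here is hypothetical: the --task-id flag, the parameter lists and the script name my_analysis.py are assumptions, and how the index reaches the script depends on the scheduler, which is not covered here.

```python
from argparse import ArgumentParser
from itertools import product

# Hypothetical parameter grid; replace with the values your experiment needs.
YEARS = [2018, 2019, 2020]
VARIABLES = ["temperature", "salinity"]


def select_task(task_id):
    """Return the (year, variable) pair for a zero-based task index."""
    combinations = list(product(YEARS, VARIABLES))
    if not 0 <= task_id < len(combinations):
        raise IndexError(
            "task_id must be between 0 and {}".format(len(combinations) - 1)
        )
    return combinations[task_id]


def main(argv=None):
    """Parse the task index and report which combination would be processed."""
    parser = ArgumentParser(description="Run one piece of a larger experiment.")
    parser.add_argument("--task-id", type=int, required=True,
                        help="Zero-based index of the parameter combination to run.")
    args = parser.parse_args(argv)
    year, variable = select_task(args.task_id)
    print("Processing {} for {}".format(variable, year))
    return year, variable


# Explicit argument list for demonstration; a job script would instead run
# something like: python my_analysis.py --task-id 3
main(["--task-id", "3"])
# prints Processing salinity for 2019
```

With this layout, a single script covers all six combinations, and each job in the array only differs in the number it passes.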
In an interactive session, the recommended ways to work with Python on ADA are options 3 and 4 from above (from the UEA HPC team: "Jupyter Notebooks and IDEs rely on graphical interfaces that have high overheads and therefore generally don't work well on a cluster environment"). The file editors available on ADA are nano, nedit, emacs, Vi and gvim.
If you are not already familiar with Anaconda, it is a distribution of Python geared toward data scientists that aims to make it quick and easy to manage multiple projects with differing dependencies.
With Anaconda you can maintain separate environments for all your projects.
Why would you want to do this? Different projects require different packages, and not all of these packages are able to interoperate. Particularly in science, we often need to use legacy software dependent on older modules. If you want to work on one project built in Python 2.7 and your new stuff in 3.8, you'll need to keep them separate on your system so they don't interfere with each other. Anaconda is a very user-friendly way to achieve this.
The key to Anaconda is environments. These are collections of Python modules, non-Python programs (like Jupyter notebooks, GDAL or Spyder) and a specific version of Python itself. There is no limit to the number of environments you can have. The only requirement is that each one has a unique name on your system.
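Since each environment carries its own Python, it is sometimes useful to check from inside Python which interpreter and environment are actually in use. The sketch below uses only the standard library; the function name describe_interpreter is an assumption for illustration, and CONDA_DEFAULT_ENV is the environment variable conda sets on activation, so it will read "not set" when no conda environment is active.

```python
import os
import sys


def describe_interpreter():
    """Collect basic facts about the Python interpreter that is running."""
    return {
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "executable": sys.executable,  # path to the python binary in use
        "prefix": sys.prefix,          # root folder of the active installation
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV", "not set"),
    }


for key, value in describe_interpreter().items():
    print("{}: {}".format(key, value))
```

Running this in two different environments should show different executable paths, which is a quick way to confirm that a notebook or script is using the environment you intended.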
Here's an example environment from our PPD Python course:
name: ppd_python
channels:
- defaults
- conda-forge
dependencies:
- python=3.8
- ipython
- jupyter
- numpy
- matplotlib
- pandas
- cartopy
- xarray
- netcdf4
- seaborn
- spyder
- tqdm
- scipy
- iris
- plotly
- cftime
The environment is created from a text file. You need to specify a name, the sources (channels) and the modules (dependencies) you need. In this case we specified python=3.8, jupyter to run notebooks and a bunch of modules including numpy, matplotlib and scipy. This should be all anyone needs to replicate the same environment on their machine and run the scripts successfully. If you are sharing code with others, always include an environment file so it runs correctly.
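A quick way to confirm that an environment contains what a project needs is to ask Python whether the modules can be found. This is a minimal sketch using the standard library's importlib; the list REQUIRED_MODULES and the function name missing_modules are assumptions for illustration, and the list is only a subset of the dependencies above.

```python
from importlib.util import find_spec

# Import names, which can differ from conda package names
# (for example the conda package "netcdf4" is imported as "netCDF4").
REQUIRED_MODULES = ["numpy", "matplotlib", "pandas", "xarray", "scipy"]


def missing_modules(module_names):
    """Return the names from module_names that cannot be imported here."""
    return [name for name in module_names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules: {}".format(", ".join(missing)))
else:
    print("All required modules are available.")
```

The check only looks modules up without importing them, so it runs quickly, but it does not verify that the installed versions match the environment file.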
We will do a more detailed demo of package management with Anaconda in the future.
*Other Hosting Services Are Available
If you want a good science environment file to start from, try the one from ppd_python. You'll find some handy conda instructions in the repo description. Click to download the zip; you want the environment.yml file. The environment is based on Python 3.8, which will be supported until October 2024.
A solid intro to git by Software Carpentry
A cool trick with conda for bash users by Leo Uieda. N.B. conda activate is preferred to source activate these days.
HTML(html)
This post was written as an IPython (Jupyter) notebook. You can view or download it using nbviewer.