%matplotlib inline
import numpy as np
import astropy.io.fits as fits
from astropy.convolution import Gaussian2DKernel, convolve_fft
# Create empty image
nx, ny = 512, 512
image = np.zeros((ny, nx))
# Set number of stars
n_stars = 10000
# Generate random positions
r = np.random.random(n_stars) * nx
theta = np.random.uniform(0., 2. * np.pi, n_stars)
# Generate random fluxes
fluxes = np.random.random(n_stars) ** 2
# Compute position
x = nx / 2 + r * np.cos(theta)
y = ny / 2 + r * np.sin(theta)
# Add stars to image
# ==> First for loop and if statement <==
for idx in range(n_stars):
if x[idx] >= 0 and x[idx] < nx and y[idx] >= 0 and y[idx] < ny:
image[y[idx], x[idx]] += fluxes[idx]
# Convolve with a gaussian
kernel = Gaussian2DKernel(stddev=1)
image = convolve_fft(image, kernel)
# Add noise
image += np.random.normal(1., 0.001, image.shape)
# Write to a FITS file
fits.writeto('cluster.fits', image, clobber=True)
WARNING: Overwriting existing file 'cluster.fits'. [astropy.io.fits.file] WARNING:astropy:Overwriting existing file 'cluster.fits'.
# APLpy is a third-party library for Python designed for easy creation of
# astronomy image plots
import aplpy
fig = aplpy.FITSFigure('cluster.fits')
fig.show_colorscale(cmap='hot')
WARNING: No WCS information found in header - using pixel coordinates [aplpy.header] WARNING:astropy:No WCS information found in header - using pixel coordinates INFO:astropy:Auto-setting vmin to 9.820e-01 INFO:astropy:Auto-setting vmax to 1.168e+00
INFO: Auto-setting vmin to 9.820e-01 [aplpy.core] INFO: Auto-setting vmax to 1.168e+00 [aplpy.core]
First thing's first, make this into an executable script by adding a "shebang" line, like
#!/usr/bin/env python
This indicates to your shell to run the remainder of this file through Python with a copy of your current shell environment. We also use IPython's %%file
magic to write this out as a file on disk:
%%file simcluster.py
#!/usr/bin/env python
import numpy as np
import astropy.io.fits as fits
from astropy.convolution import Gaussian2DKernel, convolve_fft
# Create empty image
nx, ny = 512, 512
image = np.zeros((ny, nx))
# Set number of stars
n_stars = 10000
# Generate random positions
r = np.random.random(n_stars) * nx
theta = np.random.uniform(0., 2. * np.pi, n_stars)
# Generate random fluxes
fluxes = np.random.random(n_stars) ** 2
# Compute position
x = nx / 2 + r * np.cos(theta)
y = ny / 2 + r * np.sin(theta)
# Add stars to image
# ==> First for loop and if statement <==
for idx in range(n_stars):
if x[idx] >= 0 and x[idx] < nx and y[idx] >= 0 and y[idx] < ny:
image[y[idx], x[idx]] += fluxes[idx]
# Convolve with a gaussian
kernel = Gaussian2DKernel(stddev=1)
image = convolve_fft(image, kernel)
# Add noise
image += np.random.normal(1., 0.001, image.shape)
# Write to a FITS file
fits.writeto('cluster.fits', image, clobber=True)
Overwriting simcluster.py
We also want to set the "executable" bit on our file's permissions, which tells the shell that this is an executable file (i.e. if we enter the file's name at a command prompt it will try to run it as a program):
! chmod +x simcluster.py
chmod
is the shell program for setting file permissions (it can also be used to set read/write permissions). The +x
option means to set the file simcluster
as "executable" for all users (likewise -x
removes the executable flag).
Now we can run our program at the command prompt:
! ./simcluster.py
WARNING: Overwriting existing file 'cluster.fits'. [astropy.io.fits.file]
We include the ./
because without it the shell does not know where to look for the program named "simcluster". If a program name is entered at the shell without an explicit path (like we do when we run ls
, chmod
, etc.) the shell searches all the entries on our $PATH
environment variable for a program by that name. We can check the $PATH
using the shell like:
! echo $PATH
/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
Or we can read and set environment variables directly in Python using the os
module:
import os
os.environ['PATH']
'/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games'
Getting back to our script, the next thing to do to make this script into a reusable program that others can use too, would be to note that we actually have several parameters for which specified some values that worked for us at the time, but that others (where "others" includes ourselves, in the future) may wish to adjust.
Perhaps most notably the file name that we wrote to (maybe "clobber.fits" doesn't do it for everyone). Other parameters we might want to make adjustable would be the number of stars, and the dimensions (in pixels, for now) of the image. Those are the only parameters we'll focus on for now--can you think of others?
It isn't very nice to have to modify our code every time we want to tweak some parameters. Nevermind that we might not even remember what parameters we used to generate a particular image (did this have 10000 stars? or 12000? I don't remember since I changed it since I last generated that image...). Of course since we're writing to FITS files there are things we might do, like include keywords in the header specifying the parameters of the image). But for ease of use we would like our script to accept command-line argument to set the number of stars, etc. Then the command we used to generate that file would be in our shell history. It also means that when we share are script with someone else we just have to tell them how to run it--not how to go in and edit the code.
How do Python scripts accept command-line arguments?
I've you've ever written a program in C you've probably written a main function like:
int main(int argc, char** argv) { ... }
When you run your program, the program name itself along with any command-line arguments that followed the program name are passed in via an array of strings in argv
(where argc
tells us the number of arguments that were passed in).
In Python the equivalent is just a variable that lives in the sys
module called sys.argv
. It gives us a list
of all the command-line arguments. To see this, let's just print sys.argv
at the top of our script:
%%file simcluster.py
#!/usr/bin/env python
import numpy as np
import astropy.io.fits as fits
from astropy.convolution import Gaussian2DKernel, convolve_fft
# Let's take a look at the command-line arguments:
import sys
print(sys.argv)
# Create empty image
nx, ny = 512, 512
image = np.zeros((ny, nx))
# Set number of stars
n_stars = 10000
# Generate random positions
r = np.random.random(n_stars) * nx
theta = np.random.uniform(0., 2. * np.pi, n_stars)
# Generate random fluxes
fluxes = np.random.random(n_stars) ** 2
# Compute position
x = nx / 2 + r * np.cos(theta)
y = ny / 2 + r * np.sin(theta)
# Add stars to image
# ==> First for loop and if statement <==
for idx in range(n_stars):
if x[idx] >= 0 and x[idx] < nx and y[idx] >= 0 and y[idx] < ny:
image[y[idx], x[idx]] += fluxes[idx]
# Convolve with a gaussian
kernel = Gaussian2DKernel(stddev=1)
image = convolve_fft(image, kernel)
# Add noise
image += np.random.normal(1., 0.001, image.shape)
# Write to a FITS file
fits.writeto('cluster.fits', image, clobber=True)
Overwriting simcluster.py
! ./simcluster.py --stars=10000 -x 512 -y 512 cluster.fits
['./simcluster.py', '--stars=10000', '-x', '512', '-y', '512', 'cluster.fits']
WARNING: Overwriting existing file 'cluster.fits'. [astropy.io.fits.file]
What we would like to do is parse these command-line options so that we use the value of the --stars
option to set our n_stars
variable, and we also note that we accept -x
and -y
options that we want to take as our x and y dimensions. We also need to handle the filename.
And remember when we said we'll want to explain to our colleagues how to use our program? Well, if we provide them with a --help
option, maybe we won't have to do even that, if it's a simple enough program.
Just to throw a wrench into things we also want to support -s
as a shortcut for --stars
and -h
as a shortcut for --help
.
We could spend a while writing all the code to handle these command-line arguments (rather than getting on with our science). But fortunately, except in extreme cases, we don't have to do that. This is already a solved problem, and there are libraries that will make it easy to add standard argument handling to our script.
Python comes out of the box with three such libraries:
Why three? Is this just to confuse you? No! It's because Python is a living language, and the standard library continues to evolve. getopt
is an interface to a C library of the same name, and is fairly basic. optparse
is a higher-level library that getopt
that has been in Python for a long time, but was beginning to show its age. argparse
already existed as a third-party library, and (most) people liked it so much more than the old optparse
that it was added as a replacement for optparse
(optparse
is kept around though so that old programs don't just break!)
We will take a brief look at argparse
. Here's one way we might go about adding argument parsing to our program with argparse
:
%%file simcluster.py
#!/usr/bin/env python
import argparse # We added a new import statement for argarse
import sys # Also import sys for sys.argv
import numpy as np
import astropy.io.fits as fits
from astropy.convolution import Gaussian2DKernel, convolve_fft
# Create an ArgumentParser--a special object that keeps track of all the
# arguments we want our script to be able to handle, and then implements parsing
# them from sys.argv
parser = argparse.ArgumentParser(description="Generates simulated images of clusters")
# Add an optional argument for # of stars (default=10000)
parser.add_argument('-s', '--stars', type=int, default=10000,
help='the number of stars to generate')
# Add the x and y arguments
parser.add_argument('-x', type=int, default=512,
help='the x dimension (in pixels) of the image')
parser.add_argument('-y', type=int, default=512,
help='the y dimension (in pixels) of the image')
# Add an argument to handle the output file. If we use argparse.FileType it will
# handle opening a writeable file (and ensuring we can write to it).
# Note that the argument name 'file' does not beging with a '-' or '--'; this indicates
# to argparse that it is a *positional* argument
parser.add_argument('file', type=argparse.FileType('w'),
help='the name of the output file')
args = parser.parse_args(sys.argv[1:])
n_stars = args.stars
nx = args.x
ny = args.y
fileobj = args.file
# Create empty image
image = np.zeros((ny, nx))
# Generate random positions
r = np.random.random(n_stars) * nx
theta = np.random.uniform(0., 2. * np.pi, n_stars)
# Generate random fluxes
fluxes = np.random.random(n_stars) ** 2
# Compute position
x = nx / 2 + r * np.cos(theta)
y = ny / 2 + r * np.sin(theta)
# Add stars to image
# ==> First for loop and if statement <==
for idx in range(n_stars):
if x[idx] >= 0 and x[idx] < nx and y[idx] >= 0 and y[idx] < ny:
image[y[idx], x[idx]] += fluxes[idx]
# Convolve with a gaussian
kernel = Gaussian2DKernel(stddev=1)
image = convolve_fft(image, kernel)
# Add noise
image += np.random.normal(1., 0.001, image.shape)
# Write to a FITS file
fits.writeto(fileobj, image, clobber=True)
Overwriting simcluster.py
So what did all that get us? What do all the help strings do for us (other than document the code)? Well let's try it out:
! ./simcluster.py --help
usage: simcluster.py [-h] [-s STARS] [-x X] [-y Y] file Generates simulated images of clusters positional arguments: file the name of the output file optional arguments: -h, --help show this help message and exit -s STARS, --stars STARS the number of stars to generate -x X the x dimension (in pixels) of the image -y Y the y dimension (in pixels) of the image
Note also that the output file is not optional--we must provide one now:
! ./simcluster.py --stars=12000
usage: simcluster.py [-h] [-s STARS] [-x X] [-y Y] file simcluster.py: error: too few arguments
! ./simcluster.py --stars=12000 cluster2.fits
fig = aplpy.FITSFigure('cluster2.fits')
fig.show_colorscale(cmap='hot')
WARNING: No WCS information found in header - using pixel coordinates [aplpy.header] WARNING:astropy:No WCS information found in header - using pixel coordinates INFO:astropy:Auto-setting vmin to 9.814e-01 INFO:astropy:Auto-setting vmax to 1.175e+00
INFO: Auto-setting vmin to 9.814e-01 [aplpy.core] INFO: Auto-setting vmax to 1.175e+00 [aplpy.core]
Now we have a new problem--our script is getting long, and a bit jumbled. We have argument handling, image processing, and file writing all mixed together. Even though we've done a fairly good job organizing the source into stages, this does not scale well as our script gets more sophisticated and grows from 50 lines to 100 lines to 1000 lines or more (for example the writing stage is just one line right now, but what if we want to add more to the FITS header? Maybe generate a WCS (which will require a lot more command-line options as well). Maybe we want to support other output formats. And of course the image generating itself could grow vastly more sophisticated.
Note also that the code for generating the image has no deep connection to the code for argument handling. It existed before we added argument parsing. So maybe that independence should be made explicit by breaking these stages into separate functions. Note also that generating the image just involved creating a Numpy array. We don't necessarily have to write that out to a file at all. Maybe we could use that function as just one step in a pipeline that will do some analysis (source counting, etc.) on our generated image.
To get this script under control we'll start breaking it up into a few different functions. Remember how for a C program we had a main function:
int main(int argc, char** argv) { ... }
that receives all the command-line arguments? Let's make a main
function in our Python program as well and use that as a place to do all our argument handling:
%%file simcluster.py
#!/usr/bin/env python
import argparse # We added a new import statement for argarse
import sys # Also import sys for sys.argv
import numpy as np
import astropy.io.fits as fits
from astropy.convolution import Gaussian2DKernel, convolve_fft
def main(argv):
# Create an ArgumentParser--a special object that keeps track of all the
# arguments we want our script to be able to handle, and then implements parsing
# them from sys.argv
parser = argparse.ArgumentParser(description="Generates simulated images of clusters")
# Add an optional argument for # of stars (default=10000)
parser.add_argument('-s', '--stars', type=int, default=10000,
help='the number of stars to generate')
# Add the x and y arguments
parser.add_argument('-x', type=int, default=512,
help='the x dimension (in pixels) of the image')
parser.add_argument('-y', type=int, default=512,
help='the y dimension (in pixels) of the image')
# Add an argument to handle the output file. If we use argparse.FileType it will
# handle opening a writeable file (and ensuring we can write to it).
# Note that the argument name 'file' does not beging with a '-' or '--'; this indicates
# to argparse that it is a *positional* argument
parser.add_argument('file', type=argparse.FileType('w'),
help='the name of the output file')
args = parser.parse_args(argv)
n_stars = args.stars
nx = args.x
ny = args.y
fileobj = args.file
# Create empty image
image = np.zeros((ny, nx))
# Generate random positions
r = np.random.random(n_stars) * nx
theta = np.random.uniform(0., 2. * np.pi, n_stars)
# Generate random fluxes
fluxes = np.random.random(n_stars) ** 2
# Compute position
x = nx / 2 + r * np.cos(theta)
y = ny / 2 + r * np.sin(theta)
# Add stars to image
# ==> First for loop and if statement <==
for idx in range(n_stars):
if x[idx] >= 0 and x[idx] < nx and y[idx] >= 0 and y[idx] < ny:
image[y[idx], x[idx]] += fluxes[idx]
# Convolve with a gaussian
kernel = Gaussian2DKernel(stddev=1)
image = convolve_fft(image, kernel)
# Add noise
image += np.random.normal(1., 0.001, image.shape)
# Write to a FITS file
fits.writeto(fileobj, image, clobber=True)
if __name__ == '__main__':
main(sys.argv[1:])
Overwriting simcluster.py
Now let's move the code for actually generating the pixels of the simulated image into its own function:
%%file simcluster.py
#!/usr/bin/env python
import argparse # We added a new import statement for argarse
import sys # Also import sys for sys.argv
import numpy as np
import astropy.io.fits as fits
from astropy.convolution import Gaussian2DKernel, convolve_fft
def simulated_cluster(n_stars=10000, dimensions=(512, 512)):
"""
Generates an image simulating a cluster of stars, including
a Gaussian filter and background noise.
Parameters
----------
n_stars : `int`
A positive integer giving the number of visible stars in the image
(default: 10000).
dimensions : `tuple`
A two-tuple of positive integers specifying the dimensions (in pixels)
of the output image (default: 512x512).
Returns
-------
array : `~numpy.ndarray`
A 2D Numpy array containing the pixels of the generated image.
"""
nx, ny = dimensions
# Create empty image
image = np.zeros((ny, nx))
# Generate random positions
r = np.random.random(n_stars) * nx
theta = np.random.uniform(0., 2. * np.pi, n_stars)
# Generate random fluxes
fluxes = np.random.random(n_stars) ** 2
# Compute position
x = nx / 2 + r * np.cos(theta)
y = ny / 2 + r * np.sin(theta)
# Add stars to image
# ==> First for loop and if statement <==
for idx in range(n_stars):
if x[idx] >= 0 and x[idx] < nx and y[idx] >= 0 and y[idx] < ny:
image[y[idx], x[idx]] += fluxes[idx]
# Convolve with a gaussian
kernel = Gaussian2DKernel(stddev=1)
image = convolve_fft(image, kernel)
# Add noise
image += np.random.normal(1., 0.001, image.shape)
return image
def main(argv):
"""Main function for the simcluster.py script."""
# Create an ArgumentParser--a special object that keeps track of all the
# arguments we want our script to be able to handle, and then implements parsing
# them from sys.argv
parser = argparse.ArgumentParser(description="Generates simulated images of clusters")
# Add an optional argument for # of stars (default=10000)
parser.add_argument('-s', '--stars', type=int, default=10000,
help='the number of stars to generate')
# Add the x and y arguments
parser.add_argument('-x', type=int, default=512,
help='the x dimension (in pixels) of the image')
parser.add_argument('-y', type=int, default=512,
help='the y dimension (in pixels) of the image')
# Add an argument to handle the output file. If we use argparse.FileType it will
# handle opening a writeable file (and ensuring we can write to it).
# Note that the argument name 'file' does not beging with a '-' or '--'; this indicates
# to argparse that it is a *positional* argument
parser.add_argument('file', type=argparse.FileType('w'),
help='the name of the output file')
args = parser.parse_args(argv)
image = simulated_cluster(n_stars=args.stars, dimensions=(args.x, args.y))
# Write to a FITS file
# For now the file writing is simple enough that we leave it in the main() function; in the future
# we may want to break this out into its own routine that takes a filename and the image array
# (and any other relevant options) and handles the writing
fits.writeto(args.file, image, clobber=True)
if __name__ == '__main__':
main(sys.argv[1:])
Overwriting simcluster.py
The most mysterious aspect of this last version is probably the incantation at the end:
if __name__ == '__main__':
main(sys.argv[1:])
Clearly this reads sys.argv
and passes all the argument (except the first, which is the program name) to our main()
function. But what is if __name__ == '__main__':
?
Every Python source file read by Python (also known as a "module" in Python parlance), such as our simcluster.py
file, automatically gets a variable that is global (to that file, not to the entire Python interpreter) variable called __name__
defined. Typically __name__
just comes from the filename, but with the .py
stripped off. Nearly every library or "module" you load in Python comes from a file somewhere, and has this __name__
attribute. For example:
import argparse
argparse.__file__
'/usr/lib/python2.7/argparse.pyc'
argparse.__name__
'argparse'
However, there is a caveat. When we import argparse
, argparse.__name__
is the same as the file name. However, when we run a .py
file as a script, as we are doing when we run simcluster.py
at the command-line, it gets a special __name__
with the value '__main__'
. We can see this with a simple test script:
%%file maintest.py
print __name__
Overwriting maintest.py
!python maintest.py
__main__
This is how we know that this file was the main entry point to our program. Any other Python module imported into our program still reports its normal __name__
. (Interesting note, when you run the Python REPL, all the commands you run are actually in a default module called '__main__'
. Try it yourself--run python
and then type print __name__
).
The reason we put this incantation in our script is to note that we have now separated out a function called simulated_cluster
, which we might like to use sometime in an interactive Python session, or perhaps as part of a different program. In other words, simcluster.py
is no longer just a script--it has also become a (small) library of functions, that can also optionally be run as a script. That is, rather than call ./simcluster.py
from the command-line, we can also import it in our Python session:
import simcluster
Note that when we import simcluster
, its __name__
is no longe '__main__'
:
simcluster.__name__
'simcluster'
That means we can import the module without it automatically running the main
function of the script. We can then call functions from within the module in our interactive session:
simcluster.simulated_cluster(100, (10, 10))
array([[ 1.08060842, 1.07781005, 1.06687394, 1.04314849, 1.02514778, 1.07103758, 1.21289128, 1.30040318, 1.1865629 , 1.05154149], [ 1.14474091, 1.15789996, 1.14743815, 1.0866468 , 1.04366706, 1.09124621, 1.19923599, 1.2307618 , 1.13277319, 1.03827623], [ 1.14379661, 1.18677995, 1.2045861 , 1.13782828, 1.09185713, 1.10385719, 1.12215755, 1.10702282, 1.06145937, 1.02006527], [ 1.1888733 , 1.21211689, 1.21824269, 1.18211722, 1.18498579, 1.198651 , 1.13984625, 1.08152114, 1.04217991, 1.01276278], [ 1.25177213, 1.2134936 , 1.19002654, 1.2160672 , 1.26027973, 1.28623961, 1.18859329, 1.0913626 , 1.0357741 , 1.0089897 ], [ 1.14867159, 1.12709839, 1.16200597, 1.23744945, 1.2437626 , 1.22530688, 1.15773793, 1.09226373, 1.04001322, 1.00919381], [ 1.03924094, 1.0634581 , 1.14238012, 1.20993034, 1.18198106, 1.15080055, 1.12951789, 1.11843634, 1.0859279 , 1.03178174], [ 1.02265141, 1.05677639, 1.13390076, 1.17900249, 1.15885219, 1.15614676, 1.16340906, 1.2117383 , 1.19567374, 1.09095349], [ 1.06123756, 1.06292554, 1.08276469, 1.09781104, 1.09732647, 1.1150451 , 1.1707453 , 1.28957764, 1.30740887, 1.15728917], [ 1.09167092, 1.06188045, 1.02819676, 1.02438726, 1.02829093, 1.04469458, 1.09684281, 1.20752832, 1.2506197 , 1.14320566]])
We have done a nice job so far of separating the implementation of our algorithm (the actual "science") from the "user-friendly" command-line interface that we now provide to it. Our simcluster.py
module is both an importable Python module (or library), but also acts as a command-line program when executed.
However, we can still do better. There are at least a few issues with this (in the case of this simple program hypothetical, but still realistic for a more complicated application):
There's a degree to which, by putting .py
in the filename, we're exposing how the sausage is made. Maybe we just want people to be able to make simulated images without knowing or caring what programming language it happened to be implemented in. We might even change the implementation language in the distant future--or maybe we're writing a replacement for an already existing program. If we're the only user it doesn't matter so much, but if we want to share our program with colleagues it might be nicer if we just gave them a program called simcluster
(without the .py
).
Right now our "library" code consists of a single function simulated_cluster
. But we've also mentioned that in the future it might grow more components. Real simulation code might involve dozens of subroutines and utilitiy functions that we might want to break up into separate files. It might become unmaintainable if we insist that all the code for simcluster
stay in a single file.
Whether our code consists of a single file, or a bundle of files, we want to be able to install it to a standard location on the operating system. That way we can run the simcluster
program just like any other program like ls
or grep
or ds9
. We also want to be able to import simcluster
from our Python code just like we import argparse
or import astropy
.
In this section we'll address the first bullet point. In the subsequent section we'll address the third. (The middle bullet point also pertains to the first one, but a little more softfly in our example program since the code is very small.)
What we are going to do now is separate our "library" code, from the command-line interface, this time by moving them into entirely separate files. The "library" code, currently just consisting of our simulated_cluster
function can stay where it is in simcluster.py
. We will move the main()
function into a new file called just simcluster
:
%%file simcluster.py
"""Utilities for generating simulated astronomical images."""
__version__ = '0.1'
import numpy as np
from astropy.convolution import Gaussian2DKernel, convolve_fft
def simulated_cluster(n_stars=10000, dimensions=(512, 512)):
"""
Generates an image simulating a cluster of stars, including
a Gaussian filter and background noise.
Parameters
----------
n_stars : `int`
A positive integer giving the number of visible stars in the image
(default: 10000).
dimensions : `tuple`
A two-tuple of positive integers specifying the dimensions (in pixels)
of the output image (default: 512x512).
Returns
-------
array : `~numpy.ndarray`
A 2D Numpy array containing the pixels of the generated image.
"""
nx, ny = dimensions
# Create empty image
image = np.zeros((ny, nx))
# Generate random positions
r = np.random.random(n_stars) * nx
theta = np.random.uniform(0., 2. * np.pi, n_stars)
# Generate random fluxes
fluxes = np.random.random(n_stars) ** 2
# Compute position
x = nx / 2 + r * np.cos(theta)
y = ny / 2 + r * np.sin(theta)
# Add stars to image
# ==> First for loop and if statement <==
for idx in range(n_stars):
if x[idx] >= 0 and x[idx] < nx and y[idx] >= 0 and y[idx] < ny:
image[y[idx], x[idx]] += fluxes[idx]
# Convolve with a gaussian
kernel = Gaussian2DKernel(stddev=1)
image = convolve_fft(image, kernel)
# Add noise
image += np.random.normal(1., 0.001, image.shape)
return image
Overwriting simcluster.py
Note that we also added a __version__
variable in simcluster.py
, as well as a docstring for the module. Using __version__
to specify a library's version is standard practice. But you should also think about how to use the version number effectively. Time permitting we can have a discussion about that, but in short there are many documented schemes for how to version your software. One popular one is called "semantic versions" or "semver" for short. Most Python projects in the wild use some variation on "semver", if not "semver" in the stricted definition.
%%file simcluster
#!/usr/bin/env python
import sys
import argparse
from astropy.io import fits
from simcluster import simulated_cluster
def main(argv):
"""Main function for the simcluster.py script."""
# Create an ArgumentParser--a special object that keeps track of all the
# arguments we want our script to be able to handle, and then implements parsing
# them from sys.argv
parser = argparse.ArgumentParser(description="Generates simulated images of clusters")
# Add an optional argument for # of stars (default=10000)
parser.add_argument('-s', '--stars', type=int, default=10000,
help='the number of stars to generate')
# Add the x and y arguments
parser.add_argument('-x', type=int, default=512,
help='the x dimension (in pixels) of the image')
parser.add_argument('-y', type=int, default=512,
help='the y dimension (in pixels) of the image')
# Add an argument to handle the output file. If we use argparse.FileType it will
# handle opening a writeable file (and ensuring we can write to it).
# Note that the argument name 'file' does not beging with a '-' or '--'; this indicates
# to argparse that it is a *positional* argument
parser.add_argument('file', type=argparse.FileType('w'),
help='the name of the output file')
args = parser.parse_args(argv)
image = simulated_cluster(n_stars=args.stars, dimensions=(args.x, args.y))
# Write to a FITS file
# For now the file writing is simple enough that we leave it in the main() function; in the future
# we may want to break this out into its own routine that takes a filename and the image array
# (and any other relevant options) and handles the writing
fits.writeto(args.file, image, clobber=True)
if __name__ == '__main__':
main(sys.argv[1:])
Overwriting simcluster
Note that now all simcluster.py
contains is a Python function definition. There is no code in it (like the main
function) that is executable in the context of a command-line script. So we will remove the +x
executable mode from the file (but add it to the new simcluster
file):
! chmod -x simcluster.py
! chmod +x simcluster
As mentioned in the previous section, although we have nicely separated our library code from our user-interface code, we now have a new problem: Instead of a nice, single script file that we can e-mail around to our colleagues, we now how two files to deal with, at least one of which depends on the other. How do we keep things straight--how do we make sure that simcluster
is executable, and that simcluster.py
is importable from within Python?
We all have a fairly "standard" execution $PATH
for our programs (by "standard" we mean standard for a particular platform--this actually varies wildly between OS's):
! echo $PATH
/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
When we give people (or ourselves) the simcluster
script we could ask them to copy it to on eof those standard paths. But which one? We'd have to give instructions for all OS's, and even with that things can be messy and complicated. Things get even worse when it comes to Python libraries. Python has a number of standard paths that it searches for libraries, and this varies depending on what Python version you're using. To see where Python searches, print out the variable sys.path
from the sys
module:
import sys
sys.path
['', '/usr/local/lib/python2.7/dist-packages/prettyplotlib-0.1.4-py2.7.egg', '/usr/local/lib/python2.7/dist-packages/brewer2mpl-1.3.2-py2.7.egg', '/usr/lib/python2.7', '/usr/lib/python2.7/plat-linux2', '/usr/lib/python2.7/lib-tk', '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload', '/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages/PIL', '/usr/lib/python2.7/dist-packages/gst-0.10', '/usr/lib/python2.7/dist-packages/gtk-2.0', '/usr/lib/pymodules/python2.7', '/usr/lib/python2.7/dist-packages/ubuntu-sso-client', '/usr/lib/python2.7/dist-packages/ubuntuone-client', '/usr/lib/python2.7/dist-packages/ubuntuone-couch', '/usr/lib/python2.7/dist-packages/ubuntuone-installer', '/usr/lib/python2.7/dist-packages/ubuntuone-storage-protocol', '/usr/local/lib/python2.7/dist-packages/IPython/extensions']
Each value in sys.path
is a directory on your filesystem. When you run the code import simcluster
it searches each of those directories, in order (the first entry ''
means the current directory) for a file called simcluster.py
to import.
Note: Alternatively, it looks for a directory called simcluster
containing a file named simcluster/__init__.py
. That indicates that simcluster
is a "package", which is beyond the scope of this tutorial.
Knowing where to properly install a Python module can be a nightmare for novices.
Fortunately, your Python distribution does know where best to install both scripts and Python modules (so long as you have permission to install to those locations--there are ways of specifying alternative paths, but that is again outside the scope of this tutorial).
The "standard" way to install Python software is to write a special script called setup.py
, which uses a library called setuptools
(or sometimes the older distutils
). You may have seen this before if you have ever downloaded a third-party Python library. The library probably came in some archive file (either .zip
or .tar.gz
) containing all the source code for the library and/or scripts, along with a setup.py
file, and a few other files (a license, a README, etc.)
Users can simply unpack the archive, and run the command
./setup.py install
to install all the software in your package to the correct locations for your particular Python distribution (this will work even with Anaconda and the like). The key is that you don't have to implement the details of how this command works. All of that is already done by Python. All you have to do is provide a little boilerplate in the setup.py
file that looks something like:
%%file setup.py
#!/usr/bin/env python
from setuptools import setup
setup(
name='simcluster',
author='Your Name Here!',
author_email='your.name@example.com', # so people can pester you ;)
version='0.1',
py_modules=['simcluster'], # Python modules to install (without the .py in the filename)
scripts=['simcluster'] # This is the full name of the script "simcluster"; this will be installed to a bin/ directory
)
Overwriting setup.py
When the setup.py
script is run all that happens is the setup()
function is called. This passes all the information about your project to the Python setuptools/distutils infrastructure, and it uses it to implement an entire command-line interface for building and installing your project, as we'll see.
Next, be kind to your users and make the setup.py
script executable as well:
! chmod +x setup.py
Now when we run this setup.py
we can see that it implements a whole lot of functionality that we didn't have to write:
! ./setup.py --help
Common commands: (see '--help-commands' for more) setup.py build will build the package underneath 'build/' setup.py install will install the package Global options: --verbose (-v) run verbosely (default) --quiet (-q) run quietly (turns verbosity off) --dry-run (-n) don't actually do anything --help (-h) show detailed help message --no-user-cfg ignore pydistutils.cfg in your home directory --command-packages list of packages that provide distutils commands Information display options (just display information, ignore any commands) --help-commands list all available commands --name print package name --version (-V) print package version --fullname print <package name>-<version> --author print the author's name --author-email print the author's email address --maintainer print the maintainer's name --maintainer-email print the maintainer's email address --contact print the maintainer's name if known, else the author's --contact-email print the maintainer's email address if known, else the author's --url print the URL for this package --license print the license of the package --licence alias for --license --description print the package description --long-description print the long package description --platforms print the list of platforms --classifiers print the list of classifiers --keywords print the list of keywords --provides print the list of packages/modules provided --requires print the list of packages/modules required --obsoletes print the list of packages/modules made obsolete usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...] or: setup.py --help [cmd1 cmd2 ...] or: setup.py --help-commands or: setup.py cmd --help
To install the package we just run ./setup.py install
. We can also run ./setup.py install --help
to see additional options for controlling where files are installed to. But in general, it will by default copy files to their appropriate locations (so long as you have permission to write to those locations on the filesystem-- you may have to run sudo ./setup.py install
).
setup.py
also provides us with a number of other services. For example, it was mentioned earlier that when you download a Python library it comes in an archive file (usually referred to as a "source distribution"). Well, you can make your own simply by running:
! ./setup.py sdist
running sdist running egg_info writing simcluster.egg-info/PKG-INFO writing top-level names to simcluster.egg-info/top_level.txt writing dependency_links to simcluster.egg-info/dependency_links.txt reading manifest file 'simcluster.egg-info/SOURCES.txt' writing manifest file 'simcluster.egg-info/SOURCES.txt' warning: sdist: standard file not found: should have one of README, README.txt warning: check: missing required meta-data: url creating simcluster-0.1 creating simcluster-0.1/simcluster.egg-info making hard links in simcluster-0.1... hard linking setup.py -> simcluster-0.1 hard linking simcluster -> simcluster-0.1 hard linking simcluster.py -> simcluster-0.1 hard linking simcluster.egg-info/PKG-INFO -> simcluster-0.1/simcluster.egg-info hard linking simcluster.egg-info/SOURCES.txt -> simcluster-0.1/simcluster.egg-info hard linking simcluster.egg-info/dependency_links.txt -> simcluster-0.1/simcluster.egg-info hard linking simcluster.egg-info/top_level.txt -> simcluster-0.1/simcluster.egg-info Writing simcluster-0.1/setup.cfg Creating tar archive removing 'simcluster-0.1' (and everything under it)
This automatically creates an archive with all the required files for your project (though sometimes you need to hold its hand a bit for it to know what all the files are that it needs, see Python documentation for more details).
By default, it automatically creates the file in a subdirectory of your project called dist/
(note on OSX and Linux it usually creates a .tar.gz
by default, while on Windows it creates a .zip
by default, so the following example may not work on Windows):
! tar -tf dist/simcluster-0.1.tar.gz
simcluster-0.1/ simcluster-0.1/PKG-INFO simcluster-0.1/setup.cfg simcluster-0.1/simcluster simcluster-0.1/simcluster.egg-info/ simcluster-0.1/simcluster.egg-info/SOURCES.txt simcluster-0.1/simcluster.egg-info/PKG-INFO simcluster-0.1/simcluster.egg-info/dependency_links.txt simcluster-0.1/simcluster.egg-info/top_level.txt simcluster-0.1/setup.py simcluster-0.1/simcluster.py
Finally, if you're really ambitious and want to share your code with the rest of the world you can post it to PyPI, the Python Package Index (also known to old-timers as "the cheeseshop"). When your project is on PyPI then people don't even have to manually download and unpack the source distribution. They can just install your package with pip
, the Python installer. You can just tell your users to run:
pip install simcluster
and pip will do the rest. This only works if your project has a setup.py
that implements these standard interfaces. If you have an account on PyPI you can post your project to PyPI by running:
./setup.py register sdist upload
This is actually three commands in one: the register
command tells PyPI you want to reserve a project name called "simcluster" and that its latest version is "0.1" (along with any other metadata you provided setup.py
such as your name). Then sdist
builds a source distribution archive as we already saw. Finally upload
uploads the source archive to PyPI (this is separate from register
which only posts the metadata).
You are now the proud owner of a Python package!
Getting back to our code, let's see if there some other improvements we can make...
If you look carefully, you'll note also that I now have some repeated information--there are two copies of my default values for n_stars
and the image dimensions: My simulated_cluster
function has some default values for those arguments, and I also have a set of defaults that I gave to the ArgumentParser
. Maybe I want these to have different defaults--there's nothing wrong with that. But I might also consider some way of adding this information in one place. There are many approaches one could take for this. One of the simplest might be a global variable listing defaults in one place:
%%file simcluster.py
"""Utilities for generating simulated astronomical images."""
import numpy as np
from astropy.convolution import Gaussian2DKernel, convolve_fft
CLUSTER_DEFAULTS = {
'stars': 10000,
'dimensions': (512, 512)
}
def simulated_cluster(n_stars=10000, dimensions=(512, 512)):
"""
Generates an image simulating a cluster of stars, including
a Gaussian filter and background noise.
Parameters
----------
n_stars : `int`
A positive integer giving the number of visible stars in the image
(default: 10000).
dimensions : `tuple`
A two-tuple of positive integers specifying the dimensions (in pixels)
of the output image (default: 512x512).
Returns
-------
array : `~numpy.ndarray`
A 2D Numpy array containing the pixels of the generated image.
"""
nx, ny = dimensions
# Create empty image
image = np.zeros((ny, nx))
# Generate random positions
r = np.random.random(n_stars) * nx
theta = np.random.uniform(0., 2. * np.pi, n_stars)
# Generate random fluxes
fluxes = np.random.random(n_stars) ** 2
# Compute position
x = nx / 2 + r * np.cos(theta)
y = ny / 2 + r * np.sin(theta)
# Add stars to image
# ==> First for loop and if statement <==
for idx in range(n_stars):
if x[idx] >= 0 and x[idx] < nx and y[idx] >= 0 and y[idx] < ny:
image[y[idx], x[idx]] += fluxes[idx]
# Convolve with a gaussian
kernel = Gaussian2DKernel(stddev=1)
image = convolve_fft(image, kernel)
# Add noise
image += np.random.normal(1., 0.001, image.shape)
return image
Overwriting simcluster.py
%%file simcluster
#!/usr/bin/env python
import sys
import argparse
from astropy.io import fits
from simcluster import simulated_cluster, CLUSTER_DEFAULTS
def main(argv):
"""Main function for the simcluster.py script."""
# Create an ArgumentParser--a special object that keeps track of all the
# arguments we want our script to be able to handle, and then implements parsing
# them from sys.argv
parser = argparse.ArgumentParser(description="Generates simulated images of clusters")
# Add an optional argument for # of stars (default=10000)
parser.add_argument('-s', '--stars', type=int, default=CLUSTER_DEFAULTS['stars'],
help='the number of stars to generate')
# Add the x and y arguments
parser.add_argument('-x', type=int, default=CLUSTER_DEFAULTS['dimensions'][0],
help='the x dimension (in pixels) of the image')
parser.add_argument('-y', type=int, default=CLUSTER_DEFAULTS['dimensions'][1],
help='the y dimension (in pixels) of the image')
# Add an argument to handle the output file. If we use argparse.FileType it will
# handle opening a writeable file (and ensuring we can write to it).
# Note that the argument name 'file' does not beging with a '-' or '--'; this indicates
# to argparse that it is a *positional* argument
parser.add_argument('file', type=argparse.FileType('w'),
help='the name of the output file')
args = parser.parse_args(argv)
image = simulated_cluster(n_stars=args.stars, dimensions=(args.x, args.y))
# Write to a FITS file
# For now the file writing is simple enough that we leave it in the main() function; in the future
# we may want to break this out into its own routine that takes a filename and the image array
# (and any other relevant options) and handles the writing
fits.writeto(args.file, image, clobber=True)
if __name__ == '__main__':
main(sys.argv[1:])
Overwriting simcluster
Another possible approach would have been to determine the default arguments to simulated_cluster
directly from the function definition itself by using the inspect
module from the Python standard library. This will be left as an exercise.