# A quick - and highly biased - tour of Python for X-ray Astronomy¶

## Doug Burke, Chandra X-ray Center¶

### Presented at the COSPAR Capacity Building Workshop, 'X-ray Astrophysics School', Ensenada, Mexico, November 2014¶

This is a talk presented using IPython notebooks - now re-branded as Project Jupyter - which allows you to embed code and documentation, and is being enthusiastically endorsed by the "Open Science" movement. In particular, you can send notebooks to collaborators to allow "reproducible" science, as well as being used for demonstrations, as we will see below.

The contents of this talk - where applicable, so not the images - are placed in the Public Domain.

## What is Python?¶

Oops; let's focus on the programming language:

## Why Python?¶

There are a lot of Astronomers - and people in related (i.e. useful) fields - who provide tutorials, code, modules, workshops, and help. For instance, I saw this on Monday:

"Frequentism and Bayesianism: A Python-driven Primer", Jake VanderPlas, November 18, 2014:

This paper presents a brief, semi-technical comparison of the essential features of the frequentist and Bayesian approaches to statistical inference, with several illustrative examples implemented in Python. The differences between frequentism and Bayesianism fundamentally stem from differing definitions of probability, a philosophical divide which leads to distinct approaches to the solution of statistical problems as well as contrasting ways of asking and answering questions about unknown parameters. After an example-driven discussion of these differences, we briefly compare several leading Python statistical packages which implement frequentist inference using classical methods and Bayesian inference using Markov Chain Monte Carlo.

http://arxiv.org/abs/1411.5018

There are many examples; here is one I found that I thought would be relevant for this workshop:

http://allendowney.blogspot.mx/2014/11/the-world-cup-problem-part-2-germany-v.html

## Yes, yes, I get it, how do I start Python?¶

Starting python for interactive use, use ipython (you can use python for interactive analysis, but it's not as nice):

unix% ipython
Python 2.7.6 (default, Apr 18 2014, 17:48:56)

IPython 2.0.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import antigravity

In [2]: import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!



If you want to "run a script" - i.e. something that you don't want to run interactively - then you can just say

python my-wonderful-script-for-suzaku-analysis.py



and you can even make scripts executable and use a "shebang" line to avoid the need to say python (e.g. the CIAO chandra_repro script is written in Python).

Note that there are several versions to be aware of:

• the Python version ("big" changes between Python 2.x and Python 3.y)

• the IPython version

• the versions of various packages/systems we use (e.g. NumPy and Matplotlib; these versions aren't shown here)

You should also be aware of R - The R project for Statistical Computing - which is not Python (and you can combine both if you realy want to), but is important as Astronomy moves into the era of

## Terminology¶

• Object Oriented

• Dynamic Typing

• Strongly Typed

I have a brief introduction at another IPython notebook but there are many other, better, examples around.

Computer Science has its own terminology, just like X-ray Astronomy.

In [1]:
from IPython.display import Image
Image('question.jpg')

Out[1]:

Thanks to Steve for the photo opportunity. This photograph is copyright Douglas Burke, 2014 and is not for re-use without express permission.

• XMM has patterns, Chandra has ???

• Optical has blue and red, X-ray Astronomers use ??? and ???

## Important words and phrases¶

(* not Wolfram Alpha!)

Note: many of these packages are changing fast.

• self promotion, as I'm an author of this paper...

Acknowledging or Citing Astropy In Publications

If you use Astropy for work/research presented in a publication (whether directly, or as a dependency to another package), we would be grateful if you could include the following acknowledgment:

This research made use of Astropy, a community-developed core Python package for Astronomy (Astropy Collaboration, 2013).

where (Astropy Collaboration, 2013) is a citation to the Astropy Paper (ADS - BibTeX). If you wish, you can also include a link to http://www.astropy.org (if the journal allows this) in addition to the above text.

• and just to re-iterate this; here's the paper on ADS

Quick question: who uses the "new" ADS interface?

• here's the Machine Learning book; or at least its cover

• there are plenty of tutorials using IPython notebooks

## Installation¶

It's as easy as CIAO, SAS, and FTOOLS, all rolled up as one ...

CIAO has its own Python installation (works well in most cases, but it does not play nicely with other systems). Radio and HST have something similar.

A popular installation is Anaconda https://store.continuum.io/cshop/anaconda/ since it is a binary installation (SciPy can be a tad awkward to build) with a lot of scientific software provided, including astropy.

Your computer almost-certainly has its own Python installation.

Mixing and matching the different installations will lead to

## Using (I)Python¶

For the notebook I need to tell Matplotlib to add the images "into" the notebook, rather than throw up a window.

Err, I don't think I thought that sentence through...

In [2]:
%matplotlib inline


The main reason for using Python is because there's code out there that is useful to us; here's some useful imports - if you say

ipython --pylab



then this is done for you (note that this depends on the version of IPython you are using; e.g. ipython -pylab or ipython --matplotlib qt).

In [3]:
import numpy as np
import matplotlib.pyplot as plt

In [4]:
plt.rcParams['figure.figsize'] = (8, 6)


I am going to use AstroPy - remember: http://www.astropy.org/ - to read in FITS data:

In [5]:
from astropy.io import fits


You can use help(fits), but the output is a bit long for this example, so I've picked a symbol somewhat at random:

In [6]:
help(plt.plot)

Help on function plot in module matplotlib.pyplot:

plot(*args, **kwargs)
Plot lines and/or markers to the
:class:~matplotlib.axes.Axes.  *args* is a variable length
argument, allowing for multiple *x*, *y* pairs with an
optional format string.  For example, each of the following is
legal::

plot(x, y)        # plot x and y using default line style and color
plot(x, y, 'bo')  # plot x and y using blue circle markers
plot(y)           # plot y using x as index array 0..N-1
plot(y, 'r+')     # ditto, but with red plusses

If *x* and/or *y* is 2-dimensional, then the corresponding columns
will be plotted.

An arbitrary number of *x*, *y*, *fmt* groups can be
specified, as in::

a.plot(x1, y1, 'g^', x2, y2, 'g-')

Return value is a list of lines that were added.

By default, each line is assigned a different color specified by a
'color cycle'.  To change this behavior, you can edit the
axes.color_cycle rcParam.

The following format string characters are accepted to control
the line style or marker:

================    ===============================
character           description
================    ===============================
'-'             solid line style
'--'            dashed line style
'-.'            dash-dot line style
':'             dotted line style
'.'             point marker
','             pixel marker
'o'             circle marker
'v'             triangle_down marker
'^'             triangle_up marker
'<'             triangle_left marker
'>'             triangle_right marker
'1'             tri_down marker
'2'             tri_up marker
'3'             tri_left marker
'4'             tri_right marker
's'             square marker
'p'             pentagon marker
'*'             star marker
'h'             hexagon1 marker
'H'             hexagon2 marker
'+'             plus marker
'x'             x marker
'D'             diamond marker
'd'             thin_diamond marker
'|'             vline marker
'_'             hline marker
================    ===============================

The following color abbreviations are supported:

==========  ========
character   color
==========  ========
'b'         blue
'g'         green
'r'         red
'c'         cyan
'm'         magenta
'y'         yellow
'k'         black
'w'         white
==========  ========

In addition, you can specify colors in many weird and
wonderful ways, including full names ('green'), hex
strings ('#008000'), RGB or RGBA tuples ((0,1,0,1)) or
grayscale intensities as a string ('0.8').  Of these, the
string specifications can be used in place of a fmt group,
but the tuple forms can be used only as kwargs.

Line styles and colors are combined in a single format string, as in
'bo' for blue circles.

The *kwargs* can be used to set line properties (any property that has
a set_* method).  You can use this to set a line label (for auto
legends), linewidth, anitialising, marker face color, etc.  Here is an
example::

plot([1,2,3], [1,2,3], 'go-', label='line 1', linewidth=2)
plot([1,2,3], [1,4,9], 'rs',  label='line 2')
axis([0, 4, 0, 10])
legend()

If you make multiple lines with one plot command, the kwargs
apply to all those lines, e.g.::

plot(x1, y1, x2, y2, antialised=False)

Neither line will be antialiased.

You do not need to use format strings, which are just
abbreviations.  All of the line properties can be controlled
by keyword arguments.  For example, you can set the color,
marker, linestyle, and markercolor with::

plot(x, y, color='green', linestyle='dashed', marker='o',
markerfacecolor='blue', markersize=12).

See :class:~matplotlib.lines.Line2D for details.

The kwargs are :class:~matplotlib.lines.Line2D properties:

agg_filter: unknown
alpha: float (0.0 transparent through 1.0 opaque)
animated: [True | False]
antialiased or aa: [True | False]
axes: an :class:~matplotlib.axes.Axes instance
clip_box: a :class:matplotlib.transforms.Bbox instance
clip_on: [True | False]
clip_path: [ (:class:~matplotlib.path.Path,         :class:~matplotlib.transforms.Transform) |         :class:~matplotlib.patches.Patch | None ]
color or c: any matplotlib color
contains: a callable function
dash_capstyle: ['butt' | 'round' | 'projecting']
dash_joinstyle: ['miter' | 'round' | 'bevel']
dashes: sequence of on/off ink in points
drawstyle: ['default' | 'steps' | 'steps-pre' | 'steps-mid' |                   'steps-post']
figure: a :class:matplotlib.figure.Figure instance
fillstyle: ['full' | 'left' | 'right' | 'bottom' | 'top' | 'none']
gid: an id string
label: string or anything printable with '%s' conversion.
linestyle or ls: ['-' | '--' | '-.' | ':' | 'None' |                   ' ' | '']         and any drawstyle in combination with a linestyle, e.g., 'steps--'.
linewidth or lw: float value in points
lod: [True | False]
marker: unknown
markeredgecolor or mec: any matplotlib color
markeredgewidth or mew: float value in points
markerfacecolor or mfc: any matplotlib color
markerfacecoloralt or mfcalt: any matplotlib color
markersize or ms: float
markevery: unknown
path_effects: unknown
picker: float distance in points or callable pick function         fn(artist, event)
rasterized: [True | False | None]
sketch_params: unknown
snap: unknown
solid_capstyle: ['butt' | 'round' |  'projecting']
solid_joinstyle: ['miter' | 'round' | 'bevel']
transform: a :class:matplotlib.transforms.Transform instance
url: a url string
visible: [True | False]
xdata: 1D array
ydata: 1D array
zorder: any number

kwargs *scalex* and *scaley*, if defined, are passed on to
:meth:~matplotlib.axes.Axes.autoscale_view to determine
whether the *x* and *y* axes are autoscaled; the default is
*True*.

Additional kwargs: hold = [True|False] overrides default hold state



The reason for all the :class... and extra chatacters, such as * and ~ - is to make a "nice" HTML page:

http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.plot

You can find out what symbols - i.e. the name of a "thing" - are attached to another symbol using the dir command:

In [7]:
dir(fits)

Out[7]:
['BinTableHDU',
'Card',
'CardList',
'ColDefs',
'Column',
'CompImageHDU',
'Conf',
'DELAYED',
'Delayed',
'ENABLE_RECORD_VALUED_KEYWORD_CARDS',
'EXTENSION_NAME_CASE_SENSITIVE',
'FITSDiff',
'FITS_rec',
'FITS_record',
'FitsHDU',
'Group',
'GroupData',
'GroupsHDU',
'HDUDiff',
'HDUList',
'ImageHDU',
'PrimaryHDU',
'Section',
'StreamingHDU',
'TableHDU',
'USE_MEMMAP',
'Undefined',
'VerifyError',
'__all__',
'__builtins__',
'__doc__',
'__file__',
'__name__',
'__package__',
'__path__',
'_config',
'append',
'card',
'column',
'compression',
'conf',
'convenience',
'create_card',
'create_card_from_string',
'delval',
'diff',
'file',
'fitsrec',
'getdata',
'getval',
'hdu',
'info',
'new_table',
'open',
'py3compat',
'register_hdu',
'setval',
'tabledump',
'tcreate',
'tdump',
'unregister_hdu',
'update',
'upper_key',
'util',
'verify',
'writeto']

Don't forget to talk about tab completion!

I want to read in a FITS file, so open looks a good place to start (plus, I've used this module before, so I know what I want to do ;-).

In [8]:
help(fits.open)

Help on function fitsopen in module astropy.io.fits.hdu.hdulist:

Factory function to open a FITS file and return an HDUList object.

Parameters
----------
name : file path, file object or file-like object
File to be opened.

mode : str
Open mode, 'readonly' (default), 'update', 'append', 'denywrite', or
'ostream'.

If name is a file object that is already opened, mode must
match the mode the file was opened with, readonly (rb), update (rb+),
append (ab+), ostream (w), denywrite (rb)).

memmap : bool
Is memory mapping to be used?

save_backup : bool
If the file was opened in update or append mode, this ensures that a
backup of the original file is saved before any changes are flushed.
The backup has the same name as the original file with ".bak" appended.
If "file.bak" already exists then "file.bak.1" is used, and so on.

kwargs : dict
optional keyword arguments, possible values are:

- **uint** : bool

Interpret signed integer data where BZERO is the
central value and BSCALE == 1 as unsigned integer
data.  For example, int16 data with BZERO = 32768
and BSCALE = 1 would be treated as uint16 data.

Note, for backward compatibility, the kwarg **uint16** may
be used instead.  The kwarg was renamed when support was
added for integers of any size.

- **ignore_missing_end** : bool

Do not issue an exception when opening a file that is
missing an END card in the last header.

- **checksum** : bool, str

If True, verifies that both DATASUM and
CHECKSUM card values (when present in the HDU header)
match the header and data of all HDU's in the file.  Updates to a
file that already has a checksum will preserve and update the
existing checksums unless this argument is given a value of
'remove', in which case the CHECKSUM and DATASUM values are not
checked, and are removed when saving changes to the file.

- **disable_image_compression** : bool

If True, treates compressed image HDU's like normal
binary table HDU's.

- **do_not_scale_image_data** : bool

If True, image data is not scaled using BSCALE/BZERO values

- **ignore_blank** : bool
If True, the BLANK keyword is ignored if present.

- **scale_back** : bool

If True, when saving changes to a file that contained scaled
image data, restore the data to the original type and reapply the
original BSCALE/BZERO values.  This could lead to loss of accuracy
if scaling back to integer values after performing floating point
operations on the data.

Returns
-------
hdulist : an HDUList object
HDUList containing all of the header data units in the
file.



Let's look in the data directory:

In [9]:
%ls data

evt2.fits  lc.fits  mystery-picture.fits  swift.evt


For some common commands you do not need the % character; so I could have said:

In [10]:
ls data

evt2.fits  lc.fits  mystery-picture.fits  swift.evt


In [11]:
hdus = fits.open('data/mystery-picture.fits')


Aside:

import astropy.io.fits
hdus = astropy.io.fits.open(...)

import astropy.io.fits as keith
hdus = keith.open(...)

import astropy.io.fits
keith = astropy.io.fits.open
hdus = keith.open(...)

# do *NOT* do the following, as it will overwrite the
# standard open function, leading to "amusing" erros
# at some time in the future
from astropy.io.fits import open
hdus = open(...)

# Similarly, don't use the name time for a variable.



If you are wondering what a symbol "means", you can try just entering it at the prompt and seeing what gets returned (remember, this is IPython, which automatically prints out anything that is "returned", as long as it is not the special value None).

In [12]:
hdus

Out[12]:
[<astropy.io.fits.hdu.image.PrimaryHDU at 0x7f7d93333310>]

So, it is an array of "things", and there's only one of these "things". So, let's access the first element, which is labelled 0 in Python:

In [13]:
ihdu = hdus[0]
ihdu

Out[13]:
<astropy.io.fits.hdu.image.PrimaryHDU at 0x7f7d93333310>

If you want to find out how long something is, use len:

In [14]:
len(hdus)

Out[14]:
1
In [15]:
len("Carlos Ã© magnÃ­fico")

Out[15]:
20

but not everything has a "length":

A quick aside on errors:

In [16]:
len(ihdu)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-729b550a1ce2> in <module>()
----> 1 len(ihdu)

TypeError: object of type 'PrimaryHDU' has no len()

As with using SAS, FTOOLS, CIAO, ... - it is helpful to know how to "read" error messages!

So, you can have multiple lines in each "cell" of the notebook that get executed, and it's the output from the last command that will get displayed. That means that

In [17]:
2 + 3
"David Hasselhoff en alemÃ¡n significa impresionante"
(-3.4 + 2j) * (4.1 - 0.8j)

Out[17]:
(-12.339999999999998+10.92j)

This is only important if you are using IPython notebooks (so I don't know why I mentioned it).

Alternatively you can use the print command (in Python 2.x it doesn't need the () around what you want to print but it is needed in 3.x and above).

In [18]:
print(ihdu)

<astropy.io.fits.hdu.image.PrimaryHDU object at 0x7f7d93333310>

In [19]:
print ihdu

<astropy.io.fits.hdu.image.PrimaryHDU object at 0x7f7d93333310>


Some Python classes have written a "human-readable" output for print (or for when you just get IPython to display the value), as we will see very soon.

In [20]:
dir(ihdu)

Out[20]:
['ImgCode',
'NumCode',
'_EXCLUDE',
'__class__',
'__delattr__',
'__dict__',
'__doc__',
'__format__',
'__getattribute__',
'__hash__',
'__init__',
'__module__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__setattr__',
'__sizeof__',
'__str__',
'__subclasshook__',
'__weakref__',
'_axes',
'_bitpix',
'_blank',
'_bscale',
'_buffer',
'_bzero',
'_calculate_checksum',
'_calculate_datasum',
'_char_encode',
'_compute_checksum',
'_compute_hdu_checksum',
'_convert_pseudo_unsigned',
'_datLoc',
'_datSpan',
'_data_needs_rescale',
'_data_offset',
'_data_replaced',
'_data_size',
'_default_name',
'_do_not_scale_image_data',
'_dtype_for_bitpix',
'_encode_byte',
'_file',
'_gcount',
'_get_raw_data',
'_get_scaled_image_data',
'_get_timestamp',
'_has_data',
'_hdrLoc',
'_hdu_registry',
'_modified',
'_new',
'_orig_bitpix',
'_orig_blank',
'_orig_bscale',
'_orig_bzero',
'_output_checksum',
'_pcount',
'_postwriteto',
'_prewriteto',
'_scale_back',
'_standard',
'_summary',
'_uint',
'_update_checksum',
'_update_uint_scale_keywords',
'_verify',
'_verify_checksum_datasum',
'_writedata',
'_writedata_direct_copy',
'_writedata_internal',
'_writeto',
'copy',
'data',
'filebytes',
'fileinfo',
'fromstring',
'is_image',
'level',
'name',
'register_hdu',
'req_cards',
'run_option',
'scale',
'section',
'shape',
'size',
'unregister_hdu',
'update_ext_name',
'update_ext_version',
'ver',
'verify',
'verify_checksum',
'verify_datasum',
'writeto']

So now I want to look at the pixel data, which I happen to know is stored in the data attribute of the PrimaryHDU object.

Aside: when do I use ()?

In [21]:
carlos = ihdu.data()

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-21-2656497dcf52> in <module>()
----> 1 carlos = ihdu.data()

TypeError: 'numpy.ndarray' object is not callable

Note that you can refer to a function - i.e. something that needs () - without them and not get an error; in fact, this can be vitally important in some cases. It does mean potential confusion for new users who don't know what is a function (thing that can get called) and what is an attribute (thing that contains information).

In reality it's more complex, since attributes can be made to call routines automatically when accessed, but that's all hidden behind the scenes and you don't need to really know about this until you start writing your own classes.

Oh, you can also tell IPython to allow you to write

symbol value



and get it to add in the brackets for you; e.g.

In [999]: open "swift-is-awesome.dat"
-------> open("swift-is-awesome.dat")
IOError: [Errno 2] No such file or directory: 'swift-is-awesome.dat'



which we use in ChIPS and Sherpa, but it's not standard, as it makes writing code hard (or at least, copying and pasting code hard).

In [22]:
carlos = ihdu.data

In [23]:
carlos == hdus[0].data

Out[23]:
array([[ True,  True,  True, ...,  True,  True,  True],
[ True,  True,  True, ...,  True,  True,  True],
[ True,  True,  True, ...,  True,  True,  True],
...,
[ True,  True,  True, ...,  True,  True,  True],
[ True,  True,  True, ...,  True,  True,  True],
[ True,  True,  True, ...,  True,  True,  True]], dtype=bool)
In [24]:
np.all(carlos == hdus[0].data)

Out[24]:
True

What this is saying is that I can use either the symbol carlos or hdus[0].data to get at the data. It also shows off some basic NumPy functionality (storing arrays of data; doing things with the data).

Let's see what this mysterios carlos is:

In [25]:
carlos

Out[25]:
array([[116, 116, 115, ..., 114, 116, 112],
[111, 112, 114, ..., 113, 113, 113],
[113, 114, 115, ..., 111, 112, 113],
...,
[ 68,  68,  69, ..., 126, 128, 125],
[ 69,  71,  70, ..., 125, 124, 124],
[ 70,  69,  68, ..., 124, 124, 124]], dtype=uint8)
In [26]:
print(carlos)

[[116 116 115 ..., 114 116 112]
[111 112 114 ..., 113 113 113]
[113 114 115 ..., 111 112 113]
...,
[ 68  68  69 ..., 126 128 125]
[ 69  71  70 ..., 125 124 124]
[ 70  69  68 ..., 124 124 124]]


Note that the output of these two are slightly different from each other, and that they try to be "nice" and not display every element (this can actually be tweaked if you want, as it's a part of NumPy).

In [27]:
carlos.shape

Out[27]:
(1200, 800)

And before Carlos complains, this is listed as number of Y pixels (so height) and then the number of X pixels (width). This is the "C-like" style of indexing array data, rather than the "Fortran-like" style.

In [28]:
carlos.dtype

Out[28]:
dtype('uint8')
In [29]:
plt.imshow(carlos)

Out[29]:
<matplotlib.image.AxesImage at 0x7f7d92a87110>

Dog waiting to be scanned at MR Research Center in Budapest. Image Credit: Borbala Ferenczy.

http://mic.com/articles/104474/brain-scans-reveal-what-dogs-really-think-of-us

In [30]:
plt.set_cmap('grey')

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-30-83e946b4f45d> in <module>()
----> 1 plt.set_cmap('grey')

/home/dburke/anaconda/lib/python2.7/site-packages/matplotlib/pyplot.pyc in set_cmap(cmap)
2195     :func:matplotlib.cm.get_cmap.
2196     """
-> 2197     cmap = cm.get_cmap(cmap)
2198
2199     rc('image', cmap=cmap.name)

/home/dburke/anaconda/lib/python2.7/site-packages/matplotlib/cm.pyc in get_cmap(name, lut)
161         raise ValueError(
162             "Colormap %s is not recognized. Possible values are: %s"
--> 163             % (name, ', '.join(cmap_d.keys())))
164
165

ValueError: Colormap grey is not recognized. Possible values are: Spectral, summer, coolwarm, Wistia_r, pink_r, Set1, Set2, Set3, brg_r, Dark2, prism, PuOr_r, afmhot_r, terrain_r, PuBuGn_r, RdPu, gist_ncar_r, gist_yarg_r, Dark2_r, YlGnBu, RdYlBu, hot_r, gist_rainbow_r, gist_stern, PuBu_r, cool_r, cool, gray, copper_r, Greens_r, GnBu, gist_ncar, spring_r, gist_rainbow, gist_heat_r, Wistia, OrRd_r, CMRmap, bone, gist_stern_r, RdYlGn, Pastel2_r, spring, terrain, YlOrRd_r, Set2_r, winter_r, PuBu, RdGy_r, spectral, rainbow, flag_r, jet_r, RdPu_r, gist_yarg, BuGn, Paired_r, hsv_r, bwr, cubehelix, Greens, PRGn, gist_heat, spectral_r, Paired, hsv, Oranges_r, prism_r, Pastel2, Pastel1_r, Pastel1, gray_r, jet, Spectral_r, gnuplot2_r, gist_earth, YlGnBu_r, copper, gist_earth_r, Set3_r, OrRd, gnuplot_r, ocean_r, brg, gnuplot2, PuRd_r, bone_r, BuPu, Oranges, RdYlGn_r, PiYG, CMRmap_r, YlGn, binary_r, gist_gray_r, Accent, BuPu_r, gist_gray, flag, bwr_r, RdBu_r, BrBG, Reds, Set1_r, summer_r, GnBu_r, BrBG_r, Reds_r, RdGy, PuRd, Accent_r, Blues, autumn_r, autumn, cubehelix_r, nipy_spectral_r, ocean, PRGn_r, Greys_r, pink, binary, winter, gnuplot, RdYlBu_r, hot, YlOrBr, coolwarm_r, rainbow_r, Purples_r, PiYG_r, YlGn_r, Blues_r, YlOrBr_r, seismic, Purples, seismic_r, RdBu, Greys, BuGn_r, YlOrRd, PuOr, PuBuGn, nipy_spectral, afmhot

As mentioned earlier, the error messages can be useful, if you know where to look...

In [31]:
plt.imshow(carlos)
plt.set_cmap('gray')


Dog waiting to be scanned at MR Research Center in Budapest. Image Credit: Borbala Ferenczy.

Note that matplotlib can be run in a more interactive manner, in which case I could have just called set_cmap and it would have changed the existing plot. This doesn't work in the notebook environment that I am using, so we had to re-create the image here.

In [32]:
mariano = carlos[400:,:]

In [33]:
mariano.shape

Out[33]:
(800, 800)

If you know what you are doing, you can sometimes combine steps:

In [34]:
plt.imshow(mariano, cmap='gray')

Out[34]:
<matplotlib.image.AxesImage at 0x7f7d90e01a90>

Dog waiting to be scanned at MR Research Center in Budapest. Image Credit: Borbala Ferenczy.

In [35]:
from skimage.filter import canny

In [36]:
matteo = canny(mariano)


Remember you can use help(canny) to get more information. This information is also available on the web, since the Python documentation/help system makes it easy to convert the code documentation into other forms.

In [37]:
matteo.shape

Out[37]:
(800, 800)
In [38]:
matteo.dtype

Out[38]:
dtype('bool')
In [39]:
plt.imshow(matteo)

Out[39]:
<matplotlib.image.AxesImage at 0x7f7d9279c250>

Dog waiting to be scanned at MR Research Center in Budapest. Image Credit: Borbala Ferenczy.

I don't expect you to know that canny is the routine to use here (or in fact, whether it is remotely useful for your analysis); the point is to say that there's lots of functionality available to you in NumPy, SciPy, ... so you don't always have to go and write your own code.

## A brief foray into timing analysis¶

Bill the Cat, created by Berkeley Breathed: http://en.wikipedia.org/wiki/Bill_the_Cat

In [40]:
hdus = fits.open("data/evt2.fits")
print(hdus)

[<astropy.io.fits.hdu.image.PrimaryHDU object at 0x7f7d928132d0>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x7f7d7df00110>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x7f7d7df00890>]

In [41]:
hdus.info()

Filename: data/evt2.fits
No.    Name         Type      Cards   Dimensions   Format
0    PRIMARY     PrimaryHDU      32   ()
1    EVENTS      BinTableHDU    731   35782R x 2C   [1D, 1E]
2    GTI         BinTableHDU     28   4R x 2C      [1D, 1D]

In [42]:
thdu = hdus[1]

In [43]:
print(thdu.columns)

ColDefs(
name = 'time'; format = '1D'; unit = 's'
name = 'energy'; format = '1E'; unit = 'eV'
)

In [44]:
coldata = thdu.data

In [45]:
type(coldata)

Out[45]:
astropy.io.fits.fitsrec.FITS_rec
In [46]:
coldata.size

Out[46]:
35782
In [47]:
coldata.shape

Out[47]:
(35782,)
In [48]:
coldata[0]

Out[48]:
(186103828.51639557, 1936.8479)
In [49]:
times = coldata.field('time')

In [50]:
times.shape

Out[50]:
(35782,)
In [51]:
times[0]

Out[51]:
186103828.51639557
In [52]:
dt = times - times[0]


So, dt is the arrival times of the events; since this is Chandra data the TIME column is in seconds in the Modified Julian Date system (with an "arbitrary" offset), dt is in seconds too (in this case relative to the first event).

In [53]:
dt[0:10]

Out[53]:
array([ 0.        ,  0.        ,  1.76416001,  2.64622009,  3.96931016,
4.41035017,  6.17448026,  8.37966037,  8.82070038,  8.82070038])

Note that the times are discrete: why is this?

In [54]:
print(thdu.header['TIMEDEL'], thdu.header['EXPTIME'])

(0.44104, 0.4)


As an aside: if you need to create a string out of numbers and text, use format:

In [55]:
print("timedel={} exptime={}".format(thdu.header['TIMEDEL'], thdu.header['EXPTIME']))

timedel=0.44104 exptime=0.4


You will also see the use of % - as below - which is "old Python" and shouldn't really be used anymore (to make upgrading to Python 3 easier).

In [56]:
print "timedel=%f exptime=%f" % (thdu.header['TIMEDEL'], thdu.header['EXPTIME'])

timedel=0.441040 exptime=0.400000

In [57]:
dt[:20] / thdu.header['EXPTIME']

Out[57]:
array([  0.        ,   0.        ,   4.41040002,   6.61555022,
9.92327541,  11.02587543,  15.43620065,  20.94915092,
22.05175094,  22.05175094,  27.56467611,  28.66727613,
31.97502635,  34.1802264 ,  38.59055154,  46.30862705,
47.41122708,  49.61642705,  51.82157725,  56.23190247])
In [58]:
dt[0:20] / thdu.header['TIMEDEL']

Out[58]:
array([  0.        ,   0.        ,   4.00000002,   5.99995485,
8.999887  ,   9.99988703,  13.9998192 ,  18.9997741 ,
19.99977412,  19.99977412,  24.99970625,  25.99970627,
28.99966112,  30.99966116,  34.99959327,  41.99948037,
42.99948039,  44.99948036,  46.9994352 ,  50.99936737])
In [59]:
(dt[:20] / thdu.header['TIMEDEL']).astype(np.int)

Out[59]:
array([ 0,  0,  4,  5,  8,  9, 13, 18, 19, 19, 24, 25, 28, 30, 34, 41, 42,
44, 46, 50])
In [60]:
Image(filename='wat.jpg')

Out[60]:

Let' break this down:

(dt[:20] / thdu.header['TIMEDEL']).astype(np.int)

diego = dt[:20]
diego = diego.astype(np.int)

We can also access "from the end" of the array: here we get the last value:

In [61]:
dt[-1]

Out[61]:
41300.67773014307

Let's bin the data up, so we can view it easier. Here I pick a single binning factor (in this case ~ 400 seconds), but remember Diego's talks (in fact, all our talks), and explore your data (so in this case look at different binning values). You might to also want to use histogram in a slightly-different way (e.g. giving the bins to use; see help(np.histogram)), or other functions.

In [62]:
vals = np.histogram(dt, bins=100)


The histogram routine in NumPy returns both the binned values and the bin edges (always read the help to find out what is returned!) as a tuple (you can think of it as an array which you can not change).

In [63]:
vals

Out[63]:
(array([377, 400, 336, 353, 350, 326, 393, 374, 324, 381, 324, 348, 350,
381, 376, 334, 317, 370, 367, 375, 354, 351, 338, 370, 343, 346,
352, 324, 350, 355, 373, 349, 363, 348, 356, 408, 350, 363, 361,
348, 343, 415, 344, 387, 346, 374, 368, 345, 373, 336, 364, 314,
339, 354, 359, 350, 338, 344, 332, 310, 341, 351, 354, 383, 360,
347, 336, 379, 328, 367, 368, 340, 351, 376, 364, 372, 342, 382,
358, 361, 378, 362, 358, 390, 364, 358, 348, 372, 364, 360, 355,
329, 387, 372, 380, 351, 345, 363, 388, 405]),
array([     0.        ,    413.0067773 ,    826.0135546 ,   1239.0203319 ,
1652.02710921,   2065.03388651,   2478.04066381,   2891.04744111,
3304.05421841,   3717.06099571,   4130.06777301,   4543.07455032,
4956.08132762,   5369.08810492,   5782.09488222,   6195.10165952,
6608.10843682,   7021.11521412,   7434.12199143,   7847.12876873,
8260.13554603,   8673.14232333,   9086.14910063,   9499.15587793,
9912.16265523,  10325.16943254,  10738.17620984,  11151.18298714,
11564.18976444,  11977.19654174,  12390.20331904,  12803.21009634,
13216.21687365,  13629.22365095,  14042.23042825,  14455.23720555,
14868.24398285,  15281.25076015,  15694.25753745,  16107.26431476,
16520.27109206,  16933.27786936,  17346.28464666,  17759.29142396,
18172.29820126,  18585.30497856,  18998.31175587,  19411.31853317,
19824.32531047,  20237.33208777,  20650.33886507,  21063.34564237,
21476.35241967,  21889.35919698,  22302.36597428,  22715.37275158,
23128.37952888,  23541.38630618,  23954.39308348,  24367.39986078,
24780.40663809,  25193.41341539,  25606.42019269,  26019.42696999,
26432.43374729,  26845.44052459,  27258.44730189,  27671.4540792 ,
28084.4608565 ,  28497.4676338 ,  28910.4744111 ,  29323.4811884 ,
29736.4879657 ,  30149.494743  ,  30562.50152031,  30975.50829761,
31388.51507491,  31801.52185221,  32214.52862951,  32627.53540681,
33040.54218411,  33453.54896142,  33866.55573872,  34279.56251602,
34692.56929332,  35105.57607062,  35518.58284792,  35931.58962522,
36344.59640253,  36757.60317983,  37170.60995713,  37583.61673443,
37996.62351173,  38409.63028903,  38822.63706633,  39235.64384364,
39648.65062094,  40061.65739824,  40474.66417554,  40887.67095284,
41300.67773014]))
In [64]:
Image('question.jpg')

Out[64]:

This photograph is copyright Douglas Burke, 2014 and is not for re-use without express permission.

Question: what have I done wrong here?

Perhaps it's an "instrumental effect" that I've forgotten about...

In [65]:
(y,x) = vals
print(y.shape)
print(x.shape)

(100,)
(101,)


Question: why are these values not the same?

In [66]:
plt.plot(x[:-1], y)

Out[66]:
[<matplotlib.lines.Line2D at 0x7f7d7ddb86d0>]

Look around; there may be other useful routines! For instance, matplotlib has a routine that will bin up data using histogram and then plot it as a histogram:

In [67]:
plt.hist(dt, bins=100)

Out[67]:
(array([ 377.,  400.,  336.,  353.,  350.,  326.,  393.,  374.,  324.,
381.,  324.,  348.,  350.,  381.,  376.,  334.,  317.,  370.,
367.,  375.,  354.,  351.,  338.,  370.,  343.,  346.,  352.,
324.,  350.,  355.,  373.,  349.,  363.,  348.,  356.,  408.,
350.,  363.,  361.,  348.,  343.,  415.,  344.,  387.,  346.,
374.,  368.,  345.,  373.,  336.,  364.,  314.,  339.,  354.,
359.,  350.,  338.,  344.,  332.,  310.,  341.,  351.,  354.,
383.,  360.,  347.,  336.,  379.,  328.,  367.,  368.,  340.,
351.,  376.,  364.,  372.,  342.,  382.,  358.,  361.,  378.,
362.,  358.,  390.,  364.,  358.,  348.,  372.,  364.,  360.,
355.,  329.,  387.,  372.,  380.,  351.,  345.,  363.,  388.,  405.]),
array([     0.        ,    413.0067773 ,    826.0135546 ,   1239.0203319 ,
1652.02710921,   2065.03388651,   2478.04066381,   2891.04744111,
3304.05421841,   3717.06099571,   4130.06777301,   4543.07455032,
4956.08132762,   5369.08810492,   5782.09488222,   6195.10165952,
6608.10843682,   7021.11521412,   7434.12199143,   7847.12876873,
8260.13554603,   8673.14232333,   9086.14910063,   9499.15587793,
9912.16265523,  10325.16943254,  10738.17620984,  11151.18298714,
11564.18976444,  11977.19654174,  12390.20331904,  12803.21009634,
13216.21687365,  13629.22365095,  14042.23042825,  14455.23720555,
14868.24398285,  15281.25076015,  15694.25753745,  16107.26431476,
16520.27109206,  16933.27786936,  17346.28464666,  17759.29142396,
18172.29820126,  18585.30497856,  18998.31175587,  19411.31853317,
19824.32531047,  20237.33208777,  20650.33886507,  21063.34564237,
21476.35241967,  21889.35919698,  22302.36597428,  22715.37275158,
23128.37952888,  23541.38630618,  23954.39308348,  24367.39986078,
24780.40663809,  25193.41341539,  25606.42019269,  26019.42696999,
26432.43374729,  26845.44052459,  27258.44730189,  27671.4540792 ,
28084.4608565 ,  28497.4676338 ,  28910.4744111 ,  29323.4811884 ,
29736.4879657 ,  30149.494743  ,  30562.50152031,  30975.50829761,
31388.51507491,  31801.52185221,  32214.52862951,  32627.53540681,
33040.54218411,  33453.54896142,  33866.55573872,  34279.56251602,
34692.56929332,  35105.57607062,  35518.58284792,  35931.58962522,
36344.59640253,  36757.60317983,  37170.60995713,  37583.61673443,
37996.62351173,  38409.63028903,  38822.63706633,  39235.64384364,
39648.65062094,  40061.65739824,  40474.66417554,  40887.67095284,
41300.67773014]),
<a list of 100 Patch objects>)

### FFT ahoy¶

In [68]:
cvals = np.fft.fft(y)
print("shape={}".format(cvals.shape))
print("data type={}".format(cvals.dtype))

shape=(100,)
data type=complex128

In [69]:
avals = np.abs(cvals)**2
print("data type={}".format(avals.dtype))

data type=float64

In [70]:
plt.plot(avals)
plt.yscale('LOG')
plt.xlim(-5, 105)

Out[70]:
(-5, 105)
In [71]:
Image('question.jpg')

Out[71]:

This photograph is copyright Douglas Burke, 2014 and is not for re-use without express permission.

Question: how do you think I know to use -5 and 105 for the x limits?

In [72]:
tstep = thdu.header['TIMEDEL']
freqs = np.fft.fftfreq(y.size, tstep)
idx = np.argsort(freqs)


The above was found via random googling: you will find that many questions are asked on StackOverflow; some are even answered.

Some of these answers are even useful!

In [73]:
plt.plot(freqs[idx], avals[idx])
plt.yscale('LOG')


We can get rid of that peak by subtracting the mean of the signal; note that we don't need to recalculate freqs (although there is no harm in doing so, as it is not a long calculation).

Actually, I'm getting bored of this calculation, so let's automate it, since isn't that the whole point of computers?

In [74]:
def diego(times, tstep):
"""What Would Diego Do?"""

fft = np.fft.fft(times)
real = np.abs(fft)**2
freqs = np.fft.fftfreq(times.size, tstep)
idx = np.argsort(freqs)
return (freqs[idx], real[idx])

In [75]:
(a,b) = diego(y - y.mean(), tstep)
plt.plot(a, b)

Out[75]:
[<matplotlib.lines.Line2D at 0x7f7d7d761950>]

### Moar light curves¶

In [76]:
stimes = fits.open("data/swift.evt")[1].data.field('time')

In [77]:
Image(filename='wat.jpg')

Out[77]: