By James Keaveney (james.keaveney@durham.ac.uk)
We're going to start with a complex-looking plot, and break it down step-by-step into its component parts. The code used here should then be portable into pretty much any other Python script you'll write.
The end result of this tutorial is this plot (click on it for pdf) - a comparison of a lineshape that is fitted to various functions. The physical significance of this is the inhomogeneous broadening caused by the Doppler effect. As is well-known, the bare atomic response has a Lorentzian lineshape. Doppler broadening is governed by the Maxwell-Boltzmann distribution, which is Gaussian. Combining these requires a convolution of both lineshapes, resulting in a Voigt profile. For different temperature ranges, either the Lorentzian or Gaussian may be a good approximation. If this is the case, the residuals should be flat, and normally distributed around 0. In this plot we show the residuals for all three (Lorentzian, Gaussian, Voigt) functions.
So, what are the main features of the plot:
And some less obvious features:
Everything in this example is going to be imported from the script used to make this plot, which you can download here. So let's find out what the current working directory is:
#cfg = get_ipython().config
#cfg['InlineBackend']['close_figures'] = False
#print cfg
from os import getcwd
getcwd()
'D:\\Dropbox\\Dropbox\\Examples\\python_notebooks'
Ok, great - now we can tell the system to look in the subdirectory called 'code', then import from our script file:
import sys
sys.path.append(getcwd()+'/code/')
from lineshape_analysis import gaussian, lorentzian, voigt, generate_lineshapes
The functions we have imported are the standard Gaussian (Normal) and Lorentzian curves, and the Voigt profile, which is a convolution of the Gaussian and Lorentzian. Look up the functions in lineshape_analysis.py if you're interested in the details.
Now let's generate some noisy data, using numpy's random module:
%matplotlib inline
from numpy import random, arange
from matplotlib.pyplot import figure, subplot2grid, setp, clf
from IPython.display import display
x = arange(-30,30,0.2)
wL, wG = 1, 5
#generatae the Lorentzian, Gaussian, and Voigt lines:
yL,yG,yV = generate_lineshapes(x,0,wL,0,wG)
#add noise to the Voigt
y_noise = random.randn(len(x))*0.03
y_data = yV + y_noise
Now let's take a quick look at it:
fig = figure(1)
clf()
ax = fig.add_subplot(111)
ax.plot(x,y_data)
[<matplotlib.lines.Line2D at 0x8cd9c50>]
Great - there's some noise, but the overall lineshape looks ok.
from scipy.optimize import curve_fit
Then we can run the fit three times, one for each function (L, G, V). Personally, I like the curve_fit wrapper as it makes everything more neat and tidy. The only mandatory curve_fit inputs are the function, x_data and y_data. Optionally you can include y-axis error bars (with the option sigma=array), or initial parameters with p0=list - as we do here. curve_fit returns (by default) a list of the optimum parameters (popt), and the covariance matrix (perr). The errors on each of the fit parameters can be found from the square root of the diagonal elements of perr.
# FITTING:
# 1. Lorentzian
pin = [0,1]
popt, perr = curve_fit(lorentzian,x,y_data,p0=pin)
y_L = lorentzian(x,*popt)
y_LRes = y_data-y_L
# 2. Gaussian
pin = [0,1]
popt, perr = curve_fit(gaussian,x,y_data,p0=pin)
y_G = gaussian(x,*popt)
y_GRes = y_data-y_G
# 3. Voigt
pin = [0,1,0,1]
popt, perr = curve_fit(voigt,x,y_data,p0=pin)
y_V = voigt(x,*popt)
y_VRes = y_data-y_V
So that's all the analysis done, we just need to plot it all.
Now let's create the final figure with all the panels. The subplot2grid module is very useful here for making more complex panel arrangements than is possible just using subplots. Both of these modules are part of matplotlib.
subplot2grid creates a grid of subplots, in this case an 8 x 8 grid, then we just make panels span multiple elements in the grid using the rowspan and colspan options. The first argument (yy,xx) is the total number of panels, the second argument ((yy-3,0) for ax_LRes) is the top-left corner of that particular panel. Just remember that most indexing in python starts from 0; in this case panel 0,0 is the top left, and x-1,y-1 is the bottom right
We can share axes limits between panels with the sharex and sharey options. Turn off the axes tick labels that are not required, using setp.
Finally, we label the axes. Notice that we can use LaTeX commands for any maths in the labels (or anywhere else that text goes), just insert it using a raw string:
# Normal string
print ' Hello \n world' # prints 'Hello' and 'world' on two lines
# Raw string
print r' Hello \n world' # prints 'Hello \n world' in one line with no breaks
For the LaTeX stuff, this just avoids any confusion with all the back-slashes.
### Create figure panels using subplot2grid
fig = figure(1,figsize=(6*0.75,7*0.75))
clf()
fig.subplots_adjust(left=0.15,bottom=0.08,top=0.97)
#rows
yy = 8
#cols
xx = 8
#main panel
ax = subplot2grid((yy,xx),(0,0),rowspan=yy-3,colspan=xx-1)
#3 residual panels
ax_LRes = subplot2grid((yy,xx),(yy-3,0),rowspan=1,colspan=xx-1,sharex=ax)
ax_GRes = subplot2grid((yy,xx),(yy-2,0),rowspan=1,colspan=xx-1,sharex=ax,sharey=ax_LRes)
ax_VRes = subplot2grid((yy,xx),(yy-1,0),rowspan=1,colspan=xx-1,sharex=ax,sharey=ax_LRes)
#residual histogram panels
ax_LHist = subplot2grid((yy,xx),(yy-3,xx-1),rowspan=1,colspan=1,sharey=ax_LRes)
ax_GHist = subplot2grid((yy,xx),(yy-2,xx-1),rowspan=1,colspan=1,sharey=ax_GRes)
ax_VHist = subplot2grid((yy,xx),(yy-1,xx-1),rowspan=1,colspan=1,sharey=ax_VRes)
#turn off unwanted axes tick labels
setp(ax_LRes.get_xticklabels(),visible=False)
setp(ax_GRes.get_xticklabels(),visible=False)
setp(ax.get_xticklabels(),visible=False)
setp(ax_LHist.get_yticklabels(),visible=False)
setp(ax_GHist.get_yticklabels(),visible=False)
setp(ax_VHist.get_yticklabels(),visible=False)
setp(ax_LHist.get_xticklabels(),visible=False)
setp(ax_GHist.get_xticklabels(),visible=False)
setp(ax_VHist.get_xticklabels(),visible=False)
#set axis labels
ax.set_ylabel(r'Intensity ($I/I_0$)')
ax_VRes.set_xlabel(r'Detuning ($\Delta/\Gamma$)')
ax_LRes.set_ylabel(r'$L$')
ax_GRes.set_ylabel(r'$G$')
ax_VRes.set_ylabel(r'$L \otimes G$')
<matplotlib.text.Text at 0x898dcc0>
The y-axis tick labels for the bottom 3 panels look quite bad at the moment, but we can sort that out later when we know the range of the residuals.
Now let's populate the plot - we start with the data we generated previously:
# plot initial data
ax.plot(x,y_data,'k.',alpha=0.6)
ax.set_xlim(-30,30)
fig.canvas.draw()
display(fig)
Then add the fits...
# add fits to main panel
ax.plot(x,y_L,'b',lw=2,label = 'Lorentzian')
ax.plot(x,y_G,'r',lw=2,label = 'Gaussian')
ax.plot(x,y_V,'g--',lw=4,label = 'Voigt')
#and optimise the axes limits
ax.set_ylim(-0.1,1.05)
fig.canvas.draw()
display(fig)
...and the residuals...
# add residuals to sub-panels
for axis in [ax_LRes,ax_GRes,ax_VRes,ax_LHist,ax_GHist,ax_VHist]:
axis.axhline(0,color='k',linestyle='--')
ax_LRes.plot(x,y_LRes*100,'b.')
ax_GRes.plot(x,y_GRes*100,'r.')
ax_VRes.plot(x,y_VRes*100,'g.')
fig.canvas.draw()
display(fig)
A histogram of the residuals is a nice addition to this type of plot, and contains at-a-glance important information about the quality of the fit. With normally distributed errors (like we have, artificially, added into our example data), one expects a histogram of residuals of a 'good' fit to also be normally distributed, with a mean ideally around zero.
There are a few ways of making a histogram, but in this example we're going to use the method built in to matplotlib to generate and plot it at the same time. We use the orientation option to place the histograms horizontally (NOTE - this is a well documented bug in matplotlib versions before version 1.3.0 so make sure you're running an up-to-date version).
Let's add these to the plot now:
# check matplotlib version - horizontal histograms are buggy in versions before 1.3.0
import matplotlib
print 'Running Matplotlib Version: ', matplotlib.__version__
# Histograms
histrange = (-20,20)
ax_LHist.hist(y_LRes*100,bins=25,orientation='horizontal', \
fc='b',alpha=0.6,range=histrange)
ax_GHist.hist(y_GRes*100,bins=25,orientation='horizontal', \
fc='r',alpha=0.6,range=histrange)
ax_VHist.hist(y_VRes*100,bins=25,orientation='horizontal', \
fc='g',alpha=0.6,range=histrange)
fig.canvas.draw()
display(fig)
Running Matplotlib Version: 1.4.0
Now to tidy some things up. First, let's add a legend to the main panel:
ax.legend(loc=0)
fig.canvas.draw()
display(fig)
The loc=0 keyword argument places the legend in the 'best' location, i.e. it tries to avoid the legend hitting any data in the panel. This is automatically placed every time the program is run, but can also be manually placed - for a list of location parameters, see here.
Next, let's tidy up the y-axis tick labels on the residual plots - at the moment they're all overlapping with each other which looks terrible.
# we only need to do this to one of the axes, since the y axes are shared (with the sharey option when we first made them)
ax_LRes.set_yticks([-20,-10,0,10])
fig.canvas.draw()
display(fig)
Finally, let's make sure that the y-axis labels are all aligned. With this figure, it doesn't look too bad, but this is a useful bit of code for other plots where the difference is more obvious.
We need to override the automatic placement (so this takes a bit of guesswork...):
# set / align multiple vertical axes labels
# position of labels (play with this number until it looks right!)
labelx=-0.1
#set label position, relative to axis coordinates
for axes in [ax, ax_LRes, ax_GRes, ax_VRes]:
axes.yaxis.set_label_coords(labelx, 0.5)
fig.canvas.draw()
display(fig)
Great - we're all done for this example. When this code isn't running in a notebook, you can save figures either from the matplotlib figure toolbar, or integrate saving into the code, with
fig.savefig('your_filename_here.pdf')
Many file formats are supported, including png, pdf, ps and eps. Most journals want a vector graphics format, so either eps or pdf are the way forward. If you don't know what vector graphics is and why it's important, see/run this example for an explicit demonstration of the differences.