Waves look cool. But did you know that by just looking at surface waves, even when there aren't regular pulses of energy (like those above), you can figure out what's underneath?
Consider the following problem: you have a sequence of images of an oscillating surface, where the oscillations occur on various timescales and lengthscales. How can we decompose the complex motion of the surface to determine which frequencies and wavelengths are strongest in the object of interest?
For this lesson, we will use images of the Sun, the surface of which can be thought of as a fluid that is constantly oscillating. These images have been taken every 45 seconds for approximately 4.5 hours, and we will use this time series to attempt to uncover the frequencies of interest that are present. Once we establish how to apply these techniques to our images, we'll discuss constraints of this technique and dataset, how these techniques can be used in other situations, and where else such techniques could come in handy.
Throughout, we'll work with the wavenumber $k$, defined as $$k = \frac{2\pi}{\lambda},$$ where the longest resolvable wavelength is set by the image size, $$\lambda = N_x \Delta x.$$ This will make more sense once we've taken the Fourier transform of our data. For this project, we're trying to identify the power at all wavelengths (wavenumbers) and frequencies by creating a power spectrum. As our data are three-dimensional, the power spectrum will also be three-dimensional, so we walk through how to present it in two dimensions in order to quickly decipher the important information.
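As a concrete check on the wavenumber definition, here is a short calculation using the values from this dataset (1024 pixels across, 1.41 Mm per pixel): the smallest nonzero wavenumber corresponds to the longest wavelength that fits across the image.

```python
import numpy as np

N_x = 1024      # pixels across the image (this dataset's size)
delta_x = 1.41  # Mm per pixel (this dataset's pixel scale)

lam = N_x * delta_x   # longest resolvable wavelength, in Mm
k = 2 * np.pi / lam   # corresponding (smallest) wavenumber, in Mm^-1
print(k)              # ~0.00435 Mm^-1
```

This matches the spatial step size `dnux` we compute later when labeling the axes of the power spectrum.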
Before we get started, we need to import some standard libraries which contain the necessary tools to perform this analysis. All of these Python libraries are available for download for free on the Internet. For future manipulation of astronomical image files, I suggest downloading Ureka, an astronomy analysis package published by the Space Telescope Science Institute in Baltimore, Maryland. Ureka includes all of the packages used in this notebook.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import scipy.io
import glob
import os
import time
from astropy.io import fits
from time import localtime,strftime
tstart = strftime("%a, %d %b %Y %H:%M:%S", localtime())
print("Starting Time:", tstart)
tstart = time.time()
Starting Time: Tue, 07 Jul 2015 05:26:57
Note: For analysis purposes, I've chosen to time how long it takes to run the n-dimensional Fourier transform. For this notebook lesson, timing is fairly uninformative because it depends on how quickly you move through the lesson, but I've left the code in place for future reference with large datasets.
Next, we need to import the relevant dataset. As this is an astronomical dataset, the images have been saved as .fits files. Luckily, the astropy library bundled with Ureka can read .fits files, so accessing the relevant data is straightforward. We stack the images into a 3-D NumPy array, which we can then manipulate with NumPy's built-in functions.
I've saved these images to my GitHub account, so if you're interested in recreating this experiment, you can download this notebook file along with all of the images in the Fedop folder and run it for yourself.
imnames = sorted(glob.glob('Fedop/*Fedop.fits')) # Sorted list of image filenames. Fedop: Doppler images taken with an iron (Fe) filter.
N = len(imnames)
imlist = np.zeros((N,1024,1024))
for i in range(N):
    imlist[i] = fits.open(imnames[i])[0].data # Read each image into the NumPy array.
How can we be sure that we've read in the data correctly? A quick check, indexing the array and `plt.imshow`-ing the result, lets us investigate this. It also tells us which array index corresponds to which axis in the image, which will be important later in the lesson. We also make a line plot of a single pixel over time to illustrate its sine-curve-like behavior, showing that measurable oscillations are present.
Let's take a moment to get to know the data. First, we should make sure we have an understanding of the size and shape of our images.
plt.imshow(imlist[100], cmap='cubehelix') # cmap: specify a color map for easier visual interpretation.
plt.show()
Note that the `imshow` function built into `matplotlib` plots the array indices as axis labels automatically. This gives us a quick and dirty way to get familiar with the data: we can see that we have approximately 1000 pixels in both the x- and y-directions.
A gradient is visible across the image. We would expect this, as we're effectively measuring line-of-sight velocities in these images, and thus the rotation of the Sun would result in such a pattern. Variations due to additional motions within the Sun would break this pattern. We can see examples of this as the spottiness present all over the solar surface. Cool!
Next, we can plot the variation in one pixel over time. This will give us a quick insight into the timescale of the strongest oscillations present in the data.
plt.plot(imlist[:,500,500]) # Plot a slice through the array corresponding to one spatial location in all images.
plt.xlabel('Image (45s cadence)')
plt.ylabel('Signal')
plt.show()
We can already see multiple components of the signal here. There appears to be a high-frequency oscillation as well as larger trends over the full time series. Now, we want to separate these components of the signal from one another.
Now that we understand the shape of the NumPy array holding the data of interest, we can pass it to NumPy's n-dimensional Fourier transform. This changes the axes of the data from (time, length, length) to (1/time, 1/length, 1/length). This allows us to explore the data as a function of temporal and spatial frequency, giving us more direct information about the features of the oscillations present, as opposed to showing us the signal created by one individual location in space or time.
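The actual frequency values along each transformed axis can be recovered with `np.fft.fftfreq`, given the sample spacing. A minimal sketch using this dataset's spacings (45 s cadence, 1.41 Mm pixels, and the padded time length of 512 used below):

```python
import numpy as np

# The image stack axes are (time, y, x); after the FFT they become (frequency, k_y, k_x).
freqs = np.fft.fftfreq(512, d=45.0)  # cycles per second (Hz) along the padded time axis
kx = np.fft.fftfreq(1024, d=1.41)    # cycles per Mm along x
ky = np.fft.fftfreq(1024, d=1.41)    # cycles per Mm along y

# After np.fft.fftshift, each axis runs monotonically from most negative to most positive:
freqs_shifted = np.fft.fftshift(freqs)
print(freqs_shifted[0], freqs_shifted[-1])  # roughly -1/(2*45) Hz up to just under +1/(2*45) Hz
```

Note that `fftfreq` returns frequencies in cycles per unit; the wavenumber convention used later in this lesson carries an extra factor of 2π.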
#Create a power spectrum.
#First, take n-dimensional Fourier Transform of image data.
fftspec = np.fft.fftn(imlist, s=(512,1024,1024)) # FFT
fftspec = np.fft.fftshift(fftspec) # Shift FFT such that lowest frequencies are at the left of the array, highest at the right.
#Power Spectrum = FFT * conj(FFT)
powspec = fftspec * np.conj(fftspec)
t = time.time() - tstart
print("FFT runtime:", t, "s")
FFT runtime: 254.174783945 s
Note the shape of the FFT array. We have zero-padded along the time dimension (from roughly 360 images up to 512) to allow for a smoother FFT. We should do this in the spatial dimensions as well, but since the images have already been processed by NASA, it is less necessary.
Also, we take the complex conjugate of the FFT to give us the power spectrum. This allows us to combine the real and imaginary parts of the FFT so we can measure absolute power instead of power in the real or imaginary direction. From here out, when we plot our data, we plot the absolute value to combine the real and imaginary components.
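The same recipe (FFT, multiply by the complex conjugate, take the absolute value) can be sanity-checked on a toy signal. Here is a sketch using a made-up 5 mHz sine wave sampled at the same 45 s cadence as the solar images; the power spectrum should peak at the input frequency and be symmetric about zero frequency:

```python
import numpy as np

n = 512
dt = 45.0                                # s, same cadence as the solar images
t = np.arange(n) * dt
signal = np.sin(2 * np.pi * 0.005 * t)   # hypothetical 5 mHz test oscillation

fft = np.fft.fft(signal)
power = (fft * np.conj(fft)).real        # = |fft|**2; the imaginary part is ~0

freqs = np.fft.fftfreq(n, d=dt)
peak = np.abs(freqs[np.argmax(power[: n // 2])])
print(peak * 1000, "mHz")                # peak lands near 5 mHz
```

Because the input signal is real, the power at frequency f equals the power at -f, which is what justifies folding the two halves together in the next step.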
Great! We've made a power spectrum of our data. Now, all we have to do is reconfigure it so that it's easy to interpret.
First, instead of plotting positive and negative frequencies separately, let's stack them on top of one another, since they should be mirror images of each other for this real-valued dataset. This is effectively folding the data about the zero-frequency plane along the time axis.
#Analysis: run separately from FFT
tstart = time.time()
#combine positive and negative frequencies
newpowspec = np.zeros((256,1024,1024), dtype=complex) #Define new array of complex datatype.
for i in range(len(powspec) // 2):
    newpowspec[i] = powspec[i] + powspec[-(i+1)] # Combine the two halves of the power spectrum.
newpowspec = newpowspec[::-1] #flip over frequency axis so frequencies run from lowest to highest.
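The folding step above can be illustrated on a toy 1-D power spectrum. A sketch assuming the same layout as the real data (fftshifted, even length, so index 0 is the most negative frequency and the peak sits at zero frequency in the middle):

```python
import numpy as np

p = np.array([1., 2., 3., 4., 4., 3., 2., 1.])  # symmetric toy power spectrum (fftshifted)

half = len(p) // 2
folded = np.zeros(half)
for i in range(half):
    folded[i] = p[i] + p[-(i + 1)]  # pair each negative frequency with its positive mirror
folded = folded[::-1]               # flip so frequencies run from lowest to highest

print(folded)  # [8. 6. 4. 2.]
```

After the flip, index 0 holds the (doubled) power nearest zero frequency, and the array runs from the lowest to the highest frequency, matching the arrangement of `newpowspec`.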
Let's make a plot of the data at a given time frequency to make sure that we've arranged the data in a reasonable way.
plt.imshow(np.log(abs(newpowspec[90]))) #We plot the logarithm of the power to make the features more visible.
plt.show()
In this plot, we can identify the typical oscillations in power as a function of spatial frequency, or wavenumber. Great! However, we're most interested in comparing the power of spatial oscillations to the power of temporal oscillations. Thus, we need to find a way to collapse this two-dimensional spatial information into one dimension.
As you can see here, the spatial information seems to be tied to the distance from the origin--it's circular in the above representation. Thus, it would make sense to represent the two-dimensional information as a function of the distance from the origin.
We handle this by finding the average value for a given distance from the origin for each frequency. This is referred to as azimuthally averaging the data.
az = np.zeros((256,1024),dtype=complex)
#compute azimuthal avg
for i in range(1024):
    for j in range(256): # Loop over all index locations in az.
        kx = np.array(newpowspec[j,i,:]) # Relevant slice along the x axis.
        ky = np.array(newpowspec[j,:,i]) # Slice along the y axis.
        az[j,i] = np.sqrt(np.mean(kx)**2 + np.mean(ky)**2) # Azimuthal average: x and y averages added in quadrature.
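The loop above approximates the azimuthal average using only the x- and y-axis slices. For reference, a more literal azimuthal average bins every pixel of a 2-D slice by its distance from the center and averages within each annulus. This is a sketch of that idea, not the method used above, assuming the origin sits at the array center (as it does after `fftshift`):

```python
import numpy as np

def azimuthal_average(slice2d):
    """Mean value in integer-radius annular bins around the array center."""
    ny, nx = slice2d.shape
    y, x = np.indices((ny, nx))
    r = np.sqrt((x - nx / 2) ** 2 + (y - ny / 2) ** 2).astype(int)  # integer radius bin per pixel
    total = np.bincount(r.ravel(), weights=np.abs(slice2d).ravel())
    count = np.bincount(r.ravel())
    return total / np.maximum(count, 1)  # guard against empty bins at large radii

# Example on a radially symmetric test image whose value falls off as 1/(1+r):
y, x = np.indices((64, 64))
r = np.sqrt((x - 32.0) ** 2 + (y - 32.0) ** 2)
profile = azimuthal_average(1.0 / (1.0 + r))
print(profile[:3])  # monotonically decreasing with radius, starting at 1.0
```

For a power spectrum that is close to circularly symmetric, the two approaches give similar profiles, but the binned version uses all pixels at each radius instead of only the two axis slices.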
Let's take a look at our `az` array to make sure we're on the right track.
plt.imshow(np.log(abs(az))) #again, plot in log space to make all features more visible
plt.show()
We can definitely see some structure in this plot! Now, we stack the positive and negative components, although now in spatial frequency, to make the data more readable.
newaz = np.zeros((256,512), dtype=complex)
for i in range(len(az[0,:]) // 2):
    for j in range(256):
        newaz[j,i] = az[j,i] + az[j,-(i+1)] # Combine positive and negative spatial frequencies for each temporal frequency.
newaz[:,:] = newaz[:,::-1] # flip over spatial frequency axis.
Now, let's plot the data. We should also label our axes, which is not a trivial task. In order to do so, we need to compute the full range of spatial and temporal frequencies given in our power spectrum, and then scale the axis labels accordingly in frequency and wavenumber units. We do this below:
Nt = 512 #steps in temporal direction
Nx = 1024 # steps in spatial x direction
Ny = 1024 # steps in spatial y direction
deltat = 45 #s
deltax = 1.41 #Mm/pixel. Mm: megameter = 1 million meters.
deltay= 1.41 #Mm/pixel
dt = round(float(1. / (Nt*deltat)),6) # Step size in the temporal direction, rounded to six decimal places.
dnux = round(float(2*np.pi / (Nx*deltax)),6) # Step size in the spatial direction. The 2*pi factor comes from the
# wavenumber definition: k = 2*pi/lambda, with lambda = Nx*deltax. Rounding keeps the plot labels legible.
dnuy = round(float(2*np.pi / (Ny*deltay)),6)
print(round(dt,6), dnux, dnuy)
dnu = np.sqrt(dnux**2+dnuy**2) #adding x and y in quadrature
dnu = round(dnu,4)
print(dnu)
4.3e-05 0.004352 0.004352 0.0062
We can use these definitions when defining our plot axis labels. Now, let's make one final plot to understand our power spectrum. Note that we're still plotting in logarithmic space and using the absolute value of the power spectrum.
plt.imshow(np.log(abs(newaz[5:150,5:200])), cmap='cubehelix', interpolation='none') # trim shortest and longest frequencies,
# and don't interpolate pixels for easier viewing.
plt.ylabel('Frequency (mHz)')
ylocs = [0,25,55,85,115] # locations of axis labels, in pixels
dt = dt *1000 #milliHertz (mHz) instead of Hertz. 1 Hertz = 1 oscillation per second.
yvals = [0,30*dt,60*dt,90*dt,120*dt] # values of axis labels
plt.yticks(ylocs,yvals) #places axis labels equal to yvals at ylocs pixel locations in plot.
plt.xlabel('Wavenumber (Mm^-1)')
xlocs = [0,45,95,145,195]
xvals = [0,50*dnu, 100*dnu, 150*dnu, 200*dnu]
plt.xticks(xlocs,xvals)
plt.show()
And now we have our readable power spectrum!
We can also display the power spectrum for a given wavenumber to look at the strength of oscillations of a given spatial size as a function of frequency.
plt.plot(np.log(abs(az[:,0])))
plt.plot(np.log(abs(az[:,10])))
plt.plot(np.log(abs(az[:,100]))) # these correspond to different wavenumbers: waves of different spatial sizes.
plt.legend(('az0','az10', 'az100'))
plt.show()
print("Max value and location:", np.max(az), "at", np.unravel_index(np.argmax(az), np.shape(az)))
Max value and location: (2.02330574944e+22+0j) at (0, 512)
This frequency analysis can then be used to tell us a great deal about the internal structure and dynamics of stars. At all wavenumbers, the 3 mHz peak is visible. At smaller wavenumbers, however, there appears to be some possibly significant lumpiness (i.e., structure) within the 3 mHz peak. This dataset has relatively low temporal resolution, however, which makes detecting variation other than the 3 mHz peak very difficult. This can be seen more clearly in a comparable lesson that I've done with other data, also available on my GitHub.
The strongest frequency and spacing between frequencies is directly dependent on the density of the oscillating material. We use this to infer the stage of stellar evolution of the star, and can use this information for more precise constraints on stellar density, age, and therefore temperature, size, and mass.
This type of cross-correlation frequency analysis can be applied to any data where one wants to use surface wave oscillations to learn more about what lies underneath. For example, analysis of standing waves on a water surface can tell you more about the depth and shape of the underlying body of water. Power spectra are used in cases as mundane as regulating energy to the electrical grid to cases as esoteric as calculating the shape of the Universe. Hope this lesson was helpful!