%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
# use seaborn's default plotting styles for matplotlib
import seaborn; seaborn.set()
gatspy
includes loaders for some convenient datasets.
Here we'll take a look at some RR Lyrae light curves from Sesar 2010
from gatspy.datasets import fetch_rrlyrae
rrlyrae = fetch_rrlyrae()
This dataset object has an ids
attribute showing the light curve ids available, and a get_lightcurve
method which returns the light curve:
len(rrlyrae.ids)
483
lcid = rrlyrae.ids[0]
t, y, dy, filts = rrlyrae.get_lightcurve(lcid)
Let's quickly visualize this data, to see what we're working with:
for filt in 'ugriz':
mask = (filts == filt)
plt.errorbar(t[mask], y[mask], dy[mask], fmt='.', label=filt)
plt.gca().invert_yaxis()
plt.legend(ncol=3, loc='upper left');
The dataset has a metadata
attribute, which can tell us things like the period (as determined by Sesar 2010:
period = rrlyrae.get_metadata(lcid)['P']
period
0.61431831
Let's fold the lightcurve on this period and re-plot the points:
phase = t % period
for filt in 'ugriz':
mask = (filts == filt)
plt.errorbar(phase[mask], y[mask], dy[mask], fmt='.', label=filt)
plt.gca().invert_yaxis()
plt.legend(ncol=3, loc='upper left');
Now the characteristic shape of the RR Lyrae light curve can be seen!
We can now use the Lomb-Scargle periodogram to attempt to fit the period. We'll use the classic (single-band) version, and examine the best period for each band individually. Because of the large (~10 year) baseline, the fit takes a few seconds per band:
from gatspy.periodic import LombScargle, LombScargleFast
# LombScargleFast is slightly faster than LombScargle for the simplest case
# Both have the same interface.
model = LombScargleFast()
model.optimizer.set(quiet=True, period_range=(0.2, 1.2))
print("Sesar 2010:", period)
for filt in 'ugriz':
mask = (filts == filt)
model.fit(t[mask], y[mask], dy[mask])
print(filt + ': ', model.best_period)
Sesar 2010: 0.614318 u: 0.61431570304 g: 0.614316612117 r: 0.614316612117 i: 0.614317748467 z: 0.614316839386
We see that within each band, the lomb-scargle periodogram lands on the correct peak.
This model.best_period
hides some computation. To find the best period, the model optimizer determines the required resolution for a linear scan search, and computes the periodogram at each of these values.
We can view the periodogram for each model by calling the periodogram
method.
We'll put this in a function for later use
def plot_periodogram(model, lcid):
plt.figure()
rrlyrae = fetch_rrlyrae()
t, y, dy, filts = rrlyrae.get_lightcurve(lcid)
period = rrlyrae.get_metadata(lcid)['P']
periods = np.linspace(0.2, 1.0, 5000)
for i, filt in enumerate('ugriz'):
mask = (filts == filt)
model.fit(t[mask], y[mask], dy[mask])
power = model.periodogram(periods)
plt.plot(periods, i + power, lw=1, label=filt)
plt.xlim(periods[0], periods[-1])
plt.ylim(0, 5)
plt.legend(ncol=3)
plot_periodogram(LombScargle(), lcid)
The periodogram has very narrow peaks, of width approximately $\Delta P \approx P^2 / T$, where $T = (t_{max} - t_{min})$ is the baseline of the measurements. For 10 years of data, the width near the 0.6 day peak turns out to be around $10^{-4}$ days (about 8 seconds), which is why we need such fine sampling of the periodogram to see the peaks.
The Lomb-Scargle model corresponds to a single-term sinusoid fit to the data.
We can plot these model fits using the predict()
method.
We'll encapsulate this in a function in order to reuse it below:
def plot_model(model, lcid):
plt.figure()
rrlyrae = fetch_rrlyrae()
t, y, dy, filts = rrlyrae.get_lightcurve(lcid)
period = rrlyrae.get_metadata(lcid)['P']
phase = t % period
tfit = np.linspace(0, period, 100)
for filt in 'ugriz':
mask = (filts == filt)
pts = plt.errorbar(phase[mask], y[mask], dy[mask], fmt='.', label=filt)
model.fit(t[mask], y[mask], dy[mask])
yfit = model.predict(tfit, period=period)
plt.plot(tfit, yfit, color=pts[0].get_color(), alpha=0.5)
plt.gca().invert_yaxis()
plt.legend(ncol=3, loc='upper left');
plot_model(LombScargle(), lcid)
The model is clearly fairly biased: that is, it doesn't have enough flexibility to truly fit the data. We can do a bit better by adding more terms to the Fourier series with the Nterms
parameter
plot_model(LombScargle(Nterms=4), lcid)
This model can also be used within the periodogram, using the same interface as above:
plot_periodogram(LombScargle(Nterms=4), lcid)
The package includes the SuperSmoother algorithm using the same API. Here we'll plot the periodogram and the model fits for the same data using this model. Because the interface is identical, we can simply swap-in the supersmoother model in the function from above:
from gatspy.periodic import SuperSmoother
plot_periodogram(SuperSmoother(), lcid)
plot_model(SuperSmoother(), lcid)
Nice and easy!