%pylab inline
import numpy as np
import pylab as plt
import figrc
theme = figrc.Theme_Gray() + figrc.Theme_Ezrc(16, 1, 22)
theme.apply()
Look at the file star.csv
in the data directory. It contains the astrophysical parameters (columns) for
stars (one star per row).
We will now explore the different properties of this dataset, highlight some correlations, and eventually emphasize the underlying physics.
data = np.recfromcsv('star.csv')
data.dtype
dtype([('g', '<i8'), ('ag', '<f8'), ('dist', '<f8'), ('mass', '<f8'), ('radius', '<f8'), ('lum', '<f8'), ('mg', '<f8'), ('teff', '<f8'), ('logg', '<f8'), ('met', '<f8')])
Column | Description |
---|---|
g | luminosity in the G band (Gaia integrated luminosity) |
ag | attenuation in the G band from dust absorption |
dist | distance to the star |
mass | mass of the star |
radius | radius of the star |
lum | bolometric log-luminosity of the star |
mg | magnitude in G band |
teff | effective temperature |
logg | logarithm of the surface gravity |
met | metallicity |
Plot the distribution of the mass column.
plt.hist(np.log10(data['mass']), bins=60)
plt.xlabel(r'log$_{10}$(M/M$_\odot$)')
plt.ylabel(r'dN/dlogM');
Plot the distribution of the teff column when binned according to logg values. In Python there is never a single way to do this.
Below, I make the plot from the basic libraries and then with pandas.
Note that pandas
and seaborn
could offer good alternatives but require some learning.
from itertools import groupby
# extract column information and sort by rounded logg
logg = np.round(data['logg'])
ind = np.argsort(logg)
teff = data['teff'][ind]
logg = logg[ind]
# group data by logg values
d = []
pos = []
for v, g in groupby(zip(teff, logg), lambda x: x[1]):
pos.append(v)
d.append(np.array(list(g))[:, 0])
# plot
plt.boxplot(d, positions=pos, widths=0.9)
plt.ylim(3900, 8100)
plt.xlabel('logg')
plt.ylabel('teff');
Using the pandas
package can be a bit more efficient for this particular plot. Below I give an example of a similar figure using pandas
from pandas import DataFrame
df = DataFrame(data)
df.boxplot('teff', by=np.round(df['logg']), widths=0.9);
plt.suptitle('')
plt.title('')
plt.ylim(3900, 8100)
plt.xlabel('logg')
plt.ylabel('teff');
Again, there are many possibilities to make similar plots. I only show one method; feel free to explore other ways.
plt.scatter(np.log10(data['teff']), data['mg'])
plt.ylim(plt.ylim()[::-1])
plt.xlim(plt.xlim()[::-1])
plt.xlabel('log$_{10}$Teff')
plt.ylabel('M$_G$');
Density distributions can be represented by many means. Below I show a histogram and a Gaussian kernel density estimator.
plt.hist2d(np.log10(data['teff']), data['mg'], bins=40)
plt.ylim(plt.ylim()[::-1])
plt.xlim(plt.xlim()[::-1])
plt.xlabel('log$_{10}$Teff')
plt.ylabel('M$_G$');
Unfortunately, there is no one-liner KDE plot in the standard libraries.
Note that seaborn
offers a one-liner option: sns.kdeplot(x, y, cmap=cmap, shade=True, cut=5, ax=ax).
from scipy.stats import gaussian_kde
kernel = gaussian_kde(np.vstack([np.log10(data['teff']), data['mg']]))
mins = kernel.dataset.min(1)
maxs = kernel.dataset.max(1)
X, Y = np.meshgrid(np.linspace(mins[0], maxs[0],100), np.linspace(mins[1], maxs[1],100))
vals = kernel(np.vstack([X.ravel(),Y.ravel()])).reshape(X.shape)
plt.pcolor(X, Y, vals)
plt.xlim(maxs[0], mins[0])
plt.ylim(maxs[1], mins[1])
plt.xlabel('log$_{10}$Teff')
plt.ylabel('M$_G$');
Again, the dataset represents properties of individual stars. We know from stellar physics that there are relations between stellar properties, such as luminosity, gravity, and mass; these relations are relatively simple equations.
Now you're on your own...
For each of the 4 relations, visualize whether the stars from the dataset obey stellar physics.
Reality is never perfect, either because our understanding of stellar physics is imperfect or because the measurements are uncertain. For each relation, characterize the dispersion of the dataset with respect to our expectations from the physics.
tip: the table at the top of this document describes the columns of the dataset.
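The relations themselves did not survive this export, but one well-known candidate is the Stefan-Boltzmann law, $L = 4\pi R^2 \sigma T_{\rm eff}^4$. As a sketch only (assuming this is one of the relations meant), the snippet below verifies the law with solar values; with the dataset you would build the same expression from the radius and teff columns and compare it against lum (mind the units).

```python
import numpy as np

# Stefan-Boltzmann relation: L = 4 * pi * R^2 * sigma_SB * Teff^4
# Checked here with solar values (assumption: this is one of the 4 relations).
sigma_sb = 5.670374e-8   # Stefan-Boltzmann constant [W m^-2 K^-4]
R_sun = 6.957e8          # solar radius [m]
T_sun = 5772.0           # solar effective temperature [K]
L_sun = 3.828e26         # solar luminosity [W]

L_pred = 4 * np.pi * R_sun**2 * sigma_sb * T_sun**4
print(L_pred / L_sun)    # ratio close to 1
```

For the dataset, a scatter of lum against $\log_{10}(4\pi R^2 \sigma T_{\rm eff}^4 / L_\odot)$ would show the dispersion directly.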
Consider the uniform distribution, $p(x)$.
Show empirically that $m = (x_{max} + x_{min} )/2$ ("midpoint") is a more efficient estimator than the mean, $\mu$. Do this by drawing a sample of size $n$ from the distribution, calculating the estimators, and then plotting them against a large range of values $n$, in order to see which converges more rapidly.
How does the variation of the standard deviation of $\mu$ with $n$ compare with theory? (i.e., overplot it.)
How does the standard deviation of $m$ vary with $n$? Explain your result.
Recall that "efficiency" refers to the variance of the estimator.
Hint: generate many samples for a given $n$, calculate for each the mean and midpoint, then compute the mean and standard deviation of the estimators (given $n$).
**Important:** be careful with the standard deviation definition in your coding language; it may not always be the same (e.g., denominator $n$ vs. $n-1$).
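The hinted procedure can be sketched as follows (the choice of $U(0, 1)$ and the variable names are mine); for each $n$ it draws many samples, computes both estimators per sample, and compares their spreads with the theoretical std of the mean, $1/\sqrt{12n}$:

```python
import numpy as np

rng = np.random.default_rng(42)
n_trials = 5000  # samples per n, used to estimate each estimator's spread

for n in (10, 100, 1000):
    x = rng.uniform(0.0, 1.0, size=(n_trials, n))
    mean_est = x.mean(axis=1)                       # sample mean per draw
    mid_est = 0.5 * (x.min(axis=1) + x.max(axis=1)) # midpoint per draw
    # theory for the mean of U(0, 1): std = 1 / sqrt(12 n)
    print(n, mean_est.std(), mid_est.std(), 1.0 / np.sqrt(12.0 * n))
```

The midpoint's standard deviation shrinks roughly as $1/n$, faster than the mean's $1/\sqrt{n}$, which is the "efficiency" the exercise refers to.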
You are in a game show and presented with three closed doors. Behind one of them (chosen at random without your knowledge) is a prize car, behind each of the others is a goat. Your goal is to reveal (and thus win) the car.
You first select a door at random.
The game show host (who knows where the car is) then opens one of the other doors to reveal a goat (he will never open a door to reveal the car). He then gives you the opportunity to change your choice of door to the other closed one.
Do you? More specifically, what is the probability of winning the car if you do and if you do not change?
from IPython.display import Image
Image('http://imgs.xkcd.com/comics/monty_hall.png')
Work out the probability of winning the car if you always, randomly, or never switch. Explain your results either mathematically or intuitively.
Write a piece of code that simulates the game to allow you to verify your findings.
Simulate many games and plot the distribution of wins for 3 strategies:
1. you never switch
2. you always switch
3. you randomly switch
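A minimal simulation of the game (the helper name `play` is mine), estimating the win rate for each strategy:

```python
import random

def play(switch):
    doors = [0, 1, 2]
    car = random.choice(doors)   # prize placed at random
    pick = random.choice(doors)  # your initial guess
    # the host opens a goat door that is neither your pick nor the car
    opened = random.choice([d for d in doors if d not in (pick, car)])
    if switch:
        # move to the remaining closed door
        pick = next(d for d in doors if d not in (pick, opened))
    return pick == car

n = 20000
for label, strategy in [('never', lambda: play(False)),
                        ('always', lambda: play(True)),
                        ('random', lambda: play(random.random() < 0.5))]:
    wins = sum(strategy() for _ in range(n)) / n
    print(label, wins)  # roughly 1/3, 2/3, 1/2
```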
If $t$ is the lifetime of an element, it follows an exponential decay law with a characteristic time $\tau$: $$f(t) = A \exp(-t/\tau)$$
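Normalizing $f$ over $t \in [0, \infty)$ gives $A = 1/\tau$, and the maximum-likelihood estimate of $\tau$ from observed lifetimes is simply their sample mean. A quick sketch (the value $\tau = 2$ is an arbitrary example):

```python
import numpy as np

# f(t) = (1 / tau) * exp(-t / tau) on t >= 0, so A = 1 / tau.
# MLE: tau_hat is the mean of the observed lifetimes.
rng = np.random.default_rng(1)
tau_true = 2.0
t = rng.exponential(tau_true, size=100_000)
print(t.mean())  # close to tau_true
```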
Below is the result of 5 experiments to measure the speed of light with some associated uncertainties.
measurements [km/s] | uncertainties [km/s] |
---|---|
299 794 | 3 |
299 791 | 5 |
299 770 | 2 |
299 789 | 3 |
299 790 | 4 |
Calculate the best estimate of the speed of light, as well as your uncertainty in that estimate from the measurements only.
Take the weighted average using the inverse variances ($1/\sigma^2$) as weights, and quote the variance of this estimate (all in the lecture notes).
tip: be critical about your data, and detail your assumptions.
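A sketch of that recipe applied to the table above (the outlier handling is my own reading of the "be critical" tip):

```python
import numpy as np

# Inverse-variance weighted mean of the 5 measurements (weights = 1 / sigma^2).
c = np.array([299794., 299791., 299770., 299789., 299790.])  # km/s
sig = np.array([3., 5., 2., 3., 4.])                          # km/s

w = 1.0 / sig**2
c_hat = np.sum(w * c) / np.sum(w)
sigma_hat = 1.0 / np.sqrt(np.sum(w))  # std of the weighted mean
print(c_hat, sigma_hat)

# Being critical: the third point (299770 +/- 2) sits roughly 10 sigma below
# the others; repeating the estimate without it shows how much it pulls the
# result (and it carries the largest weight).
mask = np.arange(c.size) != 2
c_hat2 = np.sum(w[mask] * c[mask]) / np.sum(w[mask])
print(c_hat2)
```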