The SDSS Galaxy data in Section 1.5.5 comes from DR8 (though they don't mention that anywhere that I could find).
You can run the query in Appendix D on the SDSS-III CasJobs site
(not the SDSS CasJobs site) in the 'DR8' Context.
The result of the query is saved in a FITS file that can be retrieved with the function astroML.datasets.fetch_sdss_specgals().
from astroML.datasets import fetch_sdss_specgals
help(fetch_sdss_specgals)
dr8data = fetch_sdss_specgals()
Help on function fetch_sdss_specgals in module astroML.datasets.sdss_specgals:

fetch_sdss_specgals(data_home=None, download_if_missing=True)
    Loader for SDSS Galaxies with spectral information

    Parameters
    ----------
    data_home : optional, default=None
        Specify another download and cache folder for the datasets. By default
        all scikit learn data is stored in '~/astroML_data' subfolders.

    download_if_missing : optional, default=True
        If False, raise a IOError if the data is not locally available
        instead of trying to download the data from the source site.

    Returns
    -------
    data : recarray, shape = (327260,)
        record array containing pipeline parameters

    Notes
    -----
    These were compiled from the SDSS database using the following SQL query::

        SELECT
          G.ra, G.dec, S.mjd, S.plate, S.fiberID, --- basic identifiers
          --- basic spectral data
          S.z, S.zErr, S.rChi2, S.velDisp, S.velDispErr,
          --- some useful imaging parameters
          G.extinction_r, G.petroMag_r, G.psfMag_r, G.psfMagErr_r,
          G.modelMag_u, modelMagErr_u, G.modelMag_g, modelMagErr_g,
          G.modelMag_r, modelMagErr_r, G.modelMag_i, modelMagErr_i,
          G.modelMag_z, modelMagErr_z, G.petroR50_r, G.petroR90_r,
          --- line fluxes for BPT diagram and other derived spec. parameters
          GSL.nii_6584_flux, GSL.nii_6584_flux_err, GSL.h_alpha_flux,
          GSL.h_alpha_flux_err, GSL.oiii_5007_flux, GSL.oiii_5007_flux_err,
          GSL.h_beta_flux, GSL.h_beta_flux_err, GSL.h_delta_flux,
          GSL.h_delta_flux_err, GSX.d4000, GSX.d4000_err, GSE.bptclass,
          GSE.lgm_tot_p50, GSE.sfr_tot_p50, G.objID, GSI.specObjID
        INTO mydb.SDSSspecgalsDR8 FROM SpecObj S CROSS APPLY
          dbo.fGetNearestObjEQ(S.ra, S.dec, 0.06) N, Galaxy G,
          GalSpecInfo GSI, GalSpecLine GSL, GalSpecIndx GSX, GalSpecExtra GSE
        WHERE N.objID = G.objID
          AND GSI.specObjID = S.specObjID
          AND GSL.specObjID = S.specObjID
          AND GSX.specObjID = S.specObjID
          AND GSE.specObjID = S.specObjID
          --- add some quality cuts to get rid of obviously bad measurements
          AND (G.petroMag_r > 10 AND G.petroMag_r < 18)
          AND (G.modelMag_u-G.modelMag_r) > 0
          AND (G.modelMag_u-G.modelMag_r) < 6
          AND (modelMag_u > 10 AND modelMag_u < 25)
          AND (modelMag_g > 10 AND modelMag_g < 25)
          AND (modelMag_r > 10 AND modelMag_r < 25)
          AND (modelMag_i > 10 AND modelMag_i < 25)
          AND (modelMag_z > 10 AND modelMag_z < 25)
          AND S.rChi2 < 2
          AND (S.zErr > 0 AND S.zErr < 0.01)
          AND S.z > 0.02
        --- end of query ---

    Examples
    --------
    >>> from astroML.datasets import fetch_sdss_specgals
    >>> data = fetch_sdss_specgals()
    >>> data.shape  # number of objects in dataset
    (661598,)
    >>> data.names[:5]  # first five column names
    ['ra', 'dec', 'mjd', 'plate', 'fiberID']
    >>> print data['ra'][:3]  # first three RA values
    [ 146.71419105  146.74414186  146.62857334]
    >>> print data['dec'][:3]  # first three declination values
    [-1.04127639 -0.6522198  -0.7651468 ]
The DR8 astrometry, i.e. ra and dec, is known to be flawed. The errors were fixed in DR9, but the old data in DR8 were not overwritten. Instead, a new table was added to DR8 that you can join against in a SQL query to get the DR9 astrometry.
This query shows how to get the DR9 astrometry in the DR8 context.
SELECT
A.ra, A.dec, --- DR9 Astrometry
S.mjd, S.plate, S.fiberID, --- basic identifiers
--- basic spectral data
S.z, S.zErr, S.rChi2, S.velDisp, S.velDispErr,
--- some useful imaging parameters
G.extinction_r, G.petroMag_r, G.psfMag_r, G.psfMagErr_r,
G.modelMag_u, G.modelMagErr_u, G.modelMag_g, G.modelMagErr_g,
G.modelMag_r, G.modelMagErr_r, G.modelMag_i, G.modelMagErr_i,
G.modelMag_z, G.modelMagErr_z, G.petroR50_r, G.petroR90_r,
--- line fluxes for BPT diagram and other derived spec. parameters
GSL.nii_6584_flux, GSL.nii_6584_flux_err, GSL.h_alpha_flux,
GSL.h_alpha_flux_err, GSL.oiii_5007_flux, GSL.oiii_5007_flux_err,
GSL.h_beta_flux, GSL.h_beta_flux_err, GSL.h_delta_flux,
GSL.h_delta_flux_err, GSX.d4000, GSX.d4000_err, GSE.bptclass,
GSE.lgm_tot_p50, GSE.sfr_tot_p50, G.objID, GSI.specObjID
INTO MyDB.SDSSspecgalsDR8astromDR9 FROM SpecObj AS S CROSS APPLY
dbo.fGetNearestObjEQ(S.ra, S.dec, 0.06) AS N
--- Here we use the more modern JOIN notation.
JOIN Galaxy AS G ON N.objID = G.objID
JOIN AstromDR9 AS A ON N.objID = A.objID -- This is the table with the DR9 astrometry!
JOIN GalSpecInfo AS GSI ON GSI.specObjID = S.specObjID
JOIN GalSpecLine AS GSL ON GSL.specObjID = S.specObjID
JOIN GalSpecIndx AS GSX ON GSX.specObjID = S.specObjID
JOIN GalSpecExtra AS GSE ON GSE.specObjID = S.specObjID
WHERE -- These cuts are all the same as before.
--- add some quality cuts to get rid of obviously bad measurements
(G.petroMag_r > 10 AND G.petroMag_r < 18)
AND (G.modelMag_u-G.modelMag_r) > 0
AND (G.modelMag_u-G.modelMag_r) < 6
AND (G.modelMag_u > 10 AND G.modelMag_u < 25)
AND (G.modelMag_g > 10 AND G.modelMag_g < 25)
AND (G.modelMag_r > 10 AND G.modelMag_r < 25)
AND (G.modelMag_i > 10 AND G.modelMag_i < 25)
AND (G.modelMag_z > 10 AND G.modelMag_z < 25)
AND S.rChi2 < 2
AND (S.zErr > 0 AND S.zErr < 0.01)
AND S.z > 0.02
--- end of query ---
I've already run this query (it takes about 1.5 hours), so let's grab the results.
#
# This is very similar to how fetch_sdss_specgals() works.
#
import os
import numpy as np
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2
from astropy.io import fits
from astroML.datasets import get_data_home
#
# This is the path to the file with the new astrometry.
#
DATA_URL = 'http://cosmo.nyu.edu/~bw55/SDSSspecgalsDR8astromDR9_weaver.fit'
#
# This gets the path to your local astroML data.
#
data_home = get_data_home()
if not os.path.exists(data_home):
    os.makedirs(data_home)
archive_file = os.path.join(data_home, os.path.basename(DATA_URL))
#
# Download the file only if it is not already cached locally.
#
if not os.path.exists(archive_file):
    f = urlopen(DATA_URL)
    fitsdata = f.read()
    with open(archive_file, 'wb') as a:
        a.write(fitsdata)
#
# The query results are in the first extension of the FITS file.
#
hdulist = fits.open(archive_file)
dr9data = np.asarray(hdulist[1].data)
There is one more thing to do before we can plot the results: line up the two data sets by objID.
#
# Although they are the same size, they won't necessarily be in the same order, so we've got to sort them.
#
dr9isort = np.argsort(dr9data['objID'])
dr8isort = np.argsort(dr8data['objID'])
print(dr9data['objID'][:3])
print(dr8data['objID'][:3])
print(dr9data['objID'][dr9isort][:3])
print(dr8data['objID'][dr8isort][:3])
#
# If this throws an exception, something is very wrong.
#
assert ((dr9data['objID'][dr9isort]-dr8data['objID'][dr8isort]) == 0).all()
[1237648673458684391 1237648673458684355 1237648673458553624]
[1237648720142401611 1237650795146510627 1237650795146445031]
[1237645879551066262 1237645879577477252 1237645879577936138]
[1237645879551066262 1237645879577477252 1237645879577936138]
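The sort-and-compare step above can be seen on a much smaller scale. The sketch below uses two toy record arrays (the objID and dec values are made up for illustration, not taken from SDSS) that hold the same objects in different row order, and shows that indexing each with its own argsort result puts matching objects on matching rows.

```python
import numpy as np

# Toy stand-ins for the two catalogs: same objects, different row order.
# These objID and dec values are invented purely for illustration.
cat_a = np.array([(30, 1.0), (10, 2.0), (20, 3.0)],
                 dtype=[('objID', 'i8'), ('dec', 'f8')])
cat_b = np.array([(20, 3.5), (30, 1.5), (10, 2.5)],
                 dtype=[('objID', 'i8'), ('dec', 'f8')])

# argsort returns the row order that sorts each catalog by objID.
a_sort = np.argsort(cat_a['objID'])
b_sort = np.argsort(cat_b['objID'])

# After applying the sort indices, row i is the same object in both catalogs.
assert np.array_equal(cat_a['objID'][a_sort], cat_b['objID'][b_sort])

# Row-by-row differences are now meaningful.
ddec = cat_b['dec'][b_sort] - cat_a['dec'][a_sort]
print(ddec)
```

This only works when both catalogs contain exactly the same set of objID values, which is what the assertion checks. If the sets could differ, `np.intersect1d(a, b, return_indices=True)` is one way to pick out just the common rows.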
#
# This is a big sample, so we'll do a random subsample
#
import numpy.random as rand
rand.seed(137)
index_array = np.arange(dr8data.size)
rand.shuffle(index_array)
subsample = index_array[:2000]
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1, figsize=(10, 10))
error = 3600.0*np.abs(dr8data['dec'][dr8isort][subsample]-dr9data['dec'][dr9isort][subsample]) # convert to arcsec
sc = ax.scatter(dr9data['dec'][dr9isort][subsample], error)
# Raw strings keep the LaTeX backslashes from being read as escape sequences.
foo = ax.set_xlabel(r'$\delta_{\mathrm{DR9}}$ [degrees]')
foo = ax.set_ylabel(r'$| \delta_{\mathrm{DR8}} - \delta_{\mathrm{DR9}} |$ [arcsec]')
One aspect of the astrometric errors is that they are significantly larger, on average, for $\delta > 40^\circ$. You should be able to see this quite clearly in the plot.
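If you would rather put a number on that trend than read it off the plot, one option is to compare the median offset on either side of $\delta = 40^\circ$. The helper below is a hypothetical sketch (the name `median_offset_by_dec` and its `split` parameter are not part of astroML), and the arrays it is exercised on are made-up toy values, not SDSS measurements.

```python
import numpy as np

def median_offset_by_dec(dec_old, dec_new, split=40.0):
    """Median |dec_old - dec_new| in arcsec, for dec_new <= split and > split.

    Hypothetical helper for illustration. Both inputs are in degrees and
    must already be aligned row by row. If one side of the split has no
    objects, its median is NaN.
    """
    dec_old = np.asarray(dec_old, dtype=float)
    dec_new = np.asarray(dec_new, dtype=float)
    offset = 3600.0 * np.abs(dec_old - dec_new)  # degrees -> arcsec
    north = dec_new > split
    return np.median(offset[~north]), np.median(offset[north])

# Toy input: four made-up declinations with made-up offsets in arcsec.
toy_new = np.array([10.0, 20.0, 50.0, 60.0])
toy_shift_arcsec = np.array([0.1, 0.3, 1.0, 3.0])
toy_old = toy_new + toy_shift_arcsec / 3600.0

low, high = median_offset_by_dec(toy_old, toy_new)
print(low, high)
```

With the real data you would pass the sorted columns, for example `median_offset_by_dec(dr8data['dec'][dr8isort], dr9data['dec'][dr9isort])`, so that both arrays refer to the same objects row by row.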
So, if you want to use this Galaxy data, and you require astrometric precision, you know how to get the DR9 corrected astrometry.