Correcting DR8 Astrometry

The SDSS galaxy data in Section 1.5.5 comes from DR8, though I could not find that stated anywhere in the text. You can run the query in Appendix D on the SDSS-III CasJobs site (not the SDSS CasJobs site) in the 'DR8' context. The result of the query is saved in a FITS file that can be retrieved with the function astroML.datasets.fetch_sdss_specgals().

In [3]:

from astroML.datasets import fetch_sdss_specgals
help(fetch_sdss_specgals)
dr8data = fetch_sdss_specgals()

Help on function fetch_sdss_specgals in module astroML.datasets.sdss_specgals:

fetch_sdss_specgals(data_home=None, download_if_missing=True)
    Loader for SDSS Galaxies with spectral information
    
    Parameters
    ----------
    data_home : optional, default=None
        Specify another download and cache folder for the datasets. By default
        all scikit learn data is stored in '~/astroML_data' subfolders.
    
    download_if_missing : optional, default=True
        If False, raise a IOError if the data is not locally available
        instead of trying to download the data from the source site.
    
    Returns
    -------
    data : recarray, shape = (327260,)
        record array containing pipeline parameters
    
    Notes
    -----
    These were compiled from the SDSS database using the following SQL query::
    
        SELECT
          G.ra, G.dec, S.mjd, S.plate, S.fiberID, --- basic identifiers
          --- basic spectral data
          S.z, S.zErr, S.rChi2, S.velDisp, S.velDispErr,
          --- some useful imaging parameters
          G.extinction_r, G.petroMag_r, G.psfMag_r, G.psfMagErr_r,
          G.modelMag_u, modelMagErr_u, G.modelMag_g, modelMagErr_g,
          G.modelMag_r, modelMagErr_r, G.modelMag_i, modelMagErr_i,
          G.modelMag_z, modelMagErr_z, G.petroR50_r, G.petroR90_r,
          --- line fluxes for BPT diagram and other derived spec. parameters
          GSL.nii_6584_flux, GSL.nii_6584_flux_err, GSL.h_alpha_flux,
          GSL.h_alpha_flux_err, GSL.oiii_5007_flux, GSL.oiii_5007_flux_err,
          GSL.h_beta_flux, GSL.h_beta_flux_err, GSL.h_delta_flux,
          GSL.h_delta_flux_err, GSX.d4000, GSX.d4000_err, GSE.bptclass,
          GSE.lgm_tot_p50, GSE.sfr_tot_p50, G.objID, GSI.specObjID
        INTO mydb.SDSSspecgalsDR8 FROM SpecObj S CROSS APPLY
          dbo.fGetNearestObjEQ(S.ra, S.dec, 0.06) N, Galaxy G,
          GalSpecInfo GSI, GalSpecLine GSL, GalSpecIndx GSX, GalSpecExtra GSE
        WHERE N.objID = G.objID
          AND GSI.specObjID = S.specObjID
          AND GSL.specObjID = S.specObjID
          AND GSX.specObjID = S.specObjID
          AND GSE.specObjID = S.specObjID
          --- add some quality cuts to get rid of obviously bad measurements
          AND (G.petroMag_r > 10 AND G.petroMag_r < 18)
          AND (G.modelMag_u-G.modelMag_r) > 0
          AND (G.modelMag_u-G.modelMag_r) < 6
          AND (modelMag_u > 10 AND modelMag_u < 25)
          AND (modelMag_g > 10 AND modelMag_g < 25)
          AND (modelMag_r > 10 AND modelMag_r < 25)
          AND (modelMag_i > 10 AND modelMag_i < 25)
          AND (modelMag_z > 10 AND modelMag_z < 25)
          AND S.rChi2 < 2
          AND (S.zErr > 0 AND S.zErr < 0.01)
          AND S.z > 0.02
          --- end of query ---
    
    Examples
    --------
    >>> from astroML.datasets import fetch_sdss_specgals
    >>> data = fetch_sdss_specgals()
    >>> data.shape  # number of objects in dataset
    (661598,)
    >>> data.names[:5]  # first five column names
    ['ra', 'dec', 'mjd', 'plate', 'fiberID']
    >>> print data['ra'][:3]  # first three RA values
    [ 146.71419105  146.74414186  146.62857334]
    >>> print data['dec'][:3]  #  first three declination values
    [-1.04127639 -0.6522198  -0.7651468 ]

The DR8 astrometry, i.e. ra and dec, is known to be flawed. The errors were fixed in DR9, but the old values in DR8 were not overwritten. Instead, a new table (AstromDR9) was added to DR8, and you can join against it in a SQL query to get the DR9 astrometry.

This query shows how to get the DR9 astrometry in the DR8 context.

    SELECT
      A.ra, A.dec, --- DR9 Astrometry
      S.mjd, S.plate, S.fiberID, --- basic identifiers
      --- basic spectral data
      S.z, S.zErr, S.rChi2, S.velDisp, S.velDispErr,
      --- some useful imaging parameters
      G.extinction_r, G.petroMag_r, G.psfMag_r, G.psfMagErr_r,
      G.modelMag_u, G.modelMagErr_u, G.modelMag_g, G.modelMagErr_g,
      G.modelMag_r, G.modelMagErr_r, G.modelMag_i, G.modelMagErr_i,
      G.modelMag_z, G.modelMagErr_z, G.petroR50_r, G.petroR90_r,
      --- line fluxes for BPT diagram and other derived spec. parameters
      GSL.nii_6584_flux, GSL.nii_6584_flux_err, GSL.h_alpha_flux,
      GSL.h_alpha_flux_err, GSL.oiii_5007_flux, GSL.oiii_5007_flux_err,
      GSL.h_beta_flux, GSL.h_beta_flux_err, GSL.h_delta_flux,
      GSL.h_delta_flux_err, GSX.d4000, GSX.d4000_err, GSE.bptclass,
      GSE.lgm_tot_p50, GSE.sfr_tot_p50, G.objID, GSI.specObjID
    INTO MyDB.SDSSspecgalsDR8astromDR9 FROM SpecObj AS S CROSS APPLY
      dbo.fGetNearestObjEQ(S.ra, S.dec, 0.06) AS N
      --- Here we use the more modern JOIN notation.
      JOIN Galaxy       AS G   ON N.objID = G.objID
      JOIN AstromDR9    AS A   ON N.objID = A.objID -- This is the table with the DR9 astrometry!
      JOIN GalSpecInfo  AS GSI ON GSI.specObjID = S.specObjID
      JOIN GalSpecLine  AS GSL ON GSL.specObjID = S.specObjID
      JOIN GalSpecIndx  AS GSX ON GSX.specObjID = S.specObjID
      JOIN GalSpecExtra AS GSE ON GSE.specObjID = S.specObjID
    WHERE -- These cuts are all the same as before.
      --- add some quality cuts to get rid of obviously bad measurements
      (G.petroMag_r > 10 AND G.petroMag_r < 18)
      AND (G.modelMag_u-G.modelMag_r) > 0
      AND (G.modelMag_u-G.modelMag_r) < 6
      AND (G.modelMag_u > 10 AND G.modelMag_u < 25)
      AND (G.modelMag_g > 10 AND G.modelMag_g < 25)
      AND (G.modelMag_r > 10 AND G.modelMag_r < 25)
      AND (G.modelMag_i > 10 AND G.modelMag_i < 25)
      AND (G.modelMag_z > 10 AND G.modelMag_z < 25)
      AND S.rChi2 < 2
      AND (S.zErr > 0 AND S.zErr < 0.01)
      AND S.z > 0.02
      --- end of query ---

I've already run this query (it takes about 1.5 hours), so let's grab the results.

In [6]:

#
# This is very similar to how fetch_sdss_specgals() works.
#
import os
import numpy as np
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2
from astropy.io import fits
from astroML.datasets import get_data_home
#
# This is the URL of the file with the new astrometry.
#
DATA_URL = 'http://cosmo.nyu.edu/~bw55/SDSSspecgalsDR8astromDR9_weaver.fit'
#
# This gets the path to your local astroML data.
#
data_home = get_data_home()
if not os.path.exists(data_home):
    os.makedirs(data_home)
archive_file = os.path.join(data_home, os.path.basename(DATA_URL))
#
# Download the file only if it is not already cached locally.
#
if not os.path.exists(archive_file):
    f = urlopen(DATA_URL)
    fitsdata = f.read()
    f.close()
    with open(archive_file, 'wb') as a:
        a.write(fitsdata)
#
# The table lives in the first extension. Copy it so the file can be closed.
#
with fits.open(archive_file) as hdulist:
    dr9data = np.array(hdulist[1].data)

One more thing to do before we can plot the results.

In [20]:

#
# Although they are the same size, they won't necessarily be in the same order, so we've got to sort them.
#
dr9isort = np.argsort(dr9data['objID'])
dr8isort = np.argsort(dr8data['objID'])
print(dr9data['objID'][:3])
print(dr8data['objID'][:3])
print(dr9data['objID'][dr9isort][:3])
print(dr8data['objID'][dr8isort][:3])
#
# If this throws an exception, something is very wrong.
#
assert np.array_equal(dr9data['objID'][dr9isort], dr8data['objID'][dr8isort])

[1237648673458684391 1237648673458684355 1237648673458553624]
[1237648720142401611 1237650795146510627 1237650795146445031]
[1237645879551066262 1237645879577477252 1237645879577936138]
[1237645879551066262 1237645879577477252 1237645879577936138]
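To make the sorting step less mysterious, here is a minimal sketch of the same idea on toy arrays. The names `ids_a`, `dec_a`, `ids_b` and `dec_b` and all of the values are made up for illustration; they just stand in for the objID and dec columns of the two tables.

```python
import numpy as np

# Toy stand-ins for the two tables (values are invented for illustration).
ids_a = np.array([30, 10, 20])
dec_a = np.array([3.0, 1.0, 2.0])
ids_b = np.array([20, 30, 10])
dec_b = np.array([2.5, 3.5, 1.5])

# argsort returns the permutation that puts each ID column in ascending order.
sort_a = np.argsort(ids_a)
sort_b = np.argsort(ids_b)

# Only compare row by row once the sorted IDs are confirmed to be identical.
assert np.array_equal(ids_a[sort_a], ids_b[sort_b])

# Now row i of each sorted array refers to the same object.
diff = dec_b[sort_b] - dec_a[sort_a]
print(diff)
```

This only works when both tables contain exactly the same set of IDs, which is what the assertion checks. If the two sets could differ, something like np.intersect1d with return_indices=True would be one way to pick out the common rows first.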

In [22]:

#
# This is a big sample, so we'll do a random subsample
#
import numpy.random as rand
import matplotlib.pyplot as plt
rand.seed(137)
index_array = np.arange(dr8data.size)
rand.shuffle(index_array)
subsample = index_array[:2000]
fig, ax = plt.subplots(1, 1, figsize=(10, 10))
#
# Difference in declination between DR8 and DR9, converted from degrees to arcsec.
#
dec8 = dr8data['dec'][dr8isort][subsample]
dec9 = dr9data['dec'][dr9isort][subsample]
error = 3600.0*np.abs(dec8 - dec9)
sc = ax.scatter(dec9, error)
foo = ax.set_xlabel(r'$\delta_{\mathrm{DR9}}$ [degrees]')
foo = ax.set_ylabel(r'$| \delta_{\mathrm{DR8}} - \delta_{\mathrm{DR9}} |$ [arcsec]')

One aspect of the astrometric errors is that they are significantly larger, on average, for $\delta > 40^\circ$. You should be able to see this quite clearly in the plot.
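If you would rather put a number on that than eyeball it, one option is to compare the median error on either side of the cut. This is only a sketch: `median_error_split` and its `dec_cut` parameter are names I made up, and the toy arrays are invented values, not SDSS measurements.

```python
import numpy as np

def median_error_split(dec_deg, error_arcsec, dec_cut=40.0):
    """Return (median error at dec <= dec_cut, median error at dec > dec_cut)."""
    dec_deg = np.asarray(dec_deg, dtype=float)
    error_arcsec = np.asarray(error_arcsec, dtype=float)
    north = dec_deg > dec_cut
    return np.median(error_arcsec[~north]), np.median(error_arcsec[north])

# Invented numbers, only to show the call.
toy_dec = np.array([-5.0, 10.0, 35.0, 45.0, 55.0, 65.0])
toy_err = np.array([0.01, 0.02, 0.03, 0.10, 0.20, 0.30])
south_med, north_med = median_error_split(toy_dec, toy_err)
print(south_med, north_med)
```

With the arrays from the plotting cell, the call would be median_error_split(dr9data['dec'][dr9isort][subsample], error). I use the median rather than the mean so that a handful of outliers does not dominate the comparison.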

So, if you want to use this Galaxy data, and you require astrometric precision, you know how to get the DR9 corrected astrometry.
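The plot above only looks at declination. If you also want the shift in right ascension folded in, a small-angle approximation is usually good enough for sub-arcsecond offsets. The helper below is a sketch under that assumption; `offset_arcsec` is a hypothetical name, not something from astroML.

```python
import numpy as np

def offset_arcsec(ra1, dec1, ra2, dec2):
    """Small-angle separation in arcsec between positions given in degrees."""
    ra1, dec1, ra2, dec2 = (np.asarray(x, dtype=float)
                            for x in (ra1, dec1, ra2, dec2))
    # Wrap the RA difference into [-180, 180) so objects near RA = 0 behave.
    dra = (ra2 - ra1 + 180.0) % 360.0 - 180.0
    # A step in RA covers less sky at high declination, hence the cos(dec) factor.
    dra = dra * np.cos(np.radians(0.5 * (dec1 + dec2)))
    ddec = dec2 - dec1
    return 3600.0 * np.hypot(dra, ddec)

# One arcsecond of RA at dec = 60 degrees is only about half an arcsecond on the sky.
print(offset_arcsec(10.0, 60.0, 10.0 + 1.0 / 3600.0, 60.0))
```

For the sorted tables in this notebook, the arguments would be the ra and dec columns of dr8data and dr9data indexed with dr8isort and dr9isort respectively.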
