This guide covers the basics of sampling and selecting data from skspec Spectra objects. skspec relies on a subset of pandas indexing and selection. In particular, MultiIndex (hierarchical) slicing is not supported in skspec. skspec also introduces approximate slicing, as it's a common need in spectroscopy.
Configure notebook style (see NBCONFIG.ipynb), add imports and paths. The %run magic used below requires IPython 2.0 or higher.
%run NBCONFIG.ipynb
Populating the interactive namespace from numpy and matplotlib
We load two sets of test data: one with nm spectral units and one with cm-1 units, the latter ordered traditionally from large to small. We will show slicing on both objects:
from skspec.data import aunps_glass, solvent_evap
t1 = aunps_glass().as_varunit('m')
t2 = solvent_evap().as_varunit('m')
# Plot them side by side
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 5))
t1.plot(ax=ax1)
t2.plot(ax=ax2);
skspec follows pandas directly in terms of column and row selection. By default, a Spectra will index by columns. Let's print the first 3 column labels and index by them:
first_3 = t1.columns[0:3]
print first_3
t1[first_3]
TimeIndex[m]([0.0, 0.05, 0.116666666667], dtype='float64')
| | 0.0 | 0.05 | 0.116666666667 |
---|---|---|---|
430.10 | 257.315595 | 257.462298 | 257.216689 |
430.47 | 267.776463 | 267.823164 | 267.707553 |
430.85 | 278.697354 | 278.704053 | 278.718440 |
431.22 | 290.288222 | 290.344919 | 290.219305 |
431.59 | 302.609089 | 302.635784 | 302.620169 |
431.96 | 314.499957 | 314.616650 | 314.641033 |
432.33 | 327.720825 | 327.857516 | 327.931897 |
432.70 | 340.491693 | 340.758382 | 340.682762 |
433.08 | 355.492584 | 355.819271 | 355.763649 |
433.45 | 369.703452 | 370.100137 | 370.154513 |
433.82 | 384.024319 | 384.261002 | 384.225378 |
434.19 | 399.695187 | 400.011868 | 400.016242 |
434.56 | 415.366055 | 415.592734 | 415.587106 |
434.93 | 430.516923 | 430.543600 | 430.647970 |
435.30 | 445.457791 | 445.534465 | 445.768834 |
435.68 | 462.498682 | 462.435355 | 462.779722 |
436.05 | 479.079550 | 479.136220 | 479.370586 |
436.42 | 495.810417 | 495.737086 | 496.031451 |
436.79 | 511.271285 | 511.087952 | 511.582315 |
437.16 | 527.992153 | 527.788818 | 528.323179 |
437.53 | 545.853021 | 545.679684 | 546.284043 |
437.90 | 563.923889 | 563.790549 | 564.294907 |
438.27 | 581.584756 | 581.481415 | 581.875772 |
438.64 | 600.155624 | 600.162281 | 600.346636 |
439.01 | 617.616492 | 617.573147 | 617.797500 |
439.38 | 633.597360 | 633.494012 | 633.718364 |
439.75 | 649.098227 | 648.834878 | 648.999229 |
440.13 | 666.689119 | 666.505767 | 666.630116 |
440.50 | 683.479986 | 683.346633 | 683.530980 |
440.87 | 697.580854 | 697.167499 | 697.341845 |
... | ... | ... | ... |
669.87 | 187.957941 | 188.223341 | 187.406734 |
670.21 | 185.978739 | 186.254137 | 185.567528 |
670.55 | 184.249536 | 184.544933 | 183.788322 |
670.89 | 182.770334 | 182.985728 | 182.319116 |
671.23 | 181.051131 | 181.156524 | 180.619911 |
671.57 | 179.541928 | 179.527319 | 179.120705 |
671.91 | 177.932726 | 177.878115 | 177.441499 |
672.25 | 176.223523 | 176.228910 | 175.832293 |
672.58 | 174.564297 | 174.529683 | 174.213064 |
672.92 | 172.775095 | 172.700478 | 172.503858 |
673.26 | 171.265892 | 171.101274 | 170.874652 |
673.60 | 169.806689 | 169.562069 | 169.315446 |
673.94 | 168.317487 | 168.032865 | 167.856241 |
674.28 | 166.958284 | 166.683661 | 166.477035 |
674.62 | 165.489082 | 165.274456 | 165.167829 |
674.96 | 163.929879 | 163.725252 | 163.538623 |
675.30 | 162.560677 | 162.246047 | 162.109417 |
675.64 | 161.131474 | 160.806843 | 160.580211 |
675.97 | 159.492248 | 159.317615 | 159.080982 |
676.31 | 157.923045 | 157.708411 | 157.521776 |
676.65 | 156.393843 | 156.229206 | 155.962570 |
676.99 | 155.104640 | 154.900002 | 154.743365 |
677.33 | 153.545438 | 153.340797 | 153.184159 |
677.67 | 152.126235 | 151.961593 | 151.714953 |
678.01 | 150.527033 | 150.432388 | 150.135747 |
678.34 | 149.007806 | 149.053161 | 148.726518 |
678.68 | 147.548604 | 147.573956 | 147.287312 |
679.02 | 146.069401 | 146.044752 | 145.828106 |
679.36 | 144.630199 | 144.625547 | 144.398900 |
679.70 | 143.220996 | 143.266343 | 142.979695 |
704 rows × 3 columns
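The column selection above is ordinary pandas behavior. As a minimal sketch, assuming a plain pandas DataFrame as a stand-in for a Spectra (the values are invented for illustration), the same pattern looks like this:

```python
import pandas as pd

# Toy stand-in for a Spectra: rows are wavelengths (nm), columns are times (min).
# The numbers are made up for illustration only.
df = pd.DataFrame(
    [[1.0, 1.1, 1.2, 1.3],
     [2.0, 2.1, 2.2, 2.3],
     [3.0, 3.1, 3.2, 3.3]],
    index=[430.0, 431.0, 432.0],
    columns=[0.0, 0.5, 1.0, 1.5],
)

first_3 = df.columns[0:3]   # positional slice of the column labels
subset = df[first_3]        # a list-like of labels selects columns
```

Here `subset` keeps every row and only the first three columns.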
For more on the various ways to select data from DataFrames, see the pandas indexing and selection tutorial. We will focus mainly on slicing below, which is where skspec implements its own slicing objects.
We can use ts.nearby to do approximate index slicing: the slice bounds do not need to match labels in the index exactly.
t1.nearby[550.0:650.0].plot(cbar=True);
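To make the idea of approximate slicing concrete, here is a minimal pure-Python sketch. It assumes that each slice bound is snapped to the closest existing label before slicing; this is an illustration of the concept, not necessarily how skspec implements `nearby`. The helper names `nearest_position` and `nearby_slice` are hypothetical.

```python
def nearest_position(labels, value):
    """Return the position of the label closest to `value`."""
    return min(range(len(labels)), key=lambda i: abs(labels[i] - value))


def nearby_slice(labels, start, stop):
    """Return the labels between the nearest matches to `start` and `stop`, inclusive."""
    i = nearest_position(labels, start)
    j = nearest_position(labels, stop)
    return labels[i:j + 1]


# Neither 430.5 nor 431.5 is an actual label; both are snapped to a neighbor.
wavelengths = [430.10, 430.47, 430.85, 431.22, 431.59, 431.96]
selected = nearby_slice(wavelengths, 430.5, 431.5)

# The same helper works on a decreasing axis when the bounds run large to small.
wavenumbers = [3000.0, 2950.0, 2900.0, 2850.0, 2800.0]
selected_desc = nearby_slice(wavenumbers, 2960.0, 2840.0)
```

Because the bounds are resolved by position in the axis, the sketch only requires that `start` comes before `stop` in the axis's own ordering.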
We can slice t2, which has monotonically decreasing values, merely by slicing from large to small:
# Integers are still interpreted as labels, not positions
t2.nearby[3000:2800].plot(cbar=True, colormap='cool');
Pandas indexers natively support row and column slicing. Because our data is in minutes, it's easy to slice. This gets a little trickier when the data is in a complex unit like datetimeindex (see subsequent sections). The syntax for row and column slicing is:
ts.nearby[row_start:row_stop, column_start:column_stop]
Column slicing is also inherently label-based, meaning we cannot slice by integer position (e.g. ts.nearby[5:8, 9:1]).
t2.nearby[3000.0:2800.0, 3.0:11.0].plot(cbar=True, colormap='cool');
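The row-and-column form follows the pandas convention of `[rows, columns]`. A minimal sketch with plain pandas `.loc` on an invented, decreasing wavenumber axis (exact labels are used here, since `.loc` is not approximate):

```python
import pandas as pd

# Toy data: wavenumbers (cm-1) in decreasing order, times in minutes.
df = pd.DataFrame(
    [[r * 10 + c for c in range(5)] for r in range(5)],
    index=[3000.0, 2950.0, 2900.0, 2850.0, 2800.0],
    columns=[0.0, 3.0, 7.0, 11.0, 15.0],
)

# Rows run from large to small, so the row slice does too.
# Both endpoints of a label slice are included.
block = df.loc[2950.0:2850.0, 3.0:11.0]
```

The result contains the three rows from 2950.0 to 2850.0 and the three columns from 3.0 to 11.0.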
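Since skspec is using pandas slicing, it is very flexible. In addition to slices, we can use individual rows/columns, or a list of rows/columns. Note, however, that list selection through nearby currently raises a NotImplementedError (see GH #107), as the traceback below shows. For example, to select 3 columns: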
t2.nearby[3000.0:2800.0, [2, 5, 7]]
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-13-c51e3bbfd197> in <module>()
----> 1 t2.nearby[3000.0:2800.0, [2, 5, 7]]

/home/glue/Desktop/skspec/skspec/skspec/pandas_utils/metadframe.py in __getitem__(self, key)
    324     def __getitem__(self, key):
    325
--> 326         out = super(_MetaLocIndexer, self).__getitem__(key)
    327         if isinstance(out, DataFrame): #If Series or Series subclass
    328             out = self.obj._transfer(out)

/home/glue/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/core/indexing.pyc in __getitem__(self, key)
     67                 pass
     68
---> 69             return self._getitem_tuple(key)
     70         else:
     71             return self._getitem_axis(key, axis=0)

/home/glue/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getitem_tuple(self, tup)
    673                 continue
    674
--> 675             retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
    676
    677         return retval

/home/glue/Desktop/skspec/skspec/skspec/core/spectra.pyc in _getitem_axis(self, key, axis, validate_iterable)
   1479                 # WILL OUTPUT WRONG INDEX STUFF LIKE TIMEINDEX
   1480                 # NEED TO FIX
-> 1481                 raise NotImplementedError("See GH #107")
   1482             out = self._getitem_iterable(key, axis=axis)
   1483

NotImplementedError: See GH #107
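One possible workaround for list selection, sketched here with a plain pandas DataFrame rather than a Spectra, is to snap each requested value to its closest existing label and then pass the exact labels to `.loc`. The helper `snap_to_labels` is hypothetical, and whether the result carries over unchanged to a Spectra is an assumption that has not been checked against skspec.

```python
import pandas as pd

# Toy data: wavenumbers (cm-1) as rows, times (min) as columns.
df = pd.DataFrame(
    [[r * 10 + c for c in range(5)] for r in range(3)],
    index=[3000.0, 2900.0, 2800.0],
    columns=[0.0, 3.0, 7.0, 11.0, 15.0],
)


def snap_to_labels(labels, wanted):
    """Map each requested value to the closest existing label (hypothetical helper)."""
    labels = list(labels)
    return [min(labels, key=lambda lab: abs(lab - w)) for w in wanted]


cols = snap_to_labels(df.columns, [2, 6, 12])   # snaps to 3.0, 7.0, 11.0
picked = df.loc[3000.0:2800.0, cols]            # exact labels, so .loc accepts the list
```

If two requested values snap to the same label, that column will appear twice, so it may be worth de-duplicating `cols` first.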
We can also use steps when slicing to sample the data (e.g. plot 1 out of every 3 timepoints). Let's plot all wavelengths up to 500 nm, and every 10th column, from the first dataset:
t1.nearby[:500, ::10].plot(title='Time sampling by 10', cbar=True);
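Stepped slices are also standard pandas. A minimal sketch with `.loc` on invented data, assuming exact labels for the row bound:

```python
import pandas as pd

# Toy data: 4 wavelengths (nm) by 30 time points; values are arbitrary.
df = pd.DataFrame(
    [[w + t for t in range(30)] for w in (0, 100, 200, 300)],
    index=[480.0, 490.0, 500.0, 510.0],
    columns=[float(t) for t in range(30)],
)

# Rows up to and including the 500.0 label, and every 10th column.
sampled = df.loc[:500.0, ::10]
```

Note that a label slice includes its stop value, so the row labeled 500.0 is part of the result.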
Pandas slicers are not approximate; they require exact column ranges or index values. Pandas implements 3 slicing objects supported by skspec:
ts.loc[]: Exact label slicing
ts.iloc[]: Index slicing
ts.ix[]: Mixed label and index slicing
loc: exact label slicing
loc is very similar to nearby, except that labels must be exact. Slicing outside the range of values of the index is acceptable; it merely returns an empty container:
print t1.index[50], t1.index[55]
448.64 450.48
t1.loc[448.64:450.48].plot(title='Location slicing based on index values above', cbar=True);
# Values are outside the valid spectral axis
t1.loc[5000.0:6000.0]
| | 0.0 | 0.05 | 0.116666666667 | 0.183333333333 | 0.233333333333 | 0.3 | 0.366666666667 | 0.416666666667 | 0.483333333333 | 0.55 | ... | 5.48333333333 | 5.55 | 5.6 | 5.66666666667 | 5.73333333333 | 5.78333333333 | 5.85 | 5.91666666667 | 5.98333333333 | 6.03333333333 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 rows × 100 columns
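The empty result for an out-of-range slice can be reproduced with a plain pandas DataFrame. A minimal sketch on invented data with a sorted float index:

```python
import pandas as pd

df = pd.DataFrame(
    {0.0: [1.0, 2.0, 3.0], 0.5: [1.5, 2.5, 3.5]},
    index=[430.0, 431.0, 432.0],
)

inside = df.loc[430.0:431.0]      # both labels exist; endpoints are included
outside = df.loc[5000.0:6000.0]   # beyond the axis: no rows, columns preserved
```

The out-of-range slice raises no error; it returns a frame with zero rows and the original columns.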
iloc: index slicing
iloc takes indices for row and column values. All of the aforementioned selection types (a single index, slices, lists of index values, etc.) are supported; this is just wrapping the DataFrame's iloc slicer. skspec adds nothing to this.
Let's slice the first 100 wavelengths, and select 5 timepoints at indices 5, 20, 30, 50, 55:
t1.iloc[0:100, [5,20,30, 50, 55]].plot(cbar=True);
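Since `iloc` is plain pandas, the combination of a positional row slice and a list of column positions can be sketched on an invented DataFrame:

```python
import pandas as pd

# Toy data: 6 wavelengths (nm) by 8 time points (min); values encode (row, column).
df = pd.DataFrame(
    [[r * 100 + c for c in range(8)] for r in range(6)],
    index=[430.0 + r for r in range(6)],
    columns=[0.5 * c for c in range(8)],
)

# First 3 rows by position; columns at positions 1, 4 and 6.
picked = df.iloc[0:3, [1, 4, 6]]
```

Unlike label slices, positional slices exclude the stop value, so `0:3` selects rows 0, 1 and 2.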
ix: mixed slicing
ix mixes labels and indices. This is especially useful for timestamped data, where it gets tedious to pass in labels. For example, let's convert our data back to timestamps and look at the first 2 columns:
t1_stamped = t1.as_varunit('dti') #dti = datetimeindex
t1_stamped.columns[0:2]
TimeIndex[dti]([2014-05-22 15:38:23, 2014-05-22 15:38:26], dtype='object')
While slicing/selection of timestamps is possible, it's sometimes easier to deal with indices. In this case, we can use ix to select the spectral axis through its labels and the perturbation axis (time) through indices. Recall that label selection still requires exact values.
t1_stamped.ix[448.64:450.48, 50:60].plot(title='Columns 50-60, rows sliced by label', color='r');
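In pandas itself, `.ix` was deprecated in version 0.20 and removed in 1.0, so on newer pandas the same mixed selection has to be built from `.loc` and `.iloc`. The sketch below uses a plain DataFrame with invented values; whether a given skspec release runs on such a pandas version is not something this guide states.

```python
import pandas as pd

# Toy data: wavelengths as row labels, timestamps (3 s apart) as column labels.
start = pd.Timestamp("2014-05-22 15:38:23")
times = pd.DatetimeIndex([start + pd.Timedelta(seconds=3 * i) for i in range(6)])
df = pd.DataFrame(
    [[r * 10 + c for c in range(6)] for r in range(4)],
    index=[448.0, 449.0, 450.0, 451.0],
    columns=times,
)

# Option 1: chain a label-based row slice with a positional column slice.
mixed_a = df.loc[448.0:450.0].iloc[:, 2:4]

# Option 2: turn positions into labels first, then use a single .loc call.
mixed_b = df.loc[448.0:450.0, df.columns[2:4]]
```

Both options select rows by label and columns by position, which is the combination `ix` is used for above.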