Nikolay Koldunov
koldunovn@gmail.com
import pandas as pd
import numpy as np
%pylab inline
import seaborn as sns
Populating the interactive namespace from numpy and matplotlib
We are going to use data from the Indian Institute of Tropical Meteorology, in particular the Longest Instrumental Rainfall Series of the Indian Regions (1813-2006) time series.
data = np.loadtxt("8-all_ind.txt", skiprows=2)
The data come in as a table with one row per year and the monthly and yearly values as columns:
data
array([[ 1813. , 10. , 14.2, ..., 97.7, 840.6, 156.2], [ 1814. , 9. , 12.3, ..., 81.1, 837.3, 81.7], [ 1815. , 18.5, 14.7, ..., 94.1, 895.5, 179.7], ..., [ 2004. , 18.4, 5.1, ..., 150.9, 807.2, 125.4], [ 2005. , 20.7, 19.8, ..., 106.4, 909.4, 161.4], [ 2006. , 5. , 3.4, ..., 154.3, 927.8, 115.5]])
We actually need only the 12 monthly columns:
data.shape
(194, 18)
data[:,1:13].shape
(194, 12)
One way to make a continuous time series is to flatten the array:
data_flat = data[:,1:13].flatten()
Now it's one-dimensional:
data_flat.shape
(2328,)
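Flattening works here because NumPy flattens in row-major order by default, so each year's months are laid out one after another. A minimal sketch on a toy table (the values and the three-"month" layout are made up for illustration, not taken from the rainfall file):

```python
import numpy as np

# Hypothetical toy table: year, three "monthly" values, yearly total
toy = np.array([[2001., 1., 2., 3., 6.],
                [2002., 4., 5., 6., 15.]])

# Keep only the monthly columns (drop the year and the total)
monthly = toy[:, 1:4]

# Row-major flattening: all values of the first row, then the second row
series = monthly.flatten()
print(series)  # [1. 2. 3. 4. 5. 6.]
```

The result is in chronological order only if the rows are sorted by year and the columns by month, as they are in this table.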
Let's now construct a pandas DataFrame.
We know the time range of the data, so we can create an array of dates that can be used as an index:
dates = pd.period_range('1813-01','2006-12', freq="M")
dates
<class 'pandas.tseries.period.PeriodIndex'> [1813-01, ..., 2006-12] Length: 2328, Freq: M
Now we can put together our data and index:
df = pd.DataFrame({'PRC':data_flat}, index=dates)
df.plot()
<matplotlib.axes.AxesSubplot at 0x7f1b77114810>
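Before trusting the plot, it may be worth checking that the index and the data really line up: 194 years times 12 months should give 2328 monthly periods. A small sketch, using synthetic stand-in values instead of the real rainfall data (`df_demo` and `values` are illustrative names):

```python
import numpy as np
import pandas as pd

# Same monthly period index as in the notebook
dates = pd.period_range('1813-01', '2006-12', freq='M')

# 1813..2006 inclusive is 194 years, 12 months each
n_years = 2006 - 1813 + 1
assert len(dates) == n_years * 12

# Synthetic stand-in values (NOT the real rainfall data)
values = np.arange(len(dates), dtype=float)
df_demo = pd.DataFrame({'PRC': values}, index=dates)

# First and last labels of the index, as strings
first_label = str(df_demo.index[0])
last_label = str(df_demo.index[-1])
print(first_label, last_label)
```

If the lengths did not match, pandas would raise an error when constructing the DataFrame, so this check mostly guards against an off-by-one in the date range.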
Now create yearly totals:
df.resample('A').sum().plot()
<matplotlib.axes.AxesSubplot at 0x7f1b76d1d350>
We actually have the yearly values in the original data (column index 13), so we can compare our result:
plot(data[:,13])
[<matplotlib.lines.Line2D at 0x7f1b768e6c50>]
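The comparison can also be done numerically rather than by eye. The sketch below uses synthetic data and assumes the yearly column is the sum of the twelve monthly values; whether that holds for the real file is an assumption to verify. It groups by the year of the index instead of calling `resample`, a swapped-in technique that gives the same yearly sums and does not depend on the resample frequency aliases of a particular pandas version.

```python
import numpy as np
import pandas as pd

# Synthetic monthly values for 5 years (NOT the real rainfall data)
rng = np.random.default_rng(0)
n_years = 5
monthly = rng.uniform(0, 300, size=(n_years, 12))

# Stand-in for the yearly column of the table (assumed to be a sum)
annual_col = monthly.sum(axis=1)

# Build the monthly series exactly as in the notebook
dates = pd.period_range('2000-01', periods=n_years * 12, freq='M')
df_demo = pd.DataFrame({'PRC': monthly.flatten()}, index=dates)

# Yearly sums computed from the monthly series
yearly = df_demo.groupby(df_demo.index.year)['PRC'].sum()

# True if the recomputed sums agree with the yearly column
matches = np.allclose(yearly.values, annual_col)
print(matches)
```

With the real data, `annual_col` would be `data[:, 13]`; small differences could come from rounding in the source file.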
Extract the data for the JJAS (June-September) period, put them into a pandas DataFrame variable, and plot them.
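One possible approach to this task, sketched on synthetic data, is to select rows by the month of the period index. The names `df_demo`, `jjas` and `jjas_yearly` are illustrative, and the values are made up (each value equals its month number) so that the result is easy to check by hand:

```python
import numpy as np
import pandas as pd

# Three years of monthly periods
dates = pd.period_range('2000-01', '2002-12', freq='M')

# Synthetic stand-in: each value is simply its month number
df_demo = pd.DataFrame({'PRC': np.asarray(dates.month, dtype=float)},
                       index=dates)

# Boolean mask that is True for June, July, August and September
is_jjas = df_demo.index.month.isin([6, 7, 8, 9])
jjas = df_demo[is_jjas]

# Seasonal total per year: 6 + 7 + 8 + 9 = 30 for every year here
jjas_yearly = jjas.groupby(jjas.index.year)['PRC'].sum()
print(jjas_yearly)
```

Calling `jjas_yearly.plot()` would then draw the seasonal series; with the real `df` the same mask applies unchanged.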