Importing pandas under the alias pd is a convention.
import pandas as pd
import numpy as np
%matplotlib inline
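import matplotlib.pyplot as plt
plt.style.use('ggplot')  # stands in for pd.set_option('display.mpl_style', 'default'), which newer pandas releases no longer accept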
Two Parts: Index and Value
pd.Series(np.random.rand(10)).hist(bins=2)
NB: The index can be either just a sequence of integers or something semantic.
s = pd.Series(np.random.rand(1000))
s[:100].hist(bins=10)
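To make the two kinds of index concrete, here is a minimal sketch. The values and the labels 'low', 'mid', and 'high' are made up for illustration and do not come from the data used elsewhere in this notebook.

```python
import pandas as pd

# Default index: pandas numbers the entries 0, 1, 2, ...
default_idx = pd.Series([0.25, 0.5, 0.75])

# Semantic index: the same values, but each one gets a meaningful label
# (these labels are hypothetical)
labeled = pd.Series([0.25, 0.5, 0.75], index=['low', 'mid', 'high'])

# .loc looks a value up by its index label in both cases
second_value = default_idx.loc[1]
mid_value = labeled.loc['mid']
```

Both lookups return the same number; only the way the entry is addressed changes.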
Sometimes you want the index to be something that is not just 0, 1, 2, ...
For example, you might want the index to be timestamps.
rng = pd.date_range('2/10/2015', periods=168, freq='H')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts.plot()
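One reason a time index is useful is that you can select and aggregate by date. The sketch below rebuilds a week of hourly data with predictable values (0, 1, 2, ...) instead of random ones so the results are easy to check. Passing the frequency as a pd.Timedelta is an assumption made here so the sketch does not depend on a particular spelling of the hourly alias.

```python
import numpy as np
import pandas as pd

# One week of hourly timestamps, starting 10 February 2015
rng = pd.date_range('2015-02-10', periods=168, freq=pd.Timedelta(hours=1))

# Predictable values so the selections below are easy to verify
ts = pd.Series(np.arange(len(rng), dtype=float), index=rng)

# Partial string indexing: every hourly value that falls on 11 February
one_day = ts.loc['2015-02-11']

# Resampling: collapse the hourly values into one mean per calendar day
daily_mean = ts.resample('D').mean()
```

Here one_day holds 24 values and daily_mean holds 7, one per day of the week covered.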
A DataFrame is a data type that represents a set of tabular data. It is similar to the tabular types in R, Stata, and MATLAB, and was inspired directly by R.
It has an index, headers (column names), rows, and values.
reg = {'name': 'regenstein',
       'books': 200,
       'rooms': 25}
crerar = {'name': 'crerar',
          'books': 250,
          'rooms': 4}
libraries = [reg, crerar]
lib_short_df = pd.DataFrame(libraries)
lib_short_df
| | books | name | rooms |
|---|---|---|---|
| 0 | 200 | regenstein | 25 |
| 1 | 250 | crerar | 4 |
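The four parts of a DataFrame can each be pulled out directly. The sketch below reuses the two library records from above. The column names are sorted before inspection because the order pandas assigns to columns built from dictionaries has differed between versions.

```python
import pandas as pd

libraries = [
    {'name': 'regenstein', 'books': 200, 'rooms': 25},
    {'name': 'crerar', 'books': 250, 'rooms': 4},
]
df = pd.DataFrame(libraries)

index_labels = list(df.index)   # the index: row labels 0 and 1
headers = sorted(df.columns)    # the headers: column names
first_row = df.loc[0]           # a row: a Series keyed by column name
books = df['books']             # a column: a Series keyed by the index

# The index can be made semantic by promoting a column to it
by_name = df.set_index('name')
crerar_rooms = by_name.loc['crerar', 'rooms']
```

With the name column as the index, a single value is addressed by a row label and a column name rather than by position.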
I/O (input/output): using pandas to get data from elsewhere into a form that you can use.
url = 'http://cfss.uchicago.edu/data/libraries.csv'
lib_df = pd.read_csv(url)
lib_df[' numBooks'].plot(kind='bar')  # bar chart of book counts; the column label is kept as written, leading space included
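A column label with a leading space, like ' numBooks' above, typically appears when a CSV file puts a space after each comma. The sketch below shows one way to handle that, using a small in-memory CSV so it runs without network access. The rows are hypothetical stand-ins, not the contents of libraries.csv, and it is only an assumption that the real file is formatted this way.

```python
import io

import pandas as pd

# Hypothetical CSV text with a space after every comma
csv_text = (
    "name, capacity, numBooks, floors\n"
    "alpha, 100, 1000, 1\n"
    "beta, 750, 2500, 4\n"
)

# Read as-is: the spaces become part of the column labels
raw = pd.read_csv(io.StringIO(csv_text))

# skipinitialspace=True drops the whitespace that follows each delimiter
clean = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

total_books = clean['numBooks'].sum()
```

After the second read, columns can be addressed as clean['numBooks'] without the leading space.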
lib_df.describe()
| | capacity | numBooks | floors |
|---|---|---|---|
| count | 5.000000 | 5.000000 | 5.000000 |
| mean | 650.000000 | 344520.400000 | 3.600000 |
| std | 511.126208 | 542448.921373 | 2.302173 |
| min | 100.000000 | 15032.000000 | 1.000000 |
| 25% | 150.000000 | 75322.000000 | 2.000000 |
| 50% | 750.000000 | 102496.000000 | 4.000000 |
| 75% | 1000.000000 | 224506.000000 | 4.000000 |
| max | 1250.000000 | 1305246.000000 | 7.000000 |