Lesson 4

In this lesson we're going back to the basics. We will be working with a small data set so that you can easily follow what I am trying to explain. We will be adding columns, deleting columns, and slicing the data in several different ways. Enjoy!

In [20]:
# Import libraries
from pandas import DataFrame
import pandas as pd
import sys
In [21]:
# Parenthesized print works as a statement in Python 2 and as a function in Python 3
print('Python version ' + sys.version)
print('Pandas version: ' + pd.__version__)
Python version 2.7.5 |Anaconda 1.8.0 (64-bit)| (default, Jul  1 2013, 12:37:52) [MSC v.1500 64 bit (AMD64)]
Pandas version: 0.14.0
In [22]:
# Our small data set
d = [0,1,2,3,4,5,6,7,8,9]

# Create dataframe
df = DataFrame(d)
df
Out[22]:
   0
0  0
1  1
2  2
3  3
4  4
5  5
6  6
7  7
8  8
9  9
In [23]:
# Let's change the name of the column
df.columns = ['Rev']
df
Out[23]:
   Rev
0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
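
Assigning to df.columns replaces every column name at once. As an alternative, here is a minimal sketch using rename, which maps individual old names to new ones and returns a new dataframe. The variable name renamed is only for illustration.

```python
import pandas as pd

# Same ten values as the lesson's data set
df = pd.DataFrame([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# rename() maps old labels to new ones and returns a new DataFrame.
# The default column label is the integer 0, not the string '0'.
renamed = df.rename(columns={0: 'Rev'})

print(list(df.columns))       # the original dataframe is unchanged
print(list(renamed.columns))
```

This form is convenient when only one or two columns out of many need a new name.
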
In [24]:
# Let's add a column
df['NewCol'] = 5
df
Out[24]:
   Rev  NewCol
0    0       5
1    1       5
2    2       5
3    3       5
4    4       5
5    5       5
6    6       5
7    7       5
8    8       5
9    9       5
In [25]:
# Let's modify our new column
df['NewCol'] = df['NewCol'] + 1
df
Out[25]:
   Rev  NewCol
0    0       6
1    1       6
2    2       6
3    3       6
4    4       6
5    5       6
6    6       6
7    7       6
8    8       6
9    9       6
In [26]:
# We can delete columns
del df['NewCol']
df
Out[26]:
   Rev
0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
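
The del statement removes the column from df in place. If you would rather keep the original dataframe intact, drop is one option. The sketch below rebuilds the small data set so that it runs on its own, and the name trimmed is only for illustration.

```python
import pandas as pd

# Rebuild the lesson's dataframe with the extra column
df = pd.DataFrame({'Rev': list(range(10))})
df['NewCol'] = 5

# drop() with axis=1 removes a column and returns a new DataFrame;
# df itself still contains NewCol afterwards
trimmed = df.drop('NewCol', axis=1)

print(list(df.columns))
print(list(trimmed.columns))
```
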
In [27]:
# Let's add a couple of columns
df['test'] = 3
df['col'] = df['Rev']
df
Out[27]:
   Rev  test  col
0    0     3    0
1    1     3    1
2    2     3    2
3    3     3    3
4    4     3    4
5    5     3    5
6    6     3    6
7    7     3    7
8    8     3    8
9    9     3    9
In [28]:
# If we wanted, we could replace the index labels
i = ['a','b','c','d','e','f','g','h','i','j']
df.index = i
df
Out[28]:
   Rev  test  col
a    0     3    0
b    1     3    1
c    2     3    2
d    3     3    3
e    4     3    4
f    5     3    5
g    6     3    6
h    7     3    7
i    8     3    8
j    9     3    9
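
Assigning a list to df.index swaps out the row labels. The index can also carry a name of its own, which is a separate attribute. A small sketch of the difference follows; the name 'Letter' is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Rev': list(range(10))})

# Replace the row labels with letters
df.index = list('abcdefghij')

# Give the index itself a name; 'Letter' is a hypothetical label
df.index.name = 'Letter'

print(df.head(3))
```
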

We can now start to select pieces of the dataframe using loc.

Note: loc is strictly label based. It is available from [version 0.11.0](http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#v0-11-0-april-22-2013).

In [29]:
df.loc['a']
Out[29]:
Rev     0
test    3
col     0
Name: a, dtype: int64
In [30]:
# df.loc[inclusive:inclusive]
df.loc['a':'d']
Out[30]:
   Rev  test  col
a    0     3    0
b    1     3    1
c    2     3    2
d    3     3    3
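
loc also accepts a column selector after the row selector, so rows and columns can be picked in one step. The sketch below rebuilds the lesson's dataframe so it runs on its own. The names value and block are only for illustration.

```python
import pandas as pd

# Rebuild the lesson's dataframe
df = pd.DataFrame({'Rev': list(range(10))}, index=list('abcdefghij'))
df['test'] = 3
df['col'] = df['Rev']

# A row label and a column label together select a single value
value = df.loc['d', 'Rev']

# A label slice on rows (both ends included) plus a list of columns
block = df.loc['b':'d', ['Rev', 'col']]

print(value)
print(block)
```
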
In [31]:
# df.iloc[inclusive:exclusive]
# Note: .iloc is strictly integer position based. It is available from version 0.11.0 (http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#v0-11-0-april-22-2013)
df.iloc[0:3]
Out[31]:
   Rev  test  col
a    0     3    0
b    1     3    1
c    2     3    2
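
The difference in slice endpoints is easy to trip over, so here is a short side-by-side sketch. It rebuilds the lesson's dataframe so it runs on its own, and the variable names are only for illustration.

```python
import pandas as pd

# Rebuild the lesson's dataframe
df = pd.DataFrame({'Rev': list(range(10))}, index=list('abcdefghij'))
df['test'] = 3
df['col'] = df['Rev']

# Label slice: the end label 'c' is included
by_label = df.loc['a':'c']

# Position slice: the end position 3 is excluded
by_position = df.iloc[0:3]

# Both selections contain rows a, b, and c
print(by_label.equals(by_position))

# iloc takes a column position as well: second row, first column
cell = df.iloc[1, 0]
print(cell)
```
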

We can also select using the column name.

In [32]:
df['Rev']
Out[32]:
a    0
b    1
c    2
d    3
e    4
f    5
g    6
h    7
i    8
j    9
Name: Rev, dtype: int64
In [33]:
df[['Rev', 'test']]
Out[33]:
   Rev  test
a    0     3
b    1     3
c    2     3
d    3     3
e    4     3
f    5     3
g    6     3
h    7     3
i    8     3
j    9     3
In [34]:
# df['ColumnName'][inclusive:exclusive]
df['Rev'][0:3]
Out[34]:
a    0
b    1
c    2
Name: Rev, dtype: int64
In [35]:
df['col'][5:]
Out[35]:
f    5
g    6
h    7
i    8
j    9
Name: col, dtype: int64
In [36]:
df[['col', 'test']][:3]
Out[36]:
   col  test
a    0     3
b    1     3
c    2     3

There are also some handy functions to select the top and bottom records of a dataframe.

In [37]:
# Select top N number of records (default = 5)
df.head()
Out[37]:
   Rev  test  col
a    0     3    0
b    1     3    1
c    2     3    2
d    3     3    3
e    4     3    4
In [38]:
# Select bottom N number of records (default = 5)
df.tail()
Out[38]:
   Rev  test  col
f    5     3    5
g    6     3    6
h    7     3    7
i    8     3    8
j    9     3    9
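
Both functions take the number of records as an argument when the default of 5 is not what you want. A quick sketch, rebuilding the lesson's dataframe so it runs on its own (top3 and bottom2 are illustrative names):

```python
import pandas as pd

# Rebuild the lesson's dataframe
df = pd.DataFrame({'Rev': list(range(10))}, index=list('abcdefghij'))
df['test'] = 3
df['col'] = df['Rev']

# First three records
top3 = df.head(3)

# Last two records
bottom2 = df.tail(2)

print(top3)
print(bottom2)
```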

Author: David Rojas LLC