Lesson 11

Grab data from multiple excel files and merge them into a single dataframe.

In [21]:
from pandas import DataFrame, read_excel
import pandas as pd
import os
import sys
%matplotlib inline
In [22]:
print 'Python version ' + sys.version
print 'Pandas version: ' + pd.__version__
Python version 2.7.5 |Anaconda 1.8.0 (64-bit)| (default, Jul  1 2013, 12:37:52) [MSC v.1500 64 bit (AMD64)]
Pandas version: 0.14.0

Create 3 excel files

In [23]:
# Create DataFrame
d = {'Channel':[1], 'Number':[255]}
df = DataFrame(d)
df
Out[23]:
Channel Number
0 1 255
In [24]:
# Export to Excel

df.to_excel('test1.xlsx', sheet_name = 'test1', index = False)
df.to_excel('test2.xlsx', sheet_name = 'test2', index = False)
df.to_excel('test3.xlsx', sheet_name = 'test3', index = False)
print 'Done'
Done

Place all three Excel files into a DataFrame

Get a list of file names but make sure there are no other excel files present in the folder.

In [25]:
# List to hold file names
FileNames = []

# Your path will be different, please modify the path below.
os.chdir(r"C:\Users\david\notebooks\pandas")

# Find any file that ends with ".xlsx"
for files in os.listdir("."):
    if files.endswith(".xlsx"):
        FileNames.append(files)
        
FileNames
Out[25]:
['test1.xlsx', 'test2.xlsx', 'test3.xlsx']

Create a function to process all of the excel files.

In [26]:
def GetFile(fnombre):

    # Path to excel file
    # Your path will be different, please modify the path below.
    location = r'C:\Users\david\notebooks\pandas\\' + fnombre
    
    # Parse the excel file
    # 0 = first sheet
    df = read_excel(location, 0)
    
    # Tag record to file name
    df['File'] = fnombre
    
    # Make the "File" column the index of the df
    return df.set_index(['File'])

Go through each file name, create a dataframe, and add it to a list.

i.e.
df_list = [df, df, df]

In [27]:
# Create a list of dataframes
df_list = [GetFile(fname) for fname in FileNames]
df_list
Out[27]:
[            Channel  Number
 File                       
 test1.xlsx        1     255,             Channel  Number
 File                       
 test2.xlsx        1     255,             Channel  Number
 File                       
 test3.xlsx        1     255]
In [28]:
# Combine all of the dataframes into one
big_df = pd.concat(df_list)
big_df
Out[28]:
Channel Number
File
test1.xlsx 1 255
test2.xlsx 1 255
test3.xlsx 1 255
In [29]:
big_df.dtypes
Out[29]:
Channel    int64
Number     int64
dtype: object
In [30]:
# Plot it!
big_df['Channel'].plot(kind='bar');

Author: David Rojas LLC