Importing delimited text files can be accomplished using a variety of modules in Python. This notebook will cover pure python, the csv module, the numpy module and pandas.
In pure Python we:
The types of the resulting items in the list of list will always be strings, so if you need something else it will have to be converted later.
datafile = open('./data/examp_data.txt', 'r')
data = []
for row in datafile:
data.append(row.strip().split(','))
data
[['x', ' y', ' z'], ['1', ' 2', ' 3'], ['2', ' 2.4', ' 6'], ['3', ' 1.9', ' 8']]
In the csv module we:
The types of the resulting items in the list of list will always be strings, so if you need something else it will have to be converted later.
import csv
datafile = open('./data/examp_data.txt', 'r')
datareader = csv.reader(datafile, delimiter=',')
data = []
for row in datareader:
data.append(row)
data
[['x', ' y', ' z'], ['1', ' 2', ' 3'], ['2', ' 2.4', ' 6'], ['3', ' 1.9', ' 8']]
Importing using Numpy will import the data into a Numpy array, a commonly used data structure for scientific programming.
Using Numpy we simply use the genfromtxt() function to directly import the data. genfromtxt() has a lot of options for controlling what and how gets imported. See the docs page for details: http://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html
Numpy will autodetect the data type, so we'll often want to leave off the header row(s) (skip_header=1). We could keep them using the names=True argument, and also columns with different data types, but that creates a structured array and if we want to work with that type of data we're typically better off using pandas (see below).
import numpy
data = numpy.genfromtxt('./data/examp_data.txt', delimiter = ',', skip_header=1)
data
array([[ 1. , 2. , 3. ], [ 2. , 2.4, 6. ], [ 3. , 1.9, 8. ]])
Pandas is a powerful data management library that produces data structures and associated tools that are ideal for scientific computing tasks related to data. In particular, it produces a dataframe object that is much like R's dataframe and is designed to hold data with the standard structure of one row per record and one column per type of data (or field).
In Pandas we just use the read_csv() function to import text files. It has a lot of options, but will do most things automatically including detecting delimiters and detecting data types.
import pandas as pd
data = pd.read_csv('./data/examp_data.txt')
data
x y z 0 1 2.0 3 1 2 2.4 6 2 3 1.9 8
Pandas dataframes do behave a bit differently than a lot of list based structures in Python, but we'll learn how to work with them soon. If you just want to pull the core data out of a dataframe you can do this using the values member (a member is just a variable associated with an object).
data.values
array([[ 1. , 2. , 3. ], [ 2. , 2.4, 6. ], [ 3. , 1.9, 8. ]])