NumPy is the fundamental library for scientific computing in Python. It provides list-like objects that behave like arrays, matrices, and data tables, which is how scientists typically expect data to behave. NumPy also provides linear algebra, Fourier transforms, random number generation, and tools for integrating C/C++ and Fortran code.
If you primarily want to work with tables of data, Pandas, which depends on NumPy, is probably the module to start with.
import numpy as np
example_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
example_array
array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
example_array[1, 1]
5
example_array[:, 0]
array([1, 4, 7])
example_array[1, :]
array([4, 5, 6])
example_array[1:3, 1:3]
array([[5, 6], [8, 9]])
array1 = np.array([1, 1, 1, 2, 2, 2, 1])
array2 = np.array([1, 2, 3, 4, 5, 6, 7])
array2[array1==1]
array([1, 2, 3, 7])
array3 = np.array(['a', 'a', 'a', 'b', 'b', 'b', 'b'])
array2[(array1==1) & (array3=='a')]
array([1, 2, 3])
array1 = np.array([1, 1, 1, 2, 2, 2, 1])
array2 = np.array([1, 2, 3, 4, 5, 6, 7])
array1 * 2 + 1
array([3, 3, 3, 5, 5, 5, 3])
array1 * array2
array([ 1, 2, 3, 8, 10, 12, 7])
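Linear algebra can be done using a different data structure called a matrix, for which * performs matrix multiplication rather than the elementwise multiplication used for arrays.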
matrix1 = np.matrix([[1, 2, 3], [4, 5, 6]])
matrix2 = np.matrix([1, 2, 3])
matrix1 * matrix2.transpose()
matrix([[14], [32]])
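The same product can also be computed with ordinary arrays, which current NumPy documentation recommends over np.matrix. The following is a minimal sketch, assuming Python 3.5 or later for the @ operator; the names array_a and array_b are illustrative and do not come from the text above.

```python
import numpy as np

# Illustrative names; the values mirror matrix1 and matrix2.
array_a = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
array_b = np.array([[1, 2, 3]])              # shape (1, 3)

# For arrays, @ is matrix multiplication, while * stays elementwise.
product = array_a @ array_b.T                # shape (2, 1)
print(product)
```

Because the result is a plain array, it can be indexed, masked, and combined with other arrays in the same way as the earlier examples.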
The NumPy function genfromtxt is a powerful way to import text data. It can use different delimiters, skip header rows, control the type of imported data, give columns of data names, and more. See the documentation for a full list of features, or run help(np.genfromtxt) from the Python shell (after importing the module, of course).
data = np.genfromtxt('../data/examp-data.txt', delimiter=',', skip_header=1)
data
array([[ 1. , 2. , 3. ], [ 2. , 2.4, 6. ], [ 3. , 1.9, 8. ]])
np.savetxt('../data/examp-output.txt', data, delimiter=',')
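To try the import and export calls without a file on disk, an in-memory text buffer can stand in for the file. This is only a sketch: the contents of sample_text, including its header names, are hypothetical and merely shaped like the output shown above, and it assumes a reasonably recent NumPy (1.14 or later) on Python 3.

```python
import io
import numpy as np

# Hypothetical file contents: one header row, then comma-separated numbers.
sample_text = "id,width,length\n1,2,3\n2,2.4,6\n3,1.9,8\n"

# genfromtxt accepts file-like objects, so StringIO replaces a path here.
sample = np.genfromtxt(io.StringIO(sample_text), delimiter=',', skip_header=1)
print(sample.shape)

# Round trip: write to an in-memory buffer with savetxt, then read it back.
buffer = io.StringIO()
np.savetxt(buffer, sample, delimiter=',')
buffer.seek(0)
round_trip = np.genfromtxt(buffer, delimiter=',')
print(np.allclose(sample, round_trip))
```

Note that savetxt writes only the numbers, so the header row skipped on import is not present in the exported text.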
Much scientific data comes in the form of tables, with one row per observation and one column per thing observed. Often the different columns have different types (including text). A good way to work with this type of data in NumPy is a Structured Array.
To do this we let NumPy automatically detect the data types in each column using the optional argument dtype=None.
We can also use an existing header row as the names for the columns using the optional argument names=True.
data = np.genfromtxt('../data/examp-data-species-mass.txt', dtype=None, names=True, delimiter=',')
data
array([(1, 'DS', 125), (1, 'DM', 70), (2, 'DM', 55), (1, 'CB', 40), (2, 'DS', 110), (1, 'CB', 45)], dtype=[('site', '<i8'), ('species', '|S2'), ('mass', '<i8')])
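The effect of dtype=None and names=True can be checked by inspecting the dtype of the result. The sketch below rebuilds the file contents from the output shown above, which is an assumption about what the file looks like, and passes encoding='utf-8' (an addition not in the original call, available in NumPy 1.14 and later) so that text columns are read as ordinary strings on Python 3.

```python
import io
import numpy as np

# Assumed file contents, reconstructed from the array printed above.
sample_text = (
    "site,species,mass\n"
    "1,DS,125\n1,DM,70\n2,DM,55\n1,CB,40\n2,DS,110\n1,CB,45\n"
)

sample = np.genfromtxt(io.StringIO(sample_text), dtype=None, names=True,
                       delimiter=',', encoding='utf-8')

# Column names come from the header row.
print(sample.dtype.names)

# Each column has its own detected type: integers for site and mass,
# a fixed-width string type for species.
for name in sample.dtype.names:
    print(name, sample[name].dtype.kind)
```

The exact type codes printed for a column (for example the integer width) can differ between platforms and NumPy versions, so only the kind of each column is shown here.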
One easy way to export a structured array is to treat it like a list of lists and write it out with the csv module, using a function like this.
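import csv

def export_to_csv(data, filename):
    # 'wb' is the Python 2 convention for csv files; on Python 3 use
    # open(filename, 'w', newline='') instead.
    with open(filename, 'wb') as outputfile:
        datawriter = csv.writer(outputfile)
        datawriter.writerows(data)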
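On Python 3 the csv module expects a text-mode file opened with newline=''. The following sketch shows one way such an export could look under that assumption; the function name export_structured_to_csv, the header row, and the sample records are illustrative additions rather than part of the original function.

```python
import csv
import os
import tempfile

import numpy as np

def export_structured_to_csv(data, filename):
    """Write a structured array to a CSV file with a header row (Python 3)."""
    # newline='' lets the csv module control line endings itself.
    with open(filename, 'w', newline='') as outputfile:
        datawriter = csv.writer(outputfile)
        datawriter.writerow(data.dtype.names)   # column names from the dtype
        datawriter.writerows(data.tolist())     # rows as plain Python tuples

# Usage with a small hand-built structured array (illustrative values).
records = np.array([(1, 'DS', 125), (2, 'DM', 55)],
                   dtype=[('site', 'i8'), ('species', 'U2'), ('mass', 'i8')])
output_path = os.path.join(tempfile.mkdtemp(), 'example-output.csv')
export_structured_to_csv(records, output_path)

# Read the file back to confirm what was written.
with open(output_path, newline='') as f:
    written_rows = list(csv.reader(f))
print(written_rows)
```

Converting with tolist() hands the csv writer plain Python tuples, which avoids relying on how NumPy record objects behave when iterated.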
Once data is in a Structured Array, we can select columns by name and subset rows by condition, which covers much of what we often want to do with scientific data.
data = np.genfromtxt('../data/examp-data-species-mass.txt', dtype=None, names=True, delimiter=',')
print(data)
data['species']
[(1, 'DS', 125) (1, 'DM', 70) (2, 'DM', 55) (1, 'CB', 40) (2, 'DS', 110) (1, 'CB', 45)]
array(['DS', 'DM', 'DM', 'CB', 'DS', 'CB'], dtype='|S2')
data['mass'][data['species'] == 'DM']
array([70, 55])
data['mass'][(data['species'] == 'DM') & (data['site'] == 1)]
array([70])
np.random.rand(3, 5)
array([[ 0.03414585, 0.83900235, 0.93206285, 0.06820967, 0.70145045], [ 0.552352 , 0.76730225, 0.06316622, 0.71285231, 0.81976971], [ 0.39709379, 0.71772434, 0.21598482, 0.96412023, 0.69841293]])
np.random.randn(4, 2)
array([[ 0.04802043, -0.89025722], [ 0.46246887, 1.11994326], [-0.95655129, 0.76707094], [-1.61019706, -0.21933367]])
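# Avoid the names min and max, which would shadow Python's built-in functions.
low = 10
high = 20  # the upper bound is exclusive
np.random.randint(low, high, [10, 2])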
array([[13, 10], [18, 10], [10, 12], [10, 16], [17, 12], [17, 13], [19, 16], [11, 17], [11, 16], [16, 18]])