The following is part of my notes on scientific computing with python:
# load numpy
import numpy as np
# load scipy
import scipy as sp
# load data visualization
import matplotlib.pyplot as plt # the tidy way
# allows embedding of plots in the notebook
%matplotlib inline
# load image processing
from IPython.display import Image
NumPy is the fundamental package for scientific computing with Python.
Python:
Numpy:
Numpy arrays are the standard tool for scienfitic computing. Here are some differences between lists, which we're all familiar with, and ndarrays:
a = np.array([0,1,2,3])
a
array([0, 1, 2, 3])
# old way
L = range(1000)
%timeit [i**2 for i in L]
10000 loops, best of 3: 73.9 µs per loop
# memory and computation efficient using numpy
a = np.arange(1000)
%timeit a**2
100000 loops, best of 3: 2.08 µs per loop
# access documentation - popup
np.array?
# search numpy
np.lookfor('create array')
Search results for 'create array' --------------------------------- numpy.array Create an array. numpy.memmap Create a memory-map to an array stored in a *binary* file on disk. numpy.diagflat Create a two-dimensional array with the flattened input as a diagonal. numpy.fromiter Create a new 1-dimensional array from an iterable object. numpy.partition Return a partitioned copy of an array. numpy.ma.diagflat Create a two-dimensional array with the flattened input as a diagonal. numpy.ctypeslib.as_array Create a numpy array from a ctypes array or a ctypes POINTER. numpy.ma.make_mask Create a boolean mask from an array. numpy.ctypeslib.as_ctypes Create and return a ctypes object from a numpy array. Actually numpy.ma.mrecords.fromarrays Creates a mrecarray from a (flat) list of masked arrays. numpy.lib.format.open_memmap Open a .npy file as a memory-mapped array. numpy.ma.MaskedArray.__new__ Create a new masked array from scratch. numpy.lib.arrayterator.Arrayterator Buffered iterator for big arrays. numpy.ma.mrecords.fromtextfile Creates a mrecarray from data stored in the file `filename`. numpy.oldnumeric.ma.fromfunction apply f to s to create array as in umath. numpy.oldnumeric.ma.masked_object Create array masked where exactly data equal to value numpy.oldnumeric.ma.masked_values Create a masked array; mask is nomask if possible. numpy.asarray Convert the input to an array. numpy.ndarray ndarray(shape, dtype=float, buffer=None, offset=0, numpy.recarray Construct an ndarray that allows field access using attributes. numpy.chararray chararray(shape, itemsize=1, unicode=False, buffer=None, offset=0, numpy.pad Pads an array. numpy.sum Sum of array elements over a given axis. numpy.asanyarray Convert the input to an ndarray, but pass ndarray subclasses through. numpy.copy Return an array copy of the given object. numpy.diag Extract a diagonal or construct a diagonal array. numpy.load Load an array(s) or pickled objects from .npy, .npz, or pickled files. numpy.sort Return a sorted copy of an array. numpy.array_equiv Returns True if input arrays are shape consistent and all elements equal. numpy.dtype Create a data type object. numpy.choose Construct an array from an index array and a set of arrays to choose from. numpy.nditer Efficient multi-dimensional iterator object to iterate over arrays. numpy.swapaxes Interchange two axes of an array. numpy.full_like Return a full array with the same shape and type as a given array. numpy.ones_like Return an array of ones with the same shape and type as a given array. numpy.empty_like Return a new array with the same shape and type as a given array. numpy.zeros_like Return an array of zeros with the same shape and type as a given array. numpy.asarray_chkfinite Convert the input to an array, checking for NaNs or Infs. numpy.diag_indices Return the indices to access the main diagonal of an array. numpy.ma.choose Use an index array to construct a new array from a set of choices. numpy.chararray.tolist a.tolist() numpy.matlib.rand Return a matrix of random values with given shape. numpy.savez_compressed Save several arrays into a single file in compressed ``.npz`` format. numpy.ma.empty_like Return a new array with the same shape and type as a given array. numpy.ma.make_mask_none Return a boolean mask of the given shape, filled with False. numpy.ma.mrecords.fromrecords Creates a MaskedRecords from a list of records. numpy.around Evenly round to the given number of decimals. numpy.source Print or write to a file the source code for a Numpy object. numpy.diagonal Return specified diagonals. numpy.histogram2d Compute the bi-dimensional histogram of two data samples. numpy.fft.ifft Compute the one-dimensional inverse discrete Fourier Transform. numpy.fft.ifftn Compute the N-dimensional inverse discrete Fourier Transform. numpy.busdaycalendar A business day calendar object that efficiently stores information
# wildcard search
np.con*?
# 1D array
a = np.array([0,1,2,3])
print a.ndim
print a.shape
1 (4L,)
# 2D
b = np.array([[0, 1, 2] , [3, 4, 5]])
print b.ndim
print b.shape
2 (2L, 3L)
# 3D
c = np.array([[[1], [2]], [[3], [4]]])
c
array([[[1], [2]], [[3], [4]]])
c.shape
(2L, 2L, 1L)
# creating arrays
a = np.arange(10)
a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
b = np.arange(1,9,2)
b
array([1, 3, 5, 7])
c = np.linspace(0,1,6) # start,stop,number of points
c
array([ 0. , 0.2, 0.4, 0.6, 0.8, 1. ])
a = np.ones((3,3)) # note the strange way to enter it! (3,3) is entered as a tuple!
a
array([[ 1., 1., 1.], [ 1., 1., 1.], [ 1., 1., 1.]])
b = np.zeros((2,2))
b
array([[ 0., 0.], [ 0., 0.]])
c = np.eye(3)
c
array([[ 1., 0., 0.], [ 0., 1., 0.], [ 0., 0., 1.]])
d = np.diag(np.array([0,1,2,3]))
d
array([[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3]])
np.random.rand(4,2) # uniform [0,1]
array([[ 0.84747313, 0.21853345], [ 0.59142407, 0.96509551], [ 0.1320055 , 0.08128184], [ 0.84485094, 0.76080588]])
np.random.randn(5)
array([ 0.68053198, -0.91925977, -0.01203956, -2.13515203, -0.1439533 ])
# complex
np.array([1+2j, 3+4j])
array([ 1.+2.j, 3.+4.j])
# strings
np.array(['Bonjour', 'hello'])
array(['Bonjour', 'hello'], dtype='|S7')
# load data visualization
import matplotlib.pyplot as plt # the tidy way
a = np.linspace(1,10,10)
b = np.linspace(2,20,10)
plt.plot(a,b,'o') # points
plt.plot(a,b) # line (matlab's "hold" seems to already be active? not sure if this is only an ipython thing)
[<matplotlib.lines.Line2D at 0x8043198>]
image = np.random.rand(30, 30)
plt.imshow(image, cmap=plt.cm.hot)
plt.colorbar()
<matplotlib.colorbar.Colorbar instance at 0x00000000080C4648>
# indexing and slicing
a = np.arange(10)
a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
a[0],a[2],a[-1]
(0, 2, 9)
# reversing an array
a[::-1] # go from start to end by -1
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
# multidimensional arrays are tuples of ints
a = np.random.rand(3,3) # notice there is no tuple size specification here! strange!
a
array([[ 0.14312807, 0.53999226, 0.75505203], [ 0.40060085, 0.43943795, 0.9280733 ], [ 0.64114899, 0.87781988, 0.39220377]])
print a[1,1] # 2nd row, 2nd column
print a[2,1] # 3rd row, 2nd column
print a[1] # 2nd row
print a[:,2] # return 3rd column - as a row vector?
0.439437950863 0.877819883543 [ 0.40060085 0.43943795 0.9280733 ] [ 0.75505203 0.9280733 0.39220377]
# slicing
a = np.arange(11,21)
a
array([11, 12, 13, 14, 15, 16, 17, 18, 19, 20])
a[:4]
array([11, 12, 13, 14])
a[3:]
array([14, 15, 16, 17, 18, 19, 20])
a[::3]
array([11, 14, 17, 20])
a = np.arange(10)
a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# slicing and assigning
a[5:] = 10
a
array([ 0, 1, 2, 3, 4, 10, 10, 10, 10, 10])
b = np.arange(5)
b
array([0, 1, 2, 3, 4])
a[5:] = b[::-1]
a
array([0, 1, 2, 3, 4, 4, 3, 2, 1, 0])
Note: slicing operations create views on the original array - a way to view its data. The original array is not copied in memory when this occurs. Modifying a view, modifies the original array!
a = np.arange(10)
a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
b = a[::2]
b
array([0, 2, 4, 6, 8])
b[0] = 12
b
array([12, 2, 4, 6, 8])
a # a is now modified!!
array([12, 1, 2, 3, 4, 5, 6, 7, 8, 9])
We must force a copy to be made when we want one (taking up memory). This behavior allows you to save memory and time! (example given in docs)
a = np.arange(10)
a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
c = a[::2].copy() # forces the copy
c[0] = 12
a # a remains unchanged
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Note: Numpy arrays can be indexed with slices, but also with boolean or integer arrays (masks). This method is called fancy indexing. It creates copies not views.
# fancy indexing
np.random.seed(3)
a = np.random.random_integers(0, 20, 15)
a
array([10, 3, 8, 0, 19, 10, 11, 9, 10, 6, 0, 20, 12, 7, 14])
(a % 3 == 0)
array([False, True, False, True, False, False, False, True, False, True, True, False, True, False, False], dtype=bool)
mask = (a % 3 == 0)
extract_from_a = a[mask] # or, a[a%3==0]
extract_from_a # extract a sub-array with the mask
array([ 3, 0, 9, 6, 0, 12])
# index mask can be converted to position using `where` function
indeces = np.where(mask)
extract_from_a[0] = 4
extract_from_a
array([ 4, 0, 9, 6, 0, 12])
a # a is unchanged!
array([10, 3, 8, 0, 19, 10, 11, 9, 10, 6, 0, 20, 12, 7, 14])
# indexing with arrays of integers
a = np.arange(0, 100, 10)
a
array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])
indeces_we_want = [2,3,2,4,2] # repeated indices!
a[indeces_we_want]
array([20, 30, 20, 40, 20])
a[[9,7]] = 34 # assigning values using this indexing - note they can be out of order!
a
array([ 0, 10, 20, 30, 40, 50, 60, 34, 80, 34])
# **Note:** Try to always use numpy array operations. There much faster then native python:
a = np.arange(10000)
%timeit a + 1
100000 loops, best of 3: 12.8 µs per loop
l = range(10000)
%timeit [i+1 for i in l]
1000 loops, best of 3: 484 µs per loop
NOTE: Array multiplication is not matrix multiplication!!!!
c = np.ones((3,3)) * 3
d = np.ones((3,4)) * 2
c * d # NOT MATRIX MULTIPLICATION
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) <ipython-input-57-e88232272fbe> in <module>() 1 c = np.ones((3,3)) * 3 2 d = np.ones((3,4)) * 2 ----> 3 c * d # NOT MATRIX MULTIPLICATION ValueError: operands could not be broadcast together with shapes (3,3) (3,4)
c.dot(d) # THIS IS MATRIX MULTIPLICATION
Note: We can also change their type to the matrix
type, where the typical operands are using matrix algebra.
V = matrix(c)
B = matrix(d)
type(V)
V*B # this IS matrix algebra
v1 = arange(5)
v = matrix(v1).T # make it a column vector
v.T*v # inner product
Numpy matrices are strictly 2-dimensional, while numpy arrays (ndarrays) are N-dimensional. Matrix objects are a subclass of ndarray, so they inherit all the attributes and methods of ndarrays.
The main advantage of numpy matrices is that they provide a convenient notation for matrix multiplication: if a
and b
are matrices, then `ab` is their matrix product.*
a = np.mat('4 3; 2 1')
b = np.mat('1 2; 3 4')
print(a)
print(b)
print(a*b)
Both matrix objects and ndarrays have .T
to return the transpose, but matrix objects also have .H
for the conjugate transpose, and .I
for the inverse.
In contrast, numpy arrays consistently abide by the rule that operations are applied element-wise. Thus, if a
and b
are numpy arrays, then a*b
is the array formed by multiplying the components element-wise:
c = np.array([[4, 3], [2, 1]])
d = np.array([[1, 2], [3, 4]])
print(c*d)
To obtain the result of matrix multiplication, you use np.dot
:
print(np.dot(c,d))
The **
operator also behaves differently:
print(a**2)
print(c**2)
Since a
is a matrix, a**2
returns the matrix product a*a
. Since c
is an ndarray, c**2
returns an ndarray with each component squared element-wise.
There are other technical differences between matrix objects and ndarrays (having to do with np.ravel, item selection and sequence behavior).
The main advantage of numpy arrays is that they are more general than 2-dimensional matrices. What happens when you want a 3-dimensional array? Then you have to use an ndarray, not a matrix object. Thus, learning to use matrix objects is more work -- you have to learn matrix object operations, and ndarray operations.
Writing a program that uses both matrices and arrays makes your life difficult because you have to keep track of what type of object your variables are, lest multiplication return something you don't expect.
In contrast, if you stick solely with ndarrays, then you can do everything matrix objects can do, and more, except with slightly different functions/notation.
If you are willing to give up the visual appeal of numpy matrix product notation, then I think numpy arrays are definitely the way to go.
# array comparisons
a = np.array([1, 2, 3, 4])
b = np.array([4, 2, 2, 4])
a == b
a > b
# array wise comparsions
a = np.array([1, 2, 3, 4])
b = np.array([4, 2, 2, 4])
c = np.array([1, 2, 3, 4])
np.array_equal(a, b)
np.array_equal(a, c)
# logical operations
a = np.array([1, 1, 0, 0], dtype=bool)
b = np.array([1, 0, 1, 0], dtype=bool)
np.logical_or(a, b)
np.logical_and(a, b)
# sin/log/etc.
a = np.arange(5)
np.sin(a)
np.log(a)
np.exp(a)
# upper triangular
a = np.triu([[1,2,3],[4,5,6],[7,8,9],[10,11,12]], -1) # np.triu?
a
# transpose
a.T
NOTE: The transposition is a view. As a results, the following code is wrong and will not make a matrix symmetric:
# tranpose is a view!
a += a.T
x = np.array([1, 2, 3, 4])
np.sum(x) # or x.sum()
# sum by rows
x = np.array([[1, 1], [2, 2]])
x
# axis specification
Image(url='https://scipy-lectures.github.io/_images/reductions.png')
x.sum(axis=0) # columns (first dimension)
x[:, 0].sum(), x[:, 1].sum() # ugly
x.sum(axis=1)
x[0, :].sum(), x[1, :].sum()
# higher dimensions
x = np.random.rand(4, 2, 2)
x
x.sum(axis=2)[0, 1]
#extrema
x = np.array([1, 3, 2])
x.max()
x.argmax() # index of max
# logical operations
np.all([True, True, False]) # and
np.any([True, True, False]) # or
a = np.zeros((100,100))
# masking/comparisons
np.any(a != 0)
np.all(a == a)
# basic stats
x = np.array([1,2,3,1])
y = np.array([[1, 2, 3], [5, 6, 1]])
x.mean()
np.median(x)
np.median(y,axis=-1) # last axis
x.std()
To achieve high performance, assignments in Python usually do not copy the underlaying objects. This is important for example when objects are passed between functions, to avoid an excessive amount of memory copying when it is not necessary (techincal term: pass by reference).
A = array([[1, 2], [3, 4]])
A
B = A # now B is referring to the same array data as A
B[0,0] = 10 # changing B affects A
B
A
If we want to avoid this behavior, so that when we get a new completely independent object B copied from A, then we need to do a so-called "deep copy" using the function copy
:
B = copy(A)
B[0,0] = -5 # now, if we modify B, A is not affected
B
A
To get good performance we should try to avoid looping over elements in our vectors and matrices, and instead use vectorized algorithms. The first step in converting a scalar algorithm to a vectorized algorithm is to make sure that the functions we write work with vector inputs.
def Theta(x):
"""
Scalar implemenation of the Heaviside step function.
"""
if x >= 0:
return 1
else:
return 0
Theta(array([-3,-2,-1,0,1,2,3]))
OK, that didn't work because we didn't write the Theta function so that it can handle with vector input...
To get a vectorized version of Theta we can use the Numpy function vectorize. In many cases it can automatically vectorize a function:
Theta_vec = vectorize(Theta)
Theta_vec(array([-3,-2,-1,0,1,2,3]))
We can also implement the function to accept vector input from the beginning (requires more effort but might give better performance):
def Theta(x):
"""
Vector-aware implemenation of the Heaviside step function.
"""
return 1 * (x >= 0)
Theta(array([-3,-2,-1,0,1,2,3]))
# still works for scalars as well
Theta(-1.2), Theta(2.6)
vectorize?
Let us consider a simple 1D random walk process: at each time step a walker jumps right or left with equal probability.
We are interested in finding the typical distance from the origin of a random walker after t
jumps. We are going to simulate many “walkers” to find this law, and we are going to do so using array computing: we are going to create a 2D array with the “stories” (each walker has a story) in one direction, and the time in the other:
n_stories = 1000 # num rows/walkers
t_max = 200 # num col/time to follow walker
# randomly choose all the steps (1 or -1) of the walk:
time = np.arange(t_max)
steps = 2 * np.random.random_integers(0, 1, (n_stories, t_max)) - 1 # generate random 0:1, shift to 0:2, and finally -1:1
np.unique(steps) # verify all steps are between -1 and 1
# build the walks by summing steps along time (col)
positions = np.cumsum(steps, axis=1) # axis = 1: dimension of time
Image(url='https://scipy-lectures.github.io/_images/random_walk_schema.png')
sq_distance = positions**2
mean_sq_distance = np.mean(sq_distance, axis=0) # calc mean "story" (rows)
# plot results
plt.figure(figsize=(4, 3))
plt.plot(time, np.sqrt(mean_sq_distance), 'g.', time, np.sqrt(time), 'y-')
plt.xlabel(r"$t$")
plt.ylabel(r"$\sqrt{\langle (\delta x)^2 \rangle}$")
# skipped section on broadcasting - revist it:
Image(url='https://scipy-lectures.github.io/_images/numpy_broadcasting.png')
# array flattening
a = np.array([[1, 2, 3], [4, 5, 6]])
a
a.ravel() # ravel?
a.T
a.T.ravel()
# reshaping
a.shape
b = a.ravel()
b
b.reshape((2,3))
# not specifying the other dimension (give it -1) - its infered!
a.reshape((2,-1))
Note: ndarray.reshape
may return a view (cf help(np.reshape))), or copy - To understand this you need to learn more about the memory layout of a numpy array.
# adding an axis
z = np.array([1, 2, 3])
z
z[:, np.newaxis]
z[np.newaxis, :]
# polynomials
p = np.poly1d([3, 2, -1]) #3x^2 + 2x - 1
p(0)
p.roots
p.order
x = np.linspace(0,1,20)
y = np.cos(x) + 0.3*np.random.rand(20)
p = np.poly1d(np.polyfit(x,y,3))
t = np.linspace(0,1,200)
plt.plot(x, y, 'o', t, p(t), '-')
# 3x^2+2x-1
p = np.polynomial.Polynomial([-1, 2, 3]) # coefs in different order!!!
# Chebyshev basis
x = np.linspace(-1, 1, 2000)
y = np.cos(x) + 0.3*np.random.rand(2000)
p = np.polynomial.Chebyshev.fit(x, y, 90)
t = np.linspace(-1, 1, 200)
plt.plot(x, y, 'r.')
plt.plot(t, p(t), 'k-', lw=3)
# loading text files
data = np.loadtxt('files/data.txt')
data
array([[ 1900., 30000., 4000., 48300.], [ 1901., 47200., 6100., 48200.], [ 1902., 70200., 9800., 41500.], [ 1903., 77400., 35200., 38200.]])
# np.savetxt('myfile.txt',data)
# scipy.io.loadmat, scipy.io.savemat
# Lena example
from scipy import misc
lena = misc.lena()
plt.imshow(lena)
plt.imshow(lena, cmap=plt.cm.gray)
crop_lena = lena[30:-30,30:-30]
y, x = np.ogrid[0:512,0:512] # x and y indices of pixels
centerx, centery = (256, 256) # center of the image
mask = ((y - centery)**2 + (x - centerx)**2) > 230**2 # circle
lena[mask] = 0
plt.imshow(lena)