Presented by Karen Cranston, using some materials by Katy Huff and Matthew Terry.
The NumPy library includes (among other things) ways of storing and manipulating data that are more efficient than standard Python lists. For large numerical data sets, working with NumPy arrays is typically much faster than working with Python lists or tuples. The goals here are to understand some of the gotchas when using arrays instead of lists and to take a tour of NumPy's features.
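To make the speed claim concrete, here is a minimal timing sketch. The sizes, repeat count, and variable names (py_list, np_values) are illustrative assumptions, and the measured times will vary by machine, so treat the printed numbers as a rough indication only.

```python
import timeit

import numpy

n = 10000                      # assumed size, small enough to avoid integer overflow
py_list = list(range(n))       # plain Python list
np_values = numpy.arange(n)    # NumPy array holding the same values

# The same computation both ways: double every value, then add them up.
list_result = sum(v * 2 for v in py_list)
array_result = int((np_values * 2).sum())

# Time 200 repetitions of each version.
list_seconds = timeit.timeit(lambda: sum(v * 2 for v in py_list), number=200)
array_seconds = timeit.timeit(lambda: (np_values * 2).sum(), number=200)

print("results agree:", list_result == array_result)
print("list  seconds:", list_seconds)
print("array seconds:", array_seconds)
```

The array version does its loop inside compiled code instead of the Python interpreter, which is where the difference usually comes from.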
We will start by importing the library and creating a regular Python list and a numpy array from that list.
import numpy
x = [1, 2, 3, 4, 5, 6 ]
np_arr = numpy.array(x)
Let's look at the difference between x (a Python list) and np_arr (a NumPy array).
x
[1, 2, 3, 4, 5, 6]
np_arr
array([1, 2, 3, 4, 5, 6])
np_arr.ndim
1
np_arr.shape
(6,)
We can compare the two data structures. Operations on NumPy arrays work element by element. Can you explain this result?
x == np_arr
array([ True, True, True, True, True, True], dtype=bool)
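One way to read the result above: comparing a list with an array produces one boolean per element, not a single True or False. The sketch below shows two common ways to collapse that into one answer; the variable names are illustrative.

```python
import numpy

x = [1, 2, 3, 4, 5, 6]
np_arr = numpy.array(x)

# Element-by-element comparison: an array of six booleans.
elementwise = (x == np_arr)

# Collapse to a single answer: are ALL elements equal?
all_match = bool(elementwise.all())

# numpy.array_equal also checks that the shapes match.
same = numpy.array_equal(x, np_arr)

# Comparing two plain lists gives a single boolean directly.
list_compare = (x == [1, 2, 3, 4, 5, 6])

print(elementwise)
print(all_match, same, list_compare)
```

Because the comparison returns an array, writing it directly in an if statement raises an error for arrays with more than one element; use all() or any() to say which you mean.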
Now, let's make a 2D array from a list of lists.
x = [ [1, 2], [3, 4], [5, 6] ]
np_arr = numpy.array(x)
np_arr.shape
(3, 2)
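As a small sketch of how the shape relates to the nested list: the first entry of shape counts the inner lists (rows) and the second counts the items in each (columns).

```python
import numpy

x = [[1, 2], [3, 4], [5, 6]]
np_arr = numpy.array(x)

n_dims = np_arr.ndim            # number of dimensions
n_rows, n_cols = np_arr.shape   # rows, then columns
n_items = np_arr.size           # total number of elements

print(n_dims)          # 2
print(n_rows, n_cols)  # 3 2
print(n_items)         # 6
print(len(x), len(x[0]))  # the same 3 and 2, read off the nested list
```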
We can slice the matrix to get the second column. Note that slices are a view of the same data. What happens when we change an element of the slice?
array_slice = np_arr[:,1]
array_slice
array([2, 4, 6])
array_slice[2]=7
np_arr
array([[1, 2], [3, 4], [5, 7]])
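The behaviour above can be checked directly. This sketch uses the base attribute and numpy.shares_memory to confirm that the slice is a view, and contrasts it with slicing a plain Python list, which does make a new list.

```python
import numpy

np_arr = numpy.array([[1, 2], [3, 4], [5, 6]])
array_slice = np_arr[:, 1]      # second column, as a view

# A view remembers the array it was taken from and shares its memory.
is_view = array_slice.base is np_arr
shares = numpy.shares_memory(array_slice, np_arr)

array_slice[2] = 7              # write through the view
changed_value = int(np_arr[2, 1])

# A slice of a Python list is a new list, so the original is unaffected.
py_list = [2, 4, 6]
list_slice = py_list[:]
list_slice[2] = 7

print(is_view, shares)   # True True
print(changed_value)     # 7
print(py_list)           # [2, 4, 6]
```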
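Difference between a shallow copy (a view) and a deep copy: copy() returns a new array with its own data, so changing the copy leaves the original untouched.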
arr_copy = np_arr.copy()
arr_copy[0,0]=3
arr_copy
array([[3, 2], [3, 4], [5, 7]])
np_arr
array([[1, 2], [3, 4], [5, 7]])
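A short sketch contrasting three cases that are easy to confuse: plain assignment (a second name for the same array), a slice (a view), and copy() (independent data). The names alias and view are illustrative.

```python
import numpy

np_arr = numpy.array([[1, 2], [3, 4], [5, 7]])

alias = np_arr            # no copy at all: same object, second name
view = np_arr[:, :]       # new array object, but the same underlying data
arr_copy = np_arr.copy()  # new array object with its own data

arr_copy[0, 0] = 3        # only the copy changes
view[2, 0] = 50           # changes np_arr as well

print(alias is np_arr)                         # True
print(numpy.shares_memory(view, np_arr))       # True
print(numpy.shares_memory(arr_copy, np_arr))   # False
print(np_arr[0, 0], np_arr[2, 0])              # 1 50
```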
Arithmetic on Python lists and NumPy arrays behaves very differently.
x*2
[[1, 2], [3, 4], [5, 6], [1, 2], [3, 4], [5, 6]]
np_arr * 3
array([[ 3, 6], [ 9, 12], [15, 21]])
With NumPy arrays, operations are element by element: the multiplication multiplied each element individually. Compare this to the Python list, where multiplication repeated the entire list as a single unit. Try adding the list to itself and compare the result with adding the array to itself.
np_arr + np_arr
array([[ 2, 4], [ 6, 8], [10, 14]])
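For comparison, here is a sketch of the list side of that exercise. It rebuilds the array from the original list, so the last value is 6 and not the 7 set earlier through the slice.

```python
import numpy

x = [[1, 2], [3, 4], [5, 6]]
np_arr = numpy.array(x)

# Adding a list to itself concatenates: six rows instead of three.
list_sum = x + x

# Adding an array to itself adds element by element: still 3 x 2.
array_sum = np_arr + np_arr

# To double every value of a nested list you need an explicit loop.
doubled = [[2 * value for value in row] for row in x]

print(len(list_sum))       # 6
print(array_sum.shape)     # (3, 2)
print(doubled)             # [[2, 4], [6, 8], [10, 12]]
```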
NumPy has functions for the basic matrix operations and for common statistics.
T is the transpose; dot is the dot product (matrix multiplication for 2D arrays).
np_arr.T.dot(np_arr)
array([[35, 49], [49, 69]])
numpy.average(np_arr)
3.6666666666666665
Average of what? By default the whole array is flattened into a single list before averaging. Find the average of the first column.
numpy.average(np_arr[:,0])
3.0
numpy.cov(np_arr)
array([[ 0.5, 0.5, 1. ], [ 0.5, 0.5, 1. ], [ 1. , 1. , 2. ]])
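The 3 x 3 covariance result can be surprising for a 3 x 2 array. By default numpy.cov treats each row as a variable and each column as an observation. The sketch below shows the axis argument of numpy.average and the rowvar argument of numpy.cov, which treat the columns as the variables instead.

```python
import numpy

np_arr = numpy.array([[1, 2], [3, 4], [5, 7]])

overall = numpy.average(np_arr)            # flattened: one number
col_means = numpy.average(np_arr, axis=0)  # one mean per column
row_means = numpy.average(np_arr, axis=1)  # one mean per row

cov_rows = numpy.cov(np_arr)                # rows are variables: 3 x 3
cov_cols = numpy.cov(np_arr, rowvar=False)  # columns are variables: 2 x 2

print(col_means)
print(row_means)
print(cov_rows.shape, cov_cols.shape)   # (3, 3) (2, 2)
```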
We can use NumPy functions to read data from a file into an array
%%file example-data.txt
0,0
1,2
2,4
3,8
4,16
5,32
6,64
Overwriting example-data.txt
data = numpy.loadtxt('example-data.txt', delimiter=',')
print(data)
[[ 0. 0.] [ 1. 2.] [ 2. 4.] [ 3. 8.] [ 4. 16.] [ 5. 32.] [ 6. 64.]]
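Once the data is loaded, the columns can be pulled out by slicing, or in one step with the unpack argument of loadtxt. To keep this sketch self-contained it reads the same seven lines from an in-memory io.StringIO object (an assumption made for the example) instead of example-data.txt.

```python
import io

import numpy

text = u"0,0\n1,2\n2,4\n3,8\n4,16\n5,32\n6,64\n"

# Same call as above, reading from an in-memory file object.
data = numpy.loadtxt(io.StringIO(text), delimiter=',')

# Columns by slicing.
x = data[:, 0]
y = data[:, 1]

# Or let loadtxt hand back one array per column.
xs, ys = numpy.loadtxt(io.StringIO(text), delimiter=',', unpack=True)

print(data.shape)   # (7, 2)
print(x)
print(y)
```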
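Next we plot the data. The calls below assume the pylab namespace (for example, from pylab import *), which provides plot, legend, axis, xlabel, ylabel, title and savefig.
x = [ 0, 1, 2, 3, 4, 5, 6 ]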
y = [ 0, 2, 4, 8, 16, 32, 64 ]
plot(x, y)
[<matplotlib.lines.Line2D at 0x10611c590>]
plot(x, y, 'r--', label='my favorite line')
legend()
<matplotlib.legend.Legend at 0x1060f1dd0>
plot(x, y, 'r-')
axis(xmin=-10, xmax = 8, ymin=-10)
(-10, 8, -10, 70.0)
plot(x, y, 'r-')
axis(xmin=-10, xmax = 8, ymin=-10)
xlabel('This is my X axis')
ylabel('This is my Y axis')
title('foo')
savefig('/tmp/figure.pdf')
plot(x, y, 'r-')
axis(xmin=-10, xmax = 8, ymin=-10)
xlabel('This is my X axis')
ylabel('This is my Y axis')
title('foo')
savefig('/tmp/figure.png')
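The same figure can also be built with explicit figure and axes objects instead of the pylab-style functions. This is a sketch under a few assumptions: it selects the non-interactive Agg backend so it runs without a display, and it writes to the system temporary directory instead of a hard-coded /tmp path.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")   # assumption: no display available, render to file only
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4, 5, 6]
y = [0, 2, 4, 8, 16, 32, 64]

fig, ax = plt.subplots()
ax.plot(x, y, 'r--', label='my favorite line')
ax.set_xlim(-10, 8)          # same limits as axis(xmin=-10, xmax=8, ...)
ax.set_ylim(bottom=-10)      # leave the upper y limit automatic
ax.set_xlabel('This is my X axis')
ax.set_ylabel('This is my Y axis')
ax.set_title('foo')
ax.legend()

# The output format is chosen from the file extension (.png, .pdf, ...).
out_path = os.path.join(tempfile.gettempdir(), 'figure.png')
fig.savefig(out_path)
plt.close(fig)
```

Keeping a handle on fig and ax makes it clearer which plot each call modifies once there is more than one figure.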
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
bignum = 100
mat = np.random.random((bignum, bignum))
X, Y = np.mgrid[:bignum, :bignum]
fig = plt.figure()
ax = fig.add_subplot(1,1,1, projection='3d')
surf = ax.plot_surface(X,Y,mat)
plt.show()