Prepared by: Cindee Madison
Thanks to: Ariel Rokem, Matt Davis, Justin Kitzes, Katy Huff, Matthew Terry, Scipy Tutorial
Link to master with answers
Additional Resources
NumPy is a Python package implementing efficient collections of specific types of data (generally numerical), similar to the standard array module (but with many more features). NumPy arrays differ from lists and tuples in that the data is contiguous in memory. A Python list such as [0, 1, 2], in contrast, is actually an array of pointers to Python objects representing each number. The contiguous layout allows NumPy arrays to be considerably faster for numerical operations than Python lists/tuples.
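The memory layout described above can be inspected directly. The following is a minimal sketch; the variable names are illustrative, and the size reported for a Python int object depends on the platform and interpreter.

```python
import sys
import numpy as np

values = [0, 1, 2]
arr = np.array(values, dtype=np.int64)

# Every element occupies the same fixed number of bytes in one memory block.
print(arr.itemsize)                # bytes per element: 8 for int64
print(arr.nbytes)                  # total bytes of data: 3 * 8 = 24
print(arr.flags['C_CONTIGUOUS'])   # True: stored as a single contiguous block

# The list stores pointers; each integer is a separate Python object.
print(sys.getsizeof(values[1]))    # size of one int object (platform dependent)
```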
Numpy provides:
* an extension package to Python for multi-dimensional arrays
* closer to hardware (efficiency)
* designed for scientific computation (convenience)
* also known as array oriented computing
The Numpy library contains functions and other objects, such as constants:
np.sqrt(64) # this is a function
np.pi # this is a variable, not a function
np.sin(np.pi) # a variable passed to a function
# import library, standard convention
import numpy as np
# To see what's in a package, type the name, a period, then hit tab
#np?
#np.
np.pi?
# Some examples of numpy functions and "things":
print(np.sqrt(4))
print(np.pi) # Not a function, just a variable
print(np.sin(np.pi)) # A function on a variable :)
2.0
3.14159265359
1.22464679915e-16
Numpy arrays are collections of things, all of which must be the same type, that work similarly to lists (as we've described them so far).
Arrays can be created from existing collections such as lists, or instantiated "from scratch" in a few useful ways.
Creating a NumPy array from an existing collection is as simple as passing a sequence to np.array:
arr1 = np.array([1, 2.3, 4])
print(type(arr1))
print(arr1.dtype)
<type 'numpy.ndarray'>
float64
strarr = np.array([x for x in 'hello'])
strarr.dtype
dtype('S1')
[x for x in 'hello']
['h', 'e', 'l', 'l', 'o']
strarr
array(['h', 'e', 'l', 'l', 'o'], dtype='|S1')
You can also explicitly specify the data-type if the automatically-chosen one would be unsuitable.
arr2 = np.array([1, 2.56, 4], dtype=int)
print(type(arr2))
print(arr2.dtype)
print(arr2)
<type 'numpy.ndarray'>
int64
[1 2 4]
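Note that converting 2.56 to an integer dtype truncates rather than rounds. An existing array can be converted the same way with the astype method; the sketch below (variable names are illustrative) also shows rounding first with np.rint when truncation is not what you want.

```python
import numpy as np

values = np.array([1, 2.56, 4])      # dtype inferred as float64

# astype returns a converted copy; float -> int truncates toward zero.
as_int = values.astype(int)
print(as_int)                        # [1 2 4]

# np.rint rounds to the nearest integer first if truncation is not wanted.
rounded = np.rint(values).astype(int)
print(rounded)                       # [1 3 4]

print(values.dtype)                  # the original is unchanged: float64
```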
As you might expect, creating a NumPy array this way can be slow, since it must manually convert each element of a list into its equivalent C type (int objects become C ints, etc). There are many other ways to create NumPy arrays, such as
np.identity
np.zeros
np.zeros_like
or by manually specifying the dimensions and type of the array with the low-level creation function:
arr3 = np.ndarray((2, 3, 4), dtype=complex) # Notice : `ndarray`, not `array`!
print(type(arr3))
## what are the values in arr3??
<type 'numpy.ndarray'>
arr3
array([[[ 0.00000000e+000 +0.00000000e+000j, 4.44659081e-322 +0.00000000e+000j, 0.00000000e+000 +0.00000000e+000j, 0.00000000e+000 +0.00000000e+000j], [ 0.00000000e+000 +2.16695872e-314j, 2.16695872e-314 +0.00000000e+000j, 0.00000000e+000 +0.00000000e+000j, 0.00000000e+000 +0.00000000e+000j], [ 0.00000000e+000 +0.00000000e+000j, 0.00000000e+000 +0.00000000e+000j, 0.00000000e+000 +0.00000000e+000j, 0.00000000e+000 +0.00000000e+000j]], [[ 0.00000000e+000 +0.00000000e+000j, 0.00000000e+000 +0.00000000e+000j, 0.00000000e+000 +0.00000000e+000j, 0.00000000e+000 +0.00000000e+000j], [ 0.00000000e+000 +0.00000000e+000j, 0.00000000e+000 +0.00000000e+000j, 0.00000000e+000 +0.00000000e+000j, 0.00000000e+000 +0.00000000e+000j], [ 0.00000000e+000 +0.00000000e+000j, 0.00000000e+000 +0.00000000e+000j, 0.00000000e+000 +0.00000000e+000j, 0.00000000e+000 +0.00000000e+000j]]])
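The odd values in arr3 are whatever happened to be in memory: the low-level np.ndarray constructor allocates space without initializing it. As a hedged sketch, np.empty behaves the same way, while the other creation functions listed above return arrays with well-defined contents (the variable names here are illustrative).

```python
import numpy as np

# np.empty, like the low-level np.ndarray constructor, allocates memory
# without initializing it, so the contents are arbitrary.
scratch = np.empty((2, 3, 4), dtype=complex)

# The helpers below return arrays with well-defined contents.
zeros = np.zeros((2, 3, 4), dtype=complex)   # all elements 0
like = np.zeros_like(scratch)                # zeros with scratch's shape and dtype
eye = np.identity(3)                         # 3x3 identity matrix

print(zeros.shape, like.dtype)
print(eye)
```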
Arrays have a .shape attribute, which stores the dimensions of the array as a tuple:
print(arr3.shape)
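Alongside .shape, arrays carry a couple of related attributes. A short sketch on an array with the same dimensions as arr3 (the name grid is illustrative):

```python
import numpy as np

grid = np.zeros((2, 3, 4))

print(grid.shape)   # (2, 3, 4): length along each dimension
print(grid.ndim)    # 3: number of dimensions
print(grid.size)    # 24: total number of elements (2 * 3 * 4)
```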
For many of the examples below, we will be using np.arange which, similar to the Python built-in function range, returns a NumPy array of integers from 0 to N-1, inclusive. Like range, you can also specify a starting value and a step:
arr4 = np.arange(2, 5)
print(arr4)
arr5 = np.arange(1, 5, 2)
print(arr5)
arr6 = np.arange(2, 10, 2)
print(arr6)
[2 3 4]
[1 3]
[2 4 6 8]
np.arange(10)
np.arange?
Create an array with values ranging from 0 to 10, in increments of 0.5.
Reminder: get help by typing np.arange?, np.ndarray?, np.array?, etc.
np.arange(0, 10.5, .5)
array([ 0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. , 6.5, 7. , 7.5, 8. , 8.5, 9. , 9.5, 10. ])
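With a non-integer step, the stop value passed to np.arange has to be nudged past the last wanted value (10.5 above). An alternative worth knowing, not part of the exercise itself, is np.linspace, which takes the number of points and includes both endpoints. A minimal sketch:

```python
import numpy as np

# 21 evenly spaced points from 0 to 10, with both endpoints included.
by_count = np.linspace(0, 10, 21)

# The same values via arange, where the stop value (10.5) is excluded.
by_step = np.arange(0, 10.5, 0.5)

print(by_count[:4])
print(np.allclose(by_count, by_step))  # True
```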
Since numpy exists to perform efficient numerical operations in Python, arrays have all the usual arithmetic operations available to them. These operations are performed element-wise (i.e. the same operation is performed independently on each element of the array).
A = np.arange(5)
B = np.arange(5, 10)
print('A is', A)
print('B is', B)
print(A+B)
print(B-A)
print(A*B)
A is [0 1 2 3 4]
B is [5 6 7 8 9]
[ 5 7 9 11 13]
[5 5 5 5 5]
[ 0 6 14 24 36]
What happens when the arrays do not have the same shape?
A_10 = np.ones(10)
B_5 = np.zeros(5)
print(A_10, B_5)
A_10 + B_5
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] [ 0. 0. 0. 0. 0.]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-33-c4c4562ff1ba> in <module>()
      2 B_5 = np.zeros(5)
      3 print(A_10, B_5)
----> 4 A_10 + B_5
ValueError: operands could not be broadcast together with shapes (10) (5)
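The word "broadcast" in the error message refers to NumPy's rule for combining arrays of different shapes. Shapes (10,) and (5,) cannot be matched, but some differing shapes can. A hedged sketch with illustrative names:

```python
import numpy as np

ones = np.ones(10)
zeros = np.zeros(5)

# Shapes (10,) and (5,) are incompatible, so NumPy raises ValueError.
try:
    ones + zeros
except ValueError as err:
    print('ValueError:', err)

# Shapes (2, 5) and (5,) are compatible: the 1-D array is added to each row.
table = np.ones((2, 5))
row = np.arange(5)
print(table + row)
```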
In addition, if one of the arguments is a scalar, that value will be applied to all the elements of the array.
A = np.arange(5)
A+10
2*A
A**2
array([ 0, 1, 4, 9, 16])
You can use arrays as vectors and matrices in linear algebra operations.
Specifically, you can perform matrix/vector multiplication between arrays by using the .dot method or the np.dot function:
A, B
print(A.dot(B))
print(np.dot(A, B))
A.shape
80
80
(5,)
big_a = np.ones((2,5))
print(big_a.shape)
big_a
(2, 5)
array([[ 1., 1., 1., 1., 1.], [ 1., 1., 1., 1., 1.]])
Note: np.dot is like the '*' operator in Matlab.
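To make the shapes concrete, here is a small sketch of a matrix/vector product using a (2, 5) array of ones and a length-5 vector. It assumes Python 3.5 or later for the @ operator; plain * on arrays stays element-wise.

```python
import numpy as np

A = np.arange(5)           # shape (5,)
big_a = np.ones((2, 5))    # shape (2, 5)

# (2, 5) dot (5,) -> (2,): each row of big_a is dotted with A.
print(big_a.dot(A))        # [10. 10.]

# The @ operator (Python 3.5+) is equivalent to np.dot for these shapes.
print(big_a @ A)

# Plain * is element-wise and broadcasts; it is not a matrix product.
print((big_a * A).shape)   # (2, 5)
```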
http://en.wikipedia.org/wiki/Dot_product
Given two vectors in 2-space:
a = [1,2]
b = [4,5]
Find the angle between these vectors.
The rotation matrix for an angle $\theta$ is the matrix with elements:
[[cos($\theta$), -sin($\theta$)],
[sin($\theta$), cos($\theta$)]]
Create a function that takes two 2D-vectors and creates the rotation matrix between them.
For the vector c=[7,8], find a vector d that has the same rotation as b has relative to a.
http://en.wikipedia.org/wiki/Dot_product
We can use the dot product to help us find the angle $\theta$:
$$\mathbf A\cdot\mathbf B = \|\mathbf A\|\,\|\mathbf B\|\cos\theta$$
$$\|\mathbf{p}\| = \sqrt{p_1^2+p_2^2+\cdots +p_n^2} = \sqrt{\mathbf{p}\cdot\mathbf{p}}$$
where $\sqrt{\mathbf{p}\cdot\mathbf{p}}$ is the square root of the dot product of $\mathbf{p}$ with itself.
a = np.array([1,2])
b = np.array([4,5])

def angle_between_vectors(a, b):
    """
    find the angle (in radians) between two vectors
    """
    norm_a = np.sqrt(a.dot(a))
    norm_b = np.sqrt(b.dot(b))
    ab_dot = a.dot(b)
    theta = np.arccos(ab_dot / (norm_a * norm_b))
    return theta

theta = angle_between_vectors(a, b)
print(theta)

## create rotation matrix
def rotation_matrix(theta):
    """ create a rotation matrix given theta"""
    rot = np.array([
        [np.cos(theta), -np.sin(theta)],
        [np.sin(theta), np.cos(theta)],
    ])
    return rot

ab_rot = rotation_matrix(theta)
print(ab_rot)

## we know the angle between a and b
## find a vector rotated by the same amount relative to c = [7,8]
c = np.array([7,8])
new = c.dot(ab_rot)
print(angle_between_vectors(c, new))
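One detail worth checking: np.arccos always returns an angle between 0 and pi, so it gives the size of the angle but not the direction of rotation. The sketch below is an alternative under that observation, not the tutorial's own solution. It uses a hypothetical helper, signed_angle, built on np.arctan2, and applies the rotation matrix as rot.dot(vector).

```python
import numpy as np

def signed_angle(u, v):
    """Angle in radians that rotates 2-D vector u onto the direction of v."""
    cross = u[0] * v[1] - u[1] * v[0]   # z-component of the 2-D cross product
    return np.arctan2(cross, u.dot(v))

def rotation_matrix(theta):
    """Counter-clockwise rotation by theta, applied as rot.dot(vector)."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

a = np.array([1, 2])
b = np.array([4, 5])
c = np.array([7, 8])

theta = signed_angle(a, b)           # negative here: b lies clockwise of a
rot = rotation_matrix(theta)

# Rotating a gives a vector that points along b (the lengths may differ).
a_rotated = rot.dot(a)
print(np.allclose(a_rotated / np.linalg.norm(a_rotated),
                  b / np.linalg.norm(b)))

d = rot.dot(c)                       # c rotated the same way
print(d)
```

Because a rotation preserves length, d has the same norm as c, and signed_angle(c, d) equals signed_angle(a, b).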
Much like the basic arithmetic operations we discussed above, comparison operations are performed element-wise. That is, rather than returning a single boolean, comparison operators compare each element in both arrays pairwise, and return an array of booleans. (If the shapes of the input arrays are incompatible, older versions of NumPy simply return False, while recent versions raise an error.) For example:
arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([1, 1, 3, 3, 5])
print(arr1 == arr2)
c = (arr1 == arr2)
print(c.dtype)
print(c.sum())
[ True False True False True]
bool
3
arr1[c]
for mybool, item in zip(c, arr1):
    if mybool:
        print(item ** 2)
1
9
25
arr2[c]**2
array([ 1, 9, 25])
Note: You can use the methods .any() and .all() or the functions np.any and np.all to return a single boolean indicating whether any or all values in the array are True, respectively.
print(np.all(c))
print(c.all())
print(c.any())
False
False
True
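A common use of these reductions is asking whether two arrays agree everywhere. The sketch below also uses np.array_equal, which wraps the comparison and the .all() call and returns False when the shapes differ; the name matches is illustrative.

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([1, 1, 3, 3, 5])
matches = (arr1 == arr2)

print(matches.sum())                      # 3 elements agree (True counts as 1)
print(matches.all())                      # False: not every element agrees
print(np.array_equal(arr1, arr2))         # False, same question in one call
print(np.array_equal(arr1, arr1.copy()))  # True

# np.array_equal returns False, rather than raising, when shapes differ.
print(np.array_equal(arr1, arr1[:3]))
```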
In addition to the usual methods of indexing lists with an integer (or with a series of colon-separated integers for a slice), numpy allows you to index arrays in a wide variety of different ways for more advanced operations.
First, the simple way:
Note: Python indexes from zero.
a = np.array([1,2,3])
print(a[0:2])
[1 2]
a[1:2]
array([2])
c = np.random.rand(3,3)
print(c)
#print(c[1:,0:2])
#print a
c[0,:] = a
print()
c[0,:] = 42
print(c)
[[ 0.80833693 0.83243262 0.26534231]
 [ 0.75869528 0.31479564 0.06134746]
 [ 0.77592561 0.24624831 0.03440165]]

[[ 4.20000000e+01 4.20000000e+01 4.20000000e+01]
 [ 7.58695276e-01 3.14795638e-01 6.13474585e-02]
 [ 7.75925606e-01 2.46248308e-01 3.44016533e-02]]
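Random values make the effect of two-dimensional slicing hard to follow, so here is the same idea on a small array with predictable contents. This is an illustrative sketch; the name grid is not from the tutorial.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)
# grid is
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(grid[1, 2])      # single element, row 1 and column 2: 6
print(grid[0, :])      # first row
print(grid[:, 1])      # second column
print(grid[1:, 0:2])   # rows 1-2, columns 0-1

grid[0, :] = 42        # assigning a scalar to a slice fills the whole row
print(grid[0])
```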
myvals = np.ones(100) * 13 + np.random.random(100)
print(myvals)
[ 13.59114593 13.34001837 13.5375499 13.00451878 13.28052895 13.94806292 13.17553366 13.67686981 13.02973211 13.91452831 13.47677767 13.93323565 13.9551556 13.56607387 13.14314234 13.65010953 13.17455852 13.95296247 13.6675259 13.24268106 13.30771672 13.10878015 13.17385343 13.45145132 13.63363837 13.79471584 13.11998411 13.56198842 13.85606593 13.23624152 13.19706126 13.25253607 13.08575879 13.79461907 13.78047991 13.08693021 13.59714173 13.27073197 13.51291035 13.8937349 13.3952584 13.822228 13.30955901 13.51256604 13.70593058 13.3076226 13.17250251 13.61906959 13.83212702 13.35391282 13.97660988 13.74179028 13.08792259 13.39055293 13.44074272 13.12399919 13.91794852 13.41646663 13.53261444 13.16821828 13.06599806 13.26396216 13.42179522 13.02195477 13.24126517 13.80587748 13.62336911 13.59498972 13.13110852 13.57466704 13.91718828 13.27493337 13.39197814 13.26310546 13.25496674 13.63496534 13.0480124 13.35018566 13.76447972 13.5514787 13.71308711 13.60618318 13.6336942 13.92933969 13.37967773 13.51699195 13.166105 13.59111672 13.19483767 13.21579675 13.51226399 13.9978466 13.40086006 13.53810426 13.78326947 13.59867458 13.56798128 13.43876532 13.42391088 13.52500084]
We can manipulate the shape of an array as follows:
A = np.arange(16).reshape(4, 4)
Or even:
A = np.reshape(np.arange(16), (4, 4))
Using what we've learned about slicing and indexing:
* Create A, then index the array to get the upper-right quarter.
* Create a function that builds a square array with values from 0 to n**2-1 (like A) and returns the same quarter.
For example, for A, the desired output would be:
array([[2, 3],
       [6, 7]])
# create A and slice to get upper corner
# create code that generates a square array, and outputs upper 4th quarter
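One possible sketch for this exercise follows. It targets the sample output shown above, which corresponds to rows 0-1 and columns 2-3 of A. The function name upper_right_quarter is hypothetical, and the sketch assumes n is even.

```python
import numpy as np

def upper_right_quarter(n):
    """Build an n x n array holding 0 .. n**2-1 and return one quarter of it.

    Hypothetical helper; assumes n is even so the array splits cleanly.
    """
    square = np.arange(n ** 2).reshape(n, n)
    half = n // 2
    # first half of the rows, second half of the columns
    return square[:half, half:]

print(upper_right_quarter(4))
```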
Arrays can be indexed with other arrays, using either an array of indices, or an array of booleans of the same length. In the former case, numpy returns a new array containing the elements at the specified indices. In the latter, numpy returns a new array containing only the elements where the index array is True. In both cases the result is a copy rather than a view, although assigning through such an index (for example A[A>5] = 5) does modify the original array. (We'll discuss the difference between views and copies in a moment.) This makes normally-tedious operations like clamping extremely simple.
Indexing with an array of indices:
A = np.arange(5, 10)
print(A)
print(A[[0, 2, 3]])
A[[0, 2, 3]] = 0
print(A)
Indexing with a boolean array:
random = np.random
A = np.array([random.randint(0, 10) for i in range(10)]) # Check out the list comprehension!
print('A', A)
A[A>5] = 5
print(A)
stuff_lessthan5 = A[A<5]
print(stuff_lessthan5)
A [6 7 9 0 3 9 4 4 2 7]
[5 5 5 0 3 5 4 4 2 5]
[0 3 4 4 2]
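Clamping with a boolean mask, as above, changes the array in place. Two related functions return a new array instead: np.clip and np.where. The sketch below reuses the sample values printed above and is meant as an illustration rather than a replacement for the mask approach.

```python
import numpy as np

A = np.array([6, 7, 9, 0, 3, 9, 4, 4, 2, 7])

# Clamp from above with a boolean mask (on a copy, so A stays intact).
masked = A.copy()
masked[masked > 5] = 5

# np.clip clamps to [low, high] in one call and returns a new array.
clipped = np.clip(A, 0, 5)

# np.where picks element-wise: 5 where the condition holds, else the original value.
chosen = np.where(A > 5, 5, A)

print(masked)
print(np.array_equal(masked, clipped), np.array_equal(masked, chosen))
```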
A few more examples:
b = np.array([4,5,6])
print (a)
print (b)
print (a > 2)
print (a[a > 2])
print (b[a > 2])
b[a == 3] = 77
print(b)
# There are handy ways to make arrays full of ones and zeros
print(np.zeros(5))
print(np.ones(5))
print(np.identity(5), '\n')
A = np.arange(5)*2
print(A)
B = list(range(5))*2  # multiplying a list repeats it; it does not scale the elements
print(B)
Similarly, when adding two numpy arrays together, we get the vector sum back, whereas when adding two lists together, we get the concatenation back.
A = np.arange(5) + np.arange(5)
print(A)
B = list(range(5)) + list(range(5))
print(B)
In order to be as efficient as possible, numpy uses "views" instead of copies wherever possible. That is, numpy arrays derived from another base array generally refer to the exact same data as the base array. The consequence of this is that modification of these derived arrays will also modify the base array. The exception is indexing with another array: the result of indexing by an array of indices or by an array of booleans is a copy, not a view.
Specifically, slices of arrays are always views, unlike slices of lists or tuples, which are always copies.
A = np.arange(5)
B = A[0:1]
B[0] = 42
print(A)
"""
A = range(5)
B = A[0:1]
B[0] = 42
print(A)
"""
[42 1 2 3 4]
'\nA = range(5)\nB = A[0:1]\nB[0] = 42\nprint(A)\n'
B = A[:2].copy()
print(B)
[42 1]
B[0] = 0
print(A)
[42 1 2 3 4]
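Whether two arrays refer to the same data can be checked with np.shares_memory. A brief sketch comparing a slice, the two kinds of array indexing, and an explicit copy (all names are illustrative):

```python
import numpy as np

base = np.arange(5)

view = base[0:2]              # basic slice: a view onto base's data
by_index = base[[0, 1]]       # index array: a copy
by_mask = base[base < 2]      # boolean array: also a copy
explicit = base[0:2].copy()   # explicit copy of a slice

print(np.shares_memory(base, view))       # True
print(np.shares_memory(base, by_index))   # False
print(np.shares_memory(base, by_mask))    # False
print(np.shares_memory(base, explicit))   # False

view[0] = 42                  # writes through to base
by_index[1] = -1              # changes only the copy
print(base)                   # [42  1  2  3  4]
```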
Figure out how to create a copy of a numpy array. Remember: since numpy slices are views, you can't use the trick you'd use for Python lists, i.e. copy = list[:].
## create an array of 10 random numbers
## np.random.random?
# create a view and change the first element in the view
## what is the value in the original
## now create a copy
Being designed for scientific computing, numpy also contains a host of common mathematical functions, including linear algebra functions, fast Fourier transforms, and probability/statistics functions. While there isn't space to go over all of these in detail, we will provide an overview of the most common/essential of these.
For arrays with two or more dimensions, there are some other common matrix operations that can be conducted:
A = np.arange(16).reshape(4, 4)
print(A)
print(A.T) # transpose
print(A.trace())
[[ 0 1 2 3]
 [ 4 5 6 7]
 [ 8 9 10 11]
 [12 13 14 15]]
[[ 0 4 8 12]
 [ 1 5 9 13]
 [ 2 6 10 14]
 [ 3 7 11 15]]
30
There are many more methods like these available with NumPy arrays. Be sure to consult the numpy documentation before writing your own versions!
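A few of those methods, sketched on the same 4x4 array. The axis argument selects the dimension that is collapsed: axis=0 reduces down the rows and gives one value per column.

```python
import numpy as np

A = np.arange(16).reshape(4, 4)

print(A.sum())               # sum of all elements: 120
print(A.sum(axis=0))         # column sums: 24, 28, 32, 36
print(A.sum(axis=1))         # row sums: 6, 22, 38, 54
print(A.mean())              # 7.5
print(A.max(), A.argmax())   # largest value (15) and its flat index (15)
print(A.diagonal())          # main diagonal: 0, 5, 10, 15
```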
matrix class
So far, we've used two-dimensional arrays to represent matrix-like objects. However, numpy provides a specialized class for this. The matrix class is almost identical to a two-dimensional numpy array, but has a few changes to the interface to simplify common linear algebraic tasks. These are:
* The * operator performs matrix multiplication
* The ** operator performs matrix exponentiation
* The property .I (or the method .getI()) returns the matrix inverse
* The property .H (or the method .getH()) returns the conjugate transpose
la = np.linalg
A = np.matrix([[3, 2, -1], [2, -2, 4], [-1, .5, -1]])
B = np.array([1, -2, 0])
print(la.solve(A, B))
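The same system can be solved with plain two-dimensional arrays, which is the approach current NumPy documentation recommends over the matrix class. The sketch below assumes Python 3.5 or later for the @ operator and checks the solution by substituting it back.

```python
import numpy as np

# Same system as above, with plain 2-D arrays instead of np.matrix.
A = np.array([[3, 2, -1], [2, -2, 4], [-1, .5, -1]])
b = np.array([1, -2, 0])

x = np.linalg.solve(A, b)     # solves A x = b without forming the inverse
print(x)

# Check the solution: @ is matrix multiplication for arrays (Python 3.5+).
print(np.allclose(A @ x, b))

# The inverse is available too, but solve is usually preferred numerically.
print(np.allclose(np.linalg.inv(A) @ b, x))
```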
Universal functions (also called ufuncs) are high-speed, element-wise operations on NumPy arrays. They are, in essence, what allows you to operate on NumPy arrays efficiently. There are a large number of universal functions available covering most of the basic operations that get performed on data, like addition, subtraction, logarithms, and so on. Calling a ufunc is a simple matter:
A = np.arange(1,10)
print(np.log10(A))
In addition to basic operations like the one above, ufuncs that take two input arrays and return an output array can be used in more advanced ways.
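As a sketch of what those more advanced uses look like, two-input ufuncs such as np.add and np.multiply carry methods like reduce, accumulate, and outer:

```python
import numpy as np

A = np.arange(1, 5)                 # values 1, 2, 3, 4

print(np.add(A, A))                 # same as A + A
print(np.add.reduce(A))             # fold with +: 10, same as A.sum()
print(np.add.accumulate(A))         # running sum: 1, 3, 6, 10
print(np.multiply.outer(A, A))      # 4x4 multiplication table
```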
Use %timeit magic to check speed
Using ufuncs, calculate the log of each element in the following array, compare speed to list operation:
[8.1, 1.6, 0.9, 4.3, 7.0, 7.3, 4.7, 8.2, 7.2, 3.0, 1.4, 9.8, 5.7, 0.7, 8.7, 4.6, 8.8, 0.9, 4.4, 4.4]
# create list, and array
## calculate log on all elements in list (list comprehension)
## calculate log on all elements in array
## use timeit to compare processing time
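Outside IPython, the standard timeit module does the same job as the %timeit magic. The sketch below is one way to approach the exercise, assuming the natural logarithm is intended. No timings are quoted because they depend on the machine, and with only 20 values the per-call overhead may outweigh the array's advantage.

```python
import math
import timeit
import numpy as np

values = [8.1, 1.6, 0.9, 4.3, 7.0, 7.3, 4.7, 8.2, 7.2, 3.0,
          1.4, 9.8, 5.7, 0.7, 8.7, 4.6, 8.8, 0.9, 4.4, 4.4]
arr = np.array(values)

# Natural log of each element: list comprehension versus ufunc.
log_list = [math.log(v) for v in values]
log_arr = np.log(arr)
print(np.allclose(log_list, log_arr))   # both approaches agree

# Time 10000 repetitions of each approach.
list_time = timeit.timeit(lambda: [math.log(v) for v in values], number=10000)
arr_time = timeit.timeit(lambda: np.log(arr), number=10000)
print('list comprehension: %.4f s' % list_time)
print('np.log on array:    %.4f s' % arr_time)
```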