NumPy is Python's fundamental library for numerical computing, including linear algebra, and provides a wealth of functionality for working with vectors, matrices, and tensors.
Our first step is to import the package (note the "import as" shortcut):
import numpy as np
Arrays are the standard data containers in NumPy and can have any number of dimensions.
Let's create a one-dimensional array:
a = np.array([1, 2, 3])
a
print(type(a))
print(a.shape)
array([1, 2, 3])
<class 'numpy.ndarray'>
(3,)
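Besides shape, every array carries a few other attributes that are handy when inspecting data. A minimal sketch; the exact integer dtype is platform-dependent (commonly int64), so only its kind is relied on here:

```python
import numpy as np

a = np.array([1, 2, 3])
print(a.ndim)   # number of dimensions → 1
print(a.size)   # total number of elements → 3
print(a.dtype)  # element type, e.g. int64 (platform-dependent)
```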
Accessing/modifying array elements:
print(a[0], a[1], a[2])
a[0] = 5 # Change an element of the array
a
1 2 3
array([5, 2, 3])
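Indexing also accepts negative positions, counted from the end of the array. Note that assigning into an array converts the value to the array's dtype, so a float written into an integer array is truncated rather than stored as a float. A short sketch:

```python
import numpy as np

a = np.array([5, 2, 3])
print(a[-1])   # last element → 3
a[1] = 7.9     # converted to the array's integer dtype (truncated)
print(a)       # → [5 7 3]
```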
Two-dimensional arrays:
b = np.array([[1,2,3],[4,5,6]])
b,b.shape
print(b[0, 0], b[0, 1], b[1, 0])
print(b[0][0], b[0][1], b[1][0])
(array([[1, 2, 3], [4, 5, 6]]), (2, 3))
1 2 4
1 2 4
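The two indexing forms above give the same result but work differently: b[0][1] first extracts row 0 as a one-dimensional array and then indexes into it, whereas b[0, 1] indexes both axes in a single step, which is the more idiomatic form and avoids the intermediate row object. A quick check:

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])
row0 = b[0]                # first row, a 1-D array
print(row0, row0.shape)    # → [1 2 3] (3,)
print(b[0][1] == b[0, 1])  # → True
print(b[-1, -1])           # last row, last column → 6
```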
Some functions for creating arrays:
a = np.zeros((2,2)) # Create an array of zeros
print(a)
b = np.ones((2,2)) # Create an array of ones
print(b)
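c = np.full((2,2), 7.0) # Create a constant array (equivalently, 7.0 * np.ones((2,2)))
print(c)
d = np.eye(2) # Create a 2x2 identity matrix
print(d)
e = np.random.random((3,3)) # Create a 3x3 array of uniform random values in [0, 1)
print(e)
f = np.arange(2, 3, 0.1) # Values from 2 up to (but excluding) 3 in steps of 0.1
print(f)
g = np.linspace(1., 4., 6) # 6 evenly spaced values from 1 to 4, both endpoints included
print(g)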
print(g)
[[ 0. 0.]
 [ 0. 0.]]
[[ 1. 1.]
 [ 1. 1.]]
[[ 7. 7.]
 [ 7. 7.]]
[[ 1. 0.]
 [ 0. 1.]]
[[ 0.13849416 0.46435903 0.9068119 ]
 [ 0.86278242 0.66594523 0.71520984]
 [ 0.65630407 0.39285181 0.99686805]]
[ 2. 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9]
[ 1. 1.6 2.2 2.8 3.4 4. ]
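Two details about the creation functions above are worth knowing. np.arange excludes its stop value, and with a floating-point step the number of elements can be affected by round-off; np.linspace instead takes the number of points directly and includes the endpoint by default. For random numbers, NumPy 1.17 and later recommend a Generator created with np.random.default_rng; passing a seed makes the results reproducible. A sketch:

```python
import numpy as np

f = np.arange(2, 3, 0.1)    # stop value 3 is excluded
g = np.linspace(1., 4., 6)  # 6 points, endpoint 4 included
print(f[-1] < 3)            # → True
print(g[0], g[-1])          # → 1.0 4.0

rng = np.random.default_rng(seed=0)  # seeded generator for reproducibility
e = rng.random((3, 3))               # uniform values in [0, 1)
print(e.shape)                       # → (3, 3)
```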
To get help on a NumPy function such as linspace, you can do one of the following (the ? syntax works in IPython and Jupyter):
?np.linspace
or
help(np.linspace)
X = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
X
array([[ 1, 2, 3, 4], [ 5, 6, 7, 8], [ 9, 10, 11, 12]])
We'll think of this matrix as a small dataset of three examples with four features each: each row is the feature vector of one example, and each column holds the values that a given feature takes across the data.
To access the vector of features of the first example in the dataset:
row = X[0] # the first row of X
row, row.shape
(array([1, 2, 3, 4]), (4,))
To access a column of the matrix (a single feature):
col = X[:, 0]
col, col.shape
(array([1, 5, 9]), (3,))
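Indexing a single column with an integer, as in X[:, 0], drops that axis and returns a one-dimensional array. If you need the result to stay a column (a 2-D array of shape (3, 1)), index with a slice or a list instead. A sketch:

```python
import numpy as np

X = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print(X[:, 0].shape)    # → (3,)   integer index drops the axis
print(X[:, 0:1].shape)  # → (3, 1) slice keeps it
print(X[:, [0]].shape)  # → (3, 1) list of indices keeps it too
n_examples, n_features = X.shape  # rows = examples, columns = features
print(n_examples, n_features)     # → 3 4
```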
There are multiple ways of accessing sub-arrays. The first uses slices, similar to Python lists (the end index of a slice is excluded):
submatrix = X[1:3, 1:4]
submatrix, submatrix.shape
(array([[ 6, 7, 8], [10, 11, 12]]), (2, 3))
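One consequence of slicing is worth knowing: a basic slice such as X[1:3, 1:4] returns a view that shares memory with X, so writing into the slice changes the original array. Call .copy() when you need an independent sub-array. A minimal sketch:

```python
import numpy as np

X = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
sub = X[1:3, 1:4]  # a view into X
sub[0, 0] = 100    # also modifies X[1, 1]
print(X[1, 1])     # → 100

independent = X[1:3, 1:4].copy()  # a separate copy
independent[0, 0] = -1
print(X[1, 1])     # still 100
```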
And here's my favorite way of indexing an array, using an integer array:
X[ [0, 2] ] # extract a given set of rows
array([[ 1, 2, 3, 4], [ 9, 10, 11, 12]])
X[:, [0,2]] # extract a given set of columns
array([[ 1, 3], [ 5, 7], [ 9, 11]])
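Unlike slicing, integer-array indexing returns a copy. A closely related technique is boolean indexing, where a boolean array with one entry per row selects the rows marked True; this is handy for picking out the examples that satisfy a condition. A sketch that selects the rows of X whose first feature exceeds 1:

```python
import numpy as np

X = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
mask = X[:, 0] > 1  # one boolean per row: [False  True  True]
print(X[mask])      # rows 1 and 2
rows = X[[0, 2]]    # integer-array indexing gives a copy...
rows[0, 0] = 99
print(X[0, 0])      # ...so X is unchanged → 1
```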
You can multiply an array by a scalar:
x = np.array([1,1])
x * 2
array([2, 2])
Add arrays:
x + np.array([1,0])
array([2, 1])
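Arithmetic operators such as * and + act elementwise on arrays of the same shape; in particular, * between two arrays is not a dot product. A short illustration:

```python
import numpy as np

x = np.array([1, 1])
y = np.array([3, 4])
print(x * y)   # elementwise product → [3 4]
print(x - y)   # elementwise difference → [-2 -3]
print(y ** 2)  # elementwise power → [ 9 16]
```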
X = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
Let's construct a weight vector for a linear classifier:
w = np.array([1,-1, 2, -1])
w
array([ 1, -1, 2, -1])
We can easily compute the dot/inner product of a row of X with the weight vector:
np.dot(X[0], w)
1
Or you can do it for the whole matrix at once:
np.dot(X, w)
array([1, 5, 9])
This can also be done using the array's dot method:
X.dot(w)
array([1, 5, 9])
and can also be achieved using
np.inner(X, w)
array([1, 5, 9])
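Since Python 3.5 and NumPy 1.10, the @ operator also performs matrix multiplication; for a 2-D array times a 1-D vector it gives the same result as np.dot. A sketch using X and w from above, with a row-by-row computation for comparison:

```python
import numpy as np

X = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
w = np.array([1, -1, 2, -1])
scores = X @ w  # matrix-vector product
print(scores)   # → [1 5 9]
# The same scores computed one row at a time
print([int(np.dot(row, w)) for row in X])  # → [1, 5, 9]
```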
NumPy can do many other useful things with vectors and matrices.
It's very easy to find the inverse of a matrix:
Z = np.array([[2,1,1],[1,2,2],[2,3,4]])
np.linalg.inv(Z)
array([[ 0.66666667, -0.33333333, 0. ], [ 0. , 2. , -1. ], [-0.33333333, -1.33333333, 1. ]])
And we can easily verify that this is correct:
np.dot(Z, np.linalg.inv(Z))==np.eye(3)
array([[ True, True, True], [ True, True, True], [ True, True, True]], dtype=bool)
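The exact == check above happens to succeed for this matrix, but floating-point round-off often leaves tiny errors (say 1e-16 where 0 is expected), so np.allclose, which compares within a tolerance, is the more robust test. Also, when the goal is to solve a linear system Z x = b, np.linalg.solve is generally faster and more numerically stable than forming the inverse explicitly. A sketch; the right-hand side b is a made-up example chosen so that the solution is [1, 1, 1]:

```python
import numpy as np

Z = np.array([[2, 1, 1], [1, 2, 2], [2, 3, 4]])
Z_inv = np.linalg.inv(Z)
print(np.allclose(Z @ Z_inv, np.eye(3)))  # → True

b = np.array([4, 5, 9])        # hypothetical right-hand side
x = np.linalg.solve(Z, b)      # solves Z x = b without computing inv(Z)
print(np.allclose(Z @ x, b))   # → True
```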
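You can compute useful statistics over your matrix, such as the mean and standard deviation, either over all entries or along an axis (axis=0 gives one value per column, axis=1 one value per row):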
X = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
X.mean(), X.mean(axis=0), X.mean(axis=1)
(6.5, array([ 5., 6., 7., 8.]), array([ 2.5, 6.5, 10.5]))
X.std(), X.std(axis=0), X.std(axis=1)
(3.4520525295346629, array([ 3.26598632, 3.26598632, 3.26598632, 3.26598632]), array([ 1.11803399, 1.11803399, 1.11803399]))
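Note that std computes the population standard deviation by default, dividing by N. For the sample standard deviation, which divides by N - 1, pass ddof=1. A sketch for the columns of X:

```python
import numpy as np

X = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
pop_std = X.std(axis=0)             # divides by N = 3
sample_std = X.std(axis=0, ddof=1)  # divides by N - 1 = 2
print(pop_std)     # each entry is about 3.266
print(sample_std)  # each entry is 4.0
```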
NumPy can also combine arrays, for example by stacking them vertically (vstack) or horizontally (hstack):
x = np.array( [1,2,3,4] )
y = np.array( [5,6,7,8] )
np.vstack([x,y])
np.hstack([x,y])
array([[1, 2, 3, 4], [5, 6, 7, 8]])
array([1, 2, 3, 4, 5, 6, 7, 8])
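vstack and hstack are convenience wrappers around more general functions: np.concatenate joins arrays along an existing axis, and np.stack joins them along a new axis. For the 1-D vectors x and y above, stacking along a new axis 0 reproduces the vstack result, while concatenating along axis 0 reproduces hstack. A sketch:

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([5, 6, 7, 8])
stacked = np.stack([x, y], axis=0)       # new axis → shape (2, 4)
joined = np.concatenate([x, y], axis=0)  # existing axis → shape (8,)
print(stacked.shape, joined.shape)       # → (2, 4) (8,)
print(np.array_equal(stacked, np.vstack([x, y])))  # → True
print(np.array_equal(joined, np.hstack([x, y])))   # → True
```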
Consider the following piece of code:
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = np.empty_like(x) # Create an uninitialized array with the same shape and dtype as x
# Add the vector v to each row of the matrix x with an explicit loop
for i in range(x.shape[0]):
    y[i, :] = x[i, :] + v
y
array([[ 2, 2, 4], [ 5, 5, 7], [ 8, 8, 10], [11, 11, 13]])
Loops written in Python are slow, because every iteration is executed by the interpreter. NumPy offers a much more efficient way of doing this:
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = x + v
y
array([[ 2, 2, 4], [ 5, 5, 7], [ 8, 8, 10], [11, 11, 13]])
This is called broadcasting: NumPy applies v to every row of x without an explicit loop and without making copies of v.
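Broadcasting compares shapes starting from the trailing dimension: two dimensions are compatible when they are equal or one of them is 1, and size-1 or missing dimensions are stretched to match. In the example above, v has shape (3,) and x has shape (4, 3), so v is reused for each of the four rows. The same rule lets a column vector combine with a row vector into a full table, while incompatible shapes raise an error. A sketch:

```python
import numpy as np

col = np.array([[0], [10], [20]])  # shape (3, 1)
row = np.array([1, 2, 3, 4])       # shape (4,)
table = col + row                  # shapes broadcast to (3, 4)
print(table.shape)                 # → (3, 4)
print(table)

# Trailing dimensions 3 and 4 differ and neither is 1, so this fails
try:
    np.ones((4, 3)) + np.ones(4)
except ValueError as err:
    print("broadcast error:", err)
```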
These are some of the basics relevant to our course. You can find more details in the NumPy user guide and the detailed reference guide. The following is a good resource for learning how to vectorize Python code with NumPy to obtain good performance.
Some aspects of this tutorial were inspired by this other tutorial.