NumPy

NumPy is Python's core library for numerical computing. It provides a wealth of functionality for working with vectors, matrices, and higher-dimensional arrays (tensors), including linear algebra routines.

Our first step is to import the package (note the "import as" shortcut):

In [1]:

import numpy as np

Arrays are the standard data containers in numpy, and can have any number of dimensions.

Let's create a one dimensional array:

In [2]:

a = np.array([1, 2, 3])
print(type(a))
print(a.shape)
a

<class 'numpy.ndarray'>
(3,)

Out[2]:

array([1, 2, 3])
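
To make the "any number of dimensions" point concrete, here is a small sketch that inspects the `ndim`, `shape`, and `dtype` attributes of a few arrays. The variable names are only for illustration:

```python
import numpy as np

vec = np.array([1, 2, 3])                 # 1-D array, shape (3,)
mat = np.array([[1, 2, 3], [4, 5, 6]])    # 2-D array, shape (2, 3)
cube = np.zeros((2, 3, 4))                # 3-D array, shape (2, 3, 4)

# ndim is the number of dimensions; shape gives the size along each one
for arr in (vec, mat, cube):
    print(arr.ndim, arr.shape, arr.dtype)
```

The integer dtype reported for `vec` and `mat` depends on the platform, while `np.zeros` produces 64-bit floats by default.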

Accessing/modifying array elements:

In [3]:

print(a[0], a[1], a[2])
a[0] = 5                  # Change an element of the array
a

1 2 3

Out[3]:

array([5, 2, 3])

Two dimensional arrays:

In [4]:

b = np.array([[1,2,3],[4,5,6]])

# Both indexing styles select the same elements; b[i, j] is the idiomatic form
print(b[0, 0], b[0, 1], b[1, 0])
print(b[0][0], b[0][1], b[1][0])
b, b.shape

1 2 4
1 2 4

Out[4]:

(array([[1, 2, 3],
        [4, 5, 6]]), (2, 3))

Some functions for creating arrays:

In [5]:

a = np.zeros((2,2))      # Create an array of zeros
print(a)

b = np.ones((2,2))       # Create an array of ones
print(b)

c = np.full((2,2), 7.0)  # Create a constant array (can also be done with the ones function)
print(c)


d = np.eye(2)            # Create a 2x2 identity matrix
print(d)

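e = np.random.random((3,3))  # Create a 3x3 array of uniform random values in [0, 1)
print(e)

f = np.arange(2, 3, 0.1)     # Values from 2 up to (but excluding) 3, in steps of 0.1
print(f)

g = np.linspace(1., 4., 6)   # 6 evenly spaced values from 1 to 4, endpoints included
print(g)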

[[ 0.  0.]
 [ 0.  0.]]
[[ 1.  1.]
 [ 1.  1.]]
[[ 7.  7.]
 [ 7.  7.]]
[[ 1.  0.]
 [ 0.  1.]]
[[ 0.13849416  0.46435903  0.9068119 ]
 [ 0.86278242  0.66594523  0.71520984]
 [ 0.65630407  0.39285181  0.99686805]]
[ 2.   2.1  2.2  2.3  2.4  2.5  2.6  2.7  2.8  2.9]
[ 1.   1.6  2.2  2.8  3.4  4. ]
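
The difference between `np.arange` and `np.linspace` is easy to miss in the output above: `arange` takes a step size and excludes the stop value, whereas `linspace` takes a number of points and includes both endpoints. A minimal sketch that checks this on the same two calls:

```python
import numpy as np

f = np.arange(2, 3, 0.1)      # step of 0.1; the stop value 3 is excluded
g = np.linspace(1., 4., 6)    # exactly 6 points; both endpoints are included

print(len(f), f[0], f[-1] < 3)
print(len(g), g[0], g[-1])

# Adjacent linspace points are evenly spaced: (4 - 1) / (6 - 1) = 0.6
print(np.allclose(np.diff(g), 0.6))
```

With a non-integer step, `linspace` is usually the safer choice, since rounding error can change the number of elements `arange` returns.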

Sidenote - getting help on Python objects:

For help on, e.g., the NumPy linspace function, you can use either of the following (the first form works only in IPython/Jupyter):

?np.linspace

or

help(np.linspace)

Array indexing

In [7]:

X = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
X

Out[7]:

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

We'll think of this matrix as containing three feature vectors: each row of the matrix is a vector of features, and each column is the set of values that a given feature has in the data.

To access the vector of features of the first example in the dataset:

In [8]:

row = X[0]    # the first row of X
row, row.shape

Out[8]:

(array([1, 2, 3, 4]), (4,))

To access a column of the matrix (a single feature):

In [9]:

col = X[:, 0]
col, col.shape

Out[9]:

(array([1, 5, 9]), (3,))

There are multiple ways of accessing sub-arrays. The first uses slices, as with Python lists:

In [10]:

submatrix = X[1:3, 1:4]
submatrix, submatrix.shape

Out[10]:

(array([[ 6,  7,  8],
        [10, 11, 12]]), (2, 3))

And here's my favorite way of indexing an array, using an integer array:

In [11]:

X[ [0, 2] ]   # extract a given set of rows

Out[11]:

array([[ 1,  2,  3,  4],
       [ 9, 10, 11, 12]])

In [12]:

X[:, [0,2]]  # extract a given set of columns

Out[12]:

array([[ 1,  3],
       [ 5,  7],
       [ 9, 11]])
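
One practical difference between the two indexing styles is worth a sketch: a slice returns a view that shares memory with the original array, while indexing with an integer array returns an independent copy. The names `view` and `copy` below are illustrative only:

```python
import numpy as np

X = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view = X[1:3, 1:4]      # slicing: shares memory with X
copy = X[[0, 2]]        # integer-array indexing: a new, independent array

view[0, 0] = -1         # also changes X[1, 1]
copy[0, 0] = 100        # X[0, 0] is unaffected

print(X[1, 1], X[0, 0])
print(np.shares_memory(X, view), np.shares_memory(X, copy))
```

If you need to modify a sliced sub-array without touching the original, call `.copy()` on the slice first.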

Operations on arrays

You can multiply an array by a scalar:

In [13]:

x = np.array([1,1])
x * 2

Out[13]:

array([2, 2])

Add arrays:

In [14]:

x + np.array([1,0])

Out[14]:

array([2, 1])
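
Arithmetic operators on arrays act element by element. In particular, `*` between two arrays is an element-wise product rather than a dot product. A short sketch, where `y` is just a second array introduced for illustration:

```python
import numpy as np

x = np.array([1, 1])
y = np.array([1, 0])

print(x + y)       # element-wise sum
print(x * y)       # element-wise product, not a dot product
print(x - 2 * y)   # a scalar combines with every element
```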

Dot products

In [15]:

X = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

Let's construct a weight vector for a linear classifier:

In [16]:

w = np.array([1,-1, 2, -1])
w

Out[16]:

array([ 1, -1,  2, -1])

We can easily compute the dot/inner product of a row of X with the weight vector:

In [17]:

np.dot(X[0], w)

Out[17]:

1

Or you can do it for the whole matrix at once:

In [18]:

np.dot(X, w)

Out[18]:

array([1, 5, 9])

The same computation is available as an array method:

In [19]:

X.dot(w)

Out[19]:

array([1, 5, 9])

It can also be computed with np.inner:

In [20]:

np.inner(X, w)

Out[20]:

array([1, 5, 9])
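
To show what these calls compute, the sketch below writes out the dot product of the first row as an explicit sum of products, and also uses the `@` operator (available since Python 3.5), which gives the same result as `np.dot` for these shapes:

```python
import numpy as np

X = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
w = np.array([1, -1, 2, -1])

# Dot product of the first row with w, written out as a sum of products:
# 1*1 + 2*(-1) + 3*2 + 4*(-1)
manual = sum(X[0, j] * w[j] for j in range(len(w)))

# Matrix-vector product: one dot product per row of X
scores = X @ w

print(manual)
print(scores)
```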

Other linear algebra operations

NumPy provides many other useful operations on vectors and matrices, most of them in the np.linalg module.

It's very easy to find the inverse of a matrix:

In [21]:

Z = np.array([[2,1,1],[1,2,2],[2,3,4]])
np.linalg.inv(Z)

Out[21]:

array([[ 0.66666667, -0.33333333,  0.        ],
       [ 0.        ,  2.        , -1.        ],
       [-0.33333333, -1.33333333,  1.        ]])

And we can easily verify that this is correct. Because the inverse is computed in floating point, we compare with np.allclose rather than testing for exact equality with ==:

In [22]:

np.allclose(np.dot(Z, np.linalg.inv(Z)), np.eye(3))

Out[22]:

True
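
When the inverse is only needed to solve a linear system Zx = b, `np.linalg.solve` is generally preferable to forming the inverse explicitly. The following is a minimal sketch; the right-hand side `b` is made up for illustration:

```python
import numpy as np

Z = np.array([[2, 1, 1], [1, 2, 2], [2, 3, 4]])
b = np.array([1, 2, 3])   # hypothetical right-hand side

x_solve = np.linalg.solve(Z, b)        # solves Z x = b directly
x_inv = np.dot(np.linalg.inv(Z), b)    # same solution via the explicit inverse

print(np.allclose(x_solve, x_inv))
print(np.allclose(np.dot(Z, x_solve), b))
```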

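You can compute useful statistics over your matrix, such as the mean and standard deviation. The axis argument selects the dimension that is collapsed: axis=0 gives one value per column, and axis=1 gives one value per row.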

In [23]:

X = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
X.mean(), X.mean(axis=0), X.mean(axis=1)

Out[23]:

(6.5, array([ 5.,  6.,  7.,  8.]), array([  2.5,   6.5,  10.5]))

In [24]:

X.std(), X.std(axis=0), X.std(axis=1)

Out[24]:

(3.4520525295346629,
 array([ 3.26598632,  3.26598632,  3.26598632,  3.26598632]),
 array([ 1.11803399,  1.11803399,  1.11803399]))
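
A common use of per-column statistics is standardizing each feature to zero mean and unit standard deviation. The sketch below assumes the rows-as-examples layout used earlier; the names `mu`, `sigma`, and `X_std` are illustrative. Subtracting `mu` from `X` relies on broadcasting, which is covered in the section on avoiding loops.

```python
import numpy as np

X = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# axis=0 collapses the rows, leaving one value per column (feature)
mu = X.mean(axis=0)
sigma = X.std(axis=0)

# Standardize each feature: subtract its mean, divide by its standard deviation
X_std = (X - mu) / sigma

print(X_std.mean(axis=0))
print(X_std.std(axis=0))
```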

There are lots of other things you can do with numpy, such as stacking arrays vertically or horizontally:

In [25]:

x = np.array( [1,2,3,4] )
y = np.array( [5,6,7,8] )
print(np.vstack([x,y]))   # stack as the rows of a 2-D array
print(np.hstack([x,y]))   # join end to end into one 1-D array

[[1 2 3 4]
 [5 6 7 8]]
[1 2 3 4 5 6 7 8]
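
For comparison, here is a sketch of how the shapes differ across a few stacking functions. `np.column_stack` and `np.concatenate` are not used elsewhere in this tutorial; they are shown here only as related alternatives:

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([5, 6, 7, 8])

rows = np.vstack([x, y])          # shape (2, 4): x and y become rows
cols = np.column_stack([x, y])    # shape (4, 2): x and y become columns
flat = np.concatenate([x, y])     # shape (8,): same as np.hstack for 1-D input

print(rows.shape, cols.shape, flat.shape)
```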

Avoid loops when you can

Consider the following piece of code:

In [26]:

x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = np.empty_like(x)   # Create an empty matrix with the same shape as x

# Add the vector v to each row of the matrix x with an explicit loop
for i in range(x.shape[0]):
    y[i, :] = x[i, :] + v

y

Out[26]:

array([[ 2,  2,  4],
       [ 5,  5,  7],
       [ 8,  8, 10],
       [11, 11, 13]])

Explicit loops are slow in Python because each iteration is executed by the interpreter. There is a much more efficient way of doing this:

In [27]:

x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = x + v
y

Out[27]:

array([[ 2,  2,  4],
       [ 5,  5,  7],
       [ 8,  8, 10],
       [11, 11, 13]])

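This is called broadcasting: numpy treats v as if it were repeated along the rows of x so that the shapes match, with no explicit loop in Python.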
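
Broadcasting compares shapes starting from the last dimension; two dimensions are compatible when they are equal or one of them is 1. The sketch below shows how to add a vector to every column instead of every row by giving it an explicit length-1 axis. The vector `u` is a hypothetical example:

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])  # shape (4, 3)
v = np.array([1, 0, 1])                                        # shape (3,)
u = np.array([10, 20, 30, 40])                                 # shape (4,), hypothetical

row_added = x + v                  # (4, 3) + (3,): v is added to every row
col_added = x + u[:, np.newaxis]   # (4, 3) + (4, 1): u is added to every column

print(row_added[0], col_added[0])
```

Writing `x + u` directly would raise a `ValueError`, because the trailing dimensions (3 and 4) do not match.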

NumPy documentation

These are some of the basics relevant to our course. You can find more details in the NumPy user manual and the detailed reference guide. The following is a good resource for learning how to vectorize Python code with NumPy to obtain good performance.

Some aspects of this tutorial were inspired by this tutorial.