Objectives:
This lesson is in large a rehearsal of the introduction video we watched last week, but now with practical excercises and extending on ideas of accessing and manipulating data with NumPy arrays. Upon completion of this class, you will be able to
Access data in N-dimensional arrays via indexing, slicing, fancy indexing
Perform various operations on the arrays, and become aware that some times you get a "view", some times a "copy"
Practice broadcasting
The items of an array can be accessed and assigned to the same way as
other Python sequences (list
, tuple
):
import numpy as np
a = np.arange(10)
print(a)
print(a[0], a[2], a[-1])
Note: Indices begin at 0, like other Python sequences (and C/C++): In contrast, in Fortran or Matlab, indices begin at 1.
Indexes are tuples of integers:
a = np.diag(np.arange(5))
a
a[1, 1]
a[2, 1] = 10 # third line, second column
a
and specifying only one index, would select the corresponding 'row' or 'column':
a[1]
a[:, 1]
in above example I have used ":" to specify "all" elements in that axis. Alternatively, even if we had >2 axes, I could use "..." to specify "all previous axes having ":"
a[..., 1]
Arrays, like other Python sequences can also be sliced:
a = np.arange(10)
a
a[2:9:3] # [start:end:step]
A small illustrated summary of Numpy indexing and slicing...
Fancy indexing is just a fancy name for indexing multiple entries at once by either providing indexes of the entries:
a = np.array(list('abcdefgh'))
print(a[[0, 2, 4, 6]])
or a boolean mask:
a = np.array(list('abcdefgh'))
print(a[[True, False]*4])
There was/used to be a catch: bool mask had to be an array of type bool! Excercise: find when behavior changed
Very powerful feature to modify in-place some elements identified by indexes or a mask:
a[[0, 2]] = 'X'
print(a)
a[a == 'g'] = 'Y'
print(a)
.flags
of any array provide information about the "internals" of the array. Lets look at them:
a_slice = a[::2]
print(a_slice)
print(a_slice.flags)
a_findexed = a[[0, 2, 4, 6]]
print(a_findexed)
print(a_findexed.flags)
See setflags documentation for more details about flags, but for now we concentrate on "OWNDATA".
Excercise
np.random
provides functions to generate random arrays. Create a 2d array of shape (3, 4)
with normally distributed data. Experiment with slicing, indexing and fancy indexing. Check OWNDATA
flag of created arrays.
With scalars:
a = np.array([1, 2, 3, 4])
a + 1
2**a
All arithmetic operates elementwise (like ufuncs)
b = np.ones(4) + 1
a - b
a * b
j = np.arange(5)
2**(j + 1) - j
Warning! Multiplication is no special and is also elementwise:
c = np.ones((3, 3))
c * c # NOT matrix multiplication!
Matrix multiplication:
c.dot(c)
Note Since python3. we get a new operation symbol "@" specifically reserved for "matrix multiplication".
c @ c
There is a growing number of libraries which try to take even more advantage of existing hardware features (e.g. GPUs, multiple CPUs etc) or simply providing even more efficient implementations (reuse of memory, etc). E.g. numexpr could speed up majority of ufunc executions and even np.where
:
a = np.random.normal(1e6)
b = np.arange(1e6)
c = a**2 + 2*b**3 + 2*a*b
pos_c = c>0
# The equation
%timeit a**2 + 2*b**3 + 2*a*b
%timeit np.where(pos_c, 1, 2)
import numexpr as ne
%timeit ne.evaluate("a**2 + 2*b**3 + 2*a*b")
%timeit ne.evaluate("where(pos_c, 1, 2)")
# now use a single thread
ne.set_num_threads(1)
%timeit ne.evaluate("a**2 + 2*b**3 + 2*a*b")
%timeit ne.evaluate("where(pos_c, 1, 2)")
You can make fast comparisons of ndarrays:
a = np.array([1, 2, 3, 4])
b = np.array([4, 2, 2, 4])
a == b
a > b
so note that it is a unary functionality, i.e. it compares each element at a time, and unlike (which stock container?) doesn't result in a single bool value:
if a > b:
print("a is bigger")
and you need to use numpy.any or numpy.all (well -- regular all and any would work but slower) reductions
to get the target bool:
print(np.all(a > b))
print(np.any(a > b))
you could also use array_equal
helper
np.array_equal(a, b)
Questions
np.allclose
-- check out its help. When will it be useful?And perform fast logical operations:
a = np.array([1, 1, 0, 0], dtype=bool)
b = np.array([1, 0, 1, 0], dtype=bool)
a | b
a & b
Note: For arrays: "&
" and "|
" for logical operations, not: "and
" and "or
". You could also use numpy.logical_and and numpy.logical_or functions.
What if things don't line up?
a = np.array([0.1, 0.6, -0.3])
b = np.array([1, 2])
print(a + b)
Broadcasting could be of great use. It follows 3 simple steps:
a = np.tile(np.arange(0, 40, 10), (3, 1)).T
print(a)
b = np.array([0, 1, 2])
print(b)
print(a + b)
We could left-pad with new dimension manually:
print(b.shape)
print(b[np.newaxis, :].shape)
print(a + b[np.newaxis, :])
Broadcasting excercise
Assuming that in above failed example I wanted to generate 3x2 array in which first column will be a+1 and 2nd column would be a+2 -- how to adjust above addition to make it happen? (hint: we need to introduce one more axis to one of those 1d vectors)
a_T = a.T
print(a_T)
Such a transpose is very fast operation since no data copying is actually being done -- only a new "view" created over the array data:
print(a_T.flags)
so similarly to "sliced" array, changing element in the copy would change data in the original array:
a_T[1, 0] = 101
print(a_T)
print(a)
Create a flattened version of an array, with the highest dimension ravel
ing first:
a_flat = a.ravel()
print(a_flat)
Excercises
np.ravel
about two possible different "orders".a.flags
to see which order a
is ina
in its "original" order and store result into a_flatten
a_flatten.flags
. What does OWNDATA would mean? Try changing some element within a_flatten
and inspect content of original a
If we do not care to get a "view" over an array, we can use .flatten()
method which is guaranteed to return a copy:
a.flatten()
a_3d = a.reshape((2, 2, 3))
print(a_3d.shape)
print(a_3d)
Excercise
a_3d
-- have we created a copy or a view?np.reshape
helpQuestion
If in the case of .T
operation or slicing we can somewhat rely that we would obtain a 'view', it is not the fact with ravel
and reshape
. How should we "guard" our code to state our expectation that we got a copy or not?
There is a number of functions to "join" multiple arrays or repeat existing ones multiple times:
a = np.arange(4).reshape((2, 2))
b = np.arange(4,6)
print(a, a.shape)
print(b, b.shape)
c = np.vstack((a, b))
print(c, c.shape)
c = np.hstack((a, b[:, np.newaxis]))
print(c, c.shape)
or an array could be repeated (by default flattened)
print(np.repeat(a, 2))
or along a given axis
np.repeat(a, 2, axis=1)
possibly even with varying number of repeats per each element (along that axis):
np.repeat(a, (2, 3), axis=1)
Excercise
Imagine we are to carry out an fMRI experiment where we have
np.random.shuffle
)np.bincount
function)Matrix multiplication:
a = np.triu(np.ones((3, 3)), 1) # see help(np.triu)
print(a)
b = np.diag([1, 2, 3])
print(a.dot(b))
print(np.dot(a, a))
A = a + b
print(A)
B = np.linalg.inv(A)
print(B)
B.dot(A)
np.linalg.eigvals(A)
... and so on, see
?np.linalg
Computing sums:
x = np.array([1, 2, 3, 4])
print(np.sum(x))
print(x.sum())
x = np.array([[1, 1], [2, 2]])
print(x)
We can ask for an operation along specific axis:
x.sum(axis=0) # columns (first dimension)
which is identical to having done something like:
x[:, 0].sum(), x[:, 1].sum()
and we could easily sum within each row:
x.sum(axis=1) # rows (second dimension)
x[0, :].sum(), x[1, :].sum()
Work the same way (and take axis=
)
x = np.array([1, 2, 3, 1])
y = np.array([[1, 2, 3], [5, 6, 1]])
x.mean()
np.median(x)
np.median(y, axis=-1) # last axis
x.std() # full population standard dev.
x.std(ddof=1) # sample std (with N-1 in divisor)
Extrema:
x = np.array([1, 3, 2])
x.min(), x.max()
# indexes of (min, max)
x.argmin(), x.argmax()
np.sort
provides a new array with entries sorted within any given axis:
a = np.array([[4, 3, 5], [1, 2, 1]])
b = np.sort(a, axis=1)
print(a)
print(b)
But we could also sort "inplace", i.e. by modifying the original array:
a.sort(axis=1)
print(a)
Excercise
np.random.shuffle
sorted
function, write a code snippet which would verify that you have sorted your rows correctly (and raises AssertionError if not)NumPy comes with a .testing
submodule which provides handy utilities for unit-testing your code where you need to verify correct operation on NumPy arrays, e.g.
import numpy.testing as npt
npt.assert_array_less(np.array([1,2,3]), np.array([2,3,4]))
npt.assert_array_equal(np.array([1,2,3]), np.array([1., 2., 3.]))
npt.assert_array_almost_equal(np.array([1,2,3]), np.array([1.0000001, 2., 3.]))
Excercise
Adjust your code snippet in previous excercise to use numpy.testing assertion helpers
So far we already saw that we could store integers, floating point numbers, and even strings into numpy arrays. .dtype
provides us description of the data type of the array:
np.arange(3).dtype
np.ones(3).dtype
np.array(['a', 'bc']).dtype
so numpy usually figures out the appropriate data-type but at times we might want to specify it explicitly, e.g. to save some memory (and possibly CPU time) at the cost of precision
np.ones(3, dtype=np.float32).dtype
and you can "cast" your existing array into another compatible type:
np.ones(3, dtype=np.float32).astype(int)
If you need to discover details (max/min value, resolution) of the specific floating point number data type, use finfo
function
np.finfo(np.float32)
Extra
For hungry minds -- checkout Structured data types and Masked arrays