Python objects
high-level number objects: integers, floating point
containers: lists (costless insertion and append), dictionaries (fast
lookup)
Numpy provides
extension package to Python for multi-dimensional arrays
closer to hardware (efficiency)
designed for scientific computation (convenience)
Also known as array oriented computing
import numpy as np
a = np.array([0, 1, 2, 3])
a
For example, an array containing:
values of an experiment/simulation at discrete time steps
signal recorded by a measurement device, e.g. sound wave
pixels of an image, grey-level or colour
3-D data measured at different X-Y-Z positions, e.g. MRI scan
...
Why it is useful: Memory-efficient container that provides fast numerical operations.
L = range(1000)
%timeit [i**2 for i in L]
a = np.arange(1000)
%timeit a**2
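%timeit is an IPython magic, so it does not work in a plain Python script. Here is a minimal sketch of the same comparison using the standard-library timeit module, assuming NumPy is installed. Absolute timings depend on the machine, so only the results are checked, not the speed:

```python
import timeit

import numpy as np

L = range(1000)
a = np.arange(1000)

# The same computation done two ways: a Python list comprehension and a vectorized NumPy expression
list_result = [i**2 for i in L]
array_result = a**2

# timeit.timeit calls each function `number` times and returns the total time in seconds
t_list = timeit.timeit(lambda: [i**2 for i in L], number=1000)
t_array = timeit.timeit(lambda: a**2, number=1000)
print(f"list comprehension: {t_list:.4f} s, numpy: {t_array:.4f} s")
```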
On the web: http://docs.scipy.org/
Interactive help:
np.array?
np.lookfor('create array')
np.con*?
The general convention to import numpy is:
import numpy as np
Using this style of import is recommended.
a = np.array([0, 1, 2, 3])
a
a.ndim
a.shape
len(a)
b = np.array([[0, 1, 2], [3, 4, 5]]) # 2 x 3 array
b
b.ndim
b.shape
len(b) # returns the size of the first dimension
c = np.array([[[1], [2]], [[3], [4]]])
c
c.shape
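The following check makes the relationship between the three attributes explicit for the arrays above. ndim is the number of axes, shape gives the size along each axis, and len reports only the first of those sizes:

```python
import numpy as np

a = np.array([0, 1, 2, 3])
b = np.array([[0, 1, 2], [3, 4, 5]])
c = np.array([[[1], [2]], [[3], [4]]])

for arr in (a, b, c):
    # ndim == len(arr.shape), and len(arr) == arr.shape[0]
    print(arr.ndim, arr.shape, len(arr))
```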
Redo the examples from above, and then create your own.
Use len, shape and ndim on some of those arrays and observe their output.
In practice, we rarely enter items one by one...
a = np.arange(10) # 0 .. n-1 (!)
a
b = np.arange(1, 9, 2) # start, end (exclusive), step
b
c = np.linspace(0, 1, 6) # start, end, num-points
c
d = np.linspace(0, 1, 5, endpoint=False)
d
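The contrast between the two functions is easy to miss. arange excludes its end value, while linspace includes it unless endpoint=False is passed. Here is a quick check of the arrays built above:

```python
import numpy as np

a = np.arange(10)                          # end value 10 is excluded
b = np.arange(1, 9, 2)                     # 1, 3, 5, 7 (9 is excluded)
c = np.linspace(0, 1, 6)                   # 6 evenly spaced points, 1 is included
d = np.linspace(0, 1, 5, endpoint=False)   # 5 points, 1 is excluded

print(a, b, c, d, sep="\n")
```

For non-integer steps, linspace is usually the safer choice. The number of points it returns is fixed, whereas the length of a float arange can be affected by rounding.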
a = np.ones((3, 3)) # reminder: (3, 3) is a tuple
a
b = np.zeros((2, 2))
b
c = np.eye(3)
c
d = np.diag(np.array([1, 2, 3, 4]))
d
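np.diag also accepts an offset k. This places the values above (k > 0) or below (k < 0) the main diagonal. When given a 2-D array, np.diag extracts a diagonal instead of building one. A small sketch:

```python
import numpy as np

v = np.array([1, 2, 3, 4])
upper = np.diag(v, k=1)       # 5 x 5 array, values on the first diagonal above the main one
main = np.diag(np.diag(v))    # 2-D input: extracts the main diagonal back as a 1-D array
print(upper)
print(main)                   # [1 2 3 4]
```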
np.random: random numbers (Mersenne Twister PRNG):
a = np.random.rand(4) # uniform in [0, 1)
a
b = np.random.randn(4) # Gaussian
b
np.random.seed(1234) # Setting the random seed
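Seeding makes the pseudo-random sequence reproducible: resetting the seed to the same value regenerates the same numbers. NumPy 1.17 and later also provide the Generator interface through np.random.default_rng, which is recommended for new code. It uses PCG64 by default rather than the Mersenne Twister, so its numbers differ from those of np.random.rand for the same seed. A sketch of both:

```python
import numpy as np

np.random.seed(1234)
first = np.random.rand(4)
np.random.seed(1234)          # reset to the same seed
second = np.random.rand(4)
print(np.array_equal(first, second))   # True: same seed, same sequence

# Generator-based API (NumPy >= 1.17); its default bit generator is PCG64
rng = np.random.default_rng(1234)
u = rng.random(4)             # uniform in [0, 1)
g = rng.standard_normal(4)    # standard Gaussian
```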
Experiment with arange, linspace, ones, zeros, eye and diag.
Create different kinds of arrays with random numbers.
Try setting the seed before creating an array with random values.
Look at the function np.empty. What does it do? When might this be
useful?
You may have noticed that, in some instances, array elements are
displayed with a trailing dot (e.g. 2. vs 2). This is due to a
difference in the data-type used:
a = np.array([1, 2, 3])
a.dtype
b = np.array([1., 2., 3.])
b.dtype
Different data-types allow us to store data more compactly in memory, but most of the time we simply work with floating point numbers. Note that, in the example above, NumPy auto-detects the data-type from the input.
You can explicitly specify which data-type you want:
c = np.array([1, 2, 3], dtype=float)
c.dtype
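To make the memory claim above concrete: the number of bytes per element depends on the dtype, so 1000 values take very different amounts of memory. This sketch compares a few dtypes using the standard itemsize and nbytes attributes:

```python
import numpy as np

for dtype in (np.int8, np.int32, np.float64):
    arr = np.zeros(1000, dtype=dtype)
    # itemsize: bytes per element; nbytes: total size of the data buffer in bytes
    print(arr.dtype, arr.itemsize, arr.nbytes)

# astype converts to another dtype and returns a new array
c = np.array([1, 2, 3]).astype(float)
print(c.dtype)   # float64
```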
The default data type is floating point:
a = np.ones((3, 3))
a.dtype
There are also other types:
Complex
d = np.array([1+2j, 3+4j, 5+6*1j])
d.dtype
Bool
e = np.array([True, False, False, True])
e.dtype
Strings
f = np.array(['Bonjour', 'Hello', 'Hallo',])
f.dtype # <--- strings containing max. 7 letters
Much more
int32
int64
uint32
uint64
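Fixed-size integer types have a fixed range, and np.iinfo reports it. This helps when choosing between, say, int32 and uint32. In array arithmetic, a value outside the range wraps around silently instead of raising an error, so the choice matters:

```python
import numpy as np

for t in (np.int32, np.int64, np.uint32, np.uint64):
    info = np.iinfo(t)
    print(t.__name__, info.min, info.max)

# Array arithmetic on fixed-size integers wraps around on overflow
x = np.array([255], dtype=np.uint8)
print(x + np.uint8(1))   # [0]
```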
Now that we have our first data arrays, we are going to visualize them.
Start by launching IPython in pylab mode.
Or the notebook:
Alternatively, if IPython has already been started:
%pylab
Or, from the notebook:
%pylab inline
The inline is important for the notebook, so that plots are displayed
in the notebook and not in a new window.
Matplotlib is a 2D plotting package. We can import its functions as below:
import matplotlib.pyplot as plt # the tidy way
And then use (note that you have to call show explicitly):
plt.plot(x, y) # line plot
plt.show() # <-- shows the plot (not needed with pylab)
Or, if you are using pylab:
plot(x, y) # line plot
Using import matplotlib.pyplot as plt is recommended for use in
scripts, whereas pylab is recommended for interactive exploratory
work.
x = np.linspace(0, 3, 20)
y = np.linspace(0, 9, 20)
plt.plot(x, y) # line plot
plt.plot(x, y, 'o') # dot plot
image = np.random.rand(30, 30)
plt.imshow(image, cmap=plt.cm.hot)
plt.colorbar()
More in the Matplotlib tutorial this afternoon
Plot some simple arrays.
Try to use both the IPython shell and the notebook, if possible.
Try using the gray colormap.
The items of an array can be accessed and assigned to the same way as other Python sequences (e.g. lists):
a = np.arange(10)
a
a[0], a[2], a[-1]
Indices begin at 0, like other Python sequences (and C/C++). In contrast, in Fortran or Matlab, indices begin at 1.
The usual Python idiom for reversing a sequence is supported:
a[::-1]
For multidimensional arrays, indexes are tuples of integers:
a = np.diag(np.arange(3))
a
a[1, 1]
a[2, 1] = 10 # third line, second column
a
a[1]
Note that:
In 2D, the first dimension corresponds to rows, the second to columns.
Let us repeat together: the first dimension corresponds to rows, the
second to columns.
For a multidimensional array a, a[0] is interpreted by taking all
elements in the unspecified dimensions.
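Building on the row/column convention, a single index selects a row, and a slice (:) in the first position selects a column. Here is a small sketch with the same diagonal array used above:

```python
import numpy as np

a = np.diag(np.arange(3))
a[2, 1] = 10
print(a[1])                  # second row: [0 1 0]
print(a[:, 1])               # second column: [ 0  1 10]
print(a[2][1] == a[2, 1])    # True; a[2, 1] does it in a single indexing step
```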
Slicing: arrays, like other Python sequences, can also be sliced:
a = np.arange(10)
a
a[2:9:3] # [start:end:step]
Note that the last index is not included:
a[:4]
None of the three slice components is required: by default, `start` is 0, `end` is the end of the array and `step` is 1:
a[1:3]
a[::2]
a[3:]
A small illustrated summary of Numpy indexing and slicing...
from IPython.display import Image
Image(filename='images/numpy_indexing.png')
You can also combine assignment and slicing:
a = np.arange(10)
a[5:] = 10
a
b = np.arange(5)
a[5:] = b[::-1]
a
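When you assign to a slice, the right-hand side must be either a scalar, which is broadcast to every element, or an array whose shape matches the slice. This sketch shows what the two assignments above produce, and the error raised when the shapes do not match:

```python
import numpy as np

a = np.arange(10)
a[5:] = 10                  # the scalar is broadcast to every element of the slice
print(a)                    # [ 0  1  2  3  4 10 10 10 10 10]

b = np.arange(5)
a[5:] = b[::-1]             # shapes match: 5 elements on each side
print(a)                    # [0 1 2 3 4 4 3 2 1 0]

try:
    a[5:] = np.arange(3)    # 5 slots but 3 values: shapes cannot be broadcast
except ValueError as err:
    print("ValueError:", err)
```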
Try the different flavours of slicing, using start, end and step.
Verify that the slices in the diagram above are indeed correct. You may
use the following expression to create the array:
np.arange(6) + np.arange(0, 51, 10)[:, np.newaxis]
example above.
Use -2 instead of -1 in the reversal idiom above. What effect does this have?
Create the following arrays (with correct data types):
Par on course: 3 statements for each
Hint: Individual array elements can be accessed similarly to a list,
e.g. a[1] or a[1, 2].
Hint: Examine the docstring for diag.
Skim through the documentation for np.tile, and use this function to
construct the array:
A slicing operation creates a view on the original array, which is
just a way of accessing array data. Thus the original array is not
copied in memory. You can use np.may_share_memory()
to check if two
arrays share the same memory block. Note, however, that this uses
heuristics and may give you false positives.
When modifying the view, the original array is modified as well:
a = np.arange(10)
a
b = a[::2]
b
np.may_share_memory(a, b)
b[0] = 12
b
a # (!)
a = np.arange(10)
c = a[::2].copy() # force a copy
c[0] = 12
a
np.may_share_memory(a, c)
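np.may_share_memory only checks whether the memory ranges of two arrays overlap, which is why it can return false positives. np.shares_memory (NumPy 1.11 and later) performs an exact check, which may be slower. The base attribute shows which array a view was taken from. This sketch contrasts the two checks on interleaved views:

```python
import numpy as np

a = np.arange(10)
evens = a[::2]                # view
odds = a[1::2]                # another view, with no element in common with evens

print(np.may_share_memory(evens, odds))   # True: the memory ranges overlap (false positive)
print(np.shares_memory(evens, odds))      # False: no element is actually shared
print(evens.base is a)                    # True: evens is a view on a

c = a[::2].copy()
print(c.base is None)                     # True: c owns its own data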
This behavior can be surprising at first sight... but it saves both memory and time.
from IPython.display import Image
Image(filename='images/prime-sieve.png')
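Compute prime numbers in 0--99, with a sieve
Construct a boolean array is_prime of shape (100,), filled with True in the beginning:
is_prime = np.ones((100,), dtype=bool)
Cross out 0 and 1, which are not primes:
is_prime[:2] = False
For each integer j starting from 2, cross out its higher multiples:
N_max = int(np.sqrt(len(is_prime)))
for j in range(2, N_max + 1):  # include sqrt(n) itself, e.g. 7 when len(is_prime) == 50
    is_prime[2*j::j] = False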
Skim through help(np.nonzero), and print the prime numbers.
Follow-up:
Move the above code into a script file named prime_sieve.py
Run it to check it works
Use the optimization suggested in [the sieve of
Eratosthenes](http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes):
* Skip values of `j` that are already known not to be primes
* The first number to cross out is $j^2$
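Here is a sketch of the follow-up with both optimizations applied, using np.nonzero to list the primes. The function name prime_sieve is hypothetical. It simply mirrors the suggested script name.

```python
import numpy as np


def prime_sieve(n):
    """Return the primes in [0, n) using a NumPy sieve of Eratosthenes (hypothetical helper)."""
    is_prime = np.ones((n,), dtype=bool)
    is_prime[:2] = False                      # 0 and 1 are not primes
    n_max = int(np.sqrt(n))
    for j in range(2, n_max + 1):
        if not is_prime[j]:
            continue                          # j is composite: its multiples are already crossed out
        is_prime[j * j::j] = False            # smaller multiples of j were crossed out by smaller primes
    return np.nonzero(is_prime)[0]            # indices where is_prime is True


if __name__ == "__main__":
    print(prime_sieve(100))
```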
Numpy arrays can be indexed with slices, but also with boolean arrays (masks) or integer arrays. This method is called fancy indexing. It creates copies, not views.
np.random.seed(3)
a = np.random.randint(0, 21, 15) # 15 integers in [0, 20]; replaces the deprecated np.random.random_integers(0, 20, 15)
a
(a % 3 == 0)
mask = (a % 3 == 0)
extract_from_a = a[mask] # or, a[a%3==0]
extract_from_a # extract a sub-array with the mask
Indexing with a mask can be very useful to assign a new value to a sub-array:
a[a % 3 == 0] = -1
a
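Masks can be combined element-wise with & (and), | (or) and ~ (not). Python's and/or keywords do not work on arrays. Each comparison also needs parentheses, because & and | bind more tightly than == and >. This sketch uses a fixed array instead of the random one above:

```python
import numpy as np

a = np.arange(20)
mask = (a % 3 == 0) & (a > 5)     # multiples of 3 greater than 5
print(a[mask])                    # [ 6  9 12 15 18]
print(np.count_nonzero(mask))     # 5
print(a[~mask].size)              # 15 elements are not selected
```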
a = np.arange(0, 100, 10)
a
Indexing can be done with an array of integers, where the same index is repeated several times:
a[[2, 3, 2, 4, 2]] # note: [2, 3, 2, 4, 2] is a Python list
New values can be assigned with this kind of indexing:
a[[9, 7]] = -100
a
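Repeated indices interact with in-place operators in a way that can be surprising. With a[idx] += 1, each selected element is incremented only once, even if its index appears several times. This happens because the values are read, incremented and written back as a whole. np.add.at performs an unbuffered operation that counts every occurrence. A sketch:

```python
import numpy as np

idx = [2, 3, 2, 4, 2]            # index 2 appears three times

a = np.zeros(5, dtype=int)
a[idx] += 1                      # buffered: index 2 is incremented only once
print(a)                         # [0 0 1 1 1]

b = np.zeros(5, dtype=int)
np.add.at(b, idx, 1)             # unbuffered: every occurrence counts
print(b)                         # [0 0 3 1 1]
```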
When a new array is created by indexing with an array of integers, the new array has the same shape as the array of integers:
a = np.arange(10)
idx = np.array([[3, 4], [9, 7]])
idx.shape
a[idx]
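Unlike slicing, fancy indexing returns a copy, so modifying the result leaves the original array untouched. A short check:

```python
import numpy as np

a = np.arange(10)
idx = np.array([[3, 4], [9, 7]])
sub = a[idx]                      # shape (2, 2), same as idx
sub[0, 0] = -1                    # modifies the copy only
print(sub.shape)                  # (2, 2)
print(a[3])                       # 3: the original array is unchanged
print(np.shares_memory(a, sub))   # False
```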
The image below illustrates various fancy indexing applications
from IPython.display import Image
Image(filename='images/numpy_fancy_indexing.png')
Again, verify the fancy indexing shown in the diagram above.
Use fancy indexing on the left and array creation on the right to assign
values from a smaller array to a larger array.
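Here is one possible sketch for this exercise. The 6 x 6 target array and the chosen positions are illustrative assumptions, not taken from the diagram. Integer index arrays on the left select the target cells, and the right-hand side supplies one value per selected cell. np.ix_ builds an open mesh from two index lists, so a 2 x 2 block can be filled at once:

```python
import numpy as np

large = np.zeros((6, 6), dtype=int)
small = np.array([1, 2, 3])

rows = np.array([0, 2, 4])     # illustrative target positions
cols = np.array([1, 3, 5])
large[rows, cols] = small      # one value per (row, col) pair

# Rows 1 and 3 crossed with columns 0 and 2 form a 2 x 2 block
large[np.ix_([1, 3], [0, 2])] = np.array([[7, 8], [9, 10]])
print(large)
```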