# add the "as np" so we don't have to type "numpy" each time
import numpy as np
At the beginning of the Python lesson we loaded data into NumPy from a text file. Let's look at a few other ways to make arrays. If you've got data in Python lists those can be converted to arrays:
# list like yesterday
odds = [1, 3, 5, 7]
odds
# pass the list to np.array
odds_arr = np.array(odds)
odds_arr
Lists and arrays have some similarities:
print('index', odds[1], odds_arr[1])
print('slice', odds[1:3], odds_arr[1:3])
print('length', len(odds), len(odds_arr))
for x in odds:
    print(x)
for x in odds_arr:
    print(x)
But some things are different:
How would you add 1 to every number in the odds list and make a new list with those (even) numbers? (You don't need to write code; instead, talk about how you'd do this.)
evens = []
for x in odds:
    y = x + 1
    evens.append(y)
evens
With arrays this is simpler:
even_arr = odds_arr + 1
even_arr
Now suppose we want to add these odd and even numbers together. With lists that's another loop, but with arrays:
even_arr + odds_arr
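For comparison, here is a minimal sketch of the list version next to the array version. The names `odds_list`, `evens_list`, `sums`, and `sums_arr` are made up for this example.

```python
import numpy as np

odds_list = [1, 3, 5, 7]
evens_list = [2, 4, 6, 8]

# with lists, + joins the two lists end to end,
# so adding element by element needs a loop
sums = []
for o, e in zip(odds_list, evens_list):
    sums.append(o + e)

# with arrays, + works element by element
sums_arr = np.array(odds_list) + np.array(evens_list)

print(odds_list + evens_list)
print(sums)
print(sums_arr)
```

Note that `+` means different things for the two types: it concatenates lists but adds arrays element by element.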
Suppose we did need to loop over an array to perform a complex calculation and then store the result in a new array. We can't append to arrays because they have a fixed size.
# this raises an AttributeError because arrays have no append method
odds_arr.append(9)
We need to create the new array with the appropriate size and shape. (By the way, you can check the size and shape of an array with the .size and .shape attributes.)
Functions for creating new arrays include np.empty, np.ones, np.zeros, and np.arange. The one thing you have to tell the first three is how big the array needs to be.
np.ones(5)
np.zeros((3, 3))
Why did I put an extra pair of parentheses in the call to np.zeros?
Which shape value is for the number of rows in the array, and which is for the number of columns?
np.zeros((4, 2))
np.arange doesn't take a shape. Instead, it takes start, stop, and step values to make an array of numbers that cover some range:
r = np.arange(0, 20, 2)
r
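Returning to the earlier point about looping over an array and storing results in a new array, here is a minimal sketch of that pattern. The names `values` and `result` are made up for this example, and squaring and adding 1 is only a stand-in for a more complicated calculation.

```python
import numpy as np

values = np.arange(0, 20, 2)

# make the output array first, with the same shape as the input
result = np.zeros(values.shape)

# fill it in one element at a time
for i in range(len(values)):
    # stand-in for a more complicated calculation
    result[i] = values[i] ** 2 + 1

print(values.size, values.shape)
print(result)
```

Because the output array is created up front, nothing ever needs to be appended to it.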
See even more functions for making arrays at http://docs.scipy.org/doc/numpy/reference/routines.array-creation.html.
We're going to load some specially prepared precipitation data derived from the precip_yearly.csv file. It's saved in a cross-platform NumPy binary format. Learn more about this storage format at http://docs.scipy.org/doc/numpy/reference/routines.io.html.
store = np.load('mean_ca_precip.npz')
years = store['years']
precip = store['precip']
print(years)
print(precip)
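To see how an .npz file holds several named arrays, here is a small round trip with np.savez and np.load. This is only a sketch: the values are made up, the file name demo_precip.npz is hypothetical, and it is an assumption that mean_ca_precip.npz was created in a similar way.

```python
import os
import tempfile

import numpy as np

# made-up values, not the real precipitation data
demo_years = np.arange(2000, 2005)
demo_precip = np.array([20.5, 18.2, 25.1, 15.0, 22.3])

# hypothetical file name, written to a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'demo_precip.npz')

# the keyword names become the keys used to look the arrays up later
np.savez(path, years=demo_years, precip=demo_precip)

demo_store = np.load(path)
print(demo_store.files)
print(demo_store['years'])
print(demo_store['precip'])
```

The `.files` attribute lists the names stored in the file, which is handy when you don't know what keys to ask for.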
Arrays have methods attached to them for doing calculations and transformations with data. For example you can get the mean of an array (as seen with Pandas):
precip.mean()
And the NumPy package (np) has functions that work on arrays, e.g. to calculate a logarithm:
np.log(precip)
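Here is a small sketch of methods and functions side by side, using a made-up array called `demo`. It shows that the mean can be computed either way, and that np.log hands back a new array rather than changing the one you gave it.

```python
import numpy as np

demo = np.array([1.0, 10.0, 100.0])

# the mean is available both as an array method and as a NumPy function
print(demo.mean())
print(np.mean(demo))

# np.log returns a new array and leaves the original unchanged
logged = np.log(demo)
print(logged)
print(demo)
```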
Use IPython's tab completion feature to compare available array methods (e.g. precip.mean) with available NumPy functions (e.g. np.log). Do you notice any differences?
There are a massive number of routines in NumPy. For more info see http://docs.scipy.org/doc/numpy/reference/ufuncs.html#available-ufuncs and http://docs.scipy.org/doc/numpy/reference/routines.html.
Let's examine which years had more than average precipitation and which had below average. First we'll need the average precipitation:
avg = precip.mean()
Then we can compare that average to the values in the precip array:
precip > avg
The comparison creates a new array of boolean (True/False) values that is True where the precipitation that year was above average and False everywhere else. We can use boolean arrays to pull values out of other arrays:
above = precip > avg
print(precip[above])
print(years[above])
You can use boolean indexing without first assigning the boolean array to a variable:
years[precip < avg]
But if you do have a boolean array and want its opposite, you can use the ~ operator:
years[~above]
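One detail worth seeing on a small example: the opposite of "above average" is "at or below average", which is not quite the same as "below average". The sketch below uses made-up arrays (`demo_years`, `demo_precip`) in which two years sit exactly at the mean.

```python
import numpy as np

# made-up data with a mean of exactly 15.0
demo_years = np.array([2001, 2002, 2003, 2004])
demo_precip = np.array([10.0, 20.0, 15.0, 15.0])

demo_avg = demo_precip.mean()
demo_above = demo_precip > demo_avg

print(demo_above)
print(demo_years[demo_above])

# ~ flips every True/False, so this includes years exactly at the average
print(demo_years[~demo_above])

# a strict "below average" comparison leaves those years out
print(demo_years[demo_precip < demo_avg])
```

With real measurements a value exactly equal to the mean is unlikely, so the two usually give the same answer, but it's good to know which question you're asking.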
We can also use boolean indexing to make assignments to arrays:
arr = np.random.normal(size=(4, 4))
arr
arr[arr < 0] = 0
arr
Write a function that clips data in an array to given low and high values.
For example, calling clip(arr, 0, 1) should return an array where values lower than zero have been replaced with 0 and values higher than one have been replaced with 1.