name = '2017-10-16-masked-arrays'
title = 'Masked arrays in NumPy'
tags = 'numpy'
author = 'Denis Sergeev'
from nb_tools import connect_notebook_to_post
from IPython.core.display import HTML
html = connect_notebook_to_post(name, title, tags, author)
A masked array combines a standard NumPy array with a mask that flags "bad" (invalid or missing) elements.
Elements flagged as bad are treated as if they did not exist. Operations on the array automatically take the mask into account.
A typical use is a land mask in an ocean dataset (e.g. sea surface temperature exists only where there is ocean).
All operations related to masked arrays live in the numpy.ma submodule.
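To make the land-mask idea concrete, here is a minimal sketch. The grid values and the `land` array are made up for illustration; they do not come from any real dataset.

```python
import numpy as np

# Hypothetical 3x4 grid of sea surface temperature (degrees Celsius).
# Values under land cells are placeholders that should never be used.
sst = np.array([[12.1, 12.4, 13.0, 13.2],
                [11.8, 12.0, 12.7, 12.9],
                [11.5, 11.9, 12.3, 12.6]])

# Hypothetical land mask: True where the grid cell is land
land = np.array([[True,  True,  False, False],
                 [True,  False, False, False],
                 [False, False, False, False]])

# Mask the land cells; ocean cells stay valid
sst_ocean = np.ma.masked_array(sst, mask=land)

# Statistics are computed over the unmasked (ocean) cells only
n_ocean = sst_ocean.count()
ocean_mean = sst_ocean.mean()
print(n_ocean, ocean_mean)
```

The mean here equals the mean of `sst[~land]`, i.e. the land cells do not contribute at all.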
import numpy as np
x = np.array([1, 2, 3, 4, 5])
x
array([1, 2, 3, 4, 5])
The simplest way to create a masked array manually:
mx = np.ma.masked_array(data=x,
                        mask=[True, False, False, True, False],
                        # fill_value=-999
                        )
mx
masked_array(data = [-- 2 3 -- 5], mask = [ True False False True False], fill_value = 999999)
We can check whether an array contains any masked values:
np.ma.is_masked(mx)
True
or we can check whether a particular element is masked:
mx[1] is np.ma.masked
False
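For comparison, the same check on a masked element returns True. The short sketch below uses a separate array, `demo` (a name chosen here for illustration), and also counts valid and masked entries.

```python
import numpy as np

demo = np.ma.masked_array([1, 2, 3, 4, 5],
                          mask=[True, False, False, True, False])

first_is_masked = demo[0] is np.ma.masked    # element 0 is masked
second_is_masked = demo[1] is np.ma.masked   # element 1 is valid
n_valid = demo.count()                       # number of unmasked elements
n_masked = np.ma.count_masked(demo)          # number of masked elements
print(first_is_masked, second_is_masked, n_valid, n_masked)
```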
The original data are not erased; they are still stored in the data attribute:
mx.data
array([1, 2, 3, 4, 5])
The mask can be accessed directly as well:
mx.mask
array([ True, False, False, True, False], dtype=bool)
The masked entries can be filled with a given value to get a regular array back:
mx.filled()
array([999999, 2, 3, 999999, 5])
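The fill value can also be passed explicitly, and the masked entries can be dropped altogether instead of filled. A small sketch, using a separate array `demo` and an arbitrary fill value of -999:

```python
import numpy as np

demo = np.ma.masked_array([1, 2, 3, 4, 5],
                          mask=[True, False, False, True, False])

# Replace masked entries with an explicit value; the result is a plain ndarray
filled_custom = demo.filled(-999)

# Keep only the unmasked values, as a 1-D plain ndarray
valid_only = demo.compressed()

print(filled_custom)
print(valid_only)
```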
The mask can also be cleared:
mx.mask = np.ma.nomask
mx.mask
array([False, False, False, False, False], dtype=bool)
Some functions handle masked values automatically, e.g. the log function.
np.log(mx)
masked_array(data = [0.0 0.6931471805599453 1.0986122886681098 1.3862943611198906 1.6094379124341003], mask = [False False False False False], fill_value = 999999)
np.ma.log(mx)
masked_array(data = [0.0 0.6931471805599453 1.0986122886681098 1.3862943611198906 1.6094379124341003], mask = [False False False False False], fill_value = 999999)
Note that the result is the same.
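The masked version of the logarithm does a bit more than respect an existing mask: it also masks inputs outside the function's domain. The sketch below uses made-up input values to show both behaviours.

```python
import numpy as np

values = np.array([-1.0, 0.0, 1.0, np.e])

# np.ma.log masks entries where the logarithm is undefined (values <= 0)
safe_log = np.ma.log(values)
print(safe_log)

# An existing mask is carried through to the result
partly_masked = np.ma.masked_array([1.0, np.e, 10.0],
                                   mask=[False, False, True])
log_masked = np.ma.log(partly_masked)
print(log_masked)
```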
Other functions don't see the mask, so the corresponding function from the ma submodule should be used instead (if it exists):
np.dot(mx, mx)
55
np.ma.dot(mx, mx)
masked_array(data = 55, mask = False, fill_value = 999999)
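The two results above coincide only because the mask of mx was cleared earlier. With a mask in place, the difference becomes visible. This is a sketch on a separate array `demo`; it relies on np.ma.dot treating masked entries as 0 by default (`strict=False`).

```python
import numpy as np

demo = np.ma.masked_array([1, 2, 3, 4, 5],
                          mask=[True, False, False, True, False])

# np.ma.dot treats masked entries as 0 by default (strict=False),
# so only the valid values 2, 3 and 5 contribute: 4 + 9 + 25
masked_dot = np.ma.dot(demo, demo)

# The same product computed by hand from the unmasked values
valid = demo.compressed()
manual_dot = int(np.dot(valid, valid))

# np.dot is not mask-aware, so its result should not be trusted for masked input
plain_dot = np.dot(demo, demo)

print(masked_dot, manual_dot, plain_dot)
```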
A common task is to mask an array according to some criterion.
a = np.linspace(1, 15, 15)
a
array([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15.])
masked_a = np.ma.masked_greater_equal(a, 11)
masked_a
masked_array(data = [1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 -- -- -- -- --], mask = [False False False False False False False False False False True True True True True], fill_value = 1e+20)
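Other helpers in numpy.ma follow the same pattern. The sketch below shows three of them on made-up inputs: masking by an arbitrary Boolean condition, masking outside an interval, and masking NaN or infinite values.

```python
import numpy as np

vals = np.linspace(1, 15, 15)

# Mask wherever an arbitrary Boolean condition holds (here: even values)
masked_even = np.ma.masked_where(vals % 2 == 0, vals)

# Mask everything outside the closed interval [5, 10]
masked_out = np.ma.masked_outside(vals, 5, 10)

# Mask NaN and infinite entries
noisy = np.array([1.0, np.nan, 3.0, np.inf])
masked_bad = np.ma.masked_invalid(noisy)

print(masked_even)
print(masked_out)
print(masked_bad)
```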
Other simple examples can be found in the NumPy Docs: https://docs.scipy.org/doc/numpy-1.13.0/reference/maskedarray.generic.html#examples
HTML(html)
This post was written as an IPython (Jupyter) notebook. You can view or download it using nbviewer.