from IPython.display import Image, HTML
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
HTML('<iframe src=http://www.numpy.org/ width=900 height=350></iframe>')
NumPy is the fundamental building block of the so-called Python Scientific Stack. It provides the ndarray: an N-dimensional array of numerical values (integers, floats, complex numbers).
NumPy arrays are homogeneous by default: they can only hold objects of the same type (e.g. you cannot mix integers and floats).
NumPy also provides random number generation, some convenient simple statistical functions (mean, std, ...) and some essential linear algebra operations.
Constructing a NumPy array
x = [1,2,3] # x is a regular python list
x = np.array(x)
x
array([1, 2, 3])
a = np.array(range(10))
a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
x = [[1,2,3],[4,5,6]] # nested lists
x = np.array(x)
x
array([[1, 2, 3], [4, 5, 6]])
x.shape
(2, 3)
The shape attribute is a tuple containing the dimensions along each axis (0 = rows, 1 = columns, ...)
x = np.arange(1000).reshape((10,10,10))
x.shape
(10, 10, 10)
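As an aside (a small sketch, not part of the original cells): reshape can infer one dimension for you when you pass -1, which is handy when only the total size is known.

```python
import numpy as np

x = np.arange(1000)
# pass -1 for one dimension and reshape infers it from the total size
y = x.reshape((10, -1))      # inferred as (10, 100)
z = x.reshape((-1, 10, 10))  # inferred as (10, 10, 10)
print(y.shape, z.shape)      # (10, 100) (10, 10, 10)
```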
**Be careful: keep in mind references vs. copies of arrays!**
a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
x = a
x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
a[2] = 99
a
array([ 0, 1, 99, 3, 4, 5, 6, 7, 8, 9])
x
array([ 0, 1, 99, 3, 4, 5, 6, 7, 8, 9])
x has changed, because x = a only creates a new reference pointing to the same object as the array a
If you want to create a copy instead of a reference, you need to do it explicitly
x = a.copy()
a[3] = 888
a
array([ 0, 1, 99, 888, 4, 5, 6, 7, 8, 9])
x
array([ 0, 1, 99, 3, 4, 5, 6, 7, 8, 9])
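Note that slicing has the same pitfall: a slice of an array is a view on the same data, not a copy. A quick sketch:

```python
import numpy as np

a = np.arange(10)
v = a[2:5]          # slicing returns a *view* on the same data
v[0] = -1
print(a[2])         # -1: modifying the view modified the original
c = a[2:5].copy()   # an explicit copy is independent
c[0] = 123
print(a[2])         # still -1: the copy left the original alone
```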
NumPy tries to guess the correct type of the numerical values when you pass a list; here x is of type int64
x.dtype
dtype('int64')
You can change that using the method astype()
x = x.astype(float) # the built-in float maps to float64; np.float is deprecated in recent NumPy
x.dtype
dtype('float64')
x
array([ 0., 1., 99., 3., 4., 5., 6., 7., 8., 9.])
You cannot mix different types within an array: NumPy will cast all values to the safest common dtype (here, float64)
x = np.array([1.0,1,1])
x
array([ 1., 1., 1.])
np.arange(start, stop, step, dtype=None)
np.arange(100)
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
np.arange(0,100,10)
array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])
np.linspace(start, stop, num=50, endpoint=True, retstep=False)
np.linspace(0,0.5,10, endpoint=False)
array([ 0. , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 , 0.45])
the same as np.arange(0, 0.5, 0.05)
np.arange(0,0.5,0.05)
array([ 0. , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 , 0.45])
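One caveat worth sketching: with a float step, np.arange is subject to floating-point rounding, so the number of points it returns can be hard to predict in general; np.linspace, which takes a count rather than a step, is the safer choice for float ranges.

```python
import numpy as np

# with a float step, rounding can make the length of arange hard
# to predict; linspace takes a count instead of a step
a = np.arange(0, 0.5, 0.05)                  # happens to give 10 points here
b = np.linspace(0, 0.5, 10, endpoint=False)  # always exactly 10 points
print(len(a), len(b))  # 10 10
print(np.allclose(a, b))  # True
```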
np.ones(shape, dtype=None, order='C')
creates an array filled with ones; the similar function np.zeros creates an array of zeros
np.ones((10,12))
array([[ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], [ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], [ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], [ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], [ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], [ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], [ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], [ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], [ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], [ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])
np.empty
can be used to pre-allocate an array without initializing its entries (the contents are whatever happened to be in memory, as the garbage values below show), which is marginally faster than np.zeros
np.empty((5,5))
array([[ -2.00000000e+000, -2.68156175e+154, 2.12325858e-314, 2.18209237e-314, 2.12424592e-314], [ 2.18209239e-314, 2.12327588e-314, 2.18144762e-314, 2.12424592e-314, 2.18209242e-314], [ 2.12327588e-314, 2.18121852e-314, 2.12424592e-314, 2.18209244e-314, 2.12327588e-314], [ 2.18145174e-314, 2.12343765e-314, 2.18209247e-314, 2.12327588e-314, 2.18209249e-314], [ 2.12343765e-314, 2.12358380e-314, 2.12358380e-314, 2.12358380e-314, 2.12358380e-314]])
b = np.arange(25,dtype=float).reshape(5,5) # creating a 2D array, note the dtype
print(b)
[[ 0. 1. 2. 3. 4.] [ 5. 6. 7. 8. 9.] [ 10. 11. 12. 13. 14.] [ 15. 16. 17. 18. 19.] [ 20. 21. 22. 23. 24.]]
b.shape
(5, 5)
b[0,1] # FIRST row, SECOND column, remember, Python indexing is zero-based
1.0
b[:,0] # first column
array([ 0., 5., 10., 15., 20.])
b[1,:] # second row
array([ 5., 6., 7., 8., 9.])
b[:,2:] # all rows, from the third column onwards
array([[ 2., 3., 4.], [ 7., 8., 9.], [ 12., 13., 14.], [ 17., 18., 19.], [ 22., 23., 24.]])
b[-1,:] # last row
array([ 20., 21., 22., 23., 24.])
b[:,::-1] # handy: reverses column order
array([[ 4., 3., 2., 1., 0.], [ 9., 8., 7., 6., 5.], [ 14., 13., 12., 11., 10.], [ 19., 18., 17., 16., 15.], [ 24., 23., 22., 21., 20.]])
b
array([[ 0., 1., 2., 3., 4.], [ 5., 6., 7., 8., 9.], [ 10., 11., 12., 13., 14.], [ 15., 16., 17., 18., 19.], [ 20., 21., 22., 23., 24.]])
Slices allow you to easily select contiguous rows / columns
b[0:2,1:3] # first two rows, and columns 2 to 3
array([[ 1., 2.], [ 6., 7.]])
When you want to select non-contiguous (or repeated) rows / columns, you can use the ix_ construct
b[np.ix_([0,1,3],[1,1,3])]
array([[ 1., 1., 3.], [ 6., 6., 8.], [ 16., 16., 18.]])
b
array([[ 0., 1., 2., 3., 4.], [ 5., 6., 7., 8., 9.], [ 10., 11., 12., 13., 14.], [ 15., 16., 17., 18., 19.], [ 20., 21., 22., 23., 24.]])
b = b[b>10]
print(b)
[ 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24.]
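Related to boolean indexing, np.where is often useful, both for finding the indices where a condition holds and as a vectorized if/else. A short sketch:

```python
import numpy as np

b = np.arange(25, dtype=float).reshape(5, 5)
# np.where with one argument returns the indices where the condition holds
rows, cols = np.where(b > 20)
print(rows, cols)  # the four elements greater than 20 live in the last row
# with three arguments it acts as a vectorized if/else
clipped = np.where(b > 10, 10.0, b)
print(clipped.max())  # 10.0
```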
x = np.array([0,1,2,3]); print(x)
[0 1 2 3]
x + 3
array([3, 4, 5, 6])
x = np.arange(12).reshape((3,4))
x
array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]])
y = np.arange(3).reshape((3,1)); # or y = np.arange(3)[np.newaxis,:] or y = np.arange(3)[None,:]
y.shape
(3, 1)
x + y
array([[ 0, 1, 2, 3], [ 5, 6, 7, 8], [10, 11, 12, 13]])
x = np.arange(15).reshape((3, 5))
x
array([[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], [10, 11, 12, 13, 14]])
y = np.ones(8)
z = x[...,np.newaxis] + y
print(z)
[[[ 1. 1. 1. 1. 1. 1. 1. 1.] [ 2. 2. 2. 2. 2. 2. 2. 2.] [ 3. 3. 3. 3. 3. 3. 3. 3.] [ 4. 4. 4. 4. 4. 4. 4. 4.] [ 5. 5. 5. 5. 5. 5. 5. 5.]] [[ 6. 6. 6. 6. 6. 6. 6. 6.] [ 7. 7. 7. 7. 7. 7. 7. 7.] [ 8. 8. 8. 8. 8. 8. 8. 8.] [ 9. 9. 9. 9. 9. 9. 9. 9.] [ 10. 10. 10. 10. 10. 10. 10. 10.]] [[ 11. 11. 11. 11. 11. 11. 11. 11.] [ 12. 12. 12. 12. 12. 12. 12. 12.] [ 13. 13. 13. 13. 13. 13. 13. 13.] [ 14. 14. 14. 14. 14. 14. 14. 14.] [ 15. 15. 15. 15. 15. 15. 15. 15.]]]
z.shape
(3, 5, 8)
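The broadcasting rules behind these examples can be summarized: shapes are compared from the trailing axis backwards, and two dimensions are compatible when they are equal or when one of them is 1. A small sketch:

```python
import numpy as np

# Shapes are aligned on the trailing axis; two dimensions are
# compatible when they are equal or when one of them is 1.
x = np.arange(15).reshape((3, 5))    # shape (3, 5)
col = np.arange(3).reshape((3, 1))   # (3, 1): broadcast along columns
row = np.arange(5)                   # (5,):  broadcast along rows
print((x + col).shape)  # (3, 5)
print((x + row).shape)  # (3, 5)
try:
    x + np.ones(4)      # (3, 5) vs (4,): incompatible
except ValueError:
    print('shapes (3, 5) and (4,) cannot be broadcast together')
```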
We have already seen some array methods above; in this section I delve into their general behavior and present some convenient array methods.
In Python everything is an object, and NumPy ndarrays are no exception: they expose a number of methods which are equivalent to calling the respective NumPy function, but can be more convenient to use and, marginally, faster.
a.shape # shape of a (like *size* in matlab)
(10,)
np.shape(a)
(10,)
%timeit a.mean() # The %timeit magic cell function times the execution of an expression
100000 loops, best of 3: 16.9 µs per loop
%timeit np.mean(a)
100000 loops, best of 3: 18.1 µs per loop
b = np.arange(100,dtype=float).reshape(10,10) # creating a 2D array, note the dtype
b.shape
(10, 10)
b
array([[ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.], [ 10., 11., 12., 13., 14., 15., 16., 17., 18., 19.], [ 20., 21., 22., 23., 24., 25., 26., 27., 28., 29.], [ 30., 31., 32., 33., 34., 35., 36., 37., 38., 39.], [ 40., 41., 42., 43., 44., 45., 46., 47., 48., 49.], [ 50., 51., 52., 53., 54., 55., 56., 57., 58., 59.], [ 60., 61., 62., 63., 64., 65., 66., 67., 68., 69.], [ 70., 71., 72., 73., 74., 75., 76., 77., 78., 79.], [ 80., 81., 82., 83., 84., 85., 86., 87., 88., 89.], [ 90., 91., 92., 93., 94., 95., 96., 97., 98., 99.]])
b.sum(0) ### summing over the first axis (down the rows, giving one total per column)
array([ 450., 460., 470., 480., 490., 500., 510., 520., 530., 540.])
np.sum(b, axis=0)
array([ 450., 460., 470., 480., 490., 500., 510., 520., 530., 540.])
b[:,0].sum()
450.0
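A related convenience worth sketching is the keepdims argument to the aggregation methods, which keeps the reduced axis so the result broadcasts cleanly against the original array:

```python
import numpy as np

b = np.arange(100, dtype=float).reshape(10, 10)
# keepdims=True keeps the reduced axis with length 1, so the
# result broadcasts cleanly against the original array
row_means = b.mean(axis=1, keepdims=True)  # shape (10, 1)
anom = b - row_means                       # centre each row on 0
print(anom.shape)                          # (10, 10)
print(anom.mean(axis=1))                   # each row now averages to 0
```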
The swapaxes method swaps the positions of two axes
b = b.swapaxes(1,0)
b
array([[ 0., 10., 20., 30., 40., 50., 60., 70., 80., 90.], [ 1., 11., 21., 31., 41., 51., 61., 71., 81., 91.], [ 2., 12., 22., 32., 42., 52., 62., 72., 82., 92.], [ 3., 13., 23., 33., 43., 53., 63., 73., 83., 93.], [ 4., 14., 24., 34., 44., 54., 64., 74., 84., 94.], [ 5., 15., 25., 35., 45., 55., 65., 75., 85., 95.], [ 6., 16., 26., 36., 46., 56., 66., 76., 86., 96.], [ 7., 17., 27., 37., 47., 57., 67., 77., 87., 97.], [ 8., 18., 28., 38., 48., 58., 68., 78., 88., 98.], [ 9., 19., 29., 39., 49., 59., 69., 79., 89., 99.]])
here equivalent to transpose() on 2D array
b = b.transpose()
b
array([[ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.], [ 10., 11., 12., 13., 14., 15., 16., 17., 18., 19.], [ 20., 21., 22., 23., 24., 25., 26., 27., 28., 29.], [ 30., 31., 32., 33., 34., 35., 36., 37., 38., 39.], [ 40., 41., 42., 43., 44., 45., 46., 47., 48., 49.], [ 50., 51., 52., 53., 54., 55., 56., 57., 58., 59.], [ 60., 61., 62., 63., 64., 65., 66., 67., 68., 69.], [ 70., 71., 72., 73., 74., 75., 76., 77., 78., 79.], [ 80., 81., 82., 83., 84., 85., 86., 87., 88., 89.], [ 90., 91., 92., 93., 94., 95., 96., 97., 98., 99.]])
b = np.transpose(b)
b
array([[ 0., 10., 20., 30., 40., 50., 60., 70., 80., 90.], [ 1., 11., 21., 31., 41., 51., 61., 71., 81., 91.], [ 2., 12., 22., 32., 42., 52., 62., 72., 82., 92.], [ 3., 13., 23., 33., 43., 53., 63., 73., 83., 93.], [ 4., 14., 24., 34., 44., 54., 64., 74., 84., 94.], [ 5., 15., 25., 35., 45., 55., 65., 75., 85., 95.], [ 6., 16., 26., 36., 46., 56., 66., 76., 86., 96.], [ 7., 17., 27., 37., 47., 57., 67., 77., 87., 97.], [ 8., 18., 28., 38., 48., 58., 68., 78., 88., 98.], [ 9., 19., 29., 39., 49., 59., 69., 79., 89., 99.]])
You can handle missing values in NumPy in two different ways: with NaN values, or with masked arrays (the numpy.ma module)
Using the masked array approach, while somewhat more cumbersome, allows more flexibility
b[5,5] = -999.
print(b)
[[ 0. 10. 20. 30. 40. 50. 60. 70. 80. 90.] [ 1. 11. 21. 31. 41. 51. 61. 71. 81. 91.] [ 2. 12. 22. 32. 42. 52. 62. 72. 82. 92.] [ 3. 13. 23. 33. 43. 53. 63. 73. 83. 93.] [ 4. 14. 24. 34. 44. 54. 64. 74. 84. 94.] [ 5. 15. 25. 35. 45. -999. 65. 75. 85. 95.] [ 6. 16. 26. 36. 46. 56. 66. 76. 86. 96.] [ 7. 17. 27. 37. 47. 57. 67. 77. 87. 97.] [ 8. 18. 28. 38. 48. 58. 68. 78. 88. 98.] [ 9. 19. 29. 39. 49. 59. 69. 79. 89. 99.]]
c = b.copy()
c[c == -999.] = np.nan
print(c)
[[ 0. 10. 20. 30. 40. 50. 60. 70. 80. 90.] [ 1. 11. 21. 31. 41. 51. 61. 71. 81. 91.] [ 2. 12. 22. 32. 42. 52. 62. 72. 82. 92.] [ 3. 13. 23. 33. 43. 53. 63. 73. 83. 93.] [ 4. 14. 24. 34. 44. 54. 64. 74. 84. 94.] [ 5. 15. 25. 35. 45. nan 65. 75. 85. 95.] [ 6. 16. 26. 36. 46. 56. 66. 76. 86. 96.] [ 7. 17. 27. 37. 47. 57. 67. 77. 87. 97.] [ 8. 18. 28. 38. 48. 58. 68. 78. 88. 98.] [ 9. 19. 29. 39. 49. 59. 69. 79. 89. 99.]]
nan values propagate automatically when using aggregation functions
c.mean(1)
array([ 45., 46., 47., 48., 49., nan, 51., 52., 53., 54.])
Unless you use one of the (few) NaN-aware functions
np.nansum(c,axis=1)
array([ 450., 460., 470., 480., 490., 445., 510., 520., 530., 540.])
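A minimal sketch of the NaN-aware functions (np.nanmean, np.nanmax, ...) alongside np.isnan, which gives a boolean mask of the missing entries:

```python
import numpy as np

c = np.arange(12, dtype=float).reshape(3, 4)
c[1, 2] = np.nan
print(c.mean())           # nan: plain aggregations propagate NaN
print(np.nanmean(c))      # the NaN-aware counterpart ignores the hole
print(np.nanmin(c), np.nanmax(c))
print(np.isnan(c).sum())  # np.isnan gives a boolean mask; here 1 NaN
```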
Now we're going to see how to use the ma (Masked Arrays) module
from numpy import ma
b = ma.masked_values(b, -999.)
print(b)
[[0.0 10.0 20.0 30.0 40.0 50.0 60.0 70.0 80.0 90.0] [1.0 11.0 21.0 31.0 41.0 51.0 61.0 71.0 81.0 91.0] [2.0 12.0 22.0 32.0 42.0 52.0 62.0 72.0 82.0 92.0] [3.0 13.0 23.0 33.0 43.0 53.0 63.0 73.0 83.0 93.0] [4.0 14.0 24.0 34.0 44.0 54.0 64.0 74.0 84.0 94.0] [5.0 15.0 25.0 35.0 45.0 -- 65.0 75.0 85.0 95.0] [6.0 16.0 26.0 36.0 46.0 56.0 66.0 76.0 86.0 96.0] [7.0 17.0 27.0 37.0 47.0 57.0 67.0 77.0 87.0 97.0] [8.0 18.0 28.0 38.0 48.0 58.0 68.0 78.0 88.0 98.0] [9.0 19.0 29.0 39.0 49.0 59.0 69.0 79.0 89.0 99.0]]
b now has two new attributes: data and mask
print(b.data)
[[ 0. 10. 20. 30. 40. 50. 60. 70. 80. 90.] [ 1. 11. 21. 31. 41. 51. 61. 71. 81. 91.] [ 2. 12. 22. 32. 42. 52. 62. 72. 82. 92.] [ 3. 13. 23. 33. 43. 53. 63. 73. 83. 93.] [ 4. 14. 24. 34. 44. 54. 64. 74. 84. 94.] [ 5. 15. 25. 35. 45. -999. 65. 75. 85. 95.] [ 6. 16. 26. 36. 46. 56. 66. 76. 86. 96.] [ 7. 17. 27. 37. 47. 57. 67. 77. 87. 97.] [ 8. 18. 28. 38. 48. 58. 68. 78. 88. 98.] [ 9. 19. 29. 39. 49. 59. 69. 79. 89. 99.]]
print(b.mask)
[[False False False False False False False False False False] [False False False False False False False False False False] [False False False False False False False False False False] [False False False False False False False False False False] [False False False False False False False False False False] [False False False False False True False False False False] [False False False False False False False False False False] [False False False False False False False False False False] [False False False False False False False False False False] [False False False False False False False False False False]]
b.mean(1)
masked_array(data = [45.0 46.0 47.0 48.0 49.0 49.44444444444444 51.0 52.0 53.0 54.0], mask = [False False False False False False False False False False], fill_value = 1e+20)
c = b.mean(1)
c.shape
(10,)
Masked values are just ignored in calculations
Other masked array functions allow you to mask values according to various conditions; virtually all the functions exposed for regular NumPy arrays are also available for masked arrays.
b = ma.masked_greater(b, 80)
b
masked_array(data = [[0.0 10.0 20.0 30.0 40.0 50.0 60.0 70.0 80.0 --] [1.0 11.0 21.0 31.0 41.0 51.0 61.0 71.0 -- --] [2.0 12.0 22.0 32.0 42.0 52.0 62.0 72.0 -- --] [3.0 13.0 23.0 33.0 43.0 53.0 63.0 73.0 -- --] [4.0 14.0 24.0 34.0 44.0 54.0 64.0 74.0 -- --] [5.0 15.0 25.0 35.0 45.0 -- 65.0 75.0 -- --] [6.0 16.0 26.0 36.0 46.0 56.0 66.0 76.0 -- --] [7.0 17.0 27.0 37.0 47.0 57.0 67.0 77.0 -- --] [8.0 18.0 28.0 38.0 48.0 58.0 68.0 78.0 -- --] [9.0 19.0 29.0 39.0 49.0 59.0 69.0 79.0 -- --]], mask = [[False False False False False False False False False True] [False False False False False False False False True True] [False False False False False False False False True True] [False False False False False False False False True True] [False False False False False False False False True True] [False False False False False True False False True True] [False False False False False False False False True True] [False False False False False False False False True True] [False False False False False False False False True True] [False False False False False False False False True True]], fill_value = 1e+20)
b.mean(0)
masked_array(data = [4.5 14.5 24.5 34.5 44.5 54.44444444444444 64.5 74.5 80.0 --], mask = [False False False False False False False False False True], fill_value = 1e+20)
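To go back from a masked array to a plain ndarray, the filled() and compressed() methods are handy; a quick sketch:

```python
import numpy as np
from numpy import ma

b = ma.masked_values(np.array([1.0, -999.0, 3.0]), -999.0)
# filled() returns a plain ndarray, replacing masked entries
# with the fill value of your choice
print(b.filled(np.nan))  # masked entry becomes nan
print(b.filled(0.0))     # or zero
# compressed() drops the masked entries altogether
print(b.compressed())    # only the unmasked values remain
```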
You can generate random numbers coming from a wide range of distributions
If you want more distributions, and more flexibility, you need to look into scipy.stats.distributions (see the statistical modelling notebook)
Example 1: Normal distribution
$$P(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$$

x = np.random.normal(0,1,1000)
x = np.random.standard_normal(1000)
_ = plt.hist(x, color='0.6', density=True) # 'normed' was renamed 'density' in recent matplotlib
Example 2: Poisson distribution
\begin{equation*} P\left( x \right) = \frac{e^{-\lambda} \lambda^x}{x!} \end{equation*}

poiss = np.random.poisson(lam=4, size=(10000))
#a = np.array([len(np.where(poiss==x)[0]) for x in np.unique(poiss)])
a = np.array([sum(poiss==x) for x in np.unique(poiss)])
f, ax = plt.subplots()
ax.plot(np.unique(poiss), a/10000., '-', color='grey')
ax.plot(np.unique(poiss), a/10000., 'o', color='purple', markersize=10)
ax.set_title("Poisson Empirical PMF, $\lambda = 4$", fontsize=20)
ax.set_xlim(-1,17)
ax.set_ylim(0,0.25)
[l.set_fontsize(16) for l in ax.xaxis.get_ticklabels()]
[l.set_fontsize(16) for l in ax.yaxis.get_ticklabels()];
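The counting done with the list comprehension above can also be done in a single vectorized call with np.bincount; a sketch (using a fixed seed, so the numbers differ from the figure above):

```python
import numpy as np

np.random.seed(0)  # fixed seed: numbers differ from the figure above
poiss = np.random.poisson(lam=4, size=10000)
# np.bincount counts occurrences of each non-negative integer value,
# which gives the empirical PMF in one vectorized call
counts = np.bincount(poiss)
pmf = counts / float(poiss.size)
print(pmf.sum())     # sums to 1 (up to rounding)
print(pmf.argmax())  # mode of the sample, close to lambda
```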
import math # the math module exposes mathematical functions operating on scalars
lam = 4; end = 20 # lambda is a reserved word in python
P = [math.exp(-lam)*lam**x/math.factorial(x) for x in range(end)] # note the use of a list comprehension (xrange in Python 2)
f, ax = plt.subplots()
ax.plot(P, '-', color='grey')
ax.plot(P, 'o', color='purple', markersize=10)
ax.set_title("Poisson PMF, $\lambda = 4$", fontsize=20)
ax.set_xlim(-1,17)
ax.set_ylim(0,0.25)
[l.set_fontsize(16) for l in ax.xaxis.get_ticklabels()]
[l.set_fontsize(16) for l in ax.yaxis.get_ticklabels()];
From Wikipedia (http://en.wikipedia.org/wiki/Poisson_distribution)
SciPy is a library (though it can be thought of more as a scientific Python distribution) built on top of NumPy, providing a large set of standard scientific computing algorithms, in particular:
We'll only have a look at some of these sub-packages, those most directly relevant to analyzing data
from scipy import stats
dir(stats)
['Tester', '__all__', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '__path__', '_binned_statistic', '_rank', '_support', '_tukeylambda_stats', 'absolute_import', 'alpha', 'anderson', 'anglit', 'ansari', 'arcsine', 'bartlett', 'bayes_mvs', 'bernoulli', 'beta', 'betai', 'betaprime', 'binned_statistic', 'binned_statistic_2d', 'binned_statistic_dd', 'binom', 'binom_test', 'boltzmann', 'boxcox', 'boxcox_llf', 'boxcox_normmax', 'boxcox_normplot', 'bradford', 'burr', 'callable', 'cauchy', 'chi', 'chi2', 'chi2_contingency', 'chisqprob', 'chisquare', 'circmean', 'circstd', 'circvar', 'cmedian', 'contingency', 'cosine', 'cumfreq', 'describe', 'dgamma', 'distributions', 'division', 'dlaplace', 'dweibull', 'entropy', 'erlang', 'expon', 'exponpow', 'exponweib', 'f', 'f_oneway', 'f_value', 'f_value_multivariate', 'f_value_wilks_lambda', 'fastsort', 'fatiguelife', 'find_repeats', 'fisher_exact', 'fisk', 'fligner', 'foldcauchy', 'foldnorm', 'fprob', 'frechet_l', 'frechet_r', 'friedmanchisquare', 'futil', 'gamma', 'gausshyper', 'gaussian_kde', 'genexpon', 'genextreme', 'gengamma', 'genhalflogistic', 'genlogistic', 'genpareto', 'geom', 'gilbrat', 'glm', 'gmean', 'gompertz', 'gumbel_l', 'gumbel_r', 'halfcauchy', 'halflogistic', 'halfnorm', 'histogram', 'histogram2', 'hmean', 'hypergeom', 'hypsecant', 'invgamma', 'invgauss', 'invweibull', 'itemfreq', 'jarque_bera', 'johnsonsb', 'johnsonsu', 'kde', 'kendalltau', 'kruskal', 'ks_2samp', 'ksone', 'ksprob', 'kstat', 'kstatvar', 'kstest', 'kstwobign', 'kurtosis', 'kurtosistest', 'laplace', 'levene', 'levy', 'levy_l', 'levy_stable', 'linregress', 'loggamma', 'logistic', 'loglaplace', 'lognorm', 'logser', 'lomax', 'mannwhitneyu', 'maxwell', 'mielke', 'mode', 'moment', 'mood', 'morestats', 'mstats', 'mstats_basic', 'mstats_extras', 'mvn', 'mvsdist', 'nakagami', 'nanmean', 'nanmedian', 'nanstd', 'nbinom', 'ncf', 'nct', 'ncx2', 'norm', 'normaltest', 'np', 'obrientransform', 'oneway', 'pareto', 'pdf_fromgamma', 'pearson3', 
'pearsonr', 'percentileofscore', 'planck', 'pointbiserialr', 'poisson', 'power_divergence', 'powerlaw', 'powerlognorm', 'powernorm', 'ppcc_max', 'ppcc_plot', 'print_function', 'probplot', 'randint', 'randwcdf', 'randwppf', 'rankdata', 'ranksums', 'rayleigh', 'rdist', 'recipinvgauss', 'reciprocal', 'relfreq', 'rice', 'rv', 'rv_continuous', 'rv_discrete', 's', 'scoreatpercentile', 'sem', 'semicircular', 'shapiro', 'sigmaclip', 'signaltonoise', 'skellam', 'skew', 'skewtest', 'spearmanr', 'square_of_sums', 'ss', 'statlib', 'stats', 't', 'test', 'threshold', 'tiecorrect', 'tmax', 'tmean', 'tmin', 'triang', 'trim1', 'trim_mean', 'trimboth', 'truncexpon', 'truncnorm', 'tsem', 'tstd', 'ttest_1samp', 'ttest_ind', 'ttest_rel', 'tukeylambda', 'tvar', 'uniform', 'variation', 'vonmises', 'vonmises_cython', 'wald', 'weibull_max', 'weibull_min', 'wilcoxon', 'wrapcauchy', 'zipf', 'zmap', 'zprob', 'zscore']
One commonly used parametric test is the Student t-test, which determines whether two independent samples have statistically different means, under the assumption that the two samples are normally distributed. The null hypothesis is that the two samples have identical means.
stats.ttest_ind?
X = stats.distributions.norm(loc=5, scale=10)
Y = stats.distributions.norm(loc=3, scale=10)
Xrvs = X.rvs(size=(1000))
Yrvs = Y.rvs(size=(1000))
f, ax = plt.subplots(figsize=(6,5))
ax.hist(Xrvs,histtype='stepfilled', bins=20, color='coral', alpha=.6, label=r'$\mu = 5$')
ax.hist(Yrvs,histtype='stepfilled', bins=20, color='steelblue', alpha=.6, label=r'$\mu = 3$')
ax.grid(color='b',alpha=.6)
ax.legend();
t, p = stats.ttest_ind(Xrvs, Yrvs)
print("""\n===========================
T statistics = %s ~= %4.2f
P-value = %s ~= %8.6f
===========================
""" % (t, t, p, p))
=========================== T statistics = 4.5718835021 ~= 4.57 P-value = 5.12942295409e-06 ~= 0.000005 ===========================
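To make explicit what ttest_ind computes, here is the pooled two-sample t statistic written out by hand with NumPy (a sketch with its own simulated samples, so the numbers differ from the output above):

```python
import numpy as np

np.random.seed(42)
x = np.random.normal(5, 10, 1000)  # sample with true mean 5
y = np.random.normal(3, 10, 1000)  # sample with true mean 3
nx, ny = len(x), len(y)
# pooled variance of the two samples (equal-variance assumption)
sp2 = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
# the two-sample t statistic that stats.ttest_ind returns
t = (x.mean() - y.mean()) / np.sqrt(sp2 * (1.0 / nx + 1.0 / ny))
print(round(t, 2))
```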
from scipy.stats import pearsonr
We're going to apply that to time series of NINO3.4 and the SOI
The NINO indices come from http://www.cpc.ncep.noaa.gov/data/indices/ersst3b.nino.mth.81-10.ascii
The SOI is in the data directory
import os
from datetime import datetime
from dateutil import parser
import pandas as pd
nino_url = "http://www.cpc.ncep.noaa.gov/data/indices/ersst3b.nino.mth.81-10.ascii"
# if not working try:
#nino = pd.read_table('./data/ersst3b.nino.mth.81-10.ascii', sep='\s+')
nino = pd.read_table(nino_url, sep='\s+') # '\s+' matches runs of whitespace; '\s*' can also match empty strings
dates = [datetime(x[0], x[1], 1) for x in zip(nino['YR'].values, nino['MON'].values)]
nino.index = dates
nino.head()
YR | MON | NINO1+2 | ANOM | NINO3 | ANOM.1 | NINO4 | ANOM.2 | NINO3.4 | ANOM.3 | |
---|---|---|---|---|---|---|---|---|---|---|
1950-01-01 | 1950 | 1 | 23.11 | -1.57 | 23.74 | -2.03 | 27.03 | -1.29 | 24.83 | -1.84 |
1950-02-01 | 1950 | 2 | 24.20 | -1.91 | 24.92 | -1.57 | 27.15 | -1.02 | 25.20 | -1.63 |
1950-03-01 | 1950 | 3 | 25.37 | -1.13 | 26.33 | -0.91 | 27.06 | -1.23 | 26.03 | -1.30 |
1950-04-01 | 1950 | 4 | 23.86 | -1.75 | 26.46 | -1.10 | 27.29 | -1.26 | 26.36 | -1.43 |
1950-05-01 | 1950 | 5 | 23.03 | -1.39 | 25.72 | -1.52 | 27.59 | -1.25 | 26.19 | -1.72 |
5 rows × 10 columns
soipath = './data'
def get_SOI(soipath, start_date='1950-01-01'):
soi = pd.read_csv(os.path.join(soipath,'NIWA_SOI.csv'), index_col=0)
soi = soi.stack()
soi = soi.dropna()
dates = [parser.parse("%s-%s-%s" % (str(int(x[0])), x[1], "1")) for x in soi.index]
soidf = pd.DataFrame(soi.values, index=dates, columns=['SOI'])
soidf = soidf.truncate(before=start_date)
return soidf
soi = get_SOI(soipath)
soi.head()
SOI | |
---|---|
1950-01-01 | 0.570492 |
1950-02-01 | 1.642897 |
1950-03-01 | 1.745849 |
1950-04-01 | 1.895316 |
1950-05-01 | 0.748673 |
5 rows × 1 columns
r, p = pearsonr(soi['SOI'], nino['ANOM.3'])
print("R = {0:<4.2f}, p-value = {1:<4.2f}".format(r,p))
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) <ipython-input-234-113fb9f7cc5b> in <module>() ----> 1 r, p = pearsonr(soi['SOI'], nino['ANOM.3']) 2 3 print("R = {0:<4.2f}, p-value = {1:<4.2f}".format(r,p)) /Users/nicolasfauchereau/anaconda/lib/python2.7/site-packages/scipy/stats/stats.pyc in pearsonr(x, y) 2510 my = y.mean() 2511 xm, ym = x-mx, y-my -> 2512 r_num = np.add.reduce(xm * ym) 2513 r_den = np.sqrt(ss(xm) * ss(ym)) 2514 r = r_num / r_den ValueError: operands could not be broadcast together with shapes (768) (771)
#df = pd.DataFrame({'SOI':soi['SOI'],'nino':nino['ANOM.3']}, index=nino.index)
#df.to_csv('./data/soi_nino.csv')
There are several interfaces for performing 1D, 2D or N-dimensional interpolation with scipy.
For more on these, I refer you to the scipy.interpolate documentation.
Here we're going to see one simple example in 1D, and one in 2D using Radial Basis Functions with the default kernel (multiquadric)
from scipy.interpolate import interp1d
x = np.linspace(0, 10, 10)
y = np.sin(x)
plt.plot(x,y,'ro')
[<matplotlib.lines.Line2D at 0x10c85b6d0>]
f1 = interp1d(x, y) # default is linear
f2 = interp1d(x, y, kind='cubic') # and we're gonna try cubic interpolation for comparison
xnew = np.linspace(0, 10, 50)
ylin = f1(xnew)
ycub = f2(xnew)
f, ax = plt.subplots(figsize=(12,8))
ax.plot(xnew, ylin, 'b.', ms=8, label='linear')
ax.plot(xnew, ycub, 'gx-', ms=10, label='cubic')
ax.plot(x,y,'ro-', label='data')
ax.legend(loc='lower left')
<matplotlib.legend.Legend at 0x10c837250>
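One practical note: by default interp1d raises an error when asked to evaluate outside the data range; the bounds_error and fill_value arguments control this behavior. A quick sketch:

```python
import numpy as np
from scipy.interpolate import interp1d

x = np.linspace(0, 10, 10)
y = np.sin(x)
# bounds_error=False disables the out-of-range exception;
# fill_value supplies the value returned there instead
f = interp1d(x, y, bounds_error=False, fill_value=np.nan)
print(f(5.0))   # inside the data range: interpolated value
print(f(12.0))  # outside the range: nan
```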
from scipy.interpolate import Rbf
# defines a function of X and Y
def func(x,y):
return x*np.exp(-x**2-y**2)
np.random.seed(1) # we 'fix' the generation of random numbers so that we've got consistent results
x = np.random.uniform(-2., 2., 100)
y = np.random.uniform(-2., 2., 100)
z = func(x,y)
ti = np.linspace(-2.0, 2.0, 100)
XI, YI = np.meshgrid(ti, ti) # meshgrid creates uniform 2D grids from 1D vectors
# use RBF
rbf = Rbf(x, y, z, epsilon=2) # instantiates the interpolator
# you might want to play with the epsilon optional parameter
ZI = rbf(XI, YI) # interpolate on grid
# this is the 'True' field, the result of the function evaluated on a regular grid
true_Z = func(XI, YI)
# plot the result
f, ax = plt.subplots(figsize=(8,6))
im = ax.pcolor(XI, YI, ZI, cmap=plt.get_cmap('RdBu_r'))
ax.scatter(x, y, 50, z, cmap=plt.get_cmap('RdBu_r'), edgecolor='.5')
ax.set_title('RBF interpolation - multiquadrics')
ax.set_xlim(-2, 2)
ax.set_ylim(-2, 2)
plt.colorbar(im, orientation='vertical', pad=0.06);
# plot the difference between the 'true' field and the interpolated
f, ax = plt.subplots(figsize=(8,6))
im = ax.pcolor(XI, YI, ZI - true_Z, cmap=plt.get_cmap('RdBu_r'), vmin=-1e-3, vmax=1e-3)
ax.set_xlim(-2, 2)
ax.set_ylim(-2, 2)
plt.colorbar(im, orientation='vertical', pad=0.06);
In this use case you want to fit noisy (e.g. experimental) data to a specific function with unknown parameters
from scipy.optimize import curve_fit
curve_fit uses the Levenberg-Marquardt algorithm to perform non-linear least-squares optimization
def fitFunc(t, a, b, c):
"""
defines the function
takes 3 parameters: a, b and c
"""
return a*np.exp(-b*t) + c
curve_fit?
### defines the evaluation domain
t = np.linspace(0,4,50)
### defines the parameters
a = 5.0
b = 1.5
c = 0.5
### create the response
temp = fitFunc(t, a, b, c)
### add some noise to simulate "real world observations" such as experimental data
noisy = temp + 0.4 * np.random.normal(size=len(temp))
### use curve_fit to estimate the parameters and the covariance matrices
fitParams, fitCovariances = curve_fit(fitFunc, t, noisy)
afit, bfit, cfit = tuple(fitParams)
print("\nEstimated parameters\na: {0:<4.2f}, b: {1:<4.2f}, c: {2:<4.2f}\n\n".format(afit, bfit, cfit))
Estimated parameters a: 5.03, b: 1.64, c: 0.64
f, ax = plt.subplots(figsize=(8,8))
ax.set_ylabel(u'Temperature (\xb0C)', fontsize = 16)
ax.set_xlabel('time (s)', fontsize = 16)
ax.set_xlim(0,4.1)
# plot the data as red circles with vertical errorbars
ax.errorbar(t, noisy, fmt = 'ro', yerr = 0.2)
# now plot the best fit curve
ax.plot(t, fitFunc(t, afit, bfit, cfit),'k-', lw=2)
# and plot the +- 1 sigma curves
# (the square root of the diagonal covariance matrix
# element is the uncertainty on the fit parameter.)
sigma_a, sigma_b, sigma_c = np.sqrt(fitCovariances[0,0]), \
np.sqrt(fitCovariances[1,1]), \
np.sqrt(fitCovariances[2,2])
ax.plot(t, fitFunc(t, afit + sigma_a, bfit - sigma_b, cfit + sigma_c), 'b-')
ax.plot(t, fitFunc(t, afit - sigma_a, bfit + sigma_b, cfit - sigma_c), 'b-');
Python itself has very good support for IO and for dealing with ASCII files
f = open('filename','r')
out = f.readlines()
f.close()
f = open('./data/ascii_table.txt', 'r')
out = f.readlines()
f.close()
out
[' 0 1 2 3 4 5 6 7 8 9\n', ' 10 11 12 13 14 15 16 17 18 19\n', ' 20 21 22 23 24 25 26 27 28 29\n', ' 30 31 32 33 34 35 36 37 38 39\n', ' 40 41 42 43 44 45 46 47 48 49\n', ' 50 51 52 53 54 55 56 57 58 59\n', ' 60 61 62 63 64 65 66 67 68 69\n', ' 70 71 72 73 74 75 76 77 78 79\n', ' 80 81 82 83 84 85 86 87 88 89\n', ' 90 91 92 93 94 95 96 97 98 99\n']
out = [[int(v) for v in x.split()] for x in out] # map() returns an iterator in Python 3, so a list comprehension is used here
out
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [20, 21, 22, 23, 24, 25, 26, 27, 28, 29], [30, 31, 32, 33, 34, 35, 36, 37, 38, 39], [40, 41, 42, 43, 44, 45, 46, 47, 48, 49], [50, 51, 52, 53, 54, 55, 56, 57, 58, 59], [60, 61, 62, 63, 64, 65, 66, 67, 68, 69], [70, 71, 72, 73, 74, 75, 76, 77, 78, 79], [80, 81, 82, 83, 84, 85, 86, 87, 88, 89], [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]]
out = np.array(out)
print(out)
[[ 0 1 2 3 4 5 6 7 8 9] [10 11 12 13 14 15 16 17 18 19] [20 21 22 23 24 25 26 27 28 29] [30 31 32 33 34 35 36 37 38 39] [40 41 42 43 44 45 46 47 48 49] [50 51 52 53 54 55 56 57 58 59] [60 61 62 63 64 65 66 67 68 69] [70 71 72 73 74 75 76 77 78 79] [80 81 82 83 84 85 86 87 88 89] [90 91 92 93 94 95 96 97 98 99]]
out.shape
(10, 10)
a = np.genfromtxt('./data/ascii_table.txt', dtype=int) # can handle missing values
a
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [20, 21, 22, 23, 24, 25, 26, 27, 28, 29], [30, 31, 32, 33, 34, 35, 36, 37, 38, 39], [40, 41, 42, 43, 44, 45, 46, 47, 48, 49], [50, 51, 52, 53, 54, 55, 56, 57, 58, 59], [60, 61, 62, 63, 64, 65, 66, 67, 68, 69], [70, 71, 72, 73, 74, 75, 76, 77, 78, 79], [80, 81, 82, 83, 84, 85, 86, 87, 88, 89], [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
#np.loadtxt() ### each row must have the same number of values
a = np.loadtxt('./data/ascii_table.txt', dtype=int) # np.int is deprecated; use the built-in int
a
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [20, 21, 22, 23, 24, 25, 26, 27, 28, 29], [30, 31, 32, 33, 34, 35, 36, 37, 38, 39], [40, 41, 42, 43, 44, 45, 46, 47, 48, 49], [50, 51, 52, 53, 54, 55, 56, 57, 58, 59], [60, 61, 62, 63, 64, 65, 66, 67, 68, 69], [70, 71, 72, 73, 74, 75, 76, 77, 78, 79], [80, 81, 82, 83, 84, 85, 86, 87, 88, 89], [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
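The writing counterpart of np.loadtxt is np.savetxt; a small round-trip sketch (writing to a temporary file rather than the ./data directory):

```python
import os
import tempfile
import numpy as np

a = np.arange(100, dtype=int).reshape(10, 10)
# write the array to a (temporary) text file, then read it back
path = os.path.join(tempfile.mkdtemp(), 'table.txt')
np.savetxt(path, a, fmt='%4d')
b = np.loadtxt(path, dtype=int)
print(np.array_equal(a, b))  # True: the round trip preserves the data
```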
NOTE: For reading all csv and tabular-format files, including xls, xlsx, etc., the most convenient way is to use the special IO functions of Pandas, which we have seen for the SOI / NINO example and will see in more detail in the pandas notebook. These functions load each format directly into Pandas DataFrames, a very convenient data structure for interacting with spreadsheet-like data.
NetCDF stands for Network Common Data Form; it is used widely in the atmosphere and ocean sciences to store mostly gridded datasets.
One set of NetCDF functions comes with SciPy, available from its IO module, with which you can read and create NetCDF files easily.
from scipy.io.netcdf import NetCDFFile
Here we are reading SSTs (for the Southern Hemisphere) from the Hadley SST (available here)
nc = NetCDFFile('data/Hadley_SST.nc', 'r')
nc.variables.keys()
['lat', 'time_bnds', 'lon', 'sst', 'time']
lat = nc.variables['lat'][:]
lon = nc.variables['lon'][:]
lat
array([ 9.5, 8.5, 7.5, 6.5, 5.5, 4.5, 3.5, 2.5, 1.5, 0.5, -0.5, -1.5, -2.5, -3.5, -4.5, -5.5, -6.5, -7.5, -8.5, -9.5, -10.5, -11.5, -12.5, -13.5, -14.5, -15.5, -16.5, -17.5, -18.5, -19.5, -20.5, -21.5, -22.5, -23.5, -24.5, -25.5, -26.5, -27.5, -28.5, -29.5, -30.5, -31.5, -32.5, -33.5, -34.5, -35.5, -36.5, -37.5, -38.5, -39.5, -40.5, -41.5, -42.5, -43.5, -44.5, -45.5, -46.5, -47.5, -48.5, -49.5, -50.5, -51.5, -52.5, -53.5, -54.5, -55.5, -56.5, -57.5, -58.5, -59.5, -60.5, -61.5, -62.5, -63.5, -64.5, -65.5, -66.5, -67.5, -68.5, -69.5, -70.5, -71.5, -72.5, -73.5, -74.5, -75.5, -76.5, -77.5, -78.5, -79.5, -80.5, -81.5, -82.5, -83.5, -84.5, -85.5, -86.5, -87.5, -88.5, -89.5], dtype=float32)
sst = nc.variables['sst'][:]
mval = nc.variables['sst'].missing_value
mval
-1e+30
nc.close() # don't forget to close
mval
-1e+30
sst.shape
(480, 100, 360)
sst = ma.masked_values(sst, mval)
f, ax = plt.subplots(figsize=(12,6))
im = ax.imshow(sst[0,:,:], interpolation='nearest', aspect='auto')
ax.set_title('SSTs')
cb = plt.colorbar(im);
cb.set_label(u'\xb0C', rotation=0)
And the standalone netCDF4 module, which is installed as part of the Anaconda distribution and has more functionality, including:
from netCDF4 import Dataset
nc = Dataset('data/Hadley_SST.nc', 'r')
sst = nc.variables['sst'][:]
sst = ma.masked_values(sst, mval)
nc.close()
f, ax = plt.subplots(figsize=(12,6))
im = ax.imshow(sst[0,:,:], interpolation='nearest', aspect='auto')
ax.set_title('SSTs')
cb = plt.colorbar(im);
cb.set_label(u'\xb0C', rotation=0)
A more recent project is the IRIS module, developed by the UK Met Office.
Matlab files:
from scipy.io.matlab import loadmat, savemat, whosmat
We first create a Matlab file containing one variable:
x = np.random.randn(10,10)
savemat('data/random_2darray.mat', {'xmat':x})
Then inspect its content with whosmat
whosmat('data/random_2darray.mat')
[('xmat', (10, 10), 'double')]
And read it back, accessing the saved variable as a numpy array using a dict-like syntax:
matfile = loadmat('data/random_2darray.mat')
x = matfile['xmat'] # behaves like a dictionary
x
array([[ 6.02319280e-01,  4.20282204e-01,  8.10951673e-01,  1.04444209e+00, -4.00878192e-01,  8.24005618e-01, -5.62305431e-01,  1.95487808e+00, -1.33195167e+00, -1.76068856e+00],
       [-1.65072127e+00, -8.90555584e-01, -1.11911540e+00,  1.95607890e+00, -3.26499498e-01, -1.34267579e+00,  1.11438298e+00, -5.86523939e-01, -1.23685338e+00,  8.75838928e-01],
       [ 6.23362177e-01, -4.34956683e-01,  1.40754000e+00,  1.29101580e-01,  1.61694960e+00,  5.02740882e-01,  1.55880554e+00,  1.09402696e-01, -1.21974440e+00,  2.44936865e+00],
       [-5.45774168e-01, -1.98837863e-01, -7.00398505e-01, -2.03394449e-01,  2.42669441e-01,  2.01830179e-01,  6.61020288e-01,  1.79215821e+00, -1.20464572e-01, -1.23312074e+00],
       [-1.18231813e+00, -6.65754518e-01, -1.67419581e+00,  8.25029824e-01, -4.98213564e-01, -3.10984978e-01, -1.89148284e-03, -1.39662042e+00, -8.61316361e-01,  6.74711526e-01],
       [ 6.18539131e-01, -4.43171931e-01,  1.81053491e+00, -1.30572692e+00, -3.44987210e-01, -2.30839743e-01, -2.79308500e+00,  1.93752881e+00,  3.66332015e-01, -1.04458938e+00],
       [ 2.05117344e+00,  5.85662000e-01,  4.29526140e-01, -6.06998398e-01,  1.06222724e-01, -1.52568032e+00,  7.95026094e-01, -3.74438319e-01,  1.34048197e-01,  1.20205486e+00],
       [ 2.84748111e-01,  2.62467445e-01,  2.76499305e-01, -7.33271604e-01,  8.36004719e-01,  1.54335911e+00,  7.58805660e-01,  8.84908814e-01, -8.77281519e-01, -8.67787223e-01],
       [-1.44087602e+00,  1.23225307e+00, -2.54179868e-01,  1.39984394e+00, -7.81911683e-01, -4.37508983e-01,  9.54250872e-02,  9.21450069e-01,  6.07501958e-02,  2.11124755e-01],
       [ 1.65275673e-02,  1.77187720e-01, -1.11647002e+00,  8.09271010e-02, -1.86578994e-01, -5.68244809e-02,  4.92336556e-01, -6.80678141e-01, -8.45080274e-02, -2.97361883e-01]])
For Matlab files containing Matlab structures, the syntax is marginally more complicated
whosmat('./data/clusters_monthly.mat')
[('clusm', (1, 1), 'struct')]
matfile = loadmat('./data/clusters_monthly.mat', struct_as_record=False)
matfile.keys()
['clusm', '__version__', '__header__', '__globals__']
clusm = matfile['clusm'][0][0] ### corresponds to the (1,1) shape as indicated by whosmat
type(clusm)
scipy.io.matlab.mio5_params.mat_struct
### inspect the attributes of clusm
dir(clusm)
['__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__slotnames__', '__str__', '__subclasshook__', '__weakref__', '_fieldnames', 'data', 'name', 'regimes', 'regname', 'time']
clusm._fieldnames
['time', 'data', 'name', 'regimes', 'regname']
t = clusm.time
t
array([[1948, 1], [1948, 2], [1948, 3], ..., [2012, 3], [2012, 4], [2012, 5]], dtype=uint16)
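The (year, month) rows of such a time array are easily turned into Python datetime objects. A minimal sketch on a few sample rows (the day is set to the 1st by convention, since the file stores monthly values):

```python
from datetime import datetime
import numpy as np

# a few sample (year, month) rows, mimicking the layout of clusm.time
t = np.array([[1948, 1], [1948, 2], [2012, 5]], dtype=np.uint16)

# iterating over a 2-column array yields its rows, which we unpack
dates = [datetime(int(y), int(m), 1) for y, m in t]
print(dates[0].isoformat())  # 1948-01-01T00:00:00
```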
data = clusm.data
data[0,:]
array([ 0.11290323, 0.0483871 , 0.09677419, 0.19354839, 0.0483871 , 0.0483871 , 0.0483871 , 0.12903226, 0.03225806, 0.0483871 , 0.17741935, 0.01612903])
IMPORTANT: a note on Matlab dates and times
Matlab's datenum representation is the number of days since midnight on January 1st, 0 AD. Python's datetime handling (through the datetime.fromordinal function) assumes time is the number of days since midnight on January 1st, 1 AD. So the duration of year 0 separates the time representation returned by Python's fromordinal function from the one used by Matlab.
matlab_datenum = 731965.04835648148
The following snippet is from Conrad Lee at https://gist.github.com/conradlee/4366296
from datetime import datetime, timedelta
python_datetime = datetime.fromordinal(int(matlab_datenum)) \
+ timedelta(days=matlab_datenum%1) - timedelta(days = 366) ## year 0 was a leap year
python_datetime
datetime.datetime(2004, 1, 19, 1, 9, 38)
print(python_datetime)
2004-01-19 01:09:38
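The same arithmetic extends to a whole array of datenums once wrapped in a small helper; a sketch (the function name is ours, not a library function):

```python
from datetime import datetime, timedelta
import numpy as np

def datenum_to_datetime(datenum):
    """Convert one Matlab datenum to a Python datetime.

    The 366-day offset accounts for year 0, which was a leap year
    in Matlab's calendar but does not exist in Python's.
    """
    return datetime.fromordinal(int(datenum)) \
        + timedelta(days=datenum % 1) - timedelta(days=366)

datenums = np.array([731965.04835648148, 731966.0])
dates = [datenum_to_datetime(d) for d in datenums]
print(dates[0])  # 2004-01-19 01:09:38
```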
Another, easier, way to do it is to use the module netcdftime (installed with Anaconda), which has a num2date
function: by specifying the origin as well as the calendar (the ISO 8601 proleptic Gregorian calendar, which allows for year 0), you can convert an entire vector (numpy array) of Matlab datenums in one call.
from netcdftime import num2date
date_proleptic = num2date(matlab_datenum, units='days since 000-01-00 00:00:00', \
calendar='proleptic_gregorian')
date_proleptic
datetime.datetime(2004, 1, 19, 1, 9, 38)
num2date?
HDF5
Please see http://www.h5py.org/docs/intro/quick.html
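For completeness, a minimal sketch of writing and reading an HDF5 file with the h5py package (assuming h5py is installed; the filename and dataset name below are throwaway examples):

```python
import h5py
import numpy as np
import tempfile, os

fname = os.path.join(tempfile.mkdtemp(), 'demo.h5')

# write a dataset with a 'units' attribute
with h5py.File(fname, 'w') as f:
    dset = f.create_dataset('temperature', data=np.arange(10.0))
    dset.attrs['units'] = 'degC'

# read it back; datasets support numpy-style slicing
with h5py.File(fname, 'r') as f:
    temp = f['temperature'][:]
    units = f['temperature'].attrs['units']

print(temp.sum(), units)
```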
IDL:
from scipy.io import idl
idl.readsav('myfile.sav') # pass the path to an IDL save file ('myfile.sav' is a placeholder)
Fortran files:
from scipy.io import FortranFile
f = FortranFile('myfile.unf', 'r') # filename and mode are required ('myfile.unf' is a placeholder)
Examples
--------
To create an unformatted sequential Fortran file:
>>> from scipy.io import FortranFile
>>> f = FortranFile('test.unf', 'w')
>>> f.write_record(np.array([1,2,3,4,5],dtype=np.int32))
>>> f.write_record(np.linspace(0,1,20).reshape((5,-1)))
>>> f.close()
To read this file:
>>> from scipy.io import FortranFile
>>> f = FortranFile('test.unf', 'r')
>>> print(f.read_ints(dtype=np.int32))
[1 2 3 4 5]
>>> print(f.read_reals(dtype=np.float64).reshape((5,-1)))
[[ 0. 0.05263158 0.10526316 0.15789474]
[ 0.21052632 0.26315789 0.31578947 0.36842105]
[ 0.42105263 0.47368421 0.52631579 0.57894737]
[ 0.63157895 0.68421053 0.73684211 0.78947368]
[ 0.84210526 0.89473684 0.94736842 1. ]]
>>> f.close()