We'll be using this CSV file of tide level data for this demo. The data are from Battery Park, New York City on October 29-30, 2012, during Hurricane Sandy. The data in the CSV are a reformatted and cleaned version of this data from NOAA.
import numpy as np
Basic read with numpy.genfromtxt. Returns a 2d array of floats.
!head -n 5 BatteryParkTideData.csv
TimeOffsetHours,Pred6,Backup,Acoustc
0.0,1.5900000000000001,4.6799999999999997,4.6500000000000004
0.10000000000000001,1.5,4.5499999999999998,4.54
0.20000000000000001,1.3999999999999999,4.46,4.4400000000000004
0.29999999999999999,1.3100000000000001,4.3600000000000003,4.3300000000000001
data = np.genfromtxt('BatteryParkTideData.csv', delimiter=',', skip_header=1, missing_values='NA')
data
array([[ 0. , 1.59, 4.68, 4.65], [ 0.1 , 1.5 , 4.55, 4.54], [ 0.2 , 1.4 , 4.46, 4.44], ..., [ 47.7 , 3.25, 4.32, 4.5 ], [ 47.8 , 3.14, 4.22, 4.39], [ 47.9 , 3.03, 4.12, 4.28]])
print('Shape: {}'.format(data.shape))
print('Size: {}'.format(data.size))
print('Number of dimensions: {}'.format(data.ndim))
print('Data type: {}'.format(data.dtype))
Shape: (480, 4)
Size: 1920
Number of dimensions: 2
Data type: float64
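If you don't have the CSV on disk, the same read can be tried on an in-memory stand-in. This is only a sketch: the three data rows and the position of the NA entry are made up for illustration, and io.StringIO plays the role of the file.

```python
import io
import numpy as np

# Hypothetical three-row stand-in for the real CSV; "NA" marks a missing reading.
csv_text = u"""TimeOffsetHours,Pred6,Backup,Acoustc
0.0,1.59,4.68,4.65
0.1,1.50,NA,4.54
0.2,1.40,4.46,4.44
"""

sample = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                       skip_header=1, missing_values='NA')

print(sample.shape)   # (rows, columns)
print(sample.dtype)   # floats throughout
print(sample[1, 2])   # the "NA" entry comes back as nan
```

The header row is skipped, every column is read as a float, and the NA entry shows up as nan rather than raising an error.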
data[0]
array([ 0. , 1.59, 4.68, 4.65])
data[0, 1]
1.5900000000000001
data[:, 1]
array([ 1.59, 1.5 , 1.4 , 1.31, 1.22, 1.13, 1.04, 0.95, 0.87, 0.78, 0.7 , 0.62, 0.55, 0.48, 0.41, 0.34, 0.28, 0.23, 0.18, 0.14, 0.1 , 0.08, 0.06, 0.05, 0.05, 0.06, 0.08, 0.11, 0.14, 0.19, 0.25, 0.31, 0.39, 0.47, 0.56, 0.66, 0.76, 0.87, 0.98, 1.1 , 1.22, 1.35, 1.48, 1.61, 1.74, 1.88, 2.01, 2.15, 2.28, 2.41, 2.55, 2.68, 2.81, 2.94, 3.06, 3.19, 3.31, 3.43, 3.55, 3.66, 3.78, 3.89, 3.99, 4.1 , 4.2 , 4.3 , 4.39, 4.48, 4.57, 4.66, 4.73, 4.81, 4.89, 4.95, 5.02, 5.08, 5.13, 5.18, 5.23, 5.27, 5.3 , 5.33, 5.35, 5.37, 5.38, 5.38, 5.38, 5.37, 5.36, 5.34, 5.31, 5.28, 5.24, 5.19, 5.13, 5.07, 5. , 4.93, 4.85, 4.76, 4.67, 4.57, 4.47, 4.37, 4.26, 4.15, 4.03, 3.92, 3.8 , 3.68, 3.56, 3.44, 3.32, 3.21, 3.09, 2.97, 2.86, 2.74, 2.63, 2.52, 2.41, 2.3 , 2.2 , 2.09, 1.98, 1.88, 1.78, 1.68, 1.57, 1.47, 1.37, 1.27, 1.17, 1.08, 0.98, 0.89, 0.8 , 0.71, 0.62, 0.54, 0.46, 0.39, 0.32, 0.26, 0.21, 0.16, 0.12, 0.09, 0.07, 0.06, 0.05, 0.06, 0.08, 0.1 , 0.13, 0.18, 0.23, 0.29, 0.36, 0.44, 0.52, 0.61, 0.71, 0.81, 0.91, 1.02, 1.13, 1.25, 1.36, 1.49, 1.6 , 1.73, 1.85, 1.97, 2.09, 2.21, 2.33, 2.45, 2.57, 2.68, 2.8 , 2.91, 3.02, 3.13, 3.23, 3.33, 3.44, 3.53, 3.63, 3.72, 3.81, 3.9 , 3.98, 4.06, 4.14, 4.21, 4.28, 4.34, 4.41, 4.46, 4.51, 4.56, 4.6 , 4.64, 4.67, 4.69, 4.71, 4.73, 4.74, 4.74, 4.74, 4.72, 4.71, 4.69, 4.65, 4.62, 4.57, 4.52, 4.46, 4.4 , 4.33, 4.25, 4.17, 4.08, 3.99, 3.89, 3.79, 3.68, 3.57, 3.46, 3.35, 3.23, 3.11, 3. , 2.88, 2.76, 2.65, 2.53, 2.42, 2.31, 2.2 , 2.1 , 1.99, 1.89, 1.79, 1.7 , 1.6 , 1.51, 1.42, 1.33, 1.25, 1.16, 1.08, 1. , 0.92, 0.85, 0.77, 0.7 , 0.63, 0.56, 0.5 , 0.44, 0.38, 0.33, 0.29, 0.24, 0.21, 0.18, 0.16, 0.15, 0.14, 0.14, 0.16, 0.18, 0.21, 0.25, 0.3 , 0.36, 0.43, 0.5 , 0.59, 0.68, 0.78, 0.89, 1. , 1.12, 1.24, 1.37, 1.5 , 1.63, 1.77, 1.91, 2.05, 2.19, 2.32, 2.46, 2.6 , 2.73, 2.87, 3. 
, 3.13, 3.25, 3.38, 3.5 , 3.62, 3.73, 3.84, 3.95, 4.06, 4.16, 4.25, 4.35, 4.44, 4.52, 4.6 , 4.68, 4.76, 4.83, 4.89, 4.95, 5.01, 5.07, 5.11, 5.16, 5.2 , 5.23, 5.26, 5.29, 5.31, 5.32, 5.33, 5.33, 5.33, 5.32, 5.31, 5.29, 5.26, 5.23, 5.19, 5.14, 5.1 , 5.04, 4.97, 4.9 , 4.83, 4.75, 4.66, 4.57, 4.47, 4.37, 4.27, 4.16, 4.05, 3.93, 3.81, 3.69, 3.57, 3.45, 3.33, 3.21, 3.09, 2.97, 2.86, 2.74, 2.63, 2.51, 2.4 , 2.29, 2.19, 2.08, 1.98, 1.88, 1.78, 1.68, 1.58, 1.49, 1.39, 1.3 , 1.21, 1.12, 1.03, 0.94, 0.85, 0.77, 0.69, 0.6 , 0.53, 0.46, 0.39, 0.33, 0.27, 0.22, 0.18, 0.14, 0.11, 0.09, 0.08, 0.08, 0.09, 0.1 , 0.13, 0.16, 0.2 , 0.26, 0.32, 0.39, 0.47, 0.55, 0.64, 0.74, 0.84, 0.95, 1.06, 1.17, 1.29, 1.41, 1.53, 1.66, 1.78, 1.9 , 2.02, 2.15, 2.27, 2.39, 2.5 , 2.62, 2.73, 2.84, 2.95, 3.05, 3.16, 3.26, 3.36, 3.45, 3.54, 3.63, 3.71, 3.8 , 3.87, 3.95, 4.02, 4.09, 4.15, 4.22, 4.27, 4.32, 4.37, 4.42, 4.46, 4.49, 4.52, 4.55, 4.57, 4.58, 4.6 , 4.6 , 4.6 , 4.59, 4.58, 4.56, 4.54, 4.51, 4.48, 4.43, 4.38, 4.33, 4.27, 4.2 , 4.13, 4.05, 3.96, 3.87, 3.78, 3.68, 3.58, 3.47, 3.37, 3.25, 3.14, 3.03])
Have genfromtxt grab column names from the header and it returns a structured array (an array whose dtype has named fields). The structured array can be indexed with an integer to get a row of data or with a column name to get a 1d array of data for that column.
data = np.genfromtxt('BatteryParkTideData.csv', delimiter=',', names=True, missing_values='NA')
data
array([(0.0, 1.59, 4.68, 4.65), (0.1, 1.5, 4.55, 4.54), (0.2, 1.4, 4.46, 4.44), (0.3, 1.31, 4.36, 4.33), (0.4, 1.22, 4.28, 4.26), (0.5, 1.13, 4.21, 4.18), (0.6, 1.04, 4.15, 4.12), (0.7, 0.95, 4.08, 4.06), (0.8, 0.87, 3.99, 3.97), (0.9, 0.78, 3.92, 3.89), (1.0, 0.7, 3.87, 3.85), (1.1, 0.62, 3.86, 3.83), (1.2, 0.55, 3.8, 3.78), (1.3, 0.48, 3.74, 3.73), (1.4, 0.41, 3.68, 3.66), (1.5, 0.34, 3.63, 3.62), (1.6, 0.28, 3.59, 3.58), (1.7, 0.23, 3.55, 3.53), (1.8, 0.18, 3.5, 3.48), (1.9, 0.14, 3.45, 3.42), (2.0, 0.1, 3.39, 3.35), (2.1, 0.08, 3.37, 3.34), (2.2, 0.06, 3.33, 3.31), (2.3, 0.05, 3.31, 3.3), (2.4, 0.05, 3.29, 3.26), (2.5, 0.06, 3.25, 3.22), (2.6, 0.08, 3.21, 3.19), (2.7, 0.11, 3.19, 3.17), (2.8, 0.14, 3.19, 3.17), (2.9, 0.19, 3.2, 3.18), (3.0, 0.25, 3.23, 3.23), (3.1, 0.31, 3.29, 3.26), (3.2, 0.39, 3.31, 3.29), (3.3, 0.47, 3.34, 3.32), (3.4, 0.56, 3.39, 3.37), (3.5, 0.66, 3.44, 3.41), (3.6, 0.76, 3.49, 3.46), (3.7, 0.87, 3.57, 3.54), (3.8, 0.98, 3.67, 3.65), (3.9, 1.1, 3.78, 3.76), (4.0, 1.22, 3.87, 3.85), (4.1, 1.35, 3.96, 3.95), (4.2, 1.48, 4.09, 4.06), (4.3, 1.61, 4.19, 4.18), (4.4, 1.74, 4.36, 4.33), (4.5, 1.88, 4.49, 4.46), (4.6, 2.01, 4.59, 4.57), (4.7, 2.15, 4.68, 4.66), (4.8, 2.28, 4.79, 4.77), (4.9, 2.41, 4.9, 4.88), (5.0, 2.55, 5.01, 4.98), (5.1, 2.68, 5.12, 5.11), (5.2, 2.81, 5.26, 5.24), (5.3, 2.94, 5.38, 5.35), (5.4, 3.06, 5.52, 5.5), (5.5, 3.19, 5.69, 5.67), (5.6, 3.31, 5.84, 5.83), (5.7, 3.43, 5.97, 5.96), (5.8, 3.55, 6.13, 6.11), (5.9, 3.66, 6.26, 6.24), (6.0, 3.78, 6.39, 6.38), (6.1, 3.89, 6.53, 6.51), (6.2, 3.99, 6.68, 6.66), (6.3, 4.1, 6.84, 6.82), (6.4, 4.2, 7.0, 6.98), (6.5, 4.3, 7.13, 7.1), (6.6, 4.39, 7.25, 7.23), (6.7, 4.48, 7.36, 7.32), (6.8, 4.57, 7.46, 7.43), (6.9, 4.66, 7.56, 7.53), (7.0, 4.73, 7.65, 7.62), (7.1, 4.81, 7.71, 7.71), (7.2, 4.89, 7.8, 7.78), (7.3, 4.95, 7.9, 7.88), (7.4, 5.02, 8.02, 7.98), (7.5, 5.08, 8.07, 8.04), (7.6, 5.13, 8.12, 8.1), (7.7, 5.18, 8.26, 8.23), (7.8, 5.23, 8.36, 8.31), (7.9, 5.27, 8.47, 8.35), (8.0, 5.3, 
8.53, 8.35), (8.1, 5.33, 8.58, 8.35), (8.2, 5.35, 8.58, 8.35), (8.3, 5.37, 8.6, 8.35), (8.4, 5.38, 8.67, 8.36), (8.5, 5.38, 8.69, 8.35), (8.6, 5.38, 8.67, 8.34), (8.7, 5.37, 8.71, 8.34), (8.8, 5.36, 8.76, 8.34), (8.9, 5.34, 8.79, 8.35), (9.0, 5.31, 8.8, 8.32), (9.1, 5.28, 8.85, 8.34), (9.2, 5.24, 8.85, 8.34), (9.3, 5.19, 8.85, 8.29), (9.4, 5.13, 8.85, 8.32), (9.5, 5.07, 8.81, 8.3), (9.6, 5.0, 8.78, 8.33), (9.7, 4.93, 8.69, 8.34), (9.8, 4.85, 8.6, 8.29), (9.9, 4.76, 8.58, 8.28), (10.0, 4.67, 8.53, 8.25), (10.1, 4.57, 8.5, 8.26), (10.2, 4.47, nan, nan), (10.3, 4.37, nan, nan), (10.4, 4.26, nan, nan), (10.5, 4.15, nan, nan), (10.6, 4.03, nan, nan), (10.7, 3.92, 8.1, 8.06), (10.8, 3.8, 8.05, 7.99), (10.9, 3.68, 7.94, 7.88), (11.0, 3.56, 7.83, 7.81), (11.1, 3.44, 7.77, 7.75), (11.2, 3.32, 7.68, 7.64), (11.3, 3.21, 7.56, 7.53), (11.4, 3.09, 7.45, 7.41), (11.5, 2.97, 7.33, 7.3), (11.6, 2.86, 7.22, 7.18), (11.7, 2.74, 7.09, 7.06), (11.8, 2.63, 6.96, 6.92), (11.9, 2.52, 6.84, 6.81), (12.0, 2.41, 6.77, 6.73), (12.1, 2.3, 6.65, 6.63), (12.2, 2.2, 6.58, 6.55), (12.3, 2.09, 6.5, 6.47), (12.4, 1.98, 6.44, 6.41), (12.5, 1.88, 6.33, 6.3), (12.6, 1.78, 6.24, 6.21), (12.7, 1.68, 6.16, 6.15), (12.8, 1.57, 6.1, 6.09), (12.9, 1.47, 6.02, 6.0), (13.0, 1.37, 5.91, 5.9), (13.1, 1.27, 5.87, 5.83), (13.2, 1.17, 5.79, 5.76), (13.3, 1.08, 5.69, 5.67), (13.4, 0.98, 5.61, 5.59), (13.5, 0.89, 5.55, 5.53), (13.6, 0.8, 5.46, 5.44), (13.7, 0.71, 5.42, 5.39), (13.8, 0.62, 5.37, 5.34), (13.9, 0.54, 5.33, 5.3), (14.0, 0.46, 5.3, 5.26), (14.1, 0.39, 5.27, 5.24), (14.2, 0.32, 5.26, 5.23), (14.3, 0.26, nan, 5.2), (14.4, 0.21, 5.27, 5.24), (14.5, 0.16, 5.28, 5.25), (14.6, 0.12, 5.29, 5.27), (14.7, 0.09, 5.38, 5.35), (14.8, 0.07, 5.43, 5.4), (14.9, 0.06, 5.52, 5.5), (15.0, 0.05, 5.6, 5.58), (15.1, 0.06, 5.69, 5.69), (15.2, 0.08, 5.79, 5.77), (15.3, 0.1, 5.97, 5.97), (15.4, 0.13, 6.11, 6.1), (15.5, 0.18, 6.24, 6.21), (15.6, 0.23, 6.4, 6.36), (15.7, 0.29, 6.5, 6.46), (15.8, 0.36, 6.61, 6.56), (15.9, 0.44, 
6.7, 6.67), (16.0, 0.52, 6.85, 6.82), (16.1, 0.61, 7.02, 6.97), (16.2, 0.71, 7.15, 7.09), (16.3, 0.81, 7.28, 7.23), (16.4, 0.91, 7.45, 7.4), (16.5, 1.02, 7.57, 7.51), (16.6, 1.13, 7.75, 7.7), (16.7, 1.25, 7.91, 7.87), (16.8, 1.36, 8.03, 8.0), (16.9, 1.49, 8.18, 8.14), (17.0, 1.6, 8.27, 8.23), (17.1, 1.73, 8.38, 8.3), (17.2, 1.85, 8.48, 8.33), (17.3, 1.97, 8.63, 8.32), (17.4, 2.09, 8.77, 8.33), (17.5, 2.21, 8.9, 8.31), (17.6, 2.33, 9.03, 8.32), (17.7, 2.45, 9.19, 8.29), (17.8, 2.57, 9.33, 8.31), (17.9, 2.68, 9.48, 8.3), (18.0, 2.8, 9.62, 7.85), (18.1, 2.91, 9.74, 7.83), (18.2, 3.02, 9.95, 7.07), (18.3, 3.13, 10.1, 6.1), (18.4, 3.23, 10.22, 6.09), (18.5, 3.33, 10.39, 6.11), (18.6, 3.44, 10.55, 6.11), (18.7, 3.53, 10.69, 6.12), (18.8, 3.63, 10.88, 6.6), (18.9, 3.72, 11.07, 6.91), (19.0, 3.81, 11.25, 7.27), (19.1, 3.9, 11.41, 7.16), (19.2, 3.98, 11.62, 7.07), (19.3, 4.06, 11.87, 7.31), (19.4, 4.14, 12.09, 7.06), (19.5, 4.21, 12.33, 7.06), (19.6, 4.28, 12.54, 7.24), (19.7, 4.34, 12.75, 7.13), (19.8, 4.41, 12.93, 7.16), (19.9, 4.46, 13.04, 7.09), (20.0, 4.51, 13.15, 7.16), (20.1, 4.56, 13.2, 7.16), (20.2, 4.6, 13.26, 7.11), (20.3, 4.64, 13.34, 7.15), (20.4, 4.67, 13.4, 7.26), (20.5, 4.69, 13.46, 7.13), (20.6, 4.71, 13.54, 7.0), (20.7, 4.73, 13.65, 6.68), (20.8, 4.74, 13.72, 6.85), (20.9, 4.74, 13.78, 7.12), (21.0, 4.74, 13.81, 7.07), (21.1, 4.72, 13.85, 7.3), (21.2, 4.71, 13.87, 7.3), (21.3, 4.69, 13.87, 7.32), (21.4, 4.65, 13.88, 7.19), (21.5, 4.62, 13.79, 7.14), (21.6, 4.57, 13.72, 7.18), (21.7, 4.52, 13.63, 7.03), (21.8, 4.46, 13.54, 7.32), (21.9, 4.4, 13.41, 7.04), (22.0, 4.33, 13.3, 7.23), (22.1, 4.25, 13.15, 6.88), (22.2, 4.17, 12.99, 6.97), (22.3, 4.08, 12.86, 7.19), (22.4, 3.99, 12.69, 7.1), (22.5, 3.89, 12.5, 7.18), (22.6, 3.79, 12.27, 7.18), (22.7, 3.68, 12.07, 7.4), (22.8, 3.57, 11.87, 7.09), (22.9, 3.46, 11.61, 7.03), (23.0, 3.35, 11.32, 7.18), (23.1, 3.23, 11.04, 7.26), (23.2, 3.11, 10.78, 7.24), (23.3, 3.0, 10.47, 6.43), (23.4, 2.88, 10.15, 6.8), (23.5, 
2.76, 9.81, 7.67), (23.6, 2.65, 9.54, 8.14), (23.7, 2.53, 9.22, 8.31), (23.8, 2.42, 8.92, 8.32), (23.9, 2.31, 8.62, 8.29), (24.0, 2.2, 8.35, 8.17), (24.1, 2.1, 8.03, 7.94), (24.2, 1.99, 7.77, 7.71), (24.3, 1.89, 7.58, 7.53), (24.4, 1.79, 7.42, 7.36), (24.5, 1.7, 7.2, 7.19), (24.6, 1.6, 7.06, 7.0), (24.7, 1.51, 6.87, 6.84), (24.8, 1.42, 6.71, 6.68), (24.9, 1.33, 6.54, 6.51), (25.0, 1.25, 6.4, 6.36), (25.1, 1.16, 6.26, 6.22), (25.2, 1.08, 6.13, 6.09), (25.3, 1.0, 5.99, 5.96), (25.4, 0.92, 5.85, 5.84), (25.5, 0.85, 5.77, 5.74), (25.6, 0.77, 5.64, 5.62), (25.7, 0.7, 5.52, 5.51), (25.8, 0.63, 5.36, 5.35), (25.9, 0.56, 5.21, 5.2), (26.0, 0.5, 5.08, 5.08), (26.1, 0.44, 4.94, 4.94), (26.2, 0.38, 4.8, 4.8), (26.3, 0.33, 4.7, 4.67), (26.4, 0.29, 4.55, 4.53), (26.5, 0.24, 4.44, 4.41), (26.6, 0.21, 4.32, 4.31), (26.7, 0.18, 4.21, 4.2), (26.8, 0.16, 4.1, 4.08), (26.9, 0.15, 4.0, 3.96), (27.0, 0.14, 3.91, 3.87), (27.1, 0.14, 3.81, 3.79), (27.2, 0.16, 3.77, 3.75), (27.3, 0.18, 3.71, 3.68), (27.4, 0.21, 3.67, 3.63), (27.5, 0.25, 3.64, 3.61), (27.6, 0.3, 3.63, 3.61), (27.7, 0.36, 3.67, 3.64), (27.8, 0.43, 3.69, 3.67), (27.9, 0.5, 3.72, 3.71), (28.0, 0.59, 3.81, 3.79), (28.1, 0.68, 3.88, 3.87), (28.2, 0.78, 4.0, 3.99), (28.3, 0.89, 4.07, 4.05), (28.4, 1.0, 4.14, 4.14), (28.5, 1.12, 4.21, 4.2), (28.6, 1.24, 4.3, 4.3), (28.7, 1.37, 4.41, 4.41), (28.8, 1.5, 4.52, 4.53), (28.9, 1.63, 4.65, 4.66), (29.0, 1.77, 4.79, 4.81), (29.1, 1.91, 4.95, 4.97), (29.2, 2.05, 5.08, 5.13), (29.3, 2.19, 5.21, 5.28), (29.4, 2.32, 5.36, 5.44), (29.5, 2.46, 5.5, 5.57), (29.6, 2.6, 5.6, 5.71), (29.7, 2.73, 5.75, 5.87), (29.8, 2.87, 5.89, 6.0), (29.9, 3.0, 5.99, 6.11), (30.0, 3.13, 6.09, 6.21), (30.1, 3.25, 6.17, 6.27), (30.2, 3.38, 6.27, 6.4), (30.3, 3.5, 6.37, 6.5), (30.4, 3.62, 6.43, 6.54), (30.5, 3.73, 6.51, 6.62), (30.6, 3.84, 6.56, 6.7), (30.7, 3.95, 6.61, 6.75), (30.8, 4.06, 6.7, 6.8), (30.9, 4.16, 6.74, 6.86), (31.0, 4.25, 6.81, 6.93), (31.1, 4.35, 6.84, 6.98), (31.2, 4.44, 6.92, 7.04), (31.3, 4.52, 
6.93, 7.07), (31.4, 4.6, 6.93, 7.09), (31.5, 4.68, 6.96, 7.09), (31.6, 4.76, 6.95, 7.1), (31.7, 4.83, 6.96, 7.12), (31.8, 4.89, 6.97, 7.12), (31.9, 4.95, 7.01, 7.15), (32.0, 5.01, 7.02, 7.15), (32.1, 5.07, 7.02, 7.15), (32.2, 5.11, 7.03, 7.16), (32.3, 5.16, 7.02, 7.16), (32.4, 5.2, 7.01, 7.15), (32.5, 5.23, 7.03, 7.17), (32.6, 5.26, 7.05, 7.21), (32.7, 5.29, 7.13, 7.28), (32.8, 5.31, 7.18, 7.32), (32.9, 5.32, 7.19, 7.34), (33.0, 5.33, 7.17, 7.31), (33.1, 5.33, 7.13, 7.26), (33.2, 5.33, 7.11, 7.24), (33.3, 5.32, 7.12, 7.26), (33.4, 5.31, 7.12, 7.26), (33.5, 5.29, 7.11, 7.25), (33.6, 5.26, 7.12, 7.25), (33.7, 5.23, 7.16, 7.3), (33.8, 5.19, 7.2, 7.36), (33.9, 5.14, 7.24, 7.38), (34.0, 5.1, 7.28, 7.41), (34.1, 5.04, 7.29, 7.44), (34.2, 4.97, 7.33, 7.48), (34.3, 4.9, 7.31, 7.47), (34.4, 4.83, 7.27, 7.42), (34.5, 4.75, 7.24, 7.4), (34.6, 4.66, 7.19, 7.34), (34.7, 4.57, 7.11, 7.25), (34.8, 4.47, 6.99, 7.14), (34.9, 4.37, 6.87, 7.01), (35.0, 4.27, 6.72, 6.88), (35.1, 4.16, 6.64, 6.79), (35.2, 4.05, 6.56, 6.69), (35.3, 3.93, 6.46, 6.61), (35.4, 3.81, 6.37, 6.53), (35.5, 3.69, 6.25, 6.41), (35.6, 3.57, 6.11, 6.28), (35.7, 3.45, 6.01, 6.18), (35.8, 3.33, 5.91, 6.07), (35.9, 3.21, 5.78, 5.98), (36.0, 3.09, 5.66, 5.83), (36.1, 2.97, 5.49, 5.64), (36.2, 2.86, 5.31, 5.49), (36.3, 2.74, 5.15, 5.29), (36.4, 2.63, 5.03, 5.16), (36.5, 2.51, 4.91, 5.02), (36.6, 2.4, 4.78, 4.9), (36.7, 2.29, 4.64, 4.74), (36.8, 2.19, 4.51, 4.59), (36.9, 2.08, 4.36, 4.44), (37.0, 1.98, 4.21, 4.29), (37.1, 1.88, 4.05, 4.11), (37.2, 1.78, 3.91, 3.96), (37.3, 1.68, 3.76, 3.81), (37.4, 1.58, 3.63, 3.68), (37.5, 1.49, 3.51, 3.56), (37.6, 1.39, 3.4, 3.43), (37.7, 1.3, 3.3, 3.34), (37.8, 1.21, 3.21, 3.23), (37.9, 1.12, 3.08, 3.09), (38.0, 1.03, 2.94, 2.96), (38.1, 0.94, 2.83, 2.83), (38.2, 0.85, 2.73, 2.72), (38.3, 0.77, 2.62, 2.61), (38.4, 0.69, 2.51, 2.49), (38.5, 0.6, 2.4, 2.39), (38.6, 0.53, 2.34, 2.31), (38.7, 0.46, 2.23, 2.21), (38.8, 0.39, 2.14, 2.11), (38.9, 0.33, 2.03, 2.0), (39.0, 0.27, 1.92, 1.89), 
(39.1, 0.22, 1.84, 1.83), (39.2, 0.18, 1.77, 1.77), (39.3, 0.14, 1.71, 1.69), (39.4, 0.11, 1.67, 1.65), (39.5, 0.09, 1.64, 1.61), (39.6, 0.08, 1.6, 1.58), (39.7, 0.08, 1.58, 1.56), (39.8, 0.09, 1.58, 1.54), (39.9, 0.1, 1.56, 1.52), (40.0, 0.13, 1.53, 1.51), (40.1, 0.16, 1.52, 1.51), (40.2, 0.2, 1.53, 1.52), (40.3, 0.26, 1.55, 1.53), (40.4, 0.32, 1.54, 1.53), (40.5, 0.39, 1.58, 1.57), (40.6, 0.47, 1.64, 1.64), (40.7, 0.55, 1.7, 1.7), (40.8, 0.64, 1.83, 1.8), (40.9, 0.74, 1.94, 1.93), (41.0, 0.84, 2.03, 2.02), (41.1, 0.95, 2.14, 2.13), (41.2, 1.06, 2.26, 2.26), (41.3, 1.17, 2.4, 2.41), (41.4, 1.29, 2.57, 2.58), (41.5, 1.41, 2.73, 2.73), (41.6, 1.53, 2.89, 2.89), (41.7, 1.66, 3.03, 3.04), (41.8, 1.78, 3.17, 3.2), (41.9, 1.9, 3.33, 3.37), (42.0, 2.02, 3.5, 3.54), (42.1, 2.15, 3.66, 3.71), (42.2, 2.27, 3.83, 3.9), (42.3, 2.39, 4.01, 4.08), (42.4, 2.5, 4.17, 4.25), (42.5, 2.62, 4.29, 4.4), (42.6, 2.73, 4.44, 4.55), (42.7, 2.84, 4.6, 4.71), (42.8, 2.95, 4.78, 4.88), (42.9, 3.05, 4.92, 5.03), (43.0, 3.16, 4.99, 5.13), (43.1, 3.26, 5.1, 5.26), (43.2, 3.36, 5.22, 5.38), (43.3, 3.45, 5.35, 5.5), (43.4, 3.54, 5.41, 5.58), (43.5, 3.63, 5.5, 5.69), (43.6, 3.71, 5.58, 5.76), (43.7, 3.8, 5.64, 5.83), (43.8, 3.87, 5.68, 5.87), (43.9, 3.95, 5.73, 5.93), (44.0, 4.02, 5.79, 5.98), (44.1, 4.09, 5.79, 6.0), (44.2, 4.15, 5.81, 6.02), (44.3, 4.22, 5.83, 6.03), (44.4, 4.27, 5.86, 6.06), (44.5, 4.32, 5.88, 6.08), (44.6, 4.37, 5.89, 6.09), (44.7, 4.42, 5.88, 6.08), (44.8, 4.46, 5.88, 6.07), (44.9, 4.49, 5.88, 6.07), (45.0, 4.52, 5.88, 6.08), (45.1, 4.55, 5.91, 6.1), (45.2, 4.57, 5.91, 6.1), (45.3, 4.58, 5.9, 6.1), (45.4, 4.6, 5.9, 6.1), (45.5, 4.6, 5.87, 6.07), (45.6, 4.6, 5.84, 6.05), (45.7, 4.59, 5.8, 6.01), (45.8, 4.58, 5.79, 6.0), (45.9, 4.56, 5.77, 5.97), (46.0, 4.54, 5.72, 5.93), (46.1, 4.51, 5.68, 5.89), (46.2, 4.48, 5.64, 5.84), (46.3, 4.43, 5.58, 5.78), (46.4, 4.38, 5.51, 5.71), (46.5, 4.33, 5.44, 5.63), (46.6, 4.27, 5.38, 5.57), (46.7, 4.2, 5.32, 5.5), (46.8, 4.13, 5.23, 5.41), 
(46.9, 4.05, 5.15, 5.32), (47.0, 3.96, 5.07, 5.24), (47.1, 3.87, 4.96, 5.13), (47.2, 3.78, 4.86, 5.03), (47.3, 3.68, 4.75, 4.92), (47.4, 3.58, 4.63, 4.8), (47.5, 3.47, 4.52, 4.69), (47.6, 3.37, 4.42, 4.6), (47.7, 3.25, 4.32, 4.5), (47.8, 3.14, 4.22, 4.39), (47.9, 3.03, 4.12, 4.28)], dtype=[('TimeOffsetHours', '<f8'), ('Pred6', '<f8'), ('Backup', '<f8'), ('Acoustc', '<f8')])
data[0]
(0.0, 1.59, 4.68, 4.65)
data['Pred6']
array([ 1.59, 1.5 , 1.4 , 1.31, 1.22, 1.13, 1.04, 0.95, 0.87, 0.78, 0.7 , 0.62, 0.55, 0.48, 0.41, 0.34, 0.28, 0.23, 0.18, 0.14, 0.1 , 0.08, 0.06, 0.05, 0.05, 0.06, 0.08, 0.11, 0.14, 0.19, 0.25, 0.31, 0.39, 0.47, 0.56, 0.66, 0.76, 0.87, 0.98, 1.1 , 1.22, 1.35, 1.48, 1.61, 1.74, 1.88, 2.01, 2.15, 2.28, 2.41, 2.55, 2.68, 2.81, 2.94, 3.06, 3.19, 3.31, 3.43, 3.55, 3.66, 3.78, 3.89, 3.99, 4.1 , 4.2 , 4.3 , 4.39, 4.48, 4.57, 4.66, 4.73, 4.81, 4.89, 4.95, 5.02, 5.08, 5.13, 5.18, 5.23, 5.27, 5.3 , 5.33, 5.35, 5.37, 5.38, 5.38, 5.38, 5.37, 5.36, 5.34, 5.31, 5.28, 5.24, 5.19, 5.13, 5.07, 5. , 4.93, 4.85, 4.76, 4.67, 4.57, 4.47, 4.37, 4.26, 4.15, 4.03, 3.92, 3.8 , 3.68, 3.56, 3.44, 3.32, 3.21, 3.09, 2.97, 2.86, 2.74, 2.63, 2.52, 2.41, 2.3 , 2.2 , 2.09, 1.98, 1.88, 1.78, 1.68, 1.57, 1.47, 1.37, 1.27, 1.17, 1.08, 0.98, 0.89, 0.8 , 0.71, 0.62, 0.54, 0.46, 0.39, 0.32, 0.26, 0.21, 0.16, 0.12, 0.09, 0.07, 0.06, 0.05, 0.06, 0.08, 0.1 , 0.13, 0.18, 0.23, 0.29, 0.36, 0.44, 0.52, 0.61, 0.71, 0.81, 0.91, 1.02, 1.13, 1.25, 1.36, 1.49, 1.6 , 1.73, 1.85, 1.97, 2.09, 2.21, 2.33, 2.45, 2.57, 2.68, 2.8 , 2.91, 3.02, 3.13, 3.23, 3.33, 3.44, 3.53, 3.63, 3.72, 3.81, 3.9 , 3.98, 4.06, 4.14, 4.21, 4.28, 4.34, 4.41, 4.46, 4.51, 4.56, 4.6 , 4.64, 4.67, 4.69, 4.71, 4.73, 4.74, 4.74, 4.74, 4.72, 4.71, 4.69, 4.65, 4.62, 4.57, 4.52, 4.46, 4.4 , 4.33, 4.25, 4.17, 4.08, 3.99, 3.89, 3.79, 3.68, 3.57, 3.46, 3.35, 3.23, 3.11, 3. , 2.88, 2.76, 2.65, 2.53, 2.42, 2.31, 2.2 , 2.1 , 1.99, 1.89, 1.79, 1.7 , 1.6 , 1.51, 1.42, 1.33, 1.25, 1.16, 1.08, 1. , 0.92, 0.85, 0.77, 0.7 , 0.63, 0.56, 0.5 , 0.44, 0.38, 0.33, 0.29, 0.24, 0.21, 0.18, 0.16, 0.15, 0.14, 0.14, 0.16, 0.18, 0.21, 0.25, 0.3 , 0.36, 0.43, 0.5 , 0.59, 0.68, 0.78, 0.89, 1. , 1.12, 1.24, 1.37, 1.5 , 1.63, 1.77, 1.91, 2.05, 2.19, 2.32, 2.46, 2.6 , 2.73, 2.87, 3. 
, 3.13, 3.25, 3.38, 3.5 , 3.62, 3.73, 3.84, 3.95, 4.06, 4.16, 4.25, 4.35, 4.44, 4.52, 4.6 , 4.68, 4.76, 4.83, 4.89, 4.95, 5.01, 5.07, 5.11, 5.16, 5.2 , 5.23, 5.26, 5.29, 5.31, 5.32, 5.33, 5.33, 5.33, 5.32, 5.31, 5.29, 5.26, 5.23, 5.19, 5.14, 5.1 , 5.04, 4.97, 4.9 , 4.83, 4.75, 4.66, 4.57, 4.47, 4.37, 4.27, 4.16, 4.05, 3.93, 3.81, 3.69, 3.57, 3.45, 3.33, 3.21, 3.09, 2.97, 2.86, 2.74, 2.63, 2.51, 2.4 , 2.29, 2.19, 2.08, 1.98, 1.88, 1.78, 1.68, 1.58, 1.49, 1.39, 1.3 , 1.21, 1.12, 1.03, 0.94, 0.85, 0.77, 0.69, 0.6 , 0.53, 0.46, 0.39, 0.33, 0.27, 0.22, 0.18, 0.14, 0.11, 0.09, 0.08, 0.08, 0.09, 0.1 , 0.13, 0.16, 0.2 , 0.26, 0.32, 0.39, 0.47, 0.55, 0.64, 0.74, 0.84, 0.95, 1.06, 1.17, 1.29, 1.41, 1.53, 1.66, 1.78, 1.9 , 2.02, 2.15, 2.27, 2.39, 2.5 , 2.62, 2.73, 2.84, 2.95, 3.05, 3.16, 3.26, 3.36, 3.45, 3.54, 3.63, 3.71, 3.8 , 3.87, 3.95, 4.02, 4.09, 4.15, 4.22, 4.27, 4.32, 4.37, 4.42, 4.46, 4.49, 4.52, 4.55, 4.57, 4.58, 4.6 , 4.6 , 4.6 , 4.59, 4.58, 4.56, 4.54, 4.51, 4.48, 4.43, 4.38, 4.33, 4.27, 4.2 , 4.13, 4.05, 3.96, 3.87, 3.78, 3.68, 3.58, 3.47, 3.37, 3.25, 3.14, 3.03])
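To see the named-column behavior in isolation, here is a small sketch that builds an array with named fields by hand instead of reading it from the file. The three rows are copied from the start of the output above; the variable names are just for illustration.

```python
import numpy as np

# First three rows of the tide data, with three of the four columns.
rows = [(0.0, 1.59, 4.68), (0.1, 1.50, 4.55), (0.2, 1.40, 4.46)]
tides = np.array(rows, dtype=[('TimeOffsetHours', 'f8'),
                              ('Pred6', 'f8'),
                              ('Backup', 'f8')])

first_row = tides[0]           # integer index: one whole row
pred_column = tides['Pred6']   # field name: a plain 1d float array
first_pred = tides['Pred6'][0]

print(tides.dtype.names)
print(pred_column)
```

Indexing by name returns an ordinary float array, so everything that works on a regular 1d array also works on a single column.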
Can also have genfromtxt unpack the columns into separate arrays.
time, pred, backup, accoustic = np.genfromtxt('BatteryParkTideData.csv', delimiter=',', skip_header=1, missing_values='NA', unpack=True)
pred
array([ 1.59, 1.5 , 1.4 , 1.31, 1.22, 1.13, 1.04, 0.95, 0.87, 0.78, 0.7 , 0.62, 0.55, 0.48, 0.41, 0.34, 0.28, 0.23, 0.18, 0.14, 0.1 , 0.08, 0.06, 0.05, 0.05, 0.06, 0.08, 0.11, 0.14, 0.19, 0.25, 0.31, 0.39, 0.47, 0.56, 0.66, 0.76, 0.87, 0.98, 1.1 , 1.22, 1.35, 1.48, 1.61, 1.74, 1.88, 2.01, 2.15, 2.28, 2.41, 2.55, 2.68, 2.81, 2.94, 3.06, 3.19, 3.31, 3.43, 3.55, 3.66, 3.78, 3.89, 3.99, 4.1 , 4.2 , 4.3 , 4.39, 4.48, 4.57, 4.66, 4.73, 4.81, 4.89, 4.95, 5.02, 5.08, 5.13, 5.18, 5.23, 5.27, 5.3 , 5.33, 5.35, 5.37, 5.38, 5.38, 5.38, 5.37, 5.36, 5.34, 5.31, 5.28, 5.24, 5.19, 5.13, 5.07, 5. , 4.93, 4.85, 4.76, 4.67, 4.57, 4.47, 4.37, 4.26, 4.15, 4.03, 3.92, 3.8 , 3.68, 3.56, 3.44, 3.32, 3.21, 3.09, 2.97, 2.86, 2.74, 2.63, 2.52, 2.41, 2.3 , 2.2 , 2.09, 1.98, 1.88, 1.78, 1.68, 1.57, 1.47, 1.37, 1.27, 1.17, 1.08, 0.98, 0.89, 0.8 , 0.71, 0.62, 0.54, 0.46, 0.39, 0.32, 0.26, 0.21, 0.16, 0.12, 0.09, 0.07, 0.06, 0.05, 0.06, 0.08, 0.1 , 0.13, 0.18, 0.23, 0.29, 0.36, 0.44, 0.52, 0.61, 0.71, 0.81, 0.91, 1.02, 1.13, 1.25, 1.36, 1.49, 1.6 , 1.73, 1.85, 1.97, 2.09, 2.21, 2.33, 2.45, 2.57, 2.68, 2.8 , 2.91, 3.02, 3.13, 3.23, 3.33, 3.44, 3.53, 3.63, 3.72, 3.81, 3.9 , 3.98, 4.06, 4.14, 4.21, 4.28, 4.34, 4.41, 4.46, 4.51, 4.56, 4.6 , 4.64, 4.67, 4.69, 4.71, 4.73, 4.74, 4.74, 4.74, 4.72, 4.71, 4.69, 4.65, 4.62, 4.57, 4.52, 4.46, 4.4 , 4.33, 4.25, 4.17, 4.08, 3.99, 3.89, 3.79, 3.68, 3.57, 3.46, 3.35, 3.23, 3.11, 3. , 2.88, 2.76, 2.65, 2.53, 2.42, 2.31, 2.2 , 2.1 , 1.99, 1.89, 1.79, 1.7 , 1.6 , 1.51, 1.42, 1.33, 1.25, 1.16, 1.08, 1. , 0.92, 0.85, 0.77, 0.7 , 0.63, 0.56, 0.5 , 0.44, 0.38, 0.33, 0.29, 0.24, 0.21, 0.18, 0.16, 0.15, 0.14, 0.14, 0.16, 0.18, 0.21, 0.25, 0.3 , 0.36, 0.43, 0.5 , 0.59, 0.68, 0.78, 0.89, 1. , 1.12, 1.24, 1.37, 1.5 , 1.63, 1.77, 1.91, 2.05, 2.19, 2.32, 2.46, 2.6 , 2.73, 2.87, 3. 
, 3.13, 3.25, 3.38, 3.5 , 3.62, 3.73, 3.84, 3.95, 4.06, 4.16, 4.25, 4.35, 4.44, 4.52, 4.6 , 4.68, 4.76, 4.83, 4.89, 4.95, 5.01, 5.07, 5.11, 5.16, 5.2 , 5.23, 5.26, 5.29, 5.31, 5.32, 5.33, 5.33, 5.33, 5.32, 5.31, 5.29, 5.26, 5.23, 5.19, 5.14, 5.1 , 5.04, 4.97, 4.9 , 4.83, 4.75, 4.66, 4.57, 4.47, 4.37, 4.27, 4.16, 4.05, 3.93, 3.81, 3.69, 3.57, 3.45, 3.33, 3.21, 3.09, 2.97, 2.86, 2.74, 2.63, 2.51, 2.4 , 2.29, 2.19, 2.08, 1.98, 1.88, 1.78, 1.68, 1.58, 1.49, 1.39, 1.3 , 1.21, 1.12, 1.03, 0.94, 0.85, 0.77, 0.69, 0.6 , 0.53, 0.46, 0.39, 0.33, 0.27, 0.22, 0.18, 0.14, 0.11, 0.09, 0.08, 0.08, 0.09, 0.1 , 0.13, 0.16, 0.2 , 0.26, 0.32, 0.39, 0.47, 0.55, 0.64, 0.74, 0.84, 0.95, 1.06, 1.17, 1.29, 1.41, 1.53, 1.66, 1.78, 1.9 , 2.02, 2.15, 2.27, 2.39, 2.5 , 2.62, 2.73, 2.84, 2.95, 3.05, 3.16, 3.26, 3.36, 3.45, 3.54, 3.63, 3.71, 3.8 , 3.87, 3.95, 4.02, 4.09, 4.15, 4.22, 4.27, 4.32, 4.37, 4.42, 4.46, 4.49, 4.52, 4.55, 4.57, 4.58, 4.6 , 4.6 , 4.6 , 4.59, 4.58, 4.56, 4.54, 4.51, 4.48, 4.43, 4.38, 4.33, 4.27, 4.2 , 4.13, 4.05, 3.96, 3.87, 3.78, 3.68, 3.58, 3.47, 3.37, 3.25, 3.14, 3.03])
As an aside, you can manually create arrays too. A common situation is to want to create an array from a list or other sequence. Easy:
np.array([2.3, 42, 5.6])
array([ 2.3, 42. , 5.6])
NumPy also has routines for creating arrays of ones and zeros:
np.ones(10)
array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
np.zeros((2, 2))
array([[ 0., 0.], [ 0., 0.]])
And for creating arrays over ranges, either using a step size or a set number of points:
np.arange(10, 20, 1.6)
array([ 10. , 11.6, 13.2, 14.8, 16.4, 18. , 19.6])
np.linspace(10, 20, 16)
array([ 10. , 10.66666667, 11.33333333, 12. , 12.66666667, 13.33333333, 14. , 14.66666667, 15.33333333, 16. , 16.66666667, 17.33333333, 18. , 18.66666667, 19.33333333, 20. ])
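One difference between the two range functions is easy to miss: np.arange leaves out the stop value while np.linspace includes it by default. A short sketch on a range chosen so the numbers are easy to check by eye:

```python
import numpy as np

stepped = np.arange(0, 1, 0.25)                    # step size; stop is excluded
spaced = np.linspace(0, 1, 5)                      # point count; stop is included
half_open = np.linspace(0, 1, 4, endpoint=False)   # point count; stop is excluded

print(stepped)
print(spaced)
print(half_open)
```

Here stepped and half_open hold the same four values, while spaced has a fifth point at 1.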
And for getting random numbers:
np.random.random((2, 2))
array([[ 0.76633572, 0.81414299], [ 0.81736843, 0.27763528]])
np.random.standard_normal((2, 2))
array([[-1.34426401, -0.8267184 ], [-0.61234629, -0.9110464 ]])
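The random values above will differ on every run. If you need repeatable numbers, for a test or a demo, one option is to seed the generator first. A minimal sketch using np.random.seed; the seed value 42 is arbitrary:

```python
import numpy as np

np.random.seed(42)                   # arbitrary seed, chosen for illustration
first_draw = np.random.random((2, 2))

np.random.seed(42)                   # same seed again
second_draw = np.random.random((2, 2))

print(np.array_equal(first_draw, second_draw))
```

Both draws contain the same values because the generator was reset to the same state before each call.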
Back to the task at hand! We have these time, pred, backup, and accoustic arrays. What can we learn about them? There are a number of stats methods:
print(pred.min())
print(pred.max())
print(pred.mean())
print(pred.std())
print(np.median(pred))
0.05
5.38
2.65491666667
1.75053229801
2.74
# peak-to-peak (maximum minus minimum)
print(np.ptp(pred))
5.33
backup.max()
nan
The backup array contains nan values in places where data was missing. nan combined with anything else gives nan, so all of our stats methods return nan. To get our data without nan we'll need to do some fancy indexing.
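As an aside, if all you want is a single statistic, NumPy also has nan-aware versions of several reductions, such as np.nanmax and np.nanmean, which skip the missing values. A small sketch on a made-up array (not the real backup data):

```python
import numpy as np

# Made-up water levels with one missing reading.
levels = np.array([4.68, 4.55, np.nan, 4.36])

plain_max = levels.max()        # nan, because nan propagates
safe_max = np.nanmax(levels)    # ignores the nan
safe_mean = np.nanmean(levels)  # mean of the three real readings

print(plain_max)
print(safe_max)
print(safe_mean)
```

These are handy for one-off numbers, but filtering the arrays keeps all of the columns lined up row by row, which matters once we start comparing and plotting them.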
One of the powerful features of NumPy arrays is the many ways they can be indexed. You can, for example, use a list or array of integers to grab specific elements from an array. The list can contain indices in any order and can even contain repeated indices:
pred[[100, 5, 1, 5, 100]]
array([ 4.67, 1.13, 1.5 , 1.13, 4.67])
(Note: The index can be a list or an integer array of any length. For a 1d array like pred, the result has the same shape as the index.)
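Index arrays don't have to be typed out by hand; they are often produced by other functions. As one illustration, np.argsort returns the integer indices that would sort an array, and those indices can be fed straight back in. The values below are made up:

```python
import numpy as np

values = np.array([4.67, 1.13, 1.5, 0.05])

order = np.argsort(values)     # indices that put `values` in ascending order
sorted_values = values[order]  # index with the integer array
top_two = values[order[-2:]]   # the two largest values

print(order)
print(sorted_values)
print(top_two)
```

The same order array could be used to rearrange any other array of the same length, which is useful when several arrays describe the same rows.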
It's also possible to index arrays using boolean expressions, similar to an if statement:
pred[pred > 5]
array([ 5.02, 5.08, 5.13, 5.18, 5.23, 5.27, 5.3 , 5.33, 5.35, 5.37, 5.38, 5.38, 5.38, 5.37, 5.36, 5.34, 5.31, 5.28, 5.24, 5.19, 5.13, 5.07, 5.01, 5.07, 5.11, 5.16, 5.2 , 5.23, 5.26, 5.29, 5.31, 5.32, 5.33, 5.33, 5.33, 5.32, 5.31, 5.29, 5.26, 5.23, 5.19, 5.14, 5.1 , 5.04])
pred[(pred > 5) | (pred < 0.5)]
array([ 0.48, 0.41, 0.34, 0.28, 0.23, 0.18, 0.14, 0.1 , 0.08, 0.06, 0.05, 0.05, 0.06, 0.08, 0.11, 0.14, 0.19, 0.25, 0.31, 0.39, 0.47, 5.02, 5.08, 5.13, 5.18, 5.23, 5.27, 5.3 , 5.33, 5.35, 5.37, 5.38, 5.38, 5.38, 5.37, 5.36, 5.34, 5.31, 5.28, 5.24, 5.19, 5.13, 5.07, 0.46, 0.39, 0.32, 0.26, 0.21, 0.16, 0.12, 0.09, 0.07, 0.06, 0.05, 0.06, 0.08, 0.1 , 0.13, 0.18, 0.23, 0.29, 0.36, 0.44, 0.44, 0.38, 0.33, 0.29, 0.24, 0.21, 0.18, 0.16, 0.15, 0.14, 0.14, 0.16, 0.18, 0.21, 0.25, 0.3 , 0.36, 0.43, 5.01, 5.07, 5.11, 5.16, 5.2 , 5.23, 5.26, 5.29, 5.31, 5.32, 5.33, 5.33, 5.33, 5.32, 5.31, 5.29, 5.26, 5.23, 5.19, 5.14, 5.1 , 5.04, 0.46, 0.39, 0.33, 0.27, 0.22, 0.18, 0.14, 0.11, 0.09, 0.08, 0.08, 0.09, 0.1 , 0.13, 0.16, 0.2 , 0.26, 0.32, 0.39, 0.47])
How exactly does this work? Comparison expressions on arrays produce another array: an array of boolean values with the same shape as the original array, with True where the expression is true and False elsewhere. Let's see how that looks on a small array:
np.arange(10) > 5
array([False, False, False, False, False, False, True, True, True, True], dtype=bool)
These boolean arrays can be saved in their own variables, combined logically with other boolean arrays, and used to index any array with the same shape.
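A short sketch of saving and combining boolean arrays, using made-up levels and hypothetical variable names. The operators | (or), & (and) and ~ (not) work elementwise, and summing a boolean array counts the True entries:

```python
import numpy as np

levels = np.array([0.3, 1.2, 5.1, 4.9, 0.1, 5.4])

high = levels > 5        # saved boolean array
low = levels < 0.5
extreme = high | low     # True where either condition holds
middle = ~extreme        # logical flip

n_extreme = extreme.sum()   # True counts as 1
print(n_extreme)
print(levels[middle])
```

The parentheses in an expression like (pred > 5) | (pred < 0.5) are needed because | binds more tightly than the comparisons; saving each piece in its own variable sidesteps that.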
To get the indices where a condition is true use the numpy.where function:
np.where(np.arange(10) > 5)
(array([6, 7, 8, 9]),)
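np.where returns a tuple with one index array per dimension, which is why the output above has a trailing comma. The sketch below, on made-up data, unpacks that tuple and also shows the three-argument form, which chooses between two values element by element:

```python
import numpy as np

levels = np.array([0.3, 1.2, 5.1, 4.9, 0.1, 5.4])

(idx,) = np.where(levels > 5)   # unpack the one-element tuple
picked = levels[idx]            # same values as levels[levels > 5]

# Three-argument form: take 5.0 where the condition holds, else the original.
capped = np.where(levels > 5, 5.0, levels)

print(idx)
print(picked)
print(capped)
```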
How does this help with the nan issue? Much like Python's standard library has a math.isnan function that works on floats, there is a numpy.isnan function that works on arrays. (In fact, NumPy has array equivalents to most of the functions in the math module.) Here's a small example of np.isnan:
a = np.array([1, 2, np.nan, 4, 5, np.nan])
np.isnan(a)
array([False, False, True, False, False, True], dtype=bool)
np.isnan returns an array of booleans just like the logical expressions up above, so that looks promising! Let's try it:
a[np.isnan(a)]
array([ nan, nan])
Of course, that grabbed the nan values because np.isnan gives True where a has nan values. One thing we can do is perform a logical flip on the boolean array using the ~ operator:
a[~np.isnan(a)]
array([ 1., 2., 4., 5.])
Or we could see what other kinds of logical functions there are in NumPy. One is numpy.isfinite:
a[np.isfinite(a)]
array([ 1., 2., 4., 5.])
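The two approaches give the same answer here, but they are not identical in general: np.isfinite also rejects infinities, while ~np.isnan keeps them. A small sketch with a made-up array that contains both kinds of value:

```python
import numpy as np

mixed = np.array([1.0, np.nan, np.inf, -np.inf, 5.0])

without_nan = mixed[~np.isnan(mixed)]    # infinities survive
only_finite = mixed[np.isfinite(mixed)]  # nan and infinities both dropped

print(without_nan)
print(only_finite)
```

For the tide data the distinction doesn't matter, since the missing readings are all nan, but it is worth knowing which one you want.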
# are there any nan values?
print('time: {}'.format(np.isnan(time).any()))
print('backup: {}'.format(np.isnan(backup).any()))
time: False
backup: True
# are all of the values finite?
print('pred: {}'.format(np.isfinite(pred).all()))
print('accoustic: {}'.format(np.isfinite(accoustic).all()))
pred: True
accoustic: False
Both backup and accoustic have missing data. (These are the two columns of actual instrument measurements so it shouldn't be too shocking to see missing data.) We can use logical comparisons with arrays to make a boolean array of where backup and accoustic are both good:
not_nan = np.isfinite(backup) & np.isfinite(accoustic)
And then use this to make new copies of time, pred, backup, and accoustic without the rows where backup or accoustic is missing data:
time = time[not_nan]
pred = pred[not_nan]
backup = backup[not_nan]
accoustic = accoustic[not_nan]
How many rows did we lose?
not_nan.size - time.size
6
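The number of dropped rows can also be read off the mask itself, without comparing sizes before and after. A sketch with made-up stand-ins for the two measured columns (the names backup_demo and acoustic_demo are hypothetical):

```python
import numpy as np

backup_demo = np.array([4.68, np.nan, 4.46, 4.36, np.nan])
acoustic_demo = np.array([4.65, 4.54, np.nan, 4.33, np.nan])

# True only where both measurements are present.
keep = np.isfinite(backup_demo) & np.isfinite(acoustic_demo)

n_dropped = np.count_nonzero(~keep)   # rows with at least one missing value
n_kept = keep.sum()

print(n_dropped)
print(n_kept)
```

Note that a row is dropped if either column is missing, so the count can be larger than the number of nan values in any single column.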
Not too bad. Now we can get down to business!
So what are we actually looking at here?
time: time in hours since the first measurement in the file
pred: predicted water level
accoustic: a measured water level
backup: another measured water level
All water levels are in feet above Mean Lower Low Water. Let's take a quick look:
print(time[:5])
print(pred[:5])
print(accoustic[:5])
print(backup[:5])
[ 0. 0.1 0.2 0.3 0.4]
[ 1.59 1.5 1.4 1.31 1.22]
[ 4.65 4.54 4.44 4.33 4.26]
[ 4.68 4.55 4.46 4.36 4.28]
Honestly, looking at a ton of numbers isn't a great way to get a feel for things. We could compare maxima:
backup.max() - pred.max()
8.5
But are those at the same time? We can use the argmax method to get the index of the maxima of one and use that index in the other for a more apples-to-apples comparison:
m = backup.argmax()
print(m)
backup[m] - pred[m]
208
9.2300000000000004
So at least according to the backup measurements, the water level was more than 9 feet higher than predicted at one point during Hurricane Sandy!
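The gap between the two comparisons is easier to see on a toy example. The numbers below are made up so that the two peaks fall at different positions; the sketch contrasts the difference of the maxima with the difference at the measured peak:

```python
import numpy as np

predicted = np.array([1.0, 3.0, 5.0, 3.0, 1.0])
measured = np.array([2.0, 4.5, 6.0, 7.5, 3.0])

# Maxima taken independently: they occur at different positions.
diff_of_maxima = measured.max() - predicted.max()

# Same position for both arrays.
peak_index = measured.argmax()
surge_at_peak = measured[peak_index] - predicted[peak_index]

# Largest pointwise difference anywhere in the record.
largest_surge = (measured - predicted).max()

print(diff_of_maxima)
print(surge_at_peak)
print(largest_surge)
```

Here the difference of maxima understates how far the measurement ran above the prediction, the same effect as the 8.5 versus 9.23 feet seen in the tide data.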
But that's just one data point. To really see trends you want a plot. We'll start by turning on the IPython Notebook's inline plotting mode so that plots show up right here in our notebook:
%pylab inline
Welcome to pylab, a matplotlib-based Python environment [backend: module://IPython.zmq.pylab.backend_inline]. For more information, type 'help(pylab)'.
Then configure plots to display as SVG (default is PNG):
%config InlineBackend.figure_format = 'svg'
Then we'll import matplotlib:
import matplotlib.pyplot as plt
And make a basic plot of our data, using time along the x-axis and plotting the predicted and measured levels as three separate lines:
fig, ax = plt.subplots()
ax.plot(time, pred)
ax.plot(time, accoustic)
ax.plot(time, backup)
ax.set_ylabel('Feet above MLLW')
ax.set_xlabel('Hours Since First Measurement')
<matplotlib.text.Text at 0x1128fba90>
And to keep the lines straight we can throw in a legend:
fig, ax = plt.subplots()
ax.plot(time, pred, label='Predicted')
ax.plot(time, accoustic, label='Acoustic')
ax.plot(time, backup, label='Backup')
ax.set_ylabel('Feet above MLLW')
ax.set_xlabel('Hours Since First Measurement')
ax.legend(loc='upper right')
<matplotlib.legend.Legend at 0x11297ca50>
Great! We can see that maybe something got a bit weird with the accoustic measurements and maybe we should trust the backup measurements more. Now maybe we'd like to quantify and plot the difference between the measured and predicted tide levels. Piece of cake:
obs_minus_pred = backup - pred
fig, ax = plt.subplots()
ax.plot(time, pred, label='Predicted')
ax.plot(time, backup, label='Backup')
ax.plot(time, obs_minus_pred, label='Difference')
ax.set_ylabel('Feet above MLLW')
ax.set_xlabel('Hours Since First Measurement')
ax.legend(loc='upper right')
<matplotlib.legend.Legend at 0x1129b5e50>
Wait a minute, what did we just do there? We just used a minus sign to do an element-wise subtraction of two arrays! Pretty handy. Let's look at array arithmetic for a bit.
Arrays can be used in arithmetic expressions using the same binary operators we use for numbers: +, -, *, /, **, etc. These expressions return new arrays in which the mathematical operation has been applied elementwise. It's easiest to see this when combining an array and a scalar:
a = np.arange(5, dtype=float) # float to avoid integer surprises
print(a)
[ 0. 1. 2. 3. 4.]
a + 5
array([ 5., 6., 7., 8., 9.])
a * 5
array([ 0., 5., 10., 15., 20.])
To get this same effect with lists you'd need to use a loop or a list comprehension. With arrays it's as simple as a + 5. When combining arrays and scalars the same operation is applied to every element of the array. When combining two arrays it's slightly different:
b = np.arange(10, 20, 2, dtype=float)
print(b)
[ 10. 12. 14. 16. 18.]
b - a
array([ 10., 11., 12., 13., 14.])
a / b
array([ 0. , 0.08333333, 0.14285714, 0.1875 , 0.22222222])
In these cases the first element of a operates with the first element of b, and so on. So long as the two arrays are the same size and shape you will see this behavior. Arrays with different shapes can sometimes be combined with binary operators via an implicit resizing/reshaping called broadcasting, but that's a topic for another day.
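For comparison, here is a sketch of the same two operations written once with plain lists and once with arrays. The values match the a and b arrays above; the variable names are just for illustration.

```python
import numpy as np

values = [0.0, 1.0, 2.0, 3.0, 4.0]
others = [10.0, 12.0, 14.0, 16.0, 18.0]

# Plain lists: spell out the elementwise work yourself.
shifted_list = [v + 5 for v in values]
diff_list = [o - v for o, v in zip(others, values)]

# Arrays: the operator does the elementwise work.
arr = np.array(values)
other_arr = np.array(others)
shifted_arr = arr + 5
diff_arr = other_arr - arr

print(shifted_arr)
print(diff_arr)
```

Both versions produce the same numbers; the array version is shorter and keeps the arithmetic looking like arithmetic.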
There might be more we could do with this data. When was the peak water height?
time[backup.argmax()]
21.399999999999999
These are hours past midnight on Oct 29, so the peak water height was sometime around 9:24 PM on Oct. 29. The biggest difference between the predicted and observed tide heights was around the same time:
time[obs_minus_pred.argmax()]
21.399999999999999
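The conversion from an hour offset to a clock time can be done with the standard library's datetime module. This sketch takes the start of the record to be midnight on Oct 29, 2012, as stated above, and rounds the offset to whole minutes to sidestep floating-point fuzz; the variable names are hypothetical.

```python
from datetime import datetime, timedelta

start = datetime(2012, 10, 29)   # midnight on Oct 29, per the text above
peak_offset_hours = 21.4         # value of time[backup.argmax()]

# Round to whole minutes so 21.399999... still lands on 21:24.
peak_time = start + timedelta(minutes=round(peak_offset_hours * 60))

print(peak_time.strftime('%Y-%m-%d %H:%M'))
```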
Which is one reason NYC had such bad flooding.
These are just some ways of working with data in NumPy arrays. Learn more by diving into the documentation at http://docs.scipy.org/doc/.