Here's how I used pandas to rearrange and clean up the raw tide data from NOAA.
import pandas as pd
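Read the data via pandas' flexible read_table function, splitting fields on any run of whitespace. The parse_dates=[[1, 2]] argument combines the date and time columns (columns 1 and 2) into a single datetime column, which pandas names Date_Time. This returns a DataFrame: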
d = pd.read_table('BatteryParkTideData_Cleaned.txt', sep=r'\s+', parse_dates=[[1, 2]])
d.head()
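The original file isn't reproduced here, so this sketch runs on a small hypothetical sample. I'm assuming whitespace-separated columns with a header row: a first column (called Id below, a made-up name), then Date and Time, then value columns. Instead of parse_dates=[[1, 2]], it builds Date_Time explicitly with pd.to_datetime; newer pandas releases deprecate combining columns through parse_dates, so this form may be more portable.

```python
import io

import pandas as pd

# Hypothetical sample in the assumed layout: Id, Date, Time, then values.
raw = """Id Date Time Pred6 Acoustc
1 2000/01/01 00:00 1.234 1.301
2 2000/01/01 00:06 1.310 NaN
3 2000/01/01 00:12 1.388 1.452
"""

# sep=r'\s+' splits on any run of whitespace (raw string avoids an escape warning).
d = pd.read_table(io.StringIO(raw), sep=r'\s+')

# Join the Date and Time text columns and parse them into one datetime column.
d['Date_Time'] = pd.to_datetime(d['Date'] + ' ' + d['Time'], format='%Y/%m/%d %H:%M')
d = d.drop(columns=['Date', 'Time'])
print(d.head())
```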
In the output I don't want dates; I just want the time since the first measurement, a simple floating-point number that's easy to work with.
So next I add a new column to d that's the difference between each timestamp and the first timestamp:
d['TimeOffset'] = d['Date_Time'] - d['Date_Time'].iloc[0]
d.head()
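To see what that subtraction produces, here is a toy example with three hypothetical timestamps. Subtracting a single Timestamp from the whole column is applied row by row and yields a timedelta64 column. The sketch uses .iloc[0] to pick the first row by position, so it doesn't depend on the index labels.

```python
import pandas as pd

# Hypothetical Date_Time column: readings at 0, 6 and 90 minutes.
d = pd.DataFrame({'Date_Time': pd.to_datetime(
    ['2000-01-01 00:00', '2000-01-01 00:06', '2000-01-01 01:30'])})

# Subtract the first timestamp from every row; the result is a timedelta column.
d['TimeOffset'] = d['Date_Time'] - d['Date_Time'].iloc[0]
print(d['TimeOffset'].dtype)
print(d['TimeOffset'].tolist())
```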
Now I have the time since the first measurement, but it's stored as a timedelta rather than a plain number. To convert it to hours I use the total_seconds method, which returns the offset in seconds as a float, and divide by 3600. I add that to d as another new column:
d['TimeOffsetHours'] = d['TimeOffset'].dt.total_seconds() / 3600.
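A quick check of the arithmetic on hypothetical offsets of 0, 6 and 90 minutes: 6 minutes is 360 s / 3600 = 0.1 h, and 90 minutes is 5400 s / 3600 = 1.5 h. Calling total_seconds() on each element and using the vectorized .dt.total_seconds() accessor give the same hours.

```python
import pandas as pd

# Hypothetical offsets from the first measurement: 0, 6 and 90 minutes.
offsets = pd.Series(pd.to_timedelta([0, 6, 90], unit='min'))

# Per element: each item is a pandas Timedelta with a total_seconds() method.
loop_hours = [to.total_seconds() / 3600. for to in offsets]

# Vectorized: the .dt accessor applies total_seconds() to the whole column at once.
hours = offsets.dt.total_seconds() / 3600.

print(hours.tolist())  # [0.0, 0.1, 1.5]
print(loop_hours == hours.tolist())  # True
```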
Finally I write a CSV file with the to_csv method, keeping only the time since the first measurement (TimeOffsetHours), the predicted value (Pred6), and the measurement columns (Backup and Acoustc). Missing values are written as NA:
d.to_csv('BatteryParkTideData.csv', na_rep='NA', columns=['TimeOffsetHours', 'Pred6', 'Backup', 'Acoustc'], index=False)
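Here's a small sketch of the to_csv call on a hypothetical frame, written to an in-memory buffer so the output is easy to inspect. columns= picks (and orders) the columns to write, na_rep='NA' replaces missing values, and index=False drops the row index. Older pandas versions accepted cols= for the column list; current versions use columns=.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical frame with one missing measurement and one column we don't want.
d = pd.DataFrame({
    'TimeOffsetHours': [0.0, 0.1],
    'Pred6': [1.2, 1.3],
    'Acoustc': [1.25, np.nan],
    'Extra': ['x', 'y'],
})

buf = io.StringIO()
# Write only the selected columns, with NaN rendered as "NA" and no index column.
d.to_csv(buf, na_rep='NA', columns=['TimeOffsetHours', 'Pred6', 'Acoustc'], index=False)
print(buf.getvalue())
```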