This is part of Python for Geosciences notes.
================
!wget ftp://sidads.colorado.edu/pub/DATASETS/nsidc0051_gsfc_nasateam_seaice/final-gsfc/north/monthly/nt_200709_f17_v01_n.bin
Create file id:
ice = fromfile('nt_200709_f17_v01_n.bin', dtype='uint8')
We use uint8 data type. List of numpy data types
The file format consists of a 300-byte descriptive header followed by a two-dimensional array.
ice = ice[300:]
Reshape
ice = ice.reshape(448,304)
Simple visualisation of array with imshow (Matplotlib function):
imshow(ice)
colorbar()
<matplotlib.colorbar.Colorbar instance at 0xb05bb2c>
To convert to the fractional parameter range of 0.0 to 1.0, divide the scaled data in the file by 250.
ice = ice/250.
imshow(ice)
colorbar()
<matplotlib.colorbar.Colorbar instance at 0xb20c50c>
Let's mask all land and missing values:
ice_masked = ma.masked_greater(ice, 1.0)
imshow(ice_masked)
colorbar()
<matplotlib.colorbar.Colorbar instance at 0xb2e79ac>
Masking in this case is similar to using NaN in Matlab. More about NumPy masked arrays
fid = open('My_ice_2007.bin', 'wb')
ice.tofile(fid)
fid.close()
In order to work with other data formats we need to use one of the SciPy submodules:
General purpose scientific library (that consist of bunch of sublibraries) and builds on NumPy arrays.
We are going to use only scipy.io library.
First we have to load function that works with Matlab files:
from scipy.io import loadmat
We are going to download Polar science center Hydrographic Climatology (PHC) for January in Matlab format.
!wget https://www.dropbox.com/s/0kuzvz03gw6d393/PHC_jan.mat
Open file:
all_variables = loadmat('PHC_jan.mat')
We can look at the names of variables stored in the file:
all_variables.keys()
['PTEMP1', 'LON', '__header__', '__globals__', 'DEPTH', 'LAT', '__version__']
We need only PTEMP1 (3d potential temperature).
temp = numpy.array(all_variables['PTEMP1'])
Check variable's shape:
temp.shape
(33, 180, 360)
Show surface level:
imshow(temp[0,:,:])
colorbar()
<matplotlib.colorbar.Colorbar instance at 0xb6383ac>
Import nessesary function:
from scipy.io import netcdf
I am going to download NCEP reanalysis data. Surface 4 daily air temperature for 2012.
!wget ftp://ftp.cdc.noaa.gov/Datasets/ncep.reanalysis/surface/air.sig995.2012.nc
#Alternative for the times of US goverment shutdowns:
#!wget http://database.rish.kyoto-u.ac.jp/arch/ncep/data/ncep.reanalysis/surface/air.sig995.2012.nc
--2013-10-27 00:19:39-- ftp://ftp.cdc.noaa.gov/Datasets/ncep.reanalysis/surface/air.sig995.2012.nc => ‘air.sig995.2012.nc’ Resolving ftp.cdc.noaa.gov (ftp.cdc.noaa.gov)... 140.172.38.117 Connecting to ftp.cdc.noaa.gov (ftp.cdc.noaa.gov)|140.172.38.117|:21... connected. Logging in as anonymous ... Logged in! ==> SYST ... done. ==> PWD ... done. ==> TYPE I ... done. ==> CWD (1) /Datasets/ncep.reanalysis/surface ... done. ==> SIZE air.sig995.2012.nc ... 30793412 ==> PASV ... done. ==> RETR air.sig995.2012.nc ... done. Length: 30793412 (29M) (unauthoritative) 100%[======================================>] 30.793.412 1,22MB/s in 27s 2013-10-27 00:20:09 (1,10 MB/s) - ‘air.sig995.2012.nc’ saved [30793412]
Create file id:
fnc = netcdf.netcdf_file('air.sig995.2012.nc')
It's not really file id, it's netcdf_file object, that have some methods and attributes:
fnc.description
'Data is from NMC initialized reanalysis\n(4x/day). These are the 0.9950 sigma level values.'
fnc.history
'created 2011/12 by Hoop (netCDF2.3)'
list variables
fnc.variables
{'air': <scipy.io.netcdf.netcdf_variable at 0xb66140c>, 'lat': <scipy.io.netcdf.netcdf_variable at 0xb66172c>, 'lon': <scipy.io.netcdf.netcdf_variable at 0xb66162c>, 'time': <scipy.io.netcdf.netcdf_variable at 0xb6614ec>}
Access information about variables
air = fnc.variables['air']
This time we create netcdf_variable object, that contain among other things attributes of the netCDF variable as well as data themselves.
air.actual_range
array([ 191.1000061, 323. ], dtype=float32)
air.long_name
'4xDaily Air temperature at sigma level 995'
air.units
'degK'
print(air.add_offset)
print(air.scale_factor)
512.81 0.01
air.shape
(1464, 73, 144)
We can access the data by simply using array syntax. Here we show first time step of our data set:
imshow(air[0,:,:])
colorbar()
<matplotlib.colorbar.Colorbar instance at 0xbb6972c>
Data are packed. In order to unpack we have to use scale_factor and add_offset values from attributes of the netCDF air variable. We also convert to $^{\circ}$C:
air_c = ((air[:] * air.scale_factor) + air.add_offset) - 273.15
Check the values of our new variable:
imshow(air_c[0,:,:])
colorbar()
<matplotlib.colorbar.Colorbar instance at 0xbd5ab6c>
Minimalistic variant :)
!rm test_netcdf.nc
fw = netcdf.netcdf_file('test_netcdf.nc', 'w')
fw.createDimension('t', 1464)
fw.createDimension('y', 73)
fw.createDimension('x', 144)
air_var = fw.createVariable( 'air','float32', ('t', 'y', 'x'))
air_var[:] = air_c[:]
fw.close()
More descriptive variant:
!rm test_netcdf.nc
fw = netcdf.netcdf_file('test_netcdf.nc', 'w')
fw.createDimension('TIME', 1464)
fw.createDimension('LATITUDE', 73)
fw.createDimension('LONGITUDE', 144)
time = fw.createVariable('TIME', 'f', ('TIME',))
time[:] = fnc.variables['time'][:]
time.units = 'hours since 1-1-1 00:00:0.0'
lat = fw.createVariable('LATITUDE', 'f', ('LATITUDE',))
lat[:] = fnc.variables['lat'][:]
lon = fw.createVariable('LONGITUDE', 'f', ('LONGITUDE',))
lon[:] = fnc.variables['lon'][:]
ha = fw.createVariable('New_air','f', ('TIME', 'LATITUDE', 'LONGITUDE'))
ha[:] = air_c[:]
ha.missing_value = -9999.
fw.close()