Our previous lessons have shown us how to write programs that ingest a list of data files, perform some calculations on those data, and then print a final result to the screen. While this was a useful exercise in learning the principles of scripting and parsing the command line, in most cases the output of our programs will not be so simple. Instead, programs typically take data as input, manipulate that data, and then output yet more data. Over the course of a multi-year research project, most researchers will write many different programs that produce many different output datasets.
We want to:
Along the way, we will learn:
In this lesson we are going to process some of the climate model data that was submitted to the CMIP5 project. This project informed many of the results presented in the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report, making it one of the most widely used datasets in the world.
First off, let's see what files we've got:
!ls *.nc
uas_Amon_ACCESS1-3_rcp85_r1i1p1_205006-205008_aus.nc vas_Amon_ACCESS1-3_rcp85_r1i1p1_205006-205008_aus.nc
The first thing to notice is the distinctive Data Reference Syntax (DRS) associated with CMIP5. Modelling groups contributing to the project must name their files according to the following structure:
<variable name>_<MIP table>_<model>_<experiment>_<ensemble member>_<temporal subset>_<geographical info>.nc
From this we can deduce, without even inspecting the contents of the file, that we have surface zonal (i.e. east-west; abbreviated `uas`) and meridional (i.e. north-south; abbreviated `vas`) wind speed data. It belongs to the atmospheric, monthly timescale data group (`Amon`) and was derived from an Australian model known as `ACCESS1-3`. The external forcing applied to the model was that corresponding to the `rcp85` scenario (high future greenhouse gas emissions), it was the `r1i1p1` realisation of the model, and we have the data for June, July and August of the year 2050 for the Australian (`aus`) region.
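Because the file name structure is fixed, we can recover all of this information programmatically. Here's a minimal sketch (the function name `parse_drs_filename` is our own invention) that splits a CMIP5 file name into its DRS components:

```python
import os

def parse_drs_filename(fname):
    """Split a CMIP5 file name into its DRS components.

    Assumes the optional geographical info field is present,
    as it is for our files.
    """
    keys = ('variable', 'mip_table', 'model', 'experiment',
            'ensemble', 'time_span', 'region')
    root = os.path.splitext(os.path.basename(fname))[0]
    return dict(zip(keys, root.split('_')))

facts = parse_drs_filename('uas_Amon_ACCESS1-3_rcp85_r1i1p1_205006-205008_aus.nc')
print(facts['variable'], facts['model'], facts['region'])
# uas ACCESS1-3 aus
```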
The DRS for CMIP5 actually goes further than just the file name. If you download a whole heap of CMIP5 data, it comes with the following directory structure:
/<activity>/<product>/<institute>/<model>/<experiment>/<frequency>/<modeling realm>/<variable name>/<ensemble member>/
In the first instance this level of detail seems like overkill, but consider the scope of the CMIP5 data archive. It contains data from over 50 climate models for upwards of 100 different variables and 50 or so different experiments, for which each modelling group typically provides between 3 and 10 different realisations. Since the data are so well labelled, calculating the average surface temperature (`tas`) across the `r1i1p1` realisation of all models that provided monthly timescale data for the `rcp85` scenario can be achieved with a single `cdo` bash shell command, which is truly amazing:
cdo ensmean /*/*/*/*/rcp85/mon/*/tas/r1i1p1/tas_Amon_*_rcp85_r1i1p1_*.nc tas_ensmean.nc
Unless your research involves analysing CMIP5 data, you may never deal with such a large dataset. Nevertheless, it is a very good idea to develop your own personal DRS for the data that you do have. This often involves investing some time at the beginning of a project to think carefully about the design of your directory and file name structures (as these can be very hard to change later on). The combination of bash shell wildcards and a well planned DRS is one of the easiest ways to make your research more efficient and reliable.
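To see why, here's a sketch using Python's `fnmatch` module and some hypothetical file paths laid out according to a personal DRS: a single wildcard pattern immediately isolates the files of interest.

```python
import fnmatch

# Hypothetical paths following a personal DRS:
# /data/<project>/<model>/<experiment>/<frequency>/<variable>/<file>
paths = [
    '/data/cmip5/ACCESS1-3/rcp85/mon/tas/tas_Amon_ACCESS1-3_rcp85_r1i1p1.nc',
    '/data/cmip5/CSIRO-Mk3-6-0/rcp85/mon/tas/tas_Amon_CSIRO-Mk3-6-0_rcp85_r1i1p1.nc',
    '/data/cmip5/ACCESS1-3/historical/mon/tas/tas_Amon_ACCESS1-3_historical_r1i1p1.nc',
]

# Select every monthly rcp85 tas file, regardless of model
pattern = '/data/cmip5/*/rcp85/mon/tas/tas_Amon_*_rcp85_r1i1p1.nc'
matches = [path for path in paths if fnmatch.fnmatch(path, pattern)]
print(matches)
```

With real files on disk, `glob.glob(pattern)` would do the same job against the filesystem.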
We haven't even looked inside our CMIP5 data files and already we have the beginnings of a detailed data management plan. The first step in any research project should be to develop such a plan, so for this challenge we are going to turn back time. If you could start your current research project all over again, what would your data management plan look like? Things to consider include:
Write down and discuss your plan with your partner.
Now that we've identified our CMIP5 files, let's go ahead and look at what's inside. Our initial impulse might be to enter
!cat uas_Amon_ACCESS1-3_rcp85_r1i1p1_205006-205008_aus.nc
but in this case such a command will produce an incomprehensible mix of symbols and letters. The reason is that up until now we have been dealing with text files. These consist of a simple sequence of character data (represented using ASCII, Unicode, or some other standard) separated into lines, meaning that text files are human-readable when opened with a text editor or displayed using `cat`.
All other file types are known collectively as binary files. They tend to be smaller and faster for the computer to interpret than text files, but the trade-off is that they aren't human-readable unless you have the right interpreter (e.g. `.doc` files aren't readable with your text editor and must instead be opened with Microsoft Word). In this case we have a Network Common Data Form (netCDF) file, so we need to use a special command line utility called `ncdump` to view the contents.
!ncdump -h uas_Amon_ACCESS1-3_rcp85_r1i1p1_205006-205008_aus.nc
netcdf uas_Amon_ACCESS1-3_rcp85_r1i1p1_205006-205008_aus { dimensions: lon = 26 ; nb2 = 2 ; lat = 28 ; time = UNLIMITED ; // (3 currently) variables: double lon(lon) ; lon:standard_name = "longitude" ; lon:long_name = "longitude" ; lon:units = "degrees_east" ; lon:axis = "X" ; lon:bounds = "lon_bnds" ; double lon_bnds(lon, nb2) ; double lat(lat) ; lat:standard_name = "latitude" ; lat:long_name = "latitude" ; lat:units = "degrees_north" ; lat:axis = "Y" ; lat:bounds = "lat_bnds" ; double lat_bnds(lat, nb2) ; double time(time) ; time:standard_name = "time" ; time:bounds = "time_bnds" ; time:units = "days since 1-01-01 00:00:00" ; time:calendar = "proleptic_gregorian" ; double time_bnds(time, nb2) ; time_bnds:units = "days since 1-01-01 00:00:00" ; time_bnds:calendar = "proleptic_gregorian" ; float uas(time, lat, lon) ; uas:standard_name = "eastward_wind" ; uas:long_name = "Eastward Near-Surface Wind" ; uas:units = "m s-1" ; uas:_FillValue = 1.e+20f ; uas:cell_methods = "time: mean" ; uas:history = "2012-03-14T04:40:42Z altered by CMOR: Treated scalar dimension: \'height\'. 2012-03-14T04:40:42Z altered by CMOR: replaced missing value flag (-1.07374e+09) with standard missing value (1e+20)." ; uas:associated_files = "baseURL: http://cmip-pcmdi.llnl.gov/CMIP5/dataLocation gridspecFile: gridspec_atmos_fx_ACCESS1-3_rcp85_r0i0p0.nc" ; // global attributes: :CDI = "Climate Data Interface version 1.5.6 (http://code.zmaw.de/projects/cdi)" ; :Conventions = "CF-1.4" ; :history = "Thu Nov 07 14:19:44 2013: cdo sellonlatbox,110,160,-45,-10 uas_Amon_ACCESS1-3_rcp85_r1i1p1_205006-205008.nc uas_Amon_ACCESS1-3_rcp85_r1i1p1_205006-205008_aus.nc\n", "Thu Nov 07 14:13:51 2013: cdo seldate,2050-06-01,2050-08-31 uas_Amon_ACCESS1-3_rcp85_r1i1p1_200601-210012.nc uas_Amon_ACCESS1-3_rcp85_r1i1p1_205006-205008.nc\n", "CMIP5 compliant file produced from raw ACCESS model output using the ACCESS Post-Processor and CMOR2. 
2012-03-14T04:40:43Z CMOR rewrote data to comply with CF standards and CMIP5 requirements. Fri Apr 13 12:32:01 2012: corrected model_id from ACCESS1-3 to ACCESS1.3 Fri Apr 13 14:07:50 2012: forcing attribute modified to correct value Wed May 2 13:39:09 2012: updated version number to v20120413." ; :institution = "CSIRO (Commonwealth Scientific and Industrial Research Organisation, Australia), and BOM (Bureau of Meteorology, Australia)" ; :institute_id = "CSIRO-BOM" ; :experiment_id = "rcp85" ; :model_id = "ACCESS1.3" ; :forcing = "GHG, Oz, SA, Sl, Vl, BC, OC, (GHG = CO2, N2O, CH4, CFC11, CFC12, CFC113, HCFC22, HFC125, HFC134a)" ; :parent_experiment_id = "historical" ; :parent_experiment_rip = "r1i1p1" ; :branch_time = 732311. ; :contact = "The ACCESS wiki: http://wiki.csiro.au/confluence/display/ACCESS/Home. Contact Tony.Hirst@csiro.au regarding the ACCESS coupled climate model. Contact Peter.Uhe@csiro.au regarding ACCESS coupled climate model CMIP5 datasets." ; :references = "See http://wiki.csiro.au/confluence/display/ACCESS/ACCESS+Publications" ; :initialization_method = 1 ; :physics_version = 1 ; :tracking_id = "724f536a-c5fa-4a68-85f1-ff277af34c75" ; :version_number = "v20120413" ; :product = "output" ; :experiment = "RCP8.5" ; :frequency = "mon" ; :creation_date = "2012-03-14T04:40:43Z" ; :project_id = "CMIP5" ; :table_id = "Table Amon (01 February 2012) 01388cb4507c2f05326b711b09604e7e" ; :title = "ACCESS1-3 model output prepared for CMIP5 RCP8.5" ; :parent_experiment = "historical" ; :modeling_realm = "atmos" ; :realization = 1 ; :cmor_version = "2.8.0" ; :CDO = "Climate Data Operators version 1.5.6.1 (http://code.zmaw.de/projects/cdo)" ; }
By using the `-h` flag, only the header of the file is shown. The great thing about netCDF files is that the header contains metadata, that is, data about the data. Each variable has its own 'variable attributes' (e.g. the `lat` axis has a `long_name` and `units` attribute), and there is also a whole suite of global attributes that describe the history of the file. When we write our own netCDF output later on, we will discuss the conventions around netCDF metadata in more detail.
To read in our data, we are going to use a library known as the Climate Data Management System (`cdms2`). This library is part of a larger open-source software package called the Ultrascale Visualisation Climate Data Analysis Tools (UV-CDAT).
import cdms2
u_name = 'uas_Amon_ACCESS1-3_rcp85_r1i1p1_205006-205008_aus.nc'
u_file = cdms2.open(u_name)
u_data = u_file('uas')
v_name = 'vas_Amon_ACCESS1-3_rcp85_r1i1p1_205006-205008_aus.nc'
v_file = cdms2.open(v_name)
v_data = v_file('vas')
Our two variables, `u_data` and `v_data`, are cdms2 transient variables.
print 'u_data is of type:', type(u_data)
u_data is of type: <class 'cdms2.tvariable.TransientVariable'>
The nice thing about these transient variables is that they carry the netCDF metadata with them.
print 'Metadata about the time axis:'
print u_data.getTime()
print 'Raw time axis values:'
print u_data.getTime()[:]
print 'Time axis values in a friendlier format:'
print u_data.getTime().asComponentTime()
Metadata about the time axis: id: time Designated a time axis. units: days since 1-01-01 00:00:00 Length: 3 First: 748548.0 Last: 748609.5 Other axis attributes: standard_name: time calendar: proleptic_gregorian axis: T Python id: 0x3599190 Raw time axis values: [ 748548. 748578.5 748609.5] Time axis values in a friendlier format: [2050-6-16 0:0:0.0, 2050-7-16 12:0:0.0, 2050-8-16 12:0:0.0]
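Under the hood, a conversion like `asComponentTime()` is just date arithmetic against the reference date given in the `units` attribute. We can sketch the same calculation ourselves with the standard `datetime` library (which, conveniently, also uses the proleptic Gregorian calendar that this file specifies):

```python
import datetime

# Reference date from the units attribute: 'days since 1-01-01 00:00:00'
ref = datetime.datetime(1, 1, 1)

# The raw time axis values we printed above
for value in [748548.0, 748578.5, 748609.5]:
    print(ref + datetime.timedelta(days=value))
# 2050-06-16 00:00:00
# 2050-07-16 12:00:00
# 2050-08-16 12:00:00
```

Note this only works because the file's calendar matches `datetime`'s; other CMIP5 models use calendars (e.g. 360-day) that `datetime` can't represent, which is exactly why libraries like `cdms2` handle this for us.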
We can now go ahead and calculate the wind speed,
wsp_data = (u_data**2 + v_data**2)**0.5
and our transient variables are smart enough to pass along the relevant metadata to our new variable:
print 'Metadata about the time axis:'
print wsp_data.getTime()
print 'Time axis values in a friendlier format:'
print wsp_data.getTime().asComponentTime()
Metadata about the time axis: id: time Designated a time axis. units: days since 1-01-01 00:00:00 Length: 3 First: 748548.0 Last: 748609.5 Other axis attributes: standard_name: time calendar: proleptic_gregorian axis: T Python id: 0x321db50 Time axis values in a friendlier format: [2050-6-16 0:0:0.0, 2050-7-16 12:0:0.0, 2050-8-16 12:0:0.0]
Climate and Forecast (CF) metadata convention
It's incredibly useful that libraries like `cdms2` can make use of the metadata stored in netCDF files to create methods like `asComponentTime()`. However, let's put ourselves in the shoes of the developers of `cdms2` for a minute. In order to convert the time axis to a meaningful list of dates, the library needs to first identify the units of the time axis. This isn't as easy as you'd think, since the creator of the netCDF file could easily have called the `units` attribute `measure`, or `scale`, or something else completely unpredictable instead. They could also have defined the units as `weeks since 1-01-01 00:00:00`, or `milliseconds after 1979-12-31`. Obviously what is needed is a standard method for defining netCDF attributes, and that's where the Climate and Forecast (CF) metadata convention comes in.
The CF metadata standard was first defined back in the early 2000s and has now been adopted by all the major institutions and projects in the weather/climate sciences. There is a nice blog post on the topic if you'd like more information, but for the most part you just need to be aware that if a tool you're using isn't working, it might be because your netCDF file isn't CF compliant.
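To make the problem concrete, here's a rough sketch of the kind of parsing a library has to do; it only works because CF fixes the form of the time `units` string to `<units> since <reference date>` (the function name `parse_time_units` is our own invention):

```python
import re

def parse_time_units(units):
    """Split a CF-style time units string into its unit and
    reference-date parts (e.g. 'days since 1-01-01 00:00:00')."""
    match = re.match(r'(\w+) since (.+)', units)
    if match is None:
        raise ValueError('time units are not CF compliant: %r' % units)
    return match.group(1), match.group(2)

print(parse_time_units('days since 1-01-01 00:00:00'))
# ('days', '1-01-01 00:00:00')
```

A non-compliant string like `measure` would simply raise an error, which is essentially what happens (with varying degrees of grace) when real tools meet non-compliant files.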
Before we go ahead and create a new script (`calc_wind_speed.py`) for calculating the wind speed, there's just one more thing to consider. Looking closely at the global attributes of `uas_Amon_ACCESS1-3_rcp85_r1i1p1_205006-205008_aus.nc`, you can see that the entire history of the file, all the way back to its initial download, has been recorded in the `history` attribute.
global_atts = u_file.attributes
old_history = global_atts['history']
print old_history
Thu Nov 07 14:19:44 2013: cdo sellonlatbox,110,160,-45,-10 uas_Amon_ACCESS1-3_rcp85_r1i1p1_205006-205008.nc uas_Amon_ACCESS1-3_rcp85_r1i1p1_205006-205008_aus.nc Thu Nov 07 14:13:51 2013: cdo seldate,2050-06-01,2050-08-31 uas_Amon_ACCESS1-3_rcp85_r1i1p1_200601-210012.nc uas_Amon_ACCESS1-3_rcp85_r1i1p1_205006-205008.nc CMIP5 compliant file produced from raw ACCESS model output using the ACCESS Post-Processor and CMOR2. 2012-03-14T04:40:43Z CMOR rewrote data to comply with CF standards and CMIP5 requirements. Fri Apr 13 12:32:01 2012: corrected model_id from ACCESS1-3 to ACCESS1.3 Fri Apr 13 14:07:50 2012: forcing attribute modified to correct value Wed May 2 13:39:09 2012: updated version number to v20120413.
The last two entries, for instance, were generated by the `cdo` package when it was used to select a temporal (`seldate`) and spatial (`sellonlatbox`) subset of the original data file. This practice of recording the history of the file ensures the provenance of the data. In other words, a complete record of everything that has been done to the data is stored with the data, which avoids any confusion in the event that the data is ever moved, passed around to different users, or viewed by its creator many months later.
If we want to create our own entry for the history attribute, we'll need to be able to generate a time stamp, capture the command line entry and Python executable used to run the script, and identify the version (git hash) of the script itself.
calc_wind_speed.py
A library called `datetime` can be used to find out the time and date right now:
import datetime
time_stamp = datetime.datetime.now().strftime("%a %b %d %H:%M:%S %Y")
print time_stamp
Tue Jun 10 16:59:58 2014
The `strftime` function can be used to customise the appearance of a datetime object; in this case we've made it look just like the other time stamps in our data files.
In the Software Carpentry lesson on command line programs we met `sys.argv`, which contains all the arguments entered by the user at the command line:
import sys
print sys.argv
['-c', '-f', '/home/dbirving/.ipython/profile_default/security/kernel-0f9c6be1-a1d7-4bd9-8f12-c7772ab46070.json', "--IPKernelApp.parent_appname='ipython-notebook'", '--profile-dir', '/home/dbirving/.ipython/profile_default', '--parent=1']
In launching this IPython notebook, you can see that a number of command line arguments were used. To join all these list elements up, we can use the `join` function that belongs to Python strings:
args = " ".join(sys.argv)
print args
-c -f /home/dbirving/.ipython/profile_default/security/kernel-0f9c6be1-a1d7-4bd9-8f12-c7772ab46070.json --IPKernelApp.parent_appname='ipython-notebook' --profile-dir /home/dbirving/.ipython/profile_default --parent=1
While this list of arguments is very useful, it doesn't tell us which Python installation was used to execute those arguments. The `sys` library can help us out here too:
exe = sys.executable
print exe
/usr/local/uvcdat/1.5.1/bin/python
In the Software Carpentry lessons on git we learned that each commit is associated with a unique 40-character identifier known as a hash. We can use the git Python library to get the hash associated with the script:
from git import Repo
import os
git_hash = Repo(os.getcwd()).head.commit.hexsha
print git_hash
2db1f92b517fe4262f4c8d98d2ec8b9b4c89b154
We can now put all this information together for our history entry:
entry = """%s: %s %s (Git hash: %s)""" %(time_stamp, exe, args, git_hash[0:7])
print entry
Tue Jun 10 16:59:58 2014: /usr/local/uvcdat/1.5.1/bin/python -c -f /home/dbirving/.ipython/profile_default/security/kernel-0f9c6be1-a1d7-4bd9-8f12-c7772ab46070.json --IPKernelApp.parent_appname='ipython-notebook' --profile-dir /home/dbirving/.ipython/profile_default --parent=1 (Git hash: 2db1f92)
So far we've been experimenting in the IPython notebook to familiarise ourselves with UV-CDAT and the other Python modules that might be useful for calculating the wind speed. We should now go ahead and write a script, so we can repeat the process with a single entry at the command line:
!cat calc_wind_speed.py
import os, sys
import datetime
from git import Repo
import cdms2

cdms2.setNetcdfShuffleFlag(0)
cdms2.setNetcdfDeflateFlag(0)
cdms2.setNetcdfDeflateLevelFlag(0)


def main():
    script = sys.argv[0]
    u_file = sys.argv[1]
    u_var = sys.argv[2]
    v_file = sys.argv[3]
    v_var = sys.argv[4]
    outfile_name = sys.argv[5]

    u_data, ufile_atts = read_data(u_file, u_var)
    v_data, vfile_atts = read_data(v_file, v_var)

    wsp_data = calc_wsp(u_data, v_data)
    write_output(wsp_data, ufile_atts, outfile_name)


def read_data(ifile, var):
    """Read data from ifile corresponding to the var variable"""
    fin = cdms2.open(ifile)
    data = fin(var)
    file_atts = fin.attributes
    fin.close()
    return data, file_atts


def calc_wsp(uwnd, vwnd):
    """Calculate the wind speed and create relevant attributes"""
    wsp = (uwnd**2 + vwnd**2)**0.5
    wsp.id = 'wsp'
    wsp.long_name = 'Wind speed'
    wsp.units = 'm s-1'
    return wsp


def write_output(wsp_data, ufile_atts, outfile_name):
    """Write the output file"""
    outfile = cdms2.open(outfile_name, 'w')

    new_history = create_history()
    old_history = ufile_atts['history']
    setattr(outfile, 'history', """%s\n%s""" %(new_history, old_history))
    for att_name in ufile_atts.keys():
        if att_name != "history":  # history excluded because we've already done it
            setattr(outfile, att_name, ufile_atts[att_name])

    outfile.write(wsp_data)
    outfile.close()


def create_history():
    """Create the new entry for the global history file attribute"""
    time_stamp = datetime.datetime.now().strftime("%a %b %d %H:%M:%S %Y")
    exe = sys.executable
    args = " ".join(sys.argv)
    git_hash = Repo(os.getcwd()).head.commit.hexsha
    return """%s: %s %s (Git hash: %s)""" %(time_stamp, exe, args, git_hash[0:7])


main()
(The `cdms2.setNetcdf...` commands simply specify that we want the classic netCDF format; see this post for more details on netCDF formats.)
We can now run this script at the command line:
!/usr/local/uvcdat/1.5.1/bin/python calc_wind_speed.py uas_Amon_ACCESS1-3_rcp85_r1i1p1_205006-205008_aus.nc uas vas_Amon_ACCESS1-3_rcp85_r1i1p1_205006-205008_aus.nc vas wsp_Amon_ACCESS1-3_rcp85_r1i1p1_205006-205008_aus.nc
We can now inspect the attributes in our new file:
!ncdump -h wsp_Amon_ACCESS1-3_rcp85_r1i1p1_205006-205008_aus.nc
netcdf wsp_Amon_ACCESS1-3_rcp85_r1i1p1_205006-205008_aus { dimensions: time = UNLIMITED ; // (3 currently) bound = 2 ; lat = 28 ; lon = 26 ; variables: double time(time) ; time:bounds = "time_bnds" ; time:units = "days since 1-01-01 00:00:00" ; time:standard_name = "time" ; time:calendar = "proleptic_gregorian" ; time:axis = "T" ; double time_bnds(time, bound) ; double lat(lat) ; lat:bounds = "lat_bnds" ; lat:units = "degrees_north" ; lat:long_name = "latitude" ; lat:standard_name = "latitude" ; lat:axis = "Y" ; double lat_bnds(lat, bound) ; double lon(lon) ; lon:bounds = "lon_bnds" ; lon:modulo = 360. ; lon:long_name = "longitude" ; lon:standard_name = "longitude" ; lon:units = "degrees_east" ; lon:axis = "X" ; lon:topology = "circular" ; double lon_bnds(lon, bound) ; float wsp(time, lat, lon) ; wsp:associated_files = "baseURL: http://cmip-pcmdi.llnl.gov/CMIP5/dataLocation gridspecFile: gridspec_atmos_fx_ACCESS1-3_rcp85_r0i0p0.nc" ; wsp:long_name = "Wind speed" ; wsp:standard_name = "eastward_wind" ; wsp:cell_methods = "time: mean" ; wsp:units = "m s-1" ; wsp:missing_value = 1.e+20f ; wsp:history = "2012-03-14T04:40:42Z altered by CMOR: Treated scalar dimension: \'height\'. 2012-03-14T04:40:42Z altered by CMOR: replaced missing value flag (-1.07374e+09) with standard missing value (1e+20)." 
; // global attributes: :Conventions = "CF-1.4" ; :history = "Tue Jun 10 17:00:00 2014: /usr/local/uvcdat/1.5.1/bin/python calc_wind_speed.py uas_Amon_ACCESS1-3_rcp85_r1i1p1_205006-205008_aus.nc uas vas_Amon_ACCESS1-3_rcp85_r1i1p1_205006-205008_aus.nc vas wsp_Amon_ACCESS1-3_rcp85_r1i1p1_205006-205008_aus.nc (Git hash: 2db1f92)\n", "Thu Nov 07 14:19:44 2013: cdo sellonlatbox,110,160,-45,-10 uas_Amon_ACCESS1-3_rcp85_r1i1p1_205006-205008.nc uas_Amon_ACCESS1-3_rcp85_r1i1p1_205006-205008_aus.nc\n", "Thu Nov 07 14:13:51 2013: cdo seldate,2050-06-01,2050-08-31 uas_Amon_ACCESS1-3_rcp85_r1i1p1_200601-210012.nc uas_Amon_ACCESS1-3_rcp85_r1i1p1_205006-205008.nc\n", "CMIP5 compliant file produced from raw ACCESS model output using the ACCESS Post-Processor and CMOR2. 2012-03-14T04:40:43Z CMOR rewrote data to comply with CF standards and CMIP5 requirements. Fri Apr 13 12:32:01 2012: corrected model_id from ACCESS1-3 to ACCESS1.3 Fri Apr 13 14:07:50 2012: forcing attribute modified to correct value Wed May 2 13:39:09 2012: updated version number to v20120413." ; :initialization_method = 1 ; :CDI = "Climate Data Interface version 1.5.6 (http://code.zmaw.de/projects/cdi)" ; :product = "output" ; :creation_date = "2012-03-14T04:40:43Z" ; :frequency = "mon" ; :references = "See http://wiki.csiro.au/confluence/display/ACCESS/ACCESS+Publications" ; :title = "ACCESS1-3 model output prepared for CMIP5 RCP8.5" ; :experiment = "RCP8.5" ; :realization = 1 ; :project_id = "CMIP5" ; :institute_id = "CSIRO-BOM" ; :model_id = "ACCESS1.3" ; :parent_experiment_id = "historical" ; :experiment_id = "rcp85" ; :cmor_version = "2.8.0" ; :parent_experiment = "historical" ; :modeling_realm = "atmos" ; :branch_time = 732311. 
; :institution = "CSIRO (Commonwealth Scientific and Industrial Research Organisation, Australia), and BOM (Bureau of Meteorology, Australia)" ; :version_number = "v20120413" ; :forcing = "GHG, Oz, SA, Sl, Vl, BC, OC, (GHG = CO2, N2O, CH4, CFC11, CFC12, CFC113, HCFC22, HFC125, HFC134a)" ; :CDO = "Climate Data Operators version 1.5.6.1 (http://code.zmaw.de/projects/cdo)" ; :physics_version = 1 ; :contact = "The ACCESS wiki: http://wiki.csiro.au/confluence/display/ACCESS/Home. Contact Tony.Hirst@csiro.au regarding the ACCESS coupled climate model. Contact Peter.Uhe@csiro.au regarding ACCESS coupled climate model CMIP5 datasets." ; :table_id = "Table Amon (01 February 2012) 01388cb4507c2f05326b711b09604e7e" ; :tracking_id = "724f536a-c5fa-4a68-85f1-ff277af34c75" ; :parent_experiment_rip = "r1i1p1" ; }
Since most of the file attributes were inherited by default from the input data file (i.e. the u-wind file), it's worth checking to see if there are any that don't make sense. Sure enough, the standard name is misleading:
wsp:standard_name = "eastward_wind"
We should revise our script so that it removes or renames the standard name, but beyond that we should resist the urge to start cleaning up. The `associated_files` wind speed attribute, for instance, makes little sense to anyone who isn't involved in the CMIP5 project. While this might seem like a reasonable argument for deleting that attribute, once an attribute is deleted it's gone forever. The `associated_files` attribute takes up a negligible amount of memory, so why not leave it there just in case? When in doubt, keep metadata.
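The rename itself is a one-line change to `calc_wsp`; `wind_speed` is a valid CF standard name. Here's a sketch, using a `SimpleNamespace` as a stand-in for the cdms2 transient variable so the logic can be checked in isolation (attribute assignment works the same way on the real thing):

```python
from types import SimpleNamespace

def fix_standard_name(wsp):
    """Replace the standard_name inherited from the u-wind input
    with one that actually describes wind speed."""
    wsp.standard_name = 'wind_speed'
    return wsp

# Stand-in for the variable returned by calc_wsp
wsp = SimpleNamespace(standard_name='eastward_wind', units='m s-1')
print(fix_standard_name(wsp).standard_name)
# wind_speed
```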
Does your data management plan from the first challenge adequately address this issue of data provenance? If not, go ahead and add to your plan now. Things to consider include:

- How will you record the history of files whose format can't store it for you? (Unlike self-describing `.nc` files, formats like `.csv` or `.png` don't store things like global and variable attributes within them.)

Discuss the additions you've made to your plan with your partner.