Notebook

Food Energy Flows Exploratorium

A project by Dénes Csala and Sgouris Sgouridis at Lancaster University

Food Energy Flows Data Parser Notebook

This workbook will guide you through the mining, processing, formatting and saving the input data for the data analysis and visualizations.

This is document has been created using IPython Notebook in the Anaconda distribution and it can be edited and run in active mode by clicking download in top right corner of this page. The code is partitioned into sections, called cells. When you are using this workbook in active mode, double-click on a cell to edit it and then run using Ctrl + Enter. Hitting Shift + Enter runs the code and steps into the next cell, while Alt + Enter runs the code and adds a new, empty cell. If you are running this notebook on a presonal computer, you will need a machine with at least 1GB of memory (2GB recommended) and a processor of 1GHz.

Code

1. Data access

We start with importing the libraries that we are going to use into the Python workspace.

In [3]:

import numpy as np, requests, zipfile, StringIO, pandas as pd, json, copy

#suppres warnings
import warnings
warnings.simplefilter(action = "ignore")

Afer importing the necessary libraries, we connect to the FAOSTAT database and define indicator paths. Alternatively, the FAOSTAT database is mirrored at the project website's data repository (FAOSTAT, has no directory listing access, just on a file-by-file basis). FAOSTAT offers the data in the format of zipped CSV (comma-separated values) files, and it has a separate file for each of their 17 domains. Should you choose to download the entire database to your local folder, please set the variable localpath to True and indicate this path in the masterpath variable. If you choose to use the online resources directly, please set localpath to False and use the appropiate masterpath (FAOSTAT or project website), as indicated below. Since the beginning of this project, FAOSTAT has changed their database access, this is reflected in the code below. It is recommended to use the alternative database mirror, tested to work with the code as of July 2015.

In [4]:

import matplotlib.pyplot as plt

In [5]:

masterpath = 'http://faostat3.fao.org/faostat-bulkdownloads/'                  #new FAOSTAT
#masterpath = 'http://faostat.fao.org/Portals/_Faostat/Downloads/zip_files/'   #old FAOSTAT
#masterpath = 'https://dl.dropboxusercontent.com/u/531697/datarepo/Food-Energy/db/'  #alternative for FAO database

localpath=True

if localpath:
    #local path for downloaded FAO database, only for developer purposes
    masterpath = 'E:/Dropbox (MIT)/Public/Food-Energy/db/' 
    savepath = 'E:/Dropbox (MIT)/Public/Food-Energy/json/'
    
indicators=[] #we store indicator name strings in this string array - do not touch
indicators.append('Emissions_Agriculture_Energy_E_All_Data') #0
indicators.append('Inputs_Fertilizers_E_All_Data') #1
indicators.append('Resources_FertilizersArchive_E_All_Data') #2
indicators.append('Population_E_All_Data') #3
indicators.append('FoodBalanceSheets_E_All_Data') #4
indicators.append('Environment_Energy_E_All_Data') #5
indicators.append('Resources_Land_E_All_Data') #6
indicators.append('Trade_Crops_Livestock_E_All_Data') #7

After defining the indicator names, we define the function get_data(ind,cols,icols) that loads the data from the respective CSV from inside the corresponding ZIP file into a pandas dataframe. This can take anywhere from a few seconds to several minutes, depending on your computer's performance. If you are running this notebook live and have access to paralell computing capabilites please turn them on at the Clusters tab of the main Ipython window. The syntax is slightly different for locally mirorred data and direct online access. By using compound indexing, we only load the relevant portions of the CSV files. The indicator oridnal (defined above) is passed with the argument ind, the argument cols defines which columns to load and icols defines which columns to use as indices and their hierarchy. Make sure you run the cells above first, as the get_data() function uses their output. The path variable is an array that containes 3 strings, in the following order: url or directory path, zip file name, csv file name (in case of FAO, the two last entries happend to coincide). The elements, items and units arguments are arrays of strings, providing different levels of data filtering, preserving order hierarchy. The new_get_data is the updated version of the function to support the changed FAOSTAT API.

In [6]:

def get_data(path,cols=[],icols=[],elements=[],items=[],units=[]):
    if localpath:
        r = path[0]+path[1]+'.zip'                          #define name of zip file to read
        z = zipfile.ZipFile(r)                              #open zip file for access
    else:
        r = requests.get(path[0]+path[1]+'.zip')            #define URL path of zip file to read
        z = zipfile.ZipFile(StringIO.StringIO(r.content))   #stream web content of zip file to read
    if elements:
        if items:
            if units:
                return pd.read_csv(z.open(path[2]+'.csv'),usecols=cols,index_col=icols)\
                         .query('Element=='+repr(elements)+'and Item=='+repr(items)+'and Unit=='+repr(units))\
                         .drop('Unit', axis=1)
            else:
                return pd.read_csv(z.open(path[2]+'.csv'),usecols=cols,index_col=icols)\
                         .query('Element=='+repr(elements)+'and Item=='+repr(items))
        else:
            return pd.read_csv(z.open(path[2]+'.csv'),usecols=cols,index_col=icols)\
                     .query('Element=='+repr(elements))
    elif icols:
        return pd.read_csv(z.open(path[2]+'.csv'),usecols=cols,index_col=icols)
    elif cols:
        return pd.read_csv(z.open(path[2]+'.csv'),usecols=cols)
    else:
        return pd.read_csv(z.open(path[2]+'.csv'))

In [7]:

def new_get_data(path,cols=[],icols=[],elements=[],items=[],units=[],years=[1961,2011]):
    if localpath:
        r = path[0]+path[1]+'.zip'                          #define name of zip file to read
        z = zipfile.ZipFile(r)                              #open zip file for access
    else:
        r = requests.get(path[0]+path[1]+'.zip')            #define URL path of zip file to read
        z = zipfile.ZipFile(StringIO.StringIO(r.content))   #stream web content of zip file to read
    yrrange=['Y'+repr(year) for year in range(years[0],years[1])] #keep only good years, drop fiscal years
    if elements:
        if items:
            if units:
                return pd.read_csv(z.open(path[2]+'.csv'),usecols=cols+yrrange,index_col=icols)\
                         .query('Element=='+repr(elements)+'and Item=='+repr(items)+'and Unit=='+repr(units))\
                         .drop('Unit', axis=1)
            else:
                return pd.read_csv(z.open(path[2]+'.csv'),usecols=cols+yrrange,index_col=icols)\
                         .query('Element=='+repr(elements)+'and Item=='+repr(items))
        else:
            return pd.read_csv(z.open(path[2]+'.csv'),usecols=cols+yrrange,index_col=icols)\
                     .query('Element=='+repr(elements))
    elif icols:
        return pd.read_csv(z.open(path[2]+'.csv'),usecols=cols+yrrange,index_col=icols)
    elif cols:
        return pd.read_csv(z.open(path[2]+'.csv'),usecols=cols+yrrange)
    else:
        return pd.read_csv(z.open(path[2]+'.csv'))

For this work, we will use the Food Balance Sheets, the Emissions Agriculture Energy, the Fertilizers and Fertilizers Archive and Population and Import modules of FAOSTAT. Prior to processing into a pandas dataframe with the above get_data or new_get_data function, we will refer to the data contained in the above modules as datasets (Food Balance Sheets, Emissions Agriculture Energy, Fertilizers, Fertilizers Archive, Population, Import). After processing and cleaning up, we will refer to them as dataframes (balance, energy, fertilizers, population, imports) and they will correspond to the actual pandas dataframe objects with idem names in the Python workspace.

To optimize for speed, when loading the datasets into the dataframes, we will filter and only save the entries that are needed for the study, for each of the datasets. This is defined by elements and items arguments. On top of that, we use further customizations for certain datasets. The entries of the energy dataframe are provided both in units of TJ and GWh, but we will only use one of them. After filtering the dataframe for the selected unit, the entries of Unit colum can be dropped to conserve memory. For the balance dataframe, all Item entries are needed, hence no filtering is done on this column.

1.1. Energy

First, let's create the energy dataframe. Emissions Agriculture Energy is the second largest of the datasets, and processing can take anywhere from a few seconds in case of locally mirrored data, for up to 5 minutes or more in case of direct online access. (The balance dataframe will be the largest. It needs at least 1GB of memory. We will leave it to load last). We display only the first 5 items of the pandas dataframe returned by the get_data() function by calling head().

In [5]:

elements=['Consumption in Agriculture']
items=['Electricity', 'Energy for power irrigation']+['Motor Gasoline','Gas-Diesel oil', 'Natural gas (including LNG)',\
                       'Fuel oil', 'Liquefied petroleum gas (LPG)','Coal',\
                       'Gas-diesel oils used in fisheries','Fuel oil used in fisheries']
units=['Terajoule']
energy = new_get_data([masterpath,indicators[0],indicators[0]],\
                  ['Country','Item','Element','Unit'],[0,2,1],elements,items,units,[1970,2012])

#extra processing for new FAO data tables
energy.columns=range(1970,2012)
energy=pd.DataFrame(energy.stack())
energy.index = energy.index.swaplevel(1, 3).swaplevel(2, 3)
energy.index.names=([u'Country', 'Year', u'Element', u'Item'])
energy.columns=['Value']

print energy.count()
energy.head() #remove .head() to view all items

Value    49369
dtype: int64

Out[5]:

				Value
Country	Year	Element	Item
Afghanistan	1970	Consumption in Agriculture	Electricity	43.2
	1971	Consumption in Agriculture	Electricity	46.8
	1972	Consumption in Agriculture	Electricity	57.6
	1973	Consumption in Agriculture	Electricity	57.6
	1974	Consumption in Agriculture	Electricity	68.4

As we can see, there are 48184 entries in the returned energy dataframe. We are interested in categorizing the food energy flows based on their destinations, which is depicted under the Element or Item column, depending on the dataframe. In order to have a well-manageable JSON output, we will always index the data after countries and years first. Using hierarchical indexing over Country, Year and Element and/or Item, we obtain the reduced dataframes that contain the values of interest to us in the Value column.

In the following cell, for demonstration purposes, we will display the data for Brazil, for the earliest available year. The pandas function xs creates a cross-section of a dataframe, indexed after the first argument (list of strings of indices) and with the hiercharchy defined by the second argument (list of integers level).

In [6]:

#note that for energy, the categories that are of interest to us are stored in the Item column
energy.xs(['Brazil',1980], level=[0,1])
#data availability is 1970-2011

Out[6]:

		Value
Element	Item
Consumption in Agriculture	Gas-Diesel oil	96836.0
	Motor Gasoline	0.0
	Natural gas (including LNG)	0.0
	Liquefied petroleum gas (LPG)	0.0
	Fuel oil	4686.4
	Electricity	7336.8
	Gas-diesel oils used in fisheries	0.0
	Fuel oil used in fisheries	0.0
	Energy for power irrigation	0.0

1.2. Fertilizers

We repeat the operations for the Fertilizers and Fertilizers Archive, Population, Food Balance Sheets and Import modules.
(To conserve display space, we only dispaly the first 5 elements - i.e. head - of the dataframes.)
For the fertilizer dataframes (fertilizers1, fertilizers2), the units are implicit (tonnes) and hence are not loaded to conserve memory.

In [7]:

elements=['Consumption in nutrients']
items=['Potash Fertilizers (K20 total nutrients)', 'Phosphate Fertilizers (P205 total nutrients)',\
       'Nitrogen Fertilizers (N total nutrients)']
fertilizers1 = new_get_data([masterpath,indicators[1],indicators[1]],\
                        ['Country','Item','Element'],[0,2,1],elements,items,[],[2002,2013])
#note that for fertilizers1, the categories that are of interest to us are stored in the Item column
#extra processing for new FAO data tables
fertilizers1.columns=range(2002,2013)
fertilizers1=pd.DataFrame(fertilizers1.stack())
fertilizers1.index = fertilizers1.index.swaplevel(1, 3).swaplevel(2, 3)
fertilizers1.index.names=([u'Country', 'Year', u'Element', u'Item'])
fertilizers1.columns=['Value']
fertilizers1.xs(['Brazil',2002], level=[0,1]).head() #remove .head() to view all items
#data availability is 2002-2012

Out[7]:

		Value
Element	Item
Consumption in nutrients	Nitrogen Fertilizers (N total nutrients)	1834733
	Phosphate Fertilizers (P205 total nutrients)	2634552
	Potash Fertilizers (K20 total nutrients)	2962910

In [8]:

elements=['Consumption']
items=['Phosphate fertilizers', 'Potash fertilizers', 'Nitrogenous fertilizers']
fertilizers2 = get_data([masterpath,indicators[2],indicators[2]],\
                        ['Country','Item','Element','Year','Value'],[0,3,2,1],elements,items)
#note that for fertilizers2, the categories that are of interest to us are stored in the Item column
fertilizers2.xs(['Brazil',1961], level=[0,1]).head() #remove .head() to view all items
#data availability is 1961-2002

Out[8]:

		Value
Element	Item
Consumption	Nitrogenous fertilizers	55064
	Phosphate fertilizers	144213
	Potash fertilizers	70727

1.3. Labor

In order to estimate labor energy, we create the population dataframe from the Population module of FAOSTAT.

In [9]:

elements=['Total economically active population in Agr','Total Population - Both sexes']
items=['Population - Est. & Proj.']
population = new_get_data([masterpath,indicators[3],indicators[3]],\
                      ['Country','Item','Element'],[0,2,1],elements,items,[],[1961,2051])
#note that for population, the categories that are of interest to us are stored in the Item column
#extra processing for new FAO data tables
population.columns=range(1961,2051)
population=pd.DataFrame(population.stack())
population.index = population.index.swaplevel(1, 3).swaplevel(2, 3)
population.index.names=([u'Country', 'Year', u'Element', u'Item'])
population.columns=['Value']
population.xs(['Brazil',1980], level=[0,1]).head()
#data availability is 1961-2050 for some countries, but mostly only 1980-2020

Out[9]:

		Value
Element	Item
Total Population - Both sexes	Population - Est. & Proj.	121740
Total economically active population in Agr	Population - Est. & Proj.	16345

1.4. Food

Food balance sheets are the last items to load. Note that for the balance dataframe, the categories of interest are stored in the Element column. We also need to import the ItemCode column for categorization purposes. Hence the indexing order changes to 0,3,2. Just like in for the case of fertilizers, the units are implicit (1000 tonnes) and hence are not loaded to conserve memory. Due to the size of the balance dataframe, execution of the following cell can take several minutes to complete.

In [10]:

elements=['Food supply (kcal/capita/day)','Food supply quantity (kg/capita/yr)']\
        +['Feed','Food','Processing','Other uses','Seed','Waste',\
          'Import Quantity','Export Quantity','Stock Variation']\
        +['Production']

itemfilter='(((2511<=ItemCode)&(ItemCode<=2782))\
                              |(ItemCode==2805)\
                              |(ItemCode==2899)\
                              |(ItemCode==2912)\
                              |(ItemCode==2948)\
                              |(ItemCode==2901)\
                              |(ItemCode==2960)\
                              |(ItemCode==2961))'
        
balance = new_get_data([masterpath,indicators[4],indicators[4]],\
                      ['Country','Item','Element','Item Code'],[0,3,2],elements,[],[],[1961,2012])
#need to save ItemCodes for categorization
#note that for balance, the categories that are of interest to us are stored in the Element column
#extra processing for new FAO data tables
balance.columns=['ItemCode']+range(1961,2012)
balance=balance.query(itemfilter).set_index("ItemCode", append=True)
balance=pd.DataFrame(balance.stack()).reset_index(level='ItemCode')
balance.columns=['ItemCode','Value']
balance.index = balance.index.swaplevel(1, 3)
balance.index.names=([u'Country', 'Year', u'Item', u'Element'])
balance.xs(['Brazil',1961,'Maize and products'], level=[0,1,2])#.head()
#data availability is 1961-2011

Out[10]:

	ItemCode	Value
Element
Production	2514	9036.00
Import Quantity	2514	6.00
Stock Variation	2514	0.00
Export Quantity	2514	4.00
Feed	2514	5961.00
Seed	2514	184.00
Waste	2514	904.00
Processing	2514	4.00
Food	2514	1985.00
Food supply quantity (kg/capita/yr)	2514	26.47
Food supply (kcal/capita/day)	2514	227.00

1.5. Imports

The value added imports of all agricultural products are the last items to load. Note that for the imports dataframe, the categories of interest are stored in the Element column. Just like the balance dataframe, execution of the following cell can take several minutes to complete.

In [8]:

elements=['Import Value']
items=['Agricult.Products,Total']
imports = new_get_data([masterpath,indicators[7],indicators[7]],\
                      ['Country','Item','Element'],[0,2,1],elements,items,[],[1961,2012])
#note that for imports, the categories that are of interest to us are stored in the Element column
#extra processing for new FAO data tables
imports.columns=range(1961,2012)
imports=pd.DataFrame(imports.stack())
imports.index = imports.index.swaplevel(1, 3).swaplevel(2, 3)
imports.index.names=([u'Country', 'Year', u'Element', u'Item'])
imports.columns=['Value']
imports.xs(['Brazil',1980], level=[0,1]).head()
#data availability is 1961-2011

Out[8]:

		Value
Element	Item
Import Value	Agricult.Products,Total	2470870

In [9]:

imports.loc['United Arab Emirates']

Out[9]:

			Value
Year	Element	Item
1961	Import Value	Agricult.Products,Total	3752
1962	Import Value	Agricult.Products,Total	4065
1963	Import Value	Agricult.Products,Total	4245
1964	Import Value	Agricult.Products,Total	4459
1965	Import Value	Agricult.Products,Total	4815
1966	Import Value	Agricult.Products,Total	11588
1967	Import Value	Agricult.Products,Total	13642
1968	Import Value	Agricult.Products,Total	22958
1969	Import Value	Agricult.Products,Total	22221
1970	Import Value	Agricult.Products,Total	31980
1971	Import Value	Agricult.Products,Total	36035
1972	Import Value	Agricult.Products,Total	52867
1973	Import Value	Agricult.Products,Total	98894
1974	Import Value	Agricult.Products,Total	239046
1975	Import Value	Agricult.Products,Total	297260
1976	Import Value	Agricult.Products,Total	370719
1977	Import Value	Agricult.Products,Total	432747
1978	Import Value	Agricult.Products,Total	594019
1979	Import Value	Agricult.Products,Total	879103
1980	Import Value	Agricult.Products,Total	1077400
1981	Import Value	Agricult.Products,Total	1148590
1982	Import Value	Agricult.Products,Total	1044390
1983	Import Value	Agricult.Products,Total	1018400
1984	Import Value	Agricult.Products,Total	1100640
1985	Import Value	Agricult.Products,Total	1118290
1986	Import Value	Agricult.Products,Total	1216010
1987	Import Value	Agricult.Products,Total	1285020
1988	Import Value	Agricult.Products,Total	1352900
1989	Import Value	Agricult.Products,Total	1762340
1990	Import Value	Agricult.Products,Total	1693330
1991	Import Value	Agricult.Products,Total	1747280
1992	Import Value	Agricult.Products,Total	1906260
1993	Import Value	Agricult.Products,Total	1739830
1994	Import Value	Agricult.Products,Total	2146840
1995	Import Value	Agricult.Products,Total	2459460
1996	Import Value	Agricult.Products,Total	2643060
1997	Import Value	Agricult.Products,Total	2245120
1998	Import Value	Agricult.Products,Total	2443370
1999	Import Value	Agricult.Products,Total	2642640
2000	Import Value	Agricult.Products,Total	2693230
2001	Import Value	Agricult.Products,Total	3112450
2002	Import Value	Agricult.Products,Total	3767700
2003	Import Value	Agricult.Products,Total	4296530
2004	Import Value	Agricult.Products,Total	4941250
2005	Import Value	Agricult.Products,Total	3730570
2006	Import Value	Agricult.Products,Total	7164150
2007	Import Value	Agricult.Products,Total	8161290
2008	Import Value	Agricult.Products,Total	10926500
2009	Import Value	Agricult.Products,Total	10489900
2010	Import Value	Agricult.Products,Total	11913800
2011	Import Value	Agricult.Products,Total	14018000

2. Processing

In the following section we convert the dataframes defined above into JSON format that can be read by the D3.js javascript framework for web visualization. Aside form the data manipulation, we also compute and add additional information such as caloric values and food category sums into the JSON files. The Sankey plugin of D3.js requires the input JSONs to be in a very specific format. All of them have to inclue at least the following key-value pairs:

nodes
- name: name of the node (the order or these entries (groupdict) automatically assigns an index (count) to the names)
links
- source: the index of the source node
- target: the index of the target node
- value: value of the link

Later this has been extended to add support for the indvidual coloring of nodes, positioning in the horizontal plane, as well as value fixation.

nodes
- color: color of the node
- layer: sets a fixed node x-index from the left
- value: sets a fixed node height

From this point forward, we will interchangibly refer to links as flows, due to the energetic aspect of the links. We will include additional information such as breakdown (stored under the key supply) or secondary flow values for a different type of category (stored under the key prod). As a first step, we define the function add_to_json(i,k,source,target,value,prod,supply) which will take numeric entries and create the JSON-like structure presented above. The conversion is made easy using Python dictionaries. The argument i defines the detail level at which we are passing the data and it is one of the following:

Food hypergroups
Food supergroups
FAO food groups
FAO crop groups

The argument k corresponds to the Python dictionary in the global variable data[i][k] to which the data will be written to. We will generate the following datasets:

Gross energy [TWh] - data[i][0] variable - includes food-system and crop residues
Food system energy [TWh] - data[i][1] variable - includes agri-system and food trade and scondary food products
Agri-system energy [TWh] - data[i][2] variable - includes primary crops
Food weight [ktonnes] - data[i][3] variable

In [12]:

def add_to_json(i,k,source,target,value,prod,supply,targetvalue=""):
    
    global groupdict
    global count
    global data
    
    if (~np.isnan(value)&(value!=0)): #only add non-zero and non-NaN values
        
        #update group dictionary if source not yet encountered
        if source not in groupdict[i][k].keys():
            groupdict[i][k][source]=count[i][k]
            nodes_innerdict={} 
            nodes_innerdict["name"]=source
            if (layer(k,source)!=-1): nodes_innerdict["layer"]=layer(k,source)
            if (color(k,source)!=""): nodes_innerdict["fill"]=color(k,source)
            data[i][k]["nodes"].append(nodes_innerdict)
            count[i][k] += 1 
            
        #update group dictionary if target not yet encountered
        if target not in groupdict[i][k].keys(): 
            groupdict[i][k][target]=count[i][k]
            nodes_innerdict={} 
            nodes_innerdict["name"]=target
            if (layer(k,target)!=-1): nodes_innerdict["layer"]=layer(k,target)
            if (color(k,target)!=""): nodes_innerdict["fill"]=color(k,target)
            if (targetvalue!=""): nodes_innerdict["value"]=targetvalue
            data[i][k]["nodes"].append(nodes_innerdict)
            count[i][k] += 1 
            
        links_innerdict={} 
        links_innerdict["source"]=groupdict[i][k][source]
        links_innerdict["target"]=groupdict[i][k][target]
        links_innerdict["value"]=value
        links_innerdict["prod"]=prod
        links_innerdict["supply"]=supply
        data[i][k]["links"].append(links_innerdict)
    else:
        pass #print 'NaN',source,target

We define the main nodes, based on the flows of the Food Balance Sheets. An extension of these nodes is used when loading the data in Section 1.

In [13]:

elements1=['Feed','Food','Processing','Other uses','Seed','Waste',\
           'Import Quantity','Export Quantity','Stock Variation','Production']        
elements2=['Food supply (kcal/capita/day)','Food supply quantity (kg/capita/yr)']
fuels1=['Electricity', 'Energy for power irrigation']
fuels2=['Motor Gasoline','Gas-Diesel oil', 'Natural gas (including LNG)',\
        'Fuel oil', 'Liquefied petroleum gas (LPG)','Coal']
fuels3=['Gas-diesel oils used in fisheries','Fuel oil used in fisheries']
fert1=['Phosphate fertilizers', 'Potash fertilizers', 'Nitrogenous fertilizers']
fert2=['Potash Fertilizers (K20 total nutrients)', 'Phosphate Fertilizers (P205 total nutrients)',\
       'Nitrogen Fertilizers (N total nutrients)']

The code functions is structured into 5 parts over 2 modules:

OUTPUT module
- Food part
  - Define food groups
  - Calculate food flows
INPUT module
- Fuels part
  - Electricity
  - Fossil fuels
- Fertilizers part
- Labor part
- Imports part

This is followed by the main code section which calls the above parts and saves the data in the desired format.

Save data
Main

Then main code is executed within two for cycles that loop over all countries and years. All code parts take as arguments country and year. In the following we present the code parts one by one. Throughout code development it has been assumed that the order in which the IPython cells corresponding to the code parts defined below are run is preserved. Therefore we define the main procedure in Section 3, after everything else has been defined, including the save data procedure in Section 2.6.

2.1 Food part

2.1.1. Define food groups

First we need to define the food groups, for all abstraction levels. In adition to the standard FAOSTAT categories, we use our own group definitions. These groups are referred to as supergroups throughout the code and are stored in the supergroup array since the FAOSTAT entries in the Food Balance Sheets dataset are already aggregated, such as Apples & Products or Wheat & Products. We will refer to individual entries of the balance dataframe as food items or food item groups bearing in mind that these do not represent individual crop types, but an already aggregated category, as per the FAOSTAT handbook. However, it is possible to retrive the production data for individual crops such as Apples or Wheat Bran from the Production Crops database by calling get_data(10), but this is outside the scope of this work and we will exclusively use the balance dataframe.

To create our own classification with a comfortable detail level, as well as to doublecheck the FAOSTAT groupings, we use the FAOSTAT item codes, as per their handbook. Likewise, within a for loop we define the corresponding supergroups for all items for all abstraction levels and where applicable, also their parent groups: secondary food items have their origin in another food item, called a parent, such as Feed for Animal Products or Processing for Oils. This information will be stored in the supergroup and parent array variables, accesible globally.

In [14]:

def food_groups(country, year):
    
    ###############################################################################################
    ######                            FOOD PART - define food groups                         ######
    ###############################################################################################
    
    # associate food items to their respective groups

    global supergroup         #supergroup to which the food item group belongs to, 
                              #i.e. Fruits is the supergroup for Apples & Products
    global parent             #certain food items have their origin in another food item, called a parent
                              #i.e. Feed is the parent for Animal Products and Processing is the parent for Oils
    global gross              #this is necessary for recording crop residues on the gross energy details level
                              #precentage of crop that is unused but has energy residues remaining inside it, husk
                              #http://www.sciencedirect.com/science/article/pii/S0160412004001564
                              #crop residue average energy content for burning 300kcal/100g
                              #another possibility is to estimate from emissions and back-calculate
                              #energy content from carbon
                              #http://www.gutenberg.org/files/22657/22657-h/chapters/value.html#equn_18

    for element in elements1+elements2:
        for i in balance.xs([country,year,element], level=[0,1,3]).T.iteritems():
            fooditem = i[0]       #choose column in which food item names are stored, might need to change it
            if "+" in fooditem:   #simplify food names for the sake of visualization
                fooditem=fooditem[0:fooditem.find("+")-1]
            if "(" in fooditem:
                fooditem=fooditem[0:fooditem.find("(")-1]
            if "and products" in fooditem:
                fooditem=fooditem[0:fooditem.find("and products")-1]

            # if fooditem has not been encountered yet, assing its properties
            if fooditem not in supergroup[3].keys():
                index = i[1][0]       #choose column in which food group codes are stored, might need to change it
                parent[fooditem]='None'
                supergroup[3][fooditem]=fooditem     #default supergroup is self
                supergroup[0][fooditem]='Fruits, Roots & Vegetables' 
                for i in range (1,3):   #assign default supergroup to all aggregate food items
                    supergroup[i][fooditem]='Others'
                if ((2511<=index<=2659) | (index==2745) | (index==2805) | (index==2899) | (index==2912)):
                    if ((2511<=index<=2520) | (index==2805)):
                        supergroup[2][fooditem] = 'Cereals - Excluding Beer'
                        supergroup[1][fooditem] = 'Cereals'
                        supergroup[0][fooditem] = 'Cereals'
                        parent[fooditem]        = 'None'
                        gross[fooditem] =gross[supergroup[0][fooditem]]\
                                        =gross[supergroup[1][fooditem]]\
                                        =gross[supergroup[2][fooditem]] = 1.0
                    elif (2531<=index<=2535):
                        supergroup[2][fooditem] = 'Starchy Roots'
                        supergroup[1][fooditem] = 'Starchy Roots'
                        supergroup[0][fooditem] = 'Fruits, Roots & Vegetables'
                        parent[fooditem]        = 'None'
                        gross[fooditem] =gross[supergroup[0][fooditem]]\
                                        =gross[supergroup[1][fooditem]]\
                                        =gross[supergroup[2][fooditem]] = 0.25
                    elif (2536<=index<=2537):
                        supergroup[2][fooditem] = 'Sugar Crops'
                        supergroup[1][fooditem] = 'Sugar Crops'
                        supergroup[0][fooditem] = 'Sugar Crops'
                        parent[fooditem]        = 'None'
                        gross[fooditem] =gross[supergroup[0][fooditem]]\
                                        =gross[supergroup[1][fooditem]]\
                                        =gross[supergroup[2][fooditem]] = 0.2
                    elif ((2541<=index<=2543)|(index==2745)):
                        supergroup[2][fooditem] = 'Sugar & Sweeteners'
                        supergroup[1][fooditem] = 'Sugar'
                        supergroup[0][fooditem] = 'Sugar'
                        parent[fooditem]        = 'Processing'
                    elif (2546<=index<=2549):
                        supergroup[2][fooditem] = 'Pulses'
                        supergroup[1][fooditem] = 'Fruits & Vegetables'
                        supergroup[0][fooditem] = 'Fruits, Roots & Vegetables'
                        parent[fooditem]        = 'None'
                        gross[fooditem] =gross[supergroup[0][fooditem]]\
                                        =gross[supergroup[1][fooditem]]\
                                        =gross[supergroup[2][fooditem]] = 0.5#1.0
                    elif (2555<=index<=2570):
                        supergroup[2][fooditem] = 'Oilcrops'
                        supergroup[1][fooditem] = 'Oilcrops'
                        supergroup[0][fooditem] = 'Oilcrops'
                        parent[fooditem]        = 'None'
                        gross[fooditem] =gross[supergroup[0][fooditem]]\
                                        =gross[supergroup[1][fooditem]]\
                                        =gross[supergroup[2][fooditem]] = 0.5#1.0
                    elif (2571<=index<=2586):
                        supergroup[2][fooditem] = 'Vegetable Oils'
                        supergroup[1][fooditem] = 'Vegetable Oils'
                        supergroup[0][fooditem] = 'Vegetable Oils'
                        parent[fooditem]        = 'Processing'
                    elif (2601<=index<=2605):
                        supergroup[2][fooditem] = 'Vegetables'
                        supergroup[1][fooditem] = 'Fruits & Vegetables'
                        supergroup[0][fooditem] = 'Fruits, Roots & Vegetables'
                        parent[fooditem]        = 'None'
                        gross[fooditem] =gross[supergroup[0][fooditem]]\
                                        =gross[supergroup[1][fooditem]]\
                                        =gross[supergroup[2][fooditem]] = 0.5#1.0
                    elif (2611<=index<=2625):
                        supergroup[2][fooditem] = 'Fruits - Excluding Wine'
                        supergroup[1][fooditem] = 'Fruits & Vegetables'
                        supergroup[0][fooditem] = 'Fruits, Roots & Vegetables'
                        parent[fooditem]        = 'None'
                    elif (2630<=index<=2635):
                        supergroup[2][fooditem] = 'Stimulants'
                        supergroup[1][fooditem] = 'Others'
                        supergroup[0][fooditem] = 'Fruits, Roots & Vegetables'
                        parent[fooditem]        = 'None'
                    elif (2640<=index<=2645):
                        supergroup[2][fooditem] = 'Spices'
                        supergroup[1][fooditem] = 'Others'
                        supergroup[0][fooditem] = 'Fruits, Roots & Vegetables'
                        parent[fooditem]        = 'None'
                    elif (2655<=index<=2659):
                        supergroup[2][fooditem] = 'Alcoholic Beverages'
                        supergroup[1][fooditem] = 'Other Processed'
                        supergroup[0][fooditem] = 'Other Processed'
                        parent[fooditem]        = 'Processing'
                    elif (index==2899):
                        supergroup[2][fooditem] = 'Miscellaneous'
                        supergroup[1][fooditem] = 'Others'
                        supergroup[0][fooditem] = 'Fruits, Roots & Vegetables'
                        parent[fooditem]        = 'None'
                    elif (index==2912):
                        supergroup[2][fooditem] = 'Treenuts'
                        supergroup[1][fooditem] = 'Fruits & Vegetables'
                        supergroup[0][fooditem] = 'Fruits, Roots & Vegetables'
                        parent[fooditem]        = 'None'

                elif ((2731<=index<=2782) | (index==2945)| (index==2948) | (index==2949) | (index==2961)):
                    if (2731<=index<=2736):
                        supergroup[2][fooditem] = 'Meat'
                        supergroup[1][fooditem] = 'Animal Products'
                        supergroup[0][fooditem] = 'Animal Products'
                        parent[fooditem]        = 'Feed'
                    elif (2737<=index<=2743):
                        supergroup[2][fooditem] = 'Animal Fats'  
                        supergroup[1][fooditem] = 'Other Processed' 
                        supergroup[0][fooditem] = 'Other Processed' 
                        parent[fooditem]        = 'Processing'
                    elif (index==2744):
                        supergroup[2][fooditem] = 'Eggs'
                        supergroup[1][fooditem] = 'Animal Products'
                        supergroup[0][fooditem] = 'Animal Products'
                        parent[fooditem]        = 'Feed'
                    elif (2761<=index<=2768):
                        supergroup[2][fooditem] = 'Fish, Seafood'
                        supergroup[1][fooditem] = 'Aquatic Products'
                        supergroup[0][fooditem] = 'Aquatic Products'
                        parent[fooditem]        = 'Feed'
                    elif ((2769<=index<=2775)|(index==2961)):
                        supergroup[2][fooditem] = 'Aquatic Products, Other'
                        supergroup[1][fooditem] = 'Aquatic Products'
                        supergroup[0][fooditem] = 'Aquatic Products'
                        parent[fooditem]        = 'Feed'
                    elif (2781<=index<=2782):
                        supergroup[2][fooditem] = 'Animal Fats'
                        supergroup[1][fooditem] = 'Other Processed'
                        supergroup[0][fooditem] = 'Other Processed'
                        parent[fooditem]        = 'Processing'
                    elif (index==2948):
                        supergroup[2][fooditem] = 'Milk - Excluding Butter'
                        supergroup[1][fooditem] = 'Animal Products'
                        supergroup[0][fooditem] = 'Animal Products'
                        parent[fooditem]        = 'Feed'

                #section for variables which only have a secondary flow (from Feed or Processing)
                #currently only Palm Oil because it only belongs to Processing, no other categories
                #need to create (practically non-food) super crop flow
                supergroup[0]["Oil Palm"]=supergroup[1]["Oil Palm"]=supergroup[2]["Oil Palm"]='Oilcrops'
                supergroup[3]["Oil Palm"]='Oil Palm'
                parent["Oil Palm"]='None'

                #create supergroups of supergroups
                supergroup[2][supergroup[2][fooditem]] = supergroup[2][fooditem]
                supergroup[2][supergroup[1][fooditem]] = supergroup[2][fooditem]
                supergroup[2][supergroup[0][fooditem]] = supergroup[2][fooditem]
                supergroup[1][supergroup[2][fooditem]] = supergroup[1][fooditem]
                supergroup[1][supergroup[1][fooditem]] = supergroup[1][fooditem]
                supergroup[1][supergroup[0][fooditem]] = supergroup[1][fooditem]
                supergroup[0][supergroup[2][fooditem]] = supergroup[0][fooditem]
                supergroup[0][supergroup[1][fooditem]] = supergroup[0][fooditem]
                supergroup[0][supergroup[0][fooditem]] = supergroup[0][fooditem]
                
                #create parents of supergroups
                parent[supergroup[0][fooditem]]=parent[supergroup[1][fooditem]]=\
                parent[supergroup[2][fooditem]]=parent[fooditem]

Before every section, we conduct a data profiling - this meas we interpolate the missing values of the dataframe using data from the neighboring countries or global values. This is the food_profiling, which eliminates 0 and NaN values from the dataframes. Food values are an exception compared to the rest of energy data as they are not interpolated.

In [15]:

def food_profiling(country):
    y={}
    
    for i in balance.loc[country].T.iteritems(): #make just one call, filter to country, superfast
        foodvalue=i[1][0] #choose column in which food item values are stored
        if (~np.isnan(foodvalue)&(foodvalue!=0)):
            year     = int(i[0][0]) #choose column in which years are stored
            element  = i[0][2]      #choose column in which element names are stored
            fooditem = i[0][1]      #choose column in which food item names are stored, might need to change it
            if "+" in fooditem:     #simplify food names for the sake of visualization, remove +,(),'and products'
                fooditem=fooditem[0:fooditem.find("+")-1]
            if "(" in fooditem:
                fooditem=fooditem[0:fooditem.find("(")-1]
            if "and products" in fooditem:
                fooditem=fooditem[0:fooditem.find("and products")-1]
            if element not in y: y[element]={}
            if year not in y[element]: y[element][year]={}
            y[element][year].update({fooditem:foodvalue})

    #no interpolation for indiviudal food items        
    return y

2.1.2. Calculate food flows

The next part is perhaps the most susbtantial part of this work, the calculation of the food energy and weight flows. It is structured into 3 sections:

Calculate caloric values of food items
Extract supply weight data for food items
Calculate energy data for food items

Caloric values

The function food_cal returns caloric value for food items and their supergroups on all detail levels, using the per capita food supply information in kcal/day/person and kg/year/person from the balance dataframe. It starts by filtering the dataframe to the country and year of interest and then extracts the values under query using an auxiliary array of length 2 counter. If an exact match is not found in either of dataframe columns (energy or supply values), then the supergroup value is attributed. E.g. if is there is no data for Maize in either of the columns, then the values of the aggregate group Cereals are attrbiuted, trying for the smaller supergroup (detail level 2) first. If data is completely non-existent for an entire supergroup for one country, then global values are used. In the very rare case that even global values do not exist, then a standard 100kcal/100g caloric value is attributed to the food item. The function returns the dictionary foodcalorie which has the food items as keys and their respective caloric values as entries.

In [16]:

def food_cal_profiling(country):
    
    ###############################################################################################
    ######                 FOOD PART - Calculate caloric values of food items                ######
    ###############################################################################################
        
    #extract food item caloric values
    foodcalorie={}
    
    if (set(food_extended.keys()).intersection(set(elements2))==set(elements2)):
        for year in food_extended[elements2[0]].keys():
            if year in food_extended[elements2[1]].keys():
                for fooditem in food_extended[elements2[0]][year].keys():
                    if fooditem in food_extended[elements2[1]][year].keys():
                        if fooditem not in {'Grand Total'}:
                            if fooditem not in foodcalorie: foodcalorie[fooditem]={}
                            energy = food_extended[elements2[0]][year][fooditem]
                            weight = food_extended[elements2[1]][year][fooditem]
                            foodcalorie[fooditem].update({year:[100*energy/(weight*1000/365.25)]})
                            #kcal / 100g : energy / supply, need to transfrom from kg/year/capita to g/day/capita
                            for k in range(3):
                                if supergroup[k][fooditem] not in foodcalorie: foodcalorie[supergroup[k][fooditem]]={}
                                if year not in foodcalorie[supergroup[k][fooditem]]:
                                    foodcalorie[supergroup[k][fooditem]][year]=[]
                                foodcalorie[supergroup[k][fooditem]][year].append(100*energy/(weight*1000/365.25))
    
    #create supergroup values
    for i in foodcalorie.keys():
        for j in foodcalorie[i].keys():
            foodcalorie[i][j]=np.median(foodcalorie[i][j])

    return foodcalorie

Supply and energy data

The food_part function extract food item flows from the balance dataframe and saves them into the global variable data using the add_to_json function, effectively creating the core structure of the FEF Sankey diagram. The function iteratively adds the flows by filtering the dataframe after the entries of elements string array, which contains the designations of the nodes (such as Feed or Processing) and looping through it item-by-item. The core logic of the food_part function follows the following sequence:

Access and extract food item name and value in 1000 tonnes.
Using the specific energy content from the foodcalorie variable, calculate energy values (and subsequently convert to TWh).
Populate item dictionary isum with numeric entries, for all detail levels.
After all food items have been parsed, create links from them to the element (i.e. node) under iteration.

This will create the links for the highest detail level where the supergroups will only contain the food items tehmselves. In the next section of the function, aggregated and special links are added:

Supergroup item aggregations are calculated and saved in the supersum variable for all remaining detail levels.
Aggregations are also calculated for non-primary food items (i.e. those with a parent) and the parent-to-fooditem links are added.
Eventually special links are dealt with:
- Crop residues (which only exist for a certain detail level)
- Processed foods of animal origin (which would have two parents, both Processed and Feed)
- Oil palm (Only the end product Palm Oil is included in the FAO database, with oil palm not being a food item, but necessary to maintain the energy balance of the model).

Finally the share of primary crops is calculated and saved in the global variable share for all detail levels, necessary for the proportionate distribution of fossil inputs.

In [17]:

def food_part(country, year):
    
    ###############################################################################################
    ######              FOOD PART - Extract supply data and calculate energy data            ######
    ###############################################################################################

    global foodinput          #summation variables for EROEI calculation
    global foodoutput

    #extract supply data and calculate energy data
    global data
    global supersum

    #auxiliary dictionaries for use, with accurate preassigned dimensions
    #dictionary to record item parent sums
    helperdict = [[{} for k in range(m)] for i in range(n)] 
    palmsum = 0
    
    for element in elements1:
        
        #discard world level imports and exports. they should balance out.
        if ((country=="World") & (element in {"Import Quantity", "Export Quantity"})):
            continue

        #dictionary to keep track of item sums in case of aggregations, based on the supergroup
        isum=[[{} for i in range(m)] for j in range(n)] 

        if element in food_extended.keys():
            if year in food_extended[element].keys():
                for fooditem in food_extended[element][year].keys():
                
                    foodvalue = food_extended[element][year][fooditem]
                    
                    #try to set default caloric value equal to global values
                    try: foodcal=food_cal_world[fooditem][year]
                    #fooditem was not found, set default caloric value, necessary to handle exceptions
                    except: foodcal=100 
                    #check if caloric value has been obtained for fooditem, attempt to get from supergroup otherwise
                    if fooditem in foodcalorie.keys():
                        if year in foodcalorie[fooditem].keys():
                            foodcal=foodcalorie[fooditem][year]
                    else:
                        if supergroup[2][fooditem] in foodcalorie.keys():
                            if year in foodcalorie[supergroup[2][fooditem]].keys():
                                foodcal=foodcalorie[supergroup[2][fooditem]][year]
                        elif supergroup[1][fooditem] in foodcalorie.keys():
                            if year in foodcalorie[supergroup[1][fooditem]].keys():
                                foodcal=foodcalorie[supergroup[1][fooditem]][year]
                        elif supergroup[0][fooditem] in foodcalorie.keys():
                            if year in foodcalorie[supergroup[0][fooditem]].keys():
                                foodcal=foodcalorie[supergroup[0][fooditem]][year]

                    energyvalue=foodvalue*foodcal*0.00116222222/100
                          #TWh = 1000tons * kcal/100g # 10^9) /100g * 0.00116222222 kwh/kcal *10^-9 TWh/kWh

                    #define sequence of variable assignemt over k axis
                    sequence=[]
                    for k in range(m-1): sequence.append(energyvalue)
                    sequence.append(foodvalue)

                    #sum all links going into food for small Sankey EROEI representation
                    if (element=="Food"):
                        for k in range(m-1):
                            if (k!=2): foodoutput["Food"][k]+=sequence[k]
                    if (element=="Export Quantity"):
                        for k in range(m-1):
                            if (k!=2): foodoutput["Export"][k]+=sequence[k]
                    if (element=="Import Quantity"):
                        for k in range(m):
                            #foodinput[k]+=sequence[k] #old, discounted version
                            foodimport[k]+=sequence[k]
                        #auxiliary variable for Palm Oil processing
                        if fooditem=="Palm Oil":
                            palmsum=energyvalue
                    if (element=="Stock Variation"):
                        if (sequence[0]<0): #decide whether is a stock deposit or withdrawal
                            for k in range(m-1): #food weight handled sepparately
                                if (k!=2): foodoutput["Stock Deposit"][k]+=-sequence[k]
                        else:
                            for k in range(m-1): #food weight handled sepparately
                                if (k!=2): 
                                    foodinput[k]+=sequence[k]
                                    foodinput2["Stock Draw"][k]+=sequence[k]
                    if (element=="Other uses"):
                        for k in range(m-1):
                            if (k!=2): foodoutput["Biofuels & Other"][k]+=sequence[k]
                    if (element=="Production"):
                        if parent[fooditem]=="None":
                            foodoutput["Food"][2]+=sequence[2]
                            foodoutput["Food"][m-1]+=sequence[m-1]
                    
                    #for WROWI - Seeds are input
                    if (element=="Seed"): 
                        if (parent[fooditem]=="None"):
                            foodinput[m-1]+=sequence[m-1]
                            foodinput2["Seed"][m-1]+=sequence[m-1]
                    
                    #save food item values for detail levels 1-3 (level 4 is currently not in use)
                    isum = food_isum(country,year,element,fooditem,sequence,isum)

        #create links    
        food_links(country,year,element,isum)

        #create aggregate values for all items detail levels 1-4
        helperdict = food_aggregate(country,year,element,isum,helperdict)

    #create parent links
    parentsum = food_parent(country,year,helperdict)

    #create special links

    #add extra links needed for processed foods of animal origin
    food_animal(country,year,parentsum)
    #add links for crop residues
    food_residue(country,year)   
    #Palm Oil only belongs to Processing, no other categories
    #Create Oil Palm (practically non-food) parent crop flow        
    food_palm(country,year,palmsum)

    #calculate crops share for energy input distribution
    food_share(country,year)

Food item values

The function food_isum progressively (has itself as an argument) creates a dictionary that eventually will contain food item names as keys and their values as entries. This process is repeatd for every element. Food item names are passed through the fooditem argument, whereas their values in the sequence variable of length m, corresponding to the number of different datasets the user wants to create. Currently m = 4 and it contains food item energy values on indices 0,1,2 and weight values on the last index position. Based on the element, some modifications are made:

Import becomes a source not a sink node.
Stock Variation is treated either as sink or source, based on its polarity. The node name is also modified correspondigly.

In [18]:

def food_isum(country,year,element,fooditem,sequence,isum):
#save fooditem values for detail levels 1-3 (level 4 is currently not in use) - was put back in
         
    for i in range(n): #save food item values over different groups
        #save components for the piechart breakdown (isum)
        for k in range(m):
            if (element=="Import Quantity"): 
                if parent[fooditem]!="None":
                    target=parent[fooditem]
                else: target=supergroup[i][fooditem]
                target="Import"+target
            elif (element=="Stock Variation"): 
                if sequence[k]<0:
                    target="Stock Deposit"+supergroup[i][fooditem]
                else:
                    if parent[fooditem]!="None":
                        target=parent[fooditem]
                    else: target=supergroup[i][fooditem]
                    target="Stock Draw"+target
            else: target=supergroup[i][fooditem]
            if target in isum[i][k].keys():
                isum[i][k][target].append({"l":fooditem, "v":abs(sequence[k])})
            else:
                isum[i][k][target]=[{"l":fooditem, "v":abs(sequence[k])}]
    
    return isum

Food item links

The food_links function iterates through the entries of the isum dictionary (i.e. food items for which data exists and it has been added for certain country, for a certain year) and saves them in the global data variable by invoking the add_to_json function. In some cases, node names (given by the element argument) are altered for clarity and better legibility.

In [19]:

def food_links(country,year,element,isum):
#create links
    for i in range(n):
        for k in range(m):
            for j in isum[i][k].keys():     #second index for isum can be any, keys dont change over k
                source=j
                if (element=="Seed"):         #replace 'Seed' group with self-loops
                    if ((k==m-1)&(parent[j]=="None")):              #except in weight view 
                        target=element
                    else:
                        target=j
                elif (element=="Other uses"): #rename 'Other uses' group to Biofuels & Other
                    target="Biofuels & Other"
                elif (element=="Export Quantity"): #rename 'Export Quantity' group to Export
                    target="Export"
                elif ("Import" in j): #rename 'Import Quantity' group to Import and reverse links
                    source="Import"
                    target=j[j.find("Import")+6:len(j)]
                elif ("Stock Draw" in j): #identify 'Stock Draw' and reverse links
                    source="Stock Draw"
                    target=j[j.find("Stock Draw")+10:len(j)]
                elif ("Stock Deposit" in j): #identify 'Stock Deposit' links
                    target="Stock Deposit"
                    source=j[j.find("Stock Deposit")+13:len(j)]
                else:
                    target=element
                if (k==m-1): c=0
                else:        c=m-1                        
                if (element!="Production"):
                    if (k!=m-2): #if not only agri-system
                        add_to_json(i,k,source,target,sum([w['v'] for w in isum[i][k][j]]),\
                                 sum([w['v'] for w in isum[i][c][j]]),isum[i][k][j])
                #Production
                elif (k==m-2): #if only agri-system
                    if (parent[j]=="None"):
                        add_to_json(i,k,source,target,sum([w['v'] for w in isum[i][k][j]]),\
                                 sum([w['v'] for w in isum[i][c][j]]),isum[i][k][j])
                elif (k==m-1): #if onlyfood weight
                    if (parent[j]=="None"):        
                        add_to_json(i,k,target,source,sum([w['v'] for w in isum[i][k][j]]),\
                                 sum([w['v'] for w in isum[i][c][j]]),isum[i][k][j])

Supergroup links

The food_aggregate function populates the supersum dictionary by aggregating links for all detail levels into their supergroups (themselves for detail level 3). In additions, it also creates sums of the links going into a certain node and saves these into the helperdict dictionary. This is necessary for adding parent links and special links later on.

In [20]:

def food_aggregate(country,year,element,isum,helperdict):
#create aggregate values for all items
    
    for i in range(n):
        for k in range(m):
            for j in isum[i][k].keys():
                #if link is not a parent aggregation itself
                if (("Import" not in j) and ("Stock Draw" not in j)):
                    if ("Stock Deposit" in j): p=j[13:]
                    else: p=j
                    if ((k!=m-2)&(element!="Production")): #if not only agri-system
                        #do not include seed values in parent links
                        if not ((parent[p]!="None")&(element=="Seed")):
                            if p in supersum[i][k].keys():
                                supersum[i][k][p]+=sum([w['v'] for w in isum[i][k][j]])
                            else:
                                supersum[i][k][p]=sum([w['v'] for w in isum[i][k][j]])
                        #create breakdowns for parent links detail levels 1-4  
                        if (parent[p]!="None"):
                            if parent[p] in helperdict[i][k].keys():
                                if supergroup[i][p] in helperdict[i][k][parent[p]]:
                                    helperdict[i][k][parent[p]][supergroup[i][p]].append(isum[i][k][j])
                                else:
                                    helperdict[i][k][parent[p]][supergroup[i][p]]=[isum[i][k][j]]
                            else:
                                helperdict[i][k][parent[p]]={supergroup[i][p]:[isum[i][k][j]]}
                    
                    #on agri-system level only include production values for crops with no parent links
                    elif ((parent[p]=="None")&(element=="Production")):
                            if p in supersum[i][k].keys():
                                supersum[i][k][p]+=sum([w['v'] for w in isum[i][k][j]])
                            else:
                                supersum[i][k][p]=sum([w['v'] for w in isum[i][k][j]])

    return helperdict

Parent links

The food_parent function iterates through the entries of the helperdict dictionary (i.e. food items with parent nodes) and saves them in the global data variable by invoking the add_to_json function. The diagram has been designed so that flows which terminate at nodes that are parents for other flows will represent the breakdown of the aggregate of items departing from the parent node. Therefore a flow containing the food item Milk going from Feed to Animal Products might have Milk in the Animal Products to Food and Animal Products to Waste flows, but it's depicted proportion in the Feed to Animal Products flow will correspond to the sum of the Animal Products to Food and Animal Products to Waste flows. Also in order to preserve the energetic equilibrium of the flows, a flow with source in the nodes Import or Stock Draw that cocntains Milk among its elements will have Feed as its sink.

In [21]:

def food_parent(country,year,helperdict):
#create parent links
    global parentsum
    
    parentsum = [[{} for k in range(m)] for i in range(n)] 
    for i in range(n):
        for k in range(m):
            for d in helperdict[i][k].keys():
                for e in helperdict[i][k][d].keys():
                    mydict={}
                    for f in helperdict[i][k][d][e]:
                        for j in f:
                            if j['l'] in mydict:
                                mydict[j['l']]+=j['v']
                            else:
                                mydict[j['l']]=j['v']
                    if d in parentsum[i][k]:
                        if e in parentsum[i][k][d]:
                            parentsum[i][k][d][e]=[{'l':p,'v':mydict[p]} for p in mydict.keys()]
                        else:
                            parentsum[i][k][d].update({e:[{'l':p,'v':mydict[p]} for p in mydict.keys()]})
                    else:
                        parentsum[i][k].update({d:{e:[{'l':p,'v':mydict[p]} for p in mydict.keys()]}})
                    if (k==m-1): c=0
                    else:        c=m-1
                    if (k!=m-2): #if not only agri-system
                        try: add_to_json(i,k,d,e,supersum[i][k][e],supersum[i][c][e],parentsum[i][k][d][e]) 
                        except: pass

    return parentsum

Crop residues

The food_residue function calculates food residues and add the corresponding links from food items to the Crop Residue node. Cros residue fractions are stroed in the global variable gross and they represent the fraction of the plant mass that can be further used for energy harvesting (mainly through combustion) pruposes. This is logcially restricted to primary crops only.

Detail levels 0 and 1 differ by the the former incorporating crop residue links into the diagram. However, the estimation of crop residue fraction is highly subejctive and uncertain and it depends on the type of crop and the measuring method used. Therefore the scope of this detail level is only orientative. It can provide the reader with a very accurate general picture of the food energy system, but the actual crop residue mass values might differ. Estimating the accurate energy content of crop residues also repsents a challenge as it depends on the type of energy-extraction method used. Here we estimate the energy that a crops may yield after only stage of energy conversion - which in most of the cases will be conversion to heat energy through burning. An average heat energy content of 300kcal/100g has been used throughout the crop spectrum, corresponding to typical caloric values of rice husk/straw/dry grass. Therefore, the calculated crop residue mass is multiplied by the 300 / foodcalorie ratio, where foodcalorie is the energy content of the each food item.

Crop residue values are saved in the residue dictionary and subsequently added to the global data variable. For detail level 0, the crop residues are counted as usable energy, therefore they influence the EROEI - this is reflected by adding their sum to the global variable foodoutput.

In [22]:

def food_residue(country,year):
#add links for crop residues

    global foodoutput
    
    residue=[{} for i in range(n)]    
    for i in range(n):
        for j in supersum[3][m-1].keys():
            #process only primary crops
            if (parent[j]=="None"):
                if j in gross.keys():
                    #adjust for fact that crop residues are burnt
                    #therefore have a rather uniform energy content (300)
                    try: calorie_multiplier=300/foodcalorie[j][year]
                    except: 
                        try:    calorie_multiplier=300/foodcalorie[supergroup[1][j]][year]
                        except: calorie_multiplier=1
                    cr_en=supersum[3][0][j]*gross[j]*calorie_multiplier
                    if supergroup[i][j] in residue[i].keys():
                        residue[i][supergroup[i][j]].append({'l':j,'v':cr_en})
                    else:
                        residue[i][supergroup[i][j]]=[{'l':j,'v':cr_en}]
        for j in residue[i].keys():
            try: calorie_multiplier=300/foodcalorie[j][year]
            except: 
                try:    calorie_multiplier=300/foodcalorie[supergroup[1][j]][year]
                except: calorie_multiplier=1
            cr_en=supersum[i][0][j]*gross[j]*calorie_multiplier
            cr_su=supersum[i][m-1][j]*gross[j]
            add_to_json(i,0,j,"Crop Residue",cr_en,cr_su,residue[i][j])
            if (i==0):
                foodoutput["Crop Residue"][0]+=cr_en

Processed foods of animal origin

Processed foods of animal origin represent a special types of links because them having two parent nodes, both Processed and Feed. Therefore extra links between the nodes Feed and Processed need to be added. The FAO supergroup for these food items is called Animal Fats. The function food_animal looks for the existence of such a key in the supersum dictionary and saves the extra links into data if needed.

In [23]:

def food_animal(country,year,parentsum):
#add extra links needed for processed foods of animal origin
    for i in range(n):    
        for k in range(m):  
            if (k==m-1): c=0
            else:        c=m-1
            if (k!=m-2): #if not only agri-system
                if "Animal Fats" in supersum[1][k].keys():
                    add_to_json(i,k,"Feed","Processing",supersum[1][k]["Animal Fats"],\
                        supersum[1][c]["Animal Fats"],parentsum[1][k]["Processing"]["Animal Fats"]) 

Oil palm

Palm oil represents a special category, as it is a secondary (processed) food product, but it does not have an originating primary crop. Therefore the Oil Palm primary food crop has to be created artificially, using consistent number and processing ratios with other oil crops and a 50% oil extraction ratio in lack of these.

In [24]:

def food_palm(country,year,palmsum): 
#Palm Oil only belongs to Processing, no other categories
#Create Oil Palm (practically non-food) parent crop flow
    
    adjuster=1/0.5 #if no data assume 50% oil extraction ratio from palm, otherwise based on other oilcrops
    for i in range(n):
        for k in range(m):        
            if "Palm Oil" in  supersum[3][k].keys():
                #if not all of Palm Oil is imported
                if supersum[3][0]["Palm Oil"]>palmsum: 
                    try:
                        #if ((supersum[0][0]["Vegetable Oils"]-supersum[3][0]["Palm Oil"]!=0) & \
                        #("Oilcrops" in supersum[0][0].keys())):

                            #if ((i==0)&(k==0)): 
                                #need to adjust for average oil products processing rate, only once
                                #adjuster = supersum[i][k]["Oilcrops"]/\
                                #      (supersum[0][0]["Vegetable Oils"]-supersum[3][0]["Palm Oil"])
                                #old version, just use universal value instead
                                
                        value=(supersum[3][0]["Palm Oil"]-palmsum)*adjuster
                        
                        #if "Oilcrops" in foodcalorie.keys(): cal=foodcalorie["Oilcrops"][year]
                        #else: cal=300
                        #old version, just use global average instead
                        
                        cal=365.25
                        prod=value/(cal*0.00116222222/100)
                                     #use average Oilcrop energy content to calculate Palm Oil weight  
                        if (k==m-1): #if in weight group, then swap values
                            value, prod = prod, value
                        if ((i<(n-1)) & ("Oilcrops" in groupdict[i][k].keys()) & \
                            ("Oilcrops" in supersum[i][k].keys())):
                            for w in data[i][k]['links']:
                                if ((w['source']==groupdict[i][k]["Oilcrops"]) & \
                                    (w['target']==groupdict[i][k]["Processing"])):
                                    w['value']+=value
                                    w['prod']+=prod
                                    w['supply'].append({'l':"Oil Palm", 'v':value})
                            supersum[i][k]["Oilcrops"]+=value  
                            if (k==1):
                                for w in data[i][2]['links']:
                                    if ((w['source']==groupdict[i][2]["Oilcrops"]) & \
                                        (w['target']==groupdict[i][2]["Production"])):
                                        w['value']+=value
                                        w['prod']+=prod
                                        w['supply'].append({'l':"Oil Palm", 'v':value})
                                supersum[i][2]["Oilcrops"]+=value
                                for w in data[i][3]['links']:
                                    if ((w['source']==groupdict[i][3]["Production"]) & \
                                        (w['target']==groupdict[i][3]["Oilcrops"])):
                                        w['value']+=prod
                                        w['prod']+=value
                                        w['supply'].append({'l':"Oil Palm", 'v':prod})
                                supersum[i][3]["Oilcrops"]+=prod      #questionable                           
                                if (i==1): #add production link, only once
                                    foodoutput["Food"][2]+=value
                                    add_to_json(3,2,"Oil Palm","Production",value,prod,[{'l':"Oil Palm", 'v':value}])
                                    supersum[3][2]["Oil Palm"]=value
                        else:
                            if (i<(n-1)): source="Oilcrops"
                            else: source="Oil Palm"                    
                            supersum[i][k][source]=value
                            if (k<m-2): #if not only agri-system
                                add_to_json(i,k,source,"Processing",value,prod,[{'l':"Oil Palm", 'v':value}])
                            elif (k==m-2): #agri-system
                                add_to_json(i,k,source,"Production",value,prod,[{'l':"Oil Palm", 'v':value}])
                                foodoutput["Food"][k]+=value
                            else: #production weight needs to be adjusted for consistency 
                                add_to_json(i,k,"Production",source,value,prod,[{'l':"Oil Palm", 'v':value}])
                                add_to_json(i,k,source,"Processing",value,prod,[{'l':"Oil Palm", 'v':value}])
                                foodoutput["Food"][k]+=value
                    except: pass

Crop Share

Calculate primary crop shares in weight. These values are used later for a proportional distribution of fossil fuel and electricity inputs into agriculture. This information is saved into the share variable, for all dimensions and all crops.

In [25]:

def food_share(country,year): 
#calculate crops share for energy input distribution
   
    global foodinput  #summation variable for EROEI calculation
    global share

    for i in range(n):
        crop=[0 for k in range(m)]
        for k in range(m):
            for j in supersum[i][k].keys():
                if (parent[j]=='None'):  #calculate primary crops share - parentless entries
                    crop[k]+=supersum[i][k][j]

        for k in range(m):
            for j in supersum[i][k].keys():
                if (parent[j]=='None'):
                    share[i][k][j] = supersum[i][k][j]/crop[k]

                if (supergroup[0][j]=="Aquatic Products"): # calculate aquatic products share
                    share[i][k][j] = supersum[i][k][j]/supersum[0][k]["Aquatic Products"]

2.2. Fuels part

Define global regions for interpolation.

In [26]:

#South Asia
SAS={'Afghanistan',
'Maldives',
'Bangladesh',
'Nepal',
'Bhutan',
'Pakistan',
'India',
'Sri Lanka'}

#East Asia and Pacific
EAP={'American Samoa',
'Myanmar',
'Cambodia',
'Palau',
'China',
'Papua New Guinea',
'Fiji',
'Philippines',
'Indonesia',
'Samoa',
'Kiribati',
'Solomon Islands',
'Korea, Dem. Rep.',
'Thailand',
'Lao PDR',
'Timor-Leste',
'Malaysia',
'Tonga',
'Marshall Islands',
'Tuvalu',
'Micronesia, Fed. Sts.',
'Vanuatu',
'Mongolia',
'Vietnam'}

#South America
SA={'Argentina',
'Guyana',
'Belize',
'Haiti',
'Bolivia',
'Honduras',
'Brazil',
'Jamaica',
'Colombia',
'Mexico',
'Costa Rica',
'Nicaragua',
'Cuba',
'Panama',
'Dominica',
'Paraguay',
'Dominican Republic',
'Peru',
'Ecuador',
'St. Lucia',
'El Salvador',
'St. Vincent and the Grenadines',
'Grenada',
'Suriname',
'Guatemala',
'Venezuela, RB'}

#Low income food deficit countries
LIFDC={'Afghanistan',
'Korea, Dem. Rep.',
'Bangladesh',
'Liberia',
'Benin',
'Madagascar',
'Burkina Faso',
'Malawi',
'Burundi',
'Mali',
'Cambodia',
'Mozambique',
'Central African Republic',
'Myanmar',
'Chad',
'Nepal',
'Comoros',
'Niger',
'Congo, Dem. Rep.',
'Rwanda',
'Eritrea',
'Sierra Leone',
'Ethiopia',
'Somalia',
'Gambia, The',
'Tajikistan',
'Guinea',
'Tanzania',
'Guinea-Bissau',
'Togo',
'Haiti',
'Uganda',
'Kenya',
'Zimbabwe'}

#EU
EU={'Austria',
'Italy',
'Belgium',
'Latvia',
'Bulgaria',
'Lithuania',
'Croatia',
'Luxembourg',
'Cyprus',
'Malta',
'Czech Republic',
'Netherlands',
'Denmark',
'Poland',
'Estonia',
'Portugal',
'Finland',
'Romania',
'France',
'Slovak Republic',
'Germany',
'Slovenia',
'Greece',
'Spain',
'Hungary',
'Sweden',
'Ireland',
'United Kingdom'}

#Carribean
CAR={'Antigua and Barbuda',
'Jamaica',
'Bahamas, The',
'St. Kitts and Nevis',
'Barbados',
'St. Lucia',
'Belize',
'St. Vincent and the Grenadines',
'Dominica',
'Suriname',
'Grenada',
'Trinidad and Tobago',
'Guyana'}

#Least developed countries
LDC={'Afghanistan',
'Madagascar',
'Angola',
'Malawi',
'Bangladesh',
'Mali',
'Benin',
'Mauritania',
'Bhutan',
'Mozambique',
'Burkina Faso',
'Myanmar',
'Burundi',
'Nepal',
'Cambodia',
'Niger',
'Central African Republic',
'Rwanda',
'Chad',
'Sao Tome and Principe',
'Comoros',
'Senegal',
'Congo, Dem. Rep.',
'Sierra Leone',
'Djibouti',
'Solomon Islands',
'Equatorial Guinea',
'Somalia',
'Eritrea',
'South Sudan',
'Ethiopia',
'Sudan',
'Gambia, The',
'Tanzania',
'Guinea',
'Timor-Leste',
'Guinea-Bissau',
'Togo',
'Haiti',
'Tuvalu',
'Kiribati',
'Uganda',
'Lao PDR',
'Vanuatu',
'Lesotho',
'Yemen, Rep.',
'Liberia',
'Zambia'}

#North America
NA={'Canada','United States'}

In [27]:

def get_pcountry(country):
    if country in LDC: return 'Least Developed Countries'
    elif country in LIFDC: return 'Low Income Food Deficit Countries'
    elif country in CAR: return 'Caribbean'
    elif country in EU: return 'European Union'
    elif country in NA: return 'Northern America'
    elif country in SA: return 'South America'
    elif country in SAS: return 'Southern Asia'
    elif country in EAP: return 'Eastern Asia'
    else: return 'World'

In [28]:

def get_ptcountry(country):
    if country in LDC: return 'Least Developed Countries + (Total)'
    elif country in LIFDC: return 'Low Income Food Deficit Countries + (Total)'
    elif country in CAR: return 'Caribbean + (Total)'
    elif country in EU: return 'European Union + (Total)'
    elif country in NA: return 'Northern America + (Total)'
    elif country in SA: return 'South America + (Total)'
    elif country in SAS: return 'Southern Asia + (Total)'
    elif country in EAP: return 'Eastern Asia + (Total)'
    else: return 'World + (Total)'

In [29]:

def country_name_converter6(country):
    #convert country names for new FAOSTAT 
    
    #aggregate regions 
    if "+" in country: return country[0:country.find("+")-1]
    
    else: return country

Define interpolation function which interpolates proportionally with the change of a country's GDP.

In [30]:

def gdp_interpolate(y,years,fit=2,depth=1,order=1,override=False):
    if 'GDP (current US$)' in wdi_import_extended:
        ratio={}
        w={}
        for year in y:
            try:
                ratio[year]=y[year]/(wdi_import_extended['GDP (current US$)'][year]/1000000)
            except: pass
        ratio=interpolate(ratio,years,1)
        for year in ratio:
            w[year]=ratio[year]*(wdi_import_extended['GDP (current US$)'][year]/1000000)
        return w
    else: return interpolate(y,years,fit,depth,order,override)

2.2.1 Fuel profiling

Fuel pre-profiling

Conduct fuel pre-profiling. For the energy inputs, we need to calculate the electricity used for food processing. This data comes from FAOSTAT, where available. Otherwise an alternative approach is used.

First, we try energy proportionality:
- If the World Bank Development Indicators (WDI) reports the electricity used by agriculture, then that value is used.
  - If not then if the WDI has agriculture electricity usage share and total electricity consumption, we calculate the electricity used by agriculture from there.
  - If electricity data is not available but total energy is, the data is scaled proportionally with the electricity's share in the total energy mix for a particular country in a particular year.
If none of the above works, we try GDP proportionality:
- If Agriculture as a fraction of GDP is reported in the WDI for a particular country, then we calculate the country specific energy intensity of the economy using energy consumption per unit of GDP. Then continuing the process above, we get the electricity used by agriculture.

The results are saved into the agrienergyshare variable. This code is invoked each years, for each country.

In [31]:

def remove_total(agrienergyshare):
    newindex=[]
    agrienergyshare=agrienergyshare.unstack()
    for i in agrienergyshare.index:
        if '+ (Total)' in i: newindex.append(i[:-10])
        else: newindex.append(i)
    agrienergyshare.index=newindex
    agrienergyshare=agrienergyshare.stack()
    return agrienergyshare

In [32]:

def fuel_preprofiling(years,fit=2):

    elements=['Agriculture and forestry energy use as a % of total Energy use']
    agrienergyshare = get_data([masterpath,indicators[5],indicators[5]],\
                              ['Country','Element','Year','Value'],[0,2],elements).drop("Element",axis=1)
    agrienergyshare=remove_total(agrienergyshare)
    x={} #load agriculture energy share from the FAO database
    
    for i in agrienergyshare.T.iteritems(): #make just one call, filter to country, superfast
        v=i[1][0] #choose column in which food item values are stored
        if (~np.isnan(v)&(v!=0)): #in FAOSTAT, NaN and 0 are the same thing
            year    = int(i[0][1]) #choose column in which years are stored
            country = i[0][0]      #choose column in which country names names are stored
            v       = i[1][0]      #choose column in which values are stored, might need to change it
            if country not in x: x[country]={}
            if year not in x[country]: x[country][year]={}
            x[country].update({year:v})
    
    #extrapolate from regional or global values for missing countries
    for country in countries:
        if ((country not in x)&(country!='World')):
            if country!='World':
                pcountry=get_pcountry(country)
                x[country]=x[pcountry]
    
    for j in x.keys():
        #interpolate missing values
        if set(x[j].keys()).intersection(years):
            x[j].update(interpolate(x[j],years,1))
    
    #as a last resort try GDP proportionality
    elsources_backup=['GDP (current US$)',\
                'Energy use (kt of oil equivalent)','Electric power consumption (kWh)',\
                'Electric power consumption (kWh per capita)','Energy use (kg of oil equivalent per capita)'\
                'Population, total']
    electricity2=get_data([wdipath,'WDI_csv',wdindicators[0]])
    electricity2.columns=['Country','Code','Indicator','ICode']+[1960+i for i in range(55)] #range(54) old
    electricity2=electricity2.drop('Code', axis=1).drop('ICode', axis=1)\
               .query('Indicator=='+repr(elsources_backup)).set_index(['Country','Indicator'],drop=True)
    
    y={} #load data from the WDI database
    for i in electricity2.T.iteritems():
        country=i[0][0]
        item=i[0][1]
        if country not in y: y[country]={}
        for j in range(len(i[1])):
            v=i[1][i[1].index[j]]
            if (~np.isnan(v)&(v!=0)):
                if item not in y[country]: y[country][item]={}
                y[country][item].update({int(i[1].index[j]):v})
            
    for j in y.keys():
        for k in y[j].keys():
            #interpolate missing values
            if set(y[j][k].keys()).intersection(years):
                y[j][k].update(interpolate(y[j][k],years,fit))
    
    z={}
    w={}
    for country in countries:
        if ((country_name_converter(country) in y.keys()) and (country in x.keys())):
            #if both interpolations sccessful, there should be a complete overlap of years
            for year in x[country].keys():
                v1=v2=0
                #fill electricity data, try agriculture energy share
                if  elsources_backup[2] in y[country_name_converter(country)]:
                    v1=x[country][year]/100*\
                    y[country_name_converter(country)][elsources_backup[2]][year]*1E-9
                #if not try GDP proportionality
                elif  elsources_backup[0] in y[country_name_converter(country)]:
                    pcountry=country_name_converter(get_pcountry(country)) 
                    if pcountry in y.keys():
                        if ((elsources_backup[2] in y[pcountry].keys()) and \
                            (elsources_backup[0] in y[pcountry].keys())):
                            helpervalue=y[pcountry][elsources_backup[2]][year]*\
                                        y[country_name_converter(country)][elsources_backup[0]][year]/\
                                        y[pcountry][elsources_backup[0]][year]
                            v1=x[country][year]/100*helpervalue*1E-9                            
                        
                #same for energy
                if  elsources_backup[1] in y[country_name_converter(country)]:
                    v2=x[country][year]/100*\
                    y[country_name_converter(country)][elsources_backup[1]][year]
                elif  elsources_backup[0] in y[country_name_converter(country)]:
                    pcountry=country_name_converter(get_pcountry(country)) 
                    if pcountry in y.keys():
                        if ((elsources_backup[1] in y[pcountry].keys()) and \
                            (elsources_backup[0] in y[pcountry].keys())):
                            helpervalue=y[pcountry][elsources_backup[1]][year]*\
                                        y[country_name_converter(country)][elsources_backup[0]][year]/\
                                        y[pcountry][elsources_backup[0]][year]
                            v2=x[country][year]/100*helpervalue
                
                #record into dictionary
                if (~np.isnan(v1)&(v1!=0)):
                    if country not in z: z[country]={}
                    if year not in z[country]: z[country][year]={}
                    z[country].update({year:v1/0.000277777778}) #convert from Wh to J
                if (~np.isnan(v2)&(v2!=0)):
                    if country not in w: w[country]={}
                    if year not in w[country]: w[country][year]={}
                    w[country].update({year:v2*41.868}) #convert from ktoe to TJ
    
    return [z,w]

Fuel profiling

Direct energy inputs from the FAOSTAT and WDI databases, as well as those obtained during the pre-profiling are loaded, interpolated and normalized for a certain country, for a given time period (given by the years argument of the function). During the process, data quality indicators are collected in the missing and integrity global variables. The fuel_profiling returns a dictionary which has elements of the years array as (secondary) keys and profiled values as values, for all fuel types (primary keys).

In [33]:

def fuel_profiling(country,years):
    global integrity
    global missing
    zerodata=[]
    y={}
    country=country_name_converter6(country) #need to convert old FAOSTAT country names to new FAOSTAT
    for year in years: #fill exiting data
        for fuel in fuels1+fuels2+fuels3:
            try:    
                for i in energy.xs([country,year,fuel], level=[0,1,3]).T.iteritems():
                    fuelvalue=i[1][0] #choose column in which fuel item values are stored
                    if (~np.isnan(fuelvalue)&(fuelvalue!=0)):
                        if fuel not in y: y[fuel]={}
                        y[fuel].update({year:fuelvalue})
                    else:
                        zerodata.append(fuel)
            except: pass
    
    for i in y.keys():

        #update data integrity
        for j in set(years).difference(y[i].keys()):
            integrity[j].append(i)

        #interpolate missing dataframe values
        if set(y[i].keys()).intersection(years):
            if i in fuels1: y[i].update(gdp_interpolate(y[i],years,2)) #electricity values change with GDP
            else: y[i].update(interpolate(y[i],years,1))
        else: print country, 'no data in the entire time range', i   
    
    if set(fuels1+fuels2+fuels3).difference(y.keys()):
        for i in set(fuels1+fuels2+fuels3).difference(y.keys()):
            if i not in zerodata:
                print country, 'no data in the entire time range', i
                missing.add(i)
            
    return y

Fuel re-profiling

Fossil fuels used in fisheries and electricity used in agriculture for power irrigation and processing of secondary food products are profiled again, after the initial fossil fuel values are obtained during profiling (y argument), using a different approach to obtain values not yet present in the dataframe - and then interpolated and normalized for a certain country in a similar manner to the main profiling function, for a given time period (given by the years argument of the function). During the process, data quality indicators are updated in the missing, estimated and integrity global variables. The fuel_reprofiling returns a dictionary which has elements of the years array as (secondary) keys and profiled values as values, for all fuel types (primary keys).

Electricity for irrigation - this data comes from FAOSTAT, where available. Otherwise we have calculated a characteristic energy requirement per acre of irrigated land for each climate belt based on the irrigation intensities of neighboring countries and global values based on geographical latitude bands.
Fossil fuels for fishing - this data comes from FAOSTAT for larger fishing nations. For smaller ones, we calculate a characteristic fossil energy requirement per ton of fish for the neighboring countries or global values based on geographical latitude bands.

In [34]:

def fuel_reprofiling(country,years,y):
    global integrity
    global missing
    global estimated
    zerodata=[]
    fuels=['Electricity','Energy for power irrigation']+fuels3+fuels2
    for fuel in fuels:
        if (fuel not in y):
            if fuel=='Electricity':

                for year in years: #fill exiting data
                    try: 
                        fuelvalue=agrienergy[0][country][year]
                        if (~np.isnan(fuelvalue)&(fuelvalue!=0)):
                            if fuel not in y: y[fuel]={}
                            y[fuel].update({year:fuelvalue})
                    except: pass
                    
            if fuel=='Energy for power irrigation':
                for year in years: #fill exiting data
                    try: 
                        #try assigning irrigation belt characteristic value (area*irrigation intensity for latitude)
                        fuelvalue=land.xs([country,year], level=[0,1]).values[0][0]*\
                                       irrig_intensity[(int)(country_lat[country])]
                        if (~np.isnan(fuelvalue)&(fuelvalue!=0)):
                            if fuel not in y: y[fuel]={}
                            y[fuel].update({year:fuelvalue}) #no need to convert data already in TJ
                    except: pass
                    
        if [i for i in fuels3 if i not in y]:#if none of fuels for fishing is in y
            #old fuel=fuels3[0] #assume all fishery fuel is Gas-diesel
            fuel="Fossil fuels in fisheries"
            missing.add(fuel)
            for year in years: #fill exiting data
                try: 
                    #try assigning fishery characteristic value
                    try: fuelvalue=fishing_energy(country,year)
                    except: pass
                    if (~np.isnan(fuelvalue)&(fuelvalue!=0)):
                        if fuel not in y: y[fuel]={}
                        y[fuel].update({year:fuelvalue}) #no need to convert data already in TJ
                except: pass
                
        if [i for i in fuels2 if i not in y]:#if none of fossil fuels is in y
            #old fuel=fuels2[0] #assume all fossil fuel is Gasoline
            fuel="Other fossil fuels"
            missing.add(fuel)
            for year in years: #fill exiting data
                try: 
                    #try assigning fossil characteristic value
                    fuelvalue=agrienergy[1][country][year]
                    if (~np.isnan(fuelvalue)&(fuelvalue!=0)):
                        if fuel not in y: y[fuel]={}
                        y[fuel].update({year:fuelvalue}) #no need to convert data already in TJ
                except: pass

    for i in fuels+["Other fossil fuels","Fossil fuels in fisheries"]:
        if ((i in y)&(i in missing)):
            #update data integrity
            for j in set(years).difference(y[i].keys()):
                integrity[j].append(i)

            #interpolate missing dataframe values
            if set(y[i].keys()).intersection(years):
                y[i].update(interpolate(y[i],years,1))
                print country, 'tried alternate source for', i,', success'
                if i in ["Fossil fuels in fisheries"]: #if fishing fuel, then remove all
                    for j in fuels3+["Fossil fuels in fisheries"]:
                        if j in missing:
                            missing.remove(j)
                elif i in ["Other fossil fuels"]: #if fossil fuel, then remove all
                    for j in fuels2+["Other fossil fuels"]:
                        if j in missing:
                            missing.remove(j)
                elif i in missing:
                    missing.remove(i)
                estimated.add(i)
                
            else: print country, 'tried alternate source for', i,', no luck' 
                
                 
    if "Fossil fuels in fisheries" in missing.copy(): #if fishing fuel, then remove all
        for j in fuels3:
            if j in missing:
                missing.remove(j)
    elif "Other fossil fuels" in missing.copy(): #if fossil fuel, then remove all
        for j in fuels2:
            if j in missing:
                missing.remove(j)            
    
    return y

2.2.2 Fuel parsing

Electricity Output

Direct elelctricity outputs are loaded from the electricity_extended global dictionary, which has been returned by the fuel_reprofiling function. Using the add_to_json function, the electricity-to-primary crops (irrigation) and electricity-to-processing flows are saved into the data variable and effectively added to the visualization structure.

In [35]:

def fuels_part1(country, year):
#if True: # placeholder dummy
         
        ###############################################################################################
        ######                              FUELS PART - Electricity                             ######
        ###############################################################################################
        
    if True: # placeholder dummy
        
        global foodinput #summation variable for EROEI calculation
        global eltotal
        global eltotal_n

        #distribute electricity input for crop irrigation
        for fuel in [fuels1[1]]:
            if fuel in energy_extended.keys():
                try:
                    fuelvalue=energy_extended[fuel][year]
                    if (~np.isnan(fuelvalue)&(fuelvalue!=0)):
                        electricityvalue=fuelvalue*0.000277777778 #1 joule = 0.000277777778 Wh
                        for i in range(n):
                            for k in range(m-1): #supply data (k=3) does not have non-food inputs
                                #distribute electricity input by locally grown crops share only (arg 2 instead of k)
                                for j in supersum[i][2]:
                                    if (parent[j]=='None'):
                                        #distribute total input based on supply-based share
                                        add_to_json(i,k,fuels1[0],j,electricityvalue*share[i][3][j],0,\
                                                '[{"l":'+repr(fuel)+',"v":'+repr(electricityvalue*share[i][3][j])+'}]')
                        eltotal+=electricityvalue
                        eltotal_n+=electricityvalue
                except: pass
                
        #electricity not used for irrigation goes for food processing
        for fuel in [fuels1[0]]:
            if fuel in energy_extended.keys():
                try:
                    fuelvalue=energy_extended[fuel][year]
                    if (~np.isnan(fuelvalue)&(fuelvalue!=0)):
                        electricityvalue=fuelvalue*0.000277777778 #1 joule = 0.000277777778 Wh
                        for i in range(n):     #exclude level 4 for now - was put back in
                            for k in range(m-1): #supply data (k=3) does not have non-food inputs
                                if (k!=m-2): #if not only agri-system
                                    add_to_json(i,k,fuel,"Processing",electricityvalue,0,\
                                        '[{"l":'+repr(fuel)+',"v":'+repr(electricityvalue)+'}]')
                        eltotal+=electricityvalue
                except: pass

Fossil to Electricity - Database setup

This section calculates the equivalent primary energy inputs for the required electric energy defined by the above cell. First, the data is loaded from the WDI database, where available.

In [36]:

#WDI database access, needed for fossil-to-electrcity part
wdipath = 'http://databank.worldbank.org/data/download/'
#wdipath = 'https://dl.dropboxusercontent.com/u/531697/datarepo/Food-Energy/db/' #alternative mirror for WDI database
if localpath:
    #local path for downloaded FAO database, only for developer purposes
    wdipath = 'E:/Dropbox (MIT)/Public/Food-Energy/db/'
    
wdindicators=[] #we store indicator name strings in this string array - do not touch
wdindicators.append('WDI_Data') #0
wdindicators.append('WDI_Description') #1
wdindicators.append('WDI_Country') #2
wdindicators.append('WDI_Series') #3
wdindicators.append('WDI_CS_Notes') #4
wdindicators.append('WDI_ST_Notes') #5
wdindicators.append('WDI_Footnotes') #6

elsources1=['Electricity production from coal sources (% of total)',\
'Electricity production from natural gas sources (% of total)',\
'Electricity production from oil sources (% of total)']
elsources2=['Electricity production from nuclear sources (% of total)',\
'Electricity production from hydroelectric sources (% of total)',\
'Electricity production from renewable sources, excluding hydroelectric (% of total)']
electricity=get_data([wdipath,'WDI_csv',wdindicators[0]])
electricity.columns=['Country','Code','Indicator','ICode']+[1960+i for i in range(55)] #range(54) #old WDI
electricity=electricity.drop('Code', axis=1).drop('ICode', axis=1)\
                       .query('Indicator=='+repr(elsources1+elsources2)).set_index(['Country','Indicator'],drop=True)
#data availability is 1960-2014

Fossil to Electricity - Data processing

After loading the data into the global variable electricity, country-specific electricty mixes and primary-energy (fossil fuels and renewables and nuclear) specific energy conversion factors are used to calculate the priamry energy inputs entering into electricity. Essentially, this is a profiling process, therefore the data is loaded, interpolated and normalized for a certain country - or extrapolated from the neighbors or global values, for a given time period (given by the years argument of the function). During the process, data quality indicators are collected in the missing, estimated and integrity global variables. The electricity_profiling returns a dictionary which has elements of the years array as (secondary) keys and profiled values as values, for all fuel types (primary keys).

In [37]:

def electricity_profiling(country,years):  
    global integrity
    global missing
    global estimated
    zerodata=[]
    #locally convert country names between databases
    country=country_name_converter(country)
    y={}
    for year in years: #fill exiting data
        for t in range(2):
            if (t<1): elsources=elsources1
            else:     elsources=elsources2
            for c in range(len(elsources)):
                try:    
                    #choose column in which fuel item values are stored
                    fuelvalue=electricity.xs([country,elsources[c]])[year] 
                    #WDI database consistently reports 0 values if that's the case
                    #if (~np.isnan(fuelvalue)&(fuelvalue!=0)): 
                    if (~np.isnan(fuelvalue)):
                        if elsources[c] not in y: y[elsources[c]]={}
                        y[elsources[c]].update({year:fuelvalue})
                    else:
                        zerodata.append(elsources[c])
                except: pass

    for i in y.keys():

        #update data integrity
        for j in set(years).difference(y[i].keys()):
            label=i
            try: label="Electricity from "+label[label.find("Elec")+28:label.find("sources")-1].capitalize()
            except: pass
            integrity[j].append(label)

        #interpolate missing dataframe values
        if set(y[i].keys()).intersection(years):
            y[i].update(interpolate(y[i],years,1)) #electricity share, changes linearly
    
    if set(elsources1+elsources2).difference(y.keys()):
        for i in set(elsources1+elsources2).difference(y.keys()):
            if i not in zerodata:
                label=i
                try: label=label[label.find("Elec")+28:label.find("sources")-1].capitalize()+" Electricity"
                except: pass
                print country, 'no data in the entire time range', label
                missing.add(label)

    #if all fuel sources missing data, then take regional/global average for fuel sources
    if ((y=={})&(country!='World')):
        if country!='World':
            if country in LDC: 
                pcountry='Least Developed Countries'
            elif country in LIFDC: 
                pcountry='Low Income Food Deficit Countries'
            elif country in CAR: 
                pcountry='Caribbean'
            elif country in EU: 
                pcountry='European Union'
            elif country in NA: 
                pcountry='Northern America'
            elif country in SA: 
                pcountry='South America'
            elif country in SAS: 
                pcountry='Southern Asia'
            elif country in EAP: 
                pcountry='Eastern Asia'
            else: 
                pcountry='World'
                
            print country, 'no data in the entire time range','All electricity sources, extrapolating from', pcountry
            
            y=electricity_profiling(pcountry,years)
            estimated.add("All electricity sources")
                
    for year in years: #renormalize shares to 100%
        
        s=0
        for fuel in y.keys():
            s+=y[fuel][year]
                
        for fuel in y.keys():
            y[fuel][year]=(y[fuel][year]/s)*100
            
    return y

Electricity Input

Direct elelctricity inputs are loaded from the electricity_extended global dictionary, which has been updated by the electricity_profiling function. Using the add_to_json function, the fossil-to-electrcity and renewable-to-electricity flows are saved into the data variable and effectively added to the visualization structure.

In [38]:

def fuels_part2(country, year, eltotal2, it):
#if True: # placeholder dummy
         
        ###############################################################################################
        ######                    FUELS PART - Fossil to Electricity                             ######
        ###############################################################################################
        
    if True: # placeholder dummy
        
        global foodinput #summation variable for EROEI calculation
        global foodinput2 #summation variable for EROEI calculation
        global electricity_to_fossil  #summation for primary equivalent of electricity
        electricity_to_fossil=0
        
        #create fossil to electricity links
        #need to define fossil-to-electrcity conversion efficiencies
        #http://www.eurelectric.org/Download/Download.aspx?DocumentID=13549
        #http://www.ecofys.com/files/files/ecofys-2012-international-comparison-fossil-power-efficiency.pdf   
        
        #locally convert country names between databases
        country=country_name_converter(country)
        
        for t in range(2):
            ffsum={}
            if (t<1):
                elsources=elsources1
                #conversion efficiency of fossil source into electrcity
                conv_eff=[0.39,0.38+((year-1960)/60.0)*0.2,0.38]
                src="Fossil Fuels"
            else:
                elsources=elsources2
                #conversion efficiency of renewables & nucelar into electricity, currently outside of scope
                conv_eff=[1,1,1] #[0.34,0.93,0.3]
                src="Renewables & Nuclear"
            for c in range(len(elsources)):
                if elsources[c] in electricity_extended.keys():
                    try:
                        if elsources[c] in ffsum:
                            ffsum[elsources[c]]+=eltotal2*(1/conv_eff[c])\
                            *electricity_extended[elsources[c]][year]/100.0
                        else:
                            ffsum[elsources[c]]=eltotal2*(1/conv_eff[c])\
                            *electricity_extended[elsources[c]][year]/100.0
                        if (it==0): #sum primary equivalent of electricity
                            electricity_to_fossil+=eltotal2*(1/conv_eff[c])\
                            *electricity_extended[elsources[c]][year]/100.0
                    except: pass
            
            mydata=[]
            for c in range(len(elsources)):
                 if elsources[c] in electricity_extended.keys():
                    try:
                        label=elsources[c]
                        label=label[label.find("Elec")+28:label.find("sources")-1].capitalize()
                        mydata.append({"l":label, "v":ffsum[elsources[c]]})
                    except: pass
            if (~np.isnan(sum(ffsum.values()))&(sum(ffsum.values())!=0)):
                for i in range(n): #exclude level 4 for now  - was put back in
                    for k in range(m-1):
                        if (k!=2):
                            if (it==0):
                                add_to_json(i,k,src,"Electricity",sum(ffsum.values()),0,mydata) 
                        elif (it==1):
                                add_to_json(i,k,src,"Electricity",sum(ffsum.values()),0,mydata) 

            if (it==0):
                for k in range(m-2):
                    foodinput[k]+=sum(ffsum.values())
                    foodinput2[src][k]+=sum(ffsum.values())
            else: 
                foodinput[2]+=sum(ffsum.values())
                foodinput2[src][2]+=sum(ffsum.values())

Fossil Fuels

Direct fossil inputs are loaded from the energy_extended global dictionary, which has been created by the fuel_profiling function. Using the add_to_json function, the fossil-to-primary crops and fossil-to-aquatic products flows are saved into the data variable and effectively added to the visualization structure. Fossil fuels not reported in FAOSTAT are estimated from WDI.

In [39]:

def fuels_part3(country, year):
#if True: # placeholder dummy

        ###############################################################################################
        ######                             FUELS PART - Fossil fuels                             ######
        ###############################################################################################
        
    if True: # placeholder dummy
        
        global foodinput
        
        ### FOSSIL FUELS DIRECT INTO AGRICULTURE
        fuelsum=[[{} for k in range(m-1)] for i in range(n)] 
        #supply data (k=3) does not have non-food inputs
        
        #distribute fossil fuel input for crop cultivation
        for fuel in fuels2:
            if fuel in energy_extended.keys():
                try:
                    fuelvalue=energy_extended[fuel][year]
                    if (~np.isnan(fuelvalue)&(fuelvalue!=0)):
                        fuelvalue*=0.000277777778 #1 joule = 0.000277777778 Wh
                        for i in range(n): #sum fuel item values over all groups
                            supergroup[i][fuel]="Fossil Fuels"
                            #save sum, but also components for the piechart breakdown
                            for k in range(m-1):
                                if supergroup[i][fuel] in fuelsum[i][k]:
                                    fuelsum[i][k][supergroup[i][fuel]].append({"l":fuel, "v":fuelvalue})
                                else:
                                    fuelsum[i][k][supergroup[i][fuel]]=[{"l":fuel, "v":fuelvalue}]
                except: pass
        
        #add any mising fossil fuels
        if 'Other fossil fuels' in energy_extended.keys():
            if year in energy_extended['Other fossil fuels']:
                fuelvalue=energy_extended['Other fossil fuels'][year]*0.000277777778 #1 joule = 0.000277777778 Wh
                if "Fossil Fuels" in fuelsum[0][0]:
                    if (sum([w['v'] for w in fuelsum[0][0]["Fossil Fuels"]])<(fuelvalue-electricity_to_fossil)):
                        fuel='Other fossil fuels'
                        fuelvalue=energy_extended['Other fossil fuels'][year]*0.000277777778-\
                                  electricity_to_fossil-sum([w['v'] for w in fuelsum[0][0]["Fossil Fuels"]])
                        #save sum, but also components for the piechart breakdown
                        for i in range(n): #sum fuel item values over all groups
                            supergroup[i][fuel]="Fossil Fuels"
                            for k in range(m-1):
                                if supergroup[i][fuel] in fuelsum[i][k]:
                                    fuelsum[i][k][supergroup[i][fuel]].append({"l":fuel, "v":fuelvalue})
                                else:
                                    fuelsum[i][k][supergroup[i][fuel]]=[{"l":fuel, "v":fuelvalue}]
                else:
                    fuel='Other fossil fuels'
                    fuelvalue=energy_extended['Other fossil fuels'][year]*0.000277777778-electricity_to_fossil
                    #save sum, but also components for the piechart breakdown
                    for i in range(n): #sum fuel item values over all groups
                        supergroup[i][fuel]="Fossil Fuels"
                        for k in range(m-1):
                            fuelsum[i][k][supergroup[i][fuel]]=[{"l":fuel, "v":fuelvalue}]
                    
        #return fuelsum
        for i in range(n):
            for k in range(m-1):
                #distribute electricity input by locally grown crops share only (arg 2 instead of k)
                for j in supersum[i][2]:
                    if (parent[j]=='None'):
                        for p in fuelsum[i][k]:
                            helper=[]
                            for w in fuelsum[i][k][p]:
                                #distribute total input based on supply-based share
                                helper.append({"l":w['l'],"v":w['v']*share[i][3][j]})
                            add_to_json(i,k,p,j,sum([w['v'] for w in fuelsum[i][k][p]])*share[i][3][j],0,helper)
        
        for k in range (m-1):
            if fuelsum[0][k]:
                foodinput[k]+=sum([w['v'] for w in fuelsum[0][k]["Fossil Fuels"]])
                foodinput2["Fossil Fuels"][k]+=sum([w['v'] for w in fuelsum[0][k]["Fossil Fuels"]])

        #########################################################################################################
                
        ### FOSSIL FUELS IN FISHERIES
        #reinitialize fuelsum for fishery data
        fuelsum=[[{} for k in range(m-1)] for i in range(n)]
        
        #distribute fossil fuel input for fisheries
        for fuel in fuels3:
            if fuel in energy_extended.keys():
                try:
                    fuelvalue=energy_extended[fuel][year]
                    if (~np.isnan(fuelvalue)&(fuelvalue!=0)):
                        fuelvalue*=0.000277777778 #1 joule = 0.000277777778 Wh
                        for i in range(n): #sum fuel item values over all groups
                            supergroup[i][fuel]="Fossil Fuels"
                            #save sum, but also components for the piechart breakdown
                            for k in range(m-1):
                                if supergroup[i][fuel] in fuelsum[i][k]:
                                    fuelsum[i][k][supergroup[i][fuel]].append({"l":fuel, "v":fuelvalue})
                                else:
                                    fuelsum[i][k][supergroup[i][fuel]]=[{"l":fuel, "v":fuelvalue}]
                except: pass
                
        #add any mising fossil fuels
        if 'Fossil fuels in fisheries' in energy_extended.keys():
            if year in energy_extended['Fossil fuels in fisheries']:
                fuelvalue=energy_extended['Fossil fuels in fisheries'][year]*0.000277777778 #1 joule = 0.000278 Wh
                if "Fossil Fuels" in fuelsum[0][0]:
                    if (sum([w['v'] for w in fuelsum[0][0]["Fossil Fuels"]])<(fuelvalue)):
                        fuel='Fossil fuels in fisheries'
                        fuelvalue=energy_extended['Fossil fuels in fisheries'][year]*0.000277777778-\
                                  sum([w['v'] for w in fuelsum[0][0]["Fossil Fuels"]])
                        #save sum, but also components for the piechart breakdown
                        for i in range(n): #sum fuel item values over all groups
                            supergroup[i][fuel]="Fossil Fuels"
                            for k in range(m-1):
                                if supergroup[i][fuel] in fuelsum[i][k]:
                                    fuelsum[i][k][supergroup[i][fuel]].append({"l":fuel, "v":fuelvalue})
                                else:
                                    fuelsum[i][k][supergroup[i][fuel]]=[{"l":fuel, "v":fuelvalue}]
                else:
                    fuel='Fossil fuels in fisheries'
                    fuelvalue=energy_extended['Fossil fuels in fisheries'][year]*0.000277777778
                    #save sum, but also components for the piechart breakdown
                    for i in range(n): #sum fuel item values over all groups
                        supergroup[i][fuel]="Fossil Fuels"
                        for k in range(m-1):
                            fuelsum[i][k][supergroup[i][fuel]]=[{"l":fuel, "v":fuelvalue}]
                    
        #return fuelsum
        for i in range(n):
            for k in range(m-2):
                #distribute electricity input by locally grown crops share only (arg 2 instead of k)
                for p in fuelsum[i][k]:
                    add_to_json(i,k,p,"Feed",sum([w['v'] for w in fuelsum[i][k][p]]),0,fuelsum[i][k][p])
        
        for k in range (m-1):
            if (k!=2):
                if fuelsum[0][k]:
                    foodinput[k]+=sum([w['v'] for w in fuelsum[0][k]["Fossil Fuels"]])
                    foodinput2["Fossil Fuels"][k]+=sum([w['v'] for w in fuelsum[0][k]["Fossil Fuels"]])

2.3. Fertilizers Part

2.3.1 Fertilizer profiling

Direct fertilizer inputs from the FAOSTAT database are loaded, interpolated and normalized for a certain country, for a given time period (given by the years argument of the function). During the process, data quality indicators are collected in the missing and integrity global variables. The fert_profiling returns a dictionary which has elements of the years array as (secondary) keys and profiled values as values, for all fertilizer types (primary keys). Remember that the fertilizers database is composed of two sepparate dataframes, fertilizers1 and fertilizers2. This combines the two to yield a single, unified dictionary.

In [40]:

def fert_profiling(country,years):
    global integrity
    global missing
    zerodata=[]
    y={}
    for year in years: #fill exiting data
        if (year<2003):
            fertilizers=fertilizers2 #inverted on purpose
            fertnames=fert1
            fertelement='Consumption'
        else:
            fertilizers=fertilizers1
            fertnames=fert2
            fertelement='Consumption in nutrients'
            country=country_name_converter6(country) #convert country names from old FAOSTAT to new FAOSTAT

        for fert in fertnames:
            try:
                for i in fertilizers.xs([country,year,fertelement,fert], level=[0,1,2,3]).T.iteritems():
                    fertvalue=i[1][0]            #choose column in which fert item values are stored
                    if fert in ['Phosphate fertilizers', 'Phosphate Fertilizers (P205 total nutrients)']:
                        fertitem='Phosphate'
                    if fert in ['Potash fertilizers', 'Potash Fertilizers (K20 total nutrients)']:
                        fertitem='Potash'    
                    if fert in ['Nitrogenous fertilizers', 'Nitrogen Fertilizers (N total nutrients)']:
                        fertitem='Nitrogen'
                    if (~np.isnan(fertvalue)&(fertvalue!=0)): #in FAOSTAT, NaN and 0 are the same thing
                        if fertitem not in y: y[fertitem]={}
                        y[fertitem].update({year:fertvalue})
                    else:
                        zerodata.append(fert)
            except: pass
            
    for i in y.keys():

        #update data integrity
        for j in set(years).difference(y[i].keys()):
            integrity[j].append(i)

        #interpolate missing dataframe values
        if set(y[i].keys()).intersection(years):
            #old y[i].update(interpolate(y[i],years,1)) #value, use linear
            y[i].update(gdp_interpolate(y[i],years,1)) #fertilizer values change with GDP
    
    label=[]
    for i in fert1+fert2:   
        if i in ['Phosphate fertilizers', 'Phosphate Fertilizers (P205 total nutrients)']:
            label='Phosphate'
        if i in ['Potash fertilizers', 'Potash Fertilizers (K20 total nutrients)']:
            label='Potash'    
        if i in ['Nitrogenous fertilizers', 'Nitrogen Fertilizers (N total nutrients)']:
            label='Nitrogen'
        if label:
            if label not in y.keys():
                if label not in zerodata:
                    if label not in missing:
                        print country, 'no data in the entire time range', label
                        missing.add(label)

    return y

2.3.2. Fertilizer Parsing

The fertilizer data comes from the FAOSTAT in tonnage for each fertilizer end product (urea, potash, etc.). Then we categorize these into P, K and N type fertilizers and convert the tonnage for each fertilizer type to nutrient content. Then, using the specific energy conversion factors of the chemical processes of fertilizers manufacturing, we convert these into equivalent fossil fuel energy input. Since each fertilizers can be manufactured from multiple types of fossil fuels, the total fossil energy requirement is split accordingly. Profiled fertilizer inputs are loaded from the unified fertilizers_extended global dictionary, which has been created by the fuel_profiling function. Using the add_to_json function, the fossil-to-fertilizer and fertilizer-to-primary crop flows are saved into the data variable and effectively added to the visualization structure.

In [41]:

def fertilizers_part(country, year):
#if True: # placeholder dummy
    
        ###############################################################################################
        ######                                   FERTILIZERS PART                                ######
        ###############################################################################################
        
    if True: # placeholder dummy
        
        global foodinput
        
        #need to save fertilizer weight and fossil sources
        fertsum=[[{} for k in range(m-1)] for i in range(n)]
        fertweight=[[{} for k in range(m-1)] for i in range(n)]
        #supply data (k=3) does not have non-food inputs
        fertfossil=["Natural gas (including LNG)","Fuel oil","Liquefied petroleum gas (LPG)","Coal"]
        ngratio=[0 for i in range(n)]
        ffsum={}
        
        if (year<2003):
            fertilizers=fertilizers2 #inverted on purpose
            fertnames=fert1
            fertelement='Consumption'
        else:
            fertilizers=fertilizers1
            fertnames=fert2
            fertelement='Consumption in nutrients'
        
        for fertitem in fertilizers_extended:
            fertvalue=fertilizers_extended[fertitem][year]
            if (~np.isnan(fertvalue)&(fertvalue!=0)):
            #define energy content+packaging and transport (http://www.engr.usask.ca/societies/csae/c9915.pdf page10)
            #define production energy requirement (http://www.sswm.info/sites/default/files/reference_attachments/
            #GELLINGS%20et%20al%202004%20Energy%20Efficiency%20in%20Fertiliser%20Production.pdf
            #(http://bioscience.oxfordjournals.org/content/63/4/263/F3.expansion.html)
                if fertitem=='Phosphate':
                    fertenergycontent=6.82 #MJ/kg 
                    fertenergyprod=7.7     #MJ/kg 
                    fertenergytrans=8.33   #MJ/kg
                    ngratio[0]=0.75
                    ngratio[1]=0.25
                    ngratio[2]=0
                    ngratio[3]=0
                if fertitem=='Potash':
                    fertenergycontent=2.88 #MJ/kg
                    fertenergyprod=6.4     #MJ/kg
                    fertenergytrans=6.35   #MJ/kg
                    ngratio[0]=0.5
                    ngratio[1]=0.5
                    ngratio[2]=0
                    ngratio[3]=0
                if fertitem=='Nitrogen':
                    fertenergycontent=59   #MJ/kg
                    fertenergyprod=69.53   #MJ/kg 
                    fertenergytrans=7.05   #MJ/kg
                    ngratio[0]=0.85*0.8
                    ngratio[1]=0.85*0.2
                    ngratio[2]=0.15*0.5
                    ngratio[3]=0.15*0.5

                fertvalue=fertvalue/1000 #model units are in 1000 tonnes
                fertenergy=fertvalue*(fertenergycontent+fertenergytrans)*  0.000277777778 #TWh 
                                      #MJ/kg / TJ/1000tonne  *  #1 joule = 0.000277777778  Wh
                fertratio=fertenergyprod/fertenergycontent
                for i in range(n): #sum fert item values over all groups
                    supergroup[i][fertitem]="Fertilizers"
                    #save sum, but also components for the piechart breakdown
                    for k in range(m-1):
                        if supergroup[i][fertitem] in fertsum[i][k]:
                            fertsum[i][k][supergroup[i][fertitem]].append({"l":fertitem, "v":fertenergy})
                            fertweight[i][k][supergroup[i][fertitem]].append({"l":fertitem, "v":fertvalue})
                        else:
                            fertsum[i][k][supergroup[i][fertitem]]=[{"l":fertitem, "v":fertenergy}]
                            fertweight[i][k][supergroup[i][fertitem]]=[{"l":fertitem, "v":fertvalue}]
                #save fossil to fertilizer link data
                for c in range(len(fertfossil)):
                    if fertfossil[c] in ffsum:
                        ffsum[fertfossil[c]]+=fertenergy*fertratio*ngratio[c]
                    else:
                        ffsum[fertfossil[c]]=fertenergy*fertratio*ngratio[c]
        for i in range(n):
            for k in range(m-1):
                #distribute fertilizer input by locally grown crops share only (arg 2 instead of k)
                #distribute total input based on supply-based share
                for j in supersum[i][2]:
                    if (parent[j]=='None'):        
                        for p in fertsum[i][k]:
                            add_to_json(i,k,p,j,sum([w['v'] for w in fertsum[i][k][p]])*share[i][3][j],\
                                    sum([w['v'] for w in fertweight[i][k][p]])*share[i][3][j],fertsum[i][k][p])
        
        #create fossil to fertilizer links
        mydata=[]
        try:
            for c in range(len(fertfossil)):
                mydata.append({"l":fertfossil[c], "v":ffsum[fertfossil[c]]})
            for i in range(n): #exclude level 4 for - now added back in
                for k in range(m-1):
                    add_to_json(i,k,"Fossil Fuels","Fertilizers",sum(ffsum.values()),0,mydata)
        except: pass
                            
        for k in range(m-1):
            foodinput[k]+=sum(ffsum.values())
            foodinput2["Fossil Fuels"][k]+=sum(ffsum.values())

2.4. Labor Part

2.4.1 Labor profiling

Labor inputs from the FAOSTAT database are loaded, interpolated and normalized for a certain country, for a given time period (given by the years argument of the function). During the process, data quality indicators are collected in the missing and integrity global variables. The labor_profiling returns a dictionary which has elements of the years array as keys and profiled values as values. If direct numbers about the number of people working in agriculture cannot be found in the population dataframe, total population numbers (mostly available) are multiplied by a regionally characteristic population fraction.

In [42]:

def labor_profiling(country,years):
    global integrity
    global missing
    y={}
    x={}
    w={}
    country=country_name_converter6(country) #convert country names from old FAOSTAT to new FAOSTAT
    try:
        for i in population.xs([country,'Total economically active population in Agr'],level=[0,2]).T.iteritems():
            year     = i[0][0]        
            popvalue = i[1][0]
            if (~np.isnan(popvalue)&(popvalue!=0)):
                x[year]=popvalue
    except: pass 
    try:
        for i in population.xs([country,'Total Population - Both sexes'],level=[0,2]).T.iteritems():
            year     = i[0][0]        
            popvalue = i[1][0]
            if (~np.isnan(popvalue)&(popvalue!=0)):
                w[year]=popvalue
    except: pass
    
    #caclulate agricultural population share
    for year in set(x.keys()).intersection(w.keys()):
        y[year]=x[year]/w[year]
    
    if set(y.keys()).intersection(years):
        #update data integrity
        for i in set(years).difference(y.keys()):
            integrity[i].append("Labor")
        #interpolate  missing dataframe values
    
        y.update(interpolate(y,years,1)) #share, use linear
    else: 
        print country, 'no data in the entire time range for Labor'
        missing.add("Labor")
    
    #convert share back to number of people
    for year in set(y.keys()).intersection(w.keys()):
        y[year]=y[year]*w[year]
    #interpolate again, if necessary
    if set(y.keys()).intersection(years):
        y.update(interpolate(y,years,2)) #population, use exponential
        
    return y

2.4.2. Labor Parsing

Labor data is calculated using the daily per capita energy intakes from FAOSTAT. A certain number of workhours/workday (hoursperday) and workdays/year (workyear) are used alongside a characteristic average energy expenditure intensity for the activity of farming (hourlycalorieburn). Then the fraction of population working in agriculture (FAOSTAT or extrapolated from neighbors) gets multiplied by the population (FAOSTAT or WDI) to get the number of people involved in agriculture (popvalue). Using the energy intensity calculated above, we get the total energy input as labor. For countries where the daily food energy intake does not meet the necessary minimum for the agricultural activity-specific energy expenditure, it is assumed that all food energy intake is expended for the activity. Profiled labor inputs are loaded from the unified population_extended global dictionary, which has been created by the labor_profiling function. Using the add_to_json function, the labor-to-primary crop flows are saved into the data variable and effectively added to the visualization structure.

In [43]:

def labor_part(country, year):
#if True: # placeholder dummy
       
        ###############################################################################################
        ######                                   LABOR PART                                      ######
        ###############################################################################################
    
    if True: # placeholder dummy
        
        global foodinput

        #old version, without extrapolation, slow
        #calorietotal=balance.xs([country,year,'Food supply','Grand Total'],level=[0,1,3,2]).values[0][1]
            #second index: choose column in which calorie item values are stored, might need to change it
        
        try:
            calorietotal=food_extended['Food supply (kcal/capita/day)'][year]['Grand Total']
        except:
            try:
                sum(food_extended['Food supply (kcal/capita/day)'][year].values())
            except:
                pass
        
        #old version, without extrapolation, slow
        #popvalue=population.xs([country,year,'Total economically active population in Agr'],level=[0,1,2]).values[0][0]
                            #second index: choose column in which pop item values are stored, might need to change it
        
        #new version, after extrapolation
        if year in population_extended.keys():
            try:
                popvalue = population_extended[year]

                if (~np.isnan(popvalue)&(popvalue!=0)):
                    workyear=5.0/7.0*365.25   #working days spent with farming per year

                    hourlycalorieburn=372        #calories needed per hour of farming
                             #http://calorielab.com/burned/?mo=se&gr=11&ti=Occupation&wt=150&un=lb&kg=68
                             #http://www.nutristrategy.com/activitylist.htm, 
                             #for 155 pound, average of 4 types of activities 372 cal/hour
                    hoursperday=12               #hours per day spent farming
                    minimumcalorieburn=hourlycalorieburn*hoursperday
                    calorieburn=0.9              #percent of calorie intake burned for farming, if above minimum

                    laborenergy=min(minimumcalorieburn,calorieburn*calorietotal)\
                                *workyear\
                                *popvalue\
                                *1000\
                                *0.00116222222/1000000000
                                # 0.00116222222 kwh/kcal *10^-9 TWh/kWh

                    for i in range(n):
                        for k in range(m-1): #supply data (k=3) does not have non-food inputs, but supply-based share
                            for j in supersum[i][k]:
                                if (parent[j]=='None'):
                                    add_to_json(i,k,"Labor",j,laborenergy*share[i][3][j],0,\
                                        '[{"l":'+repr("Labor")+',"v":'+repr(laborenergy*share[i][3][j])+'}]')

                for k in range(m-1):
                    foodinput[k]+=laborenergy
                    foodinput2["Labor"][k]+=laborenergy
            except: pass

2.5. Value Added Imports Part

2.5.1 Imports profiling

Food imports and export flows are present for all individual countries, but World values.
Imports need to converted to Value Added Imports in order to accommodate the fact that they have been produced in a country with a different energy intensity than the target. This is a compromise solution since the alternative is to use the energy return on energy investment (EROEI) values of the source countries of the imported products. However, this would require to calcualte individual crop EROEI's, since the final energy intennsity of crop imports would greatly depend on the imported crop mix. Therefore the total dollar value of food imports (from FAOSTAT, if not available then WDI, if not available then UN COMTRADE) gets multiplied by the energy intensity of the economy of the target country. The energy intensity data comes from WDI - as it involves GDP and total energy consumption. This yields the energy expended in the target country to purchase the imports in question.
The inflow of the imports is then calculated using the primary energy share by source breakdown (fossils, non-combustible renewables & nuclear, biomass) from WDI database.
However, we also have direct information about the energy content of the imports directly from FAOSTAT (which is not the same as the expended energy to purchase them). This is the outflow of the imports.

Therefore in the Import profiling section first set up the database access for each of the above mentioned 3 databases. The energy intensity of the economy (energy use/GDP) is calculated using WDI data. On top of that, the breakdown of primary energy sources (that run the economy) are also loaded from WDI, as well as the total value of mechandise imports and the food imports' share in this - to get the Value Added Imports of all food products. This database connection is set up in the next cell.

Database Setup: WDI

In [44]:

#WDI access, already set up for fossil-to-electricity part

wdi_import_indicators=['Energy use (kt of oil equivalent)','GDP (current US$)',\
                       'Food imports (% of merchandise imports)',\
                       'Merchandise imports (current US$)',\
                       'Fossil fuel energy consumption (% of total)',\
                       'Alternative and nuclear energy (% of total energy use)',\
                       'Combustible renewables and waste (% of total energy)']
wdi_import=get_data([wdipath,'WDI_csv',wdindicators[0]])
wdi_import.columns=['Country','Code','Indicator','ICode']+[1960+i for i in range(55)] #old 54
wdi_import=wdi_import.drop('Code', axis=1).drop('ICode', axis=1)\
                       .query('Indicator=='+repr(wdi_import_indicators)).set_index(['Country','Indicator'],drop=True)

We define a custom interpolation function that can interpolate and extrapolate using polynomials of different grade, as well as an exponential function. Furthermore, it has the option to extrapolate or truncate the values at the interpolation interval's edges to the depth nereast values.

In [45]:

def interpolate(d,years,gfit=2,depth=1,polyorder=1,override=False):
    #depth * length of interpolation substrings will be taken to the left and right
    #for example for {1971:5,1972:6,1973:7,1974:5} interpolating it over 1969-1990
    #for the section 1960-1970 (2 elements) the values from 1972,1973,1974 (3 elements) will be taken with depth 1.5
    #for the section 1974-1990 (15 elements) all values  (4 elements) will be taken to extrapolate
    if (gfit>2): 
        print 'interpolate takes only 1 (polynomial) or 2 (exponential) as 3rd argument [default=2]'
        return
    mydict={}
    missing_points=[[]]
    for year in years:
        if year not in d.keys():
            missing_points[-1].append(year)
        else:
            missing_points.append([])
    for m in missing_points:
        if m:
            fit=gfit
            if ((m[-1]<np.sort(d.keys())[0])|(m[0]>np.sort(d.keys())[-1])): 
                #check if it is ends of the interval, then extrapolate mean only
                if not override: fit=0
            
            if fit==0: #take average
                y = {k: d[k] for k in set(d.keys()).intersection(range(max(min(years),min(m)-int(3)),\
                                                                       min(max(years),max(m)+int(3))+1))}
                for i in range(len(m)):
                    mydict[m[i]]=np.mean(y.values())
            elif fit==1:
                #intersector
                y = {k: d[k] for k in set(d.keys()).intersection(range(max(min(years),\
                            min(m)-int(depth*len(m))),min(max(years),max(m)+int(depth*len(m)))+1))}
                #print y
                w = np.polyfit(y.keys(),y.values(),polyorder) # obtaining regression parameters
                if (polyorder==1):
                    intersector=w[0]*np.array(m)+w[1]
                else:
                    intersector=w[0]*np.array(m)*np.array(m)+w[1]*np.array(m)+w[2]
                for i in range(len(m)):
                    mydict[m[i]]=max(0,intersector[i])
            else:
                #intersector
                y = {k: d[k] for k in set(d.keys()).intersection(range(max(min(years),\
                            min(m)-int(depth*len(m))),min(max(years),max(m)+int(depth*len(m)))+1))}
                #print y
                w = np.polyfit(y.keys(),np.log(y.values()),1) # obtaining log regression parameters (exp fitting)
                intersector=np.exp(w[1])*np.exp(w[0]*np.array(m))
                for i in range(len(m)):
                    mydict[m[i]]=max(0,intersector[i])
    
    #return interpolated points
    return mydict

Then the values for the Value Added Imports of all food products are loaded, interpolated and normalized for a certain country, for a given time period (given by the years argument of the wdi_impot_profiling function). During the process, data quality indicators are collected in the missing, estimated and integrity global variables. The wdi_import_profiling parses the value added food imports, calculates the equivalent enegrgy expenditure in the country's economy, distributes this energy over the primary energy mix and returns a dictionary which has elements of the years array as (scondary) keys and profiled values as values for each primary energy type (primary keys).

In [46]:

#create global energy intensity of the economy values
def wdi_global(country):
    y={}
    years=range(1961,2012)
    country=country_name_converter(country)
    
    for year in years:
        w=wdi_import.loc[country]
        if ((year in w.loc['Energy use (kt of oil equivalent)']) and (year in w.loc['GDP (current US$)'])):
            y[year]=w.loc['Energy use (kt of oil equivalent)'][year]/w.loc['GDP (current US$)'][year]
            if np.isnan(y[year]): y.pop(year);
    y.update(interpolate(y,years,1)) #energy/GDP ratio is quasilinear

    return y

In [47]:

def wdi_import_profiling(country,years):  
    global integrity
    global missing
    global estimated
    zerodata=[]
    temp=set()
    #locally convert country names between databases
    country=country_name_converter(country)
    y={}
    
    #create global primary energy share
    fuelsources={'Fossil fuel energy consumption (% of total)':'Fossil Fuels',\
                       'Alternative and nuclear energy (% of total energy use)':'Renewables & Nuclear',\
                       'Combustible renewables and waste (% of total energy)':'Renewables & Nuclear'}
    
    for year in years: #fill exiting data
        for fuel in wdi_import_indicators:
            try:    
                fuelvalue=wdi_import.xs([country,fuel])[year] #choose column in which fuel item values are stored
                #if (~np.isnan(fuelvalue)&(fuelvalue!=0)):
                if (~np.isnan(fuelvalue)): #accept 0 data, except for energy use and GDP
                    if (fuelvalue==0):
                        if fuel not in {'Energy use (kt of oil equivalent)','GDP (current US$)'}:
                            if fuel not in y: y[fuel]={}
                            y[fuel].update({year:fuelvalue})
                    else:
                        if fuel not in y: y[fuel]={}
                        y[fuel].update({year:fuelvalue})
                else:
                    zerodata.append(fuel)
            except: pass

    for i in y.keys():

        #remove all zero values
        if not any(y[i].values()): y.pop(i)
        
        else:
            #update data integrity
            for j in set(years).difference(y[i].keys()):
                label=i
                try: label=label[:label.find("(")-1]
                except: pass
                integrity[j].append(label)

            #interpolate missing dataframe values
            if set(y[i].keys()).intersection(years):
                if i in {'Energy use (kt of oil equivalent)','GDP (current US$)'}: 
                    fit=2 #consumption, exponential
                    override=True
                else: 
                    fit = 1 #share, linear
                    override=False
                y[i].update(interpolate(y[i],years,fit,1,1,override)) 
    
    if set(wdi_import_indicators).difference(y.keys()):
        for i in set(wdi_import_indicators).difference(y.keys()):
            #if i not in zerodata:
                label=i
                try: label=label[:label.find("(")-1]
                except: pass
                print country, 'no data in the entire time range', label
                #missing.add(label)
                temp.add(i)
    
    #if all fuel sources missing data, then take regional/global average for fuel sources
    if not set(fuelsources).difference(temp):
        if country!='World':
            if country in LDC: 
                pcountry='Least Developed Countries'
            elif country in LIFDC: 
                pcountry='Low Income Food Deficit Countries'
            elif country in CAR: 
                pcountry='Caribbean'
            elif country in EU: 
                pcountry='European Union'
            elif country in NA: 
                pcountry='Northern America'
            elif country in SA: 
                pcountry='South America'
            elif country in SAS: 
                pcountry='Southern Asia'
            elif country in EAP: 
                pcountry='Eastern Asia'
            else: 
                pcountry='World'
                
            print country, 'no data in the entire time range','All primary energy sources, extrapolating from', pcountry
            
            w=wdi_import_profiling(pcountry,years)
            estimated.add("All primary energy sources")
            for i in fuelsources:
                y[i]=w[i]
                
    for year in years: #renormalize shares to 100%
        
        s=0
        for fuel in fuelsources.keys():
            if fuel in y.keys():
                s+=y[fuel][year]
                
        for fuel in fuelsources.keys():
            if fuel in y.keys():
                y[fuel][year]=(y[fuel][year]/s)*100

    return y

Database setup: FAOSTAT

From the FAOSTAT database the value addedd imports are loaded directly from the imports dataframe, where available. Where necessary, values are interpolated using a GDP-driven interpolation.

In [48]:

def fao_import_profiling(country,years):
    y={}
    country=country_name_converter6(country) #convert country names from old FAOSTAT to new FAOSTAT
    try:
        for i in imports.xs([country,'Import Value'],level=[0,2]).T.iteritems():
            year     = i[0][0]        
            impvalue = i[1][0]
            if (~np.isnan(impvalue)&(impvalue!=0)):
                y[year]=impvalue
    except: pass    
    
    if set(y.keys()).intersection(years):
        #interpolate  missing dataframe values
        y.update(gdp_interpolate(y,years,2)) #food imports change with GDP, is exponentially
        
    return y

The import_profiling function calculates and combines the output of the WDI and FAOSTAT profilings. The values for the Value Added Imports of all food products are loaded, first from FAOSTAT, if not available then WDI, then interpolated values in the same manner, then they are complied and interpolated and normalized for a certain country, for a given time period (given by the years argument of the wdi_impot_profiling function). During the process, data quality indicators are collected in the missing, estimated and integrity global variables. The import_profiling function returns a dictionary which has elements of the years array as keys and profiled value added food imports values as values.

In [49]:

def import_profiling(country,years):
    global integrity
    global missing
    global estimated
    y={}
    
    w=wdi_import_extended
    u=fao_import_profiling(country,years)
    
    #calculate value added total food import value
    for year in years:
        if year not in u: #data not found in FAOSTAT, try WDI
            if (('Merchandise imports (current US$)' in w) and \
                ('Food imports (% of merchandise imports)' in w)):
                if ((year in w['Merchandise imports (current US$)']) and \
                    (year in w['Food imports (% of merchandise imports)'])):
                    v=w['Food imports (% of merchandise imports)'][year]/\
                    100*w['Merchandise imports (current US$)'][year]
                    print country, 'tried alternate source for Value Added Imports , success' 
                else: 
                    v=0
                    print country, 'tried alternate source for Value Added Imports , no luck' 

        else:
            v=u[year]*1000
        if (('Energy use (kt of oil equivalent)' in w) and \
            ('GDP (current US$)' in w)): 
            if ((year in w['Energy use (kt of oil equivalent)']) and \
                (year in w['GDP (current US$)'])): #energy intenity of GDP is needed no matter what
                if (w['GDP (current US$)'][year]!=0):
                    if (v!=0): y[year]=v*w['Energy use (kt of oil equivalent)'][year]/\
                        w['GDP (current US$)'][year]*0.01163 # %/100 * $ * ktoe / $ * ktoe/TWh = TWh
        
        else: #otherwise use regional/global energy intensity values
            if (v!=0): 
                
                if country!='World':
                    if country in LDC: 
                        pcountry='Least Developed Countries'
                    elif country in LIFDC: 
                        pcountry='Low Income Food Deficit Countries'
                    elif country in CAR: 
                        pcountry='Caribbean'
                    elif country in EU: 
                        pcountry='European Union'
                    elif country in NA: 
                        pcountry='Northern America'
                    elif country in SA: 
                        pcountry='South America'
                    elif country in SAS: 
                        pcountry='Southern Asia'
                    elif country in EAP: 
                        pcountry='Eastern Asia'
                    else: 
                        pcountry='World'

                    print country, year, 'no data Value added imports, extrapolating from', pcountry
                
                    y[year]=v*wdi_global(pcountry)[year]*0.01163 # %/100 * $ * ktoe / $ * ktoe/TWh = TWh       
                    if "Value Added Imports" not in integrity[year]:
                        integrity[year].append("Value added imports")              

    if set(y.keys()).intersection(years):
        #update data integrity
        for i in set(years).difference(y.keys()):
            if "Value Added Imports" not in integrity[i]:
                integrity[i].append("Value added imports")
        
        #interpolate  missing dataframe values
        y.update(interpolate(y,years,2,1,1,True)) #economy is exponential
    
    else: 
        if "Value Added Imports" not in missing:
            print country, 'no data in the entire time range Value Added Imports' 
            missing.add("Value added imports")
            
    return y

2.5.2. Imports Parsing

Profiled imports inputs are loaded from the unified import_extended global dictionary, which has been created by the import_profiling function. Then, the value added food imports are use to calculate the equivalent enegrgy expenditure in the country's economy, distributes this energy over the primary energy mix (calculated by wdi_import_profiling and stored in wdi_import_extended). Then, using the add_to_json function, the imports-to-primary crop flows (from fao_import_profiling) and primary energy sources-to-imports are saved into the data variable and effectively added to the visualization structure.

In [50]:

def import_part(country, year):
#if True: # placeholder dummy
       
        ###############################################################################################
        ######                                   IMPORTS PART                                    ######
        ###############################################################################################
    
    if country not in {"World"}:
        
        global foodinput
        
        if year in import_extended.keys():
        
            fuelsources={'Fossil fuel energy consumption (% of total)':'Fossil Fuels',\
                       'Alternative and nuclear energy (% of total energy use)':'Renewables & Nuclear',\
                       'Combustible renewables and waste (% of total energy)':'Renewables & Nuclear'}
            
            if set(fuelsources.keys()).intersection(wdi_import_extended.keys()):
                 
                    try:
                        impvalue = import_extended[year]*\
                        wdi_import_extended['Fossil fuel energy consumption (% of total)'][year]/100

                        if (~np.isnan(impvalue)&(impvalue!=0)):

                            for i in range(n):
                                for k in range(m-1): #supply data (k=3) does not have non-food inputs
                                    if (k!=m-2): #if not only agri-system
                                        add_to_json(i,k,"Fossil Fuels","Import",impvalue,0,\
                                            '[{"l":'+repr("Fossil Fuels")+',"v":'+repr(impvalue)+'}]')

                        for k in range(m-1):
                            if (k!=2): 
                                foodinput[k]+=impvalue
                                foodinput2["Fossil Fuels"][k]+=impvalue
                    except: pass
                    impvalue1 = 0
                    try: impvalue1 = import_extended[year]*\
                        wdi_import_extended['Alternative and nuclear energy (% of total energy use)'][year]/100
                    except: pass
                    impvalue2 = 0
                    try: impvalue2 = import_extended[year]*\
                        wdi_import_extended['Combustible renewables and waste (% of total energy)'][year]/100
                    except: pass
                    try:
                        if ((~np.isnan(impvalue1+impvalue2))&(impvalue1+impvalue2!=0)):
                            for i in range(n):
                                for k in range(m-1): #supply data (k=3) does not have non-food inputs
                                    if (k!=m-2): #if not only agri-system
                                        add_to_json(i,k,"Renewables & Nuclear","Import",impvalue1+impvalue2,0,\
                                            '[{"l":'+repr("Renewables & Nuclear")+\
                                                    ',"v":'+repr(impvalue1)+'},{"l":'+\
                                                    repr("Biomass & Waste")+',"v":'+repr(impvalue2)+'}]')

                        for k in range(m-1):
                            if ((~np.isnan(impvalue1+impvalue2))&(impvalue1+impvalue2!=0)):
                                if (k!=2):
                                    foodinput[k]+=impvalue1+impvalue2
                                    foodinput2["Renewables & Nuclear"][k]+=impvalue1+impvalue2
                    except: pass
            else:
                try:
                    impvalue = import_extended[year]
                    if (~np.isnan(impvalue)&(impvalue!=0)):
                        for i in range(n):
                            for k in range(m-1): #supply data (k=3) does not have non-food inputs
                                if (k!=m-2): #if not only agri-system
                                    add_to_json(i,k,"Fossil Fuels","Import",impvalue,0,\
                                        '[{"l":'+repr("Fossil Fuels")+',"v":'+repr(impvalue)+'}]')

                    for k in range(m-1):
                        if (k!=2): 
                            foodinput[k]+=impvalue
                            foodinput2["Fossil Fuels"][k]+=impvalue
                except: pass
                print country, year, "no data for Energy composition, using 100% Fossil"

2.6. Save Data

This workflow leads to 3 large categories of inputs:

Fossil Fuels, Renewables & Nuclear, Labor

And 3 large categories of outputs:

Food (Domestic consumption), Exports, Biofuels & Other (and Waste, not part of the balance)

In addition the Stock Variation for each year is added to both sides.

Then, the energy return on energy investment (EROEI) of each country and region and the globe for each year can be calculated by taking the ratio of the sum of the energy outputs over the sum of the energy inputs. MOnitoring the evolution of the EROEI can offer valuable information into the dynamics of of food and agriculture systems.

We construct two EROEIs:

An EROEI of the agricultural system, that includes locally grown primary crops. We call this the Agri-system EROEI.
Another EROEI, when, of top of the above, we include secondary food products, imports and exports. We call this the ** Food system EROEI**.

Normalization:

Since the EROEI is subject to variations in the average crop energy content - resutling from changes in the crop mix, as well the subtle evolution of the energy content of seeds and subseeutn crops, we will do a normalization of the EROEI based on the normalized average crop energy content (NAEC) thus removing the effect of crop shifting from the system.
The the NAEC has different values for agri-system and food systems, as they are driven by different phenomena:
- For the agri-system, the NAEC is equivalent to the intrinsic change in produced primary crop energy content. That is, if a country starts replacing fruits with cereals - i.e. a crop with low energy content with a higher one, this leads to an increase in the average crop energy content of the argi-system. Thus, the real change in the agri-sytem EROEI has to be discounted by this phenomenon.
- For the food system the process is bit more complicated. In the food system the NAEC changes because of the change in the agri-system NAEC, but on top of that also because of a few other factors:
  - The energy intensity of the imported crops to the country changes over time
  - The end-use of primary crops changes over time - i.e. cereals start being fed to animals in larger proportions rather than direct human consumption, effectively reducing the NAEC.

Data is stored in a number of JSON files for each country and each year. In the beginning of this workbook, we have defined the argument i as the detail level at which we are passing the data and it is one of the following:

Food hypergroups
Food supergroups
FAO food groups
FAO crop groups

And the argument k corresponds to the Python dictionary in the global variable data[i][k] to which the data will be written to. We will generate the following datasets:

Gross energy [TWh] - data[i][0] variable - includes food-system and crop residues
Food system energy [TWh] - data[i][1] variable - includes agri-system and food trade and scondary food products
Agri-system energy [TWh] - data[i][2] variable - includes primary crops
Food weight [ktonnes] - data[i][3] variable

Since we have 4 (defined as n throughout the code) different values of i and 4 (defined as m throughout the code) different values of k, this leads to a total of 16 files for each country and each year. The naming convetion we us is:

[country]+[year]+[i]+[k].json, without spaces

On top of that, for each dataset, we calculate a sepparate EROEI and NAEC, storing them in 4 files indexed after k and prefixed with k:

[country]+[year]+k+[k].json, without spaces

Finally, the data integriy for each country is stored in one file per country, listing the contents of the missing, etimated and integrity (interpolated) dictionaries over time.

This leads to a grand total of (number of countries and regions)·(years·m·n+1) number of files. This equals 197·(51·4·4+1)=160949

In [51]:

def save_data(country, year):
    
    ###############################################################################################
    ######                                   SAVE DATA                                       ######
    ###############################################################################################

    global eroei

    for k in range(m):
        
        #if ((sum([foodoutput[i][k] for i in foodoutput.keys()])!=0) & (foodinput[k]!=0)): 
        #old, deprecated - null foodinput condition was needed to avoid Inf EROEI
        if ((sum([foodoutput[i][k] for i in foodoutput.keys()])!=0)):

            #save main Sankey data  
            for i in range(n):
                file(savepath+country+repr(year)+repr(i)+repr(k)+'.json','w').write(json.dumps(data[i][k]))

            #create global input-output graph (top right corner)
            fooddata={"nodes":[{"name": "Balance","fill":"#9edae5"}],"links":[]}
            for j in range(len(foodoutput.keys())):
                if foodoutput[foodoutput.keys()[j]][k]:
                    nodes_innerdict={}
                    nodes_innerdict["name"] = foodoutput.keys()[j]
                    if (color(k,foodoutput.keys()[j])!=""): nodes_innerdict["fill"]=color(k,foodoutput.keys()[j])
                    fooddata["nodes"].append(nodes_innerdict)
            for j in range(len(foodinput2.keys())):
                if foodinput2[foodinput2.keys()[j]][k]:
                    nodes_innerdict={}
                    nodes_innerdict["name"] = foodinput2.keys()[j]
                    if (color(k,foodinput2.keys()[j])!=""): nodes_innerdict["fill"]=color(k,foodinput2.keys()[j])
                    fooddata["nodes"].append(nodes_innerdict)
                
            for j in range(1,len(fooddata["nodes"])):
                if fooddata["nodes"][j]["name"] in foodinput2:
                    links_innerdict={} 
                    links_innerdict["source"]=j
                    links_innerdict["target"]=0
                    links_innerdict["value"]=foodinput2[fooddata["nodes"][j]["name"]][k]
                    fooddata["links"].append(links_innerdict)
                else:
                    links_innerdict={} 
                    links_innerdict["source"]=0
                    links_innerdict["target"]=j
                    links_innerdict["value"]=foodoutput[fooddata["nodes"][j]["name"]][k]
                    fooddata["links"].append(links_innerdict)

            #if (k!=m-1):
            #eroei[k][year]=sum([foodoutput[i][k] for i in foodoutput])/(foodinput[k])
            #old discounted imports for WROWI
            #else: eroei[k][year]=(sum([foodoutput[i][k] for i in foodoutput])-foodimport[k])/(foodinput[k])
            
            if (foodinput[k]!=0):
                eroei[k][year]=sum([foodoutput[i][k] for i in foodoutput])/(foodinput[k])
            else: eroei[k][year]=0
            
            #calculate aec
            global aec
            aec[k][year]=0
            aec1=0
            aec2=0
                      
            for item in foodoutput.keys()+["Production"]:
                if item in groupdict[0][k]:
                    try:
                        aec1+=sum([z['prod'] for z in data[0][k]['links'] if \
                                z['target']==groupdict[0][k][item]])
                        aec2+=sum([z['value'] for z in data[0][k]['links'] if \
                                z['target']==groupdict[0][k][item]])
                    except:pass #division by zero, interpolate later
                    
            if (k!=m-1): aec[k][year]=aec2/aec1*100*859.845 #convert TWh/ktonnes to kcal/100g
            else: aec[k][year]=aec1/aec2*100*859.845
        
            #save mini-Sankey data  
            file(savepath+country+repr(year)+'k'+repr(k)+'.json','w').write(json.dumps(fooddata))
           
        #safety switch
        else:
            aec[k][year]=0
            eroei[k][year]=0

3. Main

The main section of the code defines over which countries and years iterate over and calls the functions defined in Section 2 in order. There are also minor pre-profiling functions to be run first.

Country name converters

In order to establosh a smooth communication between the different databases used, we need to create 4 country_name_converter functions. We used a nromalization over the FAO country names, hence all other database country names are converted to FAO names. For saving we use a simplified version of the FAO nameset.

In [52]:

def country_name_converter(country):
    #convert country names between FAO and WDI databases
    if "\xf4" in country: return country[0:country.find("\xf4")]+"o"+country[country.find("\xf4")+1:len(country)]
    elif "of America" in country: return country[0:country.find("of America")-1]
    elif country == 'China, Hong Kong SAR': return 'Hong Kong SAR, China'
    elif country == 'China, Macao SAR': return 'Macao SAR, China'
    elif country == 'Democratic Republic of the Congo': return 'Congo, Dem. Rep.'
    elif country == 'Egypt': return 'Egypt, Arab Rep.'
    elif country == 'Faroe Islands': return 'Faeroe Islands'
    elif country == 'Iran': return 'Iran, Islamic Rep.'
    elif country == 'Iran (Islamic Republic of)': return 'Iran, Islamic Rep.'
    elif country == 'Kyrgyzstan': return 'Kyrgyz Republic'
    elif country == "Lao People's Democratic Republic": return 'Lao PDR'
    elif country == 'Occupied Palestinian Territory': return 'West Bank and Gaza'
    elif country == 'Republic of Korea': return 'Korea, Rep.'
    elif country == 'Republic of Moldova': return 'Moldova'
    elif country == 'Slovakia': return 'Slovak Republic'
    elif country == 'The former Yugoslav Republic of Macedonia': return 'Macedonia, FYR'
    elif country == 'Yemen': return 'Yemen, Rep.'
    elif country == 'Viet Nam': return 'Vietnam'
    elif country == 'Venezuela': return 'Venezuela, RB'
    elif country == 'Venezuela (Bolivarian Republic of)': return 'Venezuela, RB'    
    elif country == 'United Republic of Tanzania': return 'Tanzania'
    
    #special countries, existing in the past, electricity generation values estimated from contemporary countries
    elif country == 'Serbia and Montenegro': return 'Serbia'
    elif country == 'Saint Vincent and the Grenadines': return 'St. Vincent and the Grenadines'
    elif country == 'China, mainland': return 'China'
    elif country == 'Ethiopia PDR': return 'Ethiopia'
    elif country == 'USSR': return 'Russian Federation'
    elif country == 'Yugoslav SFR': return 'Serbia'
    elif country == 'Bahamas': return 'Bahamas, The'
    elif country == 'Saint Lucia': return 'St. Lucia'
    elif country == 'Belgium-Luxembourg': return 'Belgium'
    elif country == 'Gambia': return 'Gambia, The'
    elif country == "Democratic People's Republic of Korea": return 'Korea, Dem. Rep.'
    elif country == 'Saint Kitts and Nevis': return 'St. Kitts and Nevis'
    elif country == 'Congo': return 'Congo, Rep.'
    elif country == 'Czechoslovakia': return 'Czech Republic'
    
    #aggregate regions 
    elif country == 'Central America': return 'Mexico and Central America'
    elif country == 'Eastern Asia': return 'East Asia & Pacific (all income levels)'
    elif country == 'Least Developed Countries': return 'Least developed countries: UN classification'
    elif country == 'Low Income Food Deficit Countries': return 'Low income'
    elif country == 'Northern Africa': return 'Middle East & North Africa (all income levels)'
    elif country == 'Northern America': return 'North America'
    elif country == 'South America': return 'Latin America & Caribbean (all income levels)'
    elif country == 'Southern Africa': return 'Sub-Saharan Africa (all income levels)'
    elif country == 'Southern Asia': return 'South Asia'
    elif country == 'Caribbean': return 'Caribbean small states'
    elif country == 'Small Island Developing States': return 'Pacific island small states'
    elif "+" in country: return country[0:country.find("+")-1]
    elif "(" in country: return country[0:country.find("(")-1]
    
    else: return country

In [53]:

def country_name_converter2(country):
    #convert country names between databases: FAO to latitude bands map/info
    if country=='Serbia and Montenegro': return 'Serbia'
    elif country=='Russian Federation': return 'Russia'
    elif country=='United States Virgin Islands': return 'Virgin Islands'
    elif country=='Iran (Islamic Republic of)': return 'Iran'
    elif country=='Viet Nam': return 'Vietnam'
    elif country=='China, mainland': return 'China'
    elif country=='United States of America': return 'United States'
    elif country=='Falkland Islands (Malvinas)': return 'Falkland Islands (Islas Malvinas)'
    elif country=='Ethiopia PDR': return 'Ethiopia'
    elif country=='Bolivia (Plurinational State of)': return 'Bolivia'
    elif country=='USSR': return 'Russia'
    elif country=='World': return 'Pacific Ocean'
    elif country=='Yugoslav SFR': return 'Serbia'
    elif country=='Republic of Moldova': return 'Moldova'
    elif country=='Bahamas': return 'Bahamas, The'
    elif country=='Pacific Islands Trust Territory': return 'Pacific Ocean'
    elif country=="Lao People's Democratic Republic": return 'Laos'
    elif country=='Belgium-Luxembourg': return 'Belgium'
    elif country=='China, Hong Kong SAR': return 'Hong Kong'
    elif country=='Gambia': return 'Gambia, The'
    elif country=='China, Macao SAR': return 'Macau'
    elif country=='United Republic of Tanzania': return 'Tanzania'
    elif country=='Wallis and Futuna Islands': return 'Wallis and Futuna'
    elif country=='Venezuela (Bolivarian Republic of)': return 'Venezuela'
    elif country=='Occupied Palestinian Territory': return 'West Bank'
    elif country=='Syrian Arab Republic': return 'Syria'
    elif country=='Republic of Korea': return 'Korea, South'
    elif country=="Democratic People's Republic of Korea": return 'Korea, North'
    elif country=='Brunei Darussalam': return 'Brunei'
    elif country=='Democratic Republic of the Congo': return 'Congo, Democratic Republic of the'
    elif country=='Micronesia (Federated States of)': return 'Micronesia, Federated States of'
    elif country=='Holy See': return 'Holy See (Vatican City)'
    elif country=='The former Yugoslav Republic of Macedonia': return 'Macedonia'
    elif country=='R\xe9union': return 'Reunion'
    elif country=='China, Taiwan Province of': return 'Taiwan'
    elif country=="C\xf4te d'Ivoire": return "Cote d'Ivoire"
    elif country=='Myanmar': return 'Burma'
    elif country=='Sudan (former)': return 'Sudan'
    elif country=='Congo': return 'Congo, Republic of the'
    elif country=='Netherlands Antilles': return 'Aruba'
    elif country=='Czechoslovakia': return 'Czech Republic'
    else: return country

In [54]:

def country_name_converter3(country):
    #convert country names for Sankey app
    
    #aggregate regions 
    if "+" in country: return "~ "+country[0:country.find("+")-1]
    elif "(" in country: return country[0:country.find("(")-1]
    regions={'Small Island Developing States','Caribbean','Eastern Asia',\
             'European Union','Least Developed Countries',\
             'Low Income Food Deficit Countries','Northern Africa','Northern America',\
             'South America','Southern Africa','Southern Asia','World'}
    if country in regions: return "~ "+country
    
    #countries
    elif "\xf4" in country: return country[0:country.find("\xf4")]+"o"+country[country.find("\xf4")+1:len(country)]
    elif "of America" in country: return country[0:country.find("of America")-1]
    elif country == 'China, Hong Kong SAR': return 'Hong Kong SAR, China'
    elif country == 'China, Macao SAR': return 'Macao SAR, China'
    elif country == 'China, Taiwan province of': return 'Taiwan'
    elif country == 'Democratic Republic of the Congo': return 'Congo, DRC'
    elif country == "Lao People's Democratic Republic": return 'Lao PDR'
    elif country == 'Republic of Korea': return 'Korea, South'
    elif country == 'Republic of Moldova': return 'Moldova'
    elif country == 'The former Yugoslav Republic of Macedonia': return 'Macedonia, FYR'
    elif country == 'Viet Nam': return 'Vietnam'
    elif country == 'United Republic of Tanzania': return 'Tanzania'
    elif country == "Democratic People's Republic of Korea": return 'Korea, North'
    elif country == "Syrian Arab Republic": return 'Syria'
    
    #special countries, existing in the past, electricity generation values estimated from contemporary countries
    elif country == 'Serbia and Montenegro': return '{former} Serbia and Montenegro'
    elif country == 'China, mainland': return '{former} China, mainland'
    elif country == 'Ethiopia PDR': return '{former} Ethiopia, PDR'
    elif country == 'USSR': return '{former} USSR'
    elif country == 'Yugoslav SFR': return '{former} Yugoslav SFR'
    elif country == 'Belgium-Luxembourg': return '{former} Belgium-Luxembourg'
    elif country == 'Czechoslovakia': return '{former} Czechoslovakia'
    
    else: return country

In [55]:

def country_name_converter4(country):
    #convert country names for World Map app
    
    #from FAO
    if country == 'China, Hong Kong SAR': return 'Hong Kong'
    #elif country == 'China, mainland': return 'China'
    elif country == 'China, Macao SAR': return 'Macao, China'
    elif country == "Lao People's Democratic Republic": return "Lao People's Dem. Rep."
    elif country == 'Sao Tome and Principe': return 'S\xe3o Tom\xe9 and Principe'
    elif country == 'Cabo Verde': return 'Cape Verde'
    elif country == 'Iran (Islamic Republic of)': return 'Iran'
    elif country == 'Viet Nam': return 'Vietnam'
    elif country == 'United States of America': return 'United States'
    elif country == 'Bolivia (Plurinational State of)': return 'Bolivia'
    elif country == 'Central African Republic': return 'Cent African Rep'
    elif country == 'Occupied Palestinian Territory': return 'Palestinian Territories'
    elif country == 'Republic of Moldova': return 'Rep. of Moldova'
    elif country == 'Libya': return 'Libyan Arab Jamahiriya'
    elif country == 'United Republic of Tanzania': return 'Tanzania'
    elif country == 'Venezuela (Bolivarian Republic of)': return 'Venezuela'
    elif country == 'Republic of Korea': return 'Rep. of Korea'
    elif country == "Democratic People's Republic of Korea": return "People's Republic of Korea"
    elif country == 'The former Yugoslav Republic of Macedonia': return 'FYR of Macedonia'
    elif country == 'China, Taiwan Province of': return 'Taiwan'
    elif country == 'Sudan (former)': return 'Sudan'
    elif country == 'Myanmar': return 'Myanmar (Burma)'
    
    #from WDI
    elif country == 'Bahamas, The' :return 'Bahamas'
    elif country == 'Congo, Dem. Rep.':return 'Dem. Rep. of Congo'
    elif country == 'Congo, Rep.':return 'Congo'
    elif country == "Cote d'Ivoire": return  "C\xf4te d'Ivoire"
    elif country == 'Egypt, Arab Rep.':return 'Egypt'
    elif country == 'Gambia, The':return 'Gambia'
    elif country == 'Hong Kong SAR, China': return 'Hong Kong'
    elif country == 'Iran, Islamic Rep.': return 'Iran'
    elif country == 'Korea, Dem. Rep.' : return "People's Republic of Korea"
    elif country == 'Korea, Rep.': return 'Rep. of Korea'
    elif country == 'Kyrgyz Republic':return 'Kyrgyzstan'
    elif country == 'Lao PDR': return "Lao People's Dem. Rep."
    elif country == 'Macao SAR, China': return 'Macao, China'
    elif country == 'Macedonia, FYR': return 'FYR of Macedonia'
    elif country == 'Micronesia, Fed. Sts.' :return 'Micronesia (Federated States of)'
    elif country == 'Moldova':return 'Rep. of Moldova'
    elif country == 'Slovak Republic':return 'Slovakia'
    elif country == 'St. Kitts and Nevis':return 'Saint Kitts and Nevis'
    elif country == 'St. Lucia':return 'Saint Lucia'
    elif country == 'St. Vincent and the Grenadines':return 'Saint Vincent and the Grenadines'
    elif country == 'Venezuela, RB': return 'Venezuela'
    elif country == 'Virgin Islands (U.S.)':return 'US Virgin Islands'
    elif country == 'West Bank and Gaza':return 'Palestinian Territories'
    elif country == 'Yemen, Rep.':return 'Yemen'
    
    else: return country

Irrigation profiling

In order to decide on the average water and consequent energy needs of certain crop types in different climatic belts, we conduct an irrigation profiling. Countries without specific irrigation data are assigned a characteristic irrigation energy intensity for their climatic belt, based on the GPS coordinates of their centroid. For this you will need to access the centroids.csv file either directly or locally after download.

In [56]:

#irrigation profiling

#calculate country geo centroids
centroids=pd.read_csv(masterpath+'centroids.csv',sep='\t')
country_lat={}
countries={i[0][0] for i in population.T.iteritems() if "+" not in i[0][0]} | {"World"}
for country in countries:
    try: 
        mystring=centroids.xs(country_name_converter2(country)).index[0]
        if mystring[0]==' ': 
            mystring=mystring[1:len(mystring)]
        country_lat[country]=int(mystring[0:mystring.find(' ')])
        mystring=mystring[(mystring.find(' ')+1):]
        country_lat[country]+=int(mystring[0:mystring.find(' ')])/60.0
        if (mystring[(mystring.find(' ')+1):]=='S'):
            country_lat[country]=country_lat[country]*(-1)
    except: pass

#load land data
land = get_data([masterpath,indicators[6],indicators[6]],\
       ['Country','Item','Element','Year','Value'],[0,3,1],['Area'],\
                ['Total area equipped for irrigation']).drop('Element', axis=1).query('Value!=0')
                #drop null values
points=[]
points2=[]

#get available irrigation data
irrigation=energy.xs(['Energy for power irrigation'],level=[3]).reset_index().drop('Element',\
                                        axis=1).set_index(['Country', 'Year']).query('Value!=0')
for country in countries:
    if '+ (Total)' not in country:
        land_list={}
        irrig_list={}
        land_list2={}
        irrig_list2={}
        try:
            years=range(1961,2012)
            for year in years:
                try:
                    #power for irrigation per area (TWh/100Ha)
                    land_list[year]=land.xs([country,year], level=[0,1]).values[0][0]
                    irrig_list[year]=irrigation.xs([country,year],level=[0,1]).values[0][0]/land_list[year]
                    if year>2005:
                        land_list2[year]=land.xs([country,year], level=[0,1]).values[0][0]
                        irrig_list2[year]=irrigation.xs([country,year],level=[0,1]).values[0][0]/land_list2[year]
                except: pass
            try: 
                points.append([country,country_lat[country],pd.DataFrame(irrig_list.values()).mean().values[0],\
                               pd.DataFrame(land_list.values()).mean().values[0]])
                points2.append([country,country_lat[country],pd.DataFrame(irrig_list2.values()).mean().values[0],\
                               pd.DataFrame(land_list2.values()).mean().values[0]])
            except: pass
        except: pass

We plot the results and notice that the countries in the tropical desert regions indeed have a much higher irrigation energy intensity than others.

In [57]:

%matplotlib inline
import matplotlib.pyplot as plt
import mpld3
mpld3.enable_notebook()

fig, axes = plt.subplots(1,2,subplot_kw=dict(axisbg='#EEEEEE'),figsize=(13,4))

#latest irrigation intensity, average all years, for display

avg={}
wgts={}
for i in range(len(points2)):
    if int(points2[i][1])in avg.keys():
        avg[int(points2[i][1])].append(points2[i][2])
        wgts[int(points2[i][1])].append(points2[i][3])
    else: 
        avg[int(points2[i][1])]=[points2[i][2]]
        wgts[int(points2[i][1])]=[points2[i][3]]
for i in avg.keys():
    avg[i]=np.average(avg[i],weights=wgts[i])
    #simple average, old
    #avg[i]=np.average(avg[i])
avg[90]=avg[-90]=0
irrig_intensity={}
kys=np.sort(avg.keys())
for i in range(2,len(kys)-2):
    irrig_intensity[kys[i]] = np.average([avg[kys[i-2]],avg[kys[i-1]],avg[kys[i]],avg[kys[i+1]],avg[kys[i+2]]])

ax=axes[1]
scatter = ax.scatter([i[1] for i in points2],
                      [i[2] for i in points2],
                      s=[i[3]**(0.7) for i in points2],
                      #c=range(len(points)), 
                      c="yellow",
                      alpha=0.5)
x=[]
y=[]
for i in range(len(irrig_intensity)):
    x.append(np.sort(irrig_intensity.keys())[i])
    y.append(irrig_intensity[x[i]])
ax.plot(x,y,'r',linewidth=4)
ax.grid(color='white', linestyle='solid')

ax.set_xlim([-45,80])
ax.set_ylim([-2,15])
ax.set_title("Average Irrigation Energy Intensity 2006-2011", size=14)
ax.text(0.99,-0.15,'Bubble size ~ Irrigated Area [kHa]',size=12, horizontalalignment='right',transform=ax.transAxes)
 
#ax.set_xticks(np.arange(-90, 91, 20))
tooltip = mpld3.plugins.PointLabelTooltip(scatter, labels= [repr(i[0]) for i in points2])
mpld3.plugins.connect(fig, tooltip)

#set labels for countries with large enough territory
#load country codes for plot annotations  
cc=pd.read_csv(masterpath.replace('db','map/0')+'cc.csv',';')
for i in points2:
    try: 
        #if ((i[3]>1000) or (i[1]<-15) or (i[1]>45) or ((i[2]>7)and(i[3]>1000))): 
            ax.text(i[1],i[2]-0.0,cc[country_name_converter4(i[0])][0],\
                    size=7+i[3]**0.2, horizontalalignment='center',verticalalignment='center',alpha=0.5)
    except: pass
    
#irrig_intensity.update(interpolate(irrig_intensity,[-50,60]))
irrig_intensity[90]=irrig_intensity[-90]=max(irrig_intensity.values())/3
irrig_intensity.update(interpolate(irrig_intensity,range(-37,60),1,1,2))
for i in range(50,60): irrig_intensity.pop(i)
for i in range(-89,-37): irrig_intensity[i]=irrig_intensity[-37]
for i in range(50,90): irrig_intensity[i]=irrig_intensity[49]

ax.scatter([80],[-1.5],[1000**0.7],facecolors='none',alpha=0.5)
ax.scatter([80],[-1.0],[10000**0.7],facecolors='none',alpha=0.5)
ax.scatter([80],[-0.3],[50000**0.7],facecolors='none',alpha=0.5)
ax.text(1.01,0.05,1000,size=10, horizontalalignment='left',verticalalignment='center',transform=ax.transAxes)
ax.text(1.01,0.12,10000,size=10, horizontalalignment='left',verticalalignment='center',transform=ax.transAxes)
ax.text(1.01,0.2,50000,size=10, horizontalalignment='left',verticalalignment='center',transform=ax.transAxes)

#real irrigation intensity, average all years, for access

avg={}
wgts={}
for i in range(len(points)):
    if int(points[i][1])in avg.keys():
        avg[int(points[i][1])].append(points[i][2])
        wgts[int(points[i][1])].append(points[i][3])
    else: 
        avg[int(points[i][1])]=[points[i][2]]
        wgts[int(points[i][1])]=[points[i][3]]
for i in avg.keys():
    avg[i]=np.average(avg[i],weights=wgts[i])
    #simple average, old
    #avg[i]=np.average(avg[i])
avg[90]=avg[-90]=0
irrig_intensity={}
kys=np.sort(avg.keys())
for i in range(2,len(kys)-2):
    irrig_intensity[kys[i]] = np.average([avg[kys[i-2]],avg[kys[i-1]],avg[kys[i]],avg[kys[i+1]],avg[kys[i+2]]])

ax=axes[0]
scatter = ax.scatter([i[1] for i in points],
                      [i[2] for i in points],
                      s=[i[3]**(0.7) for i in points],
                      #c=range(len(points)), 
                      c="yellow",
                      alpha=0.5)
x=[]
y=[]
for i in range(len(irrig_intensity)):
    x.append(np.sort(irrig_intensity.keys())[i])
    y.append(irrig_intensity[x[i]])
ax.plot(x,y,'r',linewidth=4)
ax.grid(color='white', linestyle='solid')

ax.set_xlim([-45,80])
ax.set_ylim([-2,15])
ax.set_title("Average Irrigation Energy Intensity 1961-2011", size=14)
ax.text(0.01,-0.15,'Geographic Latitude',size=12, horizontalalignment='left',transform=ax.transAxes)
ax.text(-0.1,0.5,'Irrigation Energy Intensity [TJ / kHa]',size=12, horizontalalignment='center',\
        verticalalignment='center',rotation=90,transform=ax.transAxes)

#ax.set_xticks(np.arange(-90, 91, 20))
tooltip = mpld3.plugins.PointLabelTooltip(scatter, labels= [repr(i[0]) for i in points])
mpld3.plugins.connect(fig, tooltip)

#set labels for countries with large enough territory
#load country codes for plot annotations  
cc=pd.read_csv(masterpath.replace('db','map/0')+'cc.csv',';')
for i in points:
    try: 
        #if ((i[3]>1000) or (i[1]<-15) or (i[1]>45) or ((i[2]>7)and(i[3]>1000))): 
            ax.text(i[1],i[2]-0.0,cc[country_name_converter4(i[0])][0],\
                    size=7+i[3]**0.2, horizontalalignment='center',verticalalignment='center',alpha=0.5)
    except: pass
    
#irrig_intensity.update(interpolate(irrig_intensity,[-50,60]))
irrig_intensity[90]=irrig_intensity[-90]=max(irrig_intensity.values())/3
irrig_intensity.update(interpolate(irrig_intensity,range(-37,60),1,1,2))
for i in range(50,60): irrig_intensity.pop(i)
for i in range(-89,-37): irrig_intensity[i]=irrig_intensity[-37]
for i in range(50,90): irrig_intensity[i]=irrig_intensity[49]

plotpath="E:/Dropbox (MIT)/Public/Food-Energy/plots/"
fig.savefig(plotpath+'irrig.png',bbox_inches = 'tight', pad_inches = 0.1, dpi=150)
plt.show()

Fishing profiling

In order to decide on the average energy needs of fishing at different industry scales, we conduct an fishing profiling. Countries without specific fishing energy data are assigned a characteristic fishing energy intensity for their other fishing nations comparable to their size. For this we use the already parsed balance dataframe.

In [58]:

#fishing profiling
fishing = balance.xs(['Production','Fish, Seafood'],level=[3,2]).drop('ItemCode',axis=1)
points=[]

#get available fishing data
fish_diesel1=energy.xs([fuels3[0]],level=[3]).reset_index().drop('Element', axis=1).set_index(['Country', 'Year'])
fish_diesel2=energy.xs([fuels3[1]],level=[3]).reset_index().drop('Element', axis=1).set_index(['Country', 'Year'])
for country in countries:
    if '+ (Total)' not in country:
        fishing_list={}
        land_list={}
        try:
            years=range(1961,2012)
            for year in years:
                try:
                    #power for fishing per area (TJ/ktonnes of fish)
                    land_list[year]=fishing.xs([country,year],level=[0,1]).values[0][0]
                    if (land_list[year]!=0):
                        if (fish_diesel1.xs([country,year],level=[0,1]).values[0][0]+\
                             fish_diesel2.xs([country,year],level=[0,1]).values[0][0] != 0): 
                            fishing_list[year]=(fish_diesel1.xs([country,year],level=[0,1]).values[0][0]+\
                                 fish_diesel2.xs([country,year],level=[0,1]).values[0][0])\
                                 /land_list[year]
                except: pass
            try: 
                points.append([country,pd.DataFrame(fishing_list.values()).mean().values[0],\
                               pd.DataFrame(land_list.values()).mean().values[0]])
            except: pass
        except: pass

In [59]:

import mpld3
mpld3.enable_notebook()

y=[np.log(i[1]) for i in points]
x=[np.log(i[2]) for i in points]
w = np.linalg.lstsq(np.array([ np.array(x), np.ones(len(x))]).T,y)[0] # obtaining regression parameters
fig, ax = plt.subplots(subplot_kw=dict(axisbg='#EEEEEE'))
scatter = ax.scatter(x,y,s=500,c=range(len(x)),alpha=0.5)
ax.plot(x,w[0]*np.array(x)+w[1],'r',linewidth=4)
ax.grid(color='white', linestyle='solid')
ax.set_title("Fishing Energy Intensity", size=14)
ax.set_xlabel('log ( Total Fish & Seafood [ktonnes] )')
ax.set_ylabel('log ( Fishing Energy Intensity [TJ / ktonnes] )')
tooltip = mpld3.plugins.PointLabelTooltip(scatter, labels= [repr(i[0]) for i in points])
fig.savefig(plotpath+'fishing.png',bbox_inches = 'tight', pad_inches = 0.1, dpi=150)
mpld3.plugins.connect(fig, tooltip)

In [60]:

def fishing_energy(country,year):
    try:
        return np.exp(w[0]*np.log(fishing.xs([country,year],level=[0,1]).values[0][0])+w[1])\
                         *fishing.xs([country,year],level=[0,1]).values[0][0]
    except:
        return 0

Main code setup

In [61]:

def elapsed():
    from time import time
    global t0
    print time()-t0
    t0=time()

Assign predfined colors for major nodes.

In [62]:

def color(i,node):
    #Sankey Diagram coloring function (optional, Sankey.js does automatic coloring if missing)
    if node in {"Fossil Fuels"}: return "#c49c94"
    elif node in {"Labor"}: return "#7f7f7f"
    elif node in {"Renewables & Nuclear"}: return "#2ca02c"
    elif node in {"Stock Draw","Stock Deposit","Waste"}: return "#c7c7c7"
    elif node in {"Fertilizers"}: return "#c5b0d5"
    elif node in {"Import","Export"}: return "#aec7e8"
    elif node in {"Electricity"}: return "#bcbd22"
    elif node in {'Cereals'}: return "#e7ba52"
    elif node in {'Oilcrops'}: return "#ff9896"
    elif node in {'Fruits, Roots & Vegetables'}: return "#98df8a"
    elif node in {'Sugar Crops'}: return "#ffbb78"
    elif node in {"Feed"}: return "#dbdb8d"
    elif node in {"Seed"}: return "#dbdb8d"
    elif node in {"Processing"}: return "#9edae5"
    elif node in {'Animal Products'}: return "#17becf"
    elif node in {'Aquatic Products'}: return "#1f77b4"
    elif node in {'Vegetable Oils'}: return "#f7b6d2"
    elif node in {'Other Processed'}: return "#9c9ede"
    elif node in {'Sugar'}: return "#d6616b"
    elif node in {"Food", "Production"}: return "#b5cf6b"
    elif node in {"Biofuels & Other"}: return "#e7ba52"
    elif node in {'Crop Residue'}: return "#bd9e39"
    else: return ""

Assign a predfined position in the horizontal plane for major nodes.

In [63]:

def layer(i,node):
    if (i==2):
        #Sankey Diagram layering function (optional, Sankey.js does automatic layering if missing)
        if node in {"Fossil Fuels", "Labor", "Renewables & Nuclear", "Stock Draw"}: return 0
        elif node in {"Fertilizers", "Electricity", "Import"}: return 1
        elif node in {"Production"}: return 5
        elif node in parent:
            if (parent[node] == 'None'): return 3
            else: return 4
        else: return -1
    elif (i==3):
        #Sankey Diagram layering function (optional, Sankey.js does automatic layering if missing)
        if node in {"Production","Import", "Stock Draw"}: return 0
        elif node in {"Export", "Biofuels & Other", "Food", "Stock Deposit", "Waste, Crop Residue"}: return 16
        elif node in {"Feed", "Processing"}: return 10
        elif node in parent:
            if (parent[node] == 'None'): return 5
            else: return 13
        else: return -1
    else:
        #Sankey Diagram layering function (optional, Sankey.js does automatic layering if missing)
        if node in {"Fossil Fuels", "Labor", "Renewables & Nuclear", "Stock Draw"}: return 0
        elif node in {"Fertilizers", "Electricity", "Import"}: return 2
        elif node in {"Export", "Biofuels & Other", "Food", "Stock Deposit", "Waste, Crop Residue","Seed"}: return 16
        elif node in {"Feed", "Processing"}: return 10
        elif node in parent:
            if (parent[node] == 'None'): return 5
            else: return 13
        else: return -1

First run profiling

Define first run profiling parameters, i.e. set a baseline for all values, intensities, etc. using global values. The following 2 cells always need to be run before the Main code, but only once, before the first run.

In [64]:

firstrun = True
#define all countries and regions to iterate over during pre-profiling
countries={i for i in balance.index.get_level_values('Country').unique() if "+" not in i}
regions={'Small Island Developing States','Caribbean','Eastern Asia',\
         'European Union','Least Developed Countries',\
 'Low Income Food Deficit Countries','Northern Africa','Northern America',\
 'South America','Southern Africa','Southern Asia','World'}
countries=set(list(countries.union(regions)))

Conduct first run profiling. This defines the crop supergroups and parents, defines a basic agriculture energy share for countries and defines a basic calorie content for all products.

In [65]:

#code to execture only once, on startup
if (firstrun):
    try: 
        
        n = 4 #number of grouping variations = number of detail levels
        m = 4 #number of data variations = number of data levels

        #group to which food item entry belongs to. defined in food_groups()
        #custom groups possible, along the FAO official ones 
        #i.e. Cereals for Wheat, Meat for Poultrymeat, Alcoholic Beverages for Wine
        supergroup=[{} for i in range(n)]
        #food item parent(source) node, where applicable (secondary food items) 
        #i.e. Feed for Animal Products, Milk for Cheese, Processing for Vegetable Oils & Sugar
        parent={}
        #precentage of crop that is unused but has energy residues remaining inside it, husk
        #this is necessary for recording crop residues on the gross energy details level
        gross={}

        #define food categories. this is done globally, from World 2008 data 
        #presumed to be one of the most comprehensive
        food_groups('World',2008)
        #afer processing we can drop the itemcode column from balance, unless it has already been done
        balance.drop('ItemCode', axis=1, inplace=True)
        
        #agri preprofiling: load agriculture energy share from dataframe and extrapolate where necessary 
        agrienergy=fuel_preprofiling(range(1960,2015))
        
        #food calorie profiling for world to set benchmark crop calorific values
        food_extended = food_profiling('World') 
        food_cal_world = food_cal_profiling('World')
          
        firstrun=False
        print 'firstrun configuration finished'
    except: print 'firstrun configuration error'

firstrun configuration finished

Main

The main code is structured a two nested loops: he outer loops over the countries and regions, whereas the inner loops over the years. For every country and region, before entering the years loop, variable initializations and profilings are conducted. Based on teh availability of years obtained upon food_profiling, the rest of code is invoked only for those years.

In [ ]:

###############################################################################################
######                                   MAIN                                            ######
###############################################################################################
    
#define countries and regions to iterate over
countries={i for i in balance.index.get_level_values('Country').unique() if "+" not in i}
regions={'Small Island Developing States','Caribbean','Eastern Asia',\
         'European Union','Least Developed Countries',\
 'Low Income Food Deficit Countries','Northern Africa','Northern America',\
 'South America','Southern Africa','Southern Asia','World'}
#countries={"Afghanistan"}
#countries=set(list({i[0][0] for i in population.T.iteritems() if "+" not in i[0][0]} | {"World"})[0:200])
#countries={"Saint Helena, Ascension and Tristan da Cunha","Sao Tome and Principe"}
#countries={"United States"}
#countries={"Romania"}
#countries={"C\xf4te d'Ivoire",'Ghana','Zambia','Rwanda','India','China'}
#countries={"Russian Federation","United States of America","Brazil","Hungary","Romania","United Arab Emirates"}
#countries={"United States of America"}
#countries={"China","Canada"}
#countries={'China, Hong Kong SAR'}
#countries={"China, Taiwan Province of"}
#countries={"United Arab Emirates"}
#countries={"Iran (Islamic Republic of)"}
countries={"Turkmenistan", "Malaysia"}
#countries={'World'}
#countries={'Belize','Barbados','Burkina Faso'}
#regions={'Northern America'}
#countries=set([])
regions=set([])
c=[]    
t0=0    
progress=0
problematic=[]

#main loop - double check the countries!
for country in set(list(countries.union(regions))):
    
    try: 
        pc=json.loads(file(savepath+country+'m.json','r').read())
        print country, "data already exists, overwriting"
        
        raise KeyError()  #uncomment only if you want to overwrite existingdata
        
    except:
        
        try:   
            
            #######################################################################
            ######                       MAIN SETUP                          ######
            #######################################################################

            
            #food (meta) profiling independent of years, processes all data  
            food_extended        = food_profiling(country) 
            #load food item caloric values
            foodcalorie          = food_cal_profiling(country)     
                    
            #extract years of data availalibity for country under iteration or use predefined range
            years = {i for a in food_extended for i in food_extended[a].keys()}
            #years = range(1961,1963) #for developer purposes only

            #create metadata variable to save EROEIs for years with available data
            eroei=[{} for k in range(m)]
            eroei_adj=[{} for k in range(m)]

            #create metadata variable to save average energy content years with available data
            aec=[{} for k in range(m)]

            #helper variables to monitor data integrity
            missing=set([])
            estimated=set([])
            integrity={}
            for year in years: integrity[year]=[]
            
            #conduct data profiling, interpolate and extrapolate database values where missing
            #also, after it has been indexed, load the data from pandas dataframes into python dictionaries, 
                                           #which are much faster to work with in case of sparse operations
            
            wdi_import_extended  = wdi_import_profiling(country,years)    
            energy_extended      = fuel_profiling(country, years)   
            energy_extended      = fuel_reprofiling(country, years,energy_extended)
            if (energy_extended=={}):
                print 'progress:',progress*100,'%',country, "no energy input data for any year, skipping"
                raise KeyError() 
            electricity_extended = electricity_profiling(country, years)
            fertilizers_extended = fert_profiling(country, years)
            population_extended  = labor_profiling(country, years)
            if country not in {'World'}:
                import_extended      = import_profiling(country ,years)
            
            #raise KeyError()  #uncomment only if you want to do data-profiling
            
            for year in years:    
                
                # initialize global variables
                at_least_one=False
                #initialize food import variable for sum collection and processing for EROEI
                foodimport=[0 for k in range(m)]
                #initialize food input variable for sum collection and processing for EROEI
                foodinput=[0 for k in range(m)]
                #initialize food output variable for sum collection and processing for EROEI
                foodoutput={"Food":[0 for k in range(m)],\
                            "Biofuels & Other":[0 for k in range(m)],\
                            "Export":[0 for k in range(m)],\
                            "Stock Deposit":[0 for k in range(m)],\
                            "Crop Residue":[0 for k in range(m)]}
                foodinput2={"Fossil Fuels":[0 for k in range(m)],\
                            "Renewables & Nuclear":[0 for k in range(m)],\
                            "Labor":[0 for k in range(m)],\
                            "Seed":[0 for k in range(m)],\
                            "Stock Draw":[0 for k in range(m)]}
                #counter for the groupdict below
                count = [[0 for k in range(m)] for i in range(n)]
                #initialize dictionary for item order of addition - necessary format for sankey.js
                groupdict=[[{} for k in range(m)] for i in range(n)]
                #dictionary to keep track of food item sums in case of aggregations, based on the supergroup  
                supersum=[[{} for k in range(m)] for i in range(n)]      
                #initialize dictionary array for later JSON export for D3 & sankey.js visualization
                data=[[{"nodes":[],"links":[]} for k in range(m)] for i in range(n)]
                #save source crop share for supply input distribution later
                share=[[{} for k in range(m)] for i in range(n)]
                #elelctricity counters reset
                eltotal=0
                eltotal_n=0
                electricity_to_fossil=0

                #run code by parts

                #######################################################################
                ######                       MAIN CODE                           ######
                #######################################################################

                food_data_ok=True
                try: 
                    food_part(country, year)
                except:
                    print country, year, 'food error, skipping year' 
                    food_data_ok=False
            
                #check for food data availability, part of data quality profiling criterion
                if food_data_ok: 
                    try: fuels_part1(country, year)
                    except: print country,year,"fuel1 error"
                    try: fuels_part2(country, year, eltotal, 0)
                    except: print country,year,"fuel2a error"
                    try: fuels_part2(country, year, eltotal_n, 1)
                    except: print country,year,"fuel2b error"
                    try: fuels_part3(country, year)
                    except: print country,year,"fuel3 error"
                    try: fertilizers_part(country, year)
                    except: print country,year,"fertilizers error"
                    try: labor_part(country, year)
                    except: print country,year,"labor error" 
                    if country not in {'World'}:
                        try: import_part(country, year)
                        except: print country,year,"import error" 
                    try:
                        save_data(country_name_converter3(country), year)
                        at_least_one=True
                    except:
                        print country, year, 'save error'
                        food_data_ok=False
            
            #shorten country names if aggregate, such as World, solve unicode conflicts 
            country=country_name_converter3(country)
            progress+=1.0/len(countries.union(regions))
            
            #save EROEI values for years with available data
            if at_least_one:
                c.append(country)
                print 'progress:',progress*100,'%',country,'save OK for',eroei[0].keys()
                
                #normalize eroei
                for k in range(m):
                    keys=eroei[k].keys()
                    for year in keys:
                        pass
                        #if (np.isnan(eroei[k][year])|(eroei[k][year]==0)): eroei[k].pop(year);
                    if set(eroei[k].keys()).intersection(years):
                        #interpolate null or NaN eroei values
                        eroei[k].update(interpolate(eroei[k],years,1))
                
                #normalize aec
                for k in range(m):
                    keys=aec[k].keys()
                    for year in keys:
                        pass
                        #if (np.isnan(aec[k][year])|(aec[k][year]==0)): aec[k].pop(year);
                    if set(aec[k].keys()).intersection(years):
                        #interpolate null or NaN aec values
                        aec[k].update(interpolate(aec[k],years,1))
                        
                #calculate adjusted EROEI
                for k in range(m):
                    keys=aec[k].keys()
                    minyear=min(keys) #first available year - to normalize after
                    for year in keys:
                        try: eroei_adj[k][year]=eroei[k][year]/(aec[k][year]/aec[k][minyear])
                        except: eroei_adj[k][year]=0
                    
                    
                #save missing, estimated & interpolated data item list and EROEI
                file(savepath+country+'m.json','w').write(json.dumps({"missing":list(missing),
                                                                      "estimated":list(estimated),
                                                                      "interpolated":integrity,
                                                                      "eroei":eroei_adj,
                                                                      "eroei_raw":eroei,
                                                                      "aec":aec}))
            else:
                print 'progress:',progress*100,'%',country, "no food data for any year, skipping"
                raise KeyError() 

        except: 
            print 'progress:',progress*100,'%',country, "major data insufficiency, skipping country"
            problematic.append(country)

prev_countries={'countries':[]}
try: prev_countries=json.loads(file(savepath+'countries.json','r').read())
except: print 'country list does not exist, creating one'
for i in c:
    if i not in prev_countries['countries']:
        prev_countries['countries'].append(i)
prev_countries['countries'].sort()
file(savepath+'countries.json','w').write(json.dumps(prev_countries))
if problematic: print "Errors encountered for the following countries",problematic

Disclaimer

We would like to express our gratitude to all of the developers of the libraries used and especially to the affiliates of FAOSTAT for their great database and openly accesible data. The data manipulation algorithms and visualization techniques are open sourced and freely reproducible, forks are welcome on GitHub (soon to be made public under MIT license). The concept and methodology are subject to copyright of the authors.

Dénes Csala | 2020