Unpivoting CSV data with pandas¶

In [2]:

import pandas as pd

Our first job is to read the CSV file we want into a pandas dataframe.

In [4]:

df = pd.read_csv('lifeexpectancy-pivot.csv')

Let's see which columns and values we have

In [5]:

df.columns

Out[5]:

Index([Country, Country Code, 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011], dtype=object)

In [28]:

df.values

Out[28]:

array([['Afghanistan', 'AFG', '31.1', ..., '47.9', '48.3', '48.7'],
       ['Albania', 'ALB', '62.3', ..., '76.8', '76.9', '77.0'],
       ['Algeria', 'DZA', '47.0', ..., '72.6', '72.9', '73.1'],
       ..., 
       ['Sub-Saharan Africa (developing only)', 'SSA', '40.5', ..., '53.7',
        '54.2', '54.7'],
       ['South Asia', 'SAS', '43.3', ..., '65.0', '65.4', '65.7'],
       ['High income', 'HIC', '68.4', ..., '78.3', '78.6', '78.8']], dtype=object)

We want to "melt" this by holding the country and country code columns and have two new columns that will contain the year and life expectancy

In [12]:

le = pd.melt(df, id_vars=['Country','Country Code']) 

OK, let's see what that did

In [25]:

le.columns

Out[25]:

Index([Country, Country Code, variable, value], dtype=object)

In [27]:

le.values

Out[27]:

array([['Afghanistan', 'AFG', '1960', '31.1'],
       ['Albania', 'ALB', '1960', '62.3'],
       ['Algeria', 'DZA', '1960', '47.0'],
       ..., 
       ['Sub-Saharan Africa (developing only)', 'SSA', '2011', '54.7'],
       ['South Asia', 'SAS', '2011', '65.7'],
       ['High income', 'HIC', '2011', '78.8']], dtype=object)

Great - so we can now just rename "variable" and "value" and then sort by country

In [30]:

le2 = le.rename(columns={ 'variable':'year','value':'life_expectancy'}).sort('Country')

In [32]:

le2.columns

Out[32]:

Index([Country, Country Code, year, life_expectancy], dtype=object)

In [33]:

le2.values

Out[33]:

array([['Afghanistan', 'AFG', '1984', '40.5'],
       ['Afghanistan', 'AFG', '1986', '41.1'],
       ['Afghanistan', 'AFG', '2011', '48.7'],
       ..., 
       ['Zimbabwe', 'ZWE', '1988', '61.5'],
       ['Zimbabwe', 'ZWE', '1994', '55.0'],
       ['Zimbabwe', 'ZWE', '2010', '49.9']], dtype=object)

Finally, we export a CSV

In [35]:

le2.to_csv('lifeexpectancy-unpivoted.csv', sep=',', index=False)

In [ ]: