Unpivoting CSV data with pandas

by Tariq Khokhar

In [2]:
import pandas as pd

Our first job is to read the CSV file we want into a pandas dataframe.

In [4]:
df = pd.read_csv('lifeexpectancy-pivot.csv')

Let's see which columns and values we have

In [5]:
df.columns
Out[5]:
Index([Country, Country Code, 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011], dtype=object)
In [28]:
df.values
Out[28]:
array([['Afghanistan', 'AFG', '31.1', ..., '47.9', '48.3', '48.7'],
       ['Albania', 'ALB', '62.3', ..., '76.8', '76.9', '77.0'],
       ['Algeria', 'DZA', '47.0', ..., '72.6', '72.9', '73.1'],
       ..., 
       ['Sub-Saharan Africa (developing only)', 'SSA', '40.5', ..., '53.7',
        '54.2', '54.7'],
       ['South Asia', 'SAS', '43.3', ..., '65.0', '65.4', '65.7'],
       ['High income', 'HIC', '68.4', ..., '78.3', '78.6', '78.8']], dtype=object)

We want to "melt" this by holding the country and country code columns and have two new columns that will contain the year and life expectancy

In [12]:
le = pd.melt(df, id_vars=['Country','Country Code']) 

OK, let's see what that did

In [25]:
le.columns
Out[25]:
Index([Country, Country Code, variable, value], dtype=object)
In [27]:
le.values
Out[27]:
array([['Afghanistan', 'AFG', '1960', '31.1'],
       ['Albania', 'ALB', '1960', '62.3'],
       ['Algeria', 'DZA', '1960', '47.0'],
       ..., 
       ['Sub-Saharan Africa (developing only)', 'SSA', '2011', '54.7'],
       ['South Asia', 'SAS', '2011', '65.7'],
       ['High income', 'HIC', '2011', '78.8']], dtype=object)

Great - so we can now just rename "variable" and "value" and then sort by country

In [30]:
le2 = le.rename(columns={ 'variable':'year','value':'life_expectancy'}).sort('Country')
In [32]:
le2.columns
Out[32]:
Index([Country, Country Code, year, life_expectancy], dtype=object)
In [33]:
le2.values
Out[33]:
array([['Afghanistan', 'AFG', '1984', '40.5'],
       ['Afghanistan', 'AFG', '1986', '41.1'],
       ['Afghanistan', 'AFG', '2011', '48.7'],
       ..., 
       ['Zimbabwe', 'ZWE', '1988', '61.5'],
       ['Zimbabwe', 'ZWE', '1994', '55.0'],
       ['Zimbabwe', 'ZWE', '2010', '49.9']], dtype=object)

Finally, we export a CSV

In [35]:
le2.to_csv('lifeexpectancy-unpivoted.csv', sep=',', index=False)
In []:
 
Back to top