import pandas as pd
Our first job is to read the CSV file we want into a pandas dataframe.
df = pd.read_csv('lifeexpectancy-pivot.csv')
Let's see which columns and values we have
df.columns
Index([Country, Country Code, 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011], dtype=object)
df.values
array([['Afghanistan', 'AFG', '31.1', ..., '47.9', '48.3', '48.7'], ['Albania', 'ALB', '62.3', ..., '76.8', '76.9', '77.0'], ['Algeria', 'DZA', '47.0', ..., '72.6', '72.9', '73.1'], ..., ['Sub-Saharan Africa (developing only)', 'SSA', '40.5', ..., '53.7', '54.2', '54.7'], ['South Asia', 'SAS', '43.3', ..., '65.0', '65.4', '65.7'], ['High income', 'HIC', '68.4', ..., '78.3', '78.6', '78.8']], dtype=object)
We want to "melt" this by holding the country and country code columns and have two new columns that will contain the year and life expectancy
le = pd.melt(df, id_vars=['Country','Country Code'])
OK, let's see what that did
le.columns
Index([Country, Country Code, variable, value], dtype=object)
le.values
array([['Afghanistan', 'AFG', '1960', '31.1'], ['Albania', 'ALB', '1960', '62.3'], ['Algeria', 'DZA', '1960', '47.0'], ..., ['Sub-Saharan Africa (developing only)', 'SSA', '2011', '54.7'], ['South Asia', 'SAS', '2011', '65.7'], ['High income', 'HIC', '2011', '78.8']], dtype=object)
Great - so we can now just rename "variable" and "value" and then sort by country
le2 = le.rename(columns={ 'variable':'year','value':'life_expectancy'}).sort('Country')
le2.columns
Index([Country, Country Code, year, life_expectancy], dtype=object)
le2.values
array([['Afghanistan', 'AFG', '1984', '40.5'], ['Afghanistan', 'AFG', '1986', '41.1'], ['Afghanistan', 'AFG', '2011', '48.7'], ..., ['Zimbabwe', 'ZWE', '1988', '61.5'], ['Zimbabwe', 'ZWE', '1994', '55.0'], ['Zimbabwe', 'ZWE', '2010', '49.9']], dtype=object)
Finally, we export a CSV
le2.to_csv('lifeexpectancy-unpivoted.csv', sep=',', index=False)