Stanford tuition fun¶

"The Second Tuition Bomb" - The Stanford Illustrated Review (books.google.com)

Stanford lists the historical tuition prices here: Finances facts

Let's see how charting can better illustrate the data.

Download the data¶

In [6]:

# scraping stanford's tuition page

import pandas as pd
import csv
from lxml import html
import requests
import re
# live site is at: "http://facts.stanford.edu/administration/finances"
url = 'http://stash.compjour.org/mirrors/facts.stanford.edu/administration/finances.html'
resp = requests.get(url)
doc = html.fromstring(resp.text)
table = doc.cssselect('table')[3]

rows = []
for trs in table.cssselect('tr')[1:]:
    yr, cost = [t.text for t in trs]
    # cut off the "1959" part of "1950-1959"
    rows.append( [int(yr.split('-')[0]), int(re.sub('\D', '', cost))])

# alternatively
# rows = [( int(tds[0].text.split('-')[0]), int(re.sub('\D', '', tds[1].text))) for tds in
#              [trs for trs in table.cssselect('tr')[1:]]]

Now we need to fill in the gaps between the decades; for years in which no tuition is specified, we assume it's the same tuition as the previous year.

Warning: convoluted code to follow

In [5]:

# make a row for every year
tuition_rows = []
for row in rows:
    if len(tuition_rows) > 0:
        lastyr, lastcost = tuition_rows[-1]
        tuition_rows.extend([[lastyr + i, lastcost] for i in range(1, row[0] - lastyr)])
    tuition_rows.append(row)

# Now make a dataframe
tuition_df = pd.DataFrame(tuition_rows, columns = ['year', 'tuition'])
tuition_df.head()

Out[5]:

	year	tuition
0	1920	120
1	1921	120
2	1922	120
3	1923	120
4	1924	120

Make an inflation-calcuation funciton¶

Download some CPI/inflation data from OKFN

US Consumer Price Index and Inflation (CPI)

In [9]:

########################
# Set up inflation calculator

url = 'https://raw.githubusercontent.com/datasets/cpi-us/master/data/cpiai.csv'
cpidata = list(csv.reader(requests.get(url).text.splitlines()))
cpidf = pd.DataFrame(cpidata[1:], columns = cpidata[0])
cpidf = pd.DataFrame.convert_objects(cpidf, convert_dates = 'coerce', convert_numeric = True)
cpimean_df = cpidf.groupby(cpidf['Date'].map(lambda x: x.year)).mean()

def adjust_for_inflation(amt, from_year, to_year):
    ratio = cpimean_df['Index'][to_year] / cpimean_df['Index'][from_year]
    return round(ratio * amt, 2)

tuition_df['adjusted_tuition'] = tuition_df.apply(lambda x: adjust_for_inflation(x['tuition'], x['year'], 2014), axis=1)
tuition_df.head(15)

/Users/dtown/anaconda/lib/python3.4/site-packages/pandas/core/index.py:667: FutureWarning: scalar indexers for index type Int64Index should be integers and not floating point
  type(self).__name__),FutureWarning)

Out[9]:

	year	tuition	adjusted_tuition
0	1920	120	1400.58
1	1921	120	1572.54
2	1922	120	1675.82
3	1923	120	1646.33
4	1924	120	1639.12
5	1925	120	1600.19
6	1926	120	1585.87
7	1927	120	1617.09
8	1928	120	1635.94
9	1929	120	1635.94
10	1930	300	4202.08
11	1931	300	4614.23
12	1932	300	5144.15
13	1933	300	5425.89
14	1934	300	5243.45

Now we can chart.

In [10]:

import matplotlib.pyplot as pyplot
# this part is needed if you are doing this in an iPython notebook
%matplotlib inline

Sans inflation:

In [11]:

pyplot.plot(tuition_df['year'], tuition_df['tuition'])

Out[11]:

[<matplotlib.lines.Line2D at 0x10d456208>]

Now with adjustments for inflation:

In [13]:

pyplot.plot(tuition_df['year'], tuition_df['adjusted_tuition'])

Out[13]:

[<matplotlib.lines.Line2D at 0x10d620cc0>]

On the same chart:

In [14]:

pyplot.plot(tuition_df['year'], tuition_df['tuition'], label = 'Unadjusted', color = 'orange')
pyplot.plot(tuition_df['year'], tuition_df['adjusted_tuition'], label = 'Adjusted', color = 'red')

Out[14]:

[<matplotlib.lines.Line2D at 0x10d5991d0>]

Truncated:

In [21]:

xdf = tuition_df[tuition_df['year'] > 2000]

pyplot.plot(xdf['year'], xdf['tuition'], label = 'Unadjusted', color = 'orange')
pyplot.plot(xdf['year'], xdf['adjusted_tuition'], label = 'Adjusted', color = 'red')
pyplot.ylim(ymin = 0)

Out[21]:

(0, 45000.0)

In [ ]: