Notebook

This is the analyis notebook for the article Can't Predict? The Wirtschaftsweisen from my blog Minds, Machines and Marshmallows. The blog entry you can find at http://hildensia.github.io/2015/06/19/cant_predict_wirtschaftsweisen.html

In [1]:

%matplotlib inline

import numpy as np
import scipy.stats

import seaborn as sns
import bokeh.plotting as plt
import bokeh.models as mdls
import bokeh.charts as crts

import collections

colors = ['#4C72B0', '#55A868', '#C44E52', '#8172B2', '#CCB974', '#64B5CD']
plt.output_notebook()

TOOLS = "resize,hover,save,reset,pan,box_zoom,wheel_zoom"

BokehJS successfully loaded.

First we load the data. They are

The year
The absolute GND of Germany from 1970-2014. Data from http://www.destatis.org
The growth of GND of Germany from 1970-2014. Data from the world bank http://www.worldbank.org
The predicted growth, as predeicted by the Sachverständigenrat Wirtschaft. Data extracted from the reports at http://www.sachverstaendigenrat-wirtschaft.de/fruehere_jahresgutachten.html
The GND deflator of Germany. Data from the world bank [LINK]

You can find the raw data at http://github.com/hildensia/hildensia.github.io/data/cant_predict/wirtschaftweise.csv

In [2]:

year, gnd_growth, prediction, gnd_abs, deflator = np.genfromtxt('wirtschaftsweisen.csv', delimiter=',')

Now we develop the growth rates from the base year (2005) to sanity check our method by comparing against the absolute GND from destatis.

In [3]:

def develop(start_value, base_idx, percentage, deflator):
    values = np.ndarray((percentage.shape[0] + 1,))
    values[base_idx] = start_value
    for i in range(base_idx, 0, -1):
        values[i-1] = values[i]/(((100 + percentage[i-1]))/100)
   
    for i in range(base_idx, values.shape[0]-1):
        values[i+1] = values[i]*(((100 + percentage[i]))/100)
        
    return values * (deflator / 100)

In [4]:

base_pos = np.where(year==2005)[0]
base_value = gnd_abs[base_pos]
print("GND in 2005 (base year, in mil EUR): {}".format(base_value))

GND in 2005 (base year, in mil EUR): [ 2297.87]

In [5]:

gnd_developed = develop(base_value, base_pos, gnd_growth[1:], deflator)

In [6]:

source = plt.ColumnDataSource(
    data=dict(year=year, 
              gnd_abs=gnd_abs, gnd_developed=gnd_developed)
)

fig = plt.figure(y_axis_label="GND in billion EUR", tools=TOOLS, width=480, height=300)

fig.line("year", "gnd_abs", source=source, legend="Destatis", line_color=colors[0])
fig.line("year", "gnd_developed", source=source, legend="Developed", line_color=colors[1])
fig.legend[0].orientation = 'bottom_right'

hover = fig.select(dict(type=mdls.HoverTool))
hover.tooltips = collections.OrderedDict([
    ('Date', '@year'),
    ('Actual GND', '@gnd_abs'),
    ('Developed', '@gnd_developed'),
])

plt.output_file('sanity.html')
plt.show(fig)

That looks okay. Let's define some measures, to actually measure the error.

In [7]:

def mse(a, b):
    """Mean squared error between two curves."""
    return ((a - b)**2).mean()

def std_err(a, b):
    """Standard error between two curves"""
    return np.sqrt(mse(a, b))

def one_step_error(absolute, relative_prediction):
    """Error for one year look ahead prediction"""
    err = []
    for i, a in enumerate(absolute[:-1]):
        err.append(abs((a * (relative_prediction[i] + 100)/100) - absolute[i+1]))
    return np.array(err)

In [8]:

std_err(gnd_developed, gnd_abs)/gnd_abs.mean()

Out[8]:

0.013709902201761059

Okay, we have around 1% error. That's about okay and within rounding errors.

Now we check, how good the prediction is.

In [9]:

gnd_prediction = develop(base_value, base_pos, prediction[1:], deflator)

In [10]:

source = plt.ColumnDataSource(
    data=dict(year=year, 
              gnd_abs=gnd_abs, gnd_prediction=gnd_prediction)
)

fig = plt.figure(y_axis_label="GND in billion EUR", tools=TOOLS, width=480, height=300)

fig.line("year", "gnd_abs", source=source, legend="Destatis", line_color=colors[0])
fig.line("year", "gnd_prediction", source=source, legend="Prediction", line_color=colors[1])
fig.legend[0].orientation = 'bottom_right'

hover = fig.select(dict(type=mdls.HoverTool))
hover.tooltips = collections.OrderedDict([
    ('Date', '@year'),
    ('Actual GND', '@gnd_abs'),
    ('Prediction', '@gnd_prediction'),
])

plt.output_file('prediction.html')
plt.show(fig)

On a first glance that looks also quite okay, but let's compare with the simplest method, we can imagine: predict the growth of the GND of today for next year, we also test the mean growth so far as growth prediction.

In [11]:

gnd_easy_guess = develop(base_value, base_pos, gnd_growth[:-1], deflator)

In [12]:

mean_growth = np.array([gnd_growth[1:i].mean() for i in range(2, 46)])
gnd_mean_growth = develop(base_value, base_pos, mean_growth, deflator)

In [13]:

source = plt.ColumnDataSource(
    data=dict(year=year, 
              gnd_abs=gnd_abs, gnd_prediction=gnd_prediction, gnd_easy_guess=gnd_easy_guess, gnd_mean_growth=gnd_mean_growth)
)

fig = plt.figure(y_axis_label="GND in billion EUR", tools=TOOLS, width=480, height=300)
fig.line("year", "gnd_abs", source=source, legend="Destatis", line_color=colors[0])
fig.line("year", "gnd_prediction", source=source, legend="Prediction", line_color=colors[1])
fig.line("year", "gnd_easy_guess", source=source, legend="Easy guess", line_color=colors[2])
fig.line("year", "gnd_mean_growth", source=source, legend="Mean growth", line_color=colors[3])
fig.legend[0].orientation = 'bottom_right'

hover = fig.select(dict(type=mdls.HoverTool))
hover.tooltips = collections.OrderedDict([
    ('Date', '@year'),
    ('Actual GND', '@gnd_abs'),
    ('Prediction', '@gnd_prediction'),
    ('Easy guess', '@gnd_easy_guess'),
    ('Mean growth', '@gnd_mean_growth')
])


plt.output_file('all_abs.html')
plt.show(fig)

The gnd_easy_guess looks even better, don't you think? Okay, we definitely need some measures. But simply using the error or squared error between the curves gives wrong impressions, because the Wirtschaftsweisen adjust their prediction every year, whereas here I developed the curve from 2005 and made no readjustment. So I use the actual GND of each year as base and computed the absolute prediction. From that I measure the error.

In [14]:

easy_guess_err = one_step_error(gnd_abs, gnd_growth[:-1])
mean_growth_err = one_step_error(gnd_abs, mean_growth)
prediction_err = one_step_error(gnd_abs, prediction)
print("Easy guess error:  {:.2f}, std: {:.2f}".format(easy_guess_err.mean(), easy_guess_err.std()))
print("Mean growth error: {:.2f}, std: {:.2f}".format(mean_growth_err.mean(), mean_growth_err.std()))
print("Prediction error:  {:.2f}, std: {:.2f}".format(prediction_err.mean(), prediction_err.std()))

Easy guess error:  40.07, std: 42.60
Mean growth error: 32.06, std: 28.67
Prediction error:  38.50, std: 33.63

Wait. The mean growth error is smallest? And all the errors are not statistically significantly different! That means, if I simply assume the current growth rate for the next year I'm within the range of the Wirtschaftsweisen. If I assume the mean grwoth rate so far I'm probably better!

But that's not the whole story. Let's look further into the numbers. Let's look at the predicted growth.

In [15]:

source = plt.ColumnDataSource(
    data=collections.OrderedDict(growth=gnd_growth[1:], prediction=prediction[1:], easy_guess=gnd_growth[:-1], mean_growth=mean_growth)
)

TOOLS = "resize,hover,save,reset,xpan,xwheel_zoom"

fig = crts.Bar(source.data, cat=["{:.0f}".format(y) for y in year[:-1]], 
               width=480, ylabel="GND growth in %", tools=TOOLS, palette=colors, legend='bottom_left')

hover = fig.select(dict(type=mdls.HoverTool))
hover.tooltips = collections.OrderedDict([
    ('Date', '@cat'),
    ('Actual GND', '@growth'),
    ('Prediction', '@prediction'),
    ('Easy guess', '@easy_guess'),
    ('Mean growth', '@mean_growth')
])

plt.output_file('growth.html')
plt.show(fig)

This is somewhat hard to read, but let's try. First, what we see is, that the mean growth (purple) stays somewhat stable around 2%. But that's not at all what the real growth rate looks like (green). That one is going up and down. So the mean growth rate might make a smaller error but it is really bad at seeing trends. The easy guess also goes up and down, but always one year later. But the changes are not that long term. It often misses important changes (e.g. see the crisis around 2008). The prediction on the other hand often follows the trend, but misses the right number. We can also get numbers from that. In statistics we use a measure called (Pearson) correlation to see how closely related events from two different random variables are. If they are just a linear function of the other it is 1 or -1 (then the one variable goes up, where the other goes down). If they are not realted whatsoever, the correlation is 0. Lets see how these things are correlated to the actual growth rate.

In [16]:

prediction_corr = scipy.stats.pearsonr(gnd_growth[1:], prediction[1:])
easy_guess_corr = scipy.stats.pearsonr(gnd_growth[1:], gnd_growth[:-1])
mean_growth_corr = scipy.stats.pearsonr(gnd_growth[1:], mean_growth)

print("Correlation prediction/growth:  {}".format(prediction_corr[0]))
print("Correlation easy guess/growth:  {}".format(easy_guess_corr[0]))
print("Correlation mean growth/growth: {}".format(mean_growth_corr[0]))

Correlation prediction/growth:  0.6134760786505592
Correlation easy guess/growth:  0.1718229222019358
Correlation mean growth/growth: 0.42991080443309315

So what we learn from that is that the Wirtschaftsweisen aren't simply stupid. They do a good job in analysing the current situation, they are just not good at predicting the absolut value of the growth. But that's no surprise. Many factors play a role in the value you see at the end of the year, lots' of them not even objective and all highly cupled and non linear. But we can understand bigger things. For example all recessions in Germany followed certain important events:

the first oil crisis
the Iran-Iraq war/second oil crisis
the gulf war and the fall of the wall and the iron curtain
9/11
the global financial crisis

(I hope I don't interpret too much into that.)

No easy guess or mean model would have shown these events. And the scientists highly underestimated the results of these events. Still their trend prognose was quite okay. So we should stop looking at the growth rate prediction and start reading their whole report, which gives much more information. And maybe they should stop giving this number as the main aspect of their report. They even startet to give a higher precision number recently. While in older times they only had 0.5% steps, now they give a one-decimal number. Without any scientific reason to belief that this precision is of any worth. What would be interesting is is the uncertainty they have over their predictions. I didn't find it somewhere in the report (but I might have missed it).