Eurex Tutorial with Examples based on the VSTOXX Volatility Index
Dr. Yves J. Hilpisch
Continuum Analytics Europe GmbH
PyData London – 21 February 2014
You can find the presentation and the IPython Notebook here:
A brief bio:
See www.hilpisch.com.
Corporations, decision makers and analysts nowadays generally face a number of problems with data:
In addition to these data-oriented problems, there typically are organizational issues that have to be considered:
At Continuum Analytics, the vision for Python-based data analytics is the following:
“To revolutionize data analytics and visualization by moving high-level Python code and domain expertise closer to data.” This vision rests on four pillars:
This tutorial focuses on
It does not address such important issues like
A fundamental Python stack for interactive data analytics and visualization should at least contain the following libraries and tools:
It is best to use either the Python distribution Anaconda or the Web-based analytics environment Wakari. Both provide almost "complete" Python environments.
For example, pandas can, among others, help with the following data-related problems:
As a simple example let's generate a NumPy array with five sets of 1000 (pseudo-)random numbers each.
import numpy as np # this imports the NumPy library
data = np.random.standard_normal((5, 1000)) # generate 5 sets with 1000 rn each
data[:, :5].round(3) # print first five values of each set rounded to 3 digits
array([[-0.221,  0.118,  1.807,  1.594, -0.778],
       [-0.781, -0.501, -0.44 ,  0.081,  0.123],
       [-0.509, -0.316, -0.202, -1.206,  0.512],
       [-0.027,  1.481,  0.605, -0.51 , -0.25 ],
       [-1.391, -0.599,  1.765,  0.046, -0.321]])
Let's plot a histogram of the 1st, 2nd and 3rd data set.
import matplotlib as mpl # this imports matplotlib
import matplotlib.pyplot as plt # this imports matplotlib.pyplot
%matplotlib inline
# inline plotting
plt.hist([data[0], data[1], data[2]], label=['Set 0', 'Set 1', 'Set 2'])
plt.grid(True) # grid for better readability
plt.legend()
<matplotlib.legend.Legend at 0x108723ed0>
We then want to plot the 'running' cumulative sum of each set.
plt.figure() # initialize figure object
plt.grid(True)
for i, data_set in enumerate(data): # iterate over all rows
plt.plot(data_set.cumsum(), label='Set %s' % i)
# plot the running cumulative sums for each row
plt.legend(loc=0) # write legend with labels
<matplotlib.legend.Legend at 0x108794350>
Some fundamental statistics from our data sets.
data.mean(axis=1) # average value of the 5 sets
array([-0.04043321, 0.00598647, -0.00801746, 0.02091924, 0.03059838])
data.std(axis=1) # standard deviation of the 5 sets
array([ 1.03730809, 0.9755484 , 0.97019556, 1.028603 , 0.99431244])
np.corrcoef(data).round(3) # correlation matrix of the 5 data sets
array([[ 1.   , -0.007,  0.001,  0.022, -0.05 ],
       [-0.007,  1.   , -0.037, -0.015,  0.009],
       [ 0.001, -0.037,  1.   ,  0.017,  0.01 ],
       [ 0.022, -0.015,  0.017,  1.   , -0.017],
       [-0.05 ,  0.009,  0.01 , -0.017,  1.   ]])
We need to make a couple of imports for what is to come.
import pandas as pd
import pandas.io.data as pdd
from urllib import urlretrieve
The convenience function DataReader makes it easy to read historical stock price data from Yahoo! Finance (http://finance.yahoo.com).
index = pdd.DataReader('^GDAXI', data_source='yahoo', start='2007/3/30')
# '^GDAXI' is the DAX index ticker symbol; the EURO STOXX 50 would be ^SX5E
index.head(n=5)
Open | High | Low | Close | Volume | Adj Close | |
---|---|---|---|---|---|---|
Date | ||||||
2007-03-30 | 6903.88 | 6965.84 | 6885.58 | 6917.03 | 0 | 6917.03 |
2007-04-02 | 6911.13 | 6945.36 | 6891.80 | 6937.17 | 0 | 6937.17 |
2007-04-03 | 6945.36 | 7053.12 | 6945.27 | 7045.56 | 0 | 7045.56 |
2007-04-04 | 7072.51 | 7076.21 | 7038.49 | 7073.91 | 0 | 7073.91 |
2007-04-05 | 7074.59 | 7103.73 | 7068.09 | 7099.91 | 0 | 7099.91 |
5 rows × 6 columns
index.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1762 entries, 2007-03-30 00:00:00 to 2014-02-20 00:00:00
Data columns (total 6 columns):
Open         1762 non-null float64
High         1762 non-null float64
Low          1762 non-null float64
Close        1762 non-null float64
Volume       1762 non-null int64
Adj Close    1762 non-null float64
dtypes: float64(5), int64(1)
pandas' strength is the handling of indexed/labeled/structured data, like time series data.
index.tail()
Open | High | Low | Close | Volume | Adj Close | |
---|---|---|---|---|---|---|
Date | ||||||
2014-02-14 | 9615.52 | 9677.53 | 9593.37 | 9662.40 | 91504800 | 9662.40 |
2014-02-17 | 9661.80 | 9682.19 | 9645.51 | 9656.76 | 45850100 | 9656.76 |
2014-02-18 | 9674.80 | 9690.97 | 9614.40 | 9659.78 | 58827300 | 9659.78 |
2014-02-19 | 9641.45 | 9695.86 | 9596.42 | 9660.05 | 72892900 | 9660.05 |
2014-02-20 | 9524.58 | 9618.85 | 9504.18 | 9618.85 | 88676900 | 9618.85 |
5 rows × 6 columns
pandas makes it easy to implement vectorized operations, like calculating log-returns over whole time series.
index['Returns'] = np.log(index['Close'] / index['Close'].shift(1))
In addition, pandas makes plotting quite simple and compact.
index[['Close', 'Returns']].plot(subplots=True, style='b', figsize=(8, 5))
array([<matplotlib.axes.AxesSubplot object at 0x106812210>, <matplotlib.axes.AxesSubplot object at 0x106842cd0>], dtype=object)
We now want to check how annual volatility changes over time.
index['Mov_Vol'] = pd.rolling_std(index['Returns'], window=252) * np.sqrt(252)
Obviously, the annual volatility changes significantly over time.
index[['Close', 'Returns', 'Mov_Vol']].plot(subplots=True, style='b', figsize=(8, 5))
array([<matplotlib.axes.AxesSubplot object at 0x106895ad0>, <matplotlib.axes.AxesSubplot object at 0x1068bf2d0>, <matplotlib.axes.AxesSubplot object at 0x1069150d0>], dtype=object)
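The `pd.rolling_std` and `pd.rolling_mean` functions used in this notebook were removed in later pandas releases; the same computation is now expressed through the `.rolling()` method. A minimal sketch on synthetic returns (the series and parameters below are illustrative stand-ins, not the notebook's data):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
dates = pd.date_range('2010-01-01', periods=500, freq='B')
returns = pd.Series(np.random.standard_normal(500) * 0.01, index=dates)

# modern equivalent of pd.rolling_std(returns, window=252) * np.sqrt(252)
mov_vol = returns.rolling(window=252).std() * np.sqrt(252)
```

The first 251 values are NaN because the window requires 252 observations by default (`min_periods`).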
Trend-based investment strategy with the DAX index:
Signal generation:
index["42d"] = pd.rolling_mean(index["Close"], window=42)
index["252d"] = pd.rolling_mean(index["Close"], window=252)
index[["Close", "42d", "252d"]].plot(figsize=(8, 5))
<matplotlib.axes.AxesSubplot at 0x106d61050>
index["diff"] = index["42d"] - index["252d"]
index[["Close", "diff"]].plot(subplots=True, figsize=(8, 5))
array([<matplotlib.axes.AxesSubplot object at 0x106ebe650>, <matplotlib.axes.AxesSubplot object at 0x106fcdf90>], dtype=object)
sigdiff = 100.0
index["Signal"] = np.where(index["diff"] > sigdiff, 1, 0)
index["Signal"] = np.where(index["diff"] < -sigdiff, -1, index["Signal"])
index[["Close", "diff", "Signal"]].plot(subplots=True, figsize=(8, 5))
array([<matplotlib.axes.AxesSubplot object at 0x106c5b910>, <matplotlib.axes.AxesSubplot object at 0x10716e550>, <matplotlib.axes.AxesSubplot object at 0x107193bd0>], dtype=object)
index["Returns"] = np.log(index["Close"] / index["Close"].shift(1))
index["Strategy"] = (index["Signal"] * index["Returns"])
index["Earnings"] = index["Strategy"].cumsum()
index[["Close", "Signal", "Earnings"]].plot(subplots=True, figsize=(10, 8))
array([<matplotlib.axes.AxesSubplot object at 0x108cb8f90>, <matplotlib.axes.AxesSubplot object at 0x108ce22d0>, <matplotlib.axes.AxesSubplot object at 0x108e45950>], dtype=object)
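Note that the backtest above multiplies the same-day signal with the same-day return, which implicitly assumes the signal is known before the close. A common refinement is to lag the signal by one day to avoid this look-ahead bias; a sketch on synthetic data (all names are illustrative):

```python
import numpy as np
import pandas as pd

np.random.seed(1)
dates = pd.date_range('2010-01-01', periods=250, freq='B')
returns = pd.Series(np.random.standard_normal(250) * 0.01, index=dates)
signal = pd.Series(np.sign(np.random.standard_normal(250)), index=dates)

# trade on the previous day's signal: shift it forward one day
strategy = signal.shift(1) * returns
earnings = strategy.cumsum()
```

The first strategy value is NaN since there is no prior-day signal for the first observation.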
It is a stylized fact that stock indexes and related volatility indexes are highly negatively correlated. The following example analyzes this stylized fact based on the EURO STOXX 50 stock index and the VSTOXX volatility index using Ordinary Least-Squares regression (OLS).
First, we collect historical data for both the EURO STOXX 50 stock and the VSTOXX volatility index.
import pandas as pd
import datetime as dt
from urllib import urlretrieve
es_url = 'http://www.stoxx.com/download/historical_values/hbrbcpe.txt'
vs_url = 'http://www.stoxx.com/download/historical_values/h_vstoxx.txt'
urlretrieve(es_url, 'es.txt')
urlretrieve(vs_url, 'vs.txt')
('vs.txt', <httplib.HTTPMessage instance at 0x108e4def0>)
The EURO STOXX 50 data is not yet in the right format. Some house cleaning is necessary (I).
lines = open('es.txt').readlines() # reads the whole file line-by-line
lines[:5] # header not well formatted
['Price Indices - EURO Currency\n',
 'Date ;Blue-Chip;Blue-Chip;Broad ; Broad ;Ex UK ;Ex Euro Zone;Blue-Chip; Broad\n',
 ' ; Europe ;Euro-Zone;Europe ;Euro-Zone; ; ; Nordic ; Nordic\n',
 ' ; SX5P ; SX5E ;SXXP ;SXXE ; SXXF ; SXXA ; DK5F ; DKXF\n',
 '31.12.1986;775.00 ; 900.82 ; 82.76 ; 98.58 ; 98.06 ; 69.06 ; 645.26 ; 65.56\n']
The EURO STOXX 50 data is not yet in the right format. Some house cleaning is necessary (II).
lines[3883:3890] # from 27.12.2001 additional semi-colon
# note: the format changes halfway through the data set -- an additional
# semicolon at the end of each line, which would throw off pandas
['20.12.2001;3537.34; 3617.47; 286.07; 300.97; 317.10; 267.23; 5268.36 ; 363.19\n',
 '21.12.2001;3616.80; 3696.44; 291.39; 306.60; 322.55; 272.18; 5360.52 ; 370.94\n',
 '24.12.2001;3622.85; 3696.98; 291.90; 306.77; 322.69; 272.95; 5360.52 ; 370.94\n',
 '27.12.2001;3686.23; 3778.39; 297.11; 312.43; 327.57; 277.68; 5479.59; 378.69;\n',
 '28.12.2001;3706.93; 3806.13; 298.73; 314.52; 329.94; 278.87; 5585.35; 386.99;\n',
 '02.01.2002;3627.81; 3755.56; 293.69; 311.43; 326.77; 272.38; 5522.25; 380.09;\n',
 '03.01.2002;3699.09; 3833.09; 299.09; 317.54; 332.62; 277.08; 5722.57; 396.12;\n']
The EURO STOXX 50 data is not yet in the right format. Some house cleaning is necessary (III).
# We add an extra "DEL" column so that when we read it in we can delete it after,
# to deal with the additional semi-colon (additional column). Don't forget to
# delete it!
lines = open('es.txt').readlines() # reads the whole file line-by-line
new_file = open('es50.txt', 'w') # opens a new file
new_file.writelines('date' + lines[3][:-1].replace(' ', '') + ';DEL' + lines[3][-1])
# writes the corrected third line (additional column name)
# of the original file as first line of new file
new_file.writelines(lines[4:-1]) # writes the remaining lines of the original file
new_file.close() # close (and flush) the new file before reading it back
The EURO STOXX 50 data is not yet in the right format. Some house cleaning is necessary (IV).
list(open('es50.txt'))[:5] # opens the new file for inspection
['date;SX5P;SX5E;SXXP;SXXE;SXXF;SXXA;DK5F;DKXF;DEL\n',
 '31.12.1986;775.00 ; 900.82 ; 82.76 ; 98.58 ; 98.06 ; 69.06 ; 645.26 ; 65.56\n',
 '01.01.1987;775.00 ; 900.82 ; 82.76 ; 98.58 ; 98.06 ; 69.06 ; 645.26 ; 65.56\n',
 '02.01.1987;770.89 ; 891.78 ; 82.57 ; 97.80 ; 97.43 ; 69.37 ; 647.62 ; 65.81\n',
 '05.01.1987;771.89 ; 898.33 ; 82.82 ; 98.60 ; 98.19 ; 69.16 ; 649.94 ; 65.82\n']
Now, the data can be safely read into a DataFrame object.
es = pd.read_csv('es50.txt', index_col=0, parse_dates=True, sep=';', dayfirst=True)
del es['DEL'] # delete the helper column
es.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 6998 entries, 1986-12-31 00:00:00 to 2014-02-19 00:00:00
Data columns (total 8 columns):
SX5P    6998 non-null float64
SX5E    6998 non-null float64
SXXP    6998 non-null float64
SXXE    6998 non-null float64
SXXF    6998 non-null float64
SXXA    6998 non-null float64
DK5F    6998 non-null float64
DKXF    6998 non-null float64
dtypes: float64(8)
The VSTOXX data can be read without touching the raw data.
vs = pd.read_csv('vs.txt', index_col=0, header=2, parse_dates=True, sep=',', dayfirst=True)
# you can alternatively read from the Web source directly
# without saving the csv file to disk:
# vs = pd.read_csv(vs_url, index_col=0, header=2,
# parse_dates=True, sep=',', dayfirst=True)
We now merge the data for further analysis.
# Drop the EURO STOXX data from before the VSTOXX series starts;
# i.e. all data before 2000-01-01.
import datetime as dt
data = pd.DataFrame({'EUROSTOXX' :
es['SX5E'][es.index > dt.datetime(1999, 12, 31)]})
data = data.join(pd.DataFrame({'VSTOXX' :
vs['V2TX'][vs.index > dt.datetime(1999, 12, 31)]}))
data.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 3623 entries, 2000-01-03 00:00:00 to 2014-02-19 00:00:00
Data columns (total 2 columns):
EUROSTOXX    3623 non-null float64
VSTOXX       3601 non-null float64
dtypes: float64(2)
Let's inspect the two time series.
data.head()
EUROSTOXX | VSTOXX | |
---|---|---|
date | ||
2000-01-03 | 4849.22 | 30.9845 |
2000-01-04 | 4657.83 | 33.2225 |
2000-01-05 | 4541.75 | 32.5944 |
2000-01-06 | 4500.69 | 31.1811 |
2000-01-07 | 4648.27 | 27.4407 |
5 rows × 2 columns
A picture can tell almost the complete story.
# Confirms the stylized fact: when the index falls, volatility spikes.
data.plot(subplots=True, grid=True, style='b', figsize=(10, 5))
array([<matplotlib.axes.AxesSubplot object at 0x108eb1290>, <matplotlib.axes.AxesSubplot object at 0x1095a3410>], dtype=object)
We now generate log returns for both time series.
# Log returns make two different time series directly comparable
# in a mathematical way; this is a common pattern.
rets = np.log(data / data.shift(1))
rets.head()
EUROSTOXX | VSTOXX | |
---|---|---|
date | ||
2000-01-03 | NaN | NaN |
2000-01-04 | -0.040268 | 0.069740 |
2000-01-05 | -0.025237 | -0.019087 |
2000-01-06 | -0.009082 | -0.044328 |
2000-01-07 | 0.032264 | -0.127785 |
5 rows × 2 columns
To this new data set, also stored in a DataFrame object, we apply OLS.
xdat = rets['EUROSTOXX']
ydat = rets['VSTOXX']
model = pd.ols(y=ydat, x=xdat)
model
-------------------------Summary of Regression Analysis-------------------------
Formula: Y ~ <x> + <intercept>
Number of Observations:         3578
Number of Degrees of Freedom:   2
R-squared:         0.5544
Adj R-squared:     0.5542
Rmse:              0.0378
F-stat (1, 3576):  4448.5161, p-value:     0.0000
Degrees of Freedom: model 1, resid 3576
-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
             x    -2.7182     0.0408     -66.70     0.0000    -2.7981    -2.6384
     intercept    -0.0007     0.0006      -1.09     0.2750    -0.0019     0.0005
---------------------------------End of Summary---------------------------------
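`pd.ols` was deprecated and later removed from pandas; the same line fit can be obtained with `np.polyfit` (or with `statsmodels`). A sketch on synthetic return data with a built-in negative relation (the coefficients and sample below are assumptions for illustration, not the notebook's estimates):

```python
import numpy as np

np.random.seed(42)
# synthetic stand-ins for the EURO STOXX 50 and VSTOXX log returns
xdat = np.random.standard_normal(3000) * 0.012
ydat = -2.7 * xdat + np.random.standard_normal(3000) * 0.038

# least-squares fit of ydat = beta * xdat + alpha
beta, alpha = np.polyfit(xdat, ydat, deg=1)
```

`np.polyfit` returns the coefficients from highest to lowest degree, matching the slope/intercept order of `model.beta` above.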
Again, we want to see how our results look graphically.
# Again confirms the stylized fact: a highly negative correlation.
plt.plot(xdat, ydat, 'r.')
ax = plt.axis() # grab axis values
x = np.linspace(ax[0], ax[1] + 0.01)
plt.plot(x, model.beta[1] + model.beta[0] * x, 'b', lw=2)
plt.grid(True)
plt.axis('tight')
(-0.10000000000000001, 0.16, -0.43561042330383332, 0.43687964474802654)
Let us see if we can identify systematics over time. And indeed, during the 2007/2008 crisis (yellow dots) volatility was more pronounced than more recently (red dots).
mpl_dates = mpl.dates.date2num(rets.index)
plt.figure(figsize=(8, 4))
plt.scatter(rets['EUROSTOXX'], rets['VSTOXX'], c=mpl_dates, marker='o')
plt.grid(True)
plt.xlabel('EUROSTOXX')
plt.ylabel('VSTOXX')
plt.colorbar(ticks=mpl.dates.DayLocator(interval=250),
format=mpl.dates.DateFormatter('%d %b %y'))
<matplotlib.colorbar.Colorbar instance at 0x10b2d9c68>
We want to test whether the EURO STOXX 50 and/or the VSTOXX returns are normally distributed (e.g. whether they have fat tails). To this end, we do a graphical check via Q-Q plots as well as formal statistical normality tests.
Add on: plot a histogram of the log return frequencies and compare that to a normal distribution with same mean and variance (using e.g. norm.pdf from scipy.stats)
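A possible sketch of this add-on, using synthetic returns so it is self-contained (a stand-in for `rets['EUROSTOXX']`); the normal density is written out with NumPy, though `scipy.stats.norm.pdf(x, mu, sigma)` would give the same values:

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt

np.random.seed(0)
r = np.random.standard_normal(2000) * 0.015  # synthetic log returns

mu, sigma = r.mean(), r.std()
x = np.linspace(r.min(), r.max(), 200)
# normal density with the same mean and variance as the data
pdf = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

counts, bins, _ = plt.hist(r, bins=40, density=True)  # density-scaled histogram
plt.plot(x, pdf, 'r', lw=2)
```

Fat tails would show up as histogram mass in the tails well above the fitted normal curve (for truly normal data, as here, the two agree closely). Note that `density=True` replaced the older `normed=True` in matplotlib.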
import statsmodels.api as sma
import scipy.stats
rets.head()
EUROSTOXX | VSTOXX | |
---|---|---|
date | ||
2000-01-03 | NaN | NaN |
2000-01-04 | -0.040268 | 0.069740 |
2000-01-05 | -0.025237 | -0.019087 |
2000-01-06 | -0.009082 | -0.044328 |
2000-01-07 | 0.032264 | -0.127785 |
5 rows × 2 columns
r1 = rets["EUROSTOXX"]
print r1.head()
r1.values
date
2000-01-04   -0.040268
2000-01-05   -0.025237
2000-01-06   -0.009082
2000-01-07    0.032264
2000-01-10    0.014048
Name: EUROSTOXX, dtype: float64
array([ -4.02681928e-02, -2.52372741e-02, -9.08168198e-03, ..., -4.80925686e-05, -4.71429611e-04, 1.07722701e-03])
rets = rets.dropna()
# This is a benchmark; normally distributed data looks
# like this.
sma.qqplot(np.random.standard_normal(1000), line='s')
pass
# This is qqplot for classic fat tails.
sma.qqplot(rets["EUROSTOXX"].values, line='s')
pass
# This is qqplot for classic fat tails.
sma.qqplot(rets["VSTOXX"].values, line='s')
pass
The p-values are well below any critical level, so we reject the null hypothesis that either distribution is normal.
scipy.stats.normaltest(rets["EUROSTOXX"].values)
(353.15590680814785, 2.0566917729990975e-77)
scipy.stats.normaltest(rets["VSTOXX"].values)
(600.74885820275608, 3.5403234068567166e-131)
scipy.stats.shapiro(rets["VSTOXX"].values)
(0.957076370716095, 2.971618685827881e-31)
def normality_tests(array):
print "Skew: %s" % (scipy.stats.skew(array), )
print "Skew test: %s" % (scipy.stats.skewtest(array), )
print "Kurt: %s" % (scipy.stats.kurtosis(array), )
print "Kurt test: %s" % (scipy.stats.kurtosistest(array), )
print "Normal test: %s" % (scipy.stats.normaltest(array), )
normality_tests(np.random.standard_normal(10000))
Skew: 0.0320913746685
Skew test: (1.3107691385636868, 0.18993577206096546)
Kurt: -0.0508301464056
Kurt test: (-1.0291144471302336, 0.30342589805970654)
Normal test: (2.7771922799031561, 0.24942521705745871)
normality_tests(rets["VSTOXX"].values)
Skew: 0.775406652102
Skew test: (16.89996674042791, 4.5016044182388654e-64)
Kurt: 3.74756881261
Kurt test: (17.75218246794423, 1.6578425684316801e-70)
Normal test: (600.74885820275608, 3.5403234068567166e-131)
rets.hist(bins=20, figsize=(10, 5))
array([[<matplotlib.axes.AxesSubplot object at 0x10bbbd090>, <matplotlib.axes.AxesSubplot object at 0x10bb03050>]], dtype=object)
A number of studies have illustrated that constant proportion investments in volatility derivatives – given a diversified equity portfolio – might improve investment performance considerably. See, for instance, the study
The Benefits of Volatility Derivatives in Equity Portfolio Management
We now want to replicate (in a simplified fashion) what you can flexibly test here on the basis of two backtesting applications for VSTOXX-based investment strategies:
The strategy we are going to implement and test is characterized as follows:
We already have the necessary data available. However, we want to drop 'NaN' values and want to normalize the index values.
data = data.dropna()
# Normalize both series to a base of 100 so we compare like with like
data = data / data.ix[0] * 100
data.head()
EUROSTOXX | VSTOXX | |
---|---|---|
date | ||
2000-01-03 | 100.000000 | 100.000000 |
2000-01-04 | 96.053180 | 107.222966 |
2000-01-05 | 93.659393 | 105.195824 |
2000-01-06 | 92.812659 | 100.634511 |
2000-01-07 | 95.856035 | 88.562668 |
5 rows × 2 columns
First, the initial invest.
invest = 100
cratio = 0.3
data['Equity'] = (1 - cratio) * invest / data['EUROSTOXX'][0]
data['Volatility'] = cratio * invest / data['VSTOXX'][0]
This can already be considered a static investment strategy.
data['Static'] = (data['Equity'] * data['EUROSTOXX']
+ data['Volatility'] * data['VSTOXX'])
# Not amazing, but shows how to start. Wouldn't impress
# an investor.
data[['EUROSTOXX', 'Static']].plot(figsize=(10, 5))
<matplotlib.axes.AxesSubplot at 0x10babae90>
Second, the dynamic strategy with daily adjustments to keep the value ratio constant.
for i in xrange(1, len(data)):
evalue = data['Equity'][i - 1] * data['EUROSTOXX'][i]
# value of equity position
vvalue = data['Volatility'][i - 1] * data['VSTOXX'][i]
# value of volatility position
tvalue = evalue + vvalue
# total wealth
data['Equity'][i] = (1 - cratio) * tvalue / data['EUROSTOXX'][i]
# re-allocation of total wealth to equity ...
data['Volatility'][i] = cratio * tvalue / data['VSTOXX'][i]
# ... and volatility position
Third, the total wealth position.
data['Dynamic'] = (data['Equity'] * data['EUROSTOXX']
+ data['Volatility'] * data['VSTOXX'])
data.head()
EUROSTOXX | VSTOXX | Equity | Volatility | Static | Dynamic | |
---|---|---|---|---|---|---|
date | ||||||
2000-01-03 | 100.000000 | 100.000000 | 0.700000 | 0.300000 | 100.000000 | 100.000000 |
2000-01-04 | 96.053180 | 107.222966 | 0.724420 | 0.278124 | 99.404116 | 99.404116 |
2000-01-05 | 93.659393 | 105.195824 | 0.725761 | 0.276930 | 97.120322 | 97.106211 |
2000-01-06 | 92.812659 | 100.634511 | 0.718221 | 0.283884 | 95.159214 | 95.228521 |
2000-01-07 | 95.856035 | 88.562668 | 0.686354 | 0.318376 | 93.668025 | 93.987330 |
5 rows × 6 columns
A brief check if the ratios are indeed constant.
(data['Volatility'] * data['VSTOXX'] / data['Dynamic'])[:5]
date
2000-01-03    0.3
2000-01-04    0.3
2000-01-05    0.3
2000-01-06    0.3
2000-01-07    0.3
dtype: float64
(data['Equity'] * data['EUROSTOXX'] / data['Dynamic'])[:5]
date
2000-01-03    0.7
2000-01-04    0.7
2000-01-05    0.7
2000-01-06    0.7
2000-01-07    0.7
dtype: float64
Let us inspect the performance of the strategy.
data[['EUROSTOXX', 'Dynamic']].plot(figsize=(10, 5))
<matplotlib.axes.AxesSubplot at 0x10b85d410>
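The daily rebalancing loop above also admits a closed form: with a constant ratio `c`, each day's wealth relative of the rebalanced mix is `(1 - c)` times the equity return plus `c` times the volatility return, so total wealth is just the cumulative product. A vectorized sketch on synthetic index paths (names and numbers are illustrative):

```python
import numpy as np

np.random.seed(7)
cratio = 0.3
es = 100 * np.cumprod(1 + np.random.standard_normal(500) * 0.01)  # equity path
vs = 100 * np.cumprod(1 + np.random.standard_normal(500) * 0.02)  # volatility path

# daily wealth relatives of the constantly rebalanced 70/30 mix
rel = (1 - cratio) * (es[1:] / es[:-1]) + cratio * (vs[1:] / vs[:-1])
wealth = 100 * np.concatenate(([1.0], np.cumprod(rel)))
```

This reproduces the loop's 'Dynamic' column without iterating row by row, which is considerably faster on long series.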
Write a Python function which allows for an arbitrary but constant ratio to be invested in the VSTOXX index and which returns net performance values (in percent) for the constant proportion VSTOXX strategy.
Add on: find the ratio to be invested in the VSTOXX that gives the maximum performance.
np.linspace(0, 1, num=20)
array([ 0.        ,  0.05263158,  0.10526316,  0.15789474,  0.21052632,
        0.26315789,  0.31578947,  0.36842105,  0.42105263,  0.47368421,
        0.52631579,  0.57894737,  0.63157895,  0.68421053,  0.73684211,
        0.78947368,  0.84210526,  0.89473684,  0.94736842,  1.        ])
import scipy.optimize
def my_investment(cratio):
invest = 100
data['Equity'] = (1 - cratio) * invest / data['EUROSTOXX'][0]
data['Volatility'] = cratio * invest / data['VSTOXX'][0]
for i in xrange(1, len(data)):
evalue = data['Equity'][i - 1] * data['EUROSTOXX'][i]
# value of equity position
vvalue = data['Volatility'][i - 1] * data['VSTOXX'][i]
# value of volatility position
tvalue = evalue + vvalue
# total wealth
data['Equity'][i] = (1 - cratio) * tvalue / data['EUROSTOXX'][i]
# re-allocation of total wealth to equity ...
data['Volatility'][i] = cratio * tvalue / data['VSTOXX'][i]
# ... and volatility position
data['Dynamic'] = (data['Equity'] * data['EUROSTOXX']
+ data['Volatility'] * data['VSTOXX'])
return -data["Dynamic"][-1]
# reference: http://scipy-lectures.github.io/advanced/mathematical_optimization/
#scipy.optimize.brent(my_investment) # yields -512.953971939 for cratio = 0.488
print my_investment(0.488)
-512.953971939
Using standard Python functionality and pandas, the code that follows reads intraday, high-frequency data from a Web source, plots it and resamples it.
url = 'http://hopey.netfonds.no/posdump.php?'
url += 'date=%s%s%s&paper=AAPL.O&csv_format=csv' % ('2014', '02', '19')
# you may have to adjust the date since only recent dates are available
urlretrieve(url, 'aapl.csv')
('aapl.csv', <httplib.HTTPMessage instance at 0x10c2b5cb0>)
AAPL = pd.read_csv('aapl.csv', index_col=0, header=0, parse_dates=True)
AAPL.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 14649 entries, 2014-02-19 01:17:18 to 2014-02-19 22:16:26
Data columns (total 6 columns):
bid                  14649 non-null float64
bid_depth            14649 non-null int64
bid_depth_total      14649 non-null int64
offer                14649 non-null float64
offer_depth          14649 non-null int64
offer_depth_total    14649 non-null int64
dtypes: float64(2), int64(4)
The intraday evolution of the Apple stock price.
AAPL['bid'].plot()
<matplotlib.axes.AxesSubplot at 0x10c457490>
AAPL = AAPL[AAPL.index > dt.datetime(2014, 2, 19, 10, 0, 0)]
# only data later than 10am at that day
A resampling of the data is easily accomplished with pandas.
# this resamples the record frequency to 5 minutes, using mean as aggregation rule
# and fillna(method='ffill') is "forward fill", use last valid value.
AAPL_5min = AAPL.resample(rule='5min', how='mean').fillna(method='ffill')
AAPL_5min.head()
bid | bid_depth | bid_depth_total | offer | offer_depth | offer_depth_total | |
---|---|---|---|---|---|---|
time | ||||||
2014-02-19 10:00:00 | 545.050000 | 100.000000 | 100.000000 | 545.993333 | 241.666667 | 241.666667 |
2014-02-19 10:05:00 | 545.423462 | 142.307692 | 142.307692 | 546.255769 | 107.692308 | 107.692308 |
2014-02-19 10:10:00 | 545.600270 | 127.027027 | 127.027027 | 546.164324 | 100.000000 | 100.000000 |
2014-02-19 10:15:00 | 545.654528 | 183.018868 | 183.018868 | 546.149811 | 100.000000 | 100.000000 |
2014-02-19 10:20:00 | 545.777556 | 173.333333 | 173.333333 | 546.155778 | 100.000000 | 100.000000 |
5 rows × 6 columns
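The `how='mean'` argument of `resample` was removed in later pandas releases; the aggregation became a chained method call, and `fillna(method='ffill')` became `.ffill()`. A sketch on synthetic one-minute quotes (the series is a stand-in for the AAPL bid data):

```python
import numpy as np
import pandas as pd

np.random.seed(3)
ticks = pd.date_range('2014-02-19 10:00', periods=600, freq='min')
bid = pd.Series(545 + 0.05 * np.random.standard_normal(600).cumsum(), index=ticks)

# modern spelling of resample(rule='5min', how='mean').fillna(method='ffill')
bid_5min = bid.resample('5min').mean().ffill()
```

Each 5-minute bin is labeled by its left edge, so the first resampled value is the mean of the first five one-minute quotes.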
Let's have a graphical look at the new data set.
AAPL_5min['bid'].plot()
<matplotlib.axes.AxesSubplot at 0x10c457850>
With pandas you can easily apply custom functions to time series data.
AAPL_5min['bid'].apply(lambda x: 2 * 540 - x).plot()
# this mirrors the stock price development at the 540 level
<matplotlib.axes.AxesSubplot at 0x10c5984d0>
10 years ago, Python was considered exotic in the analytics space – at best. Languages/packages like R and Matlab dominated the scene. Today, Python has become a major force in financial analytics & visualization due to a number of characteristics:
One of the easiest ways to deploy Python today across a whole organization with a heterogeneous IT infrastructure is via Wakari, Continuum's Web-/Browser- and Python-based Data Analytics environment. It is available both as a cloud solution and as an enterprise solution for in-house deployment.
Continuum Analytics Inc. – the company Web site
Dr. Yves J. Hilpisch – my personal Web site
Derivatives Analytics with Python – my new book
Read an Excerpt and Order the Book
Contact Us