This notebook demonstrates the preprocessing of equity options, in preparation for the estimation of the parameters of a stochastic model. A number of preliminary calculations must be performed:
Each step is now described.
Recall the put-call parity relationship with continuous dividends:
$$ C_t - P_t = S_t e^{-d (T-t)} - K e^{-r (T-t)} $$where
For each maturity, we estimate the linear regression:
$$ C_t - P_t = a_0 + a_1 K $$which yields
$$ r = - \frac{1}{T} \ln (-a_1) $$$$ d = \frac{1}{T} \ln \left( \frac{S_t}{a_0} \right) $$We next want to estimate the implied volatility of an option struck at the forward price. In general, such option is not traded, and the volatility must therefore be estimated. The calculation involves 3 steps, performed separately on calls and puts:
The forward ATM volatility is the average of the volatilities computed on calls and puts.
Recall that the delta of a European call is defined as $N(d_1)$, where
$$ d_{1} = \frac{1}{\sigma \sqrt{T}} \left[ \ln \left( \frac{S}{K} \right) + \left( r + \frac{1}{2}\sigma^2 \right)T \right] $$The "Quick Delta" (QD) is a popular measure of moneyness, inspired from the definition of delta:
$$ QD(K) = N \left( \frac{1}{\sigma \sqrt{T}} \ln \left( \frac{F_T}{K} \right) \right) $$Note that $QD(F_T)=0.5$, for all maturities, while the regular forward delta is a function of time to expiry. This property of Quick Delta makes it convenient for representing the volatility smile.
A number of filters may be applied, in an attempt to exclude inconsistent or erroneous data.
This logic is implemented in the function Compute_IV
, presented below. The function takes as argument a pandas DataFrame
and returns another
DataFrame
, with one row per quote and 14 columns:
import pandas
import dateutil
import re
import datetime
import os
import numpy as np
from pandas import DataFrame
from scipy.interpolate import interp1d
from scipy.stats import norm
import quantlib.reference.names as nm
import quantlib.pricingengines.blackformula
from quantlib.pricingengines.blackformula import blackFormulaImpliedStdDev
def Compute_IV(optionDataFrame, tMin=0, nMin=0, QDMin=0, QDMax=1, keepOTMData=True):
"""
Pre-processing of a standard European option quote file.
- Calculation of implied risk-free rate and dividend yield
- Calculation of implied volatility
- Estimate ATM volatility for each expiry
- Compute implied volatility and Quick Delta for each quote
Options for filtering the input data set:
- maturities with less than nMin strikes are ignored
- maturities shorter than tMin (ACT/365 daycount) are ignored
- strikes with Quick Delta < qdMin or > qdMax are ignored
"""
grouped = optionDataFrame.groupby(nm.EXPIRY_DATE)
isFirst = True
for spec, group in grouped:
print('processing group %s' % spec)
# implied vol for this type/expiry group
indx = group.index
dtTrade = group[nm.TRADE_DATE][indx[0]]
dtExpiry = group[nm.EXPIRY_DATE][indx[0]]
spot = group[nm.SPOT][indx[0]]
daysToExpiry = (dtExpiry-dtTrade).days
timeToMaturity = daysToExpiry/365.0
# exclude groups with too short time to maturity
if timeToMaturity < tMin:
continue
# exclude groups with too few data points
df_call = group[group[nm.OPTION_TYPE] == nm.CALL_OPTION]
df_put = group[group[nm.OPTION_TYPE] == nm.PUT_OPTION]
if (len(df_call) < nMin) | (len(df_put) < nMin):
continue
# calculate forward, implied interest rate and implied div. yield
df_C = DataFrame((df_call['PBid']+df_call['PAsk'])/2, columns=['PremiumC'])
df_C.index = df_call['Strike'].values
df_P = DataFrame((df_put['PBid']+df_put['PAsk'])/2, columns=['PremiumP'])
df_P.index = df_put['Strike'].values
# use 'inner' join because some strikes are not quoted for C and P
df_all = df_C.join(df_P, how='inner')
df_all['Strike'] = df_all.index
df_all['C-P'] = df_all['PremiumC'] - df_all['PremiumP']
y = np.array(df_all['C-P'])
x = np.array(df_all['Strike'])
A = np.vstack((x, np.ones(x.shape))).T
b = np.linalg.lstsq(A, y)[0]
# intercept is last coef
iRate = -np.log(-b[0])/timeToMaturity
dRate = np.log(spot/b[1])/timeToMaturity
discountFactor = np.exp(-iRate*timeToMaturity)
Fwd = spot * np.exp((iRate-dRate)*timeToMaturity)
print('Fwd: %f int rate: %f div yield: %f' % (Fwd, iRate, dRate))
# mid-market ATM volatility
def impvol(cp, strike, premium):
try:
vol = blackFormulaImpliedStdDev(cp, strike,
forward=Fwd, blackPrice=premium, discount=discountFactor,
TTM=timeToMaturity)
except:
vol = np.nan
return vol/np.sqrt(timeToMaturity)
# implied bid/ask vol for all options
df_call['IVBid'] = [impvol('C', strike, price) for strike, price
in zip(df_call['Strike'], df_call['PBid'])]
df_call['IVAsk'] = [impvol('C', strike, price) for strike, price
in zip(df_call['Strike'], df_call['PAsk'])]
df_call['IVMid'] = (df_call['IVBid'] + df_call['IVAsk'])/2
df_put['IVBid'] = [impvol('P', strike, price) for strike, price
in zip(df_put['Strike'], df_put['PBid'])]
df_put['IVAsk'] = [impvol('P', strike, price) for strike, price
in zip(df_put['Strike'], df_put['PAsk'])]
df_put['IVMid'] = (df_put['IVBid'] + df_put['IVAsk'])/2
f_call = interp1d(df_call['Strike'].values, df_call['IVMid'].values)
f_put = interp1d(df_put['Strike'].values, df_put['IVMid'].values)
atmVol = (f_call(Fwd)+f_put(Fwd))/2
print('ATM vol: %f' % atmVol)
# Quick Delta, computed with ATM vol
rv = norm()
df_call['QuickDelta'] = [rv.cdf(np.log(Fwd/strike)/(atmVol*np.sqrt(timeToMaturity))) \
for strike in df_call['Strike']]
df_put['QuickDelta'] = [rv.cdf(np.log(Fwd/strike)/(atmVol*np.sqrt(timeToMaturity))) \
for strike in df_put['Strike']]
# keep data within QD range
df_call = df_call[(df_call['QuickDelta'] >= QDMin) & \
(df_call['QuickDelta'] <= QDMax) ]
df_put = df_put[ (df_put['QuickDelta'] >= QDMin) & \
(df_put['QuickDelta'] <= QDMax) ]
# final assembly...
df_cp = df_call.append(df_put, ignore_index=True)
df_cp[nm.INTEREST_RATE] = iRate
df_cp[nm.DIVIDEND_YIELD] = dRate
df_cp[nm.ATMVOL] = atmVol
df_cp[nm.FORWARD] = Fwd
# keep only OTM data ?
if keepOTMData:
df_cp = df_cp[((df_cp[nm.STRIKE]>=Fwd) & (df_cp[nm.OPTION_TYPE] == nm.CALL_OPTION)) |
((df_cp[nm.STRIKE]<Fwd) & (df_cp[nm.OPTION_TYPE] == nm.PUT_OPTION))]
if isFirst:
df_final = df_cp
isFirst = False
else:
df_final = df_final.append(df_cp, ignore_index=True)
return df_final
Using the SPX data set found in the data folder, the above procedure generates a DataFrame
suited for use in a calibration program.
if __name__ == '__main__':
fname = os.path.join('..', 'data', 'df_SPX_24jan2011.pkl')
option_data_frame = pandas.read_pickle(fname)
df_final = Compute_IV(option_data_frame, tMin=1/12, nMin=6, QDMin=.2, QDMax=.8)
# save a csv file and pickled data frame
df_final.to_csv('../data/df_options_SPX_24jan2011.csv', index=False)
df_final.save('../data/df_options_SPX_24jan2011.pkl')
processing group 2011-01-19 processing group 2011-02-19 Fwd: 1287.691820 int rate: 0.006877 div yield: 0.038437 ATM vol: 0.214836 processing group 2011-03-16 Fwd: 1286.508509 int rate: 0.005435 div yield: 0.028105 ATM vol: 0.202546 processing group 2011-04-21 Fwd: 1284.254302 int rate: 0.005290 div yield: 0.025936 ATM vol: 0.196585 processing group 2011-05-18 Fwd: 1282.553057 int rate: 0.004818 div yield: 0.024819 ATM vol: 0.197806 processing group 2011-08-17 Fwd: 1277.641485 int rate: 0.004739 div yield: 0.022693 ATM vol: 0.204060 processing group 2011-11-17 Fwd: 1272.615205 int rate: 0.005162 div yield: 0.022398 ATM vol: 0.207660 processing group 2012-05-16 Fwd: 1264.157887 int rate: 0.006431 div yield: 0.022232 ATM vol: 0.211682 processing group 2012-11-22 Fwd: 1259.150211 int rate: 0.008381 div yield: 0.021857 ATM vol: 0.214665 processing group 2013-11-21 Fwd: 1255.181390 int rate: 0.013056 div yield: 0.022895 ATM vol: 0.219377