Standardized Option Quotes Data Format

To facilitate model calibration, we define a standard data set that contains all the necessary information. The data is held in a pandas DataFrame of type option_quotes, with one row per quote and 8 columns. This table is defined in quantlib/reference/data_structures.py. The column names are defined in quantlib/reference/names.py, and are as follows:

TRADE_DATE: Quote date, or time stamp
STRIKE: Option strike price
EXPIRY_DATE: Option expiry date
OPTION_TYPE: Call/Put flag, coded as "C" or "P"
SPOT: Price of underlying asset
EXERCISE_STYLE: American/European, coded as "Amer" or "Euro"
PRICE_BID: Bid price
PRICE_ASK: Ask price
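
As a rough illustration of the layout described above, the sketch below builds an empty table with the eight columns and checks a frame against that list. The column labels are an assumption: the actual strings are defined in quantlib/reference/names.py and may differ from the constant names used here. The helpers empty_option_quotes and missing_columns are hypothetical and not part of the library.

```python
import pandas

# Assumed column labels: the real strings live in quantlib/reference/names.py
# and may differ from the constant names used here.
OPTION_QUOTES_COLUMNS = [
    "TRADE_DATE", "STRIKE", "EXPIRY_DATE", "OPTION_TYPE",
    "SPOT", "EXERCISE_STYLE", "PRICE_BID", "PRICE_ASK",
]


def empty_option_quotes():
    """Hypothetical stand-in for ds.option_quotes_template()."""
    return pandas.DataFrame(columns=OPTION_QUOTES_COLUMNS)


def missing_columns(frame):
    """Return the standard columns that are absent from `frame`."""
    return [c for c in OPTION_QUOTES_COLUMNS if c not in frame.columns]


template = empty_option_quotes()
print(list(template.columns))
print(missing_columns(template.drop(columns=["SPOT"])))
```

A check of this kind can be run before calibration to catch a file that was saved with an incomplete set of columns.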

The data set includes neither the dividend yield nor the risk-free rate: the implied forward price and the risk-free rate are estimated from put-call parity.
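
One possible way to carry out this estimation, for European options of a single expiry, is to fit the parity relation C - P = DF * (F - K) across strikes by least squares: the slope gives -DF and the intercept gives DF * F. The sketch below is a minimal illustration under that assumption, not necessarily the method used in the library. The function name implied_forward_and_rate is hypothetical, and the prices are synthetic, built to satisfy parity exactly.

```python
import math


def implied_forward_and_rate(strikes, call_mids, put_mids, time_to_expiry):
    """Least-squares fit of C - P = DF * (F - K) across strikes."""
    y = [c - p for c, p in zip(call_mids, put_mids)]
    n = len(strikes)
    mean_k = sum(strikes) / n
    mean_y = sum(y) / n
    sxx = sum((k - mean_k) ** 2 for k in strikes)
    sxy = sum((k - mean_k) * (v - mean_y) for k, v in zip(strikes, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_k
    discount = -slope                      # DF = exp(-r * T)
    forward = intercept / discount         # F
    rate = -math.log(discount) / time_to_expiry
    return forward, rate


# Synthetic inputs (not market data): forward 1295, rate 2%, T = 0.5 years.
true_forward, true_rate, T = 1295.0, 0.02, 0.5
df_true = math.exp(-true_rate * T)
strikes = [1200.0, 1250.0, 1300.0, 1350.0, 1400.0]
parity = [df_true * (true_forward - k) for k in strikes]
call_mids = [max(d, 0.0) + 20.0 for d in parity]   # 20.0 is an arbitrary time value
put_mids = [max(-d, 0.0) + 20.0 for d in parity]

forward, rate = implied_forward_and_rate(strikes, call_mids, put_mids, T)
print(forward, rate)
```

With real quotes the fit would use bid/ask mid prices and would not be exact, so the residuals give a quick check on data quality.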

This notebook demonstrates the creation of such a data file by processing the quotes on the S&P 500 index options (SPX) provided by the Chicago Board Options Exchange (CBOE).

Obtaining SPX Option Quotes

Delayed SPX option quotes are published by the CBOE in comma-separated format. The file provides:

the value of the underlying index
bid and ask prices for calls and puts, by strike and expiry date. The expiry date can be extracted from the option ticker.
other information, such as volume and open interest, also by strike and option type.

SPX Option Data Processing

We provide below the procedure for converting the raw SPX option data file into the standardized option quotes data format.

SPX Utility Functions

These functions parse the SPX option names and extract the expiry date and strike.

In [1]:

from __future__ import print_function
import pandas
import datetime
import dateutil.parser
import re
import os
import quantlib.reference.names as nm
import quantlib.reference.data_structures as ds

def ExpiryMonth(s):
    """
    Convert SPX contract months into month number
    """
    call_months = "ABCDEFGHIJKL"
    put_months = "MNOPQRSTUVWX"

    try:
        m = call_months.index(s)
    except ValueError:
        m = put_months.index(s)

    # str.index is 0-based, but datetime.date expects months 1..12
    return m + 1

spx_symbol = re.compile("\\(SPX(1[0-9])([0-9]{2})([A-Z])([0-9]{3,4})-E\\)")

def parseSPX(s):
    """
    Parse an SPX quote string, return expiry date and strike
    """
    tokens = spx_symbol.split(s)

    if len(tokens) == 1:
        return {'dtExpiry': None, 'strike': -1}

    year = 2000 + int(tokens[1])
    day = int(tokens[2])
    month = ExpiryMonth(tokens[3])
    strike = float(tokens[4])

    dtExpiry = datetime.date(year, month, day)

    return ({'dtExpiry': dtExpiry, 'strike': strike})
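
The ticker layout handled above (two-digit year, two-digit day, a month letter from A-L for calls or M-X for puts, then the strike) can be checked in isolation. The following is a self-contained sketch that uses re.search rather than split and a 1-based month. The function name parse_spx_ticker and the sample tickers are hypothetical, constructed only to match the pattern.

```python
import datetime
import re

SPX_SYMBOL = re.compile(r"\(SPX(1[0-9])([0-9]{2})([A-Z])([0-9]{3,4})-E\)")
CALL_MONTHS = "ABCDEFGHIJKL"
PUT_MONTHS = "MNOPQRSTUVWX"


def parse_spx_ticker(s):
    """Return (expiry date, strike), or None when no ticker is found."""
    match = SPX_SYMBOL.search(s)
    if match is None:
        return None
    year_code, day, letter, strike = match.groups()
    # A..L are call months, M..X are put months; both map to 1..12
    if letter in CALL_MONTHS:
        month = CALL_MONTHS.index(letter) + 1
    elif letter in PUT_MONTHS:
        month = PUT_MONTHS.index(letter) + 1
    else:
        return None  # Y and Z match [A-Z] but are not month codes
    expiry = datetime.date(2000 + int(year_code), month, int(day))
    return expiry, float(strike)


# Hypothetical tickers built to match the pattern above
print(parse_spx_ticker("11 Mar 1300.00 (SPX1119C1300-E)"))
print(parse_spx_ticker("11 Mar 1300.00 (SPX1119O1300-E)"))
print(parse_spx_ticker("no ticker here"))
```

Returning None for unparseable input is a design choice of this sketch; the function above instead returns a strike of -1 so that the row can be filtered out later.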

Reading the SPX raw data file

The csv file downloaded from the CBOE site can be converted into a standard option_quotes pandas DataFrame by the following function.

In [2]:

def read_SPX_file(option_data_file):
    """
    Read SPX csv file,
    return spot and a data frame of type option_quotes
    """
    
    # read two lines for spot price and trade date
    with open(option_data_file) as fid:
        lineOne = fid.readline()
        spot = float(lineOne.split(',')[1])

        lineTwo = fid.readline()
        dt = lineTwo.split('@')[0]
        dtTrade = dateutil.parser.parse(dt).date()

        print('Dt Calc: %s Spot: %f' % (dtTrade, spot))
    
    # read all option price records as a data frame
    df = pandas.read_csv(option_data_file, header=0, sep=',', skiprows=[0, 1])
    
    # split and stack calls and puts
    
    call_df = df[['Calls', 'Bid', 'Ask']]
    call_df = call_df.rename(columns={'Calls':'Spec', 'Bid':'PBid', 'Ask': 'PAsk'}) 
    call_df['Type'] = nm.CALL_OPTION
    
    put_df = df[['Puts', 'Bid.1', 'Ask.1']]
    put_df = put_df.rename(columns = {'Puts':'Spec', 'Bid.1':'PBid',
    'Ask.1':'PAsk'}) 
    put_df['Type'] = nm.PUT_OPTION
        
    # DataFrame.append was removed in pandas 2.0; concat is the equivalent
    df_all = pandas.concat([call_df, put_df], ignore_index=True)
    
    # parse Calls and Puts columns for strike and contract month
    # insert into data frame
    
    cp = [parseSPX(s) for s in df_all['Spec']]
    
    option_quotes = ds.option_quotes_template()
    option_quotes = option_quotes.reindex(index=range(len(cp)))
    
    # Fill the option_quotes data frame
    
    option_quotes[nm.STRIKE] = [x['strike'] for x in cp] 
    option_quotes[nm.EXPIRY_DATE] = [x['dtExpiry'] for x in cp]
    option_quotes[nm.OPTION_TYPE] = df_all['Type']
    option_quotes[nm.EXERCISE_STYLE] = nm.EURO_EXERCISE
    option_quotes[nm.PRICE_BID] = df_all['PBid']
    option_quotes[nm.PRICE_ASK] = df_all['PAsk']
    option_quotes[nm.TRADE_DATE] = dtTrade
    
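    # keep only quotes whose ticker parsed (strike > 0) with positive bid and ask;
    # copy so that the SPOT assignment below does not write to a view
    option_quotes = option_quotes[(option_quotes[nm.STRIKE] > 0) &
                                  (option_quotes[nm.PRICE_BID] > 0) &
                                  (option_quotes[nm.PRICE_ASK] > 0)].copy()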
                    
    option_quotes[nm.SPOT] = spot
    
    return option_quotes
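
The split-and-stack step can be tried on its own with a small in-memory file. The sketch below assumes a header row in which the Bid and Ask labels appear twice, once for calls and once for puts; pandas then renames the second pair to Bid.1 and Ask.1, which is why the function above selects those names. The two data rows are synthetic, and the tickers and prices are made up for illustration.

```python
import io

import pandas

# Synthetic two-row excerpt; tickers and prices are made up for illustration.
raw = io.StringIO(
    "Calls,Bid,Ask,Puts,Bid,Ask\n"
    "(SPX1119C1300-E),10.0,11.0,(SPX1119O1300-E),12.0,13.0\n"
    "(SPX1119C1350-E),2.0,2.5,(SPX1119O1350-E),40.0,42.0\n"
)
df = pandas.read_csv(raw)
print(list(df.columns))  # duplicate labels get a ".1" suffix

calls = (df[["Calls", "Bid", "Ask"]]
         .rename(columns={"Calls": "Spec", "Bid": "PBid", "Ask": "PAsk"})
         .assign(Type="C"))
puts = (df[["Puts", "Bid.1", "Ask.1"]]
        .rename(columns={"Puts": "Spec", "Bid.1": "PBid", "Ask.1": "PAsk"})
        .assign(Type="P"))

# One row per quote: calls first, then puts, with a fresh 0..n-1 index
stacked = pandas.concat([calls, puts], ignore_index=True)
print(stacked)
```

Resetting the index with ignore_index=True matters here, because the quotes are later copied column by column into a frame indexed from 0 to n-1.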

Example

In the example below, the file 'SPX-Options-24jan2011.csv' was downloaded from the CBOE web site. The standardized option quotes data set is saved both as a csv file and as a pickled pandas DataFrame.

File paths are relative to the notebooks folder, so it's important that the notebook browser be started with the command:

ipython notebook --pylab inline path-to-the-notebooks-folder

In [3]:

option_data_file = os.path.join('..', 'data', 'SPX-Options-24jan2011.csv')

df_SPX = read_SPX_file(option_data_file)
print('%d records processed' % len(df_SPX))
    
# save a csv file and pickled data frame
df_SPX.to_csv(os.path.join('..', 'data', 'df_SPX_24jan2011.csv'), index=False)
df_SPX.to_pickle(os.path.join('..', 'data', 'df_SPX_24jan2011.pkl'))

Dt Calc: 2011-01-24 Spot: 1290.590000
1472 records processed
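
The two saved formats behave differently when read back. The sketch below illustrates this on a one-row frame written to a temporary directory: the pickle restores the Python date objects as stored, while the csv returns text unless the date columns are parsed explicitly. The column labels and the strike value are illustrative assumptions; only the trade date is taken from the example above.

```python
import datetime
import os
import tempfile

import pandas

# One illustrative row; column labels are assumed, the strike is made up.
frame = pandas.DataFrame({
    "TRADE_DATE": [datetime.date(2011, 1, 24)],
    "STRIKE": [1300.0],
})

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, "quotes.pkl")
    csv_path = os.path.join(tmp, "quotes.csv")
    frame.to_pickle(pkl_path)
    frame.to_csv(csv_path, index=False)

    from_pickle = pandas.read_pickle(pkl_path)
    from_csv_raw = pandas.read_csv(csv_path)
    from_csv = pandas.read_csv(csv_path, parse_dates=["TRADE_DATE"])

print(type(from_pickle["TRADE_DATE"][0]))   # date object, as stored
print(type(from_csv_raw["TRADE_DATE"][0]))  # plain string
print(type(from_csv["TRADE_DATE"][0]))      # pandas Timestamp
```

The pickle is the more convenient input for later notebooks, while the csv file is easier to inspect or share.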