Backtesting with Zipline:

  • Zipline is the engine that powers both backtesting and live trading on Quantopian.
  • The open source portion of Zipline supports nearly all the features that are available on the Quantopian platform.
    • The major exceptions are features that depend on proprietary data.
  • Zipline supplies functions similar to pandas.io.data.get_data_yahoo that create data sources in the format it expects. (The main differences are slightly different key names and dates expressed in UTC.)
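The two differences mentioned in the last bullet can be sketched in plain Python, without Zipline installed. This is only an illustration: the `to_zipline_bar` helper, the `YAHOO_TO_ZIPLINE` table, and the sample row are hypothetical, and mapping `Adj Close` to `price` is an assumption based on the six `open` to `price` fields that appear in the loader output further down.

```python
from datetime import datetime, timezone

# Hypothetical mapping from Yahoo-style column names to lowercase
# field names (an assumption for illustration, not copied from Zipline).
YAHOO_TO_ZIPLINE = {
    'Open': 'open',
    'High': 'high',
    'Low': 'low',
    'Close': 'close',
    'Volume': 'volume',
    'Adj Close': 'price',
}


def to_zipline_bar(date, yahoo_row):
    """Rename the keys of one daily bar and attach a UTC timestamp."""
    bar = {YAHOO_TO_ZIPLINE[key]: value for key, value in yahoo_row.items()}
    # The naive date is interpreted as midnight UTC.
    bar['dt'] = datetime(date.year, date.month, date.day, tzinfo=timezone.utc)
    return bar


# Made-up numbers, for illustration only.
sample = {'Open': 10.0, 'High': 11.0, 'Low': 9.5,
          'Close': 10.5, 'Volume': 1000, 'Adj Close': 10.4}
bar = to_zipline_bar(datetime(2008, 1, 2), sample)
```

The resulting `bar` has lowercase keys plus a timezone-aware `dt`, which mirrors the shape of the data loaded below.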
In [13]:
%matplotlib inline
In [14]:
import pandas as pd
import zipline
from zipline import TradingAlgorithm
from zipline.data.loader import load_bars_from_yahoo
In [15]:
# Uncomment and run these lines to create a cache file of benchmarks 
# and historical treasury rates in your ~/.zipline directory.
# You should only have to do this once unless you delete the cache.

# zipline.data.loader.dump_treasury_curves()
# zipline.data.loader.dump_benchmarks('SPY')
In [16]:
start = pd.Timestamp('2008-01-01', tz='UTC')
end = pd.Timestamp('2013-01-01', tz='UTC')

input_data = load_bars_from_yahoo(
    stocks=['AAPL', 'MSFT'],
    start=start,
    end=end,
)
input_data
AAPL
MSFT

Out[16]:
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 1259 (major_axis) x 6 (minor_axis)
Items axis: AAPL to MSFT
Major_axis axis: 2008-01-02 00:00:00+00:00 to 2012-12-31 00:00:00+00:00
Minor_axis axis: open to price
In [17]:
input_data.loc[:,:,'price'].plot()
Out[17]:
<matplotlib.axes.AxesSubplot at 0x116285d10>
In [18]:
volumes = input_data.loc[:,:,'volume']
volumes.plot()
Out[18]:
<matplotlib.axes.AxesSubplot at 0x116569a50>
In [19]:
# Quarterly volumes.  Resampling is awesome!
volumes.resample('1Q', how='sum').plot(kind='bar', stacked=True)
Out[19]:
<matplotlib.axes.AxesSubplot at 0x11610c510>
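For readers wondering what the quarterly resample does under the hood, here is a rough plain-Python equivalent of "sum by calendar quarter". The `quarterly_totals` helper and the sample volumes are hypothetical; pandas does the same bucketing far more efficiently and labels each bucket by its quarter-end date.

```python
from collections import defaultdict
from datetime import date


def quarterly_totals(daily_volumes):
    """Sum (date, volume) pairs into {(year, quarter): total}."""
    totals = defaultdict(int)
    for day, volume in daily_volumes:
        quarter = (day.month - 1) // 3 + 1  # Jan-Mar -> 1, Apr-Jun -> 2, ...
        totals[(day.year, quarter)] += volume
    return dict(totals)


# Made-up volumes, for illustration only.
daily = [
    (date(2008, 1, 2), 100),
    (date(2008, 3, 31), 50),
    (date(2008, 4, 1), 70),
]
totals = quarterly_totals(daily)
```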
In [20]:
# A very simple example algo, using the TradingAlgorithm subclass interface.
class BuyAndHoldAlgorithm(TradingAlgorithm):
    
    def initialize(self):
        self.has_ordered = False
    
    def handle_data(self, data):
        """
        Buy 100 shares of every stock in our universe at the start of 
        the simulation.
        """
        if not self.has_ordered:
            for stock in data:
                self.order(stock, 100)
            self.has_ordered = True
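To make the control flow concrete, here is a toy stand-in for the subclass interface: `initialize` runs once, then `handle_data` runs once per bar. `ToyAlgorithm` is hypothetical and is not Zipline's implementation; it only records orders and does not simulate fills, commissions, or slippage. The prices are made up.

```python
class ToyAlgorithm:
    """Hypothetical stand-in for TradingAlgorithm; not Zipline code."""

    def __init__(self):
        self.orders = []
        self.initialize()

    def order(self, stock, amount):
        # Record the order instead of simulating a fill.
        self.orders.append((stock, amount))

    def run(self, bars):
        # One handle_data call per bar, in chronological order.
        for data in bars:
            self.handle_data(data)
        return self.orders


class ToyBuyAndHold(ToyAlgorithm):

    def initialize(self):
        self.has_ordered = False

    def handle_data(self, data):
        # Same logic as BuyAndHoldAlgorithm: order once, then do nothing.
        if not self.has_ordered:
            for stock in data:
                self.order(stock, 100)
            self.has_ordered = True


# Two made-up daily bars mapping symbol -> price.
bars = [
    {'AAPL': 26.62, 'MSFT': 30.13},
    {'AAPL': 26.70, 'MSFT': 30.20},
]
orders = ToyBuyAndHold().run(bars)
```

Because of the `has_ordered` flag, the second bar produces no further orders.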
In [21]:
my_algo = BuyAndHoldAlgorithm()
results = my_algo.run(input_data)
[2014-09-19 20:31] INFO: Performance: Simulated 1259 trading days out of 1259.
[2014-09-19 20:31] INFO: Performance: first open: 2008-01-02 14:31:00+00:00
[2014-09-19 20:31] INFO: Performance: last close: 2012-12-31 21:00:00+00:00

In [22]:
# Results has very fine-grained info on what your algorithm did.
# These are the raw values that we used to create our displays
# on Quantopian.
list(results.columns)
Out[22]:
['capital_used',
 'ending_cash',
 'ending_value',
 'orders',
 'period_close',
 'period_open',
 'pnl',
 'portfolio_value',
 'positions',
 'returns',
 'starting_cash',
 'starting_value',
 'transactions']
In [23]:
# My algo's positions, on days 0 and 1.
list(results.positions[[0,1]])
Out[23]:
[[],
 [{'amount': 100,
   'cost_basis': 26.6500000000006,
   'last_sale_price': 26.62,
   'sid': 'AAPL'},
  {'amount': 100,
   'cost_basis': 30.160000000012246,
   'last_sale_price': 30.13,
   'sid': 'MSFT'}]]
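In both positions the cost_basis sits about $0.03 above last_sale_price. One possible reading, offered as an assumption rather than something this notebook confirms, is that the fill happened near the last sale price and a per-share commission of roughly $0.03 is folded into the cost basis, with the tiny remainder coming from slippage or floating-point error. The arithmetic under that assumption:

```python
# Assumption: a flat per-share commission of $0.03 is added to the
# cost basis. This value is inferred from the output, not stated in it.
ASSUMED_COMMISSION_PER_SHARE = 0.03


def cost_basis_per_share(fill_price, commission_per_share):
    """Per-share cost basis when commission is added to the fill price."""
    return fill_price + commission_per_share


# Fill prices are assumed equal to the last_sale_price values above.
aapl = cost_basis_per_share(26.62, ASSUMED_COMMISSION_PER_SHARE)
msft = cost_basis_per_share(30.13, ASSUMED_COMMISSION_PER_SHARE)
```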
In [24]:
results.portfolio_value.plot()
Out[24]:
<matplotlib.axes.AxesSubplot at 0x11657b1d0>
In [27]:
%%zipline --symbols=AAPL --start=2009-01-01 --end=2013-01-01 -o outvar
# This is an IPython cell magic.  It's essentially a way to pass the contents
# of a cell into another program.  The %%zipline cell magic runs a simulation using
# the initialize and handle_data functions defined in the cell, binding its output
# to the name passed to the -o flag.

# Unlike on Quantopian, you need to import magic functions into your namespace.
from zipline.api import (
    add_history,
    history,
    order_target,
    record,
    symbol,
)

def initialize(context):
    # Register 2 histories that track daily prices,
    # one with a 20-day window and one with an 80-day window
    add_history(20, '1d', 'price')
    add_history(80, '1d', 'price')

    context.i = 0


def handle_data(context, data):
    # Skip the first 80 days to get full windows
    context.i += 1
    if context.i < 80:
        return

    # Compute averages
    # history() has to be called with the same params
    # from above and returns a DataFrame with a DatetimeIndex
    # and columns given by the securities in the backtest.
    short_mavg = history(20, '1d', 'price').mean()
    long_mavg = history(80, '1d', 'price').mean()

    # Trading logic
    if short_mavg['AAPL'] > long_mavg['AAPL']:
        # order_target orders as many shares as needed to
        # achieve the desired number of shares.
        order_target('AAPL', 100)
        
    elif short_mavg['AAPL'] < long_mavg['AAPL']:
        order_target('AAPL', 0)

    # Save values for later inspection
    record(AAPL=data['AAPL'].price,
           short_mavg=short_mavg['AAPL'],
           long_mavg=long_mavg['AAPL'])
[2014-09-19 20:34] INFO: Performance: Simulated 1006 trading days out of 1006.
[2014-09-19 20:34] INFO: Performance: first open: 2009-01-02 14:31:00+00:00
[2014-09-19 20:34] INFO: Performance: last close: 2012-12-31 21:00:00+00:00

AAPL
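The trading rule in the cell above can be sketched without Zipline. This is a simplified illustration: `crossover_trades` and `order_target_delta` are hypothetical helpers, the prices are made up, and positions change instantly here, whereas a real simulation fills orders on a later bar and applies commission and slippage.

```python
from collections import deque


def order_target_delta(current_shares, target_shares):
    """Shares to trade so the position ends at target_shares."""
    return target_shares - current_shares


def crossover_trades(prices, short_window, long_window, target=100):
    """Return (bar_index, share_delta) for each position change."""
    window = deque(maxlen=long_window)
    position = 0
    trades = []
    for i, price in enumerate(prices):
        window.append(price)
        if len(window) < long_window:
            continue  # skip bars until the long window is full
        recent = list(window)
        short_mavg = sum(recent[-short_window:]) / short_window
        long_mavg = sum(recent) / long_window
        if short_mavg > long_mavg:
            desired = target      # go long
        elif short_mavg < long_mavg:
            desired = 0           # exit
        else:
            desired = position    # averages equal: hold
        delta = order_target_delta(position, desired)
        if delta:
            trades.append((i, delta))
            position = desired
    return trades


# Made-up prices: a rise followed by a fall.
prices = [1, 2, 3, 4, 5, 4, 3, 2, 1]
trades = crossover_trades(prices, short_window=2, long_window=4)
```

With these sample prices the sketch buys 100 shares once the long window fills (bar 3) and sells them when the short average drops below the long one (bar 6). Targeting a share count rather than ordering a fixed amount is what keeps repeated signals from stacking up positions.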

In [26]:
output = outvar.dropna(how='any')

import matplotlib.pyplot as plt
fig = plt.figure()
# Use integer subplot positions; newer matplotlib versions reject the string form.
aapl_subplot = fig.add_subplot(211, xlabel='Date', ylabel='Price')
position_value_subplot = fig.add_subplot(212, xlabel='Date', ylabel='Value')

output['AAPL'].plot(ax=aapl_subplot)
output['portfolio_value'].plot(ax=position_value_subplot)

plt.gcf().set_size_inches(14, 10)