%pylab inline
Populating the interactive namespace from numpy and matplotlib
Numpy is automatically imported to the top level namespace when you use %pylab
Numpy's main workhorse is the "array" data structure. You can think of arrays as vectors or matrices.
test = array(range(10))
print(test)
[0 1 2 3 4 5 6 7 8 9]
Don't be confused by the fact that the output looks like a Python list — arrays behave more like Matlab matrices. The main difference we care about is that operations are elementwise:
# example of elementwise operations
print(test + 1)
print(test * 2)
print(sqrt(test))
[ 1  2  3  4  5  6  7  8  9 10]
[ 0  2  4  6  8 10 12 14 16 18]
[ 0.          1.          1.41421356  1.73205081  2.          2.23606798
  2.44948974  2.64575131  2.82842712  3.        ]
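Arrays aren't limited to one dimension. A small sketch (not from the original notebook) of how 2-D arrays, broadcasting, and matrix products interact:

```python
import numpy as np

# arrays generalize to two dimensions (matrices)
A = np.array([[1, 2], [3, 4]])
v = np.array([10, 20])

print(A + v)     # broadcasting: v is added to each row of A
print(A * A)     # still elementwise, NOT matrix multiplication
print(A.dot(v))  # dot() (or the @ operator) gives the matrix product
```

Note that `*` stays elementwise even for 2-D arrays — one of the classic gotchas for Matlab users.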
That's most of what we want to cover! The main thing to keep in mind is that, underneath the covers, Numpy is doing the hard work in fast compiled code.
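To make "Numpy does the hard work" concrete, here's a tiny illustration (not from the original notebook): the same sum of squares written as a Python loop and as one vectorized call. Both give the same answer, but the vectorized version runs its loop in compiled code and is much faster on large arrays.

```python
import numpy as np

xs = np.arange(1000)

# a pure-Python loop over the array...
loop_total = 0
for x in xs:
    loop_total += x * x

# ...versus a single vectorized expression
vec_total = np.sum(xs * xs)

print(loop_total == vec_total)  # same result either way
```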
But let's do a somewhat common example to showcase the power of numpy (and scipy)
"""Load all the data!"""
import pandas as pd
import pandas.io.data as web  # note: in modern pandas this module moved to the separate pandas-datareader package
from datetime import datetime
tickers = ["AAPL", "GOOG", "MSFT"]
stocks = pd.DataFrame({'AAPL': web.DataReader('AAPL', "yahoo", datetime(1970,1,1),
                                              datetime(2013,10,1))['Adj Close'].resample('M', how='last')})
print(stocks)
# load everything into one nice dataframe
for stock in tickers:
    temp = web.DataReader(stock, "yahoo", datetime(1970,1,1),
                          datetime(2013,10,1))['Adj Close'].resample('M', how='last')
    stocks[stock] = temp
# put things into percent return
for stock in tickers:
    stocks[stock] = stocks[stock]/stocks[stock].shift(1) - 1.
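The shift-and-divide above is the definition of a simple percent return; pandas also ships `pct_change()`, which computes the same thing. A sketch on hypothetical prices (the values here are made up for illustration):

```python
import pandas as pd

# hypothetical price series, just to illustrate the transformation
prices = pd.Series([100., 110., 99.])

manual = prices / prices.shift(1) - 1.  # the loop body above
builtin = prices.pct_change()           # pandas' built-in equivalent

print(manual.dropna().tolist())  # [0.1..., -0.1...]
```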
# visualize!
stocks.cumsum().plot()
show()
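One caveat worth knowing: summing simple returns with `cumsum()` is only an approximation of compounded growth; the exact cumulative return compounds period by period via `cumprod()`. A hypothetical two-period example (values made up for illustration):

```python
import pandas as pd

returns = pd.Series([0.10, -0.10])  # hypothetical monthly returns

approx = returns.cumsum().iloc[-1]            # 0.10 - 0.10 = 0.0
exact = (1 + returns).cumprod().iloc[-1] - 1  # 1.10 * 0.90 - 1 = -0.01

print(approx, exact)  # up 10% then down 10% is a net loss, not break-even
```

For small monthly returns the two are close, which is why the quick `cumsum()` plot above is a reasonable eyeball check.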
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 350 entries, 1984-09-30 00:00:00 to 2013-10-31 00:00:00
Freq: M
Data columns (total 1 columns):
AAPL    350  non-null values
dtypes: float64(1)
# load things into numpy data types
V = matrix(stocks.cov().values) # covariance matrix
R = matrix(stocks.mean().values).T # expected returns
ones_col = np.ones((3,1))
ones_row = ones_col.T
print(V)
print()
print(R)
[[ 0.01798821  0.00597144  0.0060241 ]
 [ 0.00597144  0.01112027  0.00299114]
 [ 0.0060241   0.00299114  0.01150264]]

[[ 0.02412728]
 [ 0.02498674]
 [ 0.02419786]]
# And now let's calculate the optimum portfolio!
# The optimum portfolio is just the inverse of the covariance matrix times the expected returns,
# "normalized" so that the weights sum to 1
optimum_weights = (V.I * R) / (ones_row * V.I * R)  # V^-1 R scaled by 1^T V^-1 R
print(optimum_weights)
print(tickers)
[[ 0.07735049]
 [ 0.48577473]
 [ 0.43687478]]
['AAPL', 'GOOG', 'MSFT']
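The same calculation works with plain arrays, and `np.linalg.solve` is generally preferred over forming an explicit inverse. Plugging in the covariance matrix and expected returns printed above (rounded as shown) should reproduce the weights:

```python
import numpy as np

# values copied from the printed output above
V = np.array([[0.01798821, 0.00597144, 0.0060241 ],
              [0.00597144, 0.01112027, 0.00299114],
              [0.0060241 , 0.00299114, 0.01150264]])
R = np.array([0.02412728, 0.02498674, 0.02419786])

raw = np.linalg.solve(V, R)   # computes V^-1 R without inverting V
weights = raw / raw.sum()     # normalize so the weights sum to 1

print(weights)  # approximately [0.0774, 0.4858, 0.4369]
```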
Another foundational package is scipy. If you want to work with simulations and probabilities, scipy is a great place to start. Let's draw some random variables.
from scipy.stats import norm # import a normal distribution
from scipy.stats import chi2 # import a chi square distribution
normal_rvs = norm(0, 1).rvs(1000)
chi_rvs = chi2(10).rvs(1000)
hist(chi_rvs, bins=20)
show()
(array([  17.,   48.,   85.,  144.,  145.,  138.,  106.,   82.,   72.,
          56.,   40.,   30.,   15.,    5.,    6.,    1.,    5.,    3.,
           1.,    1.]),
 array([  1.36129869,   2.77380021,   4.18630174,   5.59880326,
          7.01130478,   8.4238063 ,   9.83630782,  11.24880934,
         12.66131086,  14.07381238,  15.4863139 ,  16.89881542,
         18.31131694,  19.72381846,  21.13631999,  22.54882151,
         23.96132303,  25.37382455,  26.78632607,  28.19882759,
         29.61132911]),
 <a list of 20 Patch objects>)
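Beyond sampling, these "frozen" distribution objects expose the usual density and distribution functions. A few deterministic checks (not from the original notebook) on the same two distributions:

```python
from scipy.stats import chi2, norm

standard = norm(0, 1)
print(standard.cdf(0))      # 0.5, by symmetry of the normal
print(standard.ppf(0.975))  # ~1.96, the familiar 95% critical value

ten_df = chi2(10)
# a chi-square with k degrees of freedom has mean k and variance 2k
print(ten_df.mean(), ten_df.var())
```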