This is converted from rst documentation http://statsmodels.sourceforge.net/devel/vector_ar.html first draft
:orphan:
.. currentmodule:: statsmodels.tsa.vector_ar.var_model
.. _var:
tsa.vector_ar
¶We are interested in modeling a :math:T \times K
multivariate time series
:math:Y
, where :math:T
denotes the number of observations and :math:K
the
number of variables. One way of estimating relationships between the time series
and their lagged values is the vector autoregression process:
.. math::
Y_t = A_1 Y_{t-1} + \ldots + A_p Y_{t-p} + u_t
u_t \sim {\sf Normal}(0, \Sigma_u)
where :math:A_i
is a :math:K \times K
coefficient matrix.
We follow in large part the methods and notation of Lutkepohl (2005) <http://www.springer.com/economics/econometrics/book/978-3-540-40172-8?otherVersion=978-3-540-26239-8>
__,
which we will not develop here.
Model fitting
.. note::
The classes referenced below are accessible via the
:mod:`statsmodels.tsa.api` module.
To estimate a VAR model, one must first create the model using an `ndarray` of
homogeneous or structured dtype. When using a structured or record array, the
class will use the passed variable names. Otherwise they can be passed
explicitly:
# some example data
import numpy as np
import pandas
import statsmodels.api as sm
VAR = sm.tsa.VAR
mdata = sm.datasets.macrodata.load_pandas().data
# prepare the dates index
dates = mdata[['year', 'quarter']].astype(int).astype(str)
# the following from the documentation doesn't work for me
#quarterly = dates["year"] + "Q" + dates["quarter"]
#from statsmodels.tsa.base.datetools import dates_from_str
#quarterly = dates_from_str(quarterly)
# hacking around, this works
q = (mdata[['year']].astype("<S4").values + "Q" +
mdata[['quarter']].values.astype(str)).astype('<S6').ravel()
from statsmodels.tsa.base.datetools import dates_from_str
quarterly = dates_from_str(q)
mdata = mdata[['realgdp','realcons','realinv']]
mdata.index = pandas.DatetimeIndex(quarterly)
data = np.log(mdata).diff().dropna()
# make a VAR model
model = VAR(data)
.. note::
The :class:VAR
class assumes that the passed time series are
stationary. Non-stationary or trending data can often be transformed to be
stationary by first-differencing or some other method. For direct analysis of
non-stationary time series, a standard stable VAR(p) model is not
appropriate.
To actually do the estimation, call the fit
method with the desired lag
order. Or you can have the model select a lag order based on a standard
information criterion (see below):
::
results = model.fit(2)
results.summary()
Summary of Regression Results ================================== Model: VAR Method: OLS Date: Sun, 15, Feb, 2015 Time: 20:51:22 -------------------------------------------------------------------- No. of Equations: 3.00000 BIC: -27.5830 Nobs: 200.000 HQIC: -27.7892 Log likelihood: 1962.57 FPE: 7.42129e-13 AIC: -27.9293 Det(Omega_mle): 6.69358e-13 -------------------------------------------------------------------- Results for equation realgdp ============================================================================== coefficient std. error t-stat prob ------------------------------------------------------------------------------ const 0.001527 0.001119 1.365 0.174 L1.realgdp -0.279435 0.169663 -1.647 0.101 L1.realcons 0.675016 0.131285 5.142 0.000 L1.realinv 0.033219 0.026194 1.268 0.206 L2.realgdp 0.008221 0.173522 0.047 0.962 L2.realcons 0.290458 0.145904 1.991 0.048 L2.realinv -0.007321 0.025786 -0.284 0.777 ============================================================================== Results for equation realcons ============================================================================== coefficient std. error t-stat prob ------------------------------------------------------------------------------ const 0.005460 0.000969 5.634 0.000 L1.realgdp -0.100468 0.146924 -0.684 0.495 L1.realcons 0.268640 0.113690 2.363 0.019 L1.realinv 0.025739 0.022683 1.135 0.258 L2.realgdp -0.123174 0.150267 -0.820 0.413 L2.realcons 0.232499 0.126350 1.840 0.067 L2.realinv 0.023504 0.022330 1.053 0.294 ============================================================================== Results for equation realinv ============================================================================== coefficient std. error t-stat prob ------------------------------------------------------------------------------ const -0.023903 0.005863 -4.077 0.000 L1.realgdp -1.970974 0.888892 -2.217 0.028 L1.realcons 4.414162 0.687825 6.418 0.000 L1.realinv 0.225479 0.137234 1.643 0.102 L2.realgdp 0.380786 0.909114 0.419 0.676 L2.realcons 0.800281 0.764416 1.047 0.296 L2.realinv -0.124079 0.135098 -0.918 0.360 ============================================================================== Correlation matrix of residuals realgdp realcons realinv realgdp 1.000000 0.603316 0.750722 realcons 0.603316 1.000000 0.131951 realinv 0.750722 0.131951 1.000000
Summary of Regression Results
==================================
Model: VAR
Method: OLS
Date: Fri, 08, Jul, 2011
Time: 11:30:22
--------------------------------------------------------------------
No. of Equations: 3.00000 BIC: -27.5830
Nobs: 200.000 HQIC: -27.7892
Log likelihood: 1962.57 FPE: 7.42129e-13
AIC: -27.9293 Det(Omega_mle): 6.69358e-13
--------------------------------------------------------------------
Results for equation realgdp
==============================================================================
coefficient std. error t-stat prob
------------------------------------------------------------------------------
const 0.001527 0.001119 1.365 0.174
L1.realgdp -0.279435 0.169663 -1.647 0.101
L1.realcons 0.675016 0.131285 5.142 0.000
L1.realinv 0.033219 0.026194 1.268 0.206
L2.realgdp 0.008221 0.173522 0.047 0.962
L2.realcons 0.290458 0.145904 1.991 0.048
L2.realinv -0.007321 0.025786 -0.284 0.777
==============================================================================
Results for equation realcons
==============================================================================
coefficient std. error t-stat prob
------------------------------------------------------------------------------
const 0.005460 0.000969 5.634 0.000
L1.realgdp -0.100468 0.146924 -0.684 0.495
L1.realcons 0.268640 0.113690 2.363 0.019
L1.realinv 0.025739 0.022683 1.135 0.258
L2.realgdp -0.123174 0.150267 -0.820 0.413
L2.realcons 0.232499 0.126350 1.840 0.067
L2.realinv 0.023504 0.022330 1.053 0.294
==============================================================================
Results for equation realinv
==============================================================================
coefficient std. error t-stat prob
------------------------------------------------------------------------------
const -0.023903 0.005863 -4.077 0.000
L1.realgdp -1.970974 0.888892 -2.217 0.028
L1.realcons 4.414162 0.687825 6.418 0.000
L1.realinv 0.225479 0.137234 1.643 0.102
L2.realgdp 0.380786 0.909114 0.419 0.676
L2.realcons 0.800281 0.764416 1.047 0.296
L2.realinv -0.124079 0.135098 -0.918 0.360
==============================================================================
Correlation matrix of residuals
realgdp realcons realinv
realgdp 1.000000 0.603316 0.750722
realcons 0.603316 1.000000 0.131951
realinv 0.750722 0.131951 1.000000
Several ways to visualize the data using matplotlib
are available.
Plotting input time series:
::
results.plot()
#.. plot:: plots/var_plot_input.py
Plotting time series autocorrelation function:
results.plot_acorr()
#.. plot:: plots/var_plot_acorr.py
Lag order selection
Choice of lag order can be a difficult problem. Standard analysis employs
likelihood test or information criteria-based order selection. We have
implemented the latter, accessable through the :class:`VAR` class:
model.select_order(15)
VAR Order Selection ====================================================== aic bic fpe hqic ------------------------------------------------------ 0 -27.70 -27.65 9.358e-13 -27.68 1 -28.02 -27.82* 6.745e-13 -27.94* 2 -28.03 -27.66 6.732e-13 -27.88 3 -28.04* -27.52 6.651e-13* -27.83 4 -28.03 -27.36 6.681e-13 -27.76 5 -28.02 -27.19 6.773e-13 -27.69 6 -27.97 -26.98 7.147e-13 -27.57 7 -27.93 -26.79 7.446e-13 -27.47 8 -27.94 -26.64 7.407e-13 -27.41 9 -27.96 -26.50 7.280e-13 -27.37 10 -27.91 -26.30 7.629e-13 -27.26 11 -27.86 -26.09 8.076e-13 -27.14 12 -27.83 -25.91 8.316e-13 -27.05 13 -27.80 -25.73 8.594e-13 -26.96 14 -27.80 -25.57 8.627e-13 -26.90 15 -27.81 -25.43 8.599e-13 -26.85 ====================================================== * Minimum
{'aic': 3, 'bic': 1, 'fpe': 3, 'hqic': 1}
VAR Order Selection
======================================================
aic bic fpe hqic
------------------------------------------------------
0 -27.70 -27.65 9.358e-13 -27.68
1 -28.02 -27.82* 6.745e-13 -27.94*
2 -28.03 -27.66 6.732e-13 -27.88
3 -28.04* -27.52 6.651e-13* -27.83
4 -28.03 -27.36 6.681e-13 -27.76
5 -28.02 -27.19 6.773e-13 -27.69
6 -27.97 -26.98 7.147e-13 -27.57
7 -27.93 -26.79 7.446e-13 -27.47
8 -27.94 -26.64 7.407e-13 -27.41
9 -27.96 -26.50 7.280e-13 -27.37
10 -27.91 -26.30 7.629e-13 -27.26
11 -27.86 -26.09 8.076e-13 -27.14
12 -27.83 -25.91 8.316e-13 -27.05
13 -27.80 -25.73 8.594e-13 -26.96
14 -27.80 -25.57 8.627e-13 -26.90
15 -27.81 -25.43 8.599e-13 -26.85
======================================================
* Minimum
{'aic': 3, 'bic': 1, 'fpe': 3, 'hqic': 1}
When calling the fit
function, one can pass a maximum number of lags and the
order criterion to use for order selection:
::
results = model.fit(maxlags=15, ic='aic')
Forecasting
The linear predictor is the optimal h-step ahead forecast in terms of
mean-squared error:
.. math::
y_t(h) = \nu + A_1 y_t(h − 1) + \cdots + A_p y_t(h − p)
We can use the `forecast` function to produce this forecast. Note that we have
to specify the "initial value" for the forecast:
::
lag_order = results.k_ar
results.forecast(data.values[-lag_order:], 5)
array([[ 0.00616044, 0.00500006, 0.00916198], [ 0.00427559, 0.00344836, -0.00238478], [ 0.00416634, 0.0070728 , -0.01193629], [ 0.00557873, 0.00642784, 0.00147152], [ 0.00626431, 0.00666715, 0.00379567]])
array([[ 0.00616044, 0.00500006, 0.00916198],
[ 0.00427559, 0.00344836, -0.00238478],
[ 0.00416634, 0.0070728 , -0.01193629],
[ 0.00557873, 0.00642784, 0.00147152],
[ 0.00626431, 0.00666715, 0.00379567]])
The forecast_interval
function will produce the above forecast along with
asymptotic standard errors. These can be visualized using the plot_forecast
function:
.. plot:: plots/var_plot_forecast.py
Impulse responses are of interest in econometric studies: they are the
estimated responses to a unit impulse in one of the variables. They are computed
in practice using the MA(:math:\infty
) representation of the VAR(p) process:
.. math::
Y_t = \mu + \sum_{i=0}^\infty \Phi_i u_{t-i}
We can perform an impulse response analysis by calling the irf
function on a
VARResults
object:
::
irf = results.irf(10)
These can be visualized using the plot
function, in either orthogonalized or
non-orthogonalized form. Asymptotic standard errors are plotted by default at
the 95% significance level, which can be modified by the user.
.. note::
Orthogonalization is done using the Cholesky decomposition of the estimated
error covariance matrix :math:`\hat \Sigma_u` and hence interpretations may
change depending on variable ordering.
::
irf.plot(orth=False)
#.. plot:: plots/var_plot_irf.py
Note the plot
function is flexible and can plot only variables of interest if
so desired:
irf.plot(impulse='realgdp')
The cumulative effects :math:\Psi_n = \sum_{i=0}^n \Phi_i
can be plotted with
the long run effects as follows:
irf.plot_cum_effects(orth=False)
#.. plot:: plots/var_plot_irf_cum.py
Forecast errors of component j on k in an i-step ahead forecast can be
decomposed using the orthogonalized impulse responses :math:\Theta_i
:
.. math::
\omega_{jk, i} = \sum_{i=0}^{h-1} (e_j^\prime \Theta_i e_k)^2 / \mathrm{MSE}_j(h)
\mathrm{MSE}_j(h) = \sum_{i=0}^{h-1} e_j^\prime \Phi_i \Sigma_u \Phi_i^\prime e_j
These are computed via the fevd
function up through a total number of steps ahead:
fevd = results.fevd(5)
fevd.summary()
FEVD for realgdp realgdp realcons realinv 0 1.000000 0.000000 0.000000 1 0.864889 0.129253 0.005858 2 0.816725 0.177898 0.005378 3 0.793647 0.197590 0.008763 4 0.777279 0.208127 0.014594 FEVD for realcons realgdp realcons realinv 0 0.359877 0.640123 0.000000 1 0.358767 0.635420 0.005813 2 0.348044 0.645138 0.006817 3 0.319913 0.653609 0.026478 4 0.317407 0.652180 0.030414 FEVD for realinv realgdp realcons realinv 0 0.577021 0.152783 0.270196 1 0.488158 0.293622 0.218220 2 0.478727 0.314398 0.206874 3 0.477182 0.315564 0.207254 4 0.466741 0.324135 0.209124
FEVD for realgdp
realgdp realcons realinv
0 1.000000 0.000000 0.000000
1 0.864889 0.129253 0.005858
2 0.816725 0.177898 0.005378
3 0.793647 0.197590 0.008763
4 0.777279 0.208127 0.014594
FEVD for realcons
realgdp realcons realinv
0 0.359877 0.640123 0.000000
1 0.358767 0.635420 0.005813
2 0.348044 0.645138 0.006817
3 0.319913 0.653609 0.026478
4 0.317407 0.652180 0.030414
FEVD for realinv
realgdp realcons realinv
0 0.577021 0.152783 0.270196
1 0.488158 0.293622 0.218220
2 0.478727 0.314398 0.206874
3 0.477182 0.315564 0.207254
4 0.466741 0.324135 0.209124
They can also be visualized through the returned :class:FEVD
object:
::
results.fevd(20).plot()
#.. plot:: plots/var_plot_fevd.py
A number of different methods are provided to carry out hypothesis tests about the model results and also the validity of the model assumptions (normality, whiteness / "iid-ness" of errors, etc.).
Granger causality
One is often interested in whether a variable or group of variables is "causal"
for another variable, for some definition of "causal". In the context of VAR
models, one can say that a set of variables are Granger-causal within one of the
VAR equations. We will not detail the mathematics or definition of Granger
causality, but leave it to the reader. The :class:`VARResults` object has the
`test_causality` method for performing either a Wald (:math:`\chi^2`) test or an
F-test.
results.test_causality('realgdp', ['realinv', 'realcons'], kind='f')
Granger causality f-test ============================================================= Test statistic Critical Value p-value df ------------------------------------------------------------- 6.999888 2.114554 0.000 (6, 567) ============================================================= H_0: ['realinv', 'realcons'] do not Granger-cause realgdp Conclusion: reject H_0 at 5.00% significance level
{'conclusion': 'reject', 'crit_value': 2.1145543864562706, 'df': (6, 567), 'pvalue': 3.3805963773890438e-07, 'signif': 0.05, 'statistic': 6.999887552254318}
Granger causality f-test
=============================================================
Test statistic Critical Value p-value df
-------------------------------------------------------------
6.999888 2.114554 0.000 (6, 567)
=============================================================
H_0: ['realinv', 'realcons'] do not Granger-cause realgdp
Conclusion: reject H_0 at 5.00% significance level
[88]:
{'conclusion': 'reject',
'crit_value': 2.1145543864562706,
'df': (6, 567),
'pvalue': 3.3805963773886478e-07,
'signif': 0.05,
'statistic': 6.9998875522543473}
Normality
Whiteness of residuals
.. note::
To use this functionality, `pandas <https://pypi.python.org/pypi/pandas>`__
must be installed. See the `pandas documentation
<http://pandas.pydata.org>`__ for more information on the below data
structures.
One is often interested in estimating a moving-window regression on time series data for the purposes of making forecasts throughout the data sample. For example, we may wish to produce the series of 2-step-ahead forecasts produced by a VAR(p) model estimated at each point in time.
::
data
<class 'pandas.core.frame.DataFrame'> DatetimeIndex: 202 entries, 1959-06-30 00:00:00 to 2009-09-30 00:00:00 Data columns (total 3 columns): realgdp 202 non-null values realcons 202 non-null values realinv 202 non-null values dtypes: float64(3)
<class 'pandas.core.frame.DataFrame'>
Index: 500 entries , 2000-01-03 00:00:00 to 2001-11-30 00:00:00
A 500 non-null values
B 500 non-null values
C 500 non-null values
D 500 non-null values
DynamicVAR = sm.tsa.DynamicVAR
var = DynamicVAR(data, lag_order=2, window_type='expanding')
The estimated coefficients for the dynamic model are returned as a
:class:pandas.WidePanel
object, which can allow you to easily examine, for
example, all of the model coefficients by equation or by date:
::
var.coefs
<class 'pandas.core.panel.Panel'> Dimensions: 7 (items) x 193 (major_axis) x 3 (minor_axis) Items axis: L1.realgdp to intercept Major_axis axis: 1961-09-30 00:00:00 to 2009-09-30 00:00:00 Minor_axis axis: realcons to realinv
<class 'pandas.core.panel.WidePanel'>
Dimensions: 9 (items) x 489 (major) x 4 (minor)
Items: L1.A to intercept
Major axis: 2000-01-18 00:00:00 to 2001-11-30 00:00:00
Minor axis: A to D
# all estimated coefficients for equation A
var.coefs.minor_xs('A').info()
--------------------------------------------------------------------------- KeyError Traceback (most recent call last) <ipython-input-18-9ec20e4cabe5> in <module>() ----> 1 var.coefs.minor_xs('A').info() C:\Programs\Python27\lib\site-packages\pandas-0.12.0-py2.7-win32.egg\pandas\core\panel.pyc in minor_xs(self, key, copy) 1027 index -> major axis, columns -> items 1028 """ -> 1029 return self.xs(key, axis=self._AXIS_LEN - 1, copy=copy) 1030 1031 def xs(self, key, axis=1, copy=True): C:\Programs\Python27\lib\site-packages\pandas-0.12.0-py2.7-win32.egg\pandas\core\panel.pyc in xs(self, key, axis, copy) 1052 self._consolidate_inplace() 1053 axis_number = self._get_axis_number(axis) -> 1054 new_data = self._data.xs(key, axis=axis_number, copy=copy) 1055 return self._constructor_sliced(new_data) 1056 C:\Programs\Python27\lib\site-packages\pandas-0.12.0-py2.7-win32.egg\pandas\core\internals.pyc in xs(self, key, axis, copy) 1560 % axis) 1561 -> 1562 loc = self.axes[axis].get_loc(key) 1563 slicer = [slice(None, None) for _ in range(self.ndim)] 1564 slicer[axis] = loc C:\Programs\Python27\lib\site-packages\pandas-0.12.0-py2.7-win32.egg\pandas\core\index.pyc in get_loc(self, key) 714 loc : int if unique index, possibly slice or mask if not 715 """ --> 716 return self._engine.get_loc(key) 717 718 def get_value(self, series, key): C:\Programs\Python27\lib\site-packages\pandas-0.12.0-py2.7-win32.egg\pandas\index.pyd in pandas.index.IndexEngine.get_loc (pandas\index.c:3542)() C:\Programs\Python27\lib\site-packages\pandas-0.12.0-py2.7-win32.egg\pandas\index.pyd in pandas.index.IndexEngine.get_loc (pandas\index.c:3422)() C:\Programs\Python27\lib\site-packages\pandas-0.12.0-py2.7-win32.egg\pandas\hashtable.pyd in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:10682)() C:\Programs\Python27\lib\site-packages\pandas-0.12.0-py2.7-win32.egg\pandas\hashtable.pyd in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:10636)() KeyError: 'A'
Index: 489 entries , 2000-01-18 00:00:00 to 2001-11-30 00:00:00
Data columns:
L1.A 489 non-null values
L1.B 489 non-null values
L1.C 489 non-null values
L1.D 489 non-null values
L2.A 489 non-null values
L2.B 489 non-null values
L2.C 489 non-null values
L2.D 489 non-null values
intercept 489 non-null values
dtype: float64(9)
# coefficients on 11/30/2001
var.coefs.major_xs(datetime(2001, 11, 30)).T A B C D L1.A 0.9567 -0.07389 0.0588 -0.02848 L1.B -0.00839 0.9757 -0.004945 0.005938 L1.C -0.01824 0.1214 0.8875 0.01431 L1.D 0.09964 0.02951 0.05275 1.037 L2.A 0.02481 0.07542 -0.04409 0.06073 L2.B 0.006359 0.01413 0.02667 0.004795 L2.C 0.02207 -0.1087 0.08282 -0.01921 L2.D -0.08795 -0.04297 -0.06505 -0.06814 intercept 0.07778 -0.283 -0.1009 -0.6426
# TODO: I have no idea
dir(var.coefs)
from datetime import datetime
var.coefs.major_axis
<class 'pandas.tseries.index.DatetimeIndex'> [1961-09-30 00:00:00, ..., 2009-09-30 00:00:00] Length: 193, Freq: Q-DEC, Timezone: None
Dynamic forecasts for a given number of steps ahead can be produced using the
forecast
function and return a :class:pandas.DataMatrix
object:
f = var.forecast(2)
f
<class 'pandas.core.frame.DataFrame'> DatetimeIndex: 191 entries, 1962-03-31 00:00:00 to 2009-09-30 00:00:00 Freq: Q-DEC Data columns (total 3 columns): realgdp 191 non-null values realcons 191 non-null values realinv 191 non-null values dtypes: float64(3)
f.head()
realgdp | realcons | realinv | |
---|---|---|---|
1962-03-31 | -8.796236 | 6.948442 | 1.026132 |
1962-06-30 | 5.705878 | -2.381689 | -0.702787 |
1962-09-30 | 11.204638 | -8.439173 | -1.720438 |
1962-12-31 | -2.240191 | 1.797411 | 0.306055 |
1963-03-31 | 5.554851 | -4.448416 | -0.702889 |
A B C D
<snip>
2001-11-23 00:00:00 -6.661 43.18 33.43 -23.71
2001-11-26 00:00:00 -5.942 43.58 34.04 -22.13
2001-11-27 00:00:00 -6.666 43.64 33.99 -22.85
2001-11-28 00:00:00 -6.521 44.2 35.34 -24.29
2001-11-29 00:00:00 -6.432 43.92 34.85 -26.68
2001-11-30 00:00:00 -5.445 41.98 34.87 -25.94
The forecasts can be visualized using plot_forecast
:
var.plot_forecast(2)
C:\Programs\Python27\lib\site-packages\matplotlib\legend.py:613: UserWarning: Legend does not support [<matplotlib.lines.Line2D object at 0x084695F0>] Use proxy artist instead. http://matplotlib.sourceforge.net/users/legend_guide.html#using-proxy-artist (str(orig_handle),)) C:\Programs\Python27\lib\site-packages\matplotlib\legend.py:613: UserWarning: Legend does not support [<matplotlib.lines.Line2D object at 0x084690D0>] Use proxy artist instead. http://matplotlib.sourceforge.net/users/legend_guide.html#using-proxy-artist (str(orig_handle),))
.. currentmodule:: statsmodels.tsa.vector_ar
.. autosummary:: :toctree: generated/
var_model.VAR var_model.VARProcess var_model.VARResults irf.IRAnalysis var_model.FEVD dynamic.DynamicVAR