import numpy as np
import statsmodels.api as sm
Load data from Spector and Mazzeo (1980). Examples follow Greene's Econometric Analysis Ch. 21 (5th Edition).
spector_data = sm.datasets.spector.load()
spector_data.exog = sm.add_constant(spector_data.exog, prepend=False)
Inspect the data:
print(spector_data.exog[:5, :])
print(spector_data.endog[:5])
[[  2.66  20.     0.     1.  ]
 [  2.89  22.     0.     1.  ]
 [  3.28  24.     0.     1.  ]
 [  2.92  12.     0.     1.  ]
 [  4.    21.     0.     1.  ]]
[ 0.  0.  0.  0.  1.]
lpm_mod = sm.OLS(spector_data.endog, spector_data.exog)
lpm_res = lpm_mod.fit()
print('Parameters: ', lpm_res.params[:-1])
Parameters: [ 0.46385168 0.01049512 0.37855479]
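One motivation for the logit and probit models that follow is that the linear probability model's fitted values are not constrained to [0, 1]. A minimal numpy-only sketch on hypothetical data (not the Spector data) illustrating this:

```python
import numpy as np

# Hypothetical binary outcome with a strong linear predictor.
x = np.array([0., 1., 2., 3., 4., 5.])
y = np.array([0., 0., 0., 1., 1., 1.])
X = np.column_stack((x, np.ones_like(x)))  # slope column, then constant

# Ordinary least squares via lstsq (same estimates as sm.OLS would give)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
print(fitted.min(), fitted.max())  # the extremes fall outside [0, 1]
```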
logit_mod = sm.Logit(spector_data.endog, spector_data.exog)
logit_res = logit_mod.fit()
print('Parameters: ', logit_res.params)
print('Marginal effects: ', logit_res.margeff())
Optimization terminated successfully.
         Current function value: 12.889634
         Iterations 7
Parameters:  [  2.82611259   0.09515766   2.37868766 -13.02134686]
Marginal effects:  [ 0.36258083  0.01220841  0.3051777 ]
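Logit marginal effects can also be reproduced by hand: since P(y=1|x) = Λ(x'β), the derivative with respect to x_j is λ(x'β)·β_j, and averaging the logistic density over observations gives average marginal effects. A sketch with hypothetical values (toy β and X, not the Spector estimates):

```python
import numpy as np

# Hypothetical coefficients (slopes, then constant last, matching
# add_constant(prepend=False)) and a few hypothetical observations.
beta = np.array([2.8, 0.1, 2.4, -13.0])
X = np.array([[2.66, 20., 0., 1.],
              [3.92, 29., 0., 1.],
              [4.00, 21., 1., 1.]])

xb = X @ beta
pdf = np.exp(xb) / (1.0 + np.exp(xb)) ** 2  # logistic density, dΛ/d(x'β)

# Average marginal effect for each non-constant regressor:
# (1/n) * sum_i λ(x_i'β) * β_j  =  mean(pdf) * β_j
ame = pdf.mean() * beta[:-1]
print(ame)
```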
As in all the discrete data models presented below, we can print a nice summary of results:
print(logit_res.summary())
                           Logit Regression Results
==============================================================================
Dep. Variable:                      y   No. Observations:                   32
Model:                          Logit   Df Residuals:                       28
Method:                           MLE   Df Model:                            3
Date:                Sun, 26 Aug 2012   Pseudo R-squ.:                  0.3740
Time:                        20:48:40   Log-Likelihood:                -12.890
converged:                       True   LL-Null:                       -20.592
                                        LLR p-value:                  0.001502
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1             2.8261      1.263      2.238      0.025         0.351     5.301
x2             0.0952      0.142      0.672      0.501        -0.182     0.373
x3             2.3787      1.065      2.234      0.025         0.292     4.465
const        -13.0213      4.931     -2.641      0.008       -22.687    -3.356
==============================================================================
probit_mod = sm.Probit(spector_data.endog, spector_data.exog)
probit_res = probit_mod.fit()
print('Parameters: ', probit_res.params)
print('Marginal effects: ', probit_res.margeff())
Optimization terminated successfully.
         Current function value: 12.818804
         Iterations 6
Parameters:  [ 1.62581004  0.05172895  1.42633234 -7.45231965]
Marginal effects:  [ 0.36078629  0.01147926  0.31651986]
Load data from the American National Election Studies:
anes_data = sm.datasets.anes96.load()
anes_exog = anes_data.exog
anes_exog[:,0] = np.log(anes_exog[:,0] + .1)
anes_exog = np.column_stack((anes_exog[:,0],anes_exog[:,2],anes_exog[:,5:8]))
anes_exog = sm.add_constant(anes_exog, prepend=False)
Inspect the data:
print(anes_data.exog[:5, :])
print(anes_data.endog[:5])
[[ -2.30258509   7.           7.           1.           6.          36.
    3.           1.           1.        ]
 [  5.24755025   1.           3.           3.           5.          20.
    4.           1.           0.        ]
 [  3.43720782   7.           2.           2.           6.          24.
    6.           1.           0.        ]
 [  4.4200447    4.           3.           4.           5.          28.
    6.           1.           0.        ]
 [  6.46162441   7.           5.           6.           4.          68.
    6.           1.           0.        ]]
[ 6.  1.  1.  1.  0.]
Fit MNL model:
mlogit_mod = sm.MNLogit(anes_data.endog, anes_exog)
mlogit_res = mlogit_mod.fit()
print(mlogit_res.params)
Optimization terminated successfully.
         Current function value: 1461.922747
         Iterations 7
[[ -0.01153597  -0.08875065  -0.1059667   -0.0915567   -0.0932846   -0.14088069]
 [  0.29771435   0.39166864   0.57345051   1.27877179   1.34696165   2.07008014]
 [ -0.024945    -0.02289784  -0.01485121  -0.00868135  -0.01790407  -0.00943265]
 [  0.08249144   0.18104276  -0.00715242   0.19982796   0.21693885   0.3219257 ]
 [  0.00519655   0.04787398   0.05757516   0.08449838   0.08095841   0.10889408]
 [ -0.37340168  -2.25091318  -3.66558353  -7.61384309  -7.06047825 -12.1057509 ]]
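The fitted parameter matrix has one column per non-base outcome; predicted probabilities come from a softmax over the implied linear indices, with the base category's index fixed at zero. A hedged sketch with toy numbers (not the ANES estimates):

```python
import numpy as np

# Hypothetical MNLogit parameter matrix: 3 regressors (constant last),
# 2 non-base outcome categories -> 3 categories total.
params = np.array([[ 0.3, -0.1],
                   [-0.5,  0.2],
                   [ 1.0, -2.0]])
x = np.array([1.5, 2.0, 1.0])  # one observation, constant term = 1

# Linear index for each category; the base category is normalized to 0.
xb = np.concatenate(([0.0], x @ params))

# Softmax turns the linear indices into probabilities that sum to 1.
probs = np.exp(xb) / np.exp(xb).sum()
print(probs, probs.sum())
```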
Load the RAND Health Insurance Experiment data. Note that this example is similar to Cameron and Trivedi's Microeconometrics
Table 20.5, but it is slightly different because of minor changes in the data.
rand_data = sm.datasets.randhie.load()
rand_exog = rand_data.exog.view(float).reshape(len(rand_data.exog), -1)
rand_exog = sm.add_constant(rand_exog, prepend=False)
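The view/reshape step is needed because load() can return the exogenous data as a structured (record) array with one named field per column; reinterpreting its buffer as plain floats recovers an ordinary 2-D design matrix. A small illustration with a hypothetical two-field record array:

```python
import numpy as np

# Hypothetical structured array standing in for rand_data.exog: two rows,
# two named float fields.
rec = np.array([(1.0, 2.0), (3.0, 4.0)],
               dtype=[('a', float), ('b', float)])

# view(float) reinterprets the underlying buffer as plain floats;
# reshape restores the (n_rows, n_columns) layout.
plain = rec.view(float).reshape(len(rec), -1)
print(plain)
```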
Fit Poisson model:
poisson_mod = sm.Poisson(rand_data.endog, rand_exog)
poisson_res = poisson_mod.fit(method="newton")
print(poisson_res.summary())
Optimization terminated successfully.
         Current function value: 62419.588564
         Iterations 12
                          Poisson Regression Results
==============================================================================
Dep. Variable:                      y   No. Observations:                20190
Model:                        Poisson   Df Residuals:                    20180
Method:                           MLE   Df Model:                            9
Date:                Sun, 26 Aug 2012   Pseudo R-squ.:                 0.06343
Time:                        20:48:46   Log-Likelihood:                -62420.
converged:                       True   LL-Null:                       -66647.
                                        LLR p-value:                     0.000
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1            -0.0525      0.003    -18.216      0.000        -0.058    -0.047
x2            -0.2471      0.011    -23.272      0.000        -0.268    -0.226
x3             0.0353      0.002     19.302      0.000         0.032     0.039
x4            -0.0346      0.002    -21.439      0.000        -0.038    -0.031
x5             0.2717      0.012     22.200      0.000         0.248     0.296
x6             0.0339      0.001     60.098      0.000         0.033     0.035
x7            -0.0126      0.009     -1.366      0.172        -0.031     0.005
x8             0.0541      0.015      3.531      0.000         0.024     0.084
x9             0.2061      0.026      7.843      0.000         0.155     0.258
const          0.7004      0.011     62.741      0.000         0.678     0.722
==============================================================================
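Because the Poisson conditional mean is E[y|x] = exp(x'β), each coefficient acts on the log of the expected count, and exp(b_j) gives the multiplicative effect of a one-unit change in x_j. A quick sketch using the x5 estimate from the table above:

```python
import numpy as np

# exp(b_j) converts a Poisson coefficient into a rate ratio.
b5 = 0.2717                 # x5 coefficient from the summary above
rate_ratio = np.exp(b5)
print(rate_ratio)           # roughly 1.31: ~31% higher expected count
```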
The default solver for fitting discrete data MLE models is Newton-Raphson. Other solvers can be selected with the method
argument:
mlogit_res = mlogit_mod.fit(method='bfgs', maxiter=100)
print(mlogit_res.summary())
Optimization terminated successfully.
         Current function value: 1461.922747
         Iterations: 54
         Function evaluations: 98
         Gradient evaluations: 89
                          MNLogit Regression Results
==============================================================================
Dep. Variable:                      y   No. Observations:                  944
Model:                        MNLogit   Df Residuals:                      908
Method:                           MLE   Df Model:                           30
Date:                Sun, 26 Aug 2012   Pseudo R-squ.:                  0.1648
Time:                        20:48:54   Log-Likelihood:                -1461.9
converged:                       True   LL-Null:                       -1750.3
                                        LLR p-value:                1.822e-102
==============================================================================
       y=1       coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1           -0.0115      0.034     -0.336      0.736        -0.079     0.056
x2            0.2977      0.094      3.180      0.001         0.114     0.481
x3           -0.0249      0.007     -3.823      0.000        -0.038    -0.012
x4            0.0825      0.074      1.121      0.262        -0.062     0.227
x5            0.0052      0.018      0.295      0.768        -0.029     0.040
const        -0.3734      0.630     -0.593      0.553        -1.608     0.861
------------------------------------------------------------------------------
       y=2       coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1           -0.0888      0.039     -2.266      0.023        -0.166    -0.012
x2            0.3917      0.108      3.619      0.000         0.180     0.604
x3           -0.0229      0.008     -2.893      0.004        -0.038    -0.007
x4            0.1810      0.085      2.123      0.034         0.014     0.348
x5            0.0479      0.022      2.149      0.032         0.004     0.092
const        -2.2509      0.763     -2.949      0.003        -3.747    -0.755
------------------------------------------------------------------------------
       y=3       coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1           -0.1060      0.057     -1.858      0.063        -0.218     0.006
x2            0.5735      0.159      3.617      0.000         0.263     0.884
x3           -0.0149      0.011     -1.311      0.190        -0.037     0.007
x4           -0.0072      0.126     -0.057      0.955        -0.255     0.240
x5            0.0576      0.034      1.713      0.087        -0.008     0.123
const        -3.6656      1.157     -3.169      0.002        -5.932    -1.399
------------------------------------------------------------------------------
       y=4       coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1           -0.0916      0.044     -2.091      0.037        -0.177    -0.006
x2            1.2788      0.129      9.921      0.000         1.026     1.531
x3           -0.0087      0.008     -1.031      0.302        -0.025     0.008
x4            0.1998      0.094      2.123      0.034         0.015     0.384
x5            0.0845      0.026      3.226      0.001         0.033     0.136
const        -7.6138      0.958     -7.951      0.000        -9.491    -5.737
------------------------------------------------------------------------------
       y=5       coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1           -0.0933      0.039     -2.371      0.018        -0.170    -0.016
x2            1.3470      0.117     11.494      0.000         1.117     1.577
x3           -0.0179      0.008     -2.352      0.019        -0.033    -0.003
x4            0.2169      0.085      2.552      0.011         0.050     0.384
x5            0.0810      0.023      3.524      0.000         0.036     0.126
const        -7.0605      0.844     -8.362      0.000        -8.715    -5.406
------------------------------------------------------------------------------
       y=6       coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1           -0.1409      0.042     -3.343      0.001        -0.223    -0.058
x2            2.0701      0.143     14.435      0.000         1.789     2.351
x3           -0.0094      0.008     -1.160      0.246        -0.025     0.007
x4            0.3219      0.091      3.534      0.000         0.143     0.500
x5            0.1089      0.025      4.304      0.000         0.059     0.158
const       -12.1058      1.060    -11.421      0.000       -14.183   -10.028
==============================================================================