Linear regression

with statsmodels

Dataset: Auto MPG, the miles-per-gallon performance of various cars

Available at https://www.kaggle.com/uciml/autompg-dataset

To predict:

  • mpg: continuous

Predictor variables:

  • cylinders: multi-valued discrete
  • displacement: continuous
  • horsepower: continuous
  • weight: continuous
  • acceleration: continuous

Not taken into account:

  • model year: multi-valued discrete
  • car name: string (unique for each instance)

origin (multi-valued discrete) is also used as a predictor in the model below, where it is entered as a numeric code.
In [2]:
import pandas as pd
import statsmodels.formula.api as smf

# Load the Auto MPG data from a local CSV file
df = pd.read_csv('../data/autos_mpg.csv')
In [3]:
lm = smf.ols(formula='mpg ~ cylinders + displacement + horsepower + weight + acceleration + origin', data=df).fit()
lm.summary()
Out[3]:
OLS Regression Results

Dep. Variable:                   mpg    R-squared:                0.717
Model:                           OLS    Adj. R-squared:           0.713
Method:                Least Squares    F-statistic:              165.5
Date:               Sat, 22 Sep 2018    Prob (F-statistic):   4.84e-104
Time:                       17:58:03    Log-Likelihood:         -1131.1
No. Observations:                398    AIC:                      2276.
Df Residuals:                    391    BIC:                      2304.
Df Model:                          6
Covariance Type:           nonrobust

                    coef   std err         t     P>|t|    [0.025    0.975]
Intercept        42.7111     2.693    15.861     0.000    37.417    48.005
cylinders        -0.5256     0.404    -1.302     0.194    -1.320     0.268
displacement      0.0106     0.009     1.133     0.258    -0.008     0.029
horsepower       -0.0529     0.016    -3.277     0.001    -0.085    -0.021
weight           -0.0051     0.001    -6.441     0.000    -0.007    -0.004
acceleration      0.0043     0.120     0.036     0.972    -0.232     0.241
origin            1.4269     0.345     4.136     0.000     0.749     2.105

Omnibus:                      32.659    Durbin-Watson:            0.886
Prob(Omnibus):                 0.000    Jarque-Bera (JB):        43.338
Skew:                          0.624    Prob(JB):              3.88e-10
Kurtosis:                      4.028    Cond. No.              3.99e+04


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.99e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
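
The figures in a summary table can also be read directly from the fitted results object, which is convenient for filtering terms by p-value. The following is a minimal sketch on synthetic data: the columns x1, x2, y and the generating coefficients are invented for illustration and are not taken from the Auto MPG file. It relies on the `params`, `pvalues`, `rsquared` and `conf_int()` members of a statsmodels OLS result.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (hypothetical): y depends on x1 but not on x2.
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
demo["y"] = 3.0 + 2.0 * demo["x1"] + rng.normal(scale=0.5, size=n)

fit = smf.ols("y ~ x1 + x2", data=demo).fit()

# One row per term: coefficient, p-value and 95% confidence bounds.
ci = fit.conf_int()
table = pd.DataFrame({
    "coef": fit.params,
    "p_value": fit.pvalues,
    "ci_low": ci[0],
    "ci_high": ci[1],
})
print(table.round(4))
print("R-squared:", round(fit.rsquared, 3))

# Terms whose p-value falls below the usual 5% threshold.
significant = table.index[table["p_value"] < 0.05].tolist()
print("significant terms:", significant)
```

Applied to the `lm` object fitted above, the same attributes would give the coef, P>|t| and confidence-interval columns of the summary as pandas objects.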
In [4]:
df.corr()
Out[4]:
                 Unnamed: 0          mpg    cylinders displacement   horsepower       weight acceleration   model year       origin
Unnamed: 0         1.000000     0.585131    -0.363040    -0.386976    -0.417861    -0.318869     0.287634     0.996800     0.199702
mpg                0.585131     1.000000    -0.775396    -0.804203    -0.771437    -0.831741     0.420289     0.579267     0.563450
cylinders         -0.363040    -0.775396     1.000000     0.950721     0.838939     0.896017    -0.505419    -0.348746    -0.562543
displacement      -0.386976    -0.804203     0.950721     1.000000     0.893646     0.932824    -0.543684    -0.370164    -0.609409
horsepower        -0.417861    -0.771437     0.838939     0.893646     1.000000     0.860574    -0.684259    -0.411651    -0.453669
weight            -0.318869    -0.831741     0.896017     0.932824     0.860574     1.000000    -0.417457    -0.306564    -0.581024
acceleration       0.287634     0.420289    -0.505419    -0.543684    -0.684259    -0.417457     1.000000     0.288137     0.205873
model year         0.996800     0.579267    -0.348746    -0.370164    -0.411651    -0.306564     0.288137     1.000000     0.180662
origin             0.199702     0.563450    -0.562543    -0.609409    -0.453669    -0.581024     0.205873     0.180662     1.000000

Polynomial regression

Compare

mpg = β0 + β1 × horsepower + ε

with

mpg = β0 + β1 × horsepower + β2 × horsepower^2 + ε
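
One possible way to run this comparison with the formula API is sketched below. It uses synthetic data rather than the real file, so the coefficients and noise level are invented and the printed numbers say nothing about the actual cars. The squared term is written as `I(horsepower ** 2)` so that the formula parser treats `**` as arithmetic, and the two nested fits are compared with adjusted R², AIC and `compare_f_test`.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the real data (hypothetical coefficients):
# mpg falls with horsepower along a curve, plus noise.
rng = np.random.default_rng(42)
hp = rng.uniform(50, 230, size=400)
mpg = 57.0 - 0.47 * hp + 0.0012 * hp**2 + rng.normal(scale=4.0, size=400)
demo = pd.DataFrame({"horsepower": hp, "mpg": mpg})

# Model 1: straight line in horsepower.
linear = smf.ols("mpg ~ horsepower", data=demo).fit()
# Model 2: adds the squared term; I(...) protects ** from formula parsing.
quadratic = smf.ols("mpg ~ horsepower + I(horsepower ** 2)", data=demo).fit()

for name, fit in [("linear", linear), ("quadratic", quadratic)]:
    print(f"{name:9s} adj. R2 = {fit.rsquared_adj:.3f}  AIC = {fit.aic:.1f}")

# F-test of the nested models: does the squared term improve the fit?
f_value, p_value, df_diff = quadratic.compare_f_test(linear)
print(f"F = {f_value:.2f}, p = {p_value:.3g}, extra terms = {df_diff:.0f}")
```

With the notebook's own data, replacing `demo` by `df` in the two `smf.ols` calls should give the real comparison; a higher adjusted R², a lower AIC and a small F-test p-value would all favour the quadratic model.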