# Linear regression

with statsmodels

Dataset: Auto MPG, the miles-per-gallon performance of various cars.

Available at https://www.kaggle.com/uciml/autompg-dataset

To predict:

• mpg: continuous

Predictors:

• cylinders: multi-valued discrete
• displacement: continuous
• horsepower: continuous
• weight: continuous
• acceleration: continuous
• origin: multi-valued discrete

Not taken into account:

• model year: multi-valued discrete
• car name: string (unique for each instance)
In [2]:
import pandas as pd
import statsmodels.formula.api as smf


In [3]:
lm = smf.ols(formula='mpg ~ cylinders + displacement + horsepower + weight + acceleration + origin', data=df).fit()
lm.summary()

Out[3]:
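| | | | |
|---|---|---|---|
| Dep. Variable: | mpg | R-squared: | 0.717 |
| Model: | OLS | Adj. R-squared: | 0.713 |
| Method: | Least Squares | F-statistic: | 165.5 |
| Date: | Sat, 22 Sep 2018 | Prob (F-statistic): | 4.84e-104 |
| Time: | 17:58:03 | Log-Likelihood: | -1131.1 |
| No. Observations: | 398 | AIC: | 2276. |
| Df Residuals: | 391 | BIC: | 2304. |
| Df Model: | 6 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 42.7111 | 2.693 | 15.861 | 0.000 | 37.417 | 48.005 |
| cylinders | -0.5256 | 0.404 | -1.302 | 0.194 | -1.320 | 0.268 |
| displacement | 0.0106 | 0.009 | 1.133 | 0.258 | -0.008 | 0.029 |
| horsepower | -0.0529 | 0.016 | -3.277 | 0.001 | -0.085 | -0.021 |
| weight | -0.0051 | 0.001 | -6.441 | 0.000 | -0.007 | -0.004 |
| acceleration | 0.0043 | 0.120 | 0.036 | 0.972 | -0.232 | 0.241 |
| origin | 1.4269 | 0.345 | 4.136 | 0.000 | 0.749 | 2.105 |

| | | | |
|---|---|---|---|
| Omnibus: | 32.659 | Durbin-Watson: | 0.886 |
| Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 43.338 |
| Skew: | 0.624 | Prob(JB): | 3.88e-10 |
| Kurtosis: | 4.028 | Cond. No. | 3.99e+04 |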

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.99e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
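To make the columns of the coefficient table easier to read, the short sketch below recomputes the t statistic and the 95% confidence interval from the printed `coef` and `std err` values. The critical value `T_CRIT` is an assumption added here (an approximate 97.5% quantile of Student's t with 391 residual degrees of freedom), and only three rows are used for illustration.

```python
# Approximate 97.5% quantile of Student's t with 391 degrees of freedom
# (assumption introduced for this illustration, not printed in the summary).
T_CRIT = 1.966

# (coef, std err) pairs copied from the summary output above.
reported = {
    "Intercept": (42.7111, 2.693),
    "cylinders": (-0.5256, 0.404),
    "origin": (1.4269, 0.345),
}

results = {}
for name, (coef, se) in reported.items():
    t_stat = coef / se                 # the "t" column
    low = coef - T_CRIT * se           # the "[0.025" column
    high = coef + T_CRIT * se          # the "0.975]" column
    results[name] = (t_stat, low, high)
    print(f"{name:>10}: t = {t_stat:7.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

The recomputed values agree with the table up to the rounding of the printed coefficients and standard errors. An interval that contains zero, as for `cylinders`, goes together with a large p-value.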
In [4]:
df.corr(numeric_only=True)

Out[4]:
| | Unnamed: 0 | mpg | cylinders | displacement | horsepower | weight | acceleration | model year | origin |
|---|---|---|---|---|---|---|---|---|---|
| Unnamed: 0 | 1.000000 | 0.585131 | -0.363040 | -0.386976 | -0.417861 | -0.318869 | 0.287634 | 0.996800 | 0.199702 |
| mpg | 0.585131 | 1.000000 | -0.775396 | -0.804203 | -0.771437 | -0.831741 | 0.420289 | 0.579267 | 0.563450 |
| cylinders | -0.363040 | -0.775396 | 1.000000 | 0.950721 | 0.838939 | 0.896017 | -0.505419 | -0.348746 | -0.562543 |
| displacement | -0.386976 | -0.804203 | 0.950721 | 1.000000 | 0.893646 | 0.932824 | -0.543684 | -0.370164 | -0.609409 |
| horsepower | -0.417861 | -0.771437 | 0.838939 | 0.893646 | 1.000000 | 0.860574 | -0.684259 | -0.411651 | -0.453669 |
| weight | -0.318869 | -0.831741 | 0.896017 | 0.932824 | 0.860574 | 1.000000 | -0.417457 | -0.306564 | -0.581024 |
| acceleration | 0.287634 | 0.420289 | -0.505419 | -0.543684 | -0.684259 | -0.417457 | 1.000000 | 0.288137 | 0.205873 |
| model year | 0.996800 | 0.579267 | -0.348746 | -0.370164 | -0.411651 | -0.306564 | 0.288137 | 1.000000 | 0.180662 |
| origin | 0.199702 | 0.563450 | -0.562543 | -0.609409 | -0.453669 | -0.581024 | 0.205873 | 0.180662 | 1.000000 |
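The strong correlations among cylinders, displacement, horsepower and weight are a likely source of the multicollinearity warning. One common way to quantify this, not used in the notebook itself, is the variance inflation factor (VIF). The sketch below is a minimal illustration: it takes the six predictor correlations printed above and relies on the fact that, for a model with an intercept, the VIF of a predictor is the corresponding diagonal entry of the inverse of the predictors' correlation matrix. The names `names`, `R` and `vif` are introduced here for illustration.

```python
import numpy as np

# Correlations among the six predictors of the fitted model,
# copied from the df.corr() output above.
names = ["cylinders", "displacement", "horsepower", "weight", "acceleration", "origin"]
R = np.array([
    [ 1.000000,  0.950721,  0.838939,  0.896017, -0.505419, -0.562543],
    [ 0.950721,  1.000000,  0.893646,  0.932824, -0.543684, -0.609409],
    [ 0.838939,  0.893646,  1.000000,  0.860574, -0.684259, -0.453669],
    [ 0.896017,  0.932824,  0.860574,  1.000000, -0.417457, -0.581024],
    [-0.505419, -0.543684, -0.684259, -0.417457,  1.000000,  0.205873],
    [-0.562543, -0.609409, -0.453669, -0.581024,  0.205873,  1.000000],
])

# VIF_j is the j-th diagonal entry of the inverse correlation matrix.
vif = dict(zip(names, np.diag(np.linalg.inv(R))))

# Print the predictors from most to least inflated.
for name, value in sorted(vif.items(), key=lambda kv: -kv[1]):
    print(f"{name:>12}: VIF = {value:.1f}")
```

A frequently quoted rule of thumb treats a VIF above 10 as a sign of problematic collinearity. Because displacement and cylinders correlate at 0.95, both are guaranteed to exceed that threshold here.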

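# Polynomial regression

Compare

$$\text{mpg} = \beta_0 + \beta_1 \times \text{horsepower} + \varepsilon$$

with

$$\text{mpg} = \beta_0 + \beta_1 \times \text{horsepower} + \beta_2 \times \text{horsepower}^2 + \varepsilon$$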
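One way to run this comparison with the statsmodels formula API is sketched below. It is only an illustration under stated assumptions. The data are synthetic: the `toy` frame and the coefficients used to generate it are made up so that the block runs without the CSV. Adjusted R², AIC and a nested F-test are one possible set of comparison criteria, not necessarily the ones intended here. With the real data, `toy` would be replaced by `df`.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the auto-mpg data (NOT the real dataset):
# mpg falls with horsepower along a curve, plus Gaussian noise.
rng = np.random.default_rng(0)
horsepower = rng.uniform(50, 230, size=300)
mpg = 57 - 0.47 * horsepower + 0.0012 * horsepower**2 + rng.normal(0, 4, size=300)
toy = pd.DataFrame({"mpg": mpg, "horsepower": horsepower})

# Model 1: straight line in horsepower.
linear = smf.ols("mpg ~ horsepower", data=toy).fit()

# Model 2: adds a squared term; I(...) makes the formula evaluate
# horsepower**2 as arithmetic instead of formula syntax.
quadratic = smf.ols("mpg ~ horsepower + I(horsepower**2)", data=toy).fit()

print("adjusted R2:", linear.rsquared_adj, quadratic.rsquared_adj)
print("AIC:        ", linear.aic, quadratic.aic)

# F-test of the nested models: does the squared term improve the fit?
f_value, p_value, df_diff = quadratic.compare_f_test(linear)
print("F-test:     ", f_value, p_value, df_diff)
```

A higher adjusted R², a lower AIC and a small F-test p-value would all point toward keeping the squared term.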