Things you may want to use, so you don't need to reinvent the wheel.
try:
    %matplotlib inline  # IPython magic; the preferred form on modern IPython
except:
    %pylab inline  # fallback for very old IPython versions
import matplotlib.pyplot as plt
scipy.io: load and save .mat files (from MATLAB)
scipy.linalg: linear algebra routines
scipy.sparse: sparse matrices
scipy.ndimage: n-dimensional image processing
scipy.spatial: Delaunay triangulation, convex hulls, Voronoi diagrams
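As a quick taste of scipy.io, a .mat round trip might look like the sketch below (the filename is arbitrary):

```python
import numpy as np
import scipy.io

# Save a dict of arrays to a MATLAB-style .mat file, then load it back.
a = np.arange(6).reshape(2, 3)
scipy.io.savemat('demo.mat', {'a': a})
loaded = scipy.io.loadmat('demo.mat')
print(loaded['a'])  # the 2x3 array round-trips
```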
# when you import scipy, you do NOT import all the subpackages
import scipy
# scipy.i<TAB> in IPython tab-completes the matching names
scipy.linspace(10, 11, num=10)  # a NumPy re-export; prefer np.linspace in new code
array([ 10. , 10.11111111, 10.22222222, 10.33333333, 10.44444444, 10.55555556, 10.66666667, 10.77777778, 10.88888889, 11. ])
# you need to import a subpackage to have access
scipy.io  # AttributeError until the subpackage has been imported
import scipy.io as io
io?
## note about namespaces
# import scipy.linalg would also work; "from scipy import" binds just the short name
from scipy import linalg
# linalg.<TAB> tab-completes the contents of the subpackage
from scipy import interpolate
interpolate
Example showing how to use B-splines in scipy.signal to do interpolation.
The input points must be equally spaced to use these routines.
from numpy import r_, sin
from scipy.signal import cspline1d, cspline1d_eval
x = r_[0:10]  # MATLAB-like range creation
print(x)
dx = x[1]-x[0]
newx = r_[-3:13:0.1]  # note: extends outside the original domain
y = sin(x)
## find cubic spline coefficients for a 1D signal
cj = cspline1d(y)
## Evaluate spline on a new set of points
### dx is old sampling space, x0 was old sample origin
newy = cspline1d_eval(cj, newx, dx=dx, x0=x[0])
plt.plot(newx, newy, x, y, 'o')
[0 1 2 3 4 5 6 7 8 9]
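If your samples are not equally spaced, scipy.interpolate offers splines without that restriction; a minimal sketch using CubicSpline (added in SciPy 0.18):

```python
import numpy as np
from scipy.interpolate import CubicSpline

# CubicSpline, unlike cspline1d, accepts irregularly spaced sample points.
x = np.array([0.0, 0.5, 1.3, 2.0, 3.1, 4.0])
y = np.sin(x)
cs = CubicSpline(x, y)
newx = np.linspace(0, 4, 50)
newy = cs(newx)  # interpolated values; exact at the original knots
```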
## example r_
r_[1:2:.1]
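For reference, np.r_ slices map onto the more familiar array constructors; a small sketch:

```python
import numpy as np

# A real step behaves like np.arange (endpoint excluded);
# an imaginary "step" behaves like np.linspace (endpoint included).
a = np.r_[1:2:0.1]
b = np.r_[1:2:11j]
```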
import numpy as np
from sklearn import mixture
n_samples = 300
# generate random sample, two components
np.random.seed(0)
C = np.array([[0., -0.7], [3.5, .7]])
X_train = np.r_[np.dot(np.random.randn(n_samples, 2), C),
np.random.randn(n_samples, 2) + np.array([20, 20])]
clf = mixture.GaussianMixture(n_components=2, covariance_type='full')  # mixture.GMM was renamed GaussianMixture in scikit-learn 0.18
clf.fit(X_train)
x = np.linspace(-20.0, 30.0)
y = np.linspace(-20.0, 40.0)
X, Y = np.meshgrid(x, y)
XX = np.c_[X.ravel(), Y.ravel()]
Z = -clf.score_samples(XX)  # negative log-likelihood; the old GMM API returned a (logprob, responsibilities) tuple
Z = Z.reshape(X.shape)
CS = plt.contour(X, Y, Z)
CB = plt.colorbar(CS, shrink=0.8, extend='both')
plt.scatter(X_train[:, 0], X_train[:, 1], s=0.8)
plt.axis('tight')
import numpy as np
C = np.array([[0., -0.7], [3.5, .7]])
n_samples = 300
np.dot(np.random.randn(n_samples, 2), C)  # first component: a correlated Gaussian blob
np.random.randn(n_samples, 2) + np.array([20, 20])  # second component: a blob shifted to (20, 20)
X_train = np.r_[np.dot(np.random.randn(n_samples, 2), C),
np.random.randn(n_samples, 2) + np.array([20, 20])]
plt.plot(X_train[:,0], X_train[:,1], 'o')
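After fitting, the estimated component parameters can be checked against the generating process. A sketch with the current scikit-learn API (GaussianMixture replaced mixture.GMM in 0.18):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Regenerate the two-component sample and inspect the fitted means;
# one component was centred at the origin, the other at (20, 20).
np.random.seed(0)
C = np.array([[0., -0.7], [3.5, .7]])
X_train = np.r_[np.dot(np.random.randn(300, 2), C),
                np.random.randn(300, 2) + np.array([20, 20])]
clf = GaussianMixture(n_components=2, covariance_type='full').fit(X_train)
print(clf.means_.round(1))  # rows near (0, 0) and (20, 20)
```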
Image processing in Python
OpenCV: computer vision in C/C++ with Python wrappers
http://docs.opencv.org/trunk/doc/py_tutorials/py_tutorials.html
Use a Sobel filter on a greyscale image
##Basic scikit-image example
from skimage import data, io, filters  # "filter" was renamed "filters" in scikit-image 0.11
import matplotlib.pyplot as plt
image = data.coins()  # or any NumPy array!
edges = filters.sobel(image)
plt.figure(figsize=(10,10))
plt.subplot(121)
io.imshow(image)
plt.subplot(122)
io.imshow(edges)
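Along the same lines, the filters module also includes automatic thresholding; a small sketch using Otsu's method (assumes a scikit-image version where the module is named filters):

```python
from skimage import data, filters

# Otsu's method picks a single global threshold separating the bright
# coins from the darker background.
image = data.coins()
thresh = filters.threshold_otsu(image)
binary = image > thresh  # boolean foreground mask
```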
http://statsmodels.sourceforge.net/
Link to some great notebooks
https://github.com/statsmodels/statsmodels/tree/master/examples/notebooks
# statsmodels simple example
# Regression plots and loading datasets from R
http://statsmodels.sourceforge.net/devel/examples/notebooks/generated/regression_plots.html
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.formula.api import ols
We can use a utility function to load any R dataset available from the great Rdatasets package.
prestige = sm.datasets.get_rdataset("Duncan", "car", cache=True).data
Description
The **Duncan** data frame has 45 rows and 4 columns. Data on the
prestige and other characteristics of 45 U. S. occupations in 1950.
Format
~~~~~~
This data frame contains the following columns:
type
Type of occupation. A factor with the following levels:
* prof - professional and managerial
* wc - white-collar
* bc - blue-collar.
income
Percent of males in occupation earning $3500 or more in 1950.
education
Percent of males in occupation in 1950 who were high-school
graduates.
prestige
Percent of raters in NORC study rating occupation as excellent or
good in prestige.
Source
~~~~~~
Duncan, O. D. (1961) A socioeconomic index for all occupations. In
Reiss, A. J., Jr. (Ed.) *Occupations and Social Status.* Free Press
[Table VI-1].
References
~~~~~~~~~~
Fox, J. (2008) *Applied Regression Analysis and Generalized Linear
Models*, Second Edition. Sage.
Fox, J. and Weisberg, S. (2011) *An R Companion to Applied Regression*,
Second Edition, Sage.
## remember pandas, this should look familiar
prestige.head()
| | type | income | education | prestige |
|---|---|---|---|---|
| accountant | prof | 62 | 86 | 82 |
| pilot | prof | 72 | 76 | 83 |
| architect | prof | 75 | 92 | 90 |
| author | prof | 55 | 90 | 76 |
| chemist | prof | 64 | 86 | 90 |
5 rows × 4 columns
prestige_model = ols("prestige ~ income + education", data=prestige).fit()
print(prestige_model.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:               prestige   R-squared:                       0.828
Model:                            OLS   Adj. R-squared:                  0.820
Method:                 Least Squares   F-statistic:                     101.2
Date:                Tue, 15 Apr 2014   Prob (F-statistic):           8.65e-17
Time:                        15:31:02   Log-Likelihood:                -178.98
No. Observations:                  45   AIC:                             364.0
Df Residuals:                      42   BIC:                             369.4
Df Model:                           2
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept     -6.0647      4.272     -1.420      0.163       -14.686     2.556
income         0.5987      0.120      5.003      0.000         0.357     0.840
education      0.5458      0.098      5.555      0.000         0.348     0.744
==============================================================================
Omnibus:                        1.279   Durbin-Watson:                   1.458
Prob(Omnibus):                  0.528   Jarque-Bera (JB):                0.520
Skew:                           0.155   Prob(JB):                        0.771
Kurtosis:                       3.426   Cond. No.                         163.
==============================================================================
fig, ax = plt.subplots(figsize=(12,8))
fig = sm.graphics.influence_plot(prestige_model, ax=ax, criterion="cooks")
fig, ax = plt.subplots(figsize=(12,14))
fig = sm.graphics.plot_partregress("prestige", "income", ["education"], data=prestige, ax=ax)
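To see the formula API end to end without downloading a dataset, here is a self-contained sketch on synthetic data with known coefficients (the 0.6 and 0.55 slopes are made up to mimic the Duncan fit):

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Generate data from a known linear model, then check that the
# R-style formula interface recovers the coefficients.
rng = np.random.RandomState(0)
df = pd.DataFrame({"income": rng.uniform(0, 100, 200),
                   "education": rng.uniform(0, 100, 200)})
df["prestige"] = (-6 + 0.6 * df["income"] + 0.55 * df["education"]
                  + rng.normal(scale=2, size=200))
model = ols("prestige ~ income + education", data=df).fit()
print(model.params.round(2))  # close to Intercept=-6, income=0.6, education=0.55
```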