Things you may want to use, so you don't need to reinvent the wheel.
try:
    %matplotlib inline  # IPython magic; the preferred form on modern IPython
except:
    %pylab inline  # fallback for very old IPython versions
import matplotlib.pyplot as plt
scipy.io: load and save .mat files (from MATLAB)
scipy.linalg: linear algebra routines
scipy.sparse: sparse matrices
scipy.ndimage: n-dimensional image processing
scipy.spatial: Delaunay triangulation, convex hulls, Voronoi diagrams
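As a quick taste of scipy.io, a .mat round trip might look like the sketch below (the filename is arbitrary):

```python
import numpy as np
import scipy.io

# Save a dict of arrays to a MATLAB-style .mat file, then load it back.
a = np.arange(6).reshape(2, 3)
scipy.io.savemat('demo.mat', {'a': a})
loaded = scipy.io.loadmat('demo.mat')
print(loaded['a'])  # the 2x3 array round-trips
```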
# when you import scipy, you do NOT import all the subpackages
import scipy
# scipy.i<TAB> in IPython tab-completes the matching names
scipy.linspace(10, 11, num=10)  # a NumPy re-export; prefer np.linspace in new code
array([ 10. , 10.11111111, 10.22222222, 10.33333333, 10.44444444, 10.55555556, 10.66666667, 10.77777778, 10.88888889, 11. ])
# you need to import a subpackage to have access
scipy.io  # AttributeError until the subpackage has been imported
import scipy.io as io
io?
## note about namespaces
# import scipy.linalg would also work; "from scipy import" binds just the short name
from scipy import linalg
# linalg.<TAB> tab-completes the contents of the subpackage
from scipy import interpolate
interpolate
Example showing how to use B-splines in scipy.signal to do interpolation.
The input points must be equally spaced to use these routines.
from numpy import r_, sin
from scipy.signal import cspline1d, cspline1d_eval
x = r_[0:10]  # MATLAB-like range creation
print(x)
dx = x[1]-x[0]
newx = r_[-3:13:0.1]  # note: extends outside the original domain
y = sin(x)
## find cubic spline coefficients for a 1D signal
cj = cspline1d(y)
## Evaluate spline on a new set of points
### dx is old sampling space, x0 was old sample origin
newy = cspline1d_eval(cj, newx, dx=dx, x0=x[0])
plt.plot(newx, newy, x, y, 'o')
[0 1 2 3 4 5 6 7 8 9]
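If your samples are not equally spaced, scipy.interpolate offers splines without that restriction; a minimal sketch using CubicSpline (added in SciPy 0.18):

```python
import numpy as np
from scipy.interpolate import CubicSpline

# CubicSpline, unlike cspline1d, accepts irregularly spaced sample points.
x = np.array([0.0, 0.5, 1.3, 2.0, 3.1, 4.0])
y = np.sin(x)
cs = CubicSpline(x, y)
newx = np.linspace(0, 4, 50)
newy = cs(newx)  # interpolated values; exact at the original knots
```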
## example r_
r_[1:2:.1]
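For reference, np.r_ slices map onto the more familiar array constructors; a small sketch:

```python
import numpy as np

# A real step behaves like np.arange (endpoint excluded);
# an imaginary "step" behaves like np.linspace (endpoint included).
a = np.r_[1:2:0.1]
b = np.r_[1:2:11j]
```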
import numpy as np
from sklearn import mixture
n_samples = 300
# generate random sample, two components
np.random.seed(0)
C = np.array([[0., -0.7], [3.5, .7]])
X_train = np.r_[np.dot(np.random.randn(n_samples, 2), C),
np.random.randn(n_samples, 2) + np.array([20, 20])]
clf = mixture.GaussianMixture(n_components=2, covariance_type='full')  # mixture.GMM was renamed GaussianMixture in scikit-learn 0.18
clf.fit(X_train)
x = np.linspace(-20.0, 30.0)
y = np.linspace(-20.0, 40.0)
X, Y = np.meshgrid(x, y)
XX = np.c_[X.ravel(), Y.ravel()]
Z = -clf.score_samples(XX)  # negative log-likelihood; the old GMM API returned a (logprob, responsibilities) tuple
Z = Z.reshape(X.shape)
CS = plt.contour(X, Y, Z)
CB = plt.colorbar(CS, shrink=0.8, extend='both')
plt.scatter(X_train[:, 0], X_train[:, 1], s=0.8)
plt.axis('tight')
import numpy as np
C = np.array([[0., -0.7], [3.5, .7]])
n_samples = 300
np.dot(np.random.randn(n_samples, 2), C)  # first component: a correlated Gaussian blob
np.random.randn(n_samples, 2) + np.array([20, 20])  # second component: a blob shifted to (20, 20)
X_train = np.r_[np.dot(np.random.randn(n_samples, 2), C),
np.random.randn(n_samples, 2) + np.array([20, 20])]
plt.plot(X_train[:,0], X_train[:,1], 'o')
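After fitting, the estimated component parameters can be checked against the generating process. A sketch with the current scikit-learn API (GaussianMixture replaced mixture.GMM in 0.18):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Regenerate the two-component sample and inspect the fitted means;
# one component was centred at the origin, the other at (20, 20).
np.random.seed(0)
C = np.array([[0., -0.7], [3.5, .7]])
X_train = np.r_[np.dot(np.random.randn(300, 2), C),
                np.random.randn(300, 2) + np.array([20, 20])]
clf = GaussianMixture(n_components=2, covariance_type='full').fit(X_train)
print(clf.means_.round(1))  # rows near (0, 0) and (20, 20)
```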
Image processing in Python
OpenCV: computer vision in C/C++ with Python wrappers
http://docs.opencv.org/trunk/doc/py_tutorials/py_tutorials.html
Use a Sobel filter on a greyscale image
##Basic scikit-image example
from skimage import data, io, filters  # "filter" was renamed "filters" in scikit-image 0.11
import matplotlib.pyplot as plt
image = data.coins()  # or any NumPy array!
edges = filters.sobel(image)
plt.figure(figsize=(10,10))
plt.subplot(121)
io.imshow(image)
plt.subplot(122)
io.imshow(edges)
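Along the same lines, the filters module also includes automatic thresholding; a small sketch using Otsu's method (assumes a scikit-image version where the module is named filters):

```python
from skimage import data, filters

# Otsu's method picks a single global threshold separating the bright
# coins from the darker background.
image = data.coins()
thresh = filters.threshold_otsu(image)
binary = image > thresh  # boolean foreground mask
```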
http://statsmodels.sourceforge.net/
Link to some great notebooks
https://github.com/statsmodels/statsmodels/tree/master/examples/notebooks
# statsmodels simple example
# Regression plots and loading datasets from R
http://statsmodels.sourceforge.net/devel/examples/notebooks/generated/regression_plots.html
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.formula.api import ols
We can use a utility function to load any R dataset available from the great Rdatasets package.
prestige = sm.datasets.get_rdataset("Duncan", "car", cache=True).data
Description
The **Duncan** data frame has 45 rows and 4 columns. Data on the
prestige and other characteristics of 45 U. S. occupations in 1950.
Format
~~~~~~
This data frame contains the following columns:
type
Type of occupation. A factor with the following levels:
* prof - professional and managerial
* wc - white-collar
* bc - blue-collar.
income
Percent of males in occupation earning $3500 or more in 1950.
education
Percent of males in occupation in 1950 who were high-school
graduates.
prestige
Percent of raters in NORC study rating occupation as excellent or
good in prestige.
Source
~~~~~~
Duncan, O. D. (1961) A socioeconomic index for all occupations. In
Reiss, A. J., Jr. (Ed.) *Occupations and Social Status.* Free Press
[Table VI-1].
References
~~~~~~~~~~
Fox, J. (2008) *Applied Regression Analysis and Generalized Linear
Models*, Second Edition. Sage.
Fox, J. and Weisberg, S. (2011) *An R Companion to Applied Regression*,
Second Edition, Sage.
## remember pandas, this should look familiar
prestige.head()
| | type | income | education | prestige |
|---|---|---|---|---|
| accountant | prof | 62 | 86 | 82 |
| pilot | prof | 72 | 76 | 83 |
| architect | prof | 75 | 92 | 90 |
| author | prof | 55 | 90 | 76 |
| chemist | prof | 64 | 86 | 90 |
5 rows × 4 columns
prestige_model = ols("prestige ~ income + education", data=prestige).fit()
print(prestige_model.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:               prestige   R-squared:                       0.828
Model:                            OLS   Adj. R-squared:                  0.820
Method:                 Least Squares   F-statistic:                     101.2
Date:                Tue, 15 Apr 2014   Prob (F-statistic):           8.65e-17
Time:                        15:31:02   Log-Likelihood:                -178.98
No. Observations:                  45   AIC:                             364.0
Df Residuals:                      42   BIC:                             369.4
Df Model:                           2
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept     -6.0647      4.272     -1.420      0.163       -14.686     2.556
income         0.5987      0.120      5.003      0.000         0.357     0.840
education      0.5458      0.098      5.555      0.000         0.348     0.744
==============================================================================
Omnibus:                        1.279   Durbin-Watson:                   1.458
Prob(Omnibus):                  0.528   Jarque-Bera (JB):                0.520
Skew:                           0.155   Prob(JB):                        0.771
Kurtosis:                       3.426   Cond. No.                         163.
==============================================================================
fig, ax = plt.subplots(figsize=(12,8))
fig = sm.graphics.influence_plot(prestige_model, ax=ax, criterion="cooks")
fig, ax = plt.subplots(figsize=(12,14))
fig = sm.graphics.plot_partregress("prestige", "income", ["education"], data=prestige, ax=ax)
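To see the formula API end to end without downloading a dataset, here is a self-contained sketch on synthetic data with known coefficients (the 0.6 and 0.55 slopes are made up to mimic the Duncan fit):

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Generate data from a known linear model, then check that the
# R-style formula interface recovers the coefficients.
rng = np.random.RandomState(0)
df = pd.DataFrame({"income": rng.uniform(0, 100, 200),
                   "education": rng.uniform(0, 100, 200)})
df["prestige"] = (-6 + 0.6 * df["income"] + 0.55 * df["education"]
                  + rng.normal(scale=2, size=200))
model = ols("prestige ~ income + education", data=df).fit()
print(model.params.round(2))  # close to Intercept=-6, income=0.6, education=0.55
```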