5. SciPy.¶

Antes de comenzar...

In [1]:

%matplotlib inline
from IPython.display import Image
import numpy as np
import matplotlib.pyplot as plt

5.1. Descripción.¶

SciPy se desarrolla sobre NumPy y provee de un gran número de algoritmos científicos de alto nivel.

Functiones especiales (scipy.special)
Integración (scipy.integrate)
Optimización (scipy.optimize)
Interpolación (scipy.interpolate)
Fourier Transformads (scipy.fftpack)
Signal Processing (scipy.signal)
Linear Algebra (scipy.linalg)
Problemas de matrices dispersas (scipy.sparse)
Statistics (scipy.stats)
Multi-dimensional image processing (scipy.ndimage)
File IO (scipy.io)

Para utilizar la biblioteca, necesitamos importarla... si queremos usar el modulo de álgebra lineal:

5.1. Integración¶

Integración numérica de una función del tipo

$\displaystyle \int_a^b f(x) dx$

SciPy provee de una serie de funciones para resolver este tipo de problemas: quad, dblquad y tplquad para integrales simples, dobles y triples, respectivamente.

In [2]:

from scipy.integrate import quad, dblquad, tplquad

In [3]:

# defino una función a integrar 
def f(x):
    return x

In [4]:

x_lower = 0 # límite inferior de x
x_upper = 1 # límite superior de x

val, abserr = quad(f, x_lower, x_upper)

print "integral value =", val, ", absolute error =", abserr 

integral value = 0.5 , absolute error = 5.55111512313e-15

Podemos pasar argumentos extras a la función a integrar usando args:

In [5]:

def integrand(x, n):
    return x*n


x_lower = 0  # the lower limit of x
x_upper = 1 # the upper limit of x

val, abserr = quad(integrand, x_lower, x_upper, args=3)

print val, abserr 

1.5 1.66533453694e-14

Podemos usar funciones lambda para definir explícitamente a la función a integrar:

In [6]:

val, abserr = quad(lambda x: np.exp(-x ** 2), -np.inf, np.inf) # Inf = infinito

print "numerical  =", val, abserr

analytical = np.sqrt(np.pi)
print "analytical =", analytical

numerical  = 1.77245385091 1.42026367809e-08
analytical = 1.77245385091

Integración en dos dimensiones:

In [7]:

def integrand(x, y):
    return np.exp(-x**2-y**2)

x_lower = 0  
x_upper = 10
y_lower = 0
y_upper = 10

val, abserr = dblquad(integrand, x_lower, x_upper, lambda x : y_lower, lambda x: y_upper)

print val, abserr 

0.785398163397 1.63822994214e-13

Nota: los límites de integración para y, los pasamos a traves de funciones lambda constantes (en general, y es una función de x).

5.2. Álgebra lineal.¶

El módulo de álgebra lineal contiene funciones para trabajar con matrices, incluyendo resolución de ecuaciones lineales, resolución de autovalores, decomposiciones (SVD, LU, cholesky), etc.

Más info: http://docs.scipy.org/doc/scipy/reference/linalg.html

In [8]:

import scipy.linalg as la

5.2.1. Sistema de ecuaciones lineales.¶

Ecuaciones lineales de la forma

$A x = b$

donde $A$ es una matrix y $x,b$ son vectores, pueden resolverse de la siguiente forma:

In [9]:

A = np.array([[1,2,3], [5,6,7], [9,10,11]])
b = np.array([1,2,3])

In [10]:

x = la.solve(A, b)
x

Out[10]:

array([-0.625,  1.   , -0.125])

In [11]:

np.dot(A, x) - b # chequeo

Out[11]:

array([ 0.,  0.,  0.])

Podemos hacer lo mismo

$A X = B$

donde $A, B, X$ son matrices:

In [12]:

A = np.random.rand(3,3)
B = np.random.rand(3,3)

In [13]:

X = la.solve(A, B)

In [14]:

Out[14]:

array([[  3.68454931,  -3.91580978, -22.1044433 ],
       [ -1.41836009,   2.12891689,   9.96756428],
       [ -1.68928033,   1.80894716,  10.6411778 ]])

In [15]:

la.norm(np.dot(A, X) - B) # chequeo

Out[15]:

1.8873791418627661e-15

5.2.2. Autovalores y autovectores.¶

El problema de los autovalores para una matrix $A$:

$\displaystyle A v_n = \lambda_n v_n$

donde $v_n$ es el $n$th autovector y $\lambda_n$ es el $n$th autovalor.

Para calcular los autovalores de una matrix usaremos la función eigvals y para calcular autovalores y autovectores, usaremos la función eig:

In [16]:

evals = la.eigvals(A)

In [17]:

evals

Out[17]:

array([ 1.91569473+0.j,  0.04152022+0.j,  0.25989239+0.j])

In [18]:

evals, evecs = la.eig(A)

In [19]:

evals

Out[19]:

array([ 1.91569473+0.j,  0.04152022+0.j,  0.25989239+0.j])

In [20]:

evecs

Out[20]:

array([[-0.574175  , -0.83951781, -0.75933893],
       [-0.57297464,  0.32943996,  0.02428778],
       [-0.58482743,  0.43206383,  0.65024187]])

El autovector correspodiente al $n$th autovalor (evals[n]) es la $n$th columna en evecs, es decir, evecs[:,n]. Para verificarlo:

In [21]:

n = 1
la.norm(np.dot(A, evecs[:,n]) - evals[n] * evecs[:,n])

Out[21]:

3.1551467995333175e-16

5.2.3. Operationes con matrices.¶

In [22]:

la.inv(A) # inversa de una matrix

Out[22]:

array([[ 18.99989211, -38.02674665,  19.11473248],
       [ -7.38385472,  17.23270871,  -9.12266594],
       [ -9.38385033,  17.87541438,  -7.77819987]])

In [23]:

la.det(A) # determinante

Out[23]:

0.020671858294129455

In [24]:

la.norm(A, ord=2), la.norm(A, ord=np.inf) # normas de varios ordenes

Out[24]:

(1.917209871229647, 1.934093890933267)

5.3. Optimización.¶

Buscando mínimos y máximos de una función.

Más info: http://scipy-lectures.github.com/advanced/mathematical_optimization/index.html

Como siempre, necesitamos importar el módulo corresondiente, en este caso optimize:

In [25]:

from scipy import optimize

5.3.1. Buscando mínimos.¶

In [26]:

def f(x):
    return 4*x**3 + (x-2)**2 + x**4

In [27]:

fig, ax  = plt.subplots()
x = np.linspace(-5, 3, 100)
ax.plot(x, f(x));

Podemos usar la función fmin_bfgs para buscar los mínimos locales:

In [28]:

x_min = optimize.fmin_bfgs(f, -2)
x_min 

Optimization terminated successfully.
         Current function value: -3.506641
         Iterations: 6
         Function evaluations: 30
         Gradient evaluations: 10

Out[28]:

array([-2.67298167])

In [29]:

optimize.fmin_bfgs(f, 0.5) 

Optimization terminated successfully.
         Current function value: 2.804988
         Iterations: 3
         Function evaluations: 15
         Gradient evaluations: 5

Out[29]:

array([ 0.46961745])

También podemos usar las funciones brent o fminbound que utilizan diferentes algoritmos para encontrar los mínimos:

In [30]:

optimize.brent(f)

Out[30]:

0.46961743402759754

In [31]:

optimize.fminbound(f, -4, 2)

Out[31]:

-2.6729822917513886

5.3.2. Buscando soluciones.¶

Para buscar la raíces de una función de la forma $f(x) = 0$, podemos usar el método fsolve:

In [32]:

def f(x):
    return np.sin(x)

In [33]:

fig, ax  = plt.subplots(figsize=(10,4))
x = np.linspace(-10, 10, 50)
y = f(x)
ax.plot(x, y)

Out[33]:

[<matplotlib.lines.Line2D at 0x3d1e7d0>]

In [34]:

optimize.fsolve(f, 0.5)

Out[34]:

array([ 0.])

In [35]:

optimize.fsolve(f, 4.0)

Out[35]:

array([ 3.14159265])

In [36]:

optimize.fsolve(f, -3.0)

Out[36]:

array([-3.14159265])

5.4. Interpolación.¶

Conociendo los valores de X e Y, la función interp1d genera un objeto (como si fuera una función) al que puedo interrogar para cualquier valor de x dentro del intervalo de X propuesto y que nos arroja el valor de y interpolado.

In [37]:

from scipy import interpolate

In [38]:

def f(x):
    return np.sin(x)

In [39]:

n = np.arange(0, 10)  
x = np.linspace(0, 9, 100)

y_meas = f(n) + 0.1 * np.random.randn(len(n))
y_real = f(x)

linear_interpolation = interpolate.interp1d(n, y_meas)
y_interp1 = linear_interpolation(x)

cubic_interpolation = interpolate.interp1d(n, y_meas, kind='cubic')
y_interp2 = cubic_interpolation(x)

In [40]:

fig, ax = plt.subplots(figsize=(10,4))
ax.plot(n, y_meas, 'bs', label='datos originales')
ax.plot(x, y_real, 'k', lw=2, label='funcion real')
ax.plot(x, y_interp1, 'r', label='interpolacion lineal')
ax.plot(x, y_interp2, 'g', label='interpolacion cubica')
ax.legend(loc=3);

5.5. Estadística.¶

El módulo scipy.stats contien un gran número de distribuciones, funciones stadísticas y tests.

Más info: http://docs.scipy.org/doc/scipy/reference/stats.html.

In [41]:

from scipy import stats

In [42]:

X = stats.poisson(3.5) # crea una variable aleatoria discreta con distribución de Poisson

In [43]:

n = np.arange(0,15)

fig, axes = plt.subplots(3,1, sharex=True)

axes[0].step(n, X.pmf(n)) # (PMF)

axes[1].step(n, X.cdf(n)) # (CDF)

axes[2].hist(X.rvs(size=1000)); #histograma de 1000 realizaciones aleatorias de la variable estocástica X

Out[43]:

(array([ 130.,  193.,  219.,  185.,  136.,  111.,   21.,    3.,    1.,    1.]),
 array([  0. ,   1.2,   2.4,   3.6,   4.8,   6. ,   7.2,   8.4,   9.6,
        10.8,  12. ]),
 <a list of 10 Patch objects>)

In [44]:

Y = stats.norm() # crea una variable aleatoria continua con distribución normal

In [45]:

x = np.linspace(-5,5,100)

fig, axes = plt.subplots(3,1, sharex=True)

axes[0].plot(x, Y.pdf(x))

axes[1].plot(x, Y.cdf(x));

axes[2].hist(Y.rvs(size=1000), bins=50);

In [46]:

X.mean(), X.std(), X.var() # distribución de Poisson

Out[46]:

(3.5, 1.8708286933869707, 3.5)

In [47]:

Y.mean(), Y.std(), Y.var() # distribución normal

Out[47]:

(0.0, 1.0, 1.0)

Tests estadísticos.

Testeamos si dos muestras de datos aleatorios independientes provienen de la misma población:

In [48]:

t_statistic, p_value = stats.ttest_ind(X.rvs(size=1000), X.rvs(size=1000))

print "t-statistic =", t_statistic
print "p-value =", p_value

t-statistic = 0.839527450252
p-value = 0.401273830399

Para testear, a partir de una media muestral de 0.1, si la media poblacional es 0.0:

In [49]:

stats.ttest_1samp(Y.rvs(size=1000), 0.1)

Out[49]:

(array(-3.4559458628292523), 0.00057143662803006202)

Podemos rechazar la hipotesis nula que la media poblacional de Y es 0.1

In [50]:

Y.mean()

Out[50]:

0.0

In [51]:

stats.ttest_1samp(Y.rvs(size=1000), Y.mean())

Out[51]:

(array(3.1406882198499484), 0.0017350548629815369)

5.6. Procesamiento de imágenes.¶

Como hacemos usualmente, debemos importar el módulo de interés:

In [52]:

from scipy import ndimage

5.6.1. Transformación geométrica de las imágenes.¶

In [53]:

from scipy import misc
lena = misc.lena()

In [54]:

shifted_lena = ndimage.shift(lena, (50, 50))
shifted_lena2 = ndimage.shift(lena, (50, 50), mode='nearest')
rotated_lena = ndimage.rotate(lena, 30)
cropped_lena = lena[50:-50, 50:-50]
zoomed_lena = ndimage.zoom(lena, 2)

In [55]:

fig, axes = plt.subplots(1,5, figsize=(16, 5))

axes[0].imshow(shifted_lena, cmap=plt.cm.gray)

axes[1].imshow(shifted_lena2, cmap=plt.cm.gray)

axes[2].imshow(rotated_lena, cmap=plt.cm.gray)

axes[3].imshow(cropped_lena, cmap=plt.cm.gray)

axes[4].imshow(zoomed_lena, cmap=plt.cm.gray)

for ax in range(5):
    axes[ax].axis('off')

In [56]:

import numpy as np
noisy_lena = np.copy(lena).astype(np.float)
noisy_lena += lena.std()*0.5*np.random.standard_normal(lena.shape)
blurred_lena = ndimage.gaussian_filter(noisy_lena, sigma=3)
median_lena = ndimage.median_filter(blurred_lena, size=5)
from scipy import signal
wiener_lena = signal.wiener(blurred_lena, (5,5))

In [57]:

fig, axes = plt.subplots(1,4, figsize=(16, 5))

axes[0].imshow(noisy_lena, cmap=plt.cm.gray)

axes[1].imshow(blurred_lena, cmap=plt.cm.gray)

axes[2].imshow(median_lena, cmap=plt.cm.gray)

axes[3].imshow(wiener_lena, cmap=plt.cm.gray)

for ax in range(4):
    axes[ax].axis('off')

5.6.2. Detección de bordes.¶

In [58]:

im = np.zeros((256, 256))
im[64:-64, 64:-64] = 1

im = ndimage.rotate(im, 15, mode='constant')
im = ndimage.gaussian_filter(im, 8)

sx = ndimage.sobel(im, axis=0, mode='constant')
sy = ndimage.sobel(im, axis=1, mode='constant')
sob = np.hypot(sx, sy)

plt.figure(figsize=(16, 5))
plt.subplot(141)
plt.imshow(im, cmap=plt.cm.gray)
plt.axis('off')
plt.title('Cuadrado', fontsize=16)
plt.subplot(142)
plt.imshow(sx)
plt.axis('off')
plt.title('Sobel (direccion x)', fontsize=16)
plt.subplot(143)
plt.imshow(sob)
plt.axis('off')
plt.title('Sobel', fontsize=16)

im += 0.07*np.random.random(im.shape)

sx = ndimage.sobel(im, axis=0, mode='constant')
sy = ndimage.sobel(im, axis=1, mode='constant')
sob = np.hypot(sx, sy)

plt.subplot(144)
plt.imshow(sob)
plt.axis('off')
plt.title('Sobel con ruido', fontsize=20)



plt.subplots_adjust(wspace=0.02, hspace=0.02, top=1, bottom=0, left=0, right=0.9)

plt.show()

5.6.3. Morfología matemática.¶

In [59]:

np.random.seed(1)
n = 10
l = 256
im = np.zeros((l, l))
points = l*np.random.random((2, n**2))
im[(points[0]).astype(np.int), (points[1]).astype(np.int)] = 1
im = ndimage.gaussian_filter(im, sigma=l/(4.*n))

mask = (im > im.mean()).astype(np.float)


img = mask + 0.3*np.random.randn(*mask.shape)

binary_img = img > 0.5

# Eliminamos los pequeños puntos blancos
open_img = ndimage.binary_opening(binary_img)
# Eliminamos los pequeños puntos negros
close_img = ndimage.binary_closing(open_img)

plt.figure(figsize=(12, 3))

l = 128

plt.subplot(141)
plt.imshow(binary_img[:l, :l], cmap=plt.cm.gray)
plt.axis('off')
plt.subplot(142)
plt.imshow(open_img[:l, :l], cmap=plt.cm.gray)
plt.axis('off')
plt.subplot(143)
plt.imshow(close_img[:l, :l], cmap=plt.cm.gray)
plt.axis('off')
plt.subplot(144)
plt.imshow(mask[:l, :l], cmap=plt.cm.gray)
plt.contour(close_img[:l, :l], [0.5], linewidths=2, colors='r')
plt.axis('off')

plt.subplots_adjust(wspace=0.02, hspace=0.3, top=1, bottom=0.1, left=0, right=1)

plt.show()