Breakout: EigenFaces

In this breakout, we'll use Principal Component Analysis (PCA) to explore the faces dataset that we saw earlier.

In [1]:

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# use seaborn plotting defaults
import seaborn as sns; sns.set()

We'll use this code to load the data:

In [2]:

from sklearn.datasets import fetch_lfw_people
faces = fetch_lfw_people(min_faces_per_person=70, resize=0.4)

X, y = faces.data, faces.target

1. Compute a PCA of the data

Compute a Principal Component Analysis of the data, keeping all components.
Plot the cumulative explained variance ratio. How many components do we need to recover 90% of the variance?
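One possible way to approach this step is sketched below. It is not a reference solution: the helper names `cumulative_variance`, `components_for_variance`, and `plot_cumulative_variance` are made up for illustration, and the sketch assumes a scikit-learn version in which `PCA()` with no arguments keeps all components.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA


def cumulative_variance(X):
    """Fit a PCA with all components; return the cumulative explained variance ratio."""
    pca = PCA()  # n_components=None keeps min(n_samples, n_features) components
    pca.fit(X)
    return np.cumsum(pca.explained_variance_ratio_)


def components_for_variance(cumulative, threshold=0.90):
    """Smallest number of components whose cumulative ratio reaches `threshold`."""
    # searchsorted gives the first zero-based index with cumulative >= threshold;
    # adding 1 turns that index into a component count.
    n = int(np.searchsorted(cumulative, threshold)) + 1
    return min(n, len(cumulative))  # guard against round-off near 1.0


def plot_cumulative_variance(cumulative, threshold=0.90):
    """Plot the cumulative curve with a dashed line at the threshold."""
    fig, ax = plt.subplots()
    ax.plot(np.arange(1, len(cumulative) + 1), cumulative)
    ax.axhline(threshold, linestyle="--", color="gray")
    ax.set_xlabel("number of components")
    ax.set_ylabel("cumulative explained variance ratio")
    return fig
```

With the `X` loaded above, `cumulative = cumulative_variance(X)` followed by `components_for_variance(cumulative)` and `plot_cumulative_variance(cumulative)` gives both the count and the curve.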

2. Plot the "eigenfaces"

The mean of the data (found in the mean_ attribute) and each component of the data (found in the rows of the components_ attribute) can be reshaped and interpreted as an image.

Display the mean face using plt.imshow
Display the first few "eigenfaces" (given by the rows of the components_ matrix).

You'll have to play around with the colormap and grid settings to make this look OK.
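A small plotting helper can take care of the reshaping and the grid settings. The sketch below is one option among many: `plot_image_grid` is a hypothetical name, the gray colormap is only a starting point, and `ax.grid(False)` is there because the seaborn defaults set earlier draw grid lines over images.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_image_grid(rows, image_shape, titles=None, n_cols=5, cmap="gray"):
    """Show each flattened row of `rows` as an image of shape `image_shape`."""
    rows = np.atleast_2d(rows)  # lets a single 1-D vector (e.g. the mean) pass through
    n_rows = int(np.ceil(len(rows) / n_cols))
    fig, axes = plt.subplots(n_rows, n_cols,
                             figsize=(2 * n_cols, 2.5 * n_rows),
                             squeeze=False)
    for i, ax in enumerate(axes.ravel()):
        # Remove ticks and the seaborn grid so only the image is visible.
        ax.set_xticks([])
        ax.set_yticks([])
        ax.grid(False)
        if i < len(rows):
            ax.imshow(rows[i].reshape(image_shape), cmap=cmap)
            if titles is not None:
                ax.set_title(titles[i])
        else:
            ax.axis("off")  # hide unused cells in the last row
    return fig
```

Assuming a fitted PCA object called `pca` (a name not used in the original cells), `plot_image_grid(pca.mean_, faces.images.shape[1:])` would show the mean face and `plot_image_grid(pca.components_[:10], faces.images.shape[1:])` the first ten eigenfaces.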

3. Plot the reconstructed faces

For several faces, plot the true image alongside its reconstruction (computed using inverse_transform) for several different values of n_components. You might even use IPython's interactive functions to make this exploration easier.

Does the 90% variance choice seem to correspond to a good visual representation of each picture?
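The reconstruction itself is a projection onto the leading components followed by `inverse_transform`. The following is a rough sketch under that reading; the helper names `reconstruct` and `reconstruction_error` are hypothetical, and the mean squared error is only a numerical companion to the visual comparison the exercise asks for.

```python
import numpy as np
from sklearn.decomposition import PCA


def reconstruct(X, n_components):
    """Project X onto the first `n_components` components and map back."""
    pca = PCA(n_components=n_components)
    X_reduced = pca.fit_transform(X)         # shape (n_samples, n_components)
    return pca.inverse_transform(X_reduced)  # shape (n_samples, n_features)


def reconstruction_error(X, X_rec):
    """Mean squared pixel-wise difference between originals and reconstructions."""
    return float(np.mean((X - X_rec) ** 2))
```

For the faces data, each row of `reconstruct(X, n)` can be reshaped to `faces.images.shape[1:]` and shown with `plt.imshow` next to the matching row of `X`, repeating for a few values of `n` around the 90% threshold.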

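Note: As you experiment with this, you may want to use RandomizedPCA rather than PCA for this task. RandomizedPCA is an approximate method with the same interface as PCA that runs much more quickly. In scikit-learn 0.20 and later, RandomizedPCA is no longer a separate class; the same approximate solver is selected with PCA(svd_solver='randomized').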
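On current scikit-learn releases the randomized solver is chosen through the `svd_solver` argument of `PCA`. A minimal sketch, assuming such a version is installed; the helper name `fit_randomized_pca` and the fixed `random_state` are illustrative choices, not part of the exercise.

```python
import numpy as np
from sklearn.decomposition import PCA


def fit_randomized_pca(X, n_components, random_state=0):
    """Fit an approximate PCA using the randomized SVD solver."""
    pca = PCA(n_components=n_components,
              svd_solver="randomized",
              random_state=random_state)  # fixed seed for repeatable results
    return pca.fit(X)
```

The fitted object exposes the same `mean_`, `components_`, `transform`, and `inverse_transform` as an exact PCA, so it can stand in for one in the steps above when `n_components` is well below the number of features.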