In this breakout, we'll use Principal Component Analysis (PCA) to explore the faces dataset that we saw earlier.
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
# use seaborn plotting defaults
import seaborn as sns; sns.set()
We'll use this code to load the data:
from sklearn.datasets import fetch_lfw_people
faces = fetch_lfw_people(min_faces_per_person=70, resize=0.4)
X, y = faces.data, faces.target
The mean of the data (found in the mean_ attribute) and each component of the data (found in the rows of the components_ attribute) can be reshaped and interpreted as an image.
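As a minimal sketch of that reshaping, the snippet below fits PCA to a random stand-in array rather than the real faces (an assumption, made so that it runs without downloading anything). The names X_demo, h, and w and the choice n_components=12 are illustrative; with the real data you would use X and take the image size from faces.images.shape[1:].

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for faces.data: 200 random "images" of 50 x 37 pixels, flattened.
# With the real dataset, use X = faces.data and h, w = faces.images.shape[1:].
h, w = 50, 37
rng = np.random.RandomState(0)
X_demo = rng.rand(200, h * w)

pca = PCA(n_components=12).fit(X_demo)

# mean_ has one entry per pixel, so it reshapes to a single image.
mean_image = pca.mean_.reshape(h, w)

# components_ has one row per component; each row reshapes to an image.
component_images = pca.components_.reshape(-1, h, w)

print(mean_image.shape)
print(component_images.shape)
```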
Use plt.imshow to visualize the mean and the rows of the components_ matrix. You'll have to play around with the colormap and grid settings to make this look OK.
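One possible way to handle the colormap and grid settings is sketched here. The helper plot_image_grid is hypothetical (it is not part of matplotlib or scikit-learn), the data is a random stand-in for the faces, and the gray colormap is just one reasonable choice.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA


def plot_image_grid(images, titles, n_cols=4, cmap="gray"):
    """Show each 2D array in `images` in its own panel (hypothetical helper)."""
    n_rows = int(np.ceil(len(images) / n_cols))
    fig, axes = plt.subplots(n_rows, n_cols, squeeze=False,
                             figsize=(2 * n_cols, 2.5 * n_rows))
    for ax in axes.ravel():
        ax.axis("off")  # hides ticks and frame, and with them the grid lines
    for ax, image, title in zip(axes.ravel(), images, titles):
        ax.imshow(image, cmap=cmap)
        ax.set_title(title, fontsize=9)
    return fig


# Random stand-in for faces.data (assumption): 200 images of 50 x 37 pixels.
h, w = 50, 37
rng = np.random.RandomState(0)
X_demo = rng.rand(200, h * w)
pca = PCA(n_components=7).fit(X_demo)

# First panel is the mean image, followed by one panel per component.
images = [pca.mean_.reshape(h, w)] + list(pca.components_.reshape(-1, h, w))
titles = ["mean"] + ["component %d" % i for i in range(len(images) - 1)]
fig = plot_image_grid(images, titles)
```

In a notebook with %matplotlib inline the figure is displayed when the cell finishes; in a plain script you would call plt.show() afterwards.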
For several faces, plot the true image plus the reconstruction (computed using inverse_transform) for several different values of n_components. (You might even use IPython's interactive functions to make this exploration easier.)
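The reconstruction step might look roughly like the following. This is a sketch on a small random stand-in array (an assumption, so no download is needed). The component counts 5, 20, and 80 and the names X_demo and n_faces are illustrative. svd_solver="full" is set only to keep the example deterministic.

```python
import numpy as np
from sklearn.decomposition import PCA

# Small random stand-in for faces.data (assumption): 100 images of 20 x 15 pixels.
h, w = 20, 15
rng = np.random.RandomState(0)
X_demo = rng.rand(100, h * w)

n_faces = 3            # how many images to reconstruct
reconstructions = {}   # n_components -> array of shape (n_faces, h, w)
errors = {}            # n_components -> mean squared reconstruction error

for k in (5, 20, 80):
    pca = PCA(n_components=k, svd_solver="full").fit(X_demo)
    # Project onto the first k components, then map back to pixel space.
    codes = pca.transform(X_demo[:n_faces])
    X_rec = pca.inverse_transform(codes)
    reconstructions[k] = X_rec.reshape(n_faces, h, w)
    errors[k] = np.mean((X_demo[:n_faces] - X_rec) ** 2)

for k, err in errors.items():
    print("n_components=%d  mean squared error=%.4f" % (k, err))
```

Each entry of reconstructions can be passed to plt.imshow next to the corresponding original image; the error shrinks as more components are kept.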
Does the 90% variance choice seem to correspond to a good visual representation of each picture?
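To compare against a 90% variance cutoff, one option is to pass a fraction as n_components, which asks PCA to keep just enough components to explain that share of the variance. The sketch below again uses a random stand-in array (an assumption), so the number it reports says nothing about the faces themselves.

```python
import numpy as np
from sklearn.decomposition import PCA

# Random stand-in for faces.data (assumption): 100 samples, 300 features.
rng = np.random.RandomState(0)
X_demo = rng.rand(100, 300)

# A float between 0 and 1 selects the number of components by explained variance.
pca = PCA(n_components=0.90, svd_solver="full").fit(X_demo)

# Running total of the variance explained by the retained components.
cumulative = np.cumsum(pca.explained_variance_ratio_)

print("components kept:", pca.n_components_)
print("variance explained: %.3f" % cumulative[-1])
```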
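Note: As you experiment with this, you may want to use RandomizedPCA rather than PCA for this task. RandomizedPCA is an approximate method with the same interface as PCA, but it runs much more quickly. (RandomizedPCA was removed in scikit-learn 0.20; in newer versions, PCA(svd_solver='randomized') provides the same approach.)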
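For scikit-learn versions that no longer ship RandomizedPCA, a rough equivalent is the randomized solver of PCA itself. The sketch below fits both solvers on a random stand-in array (an assumption) to show that the fitted attributes have the same layout; n_components=10 and random_state=0 are illustrative choices.

```python
import numpy as np
from sklearn.decomposition import PCA

# Random stand-in for faces.data (assumption): 200 images of 50 x 37 pixels.
rng = np.random.RandomState(0)
X_demo = rng.rand(200, 50 * 37)

# Exact decomposition via a full SVD.
exact = PCA(n_components=10, svd_solver="full").fit(X_demo)

# Approximate decomposition via randomized SVD; random_state fixes the result.
approx = PCA(n_components=10, svd_solver="randomized", random_state=0).fit(X_demo)

# Both models expose the same attributes and methods.
print(exact.components_.shape, approx.components_.shape)
print(exact.transform(X_demo[:5]).shape, approx.transform(X_demo[:5]).shape)
```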