
Deep Autoencoders

by Khaled Nasr as a part of a GSoC 2014 project mentored by Theofanis Karaletsos and Sergey Lisitsyn

This notebook illustrates how to train and evaluate a deep autoencoder using Shogun.

Introduction

A (single layer) autoencoder is a neural network that has three layers: an input layer, a hidden (encoding) layer, and a decoding layer. The network is trained to reconstruct its inputs, which forces the hidden layer to try to learn good representations of the inputs.

In order to encourage the hidden layer to learn good input representations, certain variations on the simple autoencoder exist. Shogun currently supports two of them: Denoising Autoencoders [1] and Contractive Autoencoders [2]. In this notebook we'll focus on denoising autoencoders.

For denoising autoencoders, each time a new training example is introduced to the network, it is randomly corrupted in some manner, and the target is set to the original example. The autoencoder tries to recover the original data from its noisy version, which is why it's called a denoising autoencoder. This process forces the hidden layer to learn a good representation of the input, one that is not affected by the corruption process.
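To make the training signal concrete, here is a minimal NumPy sketch of a single layer denoising autoencoder. It is not Shogun's implementation: the toy data, the layer sizes, the 30% corruption level, and the learning rate are all assumptions chosen for illustration. The key point is that the network sees the corrupted input while the error is measured against the clean one.

```python
import numpy as np

rng = np.random.RandomState(0)

# toy data (an assumption): 200 examples with 8 features generated from
# 2 latent factors; examples are stored as columns
latent = rng.randn(2, 200)
mixing = rng.randn(8, 2)
X = mixing.dot(latent)

n_in, n_hidden = 8, 4
W_enc = 0.1 * rng.randn(n_hidden, n_in)
b_enc = np.zeros((n_hidden, 1))
W_dec = 0.1 * rng.randn(n_in, n_hidden)
b_dec = np.zeros((n_in, 1))

def reconstruct(X_in):
    H = np.maximum(0.0, W_enc.dot(X_in) + b_enc)  # rectified linear encoder
    return H, W_dec.dot(H) + b_dec                # linear decoder

def clean_error():
    # mean squared reconstruction error on the uncorrupted data
    return np.mean((reconstruct(X)[1] - X) ** 2)

error_before = clean_error()
learning_rate = 0.02
for epoch in range(1000):
    # corrupt the inputs: each entry is set to zero with probability 0.3
    X_noisy = X * (rng.rand(*X.shape) > 0.3)
    H, X_hat = reconstruct(X_noisy)
    # gradient of the mean squared error against the *clean* target
    d_out = 2.0 * (X_hat - X) / X.size
    d_hidden = W_dec.T.dot(d_out) * (H > 0)
    W_dec -= learning_rate * d_out.dot(H.T)
    b_dec -= learning_rate * d_out.sum(axis=1, keepdims=True)
    W_enc -= learning_rate * d_hidden.dot(X_noisy.T)
    b_enc -= learning_rate * d_hidden.sum(axis=1, keepdims=True)
error_after = clean_error()
```

Comparing error_before with error_after shows how much the reconstruction of the clean data improved, even though the network was only ever given corrupted inputs.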

A deep autoencoder is an autoencoder with multiple hidden layers. Training such autoencoders directly is usually difficult. However, they can be pre-trained as a stack of single layer autoencoders. That is, we train the first hidden layer to reconstruct the input data, then train the second hidden layer to reconstruct the states of the first hidden layer, and so on. After pre-training, we can train the entire deep autoencoder to fine-tune all the parameters together. We can also use the autoencoder to initialize a regular neural network and train it in a supervised manner.

In this notebook we'll apply deep autoencoders to the USPS dataset for handwritten digits. We'll start by loading the data and dividing it into a training set and a test set:

In [12]:

import numpy as np
from scipy.io import loadmat
from modshogun import RealFeatures, MulticlassLabels

# load the dataset
dataset = loadmat('../../../data/multiclass/usps.mat')

Xall = dataset['data']
# the usps dataset has the digits labeled from 1 to 10 
# we'll subtract 1 to make them in the 0-9 range instead
Yall = np.array(dataset['label'].squeeze(), dtype=np.double)-1 

# 7000 examples for training
Xtrain = RealFeatures(Xall[:,0:7000])
Ytrain = MulticlassLabels(Yall[0:7000])

# the rest for testing
# (note: the -1 end index leaves out the final example)
Xtest = RealFeatures(Xall[:,7000:-1])
Ytest = MulticlassLabels(Yall[7000:-1])

Creating the autoencoder

Similar to regular neural networks in Shogun, we create a deep autoencoder using an array of NeuralLayer-based classes, which can be created using the utility class NeuralLayers. However, for deep autoencoders there's a restriction that the layer sizes in the network have to be symmetric: the first layer has to have the same size as the last layer, the second layer has to have the same size as the second-to-last layer, and so on. This restriction is necessary for pre-training to work. More details on that can be found in the following section.

We'll create a 5-layer deep autoencoder with the following layer sizes: 256->512->128->512->256. We'll use rectified linear neurons for the hidden layers and linear neurons for the output layer.

In [13]:

from modshogun import NeuralLayers, DeepAutoencoder

layers = NeuralLayers()
layers = layers.input(256).rectified_linear(512).rectified_linear(128).rectified_linear(512).linear(256).done()

ae = DeepAutoencoder(layers)
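The symmetry restriction on the layer sizes is easy to check before building the network. The helper below is a hypothetical convenience function written for this notebook, not part of Shogun:

```python
def is_symmetric(layer_sizes):
    """Return True if the i-th layer from the start has the same size
    as the i-th layer from the end."""
    sizes = list(layer_sizes)
    return sizes == sizes[::-1]

# the architecture used in this notebook
symmetric = is_symmetric([256, 512, 128, 512, 256])   # True

# a layout that violates the restriction
asymmetric = is_symmetric([256, 512, 128, 256])       # False
```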

Pre-training

Now we can pre-train the network. To illustrate exactly what's going to happen, we'll give the layers some labels: L1 for the input layer, L2 for the first hidden layer, and so on up to L5 for the output layer.

In pre-training, an autoencoder is formed for each encoding layer (the layers up to and including the middle layer of the network, excluding the input layer). So here we'll have two autoencoders: L1->L2->L5 and L2->L3->L4. The first autoencoder will be trained on the raw data and used to initialize the weights and biases of layers L2 and L5 in the deep autoencoder. After the first autoencoder is trained, we use it to transform the raw data into the states of L2. These states will then be used to train the second autoencoder, which will be used to initialize the weights and biases of layers L3 and L4 in the deep autoencoder.
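The pairing of layers described above can be written down for any symmetric network. The function below is a hypothetical illustration (it is not a Shogun call) that lists, using the 1-based labels from this section, which layers make up each pre-training autoencoder:

```python
def pretraining_autoencoders(num_layers):
    """List the (input, hidden, output) layer labels of the single layer
    autoencoders formed during pre-training of a symmetric network."""
    num_encoding_layers = (num_layers - 1) // 2
    return [(i, i + 1, num_layers - i + 1)
            for i in range(1, num_encoding_layers + 1)]

# for our 5-layer network: L1->L2->L5 and L2->L3->L4
stack = pretraining_autoencoders(5)   # [(1, 2, 5), (2, 3, 4)]
```

Each triple reads as: train on the states of the first layer, learn the second layer as the encoder, and use the result to initialize the third layer as the matching decoder.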

The operations described above are performed by the pre_train() function. Pre-training parameters for each autoencoder can be controlled using the pt_* public attributes of DeepAutoencoder. Each of those attributes is an SGVector whose length is the number of autoencoders in the deep autoencoder (2 in our case). It can be used to set the parameters for each autoencoder individually. SGVector's set_const() method can also be used to assign the same parameter value to all autoencoders.

Different noise types can be used to corrupt the inputs in a denoising autoencoder. Shogun currently supports two noise types: dropout noise, where a random portion of the inputs is set to zero at each iteration in training, and Gaussian noise, where the inputs are corrupted with random Gaussian noise. The noise type and strength can be controlled using pt_noise_type and pt_noise_parameter. Here, we'll use dropout noise.
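As a rough picture of what the two noise types do to an input, here is a NumPy sketch. The function names are made up for this illustration, and treating the Gaussian noise parameter as a standard deviation is an assumption rather than something stated above.

```python
import numpy as np

def dropout_noise(x, p, rng):
    """Set each entry of x to zero with probability p."""
    return x * (rng.rand(*x.shape) >= p)

def gaussian_noise(x, std, rng):
    """Add zero-mean Gaussian noise with standard deviation std
    (interpreting the noise strength as a std is an assumption)."""
    return x + std * rng.randn(*x.shape)

rng = np.random.RandomState(0)
x = np.ones((256, 1000))           # 1000 dummy examples stored as columns

dropped = dropout_noise(x, 0.5, rng)
fraction_zeroed = np.mean(dropped == 0)   # close to 0.5

noisy = gaussian_noise(x, 0.1, rng)
```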

In [14]:

from modshogun import AENT_DROPOUT, NNOM_GRADIENT_DESCENT, MSG_INFO

ae.pt_noise_type.set_const(AENT_DROPOUT) # use dropout noise
ae.pt_noise_parameter.set_const(0.5) # each input has a 50% chance of being set to zero

ae.pt_optimization_method.set_const(NNOM_GRADIENT_DESCENT) # train using gradient descent
ae.pt_gd_learning_rate.set_const(0.01)
ae.pt_gd_mini_batch_size.set_const(128)

ae.pt_max_num_epochs.set_const(100)
ae.pt_epsilon.set_const(0.0) # disable automatic convergence testing

# allow the INFO messages to be printed to the console, useful for monitoring training progress
ae.io.set_loglevel(MSG_INFO) 

# start pre-training. this might take some time
ae.pre_train(Xtrain)
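To get a feel for why pre-training takes a while, we can count the gradient updates implied by the settings above. This is a back-of-the-envelope sketch that assumes plain mini-batch gradient descent in which a trailing partial batch also counts as one update; how Shogun treats the last batch is an assumption here.

```python
import math

def num_mini_batches(num_examples, mini_batch_size):
    """Number of gradient updates in one epoch, counting a final partial batch."""
    return int(math.ceil(num_examples / float(mini_batch_size)))

updates_per_epoch = num_mini_batches(7000, 128)   # 55
total_updates = updates_per_epoch * 100           # 5500 for each autoencoder
```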

Fine-tuning

After pre-training, we can train the autoencoder as a whole to fine-tune the parameters. Training the whole autoencoder is performed using the train() function. Training parameters are controlled through the public attributes, just as for a regular neural network.

In [15]:

ae.noise_type = AENT_DROPOUT # same noise type we used for pre-training
ae.noise_parameter = 0.5

ae.max_num_epochs = 100
ae.optimization_method = NNOM_GRADIENT_DESCENT
ae.gd_mini_batch_size = 128
ae.gd_learning_rate = 0.0001
ae.epsilon = 0.0

# start fine-tuning. this might take some time
_ = ae.train(Xtrain)

Evaluation

Now we can evaluate the autoencoder that we trained. We'll start by providing it with corrupted inputs and looking at how it will reconstruct them. The function reconstruct() is used to obtain the reconstructions:

In [16]:

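import numpy as np
from pylab import figure, subplot, cm

# get a 50-example subset of the test set
subset = Xtest[:,0:50].copy()

# corrupt the first 25 examples with multiplicative noise:
# each pixel is kept with probability 0.5 and set to zero otherwise
subset[:,0:25] *= (np.random.random((256,25))>0.5)

# corrupt the other 25 examples with additive uniform noise in [0, 1)
subset[:,25:50] += np.random.random((256,25))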

# obtain the reconstructions
reconstructed_subset = ae.reconstruct(RealFeatures(subset))

# plot the corrupted data and the reconstructions
figure(figsize=(10,10))
for i in range(50):
    ax1=subplot(10,10,i*2+1)
    ax1.imshow(subset[:,i].reshape((16,16)), interpolation='nearest', cmap = cm.Greys_r)
    ax1.set_xticks([])
    ax1.set_yticks([])

    ax2=subplot(10,10,i*2+2)
    ax2.imshow(reconstructed_subset[:,i].reshape((16,16)), interpolation='nearest', cmap = cm.Greys_r)
    ax2.set_xticks([])
    ax2.set_yticks([])

The figure shows the corrupted examples and their reconstructions. In each pair, the corrupted example is on the left and its reconstruction is on the right. The top half of the figure shows the examples corrupted with multiplicative noise, and the bottom half shows the ones corrupted with additive noise. We can see that the autoencoder provides decent reconstructions despite the heavy noise.
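Besides inspecting the figure, we could put a number on reconstruction quality, for example the mean squared error of each example against its clean version. The sketch below uses small placeholder arrays; with the data from this notebook one would pass the clean test examples and the reconstructions as NumPy matrices with one example per column.

```python
import numpy as np

def per_example_mse(clean, reconstructed):
    """Mean squared error of each column (example) of two equally shaped arrays."""
    clean = np.asarray(clean, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    return np.mean((clean - reconstructed) ** 2, axis=0)

# two placeholder examples with two features each
clean = np.array([[0.0, 1.0],
                  [1.0, 0.0]])
reconstructed = np.array([[0.0, 0.5],
                          [1.0, 0.5]])

errors = per_example_mse(clean, reconstructed)   # [0.0, 0.25]
```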

Next we'll look at the weights that the first hidden layer has learned. To obtain the weights, we can call the get_layer_parameters() function, which will return a vector containing both the weights and the biases of the layer. The biases are stored first in the array followed by the weights matrix in column-major format.
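The layout of the parameter vector is easier to see on a tiny example. The sketch below builds such a vector by hand for a made-up layer with 3 neurons and 2 inputs, assuming the weight matrix has one row per neuron, and then recovers the weights in two equivalent ways: by asking NumPy for column-major order directly, or by reshaping with the dimensions swapped and transposing.

```python
import numpy as np

num_neurons, num_inputs = 3, 2
biases = np.array([10.0, 20.0, 30.0])
# weights[i, j] connects input j to neuron i (an assumed convention)
weights = np.array([[1.0, 2.0],
                    [3.0, 4.0],
                    [5.0, 6.0]])

# biases first, then the weight matrix flattened column by column
params = np.concatenate([biases, weights.flatten(order='F')])

# option 1: tell NumPy the data is in column-major order
recovered_f = params[num_neurons:].reshape((num_neurons, num_inputs), order='F')

# option 2: reshape with the dimensions swapped, then transpose
recovered_t = params[num_neurons:].reshape((num_inputs, num_neurons)).T
```

Both options give back the original weights matrix; the second one is the form used in this notebook.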

In [17]:

# obtain the weights matrix of the first hidden layer
# the 512 is the number of biases in the layer (512 neurons)
# the transpose is because numpy stores matrices in row-major format, and Shogun stores 
# them in column major format
w1 = ae.get_layer_parameters(1)[512:].reshape(256,512).T

# visualize the weights between the first 100 neurons in the hidden layer 
# and the neurons in the input layer
figure(figsize=(10,10))
for i in range(100):
    ax1=subplot(10,10,i+1)
    ax1.imshow(w1[i,:].reshape((16,16)), interpolation='nearest', cmap = cm.Greys_r)
    ax1.set_xticks([])
    ax1.set_yticks([])

Now, we can use the autoencoder to initialize a supervised neural network. The network will have all the layers of the autoencoder up to (and including) the middle layer. We'll also add a softmax output layer. So, the network will look like: L1->L2->L3->Softmax. The network is obtained by calling convert_to_neural_network():

In [18]:

from modshogun import NeuralSoftmaxLayer

nn = ae.convert_to_neural_network(NeuralSoftmaxLayer(10))

nn.max_num_epochs = 50

nn.io.set_loglevel(MSG_INFO)

nn.set_labels(Ytrain)
_ = nn.train(Xtrain)
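For readers unfamiliar with the softmax layer added above, the following NumPy sketch shows the standard softmax function: it turns a column of raw activations into class probabilities that sum to one, and the predicted class is the one with the highest probability. This is the textbook definition, not code taken from Shogun, and the activations are made-up values.

```python
import numpy as np

def softmax(activations):
    """Column-wise softmax; subtracting the maximum keeps exp() from overflowing."""
    shifted = activations - activations.max(axis=0, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=0, keepdims=True)

# made-up activations for 3 classes (rows) and 2 examples (columns)
scores = np.array([[2.0, 0.0],
                   [1.0, 0.0],
                   [0.0, 3.0]])

probabilities = softmax(scores)
predicted = probabilities.argmax(axis=0)   # [0, 2]
```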

Next, we'll evaluate the accuracy on the test set:

In [19]:

from modshogun import MulticlassAccuracy

predictions = nn.apply_multiclass(Xtest)
accuracy = MulticlassAccuracy().evaluate(predictions, Ytest) * 100

print("Classification accuracy on the test set = %s %%" % accuracy)

Classification accuracy on the test set = 94.6451893774 %
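The accuracy reported above is simply the percentage of test examples whose predicted label matches the true label. A minimal NumPy equivalent, written here as a hypothetical helper on plain label arrays rather than Shogun label objects, looks like this:

```python
import numpy as np

def accuracy_percent(predicted, actual):
    """Percentage of positions where the predicted label equals the true label."""
    predicted = np.asarray(predicted)
    actual = np.asarray(actual)
    return 100.0 * np.mean(predicted == actual)

# three of the four made-up predictions are correct
example_accuracy = accuracy_percent([0, 1, 2, 3], [0, 1, 2, 4])   # 75.0
```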