cs231n 2019. A1, part 5. Image features exercise

Solution by Yury Kashnitsky (@yorko)

Complete and hand in this worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details, see the assignments page on the course website.

We have seen that we can achieve reasonable performance on an image classification task by training a linear classifier on the pixels of the input image. In this exercise we will show that we can improve our classification performance by training linear classifiers not on raw pixels but on features that are computed from the raw pixels.

All of your work for this exercise will be done in this notebook.

In [1]:
import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

Load data

Similar to previous exercises, we will load CIFAR-10 data from disk.

In [2]:
from cs231n.features import color_histogram_hsv, hog_feature

PATH_TO_CIFAR = '/home/yorko/data/cifar-10-batches-py/'

def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000):
    # Load the raw CIFAR-10 data
    X_train, y_train, X_test, y_test = load_CIFAR10(PATH_TO_CIFAR)
    
    # Subsample the data
    mask = list(range(num_training, num_training + num_validation))
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = list(range(num_training))
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = list(range(num_test))
    X_test = X_test[mask]
    y_test = y_test[mask]
    
    return X_train, y_train, X_val, y_val, X_test, y_test

X_train, y_train, X_val, y_val, X_test, y_test = get_CIFAR10_data()

Extract Features

For each image we will compute a Histogram of Oriented Gradients (HOG) as well as a color histogram using the hue channel in HSV color space. We form our final feature vector for each image by concatenating the HOG and color histogram feature vectors.

Roughly speaking, HOG should capture the texture of the image while ignoring color information, and the color histogram represents the color of the input image while ignoring texture. As a result, we expect that using both together will work better than using either alone. Verifying this assumption would be a good thing to try for the bonus section; a quick sketch of such a check appears after the SVM test-set evaluation below.

The hog_feature and color_histogram_hsv functions both operate on a single image and return a feature vector for that image. The extract_features function takes a set of images and a list of feature functions and evaluates each feature function on each image, storing the results in a matrix where each row is the concatenation of all feature vectors for a single image.
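
For intuition, here is a minimal sketch of what a hue-histogram feature function might look like; this is an illustration only, and the course's actual color_histogram_hsv in cs231n/features.py differs in details:

import numpy as np
import matplotlib.colors as mcolors

def hue_histogram(img, nbin=10):
    # img: H x W x 3 RGB array with values in [0, 255]
    hsv = mcolors.rgb_to_hsv(img / 255.0)  # hue channel lies in [0, 1]
    hist, _ = np.histogram(hsv[:, :, 0], bins=nbin, range=(0, 1), density=True)
    return hist  # one nbin-dimensional feature vector per image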

In [3]:
%%time
from cs231n.features import *

num_color_bins = 10 # Number of bins in the color histogram
feature_fns = [hog_feature, lambda img: color_histogram_hsv(img, nbin=num_color_bins)]
X_train_feats = extract_features(X_train, feature_fns, verbose=True)
X_val_feats = extract_features(X_val, feature_fns)
X_test_feats = extract_features(X_test, feature_fns)

# Preprocessing: Subtract the mean feature
mean_feat = np.mean(X_train_feats, axis=0, keepdims=True)
X_train_feats -= mean_feat
X_val_feats -= mean_feat
X_test_feats -= mean_feat

# Preprocessing: Divide by standard deviation. This ensures that each feature
# has roughly the same scale.
std_feat = np.std(X_train_feats, axis=0, keepdims=True)
X_train_feats /= std_feat
X_val_feats /= std_feat
X_test_feats /= std_feat

# Preprocessing: Add a bias dimension
X_train_feats = np.hstack([X_train_feats, np.ones((X_train_feats.shape[0], 1))])
X_val_feats = np.hstack([X_val_feats, np.ones((X_val_feats.shape[0], 1))])
X_test_feats = np.hstack([X_test_feats, np.ones((X_test_feats.shape[0], 1))])
Done extracting features for 1000 / 49000 images
Done extracting features for 2000 / 49000 images
...
Done extracting features for 48000 / 49000 images
CPU times: user 1min 15s, sys: 251 ms, total: 1min 15s
Wall time: 1min 26s

Train SVM on features

Using the multiclass SVM code developed earlier in the assignment, train SVMs on top of the features extracted above; this should achieve better results than training SVMs directly on top of raw pixels.

In [4]:
%%time
# Use the validation set to tune the learning rate and regularization strength

from cs231n.classifiers.linear_classifier import LinearSVM
from tqdm import tqdm_notebook

learning_rates = [1e-9, 1e-8, 1e-7]
regularization_strengths = [5e4, 5e5, 5e6]

results = {}
best_val = -1
best_svm = None

################################################################################
# TODO:                                                                        #
# Use the validation set to set the learning rate and regularization strength. #
# This should be identical to the validation that you did for the SVM; save    #
# the best trained classifier in best_svm. You might also want to play         #
# with different numbers of bins in the color histogram. If you are careful    #
# you should be able to get accuracy near 0.44 on the validation set.          #
################################################################################
for lr in tqdm_notebook(learning_rates):
    for reg in tqdm_notebook(regularization_strengths):
        svm = LinearSVM()
        _ = svm.train(X_train_feats, y_train, learning_rate=lr, 
                      reg=reg,
                      num_iters=1500, verbose=False)
        y_train_pred = svm.predict(X_train_feats)
        train_acc = np.mean(y_train == y_train_pred)
        y_val_pred = svm.predict(X_val_feats)
        val_acc = np.mean(y_val == y_val_pred)
        results[(lr, reg)] = (train_acc, val_acc)
        if val_acc > best_val:
            best_val = val_acc
            best_svm = svm
################################################################################
#                              END OF YOUR CODE                                #
################################################################################

# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy))
    
print('best validation accuracy achieved: %f' % best_val)
lr 1.000000e-09 reg 5.000000e+04 train accuracy: 0.089898 val accuracy: 0.092000
lr 1.000000e-09 reg 5.000000e+05 train accuracy: 0.098245 val accuracy: 0.089000
lr 1.000000e-09 reg 5.000000e+06 train accuracy: 0.415429 val accuracy: 0.417000
lr 1.000000e-08 reg 5.000000e+04 train accuracy: 0.089347 val accuracy: 0.082000
lr 1.000000e-08 reg 5.000000e+05 train accuracy: 0.412510 val accuracy: 0.408000
lr 1.000000e-08 reg 5.000000e+06 train accuracy: 0.402020 val accuracy: 0.381000
lr 1.000000e-07 reg 5.000000e+04 train accuracy: 0.413551 val accuracy: 0.417000
lr 1.000000e-07 reg 5.000000e+05 train accuracy: 0.417755 val accuracy: 0.419000
lr 1.000000e-07 reg 5.000000e+06 train accuracy: 0.300388 val accuracy: 0.274000
best validation accuracy achieved: 0.419000
CPU times: user 32min 17s, sys: 16.3 s, total: 32min 33s
Wall time: 8min 22s
In [5]:
# Evaluate your trained SVM on the test set
y_test_pred = best_svm.predict(X_test_feats)
test_accuracy = np.mean(y_test == y_test_pred)
print(test_accuracy)
0.43
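
To verify the assumption from the Extract Features section (the bonus suggestion), one could train the same LinearSVM on each feature type alone and compare. A minimal sketch, reusing the (lr, reg) pair that won above rather than re-tuning per feature set; the preprocessing mirrors the earlier cell (in practice a small epsilon would guard against zero-std features):

for name, fns in [('HOG only', [hog_feature]),
                  ('color only', [lambda img: color_histogram_hsv(img, nbin=num_color_bins)])]:
    # Extract features, standardize with training statistics, append a bias dim
    Xtr = extract_features(X_train, fns)
    Xval = extract_features(X_val, fns)
    mean = Xtr.mean(axis=0, keepdims=True)
    std = Xtr.std(axis=0, keepdims=True)
    Xtr, Xval = (Xtr - mean) / std, (Xval - mean) / std
    Xtr = np.hstack([Xtr, np.ones((Xtr.shape[0], 1))])
    Xval = np.hstack([Xval, np.ones((Xval.shape[0], 1))])
    clf = LinearSVM()
    clf.train(Xtr, y_train, learning_rate=1e-7, reg=5e5, num_iters=1500, verbose=False)
    print('%s: validation accuracy %.3f' % (name, (clf.predict(Xval) == y_val).mean()))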
In [6]:
# An important way to gain intuition about how an algorithm works is to
# visualize the mistakes that it makes. In this visualization, we show examples
# of images that are misclassified by our current system. The first column
# shows images that our system labeled as "plane" but whose true label is
# something other than "plane".

examples_per_class = 8
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for cls, cls_name in enumerate(classes):
    idxs = np.where((y_test != cls) & (y_test_pred == cls))[0]
    idxs = np.random.choice(idxs, examples_per_class, replace=False)
    for i, idx in enumerate(idxs):
        plt.subplot(examples_per_class, len(classes), i * len(classes) + cls + 1)
        plt.imshow(X_test[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls_name)
plt.show()

Inline question 1

Describe the misclassification results that you see. Do they make sense?

$\color{blue}{\textit Your Answer:}$ The mistakes made by the algorithm are often understandable: trucks are classified as cars, deer are confused with dogs, and so on. However, I've also noticed several cases where the SVM makes seemingly silly mistakes.
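
To back this up with numbers, a confusion matrix over the test predictions (a small follow-up sketch, not part of the original assignment) shows which wrong label each class most often receives:

num_classes = len(classes)
conf_mat = np.zeros((num_classes, num_classes), dtype=int)
for t, p in zip(y_test, y_test_pred):
    conf_mat[t, p] += 1  # rows: true class, columns: predicted class

# For every true class, report its most frequent wrong prediction
for i, name in enumerate(classes):
    row = conf_mat[i].copy()
    row[i] = 0  # ignore correct predictions
    j = row.argmax()
    print('%s is most often mislabeled as %s (%d of %d)' % (
        name, classes[j], row[j], conf_mat[i].sum()))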

Neural Network on image features

Earlier in this assignment we saw that training a two-layer neural network on raw pixels achieved better classification performance than linear classifiers on raw pixels. In this notebook we have seen that linear classifiers on image features outperform linear classifiers on raw pixels.

For completeness, we should also try training a neural network on image features. This approach should outperform all previous approaches: you should easily be able to achieve over 55% classification accuracy on the test set; our best model achieves about 60% classification accuracy.

In [7]:
print(X_train_feats.shape)
(49000, 155)
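
With the default settings, the 155 dimensions break down as 144 HOG features (a 4×4 grid of 8×8-pixel cells with 9 orientation bins each), 10 color-histogram bins, and 1 bias dimension.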
In [8]:
%%time
from cs231n.classifiers.neural_net import TwoLayerNet

input_dim = X_train_feats.shape[1]
num_classes = 10
best_net = None
results = {}
best_val_acc = -1
learning_rates = np.linspace(1e-1, 1, 5)
regularization_strengths = np.linspace(1e-4, 1e-3, 3)
hidden_sizes = [300, 400, 500]

################################################################################
# TODO: Train a two-layer neural network on image features. You may want to    #
# cross-validate various parameters as in previous sections. Store your best   #
# model in the best_net variable.                                              #
################################################################################
best_params = None
for lr in tqdm_notebook(learning_rates):
    for reg in tqdm_notebook(regularization_strengths):
        for hidden_size in tqdm_notebook(hidden_sizes):
            print('lr: {}, reg: {}, hidden size: {}'.format(lr, reg, hidden_size))
            net = TwoLayerNet(input_dim, hidden_size, num_classes)
            stats = net.train(X_train_feats, y_train, X_val_feats, y_val,
                    num_iters=1500, batch_size=200,
                    learning_rate=lr, learning_rate_decay=0.95,
                    reg=reg, verbose=False)
            y_train_pred = net.predict(X_train_feats)
            train_acc = np.mean(y_train == y_train_pred)

            val_acc = (net.predict(X_val_feats) == y_val).mean()
            print('Validation accuracy: ', val_acc)
            results[(lr, reg, hidden_size)] = val_acc
            if val_acc > best_val_acc:
                best_val_acc = val_acc
                best_net = net   
                best_params = (lr, reg, hidden_size)
################################################################################
#                              END OF YOUR CODE                                #
################################################################################
lr: 0.1, reg: 0.0001, hidden size: 300
Validation accuracy:  0.527
lr: 0.1, reg: 0.0001, hidden size: 400
Validation accuracy:  0.521
lr: 0.1, reg: 0.0001, hidden size: 500
Validation accuracy:  0.528
lr: 0.1, reg: 0.00055, hidden size: 300
Validation accuracy:  0.523
lr: 0.1, reg: 0.00055, hidden size: 400
Validation accuracy:  0.513
lr: 0.1, reg: 0.00055, hidden size: 500
Validation accuracy:  0.528
lr: 0.1, reg: 0.001, hidden size: 300
Validation accuracy:  0.534
lr: 0.1, reg: 0.001, hidden size: 400
Validation accuracy:  0.518
lr: 0.1, reg: 0.001, hidden size: 500
Validation accuracy:  0.512
lr: 0.325, reg: 0.0001, hidden size: 300
Validation accuracy:  0.582
lr: 0.325, reg: 0.0001, hidden size: 400
Validation accuracy:  0.581
lr: 0.325, reg: 0.0001, hidden size: 500
Validation accuracy:  0.579
lr: 0.325, reg: 0.00055, hidden size: 300
Validation accuracy:  0.596
lr: 0.325, reg: 0.00055, hidden size: 400
Validation accuracy:  0.581
lr: 0.325, reg: 0.00055, hidden size: 500
Validation accuracy:  0.581
lr: 0.325, reg: 0.001, hidden size: 300
Validation accuracy:  0.572
lr: 0.325, reg: 0.001, hidden size: 400
Validation accuracy:  0.582
lr: 0.325, reg: 0.001, hidden size: 500
Validation accuracy:  0.58
lr: 0.55, reg: 0.0001, hidden size: 300
Validation accuracy:  0.582
lr: 0.55, reg: 0.0001, hidden size: 400
Validation accuracy:  0.574
lr: 0.55, reg: 0.0001, hidden size: 500
Validation accuracy:  0.585
lr: 0.55, reg: 0.00055, hidden size: 300
Validation accuracy:  0.566
lr: 0.55, reg: 0.00055, hidden size: 400
Validation accuracy:  0.58
lr: 0.55, reg: 0.00055, hidden size: 500
Validation accuracy:  0.579
lr: 0.55, reg: 0.001, hidden size: 300
Validation accuracy:  0.573
lr: 0.55, reg: 0.001, hidden size: 400
Validation accuracy:  0.574
lr: 0.55, reg: 0.001, hidden size: 500
Validation accuracy:  0.573
lr: 0.775, reg: 0.0001, hidden size: 300
Validation accuracy:  0.568
lr: 0.775, reg: 0.0001, hidden size: 400
Validation accuracy:  0.591
lr: 0.775, reg: 0.0001, hidden size: 500
Validation accuracy:  0.57
lr: 0.775, reg: 0.00055, hidden size: 300
Validation accuracy:  0.576
lr: 0.775, reg: 0.00055, hidden size: 400
Validation accuracy:  0.593
lr: 0.775, reg: 0.00055, hidden size: 500
Validation accuracy:  0.553
lr: 0.775, reg: 0.001, hidden size: 300
Validation accuracy:  0.584
lr: 0.775, reg: 0.001, hidden size: 400
Validation accuracy:  0.596
lr: 0.775, reg: 0.001, hidden size: 500
Validation accuracy:  0.558
lr: 1.0, reg: 0.0001, hidden size: 300
Validation accuracy:  0.568
lr: 1.0, reg: 0.0001, hidden size: 400
Validation accuracy:  0.577
lr: 1.0, reg: 0.0001, hidden size: 500
Validation accuracy:  0.576
lr: 1.0, reg: 0.00055, hidden size: 300
Validation accuracy:  0.574
lr: 1.0, reg: 0.00055, hidden size: 400
Validation accuracy:  0.584
lr: 1.0, reg: 0.00055, hidden size: 500
Validation accuracy:  0.572
lr: 1.0, reg: 0.001, hidden size: 300
Validation accuracy:  0.559
lr: 1.0, reg: 0.001, hidden size: 400
Validation accuracy:  0.57
lr: 1.0, reg: 0.001, hidden size: 500
Validation accuracy:  0.568

CPU times: user 49min 8s, sys: 10.6 s, total: 49min 19s
Wall time: 10min 35s
In [9]:
best_params
Out[9]:
(0.325, 0.00055, 300)
In [10]:
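# Retrain the best configuration from scratch with more iterations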
best_net = TwoLayerNet(input_dim, best_params[2], num_classes)
stats = best_net.train(X_train_feats, y_train, X_val_feats, y_val,
            num_iters=5000, batch_size=200,
            learning_rate=best_params[0], learning_rate_decay=0.95,
            reg=best_params[1], verbose=True)
iteration 0 / 5000: loss 2.302585
iteration 100 / 5000: loss 1.684012
iteration 200 / 5000: loss 1.378974
...
iteration 4800 / 5000: loss 0.937955
iteration 4900 / 5000: loss 0.828080
In [11]:
# Plot the loss function and train / validation accuracies
plt.subplot(2, 1, 1)
plt.plot(stats['loss_history'])
plt.title('Loss history')
plt.xlabel('Iteration')
plt.ylabel('Loss')

plt.subplot(2, 1, 2)
plt.plot(stats['train_acc_history'], label='train')
plt.plot(stats['val_acc_history'], label='val')
plt.title('Classification accuracy history')
plt.xlabel('Epoch')
plt.ylabel('Classification accuracy')
plt.show()
In [12]:
# Run your neural net classifier on the test set. You should be able to
# get more than 55% accuracy.

test_acc = (best_net.predict(X_test_feats) == y_test).mean()
print(test_acc)
0.586