Classifying ImageNet: the instant Caffe way

Caffe provides a general Python interface for models through caffe.Net in python/caffe/, but to make off-the-shelf classification easy we also provide a caffe.Classifier class and script. Both Python and MATLAB wrappers are provided; the Python wrapper has more features, so we describe it here. For MATLAB, refer to matlab/caffe/matcaffe_demo.m.

Before we begin, you must compile Caffe and install the python wrapper by setting your PYTHONPATH. If you haven't yet done so, please refer to the installation instructions. This example uses our pre-trained CaffeNet model, an ILSVRC12 image classifier. You can download it by running ./scripts/ models/bvlc_reference_caffenet.

Ready? Let's start.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Make sure that caffe is on the python path:
caffe_root = '../'  # this file is expected to be in {caffe_root}/examples
import sys
sys.path.insert(0, caffe_root + 'python')

import caffe

# Set the right path to your model definition file, pretrained model weights,
# and the image you would like to classify.
MODEL_FILE = '../models/bvlc_reference_caffenet/deploy.prototxt'
PRETRAINED = '../models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel'
IMAGE_FILE = 'images/cat.jpg'

Loading a network is easy: caffe.Classifier takes care of everything. Note the arguments for configuring input preprocessing: mean subtraction is switched on by giving a mean array, input channel swapping maps RGB into the reference ImageNet model's BGR order, and raw scaling multiplies the feature scale from the input [0,1] to the ImageNet model's [0,255].

In [2]:
net = caffe.Classifier(MODEL_FILE, PRETRAINED,
                       mean=np.load(caffe_root + 'python/caffe/imagenet/ilsvrc_2012_mean.npy'),
                       channel_swap=(2, 1, 0),  # RGB -> BGR
                       raw_scale=255,           # [0,1] -> [0,255]
                       image_dims=(256, 256))
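
To make the three preprocessing options concrete, here is a minimal NumPy sketch of what they amount to. It is not Caffe's actual implementation: preprocess_sketch is a hypothetical helper, the order of operations is an assumption, and the random image and constant mean are made-up stand-ins for a real image and the ILSVRC12 mean array.

```python
import numpy as np

def preprocess_sketch(image, mean, channel_swap=(2, 1, 0), raw_scale=255.0):
    """Hypothetical helper: H x W x 3 RGB in [0, 1] -> 3 x H x W BGR, mean-subtracted."""
    chw = image.astype(np.float32).transpose(2, 0, 1)  # H x W x K -> K x H x W
    chw = chw[list(channel_swap), :, :]                 # reorder channels: RGB -> BGR
    chw = chw * raw_scale                               # rescale [0, 1] -> [0, 255]
    return chw - mean                                   # subtract the mean (broadcasts)

rng = np.random.RandomState(0)
image = rng.rand(4, 4, 3)                               # stand-in for a loaded image
mean = np.full((3, 1, 1), 100.0, dtype=np.float32)      # stand-in for the mean array
out = preprocess_sketch(image, mean)
print(out.shape)
```

After the swap, channel 0 of the result holds what was the blue channel of the input, scaled to [0,255] and shifted by the mean.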

We will set the phase to test since we are doing testing, and will first use CPU for the computation.

In [3]:
net.set_phase_test()
net.set_mode_cpu()

Let's take a look at our example image with Caffe's image loading helper.

In [4]:
input_image = caffe.io.load_image(IMAGE_FILE)
plt.imshow(input_image)
<matplotlib.image.AxesImage at 0x7fda204c0e10>

Time to classify. The default is to actually do 10 predictions, cropping the center and corners of the image as well as their mirrored versions, and average over the predictions:

In [5]:
# predict takes any number of images, and formats them for the Caffe net automatically
prediction = net.predict([input_image])
print 'prediction shape:', prediction[0].shape
print 'predicted class:', prediction[0].argmax()
prediction shape: (1000,)
predicted class: 281

You can see that the prediction is 1000-dimensional, and is pretty sparse.
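
The ten-crop averaging described above can be sketched in plain NumPy. This is an illustration under stated assumptions, not the wrapper's own code: ten_crop_sketch is a hypothetical helper, the 227 x 227 crop size is assumed to match the network input, and the per-crop scores are random stand-ins for real network outputs.

```python
import numpy as np

def ten_crop_sketch(image, crop_dims):
    """Hypothetical helper: four corner crops + center crop, plus their mirrors."""
    h, w = image.shape[:2]
    ch, cw = crop_dims
    corners = [(t, l) for t in (0, h - ch) for l in (0, w - cw)]
    center = ((h - ch) // 2, (w - cw) // 2)
    crops = [image[t:t + ch, l:l + cw] for (t, l) in corners + [center]]
    crops += [c[:, ::-1] for c in crops]      # horizontally mirrored copies
    return np.stack(crops)

image = np.arange(256 * 256 * 3, dtype=np.float32).reshape(256, 256, 3)
crops = ten_crop_sketch(image, (227, 227))
print(crops.shape)                            # (10, 227, 227, 3)

# Stand-in scores: one 1000-dimensional prediction per crop.
per_crop_scores = np.random.RandomState(0).rand(10, 1000)
averaged = per_crop_scores.mean(axis=0)       # average over the ten crops
print(averaged.shape)                         # (1000,)
```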

The predicted class 281 is "Tabby cat." Our pretrained model uses the synset ID ordering of the classes, as listed in ../data/ilsvrc12/synset_words.txt, which is available once you fetch the auxiliary ImageNet data with the script in ../data/ilsvrc12/. If you look at the top indices that maximize the prediction score, they are cats, foxes, and other cute mammals. Not unreasonable predictions, right?
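
One way to pull out the top-scoring indices is np.argsort. The sketch below uses a made-up score vector in place of prediction[0], and top_k is a hypothetical helper, so the numbers are illustrative only.

```python
import numpy as np

def top_k(scores, k=5):
    """Hypothetical helper: (class index, score) pairs, highest score first."""
    order = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in order]

# Made-up scores standing in for prediction[0].
scores = np.zeros(1000)
scores[281] = 0.30
scores[282] = 0.20
scores[285] = 0.10

top3 = top_k(scores, k=3)
print(top3)
```

With a real prediction, each index can then be matched to the corresponding line of synset_words.txt.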

Now let's classify using the center crop alone by turning off oversampling. Note that this produces a single input, although if you inspect the model definition prototxt you'll see the network has a batch size of 10. The Python wrapper handles batching and padding for you!

In [6]:
prediction = net.predict([input_image], oversample=False)
print 'prediction shape:', prediction[0].shape
print 'predicted class:', prediction[0].argmax()
prediction shape: (1000,)
predicted class: 281
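
The batching and padding mentioned above can be pictured roughly as follows. This is a sketch of the general idea, not the wrapper's code: pad_to_batches and fake_forward are hypothetical names, and fake_forward is a stand-in workload rather than a network.

```python
import numpy as np

def pad_to_batches(inputs, batch_size):
    """Hypothetical helper: zero-pad N inputs up to a multiple of batch_size."""
    n = inputs.shape[0]
    n_pad = (-n) % batch_size
    padding = np.zeros((n_pad,) + inputs.shape[1:], dtype=inputs.dtype)
    padded = np.concatenate([inputs, padding], axis=0)
    batches = padded.reshape((-1, batch_size) + inputs.shape[1:])
    return batches, n

def fake_forward(batch):
    # Stand-in for a network forward pass: one output row per input.
    return batch.reshape(batch.shape[0], -1).sum(axis=1, keepdims=True)

single = np.ones((1, 3, 8, 8), dtype=np.float32)        # one preprocessed input
batches, n_real = pad_to_batches(single, batch_size=10)
outputs = np.concatenate([fake_forward(b) for b in batches], axis=0)[:n_real]
print(batches.shape)   # (1, 10, 3, 8, 8)
print(outputs.shape)   # (1, 1)
```

The outputs for the padded entries are computed and then discarded, which is why a single image still costs a full batch of work.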

Now, why don't we see how long it takes to perform the classification end to end? This result was obtained on an Intel i5 CPU, so your timings may differ.

In [7]:
%timeit net.predict([input_image])
1 loops, best of 3: 355 ms per loop

It may look a little slow, but note that this time includes cropping, Python interfacing, and running 10 images. If you really need fast prediction, you can code in C++ and pipeline the operations better. For experimenting and prototyping, the current speed is fine.

Let's time classifying a single image with input preprocessed:

In [8]:
# Resize the image to the standard (256, 256) and oversample net input sized crops.
input_oversampled = caffe.io.oversample([caffe.io.resize_image(input_image, net.image_dims)], net.crop_dims)
# 'data' is the input blob name in the model definition, so we preprocess for that input.
caffe_input = np.asarray([net.preprocess('data', in_) for in_ in input_oversampled])
# forward() takes keyword args for the input blobs with preprocessed input arrays.
%timeit net.forward(data=caffe_input)
1 loops, best of 3: 210 ms per loop
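
Outside IPython the %timeit magic is not available. A rough equivalent with the standard-library timeit module might look like the sketch below, where fake_forward is a hypothetical stand-in for net.forward and the input shape is assumed, so the measured time says nothing about Caffe itself.

```python
import timeit
import numpy as np

def fake_forward(batch):
    # Stand-in workload, NOT a Caffe call.
    return batch.reshape(batch.shape[0], -1).sum(axis=1)

batch = np.ones((10, 3, 227, 227), dtype=np.float32)

# Three repetitions of one call each, keeping the fastest ("best of 3").
runs = timeit.repeat(lambda: fake_forward(batch), repeat=3, number=1)
best_ms = min(runs) * 1000.0
print('best of 3: %.3f ms per loop' % best_ms)
```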

OK, so how about GPU? It is actually pretty easy:

In [9]:
net.set_mode_gpu()

Voila! Now we are in GPU mode. Let's see if the code gives the same result:

In [10]:
prediction = net.predict([input_image])
print 'prediction shape:', prediction[0].shape
plt.plot(prediction[0])
prediction shape: (1000,)
[<matplotlib.lines.Line2D at 0x7fda1ac309d0>]

Good, everything is the same. And how about time consumption? The following benchmark is obtained on the same machine with a GTX 770 GPU:

In [11]:
# Full pipeline timing.
%timeit net.predict([input_image])
10 loops, best of 3: 174 ms per loop
In [12]:
# Forward pass timing.
%timeit net.forward(data=caffe_input)
10 loops, best of 3: 34.2 ms per loop

Pretty fast, right? Not as fast as you expected? Indeed, in this Python demo the full pipeline is only about 2 times faster on the GPU (355 ms vs. 174 ms), even though the forward pass alone is about 6 times faster (210 ms vs. 34.2 ms). The GPU code itself is very fast, so the data loading, transformation, and interfacing start to take more time than the conv. net computation itself!
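
The timings reported above can be turned into speedups and a rough overhead estimate with a few lines of arithmetic. The only inputs are the four numbers from the benchmark cells; treating "full pipeline minus forward pass" as the overhead is a simplifying assumption.

```python
# Timings in milliseconds, copied from the benchmark cells above.
cpu_full, cpu_forward = 355.0, 210.0
gpu_full, gpu_forward = 174.0, 34.2

full_speedup = cpu_full / gpu_full            # end-to-end predict()
forward_speedup = cpu_forward / gpu_forward   # forward pass only

# Assumption: whatever is not the forward pass is preprocessing and interfacing.
cpu_overhead = cpu_full - cpu_forward
gpu_overhead = gpu_full - gpu_forward

print('full pipeline speedup: %.1fx' % full_speedup)
print('forward pass speedup:  %.1fx' % forward_speedup)
print('overhead: %.1f ms (CPU mode), %.1f ms (GPU mode)' % (cpu_overhead, gpu_overhead))
```

The overhead is nearly the same in both modes, which is why it dominates once the forward pass itself becomes fast.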

To fully utilize the power of GPUs, you really want to:

  • Use larger batches, and minimize Python call and data transfer overheads.
  • Pipeline data load operations, like using a subprocess.
  • Code in C++. A little inconvenient, but maybe worth it if your dataset is really, really large.
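
As a rough illustration of pipelining data loading, the sketch below overlaps a loader with the consumer through a bounded queue. It swaps in a background thread instead of a subprocess to stay self-contained, it is written for Python 3, and both the loader and the "forward pass" are hypothetical stand-ins rather than Caffe calls.

```python
import queue
import threading
import numpy as np

def loader(n_batches, batch_size, out_queue):
    # Stand-in for loading and preprocessing images into batches.
    for i in range(n_batches):
        out_queue.put(np.full((batch_size, 3, 8, 8), i, dtype=np.float32))
    out_queue.put(None)                       # sentinel: no more batches

def run_pipeline(n_batches=4, batch_size=10):
    q = queue.Queue(maxsize=2)                # bounded, so the loader cannot run far ahead
    worker = threading.Thread(target=loader, args=(n_batches, batch_size, q))
    worker.start()
    results = []
    while True:
        batch = q.get()
        if batch is None:
            break
        results.append(float(batch.mean()))   # stand-in for the forward pass
    worker.join()
    return results

results = run_pipeline()
print(results)   # [0.0, 1.0, 2.0, 3.0]
```

While the consumer works on one batch, the loader is already preparing the next, which is the effect the list above is after.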

Parting Words

So this is Python! We hope the interface is easy enough to use. The Python wrapper is built with boost::python, and its source code can be found in python/caffe, which holds both the main interface and the classification wrapper. If you have customizations to make, start there! Do let us know if you make improvements by sending a pull request!