Caffe models can be adapted to your particular needs by editing the network parameters. In this example, we take the standard Caffe Reference ImageNet model "CaffeNet" and transform it into a fully-convolutional model for efficient, dense inference on large inputs. Instead of a single classification, this model generates a classification map that covers the given input size. In particular, an 8 $\times$ 8 classification map on a 451 $\times$ 451 input gives 64x the output in only 3x the time. The computation exploits a natural efficiency of convolutional neural network (CNN) structure: overlapping windows share their intermediate features, so the forward pass computes each shallow-layer feature once and reuses it for every deeper output that depends on it, in the manner of dynamic programming.

To do so we translate the inner product classifier layers of CaffeNet into convolutional layers. This is the only change: the other layer types are agnostic to spatial size. Convolution is translation-equivariant (the same filters are applied at every position), activations are elementwise operations, and so on. The `fc6` inner product, when carried out as convolution by `fc6-conv`, turns into a 6 $\times$ 6 filter with stride 1 on `pool5`. Back in image space this gives a classification for each 227 $\times$ 227 box with stride 32 in pixels. Remember the equation for output map / receptive field size, output = (input - kernel_size) / stride + 1, and work out the indexing details for a clear understanding. For example, treating the whole network as a single 227 $\times$ 227 filter with stride 32, a 451 $\times$ 451 input gives (451 - 227) / 32 + 1 = 8 outputs along each side: the 8 $\times$ 8 map mentioned above.
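To work out those indexing details layer by layer, here is a small Python sketch that pushes a square input through CaffeNet's spatial layers using the formula above, extended with padding as (input + 2 $\times$ pad - kernel_size) / stride + 1. The kernel, stride and pad values are our transcription of the reference `deploy.prototxt` rather than values read from the file, so check them against your copy; `LAYERS`, `out_size` and `trace` are hypothetical names used only for illustration.

```python
import math

# Assumed CaffeNet spatial layers: (name, kernel_size, stride, pad).
# ReLU and LRN layers are omitted because they do not change spatial size.
LAYERS = [
    ('conv1', 11, 4, 0),
    ('pool1', 3, 2, 0),
    ('conv2', 5, 1, 2),
    ('pool2', 3, 2, 0),
    ('conv3', 3, 1, 1),
    ('conv4', 3, 1, 1),
    ('conv5', 3, 1, 1),
    ('pool5', 3, 2, 0),
    ('fc6-conv', 6, 1, 0),  # the converted fc6: a 6 x 6 filter with stride 1
]


def out_size(size, kernel, stride, pad=0):
    """output = (input + 2 * pad - kernel_size) / stride + 1, with integer division.

    Caffe rounds pooling outputs up rather than down; for the sizes traced
    here every division is exact, so the distinction does not matter."""
    return (size + 2 * pad - kernel) // stride + 1


def trace(size):
    """Return [(layer_name, output_side_length), ...] for a square input of side `size`."""
    sizes = []
    for name, k, s, p in LAYERS:
        size = out_size(size, k, s, p)
        sizes.append((name, size))
    return sizes


for side in (227, 451):
    print(side, trace(side))

# The strides multiply to the 32-pixel step in image space: 4 * 2 * 2 * 2.
total_stride = math.prod(s for _, _, s, _ in LAYERS)
print(total_stride)            # 32
print(out_size(451, 227, 32))  # 8, viewing the whole net as one 227 x 227 filter
```

With these values a 227 $\times$ 227 input shrinks to a 6 $\times$ 6 `pool5` and a single `fc6-conv` output, while a 451 $\times$ 451 input yields a 13 $\times$ 13 `pool5` and an 8 $\times$ 8 map.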

Roll up your sleeves for net surgery with pycaffe!

In [1]:

```
!diff imagenet/bvlc_caffenet_full_conv.prototxt ../models/bvlc_reference_caffenet/deploy.prototxt
```

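The only differences needed in the architecture are to change the fully-connected classifier inner product layers into convolutional layers with the right filter size (6 $\times$ 6, since the reference model classifiers take the full 6 $\times$ 6 spatial extent of `pool5` as input) and stride 1 for dense classification. Note that the layers are renamed so that Caffe does not try to blindly load the old parameters when it maps layer names to the pretrained model.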
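Why the renaming matters can be seen with a toy model of name-based loading. This is not pycaffe code: `copy_by_name` is a hypothetical helper that copies arrays between two dicts keyed by layer name, and it raises on a name match with mismatched shapes, which is our understanding of how Caffe reacts when pretrained blob shapes do not fit.

```python
import numpy as np


def copy_by_name(pretrained, target):
    """Copy arrays from `pretrained` into `target` (both dicts: layer name -> array)
    for every shared layer name. Returns the list of layer names that were copied."""
    copied = []
    for name, dst in target.items():
        if name not in pretrained:
            continue  # renamed or new layers keep their current values
        src = pretrained[name]
        if src.shape != dst.shape:
            raise ValueError('shape mismatch for layer {!r}: {} vs {}'.format(
                name, src.shape, dst.shape))
        dst[...] = src  # copy in place
        copied.append(name)
    return copied


# Tiny stand-ins: 8 flat "pool5" inputs = 2 channels x 2 x 2, and 3 outputs.
pretrained = {'conv1': np.ones((4, 3, 3, 3)), 'fc6': np.ones((3, 8))}
full_conv = {'conv1': np.zeros((4, 3, 3, 3)), 'fc6-conv': np.zeros((3, 2, 2, 2))}
print(copy_by_name(pretrained, full_conv))  # ['conv1']

# Had the layer kept the name 'fc6', the (3, 8) weights would not fit a (3, 2, 2, 2) filter bank.
try:
    copy_by_name(pretrained, {'fc6': np.zeros((3, 2, 2, 2))})
except ValueError as err:
    print(err)
```

The renamed `fc6-conv` entry is simply skipped, which is why its parameters have to be transplanted by hand in the cells that follow.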

In [2]:

```
# Make sure that caffe is on the python path:
caffe_root = '../'  # this file is expected to be in {caffe_root}/examples
import sys
sys.path.insert(0, caffe_root + 'python')
import caffe

# Load the original network and extract the fully-connected layers' parameters.
net = caffe.Net('../models/bvlc_reference_caffenet/deploy.prototxt',
                '../models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel')
params = ['fc6', 'fc7', 'fc8']
# fc_params = {name: (weights, biases)}
fc_params = {pr: (net.params[pr][0].data, net.params[pr][1].data) for pr in params}

for fc in params:
    print('{} weights are {} dimensional and biases are {} dimensional'.format(
        fc, fc_params[fc][0].shape, fc_params[fc][1].shape))
```

In [3]:

```
# Load the fully-convolutional network to transplant the parameters.
# Layers whose names match the pretrained model (conv1 ... conv5) are loaded as usual.
net_full_conv = caffe.Net('imagenet/bvlc_caffenet_full_conv.prototxt',
                          '../models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel')
params_full_conv = ['fc6-conv', 'fc7-conv', 'fc8-conv']
# conv_params = {name: (weights, biases)}
conv_params = {pr: (net_full_conv.params[pr][0].data, net_full_conv.params[pr][1].data)
               for pr in params_full_conv}

for conv in params_full_conv:
    print('{} weights are {} dimensional and biases are {} dimensional'.format(
        conv, conv_params[conv][0].shape, conv_params[conv][1].shape))
```

The convolution weights are arranged in output $\times$ input $\times$ height $\times$ width dimensions. To map the inner product weights to convolution filters, we need to roll the flat inner product vectors into channel $\times$ height $\times$ width filter matrices.
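To see why rolling the flat weights is enough, here is a small NumPy sketch with toy sizes (the real `fc6` maps 256 $\times$ 6 $\times$ 6 inputs to 4096 outputs). It assumes, as the surgery below does, that the inner product flattens its channel $\times$ height $\times$ width input in row-major order; `conv_valid` is a hypothetical, deliberately naive stride-1 convolution written for this check, not a Caffe function.

```python
import numpy as np

# Toy sizes standing in for CaffeNet's fc6 (256 x 6 x 6 inputs -> 4096 outputs).
C, H, W, OUT = 4, 3, 3, 5
rng = np.random.default_rng(0)
fc_w = rng.standard_normal((OUT, C * H * W))  # inner product weights: output x flattened input
fc_b = rng.standard_normal(OUT)

# Roll each flat row into a C x H x W filter. NumPy's default row-major order
# matches how a C x H x W blob is flattened into the inner product's input.
conv_w = fc_w.reshape(OUT, C, H, W)


def conv_valid(x, w, b):
    """Stride-1, unpadded convolution (cross-correlation, i.e. no filter flip).
    x: C x Hin x Win input, w: OUT x C x kh x kw filters, b: OUT biases."""
    n_out, _, kh, kw = w.shape
    oh, ow = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    y = np.empty((n_out, oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[:, i:i + kh, j:j + kw]
            # contract the channel, height and width axes of filters and patch
            y[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2])) + b
    return y


# On an input exactly the filter size, the convolution is the inner product.
x = rng.standard_normal((C, H, W))
single = conv_valid(x, conv_w, fc_b)
print(single.shape, np.allclose(single[:, 0, 0], fc_w @ x.ravel() + fc_b))  # (5, 1, 1) True

# On a larger input it slides, giving one inner product per window position.
big = rng.standard_normal((C, H + 2, W + 2))
dense = conv_valid(big, conv_w, fc_b)
print(dense.shape)  # (5, 3, 3)
```

Each entry of `dense[:, i, j]` equals the original inner product applied to the window starting at `(i, j)`, which is exactly the dense classification the converted network computes.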

The biases are identical to those of the inner product -- let's transplant these first since no reshaping is needed.

In [4]:

```
# Biases have the same shape in both nets, so copy them directly.
for pr, pr_conv in zip(params, params_full_conv):
    conv_params[pr_conv][1][...] = fc_params[pr][1]
```
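The `[...]` on the left-hand side matters. pycaffe hands back blob contents as NumPy arrays that, as far as we know, share memory with the network, so slice assignment copies values into the net, while plain assignment would only rebind a Python name. A toy illustration, with an ordinary array standing in for blob memory (an assumption, not pycaffe itself):

```python
import numpy as np

blob_data = np.zeros(3)          # stand-in for the memory behind a blob's `.data`
conv_bias = blob_data            # like conv_params[...][1]: another name for the same array
new_bias = np.array([1.0, 2.0, 3.0])

conv_bias[...] = new_bias        # in-place copy: the shared storage now holds the new values
print(blob_data)                 # [1. 2. 3.]

conv_bias = np.full(3, 9.0)      # plain assignment only rebinds the local name
print(blob_data)                 # still [1. 2. 3.]
```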

In [5]:

```
# Weights: roll each flat inner product row into an input channel x height x width filter.
for pr, pr_conv in zip(params, params_full_conv):
    out, in_, h, w = conv_params[pr_conv][0].shape
    W = fc_params[pr][0].reshape((out, in_, h, w))
    conv_params[pr_conv][0][...] = W
```
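The loop above reads the target shapes from the convolutional net. As an independent sanity check, a hypothetical helper `fc_to_conv_shape` can derive them from CaffeNet's classifier sizes, using the fact that `fc6` sees 9216 = 256 $\times$ 6 $\times$ 6 flattened `pool5` values while `fc7` and `fc8` see 4096 values each. The 2-D shapes below are the logical sizes; depending on the Caffe version, pycaffe may report extra leading singleton axes.

```python
import math


def fc_to_conv_shape(fc_shape, in_channels):
    """Map an inner product weight shape (outputs, flat_inputs) to the equivalent
    square convolution filter shape (outputs, in_channels, side, side).

    Assumes the flattened input came from an in_channels x side x side blob."""
    n_out, n_in = fc_shape
    if n_in % in_channels:
        raise ValueError(f'{n_in} inputs are not divisible by {in_channels} channels')
    area = n_in // in_channels
    side = math.isqrt(area)
    if side * side != area:
        raise ValueError(f'spatial area {area} is not a perfect square')
    return (n_out, in_channels, side, side)


print(fc_to_conv_shape((4096, 9216), 256))   # fc6 -> (4096, 256, 6, 6)
print(fc_to_conv_shape((4096, 4096), 4096))  # fc7 -> (4096, 4096, 1, 1)
print(fc_to_conv_shape((1000, 4096), 4096))  # fc8 -> (1000, 4096, 1, 1)
```

`fc7` and `fc8` become 1 $\times$ 1 convolutions: they only mix channels at each location of the map produced by `fc6-conv`.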

Next, save the new model weights.

In [6]:

```
net_full_conv.save('imagenet/bvlc_caffenet_full_conv.caffemodel')
```
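The next cell configures input preprocessing with `set_mean`, `set_channel_swap` and `set_raw_scale`. As a rough guide to what those settings do to an image, here is a NumPy sketch of our reading of the old pycaffe pipeline. It is not pycaffe's code: resizing to the network's input size is omitted, a per-channel mean with hypothetical values replaces the mean file, and `preprocess`/`deprocess` here are stand-alone functions, not the `Net` methods.

```python
import numpy as np


def preprocess(img_rgb01, mean_bgr, channel_swap=(2, 1, 0), raw_scale=255.0):
    """img_rgb01: H x W x 3 float image in [0, 1], RGB order (as caffe.io.load_image returns).
    mean_bgr: per-channel mean in BGR order on the 0-255 scale.
    Returns a 3 x H x W float32 array ready to be stacked into a batch."""
    x = img_rgb01.astype(np.float32)[:, :, list(channel_swap)]  # RGB -> BGR
    x = x.transpose(2, 0, 1)                                     # H x W x C -> C x H x W
    x = x * raw_scale                                            # [0, 1] -> [0, 255]
    return x - np.asarray(mean_bgr, dtype=np.float32)[:, None, None]


def deprocess(x, mean_bgr, channel_swap=(2, 1, 0), raw_scale=255.0):
    """Invert `preprocess`: back to an H x W x 3 RGB image in [0, 1] for plotting."""
    x = x + np.asarray(mean_bgr, dtype=np.float32)[:, None, None]
    x = (x / raw_scale).transpose(1, 2, 0)
    return x[:, :, np.argsort(channel_swap)]                     # BGR -> RGB


img = np.random.default_rng(0).random((4, 5, 3))
mean = (104.0, 117.0, 123.0)  # hypothetical per-channel BGR mean, for illustration only
x = preprocess(img, mean)
print(x.shape)  # (3, 4, 5)
print(np.allclose(deprocess(x, mean), img, atol=1e-5))  # True
```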

In [7]:

```
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
caffe.set_phase_test()
# load input and configure preprocessing
im = caffe.io.load_image('images/cat.jpg')
net_full_conv.set_mean('data', np.load('../python/caffe/imagenet/ilsvrc_2012_mean.npy'))
net_full_conv.set_channel_swap('data', (2,1,0))
net_full_conv.set_raw_scale('data', 255.0)
# make classification map by forward and print prediction indices at each location
out = net_full_conv.forward_all(data=np.asarray([net_full_conv.preprocess('data', im)]))
print(out['prob'][0].argmax(axis=0))
# show net input and confidence map (probability of the top prediction at each location)
plt.subplot(1, 2, 1)
plt.imshow(net_full_conv.deprocess('data', net_full_conv.blobs['data'].data[0]))
plt.subplot(1, 2, 2)
plt.imshow(out['prob'][0].max(axis=0))
```


The classifications include various cats -- 282 = tiger cat, 281 = tabby, 283 = persian -- and foxes and other mammals.
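To read such a map, note that `out['prob'][0]` holds class probabilities with the class axis first, so `argmax(axis=0)` gives the predicted class at every location and `max(axis=0)` gives the plotted confidence. The sketch below uses a random stand-in for that array and maps each cell back to its 227 $\times$ 227 input box with stride 32, as described at the top; the box arithmetic ignores padding effects, and `window_for` is a hypothetical helper.

```python
import numpy as np

# Random stand-in for out['prob'][0]: 1000 class probabilities at each cell of an 8 x 8 map.
NUM_CLASSES, MAP_H, MAP_W = 1000, 8, 8
rng = np.random.default_rng(0)
scores = rng.standard_normal((NUM_CLASSES, MAP_H, MAP_W))
prob = np.exp(scores) / np.exp(scores).sum(axis=0, keepdims=True)  # softmax over classes per cell

top_class = prob.argmax(axis=0)  # 8 x 8 map of predicted class indices (what the cell prints)
top_prob = prob.max(axis=0)      # 8 x 8 confidence map (what the cell plots)

STRIDE, WINDOW = 32, 227  # one 227 x 227 box every 32 pixels


def window_for(i, j):
    """Input-pixel box (top, left, bottom, right) classified by map cell (i, j)."""
    top, left = i * STRIDE, j * STRIDE
    return top, left, top + WINDOW, left + WINDOW


print(top_class.shape, top_prob.shape)  # (8, 8) (8, 8)
print(window_for(0, 0))  # (0, 0, 227, 227)
print(window_for(7, 7))  # (224, 224, 451, 451)

# For example: which box is most confidently a "tiger cat" (class 282)?
i, j = (int(v) for v in np.unravel_index(prob[282].argmax(), prob[282].shape))
print(window_for(i, j))
```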

In this way the fully-connected layers can be extracted as dense features across an image (see `net_full_conv.blobs['fc6'].data` for instance), which is perhaps more useful than the classification map itself.

Note that this model isn't totally appropriate for sliding-window detection since it was trained for whole-image classification. Nevertheless, it can work just fine. Sliding-window training and fine-tuning can be done by defining a sliding-window ground truth and loss such that a loss map is made for every location, then solving as usual. (This is an exercise for the reader.)
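As a starting point for that exercise, here is a NumPy sketch of what a per-location loss map computes: softmax cross-entropy evaluated independently at every cell of a class-score map against a per-location label map. It is not a Caffe layer, the sliding-window labels are hypothetical, and averaging over locations is only one possible way to reduce the map to a scalar objective.

```python
import numpy as np


def softmax_loss_map(scores, labels):
    """Per-location softmax cross-entropy.

    scores: K x H x W raw class scores (e.g. a final-layer score map for one image).
    labels: H x W integer ground-truth class at each location.
    Returns the H x W loss map."""
    shifted = scores - scores.max(axis=0, keepdims=True)  # subtract the max for numerical stability
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    rows, cols = np.indices(labels.shape)
    return -log_prob[labels, rows, cols]  # negative log-probability of each location's true class


# Uniform scores over 4 classes: every location's loss is log(4).
labels = np.array([[0, 1, 2], [3, 0, 1]])
loss = softmax_loss_map(np.zeros((4, 2, 3)), labels)
print(loss.shape, np.allclose(loss, np.log(4)))  # (2, 3) True
print(loss.mean())  # one scalar objective: the mean over locations
```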

*A thank you to Rowland Depp for first suggesting this trick.*