This approach follows ideas described in Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv, 2013.
First of all, we'll need a little Python script to run the Matlab Selective Search code.
Let's run detection on an image of a couple of cats frolicking (one of the ImageNet detection challenge pictures), which we will download from the web.
Before you get started with this notebook, make sure to follow the instructions for getting the pretrained ImageNet model.
!mkdir _temp
!curl http://farm1.static.flickr.com/220/512450093_7717fb8ce8.jpg > _temp/cat.jpg
!echo `pwd`/_temp/cat.jpg > _temp/cat.txt
!python ../python/caffe/detection/detector.py --crop_mode=selective_search --pretrained_model=../examples/imagenet/caffe_reference_imagenet_model --model_def=../examples/imagenet/imagenet_deploy.prototxt _temp/cat.txt _temp/cat.h5
Loading Caffe model.
[... Caffe layer creation log truncated ...]
I0318 11:15:22.417924 2104947072 net.cpp:162] This network produces output prob
I0318 11:15:22.417928 2104947072 net.cpp:173] Collecting Learning Rate and Weight Decay.
I0318 11:15:22.417944 2104947072 net.cpp:166] Network initialization done.
I0318 11:15:22.417948 2104947072 net.cpp:167] Memory required for Data 42022840
Caffe model loaded in 1.621 s
Loading input and assembling batches...
selective_search({'/Users/karayev/work/caffe-bvlc/examples/_temp/cat.jpg'}, '/var/folders/4q/vm1lt3t91p9gl06nz6s1dzzw0000gn/T/tmpOcszAc.mat')
23 batches assembled in 5.225 s
Processing 1 files in 23 batches
...on batch 0/23, elapsed time: 0.000 s
...on batch 10/23, elapsed time: 3.819 s
...on batch 20/23, elapsed time: 7.571 s
Processing complete after 8.818 s.
/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/pytables.py:2446: PerformanceWarning: your performance may suffer as PyTables will pickle object types that it cannot map directly to c-types [inferred_type->mixed,key->block1_values] [items->['feat']]
  warnings.warn(ws, PerformanceWarning)
Done. Saving to _temp/cat.h5 took 0.160 s.
Running this outputs a DataFrame with the filenames, selected windows, and their ImageNet scores to an HDF5 file. (We only ran on one image, so the filenames will all be the same.)
import numpy as np
import pandas as pd
df = pd.read_hdf('_temp/cat.h5', 'df')
print(df.shape)
print(df.iloc[0])
(223, 5)
feat    [6.90396e-06, 1.27811e-06, 1.82159e-06, 1.1020...
ymin                                                    0
xmin                                                    0
ymax                                                  500
xmax                                                  496
Name: /Users/karayev/work/caffe-bvlc/examples/_temp/cat.jpg, dtype: object
In general, detector.py is most efficient when running on a lot of images: it first extracts window proposals for all of them, batches the windows for efficient GPU processing, and then outputs the results. Simply list one image per line in the images_file argument and it will process all of them, as sketched below.

Although this guide gives an example of ImageNet detection, detector.py is clever enough to adapt to different Caffe models' input dimensions, batch size, and output categories. Refer to python detector.py --help and use the images_dim and images_mean_file parameters to describe your data set; there is no need to hardcode anything.
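For example, a run over several images might look like the following sketch. The model paths are the same ones used above; _temp/cat2.jpg and _temp/dog.jpg are hypothetical stand-ins for your own images.

# _temp/cat2.jpg and _temp/dog.jpg below are hypothetical example files.
!echo `pwd`/_temp/cat.jpg > _temp/images.txt
!echo `pwd`/_temp/cat2.jpg >> _temp/images.txt
!echo `pwd`/_temp/dog.jpg >> _temp/images.txt
!python ../python/caffe/detection/detector.py --crop_mode=selective_search --pretrained_model=../examples/imagenet/caffe_reference_imagenet_model --model_def=../examples/imagenet/imagenet_deploy.prototxt _temp/images.txt _temp/images.h5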
Now let's load the ImageNet class names and make a DataFrame of the features. Note that you'll need the auxiliary ilsvrc2012 data fetched by data/ilsvrc12/get_ilsvrc12_aux.sh.
with open('../data/ilsvrc12/synset_words.txt') as f:
labels_df = pd.DataFrame([
{
'synset_id': l.strip().split(' ')[0],
'name': ' '.join(l.strip().split(' ')[1:]).split(',')[0]
}
for l in f.readlines()
])
labels_df.sort('synset_id')
feats_df = pd.DataFrame(np.vstack(df.feat.values), columns=labels_df['name'])
print(feats_df.iloc[0])
name
tench                    0.000007
goldfish                 0.000001
great white shark        0.000002
tiger shark              0.000001
hammerhead               0.000007
electric ray             0.000004
stingray                 0.000007
cock                     0.000060
hen                      0.003055
ostrich                  0.000010
brambling                0.000004
goldfinch                0.000001
house finch              0.000004
junco                    0.000002
indigo bunting           0.000001
...
daisy                    0.000002
yellow lady's slipper    0.000002
corn                     0.000020
acorn                    0.000011
hip                      0.000003
buckeye                  0.000010
coral fungus             0.000005
agaric                   0.000019
gyromitra                0.000039
stinkhorn                0.000002
earthstar                0.000025
hen-of-the-woods         0.000035
bolete                   0.000037
ear                      0.000008
toilet tissue            0.000019
Name: 0, Length: 1000, dtype: float32
Let's look at the activations.
gray()
matshow(feats_df.values)
xlabel('Classes')
ylabel('Windows')
[Figure: heatmap of class activations, with classes on the x-axis and windows on the y-axis]
Now let's take max across all windows and plot the top classes.
max_s = feats_df.max(0)
max_s.sort(ascending=False)
print(max_s[:10])
name
proboscis monkey       0.923392
tiger cat              0.918685
milk can               0.783663
American black bear    0.637560
broccoli               0.612832
tiger                  0.515798
platypus               0.514660
dhole                  0.509583
lion                   0.496187
dingo                  0.482885
dtype: float32
Okay, there are indeed cats in there (along with some nonsense). Picking good localizations is a work in progress; inspecting the windows manually, we see that the third and thirteenth top detections correspond to the two cats.
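To see where the cats land in that ranking, here is a small sketch (not part of the original walkthrough) that prints the best-scoring class for each of the fifteen highest-scoring windows; the ranks are zero-based, matching the window_order indices used in the next cell.

# Sketch (added for illustration): best class per window for the top 15 windows.
window_max = pd.Series(feats_df.values.max(1)).order(ascending=False)
for rank in range(15):
    idx = window_max.index[rank]
    best_class = feats_df.columns[feats_df.values[idx].argmax()]
    print('%2d: %-20s %.3f' % (rank, best_class, window_max.iloc[rank]))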
# Rank all windows by their maximum class score and pick the two cat detections.
window_order = pd.Series(feats_df.values.max(1)).order(ascending=False)
i = window_order.index[3]
j = window_order.index[13]
# Show top predictions for the 3rd-ranked detection.
f = pd.Series(df['feat'].iloc[i], index=labels_df['name'])
print('3rd detection:')
print(f.order(ascending=False)[:5])
print('')
# Show top predictions for the 13th-ranked detection.
f = pd.Series(df['feat'].iloc[j], index=labels_df['name'])
print('13th detection:')
print(f.order(ascending=False)[:5])
# Show the 3rd-ranked detection in red and the 13th-ranked in blue.
im = imread('_temp/cat.jpg')
imshow(im)
currentAxis = plt.gca()
det = df.iloc[i]
coords = (det['xmin'], det['ymin']), det['xmax'] - det['xmin'], det['ymax'] - det['ymin']
currentAxis.add_patch(Rectangle(*coords, fill=False, edgecolor='r', linewidth=5))
det = df.iloc[j]
coords = (det['xmin'], det['ymin']), det['xmax'] - det['xmin'], det['ymax'] - det['ymin']
currentAxis.add_patch(Rectangle(*coords, fill=False, edgecolor='b', linewidth=5))
3rd detection:
name
tiger cat       0.882021
tiger           0.075015
tabby           0.024404
lynx            0.012947
Egyptian cat    0.004409
dtype: float32

13th detection:
name
tiger cat           0.681169
Pembroke            0.063924
dingo               0.050501
golden retriever    0.027614
tabby               0.021413
dtype: float32
[Figure: the image with the 3rd-ranked detection boxed in red and the 13th-ranked in blue]
That's cool. Both of these detections are tiger cats. Let's take all 'tiger cat' detections and NMS them to get rid of overlapping windows.
def nms_detections(dets, overlap=0.5):
    """
    Non-maximum suppression: greedily select high-scoring detections and
    skip detections that are significantly covered by a previously
    selected detection.

    This version is translated from Matlab code by Tomasz Malisiewicz,
    who sped up Pedro Felzenszwalb's code.

    Parameters
    ----------
    dets: ndarray
        each row is ['xmin', 'ymin', 'xmax', 'ymax', 'score']
    overlap: float
        detections that overlap a selected detection by more than this
        fraction are suppressed (default 0.5)

    Output
    ------
    dets: ndarray
        remaining detections after suppression
    """
    if np.shape(dets)[0] < 1:
        return dets

    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]
    # Use the same +1 pixel convention for areas as for intersections below.
    w = x2 - x1 + 1
    h = y2 - y1 + 1
    area = w * h

    s = dets[:, 4]
    ind = np.argsort(s)

    pick = []
    while len(ind) > 0:
        # The last index is the highest-scoring remaining detection.
        last = len(ind) - 1
        i = ind[last]
        pick.append(i)

        # Intersection of the pick with every lower-scoring detection.
        xx1 = np.maximum(x1[i], x1[ind[:last]])
        yy1 = np.maximum(y1[i], y1[ind[:last]])
        xx2 = np.minimum(x2[i], x2[ind[:last]])
        yy2 = np.minimum(y2[i], y2[ind[:last]])
        w = np.maximum(0., xx2 - xx1 + 1)
        h = np.maximum(0., yy2 - yy1 + 1)

        # Fraction of each lower-scoring detection covered by the pick.
        o = w * h / area[ind[:last]]

        # Suppress sufficiently-covered detections, and drop the pick itself.
        to_delete = np.concatenate(
            (np.nonzero(o > overlap)[0], np.array([last])))
        ind = np.delete(ind, to_delete)

    return dets[pick, :]
scores = feats_df['tiger cat']
windows = df[['xmin', 'ymin', 'xmax', 'ymax']].values
dets = np.hstack((windows, scores.values[:, np.newaxis]))
nms_dets = nms_detections(dets)
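Before plotting them, a quick sanity check (again, just a small added sketch) shows how many windows survive suppression and the scores of the strongest few:

# How many 'tiger cat' windows remain after NMS, and the top few scores.
print(nms_dets.shape)
print(nms_dets[:3, 4])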
Show top 3 NMS'd detections for 'tiger cat' in the image.
imshow(im)
currentAxis = plt.gca()
colors = ['r', 'b', 'y']
for c, det in zip(colors, nms_dets[:3]):
    currentAxis.add_patch(
        Rectangle((det[0], det[1]), det[2] - det[0], det[3] - det[1],
                  fill=False, edgecolor=c, linewidth=5)
    )
Remove the temp directory to clean up.
import shutil
shutil.rmtree('_temp')