The Minteye CAPTCHA system can be seen at http://www.minteye.com/Products.aspx. This notebook walks through a procedure to solve it. First, let's import the Python libraries we need.

In [1]:
%pylab inline
import os
import numpy as np
from PIL import Image


Welcome to pylab, a matplotlib-based Python environment [backend: module://IPython.zmq.pylab.backend_inline].



Now define some convenience functions. A set of test images from the CAPTCHA was downloaded into directories test1 through test6. Within those directories the files img_1.jpg to img_30.jpg hold the input images. The goal is to find the 'untwisted' image. The following functions use PIL to load images from a directory and numpy to tile a set of images for display.

In [2]:
def load_images_from_directory(dir_name):
    """Loads test images from a directory converting them to grayscale."""
    out = []
    for x in range(1, 31):
        out.append(np.array(Image.open(os.path.join(dir_name, 'img_%s.jpg' % (x,))).convert('L')))
    return out

In [3]:
def tile_images(ims):
    """Return a tiled set of images from a sequence."""
    ncols = int(np.ceil(np.sqrt(len(ims))))
    rows = []
    for idx in range(0, len(ims), ncols):
        rows.append(np.hstack(ims[idx:idx+ncols]))
    return np.vstack(rows)
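As a quick sanity check of the tiling logic, here is a self-contained run of the same function on nine small dummy arrays (a square number, so every row of the grid is full):

```python
import numpy as np

def tile_images(ims):
    """Return a tiled set of images from a sequence (copy of the notebook's function)."""
    ncols = int(np.ceil(np.sqrt(len(ims))))
    rows = []
    for idx in range(0, len(ims), ncols):
        rows.append(np.hstack(ims[idx:idx+ncols]))
    return np.vstack(rows)

# Nine 2x2 arrays, each filled with its own index, tile into a 3x3 grid.
ims = [np.full((2, 2), i) for i in range(9)]
tiled = tile_images(ims)
print(tiled.shape)  # (6, 6)
```

Note that `np.vstack` requires every row to have the same width, so the number of images should be a square (here 9, and 30 tiles as 6 rows of 5 in the notebook's case since ceil(sqrt(30)) = 6 gives 5 full rows... in practice 30 images tile cleanly only when the final row is full; the test sets of 30 happen to split as 6+6+6+6+6).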


Our assumption is that natural advert images contain more detail aligned to the horizontal and vertical than aligned to the diagonal. To gather a directional metric of information we use the 2D FFT. In the FFT, bright pixels far from the centre show high frequency detail (such as text). The direction shows the alignment of the detail.

In [4]:
def fft_images(ims):
    return [np.abs(np.fft.fftshift(np.fft.fft2(im))) for im in ims]
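The directional claim can be verified with a small synthetic check (not part of the original notebook): an image of horizontal stripes varies only vertically, so its FFT energy away from the centre lands on the vertical frequency axis, while points on the diagonal stay near zero.

```python
import numpy as np

# Horizontal stripes: all variation is vertical, so off-centre FFT energy
# concentrates on the vertical frequency axis.
im = np.zeros((64, 64))
im[::2, :] = 1.0

F = np.abs(np.fft.fftshift(np.fft.fft2(im)))

centre = 32
on_axis = F[0, centre]   # Nyquist frequency on the vertical axis: large
off_axis = F[16, 16]     # a diagonal point well away from the axes: ~0
print(on_axis, off_axis)
```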


For example, let's load the test1 images and plot the FFTs.

In [5]:
dir_name = 'test1'
ims = load_images_from_directory(dir_name)
ffts = fft_images(ims)

figure(figsize=(20,10))

subplot(121)
imshow(tile_images(ims), cmap='gray')
title(dir_name)

subplot(122)
imshow(tile_images(np.log(ffts)))
title(dir_name + ' FFT')

Out[5]:
<matplotlib.text.Text at 0x489ac10>


The 'true' (untwisted) image is fourth from the left on the bottom row. As you can see from the FFTs, it shows the greatest difference between the horizontal/vertical components and the diagonal. We'll use this ratio as the basis for a 'correctness' metric.

In [6]:
def metric(im):
    """Return a metric for an image's 2D FFT which is the ratio of the horizontal/vertical
    aligned components to the diagonal.
    """
    # im is already fftshifted; shifting again (for even-sized images) restores the
    # unshifted layout, so row 0 and column 0 hold the horizontal/vertical frequency axes.
    A = np.fft.fftshift(np.log(im))
    h = []
    v = []
    d = []

    for x in range(min(A.shape)):
        h.append(A[0, x])
        v.append(A[x, 0])
        d.append(A[x, x])

    return float(max(np.mean(h), np.mean(v)) / np.mean(d))
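A self-contained demo of the metric's behaviour on synthetic images (an illustration under assumed parameters, not from the notebook): axis-aligned stripes should score above 1, diagonal stripes below. A little noise is added so that no FFT magnitude is exactly zero, which would make `np.log` blow up.

```python
import numpy as np

def metric(fft_mag):
    """Ratio of horizontal/vertical FFT energy to diagonal energy.

    Same logic as the notebook's metric, but applied to an *unshifted*
    FFT magnitude, so the axes already sit at row 0 and column 0.
    """
    A = np.log(fft_mag)
    n = min(A.shape)
    h = np.mean(A[0, :n])
    v = np.mean(A[:n, 0])
    d = np.mean([A[x, x] for x in range(n)])
    return float(max(h, v) / d)

rng = np.random.default_rng(0)
noise = rng.random((64, 64))
yy, xx = np.meshgrid(np.arange(64), np.arange(64), indexing='ij')

axis_im = noise + 10 * np.sin(2 * np.pi * xx / 8)          # vertical stripes
diag_im = noise + 10 * np.sin(2 * np.pi * (xx + yy) / 8)   # diagonal stripes

m_axis = metric(np.abs(np.fft.fft2(axis_im)))
m_diag = metric(np.abs(np.fft.fft2(diag_im)))
print(m_axis > m_diag)  # axis-aligned detail scores higher
```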


Now let's test it on the downloaded CAPTCHAs. We define a function to plot the correctness metric for all images in a test set, and another that uses the largest metric to solve the CAPTCHA.

In [7]:
def test(dir_name, gt):
    m = [metric(im) for im in fft_images(load_images_from_directory(dir_name))]
    plot(range(1, len(m)+1), m)
    xmin, xmax, ymin, ymax = axis()
    vlines(gt, ymin, ymax)
    axis('tight')
    ylabel('Correctness metric')
    title('%s, Ground truth: %s' % (dir_name, gt))

def solve(dir_name):
    ims = load_images_from_directory(dir_name)
    m = [metric(im) for im in fft_images(ims)]
    idx = np.argmax(m)
    return ims[idx]


Plotting the test image metrics against the ground truth shows that the best-scoring image always falls within one image of the truth. Empirically we need only be within two images to be deemed 'human'.

In [8]:
figure(figsize=(16,9))
subplot(321)
test('test1', 28)
subplot(322)
test('test2', 18)
subplot(323)
test('test3', 16)
subplot(324)
test('test4', 27)
subplot(325)
test('test5', 7)
subplot(326)
test('test6', 4)


Further, let's plot the 'solved' image for each input.

In [9]:
figure(figsize=(10,9))
subplot(321)
imshow(solve('test1'), cmap='gray')
subplot(322)
imshow(solve('test2'), cmap='gray')
subplot(323)
imshow(solve('test3'), cmap='gray')
subplot(324)
imshow(solve('test4'), cmap='gray')
subplot(325)
imshow(solve('test5'), cmap='gray')
subplot(326)
imshow(solve('test6'), cmap='gray')

Out[9]:
<matplotlib.image.AxesImage at 0x9a379d0>