MintEye Breaker

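First we import what we need:

  • urlopen to download the captcha images
  • BeautifulSoup to get their addresses from the HTML
  • numpy, matplotlib.pyplot and matplotlib.cm, which the later cells use as np, plt and cm
  • pil_to_array and ImageGrid from matplotlib to convert and display the images
  • scipy.ndimage for the Sobel filter
  • PIL.Image because they’re JPGs
  • QtWebKit for scraping
In [1]:
import sys, os
from io import BytesIO
from urllib.request import urlopen

# np, plt and cm are used by the cells below
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from matplotlib.image import pil_to_array
# axes_grid1 is the maintained home of ImageGrid
from mpl_toolkits.axes_grid1 import ImageGrid
from scipy import ndimage

from bs4 import BeautifulSoup
from PIL import Image

from PyQt4.QtGui import *
from PyQt4.QtCore import *
from PyQt4.QtWebKit import *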

MintEye needs JavaScript, so a plain HTTP download is not enough: we need something that runs the page’s scripts.

An IPython installation usually comes with Qt, so we’ll use Qt’s WebKit.

In [2]:
class Renderer(QWebPage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        @self.loadFinished.connect
        def _loadFinished(result):
            self.frame = self.mainFrame()
            self.app.quit()
        self.mainFrame().load(QUrl(url))
        self.app.exec_()

r = Renderer('http://www.minteye.com/products.aspx')
page = r.frame.toHtml()

# name the parser explicitly so the result does not depend on what is installed
html = BeautifulSoup(page, 'html.parser')

With everything loaded, we can scrape the page for the captcha images, download each one, and convert it to greyscale.

In [3]:
imgs = []
for img in html.find_all(class_='minteyeimageloaded'):
    img_bytes = BytesIO(urlopen(img['src']).read())
    imgs.append(pil_to_array(Image.open(img_bytes).convert('L')))
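
The address lookup in the cell above can be tried without a network connection. This is a minimal sketch, assuming a made-up HTML snippet: `SAMPLE_HTML`, the example.com addresses, and the helper name `image_urls` are all hypothetical, and only the class name `minteyeimageloaded` comes from the notebook.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the rendered page.
SAMPLE_HTML = """
<div>
  <img class="minteyeimageloaded" src="http://example.com/a.jpg">
  <img class="other" src="http://example.com/skip.jpg">
  <img class="minteyeimageloaded extra" src="http://example.com/b.jpg">
</div>
"""

def image_urls(html_text, css_class="minteyeimageloaded"):
    """Return the src of every tag carrying css_class, in document order."""
    soup = BeautifulSoup(html_text, "html.parser")
    # class_ matches a tag if any one of its classes equals css_class
    return [tag["src"] for tag in soup.find_all(class_=css_class)
            if tag.has_attr("src")]

urls = image_urls(SAMPLE_HTML)
print(urls)
```

Each returned address could then be passed to `urlopen` as in the cell above.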

Let’s take a quick look at them:

In [4]:
side1 = round(len(imgs) ** (1/2))
# ceiling division, so the grid has a cell for every image
side2 = -(-len(imgs) // side1)

fig = plt.figure(1, (side1*3, side2*4))
grid = ImageGrid(fig, 111,
                 nrows_ncols=(side1, side2),
                 axes_pad=.1)

for i, img in enumerate(imgs):
    grid[i].imshow(img, cmap=cm.Greys_r)
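
The grid dimensions in the cell above follow from the number of images. As a standalone sketch of that arithmetic, the hypothetical helper `grid_shape` (not part of the notebook) picks a near-square layout and rounds the column count up, so that no image is left without a cell.

```python
import math

def grid_shape(n_images):
    """Return (rows, cols) for a near-square grid with at least n_images cells."""
    if n_images <= 0:
        raise ValueError("need at least one image")
    rows = max(1, round(math.sqrt(n_images)))
    cols = math.ceil(n_images / rows)  # round up so every image fits
    return rows, cols

for n in (1, 7, 30, 31):
    rows, cols = grid_shape(n)
    print(n, (rows, cols), rows * cols >= n)
```

For the 30 images used here this gives a 5 by 6 grid.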

Now the meat: we apply the Sobel filter to every image and sum the resulting gradient magnitudes, i.e. the total brightness of the edge image.

The minimum sum hopefully corresponds to the unswirled image, or one of its neighbors, because those have less stretched edges.

In [5]:
fig = plt.figure(1, (side1*3, side2*4))
grid = ImageGrid(fig, 111,
                 nrows_ncols=(side1, side2),
                 axes_pad=.1)

sobels = []
total = []
for img in imgs:
    img = img.astype('int32')  # widen from uint8 so the filter arithmetic cannot wrap around
    sobel = ndimage.generic_gradient_magnitude(img, ndimage.sobel)
    sobels.append(sobel)
    total.append(np.sum(sobel))

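print(total)
res = np.argmin(total)  # index of the image with the smallest edge sum

# draw the chosen image in colour, the rest in grey
for i, sobel in enumerate(sobels):
    grid[i].set_label(str(total[i]))
    grid[i].imshow(sobel, cmap=cm.gist_earth if i==res else cm.Greys_r)
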
[6508898, 6331381, 6166045, 6003466, 5857080, 5714339, 5570207, 5444602, 5324195, 5234583, 5155544, 5084988, 5045224, 5005006, 4982156, 4965071, 5260894, 4956353, 4972140, 4997309, 5026343, 5077861, 5142310, 5216872, 5316882, 5441496, 5578822, 5746374, 5930380, 6145281]
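
To make the edge-sum idea concrete, here is a small sketch on synthetic images. It is an illustration under assumptions, not the notebook’s code: the Sobel kernels are written out by hand with NumPy slicing and only interior pixels are used, whereas `ndimage` pads the border, so the totals will not match the numbers printed above. The names `sobel_magnitude`, `least_edgy`, and the three test images are hypothetical.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude from the 3x3 Sobel kernels, interior pixels only."""
    a = np.asarray(img, dtype=np.float64)  # float avoids uint8 wrap-around
    # Horizontal derivative: right column minus left column, rows weighted 1-2-1.
    gx = (a[:-2, 2:] + 2 * a[1:-1, 2:] + a[2:, 2:]
          - a[:-2, :-2] - 2 * a[1:-1, :-2] - a[2:, :-2])
    # Vertical derivative: bottom row minus top row, columns weighted 1-2-1.
    gy = (a[2:, :-2] + 2 * a[2:, 1:-1] + a[2:, 2:]
          - a[:-2, :-2] - 2 * a[:-2, 1:-1] - a[:-2, 2:])
    return np.hypot(gx, gy)

def least_edgy(images):
    """Return (index of the image with the smallest edge sum, all sums)."""
    totals = [float(sobel_magnitude(im).sum()) for im in images]
    return int(np.argmin(totals)), totals

# Three synthetic 8x8 greyscale images with different amounts of edge.
flat = np.full((8, 8), 100, dtype=np.uint8)   # no edges at all
one_edge = np.zeros((8, 8), dtype=np.uint8)
one_edge[:, 4:] = 255                         # a single vertical edge
box = np.zeros((8, 8), dtype=np.uint8)
box[2:6, 2:6] = 255                           # a square: four edges

best, totals = least_edgy([one_edge, flat, box])
print(best, totals)
```

The flat image has no gradient, so it gets the smallest sum and is the one selected, in the same way that `np.argmin(total)` selects a candidate among the captcha images.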