Follow the lecture Introduction to Deep Learning with Python
from IPython.display import YouTubeVideo
YouTubeVideo('S75EdAcXHKk')
After performing the setup described below you can walk through the notebooks:
Download the MNIST files from http://yann.lecun.com/exdb/mnist/ and extract them into ~/Downloads/lisa/data/mnist/
%%bash
mkdir -p ~/Downloads/lisa/data/mnist/
cd ~/Downloads/lisa/data/mnist/
#wget http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
#wget http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
#wget http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
#wget http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
gunzip *.gz
!ls ~/Downloads/lisa/data/mnist/
t10k-images-idx3-ubyte train-images-idx3-ubyte t10k-labels-idx1-ubyte train-labels-idx1-ubyte
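If wget is unavailable, the same download-and-extract step can be done in Python. This is a sketch, not part of the original notebook: the URLs and file names match the bash cell above, but the function names (extract, fetch_and_extract) are ours, and network access is assumed.

```python
import gzip
import os
import shutil
import urllib.request

# File names and target directory follow the wget/gunzip cell above.
MNIST_URL = "http://yann.lecun.com/exdb/mnist/"
MNIST_DIR = os.path.expanduser("~/Downloads/lisa/data/mnist/")
FILES = [
    "train-images-idx3-ubyte.gz",
    "train-labels-idx1-ubyte.gz",
    "t10k-images-idx3-ubyte.gz",
    "t10k-labels-idx1-ubyte.gz",
]

def extract(gz_path, out_path):
    """Decompress one .gz archive to out_path."""
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)

def fetch_and_extract(base_url=MNIST_URL, dest=MNIST_DIR):
    """Download each archive (if missing) and decompress it in place."""
    os.makedirs(dest, exist_ok=True)
    for name in FILES:
        gz_path = os.path.join(dest, name)
        if not os.path.exists(gz_path):
            urllib.request.urlretrieve(base_url + name, gz_path)
        extract(gz_path, gz_path[:-3])  # strip the .gz suffix
```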
import os,sys,inspect
currentdir = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
parentdir = os.path.dirname(currentdir)
sys.path.insert(0,parentdir)
import load
load.datasets_dir = os.path.expanduser("~/Downloads/lisa/data/")
trX, teX, trY, teY = load.mnist(onehot=True)
We have 60K images of handwritten digits for training and 10K for testing. Each image is 28x28 gray-level pixels, flattened to a 784-dimensional vector.
trX.shape, teX.shape, trY.shape, teY.shape
((60000, 784), (10000, 784), (60000, 10), (10000, 10))
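The load module hides the parsing of the IDX binary format these files use: a big-endian header (two zero bytes, a type code, the number of dimensions, then one 32-bit size per dimension) followed by raw uint8 data. A minimal standalone parser might look like this (the helper name read_idx is ours):

```python
import struct
import numpy as np

def read_idx(path):
    """Parse an IDX file into a NumPy array of the encoded shape."""
    with open(path, "rb") as f:
        data = f.read()
    # Header: 0x0000, type code (0x08 = unsigned byte), number of dims.
    zeros, dtype_code, ndim = struct.unpack(">HBB", data[:4])
    assert zeros == 0 and dtype_code == 0x08
    # One big-endian uint32 per dimension, then the raw pixel/label bytes.
    dims = struct.unpack(">" + "I" * ndim, data[4:4 + 4 * ndim])
    return np.frombuffer(data[4 + 4 * ndim:], dtype=np.uint8).reshape(dims)
```

For the image files ndim is 3 (count, rows, cols), so read_idx returns a (60000, 28, 28) array that load.mnist then flattens to (60000, 784) and scales to [0, 1].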
import numpy as np
np.min(trX),np.max(trX),np.min(teX),np.max(teX)
(0.0, 1.0, 0.0, 1.0)
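With onehot=True each label in 0..9 becomes a length-10 indicator vector, which is why trY has shape (60000, 10) rather than (60000,). A minimal sketch of that encoding with NumPy (the helper name one_hot is ours):

```python
import numpy as np

def one_hot(labels, n_classes=10):
    """Encode integer labels as indicator rows, one 1.0 per row."""
    labels = np.asarray(labels)
    out = np.zeros((labels.size, n_classes))
    out[np.arange(labels.size), labels] = 1.0
    return out
```

For example, one_hot([5, 0, 4]) is a 3x10 array whose rows have a single 1 in columns 5, 0, and 4 respectively.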
Install Theano. On OS X it is highly recommended to install it using Anaconda. If you also installed MKL (also highly recommended), you will need to export the following environment variable before running ipython notebook:
export DYLD_FALLBACK_LIBRARY_PATH=$HOME/anaconda/lib:$DYLD_FALLBACK_LIBRARY_PATH
On AWS I started from the following community image "Ubuntu-14.04-Caffe-GPU - ami-588d0030"
install Theano
!sudo pip install theano
Downloading/unpacking theano
  Downloading Theano-0.6.0.tar.gz (1.8MB): 1.8MB downloaded
  Running setup.py (path:/tmp/pip_build_root/theano/setup.py) egg_info for package theano
    warning: manifest_maker: MANIFEST.in, line 7: 'recursive-include' expects <dir> <pattern1> <pattern2> ...
Requirement already satisfied (use --upgrade to upgrade): numpy>=1.5.0 in /usr/local/lib/python2.7/dist-packages/numpy-1.9.0-py2.7-linux-x86_64.egg (from theano)
Requirement already satisfied (use --upgrade to upgrade): scipy>=0.7.2 in /usr/local/lib/python2.7/dist-packages (from theano)
Installing collected packages: theano
  Running setup.py install for theano
    changing mode of build/scripts-2.7/theano-cache from 644 to 755
    changing mode of build/scripts-2.7/theano-nose from 644 to 755
    changing mode of build/scripts-2.7/theano-test from 644 to 755
    warning: manifest_maker: MANIFEST.in, line 7: 'recursive-include' expects <dir> <pattern1> <pattern2> ...
    changing mode of /usr/local/bin/theano-nose to 755
    changing mode of /usr/local/bin/theano-cache to 755
    changing mode of /usr/local/bin/theano-test to 755
Successfully installed theano
Cleaning up...
You will need to install CUDA and export the following environment variables before running ipython notebook:
On OSX
export DYLD_FALLBACK_LIBRARY_PATH=/Developer/NVIDIA/CUDA-6.5/lib:$DYLD_FALLBACK_LIBRARY_PATH
export PATH=/Developer/NVIDIA/CUDA-6.5/bin:$PATH
On AWS
export PATH=/usr/local/cuda-6.5/bin/:$PATH
You can control whether the CPU or GPU is used inside the notebook; however, you can only set it once. You need to restart the notebook kernel if you want to change it.
import os
#os.environ['THEANO_FLAGS'] = 'mode=FAST_RUN,device=cpu,floatX=float32'
os.environ['THEANO_FLAGS'] = 'mode=FAST_RUN,device=gpu,floatX=float32'
and check whether you are using the CPU or the GPU:
from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time
vlen = 10 * 30 * 768 # 10 x #cores x # threads per core
iters = 1000
rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print('Looping %d times took %s seconds' % (iters, t1 - t0))
print('Result is %s' % (r,))
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')
[GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>), HostFromGpu(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 0.738588094711 seconds
Result is [ 1.23178029  1.61879349  1.52278066 ...,  2.20771813  2.29967761  1.62323296]
Used the gpu
Using gpu device 0: GRID K520
On OSX, for this example, the GPU is about 4-5x faster than the CPU.
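For a rough point of comparison, the same elementwise exp can be timed in plain NumPy on the CPU. This snippet is not part of the original notebook; vlen and iters match the Theano benchmark above, and the absolute time will depend on your machine.

```python
import time
import numpy as np

# Same sizes as the Theano benchmark above.
vlen = 10 * 30 * 768
iters = 1000

rng = np.random.RandomState(22)
x = np.asarray(rng.rand(vlen), dtype=np.float32)

t0 = time.time()
for _ in range(iters):
    r = np.exp(x)  # pure CPU elementwise exp, no Theano involved
t1 = time.time()
print('NumPy looping %d times took %s seconds' % (iters, t1 - t0))
```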