In [1]:
using PyPlot
using NtToolBox



$\newcommand{\dotp}[2]{\langle #1, #2 \rangle}$ $\newcommand{\umin}[1]{\underset{#1}{\min}\;}$ $\newcommand{\RR}{\mathbb{R}}$ $\newcommand{\pd}[2]{ \frac{ \partial #1}{\partial #2} }$ $\newcommand{\norm}[1]{\|#1\|}$ $\newcommand{\pa}[1]{\left(#1\right)}$

We consider the problem of finding a minimizer of a convex smooth function $\;f : \RR^d \rightarrow \RR$; that is, we want to solve $$\umin{x \in \RR^d} f(x)$$

Note that the minimizer is not necessarily unique.

The simplest method is gradient descent, which iteratively computes $$x^{(k+1)} = x^{(k)} - \tau \nabla f(x^{(k)}),$$ where $\nabla f(x) \in \RR^d$ is the gradient of $f$ at $x$, $x^{(0)} \in \RR^d$ is an arbitrary initial point, and the stepsize $\tau$ must satisfy $$0<\tau<2/\beta$$ to ensure convergence, where $\beta$ is a Lipschitz constant of $\nabla f$; that is, $$\forall (x,x') \in \RR^d \times \RR^d, \quad \norm{\nabla f(x) - \nabla f(x')} \leq \beta \norm{x-x'}.$$

For instance, if $f$ is of class $C^2$, $$\beta= \sup_x \norm{Hf(x)},$$ where $Hf(x) \in \RR^{d \times d}$ is the Hessian of $f$ at $x$ and $\norm{\cdot}$ is the spectral operator norm (largest eigenvalue).

We consider a simple problem, corresponding to the minimization of a 2-D ($d=2$) quadratic form $$f(x) = \frac{1}{2} \pa{ x_1^2 + \eta x_2^2 } ,$$ where $\eta>0$ controls the anisotropy, and hence the difficulty, of the problem.

We can note that minimizing a strongly convex quadratic function is equivalent to solving a positive definite linear system. Gradient descent is then equivalent to the Richardson iteration applied to this linear system.
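To make this equivalence concrete, here is a minimal sketch (written in up-to-date Julia syntax; the matrix $A$ and vector $b$ are illustrative and not part of the original notebook): gradient descent applied to $f(x)=\frac{1}{2}\dotp{Ax}{x}-\dotp{b}{x}$ performs exactly the Richardson iteration $x \leftarrow x + \tau(b - Ax)$ for the linear system $Ax=b$.

```julia
# Gradient descent on f(x) = <Ax,x>/2 - <b,x> has gradient A*x - b, so the
# update x <- x - tau*(A*x - b) is exactly the Richardson iteration for Ax = b.
# A and b below are illustrative; any symmetric positive definite A works.
function richardson_demo(nbiter)
    A = [2.0 0.0; 0.0 8.0]
    b = [1.0, 1.0]
    tau = 1.8 / 8.0        # 0 < tau < 2/beta, with beta the largest eigenvalue of A
    x_gd = [0.0, 0.0]      # gradient descent iterate
    x_ri = [0.0, 0.0]      # Richardson iterate
    for k in 1:nbiter
        x_gd = x_gd - tau*(A*x_gd - b)   # gradient descent step
        x_ri = x_ri + tau*(b - A*x_ri)   # Richardson step: the same update
    end
    return x_gd, x_ri
end

x_gd, x_ri = richardson_demo(200)
println(x_gd)   # both iterates converge to A \ b = [0.5, 0.125]
```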

We define the anisotropy parameter $\eta$:

In [2]:
eta = 8;


We define the function $f$:

In [3]:
f = x -> (x[1]^2 + eta*x[2]^2) / 2;


Note: a 'lambda' (anonymous) function is a one-line function definition; in Matlab, this would be f = @(x) (x(1)^2 + eta*x(2)^2)/2;

An example of function evaluation:

In [4]:
f([2,3])

Out[4]:
38.0

We display the function using a contour plot:

In [5]:
include("NtToolBox/src/ndgrid.jl")
t = linspace(-.7,.7,101)
(u, v) = meshgrid(t,t)
F = (u.^2 + eta.*v.^2) ./ 2
contourf(t, t, F, 35);


We define the gradient of $f$:

In [6]:
function Grad_f(x)
    return [x[1]; eta*x[2]]
end;


An example of evaluation:

In [7]:
Grad_f([1,2]);


Since $f$ is quadratic, its Hessian is the constant matrix $\left(\begin{array}{cc}1&0\\0&\eta\end{array}\right)$. Its spectral norm, which is the Lipschitz constant of Grad_f, is $\beta=\max(1,\eta)$.
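As a quick numerical sanity check (a small sketch, not part of the original notebook; written in up-to-date Julia syntax, where `eigvals` lives in the `LinearAlgebra` standard library, while on Julia 0.5 it is available in Base), the spectral norm of this Hessian is indeed $\max(1,\eta)$:

```julia
using LinearAlgebra   # for eigvals (on Julia 0.5 this line is not needed)

eta = 8
H = [1.0 0.0; 0.0 eta]             # the constant Hessian of f
beta = maximum(abs.(eigvals(H)))   # spectral norm of a symmetric matrix
println(beta)                      # max(1, eta) = 8.0
```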

The stepsize $\tau$ must satisfy $0< \tau < 2/\beta$.

In [8]:
tau = 1.8/max(eta,1); tau

Out[8]:
0.225

Now we implement the gradient descent method: given the initial estimate $x^{(0)}$ of the solution, the stepsize $\tau$ and the number $k$ of iterations, we compute $x^{(k)}$:

In [9]:
nbiter = 10
x = [1, 1]  # initial estimate of the solution
for iter in 1:nbiter  # iter goes from 1 to nbiter
    x = x - tau.*Grad_f(x)  # one gradient descent step
end
x   # to display x, like in Matlab. Use print(x) if this is not the last command of the cell; otherwise nothing is displayed

Out[9]:
2-element Array{Float64,1}:
0.0781658
0.107374 
In [10]:
f(x)

Out[10]:
0.049171809817584886

We encapsulate the code in a function called GradDescent.

In [11]:
function GradDescent(Grad_f, x0, nbiter, tau)
    x = x0
    for iter in 1:nbiter  # iter goes from 1 to nbiter
        x = x - tau.*Grad_f(x)  # x has type 'array'. Like in C, one can also write x -= tau*Grad_f(x)
    end
    return x
end;

In [12]:
GradDescent(Grad_f, [1, 1], 10, tau)

Out[12]:
2-element Array{Float64,1}:
0.0781658
0.107374 

We define a function called GradDescentArray which stores all the iterates as the rows of a 'matrix' (a 2-D array): at every iteration, the new iterate is appended as a row by vertical concatenation (note that [xlist; [x[1] x[2]]] stacks along the rows, whereas + on arrays is the classical elementwise addition).

In [13]:
function GradDescentArray(Grad_f, x0, nbiter, tau)
    x = x0
    xlist = [x0[1] x0[2]]
    for iter in 1:nbiter
        x = x - tau.*Grad_f(x)        # one gradient descent step
        xlist = [xlist; [x[1] x[2]]]  # append the new iterate as a row
    end
    return xlist
end;

In [14]:
xarray = GradDescentArray(Grad_f,[0.6,0.6],10,0.225)

Out[14]:
11×2 Array{Float64,2}:
0.6         0.6
0.465      -0.48
0.360375    0.384
0.279291   -0.3072
0.21645     0.24576
0.167749   -0.196608
0.130005    0.157286
0.100754   -0.125829
0.0780845   0.100663
0.0605155  -0.0805306
0.0468995   0.0644245

We can access the elements of an array like in Matlab. Be careful: unlike in C or Python, the first element has index 1! Some examples:

In [15]:
xarray[2, 1]

Out[15]:
0.46499999999999997
In [16]:
xarray[2, :]

Out[16]:
2-element Array{Float64,1}:
0.465
-0.48 
In [17]:
xarray[:, 1]

Out[17]:
11-element Array{Float64,1}:
0.6
0.465
0.360375
0.279291
0.21645
0.167749
0.130005
0.100754
0.0780845
0.0605155
0.0468995

We can apply the function $f$ to every iterate $x^{(k)}$, i.e. to every row of the array xarray, using the command mapslices:

In [18]:
mapslices(f, xarray, 2)

Out[18]:
11×1 Array{Float64,2}:
1.62
1.02971
0.654759
0.416489
0.265017
0.168689
0.107407
0.0684076
0.043581
0.0277718
0.0177019

We plot the cost function $f(x^{(k)})$ as a function of $k$, in log-scale:

In [19]:
plot(0:size(xarray)[1] - 1, log10(mapslices(f, xarray, 2)), "o-");


We remark that because the function $f$ is strongly convex, the convergence is linear. Also, we can note that gradient descent is monotonic: $f(x^{(k)})$ decreases at every iteration.
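We can quantify this rate with a small sketch (not part of the original notebook): each coordinate of $x^{(k)}$ is multiplied by $1-\tau$ or $1-\tau\eta$ at every iteration, so $f(x^{(k)})$ asymptotically decays by the factor $\rho^2$ per iteration, where $\rho=\max(|1-\tau|,|1-\tau\eta|)$. With $\tau=0.225$ and $\eta=8$, $\rho=0.8$, so the ratio $f(x^{(k+1)})/f(x^{(k)})$ approaches $\rho^2 = 0.64$, consistent with the values above.

```julia
# Ratios f(x^(k+1))/f(x^(k)) for gradient descent on f(x) = (x1^2 + eta*x2^2)/2.
# Asymptotically the ratio tends to rho^2 with rho = max(|1-tau|, |1-tau*eta|).
function convergence_ratios(eta, tau, x0, nbiter)
    f(x) = (x[1]^2 + eta*x[2]^2) / 2
    Grad_f(x) = [x[1], eta*x[2]]
    x = x0
    vals = [f(x)]
    for k in 1:nbiter
        x = x - tau*Grad_f(x)
        push!(vals, f(x))
    end
    return [vals[k+1] / vals[k] for k in 1:nbiter]
end

ratios = convergence_ratios(8, 0.225, [0.6, 0.6], 30)
rho = max(abs(1 - 0.225), abs(1 - 0.225*8))   # ≈ 0.8
println(ratios[end])   # close to rho^2 = 0.64
```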

We plot the iterates on top of the contour plot of $f$:

In [20]:
contourf(t, t, F, 35)
plot(xarray[:, 1], xarray[:, 2], "w.-")

Out[20]:
1-element Array{Any,1}:
PyObject <matplotlib.lines.Line2D object at 0x000000000A23D828>

## Gradient descent in large dimension: image restoration

Local differential operators like the discrete gradient are fundamental for variational image processing.

We load the image Lena in the array $x^\sharp$ (we choose the symbol # - 'sharp', because the image is clean, not degraded).

In [21]:
name = "NtToolBox/src/data/lena.png"
xsharp = load_image(name)   # load_image is provided by NtToolBox
println(size(xsharp))
println("The size of the image is $(size(xsharp,1)) x $(size(xsharp,2)).")
println("The range of the pixel values is [$(minimum(xsharp)),$(maximum(xsharp))].")

(512,512)
The size of the image is 512 x 512.
The range of the pixel values is [0.0, 256.0].


We display the image.

In [24]:
imageplot(xsharp)
set_cmap("gray")
colorbar()       # displays the color bar close to the image
title("This is Lena");


We define the 'discrete gradient' $D$, which is the linear operator which maps an image to an image whose values are pairs of vertical and horizontal finite differences:

$$(D x)_{n_1,n_2} = \big((D x)_{n_1,n_2,v},(D x)_{n_1,n_2,h}\big)=( x_{n_1+1,n_2}-x_{n_1,n_2}, x_{n_1,n_2+1}-x_{n_1,n_2} ) \in \RR^2,$$

where $n_1$ and $n_2$ are the row and column numbers, respectively (the origin $n_1=n_2=0$ corresponds to the pixel at the top left corner) and where Neumann boundary conditions are assumed: a difference across the boundary is zero.

Let us define the operator $D$ as a function:

In [25]:
function D(x)
    vdiff = [diff(x,1); zeros(1,size(x,2))]  # concatenates along the rows
    hdiff = [diff(x,2) zeros(size(x,1),1)]   # concatenates along the columns
    return cat(3, vdiff, hdiff)              # combination along a third dimension
end;

In [26]:
v = D(xsharp);


We display the two components as two grayscale images.

In [27]:
fig = figure(figsize=(16,7))
suptitle("The two images of vertical and horizontal finite differences")
subplot(1,2,1)
imageplot(v[:, :, 1])
title("Image of vertical finite differences")
set_cmap("gray")
colorbar()

subplot(1,2,2)
imageplot(v[:, :, 2])
colorbar()
title("Image of horizontal finite differences")
set_cmap("gray")


Let us display the image of the magnitudes $\norm{(D x)_{n_1,n_2}}$, which are large near edges.

In [28]:
imshow(sqrt(sum(v.^2, 3)[:, :]))
colorbar()