# Newton Method


This tour explores the use of the Newton method for the unconstrained optimization of a smooth function.

In [2]:
addpath('toolbox_signal')
addpath('toolbox_general')


## Newton Method for Unconstrained Problems

We test here the Newton method for the minimization of a 2-D function.

We define a highly anisotropic function, the Rosenbrock function

$$f(x) = (1-x_1)^2 + 100 (x_2-x_1^2)^2$$
In [3]:
f = @(x1,x2)(1-x1).^2 + 100*(x2-x1.^2).^2;


The minimum of the function is reached at $x^\star=(1,1)$ and $f(x^\star)=0$.

Evaluate the function on a regular grid.

In [4]:
x1 = linspace(-2,2,150);
x2 = linspace(-.5,3,150);
[X2,X1] = meshgrid(x2,x1);
F = f(X1,X2);


3-D display.

In [5]:
clf;
surf(x2,x1, F, perform_hist_eq(F, 'linear') );
% camlight;
% axis tight;
% colormap jet;


2-D display (histogram equalization helps to better visualize the iso-contours).

In [6]:
clf;
imageplot( perform_hist_eq(F, 'linear') );
colormap jet(256);


Gradient descent methods, which only use first-order (gradient) information about $f$, are not able to efficiently minimize this function because of its high anisotropy (a short illustration is given below, after the gradient is defined).

Define the gradient of $f$: $$\nabla f(x) = \pa{ \pd{f(x)}{x_1}, \pd{f(x)}{x_2} } = \pa{ 2 (x_1-1) + 400 x_1 (x_1^2-x_2), 200 (x_2-x_1^2) } \in \RR^2.$$

In [7]:
gradf = @(x1,x2)[2*(x1-1) + 400*x1.*(x1.^2-x2); 200*(x2-x1.^2)];
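
To illustrate the difficulty mentioned above, here is a minimal fixed-step gradient descent sketch (not part of the original tour; the step size tau, starting point and iteration count are arbitrary illustrative choices):

tau = 2e-4; niter = 5000;             % illustrative step size and iteration count
z = [-1.5; 1.5];                      % arbitrary starting point
for i=1:niter
    z = z - tau*gradf(z(1),z(2));     % first-order update
end
disp(z');                             % convergence toward x^*=(1,1) is very slow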


Compute its Hessian $$Hf(x) = \begin{pmatrix} \frac{\partial^2 f(x)}{\partial x_1^2} & \frac{\partial^2 f(x)}{\partial x_1 \partial x_2} \\ \frac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \frac{\partial^2 f(x)}{\partial x_2^2}  \end{pmatrix} = \begin{pmatrix} 2 + 400 (x_1^2-x_2) + 800 x_1^2 & -400 x_1 \\ -400 x_1 & 200 \end{pmatrix} \in \RR^{2 \times 2}. $$

In [8]:
hessf = @(x1,x2)[2 + 400*(x1.^2-x2) + 800*x1.^2, -400*x1; ...
-400*x1,  200];
Hessf = @(x)hessf(x(1),x(2));


The Newton descent method, starting from some $x^{(0)} \in \RR^2$, iterates $$x^{(\ell+1)} = x^{(\ell)} - Hf( x^{(\ell)} )^{-1} \nabla f(x^{(\ell)}).$$
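
As a minimal illustration of this update (a sketch only, not the exo1 solution; the starting point and number of iterations are arbitrary choices):

x = [-1.2; 1];                             % a classical starting point for the Rosenbrock function
for i=1:10                                 % a handful of full Newton steps
    x = x - Hessf(x) \ gradf(x(1),x(2));   % solve the 2x2 Newton system instead of inverting Hf
end
disp(x');                                  % should be very close to x^* = (1,1)

In contrast with the gradient descent sketch above, only a handful of iterations are needed.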

Exercise 1

Implement the Newton algorithm. Display the evolution of $f(x^{(\ell)})$ and $\norm{x^{(\ell)}-x^{(+\infty)}}$ during the iterations.

In [9]:
exo1()

In [10]:
%% Insert your code here.


Exercise 2

Display the evolution of $x^{(\ell)}$, from several starting points.

In [11]:
exo2()

In [12]:
%% Insert your code here.
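
A possible way to produce such a display (a sketch only; the starting points and the 10 Newton iterations per trajectory are arbitrary choices):

clf;
contour(x1, x2, perform_hist_eq(F,'linear')', 20); hold on;  % iso-contours of f (transposed so x1 is horizontal)
X0 = [ [-1.2;1], [1.7;2.7], [-1.5;.5] ];                     % arbitrary starting points
for k=1:size(X0,2)
    x = X0(:,k); xl = x;
    for i=1:10
        x = x - Hessf(x) \ gradf(x(1),x(2));                 % Newton update
        xl(:,end+1) = x;
    end
    plot(xl(1,:), xl(2,:), '.-', 'LineWidth', 2);            % trajectory of the iterates
end
axis tight;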


## Gradient and Divergence of Images

Local differential operators like the gradient, divergence and Laplacian are the building blocks of variational image processing.

Load an image $g \in \RR^N$ of $N=n \times n$ pixels.

In [13]:
n = 256;
g = rescale( load_image('lena', n) );   % load a test image (the image name here is only an example)


Display it.

In [14]:
clf;
imageplot(g);


For continuous functions, the gradient reads $$\nabla g(x) = \pa{ \pd{g(x)}{x_1}, \pd{g(x)}{x_2} } \in \RR^2.$$ (note that here, the variable $x$ denotes the 2-D spatial position).

We discretize this differential operator using first order finite differences. $$(\nabla g)_i = ( g_{i_1,i_2}-g_{i_1-1,i_2}, g_{i_1,i_2}-g_{i_1,i_2-1} ) \in \RR^2.$$ Note that for simplicity we use periodic boundary conditions.

Compute its gradient, using finite differences.

In [15]:
s = [n 1:n-1];
grad = @(u)cat(3, u-u(s,:), u-u(:,s));   % backward finite differences with periodic boundary


One thus has $\nabla : \RR^N \mapsto \RR^{N \times 2}.$

In [16]:
v = grad(g);


One can display each of its components.

In [17]:
clf;
imageplot(v(:,:,1), 'd/dx', 1,2,1);
imageplot(v(:,:,2), 'd/dy', 1,2,2);


One can also display it using a color image.

In [18]:
clf;
imageplot(v);


One can display its magnitude $\norm{\nabla g(x)}$, which is large near edges.

In [19]:
clf;
imageplot( sqrt( sum3(v.^2,3) ) );


The divergence operator maps vector fields to images. For continuous vector fields $v(x) \in \RR^2$, it is defined as $$\text{div}(v)(x) = \pd{v_1(x)}{x_1} + \pd{v_2(x)}{x_2} \in \RR.$$ (note that here, the variable $x$ denotes the 2-D spatial position). It is minus the adjoint of the gradient, i.e. $\text{div} = - \nabla^*$.

It is discretized, for $v=(v^1,v^2)$ as $$\text{div}(v)_i = v^1_{i_1+1,i_2} - v^1_{i_1,i_2} + v^2_{i_1,i_2+1} - v^2_{i_1,i_2} .$$

In [20]:
t = [2:n 1];
div = @(v)v(t,:,1)-v(:,:,1) + v(:,t,2)-v(:,:,2);
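
As a quick numerical sanity check (not part of the original cell), one can verify on a random vector field that div is indeed minus the adjoint of the gradient, i.e. that $\dotp{\nabla g}{v} = -\dotp{g}{\text{div}(v)}$ up to machine precision:

dotp = @(a,b)sum(a(:).*b(:));
v0 = randn(n,n,2);                       % arbitrary random vector field
fprintf('Should be 0: %e\n', dotp(grad(g),v0) + dotp(g,div(v0)));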


The Laplacian operator is defined as $\Delta=\text{div} \circ \nabla = -\nabla^* \circ \nabla$. It is thus a negative symmetric operator.

In [21]:
delta = @(f)div(grad(f));
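
Similarly (again a sketch, not part of the original cell), one can check numerically that $\Delta$ is symmetric, $\dotp{\Delta a}{b} = \dotp{a}{\Delta b}$, on arbitrary test images, reusing the dotp helper from the check above:

a = randn(n); b = randn(n);              % arbitrary test images
fprintf('Should be 0: %e\n', dotp(delta(a),b) - dotp(a,delta(b)));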


Display $\Delta g$.

In [22]:
clf;
imageplot(delta(g));


Check numerically the relation $\norm{\nabla g}^2 = - \dotp{\Delta g}{g}$.

In [23]:
dotp = @(a,b)sum(a(:).*b(:));
fprintf('Should be 0: %e\n', dotp(grad(g),grad(g)) + dotp(delta(g),g));


## Newton Method in Image Processing

We consider now the problem of denoising an image $y \in \RR^N$ where $N = n \times n$ is the number of pixels ($n$ being the number of rows/columns in the image).

Add noise to the clean image $g$ to simulate a noisy image $y$.

In [24]:
sigma = .1;
y = g + randn(n)*sigma;


Display the noisy image $y$.

In [25]:
clf;
imageplot(clamp(y));