# Forward-Backward Splitting¶


In [1]:
from __future__ import division
%pylab inline

Populating the interactive namespace from numpy and matplotlib


This numerical tour presents the Forward-Backward (FB) algorithm to minimize the sum of a smooth and a simple function. It shows an application to sparse deconvolution.

## Forward-Backward Algorithm¶

We consider the problem of minimizing the sum of two functions $$E^\star = \umin{x \in \RR^N} E(x) = f(x) + g(x).$$

So, we want to find a vector $x^\star$ solution to the problem, i.e. a minimizer of $E=f+g$.

We assume that $f$ is a $C^1$ function with $\beta$-Lipschitz gradient.

We also assume that $g$ is "simple", in the sense that one can compute exactly and quickly its proximity operator, which is defined as $$\text{prox}_{\ga g}(x) = \uargmin{y \in \RR^N} \frac{1}{2}\norm{x-y}^2 + \ga g(y).$$ for any $\ga > 0$.

The forward-backward algorithm reads, after initializing $x^{(0)} \in \RR^N$, $$x^{(k+1)} = \text{prox}_{\ga g}\pa{ x^{(k)} - \ga \nabla f( x^{(k)} ) }.$$

If $0 < \ga < 2/\beta$, then this scheme converges to a minimizer of $f+g$.

## Sparse Regularization of Inverse Problems¶

We consider a linear inverse problem $$y = A x^\sharp + w \in \RR^P$$ where $x^\sharp \in \RR^N$ is the (unknown) signal to recover, $w \in \RR^P$ is a noise vector, and $A \in \RR^{P \times N}$ models the acquisition device.

To recover an estimate of the signal $x^\sharp$, we consider basis pursuit denoising, which makes use of the $\ell^1$ norm as sparsity enforcing penalty: $$\umin{x \in \RR^N} \frac{1}{2} \norm{A x-y}^2 + \la \norm{x}_1,$$ where the $\ell^1$ norm is defined as $$\norm{x}_1 = \sum_i \abs{x_i}.$$

The parameter $\la$ should be set in accordance to the noise level $\norm{w}$.

This minimization problem can be cast in the form of minimizing $f+g$ where $$f(x) = \frac{1}{2} \norm{Ax-y}^2 \qandq g(x) = \la \norm{x}_1.$$

$f$ is smooth; we have $$\nabla f(x) = A^* (A x - y),$$ which is $\beta$-Lipschitz continuous, with $$\beta = \norm{ A^* A }.$$

The $\ell^1$-norm is "simple", because its proximal operator is soft thresholding: $$\big(\text{prox}_{\ga g}(x)\big)_n = \max\pa{ 0, 1 - \frac{\la \ga}{\abs{x_n}} } x_n.$$

## Signal Deconvolution on Synthetic Sparse Data¶

A simple linearized model of seismic acquisition considers a linear filtering operator (convolution): $$A x = h \ast x$$

The filter $h$ is called the impulse response, or the poind spread function, of the acquisition process $x\mapsto Ax$.

In [2]:
N = 1024


We define the width of the filter $h$.

In [3]:
s = 5


We define $h$ as the second derivative of a Gaussian.

In [4]:
t = arange(-N/2,N/2)
h = (1-t**2/s**2)*exp(-(t**2)/(2*s**2))
h = h - h.mean()


We define the operator $A$. For simplicity, here periodic boundary conditions are used, so that the convolution is efficiently implemented as a product in Fourier domain.

In [5]:
h_tf = fft.fft(fft.fftshift(h))
opA = lambda u : real(fft.ifft(fft.fft(u) * h_tf))


We display the filter $h$ and its spectrum (amplitude of its Fourier transform).

In [6]:
figsize(7,5)
plot(t,h)
xlim(-100,100)

Out[6]:
(-100, 100)
In [7]:
plot(t,fft.fftshift(abs(h_tf)))

Out[7]:
[<matplotlib.lines.Line2D at 0x108607d10>]

We generate a synthetic sparse signal $x^\sharp$, with only a small number of nonzero coefficients.

In [8]:
random.seed(80) # we set the seed of the random number generator for reproducibility purpose.

In [9]:
s = round(N*.01) # number of nonzero elements of xsharp
sel = random.permutation(N)
sel = sel[0:s]   # indices of the nonzero elements of xsharp
xsharp = zeros(N)
xsharp[sel] = sign(randn(s)) * (1-0.3*rand(s))

In [10]:
noiselevel = 0.2


Compute the measurements $y=A x^\sharp + w$ where $w$ is a realization of white Gaussian noise.

In [11]:
y = opA(xsharp) + noiselevel * randn(N)

In [12]:
figsize(14,5)
stem(sel,xsharp[sel])
xlim(0,N-1)
title('signal $x^\sharp$')

Out[12]:
<matplotlib.text.Text at 0x108711090>
In [13]:
figsize(14,5)
plot(range(N),y)
xlim(0,N-1)
title('signal $y$')

Out[13]:
<matplotlib.text.Text at 0x108a92110>

## Deconvolution¶

We now implement the foward-backward algorithm to recover an estimate of the sparse signal

We define the regularization parameter $\la$.

In [14]:
Lambda = 3


We define the proximity operator of $\ga g$.

In [15]:
def prox_gamma_g (x, gamma, Lambda) :
return x - x/maximum(abs(x)/(Lambda*gamma),1) # soft-thresholding


We define the gradient operator of $f$. Note that $A^*=A$ because the filter $h$ is symmetric.

### __Exercise: write the code of the function grad_f.__

In [16]:
grad_f = lambda x : # put your code here


We define the Lipschitz constant $\beta$ of $\nabla f$.

In [17]:
beta = abs(fft.fft(h)).max()**2
beta

Out[17]:
85.032184045506753

We define the stepsize $\ga$, which must be smaller than $2/\beta$.

In [18]:
gamma = 1.9 / beta


We compute the solution of $\ell_1$ deconvolution (basis pursuit denoising). We keep track of the energy $E_k=f(x^{(k)})+g(x^{(k)})$.

### __Exercise: write the code of the forward-backward iteration.__

In [19]:
nbiter = 2000
x = y
En_array = zeros(nbiter+1)
En_array[0] = norm(opA(x) - y)**2/2 + Lambda*norm(x, ord=1)
for iter in range(nbiter):  # iter goes from 0 to nbiter-1
En_array[iter+1] = norm(opA(x) - y)**2/2 + Lambda*norm(x, ord=1)
x_restored = x


We display the result.

In [21]:
fig, (subfig1,subfig2) = subplots(1,2,figsize=(16,7)) # one figure with two horizontal subfigures
subfig1.stem(xsharp)
subfig2.stem(x_restored)
subfig1.set_title('$x^\sharp$')
subfig2.set_title('$x_\mathrm{restored}$')

Out[21]:
<matplotlib.text.Text at 0x1091d6e50>

We plot the relative error $(E_k-E^\star)/(E_0-E^\star)$ in log-scale with respect to $k$.

In [22]:
plot(log10((En_array[0:1800]-En_array[-1])/(En_array[0]-En_array[-1])))

Out[22]:
[<matplotlib.lines.Line2D at 0x10cf32b90>]

## Over-relaxed Forward-Backward¶

It is possible to introduce a relaxation parameter $\rho$ with $0 < \rho < 1$. The over-relaxed foward-backward algorithm initializes $x^{(0)} \in \RR^N$, and then iterates, for $k=1,2,\ldots$ $$z^{(k)} = \text{prox}_{\ga g}\pa{ x^{(k-1)} - \ga \nabla f( x^{(k-1)} ) }.$$ $$x^{(k)} = z^{(k)} + \rho \pa{ z^{(k)} - x^{(k-1)} }$$

Let us assume $\gamma=1/\beta$. Convergence of the iterates $x^{(k)}$ and $z^{(k)}$ to a solution is guaranteed for $0 < \rho < 1/2$. The weaker property of convergence of $E(x^{(k)})$ to $E^\star$ is proved, when $1/2\leq \rho <1$.

### __Exercise: write the code of the over-relaxed forward-backward iteration.__

In [27]:
gamma = 1/beta
nbiter = 1700
rho = 0.95
x = y
En_array_overrelaxed = zeros(nbiter+1)
En_array_overrelaxed[0] = norm(opA(x) - y)**2/2 + Lambda*norm(x, ord=1)
for iter in range(nbiter):
En_array_overrelaxed[iter+1] = norm(opA(x) - y)**2/2 + Lambda*norm(x, ord=1)

In [28]:
plot(log10((En_array[0:1800]-En_array[-1])/(En_array[0]-En_array[-1])))
plot(log10((En_array_overrelaxed[0:1800]-En_array[-1])/(En_array[0]-En_array[-1])))

Out[28]:
[<matplotlib.lines.Line2D at 0x1085a78d0>]

As we see, in this example, over-relaxation does not bring any speedup, because $\gamma$ is lower than without over-relaxation. There are other setting parameters or other problems, for which over-relaxation does bring a significant speedup.

## FISTA-like Accelerated Forward-Backward Algorithm¶

We consider the FISTA algorithm introduced in:

A. Beck and M. Teboulle, "A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems", SIAM Journal on Imaging Sciences, 2009.

More precisely, we consider a slightly modified version of FISTA, whose convergence is proved, see A. Chambolle and C. Dossal, "On the convergence of the iterates of "FISTA"", preprint, 2015.

Given an initial estimate $x^{(0)}$ of the solution and a parameter $a>2$, the algorithm sets $\gamma=1/\beta$, sets $z^{(0)}=x^{(0)} \in \RR^N$, and iterates, for $k=1,2,\ldots$ $$x^{(k)} = \text{prox}_{\ga g}\pa{ z^{(k-1)} - \ga \nabla f( z^{(k-1)} ) }.$$ $$\alpha_k=(k-1)/(k+a)$$ $$z^{(k)} = x^{(k)} + \alpha_k \pa{ x^{(k)} - x^{(k-1)} }$$

It is proved that the iterates $x^{(k)}$ converge to a solution $x^\star$ of the problem. Moreover, the optimal convergence rate for this class of problems is reached, namely $$E_k - E^\star = O(1/k^2),$$ whereas the convergence rate for the normal forward-backward is only $O(1/k)$.

Note the difference between the over-relaxed forward-backward and the accelerated forward-backward: the later is based on an inertia mechanism, of different nature than over-relaxation.

### __Exercise: write the code of the accelerated forward-backward iteration.__

In [31]:
gamma = 1/beta
nbiter = 1700
a = 10
x = y
En_array_fista = zeros(nbiter+1)
En_array_fista[0] = norm(opA(x) - y)**2/2 + Lambda*norm(x, ord=1)
for iter in range(nbiter):
En_array_fista[iter+1] = norm(opA(x) - y)**2/2 + Lambda*norm(x, ord=1)

In [32]:
plot(log10((En_array[0:1800]-En_array[-1])/(En_array[0]-En_array[-1])))
plot(log10((En_array_overrelaxed[0:1800]-En_array[-1])/(En_array[0]-En_array[-1])))
plot(log10((En_array_fista[0:1800]-En_array[-1])/(En_array[0]-En_array[-1])))

/Users/condatl/anaconda/lib/python2.7/site-packages/IPython/kernel/__main__.py:3: RuntimeWarning: divide by zero encountered in log10
app.launch_new_instance()
/Users/condatl/anaconda/lib/python2.7/site-packages/IPython/kernel/__main__.py:3: RuntimeWarning: invalid value encountered in log10
app.launch_new_instance()

Out[32]:
[<matplotlib.lines.Line2D at 0x10d984c90>]

### __Exercise: try different values of a. You can try a=3,10,30,50,100.__

We can note that the accelerated forward-backward is not monotonic: the cost function E is not decreasing along with the iterations and some oscillations are present.