$\newcommand{\umin}[1]{\underset{#1}{\min}\;}$
We have seen in the lab 3 that total variation denoising can be performed using the dual forward-backward algorithm. But the setting is restrictive: this algorithm cannot be applied to general inverse problems.
This tour explores the primal-dual proximal splitting algorithm proposed in
A. Chambolle and T. Pock, "A First-order primal-dual algorithm for convex problems with application to imaging," Journal of Mathematical Imaging and Vision, vol. 40, no. 1, 2011
and further analyzed and extended in
L. Condat, "A primal-dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms," J. Optimization Theory and Applications, vol. 158, no. 2, 2013.
from __future__ import division
%pylab inline
%load_ext autoreload
%autoreload 2
We consider a (primal) optimization problem of the form $$ \umin{x} f(x) + g(Lx) $$ where $f$ and $g$ are convex functions, whose proximity operators can be computed, and $L$ is a linear operator.
The dual problem is
$$ \umin{u} f^*(-L^*u) + g^*(u) $$The (relaxed) Chambolle-Pock algorithm takes initial estimates $x^{(0)}$ and $u^{(0)}$ of the primal and dual solutions, a parameter $\tau>0$, a second parameter $0<\sigma\leq 1/(\tau\|L\|^2)$, a relaxation parameter $0<\rho<2$, and iterates, for $k=1,2,\ldots$ $$ \left|\begin{array}{l} \tilde{x}^{(k)} = \mathrm{prox}_{\tau f}( x^{(k-1)}-\tau L^*(u^{(k-1)}) ) \\ \tilde{u}^{(k)} = \mathrm{prox}_{\sigma g^*}( u^{(k-1)}+ \sigma L(2\tilde{x}^{(k)}-x^{(k-1)}) \\ x^{(k)}= x^{(k-1)} + \rho (\tilde{x}^{(k)}-x^{(k-1)})\\ u^{(k)}= u^{(k-1)} + \rho (\tilde{u}^{(k)}-u^{(k-1)}) \end{array}\right.$$
Then, $x^{(k)}$ converges to a primal solution $x^\star$ and $u^{(k)}$ converges to a dual solution $u^\star$.
In practice, like for the Douglas-Rachford algorithm, it is always interesting to take $\rho$ close to $2$, e.g. $\rho=1.9$, instead of $\rho=1$ like in the paper of Chambolle & Pock. Also, for fixed $\tau$, the higher $\sigma$, the better; so, one can set $\sigma=1/(\tau\|L\|^2)$, which leaves only the parameter $\tau$ to tune.
With this choice of $\sigma$, the algorithm exactly reverts to the Douglas-Rachford algorithm when $L=\mathrm{Id}$ (replacing $\sigma$ by $1/\tau$ in the algorithm). So, it is a natural extension of the latter.
We recall that being able to compute the proximity operator of $f^*$ is equivalent to being able to compute the proximity operator of $f$, thanks to the Moreau identity $$ x = \mathrm{prox}_{\gamma f^*}(x) + \gamma \mathrm{prox}_{f/\gamma}(x/\gamma) $$
Like in the lab 1, we want to reconstruct an estimate of the Lena image from a random subset of its pixels. So, we want to solve $$\umin{x} \mathrm{TV}(x)\quad\mbox{s.t.}\quad Ax=b,$$ where we keep the notations of the labs 1 and 3: $A$ is the degradation operator which multiplies the image by a binary mask and $\mathrm{TV}$ is the total variation.
Like in the lab 3, we want to denoise the noisy Lena image by solving $$\umin{x} \frac{1}{2}\|x-y\|^2+\lambda\mathrm{TV}(x).$$