$\newcommand{\umin}[1]{\underset{#1}{\min}\;}$
We saw in Lab 3 that total variation denoising can be performed with the dual forward-backward algorithm. But that setting is restrictive: the algorithm cannot be applied to general inverse problems.
This tour explores the primal-dual proximal splitting algorithm proposed in
A. Chambolle and T. Pock, "A first-order primal-dual algorithm for convex problems with applications to imaging," Journal of Mathematical Imaging and Vision, vol. 40, no. 1, pp. 120-145, 2011,
and further analyzed and extended in
L. Condat, "A primal-dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms," Journal of Optimization Theory and Applications, vol. 158, no. 2, pp. 460-479, 2013.
using PyPlot
We consider a (primal) optimization problem of the form $$ \umin{x} f(x) + g(Lx) $$ where $f$ and $g$ are convex functions, whose proximity operators can be computed, and $L$ is a linear operator.
The dual problem is
$$ \umin{u} f^*(-L^*u) + g^*(u) $$

The (relaxed) Chambolle-Pock algorithm takes initial estimates $x^{(0)}$ and $u^{(0)}$ of the primal and dual solutions, a parameter $\tau>0$, a second parameter $\sigma$ with $0<\sigma\leq 1/(\tau\|L\|^2)$, a relaxation parameter $0<\rho<2$, and iterates, for $k=1,2,\ldots$ $$ \left|\begin{array}{l} \tilde{x}^{(k)} = \mathrm{prox}_{\tau f}( x^{(k-1)}-\tau L^*u^{(k-1)} ) \\ \tilde{u}^{(k)} = \mathrm{prox}_{\sigma g^*}( u^{(k-1)}+ \sigma L(2\tilde{x}^{(k)}-x^{(k-1)}) ) \\ x^{(k)}= x^{(k-1)} + \rho (\tilde{x}^{(k)}-x^{(k-1)})\\ u^{(k)}= u^{(k-1)} + \rho (\tilde{u}^{(k)}-u^{(k-1)}) \end{array}\right.$$
Then, $x^{(k)}$ converges to a primal solution $x^\star$ and $u^{(k)}$ converges to a dual solution $u^\star$.
In practice, as for the Douglas-Rachford algorithm, it is usually advantageous to take $\rho$ close to $2$, e.g. $\rho=1.9$, instead of $\rho=1$ as in the paper of Chambolle & Pock. Also, for fixed $\tau$, the larger $\sigma$, the better; so one can set $\sigma=1/(\tau\|L\|^2)$, which leaves only the parameter $\tau$ to tune.
With this choice of $\sigma$, the algorithm reduces exactly to the Douglas-Rachford algorithm when $L=\mathrm{Id}$ (replacing $\sigma$ by $1/\tau$ in the iterations). It is thus a natural extension of the latter.
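The iterations above can be sketched generically; this is a minimal NumPy illustration, where `prox_tau_f`, `prox_sigma_gstar`, `L`, and `Lt` (the adjoint $L^*$) are user-supplied callables and all names are placeholders, not from any library:

```python
import numpy as np

def chambolle_pock(prox_tau_f, prox_sigma_gstar, L, Lt, x0, u0,
                   tau, sigma, rho=1.9, n_iter=100):
    """Relaxed Chambolle-Pock iterations.

    Convergence requires sigma <= 1 / (tau * ||L||^2) and 0 < rho < 2.
    """
    x, u = x0.copy(), u0.copy()
    for _ in range(n_iter):
        x_tilde = prox_tau_f(x - tau * Lt(u))                    # primal prox step
        u_tilde = prox_sigma_gstar(u + sigma * L(2 * x_tilde - x))  # dual prox step
        x = x + rho * (x_tilde - x)                              # relaxation
        u = u + rho * (u_tilde - u)
    return x, u
```

With $L=\mathrm{Id}$ and $\sigma=1/\tau$, these iterations coincide with those of Douglas-Rachford, as noted above.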
We recall that being able to compute the proximity operator of $f^*$ is equivalent to being able to compute that of $f$, thanks to the Moreau identity: $$ x = \mathrm{prox}_{\gamma f^*}(x) + \gamma\, \mathrm{prox}_{f/\gamma}(x/\gamma). $$
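As a quick sanity check of this identity, here is a NumPy sketch for $f=\|\cdot\|_1$: its conjugate is the indicator of the $\ell_\infty$ unit ball, so $\mathrm{prox}_{\gamma f^*}$ must be the projection onto $[-1,1]^n$, for any $\gamma$ (function names are illustrative):

```python
import numpy as np

def soft_threshold(z, lam):
    # prox of lam * ||.||_1 : componentwise soft-thresholding
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def prox_conj_l1(x, gamma):
    # Moreau identity rearranged: prox_{gamma f*}(x) = x - gamma * prox_{f/gamma}(x/gamma);
    # here f = ||.||_1, so prox_{f/gamma} is soft-thresholding at level 1/gamma.
    return x - gamma * soft_threshold(x / gamma, 1.0 / gamma)
```

Whatever the value of $\gamma$, the result equals `np.clip(x, -1, 1)`, the projection onto the $\ell_\infty$ unit ball.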
As in Lab 1, we want to reconstruct an estimate of the Lena image from a random subset of its pixels. That is, we want to solve $$\umin{x} \mathrm{TV}(x)\quad\mbox{s.t.}\quad Ax=b,$$ where we keep the notations of Labs 1 and 3: $A$ is the degradation operator that multiplies the image by a binary mask, and $\mathrm{TV}$ is the total variation.
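The two proximable pieces of this problem can be sketched as follows, on a small random mask (a NumPy illustration; `mask`, `b`, and the function names are hypothetical): $f$ is the indicator of the constraint set $\{x : Ax=b\}$, whose prox is the projection that resets the observed pixels, and $L$ is the discrete gradient on which $\mathrm{TV}$ is built.

```python
import numpy as np

rng = np.random.default_rng(0)
mask = rng.random((8, 8)) < 0.4   # True where pixels are observed (toy data)
x_true = rng.random((8, 8))
b = x_true[mask]                  # observed pixel values

def prox_f(x):
    # prox of the indicator of {x : A x = b}: projection that
    # resets the observed pixels to their measured values.
    x = x.copy()
    x[mask] = b
    return x

def grad(x):
    # Discrete gradient L (forward differences, Neumann boundary),
    # with one channel per spatial direction.
    g = np.zeros(x.shape + (2,))
    g[:-1, :, 0] = x[1:, :] - x[:-1, :]
    g[:, :-1, 1] = x[:, 1:] - x[:, :-1]
    return g
```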
Try different values of $\tau$ and $\rho$ and observe the convergence speed by monitoring the decay of $\mathrm{TV}(x^{(k)})$.
Compare the inpainted image with the one obtained in Lab 1, with Tikhonov regularization instead of total variation.
As in Lab 3, we want to denoise the noisy Lena image by solving $$\umin{x} \frac{1}{2}\|x-y\|^2+\lambda\mathrm{TV}(x).$$
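For this problem, $\mathrm{prox}_{\tau f}$ with $f(x)=\frac{1}{2}\|x-y\|^2$ has a simple closed form, $(x+\tau y)/(1+\tau)$; a minimal NumPy sketch (the function name is illustrative):

```python
import numpy as np

def prox_data_term(x, y, tau):
    # prox_{tau f}(x) with f(x) = 0.5 * ||x - y||^2, i.e. the minimizer of
    # w -> 0.5 * ||w - x||^2 + tau * 0.5 * ||w - y||^2.
    return (x + tau * y) / (1.0 + tau)
```

The optimality condition $(p-x)+\tau(p-y)=0$ gives the formula directly.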
Try different values of $\tau$ and $\rho$ and observe the convergence speed by monitoring the sum of the primal and dual energies, as in Lab 3. Compare with the accelerated forward-backward algorithm applied to the dual problem, developed in Lab 3.