View the description of this assignment at http://www.cs.ubc.ca/~nando/540-2013/lectures/homework2.pdf

Exercise 1

The ridge method solves the following minimization problem $$\min_{\boldsymbol{\theta} \in \mathbb{R}^d} \| \mathbf{y} - \mathbf{X}\boldsymbol{\theta} \|^{2}_{2} + \delta^2 \| \boldsymbol{\theta} \|^2_2$$

This expression can be written as $$\left( \mathbf{y} - \mathbf{X}\boldsymbol\theta \right)^T\left( \mathbf{y} - \mathbf{X}\boldsymbol\theta \right) + \delta^2 \boldsymbol\theta^T \boldsymbol\theta = \mathbf{y}^T\mathbf{y} + \boldsymbol\theta^T\mathbf{X}^T\mathbf{X}\boldsymbol\theta - 2 \boldsymbol\theta^T\mathbf{X}^T\mathbf{y} + \delta^2\boldsymbol\theta^T\boldsymbol\theta$$

To find the minimum, we differentiate with respect to $\boldsymbol\theta$ and set the gradient equal to zero: $$2\mathbf{X}^T\mathbf{X}\boldsymbol\theta - 2 \mathbf{X}^T \mathbf{y} + 2 \delta^2 \boldsymbol\theta =0$$

$$\boldsymbol\theta = \delta^{-2} \left( \mathbf{X}^T\mathbf{y} - \mathbf{X}^T\mathbf{X}\boldsymbol\theta \right) = \mathbf{X}^T\boldsymbol\alpha$$

where $\boldsymbol\alpha = \delta^{-2}\left( \mathbf{y} - \mathbf{X}\boldsymbol\theta \right)$
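
The relation $\boldsymbol\theta = \mathbf{X}^T\boldsymbol\alpha$ can be checked numerically. The following is a minimal sketch, assuming NumPy and small random data; the sizes `n`, `d`, the seed, and the value of `delta` are arbitrary choices for illustration, not part of the assignment. It obtains $\boldsymbol\theta$ by solving the stationarity condition rearranged as $\left( \mathbf{X}^T\mathbf{X} + \delta^2 I \right)\boldsymbol\theta = \mathbf{X}^T\mathbf{y}$, then builds $\boldsymbol\alpha$ from its definition.

```python
import numpy as np

# Illustrative random problem (sizes, seed and delta are assumptions).
rng = np.random.default_rng(0)
n, d = 5, 3
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
delta = 0.7

# Solve the stationarity condition (X^T X + delta^2 I) theta = X^T y.
theta = np.linalg.solve(X.T @ X + delta**2 * np.eye(d), X.T @ y)

# alpha as defined in the text: delta^{-2} (y - X theta).
alpha = (y - X @ theta) / delta**2

# Gradient of the ridge objective at theta; it should vanish.
gradient = 2 * X.T @ X @ theta - 2 * X.T @ y + 2 * delta**2 * theta

print("gradient is zero:", np.allclose(gradient, 0))
print("theta == X^T alpha:", np.allclose(theta, X.T @ alpha))
```

Using `np.linalg.solve` rather than forming an explicit inverse is a design choice made here for numerical stability; either approach illustrates the same identity.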

Exercise 2

Lemma: $$A \left( A^TA + aI \right)^{-1} = \left( AA^T + aI \right)^{-1} A$$

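Proof: Assume $a$ is such that both $A^TA + aI$ and $AA^T + aI$ are invertible (for example, $a > 0$). Multiplying both sides on the right by $\left( A^TA + aI \right)$ shows that the lemma is equivalent to $$A = \left( AA^T + aI \right)^{-1} A \left( A^TA + aI \right)$$ and the right-hand side simplifies to $A$: $$\left( AA^T + aI \right)^{-1} A \left( A^TA + aI \right) =\left( AA^T + aI \right)^{-1} \left( AA^TA + aA \right)$$ $$=\left( AA^T + aI \right)^{-1} \left( AA^T + aI \right)A = A \; \blacksquare$$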
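
As a sanity check on the lemma, here is a small numerical sketch, assuming NumPy; the matrix shape, the seed, and the value of `a` are arbitrary illustrative choices. Note that the two identity matrices have different sizes when $A$ is not square.

```python
import numpy as np

# Illustrative non-square matrix (shape, seed and a are assumptions).
rng = np.random.default_rng(1)
n, d = 4, 6
A = rng.standard_normal((n, d))
a = 0.5

# Left-hand side: A (A^T A + a I_d)^{-1}, with a d x d identity.
lhs = A @ np.linalg.inv(A.T @ A + a * np.eye(d))

# Right-hand side: (A A^T + a I_n)^{-1} A, with an n x n identity.
rhs = np.linalg.inv(A @ A.T + a * np.eye(n)) @ A

print("lemma holds numerically:", np.allclose(lhs, rhs))
```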

From Exercise 1 we know that $$\boldsymbol\alpha = \delta^{-2}\left( \mathbf{y} - \mathbf{X}\boldsymbol\theta \right)$$

Solving the stationarity condition of Exercise 1 for $\boldsymbol\theta$ gives the ridge solution $\boldsymbol\theta = \left( \mathbf{X}^T\mathbf{X} + \delta^2I \right)^{-1} \mathbf{X}^T\mathbf{y}$. We substitute this for $\boldsymbol\theta$ and then apply the lemma with $A = \mathbf{X}$ and $a = \delta^2$: $$\boldsymbol\alpha = \delta^{-2}\left( \mathbf{y} - \mathbf{X} \left( \mathbf{X}^T\mathbf{X} + \delta^2I \right)^{-1} \mathbf{X}^T\mathbf{y} \right)$$ $$=\delta^{-2}\left( I - \left( \mathbf{X}\mathbf{X}^T + \delta^2I \right)^{-1} \mathbf{X}\mathbf{X}^T \right)\mathbf{y}$$ $$=\delta^{-2} \left( \mathbf{X}\mathbf{X}^T + \delta^2I \right)^{-1} \left( \left( \mathbf{X}\mathbf{X}^T + \delta^2I \right) - \mathbf{X}\mathbf{X}^T \right)\mathbf{y}$$ $$=\left( \mathbf{X}\mathbf{X}^T + \delta^2I \right)^{-1}\mathbf{y}$$
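
The final expression for $\boldsymbol\alpha$ means the ridge solution can be computed by solving an $n \times n$ system instead of a $d \times d$ one, which may be cheaper when $d > n$. The sketch below, assuming NumPy, compares the two routes on random data; the function names `ridge_primal` and `ridge_dual`, the data sizes, the seed, and `delta` are hypothetical choices made for illustration.

```python
import numpy as np

def ridge_primal(X, y, delta):
    """theta = (X^T X + delta^2 I_d)^{-1} X^T y, via a d x d solve."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + delta**2 * np.eye(d), X.T @ y)

def ridge_dual(X, y, delta):
    """alpha = (X X^T + delta^2 I_n)^{-1} y, then theta = X^T alpha."""
    n = X.shape[0]
    alpha = np.linalg.solve(X @ X.T + delta**2 * np.eye(n), y)
    return X.T @ alpha, alpha

# Illustrative problem with more features than samples (d > n).
rng = np.random.default_rng(2)
X = rng.standard_normal((4, 10))
y = rng.standard_normal(4)
delta = 0.3

theta_primal = ridge_primal(X, y, delta)
theta_dual, alpha = ridge_dual(X, y, delta)

# alpha from the dual solve should match its definition from Exercise 1.
alpha_from_definition = (y - X @ theta_primal) / delta**2

print("primal and dual theta agree:", np.allclose(theta_primal, theta_dual))
print("alpha matches its definition:", np.allclose(alpha, alpha_from_definition))
```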