Linear Algebra and Optimisation

COMP4670/8600 - Introduction to Statistical Machine Learning - Tutorial 1B

$\newcommand{\trace}[1]{\operatorname{tr}\left\{#1\right\}}$ $\newcommand{\Norm}[1]{\lVert#1\rVert}$ $\newcommand{\RR}{\mathbb{R}}$ $\newcommand{\inner}[2]{\langle #1, #2 \rangle}$ $\newcommand{\DD}{\mathscr{D}}$ $\newcommand{\grad}[1]{\operatorname{grad}#1}$ $\DeclareMathOperator*{\argmin}{arg\,min}$

Setting up the Python environment (do not use pylab)

In [ ]:
import matplotlib.pyplot as plt
import numpy as np
import scipy as sp
import scipy.optimize as opt
import time

%matplotlib inline

Consider the following cost function $ f(X) $ defined over the space of real $ n \times p $ matrices \begin{equation} f(X) = \frac{1}{2} \trace{X^T C X N} + \mu \frac{1}{4} \Norm{N - X^T X}^2_F \end{equation} where $ X \in \RR^{n \times p} $, $ n \ge p $, and the matrix $ C $ is symmetric, such that $ C = C^T $. The scalar $ \mu $ is assumed to be larger than the $p$th smallest eigenvalue of $ C $. The matrix $ N $ is diagonal with distinct positive entries on the diagonal. The trace of a square matrix $ A $ is denoted as $ \trace{A} $, and the Frobenius norm of an arbitrary matrix $ A $ is defined as $ \Norm{A}_F = \sqrt{\trace{A^T A}} $.

Frobenius Norm

Implement a Python function frobenius_norm which accepts an arbitrary matrix $ A $ and returns $ \Norm{A}_F $ using the formula given. (Use numpy.trace and numpy.sqrt.)

  1. Given a matrix $ A \in \RR^{n \times p} $, what is the complexity of your implementation of frobenius_norm using the formula above?
  2. Can you come up with a faster implementation, if you were additionally told that $ p \ge n $ ?
  3. Can you find an even faster implementation than in 1. and 2.?

Solution description

In [ ]:
# Solution goes here
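
A minimal sketch of the three variants discussed above, assuming NumPy has been imported as np (the names frobenius_norm_gram and frobenius_norm_entries are illustrative only). The trace formula forms the $ p \times p $ product $ A^T A $, so it costs roughly $ O(np^2) $ operations; since $ \trace{A^T A} = \trace{A A^T} $, forming the $ n \times n $ product instead costs $ O(pn^2) $ and is cheaper when $ p \ge n $; summing the squared entries needs no matrix product at all and costs $ O(np) $.

In [ ]:
def frobenius_norm(A):
    """Frobenius norm via the trace formula sqrt(tr(A^T A)).
    Forms the p x p Gram matrix A^T A: roughly O(n p^2) operations."""
    return np.sqrt(np.trace(A.T @ A))

def frobenius_norm_gram(A):
    """Equivalent because tr(A^T A) = tr(A A^T); the n x n product A A^T
    is cheaper to form when p >= n: roughly O(p n^2) operations."""
    return np.sqrt(np.trace(A @ A.T))

def frobenius_norm_entries(A):
    """Sum of squared entries: O(n p), no matrix product needed."""
    return np.sqrt(np.sum(A ** 2))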

Cost Function $f(X)$ with matrix argument

Implement the cost function defined as $f(X)$ above as a function cost_function_for_matrix in Python.

Hint: As good programmers, we do not use global variables inside subroutines; pass everything the cost function needs ($C$, $N$ and $\mu$) in as arguments.

In [ ]:
# Solution goes here
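
A minimal sketch of cost_function_for_matrix, assuming $ C $, $ N $ and $ \mu $ are passed in explicitly (no globals) and reusing the frobenius_norm helper sketched above; np.linalg.norm(..., 'fro') would serve equally well.

In [ ]:
def cost_function_for_matrix(X, C, N, mu):
    """f(X) = 1/2 tr(X^T C X N) + mu/4 * ||N - X^T X||_F^2."""
    term1 = 0.5 * np.trace(X.T @ C @ X @ N)
    term2 = 0.25 * mu * frobenius_norm(N - X.T @ X) ** 2
    return term1 + term2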

Cost Function $f(X)$ with vector argument

Many standard optimisation routines work only with vectors. Fortunately, as vector spaces, the space of matrices $ \RR^{n \times p} $ and the space of vectors $ \RR^{n p} $ are isomorphic. What does this mean?

Implement the cost function $ f(X) $ given $ X $ as a vector and call it cost_function_for_vector. Which extra arguments do you need for the cost function?

In [ ]:
# Solution goes here
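
One possible sketch: besides $ C $, $ N $ and $ \mu $, the vector version needs the shape $ (n, p) $ in order to undo the flattening (alternatively, $ n $ and $ p $ can be read off $ C $ and $ N $, which are $ n \times n $ and $ p \times p $ respectively).

In [ ]:
def cost_function_for_vector(x, C, N, mu, n, p):
    """Same cost as cost_function_for_matrix, with x a flat vector of length n*p."""
    X = x.reshape(n, p)
    return cost_function_for_matrix(X, C, N, mu)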

Construction of a random matrix $C$ with given eigenvalues

A diagonal matrix has the nice property that the eigenvalues can be directly read off the diagonal. Given a diagonal matrix $ C \in \RR^{n \times n} $ with distinct eigenvalues, how many different diagonal matrices have the same set of eigenvalues?

Given a diagonal matrix $ C \in \RR^{n \times n} $ with distinct eigenvalues, how many different matrices have the same set of eigenvalues?

Given a set of $ n $ distinct real eigenvalues $ \mathcal{E} = \{e_1, \dots, e_n \} $, write a Python function random_matrix_from_eigenvalues which takes the eigenvalues as a list $ E $ and returns a random symmetric matrix $ C $ having exactly those eigenvalues.

Solution description

In [ ]:
# Solution goes here
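
A sketch of one standard construction: orthogonalise a Gaussian random matrix with a QR factorisation to obtain a random orthogonal $ Q $, then conjugate the diagonal matrix of the given eigenvalues, $ C = Q \operatorname{diag}(E)\, Q^T $; the result is symmetric and has exactly the prescribed eigenvalues.

In [ ]:
def random_matrix_from_eigenvalues(E):
    """Random symmetric matrix whose eigenvalues are the entries of the list E."""
    n = len(E)
    # QR factorisation of a Gaussian random matrix yields a random orthogonal Q.
    Q, _ = np.linalg.qr(np.random.randn(n, n))
    return Q @ np.diag(E) @ Q.T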

Minimising the cost function $f(X)$

Use the minimisation functions fmin or fmin_powell provided in the Python package scipy.optimize to minimise the cost function cost_function_for_vector.

Hint: Use the argument args in the minimisation functions fmin or fmin_powell to provide the extra parameters to your cost function cost_function_for_vector.

In [ ]:
# Solution goes here
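
A hedged sketch of how the pieces fit together; the sizes n, p and the value of mu are arbitrary illustrative choices (mu only needs to exceed the $p$th smallest eigenvalue of $ C $), and the helpers are the ones sketched earlier.

In [ ]:
n, p, mu = 5, 3, 10.0                        # illustrative values only
C = random_matrix_from_eigenvalues(np.arange(1.0, n + 1.0))
N = np.diag(np.arange(1.0, p + 1.0))

x0 = np.random.randn(n * p)                  # random starting point
x_min = opt.fmin(cost_function_for_vector, x0,
                 args=(C, N, mu, n, p),
                 maxiter=10000, maxfun=10000)
print(cost_function_for_vector(x_min, C, N, mu, n, p))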

Gradient of $f(X)$

Calculate the gradient of the cost function $f(X)$, given that the inner product on the space of real $ n \times p $ matrices is defined as \begin{equation} \inner{A}{B} = \trace{A^T B} \end{equation}

Implement a function gradient_for_vector which takes $ X $ as a vector, and returns the gradient of $ f(X) $ as a vector.

Solution description

In [ ]:
# Solution goes here
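
A sketch of gradient_for_vector, assuming the gradient obtained from the trace inner product under the stated assumptions ($ C = C^T $, $ N $ diagonal) is $ \grad f(X) = C X N - \mu\, X \left(N - X^T X\right) $.

In [ ]:
def gradient_for_vector(x, C, N, mu, n, p):
    """Gradient of f at X = x.reshape(n, p), returned as a flat vector.
    grad f(X) = C X N - mu * X (N - X^T X), using C = C^T and N = N^T."""
    X = x.reshape(n, p)
    G = C @ X @ N - mu * X @ (N - X.T @ X)
    return G.flatten()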

Minimising the cost function $f(X)$ using the gradient

Use the minimisation functions fmin_cg or fmin_bfgs provided in the Python package scipy.optimize to minimise the cost function cost_function_for_vector utilising the gradient gradient_for_vector.

Compare the speed of convergence with that of the minimisation using fmin or fmin_powell.

In [ ]:
# Solution goes here
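
A sketch of the gradient-based runs; timing with time.time() gives only a rough comparison, and x0, C, N, mu, n, p are the illustrative quantities from the earlier sketch (the same x0 should be reused across solvers to keep the comparison fair).

In [ ]:
for solver in (opt.fmin_cg, opt.fmin_bfgs):
    start = time.time()
    x_star = solver(cost_function_for_vector, x0,
                    fprime=gradient_for_vector,
                    args=(C, N, mu, n, p))
    print(solver.__name__, "took", time.time() - start, "seconds;",
          "final cost:", cost_function_for_vector(x_star, C, N, mu, n, p))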

Minima of $f(X)$

Compare the columns $x_1,\dots, x_p$ of the matrix $X^\star$ which minimises $ f(X) $ \begin{equation} X^\star = \argmin_{X \in \RR^{n \times p}} f(X) \end{equation}

with the eigenvectors corresponding to the smallest eigenvalues of $ C $.

Solution description

In [ ]:
# Solution goes here
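
One way to inspect the result, assuming x_star from the previous sketch: np.linalg.eigh returns the eigenvalues of the symmetric matrix C in ascending order, with the eigenvectors as columns, so the first p columns correspond to the p smallest eigenvalues. After normalising the columns of $ X^\star $, their absolute inner products with those eigenvectors should each be close to one in a single position, i.e. each column of $ X^\star $ lines up with an eigenvector of one of the smallest eigenvalues, up to sign and scaling.

In [ ]:
X_star = x_star.reshape(n, p)

# Eigen-decomposition of the symmetric C: eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(C)

# Normalise each column of X_star and compare with the eigenvectors of the
# p smallest eigenvalues; entries near 1 indicate aligned directions.
X_unit = X_star / np.linalg.norm(X_star, axis=0)
print(np.abs(eigvecs[:, :p].T @ X_unit))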