# Linear Algebra and Optimisation

###### COMP4670/8600 - Introduction to Statistical Machine Learning - Tutorial 1B

$\newcommand{\trace}[1]{\operatorname{tr}\left\{#1\right\}}$ $\newcommand{\Norm}[1]{\lVert#1\rVert}$ $\newcommand{\RR}{\mathbb{R}}$ $\newcommand{\inner}[2]{\langle #1, #2 \rangle}$ $\newcommand{\DD}{\mathscr{D}}$ $\newcommand{\grad}[1]{\operatorname{grad}#1}$ $\DeclareMathOperator*{\argmin}{arg\,min}$

Setting up the Python environment (do not use pylab):

In [ ]:
import matplotlib.pyplot as plt
import numpy as np
import scipy as sp
import scipy.optimize as opt
import time

%matplotlib inline


Consider the following cost function $f(X)$ defined over the space of real $n \times p$ matrices $$f(X) = \frac{1}{2} \trace{X^T C X N} + \mu \frac{1}{4} \Norm{N - X^T X}^2_F$$ where $X \in \RR^{n \times p}$, $n \ge p$, and the matrix $C$ is symmetric, i.e. $C = C^T$. The scalar $\mu$ is assumed to be larger than the $p$th smallest eigenvalue of $C$. The matrix $N$ is diagonal with distinct positive entries on the diagonal. The trace of a square matrix $A$ is denoted $\trace{A}$, and the Frobenius norm of an arbitrary matrix $A$ is defined as $\Norm{A}_F = \sqrt{\trace{A^T A}}$.

## Frobenius Norm

Implement a Python function frobenius_norm which accepts an arbitrary matrix $A$ and returns $\Norm{A}_F$ using the formula given. (Use numpy.trace and numpy.sqrt.)

1. Given a matrix $A \in \RR^{n \times p}$, what is the complexity of your implementation of frobenius_norm using the formula above?
2. Can you come up with a faster implementation if you were additionally told that $p \ge n$?
3. Can you find an even faster implementation than in 1. and 2.?

### Solution description

In [ ]:
# Solution goes here
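
# A possible sketch (not the official solution); assumes the setup cell above
# has been run, so numpy is available as np.
def frobenius_norm(A):
    """Frobenius norm via the trace formula: ||A||_F = sqrt(tr(A^T A))."""
    return np.sqrt(np.trace(A.T @ A))

# 1. Forming A^T A for A in R^{n x p} costs O(n p^2), which dominates the O(p)
#    trace, so this implementation is O(n p^2).
# 2. If p >= n, compute A A^T (an n x n matrix) instead: tr(A^T A) = tr(A A^T),
#    and the cost drops to O(n^2 p).
def frobenius_norm_smaller_gram(A):
    n, p = A.shape
    G = A @ A.T if p >= n else A.T @ A
    return np.sqrt(np.trace(G))

# 3. Faster still: sum the squared entries directly, O(n p) time and O(1) extra
#    memory (this is what np.linalg.norm(A, 'fro') does).
def frobenius_norm_fast(A):
    return np.sqrt(np.sum(A * A))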


## Cost Function $f(X)$ with matrix argument

Implement the cost function defined as $f(X)$ above as a function cost_function_for_matrix in Python.

Hint: As good programmers, we do not use global variables in subroutines.

In [ ]:
# Solution goes here
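
# A possible sketch (not the official solution): C, N and mu are passed in as
# arguments rather than read from globals, following the hint above.
def cost_function_for_matrix(X, C, N, mu):
    """f(X) = 1/2 tr(X^T C X N) + mu/4 ||N - X^T X||_F^2."""
    first = 0.5 * np.trace(X.T @ C @ X @ N)
    R = N - X.T @ X
    second = 0.25 * mu * np.sum(R * R)   # squared Frobenius norm of N - X^T X
    return first + second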


## Cost Function $f(X)$ with vector argument

Many standard optimisation routines work only with vectors. Fortunately, as vector spaces, the space of matrices $\RR^{n \times p}$ and the space of vectors $\RR^{n p}$ are isomorphic. What does this mean?

Implement the cost function $f(X)$ given $X$ as a vector and call it cost_function_for_vector. Which extra arguments do you need for the cost function?

In [ ]:
# Solution goes here
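
# A possible sketch (not the official solution). Besides C, N and mu, the
# function needs the shape (n, p) so the flat vector of length n*p can be
# reshaped back into a matrix; reshape/ravel realise the isomorphism between
# R^{n x p} and R^{np}. Reuses cost_function_for_matrix from the cell above.
def cost_function_for_vector(x, C, N, mu, n, p):
    X = x.reshape(n, p)
    return cost_function_for_matrix(X, C, N, mu)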


## Construction of a random matrix $C$ with given eigenvalues

A diagonal matrix has the nice property that the eigenvalues can be directly read off the diagonal. Given a diagonal matrix $C \in \RR^{n \times n}$ with distinct eigenvalues, how many different diagonal matrices have the same set of eigenvalues?

Given a diagonal matrix $C \in \RR^{n \times n}$ with distinct eigenvalues, how many different matrices have the same set of eigenvalues?

Given a set of $n$ distinct real eigenvalues $\mathcal{E} = \{e_1, \dots, e_n\}$, write a Python function random_matrix_from_eigenvalues which takes the list of eigenvalues $\mathcal{E}$ and returns a random symmetric matrix $C$ with exactly these eigenvalues.

### Solution description

In [ ]:
# Solution goes here
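
# A possible sketch (not the official solution): conjugate the diagonal matrix
# of eigenvalues by a random orthogonal matrix Q (obtained from the QR
# factorisation of a Gaussian matrix), giving the symmetric C = Q diag(E) Q^T
# with exactly the prescribed eigenvalues.
def random_matrix_from_eigenvalues(E):
    E = np.asarray(E, dtype=float)
    n = E.shape[0]
    Q, _ = np.linalg.qr(np.random.randn(n, n))   # random orthogonal matrix
    return Q @ np.diag(E) @ Q.T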


## Minimising the cost function $f(X)$

Use the minimisation functions fmin or fmin_powell provided in the Python package scipy.optimize to minimise the cost function cost_function_for_vector.

Hint: Use the argument args in the minimisation functions fmin or fmin_powell to provide the extra parameters to your cost function cost_function_for_vector.

In [ ]:
# Solution goes here
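
# A possible sketch (not the official solution); the problem size and the value
# of mu below are illustrative choices only, and random_matrix_from_eigenvalues
# and cost_function_for_vector come from the cells above.
n, p = 5, 3
mu = 10.0                                          # larger than the p-th smallest eigenvalue of C
C = random_matrix_from_eigenvalues(np.arange(1.0, n + 1))
N = np.diag(np.arange(1.0, p + 1))                 # distinct positive diagonal entries
x0 = np.random.randn(n * p)

start = time.time()
x_min = opt.fmin(cost_function_for_vector, x0,
                 args=(C, N, mu, n, p), maxiter=10000, maxfun=10000)
print("fmin took %.2f s, f = %.6f"
      % (time.time() - start, cost_function_for_vector(x_min, C, N, mu, n, p)))
# opt.fmin_powell can be called in exactly the same way.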


## Gradient of $f(X)$

Calculate the gradient of the cost function $f(X)$, given that the inner product on the space of real $n \times p$ matrices is defined as $$\inner{A}{B} = \trace{A^T B}$$

Implement a function gradient_for_vector which takes $X$ as a vector, and returns the gradient of $f(X)$ as a vector.

### Solution description

In [ ]:
# Solution goes here
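
# A possible sketch (not the official solution). With the inner product
# <A, B> = tr(A^T B), and using C = C^T and N = N^T, the gradient works out to
#     grad f(X) = C X N + mu X (X^T X - N),
# flattened back into a vector to match cost_function_for_vector.
def gradient_for_vector(x, C, N, mu, n, p):
    X = x.reshape(n, p)
    grad = C @ X @ N + mu * X @ (X.T @ X - N)
    return grad.ravel()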


## Minimising the cost function $f(X)$ using the gradient

Use the minimisation functions fmin_cg or fmin_bfgs provided in the Python package scipy.optimize to minimise the cost function cost_function_for_vector utilising the gradient gradient_for_vector.

Compare the speed of convergence to the minimisation with fmin or fmin_powell.

In [ ]:
# Solution goes here
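
# A possible sketch (not the official solution); reuses C, N, mu, n, p and x0
# from the fmin cell above and compares wall-clock time against the
# derivative-free solvers.
start = time.time()
x_cg = opt.fmin_cg(cost_function_for_vector, x0, fprime=gradient_for_vector,
                   args=(C, N, mu, n, p))
print("fmin_cg:   %.2f s, f = %.6f"
      % (time.time() - start, cost_function_for_vector(x_cg, C, N, mu, n, p)))

start = time.time()
x_bfgs = opt.fmin_bfgs(cost_function_for_vector, x0, fprime=gradient_for_vector,
                       args=(C, N, mu, n, p))
print("fmin_bfgs: %.2f s, f = %.6f"
      % (time.time() - start, cost_function_for_vector(x_bfgs, C, N, mu, n, p)))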


## Minima of $f(X)$

Compare the columns $x_1,\dots, x_p$ of the matrix $X^\star$ which minimises $f(X)$ $$X^\star = \argmin_{X \in \RR^{n \times p}} f(X)$$

with the eigenvectors associated with the $p$ smallest eigenvalues of $C$.

### Solution description

In [ ]:
# Solution goes here
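
# A possible sketch (not the official solution): normalise the columns of the
# minimiser found in the previous cell (x_bfgs) and compare them with the
# eigenvectors of C belonging to its p smallest eigenvalues.
X_star = x_bfgs.reshape(n, p)
X_cols = X_star / np.linalg.norm(X_star, axis=0)   # unit-norm columns

eigvals, eigvecs = np.linalg.eigh(C)               # eigenvalues in ascending order
V = eigvecs[:, :p]                                 # eigenvectors of the p smallest eigenvalues

# Eigenvectors are only determined up to sign, so compare |v_i^T x_j|;
# entries close to 1 indicate that each column of X* is aligned with one of
# these eigenvectors (possibly up to a permutation of columns).
print(np.abs(V.T @ X_cols))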