Regression

COMP4670/8600 - Introduction to Statistical Machine Learning - Tutorial 2

$\newcommand{\trace}[1]{\operatorname{tr}\left\{#1\right\}}$ $\newcommand{\Norm}[1]{\lVert#1\rVert}$ $\newcommand{\RR}{\mathbb{R}}$ $\newcommand{\inner}[2]{\langle #1, #2 \rangle}$ $\newcommand{\DD}{\mathscr{D}}$ $\newcommand{\grad}[1]{\operatorname{grad}#1}$ $\DeclareMathOperator*{\argmin}{arg\,min}$

Setting up the environment

In [ ]:
import matplotlib.pyplot as plt
import numpy as np

%matplotlib inline

The data set

We will use an old dataset on the price of housing in Boston (see description). The aim is to predict the median value of owner-occupied homes from various other factors. We will use a normalised version of this data, where each row is an example. The median value of homes is given in the first column (the label), and each subsequent feature has been normalised to the range $[-1,1]$. Download this dataset from mldata.org.

Read in the data using np.loadtxt with the optional argument delimiter=','.

Check that the data is as expected using print(). Use np.delete to remove the column containing the binary variable 'chas' from the data, and del to remove the corresponding entry from names; names.index('chas') is a convenient way to get the index of that column. This should give you an np.ndarray with 506 rows (examples) and 13 columns (1 label and 12 features).

In [ ]:
names = ['medv', 'crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age', 'dis', 'rad', 'tax', 'ptratio', 'b', 'lstat']
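
A minimal sketch of one possible approach; the local filename housing_scale.csv is an assumption, so substitute whatever name you saved the download under:

data = np.loadtxt('housing_scale.csv', delimiter=',')  # hypothetical filename for the downloaded data

chas_idx = names.index('chas')             # column index of the binary 'chas' variable
data = np.delete(data, chas_idx, axis=1)   # remove that column from the data
del names[chas_idx]                        # remove the matching entry from names
assert data.shape == (506, 13)             # 1 label + 12 remaining features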
In [ ]:
# Solution goes here
In [ ]:
print('name, min, max, #unique:')
print('\n'.join([str((name, min(vals), max(vals), len(set(vals)))) for name, vals in zip(names, data.T)]))
assert data.shape == (506,13)

Plotting

Plotting is done using the matplotlib library. For example:

In [ ]:
x = [0,1.2,2,3,5.1,7,8,9]
y1 = [1.1,3,2,4,5,6,8.1,8.2]
y2 = [4.2,4.2,4.1,5,6,3.2,4.8,6]
fig = plt.figure(figsize=(11,5))
ax = fig.add_subplot(121)
ax.plot(x,y1,'b--')
ax.plot(x,y1,'bs',label='y1')
ax.plot(x,y2,'r:')
ax.plot(x,y2,'r>',label='y2')
ax.set_title('Some random data')
ax.set_ylabel('labels')
ax.legend(loc='upper left', numpoints=1)
ax = fig.add_subplot(122)
ax.plot(x,y1,'bo')
ax.set_title('same data as before, without lines')
ax.set_xlabel(r'examples (symbols, e.g. $\alpha,\beta,\gamma$, work)')

Plot the median value of the property (vertical axis) versus the tax rate (horizontal axis).
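
One possible sketch, looking up the relevant columns via names:

tax = data[:, names.index('tax')]     # tax rate (horizontal axis)
medv = data[:, names.index('medv')]   # median home value, the label (vertical axis)
fig = plt.figure(figsize=(6, 4))
ax = fig.add_subplot(111)
ax.plot(tax, medv, 'bo')
ax.set_xlabel('tax')
ax.set_ylabel('medv')
ax.set_title('Median home value versus tax rate')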

In [ ]:
# Solution goes here

Regression without regularization

Implement the sum-of-squares error function and use it to find the maximum likelihood solution $w_{ML}$ for the regression problem. Also implement a subroutine for the polynomial basis function of degree 2 (see the expansion based on the binomial formula).
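
Recall that minimising the sum-of-squares error $E(w) = \tfrac{1}{2}\sum_{n=1}^{N}\left(t_n - w^\top \phi(x_n)\right)^2$ yields the closed-form maximum likelihood solution $w_{ML} = (\Phi^\top \Phi)^{-1} \Phi^\top t$, where $\Phi$ is the design matrix. A minimal sketch of one possible implementation (the function names are my own; np.linalg.lstsq is used rather than an explicit inverse for numerical stability):

def standard_basis(X):
    """Prepend a column of ones (bias term) to the raw features."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def polynomial_basis(X):
    """Degree-2 polynomial basis: bias, x_i, and all products x_i * x_j with i <= j."""
    N, D = X.shape
    cross = [X[:, i] * X[:, j] for i in range(D) for j in range(i, D)]
    return np.hstack([np.ones((N, 1)), X, np.column_stack(cross)])

def fit_ml(Phi, t):
    """Maximum likelihood weights: the least squares solution of Phi w = t."""
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return w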

In [ ]:
# Solution goes here

Training and testing

Use half of the available data to train the model by maximum likelihood; allocate the rest to the test set. Report the root mean squared error, $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(y(x_n) - t_n\right)^2}$, for both the training set and the test set.
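
A sketch under the assumption that the first half of the rows forms the training set (the tutorial does not prescribe a particular split), reusing standard_basis and fit_ml from the sketch above:

def rmse(w, Phi, t):
    """Root mean squared error of the predictions Phi @ w against the targets t."""
    return np.sqrt(np.mean((Phi @ w - t) ** 2))

t = data[:, 0]    # labels: median home value
X = data[:, 1:]   # the 12 features
N = X.shape[0]
X_train, t_train = X[:N // 2], t[:N // 2]
X_test, t_test = X[N // 2:], t[N // 2:]

Phi_train = standard_basis(X_train)
Phi_test = standard_basis(X_test)
w_ml = fit_ml(Phi_train, t_train)
print('train RMSE:', rmse(w_ml, Phi_train, t_train))
print('test RMSE:', rmse(w_ml, Phi_test, t_test))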

In [ ]:
# Solution goes here

Using the standard basis function (no transformations), find the feature with the biggest weight. Plot two figures, one for the training set and one for the test set. In each figure, plot the label against this most important feature. Also include a line showing your maximum likelihood predictor (Hint: use np.arange to generate data).
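
One possible sketch, assuming w_ml, X_train, etc. from the previous cells. The plotted line shows the predictor as a function of the chosen feature only, with all other features held at zero; this one-dimensional view is a simplification:

k = np.argmax(np.abs(w_ml[1:])) + 1    # most important feature (skip the bias at index 0)
print('largest weight:', names[k], w_ml[k])   # names[0] is the label, so names[k] matches weight k

xs = np.arange(-1, 1, 0.01)            # features are normalised to [-1, 1]
line = w_ml[0] + w_ml[k] * xs          # bias plus the chosen feature's contribution

for title, Xs, ts in [('training set', X_train, t_train), ('test set', X_test, t_test)]:
    fig = plt.figure(figsize=(6, 4))
    ax = fig.add_subplot(111)
    ax.plot(Xs[:, k - 1], ts, 'bo', label='data')   # column k-1 of X is column k of data
    ax.plot(xs, line, 'r-', label='ML predictor')
    ax.set_xlabel(names[k])
    ax.set_ylabel('medv')
    ax.set_title(title)
    ax.legend(loc='upper left', numpoints=1)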

In [ ]:
# Solution goes here

Regression with regularization

Implement regularized least squares regression to find the solution $w_{reg}$ with regularizer $\lambda > 0$. (Warning: lambda is a reserved keyword in Python.)
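
Adding the penalty $\frac{\lambda}{2}\Norm{w}^2$ to the sum-of-squares error gives the closed-form solution $w_{reg} = (\lambda I + \Phi^\top \Phi)^{-1} \Phi^\top t$. A minimal sketch (the argument is named lamb to sidestep the reserved keyword; np.linalg.solve avoids forming the inverse explicitly):

def fit_regularized(Phi, t, lamb):
    """Regularized least squares: solve (lambda * I + Phi^T Phi) w = Phi^T t."""
    D = Phi.shape[1]
    return np.linalg.solve(lamb * np.eye(D) + Phi.T @ Phi, Phi.T @ t)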

In [ ]:
# Solution goes here

As in the previous exercise, plot two figures showing the label and the prediction against the most important feature. Use $\lambda = 1.1$.
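
This can reuse the plotting code sketched above; for example:

Phi_train = standard_basis(X_train)               # as before
w_reg = fit_regularized(Phi_train, t_train, 1.1)  # lambda = 1.1 as specified
# ...then repeat the two scatter-plus-line plots above with w_reg in place of w_ml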

In [ ]:
# Solution goes here

Analysis of results

Compare the RMSE of regression with and without regularization. Considering the plots as well, describe what you observe and explain your observations.

Solution description

(optional) Exploration of basis functions and regularization parameter

The choice of basis function and the value of the regularization parameter $\lambda$ both affect the performance of the predictor. Using the same training and test data as before, compute the RMSE for each combination of basis function and regularization parameter below (a sketch follows the list):

  • the standard basis (as done above),
  • the polynomial basis function of degree 2,
  • $\lambda \in \{0.01, 0.1, 1, 10, 100\}$.
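
A sketch of one way to tabulate these results, reusing the basis functions and the fitting and RMSE routines sketched above:

for basis in [standard_basis, polynomial_basis]:
    Phi_tr, Phi_te = basis(X_train), basis(X_test)
    for lamb in [0.01, 0.1, 1, 10, 100]:
        w = fit_regularized(Phi_tr, t_train, lamb)
        print('%-16s lambda=%-6g train RMSE=%.3f test RMSE=%.3f'
              % (basis.__name__, lamb, rmse(w, Phi_tr, t_train), rmse(w, Phi_te, t_test)))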
In [ ]:
# Solution goes here