Introduction to Jupyter notebooks

COMP4670/8600 - Introduction to Statistical Machine Learning - Tutorial 1A

The first tutorial introduces the basic elements of writing Python programs and using Jupyter notebooks. All tutorials and assignments use this format.

Due to the wide variety of backgrounds that students may have, it is worth recalling some mathematics and statistics that we build upon in this course.

Basic knowledge

IMPORTANT: When using mathematical formulas, provide the precise name for each component.

$\newcommand{\RR}{\mathbb{R}}$

Random variables

Write down the definitions of the following entities, and provide a simple example to illustrate.

  1. The expectation of a function $f$ with respect to a
    • continuous random variable $X$
    • discrete random variable $X$
  2. The variance of a random variable $X$.
  3. Independence of two random variables $X$ and $Y$

Solution description
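
As a small numerical illustration of the discrete case, the sketch below evaluates $E[f(X)] = \sum_x f(x)\,p(x)$ and the variance $E[(X - E[X])^2]$. The fair six-sided die and the helper name `expectation` are assumptions chosen for illustration; they are not part of the exercise.

```python
from fractions import Fraction

# Hypothetical example: a fair six-sided die, p(x) = 1/6 for x = 1, ..., 6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(f, pmf):
    """E[f(X)] = sum over x of f(x) * p(x) for a discrete random variable X."""
    return sum(f(x) * p for x, p in pmf.items())

mean = expectation(lambda x: x, pmf)                    # E[X]
variance = expectation(lambda x: (x - mean) ** 2, pmf)  # E[(X - E[X])^2]

print(mean, variance)  # prints: 7/2 35/12
```

Exact fractions are used here so that the result can be compared directly with a calculation by hand.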

Discrete probabilities

For discrete random variables $X$ and $Y$, define the following, and show how each applies to the joint distribution in the table below.

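| $p(\mathbf{X},\mathbf{Y})$ | X=a | X=b | X=c | X=d | X=e |
|:---------------------------|----:|----:|----:|----:|----:|
| Y = red | 0.2 | 0.1 | 0.1 | 0.01 | 0.04 |
| Y = green | 0.08 | 0.07 | 0.01 | 0.05 | 0.05 |
| Y = blue | 0.01 | 0.01 | 0.07 | 0.05 | 0.15 |

  1. The sum rule of probability theory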
  2. The product rule of probability theory
  3. Independence of two random variables $X$ and $Y$

Solution description
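
One possible way to check a written answer numerically is sketched below with NumPy. The array layout (rows for $Y$, columns for $X$) and the variable names are assumptions made for this sketch; the numbers are copied from the table above.

```python
import numpy as np

# Joint table p(X, Y) from the text: rows are Y = red, green, blue;
# columns are X = a, b, c, d, e.
joint = np.array([
    [0.20, 0.10, 0.10, 0.01, 0.04],
    [0.08, 0.07, 0.01, 0.05, 0.05],
    [0.01, 0.01, 0.07, 0.05, 0.15],
])

# Sum rule: marginalise out the other variable.
p_x = joint.sum(axis=0)  # p(X), one entry per column
p_y = joint.sum(axis=1)  # p(Y), one entry per row

# Product rule: p(X, Y) = p(Y | X) p(X), so p(Y | X) = p(X, Y) / p(X).
p_y_given_x = joint / p_x          # each column sums to one
reconstructed = p_y_given_x * p_x  # recovers the joint table

# Independence would require p(X, Y) = p(X) p(Y) in every cell.
independent = np.allclose(joint, np.outer(p_y, p_x))

print(p_x, p_y, independent)
```

For this table the independence check fails: for example, $p(X=a)\,p(Y=\text{red}) = 0.29 \times 0.45$, which differs from the joint entry $0.2$.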

Linear algebra

Write down the definitions, being careful to specify the conditions (if any) under which they exist.

  1. The eigenvector decomposition of a matrix $C$.
  2. The solution of a linear system of equations $Ax=b$.

Solution description
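
The sketch below illustrates both definitions numerically under simplifying assumptions: $C$ is taken to be symmetric (so that it has real eigenvalues and orthonormal eigenvectors), and $A$ is taken to be square and invertible (so that the solution is unique). The matrices themselves are made up for illustration.

```python
import numpy as np

# Hypothetical symmetric matrix C, so that C = Q diag(w) Q^T with
# real eigenvalues w and orthonormal eigenvectors in the columns of Q.
C = np.array([[2.0, 1.0],
              [1.0, 3.0]])
w, Q = np.linalg.eigh(C)
C_rebuilt = Q @ np.diag(w) @ Q.T

# Hypothetical square, invertible A, so that Ax = b has a unique solution.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)

print(w, x)
```

`np.linalg.eigh` is specific to symmetric (Hermitian) matrices; a general square matrix would need `np.linalg.eig`, and the decomposition may then be complex or fail to exist in diagonal form.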

Calculus

Compute the gradient of the following function $f:\RR\to\RR$
$$ f(x) = \frac{1}{1 + \exp(x^2)} $$
What would the gradient be if $x$ were two-dimensional (that is, $f:\RR^2\to\RR$), and we let $x^2$ be the squared Euclidean norm, $\|x\|^2$?

Solution description
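
A derivation done by hand can be checked against finite differences. The sketch below assumes the candidate gradient $\nabla f(x) = -2x\exp(\|x\|^2) / (1 + \exp(\|x\|^2))^2$, which follows from the chain rule; the function names and test points are hypothetical.

```python
import numpy as np

def f(x):
    """f(x) = 1 / (1 + exp(||x||^2)); accepts scalars and vectors."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return 1.0 / (1.0 + np.exp(x @ x))

def grad_f(x):
    """Candidate analytic gradient: -2 x exp(||x||^2) / (1 + exp(||x||^2))^2."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    e = np.exp(x @ x)
    return -2.0 * x * e / (1.0 + e) ** 2

def numerical_grad(func, x, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (func(x + step) - func(x - step)) / (2 * eps)
    return grad

x1 = np.array([0.7])        # one-dimensional case
x2 = np.array([0.3, -1.2])  # two-dimensional case
print(grad_f(x1), numerical_grad(f, x1))
print(grad_f(x2), numerical_grad(f, x2))
```

If the two printed vectors on each line agree to several decimal places, the analytic expression is very likely correct at that point.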

Python and Programming for Machine Learning

If you already know Python and Jupyter notebooks well, please work on Tutorial 1B "Linear Algebra and Optimisation".

The introduction will focus on the concepts necessary for writing small programs in Python for the purpose of Machine Learning. That means we expect the user of the code to be a reasonably knowledgeable person. We can therefore skip most of the code a robust system would need in order to check input types, verify input parameter ranges, and make sure that nothing can go wrong when somebody else uses the code. Having said this, you are nevertheless encouraged to include some sanity tests in your code to avoid simple errors that can cost you a lot of time to find. Some of the Python concepts discussed in the tutorial are:

  • Data types (bool, int, float, str, list, tuple, set, dict)
  • Operators
  • Data flow
  • Functions
  • Classes and objects
  • Modules and how to use them
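
As an illustration of the kind of sanity test recommended above, here is a minimal sketch using plain `assert` statements. The function `normalise` is a hypothetical example and not part of the course code.

```python
def normalise(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights)
    # Sanity checks: cheap to write, and they fail close to the actual mistake.
    assert len(weights) > 0, "need at least one weight"
    assert all(w >= 0 for w in weights), "weights must be non-negative"
    assert total > 0, "weights must not all be zero"
    return [w / total for w in weights]

probabilities = normalise([2, 1, 1])
print(probabilities)  # prints: [0.5, 0.25, 0.25]
```

Calling `normalise([1, -1])` raises an `AssertionError` with a readable message, instead of failing later with a division by zero.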

We will be using Python 3 in this course.

Some resources:

Installation

The easiest way to get a working Python environment is using one of the following collections:

It is also not too difficult to install Python using your favourite package manager and then use conda or pip to manage Python packages.

Jupyter Notebooks

To work on a worksheet or assignment, download the notebook and edit it locally.

Jupyter notebooks provide a convenient browser-based environment for data analysis in a literate programming style. The descriptive parts of the notebook implement an enhanced version of Markdown, which allows the use of LaTeX for rendering equations.

  1. Descriptive notes
    • Markdown
    • LaTeX
  2. Computational code
    • numerical python
      • numpy
      • scipy
    • matplotlib

To use a notebook locally:

jupyter notebook name_of_file.ipynb

Markdown and LaTeX

In addition to the lists and links already shown above, tables are also nice and easy:

| Title | Middle | Left aligned | Right aligned |
|-------|:------:|:-------------|--------------:|
| Monday | 10:00 | Sunny | 30 |
| Thursday | 12:32 | Rain | 22.3 |

It is also easy to typeset good-looking equations inline, such as $f(x) = x^2$, or on a line by itself.
\begin{equation}
g(x) = \sum_{i=1}^n \frac{\prod_{j=1}^d y_j \sqrt{3x_i^4}}{f(x_i)}
\end{equation}
If you use a symbol often, you can define it at the top of a document as follows (look at source), and use it in equations.

$\newcommand{\amazing}{\sqrt{3x_i^4}}$

\begin{equation} h(x) = \sum_{i=1}^n \amazing \end{equation}

Computational code

Setting up the Python environment (do not use pylab)

import matplotlib.pyplot as plt
import numpy as np
import scipy as sp

%matplotlib inline