Machine Learning Overview¶

Preliminaries¶

Goal
- Top-level overview of machine learning
Materials
- Study Bishop pp. 1-4
- Study this notebook

What is Machine Learning?¶

Machine Learning relates to building models from data.
- Suppose we want to make a model for a complex process about which we have little knowledge (so hand-programming is not possible).

Solution: Get the computer to program itself by showing it examples of the behavior that we want.

Practically, we choose a library of models, and write a program that picks a model and tunes it to fit the data.

Criterion: a good model generalizes well to unseen data from the same process.
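The recipe above (choose a model library, tune each candidate, prefer the one that generalizes) can be sketched in a few lines. This is only an illustration under stated assumptions: the data-generating process, the two-model library (`fit_constant`, `fit_line`), and the 150/50 train/validation split are all hypothetical choices, not part of the course material.

```python
import random
from statistics import mean

def fit_constant(xs, ys):
    """Tune the constant model y = c by least squares (c is the sample mean)."""
    c = mean(ys)
    return lambda x: c

def fit_line(xs, ys):
    """Tune the linear model y = a*x + b by least squares (closed form)."""
    mx, my = mean(xs), mean(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda x: a * x + b

def mse(model, xs, ys):
    """Mean squared prediction error of a fitted model on a data set."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

# Hypothetical process that the learner does not know: y = 2x + 0.5 + noise.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(200)]
ys = [2.0 * x + 0.5 + rng.gauss(0, 0.1) for x in xs]

# Hold out data that is not used for tuning, to estimate generalization.
train_x, train_y = xs[:150], ys[:150]
val_x, val_y = xs[150:], ys[150:]

library = {"constant": fit_constant, "linear": fit_line}
fitted = {name: fit(train_x, train_y) for name, fit in library.items()}
scores = {name: mse(model, val_x, val_y) for name, model in fitted.items()}
best = min(scores, key=scores.get)  # model with the lowest held-out error
print(best, scores)
```

The selection criterion here is the error on data that played no role in tuning, which is one simple stand-in for "generalizes well to unseen data from the same process".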

This method is known in various scientific communities under different names such as machine learning, statistical inference, system identification, data mining, source coding, data compression, etc.

Machine learning and the scientific inquiry loop.¶

Machine Learning is Difficult¶

Modeling (Learning) Problems
- Is there any regularity in the data anyway?
- What is our prior knowledge and how to express it mathematically?
- How to pick the model library?
- How to tune the models to the data?
- How to measure the generalization performance?

Quality of Observed Data
- Not enough data
- Too much data?
- Available data may be messy (measurement noise, missing data points, outliers)

A Machine Learning Taxonomy¶

Supervised Learning: Given examples of inputs and corresponding desired outputs, predict outputs on future inputs.
- Examples: classification, regression, time series prediction

Unsupervised Learning (a.k.a. density estimation): Given only inputs, automatically discover representations, features, structure, etc.
- Examples: clustering, outlier detection, compression

Reinforcement Learning: Given sequences of inputs, actions from a fixed set, and scalar rewards/punishments, learn to select action sequences in a way that maximizes expected reward, e.g., chess and robotics. (This is more akin to learning how to design good experiments and is not covered in this course.)

Other settings, such as preference learning and learning to rank (also not covered in this course). Note that many machine learning problems can be (re-)formulated as special cases of either a supervised or an unsupervised problem, both of which are covered in this course.

Supervised Learning¶

Given observations $D=\{(x_1,y_1),\dots,(x_N,y_N)\}$, the goal is to estimate the conditional distribution $p(y|x)$.
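One common way to make this concrete is maximum likelihood estimation followed by plug-in prediction. The sketch below assumes a parametric model $p(y|x,\theta)$ and independent observations; the parameter symbol $\theta$ and the new input/output pair $(x_\bullet, y_\bullet)$ are notation introduced here for illustration only.

$$
\hat{\theta} = \arg\max_{\theta} \sum_{n=1}^{N} \log p(y_n | x_n, \theta), \qquad p(y_\bullet | x_\bullet) \approx p(y_\bullet | x_\bullet, \hat{\theta})
$$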

Classification¶

The target variable $y$ is a discrete-valued vector representing class labels.
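As a minimal sketch of classification, the example below estimates $p(y|x)$ for a scalar input and two classes by fitting one Gaussian per class and applying Bayes rule. The data, the class means, and the helper names `posterior` and `classify` are assumptions made for illustration; labels are plain integers rather than vectors to keep the code short.

```python
import random
from statistics import NormalDist

rng = random.Random(1)
# Hypothetical 1-D training set: class 0 centered at -1, class 1 centered at +1.
data = [(rng.gauss(-1.0, 0.5), 0) for _ in range(100)] + \
       [(rng.gauss(+1.0, 0.5), 1) for _ in range(100)]

classes = sorted({y for _, y in data})
# Tune one Gaussian density per class, plus class priors, from the labeled examples.
density = {k: NormalDist.from_samples([x for x, y in data if y == k]) for k in classes}
prior = {k: sum(1 for _, y in data if y == k) / len(data) for k in classes}

def posterior(x):
    """Return the estimated p(y = k | x) for every class k via Bayes rule."""
    joint = {k: prior[k] * density[k].pdf(x) for k in classes}
    total = sum(joint.values())
    return {k: v / total for k, v in joint.items()}

def classify(x):
    """Pick the class with the highest posterior probability."""
    p = posterior(x)
    return max(p, key=p.get)

print(posterior(0.8), classify(0.8))
```

Returning the full posterior rather than only the winning label keeps the uncertainty of the prediction available to whatever uses it next.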

Regression¶

Same problem statement as classification but now the target variable is a real-valued vector.
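For regression, one simple sketch is to assume $p(y|x) = \mathcal{N}(y \,|\, a x + b, s^2)$ and estimate $a$, $b$, and $s$ by maximum likelihood. The Gaussian noise model, the synthetic data, and the function name `predict` are assumptions for illustration; a scalar target is used instead of a vector.

```python
import random
from statistics import NormalDist, mean

rng = random.Random(2)
# Hypothetical data: y = 1.5*x - 0.3 plus Gaussian noise with standard deviation 0.2.
xs = [rng.uniform(0, 4) for _ in range(300)]
ys = [1.5 * x - 0.3 + rng.gauss(0, 0.2) for x in xs]

# Maximum-likelihood estimates for the assumed model p(y|x) = N(y | a*x + b, s^2).
mx, my = mean(xs), mean(ys)
a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b = my - a * mx
s = mean((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) ** 0.5  # residual std

def predict(x):
    """Return the estimated conditional distribution p(y | x) as a NormalDist."""
    return NormalDist(mu=a * x + b, sigma=s)

p = predict(2.0)
print(p.mean, p.stdev)
```

The prediction is a distribution over real values rather than over class labels, which is the only structural difference from the classification case.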

Unsupervised Learning¶

Given data $D=\{x_1,\ldots,x_N\}$, model the (unconditional) probability distribution $p(x)$ (a.k.a. density estimation).
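A minimal density-estimation sketch: fit a single Gaussian to unlabeled inputs and use the estimated $p(x)$ to flag unlikely points. The Gaussian model family, the synthetic data, and the threshold value in `is_outlier` are hypothetical choices for illustration.

```python
import random
from statistics import NormalDist

rng = random.Random(3)
# Hypothetical unlabeled observations (no targets y are available).
xs = [rng.gauss(10.0, 2.0) for _ in range(500)]

# Density estimate: a single Gaussian fitted to the inputs only.
p_x = NormalDist.from_samples(xs)

def is_outlier(x, threshold=1e-3):
    """Flag x when its estimated density falls below a (hypothetical) threshold."""
    return p_x.pdf(x) < threshold

print(p_x.mean, p_x.stdev, is_outlier(10.5), is_outlier(25.0))
```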

Clustering¶

Group data into clusters such that all data points in a cluster have similar properties.
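One widely used clustering procedure is k-means (Lloyd's algorithm), sketched below for 2-D points. It is shown as one possible instance of clustering, not as the method this course prescribes; the function `kmeans`, its parameters, and the two synthetic blobs are assumptions for illustration.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm on 2-D points; returns (centers, assignments)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize centers at k distinct data points
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        assign = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return centers, assign

rng = random.Random(5)
# Hypothetical data: two well-separated blobs of 50 points each.
blob_a = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(50)]
blob_b = [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(50)]
centers, assign = kmeans(blob_a + blob_b, k=2)
print(centers)
```

Here "similar properties" is taken to mean small Euclidean distance to a shared cluster center; other similarity notions lead to other clustering methods.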

Compression / dimensionality reduction¶

The output of the coder is much smaller in size than the original, but if the coded signal is further processed by a decoder, the result is very close (or exactly equal) to the original.
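The coder/decoder idea can be sketched with a lossy example: 2-D points that lie close to a line are encoded as a single coordinate along their principal direction and decoded back to 2-D. The synthetic data and the names `encode` and `decode` are assumptions for illustration, and the closed-form angle is specific to the 2-D case.

```python
import math
import random
from statistics import mean

rng = random.Random(4)
# Hypothetical 2-D signal lying close to a line, so one number per point nearly suffices.
ts = [rng.uniform(-3, 3) for _ in range(200)]
data = [(t + rng.gauss(0, 0.05), 2 * t + rng.gauss(0, 0.05)) for t in ts]

# Principal direction of the 2x2 sample covariance matrix (closed form in 2-D).
mx, my = mean(p[0] for p in data), mean(p[1] for p in data)
sxx = mean((p[0] - mx) ** 2 for p in data)
syy = mean((p[1] - my) ** 2 for p in data)
sxy = mean((p[0] - mx) * (p[1] - my) for p in data)
theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
ux, uy = math.cos(theta), math.sin(theta)

def encode(p):
    """Coder: compress a 2-D point to one coordinate along the principal direction."""
    return (p[0] - mx) * ux + (p[1] - my) * uy

def decode(z):
    """Decoder: map the 1-D code back to an approximate 2-D point."""
    return (mx + z * ux, my + z * uy)

codes = [encode(p) for p in data]
recon = [decode(z) for z in codes]
# Mean squared reconstruction error between original and decoded points.
err = mean(dist for dist in ((p[0] - r[0]) ** 2 + (p[1] - r[1]) ** 2
                             for p, r in zip(data, recon)))
print(err)
```

Each point is stored as one number instead of two, and the reconstruction error stays small because the discarded direction carries little of the variation in the data.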

Some Machine Learning Applications¶

- Computer speech recognition, speaker recognition
- Face recognition, iris identification
- Printed and handwritten text parsing
- Financial prediction, outlier detection (credit-card fraud)
- User preference modeling (Amazon); modeling of human perception
- Modeling of the web (Google)
- Machine translation
- Medical expert systems for disease diagnosis (e.g., mammogram)
- Strategic games (chess, Go, backgammon)
- Any 'knowledge-poor' but 'data-rich' problem