What is machine learning?

In this section we will begin to explore the basic principles of machine learning. Machine Learning is about building programs with tunable parameters (typically an array of floating point values) that are adjusted automatically so as to improve their behavior by adapting to previously seen data.

Machine learning can be considered a subfield of artificial intelligence, since these algorithms can be seen as building blocks that let computers behave more intelligently by generalizing from data, rather than just storing and retrieving data items as a database system would.

We'll take a look at two very simple machine learning tasks here. The first is a classification task: the figure shows a collection of two-dimensional data, colored according to two different class labels. A classification algorithm may be used to draw a dividing boundary between the two clusters of points:

In [ ]:
# Start matplotlib inline mode, so figures will appear in the notebook
%matplotlib inline
In [ ]:
# Import the example plot from the figures directory
from figures import plot_sgd_separator
plot_sgd_separator()

This may seem like a trivial task, but it is a simple version of a very important concept. By drawing this separating line, we have learned a model which can generalize to new data: if you were to drop a new, unlabeled point onto the plane, the algorithm could now predict whether it is a blue or a red point.
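To make the idea of predicting a new point concrete, here is a minimal sketch of the same kind of task. It does not reproduce the figure's code: the synthetic data from `make_blobs`, the `SGDClassifier` settings, and the coordinates of `new_point` are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import SGDClassifier

# Synthetic two-class, two-dimensional data (an assumption, not the figure's data)
X, y = make_blobs(n_samples=50, centers=2, random_state=0, cluster_std=0.60)

# A linear classifier trained with stochastic gradient descent;
# the hyperparameters here are illustrative choices
clf = SGDClassifier(loss="hinge", alpha=0.01, max_iter=1000, random_state=0)
clf.fit(X, y)

# A new, unlabeled point (hypothetical coordinates), passed as a 2-D array
# with one row per sample
new_point = np.array([[0.0, 4.0]])
predicted = clf.predict(new_point)
print(predicted)
```

The fitted `clf.coef_` and `clf.intercept_` hold the tunable parameters that define the separating line.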

If you'd like to see the source code used to generate this, you can either open the code in the figures directory, or you can load the code using the %load magic command:

In [ ]:
%load figures/sgd_separator.py

The next simple task we'll look at is regression, here a simple best-fit line through a set of data:

In [ ]:
from figures import plot_linear_regression
plot_linear_regression()

Again, this is an example of fitting a model to data so that the model can generalize to new data. The model has been learned from the training data and can be used to predict results for test data: here, we might be given an x value, and the model would allow us to predict the corresponding y value. This might also seem like a trivial problem, but it is a basic example of a type of operation that is fundamental to machine learning tasks.
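The fit-then-predict pattern described above can be sketched as follows. This is not the figure's code: the noisy data, the underlying slope and intercept, and the new x values are assumptions made up for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: a line with slope 2 and intercept 1, plus noise
rng = np.random.RandomState(0)
x = 10 * rng.rand(30)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=30)

# scikit-learn expects a 2-D feature array of shape (n_samples, n_features)
model = LinearRegression()
model.fit(x[:, np.newaxis], y)

# Predict y for x values the model has not seen
x_new = np.array([[3.0], [7.5]])
y_pred = model.predict(x_new)
print(model.coef_, model.intercept_, y_pred)
```

The learned `model.coef_` and `model.intercept_` should land close to the slope and intercept used to generate the data, which is what lets the model predict sensibly for unseen x values.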

An Overview of Scikit-learn

In [ ]:
%matplotlib inline
import numpy as np
from matplotlib import pyplot as plt

Loading an Example Dataset

In [ ]:
from sklearn import datasets
digits = datasets.load_digits()
In [ ]:
digits.data
In [ ]:
digits.target
In [ ]:
digits.images[0]
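As a quick check on how these three attributes relate, the sketch below prints their shapes and compares the first row of `digits.data` with the first image flattened. The variable name `same` is introduced only for this illustration.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# data holds one flattened image per row: (n_samples, n_features)
print(digits.data.shape)
# images holds the same pixels as 2-D arrays: (n_samples, height, width)
print(digits.images.shape)
# target holds one class label per sample
print(digits.target.shape)

# Each row of data is the corresponding image unrolled into a vector
same = np.array_equal(digits.data[0], digits.images[0].ravel())
print(same)
```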

Learning and Predicting

In [ ]:
from sklearn import svm
clf = svm.SVC(gamma=0.001, C=100.)
In [ ]:
clf.fit(digits.data[:-1], digits.target[:-1])
In [ ]:
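# predict expects a 2-D array of shape (n_samples, n_features),
# so slice the last sample rather than indexing it
clf.predict(digits.data[-1:])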
In [ ]:
plt.figure(figsize=(2, 2))
plt.imshow(digits.images[-1], interpolation='nearest', cmap=plt.cm.binary)
In [ ]:
print(digits.target[-1])