cs231n. A1, part 1. k-Nearest Neighbor (kNN) exercise

Comments on compute_distances_no_loops by @yorko

  • $num\_test$, $num\_train$ – the numbers of test and training images, respectively
  • $d$ – the number of features
  • $X$ – the test set feature matrix. Dimensions: $num\_test \times d$
  • $X^{(tr)}$ – the training set feature matrix. Dimensions: $num\_train \times d$

We need to compute the matrix $D$ that stores all pairwise distances between test and training samples.

$$\Large D_{ij} = {\lVert X_{i,:} - X^{(tr)}_{j,:}\rVert }_2$$

where $X_{i,:}$ designates the $i$-th row of matrix $X$.

Further, $$\Large D^2_{ij} = \sum_{k=1}^d {(X_{ik} - X^{(tr)}_{jk})} ^ 2$$

where $D^2_{ij}$ is the square of the element of $D$ at row $i$ and column $j$, and $k$ iterates over the $d$ features of the $i$-th test sample and the $j$-th training sample.

$$\large D^2_{ij} = \sum_{k=1}^d (X^2_{ik} - 2 X_{ik} X^{(tr)}_{jk} + {X^{(tr)}_{jk}}^2) = \sum_{k=1}^d X^2_{ik} - 2\sum_{k=1}^d X_{ik} X^{(tr)}_{jk} + \sum_{k=1}^d {X^{(tr)}_{jk}}^2$$
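As a quick numeric sanity check of this expansion (an illustrative one-off, with two arbitrary vectors standing in for single rows of $X$ and $X^{(tr)}$):

import numpy as np

x = np.array([4., 3., 2., 1.])  # one test-sample row
y = np.array([1., 2., 3., 4.])  # one training-sample row

lhs = ((x - y) ** 2).sum()                                 # direct squared distance
rhs = (x ** 2).sum() - 2 * (x * y).sum() + (y ** 2).sum()  # the expansion above
assert np.isclose(lhs, rhs)                                # both equal 20.0 here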

From here, we get a closed-form vectorized expression for $D * D$:

$$\Large D * D = {(X * X)}_{:,\Sigma} - 2X{X^{(tr)}}^{\text{T}} + {(X^{(tr)} * X^{(tr)})}_{:,\Sigma}^{\text{T}}$$

where $*$ designates element-wise multiplication and $A_{:,\Sigma}$ stands for a vector derived from matrix $A$ by summing all values in each row.

Lastly, for all dimensions to match, we rely on broadcasting here: ${(X * X)}_{:,\Sigma}$ consists of $num\_train$ identical column vectors stacked horizontally, and ${(X^{(tr)} * X^{(tr)})}_{:,\Sigma}^{\text{T}}$ consists of $num\_test$ identical row vectors stacked vertically. Thus, each term in the above formula has shape $num\_test \times num\_train$.
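Putting it all together, here is a minimal sketch of compute_distances_no_loops written as a standalone function (in the assignment skeleton it is a method that reads the training matrix from self.X_train; that detail is assumed away here):

import numpy as np

def compute_distances_no_loops(X, X_tr):
    """Pairwise Euclidean distances, shape (num_test, num_train), no loops."""
    test_sq = (X * X).sum(axis=1, keepdims=True)           # (num_test, 1)
    train_sq = (X_tr * X_tr).sum(axis=1, keepdims=True).T  # (1, num_train)
    cross = X.dot(X_tr.T)                                  # (num_test, num_train)
    return np.sqrt(test_sq - 2 * cross + train_sq)         # broadcasting aligns shapes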

Simple broadcasting example

$a$ is a column vector of length 3, $b$ is a row vector of length 4. $a - b$ is a matrix of shape $3 \times 4$.

In [1]:
import numpy as np
In [2]:
a, b = 3 * np.ones((3, 1)), 4 * np.ones((1, 4))
In [3]:
a
Out[3]:
array([[ 3.],
       [ 3.],
       [ 3.]])
In [4]:
b
Out[4]:
array([[ 4.,  4.,  4.,  4.]])
In [5]:
a - b
Out[5]:
array([[-1., -1., -1., -1.],
       [-1., -1., -1., -1.],
       [-1., -1., -1., -1.]])

Testing the derived formula

In [6]:
X = np.array([[4, 3, 2, 1],
              [1, 1, 1, 1]])

X_tr = np.array([[1, 2, 3, 4],
                 [0, 1, 2, 3],
                 [1, 1, 1, 1]])
In [7]:
np.sqrt((X * X).sum(axis=1, keepdims=True) - 2 * X.dot(X_tr.T) +
        (X_tr * X_tr).sum(axis=1, keepdims=True).T)
Out[7]:
array([[ 4.47213595,  4.89897949,  3.74165739],
       [ 3.74165739,  2.44948974,  0.        ]])
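Note how the shapes line up: thanks to keepdims=True, the row-sum terms stay two-dimensional, so broadcasting expands all three terms to the common $2 \times 3$ shape of the result above. A quick shape check:

print((X * X).sum(axis=1, keepdims=True).shape)          # (2, 1)
print(X.dot(X_tr.T).shape)                               # (2, 3)
print((X_tr * X_tr).sum(axis=1, keepdims=True).T.shape)  # (1, 3)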

Double-check by calculating the same distances with two loops.

In [8]:
for test_id in range(X.shape[0]):
    for train_id in range(X_tr.shape[0]):
        print("Distance between test sample {} and train sample {}: {}".format(
            test_id + 1, train_id + 1,
            np.linalg.norm(X[test_id, :] - X_tr[train_id, :])))
Distance between test sample 1 and train sample 1: 4.47213595499958
Distance between test sample 1 and train sample 2: 4.898979485566356
Distance between test sample 1 and train sample 3: 3.7416573867739413
Distance between test sample 2 and train sample 1: 3.7416573867739413
Distance between test sample 2 and train sample 2: 2.449489742783178
Distance between test sample 2 and train sample 3: 0.0
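As one more optional cross-check (assuming SciPy is available; it is not otherwise needed for this exercise), scipy.spatial.distance.cdist computes the same matrix with its default Euclidean metric:

from scipy.spatial.distance import cdist

D = np.sqrt((X * X).sum(axis=1, keepdims=True) - 2 * X.dot(X_tr.T) +
            (X_tr * X_tr).sum(axis=1, keepdims=True).T)
assert np.allclose(D, cdist(X, X_tr))  # cdist defaults to the Euclidean metric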