### Predicting Probabilities

Often the outcome of a learning scheme is a probability measure. Here we consider the case where the learning scheme assigns a probability to each of *K* possible outcomes for a given instance. Assume these are output as a probability vector $\mathbf{p} = \left(p_1, p_2, \ldots, p_K\right)$. Express the actual class for the instance as a vector $\mathbf{a} = \left(a_1, a_2, \ldots, a_K\right)$, where $a_i$ equals 1 if *i* is the class the instance actually belongs to and 0 otherwise. A performance metric that applies in such situations is a loss function calculated for each instance; if the test set contains several instances, the loss is summed over all of them. Two common loss functions are described here.

#### Quadratic Loss Function

$$\sum_{j=1}^K \left(p_j-a_j\right)^2$$
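A minimal sketch of this computation, assuming a predicted probability vector and a 0/1 indicator vector for the actual class (the example values are illustrative, not from the text):

```python
import numpy as np

def quadratic_loss(p, a):
    """Quadratic loss for one instance: sum over the K classes of (p_j - a_j)^2."""
    p = np.asarray(p, dtype=float)
    a = np.asarray(a, dtype=float)
    return float(np.sum((p - a) ** 2))

# Predicted distribution over K = 3 classes; the actual class is the second one.
p = [0.2, 0.7, 0.1]
a = [0, 1, 0]
print(round(quadratic_loss(p, a), 4))  # 0.04 + 0.09 + 0.01 = 0.14
```

Over a test set, this per-instance value would simply be summed (or averaged) across all instances, as the text describes.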

#### Informational Loss Function

$$-\log_2 p_i$$

where *i* is the actual class for the instance. This function represents the information (in bits) required to express the actual class *i* with respect to the probability distribution $\mathbf{p}$: if one knows the distribution, this is the number of bits required to communicate the actual class under an optimal coding scheme.
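The informational loss depends only on the probability assigned to the actual class. A minimal sketch, using the same illustrative three-class distribution as above (the values are assumptions, not from the text):

```python
import math

def informational_loss(p, actual_class):
    """Informational loss for one instance: -log2 of the probability
    the scheme assigned to the instance's actual class."""
    return -math.log2(p[actual_class])

# Predicted distribution over K = 3 classes; the actual class is index 1.
p = [0.2, 0.7, 0.1]
print(round(informational_loss(p, 1), 4))  # -log2(0.7) ≈ 0.5146 bits
```

Note that the loss is 0 when the scheme assigns probability 1 to the actual class, and grows without bound as that probability approaches 0, which heavily penalizes confident wrong predictions.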