Let's pretend we have an email with three words: "Send money now." We want to classify that email as ham or spam.
We'll use Naive Bayes classification:
$$P(spam | \text{send money now}) = \frac {P(\text{send money now} | spam) \times P(spam)} {P(\text{send money now})}$$By assuming that the features (the words) are conditionally independent given the class, we can simplify the likelihood function:
$$P(spam | \text{send money now}) \approx \frac {P(\text{send} | spam) \times P(\text{money} | spam) \times P(\text{now} | spam) \times P(spam)} {P(\text{send money now})}$$We could calculate all of the values in the numerator by examining a corpus of spam email:
$$P(spam | \text{send money now}) \approx \frac {0.2 \times 0.1 \times 0.1 \times 0.9} {P(\text{send money now})} = \frac {0.0018} {P(\text{send money now})}$$We could repeat this process with a corpus of ham email:
$$P(ham | \text{send money now}) \approx \frac {0.05 \times 0.01 \times 0.1 \times 0.1} {P(\text{send money now})} = \frac {0.000005} {P(\text{send money now})}$$All we care about is which class has the higher probability, and the denominator $P(\text{send money now})$ is the same in both cases, so we can ignore it. Since $0.0018 > 0.000005$, we predict that the email is spam.
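The comparison above can be sketched in a few lines of Python. The per-word likelihoods and class priors below are the illustrative values from the equations, not estimates from a real corpus:

```python
# Illustrative probabilities from the worked example above
# (in practice these would be estimated from spam/ham corpora).
spam_likelihoods = {"send": 0.2, "money": 0.1, "now": 0.1}
ham_likelihoods = {"send": 0.05, "money": 0.01, "now": 0.1}
p_spam, p_ham = 0.9, 0.1


def score(words, likelihoods, prior):
    """Numerator of Bayes' rule: the class prior times the product of
    per-word likelihoods. The shared denominator P(words) is omitted
    because it doesn't affect which class scores higher."""
    result = prior
    for word in words:
        result *= likelihoods[word]
    return result


words = ["send", "money", "now"]
spam_score = score(words, spam_likelihoods, p_spam)  # 0.9 * 0.2 * 0.1 * 0.1
ham_score = score(words, ham_likelihoods, p_ham)     # 0.1 * 0.05 * 0.01 * 0.1
prediction = "spam" if spam_score > ham_score else "ham"
print(prediction, spam_score, ham_score)
```

Note that we only compare the two numerators; dividing both by $P(\text{send money now})$ would not change which one is larger.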
Advantages of Naive Bayes:

- Simple and fast to train and to predict, even with high-dimensional data
- Performs reasonably well with relatively little training data
- Easy to interpret: each prediction is a product of per-feature likelihoods and a prior
Disadvantages of Naive Bayes:

- The conditional independence assumption rarely holds in practice
- A word never seen with a class gives a likelihood of zero, which wipes out the entire product (typically mitigated with Laplace smoothing)
- The predicted probabilities tend to be poorly calibrated, even when the class ranking is correct