Let's pretend we have an email with three words: "Send money now." We want to classify that email as ham or spam.
We'll use Naive Bayes classification:
$$P(spam | \text{send money now}) = \frac {P(\text{send money now} | spam) \times P(spam)} {P(\text{send money now})}$$By assuming that the features (the words) are conditionally independent given the class, we can simplify the likelihood function:
$$P(spam | \text{send money now}) \approx \frac {P(\text{send} | spam) \times P(\text{money} | spam) \times P(\text{now} | spam) \times P(spam)} {P(\text{send money now})}$$We could calculate all of the values in the numerator by examining a corpus of spam email:
$$P(spam | \text{send money now}) \approx \frac {0.2 \times 0.1 \times 0.1 \times 0.9} {P(\text{send money now})} = \frac {0.0018} {P(\text{send money now})}$$We could repeat this process with a corpus of ham email:
$$P(ham | \text{send money now}) \approx \frac {0.05 \times 0.01 \times 0.1 \times 0.1} {P(\text{send money now})} = \frac {0.000005} {P(\text{send money now})}$$All we care about is which class has the higher probability, and the denominator $P(\text{send money now})$ is the same in both cases, so we can ignore it. Since $0.0018 > 0.000005$, we predict that the email is spam.
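The comparison above can be sketched in a few lines of Python. The per-word likelihoods and class priors below are the illustrative values from the equations, not estimates from a real corpus:

```python
# Illustrative probabilities from the worked example above
# (in practice these would be estimated from spam/ham corpora).
spam_likelihoods = {"send": 0.2, "money": 0.1, "now": 0.1}
ham_likelihoods = {"send": 0.05, "money": 0.01, "now": 0.1}
p_spam, p_ham = 0.9, 0.1


def score(words, likelihoods, prior):
    """Numerator of Bayes' rule: the class prior times the product of
    per-word likelihoods. The shared denominator P(words) is omitted
    because it doesn't affect which class scores higher."""
    result = prior
    for word in words:
        result *= likelihoods[word]
    return result


words = ["send", "money", "now"]
spam_score = score(words, spam_likelihoods, p_spam)  # 0.9 * 0.2 * 0.1 * 0.1
ham_score = score(words, ham_likelihoods, p_ham)     # 0.1 * 0.05 * 0.01 * 0.1
prediction = "spam" if spam_score > ham_score else "ham"
print(prediction, spam_score, ham_score)
```

Note that we only compare the two numerators; dividing both by $P(\text{send money now})$ would not change which one is larger.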
Advantages of Naive Bayes:

- Simple and fast to train and to predict, even with high-dimensional data
- Performs reasonably well with relatively little training data
- Easy to interpret: each prediction is a product of per-feature likelihoods and a prior
Disadvantages of Naive Bayes:

- The conditional independence assumption rarely holds in practice
- A word never seen with a class gives a likelihood of zero, which wipes out the entire product (typically mitigated with Laplace smoothing)
- The predicted probabilities tend to be poorly calibrated, even when the class ranking is correct