Naïve Bayes

Description

The Naïve Bayes classifier uses a probability model to assign class membership to unknown samples. When using this classifier, the features should be independent of each other, or you should be prepared to assume that they are.

Simple Explanation with Example

The Naïve Bayes classifier classifies data based on the following equation:

probability of class membership given the evidence = (likelihood of the evidence given class membership × prior probability of class membership) / (probability of the evidence)

Table 1. Training data (disease vs. no disease based on a particular test) and an unknown sample (last row).

Class         Test
Disease       Positive
Disease       Positive
Disease       Negative
Disease       Positive
Disease       Negative
No disease    Negative
No disease    Negative
No disease    Positive
No disease    Negative
No disease    Negative
Unknown       Negative

To classify the unknown in Table 1, we compare the probability of disease and the probability of no disease given a negative test.

Probability of Disease given a Negative test = (likelihood of a Negative test given Disease × probability of Disease) / (probability of a Negative test)

Probability of Disease given a Negative test = (2/5 × 5/10) / (6/10)

Probability of Disease given a Negative test = 1/3

Similarly, the probability of No disease given a Negative test is (4/5 × 5/10) / (6/10) = 2/3.

Therefore, the Naïve Bayes classifier classifies the unknown in Table 1 as No disease.
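
The same arithmetic can be reproduced in a few lines of code. The sketch below is only an illustration (plain Python, with the Table 1 rows typed in by hand and a hypothetical helper named posterior); it is not how MLGenius computes the result internally.

    # Training data from Table 1: (class, test result) pairs
    data = [
        ("Disease", "Positive"), ("Disease", "Positive"), ("Disease", "Negative"),
        ("Disease", "Positive"), ("Disease", "Negative"),
        ("No disease", "Negative"), ("No disease", "Negative"), ("No disease", "Positive"),
        ("No disease", "Negative"), ("No disease", "Negative"),
    ]

    def posterior(cls, test, data):
        """P(cls | test) = P(test | cls) * P(cls) / P(test)."""
        n = len(data)
        class_count = sum(1 for c, _ in data if c == cls)
        test_count = sum(1 for _, t in data if t == test)
        joint_count = sum(1 for c, t in data if c == cls and t == test)
        likelihood = joint_count / class_count   # P(test | cls)
        prior = class_count / n                  # P(cls)
        evidence = test_count / n                # P(test)
        return likelihood * prior / evidence

    print(posterior("Disease", "Negative", data))     # 1/3 ≈ 0.333
    print(posterior("No disease", "Negative", data))  # 2/3 ≈ 0.667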

Additional Information

By default, the Naïve Bayes classifier on MLGenius assumes a normal distribution for features with numerical values. If you know your data is not normally distributed, you can either apply transformations to your dataset to obtain an approximately normal distribution, or contact us for assistance.
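
As a rough sketch of the transformation approach, the example below uses scikit-learn's GaussianNB (a Gaussian Naïve Bayes implementation, used here as a stand-in rather than the MLGenius classifier itself) on made-up lognormal data; the log transform brings the skewed feature closer to a normal distribution before fitting.

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    # Hypothetical skewed (lognormal) feature measured for two classes.
    rng = np.random.default_rng(0)
    x_disease = rng.lognormal(mean=1.0, sigma=0.5, size=50)
    x_healthy = rng.lognormal(mean=0.0, sigma=0.5, size=50)

    X_raw = np.concatenate([x_disease, x_healthy]).reshape(-1, 1)
    y = np.array([1] * 50 + [0] * 50)  # 1 = disease, 0 = no disease

    # Log-transforming a lognormal feature makes it approximately normal,
    # which better matches the Gaussian assumption of the classifier.
    X = np.log(X_raw)

    clf = GaussianNB()
    clf.fit(X, y)
    print(clf.predict(np.log([[2.5]])))  # predict the class of a new measurement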

Pros and Cons

Pros
  • fast execution
  • handles small datasets well, even with a high number of features, compared to many other classifiers

Cons
  • the probabilistic model breaks down when features are correlated