Naive Bayes classifiers are a collection of classification algorithms based on Bayes' theorem. This is not a single algorithm but a family of algorithms that all share a common principle: every pair of features being classified is independent of each other, given the class.
First, let's look at a dataset.
Consider a fictional dataset that describes the weather conditions for a golf game. Given the weather conditions, each tuple classifies the conditions as good ("Yes") or not good ("No") for golf.
Here is a tabular view of our dataset.
| | Outlook | Temperature | Humidity | Windy | Play Golf |
|---|---|---|---|---|---|
| 0 | Rainy | Hot | High | False | No |
| 1 | Rainy | Hot | High | True | No |
| 2 | Overcast | Hot | High | False | Yes |
| 3 | Sunny | Mild | High | False | Yes |
| 4 | Sunny | Cool | Normal | False | Yes |
| 5 | Sunny | Cool | Normal | True | No |
| 6 | Overcast | Cool | Normal | True | Yes |
| 7 | Rainy | Mild | High | False | No |
| 8 | Rainy | Cool | Normal | False | Yes |
| 9 | Sunny | Mild | Normal | False | Yes |
| 10 | Rainy | Mild | Normal | True | Yes |
| 11 | Overcast | Mild | High | True | Yes |
| 12 | Overcast | Hot | Normal | False | Yes |
| 13 | Sunny | Mild | High | True | No |
The dataset is split into two parts, namely the feature matrix and the response vector. The feature matrix contains the feature vector of every row (here: Outlook, Temperature, Humidity, Windy), while the response vector contains the class label of every row (here: Play Golf).
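As a minimal sketch, using a few rows of the golf dataset, the split into a feature matrix and a response vector looks like this:

```python
# A few rows of the golf dataset: (Outlook, Temperature, Humidity, Windy, Play Golf)
rows = [
    ("Rainy",    "Hot",  "High",   False, "No"),
    ("Sunny",    "Cool", "Normal", False, "Yes"),
    ("Rainy",    "Cool", "Normal", False, "Yes"),
    ("Overcast", "Hot",  "Normal", False, "Yes"),
]

# Feature matrix: every column except the class label
X = [row[:-1] for row in rows]
# Response vector: the class label of each row
y = [row[-1] for row in rows]

print(X[0])  # ('Rainy', 'Hot', 'High', False)
print(y[0])  # No
```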
Assumption:
The fundamental naive Bayes assumption is that each feature makes an independent and equal contribution to the outcome.
In relation to our dataset, this concept can be understood as follows:

- We assume that no pair of features is dependent. For example, the temperature being "Hot" has nothing to do with the humidity, and the outlook being "Rainy" has no effect on the wind.
- Each feature is given the same weight (importance). For example, knowing only the temperature and the humidity cannot predict the outcome accurately; no attribute is irrelevant, and all are assumed to contribute equally.
Note: the assumptions made by naive Bayes are generally not true in real-life situations. In fact, the independence assumption is never quite correct, but it often works well in practice.
Now, before moving on to the formula for naive Bayes, it is important to know about Bayes' theorem.
Bayes' theorem finds the probability of an event occurring given the probability of another event that has already occurred. Bayes' theorem is stated mathematically as the following equation:

P(A|B) = P(B|A) · P(A) / P(B)

where A and B are events and P(B) ≠ 0.
Now, for our dataset, we can apply Bayes' theorem in the following way:

P(y|X) = P(X|y) · P(y) / P(X)
where y is the class variable and X is a dependent feature vector (of size n), where:

X = (x_1, x_2, x_3, ..., x_n)
Just to clarify, an example of a feature vector and the corresponding class variable would be (see the first row of the dataset):
X = (Rainy, Hot, High, False) y = No
So basically, P(y|X) here means the probability of "not playing golf" given that the weather conditions are "rainy outlook", "hot temperature", "high humidity" and "no wind".
Now it is time to put the naive assumption into Bayes' theorem, namely independence among the features. So we split the evidence into its independent parts.
Now, if any two events A and B are independent, then:

P(A, B) = P(A) · P(B)
Hence, we reach the result:

P(y|x_1, ..., x_n) = P(x_1|y) · P(x_2|y) · ... · P(x_n|y) · P(y) / (P(x_1) · P(x_2) · ... · P(x_n))
which can be expressed as:

P(y|x_1, ..., x_n) = P(y) · ∏_{i=1}^{n} P(x_i|y) / (P(x_1) · P(x_2) · ... · P(x_n))
Now, since the denominator remains constant for a given input, we can remove that term:

P(y|x_1, ..., x_n) ∝ P(y) · ∏_{i=1}^{n} P(x_i|y)
Now we need to create a classifier model. To do this, we find the probability of the given set of inputs for all possible values of the class variable y and pick the output with maximum probability. This can be expressed mathematically as:

y = argmax_y P(y) · ∏_{i=1}^{n} P(x_i|y)
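As a minimal sketch of this decision rule in code, with hypothetical placeholder priors and likelihoods (not values from the dataset):

```python
from math import prod

def predict(x, priors, likelihoods):
    """Naive Bayes decision rule: pick the class y maximizing
    P(y) * product over i of P(x_i | y)."""
    return max(
        priors,
        key=lambda y: priors[y] * prod(likelihoods[y][i][xi] for i, xi in enumerate(x)),
    )

# Hypothetical two-class, two-feature example
priors = {"Yes": 0.6, "No": 0.4}
likelihoods = {
    "Yes": [{"Sunny": 0.5, "Rainy": 0.5}, {"Hot": 0.3, "Cool": 0.7}],
    "No":  [{"Sunny": 0.2, "Rainy": 0.8}, {"Hot": 0.6, "Cool": 0.4}],
}
# Yes score: 0.6*0.5*0.7 = 0.21 beats No score: 0.4*0.2*0.4 = 0.032
print(predict(("Sunny", "Cool"), priors, likelihoods))  # Yes
```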
So finally, we are left with the task of calculating P(y) and P(x_i|y).
Note that P(y) is also called the class probability, and P(x_i|y) is called the conditional probability.
Different naive Bayes classifiers differ mainly in the assumptions they make about the distribution of P(x_i|y).
Let's try to apply the above formula manually to our weather dataset. To do this, we need to do some preliminary calculations for our dataset.
We need to find P(x_i | y_j) for each x_i in X and y_j in y. These calculations are shown in Tables 1-5 below:

Table 1. Outlook:

| Outlook | Yes | No | P(Outlook \| Yes) | P(Outlook \| No) |
|---|---|---|---|---|
| Rainy | 2 | 3 | 2/9 | 3/5 |
| Overcast | 4 | 0 | 4/9 | 0/5 |
| Sunny | 3 | 2 | 3/9 | 2/5 |
| Total | 9 | 5 | 100% | 100% |

Table 2. Temperature:

| Temperature | Yes | No | P(Temp. \| Yes) | P(Temp. \| No) |
|---|---|---|---|---|
| Hot | 2 | 2 | 2/9 | 2/5 |
| Mild | 4 | 2 | 4/9 | 2/5 |
| Cool | 3 | 1 | 3/9 | 1/5 |
| Total | 9 | 5 | 100% | 100% |

Table 3. Humidity:

| Humidity | Yes | No | P(Humidity \| Yes) | P(Humidity \| No) |
|---|---|---|---|---|
| High | 3 | 4 | 3/9 | 4/5 |
| Normal | 6 | 1 | 6/9 | 1/5 |
| Total | 9 | 5 | 100% | 100% |

Table 4. Windy:

| Windy | Yes | No | P(Windy \| Yes) | P(Windy \| No) |
|---|---|---|---|---|
| False | 6 | 2 | 6/9 | 2/5 |
| True | 3 | 3 | 3/9 | 3/5 |
| Total | 9 | 5 | 100% | 100% |

Table 5. Class probabilities:

| Play Golf | Count | Probability |
|---|---|---|
| Yes | 9 | 9/14 |
| No | 5 | 5/14 |
So, in the tables above, we have computed P(x_i | y_j) for each x_i in X and y_j in y manually in Tables 1-4. For example, the probability of playing golf given that the temperature is cool, i.e. P(temp. = Cool | play golf = Yes) = 3/9.
We also need to find the class probabilities P(y), which were calculated in Table 5. For example, P(play golf = Yes) = 9/14.
So now we have finished our preliminary calculations and the classifier is ready!
Let's test it on a new set of features (let's call it today):
today = (Sunny, Hot, Normal, False)
So, the probability of playing golf is given by:

P(Yes|today) ∝ P(Sunny|Yes) · P(Hot|Yes) · P(Normal|Yes) · P(False|Yes) · P(Yes)
and the probability of not playing golf is given by:

P(No|today) ∝ P(Sunny|No) · P(Hot|No) · P(Normal|No) · P(False|No) · P(No)
Since P(today) is common to both probabilities, we can ignore it and find the proportional probabilities as:

P(Yes|today) ∝ 3/9 · 2/9 · 6/9 · 6/9 · 9/14 ≈ 0.0212

P(No|today) ∝ 2/5 · 2/5 · 1/5 · 2/5 · 5/14 ≈ 0.0046
Now, since

P(Yes|today) > P(No|today)

These numbers can be converted into probabilities by making their sum equal to 1 (normalization):

P(Yes|today) = 0.0212 / (0.0212 + 0.0046) ≈ 0.82

P(No|today) = 0.0046 / (0.0212 + 0.0046) ≈ 0.18
So the prediction is that golf will be played: "Yes".
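The whole manual calculation above can be sketched in code. This assumes the classic 14-row golf dataset consistent with the counts quoted earlier (e.g. P(temp. = Cool | play golf = Yes) = 3/9 and P(play golf = Yes) = 9/14):

```python
from collections import Counter, defaultdict
from math import prod

# (Outlook, Temperature, Humidity, Windy) -> Play Golf
data = [
    (("Rainy", "Hot", "High", False), "No"),
    (("Rainy", "Hot", "High", True), "No"),
    (("Overcast", "Hot", "High", False), "Yes"),
    (("Sunny", "Mild", "High", False), "Yes"),
    (("Sunny", "Cool", "Normal", False), "Yes"),
    (("Sunny", "Cool", "Normal", True), "No"),
    (("Overcast", "Cool", "Normal", True), "Yes"),
    (("Rainy", "Mild", "High", False), "No"),
    (("Rainy", "Cool", "Normal", False), "Yes"),
    (("Sunny", "Mild", "Normal", False), "Yes"),
    (("Rainy", "Mild", "Normal", True), "Yes"),
    (("Overcast", "Mild", "High", True), "Yes"),
    (("Overcast", "Hot", "Normal", False), "Yes"),
    (("Sunny", "Mild", "High", True), "No"),
]

class_counts = Counter(label for _, label in data)
priors = {y: n / len(data) for y, n in class_counts.items()}  # P(y)

# cond[y][i][value] counts feature i taking `value` within class y
cond = {y: defaultdict(Counter) for y in class_counts}
for x, y in data:
    for i, v in enumerate(x):
        cond[y][i][v] += 1

def likelihood(y, i, v):
    """Conditional probability P(x_i = v | y), estimated by counting."""
    return cond[y][i][v] / class_counts[y]

def posterior(x):
    """P(y | x) for each class: proportional scores, normalized to sum to 1."""
    scores = {y: priors[y] * prod(likelihood(y, i, v) for i, v in enumerate(x))
              for y in priors}
    total = sum(scores.values())
    return {y: s / total for y, s in scores.items()}

today = ("Sunny", "Hot", "Normal", False)
probs = posterior(today)
print(probs)                      # Yes ≈ 0.82, No ≈ 0.18
print(max(probs, key=probs.get))  # Yes
```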
The method we discussed above applies to discrete data. In the case of continuous data, we need to make some assumptions about the distribution of values for each feature. The various naive Bayes classifiers differ mainly in the assumptions they make about the distribution of P(x_i|y).
We now discuss one such classifier here.
Gaussian Naive Bayesian Classifier
In the Gaussian naive Bayes method, the continuous values associated with each feature are assumed to be distributed according to a Gaussian distribution. The Gaussian distribution is also called the normal distribution. When plotted, it gives a bell-shaped curve that is symmetric about the mean of the feature values.
The likelihood of the features is assumed to be Gaussian; hence, the conditional probability is given by:

P(x_i|y) = (1 / sqrt(2π σ_y²)) · exp(−(x_i − μ_y)² / (2 σ_y²))

where μ_y and σ_y² are the mean and variance of feature x_i estimated from the training examples of class y.
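A minimal sketch of this likelihood in code (the μ and σ² values below are illustrative, not estimates from any particular dataset):

```python
from math import sqrt, pi, exp

def gaussian_likelihood(x, mu, sigma2):
    """P(x_i | y) under the Gaussian assumption, with class-conditional
    mean mu and variance sigma2."""
    return exp(-((x - mu) ** 2) / (2 * sigma2)) / sqrt(2 * pi * sigma2)

# The density peaks at the mean: for mu=0, sigma2=1 it is 1/sqrt(2*pi) ≈ 0.3989
print(gaussian_likelihood(0.0, mu=0.0, sigma2=1.0))
print(gaussian_likelihood(1.0, mu=0.0, sigma2=1.0))  # smaller, away from the mean
```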
We now look at implementing a Gaussian naive Bayesian classifier using scikit-learn.
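A sketch of how this might look with scikit-learn, using the built-in Iris dataset (the 60/40 split and `random_state=1` are illustrative choices):

```python
# load the iris dataset
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn import metrics

iris = load_iris()
X, y = iris.data, iris.target

# split X and y into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=1)

# train a Gaussian naive Bayes classifier on the training set
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# make predictions on the testing set and compare with the true labels
y_pred = gnb.predict(X_test)
print("Gaussian Naive Bayes model accuracy (in %):",
      metrics.accuracy_score(y_test, y_pred) * 100)
```

Run with these settings, the script reports the accuracy figure quoted below.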
Gaussian Naive Bayes model accuracy (in%): 95.0
Other popular naive Bayes classifiers are:

- Multinomial Naive Bayes: feature vectors represent the frequencies with which certain events have been generated by a multinomial distribution. This is the event model typically used for document classification.
- Bernoulli Naive Bayes: in the multivariate Bernoulli event model, features are independent booleans (binary variables) describing the inputs. Like the multinomial model, this model is popular for document classification tasks, where binary term-occurrence features (whether a word occurs in a document) are used rather than term frequencies.
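As a small illustration of the multinomial event model, here is a toy text-classification sketch with scikit-learn; the documents and labels are made up for demonstration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# A tiny, made-up spam/ham corpus: word counts serve as the features
docs = ["free prize money now", "meeting schedule for monday",
        "win free money", "project meeting notes"]
labels = ["spam", "ham", "spam", "ham"]

# Turn each document into a vector of word counts
vec = CountVectorizer()
X = vec.fit_transform(docs)

clf = MultinomialNB()  # Laplace smoothing (alpha=1.0) by default
clf.fit(X, labels)

print(clf.predict(vec.transform(["free money prize"])))       # ['spam']
print(clf.predict(vec.transform(["monday project meeting"])))  # ['ham']
```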
As we reach the end of this article, here are some important points to ponder:

- In spite of their apparently over-simplified assumptions, naive Bayes classifiers have worked quite well in many real-world situations, famously document classification and spam filtering.
- They require only a small amount of training data to estimate the necessary parameters.
- Naive Bayes learners and classifiers can be extremely fast compared to more sophisticated methods, since each class-conditional feature distribution can be estimated independently as a one-dimensional distribution, which also helps alleviate problems stemming from the curse of dimensionality.
This blog is courtesy of Nikhil Kumar.
Please post comments if you find anything wrong or if you would like to share more information on the topic discussed above.