
Applying the multinomial naive Bayes approach to NLP problems


P (c | x) = P (x | c) * P (c) / P (x)

Naive Bayes is widely used in natural language processing (NLP) tasks. A naive Bayes classifier predicts the tag of a text: it calculates the probability of each tag for a given text and then outputs the tag with the highest probability.

How does the naive Bayes algorithm work?

Let’s take an example: classifying whether a review is positive or negative.

Training dataset:

Text                                                                  Reviews
“I liked the movie”                                                   positive
“It’s a good movie. Nice story”                                       positive
“Nice songs. But sadly boring ending.”                                negative
“Hero’s acting is bad but heroine looks good. Overall nice movie”     positive
“Sad, boring movie”                                                   negative

We want to classify whether the text “overall liked the movie” is a positive or a negative review. To do so, we have to calculate:
P (positive | overall liked the movie): the probability that the tag of a sentence is positive, given that the sentence is “overall liked the movie”.
P (negative | overall liked the movie): the probability that the tag of a sentence is negative, given that the sentence is “overall liked the movie”.

Before that, we first apply stop-word removal and stemming to the text.

Stop-word removal: stop words are common words that add nothing to the classification, such as “yet”, “ever”, and so on.

Stemming: reducing a word to its root form.

Now, after applying these two methods, our text becomes

Text                                                             Reviews
“i liked the movi”                                               positive
“its a good movi nice stori”                                     positive
“nice songs but sadly boring end”                                negative
“heros acting is bad but heroine looks good overall nice movi”   positive
“sad boring movi”                                                negative
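The cleaning step can be sketched in plain Python as follows. This is a minimal illustration, not the article's own code: the `STOP_WORDS` set is a hypothetical hand-picked list, the `preprocess` helper is an assumed name, and stemming is left out because it needs a real stemmer such as NLTK's `PorterStemmer`.

```python
import re

# Hypothetical, hand-picked stop-word list for illustration only;
# a real project would load a fuller list (for example from NLTK).
STOP_WORDS = {"a", "is", "it", "s", "but", "yet", "ever"}


def preprocess(text):
    """Lowercase the text, keep letters only, split it and drop stop words."""
    # Replace every non-letter with a space so that words stay separated.
    letters_only = re.sub(r"[^a-zA-Z]", " ", text)
    tokens = letters_only.lower().split()
    return [token for token in tokens if token not in STOP_WORDS]


print(preprocess("It's a good movie. Nice story"))
```

Replacing non-letters with a space, rather than with an empty string, matters here: otherwise the words of a sentence would be glued into one long token.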

The important part of making machine learning algorithms work is finding the features of the data. In this case, we have text, and we need to convert this text into numbers on which we can perform calculations. We use word frequencies: each document is treated as the bag of words it contains, ignoring word order. Our features will be the counts of each of these words.
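A word-count feature of this kind can be sketched with the standard library. The `bag_of_words` helper is an assumed name used only for this illustration, and it expects text that has already been cleaned.

```python
from collections import Counter


def bag_of_words(text):
    """Map each lowercase word in the text to the number of times it occurs."""
    return Counter(text.lower().split())


counts = bag_of_words("sad movie boring movie")
print(counts["movie"], counts["sad"], counts["good"])
```

A `Counter` returns 0 for words it has never seen, which is convenient when looking up a word that does not occur in a document.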

In our case, Bayes’ theorem gives P (positive | overall liked the movie) as:

P (positive | overall liked the movie) = P (overall liked the movie | positive) * P (positive) / P (overall liked the movie)

Since our classifier only has to find out which tag has the higher probability, we can drop the divisor, which is the same for both tags, and compare

P (overall liked the movie | positive) * P (positive) with P (overall liked the movie | negative) * P (negative)

However, there is a problem: “overall liked the movie” does not appear in our training dataset, so this probability is zero. Here we make the “naive” assumption that every word in a sentence is independent of the others. This means that we now look at individual words instead of the whole sentence.

We can write this as:

P (overall liked the movie) = P (overall) * P (liked ) * P (the) * P (movie)

The next step is to apply the same independence assumption to the probability of the sentence given a tag:

P (overall liked the movie | positive ) = P (overall | positive) * P (liked | positive) * P (the | positive) * P (movie | positive)

And now, these individual words actually appear several times in our training data and we can calculate them!

Calculating probabilities:

First we calculate the prior probability of each tag: for a given sentence in our training data, the probability that it is positive P (positive) is 3/5. Then P (negative) is 2/5.
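The prior calculation can be checked with a few lines of Python. This is a small sketch in which the `labels` list simply restates the five tags of the training dataset and `class_priors` is an assumed helper name.

```python
from collections import Counter
from fractions import Fraction

# Tags of the five training reviews, in the order of the table above.
labels = ["positive", "positive", "negative", "positive", "negative"]


def class_priors(labels):
    """Return P(tag) as the share of training examples carrying each tag."""
    total = len(labels)
    return {tag: Fraction(count, total) for tag, count in Counter(labels).items()}


priors = class_priors(labels)
print(priors["positive"], priors["negative"])
```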

Then, calculating P (overall | positive) means counting how many times the word “overall” occurs in positive texts (1), divided by the total number of words in positive texts (17). Therefore, P (overall | positive) = 1/17, P (liked | positive) = 1/17, P (the | positive) = 2/17, and P (movie | positive) = 3/17.

If a word never occurs with a tag, its probability is zero and the whole product becomes zero. To avoid this, we use Laplace smoothing: we add 1 to every count so that it is never zero. To balance this, we add the number of possible words to the divisor, so that the result never exceeds 1. In our case, the total number of possible words is 21.

Applying smoothing, the results are:

Word      P (word | positive)     P (word | negative)
overall   (1 + 1) / (17 + 21)     (0 + 1) / (7 + 21)
liked     (1 + 1) / (17 + 21)     (0 + 1) / (7 + 21)
the       (2 + 1) / (17 + 21)     (0 + 1) / (7 + 21)
movie     (3 + 1) / (17 + 21)     (1 + 1) / (7 + 21)
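The smoothed values can be reproduced with a one-line function. This is a minimal sketch: `smoothed_likelihood` is an assumed name, and the numbers passed in (17 and 7 words per tag, 21 possible words) are the ones stated in the text.

```python
from fractions import Fraction


def smoothed_likelihood(word_count, class_total, vocab_size):
    """Laplace smoothing: (count + 1) / (words in the class + possible words)."""
    return Fraction(word_count + 1, class_total + vocab_size)


# "overall" occurs once in positive texts and never in negative texts.
p_overall_pos = smoothed_likelihood(1, class_total=17, vocab_size=21)
p_overall_neg = smoothed_likelihood(0, class_total=7, vocab_size=21)
print(p_overall_pos, p_overall_neg)
```

Using `Fraction` keeps the values exact, so they can be compared directly with the fractions in the table.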

Now we just multiply all the probabilities and see which result is larger:

P (overall | positive) * P (liked | positive) * P (the | positive) * P (movie | positive) * P (positive) = 1.38 * 10^(-5) = 0.0000138

P (overall | negative) * P (liked | negative) * P (the | negative) * P (movie | negative) * P (negative) = 0.13 * 10^(-5) = 0.0000013

Our classifier gives “overall liked the movie” the positive tag.
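The two products can be recomputed with a short script. This is a sketch under the assumption that the word counts, class totals, vocabulary size and priors are exactly those given in the worked example; the names `COUNTS`, `TOTALS`, `PRIORS`, `VOCAB_SIZE` and `score` are introduced only for this illustration.

```python
from fractions import Fraction
from math import prod

# Figures taken from the worked example in the text.
VOCAB_SIZE = 21
TOTALS = {"positive": 17, "negative": 7}
PRIORS = {"positive": Fraction(3, 5), "negative": Fraction(2, 5)}
COUNTS = {
    "positive": {"overall": 1, "liked": 1, "the": 2, "movie": 3},
    "negative": {"overall": 0, "liked": 0, "the": 0, "movie": 1},
}


def score(words, tag):
    """Prior times the product of the Laplace-smoothed word likelihoods."""
    likelihoods = [
        Fraction(COUNTS[tag].get(word, 0) + 1, TOTALS[tag] + VOCAB_SIZE)
        for word in words
    ]
    return PRIORS[tag] * prod(likelihoods)


words = ["overall", "liked", "the", "movie"]
scores = {tag: float(score(words, tag)) for tag in PRIORS}
best = max(scores, key=scores.get)
print(scores, best)
```

The positive score comes out roughly ten times larger than the negative one, so `best` is the positive tag.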

Below is the implementation:

# text cleaning
import re

import nltk
import pandas as pd
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer

dataset = [["I liked the movie", "positive"],
           ["It's a good movie. Nice story", "positive"],
           ["Hero's acting is bad but heroine looks good. Overall nice movie", "positive"],
           ["Nice songs. But sadly boring ending.", "negative"],
           ["sad movie, boring movie", "negative"]]

dataset = pd.DataFrame(dataset)
dataset.columns = ["Text", "Reviews"]

# download the stop-word list used below
nltk.download('stopwords')

corpus = []
ps = PorterStemmer()
stop_words = set(stopwords.words('english'))

for i in range(0, 5):
    # keep letters only, replacing everything else with a space
    text = re.sub('[^a-zA-Z]', ' ', dataset['Text'][i])
    text = text.lower()
    text = text.split()
    # drop stop words and stem the remaining words
    text = [ps.stem(word) for word in text if word not in stop_words]
    text = ' '.join(text)
    corpus.append(text)

# create a bag of words
cv = CountVectorizer(max_features=1500)
X = cv.fit_transform(corpus).toarray()
y = dataset.iloc[:, 1].values

# splitting the dataset into training and test sets
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)


# fitting naive Bayes to the training set
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix

classifier = GaussianNB()
classifier.fit(X_train, y_train)

# predicting test set results
y_pred = classifier.predict(X_test)

# creating the confusion matrix
cm = confusion_matrix(y_test, y_pred)
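To make the last step more concrete, here is a rough pure-Python sketch of what a confusion matrix counts: rows are the true tags and columns are the predicted tags. The `confusion_matrix_counts` helper and the `true_tags` / `predicted_tags` lists are hypothetical and made up for illustration; they are not the output of the code above.

```python
def confusion_matrix_counts(y_true, y_pred, labels):
    """Count (true tag, predicted tag) pairs; rows are true, columns predicted."""
    index = {label: position for position, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_tag, predicted_tag in zip(y_true, y_pred):
        matrix[index[true_tag]][index[predicted_tag]] += 1
    return matrix


# Made-up tags, used only to show how the cells are filled.
true_tags = ["positive", "negative", "positive", "negative"]
predicted_tags = ["positive", "positive", "positive", "negative"]

example_cm = confusion_matrix_counts(
    true_tags, predicted_tags, labels=["negative", "positive"])
print(example_cm)
```

The diagonal holds the correctly classified examples, and every off-diagonal cell is a specific kind of mistake, such as a negative review tagged as positive.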