
Applying the multinomial naive Bayes approach to NLP problems


P(c | x) = P(x | c) * P(c) / P(x)

Naive Bayes is mainly used in natural language processing (NLP) tasks. A naive Bayes classifier predicts the tag of a text: for a given text x, it calculates the likelihood of each tag c and then outputs the tag with the highest value.
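As a minimal, illustrative sketch of the formula above (the function name posterior and the numbers are made up for this illustration, not taken from the article):

# Bayes' theorem: P(c | x) = P(x | c) * P(c) / P(x)
def posterior(likelihood, prior, evidence):
    return likelihood * prior / evidence

# hypothetical values: P(x | c) = 0.2, P(c) = 0.6, P(x) = 0.3
print(posterior(0.2, 0.6, 0.3))  # 0.4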

How does the naive Bayes algorithm work?

Let's take an example: classifying a review as positive or negative.

Training dataset:

Text | Reviews
"I liked the movie" | positive
"It's a good movie. Nice story" | positive
"Nice songs. But sadly boring ending." | negative
"Hero's acting is bad but heroine looks good. Overall nice movie" | positive
"Sad, boring movie" | negative

We want to classify whether the text "overall liked the movie" is a positive review or a negative review. We have to calculate:
P(positive | overall liked the movie): the probability that the tag of the sentence is positive, given the sentence "overall liked the movie".
P(negative | overall liked the movie): the probability that the tag of the sentence is negative, given the sentence "overall liked the movie".

Before that, we first apply stop word removal and stemming to the text (a short sketch of both steps follows the definitions below).

Removing stop words: these are common words that add nothing to the classification, for example "a", "the", "is", and so on.

Stemming: reducing a word to its root form (for example, "liked" and "likes" both become "like").
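A minimal sketch of these two steps with NLTK (assuming the stopwords corpus has been downloaded; the exact tokens depend on the stop word list and stemmer used):

import nltk
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

nltk.download('stopwords')
stop_words = set(stopwords.words('english'))
ps = PorterStemmer()

def preprocess(text):
    # keep only letters, lowercase, split into words
    words = ''.join(ch if ch.isalpha() else ' ' for ch in text).lower().split()
    # drop stop words, then stem what is left
    return [ps.stem(w) for w in words if w not in stop_words]

print(preprocess("I liked the movie"))  # e.g. ['like', 'movi']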

Now, after applying these two methods, our text becomes

Text | Reviews
"i liked the movi" | positive
"its a good movie nice stori" | positive
"nice songs but sadly boring end" | negative
"heros acting is bad but heroine looks good overall nice movi" | positive
"sad boring movi" | negative

Features:
The important part of making machine learning algorithms work is finding the right features of the data. In this case we have text, and we need to convert it into numbers on which we can perform calculations. We use word frequencies: each document is treated as the set of words it contains, and our features are the counts of each of these words.
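For instance, a minimal word-count sketch (scikit-learn's CountVectorizer, used later in the implementation, builds the same kind of counts for the whole corpus):

from collections import Counter

# toy documents; in the article these come from the cleaned corpus above
docs = ["i liked the movi", "sad boring movi"]
for doc in docs:
    print(Counter(doc.split()))
# Counter({'i': 1, 'liked': 1, 'the': 1, 'movi': 1})
# Counter({'sad': 1, 'boring': 1, 'movi': 1})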

In our case, we compute P(positive | overall liked the movie) using Bayes' theorem:

P(positive | overall liked the movie) = P(overall liked the movie | positive) * P(positive) / P(overall liked the movie)

Since our classifier only has to figure out which tag has the higher probability, we can drop the divisor, which is the same for both tags, and simply compare

P(overall liked the movie | positive) * P(positive) with P(overall liked the movie | negative) * P(negative)

However, there is a problem: "overall liked the movie" does not appear in our training dataset, so this probability is zero. Here the "naive" assumption comes in: we assume that each word in a sentence is independent of the others, which means we can now look at individual words.

We can write this as:

P(overall liked the movie) = P(overall) * P(liked) * P(the) * P(movie)

The next step is to apply the same assumption to the class-conditional probability:

P(overall liked the movie | positive) = P(overall | positive) * P(liked | positive) * P(the | positive) * P(movie | positive)

And now these individual words actually appear several times in our training data, so we can estimate their probabilities!
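Put as a sketch, the score of a sentence for a tag is just the product of the per-word conditional probabilities and the prior (the function and argument names are illustrative):

def score(words, word_probs, prior):
    # P(w1 | tag) * P(w2 | tag) * ... * P(tag)
    result = prior
    for w in words:
        result *= word_probs[w]
    return result

In practice these products become very small, so real implementations usually sum logarithms of the probabilities instead.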

Calculating probabilities:

First we calculate the prior probability of each tag: for a given sentence in our training data, the probability that it is positive, P(positive), is 3/5, and P(negative) is 2/5.

Then, calculating P(overall | positive) means counting how many times the word "overall" occurs in positive texts (1) and dividing by the total number of words in positive texts (17). Therefore, P(overall | positive) = 1/17, P(liked | positive) = 1/17, P(the | positive) = 2/17, P(movie | positive) = 3/17.

If a probability turns out to be zero, we use Laplace smoothing: we add 1 to every count so that it is never zero. To balance this, we add the number of possible words (the vocabulary size) to the divisor, so the result never exceeds 1. In our case, the total number of possible words is 21.
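A sketch of the smoothed estimate, using the counts from this example:

def smoothed_prob(word_count, total_words, vocab_size):
    # Laplace (add-one) smoothing: (count + 1) / (total + vocabulary size)
    return (word_count + 1) / (total_words + vocab_size)

print(smoothed_prob(1, 17, 21))  # P(overall | positive) = 2/38 ≈ 0.053
print(smoothed_prob(0, 7, 21))   # P(overall | negative) = 1/28 ≈ 0.036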

After applying smoothing, the results are:

Word    | P(word | positive)  | P(word | negative)
overall | (1 + 1) / (17 + 21) | (0 + 1) / (7 + 21)
liked   | (1 + 1) / (17 + 21) | (0 + 1) / (7 + 21)
the     | (2 + 1) / (17 + 21) | (0 + 1) / (7 + 21)
movie   | (3 + 1) / (17 + 21) | (1 + 1) / (7 + 21)

Now we just multiply all the probabilities and see which product is larger:

P(overall | positive) * P(liked | positive) * P(the | positive) * P(movie | positive) * P(positive) ≈ 1.38 * 10^-5 = 0.0000138

P(overall | negative) * P(liked | negative) * P(the | negative) * P(movie | negative) * P(negative) ≈ 0.13 * 10^-5 = 0.0000013

Our classifier gives "overall liked the movie" the positive tag.
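The arithmetic above can be reproduced in a few lines (a sketch that only restates the numbers already computed):

p_positive = (2/38) * (2/38) * (3/38) * (4/38) * (3/5)
p_negative = (1/28) * (1/28) * (1/28) * (2/28) * (2/5)

print(p_positive)  # ≈ 1.38e-05
print(p_negative)  # ≈ 1.30e-06
print("positive" if p_positive > p_negative else "negative")  # positive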

Below is the implementation:

# text cleaning

import re
import pandas as pd
import nltk
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer

dataset = [["I liked the movie", "positive"],
           ["It's a good movie. Nice story", "positive"],
           ["Hero's acting is bad but heroine looks good. Overall nice movie", "positive"],
           ["Nice songs. But sadly boring ending.", "negative"],
           ["sad movie, boring movie", "negative"]]

dataset = pd.DataFrame(dataset)
dataset.columns = ["Text", "Reviews"]

nltk.download('stopwords')

corpus = []

for i in range(0, 5):
    # keep only letters, lowercase, and tokenize
    text = re.sub('[^a-zA-Z]', ' ', dataset['Text'][i])
    text = text.lower()
    text = text.split()
    # remove stop words and stem the remaining words
    ps = PorterStemmer()
    text = [ps.stem(word) for word in text
            if word not in set(stopwords.words('english'))]
    text = ' '.join(text)
    corpus.append(text)

# create a bag of words
cv = CountVectorizer(max_features=1500)

X = cv.fit_transform(corpus).toarray()
y = dataset.iloc[:, 1].values

# splitting the dataset into training and test sets
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)


# fitting naive bayes to the training set
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix

classifier = GaussianNB()
classifier.fit(X_train, y_train)

# predicting the test set results
y_pred = classifier.predict(X_test)

# creating the confusion matrix
cm = confusion_matrix(y_test, y_pred)
cm
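Since the article is about the multinomial variant, it is worth noting that scikit-learn also provides MultinomialNB, which works directly on the bag-of-words counts produced by CountVectorizer. A sketch of swapping it in (not part of the original implementation):

# alternative: multinomial naive Bayes on the word counts
from sklearn.naive_bayes import MultinomialNB

mnb = MultinomialNB()  # applies Laplace smoothing (alpha=1.0) by default
mnb.fit(X_train, y_train)
print(mnb.predict(X_test))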