NLP | Combining NGram Taggers

NgramTagger has three subclasses:

  • UnigramTagger
  • BigramTagger
  • TrigramTagger

The BigramTagger subclass uses the previous tag as part of its context; the TrigramTagger subclass uses the previous two tags.

An n-gram is a subsequence of n consecutive elements.
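For example, here is a quick stdlib-only sketch of extracting n-grams from a token list (NLTK also ships its own helper for this, nltk.util.ngrams):

```python
def ngrams(seq, n):
    # All length-n runs of consecutive elements in seq.
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

print(ngrams(["the", "dog", "barks"], 2))
# [('the', 'dog'), ('dog', 'barks')]
```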
Idea for NgramTagger subclasses:

  • Looking at the previous words and POS tags, you can guess the POS tag of the current word.
  • Each tagger maintains a context dictionary (implemented in the parent class ContextTagger).
  • This dictionary is used to guess the tag of the current word based on its context.
  • For the NgramTagger subclasses, the context is some number of previously tagged words.
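The context idea above can be sketched without NLTK. The dictionaries below are illustrative stand-ins, not NLTK's actual internal data structures (real taggers build their tables automatically from a tagged corpus):

```python
# Toy sketch of the context dictionaries behind NgramTagger subclasses.
# The entries are hypothetical; a real tagger learns them from training data.

# UnigramTagger-style context: the word itself
unigram_context = {"the": "DT", "dog": "NN", "barks": "VBZ"}

# BigramTagger-style context: (previous tag, current word)
bigram_context = {("DT", "dog"): "NN", ("NN", "barks"): "VBZ"}

def guess_tag(prev_tag, word):
    # Try the richer bigram context first, then fall back to the unigram one.
    tag = bigram_context.get((prev_tag, word))
    return tag if tag is not None else unigram_context.get(word)

print(guess_tag("DT", "dog"))   # found in the bigram context -> NN
print(guess_tag(None, "the"))   # falls back to the unigram context -> DT
```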

Code #1: Working with BigramTagger

# Loading libraries
from nltk.tag import BigramTagger
from nltk.corpus import treebank

# Initializing the training and testing sets
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# Tagging
tag1 = BigramTagger(train_data)

# Evaluating
tag1.evaluate(test_data)

Output:

 0.11318799913662854 

The accuracy is low because a BigramTagger trained on its own has no backoff: any (previous tag, word) context not seen in training gets the tag None, and since that None then appears in the context of the next word, the errors cascade through the rest of the sentence.

Code #2: Working with TrigramTagger

# Loading libraries
from nltk.tag import TrigramTagger
from nltk.corpus import treebank

# Initializing the training and testing sets
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# Tagging
tag1 = TrigramTagger(train_data)

# Evaluating
tag1.evaluate(test_data)

Output:

 0.06876753723289446 

Code #3: Combining UnigramTagger, BigramTagger and TrigramTagger

# Loading libraries
from nltk.tag import DefaultTagger, UnigramTagger, BigramTagger, TrigramTagger
from tag_util import backoff_tagger
from nltk.corpus import treebank

# Initializing the training and testing sets
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

backoff = DefaultTagger('NN')
tag = backoff_tagger(train_data,
                     [UnigramTagger, BigramTagger, TrigramTagger],
                     backoff=backoff)

# Evaluating
tag.evaluate(test_data)

Output:

 0.8806820634578028 

How does it work?

  • The backoff_tagger function creates an instance of each tagger class.
  • It passes each instance the previous tagger as its backoff, along with train_data.
  • The order of the tagger classes matters: in the code above, UnigramTagger comes first, so it is trained first and receives the initial backoff tagger (the DefaultTagger).
  • Each trained tagger then becomes the backoff for the next tagger class.
  • The tagger finally returned is an instance of the last class, TrigramTagger.
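Note that tag_util is not part of NLTK; it is a small helper module. A minimal implementation consistent with the steps above might look like the sketch below, demonstrated with stand-in tagger classes (hypothetical Uni/Bi/Tri) so it runs without a corpus; in the article's code the list would be [UnigramTagger, BigramTagger, TrigramTagger]:

```python
def backoff_tagger(train_sents, tagger_classes, backoff=None):
    # Train each class in order, wiring the previous tagger in as its backoff.
    for cls in tagger_classes:
        backoff = cls(train_sents, backoff=backoff)
    return backoff  # an instance of the last class in the list

# Stand-in classes with the same (train_sents, backoff=...) signature
# as NLTK's sequential taggers; purely illustrative.
class Uni:
    def __init__(self, train_sents, backoff=None):
        self.backoff = backoff

class Bi(Uni):
    pass

class Tri(Uni):
    pass

chain = backoff_tagger([], [Uni, Bi, Tri], backoff="default")
print(type(chain).__name__)           # Tri
print(type(chain.backoff).__name__)   # Bi
print(chain.backoff.backoff.backoff)  # default
```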

Code #4: Verifying the backoff chain

print(tag._taggers[-1] == backoff)
print(isinstance(tag._taggers[0], TrigramTagger))
print(isinstance(tag._taggers[1], BigramTagger))

Output:

 True
 True
 True