
NLP | Combining NGram Taggers


NgramTagger has three subclasses:

  • UnigramTagger
  • BigramTagger
  • TrigramTagger

The UnigramTagger subclass uses only the current word as its context.
The BigramTagger subclass also uses the previous tag as part of its context.
The TrigramTagger subclass uses the previous two tags as part of its context.

An n-gram is a subsequence of n elements.
The idea behind the NgramTagger subclasses:

  • By looking at the previous words and their POS tags, you can guess the part-of-speech tag for the current word.
  • Each tagger maintains a context dictionary (implemented by the parent class ContextTagger).
  • This dictionary is used to guess the tag based on the context.
  • For the NgramTagger subclasses, the context is the current word plus the tags of the preceding n - 1 words.
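
The context dictionary idea can be illustrated without NLTK. The sketch below is a simplified stand-in, not NLTK's ContextTagger: the helper names build_context_table and tag_sentence and the one-sentence training set are hypothetical. It maps (previous n - 1 tags, current word) to the most frequent tag seen in training and returns None for a context it never saw.

```python
from collections import Counter, defaultdict


def build_context_table(tagged_sents, n):
    """Map (previous n - 1 tags, current word) to its most frequent tag."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        tags = [tag for _, tag in sent]
        for i, (word, tag) in enumerate(sent):
            prev_tags = tuple(tags[max(0, i - n + 1):i])
            counts[(prev_tags, word)][tag] += 1
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}


def tag_sentence(words, table, n):
    """Tag left to right; None marks a context that was never seen."""
    tags = []
    for word in words:
        prev_tags = tuple(tags[-(n - 1):]) if n > 1 else ()
        tags.append(table.get((prev_tags, word)))
    return list(zip(words, tags))


# Hypothetical toy training data, one tagged sentence.
train = [[("the", "DT"), ("dog", "NN"), ("barks", "VBZ")]]
table1 = build_context_table(train, 1)  # unigram-style: word only
table2 = build_context_table(train, 2)  # bigram-style: previous tag + word

print(tag_sentence(["the", "dog", "barks"], table2, 2))
print(tag_sentence(["a", "dog", "barks"], table1, 1))
print(tag_sentence(["a", "dog", "barks"], table2, 2))
```

With n = 2, the unseen word "a" gets no tag, and that missing tag then makes the following contexts unseen as well. This is one reason an n-gram tagger used on its own can score poorly on new text.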

Code #1: Working with BigramTagger

# Load the libraries
from nltk.tag import BigramTagger
from nltk.corpus import treebank

# Split the corpus into training and test sets
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# Train the tagger
tag1 = BigramTagger(train_data)

# Evaluate the tagger on the test set
# (recent NLTK releases also offer accuracy() for this)
tag1.evaluate(test_data)

Output:

 0.11318799913662854 

Code #2: Working with TrigramTagger

# Load the libraries
from nltk.tag import TrigramTagger
from nltk.corpus import treebank

# Split the corpus into training and test sets
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# Train the tagger
tag1 = TrigramTagger(train_data)

# Evaluate the tagger on the test set
tag1.evaluate(test_data)

Output:

 0.06876753723289446 

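Code #3: Combining the Unigram, Bigram and Trigram taggers

# Load the libraries
from nltk.tag import DefaultTagger, UnigramTagger, BigramTagger, TrigramTagger
from nltk.corpus import treebank
# backoff_tagger comes from a separate helper module, tag_util,
# which is not part of the nltk package
from tag_util import backoff_tagger

# Split the corpus into training and test sets
train_sents = treebank.tagged_sents()[:3000]
test_sents = treebank.tagged_sents()[3000:]

# Tag a word as 'NN' when none of the n-gram taggers knows its context
backoff = DefaultTagger('NN')

# Chain the taggers: each one falls back to the tagger trained before it
tagger = backoff_tagger(train_sents,
                        [UnigramTagger, BigramTagger, TrigramTagger],
                        backoff=backoff)

# Evaluate the combined tagger on the test set
tagger.evaluate(test_sents)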

Output:

 0.8806820634578028 

How does it work?

     
  • The backoff_tagger function creates an instance of each tagger class.
  • Each instance is given train_sents and the previous tagger as its backoff.
  • The order of the tagger classes matters: in the code above, the first class is UnigramTagger, so it is trained first and receives the initial backoff tagger (DefaultTagger).
  • That tagger then becomes the backoff tagger for the next tagger class.
  • The tagger finally returned is an instance of the last class in the list, TrigramTagger.
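
The steps above can be sketched as a short loop. The function below is an assumption about what a helper like backoff_tagger might look like, not the actual contents of the tag_util module. The name backoff_tagger_sketch and the tiny hand-written training set are hypothetical and only keep the example quick to run without the treebank corpus.

```python
from nltk.tag import DefaultTagger, UnigramTagger, BigramTagger, TrigramTagger


def backoff_tagger_sketch(train_sents, tagger_classes, backoff=None):
    """Train each class in order, handing it the previous tagger as backoff."""
    for cls in tagger_classes:
        backoff = cls(train_sents, backoff=backoff)
    return backoff


# Hypothetical toy training data (the article uses the treebank corpus).
toy_sents = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]

default = DefaultTagger("NN")
tagger = backoff_tagger_sketch(
    toy_sents, [UnigramTagger, BigramTagger, TrigramTagger], backoff=default
)

# Walk the chain from the returned tagger down to the default tagger.
chain = []
current = tagger
while current is not None:
    chain.append(type(current).__name__)
    current = current.backoff
print(chain)

# "bird" never appears in training, so every n-gram tagger passes it down
# until DefaultTagger answers with "NN".
print(tagger.tag(["the", "bird", "barks"]))
```

The chain printed here starts with TrigramTagger and ends with DefaultTagger, which matches the order described in the list above: the last class passed in is the first one consulted when tagging.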

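Code #4: Proof

# The last tagger in the chain is the DefaultTagger
print(tagger._taggers[-1] == backoff)
# The first tagger in the chain is the TrigramTagger
print(isinstance(tagger._taggers[0], TrigramTagger))
# The second tagger in the chain is the BigramTagger
print(isinstance(tagger._taggers[1], BigramTagger))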

Output:

 True True True 
