NLP | Trigrams'n'Tags (TnT)

The TnT tagger is a statistical part-of-speech tagger based on second-order Markov models.

  • It is an efficient part-of-speech tagger that can be trained on different languages and on virtually any tagset.
  • Its parameters are estimated from a labeled corpus, and it includes methods for smoothing and for handling unknown words.
  • For smoothing, linear interpolation of unigram, bigram, and trigram tag probabilities is used; the interpolation weights are determined by deleted interpolation (see the sketch below).
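
The following is a minimal sketch (not the NLTK implementation itself) of what that linear interpolation looks like: unigram, bigram, and trigram tag distributions are counted with nltk's FreqDist and ConditionalFreqDist and then combined with weights. The fixed weights below are purely illustrative; the real TnT tagger estimates them from the training counts via deleted interpolation.

from nltk import FreqDist, ConditionalFreqDist
from nltk.corpus import treebank

# count tag unigrams, bigrams and trigrams from part of the Treebank
uni = FreqDist()
bi = ConditionalFreqDist()
tri = ConditionalFreqDist()

for sent in treebank.tagged_sents()[:3000]:
    tags = [tag for (word, tag) in sent]
    for i, tag in enumerate(tags):
        uni[tag] += 1
        if i >= 1:
            bi[tags[i - 1]][tag] += 1
        if i >= 2:
            tri[(tags[i - 2], tags[i - 1])][tag] += 1

def interpolated_prob(tag, prev2, prev1, lambdas=(0.1, 0.3, 0.6)):
    # P(tag | prev2, prev1) as a weighted mix of the three estimates;
    # the lambda values here are illustrative, not the TnT-estimated ones
    l1, l2, l3 = lambdas
    return (l1 * uni.freq(tag)
            + l2 * bi[prev1].freq(tag)
            + l3 * tri[(prev2, prev1)].freq(tag))

print(interpolated_prob('NN', 'DT', 'JJ'))  # P(NN | DT, JJ)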

The TnT tagger has a slightly different API from the other NLTK taggers: you create the tagger first and then explicitly call its train() method.

Code #1: Using the train() method

from nltk.tag import tnt
from nltk.corpus import treebank

# initialize the training and test sets
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# initialize the tagger
tnt_tagging = tnt.TnT()

# training
tnt_tagging.train(train_data)

# evaluation
a = tnt_tagging.evaluate(test_data)

print("Accuracy of TnT Tagging:", a)

Output:

 Accuracy of TnT Tagging: 0.8756313403842003 

Understanding how the TnT tagger works:

  • It maintains internal FreqDist and ConditionalFreqDist instances built from the training data.
  • These frequency distributions count unigrams, bigrams, and trigrams.
  • The counts are used to compute the probabilities of possible tags for each word.
  • Instead of building a backoff chain of NgramTagger subclasses, the TnT tagger uses all of the ngram models together to select the best tag.
  • Based on the probabilities of each possible tag, it chooses the most likely tag sequence for the entire sentence, as shown in the sketch below.
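
To see this whole-sentence behaviour, the tagger trained in Code #1 can be applied to a held-out sentence: tag() takes a list of tokens and returns a single best tag for each one. This short sketch reuses the tnt_tagging and test_data variables from Code #1.

# tag one held-out sentence with the tagger trained in Code #1
# and compare the predictions with the gold-standard tags
gold_sent = test_data[0]
tokens = [word for (word, tag) in gold_sent]

predicted = tnt_tagging.tag(tokens)

for (word, gold_tag), (_, pred_tag) in zip(gold_sent, predicted):
    print(word, gold_tag, pred_tag)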

Code #2: Passing a tagger for unknown words as unk

from nltk.tag import tnt
from nltk.corpus import treebank
from nltk.tag import DefaultTagger

# initialize the training and test sets
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# initialize the tagger with a backoff tagger for unknown words
unk = DefaultTagger('NN')
tnt_tagging = tnt.TnT(unk=unk, Trained=True)

# training
tnt_tagging.train(train_data)

# evaluation
a = tnt_tagging.evaluate(test_data)

print("Accuracy of TnT Tagging:", a)

Output:

 Accuracy of TnT Tagging: 0.892467083962875 
  • The unknown-word tagger's tag() method is called with a sentence containing just a single word.
  • A TnT tagger can be given a separate tagger for unknown words via the unk parameter (a quick check follows this list).
  • Trained=True can be passed if that unknown-word tagger is already trained.
  • Otherwise, TnT will call unk.train(data) with the same data that is passed to its own train() method.
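
As a quick check of this fallback behaviour, the sketch below reuses the tnt_tagging tagger trained in Code #2 and tags a sentence containing a made-up token ("flurbington", invented purely for illustration) that is very unlikely to occur in the training data; the unseen token should fall back to the DefaultTagger and receive the 'NN' tag.

# reuses tnt_tagging (with unk=DefaultTagger('NN')) trained in Code #2;
# "flurbington" is an invented out-of-vocabulary token
sentence = ["The", "flurbington", "was", "closed", "yesterday", "."]

print(tnt_tagging.tag(sentence))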

Beam search control:

  • Another parameter that can be tuned for TnT is N, which controls the number of candidate solutions (the beam size) that the tagger maintains while decoding.
  • The default is N = 1000.
  • Increasing N increases memory usage without any particular improvement in accuracy.
  • Decreasing N reduces memory usage, but may also reduce accuracy.

Code #3: Using N = 100

from nltk.tag import tnt
from nltk.corpus import treebank

# initialize the training and test sets
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# initialize the tagger with a smaller beam size
tnt_tagging = tnt.TnT(N=100)

# training
tnt_tagging.train(train_data)

# evaluation
a = tnt_tagging.evaluate(test_data)

print("Accuracy of TnT Tagging:", a)

Output:

 Accuracy of TnT Tagging: 0.8756313403842003 



