NLP | Unigram Tagger Training



Unigram Tagger: a tagger that uses only a single word as its context to determine the part-of-speech tag. UnigramTagger inherits from NgramTagger, which is a subclass of ContextTagger, which inherits from SequentialBackoffTagger. So UnigramTagger is a context-based tagger whose context is a single word.
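The inheritance chain can be checked from the classes themselves. The following is a minimal sketch, assuming NLTK is installed and that these classes are importable from nltk.tag.sequential; no corpus data is needed for this check. The names mro_names, chain and chain_holds are illustrative only.

```python
from nltk.tag.sequential import (
    ContextTagger,
    NgramTagger,
    SequentialBackoffTagger,
    UnigramTagger,
)

# Collect the class names along UnigramTagger's method resolution order.
mro_names = [cls.__name__ for cls in UnigramTagger.__mro__]

# The chain described in the text, from most to least specific.
chain = [UnigramTagger, NgramTagger, ContextTagger, SequentialBackoffTagger]

# True only if every class is a subclass of the next one in the chain.
chain_holds = all(
    issubclass(child, parent) for child, parent in zip(chain, chain[1:])
)

print(mro_names)
print(chain_holds)
```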

Code # 1: Importing the libraries for UnigramTagger training.

# Loading libraries
from nltk.tag import UnigramTagger
from nltk.corpus import treebank

Code # 2: Train using the first 1000 tagged sentences of the treebank corpus as data.

# Use the first 1000 tagged sentences as training data
train_sents = treebank.tagged_sents()[:1000]

# Initialize and train the tagger
tagger = UnigramTagger(train_sents)

# Take a look at the first sentence of the
# treebank corpus as a list of words
treebank.sents()[0]

Output:

['Pierre', 'Vinken', ',', '61', 'years', 'old', ',', 'will', 'join', 'the', 'board', 'as', 'a', 'nonexecutive', 'director', 'Nov.', '29', '.']

Code # 3: Find the tagged results after training.

tagger.tag(treebank.sents()[0])

Output:

[('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','), ('61', 'CD'), ('years', 'NNS'), ('old', 'JJ'), (',', ','), ('will', 'MD'), ('join', 'VB'), ('the', 'DT'), ('board', 'NN'), ('as', 'IN'), ('a', 'DT'), ('nonexecutive', 'JJ'), ('director', 'NN'), ('Nov.', 'NNP'), ('29', 'CD'), ('.', '.')]

How does the code work?
UnigramTagger builds a context model from the list of tagged sentences. Since UnigramTagger inherits from ContextTagger, instead of providing choose_tag(), it must implement a context() method that accepts the same three arguments as choose_tag(): tokens, index, and history. For UnigramTagger, context() returns the word at the current index. This context is used to build the model and also to look up the best tag once the model exists. This is also explained graphically in the diagram above.
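The idea of building a model from contexts and then looking tags up in it can be sketched in plain Python. This is a simplified illustration and not NLTK's actual implementation: the functions train_model and tag, and the toy sentences, are hypothetical and only mirror the behaviour described above, where each word maps to the tag it was seen with most often.

```python
from collections import Counter, defaultdict


def context(tokens, index, history):
    """Unigram context: just the word at the current position."""
    return tokens[index]


def train_model(tagged_sents):
    """Map each context (here, a word) to its most frequent tag."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        tokens = [word for word, _ in sent]
        history = []
        for index, (_, tag_label) in enumerate(sent):
            counts[context(tokens, index, history)][tag_label] += 1
            history.append(tag_label)
    return {ctx: tags.most_common(1)[0][0] for ctx, tags in counts.items()}


def tag(model, tokens):
    """Look up each token's context in the model; unknown contexts get None."""
    history = []
    for index in range(len(tokens)):
        history.append(model.get(context(tokens, index, history)))
    return list(zip(tokens, history))


# Tiny hand-made training set (hypothetical data, not from the treebank corpus).
toy_sents = [
    [("the", "DT"), ("board", "NN"), ("will", "MD"), ("join", "VB")],
    [("the", "DT"), ("will", "NN")],
    [("they", "PRP"), ("will", "MD"), ("join", "VB")],
]

model = train_model(toy_sents)
result = tag(model, ["the", "board", "will", "meet"])
print(result)
```

In this toy data "will" appears twice as MD and once as NN, so the model keeps MD. The word "meet" never appears in training, so it has no entry in the model and is tagged None.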

Overriding the context model:
All taggers that inherit from ContextTagger can take a pre-built model instead of training their own. This model is just a Python dictionary mapping a context key to a tag. The context keys (single words in the case of UnigramTagger) depend on what the ContextTagger subclass returns from its context() method.
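To see why the keys of a pre-built model depend on context(), here is a small plain-Python sketch. The classes ToyContextTagger, ToyUnigram and ToyLowercaseUnigram are hypothetical stand-ins written for illustration and are not part of NLTK; they assume only that the tagger looks up whatever context() returns in a dictionary.

```python
class ToyContextTagger:
    """Hypothetical stand-in: tags by looking up context() in a dict model."""

    def __init__(self, model):
        self.model = model

    def context(self, tokens, index, history):
        raise NotImplementedError("subclasses decide what the context key is")

    def tag(self, tokens):
        history = []
        for index in range(len(tokens)):
            key = self.context(tokens, index, history)
            history.append(self.model.get(key))
        return list(zip(tokens, history))


class ToyUnigram(ToyContextTagger):
    def context(self, tokens, index, history):
        # The key is the word exactly as written.
        return tokens[index]


class ToyLowercaseUnigram(ToyContextTagger):
    def context(self, tokens, index, history):
        # The key is the lowercased word, so the model needs lowercase keys.
        return tokens[index].lower()


words = ["Pierre", "pierre"]
exact = ToyUnigram({"Pierre": "NNP"}).tag(words)
lowered = ToyLowercaseUnigram({"pierre": "NNP"}).tag(words)
print(exact)
print(lowered)
```

The first tagger only matches the exact key "Pierre", while the second matches both spellings because its context() normalizes the word before the lookup. A model dictionary therefore has to be written with the same kind of key that the tagger's context() produces.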

Code # 4: Overriding the context model.

# Pass a pre-built model instead of training data;
# words missing from the model are tagged None
tagger = UnigramTagger(model={'Pierre': 'NN'})

tagger.tag(treebank.sents()[0])

Output:

[('Pierre', 'NN'), ('Vinken', None), (',', None), ('61', None), ('years', None), ('old', None), (',', None), ('will', None), ('join', None), ('the', None), ('board', None), ('as', None), ('a', None), ('nonexecutive', None), ('director', None), ('Nov.', None), ('29', None), ('.', None)]