What is part-of-speech (POS) tagging?
POS tagging is the process of converting a sentence, given as a list of words, into a list of tuples, where each tuple has the form (word, tag). The tag is a part-of-speech tag and indicates whether the word is a noun, adjective, verb, etc.
Default tagging is the most basic form of part-of-speech tagging. It is done using the DefaultTagger class. The DefaultTagger class takes a tag as its only argument and assigns that tag to every token. NN is the tag for a singular noun.
DefaultTagger is most useful when it is given the most common part-of-speech tag, which is why the noun tag is recommended.
Code # 1: How does it work?
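The original listing is not shown here, so the following is a minimal sketch of what it plausibly looked like, assuming NLTK is installed. The input words are inferred from the output; the variable names are illustrative.

```python
from nltk.tag import DefaultTagger

# 'NN' (singular noun) is the single tag given to every token
tagger = DefaultTagger('NN')

# tag() takes a list of tokens and returns a list of (word, tag) tuples
tagged = tagger.tag(['Hello', 'Geeks'])
print(tagged)
```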
[('Hello', 'NN'), ('Geeks', 'NN')]
Each tagger has a tag() method that accepts a list of tokens (usually a list of words generated by a word tokenizer), where each token is a separate word. tag() returns a list of tagged tokens, each a tuple of (word, tag).
How does DefaultTagger work?
DefaultTagger is a subclass of SequentialBackoffTagger and implements a choose_tag() method that takes three arguments:
- The list of tokens
- The index of the current token whose tag is being chosen
- The list of previous tags (the history)
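To make the three arguments concrete, here is a simplified stand-in written in plain Python. It is not NLTK's source code: the class name SketchDefaultTagger is hypothetical, and the real SequentialBackoffTagger also handles backoff to other taggers, which is left out here.

```python
class SketchDefaultTagger:
    """Simplified illustration only; not NLTK's actual implementation."""

    def __init__(self, tag):
        self._tag = tag

    def choose_tag(self, tokens, index, history):
        # tokens:  the full list of words in the sentence
        # index:   position of the token currently being tagged
        # history: tags already chosen for tokens[:index]
        # A default tagger ignores all three and returns its fixed tag.
        return self._tag

    def tag(self, tokens):
        history = []
        for index in range(len(tokens)):
            history.append(self.choose_tag(tokens, index, history))
        # Pair each word with the tag chosen for it
        return list(zip(tokens, history))


sketch_result = SketchDefaultTagger('NN').tag(['Hello', 'Geeks'])
print(sketch_result)
```

Because choose_tag() never looks at its arguments, every token receives the same tag no matter what the word is or what came before it.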
Code # 2: Tagging sentences
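The listing for this example is also missing. A minimal sketch, assuming NLTK is installed and that the original used the tag_sents() method, which tags a list of tokenized sentences in one call; the input sentences are inferred from the output.

```python
from nltk.tag import DefaultTagger

tagger = DefaultTagger('NN')

# Each inner list is one already-tokenized sentence
sentences = [['welcome', 'to', '.'], ['Geeks', 'for', 'Geeks']]

# tag_sents() applies tag() to every sentence in the list
tagged_sents = tagger.tag_sents(sentences)
print(tagged_sents)
```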
[[('welcome', 'NN'), ('to', 'NN'), ('.', 'NN')], [('Geeks', 'NN'), ('for', 'NN'), ('Geeks', 'NN')]]
Note: Every tag in the list of tagged sentences (in the output above) is NN because we used the DefaultTagger class.
Code # 3: Illustrating how to untag a tagged sentence.
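Since the output contains only the words, this example most likely strips the tags from a tagged sentence. A minimal sketch, assuming NLTK is installed and that the original used the untag() helper from nltk.tag; the tagged input is inferred from the output.

```python
from nltk.tag import untag

# untag() drops the tag from each (word, tag) tuple and keeps the word
words = untag([('Geeks', 'NN'), ('for', 'NN'), ('Geeks', 'NN')])
print(words)
```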
['Geeks', 'for', 'Geeks']