Understanding the concept:
- RegexpTagger is a subclass of SequentialBackoffTagger. It can be placed before the DefaultTagger class to tag words that the n-gram taggers have missed, which makes it a useful part of a backoff chain.
- On initialization, the patterns are saved in the RegexpTagger class. On each call to choose_tag(), it iterates over the patterns and returns the tag of the first expression that matches the current word using re.match().
- This means that if two expressions both match a word, the tag of the first one is returned without the second expression ever being tried.
- If a catch-all pattern such as (r'.*', 'NN') is placed at the end of the list, the RegexpTagger class can replace the DefaultTagger class.
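The first-match rule above can be illustrated with a small stdlib-only sketch. The helper `first_match_tag` is hypothetical and is not NLTK's source; it only mimics the behaviour described for choose_tag(), including how a trailing catch-all pattern plays the role of a DefaultTagger.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical helper, not part of NLTK: it mimics the first-match rule
# described above for RegexpTagger.choose_tag().
def first_match_tag(word: str, patterns: List[Tuple[str, str]]) -> Optional[str]:
    for regexp, tag in patterns:
        # re.match() only succeeds when the pattern matches from the start of the word
        if re.match(regexp, word):
            return tag  # later patterns are never tried
    return None  # no match: a backoff tagger would be consulted next

patterns = [
    (r'.*ing$', 'VBG'),  # words ending in "-ing"
    (r'.*s$', 'NNS'),    # words ending in "-s"
    (r'.*', 'NN'),       # catch-all, placed last so it cannot shadow the others
]

print(first_match_tag('sing', patterns))       # VBG: '.*' also matches, but comes later
print(first_match_tag('things', patterns))     # NNS
print(first_match_tag('table', patterns))      # NN, as a DefaultTagger('NN') would give
print(first_match_tag('table', patterns[:2]))  # None: nothing matches without the catch-all
```

Putting the catch-all anywhere but last would hide every pattern after it, which is why its position matters.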
Code #1: Python's regular expression module and repetition syntax
The RegexpTagger class expects a list of 2-tuples:
- the first element of each tuple is a regular expression
- the second element is the tag to assign when that expression matches
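A sketch of the (regular expression, tag) format, plus the two pieces of `re` syntax these patterns rely on: re.match() anchoring at the start of the string, and the repetition operators `*` (zero or more) and `+` (one or more). The specific patterns are illustrative assumptions, not necessarily the article's own list.

```python
import re

# Illustrative patterns in the (regular expression, tag) 2-tuple format;
# an assumption, not necessarily the article's exact list.
patterns = [
    (r'^\d+$', 'CD'),    # '+': one or more digits, e.g. "1984"
    (r'.*ing$', 'VBG'),  # '*': any prefix, then "ing" at the end
    (r'.*ment$', 'NN'),  # e.g. "wonderment"
    (r'.*ful$', 'JJ'),   # e.g. "wonderful"
]

# re.match() anchors at the start of the string, which is why the suffix
# patterns begin with '.*' to skip over the stem.
print(bool(re.match(r'ing$', 'wondering')))    # False
print(bool(re.match(r'.*ing$', 'wondering')))  # True

# '+' needs at least one repetition; '*' also accepts zero.
print(bool(re.match(r'^\d+$', '')))  # False
print(bool(re.match(r'^\d*$', '')))  # True

# Every entry is a 2-tuple whose first element compiles as a regex.
for regexp, tag in patterns:
    re.compile(regexp)
    print(regexp, '->', tag)
```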
Code #2: Using RegexpTagger
Output:
Accuracy: 0.037470321605870924
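The accuracy is so low because every word that matches none of the patterns receives None, which always counts as wrong. A stdlib-only sketch of that computation (correct tokens divided by total tokens) on one gold-tagged Treebank sentence makes this visible; the patterns are again an illustrative assumption, and the number it prints is not the article's result.

```python
import re
from typing import List, Optional, Tuple

# Illustrative patterns (an assumption, not necessarily the article's list)
PATTERNS = [
    (r'^\d+$', 'CD'),
    (r'.*ing$', 'VBG'),
    (r'.*ment$', 'NN'),
    (r'.*ful$', 'JJ'),
]

def regexp_tag(word: str) -> Optional[str]:
    """Tag of the first matching pattern, or None when nothing matches."""
    for regexp, tag in PATTERNS:
        if re.match(regexp, word):
            return tag
    return None

def accuracy(tagged_sents: List[List[Tuple[str, str]]]) -> float:
    """Share of tokens whose predicted tag equals the gold tag."""
    total = correct = 0
    for sent in tagged_sents:
        for word, gold_tag in sent:
            total += 1
            if regexp_tag(word) == gold_tag:
                correct += 1
    return correct / total

# One gold-tagged sentence from the Penn Treebank sample
gold = [[('Mr.', 'NNP'), ('Vinken', 'NNP'), ('is', 'VBZ'), ('chairman', 'NN'),
         ('of', 'IN'), ('Elsevier', 'NNP'), ('N.V.', 'NNP'), (',', ','),
         ('the', 'DT'), ('Dutch', 'NNP'), ('publishing', 'VBG'),
         ('group', 'NN'), ('.', '.')]]

print([regexp_tag(w) for w, _ in gold[0]])
print(accuracy(gold))  # only "publishing" is tagged correctly: 1 of 13 tokens
```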
What is affix tagging?
AffixTagger is a subclass of ContextTagger. In the AffixTagger class, the context is either a prefix or a suffix of a word, so the class learns tags from fixed-length substrings at the beginning or end of a word.
By default, it uses 3-character suffixes. A word must be at least 5 characters long to have a context; for shorter words, None is returned as the tag.
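The context rule can be sketched in plain Python. The function `affix_context` is hypothetical; its parameter names and defaults follow NLTK's AffixTagger (affix_length=-3, min_stem_length=2), where a negative affix_length selects a suffix and a positive one a prefix. The 5-character minimum is the 2-character stem plus the 3-character affix.

```python
from typing import Optional

# Hypothetical re-implementation of the context rule described above.
# Parameter names and defaults follow NLTK's AffixTagger; this is a sketch,
# not NLTK's source.
def affix_context(word: str, affix_length: int = -3, min_stem_length: int = 2) -> Optional[str]:
    min_word_length = min_stem_length + abs(affix_length)  # 2 + 3 = 5 by default
    if len(word) < min_word_length:
        return None  # too short: no context, so the word gets no tag
    if affix_length > 0:
        return word[:affix_length]  # positive length: prefix
    return word[affix_length:]      # negative length: suffix

print(affix_context('chairman'))     # man
print(affix_context('group'))        # oup (exactly 5 characters)
print(affix_context('Mr.'))          # None (fewer than 5 characters)
print(affix_context('chairman', 3))  # cha (3-character prefix)
print(affix_context('group', -2))    # up (2-character suffix, minimum length 4)
```

The same rule covers the prefix and 2-character-suffix variants: only the sign and size of affix_length change, and with it the minimum word length.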
Code #3: Understanding AffixTagger
Output:
Train data: [('Mr.', 'NNP'), ('Vinken', 'NNP'), ('is', 'VBZ'), ('chairman', 'NN'), ('of', 'IN'), ('Elsevier', 'NNP'), ('N.V.', 'NNP'), (',', ','), ('the', 'DT'), ('Dutch', 'NNP'), ('publishing', 'VBG'), ('group', 'NN'), ('.', '.')]
Accuracy: 0.27558817181092166
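Training can be sketched as counting, for each 3-character suffix, which tag it occurs with most often, then tagging unseen words by looking up their suffix. This is a simplified stdlib sketch on the single training sentence above; NLTK's ContextTagger additionally supports a `cutoff` and a `backoff` tagger, which are omitted here, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

def suffix(word: str, length: int = 3, min_stem_length: int = 2) -> Optional[str]:
    """Last `length` characters, or None if the word is too short."""
    return word[-length:] if len(word) >= length + min_stem_length else None

def train(tagged_sents: List[List[Tuple[str, str]]]) -> Dict[str, str]:
    """Map each suffix to the tag it occurs with most often in training."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            ctx = suffix(word)
            if ctx is not None:  # short words contribute no context
                counts[ctx][tag] += 1
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}

def tag_word(model: Dict[str, str], word: str) -> Optional[str]:
    """Tag a word by its suffix; None if too short or the suffix is unseen."""
    ctx = suffix(word)
    return model.get(ctx) if ctx is not None else None

train_sents = [[('Mr.', 'NNP'), ('Vinken', 'NNP'), ('is', 'VBZ'), ('chairman', 'NN'),
                ('of', 'IN'), ('Elsevier', 'NNP'), ('N.V.', 'NNP'), (',', ','),
                ('the', 'DT'), ('Dutch', 'NNP'), ('publishing', 'VBG'),
                ('group', 'NN'), ('.', '.')]]

model = train(train_sents)
print(model)
print(tag_word(model, 'fishing'), tag_word(model, 'woman'), tag_word(model, 'cat'))
```

Words like "fishing" and "woman" are tagged only because their suffixes ("ing", "man") appeared in training; "cat" is too short to have a context.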
Code #4: AffixTagger with 3-character prefixes
Output:
Accuracy: 0.23587308439456076
Code #5: AffixTagger with 2-character suffixes
Output:
Accuracy: 0.31940427368875457