
NLP | Regex and Affix tagging


Understanding the concept:

  • RegexpTagger is a subclass of SequentialBackoffTagger. It can be placed before a DefaultTagger to tag words that the n-gram taggers have missed, which makes it a useful part of a backoff chain.
  • On initialization, the patterns are saved. Each time choose_tag() is called, it iterates over the patterns and returns the tag of the first expression that matches the current word using re.match().
  • This means that if two expressions match, the tag of the first one is returned without the second expression ever being tried.
  • Given a catch-all pattern such as (r'.*', 'NN'), the RegexpTagger class can replace the DefaultTagger class.
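The first-match-wins lookup described above can be sketched in plain Python. The helper name first_matching_tag and the sample patterns are hypothetical and chosen only for illustration; this mimics the behaviour rather than reproducing NLTK's own code.

```python
import re

# Hypothetical helper mimicking the first-match-wins lookup;
# not NLTK's actual implementation.
def first_matching_tag(word, patterns):
    """Return the tag of the first pattern that matches `word`, else None."""
    for regexp, tag in patterns:
        if re.match(regexp, word):
            return tag
    return None

# Illustrative patterns (assumed for this sketch only).
patterns = [
    (r'.*ing$', 'VBG'),  # tried first
    (r'.*ng$', 'JJ'),    # also matches "running", but is never reached for it
    (r'.*', 'NN'),       # catch-all: plays the role of a default tag
]

print(first_matching_tag('running', patterns))      # VBG
print(first_matching_tag('long', patterns))         # JJ
print(first_matching_tag('cat', patterns))          # NN
print(first_matching_tag('cat', patterns[:2]))      # None (no catch-all)
```

Because the order of the list decides which tag wins, more specific expressions should come before more general ones, with any catch-all placed last.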

Code # 1: Python regular expression module and re syntax

# cardinal numbers
patterns = [(r'^\d+$', 'CD'),
    # gerunds, e.g. wondering
    (r'.*ing$', 'VBG'),
    # e.g. wonderment
    (r'.*ment$', 'NN'),
    # e.g. wonderful
    (r'.*ful$', 'JJ')]

The RegexpTagger class expects a list of 2-tuples:

 -> the first element in the tuple is a regular expression
 -> the second element is the tag

Code # 2: Using RegexpTagger

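# Loading libraries
# tag_util is a local helper module (assumed here to hold the patterns list from Code # 1)
from tag_util import patterns
from nltk.tag import RegexpTagger
from nltk.corpus import treebank

test_data = treebank.tagged_sents()[3000:]

tagger = RegexpTagger(patterns)

# Note: recent NLTK releases deprecate evaluate() in favor of accuracy()
print("Accuracy: ", tagger.evaluate(test_data))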

Output:

 Accuracy: 0.037470321605870924 
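One likely reason the score is so low: with only four patterns and no backoff tagger, every word that matches no pattern is tagged None, which counts as an error. The sketch below computes token-level accuracy by hand on a made-up sentence (not treebank data); the helper names tag_word and accuracy are hypothetical.

```python
import re

patterns = [
    (r'^\d+$', 'CD'),
    (r'.*ing$', 'VBG'),
    (r'.*ment$', 'NN'),
    (r'.*ful$', 'JJ'),
]

def tag_word(word):
    """Tag of the first matching pattern, or None if nothing matches."""
    for regexp, tag in patterns:
        if re.match(regexp, word):
            return tag
    return None  # no pattern matched and there is no backoff tagger

def accuracy(gold_sents):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    total = correct = 0
    for sent in gold_sents:
        for word, gold_tag in sent:
            total += 1
            correct += (tag_word(word) == gold_tag)
    return correct / total if total else 0.0

# Hand-written toy sentence, for illustration only.
gold = [[('The', 'DT'), ('government', 'NN'), ('is', 'VBZ'),
         ('spending', 'VBG'), ('42', 'CD'), ('dollars', 'NNS')]]

print(accuracy(gold))  # 0.5: three of six tokens match a pattern correctly
```

On real text, the share of words covered by a handful of suffix patterns is far smaller than in this toy sentence, which is why such a tagger is normally used as one link in a backoff chain rather than on its own.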

What is affix tagging?
AffixTagger is a subclass of ContextTagger. For the AffixTagger class, the context is either the suffix or the prefix of a word, so the class can learn tags based on fixed-length substrings at the beginning or end of a word.
By default, it uses three-character suffixes. Words must be at least 5 characters long; None is returned as the tag if a word has fewer than five characters.
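To make the idea of an affix context concrete, here is a minimal plain-Python sketch, assuming a negative length means a suffix, a positive length means a prefix, and the minimum word length is the affix length plus a two-character stem (which gives the five-character limit mentioned above). The function names and the toy training data are hypothetical; this is not NLTK's implementation.

```python
from collections import Counter, defaultdict

def affix_context(word, affix_length=-3, min_stem_length=2):
    """Return the suffix (negative length) or prefix (positive length)
    of `word`, or None if the word is too short."""
    if len(word) < min_stem_length + abs(affix_length):
        return None
    return word[:affix_length] if affix_length > 0 else word[affix_length:]

def train_affix_model(tagged_sents, affix_length=-3, min_stem_length=2):
    """Map each affix to the tag seen most often with it."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            affix = affix_context(word, affix_length, min_stem_length)
            if affix is not None:
                counts[affix][tag] += 1
    return {affix: tags.most_common(1)[0][0] for affix, tags in counts.items()}

# Toy training data, for illustration only.
train = [[('walking', 'VBG'), ('talking', 'VBG'),
          ('building', 'NN'), ('the', 'DT')]]

model = train_affix_model(train)
print(model)                                # {'ing': 'VBG'}
print(model.get(affix_context('running')))  # VBG
print(model.get(affix_context('cat')))      # None: word is too short
print(affix_context('running', affix_length=3))  # run
```

The word "the" contributes nothing to the model because it is shorter than five characters, and "building" is outvoted by the two VBG examples sharing the same suffix.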

Code # 3: Understanding AffixTagger.

# Loading libraries
from tag_util import word_tag_model
from nltk.corpus import treebank
from nltk.tag import AffixTagger

# Initialize the training and testing sets
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

print("Train data:", train_data[1])

# Tagger initialization
tag = AffixTagger(train_data)

# Testing
print("Accuracy:", tag.evaluate(test_data))

Output:

 Train data: [('Mr.', 'NNP'), ('Vinken', 'NNP'), ('is', 'VBZ'), ('chairman', 'NN'), ('of', 'IN'), ('Elsevier', 'NNP'), ('NV', 'NNP'), (',', ','), ('the', 'DT'), ('Dutch', 'NNP'), ('publishing', 'VBG'), ('group', 'NN'), ('.', '.')]
 Accuracy: 0.27558817181092166

Code # 4: AffixTagger, specifying 3-character prefixes.

# Specify 3-character prefixes
prefix_tag = AffixTagger(train_data, affix_length=3)

# Testing
accuracy = prefix_tag.evaluate(test_data)
print("Accuracy:", accuracy)

Output:

 Accuracy: 0.23587308439456076 

Code # 5: AffixTagger with 2-character suffixes

# Specify 2-character suffixes
suffix_tag = AffixTagger(train_data, affix_length=-2)

# Testing
accuracy = suffix_tag.evaluate(test_data)
print("Accuracy:", accuracy)

Output:

 Accuracy: 0.31940427368875457 
