NLP | Classifier-based tagging


ClassifierBasedPOSTagger class:

  • It is a subclass of ClassifierBasedTagger that uses classification to perform part-of-speech tagging.
  • Features are extracted from each word and passed to an internal classifier.
  • The classifier classifies the features and returns a label, that is, a part-of-speech tag.
  • The feature detector finds suffixes of several lengths, matches regular expressions, and looks at the unigram, bigram, and trigram history to produce a fairly complete set of features for each word.
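The kinds of features listed above can be sketched in plain Python. This is only an illustration, not NLTK's actual implementation: the function name sketch_features, the feature keys, and the two regular expressions are assumptions chosen for the example.

```python
import re

# Hypothetical patterns for this sketch; the real detector's patterns may differ.
NUMBER = re.compile(r"^[0-9]+(\.[0-9]*)?$")
PUNCT = re.compile(r"^[^\w\s]+$")


def sketch_features(tokens, index, history):
    """Build a feature dict for tokens[index].

    history holds the tags already assigned to tokens[:index].
    """
    word = tokens[index]
    prev_word = tokens[index - 1] if index > 0 else None
    prev_tag = history[index - 1] if index > 0 else None
    prev_prev_tag = history[index - 2] if index > 1 else None

    # Coarse word shape derived from regular expressions and capitalization.
    if NUMBER.match(word):
        shape = "number"
    elif PUNCT.match(word):
        shape = "punct"
    elif word[:1].isupper():
        shape = "upcase"
    else:
        shape = "other"

    return {
        "word": word.lower(),
        "suffix1": word[-1:].lower(),   # suffixes of several lengths
        "suffix2": word[-2:].lower(),
        "suffix3": word[-3:].lower(),
        "shape": shape,
        "prev_word": prev_word,         # word context
        "prev_tag": prev_tag,           # tag history
        "prev_prev_tag": prev_prev_tag,
    }


features = sketch_features(["The", "dogs", "barked"], 2, ["DT", "NNS"])
print(features["suffix2"], features["prev_tag"])  # ed NNS
```

Each word becomes a dictionary of named features, and the classifier learns which combinations of suffix, shape, and preceding tags predict each part-of-speech tag.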

Code #1: Using ClassifierBasedPOSTagger

from nltk.tag.sequential import ClassifierBasedPOSTagger
from nltk.corpus import treebank

# split the treebank corpus into training and test sets
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# train the tagger with its default classifier
tagger = ClassifierBasedPOSTagger(train=train_data)

# evaluate() returns the fraction of test tokens tagged correctly
a = tagger.evaluate(test_data)

print("Accuracy:", a)

Output:

Accuracy: 0.9309734513274336

The ClassifierBasedPOSTagger class inherits from ClassifierBasedTagger and implements only the feature_detector() method. All training and tagging is done in ClassifierBasedTagger.
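This division of labor can be pictured with a simplified sketch: a generic tagging loop that calls a feature detector for each token and hands the result to a classifier. The names tag_sentence, toy_detector, and toy_classify are hypothetical stand-ins, and the hand-written rules replace a trained classifier; the real class also handles training and backoff.

```python
def tag_sentence(tokens, feature_detector, classify):
    """Tag tokens left to right, feeding earlier tags back in as history."""
    history = []
    for index in range(len(tokens)):
        featureset = feature_detector(tokens, index, history)
        history.append(classify(featureset))
    return list(zip(tokens, history))


def toy_detector(tokens, index, history):
    # Stand-in for feature_detector(): current word plus the previous tag.
    return {
        "word": tokens[index].lower(),
        "prev_tag": history[index - 1] if index > 0 else None,
    }


def toy_classify(featureset):
    # Stand-in for a trained classifier: three hard-coded rules.
    if featureset["word"] == "the":
        return "DT"
    if featureset["prev_tag"] == "DT":
        return "NN"
    return "VB"


tagged = tag_sentence(["The", "dog", "barks"], toy_detector, toy_classify)
print(tagged)  # [('The', 'DT'), ('dog', 'NN'), ('barks', 'VB')]
```

Only the detector decides what the classifier sees, which is why swapping in a different feature_detector changes the tagger's behavior without touching the training or tagging logic.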

Code #2: Using the MaxentClassifier

from nltk.tag.sequential import ClassifierBasedPOSTagger
from nltk.classify import MaxentClassifier
from nltk.corpus import treebank

# split the treebank corpus into training and test sets
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# train the tagger with a maximum entropy classifier instead of the default
tagger = ClassifierBasedPOSTagger(
    train=train_data, classifier_builder=MaxentClassifier.train)

a = tagger.evaluate(test_data)

print("Accuracy:", a)

Output:

Accuracy: 0.9258363911072739

Custom feature detector

There are two ways to use a custom feature detector:

  1. Subclass ClassifierBasedTagger and implement the feature_detector() method.
  2. Pass a function as the feature_detector keyword argument when initializing ClassifierBasedTagger.
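For the second option, the function receives the sentence tokens, the index of the current token, and the tags assigned so far, and returns a dictionary of features. The sketch below shows what a unigram detector might look like; the article does not show the contents of tag_util, so this body is an assumption rather than that module's actual code.

```python
def unigram_feature_detector(tokens, index, history):
    """Use only the current word as a feature.

    Assumed implementation: the surrounding words and the tag history
    are deliberately ignored.
    """
    return {"word": tokens[index]}


example = unigram_feature_detector(["The", "dog", "barks"], 1, ["DT"])
print(example)  # {'word': 'dog'}
```

For the first option, the same body would go into a feature_detector(self, tokens, index, history) method on a subclass of ClassifierBasedTagger.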

Code #3: Custom Feature Detector

from nltk.tag.sequential import ClassifierBasedTagger
# tag_util is a local helper module (not part of NLTK) that must be on the import path
from tag_util import unigram_feature_detector
from nltk.corpus import treebank

# split the treebank corpus into training and test sets
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# train the tagger with the custom feature detector
tagger = ClassifierBasedTagger(
    train=train_data,
    feature_detector=unigram_feature_detector)

a = tagger.evaluate(test_data)

print("Accuracy:", a)

Output:

Accuracy: 0.8733865745737104



