NLP | Training Tagger-Based Chunker | Set 1



In the code below, we use the treebank_chunk corpus to create chunked sentences in the form of trees.
-> To train a tagger-based chunker: the corpus's chunked_sents() method supplies the training trees that are passed to the TagChunker class.
-> To extract a list of (pos, iob) tuples from a list of trees: the TagChunker class uses the conll_tag_chunks() helper function.

These tuples are then used to train a tagger, which learns which IOB tag goes with each part-of-speech tag.
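To show what "learning IOB tags for part-of-speech tags" means, the plain-Python sketch below drops the words from some made-up (word, pos, iob) triples. These have the same shape that tree2conlltags() produces. It then counts which IOB tag each POS tag most often carries. This is a simplified stand-in for NLTK's UnigramTagger, not its implementation. The toy sentences and the fallback tag "O" for unseen POS tags are assumptions of this sketch.

```python
from collections import Counter, defaultdict

# Made-up training data: (word, pos, iob) triples, one list per sentence.
TRAIN = [
    [("The", "DT", "B-NP"), ("cat", "NN", "I-NP"), ("sat", "VBD", "O")],
    [("A", "DT", "B-NP"), ("big", "JJ", "I-NP"), ("dog", "NN", "I-NP"),
     ("barked", "VBD", "O")],
]


def to_pos_iob(sents):
    """Drop the words and keep (pos, iob) pairs, like conll_tag_chunks()."""
    return [[(t, c) for (w, t, c) in sent] for sent in sents]


def train_unigram(pos_iob_sents):
    """Map each POS tag to the IOB tag it most often carries in training."""
    counts = defaultdict(Counter)
    for sent in pos_iob_sents:
        for pos, iob in sent:
            counts[pos][iob] += 1
    return {pos: c.most_common(1)[0][0] for pos, c in counts.items()}


def tag(model, pos_tags, default="O"):
    """Give every POS tag an IOB tag; unseen POS tags get `default`."""
    return [(pos, model.get(pos, default)) for pos in pos_tags]


model = train_unigram(to_pos_iob(TRAIN))
print(tag(model, ["DT", "JJ", "NN", "VBD", "RB"]))
```

Because the chunker sees only POS tags, words that share a tag are always chunked the same way. In TagChunker, the BigramTagger also conditions on the IOB tag assigned to the previous position, and it falls back to the unigram model when that context was not seen in training.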

Code #1: Let's take a look at the TagChunker class used for training.

```python
# chunkers.py (Code #2 imports TagChunker from this module)
from nltk.chunk import ChunkParserI
from nltk.chunk.util import tree2conlltags, conlltags2tree
from nltk.tag import UnigramTagger, BigramTagger
from tag_util import backoff_tagger


def conll_tag_chunks(chunk_data):
    # Convert each chunk tree into a list of (word, pos, iob) triples
    tagged_data = [tree2conlltags(tree) for tree in chunk_data]
    # Drop the words and keep only the (pos, iob) pairs
    return [[(t, c) for (w, t, c) in sent] for sent in tagged_data]


class TagChunker(ChunkParserI):

    def __init__(self, train_chunks,
                 tagger_classes=[UnigramTagger, BigramTagger]):
        train_data = conll_tag_chunks(train_chunks)
        # Train a chain of taggers that map POS tags to IOB tags
        self.tagger = backoff_tagger(train_data, tagger_classes)

    def parse(self, tagged_sent):
        if not tagged_sent:
            return None

        # Split [(word, pos), ...] into a words tuple and a POS tags tuple
        (words, tags) = zip(*tagged_sent)
        # Tag the POS sequence with IOB chunk tags: [(pos, iob), ...]
        chunks = self.tagger.tag(tags)
        # Re-attach the words to their (pos, iob) pairs
        wtc = zip(words, chunks)

        # Build a chunk tree from the (word, pos, iob) triples
        return conlltags2tree([(w, t, c) for (w, (t, c)) in wtc])
```

Code #2: Using TagChunker.

```python
# loading libraries
from chunkers import TagChunker
from nltk.corpus import treebank_chunk

# data from treebank_chunk
train_data = treebank_chunk.chunked_sents()[:3000]
test_data = treebank_chunk.chunked_sents()[3000:]

# initializing (this trains the chunker)
chunker = TagChunker(train_data)
```
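TagChunker imports backoff_tagger from tag_util, a helper module that this article does not show. The sketch below is one plausible version of it and is an assumption, not the module's actual source. It builds each tagger class in turn, passes the previously built tagger as its backoff, and returns the last one. The `cls(train_sents, backoff=...)` call matches the constructors of NLTK's UnigramTagger and BigramTagger.

```python
def backoff_tagger(train_sents, tagger_classes, backoff=None):
    """Hypothetical stand-in for tag_util.backoff_tagger.

    Builds one tagger per class, in order, handing each the previously
    built tagger as its backoff. With [UnigramTagger, BigramTagger], the
    returned BigramTagger defers to the UnigramTagger whenever it has no
    statistics for a context.
    """
    for cls in tagger_classes:
        # Train on the same data; fall back on the tagger built in the
        # previous iteration (None for the first class in the list).
        backoff = cls(train_sents, backoff=backoff)
    return backoff


# Example use with NLTK installed (training pairs are made up):
#   from nltk.tag import UnigramTagger, BigramTagger
#   train = [[("DT", "B-NP"), ("NN", "I-NP"), ("VBD", "O")]]
#   tagger = backoff_tagger(train, [UnigramTagger, BigramTagger])
#   tagger.tag(["DT", "NN", "VBD"])
```

The list runs from the most general tagger to the most specific one. At tagging time, the most specific tagger is therefore consulted first, and the general ones handle the contexts it has not seen.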

Code #3: Evaluating TagChunker.

```python
# testing
score = chunker.evaluate(test_data)

a = score.accuracy()
p = score.precision()
r = score.recall()

print("Accuracy of TagChunker:", a)
print("Precision of TagChunker:", p)
print("Recall of TagChunker:", r)
```

Output:

```
Accuracy of TagChunker: 0.9732039335251428
Precision of TagChunker: 0.9166534370535006
Recall of TagChunker: 0.9465573770491803
```
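The three numbers measure different things. In NLTK's ChunkScore, accuracy is roughly the share of individual IOB tags that are correct. Precision and recall are computed over whole chunks: precision is the share of guessed chunks that match a gold chunk, and recall is the share of gold chunks that were found. The plain-Python sketch below illustrates this idea on one made-up sentence. It is not NLTK's code, and NLTK accumulates these counts across the whole test set rather than per sentence.

```python
def iob_to_chunks(iob_tags):
    """Return the set of (label, start, end) chunk spans in an IOB sequence."""
    chunks, start, label = set(), None, None
    for i, tag in enumerate(list(iob_tags) + ["O"]):  # "O" sentinel closes a final chunk
        # An open chunk ends at "O", at any "B-", or at an "I-" with another label
        if start is not None and (tag == "O" or tag.startswith("B-")
                                  or tag[2:] != label):
            chunks.add((label, start, i))
            start, label = None, None
        # A chunk starts at "B-", or at a stray "I-" outside any chunk
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return chunks


def chunk_scores(gold_tags, guess_tags):
    """Tag accuracy plus chunk precision and recall for one sentence."""
    correct_tags = sum(g == p for g, p in zip(gold_tags, guess_tags))
    accuracy = correct_tags / len(gold_tags)
    gold, guess = iob_to_chunks(gold_tags), iob_to_chunks(guess_tags)
    found = len(gold & guess)  # chunks with the same label and span
    precision = found / len(guess) if guess else 0.0
    recall = found / len(gold) if gold else 0.0
    return accuracy, precision, recall


gold = ["B-NP", "I-NP", "O", "B-NP", "I-NP"]
guess = ["B-NP", "I-NP", "O", "B-NP", "B-NP"]
print(chunk_scores(gold, guess))
```

Only one of the five tags is wrong, so tag accuracy is 0.8. However, the stray B-NP splits one gold chunk into two incorrect ones, so precision falls to 1/3 and recall to 1/2. This helps explain why chunk precision and recall can sit below tag accuracy, as in the output above.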