NLP | Chunking with Corpus Reader

How it works:

  • The ChunkedCorpusReader class works like TaggedCorpusReader for retrieving tagged tokens, and also provides three additional methods for retrieving chunks: chunked_words(), chunked_sents() and chunked_paras().
  • Each chunk is represented by an instance of nltk.tree.Tree.
  • Noun phrase trees look like Tree('NP', [...]), and sentence-level trees look like Tree('S', [...]).
  • chunked_sents() returns a list of sentence trees, with each noun phrase as a subtree of its sentence.
  • chunked_words() returns a list of noun phrase trees, along with the tagged tokens of the words that were not in a chunk.
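The tree shapes described above can be built by hand to see how they nest. This is a minimal sketch using nltk.tree.Tree directly, with a made-up three-token sentence; the variable names (np_chunk, sentence, noun_phrases) are illustrative and not part of the original article.

```python
from nltk.tree import Tree

# A noun phrase chunk: the label is 'NP', the children are (word, tag) tuples.
np_chunk = Tree('NP', [('the', 'DT'), ('spokesman', 'NN')])

# A sentence tree: the label is 'S', the children are NP subtrees
# or (word, tag) tuples for tokens outside any chunk.
sentence = Tree('S', [np_chunk, ('said', 'VBD'), ('.', '.')])

print(np_chunk.label())    # label of the chunk
print(sentence.leaves())   # every (word, tag) tuple, in order

# Collect the words of each NP subtree found in the sentence.
noun_phrases = [
    [word for word, tag in subtree.leaves()]
    for subtree in sentence.subtrees(filter=lambda t: t.label() == 'NP')
]
print(noun_phrases)
```

The same label(), leaves() and subtrees() calls apply to the trees returned by the reader's chunk methods, since those are Tree instances as well.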

A diagram listing the main methods:

Code #1: Creating a ChunkedCorpusReader for words


# Using ChunkedCorpusReader
from nltk.corpus.reader import ChunkedCorpusReader

# Initialization: read every file ending in .chunk in the current directory
x = ChunkedCorpusReader('.', r'.*\.chunk')

words = x.chunked_words()
print("Words:", words)

Output:

 Words: [Tree('NP', [('Earlier', 'JJR'), ('staff-reduction', 'NN'), ('moves', 'NNS')]), ('have', 'VBP'), ...]
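The reader needs a .chunk file to exist, and the article does not show one. The sketch below assumes the default bracketed format, where [ ... ] marks a chunk and word/TAG marks a tagged token; the file name sample.chunk, the temporary directory and the helper read_sample_words are hypothetical choices made for illustration, with sample text modelled on the output above.

```python
import os
import tempfile

from nltk.corpus.reader import ChunkedCorpusReader
from nltk.tree import Tree

# Hypothetical sample data: brackets mark NP chunks, word/TAG marks tokens.
SAMPLE = (
    "[Earlier/JJR staff-reduction/NN moves/NNS] have/VBP trimmed/VBN about/IN "
    "[300/CD jobs/NNS] ,/, [the/DT spokesman/NN] said/VBD ./.\n"
)

def read_sample_words():
    """Write SAMPLE to a temporary .chunk file and return its chunked words."""
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "sample.chunk")
        with open(path, "w", encoding="utf8") as f:
            f.write(SAMPLE)
        reader = ChunkedCorpusReader(root, r".*\.chunk")
        # The reader returns a lazy view, so copy it into a list
        # before the temporary directory is removed.
        return list(reader.chunked_words())

words = read_sample_words()
print(words[0])   # first item: an NP chunk (a Tree)
print(words[1])   # second item: a (word, tag) tuple outside any chunk
```

Chunks come back as Tree objects and unchunked tokens as plain tuples, so isinstance(item, Tree) is one way to tell them apart.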

Code #2: For sentences

chunked_sents = x.chunked_sents()

print("Chunked Sentence:", chunked_sents)

Output:

 Chunked Sentence: [Tree('S', [Tree('NP', [('Earlier', 'JJR'), ('staff-reduction', 'NN'), ('moves', 'NNS')]), ('have', 'VBP'), ('trimmed', 'VBN'), ('about', 'IN'), Tree('NP', [('300', 'CD'), ('jobs', 'NNS')]), (',', ','), Tree('NP', [('the', 'DT'), ('spokesman', 'NN')]), ('said', 'VBD'), ('.', '.')])]

Code #3: For paragraphs

para = x.chunked_paras()

print("para:", para)

Output:

 para: [[Tree('S', [Tree('NP', [('Earlier', 'JJR'), ('staff-reduction', 'NN'), ('moves', 'NNS')]), ('have', 'VBP'), ('trimmed', 'VBN'), ('about', 'IN'), Tree('NP', [('300', 'CD'), ('jobs', 'NNS')]), (',', ','), Tree('NP', [('the', 'DT'), ('spokesman', 'NN')]), ('said', 'VBD'), ('.', '.')])]]
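With a single sentence in the file, the paragraph output differs from the sentence output only by one extra level of list nesting. The sketch below uses a made-up two-paragraph file to show the grouping more clearly, assuming the reader's defaults of one sentence per line and a blank line between paragraphs; the sample text, file name and helper read_sample_paras are hypothetical.

```python
import os
import tempfile

from nltk.corpus.reader import ChunkedCorpusReader

# Hypothetical sample: one sentence per line, a blank line between paragraphs.
SAMPLE = (
    "[The/DT spokesman/NN] spoke/VBD ./.\n"
    "[He/PRP] left/VBD ./.\n"
    "\n"
    "[300/CD jobs/NNS] remain/VBP ./.\n"
)

def read_sample_paras():
    """Write SAMPLE to a temporary .chunk file and return its chunked paragraphs."""
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "sample.chunk")
        with open(path, "w", encoding="utf8") as f:
            f.write(SAMPLE)
        reader = ChunkedCorpusReader(root, r".*\.chunk")
        # Each paragraph is a list of sentence trees; copy everything
        # into plain lists before the temporary directory is removed.
        return [list(paragraph) for paragraph in reader.chunked_paras()]

paras = read_sample_paras()
print(len(paras))                   # number of paragraphs
print([len(p) for p in paras])      # sentences per paragraph
print(paras[0][0].label())          # each sentence is a Tree labelled 'S'
```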