NLP | Chunk to text conversion and chunk chaining

Code #1: Concatenate the words in a tree with a space.

# Loading library
from nltk.corpus import treebank_chunk

# tree: the first chunked sentence of the Treebank corpus
tree = treebank_chunk.chunked_sents()[0]

print("Tree:", tree)

# leaves() returns the (word, tag) pairs of the tree
print("Tree leaves:", tree.leaves())

# keep only the words and join them with a single space
print("Sentence from tree:", ' '.join(
    [w for w, t in tree.leaves()]))

Output:

Tree: (S (NP Pierre/NNP Vinken/NNP) ,/, (NP 61/CD years/NNS) old/JJ ,/, will/MD join/VB (NP the/DT board/NN) as/IN (NP a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD) ./.)
Tree leaves: [('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','), ('61', 'CD'), ('years', 'NNS'), ('old', 'JJ'), (',', ','), ('will', 'MD'), ('join', 'VB'), ('the', 'DT'), ('board', 'NN'), ('as', 'IN'), ('a', 'DT'), ('nonexecutive', 'JJ'), ('director', 'NN'), ('Nov.', 'NNP'), ('29', 'CD'), ('.', '.')]
Sentence from tree: Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .

As the output above shows, the punctuation is spaced incorrectly: the commas and the final period are separate tokens in the tree, so joining on a space puts a space in front of each of them as well. The code below fixes this with a regular-expression substitution that removes the whitespace before each punctuation mark.

Code #2: chunk_tree_to_sent() to improve Code #1

import re

# regex: a whitespace character followed by , . ; or ?
punct_re = re.compile(r'\s([,\.;\?])')

def chunk_tree_to_sent(tree, concat=' '):
    # join the words of the tree, dropping the POS tags
    s = concat.join([w for w, t in tree.leaves()])
    # replace "space + punctuation" with just the punctuation (group 1)
    return re.sub(punct_re, r'\g<1>', s)
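To see what the substitution does without loading the Treebank corpus, the sketch below runs the same join-and-substitute logic on a hand-written list of (word, tag) pairs taken from the tree leaves above. The helper name leaves_to_sent is hypothetical; unlike chunk_tree_to_sent(), it takes the leaves list directly instead of a Tree object.

```python
import re

# Same pattern as Code #2: whitespace followed by , . ; or ?
punct_re = re.compile(r'\s([,\.;\?])')


def leaves_to_sent(leaves, concat=' '):
    """Hypothetical helper: join (word, tag) pairs and fix punctuation spacing."""
    s = concat.join(w for w, t in leaves)
    # r'\g<1>' keeps only the captured punctuation mark, dropping the space
    return punct_re.sub(r'\g<1>', s)


leaves = [('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','), ('61', 'CD'),
          ('years', 'NNS'), ('old', 'JJ'), (',', ','), ('will', 'MD'),
          ('join', 'VB'), ('the', 'DT'), ('board', 'NN'), ('.', '.')]

print(' '.join(w for w, t in leaves))  # naive join, as in Code #1
print(leaves_to_sent(leaves))          # punctuation attached to the previous word
```

The first line prints "Pierre Vinken , 61 years old , will join the board ." and the second "Pierre Vinken, 61 years old, will join the board." The pattern only covers the four characters in its class, so other split-off tokens, such as the possessive 's, keep their leading space.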

Code #3: Evaluating chunk_tree_to_sent()

# Loading library
from nltk.corpus import treebank_chunk
# chunk_tree_to_sent() from Code #2, saved in a local transforms.py module
from transforms import chunk_tree_to_sent

# tree
tree = treebank_chunk.chunked_sents()[0]

print("Tree:", tree)

print("Tree leaves:", tree.leaves())

print("Tree to sentence:", chunk_tree_to_sent(tree))

Output:

Tree: (S (NP Pierre/NNP Vinken/NNP) ,/, (NP 61/CD years/NNS) old/JJ ,/, will/MD join/VB (NP the/DT board/NN) as/IN (NP a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD) ./.)
Tree leaves: [('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','), ('61', 'CD'), ('years', 'NNS'), ('old', 'JJ'), (',', ','), ('will', 'MD'), ('join', 'VB'), ('the', 'DT'), ('board', 'NN'), ('as', 'IN'), ('a', 'DT'), ('nonexecutive', 'JJ'), ('director', 'NN'), ('Nov.', 'NNP'), ('29', 'CD'), ('.', '.')]
Tree to sentence: Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.

Chaining chunk transformations
Transform functions can be chained together to normalize chunks. The resulting chunks are often shorter but keep the same meaning.

The transform_chunk() function below takes a chunk and an optional list of transform functions. It calls each transform function on the chunk in turn, passing each function's output to the next one, and returns the final chunk. When trace is enabled, it also prints the name of each function and the chunk it produced.

Code #4: transform_chunk()

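# filter_insignificant, swap_verb_phrase, swap_infinitive_phrase and
# singularize_plural_noun must already be defined in this module (transforms.py)
def transform_chunk(chunk, chain=[filter_insignificant,
                                  swap_verb_phrase, swap_infinitive_phrase,
                                  singularize_plural_noun], trace=0):
    # apply each transform in order, feeding its output to the next one
    for f in chain:
        chunk = f(chunk)
        if trace:
            # show the intermediate result after each step
            print(f.__name__, ':', chunk)
    return chunk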
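Code #4 relies on filter_insignificant(), swap_verb_phrase(), swap_infinitive_phrase() and singularize_plural_noun(), which this article does not define. The sketch below is one possible set of implementations, an assumption for illustration rather than the original transforms.py, written so that the default chain reproduces the transformed chunk shown in the output of Code #5. Each function takes and returns a list of (word, tag) tuples using Penn Treebank tags; first_chunk_index() is a hypothetical helper.

```python
def first_chunk_index(chunk, pred, start=0, step=1):
    """Hypothetical helper: index of the first (word, tag) pair matching pred, or None."""
    end = len(chunk) if step > 0 else -1
    for i in range(start, end, step):
        if pred(chunk[i]):
            return i
    return None


def filter_insignificant(chunk, tag_suffixes=('DT', 'CC')):
    # Drop determiners and coordinating conjunctions
    return [(w, t) for w, t in chunk if not t.endswith(tag_suffixes)]


def swap_verb_phrase(chunk):
    # A conjugated verb such as VBZ or VBD (gerunds, VBG, are skipped)
    def vbpred(wt):
        tag = wt[1]
        return tag != 'VBG' and tag.startswith('VB') and len(tag) > 2
    vbidx = first_chunk_index(chunk, vbpred)
    if vbidx is None:
        return chunk
    # "X is Y" becomes "Y X"; the verb itself is dropped
    return chunk[vbidx + 1:] + chunk[:vbidx]


def swap_infinitive_phrase(chunk):
    # A preposition such as "of", ignoring "for" and "during"
    def inpred(wt):
        word, tag = wt
        return tag == 'IN' and word not in ('for', 'during')
    inidx = first_chunk_index(chunk, inpred)
    if inidx is None:
        return chunk
    # Walk backwards from the preposition to the nearest noun
    nnidx = first_chunk_index(chunk, lambda wt: wt[1].startswith('NN'),
                              start=inidx, step=-1) or 0
    # "book of recipes" becomes "recipes book"; the preposition is dropped
    return chunk[:nnidx] + chunk[inidx + 1:] + chunk[nnidx:inidx]


def singularize_plural_noun(chunk):
    # A plural noun directly followed by another noun acts as a modifier
    nnsidx = first_chunk_index(chunk, lambda wt: wt[1] == 'NNS')
    if (nnsidx is not None and nnsidx + 1 < len(chunk)
            and chunk[nnsidx + 1][1].startswith('NN')):
        noun, tag = chunk[nnsidx]
        chunk = list(chunk)  # copy so the caller's list is not modified
        chunk[nnsidx] = (noun.rstrip('s'), tag.rstrip('S'))
    return chunk


def transform_chunk(chunk, chain=(filter_insignificant, swap_verb_phrase,
                                  swap_infinitive_phrase, singularize_plural_noun),
                    trace=0):
    # Same loop as Code #4; a tuple default avoids a shared mutable list
    for f in chain:
        chunk = f(chunk)
        if trace:
            print(f.__name__, ':', chunk)
    return chunk


chunk = [('the', 'DT'), ('book', 'NN'), ('of', 'IN'),
         ('recipes', 'NNS'), ('is', 'VBZ'), ('delicious', 'JJ')]
print(transform_chunk(chunk, trace=1))
```

With trace=1 the steps go from dropping "the", to moving "delicious" in front of "book of recipes", to "delicious recipes book", and finally to "delicious recipe book". The singularization here just strips trailing "s" characters, so a word like "glasses" would come out wrong; a real system would use a proper lemmatizer.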

Code #5: Evaluating transform_chunk()

from transforms import transform_chunk

chunk = [('the', 'DT'), ('book', 'NN'), ('of', 'IN'),
         ('recipes', 'NNS'), ('is', 'VBZ'), ('delicious', 'JJ')]

print("Chunk:", chunk)

print("Transformed Chunk:", transform_chunk(chunk))

Output:

Chunk: [('the', 'DT'), ('book', 'NN'), ('of', 'IN'), ('recipes', 'NNS'), ('is', 'VBZ'), ('delicious', 'JJ')]
Transformed Chunk: [('delicious', 'JJ'), ('recipe', 'NN'), ('book', 'NN')]
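Because chain is an ordinary parameter, the same loop works with any callables that take a chunk and return a chunk. The sketch below passes a shorter chain made of two hypothetical transforms, lowercase_words() and drop_determiners() (both invented for illustration), and sets trace=1 so each intermediate chunk is printed. The empty default chain is also an assumption of this sketch; Code #4 uses the four transforms as its default.

```python
def transform_chunk(chunk, chain=(), trace=0):
    # Same loop as Code #4: feed each function's output into the next
    for f in chain:
        chunk = f(chunk)
        if trace:
            print(f.__name__, ':', chunk)
    return chunk


def lowercase_words(chunk):
    # Hypothetical transform: lowercase every word, keep the tags
    return [(w.lower(), t) for w, t in chunk]


def drop_determiners(chunk):
    # Hypothetical transform: remove DT-tagged tokens
    return [(w, t) for w, t in chunk if t != 'DT']


chunk = [('The', 'DT'), ('Book', 'NN'), ('is', 'VBZ'), ('delicious', 'JJ')]
result = transform_chunk(chunk, chain=[lowercase_words, drop_determiners], trace=1)
print(result)
```

Since each function receives the previous one's output, the order of the list matters: a transform that looks for a particular word or tag only sees the tokens that earlier steps left in place.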