
NLP | Chunk to text conversion and chunk chaining


Code # 1: Joining the words in a tree with spaces.

# Loading library
from nltk.corpus import treebank_chunk

# First chunked sentence from the treebank sample
tree = treebank_chunk.chunked_sents()[0]

print("Tree:", tree)
print("Tree leaves:", tree.leaves())

# Join the words (ignoring the tags) with a single space
print("Sentence from tree:", ' '.join([w for w, t in tree.leaves()]))

Output:

Tree: (S
  (NP Pierre/NNP Vinken/NNP)
  ,/,
  (NP 61/CD years/NNS)
  old/JJ
  ,/,
  will/MD
  join/VB
  (NP the/DT board/NN)
  as/IN
  (NP a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD)
  ./.)
Tree leaves: [('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','), ('61', 'CD'), ('years', 'NNS'), ('old', 'JJ'), (',', ','), ('will', 'MD'), ('join', 'VB'), ('the', 'DT'), ('board', 'NN'), ('as', 'IN'), ('a', 'DT'), ('nonexecutive', 'JJ'), ('director', 'NN'), ('Nov.', 'NNP'), ('29', 'CD'), ('.', '.')]
Sentence from tree: Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .

In the output above, the punctuation is spaced incorrectly: the period and the commas are treated as separate words, so they are surrounded by spaces like any other word. The code below fixes this with a regex substitution that removes the space before each punctuation mark.

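Code # 2: chunk_tree_to_sent() to improve Code 1

import re

# Matches a whitespace character followed by a punctuation mark;
# the punctuation mark is captured as group 1
punct_re = re.compile(r'\s([,\.;\?])')

def chunk_tree_to_sent(tree, concat=' '):
    # Join the words of the tree with the separator
    s = concat.join([w for w, t in tree.leaves()])
    # Replace "space + punctuation" with the punctuation alone
    return re.sub(punct_re, r'\g<1>', s)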
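The substitution can also be tried on its own, without loading the corpus. The following is a minimal sketch that applies the same pattern to a hand-written list of (word, tag) pairs. The leaves list and the join_leaves helper are illustrative assumptions that stand in for tree.leaves() and chunk_tree_to_sent(); they are not part of NLTK.

```python
import re

# Same pattern as in Code # 2: whitespace followed by one punctuation mark.
punct_re = re.compile(r'\s([,\.;\?])')

# Hypothetical stand-in for tree.leaves(): a flat list of (word, tag) pairs.
leaves = [('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','), ('61', 'CD'),
          ('years', 'NNS'), ('old', 'JJ'), ('.', '.')]


def join_leaves(leaves, concat=' '):
    """Join the words, then drop the space before each punctuation mark."""
    s = concat.join(w for w, t in leaves)
    return punct_re.sub(r'\g<1>', s)


print(' '.join(w for w, t in leaves))  # Pierre Vinken , 61 years old .
print(join_leaves(leaves))             # Pierre Vinken, 61 years old.
```

Note that the character class only covers the comma, period, semicolon and question mark. Other marks, such as "!" or ":", keep their leading space unless they are added to the pattern.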

Code # 3: Evaluating chunk_tree_to_sent()

# Loading library
from nltk.corpus import treebank_chunk

# Assumes chunk_tree_to_sent() from Code # 2 is saved in transforms.py
from transforms import chunk_tree_to_sent

# First chunked sentence from the treebank sample
tree = treebank_chunk.chunked_sents()[0]

print("Tree:", tree)
print("Tree leaves:", tree.leaves())
print("Tree to sentence:", chunk_tree_to_sent(tree))

Output:

Tree: (S
  (NP Pierre/NNP Vinken/NNP)
  ,/,
  (NP 61/CD years/NNS)
  old/JJ
  ,/,
  will/MD
  join/VB
  (NP the/DT board/NN)
  as/IN
  (NP a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD)
  ./.)
Tree leaves: [('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','), ('61', 'CD'), ('years', 'NNS'), ('old', 'JJ'), (',', ','), ('will', 'MD'), ('join', 'VB'), ('the', 'DT'), ('board', 'NN'), ('as', 'IN'), ('a', 'DT'), ('nonexecutive', 'JJ'), ('director', 'NN'), ('Nov.', 'NNP'), ('29', 'CD'), ('.', '.')]
Tree to sentence: Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.

Chaining chunk transformations
Transform functions can be chained to normalize chunks. The resulting chunks are often shorter but keep the same meaning.

The function below takes a single chunk and an optional list of transform functions. It calls each transform function on the chunk in turn and returns the final chunk.

Code # 4: transform_chunk()

# filter_insignificant, swap_verb_phrase, swap_infinitive_phrase and
# singularize_plural_noun are assumed to be defined earlier in the
# same transforms module.
def transform_chunk(chunk, chain=[filter_insignificant, swap_verb_phrase,
                                  swap_infinitive_phrase,
                                  singularize_plural_noun], trace=0):
    for f in chain:
        # Feed the result of each transform into the next one
        chunk = f(chunk)

        # With trace enabled, show the chunk after each transform
        if trace:
            print(f.__name__, ':', chunk)

    return chunk

Code # 5: Evaluating transform_chunk()

from transforms import transform_chunk

chunk = [('the', 'DT'), ('book', 'NN'), ('of', 'IN'),
         ('recipes', 'NNS'), ('is', 'VBZ'), ('delicious', 'JJ')]

print("Chunk:", chunk)
print("Transformed Chunk:", transform_chunk(chunk))

Output:

Chunk: [('the', 'DT'), ('book', 'NN'), ('of', 'IN'), ('recipes', 'NNS'), ('is', 'VBZ'), ('delicious', 'JJ')]
Transformed Chunk: [('delicious', 'JJ'), ('recipe', 'NN'), ('book', 'NN')]
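Since transform_chunk() relies on functions from the transforms module that are not shown here, the chaining pattern itself can be tried with simpler stand-ins. In the sketch below, drop_determiners, lowercase_words and apply_chain are hypothetical names invented for illustration. They are not part of NLTK or of the transforms module, and they are far cruder than the real transforms.

```python
# Hypothetical stand-in transforms: each takes a chunk (a list of
# (word, tag) pairs) and returns a new chunk.
def drop_determiners(chunk):
    """Remove words tagged as determiners (DT)."""
    return [(w, t) for w, t in chunk if t != 'DT']


def lowercase_words(chunk):
    """Lowercase every word, leaving the tags unchanged."""
    return [(w.lower(), t) for w, t in chunk]


def apply_chain(chunk, chain=(drop_determiners, lowercase_words), trace=False):
    """Apply each transform in order, feeding each result into the next."""
    for f in chain:
        chunk = f(chunk)
        if trace:
            print(f.__name__, ':', chunk)
    return chunk


chunk = [('The', 'DT'), ('Book', 'NN'), ('of', 'IN'), ('Recipes', 'NNS')]
result = apply_chain(chunk, trace=True)
print(result)  # [('book', 'NN'), ('of', 'IN'), ('recipes', 'NNS')]
```

Because every transform has the same shape (chunk in, chunk out), the order of the chain can be changed or a transform left out just by passing a different chain argument. The default here is a tuple rather than a list, which avoids sharing a mutable default between calls.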
