NLP | Expanding and Removing Chunks with RegEx

This article focuses on three chunk rule classes from nltk.chunk.regexp:

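ExpandRightRule: adds chink (non-chunk) words to the right of a chunk.
ExpandLeftRule: adds chink (non-chunk) words to the left of a chunk.
Both ExpandLeftRule and ExpandRightRule take a left pattern and a right pattern, followed by a description. For ExpandLeftRule, the left pattern matches the chink words to pull in and the right pattern matches the beginning of the chunk. For ExpandRightRule, the left pattern matches the end of the chunk and the right pattern matches the chink words to pull in.

UnChunkRule: removes any chunk that its pattern matches in full, so the words of that chunk become chinks again.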

Code #1: How the code works

# Loading libraries
from nltk.chunk.regexp import ChunkRule, ExpandLeftRule
from nltk.chunk.regexp import ExpandRightRule, UnChunkRule
from nltk.chunk import RegexpChunkParser

# ChunkRule initialization
ur = ChunkRule('<NN>', 'single noun')

# Initializing ExpandLeftRule
el = ExpandLeftRule('<DT>', '<NN>', 'get left determiner')

# Initializing ExpandRightRule
er = ExpandRightRule('<NN>', '<NNS>', 'get right plural noun')

# Initializing UnChunkRule
un = UnChunkRule('<DT><NN.*>*', 'unchunk everything')

# The rules are applied in the order given
chunker = RegexpChunkParser([ur, el, er, un])

sent = [('the', 'DT'), ('sushi', 'NN'), ('rolls', 'NNS')]

chunker.parse(sent)

Output:

 Tree('S', [('the', 'DT'), ('sushi', 'NN'), ('rolls', 'NNS')])

Note: the output is a flat sentence, because the UnChunkRule undoes the chunk built by the preceding rules.

How does it work?

  • Chunk every single noun.
  • Expand each chunk that begins with a noun to the left, so that it takes in a preceding determiner.
  • Expand each chunk that ends with a noun to the right, so that it takes in a following plural noun.
  • Finally, unchunk every chunk that consists of determiner + noun + plural noun, which gives back the original sentence tree.
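
To see what the first three rules build before the last one removes it, the same parser can be run without the UnChunkRule. This is a minimal sketch that reuses the rules and the sentence from Code #1. It relies on the RegexpChunkParser defaults for the labels ('NP' for chunks and 'S' for the root).

```python
from nltk.chunk import RegexpChunkParser
from nltk.chunk.regexp import ChunkRule, ExpandLeftRule, ExpandRightRule
from nltk.tree import Tree

sent = [('the', 'DT'), ('sushi', 'NN'), ('rolls', 'NNS')]

# Same first three rules as in Code #1, with no UnChunkRule at the end
rules = [
    ChunkRule('<NN>', 'single noun'),
    ExpandLeftRule('<DT>', '<NN>', 'get left determiner'),
    ExpandRightRule('<NN>', '<NNS>', 'get right plural noun'),
]
chunker = RegexpChunkParser(rules)

# The result keeps the chunk as a subtree under the root
tree = chunker.parse(sent)
print(tree)
```

With the unchunk step left out, all three tokens end up inside a single NP subtree instead of sitting directly under the root.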

Code #2: Step-by-step explanation of the code

# Loading libraries
from nltk.chunk.regexp import ChunkRule, ExpandLeftRule
from nltk.chunk.regexp import ExpandRightRule, UnChunkRule
from nltk.chunk import RegexpChunkParser
from nltk.chunk.regexp import ChunkString
from nltk.tree import Tree

# Same tagged sentence as in Code #1
sent = [('the', 'DT'), ('sushi', 'NN'), ('rolls', 'NNS')]

# Each rule below modifies this ChunkString in place
chunk_string = ChunkString(Tree('S', sent))
print("Chunk String:", chunk_string)

# ChunkRule initialization
ur = ChunkRule('<NN>', 'single noun')
ur.apply(chunk_string)
print("step 1:", chunk_string)

# Initializing ExpandLeftRule
el = ExpandLeftRule('<DT>', '<NN>', 'get left determiner')
el.apply(chunk_string)
print("step 2:", chunk_string)

# Initializing ExpandRightRule
er = ExpandRightRule('<NN>', '<NNS>', 'get right plural noun')
er.apply(chunk_string)
print("step 3:", chunk_string)

# Initializing UnChunkRule
un = UnChunkRule('<DT><NN.*>*', 'unchunk everything')
un.apply(chunk_string)
print("step 4:", chunk_string)

Output:

 Chunk String:  <DT>  <NN>  <NNS>
 step 1:  <DT> {<NN>} <NNS>
 step 2: {<DT>  <NN>} <NNS>
 step 3: {<DT>  <NN>  <NNS>}
 step 4:  <DT>  <NN>  <NNS>
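
Step 4 removes the chunk because the UnChunkRule pattern covers the chunk from its first tag to its last. The sketch below contrasts that with a pattern that covers only part of the chunk, which leaves the chunk in place. The helper chunk_then_unchunk and the rule descriptions are made up for this illustration and are not part of NLTK.

```python
from nltk.chunk.regexp import ChunkRule, ChunkString, UnChunkRule
from nltk.tree import Tree

sent = [('the', 'DT'), ('sushi', 'NN'), ('rolls', 'NNS')]

def chunk_then_unchunk(unchunk_pattern):
    """Hypothetical helper: chunk the whole sentence, then try to unchunk it."""
    cs = ChunkString(Tree('S', sent))
    ChunkRule('<DT><NN><NNS>', 'chunk everything').apply(cs)
    UnChunkRule(unchunk_pattern, 'try to unchunk').apply(cs)
    # Convert the ChunkString back to a tree, labelling chunks as NP
    return cs.to_chunkstruct('NP')

# Pattern covers only the first two tags of the chunk: nothing is removed
partial = chunk_then_unchunk('<DT><NN>')

# Pattern covers the entire chunk: the chunk is removed
full = chunk_then_unchunk('<DT><NN.*>*')

print(partial)
print(full)
```

The first tree still contains an NP subtree, while the second is the flat sentence again.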