NLP | Filtering out irrelevant words



Code #1: filter_insignificant() function to filter irrelevant words

def filter_insignificant(chunk, tag_suffixes=['DT', 'CC']):
    # Tagged words that survive the filter
    good = []

    for word, tag in chunk:
        ok = True

        # Reject the word as soon as its tag ends with an unwanted suffix
        for suffix in tag_suffixes:
            if tag.endswith(suffix):
                ok = False
                break

        if ok:
            good.append((word, tag))

    return good

filter_insignificant() iterates over the tagged words in the chunk and checks whether each tag ends with any of the tag_suffixes. If it does, the tagged word is skipped. Otherwise, the tagged word is appended to a new chunk, which is returned once every word has been checked.
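
The loop described above can also be sketched more compactly, because str.endswith() accepts a tuple of suffixes. The name filter_insignificant_compact is hypothetical and used only for illustration; it is not part of the transforms module.

```python
def filter_insignificant_compact(chunk, tag_suffixes=("DT", "CC")):
    # Hypothetical compact variant, not part of the original module.
    # str.endswith() accepts a tuple of suffixes, but not a list.
    suffixes = tuple(tag_suffixes)
    return [(word, tag) for word, tag in chunk if not tag.endswith(suffixes)]


chunk = [("the", "DT"), ("terrible", "JJ"), ("movie", "NN")]
print(filter_insignificant_compact(chunk))
# prints [('terrible', 'JJ'), ('movie', 'NN')]
```

The behavior is intended to match the explicit loop: a tagged word is kept only if its tag ends with none of the given suffixes.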

Code #2: Using filter_insignificant() on a phrase

from transforms import filter_insignificant

print("Significant words:",
      filter_insignificant([('the', 'DT'),
                            ('terrible', 'JJ'), ('movie', 'NN')]))

Output:

 Significant words: [('terrible', 'JJ'), ('movie', 'NN')]

We can pass different tag suffixes to filter_insignificant(). In the code below, pronouns and possessive words such as “you”, “your”, “them” and “their” are treated as insignificant, while DT and CC words are kept. The tag suffixes are then PRP and PRP$:

Code #3: Passing custom tag suffixes to filter_insignificant()

from transforms import filter_insignificant

# pass custom tag_suffixes
print("Significant words:",
      filter_insignificant([('your', 'PRP$'),
                            ('book', 'NN'), ('is', 'VBZ'),
                            ('great', 'JJ')],
                           tag_suffixes=['PRP', 'PRP$']))

Output:

 Significant words: [('book', 'NN'), ('is', 'VBZ'), ('great', 'JJ')]
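
Both PRP and PRP$ are listed because the check is a suffix match: the tag PRP$ ends with "$", so the suffix PRP alone does not match it. The sketch below illustrates this. It redefines the function locally (an assumption made so the example runs without the transforms module), and the variable names only_prp and both are illustrative.

```python
def filter_insignificant(chunk, tag_suffixes=("DT", "CC")):
    # Local re-implementation with the same behavior as Code #1,
    # included only so this sketch is self-contained.
    good = []
    for word, tag in chunk:
        if not any(tag.endswith(suffix) for suffix in tag_suffixes):
            good.append((word, tag))
    return good


chunk = [("your", "PRP$"), ("book", "NN"), ("is", "VBZ"), ("great", "JJ")]

# 'PRP$' does not end with 'PRP', so the possessive pronoun is kept.
only_prp = filter_insignificant(chunk, tag_suffixes=["PRP"])
print(only_prp)
# prints [('your', 'PRP$'), ('book', 'NN'), ('is', 'VBZ'), ('great', 'JJ')]

# Listing both suffixes removes personal and possessive pronouns.
both = filter_insignificant(chunk, tag_suffixes=["PRP", "PRP$"])
print(both)
# prints [('book', 'NN'), ('is', 'VBZ'), ('great', 'JJ')]
```

Note that passing tag_suffixes replaces the defaults, so DT and CC words are no longer filtered unless those suffixes are included in the list as well.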