NLP | Part of Speech Tagged Word Corpus

What is part-of-speech (POS) tagging?
It is the process of converting a sentence, that is, a list of words, into a list of tuples, where each tuple has the form (word, tag). The tag is a part-of-speech tag and indicates whether the word is a noun, adjective, verb, and so on.
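The following is a minimal plain-Python sketch of the (word, tag) representation. The tagged sentence is hard-coded from the sample corpus line in the next section rather than produced by a tagger. The variable names are illustrative only.

```python
# A tagged sentence is simply a list of (word, tag) tuples.
# Illustrative data, hard-coded from the sample corpus line (tags upper-cased).
tagged_sentence = [
    ("The", "AT-TL"), ("expense", "NN"), ("and", "CC"), ("time", "NN"),
    ("involved", "VBN"), ("are", "BER"), ("astronomical", "JJ"), (".", "."),
]

# Separate the words from their tags.
words = [word for word, _ in tagged_sentence]
tags = [tag for _, tag in tagged_sentence]

# Keep the words whose tag begins with "NN" (common nouns such as NN, NNS).
nouns = [word for word, tag in tagged_sentence if tag.startswith("NN")]

print(words)
print(tags)
print(nouns)  # ['expense', 'time']
```

Because each item is a plain tuple, ordinary list comprehensions are enough to filter or unzip a tagged sentence.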

Part of Speech (POS) Tagged Corpus

The/at-tl expense/nn and/cc time/nn involved/vbn are/ber astronomical/jj ./.

In a tagged corpus, each token has the form word/tag: the word, a slash, and a tag denoting its part of speech. For example, nn refers to a noun and vb to a verb.
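The following plain-Python sketch shows how one line in this format can be turned into (word, tag) tuples. The helper `str_to_tuple` is hypothetical. It imitates NLTK's `nltk.tag.str2tuple`, which splits a token at the last separator and upper-cases the tag. That behaviour is why the reader outputs below show `AT-TL` and `NN` rather than `at-tl` and `nn`.

```python
def str_to_tuple(token, sep="/"):
    """Split 'word/tag' at the LAST separator and upper-case the tag.

    Hypothetical helper imitating nltk.tag.str2tuple; a token with no
    separator gets the tag None.
    """
    loc = token.rfind(sep)
    if loc < 0:
        return (token, None)
    return (token[:loc], token[loc + len(sep):].upper())


line = "The/at-tl expense/nn and/cc time/nn involved/vbn are/ber astronomical/jj ./."

# Tokens are separated by whitespace; each token is word/tag.
tagged = [str_to_tuple(tok) for tok in line.split()]
print(tagged)
```

Splitting at the last slash matters for words that themselves contain a slash. For example, a token such as `1/2/cd` becomes `('1/2', 'CD')`.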

Code #1: Creating a TaggedCorpusReader for words

# Using TaggedCorpusReader
from nltk.corpus.reader import TaggedCorpusReader

# initialization: root directory '.' and a regular expression
# that selects every file ending in .pos
x = TaggedCorpusReader('.', r'.*\.pos')

# plain list of words
words = x.words()
print("Words:", words)

# list of (word, tag) tuples
tag_words = x.tagged_words()
print("tag_words:", tag_words)

Output:

Words: ['The', 'expense', 'and', 'time', 'involved', 'are', ...]
tag_words: [('The', 'AT-TL'), ('expense', 'NN'), ('and', 'CC'), ...]
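Code #1 assumes that files ending in .pos already exist in the current directory. The sketch below is self-contained and assumes NLTK is installed. It writes the sample line to a hypothetical `sample.pos` file in a temporary directory, then points a `TaggedCorpusReader` at that directory. The second constructor argument is a regular expression over file names, so the dot before `pos` is escaped.

```python
import os
import tempfile

from nltk.corpus.reader import TaggedCorpusReader

# Sample text in the word/tag format, taken from the example above.
SAMPLE = "The/at-tl expense/nn and/cc time/nn involved/vbn are/ber astronomical/jj ./.\n"

with tempfile.TemporaryDirectory() as root:
    # "sample.pos" is a hypothetical file name; any name matching the regex works.
    with open(os.path.join(root, "sample.pos"), "w", encoding="utf8") as f:
        f.write(SAMPLE)

    # root directory + regular expression selecting the file ids
    reader = TaggedCorpusReader(root, r".*\.pos")
    fileids = reader.fileids()

    # The reader returns lazy corpus views; list() reads them
    # while the temporary directory still exists.
    words = list(reader.words())
    tagged_words = list(reader.tagged_words())

print(fileids)
print(words)
print(tagged_words)
```

Calling `list()` inside the `with` block matters here. The views read the file on demand, and the directory is deleted when the block exits.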

Code #2: for sentences

tagged_sent = x.tagged_sents()
print("tagged_sent:", tagged_sent)

Output:

tagged_sent: [[('The', 'AT-TL'), ('expense', 'NN'), ('and', 'CC'), ('time', 'NN'), ('involved', 'VBN'), ('are', 'BER'), ('astronomical', 'JJ'), ('.', '.')]]

Code #3: for paragraphs

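# list of paragraphs, each a list of sentences, each a list of words
para = x.paras()
print("para:", para)

# same nesting, but with (word, tag) tuples
tagged_para = x.tagged_paras()
print("tagged_paras:", tagged_para)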

Output:  

para: [[['The', 'expense', 'and', 'time', 'involved', 'are', 'astronomical', '.']]]
tagged_paras: [[[('The', 'AT-TL'), ('expense', 'NN'), ('and', 'CC'), ('time', 'NN'), ('involved', 'VBN'), ('are', 'BER'), ('astronomical', 'JJ'), ('.', '.')]]]
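The nesting in these outputs follows from how TaggedCorpusReader divides a file by default:

- blank lines separate paragraphs;
- each line within a paragraph is a sentence;
- whitespace separates the word/tag tokens.

Below is a rough plain-Python sketch of that splitting. The helper names are hypothetical, and so is the second paragraph of the sample text, which was added only to show more than one paragraph. NLTK's real reader streams the file lazily instead of splitting one string in memory.

```python
import re


def to_tuple(token, sep="/"):
    # Split at the last separator and upper-case the tag.
    loc = token.rfind(sep)
    return (token, None) if loc < 0 else (token[:loc], token[loc + len(sep):].upper())


def tagged_paras(text):
    """Return paragraphs -> sentences -> (word, tag) tuples."""
    paras = []
    # Paragraphs are blocks separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text):
        # Each non-empty line inside a paragraph is one sentence.
        sents = [[to_tuple(tok) for tok in line.split()]
                 for line in block.split("\n") if line.strip()]
        if sents:
            paras.append(sents)
    return paras


def tagged_sents(text):
    # Flatten away the paragraph level.
    return [sent for para in tagged_paras(text) for sent in para]


def paras(text):
    # Same nesting as tagged_paras, but keep only the words.
    return [[[w for w, _ in sent] for sent in para] for para in tagged_paras(text)]


# Hypothetical text: the sample line, a blank line, then a two-sentence paragraph.
TEXT = (
    "The/at-tl expense/nn and/cc time/nn involved/vbn are/ber astronomical/jj ./.\n"
    "\n"
    "It/pps rained/vbd ./.\n"
    "We/ppss stayed/vbd ./.\n"
)

print(len(tagged_paras(TEXT)))  # 2
print(len(tagged_sents(TEXT)))  # 3
print(paras(TEXT)[1])           # [['It', 'rained', '.'], ['We', 'stayed', '.']]
```

The sample corpus has no blank lines, so its single sentence forms a single paragraph. That is why the outputs above have exactly one element at each level.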