Paragraph splitting is done by the para_block_reader function, which defaults to nltk.corpus.reader.util.read_blankline_block.
A number of other block readers in nltk.corpus.reader.util, such as read_line_block and read_regexp_block, also read blocks of text from a stream.
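The sketch below shows the blank-line convention in plain Python. It groups consecutive non-blank lines into one block and treats one or more blank lines as the boundary. The function read_blankline_blocks is hypothetical and only mimics the idea. NLTK's own read_blankline_block reads one block per call from a stream and may differ in details.

```python
import io
from typing import Iterable, Iterator


def read_blankline_blocks(stream: Iterable[str]) -> Iterator[str]:
    """Hypothetical sketch: yield runs of non-blank lines, using one or
    more blank lines as the block (paragraph) boundary."""
    block = []
    for line in stream:
        if line.strip():
            block.append(line)      # still inside the current paragraph
        elif block:
            yield ''.join(block)    # a blank line closes the paragraph
            block = []
    if block:                       # the last paragraph may lack a trailing blank line
        yield ''.join(block)


text = "The/AT dog/NN barked/VBD ./.\n\n\nIt/PPS ran/VBD ./.\nHome/NR ./.\n"
paragraphs = list(read_blankline_blocks(io.StringIO(text)))
print(len(paragraphs))  # 2
```

Several blank lines in a row still produce only one boundary, because an empty block is never emitted.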
Setting the word/tag separator
If '/' is not used as the word/tag separator, you can pass an alternative string to TaggedCorpusReader through the sep argument.
By default sep='/', but if you want to separate words and tags with '|', as in 'word|tag', pass sep='|'.
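The effect of sep can be illustrated without NLTK. The sketch below uses a hypothetical parse_tagged helper, which is not part of NLTK. It splits each whitespace-separated token on the last occurrence of the separator, so a word that itself contains the separator character (such as 1/2) still keeps its tag. In NLTK itself, the equivalent choice is made by passing sep='|' to TaggedCorpusReader.

```python
from typing import List, Optional, Tuple


def parse_tagged(line: str, sep: str = '/') -> List[Tuple[str, Optional[str]]]:
    """Hypothetical helper: turn 'word<sep>tag' tokens into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        word, found, tag = token.rpartition(sep)  # split on the LAST separator
        if found:
            pairs.append((word, tag))
        else:
            pairs.append((token, None))  # the token carries no tag
    return pairs


print(parse_tagged("The/AT dog/NN"))           # [('The', 'AT'), ('dog', 'NN')]
print(parse_tagged("The|AT dog|NN", sep='|'))  # [('The', 'AT'), ('dog', 'NN')]
print(parse_tagged("1/2|CD", sep='|'))         # [('1/2', 'CD')]
```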
Converting tags to the universal tagset
Tagset: a list of POS tags used by one or more corpora.
Universal tagset: a simplified, condensed tagset with only 12 part-of-speech tags.
Code #3: Map corpus tags to the universal tagset
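For reference, the 12 tags of the universal tagset used by NLTK are listed below as a Python set. The comments give informal glosses of each tag.

```python
# The 12 coarse part-of-speech tags of the universal tagset.
UNIVERSAL_TAGS = frozenset({
    'ADJ',   # adjective
    'ADP',   # adposition (prepositions and postpositions)
    'ADV',   # adverb
    'CONJ',  # conjunction
    'DET',   # determiner or article
    'NOUN',  # noun
    'NUM',   # numeral
    'PRT',   # particle
    'PRON',  # pronoun
    'VERB',  # verb
    '.',     # punctuation
    'X',     # other: foreign words, typos, abbreviations
})

print(len(UNIVERSAL_TAGS))  # 12
```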
from nltk.corpus.reader import TaggedCorpusReader
# Read every file ending in .pos under the current directory; its tags follow the Brown tagset
x = TaggedCorpusReader('.', r'.*\.pos', tagset='en-brown')
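Declaring tagset='en-brown' tells the reader which tagset the corpus tags come from, so they can later be mapped, for example by calling x.tagged_words(tagset='universal'). NLTK performs that mapping with nltk.tag.map_tag, using tables from its universal_tagset data package. The sketch below imitates the idea in plain Python. BROWN_TO_UNIVERSAL is a hypothetical, hand-picked and incomplete table, not NLTK's data. Any tag missing from it falls back to 'X'.

```python
from typing import Dict, List, Tuple

# Hypothetical, incomplete Brown -> universal table (a few common tags only).
BROWN_TO_UNIVERSAL: Dict[str, str] = {
    'AT': 'DET', 'NN': 'NOUN', 'NNS': 'NOUN', 'VB': 'VERB', 'VBD': 'VERB',
    'IN': 'ADP', 'JJ': 'ADJ', 'RB': 'ADV', 'CC': 'CONJ', 'CD': 'NUM',
    'PPS': 'PRON', '.': '.',
}


def to_universal(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Map each (word, brown_tag) pair to (word, universal_tag); unknown tags become 'X'."""
    return [(word, BROWN_TO_UNIVERSAL.get(tag, 'X')) for word, tag in tagged]


sentence = [('The', 'AT'), ('dog', 'NN'), ('barked', 'VBD'), ('.', '.')]
print(to_universal(sentence))
# [('The', 'DET'), ('dog', 'NOUN'), ('barked', 'VERB'), ('.', '.')]
```

A plain dictionary lookup keeps the sketch short. The real NLTK tables cover far more Brown tags than this subset.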