To run the Python program below, you need to have NLTK installed. Please follow the installation instructions.
After installing NLTK, download its data by running the two lines below. A GUI will appear; select "all" to choose every package and click Download. This gives you all the tokenizers, chunkers, other algorithms, and all the corpora, so the download takes quite a long time.
import nltk
nltk.download()
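If downloading everything is too slow, fetching only the packages used for tokenizing, tagging, and stop words may be enough. The sketch below is one way to do that. The package names in REQUIRED_PACKAGES are assumptions that depend on the NLTK version: newer releases may ask for differently named resources (for example punkt_tab), and the error NLTK raises names the one to fetch. download_required is a hypothetical helper, not part of NLTK.

```python
# Package names are assumptions; adjust them if NLTK reports a missing resource.
REQUIRED_PACKAGES = ["punkt", "averaged_perceptron_tagger", "stopwords"]


def download_required(packages=REQUIRED_PACKAGES):
    """Download only the listed NLTK packages and report which ones succeeded."""
    import nltk  # imported here so the list above can be inspected without NLTK

    # nltk.download returns True when a package was fetched or is already up to date.
    return {name: nltk.download(name, quiet=True) for name in packages}


# Run once with network access:
# print(download_required())
```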
Let's cover some quick vocabulary:
Corpus: a body of text (singular). Corpora is the plural.
Lexicon: words and their meanings.
Token: each "entity" produced when text is split up according to rules. For example, each word is a token when a sentence is tokenized into words.
In corpus linguistics, part-of-speech tagging (POS tagging, PoS tagging, or POST), also called grammatical tagging or word-category disambiguation, is the process of marking each word in a text with its part of speech.
Input: Everything is all about money.
Output: [('Everything', 'NN'), ('is', 'VBZ'), ('all', 'DT'), ('about', 'IN'), ('money', 'NN'), ('.', '.')]
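A minimal sketch of how output like this could be produced is shown here. It assumes the tokenizer and tagger data have already been downloaded. nltk.word_tokenize and nltk.pos_tag are NLTK functions; tag_sentence and format_tagged are hypothetical helper names introduced for illustration.

```python
def tag_sentence(sentence):
    """Split a sentence into tokens and tag each token with its part of speech."""
    import nltk  # needs the tokenizer and tagger data to be downloaded

    tokens = nltk.word_tokenize(sentence)
    return nltk.pos_tag(tokens)  # list of (token, tag) pairs


def format_tagged(tagged):
    """Render (token, tag) pairs in the compact token/TAG form."""
    return " ".join(f"{token}/{tag}" for token, tag in tagged)


# With NLTK data installed:
# print(format_tagged(tag_sentence("Everything is all about money.")))
```

The exact tags depend on the tagger model that ships with the installed NLTK version, so they may differ slightly from the output listed above.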
Here is a list of tags, what they mean, and some examples:
CC coordinating conjunction
CD cardinal digit
EX existential there (like "there is"; think of it as "there exists")
FW foreign word
IN preposition / subordinating conjunction
JJ adjective 'big'
JJR adjective, comparative 'bigger'
JJS adjective, superlative 'biggest'
LS list marker 1)
MD modal could, will
NN noun, singular 'desk'
NNS noun, plural 'desks'
NNP proper noun, singular 'Harrison'
NNPS proper noun, plural 'Americans'
PDT predeterminer 'all the kids'
POS possessive ending parent's
PRP personal pronoun I, he, she
PRP$ possessive pronoun my, his, hers
RB adverb very, silently
RBR adverb, comparative better
RBS adverb, superlative best
RP particle give up
TO to go 'to' the store
UH interjection errrrrrrrm
VB verb, base form take
VBD verb, past tense took
VBG verb, gerund / present participle taking
VBN verb, past participle taken
VBP verb, sing. present, non-3rd person take
VBZ verb, 3rd person sing. present takes
WDT wh-determiner which
WP wh-pronoun who, what
WP$ possessive wh-pronoun whose
WRB wh-adverb where, when
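Because related tags share a prefix (NN, NNS, NNP, and NNPS are all nouns), tagged output can be grouped into coarse word classes with a prefix check. The sketch below illustrates this on the example sentence from earlier; COARSE_CLASSES and coarse_class are hypothetical names, and the four groups chosen are just one possible grouping.

```python
# A few coarse groups, keyed by the tag prefixes from the list above.
COARSE_CLASSES = {
    "NN": "noun",
    "VB": "verb",
    "JJ": "adjective",
    "RB": "adverb",
}


def coarse_class(tag):
    """Map a fine-grained tag such as 'VBZ' to a coarse word class, or 'other'."""
    for prefix, name in COARSE_CLASSES.items():
        if tag.startswith(prefix):
            return name
    return "other"


tagged = [("Everything", "NN"), ("is", "VBZ"), ("all", "DT"),
          ("about", "IN"), ("money", "NN"), (".", ".")]
nouns = [token for token, tag in tagged if coarse_class(tag) == "noun"]
print(nouns)  # ['Everything', 'money']
```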
Text can contain stop words such as 'the', 'is', and 'are'. Stop words can be filtered out of the text before processing. There is no universal list of stop words in NLP research, but the nltk module contains a stop word list.
You can add your own stop words. Navigate to the NLTK download directory -> corpora -> stopwords and update the stop word file for the language you are using. Here we use English (stopwords.words('english')).
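As an alternative to editing the file on disk, custom stop words can also be added in code. The sketch below assumes the stopwords package has been downloaded; stopwords.words is the NLTK call mentioned above, while load_stop_words and remove_stop_words are hypothetical helpers. Matching here is case-insensitive, which is a design choice of this sketch rather than NLTK behavior.

```python
def load_stop_words(language="english", extra=()):
    """Return NLTK's stop word list for a language plus any custom words."""
    from nltk.corpus import stopwords  # needs the 'stopwords' package downloaded

    return set(stopwords.words(language)) | {word.lower() for word in extra}


def remove_stop_words(tokens, stop_words):
    """Keep only the tokens that are not stop words (case-insensitive match)."""
    return [token for token in tokens if token.lower() not in stop_words]


# Stand-in set so the example runs without NLTK data;
# in practice use load_stop_words(extra=["..."]) instead.
demo_stop_words = {"is", "all", "about"}
tokens = ["Everything", "is", "all", "about", "money", "."]
print(remove_stop_words(tokens, demo_stop_words))  # ['Everything', 'money', '.']
```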
[('Sukanya', 'NNP'), ('Rajib', 'NNP'), ('Naba', 'NNP'), ('good', 'JJ'), ('friends', 'NNS')]
[('Sukanya', 'NNP'), ('getting', 'VBG'), ('married', 'VBN'), ('next', 'JJ'), ('year', 'NN')]
[('Marriage', 'NN'), ('big', 'JJ'), ('step', 'NN'), ('one', 'CD'), ('’', 'NN'), ('life', 'NN')]
[('It', 'PRP'), ('exciting', 'VBG'), ('frightening', 'VBG')]
[('But', 'CC'), ('friendship', 'NN'), ('sacred', 'VBD'), ('bond', 'NN'), ('people', 'NNS')]
[('It', 'PRP'), ('special', 'JJ'), ('kind', 'NN'), ('love', 'VB'), ('us', 'PRP')]
[('Many', 'JJ'), ('must', 'MD'), ('tried', 'VB'), ('searching', 'VBG'), ('friend', 'NN'), ('never', 'RB'), ('found', 'VBD'), ('right', 'RB'), ('one', 'CD')]
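Output of this shape, one tagged list per sentence with stop words removed, could come from a pipeline like the following sketch. It is not necessarily the program that produced the lists above. tag_text and its toolkit parameter are hypothetical; by default the function uses the nltk module, whose sent_tokenize, word_tokenize, and pos_tag functions need the tokenizer and tagger data to be installed. Stop words are matched exactly here, so capitalized words such as 'It' pass through a lowercase stop word list.

```python
def tag_text(text, stop_words, toolkit=None):
    """Tag every sentence of a text after dropping its stop words.

    toolkit is any object exposing sent_tokenize, word_tokenize and pos_tag;
    it defaults to the nltk module.
    """
    if toolkit is None:
        import nltk as toolkit  # needs tokenizer and tagger data downloaded

    tagged_sentences = []
    for sentence in toolkit.sent_tokenize(text):
        # Exact (case-sensitive) match against the stop word collection.
        tokens = [t for t in toolkit.word_tokenize(sentence) if t not in stop_words]
        tagged_sentences.append(toolkit.pos_tag(tokens))
    return tagged_sentences


# With NLTK data installed:
# from nltk.corpus import stopwords
# for tagged in tag_text("Sukanya is getting married next year.",
#                        set(stopwords.words("english"))):
#     print(tagged)
```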
Basically, the purpose of a POS tagger is to assign linguistic (mostly grammatical) information to sub-sentential units. These units are called tokens and in most cases correspond to words and symbols (for example, punctuation).