Part-of-speech tagging with stop words using NLTK in Python

To run the Python program below, you need to have NLTK installed. Follow these installation steps:

  • Open your terminal and run pip install nltk.
  • Type python on the command line to start the Python interactive shell, ready to execute your code.
  • Type import nltk.
  • Type nltk.download().
  • A GUI will appear. Select "all" to get every package and click Download. This gives you all the tokenizers, chunkers, other algorithms, and all the corpora, so it will take quite a long time to install.
    Example:

     import nltk
     nltk.download()
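Downloading everything is not strictly required for this tutorial. As a rough sketch, the helper below fetches only the resources that the program further down relies on. The list of resource ids is an assumption: newer NLTK releases look for punkt_tab and averaged_perceptron_tagger_eng, while older ones look for punkt and averaged_perceptron_tagger, so both are listed. The name download_required is a hypothetical helper, not part of NLTK.

```python
# Resource ids assumed for this tutorial; which ones are actually needed
# depends on the installed NLTK version, so old and new names are both listed.
REQUIRED = (
    "punkt",
    "punkt_tab",
    "stopwords",
    "averaged_perceptron_tagger",
    "averaged_perceptron_tagger_eng",
)


def download_required(names=REQUIRED):
    """Download each resource quietly; return {name: True/False} per resource."""
    import nltk  # imported here so the list above can be inspected without NLTK

    return {name: bool(nltk.download(name, quiet=True)) for name in names}
```

Calling download_required() once from the interactive shell should be enough. A False value means that the id was not available for your NLTK version or that the download failed.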

    Let's define a small vocabulary:
    Corpus: a body of text (singular); corpora is the plural.
    Lexicon: words and their meanings.
    Token: each "entity" that results from splitting text up according to rules.
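To make these three terms concrete, here is a minimal sketch in plain Python. The regular expression is a stand-in segmentation rule chosen for illustration (it is not NLTK's tokenizer), and the lexicon glosses are made up.

```python
import re

# A tiny corpus: a body of text (here, just two sentences).
corpus = "Everything is all about money. Money is not everything."

# Tokens: the pieces produced by a segmentation rule. This rule keeps runs of
# word characters together and treats each punctuation mark as its own token.
tokens = re.findall(r"\w+|[^\w\s]", corpus)

# A lexicon maps words to information about them (the glosses are made up).
lexicon = {
    "money": "a medium of exchange",
    "everything": "all things",
}

# Tokens (lowercased) that have an entry in the lexicon.
known = [t.lower() for t in tokens if t.lower() in lexicon]

print(tokens)
print(known)
```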

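    In corpus linguistics, part-of-speech tagging (POS tagging, PoS tagging, or POST), also called grammatical tagging or word-category disambiguation, is the process of marking each word in a text with its part of speech, based on both its definition and its context.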

     Input: Everything is all about money.
     Output: [('Everything', 'NN'), ('is', 'VBZ'), ('all', 'DT'), ('about', 'IN'), ('money', 'NN'), ('.', '.')]
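The tagger's result is an ordinary list of (token, tag) pairs, so it can be processed with plain Python. The sketch below copies the pairs from the example above and picks out tokens by tag prefix. The function words_with_tag_prefix is a hypothetical helper written for this illustration.

```python
# The tagged sentence from the example above, as a list of (token, tag) pairs.
tagged = [
    ("Everything", "NN"), ("is", "VBZ"), ("all", "DT"),
    ("about", "IN"), ("money", "NN"), (".", "."),
]


def words_with_tag_prefix(pairs, prefix):
    """Return the tokens whose tag starts with the given prefix."""
    return [word for word, tag in pairs if tag.startswith(prefix)]


nouns = words_with_tag_prefix(tagged, "NN")
verbs = words_with_tag_prefix(tagged, "VB")
print(nouns)  # ['Everything', 'money']
print(verbs)  # ['is']
```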

    Here is a list of tags, what they mean, and some examples:

    CC coordinating conjunction
    CD cardinal digit
    DT determiner
    EX existential there (like "there is"; think of it as "there exists")
    FW foreign word
    IN preposition / subordinating conjunction
    JJ adjective 'big'
    JJR adjective, comparative 'bigger'
    JJS adjective, superlative 'biggest'
    LS list marker 1)
    MD modal could, will
    NN noun, singular 'desk'
    NNS noun, plural 'desks'
    NNP proper noun, singular 'Harrison'
    NNPS proper noun, plural 'Americans'
    PDT predeterminer 'all the kids'
    POS possessive ending parent's
    PRP personal pronoun I, he, she
    PRP$ possessive pronoun my, his, hers
    RB adverb very, silently
    RBR adverb, comparative better
    RBS adverb, superlative best
    RP particle give up
    TO to go 'to' the store
    UH interjection errrrrrrrm
    VB verb, base form take
    VBD verb, past tense took
    VBG verb, gerund / present participle taking
    VBN verb, past participle taken
    VBP verb, sing. present, non-3rd person take
    VBZ verb, 3rd person sing. present takes
    WDT wh-determiner which
    WP wh-pronoun who, what
    WP$ possessive wh-pronoun whose
    WRB wh-adverb where, when
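Many of these tags share a prefix (NN, NNS, NNP and NNPS are all nouns, for instance), which makes it easy to collapse them into coarser groups. The mapping below is a simplified sketch for illustration only. The grouping and the name coarse_category are assumptions, not something NLTK provides.

```python
# Tag prefix -> coarse category. The grouping is a simplification.
COARSE = {
    "NN": "noun",
    "VB": "verb",
    "JJ": "adjective",
    "RB": "adverb",
    "WRB": "adverb",
    "PRP": "pronoun",
    "WP": "pronoun",
}


def coarse_category(tag):
    """Map a fine-grained tag to a coarse category, or 'other' if unlisted."""
    for prefix, category in COARSE.items():
        if tag.startswith(prefix):
            return category
    return "other"


print(coarse_category("NNPS"))  # noun
print(coarse_category("VBZ"))   # verb
print(coarse_category("DT"))    # other
```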

    The text can contain stop words such as 'is' and 'are'. Stop words can be filtered out of the text before processing. There is no universal stop word list in NLP research, but the nltk module contains one.
    You can add your own stop words. Navigate to the NLTK download directory -> corpora -> stopwords and update the stop word file for the language you are using. We use English here (stopwords.words('english')).
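As an alternative to editing the file on disk, extra stop words can also be added in code by extending the set at run time. The sketch below assumes that approach. load_stop_words and FALLBACK_STOP_WORDS are hypothetical names, and the fallback set is a tiny hand-written list that is used only when NLTK or its stopwords corpus is unavailable.

```python
# A deliberately tiny fallback list, used only if NLTK or its stopwords corpus
# is not available; it is not NLTK's list.
FALLBACK_STOP_WORDS = {"a", "an", "the", "is", "are", "and", "my", "of"}


def load_stop_words(extra=()):
    """Return English stop words plus any extra words, all lowercased."""
    try:
        from nltk.corpus import stopwords
        base = set(stopwords.words("english"))
    except (ImportError, LookupError):
        base = set(FALLBACK_STOP_WORDS)
    return base | {word.lower() for word in extra}


# Treat "next" as an additional, custom stop word.
stop_words = load_stop_words(extra=["next"])

tokens = ["Sukanya", "is", "getting", "married", "next", "year"]
# Lowercase each token before the lookup so capitalized words also match.
filtered = [t for t in tokens if t.lower() not in stop_words]
print(filtered)
```

Lowercasing before the lookup is a design choice of this sketch. The program below compares tokens as they are, so capitalized words such as "It" are kept there.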

    import nltk
    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize, sent_tokenize

    stop_words = set(stopwords.words('english'))

    # Dummy text (adjacent string literals inside parentheses are concatenated)
    txt = ("Sukanya, Rajib and Naba are my good friends. "
           "Sukanya is getting married next year. "
           "Marriage is a big step in one’s life. "
           "It is both exciting and frightening. "
           "But friendship is a sacred bond between people. "
           "It is a special kind of love between us. "
           "Many of you must have tried searching for a friend "
           "but never found the right one.")

    # sent_tokenize uses an instance of PunktSentenceTokenizer
    # from the nltk.tokenize.punkt module
    tokenized = sent_tokenize(txt)

    for i in tokenized:
        # Word tokenizers split a sentence into words and punctuation
        wordsList = word_tokenize(i)

        # Remove stop words from wordsList (the comparison is case-sensitive)
        wordsList = [w for w in wordsList if w not in stop_words]

        # Run the part-of-speech tagger (POS tagger) on the remaining tokens
        tagged = nltk.pos_tag(wordsList)

        print(tagged)

    Output (punctuation tokens are omitted here; exact tags may vary with the NLTK version):

     [('Sukanya', 'NNP'), ('Rajib', 'NNP'), ('Naba', 'NNP'), ('good', 'JJ'), ('friends', 'NNS')]
     [('Sukanya', 'NNP'), ('getting', 'VBG'), ('married', 'VBN'), ('next', 'JJ'), ('year', 'NN')]
     [('Marriage', 'NN'), ('big', 'JJ'), ('step', 'NN'), ('one', 'CD'), ('’', 'NN'), ('life', 'NN')]
     [('It', 'PRP'), ('exciting', 'VBG'), ('frightening', 'VBG')]
     [('But', 'CC'), ('friendship', 'NN'), ('sacred', 'VBD'), ('bond', 'NN'), ('people', 'NNS')]
     [('It', 'PRP'), ('special', 'JJ'), ('kind', 'NN'), ('love', 'VB'), ('us', 'PRP')]
     [('Many', 'JJ'), ('must', 'MD'), ('tried', 'VB'), ('searching', 'VBG'), ('friend', 'NN'), ('never', 'RB'), ('found', 'VBD'), ('right', 'RB'), ('one', 'CD')]
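Once the sentences are tagged, a quick way to summarize them is to count how often each tag occurs. The sketch below does this with collections.Counter from the standard library, using the first two tagged sentences copied from the output above.

```python
from collections import Counter

# The first two tagged sentences from the output above.
tagged_sentences = [
    [("Sukanya", "NNP"), ("Rajib", "NNP"), ("Naba", "NNP"),
     ("good", "JJ"), ("friends", "NNS")],
    [("Sukanya", "NNP"), ("getting", "VBG"), ("married", "VBN"),
     ("next", "JJ"), ("year", "NN")],
]

# Count every tag across all sentences, ignoring the tokens themselves.
tag_counts = Counter(tag for sentence in tagged_sentences for _, tag in sentence)
print(tag_counts.most_common(2))  # [('NNP', 4), ('JJ', 2)]
```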

    Basically, the purpose of a POS tagger is to assign linguistic (mostly grammatical) information to sub-sentential units. These units are called tokens and, in most cases, correspond to words and symbols (for example, punctuation).
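To illustrate what "assigning information to tokens" means, here is a toy tagger that simply looks each token up in a small table. It is a sketch for illustration only and is not how NLTK's tagger works: TOY_LEXICON and toy_tag are hypothetical names, and the fallback to NN for unknown words is an arbitrary assumption.

```python
# A toy lookup table; real taggers learn from annotated corpora instead.
TOY_LEXICON = {
    "everything": "NN", "is": "VBZ", "all": "DT",
    "about": "IN", "money": "NN",
}


def toy_tag(tokens, default="NN"):
    """Tag each token by dictionary lookup; punctuation is tagged as itself."""
    tagged = []
    for token in tokens:
        if not any(ch.isalnum() for ch in token):
            tag = token  # e.g. '.' gets the tag '.'
        else:
            tag = TOY_LEXICON.get(token.lower(), default)
        tagged.append((token, tag))
    return tagged


result = toy_tag(["Everything", "is", "all", "about", "money", "."])
print(result)
```

A lookup like this cannot use context, so a word that can be either a noun or a verb always gets the same tag. That is the gap a real POS tagger fills.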