Python | PoS tagging and lemmatization using spaCy



How to install?

 pip install spacy
 python -m spacy download en_core_web_sm
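A quick way to confirm that both the library and the model landed in the active environment is to ask Python whether it can locate them. The following is a minimal sketch using only the standard library. It assumes that models fetched with `spacy download` are installed as ordinary Python packages, and the helper name `check_installed` is made up for illustration.

```python
from importlib.util import find_spec


def check_installed(packages=("spacy", "en_core_web_sm")):
    """Return {package_name: bool} telling whether each package can be located.

    Hypothetical helper, not part of spaCy.
    """
    return {name: find_spec(name) is not None for name in packages}


status = check_installed()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

If either entry is reported as missing, rerun the corresponding install command in the same environment.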

Main features of spaCy:
1. Non-destructive tokenization
2. Named entity recognition
3. Support for more than 49 languages
4. 16 statistical models for 9 languages
5. Pretrained word vectors
6. Part-of-speech tagging
7. Labelled dependency parsing
8. Syntax-driven sentence segmentation
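"Non-destructive" in the first item means that tokenization throws nothing away: the original string can always be rebuilt from the tokens. The sketch below illustrates this with a blank English pipeline, which contains only a tokenizer and so needs no model download. The sample sentence is arbitrary.

```python
import spacy

# A blank pipeline has a tokenizer but no tagger, parser or NER.
nlp = spacy.blank("en")

# Irregular spacing and a newline, to show that whitespace is preserved too.
text = "My name is  Shaurya Uppal.\nI enjoy writing articles."
doc = nlp(text)

# Each token keeps its trailing whitespace, so joining them restores the input.
rebuilt = "".join(token.text_with_ws for token in doc)
print(rebuilt == text)

# token.idx is the character offset of the token in the original string.
for token in doc:
    assert text[token.idx : token.idx + len(token.text)] == token.text
```

Because offsets are preserved, annotations produced later in the pipeline can always be mapped back to exact positions in the source text.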

Import the library and load the model:

import spacy

# Requires the model: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

POS tagging:

Part-of-speech (POS) tagging labels each word with its grammatical category: noun, verb, adjective, adverb, and so on.

import spacy

# Load the English tokenizer, tagger, parser and NER
nlp = spacy.load("en_core_web_sm")

# Process a whole document
text = """My name is Shaurya Uppal.
I enjoy writing articles on Python.Engineering checkout
my other article by going to my profile section."""

doc = nlp(text)

# Print each token with its coarse-grained POS tag
for token in doc:
    print(token, token.pos_)

# Collect the tokens tagged as verbs
print("Verbs:", [token.text for token in doc if token.pos_ == "VERB"])

Output:

My DET
name NOUN
is VERB
Shaurya PROPN
Uppal PROPN
. PUNCT
I PRON
enjoy VERB
writing VERB
articles NOUN
on ADP
Python.Engineering PROPN
checkout VERB
my DET
other ADJ
article NOUN
by ADP
going VERB
to ADP
my DET
profile NOUN
section NOUN
. PUNCT
Verbs: ['is', 'enjoy', 'writing', 'checkout', 'going']
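The tag names in this output are abbreviations, and `spacy.explain` can be used to look up what each one stands for in spaCy's built-in glossary. A small sketch, with the tag list copied by hand from the output above:

```python
import spacy

# Tags that appear in the output above (copied by hand).
tags = ["DET", "NOUN", "VERB", "PROPN", "PUNCT", "PRON", "ADP", "ADJ"]

# spacy.explain returns a short description, or None for an unknown label.
explanations = {tag: spacy.explain(tag) for tag in tags}

for tag, meaning in explanations.items():
    print(f"{tag:<6} {meaning}")
```

The lookup does not require a loaded model, so it is also handy for decoding dependency or entity labels while exploring a pipeline.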

Lemmatization:

Lemmatization groups the inflected forms of a word so that they can be analysed as a single item, identified by the word's lemma, or dictionary form.
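To make the idea of "grouping inflected forms" concrete, here is a toy sketch built on a hand-written lookup table. It is not how spaCy computes lemmas; the table `TOY_LEMMAS` and the helper `group_by_lemma` are invented purely for illustration.

```python
from collections import defaultdict

# Hand-written lookup table for illustration only; NOT spaCy's lemmatizer.
TOY_LEMMAS = {
    "is": "be", "was": "be", "are": "be",
    "going": "go", "went": "go", "goes": "go",
    "articles": "article",
}


def group_by_lemma(words):
    """Map each lemma to the surface forms that reduce to it."""
    groups = defaultdict(list)
    for word in words:
        # Words missing from the table are treated as their own lemma.
        lemma = TOY_LEMMAS.get(word.lower(), word.lower())
        groups[lemma].append(word)
    return dict(groups)


groups = group_by_lemma(["is", "was", "going", "went", "article", "articles"])
print(groups)
# {'be': ['is', 'was'], 'go': ['going', 'went'], 'article': ['article', 'articles']}
```

A real lemmatizer replaces the fixed table with rules and statistics, but the result has the same shape: many surface forms collapse onto one dictionary entry.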

import spacy

# Load the English tokenizer, tagger, parser and NER
nlp = spacy.load("en_core_web_sm")

# Process a whole document
text = """My name is Shaurya Uppal.
I enjoy writing articles on Python.Engineering checkout
my other article by going to my profile section."""

doc = nlp(text)

# Print each token with its lemma
for token in doc:
    print(token, token.lemma_)

Output (the -PRON- placeholder lemma for pronouns is specific to spaCy 2.x; spaCy 3.x models no longer emit it):

My -PRON-
name name
is be
Shaurya Shaurya
Uppal Uppal
. .
I -PRON-
enjoy enjoy
writing write
articles article
on on
Python.Engineering Python.Engineering
checkout checkout
my -PRON-
other other
article article
by by
going go
to to
my -PRON-
profile profile
section section
. .
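One common use of lemmas is counting words regardless of inflection. The sketch below works on (token, lemma) pairs copied by hand from the output above, with punctuation left out, so it runs without a model. The variable names are illustrative only.

```python
from collections import Counter

# (token, lemma) pairs copied from the output above, punctuation omitted.
pairs = [
    ("My", "-PRON-"), ("name", "name"), ("is", "be"),
    ("Shaurya", "Shaurya"), ("Uppal", "Uppal"), ("I", "-PRON-"),
    ("enjoy", "enjoy"), ("writing", "write"), ("articles", "article"),
    ("on", "on"), ("Python.Engineering", "Python.Engineering"),
    ("checkout", "checkout"), ("my", "-PRON-"), ("other", "other"),
    ("article", "article"), ("by", "by"), ("going", "go"), ("to", "to"),
    ("my", "-PRON-"), ("profile", "profile"), ("section", "section"),
]

# Tokens whose lemma differs from the surface form (ignoring case).
changed = [(tok, lem) for tok, lem in pairs if tok.lower() != lem.lower()]

# Frequency of each lemma: "articles" and "article" are counted together.
lemma_counts = Counter(lem for _, lem in pairs)

print(changed)
print(lemma_counts.most_common(2))
# [('-PRON-', 4), ('article', 2)]
```

With a live pipeline the same counts could be obtained by iterating over `doc` and reading `token.lemma_` directly.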