
Python | PoS tagging and lemmatization using spaCy


How to install?

pip install spacy
python -m spacy download en_core_web_sm
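A quick way to confirm the download step worked, without loading the model, is to check whether the package can be imported. spaCy pipelines fetched with `python -m spacy download` are installed as ordinary Python packages, so a missing model shows up as a missing package. This is a minimal sketch: the helper name `model_installed` is hypothetical, and it only checks importability, not compatibility with the installed spaCy version.

```python
import importlib.util


def model_installed(package: str = "en_core_web_sm") -> bool:
    """Return True if a top-level package with this name is importable.

    Hypothetical helper: a spaCy pipeline installed via "spacy download"
    is a regular Python package, so find_spec() locates it (or returns None).
    """
    return importlib.util.find_spec(package) is not None


if not model_installed():
    print("Model missing. Run: python -m spacy download en_core_web_sm")
```

If the check fails, `spacy.load("en_core_web_sm")` would raise an `OSError`, so running it first gives a clearer hint about what to install.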

spaCy main features:
1. Non-destructive tokenization
2. Named entity recognition
3. Support for 49+ languages
4. 16 statistical models for 9 languages
5. Pretrained word vectors
6. Part-of-speech tagging
7. Labelled dependency parsing
8. Syntax-driven sentence segmentation

Import the library and load a model:

import spacy

# The model must be downloaded first: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

POS tagging:

Part-of-speech (POS) tagging is the process of labelling each word as a noun, verb, adjective, adverb, and so on.

import spacy

# Load the English pipeline (tokenizer, tagger, parser and NER)
nlp = spacy.load("en_core_web_sm")

# Process a whole document
text = ("My name is Shaurya Uppal. "
        "I enjoy writing articles on Python.Engineering checkout "
        "my other article by going to my profile section.")

doc = nlp(text)

# Print each token with its coarse-grained POS tag
for token in doc:
    print(token, token.pos_)

# Collect only the verb tokens
print("Verbs:", [token.text for token in doc if token.pos_ == "VERB"])

Output (tags can vary between spaCy and model versions):

My DET
name NOUN
is VERB
Shaurya PROPN
Uppal PROPN
. PUNCT
I PRON
enjoy VERB
writing VERB
articles NOUN
on ADP
Python.Engineering PROPN
checkout VERB
my DET
other ADJ
article NOUN
by ADP
going VERB
to ADP
my DET
profile NOUN
section NOUN
. PUNCT
Verbs: ['is', 'enjoy', 'writing', 'checkout', 'going']
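Filtering verbs is one case of a more general pattern: bucketing every token by its POS tag. A minimal sketch, assuming only that each token exposes `.text` and `.pos_` as spaCy's `Token` does (so a `Doc` can be passed directly). `PosToken` is a hypothetical stand-in so the example runs without a model, and the sample tags are copied from the first sentence of the output above.

```python
from collections import defaultdict
from collections.abc import Iterable
from typing import NamedTuple


class PosToken(NamedTuple):
    """Hypothetical stand-in exposing the two spaCy Token attributes used here."""
    text: str
    pos_: str


def group_by_pos(tokens: Iterable) -> dict[str, list[str]]:
    """Map each coarse POS tag to the token texts carrying it, in document order."""
    groups: dict[str, list[str]] = defaultdict(list)
    for token in tokens:
        groups[token.pos_].append(token.text)
    return dict(groups)


# Tags taken from the first sentence of the output above
sample = [
    PosToken("My", "DET"),
    PosToken("name", "NOUN"),
    PosToken("is", "VERB"),
    PosToken("Shaurya", "PROPN"),
    PosToken("Uppal", "PROPN"),
    PosToken(".", "PUNCT"),
]
by_pos = group_by_pos(sample)
print(by_pos["PROPN"])  # ['Shaurya', 'Uppal']
```

With a loaded pipeline, `group_by_pos(nlp(text))["VERB"]` returns the same list as the verb filter in the tutorial code.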

Lemmatization:

Lemmatization is the process of grouping together the inflected forms of a word so they can be analysed as a single item, identified by the word's lemma, or dictionary form.

import spacy

# Load the English pipeline (tokenizer, tagger, parser and NER)
nlp = spacy.load("en_core_web_sm")

# Process a whole document
text = ("My name is Shaurya Uppal. "
        "I enjoy writing articles on Python.Engineering checkout "
        "my other article by going to my profile section.")

doc = nlp(text)

# Print each token with its lemma (base form)
for token in doc:
    print(token, token.lemma_)

Output (from spaCy 2.x, which lemmatizes pronouns to the placeholder -PRON-; spaCy 3.x removed this placeholder, so pronoun lemmas differ there):

My -PRON-
name name
is be
Shaurya Shaurya
Uppal Uppal
. .
I -PRON-
enjoy enjoy
writing write
articles article
on on
Python.Engineering Python.Engineering
checkout checkout
my -PRON-
other other
article article
by by
going go
to to
my -PRON-
profile profile
section section
. .
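The definition of lemmatization describes a many-to-one mapping from surface forms to lemmas. Inverting that mapping shows which inflected forms were grouped under each lemma. A minimal sketch, assuming only that each token exposes `.text` and `.lemma_` as spaCy's `Token` does. `LemmaToken` is a hypothetical stand-in so the example runs without a model, and the sample pairs are copied from the output above.

```python
from collections import defaultdict
from collections.abc import Iterable
from typing import NamedTuple


class LemmaToken(NamedTuple):
    """Hypothetical stand-in exposing the two spaCy Token attributes used here."""
    text: str
    lemma_: str


def group_by_lemma(tokens: Iterable) -> dict[str, set[str]]:
    """Collect the distinct (lower-cased) surface forms seen for each lemma."""
    forms: dict[str, set[str]] = defaultdict(set)
    for token in tokens:
        forms[token.lemma_].add(token.text.lower())
    return dict(forms)


# Word/lemma pairs taken from the output above
sample = [
    LemmaToken("is", "be"),
    LemmaToken("writing", "write"),
    LemmaToken("articles", "article"),
    LemmaToken("article", "article"),
    LemmaToken("going", "go"),
]
forms = group_by_lemma(sample)
print(sorted(forms["article"]))  # ['article', 'articles']
```

Passing `nlp(text)` instead of `sample` applies the same grouping to a real document; lower-casing keeps "Article" and "article" in one bucket.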
