Python | Named Entity Recognition (NER) using spaCy



Named Entity Recognition (NER) is a standard NLP problem that involves extracting named entities (people, places, organizations, etc.) from a piece of text and classifying them into a predefined set of categories. Some practical uses of NER include:

  • Scanning news articles for the people, organizations, and locations they mention.
  • Providing concise features to optimize search: instead of searching the entire content, you can search only for the main entities involved.
  • Quickly finding the geographic locations mentioned in Twitter posts.

NER with spaCy
spaCy is widely regarded as one of the fastest NLP libraries for Python, offering a single optimized implementation for each NLP task it supports. It is easy to learn and use, and simple tasks take only a few lines of code.

Installation:

 pip install spacy
 python -m spacy download en_core_web_sm

Code for NER using spaCy.

import spacy

# Load the small English pipeline installed above
nlp = spacy.load('en_core_web_sm')

sentence = "Apple is looking at buying U.K. startup for $1 billion"

# Run the pipeline; the recognized entities are available in doc.ents
doc = nlp(sentence)

for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

Output:

 Apple 0 5 ORG
 U.K. 27 31 GPE
 $1 billion 44 54 MONEY

In the output, the first column is the entity text, the next two columns are its start and end character offsets within the sentence or document, and the last column is its category.
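
To make the meaning of the two offset columns concrete, here is a minimal sketch that slices the sentence with them. The sentence and the entity tuples are typed in by hand (an assumption of this sketch, so that it runs without spaCy or a model); in real use they would come from doc.text and doc.ents. The helper name check_offsets is hypothetical.

```python
# Sentence and entity tuples typed in by hand for illustration;
# in real use they would come from doc.text and doc.ents.
sentence = "Apple is looking at buying U.K. startup for $1 billion"
entities = [
    ("Apple", 0, 5, "ORG"),
    ("U.K.", 27, 31, "GPE"),
    ("$1 billion", 44, 54, "MONEY"),
]


def check_offsets(text, ents):
    """Return True if every (start, end) pair slices out its entity text."""
    return all(text[start:end] == ent_text for ent_text, start, end, _ in ents)


for ent_text, start, end, label in entities:
    # The end offset is exclusive, as in ordinary Python slicing
    print(label, repr(sentence[start:end]))

print(check_offsets(sentence, entities))
```

Because the end offset is exclusive, the length of an entity is simply end minus start.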

It is also interesting to note that spaCy's NER model uses capitalization as one of its signals for identifying named entities. The same example, tested with a small modification, gives a different result.

import spacy

nlp = spacy.load('en_core_web_sm')

# Same sentence as before, but with a lowercase "apple"
sentence = "apple is looking at buying U.K. startup for $1 billion"

doc = nlp(sentence)

for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

Output:

 U.K. 27 31 GPE
 $1 billion 44 54 MONEY

The word "apple" is no longer recognized as a named entity. Therefore, it is important to run NER before the usual preprocessing or normalization steps, such as lowercasing.
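
One way to act on this, sketched under assumptions: run NER first, then use the entity offsets to normalize only the text outside the entities. The function name lowercase_outside_entities is hypothetical, and the spans are typed in by hand here to stand in for the (ent.start_char, ent.end_char) pairs that doc.ents would provide, so the snippet runs without a model.

```python
def lowercase_outside_entities(text, spans):
    """Lowercase text except inside the given (start, end) character spans.

    spans must be sorted and non-overlapping, which is how the entities
    in doc.ents are ordered.
    """
    pieces = []
    cursor = 0
    for start, end in spans:
        pieces.append(text[cursor:start].lower())  # normalize the gap
        pieces.append(text[start:end])             # keep the entity as-is
        cursor = end
    pieces.append(text[cursor:].lower())           # trailing text
    return "".join(pieces)


sentence = "Apple Is Looking At Buying U.K. Startup For $1 Billion"
# Hand-typed spans standing in for
# [(ent.start_char, ent.end_char) for ent in doc.ents]
spans = [(0, 5), (27, 31), (44, 54)]
normalized = lowercase_outside_entities(sentence, spans)
print(normalized)
```

The entity text keeps its original casing, while everything around it is lowercased.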

You can also use your own examples to train and modify spaCy's built-in NER model. There are several ways to do this. The following code shows a simple way to feed in new examples and update the model.

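# Note: this snippet uses the legacy (pre-3.0) spaCy API;
# spacy.gold and GoldParse were removed in spaCy 3.
import spacy
from spacy.gold import GoldParse
from spacy.language import EntityRecognizer

# Load the English pipeline without its entity recognizer and parser
nlp = spacy.load('en', entity=False, parser=False)

# Collect the documents and their gold-standard annotations
doc_list = []
doc = nlp('Llamas make great pets.')
doc_list.append(doc)
gold_list = []
gold_list.append(GoldParse(doc, [u'ANIMAL', u'O', u'O', u'O']))

# Create a recognizer for the new label and update it with the examples
ner = EntityRecognizer(nlp.vocab, entity_types=['ANIMAL'])
ner.update(doc_list, gold_list)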

By adding enough examples to doc_list, you can build a customized NER model with spaCy.
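
The training snippet above imports spacy.gold and GoldParse, which are not available in spaCy 3. As a rough equivalent, here is a minimal sketch using the spaCy 3 training API (Example.from_dict and nlp.update), assuming spaCy 3.x is installed. The second training sentence, the number of iterations, and the choice of a blank English pipeline instead of en_core_web_sm are assumptions made for illustration only.

```python
import random

import spacy
from spacy.training import Example

# Start from a blank English pipeline and add a fresh NER component
nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
ner.add_label("ANIMAL")

# Entities are given as (start_char, end_char, label) offsets.
# The second sentence is an invented example, not from the article.
TRAIN_DATA = [
    ("Llamas make great pets.", {"entities": [(0, 6, "ANIMAL")]}),
    ("Horses are too tall to keep indoors.", {"entities": [(0, 6, "ANIMAL")]}),
]

examples = [
    Example.from_dict(nlp.make_doc(text), annotations)
    for text, annotations in TRAIN_DATA
]

# Initialize the model weights, then update for a few passes
optimizer = nlp.initialize(lambda: examples)
for _ in range(20):  # iteration count chosen arbitrarily
    random.shuffle(examples)
    losses = {}
    nlp.update(examples, sgd=optimizer, losses=losses)

print(ner.labels, losses)
```

Two sentences are far too few to produce a useful model; as the text says, the quality depends on adding enough examples.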

spaCy's pretrained English models support the following entity types:
PERSON, NORP (nationalities, religious and political groups), FAC (buildings, airports, etc.), ORG (organizations), GPE (countries, cities, etc.), LOC (mountain ranges, bodies of water, etc.), PRODUCT (products), EVENT (event names), WORK_OF_ART (books, song titles), LAW (titles of legal documents), LANGUAGE (named languages), DATE, TIME, PERCENT, MONEY, QUANTITY, ORDINAL and CARDINAL.
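
If a label's meaning is unclear, spaCy can describe it at runtime. The sketch below uses spacy.explain, which returns a short description string for a known label and None otherwise; it needs spaCy installed but no model. The selection of labels is arbitrary.

```python
import spacy

# A few of the labels from the list above
labels = ["NORP", "FAC", "GPE", "LOC", "WORK_OF_ART"]

# spacy.explain looks each label up in spaCy's built-in glossary
descriptions = {label: spacy.explain(label) for label in labels}

for label, description in descriptions.items():
    print(label, "-", description)
```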

Links

  • https://spacy.io/