Named Entity Recognition (NER) is a standard NLP task that involves extracting named entities (people, places, organizations, etc.) from a piece of text and classifying them into a predefined set of categories. Some of the practical uses of NER include:
NER with spaCy
spaCy is often considered one of the fastest NLP libraries for Python. For each NLP task it implements, it provides a single optimized implementation rather than a choice of alternatives. It is easy to learn and use, and simple tasks take only a few lines of code.
pip install spacy
python -m spacy download en_core_web_sm
Code for NER using spaCy.
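The following is a minimal sketch of such code, assuming spaCy v3 and the en_core_web_sm model installed as above. The input sentence is an assumption. It is reconstructed from the character offsets in the output below and matches the example sentence used in spaCy's own documentation. The helper names `extract_entities` and `demo` are hypothetical.

```python
import spacy


def extract_entities(nlp, text):
    """Return (text, start_char, end_char, label) for each entity spaCy finds."""
    doc = nlp(text)
    return [(ent.text, ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]


def demo(model="en_core_web_sm"):
    """Load the pretrained pipeline and print one entity per line."""
    nlp = spacy.load(model)
    # Assumed input sentence, reconstructed from the offsets in the output.
    text = "Apple is looking at buying U.K. startup for $1 billion"
    for ent_text, start, end, label in extract_entities(nlp, text):
        print(ent_text, start, end, label)
```

Calling `demo()` should print one entity per line in the format shown below. The exact entities can vary between model versions.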
Apple 0 5 ORG
U.K. 27 31 GPE
$1 billion 44 54 MONEY
In the output, the first column is the entity text, the next two columns are its start and end character offsets in the sentence or document, and the last column is its category.
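The offsets follow Python's slicing convention: the start is inclusive and the end is exclusive, so slicing the input text with them recovers each entity. Here is a quick check in plain Python. It assumes the input sentence was "Apple is looking at buying U.K. startup for $1 billion", which is reconstructed from the offsets and not stated in the original.

```python
# Assumed input sentence (reconstructed from the offsets in the output above).
TEXT = "Apple is looking at buying U.K. startup for $1 billion"

# Rows from the output above: (entity, start_char, end_char, label).
ROWS = [
    ("Apple", 0, 5, "ORG"),
    ("U.K.", 27, 31, "GPE"),
    ("$1 billion", 44, 54, "MONEY"),
]


def check_offsets(text, rows):
    """Return True if every (start, end) pair slices out its entity text."""
    return all(text[start:end] == entity for entity, start, end, _ in rows)


print(check_offsets(TEXT, ROWS))  # True
```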
It is also worth noting that spaCy's NER model uses capitalization as one of the signals for identifying named entities. Running the same example with "Apple" changed to lowercase "apple" gives a different result.
U.K. 27 31 GPE
$1 billion 44 54 MONEY
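The sketch below runs that comparison, assuming spaCy v3 and en_core_web_sm. The modified sentence, with only "Apple" lowercased, is inferred from the text. The helper names are hypothetical, and the entities actually returned can differ between model versions.

```python
import spacy


def extract_entities(nlp, text):
    """Return (text, start_char, end_char, label) tuples for each entity."""
    return [(e.text, e.start_char, e.end_char, e.label_) for e in nlp(text).ents]


def lost_entities(nlp, original, modified):
    """Entities found in `original` but not in `modified` (same text, offsets and label)."""
    after = set(extract_entities(nlp, modified))
    return [ent for ent in extract_entities(nlp, original) if ent not in after]


def demo(model="en_core_web_sm"):
    nlp = spacy.load(model)
    original = "Apple is looking at buying U.K. startup for $1 billion"
    modified = "apple is looking at buying U.K. startup for $1 billion"  # only "Apple" lowercased
    for ent in extract_entities(nlp, modified):
        print(*ent)
    print("lost:", lost_entities(nlp, original, modified))
```

Because both sentences have the same length, the offsets of the remaining entities do not change. This lets a plain set comparison isolate what was lost.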
The word “apple” is no longer recognized as a named entity. It is therefore important to run NER before the usual preprocessing or normalization steps, such as lowercasing.
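One way to follow this ordering is to extract entities from the raw text first, and only then lowercase the tokens for downstream use. This is a sketch assuming spaCy v3. The function name `ner_then_normalize` and the particular normalization (lowercasing and dropping punctuation) are illustrative assumptions.

```python
import spacy


def ner_then_normalize(nlp, text):
    """Run NER on the raw text first, then lowercase and drop punctuation."""
    doc = nlp(text)  # entities are predicted from the original, cased text
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    # Normalization happens afterwards, on tokens of the already-analyzed doc.
    tokens = [tok.lower_ for tok in doc if not tok.is_punct and not tok.is_space]
    return entities, tokens
```

For example, `ner_then_normalize(spacy.load("en_core_web_sm"), text)` returns entities computed on the cased input together with lowercased tokens, so the casing signal is not lost before NER sees it.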
You can also use your own examples to train and update spaCy's built-in NER model. There are several ways to do this. The following code shows a simple way to inject new examples and update the model.
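The original code is not recoverable. What follows is a minimal sketch of one common approach, assuming the spaCy v3 training API (`Example.from_dict`, `nlp.select_pipes`, `nlp.resume_training`, `nlp.update`). The two sentences in `doc_list` are hypothetical illustrations, as are the helper names `make_examples`, `update_ner` and `demo`.

```python
import random

import spacy
from spacy.training import Example

# Hypothetical training data: (text, {"entities": [(start_char, end_char, label)]}).
doc_list = [
    ("Walmart is a leading e-commerce company", {"entities": [(0, 7, "ORG")]}),
    ("I reached Chennai yesterday.", {"entities": [(10, 17, "GPE")]}),
]


def make_examples(nlp, data):
    """Convert (text, annotations) pairs into spaCy Example objects."""
    return [Example.from_dict(nlp.make_doc(text), ann) for text, ann in data]


def update_ner(nlp, examples, optimizer, n_iter=20, drop=0.3, seed=0):
    """Update only the 'ner' component; return the losses from the last pass."""
    random.seed(seed)
    losses = {}
    other_pipes = [name for name in nlp.pipe_names if name != "ner"]
    with nlp.select_pipes(disable=other_pipes):  # leave tagger, parser, etc. untouched
        for _ in range(n_iter):
            random.shuffle(examples)  # shuffles the caller's list in place
            losses = {}
            nlp.update(examples, drop=drop, sgd=optimizer, losses=losses)
    return losses


def demo(model="en_core_web_sm"):
    nlp = spacy.load(model)
    examples = make_examples(nlp, doc_list)
    optimizer = nlp.resume_training()  # continue from the pretrained weights
    print(update_ner(nlp, examples, optimizer))
```

Training repeatedly on only a handful of examples can make the model forget entities it previously recognized. In practice, the new examples are usually mixed with examples of entity types the model already handles.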
By adding enough examples to doc_list, you can build a customized NER model with spaCy.
spaCy's pretrained English pipelines (such as en_core_web_sm) recognize the following entity types:
PERSON (people), NORP (nationalities, religious and political groups), FAC (buildings, airports, etc.), ORG (organizations), GPE (countries, cities, etc.), LOC (mountain ranges, bodies of water, etc.), PRODUCT (products), EVENT (named events such as battles, wars and sports events), WORK_OF_ART (titles of books, songs, etc.), LAW (titles of legal documents), LANGUAGE (named languages), DATE, TIME, PERCENT, MONEY, QUANTITY, ORDINAL and CARDINAL.
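To see which labels a particular pipeline actually predicts, rather than relying on a fixed list, you can read them from its ner component and look each one up with `spacy.explain`. This is a minimal sketch assuming spaCy v3. `describe_labels` and `demo` are hypothetical helper names.

```python
import spacy


def describe_labels(nlp, component="ner"):
    """Map each label of the NER component to spaCy's short description of it."""
    return {label: spacy.explain(label) for label in nlp.get_pipe(component).labels}


def demo(model="en_core_web_sm"):
    for label, description in describe_labels(spacy.load(model)).items():
        print(f"{label}: {description}")
```

`spacy.explain` returns `None` for strings it does not know. Looking up a label this way is a quick check that a name such as PERCENT is spelled the way the model uses it.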