
Python | Word similarities using spaCy


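The spaCy models:
spaCy supports two ways of computing word similarity: context-sensitive tensors and word vectors. The small model (en_core_web_sm) contains tensors only, while the medium and large models (en_core_web_md and en_core_web_lg) include word vectors. The commands below download these models. The code in this article loads en_core_web_md, so that model must be installed too.

# Download the small model, which contains tensors only.
python -m spacy download en_core_web_sm
# Download the medium model, which is the one loaded in the code below.
python -m spacy download en_core_web_md
# Download the large model, which includes a larger set of word vectors.
python -m spacy download en_core_web_lg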

The code below computes the similarity between two words. The same approach extends to sentences and documents.

import spacy

# Load the medium English model, which includes word vectors.
nlp = spacy.load("en_core_web_md")

print("Enter two space-separated words")
words = input()

tokens = nlp(words)

for token in tokens:
    # Print the following attributes of each token:
    # text: the word string,
    # has_vector: whether the model has a vector for the word,
    # vector_norm: the L2 norm of the word's vector,
    # is_oov: whether the word is out of vocabulary.
    print(token.text, token.has_vector, token.vector_norm, token.is_oov)

token1, token2 = tokens[0], tokens[1]

print("Similarity:", token1.similarity(token2))

Output:

cat True 6.6808186 False
dog True 7.0336733 False
Similarity: 0.80168545
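By default, spaCy's similarity score is the cosine similarity of the two vectors, and vector_norm is the L2 norm that appears in its denominator. The sketch below reproduces that calculation in plain Python so the number is easier to interpret. The function name cosine_similarity and the three-dimensional vectors are made up for illustration; they are not taken from any spaCy model.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))  # L2 norm of a
    norm_b = math.sqrt(sum(y * y for y in b))  # L2 norm of b
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: an all-zero vector
        # is treated as having no similarity to anything.
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional vectors, for illustration only.
cat = [1.0, 2.0, 3.0]
dog = [2.0, 4.0, 6.5]
car = [3.0, -1.0, 0.0]

print("cat vs dog:", cosine_similarity(cat, dog))
print("cat vs car:", cosine_similarity(cat, car))
```

Vectors that point in nearly the same direction score close to 1, while unrelated directions score close to 0, regardless of how long the vectors are.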

The en_core_web_md model gives 300-dimensional vectors for "dog" and "cat". You can also use a larger model such as en_core_web_lg, which provides vectors of the same dimension for a larger set of words.
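The word-level comparison extends to sentences and documents because spaCy's default vector for a whole text is the average of its token vectors. In spaCy itself this is a call such as nlp("first text").similarity(nlp("second text")). The sketch below imitates the averaging step in plain Python. TOY_VECTORS, text_vector and text_similarity are hypothetical names, and the three-dimensional values are invented for illustration.

```python
import math

# Hypothetical 3-dimensional word vectors; real models use 300 dimensions.
TOY_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "sleeps": [0.1, 0.9, 0.2],
    "runs": [0.2, 0.8, 0.3],
    "car": [0.0, 0.1, 0.9],
}
WIDTH = 3


def text_vector(text):
    """Average the vectors of all whitespace-separated words in the text."""
    words = text.lower().split()
    if not words:
        return [0.0] * WIDTH
    total = [0.0] * WIDTH
    for word in words:
        # Unknown words contribute an all-zero vector to the average.
        vec = TOY_VECTORS.get(word, [0.0] * WIDTH)
        total = [t + v for t, v in zip(total, vec)]
    return [t / len(words) for t in total]


def cosine_similarity(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def text_similarity(text1, text2):
    """Compare two texts by the cosine similarity of their averaged vectors."""
    return cosine_similarity(text_vector(text1), text_vector(text2))


print(text_similarity("cat sleeps", "dog runs"))
print(text_similarity("cat sleeps", "car car"))
```

Averaging discards word order, so this kind of score reflects overall topic overlap rather than sentence structure.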

Using custom language models:
By switching the language model, we can compute similarities between documents in other languages, such as Latin, French or German, provided a model with vectors is available for that language. At the time of writing, spaCy supported 49 languages. spaCy also lets you assign custom vectors to words as needed. Below is an example.

Output:

Before custom setting
array([0., 0., 0., 0., 0., 0., 0., 0., ---])
After custom setting
array([0.68106073, 0.6037007, 0.9526876, -0.25600302, -0.24049562, ---])

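Code for the custom word vector example:

import spacy
import numpy as np

nlp = spacy.load("en_core_web_md")
# Use the vocabulary that belongs to the loaded pipeline.
vocab = nlp.vocab
new_word = "bucrest"

print("Before custom setting")
# A word without an entry in the vector table has an all-zero vector.
print(vocab.get_vector(new_word))

# Draw a random 300-dimensional vector with values between -1 and 1.
custom_vector = np.random.uniform(-1, 1, (300,))

vocab.set_vector(new_word, custom_vector)

print("After custom setting")
print(vocab.get_vector(new_word))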
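The before/after behaviour can be pictured with a toy lookup table: a word with no entry falls back to an all-zero vector, and assigning a vector replaces that fallback. The class below is a hypothetical stand-in written for this explanation, not spaCy's implementation. ToyVectorTable and its methods only mirror the names get_vector and set_vector for readability.

```python
import random


class ToyVectorTable:
    """A toy word-to-vector table with a zero-vector fallback."""

    def __init__(self, width):
        self.width = width
        self._vectors = {}

    def has_vector(self, word):
        return word in self._vectors

    def get_vector(self, word):
        # Words without an entry get an all-zero vector of the table's width.
        return list(self._vectors.get(word, [0.0] * self.width))

    def set_vector(self, word, vector):
        if len(vector) != self.width:
            raise ValueError("vector width does not match the table width")
        self._vectors[word] = list(vector)


table = ToyVectorTable(width=5)
new_word = "bucrest"

print("Before custom setting")
print(table.get_vector(new_word))

# Random values between -1 and 1, seeded so that runs are repeatable.
rng = random.Random(0)
custom_vector = [rng.uniform(-1, 1) for _ in range(table.width)]
table.set_vector(new_word, custom_vector)

print("After custom setting")
print(table.get_vector(new_word))
```

A randomly drawn vector carries no meaning, so similarity scores involving such a word are arbitrary. In practice the custom vector would come from a trained embedding.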