The SpaCy model —
spaCy supports two methods for finding word similarity: using context-sensitive tensors and using word vectors. Below are the commands to download these models.
# Downloading the small model containing tensors.
python -m spacy download en_core_web_sm
# Downloading over 1 million word vectors.
python -m spacy download en_core_web_lg
Below is the code to find the similarity between two words; the same approach can be extended to sentences and documents.
cat True 6.6808186 False
dog True 7.0336733 False
Similarity: 0.80168545
The “en_core_web_md” model gives vectors of size 300 × 1 for “dog” and “cat”. You can also use the larger en_vectors_web_lg model, which ships over 1 million vectors and therefore covers far more words; its vectors for the same two words are also 300-dimensional.
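To make the similarity score less of a black box, here is a minimal sketch of the calculation behind it. By default spaCy reports the cosine similarity between vectors, and a document's vector defaults to the average of its token vectors. The sketch uses plain Python so it runs without a model download; the 3-dimensional `toy_vectors` are made up for illustration and are not taken from any spaCy model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def average_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]


# Made-up 3-dimensional vectors (assumption: real models use 300 dimensions).
toy_vectors = {
    "cat": [1.0, 2.0, 0.0],
    "dog": [2.0, 1.0, 0.0],
    "car": [0.0, 0.0, 3.0],
}

# Word-level similarity.
word_score = cosine_similarity(toy_vectors["cat"], toy_vectors["dog"])

# "Sentence" vectors built by averaging their word vectors.
pets = average_vector([toy_vectors["cat"], toy_vectors["dog"]])
mixed = average_vector([toy_vectors["dog"], toy_vectors["car"]])
sentence_score = cosine_similarity(pets, mixed)

print("cat vs dog:", round(word_score, 3))
print("pets vs mixed:", round(sentence_score, 3))
```

Because the score depends only on the angle between vectors, two words with very different vector norms can still be highly similar.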
Using custom language models —
By simply switching the language model, we can find similarities between Latin, French or German documents. spaCy currently supports 49 languages. spaCy also lets you supply your own word vectors, so that words are represented according to your needs. Below is an example.
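As a rough illustration of why custom vectors matter, the sketch below overrides one word's vector and compares similarity before and after. It is plain Python, not spaCy code: `base_vectors`, `custom_vectors` and `lookup` are hypothetical names, and the numbers are invented. In spaCy itself, the analogous step is assigning a vector to a vocabulary entry with `nlp.vocab.set_vector(word, vector)`.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical general-purpose vectors: "python" sits next to "snake".
base_vectors = {
    "python": [1.0, 0.0, 0.0],
    "snake": [0.9, 0.1, 0.0],
    "java": [0.0, 1.0, 0.0],
}

# Hypothetical user-supplied override for a programming corpus,
# where "python" should sit next to "java" instead.
custom_vectors = {"python": [0.1, 0.9, 0.0]}


def lookup(word, custom, base):
    """Prefer the user-supplied vector and fall back to the base table."""
    if word in custom:
        return custom[word]
    return base[word]


before = cosine_similarity(base_vectors["python"], base_vectors["java"])
after = cosine_similarity(
    lookup("python", custom_vectors, base_vectors),
    lookup("java", custom_vectors, base_vectors),
)

print("python vs java, base vectors:", round(before, 3))
print("python vs java, custom vectors:", round(after, 3))
```

With the override in place, "python" and "java" become close neighbours, which is the effect domain-specific vectors are meant to achieve.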