There are many ways to compare strings in Python. Some of the main methods are:
- Using regular expressions
- Simple comparison
- Using difflib
One of the simplest methods, however, is to use the fuzzywuzzy library, which returns a similarity score out of 100; with the simple ratio, a score of 100 means the two strings are equal. This article explains how to get started with the fuzzywuzzy library.
FuzzyWuzzy is a Python library for string matching. Fuzzy string matching is the process of finding strings that approximately match a given pattern. It mainly uses Levenshtein distance to calculate the differences between sequences.
FuzzyWuzzy was developed and released by SeatGeek, a ticket finder for sports and concert events. Their original use case is described on their blog.
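To make the idea of Levenshtein distance concrete, here is a minimal pure-Python sketch of the classic dynamic-programming algorithm. The function name levenshtein is chosen for illustration and is not part of FuzzyWuzzy, which reports normalized scores out of 100 rather than raw edit counts.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    # previous[j] is the distance between the prefix of a processed so far
    # (excluding the current character) and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("geeks", "geeks"))     # 0
```

The smaller the distance, the more similar the strings; a distance of 0 means they are identical.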
FuzzyWuzzy requirements:
- Python 2.4 or higher
- python-Levenshtein
Install via pip:
pip install fuzzywuzzy
pip install python-Levenshtein
How to use this library?
First, import the fuzz and process modules from the fuzzywuzzy package.
Simple ratio usage:
Partial ratio usage:
fuzz.partial_ratio("geeks for geeks", "geeks for geeks!")
100
# The second string has an extra exclamation point, but the first string still matches part of it exactly, so the score is 100.
fuzz.partial_ratio("geeks for geeks", "geeks geeks")
64
# The score is lower because there is an extra token in the middle of the string.
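The idea behind the partial ratio can be sketched with the standard-library difflib module: slide the shorter string across the longer one and keep the best score of any window. This is a simplified illustration of the idea, not FuzzyWuzzy's actual implementation, and the helper name partial_ratio_sketch is hypothetical.

```python
from difflib import SequenceMatcher


def partial_ratio_sketch(s1: str, s2: str) -> int:
    """Best score out of 100 between the shorter string and any
    equally long window of the longer string."""
    shorter, longer = sorted((s1, s2), key=len)
    if not shorter:
        return 0
    best = 0.0
    # Try every window of the longer string with the shorter string's length.
    for start in range(len(longer) - len(shorter) + 1):
        window = longer[start:start + len(shorter)]
        best = max(best, SequenceMatcher(None, shorter, window).ratio())
    return int(round(100 * best))


# The shorter string appears verbatim inside the longer one.
print(partial_ratio_sketch("geeks for geeks", "geeks for geeks!"))  # 100

# No window matches exactly, so the score is below 100.
print(partial_ratio_sketch("geeks for geeks", "geeks geeks"))
```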
Next are the token sort ratio and the token set ratio, which split the strings into tokens (words) before comparing them.
Now suppose we have a list of options and want to find the closest matches to a given string. For this we can use the process module.
There is also another ratio that is used often, called WRatio. It is sometimes better to use WRatio instead of the simple ratio, because WRatio handles differences between lowercase and uppercase and some other details, such as punctuation.
Full code
Output:
FuzzyWuzzy Ratio: 84
FuzzyWuzzy PartialRatio: 85
FuzzyWuzzy TokenSortRatio: 84
FuzzyWuzzy TokenSetRatio: 86
FuzzyWuzzy WRatio: 84
List of ratios: [('g. for geeks', 95), ('geek for geek', 93), ('geek geek', 86)]
Best among the above list: ('g. for geeks', 95)
FuzzyWuzzy is built on top of the difflib library, and python-Levenshtein is used to speed it up. This makes it one of the most convenient ways to match strings in Python.