
# FuzzyWuzzy Python library

There are many ways to compare strings in Python. Some of the main methods are:

1. Using regular expressions
2. Simple comparison
3. Using difflib
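For comparison, the three approaches in this list can be sketched with the standard library alone. The sample strings and variable names below are illustrative assumptions, not taken from any particular project.

```python
import difflib
import re

# Illustrative inputs (assumed for this sketch)
a = "GeeksforGeeks"
b = "geeksforgeeks"

# 1. Regular expression: case-insensitive match of the whole string
regex_match = re.fullmatch(re.escape(a), b, flags=re.IGNORECASE) is not None

# 2. Simple comparison: exact equality, so case differences count
exact_match = a == b

# 3. difflib: similarity as a float between 0.0 and 1.0
similarity = difflib.SequenceMatcher(None, a, b).ratio()

print(regex_match, exact_match, round(similarity, 2))
```

Only the difflib variant gives a graded answer rather than a yes/no result, which is the behaviour fuzzy matching builds on.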

A simpler option is the fuzzywuzzy library, which gives a similarity index as a score out of 100, where 100 means the two strings are equal. This article explains how to get started with the fuzzywuzzy library.

FuzzyWuzzy is a Python library used for string matching. Fuzzy string matching is the process of finding strings that approximately match a given pattern. It uses Levenshtein distance to calculate the differences between sequences.
FuzzyWuzzy was developed and open-sourced by SeatGeek, a ticket finder for sports and concert events. Their original use case is described in their blog.
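To make the idea of Levenshtein distance concrete, here is a minimal pure-Python sketch of the classic dynamic-programming version. It is written for illustration only and is not the code fuzzywuzzy or python-Levenshtein actually runs; the function name `levenshtein` is an assumption of this sketch.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

A smaller distance means the strings are closer; similarity scores such as the ones in this article are derived by scaling this kind of measure by the string lengths.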

Requirements

• Python 2.7 or higher
• python-Levenshtein (optional, used for speed)

Install via pip:

```
pip install fuzzywuzzy
pip install python-Levenshtein
```

How to use this library?

First, import these modules:

```python
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
```

Simple ratio usage:

```python
fuzz.ratio('geeksforgeeks', 'geeksgeeks')  # 87

# Exact match
fuzz.ratio('GeeksforGeeks', 'GeeksforGeeks')  # 100

# Case matters for the simple ratio
fuzz.ratio('geeks for geeks', 'Geeks For Geeks')  # 80
```
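A simple ratio of this kind can be approximated with difflib, which the library builds on. The sketch below assumes the score is 2·M/T scaled to 0–100, where M is the number of matched characters and T is the combined length of both strings; `simple_ratio` is a hypothetical helper, not part of fuzzywuzzy.

```python
from difflib import SequenceMatcher


def simple_ratio(s1: str, s2: str) -> int:
    """Similarity from 0 to 100, based on difflib's 2*M/T ratio."""
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


print(simple_ratio("geeksforgeeks", "geeksgeeks"))            # 87
print(simple_ratio("GeeksforGeeks", "GeeksforGeeks"))         # 100
print(simple_ratio("geeks for geeks", "Geeks For Geeks"))     # 80
```

In the last call three of the fifteen characters differ only by case, so 12 characters match on each side and the score is 2·12/30 = 80.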

Partial ratio usage:

```python
fuzz.partial_ratio("geeks for geeks", "geeks for geeks!")  # 100
# There is an exclamation point in the second string,
# but the words are otherwise the same, so the score comes to 100

fuzz.partial_ratio("geeks for geeks", "geeks geeks")  # 64
# The score is lower because there is an additional
# token in the middle of the string.
```
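One way to picture partial matching is to slide the shorter string across the longer one and keep the best window score. The sketch below does exactly that with difflib; it is a brute-force approximation under that assumption, not fuzzywuzzy's own implementation, and `partial_ratio_sketch` is a hypothetical name.

```python
from difflib import SequenceMatcher


def partial_ratio_sketch(s1: str, s2: str) -> int:
    """Best 0-100 score of the shorter string against any
    equally long window of the longer string."""
    shorter, longer = sorted((s1, s2), key=len)
    best = 0.0
    for start in range(len(longer) - len(shorter) + 1):
        window = longer[start:start + len(shorter)]
        best = max(best, SequenceMatcher(None, shorter, window).ratio())
    return int(round(100 * best))


# The trailing "!" falls outside the best window, so the match is perfect
print(partial_ratio_sketch("geeks for geeks", "geeks for geeks!"))  # 100

# No window lines up fully when a word is missing from the middle
print(partial_ratio_sketch("geeks for geeks", "geeks geeks"))
```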

Next, the token sort ratio and the token set ratio:

```python
# Token Sort Ratio
fuzz.token_sort_ratio("geeks for geeks", "for geeks geeks")  # 100
# This gives 100 because every word is the same, regardless of position

# Token Set Ratio
fuzz.token_sort_ratio("geeks for geeks", "geeks for for geeks")  # 88
fuzz.token_set_ratio("geeks for geeks", "geeks for for geeks")  # 100
# The score comes to 100 in the second case because token_set_ratio
# considers duplicate words as a single word.
```
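The difference between the two token-based scores is easier to see in code. The following is a rough standard-library sketch: token sort compares the strings after sorting their words, and token set compares the shared words against each string's leftover words. The helper names are assumptions, and the real library also normalizes case and punctuation first, which is skipped here.

```python
from difflib import SequenceMatcher


def _ratio(a: str, b: str) -> int:
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def token_sort_sketch(s1: str, s2: str) -> int:
    """Sort the words of each string, then compare the rejoined strings."""
    return _ratio(" ".join(sorted(s1.split())), " ".join(sorted(s2.split())))


def token_set_sketch(s1: str, s2: str) -> int:
    """Compare shared words against shared-plus-unique words, keep the best."""
    tokens1, tokens2 = set(s1.split()), set(s2.split())  # sets drop duplicates
    common = " ".join(sorted(tokens1 & tokens2))
    combined1 = (common + " " + " ".join(sorted(tokens1 - tokens2))).strip()
    combined2 = (common + " " + " ".join(sorted(tokens2 - tokens1))).strip()
    return max(
        _ratio(common, combined1),
        _ratio(common, combined2),
        _ratio(combined1, combined2),
    )


print(token_sort_sketch("geeks for geeks", "for geeks geeks"))      # 100
print(token_sort_sketch("geeks for geeks", "geeks for for geeks"))  # 88
print(token_set_sketch("geeks for geeks", "geeks for for geeks"))   # 100
```

Because both strings reduce to the same set of words, the repeated "for" has no effect on the set-based score.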

Now suppose we have a list of options and want to find the closest matches. For that we can use the process module:

```python
query = 'geeks for geeks'
choices = ['geek for geek', 'geek geek', 'g. for geeks']

# Get a list of matches ordered by score; the default limit is 5
process.extract(query, choices)
# [('g. for geeks', 95), ('geek for geek', 93), ('geek geek', 86)]

# If we only want the top one
process.extractOne(query, choices)
# ('g. for geeks', 95)
```
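Conceptually, ranking a list of choices comes down to scoring each one against the query, sorting by score, and keeping the top few. The sketch below assumes a plain difflib ratio as the scorer, whereas fuzzywuzzy's process module defaults to WRatio, so the scores and ordering here will differ from the library's. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def extract_sketch(query, choices, limit=5):
    """Return up to `limit` (choice, score) pairs, best score first."""
    scored = [
        (choice, int(round(100 * SequenceMatcher(None, query, choice).ratio())))
        for choice in choices
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


def extract_one_sketch(query, choices):
    """Return the single best (choice, score) pair, or None if empty."""
    ranked = extract_sketch(query, choices, limit=1)
    return ranked[0] if ranked else None


query = "geeks for geeks"
choices = ["geek for geek", "geek geek", "g. for geeks"]
ranked = extract_sketch(query, choices)
print(ranked)
print(extract_one_sketch(query, choices))
```

Swapping in a different scoring function changes the ranking without touching the sort-and-slice logic, which is why the scorer is worth treating as a parameter.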

There is one more ratio that is often used, called WRatio. Sometimes it is better to use WRatio instead of the simple ratio, because WRatio handles lowercase and uppercase as well as some other differences.

```python
fuzz.WRatio('geeks for geeks', 'Geeks For Geeks')  # 100
fuzz.WRatio('geeks for geeks!!!', 'geeks for geeks')  # 100

# whereas the simple ratio gives a lower score for the case above
fuzz.ratio('geeks for geeks!!!', 'geeks for geeks')  # 91
```
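Part of the reason case and punctuation stop mattering is that the strings are normalized before scoring. The sketch below assumes a simplified normalization (lowercase, non-word characters replaced by spaces, ends trimmed) followed by a difflib ratio. It illustrates only that preprocessing step, not the weighting WRatio applies across several ratios, and `normalize` and `normalized_ratio` are hypothetical names.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, turn runs of non-word characters into one space, trim."""
    return re.sub(r"\W+", " ", text.lower()).strip()


def normalized_ratio(s1: str, s2: str) -> int:
    a, b = normalize(s1), normalize(s2)
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


print(normalized_ratio("geeks for geeks", "Geeks For Geeks"))     # 100
print(normalized_ratio("geeks for geeks!!!", "geeks for geeks"))  # 100
```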

Full code

```python
# Python 3 code showing all the ratios together
# Make sure the fuzzywuzzy module is installed

from fuzzywuzzy import fuzz
from fuzzywuzzy import process

s1 = "I love GeeksforGeeks"
s2 = "I am loving GeeksforGeeks"
print("FuzzyWuzzy Ratio:", fuzz.ratio(s1, s2))
print("FuzzyWuzzy PartialRatio:", fuzz.partial_ratio(s1, s2))
print("FuzzyWuzzy TokenSortRatio:", fuzz.token_sort_ratio(s1, s2))
print("FuzzyWuzzy TokenSetRatio:", fuzz.token_set_ratio(s1, s2))
print("FuzzyWuzzy WRatio:", fuzz.WRatio(s1, s2), "\n")

# For the process module
query = "geeks for geeks"
choices = ["geek for geek", "geek geek", "g. for geeks"]
print("List of ratios:")
print(process.extract(query, choices), "\n")
print("Best among the above list:", process.extractOne(query, choices))
```

Output:

```
FuzzyWuzzy Ratio: 84
FuzzyWuzzy PartialRatio: 85
FuzzyWuzzy TokenSortRatio: 84
FuzzyWuzzy TokenSetRatio: 86
FuzzyWuzzy WRatio: 84

List of ratios:
[('g. for geeks', 95), ('geek for geek', 93), ('geek geek', 86)]

Best among the above list: ('g. for geeks', 95)
```

FuzzyWuzzy is built on top of the difflib library, and python-Levenshtein is used for speed. That makes it one of the more convenient ways to match strings in Python.
