Python | String similarity metrics



Method #1: Using a naive approach (sum() + zip())
A naive way to do this combines sum() and zip() in a utility function: pad the shorter string with spaces so both strings have the same length, count the positions at which the characters match, and divide that count by the length.

# Python3 demo code
# similarity between strings
# using the naive method (sum() + zip())

# Utility function for calculating similarity
def similar(str1, str2):
    # Pad the shorter string with spaces so both have equal length
    str1 = str1 + ' ' * (len(str2) - len(str1))
    str2 = str2 + ' ' * (len(str1) - len(str2))
    # Fraction of positions at which the characters match
    return sum(1 if i == j else 0
               for i, j in zip(str1, str2)) / float(len(str1))

# Initializing strings
test_string1 = 'Geeksforgeeks'
test_string2 = 'Geeks4geeks'

# using the naive method (sum() + zip())
# similarity between strings
res = similar(test_string1, test_string2)

# print result
print("The similarity between 2 strings is: " + str(res))

Output:

 The similarity between 2 strings is: 0.38461538461538464 
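
The same position-by-position idea can be sketched without padding. This is a minimal variant, not the code above: the name positional_similarity is hypothetical, and treating two empty strings as identical (1.0) is an assumption made here to avoid dividing by zero.

```python
from itertools import zip_longest


def positional_similarity(str1, str2):
    """Fraction of positions at which both strings hold the same character."""
    longest = max(len(str1), len(str2))
    if longest == 0:
        # Assumption: two empty strings are treated as identical.
        return 1.0
    # zip_longest fills the shorter string with None, which never equals
    # a character, so the extra positions count as mismatches.
    matches = sum(a == b for a, b in zip_longest(str1, str2))
    return matches / longest


# Only the first five characters ("Geeks") line up, out of 13 positions.
print(positional_similarity("Geeksforgeeks", "Geeks4geeks"))
```

For the two test strings this gives 5/13, the same value as above. The comparison is strictly positional, so a single inserted or deleted character shifts everything after it and lowers the score sharply.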

Method #2: Using SequenceMatcher.ratio()
The standard library's difflib module provides the SequenceMatcher class, whose ratio() method returns a similarity score between 0 and 1. This is the recommended way to perform the task, since it relies on a built-in construct instead of hand-written comparison logic.

# Python3 demo code
# similarity between strings
# using SequenceMatcher.ratio()

from difflib import SequenceMatcher

# Utility function for calculating similarity
def similar(str1, str2):
    # None means no characters are treated as junk
    return SequenceMatcher(None, str1, str2).ratio()

# Initializing strings
test_string1 = 'Geeksforgeeks'
test_string2 = 'Geeks'

# using SequenceMatcher.ratio()
# similarity between strings
res = similar(test_string1, test_string2)

# print result
print("The similarity between 2 strings is: " + str(res))

Output:

 The similarity between 2 strings is: 0.5555555555555556
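
To see where this number comes from: the difflib documentation defines ratio() as 2.0 * M / T, where M is the number of matched characters and T is the combined length of both strings. The sketch below recomputes that value from get_matching_blocks(). The helper name ratio_by_hand is hypothetical and is only meant to illustrate the formula.

```python
from difflib import SequenceMatcher


def ratio_by_hand(str1, str2):
    """Recompute SequenceMatcher.ratio() as 2.0 * M / T."""
    matcher = SequenceMatcher(None, str1, str2)
    # Each matching block is an (a, b, size) triple; the final block is a
    # size-0 sentinel, so it adds nothing to the sum.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    total = len(str1) + len(str2)
    # Two empty strings are reported as identical, as ratio() does.
    return 2.0 * matched / total if total else 1.0


# "Geeks" matches in full: M = 5, T = 13 + 5 = 18, so the score is 10 / 18.
print(ratio_by_hand("Geeksforgeeks", "Geeks"))
```

Unlike the naive method, the matched blocks do not have to sit at the same positions in both strings, so insertions and deletions are penalized far less.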