I want to measure the similarity between two strings. This page has examples of some similarity metrics. Python has an implementation of the Levenshtein algorithm. Is there a better algorithm (and hopefully a Python library) under these constraints?
Would something other than Levenshtein distance (or Levenshtein ratio) be a better algorithm for my case?
There's a great resource for string similarity metrics at the University of Sheffield. It has a list of various metrics (beyond just Levenshtein) and has open-source implementations of them. Looks like many of them should be easy to adapt into Python.
Here's a bit of the list:
I realize it's not the same thing, but this is close enough:
>>> import difflib
>>> a = 'Hello, All you people'
>>> b = 'hello, all You peopl'
>>> seq = difflib.SequenceMatcher(a=a.lower(), b=b.lower())
>>> seq.ratio()
0.97560975609756095
You can wrap this in a function:
def similar(seq1, seq2):
    return difflib.SequenceMatcher(a=seq1.lower(), b=seq2.lower()).ratio() > 0.9

>>> similar(a, b)
True
>>> similar('Hello, world', 'Hi, world')
False
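For comparison with the Levenshtein distance mentioned in the question, here is a minimal pure-Python sketch of the classic dynamic-programming edit distance. The names `levenshtein` and `levenshtein_similarity` are my own, and dividing by the longer string's length is just one common way to get a 0-to-1 score; it is not necessarily the "ratio" that any particular Levenshtein library reports.

```python
def levenshtein(s, t):
    """Return the edit distance between s and t.

    Counts single-character insertions, deletions and substitutions.
    """
    if len(s) < len(t):
        s, t = t, s  # keep the row we store as short as possible

    # previous[j] is the distance between the processed prefix of s and t[:j]
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]  # distance from s[:i] to the empty string
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(
                previous[j] + 1,         # delete cs
                current[j - 1] + 1,      # insert ct
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(s, t):
    """Hypothetical normalization: 1.0 means identical, 0.0 means nothing in common."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(s, t) / longest


print(levenshtein("kitten", "sitting"))  # 3
```

This runs in time proportional to `len(s) * len(t)`, so for long strings or many comparisons a C-backed library is likely to be the better choice.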