I have implemented a string-matching function in Python that uses character n-grams and similarity ratios. Here is a condensed version of it:
```python
# concise version of the function
def match_strings(strings1, strings2, ngram_n=2, threshold=0):
    similarities = numerators / denominators
    similarities = np.where(similarities > threshold, similarities, 0)
```
The function compares two lists of strings (`strings1` and `strings2`) and returns matches based on their similarity ratio, filtered by an optional `threshold` parameter.
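The full body isn't shown above, so here is a minimal, self-contained sketch of how such a similarity matrix could be computed, for anyone who wants to reproduce the behavior. The details are assumptions: `ngrams` and `similarity_matrix` are hypothetical helpers, strings are lowercased, and the ratio is taken as Jaccard similarity (shared n-grams divided by all distinct n-grams of the pair).

```python
import numpy as np

def ngrams(s, n=2):
    """Set of character n-grams of s (lowercasing is an assumption)."""
    s = s.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def similarity_matrix(strings1, strings2, ngram_n=2):
    """Hypothetical stand-in for numerators / denominators:
    Jaccard similarity of the n-gram sets for every (i, j) pair."""
    grams1 = [ngrams(s, ngram_n) for s in strings1]
    grams2 = [ngrams(s, ngram_n) for s in strings2]
    numerators = np.array([[len(a & b) for b in grams2] for a in grams1], dtype=float)
    denominators = np.array([[len(a | b) for b in grams2] for a in grams1], dtype=float)
    # Avoid 0/0 when both strings are shorter than ngram_n.
    return np.divide(numerators, denominators,
                     out=np.zeros_like(numerators), where=denominators > 0)
```

Under these assumptions, `similarity_matrix(["J.S"], ["jayanthsjay"])` yields a single score of `0.0`.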
However, even when I set `threshold=0`, which I intended to mean "match every pair regardless of how dissimilar they are," some string pairs still fail to match.
For example, when `strings1` contains `'J.S'` and `strings2` contains `'jayanthsjay'`, the function doesn't report them as a match, even though the threshold is 0.
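A quick check of this particular pair, assuming lowercased character bigrams and a Jaccard-style ratio (the `ngrams` helper is hypothetical), shows that the two strings share no bigrams at all. Their similarity is therefore exactly `0.0`, and the strict comparison `similarities > threshold` evaluates `0.0 > 0` as `False`:

```python
import numpy as np

def ngrams(s, n=2):
    """Hypothetical helper: lowercased character n-grams of s."""
    s = s.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}

a, b = ngrams("J.S"), ngrams("jayanthsjay")
shared = a & b                          # empty: no bigram in common
similarity = len(shared) / len(a | b)   # 0 / 10 -> 0.0

# The line from the question: with threshold=0, 0.0 > 0 is False,
# so this pair is mapped to 0, the same value used for a rejected pair.
kept = np.where(np.array([similarity]) > 0, similarity, 0)
```

After `np.where`, a pair with a genuine score of `0.0` looks exactly like a pair that was filtered out.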
Could anyone help me understand why setting the threshold to 0 doesn't make every string pair match, and how I can modify the function so that it does? Thank you!
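One possible direction, sketched under the assumption that matches are later collected from the non-zero entries of the matrix (for example with `np.nonzero` or a sparse matrix, where zeros are never stored): make the comparison inclusive and keep an explicit boolean mask instead of encoding "no match" as 0. `match_pairs` is a hypothetical helper, not part of the original function.

```python
import numpy as np

def match_pairs(similarities, threshold=0.0):
    """Return (i, j, score) for every pair with score >= threshold.

    The mask is kept separately from the scores, so a genuine
    similarity of 0.0 is not confused with a rejected pair.
    """
    mask = similarities >= threshold   # inclusive: threshold=0 keeps all pairs
    rows, cols = np.nonzero(mask)      # indices where the mask is True
    return [(int(i), int(j), float(similarities[i, j])) for i, j in zip(rows, cols)]

sims = np.array([[0.0, 0.5],
                 [1.0, 0.2]])
print(match_pairs(sims, threshold=0))    # all four pairs, including the 0.0 one
print(match_pairs(sims, threshold=0.3))  # [(0, 1, 0.5), (1, 0, 1.0)]
```

Keeping the `np.where(...)` line but switching `>` to `>=` alone would not help in that case, because zero-similarity pairs would still be stored as 0 and dropped when the non-zero entries are read.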