Friday, May 27, 2016

Comparison and Review of .NET Fuzzy Matching Nuget Packages

I am simply using Jaro-Winkler to get a similarity factor of 2 strings.  I'm using this for name and address comparisons and doing my own score aggregation and weighting.

I first tried Fuzzy String.  Unfortunately, it has several issues preventing it from working properly.  Even among these issues, I found other examples that caused the Jaro-Winkler algorithm to go into an infinite loop.  It's funny that this package has a 5 star rating, because for my use case, only using Jaro-Winkler, it failed miserably.

Then I tried BlueSimilarity.  This package also had issues loading a BlueSimilarity.Interop.dll.  At this point I was tired of troubleshooting and just wanted a solution that worked.  Besides, on nuget the project site is a broken link.  Man.

Finally I tried SimMetrics-TextFunctions.  This worked really well!  I had a few small unit tests to simply verify that the bugs in FuzzyString are not in this implementation.  Awesome!
EDIT: Wow.  I found out 7 months later that this package does indeed have a bug.  It is easy to work around, but I consider it a bug non-the-less.  This code, with a space prefix on one of the strings returns with a zero similarity.

No comments:

Post a Comment