Friday, May 27, 2016

Comparison and Review of .NET Fuzzy Matching Nuget Packages

I am simply using Jaro-Winkler to get a similarity factor of 2 strings.  I'm using this for name and address comparisons and doing my own score aggregation and weighting.

I first tried Fuzzy String.  Unfortunately, it has several issues preventing it from working properly.  Even among these issues, I found other examples that caused the Jaro-Winkler algorithm to go into an infinite loop.  It's funny that this package has a 5 star rating, because for my use case, only using Jaro-Winkler, it failed miserably.

Then I tried BlueSimilarity.  This package also had issues loading a BlueSimilarity.Interop.dll.  At this point I was tired of troubleshooting and just wanted a solution that worked.  Besides, on nuget the project site is a broken link.  Man.

Finally I tried SimMetrics-TextFunctions.  This worked really well!  I had a few small unit tests to simply verify that the bugs in FuzzyString are not in this implementation.  Awesome!
EDIT: Wow.  I found out 7 months later that this package does indeed have a bug.  It is easy to work around, but I consider it a bug non-the-less.  This code, with a space prefix on one of the strings returns with a zero similarity.  EDIT 2: Over a year later, I have found a bug.  With strings "Canyon Rd" and "Canyon Est Dr" I am getting a similarity score of .4.  It should be much higher than that.  So...  I'm changing my implementation... again.

Now I am using some code from Stack Overflow.  I'm really surprised that all the NuGet packages have some kind of issue and that this bit of code passes where all the others fail.  Thank goodness for unit tests.  I have a ton of them.

Monday, May 9, 2016

My Entity Framework Cheat Sheet

This is a list of tools I use to get generated code and get work done quickly with Entity Framework.

Generate Classes for Entity Framework Code-First from Database

  1. Prerequisite: An existing database
  2. Do one time:  Download Entity Framework 6 Tools for Visual Studio 2012 & 2013.
  3. Right click a project
  4. Add new item...
  5. Search for ADO, select ADO.NET Entity DataModel.
  6. Name it.
  7. Select Code First from database.
  8. Select / Create Database connection.
  9. Select tables you want to include.
  10. Finish
  11. Now you have models and a DbContext to work with.

Generate SQL/Database from a DbContext

  1. For the entities, create a models directory and create classes in it.
    1. If your primary key does not follow convention name, use the [Key] attribute.
    2. Create navigation properties for many to one relationships; create a property for the ID and the object.
    3. If your nav prop ID does not follow convention, use the [ForeignKey] attribute on the object.
    4. Other attributes to use: Required, MaxLength, Index, InverseProperty, Column (if you want your db column to have a different name), [Table("TableName", Schema = "SchemaName")]
  2. VIEW > Other > Package Manager Console.
  3. Make sure the console is showing the correct project.  You also may need to set that project as the startup project in VS so it can find the correct config file with the DB connection string.
  4. If you are trying to recreate your database, make sure to drop the DB or change the connection string.  Also remove the Migrations folder.
  5. Make sure you have a DbContext file with:
    1. A good database connection string.
    2. Sub-classing DbContext
    3. public DbSet<Entity> Entities {get;set;}
  6. Type: Enable-Migrations
  7. Add-Migration MigrationName
  8. Update-Database -Script.  Save off the SQL if you want it.
  9. Update-Database.  Now your database tables are created.