Call now: (800) 766-1884  



 Home


 SQL Server Tips
 SQL Server Training

 SQL Server Consulting
 SQL Server Support
 SQL Server Remote DBA



 Articles
 Services
 SQL Server Scripts
 Scripts Menu



 

 

 

   
  SQL Server Tips by Gama and Naughter


Levenshtein distance in .NET

The code used in this function is adapted from VB6 with minimal changes. The original code was developed by Michael Gilleland from Merriam Park Software. More info and many flavors of the algorithm can be found at http://www.merriampark.com/ld.htm.

The Levenshtein distance is a measure of how different two words are. Zero means that the words are identical and each different character increases the distance by one. Missing or extra characters are also considered the same way as different characters.
This algorithm uses arrays that TSQL can emulate with strings and for loops that can be emulated with while loops. The TSQL functions for string manipulation are very fast but still, the extra code for the emulation adds too much overhead and the performance is severely affected.

Four implementations of the algorithm were tested: TSQL, .NET and two XPís.

The TSQL code is a SP called spLevenshteinTSQL and it uses a CASE statement instead of another function to calculate the minimum of three numbers to improve performance:

CASE WHEN @a<=@b AND @a<=@c THEN @a
WHEN @b<=@a AND @b<=@c THEN @b
WHEN @c<=@a AND @c<=@b THEN @c
END


The .NET function is called spLevenshteinVB and the XPís are called XPLevenshtein (using Michael Gillelandís code) and XPLevenshtein2 (using Anders Sewerin Johansenís code). XPLevenshtein2 uses a dynamic array and it should be not only faster but also more memory efficient.

The test code repeats a call to the function 10,000 times with two words of 5 and 6 characters each.

DECLARE @start datetime
SET @start=GETDATE()
DECLARE @counter int, @ld int
DECLARE @word1 VARCHAR(100), @word2 VARCHAR(100)
SET @word1='flying'
SET @word2='fleeing'
SET @counter=1
WHILE @counter <=10000
BEGIN
EXEC master..XP_LEVENSHTEIN @word1, @word2, @ld OUT
--EXEC master..XP_LEVENSHTEIN2 @word1, @word2, @ld OUT
--EXEC spLevenshteinTSQL @word1, @word2, @ld OUT
--EXEC spLevenshteinVB @word1, @word2, @ld OUT
SET @counter=@counter +1
END
SELECT datediff(ms,@start, GETDATE())


The second test involved two strings with 8000 characters each.


The above book excerpt is from:

Super SQL Server Systems
Turbocharge Database Performance with C++ External Procedures

ISBN: 0-9761573-2-2
Joseph Gama, P. J. Naughter

 http://www.rampant-books.com/book_2005_2_sql_server_external_procedures.htm
 

 

Burleson Consulting Remote DB Administration


 

 


 

 

 

 

 
Burleson is the America's Team

Note: The pages on this site were created as a support and training reference for use by our staff of DBA consultants.  If you find it confusing, please exit this page.

Errata?  SQL Server technology is changing and we strive to update our SQL Server support information.  If you find an error or have a suggestion for improving our content, we would appreciate your feedback.  Just  e-mail:and include the URL for the page.
 


Burleson Consulting
SQL Server database support

 

Copyright © 1996 -  2013 by Vaaltech Web Services. All rights reserved.

Hit Counter