Call now: (800) 766-1884  



 Home


 SQL Server Tips
 SQL Server Training

 SQL Server Consulting
 SQL Server Support
 SQL Server Remote DBA



 Articles
 Services
 SQL Server Scripts
 Scripts Menu



 

 

 

 
 

SQL Server

SQL Server Tips by Gama and Naughter Consulting

 


Implementation

If we just want to integrate regular expressions into SQL Server using a “quick and dirty” approach, then we could use the VBScript regular expression COM objects in conjunction with the “sp_OA*” functions. This is the path taken by Cory Koski in his article on using regular expressions in SQL Server at http://sqlteam.com/item.asp?ItemID=13947. An XP implementation of a regular expression on the other hand should be much faster as it avoids the inherent overhead in the “sp_OA*” calls.

The first thing to decide is what regular expression engine to use to implement the XP in. There are a number of choices we could pick from including reusing the VBScript regular expression COM object. Unfortunately this COM object does not provide an early bound interface meaning that we would still have the overhead involved in the “sp_OA*” function calls. Another alternative is PCRE with PCRE++ (a C++ wrapper-class for the Perl Compatible Regular Expressions library).

The choice we finally plump for is the ATL Regular expression parser that is included in the ATL Server framework provided in Visual Studio .NET 2003. Unlike most of the other XP’s in this book, this is one XP, which we must compile in this latest version of Visual Studio because the regular expression support we use was added in this version. Another nice advantage, which the ATL regular expression classes have, is that it handles MBCS and Unicode data as well as ASCII. Doing searches on MBCS strings is a very handy feature as many Asian languages use these types of strings. The classes we use are “ATL::CAtlRegExp" and “ATL::CAtlREMatchContext”. They provide a very complete set of regular expression operators, support case sensitivity, has a small memory footprint and it is very fast as it is implemented as raw C++ template based code.

As is usual we use the XP++ framework to provide the skeleton to the XP DLL code. As this XP requires Visual Studio .NET 2003, it is also a good test of the XP++ framework in compilers other than Visual Studio 6, which is used by most of the other XP++ samples.

The XP namely XP_REGEXP exposes 3 XP’s using the following XP++ code:

IMPLEMENT_XP(XP_REGEXP_ASCII, CASCIIRegExpExtendedStoredProcedure)
IMPLEMENT_XP(XP_REGEXP_MBCS, CMBCSRegExpExtendedStoredProcedure)
IMPLEMENT_XP(XP_REGEXP_UNICODE, CUnicodeRegExpExtendedStoredProcedure)

The module (XP_REGEXPProc.cpp) contains the code for the individual XP’s. The code is divided in three blocks, with very similar code, one for ASCII, one for MBCS and another one for Unicode. This is required due to the way the ATL classes are designed. The “CAtlRegExp” class takes a template traits class, which specifies what character set is to be used. Each one of these XP’s is made up of four basic steps for matching a regular expression.

First, the parameter data is obtained (string to be searched, regular expression and boolean to determine case sensitivity) as follows:

ATL::CAtlRECharTraitsMB::RECHARTYPE* pszSourceText = (ATL::CAtlRECharTraitsMB::RECHARTYPE*) XP_ASCIITEXT(m_pParameterData[REGEXP_MBCS_SOURCETEXT_INDEX].m_Data);

ATL::CAtlRECharTraitsMB::RECHARTYPE* pszRegularExpression = (ATL::CAtlRECharTraitsMB::RECHARTYPE*) XP_ASCIITEXT(m_pParameterData[REGEXP_MBCS_REGEXP_INDEX].m_Data);

DBBOOL bCaseSensitive = XP_BIT(m_pParameterData[REGEXP_MBCS_CASESENSITIVE_INDEX].m_Data);

Second, an instance of the regular expression parser is created:

ATL::CAtlRegExp<ATL::CAtlRECharTraitsA> regexp;

Third, the regular expression is parsed ready for it to be used:

ATL::REParseError reParseError = regexp.Parse(pszRegularExpression, bCaseSensitive);

Finally, the match is performed:

ATL::CAtlREMatchContext<ATL::CAtlRECharTraitsA> mcRegExp;
BOOL bMatch = regexp.Match(pszSourceText, &mcRegExp);

The “bMatch” variable is then returned as an OUTPUT parameter in the XP. Note that the code snippets above were based on the ASCII XP.

Each XP in XP_REGEXP has four parameters, namely @SourceText, @RegExp, @CaseSensitive and @Match. “SourceText” is the string being searched, “RegExp” is the regular expression, “CaseSensitive” is the case sensitivity flag and “Match” is an output parameter boolean that returns the result of the match.


The above book excerpt is from:

Super SQL Server Systems
Turbocharge Database Performance with C++ External Procedures

ISBN: 0-9761573-2-2
Joseph Gama, P. J. Naughter

 http://www.rampant-books.com/book_2005_2_sql_server_external_procedures.htm

.


 

 

 

Burleson Consulting Remote DB Administration


 

 


 

 

 

 

 
Burleson is the America's Team

Note: The pages on this site were created as a support and training reference for use by our staff of DBA consultants.  If you find it confusing, please exit this page.

Errata?  SQL Server technology is changing and we strive to update our SQL Server support information.  If you find an error or have a suggestion for improving our content, we would appreciate your feedback.  Just  e-mail:and include the URL for the page.
 


Burleson Consulting
SQL Server database support

 

Copyright © 1996 -  2013 by Vaaltech Web Services. All rights reserved.

Hit Counter