Call now: (800) 766-1884  



 Home


 SQL Server Tips
 SQL Server Training

 SQL Server Consulting
 SQL Server Support
 SQL Server Remote DBA



 Articles
 Services
 SQL Server Scripts
 Scripts Menu



 

 

 

   
  SQL Server Tips by Gama and Naughter

SQL Server Collation

A collation is a particular set of rules for using characters for a language or alphabet. Although the same language might be spoken by several nations, there are national or cultural variations, not to mention nations with several languages. Microsoft Windows created the Language ID Reference Number (LCID), a 32 bit code for nearly 200 languages. The LCID is more than a number that identifies a national language, it contains information encoded in its bits:

The first ten bits are the primary language ID in the range 0x200 to 0x3FF. Bits ten to fifteen are the sub-language ID in the range 0x20 to 0x3F, for the same language from different regions. Bits sixteen to nineteen are the sort ID and the remaining twelve bits are reserved and should be zero.

Collations are responsible for determining the correct characters and how they are sorted or compared. In SQL Server 2000, different collations can coexist down to the level of columns.

Each SQL Server collation determines:

* The sort order for Unicode.
* The sort order for ASCII.
* The code page used for ASCII.

ASCII characters in multilingual databases are problematic because ASCII has different character sets, called code pages. Converting between code pages is difficult because identical characters from different code pages might have different ASCII codes and some characters have no equivalent. With characters in Unicode there is no such problem.

The Sort ID field determines the sort order, which is very important when comparing or sorting data. There are five considerations about the sort order:

* Ascending or descending? (is ‘a’>‘b’?)
* Case-sensitive? (is ‘a’>‘A’?)
* Accent-sensitive? (is ‘a’=‘á’?)
* Character width? (is ‘a’>‘aa’?)
* Kana character types? (is ‘?’=‘?’?)

SQL Server 2000 has two types of collations:

* Windows collations – use the Windows locale.
* SQL collations - provided for backwards compatibility with sort orders.

A SQL collation name consists of four components:

* SortRules – name of the alphabet or language.
* Pref - uppercase preference. (optional)
* CodePage - code page. (optional)
* CaseSensitivity + AccentSensitivity or BIN

CaseSensitivity can be either CI (case insensitive) or CS (case sensitive)

AccentSensitivity can be either AI (accent insensitive) or AS (accent sensitive)

BIN means that binary sort order is used, instead of text order.

For example SQL_Latin1_General_Pref_CP437_CI_AS is Latin1_General(alphabet), Pref(uppercase preference), CP437(code page 437), CI(case insensitive) and AS(accent sensitive).


The above book excerpt is from:

Super SQL Server Systems
Turbocharge Database Performance with C++ External Procedures

ISBN: 0-9761573-2-2
Joseph Gama, P. J. Naughter

 http://www.rampant-books.com/book_2005_2_sql_server_external_procedures.htm  

 

Burleson Consulting Remote DB Administration


 

 


 

 

 

 

 
Burleson is the America's Team

Note: The pages on this site were created as a support and training reference for use by our staff of DBA consultants.  If you find it confusing, please exit this page.

Errata?  SQL Server technology is changing and we strive to update our SQL Server support information.  If you find an error or have a suggestion for improving our content, we would appreciate your feedback.  Just  e-mail:and include the URL for the page.
 


Burleson Consulting
SQL Server database support

 

Copyright © 1996 -  2013 by Vaaltech Web Services. All rights reserved.

Hit Counter