||SQL Server Tips by Gama and Naughter
SQL Server Collation
A collation is a particular set of rules for using characters for a
language or alphabet. Although the same language might be spoken by
several nations, there are national or cultural variations, not to
mention nations with several languages. Microsoft Windows created
the Language ID Reference Number (LCID), a 32 bit code for nearly
200 languages. The LCID is more than a number that identifies a
national language, it contains information encoded in its bits:
The first ten
bits are the primary language ID in the range 0x200 to 0x3FF. Bits
ten to fifteen are the sub-language ID in the range 0x20 to 0x3F,
for the same language from different regions. Bits sixteen to
nineteen are the sort ID and the remaining twelve bits are reserved
and should be zero.
Collations are responsible for determining the correct characters
and how they are sorted or compared. In SQL Server 2000, different
collations can coexist down to the level of columns.
Each SQL Server collation determines:
* The sort order for
* The sort order for ASCII.
* The code page used for ASCII.
ASCII characters in
multilingual databases are problematic because ASCII has different
character sets, called code pages. Converting between code pages is
difficult because identical characters from different code pages
might have different ASCII codes and some characters have no
equivalent. With characters in Unicode there is no such problem.
The Sort ID
field determines the sort order, which is very important when
comparing or sorting data. There are five considerations about the
* Ascending or descending?
* Case-sensitive? (is a>A?)
* Accent-sensitive? (is a=á?)
* Character width? (is a>aa?)
* Kana character types? (is ?=??)
SQL Server 2000 has two types
* Windows collations use
the Windows locale.
* SQL collations - provided for backwards compatibility with
A SQL collation name consists
of four components:
* SortRules name of the
alphabet or language.
* Pref - uppercase preference. (optional)
* CodePage - code page. (optional)
* CaseSensitivity + AccentSensitivity or BIN
CaseSensitivity can be either
CI (case insensitive) or CS (case sensitive)
AccentSensitivity can be either AI (accent insensitive) or AS
BIN means that binary sort order is used, instead of text order.
For example SQL_Latin1_General_Pref_CP437_CI_AS is
Latin1_General(alphabet), Pref(uppercase preference), CP437(code
page 437), CI(case insensitive) and AS(accent sensitive).
The above book excerpt is from:
Turbocharge Database Performance with C++ External Procedures
Joseph Gama, P. J. Naughter