Call now: (800) 766-1884  



 Home


 SQL Server Tips
 SQL Server Training

 SQL Server Consulting
 SQL Server Support
 SQL Server Remote DBA



 Articles
 Services
 SQL Server Scripts
 Scripts Menu



 

 

 

 
  SQL Server Tips by Gama and Naughter

 

Crosstab queries
 

Crosstab queries are very dear to those who developed databases in Microsoft Access. They are also known as pivot tables, and display a summary of data by calculating a sum, average, count, or other type of total for the grouped data. It takes one field from a table and uses its values to display new columns of information, grouped by the total vertically and by the new columns horizontally. The process is called pivoting and consists of turning a table with many records and few columns into another table with fewer records and more columns, by grouping records and stores them into new columns.

 

SQL Server has no support for crosstab queries and some miss them because they were very useful particularly for formatting reports; some others argue that they are too slow, non standard and make it hard to migrate from Microsoft Access to other Database Systems, including Microsoft SQL Server. The reality is that there are plenty of Microsoft Access databases that need to migrate to SQL Server and contain crosstab queries. There are also plenty of DBA’s who would like to have an easy way to represent tabular data. SQL Server 2005 has crosstab queries, although declared in a way different from MS Access.

 

Crosstab queries in Microsoft Access
 

The following examples use the Northwind database in Access, not in SQL Server.

 

Example: To create a query for a report with the total expenses per month and year based on tables Orders and Order Details. This is not a crosstab query but part of the introduction to the problem to be solved.


In design view:

 

The code groups rows by year and month to get the total:

 

SELECT TOP 100 PERCENT MONTH([Orders].[OrderDate]) AS order_month, YEAR([Orders].[OrderDate]) AS order_year, SUM(([Order Details].[UnitPrice]*[Order Details].[Quantity])*(1-[Order Details].[Discount])) AS total

FROM Orders INNER JOIN [Order Details] ON [Orders].[OrderID]=[Order Details].[OrderID]

GROUP BY MONTH([Orders].[OrderDate]), YEAR([Orders].[OrderDate])

ORDER BY YEAR([Orders].[OrderDate]), MONTH([Orders].[OrderDate]);

  

The rows could be organized more tidily in the report but the rows are displayed horizontally, taking too much room and preventing comparisons between months from different years.

This is a typical scenario where a crosstab query would improve readability.

 

Example: To create a query with the total expenses per month and year using a pivot on the month column.

 

The TRANSFORM statement defines the aggregate function for the grouped data, usually SUM or AVG.
 

The SELECT statement defines the vertical data, which is the row’s titles for the grouped data.
 

The FROM statement has the name of the table to be pivoted, or more than one table with a JOIN.
 

The GROUP BY statement is necessary to denote how to aggregate the rows.
 

ORDER BY is optional and defines the sort order.
 

The PIVOT statement defines the pivot column. This is the horizontal data, which are the headers for the grouped data.

 

TRANSFORM Sum([UnitPrice]*[quantity]*(1-[discount])) AS Total

SELECT Year([OrderDate]) AS order_year

FROM [Order Details] RIGHT JOIN Orders ON [Order Details].OrderID = Orders.OrderID

GROUP BY Year([OrderDate])

PIVOT Month([OrderDate]);

 

Vertically the data is organized by years and horizontally by months, allowing quick comparisons and a general overview of the data that is very intuitive.
 

There is one problem with this query:  if there are no rows for a certain month or months then the corresponding column in the result will be missing. This is easy to verify by deleting all rows with a start date within June 1997; by doing this, the crosstab query will miss column 6.

 

The solution is to use Fixed Column Headings, which is to predefine the column names and accept only rows that match those names. This is done by adding a IN clause to the PIVOT statement:

 

TRANSFORM Sum([UnitPrice]*[quantity]*(1-[discount])) AS Total

SELECT Year([OrderDate]) AS order_year

FROM [Order Details] RIGHT JOIN Orders ON [Order Details].OrderID = Orders.OrderID

GROUP BY Year([OrderDate])

PIVOT Month([OrderDate]) IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12);

 

This will create columns 1 to 12, even if there is no data to display, in which case it will display NULL values; it can also be used to filter data, for example if only the first 4 months were to be displayed it would be IN (1, 2, 3, 4)

 

The IN clause will generate an error when used with the full INNER JOIN syntax of Access.

 

Fixed Column Headings works great when the data is within a known range and it will not change or if the data outside that range is discarded.
 

Example: To create a query with the total expenses per month and year using a pivot on the year column.

 

If the year can take any value and it is being used as a pivot then there is no known range of values for Fixed Column Headings. Unlike what happens with months where empty data is still valuable, with years it is different. If several years were missing, it would cause several columns with NULL’s that would add no value. Not to mention that by arbitrarily determining a range of years, there is the possibility of weeding out other years with important data.

 

This is the code:

 

TRANSFORM Sum([UnitPrice]*[quantity]*(1-[discount])) AS Total

SELECT Month([OrderDate]) AS order_month

FROM [Order Details] RIGHT JOIN Orders ON [Order Details].OrderID = Orders.OrderID

GROUP BY Month([OrderDate])

PIVOT Year([OrderDate]);

 

The apparently minor change in the pivot column has a very drastic effect.

 

Pivoting by months is very straightforward because there are 12 months and so the number of columns in the result is constant. Pivoting by year is very different because the number of columns is unknown to start with and it might even change. The implications are many and delicate. A query that returns an unknown number of columns, with unknown names makes it impossible for other objects or code to reference it like any other query.


The above book excerpt is from:

Super SQL Server Systems
Turbocharge Database Performance with C++ External Procedures

ISBN: 0-9761573-2-2
Joseph Gama, P. J. Naughter

 http://www.rampant-books.com/book_2005_2_sql_server_external_procedures.htm
 

 

Burleson Consulting Remote DB Administration


 

 


 

 

 

 

 
Burleson is the America's Team

Note: The pages on this site were created as a support and training reference for use by our staff of DBA consultants.  If you find it confusing, please exit this page.

Errata?  SQL Server technology is changing and we strive to update our SQL Server support information.  If you find an error or have a suggestion for improving our content, we would appreciate your feedback.  Just  e-mail:and include the URL for the page.
 


Burleson Consulting
SQL Server database support

 

Copyright © 1996 -  2013 by Vaaltech Web Services. All rights reserved.

Hit Counter