Call now: (800) 766-1884  



 Home


 SQL Server Tips
 SQL Server Training

 SQL Server Consulting
 SQL Server Support
 SQL Server Remote DBA



 Articles
 Services
 SQL Server Scripts
 Scripts Menu



 

 

 

 
  SQL Server Tips by Gama and Naughter

 

Calculating Median with a SP
 

There are many ways to calculate the median from a table, each one being more effective on a particular situation. The algorithm is minimal and linear but there are many different possible implementations.
 

Knowing that a decision based on the parity of the number of elements will retrieve one central element or the average of two central elements an IF-ELSE statement sounds perfect:

 

CREATE PROCEDURE myMedianRoyalty2

AS

--Calculate the median of the royalty field from table roysched using MAX and TOP

DECLARE @totalRec int

SELECT @totalRec=COUNT(royalty) FROM roysched

SEE CODE DEPOT

              --if even then get the average of both central values

      SELECT AVG(royalty*1.0) as Median FROM (--get average

SELECT MAX(royalty) as royalty FROM (SELECT TOP 50 PERCENT royalty FROM roysched ORDER BY royalty)tmpCount

      UNION--union of both central values

SELECT  MIN(royalty) as royalty FROM (SELECT TOP 50 PERCENT royalty FROM roysched ORDER BY royalty desc)tmpCount

      ) tmp2

 

This code works but it would be much better with only one SELECT statement and using the CASE statement within the SELECT:

 

CREATE PROCEDURE myMedianRoyalty

AS

--Calculate the median of the royalty field from table roysched using MAX and TOP

SELECT CASE (SELECT COUNT(royalty)  FROM roysched)

      WHEN 1 THEN

               (SELECT MAX(royalty)  FROM (SELECT TOP 50 PERCENT royalty FROM roysched ORDER BY royalty)tmpCount)

      ELSE

               (SELECT AVG(royalty*1.0) as Median FROM (--get average

                         SELECT MAX(royalty) as royalty FROM (SELECT TOP 50 PERCENT royalty FROM roysched ORDER BY royalty)tmpCount

                                   UNION--union of both central values

                         SELECT  MIN(royalty) as royalty FROM (SELECT TOP 50 PERCENT royalty FROM roysched ORDER BY royalty desc)tmpCount

                                   ) tmp2)

END AS Median

 

When possible CASE should be used instead of IF-ELSE, particularly if there is more than one condition.

 

Another solution that might be the best for a huge number of records is using a cursor. Although cursors are very often not recommended, because of the performance degradation and record locking, this is a different situation because there will be only one or two record retrievals.

 

CREATE PROCEDURE myMedianRoyalty3

AS

--Calculate the median of the royalty field from table roysched using a Cursor

DECLARE @tot1 float, @tot2 float, @midRec int, @totalRec int, @median float

SELECT @totalRec=COUNT(royalty) FROM roysched--get total number of records

SET @midRec=@totalRec/2--calculate the central point

DECLARE cur_med  SCROLL CURSOR --declare the cursor

FOR SELECT royalty FROM roysched ORDER BY royalty

OPEN cur_med

SEE CODE DEPOT

ELSE                --if even then get the average of both central values

      BEGIN

      SET @midRec=@midRec

      FETCH ABSOLUTE @midRec FROM cur_med INTO @tot1--get 1st central value

      FETCH NEXT FROM cur_med INTO @tot2--get 2nd central value

      SET @median=(@tot1+@tot2)/2.0--calculate average

      END

CLOSE cur_med

DEALLOCATE cur_med

SELECT @median as Median

 

The next solution uses a temporary table. The best application for a temporary table is to store intermediate data or data with very intensive computations, such as complex calculated fields.

 

CREATE PROCEDURE myMedianRoyalty4

AS

--Calculate the median of the royalty field from table roysched using a temporary table

DECLARE  @totalRec int,  @midRec int

SELECT @totalRec=COUNT(royalty) FROM roysched--get total number of records

SET @midRec=@totalRec/2--calculate the central point

print @midRec

SELECT IDENTITY(int, 1, 1) as Rid, royalty --create temporary table from roysched and with an identity field

INTO #RoySchedTmp

FROM roysched  ORDER BY royalty

SEE CODE DEPOT

ELSE                --if even then get the average of both central values

      BEGIN

      SELECT AVG(royalty*1.0) as Median --calculate average

               FROM #RoySchedTmp WHERE (Rid=@midRec) OR (Rid=@midRec+1)

      END

DROP TABLE #RoySchedTmp

 

A classic and very elaborate solution is the one created by David Rozenshtein, Anatoly Abramovich and Eugene Berger in their article “Computing the median”. Their solution consists of a self-join plus characteristic functions (This is a functions that emulates conditional clauses). The following code was adapted from the article:

 

CREATE PROCEDURE myMedianRoyalty5

AS

--Calculate the median of the royalty field from table roysched using a self join

SELECT

   CASE WHEN COUNT(*)%2=1

       

 

 

THEN x.royalty

        ELSE

 

 

 

 

 

(x.royalty+

MIN(CASE WHEN y.royalty>x.royalty

 

 

                               THEN y.royalty

                          END))/2.0

   END median

FROM roysched x, roysched y

GROUP BY x.royalty

HAVING

   SUM(CASE WHEN y.royalty <= x.royalty

      THEN 1 ELSE 0 END)>=(count(*)+1)/2

AND

   SUM(CASE WHEN y.royalty >= x.royalty

      THEN 1 ELSE 0 END)>=(count(*)/2)+1

CREATE PROCEDURE myMedianRoyalty6

AS

--Calculate the median of the royalty field from table roysched using a self join

SELECT

1.0*(SIGN((1-SIGN(SIGN(SUM(SIGN(1-SIGN(y.royalty-x.royalty)))

+(count(*)+1)/2

)))+count(*)%2))

*x.royalty

+

1.0*(SIGN((1-SIGN(SIGN(SUM(SIGN(1-SIGN(x.royalty-y.royalty)))

+(count(*))/2+1

)))+1-count(*)%2))

*(

x.royalty +

MIN(((SIGN(1-SIGN(y.royalty-x.royalty)))

*y.royalty--this is to increase the smaller numbers

+1)

*y.royalty )

)/2.0

as median

FROM roysched x, roysched y

GROUP BY x.royalty

HAVING

(SUM(SIGN(1-SIGN(y.royalty-x.royalty)))>=(count(*)+1)/2)

and

(SUM(SIGN(1-SIGN(x.royalty-y.royalty)))>=count(*)/2+1)

 

The left side takes advantage of the TSQL specific statement CASE that returns values according to a condition. The right side is very similar to the original code from the article. Both are equivalent and work fine for tables with unique values. Self-joins are rare in development code because they are too complex and consume too many resources from the system. There are simpler and faster alternatives. 


The above book excerpt is from:

Super SQL Server Systems
Turbocharge Database Performance with C++ External Procedures

ISBN: 0-9761573-2-2
Joseph Gama, P. J. Naughter

 http://www.rampant-books.com/book_2005_2_sql_server_external_procedures.htm
 

 

Burleson Consulting Remote DB Administration


 

 


 

 

 

 

 
Burleson is the America's Team

Note: The pages on this site were created as a support and training reference for use by our staff of DBA consultants.  If you find it confusing, please exit this page.

Errata?  SQL Server technology is changing and we strive to update our SQL Server support information.  If you find an error or have a suggestion for improving our content, we would appreciate your feedback.  Just  e-mail:and include the URL for the page.
 


Burleson Consulting
SQL Server database support

 

Copyright © 1996 -  2013 by Vaaltech Web Services. All rights reserved.

Hit Counter