Pages

Thursday, November 11, 2010

SQL Ranking functions

Ranking functions return a ranking value for each rows in your result set. Many tasks, like generating sequential numbers, finding ranks, and so on, which in pre-2005 SQL Server versions requires many lines of code, now can be implemented much easier and faster.

Look at the following example:

SELECT FirstName, LastName, PostalCode, 
       ,ROW_NUMBER() OVER (ORDER BY PostalCode) AS 'Row Number'
       ,RANK() OVER (ORDER BY PostalCode) AS 'Rank'
       ,DENSE_RANK() OVER (ORDER BY PostalCode) AS 'Dense Rank'
       ,NTILE(2) OVER (ORDER BY PostalCode) AS 'Bi-tile'
FROM   Persons

Result Set:
FirstNameLastNamePostalCodeRow NumberRankDense RankBi-tile
JoeHarley30071111
DavidBigel30072111
JoelFriedlaender30073111
KennyLam31284422
NirmalParrera31285422
GurnamMadan31286423
TroyParker31507733
Melanie????31508734
JeremyPickhaver31509734

Lets look at each of these ranking functions:

1. ROW_NUMBER()
ROW_NUMBER ()     OVER ( [ <partition_by_clause> ] <order_by_clause> )
This function returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition. The ORDER BY clause determines the sequence in which the rows are assigned their unique ROW_NUMBER within a specified partition. In the above example we have not specified any partition columns, so the ROW_NUMBER is assigned based on the ordering of postal code column for each row.

2. RANK()
RANK ()    OVER ( [ < partition_by_clause > ] < order_by_clause > )
This function returns the rank of each row within the partition of a result set. The rank of a row is one plus the number of ranks that come before the row in question. You can also group the the rankings by using the PARTITION BY clause. In the above example the rankings are the same for rows having the same value in postal code column. But the ranking value keeps on incrementing for each row, so that when a new postal code value is encountered, ranking value on that new row will be one more than the number of proceeding rows.

3. DENSE_RANK()
DENSE_RANK ()    OVER ( [ < partition_by_clause > ] < order_by_clause > )
This function returns the rank of rows within the partition of a result set, without any gaps in the ranking. The rank of a row is one plus the number of distinct ranks that come before the row in question. In the above example rankings are incremented as the value in postal code column changes.

4. NTILE()
NTILE (integer_expression)    OVER ( [ <partition_by_clause> ] < order_by_clause > )
This function distributes the rows in an ordered partition into a specified number of groups. The groups are numbered, starting at one. For each row, NTILE returns the number of the group to which the row belongs. The integer constant expression specifies the number of groups into which each partition must be divided. In the above examples the rows are divided in 2 groups. Because the total number of rows(9) is not divisible by the number of groups, the first group has three rows and the remaining groups have two rows each.

These functions can be really handy for complex sorting and generating sequential record set. These windowed functions can only be used in SELECT and ORDER BY statements. You can read more about them here.

No comments:

Post a Comment