The SQL OVER() clause - when and why is it useful?
USE AdventureWorks2008R2;
GO
SELECT SalesOrderID, ProductID, OrderQty
,SUM(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Total'
,AVG(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Avg'
,COUNT(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Count'
,MIN(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Min'
,MAX(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Max'
FROM Sales.SalesOrderDetail
WHERE SalesOrderID IN(43659,43664);
I read about this clause, but I don't understand why I need it. What does OVER do? What does PARTITION BY do? Why can't I just write the query with GROUP BY SalesOrderID?
Solution 1:
You can use GROUP BY SalesOrderID. The difference is that with GROUP BY, any column not listed in the GROUP BY can only appear in the result as an aggregated value.
In contrast, with windowed aggregate functions instead of GROUP BY, you can retrieve both aggregated and non-aggregated values in the same row. That is, as your example query does, you can return the individual OrderQty values alongside their sums, counts, averages, etc. over groups of rows sharing the same SalesOrderID.
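To make the difference concrete, here is a minimal sketch run against three made-up rows. The inline VALUES list and the alias d are assumptions standing in for Sales.SalesOrderDetail, not real AdventureWorks data, and the derived-table VALUES syntax assumes SQL Server 2008 or later. The first query collapses each order into one row; the second keeps every detail row and repeats the order total next to it.

```sql
-- GROUP BY: one row per SalesOrderID; ProductID and OrderQty are no longer available.
SELECT SalesOrderID, SUM(OrderQty) AS Total
FROM (VALUES (1, 101, 2), (1, 102, 4), (2, 101, 6))
     AS d (SalesOrderID, ProductID, OrderQty)   -- hypothetical sample rows
GROUP BY SalesOrderID
ORDER BY SalesOrderID;
-- rows: (1, 6), (2, 6)

-- OVER (PARTITION BY ...): every detail row survives, with the order total beside it.
SELECT SalesOrderID, ProductID, OrderQty,
       SUM(OrderQty) OVER (PARTITION BY SalesOrderID) AS Total
FROM (VALUES (1, 101, 2), (1, 102, 4), (2, 101, 6))
     AS d (SalesOrderID, ProductID, OrderQty)   -- same hypothetical rows
ORDER BY SalesOrderID, ProductID;
-- rows: (1, 101, 2, 6), (1, 102, 4, 6), (2, 101, 6, 6)
```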
Here's a practical example of why windowed aggregates are great. Suppose you need to calculate what percentage of a total each value represents. Without windowed aggregates, you'd first have to derive a list of aggregated values and then join it back to the original rowset, like this:
SELECT
  orig.[Partition],
  orig.Value,
  orig.Value * 100.0 / agg.TotalValue AS ValuePercent
FROM OriginalRowset orig
  INNER JOIN (
    SELECT
      [Partition],
      SUM(Value) AS TotalValue
    FROM OriginalRowset
    GROUP BY [Partition]
  ) agg ON orig.[Partition] = agg.[Partition]
Now look how you can do the same with a windowed aggregate:
SELECT
  [Partition],
  Value,
  Value * 100.0 / SUM(Value) OVER (PARTITION BY [Partition]) AS ValuePercent
FROM OriginalRowset orig
Much easier and cleaner, isn't it?
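One caveat worth sketching: if a partition's total can be zero, the division raises a divide-by-zero error under SQL Server's default settings. A possible guard is to wrap the windowed SUM in NULLIF so the percentage becomes NULL instead. The rows below are invented for illustration (OriginalRowset is replaced by an inline VALUES list, which assumes SQL Server 2008 or later).

```sql
SELECT
  [Partition],
  Value,
  -- NULLIF turns a zero total into NULL, so the division yields NULL rather than an error.
  Value * 100.0 / NULLIF(SUM(Value) OVER (PARTITION BY [Partition]), 0) AS ValuePercent
FROM (VALUES ('A', 10), ('A', 30), ('B', 5), ('C', 0))
     AS orig ([Partition], Value)   -- hypothetical sample rows
ORDER BY [Partition], Value;
-- A, 10 -> 25 percent
-- A, 30 -> 75 percent
-- B,  5 -> 100 percent
-- C,  0 -> NULL (the partition total is zero)
-- The exact decimal scale shown depends on the numeric types involved.
```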
Solution 2:
The OVER clause is powerful in that you can have aggregates over different ranges ("windowing"), whether you use a GROUP BY or not.
Example: get the count per group (here grouped by SalesOrderID, ProductID, OrderQty) and the count of all grouped rows
SELECT
SalesOrderID, ProductID, OrderQty
,COUNT(OrderQty) AS 'Count'
,COUNT(*) OVER () AS 'CountAll'
FROM Sales.SalesOrderDetail
WHERE
SalesOrderID IN(43659,43664)
GROUP BY
SalesOrderID, ProductID, OrderQty
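When GROUP BY and OVER appear together, the window is evaluated over the rows that remain after grouping, so COUNT(*) OVER () counts groups rather than source rows. A small sketch with invented rows (the VALUES list and alias d are assumptions, and the syntax assumes SQL Server 2008 or later) shows the effect: three source rows, two of them identical, collapse into two groups.

```sql
SELECT
  SalesOrderID, ProductID, OrderQty,
  COUNT(OrderQty)  AS [Count],      -- rows inside each group
  COUNT(*) OVER () AS CountAll      -- number of grouped rows, not source rows
FROM (VALUES (1, 101, 2), (1, 101, 2), (2, 101, 6))
     AS d (SalesOrderID, ProductID, OrderQty)   -- hypothetical sample rows
GROUP BY
  SalesOrderID, ProductID, OrderQty
ORDER BY SalesOrderID;
-- rows: (1, 101, 2, 2, 2)
--       (2, 101, 6, 1, 2)
```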
Get different COUNTs, no GROUP BY
SELECT
SalesOrderID, ProductID, OrderQty
,COUNT(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'CountQtyPerOrder'
,COUNT(OrderQty) OVER(PARTITION BY ProductID) AS 'CountQtyPerProduct'
,COUNT(*) OVER () AS 'CountAllAgain'
FROM Sales.SalesOrderDetail
WHERE
SalesOrderID IN(43659,43664)
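To see how each OVER clause defines its own window, here is the same shape of query as a sketch against three made-up rows (the VALUES list and alias d are assumptions, not AdventureWorks data; the syntax assumes SQL Server 2008 or later). Each row carries three counts computed over three different sets of rows.

```sql
SELECT
  SalesOrderID, ProductID, OrderQty,
  COUNT(OrderQty) OVER (PARTITION BY SalesOrderID) AS CountQtyPerOrder,
  COUNT(OrderQty) OVER (PARTITION BY ProductID)    AS CountQtyPerProduct,
  COUNT(*)        OVER ()                          AS CountAllAgain
FROM (VALUES (1, 101, 2), (1, 102, 4), (2, 101, 6))
     AS d (SalesOrderID, ProductID, OrderQty)   -- hypothetical sample rows
ORDER BY SalesOrderID, ProductID;
-- rows: (1, 101, 2, 2, 2, 3)   order 1 has 2 rows, product 101 has 2 rows
--       (1, 102, 4, 2, 1, 3)   order 1 has 2 rows, product 102 has 1 row
--       (2, 101, 6, 1, 2, 3)   order 2 has 1 row,  product 101 has 2 rows
```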