Understanding the differences between CUBE and ROLLUP
My assignment asked me to find out "how many invoices are written for each date?"
I was a little stuck and asked my professor for help. She emailed me a query that would answer the question, "How many stoves of each type and version have been built? For a challenge but no extra points, include the total number of stoves."
This was the query she sent me:
SELECT STOVE.Type + STOVE.Version AS 'Type+Version'
, COUNT(*) AS 'The Count'
FROM STOVE
GROUP BY STOVE.Type + STOVE.Version WITH ROLLUP;
So, I tweaked that query until it met my needs. This is what I came up with:
SELECT InvoiceDt
, COUNT(InvoiceNbr) AS 'Number of Invoices'
FROM INVOICE
GROUP BY InvoiceDt WITH ROLLUP
ORDER BY InvoiceDt ASC;
And it returned the following results that I wanted.
Anyway, I decided to read up on the ROLLUP clause and started with an article from Microsoft. It said that the ROLLUP clause was similar to the CUBE clause but that it was distinguished from the CUBE clause in the following way:
- CUBE generates a result set that shows aggregates for all combinations of values in the selected columns.
- ROLLUP generates a result set that shows aggregates for a hierarchy of values in the selected columns.
So, I decided to replace the ROLLUP in my query with CUBE to see what would happen. They produced the same results. I guess that's where I'm getting confused.
It seems like, if you're using the type of query that I am here, that there isn't any practical difference between the two clauses. Is that right? Or, am I not understanding something? I had thought, when I finished reading the Microsoft article, that my results should've been different using the CUBE clause.
You won't see any difference since you're only rolling up a single column. Consider an example where we do
ROLLUP (YEAR, MONTH, DAY)
With a ROLLUP
, it will have the following outputs:
YEAR, MONTH, DAY
YEAR, MONTH
YEAR
()
With CUBE
, it will have the following:
YEAR, MONTH, DAY
YEAR, MONTH
YEAR, DAY
YEAR
MONTH, DAY
MONTH
DAY
()
CUBE
essentially contains every possible rollup scenario for each node whereas ROLLUP
will keep the hierarchy in tact (so it won't skip MONTH and show YEAR/DAY, whereas CUBE
will)
This is why you didn't see a difference since you only had a single column you were rolling up.
Hope that helps.
We can understand the difference between ROLLUP and CUBE with a simple example. Consider we have a table which contains the results of quarterly test of students. In certain cases we need to see the total corresponding to the quarter as well as the students. Here is the sample table
SELECT * INTO #TEMP
FROM
(
SELECT 'Quarter 1' PERIOD,'Amar' NAME ,97 MARKS
UNION ALL
SELECT 'Quarter 1','Ram',88
UNION ALL
SELECT 'Quarter 1','Simi',76
UNION ALL
SELECT 'Quarter 2','Amar',94
UNION ALL
SELECT 'Quarter 2','Ram',82
UNION ALL
SELECT 'Quarter 2','Simi',71
UNION ALL
SELECT 'Quarter 3' ,'Amar',95
UNION ALL
SELECT 'Quarter 3','Ram',83
UNION ALL
SELECT 'Quarter 3','Simi',77
UNION ALL
SELECT 'Quarter 4' ,'Amar',91
UNION ALL
SELECT 'Quarter 4','Ram',84
UNION ALL
SELECT 'Quarter 4','Simi',79
)TAB
1. ROLLUP(Can find total for corresponding to one column)
(a) Get total score of each student in all quarters.
SELECT * FROM #TEMP
UNION ALL
SELECT PERIOD,NAME,SUM(MARKS) TOTAL
FROM #TEMP
GROUP BY NAME,PERIOD
WITH ROLLUP
HAVING PERIOD IS NULL AND NAME IS NOT NULL
// Having is used inorder to emit a row that is the total of all totals of each student
Following is the result of (a)
(b) Incase you need to get total score of each quarter
SELECT * FROM #TEMP
UNION ALL
SELECT PERIOD,NAME,SUM(MARKS) TOTAL
FROM #TEMP
GROUP BY PERIOD,NAME
WITH ROLLUP
HAVING PERIOD IS NOT NULL AND NAME IS NULL
Following is the result of (b)
2. CUBE(Find total for Quarter as well as students in a single shot)
SELECT PERIOD,NAME,SUM(MARKS) TOTAL
FROM #TEMP
GROUP BY NAME,PERIOD
WITH CUBE
HAVING PERIOD IS NOT NULL OR NAME IS NOT NULL
Following is the result of CUBE
Now you may be wondering about the real time use of ROLLUP and CUBE. Sometimes we need a report in which we need to see the total of each quarter and total of each student in a single shot. Here is an example
I am changing the above CUBE query slightly as we need total for both totals.
SELECT CASE WHEN PERIOD IS NULL THEN 'TOTAL' ELSE PERIOD END PERIOD,
CASE WHEN NAME IS NULL THEN 'TOTAL' ELSE NAME END NAME,
SUM(MARKS) MARKS
INTO #TEMP2
FROM #TEMP
GROUP BY NAME,PERIOD
WITH CUBE
DECLARE @cols NVARCHAR (MAX)
SELECT @cols = COALESCE (@cols + ',[' + PERIOD + ']',
'[' + PERIOD + ']')
FROM (SELECT DISTINCT PERIOD FROM #TEMP2) PV
ORDER BY PERIOD
DECLARE @query NVARCHAR(MAX)
SET @query = 'SELECT * FROM
(
SELECT * FROM #TEMP2
) x
PIVOT
(
SUM(MARKS)
FOR [PERIOD] IN (' + @cols + ')
) p;'
EXEC SP_EXECUTESQL @query
Now you will get the following result
This is because you only have one column that you are grouping by.
Add Group by InvoiceDt, InvoiceCountry
(or whatever field will give you more data.
With Cube will give you a Sum for each InvoiceDt and you will get a Sum for each InvoiceCountry.
You can find more detail about GROUPING SET, CUBE, ROLL UP. TL;DR they just expand GROUP BY + UNION ALL in some ways to get aggregation :)
https://technet.microsoft.com/en-us/library/bb510427(v=sql.105).aspx