Calculate a Running Total in SQL Server
Imagine the following table (called TestTable
):
id somedate somevalue
-- -------- ---------
45 01/Jan/09 3
23 08/Jan/09 5
12 02/Feb/09 0
77 14/Feb/09 7
39 20/Feb/09 34
33 02/Mar/09 6
I would like a query that returns a running total in date order, like:
id somedate somevalue runningtotal
-- -------- --------- ------------
45 01/Jan/09 3 3
23 08/Jan/09 5 8
12 02/Feb/09 0 8
77 14/Feb/09 7 15
39 20/Feb/09 34 49
33 02/Mar/09 6 55
I know there are various ways of doing this in SQL Server 2000 / 2005 / 2008.
I am particularly interested in this sort of method that uses the aggregating-set-statement trick:
INSERT INTO @AnotherTbl(id, somedate, somevalue, runningtotal)
SELECT id, somedate, somevalue, null
FROM TestTable
ORDER BY somedate
DECLARE @RunningTotal int
SET @RunningTotal = 0
UPDATE @AnotherTbl
SET @RunningTotal = runningtotal = @RunningTotal + somevalue
FROM @AnotherTbl
... this is very efficient but I have heard there are issues around this because you can't necessarily guarantee that the UPDATE
statement will process the rows in the correct order. Maybe we can get some definitive answers about that issue.
But maybe there are other ways that people can suggest?
edit: Now with a SqlFiddle with the setup and the 'update trick' example above
Update, if you are running SQL Server 2012 see: https://stackoverflow.com/a/10309947
The problem is that the SQL Server implementation of the Over clause is somewhat limited.
Oracle (and ANSI-SQL) allow you to do things like:
SELECT somedate, somevalue,
SUM(somevalue) OVER(ORDER BY somedate
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS RunningTotal
FROM Table
SQL Server gives you no clean solution to this problem. My gut is telling me that this is one of those rare cases where a cursor is the fastest, though I will have to do some benchmarking on big results.
The update trick is handy but I feel its fairly fragile. It seems that if you are updating a full table then it will proceed in the order of the primary key. So if you set your date as a primary key ascending you will probably
be safe. But you are relying on an undocumented SQL Server implementation detail (also if the query ends up being performed by two procs I wonder what will happen, see: MAXDOP):
Full working sample:
drop table #t
create table #t ( ord int primary key, total int, running_total int)
insert #t(ord,total) values (2,20)
-- notice the malicious re-ordering
insert #t(ord,total) values (1,10)
insert #t(ord,total) values (3,10)
insert #t(ord,total) values (4,1)
declare @total int
set @total = 0
update #t set running_total = @total, @total = @total + total
select * from #t
order by ord
ord total running_total
----------- ----------- -------------
1 10 10
2 20 30
3 10 40
4 1 41
You asked for a benchmark this is the lowdown.
The fastest SAFE way of doing this would be the Cursor, it is an order of magnitude faster than the correlated sub-query of cross-join.
The absolute fastest way is the UPDATE trick. My only concern with it is that I am not certain that under all circumstances the update will proceed in a linear way. There is nothing in the query that explicitly says so.
Bottom line, for production code I would go with the cursor.
Test data:
create table #t ( ord int primary key, total int, running_total int)
set nocount on
declare @i int
set @i = 0
begin tran
while @i < 10000
begin
insert #t (ord, total) values (@i, rand() * 100)
set @i = @i +1
end
commit
Test 1:
SELECT ord,total,
(SELECT SUM(total)
FROM #t b
WHERE b.ord <= a.ord) AS b
FROM #t a
-- CPU 11731, Reads 154934, Duration 11135
Test 2:
SELECT a.ord, a.total, SUM(b.total) AS RunningTotal
FROM #t a CROSS JOIN #t b
WHERE (b.ord <= a.ord)
GROUP BY a.ord,a.total
ORDER BY a.ord
-- CPU 16053, Reads 154935, Duration 4647
Test 3:
DECLARE @TotalTable table(ord int primary key, total int, running_total int)
DECLARE forward_cursor CURSOR FAST_FORWARD
FOR
SELECT ord, total
FROM #t
ORDER BY ord
OPEN forward_cursor
DECLARE @running_total int,
@ord int,
@total int
SET @running_total = 0
FETCH NEXT FROM forward_cursor INTO @ord, @total
WHILE (@@FETCH_STATUS = 0)
BEGIN
SET @running_total = @running_total + @total
INSERT @TotalTable VALUES(@ord, @total, @running_total)
FETCH NEXT FROM forward_cursor INTO @ord, @total
END
CLOSE forward_cursor
DEALLOCATE forward_cursor
SELECT * FROM @TotalTable
-- CPU 359, Reads 30392, Duration 496
Test 4:
declare @total int
set @total = 0
update #t set running_total = @total, @total = @total + total
select * from #t
-- CPU 0, Reads 58, Duration 139
In SQL Server 2012 you can use SUM() with the OVER() clause.
select id,
somedate,
somevalue,
sum(somevalue) over(order by somedate rows unbounded preceding) as runningtotal
from TestTable
SQL Fiddle
While Sam Saffron did great work on it, he still didn't provide recursive common table expression code for this problem. And for us who working with SQL Server 2008 R2 and not Denali, it's still fastest way to get running total, it's about 10 times faster than cursor on my work computer for 100000 rows, and it's also inline query.
So, here it is (I'm supposing that there's an ord
column in the table and it's sequential number without gaps, for fast processing there also should be unique constraint on this number):
;with
CTE_RunningTotal
as
(
select T.ord, T.total, T.total as running_total
from #t as T
where T.ord = 0
union all
select T.ord, T.total, T.total + C.running_total as running_total
from CTE_RunningTotal as C
inner join #t as T on T.ord = C.ord + 1
)
select C.ord, C.total, C.running_total
from CTE_RunningTotal as C
option (maxrecursion 0)
-- CPU 140, Reads 110014, Duration 132
sql fiddle demo
update
I also was curious about this update with variable or quirky update. So usually it works ok, but how we can be sure that it works every time? well, here's a little trick (found it here - http://www.sqlservercentral.com/Forums/Topic802558-203-21.aspx#bm981258) - you just check current and previous ord
and use 1/0
assignment in case they are different from what you expecting:
declare @total int, @ord int
select @total = 0, @ord = -1
update #t set
@total = @total + total,
@ord = case when ord <> @ord + 1 then 1/0 else ord end,
------------------------
running_total = @total
select * from #t
-- CPU 0, Reads 58, Duration 139
From what I've seen if you have proper clustered index/primary key on your table (in our case it would be index by ord_id
) update will proceed in a linear way all the time (never encountered divide by zero). That said, it's up to you to decide if you want to use it in production code :)
update 2 I'm linking this answer, cause it includes some useful info about unreliability of the quirky update - nvarchar concatenation / index / nvarchar(max) inexplicable behavior.
The APPLY operator in SQL 2005 and higher works for this:
select
t.id ,
t.somedate ,
t.somevalue ,
rt.runningTotal
from TestTable t
cross apply (select sum(somevalue) as runningTotal
from TestTable
where somedate <= t.somedate
) as rt
order by t.somedate
SELECT TOP 25 amount,
(SELECT SUM(amount)
FROM time_detail b
WHERE b.time_detail_id <= a.time_detail_id) AS Total FROM time_detail a
You can also use the ROW_NUMBER() function and a temp table to create an arbitrary column to use in the comparison on the inner SELECT statement.