When you work with data and database systems, eventually you need to calculate running totals on (for example) product sales or financial data. There are several methods to calculate these amounts. In this post I’ll try to show the pros and cons of the different solutions.
Let’s start with creating the resources for the examples. The most basic example I could think of is one with only the necessary information: Date and Turnover.
CREATE TABLE Dough (Date DATE, Turnover FLOAT)
And then insert some data:
INSERT INTO Dough (Date, Turnover)
VALUES
    ('2011-01-01', 1000), ('2011-02-01', 1250), ('2011-03-01', 1500),
    ('2011-04-01', 1750), ('2011-05-01', 2000), ('2011-06-01', 2250),
    ('2011-07-01', 2250), ('2011-08-01', 2000), ('2011-09-01', 1750),
    ('2011-10-01', 1500), ('2011-11-01', 1250), ('2011-12-01', 1000)

INSERT INTO Dough (Date, Turnover)
VALUES
    ('2012-01-01', 100), ('2012-02-01', 125), ('2012-03-01', 150),
    ('2012-04-01', 175), ('2012-05-01', 200), ('2012-06-01', 225),
    ('2012-07-01', 225), ('2012-08-01', 200), ('2012-09-01', 175),
    ('2012-10-01', 150), ('2012-11-01', 125), ('2012-12-01', 100)
With this resource, we can start on the examples.
Different solutions
When looking at this question, you’ll notice that there is more than one solution that returns the correct result. The following queries all return the same result, but each one is written with a specific version of SQL Server in mind.
SQL 2000
If you’re using SQL Server 2000 (and I certainly hope you don’t have to anymore ;)), you can use the query with the INNER JOIN. The query itself can be used on all SQL Server versions. Do note that the setup script above needs SQL Server 2008 or later, because of the DATE data type and the multi-row VALUES clause:
SELECT A.Date, A.Turnover, SUM(B.Turnover) AS RunningTotal
FROM Dough A
INNER JOIN Dough B
    ON YEAR(B.Date) = YEAR(A.Date)
    AND B.Date <= A.Date
GROUP BY A.Date, A.Turnover
ORDER BY A.Date ASC
SQL 2005
On SQL Server 2005 you can write the same query with a CROSS JOIN and a WHERE clause. CROSS JOIN itself isn’t new in SQL Server 2005 (older versions support it as well); the join-like operator that 2005 did introduce is APPLY:
SELECT A.Date, A.Turnover, SUM(B.Turnover) AS RunningTotal
FROM Dough A
CROSS JOIN Dough B
WHERE YEAR(B.Date) = YEAR(A.Date)
    AND B.Date <= A.Date
GROUP BY A.Date, A.Turnover
ORDER BY A.Date ASC
The INNER JOIN example and the CROSS JOIN example generate the same execution plan.
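For completeness, here is a sketch of the same running total written with CROSS APPLY, which is available from SQL Server 2005 onwards. This is my own variation and not one of the measured queries in this post; the alias RT is just a name I picked for the applied subquery.

```sql
-- For every row in A, the applied subquery sums all turnover
-- of the same year up to and including A.Date.
SELECT A.Date, A.Turnover, RT.RunningTotal
FROM Dough A
CROSS APPLY (
    SELECT SUM(B.Turnover) AS RunningTotal
    FROM Dough B
    WHERE YEAR(B.Date) = YEAR(A.Date)
      AND B.Date <= A.Date
) AS RT
ORDER BY A.Date ASC;
```

Because the subquery is an aggregate without a GROUP BY, it always returns exactly one row, so CROSS APPLY doesn’t drop any rows from Dough. I haven’t compared its execution plan with the two join queries, so check that yourself before relying on it.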
SQL 2012
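With the release of SQL Server 2012 they handed us (SQL developers) a whole new “bag of tricks”. One of these “tricks” is the extended window function support: the OVER clause now accepts an ORDER BY and a ROWS or RANGE frame for aggregates like SUM.
The first time I saw the window function was at a Techdays NL 2012 session. This session was hosted by Bob Beauchemin (Blog | @bobbeauch). The session (T-SQL improvements in SQL Server 2012) is worth watching. Even if you’re using SQL Server 2012 already!
With a window function you can compute an aggregate over a set of rows related to the current row (the window), without collapsing the result with a GROUP BY. In the query below, PARTITION BY restarts the total for every year, ORDER BY determines the order in which the rows are added up, and the ROWS clause limits the window to everything from the first row of the year up to the current row.
SELECT Date, Turnover,
    SUM(Turnover) OVER (
        PARTITION BY YEAR(Date)
        ORDER BY Date ASC
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS RunningTotal
FROM Dough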
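The query above spells out the frame explicitly. As a small sketch of why that matters: if you specify an ORDER BY in the OVER clause but leave out the frame, SQL Server falls back to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. The shorter query below is my own variation, not one of the measured queries.

```sql
-- No ROWS clause: the default frame is
-- RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
SELECT Date, Turnover,
       SUM(Turnover) OVER (
           PARTITION BY YEAR(Date)
           ORDER BY Date ASC
       ) AS RunningTotal
FROM Dough;
```

With the sample data this returns the same totals, because every Date is unique. If two rows share the same Date, RANGE treats them as peers and gives both rows the total that includes both of them, while ROWS adds them one at a time. The two frame types can also perform differently, so it’s worth measuring both on your own data.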
Performance
Seeing all these different solutions to the same question, I wonder (and you probably do too) about the performance of these queries. One very quick conclusion: they all return the same records ;).
When you run SET STATISTICS IO ON, you can see the amount of disk activity generated by your statement. If you run this for the queries above, you will get the following results:
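If you don’t want to take my word for the “same records” part, one way to check it is with EXCEPT (available from SQL Server 2005). The sketch below compares the INNER JOIN solution with the window function solution, so it needs SQL Server 2012 or later to run.

```sql
-- Returns the rows produced by the INNER JOIN solution
-- that the window function solution does NOT produce.
SELECT A.Date, A.Turnover, SUM(B.Turnover) AS RunningTotal
FROM Dough A
INNER JOIN Dough B
    ON YEAR(B.Date) = YEAR(A.Date)
   AND B.Date <= A.Date
GROUP BY A.Date, A.Turnover

EXCEPT

SELECT Date, Turnover,
       SUM(Turnover) OVER (
           PARTITION BY YEAR(Date)
           ORDER BY Date ASC
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       )
FROM Dough;
```

An empty result means every row of the first query also appears in the second. To be thorough, swap the two SELECT statements and run it again, because EXCEPT only checks one direction.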
INNER JOIN:
Table 'Dough'. Scan count 2, logical reads 25, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
CROSS JOIN:
Table 'Dough'. Scan count 2, logical reads 25, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
OVER:
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Dough'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
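In the OVER query, you see a table called “Worktable”. This is a temporary, internal object that SQL Server creates to hold the rows of the window while it processes the OVER clause. The zero reads show that, for this query, nothing was actually read from it. Also note that the OVER query scans Dough only once (1 logical read), while both join queries scan it twice and need 25 logical reads.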
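To reproduce the numbers above yourself, wrap the query you want to measure like this. The window function query is used as the example here; the other queries can be dropped in the same way.

```sql
-- Report I/O statistics for every statement that follows.
SET STATISTICS IO ON;

SELECT Date, Turnover,
       SUM(Turnover) OVER (
           PARTITION BY YEAR(Date)
           ORDER BY Date ASC
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS RunningTotal
FROM Dough;

-- Switch the reporting off again for the rest of the session.
SET STATISTICS IO OFF;
```

In SQL Server Management Studio the statistics show up on the Messages tab, not in the result grid. Your exact numbers can differ from mine, depending on your SQL Server version and on how the table is stored.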
Conclusion
As shown above, there are several different ways to get to the same result. In this example I didn’t show you the cursor solution. That’s because it’s a bad practice, a bad performer, and a little bit to preserve my good name ;). If you do want to see this, please leave me a comment, and I’ll add it to this post.
But with every solution you’ll see as much discussion about reasons to use it as discussion on why NOT to use it. And in this case, you might be bound to a specific SQL Server version, so you can’t use a specific approach.
But if you ask me for my opinion, I’ll go for the last option. Not only because I’ve got the privilege to work with SQL Server 2012 in my daily work, but also because it’s the best performer and you’ll end up with the most readable code.
I’m guessing you have a totally different opinion, so please leave a comment with your ideas and/or approaches to this challenge! Questions are greatly appreciated as well!
