SQL server join tables and pivot

I have two tables with data

TABLE 1

    ---------------------------------------------------
    | SALEID | SOLDBY | SALEPRICE | MARGIN |   DATE   |
    |  1     |  'aa'  |  10,000   |   10   | 2013-1-1 |
    |  2     |  'bb'  |  25,000   |    5   | 2013-5-1 |

TABLE 2

    ---------------------------------------------------
    | SALEITEMID | SALEID | SALEPRICE | CATEGORY |
    |  1         |  1     |   6,000   | BOOKS    |
    |  2         |  1     |   4,000   | PRINTING |
    |  3         |  2     |   5,000   | BOOKS    |
    |  4         |  2     |   12,000  | PRINTING |
    |  5         |  2     |   8,000   | DVD      |

I need a query which will produce

TAB3

    --------------------------------------------------------------------------------
    | SALEID | SOLDBY | SALEPRICE | MARGIN |   DATE   |  BOOKS  | PRINTING | DVD
    |  1     |  'aa'  |  10,000   |   10   | 2013-1-1 |  6,000  |  4,000   | 0
    |  2     |  'bb'  |  25,000   |    5   | 2013-5-1 |  5,000  | 12,000   | 8,000

I am pretty new to pivoting and not sure if pivot is way to go for this or not.


This should work:

WITH Sales AS (
   SELECT
      S.SaleID,
      S.SoldBy,
      S.SalePrice,
      S.Margin,
      S.Date,
      I.SalePrice,
      I.Category
   FROM
      dbo.Sale S
      INNER JOIN dbo.SaleItem I
         ON S.SaleID = I.SaleID
)
SELECT *
FROM
   Sales
   PIVOT (Max(SalePrice) FOR Category IN (Books, Printing, DVD)) P
;

Or alternately:

SELECT
   S.SaleID,
   S.SoldBy,
   S.SalePrice,
   S.Margin,
   S.Date,
   I.Books,
   I.Printing,
   I.DVD
FROM
   dbo.Sale S
   INNER JOIN (
      SELECT *
      FROM
         (SELECT SaleID, SalePrice, Category FROM dbo.SaleItem) I
         PIVOT (Max(SalePrice) FOR Category IN (Books, Printing, DVD)) P
   ) I ON S.SaleID = I.SaleID
;

These have the same result set and may in fact be treated the same by the query optimizer, but possibly not. The big difference comes into play when you start putting conditions on the Sale table—you should test and see which query works better.

Note: it is crucial when using PIVOT that only the columns that should be part of the resulting output are available. This is why the two above queries have extra derived table subqueries (SELECT ...) so that only specific columns are exposed. All columns that are available to be seen by PIVOT that aren't listed in the pivot expression will implicitly be grouped on and included in the final output. This will likely not be what you want.

May I suggest, however, that you do the pivoting in the presentation layer? If, for example, you are using SSRS it is quite easy to use a matrix control that will do all the pivoting for you. That is best, because then if you add a new Category, you won't have to modify all your SQL code!

There is a way to dynamically find the column names to pivot, but it involves dynamic SQL. I don't really recommend that as the best way, either, though it is possible.

Another way that could work would be to preprocess this query—meaning to set a trigger on the Category table that rewrites a view to contain all the extant categories that exist. This does solve a lot of the other problems I've mentioned, but again, using the presentation layer is best.

Note: If your column names (that were formerly values) have spaces, are numbers or begin with a number, or are otherwise not valid identifiers, you must quote them with square brackets as in PIVOT (Max(Value) FOR CategoryId IN ([1], [2], [3], [4])) P. Alternately, you can modify the values before they get to the PIVOT part of the query to prepend some letters or remove spaces, so that the column list doesn't need escaping. For further reading on this check out the rules for identifiers in SQL Server.