Real life example, when to use OUTER / CROSS APPLY in SQL
I have been looking at CROSS / OUTER APPLY
with a colleague and we're struggling to find real life examples of where to use them.
I've spent quite a lot of time looking at When should I use Cross Apply over Inner Join? and googling but the main (only) example seems pretty bizarre (using the rowcount from a table to determine how many rows to select from another table).
I thought this scenario may benefit from OUTER APPLY
:
Contacts Table (contains 1 record for each contact) Communication Entries Table (can contain n phone, fax, email fro each contact)
But using subqueries, common table expressions, OUTER JOIN
with RANK()
and OUTER APPLY
all seem to perform equally. I'm guessing this means the scenario isn't applicable to APPLY
.
Please share some real life examples and help explain the feature!
Some uses for APPLY
are...
1) Top N per group queries (can be more efficient for some cardinalities)
SELECT pr.name,
pa.name
FROM sys.procedures pr
OUTER APPLY (SELECT TOP 2 *
FROM sys.parameters pa
WHERE pa.object_id = pr.object_id
ORDER BY pr.name) pa
ORDER BY pr.name,
pa.name
2) Calling a Table Valued Function for each row in the outer query
SELECT *
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle)
3) Reusing a column alias
SELECT number,
doubled_number,
doubled_number_plus_one
FROM master..spt_values
CROSS APPLY (SELECT 2 * CAST(number AS BIGINT)) CA1(doubled_number)
CROSS APPLY (SELECT doubled_number + 1) CA2(doubled_number_plus_one)
4) Unpivoting more than one group of columns
Assumes 1NF violating table structure....
CREATE TABLE T
(
Id INT PRIMARY KEY,
Foo1 INT, Foo2 INT, Foo3 INT,
Bar1 INT, Bar2 INT, Bar3 INT
);
Example using 2008+ VALUES
syntax.
SELECT Id,
Foo,
Bar
FROM T
CROSS APPLY (VALUES(Foo1, Bar1),
(Foo2, Bar2),
(Foo3, Bar3)) V(Foo, Bar);
In 2005 UNION ALL
can be used instead.
SELECT Id,
Foo,
Bar
FROM T
CROSS APPLY (SELECT Foo1, Bar1
UNION ALL
SELECT Foo2, Bar2
UNION ALL
SELECT Foo3, Bar3) V(Foo, Bar);
There are various situations where you cannot avoid CROSS APPLY
or OUTER APPLY
.
Consider you have two tables.
MASTER TABLE
x------x--------------------x
| Id | Name |
x------x--------------------x
| 1 | A |
| 2 | B |
| 3 | C |
x------x--------------------x
DETAILS TABLE
x------x--------------------x-------x
| Id | PERIOD | QTY |
x------x--------------------x-------x
| 1 | 2014-01-13 | 10 |
| 1 | 2014-01-11 | 15 |
| 1 | 2014-01-12 | 20 |
| 2 | 2014-01-06 | 30 |
| 2 | 2014-01-08 | 40 |
x------x--------------------x-------x
CROSS APPLY
There are many situation where we need to replace INNER JOIN
with CROSS APPLY
.
1. If we want to join 2 tables on TOP n
results with INNER JOIN
functionality
Consider if we need to select Id
and Name
from Master
and last two dates for each Id
from Details table
.
SELECT M.ID,M.NAME,D.PERIOD,D.QTY
FROM MASTER M
INNER JOIN
(
SELECT TOP 2 ID, PERIOD,QTY
FROM DETAILS D
ORDER BY CAST(PERIOD AS DATE)DESC
)D
ON M.ID=D.ID
The above query generates the following result.
x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-12 | 20 |
x------x---------x--------------x-------x
See, it generated results for last two dates with last two date's Id
and then joined these records only in outer query on Id
, which is wrong. To accomplish this, we need to use CROSS APPLY
.
SELECT M.ID,M.NAME,D.PERIOD,D.QTY
FROM MASTER M
CROSS APPLY
(
SELECT TOP 2 ID, PERIOD,QTY
FROM DETAILS D
WHERE M.ID=D.ID
ORDER BY CAST(PERIOD AS DATE)DESC
)D
and forms he following result.
x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-12 | 20 |
| 2 | B | 2014-01-08 | 40 |
| 2 | B | 2014-01-06 | 30 |
x------x---------x--------------x-------x
Here is the working. The query inside CROSS APPLY
can reference the outer table, where INNER JOIN
cannot do this(throws compile error). When finding the last two dates, joining is done inside CROSS APPLY
ie, WHERE M.ID=D.ID
.
2. When we need INNER JOIN
functionality using functions.
CROSS APPLY
can be used as a replacement with INNER JOIN
when we need to get result from Master
table and a function
.
SELECT M.ID,M.NAME,C.PERIOD,C.QTY
FROM MASTER M
CROSS APPLY dbo.FnGetQty(M.ID) C
And here is the function
CREATE FUNCTION FnGetQty
(
@Id INT
)
RETURNS TABLE
AS
RETURN
(
SELECT ID,PERIOD,QTY
FROM DETAILS
WHERE ID=@Id
)
which generated the following result
x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-11 | 15 |
| 1 | A | 2014-01-12 | 20 |
| 2 | B | 2014-01-06 | 30 |
| 2 | B | 2014-01-08 | 40 |
x------x---------x--------------x-------x
OUTER APPLY
1. If we want to join 2 tables on TOP n
results with LEFT JOIN
functionality
Consider if we need to select Id and Name from Master
and last two dates for each Id from Details
table.
SELECT M.ID,M.NAME,D.PERIOD,D.QTY
FROM MASTER M
LEFT JOIN
(
SELECT TOP 2 ID, PERIOD,QTY
FROM DETAILS D
ORDER BY CAST(PERIOD AS DATE)DESC
)D
ON M.ID=D.ID
which forms the following result
x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-12 | 20 |
| 2 | B | NULL | NULL |
| 3 | C | NULL | NULL |
x------x---------x--------------x-------x
This will bring wrong results ie, it will bring only latest two dates data from Details
table irrespective of Id
even though we join with Id
. So the proper solution is using OUTER APPLY
.
SELECT M.ID,M.NAME,D.PERIOD,D.QTY
FROM MASTER M
OUTER APPLY
(
SELECT TOP 2 ID, PERIOD,QTY
FROM DETAILS D
WHERE M.ID=D.ID
ORDER BY CAST(PERIOD AS DATE)DESC
)D
which forms the following desired result
x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-12 | 20 |
| 2 | B | 2014-01-08 | 40 |
| 2 | B | 2014-01-06 | 30 |
| 3 | C | NULL | NULL |
x------x---------x--------------x-------x
2. When we need LEFT JOIN
functionality using functions
.
OUTER APPLY
can be used as a replacement with LEFT JOIN
when we need to get result from Master
table and a function
.
SELECT M.ID,M.NAME,C.PERIOD,C.QTY
FROM MASTER M
OUTER APPLY dbo.FnGetQty(M.ID) C
And the function goes here.
CREATE FUNCTION FnGetQty
(
@Id INT
)
RETURNS TABLE
AS
RETURN
(
SELECT ID,PERIOD,QTY
FROM DETAILS
WHERE ID=@Id
)
which generated the following result
x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-11 | 15 |
| 1 | A | 2014-01-12 | 20 |
| 2 | B | 2014-01-06 | 30 |
| 2 | B | 2014-01-08 | 40 |
| 3 | C | NULL | NULL |
x------x---------x--------------x-------x
Common feature of
CROSS APPLY
andOUTER APPLY
CROSS APPLY
or OUTER APPLY
can be used to retain NULL
values when unpivoting, which are interchangeable.
Consider you have the below table
x------x-------------x--------------x
| Id | FROMDATE | TODATE |
x------x-------------x--------------x
| 1 | 2014-01-11 | 2014-01-13 |
| 1 | 2014-02-23 | 2014-02-27 |
| 2 | 2014-05-06 | 2014-05-30 |
| 3 | NULL | NULL |
x------x-------------x--------------x
When you use UNPIVOT
to bring FROMDATE
AND TODATE
to one column, it will eliminate NULL
values by default.
SELECT ID,DATES
FROM MYTABLE
UNPIVOT (DATES FOR COLS IN (FROMDATE,TODATE)) P
which generates the below result. Note that we have missed the record of Id
number 3
x------x-------------x
| Id | DATES |
x------x-------------x
| 1 | 2014-01-11 |
| 1 | 2014-01-13 |
| 1 | 2014-02-23 |
| 1 | 2014-02-27 |
| 2 | 2014-05-06 |
| 2 | 2014-05-30 |
x------x-------------x
In such cases a CROSS APPLY
or OUTER APPLY
will be useful
SELECT DISTINCT ID,DATES
FROM MYTABLE
OUTER APPLY(VALUES (FROMDATE),(TODATE))
COLUMNNAMES(DATES)
which forms the following result and retains Id
where its value is 3
x------x-------------x
| Id | DATES |
x------x-------------x
| 1 | 2014-01-11 |
| 1 | 2014-01-13 |
| 1 | 2014-02-23 |
| 1 | 2014-02-27 |
| 2 | 2014-05-06 |
| 2 | 2014-05-30 |
| 3 | NULL |
x------x-------------x
One real life example would be if you had a scheduler and wanted to see what the most recent log entry was for each scheduled task.
select t.taskName, lg.logResult, lg.lastUpdateDate
from task t
cross apply (select top 1 taskID, logResult, lastUpdateDate
from taskLog l
where l.taskID = t.taskID
order by lastUpdateDate desc) lg
To answer the point above knock up an example:
create table #task (taskID int identity primary key not null, taskName varchar(50) not null)
create table #log (taskID int not null, reportDate datetime not null, result varchar(50) not null, primary key(reportDate, taskId))
insert #task select 'Task 1'
insert #task select 'Task 2'
insert #task select 'Task 3'
insert #task select 'Task 4'
insert #task select 'Task 5'
insert #task select 'Task 6'
insert #log
select taskID, 39951 + number, 'Result text...'
from #task
cross join (
select top 1000 row_number() over (order by a.id) as number from syscolumns a cross join syscolumns b cross join syscolumns c) n
And now run the two queries with a execution plan.
select t.taskID, t.taskName, lg.reportDate, lg.result
from #task t
left join (select taskID, reportDate, result, rank() over (partition by taskID order by reportDate desc) rnk from #log) lg
on lg.taskID = t.taskID and lg.rnk = 1
select t.taskID, t.taskName, lg.reportDate, lg.result
from #task t
outer apply ( select top 1 l.*
from #log l
where l.taskID = t.taskID
order by reportDate desc) lg
You can see that the outer apply query is more efficient. (Couldn't attach the plan as I'm a new user... Doh.)