Is this a bug, or does Snowflake not fully support correlated subqueries in a WHERE EXISTS clause?

Solution 1:

I see this still happens, and I just notice you already know about swapping empPay.End_Date to CURRENT_DATE which is how I would have written it.

It does make the correlated sub query more complex because now you are mixing in two tables instead of one.

when CURRENT_DATE is used the SQL is the same as:

SELECT 
    s.emp_sk
    ,ep.pay_rate_sk
    ,TO_DATE('2021-05-31') AS report_date
    ,ep.start_date AS pay_start_date
    ,ep.end_date AS pay_end_date
FROM (
    SELECT 
        e.emp_sk
    FROM employee e
    WHERE EXISTS (
        SELECT 1 
        FROM employee_x_location AS el
        WHERE e.emp_sk = el.emp_sk
            AND TO_DATE('2021-05-31') BETWEEN el.start_date AND COALESCE(el.end_date, CURRENT_DATE)
    )
) AS s
JOIN employee_x_pay_rate AS ep
    ON s.emp_sk = ep.emp_sk
        AND TO_DATE('2021-05-31') BETWEEN ep.start_date AND ep.end_date;

so demonstrating the correlation is complex verse simple can be shown, by swapping employee table with employee_x_pay_rate in the sub-select, like so:

SELECT 
    e.emp_sk
FROM Employee_X_Pay_Rate e
WHERE EXISTS (
    SELECT 1 
    FROM employee_x_location AS el
    WHERE e.emp_sk = el.emp_sk
        AND TO_DATE('2021-05-31') BETWEEN el.start_date AND COALESCE(el.end_date, CURRENT_DATE)
)

works, but use the value from that table does not:

SELECT 
    e.emp_sk
FROM Employee_X_Pay_Rate e
WHERE EXISTS (
    SELECT 1 
    FROM employee_x_location AS el
    WHERE e.emp_sk = el.emp_sk
        AND TO_DATE('2021-05-31') BETWEEN el.start_date AND COALESCE(el.end_date, e.End_Date)
)

sign IFNULL(el.end_date, e.End_Date) and NVL(el.end_date, e.End_Date) both fail also.

But you can restruct the code to move the COALESCE into a CTE, and then use the WHERE EXISTS like so:

WITH r_emp_pay AS (
    SELECT 
       empPay.Emp_SK
       ,empPay.Pay_Rate_SK
       ,empPay.Start_Date 
       ,empPay.End_Date
    FROM Employee_X_Pay_Rate AS empPay
    WHERE TO_DATE('2021-05-31', 'yyyy-mm-dd') BETWEEN empPay.Start_Date AND empPay.End_Date
), r_emp_loc AS (
    SELECT 
        empLoc.Emp_SK
        ,empLoc.Start_Date
        ,empLoc.End_Date
        ,COALESCE(empLoc.End_Date, empPay.End_Date) as col_end_date
    FROM Employee_X_Location empLoc
    JOIN r_emp_pay empPay
        ON empPay.Emp_SK = empLoc.Emp_SK
    WHERE TO_DATE('2021-05-31', 'yyyy-mm-dd') BETWEEN empLoc.Start_Date AND COALESCE(empLoc.End_Date, CURRENT_DATE)
)
SELECT 
   emp.Emp_SK
   ,empPay.Pay_Rate_SK
   ,TO_DATE('2021-05-31', 'yyyy-mm-dd') AS Report_Date
   ,empPay.Start_Date AS Pay_Start_Date
   ,empPay.End_Date AS Pay_End_Date
FROM Employee emp
JOIN r_emp_pay empPay
    ON emp.Emp_SK = empPay.Emp_SK
WHERE EXISTS (
   SELECT 1 FROM r_emp_loc empLoc
   WHERE emp.Emp_SK = empLoc.Emp_SK
      AND TO_DATE('2021-05-31', 'yyyy-mm-dd') BETWEEN empLoc.Start_Date AND empLoc.col_end_date
);

gives:

EMP_SK PAY_RATE_SK REPORT_DATE PAY_START_DATE PAY_END_DATE
1 3 2021-05-31 2021-04-01 00:00:00.000 2099-12-31 00:00:00.000