How to rewrite IS DISTINCT FROM and IS NOT DISTINCT FROM?

Solution 1:

The IS DISTINCT FROM predicate was introduced as feature T151 of SQL:1999, and its readable negation, IS NOT DISTINCT FROM, was added as feature T152 of SQL:2003. The purpose of these predicates is to guarantee that the result of comparing two values is either True or False, never Unknown.

These predicates work with any comparable type (including rows, arrays, and multisets), which makes them rather complicated to emulate exactly. However, SQL Server doesn't support most of these types, so we can get pretty far by checking for null operands:

  • a IS DISTINCT FROM b can be rewritten as:

    ((a <> b OR a IS NULL OR b IS NULL) AND NOT (a IS NULL AND b IS NULL))
    
  • a IS NOT DISTINCT FROM b can be rewritten as:

    (NOT (a <> b OR a IS NULL OR b IS NULL) OR (a IS NULL AND b IS NULL))
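
As a quick sanity check, the two rewrites above can be evaluated over the four interesting operand combinations. This is only a sketch: the sample values and the names v, a, and b are made up for illustration, the CASE wrapper is there because SQL Server cannot return a predicate directly, and the VALUES derived table assumes SQL Server 2008 or later.

```sql
-- Hypothetical sample operands: equal, unequal, one NULL, both NULL.
SELECT v.a, v.b,
       CASE WHEN ((v.a <> v.b OR v.a IS NULL OR v.b IS NULL)
                  AND NOT (v.a IS NULL AND v.b IS NULL))
            THEN 'True' ELSE 'False' END AS is_distinct,
       CASE WHEN (NOT (v.a <> v.b OR v.a IS NULL OR v.b IS NULL)
                  OR (v.a IS NULL AND v.b IS NULL))
            THEN 'True' ELSE 'False' END AS is_not_distinct
FROM (VALUES (1, 1), (1, 2), (1, NULL), (NULL, NULL)) AS v(a, b);
```

Because the two predicates are complements and neither can be Unknown, every row shows exactly one 'True': is_distinct is 'True' for (1, 2) and (1, NULL), and is_not_distinct is 'True' for (1, 1) and (NULL, NULL).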
    

Your own answer is incorrect as it fails to consider that FALSE OR NULL evaluates to Unknown. For example, NULL IS DISTINCT FROM NULL should evaluate to False. Similarly, 1 IS NOT DISTINCT FROM NULL should evaluate to False. In both cases, your expressions yield Unknown.

Solution 2:

Another solution I like leverages the fact that EXISTS always yields a true two-valued Boolean result (True or False, never Unknown), combined with INTERSECT. This solution should work in SQL Server 2005+.

  • a IS NOT DISTINCT FROM b can be written as:

    EXISTS(SELECT a INTERSECT SELECT b)

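As documented, INTERSECT treats two NULL values as equal, so when both operands are NULL, INTERSECT returns a single row and EXISTS yields True. When the operands differ, or only one of them is NULL, INTERSECT returns no rows and EXISTS yields False.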

  • a IS DISTINCT FROM b can be written as:

    NOT EXISTS(SELECT a INTERSECT SELECT b)
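
A minimal sketch of both forms applied to a handful of operand pairs follows. The sample values and the names v, a, and b are hypothetical. SQL Server needs the predicate wrapped in CASE (or placed in a WHERE clause) to show its result, and the VALUES derived table used for the sample data assumes SQL Server 2008 or later, even though the technique itself does not.

```sql
-- Hypothetical sample operands; v, a and b are illustrative names only.
SELECT v.a, v.b,
       CASE WHEN EXISTS (SELECT v.a INTERSECT SELECT v.b)
            THEN 'True' ELSE 'False' END AS is_not_distinct,
       CASE WHEN NOT EXISTS (SELECT v.a INTERSECT SELECT v.b)
            THEN 'True' ELSE 'False' END AS is_distinct
FROM (VALUES (1, 1), (1, 2), (1, NULL), (NULL, NULL)) AS v(a, b);
```

Here is_not_distinct is 'True' for (1, 1) and (NULL, NULL), and is_distinct is 'True' for (1, 2) and (1, NULL).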

This approach is much more concise when you need to compare multiple nullable columns across two tables. For example, to return the rows where TableB has a different value than TableA for Col1, Col2, or Col3, the following can be used:

SELECT *
FROM TableA A
   INNER JOIN TableB B ON A.PK = B.PK
WHERE NOT EXISTS(
   SELECT A.Col1, A.Col2, A.Col3
   INTERSECT
   SELECT B.Col1, B.Col2, B.Col3);

Paul White explains this workaround in more detail: https://sql.kiwi/2011/06/undocumented-query-plans-equality-comparisons.html

Solution 3:

If your SQL implementation does not implement the SQL standard IS DISTINCT FROM and IS NOT DISTINCT FROM operators, you can rewrite expressions containing them using the following equivalencies:

In general:

a IS DISTINCT FROM b <==>
(
    ((a) IS NULL AND (b) IS NOT NULL)
OR
    ((a) IS NOT NULL AND (b) IS NULL)
OR
    ((a) <> (b))
)

a IS NOT DISTINCT FROM b <==>
(
    ((a) IS NULL AND (b) IS NULL)
OR
    ((a) = (b))
)

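This answer is incorrect in contexts where the difference between UNKNOWN and FALSE matters, because the expressions above can evaluate to UNKNOWN: NULL IS DISTINCT FROM NULL and 1 IS NOT DISTINCT FROM NULL should both be FALSE, but these rewrites yield UNKNOWN for them. That matters, for example, when the expression is negated or used in a CHECK constraint. I think such contexts are uncommon, though. See the accepted answer by @ChrisBandy.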
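
To illustrate one context where the difference shows up, here is a small T-SQL sketch that negates the IS NOT DISTINCT FROM rewrite. The variables @a and @b are hypothetical, the inline DECLARE initialization assumes SQL Server 2008 or later, and the behavior shown assumes the default SET ANSI_NULLS ON.

```sql
DECLARE @a int = 1, @b int = NULL;  -- hypothetical operands: exactly one is NULL

-- The rewrite of "@a IS NOT DISTINCT FROM @b" is FALSE OR UNKNOWN = UNKNOWN.
-- Negating it should give "@a IS DISTINCT FROM @b" (True), but NOT UNKNOWN
-- is still UNKNOWN, so the CASE falls through to the ELSE branch.
SELECT CASE WHEN NOT ((@a IS NULL AND @b IS NULL) OR (@a = @b))
            THEN 'distinct'
            ELSE 'not distinct (wrong)' END AS negated_rewrite;
```

In a plain WHERE clause without negation, the rewrite filters the same rows as the standard predicate, since UNKNOWN and FALSE both reject the row.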

If a placeholder value can be identified that does not actually occur in the data, then COALESCE is an alternative:

a IS DISTINCT FROM b <==> COALESCE(a, placeholder) <> COALESCE(b, placeholder)
a IS NOT DISTINCT FROM b <==> COALESCE(a, placeholder) = COALESCE(b, placeholder)
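
The condition that the placeholder never occurs in the data is essential. The sketch below uses -1 as a hypothetical placeholder and deliberately includes a row that really contains -1, to show what goes wrong when the condition is violated. The sample values and the names v, a, and b are made up, and the VALUES derived table assumes SQL Server 2008 or later.

```sql
-- Hypothetical data: -1 is the chosen placeholder, but one row really contains -1.
SELECT v.a, v.b,
       CASE WHEN COALESCE(v.a, -1) <> COALESCE(v.b, -1)
            THEN 'True' ELSE 'False' END AS is_distinct
FROM (VALUES (1, NULL), (NULL, NULL), (-1, NULL), (2, 2)) AS v(a, b);
```

The pairs (1, NULL), (NULL, NULL), and (2, 2) come out as expected ('True', 'False', 'False'), but (-1, NULL) is reported as 'False' even though -1 IS DISTINCT FROM NULL is True. With a non-NULL placeholder, neither side of the comparison can be NULL, so this form never produces Unknown.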