WHERE IN with multiple columns in Redshift

I am using Amazon Redshift where I have two tables. A staging table where I COPY all data from S3 and a target table where everything should eventually be inserted.

Now I have query that should delete everything from the target table that is in the staging table so that I won't have duplicates.

DELETE FROM xschema.target
WHERE (col1, col2) IN 
(
SELECT col1, col2
FROM xschema.staging
)
AND col3 = 'de'
;

The query executes but it deletes everything where col3 = 'de' is true. The IN condition of the statement always returns true.

For example: If I have in my target table the following values

col1 | col2 | col3 | col4
aaa  | 1    | de   | 100
bbb  | 2    | de   | 200
ccc  | 1    | us   | 180

And in my staging table the following values:

col1 | col2 | col3 | col4
aaa  | 1    | de   | 100
ddd  | 1    | de   | 250

And I run the query above to delete all values in the target table from the staging table I get:

col1 | col2 | col3 | col4
ccc  | 1    | us   | 180

but I would expect:

col1 | col2 | col3 | col4
bbb  | 2    | de   | 200
ccc  | 1    | us   | 180

I have not found anything so far in the official documentation or here (at least not for redshift). Any ideas on how to solve that?

Edit: Added examples

I would use EXISTS :

DELETE 
FROM xschema.target t
WHERE EXISTS (SELECT 1 FROM xschema.staging s WHERE s.col1 = t.col1 AND s.col2 = t.col2) AND
      col3 = 'de';

I think Redshift started supporting this recently. Following query works without any issue on Redshift.

select * from inventory
where (INV_WAREHOUSE_SK, inv_item_sk) IN (select 1, 2);

WHERE IN with multiple columns in Redshift

Related

Recent Posts