WHERE IN with multiple columns in Redshift
I am using Amazon Redshift where I have two tables. A staging table where I COPY all data from S3 and a target table where everything should eventually be inserted.
Now I have query that should delete everything from the target table that is in the staging table so that I won't have duplicates.
DELETE FROM xschema.target
WHERE (col1, col2) IN
(
SELECT col1, col2
FROM xschema.staging
)
AND col3 = 'de'
;
The query executes but it deletes everything where col3 = 'de'
is true.
The IN
condition of the statement always returns true.
For example: If I have in my target table the following values
col1 | col2 | col3 | col4
aaa | 1 | de | 100
bbb | 2 | de | 200
ccc | 1 | us | 180
And in my staging table the following values:
col1 | col2 | col3 | col4
aaa | 1 | de | 100
ddd | 1 | de | 250
And I run the query above to delete all values in the target table from the staging table I get:
col1 | col2 | col3 | col4
ccc | 1 | us | 180
but I would expect:
col1 | col2 | col3 | col4
bbb | 2 | de | 200
ccc | 1 | us | 180
I have not found anything so far in the official documentation or here (at least not for redshift). Any ideas on how to solve that?
Edit: Added examples
I would use EXISTS
:
DELETE
FROM xschema.target t
WHERE EXISTS (SELECT 1 FROM xschema.staging s WHERE s.col1 = t.col1 AND s.col2 = t.col2) AND
col3 = 'de';
I think Redshift started supporting this recently. Following query works without any issue on Redshift.
select * from inventory
where (INV_WAREHOUSE_SK, inv_item_sk) IN (select 1, 2);