MySQL Find closest date and corresponding records between 2 tables
A snowflake focused solution:
So with some CTEs for to fake the data:
WITH orders(order_id, order_date, email) AS (
SELECT column1, to_date(column2,'DD/MM/YYYY'), column3 FROM VALUES
(1234, '20/02/2021', '[email protected]')
), subscriptions(subscription_id, start_date, email, cancel_date) AS (
SELECT column1, to_date(column2,'DD/MM/YYYY'), column3, to_date(column4,'DD/MM/YYYY') FROM VALUES
(1236, '20/03/2021', '[email protected]', '20/04/2021'),
(1232, '19/02/2021', '[email protected]', '19/03/2021'),
(1219, '20/01/2021', '[email protected]', '29/01/2021')
)
and this SQL:
SELECT
o.order_id,
o.order_date,
o.email,
FIRST_VALUE(s.start_date) OVER (PARTITION BY o.order_id, o.order_date, o.email ORDER BY s.start_date) as EARLIEST_SUBSCRIPTION_START,
start_date AS CLOSEST_SUBSCRIPTION_START,
cancel_date AS CLOSEST_SUBSCRIPTION_END
FROM orders AS o
JOIN subscriptions AS s
ON o.email = s.email AND o.order_date > s.start_date
QUALIFY row_number() over(PARTITION BY o.order_id, o.order_date, o.email ORDER BY s.start_date desc ) = 1
we get:
ORDER_ID | ORDER_DATE | EARLIEST_SUBSCRIPTION_START | CLOSEST_SUBSCRIPTION_START | CLOSEST_SUBSCRIPTION_END | |
---|---|---|---|---|---|
1234 | 2021-02-20 | [email protected] | 2021-01-20 | 2021-02-19 | 2021-03-19 |
This is using the WINDOW function FIRST_VALUE to get the first row for each row, then we use QUALIFY to drop the rows (per order,oder_date,email) that are not the first when sorted by subscription date in reverse.