How to fill in missing dates
Solution 1:
Here is a query that would work. Start by cross joining all combinations of dates and users (add filters as needed), then left join the users table and calculate quota using the last_value() function (note that if you are using Snowflake, you must specify "rows between unbounded preceding and current row" as documented here):
with all_dates_users as (
--all combinations of dates and users
select date, user
from dates
cross join (select distinct user_email as user from users)
),
joined as (
--left join users table to the previous
select DU.date, DU.user, U.sent_at, U.user_email, U.score, U.quota
from all_dates_users DU
left join users U on U.sent_at = DU.date and U.user_email = DU.user
)
--calculate quota as previous quota using last_value() function
select date, user, nvl(score, 0) as score, last_value(quota) ignore nulls over (partition by user order by date desc rows between unbounded preceding and current row) as quota
from joined
order by date desc;
Solution 2:
Maja's solution using the WINDOW FUCNTION LAST_VALUE works well, Snowflake has another way to express that which is a little shorter LAG IGNORE NULL OVER form.
WITH dates AS (
SELECT DATEADD(day, ROW_NUMBER() OVER(ORDER BY TRUE),'2022-01-01')::date AS date
FROM TABLE(generator(ROWCOUNT => 50))
), new_data AS (
SELECT * FROM VALUES
('2022-01-19','[email protected]',85,100),
('2022-01-18','[email protected]',50,200),
('2022-01-15','[email protected]',34,400)
v(date, user_email, score , quota )
), users_ranges AS (
SELECT
user_email,
min(date) as min_date,
max(date) as max_date
FROM new_data
GROUP BY 1
), dates_for_users AS (
SELECT d.date,
u.user_email
FROM users_ranges AS u
JOIN dates AS d
ON d.date between u.min_date and u.max_date
)
SELECT du.date,
du.user_email,
ZEROIFNULL(u.score) as score,
NVL(u.quota, LAG(u.quota) IGNORE NULLS OVER(partition by du.user_email order by du.date)) as quota
FROM dates_for_users AS du
LEFT JOIN new_data AS u
ON du.date = u.date AND du.user_email = u.user_email
ORDER BY 1 desc,2;
Which gives:
DATE | USER_EMAIL | SCORE | QUOTA |
---|---|---|---|
2022-01-19 | [email protected] | 85 | 100 |
2022-01-18 | [email protected] | 50 | 200 |
2022-01-17 | [email protected] | 0 | 400 |
2022-01-16 | [email protected] | 0 | 400 |
2022-01-15 | [email protected] | 34 | 400 |
As your problem is described, there is a tiny logic gap of you onyl give two rows, but are expecting a large date range, but I assume that is not the heart of the problem, and you just really want the LAG / LAST_VALUE logic.
This works by dates
& new_data
are just the table proxies. Then users_ranges
is used to find the time ranges for each user, that will be used for the join dates the dates
table which happens in dates_for_users
. Now we have all the date gaps filled for each user, we can left join to the new data, to get score and zero if null, and use LAG to get the prior QUOTA ignoring nulls, which a little NVL to only use the LAG value if the current row does not have a value (COALESCE can also be used for this).