How do you write incremental models in DBT?
If you have a date field you can use it to load only the last data. I.e. you have transaction_date column.
{{ config(
materialized='incremental',
as_columnstore=false,
pre_hook="""
{% if is_incremental() %}
delete from {{this}}
where transaction_date >= '{{ (modules.datetime.datetime.now() - modules.datetime.timedelta(2)).isoformat() }}'
{% endif %}
"""
)}}
SELECT
u.user_id, t.transaction_id
FROM users u
JOIN transactions t ON t.user_id = u.user_id
WHERE u.active = 1
{% if is_incremental() %}
and transaction_date >= '{{ (modules.datetime.datetime.now() - modules.datetime.timedelta(2)).isoformat() }}'
{% endif %}
The first time you run this model it will ignore all the code under "if is_incremental()". All the upcoming runs it will clean up transactions for the last two days and reload them.
As @anders-swanson wrote in his comment, if transaction_id
is definitely unique, you could set it as the unique_key
and materialize your model as an incremental table.
dbt's docs explain how to do this. Using your example, it might be:
{{
config(
materialized='incremental',
unique_key='transaction_id'
)
}}
select
u.user_id, t.transaction_id
from users u
join transactions t ON t.user_id = u.user_id
where u.active = 1
If transaction_id
is not unique but transaction_id
||user_id
is, you could try creating a new column which concatenates these columns in an upstream dbt model, and then assigning that as the unique_key
:
{{
config(
materialized='incremental',
unique_key='pkey'
)
}}
select
u.user_id,
t.transaction_id,
u.user_id||t.transaction_id as pkey
from users u
join transactions t ON t.user_id = u.user_id
where u.active = 1
Otherwise, you'll have to pull in a column that is either a) unique, or b) has an ordered quality that could be used to apply an is_incremental()
filter (like @viacheslav-nefedov wrote).