Insert into a PostgreSQL table from pandas with "ON CONFLICT" update

The to_sql docs mention a method argument that takes a callable. Writing your own callable lets you use whatever Postgres clauses you need. The docs include an example of such a callable: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-sql-method

That example is quite different from what you need, but look at the arguments pandas passes to the callable (table, conn, keys, data_iter). They give you everything required to construct a regular SQL statement.
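As a minimal sketch of that idea, the callable below builds a plain INSERT ... ON CONFLICT ... DO UPDATE statement from the table name and the keys pandas passes in. It assumes a SQLAlchemy connection backed by psycopg2 or psycopg (both use %s placeholders), a table in the default schema, and that you know the conflict columns. The names quote_ident, build_upsert_sql and make_upsert_method are hypothetical helpers, not pandas API.

```python
from typing import Callable, Iterable, Sequence


def quote_ident(name: str) -> str:
    """Double-quote a PostgreSQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_upsert_sql(table_name: str, keys: Sequence[str],
                     conflict_cols: Sequence[str]) -> str:
    """Build INSERT ... ON CONFLICT (...) DO UPDATE with %s placeholders."""
    cols = ", ".join(quote_ident(k) for k in keys)
    placeholders = ", ".join(["%s"] * len(keys))
    target = ", ".join(quote_ident(c) for c in conflict_cols)
    # Overwrite every non-conflict column with the value from the rejected row.
    updates = ", ".join(
        f"{quote_ident(k)} = EXCLUDED.{quote_ident(k)}"
        for k in keys
        if k not in conflict_cols
    )
    sql = (f"INSERT INTO {quote_ident(table_name)} ({cols}) "
           f"VALUES ({placeholders}) ON CONFLICT ({target})")
    # If only conflict columns are present, there is nothing to update.
    return sql + (f" DO UPDATE SET {updates}" if updates else " DO NOTHING")


def make_upsert_method(conflict_cols: Sequence[str]) -> Callable:
    """Return a callable usable as DataFrame.to_sql(method=...)."""
    def upsert(table, conn, keys, data_iter: Iterable) -> int:
        sql = build_upsert_sql(table.name, list(keys), conflict_cols)
        rows = [tuple(row) for row in data_iter]
        if rows:
            # A list of tuples makes SQLAlchemy use the driver's executemany().
            conn.exec_driver_sql(sql, rows)
        return len(rows)
    return upsert
```

Usage would be df.to_sql("test_table", con=conn, if_exists="append", index=False, method=make_upsert_method(["col_a", "col_b"])). Giving the conflict columns instead of a constraint name lets PostgreSQL infer the matching unique index.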


To follow up on Brendan's answer with an example, this is what worked for me:

import os
import sqlalchemy as sa
import pandas as pd
from sqlalchemy.dialects.postgresql import insert


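engine = sa.create_engine(os.getenv("DBURL"))
meta = sa.MetaData()
# Reflect existing tables and views so they can be looked up by name.
# MetaData.bind was removed in SQLAlchemy 2.0, so pass the engine explicitly.
meta.reflect(bind=engine, views=True)


def upsert(table, conn, keys, data_iter):
    # Name of the UNIQUE (col_a, col_b) constraint on test_table
    upsert_args = {"constraint": "test_table_col_a_col_b_key"}
    for data in data_iter:
        # Map column names to the values of this row
        data = {k: data[i] for i, k in enumerate(keys)}
        # On conflict, overwrite the existing row with the incoming values
        upsert_args["set_"] = data
        insert_stmt = insert(meta.tables[table.name]).values(**data)
        upsert_stmt = insert_stmt.on_conflict_do_update(**upsert_args)
        conn.execute(upsert_stmt)


if __name__ == "__main__":
    df = pd.read_csv("test_data.txt")
    # engine.begin() commits the transaction when the block exits
    with engine.begin() as conn:
        df.to_sql(
            "test_table",
            con=conn,
            if_exists="append",
            method=upsert,
            index=False,
        )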

In this example, the table schema would be something like:

CREATE TABLE test_table(
    col_a text NOT NULL,
    col_b text NOT NULL,
    col_c text,
    -- PostgreSQL names this constraint test_table_col_a_col_b_key by default
    UNIQUE (col_a, col_b)
);
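To see what the upsert does with this schema without a PostgreSQL server, here is a sketch that uses Python's built-in sqlite3 as a stand-in. SQLite (3.24 and later) accepts the same ON CONFLICT (...) DO UPDATE SET col = excluded.col form, but not PostgreSQL's ON CONFLICT ON CONSTRAINT name variant used in the answer above, so the conflict target is written as a column list here. The rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE test_table(
        col_a text NOT NULL,
        col_b text NOT NULL,
        col_c text,
        UNIQUE (col_a, col_b)
    )"""
)

# Conflict target given as columns; "excluded" is the row that failed to insert.
UPSERT = """
INSERT INTO test_table (col_a, col_b, col_c) VALUES (?, ?, ?)
ON CONFLICT (col_a, col_b) DO UPDATE SET col_c = excluded.col_c
"""

# First load: two new rows.
conn.executemany(UPSERT, [("x", "1", "old"), ("y", "2", "old")])
# Second load: ("x", "1") already exists, so its col_c is updated;
# ("z", "3") is new and gets inserted.
conn.executemany(UPSERT, [("x", "1", "new"), ("z", "3", "new")])
conn.commit()

rows = conn.execute(
    "SELECT col_a, col_b, col_c FROM test_table ORDER BY col_a"
).fetchall()
print(rows)  # [('x', '1', 'new'), ('y', '2', 'old'), ('z', '3', 'new')]
```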

If anybody wants to build on zdgriffith's answer and generate the table constraint name dynamically, you can use the following query in PostgreSQL:

select distinct tco.constraint_name
from information_schema.table_constraints tco
         join information_schema.key_column_usage kcu
              on kcu.constraint_name = tco.constraint_name
                  and kcu.constraint_schema = tco.constraint_schema
where kcu.table_name = '{table.name}'
  and tco.constraint_type = 'PRIMARY KEY';

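You can then format this string to populate table.name inside the upsert() method, although a bind parameter is safer than string formatting. Note that the query looks for a PRIMARY KEY; the example table above has a UNIQUE constraint instead, so for that schema filter on constraint_type = 'UNIQUE'.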
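As a sketch of the bind-parameter approach, the lookup can be wrapped in a small helper. It assumes a DBAPI cursor from psycopg2 or psycopg (both accept %(name)s placeholders) and adds a schema filter defaulting to "public"; the schema filter, its default, and the names CONSTRAINT_NAME_SQL and find_constraint_name are hypothetical additions, not part of the answer above.

```python
from typing import Optional

# Same lookup as above, but with bind parameters instead of string formatting,
# plus an (assumed) schema filter in case the table lives outside "public".
CONSTRAINT_NAME_SQL = """
select distinct tco.constraint_name
from information_schema.table_constraints tco
join information_schema.key_column_usage kcu
  on kcu.constraint_name = tco.constraint_name
 and kcu.constraint_schema = tco.constraint_schema
where kcu.table_name = %(table_name)s
  and kcu.table_schema = %(schema)s
  and tco.constraint_type = %(constraint_type)s
"""


def find_constraint_name(cursor, table_name: str,
                         constraint_type: str = "PRIMARY KEY",
                         schema: str = "public") -> Optional[str]:
    """Return a matching constraint name, or None if there is none.

    cursor is assumed to be a psycopg2/psycopg cursor (pyformat placeholders).
    If the table has several constraints of that type, which one is returned
    is not defined.
    """
    cursor.execute(
        CONSTRAINT_NAME_SQL,
        {"table_name": table_name, "schema": schema,
         "constraint_type": constraint_type},
    )
    row = cursor.fetchone()
    return row[0] if row is not None else None
```

Inside upsert(), a cursor could come from the DBAPI connection behind the SQLAlchemy connection pandas passes in, e.g. with conn.connection.cursor() as cur: name = find_constraint_name(cur, table.name, "UNIQUE").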

I also didn't need the meta.bind and meta.reflect() lines. Bound metadata (meta.bind) was deprecated in SQLAlchemy 1.4 and removed in 2.0 anyway.
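Along the same lines, pandas already hands the callable the SQLAlchemy Table it built from the DataFrame as table.table (the DataFrame.to_sql docstring uses it in an on_conflict_do_nothing example), so a sketch without any reflection could look like this. It sends all rows of a chunk in one multi-row INSERT and updates from EXCLUDED instead of re-binding each row's values. The constraint name is the one from the example schema, and build_upsert_stmt and make_constraint_upsert are hypothetical helper names.

```python
import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import insert


def build_upsert_stmt(sa_table: sa.Table, rows: list, constraint: str):
    """Multi-row INSERT ... ON CONFLICT ON CONSTRAINT ... DO UPDATE."""
    stmt = insert(sa_table).values(rows)
    # Take every supplied column from the rejected row. Constraint columns are
    # included too, which is harmless because they already match on conflict.
    set_ = {k: stmt.excluded[k] for k in rows[0]}
    return stmt.on_conflict_do_update(constraint=constraint, set_=set_)


def make_constraint_upsert(constraint: str):
    """Return a callable usable as DataFrame.to_sql(method=...)."""
    def upsert(table, conn, keys, data_iter):
        rows = [dict(zip(keys, row)) for row in data_iter]
        if not rows:
            return 0  # an empty VALUES list would be invalid SQL
        # table.table is the SQLAlchemy Table pandas built for this frame.
        result = conn.execute(build_upsert_stmt(table.table, rows, constraint))
        return result.rowcount
    return upsert
```

Usage would be df.to_sql("test_table", con=conn, if_exists="append", index=False, method=make_constraint_upsert("test_table_col_a_col_b_key")).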