psycopg2: insert multiple rows with one query
I need to insert multiple rows with one query (number of rows is not constant), so I need to execute query like this one:
INSERT INTO t (a, b) VALUES (1, 2), (3, 4), (5, 6);
The only way I know is
args = [(1,2), (3,4), (5,6)]
args_str = ','.join(cursor.mogrify("%s", (x, )) for x in args)
cursor.execute("INSERT INTO t (a, b) VALUES "+args_str)
but I want some simpler way.
I built a program that inserts multiple lines to a server that was located in another city.
I found out that using this method was about 10 times faster than executemany
. In my case tup
is a tuple containing about 2000 rows. It took about 10 seconds when using this method:
args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)
cur.execute("INSERT INTO table VALUES " + args_str)
and 2 minutes when using this method:
cur.executemany("INSERT INTO table VALUES(%s,%s,%s,%s,%s,%s,%s,%s,%s)", tup)
New execute_values
method in Psycopg 2.7:
data = [(1,'x'), (2,'y')]
insert_query = 'insert into t (a, b) values %s'
psycopg2.extras.execute_values (
cursor, insert_query, data, template=None, page_size=100
)
The pythonic way of doing it in Psycopg 2.6:
data = [(1,'x'), (2,'y')]
records_list_template = ','.join(['%s'] * len(data))
insert_query = 'insert into t (a, b) values {}'.format(records_list_template)
cursor.execute(insert_query, data)
Explanation: If the data to be inserted is given as a list of tuples like in
data = [(1,'x'), (2,'y')]
then it is already in the exact required format as
-
the
values
syntax of theinsert
clause expects a list of records as ininsert into t (a, b) values (1, 'x'),(2, 'y')
Psycopg
adapts a Pythontuple
to a Postgresqlrecord
.
The only necessary work is to provide a records list template to be filled by psycopg
# We use the data list to be sure of the template length
records_list_template = ','.join(['%s'] * len(data))
and place it in the insert
query
insert_query = 'insert into t (a, b) values {}'.format(records_list_template)
Printing the insert_query
outputs
insert into t (a, b) values %s,%s
Now to the usual Psycopg
arguments substitution
cursor.execute(insert_query, data)
Or just testing what will be sent to the server
print (cursor.mogrify(insert_query, data).decode('utf8'))
Output:
insert into t (a, b) values (1, 'x'),(2, 'y')
Update with psycopg2 2.7:
The classic executemany()
is about 60 times slower than @ant32 's implementation (called "folded") as explained in this thread: https://www.postgresql.org/message-id/20170130215151.GA7081%40deb76.aryehleib.com
This implementation was added to psycopg2 in version 2.7 and is called execute_values()
:
from psycopg2.extras import execute_values
execute_values(cur,
"INSERT INTO test (id, v1, v2) VALUES %s",
[(1, 2, 3), (4, 5, 6), (7, 8, 9)])
Previous Answer:
To insert multiple rows, using the multirow VALUES
syntax with execute()
is about 10x faster than using psycopg2 executemany()
. Indeed, executemany()
just runs many individual INSERT
statements.
@ant32 's code works perfectly in Python 2. But in Python 3, cursor.mogrify()
returns bytes, cursor.execute()
takes either bytes or strings, and ','.join()
expects str
instance.
So in Python 3 you may need to modify @ant32 's code, by adding .decode('utf-8')
:
args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x).decode('utf-8') for x in tup)
cur.execute("INSERT INTO table VALUES " + args_str)
Or by using bytes (with b''
or b""
) only:
args_bytes = b','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)
cur.execute(b"INSERT INTO table VALUES " + args_bytes)
cursor.copy_from is the fastest solution I've found for bulk inserts by far. Here's a gist I made containing a class named IteratorFile which allows an iterator yielding strings to be read like a file. We can convert each input record to a string using a generator expression. So the solution would be
args = [(1,2), (3,4), (5,6)]
f = IteratorFile(("{}\t{}".format(x[0], x[1]) for x in args))
cursor.copy_from(f, 'table_name', columns=('a', 'b'))
For this trivial size of args it won't make much of a speed difference, but I see big speedups when dealing with thousands+ of rows. It will also be more memory efficient than building a giant query string. An iterator would only ever hold one input record in memory at a time, where at some point you'll run out of memory in your Python process or in Postgres by building the query string.
A snippet from Psycopg2's tutorial page at Postgresql.org (see bottom):
A last item I would like to show you is how to insert multiple rows using a dictionary. If you had the following:
namedict = ({"first_name":"Joshua", "last_name":"Drake"},
{"first_name":"Steven", "last_name":"Foo"},
{"first_name":"David", "last_name":"Bar"})
You could easily insert all three rows within the dictionary by using:
cur = conn.cursor()
cur.executemany("""INSERT INTO bar(first_name,last_name) VALUES (%(first_name)s, %(last_name)s)""", namedict)
It doesn't save much code, but it definitively looks better.