Parameter substitution for a SQLite "IN" clause
I am trying to use parameter substitution with SQLite within Python for an IN clause. Here is a complete, runnable example that demonstrates the problem:
import sqlite3
c = sqlite3.connect(":memory:")
c.execute('CREATE TABLE distro (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)')
for name in 'Ubuntu Fedora Puppy DSL SuSE'.split():
    c.execute('INSERT INTO distro (name) VALUES (?)', [name])
desired_ids = ["1", "2", "5", "47"]
result_set = c.execute('SELECT * FROM distro WHERE id IN (%s)' % (", ".join(desired_ids)), ())
for result in result_set:
    print(result)
It prints out:
(1, u'Ubuntu')
(2, u'Fedora')
(5, u'SuSE')
As the docs state that "[y]ou shouldn’t assemble your query using Python’s string operations because doing so is insecure; it makes your program vulnerable to an SQL injection attack," I am hoping to use parameter substitution.
When I try:
result_set = c.execute('SELECT * FROM distro WHERE id IN (?)', [ (", ".join(desired_ids)) ])
I get an empty result set, and when I try:
result_set = c.execute('SELECT * FROM distro WHERE id IN (?)', [ desired_ids ] )
I get:
InterfaceError: Error binding parameter 0 - probably unsupported type.
While I hope that any answer to this simplified problem will work, I would like to point out that the actual query I want to perform is in a doubly-nested subquery. To wit:
UPDATE dir_x_user SET user_revision = user_attempted_revision
WHERE user_id IN
    (SELECT user_id FROM
        (SELECT user_id, MAX(revision) FROM users WHERE obfuscated_name IN
            ("Argl883", "Manf496", "Mook657") GROUP BY user_id
        )
    )
You do need the right number of ? placeholders, but that doesn't pose an SQL injection risk:
>>> result_set = c.execute('SELECT * FROM distro WHERE id IN (%s)' %
...                        ','.join('?'*len(desired_ids)), desired_ids)
>>> print(result_set.fetchall())
[(1, u'Ubuntu'), (2, u'Fedora'), (5, u'SuSE')]
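The same placeholder trick should carry over to the nested UPDATE from the question. Here is a minimal sketch; the table definitions and sample rows are hypothetical, invented only so the example runs, and include just the columns the query mentions.

```python
import sqlite3

# Hypothetical schema: only the columns the question's UPDATE refers to.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, obfuscated_name TEXT, revision INTEGER)")
conn.execute("CREATE TABLE dir_x_user (user_id INTEGER, user_revision INTEGER, "
             "user_attempted_revision INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(1, "Argl883", 3), (2, "Manf496", 7), (3, "Other000", 1)])
conn.executemany("INSERT INTO dir_x_user VALUES (?, ?, ?)",
                 [(1, 0, 3), (2, 0, 7), (3, 0, 1)])

names = ["Argl883", "Manf496", "Mook657"]
# One "?" per name; only the number of names ever reaches the SQL text.
placeholders = ",".join("?" * len(names))  # "?,?,?"

sql = """
UPDATE dir_x_user SET user_revision = user_attempted_revision
WHERE user_id IN
    (SELECT user_id FROM
        (SELECT user_id, MAX(revision) FROM users
         WHERE obfuscated_name IN (%s)
         GROUP BY user_id))
""" % placeholders

# The names themselves are passed as bound parameters.
conn.execute(sql, names)
print(conn.execute("SELECT user_id, user_revision FROM dir_x_user ORDER BY user_id").fetchall())
# [(1, 3), (2, 7), (3, 0)]
```

Only users 1 and 2 match a name in the sample data, so only their rows are updated; "Mook657" simply matches nothing.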
According to http://www.sqlite.org/limits.html (item 9), SQLite can't (by default) handle more than 999 parameters in a query, so the solutions here (which generate one placeholder per item) will fail if you have thousands of items in your IN list. (That default, SQLITE_MAX_VARIABLE_NUMBER, was raised to 32766 in SQLite 3.32.0.) If you hit the limit, you'll need to break the list into chunks, run the query once per chunk, and combine the results yourself.
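A minimal sketch of that chunking approach, assuming the distro table from the question. select_in_chunks is a hypothetical helper name, and the chunk size of 999 simply mirrors the older default limit.

```python
import sqlite3

def select_in_chunks(conn, ids, chunk_size=999):
    """Run the IN query once per slice of at most chunk_size ids and concatenate the rows.

    Rows come back chunk by chunk, so sort them yourself if order matters, and
    de-duplicate ids first if the same id could end up in two chunks.
    """
    rows = []
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        query = "SELECT * FROM distro WHERE id IN (%s)" % ",".join("?" * len(chunk))
        rows.extend(conn.execute(query, chunk).fetchall())
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE distro (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
for name in "Ubuntu Fedora Puppy DSL SuSE".split():
    conn.execute("INSERT INTO distro (name) VALUES (?)", (name,))

wanted = list(range(1, 3001))  # 3000 ids: more than 999 placeholders in one statement
rows = select_in_chunks(conn, wanted)
print(sorted(rows))
# Python 3: [(1, 'Ubuntu'), (2, 'Fedora'), (3, 'Puppy'), (4, 'DSL'), (5, 'SuSE')]
```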
If you don't need thousands of items in your IN clause, then Alex's solution is the way to do it (and appears to be how Django does it).
Update: this works:
import sqlite3
c = sqlite3.connect(":memory:")
c.execute('CREATE TABLE distro (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)')
for name in 'Ubuntu Fedora Puppy DSL SuSE'.split():
    c.execute('INSERT INTO distro (name) VALUES (?)', (name,))
desired_ids = ["1", "2", "5", "47"]
result_set = c.execute('SELECT * FROM distro WHERE id IN (%s)' % ("?," * len(desired_ids))[:-1], desired_ids)
for result in result_set:
    print(result)
The issue was that you need one ? for each element in the input list.
The expression ("?," * len(desired_ids))[:-1] builds a string that repeats "?," once per element and then cuts off the trailing comma, so that there is exactly one question mark for each element in desired_ids.
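A quick check that the slicing expression yields the same placeholder string as the str.join versions used in the other answers:

```python
desired_ids = ["1", "2", "5", "47"]

# This answer's expression: repeat "?," once per id, then slice off the last comma.
sliced = ("?," * len(desired_ids))[:-1]
# The str.join form from the other answers; join never adds a trailing comma.
joined = ",".join("?" * len(desired_ids))

print(sliced)            # ?,?,?,?
print(sliced == joined)  # True
```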
I always end up doing something like this:
query = 'SELECT * FROM distro WHERE id IN (%s)' % ','.join('?' for i in desired_ids)
c.execute(query, desired_ids)
There's no injection risk because you're not putting strings from desired_ids into the query directly.
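To illustrate, here is a small sketch that feeds a hostile-looking value through that pattern, using the distro table from the question. The malicious string is made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE distro (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
for name in "Ubuntu Fedora Puppy DSL SuSE".split():
    conn.execute("INSERT INTO distro (name) VALUES (?)", (name,))

# Made-up hostile input: it is bound as data, never spliced into the SQL text.
desired_ids = ["1", "2", "5); DROP TABLE distro; --"]
query = "SELECT * FROM distro WHERE id IN (%s)" % ",".join("?" for i in desired_ids)
print(query)  # SELECT * FROM distro WHERE id IN (?,?,?)

rows = sorted(conn.execute(query, desired_ids).fetchall())
print(rows)   # Python 3: [(1, 'Ubuntu'), (2, 'Fedora')]
# The table survives: the third value is just a string that matches no id.
print(conn.execute("SELECT COUNT(*) FROM distro").fetchone()[0])  # 5
```

The query text only ever depends on how many items there are, so nothing inside the items can change the statement.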