Python CSV to SQLite

Chris is right - use transactions; divide the data into chunks and then store it.

"... Unless already in a transaction, each SQL statement has a new transaction started for it. This is very expensive, since it requires reopening, writing to, and closing the journal file for each statement. This can be avoided by wrapping sequences of SQL statements with BEGIN TRANSACTION; and END TRANSACTION; statements. This speedup is also obtained for statements which don't alter the database." - Source: http://web.utk.edu/~jplyon/sqlite/SQLite_optimization_FAQ.html

"... there is another trick you can use to speed up SQLite: transactions. Whenever you have to do multiple database writes, put them inside a transaction. Instead of writing to (and locking) the file each and every time a write query is issued, the write will only happen once when the transaction completes." - Source: How Scalable is SQLite?

import csv, sqlite3, time

def chunks(data, rows=10000):
    """ Divides the data into 10000 rows each """

    for i in xrange(0, len(data), rows):
        yield data[i:i+rows]


if __name__ == "__main__":

    t = time.time()

    conn = sqlite3.connect( "path/to/file.db" )
    conn.text_factory = str  #bugger 8-bit bytestrings
    cur = conn.cur()
    cur.execute('CREATE TABLE IF NOT EXISTS mytable (field2 VARCHAR, field4 VARCHAR)')

    csvData = csv.reader(open(filecsv.txt, "rb"))

    divData = chunks(csvData) # divide into 10000 rows each

    for chunk in divData:
        cur.execute('BEGIN TRANSACTION')

        for field1, field2, field3, field4, field5 in chunk:
            cur.execute('INSERT OR IGNORE INTO mytable (field2, field4) VALUES (?,?)', (field2, field4))

        cur.execute('COMMIT')

    print "\n Time Taken: %.3f sec" % (time.time()-t)

It's possible to import the CSV directly:

sqlite> .separator ","
sqlite> .import filecsv.txt mytable

http://www.sqlite.org/cvstrac/wiki?p=ImportingFiles

As it's been said (Chris and Sam), transactions do improve a lot insert performance.

Please, let me recommend another option, to use a suite of Python utilities to work with CSV, csvkit.

To install:

pip install csvkit

To solve your problem

csvsql --db sqlite:///path/to/file.db --insert --table mytable filecsv.txt

Try using transactions.

begin    
insert 50,000 rows    
commit

That will commit data periodically rather than once per row.

Python CSV to SQLite

Related

Recent Posts