Improving Postgres psycopg2 query performance for Python to the same level of Java's JDBC driver
Solution 1:
This is not an answer out of the box, with all client/db stuff you may need to do some work to determine exactly what is amiss
backup postgresql.conf changing
log_min_duration_statement to 0
log_destination = 'csvlog' # Valid values are combinations of
logging_collector = on # Enable capturing of stderr and csvlog
log_directory = 'pg_log' # directory where log files are written,
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # log file name pattern,
debug_print_parse = on
debug_print_rewritten = on
debug_print_plan output = on
log_min_messages = info (debug1 for all server versions prior to 8.4)
Stop and restart your database server ( reload may not pick up the changes ) Reproduce your tests ensuring that the server time and client times match and that you record the start times etc.
copy the log file off an import into editor of your choice (excel or another spreadsheet can be useful for getting advance manipulation for sql & plans etc)
now examine the timings from the server side and note:
is the sql reported on the server the same in each case
if the same you should have the same timings
is the client generating a cursor rather than passing sql
is one driver doing a lot of casting/converting between character sets or implicit converting of other types such as dates or timestamps.
and so on
The plan data will be included for completeness, this may inform if there are gross differences in the SQL submitted by the clients.
Solution 2:
The stuff below is probably aiming above and beyond what you have in mind or what is deemed acceptable in your environment, but I'll put the option on the table just in case.
- Is the destination of every
SELECT
in yourtest.sql
truly a simple|
-separated results file? - Is non-portability (Postgres-specificity) acceptable?
- Is your backend Postgres 8.2 or newer?
- Will the script run on the same host as the database backend, or would it be acceptable to generate the
|
-separated results file(s) from within the backend (e.g. to a share?)
If the answer to all of the above questions is yes, then you can transform your SELECT ...
statements to COPY ( SELECT ... ) TO E'path-to-results-file' WITH DELIMITER '|' NULL E'\\N'
.
Solution 3:
An alternative could be to use ODBC. This is assuming that Python ODBC driver performs well.
PostgreSQL has ODBC drivers for both Windows and Linux.