Best tool to migrate a PostgreSQL database to MS SQL 2005?
Solution 1:
I ended up not using any third-party tool for the data, as none of the ones I tried could handle the large tables; even SSIS failed. I did use a commercial tool for the schema, though. So my conversion process was as follows (example commands for the export and import steps are sketched after the list):
- Full Convert Enterprise to copy the schema (no data).
- pg_dump to export the data from Postgres in "plain text" format, which is basically a tab-separated values (TSV) file.
- Python scripts to transform the exported files into a format bcp would understand.
- bcp to import the data into MSSQL.
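For illustration, the export (step 2) and import (step 4) halves can be scripted roughly like this. This is a minimal sketch, not the commands actually used: the database, table, and server names are placeholders, and the bcp separator flags would need to match whatever the transformation script emits.

```python
# Hedged sketch of driving pg_dump and bcp from Python; all names are placeholders.
import subprocess

TABLE = "some_table"                       # hypothetical table name
PG_DB = "source_db"                        # hypothetical Postgres database
MSSQL_TARGET = "TargetDb.dbo.some_table"   # hypothetical SQL Server target

# Step 2: dump the table data from Postgres in plain-text format.
subprocess.run(
    ["pg_dump", "--data-only", f"--table={TABLE}",
     f"--file={TABLE}.pgdump", PG_DB],
    check=True,
)

# (Step 3, the transformation into e.g. some_table.bcp, happens here.)

# Step 4: bulk-load the transformed file with bcp in character mode (-c)
# over a trusted connection (-T); add -t/-r to match the separators
# chosen by the transformation script.
subprocess.run(
    ["bcp", MSSQL_TARGET, "in", f"{TABLE}.bcp",
     "-S", "localhost", "-T", "-c"],
    check=True,
)
```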
The transformation step took care of some differences in the formats used by pg_dump and bcp, such as:
- pg_dump puts some Postgres-specific stuff at the start of the file and ends the data with "\.", while bcp expects the entire file to contain data
- pg_dump stores NULL values as "\N", while bcp expects nothing in place of a NULL (i.e. no data between the column separators)
- pg_dump encodes tabs as "\t" and newlines as "\n", while bcp treats those literally
- pg_dump always uses tabs and newlines as separators, while bcp lets the user choose the separators. Choosing different ones becomes necessary if the data itself contains tabs or newlines, since for bcp they are not encoded.
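To make the transformation step concrete, a stripped-down sketch of such a script might look like the following. This is not the original script: the file names, the choice of "|" as the bcp field separator, and the assumption of a single COPY block per dump file are all illustrative assumptions.

```python
# Hedged sketch of a pg_dump-to-bcp transformation; file names and the
# pipe separator chosen for bcp are illustrative assumptions.

def decode_field(field):
    """Undo pg_dump's backslash escapes and map \\N to an empty field."""
    if field == "\\N":
        return ""          # bcp reads an empty field as NULL
    out = []
    i = 0
    while i < len(field):
        if field[i] == "\\" and i + 1 < len(field):
            nxt = field[i + 1]
            out.append({"t": "\t", "n": "\n", "r": "\r", "\\": "\\"}.get(nxt, nxt))
            i += 2
        else:
            out.append(field[i])
            i += 1
    return "".join(out)

with open("some_table.pgdump", encoding="utf-8") as src, \
     open("some_table.bcp", "w", encoding="utf-8", newline="") as dst:
    in_data = False
    for line in src:
        line = line.rstrip("\n")
        if not in_data:
            # Skip the Postgres-specific preamble until the COPY statement
            # that introduces the tab-separated data block.
            in_data = line.startswith("COPY ")
            continue
        if line == "\\.":  # end-of-data marker written by pg_dump
            break
        fields = [decode_field(f) for f in line.split("\t")]
        # Re-emit with the separator bcp will be told to use (-t "|" here);
        # pick characters that cannot occur in the decoded data.
        dst.write("|".join(fields) + "\n")
```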
I also found that some unique constraints that were fine in Postgres were violated in MSSQL, so I had to drop them. This is because MSSQL treats NULL = NULL for unique constraints (so at most one NULL is allowed in a unique column), whereas Postgres treats every NULL as distinct.
Solution 2:
If you have the appropriate Postgres support drivers installed on your SQL 2005 box (or wish to use Postgres via ODBC, or wish to dump the data from Postgres to a file and import from that), you can use the Import/Export Wizard in SQL Server to copy the data. The wizard asks a series of questions and then runs the import as a SQL Server Integration Services (SSIS) package, using appropriate batch insert operations.
However, if that wizard is not an option, it's worth considering that although you have a large number of rows, the rows average under 135 bytes each; given sufficient transaction log space to allow a 50 GB transaction to occur, 'simple insert' statements are not themselves out of the question.
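For what it's worth, a minimal sketch of that plain-insert route from a script might look like this. It assumes pyodbc and a delimited export file; the driver name, connection details, table and column names, separator, and batch size are all placeholders, not a prescribed setup.

```python
# Hedged sketch: loading a delimited file with parameterized INSERTs via
# pyodbc; connection details, table, columns, and batch size are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={SQL Server};SERVER=localhost;DATABASE=TargetDb;Trusted_Connection=yes"
)
cursor = conn.cursor()

BATCH = 10_000  # commit in batches so individual transactions stay small
sql = "INSERT INTO dbo.some_table (col1, col2, col3) VALUES (?, ?, ?)"

batch = []
with open("some_table.bcp", encoding="utf-8") as src:
    for line in src:
        row = line.rstrip("\n").split("|")   # separator used by the export
        batch.append([field if field != "" else None for field in row])
        if len(batch) >= BATCH:
            cursor.executemany(sql, batch)
            conn.commit()
            batch = []
if batch:
    cursor.executemany(sql, batch)
    conn.commit()
conn.close()
```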