How do you change the character encoding of a postgres database?

Solution 1:

First off, Daniel's answer is the correct, safe option.

For the specific case of changing from SQL_ASCII to something else, you can cheat and simply poke the pg_database catalogue to reassign the database encoding. This assumes you've already stored any non-ASCII characters in the expected encoding (or that you simply haven't used any non-ASCII characters).

Then you can do:

update pg_database set encoding = pg_char_to_encoding('UTF8') where datname = 'thedb'

This will not change the collation of the database, just how the encoded bytes are converted into characters (so now length('£123') will return 4 instead of 5). If the database uses 'C' collation, there should be no change to ordering for ASCII strings. You'll likely need to rebuild any indices containing non-ASCII characters though.

Caveat emptor. Dumping and reloading provides a way to check your database content is actually in the encoding you expect, and this doesn't. And if it turns out you did have some wrongly-encoded data in the database, rescuing is going to be difficult. So if you possibly can, dump and reinitialise.

Solution 2:

To change the encoding of your database:

  1. Dump your database
  2. Drop your database,
  3. Create new database with the different encoding
  4. Reload your data.

Make sure the client encoding is set correctly during all this.

Source: http://archives.postgresql.org/pgsql-novice/2006-03/msg00210.php

Solution 3:

Dumping a database with a specific encoding and try to restore it on another database with a different encoding could result in data corruption. Data encoding must be set BEFORE any data is inserted into the database.

Check this : When copying any other database, the encoding and locale settings cannot be changed from those of the source database, because that might result in corrupt data.

And this : Some locale categories must have their values fixed when the database is created. You can use different settings for different databases, but once a database is created, you cannot change them for that database anymore. LC_COLLATE and LC_CTYPE are these categories. They affect the sort order of indexes, so they must be kept fixed, or indexes on text columns would become corrupt. (But you can alleviate this restriction using collations, as discussed in Section 22.2.) The default values for these categories are determined when initdb is run, and those values are used when new databases are created, unless specified otherwise in the CREATE DATABASE command.


I would rather rebuild everything from the begining properly with a correct local encoding on your debian OS as explained here :

su root

Reconfigure your local settings :

dpkg-reconfigure locales

Choose your locale (like for instance for french in Switzerland : fr_CH.UTF8)

Uninstall and clean properly postgresql :

apt-get --purge remove postgresql\*
rm -r /etc/postgresql/
rm -r /etc/postgresql-common/
rm -r /var/lib/postgresql/
userdel -r postgres
groupdel postgres

Re-install postgresql :

aptitude install postgresql-9.1 postgresql-contrib-9.1 postgresql-doc-9.1

Now any new database will be automatically be created with correct encoding, LC_TYPE (character classification), and LC_COLLATE (string sort order).

Solution 4:

Daniel Kutik's answer is correct, but it can be even more safe, with database renaming.

So, the truly safe way is:

  1. Create new database with the different encoding and name
  2. Dump your database
  3. Restore dump to the new DB
  4. Test that your application runs correctly with the new DB
  5. Rename old DB to something meaningful
  6. Rename new DB
  7. Test application again
  8. Drop the old database

In case of emergency, just rename DBs back

Solution 5:

# dump into file
pg_dump myDB > /tmp/myDB.sql

# create an empty db with the right encoding (on older versions the escaped single quotes are needed!)
psql -c 'CREATE DATABASE "tempDB" WITH OWNER = "myself" LC_COLLATE = '\''de_DE.utf8'\'' TEMPLATE template0;'

# import in the new DB
psql -d tempDB -1 -f /tmp/myDB.sql

# rename databases
psql -c 'ALTER DATABASE "myDB" RENAME TO "myDB_wrong_encoding";' 
psql -c 'ALTER DATABASE "tempDB" RENAME TO "myDB";'

# see the result
psql myDB -c "SHOW LC_COLLATE"