Docker PostgreSQL change database encoding to UTF-8

I want to use docker-compose to run a postgres container with COLLATE and CTYPE 'C' and database encoding 'UTF-8', but this seems to be impossible.

This is the relevant part of the docker-compose.yml:

database:
    image: postgres:latest
    volumes:
        - db:/var/lib/postgresql/data
    environment:
        POSTGRES_PASSWORD: test
        LC_COLLATE: C
        LC_CTYPE: C
        LANG: C.UTF-8

And this is the log output:

The database cluster will be initialized with locales.
The default text search configuration will be set to "english".
  COLLATE:  C
  CTYPE:    C
  MESSAGES: C.UTF-8
  MONETARY: C.UTF-8
  NUMERIC:  C.UTF-8
  TIME:     C.UTF-8
The default database encoding has accordingly been set to "SQL_ASCII".

I need the database encoding to be UTF-8 and COLLATE and CTYPE to be 'C' (not 'C.UTF-8'); otherwise a dependent application cannot connect.

I couldn't find anything about this in the documentation or anywhere else.


Solution 1:

You need to put two pieces of the puzzle together:

https://www.postgresql.org/docs/9.5/app-initdb.html

The initdb documentation explains how to pass encoding and locale settings when the database cluster is created.

The official postgres Docker image lets you pass options to initdb through the POSTGRES_INITDB_ARGS environment variable:

https://hub.docker.com/_/postgres

Ergo, the answer would be something like:

database:
    image: postgres:latest
    volumes:
        - db:/var/lib/postgresql/data
    environment:
        POSTGRES_PASSWORD: test
        POSTGRES_INITDB_ARGS: '--encoding=UTF-8 --lc-collate=C --lc-ctype=C'

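Adjust the arguments as needed. Setting --encoding explicitly is what matters here: without it, initdb derives the encoding from the locale, and for the C locale that yields SQL_ASCII, which is exactly what your log shows. I left out LANG, because it is not one of the initdb options listed on the man page (the first link above).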
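One caveat, per the image documentation: POSTGRES_INITDB_ARGS (like the other POSTGRES_* variables) only takes effect when the container starts with an empty data directory. Because your compose file mounts the named volume db at /var/lib/postgresql/data, a cluster that was already initialized as SQL_ASCII will simply be reused. A minimal sketch for forcing a fresh initdb, assuming Compose V2 (with the older standalone tool, use docker-compose instead of docker compose) and that the existing data can be discarded:

```shell
# WARNING: -v also removes the named volumes declared in the compose file
# (including "db"), so every database stored in them is deleted.
docker compose down -v

# Start the service again; with an empty data directory the image runs
# initdb with the arguments from POSTGRES_INITDB_ARGS.
docker compose up -d database
```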

I didn't test this with docker compose; I ran it from the command line with the -e option. The concept is the same, though: the "environment" key in docker compose corresponds to -e on the command line. To wit:

https://docs.docker.com/engine/reference/commandline/run/

--env, -e    Set environment variables

Test #1, with only the password variable set:

docker run -e POSTGRES_PASSWORD=test postgres:latest

Here's the output of the default run:

postgres@cbf23636dabc:~$ psql
psql (13.4 (Debian 13.4-1.pgdg100+1))
Type "help" for help.

postgres=# \l
                                 List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges   
-----------+----------+----------+------------+------------+-----------------------
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 | 
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres

Test #2, with the same environment variables as in the suggested docker compose file, passed on the command line:

docker run -e POSTGRES_PASSWORD=test -e POSTGRES_INITDB_ARGS='--encoding=UTF-8 --lc-collate=C --lc-ctype=C' postgres:latest

And then the output:

postgres@b6b80c876f3e:~$ psql 
psql (13.4 (Debian 13.4-1.pgdg100+1))
Type "help" for help.

postgres=# \l
                             List of databases
   Name    |  Owner   | Encoding | Collate | Ctype |   Access privileges   
-----------+----------+----------+---------+-------+-----------------------
 postgres  | postgres | UTF8     | C       | C     | 
 template0 | postgres | UTF8     | C       | C     | =c/postgres          +
           |          |          |         |       | postgres=CTc/postgres
 template1 | postgres | UTF8     | C       | C     | =c/postgres          +
           |          |          |         |       | postgres=CTc/postgres
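To check the result without opening an interactive shell in the container, you can query the pg_database catalog directly. A sketch, assuming the container is started detached under the hypothetical name pg-test:

```shell
# Start the container in the background ("pg-test" is a hypothetical name).
docker run -d --name pg-test \
  -e POSTGRES_PASSWORD=test \
  -e POSTGRES_INITDB_ARGS='--encoding=UTF-8 --lc-collate=C --lc-ctype=C' \
  postgres:latest

# After "docker logs pg-test" shows the server is ready to accept connections,
# print encoding, collation and ctype for every database in the cluster.
docker exec pg-test psql -U postgres -c \
  "SELECT datname, pg_encoding_to_char(encoding) AS encoding, datcollate, datctype FROM pg_database;"
```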

Also note the section of the official PostgreSQL Docker image page that describes initialization scripts; you may want to look into that as well.
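As a sketch of that route: the image runs *.sh and *.sql files found in /docker-entrypoint-initdb.d, again only when it initializes an empty data directory. The script below assumes a hypothetical application database named appdb and would be mounted with a volume entry such as ./initdb:/docker-entrypoint-initdb.d. It leaves the cluster defaults untouched, so it complements rather than replaces POSTGRES_INITDB_ARGS.

```shell
#!/bin/bash
# Hypothetical file ./initdb/01-create-appdb.sh, mounted into the container at
# /docker-entrypoint-initdb.d/. The entrypoint runs it once, right after initdb.
set -e

# Create "appdb" (hypothetical name) with UTF-8 encoding and the C locale.
# TEMPLATE template0 is used because CREATE DATABASE only accepts a locale
# different from the template's when copying template0.
psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" <<EOSQL
CREATE DATABASE appdb
  WITH ENCODING 'UTF8'
       LC_COLLATE 'C'
       LC_CTYPE 'C'
       TEMPLATE template0;
EOSQL
```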