Starting and populating a Postgres container in Docker

I have a Docker container that contains my Postgres database. It's using the official Postgres image which has a CMD entry that starts the server on the main thread.

I want to populate the database by running RUN psql -U postgres postgres < /dump/dump.sql before it starts listening for queries.

I don't understand how this is possible with Docker. If I place the RUN command after CMD, it will of course never be reached because Docker has finished reading the Dockerfile. But if I place it before the CMD, it will run before psql even exists as a process.

How can I prepopulate a Postgres database in Docker?


After a lot of fighting, I have found a solution ;-)

A comment posted by "justfalter" here was very useful to me: https://registry.hub.docker.com/_/postgres/

Anyway, here is what I did:

# Dockerfile
FROM postgres:9.4

RUN mkdir -p /tmp/psql_data/

COPY db/structure.sql /tmp/psql_data/
COPY scripts/init_docker_postgres.sh /docker-entrypoint-initdb.d/

db/structure.sql is an SQL dump, used to initialize the first database schema.
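If you need to produce such a file, a schema-only pg_dump is one way to do it; this is just a sketch, and my_dev_db is a hypothetical database name:

# hypothetical: dump only the schema of a local dev database into db/structure.sql
pg_dump -U postgres --schema-only my_dev_db > db/structure.sql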

Then, the init_docker_postgres.sh script:

#!/bin/bash

# this script is run by the image's entrypoint the first time the container starts
# it creates the database for the tests and imports the base database structure

DATABASE_NAME="db_name"
DB_DUMP_LOCATION="/tmp/psql_data/structure.sql"

echo "*** CREATING DATABASE ***"

# create default database
gosu postgres postgres --single <<EOSQL
  CREATE DATABASE "$DATABASE_NAME";
  GRANT ALL PRIVILEGES ON DATABASE "$DATABASE_NAME" TO postgres;
EOSQL

# clean the sql dump - the single-user backend reads one command per line, so collapse everything onto a single line

# remove indentation
sed "s/^[ \t]*//" -i "$DB_DUMP_LOCATION"

# remove comments
sed '/^--/ d' -i "$DB_DUMP_LOCATION"

# remove new lines
sed ':a;N;$!ba;s/\n/ /g' -i "$DB_DUMP_LOCATION"

# collapse multiple spaces into one
sed 's/  */ /g' -i "$DB_DUMP_LOCATION"

# remove leading spaces
sed 's/^ *//' -i "$DB_DUMP_LOCATION"

# append a newline at the end (suggested by @Nicola Ferraro)
sed -e '$a\' -i "$DB_DUMP_LOCATION"

# import sql_dump
gosu postgres postgres --single "$DATABASE_NAME" < "$DB_DUMP_LOCATION";


echo "*** DATABASE CREATED! ***"

So finally:

# no postgres is running
[myserver]# psql -h 127.0.0.1 -U postgres
psql: could not connect to server: Connection refused
    Is the server running on host "127.0.0.1" and accepting
    TCP/IP connections on port 5432?

[myserver]# docker build -t custom_psql .
[myserver]# docker run -d --name custom_psql_running -p 5432:5432 custom_psql

[myserver]# docker ps -a
CONTAINER ID        IMAGE                COMMAND                CREATED             STATUS              PORTS                    NAMES
ce4212697372        custom_psql:latest   "/docker-entrypoint.   9 minutes ago       Up 9 minutes        0.0.0.0:5432->5432/tcp   custom_psql_running

[myserver]# psql -h 127.0.0.1 -U postgres
psql (9.2.10, server 9.4.1)
WARNING: psql version 9.2, server version 9.4.
         Some psql features might not work.
Type "help" for help.

postgres=# 

# postgres is now initialized with the dump

Hope it helps!


For those who want to initialize a PostgreSQL DB with millions of records during the first run:

Import using *.sql dump

You can make a simple SQL dump and copy the dump.sql file into /docker-entrypoint-initdb.d/. The problem is speed. My dump.sql script is about 17 MB (a small DB - 10 tables, with 100k rows in only one of them), and the initialization takes over a minute (!). That is unacceptable for local development, unit tests, etc.
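For reference, a plain-text dump like that can be produced with pg_dump's default plain format; a minimal sketch, assuming the same "my-db" database used in the binary example below:

# plain SQL dump, placed so it can be COPY'd (or mounted) into /docker-entrypoint-initdb.d/
pg_dump -U postgres my-db > docker-entrypoint-initdb.d/dump.sql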

Import using binary dump

The solution is to make a binary PostgreSQL dump and use the shell script initialization support. The same DB is then initialized in about 500 ms instead of over a minute.

1. Create the dump.pgdata binary dump of a DB named "my-db", either directly from within a container or from your local DB:

pg_dump -U postgres --format custom my-db > "dump.pgdata"

Or from the host, against a running container (postgres-container):

docker exec postgres-container pg_dump -U postgres --format custom my-db > "dump.pgdata"

2. Create a Docker image with the dump and an initialization script:

$ tree
.
├── Dockerfile
└── docker-entrypoint-initdb.d
    ├── 01-restore.sh
    ├── 02-small-updates.sql
    └── dump.pgdata
$ cat Dockerfile
FROM postgres:11

COPY ./docker-entrypoint-initdb.d/ /docker-entrypoint-initdb.d/
$ cat docker-entrypoint-initdb.d/01-restore.sh
#!/bin/bash

file="/docker-entrypoint-initdb.d/dump.pgdata"
dbname=my-db

echo "Restoring DB using $file"
pg_restore -U postgres --dbname=$dbname --verbose --single-transaction < "$file" || exit 1
$ cat docker-entrypoint-initdb.d/02-small-updates.sql
-- some updates on your DB, for example for next application version
-- this file will be executed on DB during next release
UPDATE ... ;

3. Build the image and run it:

$ docker build -t db-test-img .
$ docker run -it --rm --name db-test db-test-img
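To check that the restore actually ran, you can open psql inside the container from another terminal; a quick hypothetical check using the container and database names from the commands above:

# list the restored tables in the freshly started container
docker exec -it db-test psql -U postgres -d my-db -c '\dt'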

Alternatively, you can just mount a volume to /docker-entrypoint-initdb.d/ that contains all your DDL scripts. You can put in *.sh, *.sql, or *.sql.gz files, and the entrypoint will take care of executing them on the first start-up (when the data directory is still empty).

e.g. (assuming you have your scripts in /tmp/my_scripts)

docker run -v /tmp/my_scripts:/docker-entrypoint-initdb.d postgres
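The entrypoint executes those files in sorted (alphabetical) name order, so a numeric prefix is a simple way to control ordering; a hypothetical layout for /tmp/my_scripts might look like:

$ ls /tmp/my_scripts
01-schema.sql  02-seed-data.sql.gz  99-grants.sh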

There is yet another option available that utilises Flocker:

Flocker is a container data volume manager that is designed to allow databases like PostgreSQL to easily run in containers in production. When running a database in production, you have to think about things like recovering from host failure. Flocker provides tools for managing data volumes across a cluster of machines like you have in a production environment. For example, as a Postgres container is scheduled between hosts in response to server failure, Flocker can automatically move its associated data volume between hosts at the same time. This means that when your Postgres container starts up on a new host, it has its data. This operation can be accomplished manually using the Flocker API or CLI, or automatically by a container orchestration tool that Flocker is integrates with, for example Docker Swarm, Kubernetes or Mesos.