Starting and populating a Postgres container in Docker
I have a Docker container that contains my Postgres database. It's using the official Postgres image which has a CMD entry that starts the server on the main thread.
I want to populate the database by running RUN psql –U postgres postgres < /dump/dump.sql
before it starts listening to queries.
I don't understand how this is possible with Docker. If I place the RUN
command after CMD, it will of course never be reached because Docker has finished reading the Dockerfile. But if I place it before the CMD
, it will run before psql even exists as a process.
How can I prepopulate a Postgres database in Docker?
After a lot of fighting, I have found a solution ;-)
For me was very useful a comment posted here: https://registry.hub.docker.com/_/postgres/ from "justfalter"
Anyway, I have done in this way:
# Dockerfile
FROM postgres:9.4
RUN mkdir -p /tmp/psql_data/
COPY db/structure.sql /tmp/psql_data/
COPY scripts/init_docker_postgres.sh /docker-entrypoint-initdb.d/
db/structure.sql
is a sql dump, useful to initialize the first tablespace.
Then, the init_docker_postgres.sh
#!/bin/bash
# this script is run when the docker container is built
# it imports the base database structure and create the database for the tests
DATABASE_NAME="db_name"
DB_DUMP_LOCATION="/tmp/psql_data/structure.sql"
echo "*** CREATING DATABASE ***"
# create default database
gosu postgres postgres --single <<EOSQL
CREATE DATABASE "$DATABASE_NAME";
GRANT ALL PRIVILEGES ON DATABASE "$DATABASE_NAME" TO postgres;
EOSQL
# clean sql_dump - because I want to have a one-line command
# remove indentation
sed "s/^[ \t]*//" -i "$DB_DUMP_LOCATION"
# remove comments
sed '/^--/ d' -i "$DB_DUMP_LOCATION"
# remove new lines
sed ':a;N;$!ba;s/\n/ /g' -i "$DB_DUMP_LOCATION"
# remove other spaces
sed 's/ */ /g' -i "$DB_DUMP_LOCATION"
# remove firsts line spaces
sed 's/^ *//' -i "$DB_DUMP_LOCATION"
# append new line at the end (suggested by @Nicola Ferraro)
sed -e '$a\' -i "$DB_DUMP_LOCATION"
# import sql_dump
gosu postgres postgres --single "$DATABASE_NAME" < "$DB_DUMP_LOCATION";
echo "*** DATABASE CREATED! ***"
So finally:
# no postgres is running
[myserver]# psql -h 127.0.0.1 -U postgres
psql: could not connect to server: Connection refused
Is the server running on host "127.0.0.1" and accepting
TCP/IP connections on port 5432?
[myserver]# docker build -t custom_psql .
[myserver]# docker run -d --name custom_psql_running -p 5432:5432 custom_psql
[myserver]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ce4212697372 custom_psql:latest "/docker-entrypoint. 9 minutes ago Up 9 minutes 0.0.0.0:5432->5432/tcp custom_psql_running
[myserver]# psql -h 127.0.0.1 -U postgres
psql (9.2.10, server 9.4.1)
WARNING: psql version 9.2, server version 9.4.
Some psql features might not work.
Type "help" for help.
postgres=#
# postgres is now initialized with the dump
Hope it helps!
For those who want to initialize a PostgreSQL DB with millions of records during the first run.
Import using *.sql dump
You can do simple sql dump and copy the dump.sql
file into /docker-entrypoint-initdb.d/
. The problem is speed. My dump.sql
script is about 17MB (small DB - 10 tables with 100k rows in only one of them) and the initialization takes over a minute (!). That is unacceptable for local development / unit test, etc.
Import using binary dump
The solution is to make a binary PostgreSQL dump and use shell scripts initialization support. Then the same DB is initialized in about 500ms instead of 1 minute.
1. Create the dump.pgdata
binary dump of a DB named "my-db"
directly from within a container or your local DB
pg_dump -U postgres --format custom my-db > "dump.pgdata"
Or from host from running container (postgres-container)
docker exec postgres-container pg_dump -U postgres --format custom my-db > "dump.pgdata"
2. Create a Docker image with a given dump and initialization script
$ tree
.
├── Dockerfile
└── docker-entrypoint-initdb.d
├── 01-restore.sh
├── 02-small-updates.sql
└── dump.pgdata
$ cat Dockerfile
FROM postgres:11
COPY ./docker-entrypoint-initdb.d/ /docker-entrypoint-initdb.d/
$ cat docker-entrypoint-initdb.d/01-restore.sh
#!/bin/bash
file="/docker-entrypoint-initdb.d/dump.pgdata"
dbname=my-db
echo "Restoring DB using $file"
pg_restore -U postgres --dbname=$dbname --verbose --single-transaction < "$file" || exit 1
$ cat docker-entrypoint-initdb.d/02-small-updates.sql
-- some updates on your DB, for example for next application version
-- this file will be executed on DB during next release
UPDATE ... ;
3. Build an image and run it
$ docker build -t db-test-img .
$ docker run -it --rm --name db-test db-test-img
Alternatively, you can just mount a volume to /docker-entrypoint-initdb.d/ that contains all your DDL scripts. You can put in *.sh, *.sql, or *.sql.gz files and it will take care of executing those on start-up.
e.g. (assuming you have your scripts in /tmp/my_scripts)
docker run -v /tmp/my_scripts:/docker-entrypoint-initdb.d postgres
There is yet another option available that utilises Flocker:
Flocker is a container data volume manager that is designed to allow databases like PostgreSQL to easily run in containers in production. When running a database in production, you have to think about things like recovering from host failure. Flocker provides tools for managing data volumes across a cluster of machines like you have in a production environment. For example, as a Postgres container is scheduled between hosts in response to server failure, Flocker can automatically move its associated data volume between hosts at the same time. This means that when your Postgres container starts up on a new host, it has its data. This operation can be accomplished manually using the Flocker API or CLI, or automatically by a container orchestration tool that Flocker is integrates with, for example Docker Swarm, Kubernetes or Mesos.