Docker daemon ignores daemon.json on boot

My Docker daemon seems to ignore /etc/docker/daemon.json on boot.

Similar to this question, I'm having trouble telling the Docker daemon not to use the default 172.17.* range. That range is already claimed by our VPN, so when Docker takes it, people connected through the VPN can no longer reach the server Docker runs on.

The hugely annoying thing is that every time I reboot my server, Docker claims an IP from the VPN's range again, regardless of what I put in /etc/docker/daemon.json. I have to manually issue

# systemctl restart docker

directly after boot before people on the 172.17.* network can reach the server again.

This obviously gets forgotten quite often and leads to many problem tickets.

My /etc/docker/daemon.json looks like this:

{
 "default-address-pools": [
   {
      "base": "172.20.0.0/16",
      "size": 24
   }
 ]
}

and is permissioned like so:

-rw-r--r--   1 root root   123 Dec  8 10:43 daemon.json

I have no idea how to even start diagnosing this problem; any ideas?

For completeness:

Ubuntu 18.04.5 LTS
Docker version 19.03.6, build 369ce74a3c

EDIT: output of systemctl cat docker:

# /lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target firewalld.service containerd.service
Wants=network-online.target
Requires=docker.socket
Wants=containerd.service

[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always

# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
StartLimitBurst=3

# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
StartLimitInterval=60s

# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity

# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity

# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes

# kill only the docker process, not all processes in the cgroup
KillMode=process

[Install]
WantedBy=multi-user.target

Output of sudo docker info (after systemctl restart docker):

Client:
 Debug Mode: false

Server:
 Containers: 34
  Running: 19
  Paused: 0
  Stopped: 15
 Images: 589
 Server Version: 19.03.6
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 
 runc version: 
 init version: 
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.15.0-140-generic
 Operating System: Ubuntu 18.04.5 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 47.16GiB
 Name: linuxsrv
 ID: <redacted>
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: <redacted>
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Registry Mirrors:
  http://172.16.30.33:6000/
 Live Restore Enabled: false

WARNING: No swap limit support

Solution 1:

Docker uses several address pools. The default-address-pools setting applies to all new user-created bridge networks. Existing networks keep the subnet they were created with, so you may need to delete and recreate them after changing this setting.

There's also bip, set in the daemon.json file with a line like:

"bip": "192.168.63.1/24"

The bip setting applies to the default bridge network, named bridge, and must be the gateway's address on that network in CIDR notation, not the network address itself (so you can't set it to 192.168.63.0/24; the trailing .1 is important).
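To illustrate the difference between a gateway address and a network address, here is a minimal sketch of a check in POSIX shell. The helper name is_host_cidr is hypothetical and not part of Docker; it only tests whether the host bits of an IPv4 CIDR are non-zero, which is the property that separates 192.168.63.1/24 from 192.168.63.0/24. It does not validate the input or rule out broadcast addresses.

```shell
# Hypothetical helper (not part of Docker): succeeds when the address in an
# IPv4 CIDR has at least one host bit set, i.e. it is not the network address.
is_host_cidr() {
  ip=${1%/*}        # address part, e.g. 192.168.63.1
  prefix=${1#*/}    # prefix length, e.g. 24
  old_ifs=$IFS
  IFS=.
  set -- $ip        # split the address into its four octets: $1..$4
  IFS=$old_ifs
  addr=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
  host_mask=$(( (1 << (32 - $prefix)) - 1 ))
  [ $(( $addr & $host_mask )) -ne 0 ]
}

for cidr in 192.168.63.1/24 192.168.63.0/24; do
  if is_host_cidr "$cidr"; then
    echo "$cidr: host bits set, plausible value for bip"
  else
    echo "$cidr: host bits are zero (network address)"
  fi
done
```

The helper uses plain globals because POSIX sh has no local variables, so avoid reusing the names ip, prefix, addr and host_mask around it.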

And if you are using swarm mode, overlay networks have their own address pools shared across nodes in the overlay network. That needs to be configured during docker swarm init with the --default-addr-pool flag.

Lastly, if you are running Docker via snap, this file lives at /var/snap/docker/current/etc/docker/daemon.json, and it doesn't appear to be preserved across updates, so you'll need to replace it again after each update.
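Pulling the two daemon.json settings from this answer together, a file for a non-swarm host might look like the sketch below. The values are only illustrative: the bip value is the example from this answer and the pool is the one from the question, and both assume those ranges are actually free on your network. The daemon has to be restarted for the file to be read again.

```json
{
  "bip": "192.168.63.1/24",
  "default-address-pools": [
    {
      "base": "172.20.0.0/16",
      "size": 24
    }
  ]
}
```

Here bip covers the default bridge, while default-address-pools covers bridge networks created afterwards; the two ranges are deliberately kept separate in this sketch.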

Solution 2:

Although I thought I had resolved the problem using BMitch's answer, I was wrong: the docker0 address was still in the wrong 172.17.*.* range after boot.

After a lot more digging, it turned out that, somehow, I had multiple versions of dockerd installed: