how to create docker overlay network between multi hosts?

I have been trying to create an overlay network between two hosts with no success. I keep getting the error message:

mavungu@mavungu-Aspire-5250:~$ sudo docker -H tcp://192.168.0.18:2380 network create -d overlay myapp
Error response from daemon: 500 Internal Server Error: failed to parse pool request for address space "GlobalDefault" pool "" subpool "": cannot find address space GlobalDefault (most likely the backing datastore is not configured)

mavungu@mavungu-Aspire-5250:~$ sudo docker network create -d overlay myapp
[sudo] password for mavungu:
Error response from daemon: failed to parse pool request for address space "GlobalDefault" pool "" subpool "": cannot find address space GlobalDefault (most likely the backing datastore is not configured)

My environment details:

mavungu@mavungu-Aspire-5250:~$ sudo docker info Containers: 1
Images: 364 Server Version: 1.9.1 Storage Driver: aufs Root Dir:
/var/lib/docker/aufs Backing Filesystem: extfs Dirs: 368 Dirperm1
Supported: true Execution Driver: native-0.2 Logging Driver:
json-file Kernel Version: 3.19.0-26-generic Operating System: Ubuntu
15.04 CPUs: 2 Total Memory: 3.593 GiB Name: mavungu-Aspire-5250 Registry: https://index.docker.io/v1/ WARNING: No swap limit support

I have a swarm cluster working well with consul as the discovery mechanism:

mavungu@mavungu-Aspire-5250:~$ sudo docker -H tcp://192.168.0.18:2380 info 

Containers: 4 
Images: 51 
Role: primary
Strategy: spread
Filters: health, port, dependency, affinity, constraint
Nodes: 2
mavungu-Aspire-5250: 192.168.0.36:2375
└ Containers: 1
└ Reserved CPUs: 0 / 2
└ Reserved Memory: 0 B / 3.773 GiB
└ Labels: executiondriver=native-0.2, kernelversion=3.19.0-26-generic, operatingsystem=Ubuntu 15.04, storagedriver=aufs
mavungu-HP-Pavilion-15-Notebook-PC: 192.168.0.18:2375
└ Containers: 3
└ Reserved CPUs: 0 / 4
└ Reserved Memory: 0 B / 3.942 GiB
└ Labels: executiondriver=native-0.2, kernelversion=4.2.0-19-generic, operatingsystem=Ubuntu 15.10, storagedriver=aufs
CPUs: 6
Total Memory: 7.715 GiB
Name: bb47f4e57436

My consul is available at 192.168.0.18:8500 and it works well with the swarm cluster.

I would like to be able to create an overlay network across the two hosts. I have configured the docker engines on both hosts with this additional settings:

DOCKER_OPTS="-D --cluster-store-consul://192.168.0.18:8500 --cluster-advertise=192.168.0.18:0"

DOCKER_OPTS="-D --cluster-store-consul://192.168.0.18:8500 --cluster-advertise=192.168.0.36:0"

I had to stop and restart the engines and reset the swarm cluster... After failing to create the overlay network, I changed the --cluster-advertise setting to this :

DOCKER_OPTS="-D --cluster-store-consul://192.168.0.18:8500 --cluster-advertise=192.168.0.18:2375"

DOCKER_OPTS="-D --cluster-store-consul://192.168.0.18:8500 --cluster-advertise=192.168.0.36:2375"

But still it did not work. I am not sure of what ip:port should be set for the --cluster-advertise= . Docs, discussions and tutorials are not clear on this advertise thing.

There is something that I am getting wrong here. Please help.


When you execute the docker runcommand, be sure to add --net myapp. Here is a full step-by-step tutorial (online version):

How to deploy swarm on a cluster with multi-hosts network

TL;DR: step-by-step tutorial to deploy a multi-hosts network using Swarm. I wanted to put online this tutorial ASAP so I didn't even take time for the presentation. The markdown file is available on the github of my website. Feel free to adapt and share it, it is licensed under a Creative Commons Attribution 4.0 International License.

Prerequisites

Environment
  • Tutorial done with docker engine 1.9.0.
  • Swarm agents are discovered through a shared file (other methods are available).
  • Consul 0.5.2 is used for discovery for the multi-hosts network swarm containers.

Swarm manager and consul master will be run on the machine named bugs20. Other nodes, bugs19, bugs18, bugs17 and bugs16, will be swarm agents and consul members.

Before we start

Consul is used for the multihost networking, any other key-value store can be used -- note that the engine supports Consul Etcd, and ZooKeeper. Token (or static file) are used for the swarm agents discovery. Tokens use REST API, a static file is preferred.

The network

The network is range 192.168.196.0/25. The host named bugsN has the IP address 192.168.196.N.

The docker daemon

All nodes are running docker daemon as follow:

/usr/bin/docker daemon -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock --cluster-advertise eth0:2375 --cluster-store consul://127.0.0.1:8500
Options details:
-H tcp://0.0.0.0:2375

Binds the daemon to an interface to allow be part of the swarm cluster. An IP address can obviously be specificied, it is a better solution if you have several NIC.

--cluster-advertise eth0:2375

Defines the interface and the port of the docker daemon should use to advertise itself.

--cluster-store consul://127.0.0.1:8500

Defines the URL of the distributed storage backend. In our case we use consul, though there are other discovery tools that can be used, if you want to make up your mind you should be interested in reading this service discovery comparison.

As consul is distributed, the URL can be local (remember, swarm agents are also consul members) and this is more flexible as you don't have to specify the IP address of the consul master and be selected after the docker daemon has been started.

The aliases used

In the following commands these two aliases are used:

alias ldocker='docker -H tcp://0.0.0.0:2375'
alias swarm-docker='docker -H tcp://0.0.0.0:5732' #used only on the swarm manager

Be sure to have the path of the consul binary in your $PATH. Once you are in the directory just type export PATH=$PATH:$(pwd) will do the trick.

It is also assumed that the variable $IP has been properly set and exported. It can be done, thanks to .bashrc or .zshrc or else, with something like this:

export IP=$(ifconfig |grep "192.168.196."|cut -d ":" -f 2|cut -d " " -f 1)

Consul

Let's start to deploy all consul members and master as needed.

Consul master (bugs20)

consul agent -server -bootstrap-expect 1 -data-dir /tmp/consul -node=master -bind=$IP -client $IP
Options details:
agent -server

Start the consul agent as a server.

-bootstrap-expect 1

We expect only one master.

-node=master20

This consul server/master will be named "master20".

-bind=192.168.196.20

Specifies the IP address on which it should be bound. Optional if you have only one NIC.

-client=192.168.196.20

Specifies the RPC IP address on which the server should be bound. By default it is localhost. Note that I am unsure about the necessity of this option, and this force to add -rpc-addr=192.168.196.20:8400 for local request such as consul members -rpc-addr=192.168.196.20:8400 or consul join -rpc-addr=192.168.196.20:8400 192.168.196.9 to join the consul member that has the IP address 192.168.196.9.

Consul members (bugs{16..19})

consul agent -data-dir /tmp/consul -node=$HOSTNAME -bind=192.168.196.N

It is suggested to use tmux, or similar, with the option :setw synchronize-panes on so this one command: consul -d agent -data-dir /tmp/consul -node=$HOST -bind=$IP starts all consul members.

Join consul members

consul join -rpc-addr=192.168.196.20:8400 192.168.196.16
consul join -rpc-addr=192.168.196.20:8400 192.168.196.17
consul join -rpc-addr=192.168.196.20:8400 192.168.196.18
consul join -rpc-addr=192.168.196.20:8400 192.168.196.19

A one line command can be used too. If you are using zsh, then consul join -rpc-addr=192.168.196.20:8400 192.168.196.{16..19} is enough, or a foor loop: for i in $(seq 16 1 19); do consul join -rpc-addr=192.168.196.20:8400 192.168.196.$i;done. You can verify if your members are part of your consul deployment with the command:

consul members -rpc-addr=192.168.196.20:8400
Node      Address              Status  Type    Build  Protocol  DC
master20  192.168.196.20:8301  alive   server  0.5.2  2         dc1
bugs19    192.168.196.19:8301  alive   client  0.5.2  2         dc1
bugs18    192.168.196.18:8301  alive   client  0.5.2  2         dc1
bugs17    192.168.196.17:8301  alive   client  0.5.2  2         dc1
bugs16    192.168.196.16:8301  alive   client  0.5.2  2         dc1

Consul members and master are deployed and working. The focus will now be on docker and swarm.


Swarm

In the following the creation of swarm manager and swarm members discovery are detailed using two different methods: token and static file. Tokens use a hosted discovery service with Docker Hub while static file is just local and does not use the network (nor any server). Static file solution should be preferred (and is actually easier).

[static file] Start the swarm manager while joining swarm members

Create a file named /tmp/cluster.disco with the content swarm_agent_ip:2375.

cat /tmp/cluster.disco
192.168.196.16:2375
192.168.196.17:2375
192.168.196.18:2375
192.168.196.19:2375

Then just start the swarm manager as follow:

ldocker run -v /tmp/cluster.disco:/tmp/cluster.disco -d -p 5732:2375 swarm manage file:///tmp/cluster.disco

And you're done !

[token] Create and start the swarm manager

On the swarm master (bugs20), create a swarm:

ldocker run --rm swarm create > swarm_id

This create a swarm and save the token ID in the file swarm_id of the current directory. Once created, the swarm manager need to be run as a daemon:

ldocker run -d -p 5732:2375 swarm manage token://`cat swarm_id`

To verify if it is started you can run:

ldocker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                    NAMES
d28238445532        swarm               "/swarm manage token:"   5 seconds ago       Up 4 seconds        0.0.0.0:5732->2375/tcp   cranky_liskov

[token] Join swarm members into the swarm cluster

Then the swarm manager will need some swarm agent to join.

ldocker run swarm join --addr=192.168.196.16:2375 token://`cat swarm_id`
ldocker run swarm join --addr=192.168.196.17:2375 token://`cat swarm_id`
ldocker run swarm join --addr=192.168.196.18:2375 token://`cat swarm_id`
ldocker run swarm join --addr=192.168.196.19:2375 token://`cat swarm_id`

std[in|out] will be busy, these commands need to be ran on different terminals. Adding -d abefore the join should solve this and enables a for-loop to be used for the joins.

After the join of the swarm members:

auzias@bugs20:~$ ldocker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                    NAMES
d1de6e4ee3fc        swarm               "/swarm join --addr=1"   5 seconds ago       Up 4 seconds        2375/tcp                 fervent_lichterman
338572b87ce9        swarm               "/swarm join --addr=1"   6 seconds ago       Up 4 seconds        2375/tcp                 mad_ramanujan
7083e4d6c7ea        swarm               "/swarm join --addr=1"   7 seconds ago       Up 5 seconds        2375/tcp                 naughty_sammet
0c5abc6075da        swarm               "/swarm join --addr=1"   8 seconds ago       Up 6 seconds        2375/tcp                 gloomy_cray
ab746399f106        swarm               "/swarm manage token:"   25 seconds ago      Up 23 seconds       0.0.0.0:5732->2375/tcp   ecstatic_shockley

After the discovery of the swarm members

To verify if the members are well discovered, you can execute swarm-docker info:

auzias@bugs20:~$ swarm-docker info
Containers: 4
Images: 4
Role: primary
Strategy: spread
Filters: health, port, dependency, affinity, constraint
Nodes: 4
 bugs16: 192.168.196.16:2375
  └ Containers: 0
  └ Reserved CPUs: 0 / 12
  └ Reserved Memory: 0 B / 49.62 GiB
  └ Labels: executiondriver=native-0.2, kernelversion=3.16.0-4-amd64, operatingsystem=Debian GNU/Linux 8 (jessie), storagedriver=aufs
 bugs17: 192.168.196.17:2375
  └ Containers: 0
  └ Reserved CPUs: 0 / 12
  └ Reserved Memory: 0 B / 49.62 GiB
  └ Labels: executiondriver=native-0.2, kernelversion=3.16.0-4-amd64, operatingsystem=Debian GNU/Linux 8 (jessie), storagedriver=aufs
 bugs18: 192.168.196.18:2375
  └ Containers: 0
  └ Reserved CPUs: 0 / 12
  └ Reserved Memory: 0 B / 49.62 GiB
  └ Labels: executiondriver=native-0.2, kernelversion=3.16.0-4-amd64, operatingsystem=Debian GNU/Linux 8 (jessie), storagedriver=aufs
 bugs19: 192.168.196.19:2375
  └ Containers: 4
  └ Reserved CPUs: 0 / 12
  └ Reserved Memory: 0 B / 49.62 GiB
  └ Labels: executiondriver=native-0.2, kernelversion=3.16.0-4-amd64, operatingsystem=Debian GNU/Linux 8 (jessie), storagedriver=aufs
CPUs: 48
Total Memory: 198.5 GiB
Name: ab746399f106

At this point swarm is deployed and all containers run will be run over different nodes. By executing several:

auzias@bugs20:~$ swarm-docker run --rm -it ubuntu bash

and then a:

auzias@bugs20:~$ swarm-docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
45b19d76d38e        ubuntu              "bash"              6 seconds ago       Up 5 seconds                            bugs18/boring_mccarthy
53e87693606e        ubuntu              "bash"              6 seconds ago       Up 5 seconds                            bugs16/amazing_colden
b18081f26a35        ubuntu              "bash"              6 seconds ago       Up 4 seconds                            bugs17/small_newton
f582d4af4444        ubuntu              "bash"              7 seconds ago       Up 4 seconds                            bugs18/naughty_banach
b3d689d749f9        ubuntu              "bash"              7 seconds ago       Up 4 seconds                            bugs17/pensive_keller
f9e86f609ffa        ubuntu              "bash"              7 seconds ago       Up 5 seconds                            bugs16/pensive_cray
b53a46c01783        ubuntu              "bash"              7 seconds ago       Up 4 seconds                            bugs18/reverent_ritchie
78896a73191b        ubuntu              "bash"              7 seconds ago       Up 5 seconds                            bugs17/gloomy_bell
a991d887a894        ubuntu              "bash"              7 seconds ago       Up 5 seconds                            bugs16/angry_swanson
a43122662e92        ubuntu              "bash"              7 seconds ago       Up 5 seconds                            bugs17/pensive_kowalevski
68d874bc19f9        ubuntu              "bash"              7 seconds ago       Up 5 seconds                            bugs16/modest_payne
e79b3307f6e6        ubuntu              "bash"              7 seconds ago       Up 5 seconds                            bugs18/stoic_wescoff
caac9466d86f        ubuntu              "bash"              7 seconds ago       Up 5 seconds                            bugs17/goofy_snyder
7748d01d34ee        ubuntu              "bash"              7 seconds ago       Up 5 seconds                            bugs16/fervent_einstein
99da2a91a925        ubuntu              "bash"              7 seconds ago       Up 5 seconds                            bugs18/modest_goodall
cd308099faac        ubuntu              "bash"              7 seconds ago       Up 6 seconds                            bugs19/furious_ritchie

As shown, the containers are disseminated over bugs{16...19}.


Multi-hosts network

A network overlay is needed so all the containers can be "plugged in" this overlay. To create this network overlay, execute:

auzias@bugs20:~$ swarm-docker network create -d overlay net
auzias@bugs20:~$ swarm-docker network ls|grep "net"
c96760503d06        net                 overlay

And voilà !

Once this overlay is created, add --net net to the command swarm-docker run --rm -it ubuntu bash and all your containers will be able to communicate natively as if they were on the same LAN. The default network is 10.0.0.0/24.

Enabling Multicast

Multicast is not support by the default overlay. Another driver is required to be able to use multicast. The docker plugin weave net does support multicast.

To use this driver, once installed, you will need to run $weave launch on all Swarm agents and Swarm manager. Then you'll need to connect the weave together, this is done by running $weave connect $SWARM_MANAGER_IP. It is not obviously the IP address of the Swarm manager but it is cleaner to do so (or use another node than the Swarm agents).

At this point the weave cluster is deployed, but no weave network has been created. Running $swarm-docker network create --driver weave weave-net will create the weave network named weave-net. Starting containers with the --net weave-net will enable them to share the same LAN and use multicast. Example of a full command to start such containers is: $swarm-docker run --rm -it --privileged --net=weave-net ubuntu bash.


I think the options that you specify should use cluster-store=consul instead of cluster-store-consul. Try to reset and restart the engine and swarm and check if it works. It should work after that. The getting started doc clearly explains how to configure docker overlay networks using consul as backing data-store.

DOCKER_OPTS="-D --cluster-store=consul://192.168.0.18:8500 --cluster-advertise=192.168.0.18:2375"

DOCKER_OPTS="-D --cluster-store=consul://192.168.0.18:8500 --cluster-advertise=192.168.0.36:2375"

For anyone coming to this since Docker 1.12 was released, this is now trivially easy - Swarm Mode is built into the engine and you don't need Consul or any other extra components.

Assuming you have two hosts with Docker installed, intitialize the Swarm on the first machine:

> docker swarm init
Swarm initialized: current node (6ujd4o5fx1dmav5uvv4khrp33) is now a manager

To add a worker to this swarm, run the following command:                                                                                                   
docker swarm join \                                                                                                                                     
--token SWMTKN-1-54xs4bn7qs6su3xjjn7ul5am9z9073by2aqpey56tnccbi93zy-blugim00fuozg6qs289etc \                                                         
172.17.0.54:2377

That host becomes the first manager node in the swarm, and it writes out the command you use to join other nodes to the swarm - the secret token, and the IP address where the manager is listening.

On the second host:

> docker swarm join 172.17.0.54:2377 --token SWMTKN-1-54xs4bn7qs6su3xjjn7ul5am9z9073by2aqpey56tnccbi93zy-blugim00fuozg6qs289etc
This node joined a swarm as a worker.

Now you have a secure 2-node swarm which has service discovery, rolling updates and service scaling.

Create your overlay network on the manager node with:

> docker network create -d overlay my-net
d99lmsfzhcb16pdp2k7o9sehv

And you now have a multi-host overlay network with built-in DNS, so services can resolve each other based on service name.