How does Docker Swarm implement volume sharing?

What you're asking about is a common question. Volume data and the features of what that volume can do are managed by a volume driver. Just like you can use different network drivers like overlay, bridge, or host, you can use different volume drivers.

Docker and Swarm only come with the standard local driver out of the box. It doesn't have any awareness of Swarm, and it will just create new volumes for your data on whichever node your service tasks are scheduled on. This is usually not what you want.

You want a 3rd party driver plugin that is Swarm aware, and will ensure the volume you created for a service task is available on the right node at the right time. Options include using "Docker for AWS/Azure" and its included CloudStor driver, or the popular open source REX-Ray solution.

There are lots of 3rd party volume drivers, which you can find on the Docker Store.

Swarm Mode itself does not do anything different with volumes, it runs any volume mount command you provide on the node where the container is running. If your volume mount is local to that node, then your data will be saved locally on that node. There is no built in functionality to move data between nodes automatically.

There are some software based distributed storage solutions like GlusterFS, and Docker has one called Infinit which is not yet GA and development on that has taken a back seat to the Kubernetes integration in EE.

The typical result is you either need to manage replication of storage within your application (e.g. etcd and other raft based algorithms) or you perform your mounts on an external storage system (hopefully with its own HA). Mounting an external storage system has two options, block or file based. Block based storage (e.g. EBS) typically comes with higher performance, but is limited to only be mounted on a single node. For this, you will typically need a 3rd party volume plugin driver to give your docker node access to that block storage. File based storage (e.g. EFS) has lower performance, but is more portable, and can be simultaneously mounted on multiple nodes, which is useful for a replicated service.

The most common file based network storage is NFS (this is the same protocol used by EFS). And you can mount that without any 3rd party plugin drivers. The unfortunately named "local" volume plugin driver that docker ships with give you the option to pass any values you want to the mount command with driver options, and with no options, it defaults to storing volumes in the docker directory /var/lib/docker/volumes. With options, you can pass it the NFS parameters, and it will even perform a DNS lookup on the NFS hostname (something you don't have with NFS normally). Here's an example of the different ways to mount an NFS filesystem using the local volume driver:

  # create a reusable volume
  $ docker volume create --driver local \
      --opt type=nfs \
      --opt o=nfsvers=4,addr=192.168.1.1,rw \
      --opt device=:/path/to/dir \
      foo

  # or from the docker run command
  $ docker run -it --rm \
    --mount type=volume,dst=/container/path,volume-driver=local,volume-opt=type=nfs,\"volume-opt=o=nfsvers=4,addr=192.168.1.1\",volume-opt=device=:/host/path \
    foo

  # or to create a service
  $ docker service create \
    --mount type=volume,dst=/container/path,volume-driver=local,volume-opt=type=nfs,\"volume-opt=o=nfsvers=4,addr=192.168.1.1\",volume-opt=device=:/host/path \
    foo

  # inside a docker-compose file
  ...
  volumes:
    nfs-data:
      driver: local
      driver_opts:
        type: nfs
        o: nfsvers=4,addr=192.168.1.1,rw
        device: ":/path/to/dir"
  ...

If you use the compose file example at the end, note that changes to a volume (e.g. updating the server path or address) are not reflected in existing named volumes for as long as they exist. You need to either rename your volume, or delete it, to allow swarm to recreate it with new values.

The other common issue I see in most NFS usage is "root squash" being enabled on the server. This results in permission issues when containers running as root attempt to write files to the volume. You also have similar UID/GID permission issues where the container UID/GID is the one that needs permissions to write to the volume, which may require directory ownernship and permissions to be adjusted on the NFS server.

My solution for our locally hosted swarm: every worker node has mounted an nfs-share, provided by our fileserver on /mnt/docker-data. When I define volumes in my services compose files, I set the device to some path under /mnt/docker-data, for example:

volumes:
  traefik-logs:
    driver: local
    driver_opts:
      o: bind
      device: /mnt/docker-data/services/traefik/logs
      type: none

With this solution, docker creates the volume on every node, the service is deployed to and - surprise - there is data already, because it is the same path, which was used by the volume on the other node.

If you take a closer look at the nodes filesystem, you see that mounts to my fileserver mount are created under /var/lib/docker/volumes, see here:

root@node-3:~# df -h
Dateisystem                                                                                                   Größe Benutzt Verf. Verw% Eingehängt auf
[...]
fs.mydomain.com:/srv/shares/docker-data/services/traefik/logs                                 194G    141G   53G   73% /var/lib/docker/volumes/traefik_traefik-logs/_data

How does Docker Swarm implement volume sharing?

Related

Recent Posts