What is the difference between creating a volume or a mount in docker containers?
There are actually three types of volumes:
- Host Volume: what you refer to as a mount in a container; the more common term is a bind mount.
- Named Volume: any volume managed by docker to which you give a name.
- Anonymous Volume: any volume without a source; docker creates it as a local volume with a long unique id, and otherwise it behaves like a named volume.
Volumes have a source and a target. The source identifies the type of volume: a path (including the leading slash) to a file/directory results in a host volume, a name results in a named volume, and if you do not provide a source at all, you get an anonymous volume. If you define a volume inside a Dockerfile, you cannot specify a source there, so by default docker will create anonymous volumes unless you override the source at runtime.
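As a quick sketch (the paths, names, and image below are placeholders), the same -v flag produces each of the three types depending on the source you give it:
$ docker run -v /home/user/data:/data myimage   # host volume (bind mount): source is a path with a leading slash
$ docker run -v app_data:/data myimage          # named volume: source is a name managed by docker
$ docker run -v /data myimage                   # anonymous volume: no source, docker generates a long unique id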
For each type, here are the pros/cons:
- Host:
- Pro: easy to access the underlying files from the host
- Con: uid/gid permission issues when the uid/gid of the user inside the container does not match the uid/gid of the file owner on the host
- Con: the host directory is not initialized with content from the image
- Named:
- Pro: easy to create and reuse between different containers/images. If you only give it a name with no other settings, the local driver will default to storing your data under /var/lib/docker/volumes, which should only be accessible by root from outside of docker.
- Pro: initialized with the image contents at the mount path when the volume is empty/new and the container is created. This initialization includes file owners and permissions from the image, which avoids most uid/gid issues.
- Pro: Can connect to anything that a mount command can, including a bind mount or NFS mount, with a local driver. Other drivers let you reference data in even more locations (e.g. cloud providers).
- Con: managing content should be done via a container.
- Anonymous:
- Pro: requires no planning to use
- Con: data stored here is typically lost, since there is no mapping from the volume back to the container/image that created it. This is the worst way to store data in my opinion, and the reason no one should ever define a volume inside their Dockerfile.
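You can see the problem in practice: anonymous volumes show up as long hex ids with nothing tying them back to anything. A sketch of how to list and clean them up (note that prune permanently deletes the data):
$ docker volume ls                      # anonymous volumes appear as 64-character hex ids
$ docker volume ls -f dangling=true     # volumes not referenced by any container
$ docker volume prune                   # removes all dangling volumes and their data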
When possible, I use a named volume. The initialization of data and better handling of uid/gid issues trump the convenience of a host volume. If I really need direct access to the data from outside of docker, then I try to use a named volume that points to a bind mount instead of the default local driver settings. A simple example of this is:
$ docker volume create --driver local \
--opt type=none \
--opt device=/home/user/test \
--opt o=bind \
test_vol
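Once created, it is mounted by name like any other named volume, while the files actually live in /home/user/test on the host (the image name here is just a placeholder):
$ docker run -v test_vol:/data myimage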
For defining my volumes, since you do not want to do this in a Dockerfile, I use a docker-compose.yml and define my volumes in there. If it's deployed with swarm mode, I'll point a named volume at an NFS server so the data can still be reached as the containers migrate to different hosts. Otherwise it's a local named volume that can be easily used with docker-compose.
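A minimal sketch of what that can look like in a docker-compose.yml (the service name, image, NFS address, and export path are all assumptions; drop the driver_opts for a plain local named volume):
version: "3.7"
services:
  app:
    image: myimage
    volumes:
      - app_data:/data
volumes:
  app_data:
    driver: local
    driver_opts:
      type: nfs
      o: addr=nfs.example.com,rw
      device: ":/exports/app_data"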
Volumes in the Dockerfile allow a path to be specified in the image that should always be created as a volume. This inherently bypasses the union filesystem docker uses.
Users of such an image will always get a volume at that location when running
docker run <imagename>
i.e. there is no need to ever add -v /my/mount/point:/mount/here, and thus users need not be concerned with it.
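For example, assuming a hypothetical image whose Dockerfile declares VOLUME /data, you can see the automatically created anonymous volume without passing any -v flag:
$ docker run -d --name demo myimage
$ docker inspect -f '{{ json .Mounts }}' demo   # shows an anonymous volume mounted at /data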
Bind mounts (like the example above with -v) must always be given explicitly at run time if they are required, and are not portable between images.
The effective differences when it comes to optimization are these:
- Volumes can be used where a lot of r/w operations are needed and the data has no business being written to the union file system (think databases).
- Volumes are worthless for mounting things like existing data directories. You can do it, but you take an enormous r/w hit because there is no reason for that data to pass through the union file system.
- Mounts, however, handle this (the above) quite well: a bind mount simply mounts the existing directory to a place within the container and ignores the union file system for that directory altogether.
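To make the contrast concrete, here is a rough sketch with a hypothetical app image that writes its data under /data (the image name and paths are assumptions, and the image is assumed not to declare a VOLUME itself):
$ docker run -d myapp                          # writes to /data land in the container's writable layer on the union fs
$ docker run -d -v app_data:/data myapp        # named volume: heavy r/w bypasses the union fs
$ docker run -d -v /srv/appdata:/data myapp    # bind mount: reuses an existing host directory directly, no union fs involved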
does this make sense?