Docker: Can a container A call an executable located in another container B?

I have two Docker images: one containing pandoc (a utility that converts documents between many formats), and another containing pdflatex (from texlive, to convert tex files into pdf). My goal here is to convert documents from md to pdf.

I can run each image separately:

# call pandoc inside my-pandoc-image (md -> tex)
docker run --rm \
    -v "$(pwd)":/pandoc \
    my-pandoc-image \
    pandoc -s test.md -o test.tex

# call pdflatex inside my-texlive-image (tex -> pdf)
docker run --rm \
    -v "$(pwd)":/texlive \
    my-texlive-image \
    pdflatex test.tex # generates test.pdf

But what I actually want is to call pandoc (from its container) to convert md directly into pdf, like this:

docker run --rm \
    -v "$(pwd)":/pandoc \
    my-pandoc-image \
    pandoc -s test.md --latex-engine pdflatex -o test.pdf

This command does not work, because pandoc inside the container tries to call pdflatex (which must be in $PATH) to generate the pdf, but pdflatex does not exist there since it is not installed in my-pandoc-image.

In my case, pdflatex is installed in the image my-texlive-image.

So, from this example, my question is: can a container A call an executable located in another container B?

I am pretty sure this is possible, because if I install pandoc on my host (without pdflatex), I can run pandoc -s test.md --latex-engine=pdflatex -o test.pdf by simply aliasing the pdflatex command with:

pdflatex() {
    docker run --rm \
        -v "$(pwd)":/texlive \
        my-texlive-image \
        pdflatex "$@"
}

Thus, when pdflatex is called by pandoc, a container starts and does the conversion.

But when using the two containers, how can I alias the pdflatex command to simulate its existence in the container that only has pandoc?

I took a look at docker-compose, since I have already used it to make two containers communicate (an app communicating with a database). I even thought about ssh-ing from container A to container B to call the pdflatex command, but this is definitely not the right solution.

Finally, I have also built an image containing pandoc + pdflatex (it worked because the two executables were in the same image), but I really want to keep the two images separate, since each could be used independently by other images.

Edit:

A similar question is asked here. As I understand it, the provided answer requires Docker to be installed in container A and the Docker socket (/var/run/docker.sock) to be bind-mounted from the host into container A. I don't think this is best practice; it seems like a hack that can create security issues.


Solution 1:

There are multiple solutions to your problem; I'll let you choose the one that suits you best. They are presented below from the cleanest to the ugliest (in my opinion, and with regard to generally followed best practices).

1. Make it a service

If you end up calling it often, it may be worth exposing pandoc as an (HTTP) API. Some images already do that, for example metal3d/pandoc-server (which I have used successfully, but I'm sure you can find others).

In this case, you just run a container with pandoc + pdflatex once and you're set!

2. Use image inheritance!

Make 2 images: one with pandoc only, and another with pandoc + pdflatex that inherits from the first one via the FROM directive in its Dockerfile.

This addresses your concern about size while still letting you run pandoc without having to fetch pdflatex too. If you later need the image with pdflatex, pulling it only fetches an extra layer, not an entire image.

You can also do it the other way around, with a pdflatex base image and another image adding pandoc to it, if you find yourself often using the pdflatex image alone and rarely using pandoc without pdflatex. You could also make 3 images (pandoc, pdflatex, and pdflatex + pandoc) to cover every need you might have, but then at least one image won't be linked to the 2 others (a "child" image can only inherit from one parent), making things a bit harder to maintain.
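As a rough illustration of the inheritance approach, the sketch below writes a child Dockerfile that starts FROM my-pandoc-image and adds a TeX installation on top. The function name, the my-pandoc-texlive-image tag, the Debian-based base image, and the texlive-latex-base package are all assumptions made for the example; adapt them to how your images are actually built.

```shell
# Sketch only: writes a child Dockerfile into the directory given as $1.
# write_pandoc_texlive_dockerfile is a hypothetical helper name.
write_pandoc_texlive_dockerfile() {
    cat > "$1/Dockerfile" <<'EOF'
# Child image: everything from my-pandoc-image, plus pdflatex as an extra layer.
FROM my-pandoc-image
# Assumption: the base image is Debian-based; use your distribution's package
# manager and TeX packages otherwise.
RUN apt-get update \
    && apt-get install -y --no-install-recommends texlive-latex-base \
    && rm -rf /var/lib/apt/lists/*
EOF
}

# Possible usage (the image tag is an assumption):
#   mkdir -p pandoc-texlive
#   write_pandoc_texlive_dockerfile pandoc-texlive
#   docker build -t my-pandoc-texlive-image pandoc-texlive
```

Depending on the pandoc template in use, more TeX packages than this minimal one may be needed.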

3. Docker client in my-pandoc-image + Docker socket mount

This is the solution that you mentioned at the end of your post. It is probably the most generic and straightforward way of calling other containerized commands, regardless of your precise use case of pandoc + pdflatex.

Just add the docker client to your image my-pandoc-image and pass the Docker socket as a volume at runtime using docker run -v /var/run/docker.sock:/var/run/docker.sock. And if your concern is that you can't make pandoc call docker run ... instead of pdflatex directly, just add a small wrapper called pdflatex in /usr/local/bin/ which is responsible for doing the docker run.
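A minimal sketch of such a wrapper could look like the following. It reuses the docker run invocation from the question. HOST_WORKDIR and the function name are hypothetical: the variable is needed because the Docker daemon runs on the host, so the path given to -v must be a host path rather than a path inside the pandoc container.

```shell
#!/bin/sh
# Sketch of a wrapper meant to be installed as /usr/local/bin/pdflatex
# in my-pandoc-image.
# Assumptions (not from the original post):
#   - the docker client is installed in the image and the host's
#     /var/run/docker.sock is mounted into the container;
#   - HOST_WORKDIR (hypothetical variable) holds the *host* path of the
#     directory that is mounted at /pandoc.

pdflatex_via_docker() {
    if [ -z "${HOST_WORKDIR:-}" ]; then
        echo "pdflatex wrapper: HOST_WORKDIR is not set" >&2
        return 1
    fi
    # Same invocation as in the question, started through the mounted socket.
    docker run --rm \
        -v "$HOST_WORKDIR":/texlive \
        my-texlive-image \
        pdflatex "$@"
}

# In the installed script, the last line would forward the arguments:
#   pdflatex_via_docker "$@"
```

The pandoc container would then be started with something like -e HOST_WORKDIR="$(pwd)" in addition to the socket mount. This sketch only shares the working directory; if pandoc hands the engine files located elsewhere (for example in a temporary directory), those would have to be shared with the texlive container as well.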

4. Use volumes-from to get the binary

This one is probably the least clean I'll present here. You could try getting either the pandoc binary into a pdflatex container or the pdflatex binary into a pandoc container using --volumes-from, to keep everything packaged in its own Docker image. But honestly, it's more duct tape than a real solution.
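For completeness, here is roughly what that could look like. It is only a sketch under strong assumptions that do not come from the original post: that the whole TeX installation sits under a single directory (here /usr/local/texlive), that the engine is found at the hypothetical path stored in PDFLATEX_BIN, and that those binaries can run against the libraries present in my-pandoc-image. The container name texlive-files and the function names are made up for the example.

```shell
# Sketch only; paths and names below are assumptions, adjust to your images.
TEXLIVE_DIR=/usr/local/texlive
PDFLATEX_BIN="$TEXLIVE_DIR/bin/pdflatex"   # hypothetical binary location

# Create (without starting) a container that exposes the TeX directory
# as a volume.
create_texlive_volume_container() {
    docker create -v "$TEXLIVE_DIR" --name texlive-files my-texlive-image
}

# Run pandoc with that volume attached and point it at the mounted engine.
pandoc_with_texlive_volume() {
    docker run --rm \
        --volumes-from texlive-files \
        -v "$(pwd)":/pandoc \
        my-pandoc-image \
        pandoc -s test.md --latex-engine "$PDFLATEX_BIN" -o test.pdf
}
```

Whether the mounted binary actually runs depends on both images sharing compatible system libraries, which is one reason this approach is fragile.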

Conclusion

You can choose the solution that best fits your needs, but I would recommend the first 2 and strongly discourage the last one.