Is there a way to combine Docker images into 1 container?

I have a few Dockerfiles right now.

One is for Cassandra 3.5, and it is FROM cassandra:3.5

I also have a Dockerfile for Kafka, but t is quite a bit more complex. It is FROM java:openjdk-8-fre and it runs a long command to install Kafka and Zookeeper.

Finally, I have an application written in Scala that uses SBT.

For that Dockerfile, it is FROM broadinstitute/scala-baseimage, which gets me Java 8, Scala 2.11.7, and STB 0.13.9, which are what I need.

Perhaps, I don't understand how Docker works, but my Scala program has Cassandra and Kafka as dependencies and for development purposes, I want others to be able to simply clone my repo with the Dockerfile and then be able to build it with Cassandra, Kafka, Scala, Java and SBT all baked in so that they can just compile the source. I'm having a lot of issues with this though.

How do I combine these Dockerfiles? How do I simply make an environment with those things baked in?


Solution 1:

You can, with the multi-stage builds feature introduced in Docker 1.17

Take a look at this:

FROM golang:1.7.3
WORKDIR /go/src/github.com/alexellis/href-counter/
RUN go get -d -v golang.org/x/net/html  
COPY app.go .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .

FROM alpine:latest  
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=0 /go/src/github.com/alexellis/href-counter/app .
CMD ["./app"]  

Then build the image normally:

docker build -t alexellis2/href-counter:latest

From : https://docs.docker.com/develop/develop-images/multistage-build/

The end result is the same tiny production image as before, with a significant reduction in complexity. You don’t need to create any intermediate images and you don’t need to extract any artifacts to your local system at all.

How does it work? The second FROM instruction starts a new build stage with the alpine:latest image as its base. The COPY --from=0 line copies just the built artifact from the previous stage into this new stage. The Go SDK and any intermediate artifacts are left behind, and not saved in the final image.

Solution 2:

You can't combine dockerfiles as conflicts may occur. What you want to do is to create a new dockerfile or build a custom image.

TL;DR; If your current development container contains all the tools you need and works, then save it as an image and upon it to a repo and create a dockerfile to pull from that image off that repo.

Details: Building a custom image is by far easier than creating a dockerfile using a public image as you can store whatever hacks and mods into the image. To do so, start a blank container with a basic Linux image (or broadinstitute/scala-baseimage), install whatever tools you need and configure them until everything works correctly, then save it (the container) as an image. Create a new container off this image and test to see if you can build your code on top of it via docker-compose (or however you want to do/build it). If it works, than you have a working base image that you can upload to a repo so others can pull it.

To build a dockerfile with a public image, you will need to put all hacks, mods and setup on the dockerfile itself. That is, you will need to place every command line that you used into a text file and reduce whatever hacks, mods and setup into command lines. At the end, your dockerfile will create an image automatically and you don't need to store this image into a repo and all you need to do is to give others the dockerfile and they can spin the image up at their own docker.

Note that once you have a working dockerfile, you can tweak it easily as it will create a new image every time you use the dockerfile. With a custom image, you may run into issues where you need to rebuild the image due to conflicts. For example, all of your tools work with openjdk until you install one that doesn't work. The fix may involve uninstalling openjdk and use the oracle one, but all configuration you did for all the tools that you have installed broke.

Solution 3:

The following answer applies to docker 1.7 and above:

I would prefer to use --from=NAME and from image as NAME Why? You can use --from=0 and above but this might get little hard to manage when you have many docker stages in dockerfile.

sample example:

FROM golang:1.7.3 as backend
WORKDIR /backend
RUN go get -d -v golang.org/x/net/html  
COPY app.go .
RUN  #install some stuff, compile assets....
    
FROM golang:1.7.3 as assets
WORKDIR /assets
RUN ./getassets.sh

FROM nodejs:latest as frontend 
RUN npm install
WORKDIR /assets
COPY --from=assets /asets .
CMD ["./app"] 

FROM alpine:latest as mergedassets
WORKDIR /root/
COPY --from=merge ./
COPY --from=backend ./backend .
CMD ["./app"]

Note: Managing dockerfile properly will help to build a docker image much faster. Internally docker usings docker layer caching to help with this process, incase the image have to be rebuilt.

Solution 4:

Yes, you can roll a whole lot of software into a single Docker image (GitLab does this, with one image that includes Postgres and everything else), but generalhenry is right - that's not the typical way to use Docker.

As you say, Cassandra and Kafka are dependencies for your Scala app, they're not part of the app, so they don't all belong in the same image.

Having to orchestrate many containers with Docker Compose adds an extra admin layer, but it gives you much more flexibility:

  • your containers can have different lifespans, so when you have a new version of your app to deploy, you only need to run a new app container, you can leave the dependencies running;
  • you can use the same app image in any environment, using different configurations for your dependencies - e.g. in dev you can run a basic Kafka container and in prod have it clustered on many nodes, your app container is the same;
  • your dependencies can be used by other apps too - so multiple consumers can run in different containers and all work with the same Kafka and Cassandra containers;
  • plus all the scalability, logging etc. already mentioned.