Why is the Alpine Docker image over 50% slower than the Ubuntu image?

Solution 1:

I've run the same benchmark as you did, using just Python 3:

$ docker run python:3-alpine3.6 python --version
Python 3.6.2
$ docker run python:3-slim python --version
Python 3.6.2

resulting in a difference of more than 2 seconds:

$ docker run python:3-slim python -c "$BENCHMARK"
3.6475560404360294
$ docker run python:3-alpine3.6 python -c "$BENCHMARK"
5.834922112524509

Alpine uses a different implementation of libc (the base system library), from the musl project (mirror URL). There are many differences between the two libraries, so each may perform better in certain use cases.

Here's an strace diff between the two commands above. The output starts to differ at line 269. The memory addresses differ, of course, but otherwise the traces are very similar. Most of the time is obviously spent waiting for the python command to finish.

After installing strace into both containers, we can obtain a more interesting trace (I've reduced the number of iterations in the benchmark to 10).

For example, glibc is loading libraries in the following manner (line 182):

openat(AT_FDCWD, "/usr/local/lib/python3.6", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
getdents(3, /* 205 entries */, 32768)   = 6824
getdents(3, /* 0 entries */, 32768)     = 0

The same code in musl:

open("/usr/local/lib/python3.6", O_RDONLY|O_DIRECTORY|O_CLOEXEC) = 3
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
getdents64(3, /* 62 entries */, 2048)   = 2040
getdents64(3, /* 61 entries */, 2048)   = 2024
getdents64(3, /* 60 entries */, 2048)   = 2032
getdents64(3, /* 22 entries */, 2048)   = 728
getdents64(3, /* 0 entries */, 2048)    = 0

I'm not saying this is the key difference, but reducing the number of I/O operations in core libraries might contribute to better performance: glibc reads all 205 directory entries with a single getdents call into a 32768-byte buffer, while musl needs four getdents64 calls with a 2048-byte buffer, plus an extra fcntl. From the diff you can see that executing the very same Python code might lead to slightly different system calls. The most significant difference probably lies in how the loop itself performs. I'm not qualified enough to judge whether the performance issue is caused by memory allocation or some other instruction.
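As a rough illustration of why the buffer size matters, the byte counts in the two traces can be turned into a call count. This is only a sketch: the helper name getdents_calls is made up for this example, and it assumes each buffer is filled nearly to capacity and that one final call returns 0 to signal the end of the directory.

```python
import math

def getdents_calls(total_bytes, buffer_size):
    """Estimate how many getdents calls are needed to read a directory.

    Assumption: buffers are filled almost completely, and one extra call
    is needed to receive the 0 that marks the end of the directory.
    """
    return math.ceil(total_bytes / buffer_size) + 1

# Bytes returned by the four non-empty musl calls in the trace above.
musl_bytes = 2040 + 2024 + 2032 + 728
# Bytes returned by the single non-empty glibc call.
glibc_bytes = 6824

print(musl_bytes == glibc_bytes)            # both libcs read the same data
print(getdents_calls(glibc_bytes, 32768))   # calls with glibc's buffer size
print(getdents_calls(musl_bytes, 2048))     # calls with musl's buffer size
```

With the numbers from the traces, the estimate matches what strace shows: two calls for glibc and five for musl.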

  • glibc with 10 iterations:

    write(1, "0.032388824969530106\n", 21)

  • musl with 10 iterations:

    write(1, "0.035214247182011604\n", 21)

musl is slower by 0.0028254222124814987 seconds (about 9%). As the difference grows with the number of iterations, I'd assume the difference is in the memory allocation of JSON objects.

If we reduce the benchmark to just importing json, the difference is much smaller:

$ BENCHMARK="import timeit; print(timeit.timeit('import json;', number=5000))"
$ docker run python:3-slim python -c "$BENCHMARK"
0.03683806210756302
$ docker run python:3-alpine3.6 python -c "$BENCHMARK"
0.038280246779322624

Loading Python libraries looks comparable. Generating a list() produces a bigger difference:

$ BENCHMARK="import timeit; print(timeit.timeit('list(range(10000))', number=5000))"
$ docker run python:3-slim python -c "$BENCHMARK"
0.5666235145181417
$ docker run python:3-alpine3.6 python -c "$BENCHMARK"
0.6885563563555479
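To put the three measurements so far on the same scale, here is a small sketch that converts them into relative slowdowns. The timings are copied from the runs above; the function name slowdown_percent is just a label chosen for this example.

```python
# Timings in seconds copied from the runs above:
# (python:3-slim with glibc, python:3-alpine3.6 with musl).
timings = {
    "full benchmark": (3.6475560404360294, 5.834922112524509),
    "import json": (0.03683806210756302, 0.038280246779322624),
    "list(range(10000))": (0.5666235145181417, 0.6885563563555479),
}

def slowdown_percent(glibc_seconds, musl_seconds):
    """How much slower the musl run is, as a percentage of the glibc run."""
    return (musl_seconds / glibc_seconds - 1.0) * 100.0

for name, (glibc_s, musl_s) in timings.items():
    print(f"{name:20s} musl is {slowdown_percent(glibc_s, musl_s):5.1f}% slower")
```

That works out to roughly 60% for the full benchmark, about 4% for the import alone and about 22% for building the list.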

Obviously the most expensive operation is json.dumps(), which might point to differences in memory allocation between those libraries.
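One way to check which step dominates is to time each step on its own inside a container. The script below is a minimal sketch, assuming the original benchmark combines the three steps mentioned here (importing json, building list(range(10000)) and calling json.dumps() on it); the names STAGES, SETUP and time_stages are made up for this example.

```python
import timeit

# Assumption: these are the three steps of the original benchmark.
STAGES = {
    "import json": "import json",
    "list(range(10000))": "list(range(10000))",
    "json.dumps(data)": "json.dumps(data)",
}

# The setup runs once per stage and is not included in the measured time.
SETUP = "import json; data = list(range(10000))"

def time_stages(number=500):
    """Return a dict mapping each stage to the seconds spent on `number` runs."""
    return {
        name: timeit.timeit(stmt, setup=SETUP, number=number)
        for name, stmt in STAGES.items()
    }

for name, seconds in time_stages().items():
    print(f"{name:20s} {seconds:.4f}s")
```

Saved as a file and mounted into both images, it can be run with the same docker run commands as above to compare the per-step timings.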

Looking again at the benchmark, musl is indeed somewhat slower in memory allocation:

                       |  musl  | glibc  |
-----------------------+--------+--------+
Tiny allocation & free |  0.005 |  0.002 |
-----------------------+--------+--------+
Big allocation & free  |  0.027 |  0.016 |
-----------------------+--------+--------+

I'm not sure what is meant by "big allocation", but musl is roughly 1.7× slower there (and 2.5× slower for tiny allocations), which might become significant when you repeat such operations thousands or millions of times.
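If you want to try a comparable measurement yourself, the following sketch calls the C library's malloc and free directly through ctypes. It rests on several assumptions: that ctypes.CDLL(None) on Linux exposes the libc the interpreter is linked against, that 16 bytes and 1 MiB are reasonable stand-ins for "tiny" and "big" (the sizes used in the table are not known), and that the ctypes call overhead, which is included in every measurement, does not hide the difference.

```python
import ctypes
import timeit

# Assumption: on Linux this exposes the symbols of the C library in use
# (glibc in python:3-slim, musl in the Alpine image).
libc = ctypes.CDLL(None)
libc.malloc.argtypes = [ctypes.c_size_t]
libc.malloc.restype = ctypes.c_void_p
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None

def time_alloc_free(size, number=100_000):
    """Seconds spent on `number` malloc(size)/free pairs, ctypes overhead included."""
    def once():
        ptr = libc.malloc(size)
        libc.free(ptr)
    return timeit.timeit(once, number=number)

# Hypothetical sizes; the benchmark behind the table may use different ones.
print("tiny (16 B): ", time_alloc_free(16))
print("big (1 MiB): ", time_alloc_free(1 << 20))
```

Because each call goes through ctypes, the absolute numbers will be much larger than those in the table; only the ratio between the two images is meaningful.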