ld linker question: the --whole-archive option

Solution 1:

There are legitimate uses of --whole-archive when linking an executable against static libraries. One example is C++ code in which global instances "register" themselves in their constructors (warning: untested code):

handlers.h

typedef void (*handler)(const char *data);
void register_handler(const char *protocol, handler h);
handler get_handler(const char *protocol);

handlers.cc (part of libhandlers.a)

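#include <map>
#include <string>
#include "handlers.h"

// Key on std::string: a const char* key would compare pointers, not text.
typedef std::map<std::string, handler> HandlerMap;
// Function-local static, so the map exists even if a handler in another
// translation unit registers during static initialization.
static HandlerMap &handlers() {
   static HandlerMap m;
   return m;
}
void register_handler(const char *protocol, handler h) {
   handlers()[protocol] = h;
}
handler get_handler(const char *protocol) {
   HandlerMap::iterator it = handlers().find(protocol);
   if (it == handlers().end()) return nullptr;
   return it->second;
}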

http.cc (part of libhttp.a)

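#include "handlers.h"
class HttpHandler {
public:  // the constructor must be accessible for the global instance below
    HttpHandler() { register_handler("http", &handle_http); }
    static void handle_http(const char *) { /* whatever */ }
};
HttpHandler h; // registers itself in the handler table before main() runs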

main.cc

#include "handlers.h"
int main(int argc, char *argv[])
{
    for (int i = 1; i < argc-1; i+= 2) {
        handler h = get_handler(argv[i]);
        if (h != nullptr) h(argv[i+1]);
    }
}

Note that there are no symbols in http.cc that main.cc needs. If you link this as

g++ main.cc -lhttp -lhandlers

the linker will not extract http.o from libhttp.a, because nothing references a symbol it defines. You will not get an http handler linked into the main executable, and will not be able to call handle_http(). Contrast this with what happens when you link as:

g++ main.cc -Wl,--whole-archive -lhttp -Wl,--no-whole-archive -lhandlers

The same "self registration" style is also possible in plain C, e.g. with the __attribute__((constructor)) GNU extension.
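As a rough illustration of the constructor-attribute variant, here is a minimal single-file sketch. It is written so that g++ or clang++ accepts it; the fixed-size table, kMaxHandlers, g_http_calls and register_http are names made up for this example, not part of the code above, and like the rest of this answer it should be treated as a sketch rather than a finished implementation.

```cpp
#include <cstring>

typedef void (*handler)(const char *data);

// Hypothetical fixed-size registry. Plain arrays and ints are initialized at
// load time, so they are safe to use from functions that run before main().
static const int kMaxHandlers = 8;
static const char *g_names[kMaxHandlers];
static handler g_handlers[kMaxHandlers];
static int g_count = 0;
static int g_http_calls = 0;  // only here so the effect is observable

void register_handler(const char *protocol, handler h) {
    if (g_count < kMaxHandlers) {
        g_names[g_count] = protocol;
        g_handlers[g_count] = h;
        ++g_count;
    }
}

handler get_handler(const char *protocol) {
    for (int i = 0; i < g_count; ++i) {
        if (std::strcmp(g_names[i], protocol) == 0) return g_handlers[i];
    }
    return nullptr;
}

static void handle_http(const char *) { ++g_http_calls; }

// GNU extension: this function is run automatically before main().
__attribute__((constructor))
static void register_http() {
    register_handler("http", &handle_http);
}
```

In a real project register_http() and handle_http() would live in an object inside libhttp.a, and the same caveat applies: unless something references a symbol from that object, it is only linked in when the archive is wrapped in --whole-archive.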

Solution 2:

Another legitimate use for --whole-archive is for toolkit developers to distribute libraries containing multiple features in a single static library. In this case, the provider has no idea what parts of the library will be used by the consumer and therefore must include everything.

Solution 3:

Another scenario where --whole-archive is useful is incremental linking with static libraries.

Let us suppose that:

  1. libA implements the a() and b() functions.
  2. Some portion of the program has to be linked against libA only, e.g. due to some function wrapping using --wrap (a classical example is malloc)
  3. libC implements the c() function and uses a()
  4. the final program uses a() and c()

Incremental linking steps could be:

ld -r -o step1.o module1.o --wrap malloc --whole-archive -lA
ld -r -o step2.o step1.o module2.o --whole-archive -lC
cc step2.o module3.o -o program

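Without --whole-archive, the relocatable links would drop any archive member that nothing has referenced yet. In particular c() would be left out of step2.o even though the program needs it, and the final link would fail with an undefined reference.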

Of course, this is a corner case: incremental linking is used here only to avoid wrapping every call to malloc in every module. It is nevertheless a case that --whole-archive handles well.

Solution 4:

I agree that using --whole-archive to build executables is probably not what you want (it links in unneeded code and produces bloated software). If they had a good reason to do so, they should have documented it in the build system; as it stands, you are left guessing.

As to the second part of your question: if an executable links both a static library and a dynamic library that contains (in part) the same object code as the static library, then --whole-archive will ensure that at link time the code from the static library is preferred. This is usually what you want when you do static linking.