Learning to compile things from source (on Unix/Linux/OSX)
I apologise for directly answering everything, but I don't know of any useful tutorials, FAQs, etc. Basically, what follows is 8 years of making desktop apps (that I help distribute), frustration, and googling:
1. How do I figure out what arguments to pass to ./configure?
Practice, really. Autotools is easy enough as it is consistent, but there's plenty of stuff out there using CMake or custom build scripts. Generally you shouldn't have to pass anything to configure; it should figure out whether your system can build foo-tool or not.
Configure and GNU tools all look in /, /usr and /usr/local for dependencies. If you install anything anywhere else (which makes things painful if the dependency was installed by MacPorts or Fink), you will have to pass a flag to configure or modify the shell's environment to help GNU tools find these dependencies.
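For example, if a dependency lives under a non-standard prefix such as MacPorts' /opt/local (the paths here are illustrative), you can tell an autoconf-generated configure script where to look:
./configure CPPFLAGS="-I/opt/local/include" LDFLAGS="-L/opt/local/lib"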
2. How shared libraries work under OS X / Linux - where they live on the filesystem, how ./configure && make finds them, what actually happens when they are linked against
On Linux they need to be installed to a path that the dynamic linker can find; that search path is defined by the LD_LIBRARY_PATH environment variable and the contents of /etc/ld.so.conf. On Mac it is almost always the same for open source software (unless it is an Xcode project), except the environment variable is DYLD_LIBRARY_PATH instead.
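For example, to point the dynamic linker at an extra directory for a single run (the /opt/foo/lib path and the myapp binary are hypothetical):
LD_LIBRARY_PATH=/opt/foo/lib ./myapp      # Linux
DYLD_LIBRARY_PATH=/opt/foo/lib ./myapp    # OS X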
There is a default path that the linker searches for libraries. It is /lib:/usr/lib:/usr/local/lib
You can supplement this at build time with environment variables. Strictly, LDFLAGS is for extra library search paths (CPATH and CPPFLAGS are the header-path equivalents), though many builds will also pick -L flags up from CFLAGS (conveniently complicated). I suggest LDFLAGS like so:
export LDFLAGS="$LDFLAGS -L/new/path"
The -L parameter adds to the link path.
Modern stuff uses the pkg-config tool: a library that supports it installs a .pc file describing where it is and how to link to it. This can make life easier, but pkg-config doesn't come with OS X 10.5 so you'll have to install it too. Also a lot of basic deps don't support it.
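As a quick example (assuming zlib and its .pc file are installed), pkg-config will print the compile and link flags a build needs:
pkg-config --cflags --libs zlib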
The act of dynamic linking is just "resolve this function at runtime"; really it's a big string table.
3. What are the actual differences between a shared and a statically linked library? Why can't I just statically link everything (RAM and disk space are cheap these days) and hence avoid weird library version conflicts?
When you link to a static library file, the code becomes part of your application. It is as if there were one giant .c file for that library and you compiled it into your application.
Dynamic libraries have the same code, but when the app is run, the code is loaded into the app at runtime (simplified explanation).
You can statically link everything; sadly, however, hardly any build systems make this easy. You'd have to edit build system files manually (eg. Makefile.am or CMakeLists.txt). It is probably worth learning, though, if you regularly install things that require different versions of libraries and you find installing dependencies in parallel difficult.
The trick is to change the link line from -lfoo to /path/to/static/libfoo.a
You can probably find and replace. Afterwards, check the tool doesn't link to the .so or dylib using ldd foo or otool -L foo.
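Concretely, the change to the link line looks something like this (libbar and the paths are hypothetical):
# before: links against libbar.so (or libbar.dylib) at runtime
cc -o foo foo.o -L/usr/local/lib -lbar
# after: copies libbar's code into the binary, assuming a libbar.a was built
cc -o foo foo.o /usr/local/lib/libbar.a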
Another problem is that not all libraries compile to static libraries. Many do, but then MacPorts or Debian may have decided not to ship the .a file.
4. How can I tell what libraries I have installed, and what versions?
If you have pkg-config files for those libraries it is easy:
pkg-config --list-all
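You can also query one library at a time (again assuming it ships a .pc file):
pkg-config --modversion zlib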
Otherwise you often can't easily. The dylib may have a soname (eg. for foo.0.1.dylib the soname is 0.1) that is the same as the library's version, but this is not required. The soname is a binary compatibility feature: you have to bump the major part of the soname if you change the format of the functions in the library. So you can get, eg., a 14.0.5 soname for a 2.0 library, although this is not common.
I got frustrated with this sort of thing and developed a solution for it on Mac, which I talk about next.
5. How can I install more than one version of a library without breaking my normal system?
My solution to this is here: http://github.com/mxcl/homebrew/
I like installing from source, and wanted a tool that made it easy, but with some package management. So with Homebrew I build, eg. wget myself from source, but make sure to install to a special prefix:
/usr/local/Cellar/wget/1.1.4
I then use the homebrew tool to symlink all that into /usr/local, so I still have /usr/local/bin/wget and /usr/local/lib/libwget.dylib
Later if I need a different version of wget I can install it in parallel and just change the version that is linked into the /usr/local tree.
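The same idea works by hand; a sketch of the wget example above, using the version from the path given earlier:
./configure --prefix=/usr/local/Cellar/wget/1.1.4
make && make install
# then link the result into the /usr/local tree
ln -s /usr/local/Cellar/wget/1.1.4/bin/wget /usr/local/bin/wget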
6. If I am installing stuff from source on a system that is otherwise managed using packages, what's the cleanest way of doing so?
I believe the Homebrew way is cleanest, so use it or do the equivalent. Install to /usr/local/pkgs/name/version and symlink or hard link the rest in.
Do use /usr/local. Every build tool that exists searches there for dependencies and headers. Your life will be much easier.
7. Assuming I manage to compile something fiddly from source, how can I then package that up so other people don't have to jump through the same hoops? Particularly on OS X....
If it has no dependencies, you can tar up the build directory and give it to someone else, who can then do "make install". However, you can only do this reliably for the exact same version of OS X. On Linux it will probably work on a similar Linux distribution (eg. Ubuntu) with the same kernel version and libc minor version.
The reason it is not easy to distribute binaries on Unix is binary compatibility: the GNU people, and everyone else, change their binary interfaces often.
Basically, don't distribute binaries; things will probably break in very strange ways.
On Mac, the best option is to make a MacPorts package; everyone uses MacPorts. On Linux there are so many different build systems and combinations that I don't think there is any better advice than to write a blog entry about how you succeeded in building x tool in y strange configuration.
If you make a package description (for MacPorts or Homebrew), then anyone can install that package, and it solves the dependency problems too. However, this is often not easy, and it isn't easy to get your MacPorts recipe included in the main MacPorts tree either. Also, MacPorts doesn't support exotic installation types; they offer one choice for all packages.
One of my future goals with Homebrew is to make it possible to click a link on a website (eg. homebrew://blah) and have it download that Ruby script, install the deps for that package, and then build the app. Not done yet, but not too tricky considering the design I chose.
8. What are the command line tools I need to master to get good at this stuff? Stuff like otool, pkg-config etc.
otool is really only useful afterwards; it tells you what the built binary links to. When you are figuring out the dependencies of a tool you have to build, it is useless. The same is true of pkg-config, as you will have already installed the dependency before you can use it.
My tool chain is: read the README and INSTALL files, and do a ./configure --help. Watch the build output to check it is sane. Parse any build errors. Maybe in future, ask on Server Fault :)
This is a huge topic, so let's start with shared libraries (ELF on Linux, Mach-O on OS X). Ulrich Drepper has a good introduction to writing DSOs (dynamic shared objects) which covers some of the history of shared libraries on Linux, including why they are important.
Ulrich also describes why static linking is considered harmful; one of the key points is security updates. A buffer overflow in a common library (eg. zlib) that is extensively statically linked can cause huge overhead for distributions - this occurred with zlib 1.1.3 (Red Hat advisory).
ELF
The linker ld.so manual page
man ld.so
explains the basic paths and files involved in runtime dynamic linking. On modern Linux systems you'll see additional paths added via /etc/ld.so.conf.d/, usually pulled in by a glob include in /etc/ld.so.conf.
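On many distributions, /etc/ld.so.conf is just that one include line:
include /etc/ld.so.conf.d/*.conf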
If you want to see what is available dynamically via your ld.so configuration you can run
ldconfig -v -N -X
Reading the DSO howto should give you a good basic level of knowledge from which to understand how those principles apply to Mach-O on OS X.
Mach-O
On OS X the binary format is Mach-O. Local system documentation for the linker is
man dyld
The Mach format documentation is available from Apple
UNIX build tools
The common configure, make, make install process is generally provided by GNU autotools, which has an online book covering some of the history of the configure/build split and the GNU toolchain. Autoconf uses tests to determine feature availability on the target build system, driven by the M4 macro language. Automake is basically a templating method for Makefiles: the template, generally called Makefile.am, is turned into a Makefile.in, which the output of autoconf (the configure script) converts into a Makefile.
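As a sketch, the whole chain looks like this (autoreconf is normally a maintainer step; release tarballs ship a ready-made configure):
autoreconf -i    # run autoconf, automake, etc. to generate configure and Makefile.in
./configure      # run the feature tests; converts each Makefile.in into a Makefile
make             # build
make install     # install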
The GNU hello program acts as a good example for understanding the GNU toolchain, and its manual includes autotools documentation.