A case for static linking in scientific computing

When researchers run scientific software on high-performance clusters, they often experience problems with shared libraries, such as this one:

bcftools: /lib64/libz.so.1: version `ZLIB_1.2.5.2' not found

Or this one:

eagle: error while loading shared libraries: libhts.so.1: cannot open shared object file: No such file or directory

Popular advice points them in the direction of LD_LIBRARY_PATH, but a simple and robust solution—static linking—is often overlooked.

In this article, I explain the background behind static and dynamic linking, demonstrate the advantages of static linking, address some of the objections against static linking, and give instructions on how to prepare static binaries.

What is static and dynamic linking?

The word linking itself refers to the process of assembling a complete program from libraries of subprograms.

Static linking occurs as the last stage of the compilation process; the required libraries are embedded in the final binary file of your program.

Some (or even all) of the libraries may be left out of the binary at this stage. In that case, when we attempt to run the program, we need dynamic linking to find those libraries and make the missing subroutines accessible to the program. Libraries of this second kind are called dynamic, or shared, libraries. The files containing them are usually named libsomething.so on Linux and something.dll on Windows.
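
As a minimal illustration (assuming gcc on Linux and a trivial hello.c), the same source can be linked either way:

% gcc hello.c -o hello-dynamic          # dynamic linking is the default
% gcc -static hello.c -o hello-static   # libraries are embedded into the binary

The static binary is noticeably larger, but it no longer needs libc.so.6 or any other .so file at run time.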

The rules that an operating system follows when it searches for dynamic libraries are complex. And simply having a library in the right place is not enough; it needs to be the same version of the library that was used during the compilation, or at least a different version with the same ABI. (So no, you shouldn’t ln -s /usr/lib/libsomething.so.2 /usr/lib/libsomething.so.1 when a program doesn’t work, contrary to another popular piece of advice one can find on the Internet.)
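
You can inspect these requirements yourself with readelf. For example, using the bcftools binary from the first error message above:

% readelf -d ./bcftools | grep NEEDED   # which shared libraries are required
% readelf -V ./bcftools                 # which symbol versions (such as ZLIB_1.2.5.2) are expected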

Linux distributions, most of which dynamically link the software they distribute, manage this by engaging qualified package maintainers and by having tools and centralized infrastructure to build and distribute packages.

But the world of scientific software is not there yet. And if you fail to take care of your dynamic libraries, the result can vary from a program refusing to start (as we saw earlier), to a crash in the middle of a run, to a case of data corruption that is hard to detect and diagnose.

This is why I think that scientific software should be linked statically by default.

Advantages of static linking

Reproducibility. When a program is linked statically, it executes the same algorithm wherever it is run. A dynamic executable executes the code from the version of the dynamic library that happens to be installed on a particular computing node.

Note that a static executable doesn't contain metadata about the library versions used to build it. You should record that information when you compile the software, and ideally use a binary repository manager for your builds.
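
As a minimal sketch of such a record (zlib is just an example dependency; substitute your program's actual ones):

% gcc --version | head -n1 > build-info.txt
% pkg-config --modversion zlib >> build-info.txt   # one line per dependency
% git rev-parse HEAD >> build-info.txt             # if building from a git checkout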

But replacing dynamic linking with static linking by itself dramatically increases the probability that your program will run tomorrow in the same way as it runs today.

Ease of distribution. Suppose that you want your colleague to run the program you have compiled. If you link statically, you only need to distribute a single binary. If you link dynamically, you need to distribute the binary, all dynamic libraries that it depends on, and the instructions on how to make the binary find its libraries.
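
In the static case, the whole distribution procedure can be a single copy (the host name here is, of course, a placeholder):

% scp ./program colleague@cluster.example.org: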

Portability. Static linking ensures that you can compile the program on your Ubuntu laptop and run it on a Debian or CentOS cluster. With dynamic linking, you’d have to compile it directly on a cluster or in an identical environment.

That said, you still need to ensure that both systems use the same or compatible architectures (the majority of scientific computing happens on x86-64 anyway), the same OS (probably Linux) and not too different kernel versions (or libc versions if you link libc dynamically).

No problems with finding libraries. Since no dynamic libraries are needed, the OS cannot fail to find them. No more cannot open shared object file messages. No more LD_LIBRARY_PATH tricks.
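
For contrast, this is the kind of workaround a dynamic binary often ends up needing, and a static binary never does:

% LD_LIBRARY_PATH=$HOME/lib ./eagle   # fragile: breaks as soon as the contents of ~/lib change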

Isn’t static linking considered harmful?

Ulrich Drepper says it is:

There are still too many people out there who think (or even insist) that static linking has benefits. This has never been the case and never will be the case.

Ulrich certainly knows much more about this stuff than I ever hope to. But as you can tell from the above quote, he is sometimes a bit extreme in his judgment.

There is no shortage of knowledgeable people who disagree with him on this issue.

But more importantly, he looks at linking from a very different perspective. For many years, he was employed by Red Hat. He was one of those people who knew a lot about dealing with dynamic libraries and maintained a centralized repository of packages that worked well together in a controlled environment.

It is understandable that he would not care about any of the advantages I list above (though that is different from claiming that there never have been and never will be any benefits to static linking).

But what about the advantages of dynamic linking that Ulrich describes in his article?

Centralized bug/security fixes.

  1. Security issues matter less for scientific software because it is not exposed to the outside world.

  2. HPC cluster users don’t benefit from centralized bug fixes because they usually don’t have permission to install software system-wide. Every user of the same cluster or node would still be responsible for their own updates.

  3. The scale is very different. If you are Red Hat, re-linking hundreds or thousands of binaries every time there is an update in a library is a significant burden. If you are a researcher, you deal with maybe a dozen or two programs, and you may not need to update them often.

  4. Even when centralized updates are possible (e.g. if you can request libraries to be installed centrally and then link against them), scientists would not want them because they are directly at odds with reproducibility.

More efficient use of physical memory through sharing the code.

  1. In high-performance computing, the size of the libraries is usually negligible compared to the size of the data being processed.

  2. When the number of running processes is small, and they don’t have many common dependencies, there’s not much opportunity for sharing.

  3. On the other hand, sometimes multiple copies of the same executable are run in parallel. This happens with software that is not capable of multithreading or cannot exploit it efficiently. In this case, the OS can in fact share the code across the processes because it is exactly the same.

  4. When there’s little sharing of code between processes, static linking can sometimes be more memory-efficient. This is because static linking only embeds the object files (i.e. parts of a library) that are actually used by the application, whereas dynamic linking has to load the entire library into memory.

Security measures like load address randomization—see above.

Some features of glibc require dynamic linking. Ulrich, by the way, was one of the core developers of glibc—just in case you were wondering why he considers this a problem of static linking and not a problem of glibc.

Fortunately, most scientific software doesn’t perform character conversions or go to the network. It just crunches numbers. You don’t need dynamic linking for that.

Licensing considerations. I am not a lawyer, but as far as I can tell, this should concern you only if the software is closed-source (or distributed under a license incompatible with the GPL) and some of its dependencies are licensed under the LGPL. In that case, those dependencies must be linked dynamically, although the others can still be linked statically.

I am not sure why Ulrich writes “(L)GPL”, since, to my knowledge, GPL itself does not make a distinction between static and dynamic linking, but I am happy to be corrected.

Tools and hacks like ltrace, LD_PRELOAD, LD_PROFILE, LD_AUDIT don’t work. Oh well.

OK, how do I do this?

Unfortunately, most of the scientific software I come across is linked dynamically by default. Otherwise, I wouldn’t be writing this article.

Convincing the build system

Read the installation instructions. They can usually be found in a file named README or INSTALL, or on the project’s website. If they mention static linking, congratulations.

If not, also try looking inside the Makefile (or whatever build system the software uses). If there is a configure script, try ./configure --help. There could be a target for static linking that the author has not documented.
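
A quick way to scan for such an option (assuming the project ships a configure script):

% ./configure --help | grep -i static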

If the build system doesn’t support static linking out of the box, you will have to modify the linker flags. If a Makefile is well-written, this should be as simple as LDFLAGS=-static make or make LDFLAGS=-static (these are different; try them in this order).
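
The reason they are different: an assignment inside the Makefile takes precedence over an environment variable, while a variable given on the make command line takes precedence over the Makefile. A minimal sketch, with a hypothetical Makefile fragment:

# Makefile fragment (hypothetical): the project sets its own LDFLAGS
LDFLAGS = -O2

% LDFLAGS=-static make    # environment variable; the assignment above overrides it
% make LDFLAGS=-static    # command-line variable; overrides the assignment above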

If that doesn’t work, edit the Makefile. Locate the rule that does linking, i.e. produces the final executable.

It usually looks like this:

program: $(OBJS)
    gcc -o program $(OBJS)

Note that gcc could also be g++, or ld, or hidden behind a variable such as $(CC), $(CXX), or $(LD). The variable $(OBJS) could also be named differently, or be a literal list of .o, .c, or .cpp files. But program is usually exactly the name of the final program (or, again, a variable that expands to it).

Once you have located this rule, try adding a -static flag to the gcc command line:

program: $(OBJS)
    gcc -static -o program $(OBJS)

In many cases this will be enough to produce a statically linked binary.

Sometimes you need to get more creative. One tool, for instance, explicitly built itself as a shared library. This is incompatible with a (global) -static flag set as part of $(LDFLAGS). The way I solved this was to specify the targets explicitly, i.e. make prog1 prog2, so that make wouldn’t attempt to build the shared library and fail.

Dependencies

In order to statically link a program, you need to have its dependencies available as static libraries. These are files named libsomething.a.

If this is a library available in your distribution, install the package that contains its static version. On Debian and Ubuntu, the static library usually ships in the corresponding -dev package; on Fedora and CentOS, it often lives in a separate -static package.
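
For example, for the zlib dependency from the first error message (package names vary between distributions):

% sudo apt-get install zlib1g-dev   # Debian/Ubuntu: libz.a ships in the -dev package
% sudo dnf install zlib-static      # Fedora: static libraries have a separate package
% find /usr/lib* -name 'libz.a'     # verify that the static library is now present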

If this is a third-party library, you’ll need to get a static version of it or compile it from source, following the same instructions that you are reading right now.
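
For a library built with autotools and libtool, the conventional flags are --enable-static and --disable-shared; a sketch, assuming you install into a prefix under your home directory:

% ./configure --enable-static --disable-shared --prefix=$HOME/opt
% make && make install

You can then point your program’s build at it, e.g. with CPPFLAGS=-I$HOME/opt/include and LDFLAGS=-L$HOME/opt/lib.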

Verifying the result

How do you check that you got a static binary? Try running file and ldd on it.

For a dynamic binary, you’ll get something like this:

% ldd ./eagle 
    linux-vdso.so.1 (0x00007ffd47d87000)
    libhts.so.1 => not found
    libboost_program_options.so.1.49.0 => not found
    libboost_iostreams.so.1.49.0 => not found
    libz.so.1 => /lib64/libz.so.1 (0x00007fe77a445000)
    libopenblas.so.0 => /lib64/libopenblas.so.0 (0x00007fe778133000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fe777f17000)
    libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fe777b90000)
    libm.so.6 => /lib64/libm.so.6 (0x00007fe777886000)
    libgomp.so.1 => /lib64/libgomp.so.1 (0x00007fe777658000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fe777441000)
    libc.so.6 => /lib64/libc.so.6 (0x00007fe77707e000)
    libgfortran.so.3 => /lib64/libgfortran.so.3 (0x00007fe776d4d000)
    /lib64/ld-linux-x86-64.so.2 (0x000055f7b3885000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007fe776b49000)
    libquadmath.so.0 => /lib64/libquadmath.so.0 (0x00007fe776908000)

% file ./eagle
./eagle: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.26, BuildID[sha1]=af18461c835d6f0209754b78c639581c67ed1443, stripped

For a static binary, you’ll see this instead:

% ldd ./Minimac3
    not a dynamic executable

% file ./Minimac3
./Minimac3: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, for GNU/Linux 2.6.26, BuildID[sha1]=edf443bb3e695b3f421b0d162ca30bb0f422c2ad, stripped

By the way, if file says that your binary is “not stripped”, run the strip command on it:

% strip eagle

This will significantly reduce its on-disk size and make it faster to copy to the cluster.