Roman Cheplyaka

Haskell library in a C project

Published on July 26, 2017

Haskell FFI (Foreign Function Interface) lets us call C code from Haskell and Haskell code from C.

Usually, people are more interested in calling C libraries from Haskell than vice versa. This is how we have bindings to operating systems (like POSIX and Windows APIs), graphics libraries (like GTK and Qt), and highly optimized code (like BLAS or ICU).

But at NStack, we are going in the opposite direction. We want to build a Haskell library that is going to be used by the programming languages that we support, such as Python.

Several places on the internet explain how to write FFI export declarations and initialize the GHC RTS from C: the GHC manual, the Haskell wiki, and a tutorial by Alex Petrov among others.

But none of them address the elephant in the room: how do you turn your Haskell code into a library? You know, an .so file that you can pass to a linker or load with dlopen.

This may seem like a trivial matter, so let’s review what exactly makes it complicated.

When ghc compiles a Haskell package that contains a library, by default it produces both a dynamic (.so) and a static (.a) library. The dynamic libraries are used to execute Template Haskell code, while the static libraries are used when linking the final executable (again, by default).

These libraries are scattered around the file system.

A typical Haskell package will have hundreds of transitive dependencies, all of which you need to give to the linker. Furthermore, they need to come in the right (reverse topological) order.
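To make the ordering requirement concrete, here is a minimal Python sketch (not the script described below) that orders a tiny, made-up dependency graph so that every package comes before the packages it depends on. That is the order a traditional single-pass linker needs for static archives. The `link_order` helper and the simplified graph are assumptions for illustration, and the graph is assumed to be acyclic.

```python
from typing import Dict, List


def link_order(deps: Dict[str, List[str]], root: str) -> List[str]:
    """Return the packages reachable from root, dependents before dependencies."""
    visited = set()
    postorder: List[str] = []

    def visit(pkg: str) -> None:
        if pkg in visited:
            return
        visited.add(pkg)
        for dep in deps.get(pkg, []):
            visit(dep)
        # A package is appended only after all of its dependencies.
        postorder.append(pkg)

    visit(root)
    # Post-order lists dependencies first; reverse it for the linker.
    return postorder[::-1]


# Hypothetical, heavily simplified graph; real graphs have hundreds of nodes.
example_deps = {
    "libnstack": ["text", "base"],
    "text": ["base"],
    "base": ["ghc-prim", "rts"],
    "ghc-prim": ["rts"],
    "rts": [],
}

order = link_order(example_deps, "libnstack")
print(" ".join(order))
```

The depth-first traversal visits each package once, so the cost stays linear in the size of the graph even with hundreds of transitive dependencies.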

Much of this complexity is managed by stack, cabal-install, or ghc --make, so you don’t notice it. Additionally, since by default Haskell dependencies are linked statically, you don’t have to worry about distributing them.

But these tools are designed to build executables, not shared objects, and I couldn’t make them do what I needed.

So I’ve written a Perl script that does the following:

  1. constructs the package dependency graph by repeatedly calling ghc-pkg field;
  2. collects all compilation and linking options and writes them to two files, incopts.txt (“inc” stands for “include”) and ldopts.txt;
  3. creates a lib directory and symlinks all the relevant .so files there so that they can be easily distributed.
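As a rough illustration of step 2, the following Python sketch turns already-collected package descriptions into linker flags. It is not the Perl script itself: the `linker_flags` helper, the directory paths, and the package contents are made up, and only the field names (library-dirs, hs-libraries, extra-libraries, ld-options) come from the ghc-pkg package description. The packages are assumed to be already sorted in link order.

```python
from typing import Dict, List


def linker_flags(packages: List[Dict[str, List[str]]]) -> List[str]:
    """Turn package descriptions into -L/-l flags, keeping the given order."""
    flags: List[str] = []
    seen_dirs = set()
    for pkg in packages:
        for lib_dir in pkg.get("library-dirs", []):
            # Emit each search directory only once.
            if lib_dir not in seen_dirs:
                seen_dirs.add(lib_dir)
                flags.append("-L" + lib_dir)
        for lib in pkg.get("hs-libraries", []) + pkg.get("extra-libraries", []):
            flags.append("-l" + lib)
        # ld-options are passed through to the linker unchanged.
        flags.extend(pkg.get("ld-options", []))
    return flags


# Hypothetical package descriptions with made-up paths and names.
example = [
    {
        "library-dirs": ["/example/lib/libnstack"],
        "hs-libraries": ["HSlibnstack-0.1"],
    },
    {
        "library-dirs": ["/example/lib/rts"],
        "hs-libraries": ["HSrts", "Cffi"],
        "extra-libraries": ["m", "dl"],
        "ld-options": ["-Wl,-u,base_GHCziTopHandler_runNonIO_closure"],
    },
]

ldopts = " ".join(linker_flags(example))
print(ldopts)
```

Writing the result to a file such as ldopts.txt is then a single write call; the sketch keeps it in a string so that it has no side effects.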

Because the script uses the low-level ghc-pkg command, it can be used in any Haskell environment — be it stack, cabal-install, or even pure ghc. For instance, to use it with stack, I run

stack exec perl opts.pl libnstack

Here are a few things I’ve learned along the way while writing the script, for your education or amusement:

  1. Each package has two library locations associated with it: library-dirs and dynamic-library-dirs. The exception is the rts package, which only has library-dirs, containing both static and dynamic libraries.
  2. The libraries listed in hs-libraries are divided into “Haskell” and “C” libraries, whose names start with HS and C, respectively. The division has nothing to do with the language a library is written in: the rts is written in C, yet its library is named HSrts. The way I understand it, “Haskell” libraries differ among ghc versions, while “C” libraries usually don’t. Currently the only example of a “C” library is Cffi.

    Why is libffi in hs-libraries and not in extra-libraries, together with libgmp, libm etc.? My guess is, to link it statically by default.
  3. The hs-libraries field of the package description gives the static library name. To get the dynamic library name of a “Haskell” library, append -ghc8.0.2 (substitute your compiler version).

    To get the dynamic library name of a “C” library (Cffi), strip that C. As the comment in compiler/main/Packages.hs explains:

     -- For non-Haskell libraries, we use the name "Cfoo". The .a
     -- file is libCfoo.a, and the .so is libfoo.so. That way the
     -- linker knows what we mean for the vanilla (-lCfoo) and dyn
     -- (-lfoo) ways.
  4. Make sure you are linking against the threaded RTS, or else your signals won’t be handled properly, and an innocent Ctrl-C will likely result in a segfault. To link against the threaded RTS, replace HSrts with HSrts_thr.

    HSrts is not the only library that has a _thr version; there’s also libCffi_thr.a, but it is identical to libCffi.a.
  5. We’ve paid quite a bit of attention to libCffi, but it’s probably not even used on the major platforms like x86 or x86_64. See rts/Adjustor.c and mk/config.mk.
  6. There’s a field for linker options, called ld-options, which is apparently only used by the rts package. It contains a bunch of flags like

    -Wl,-u,base_GHCziTopHandler_runNonIO_closure
    -Wl,-u,ghczmprim_GHCziTypes_Czh_con_info
    -Wl,-u,base_GHCziInt_I8zh_static_info
    ...

    I haven’t yet figured out what they do and why they are needed.
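The naming rules from items 3 and 4 can be summarized in a few lines of Python. This is only a sketch under the assumptions stated in the list: Linux-style .so file names, a “C” prefix that is simply stripped, and HSrts swapped for HSrts_thr when the threaded RTS is wanted. The `dynamic_lib_file` helper is a hypothetical name, not part of any tool.

```python
def dynamic_lib_file(hs_library: str, ghc_version: str, threaded: bool = True) -> str:
    """Map an hs-libraries entry to the file name of its shared object."""
    if hs_library.startswith("C"):
        # "C" library: libCfoo.a is the static file, libfoo.so the dynamic one.
        return "lib" + hs_library[1:] + ".so"
    if hs_library == "HSrts" and threaded:
        # Link against the threaded RTS so that signals are handled properly.
        hs_library = "HSrts_thr"
    # "Haskell" library: append the compiler version to the static name.
    return "lib" + hs_library + "-ghc" + ghc_version + ".so"


for name in ["HSbase-4.9.1.0", "HSrts", "Cffi"]:
    print(name, "->", dynamic_lib_file(name, "8.0.2"))
```

Keeping the compiler version as a parameter matches the advice in item 3 to substitute your own version for 8.0.2.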

Thanks to Ben Gamari for answering some of my questions on IRC and Edward George for helping to debug the script.