Haskell library in a C project
Haskell FFI (Foreign Function Interface) lets us call C code from Haskell and Haskell code from C.
Usually, people are more interested in calling C libraries from Haskell than vice versa. This is how we have bindings to operating systems (like POSIX and Windows APIs), graphics libraries (like GTK and Qt), and highly optimized code (like BLAS or ICU).
But at NStack, we are going in the opposite direction. We want to build a Haskell library that is going to be used by the programming languages that we support, such as Python.
Several places on the internet explain how to write FFI export declarations and initialize the GHC RTS from C: the GHC manual, the Haskell wiki, and a tutorial by Alex Petrov among others.
But none of them address the elephant in the room: how do you turn your Haskell code into a library? You know, an .so file that you can pass to a linker or load with dlopen.
This may seem like a trivial matter, so let’s review what exactly makes it complicated.
When ghc compiles a Haskell package that contains a library, by default it will produce both a dynamic (.so) and a static (.a) library. The dynamic libraries are used to execute Template Haskell code, while the static libraries are used when linking the final executable (again, by default).
These libraries are scattered around the file system, e.g.:
~/.stack/programs/x86_64-linux/ghc-8.0.2/lib/ghc-8.0.2/deepseq-1.4.2.0/libHSdeepseq-1.4.2.0-ghc8.0.2.so
~/.stack/snapshots/x86_64-linux/lts-8.11/8.0.2/lib/x86_64-linux-ghc-8.0.2/libHSwai-3.2.1.1-9yigkTgtHNLHh2mXrnIXo-ghc8.0.2.so
.stack-work/dist/x86_64-linux/Cabal-1.24.2.0/build/libHStasty-0.11.2.1-4ogqtS3RxMEH205xhabqo-ghc8.0.2.so
A typical Haskell package will have hundreds of transitive dependencies, all of which you need to give to the linker. Furthermore, they need to come in the right (reverse topological) order.
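To make the ordering requirement concrete, here is a minimal sketch in Python. It assumes the usual static-linking convention that a library must appear on the command line before the libraries it depends on. The function name link_order and the tiny dependency graph are hypothetical and only serve as an illustration; a real graph would have hundreds of nodes.

```python
def link_order(deps, root):
    """Return every package reachable from root, each one placed
    before all of the packages it depends on."""
    visited = set()
    postorder = []

    def visit(pkg):
        if pkg in visited:
            return
        visited.add(pkg)
        for dep in deps.get(pkg, []):
            visit(dep)
        # A package is recorded only after all of its dependencies.
        postorder.append(pkg)

    visit(root)
    # Reversing the post-order puts dependents before dependencies.
    return postorder[::-1]


# Hypothetical, heavily simplified dependency graph (package -> direct deps).
deps = {
    "libnstack": ["wai", "deepseq"],
    "wai": ["bytestring", "base"],
    "deepseq": ["base"],
    "bytestring": ["deepseq", "base"],
    "base": [],
}

order = link_order(deps, "libnstack")
```

The depth-first traversal visits each package once, so shared dependencies such as base end up in the list exactly one time.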
Much of this complexity is managed by stack, cabal-install, or ghc --make, so you don’t notice it. Additionally, since by default Haskell dependencies are linked statically, you don’t have to worry about distributing them.
But these tools are designed to build executables, not shared objects, and I couldn’t make them do what I needed.
So I’ve written a Perl script that does the following:
- constructs the package dependency graph by repeatedly calling ghc-pkg field;
- collects all compilation and linking options and writes them to two files, incopts.txt (“inc” stands for “include”) and ldopts.txt;
- creates a lib directory and symlinks all the relevant .so files there so that they can be easily distributed.
Because the script uses the low-level ghc-pkg command, it can be used in any Haskell environment, be it stack, cabal-install, or even pure ghc. For instance, to use it with stack, I run
stack exec perl opts.pl libnstack
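The first step of the script, building the dependency graph by repeatedly querying ghc-pkg, might look roughly like the following. This is a Python sketch and not the actual Perl script. The helper names ghc_pkg_depends and dependency_graph are hypothetical, and the sketch assumes that every id passed to ghc-pkg is an installed unit id (hence --ipid) and that --simple-output prints the field value as whitespace-separated ids.

```python
import subprocess


def ghc_pkg_depends(pkg_id):
    """Ask ghc-pkg for the direct dependencies of one installed package.

    Assumption: pkg_id is an installed unit id, and the output is a
    whitespace-separated list of unit ids.
    """
    out = subprocess.run(
        ["ghc-pkg", "--simple-output", "--ipid", "field", pkg_id, "depends"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()


def dependency_graph(root, depends=ghc_pkg_depends):
    """Collect the transitive dependency graph of root.

    The lookup function is a parameter so that the traversal can be
    exercised without a GHC installation.
    """
    graph = {}
    todo = [root]
    while todo:
        pkg = todo.pop()
        if pkg in graph:
            continue  # already queried
        graph[pkg] = depends(pkg)
        todo.extend(graph[pkg])
    return graph
```

Each package is queried only once, which matters when the same dependency is reachable through hundreds of paths.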
Here are a few things I’ve learned along the way while writing the script, for your education or amusement:
- Each package has two library locations associated with it: library-dirs and dynamic-library-dirs. Except the rts package, which only has library-dirs containing both static and dynamic libraries.
- The libraries listed in hs-libraries are divided into “Haskell” and “C” libraries. Their library names start with HS and C, respectively. The division has nothing to do with the language used to produce a library, because the rts, which is written in C, has the name HSrts. The way I understand it is that “Haskell” libraries differ among the ghc versions, while “C” libraries usually don’t. The only example of a “C” library is currently Cffi.
- Why is libffi in hs-libraries and not in extra-libraries, together with libgmp, libm etc.? My guess is, to link it statically by default.
- The hs-libraries field of the package description gives the static library name. To get the dynamic library name of a “Haskell” library, append -ghc8.0.2 (substitute your compiler version).
- To get the dynamic library name of a “C” library (Cffi), strip that C. As the comment in compiler/main/Packages.hs explains:
  -- For non-Haskell libraries, we use the name "Cfoo". The .a
  -- file is libCfoo.a, and the .so is libfoo.so. That way the
  -- linker knows what we mean for the vanilla (-lCfoo) and dyn
  -- (-lfoo) ways.
- Make sure you are linking against the threaded RTS, or else your signals won’t be handled properly, and an innocent Ctrl-C will likely result in a segfault. To link against the threaded RTS, replace HSrts with HSrts_thr.
- HSrts is not the only library that has a _thr version; there’s also libCffi_thr.a, but it is identical to libCffi.a.
- We’ve paid quite a bit of attention to libCffi, but it’s probably not even used on the major platforms like x86 or x86_64. See rts/Adjustor.c and mk/config.mk.
- There’s a field for linker options, called ld-options, which is apparently only used by the rts package. It contains a bunch of flags like
  -Wl,-u,base_GHCziTopHandler_runNonIO_closure -Wl,-u,ghczmprim_GHCziTypes_Czh_con_info -Wl,-u,base_GHCziInt_I8zh_static_info ...
  I haven’t yet figured out what they do and why they are needed.
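The naming rules from the list above can be condensed into a small function. This is a Python sketch rather than the code of the script itself. The function name library_file is hypothetical, and the lib prefix and the .a and .so suffixes assume a Linux-like platform, as in the paths shown earlier.

```python
def library_file(name, ghc_version, dynamic, threaded=False):
    """Map an hs-libraries entry to the file name of the library.

    name        -- entry from hs-libraries, e.g. "HSrts" or "Cffi"
    ghc_version -- compiler version, e.g. "8.0.2"
    dynamic     -- True for the .so, False for the .a
    threaded    -- True to pick the threaded RTS instead of HSrts
    """
    if threaded and name == "HSrts":
        name = "HSrts_thr"
    if not dynamic:
        # hs-libraries already gives the static library name.
        return "lib" + name + ".a"
    if name.startswith("HS"):
        # "Haskell" libraries: append the compiler version.
        return "lib" + name + "-ghc" + ghc_version + ".so"
    if name.startswith("C"):
        # "C" libraries: strip the leading C.
        return "lib" + name[1:] + ".so"
    raise ValueError("unexpected hs-libraries entry: " + name)


deepseq_so = library_file("HSdeepseq-1.4.2.0", "8.0.2", dynamic=True)
ffi_so = library_file("Cffi", "8.0.2", dynamic=True)
ffi_a = library_file("Cffi", "8.0.2", dynamic=False)
rts_a = library_file("HSrts", "8.0.2", dynamic=False, threaded=True)
```

Only HSrts is swapped for its _thr variant here, since libCffi_thr.a is identical to libCffi.a anyway.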
Thanks to Ben Gamari for answering some of my questions on IRC and Edward George for helping to debug the script.
Note on Cabal’s foreign libraries
Mikhail Glushenkov points out that, starting from Cabal 2.0, you can use a foreign-library component in a cabal file instead of the usual library.
While this feature is definitely an improvement, it doesn’t seem to address the issues described above and solved by my Perl script:
- Link the resulting library using gcc or ld (not ghc).
- Distribute the Haskell foreign library together with its dependencies.