Roman Cheplyaka

Reasoning about space usage in Haskell

April 8, 2012

This article describes why it’s hard to reason about space usage of Haskell programs.

Operational semantics

The standard (“The Haskell Report”) does not specify an operational semantics for Haskell. Thus, in order to talk about space usage of Haskell programs, we need to fix a particular semantics. A good choice is the STG semantics as defined in the fast curry paper. First, it is formally defined; second, it has significant practical value, since it is used in GHC, the de-facto standard Haskell compiler.

Note, however, that the STG operational semantics is defined for STG programs, not Haskell programs (STG defines its own small functional language). It’s not always obvious what STG program corresponds to a given Haskell program because of numerous optimizations performed by the compiler, but GHC allows to see the resulting STG code using the -ddump-stg flag.

Operational properties are context-dependent

In eager languages, usually we can easily express time and space complexity, e.g. using O-notation.

In Haskell, operational properties are largely context-dependent, and there’s no existing framework to express them.


One source of context dependency is a direct consequence of lazy evaluation: the function result may be evaluated to different extents by different callers.

For example, it may seem that [1..n] requires O(n) memory, but when called from take m [1..n], it takes only O(min(m,n)) memory.

Garbage collection

Another source of context dependency is interaction between laziness and automatic memory management.

For example, the following two functions look equivalent in terms of space usage (although the second one is somewhat less time-efficient):

f1 list = foldl' (\(!a,!b) x -> (a + x, b * x)) (0, 1) list

f2 list = (foldl' (\a x -> a + x) 0 list, foldl' (\b x -> b * x) 1 list)

It would seem that these functions run in constant space. Of course, by “constant space” we really mean “constant additional space” – we do not count the space occupied by the list itself, because the function doesn’t create the list. Well, in a lazy language it does!

The difference becomes clear if we consider f1 [1..n] and f2 [1..n]: the former indeed runs in constant space, while the latter requires linear space.

The reason is that f1 can process the list incrementally. It demands the next value from the list, updates its accumulators and throws the value away.

In f2, the first fold evaluates the list, but it has to be kept in memory to be used for the second fold. As a result, by the time the first fold is evaluated, the whole list is evaluated and none of it is garbage collected.


It’s not enough to track how much space our data occupies. Another kind of heap inhabitants is thunks. When more thunks are created than necessary, we have a thunk leak.

By the way, how many thunks may be necessary? Are there examples where having more than O(1) thunks at some point of program execution is the best solution to the problem? I don’t know yet.

In any case, we need to be able to reason about the number of thunks that are created by a function.

To complicate things further, creation of thunks depends on how the function result is evaluated. For example, a function returns a (lazy) tuple. We force the first element of that tuple. During the evaluation, everything that is needed to evaluate that first element is evaluated immediately, but values needed only for the second element are stored in thunks.

If we evaluated the second element, the picture would be dual, and the number of thunks probably would be different.


Currently, if we want to reason about space usage of Haskell programs, we need:

  • study the source code of each function. A function documentation typically includes its type, denotational semantics and perhaps some O-figures, but that’s not enough, as we’ve seen;
  • for each function, understand how its result is evaluated by all callers.

Clearly, this breaks abstraction and modularity.

In practice, GHC’s strictness analyser is very smart, so most of the time we don’t notice any problems. Still, it’s very important to have a framework in which we could analyse and document functions’ operational behaviour in a modular and composable way.