MonadFix is an odd beast; many Haskell programmers will never use it in their careers. It is indeed very rarely that one needs MonadFix; and for that reason, non-contrived cases where MonadFix is needed are quite interesting to consider.
In this article, I’ll introduce MonadFix and show how it can be handy for compiling the Kleene closure (also known as star or repetition) of regular expressions.
What is MonadFix?
If you hear about MonadFix for the first time, you might think that it is needed to define recursive monadic actions, just like ordinary
fix is used to define recursive functions. That would be a mistake. In fact,
fix is just as applicable to monadic actions as it is to functions:
So, what is
mfix for? First, recall that in Haskell, one can create recursive definitions not just for functions (which makes sense in other, non-lazy languages) or monadic actions, but for ordinary data structures as well. This is known as cyclic (or circular, or corecursive) definitions; and the technique itself is sometimes referred to as tying the knot.
The classic example of a cyclic definition is the (lazy, infinite) list of Fibonacci numbers:
Cyclic definitions are themselves rare in day-to-day Haskell programming; but occasionally, the right hand side will be not a pure value, but a monadic computation that needs to be run in order to obtain the value.
Consider this (contrived) example, where we start the sequence with an arbitrary number entered by the user:
This doesn’t typecheck because
fibIO is not a list; it’s an IO action that produces a list.
But if we try to run the computation, it doesn’t make much sense either:
This version of
fibIO will ask you to enter the start number ad infinitum and never get to evaluating anything.
Of course, the simplest thing to do would be to move IO out of the recursive equation; that’s why I said the example was contrived. But MonadFix gives another solution:
Or, using the do-rec syntax:
Compiling regular expressions
As promised, I am going to show you an example usage of MonadFix that solved a problem other than “how could I use MonadFix?”. This came up in my work on regex-applicative.
For a simplified presentation, let’s consider this type of regular expressions:
Our goal is to compile a regular expression into a corresponding NFA. The states will be represented by integer numbers. State 0 corresponds to successful completion; and each
Sym inside a regex will have a unique positive state in which we are expecting the corresponding character.
The NFA will be represented by a map
where each state is mapped to the characters expected at that state and the list of states where we go in case we get the expected character.
To compile a regular expression, we’ll take as an argument the list of states to proceed to when the regular expression as a whole succeeds (otherwise we’d have to compile each subexpression separately and then glue NFAs together). This is essentially the continuation-passing style; only instead of functions, our continuations are NFA states.
During the compilation, we’ll use a stack of two State monads: one to assign sequential state numbers to
Syms; the other to keep track of the currently constructred NFA.
-- Returns the list of start states and the transition table compile :: RE -> ([NFAState], NFA) compile re = runState (evalStateT (go re ) 0) IntMap.empty -- go accepts exit states, returns entry states go :: RE -> [NFAState] -> StateT NFAState (State NFA) [NFAState] go re exitStates = case re of Sym c -> do !freshState <- gets (+1); put freshState lift $ modify' (IntMap.insert freshState (c, exitStates)) return [freshState] Alt r1 r2 -> (++) <$> go r1 exitStates <*> go r2 exitStates Seq r1 r2 -> go r1 =<< go r2 exitStates
This was easy so far: alternatives share their exit states and their entry states are combined; and consequtive subexpressions are chained. But how do we compile
Rep? The exit states of the repeated subexpression should become its own entry states; but we don’t know the entry states until we compile it!
And this is precisely where MonadFix (or recursive do) comes in:
Why does this circular definition work? If we unwrap the
State types, we’ll see that the
go function actually computes a triple of three non-strict fields:
- The last used state number
- The list of entry states
- The NFA map
The elements of the triple may depend on each other as long as there are no actual loops during evaluation. One can check that the fields can be indeed evaluated linearly in the order in which they are listed above:
- The used state numbers at each step depend only on the regular expression itself, so it can be computed wihtout knowing the other two fields.
- The list of entry states relies only on the state number information; it doesn’t need to know anything about the NFA transitions.
- The NFA table needs to know the entry and exit states; but that is fine, we can go ahead and compute that information without creating any reverse data dependencies.
An ASM Monad – a similar example from a different domain.
Oliver Charles’s 24 Days of GHC Extensions: Recursive Do.
Levent Erkok’s thesis which contains all you need to know about MonadFix, including several other examples.
Todd Wilson points out that Douglas McIlroy describes a similar regular expression compilation technique in his 2004 JFP Functional Pearl Enumerating the strings of regular languages. Like this article, Douglas’s paper uses a circular definition when compiling the Kleene closure. But the circular definition is not monadic there: instead of using the State monad, Douglas passes the state around by hand.