Generalizing generic fold
March 11, 2013
Suppose we’ve created a new container data structure in Haskell, such as
Data.Sequence.Seq. We now want to write a
Data instance for it, to allow generic queries and transforms inside our container.
gfoldl definition for lists:
gfoldl f z  = z  gfoldl f z (x:xs) = z (:) `f` x `f` xs
We can define
gfoldl similarly for sequences (in fact, this is the precise definition from
gfoldl f z s = case viewl s of EmptyL -> z empty x :< xs -> z (<|) `f` x `f` xs
There are, however, downsides to this definition:
Generic map over our container would be less efficient than hand-written map, even if we ignore the costs associated with run-time type casts.
Indeed, in the generic definition we do not take advantage of the fact that we already have a proper container structure for our elements. Instead, we construct the structure from scratch, adding elements one by one. This overhead may be small for
Seq, whose cons is O(1). But in case of an array-based structure, such as
Vector, we’re out of luck.
If we only care about the closed recursion combinators, such as
everywhere, we wouldn’t know the difference. But when open recursion is needed (e.g.
gmapT), the semantics is not the one we’d expect. The intuition is that
gmapTmaps over the immediate «subterms» of an expression. One might expect that the subterms of
Seq aare all of its elements (of type
a). But no, we’re forced into this recursive list mindset, where the immediate subterms of a sequence are its head and tail.
(To be fair, it does make sense if we imagine data structures really being inductively-defined terms. But it is a very limited worldview.)
Can we define
gfoldl in the uniform fashion, folding over all the elements at once? Try this yourself to understand better why it’s not easy.
I haven’t actually done it, but I think it’s achievable using some amount of type class hackery. Anyway, having to come up with such hacks defeats the point of an easy-to-use generic programming library.
In essence, the limitation of
gfoldl is that it forces us to know the number of elements in advance. We cannot (easily) construct the result of the fold by recursive computation.
In order to resolve the problem stated above, we need to restrict the type of
Recall its type:
gfoldl :: (forall d b. Data d => c (d -> b) -> d -> c b) -> (forall g. g -> c g) -> a -> c a
What might the first functional argument of
gfoldl do? It probably transforms
d to something like
c d by using its
Data instance, and then combines
c (d -> b) with
c d to obtain
c b. This is not a sure thing, of course: the caller of
gfoldl knows what
c exactly is, so it might combine them in unexpected ways. Let us proceed with this assumption, however; as I said above, we need to restrict the caller in order to give more power to the callee.
So, if our guess is correct, we can separate the two activities into two different functions:
gfoldl :: (forall d. Data d => d -> c d) -> (forall d b. c (d -> b) -> c d -> c b) -> (forall g. g -> c g) -> a -> c a
The whole thing now suddenly becomes very familiar:
c is an applicative functor! The second and third arguments are
pure, respectively. And the applicative laws had better hold — there are now many ways to express the same thing, and we don’t expect there to be a difference.
Applicative class, we can rewrite the type as
gtraverse :: Applicative c => (forall d . GTraversable d => d -> c d) -> a -> c a
Note that I took this chance to rename our restricted
GTraversable. The reason is that it is a generic version of
Data.Traversable, whose type is
traverse :: Applicative c => (a -> c b) -> t a -> c (t b)
Read the type of
gtraverse as «traverse this structure, applying the supplied function to the immediate subterms; then use the
Applicative instance to reconstruct the original structure».
We can now write the uniform instance for lists (and any other algebraic data type) without a hassle:
gtraverse f  = pure  gtraverse f (x:xs) = (:) <$> f x <*> gtraverse f xs
But how does it restrict the caller of
gfoldl? Not much. All the usual combinators, such as
gmapM etc. are expressible with
gtraverse. I’d be curious to see an application of
gfoldl for which
gtraverse is too restrictive2.