Generalizing generic fold
Published on
Uniform folding
Suppose we’ve created a new container data structure in Haskell, such
as Data.Sequence.Seq
. We now want to write a
Data
instance for it, to allow generic queries and
transforms inside our container.
Let’s recall gfoldl
definition for lists:
= z []
gfoldl f z [] :xs) = z (:) `f` x `f` xs gfoldl f z (x
We can define gfoldl
similarly for sequences (in fact,
this is the precise definition from Data.Sequence
):
= case viewl s of
gfoldl f z s EmptyL -> z empty
:< xs -> z (<|) `f` x `f` xs x
There are, however, downsides to this definition:
Generic map over our container would be less efficient than hand-written map, even if we ignore the costs associated with run-time type casts.
Indeed, in the generic definition we do not take advantage of the fact that we already have a proper container structure for our elements. Instead, we construct the structure from scratch, adding elements one by one. This overhead may be small for
Seq
, whose cons is O(1). But in case of an array-based structure, such asVector
, we’re out of luck.If we only care about the closed recursion combinators, such as
everywhere
, we wouldn’t know the difference. But when open recursion is needed (e.g.gmapT
), the semantics is not the one we’d expect. The intuition is thatgmapT
maps over the immediate «subterms» of an expression. One might expect that the subterms ofSeq a
are all of its elements (of typea
). But no, we’re forced into this recursive list mindset, where the immediate subterms of a sequence are its head and tail.(To be fair, it does make sense if we imagine data structures really being inductively-defined terms. But it is a very limited worldview.)
Can we define gfoldl
in the uniform fashion, folding
over all the elements at once? Try this yourself to understand better
why it’s not easy.
I haven’t actually done it, but I think it’s achievable using some amount of type class hackery. Anyway, having to come up with such hacks defeats the point of an easy-to-use generic programming library.
In essence, the limitation of gfoldl
is that it forces
us to know the number of elements in advance. We cannot (easily)
construct the result of the fold by recursive computation.
Generic traversals
In order to resolve the problem stated above, we need to
restrict the type of gfoldl
1.
Recall its type:
gfoldl :: (forall d b. Data d => c (d -> b) -> d -> c b)
-> (forall g. g -> c g)
-> a -> c a
What might the first functional argument of gfoldl
do?
It probably transforms d
to something like
c d
by using its Data
instance, and then
combines c (d -> b)
with c d
to obtain
c b
. This is not a sure thing, of course: the caller of
gfoldl
knows what c
exactly is, so it might
combine them in unexpected ways. Let us proceed with this assumption,
however; as I said above, we need to restrict the caller in order to
give more power to the callee.
So, if our guess is correct, we can separate the two activities into two different functions:
gfoldl :: (forall d. Data d => d -> c d)
-> (forall d b. c (d -> b) -> c d -> c b)
-> (forall g. g -> c g)
-> a -> c a
The whole thing now suddenly becomes very familiar: c
is
an applicative functor! The second and third arguments are
<*>
and pure
, respectively. And the
applicative laws had better hold — there are now many ways to express
the same thing, and we don’t expect there to be a difference.
Using the Applicative
class, we can rewrite the type
as
gtraverse :: Applicative c => (forall d . GTraversable d => d -> c d) -> a -> c a
Note that I took this chance to rename our restricted
gfoldl
to gtraverse
and Data
to
GTraversable
. The reason is that it is a generic version of
traverse
from Data.Traversable
, whose type
is
traverse :: Applicative c => (a -> c b) -> t a -> c (t b)
Read the type of gtraverse
as «traverse this structure,
applying the supplied function to the immediate subterms; then use the
Applicative
instance to reconstruct the original
structure».
We can now write the uniform instance for lists (and any other algebraic data type) without a hassle:
= pure []
gtraverse f [] :xs) = (:) <$> f x <*> gtraverse f xs gtraverse f (x
But how does it restrict the caller of gfoldl
? Not much.
All the usual combinators, such as gmapT
,
gmapQ
, gmapM
etc. are expressible with
gtraverse
. I’d be curious to see an application of
gfoldl
for which gtraverse
is too
restrictive2.