# Generalizing generic fold

Published on

## Uniform folding

Suppose we’ve created a new container data structure in Haskell, such as `Data.Sequence.Seq`

. We now want to write a `Data`

instance for it, to allow generic queries and transforms inside our container.

Let’s recall `gfoldl`

definition for lists:

We can define `gfoldl`

similarly for sequences (in fact, this is the precise definition from `Data.Sequence`

):

There are, however, downsides to this definition:

Generic map over our container would be less efficient than hand-written map, even if we ignore the costs associated with run-time type casts.

Indeed, in the generic definition we do not take advantage of the fact that we already have a proper container structure for our elements. Instead, we construct the structure from scratch, adding elements one by one. This overhead may be small for

`Seq`

, whose cons is O(1). But in case of an array-based structure, such as`Vector`

, we’re out of luck.If we only care about the

*closed recursion*combinators, such as`everywhere`

, we wouldn’t know the difference. But when*open recursion*is needed (e.g.`gmapT`

), the semantics is not the one we’d expect. The intuition is that`gmapT`

maps over the immediate «subterms» of an expression. One might expect that the subterms of`Seq a`

are all of its elements (of type`a`

). But no, we’re forced into this recursive list mindset, where the immediate subterms of a sequence are its head and tail.(To be fair, it does make sense if we imagine data structures really being inductively-defined terms. But it is a very limited worldview.)

Can we define `gfoldl`

in the uniform fashion, folding over all the elements at once? Try this yourself to understand better why it’s not easy.

I haven’t actually done it, but I think it’s achievable using some amount of type class hackery. Anyway, having to come up with such hacks defeats the point of an easy-to-use generic programming library.

In essence, the limitation of `gfoldl`

is that it forces us to know the number of elements in advance. We cannot (easily) construct the result of the fold by recursive computation.

## Generic traversals

In order to resolve the problem stated above, we need to *restrict* the type of `gfoldl`

^{1}.

Recall its type:

What might the first functional argument of `gfoldl`

do? It *probably* transforms `d`

to something like `c d`

by using its `Data`

instance, and then combines `c (d -> b)`

with `c d`

to obtain `c b`

. This is not a sure thing, of course: the caller of `gfoldl`

knows what `c`

exactly is, so it might combine them in unexpected ways. Let us proceed with this assumption, however; as I said above, we need to restrict the caller in order to give more power to the callee.

So, if our guess is correct, we can separate the two activities into two different functions:

```
gfoldl :: (forall d. Data d => d -> c d)
-> (forall d b. c (d -> b) -> c d -> c b)
-> (forall g. g -> c g)
-> a -> c a
```

The whole thing now suddenly becomes very familiar: `c`

is an applicative functor! The second and third arguments are `<*>`

and `pure`

, respectively. And the applicative laws had better hold — there are now many ways to express the same thing, and we don’t expect there to be a difference.

Using the `Applicative`

class, we can rewrite the type as

Note that I took this chance to rename our restricted `gfoldl`

to `gtraverse`

and `Data`

to `GTraversable`

. The reason is that it is a generic version of `traverse`

from `Data.Traversable`

, whose type is

Read the type of `gtraverse`

as «traverse this structure, applying the supplied function to the immediate subterms; then use the `Applicative`

instance to reconstruct the original structure».

We can now write the uniform instance for lists (and any other algebraic data type) without a hassle:

But how does it restrict the caller of `gfoldl`

? Not much. All the usual combinators, such as `gmapT`

, `gmapQ`

, `gmapM`

etc. are expressible with `gtraverse`

. I’d be curious to see an application of `gfoldl`

for which `gtraverse`

is too restrictive^{2}.

Yes, the article’s title is a lie. Thanks for reading this far. What we are really

*generalizing*, by allowing more instances, is the`Data`

class.↩See gtraverse vs gfoldl which shows that

`gtraverse`

and`gfoldl`

are in fact equivalent in power.↩