Reducing boilerplate in finally tagless style

Published on February 3, 2016

Introduction

Typed Tagless, a.k.a tagless-final or finally tagless, is an approach to embedding DSLs and modeling data in general, advocated by Oleg Kiselyov. Instead of defining a set of algebraic data types to describe data or terms, thus focusing on how data is constructed, the approach focuses on data consumption, defining a canonical eliminator for every constructor that we would otherwise define.

For instance, instead of defining lists as

data List a = Nil | Cons a (List a)

we would define a class

class List rep a where
  nil :: rep
  cons :: a -> rep -> rep

which of course corresponds to the Böhm-Berarducci (or Church) encoding of the above algebraic type.

Oleg has written extensively on the merits of this approach. In this article, I want to discuss a certain aspect of writing transformations in the finally tagless style.

The use case: language-integrated query

Oleg, together with Kenichi Suzuki and Yukiyoshi Kameyama, have published a paper Finally, Safely-Extensible and Efficient Language-Integrated Query. In this paper, they employ the finally tagless approach to embed, optimize, and interpret SQL queries in OCaml.

Here are some excerpts from their OCaml code:

(* Base Symantics *)
module type Symantics_base = sig
  ...
  (* lambda abstract *)
  val lam     : ('a repr -> 'b repr) -> ('a -> 'b) repr
  (* application *)
  val app     : ('a -> 'b) repr -> 'a repr -> 'b repr
  ...
end

(* Symantics with list operations *)
module type SymanticsL = sig
  include Symantics

  (* comprehension *)
  val foreach : (unit -> 'a list repr) ->
                ('a repr ->  'b list repr) -> 'b list repr
  (* condition *)
  val where   :  bool repr -> (unit -> 'a list repr) -> 'a list repr
  (* yield singleton list *)
  val yield   : 'a repr ->  'a list repr
  (* empty list *)
  val nil     :  unit -> 'a list repr
  (* not empty *)
  val exists  :  'a list repr ->  bool repr
  (* union list *)
  val (@%)    : 'a list repr -> 'a list repr -> 'a list repr

  (* the table constructor which take a table name and table contents *)
  val table : (string * 'a list) -> 'a list repr
end

(‘Symantics’ is not a typo; it’s a portmanteau of ‘syntax’ and ‘semantics’.)

Transformations

A SQL trasnformation (such as transforming a subquery to a join) is represented by an ML functor, i.e. a function mapping one SymanticsL to another, which interprets the term slightly differently than the original one. I say slightly, because normally a transformation touches only a few relevant methods. The others are transformed mechanically following the Reflection-Reification pattern (RR). Informally speaking, we leave the irrelevant methods unchanged, applying the minimal transformation that makes them typecheck.

The question is, how to avoid mentioning irrelevant methods when defining a transformation?

This question is not idle. The language-integrated query code contains about 40 methods and 13 transformations. Pause for a second and imagine the amount of boilerplate that would have to be written if we needed to define every single method for every transformation. As we will see below, ML modules make this a non-issue. In Haskell, however, it is an issue, exhibited in Oleg’s own Haskell example (although easy to miss for a class that only contains 3 methods).

In OCaml, the RR is defined as a transformation of the whole module:

module OL(X:Trans)
         (F:SymanticsL with type 'a repr = 'a X.from)  = struct
  include O(X)(F)
  open X

  let foreach src body =
    fwd (F.foreach (fun () -> bwd (src ())) (fun x -> bwd (body (fwd x))))
  let where test body  =
    fwd (F.where (bwd test) (fun () -> bwd (body ())))
  let yield e    = fmap F.yield e
  let nil ()     = fwd (F.nil ())
  let exists e   = fmap F.exists e
  let (@%) e1 e2 = fmap2 F.(@%) e1 e2

  let table (name,data) =
    fwd @@ F.table (name, data)
end

When they define a transformation, they first transform the module in this mechanical fashion, and then override the few relevant methods:

module AbsBeta_pass(F:SymanticsL) = struct
  module X0 = struct
    type 'a from = 'a F.repr
    type 'a term = Unknown : 'a from -> 'a term
             | Lam     : ('a term -> 'b term) -> ('a -> 'b) term
    let fwd x = Unknown x                              (* generic reflection *)
    let rec bwd : type a. a term -> a from = function  (* reification *)
      | Unknown e -> e
      | Lam f     -> F.lam (fun x -> bwd (f (fwd x)))
  end
  open X0
  module X = Trans_def(X0)
  open X
  (* optimization *)
  module IDelta = struct
    let lam f = Lam f
    let app e1 e2 =
      match e1 with
      | Lam f -> f e2
      | _ -> fmap2 F.app e1 e2
  end
end

(* Combine the concrete optimization with the default optimizer *)
module AbsBeta(F:SymanticsL) = struct
  module M = AbsBeta_pass(F)
  include OL(M.X)(F)        (* the default optimizer *)
  include M.IDelta          (* overriding `lam` and `app` *)
end

How can we do this in Haskell?

Explicit dictionaries

An explicit dictionariy (a data type containing methods as its fields) seems like a great fit for Symantics. The RR transformation would be a simple function mapping one record to another. To define a transformation, we would override the relevant methods via record update.

However, explicit dictionaries are not that well suited for the finally tagless style. In OCaml, you can include one module into another (notice include Symantics in the OCaml code above). This “unpacks” the contents of one module into another, so that when you open the second module, the contents of the first module is available, too.

This is important for the finally tagless style. One of its strength is extensibility, which is achieved through such inclusion. Consequently, deep inclusion chains are common. With Haskell’s data types, unpacking such chains manually at every use site will quickly become unwieldy.

Type classes

Type classes are better suited for inclusion. If we declare

class Symantics1 rep => Symantics2 rep where { ... }

and impose a Symantics2 rep constraint on a function definition, the methods of Symantics1 become available without any additional effort.

But then we don’t have good support for RR. Type class instances are not first class citizens; we can’t declare a function that transforms one instance into another. Nor can we create one instance from another by overriding a few methods… Or can we?

We can achieve our goal by using default method signatures.

We define the RR transformation simultaneously with the class itself:

class Symantics rep where
  lam :: (rep a -> rep b) -> rep (a -> b)
  default lam :: RR t rep => (t rep a -> t rep b) -> t rep (a -> b)
  lam f = fwd $ lam $ bwd . f . fwd

  app :: rep (a -> b) -> rep a -> rep b
  default app :: RR t rep => t rep (a -> b) -> t rep a -> t rep b
  app f x = fwd $ bwd f `app` bwd x

  foreach :: rep [a] -> (rep a -> rep [b]) -> rep [b]
  default foreach :: RR t rep => t rep [a] -> (t rep a -> t rep [b]) -> t rep [b]
  foreach a b = fwd $ foreach (bwd a) (bwd . b . fwd)

  ...

The implementation of RR is straightforward:

class RR t rep where
  fwd :: rep a -> t rep a
  bwd :: t rep a -> rep a

Now let’s define the AbsBeta pass in Haskell.

data AbsBeta rep a where
  Unknown :: rep a -> AbsBeta rep a
  Lam :: (AbsBeta rep a -> AbsBeta rep b) -> AbsBeta rep (a -> b)

instance Symantics rep => RR AbsBeta rep where
  fwd = Unknown
  bwd = \case
    Unknown t -> t
    Lam f -> lam (bwd . f . fwd)

instance Symantics rep => Symantics (AbsBeta rep) where
  lam = Lam
  app f x =
    case f of
      Unknown f' -> fwd $ app f' (bwd x)
      Lam b -> b x

All the methods not mentioned in the last instance get their default implementations based on RR, which is exactly what we wanted.

Associated types

Apart from methods, ML/OCaml modules can also define types. This is used in the Language-integrated query paper and code in the following way:

(* Base Symantics *)
module type Symantics_base = sig
  type 'a repr  (* representation type *)
  val observe : (unit -> 'a repr) -> 'a obs
  ...

In Haskell, we can replicate that with an associated type:

class SymanticsObs rep where
  type Obs rep :: * -> *

  observe :: rep a -> Obs rep a
  default observe :: RR t rep => t rep a -> Obs rep a
  observe = observe . bwd

The default definition for observe saves us from redefining it for derived representations, but what about Obs itself? We would like to write, in the spirit of default method signatures,

class SymanticsObs rep where
  type Obs rep :: * -> *
  type Obs (t rep) = rep

However, GHC would not let us to. Since recently, GHC does support default type declarations, but they need to be of the general form type Obs rep = ....

Nevertheless, we can create a type family that will extract the rep from t rep for us:

type family Peel (rep :: * -> *) :: (* -> *) where
  Peel (t rep) = rep

class SymanticsObs rep where
  type Obs rep :: * -> *
  type Obs rep = Obs (Peel rep)

  observe :: rep a -> Obs rep a
  default observe :: RR t rep => t rep a -> Obs rep a
  observe = observe . bwd

Now we can say

instance (Symantics rep, SymanticsObs rep) => SymanticsObs (AbsBeta rep)

without having to define either type Obs or observe explicitly.

Conclusion

Extensions such as default method signatures, default associated types, and type families can significantly reduce the boilerplate when defining transformations in the finally tagless style.

Update. Although I missed it on the first reading of the paper, /u/rpglover64 on reddit points out that the authors themselves acknowledge the boilerplate problem which this article addresses:

Haskell typeclasses made the encoding lightweight compared to OCaml modules. On the other hand, in OCaml we relied on the include mechanism to program optimizations by reusing the code for the identity transformation and overriding a couple of definitions. Haskell does not support that sort of code reuse among type classes. Therefore, programming tagless-final transformation in Haskell has quite a bit of boilerplate.