Reducing boilerplate in finally tagless style
Published on
Introduction
Typed Tagless, a.k.a tagless-final or finally tagless, is an approach to embedding DSLs and modeling data in general, advocated by Oleg Kiselyov. Instead of defining a set of algebraic data types to describe data or terms, thus focusing on how data is constructed, the approach focuses on data consumption, defining a canonical eliminator for every constructor that we would otherwise define.
For instance, instead of defining lists as
data List a = Nil | Cons a (List a)
we would define a class
class List rep a where
nil :: rep
cons :: a -> rep -> rep
which of course corresponds to the Böhm-Berarducci (or Church) encoding of the above algebraic type.
Oleg has written extensively on the merits of this approach. In this article, I want to discuss a certain aspect of writing transformations in the finally tagless style.
The use case: language-integrated query
Oleg, together with Kenichi Suzuki and Yukiyoshi Kameyama, have published a paper Finally, Safely-Extensible and Efficient Language-Integrated Query. In this paper, they employ the finally tagless approach to embed, optimize, and interpret SQL queries in OCaml.
Here are some excerpts from their OCaml code:
(* Base Symantics *)
module type Symantics_base = sig
...(* lambda abstract *)
val lam : ('a repr -> 'b repr) -> ('a -> 'b) repr
(* application *)
val app : ('a -> 'b) repr -> 'a repr -> 'b repr
...end
(* Symantics with list operations *)
module type SymanticsL = sig
include Symantics
(* comprehension *)
val foreach : (unit -> 'a list repr) ->
list repr) -> 'b list repr
('a repr -> 'b (* condition *)
val where : bool repr -> (unit -> 'a list repr) -> 'a list repr
(* yield singleton list *)
val yield : 'a repr -> 'a list repr
(* empty list *)
val nil : unit -> 'a list repr
(* not empty *)
val exists : 'a list repr -> bool repr
(* union list *)
val (@%) : 'a list repr -> 'a list repr -> 'a list repr
(* the table constructor which take a table name and table contents *)
val table : (string * 'a list) -> 'a list repr
end
(‘Symantics’ is not a typo; it’s a portmanteau of ‘syntax’ and ‘semantics’.)
Transformations
A SQL trasnformation (such as transforming a subquery to a join) is
represented by an ML functor, i.e. a function mapping one
SymanticsL
to another, which interprets the term slightly
differently than the original one. I say slightly, because
normally a transformation touches only a few relevant methods. The
others are transformed mechanically following the Reflection-Reification
pattern (RR). Informally speaking, we leave the irrelevant methods
unchanged, applying the minimal transformation that makes them
typecheck.
The question is, how to avoid mentioning irrelevant methods when defining a transformation?
This question is not idle. The language-integrated query code contains about 40 methods and 13 transformations. Pause for a second and imagine the amount of boilerplate that would have to be written if we needed to define every single method for every transformation. As we will see below, ML modules make this a non-issue. In Haskell, however, it is an issue, exhibited in Oleg’s own Haskell example (although easy to miss for a class that only contains 3 methods).
In OCaml, the RR is defined as a transformation of the whole module:
module OL(X:Trans)
with type 'a repr = 'a X.from) = struct
(F:SymanticsL include O(X)(F)
open X
let foreach src body =
fun () -> bwd (src ())) (fun x -> bwd (body (fwd x))))
fwd (F.foreach (let where test body =
fun () -> bwd (body ())))
fwd (F.where (bwd test) (let yield e = fmap F.yield e
let nil () = fwd (F.nil ())
let exists e = fmap F.exists e
let (@%) e1 e2 = fmap2 F.(@%) e1 e2
let table (name,data) =
fwd @@ F.table (name, data)end
When they define a transformation, they first transform the module in this mechanical fashion, and then override the few relevant methods:
module AbsBeta_pass(F:SymanticsL) = struct
module X0 = struct
type 'a from = 'a F.repr
type 'a term = Unknown : 'a from -> 'a term
| Lam : ('a term -> 'b term) -> ('a -> 'b) termlet fwd x = Unknown x (* generic reflection *)
let rec bwd : type a. a term -> a from = function (* reification *)
| Unknown e -> efun x -> bwd (f (fwd x)))
| Lam f -> F.lam (end
open X0
module X = Trans_def(X0)
open X
(* optimization *)
module IDelta = struct
let lam f = Lam f
let app e1 e2 =
match e1 with
| Lam f -> f e2
| _ -> fmap2 F.app e1 e2end
end
(* Combine the concrete optimization with the default optimizer *)
module AbsBeta(F:SymanticsL) = struct
module M = AbsBeta_pass(F)
include OL(M.X)(F) (* the default optimizer *)
include M.IDelta (* overriding `lam` and `app` *)
end
How can we do this in Haskell?
Explicit dictionaries
An explicit dictionariy (a data type containing methods as its
fields) seems like a great fit for Symantics
. The RR
transformation would be a simple function mapping one record to another.
To define a transformation, we would override the relevant methods via
record update.
However, explicit dictionaries are not that well suited for the
finally tagless style. In OCaml, you can include one module into another
(notice include Symantics
in the OCaml code above). This
“unpacks” the contents of one module into another, so that when you open
the second module, the contents of the first module is available,
too.
This is important for the finally tagless style. One of its strength is extensibility, which is achieved through such inclusion. Consequently, deep inclusion chains are common. With Haskell’s data types, unpacking such chains manually at every use site will quickly become unwieldy.
Type classes
Type classes are better suited for inclusion. If we declare
class Symantics1 rep => Symantics2 rep where { ... }
and impose a Symantics2 rep
constraint on a function
definition, the methods of Symantics1
become available
without any additional effort.
But then we don’t have good support for RR. Type class instances are not first class citizens; we can’t declare a function that transforms one instance into another. Nor can we create one instance from another by overriding a few methods… Or can we?
We can achieve our goal by using default method signatures.
We define the RR transformation simultaneously with the class itself:
class Symantics rep where
lam :: (rep a -> rep b) -> rep (a -> b)
lam :: RR t rep => (t rep a -> t rep b) -> t rep (a -> b)
default= fwd $ lam $ bwd . f . fwd
lam f
app :: rep (a -> b) -> rep a -> rep b
app :: RR t rep => t rep (a -> b) -> t rep a -> t rep b
default= fwd $ bwd f `app` bwd x
app f x
foreach :: rep [a] -> (rep a -> rep [b]) -> rep [b]
foreach :: RR t rep => t rep [a] -> (t rep a -> t rep [b]) -> t rep [b]
default= fwd $ foreach (bwd a) (bwd . b . fwd)
foreach a b
...
The implementation of RR
is straightforward:
class RR t rep where
fwd :: rep a -> t rep a
bwd :: t rep a -> rep a
Now let’s define the AbsBeta pass in Haskell.
data AbsBeta rep a where
Unknown :: rep a -> AbsBeta rep a
Lam :: (AbsBeta rep a -> AbsBeta rep b) -> AbsBeta rep (a -> b)
instance Symantics rep => RR AbsBeta rep where
= Unknown
fwd = \case
bwd Unknown t -> t
Lam f -> lam (bwd . f . fwd)
instance Symantics rep => Symantics (AbsBeta rep) where
= Lam
lam =
app f x case f of
Unknown f' -> fwd $ app f' (bwd x)
Lam b -> b x
All the methods not mentioned in the last instance get their default implementations based on RR, which is exactly what we wanted.
Associated types
Apart from methods, ML/OCaml modules can also define types. This is used in the Language-integrated query paper and code in the following way:
* Base Symantics *)
(module type Symantics_base = sig
type 'a repr (* representation type *)
: (unit -> 'a repr) -> 'a obs
val observe ...
In Haskell, we can replicate that with an associated type:
class SymanticsObs rep where
type Obs rep :: * -> *
observe :: rep a -> Obs rep a
observe :: RR t rep => t rep a -> Obs rep a
default= observe . bwd observe
The default definition for observe
saves us from
redefining it for derived representations, but what about
Obs
itself? We would like to write, in the spirit of
default method signatures,
class SymanticsObs rep where
type Obs rep :: * -> *
type Obs (t rep) = rep
However, GHC would not let us to. Since recently, GHC does support
default type declarations, but they need to be of the general form
type Obs rep = ...
.
Nevertheless, we can create a type family that will extract the
rep
from t rep
for us:
type family Peel (rep :: * -> *) :: (* -> *) where
Peel (t rep) = rep
class SymanticsObs rep where
type Obs rep :: * -> *
type Obs rep = Obs (Peel rep)
observe :: rep a -> Obs rep a
observe :: RR t rep => t rep a -> Obs rep a
default= observe . bwd observe
Now we can say
instance (Symantics rep, SymanticsObs rep) => SymanticsObs (AbsBeta rep)
without having to define either type Obs
or
observe
explicitly.
Conclusion
Extensions such as default method signatures, default associated types, and type families can significantly reduce the boilerplate when defining transformations in the finally tagless style.
Update. Although I missed it on the first reading of the paper, /u/rpglover64 on reddit points out that the authors themselves acknowledge the boilerplate problem which this article addresses:
Haskell typeclasses made the encoding lightweight compared to OCaml modules. On the other hand, in OCaml we relied on the include mechanism to program optimizations by reusing the code for the identity transformation and overriding a couple of definitions. Haskell does not support that sort of code reuse among type classes. Therefore, programming tagless-final transformation in Haskell has quite a bit of boilerplate.