Typed Tagless, a.k.a tagless-final or finally tagless, is an approach to embedding DSLs and modeling data in general, advocated by Oleg Kiselyov. Instead of defining a set of algebraic data types to describe data or terms, thus focusing on how data is constructed, the approach focuses on data consumption, defining a canonical eliminator for every constructor that we would otherwise define.
For instance, instead of defining lists as
we would define a class
which of course corresponds to the Böhm-Berarducci (or Church) encoding of the above algebraic type.
Oleg has written extensively on the merits of this approach. In this article, I want to discuss a certain aspect of writing transformations in the finally tagless style.
The use case: language-integrated query
Oleg, together with Kenichi Suzuki and Yukiyoshi Kameyama, have published a paper Finally, Safely-Extensible and Efficient Language-Integrated Query. In this paper, they employ the finally tagless approach to embed, optimize, and interpret SQL queries in OCaml.
Here are some excerpts from their OCaml code:
(* Base Symantics *) module type Symantics_base = sig ... (* lambda abstract *) val lam : ('a repr -> 'b repr) -> ('a -> 'b) repr (* application *) val app : ('a -> 'b) repr -> 'a repr -> 'b repr ... end (* Symantics with list operations *) module type SymanticsL = sig include Symantics (* comprehension *) val foreach : (unit -> 'a list repr) -> ('a repr -> 'b list repr) -> 'b list repr (* condition *) val where : bool repr -> (unit -> 'a list repr) -> 'a list repr (* yield singleton list *) val yield : 'a repr -> 'a list repr (* empty list *) val nil : unit -> 'a list repr (* not empty *) val exists : 'a list repr -> bool repr (* union list *) val (@%) : 'a list repr -> 'a list repr -> 'a list repr (* the table constructor which take a table name and table contents *) val table : (string * 'a list) -> 'a list repr end
(‘Symantics’ is not a typo; it’s a portmanteau of ‘syntax’ and ‘semantics’.)
A SQL trasnformation (such as transforming a subquery to a join) is represented by an ML functor, i.e. a function mapping one
SymanticsL to another, which interprets the term slightly differently than the original one. I say slightly, because normally a transformation touches only a few relevant methods. The others are transformed mechanically following the Reflection-Reification pattern (RR). Informally speaking, we leave the irrelevant methods unchanged, applying the minimal transformation that makes them typecheck.
The question is, how to avoid mentioning irrelevant methods when defining a transformation?
This question is not idle. The language-integrated query code contains about 40 methods and 13 transformations. Pause for a second and imagine the amount of boilerplate that would have to be written if we needed to define every single method for every transformation. As we will see below, ML modules make this a non-issue. In Haskell, however, it is an issue, exhibited in Oleg’s own Haskell example (although easy to miss for a class that only contains 3 methods).
In OCaml, the RR is defined as a transformation of the whole module:
module OL(X:Trans) (F:SymanticsL with type 'a repr = 'a X.from) = struct include O(X)(F) open X let foreach src body = fwd (F.foreach (fun () -> bwd (src ())) (fun x -> bwd (body (fwd x)))) let where test body = fwd (F.where (bwd test) (fun () -> bwd (body ()))) let yield e = fmap F.yield e let nil () = fwd (F.nil ()) let exists e = fmap F.exists e let (@%) e1 e2 = fmap2 F.(@%) e1 e2 let table (name,data) = fwd @@ F.table (name, data) end
When they define a transformation, they first transform the module in this mechanical fashion, and then override the few relevant methods:
module AbsBeta_pass(F:SymanticsL) = struct module X0 = struct type 'a from = 'a F.repr type 'a term = Unknown : 'a from -> 'a term | Lam : ('a term -> 'b term) -> ('a -> 'b) term let fwd x = Unknown x (* generic reflection *) let rec bwd : type a. a term -> a from = function (* reification *) | Unknown e -> e | Lam f -> F.lam (fun x -> bwd (f (fwd x))) end open X0 module X = Trans_def(X0) open X (* optimization *) module IDelta = struct let lam f = Lam f let app e1 e2 = match e1 with | Lam f -> f e2 | _ -> fmap2 F.app e1 e2 end end (* Combine the concrete optimization with the default optimizer *) module AbsBeta(F:SymanticsL) = struct module M = AbsBeta_pass(F) include OL(M.X)(F) (* the default optimizer *) include M.IDelta (* overriding `lam` and `app` *) end
How can we do this in Haskell?
An explicit dictionariy (a data type containing methods as its fields) seems like a great fit for
Symantics. The RR transformation would be a simple function mapping one record to another. To define a transformation, we would override the relevant methods via record update.
However, explicit dictionaries are not that well suited for the finally tagless style. In OCaml, you can include one module into another (notice
include Symantics in the OCaml code above). This “unpacks” the contents of one module into another, so that when you open the second module, the contents of the first module is available, too.
This is important for the finally tagless style. One of its strength is extensibility, which is achieved through such inclusion. Consequently, deep inclusion chains are common. With Haskell’s data types, unpacking such chains manually at every use site will quickly become unwieldy.
Type classes are better suited for inclusion. If we declare
and impose a
Symantics2 rep constraint on a function definition, the methods of
Symantics1 become available without any additional effort.
But then we don’t have good support for RR. Type class instances are not first class citizens; we can’t declare a function that transforms one instance into another. Nor can we create one instance from another by overriding a few methods… Or can we?
We can achieve our goal by using default method signatures.
We define the RR transformation simultaneously with the class itself:
class Symantics rep where lam :: (rep a -> rep b) -> rep (a -> b) default lam :: RR t rep => (t rep a -> t rep b) -> t rep (a -> b) lam f = fwd $ lam $ bwd . f . fwd app :: rep (a -> b) -> rep a -> rep b default app :: RR t rep => t rep (a -> b) -> t rep a -> t rep b app f x = fwd $ bwd f `app` bwd x foreach :: rep [a] -> (rep a -> rep [b]) -> rep [b] default foreach :: RR t rep => t rep [a] -> (t rep a -> t rep [b]) -> t rep [b] foreach a b = fwd $ foreach (bwd a) (bwd . b . fwd) ...
The implementation of
RR is straightforward:
Now let’s define the AbsBeta pass in Haskell.
data AbsBeta rep a where Unknown :: rep a -> AbsBeta rep a Lam :: (AbsBeta rep a -> AbsBeta rep b) -> AbsBeta rep (a -> b) instance Symantics rep => RR AbsBeta rep where fwd = Unknown bwd = \case Unknown t -> t Lam f -> lam (bwd . f . fwd) instance Symantics rep => Symantics (AbsBeta rep) where lam = Lam app f x = case f of Unknown f' -> fwd $ app f' (bwd x) Lam b -> b x
All the methods not mentioned in the last instance get their default implementations based on RR, which is exactly what we wanted.
Apart from methods, ML/OCaml modules can also define types. This is used in the Language-integrated query paper and code in the following way:
In Haskell, we can replicate that with an associated type:
The default definition for
observe saves us from redefining it for derived representations, but what about
Obs itself? We would like to write, in the spirit of default method signatures,
However, GHC would not let us to. Since recently, GHC does support default type declarations, but they need to be of the general form
type Obs rep = ....
Nevertheless, we can create a type family that will extract the
t rep for us:
Now we can say
without having to define either
type Obs or
Extensions such as default method signatures, default associated types, and type families can significantly reduce the boilerplate when defining transformations in the finally tagless style.
Update. Although I missed it on the first reading of the paper, /u/rpglover64 on reddit points out that the authors themselves acknowledge the boilerplate problem which this article addresses:
Haskell typeclasses made the encoding lightweight compared to OCaml modules. On the other hand, in OCaml we relied on the include mechanism to program optimizations by reusing the code for the identity transformation and overriding a couple of definitions. Haskell does not support that sort of code reuse among type classes. Therefore, programming tagless-final transformation in Haskell has quite a bit of boilerplate.