Roman Cheplyaka

Open your name resolution

March 4, 2013

What is name resolution?

Name resolution is a program analysis which for every name decides what that name refers to.

Consider the following Haskell program as an example:

n :: Integer
n = let k = 10 in k * k

There are three names in this program: n, k and Integer. The result of name resolution in this case would be:

  • n refers to the global value defined in the program itself, on lines 1-2;
  • Integer refers to the type (implicitly) imported from Prelude;
  • k on line 2 is a local value defined on the same line.

This article is concerned with the following question: how should the API of a name resolution library be structured?

Annotations

The standard approach is a function that transforms a simple AST to the annotated AST, where every name is annotated with its binding information (just like we did above).

(Note that a simple map from names to binding information won’t do: the same name can refer to different things in different parts of the program.)

This is enough for most applications of name resolution, such as type checking, compilation or static analysis.

Significance of lexical environment

However, simple annotations are not enough for source-to-source transformations, like the ones performed by HaRe or HasFix.

Such transformations typically require access to the whole symbol table to avoid variable capture. As an example, imagine that in the new version of some module Mod foo has been renamed to bar, and we want to adapt our own code to use this new version. And our code looks like this:

baz x = let bar = x + 1 in foo bar

Name resolution tells us that foo on that line indeed comes from Mod and should be renamed. But if we blindly apply the renaming, we’ll get

baz x = let bar = x + 1 in bar bar

where bar no longer refers to the global value imported from Mod — it has been captured by let!

The solution is simple: rename foo to qualified name Mod.bar whenever simple bar is already defined in the enclosing lexical scope. It requires us, however, to know the lexical environment at every point.

Annotating with symbol tables

Why not store the whole symbol table at every point in the AST? This could work. In a lazy language this is not even as bad, time- and space-wise, as it might seem.

It is cumbersome, however. Every time we replace a subtree, we need manually to re-run name resolution on the new subtree (or even on the whole tree, if there’s any possibility that other annotations may become invalid as a result of the replacement).

Open recursion

In order to maintain the symbol table during the traversal, we can try to keep track of the local bindings ourselves, only to discover that it’s not that trivial, and that we’re replicating a large part of the resolution code.

The core of the problem is that the name resolution algorithm is one big-step tree traversal, and we can only observe its final result (or a trace of its execution in the form of annotations).

If we could split it up into small-step, one layer at a time, resolution functions, we could use them in our own traversals and interleave them with transformations. Another name for this approach is open recursion, hence the article’s title.

Single-sorted AST

If our AST is a single recursive data type (parameterised by the annotation type)

type VarName = String

data Exp a
  = Var a VarName
  | App a (Exp a) (Exp a)
  | Lambda a VarName (Exp a)

type SymbolTable a = Map.Map VarName a

then such a small-step resolution function would have type

resolve
  :: (SymbolTable a -> Exp -> Exp)
  ->  SymbolTable a -> Exp -> Exp 

It takes an expression, the symbol table (or environment) that corresponds to the enclosing lexical context of that expression, and a transformation function. It then applies the transformation to all immediate children of the expression.

Note that the transformation function is in turn parameterised by the symbol table. It is the responsibility of resolve to update the symbol table accordingly before using it to transform the children.

An implementation of resolve might look like this

resolve f tbl e =
  case e of
    Var l n -> Var l n
    App l e1 e2 -> App l (f tbl e1) (f tbl e2)
    Lambda l n body -> Lambda l n (f (Map.insert n l tbl) body)

Now that we have resolve, it is easy to tie the knot and write the big-step recursive traversal:

resolveRec :: Exp a -> Exp a
resolveRec = fix resolve Map.empty

resolveRec doesn’t do anything interesting — it’s an identity function. But we can compose resolve with an interesting transformation before “closing” the recursion:

annotate
  :: (SymbolTable a -> Exp a -> Exp a)
  ->  SymbolTable a -> Exp a -> Exp a
annotate f tbl e =
  case e of
    Var _ n -> f tbl $ Var (tbl Map.! n) n
    _ -> f tbl e

annotateRec :: Exp a -> Exp a
annotateRec = fix (resolve . annotate) Map.empty

Let’s try it out:

*Main> annotateRec $ Lambda 1 "x" $ App 2 (Var 3 "x") (Var 4 "x")
Lambda 1 "x" (App 2 (Var 1 "x") (Var 1 "x"))

Hence, the annotation interface to a name resolution library is easy to implement given the open recursion interface. But the open recursion interface is much more powerful, allowing the transformation function to depend on the lexical environment in non-trivial ways.

Multi-sorted AST

What if our AST is represented by a multitude of mutually recursive data types, just like in haskell-src-exts? Generic programming may help here — specifically, the “Scrap your boilerplate” approach.

A stock library, such as syb-with-class, might go a long way, but let’s roll out our own. It will be simpler this way, but also more flexible (for example, it’s possible to modify this presentation to allow the traversal to change the annotation type).

How does the type of resolve change? It now has to work for any AST node type, rather than just for Exp, i.e. be polymorphic. Furthermore, the functional argument to resolve, which used to have type SymbolTable a -> Exp -> Exp, also has to be polymorphic, because the types of node’s children may be different (and may differ from the node’s type itself).

To encode this polymorphism, one can use either a type class or a GADT sum type (see this StackOverflow entry for comparison of these approaches).

Although we are dealing with a finite, closed universe (consisting of haskell-src-exts node types) in this case, I prefer the class-based approach because it’s more practical — it doesn’t force us to define a giant GADT.

Thus, resolve now becomes a class method:

class Typeable a => Resolvable a where
  resolve
    :: (forall b . Resolvable b => SymbolTable -> b -> b)
    ->                             SymbolTable -> a -> a

The Typeable superclass constraint is needed here so that we can write type-specific cases (such as case for Var in the annotate function).

Otherwise, implementing and using this approach for a multi-sorted AST is quite similar to the single-sorted case.

Conclusions

The article argues that the open recursion approach to name resolution offers more power to the user and shows how to implement it.

The presentation here is simplified for the sake of clarity. A full implementation would allow queries (folds), effectful traversals and traversal that can change the annotation type.

For Haskell (and haskell-src-exts AST), I’ve just started implementing this idea in haskell-names.