What is name resolution?
Name resolution is a program analysis which for every name decides what that name refers to.
Consider the following Haskell program as an example:
There are three names in this program:
Integer. The result of name resolution in this case would be:
nrefers to the global value defined in the program itself, on lines 1-2;
Integerrefers to the type (implicitly) imported from
kon line 2 is a local value defined on the same line.
This article is concerned with the following question: how should the API of a name resolution library be structured?
The standard approach is a function that transforms a simple AST to the annotated AST, where every name is annotated with its binding information (just like we did above).
(Note that a simple map from names to binding information won’t do: the same name can refer to different things in different parts of the program.)
This is enough for most applications of name resolution, such as type checking, compilation or static analysis.
Significance of lexical environment
However, simple annotations are not enough for source-to-source transformations, like the ones performed by HaRe or HasFix.
Such transformations typically require access to the whole symbol table to avoid variable capture. As an example, imagine that in the new version of some module
foo has been renamed to
bar, and we want to adapt our own code to use this new version. And our code looks like this:
Name resolution tells us that
foo on that line indeed comes from
Mod and should be renamed. But if we blindly apply the renaming, we’ll get
bar no longer refers to the global value imported from
Mod — it has been captured by
The solution is simple: rename
foo to qualified name
Mod.bar whenever simple
bar is already defined in the enclosing lexical scope. It requires us, however, to know the lexical environment at every point.
Annotating with symbol tables
Why not store the whole symbol table at every point in the AST? This could work. In a lazy language this is not even as bad, time- and space-wise, as it might seem.
It is cumbersome, however. Every time we replace a subtree, we need manually to re-run name resolution on the new subtree (or even on the whole tree, if there’s any possibility that other annotations may become invalid as a result of the replacement).
In order to maintain the symbol table during the traversal, we can try to keep track of the local bindings ourselves, only to discover that it’s not that trivial, and that we’re replicating a large part of the resolution code.
The core of the problem is that the name resolution algorithm is one big-step tree traversal, and we can only observe its final result (or a trace of its execution in the form of annotations).
If we could split it up into small-step, one layer at a time, resolution functions, we could use them in our own traversals and interleave them with transformations. Another name for this approach is open recursion, hence the article’s title.
If our AST is a single recursive data type (parameterised by the annotation type)
then such a small-step resolution function would have type
It takes an expression, the symbol table (or environment) that corresponds to the enclosing lexical context of that expression, and a transformation function. It then applies the transformation to all immediate children of the expression.
Note that the transformation function is in turn parameterised by the symbol table. It is the responsibility of
resolve to update the symbol table accordingly before using it to transform the children.
An implementation of
resolve might look like this
Now that we have
resolve, it is easy to tie the knot and write the big-step recursive traversal:
resolveRec doesn’t do anything interesting — it’s an identity function. But we can compose
resolve with an interesting transformation before “closing” the recursion:
Let’s try it out:
*Main> annotateRec $ Lambda 1 "x" $ App 2 (Var 3 "x") (Var 4 "x") Lambda 1 "x" (App 2 (Var 1 "x") (Var 1 "x"))
Hence, the annotation interface to a name resolution library is easy to implement given the open recursion interface. But the open recursion interface is much more powerful, allowing the transformation function to depend on the lexical environment in non-trivial ways.
What if our AST is represented by a multitude of mutually recursive data types, just like in haskell-src-exts? Generic programming may help here — specifically, the “Scrap your boilerplate” approach.
A stock library, such as syb-with-class, might go a long way, but let’s roll out our own. It will be simpler this way, but also more flexible (for example, it’s possible to modify this presentation to allow the traversal to change the annotation type).
How does the type of
resolve change? It now has to work for any AST node type, rather than just for
Exp, i.e. be polymorphic. Furthermore, the functional argument to
resolve, which used to have type
SymbolTable a -> Exp -> Exp, also has to be polymorphic, because the types of node’s children may be different (and may differ from the node’s type itself).
To encode this polymorphism, one can use either a type class or a GADT sum type (see this StackOverflow entry for comparison of these approaches).
Although we are dealing with a finite, closed universe (consisting of haskell-src-exts node types) in this case, I prefer the class-based approach because it’s more practical — it doesn’t force us to define a giant GADT.
resolve now becomes a class method:
Typeable superclass constraint is needed here so that we can write type-specific cases (such as
Var in the
Otherwise, implementing and using this approach for a multi-sorted AST is quite similar to the single-sorted case.
The article argues that the open recursion approach to name resolution offers more power to the user and shows how to implement it.
The presentation here is simplified for the sake of clarity. A full implementation would allow queries (folds), effectful traversals and traversal that can change the annotation type.
For Haskell (and haskell-src-exts AST), I’ve just started implementing this idea in haskell-names.