Open your name resolution
Published on
What is name resolution?
Name resolution is a program analysis which for every name decides what that name refers to.
Consider the following Haskell program as an example:
n :: Integer
= let k = 10 in k * k n
There are three names in this program: n
, k
and Integer
. The result of name resolution in this case
would be:
n
refers to the global value defined in the program itself, on lines 1-2;Integer
refers to the type (implicitly) imported fromPrelude
;k
on line 2 is a local value defined on the same line.
This article is concerned with the following question: how should the API of a name resolution library be structured?
Annotations
The standard approach is a function that transforms a simple AST to the annotated AST, where every name is annotated with its binding information (just like we did above).
(Note that a simple map from names to binding information won’t do: the same name can refer to different things in different parts of the program.)
This is enough for most applications of name resolution, such as type checking, compilation or static analysis.
Significance of lexical environment
However, simple annotations are not enough for source-to-source transformations, like the ones performed by HaRe or HasFix.
Such transformations typically require access to the whole symbol
table to avoid variable capture. As an example, imagine that in the new
version of some module Mod
foo
has been
renamed to bar
, and we want to adapt our own code to use
this new version. And our code looks like this:
= let bar = x + 1 in foo bar baz x
Name resolution tells us that foo
on that line indeed
comes from Mod
and should be renamed. But if we blindly
apply the renaming, we’ll get
= let bar = x + 1 in bar bar baz x
where bar
no longer refers to the global value imported
from Mod
— it has been captured by let
!
The solution is simple: rename foo
to qualified name
Mod.bar
whenever simple bar
is already defined
in the enclosing lexical scope. It requires us, however, to know the
lexical environment at every point.
Annotating with symbol tables
Why not store the whole symbol table at every point in the AST? This could work. In a lazy language this is not even as bad, time- and space-wise, as it might seem.
It is cumbersome, however. Every time we replace a subtree, we need manually to re-run name resolution on the new subtree (or even on the whole tree, if there’s any possibility that other annotations may become invalid as a result of the replacement).
Open recursion
In order to maintain the symbol table during the traversal, we can try to keep track of the local bindings ourselves, only to discover that it’s not that trivial, and that we’re replicating a large part of the resolution code.
The core of the problem is that the name resolution algorithm is one big-step tree traversal, and we can only observe its final result (or a trace of its execution in the form of annotations).
If we could split it up into small-step, one layer at a time, resolution functions, we could use them in our own traversals and interleave them with transformations. Another name for this approach is open recursion, hence the article’s title.
Single-sorted AST
If our AST is a single recursive data type (parameterised by the annotation type)
type VarName = String
data Exp a
= Var a VarName
| App a (Exp a) (Exp a)
| Lambda a VarName (Exp a)
type SymbolTable a = Map.Map VarName a
then such a small-step resolution function would have type
resolve :: (SymbolTable a -> Exp -> Exp)
-> SymbolTable a -> Exp -> Exp
It takes an expression, the symbol table (or environment) that corresponds to the enclosing lexical context of that expression, and a transformation function. It then applies the transformation to all immediate children of the expression.
Note that the transformation function is in turn parameterised by the
symbol table. It is the responsibility of resolve
to update
the symbol table accordingly before using it to transform the
children.
An implementation of resolve
might look like this
=
resolve f tbl e case e of
Var l n -> Var l n
App l e1 e2 -> App l (f tbl e1) (f tbl e2)
Lambda l n body -> Lambda l n (f (Map.insert n l tbl) body)
Now that we have resolve
, it is easy to tie the knot and
write the big-step recursive traversal:
resolveRec :: Exp a -> Exp a
= fix resolve Map.empty resolveRec
resolveRec
doesn’t do anything interesting — it’s an
identity function. But we can compose resolve
with an
interesting transformation before “closing” the recursion:
annotate :: (SymbolTable a -> Exp a -> Exp a)
-> SymbolTable a -> Exp a -> Exp a
=
annotate f tbl e case e of
Var _ n -> f tbl $ Var (tbl Map.! n) n
-> f tbl e
_
annotateRec :: Exp a -> Exp a
= fix (resolve . annotate) Map.empty annotateRec
Let’s try it out:
*Main> annotateRec $ Lambda 1 "x" $ App 2 (Var 3 "x") (Var 4 "x")
Lambda 1 "x" (App 2 (Var 1 "x") (Var 1 "x"))
Hence, the annotation interface to a name resolution library is easy to implement given the open recursion interface. But the open recursion interface is much more powerful, allowing the transformation function to depend on the lexical environment in non-trivial ways.
Multi-sorted AST
What if our AST is represented by a multitude of mutually recursive data types, just like in haskell-src-exts? Generic programming may help here — specifically, the “Scrap your boilerplate” approach.
A stock library, such as syb-with-class, might go a long way, but let’s roll out our own. It will be simpler this way, but also more flexible (for example, it’s possible to modify this presentation to allow the traversal to change the annotation type).
How does the type of resolve
change? It now has to work
for any AST node type, rather than just for Exp
, i.e. be
polymorphic. Furthermore, the functional argument to
resolve
, which used to have type SymbolTable a -> Exp -> Exp
,
also has to be polymorphic, because the types of node’s children may be
different (and may differ from the node’s type itself).
To encode this polymorphism, one can use either a type class or a GADT sum type (see this StackOverflow entry for comparison of these approaches).
Although we are dealing with a finite, closed universe (consisting of haskell-src-exts node types) in this case, I prefer the class-based approach because it’s more practical — it doesn’t force us to define a giant GADT.
Thus, resolve
now becomes a class method:
class Typeable a => Resolvable a where
resolve :: (forall b . Resolvable b => SymbolTable -> b -> b)
-> SymbolTable -> a -> a
The Typeable
superclass constraint is needed here so
that we can write type-specific cases (such as case
for Var
in the
annotate
function).
Otherwise, implementing and using this approach for a multi-sorted AST is quite similar to the single-sorted case.
Conclusions
The article argues that the open recursion approach to name resolution offers more power to the user and shows how to implement it.
The presentation here is simplified for the sake of clarity. A full implementation would allow queries (folds), effectful traversals and traversal that can change the annotation type.
For Haskell (and haskell-src-exts AST), I’ve just started implementing this idea in haskell-names.