Lambda Calculus Compiler: Part III: First-Order Functions
Table of contents for A Lambda Calculus compiler for LLVM
- A compiler for Lambda Calculus to LLVM, Part 1
- Lambda Calculus compiler, Part II: Wading in with arithmetic
- Lambda Calculus Compiler: Part III: First-Order Functions
Our last version of this project was a bit … underwhelming. The Lambda Calculus is Turing-equivalent, but it’s hard to imagine expressing much in the way of interesting computations with what we have right now. In fact, we can’t, since our calculator is missing the most important tool of the Lambda Calculus: functions.
In this section, we’ll look at generating first-order functions. It’s easier to understand how functions in LLVM work if we aren’t dealing with the machinery for closures. As always, this post is literate Haskell. You’ll need the parser from the first section to compile it.
Our finished product this time will compile our code into LLVM bytecode, then disassemble it and print the corresponding LLVM assembly. It wouldn’t be much of a stretch to run it as we did last time, but it also wouldn’t really explain much either. We’re still at a point where reading the generated assembly can convince us of correct results.
> module Main where
> import Parser
As before, we’re going to need a bunch of imports.
> import Control.Monad.State(forever)
> import Data.Int
> import Data.List
> import LLVM.Core
> import LLVM.Util.File(writeCodeGenModule)
> import System.IO
> import System.Cmd
Our environment from the calculator needs to be extended to keep track of functions. In theory we could look up and store the name of newly generated functions here, but it’s easier to store the functions directly.
> data Env = Env {
> vars :: [(String, Value Int32)]
> , funs :: [(String, Function (Int32 -> IO Int32))]
> }
It’s nice to be able to print the variables we already have, so we’ll
make Env an instance of Show to help with that.
> instance Show Env where
> show e = "vars: " ++ (m $ vars e) ++ ", " ++
> "funs: " ++ (m $ funs e) where
> m l = show $ names l
At several times we’ll need to separate the names from the values in our environment. In fact we’ve already needed to once.
> names = map fst
Here’s our familiar AST -> LLVM translation of arithmetic operations.
> getOpt Add = add
> getOpt Mul = mul
Detecting Closures
Since we’re restricting ourselves to first-order functions, we need to take some steps to detect (and fail on) higher-order functions. In some ways we can continue to be bad compiler writers and rely on the user to follow arcane and senseless rules. This is exactly the technique I will be using to prevent functions as arguments and functions as return values.
However, there’s one other feature we need to prevent, and that’s closures. Obviously we could expect the user to avoid those as well, but in this case the code we need to detect them now will be convenient later. Since this project is all about doing things that are convenient for us as the compiler writers (enjoy it now, this almost never happens in real life), we’ll go ahead and implement that check.
So, what does a closure look like in an abstract syntax tree? The answer is pretty simple: a closures references variables from its surrounding environment, so build up a list of all the name references in a piece of the AST and eliminate the ones that are declared in that piece. What is left is the variables that are being closed over.
Here’s an example of a perfectly valid first-order function, the identity function:
\x. x
There’s only one variable involved, x, and we can easily see that it
is introduced in the lambda terms, so the identity function has no
free variables.
In this example, the function bound to g is a closure:
let f = 5; g = \x. x + f in g 2
That’s easy to see when we pull it out of its context:
\x. x + f
We can still identify where x “comes from”, but now there’s nothing
to provide the definition of f. In this situation f is called a
free variable, and it’s the presence of these that differentiates
closures from more ordinary functions.
Now that we have a handle on what free variables are, on to detecting them. We’ll need a function that takes an expression and generates a list of the free variables in it.
> fv :: Exp -> [Var]
Perhaps the simplest way to structure this is to go through each constructor for our AST and decide how to find free variables for that constructor. Vars are simple, since they’re just a variable name with nothing to bind them to a value, they’re automatically free variables.
> fv (EVar v) = [v]
Constants have no variables, so there’s no free variables to be found here.
> fv (Con _) = []
Function applications have two places they could contain free
variables. The obvious one is the expression the function is being
applied to, but the slightly-less-obvious case inovlves the function
itself. Remember, we eventually will have functions that can return
functions, so something like (f 2) 2 will eventually be valid, and
leaves plenty of room to start closing over variables.
This isn’t tough to account for though, just have to look up the free
variables in both, then filter the list to eliminate
duplicates. Luckily, nub from Data.List has already been written
to take care of just this issue.
> fv (App f x) = nub (fv f ++ fv x)
Lambda terms are one of our two constructs that are capable of binding
a name. We need to make sure we don’t accidentally declare the
function argument as a free variable, since we know exactly what it’s
value is. The function (\\) scans a list on the left to remove any
items from the list on the right, so we can just look up the free
variables in our lambda terms expression then pluck out the one that
doesn’t belong.
> fv (Lam v e) = fv e \\ [v]
Arithmetic operations can close over variables in their left or right terms. Again, we should clean up the list so any given free variable is only present once.
> fv (EOp _ l r) = nub (fv l ++ fv r)
Let bindings are our other construct that can bind a name. The steps needed here are about the same as those for lambda expressions.
> fv (Let v eq xp) = nub $ fv eq ++ ([v] \\ fv xp)
Generating Functions
Now that we can identify free variables, we’re pretty much ready to
start compiling functions. This has to happen within the
CodeGenModule monad, which is fine now but will complicate our
closure code a bit in the next section. We’ll rewrite compileFun
from the calculator to support the kinds of functions we’re
defining. We’ll have to change the type signature to match the Int ->
Int functions our lambda calculus uses.
While creating a function, we’re inside of the CodeGenFunction
monad, and thus can compile its body using the code from our
calculator (remember, we’re trusting the users not to write any
“interesting” functions).
The only tricky part left is distinguishing closures from non-closures, but that’s pretty simple, just look up the free variables of the function. If there are any we have a closure and throw an error, otherwise we can compile it.
The only interesting aspect of compiling the body of our function is that we have to add the argument to our environment while compiling the body. Since the code for this looks kind of ugly, we’ll wrap it in a slightly prettier function.
> addv :: Var -> Value Int32 -> Env -> Env
> addv v x e = e {vars = ((v::String),x):vars e}
Finally, we’re ready to compile functions. Since the only piece of our AST that can product a function is a lambda term, that’s the only piece we can meaningfully compile.
> compileFun :: Env -> Exp -> CodeGenModule (Function (Int32 -> IO Int32))
> compileFun env f@(Lam v e) = case (fv f) of
> [] -> createFunction ExternalLinkage $ \x -> do
> t <- compileExp (addv v x env) e
> ret t
> otherwise -> error "No closures yet!"
> compileFun _ _ = error "undefined"
Due to the restrictions we have in place, we’ll expect to see a set of nested lets that introduce functions, and a final body that uses them. We can special case the compilation of this to ensure our eventual body is in a special function.
First, we’ll need a convenience function to introduce a function to
the environment. This is almost the same as addv, with an obvious
change to the record being updated.
> addf :: Var -> (Function (Int32 -> IO Int32)) -> Env -> Env
> addf f x e = e {funs = ((f::String),x):funs e}
When compiling lets, we’re going to start out only dealing with lets that bind a lambda term to a name. The process is relatively simple: compile the function, store it in the environment under the name it was bound do, and compile the let body with that environment.
> compileLet :: Env -> Exp -> CodeGenModule Env
> compileLet env l@(Let v fun@(Lam x exp) body) =
> do fn <- compileFun env fun
> letBody v (addf v fn env) body
> compileLet _ _ = error "undefined"
While we’re compiling the let body, we need so way to decide where to “stop” and emit our final function. We can pile on one more restriction to help with this: let bindings that introduce functions must come before those that bind non-function values. That way once we see a let which doesn’t introduce a function, we can emit our main function and compile as normal in that.
Note that we don’t really need this restriction. Since functions aren’t being allowed to close over their environment, we could pull all of the non-function bindings past the function bindings without changing the meaning of the program. That would be a good exercise to try.
To decide when to stop making more functions, we need to distinguish lets that bind a function from those that bind a value. It doesn’t get much simpler than this:
> isFunLet :: Exp -> Bool
> isFunLet (Let _ (Lam _ _) _) = True
> isFunLet _ = False
The body of each let can meet one of two criteria. It either contains another let that defines a function, in which case we need to keep compiling functions, or it doesn’t, in which case we’ve hit the “body” of our main function and can compile that. For convenience we’re adding an unused parameter to the main function so its type matches that of the others and we can return it with the entire environment.
> letBody :: Var -> Env -> Exp -> CodeGenModule Env
> letBody v env expr | isFunLet expr = do compileLet env expr
> | otherwise =
> do f <- createNamedFunction ExternalLinkage "main" $ \n ->
> do t <- compileExp env expr; ret t
> return $ addf v f env
>
Compilation, extended
Since we now have functions and variables, let’s add a bit more useful error handling to make life simpler.
> envError :: String -> String -> Env -> a
> envError "function" n e = error $ "could not find function: " ++ n ++ ", have: " ++ (show $ names $ funs e)
> envError "variable" n e = error $ "could not find variable: " ++ n ++ ", have: " ++ (show $ names $ vars e)
> envError t _ _ = error $ "unrecognized error type: " ++ t
The code to compile expressions is largely unchanged, with the exception that we’re now supporting function application.
> compileExp :: Env -> Exp -> CodeGenFunction r (Value Int32)
> compileExp _ (Con x) = return $ valueOf ((fromIntegral x)::Int32)
> compileExp e (EOp o l r) = do
> l' <- compileExp e l
> r' <- compileExp e r
> getOpt o l' r'
> compileExp e (EVar v) = case lookup v $ vars e of
> Just a -> return a
> Nothing -> envError "variable" v e
> compileExp env@(Env e f) (Let v val body) = do
> v' <- compileExp env val
> let e' = Env (((v::String),v'):e) f in
> compileExp e' body
Due to the restrictions we have in place, we don’t have to worry about expressions that result in a function. Our code just has to handle variables bound to a give function. It does this by looking up the function in our environment (obviously failing if it isn’t found), then compiling the argument into a register and calling the function with that register as its argument.
> compileExp e (App (EVar v) e2) = case lookup v $ funs e of
> Just a -> do
> t1 <- compileExp e e2
> call a t1
> Nothing -> envError "function" v e
> compileExp _ _ = error "undefined"
To start off our compilation, we’ll need an empty environment. Here it is.
> emptyEnv = Env [] []
Our main function is unchanged from last time.
> main =
> forever $
> do putStr "> "
> hFlush stdout -- have to flush for proper output ordering
> s <- getLine
> go s
Now, go has been rewritten to compile our code into llvm bytecode,
use a shell command to disassemble it, then read the disassembled
output and print it to the screen.
> go s =
> case runwith letexpr id s of
> Left err -> putStrLn $ show err
> Right exp -> do writeCodeGenModule "FirstClass.bc" $ compileLet emptyEnv exp
> rawSystem "llvm-dis" ["-f", "FirstClass.bc"]
> f <- readFile "FirstClass.ll"
> putStr f
No comments
No comments yet. Be the first.
Leave a reply