Finite Improbability

Just programming and math, no spontaneously jumping undergarments

A compiler for Lambda Calculus to LLVM, Part 1

Table of contents for A Lambda Calculus compiler for LLVM

  1. A compiler for Lambda Calculus to LLVM, Part 1
  2. Lambda Calculus compiler, Part II: Wading in with arithmetic
  3. Lambda Calculus Compiler: Part III: First-Order Functions

In this series of posts, I’m going to walk through building a compiler for a simple Lambda Calculus to llvm. I’ll try to keep things to the point that nothing more than a general understanding of Haskell, Parsec, and the basic concepts of LLVM are required.

If you’d like to follow along, this post is a Literate Haskell file. That is unless Wordpress broke it, in which case let me know and I’ll fix it.

This first post concentrates on the parser. We’re going to target the abstract syntax found below. It’s a relatively simple untyped Lambda Calculus, with the conceit of adding let bindings to make life a bit more interesting. Note that we’re simplifying things considerably by limiting our types and arithmetic operations to stay within the realm of positive integers.

For educational value, I will not be using Text.ParserCombinators.Parsec.Language. Please note that, if you’re trying to implement a “real” language parser, that’s a much more sane option.

Let’s get started. We’re going to need to define a module for htis, and we obviously need some parsec libraries as well as the AST we’re translating to, so let’s take care of that.

> module Parser(letexpr, runwith, runIO,
>               Exp(EVar, Let, App, Lam, EOp, Con),
>               Op(Add, Mul),
>               Var) where

> import Data.List

> import Text.ParserCombinators.Parsec
> import Text.ParserCombinators.Parsec.Error(messageString, errorMessages)

Here’s the AST I will be using:

> data Exp  = EVar Var
>           | Let Var Exp Exp
>           | App Exp Exp
>           | Lam Var Exp
>           | EOp Op Exp Exp
>           | Con Int
>             deriving Show

> type Var  = String

> data Op = Add
>         | Mul
>           deriving Show

I’m going to start out by defining a few useful tools. It’s entirely likely that these already exist in Parsec, and furthermore that my implementations are inferior, so please let me know if you spot a problem.

Expressing “this should be between spaces” is a pain, so we’ll make something to handle that. But, we don’t want “expected space or…” in all of our errors, so this isn’t as intuitive as it would seem:

> spaced x = between spaces' spaces' x where
>     spaces' = spaces <?> ""

We’ll need a similar construct for things in parentheses, and while we’re at it a “this string should be spaced out” construct wouldn’t hurt either:

> symbol s = spaced $ string s

> parens = between (symbol "(") (symbol ")")

Here’s a few example expressions to “sketch out” what the language should look like:

The identity function:

\x. x

A sample let expression:

let id = \x. x in
id 5

Some math:

\x. x * 3 + 2

Closures

\x. (\y. x+y)

From this, we can see that there are only two keywords, “let” and “in”. We’ll define those now:

> letKW = try $ do string "let"
>                  notFollowedBy alphaNum

> inKW = try $ do string "in"
>                 notFollowedBy alphaNum

> keywords = ["in", "let"]

A “this is a keyword” test will also be useful:

> isKW c = null ([c] \\ keywords)

Pretty much entirely stolen from the code for Parsec’s token parser library.

> kwfail f = do{ x <- f
>              ; if isKW x
>                then unexpected ("reserved word '" ++ x ++ "'")
>                else return x
>              }

Variable names are pretty simple, just have to check that the identifier we’ve parsed is not a keyword.

> var = spaced $ do
>                name <- kwfail ident
>                return $ EVar name

> ident = do
>         c <- letter <|> char '_'
>         cs <- many (letter <|> char '_' <|> digit)
>         return (c:cs)

We’re going to have literal integer values, which are pretty straightforward except that we need to convert the value into an integer when putting it in the ast.

> literal = spaced $ do
>                    n <- many1 digit
>                    return $ Con ((read n)::Int)

We’re going to have two arithmetic operations, addition and multiplication. Just to be fancy, we’ll abstract out the process of recognizing an operator. The following code is a function that takes a symbol and the AST representation and generates a function that, given a left and right value, generates the AST for that operation.

> mkOp sym val = do
>      symbol sym
>      return (\l r -> EOp val l r)

Now we can trivially define operators for addition and multiplication (and anything else, when we need to):

> add = mkOp "+" Add
> mul = mkOp "*" Mul

Order of operations can be tricky here. The standard “definition” (stolen from wikipedia) looks like this:

expression ::= additive-expression
additive-expression ::= multiplicative-expression ( '+' multiplicative-expression ) *
multiplicative-expression ::= primary ( '*' primary ) *
primary ::= '(' expression ')' | NUMBER | VARIABLE

Unfortunately, it’s left recursive, which will throw Parsec into an infinite loop. Fortunately, there’s tools to deal with this, namely chainl1. Our arithmetic expression looks pretty similar to the above, once that’s taken into account:

> arithExpr    = arithTerm   `chainl1` add
> arithTerm    = arithFactor `chainl1` mul
> arithFactor  = parens aexpr <|> var <|> literal

Obviously we need a definition for an expression. Since we’re only concerned with what is useful in the context of arithmetic expressions, we’ll limit what can happen here to the things that could have an arithmetic value.

> aexpr = spaced $ do
>   n <- (try app) <|> (try $ parens app) <|> arithExpr <|> literal <|> var <|> parens aexpr
>   return n

A function application is composed of a (parenthesized) lambda expression or variable on the left and an arithmetic expression on the right.

> app = do
>   f <- (parens lam) <|> var
>   x <- (try $ parens arithExpr) <|>
>        (try $ parens app)       <|>
>        var                      <|>
>        literal
>   return $ App f x

A let expression binds a var to either a value or a lambda expression. Nested let expressions are allowed, but we have to take care to ensure that ultimately an arithmetic expression is evaluated. We’re also going to allow multiple bindings in a single let, immediately desugaring them into equivalent nested let expressions. Note that, unlike in Haskell, the order of variable naming is relevant. Fixing that is an exercise left to the reader.

Sample desugaring:

let f = 5
    g = \x. x + x in
    g f

let f = 5 in
let g = \x. x + x in
g f

First, we’ll start with the name binding. To aid in using this for our syntactic sugar, it produces a (Var, Exp) pair for the expression bound to a given name:

> binding = do
>   (EVar v) <- var
>   spaced (char '=')
>   x <- aexpr <|> lam <|> var <|> literal
>   return $ (v,x)

Desugaring the lets is as simple as taking a list of the binding results, along with the eventual body, and walking the list to nest each binding in the let expression for the binding before it. For completeness, we’ll handle the empty list case as well, even though it won’t be permitted by the parser:

> desugarLets :: [(Var,Exp)] -> Exp -> Exp
> desugarLets [] b         = b
> desugarLets ((v,x):[]) b = Let v x b
> desugarLets ((v,x):xs) b = Let v x (desugarLets xs b)

> letexpr = do
>   spaced letKW
>   xs <- sepBy1 binding (newline <|> char ';')
>   spaced inKW
>   many $ newline <|> char ';'
>   b <- letexpr <|> aexpr
>   return (desugarLets xs b)

A lambda expression starts with a backslash, introduces a variable name, and has either a let expression, a raw arithmetic expression, or another lambda expression for its body.

> lam = do
>   symbol "\\"
>   (EVar v) <- var
>   symbol "."
>   x <- letexpr <|> (try aexpr) <|> (try app) <|> lam <|>
>        parens lam
>   return $ Lam v x

We’ll need this function later on to ease the process of running functions over the parser’s output.

> runwith :: Parser a -> (a -> b) -> String -> Either ParseError b
> runwith p conv input
>         = case (parse p "" input) of
>                 Left err -> Left err
>                 Right x  -> Right $ conv x

Finally, we have a simple function to run a parser and print the output.

> runIO :: Show a => Parser a -> String -> IO ()
> runIO p input
>         = case (parse p "" input) of
>             Left err -> do{ putStr "parse error at "
>                           ; print err
>                           }
>             Right x  -> print x

Next time, we’ll start working on an llvm-based simple calculator.

2 comments

2 Comments so far

  1. [...] This post was mentioned on Twitter by mwkohout and Tech news, hrjnrss. hrjnrss said: A compiler for Lambda Calculus to LLVM, Part 1: Comments http://url4.eu/moHn [...]

  2. uberVU - social comments November 19th, 2009 2:30 am

    Social comments and analytics for this post…

    This post was mentioned on Hackernews by mbrubeck: With the right link this time. (I submitted this earlier with a corrupt URI and didn’t notice until it was too late to edit/delete.)…

Leave a reply