Thursday, December 8, 2011

Literate, Racket-Styled Interpreter from Ch. 1 of Lisp in Small Pieces

#lang racket

We'll need Racket's mutable pairs.

(require racket/mpair)

Our initial interpreter will not be very discriminating: it is sufficient to recognize atoms, that is, values which evaluate to themselves, by:

(define (atom? x) (not (pair? x)))

However, atoms as defined above include both symbols and things like strings and numbers. We will be using symbols to refer to variables in our interpreted code, though, so we need to draw a distinction between symbols and other atoms.

(define (atom/symbol? x) (symbol? x))
(define (atom/non-symol? x) (and (atom? x) (not (symbol? x))))

I'd like to extend my interpreter as I go, because I think that is clearer. We will take the following approach to allow this:

(define (suffix item onto)
  (reverse (cons item (reverse onto))))

(define-values (interpret extend-interpreter!)
  (let* ((interpreter-handlers '())
         (extend-interpreter!
          (lambda (dispatch-function handle-function)
            (set! interpreter-handlers
                  (suffix (cons dispatch-function handle-function) interpreter-handlers))))
         (interpret
          (lambda (exp env)
            (let loop 
                [(interpreter-handlers interpreter-handlers)]
              (match interpreter-handlers
                [(list) (error (format "Can't interpret ~a - no handler matched!" exp))]
                [(cons (cons dis han) rest)
                 (if (dis exp) (han exp env)
                     (loop rest))])))))
    (values interpret extend-interpreter!)))

The above creates a closure over three-values, the interpreter-handlers, a list of pairs, a function extend-interpreter!, which allows us to add a form-recognizer, form-handler pair to this list (at the end, using suffix) and a function interpret, which does the actual interpretation.

Interpret loops through the handlers, testing the form it is working on with the first function in each pair. If the test succeeds, it interprets the expression with the second function in the pair.

The downside to this approach is that we destructure our expressions twice - once during matching and once during interpretation. The upside is that we can extend our interpreter as we go, clearly separating concerns. This kind of interpreter is about as slow as it can be anyway, so it hardly matters.

Now we can add our first interpretation rule:

(extend-interpreter!
 atom/non-symol?
 (lambda (val _) val))

Each interpreting function takes the form to interpret and the environment. However, things like "this" and 10 don't need an environment, so we give it a name which indicates our disdain for it, _. We simply return what we were passed in.

Now a note on the environment, since we'll need it soon: since Racket lists are made of immutable pairs, and we'll need to mutate our environment, we are going to use mutable cons cells in our list. The global environment, however, is the empty one:

(define the-global-environment (list))

It needn't itself be mutable - adding items to the global environment will be accomplished by set! on the-global-environment, rather than mutation of the list itself.

We want to support adding a binding to the environment from "outside" the language, and extending the environment from inside. Respectively:

(define (add-global-binding name value)
  (if (symbol? name)
      (set! the-global-environment 
            (cons (mcons name value) the-global-environment))
      (error "Bindings must be to symbols.")))

This adds a new mutable cons pair to the global environment, created with mcons.

(define (extend-environment env vars vals)
  (match (list vars vals)
    [(list (list) (list)) env]
    [(list (cons key subs-keys)
           (cons val subs-vals))
     (extend-environment 
      (cons (mcons key val) env) subs-keys subs-vals)]))

Note that extend-environment produces a new environment without modifying the old one. This is important.

Allowing us to say:

(interpret "A string" the-global-environment)
(interpret 100 the-global-environment)

These yield their first arguments, as they should.

Now let's add a rule for symbols. First we need to know how we will look the meaning symbols up:. The simplest possible choice is that we represent the environment as a list of pairs; eg '((x . 10) (y . 11))

(define (lookup s env)
  (match env
    [(list) (error (format "No binding for ~a." s))]
    [(cons (mcons key val) env)
     (if (eq? key s) val
         (lookup s env))]))

lookup is exactly the function which evaluates symbols.

(extend-interpreter!
 atom/symbol?
 lookup)

(interpret 'x (list (mcons 'x 100)))

Which evaluates to 100, just as our environment specifies.

Now we can write the interpreter cases for special forms. Lisp In Small Pieces supports only quote,if,begin,set!, and lambda. quote first:

(define quote-expression? (match-lambda 
                           [`(quote ,exp) #t]
                           [_ #f]))

The evaluation of a quoted form is the form itself.

(extend-interpreter!
 quote-expression?
 (lambda (s env) (second s)))

(interpret '(quote x) the-global-environment)

Results in x, as it should.

Now for if:

(define if-expression?
  (match-lambda 
   [`(if ,test ,true ,false) #t]
   [_ #f]))

We restrict ourselves to ternary if.

(extend-interpreter!
 if-expression?
 (lambda (e env)
   (match e 
     [`(if ,test ,true ,false)
      (if (interpret test env)
          (interpret true env)
          (interpret false env))])))

(interpret '(if #t 'x 'y) the-global-environment)
(interpret '(if #f 'x 'y) the-global-environment)

These expressions are x and y respectively. If's interpreter only interprets the appropriate branch.

Now we implement begin:

(define begin-expression? 
  (match-lambda 
   [(cons 'begin rest) #t]
   [_ #f]))

Here we create a struct whose sole instance is the value returned by an empty begin statement. We could also forbid such a statement.

(struct Empty-begin () #:transparent)
(define the-empty-begin (Empty-begin))

(define (interpret-begin e env)
  (match e
    [`(begin) the-empty-begin]
    [`(begin ,expr) (interpret expr env)]
    [(cons 'begin  (cons expr rest))
     (interpret expr env)
     (interpret-begin (cons 'begin rest) env)]))

(extend-interpreter! 
 begin-expression?
 interpret-begin)

We can't really do anything interesting with begin without implementating side effecting expressions. However:

(interpret '(begin 'a 'b 'c) the-global-environment)

is 'c, just as we expected.

Now for set!, which will allow us to update the environment from within the language itself.

(define set!-expression? 
  (match-lambda
   [`(set! ,(? symbol? var) ,expr) #t]
   [_ #f]))

This is a side effect inducing form and we'll need to use side effects to simulate the result. To match L.I.S.P., however, we will not allow set! to introduce new bindings, only update them.

(struct Set!-result () #:transparent)
(define set!-result (Set!-result))

(define (update! env key value)
  (match env
    [(list) (error (format "Can't find a binding ~a to update." key))]
    [(cons (and (mcons lkey val) pair) rest)
     (if (eq? key lkey) 
         (set-mcdr! pair value)
         (update! rest key value))]))

set-mcdr! does the work. We search until we find the symbol requested, and then we mutate its pair. It is an error to set a symbol without a binding.

(define (interpret-set!-expression e env)
  (match e
    [`(set! ,(? symbol? var) ,expr)
     (let ((val (interpret expr env)))
       (begin (update! env var val)
              set!-result))]))

(extend-interpreter!
 set!-expression?
 interpret-set!-expression)

We will define a test environment which has a binding for 'x

And test our new form:

(interpret '(begin (set! x 100) x) (list (mcons 'x 10)))

Finally, we write our implementation of lambda:

(define lambda-expression? 
  (match-lambda 
   [`(lambda ,(? list?) ,@(list body ...)) #t]
   [_ #f]))

(define (interpret-lambda e env)
  (match e
    [`(lambda ,(? list? vars) ,body ...)
     (lambda values
       (interpret
        `(begin ,@body)
        (extend-environment env vars values)))]))

Lambda's are represented by lambdas, which is confusing, I know (wait till you see the continuation passing interpreter!). However, the way this works will be clear in the invokation word. Our lambda encloses its env, var and body lists and interprets the latter with the environment extended with the former. It receives values to bind to these symbols when it is invoked.

(extend-interpreter! 
 lambda-expression?
 interpret-lambda)

This allows us to write and evaluate lambda expressions but we can't do anything with them until we define the part of the interpreter which invokes functions.

(define invokation?
  (match-lambda 
   [(list e maybe-args ...) #t]
   [_ #f]))

(define (interpret-invokation e env)
  (match e
    [(list e maybe-args ...)
     (let ((f (interpret e env))
           (args (map (lambda (e) (interpret e env)) maybe-args)))
       (if (procedure? f)
           (apply f args)
           (error "Non-function in application position!")))]))

Invoke checks to make sure its first element is actually a function, and then evaluates the arguments. It then passes those values to the function produced by evaluating the first argument. This function contains an internal call to interpret in the extended environment of the function.

(extend-interpreter!
 invokation?
 interpret-invokation)

Ok, we should add some things to our environment so that we can actually do things.

(add-global-binding '+ (lambda args (apply + args)))
(add-global-binding '- (lambda args (apply - args)))
(add-global-binding '* (lambda args (apply * args)))
(add-global-binding '/ (lambda args (apply / args)))

(interpret '(lambda (a b c) (+ a (+ b c))) the-global-environment)

(interpret '((lambda (a b c) (- (+ a b) c)) 1 2 3) the-global-environment)

Wheeeee!

Look out for the next interpreter! Code is available on github.

Wednesday, December 7, 2011

Persistent Hash Tables for Elisp

Another quick update: I've added an implementation of persisten hash tables to my emacs library. These are built on top of random access persistent lists and are similar to Clojure's persistent table datatype.

They are like hash tables (with similar performance) except that set operations return a new table - the old table persists unmodified.

You use them like so:

(require 'persistent-hash-tables)

(setq tbl ({} 'a 10 'b 11 'c 12))
(setq tbl2 (ptbl-set tbl 'c 100))
(ptbl-get tbl 'c) ;-> 12 
(ptbl-get tbl2 'c) ;-> 100

Such tables allow functional programming over table abstractions. These might be a better fit than association lists if you have a large number of associations and/or you are doing a lot of functional updates.


Tuesday, December 6, 2011

Random Access Lists for Emacs

Here is a quick post: I've added an implementation of random access lists to my Emacs Lisp library. These are a persistent data structure which supports fast persistent list operations (O(1)) as well as pretty fast random-access and persistent update (O(log(n))). These data structures are someplace between a list and a vector, but have the benefit of persistence - update to a random-access-list returns a new list and leaves the old one unmodified. These are similar to Clojure's vector datatype.

You can read about these structures in srfi-101. My implementation is a cargo-cult port of the reference implementation of that srfi to Emacs Lisp.

Examples:

Creation:

(setq r (ra:list 1 2 3 4 5)) ; -> [cl-struct-pl-cons 1 1 [cl-struct-pl-cons 1 2 [cl-struct-pl-cons 3 [cl-struct-pl-node 3 4 5] nil]]]

List operations are prefixed with ra::

(ra:car r) ;-> 1
(ra:cdr r) ;-> [cl-struct-pl-cons 1 2 [cl-struct-pl-cons 3
[cl-struct-pl-node 3 4 5] nil]]
(ra:map (lambda (x) (+ x 1)) r) ;-> [cl-struct-pl-cons 1 2
[cl-struct-pl-cons 1 3 [cl-struct-pl-cons 3 [cl-struct-pl-node 4 5
6] nil]]]
(ra:cons 100 r) ; ->[cl-struct-pl-cons 3 [cl-struct-pl-node 100 1 2] [cl-struct-pl-cons 3 [cl-struct-pl-node 3 4 5] nil]]

And so on.

Random access and update are accomplished with:

(ra:list-ref r 1) ;-> 2
(ra:list-set r 1 1000) [cl-struct-pl-cons 1 1 [cl-struct-pl-cons 1
1000 [cl-struct-pl-cons 3 [cl-struct-pl-node 3 4 5] nil]]]

Note that indexing is zero based.

Once can convert between ra:lists and lists via:

(ra:random-access-list->linear-access-list r) ;-> (1 2 3 4 5)
(ra:linear-access-list->random-access-list '(1 2 3)) ;->
[cl-struct-pl-cons 3 [cl-struct-pl-node 1 2 3] nil]

Conclusion:

I wrote this library by porting over the reference implementation. There are many places where very large random access lists will blow the stack. There are also probably bugs and places where efficiency can be improved. Please let me know if you find any!

They are available from my github account.


Friday, December 2, 2011

Better Monad Parse Updated!

I've made a few patches to better-monad-parse and pushed the new version to github. The improvements primarily deal with performance of parsers which parse strings, which I've found is the most common use cases.

This seems to make a big usability improvement for me.

I also improved the =number parser, which had some unexpected behavior.

Better-monad-parse is a monadic parser combinator implementation in pure Elisp which is both useful for writing parsers and understanding how monadic computations work.

Suppose you have a set of files which have names consisting of one or more characters and then a number, which you'd like to write a function to parse. You can do so like:

(require 'better-monad-parse)

(defparser =file-name-parser
 (name <- (=>one-or-more =alpha))
 (number <- =number)
 (m-return 
  `((name . ,name)
    (number . ,number))))

This binds the symbol =file-name-parser to a function which takes a string as input and returns an association list (and the leftover input) as a result, but only if parsing succeeds. Otherwise, it returns nil.

The library provides a rich set of parsers and combinators on parsers (like (=>one-or-more ...) above, which takes a parser and returns a parser which parses at least one, and potentially as many as possible, of the specified parser.

Parsers are generic over the input type, and the library knows how to parse strings, lists and emacs buffers by default. Using my implementation of multimethods, you can extend the parser for other input types.

Using better-monad-parse is kind of like the opposite of slumming - you can go visit the neighborhood of pure, functional, monadic programming from the slums of Emacs Lisp!