Tuesday, August 30, 2011

Monadic Turtle Graphics (Screencast Edition)

Apparently the video is still borked for many users. Anyone here have any advice on the best way to record screencasts on Ubuntu Natty? A trillion apologies and abasements for any inconvenience.

Hi everyone, a few months ago I posted a lengthy article about using monads to simulate state for a turtle graphics system, including a "novel" monad to represent parallel graphics operations.

I had the opportunity to give that material as a talk at Racketcon (I used Racket as the demonstration language). There is a version available of that online someplace, I believe, but I only had 15 minutes there, and its more of a half hour or longer talk. So I recorded a screencast version, which is now available here.

The talk is in OGV format. If anyone needs it in AVI, I can probably accommodate, but I figure anyone who cares about monads can probably figure out how to watch an Ogg Video file.

Oh yeah: all the code (including the slides, which are written in Racket) is in the racketcon directory at this github repo.

Thursday, August 25, 2011

A Survey of Syntactic Extension Techniques in the Lisp Family of Languages (Part 3)

Last time we talked about the Lisps which use something we might call defmacro-style syntactic extensions (Common Lisp, Emacs Lisp, Clojure, Arc, and things we didn't specifically talk about, like LuSH). The time before that, we talked about dynamically scoped interpreted Lisps with fexprs in which the fact that a piece of code's value is always dynamically defined was exploited to let various run-time approaches to staging execution stand in for macros.

Hopefully, throughout the previous posts, I've impressed upon the reader the importance of scope in all of these discussions. The key reason that picoLisp or newLisp can get away with fexrp-style macros is that code literally means what it says whenever it is you evaluate it. This has some things to recommend it, but these days lexical scope is the biggest rooster in the yard. Lexical scope makes reading and writing code easier and prevents certain kinds of bugs, but it introduces problems with defmacro-style macros, which manipulate code without knowing anything about the lexical contexts associated with variables. What this principally means is that a defmacro-style macro:

  1. Will behave incorrectly if it depends on other macros/special forms whose definitions are changed between its declaration and interpretation.
  2. Can introduce symbol-bindings which unexpectedly shadow symbol bindings in expanded sub-forms.

In Lisps with defmacro style macros, 1 is kind of dealt with by having separate namespaces - generally, one just hopes that people won't often redefine macros and, if they do, they will do so in their own namespace. Macros implicitly invoked by a macro expansion will hopefully or by design expand into package-qualified symbols, which we trust the user not to clobber without knowing what she is doing.

Point 2 is dealt with by generating symbols during macro expansion which are guaranteed (under most circumstances) to be unique from any other symbols which are in use. This gives macros a means of expanding into code which references its own private variables without any concern of clobbering some bindings unexpectedly.

If we restrict our attention to regular old Lisp code, without macros, in a lexical Lisp, we might begin to think that syntactic extensions are kind of lacking when compared to other kinds of kinds of things with names. Of course, we can't pass syntactic extension tokens around at run time or anything like that but we might want them to behave in a way a little more like variables. That is, we want them to obey reasonable scoping relationships. In a lexically scoped language, for instance:

(let ((x 10)) (lambda (y) (+ x y)))

The x inside the lambda will always refer to the x bound in the enclosing let expression. If you want to check the value of an expression that uses that lambda, you don't have to understand the whole history of the symbol x throughout the program. You just need to know the value passed in (y), and then go look at the lexical context of the lambda to see what x is.

What if we could invent a macro system where this was true of macros? Where the meaning of a macro could be determined by looking at the inputs at the location of macro expansion, but then only look at the macro definition, and its lexical context, to understand what the expansion means?

Extending Our Code Representation

Most Lisps represent code as a list of atoms and lists. This is fine for most things, but we've discussed at length how this representation makes macro expansion problematic. Scheme extend their code representation to include extra information, most interestingly about the lexical context of a piece of code. Code, then, is represented as syntax objects.

In Racket, the Scheme(+) implementation I recommend for fiddling with Scheme, you can see this explicitly. The quotation operator still produces naked code:

> 'x ;-> x

But the sharp quote operator, short hand for syntax, produces syntax, which is a data object which contains (abstractly) lists of atoms and lists, but decorated:

> #'x -> #<syntax:11:10 x>

(Note that #' used in this way is specific to Racket).

Syntax
Syntax, unlike the raw code representation of other Lisps, contains information about the lexical context and binding of symbols.

The Racket printer shows a little of the information associated with the syntax (line number, it appears), but lots of other information is stored. Racket macros transform and produce syntax, not raw lists. Except for this fact, they are the same as defmacro style macros. You may have heard people say that you can't write Scheme macros in Scheme (as you write Lisp macros in Lisp), but this isn't really true. Syntax transformations have to produce and consume syntax, but otherwise are unrestricted. It is true that Schemes vary in the degree to which you are provided tools to arbitrarily manipulate syntax, but that is at least partly a separate issue.

Introducing New Syntax

Scheme provides an operation, define-syntax, which informs the system that a symbol should be associated with a new syntax transformer. This is the equivalent of defmacro in other Lisps. It has the form:

(define-syntax <symbol> <syntax-transformer-expression>)

The first symbol is equivalent to the name of the macro in defmacro lisps. The second expression must be a piece of Scheme code that generates a syntax-transformer which is a regular function (as we shall see) that must take and return syntax1. After executing a define-syntax, if the <symbol> is detected during macro-expansion, its contents are passed to the syntax transformer, and the results are stitched back into the code.

Simple Tools Help Maintain Hygiene

So how do we write new syntax-transformers? In Lisp, we just use the list manipulation functions, quasiquote and unquote, to take apart and build up our new source code. Schemes provide equivalent procedures for syntax objects, but most macros are simple enough to be written in a high-level form which helps maintain hygiene automatically. There are many ways to do this, the simplest of which is syntax-rules.

First in words: syntax-rules is a form which takes a series of patterns and expansions and produces a syntax-transformer which examines syntax until a pattern matches and returns the associated expansions. Critically, when a pattern is matched, the parts of the pattern are bound to pieces of syntax which can be used in the expansion. Let's do let*, like we did last time:

(define-syntax my-let*
 (syntax-rules 
  [(my-let* () body ...)
   (begin body ...)]
  [(my-let* ((id expr)) body ...)
   ((lambda (id) body ...) expr)]
  [(my-let* ((id0 expr0) (id expr) ...) body ...)
   ((lambda (id0) (my-let* ((id expr) ...) body ...)) 
    expr0)]))

I think people find syntax-rules confusing, when coming from defmacro, because it handles a lot of things for you. For one thing, the fact that this expression is a syntax transformer isn't as obvious as it could be, but think of syntax-rules as an expression which returns a function, which is exactly what it is. That isn't so odd!

But syntax-rules also handles pattern matching for you and binding the results of that matching to pieces of syntax in the expansion expression. On top of that, neither the pattern nor the expansion constitutes regular Scheme code. Each is, in fact, its own unique, but related, domain specific pattern language, a bit like a regular expression match and replacement language for syntax.

Let's look at each specific pattern and its expansion. The first pattern handles the case where someone invokes a my-let* binding expression without any bindings. It matches a piece of code which looks like:

(my-let* () body ...)

There is a piece of that domain specific pattern language in this code. The expression body ... tells syntax-rules to match zero-or-more forms of arbitrary structure and bind them to the pattern body ... in the expansion part of the clause. Since we aren't doing anything with the body we don't care if it is there or not, since we handle it the same way. We expand this clause into:

(begin body ...)

Here body ... now refers to a piece of bound syntax produced by the pattern part of the expression. When the expansion is expanded, it will contain whatever was matched to body ... in the pattern, which might be nothing.

The next clause's pattern is:

(my-let* ((id expr)) body ...)

Here we instruct the pattern language to only match when the binding part of the let contains one two-element list which should be an identifier and an expression (we will have to think about error checking later). The identifier is bound to id, the expressions is bound to expr. body ... is bound as above. The expansion is given as:

((lambda (id) body ...) expr)

Because id, body ..., and expr were bound in the pattern, they have special meaning in the expansion. The symbol lambda, however, is a piece of syntax which is corresponds to the currently operating lambda syntax. Note that syntax-rules is basically handling automatically what we would have done manually with gensym and quasiquotation/unquotation.

At the risk of being tedious, lets look at the final pattern. It doesn't contain anything new, but it does highlight some of the strategies often encountered in writing this kind of syntax transformer. The pattern is:

(my-let* ((id0 expr0) (id expr) ...)
         body ...)

Here, var0 and expr0 are bound to the first id/expr pair. We write out the first explicitely for two reasons. The first is that (id expr) ... matches zero or more expressions, and we want at least one. The second is that we need to split up the id and the expr part of our match and insert the ... rest of them somewhere else. You can't do this with the pattern ... match, which must appear in the expansion as pattern .... We could write:

(my-let* ((id0 expr0) binder ...)
    body ...)

of course, but by specifying a more concrete match we make our macro a little bit better in terms of robustness. We really want at least a pair in each binder position, so we say so directly.

We expand this pattern as:

((lambda (id0) (my-let* ((id expr) ...) body ...)) expr0)

Which should be understandable enough. The main thing to note is that we are recursively expanding this syntax-transformation with calls to itself. Because syntax-transformations often represent purely functional computations on input syntax, recursion is a common (and elegant) strategy.

The Scheme universe offers other forms that allow more complex macro expansions, but they all owe a lot to the basic shape of syntax-rules.

Shortcomings of Syntax Rules

The standard way of demonstrating the shortcomings of syntax-rules is by trying to write an anaphoric if macro. Anaphoric if is a macro that binds the result of its predicate part to a variable, which is available for use inside either branch expression (often the symbol it is used for this purpose). Since you've read Part 1 and Part 2, I can just explain this by writing the appropriate macro in Common Lisp:

(defmacro anaphoric-if (pred true &optional false)
 `(let ((it ,pred))
    (if ,it ,true ,false)))

Simple! The utility of the macro is related to the fact that if treats lots of things other than just booleans as true. Imagine you have a function which processes a command entered by the user. The command is returned by the function get-command. If the user just pressed enter, nil is returned.

(anaphoric-if (get-command) (handle-command it) nil)

it contains the result of get-command. Our macro has captured the symbol it and is using it for its own purposes. If we already had an it in scope, this might be inconvenient (and, in my opinion, there are lots of things wrong with this kind of macro, but some people are into it). Misgivings about variable capture aside, it is easy to write this variable capturing macro in Common Lisp, and it behaves just as we expect it to with our filthy, untutored brains.

We might try to write this macro using our new knowledge of syntax-rules like so:

(define-syntax anaphoric-if
 (syntax-rules ()
  [(anaphoric-if pred true false)
   (let ((it pred))
     (if it true false))]))

But if we attempt to use this syntax-transformer, we'll quickly run into problems:

(anaphoric-if 10 (format "it is ~a~n" it) #f)
  . . reference to an identifier before its definition: it
  >

We expected the string "It is 10\n", but we got an error! What is happening is that syntax-rules is trying to protect us from ourselves. Inside the expansion we introduce a piece of syntax, it. It isn't, however, simply a symbol - it is syntax, and its metadata tells the Scheme system that its scope is limited to the macro expansion context. We can use it as the predicate spot in our if statement, but any it symbols appearing in the expressions matched to true or false will only be superficially similar. The Scheme system will know that the its in those expressions are different than the one we create in our macro expansion. Hence, when the evaluator encounters it in one of our branches, it checks that symbol's original lexical context (in our example, the top-level binding environment) for a binding, which it doesn't find. Hence the error.

We can approach the solution in two ways (at least). The first is to force the user to provide the symbol name herself. Scheme will then know that the symbol, whatever it is, is meant to be the same for all appearances in the expansion:

(define-syntax named-anaphoric-if 
 (syntax-rules () 
  [(named-anaphoric-if id pred true false)
   (let ((id pred))
     (if id true false))]))

Or we can try to tell scheme we want to capture the symbol it. I don't know of any way to do this with syntax-rules, although I suspect that if you really wanted to bend over backwards, there might be some way. We need something else! Plus, there are lots of other idioms, not just the ugly duckling variable capture, which are hard to express in syntax-rules. Hence, Schemers cooked up...

Syntax-case

Before we talk about syntax-case I want to emphasize that both syntax-rules and syntax-case are just tools that produce syntax-transformers. Their unusual, domain specific languages are meant to make writing transformers correctly easier, but they are not, themselves, the Scheme macro system. They are just tools built on top of that system. The underlying system is much like the one in defmacro Lisps, except it transforms syntax, not lists.

syntax-case looks a lot like syntax-rules. Here is how you'd write the named-anaphoric-if in syntax-case:

(define-syntax named-anaphoric-if* 
  (lambda (stx) 
    (syntax-case stx ()
      [(named-anaphoric-if* id pred true false)
       (syntax (let ((id pred))
                 (if id true false)))])))

Almost identical. Sharp eyed readers will not that the expansion has been wrapped with a syntax expression, which tells the environment to evaluate the contents contained therein as syntax, which means automatically handling any syntax bound by the pattern match. We also have to wrap syntax-case in a lambda, which is the syntax-transformer proper. Unlike syntax-rules, which is a form resulting in a syntax-transformer, syntax-case merely transforms syntax. It does not produce a transformer itself.

However, since the expansion part of syntax-case clauses is not automatically interpreted as syntax, it allows us to finagle the matched syntax manually. Turns out this is just what we need to "capture" a variable. Here is how we do it:

(define-syntax anaphoric-if** 
 (lambda (stx)
  (syntax-case stx ()
   [(anaphoric-if** pred true false)
    (with-syntax ([it-id (datum->syntax (syntax true) 'it)])
     (syntax 
      (let ((it-id pred))
       (if it-id true false))))])))

So this is a syntax-case expression with one clause, which matches things like (anaphoric-if** a b c). The next part of the clause must return a piece of syntax, but it can do any arbitrary scheme activity to produce that syntax, unlike syntax-rules. The expression with-syntax allows us to introduce an environment with new syntax-bindings in addition to those produced by the match expression. We use it to introduce a binding to it-id. This means we can use the expression it-id inside our expansion (within the with-syntax body) and it will be interpreted correctly by syntax expressions. How do we actually create a new piece of syntax, though? The function datum->syntax takes a piece of data (in our case, the symbol it) and transforms it into a piece of syntax. But syntax contains much more information than just its value, and some of that information determines how it is interpreted during macroexpansion and subsequent compilation. datum->syntax allows us to pass in, as the first argument, a piece of syntax from which we will clone that additional information. Since we want to let it operate in the same environment as the true and false clauses, we clone the syntax-metadata from the true branch. We then return a piece of syntax corresponding to our entire expression. Again, syntax automatically handles the equivalent of quasi/un-quotation as we might do it in Lisp. It looks at the syntax bindings which are active and automatically interpolates them. We can now say, with a straight face:

(anaphoric-if** 10 it it)

And get 10, as we expect to. We now know how to write selectively-non-hygienic macros. Go us.

Recapitulation

We started, back in Part 1 by talking about dynamically scoped Lisps where code was always written and executed in the most obvious way possible. These Lisps maybe have been difficult to write programs in, or at least unfamiliar to modern Lispers and programmers at large, but they did have one thing going for them: their treatment of code/data was so simple that special forms could be treated identically with functions. These so-called fexprs could choose when to evaluate their inputs because they put the onus of maintaining various bindings to a single symbol on the user.

In these languages, macros were often considered to be less useful than fexprs, and why not? We could see when we looked at the defmacro Lisps shared some problems which, if not really a problem in practical use cases, at least left us philosophically unsettled. Macros could easily be written which unintentionally captured variables, they were not first class, and didn't follow regular lexical binding rules. In exchange for these shortcomings, we did get macros which were almost as nice as fexprs and worked better in lexically scoped languages, because they forced us to emphasize the fact that macro-expansion occurs before execution. Tools like gensym and packages/namespaces helped us avoid the worst hygiene problems.

And so we've finally arrived at Scheme, were macros, now called syntax-transformers work not on an undecorated representation of Lisp code, as they do in defmacro lisps, but on syntax, a code representation where information about context and binding is represented as well. Adding this additional representation to the code allows forms like syntax-rules to automatically handle syntax transformation for us while maintaining hygiene.

Finally, we grudgingly admitted that people somewhere might want to intentionally violate hygiene, and introduced the syntax-case expression, which allows the user to manipulate syntax-objects to the end of capturing variables and the like.

Further Reading

There are tons of things still to be said about the subject, especially regarding error checking and reporting. And the small, but enthusiastic communities around newLisp and picoLisp as well as the thriving Emacs Lisp community demonstrates that despite academic discussion, even issues of lexical or dynamic scope is far from resolved.

Further reading might entail getting one's head around syntax-parse a special form implemented in Racket and (partially) in Clojure, which is meant to make writing hygienic (and selectively unhygienic) macros both easy to write and easy to make provide useful, informative error messages. You could also check out Kernel, which is an experimental Lisp which tries to address all of these issues with tons of dollar signs and abstraction.

As always, I hope this has been useful to someone other than myself!

Back Matter

If all you care about is Lisp, then we've reached the end of this series on syntax. However, discussions with a reader, Dru Nelson, compel me to mention The Factor Language as something of a conceptual digestif. Factor is a concatenative language which has many similarities with Lisp, which puts it in the odd category of being more Lispy than Lisp in many respects.

Factor, for programmers unfamiliar with concatenative languages, in its ordinary operation, does not have variables: values are passed between subroutines (called "words") on a data stack. I strongly encourage readers to download Factor and give it a spin - I find writing Factor programs to be delightful mental calisthenics. Here is how you calculate (3 + 13) * 6 in Factor:

( scratchpad ) 3 13 + 6 *

--- Data stack:
96

A Factor program is, apparently, a flat list of "words" with data flow handled implicitly. What is the equivalent of a "lambda" expression in such an exotic language? Well, its just a list! Since there are no named things except for words, a list of the words (and literals, which can be thought of as words which "push" themselves onto the stack) which make up a subroutine is all you need or can have. In Factor, code really is data and execution of that code is even simpler than it is in dynamically scoped languages. You can't even make the common mistakes of dynamical languages respecting variable binding because, well, you don't have variables. You don't need to write Macros, such as we have them in Lisp, in Factor, because your functions are free to shuffle around, decompose, build and partially evaluate quotations to their heart's content2.

Factor goes one step further towards simplicity than even picoLisp - it completely eliminates variables, and as such, data becomes even closer to code than it is in that language.

More Lispy than Lisp indeed!


1 I'm pretty sure that syntax, as (essentially) a wrapper for data, is a monad, but that is definitely another story.

2 Factor actually requires that quotations have infer-able stack-effects, which places limitations on them in practice (surprisingly few, though!)

Friday, August 12, 2011

On Discovering Programming, Age 12

I discovered at once a parallel
eroticism. New and strange, unknown,
fumbling, my mind hardly able to tell
what subtle provocations meant, shown
in stipled fading purple, inkjet blown,
on pages already aged when I first
touched them and, virginal, brought them home.
Like the child I was I struggled to birth
new sensuous faculties, learned to perch
full, on the edge of abstract orgasms.
To caress the invisible shapes, to lurch
towards clasping ideas in ersatz spasms.
  I still remember my naive eros,
  from which a lifetime of pleasures arose.

Thursday, August 11, 2011

A Survey of Syntactic Extension Techniques in the Lisp Family of Languages (Part 2)

Last time we began our survey of syntactic extension technqiues in Lisp with a rather airy meditation on syntax, quotation, evaluation and lambda, all by way of picoLisp's unusual dynamic, interpreter oriented semantics. Very briefly, we looked at the way picoLisp lets regular, first class, functions specify that certain arguments should be passed in unevaluated, which the function can then selectively evaluate. Because the language is dynamically scoped and interpreted, this gets you as far as macros, and a little farther (since even "special forms" are first class).

This time, we're going to attempt to be a little more direct in our survey. But we should still think a little bit about macros before we jump in. In the Lisps we'll talk about today, syntactic extension is handled via macros. Given the flexibility of the picoLisp approach (also known as fexpr's, and also supported by newLisp, which is itself an exotic species I might write about later), you might wonder why we'd want to use macros. The principal obvious disadvantage of macros is that they are not first class objects, they can't be stored in variables or passed around, whereas picoLisp style functions/special forms are. There are lots of reasons they were abandoned, but one easy one to see is that macros make separating different stages of program interpretation and execution a bit easier.

Consider executing an fexpr. In order to correctly calculate the value of such a first class function, we need to remember not just the flow of values from one function to another, we also have to know the syntax of all the inputs. Much of the efficiency of compilation is in the ability to disregard this information, and reduce the program to a series of simple, primitive, operations. If our language is lexically scoped, then, in addition to remembering the syntax of the inputs, we also need to know the context of that syntax in order to correctly evaluate it. I'm surely oversimplifying things here, but you can read about the problems with fexpr's here (the commentary on this subject at Lambda the Ultimate is also great)1. Kazimir Majorinc, by the way, has written a rebuttal of Pitman's original concerns. I have to admit, I find newLisp in particular to be a fascinating case study in alternative approaches to Lisp and computer programming languages in general. More people should try it out!

If we stage our syntactic extensions, as in more modern/typical Lisps, after program reading, but before execution, it simplifies the design of our compiler/virtual machine/interpreter. The macro systems we'll be discussing today operate in this way, transforming an intermediate form of Lisp code, produced by the reader, before finally handing the result to the compiler. The upside is the compiler need only concern itself with the behavior of basic lisp and its small set of special forms. The downside is that macros are stuck working only on source code: they cannot, in general, know anything about runtime, as the code has not yet been run!

Transformation

The Arc Documentation has something rather insightful to say about this:

Macros live in the land of the names, not the land of the things they refer to.

So, today we'll be looking at macro systems by which the user declares that certain symbols indicate to the intermediate processing step of the language that some portion of code should be transformed by a specified piece of code and the result inserted back into the code representation where the macro appeared.

Business Time!

Macro syntax can be a bit confusing when you see it for the first time. It is useful, then, to consider what a macro might do rather than how it is written, first. Macros should almost always transform code into code, with no side effects along the way, and so we can understand a macro by writing the code before and then the code after macro expansion.

If you know some Lisp, you know that variables are introduced with let epxressions. Each binding expression in a let is executed without any of the other bindings in scope, so:

(let ((x 10)
      (y (+ x 1)))
    (+ y x))

Will produce an error, because x is not visible until the body portion of the let expression. If you want to nest references to new variables, you need let*:

(let* ((x 10)
       (y (+ x 1)))
   (+ y x)) ; -> 21

Which does work. Let's write a let* macro as an example. Expressions like let* are all about [binding][], and that means we can use lambdas to express let*. We want:

A:

(my-let* ((x 10)
          (y (+ x 1)))
      (+ y x))

To expand to:

B:

(funcall (lambda (x) 
   (funcall (lambda (y) (+ y x)) (+ x 1)))
   10)

Take a second to think about it, if you don't see why. The basic insight is that lambda can be used to introduce a context where a variable is bound, and funcall binds that variable.

The transformation of some code A to some other code B is the action of all but a few exotic macros. This transformation happens, conceptually, before any code is run (in practice, because many Lisps favor interactive development, macro expansion might occur after the user has executed some code, but it is highly unusual, and probably an error, for the macro expansion itself to depend on the results of previous execution (except that the macro may use previously defined functions to effect the code transformation)).

Let's write the macro, this time in Emacs Lisp:

(defmacro my-let* (bindings &rest body)
  (cond 
    ((empty? bindings) (cons 'progn body))
    (t
     (let ((pair (car bindings))
           (leftover-bindings (cdr bindings)))
      `(funcall 
         (lambda (,(car pair))
           (my-let* ,leftover-bindings ,@body))
         ,(cadr pair))))))

(my-let* ((x 10)
          (y (+ x 1)))
     (+ y x)) ;-> 21

We need to say a bit about quasi-quotation to really understand how this works, but there are some things we can explain right away. Emacs Lisp represents code as a list of symbols, other atoms, and lists. When my-let* is encountered, the source code after my-let* in the code is passed, as a list, into a function (implicitly defined by the defmacro statement. This list is destructured by this function using regular old argument specification (so that the first item in the list is bound to bindings and all the subsequent items are bound to body, in a list) and then the body of the defmacro is executed. It's result must be a piece of code, which is inserted wherever the macro appeared in the source code representation.

Let's focus on the first possibility, when the macro is called with an empty set of binding forms. Our code is:

(my-let* () "Hey there.")

This is slurped up into the Lisp interpreter as:

(list 'my-let* '() "Hey there.")

The interpreter sees my-let* and knows that this indicates a macro. To expand this macro, it calls a function based on the defmacro expression. This function doesn't have a name, but we can write what it would look like:

(defun -my-let*-expander- (binders &rest body)
   (cond 
    ((empty? bindings) (cons 'progn body))
    (t
     (let ((pair (car bindings))
           (leftover-bindings (cdr bindings)))
      `(funcall 
         (lambda (,(car pair))
           (my-let* ,leftover-bindings ,@body))
         ,(cadr pair))))))

The macro expanding part of the lisp system calls this function with the following arguments:

(-my-let*-expander- '() "Hey There")

This function returns:

'(progn "Hey There")

And the macro expanding part of the lisp system inserts that expression, whole hog, so to speak, into the place where the original form starting with my-let* was located. It then moves on. Once all the macros are expanded, the code is passed to the compiler/interpreter/etc and then executed. Viola!

Viola!
Voila!

To understand the other branches, we need to cover a feature which appears in most lisps: quasiquotation.

Macros & Quasiquotation

Lisps, more or less, represent code internally as lists or/of atoms (things like numbers, strings, symbols). The code x represents, more or less, a piece of code that evalutes to the value of the symbol x. The list (+ x 1) is the piece of code that evaluates to the value of symbol x plus one. We covered last time how to enter these unevaluated pieces of code into a running lisp session using quotation. The quote operator tells a Lisp not to evaluate whatever it is applied to. So:

(setq x 10)
x  ;-> 10
'x ;-> x
(+ x 1) ;-> 11
'(+ x 1) ;-> (+ x 1)

Quote is ok when we want to write a tiny bit of code-as-data, but when we write macros, we often want to interpolate between data and code in a more dynamic way. We could, of course, use list to create our code:

(list '+ 'x 10) ;-> (+ x 10)

Because list is a function, each argument is evaluated - we quote manually whatever arguments we want to leave unevaluated as we construct our code.

Quasiquotation is kind of the opposite of list. A quasiquoted expression, by default, doesn't evaluate its inputs except when they are unquoted. Quasiquote is indicated by the back-quote character:

`

And, within a quasiquoted form, unquotation is indicated by a ,, which kind of makes sense. Quasiquotation works pretty much identically in Emacs and Common Lisp, so fire one of them up and try it out. You can use quicklisp to install Common Lisp quickly.

`(+ x ,(+ 10 11 12)) ;-> (+ x 33)

Should work in either Common Lisp or Emacs. Note, however, that different Lisps handle symbols slightly differently. In SBCL, a Common Lisp implementation, 'x will evaluated to X, and symbols are not case-sensitive by default. Symbols are case sensitive in Emacs Lisp.

Sometimes you want to interpolate the contents of a list into a piece of code, in which case you say ,@ instead of ,:

`(+ ,@(list 1 2 3)) ;-> (+ 1 2 3)

Without concerning ourselves with namespaces or packages (of which Emacs has neither) Clojure macros operate in the same way as Emacs/Common Lisp macros, but with some slight changes in the way operations are indicated:

(defmacro my-let* [bindings & body]
  (cond
   (empty? bindings) (cons 'do body)
   :else
   (let [[symbol value-expr & leftover-bindings] bindings]
     `((fn [~symbol] (my-let* ~leftover-bindings ~@body))
       ~value-expr))))

This piece of code reads just like the Common/Emacs Lisp except for a few details. The first is that the argument list to the macro is specified using a Clojure vector rather than a list. Unlike in Common/Emacs Lisp, the default Clojure reader is able to read more than lists and symbols. The [] syntax indicates a Clojure persistent vector. It is read into the code representation as a vector, rather than as some kind of list or atom. This is both an important and a trivial difference. Consider in Emacs Lisp:

(vector 1 2 3)

This indeed evaluates to a vector with elements 1 2 3. But it is read as a list whose head is the symbol vector. Only upon evaluation do we get a vector.

(vectorp '(vector 1 2 3)) ;-> nil
(listp   '(vector 1 2 3)) ;-> t

In Clojure, by contrast:

(vector? '[1 2 3]) ;-> true
(list?   '[1 2 3]) ;-> false

(But see this footnote:2).

(As an aside, Clojure also supports tables as part of its code representation, in keeping with its intent of expanding Lisp's philosophy to data structures other than lists.)

In Clojure, do is how you say progn, both of which introduce a form which evaluates all its parts, and returns the result of the last. Next, we follow the convention that redundant parentheses in Clojure should be elided. Because we know bindings contains a list of pairs, we just read that list by two, rather than as an actual list of pairs. You see this in Clojure's let form, which has one less layer of nesting.

Emacs:

(let* ((x 10) (y 11)) (+ x y))

Clojure:

(let [x 10 y 11] (+ x y))

(Note that picoLisp also elides redundant parentheses, but does not use vectors for binding, also note that Clojure doesn't have let*, let has that behavior).

Also by convention, we use vectors for the binding part of any form (although this macro doesn't check for that). We say fn instead of lambda in Clojure, and we use ~ for the unquote operation. In Clojure, ~@ is the way you say ,@.

Other than those differences, the Clojure macro has the same behavior as the Emacs/Common Lisp Macro. It creates an invisible function which is used to expand code tagged with that macro name during post-read processing, and then the code is passed to the Clojure compiler. It is worth noting here that Brian Goslinga has ported Scheme-style syntax-case/rules macros to Clojure, but it isn't clear to me without further study whether they truly are hygeinic. Claims of macro hygiene are often exagerrated outside of the Scheme universe.

Issues with Naive Code-Rewriting Macros

The best way to understand the motivation for Scheme hygeinic macros, as well as some of the differences between more or less conventional macro systems in other Lisps is to understand how things can go wrong. Last time we talked extensively about scope and how it effects the way we look at a piece of data that represents code. The upshot was that in languages where scope is dynamic, which means variables evaluate to whatever the current binding of the variable is at the moment of evaluation, can simply represent code (and particularly the meaning of symbols) as just lists of atoms and symbols. A piece of code "means" whatever it is you get by evaluating it in the context of the current bindings from symbols to values. Period.

Lexically scoped languages, on the other hand, impose more strenuous semantics on code. In a lexically-scoped language, variables refer to the lexical environment (the environment around them "on the page") when they are evaluated. Hence, in a lexically scoped language, a "naked" piece of code, such as the code produced by quotations, is "impoverished" - it doesn't record in any way the lexical environment in which the quotation was created, and so it is, in some sense, "meaningless," at least the extent that it contains symbols which aren't bound.

In a dynamic language, the code fragment consisting of the single symbol 'x means "Whatever value x has when you evaluate me." This representation is complete, but depends on the evaluator. In a lexically scoped language, where symbols, when they appear in code, are implicitly associated by the rules of evaluation with a lexical context they were originally created in, 'x is meaningless. It looks like a piece of code, but in a real sense it isn't quite one.

Emacs/Common Lisp/Clojure style macros, however, operate on pieces of "code" produced by ordinary quotation - that is, they operate on "hypocode," say, which just means something not quite code.

Making a Mess

Let's make a mess, by way of example. Suppose we have an object system wherin objects are collections of key -> value relations in a persistent data structure, like an association list3. We will work in Emacs/Common Lisp.

Preliminaries:

(defun empty? (x)
  (eq x nil))

(defun set-slot (obj symbol val &optional acc)
  (cond 
   ((empty? obj) 
    (cons (cons symbol val) (reverse acc)))
   (t
    (let* ((first (car obj))
           (o-key (car first))
           (rest (cdr obj)))
      (if (eq symbol o-key) 
          (append (reverse acc) (cons (cons symbol val) rest))
        (set-slot rest symbol val (cons first acc)))))))

(defun get-slot (obj symbol)
  (cond
   ((empty? obj) nil)
   ((eq (car (car obj)) symbol)
    (cdr (car obj)))
   (t
    (get-slot (cdr obj) symbol))))

(get-slot (set-slot (set-slot (set-slot '() 'x 10) 'x 11) 'y 14) 'y)

Methods will just be functions whose first argument is the self object.

(defun make-person (first last) 
  `((:first-name . ,first)
    (:last-name . ,last)))

(defun change-first-name (self new-first)
   (set-slot self :first-name new-first))

(defun change-last-name (self new-last)
   (set-slot self :last-name new-last))

(note the use of backquote outside of a macro.)

Our objects are "pure" in this example - the methods change-first-name and change-last-name don't modify self, they return a fresh self object with the appropriate changes. I prefer this behavior, but it makes chaining method application clumsy. We'll develop a macro to make method chaining for pure objects easier.

the Bionic Woman
I don't know the Bionic Woman, but she comes up when you google "Woman".

I know a woman who changed both her first and last names when she got married. She went from Ami Culbert to Amy Klein (she had always resented her parents for the unusual spelling of her first name). We've got to nest our method calls or use a let to affect both renamings:

(let ((new-self 
       (change-first-name (make-person "Ami"
                                       "Culbert") "Amy")))
   (change-last-name new-self "Klein"))
 ; -> ((:first-name . "Amy") (:last-name . "Klein"))

Let's write a macro which automatically threads an object through method invokation. It should expand:

(with-object o 
  (change-first-name "Amy")
  (change-last-name "Klein"))

into something like:

(let ((new-self 
       (change-first-name o "Amy")))
   (change-last-name new-self "Klein"))

People often complain, inaccurately, I think, that macros are "hard to debug" because they aren't regular functions. It is true they aren't invoked as regular functions in the ordinary course of events, but one can certainly write the "business end" of the macro as a regular function and just have the macro definition pass the quoted inputs to this function appropriately. Then you can interactively test the macro expansion, build unit tests, etc.

(defun with-object-expander (object-expr method-applications)
  `(let* ((object ,object-expr)
          ,@(mapcar 
             (lambda (methap) 
               `(object (,(car methap) object ,@(cdr methap)))) 
            method-applications))
        object))

(with-object-expander 'test-object 
 '((change-first-name "Amy")
   (change-last-name "Klein"))) ;-> 
  (let* ((object test-object) 
         (object (change-first-name object "Amy")) 
         (object (change-last-name object "Klein"))) 
     object)

If you are new to Lisp macro writing, this probably looks like it works just fine. If you are a hoary old Lisper (and I know some of you are), then this probably looks like one of those ridiculous examples from safety videos where a goofy guy climbs up a ladder, or some such, with an aerial around some high tension power lines, which is exactly what it is. Let's sally forth like Goofus, however:

(defmacro with-object (object-expression &rest method-exprs)
  (with-object-expander object-expression method-exprs))

(with-object (make-person "Ami" "Culbert")
  (change-first-name "Amy")
  (change-last-name "Klein")) ; -> 
   ((:first-name . "Amy") (:last-name . "Klein"))

Oh Goofus
Goofus probably programs in Perl. Just sayin'.

Indeed, apparent success! However, imagine if we are instead taking the new last name from an object representing Ami's fiance:

(let ((object (make-person "Jason" "Klein")))
  (with-object (make-person "Ami" "Culbert")
    (change-first-name "Amy")
    (change-last-name (get-slot object :last-name)))) ; ->
  ((:first-name . "Amy") (:last-name . "Culbert"))

What gives? Amy's last name should be "Klein" but it is stil "Culbert". What happened? A quick look at the macro expansion of the entire expression would help, but how do we get such a thing? Emacs/Common/Clojure Lisps provide macro-expansion tools which take an expression and expand macros inside of it:

(macroexpand-all 
 '(let ((object (make-person "Jason" "Klein")))
  (with-object (make-person "Ami" "Culbert")
    (change-first-name "Amy")
    (change-last-name (get-slot object :last-name))))) ;->

    (let ((object (make-person "Jason" "Klein")))
      (let* ((object (make-person "Ami" "Culbert"))
     (object (change-first-name object "Amy"))
     (object (change-last-name object 
               (get-slot object :last-name))))
     object))

Now we can see what the problem is. We bound Amy's fiance to the symbol object, but our macro expansion also used that symbol internally, to represent the object being threaded through the method applications, which means it was bound to the Amy person by the time the call to get-slot was executed on object. Amy's last name was just her last name, so the change-last-name call changed nothing!

Note that this is, essentially, an error of context or variable interpretation. Our mental model was that the object symbol used during macro expansion was unrelated to the object symbol used in the subsequent let expression. But the Lisp has no way of knowing that. Because we are manipulating naked code, a symbol is just a symbol, and the behavior of our complete expansion is just whatever behavior it looks like it would have if someone had written it by hand. Scheme's macro system is designed to resolve this problem by separating contexts more aggressively, so that macro expansion doesn't get in the way of later variable binding unless you specifically want it to.

We don't, however, have hygeinic macros in the Lisps we are talking about today, so how do we avoid the problem specified above? There are at least two ways which are in common use. The first is to force the invoker of a macro to provide any names which might be used during expansion, so they are forced to acknowledge that those names will be rebound by the macro. The second is to make sure that the macro uses symbols which you are certain will never be used by the programmer who invokes the macro.

The former strategy looks like this:

(defmacro with-object-bound-to (sym obj &rest meths)
  `(let* ((,sym ,obj)
          ,@(mapcar
            (lambda (m)
              `(,sym (,(car m) ,sym ,@(cdr m)))) 
            meths))
        ,sym))

Invoked as:

(let ((object (make-person "Jason" "Klein")))
      (with-object-bound-to amy 
        (make-person "Ami" "Culbert")
        (change-first-name "Amy")
        (change-last-name (get-slot object :last-name))))

Which has the intended effect. However, if you don't intend on using the value bound to amy at any time, you'd probably prefer regular old with-object.

We could implement with object by just trying to use a crazy symbol for object, something we'd hope the users of our macro will never think to use. For instance, we could prepend the letter s to the md5 hash of the word "object": sa8cfde6331bd59eb2ac96f8911c4b666:

(defun with-object-expander (object-expr method-applications)
  `(let* ((sa8cfde6331bd59eb2ac96f8911c4b666 ,object-expr)
          ,@(mapcar 
             (lambda (methap) 
               `(sa8cfde6331bd59eb2ac96f8911c4b666 (,(car methap)
  sa8cfde6331bd59eb2ac96f8911c4b666 
  ,@(cdr methap)))) 
            method-applications))
        sa8cfde6331bd59eb2ac96f8911c4b666))

And that would probably work. However, Lisps usually provide a facility to generate symbols guaranteed not to be used (subject to the understanding that if symbols are read at run-time, all bets are off - althouh it is still unlikely there will be a problem). In Common and Emacs Lisp, this is accomplished via the function gensym. In these Lisps, we'd write:

(defun with-object-expander (object-expr method-applications)
  (let ((object-name (gensym "object-")))
  `(let* ((,object-name ,object-expr)
          ,@(mapcar 
             (lambda (methap) 
               `(,object-name (,(car methap) ,object-name ,@(cdr methap)))) 
            method-applications))
        ,object-name)))

The critical points are that we've introduced a variable object-name which, at macroexpansion time, is bound to a fresh symbol. We then insert that symbol wherever object used to be by unquoting it into out expanded expression.

Turns out this feature is so commonly used that Clojure provides special support for it - during backquote expansion, symbols terminated with a # character are automatically bound to appropriately generated symbols. By appropriate we mean that object# will always be expanded to the same symbol within a single backquote. In this example, since some of the uses of the object symbol are generated programmatically, we'd still need to use gensym, which Clojure also provides.

Namespaces, Packages, Hygiene

All this gensym falderal is meant to help us ensure our macros are hygienic, which informally means that in the body of a macro, the code you write behaves the way you expect it to behave, modulo whatever changes in meaning the macro implies AND that when the macro expands, it means what the macro writer meant it to mean, regardless of what symbols mean where the macro is expanded.. Variable capture, the example we looked at above, is only one of many possible hygiene complications. Another possibility is that a macro (the dependent macro, lets call it) might depend on another macro, which itself is defined one way when the dependent macro is written, but might be redefined later when someone else tries to expand the dependent macro. It's very hard to write macros that are hygienic in this sense in the Lisps we've talked about so far, but some of them do provide the faculties to at least help a bit.

Emacs Lisp is the oddball in this department, however - it has neither packages or namespaces or any other module system4.

Both Clojure and Common Lisp provide some conception of modules or bags of symbols. In Common Lisp these are called packages, in Clojure they are called namespaces. In both cases, macro writers can use namespaces to help avoid macro expansion problems. Consider the case where some joker redefines the let* special form. Your macro wants to use let* as it is ordinarily defined by Common Lisp. He wants to use your macro, but wants to put code in it that depends on his crazy new definition of let*. If you write your macro (expander) as:

(defun with-object-expander (object-expr method-applications)
  (let ((object-name (gensym "object-")))
  `(cl::let* ((,object-name ,object-expr)
                ,@(mapcar 
             (lambda (methap) 
               `(,object-name (,(car methap) ,object-name ,@(cdr methap)))) 
            method-applications))
        ,object-name)))

You can have your cake and the user of your macro can eat it too. The key changes is that we've qualified our let* symbol with the package it lives in. cl::let* means we want Lisp to use the meaning of let* as defined in the package cl or common-lisp (packages can have nicknames in Common Lisp). If our user redefines let* in his own package, then he can use our macro without any problems.

Clojure takes this one step further. Backquote automatically expands symbols to their namespace-qualified names, so macros automatically expand to refer to the special forms as defined in whatever namespaces they are defined in when the macro itself was defined. Material unquoted into the macro expansion is expanded appropriate to the context it was introduced in. The net result is that someone redefines let in their own package, and calls your macro, it automatically expands to use the let from the base Clojure package:


`(let [x 10 ~'y 11] (+ x y)) ;->
 (clojure.core/let [user/x 10 (quote user/y) 11] 
    (clojure.core/+
       user/x 
       user/y))

I haven't specifically mentioned Arc throughout all this, but Arc macros basically follow the same design as those in Common/Emacs Lisp and Clojure. If they are more similar to any of the above than any other, it is Emacs Lisp, since Arc doesn't appear to have, at the moment, a namespace or package system.

Conclusions

Good Lisp programmers use macros sparingly, for the most part, and so it is difficult to think up examples when you'd need more macro hygiene then what you get in Lisps with gensym and packages and namespaces. In particular, Clojure seems to provide enough protection that you won't accidentally shoot yourself in the foot unless you wander off the beaten path.

However, you might wonder if we can't come up with a more formal syntactic extension scheme which really protects us in a simple to understand way. Something that makes macro hygiene as easy to conceptualize as variable hygiene for regular code? There are lots of ways to approach this problem (more than I know of, certainly) but next time we'll discuss the way Scheme does it. Scheme syntax-rules/case macros are very different compared to defmacro style forms, and their intent is to make hygiene the default behavior, so that complex macros (that, for instance, depend on one another) can be defined easily without much extra manual fiddling (gensym, packages, etc). As we'll see, the solutions are related to the difference between lexical and dynamical behaviors, a theme which keeps coming up!


1 See also Kernel.

2 I must point out that Emacs Lisp, but not Common Lisp, has a similar read syntax:

[a b c] ;-> [a b c]

Which is sort of like the vector syntax except that it reads the elements of the vector literally, as if they were quoted. The emacs evaluator will not evaluate the elements: when it encounters a vector in the code it is evaluating, it just returns the vector. The Clojure evaluator encounters a vector, and returns the vector containing the result of evaluating each element in the vector.

In emacs:

(let ((a 10) (b 11) (c 12)) [a b c]) ;-> [a b c]

In Clojure:

(let [a 10 b 11 c 12] [a b c]) -> [10 11 12]

I wish Emacs had the Clojure behavior (or an extensible reader, like Common Lisp).

3

An association list is a list of pairs (created with cons). Each pair consists of a symbol and a value.

4

I estimate that the emacs session I'm running right now, to write this piece, has about 25,000 functions and symbols bound. The fact that emacs works at all without namespaces, with just this giant soup of symbols, is amazing and maybe even indicative of some kind of blind spot in our understanding of software development. Maybe worse is better.