Last time we began our survey of syntactic extension techniques in Lisp with a rather airy meditation on syntax, quotation, evaluation and lambda, all by way of picoLisp's unusual dynamic, interpreter-oriented semantics. Very briefly, we looked at the way picoLisp lets regular, first-class functions specify that certain arguments should be passed in unevaluated, which the function can then selectively evaluate. Because the language is dynamically scoped and interpreted, this gets you as far as macros, and a little farther (since even "special forms" are first class).
This time, we're going to attempt to be a little more direct in our survey. But we should still think a little bit about macros before we jump in. In the Lisps we'll talk about today, syntactic extension is handled via macros. Given the flexibility of the picoLisp approach (also known as fexprs, and also supported by newLisp, which is itself an exotic species I might write about later), you might wonder why we'd want to use macros. The principal obvious disadvantage of macros is that they are not first-class objects: they can't be stored in variables or passed around, whereas picoLisp-style functions/special forms can. There are lots of reasons fexprs were abandoned, but one that is easy to see is that macros make it a bit easier to separate the different stages of program interpretation and execution.
Consider executing an fexpr. In order to correctly calculate the value of such a first-class function, we need to remember not just the flow of values from one function to another, but also the syntax of all the inputs. Much of the efficiency of compilation lies in the ability to disregard this information and reduce the program to a series of simple, primitive operations. If our language is lexically scoped, then, in addition to remembering the syntax of the inputs, we also need to know the context of that syntax in order to correctly evaluate it. I'm surely oversimplifying things here, but you can read about the problems with fexprs in Pitman's original critique (the commentary on this subject at Lambda the Ultimate is also great)1. Kazimir Majorinc, by the way, has written a rebuttal of Pitman's original concerns. I have to admit, I find newLisp in particular to be a fascinating case study in alternative approaches to Lisp and computer programming languages in general. More people should try it out!
If we stage our syntactic extensions, as in more modern/typical Lisps, after program reading, but before execution, it simplifies the design of our compiler/virtual machine/interpreter. The macro systems we'll be discussing today operate in this way, transforming an intermediate form of Lisp code, produced by the reader, before finally handing the result to the compiler. The upside is the compiler need only concern itself with the behavior of basic lisp and its small set of special forms. The downside is that macros are stuck working only on source code: they cannot, in general, know anything about runtime, as the code has not yet been run!
The Arc Documentation has something rather insightful to say about this:
Macros live in the land of the names, not the land of the things they refer to.
So, today we'll be looking at macro systems in which the user declares that certain symbols mark code for transformation: when the intermediate processing step of the language encounters such a symbol, the form it heads is handed to a user-specified piece of code, and the result is inserted back into the code representation where the macro appeared.
Macro syntax can be a bit confusing when you see it for the first time. It is useful, then, to consider first what a macro does rather than how it is written. Macros should almost always transform code into code, with no side effects along the way, so we can understand a macro by writing out the code before and the code after macro expansion.
If you know some Lisp, you know that variables are introduced with let expressions. Each binding expression in a let is evaluated without any of the other bindings in scope, so:
(let ((x 10) (y (+ x 1))) (+ y x))
will produce an error (assuming x has no outer binding), because x is not visible until the body portion of the let expression. If you want later bindings to refer to earlier ones, you need let*:
(let* ((x 10) (y (+ x 1))) (+ y x)) ; -> 21
Which does work. Let's write a let* macro as an example. let* is all about binding, and that means we can use lambdas to express let*. We want:
(my-let* ((x 10) (y (+ x 1))) (+ y x))
To expand to:
(funcall (lambda (x) (funcall (lambda (y) (+ y x)) (+ x 1))) 10)
Take a second to think about it, if you don't see why. The basic
insight is that
lambda can be used to introduce a context where a
variable is bound, and funcall binds that variable.
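As a quick sanity check of that insight, the two forms below should evaluate to the same value in Emacs Lisp. This is only a sketch to try in a scratch buffer; the variable names via-let* and via-lambda are made up for illustration and are not part of anything discussed here.

```elisp
;; The built-in sequential binding form.
(setq via-let*
      (let* ((x 10)
             (y (+ x 1)))
        (+ y x)))

;; The same computation using only lambda and funcall:
;; each lambda introduces one variable, each funcall supplies its value.
(setq via-lambda
      (funcall (lambda (x)
                 (funcall (lambda (y) (+ y x))
                          (+ x 1)))
               10))

(list via-let* via-lambda) ; both elements are 21
```

Reading the nested version from the outside in mirrors reading the let* bindings from top to bottom.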
The transformation of some code A to some other code B is the action of all but a few exotic macros. This transformation happens, conceptually, before any code is run (in practice, because many Lisps favor interactive development, macro expansion might occur after the user has executed some code, but it is highly unusual, and probably an error, for the macro expansion itself to depend on the results of previous execution (except that the macro may use previously defined functions to effect the code transformation)).
Let's write the macro, this time in Emacs Lisp:
;; empty? is not built into Emacs Lisp, so define it first.
(defun empty? (x) (eq x nil))
(defmacro my-let* (bindings &rest body)
  (cond ((empty? bindings) (cons 'progn body))
        (t (let ((pair (car bindings))
                 (leftover-bindings (cdr bindings)))
             `(funcall (lambda (,(car pair))
                         (my-let* ,leftover-bindings ,@body))
                       ,(cadr pair))))))
(my-let* ((x 10) (y (+ x 1))) (+ y x)) ;-> 21
We need to say a bit about quasi-quotation to really understand how this works, but there are some things we can explain right away. Emacs Lisp represents code as a list of symbols, other atoms, and lists. When my-let* is encountered, the source code after it in the form is passed, as a list, into a function (implicitly defined by the defmacro statement). This list is destructured by this function using regular old argument specification (so that the first item in the list is bound to bindings and all the subsequent items are bound to body, in a list) and then the body of the macro is executed. Its result must be a piece of code, which is inserted wherever the macro appeared in the source code representation.
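To see that a macro really does receive its arguments as unevaluated source, destructured just like a function's argument list, here is a tiny sketch. The macro show-args is hypothetical and exists only for this illustration; it mirrors the (bindings &rest body) argument list used above.

```elisp
;; `show-args' is a made-up macro for illustration only.
;; It expands into a quoted copy of the forms it was handed, so
;; evaluating a call shows the raw source the expander received.
(defmacro show-args (first-form &rest others)
  `(quote (,first-form ,others)))

;; None of the argument forms is evaluated: no addition happens,
;; no message is printed, and the unbound variable causes no error.
(show-args (+ 1 2) (message "never runs") undefined-variable)
;; → ((+ 1 2) ((message "never runs") undefined-variable))
```

The first form arrives on its own and everything after it arrives collected in a list, which is exactly how bindings and body are populated.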
Let's focus on the first possibility, when the macro is called with an empty set of binding forms. Our code is:
(my-let* () "Hey there.")
This is slurped up into the Lisp interpreter as:
(list 'my-let* '() "Hey there.")
The interpreter sees my-let* and knows that this indicates a macro. To expand this macro, it calls a function based on the defmacro expression. This function doesn't have a name, but we can write what it would look like:
(defun -my-let*-expander- (bindings &rest body)
  (cond ((empty? bindings) (cons 'progn body))
        (t (let ((pair (car bindings))
                 (leftover-bindings (cdr bindings)))
             `(funcall (lambda (,(car pair))
                         (my-let* ,leftover-bindings ,@body))
                       ,(cadr pair))))))
The macro-expanding part of the Lisp system calls this function with the following arguments:
(-my-let*-expander- '() "Hey there.")
This function returns:
(progn "Hey there.")
And the macro-expanding part of the Lisp system inserts that expression, whole hog, so to speak, into the place where the original form starting with my-let* was located. It then moves on. Once all the macros are expanded, the code is passed to the compiler/interpreter/etc. and then executed. Voilà!
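You can watch a single expansion step happen with Emacs's macroexpand-1, which takes a quoted form and expands its outermost macro once. The sketch below restates the macro with the built-in null in place of empty?, purely so that the snippet stands alone.

```elisp
;; A restated copy of the macro from the text, using the built-in
;; `null' so that this snippet is self-contained.
(defmacro my-let* (bindings &rest body)
  (cond ((null bindings) (cons 'progn body))
        (t (let ((pair (car bindings))
                 (leftover-bindings (cdr bindings)))
             `(funcall (lambda (,(car pair))
                         (my-let* ,leftover-bindings ,@body))
                       ,(cadr pair))))))

;; One expansion step of the empty-bindings case.
(macroexpand-1 '(my-let* () "Hey there."))
;; → (progn "Hey there.")

;; One expansion step when a binding remains; the inner my-let* is
;; left in place for the next round of expansion.
(macroexpand-1 '(my-let* ((x 10)) (+ x 1)))
;; → (funcall (lambda (x) (my-let* nil (+ x 1))) 10)
```

Note that the second result still contains a my-let* form: the expander only rewrites one layer at a time, and the Lisp system keeps expanding until no macros remain.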
To understand the other branches, we need to cover a feature which appears in most lisps: quasiquotation.
Macros & Quasiquotation
Lisps, more or less, represent code internally as lists and atoms (things like numbers, strings, symbols). The code x is, more or less, a piece of code that evaluates to the value of the symbol x. The list (+ x 1) is the piece of code that evaluates to the value of symbol x plus one. We covered last time how to enter these unevaluated pieces of code into a running Lisp session using quotation. The quote operator tells a Lisp not to evaluate whatever it is applied to. So:
(setq x 10)
x          ;-> 10
'x         ;-> x
(+ x 1)    ;-> 11
'(+ x 1)   ;-> (+ x 1)
Quote is ok when we want to write a tiny bit of code-as-data, but when we write macros, we often want to interpolate between data and code in a more dynamic way. We could, of course, use list to create our code:
(list '+ 'x 10) ;-> (+ x 10)
Because list is a function, each argument is evaluated - we must quote manually whatever arguments we want to leave unevaluated as we construct our code.
Quasiquotation is kind of the opposite of list. A quasiquoted expression, by default, doesn't evaluate its inputs except when they are unquoted. Quasiquote is indicated by the back-quote character, `. And, within a quasiquoted form, unquotation is indicated by a comma, which kind of makes sense. Quasiquotation works pretty much identically in Emacs and Common Lisp, so fire one of them up and try it out. (Quicklisp makes it quick to get going with Common Lisp libraries.)
`(+ x ,(+ 10 11 12)) ;-> (+ x 33)
This should work in either Common Lisp or Emacs. Note, however, that different Lisps handle symbols slightly differently. In SBCL, a Common Lisp implementation, 'x will evaluate to X, because the reader upcases symbol names by default, which makes symbols effectively case-insensitive. Symbols are case-sensitive in Emacs Lisp.
Sometimes you want to interpolate the contents of a list into a piece of code, in which case you say ,@ instead of a plain comma:
`(+ ,@(list 1 2 3)) ;-> (+ 1 2 3)
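To make the relationship between list and quasiquotation concrete, the following sketch builds the same pieces of code both ways and compares them. The variable names (by-hand-1, quasi-1, and so on) are invented for this example.

```elisp
;; Building the same code two ways: by hand with list/cons,
;; and with quasiquotation.

;; Unquote (,) corresponds to leaving an argument of `list' unquoted.
(setq by-hand-1 (list '+ 'x (+ 10 11 12)))
(setq quasi-1   `(+ x ,(+ 10 11 12)))

;; Unquote-splicing (,@) corresponds to consing onto an existing list.
(setq args (list 1 2 3))
(setq by-hand-2 (cons '+ args))
(setq quasi-2   `(+ ,@args))

(list (equal by-hand-1 quasi-1)   ; t, both are (+ x 33)
      (equal by-hand-2 quasi-2))  ; t, both are (+ 1 2 3)
```

The quasiquoted versions look like the code they produce, which is most of the reason macro writers prefer them.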
Without concerning ourselves with namespaces or packages (of which Emacs has neither) Clojure macros operate in the same way as Emacs/Common Lisp macros, but with some slight changes in the way operations are indicated:
(defmacro my-let* [bindings & body]
  (cond
    (empty? bindings) (cons 'do body)
    :else (let [[symbol value-expr & leftover-bindings] bindings]
            `((fn [~symbol]
                (my-let* ~leftover-bindings ~@body))
              ~value-expr))))
This piece of code reads just like the Common/Emacs Lisp except for a few details. The first is that the argument list to the macro is specified using a Clojure vector rather than a list. Unlike in Common/Emacs Lisp, the default Clojure reader is able to read more than lists and symbols. The [] syntax indicates a Clojure persistent vector. It is read into the code representation as a vector, rather than as some kind of list or atom. This is both an important and a trivial difference. Consider in Emacs Lisp:
(vector 1 2 3)
This indeed evaluates to a vector with elements
1 2 3. But it is
read as a list whose head is the symbol
vector. Only upon
evaluation do we get a vector.
(vectorp '(vector 1 2 3)) ;-> nil
(listp '(vector 1 2 3))   ;-> t
In Clojure, by contrast:
(vector? '[1 2 3]) ;-> true
(list? '[1 2 3])   ;-> false
(But see footnote 2.)
(As an aside, Clojure also supports maps, its tables, as part of its code representation, in keeping with its intent of expanding Lisp's philosophy to data structures other than lists.)
In Clojure, do is how you say progn; both introduce a form which evaluates all its parts and returns the result of the last. Next, we follow the convention that redundant parentheses in Clojure should be elided. Because we know bindings contains a list of pairs, we just read that list two items at a time, rather than as an actual list of pairs. You see this in Clojure's let form, which has one less layer of nesting.
(let* ((x 10) (y 11)) (+ x y))
(let [x 10 y 11] (+ x y))
(Note that picoLisp also elides redundant parentheses, but does not use vectors for binding. Also note that Clojure doesn't need a let*, because let has that behavior.)
Also by convention, we use vectors for the binding part of any binding form (although this macro doesn't check for that). We say fn instead of lambda in Clojure, and we use ~ for the unquote operation. In Clojure, ~@ is the way you say ,@.
Other than those differences, the Clojure macro has the same behavior as the Emacs/Common Lisp macro. It creates an invisible function which is used to expand code tagged with that macro name during post-read processing, and then the code is passed to the Clojure compiler. It is worth noting here that Brian Goslinga has ported Scheme-style syntax-case/rules macros to Clojure, but it isn't clear to me without further study whether they truly are hygienic. Claims of macro hygiene are often exaggerated outside of the Scheme universe.
Issues with Naive Code-Rewriting Macros
The best way to understand the motivation for Scheme's hygienic macros, as well as some of the differences between more or less conventional macro systems in other Lisps, is to understand how things can go wrong. Last time we talked extensively about scope and how it affects the way we look at a piece of data that represents code. The upshot was that languages where scope is dynamic, which means variables evaluate to whatever the current binding of the variable is at the moment of evaluation, can simply represent code (and particularly the meaning of symbols) as just lists of atoms and symbols. A piece of code "means" whatever it is you get by evaluating it in the context of the current bindings from symbols to values. Period.
Lexically scoped languages, on the other hand, impose more strenuous semantics on code. In a lexically scoped language, variables refer to the lexical environment (the environment around them "on the page") when they are evaluated. Hence, in a lexically scoped language, a "naked" piece of code, such as the code produced by quotation, is "impoverished" - it doesn't record in any way the lexical environment in which the quotation was created, and so it is, in some sense, "meaningless," at least to the extent that it contains symbols which aren't bound.
In a dynamic language, the code fragment consisting of the single symbol 'x means "Whatever value x has when you evaluate me." This representation is complete, but depends on the evaluator. In a lexically scoped language, where symbols, when they appear in code, are implicitly associated by the rules of evaluation with the lexical context they were originally created in, 'x is meaningless. It looks like a piece of code, but in a real sense it isn't quite code.
Emacs/Common Lisp/Clojure style macros, however, operate on pieces of "code" produced by ordinary quotation - that is, they operate on "hypocode," say, which just means something not quite code.
Making a Mess
Let's make a mess, by way of example. Suppose we have an object system wherein objects are collections of key -> value relations in a persistent data structure, like an association list3. We will work in Emacs/Common Lisp.
(defun empty? (x) (eq x nil))

;; Return a fresh alist like obj, but with symbol associated to val.
(defun set-slot (obj symbol val &optional acc)
  (cond ((empty? obj) (cons (cons symbol val) (reverse acc)))
        (t (let* ((first (car obj))
                  (o-key (car first))
                  (rest (cdr obj)))
             (if (eq symbol o-key)
                 (append (reverse acc) (cons (cons symbol val) rest))
               (set-slot rest symbol val (cons first acc)))))))

;; Return the value associated with symbol in obj, or nil.
(defun get-slot (obj symbol)
  (cond ((empty? obj) nil)
        ((eq (car (car obj)) symbol) (cdr (car obj)))
        (t (get-slot (cdr obj) symbol))))

(get-slot (set-slot (set-slot (set-slot '() 'x 10) 'x 11) 'y 14) 'y) ;-> 14
Methods will just be functions whose first argument is the object:
(defun make-person (first last)
  `((:first-name . ,first) (:last-name . ,last)))
(defun change-first-name (self new-first)
  (set-slot self :first-name new-first))
(defun change-last-name (self new-last)
  (set-slot self :last-name new-last))
(Note the use of backquote outside of a macro.)
Our objects are "pure" in this example - the methods change-first-name and change-last-name don't modify self, they return a fresh self object with the appropriate changes. I prefer this behavior, but it makes chaining method application clumsy. We'll develop a macro to make method chaining for pure objects easier.
I know a woman who changed both her first and last names when she got married. She went from Ami Culbert to Amy Klein (she had always resented her parents for the unusual spelling of her first name). We've got to nest our method calls or use a let to effect both changes:
(let ((new-self (change-first-name (make-person "Ami" "Culbert") "Amy")))
  (change-last-name new-self "Klein"))
; -> ((:first-name . "Amy") (:last-name . "Klein"))
Let's write a macro which automatically threads an object through method invocation. It should expand:
(with-object o (change-first-name "Amy") (change-last-name "Klein"))
into something like:
(let ((new-self (change-first-name o "Amy"))) (change-last-name new-self "Klein"))
People often complain, inaccurately, I think, that macros are "hard to debug" because they aren't regular functions. It is true they aren't invoked as regular functions in the ordinary course of events, but one can certainly write the "business end" of the macro as a regular function and just have the macro definition pass the quoted inputs to this function appropriately. Then you can interactively test the macro expansion, build unit tests, etc.
(defun with-object-expander (object-expr method-applications)
  `(let* ((object ,object-expr)
          ,@(mapcar (lambda (methap)
                      `(object (,(car methap) object ,@(cdr methap))))
                    method-applications))
     object))
(with-object-expander 'test-object
                      '((change-first-name "Amy")
                        (change-last-name "Klein")))
;-> (let* ((object test-object)
;          (object (change-first-name object "Amy"))
;          (object (change-last-name object "Klein")))
;     object)
If you are new to Lisp macro writing, this probably looks like it works just fine. If you are a hoary old Lisper (and I know some of you are), then this probably looks like one of those ridiculous examples from safety videos where a goofy guy climbs up a ladder, or some such, with an aerial around some high tension power lines, which is exactly what it is. Let's sally forth like Goofus, however:
(defmacro with-object (object-expression &rest method-exprs)
  (with-object-expander object-expression method-exprs))
(with-object (make-person "Ami" "Culbert")
  (change-first-name "Amy")
  (change-last-name "Klein"))
; -> ((:first-name . "Amy") (:last-name . "Klein"))
Indeed, apparent success! However, imagine that we instead take the new last name from an object representing Ami's fiancé:
(let ((object (make-person "Jason" "Klein")))
  (with-object (make-person "Ami" "Culbert")
    (change-first-name "Amy")
    (change-last-name (get-slot object :last-name))))
; -> ((:first-name . "Amy") (:last-name . "Culbert"))
What gives? Amy's last name should be "Klein" but it is still "Culbert". What happened? A quick look at the macro expansion of the entire expression would help, but how do we get such a thing? Emacs/Common/Clojure Lisps provide macro-expansion tools which take an expression and expand macros inside of it:
(macroexpand-all
 '(let ((object (make-person "Jason" "Klein")))
    (with-object (make-person "Ami" "Culbert")
      (change-first-name "Amy")
      (change-last-name (get-slot object :last-name)))))
;-> (let ((object (make-person "Jason" "Klein")))
;     (let* ((object (make-person "Ami" "Culbert"))
;            (object (change-first-name object "Amy"))
;            (object (change-last-name object (get-slot object :last-name))))
;       object))
Now we can see what the problem is. We bound Amy's fiancé to the symbol object, but our macro expansion also used that symbol internally, to represent the object being threaded through the method applications, which means it was bound to the Amy person by the time the call to get-slot was executed on object. The last name we fetched was just Amy's own last name, so the change-last-name call changed nothing!
Note that this is, essentially, an error of context or variable interpretation. Our mental model was that the object symbol used during macro expansion was unrelated to the object symbol used in the enclosing let expression. But the Lisp has no way of knowing that. Because we are manipulating naked code, a symbol is just a symbol, and the behavior of our complete expansion is just whatever behavior it looks like it would have if someone had written it by hand. Scheme's macro system is designed to resolve this problem by separating contexts more aggressively, so that macro expansion doesn't get in the way of later variable binding unless you specifically want it to.
We don't, however, have hygienic macros in the Lisps we are talking about today, so how do we avoid the problem described above? There are at least two ways which are in common use. The first is to force the invoker of a macro to provide any names which might be used during expansion, so they are forced to acknowledge that those names will be rebound by the macro. The second is to make sure that the macro uses symbols which you are certain will never be used by the programmer who invokes the macro.
The former strategy looks like this:
(defmacro with-object-bound-to (sym obj &rest meths)
  `(let* ((,sym ,obj)
          ,@(mapcar (lambda (m)
                      `(,sym (,(car m) ,sym ,@(cdr m))))
                    meths))
     ,sym))
(let ((object (make-person "Jason" "Klein")))
  (with-object-bound-to amy (make-person "Ami" "Culbert")
    (change-first-name "Amy")
    (change-last-name (get-slot object :last-name))))
; -> ((:first-name . "Amy") (:last-name . "Klein"))
Which has the intended effect. However, if you don't intend on using the value bound to amy at any time, you'd probably prefer the regular with-object.
We could implement with-object by just trying to use a crazy symbol instead of object, something we'd hope the users of our macro will never think to use. For instance, we could prepend the letter s to the md5 hash of the word "object", giving sa8cfde6331bd59eb2ac96f8911c4b666:
(defun with-object-expander (object-expr method-applications)
  `(let* ((sa8cfde6331bd59eb2ac96f8911c4b666 ,object-expr)
          ,@(mapcar (lambda (methap)
                      `(sa8cfde6331bd59eb2ac96f8911c4b666
                        (,(car methap) sa8cfde6331bd59eb2ac96f8911c4b666 ,@(cdr methap))))
                    method-applications))
     sa8cfde6331bd59eb2ac96f8911c4b666))
And that would probably work. However, Lisps usually provide a facility to generate symbols guaranteed not to be used (subject to the understanding that if symbols are read at run-time, all bets are off - although it is still unlikely there will be a problem). In Common and Emacs Lisp, this is accomplished via the function gensym. In these Lisps, we'd write:
(defun with-object-expander (object-expr method-applications)
  (let ((object-name (gensym "object-")))
    `(let* ((,object-name ,object-expr)
            ,@(mapcar (lambda (methap)
                        `(,object-name (,(car methap) ,object-name ,@(cdr methap))))
                      method-applications))
       ,object-name)))
The critical point is that we've introduced a variable, object-name, which, at macroexpansion time, is bound to a fresh symbol. We then insert that symbol wherever object used to be by unquoting it into our expanded expression.
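Here is a compact, self-contained way to see the difference gensym makes. Both macros below are hypothetical, trimmed-down variants of with-object written just for this comparison, and plain + stands in for the person methods (any function that takes the object as its first argument can play the role of a method). It assumes an Emacs recent enough to have gensym built in.

```elisp
;; Both macros are made-up variants of with-object for illustration.

;; Uses the literal symbol `object' in its expansion, so it can
;; capture a variable of the same name in the caller's code.
(defmacro with-object-leaky (object-expr &rest method-applications)
  `(let* ((object ,object-expr)
          ,@(mapcar (lambda (methap)
                      `(object (,(car methap) object ,@(cdr methap))))
                    method-applications))
     object))

;; Uses a fresh symbol from `gensym', which cannot collide with
;; any symbol the caller wrote.
(defmacro with-object-safe (object-expr &rest method-applications)
  (let ((object-name (gensym "object-")))
    `(let* ((,object-name ,object-expr)
            ,@(mapcar (lambda (methap)
                        `(,object-name (,(car methap) ,object-name ,@(cdr methap))))
                      method-applications))
       ,object-name)))

;; The caller's own variable happens to be called `object'.
(let ((object 100))
  (list (with-object-leaky 1 (+ 1) (+ object))   ; caller's object is shadowed
        (with-object-safe  1 (+ 1) (+ object)))) ; caller's object is visible
;; → (4 102)
```

In the leaky version the threaded value reaches 2 and is then added to itself, because object now names the threaded value; in the safe version the same step adds the caller's 100.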
It turns out this feature is so commonly used that Clojure provides special support for it - during backquote expansion, symbols terminated with a # character are automatically replaced with appropriately generated symbols. By appropriate we mean that object# will always be expanded to the same symbol within a single backquote. In this example, since some of the uses of the object symbol are generated programmatically, inside a separate backquote, we'd still need to use gensym, which Clojure also provides.
Namespaces, Packages, Hygiene
All this gensym falderal is meant to help us ensure our macros are hygienic, which informally means two things: that in the body of a macro, the code you write behaves the way you expect it to behave, modulo whatever changes in meaning the macro implies; and that when the macro expands, it means what the macro writer meant it to mean, regardless of what symbols mean where the macro is expanded. Variable capture, the example we looked at above, is only one of many possible hygiene complications. Another possibility is that a macro (the dependent macro, let's call it) might depend on another macro, which itself is defined one way when the dependent macro is written, but might be redefined later when someone else tries to expand the dependent macro. It's very hard to write macros that are hygienic in this sense in the Lisps we've talked about so far, but some of them do provide the facilities to at least help a bit.
Emacs Lisp is the oddball in this department, however - it has neither packages nor namespaces nor any other module system4.
Both Clojure and Common Lisp provide some conception of modules or bags of symbols. In Common Lisp these are called packages, in Clojure they are called namespaces. In both cases, macro writers can use them to help avoid macro expansion problems. Consider the case where some joker redefines the let* special form. Your macro wants to use let* as it is ordinarily defined by Common Lisp. He wants to use your macro, but wants to put code in it that depends on his crazy new definition of let*. If you write your macro (expander) like this:
(defun with-object-expander (object-expr method-applications)
  (let ((object-name (gensym "object-")))
    `(cl::let* ((,object-name ,object-expr)
                ,@(mapcar (lambda (methap)
                            `(,object-name (,(car methap) ,object-name ,@(cdr methap))))
                          method-applications))
       ,object-name)))
You can have your cake and the user of your macro can eat it too. The key change is that we've qualified our let* symbol with the package it lives in. cl::let* means we want Lisp to use the meaning of let* as defined in the package common-lisp, whose nickname is cl (packages can have nicknames in Common Lisp). If our user redefines let* in his own package, then he can use our macro without any problems.
Clojure takes this one step further. Backquote automatically expands symbols to their namespace-qualified names, so macros automatically expand to refer to the special forms as defined in whatever namespaces they are defined in when the macro itself was defined. Material unquoted into the macro expansion is expanded appropriate to the context it was introduced in. The net result is that if someone redefines let in their own namespace, and calls your macro, it automatically expands to use the let from the base Clojure namespace:
`(let [x 10 ~'y 11] (+ x y))
;-> (clojure.core/let [user/x 10 y 11] (clojure.core/+ user/x user/y))
I haven't specifically mentioned Arc throughout all this, but Arc macros basically follow the same design as those in Common/Emacs Lisp and Clojure. If they are more similar to any of the above than any other, it is Emacs Lisp, since Arc doesn't appear to have, at the moment, a namespace or package system.
Good Lisp programmers use macros sparingly, for the most part, and so it is difficult to think up examples where you'd need more macro hygiene than what you get in Lisps with gensym and packages and namespaces. In particular, Clojure seems to provide enough protection that you won't accidentally shoot yourself in the foot unless you wander off the beaten path.
However, you might wonder if we can't come up with a more formal syntactic extension scheme which really protects us in a simple-to-understand way: something that makes macro hygiene as easy to conceptualize as variable hygiene for regular code. There are lots of ways to approach this problem (more than I know of, certainly) but next time we'll discuss the way Scheme does it. Scheme syntax-rules/case macros are very different compared to defmacro-style forms, and their intent is to make hygiene the default behavior, so that complex macros (that, for instance, depend on one another) can be defined easily without much extra manual fiddling (gensym, packages, etc.). As we'll see, the solutions are related to the difference between lexical and dynamic behaviors, a theme which keeps coming up!
1 See also Kernel.
2 I must point out that Emacs Lisp, but not Common Lisp, has a similar read syntax:
[a b c] ;-> [a b c]
Which is sort of like the vector syntax except that it reads the elements of the vector literally, as if they were quoted. The emacs evaluator will not evaluate the elements: when it encounters a vector in the code it is evaluating, it just returns the vector. The Clojure evaluator encounters a vector, and returns the vector containing the result of evaluating each element in the vector.
(let ((a 10) (b 11) (c 12)) [a b c]) ;-> [a b c]
(let [a 10 b 11 c 12] [a b c]) ;-> [10 11 12]
I wish Emacs had the Clojure behavior (or an extensible reader, like Common Lisp).
3 An association list is a list of pairs (created with cons), where each pair consists of a symbol and a value.
4 I estimate that the Emacs session I'm running right now, to write this piece, has about 25,000 functions and symbols bound. The fact that Emacs works at all without namespaces, with just this giant soup of symbols, is amazing and maybe even indicative of some kind of blind spot in our understanding of software development. Maybe worse is better.