Monday, March 17, 2008

Common Lisp Reader Macros Pt 2: Packages and Complications

Last time I explained the basics of reader macros in Common Lisp, defining a special syntax for vector application of functions and a short lambda declaration as a bit of syntactic sugar. It's this latter construct:

[(+ X 1)] 
;; becomes/expands to (lambda (X) (+ X 1))
;; And
[a b c -> (* a (+ b c) (- b c))] 
;; becomes/expands to (lambda (a b c) (* a (+ b c) (- b c)))
which we will discuss further today. In the last entry, I also included the hopefully not too obscure warning "A word of caution: Macros are always tricky beasts, and extending Lisp this way may only seem like a good idea. In practice, it introduces all sorts of tricky issues." Today some of those tricky issues will become apparent as we approach the subject of Common Lisp Packages. Much of the information I am going to discuss is covered in The Complete Idiot's Guide to Common Lisp Packages. This article can be thought of as an extended example. Programming, as a practical problem, is often about marshaling complexity. Most constructs in programming languages (procedures, closures, objects, etc) are all at least partly ways of encapsulating little bits of code in the hopes that as the system grows, interactions between logically separate computations remain simple to understand. To support this at levels higher than functions and objects, Common Lisp provides the concept of packages. We will begin by thinking of packages as just lumps of code and then refine our understanding as we encounter problems. We want to create a package to contain all of our utility functions so we can use them in various projects without worrying much about name collisions. In Common Lisp we create a package (which is a first order value) with the defpackage macro.

CL-USER> (defpackage 
  #:my-utils 
  (:use #:common-lisp) 
  (:use #:common-lisp-user))
#
This creates a package named my-utils and declares that inside this package, all the symbols exported by the packages common-lisp and common-lisp-user will be available (these are the default symbols available at your average REPL). What does the #: macro character do? That is what I asked myself the first time I tackled packages in Common Lisp. We will get to the answer, but for now just think of it as a special way to introduce a symbol (bearing in mind that defpackage is a macro and its arguments are quoted). Now that we have created a package lets put some functions into it. We do that using the in-package macro.

CL-USER> (in-package #:my-utils)
MY-UTILS>
Your prompt will change to indicate that you now are working inside the my-utils package. Any symbols or functions introduced here will be visible only in this package. Lets add a function to the package. Suppose we frequently want to get the CDR of a list stored in a list of lists or symbols if its a list and its CAR is a particular symbol. That is, get-tagged-list-body should do the following:

MY-UTILS> (let ((descriptor 
                  '(dog (paws 
                          front-left
                          front-right 
                          back-left 
                          back-right))))
(get-tagged-list-body 'paws descriptor))
(FRONT-LEFT FRONT-RIGHT BACK-LEFT BACK-RIGHT)
We can define such a function easily:

(defun get-tagged-list (tag lst)
  (car (member-if 
         (lambda (el) (and (listp el) (eq tag (car el)))) 
         lst)))
(defun get-tagged-list-body (tag lst)
  (cdr (get-tagged-list tag lst)))
N.B. member-if Now we can use the function inside the package.

MY-UTILS> (get-tagged-list-body 
            'dog 
            '(x (dog a b c)
              y (cat q r s)))
(A B C)
Now we want to make it available to users outside the package with the export function.

MY-UTILS> (export 'get-tagged-list-body)
We can then use it outside the package thusly:

MY-UTILS> (in-package #:common-lisp-user) ;; Go back to the CL-USER package.
CL-USER> (my-utils:get-tagged-list-body 'dog '(x (dog a b c)))
(A B C)
Note that we just prefix the function call with the package name and : (:: accesses unexported symbols). Simple enough, neglecting the business with #: and exactly how packages manage this trick. Now, though, lets go back to our short function call macro. We might easily have written get-tagged-list in the following way:

(defun get-tagged-list (tag lst)
  (car (member-if 
        [el -> (and (listp el) (eq tag (car el)))] 
        lst)))
Using our (perhaps silly) short-lambda macro. Lets assume that we defined the macro and its associated functions in the (default) CL-USER package. As a reminder, here is the code:

(defun short-lambda-split (elements)
(let ((args
  (loop while (and
         (not
     (eq (car elements) '->))
         elements)
  collect
    (pop elements))))
  (if (not elements)
  ; There is no arg list
    `(lambda (x) ,@args)
    `(lambda (,@args) ,@(cdr elements)))))

(set-macro-character #\[
(lambda (stream char)
 (let* ((lst
        (read-delimited-list #\]
     stream t)))
  (short-lambda-split lst))))

(set-macro-character #\] 
  (get-macro-character #\)))
Since we defined these in the CL-USER package and MY-UTILS uses this package (see defpackage, then we should be able to go back into the MY-UTILS package and defun the above version of get-tagged-list and be good to go. If we do this, though, something surprising happens - something which will teach us a lot about the way packages (and the reader) work. Go back into MY-UTILS and redefine get-tagged-list as above. Then switch back to CL-USER and execute:

CL-USER> (my-utils:get-tagged-list-body 'dog '(x (dog a b c)))
When I do this, Slime under SBCL complains like this:

The variable MY-UTILS::EL is unbound.
   [Condition of type UNBOUND-VARIABLE]

Restarts:
 0: [ABORT] Return to SLIME's top level.
 1: [ABORT] Exit debugger, returning to top level.

Backtrace:
  0: ((LAMBDA (X)) #)
  1: (MEMBER-IF # (X (DOG A B C)))
  2: (MY-UTILS::GET-TAGGED-LIST DOG (X (DOG A B C)))
  3: (MY-UTILS:GET-TAGGED-LIST-BODY DOG (X (DOG A B C)))
  4: (SB-INT:SIMPLE-EVAL-IN-LEXENV (MY-UTILS:GET-TAGGED-LIST-BODY (QUOTE DOG) (QUOTE (X #))) #)
  5: (SWANK::EVAL-REGION "(my-utils::get-tagged-list-body 'dog '(x (dog a b c)))
     ")
  6: ((LAMBDA NIL))
So what is going on here? It actually isn't totally obvious where the problem might be from the somewhat terse error message and stack trace, so I will save you the trouble of process of elimination that I went through. The thing to do is to expand our reader macro in the MY-UTILS package, since that is where the expansion is happening when we evaluate the function. This may give us an idea of what the issue is. Remember, reader macros are expanded by the reader before any evaluation (during the read), so we can see the expansion of a Macro with just a quote.

MY-UTILS> '[(a b c) -> (+ a b c)]
(LAMBDA (COMMON-LISP-USER::X) (A B C) -> (+ A B C))
What has happened here is that the short-lambda-split function has somehow not found the -> symbol, and has decided that the form it read, which is manifestly (a b c -> (+ a b c)), is a short-lambda without an explicit argument list. So it inserts what it believes to the body (the entire expression) into the a lambda with only one argument (here somewhat mysteriously expanded into COMMON-LISP-USER::X). Turns out that COMMON-LISP-USER::X is the real hint we need to figure out what is going on, although our initial instinct might not point us there. What is happening is that the Reader is expanding the macro with a function written in the COMMON-LISP-USER package. Since we are not in that package, Lisp is telling us in this quoted expression that the symbol X produced with a quotating and inserted into the macro list to be exported is actually a symbol which belongs to the COMMON-LISP-USER package, not the MY-UTILS one. This is the key. The code testing whether a symbol in the macro expression list is EQ to the -> symbol is actually properly understood as comparing to COMMON-LISP-USER::->. I believe this is because the quote expression has been evaluated at definition (that is, lexically) or at compile time, resulting in the symbol being interned in the package the code is compiled/defined in. (I would love it if a Lisper could tell me which). What does interning mean? To understand, you have to understand that a package is a collection of symbols, and an interned symbol is one that belongs to a package. By default, Common Lisp interns all symbols entered into the reader with a quote into the current package. So the reader is interning the -> symbol into the package CL-USER, where the split-short-lambda function is defined. Hence, EQ is detecting that COMMON-LISP-USER::-> is not the same as MY-UTIL::->, which actually makes sense - they are different - that is the entire point of packages. (N.B. the MY-UTIL:: is implicit and not printed by the printer in our examples since we are "in" the MY-UTIL package, but the whole short-lambda list is made of symbols automatically interned by the reader into the current package.) So we want to force the split-short-lambda function to intern certain symbols at evaluation time, and into the current package, rather than at "definition" time. We can do this my modifying it in the following way:

(defun short-lambda-split (elements)
(let ((args
  (loop while (and
         (not
     (eq (car elements) (intern "->")))
         elements)
  collect
    (pop elements))))
  (if (not elements)
  ; There is no arg list
    `(lambda (,(intern "X")) ,@args)
    `(lambda (,@args) ,@(cdr elements)))))
Here we use calls to the function intern to force the creation of symbols interned in the current package. Now if we execute our macro, it gives us what we expect. Note that we also interned the symbol x in the no argument case for (basically) the same reasons.

MY-UTILS> '[a b c -> (+ a b c)]
(LAMBDA (A B C) (+ A B C))
What have we learned? We've learned that macros (reader macros, in this case) are hard to debug, and that they interact with the reader and the currently defined packages in strange ways. We've also learned that the package system works by grouping symbols into different packages, and it also interacts with the reader in perhaps non-intuitive ways. Given these complexities, the question to ask is obviously is it really worth five few keystrokes every time we want to write an anonymous function? That depends on how you feel about it and the problems you are trying to solve, but at least you have the choice. One more thing! I promised we would revisit the #: macro character and now that we have an operational understanding of packages, we can figure out what is going on. If you look on the sharpsign page of the hyperspec, we read that #: indicates an "uninterned symbol". The reason we use #: instead of regular symbol is so that we don't pollute our packages with symbols, which are otherwise automatically interned as they are encountered by the reader. Using #: seems to be the conventional way of being a little more tidy about your packages. Questions, comments and (most importantly) corrections are welcome.

7 comments:

Anonymous said...

You wrote:

CL-USER> (my-utils::get-tagged-list-body 'dog '(x (dog a b c)))

But since you've exported GET-TAGGED-LIST-BODY from MY-UTILS, you should write (my-utils:get-tagged-list-body ...) --- notice the single colon. That's sort of the whole point of exporting symbols.

J.V. Toups said...

Oops!

M Jared Finder said...

The '-> expression got interned at /READ/-time, when the SHORT-LAMBDA-SPLIT function was typed into the repl.

' is just a read macro, that returns (quote {call read recursively}), and quote returns whatever its arg is with no modification.

Incidentally, I'd prefer to check that the symbol-name is EQUAL, instead of interning a symbol, because the value of *PACKAGE* may not be the package being read in.

J.V. Toups said...

jared, thanks for the information and suggestion. Will EQUAL really decide that two symbols interned in separate packages are the same?

(defpackage :a)
(defpackage :b)
(equal 'A::x 'B::x) -> NIL
(equal 'X 'B::X) -> NIL

When I try it. Maybe I don't see what you mean, though.

J.V. Toups said...

Oops, I mean #:a and #:b.

John Connors said...

This is a really good pair of articles, it's shed light on something that was previously pretty obscure for me, thanks!

Fae said...

This discussion may be pretty old, but I came across it while reading about reader macros and thought I'd clarify one point.

M Jared Finder was suggesting the use of symbol-names for comparison purposes, in this fashion:

(equal (symbol-name 'A::x) (symbol-name 'B::x)) -> t