Wednesday, December 26, 2012

On Lisps

My recent release of Gazelle, a Javascript Lisp dialect, has, unsurprisingly, produced some commentary about my choice to implement the system in Emacs Lisp. Coincidentally, this also comes on the tail of a long (and mostly civil, I might add) discussion I had in the Lisp sub-reddit about the merits of including in our rhetorical ecosystem discussion about unusual Lisp dialects like Newlisp and Picolisp, in which I argued that its useful to discuss and even program in these dialects. Many people express strong opinions to the contrary, and while some of the objections relate to aspects of Newlisp's implementation, like One Reference Only Memory Management, to which, as a dabbler in language implementation, I cannot speak, a lot of the distaste for these dialects, and for Emacs Lisp, seems to relate to their use of dynamic scope.

Now, in general, I like my Lisps static. A lot of what I fiddle with in Lisp has to do with emulating features I like in Haskell (poorly), and Haskell hews pretty close to orthodoxy on the scope issue. My favorite Lisp is probably Racket, and the only support it provides for dynamic-like bindings is through parameters. In short, I am no fan of dynamic binding, and was delighted as anyone when Emacs Lisp began to support lexical binding natively, and would be happy to program without it in any language.

However, I do not view dynamic scope as the kiss of death like many people seem to, especially when programming in Lisp dialects. Of which more later. I'd like to make the case that perhaps, just maybe, we should think of default dynamic scope in a Lisp as a wart, rather than language nullifying feature.

Why do I feel this way?

 Well, it has something to do with Concatenative Languages.

What?

Consider the following diagram:
------------------------------------------------------
dynamicish                                   lexicish
+--------+  +--------+      +-----------+   +--------+
|factor  |  |picolisp|      |common lisp|   |haskell |
|        |  |        |      |           |   |        |
+--------+  +--------+      +-----------+   +--------+
 +--------+    +---------+        +-------+
 |joy     |    |newlisp  |        |scheme |
 |        |    |         |        |       |
 +--------+    +---------+        +-------+
Where the left hand side of each box indicates the position of the language inside on a sloppily conceived of axis going from "dynamicish" to "lexicish", ideas which we will try to pin down as we go. (If, by the way, any actual computer scientists are reading this, I apologize in advance for the conceptual butchery I am no doubt in the process of committing.)

One thing this diagram reflects is the intuitive degree to which we can depend on the nature of binding in the language - Picolisp and Newlisp here obviously favor dynamic binding, Common Lisp supports both well, Scheme favors lexical bindings somewhat more than Common Lisp, and a language like Haskell is pretty much straight lexical binding. (Exceptions abound of course, given the power of these languages to add features.)

What is interesting is that if we change the above diagram by relabling the axes:
------------------------------------------------------
code is simple data            code is not simple data
+--------+  +--------+      +-----------+   +--------+
|factor  |  |picolisp|      |common lisp|   |haskell |
|        |  |        |      |           |   |        |
+--------+  +--------+      +-----------+   +--------+
 +--------+    +---------+        +-------+
 |joy     |    |newlisp  |        |scheme |
 |        |    |         |        |       |
 +--------+    +---------+        +-------+

We don't really need to move the boxes around very much. This is particularly true if we make ourselves clearer and say that code is data when the "official" result of evaluating a piece of "code data" that represents a piece of code is encodable in that data in a simple form. Examples will clarify that horrible monster of a sentence, hopefully.

Concatenative Languages and Quotations

Put aside for the moment the oddness of concatenative languages, and questions which may be appearing in your mind about why anyone would want to program in one. The point here isn't to convince you of their merits. The point is that concatenative languages (except Forth), as it lacks quotations) provide an example of languages where "code is data" is particularly true, and where that data is particularly simple while still retaining a consistent meaning.
Concatenative languages are based on the composition of functions which map one stack onto another stack (usually). Because all functions in such languages have the same interface, one writes a concatenative program as a list of functions, as in this piece of Factor code:

1 1 +

Which evaluates to "2" and is read from left to right as "take the stack and return it with 1 pushed onto it", do that again, and then "take the stack, return a new stack with the result of the first two items removed and added together". Note the utter absence of variable bindings (functions do have names, of course). This (arguably punishing) simplicity makes evaluation unusually simple: begin at the left with an empty stack, and call each "word" on the stack until you run out of functions. We can represent a program as a flat list of lexemes, each of which denotes a function. So the representation of code can be a list and its meaning is simple.

Here is what is interesting about these languages to me: they lack a notion of variable binding, and so you might think they they lack anonymous functions, which they do, strictly speaking. However, because code is easily represented as a list, lists serve the same role as anonymous functions and in languages like Factor, these fragments of code (called quotations because they are introduced via something like quote in Lisps) are constantly exploited to perform the equivalent of "functional programming" idioms in Factor. For instance, to add one to each number in a list of numbers, one writes in Factor:

{ 1 2 3 } [ 1 + ] map

Or, less succinctly, but more evocatively:

{ 1 2 3 } 1 [ + ] curry map

Where I have just shown that one can curry, really, partially apply, and compose (not illustrated) quotations.

The question one must ask is what do we really need for functional programming? We must be able to apply functions, compose them, and partially apply them. Many, but not all, uses of lexical scope in more lexical languages reduce transparently to these operations. We might write in Scheme:

(map (curry + 1) (list 1 2 3))

For instance, with (a simple) curry being implemented as:

(define (curry f a)
  (lambda (b) (f a b)))

Where we are exploiting the lexical scope introduced by curry to stick two values into the lambda.
In a dynamic Lisp, we might say instead, as I often say in Emacs Lisp when I don't want to engage the machinery of the old lexical-let implementation in cl.el:

(defun curry (f a)
 `(lambda (b) (funcall #',f ,a)))

Code is Sort of Data

Factor uses a special data type called a quotation for its, um, quotations, but this has more to do with type safety and compiltion than with the semantics of what a quotation is. It is still basically a list, so let us imagine that the implementation of the curry word is:

: curry ( item quotation -- quotations ) cons ;

That is, effectively, curry is cons. It adds an element to the front of a list.
Now, the dynamic, Emacs Lisp version

(defun curry (f a)
 `(lambda (b) (funcall #',f ,a))) ; thanks to coroi on reddit 
for pointing out an error 
                                  ; here. 

Has something in common with the Factor version. They both operate on and return a simple representation of code, a list. The extra complexity of Emacs Lisp lies in its evaluator. For instance:

(funcall (curry #'+ 1) 1)

is evaluated in the following way: the call to curry is evaluated, returning a list. 1 evaluates to 1, of course, and then funcall does the following. The evaluator can see that the first argument is a lambda because its head is lambda. That means that it extends the dynamic environment with a binding of 1 to b, and then recursively evaluates the body. That means the evaluator is carrying around an environment which it needs to resolve the meaning of code. Still, its pretty simple: just (conceptually) a list of stacks for each possible variable binding with the latest binding on the tops.

The Scheme code, in contrast:

(define (curry f a)
 (lambda (b) (f a b)))

Doesn't deal with any simple data structure. f is a procedure, and its representation is complex, hidden and has a lot to do with compiling the Scheme code, probably. The return value, as the result of a lambda expression, is also a procedure, and similarly inscrutible. The compiler probably does some efficient fiddling to encode in the lambda itself where the references f and a point to: in all likelihood, the compiled code removes the mention of specific symbols entirely. Even though the lambda represents a piece of code in some sense, as it is executable, its representation is now quite complicated. We can still say "Code is data" of Scheme lambda expressions, but it is not very transparent or interesting data.

What about Quotations?

In Factor, quotations and lambdas are the same thing, because there are no symbols to bind in any scopes. Higher order functions and macros are basically the same!
In Emacs Lisp quotations are pretty close to lambdas too: if they happen to have as their head a lambda, then they are, subject to some constraints, actually lambda expressions. If not, you can still call eval on them and if all of their free symbols are located in the dynamic environment that you call eval from, then you get a "meaningful" result. In picolisp, where this philosophy is entirely embraced, one can write higher order functions that do the same thing as macros because they can inspect code passed in.

In Scheme, a quotation introduces syntax, primarily because you want to use quotation during macro construction and hygienic macros need to know more than just the list-structure of the code they work on to ensure hygiene. So already the representation of code in just a quotation is more complicated. This is the primary reason that Scheme is "more staticy/lexical" than Common Lisp, in which a quotation really does produce just a regular list. Scheme wants to ensure macro hygeine, and for that reason static or lexical concerns must even enter into your primary code represantation. In Common Lisp there is a macro system which is distinct from the function system, but allows you to use regular list operations to build the generated code. In Scheme the macro system is somewhat more cumbersome conceptually, with special operations on "syntax" objects to enforce lexical hygeine. Complexity has crawled deep into the language.

Finally, in a language like Haskell, you don't really have quotations at all. The representation of any kind of code has become fairly complex and generally not worth fiddling with (template metahaskell aside).

What does this mean?

The point of the above discussion is twofold:
  1. Languages lie on a continuum between dynamic and static and that positions on that continuum have complexity and flexibility payoffs and penalties. The very dynamic languages pick up a lot of flexibility (which is good and bad) and the static languages pick up a lot of complexity, but get certain conceptual guarantees and speed in exchange.
  2. Especially in Lisps, which provide facilities for meta-programming, this distinction isn't extremely important. Heck, even Factor has an implementation of lambda with lexically scoped variables that compiles down to stack code. Emacs Lisp has a lexical-let form that gets you 99% of what you want. Specific features of closures that you need at any given moment can easily be built in dynamic lisps. If you want to program functionally, for instance, you can whip up good curry and compose operations in any of these languages and away you go.
I can't find the reference at the moment, but I recall reading from one of the Factor maintainer's that they use lexical variables in less than ten percent of the Factor code base. I suspect if you examine any large Lisp project you will find that for the most part the actual use of the semantics of lexical scope is small.
There is absolutely an argument to be made that dynamic scope introduces certain kinds of hard to find bugs, and I agree with that assessment. I don't think dynamic scope is optimal by any means, but the case against it is somewhat overstated. A dynamically scoped Lisp is still extremely powerful and in some ways more flexible than an efficiently compiled lexical lisp.

Actual Code

I work hard on making my Emacs Lisp code readable and clear, and so I wanted to show a bit of the implementation of Gazelle here, to drive the point home that in a Lisp the exact feature set under the hood doesn't make as big of a difference as it might seem.

(defun-match- prim:transcode ('_false)
  (prim:insert "false"))

(defun-match prim:transcode ('_true)
  (prim:insert "true"))

(defun-match prim:transcode ('_null)
  (prim:insert "null"))

(defun-match prim:transcode ('_undefined)
  (prim:insert "undefined"))

(defun-match prim:transcode ((non-kw-symbol s))
  (prim:insert (prim:mangle s)))

(defun-match prim:transcode ((keyword k))
  (prim:insert "\"")
  (prim:insert (prim:mangle (prim:kw->symbol k)))
  (prim:insert "\""))

(defun-match prim:transcode ((number n))
  (prim:insertf "%s" n))

(defun prim:transcode-string (string)
  (prim:insert "\"")
  (loop for character in (coerce string 'list) do
        (match character
               (?\n (insert "\\n"))
               (?\t (insert "\\t"))
               (?\" (insert "\\\""))
               (?\\ (insert "\\"))
               (else (insert else))))
  (prim:insert "\""))

(defun-match prim:transcode ((string s))
  (prim:transcode-string  s))

(defun-match prim:transcode ((list '_var (non-kw-symbol s) expr))
  (prim:insert "var ")
  (prim:transcode s)
  (prim:insert " = ")
  (prim:transcode expr))

(defun-match prim:transcode 
       ((list '_return 
              (list (and which 
                    (or '_for '_while '_try '_var '_=)) (tail body))))
  (recur `(,which ,@body)))

(defun-match prim:transcode 
       ((list '_return 
              (list 
               (and which (or '_throw '_continue '_break)) expression)))
  (recur `(,which ,expression)))

(defun-match prim:transcode ((list '_return expression))
  (prim:insert "return ")
  (prim:in-parens 
   (prim:transcode expression)))

This is a small excerpt from the prim module, which converts a primitive representation of Javascript in s-expressions to Javascript source code. That transcoder consists of one large function definition, split up into several invocations of defun-match (the initial defun-match- clears previous function bindings to the provided function name), each of which matches its pattern against the input and executes when the match succeeds. Matches are tried in the order in which the defun-match forms are compiled, in most cases from top to bottom. defun-match allows you to tail-recur to the same function without growing the stack, so in many cases (two shown above), prim:transcode calls itself to perform the actual transcoding.

I can only ultimately speak for myself, but I find that this code is clear, readable (once you get used to the pattern matching) and self documenting. If you can write code like this in Emacs Lisp, is it really such a crime to do so?

And given that by really thinking about the different ways to program in different Lisp and Lisp-like languages we can learn a great deal about programming, even in our favorite dialects, is it really useful to stifle conversation and development in alternative Lisps?
I really don't think so.

2 comments:

logicgrimoire said...

Wow, this is the first article that has explained dynamic vs. lexical scope in language (and using examples) that I can grok. As you note, it's usually framed as good vs. evil, with the more static camp on the side of angels. And yet a lot of useful (and even beautiful) software has been written in dynamically scoped Lisps. Thanks for the great essay.

J.V. Toups said...

Thanks!

Well, at least partly for rhetorical purposes I suppressed my own bias towards lexical scope. But basically, I stand by my assertion that the distinction is not as important as it is made out to be.