Lately I've been working on an Emacs Lisp library I'm pretty proud of,
Shadchen. It implements extensible pattern matching, somewhat
like Racket's Match facility for Emacs Lisp. In this
tutorial/introduction I'll explain how to use Shadchen's various
facilities and, most importantly, how to extend Shadchen itself with
new patterns, which is an interesting subject in and of itself,
combining compile time and run-time execution in interesting ways. If
you already know about pattern matching, feel free to jump to the end,
where I talk about writing non-trivial patterns using Shadchen's
defpattern.
The Problem Shadchen Solves
Shadchen solves several problems you may not know you may have. In
one sentence, Shadchen lets you concisely express both destructuring
and type checking for complex data structures. It can be thought of
as combining the features of cond, case and assert into one nice
package. My experience is that this collection of features helps me
write better code, since it encourages me to dilineate exactly the
kind of data a function or form expects before doing anything with
it.
Shadchen is also conservative - unless you provide a pattern that matches the input data, it will fail with a match error, so you know something is wrong before something strange happens.
For instance, suppose we were writing an interpreter for Lisp. It might look like:
(defun eval (form env)
(cond form
((symbolp form) (eval-symbol form env))
((listp form)
(case (car form)
(if (handle-if (cadr form)
(caddr form)
(caddr form) env))
(let (handle-let
(cadr form)
(cdddr form)))
...))))
Note that we have both a cond and a case here, and that after we
test our data, we destructure it. Here is a similar piece of code
using Shadchen:
(defun eval (form env)
(match form
((p #'symbolp s) (eval-symbol s env))
((list) nil)
((list 'if pred true-branch false-branch)
(handle-if pred true-branch false-branch env))
((list-rest 'let (list-rest pairs) body)
(handle-let pairs body env))
...))
(Notes: the pattern p passes when the predicate as its second
argument is true on the match value, and then matches against the
third argument. So (p #'symbolp s) matches only when form is a
symbol and then binds s to that symbol. list matches when the
input is a list and each pattern in the the list pattern matches each
corresponding element in the list. list-rest is similar but any
leftover parts of the list are matched against the final pattern.)
This code is more concise, and yet it is also much more explicit, both
in that it provides better naming for values and it provides more
explicit error checking. For instance, this version will only match
if with three expressions, where as the previous evaluator would
have been fine with the expression (if a b c d e f). This version
also asserts explicitly that the binding part of the let needs to be a
list. With a custom pattern we could also ensure it was a list of
symbol/expression pairs in almost the same space.
It takes all kinds, but I found that once I got used to programming with pattern matching, it was hard to go back.
Other Rad Features of Shadchen
match-let
Shadchen wants to let you program in a functional style. To that end,
in addition to the regular pattern matching form match, it also
provides some other, nice features. Many algorithms involve examining
some intermediate data, checking its structure somehow, and then
recursively processing the next step. Shadchen allows this kind of
thing with the Scheme-flavored match-let form.
The form match-let can be used exactly like let:
(match-let
((x 10)
(y 11))
(+ x y))
But in each "binding" pair, the symbol may be replaced with any Shadchen pattern. Eg:
(match-let
((x 10)
(y 11)
((list q r s) (list 1 2 3)))
(+ x y q r s))
Will give you 27. If any pattern fails, the form produces a match
fail error, which means you can use match-let as a let form with
tidy type checking.
Finally, a match-let form allows tail recursion. Invoking recur
in a tail position inside the form causes the match-let to be
re-entered without growing the stack. For instance:
(match-let
(((list x y) (list 0 0)))
(if (< (+ x y) 10000)
(recur (list (+ x 1) (+ x y)))
(list x y)))
Results in (141 9870) and can't blow the stack. It is an error to
invoke recur in a non-tail position, but because of limitations in
Emacs Lisp, it is difficult to enforce this statically.
defun-match
The form defun-match lets you write functions which pattern match on
their arguments and split their calculations across multiple bodies,
in a bit like the style of Shen or Haskell.
For instance, suppose we have an animal simulator, where each animal is represented by a list, the first element of which is a symbol representing the animal name. We can say:
(defun-match- vocalize ((list-rest 'cat properties))
"Cat vocalization."
(message "Meow"))
(defun-match vocalize ((list-rest 'dog properties))
"Dog vocalization."
(message "Woof"))
Then:
(vocalize '(cat :name tess))
(vocalize '(dog :name bowzer))
Functions defined with defun-match can also use recur to re-enter
themselves without growing the stack. Consider a function which
causes a list of animals to vocalize:
(defun-match- vocalize-list (nil) nil)
(defun-match vocalize-list ((cons animal animals))
(vocalize animal)
(recur animals))
recur can dispatch to any of the bodies defined for the function and
it doesn't grow the stack. It must be invoked from tail position,
though non-tail calls can be affected by simply calling the function.
(N.B. defun-match- with that dangling minus sign causes previous
bodies to be expunged before defining the indicated body.)
Extending Shadchen with defpattern
Shadchen is an extensible pattern matching facility. We can define
new patterns much in the way we define new functions, although
patterns are more like macros than functions. Let's look at a simple
example, and then I'll guide you through a more complex example I just
added to the library using the defpattern.
A quirk of Common and Emacs Lisp is that (car nil) is nil even
though nil is not a cons cell, and so does not have a car or a
cdr. I hate this behavior, because its quite evident that (cons
nil some-list) is different from nil, but car can't tell that -
the user has to do more inspection to find this out. Bugs waiting to
happen, let me tell you.
However, I'm nothing if not accommodating, and so the cons pattern in
Shadchen will, in fact, match against nil. So:
(match nil
((cons a b) (list a b)))
Will be '(nil nil). Let's define a pattern which is like cons,
but only matches against actual cons cells, into which category
nil fails to fall.
(defpattern strict-cons (car cdr)
`(p #'consp (cons ,car ,cdr)))
A defpattern body must evaluate to a legal shadchen pattern. Each
argument to the defpattern is also a shadchen pattern. So this
pattern reads "define a new pattern strict-cons, which first checks
that the match value is a cons cell using the p pattern, and then
matches the car and cdr of that cons cell against the patterns car
and cdr.".
During the expansion of a shadchen pattern matching form, user defined
patterns are looked up and their expansions are inserted into the
macro expansion. In short, defpattern allows you to define new
patterns in terms of old patterns.
This might seem very restrictive, but Shadchen provides primitive patterns that allow you to write arbitrarily complex pattern matchers that can perform rich computations on their way to rejecting or accepting a match.
Implementing concat, a non-trivial pattern
I just used defpattern to implement a pretty complex pattern,
concat and it was something of a learning experience. Writing
complex patterns definitely takes some thought and practice, but
hopefully this tutorial will bootstrap users to a point where their
own patterns can be implemented without too much pain.
What is so complicated about a concat pattern? Well, we want
concat to match the concatenation of patterns which match strings.
Eg:
(concat "dog" "cat")
Should match "dogcat". Writing a pattern that has this behavior is easy:
(defpattern concat (&rest strings)
(reduce #'concat strings))
This pattern can't match subpatterns that are anything other than strings, however. We'd really like to be able to match, for instance:
(concat (and (or "dog" "cat") which) "dog")
against either "dogdog" or "catdog", binding which to whatever the
initial string contents actually are. How can we do this?
Nailing down concat's semantics.
We want concat to function this way:
If the initial pattern is not a string, then try matching that pattern
against larger and larger substrings until either you run out of
string to match against, or you match. If you match, then match,
again using concat with the unused patterns, against whatever is
left of the string after you've removed the part that matched.
Repeat until all patterns are exhausted and then make sure the string
has been completely consumed too.
If the initial pattern is a string, then just cleave off the same length of characters from the input, and if they match, recursively match the rest. Here is the entry point:
(defpattern concat (&rest patterns)
(cond
((length=0 patterns)
"")
((length=1 patterns)
`(? #'stringp ,(car patterns)))
(:otherwise
(cond
((stringp (car patterns))
`(simple-concat ,@patterns))
(:otherwise
`(full-concat 0 ,@patterns))))))
The :otherwise has all the meat, but we defer it to to other
helper-patterns; simple-concat and full-concat. Simple concat
looks like this:
(defpattern simple-concat (&rest patterns)
(cond
((length=0 patterns)
"")
((length=1 patterns)
`(? #'stringp ,(car patterns)))
(:otherwise
(let* ((the-string (car patterns))
(static-len (length the-string)))
`(and
(p #'stringp)
(p (lambda (s)
(>= (length s) ,static-len)))
(p
(lambda (s)
(string= (substring s 0 ,static-len) ,the-string)))
(funcall (lambda (s)
(substring s ,static-len))
(concat ,@(cdr patterns))))))))
Look at the backquoted expression. It is an and pattern, which only
succeeds if all the patterns beneath it also succeed. These patterns
are (p #'stringp), which asserts that the input is a string, (p
(lambda (s) (string= (substring s 0 ,static-len) ,the-string))) which
asserts that the input is at least long enough to contain the string
we want to match against. The next form asserts that the substring of
the input equal to the pattern string in length is equal to the
pattern. If this is true, then the pattern matches, and we use the
funcall pattern to match against the rest of the string with the
leftover patterns.
The funcall pattern takes the input to the match, applies a
function to it, and then matches the output of that function
application to the pattern provided as its third slot.
full-concat is more complex. Note that when we invoke
full-concat we provide it an numerical first argument. This number
tells the pattern how far into the string to match we've looked, so it
starts at zero. After all the first pattern could match the empty
string. full-concat looks like this2:
(defpattern full-concat (pivot &rest patterns)
(assert (numberp pivot)
()
"Pivot should be a number.")
(cond
((length=0 patterns)
"")
((length=1 patterns)
`(? #'stringp ,(car patterns)))
(:otherwise
`(and
(p (lambda (s)
(>= (length s) ,pivot)))
(or
(and (funcall
(lambda (s)
(substring s 0 ,pivot))
,(car patterns))
(funcall
(lambda (s)
(substring s ,pivot))
(concat ,@(cdr patterns))))
(full-concat ,(+ pivot 1) ,@patterns))))))
Here we use and again. We first check that the input string is long
enough to grab the substring indicated by pivot. If this isn't true,
the match fails. We then use the or pattern to indicate a branch.
Either of the or patterns might succeed, but the first to do so is
the only one that will happen. The first pattern to or uses funcall
to peel off the substring of the input from 0 to pivot. If the
initial pattern matches, then we use funcall again to get the rest of
the string, and invoke concat again.
If this match fails, then we invoke full-concat again, but increment
the pivot by one, indicating that we want to check against a larger
substring.
If this is confusing, and it is understandible if it is, remember the
following: when writing defpatterns, or is used for flow control,
and is used to assert multiple things about the input, p is used
to assert individual arbitrary conditions on the input, and funcall
is used to transform the input for further matching. Recursive
pattern expansion is used for iteration 1.
And feel free to contact me with questions, if they come up.
1 It is a lot like writing prolog, actually. Pattern matching is a significant distance from lisp to prolog.
2 After writing this I realized we can do better. If we get a match for the initial pattern, and then check the rest of the patterns, its possible they will fail because the initial match didn't consume enough of the string. It is simple to say, "if the subsequent match fails, keep increasing the pivot and trying again." I leave it as an exercise to the reader to figure out how to represent this trivial backtrackingish thing - but you can always check the source for the solution.
No comments:
Post a Comment