Wednesday, January 18, 2012

Shadchen: A pattern Matching Library for Elisp

A pattern matching library for Emacs Lisp

One of the things I like most about Racket (and other functional programming languages) is that they have good support for pattern matching, which is a great way to simultaneously dispatch on structure type, enforce constraints on values held in a data structure, and bind variables. A tremendous amount of code is devoted to these activities, and pattern matching combines them all in a succinct, easy to read form.

It's been my intent for some time to write a complete pattern matching library for Emacs Lisp, since it is a Lisp which I use very frequently. Shadchen is the first time any attempt to do so has resulted in a reasonable product. Shadchen is a Yiddish word for matchmaker, in case you were wondering about the title. The library is available in my elisp repository. Even though it comes in a giant directory full of junk, it runs standalone.

[EDIT: Here it is in a standalone repo.]

How it Works

Shadchen's interface consists of just three forms. match is the work horse - it actually performs a pattern match. match has the syntax:

(match <VALUE>
 (<PATTERN1> FORMS ...)
 (<PATTERN2> FORMS ...)
 ...
 (<PATTERNN> FORMS ...))

The <VALUE> expression is evaluated, and then each pattern attempts to match against it. If a pattern succeeds, it's associated FORMS are evaluated, in a context where the environment has been extended by the pattern's bindings. If a match fails, the next pattern is tried. If no patterns succeed, an error occurs.

Here is an example:

(match (list 1 2 3)
  ((list x y z) (+ x y z)))

The expression (list x y z) is the pattern in the above expression. Patterns are such that they resemble the code which creates the data structure in question, in this case a three element list. The pattern (list x y z) basically specifies that a match occurs when the input value is a three element list. When a match occurs, x is bound to the first element, y to the second and so on. Hence this form evaluates to 6.

Here is another example: summing a list:

(defun* dummy-sum (lst &optional (acc 0))
 (match lst
  (nil acc)
  ((cons hd tl)
   (dummy-sum tl (+ acc hd)))))

First we specify the pattern nil, which matches only nil. When the input is nil, we return the accumulator. Then we match against (cons hd tl) which matches a cons pair, binding hd to the car and tl to the cdr. We then recur, adding to the accumulator.

The pattern matching language is rich. There are patterns for matching against literals, applying functions to values, matching against arbitrary conditions, matching against structs.

Eg:

(defstruct a-struct f1 f2)

With this struct defined, the pattern:

(struct a-struct (f1 (? #'numberp x)) (f2 (? #'stringp y)))

Matches only a struct whose field f1 is a number, and whose field f2 is a string. When that is true, x and y are bound.

Extending Shadchen

The pattern matcher is user extensible using defpattern. Defpattern defines a function which receives the pattern's arguments, and returns a new pattern which effects the desired match. For instance, the struct pattern is defined thusly:

(defun cl-struct-prepend (s)
  (intern (format "cl-struct-%s" s)))

(defun make-cl-struct-accessor (struct-name slot) 
  (intern (format "%s-%s" struct-name slot)))


(defpattern struct (struct-name &rest fields)
  `(and
    (? #'vectorp)
    (? #'(lambda (x) (> (length x) 0)))
    (? #'(lambda (o)
           (eq (elt o 0) ',(cl-struct-prepend struct-name))))
    ,@(loop for f in fields collect
            `(funcall 
              #',(make-cl-struct-accessor struct-name (car f))
              ,(cadr f)))))

Note that the defpattern body must return a valid pattern in terms of previously defined patterns (or itself, patterns can be recursive). In this case we use the patterns and, ?, and funcall to create a new matcher.

Supported Patterns:

Shadchen supports the following patterns:

Shadchen supports the following built-in patterns.

<SYMBOL>

Matches anything, binding to that value in the body expressions.

<KEYwORD-LITERAL>

Matches only when the value is the same keyword.

<NUMBER-LITERAL>

Matches only when the value is the same number.

<STRING-LITERAL>

Matches only when the value is string= is the same string.

(CONS <PATTERN1> <PATTERN2>)

Matches any CONS cell, or NIL, then matches <PATTERN1> and <PATTERN2>, executing the body in a context where their matches are bound. If the match value is NIL, then each PATTERN matches against NIL.

(LIST <P1> ... <PN>)

Matches a list of length N, then matches each pattern <PN> to the elements of that list.

(LIST-REST <P1> ... <PN> <REST-PATTERN)

Matches - to elements in at list, as in the LIST pattern. The final <REST-PATTERN> is matched against the rest of the list.

(QUOTE DATUM)

Only succeeds when DATUM is EQUALP to the match-value. Binds no values.

 (AND <P1> .. <PN>)

Tests all <PN> against the same value, succeeding only when all patterns match, and binding all variables in all patterns.

 (? PREDICATE <PATTERN>)

Succeeds when (FUNCALL PREDICATE MATCH-VALUE) is true and when <PATTERN> matches the value. Body has the bindings of <PATTERN>.

 (FUNCALL FUN <PATTERN>)

Applies FUN to the match value, then matches <PATTERN> against the result.

 (BQ EXPR)

Matches as if by BACKQUOTE. If EXPR is an atom, then this is equivalent to QUOTE. If EXPR is a list, each element is matches as in QUOTE, unless it is an (UQ <PATTERN>) form, in which case it is matched as a pattern. Eg:

(match (list 1 2 3)
  ((BQ (1 (UQ x) 2)) x))

Will succeed, binding X to 2.

(match (list 10 2 20)
   ((BQ (1 (UQ x) 2)) x))

Will fail, since 10 and 1 don't match.

(values <P1> ... <PN>)

Will match multiple values produced by a (values ...) form.

(struct struct-name (field-name <P1>)
                    (field-name <P2>)
                    ...
                    (field-name <P3>))

Which matches when the input is a struct of type struct-name, whose fields match <P1> ... <PN>.

Conclusions

If you are an elisp hacker like me, now you don't have to envy those snooty Racket, ML and Haskell programmers. Happy Hacking!

(PS - the library is also available in Common Lisp, check my github).

(PPS - I love Racket, ML and Haskell programmers and they aren't at all snooty.