Monday, April 25, 2011

Emacs Lisp Curios (some useful, some fanciful)

A Quick Overview of the Curios in this Library

Since I've been place on the planet emacs blog roll, I thought I'd do a quick summary of the things it might be interested in that I've worked on. Code is available on my github.

Clojure-Style Destructuring Bind/Function Definitions

This library, defn.el, supports an approximation of Clojure's destructuring bind syntax. If you are familiar with Clojure, the following code will be recognizable:

(require 'defn)
(defn prod 
  ([[el & tail :as lst] acc]
   (if lst (recur tail (* el acc))
       acc))
  ([lst] (prod lst 1)))
(prod '(1 2 3 4)); -> 24

The above code demonstrates the declaration of a function, prod which has two arities: the number of arguments determines which body is executed. If one arg is passed in, prod calls itself with the appropriate initial value for its accumulator. If two arguments are provided, it uses a recur expression to call itself repeatedly, calculating the produce of a list passed in. This is done without growing the stack. You can recur within a single body perpetually.

defn and its anonymous cousin fn support destructuring on hash tables and association lists where each element in the list is a proper list:

(defn dst-demo [[:: x :x y :y]]
  (list x y))

(dst-demo (tbl! :x 10 :y 11)); -> (10 11)
(dst-demo '((:x 40) (:y 50))); -> (40 50)

Default values may be provided for both sequential and table forms using the :or keyword (clojure doesn't support :or for sequences). Destructuring can be nested arbitrarily deeply as well. There are also let forms which provide the same destructuring for let-like situations. All these forms expand to lexical-closures, but there are forms like defn_ which are identical but create dynamic scopes instead.

As with most heavy macro magic, you can't really use this library without byte compiling, but when this is done, it can be reasonably zippy.

Standalone Recur

I really like the recur feature from Clojure. I like it so much I factored it out of the defn implementation so that it can be used in situations where a lighter touch is required.

(require 'recur)
(recur-defun* prod (lst &optional (acc 1))
  (if (empty? lst) 
      acc
      (recur (cdr lst) (* acc (car lst)))))

Again, this won't blow the stack. The lambda list supports the whole cl style lambda list. Also provided is a let form which can recur on itself, recur-let.

Multi-methods

Another Clojure-inspired feature is multi-methods. These are a kind of proto-multiple dispatch "object" system, by which I mean they are slightly more general than any concrete object system. You can use them to implement a single-dispatch object system-like functionality, for instance:

(require 'multi-methods)
(defmulti volcalize-n-times (lambda (&rest args)
                          (pseudo-class-of (car args)))
  "Volcalization multi-method.") ;Must be executed before defunmethod

(defunmethod volcalize-n-times :cat (thing n)
  (loop for i from 1 to n do (print "Meow")))

(defunmethod volcalize-n-times :dog (thing n)
  (loop for i from 1 to n do (print "Woof")))

(defunmethod volcalize-n-times :human (thing n)
  (loop for i from 1 to n do (print "Lorem ipsum dolor sit
  amet, consectetur adipiscing elit. Nulla sed sapien ligula. Sed
  luctus cursus consequat. Morbi egestas magna auctor dui
  sagittis bibendum. Ut in felis sit amet eros eleifend ultricies
  gravida id ligula. Suspendisse potenti. Nulla semper porttitor
  massa, sed feugiat mauris tempor nec.")))

(defun pseudo-class-of (val)
  (gethash :pseudo-class val :thing))

(volcalize-n-times (tbl! :pseudo-class :cat :name 'tibald) 10)
; prints "meow" 10 times.
(volcalize-n-times (tbl! :pseudo-class :human :name 'vincent) 2)
; prints lorem-ipsum fragment twice.

Multimethods match their arguments via isa? predicate, which can be extended with functions like derives. For instance:

($ :cat derives-from :quadraped)
($ :dog derives-from :quadraped)
($ :human derives-from :biped)
($ :emu derives-from :biped)

Note that $ is a simple infix macro that swaps the function position with the first argument position in the top level of the subsequent forms.

(defmulti n-legs 
  (lambda (&rest args)
   (pseudo-class-of (car args)))
  "Number of legs multi-method")

(defunmethod n-legs :biped (o) 2)
(defunmethod n-legs :quadraped (o) 4)

(n-legs (tbl! :pseudo-class :human)) ;-> 2
(n-legs (tbl! :pseudo-class :cat)) ;-> 4

By writing the appropriate dispatch function you can create many kinds of crazy object systems. Dispatch is cached, so if the derives hierarchy doesn't change, dispatch is fast after the first method call.

Partial Application

Particularly because emacs lisp is dynamically scoped, its handy to have functions to create functions which handle lexical static for you. If you (require 'functional), you'll have access to these functions, which make point free programming more fun:

(mapcar (par #'+ 5) '(1 2 3));-> (6 7 8)

par partially applies a function on the right. pal does the same on the left. Both always return a function, so you can partially apply more arguments than a function has, in which case when you call the result, you'll get an error. Considering that elisp functions have variable and unlimited arity, this seemed like the most idiomatic accomidation. There are also some other interesting things in functional.el, though it probably should be called point-free.el.

With-stack

with-stack.el is an emacs lisp embedded stack language in the mode of Factor or Joy. It looks like this:

(||| 4 4 + ) ;-> 8

The macro ||| introduces a stack language form. Subsequent objects are interpreted as in a stack languages, with atomic forms pushing themselves onto the stack and symbols acting as "words" which manipulate the stack. The file stack-words.el defines many basic stack words, but you can define your own using the word: word.

(||| word: plus-four 4 + end: 10 plus-four ) ;-> 14

||| returns whatever is on the top of the stack at the end of evaluation. The stack is discarded between invocations of |||. The language supports the notion of regular "words" and parsing or immediate words, which are executed when they are encountered during the compilation scan of the stack language segment. An immediate word is written in the stack language itself, and sees the "future" stream of tokens as its stack. It may manipulate that stack as it sees fit before terminating, at which point, compilation continues till the next immediate word or the end of the stack segment.

(||| immediate-word: -lisp-val: '1>eval swap end:)
(||| -lisp-val: '(+ 1 1) ) ;-> 2

The language supports Factor/Joy style quotations and stack-words defines the most common combinators (if,loop,bi, etc). You can call emacs functions using a bit of sugar (shown above). The syntax 2>concat, for instance, means "call concat with two values from the stack." These glue-words are assumed to return 1 value, so don't forget to drop nils.

Conclusion

That is a bunch of the crazy stuff I have been working on. There are other projects in a half-way state of completion. All of these projects are on my github, as is this document. They depend on my giant pile of utils library utils.el.

Friday, April 22, 2011

Lexical-let Corner Case

Emacs 24 will have actual lexical binding, but until that becomes the standard, we've got to satisfy ourselves with cl.el's lexical-let form, a codewalker which does some magical things to simulate lexical scope. Today I discovered the following corner case when combining lexical and dynamic scope on variables with the same name. Consider:

(setq x "Hi there")
(defun return-x () x)
(return-x) -> "Hi there"
(let ((x "Yo."))
  (return-x)) -> "Yo." ; Dynamic scope in action, 'x is shadowed.
(lexical-let ((x "Yo."))
  (return-x)) -> "Hi there." 
; lexical-let doesn't create a dynamic binding, so x
; is not shadowed
(lexical-let ((x "Yo."))
  (let ((x x))
    (return-x))) -> "Hi there", but should be "Yo."
; This is unexpected behavior.

What is the problem? The let expression in the last example is rewritten by the codewalker, which replaces the x in the binding form with a hidden name. This prevents x from being dynamically bound, even though the let should shadow the x inside return-x. I am not sure if there is a way around this that isn't incredibly baroque. Probably best to just be aware of it.

Saturday, April 16, 2011

Deep Emacs Lisp Part 2

Streams

Last time we went through a lengthy development of monads in emacs lisp. Most of our attention was focused on just getting the idea down, and we developed a pretty full list monad.

It was intimated at the time that the list monad can be thought of as a "possibilities" monad. That is, we have functions which depend on a single input, but can produce many possible outcomes.

Suppose we wish to understand the probabilty of of rolling a given number when combining N dice with different numbers of sides. Characterizing a particular fair die is easy enough.

(require 'utils)
(defun die-outcomes (n-sides) 
  (range 1 (+ n-sides 1)))
(die-outcomes 6) ;-> (1 2 3 4 5 6)

(Note: All the code here depends on my utils.el and other packages available from my github page).

You might want to know: what is the probability of rolling a 10 when I roll three six sided dice? You might have noticed that die-outcomes is a monadic function in the list monad. It takes a number and returns a list of outcomes. How can we use the monad to calculate the probability in question?

(require 'monads)
(let* ((outcomes (mlet*_ monad-seq
                  ((d1 (die-outcomes 6))
                   (d2 (die-outcomes 6))
                   (d3 (die-outcomes 6)))
                  (m-return (+ d1 d2 d3))))
       (n-outcomes (length outcomes))
       (n-tens (length (filter (par #'= 10) outcomes) )))
   (* 100 (/ n-tens (* 1.0 n-outcomes)))); ->  12.5 percent chance

This is a kind of interesting thing, if you think about it. While this certainly represents a brute force approach, the really neat thing is that we only just described the possible outcomes of each die. This is a very easy problem. We then just let the list monad sort out the details.

I just joined the Triangle Area Functional Programmers Group, and they recently considered the "Cracker Barrel Peg Board Puzzle" question. If you aren't familiar with the game, here is a guy solving it with a mnemonic device:

But mnemonic devices are pretty crappy programs for pretty crappy computers (brains). Can we do better? Well, in the spirit of the above dice example, instead of considering the problem "how do we solve the peg board puzzle?", let us consider instead the simpler problem: given a peg board, how do we enumerate all the moves that we can make on any given turn?

We're going to be working functionally, so lets decide an on a persistent representation of a game board. By persistent we here mean that when we modify such a structure, we actually get a copy of that structure back. One such representation is :

(defun fresh-board ()
  (alist>>
   0 '(0)
   1 '(0 1)
   2 '(0 1 2)
   3 '(0 1 2 3)
   4 '(0 1 2 3 4)))

We just represent a board as an alist with row indexes as keys and each row represented by a list of occupied positions. alist>> is a function from my utilities which just turns the above into:

((0 (0)) (1 (0 1)) (2 (0 1 2)) (3 (0 1 2 3)) (4 (0 1 2 3 4)))

Adding a removing a peg are somewhat obvious. We just dip into the alist to the appropriate row and remove the number of the peg we want to take out. Adding is the opposite operation. We will handle error checking at another level of abstraction, but we will at least maintain at this point that each row contains only unique peg positions.

We'll represent positions as cons cells with column as the cdr and row as the car.

(defun pos (x y)
  (cons x y))

And we'll be destructuring positions a lot, so lets whip up a quick macro for that:

(defmacro let-pos (peg-binders &rest body)
  (cond ((empty? peg-binders) `(progn ,@body))
        (t (let ((binder (car peg-binders))
                 (peg-sym (gensym "peg-")))
             `(let* ((,peg-sym ,(cadr binder))
                     (,(car (car binder)) (car ,peg-sym))
                     (,(cadr (car binder)) (cdr ,peg-sym)))
                (let-pos ,(cdr peg-binders) ,@body))))))

Example:

(let-pos (((x y) (pos 3 4)))
  (list :x x :y y)) ;-> (:x 3 :y 4)

There are more general destructuring bind operations lurking in the fetid depths of my emacs lisp library, but clarity demands we use this simple solution. Note that the macro accepts any number of position binding forms.

Now those board-related functions:

(defun peg-at-board? (board pos)
  (let-pos (((x y) pos))
           (mlet* monad-maybe^i
                  ((row (alist board y))
                   (at? ($ x in row)))
                  at?)))

(defun remove-peg (board pos)
  (let-pos (((x y) pos))
           (alist-conjugate board
                            y
                            (lambda (row)
                              (filter 
                               (f-not (par #'= x)) row)))))

(defun n-sort-cons (n n-list)
  (cond ((empty? n-list) (list n))
        ((= n (car n-list)) n-list)
        (($ n < (car n-list)) (cons n n-list))
        (t (cons (car n-list) (n-sort-cons n (cdr n-list))))))

(defun add-peg (board pos)
  (let-pos (((x y) pos))
           (alist-conjugate board
                            y (pal #'n-sort-cons x))))

Note: I snuck in a maybe-monad use up there. We will talk about it in a bit.

Don't worry too hard about these functions. They just do what they say on the tin, with the understanding that they return a new board with the indicated changes, rather than modifying the board. n-sort-cons is not tail recursive, but since there are no more than five pegs in a row, we aren't likely to blow the stack.

It will be handy to simulate moving pegs in a way that results in failure when we try to move off the board.

(defun on-board? (pos)
  (let-pos (((x y) pos))
           (and (>= y 0)
                (<  y 5)
                (>= x 0)
                (<= x y))))

(defun move1 (pos direction)
  (let-pos (((x y) pos))
           (let* ((new-pos
                   (case direction
                     (:nw (pos (- x 1) (- y 1)))
                     (:ne (pos x (- y 1)))
                     (:e (pos (- x 1) y))
                     (:w (pos (+ x 1) y))
                     (:sw (pos x (+ y 1)))
                     (:se (pos (+ x 1) (+ y 1))))))
             (if (on-board? new-pos)
                 new-pos
               nil))))

move1 takes a starting position and a direction (one of (:nw :ne :e :w :sw :se)) and returns the new position, if it is on the board. Using move1 we can define move-n which just repeats this process N times, or until we fall off the board. You can see this code on the github, its straightforward.

Ok, finally something interesting:

(defun generate-hop (board pos dir)
  (lexical-let ((pos pos))
    (mlet*_ monad-maybe^i 
            ((origin-occupied? (peg-at-board? board pos))
             (over (move1 pos dir))
             (target (move-n 2 pos dir))
             (over-occupied? (peg-at-board? board over))
             (target-empty? (not (peg-at-board? board target))))
            `((:remove ,pos)
              (:remove ,over)
              (:place ,target)))))

This function takes a board, a position, and a direction and generates a hop in that direction if one is allowed. It returns nil otherwise. This kind of thing is built for the maybe monad, so most of the function lives there. It is a good time to remind ourselves of how monads work. mlet*_ tells use we are going to be chaining our binding through a monad, in this case monad-maybe^i, which is defined in monads.el:

(defvar monad-maybe^i
  (tbl!
   :m-zero nil
   :m-return (lambda (x) x)
   :m-bind (lambda (v f)
             (if (not v) v
               (funcall f v))))
  "The (implicit) MAYBE monad.  
   NIL indicates failure.  
  MaybeVal is the identity.  
  Just is the identity.")

This monad is a variation on the regular old maybe monad which just lets the programmer express failure with a regular old lisp nil instead of a tagged value. Hence the zero of this monad is just nil. Return is the identity function, and bind is almost funcall, but not quite. If the monadic value passed in is nil, it doesn't apply the monadic function. It just returns nil. The net effect is that if any expression in the mlet* expression above is nil, the whole expression evaluates to nil, short circuiting through the subsequent expressions. Handy. Monads are cool (see footnote 1 for why this monad has an ^i).

Ok, so now we have a function which can generate a hop. It returns a list of instructions on how that hop is implemented which we can use to modify a board to effect that hop. These instructions can be thought of as a really dead simple programming language, so we can write a function:

(defun interpret-hop (board hop)
  (reduce 
   (lambda (board move) 
     (let ((what (car move))
           (where (cadr move)))
      (case what
        (:remove (remove-peg board where))
        (:place  (add-peg board where)))))
   hop
   :initial-value board))

Which takes a hop and modifies the board to account for that move. Apart from a bit of book keeping, we've solved the peg-puzzle. We can write a function which returns all the legal hops for a given board quite easily using the list-monad.

(defun generate-hops (board)
  (if (board-solved? board) (list board)
      (mlet*_ monad-seq^i 
        ((direction       (list :nw :ne :e :w :sw :se))
         (position        (generate-positions))
         (hop             (generate-hop 
                           board position
                           direction)))
       hop)))

(Astute readers might notice we could partially apply the board argument of generate-hop and then lift-2 the resulting function into the sequence monad, then apply it to the lists of positions and directions. The provided approach is probably easier to read for people not too familiar with monads.)

Now solving the problem is just matter of applying generate-hops thirteen times, taking care to produce a new set of boards from the generate hops at each step.

If you try this, you'll find that it takes a very long time. The peg game has a pretty large state space, and its a hassle to generate ALL solutions when we really don't want all of them, at least not all at once. Can we recapture the elegance of this simple, declarative solution without having to calculate every single win condition?

Enter Streams

One way we can use essentially the same methodology but not have to calculate all the answers is to use lazy lists or streams. Emacs doesn't have them, so we are going to have to roll our own, but they aren't conceptually that difficult. A stream is simply a conceptual pair of objects. The first object represents the head of a the stream. The second object is a function which returns the rest of the stream when called. From these simple beginnings we can produce a data structure with all sorts of unusual behaviors. For instance, its possible to create an infinitely long stream of, for instance, all the integers starting with 1 or all the Fibonacci numbers. Even though such streams are conceptually infinite, we can pass then around and operate on them almost as we would any list. We can even do things which seem counter-intuitive at first, for instance, mapping a function over an infinite stream to produce a new, infinite stream.

Streams take some getting used to, but the process can be very enlightening. For instance, one learns quickly why Haskellers are not as concerned about non-tail recursion when writing stream functions. If the recursion occurs inside the "future" of the stream, you get a free trampoline.

Laziness, however, takes special care in Emacs Lisp, because variables are not lexically scoped by default. This means every time you create a lambda which you intend to be called later, you have to make sure to explicitly lexically-let over it, indicating whatever variables the lambda depends upon. In a way, this is good, because it forces us to think consciously about closures, which are important ideas in functional programming. However, it can get syntactically busy to be wrapping up things in lexical-let forms all the time, particularly because we often want only to create a lexical copy of a dynamically bound variable.

Ergo, our very first step in creating a stream library is to create a nice form for delaying computations.

(defun single-symbol-list? (item)
  (and (listp item)
       (= (length item) 1)
       (symbolp (car item))))
(defun binderish? (item)
  (and (listp item)
       (= (length item) 2)
       (symbolp (car item))))

(defun with-form->binder (item)
  (cond ((symbolp item )(list item item))
        ((listp item)
         (cond ((single-symbol-list? item)
                (cons (car item) item))
               ((binderish? item)
                item)
               (t (error "with-forms require symbols, 
                          a single symbol list, or a binder-like 
                          expression.  Got %S." item))))
        (t (error "with-forms require symbols, 
                   a single symbol list, or a binder-like 
                   expression.  Got %S." item))))

(defmacro* later (expr &key (with nil) (with* nil))
  (cond (with 
         `(lexical-let ,(mapcar #'with-form->binder with)
            (later ,expr :with* ,with*)))
        (with* 
         `(lexical-let* ,(mapcar #'with-form->binder with*)
            (later ,expr)))
        (t `(lambda () ,expr))))

The macro later takes a single expression and wraps it in a lambda with no arguments. This is exactly the sort of lambda which forms the tail of streams. Because the contents of the tail expression often depend on variables dynamically bound at the time the lambda is created, later lets you specify which values to produce a closure over in several ways.

(later 10) ;-> (lambda () 10)
(let ((x 10))
  (later x :with (x))) -> (let ((x 10))
                            (lexical-let ((x x))
                             (lambda () x )))
(let ((x 10)) 
   (later y :with ((y (+ x 1))))) ->
 (let ((x 10)
    (lexical-let ((y (+ x 1)))
      (lambda () y))))

(let ((x 10))
  (later z :with* ((y (+ x 1))
                   (z (+ x y))))) ->
  (let ((x 10))
    (lexical-let* ((y (+ x 1))
                   (z (+ x y)))
      (lambda () z)))

In words, later takes an expression and a list of binding expressions which are similar to let binders, but which can be single symbols, which expand to a (x x) binding expression. Now we are equipped to do a reasonable job implementing streams.

(Note: These streams are based loosely on those covered in "The Reasoned Schemer", although they've been adapted to stand alone from the Kanren interpreter and also to be more easily understood (to me)).

We could represent streams as cells, build them using "cons," etc, but I prefer to have a bit more error detection built into the implementation. I don't want to accidentally use a list as a stream or vice versa. Emacs provides some basic facilities for defining new "types," with defstruct (see Footnote 2).

(defstruct stream head future)
(defun stream (hd &optional future)
  (make-stream :head hd :future future))

This defines the functions make-stream, stream-head, stream-future and stream-p. Good enough for use to get started, certainly. The function stream is just the stream analog of cons, it creates a stream cell. We allow the tail to be optional, because we'll want to create streams of one element frequently. Incidentally, stream with one element is our return operation for the stream monad.

What about car and cdr? Stream car (scar) is easy enough:

(defun scar (stream)
  (cond ((not stream) nil)
        (t 
         (progn
           (if (not (stream-p stream))
               (error "Tried to take the scar of a non-stream %S." stream))
           (stream-head stream)))))

The first cond checks for the nil stream, which we represent with nil. If the stream isn't nil, it returns the head portion of the stream. Easy enough.

(defun scdr (stream)
  (cond ((not stream) nil)
        (t
         (progn
           (if (not (stream-p stream))
               (error "Tried to take the scdr of a non-stream %S." stream))
           (let ((fut (stream-future stream)))
             (if fut
                 (progn 
                   (if (not (functionp fut)) 
                    (error "The future of a stream must be 
                            either nil or a function.  Instead its %S" fut))
                   (let ((fut (funcall fut)))
                     (if (not (stream-p fut)) 
                      (error "The future of a stream must 
                              evaluate to a stream.  Instead it was %S" fut))
                     fut))
               nil))))))

scdr is longer, but it is simple enough. Check for nil, in which case the scdr is nil. Otherwise, grab the lambda in the second half of the stream and call it, checking to make sure that the output is itself a stream. Return that.

defstruct created stream-p for us, which tests to see if an object counts as a stream, but we want the definition to include the nil object, so we should define for later:

(defun stream? (x) (or (not x) (stream-p x)))

I like the question-mark indicates predicate style better anyway.

It is convenient to split a stream into three cases. The first case is the nil stream, the second is a stream with one element, and the third is a stream with a future. Lots of algorithms need to act differently in these cases, so we should write a macro over them:

(defun stream-future-nil? (object)
  (nil? (stream-future object)))
(defun stream-with-future? (object)
  (not (nil? (stream-future object))))

 (defmacro stream-case (stream
                        nil-case
                        =a=expressions
                        =a-f=expressions)
   (with-gensyms 
    (stream%)
    `(let ((,stream% ,stream))
       (if (not (stream? ,stream%)) 
          (error "Stream-case needs a stream 
                  input, got instead %S." ,stream%))
       (cond 
        ((nil? ,stream%)
         ,@nil-case)
        ((stream-future-nil? ,stream%)
         (let ((,(car (car =a=expressions)) (scar ,stream%)))
           ,@(cdr =a=expressions)))
        ((stream-with-future? ,stream%)
         (let ((,(car (car =a-f=expressions)) (scar ,stream%))
               (,(cadr (car =a-f=expressions)) (stream-future ,stream%)))
           ,@(cdr =a-f=expressions)))
        (t (error "Couldn't figure out what to do 
                   with stream %S.  This should never 
                   happen." ,stream%))))))

Stream case figures out which condition our stream is in, and then destructures the stream into the appropriate parts. When the stream is nil, the first body form is simply executed. When the stream is a singleton, the value contained in the stream is bound to a symbol the user passes in and the bodyforms are executed in that context, and when there is a value and a future-stream, those are bound to user defined symbols in a body for that case. It is all wrapped up in a hidden cond.

Example:

(let ((s (stream 'x (later (stream 'y nil)))))
  (stream-case s
    ((print "this is not executed because s is not nil")
    ((a) (print "If the stream had a single element, a would be
  bound to it here"))
    ((a f) 
     (print "This case is executed becase s has a future, a is the
  `scar` of s, f is the function which produces the future
  stream.")))))

About the simplest function we can write acting on streams is take-n, which simply takes a limited number of elements from a stream and converts them into a list.

(defun take-n (stream n &optional acc)
 (if (= n 0) (reverse acc)
   (stream-case stream 
     ((reverse acc))
     ((a) (reverse (cons a acc)))
     ((a f) (take-n 
             (funcall f) (- n 1) (cons a acc))))))

This function will return fewer than the requested number of elements if the stream ends before n is reached. Readers might notice this is a tail-recursive function. Emacs doesn't support tail-recursion natively, but I wrote a library that does, so we can define this function this way:

(require 'recur)
(recur-defun* take-n (stream n &optional acc)
  (if (= n 0) (reverse acc)
    (stream-case stream
                 ((reverse acc))
                 ((a) (reverse (cons a acc)))
                 ((a f) (recur (funcall f) (- n 1) (cons a acc))))))

This will never blow the stack, which is important, since we want to take many elements off a stream without worrying about whether the stream is shorter than the recursion limit.

Ok. With just these few functions we can play with some non-trivial streams.

(defvar *ones*
   (stream 1 (later *ones*))
   "Infinite stream of ones.")

(defun ints-from (n)
  (stream n (later (ints-from (+ n 1)) :with (n))))

(take-n (ints-from 5) 10) ;-> (5 6 7 8 9 10 11 12 13 14)
(take-n (ints-from 10) 10);-> (10 11 12 13 14 15 16 17 18 19)

These are really just parlor tricks, though. See if you can figure out how to define forever, which is a function which takes a value and returns a stream of that value forever. Or repeating, which takes a list and returns a stream which is that list repeated over and over again.

Note that even though functions like ints-from appear to call themselves, they do so only after they return their stream. There isn't any real recursive call - the stream serves as a trampoline. These functions won't blow the stack.

Building Towards the Stream Monad

Streams are obviously somewhat analogous to lists. Lists form a monad with list and map-cat as the return and bind operations respectively. We'd like to form a monad with streams too. Then we could do some truly interesting things with mlet* like notation, cuing up potentially infinite calculations into a nice list-like package.

We've already determined that the return operation for a stream monad would simply be the function which puts a single value into a singleton stream:

(defun stream-return (x) (stream x))

We need to build up to mapcat for streams, and then we will have our monad.

Streams get a little confusing at this point. The key to keeping our head straight is to remember that the tail of the stream always needs to be handled in such a way as to maintain the laziness of the stream. When you map over a stream, you don't actually visit the rest of the stream at the time of the map. You simply return a new stream whose future includes the fact that it needs to apply a function to each value before returning it. stream-map is just as lazy, in other words, as our streams.

Let's write smapcar, that is stream-map-car. The exact analog of the list mapcar function. It will take a function and a stream and return a new stream which is that function applied to every element in the input stream.

(defun smapcar (f stream)
  (stream-case stream
    (nil) ; empty stream -> empty stream
    ((a) ; singleton stream -> singleton stream of (f a)
     (stream (funcall f a) nil))
    ((a future)
     (stream (funcall f a)
       (later (smapcar f (funcall future)) :with (f future))))))

The only case which might be confusing is the last case. The stream-case expression extracts a, the value at the head of the stream, and future, the function which returns the future of the stream. We form a new stream whose head is (f a) and whose future is the mapping of f onto the future of the input stream. This mapping occurs later, only when someone asks for the future of the output stream.

It is possible to write a generalization of this function which maps a multi-input function over multiple streams. I'll leave such a function to the reader, but I will provide here a function smapcar2 which maps over two streams, because it will let us construct some interesting streams.

(defun smapcar2 (f-of-2 stream1 stream2)
  (stream-case stream1
               (nil)
               ((a) (stream-case stream2
                     (nil)
                     ((b) (stream (funcall f-of-2 a b) nil))
                     ((b g)
                      (stream (funcall f-of-2 a b) nil))))
               ((a f)
                (stream-case stream2 
                 (nil)
                 ((b) (stream (funcall f-of-2 a b) nil))
                 ((b g)
                   (lexical-let ((f-of-2 f-of-2)
                                 (f f)
                                 (g g))
                    (stream (funcall f-of-2 a b) 
                     (lambda () (smapcar2 f-of-2 (funcall f)
                      (funcall g))))))))))

With smapcar2 we can finally create things like the stream of Fibonacci Numbers:

(defvar fibs (stream 1
                      (later 
                        (stream 1 
                                (later (smapcar2 #'+ fibs (scdr fibs))))))
   "The stream of Fibonacci numbers")
(take-n fibs 10) ;-> (1 1 2 3 5 8 13 21 34 55)

Whee! An eternal golden braid, etc!

Stream Map & Concatenate

We might think that all we need to do now is define stream-map and stream-cat and then stream-map-cat would be the composition of these functions. This might almost be made to work except that either operation might be produce an infinite stream of results. It is more straightforward to interleave the concatenation with the map operation. We can define stream-cat by itself:

(defun stream-cat (stream1 stream2)
  (stream-case 
   stream1
   (stream2)
   ((a) (stream-case stream2 
                     ((stream a nil))
                     ((b) (stream a (lexical-let ((b b)) 
                                      (lambda () (stream b nil)))))
                     ((b g)
                      (stream a (lexical-let ((b b)
                                              (g g))
                                  (lambda () (stream b g)))))))
   ((a f)
    (stream-case stream2
                 ((stream a f))
                 ((b) (stream a 
                              (lexical-let ((f f)
                                            (stream2 stream2))
                                (lambda () (stream-cat (funcall f) stream2)))))
                 ((b g) 
                  (stream a 
                          (lexical-let ((f f)
                                        (stream2 stream2))
                            (lambda () (stream-cat (funcall f) stream2)))))))))

This function is a doozy, but the upshot is simple. If either stream is empty, we return the other stream, obviously. Otherwise we lazily pass stream two down stream one, until we find the tail of stream one, whereupon we fasten stream two. A call to stream cat doesn't instantaneously modify the entirety of stream1, it should be noted. It modifies just the tail, essentially passing instructions "down the line" to further modify each tail until a nil.

We can now define stream-map-cat.

(recur-defun* stream-map-cat (mf stream)
  (lexical-let ((mf mf))
    (stream-case stream
                 (nil)
                 ((a) (funcall mf a))
                 ((a f) 
                  (lexical-let ((interior-stream (funcall mf a))
                                (f f))
                    (stream-case 
                     interior-stream
                     ((recur mf (funcall f)))
                     ((b) (stream b 
                                  (later
                                   (stream-map-cat mf (funcall f)))))
                     ((b g) (stream b
                                    (lexical-let ((g g))
                                      (later 
                                       (stream-cat (funcall g)
                                                   (stream-map-cat mf (funcall f)))))))))))))

stream-map-cat is actually pretty easy to understand if you remember that mf is going to be a monadic function of the stream monad, and so whatever is passed in, mf always returns a stream (see Footnote 3 for a subtlety). Obviously if the input stream is nil, the output is also. If input has a single element, then mf on that element is the output stream. When there is a non-trivial stream, then we evaluate the monadic function to get the interior-stream and lazily concatenate it on the tail of the result of map-cat on the rest of the input stream.

That is it. We have the stream monad:

(setq monad-stream 
      (tbl! :m-bind #'stream-bind
            :m-return #'stream-return
            :m-zero nil))

So what can we do with it?

We'll, besides implement Kanren, the point of all this, I can think of at least one interesting example. Emacs provides functions to generate random numbers with a uniform distribution. random* from the cl.el library is able to produce such numbers in any given range. Consider the function:

(defun* random-numbers (lim &optional (state (make-random-state)))
  (let* ((*random-state* state)
         (val (random* lim)))
    (lexical-let ((new-state (make-random-state))
                  (lim lim))
      (stream val (lambda () 
                    (random-numbers lim new-state))))))

This function takes a limit and an optional state and returns a stream of uniformly distributed numbers. Using a dynamically bound *random-state* and the fact that make-random-state returns the current state, the resulting stream returns an infinite, but repeatable, list of uniformly distributed numbers.

Uniform distributions are fine, but we often want Guassian numbers. There are several ways to get them if you have uniform distributions. One way is to take the average of many uniformly distributed numbers. These averages will have a Gaussian distribution which can be scaled to whatever standard deviation is desired. A way which is less computationally intensive, however, is the Box-Muller transformation, which takes two independent distributions of uniformly distributed numbers and returns two distributions of Guassian numbers. If u and v are uniformly distributed, then

p = sqrt(-2*log u)*cos(2 pi v)
q = sqrt(-2*log u)*sin(2 pi v)

p and q are two independent but normally distributed numbers. If we don't care about the seeds in particular, only that they are different, we can create a stream of Guassian numbers like so:

(defun zip-streams (&rest streams)
  (apply #'smapcar* #'list streams))


(setq normal-numbers 
      (mlet**_ monad-stream ((pair
                  (zip-streams (random-numbers 1.0)
                               (random-numbers 1.0
                     (make-random-state t)))))
               (lexical-let ((u (car pair))
                             (v (car pair))
                             (r (sqrt (* -2 (log  u))))
                             (s (* 2 pi v)))
                 (stream (* r (cos s))
                         (later (stream (* r (sin s)) nil))))))

The monad handles collecting the two results into a single stream for us, and the result is itself an infinite list of normally distributed numbers. I don't know about you, but I think that is pretty cool.

Don't Interleave Just Yet

Because streams can be infinite, sometimes we don't want to strictly concatenate them. We can certainly concatenate a stream of infinite ones and a stream of infinite twos, but the result will be, effectively, a stream of infinite ones. The twos will be conceptually at the end of this stream, but you'll never reach them. Because of this, we sometimes want to interleave the results of monadic functions. Hence, this library provides an alternative monad monad-stream^i which has this behavior. In this case i means "interleave" rather than implicit. I'm going to have to do something about that.

The Peg Puzzle

If you return to the peg puzzle, making the minor changes you need to adapt the functions which enumerate possible moves so that they return streams, you can use almost the exact same code to produce a lazy-list of all possible solutions to the peg game. The full code is available in the examples directory. Using this code, one can queue up all the solutions to the puzzle, actually examining and returning only those which are needed.

I've implemented a slightly more complex version of the peg puzzle problem so that we end up with not just a stream of solved puzzles, but a stream of solved puzzles AND the included solution as a list of instructions. It's too lengthy to reproduce here, but you can see the code here.

As always, all the code here, and this tutorial, are available from my github account.


Footnote 1: I've found it handy define a set of "implicit" monads, the defining feature of which is bind does a lot more heavy lifting, basically squeezing non-monadic values into the monad if it encounters them during its work. For instance, the implicit list monadic bind operation takes two arguments. When the first is a list, bind is identical to the non-implicit list monad, but when it isn't a list it is wrapped in one before things proceed. Similarly, monadic functions may return non-lists, and when this happens, they are simply loaded into lists before being treated normally. The end effect is that you don't have to decorate all your expressions with m-return all the time. The only time you need to use one is when you really mean to return a list of nil rather than nil itself.

This might be a crazy thing, but seems ok to me! It seems to be more appropriate for dynamically typed languages because functions are always modifying their behavior to fit the input types anyway.

Footnote 2: Deftype produces a type which is unfortunately true under vectorp. Too bad.

Footnote 3: The function shown here is technically correct, in terms of the final values that appear in the stream and where. However, it is not as lazy as it could be. The key recognition is that, while mf is not the tail of a stream, the partial application of mf to a is a function which takes no arguments and returns a stream. That is, (lambda () (funcall mf a)) is an acceptable stream tail, and we don't need to evaluate mf a to use it as such. The following code implements this fully lazy version of stream-map-cat, here called stream-map-cat-tail. The only thing you need to understand in addition to the above is that par in the following code stands for partially apply on the right.

(defun stream-cat-tail (head-stream tail)
  (if (stream? head-stream)
      (stream-case head-stream
                   ((funcall tail))
                   ((a) (stream a tail))
                   ((a f)
                    (stream a 
                            (later 
                             (stream-cat-tail (funcall f) tail) :with (f tail)))))
    (stream-cat-tail (funcall head-stream) tail)))

(defun* stream-map-cat-tail (mf instream)
  (stream-case instream
               (nil)
               ((a) (funcall mf a))
               ((a f)
                (let ((tail (par mf a)))
                  (stream-cat-tail tail
                                   (later
                                    (stream-map-cat-tail mf (funcall f))
                                    :with (mf f)))))))

stream-cat-tail pins the function tail at the end of the stream head-stream. It turns out to be convenient to pass functions returning streams into the head-stream position, so this function checks for that case and extracts the head stream if needed. The library actually uses these functions rather than the slightly less lazy version in the tutorial body. I found that version easier to understand at first.

Saturday, April 9, 2011

Deep Emacs Lisp Part 1 (Basically, a Monad Tutorial)

Implementing Monads

(Note: everything in this tutorial (and this tutorial) is available on my github)

I've started working on an implementation of Kanren, the logic mini-language covered in The Reasoned Schemer, in Elisp. I want to write about this process vis-a-vis the unique considerations of Emacs Lisp as a language, but its a middle sized project, which depends on several pieces of code, which themselves depend on code which I haven't described in detail anywhere. Kanren depends on a monad (the "stream of substitutions" monad, for the nerds out there). An implementation of this monad is a good place to start when thinking about implementing Kanren itself, but I've never written up a piece of documentation on the implementation of Monads I've cooked up for Elisp, so I thought I'd start there.

So, this document one part monad tutorial and one part introduction to the monads.el implementation.

First, credit where its due: the following implementation of Monads owes is basic form to the monad library in Clojure-contrib. I wrote it while reading through Jim Duey's completely excellent monad tutorials for clojure.

Besides minor considerations associated with dealing with the default dynamic scope in Elisp, the implementation here is basically that of the Clojure Contrib library. And I could never have understood what I was doing without the tutorial.

Onto the Monads

We've reached the monad tutorial part of this document. Like most people, I found monads pretty hard to understand when I first encountered them. Most monad tutorials speak in Haskell, which I think can be doubly confusing because its hard to tell exactly where Haskell ends and where monads begin, since Haskell has special syntax built in to the language just for the purpose of dealing with monads. You can, of course, use the functions >>= and return outside of do notation, as many tutorials do, but at least dealing with >>= requires a fair amount of comfort with Haskell's infix operator system. At least it seemed that way to me.

I really had to implement monads in Elisp to finally get it. I think this worked because every part of the monad beastiary had to be built from the ground up. If you want "do notation," for instance, it isn't just there, you have to write a macro for it, and understand it. The uniformity of Lisp syntax (which some people hate, I know) also helps keep things focused on the ideas. That is the approach we will take here. We'll start just kicking around some general ideas and gradually build up to do notation, making constant analogy to well understood lisp and programming concepts. I think this is better than throwing oneself in head first.

Notes on Types

I think of a monad as a way of decorating function composition. If I have a group of functions that conform to a particular input and output constraint, a monad associated with that constraint tells me how to combine those functions so that something interesting happens. The fact that these functions all have the same type of input and output is critical, otherwise the "plumbing" specified by the monad won't work. Such functions are called "monadic functions" and when we try to understand monads, or a specific monad, we must meditate on the type of these functions.

When coming to monads from dynamic languages, it can be easy to gloss over all the type language used to describe them. But it really helps to think about types in a loosy-goosey way, at the very least, because type constraints are what make the plumbing in a monad work. Monads generally contain things. In statically typed languages the monad type depends on the type of the things it holds, so it would be specified, but we'll just refer to these things as "values". Their type isn't terribly important except that they are distinct from the type of monadic values.

A monadic value can be thought of as an abstract container which contains regular values. Sometimes this is a rather obvious kind of relationship. A list "contains" values, for instance. Sometimes it can be much more abstract.

Finally, there are monadic functions. A monadic function is a function which takes a regular value and returns a monadic value. It is important to think about these types while we work on the rest of the tutorial, so take a second to look at the diagrams, and then get into the habit of calling to mind the types of various objects whenever you get confused. It really helped me.

Types of Interest

The List or Sequence Monad

We'll be talking about the sequence monad first. We will represent sequences with regular old lisp lists, which can hold values of any lisp type. The monadic values, then, are lists. Lists "contain" (they also contain) lisp data. Let us meditate upon the type of monadic functions in the sequence monad.

==== this space reserved for meditation ====

If you guessed in your meditation that the monadic functions associated with the sequence or list monad (we'll use the two interchangeably from here on) are functions which take a single value, which is any lisp data, and return a list of lisp-values, you got it.

So, whatever the hell a "list monad" is, its got something to do with composing functions which take a single lisp value and return a list.

What kind of functions might do this? Well, a common idea is that functions in the list monad represent computations with multiple possible outcomes or values. A person might have multiple friends, so a function get-friends might return a list of persons rather than a single person. We might want to go on to calculate those persons' friends (or enemies, relatives, lovers). All these get- functions would be monadic functions in the list monad, and the list monad will help use deal with them.

Let's start kicking around ideas about composing these functions. To do that we need a few functions to consider explicitly. What are the simplest monadic functions of the list monad? Well list is certainly such a function when you call it with one argument, so lets put that on the pile. In that spirit, consider:


(defun stub (x) nil)
(defun list-return (x) (list x))

monadic return

list-return can be thought of as a sort of "default" or "simplest possible" monadic function. We've just wrapped up list so that it takes but one argument. Every monad needs a return, and every return does something like the above - it wraps its input into the monad in a simple way.

stub is also a monadic function - it ignores its inputs and returns the empty list.

Ok, lets write some semi-meaningful list-monadic functions. Consider:


(defvar *people* '(:ted :lea :leo :james 
                   :harvey :sally :jane :andrew 
                   :catherine) 
                   "A list of all the people that matter.")
(defvar *friends-db* 
      '((:ted (:lea :leo :sally :andrew :catherine :leo :jane))
        (:lea (:ted :leo :jane :andrew :harvey :sally :catherine))
        (:leo (:ted :lea :ted :harvey :sally :jane :andrew 
                    :catherine :harvey :andrew :catherine))
        (:james (:jane :harvey :jane))
        (:harvey (:leo :lea :leo :james :harvey :harvey :sally))
        (:sally (:ted :leo :lea :harvey :jane :andrew))
        (:jane (:lea :leo :james :sally :ted :james :andrew :catherine))
        (:andrew (:ted :lea :leo :sally :jane :leo))
        (:catherine (:ted :leo :lea :leo :jane :catherine :catherin))) 
"Our database of friend connections.")

(defun friends-of (person) 
  "Return a list of all the people in friends-db."
  (alist *friends-db* person)
         ;alist is a function which retrieves
         ;a key's data from an association list.
)

(defun mutual-friends-with (person1)
  "Return a monadic function which returns 
   the friends person1 has in common with person2."
  (lexical-let ((p1-friends (friends-of person1)))
    (lambda (person2) 
      (let ((p2-friends (friends-of person2)))
       (filter (lambda (friend)
                 (in friend p1-friends)) p2-friends)))))

Friends-of is more or less obvious enough. It simply retrieves the database entry in the friends database, here represented as an association list. It takes a person and returns a list of persons, so it is a monadic function.

The second function is a little more complicated. It takes a person and returns a function, so its not a monadic function itself. It does, however, return a monadic function - one parameterized on person1. It returns the friends of person2 that are also friends with person1. This business of having a function which returns a parameterized monadic function is pretty common, and it will lend monadic expressions a certain domain-specific-language feel, as you'll see. Note that we need a lexical-let in the function body to create a closure around p1-friends. If we used a let, p1-friends would be out of scope by the time someone called the lambda. If this is confusing, see the wikipedia article on scope in programming languages. Emacs has a default dynamic scope, but most modern languages are lexical2.

But we were meditating on function composition. Before going any further lets implement a function which composes functions. The only wrinkle here is that order of composition is reversed from the order of application (f1 is applied before f2) because this looks more natural in an applicative language.


(defun compose2 (f2 f1)
  "Compose F2 and F1 by returning a new function which
   calls F1 on its arguments, then calls F2 on the result"
   (lexical-let ((f1 f1)
                 (f2 f2))
    (lambda (&rest args)
      (funcall f2 (apply f1 args)))))

(defun compose (&rest fs)
  "Compose the functions FS (FN FN-1 FN-2 ... F1) 
   beggining with F1 and working backwards."
  (let ((rfs (reverse fs)))
   (reduce
     (lambda (acc-f f)
       (compose2 f acc-f))
     (cdr rfs)
     :initial-value (car rfs))))

Let's try just composing some of our list monadic functions. We might actually want a function to calculate the friends-of the friends-of someone:


(compose #'friends-of #'friends-of)

Because this is Elisp, this doesn't cause an error. Of course, if we think about this, we know it can't work. As soon as well call this function, we'll get an error (actually, it will return nil, because a list of friends is not a key in the friends db), because the second friends-of will be applied to a list, rather than a single person. Try it, if you are having trouble seeing it.

Suppose we were dead set on jamming these two functions together and, in particular, we want to make sure that the resulting function can be used wherever either of the two functions might have been used. That is, we want the result type to be the same as the input types. There are several approaches, obviously, but here is a sort of obvious one (read the comments here, they are important):


(defun list-func-compose (f2 f1)
   (lexical-let ((f1 f1)
                 (f2 f2))
    (lambda (arg) ; lambda need take only a single arg, as f1 must
      (let* ((r1 (funcall f1 arg)) 
        ; r1 is a list, because it is the result of f1, a monadic function.
        ; f2 accepts regular old values, though, and r1 could
        ; concievably have many values in it.
             (r2s (mapcar f2 r1)))
          ; now r2s contain the results of applying f2 to ALL
          ; the values in r1.  Each of these values in r2s is
          ; itself a list, because f2 always returns a list (it 
          ; is a monadic function itself)
             (reduce #'append r2s)
          ; Well, we want a list of things in the monad, not a
          ; list of lists, so we just append all the r2s together 
          ; (see footnote #1).
          ; Now we've got a list of lisp values, which  means
          ; that this lambda is a monadic function in its own
          ; right.  Success!!
          ))))

(n.b. If something about r2s type strikes you as odd, see footnote 1.)

Pause to consider what happened here. We apply f1 to the input, which produces a list (a monadic value). F2 takes values, not monadic values, so we just mapcar f2 over the values in the list. F2, like f1 returns a list, so now we have a list of lists, which we concatenate into a big list, which is now our output monadic value.

I hate when, in the course of a didactic development, something springs from nowhere with the proviso that "it will be useful later." This kind of thing is typical of physics and math texts, and I think its part of the reason they are so difficult to read. However, for the sake of future utility, we are going to factor out a piece of the above code. Consider the function list-bind which takes a monadic value and a monadic function and returns a new monadic value:


(defun list-bind (mon-val mon-fun) ; -> mon-val 
       (let ((results (mapcar mon-fun mon-val)))
          (reduce #'append results)))

We can rewrite list-func-compose like this, now:


(defun list-func-compose (f2 f1)
  (lexical-let ((f1 f1)
                (f2 f2))
    (lambda (value) 
       (let ((monadic-value (funcall f1 value)))
          (list-bind monadic-value f2)))))

Ok, what is so special a priori about list-bind (post priori, its one of the monad functions which any monad needs)? List-bind is, in a way, the fundamental operation relating monadic functions and monadic values in the list monad. List-func-compose might seem more directly useful, but it is kind of strange that the resulting function takes a non-monadic value and returns a monadic one.

List-func-compose relates monadic functions and other monadic functions, but it doesn't reveal much about how they are combined with values. List-bind describes that process, and it turns out this is more interesting.

Another way of thinking of it is that if list describes how to turn a value into a monadic value, list-bind relates a mondadic function to a function which both operates on and returns monadic values.

Monadic Bind

Ok, believe it or not, we've basically covered the entire sequence monad already. All that is left is to really get the idea and to understand how it relates to Haskell's do notation and the canonical monad operations. But it bears repeating - "do notation" isn't the same thing as monads. It is just syntactic sugar. Its might be best to understand monads before do notation, depending on your temperament.

Let's try things out, however.


(funcall (list-func-compose #'friends-of #'friends-of) :leo) 
(:lea
 :leo :sally :andrew :catherine :leo :jane :ted :leo :jane :andrew
 :harvey :sally :catherine :lea :leo :sally :andrew :catherine :leo
 :jane :leo :lea :leo :james :harvey :harvey :sally :ted :leo :lea
 :harvey :jane :andrew :lea :leo :james :sally :ted :james :andrew
 :catherine :ted :lea :leo :sally :jane :leo :ted :leo :lea :leo
 :jane :catherine :catherin :leo :lea :leo :james :harvey :harvey
 :sally :ted :lea :leo :sally :jane :leo :ted :leo :lea :leo :jane
 :catherine :catherin)

(We're probably really interested in the unique values of this list. We could use something called the set monad to get those, or we could just pass this result through unique.

Pretty anti-climactic, for sure. But we can do more interesting things:


(funcall 
  (list-func-compose 
     (mutual-friends-with :lea) #'friends-of)
  :leo)

This composition gives a list of the friends of Leo who are also friends with Lea. As indicated above, what we've done here is inserted a parameterized monadic function into the composition. This turns out to be useful. It also hints at the next step. What if we wanted to parameterize a monadic function based on some of the values "coming down the pipe" in the monad? How could we do that? In a regular function composition all those values are invisible. Is there way we can name them which preserves the utility of this approach?

Stepping Back

Monads in one sentence: a monad is set of operations which relate specific functions called monadic functions and specific values, which are either naked values (the input type to monadic functions) or monadic values, which is the output type of monadic functions.

In particular, the bind operation knows how to pull values out of monadic values, apply monadic functions to them, and collect all the resulting monadic values into one big monadic value. Using bind we can compose or otherwise manipulate monadic functions in a controlled way.

Everything else is window dressing.

Window Dressing

Let's talk about let*. We've been twisting up our brains into knots over monads, so let's let those knots untwist and return to this simple, clean, construct. In Lisp, variable bindings are explicitly introduced. You don't just declare a variable and go, you create a context for that variable with a let or let* statement. You probably know that if you only had lambda and defmacro you could define let like so:


(defmacro my-let (bindings &rest body)
  (funcall (lambda ,(mapcar #'car bindings)
             ,@body)
      ,@(mapcar #'cadr bindings)))

eg:


(my-let ((x 1) (y 2)) (+ x y))

expands to


(funcall (lambda (x y) (+ x y)) 1 2)

It is clear that x and y can't depend upon one another in this form:


(funcall (lambda (x y) (+ x y)) 1 (+ x 2))

Obviously won't work (why?). let* is the form which lets us chain together multiple variable bindings so that previous bindings are active during the expression form of subsequent bindings. Of course you can implement it as a series of nested lets, but its useful to write an implementation using only lambda.


(defun empty? (x) (not x)) ; sugar to test if a list is empty.
                           ; nil is the empty list, (not nil) -> t
(defmacro my-let* (bindings &rest body)
  (cond
    ((empty? bindings) `(progn ,@body))
    (t 
     `(funcall (lambda (,(car (car bindings)))
          (my-let* ,(cdr bindings) ,@body))
        ,(cadr (car bindings))))))

Don't get spooked by this recursive macro. All it means is that


(my-let* ((x 10)
          (y (+ x 11)))
   (+ x y))

expands to


(funcall 
      (lambda (x)
        (funcall (lambda (y) (progn (+ x y))) (+ x 11))
     10))

That is, each subsequent bind value expression is evaluated in the context of a function where the previous binding symbols are already bound.

Holy balls! This is almost function composition! The lambda containing our body (the progn form) is f1 and each containing lambda represents a composition onto this inner function. The only difference is that we're threading function application through the composition in such a way as to provide named values into more deeply nested contexts. Things are about to get serious!

Also, we are just throwing the word bind around like crazy! Does this use of the word bind have something to do with the list-bind function?

You bet it does! You know, ordinarily I could take or leave Lisp-2's. It seems kind of crufty to me to have to worry about a second namespace, to have to remember to type (funcall f-var a b c) or (funcall #'f a b c) but in this one case, its actually really useful that we have to write out funcall explicitly. Its useful because it reminds us something is going on there. What would happen if we replaced those funcalls in the above expansion of let* with some other function? Let's kick this can around a bit, and see if something falls out, shall we?


(defun regular-bind (v f)
  (funcall f v))

So in the let* expansion funcall is only ever taking functions with one argument, so the first thing we do is declare a new function which applies a single argument function to a single value. Why did we name it regular-bind? Well, think about the expression:


(regular-bind 10 (lambda (x) (+ x 1))); -> 11

In (lambda (x) (+ x 1)) x is unbound. That is what a lambda is, essentially, a chunk of code in which the arguments are unbound, waiting to be bound so the body can be evaluated. Regular-bind is just a function which binds a value (10, above) to a variable and executes the code. Note: funcall takes f and then v, but regular-bind takes v and then f. This might make more sense depending on how you read code ("bind v in f") or it might not. It is just convention. It is called regular-bind because it doesn't really do anything that funcall doesn't already do. Monads, though, are all about irregular bind operations.

Let's rewrite let* so that it uses bind instead of funcall:


(defmacro my-let* (bindings &rest body)
  (cond
    ((empty? bindings) `(progn ,@body))
    (t 
     `(regular-bind
        ,(cadr (car bindings))
        (lambda (,(car (car bindings)))
                      (my-let* ,(cdr bindings) ,@body))))))

Easy enough.

let* is how you thread together computations in the "identity monad". You may have heard someone say something like "regular lisp lives in the identity monad." This is what they mean. Normally, all bind does is associate a value with a symbol and go. The above let* expression now expands to:


(funcall 
      (lambda (x)
        (regular-bind  (+ x 11) (lambda (y) (progn (+ x y)))))
     10)

You should be chomping at the bit now, because we've basically invented do notation. The question you should be asking is "What happens when we replace bind with some other function?" Let's make a slightly newer macro to play with that idea.


(defmacro monadic-let*-inner (bind-symbol binders &rest body)
   (cond 
     ((empty? binders) `(progn ,@body))
     (t 
       (let ((symbol (car (car binders)))
             (bind-expression (cadr (car binders)))
             (leftover-bindings (cdr binders)))
      `(funcall ,bind-symbol 
          ,bind-expression
           (lambda (,symbol)
              (lexical-let ((,symbol ,symbol)) ; create a lexical 
                                               ; over ,symbol.  
                                               ; we want one
                                               ; for lots of monads.
               (monadic-let*-inner 
                   ,bind-symbol ,leftover-bindings ,@body))))))))

(defmacro monadic-let* (bind-f-expression binders &rest body)
  (let ((bind-symbol (gensym "temporary-bind-symbol-")))
    `(let ((,bind-symbol ,bind-f-expression))
          (monadic-let*-inner ,bind-symbol ,binders ,@body))))

This new form takes an expression which evaluates to a bind function, a set of let-like binders, and a list of body forms and then it just builds the same code as a let* except that it uses the specified bind operation rather than a mere funcall. Let* sequences computations. Monadic-let* sequences operations through a monad, the behavior of which is specified in bind.

Let's try this crazy thing out:


(monadic-let* 
       #'list-bind 
       ((x '(1 2 3)) 
        (y (list (+ x 1) (- x 1))))
     (list y)) ; -> (2 0 3 1 4 2)

Congratulations, friends, we are now living in the list monad.

Let's think about what this is doing one more time. We have a series of expressions (the right hand portion of each binding form). The monadic bind operation assumes each expression is a monadic value. In our case, these values are lists of lisp data. Rather than bind the result of this expression to the symbol on its left directly, our bind specifies that we should bind the symbol to each value inside the monad in turn, evaluate the subsequent binding expressions in the same way, evaluate the body, which must be a monadic value, collect each of those values, and then, finally, combine them back into a monadic value again before returning the result.

That is really complicated

Why would anyone do this?

The reasons for doing something like this are as varied as the kinds of monads that are out there. In Haskell, monads are a way of writing imperative style code in a pure fashion, but even in non-static, non-pure languages, monads can pull some neat tricks. The sequence monad, combined with monadic-let* is essentially a list comprehension. Suppose we want to collect all the combinations of two numbers less than 100 whose sum is less than 10.


(monadic-let*
 #'list-bind 
 ((x (range 1 101))
  (y (range 1 101)))
  (if (< (+ x y)  10) (list (vector x y)) nil))

;-> ((1 1) (1 2) (1 3) (1 4) (1 5) (1 6) (1 7) (1 8) 
       (2 1) (2 2) (2 3) (2 4) ...)

Here we simply return the empty list when we don't want to put anything into the monad because appending the empty list onto something does not change it. Other applications in an impoverished language like emacs is to use the continuation monad to fake co-routines or to use a stream monad to make dealing with lazy lists easier. We'll revisit this in another post.

Notes about the monads.el library

(note: I've made some changes to the `monads.el` library so that there are monad sequencing forms that cover a large range of possible intents. I've also renamed some of the forms, which I've tried to reflect here. If you can't get some code to run, look at `monads.el`. Each of the forms is documented.)

It turns out its handy in the context of monadic operations to have bind and return (and some other things) dynamically defined, so in the monads.el library, monadic-let* is slightly different. In that library, monads are defined as hash tables containing :m-bind and :m-return keys which associate with the appropriate functions. Eg:


(defvar monad-seq 
  (tbl! 
   :m-zero (list)
   :m-return (lambda (x) (list x))
   :m-bind (lambda (v f) (apply #'append (mapcar f v))))
 "The list/sequence monad.  
  Combine computations over multiple possibilities.")

Note here that tbl! is is just sugar for make-hash-table. The nearest equivalent to what we've called monadic-let* is mlet which takes as its first argument a monad represented as a hash table. In addition to the bind-chaining described above, it introduces a dynamic context wherein the functions m-bind and m-return are defined for that monad. Because we're in emacs, and everything just sits in one giant rotting pool of symbols, we prefix the monadic functions with m- to avoid future name collisions.


(mlet
 monad-seq
 ((x (range 1 101))
  (y (range 1 101)))
 (if (< (+ x y) 10) (m-return (list x y)) nil))

Note that we can use m-return inside mlet-like forms, because they define them automatically.

Monads.el defines several common monads in addition to the sequence monad. These include the state monad, the continuation monad, the set monad (parameterized on an equality predicate), an Error monad, and a Maybe monad. Monad-parse.el defines a monadic parser combinator library which is really useful (to me, anyway).

Sujar

The monad library can make use of my implementation of Clojure-style destructuring bind too. One can write the above expression as


(domonad monad-seq 
 [x (range 1 101)
  y (range 1 101)]
 (if (< (+ x y) 10) (m-return (list x y)) nil))

Besides the superficial change of using a vector for the binding expressions, you can use arbitrary nested binding forms inside. If you are familiar with clojure, it works basically like that, except that tables are destructured with [:: ] instead of { } and you can use table-destructuring on hash tables and alists.

Conclusions

For me, monads didn't hit home until I walked through the process of creating a new monad from scratch based on my own ideas. This is described in weighted-graph-monad.el, but in order to move forward in our discussion of Kanren, the logic language, we'll need to develop a stream monad, which facilitates work with lazy lists. That will be the next example.


Footnote 1: Here is a bit of a confusing thing. The monadic values of the list monad are lists of lisp data types. So a list of lists is, in fact, a monadic value, which means that r2s above could be considered, itself, to be a monadic value. If this didn't occur to you, just skip the rest of this note. I bring it up only because if it has occured to you, it might be somewhat confusing. You might ask: why don't we just return r2s instead of (reduce #'append r2s)? This kind of of conceptual confusion is related to the fact that Lisp is not statically typed (or is statically typed but you are restricted to a single type of which the various "types" of values constitute classes).

In a statically typed language the sequence monad would be parameterized on the type of its contents, which we'd specify where we were using it. So we'd have a type like SeqM Int (read that as "the sequence of ints monad") which would tell use that our monadic values were lists of integers only, and consequently r2s above would be manifestly not a monadic value, and it would be more obvious that we needed to append its contents together before recapturing a monadic value. In a dynamic language this isn't as obvious, but its still conceptually necessary. The upshot is that the fact that r2s is a monadic value is a coincidence, the essence of the list monad is still to combine the results of monadic functions, rather than to merely collect them, which is what returning r2s itself would represent. If you did read this despite my warning, you're probably really confused. Its ok. This stuff is confusing at first.)


Footnote 2: This is a long, diversionary footnote. Probably its best to finish the rest of the document before reading.

For some reason I find it useful/pleasurable/amusing to consider the question of monads in languages without scope at all. Namely, stack languages like Factor or Joy. Not only are these languages without scope, by default, variables are not even named. Values are merely pushed onto and pulled off of a stack which is passed implicitly between "words," the analog of functions:


(require 'with-stack)
(||| 4 4 + 3 -) ; -> 5

Here we are using a stack language interpreter which lives inside emacs lisp. You can get "with-stack.el" from my github.

Without a means of binding variables, "lambdas" become merely "quotations," simple lists of literals and words which can be executed by words like call.


(||| 4 '(4 +) call '(3 -) call) ; -> 5

In the "emacs stack language" quotation serves directly as the quote operator, and quotations are represented as lists. Executing a quotation is the same looping over it, executing each word as it is encountered, or pushing atoms onto the stack.

The nakedness of these quotations encourages languages to use them as blocks in control structures:


(||| 1.0 1>random* .5 > '("true" print) '("false" print) if)

The above will print "true" 50% of the time and "false" 50% of the time. I don't want to get bogged down with a full with-stack tutorial, but "1.0 1>random*" means "push 1.0 onto the stack and call the emacs lisp function random* with 1 argument from the stack, pushing the result". The word if takes a boolean and two quotations, executing the first when the bool is true and the second when it is false.

Ok, so quotations are lambdas. What are monadic quotations of the list monad? These must be quotations which take a value off the stack and return a list of values. The quotation version of return for the list monad would be:


(||| word: list-return 1>list end:)

That is, take a value off the stack and return a list with that value in it.

What about bind?


(require 'stack-words)
(||| word: list-bind ;( mv f (v ;; mv) ;; mv )
     map '(append) reduce end:)

Characteristically, the word version is quite terse. Interestingly, we hardly need to have an analog of do notation, because variables are not named and application is merely juxtaposition:


(||| word: bi>list ;( item qtn1 qtn2 ;; ( res1 res2 ) )
     bi 2>list end:)
(||| '(1 2 3) 
     '( '(1 +) '(1 -) bi>list )
     list-bind
     '( '(3 +) '(3 -) bi>list )
     list-bind) ;-> (5 -1 3 -3 6 0 4 -2 7 1 5 -1)

(The word bi>list takes an item and two quotations, applies the quotations each to the item and collects the results into a list of two elements. It assumes the quotations take one item off and push one onto the stack.)

Wherever we want to use monadic behavior, we simply replace call with list-bind. We can write a word to make this more pleasant:


(||| word: fold-bind-into ; ( qtnseq bind-qtn ;; qtnseq )
      '( curry ) curry map 1>flatten-once end:)
(||| word: monadically ; ( init-val qtnseq bind-qtn ;; result )
       fold-bind-into call end:)

(||| '(1 2 3)
       {{ '( '(1 +) '(1 -) bi>list )
          '( '(3 +) '(3 -) bi>list ) }}
     '(list-bind) monadically )

Where {{ is a parsing word which makes a list of what is on the stack between {{ and }}. We could have used quote to make the list of quotations, but we'd be getting a lot of quote marks on the screen. All this word does is fold the contents of a binding quotation (here a simple call to list-bind) into the sequence of quotations we want to operate monadically. Then we simply call the resulting quotation.

Assuming some comfort with the idioms of concatenative languages, the result is an incredibly simple demonstration of the essence of monadic operation. It strikes me that it is often the lowest-level words which are the hardest to read and write in concatenative languages, while high level code has an almost comic simplicity. I wonder what this means?