Thursday, April 26, 2012

re-enable the "Reader" logo-link in Google Reader.

Google recently changed the behavior of their google reader app in one small but incredibly obnoxious way.  Used to be, you'd click on the "Reader" logo  in the top rightish area of the page and you'd go back to the "home" page of Reader.  You know, pretty much like every single web page in the entire universe.

They disabled this perfectly reasonable feature, presumably because of some loathsome person in marketing who felt that users should have to select Reader again from the "More" tab on the little toolbar at the top of the page, forcing them to peruse googles other fantastic services.

This sucks.  

Luckily, this is simple to repair using Greasemonkey or some other script manager.  Add this script and set it to match against "http://reader.google.com/view/*":

var link = document.getElementById("logo-link");
link.setAttribute("href","https://www.google.com/reader/view/#overview-page");

Use the extra time you save not having to find the reader link in the menu to research alternatives to google services, since it looks like google is going to jump the shark pretty soon.


Wednesday, April 18, 2012

Parenlab: S-expressions on top of Matlab/Octave

I am two people - one of them is a numerical scientist who spends most of his time writing code in Matlab to do data analysis. The other is a freelance Lisp programmer, where I do regular software engineering. I've long been a Lisp enthusiast, and I do a fair amount of Lisp programming in Emacs Lisp as part of my scientific work, since Emacs is my Matlab IDE, but I didn't start really programming in Lisp for "real things" until recently. As the number of hours dedicated to Lisp increased in my life, I began to really miss the features in Lisp that I didn't have in Matlab; in particular s-expression motion and editing and metaprogramming.

I can't abandon Matlab, however. Matlab's support for numerical programming is exceptional, true, but the real value is that its plotting system is rich, well integrated and customizable. Plus, I have 7+ years of Matlab utilities which it would take significant time to reproduce in another numerical analysis language, even if it was feature complete otherwise.

So I brought the mountain to Muhammad.

Parenlab

Parenlab is a Lisp which compiles to Matlab by way of Emacs Lisp. What this means is that the syntax is Emacs Lisp and the Semantics is Matlab. Metaprogramming is possible via macros which transform the elisp representation of Matlab code before the code is translated to Matlab. Because I use Emacs as my Matlab IDE, this is a reasonable solution - just use the Emacs Lisp interpreter as the macro language. One day, Parenlab may be self-hosting.

Examples:

The parenlab code:

(let* ((x (range 0 (* 2 pi) 100))
       (y (sin x)))
 (plot x y :color "g"))

Is converted to:

funcall(...
    @(x)funcall(@(y)plot(x, y, 'color', 'g'), sin(x)),...
     range(0, mtimes(2, pi), 100))

Or:

(defun (a b c) some-function (a b c)
   "Documentation."
   (setq a (++ a b c))
   (setq b (++ a b c))
   (setq c (++ a b c)))

Is converted to a file, called someFunction.m which contains:

function [a,b,c] = someFunction(a,b,c)
%Documentation.
'Documentation.';
a = plusplus(a, b, c);
b = plusplus(a, b, c);
c = plusplus(a, b, c);

defuns can be sprinkled throughout your script files - they will be transcoded as they are encountered and produce no output in the currently building function. One can also say:

(script some-script-name 
 <parenlab-code>)

Which indicates the code beneath should be transcoded to a script file rather than "in place."

Parenlab is still a work in progress, but it is rapidly approaching the point where any functionality you wish can be expressed in Parenlab.

Notes on Usage

Mangling

Parenlab, like Parenscript, tries to make life easy by letting you write lisp-style identifiers. In doing so, it mangles names during translation. Eg, a parenlab symbol:

some-function

Will be rendered as:

someFunction

eg, dashes followed by letters are converted to camel case. Other "special" characters are transcoded according to the following table:

("+" "plus")
("-" "minus")
("*" "mtimes")
("<" "lessThan")
(">" "greaterThan")
("$" "cash")
("=" "equal")
("!" "bang")
("?" "who")
(":" ":")
("/" "divide")
("\\" "mdivide")
("#" "hash")
("@" "at"))

These choices are made to improve transcoding, so that many operators don't need special cases. Eg (* 10 10) translates to mtimes(10,10) - conveniently using Matlab's built in mtimes function, which is equivalent to Matlab's * operator.

One wrinkle is that : is reserved to preserve simple matrix construction expressions. For instance,

x:some-variable:y

transcodes to

x:someVariable:y

This syntax is limited by how the lisp reader reads symbols. For complex generation expressions, use the (: ) macro. Eg:

(: start step stop)

Keywords, that is symbols starting with :, are transcoded to mangled strings, so that :a-keyword becomes 'aKeyword'.

Since many matlab functions use strings as keyword arguments, this lets you use keywords for them instead.

nil transcodes to []. There are no true and false values in Matlab.

Scope and Variables

Scope in Matlab is funny - only limited lambdas are allowed in scripts and functions, although nested functions with full definitions and full lexical scope behavior are allowed within functions (though not anonymously).

Sort of like in Python, lambdas are restricted to single-expression-only bodies. While they do form closures over their lexical environment, those closures are static, that is, they do not permit side effects of any kind.

That is,

(setq y 10)
(setq f 
 (lambda (x) 
   (setq y x)))

Will compile fine in Parenlab, but produce an error, because the inner setq is not an expression, but a statement, which isn't allowed. Parenlab does provide a progn form, so you might try:

(setq y 10)
(setq f 
 (lambda (x) 
   (progn (setq y 10)
          nil)))

This will generate Matlab code which uses eval to produce code which expresses this intent, but you'll still get an error, saying that the lambda tries to modify its static environment, which is not allowed. let and let* are implemented using lambda, so they create static lexical environments. It is conceivable that major cross compilation tricks could be used to simulate dynamic scopes, but I'd prefer to keep things simple.

You shouldn't be using side effects anyway.

The rule of thumb is write Matlab code with s-expressions, not Lisp code that calls Matlab functions. Parenlab tries to bridge the gap but in some ways Matlab is too limited.

If Statements

If in Matlab is "flat", in that each branch is a series of statements or expressions and the if statement itself doesn't return a value. Parenlab if statements are not flat - they return the value of whatever branch is evaluated.

This rule is sometimes inconveneint for the Matlab idiom, so it can be disabled if the branches are blocks. A statement of the form:

(if condition 
  (block 
    <some-code>)
  (block 
    <some-other-code>))

Transcodes to:

 if condition 
    some code
 else
    some other code
 end

Ordinarily, if expands to

fif(@()condition, @()true-code, @()false-code)

Here we use the trick of using lambda to delay evaluation. This if expansion cannot perform side effects, unfortunately. Use (block) legs if you want regular matlab semantics. Both legs must be blocks.

Cell Arrays

Indexing is identical function calls, syntactically, so the index expression:

x(1:10)

Is written as

(x 1:10)

In parenlab. Matlab also has cell array indexing, which looks like this:

x{1:10}

Which is expressed like this in Parenlab:

({} x 1:10)

These are macros, not functions, and so the use of end as an identifier inside them is fine, eg:

({} x 1:end)

Will convert to:

x{1:end}

Structs

Structure access can be written as symbol.name since the parenlab mangler leaves dots in symbol names. Programmatic access, where a string is used, is written with ->, eg:

(-> s :field)

Which is equivalent to s.('field'). -> supports nested access, so you can say;

(-> s :f1 :f2 :f3)

Which is like this: s.('f1').('f2').('f3') which may not actually be valid Matlab syntax. Parenlab implements this as a function call.

Macros

In elisp, parenlab macros can be defined with pl:def-pl-macro. During transcoding, the form defmacro causes a new parenlab macro to be defined and generates no output. Parenlab macros have the same semantics as regular Macros. You get syntax passed to the macro as arguments, you transform it into valid parenlab, and you return the result.

For instance, a macro which introduces a lexical variable binding called with, like this:

(with x 10 (* 2 x)) ;-> 20

Is implemented like this:

(pl:def-pl-macro with (symbol value expression)
 `(funcall (lambda (,symbol) ,expression) ,value))

Or, inside some Parenlab code:

(defmacro with (symbol value expression)
 `(funcall (lambda (,symbol) ,expression) ,value))

Other Notes

Right now you use parenlab by invoking pl:transcode on an s-expression. This will transcode in the current buffer. pl:transcode-to-string will transcode to a string instead.

Parenlab requires matlab-mode so that it can indent its outputted code correctly.

I have a highly idiosyncratic Matlab setup, so I put some integration code in auxilliary.el. If you have a Matlab process running in a buffer called *evalshell*, then this code will let you load a parenlab file and press C-c C-c to "compile" and execute the code therein. This process will write any script and defun forms to files before executing the code. C-x C-e will evaluate the last s-expression as Matlab code in the interpreter.

Parenlab depends on Shadchen-el my pattern matching library.

Thursday, April 5, 2012

Shadchen-el introduction and defpattern tutorial

Lately I've been working on an Emacs Lisp library I'm pretty proud of, Shadchen. It implements extensible pattern matching, somewhat like Racket's Match facility for Emacs Lisp. In this tutorial/introduction I'll explain how to use Shadchen's various facilities and, most importantly, how to extend Shadchen itself with new patterns, which is an interesting subject in and of itself, combining compile time and run-time execution in interesting ways. If you already know about pattern matching, feel free to jump to the end, where I talk about writing non-trivial patterns using Shadchen's defpattern.

The Problem Shadchen Solves

Shadchen solves several problems you may not know you may have. In one sentence, Shadchen lets you concisely express both destructuring and type checking for complex data structures. It can be thought of as combining the features of cond, case and assert into one nice package. My experience is that this collection of features helps me write better code, since it encourages me to dilineate exactly the kind of data a function or form expects before doing anything with it.

Shadchen is also conservative - unless you provide a pattern that matches the input data, it will fail with a match error, so you know something is wrong before something strange happens.

For instance, suppose we were writing an interpreter for Lisp. It might look like:

(defun eval (form env)
 (cond form 
  ((symbolp form) (eval-symbol form env))
  ((listp form)
   (case (car form)
    (if (handle-if (cadr form)
                   (caddr form)
                   (caddr form) env)) 
    (let (handle-let 
          (cadr form)
          (cdddr form)))
    ...))))

Note that we have both a cond and a case here, and that after we test our data, we destructure it. Here is a similar piece of code using Shadchen:

(defun eval (form env)
 (match form 
  ((p #'symbolp s) (eval-symbol s env))
  ((list) nil)
  ((list 'if pred true-branch false-branch)
   (handle-if pred true-branch false-branch env))
  ((list-rest 'let (list-rest pairs) body)
   (handle-let pairs body env))
  ...))

(Notes: the pattern p passes when the predicate as its second argument is true on the match value, and then matches against the third argument. So (p #'symbolp s) matches only when form is a symbol and then binds s to that symbol. list matches when the input is a list and each pattern in the the list pattern matches each corresponding element in the list. list-rest is similar but any leftover parts of the list are matched against the final pattern.)

This code is more concise, and yet it is also much more explicit, both in that it provides better naming for values and it provides more explicit error checking. For instance, this version will only match if with three expressions, where as the previous evaluator would have been fine with the expression (if a b c d e f). This version also asserts explicitly that the binding part of the let needs to be a list. With a custom pattern we could also ensure it was a list of symbol/expression pairs in almost the same space.

It takes all kinds, but I found that once I got used to programming with pattern matching, it was hard to go back.

Other Rad Features of Shadchen

match-let

Shadchen wants to let you program in a functional style. To that end, in addition to the regular pattern matching form match, it also provides some other, nice features. Many algorithms involve examining some intermediate data, checking its structure somehow, and then recursively processing the next step. Shadchen allows this kind of thing with the Scheme-flavored match-let form.

The form match-let can be used exactly like let:

(match-let 
  ((x 10)
   (y 11))
 (+ x y))

But in each "binding" pair, the symbol may be replaced with any Shadchen pattern. Eg:

(match-let 
 ((x 10)
  (y 11)
  ((list q r s) (list 1 2 3)))
 (+ x y q r s))

Will give you 27. If any pattern fails, the form produces a match fail error, which means you can use match-let as a let form with tidy type checking.

Finally, a match-let form allows tail recursion. Invoking recur in a tail position inside the form causes the match-let to be re-entered without growing the stack. For instance:

(match-let 
 (((list x y) (list 0 0)))
  (if (< (+ x y) 10000)
      (recur (list (+ x 1) (+ x y)))
      (list x y)))

Results in (141 9870) and can't blow the stack. It is an error to invoke recur in a non-tail position, but because of limitations in Emacs Lisp, it is difficult to enforce this statically.

defun-match

The form defun-match lets you write functions which pattern match on their arguments and split their calculations across multiple bodies, in a bit like the style of Shen or Haskell.

For instance, suppose we have an animal simulator, where each animal is represented by a list, the first element of which is a symbol representing the animal name. We can say:

(defun-match- vocalize ((list-rest 'cat properties))
  "Cat vocalization."
  (message "Meow"))

(defun-match vocalize ((list-rest 'dog properties))
  "Dog vocalization."
  (message "Woof"))

Then:

(vocalize '(cat :name tess))
(vocalize '(dog :name bowzer))

Functions defined with defun-match can also use recur to re-enter themselves without growing the stack. Consider a function which causes a list of animals to vocalize:

(defun-match- vocalize-list (nil) nil)
(defun-match vocalize-list ((cons animal animals))
  (vocalize animal)
  (recur animals))

recur can dispatch to any of the bodies defined for the function and it doesn't grow the stack. It must be invoked from tail position, though non-tail calls can be affected by simply calling the function.

(N.B. defun-match- with that dangling minus sign causes previous bodies to be expunged before defining the indicated body.)

Extending Shadchen with defpattern

Shadchen is an extensible pattern matching facility. We can define new patterns much in the way we define new functions, although patterns are more like macros than functions. Let's look at a simple example, and then I'll guide you through a more complex example I just added to the library using the defpattern.

A quirk of Common and Emacs Lisp is that (car nil) is nil even though nil is not a cons cell, and so does not have a car or a cdr. I hate this behavior, because its quite evident that (cons nil some-list) is different from nil, but car can't tell that - the user has to do more inspection to find this out. Bugs waiting to happen, let me tell you.

However, I'm nothing if not accommodating, and so the cons pattern in Shadchen will, in fact, match against nil. So:

(match nil 
 ((cons a b) (list a b)))

Will be '(nil nil). Let's define a pattern which is like cons, but only matches against actual cons cells, into which category nil fails to fall.

(defpattern strict-cons (car cdr)
 `(p #'consp (cons ,car ,cdr)))

A defpattern body must evaluate to a legal shadchen pattern. Each argument to the defpattern is also a shadchen pattern. So this pattern reads "define a new pattern strict-cons, which first checks that the match value is a cons cell using the p pattern, and then matches the car and cdr of that cons cell against the patterns car and cdr.".

During the expansion of a shadchen pattern matching form, user defined patterns are looked up and their expansions are inserted into the macro expansion. In short, defpattern allows you to define new patterns in terms of old patterns.

This might seem very restrictive, but Shadchen provides primitive patterns that allow you to write arbitrarily complex pattern matchers that can perform rich computations on their way to rejecting or accepting a match.

Implementing concat, a non-trivial pattern

I just used defpattern to implement a pretty complex pattern, concat and it was something of a learning experience. Writing complex patterns definitely takes some thought and practice, but hopefully this tutorial will bootstrap users to a point where their own patterns can be implemented without too much pain.

What is so complicated about a concat pattern? Well, we want concat to match the concatenation of patterns which match strings. Eg:

(concat "dog" "cat")

Should match "dogcat". Writing a pattern that has this behavior is easy:

(defpattern concat (&rest strings)
 (reduce #'concat strings))

This pattern can't match subpatterns that are anything other than strings, however. We'd really like to be able to match, for instance:

(concat (and (or "dog" "cat") which) "dog")

against either "dogdog" or "catdog", binding which to whatever the initial string contents actually are. How can we do this?

Nailing down concat's semantics.

We want concat to function this way:

If the initial pattern is not a string, then try matching that pattern against larger and larger substrings until either you run out of string to match against, or you match. If you match, then match, again using concat with the unused patterns, against whatever is left of the string after you've removed the part that matched. Repeat until all patterns are exhausted and then make sure the string has been completely consumed too.

If the initial pattern is a string, then just cleave off the same length of characters from the input, and if they match, recursively match the rest. Here is the entry point:

(defpattern concat (&rest patterns)
  (cond 
   ((length=0 patterns)
    "")
   ((length=1 patterns)
    `(? #'stringp ,(car patterns)))
   (:otherwise
    (cond 
     ((stringp (car patterns))
      `(simple-concat ,@patterns))
     (:otherwise 
      `(full-concat 0 ,@patterns))))))

The :otherwise has all the meat, but we defer it to to other helper-patterns; simple-concat and full-concat. Simple concat looks like this:

(defpattern simple-concat (&rest patterns)
  (cond 
   ((length=0 patterns)
    "")
   ((length=1 patterns)
    `(? #'stringp ,(car patterns)))
   (:otherwise
    (let* ((the-string (car patterns))
           (static-len (length the-string)))
      `(and 
        (p #'stringp)
        (p (lambda (s)
             (>= (length s) ,static-len)))
        (p 
         (lambda (s)
           (string= (substring s 0 ,static-len) ,the-string)))
        (funcall (lambda (s)
                   (substring s ,static-len))
                 (concat ,@(cdr patterns))))))))

Look at the backquoted expression. It is an and pattern, which only succeeds if all the patterns beneath it also succeed. These patterns are (p #'stringp), which asserts that the input is a string, (p (lambda (s) (string= (substring s 0 ,static-len) ,the-string))) which asserts that the input is at least long enough to contain the string we want to match against. The next form asserts that the substring of the input equal to the pattern string in length is equal to the pattern. If this is true, then the pattern matches, and we use the funcall pattern to match against the rest of the string with the leftover patterns.

The funcall pattern takes the input to the match, applies a function to it, and then matches the output of that function application to the pattern provided as its third slot.

full-concat is more complex. Note that when we invoke full-concat we provide it an numerical first argument. This number tells the pattern how far into the string to match we've looked, so it starts at zero. After all the first pattern could match the empty string. full-concat looks like this2:

(defpattern full-concat (pivot &rest patterns)
  (assert (numberp pivot)
          ()
          "Pivot should be a number.")
  (cond 
   ((length=0 patterns)
    "")
   ((length=1 patterns)
    `(? #'stringp ,(car patterns)))
   (:otherwise
    `(and 
      (p (lambda (s)
           (>= (length s) ,pivot)))
      (or 
       (and (funcall
             (lambda (s)
               (substring s 0 ,pivot))
             ,(car patterns))
            (funcall 
             (lambda (s)
               (substring s ,pivot))
             (concat ,@(cdr patterns))))
       (full-concat ,(+ pivot 1) ,@patterns))))))

Here we use and again. We first check that the input string is long enough to grab the substring indicated by pivot. If this isn't true, the match fails. We then use the or pattern to indicate a branch. Either of the or patterns might succeed, but the first to do so is the only one that will happen. The first pattern to or uses funcall to peel off the substring of the input from 0 to pivot. If the initial pattern matches, then we use funcall again to get the rest of the string, and invoke concat again.

If this match fails, then we invoke full-concat again, but increment the pivot by one, indicating that we want to check against a larger substring.

If this is confusing, and it is understandible if it is, remember the following: when writing defpatterns, or is used for flow control, and is used to assert multiple things about the input, p is used to assert individual arbitrary conditions on the input, and funcall is used to transform the input for further matching. Recursive pattern expansion is used for iteration 1.

And feel free to contact me with questions, if they come up.


1 It is a lot like writing prolog, actually. Pattern matching is a significant distance from lisp to prolog.

2 After writing this I realized we can do better. If we get a match for the initial pattern, and then check the rest of the patterns, its possible they will fail because the initial match didn't consume enough of the string. It is simple to say, "if the subsequent match fails, keep increasing the pivot and trying again." I leave it as an exercise to the reader to figure out how to represent this trivial backtrackingish thing - but you can always check the source for the solution.