Thursday, February 2, 2012

A (almost) Pure Random Demon Name Generator in Racket

Hey: someone implemented a similar algorithm in one of my favorite languages: Factor! See it here.

Hey. Check out some of these fictional demon names:

("Barzan" "Melkot" "Bandi" "Fek'Ihri" "Krenim")

They are from this list of fictional demon names on the wikipedia.

You see, I've been listening to Songs of the Dying Earth, a short story collection, written by various authors, but set in Jack Vance's Dying Earth setting, which is a far future science fiction/fantasy world where mages insult eachother politely and make mischeif.

Back!

Another thing they do is enslave demons from lower or higher realms. This gave me the idea to eventually write a game where you play as one such enslaved demon, and I thought it would be cute to generate a fresh demon name for each play through. So I wrote such a system, in Racket, and here is a guided tour.

This isn't a very advanced technique, but all the code here is side effect free - so if that floats your boat, or if you are just curious how one programs in such a manner, then read on!

Overview

Our demon name generator will be a modified, ad-hoc Markov Model generator. What does that mean? It is easier to explain than implement, actually. We will take a corpus of demon names (the list from wikipedia) and do some statistics on it. In particular, we'll calculate how often a given letter follows another letter in a name, and then we can generate names based on that table of transitions. The only wrinkle is that the first letter in a name is obviously going to not have a letter before it to bias the choice, and that any letter might lead to the end of the name.

We'll add an additional wrinkle, which improves name generation, which is that we'll maintain a separate transition table for each position in the name. That is, the probability of going from "a" to "b" might be different if "a" is the third character in the name vs the seventh. This will let our model include the fact that certain letters and transitions are more common, for instance, near the front of the name, than near the end.

Populating our Transition Table

Our corpus looks like this:

(define demon-names 
  (list "Abraxas"
        "Abbadon"
        "Agrith-Naar"
        "Aku"
        "Alastair"
        ...))

We need to visit each name in the corpus and scan through it, recording each time, for a given index, that the letter at that index follows whatever letter was previous to the index. We want to maintain a side-effect free discipline, so we will use Racket's purely functional, persistent dictionaries to store the transitions. We will often be incrementing the value at a key in such a dictionary by 1, so lets get that code out of the way now:

(define (dict-update d key fun . args)
  (match args
    ((list)
     (let ((val (dict-ref d key)))
       (dict-set d key (fun val))))
    ((list or-value)
     (let ((val (dict-ref d key (lambda () or-value))))
       (dict-set d key (fun val))))))

(define (at-plus d k)
  (dict-update d k (lambda (x) (+ x 1)) 0))

dict-update takes a dictionary, a key and a function, fetches the value currently at the position, calls the function on it, and sets that value in the dictionary, the new version produced by which action is then returned. If they key isn't there, the first value in the args list is passed to the function instead, allowing us to specify a default value.

at-plus used dict-update to increment the value stored at a key by one, setting it to one if no such value is present.

Our dictionary is going to associate transitions with counts. How shall we represent a transtion? A transition is a triple, in our case, consisting of the index the transition covers, the previous character, and the current character. We will use a Racket struct to represent a transition as a triple:

(struct triple (index from to) #:transparent)

We will use the symbol 'a to represent the character "a" and so one. For the first and last transition, we will use the special values 'start and 'end. The first transition always goes from 'start to a letter, and any subsequent transition can arrive at 'end, in which case the name is over.

The function which populates the transition table looks like this, then:

(define (populate-transition-table names)
  (let loop 
      ((names names)
       (table (make-immutable-hash '())))
    (match names
      ((list) table)
      ((cons name names)
       (loop names
             (foldl 
              (lambda (triple table)
                (at-plus table triple))
              table
              (string->triples name)))))))

It takes of list of names and iterates over them, accumulating the table as we go. make-immutable-hash returns an empty immutable hash table. The heavy work is done by a call to foldl, which accumulates over the result of calling string->triples on name.

string->triples converts the current name to a list of triples.

(define (string->triples s)
  (let loop
      ((i 0)
       (l (append '(start) 
                  (string->list s)
                  '(end)))
       (triples '()))
    (match l
      ((list 'end) (reverse triples))
      ((cons from 
             (and rest
                  (cons to _)))
       (loop 
        (+ i 1)
        rests
        (cons (triple i
                    (char->symbol* from) 
                    (char->symbol* to))
              triples))))))

We initially convert our string to a list, prepend 'start and suffix 'end, and then iterate over the list, looking for 'end to terminate. At each iteration, we grab two values from the list, create a triple, and then recur on the current list minus just one element. Hence the (and rest (cons to _)) pattern match.

Generating Novel Names

Generating a new demon name is easy, now that we have the table. We simply start with an empty name, and consult the table for which character to generate next until we encounter an end.

Most of the work is done by the method next-character, which takes a table, the previous character as a symbol (or 'start) and the index of the character to be generated:

(define (next-character table prev-character index . args)
  (match args
    ((list) (next-character table prev-character
                            index
                            (current-pseudo-random-generator)))
    ((list generator)
     (let* ((sub-table 
             (restrict-table table index prev-character))
            (total-elements (foldl + 0 (dict-values sub-table)))
            (draw (random total-elements generator)))
       (let loop 
           ((draw draw)
            (key/val (dict->list sub-table)))
         (match key/val
           ((cons (cons 
                   (triple _ from to)
                   count) rest)
            (if (or (empty? rest)
                    (<= draw 0))
                to
                (loop (- draw count)
                      rest)))))))))

This function restricts the table to the triples which match the index and previous letter, calculates the total number of possible transitions, generates a random number in that range, and iterates through the possible transitions until that number is zero or less, subtracting away the count of each possible transition as it does so. It takes an option random state so that it can be used purely functionally, if we so desire. By default, it consults the current random state, which isn't completely pure - oh well!

This function returns the symbol generated.

generate-demon-name does the rest of the work:

(define (generate-demon-name table . args)
  (match args
    ((list) (generate-demon-name table (current-pseudo-random-generator)))
    ((list gen)
     (let loop ((ix 0)
                (name-list '(start)))
       (let ((next (next-character table (car name-list) ix gen)))
         (if (eq? next 'end)
             (symbol-list->string (cdr (reverse name-list)))
             (loop 
              (+ ix 1)
              (cons next name-list))))))))

This takes a table and an optional random state and calls next-character until and end is found. When it is, the 'start element is stripped off and the list of symbols is converted into a string, which is returned to the user.

It is invoked like this:

(generate-demon-name (populate-transition-table demon-names)) ;->
"Quinag"

Another function, generate-demon-names allows many to be generated at once:

(generate-demon-names (populate-transition-table demon-names) 10)
'("Azaigair"
  "Qwalboy"
  "Tecasex"
  "Mabuak"
  "Cinofego"
  "Abby"
  "Nurdar"
  "Zarigak"
  "Yahaxae"
  "Yk'legod")

Voila!

Conclusions

I think the result is pretty nice, and if you precache the table population step, it is also a pretty zippy algorithm. The code is on my github, comments are welcome.

For the intreped, try feeding in other seed data. Included in the library is a list of alien races from Star Trek and a list of all the names in the Book of Mormon. Have fun!