Hey: someone implemented a similar algorithm in one of my favorite languages: Factor! See it here.
Hey. Check out some of these fictional demon names:
("Barzan" "Melkot" "Bandi" "Fek'Ihri" "Krenim")
They are from this list of fictional demon names on the wikipedia.
You see, I've been listening to Songs of the Dying Earth, a short story collection, written by various authors, but set in Jack Vance's Dying Earth setting, which is a far future science fiction/fantasy world where mages insult eachother politely and make mischeif.
Another thing they do is enslave demons from lower or higher realms. This gave me the idea to eventually write a game where you play as one such enslaved demon, and I thought it would be cute to generate a fresh demon name for each play through. So I wrote such a system, in Racket, and here is a guided tour.
This isn't a very advanced technique, but all the code here is side effect free - so if that floats your boat, or if you are just curious how one programs in such a manner, then read on!
Overview
Our demon name generator will be a modified, ad-hoc Markov Model generator. What does that mean? It is easier to explain than implement, actually. We will take a corpus of demon names (the list from wikipedia) and do some statistics on it. In particular, we'll calculate how often a given letter follows another letter in a name, and then we can generate names based on that table of transitions. The only wrinkle is that the first letter in a name is obviously going to not have a letter before it to bias the choice, and that any letter might lead to the end of the name.
We'll add an additional wrinkle, which improves name generation, which is that we'll maintain a separate transition table for each position in the name. That is, the probability of going from "a" to "b" might be different if "a" is the third character in the name vs the seventh. This will let our model include the fact that certain letters and transitions are more common, for instance, near the front of the name, than near the end.
Populating our Transition Table
Our corpus looks like this:
(define demon-names
(list "Abraxas"
"Abbadon"
"Agrith-Naar"
"Aku"
"Alastair"
...))
We need to visit each name in the corpus and scan through it, recording each time, for a given index, that the letter at that index follows whatever letter was previous to the index. We want to maintain a side-effect free discipline, so we will use Racket's purely functional, persistent dictionaries to store the transitions. We will often be incrementing the value at a key in such a dictionary by 1, so lets get that code out of the way now:
(define (dict-update d key fun . args)
(match args
((list)
(let ((val (dict-ref d key)))
(dict-set d key (fun val))))
((list or-value)
(let ((val (dict-ref d key (lambda () or-value))))
(dict-set d key (fun val))))))
(define (at-plus d k)
(dict-update d k (lambda (x) (+ x 1)) 0))
dict-update takes a dictionary, a key and a function, fetches the
value currently at the position, calls the function on it, and sets
that value in the dictionary, the new version produced by which action
is then returned. If they key isn't there, the first value in the
args list is passed to the function instead, allowing us to specify
a default value.
at-plus used dict-update to increment the value stored at a key by
one, setting it to one if no such value is present.
Our dictionary is going to associate transitions with counts. How shall we represent a transtion? A transition is a triple, in our case, consisting of the index the transition covers, the previous character, and the current character. We will use a Racket struct to represent a transition as a triple:
(struct triple (index from to) #:transparent)
We will use the symbol 'a to represent the character "a" and so
one. For the first and last transition, we will use the special
values 'start and 'end. The first transition always goes from
'start to a letter, and any subsequent transition can arrive at
'end, in which case the name is over.
The function which populates the transition table looks like this, then:
(define (populate-transition-table names)
(let loop
((names names)
(table (make-immutable-hash '())))
(match names
((list) table)
((cons name names)
(loop names
(foldl
(lambda (triple table)
(at-plus table triple))
table
(string->triples name)))))))
It takes of list of names and iterates over them, accumulating the
table as we go. make-immutable-hash returns an empty immutable hash
table. The heavy work is done by a call to foldl, which accumulates
over the result of calling string->triples on name.
string->triples converts the current name to a list of triples.
(define (string->triples s)
(let loop
((i 0)
(l (append '(start)
(string->list s)
'(end)))
(triples '()))
(match l
((list 'end) (reverse triples))
((cons from
(and rest
(cons to _)))
(loop
(+ i 1)
rests
(cons (triple i
(char->symbol* from)
(char->symbol* to))
triples))))))
We initially convert our string to a list, prepend 'start and suffix
'end, and then iterate over the list, looking for 'end to
terminate. At each iteration, we grab two values from the list,
create a triple, and then recur on the current list minus just one
element. Hence the (and rest (cons to _)) pattern match.
Generating Novel Names
Generating a new demon name is easy, now that we have the table. We
simply start with an empty name, and consult the table for which
character to generate next until we encounter an end.
Most of the work is done by the method next-character, which takes a
table, the previous character as a symbol (or 'start) and the index
of the character to be generated:
(define (next-character table prev-character index . args)
(match args
((list) (next-character table prev-character
index
(current-pseudo-random-generator)))
((list generator)
(let* ((sub-table
(restrict-table table index prev-character))
(total-elements (foldl + 0 (dict-values sub-table)))
(draw (random total-elements generator)))
(let loop
((draw draw)
(key/val (dict->list sub-table)))
(match key/val
((cons (cons
(triple _ from to)
count) rest)
(if (or (empty? rest)
(<= draw 0))
to
(loop (- draw count)
rest)))))))))
This function restricts the table to the triples which match the index and previous letter, calculates the total number of possible transitions, generates a random number in that range, and iterates through the possible transitions until that number is zero or less, subtracting away the count of each possible transition as it does so. It takes an option random state so that it can be used purely functionally, if we so desire. By default, it consults the current random state, which isn't completely pure - oh well!
This function returns the symbol generated.
generate-demon-name does the rest of the work:
(define (generate-demon-name table . args)
(match args
((list) (generate-demon-name table (current-pseudo-random-generator)))
((list gen)
(let loop ((ix 0)
(name-list '(start)))
(let ((next (next-character table (car name-list) ix gen)))
(if (eq? next 'end)
(symbol-list->string (cdr (reverse name-list)))
(loop
(+ ix 1)
(cons next name-list))))))))
This takes a table and an optional random state and calls
next-character until and end is found. When it is, the 'start
element is stripped off and the list of symbols is converted into a
string, which is returned to the user.
It is invoked like this:
(generate-demon-name (populate-transition-table demon-names)) ;->
"Quinag"
Another function, generate-demon-names allows many to be generated
at once:
(generate-demon-names (populate-transition-table demon-names) 10)
'("Azaigair"
"Qwalboy"
"Tecasex"
"Mabuak"
"Cinofego"
"Abby"
"Nurdar"
"Zarigak"
"Yahaxae"
"Yk'legod")
Voila!
Conclusions
I think the result is pretty nice, and if you precache the table population step, it is also a pretty zippy algorithm. The code is on my github, comments are welcome.
For the intreped, try feeding in other seed data. Included in the library is a list of alien races from Star Trek and a list of all the names in the Book of Mormon. Have fun!
3 comments:
Heh, pretty cool
Is there a reason you did the optional arguments by explicit matching instead of (define (generate-demon-name table [gen (current-pseudo-random-generator)]) ...)?
Entirely stylistic.
Post a Comment