Richard Dawkins' weasels beat random monkeys to Shakespeare's work
By John Timmer | Published 6 months ago
There's a classic example of probability that focuses on the question of whether a million monkeys, given a million typewriters, could ever recreate a work of Shakespeare by chance. A programmer from Nevada is now giving virtual monkeys a chance, having them pop out random strings and matching the results against the complete works of Shakespeare. But the details of the work suggest it's not really a demonstration of brute force producing a low-probability result; instead, the system appears to mimic one used by Richard Dawkins to demonstrate the power of evolutionary selection.
Jesse Anderson, who's running the virtual monkeys on a home computer, describes his system using text and video on his site. One thing that's very clear is that he made the challenge a bit simpler than it might have been. Each virtual monkey on his machine only spits out a string of standard ASCII letters—no punctuation, no capitals or digits, not whitespaces. This cuts the potential space he's searching down considerably.
But that's not the only thing that's been simplified in order to make the monkeys' virtual lives a bit easier. Instead of being an attempt to reproduce Shakespeare with random characters, the algorithm Anderson is using is a bit closer to one used as a simple demonstration of the power of biological evolution (one that, coincidentally, also used Shakespeare as its text).
Recapitulating Shakespeare at random can be done in a number of ways. The simplest, and most difficult manner, involves adding a single random character at a time, just as a monkey on a typewriter would. If the monkey ever hits the wrong key, the whole work gets thrown out, even if the previous thousand were correct. That's the premise of the Simpson's skit that Anderson says was his inspiration to tackle the project.
At the other end of the spectrum, we have the weasel program, first discussed by Richard Dawkins in The Blind Watchmaker. In this example, the target text is the Shakespeare line "Methinks it is like a weasel." The random typing of characters is considered to be analogous to the results of random mutation. But Dawkins adds a new step, analogous to natural selection: if any of the letters are right, they're retained as "fit." The rest get reshuffled and are tested again. Adding this selection step radically shortens the time it takes to arrive at the correct solution, since the monkey will never have to throw out any of its successful work and start over.
Anderson's search process is a lot closer to Dawkins' weasel example. Instead of single characters, his monkeys bang out blocks of nine. Those blocks are then compared to a compilation of all the text in all of Shakespeare's work. If they match anywhere, that block is marked as complete. There are only 26 single characters being used by Anderson, and that creates 5.4 trillion potential nine-character sequences, so there are a fair number to crunch through (Anderson's monkeys have done over 500 billion combinations). But that's a far cry from having to directly match even a simple phrase like "Methinks it is like a weasel"—that's 28 characters long, and the 27 character alphabet (Dawkins didn't ignore the spaces) means that it's only one of 1.2 x 1040 possible combinations.
What Anderson has really shown is that a lot of very carefully supervised monkeys can eventually bang out fragments that cover a substantial proportion of Shakespeare's text, and we've now got the computing power to make virtualizing that process a manageable task. But we're not yet at the point where we've got enough computing power to make enough virtual monkeys so that one of them will likely be able to spit out more than a fragment in a single go.