The bedroom window was a very seedy and disreputable hard-felt hat.

Jeremy Brett as Sherlock Holmes

A Markov chain is a type of statistical model used to describe things that happen sequentially. You begin in one state, and from there you have a certain probability of moving to each of the possible next states. This makes Markov chains useful for things like finding conserved DNA sequences.
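To make that concrete, here's a minimal sketch of a two-state chain in JavaScript (the states and probabilities here are made up for illustration, not from the post):

```javascript
// Each state maps to a list of [probability, nextState] pairs.
const transitions = {
  sunny: [[0.8, "sunny"], [0.2, "rainy"]],
  rainy: [[0.4, "sunny"], [0.6, "rainy"]],
};

// Sample the next state according to the current state's probabilities.
function nextState(current) {
  let r = Math.random();
  for (const [p, state] of transitions[current]) {
    r -= p;
    if (r < 0) return state;
  }
  // Guard against floating-point leftovers: fall back to the last option.
  const options = transitions[current];
  return options[options.length - 1][1];
}

// Walk the chain for a few steps.
let state = "sunny";
for (let i = 0; i < 5; i++) state = nextState(state);
```

The key property is that each step depends only on the current state, not on the whole history — which is exactly what makes the word-by-word text trick below work.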

This can also be quite a fun thing to play with - it's great for taking in text and trying to make sensible-sounding sentences out of it. The idea is that instead of understanding what the sentences actually mean, you can just look at which words usually come after the word you started with and pick one of them to go next.

So I decided to model the English language as a Markov chain, using the text from The Adventures of Sherlock Holmes (from Project Gutenberg) as training data, and produced about as much coherence as you'd expect from such a method. If you go to http://bethmcmillan.com/geek/markov/, you can generate your very own pseudo-sentence.

I also made a Twitter bot that tweets these nonsense Holmesian sentences.

In brief, I installed node.js, which lets you run JavaScript outside a browser, and added the "twit", "jsdom" and "jquery" modules. I followed this tutorial for making a Twitter bot. The bot tweets every 5 minutes (I might change this if it turns out to be too much). After stripping the newlines, quotation marks and double spaces from the text, it picks a random word to begin with. Then it takes this random word and the one that follows it, and finds all the other places in the text where this pair of words appears. Next, it picks one of these locations at random and takes the word that comes after the pair there. Finally, the process repeats with the two newest words until there's a tweet-length phrase.
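The generation loop described above can be sketched roughly like this (the actual code is below the fold; this version uses a stand-in snippet of text and assumes the old 140-character tweet limit):

```javascript
// Stand-in for the cleaned book text (the real bot uses the whole novel).
const text = "it is a capital mistake to theorize before one has data " +
             "it is a capital offence in the eyes of the law";

// Collapse whitespace and split into words.
const words = text.replace(/\s+/g, " ").trim().split(" ");

// Pick a random starting word and the one that follows it.
const start = Math.floor(Math.random() * (words.length - 2));
let pair = [words[start], words[start + 1]];
let tweet = pair.join(" ");

while (true) {
  // Find every position where the current pair of words occurs.
  const positions = [];
  for (let i = 0; i < words.length - 2; i++) {
    if (words[i] === pair[0] && words[i + 1] === pair[1]) positions.push(i);
  }
  if (positions.length === 0) break;

  // Pick one occurrence at random and take the word that follows it.
  const pick = positions[Math.floor(Math.random() * positions.length)];
  const next = words[pick + 2];

  // Stop once adding another word would exceed tweet length.
  if (tweet.length + 1 + next.length > 140) break;
  tweet += " " + next;

  // Slide the window: repeat with the two newest words.
  pair = [pair[1], next];
}
```

With this snippet, a run starting from "it is" can wander from "a capital mistake" into "a capital offence", which is exactly the kind of plausible-but-wrong splice that makes the output entertaining.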

All my code's available under the fold, for anyone who's interested. Feel free to follow @markov_holmes for entertaining gibberish!


...