Markov Chain
Eric Hodel | Sun, 26 Feb 2006 00:39:00 GMT
The back of my brain has been wanting a Markov chain based random-text generator for some time. I found some decently explanatory pseudo-code in the Generating Text section of the online Programming Pearls by Jon Bently.
What I really wanted was something that illustrated the concept well so I could tell what it was doing. The perl, python and ruby versions I found weren’t simple enough for me to see that. I also wanted to adjust the amount of state in the chain, but the only implementations that supported that were far too complicated to figure out.
Working from Jon Bently’s pseudo-code I ended up with the following implementation.
#!/usr/local/bin/ruby -w
# amount of state (order-k)
phrase_length = (ARGV.shift || 1).to_i
max_words = ARGV.shift
raise ArgumentError, 'phrase length too short' if phrase_length < 1
words = []
ARGF.each_line { |line| words.push(*line.scan(/\S+/)) }
# number words to generate
max_words = (max_words || words.length).to_i
# phrase => next-word possibilities
phrases = Hash.new { |h,k| h[k] = [] }
# build up a list of phrases and their next-word possibilities (k-grams)
words.each_with_index do |word, index|
phrase = words[index, phrase_length] # current phrase
phrases[phrase] << words[index + phrase_length] # next possibility
end
# our seed phrase
phrase = words[0, phrase_length]
output = []
max_words.times do
# grab all possibilities for our state
options = phrases[phrase]
# add the first word to our output and discard
output << phrase.shift
# select at random and add it to our phrase
phrase.push options[rand(options.length)]
# the last phrase of the input text will map to an empty array of
# possibilities so exit cleanly.
break if phrase.compact.empty? # all out of words
end
# print out our output
puts output.join(' ')
Here’s some sample output using Pride and Prejudice as input text:
Phrase length 1:
It soothed, but his name of balls are brother and sometimes coming to assure you. “FITZWILLIAM DARCY” If the country. Under such a circumstance itself, and if you the wonder. Elizabeth was a doubt of all! If I shall have satisfied with some family knew he was over, he owes
Phrase length 2:
It is because you have the carriage by Mr. Darcy acknowledged, that a cause of compassion and honour, he had all the way which the country as in her husband, called out to recover her?” “My father and mother. When they get to our email newsletter (free!). Those of you
Phrase length 3:
It is a nice long walk, and Mr. Darcy had brought with him a Colonel Fitzwilliam, the younger son of his uncle Lord—, and, to the great surprise of all the Pemberley estates, and whose good conduct in the whole course of my life.” This roused a general astonishment;
Unfortunately the input text is not diverse enough to support phrase lengths of 4.
Comments are disabled

Articles