any algorithms for token reassembly?

http://img211.imageshack.us/img211/7775/42287652.jpg

I'm creating a spam filter, but I've noticed that many spammers like to use delimiters such as whitespace, commas, etc. to separate the letters of suspicious words (see image). Are there any good algorithms for reassembling the tokens?
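One simple approach (not from this thread, just a sketch) is to collapse a run of obfuscated characters by discarding everything non-alphabetic and comparing the result against a blocklist. The `reassemble` helper below is hypothetical and only handles the single-word case shown in the image:

```cpp
#include <cctype>
#include <string>

// Collapse text like "v i a g r a" or "v-i-a-g-r-a" into one lowercase
// word by keeping only alphabetic characters. The reassembled word can
// then be checked against a blocklist of suspicious terms.
std::string reassemble(const std::string& text) {
    std::string word;
    for (unsigned char c : text) {
        if (std::isalpha(c)) {
            word += static_cast<char>(std::tolower(c));
        }
    }
    return word;
}
```

Note this is deliberately naive: it assumes the whole input is one obfuscated word, so in a real filter you would first pick out suspicious runs of short tokens before collapsing them.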
A better way to deal with this particular problem is to check whether the words make any sense. Split the message on whitespace as you would a normal sentence, then see what percentage of the words match a dictionary. A low match rate generally means spam.
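The suggestion above can be sketched as a small scoring function; the dictionary set and the threshold you compare the ratio against are assumptions left to the caller:

```cpp
#include <sstream>
#include <string>
#include <unordered_set>

// Fraction of whitespace-separated words found in a dictionary.
// Obfuscated text like "v i a g r a" splits into single letters that
// rarely match real words, so its ratio will be low.
double dictionary_match_ratio(const std::string& message,
                              const std::unordered_set<std::string>& dict) {
    std::istringstream in(message);
    std::string word;
    int total = 0, matched = 0;
    while (in >> word) {
        ++total;
        if (dict.count(word)) ++matched;
    }
    return total == 0 ? 0.0 : static_cast<double>(matched) / total;
}
```

A real filter would lowercase and strip punctuation before the lookup, and treat the ratio as one signal among several rather than a hard rule.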