Any algorithms for token reassembly?

http://img211.imageshack.us/img211/7775/42287652.jpg

I am creating a spam filter, but I notice that many spammers use delimiters such as whitespace and commas to separate the letters of suspicious words (refer to the image). Are there any good algorithms to reassemble the tokens?

PanGalactic (1658)

A better way to deal with this particular problem is to check whether the words make any sense. Split the text into words on whitespace, as you would a normal sentence, and work out what percentage of the words match a dictionary. A low match rate generally means spam.
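
The dictionary check described above could be sketched roughly as follows. This is only a minimal illustration, not a tested filter: the function names (`normalize`, `dictionary_match_ratio`, `looks_like_spam`), the in-memory word set, and the 0.5 threshold are all assumptions made for the example, and a real filter would load a full word list and tune the cutoff on actual mail.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_set>

// Hypothetical helper: lowercase a token and strip non-letter characters
// from both ends, so that "Hello," and "hello" compare equal.
std::string normalize(const std::string& token) {
    std::size_t first = 0;
    std::size_t last = token.size();
    while (first < last && !std::isalpha(static_cast<unsigned char>(token[first]))) ++first;
    while (last > first && !std::isalpha(static_cast<unsigned char>(token[last - 1]))) --last;
    std::string word = token.substr(first, last - first);
    std::transform(word.begin(), word.end(), word.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return word;
}

// Fraction of whitespace-separated tokens that appear in `dictionary`.
// Tokens made only of punctuation count as non-matching.
// Returns 1.0 for text with no tokens, so empty input is not flagged.
double dictionary_match_ratio(const std::string& text,
                              const std::unordered_set<std::string>& dictionary) {
    std::istringstream in(text);
    std::string token;
    std::size_t total = 0;
    std::size_t matched = 0;
    while (in >> token) {
        ++total;
        if (dictionary.count(normalize(token)) > 0) ++matched;
    }
    if (total == 0) return 1.0;
    return static_cast<double>(matched) / static_cast<double>(total);
}

// Assumed cutoff: flag the text when fewer than `threshold` of its tokens
// are dictionary words. The default of 0.5 is arbitrary.
bool looks_like_spam(const std::string& text,
                     const std::unordered_set<std::string>& dictionary,
                     double threshold = 0.5) {
    return dictionary_match_ratio(text, dictionary) < threshold;
}
```

With a small dictionary containing "buy" and "now", a message such as "buy c h e a p p i l l s now" splits into twelve tokens of which only two match, so it falls well below the cutoff without any reassembly step. Short legitimate messages full of names or jargon would also score low, so the ratio is probably best treated as one signal among several.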
