I am creating a spam filter, but I notice that many spammers use delimiters such as whitespace and commas to separate the letters of suspicious words (refer to image). Are there any good algorithms to reassemble the tokens?
A better way to deal with this particular problem is to check whether the words make any sense. Split the text on whitespace, as you would a normal sentence, and ask what percentage of the resulting words match a dictionary. A low match rate generally means spam.
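
As a rough illustration of the dictionary-match idea, here is a minimal Python sketch. The tiny `DICTIONARY` set, the function names, and the 0.5 threshold are all placeholders I made up for the example; a real filter would load a full word list and tune the cutoff on actual mail.

```python
import re

# Hypothetical, tiny word list for illustration only.
# A real filter would load a full dictionary file instead.
DICTIONARY = {"please", "find", "the", "report", "attached", "for", "your", "review"}

def dictionary_match_ratio(text, dictionary=DICTIONARY):
    """Return the fraction of whitespace-separated tokens found in the dictionary."""
    # Split on whitespace, lowercase, and drop anything that is not a letter
    # so that "attached." still matches "attached".
    tokens = [re.sub(r"[^a-z]", "", t.lower()) for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        # No usable tokens: nothing to flag (an arbitrary choice for this sketch).
        return 1.0
    hits = sum(1 for t in tokens if t in dictionary)
    return hits / len(tokens)

def looks_like_spam(text, threshold=0.5, dictionary=DICTIONARY):
    """Flag text whose dictionary match ratio falls below the (assumed) threshold."""
    return dictionary_match_ratio(text, dictionary) < threshold

print(dictionary_match_ratio("Please find the report attached."))
print(looks_like_spam("v i a g r a"))
```

One caveat with this sketch: a real dictionary contains one-letter words such as "a" and "I", so a word spelled out letter by letter can still score some matches. It may be worth counting single-letter tokens as misses, or lowering their weight, if that turns out to matter on your data.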