Assume you are creating a spam filter (SF). You are given some training emails (text files, each known to be spam or ham), so you can "teach" your SF to say which of the testing emails (a set of emails we don't have yet) are likely to be spam.
How would you analyse the training set of emails? What would your SF base its decision on? Would it look at the most common words in the files? Which algorithms might come in handy?
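
To make the "most common words" idea concrete, here is a minimal sketch of one possible approach: a naive Bayes classifier over word counts with add-one smoothing. It is only an illustration, not a recommended design. The function names (`tokenize`, `train`, `classify`) and the four toy emails are made up for this example, and it assumes the training set contains at least one spam and one ham email.

```python
import math
import re
from collections import Counter

LABELS = ("spam", "ham")

def tokenize(text):
    # Lowercase the text and keep runs of letters/apostrophes as words.
    return re.findall(r"[a-z']+", text.lower())

def train(emails):
    # emails: iterable of (text, label) pairs, label is "spam" or "ham".
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for text, label in emails:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, doc_counts

def classify(text, word_counts, doc_counts):
    # Assumes both classes were seen in training (otherwise log(0) fails).
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in LABELS:
        total_words = sum(word_counts[label].values())
        # Start from the log prior: how common this class was in training.
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Toy training data, invented for illustration only.
training = [
    ("win free money now", "spam"),
    ("free prize claim your money", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
word_counts, doc_counts = train(training)
print(classify("claim your free money", word_counts, doc_counts))
```

In this sketch the decision rests on how often each word appeared in spam versus ham training emails, rather than on the single most common words overall. Reading real emails from text files, handling headers, and choosing a better tokenizer are all left out.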