This is something I was doing in SpamAssassin, but SA is a crude tool. What I’m doing is making files of regular expressions; if a certain number of matches occur, the file name is used as a token indicating a match. For example, I test for lotsofmoney, trustme, africa, banks, religion, etc. All of these are harmless by themselves, but when combined they can be very effective at classifying spam and ham.
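To make the idea concrete, here is a rough sketch in Python of how such concept files might be evaluated. It is not my actual code: the patterns, the in-memory dictionary standing in for the files, the threshold of two matches, and the function name concept_tokens are all assumptions made up for illustration.

```python
import re

# Hypothetical concept definitions. In practice each concept would live in
# its own file of regular expressions, and the file name would be the token.
CONCEPTS = {
    "lotsofmoney": [r"\bmillion\s+dollars\b", r"\$\s?\d[\d,]{5,}", r"\blarge\s+sum\b"],
    "africa": [r"\bnigeria\b", r"\bghana\b", r"\bafrica\b"],
    "banks": [r"\bbank\s+account\b", r"\bwire\s+transfer\b", r"\bdeposit\b"],
}

# Assumed threshold: how many patterns in a concept must match
# before that concept's name is emitted as a token.
MIN_MATCHES = 2


def concept_tokens(text, concepts=CONCEPTS, min_matches=MIN_MATCHES):
    """Return the names of all concepts with at least min_matches pattern hits."""
    tokens = []
    for name, patterns in concepts.items():
        # Count how many distinct patterns of this concept occur in the text.
        hits = sum(1 for p in patterns if re.search(p, text, re.IGNORECASE))
        if hits >= min_matches:
            tokens.append(name)
    return tokens


message = (
    "I need to move 12 million dollars, a large sum, out of Nigeria. "
    "Send your bank account details for the wire transfer."
)
print(concept_tokens(message))
```

In this sketch the message produces the lotsofmoney and banks tokens but not africa, because only one of the africa patterns matches. The tokens themselves carry no score; they are just features handed on to the classifier.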

I’m throwing the tokens into my Evolution AI, and it creates the associations and scores them automatically. All I have to do is create these files as a way of pointing out what is interesting, and the AI does the work. It's amazing how well it works. I wrote up a detailed explanation here:

http://wiki.junkemailfilter.com/index.php/Concept_Parsing_Spam_Filter