Spam Reporting @Fermilab
What are we doing?
- We have been using SpamAssassin to tag messages with a "spam
score" that mail programs can use to filter messages.
- We recently activated the Bayesian Filtering module of
SpamAssassin.
What is Bayesian Filtering?
- Bayesian Filtering is one of SpamAssassin's local rules (as opposed
to the network-based rules), but unlike the other local rules it is
an adaptive spam filtering method.
- It does not use specific predefined tests.
- It relies on a training program that analyzes known spam emails and
known non-spam emails and generates databases counting how often each
word appears in spam and in non-spam emails.
- When scanning a message, it combines the scores for all the words
in the message into a percentage that indicates the likelihood that
the message is spam. This percentage is then used to add points to or
subtract points from the normal SpamAssassin score.
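The idea of counting word frequencies and combining per-word evidence can be illustrated with a small sketch. This is a simplified naive Bayes toy, not SpamAssassin's actual tokenizer or combining method; the function names (train, spam_probability) and the sample messages are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def train(messages):
    """Count, for each word, how many of the given messages contain it."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts


def spam_probability(text, spam_counts, ham_counts, n_spam, n_ham):
    """Combine per-word evidence into a spam likelihood between 0 and 1."""
    # Start from the prior odds implied by the training-set sizes.
    log_odds = math.log(n_spam / n_ham)
    for word in set(text.lower().split()):
        # Add-one smoothing keeps unseen words from producing zero probabilities.
        p_word_spam = (spam_counts[word] + 1) / (n_spam + 2)
        p_word_ham = (ham_counts[word] + 1) / (n_ham + 2)
        log_odds += math.log(p_word_spam / p_word_ham)
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical training data: two known spam and two known non-spam messages.
spam = ["cheap pills online", "win money now"]
ham = ["meeting agenda attached", "beam schedule update"]
spam_counts, ham_counts = train(spam), train(ham)

print(round(spam_probability("cheap money", spam_counts, ham_counts, 2, 2), 2))    # 0.8
print(round(spam_probability("beam schedule", spam_counts, ham_counts, 2, 2), 2))  # 0.2
```

Words seen only in spam push the result toward 1, words seen only in non-spam push it toward 0, and words never seen in training leave it unchanged.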
What messages are we training with?
- For most of the training, we are using auto-learning.
- This takes messages with a normal SpamAssassin score less than 0.1
or greater than 12 and trains the Bayesian Filter databases on those
messages, as non-spam and spam respectively.
- We also take user submissions via the Spam Reporting Form and train
on them.
- A person briefly reviews the submitted spam to make sure that
nothing that is obviously not spam has been submitted by accident.
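The auto-learning rule described above can be sketched as a simple threshold check. The thresholds 0.1 and 12 come from the text; the constant and function names are hypothetical, and this is an illustration of the decision only, not the code SpamAssassin runs.

```python
# Thresholds taken from the description above; the names are illustrative.
NONSPAM_THRESHOLD = 0.1
SPAM_THRESHOLD = 12.0


def autolearn_class(score):
    """Return the class to train on, or None if the message is skipped."""
    if score < NONSPAM_THRESHOLD:
        return "non-spam"
    if score > SPAM_THRESHOLD:
        return "spam"
    # Scores in between are too ambiguous to learn from automatically.
    return None


for score in (-1.3, 4.0, 15.2):
    print(score, autolearn_class(score))
```

Only messages that the normal rules score very confidently are learned from automatically, which limits the chance of training the filter on misclassified mail.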
Questions? Contact Kevin Hill.