Spam Filtering
From Apis Networks Wiki
Overview
SpamAssassin is an open-source application that scans incoming e-mail and scores each message to determine whether it is spam.
Usage
SpamAssassin is implicitly invoked through the global maildrop filter (/etc/maildroprc) for each site. No further steps are necessary to enable SpamAssassin for a site.
Problems
Is SpamAssassin filtering my messages?
Barring rare, exceptional circumstances, yes. SpamAssassin is monitored internally by the integrity daemon to ensure it is up and running. Since adding a SpamAssassin check to the daemon, we have seen zero reported cases of SpamAssassin being offline and only a handful of false positives within a year. Spam that slips through is due to either (a) low Bayesian scoring or (b) a recently compromised computer sending out spam.
To confirm that a message was scanned, view its transmission headers. In Thunderbird, select View -> Message Details; in Outlook 2007, expand the Options menu. Below is an example header from a correctly filtered message:
X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on assmule.apisnetworks.com
If the message was labeled as spam, the X-Spam-Flag header reports YES; otherwise it reports NO:
X-Spam-Score: 7.1
X-Spam-Flag: YES
Further, if the message is labeled as spam, the subject is prefixed with [SPAM] followed by the message's score. You can use this prefix to filter within your e-mail client, or filter server-side with the SpamAssassin Wizard.
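For anyone who wants to check these headers in a script rather than in a mail client, the following is a minimal sketch using Python's standard email module. The function name spam_status and the sample message are illustrative assumptions, not part of the server's setup; only the header names come from the example above.

```python
from email import message_from_string


def spam_status(raw_message):
    """Return (is_spam, score) read from a message's SpamAssassin headers.

    is_spam is True only when X-Spam-Flag is YES; score is None when the
    X-Spam-Score header is absent.
    """
    msg = message_from_string(raw_message)
    flag = (msg.get("X-Spam-Flag") or "").strip().upper()
    score_text = msg.get("X-Spam-Score")
    score = float(score_text) if score_text is not None else None
    return flag == "YES", score


# Sample message built from the headers shown on this page (body is made up).
sample = (
    "X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on\n"
    "\tassmule.apisnetworks.com\n"
    "X-Spam-Score: 7.1\n"
    "X-Spam-Flag: YES\n"
    "\n"
    "Body text.\n"
)

print(spam_status(sample))  # (True, 7.1)
```

A message that never passed through SpamAssassin has neither header, so the function returns (False, None) for it.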
Scoring
How can I enhance scoring?
If your account has received an adequate volume of e-mail (200 spam and 200 non-spam messages), Bayesian filtering activates automatically. As your account ages, Bayesian filtering will progressively become more accurate. To increase the score given to messages that the Bayesian filter rates as 95% or more likely to be spam, edit .spamassassin/user_prefs and add:
score BAYES_99 7
score BAYES_95 5
This should greatly enhance the server's ability to catch new spam, but only if you have an adequate number of learned messages.
It is also recommended that you periodically run sa-learn on missed spam to retrain the filter.
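As a rough sketch of what a periodic retraining run might look like, the snippet below builds and optionally runs an sa-learn command from Python. The --spam and --ham options are standard sa-learn flags; the folder path Mail/.MissedSpam/cur and the helper names are hypothetical, so substitute wherever you actually collect missed spam.

```python
import shutil
import subprocess


def sa_learn_command(kind, path):
    """Build an sa-learn command line; kind must be 'spam' or 'ham'."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind, path]


def retrain(kind, path):
    """Run sa-learn on path if it is installed.

    Returns the exit status, or None when sa-learn is not on PATH.
    """
    cmd = sa_learn_command(kind, path)
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, check=False).returncode


# Hypothetical maildir folder holding spam that reached the inbox.
print(sa_learn_command("spam", "Mail/.MissedSpam/cur"))
```

Calling retrain("spam", ...) from a scheduled job would feed the folder to the Bayesian database; feeding legitimate mail with "ham" helps keep the two counts balanced.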
See also: Streamlining SpamAssassin's Learning Process
Determining how many spams/hams are learned
sa-learn --dump magic will display the Bayesian metadata.
[msaladna@assmule ~]$ sa-learn --dump magic
0.000 0 3 0 non-token data: bayes db version
0.000 0 140684 0 non-token data: nspam
0.000 0 33568 0 non-token data: nham
0.000 0 136013 0 non-token data: ntokens
0.000 0 1207054271 0 non-token data: oldest atime
0.000 0 1207402325 0 non-token data: newest atime
0.000 0 1207402432 0 non-token data: last journal sync atime
0.000 0 1207399817 0 non-token data: last expiry atime
0.000 0 345600 0 non-token data: last expire atime delta
0.000 0 15349 0 non-token data: last expire reduction count
nham: number of hams learned
nspam: number of spams learned
ntokens: number of tokens (words) within the database
In this example, the database has learned 140,684 spams and 33,568 hams, and its oldest token access time is April 1, 2008 (Unix timestamp 1207054271).
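The same figures can be pulled out of the dump programmatically. This is a minimal sketch, assuming the output format shown above; the function name parse_magic is illustrative, and the 200/200 check simply restates the activation threshold mentioned on this page.

```python
from datetime import datetime, timezone


def parse_magic(output):
    """Map each 'non-token data' label to its integer value (third column)."""
    stats = {}
    for line in output.splitlines():
        fields = line.split(None, 4)
        if len(fields) == 5 and fields[4].startswith("non-token data:"):
            label = fields[4].split(":", 1)[1].strip()
            stats[label] = int(fields[2])
    return stats


# A few lines copied from the sa-learn --dump magic output on this page.
sample = """\
0.000 0 3 0 non-token data: bayes db version
0.000 0 140684 0 non-token data: nspam
0.000 0 33568 0 non-token data: nham
0.000 0 136013 0 non-token data: ntokens
0.000 0 1207054271 0 non-token data: oldest atime
"""

stats = parse_magic(sample)

# Threshold taken from this page: 200 spams and 200 hams.
bayes_active = stats["nspam"] >= 200 and stats["nham"] >= 200

# atime values are Unix timestamps; convert to a readable UTC date.
oldest = datetime.fromtimestamp(stats["oldest atime"], tz=timezone.utc)

print(stats["nspam"], stats["nham"], bayes_active)  # 140684 33568 True
print(oldest.isoformat())  # 2008-04-01T12:51:11+00:00
```

To use it on a live account, pass in the captured text of sa-learn --dump magic instead of the sample string.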
