Spam Filtering

From Apis Networks Wiki

Overview

SpamAssassin is an open-source application that scans e-mail messages and scores them to determine whether each one is spam or non-spam.

Usage

SpamAssassin is implicitly invoked through the global maildrop filter (/etc/maildroprc) for each site. No further steps are necessary to enable SpamAssassin for a site.

Problems

Is SpamAssassin filtering my messages?

Barring rare and exceptional circumstances, yes. SpamAssassin is monitored internally by the integrity daemon to ensure it is up and running. In the year since a SpamAssassin check was added to the daemon, we have seen zero reported cases of SpamAssassin being offline and only a handful of false positives. Spam that slips through is due to either (a) low Bayesian scoring or (b) a recently compromised computer sending out spam.

To confirm that a message was scanned, view its transmission headers. In Thunderbird, select View -> Message Details; in Outlook 2007, expand the Options menu. Below is an example header from a correctly filtered message:

X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on assmule.apisnetworks.com

If the message was labeled as spam, X-Spam-Flag reports YES; otherwise it reports NO:

X-Spam-Score: 7.1 
X-Spam-Flag: YES

Further, if the message is labeled as spam, the subject will have [SPAM] at the front, followed by its score. You may use this attribute to filter within your e-mail client, or filter server-side with the SpamAssassin Wizard.
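For anyone who wants to check these headers from a script rather than a mail client, the following is a minimal sketch using Python's standard email module. The function name spam_status and the sample message (including its Subject line and body) are made up for illustration; only the three X-Spam-* headers come from the example above.

```python
from email import message_from_string


def spam_status(raw_message):
    """Return (is_spam, score) based on SpamAssassin headers.

    Returns (None, None) when the message carries no
    X-Spam-Checker-Version header, i.e. it was apparently never scanned.
    """
    msg = message_from_string(raw_message)
    if msg["X-Spam-Checker-Version"] is None:
        return None, None
    # A missing X-Spam-Flag is treated as NO (an assumption of this sketch).
    flag = (msg["X-Spam-Flag"] or "NO").strip().upper()
    score_text = msg["X-Spam-Score"]
    score = float(score_text) if score_text is not None else None
    return flag == "YES", score


# Hypothetical message built around the headers shown above.
sample = (
    "X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on\n"
    "\tassmule.apisnetworks.com\n"
    "X-Spam-Score: 7.1\n"
    "X-Spam-Flag: YES\n"
    "Subject: example\n"
    "\n"
    "body\n"
)

print(spam_status(sample))
```

The same check can be expressed as a rule in most mail clients by matching on the X-Spam-Flag header instead of the subject prefix.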

Scoring

How can I enhance scoring?

If your account has received an adequate volume of e-mail (200 spam + 200 non-spam messages), Bayesian filtering will automatically activate. As your account ages, Bayesian filtering will progressively become more accurate. To increase the score assigned to messages with a Bayesian probability of 95% and above, edit .spamassassin/user_prefs and add:

score BAYES_99 7
score BAYES_95 5

This should greatly enhance the server's ability to catch new spam, but only if you have an adequate number of learned messages.

It is also recommended you run sa-learn periodically on missed spam to retrain the filter.

See also: Streamlining SpamAssassin's Learning Process

Determining how many spams/hams are learned

sa-learn --dump magic will display the Bayesian metadata.

[msaladna@assmule ~]$ sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0     140684          0  non-token data: nspam
0.000          0      33568          0  non-token data: nham
0.000          0     136013          0  non-token data: ntokens
0.000          0 1207054271          0  non-token data: oldest atime
0.000          0 1207402325          0  non-token data: newest atime
0.000          0 1207402432          0  non-token data: last journal sync atime
0.000          0 1207399817          0  non-token data: last expiry atime
0.000          0     345600          0  non-token data: last expire atime delta
0.000          0      15349          0  non-token data: last expire reduction count

nham: number of hams learned
nspam: number of spams learned
ntokens: number of tokens (words) within the database

In this example, the database has learned 140,684 spams and 33,568 hams, and the oldest entry is from April 1st, 2008 (Unix timestamp 1207054271).
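The figures above can also be read programmatically. The sketch below parses saved sa-learn --dump magic output, converts the oldest atime from a Unix timestamp to a UTC date, and checks the 200 spam + 200 ham minimum mentioned under Scoring. The function name parse_dump_magic is hypothetical, and the sample text is an excerpt of the output shown above; in practice the text would come from running the command yourself.

```python
from datetime import datetime, timezone


def parse_dump_magic(text):
    """Map each 'non-token data' label to its integer value (third column)."""
    stats = {}
    for line in text.splitlines():
        if "non-token data:" not in line:
            continue  # skip the shell prompt and any token rows
        columns, label = line.split("non-token data:", 1)
        fields = columns.split()
        stats[label.strip()] = int(fields[2])
    return stats


# Excerpt of the example output above.
sample = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0     140684          0  non-token data: nspam
0.000          0      33568          0  non-token data: nham
0.000          0     136013          0  non-token data: ntokens
0.000          0 1207054271          0  non-token data: oldest atime
0.000          0 1207402325          0  non-token data: newest atime
"""

stats = parse_dump_magic(sample)

# atime values are Unix timestamps; convert to a UTC date for readability.
oldest = datetime.fromtimestamp(stats["oldest atime"], tz=timezone.utc)

# Bayesian filtering needs at least 200 spams and 200 hams (see Scoring).
enough = stats["nspam"] >= 200 and stats["nham"] >= 200

print(stats["nspam"], stats["nham"], oldest.date(), enough)
```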
