
8.18.5 Filtering Spam Using The Spam ELisp Package

The idea behind `spam.el' is to have a control center for spam detection and filtering in Gnus. To that end, `spam.el' does two things: it filters incoming mail, and it analyzes mail known to be spam or ham. Ham is the name used throughout `spam.el' to indicate non-spam messages.

So, what happens when you load `spam.el'?

First of all, you must set the variable spam-install-hooks to t and install the spam.el hooks:

 
(setq spam-install-hooks t)
(spam-install-hooks-function)

This is done automatically if you load spam.el after at least one of the spam-use-* variables (explained later) is set, so set those variables first and then load spam.el:

 
(setq spam-use-bogofilter t)
(require 'spam)

You get the following keyboard commands:

M-d
M s x
S x
gnus-summary-mark-as-spam.

Mark current article as spam, showing it with the `$' mark. Whenever you see a spam article, make sure to mark its summary line with M-d before leaving the group. This is done automatically for unread articles in spam groups.

M s t
S t
spam-bogofilter-score.

You must have Bogofilter installed for that command to work properly.

See section 8.18.5.7 Bogofilter.

Also, when you load `spam.el', you will be able to customize its variables. Try customize-group on the `spam' variable group.

The concepts of ham processors and spam processors are very important. Ham processors and spam processors for a group can be set with the spam-process group parameter, or the gnus-spam-process-newsgroups variable. Ham processors take mail known to be non-spam (ham) and process it in some way so that later similar mail will also be considered non-spam. Spam processors take mail known to be spam and process it so similar spam will be detected later.

Gnus learns from the spam you get. You have to collect your spam in one or more spam groups, and set or customize the variable spam-junk-mailgroups as appropriate. You can also declare groups to contain spam by setting their group parameter spam-contents to gnus-group-spam-classification-spam, or by customizing the corresponding variable gnus-spam-newsgroup-contents. The spam-contents group parameter and the gnus-spam-newsgroup-contents variable can also be used to declare groups as ham groups if you set their classification to gnus-group-spam-classification-ham. If groups are not classified by means of spam-junk-mailgroups, spam-contents, or gnus-spam-newsgroup-contents, they are considered unclassified. All groups are unclassified by default.
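
As a minimal sketch of the first approach, the following could go in your Gnus init file. The group names are hypothetical placeholders, and the exact form of a group name depends on your back end, so treat this as an illustration rather than a recommended value:

 
;; Hypothetical group names: replace them with the groups
;; in which you actually collect spam.
(setq spam-junk-mailgroups '("spam" "mail.junk"))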

In spam groups, all messages are considered to be spam by default: they get the `$' mark (gnus-spam-mark) when you enter the group. If you have seen a message, had it marked as spam, then unmarked it, it won't be marked as spam when you enter the group thereafter. You can disable that behavior, so all unread messages will get the `$' mark, if you set the spam-mark-only-unseen-as-spam parameter to nil. You should remove the `$' mark when you are in the group summary buffer for every message that is not spam after all. To remove the `$' mark, you can use M-u to "unread" the article, or d for declaring it read the non-spam way. When you leave a group, all spam-marked (`$') articles are sent to a spam processor which will study them as spam samples.
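
For instance, if you would rather have every unread message in a spam group receive the `$' mark, whether or not you have seen it before, a setting along these lines should do it:

 
;; Mark all unread articles in spam groups as spam on entry,
;; not only the ones that have never been seen.
(setq spam-mark-only-unseen-as-spam nil)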

Messages may also be deleted in various other ways. Unless the ham-marks group parameter is overridden (see below), the marks `R' and `r' (default read or explicit delete), `X' and `K' (automatic or explicit kills), and `Y' (low scores) are all considered to belong to articles that are not spam. This assumption might be false, in particular if you use kill files or score files to detect genuine spam; in that case, you should adjust the ham-marks group parameter.

Variable: ham-marks
You can customize this group or topic parameter to be the list of marks you want to consider ham. By default, the list contains the deleted, read, killed, kill-filed, and low-score marks.

Variable: spam-marks
You can customize this group or topic parameter to be the list of marks you want to consider spam. By default, the list contains only the spam mark.

When you leave any group, regardless of its spam-contents classification, all spam-marked articles are sent to a spam processor, which will study these as spam samples. If you use explicit kills a lot, you might sometimes end up with articles marked `K' which you never saw, and which might accidentally contain spam. It is best to make sure that real spam is marked with `$', and nothing else.

When you leave a spam group, all spam-marked articles are marked as expired after processing with the spam processor. This is not done for unclassified or ham groups. Also, any ham articles in a spam group will be moved to a location determined by either the ham-process-destination group parameter or a match in the gnus-ham-process-destinations variable, which is a list of regular expressions matched with group names (it's easiest to customize this variable with customize-variable gnus-ham-process-destinations). The ultimate location is a group name. If the ham-process-destination parameter is not set, ham articles are left in place. If the spam-mark-ham-unread-before-move-from-spam-group parameter is set, the ham articles are marked as unread before being moved.

When you leave a ham group, all ham-marked articles are sent to a ham processor, which will study these as non-spam samples.

By default, the variable spam-process-ham-in-spam-groups is nil. Set it to t if you want ham found in spam groups to be processed. Normally this is not done; instead, you are expected to send your ham to a ham group and process it there.

By default, the variable spam-process-ham-in-nonham-groups is nil. Set it to t if you want ham found in non-ham (spam or unclassified) groups to be processed. Normally this is not done; instead, you are expected to send your ham to a ham group and process it there.
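
If you do want ham to be processed wherever it is found, the two variables can be turned on together. This is only a sketch; whether you need one or both depends on how you classify your groups:

 
;; Process ham found in spam groups.
(setq spam-process-ham-in-spam-groups t)
;; Process ham found in any non-ham (spam or unclassified) group.
(setq spam-process-ham-in-nonham-groups t)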

When you leave a ham or unclassified group, all spam articles are moved to a location determined by either the spam-process-destination group parameter or a match in the gnus-spam-process-destinations variable, which is a list of regular expressions matched with group names (it's easiest to customize this variable with customize-variable gnus-spam-process-destinations). The ultimate location is a group name. If the spam-process-destination parameter is not set, the spam articles are only expired.
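
The destination variables can also be set by hand. The sketch below assumes that each entry pairs a regular expression matching group names with a destination group name; that entry layout and all the group names are assumptions made for illustration, so check the customize buffer for the format your Gnus version expects:

 
;; ASSUMPTION: each entry is (GROUP-REGEXP DESTINATION-GROUP).
;; All group names below are hypothetical.

;; Ham found in a spam group is moved back to the inbox.
(setq gnus-ham-process-destinations
      '(("^nnml:spam$" "nnml:inbox")))

;; Spam found in any other nnml group is moved to the spam group.
(setq gnus-spam-process-destinations
      '(("^nnml:" "nnml:spam")))

Using customize-variable, as suggested above, avoids having to get the list structure right by hand.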

To use the `spam.el' facilities for incoming mail filtering, you must add the following to your fancy split list nnmail-split-fancy or nnimap-split-fancy:

 
(: spam-split)

Note that the fancy split may be called nnmail-split-fancy or nnimap-split-fancy, depending on whether you use the nnmail or nnimap back ends to retrieve your mail.

The spam-split function will process incoming mail and send the mail considered to be spam into the group name given by the variable spam-split-group. By default that group name is `spam', but you can customize spam-split-group.
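
For example, to have spam-split file suspected spam under a different name, you might set the variable as follows. The group name `junk' is a hypothetical choice:

 
;; Hypothetical group name; spam-split files spam here
;; instead of in the default `spam' group.
(setq spam-split-group "junk")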

You can also give spam-split a parameter, e.g. `'spam-use-regex-headers'. Why is this useful?

Take these split rules (with spam-use-regex-headers and spam-use-blackholes set):

 
(setq nnimap-split-fancy '(|
                           (any "ding" "ding")
                           (: spam-split)
                           ;; default mailbox
                           "mail"))

Now, the problem is that you want all ding messages to make it to the ding folder. But that will let obvious spam through when it is sent to the ding list (for example, spam flagged by SpamAssassin and caught by the spam-use-regex-headers check). On the other hand, some messages to the ding list come from a mail server in the blackhole list, so the invocation of spam-split can't come before the ding rule.

You can let SpamAssassin headers supersede ding rules, but all other spam-split rules (including a second invocation of the regex-headers check) will be after the ding rule:

 
(setq nnimap-split-fancy '(|
                           (: spam-split 'spam-use-regex-headers)
                           (any "ding" "ding")
                           (: spam-split)
                           ;; default mailbox
                           "mail"))

Basically, this lets you invoke specific spam-split checks depending on your particular needs. You don't have to throw all mail into all the spam tests. Another reason why this is nice is that messages to mailing lists you have rules for don't have to have resource-intensive blackhole checks performed on them. You could also specify different spam checks for your nnmail split vs. your nnimap split. Go crazy.

You still have to have specific checks such as spam-use-regex-headers set to t, even if you specifically invoke spam-split with the check. The reason is that when loading `spam.el', some conditional loading is done depending on what spam-use-xyz variables you have set.
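
Putting the ordering requirement together with the split rules, a complete setup might look like the sketch below. It reuses the `ding' rule and the `mail' default mailbox from the examples above, and assumes you want the regex-headers and blackhole checks:

 
;; Enable the checks BEFORE loading spam.el, so that the
;; conditional loading picks them up.
(setq spam-use-regex-headers t
      spam-use-blackholes t)
(require 'spam)

(setq nnimap-split-fancy
      '(|
        ;; Only the regex-headers check runs before the ding rule.
        (: spam-split 'spam-use-regex-headers)
        (any "ding" "ding")
        ;; All enabled checks run for everything else.
        (: spam-split)
        ;; default mailbox
        "mail"))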

Note for IMAP users

The boolean variable nnimap-split-download-body must be set if you want to split based on the whole message instead of just the headers. By default, the nnimap back end retrieves only the message headers. If you use spam-check-bogofilter, spam-check-ifile, or spam-check-stat (the splitters that can benefit from the full message body), you should set this variable. It is not set by default because it slows IMAP down, and that is not an appropriate decision to make on behalf of the user.
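
If you decide the slowdown is acceptable, enabling it is a one-line setting:

 
;; Download full message bodies during nnimap splitting, so
;; that body-based checks can see more than the headers.
(setq nnimap-split-download-body t)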

See section 6.5.1 Splitting in IMAP.

TODO: Currently, spam.el only supports insertion of articles into a back end. There is no way to tell spam.el that an article is no longer spam or ham.

TODO: spam.el needs to provide a uniform way of training all the statistical databases. Some have that functionality built-in, others don't.

The following are the methods you can use to control the behavior of spam-split and their corresponding spam and ham processors:

8.18.5.1 Blacklists and Whitelists  
8.18.5.2 BBDB Whitelists  
8.18.5.3 Gmane Spam Reporting  
8.18.5.4 Anti-spam Hashcash Payments  
8.18.5.5 Blackholes  
8.18.5.6 Regular Expressions Header Matching  
8.18.5.7 Bogofilter  
8.18.5.8 ifile spam filtering  
8.18.5.9 spam-stat spam filtering  
8.18.5.10 Using SpamOracle with Gnus  
8.18.5.11 Extending the spam elisp package  



This document was generated on October, 20 2003 using texi2html