Content Filtering with OPEN eFilter

Version 1.2
March 15, 2004

If you have any questions that the guide does not answer, please email us at info@inter7.com.

  1. Content Filtering with OPEN eFilter
    1. The 4 Layers of Open eFilter
      1. Layer 1 (rblsmtpd)
      2. Layer 2 (Check User)
      3. Layer 3 (Virus Scanning)
      4. Layer 4 (Content Filtering)
    2. Startup Scripts
      1. BSD Systems
      2. SysV style startup scripts for Linux/Unix systems
    3. Log files
      1. Log directories
      2. How to monitor the log files
    4. Checking the Status of the Processes
    5. Configuration files
    6. General Information on Spam Filtering
    7. Setting Up Mail Clients to Filter Spam
      1. Netscape 4
      2. Netscape 6/Mozilla
      3. Pine
      4. Eudora
      5. Outlook 2002
      6. Outlook Express 6
      7. Mac OS X Mail
    8. SpamAssassin Administrator Information
      1. Overview
      2. Startup / Shutdown
      3. Configuration files
        1. whitelist_from
        2. report_safe { 0 | 1 | 2 } (default: 1)
        3. required_hits n.nn (default: 5)
        4. rewrite_subject { 0 | 1 } (default: 0)
        5. subject_tag STRING ... (default: *****SPAM*****)
        6. skip_rbl_checks { 0 | 1 } (default: 0)
        7. check_mx_attempts n (default: 2)
        8. dns_available { yes | test[: name1 name2...] | no } (default: test)
        9. use_auto_whitelist { 0 | 1 } (default: 1)
      4. Per user configuration files
      5. Enabling/Disabling SpamAssassin Per Domain
      6. Enabling/Disabling SpamAssassin Per User
      7. Ruleset Customization and Addition
      8. Whitelisting
      9. Changing the Score of a Default Test
    9. Howto's





The 4 Layers of Open eFilter

Layer 1 (rblsmtpd)

RBL stands for Realtime Blackhole List. When a remote site makes a connection to your mail server, the rblsmtpd program is invoked. It takes the IP address of the remote site and looks it up, by way of a DNS query, in the database of the RBL service. If the IP is not in the database, the lookup returns an 'ok' answer. If the IP is listed, the lookup returns a failure message along with a URL that gives the reason why the IP was rejected.

If the IP was in the database, your smtp server returns a failure message telling the sender not to attempt delivery again. One benefit of this is that if the sender is a spam site that cleans bad email addresses from its lists, the sender will remove the email address from its database and never send email to that address again. If the IP was not in the database, the rest of the session is processed.

There are several sites which provide RBL lists. We recommend, and have configured your smtp server to use, sbl-xbl.spamhaus.org. You can read about the free RBL service at http://www.spamhaus.org.
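The lookup itself is an ordinary DNS query: the octets of the connecting IP address are reversed and the zone of the list is appended, which is the form that also appears in the RBL lines of the sample SpamAssassin report later in this guide. A minimal sketch of how that name is built follows. The function name rbl_query_name is made up for illustration and is not part of rblsmtpd.

```shell
# Build the DNS name a DNSBL client looks up for an IPv4 address:
# the four octets are reversed and the zone of the list is appended.
rbl_query_name() (
    ip=$1
    zone=$2
    IFS=.            # split the dotted quad on dots (the subshell keeps this local)
    set -- $ip
    printf '%s.%s.%s.%s.%s\n' "$4" "$3" "$2" "$1" "$zone"
)

rbl_query_name 63.10.249.142 sbl-xbl.spamhaus.org
# prints 142.249.10.63.sbl-xbl.spamhaus.org
```

If a DNS lookup of the resulting name returns an address, the IP is listed. If the name does not exist, the IP is not listed.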

Layer 2 (Check User)

Once the sending site tells your smtp server to whom the mail is addressed, the qmail-smtpd daemon performs a lookup to see if the email address is a valid account, mailing list or alias/forward. If the account does not exist, qmail-smtpd ends the session by returning a failure message telling the sender that the user does not exist and not to attempt delivery again. There are two benefits to this approach.

  1. If the sender is a spam site that cleans its email lists, the email address will be removed from its database and it will never send to that email address again.
  2. The primary benefit is that the email never makes it onto the machine.

Normally, the stock qmail server accepts all incoming email, then attempts to deliver the email locally. If the email address does not exist, qmail bounces the email back to the sender. Unfortunately most spam sites do not accept bounced email. In these situations, qmail will defer the bounce and attempt to redeliver it for several days. By blocking the email address at the SMTP level, your queue will not have to keep attempting delivery.

Layer 3 (Virus Scanning)

If the email makes it past the first two layers, it is handed over to a program called qscanq, which is configured to unpack the email and run the ClamAV antivirus program against each of the parts. If any part of the email contains a virus, an error status is returned to the qmail-smtpd process, which returns a failure status to the sender telling it not to attempt delivery again. This keeps viruses off of your email server. ClamAV is set to check for new virus signature files every two hours, and apply them automatically if necessary.

Layer 4 (Content Filtering)

The last stage of the process is to deliver the email to the local user's account. During the delivery, SpamAssassin is called to check the contents for spam keywords and signatures. Your server is configured to start with a global SpamAssassin configuration and then read in a user's personal settings if they exist. The email is then scanned and given a spam score. If the score is above the user's threshold, the email is marked as spam and delivered to the user. An additional option allows any user to have such email deleted so it never arrives in their mailbox. The default behavior is to deliver the mail but add [SPAM] to the beginning of the subject line. Users can usually configure their email client to filter on the subject line and move all spam to a separate folder. In case a "good" email is marked as spam, the user can always go through their spam folder and read the email.


Startup Scripts

Open eFilter uses standard startup scripts to start all the services. Some qmail people like to use svscan instead of startup scripts, but we decided to stay with the standard setup that all unix systems use.

BSD Systems

BSD systems usually place their startup scripts in the /etc directory and name them rc. followed by the service name. So on your system you will find the following scripts:

These files are normally added to the end of your /etc/rc.local startup script.

SysV style startup scripts for Linux/Unix systems

For these systems the startup scripts are normally placed in the /etc/init.d or /etc/rc.d/init.d directory.
Under many Linux distributions, these files are then symbolically linked to the run state directories. For example:

/etc/rc.d/init.d/qmail linked to /etc/rc.d/rc3.d/S80qmail

and so on.  During installation, these symbolic links should have already been created.


Red Hat systems use the service and chkconfig programs to help manage startup scripts. On Red Hat you can stop/start a service like this:
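A minimal sketch, assuming the init script is named qmail as in the symbolic link example above. The helper name restart_service is made up for illustration. The service and chkconfig commands are the standard Red Hat tools.

```shell
# Hypothetical helper: stop and then start one service through Red Hat's
# "service" program. The argument is the name of the init script.
restart_service() {
    service "$1" stop && service "$1" start
}

# Usage (as root):
#   restart_service qmail
#   chkconfig --list qmail    # show the run levels the script is enabled for
```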

If you are not using a Redhat-style system, run the script itself, without any options, to see all options available to you.



Log files

Each of the services logs its activity to a directory using the multilog program. Multilog automatically rotates your log files so you never have to worry about deleting old log files.

Log directories

qmail       /var/log/qmail
smtp        /var/log/smtp
pop3        /var/log/pop3
smtp/ssl    /var/log/smtps
pop3/ssl    /var/log/pop3s
spamd       /var/log/spamd
clamd       /var/log/clamd
freshclam   /var/log/freshclam

In each of these directories you will find a current file along with a state and lock file. The state and lock files are used by multilog for housekeeping. The current file is the file multilog is actively logging to. You may also see other files whose names start with an @ sign followed by a long series of numbers and letters. These are the old log files. The filename is a timestamp in TAI64N notation, which is based on TAI64, a 64-bit time stamp notation conceived by Dan Bernstein.

How to monitor the log files

tail -f /var/log/qmail/current | tai64nlocal

That will print out the last 10 lines of the log file, converting the timestamp at the start of each line to human-readable form. The -f option makes tail continue to print out new lines as they are added to the file.
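To look for something in the older, rotated files as well as in the current file, the same conversion can be applied to all of them. The sketch below is a hypothetical helper, not something shipped with Open eFilter. The names search_log and LOG_ROOT are assumptions, and it relies on the rotated file names sorting in time order.

```shell
# Root of the multilog directories; override LOG_ROOT to point elsewhere.
LOG_ROOT=${LOG_ROOT:-/var/log}

# search_log SERVICE PATTERN
# Print every matching line from the rotated files (@*) and from current,
# oldest first, with timestamps converted by tai64nlocal.
search_log() {
    cat "$LOG_ROOT/$1"/@* "$LOG_ROOT/$1/current" 2>/dev/null |
        grep -- "$2" |
        tai64nlocal
}

# Usage:
#   search_log qmail 'delivery'
```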



Checking the Status of the Processes

The easiest way to see if services are running is to search through the list of currently running processes.
We like to use the ps command and use grep to do the search.

Spam Assassin
ps ax | grep spamd
ClamAV
ps ax | grep clamd
Freshclam antivirus update
ps ax | grep freshclam
SMTP Daemon(s)
ps ax | grep smtp
POP3 Daemon(s)
ps ax | grep pop3
Main Qmail Daemons
ps ax | grep qmail-send
ps ax | grep qmail
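The checks above can be run in one pass. The following is a minimal sketch that uses the same ps and grep approach. The function names are made up for illustration, and the search patterns are the ones listed in this section. Note that a plain ps ax | grep name also matches the grep command itself, so the sketch filters those lines out.

```shell
# is_running PATTERN: succeed if some process line matches PATTERN.
# "grep -v grep" drops the grep commands themselves from the ps listing.
is_running() {
    ps ax | grep -v grep | grep -q -- "$1"
}

# Report the state of each service named in this section.
check_services() {
    for name in spamd clamd freshclam smtp pop3 qmail-send; do
        if is_running "$name"; then
            echo "$name: running"
        else
            echo "$name: NOT running"
        fi
    done
}

# Usage:
#   check_services
```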




Configuration files

Spam Assassin
/etc/mail/spamassassin/
ClamAV
/usr/local/etc/clamav.conf
Qmail
/var/qmail/control/
/var/qmail/users/
/var/qmail/aliases/



General Information on Spam Filtering

SpamAssassin augments the headers of incoming email by adding several additional lines. It does not change the way that email is delivered in any way, but the addition of these new headers gives users the opportunity to filter their incoming mail according to their own standards. These additional headers provide a score for each message, which is an estimate of the likelihood that this particular piece of email is spam. Below is an example of what the augmented headers look like:

X-Spam-Flag: YES
X-Spam-Status: Yes, hits=22.6 required=5.9
X-Spam-Level: **********************

SpamAssassin uses several heuristics to determine if a piece of mail is spam. Following is the report associated with the sample email from which the above header lines were taken:

X-Spam-Report: Detailed Report
SPAM: -------------------- Start SpamAssassin results ----------------------
SPAM: This mail is probably spam. The original message has been altered
SPAM: so you can recognise or block similar unwanted mail in future.
SPAM: See http://spamassassin.org/tag/ for more details.
SPAM:
SPAM: Content analysis details: (22.60 hits, 5.9 required)
SPAM: INVALID_DATE (1.5 points) Invalid Date: header (not RFC 2822)
SPAM: UNDISC_RECIPS (1.5 points) Valid-looking To "undisclosed-recipients"
SPAM: NO_REAL_NAME (1.3 points) From: does not include a real name
SPAM: SMTPD_IN_RCVD (1.2 points) Received via SMTPD32 server (SMTPD32-n.n)
SPAM: MSGID_HAS_NO_AT (0.3 points) Message-Id has no @ sign
SPAM: FROM_HAS_MIXED_NUMS (0.3 points) From: contains numbers mixed in with letters
SPAM: ALL_CAPS_HEADER (0.2 points) Header with all capitals found
SPAM: INVALID_MSGID (0.0 points) Message-Id is not valid, according to RFC 2822
SPAM: DRASTIC_REDUCED (1.9 points) BODY: Drastically Reduced
SPAM: ONCE_IN_LIFETIME (1.8 points) BODY: Once in a lifetime, apparently
SPAM: REMOVE_SUBJ (0.8 points) BODY: List removal information
SPAM: HOME_EMPLOYMENT (0.6 points) BODY: Information on how to work at home (2)
SPAM: CALL_FREE (0.2 points) BODY: Contains a tollfree number
SPAM: SPAM_PHRASE_21_34 (1.9 points) BODY: Spam phrases score is 21 to 34 (high)
SPAM: [score: 22]
SPAM: LINES_OF_YELLING (0.2 points) BODY: A WHOLE LINE OF YELLING DETECTED
SPAM: RAZOR2_CHECK (3.9 points) Listed in Razor2, see http://razor.sf.net/
SPAM: RAZOR_CHECK (2.6 points) Listed in Razor1, see http://razor.sf.net/
SPAM: DATE_IN_PAST_24_48 (1.0 points) Date: is 24 to 48 hours before Received: date
SPAM: RCVD_IN_OSIRUSOFT_COM (0.4 points) RBL: Received via a relay in relays.osirusoft.com
SPAM: [RBL check: found 142.249.10.63.relays.osirusoft.com., type: 127.0.0.3]
SPAM: X_OSIRU_DUL (0.6 points) RBL: DNSBL: sender ip address in in a dialup block
SPAM: X_OSIRU_DUL_FH (0.4 points) RBL: Received from first hop dialup listed in relays.osirusoft.com
SPAM: [RBL check: found 142.249.10.63.relays.osirusoft.com., type: 127.0.0.3]
SPAM:
SPAM: -------------------- End of SpamAssassin results ---------------------
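Because the verdict and the score are carried in ordinary headers, they are easy to read from a script as well as from a mail client. The following is a minimal sketch, assuming the message is stored in a file of its own (as in a Maildir) and uses the hits= form of X-Spam-Status shown above. The function names are made up for illustration.

```shell
# Print only the header block of a message (everything up to the first blank line).
headers() {
    sed '/^$/q' "$1"
}

# spam_score FILE: print the hits value from X-Spam-Status, e.g. 22.6.
spam_score() {
    headers "$1" | sed -n 's/^X-Spam-Status: .*hits=\([0-9.-]*\).*/\1/p' | head -n 1
}

# is_flagged FILE: succeed if the message carries "X-Spam-Flag: YES".
is_flagged() {
    headers "$1" | grep -q '^X-Spam-Flag: YES'
}

# Usage:
#   is_flagged message.eml && echo "score: $(spam_score message.eml)"
```

Reading only the header block keeps a quoted header in the body of a message from being mistaken for the real one.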



Setting Up Mail Clients to Filter Spam

You can use your mail client to filter based on the modified headers. Many modern mail clients, such as Netscape, Mozilla, Pine, Eudora, Outlook and Mac OS X Mail, support this functionality. The header to use for these mail clients is:
X-Spam-Flag: YES

Netscape 4

  1. Access the Edit -> Message Filters -> New menu and choose a name for the new filter.

  2. Access Filter Criteria -> Customize Headers -> New (this is usually auto-selected to Subject), enter X-Spam-Flag and click Ok.

  3. Return to the Filter Rules window, select the following Filter Criteria:

        X-Spam-Flag contains YES

  4. Under Filter Action, select a destination folder where you would like to move the likely spam.

Netscape 6/Mozilla

  1. In the Mail window, select: Tools -> Message Filters -> New

  2. In the new filter pane, name the filter.

  3. Under the menu Filter Criteria -> Customize -> New Message Header, enter X-Spam-Flag, click Add, then OK.

  4. Select as the Filter Criteria: X-Spam-Flag contains YES

  5. Under Perform this action, select a destination folder where you would like to move the likely spam.

Pine

As of Pine 4.44 (and possibly some earlier versions), you can automatically filter spam emails from your INBOX. You will need to add a filter rule to look for the X-Spam-Flag header; with this rule you can delete, mark or move the spam to a separate folder.

  1. Start by navigating to the Add Filter screen:

            (M)ain menu -> (S)etup -> (R)ules -> (F)ilters -> (A)dd

  2. By default, the rule will look in your INBOX. Leave this as the default.

  3. Under FILTERED MESSAGE CONDITIONS, navigate to the e(X)traHdr command to add a new header filter.

  4. Enter X-Spam-Flag, then use the (C)hange command to set the value to YES.

  5. In the ACTIONS section, you can set the Filter Action to delete, mark or move the message to a different folder. Enter your choice and save the changes.

Now when you start Pine, all messages marked as spam should be filtered out of your INBOX automatically.

Eudora

  1. Go to the Tools menu and select Filters to open the Filters window.
  2. To add a new filter, click NEW.
  3. Select the option to Match Incoming messages.
  4. In the Header: field type in X-Spam-Flag.
  5. The next drop-down field should have the word 'contains'.
  6. In the field to the right of the word 'contains' type in YES.
  7. Under Action, move the mouse over the arrow next to None and click. (Not all actions are available in the free version of Eudora).
  8. When you do this a field with the word In will appear.
  9. Click on In and select New...
  10. Enter a name for the new mailbox to which your messages will be filtered.

For more information on setting up Eudora see the following web page located at Qualcomm:

http://www.eudora.com/techsupport/tutorials/win_filters.html

Outlook 2002

  1. Select Rules Wizard from the Tools menu.
  2. Select your Inbox folder for the "Apply changes to this folder" field.
  3. Click the New button.
  4. Select "Start from a blank rule".
  5. Select "Check messages when they arrive".
  6. Click "Next."
  7. Checkmark the Condition "with specific words in the subject".
  8. Click "specific words" in the Rule description field to edit it. Type [SPAM] in the "Specify words or phrases..." field.
  9. Click "Add" and then click "OK".
  10. Click "Next."
  11. Checkmark "move it to the specified folder" from the "What do you want to do..." list.
  12. Click "Specified" in the Rule description field to edit it.
  13. Click "New" to create a new folder.
  14. Type 'Spam' in the Name field
  15. Click in the "Select where to place the folder" field and click INBOX then "OK."
  16. Click "Finish" to create the rule.
  17. Click "OK" to exit.

Outlook Express 6

  1. Click on Tools -> Message Rules -> Mail
  2. Select "Where the subject line contains specific words" in section 1.
  3. Select "Move it to the specified folder"
  4. Click on "contains specific words" in the bottom box
  5. Enter [SPAM] into the box and click on add.
  6. Click on specified in the bottom box.
  7. Click on New Folder, enter a name for the folder (e.g. spam) and click OK and then OK again.
  8. Click on OK once more and the rule will be complete.

NOTE: Outlook Express 6 can only filter local folders used by a POP3 protocol connection. If you are using an IMAP protocol connection, you must use another program to filter mail identified as spam.

Mac OS X Mail

  1. Access the account settings for your mailbox by either control-clicking or right clicking the mailbox name, selecting Account Settings, and selecting Edit "mailbox name"...
  2. Select Rules in the Accounts window.
  3. Click the Add Rule button.
  4. Within the conditions area at the top, select the first conditional pulldown and either...
  5. Set X-Spam-Flag equal to YES within the conditions area
  6. Under Perform the following actions:, select how you would like to handle spam. We suggest transferring the message to your Junk mailbox.
  7. Click OK to complete the rule.
  8. Close the Rules window to return to Mail.



SpamAssassin Administrator Information

Overview

Full documentation is available at http://www.spamassassin.org/doc.html.
The spamd daemon is started and listens for connections on 127.0.0.1 port 783. Each vpopmail domain's .qmail-default file calls spamc, which makes a connection to the spamd daemon. The spamd daemon scans the email and returns it to the spamc client, which in turn passes it to vpopmail's vdelivermail program for delivery to the user's email directory.

Startup / Shutdown

The SpamAssassin daemon will automatically start on system reboot. The process name is spamd. Note: the system will keep delivering mail even if the SpamAssassin daemon is not running (spamc -f option). The daemon is started with the following options:
-d -v -u vpopmail

-d = start in daemon mode
-v = include vpopmail user support
-u vpopmail = run as the vpopmail user

To stop the spam assassin daemon:
BSD: /usr/local/etc/rc.d/spamd.sh stop
Linux: /etc/init.d/spamd stop

To start the spam assassin daemon:
BSD: /usr/local/etc/rc.d/spamd.sh start
Linux: /etc/init.d/spamd start
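A restart is simply a stop followed by a start. The sketch below is a hypothetical wrapper that picks whichever of the two startup scripts named above is installed. The names SPAMD_SCRIPTS, find_spamd_script and restart_spamd are assumptions made for illustration.

```shell
# Candidate startup scripts, BSD first, then Linux (paths from this section).
SPAMD_SCRIPTS=${SPAMD_SCRIPTS:-"/usr/local/etc/rc.d/spamd.sh /etc/init.d/spamd"}

# Print the first candidate that exists and is executable.
find_spamd_script() {
    for candidate in $SPAMD_SCRIPTS; do
        if [ -x "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Stop and then start spamd; report an error if no script is installed.
restart_spamd() {
    script=$(find_spamd_script) || {
        echo "no spamd startup script found" >&2
        return 1
    }
    "$script" stop && "$script" start
}

# Usage (as root):
#   restart_spamd
```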

Configuration files

Configuration files are located in /etc/mail/spamassassin.
The file in this directory that holds the local (global) configuration is named local.cf.

Details of current settings:

report_safe 0
required_hits 4
rewrite_subject 1
subject_tag [SPAM]
skip_rbl_checks 1
check_mx_attempts 0
dns_available no
use_auto_whitelist 0

Explanation from http://www.spamassassin.org/doc/Mail_SpamAssassin_Conf.html

whitelist_from

Please see the section on whitelisting for more information.

report_safe { 0 | 1 | 2 } (default: 1)

If this option is set to 1, and if an incoming message is tagged as spam, instead of modifying the original message, SpamAssassin will create a new report message and attach the original message as a message/rfc822 MIME part (ensuring the original message is completely preserved, not easily opened, and easier to recover).

If this option is set to 2, then original messages will be attached with a content type of text/plain instead of message/rfc822. This setting may be required for safety reasons on certain broken mail clients that automatically load attachments without any action by the user. This setting may also make it somewhat more difficult to extract or view the original message.

If this option is set to 0, incoming spam is only modified by adding some X-Spam- headers and no changes will be made to the body. In addition, a header named X-Spam-Report will be added to spam. You can use the remove_header option to remove that header after setting report_safe to 0.

required_hits n.nn (default: 5)

Set the number of hits required before a mail is considered spam. n.nn can be an integer or a real number. 5.0 is the default setting, and is quite aggressive; it would be suitable for a single-user setup, but if you're an ISP installing SpamAssassin, you should probably set the default to be more conservative, like 8.0 or 10.0. It is not recommended to automatically delete or discard messages marked as spam, as your users will complain, but if you choose to do so, only delete messages with an exceptionally high score such as 15.0 or higher.

Note: the system is currently configured to never delete email marked as spam.

rewrite_subject { 0 | 1 } (default: 0)

By default, the subject lines of suspected spam will not be tagged. This can be enabled here.

subject_tag STRING ... (default: *****SPAM*****)

Text added to the Subject: line of mails that are considered spam, if rewrite_subject is 1. Tags can be used here as with the add_header option. If report_safe is not used (see above), you may only use the _HITS_ and _REQD_ tags, or SpamAssassin will not be able to remove this markup from your message.
Note: the current setting is [SPAM]

skip_rbl_checks { 0 | 1 } (default: 0)

By default, SpamAssassin will run RBL checks. If your ISP already does this for you, set this to 1. Note: currently, the system is set to use the Spamhaus RBL lists at the SMTP level, so SpamAssassin does not need to duplicate the check.

check_mx_attempts n (default: 2)

By default, SpamAssassin checks the From: address for a valid MX this many times, waiting 5 seconds each time.
Note: currently we have this turned off for performance. After watching the system load, we could turn this on.

dns_available { yes | test[: name1 name2...] | no } (default: test)

By default, SpamAssassin will query some default hosts on the internet to attempt to check if DNS is working or not. The problem is that it can introduce some delay if your network connection is down, and in some cases it can wrongly guess that DNS is unavailable because the test connections failed. SpamAssassin includes a default set of 13 servers, among which 3 are picked randomly.
Note: currently set to no

use_auto_whitelist { 0 | 1 } (default: 1)

Whether to use auto-whitelists. Auto-whitelists track the long-term average score for each sender and then shift the score of new messages toward that long-term average. This can increase or decrease the score for messages, depending on the long-term behavior of the particular correspondent. For more information about the auto-whitelist system, please look at the Automatic Whitelist System section of the README file. The auto-whitelist is not intended as a general-purpose replacement for static whitelist entries added to your config files.
Note: currently set to 0 to turn off auto white listing

Additional configuration files are located in the directory containing 10_misc.cf.  This file contains the text message of "Spam detection software.."

Per user configuration files

Each user can have its own configuration file to override settings such as required_hits, whitelisting and the report_safe type. SpamAssassin looks for a user configuration file in the same directory as the user's Maildir.

For example, the SpamAssassin directory for user@domain.com would be located in /home/vpopmail/domains/domain.com/user/.spamassassin
The file to modify would be local.cf. The directory and the local.cf file should be owned by vpopmail:vchkpw.
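A minimal sketch of setting up that directory for one user follows. It assumes the plain domains/domain/user layout from the example above, and the names VPOPMAIL_HOME, sa_user_dir and init_sa_user are made up for illustration. If your installation stores users elsewhere, adjust the path accordingly.

```shell
VPOPMAIL_HOME=${VPOPMAIL_HOME:-/home/vpopmail}

# sa_user_dir user@domain: print that user's .spamassassin directory.
sa_user_dir() {
    user=${1%@*}      # part before the last @
    domain=${1##*@}   # part after the last @
    echo "$VPOPMAIL_HOME/domains/$domain/$user/.spamassassin"
}

# init_sa_user user@domain: create the directory and an empty local.cf,
# owned by vpopmail:vchkpw as described above (needs root).
init_sa_user() {
    dir=$(sa_user_dir "$1")
    mkdir -p "$dir" &&
        touch "$dir/local.cf" &&
        chown -R vpopmail:vchkpw "$dir"
}

# Usage (as root):
#   init_sa_user user@domain.com
#   echo 'required_hits 6' >> "$(sa_user_dir user@domain.com)/local.cf"
```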

Enabling/Disabling SpamAssassin Per Domain

The domain's .qmail-default file is where SpamAssassin is enabled/disabled per domain. Examples of .qmail-default file contents:

Enabled .qmail-default (all on one line)
| spamc -f -u $EXT@$HOST | /home/vpopmail/bin/vdelivermail '' user@domain.com

Disabled .qmail-default example:
| /home/vpopmail/bin/vdelivermail '' user@domain.com

Options to spamc:
-f -u $EXT@$HOST

-f deliver mail even if spam daemon is not running
-u $EXT@$HOST look for user configuration file in vpopmail location.
$EXT@$HOST gets expanded to the delivery user@domain
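The two forms differ only in the spamc stage at the front of the pipeline. The sketch below is a hypothetical helper that prints either line so it can be reviewed before being written to a domain's .qmail-default file. The function name qmail_default_line is an assumption, and the last argument stands for whatever follows '' in your existing file.

```shell
# qmail_default_line on|off LAST_ARG
# Print the .qmail-default line with or without the spamc stage.
qmail_default_line() {
    mode=$1
    deliver="/home/vpopmail/bin/vdelivermail '' $2"
    case $mode in
        # $EXT and $HOST stay literal here; qmail expands them at delivery time.
        on)  printf '| spamc -f -u $EXT@$HOST | %s\n' "$deliver" ;;
        off) printf '| %s\n' "$deliver" ;;
        *)   echo "usage: qmail_default_line on|off LAST_ARG" >&2; return 2 ;;
    esac
}

# Usage:
#   qmail_default_line on user@domain.com
#   qmail_default_line off user@domain.com
```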


Enabling/Disabling SpamAssassin Per User


Turn SpamAssassin off:
/home/vpopmail/bin/vmoduser -f user@domain.com
Turn SpamAssassin on:
/home/vpopmail/bin/vmoduser -x user@domain.com
NOTE: vmoduser -x clears ALL flags, including the spam-off flag.

Ruleset Customization and Addition

There is an abundance of documentation available for updating rules.
Here are some good links on where to start.

http://www.merchantsoverseas.com/wwwroot/gorilla/sa_rules.htm
http://wiki.spamassassin.org/w/SpamAssassinRules

There is an excellent web site which will help you create new rules to block certain
words or obfuscated versions of those words:

http://sandgnat.com/cmos/cmos.jsp

For example, for viagra you can put the following in /etc/mail/spamassassin/local.cf:

header ONLY_VGR_SUBJ Subject =~ /\bviagra\b/i
score ONLY_VGR_SUBJ 2.8
describe ONLY_VGR_SUBJ Obfuscated 'viagra' in subject

body ONLY_VGR /\bviagra\b/i
score ONLY_VGR 1.8
describe ONLY_VGR Obfuscated 'viagra' in body

If you really want to catch the viagra messages, increase
the "score" value to something higher, like:

header ONLY_VGR_SUBJ Subject =~ /\bviagra\b/i
score ONLY_VGR_SUBJ 4.0
describe ONLY_VGR_SUBJ Obfuscated 'viagra' in subject

body ONLY_VGR /\bviagra\b/i
score ONLY_VGR 4.0
describe ONLY_VGR Obfuscated 'viagra' in body

After you make the changes to /etc/mail/spamassassin/local.cf, restart the SpamAssassin daemon.
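A typing mistake in a rule is easier to catch before the restart than after it. The spamassassin command has a --lint option that checks the configuration files. The sketch below restarts spamd only when that check passes, assuming your version exits with a non-zero status when it finds a problem. The names SPAMD_INIT and reload_sa_config are made up for illustration.

```shell
# Startup script used to restart spamd (Linux path from this guide;
# on BSD use /usr/local/etc/rc.d/spamd.sh instead).
SPAMD_INIT=${SPAMD_INIT:-/etc/init.d/spamd}

# Restart spamd only if SpamAssassin accepts the configuration.
reload_sa_config() {
    if spamassassin --lint; then
        "$SPAMD_INIT" stop && "$SPAMD_INIT" start
    else
        echo "configuration has errors; spamd was not restarted" >&2
        return 1
    fi
}

# Usage (as root):
#   reload_sa_config
```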

Whitelisting

Whitelisting is used to specify addresses which send mail that is often tagged (incorrectly) as spam. You can add these options to any of the local.cf files you are currently using on your system. For global whitelisting, edit /etc/mail/spamassassin/local.cf.

Whitelist and blacklist addresses are now file-glob-style patterns, so friend@somewhere.com, *@isp.com, or *.domain.net will all work. Specifically, * and ? are allowed, but all other metacharacters are not.
Regular expressions are not used for security reasons.

Multiple addresses per line, separated by spaces, are OK. Multiple whitelist_from lines are also OK.

The headers checked for whitelist addresses are as follows: if Resent-From is set, use that; otherwise check all addresses taken from the following set of headers:

Envelope-Sender
Resent-Sender
X-Envelope-From
From

e.g.

whitelist_from joe@example.com fred@example.com
whitelist_from *@example.com

You might want to whitelist your primary domains so that newsletters or other communications within the domains are not marked as spam. This can be done by adding a new line to /etc/mail/spamassassin/local.cf as follows:

whitelist_from *@domain

After you make the changes to /etc/mail/spamassassin/local.cf, restart the SpamAssassin daemon.
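If whitelist entries are added from a script rather than by hand, it helps to avoid writing the same line twice. The following is a minimal sketch. The function name add_whitelist is made up for illustration, and the file argument can be the global local.cf or a per user file.

```shell
# add_whitelist FILE PATTERN
# Append "whitelist_from PATTERN" to FILE unless that exact line is already there.
add_whitelist() {
    line="whitelist_from $2"
    # -x matches whole lines only, -F treats the * in the pattern literally.
    if grep -qxF -- "$line" "$1" 2>/dev/null; then
        echo "already present: $line"
    else
        echo "$line" >> "$1"
    fi
}

# Usage (as root), followed by a restart of the daemon:
#   add_whitelist /etc/mail/spamassassin/local.cf '*@example.com'
```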

Changing the Score of a Default Test

If you wish to change the score of a built-in test from the default value, add a line like this to your /etc/mail/spamassassin/local.cf:

score NAME_OF_TEST 3.0

Where 3.0 is the hits you wish that test to incur, and NAME_OF_TEST is the test name from the TEST NAME column below.

If you wish to disable a test, set the score to 0 by adding a line like this to your /etc/mail/spamassassin/local.cf:

score NAME_OF_TEST 0

After you make the changes to /etc/mail/spamassassin/local.cf, restart the SpamAssassin daemon.

Howto's




Inter7 Internet Technologies, Inc.
Phone: 847 492 0470
Fax: 847 492 0632

http://www.inter7.com/openefilter.html