
<http://www.uwasa.fi/~ts/info/proctips.html>
Copyright © 1999-2005 by Prof. Timo Salmi
Department of Accounting and Finance, University of Vaasa, Finland
Last modified Sat 16-Apr-2005 05:07

--------------------------------------------------------------------------------------------------
+------------------------------------------------------------------------------------------------+
|                                          (Timo Salmi)                                          |
|                           [tspcml] Timo's procmail tips and recipes                            |
+------------------------------------------------------------------------------------------------+

Although there already is an abundance of procmail material on the net, here are some of my own
tips and observations. This tips page is a companion of my Foiling Spam with an Email Password
System page. The items on this page are in no particular order.

 1. I want to filter my email automatically. How do I get started with procmail?
 2. Building a testbench. How can I test individual procmail recipes?
 3. I know how to make "and" rules in procmail recipes, but how do I make "or" rules?
 4. How can one perform multiple shell commands on the action line?
 5. How can I find out what the subject of a posting is?
 6. How do I get a copy of the headers of all the incoming email into a separate file?
 7. Would you give some further hints for spam foiling recipes?
 8. I have limited disk space. How can I truncate long messages?
 9. How can I quickly test if my rules with regular expressions match?
10. How can I detect if the email comes, say, from the .com domain?
11. What alternatives do I have to detect a sender all through the various header-fields?
12. How can I extract a valid address from the Reply-To field?
13. How can I extract the address of the sender's postmaster?
14. How can I weed out an inordinately long recipient list?
15. What is this procmail scoring? How can I utilize it?
16. How can I test if the subject is empty or if the subject field is missing altogether?
17. How can I modify the "To:" field of the email I received?
18. I have a long list of spammers in a separate file. How can I utilize it?
19. How do I forward certain messages that I get, and preserve myself a copy?
20. How do I forward certain messages to two different addresses?
21. How do I automatically return certain email messages?
22. My address has changed. How do I forward a copy to myself and tell the sender?
23. How can I set variable values based on the text in the body of the email message?
24. How can I insert some token text in front of the body of incoming email?
25. Do you have any useful tips for regular expression matching?
26. How can I test if two procmail variables have the same contents?
27. I am having difficulties with "<". How does one match it?
28. How can I insert identification text to the beginning of the subject line?
29. I tried out your tips, but some of them failed on my system. What next?
30. Is there a cure for the echo and grep blues?
31. How do I know which of my many procmail recipes has been enacted?
32. How can I detect Korean, Cyrillic, or Chinese to avoid such frequent spam?
33. How can I change the subject line and include part of the message body into it?
34. How can I remove the signature from the incoming email?
35. What unix manuals relating to procmail should I get?
36. Is it possible to use procmail to call the vacation program?
37. How can I avoid duplicate messages sent in rapid succession?
38. How can I skip logging a certain, matched recipe?
39. Could you please solve for me this procmail problem of mine?
40. I liked this material. Do you have anything else on programming?
41. Exercises
42. Acknowledgements for useful advice and/or feedback

--------------------------------------------------------------------------------------------------
[tsroa] I want to filter my email automatically. How do I get started with procmail?
 
Unix email can conveniently be preprocessed with automatic filters such as procmail, the
"Autonomous mail processor". This item repeats what already is presented about getting started in
many of the other FAQs, including mine on spamfoiling. Nevertheless, this is so crucial that I'll
try to give the essential outline also here.

Find out what your email directory is. Go ("cd") to the directory where your email folders are
located and type "pwd". Assume in this item that you get "/home/myid/Mail". Further assume in the
example that "/home/myid" is your home directory so that you can use the variable "${HOME}" to
denote it.

Find out where your system's Bourne shell is located by typing "which sh". Assume that you get
"/usr/bin/sh".

Prepare a "~/.procmailrc" file with a suitable editor. For example, you might use
"emacs ~/.procmailrc". To start with, put something like this into the ~/.procmailrc file:

#Preliminaries
SHELL=/usr/bin/sh               #Use the Bourne shell (check your path!)
MAILDIR=${HOME}/Mail            #First check what your mail directory is!
LOGFILE=${MAILDIR}/procmail.log
LOG="--- Logging ${LOGFILE} for ${LOGNAME}, "

#Whatever recipes you'll use
#The order of the recipes is significant
:0
* ^From: scam@cyberspam\.com
/dev/null

# Accept all the rest to your default mailbox
:0:
${DEFAULT}

For the "~/.procmailrc" file a read permission for the user him/herself will be sufficient. To
ensure this, give the command "chmod u+r ~/.procmailrc".

Find out where the "procmail" program is located on your system by typing "which procmail". Assume
below that you get "/usr/local/bin/procmail". Also check what your id is: "whoami". Assume that
you get "myid".

Next comes the crucial step. Put the following line in your "~/.forward" file. Include the quotes
(") into the ~/.forward file contents.

"|IFS=' ' && exec /usr/local/bin/procmail || exit 75 #myid"

Set adequate permissions for accessing the "~/.forward" file: "chmod 644 ~/.forward". Lastly,
check ("ls -lFd ~/") that your main directory permissions are at least (the equivalent of)
"drwx--s--x". If not, "chmod u+rwx ~/" and "chmod og+x ~/".

You should now be set to go. To check, send an email to yourself to see if it gets through. If
there is a problem see the advice on troubleshooting.
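
The quoting of the ~/.forward line is easy to get wrong by hand. Here is a minimal sketch that prints the line for a given procmail path and user id. The function name forward_line is hypothetical, and the path and id are the sample values assumed in this item.

```sh
#!/bin/sh
# Print a ~/.forward line for procmail.
# $1 = full path to procmail (check with "which procmail")
# $2 = your user id (check with "whoami")
# forward_line is a hypothetical helper name, not part of procmail.
forward_line() {
  printf "\"|IFS=' ' && exec %s || exit 75 #%s\"\n" "$1" "$2"
}

# Sample values from this item; adapt them to your own system.
forward_line /usr/local/bin/procmail myid
```

Once the printed line looks right, its output could be redirected into ~/.forward.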
--------------------------------------------------------------------------------------------------
# How can I test individual procmail recipes? I do not wish to disturb my regular ~/.procmailrc
recipes file in the process.
 
There are several options. One convenient method is to build a simple test environment as follows.
If you apply it right, it allows testing without affecting your normal flow of email in any way.
Create the following "proctest" file, preferably in a directory on your path. Make it executable
using "chmod u+x proctest". Thus you'll have a new command "proctest" available.

#!/bin/sh
#The executable file named "proctest"
#
# You need a test directory.
TESTDIR=/home/myid/test
if [ ! -d ${TESTDIR} ] ; then
  echo "Directory ${TESTDIR} does not exist; First create it"
  exit 1
fi
#
#Feed an email message to procmail. Apply proctest.rc recipes file.
#First prepare a mail.msg email file which you wish to use for the
#testing.
procmail ${TESTDIR}/proctest.rc < mail.msg
#
#Show the results.
less ${TESTDIR}/Proctest.log
clear
less ${TESTDIR}/Proctest.mail
#
#Clean up.
rm -i ${TESTDIR}/Proctest.log
rm -i ${TESTDIR}/Proctest.mail

The beauty of this method is that besides "proctest.rc" you can easily edit also "mail.msg" for
testing different kinds of incoming mail and the behavior of your recipes in various situations.
Note, however, that it is best to test only for one email message at a time. In other words, do
not put more than one email message into the mail.msg test file.

A question remains. Where does one get the structure of a posting for the "mail.msg" test posting?
Easy. Invoke elm, select a suitable, existing posting, and make a copy of it by pressing C
(capital C) and answering mail.msg at the "Copy message to:" prompt. Other mail programs probably
have similar options.
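
If no suitable posting is at hand, a test message can also be written by hand. The sketch below is one way to do it. The function name make_test_msg, the addresses and the date are made-up sample values. The message has a "From " envelope line, a few header fields, an empty line and a one-line body.

```sh
#!/bin/sh
# Write a minimal single-message test file.
# $1 = output file, $2 = sender address, $3 = subject (all sample inputs)
# make_test_msg is a hypothetical helper name.
make_test_msg() {
  {
    printf 'From %s Sat Apr 16 05:07:00 2005\n' "$2"   # envelope line
    printf 'From: %s\n' "$2"
    printf 'To: myid@myhost.mydom\n'
    printf 'Subject: %s\n' "$3"
    printf '\n'                                        # empty line ends the header
    printf 'This is the body of the test message.\n'
  } > "$1"
}
```

For example, make_test_msg mail.msg "myid@myhost.mydom" "Testing" would produce a message that the "From:" condition in the proctest.rc of this item matches when LOGNAME is myid.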

Below is the proctest.rc recipe file which I used in preparing for this item:

SHELL=/bin/sh
TESTDIR=/home/myid/test
MAILDIR=${TESTDIR}
LOGFILE=${TESTDIR}/Proctest.log
LOG="--- Logging for ${LOGNAME}, "

#Troubleshooting:
VERBOSE=yes
LOGABSTRACT=all

#Let's test stripping lines from the email message's header
:0 fwh
| egrep -vi "(^Content-|^MIME-Version:.)"

#If it is from myself, store the email message
:0:
* $ ^From:.*${LOGNAME}
${TESTDIR}/Proctest.mail

#Otherwise, discard the email message
:0
/dev/null

# Feedback: The header stripping does not work if any of those header lines is continued. It is
almost always an error to use grep/egrep/fgrep when filtering a message header. A better recipe
would be the following, utilizing formail:

#Let's test stripping lines from the email message's header,
#but only when they're there
:0 fwh
* ^(Mime-Version:|Content-)
| formail -IMime-Version: -IContent-
 
To continue myself. The flags are as follows: "f" use the pipe as a filter, "w" execute before
proceeding, "h" it is about the header of the email message.

The formail -I switch means that any existing field of that name is removed. When only the field
name is given, as here, nothing is put in its place. (The lower-case -i switch instead renames
the existing field by adding an "Old-" prefix.)
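
The problem with continued header lines that the feedback points out can be seen outside procmail with a small sketch. The sample header and the function names are made up. The awk unfolding step is only a stand-in to show why whole fields need to be handled. It is not what formail does internally.

```sh
#!/bin/sh
# A sample header with a folded (continued) Content-Type field.
sample_header() {
  printf 'From: someone@example.com\n'
  printf 'MIME-Version: 1.0\n'
  printf 'Content-Type: multipart/mixed;\n'
  printf '\tboundary="xyz"\n'
  printf 'Subject: folded header test\n'
}

# Line-by-line filtering: the continuation line is left behind.
strip_with_grep() {
  grep -Evi '(^Content-|^MIME-Version:.)'
}

# Join continuation lines to their field first, then filter whole fields.
strip_unfolded() {
  awk '/^[ \t]/ { buf = buf $0; next }
       { if (NR > 1) print buf; buf = $0 }
       END { if (NR > 0) print buf }' |
    grep -Evi '(^Content-|^MIME-Version:.)'
}

sample_header | strip_with_grep
```

The output of the last line still contains the orphaned boundary="xyz" line, whereas sample_header | strip_unfolded leaves only the From: and Subject: fields.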
--------------------------------------------------------------------------------------------------
[tsru] I know how to make "and" rules in procmail recipes, but how do I make "or" rules?
 
Just in case, let's first revisit an "and" rule by a common example:

#Trivial catching of potential spam towards the end of a ~/.procmailrc
#Place only after accepting all the mailing lists you want to receive
:0:
* ! ^TO_ts@([-a-z0-9_]+\.)*uvasa\.fi
* ! ^TO_timo\.salmi
${HOME}/.mail/PotentialSpam.mail

For entering an "or" rule, consider the following example:

#Accept email from Era Eriksson, the author of the major procmail FAQ
:0:
* ^From:.*reriksso@([-a-z0-9_]+\.)*helsinki\.fi|\
  ^From:.*era@iki\.fi
${DEFAULT}

Let's look at a few details:

  * The "^TO_" in the first recipe is a procmail reserved predefined special expression "which
    should catch all destination specifications containing a specific address." It must be written
    in upper case.
   
  * The "!" in the first recipe is the familiar operator indicating a negation.
   
  * If "${HOME}/.mail" is your mail directory you don't need to spell out the entire path
    "${HOME}/.mail/PotentialSpam.mail". Just "PotentialSpam.mail" will be sufficient.
   
  * The first detail of the "or" example is complicated and is per se unrelated to the "or" issue
    at hand. The "([-a-z0-9_]+\.)*" expression in "reriksso@([-a-z0-9_]+\.)*helsinki\.fi" sees to
    it that if Era has several machines in his domain (as I do under mine), all will be matched by
    the recipe. The "[-a-z0-9_]" matches any of the characters within the brackets "[]", the
    trailing "+" tells that there must be at least one repeat of those characters, the "\."
    matches a dot, and the "*" tells that there has to be zero or more repeats of the preceding
    expression within the parentheses "()". [This item owes heavily to Era's friendly guidance.]
   
  * The backslash "\" in "helsinki\.fi" sees to it that the actual dot (.) is matched. This is
    because if the "quote next character" "\" is omitted, the "." is taken as a regular expression
    matching any (exactly one) character.
   
  * The "|" in the "|\" indicates an "or" condition, and the "\" quotes the embedded end of line,
    i.e. tells that the rule is continued on the next line.
   
  * The "|" or condition sees to it that the recipe matches email coming from Era either from the
    "helsinki.fi" or the "iki.fi" domain.
   
  * The "${DEFAULT}" puts the email in the regular mailbox.
   
  * The trailing ":" in the recipe start line ":0:" tells procmail to use temporary file locking
    to avoid writing simultaneously arriving potential email on top of each other at your
    "${DEFAULT}" mailbox. Since no lock file name is given after the ":0:", procmail will provide
    the lockfile name. Always use this format when delivering to a mail folder, unless the target
    folder is /dev/null, that is, unless you want the email to be discarded.
   
There are alternatives. Scoring could be used for the same purpose:

:0:
* 1^0 ^From:.*reriksso@([-a-z0-9_]+\.)*helsinki\.fi
* 1^0 ^From:.*era@iki\.fi
${DEFAULT}

Likewise, you could alternatively use ( ) grouping:

:0:
* ^From:.*(\
reriksso@([-a-z0-9_]+\.)*helsinki\.fi|\
era@iki\.fi)
${DEFAULT}

# Feedback: That condition looks a bit ugly to me. Let me rephrase it to show you what I mean:
 
* ^From:.*(reriksso@([-a-z0-9]+\.)*helsinki|era@iki)\.fi
 
(an underscore can not be part of a hostname, as far as I know.)
 
Yes, many of the rules presented in this FAQ can be written more concisely and/or effectively. The
rules, as presented in the FAQ, are often formulated for easier understanding than efficiency. But
it is useful to improve on the efficiency after one first has got the basic logic of a rule
outlined.
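
The "or" conditions of this item can be tried out with egrep-style matching before they go into a recipe. This is a rough sketch only, since procmail matching is close to egrep's but not identical. The function names and the sample "From:" lines are made up. The -i option mimics procmail's default of ignoring case.

```sh
#!/bin/sh
# The long form, with the alternation at the top level.
from_era_long() {
  grep -Eiq '^From:.*reriksso@([-a-z0-9_]+\.)*helsinki\.fi|^From:.*era@iki\.fi'
}

# The factored form suggested in the feedback.
from_era_short() {
  grep -Eiq '^From:.*(reriksso@([-a-z0-9]+\.)*helsinki|era@iki)\.fi'
}

# Try a few sample lines against both forms.
for line in 'From: reriksso@cc.helsinki.fi' \
            'From: era@iki.fi' \
            'From: someone@helsinki.fi'
do
  if printf '%s\n' "$line" | from_era_short; then
    echo "match:    $line"
  else
    echo "no match: $line"
  fi
done
```

Both functions should agree on such samples. The short form is simply less typing and, per the feedback, less work for the matcher.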
--------------------------------------------------------------------------------------------------
[tsmen] How can one perform multiple shell commands on the action line?
 
See the action line below (i.e. the one starting with the "|" pipe). Separate the commands with
"&&". If you wish to continue on a second line for readability, apply "\". Alternatively, just one
long line could have been used. The recipe below is from a test with the testbench. Its purpose is
just to show this method of giving multiple commands.

#Test if the message has a "Subject:" header and has a subject in it
#(The brackets [] contain a space and a tab)
:0:${TESTDIR}/Proctest.mail.lock
* ^Subject:
* ^Subject:[ ]*\/[^ ].*
| echo "A ^Subject: header found with" >> ${TESTDIR}/Proctest.mail &&\
  echo "${MATCH}" >> ${TESTDIR}/Proctest.mail
 
Likewise, a single command can be subdivided for easier documentation:
 
| echo "A ^Subject: header found but there is no subject" \
  >> ${TESTDIR}/Proctest.mail
 
Below is another example with a slightly different syntax using the semicolon ";" as the
separator. The example also demonstrates how to save diskspace by zipping email from a particular
source. You'll need Info-ZIP's zip and unzip in order to be able to apply it. (They are available
from the proper Unix section of Garbo program archives at the University of Vaasa, Finland.)

:0w:Test.mail.lock
* ^From:.*test
| unzip ${HOME}/mail/Test.zip; \
  cat >> Test.mail; \
  zip -oj9 ${HOME}/mail/Test.zip Test.mail; \
  rm -f Test.mail
 
What happens on the action line is this:

 1. The potentially existing "Test.zip" zip-file is unzipped to obtain the earlier email messages
    that already might be within Test.zip.
     
 2. The incoming email is appended to the extracted Test.mail file.
     
 3. The updated Test.mail file is compressed back into the Test.zip zip-file.
     
 4. The uncompressed Test.mail is deleted.

To be on the safe side procmail is told to wait (the "w" flag in ":0w:Test.mail.lock") until the
pipe ("|") has been performed.
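
The two separators used in this item do not behave the same way in the shell. With "&&" the next command runs only if the previous one succeeded. With ";" it runs regardless. Below is a minimal sketch of the difference, assuming that /bin/sh is the shell that gets the action line. The function name run_action is hypothetical.

```sh
#!/bin/sh
# Hand an action line to the shell, roughly as procmail does with $SHELL -c.
run_action() {
  /bin/sh -c "$1"
}

# "&&": the echo is skipped because the first command fails.
run_action 'false && echo "second command ran"' ||
  echo "chain stopped after the first command"

# ";": the echo runs even though the first command fails.
run_action 'false ; echo "second command ran"'
```

This is relevant to the zip example above: with ";" the later commands still run when unzip fails, for instance when Test.zip does not exist yet.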
--------------------------------------------------------------------------------------------------
# How can I find out what the subject of a posting is?
 
Now is a good time to utilize my testbench in order to find out if a logic works. Build a
/home/myid/test/proctest.rc file.

SHELL=/bin/sh
TESTDIR=/home/myid/test
MAILDIR=${TESTDIR}
LOGFILE=${TESTDIR}/Proctest.log
LOG="--- Logging for ${LOGNAME}, "

First, a few environment variables are included.

#Troubleshooting:
VERBOSE=yes
LOGABSTRACT=all

The above means: Use full reporting for the debugging.

#An auxiliary regular expression to detect text,
#The brackets [] contain a space and a tab
GETTEXT="[  ]*\/[^  ].*"

If the same expression is used several times in a recipe file, it is convenient to put the
expression into an environment variable instead of writing it out repeatedly.

  * The first part "[ ]*" of the regular expression matches any number of spaces and tabs (even
    the case of none) which can lead the subject.
     
  * The "\/" is a special procmail-only operator which puts a (possible) match found by the rest
    of the expression into a variable named MATCH.
     
  * "[^ ]" means any one character other than the ones within the brackets, i.e. the first
    non-space, non-tab character of the subject. The ".*" then matches the rest of the line.

#Test if the message has a "Subject:" header and has a subject in it
:0c:${TESTDIR}/Proctest.mail.lock
* ^Subject:
* $ ^Subject:${GETTEXT}
| echo "A ^Subject: header found with" >> ${TESTDIR}/Proctest.mail &&\
  echo "${MATCH}" >> ${TESTDIR}/Proctest.mail

  * The "c" flag in ":0c:" tells that the processing should continue also after this particular
    recipe has been acted upon. (When the "c" flag is not present, all the rest of the recipes
    in proctest.rc are skipped.) Adding a "w" flag (":0wc:") would tell procmail to wait until
    the "|" pipe has finished.
     
  * The ":${TESTDIR}/Proctest.mail.lock" tells which lockfile to use in order to avoid the
    confusion from the possibility of simultaneous arrival of several email messages. Note that
    since we use a pipe "|" in the actions part, it is prudent to explicitly give the name of the
    lock.
     
  * Note the first "$" on the "$ ^Subject:${GETTEXT}" condition line. It tells that the
    environment variables (in this case "GETTEXT") on the line are to be expanded, not to be taken
    as literal text.

#Test if the message has a "Subject:" header but has no subject in it
:0c:${TESTDIR}/Proctest.mail.lock
* ^Subject:
* $ !^Subject:${GETTEXT}
| echo "A ^Subject: header found but there is no subject" \
  >> ${TESTDIR}/Proctest.mail
 
#Test if the message has a "Subject:" at all
:0c:${TESTDIR}/Proctest.mail.lock
* !^Subject:
| echo "No ^Subject: header was found" >> ${TESTDIR}/Proctest.mail
 
#Otherwise, discard the message
:0
/dev/null
 
After the recipes above have been testbenched and cleared, you know that the methods used in them
will work for you in your own environment.
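
The three cases tested above (a subject is present, the subject is empty, the "Subject:" header is missing) can also be imitated in plain shell for a quick check of a test message. This is only a sketch with a hypothetical function name, subject_state. It uses grep and sed, not procmail's own "\/" matching.

```sh
#!/bin/sh
# Read one message on standard input and report the state of its subject.
subject_state() {
  # Keep only the header: everything up to the first empty line.
  header=$(sed '/^$/q')
  # First Subject: line, ignoring case as procmail does by default.
  line=$(printf '%s\n' "$header" | grep -i '^Subject:' | head -n 1)
  if [ -z "$line" ]; then
    echo "missing"
    return
  fi
  # Strip the field name and any leading spaces or tabs.
  value=$(printf '%s\n' "$line" | sed 's/^[^:]*:[[:blank:]]*//')
  if [ -n "$value" ]; then
    echo "present: $value"
  else
    echo "empty"
  fi
}

printf 'From: a@b\nSubject: Hello world\n\nbody\n' | subject_state
```

Because only the header is examined, a line starting with "Subject:" in the body of the message does not count.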

Of course, there are other options for extracting the subject into an environment variable. One is
to utilize "formail" which is a companion to the procmail program. If you include the following
expression at the beginning of your ~/.procmailrc recipes file, you will have the variable
${SUBJECT} available for the rest of the recipes file.

#Environment variables for procmail
#
#Get the subject
#Discard some dangerous special chars + any leading and trailing blanks
SUBJECT=`formail -xSubject: \
         | tr '\;\`\\' '   ' \
         | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
 
For an example of usage see the Foiling Spam with an Email Password System page.
 
# Feedback: Extracting the header from inside procmail using the \/ token is _much_ faster than
the formail solution.
 
Feedback: If the SUBJECT variable is left empty, apply quotes on the first line, i.e.
SUBJECT=`formail -x"Subject: "\
--------------------------------------------------------------------------------------------------
# How do I get a copy of the headers of all the incoming email into a separate file?
 
You can use

#Header logging
:0hc:${HOME}/.mail/Procmail.head.lock
| cat >> ${HOME}/.mail/Procmail.head

  * The "h" flag in ":0hc" tells that the header should be accessed.
   
  * The "c" flag in ":0hc" orders the processing to continue also after this recipe. In other
    words, you put your other recipes, after the header-catching, in the ordinary fashion. The
    email will reach them.
   
  * The ":${HOME}/.mail/Procmail.head.lock" tells which particular lockfile to use.
   
  * Since there are no condition lines (lines starting with *) this item will always be acted upon
    when it is reached. You wanted to log the headers from all the incoming email, right?
   
  * The "| cat >> ${HOME}/.mail/Procmail.head" appends the headers to the
    ${HOME}/.mail/Procmail.head file.

# Feedback: Since appending to a file is the result of a normal mailbox delivery, that can be more
efficiently written as simply:
 
:0 hc:
$HOME/headers.cut

That eliminates a cat and a shell process, plus the pipe and extra reads and writes.
 
Now, if you want to overwrite the file with each new message [or do some further shell operations
within the pipe], then the cat command is a reasonable choice.
 
[A further point] That would have been an odd name for the lockfile. Why not
$HOME/headers.cut.$LOCKEXT?
 
--------------------------------------------------------------------------------------------------
[tsno] Would you give some further hints for spam foiling recipes?
 
Besides what is on my page Foiling Spam with an Email Password System and a separate item on 
detecting the sender, below are some instructive little tricks.

Perhaps the strongest generic trick against spam is to shirk any email that is not addressed to
you directly, since most spam is addressed to some kind of mailing lists. Of course, you first
will have to accept email from any legitimate mailing list which you have subscribed to. If you
put a suitable recipe after your recipes that accept the legitimate email lists much of the
incoming spam will be caught. Below is a simplified (and a bit munged) version of what I do in my
own ~/.procmailrc:

#Catch potential spam
:0
* !^TO_(ts|timo\.salmi)@([-a-z0-9_]+\.)*uwasa\.fi
{
  :0 fwh
  * ^Content-Length:
  | formail -IContent-Length:
  :0:
  Spam.mail
}

If you look carefully through this page, you'll find explanations for all the details in the above
recipe. It will be a good exercise to do so. :-)

Since so much spam, if not practically all of it, comes from forged sender addresses, it is much
more effective to block certain suspect email routes than to try to match the elusive spammers. The 
scoring recipe example below treats as spam all email that is routed via dialsprint.net and that
is not addressed to "me" personally.

#Spam avoidance of certain routes and if not for me personally
:0:
* -1^0
*  1^0 ? formail -x"Received:" | egrep -is "dialsprint\.net"
*  1^0 ! ^TO_(myid|myFirstName[ _\.]myLastName)@([-a-z0-9_]+\.)*myhost\.mydom
Spam.mail

  * The "?" at the start of the condition executes and evaluates what is on the condition line
    instead of searching for a literal match.
   
  * Procmail's companion program formail is used to extract all the "Received:" routing
    information from the posting's header. Then "dialsprint.net" is sought for using Unix egrep
    via the "|" pipe.
   
  * This is a sideline, but the simpler, less general form of the last condition line would, of
    course, be just "*  1^0 ! ^TO_myid@myhost\.mydom"
   
  * The scoring system is explained elsewhere on this page, but in brief the score is initialized
    at -1. Each explicit condition is given a weight of 1. If the total score is at least 1 (i.e.
    positive) then the action (storing to the Spam.mail file) is initiated.
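
The arithmetic of the scoring recipe above can be followed with a small shell sketch. The function name spam_score and the sample inputs are made up, and the simpler form of the address condition is used. The score starts at -1 and each condition that holds adds 1, so the result is positive only when both conditions hold.

```sh
#!/bin/sh
# $1 = text of the Received: lines, $2 = recipient address (sample inputs)
spam_score() {
  score=-1
  # Condition 1: the message was routed via dialsprint.net.
  if printf '%s\n' "$1" | grep -Eiq 'dialsprint\.net'; then
    score=$((score + 1))
  fi
  # Condition 2: the message is NOT addressed to me personally.
  if ! printf '%s\n' "$2" | grep -Eiq '^myid@([-a-z0-9_]+\.)*myhost\.mydom$'; then
    score=$((score + 1))
  fi
  echo "$score"
}

spam_score 'from mail.dialsprint.net by myhost.mydom' 'someone@else.example'
```

Only a result of 1 would trigger the action. With just one of the conditions true the score stays at 0, which is not positive.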
   
Fairly often there is a tell-tale exhortation to email to a remove@ or a removeme@ address within
the actual message. As you may know, these are just common ploys of the spammers to get your
address confirmed to make matters even worse for you.

:0B:
* (remove@|removeme@)
PotentialSpam.mail

  * The "B" flag tells the recipe to search through the body of the email message.
     
  * Note the "or" testing on the conditions line.
     
  * Note again the file locking (the trailing : in ":0B:"). Since the email message is directed to
    a folder, we do not need explicitly to name the lockfile. We can let procmail do it. As a
    default it will use the name PotentialSpam.mail.lock
     
  * The "B" means the body and only the body of the message. The header is not included. However,
    I have as hearsay that some procmail versions have a bug in this respect, but I have not been
    able to test that situation myself.

The allegedly more respectable [sic] unsolicited advertising has an "ADV" marker in upper case
on the subject line. (For an imaginary legitimacy such spammers occasionally attach some
xenophobic quibble about U.S. legislation, not very relevant on the international Internet.)

:0D:
* ^Subject:.*ADV
PotentialSpam.mail

  * The "D" flag tells to distinguish between the lower and the upper case in testing for a match.

There are some obvious code words that tend to appear on the subject line, such as "make money
fast" and "$$$".

:0:
* (^Subject:.*make.*money.*fast|^Subject:.*\$\$\$)
PotentialSpam.mail

  * Note, not "^Subject:.*$$$", but "^Subject:.*\$\$\$" because, if not quoted with "\", a "$" is
    taken as a regular expression indicating the end of line.
   
  * Other typical subjects which you might wish to catch include such as
      + cable descrambler
      + FOR SALE
      + laser printer toner
      + million email addresses
      + ONLY $
      + Quit Your Job
  * Other typical contents include such as
      + absolutely no obligation
      + call now 24 h
      + to be taken off our list

Don't overdo it, though, lest you end up weeding also some legitimate email.

# Feedback: The regexp:
 
(remove@|removeme@)
 
is much slower than
 
remove(me)?@
 
Having the 'top-level' of the regexp be an alternation (via '|') slows down matching by quite a
bit. The more that can be factored out at the beginning of the regexp, the better. The same goes
for the recipe that matches against the Subject: header-field:
 
^Subject:.*(make.*money.*fast|\$\$\$)
 
is faster than:
 
(^Subject:.*make.*money.*fast|^Subject:.*\$\$\$)
 
My comment: Of course it is commendable to be efficient, especially where easy understanding is
not compromised. However, if the two clash, I often prefer clarity of expression and convenience
over a strict maximization of code efficiency. Don't we have our powerful modern computers to
perform our tasks for us, not vice versa :-). (This is not about the particular feedback above.
The improvements are useful. They are both legible and instructive.)
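
That the factored expressions catch the same lines as the longer ones can be checked quickly with egrep-style matching. This is a sketch with made-up sample lines and a hypothetical function name. It says nothing about speed, only about which lines match.

```sh
#!/bin/sh
# A few made-up lines to match against.
sample_lines() {
  printf '%s\n' \
    'Subject: Make money fast!!!' \
    'Subject: $$$ for you' \
    'Subject: lunch on Friday' \
    'To be removed, write to remove@example.com' \
    'or to removeme@example.com'
}

# Count the input lines matching the expression given in $1, ignoring case.
count_matches() {
  grep -Eic "$1"
}

echo "long remove form:   $(sample_lines | count_matches '(remove@|removeme@)')"
echo "short remove form:  $(sample_lines | count_matches 'remove(me)?@')"
echo "long subject form:  $(sample_lines | count_matches '(^Subject:.*make.*money.*fast|^Subject:.*\$\$\$)')"
echo "short subject form: $(sample_lines | count_matches '^Subject:.*(make.*money.*fast|\$\$\$)')"
```

Each pair of counts should be equal for these samples.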

# More feedback: The "* ^Subject:.*ADV" rule is overly simplistic and catches many non-spam
subjects. Maybe rather something like "* ^Subject:\<*ADV\>"

My comment: Ok. Let's try

:0D:
#(The brackets [] start with a space and a tab)
* ^Subject:.*([ \{<]+)ADV([ :\}>]+|$) |\
  ^Subject:.*(\[+)ADV(:)?(\]+|$)
PotentialSpam.mail
 
It is far from perfect, but it should work reasonably well for regular purposes. Spam detection
requires experimenting anyway. Regular expressions are not easy. They are quite a large subject
area of their own.

The above assumes that there is (at least) one space after the "Subject:" header before the
subject begins. This can be ensured by first applying "formail -z", which you can have high up in
your ~/.procmailrc. For example, I have the first two lines below in mine.

:0 fwh
| formail -z -iContent-Length:
 
:0D:
* ^Subject:.*([ \{<]+)ADV([ :\}>]+|$) |\
  ^Subject:.*(\[+)ADV(:)?(\]+|$)
PotentialSpam.mail
 
See the other items in this tips file for an explanation of the "fwh" flags. The formail program
with the "-z" switch will insert the desired blanks into the header. The "-iContent-Length:"
switch (which is outside the theme of the current item) will replace the Content-Length: headers
with Old-Content-Length: headers.

I use a slightly different recipe in my own ~/.procmailrc recipes file:

:0D
* ^Subject:.*([ ]|<|\[)ADV([ ]|>|:|\]|$)
{
  :0
  { RULE="Catch potential spam by detecting an ADV keyword" }
  :0
  /dev/null
}
 
If you wonder about the "RULE" variable, see the item about logging which rules have been used.
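
An "ADV" expression like the ones above is best tried on a few sample subjects before trusting it. Here is a sketch with egrep-style matching and a hypothetical function name, is_adv. [[:blank:]] stands for the space and tab, and the matching is case-sensitive on purpose, like with procmail's "D" flag. In this egrep form the "]" needs no backslash outside brackets.

```sh
#!/bin/sh
# Succeed if the subject line on standard input carries an ADV marker.
is_adv() {
  grep -Eq '^Subject:.*([[:blank:]]|<|\[)ADV([[:blank:]]|>|:|]|$)'
}

for subject in 'Subject: ADV: cheap toner' \
               'Subject: [ADV] special offer' \
               'Subject: ADVANCED calculus' \
               'Subject: adv: lower case'
do
  if printf '%s\n' "$subject" | is_adv; then
    echo "caught: $subject"
  else
    echo "passed: $subject"
  fi
done
```

The first two samples should be caught and the last two should pass.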

On to a different facet. Some ISPs (Internet Service Providers) do not allow numbers in the email
addresses. Thus, you may identify some of the forged spam by catching a violation in this respect.
The following recipe catches email with numbers in the user id before the @ mark from all the
various nodes on "respectable.net".

:0:
* ^From:.*[0-9].+@([-a-z0-9_]+\.)*respectable\.net
PotentialSpam.mail

                                  -----------------------------                                   

Date: Thu, 19 Dec 2002 10:44:44 +1000
From: Philip Gunter
To: Timo Salmi
Subject: A procmail tidbit

Hi Timo, thanks for your excellent procmail reference.

Here is a small recipe you might like to add to your site.
It limits the number of emails being forwarded from an account,
useful to stop sms storms.

Cheers,
Philip.

:0
{
  :0
  {
    # remove any sms-alert files older than 5 minutes
    GLOP_=`find /var/tmp/sms -name sms-alert\* -cmin +5 -exec rm -f {} \;`

    # Create an sms-alert file for this message.
    GLOP_=`touch /var/tmp/sms/sms-alert$$`

    # Count the number of sms-alert files
    COUNT=`ls /var/tmp/sms | grep sms-alert | wc -l`
    COUNT1=`expr ${COUNT}`

    # Check if number of alerts in the last 5 minutes is less than 2

    ISLT=`expr ${COUNT1} \< 2`

  }
  :0:
  # if the expression is true then forward the email
  * ISLT ?? ^^1^^
    ! 0123456789@pager.net
}
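
The counting logic of the recipe above can be tried in isolation with a small shell sketch. This is an illustration under assumptions, not Philip's recipe itself: the function name allow_forward and its two parameters are hypothetical, mktemp is used to get a unique marker name, and -mmin (modification time) is used where the recipe uses -cmin. Both mktemp and -mmin are common extensions but not available everywhere.

```sh
#!/bin/sh
# $1 = directory for the marker files, $2 = limit
# Succeeds only while fewer than $2 markers exist from the last 5 minutes.
allow_forward() {
  dir=$1
  limit=$2
  # Remove markers older than 5 minutes.
  find "$dir" -name 'sms-alert*' -mmin +5 -exec rm -f {} \;
  # Leave a marker for the current message.
  mktemp "$dir/sms-alert.XXXXXX" > /dev/null || return 2
  # Count the markers that remain, including the new one.
  count=$(($(find "$dir" -name 'sms-alert*' | wc -l)))
  [ "$count" -lt "$limit" ]
}
```

With a limit of 2, as in the recipe, the first call within five minutes succeeds and the following ones fail until the old markers have expired.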

--------------------------------------------------------------------------------------------------
[tsz] I have limited disk space. How can I truncate long messages?
 
Before we proceed any further, there is a very important email feature to observe. If you alter
the content-length of a message it is highly advisable first to discard any "Content-Length:"
lines from the email's header. If you fail to do that, there is the danger that next time you read
the relevant email folder your email program will break your folder because of erroneous length
information. Many email programs are brain-dead that way.

#Truncate messages longer than 4000 bytes to 100 lines
:0
* > 4000
{
  :0 fwh
  * ^Content-Length:
  | formail -IContent-Length:
 
  :0:Truncated.mail.lock
  | head -100 >> Truncated.mail
}
 
Some details:

  * The "* > 4000" matches email messages longer than 4000 bytes.
     
  * The already familiar set of flags "fwh" tells procmail to filter ("f") the email's header ("h")
    and to wait ("w") for the filter to finish.
     
  * Use formail to ensure removing even complicated "Content-Length:" lines.
     
  * The above also serves as an example of "block nesting", i.e. the rules and actions between the
    braces "{ }".

Let's expand the recipe a bit.

#Truncate messages longer than 4000 bytes to 100 + 10 lines
:0
* > 4000
{
  :0 fwh
  * ^Content-Length:
  | formail -IContent-Length:
 
  :0c:Truncated.mail.lock
  | head -100 >> Truncated.mail &&\
    echo "-:-:-:- (snip) -:-:-:-" >> Truncated.mail
 
  :0:Truncated.mail.lock
  | tail -n +101 | tail -n 10 >> Truncated.mail
}
 
A few observations:

  * The first 100 lines are included. So are the last 10.
     
  * The above also exemplifies giving multiple commands. Recall that a standard recipe only allows
    one action line.
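
The head and tail arithmetic can be checked outside procmail. A minimal sketch, assuming a
hypothetical helper named truncate_text and a system that has mktemp. Unlike the recipe it prints
the snip marker for every input, however short.

```shell
#!/bin/sh
# Hypothetical helper: keep the first $1 lines and the last $2 lines of the
# text on standard input, with a snip marker in between.
truncate_text() {
  head_lines=$1
  tail_lines=$2
  tmp=$(mktemp) || return 1
  cat > "$tmp"
  head -n "$head_lines" "$tmp"
  echo "-:-:-:- (snip) -:-:-:-"
  # Skip the lines already printed, then keep only the end of the remainder
  tail -n "+$((head_lines + 1))" "$tmp" | tail -n "$tail_lines"
  rm -f "$tmp"
}

# A 200-line input comes out as 100 + 1 + 10 = 111 lines
awk 'BEGIN { for (i = 1; i <= 200; i++) print i }' | truncate_text 100 10 | wc -l
```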

Another option is to compress the incoming email instead of truncating it.
--------------------------------------------------------------------------------------------------
[tszig] How can I quickly test if my rules with regular expressions match? The fuller procmail
testbench is rather heavy machinery for quick testing.
 
Let's see. A lite version of the testbench could be the following. Put the regular expressions you
wish to try out into a "greptest" script that runs them through egrep, since procmail's matching
closely (but not quite!) follows egrep's. Make the file executable with "chmod u+x greptest". Then
make a "mail.msg" file with the texts you wish to try to match (or not to match). Thus you might
have:

#!/bin/sh
#The executable file named "greptest"
egrep -i '(ts|timo\.salmi)@([-a-z0-9_]+\.)*uvasa\.fi' mail.msg
#
#Allow a quick visual comparison on the screen
echo ""
cat mail.msg

#The mail.msg target file with the trial text for the matching
ts@uvasa.Fi
ts@loisto.uvasa.fi
Timo.Salmi@uvasa.Fi
Timo.Salmi
null@uvasa.fi

Then, just give the command "./greptest" and visually compare the outputs.

Miscellaneous notes:

  * There are some special differences between procmail extended matching rules and the egrep
    expressions. Thus under special circumstances they do not match the regular expressions quite
    the same way. This might raise occasional confusion. See "man procmailrc" for the details.
     
  * You can also test egrep regular expressions on your PC since egrep clones are available from
    the Garbo program archives. For example you might try gnuegrep.zip, egrep.zip and dgrep.zip.
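
If reading the two outputs side by side gets tedious, the same idea can be written so that each
trial line is labeled. This is a sketch only: check_lines is a made-up name, and "grep -E" stands
in for egrep.

```shell
#!/bin/sh
# Hypothetical variant of "greptest": report each trial line separately.
# Usage: check_lines 'regex' < mail.msg
check_lines() {
  while IFS= read -r line; do
    if printf '%s\n' "$line" | grep -Eiq -e "$1"; then
      echo "MATCH    $line"
    else
      echo "no match $line"
    fi
  done
}

# The trial texts from mail.msg, given inline here
check_lines '(ts|timo\.salmi)@([-a-z0-9_]+\.)*uvasa\.fi' <<'EOF'
ts@uvasa.Fi
ts@loisto.uvasa.fi
Timo.Salmi@uvasa.Fi
Timo.Salmi
null@uvasa.fi
EOF
```

With these trial texts the first three lines should be labeled as matches and the last two should
not.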

--------------------------------------------------------------------------------------------------
# How can I detect if the email comes, say, from the .com domain?
 
I have been puzzling over this item myself, because it is not as trivial as it first appears. The
catch is that the ".com" must be exactly at the end of the address. The problem naturally is that in
the email headers there can be text after the email address, such as the sender's name. E.g.

From: scam@cyberspam.com (The Big Bad Spammer)

The first solution that comes to mind is the following, but it is not entirely accurate.

:0:
* ^From:.*\.com
* !^From:.*\.com\.
* !^TO_(ts|timo\.salmi)@([-a-z0-9_]+\.)*uwasa\.fi
ProbableComSpam.mail

  * The first condition line matches a ".com" anywhere on the "From:" address line. It would
    match, for example, email from "someone@my.company.net".
     
  * The second condition line tries to narrow the condition down, but it only excludes a ".com"
    that is followed by a further dot. Thus "someone@my.company.net" would still be matched, so
    the recipe is not quite accurate.
     
  * The third condition line is just standard spam avoidance, not necessarily related to the task
    at hand. It is just that much, if not the majority, of spam appears to involve .com addresses.

Quite possibly there are better solutions, but below is what I came up with for hopefully an
accurate match:

# Get the sender's address
# Discard any leading and trailing whitespaces
FROMADDR_=`formail -rt -xTo: \
           | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

# Test if the email came from the .com domain
:0:
* $ ? echo ${FROMADDR_} | egrep -is '\.com$'
ComDomain.mail

  * Let formail take care of finding out from the headers what the sender's address is. Get rid of
    any leading and/or trailing white spaces using "expand" for tabs and "sed" for the remaining
    spaces. You should have this definition high up in your ~/.procmailrc
     
  * The "$" on the condition line tells procmail to expand any variables on the line, in this case
    the "${FROMADDR_}", instead of taking it literally.
     
  * The "?" tells procmail to execute the rest of the line as a command and to use its exit code
    as the outcome of the condition. BTW, if you have the procmail extended diagnostics on
    ("VERBOSE=yes") you can get in your procmail logfile a sinister looking "Program failure (1)".
    Don't panic. It just is egrep's exit code telling that no match was found for that particular
    email message, i.e. that it was not from the ".com" domain.
     
  * The condition line echoes the stripped email address to "egrep" in order to test if there is a
    match. The "-i" switch is used since email addresses are case insensitive. The essence of the
    "egrep" is the trailing "$" matching the end of the extracted address. The "-s" switch tells
    egrep to work silently, i.e. only to give the return code.
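
The end-of-address test can also be tried on its own in the shell. A minimal sketch, assuming the
bare address has already been extracted. The helper name in_tld is hypothetical, and "grep -q" is
used here in place of "egrep -s" to keep the test silent.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the bare address in $1 ends in the
# top-level domain given in $2 (default "com").
in_tld() {
  tld=${2:-com}
  printf '%s\n' "$1" | grep -Eiq "\\.${tld}\$"
}

# Invented sample addresses, for illustration only
for addr in scam@cyberspam.com someone@my.company.net Someone@Example.COM; do
  if in_tld "$addr"; then
    echo "$addr: from .com"
  else
    echo "$addr: not from .com"
  fi
done
```

The trailing "$" in the pattern is what makes the difference: "someone@my.company.net" contains
".com" but does not end in it.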

There is one small convenience in the first, inaccurate recipe version. It is easy to include
several domains into the same recipe. For example:

:0:
* ^From:.*\.hk|\
  ^From:.*\.kr|\
  ^From:.*\.tr
* !^From:.*\.hk\.|\
  ^From:.*\.kr\.|\
  ^From:.*\.tr\.
* !^TO_myid@([-a-z0-9_]+\.)*myhost\.mydom
ProbableSpam.mail

An aside: You could also utilize a more condensed format:

* ^From:.*\.(hk|kr|tr)

(Condensing the rest of the above recipe is left as an exercise.)

Using scoring is one option. The recipe could also be rewritten as

#Define getting the sender's address
#Discard any leading and trailing whitespaces
FROM_=`formail -rt -xTo: \
         | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

#Whatever other recipes in between.

#Spam screening of certain susceptible domains
:0:
* -1^0
*  1^0 $ ? echo ${FROM_} | egrep -is '\.hk$'
*  1^0 $ ? echo ${FROM_} | egrep -is '\.kr$'
*  1^0 $ ? echo ${FROM_} | egrep -is '\.tr$'
*  1^0 !^TO_myid@([-a-z0-9_]+\.)*myhost\.mydom
ProbableSpam.mail

There also is the option

:0:
* ^From:.*\.hk([ >]|$)|\
  ^From:.*\.kr([ >]|$)|\
  ^From:.*\.tr([ >]|$)
* ! ^TO_myid@([-a-z0-9_]+\.)*myhost\.mydom
ProbableSpam.mail

--------------------------------------------------------------------------------------------------
[tsb] What alternatives do I have to detect a sender all through the various header-fields?
 
If we only look at the "From:" field in the header we have the familiar:

#Accept all email from myself, weed out autoreplies
:0:
* ^From:.*myid@([-a-z0-9_]+\.)*myhost\.mydom
* ! ^X-Loop: myid@myhost\.mydom
${DEFAULT}

Next, let's extend the matching to more fields in the header:

:0
* ? formail -x"From" -x"From:" -x"Reply-To:" -x"Errors-To:"\
    | egrep -i "scam@cyberspam\.com"
/dev/null

  * The "?" at the start of the condition executes and evaluates what is on the condition line
    instead of searching for a literal match.
   
  * Use formail to extract from the headers.
   
  * The "-x" switch means extract the contents of a header-field from the header. Formail is
    convenient (also) because it can concatenate the potential continuation lines in a
    header-field.
   
  * Pipe the results to "egrep" regular expression search. The "-i" switch tells egrep to ignore
    the lower/uppercase status of the target string.
   
  * Incidentally: Since we discard the email message to "/dev/null", file locking ":0:" must not
    be used.
   
We can utilize a predefined expression to match the header-fields. The clever "FROM" expression
below comes from Jari Aalto's procmail material.

FROM="^(From[ ]|(Old-|X-)?(Resent-)?(From|Reply-To|Sender):)(.*\<)?"
#(whatever else in between)
:0
* $ ${FROM}scam@cyberspam\.com
/dev/null

  * The first "$" on the condition line tells that the environment variable(s) on the line are to
    be expanded, instead of taking all the text on the condition line literally.
   
You may go even further in your detective work and include the information from the header's
"Received:" lines. That is, you also can detect if something that you wish to avoid is along the
route where the email came from.

:0
* ? formail -x"Received:"\
    | egrep -i "cyberspam\.com"
/dev/null

Spam email is sometimes indicated by a missing or an empty "From:" line in the header.
Furthermore, the "From:" line might contain an empty <> instead of having a proper address within
the <>. Using scoring we might have something like

:0:
* 1^0 ^From:([ ]$|$)
* 1^0 ! ^From:
#A catch: Don't use here the word-boundary operators \< \>
#Use just the plain <>
* 1^0 ^From:.*<>
NoFrom.mail

Under a worst-case scenario, the various sender headers might all be empty. To test for this
unlikely eventuality we can utilize the fact that formail would put a "foo@bar" into the "FROM_"
under such circumstances.

# Define getting the sender's address
# Discard any leading and trailing whitespaces
FROM_=`formail -rt -xTo: \
         | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

# Test if the sender could not be identified at all
:0:
* FROM_ ?? foo@bar
NoSender.mail

As always, there are several alternatives to solving a problem. Consider a potential case where a
spammer poses as the mailer-daemon but the "From:" header is either missing or total gibberish.
How to detect this situation? The second condition in the recipe below tests whether the "From:"
line is missing from the header or lacks even elementary validity.

:0:
* ^From[  ]*MAILER-DAEMON
* ! ? formail -x"From:" | egrep -is "[a-z]"
ProbableSpam.mail

  * The first condition checks the first "From " line (the one without a colon) in the header.
   
  * The [] contains a space and a tab.
   
  * In the second condition the "!" is the familiar operator indicating a negation.
   
  * The "?" tells to execute and evaluate what is on the condition line instead of searching for a
    literal match.
   
  * formail's -x"From:" extracts the From: header contents (without the field name).
   
  * Unix egrep is used to test whether the "From:" field exists and contains at least one ordinary
    letter, upper or lower case ("i"), working silently ("s").
   
--------------------------------------------------------------------------------------------------
[tsspe] How can I extract a valid address from the Reply-To field, and that field only?
 
One trick is to utilize the following variable definition letting formail do the worrying about
the proper address format.

REPLYTO_=`egrep "^Reply-To:" | head -1 \
         | formail -c -rt -xTo: \
         | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

  * Assume that indeed you strictly want the address from a "Reply-To:" header. No address in any
    other header will do. Use egrep to extract the "Reply-To:" header-field from the incoming
    email.
   
  * head -1 ensures that only the first occurrence of a "Reply-To:" in the message counts.
   
  * formail -c -rt -xTo: is a standard, special trick to form a return address. The key is the -r
    switch which "generates an autoreply header". The "-c" switch concatenates any continued
    fields in the header.
   
  * If no "Reply-To:" header is found in the email message, foo@bar will be returned as the
    address.
   
  * The last line removes any leading and trailing tabs and blanks from the address.
   
If you put the REPLYTO_ definition high up in your ~/.procmailrc you will have the variable
available to the rest of your recipes.
 
# Feedback: Let me suggest this:

    REPLYTO_=`formail -cXReply-To: | head -1 | formail -rtzxTo:`

  * "formail -cX" rather than "egrep" in case the header has a different capitalization -- or if
    the real address is on a continuation line.
     
  * formail "-z" flag to avoid "expand" and "sed".

Timo's further comments:

  * The "-c" switch concatenates continuation lines.
   
  * The "-X" switch extracts the header-field, preserving the field name.
   
  * The "-rt", "-x" and "To:" trick prepare a return address.
   
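  * The "-z" switch ensures that a whitespace exists between field name and content. It also
    trims leading and trailing whitespace from fields extracted with "-x", which is why "expand"
    and "sed" are no longer needed.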
   
  * If the Reply-To: header-field is empty or missing, the value of the REPLYTO_ variable will be
    foo@bar

--------------------------------------------------------------------------------------------------
# How can I extract the address of the sender's postmaster?
 
Put these definitions high up in your ~/.procmailrc :

#Get the sender's address, the generic version
FROM_=`formail -rt -xTo: \
         | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

#Get the sender's host
FHOST_=`echo "${FROM_}" | awk -F@ '{ print $2 }'`

#Build the postmaster's address
FMAST_="postmaster@${FHOST_}"

Thus, you have the postmaster's alleged address available as ${FMAST_} from this point on in your
recipes file. Note, however, that all validity testing of the address is missing.
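
The awk step is easy to try on its own. A small sketch with a made-up function name,
postmaster_of. As in the definitions above, nothing checks that the resulting address exists.

```shell
#!/bin/sh
# Hypothetical helper: derive the alleged postmaster address from a bare
# sender address. No validity checking is done.
postmaster_of() {
  # Everything after the "@" is taken to be the sender's host
  host=$(printf '%s\n' "$1" | awk -F@ '{ print $2 }')
  printf 'postmaster@%s\n' "$host"
}

postmaster_of 'scam@cyberspam.com'   # prints postmaster@cyberspam.com
```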

What happens in the FROM_ formula:

  * At a quick glance it may appear that the "From:" header and the "To:" header have been
    confused in the formula, but this is not the case. The formail program is asked ("-r") to
    prepare a reply header to send email back to the sender. Then that return address is
    extracted. That is why we have a "-xTo:" since we want to extract where the reply would be
    sent. That is where we assume that the email came from.
   
  * In the pipe "expand" is used to replace potential tabs with spaces, and "sed" is used to omit
    any leading and trailing white spaces.
   
Formail uses a certain priority order in preparing the reply header. If there is a "Reply-To:"
field in the header, the "FROM_" variable will contain that address. In some cases one may wish to
ignore that field for example to prevent malicious relaying. Here is how:

#Get the sender's address, ignore Reply-To:
FROM_=`formail -I"Reply-To:" -rt -xTo: \
         | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

--------------------------------------------------------------------------------------------------
[tsau] How can I weed out an inordinately long recipient list? I am one of the recipients of a
very useful professional mailing list, but it lists in its "To: " header-field all the recipients
to the list. Furthermore, it repeats the messages in HTML format. The text format is sufficient
for me.
 
The (only slightly modified) example below is based on a true situation from my own ~/.procmailrc.

#Ensure a whitespace exists between field name and content
#Rename any Content-Length field to "Old-Content-Length"
:0 fwh
| formail -z -i"Content-Length:"

#(whatever else in between)

:0
* ^From:.*the-mailing-list-maintainer
* ^TO_the@first\.recipient\.edu
{
  :0 fw
  | formail -I"To:" -I"X-" -I"Content-Type:" -I"MIME-Version:" \
    -A "To: Maintainer's long recipient list suppressed" \
  | sed -e '/^This is a multi-part /,/^Content-Transfer-Encoding: /d' \
        -e '/------=_NextPart_/,$d'

  :0:
  ${DEFAULT}
}

  * There are two condition lines.
      + Match if it is from the mailing list maintainer.
      + Match if it is for the full mailing list and not only to me personally from the
        maintainer.
       
  * Feed ("f") the email message to a pipe of several lines. Tell procmail to wait ("w") for the
    pipe to finish.
      + Let formail weed out superfluous fields.
      + Append a very brief "To:"-field for your information.
      + Let sed take out any special format information.
      + Let sed weed out everything from the start of the HTML part to the end of the message.
       
  * This example shows the principles, but it is based on the established format of the postings
    on the particular mailing list. Therefore it is not applicable as such, but you'll have to
    customize and test it for your own situation. (See the items on test methods on this page.)
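
The two sed expressions can be tested on a saved sample before trusting them with real mail. The
sketch below assumes a made-up posting in the layout this particular list uses, and
strip_html_part is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper mirroring the two sed expressions of the recipe:
# drop the MIME preamble of the text part, then everything from the next
# part marker to the end of the message.
strip_html_part() {
  sed -e '/^This is a multi-part /,/^Content-Transfer-Encoding: /d' \
      -e '/------=_NextPart_/,$d'
}

# An invented message body, for illustration only
strip_html_part <<'EOF'
This is a multi-part message in MIME format.

------=_NextPart_000_0001
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

The plain text of the posting.

------=_NextPart_000_0001
Content-Type: text/html

<html>The same posting again</html>
EOF
```

Only the plain text of the posting, with the blank lines around it, should survive.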

--------------------------------------------------------------------------------------------------
[tsc] What is this procmail scoring? How can I utilize it?
 
This is a somewhat complicated subject with material dispersed throughout the various procmail
FAQs. Basically scoring is a method to count how many of the conditions are fulfilled in a recipe
and if the "score" is positive, that is the score is 1 or more, the action line in the recipe will
be performed. There is much, much more to scoring, but this is a good starting point.

Consider the following simple spam foiling recipe. It will put the email into the
ProbableSpam.mail file if the score adds up to at least one. If the first condition is met, 1
is added to the score. Ditto for the second condition. Thus if either of the tell-tale spam
signals occur, the score will be positive (that is greater than zero) and the action (storing the
email message into the ProbableSpam.mail file) will be enacted.

:0:
* 1^0 ^Subject:.*make money fast
* 1^0 ^Subject:.*\$\$\$
ProbableSpam.mail

The example above uses equally-weighted scoring. One can also have unequal scores. Below, a hit of
the second condition gives two points while a hit of the first only gives one.

* 1^0 ^Subject:.*make money fast
* 2^0 ^Subject:.*\$\$\$

Scoring can be used to build some extremely trivial artificial intelligence into the recipes.
Consider the following

:0:
* -1^0
*  1^0 ^Subject:.*money
*  1^0 ^Subject:.*fast
*  1^0 ^Subject:.*\$\$\$
ProbableSpam.mail

  * The initial score is set at -1. Thus at least two of the subsequent conditions have to be met
    in order for the entire recipe to match. If none or only one of the conditions is met, the
    score will not rise above zero.
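
The arithmetic can be mimicked in the shell to see which subjects would pass. This is only a rough
model of the simple "1^0" weights used above, not of procmail scoring in general, and
subject_score is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical model of the recipe: start at -1, add 1 for each condition
# that matches, and treat a final score above zero as a hit.
subject_score() {
  score=-1
  for pattern in 'money' 'fast' '\$\$\$'; do
    if printf '%s\n' "$1" | grep -Eiq "^Subject:.*$pattern"; then
      score=$((score + 1))
    fi
  done
  echo "$score"
}

subject_score 'Subject: Make money fast $$$'   # all three match: prints 2
subject_score 'Subject: Fast cars'             # one match: prints 0
```

A subject needs at least two hits to lift the score above zero, which is the point of the initial
-1.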

An alternative formulation of scoring to foil spam is given below. This time it is required that
at least three of the score-condition lines match. (The [] contain a space and a tab, as usual.)

:0:
* ^Subject:[ ]*\/[^ ].*
* -2^0
* 1^0 MATCH ?? ()\<easy\>
* 1^0 MATCH ?? ()\<fast\>
* 1^0 MATCH ?? ()\<(cash|money)\>
* 1^0 MATCH ?? \$\$\$
ProbableSpam.mail

  * The procmail "\/" operator is used to extract the subject of the email into the reserved MATCH
    variable.
   
  * Variable testing with "??" is used.
   
  * Word matching is used applying the word boundaries "\<" and "\>". Thus "fast" would be matched,
    but not "faster".
   
  * If both the words "cash" and "money" appear on the subject line no more than one score point
    will be awarded.

Further, simple examples

#Catch potential spam by examining the email route
:0:
* 1^0 ? formail -x"Received:" | egrep -i "157\.161\.140\.2"
* 1^0 ? formail -x"Received:" | egrep -i "199\.217\.231\.46"
* 1^0 ? formail -x"Received:" | egrep -i "212\.106\.213\.36"
* 1^0 ? formail -x"Received:" | egrep -i "216\.154\.1\.82"
ProbableSpam.mail

  * As usual, the "?" executes and evaluates what is on the rest of the condition line instead of
    searching for a literal match. Note the syntax order.
   
  * Incidentally, there is a subtle catch in using the IP numbers. Assume that you wish to detect
    the nodes from 216.154.1.74 through to 216.154.1.86. This rule won't work quite right:
    "216\.154\.1\.[74-86]". Why? The "[74-86]" is a character class that matches one single
    character in the range 4-8. (The 7 and 6 would be superfluous since they already are within
    the 4-8 range.) The rule would find matches outside the intended range. E.g. the address
    "216.154.1.72" would be matched. Instead, applying both "216\.154\.1\.7[4-9]" and
    "216\.154\.1\.8[0-6]" would match correctly.
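
The catch is easy to demonstrate with "grep -E". In the sketch below the function names are
invented, and the trailing "([^0-9]|$)" guard in the corrected pattern is an assumption added here
to stop e.g. "216.154.1.745" from matching.

```shell
#!/bin/sh
# Hypothetical helpers comparing the flawed and the corrected pattern.
flawed() {
  printf '%s\n' "$1" | grep -Eq '216\.154\.1\.[74-86]'
}
correct() {
  printf '%s\n' "$1" | grep -Eq '216\.154\.1\.(7[4-9]|8[0-6])([^0-9]|$)'
}

# Two addresses outside the intended range and two inside it
for ip in 216.154.1.72 216.154.1.74 216.154.1.86 216.154.1.87; do
  if flawed "$ip"; then f=match; else f=none; fi
  if correct "$ip"; then c=match; else c=none; fi
  echo "$ip flawed=$f correct=$c"
done
```

The flawed pattern reports a match for all four addresses, since a single character from 4 to 8
after the last dot is enough for it.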

This 'precision' recipe checks in the message header both the "From:" field and the "Received:"
path of a forgery spam.

#Avoid a specific forgery spam
:0:
* -1^0
*  1^0 ^From:.*mikerobbins2000@hotmail\.com
*  1^0 ? formail -x"Received:" | egrep -is "psi\.net"
Spam.mail

Scoring and ordinary conditions can be mixed in the rules. For example the two recipes below
achieve roughly the same thing, but the latter option takes fewer steps if the email is for you.

:0:
* -1^0
*  1^0 ? formail -c -x"Received:" | fgrep -is 'alladvantage.com'
*  1^0 ? formail -c -x"Received:" | fgrep -is 'ameritech.net'
*  1^0 ? formail -c -x"Received:" | fgrep -is 'bellatlantic.net'
*  1^0 ! ^TO_(myid)@([-a-z0-9_]+\.)*myhost\.mydom
ProbableSpam.mail

:0:
* ! ^TO_(myid)@([-a-z0-9_]+\.)*myhost\.mydom
* 1^0 ? formail -c -x"Received:" | fgrep -is 'alladvantage.com'
* 1^0 ? formail -c -x"Received:" | fgrep -is 'ameritech.net'
* 1^0 ? formail -c -x"Received:" | fgrep -is 'bellatlantic.net'
ProbableSpam.mail

The formail switches in the above are

  * -c Concatenate continued fields in the header.
  * -x Get the contents of the said header-field. Do not include the field name.

The fgrep (search a file for a fixed-character string) switches in the above are

  * -i Ignore upper/lower case distinction during comparisons.
  * -s Silent (only produce error messages) in order to check the return status without any
    output.

The above example could also be written more efficiently without scoring as

:0:
* ! ^TO_(myid)@([-a-z0-9_]+\.)*myhost\.mydom
* ^Received:.*(\
alladvantage\.com|\
ameritech\.net|\
bellatlantic\.net)
ProbableSpam.mail

--------------------------------------------------------------------------------------------------
[tspo] How can I test if the subject is empty or if the subject field is missing altogether?
 
Scoring seems to be the answer:

:0:
* 1^0 ^Subject:([  ]$|$)
* 1^0 !^Subject:
NoSubject.mail

As usual, the brackets [] contain a space and a tab.

There are other options to test for an empty "Subject:" or an entirely missing "Subject:" field.
The one below puts the subject contents in a variable. The actual recipe then tests if the value
of the "SUBJ_" variable is empty. (Also see the feedback about the syntax.)

#Get the subject discarding any leading and trailing blanks
SUBJ_=`formail -xSubject: \
       | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

#Test for an empty or missing subject
:0:
* SUBJ_ ?? ^^^^
NoSubject.mail

  * ^^^^ denotes empty contents. The trick is adopted from procmail material of some other authors
    where the ^^ anchor is better explained than what I can do. Also see procmarc.man for it.
   
  * Likewise, see procmarc.man for the ?? definition.
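
Where formail is not at hand, the emptiness test can be approximated with sed for experimenting. A
rough sketch only: subject_is_empty is a hypothetical helper, and unlike formail it ignores
continuation lines and expects the field name to be spelled exactly "Subject:".

```shell
#!/bin/sh
# Hypothetical stand-in for the SUBJ_ test: read a message on standard
# input and succeed if the header has no subject or only a blank one.
subject_is_empty() {
  # Stop at the first empty line (the end of the header), print the
  # contents of any Subject: field, then trim trailing whitespace.
  subj=$(sed -n -e '/^$/q' -e 's/^Subject:[[:space:]]*//p' |
         sed -e 's/[[:space:]]*$//')
  [ -z "$subj" ]
}

# An invented message with a blank subject
if printf 'From: a@b\nSubject:   \n\nbody\n' | subject_is_empty; then
  echo "empty or missing subject"
fi
```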

--------------------------------------------------------------------------------------------------
[ts] How can I modify the "To:" field of the email I received?
 
I am not exactly sure why you wish to do this, but here is how to replace the "To:" header-field
of a message using formail. Choose the formail "-i" option to rename the old "To:" field to be
"old-To:" and to insert the new "To:" header-field. The flags in the recipe are as follows: "f"
use the pipe as a filter, "h" it is about the header of the email message, "w" execute before
proceeding down the rest of the "~/.procmailrc".

:0 fhw
* ^To:.*myoldid@myoldhost\.myolddom
| formail -i "To: mynewid@mynewhost.mynewdom"

--------------------------------------------------------------------------------------------------
# I have a long list of spammers and other Internet lowlife in a separate file. How can I utilize
it?
 
The technique is fairly simple. Put this in your "~/.procmailrc" file:

MAILDIR=/home/myid/Mail   #The location of your own mail directory
# Whatever other preliminaries

# Whatever other recipes

# Test if the email's sender is in the blacklist
:0
* ? formail -x"From" -x"From:" -x"Sender:" \
    -x"Reply-To:" -x"Return-Path:" -x"To:" \
    | egrep -is -f black.lst
/dev/null

  * All the common email sender headers are covered.
   
  * Also the "To:" field is covered in the recipe, since spammers often name their mailing lists
    as phony addresses.
   
  * Continuation lines ("\") are utilized. Incidentally, ensure that there are no trailing
    whitespaces after the "\" on a line.
   
  * The "-i" option in egrep tells to ignore upper/lower case distinction. The "-s" is for
    silence. The "-f file" option tells to take the list of the regular expressions from file.
   
Prepare a "/home/myid/Mail/black.lst" file with contents something like:

abc23@airnewz.ccn
abdu@advis.com.tr
adexec@mail.com
dinner@dine.com
friend@public.com
helpingyou@mail.com
mk1977@ms1.kingnet.com.tw
nb8MAMxhq@mail.com
no@body.com
owieuj@peterlink.ru
patkline00@usa.net
promotions@web-vertise.com
unknown@unknown.com

  * The black.lst file should reside in your "${MAILDIR}" mail directory (unless you explicitly
    include the path in your "~/.procmailrc").
   
  * The problem with such lists is that most of the spam related addresses are very transient by
    nature. I do not think such lists alone are a very effective method, as I have explained in my
    Foiling Spam with an Email Password System measures medley.
   
  * For exact matching you might wish to use e.g. "no@body\.com" instead of "no@body.com".
    Alternatively, one could use fgrep (fixed grep) or grep -F
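
The difference between the regular expression reading and the fixed string reading of the list is
easy to see in the shell. A minimal sketch with an invented two-line list and invented helper
names, assuming mktemp is available:

```shell
#!/bin/sh
# Hypothetical demonstration: the same list read as regular expressions
# (grep -E) and as fixed strings (grep -F).
list=$(mktemp) || exit 1
trap 'rm -f "$list"' EXIT
printf '%s\n' 'no@body.com' 'unknown@unknown.com' > "$list"

listed_regex() { printf '%s\n' "$1" | grep -Eiq -f "$list"; }
listed_fixed() { printf '%s\n' "$1" | grep -Fiq -f "$list"; }

# The second address is caught only because "." matches any character
for addr in 'no@body.com' 'no@bodyxcom.example' 'friend@example.org'; do
  if listed_regex "$addr"; then r=hit; else r=miss; fi
  if listed_fixed "$addr"; then f=hit; else f=miss; fi
  echo "$addr regex=$r fixed=$f"
done
```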
   
--------------------------------------------------------------------------------------------------
[tsshu] How do I forward certain messages that I get, and preserve myself a copy?
 
Below is an example:

#Get the sender's bare email address from the first "From" line
FROM_=`formail -c -x"From " \
         | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g' \
         | awk '{ print $1 }'`

#Get the original subject of the email
#Discard superfluous tabs and spaces
#On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -c -xSubject: \
         | expand \
         | sed -e 's/  */ /g' \
         | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

#Whatever other recipes you'll use

:0
* ^From:.*infolist@([-a-z0-9_]+\.)*infohost\.infodom
# Avoid email loops
* ! ^X-Loop: myid@myhost\.mydom
{
  :0c:   #Preserve a copy of the email
  Infolist.mail
  :0fwh  #Adjust some headers before forwarding
  | formail -A"X-Loop: myid@myhost.mydom" \
            -A"X-From-Origin: ${FROM_}" \
            -i"Subject: $SUBJ_ (fwd)"
  # Forward the email
  :0
  !mydept@myhost.mydom
}

--------------------------------------------------------------------------------------------------
# How do I forward certain messages to two different addresses?
 
I have the following recipe in my ~/.procmailrc file, but the email does not get forwarded to the
myid2@myhost.mydom address.
 
  :0 c
  *^From.*info.gov
    ! friend@somehost.domain myid2@myhost.mydom
 
I am not sure what is wrong with that, but at least the solution below should work:

:0
* ^From.*info\.gov
* ! ^X-Loop: myid@myhost\.mydom
{
  :0fwh
  | formail -A"X-Loop: myid@myhost.mydom"
  :0c
  ! friend@somehost.domain
  :0
  ! myid2@myhost.mydom
}
 
The X-Loop is not relevant from the point of view of the stated problem, but using it as a
safeguard is always advisable.

# Feedback: The reason that the first one does not work is that the recipients' addresses are
separated by space while they should be separated by a comma [as in]

  :0
  ! friend@somehost.domain,myid2@myhost.mydom
 
(I have not tested this one.)
--------------------------------------------------------------------------------------------------
[ts] How do I automatically return certain email messages?
 
Ah! Another potential case of spam avoidance? (This is a companion page to Foiling Spam with an
Email Password System, remember.) Below is an example. But be sensible in using the method, since
most spam has forged senders.

#Define getting the sender's address
#Discard any leading and trailing whitespaces
FROM_=`formail -rt -xTo: \
  | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
 
#Whatever other recipes in between.
 
#Return certain email
:0
#
# Is the email from a frequent spam domain?
# (Note: fgrep takes no regular expressions)
* ? formail -c -x"Received:" | fgrep -is 'cyberspam.com'
#
# Is it for a mailing list rather than to me?
* ! ^TO_(myid|myFirst\.mySecond)@([-a-z0-9_]+\.)*myhost\.mydom
#
# Avoid forgeries that pretend to be from my own site
* ! $ ? echo ${FROM_} | fgrep -is 'myhost.mydom'
* $ ? echo ${FROM_} | fgrep -is '.'
* $ ? echo ${FROM_} | fgrep -is '@'
#
# Avoid email loops
* ! ^X-Loop: myid@myhost\.mydom
{
  # Make a temporary file of the message to be returned
  :0c:formail.lock
  # Discard whitespaces, insert a leading blank
  | expand | sed -e 's/[ ]*$//g' | sed -e 's/^/ /' > return.tmp
  # Prepare and send the rejection
  # Be sure to customize your sendmail path
  :0:formail.lock
  | (formail -r -I"Subject: Rejected mail: Recipient refusal" \
    -A"X-Loop: myid@myhost.mydom" ; \
    echo "--- begin rejected mail ---" ; \
    cat return.tmp ; \
    echo "--- end rejected mail ---" ; \
    rm -f return.tmp) \
    | /usr/lib/sendmail -t
}

  * The spamfoiling page has a further example.
   
  * The "-r" option tells formail to generate an auto-reply header.
   
There can be many variants of detecting and returning email which one does not wish to get. Below
is a fictitious example utilizing variables to enhance the flexibility of the return address
handling. (If you are baffled by the "RULE" variable, which is just a sideline here, see the item
on identifying executed recipes.)

:0
* ^From:.*(charpie|charpie5266)@mydeja\.com
{ REJECT="charpie5266@mydeja.com" }
:0
* ^From:.*umidextr@([-a-z0-9_]+\.)*mindfall\.com
{ REJECT="umidextr@mindfall.com" }
:0
* ^From:.*(rasch|Greg.*\.Rasch)@([-a-z0-9_]+\.)*millkirn\.com
{ REJECT="rasch@millkirn.com" }
:0
* ^From:.*(daren|Daren[_\.]Risenthal)@([-a-z0-9_]+\.)*slunet\.org
{ REJECT="daren@slunet.org" }
 
:0
* ! REJECT ?? ^^^^
{
  :0
  { RULE="These users I do not want to talk with" }
  :0cw
  | expand | sed -e 's/[ ]*$//g' | sed -e 's/^/ /' > return.tmp
  :0:procmail.lock
  | (formail -r -I"To: ${REJECT}" \
    -I"Subject: Rejected mail: Recipient refusal" \
    -A"X-Loop: myid@myhost.mydom" ; \
    echo "--- begin rejected mail ---" ; \
    cat return.tmp ; \
    echo "--- end rejected mail ---" ; \
    rm -f return.tmp) \
    | /usr/lib/sendmail -t
}
 
Note how the above set of rules has two parts, the actual detection plus the return address
definition, and the return action. The latter could be written in many alternative ways, including

:0
* ! REJECT ?? ^^^^
{
  :0cw
  | expand | sed -e 's/[ ]*$//g' | sed -e 's/^/ /' > return.tmp
  :0 fwh
  | formail -r \
    -A"Subject: Rejected mail: Recipient refusal" \
    -A"From: myid@myhost.mydom" \
    -A"X-Loop: myid@myhost.mydom" ; \
    echo "--- begin rejected mail ---" ; \
    cat return.tmp ; \
    echo "--- end rejected mail ---" ; \
    rm -f return.tmp
  :0
  ! ${REJECT}
}
 
--------------------------------------------------------------------------------------------------
[tst] My address has changed. How do I forward a copy to myself and tell the sender?
 
This is a theme whose constituents already are covered throughout this material. But also take a
look at "man procmailex" for the "vacation database" idea even if a better name here would be
something like "dejatold database".

#Note: On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -c -xSubject: \
       | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

:0
# Was it to me
* ^TO_myoldid@myoldhost\.myolddom
# Ignore messages for daemons
* ! ^FROM_DAEMON
# Avoid email loops
* ! ^X-Loop: myid@myhost\.mydom
{
  :0 c
  ! myid@myhost.mydom
  :0:dejatold.lock
  | formail -rD 8192 dejatold.cache
  :0 eh
  | (formail -r \
     -A"X-Loop: myid@myhost.mydom" \
     -I"Subject: Changed email address" ; \
     echo "Dear Sender," ; \
     echo "" ; \
     echo "Thank you for your email about" ; \
     echo "\"${SUBJ_}\"" ; \
     echo "" ; \
     echo "My email address has changed." ; \
     echo "Old: myoldid@myoldhost.myolddom" ; \
     echo "New: myid@myhost.mydom" ; \
     echo "Your email has been forwarded to my new address." ) \
     | /usr/lib/sendmail -oi -t
}

Some explanations:

  * The "-r" switch prepares a reply header for sending email back to the sender.
   
  * The "-D maxlen idcache" switch in "-rD" controls the message identification cache. For more
    see "man formail"
   
  * The "c" flag in ":0 c" tells procmail that processing should continue after this particular
    recipe has been acted upon.
   
  * The "e" flag in ":0 eh" decrees that recipe only executes if the immediately preceding recipe
    failed
   
  * The "h" flag in ":0 eh" tells procmail to feed the header to the pipe. But since that is part
    of the default behavior, it is not compulsory.
   
Naturally, the recipe does not stand alone in the ~/.procmailrc but is a part of it. Thus you
would e.g. have previous recipes that take care of the email that is not to you, and email that
was for mailer daemons.
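
The "tell each sender only once" logic can be tried out in isolation. Below is a minimal Bourne
shell sketch of the idea, not of formail itself: it assumes a plain text file with one address per
line instead of formail's own -D cache, and the name already_told is made up for this illustration.
Like "formail -rD" it returns success when the sender has been seen before, and otherwise records
the sender and returns failure, which is what lets the ":0 eh" recipe fire for new senders only.

```shell
#!/bin/sh
# already_told ADDRESS CACHEFILE   (hypothetical helper, not part of procmail)
# Exit status 0: ADDRESS is already in CACHEFILE (the sender has been told).
# Exit status 1: ADDRESS was not there; it is appended so the next call returns 0.
already_told() {
  if [ -f "$2" ] && grep -Fqx -- "$1" "$2"; then
    return 0
  fi
  printf '%s\n' "$1" >> "$2"
  return 1
}
```

Unlike the formail cache, this file is not capped at a maximum length, so it grows without limit
unless you trim it yourself.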
--------------------------------------------------------------------------------------------------
[tsf] How can I set variable values based on the text in the body of the email message?
 
Let's start with another, much simpler question:

From: ts(a)UWasa.Fi (Timo Salmi)
Newsgroups: comp.mail.misc
Subject: Re: Procmail: How do I filter by the body
Date: Sun Apr 23 09:34:38 EET DST 2000
X-Comment: Slightly modified

I am trying to save all the messages that come to me with "mypassword" in the body to a folder
called password. How do I do that?
 
As the manuals state:

    Flags can be any of the following:
    B   Egrep the body.

Hence, all there is to it is

:0 B:
* mypassword
password

If you want your password case sensitive then use ":0 BD:".

    All the best, Timo

..

From: ts(a)UWasa.Fi (Timo Salmi)
Newsgroups: comp.mail.misc
Subject: Re: Question of procmail newbie
Date: Tue Nov 23 23:09:41 EET 1999
X-Comment: Slightly modified

How could I solve the following problem with procmail: I receive e-mails with a body like this:

    Category: aaa
    Subcategory: bbb
    File: ccc

I need to store this mail to the folder aaa/bbb/ccc, so procmail should create directories
aaa/bbb. What kind of .procmailrc should I write?
 
The trick is to extract the appropriate text from the body of the email message and to set 
procmail variable values on the basis of the results. This is how it can be done.

#Preliminaries
SHELL=/usr/bin/sh #Use the Bourne shell (check your path!)
 
CATE=`cat | egrep "^Category:" | awk '{ print $2 }'`
SCAT=`cat | egrep "^Subcategory:" | awk '{ print $2 }'`
FILE=`cat | egrep "^File:" | awk '{ print $2 }'`
 
#Whatever other recipes
 
:0B:Procmail.lock
* ^Category:[ ].+[a-z0-9]
* ^Subcategory:[ ].+[a-z0-9]
* ^File:[ ].+[a-z0-9]
| mkdir ${CATE} ; mkdir ${CATE}/${SCAT} ;\
  cat >> ${CATE}/${SCAT}/${FILE}
 
#Whatever other recipes
 
As a validity check the condition lines require that all the key-lines are present in the email
message body and that the lines contain names.

    All the best, Timo

# Feedback: It would be much more efficient to rewrite these definitions using awk's pattern
matching, such as:
 
CATE=`cat | awk '/^Category:/ { print $2 }'`
etc
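
To try the extraction outside procmail, here is a small Bourne shell sketch along the lines of the
feedback. It is an illustration under my own assumptions: the helper names body_field and safe_name
are made up, body_field looks at the body only (the definitions above read the whole message), and
safe_name is an extra precaution because the extracted words end up in a file path.

```shell
#!/bin/sh
# body_field NAME
# Reads a whole message on stdin and prints the second word of the first
# body line whose first word is "NAME:". Header lines are skipped.
body_field() {
  awk -v key="$1:" '
    in_body && $1 == key { print $2; exit }
    /^$/ { in_body = 1 }
  '
}

# safe_name STRING   (hypothetical check, an assumption of this sketch)
# Succeeds only for non-empty names made of letters, digits, ".", "_" and "-"
# that do not start with a dot, so "../x" or "a/b" cannot leave the folder tree.
safe_name() {
  case "$1" in
    '' | .* | *[!A-Za-z0-9._-]*) return 1 ;;
  esac
  return 0
}
```

In a recipe one would then set e.g. CATE=`body_field Category` (with the function available to the
shell) and only create the directories if safe_name accepts all three values.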
 
Apropos awk. On the Usenet there are dedicated awk newsgroups comp.lang.awk and alt.lang.awk.
Furthermore, although used in quite another connection than procmail, there are several awk
(actually GnuAWK) usage examples in my MS-DOS batch programming tricks collection.

                                  -----------------------------                                   

Next, let's consider a trickier task. Find from the body of the text the last line that
potentially contains the string "mailto:". Insert the contents of that line into a MAILTO_
variable.

:0
* ^Subject:.*Whatever
{
  :0
  {
  MAILTO_=`sed -e '1,/^$/ d' \
           | egrep "mailto:" \
           | tail -1 \
           | expand \
           | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g' \
           | sed -e 's/[^o]://g' -e 's/^://g' \
           | awk -F: '{ print $2 }' | awk '{ print $1 }'`
  }
  :0:
  WhichEverFolderYouWant
}

Consider the MAILTO_ construct. (The rest of the recipe should be self-explanatory.)

  * The sed -e '1,/^$/ d' extracts the body of the email message (i.e. the headers are ignored).
   
  * The egrep "mailto:" finds all the lines containing mailto:.
   
  * If there are several mailto: lines the tail -1 gets the last of them.
   
  * The expand expands any TAB characters to SPACE characters.
   
  * The sed -e 's/^[ ]*//g' -e 's/[ ]*$//g' omits any leading and trailing blanks.
   
  * The sed -e 's/[^o]://g' -e 's/^://g' weeds out from the same line the possible preceding
    colons (:) which might cause confusion. It is not perfect, though.
   
  * The awk -F: '{ print $2 }' gets the rest (until the end of line or the next colon) after the
    colon (:), i.e. the email address from the mailto: line and what may come after it. The
    awk '{ print $1 }' discards the potential rest of the line starting with the first blank after
    the address. What should thus be left is the email address in the mailto: field.
   
Should you wish to get the entire line with the "mailto:" into the MAILTO_ variable instead of
just the email address there, simply leave out the last two lines from the MAILTO_ definition.
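
For experimenting outside procmail, the pipeline can be wrapped in a shell function. The sketch
below is a variation, not the definition above: it replaces the colon weeding and the two awk
stages with a single sed substitution, and it assumes that the address ends at the first
whitespace, ">", double quote or "?" character. The function name last_mailto is made up.

```shell
#!/bin/sh
# last_mailto: read a message on stdin and print the address from the last
# body line that contains "mailto:". Prints nothing if there is no such line.
last_mailto() {
  sed -e '1,/^$/d' \
    | grep 'mailto:' \
    | tail -n 1 \
    | sed -e 's/.*mailto:\([^[:space:]>"?]*\).*/\1/'
}
```

Feed it a saved test message, e.g. "last_mailto < test.msg", before moving the pipeline into the
backquotes of a MAILTO_ definition.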
--------------------------------------------------------------------------------------------------
[tsc] How can I insert some token text in front of the body of incoming email?
 
I have a really simple procmail question. All I want to do is add a line
"======= Forwarded Mail =========="
to the top of the body of all incoming messages, and forward them to another account.
 
Let's start by considering the first part of the question only. This is how it is done. The solution
owes heavily to Philip Guenther.

:0
{
  :0 fhw
  | cat - ; \
  echo "===== Filtered email ====="
  :0:
  ${DEFAULT}
}

So far so good. Next let's add the forwarding so that the token will only appear in the forwarded
message. (If you wish to change that, adjust the order of the rules.)

:0
{
  :0c:
  ${DEFAULT}
  :0 fhw
  | cat - ; \
  echo "======= Forwarded Mail =========="
  :0
  !forward@myhost.mydom
}

Finally, let's add avoiding email loops.

# Discard loops
:0
* ^X-Loop: myid@myhost\.mydom
/dev/null

:0
{
  :0c:
  ${DEFAULT}
  :0 fhw
  | cat - ; \
  echo "======= Forwarded Mail =========="
  :0 fhw
  | formail -A"X-Loop: myid@myhost.mydom"
  :0
  !forward@myhost.mydom
}
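
The effect of the filter can be checked on a saved message without procmail. The following is a
stand-in sketch only: in the recipes it is the "h" flag that limits the pipe to the header, whereas
the made-up function add_banner below gets the whole message and finds the header/body boundary
itself by looking for the first empty line.

```shell
#!/bin/sh
# add_banner TEXT: copy a message from stdin to stdout and insert TEXT as a
# new line right after the first empty line, i.e. at the top of the body.
# Note: awk -v interprets backslash escapes in TEXT, so keep TEXT plain.
add_banner() {
  awk -v banner="$1" '
    { print }
    !seen && /^$/ { print banner; seen = 1 }
  '
}
```

For example: add_banner "======= Forwarded Mail ==========" < test.msg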

--------------------------------------------------------------------------------------------------
# Do you have any useful tips for regular expression matching?
 
..
This is a terribly complicated subject involving many features which I do not know. Let's
nevertheless look at some further example recipes.

# Matching a few undelivery and such reports
:0:
* ^Subject:.*Undeliver(ed|able) (e)?mail|\
  ^Subject:.*Returned (spam )?(e)?mail
* ^TO_(myid|firstname\.lastname)@([-a-z0-9_]+\.)*myhost\.mydom
Returned.mail

Consider the first rule of the recipe above. It will match all email with the following on the
"Subject:" line in the header:

  * Undelivered mail
  * Undeliverable mail
  * Undelivered email
  * Undeliverable email
  * Re: Undelivered mail
  * etc...

The continuation line will match

  * Returned mail
  * Returned email
  * Returned spam mail
  * Returned spam email
  * Re: Returned mail
  * etc...

What if you don't want to match "Re: Undelivered mail"? The following condition gives a more exact
match

* ^Subject:[  ]+Undeliver(ed|able) (e)?mail

In other words only spaces and/or tabs are allowed between "Subject:" and the start of the actual
subject.

..
Let's consider another example. Say that we have two hosts

  * cyber.com
  * cyber.com.au

How to catch email from the former, but not the latter:

:0:
* ^From:.*cyber\.com([^\.]|$)
ProbableSpam.mail

That is, do not allow a dot after the .com or alternatively require that the line ends there.
However, cyber.comet would be matched! Thus, depending on what you want to achieve, you might have
e.g.

:0:
* ^From:.*cyber\.com( |"|>|$)
ProbableSpam.mail
 
..
What is the difference between the rules below?

* ^From:.*myid@([-a-z0-9_]+\.)*myhost\.mydom
* ^From:.*myid@([-a-z0-9_]+\.)?myhost\.mydom
* ^From:.*myid@([-a-z0-9_]+\.)+myhost\.mydom

The first one matches any of

 1. myid@myhost.mydom
 2. myid@subhost1.myhost.mydom
 3. myid@mypc.subhost1.myhost.mydom

  * The first one does not match myid@.myhost.mydom (and neither should it!).
  * The second one matches 1 and 2, but not 3.
  * The third one matches 2 and 3, but not 1.
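
Since procmail uses egrep-like syntax, claims like the ones above can be checked quickly from the
command line. This is only a rough sketch: the helper name matches and the three variable names
are made up, the dots in the host name are written escaped, and grep knows nothing about procmail
extensions such as ^TO_ or ^^.

```shell
#!/bin/sh
# matches REGEXP LINE: succeed if LINE matches REGEXP.
# grep -E gives egrep-style syntax and -i mimics procmail's default of
# ignoring case.
matches() {
  printf '%s\n' "$2" | grep -Eiq -- "$1"
}

STAR='^From:.*myid@([-a-z0-9_]+\.)*myhost\.mydom'   # zero or more subdomains
OPT='^From:.*myid@([-a-z0-9_]+\.)?myhost\.mydom'    # at most one subdomain
PLUS='^From:.*myid@([-a-z0-9_]+\.)+myhost\.mydom'   # at least one subdomain
```

For example: matches "$OPT" "From: myid@mypc.subhost1.myhost.mydom" || echo "no match"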

..
To recount the purpose of the main special regexp symbols:

+------------------------------------------------------------+
|Symbol|                   Interpretation                    |
|------+-----------------------------------------------------|
|  *   |Match zero or more times                             |
|------+-----------------------------------------------------|
|  ?   |Match zero or one times                              |
|------+-----------------------------------------------------|
|  +   |Match one or more times                              |
|------+-----------------------------------------------------|
|  .   |Any character                                        |
|------+-----------------------------------------------------|
| [ ]  |Match from the list within the brackets              |
|------+-----------------------------------------------------|
|  ^   |The start of the line (within [] however, a negation)|
|------+-----------------------------------------------------|
|  $   |The end of the line                                  |
|------+-----------------------------------------------------|
|  \   |Quote the next character to take it literally        |
|------+-----------------------------------------------------|
| ( )  |Grouping                                             |
+------------------------------------------------------------+

--------------------------------------------------------------------------------------------------
[tsmon] How can I test if two procmail variables have the same contents?
 
Basically the syntax for variable value tests is

VAR1_=Whichever expression you devise
:0:
* VAR1_ ?? regexp
wherever

But you can build rules like

VAR1_=Whichever expression you devise
VAR2_=whatever
:0:
* $ VAR1_ ?? ${VAR2_}
wherever

Note, however, that the above still is regular expression matching, not an equality.

The blank after the first $ is significant. It tells procmail that the variable references on the
line (${VAR2_}) are to be expanded, not to be taken as literal text.

# Feedback: That's easily resolved using $\var expansion and anchoring both ends of the regexp:

        * VAR1_ ?? $ ^^$\VAR2_^^

That condition will succeed if and only if VAR1_ and VAR2_ have the same contents, with the
possible exception of VAR1_ having one more trailing newline than VAR2_.
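
The difference between a regular expression match and a true equality is easy to demonstrate in
the shell. The sketch below is an analogy with grep, not procmail's own matcher: the two function
names are made up, the values are assumed to be single lines, and grep is case sensitive here
whereas procmail ignores case unless the "D" flag is given.

```shell
#!/bin/sh
# regex_match VALUE PATTERN: PATTERN is an egrep-style regexp and may match
# anywhere in VALUE (comparable to "* $ VAR1_ ?? ${VAR2_}").
regex_match() {
  printf '%s\n' "$1" | grep -Eq -- "$2"
}

# literal_equal VALUE OTHER: succeeds only if the two single-line strings are
# identical (comparable in spirit to the anchored, quoted test in the feedback).
literal_equal() {
  printf '%s\n' "$1" | grep -Fxq -- "$2"
}
```

With VAR2_ set to "a.c", regex_match accepts "abc" and even "xxabcxx", while literal_equal accepts
only "a.c" itself.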
--------------------------------------------------------------------------------------------------
[ts] I am having difficulties with "<". How does one match it?
 
Date: 09 Dec 1999 23:06:41 -0600
From: Philip Guenther
Newsgroups: comp.mail.misc
Subject: Re: procmail, trivial html detection, and a quirk
 
ts(a)UWasa.Fi (Timo Salmi) wrote:
> I just noted that, at least in procmail v3.13.1 1999/04/05
>
> :0B:
> * </body>
> * </html>
>
> does not work. Instead one has to apply
>
> :0B:
> * [<]/body>
> * [<]/html>
 
Yep. A leading '<' or '>' on a condition causes procmail to interpret the condition as a size
test. If you want a normal regexp condition that starts by matching a literal '<' or '>' character
you have to protect the leading character from such interpretation. There are several ways of
doing so. The most efficient are to use parens or a backslash:

    * ()</body>
    or
    * (<)/body>
    or
    * (</body>)
    or
    * \</body>

That last one is generally avoided because it looks like you're using the \< regexp special when
you really aren't. Putting the '<' or '>' in brackets also works, as you did above, but it slows
down the matching ever so slightly as a character class is slower to match than a single normal
character. Thus, one of the above four methods is usually preferred.

Philip Guenther

(Timo's addendum: As far as I understand \< is a word-boundary in procmail. Hence \< is best
avoided, when not used as an actual boundary.)
--------------------------------------------------------------------------------------------------
[tsorg2] How can I insert identification text to the beginning of the subject line?
 
I know how to sort my incoming email with procmail into different folders, but how do I use
formail to automatically add some suitable identification text to the subject line of the email
that I receive?
 
The general idea is this

#Get the subject discarding any leading and trailing blanks
#Note: On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -xSubject: \
       | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

:0
* YourFirstSelectionCriterion
{
  :0 fwh
  | formail -I"Subject: WhateverYouAdd_1 ${SUBJ_}"
  :0:
  YourFirstFolder
}

:0
* YourSecondSelectionCriterion
{
  :0 fwh
  | formail -I"Subject: WhateverYouAdd_2 ${SUBJ_}"
  :0:
  YourSecondFolder
}

The flags are as follows: "f" use the pipe as a filter, "w" wait for the filter to finish before
proceeding, "h" feed only the header of the email message to the pipe.

The -I option in formail removes and replaces the old header. Should you wish to retain the old
subject header with an "Old-" prefix added, use -i instead.
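
If formail is not at hand on the machine where you test, the effect of the -I"Subject: ..." filter
can be imitated with awk. This is a stand-in sketch only, with a made-up function name; unlike
formail it does not join folded (continued) Subject lines, and it rewrites only the first Subject
line of the header.

```shell
#!/bin/sh
# tag_subject TAG: copy a message from stdin to stdout, rewriting the first
# Subject: header line as "Subject: TAG old-subject". The body is untouched.
# Leading and trailing blanks of the old subject are discarded.
tag_subject() {
  awk -v tag="$1" '
    in_body { print; next }
    /^$/ { in_body = 1; print; next }
    !tagged && /^[Ss]ubject:/ {
      sub(/^[Ss]ubject:[ \t]*/, "")
      sub(/[ \t]+$/, "")
      print "Subject: " tag " " $0
      tagged = 1
      next
    }
    { print }
  '
}
```

For example: tag_subject "[LIST]" < test.msg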
--------------------------------------------------------------------------------------------------
[tsno] I tried out your tips, but some of them failed on my system. What next?
 
Here are a few ideas:

 1. Have you copied right? For example:
   
      + If you cut and paste, the brackets [] containing tabs will not be copied correctly, since
        on this page the assumed tabs aren't true tabs.
         
      + Make sure that you have not misinterpreted the meaning of the quotation (") marks anywhere
        in the advice.
         
      + If you have a backslash \ at the end of the line to continue the line, it is very
        important to ensure that you do not have white spaces after the \ backslash.
         
 2. Have you customized all your file-paths right? Some of the recipes may require a slightly
    different setup in your environment than assumed in this FAQ.
     
 3. Check that procmail is getting your proper path. Try "echo ${PATH}" and then include
    "PATH=WhatYouGot" high up in your ~/.procmailrc recipe file.
     
 4. Include "VERBOSE=yes" high up in your ~/.procmailrc recipe file. Then see what is in the
    logfile procmail produced for debugging. The testbench is a useful aid in the debugging.
     
 5. The shell you use may affect some actions. Check where your Bourne shell sh is with
    "which sh". If it is e.g. /bin/sh then include "SHELL=/bin/sh" at the beginning of your
    ~/.procmailrc recipe file and see if anything changes. Bourne shell is the shell I have used
    in preparing this tips page.
     
 6. Work systematically. Try to pinpoint which particular line is causing the offense and how. If
    the problem is with the condition part, make a version general enough to get it to match. Then
    narrow it down towards what you wanted until the recipe fails. If the problem is with an
    action, try to separate whether the problem is with the actual action or your procmail syntax.
    For example if you pipe the email to a program, try to separate if it is the call syntax that
    is in error (e.g. do you manage to convey the parameters right) or if it is the actual program
    you called that fails.
     
 7. If you have a procmail problem which you can't solve after trying properly, post your problem
    to the comp.mail.misc Usenet newsgroup and/or your corresponding local newsgroup. If you have
    genuine feedback about my procmail tips, your email is most welcome, but please refrain from
    using email for private consultation requests.
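
The white space after a continuation backslash mentioned in point 1 is invisible in most editors,
but it is easy to search for. A small sketch (the function name is made up for the illustration):

```shell
#!/bin/sh
# bad_continuations FILE: print the numbers of the lines in FILE where a
# backslash is followed by one or more blanks or tabs before the end of line.
bad_continuations() {
  grep -n '\\[[:space:]][[:space:]]*$' "$1" | cut -d: -f1
}
```

For example: bad_continuations ${HOME}/.procmailrc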

--------------------------------------------------------------------------------------------------
[tsun] Echo and grep blues. I am having difficulties with echo and grep usages in procmail.
 
The combination of quoting and regular expressions can cause some subtle problems when the Unix
echo and one of the greps (grep, fgrep, egrep) are used in the procmail recipes.

Consider

#Note: On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -c -xSubject:`

# Responses to filter reports
:0:
* -1^0
*  1^0 $ ? echo \"${SUBJ_}\" | fgrep -is 'Re: Filter report'
*  1^0 ^TO_myid@([-a-z0-9_]+\.)*myhost\.mydom
Response.mail

  * In the example the email's subject header is put into a "SUBJ_" variable utilizing formail
    "-x" option.
   
  * The "-c" option is used to concatenate the potential continuation lines, since occasionally
    the headers are divided onto several lines. This is more common on the "Received:" line, but
    can also occur on the "Subject:" line.
   
  * If the quoted quotes (\") are not used in the echo, the special characters on the email's
    Subject line in the header will be processed as shell related operators. This must not be
    allowed, since it will result in errors that may be hard to trace. For example operators such
    as "(", ")", "`", "'", "<", ">" and "|" all have a special meaning to the shell.
   
  * It is safer to use fgrep (the fixed-character expression search) because fgrep interprets also
    the regular expression special characters literally. For example, for fgrep you could use 
    fgrep 'myhost.mydom' instead of egrep "myhost\.mydom". BTW, as you gather from the example
    above, procmail uses egrep-like syntax.
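
The shell side of this can be rehearsed on its own. The sketch below shows only what happens once
the subject is safely inside a shell variable; it does not reproduce the extra expansion round
that procmail performs on the condition line first. The function name is made up, and "grep -F" is
used as the current spelling of fgrep.

```shell
#!/bin/sh
# subject_contains SUBJECT NEEDLE: succeed if NEEDLE occurs literally in
# SUBJECT, ignoring case. printf and the double quotes keep the shell from
# interpreting anything inside SUBJECT; grep -F treats NEEDLE as fixed text.
subject_contains() {
  printf '%s\n' "$1" | grep -Fiq -- "$2"
}
```

Because of -F the dot in a needle such as myhost.mydom matches only a real dot, with no
backslashes needed.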
   
Consider a more complicated expression to extract the subject:

#Note: On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -c -xSubject: \
         | expand | tr '\;\|\$\`\\]/' '     ' \
         | sed -e 's/  */ /g' \
         | sed -e 's/(/\\\(/g' -e 's/)/\\\)/g' \
         | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

  * The potential tabs are expanded.
   
  * Some of the problem special characters are substituted with spaces.
   
  * Multiple spaces are substituted with a single space.
   
  * Parentheses are covered with backslashes "\". Here things can get really complicated, since
    the number of backslashes must be compatible with the number of interpretation rounds through 
    procmail and the shell.
   
  * The last sed gets rid of any leading and trailing whitespace.
   
There is much more to the echo and grep interactions with the shell and the regular expressions.
That is why sufficient trials using the testbench are advisable before including the more
complicated recipes into one's "~/.procmailrc" file.
--------------------------------------------------------------------------------------------------
# How do I know which of my many procmail recipes has been enacted?
 
To get a log of what happens you set at the beginning of your ~/.procmailrc recipes file

SHELL=/usr/bin/sh                 # Use Bourne shell
MAILDIR=${HOME}/Mail              # Customize as appropriate
LOGFILE=${MAILDIR}/procmail.log   # Your procmail log
VERBOSE=yes                       # Produce full information
LOGABSTRACT=all                   #       - " -

However, this produces so much information that it is not convenient for routine visual checking.
But you can include a suitable (dummy) variable definition in each one of your
recipes and then search the log file for occurrences of that variable. Here is an example
demonstrating how it goes. Consider a recipe that originally is

# Discard probable spam mail, set 1
:0:
* ! ^TO_(myid)@([-a-z0-9_]+\.)*myhost\.mydom
* 1^0 ^From:.*alladvantage.com
* 1^0 ^From:.*ameritech.net
* 1^0 ^From:.*bellatlantic.net
ProbableSpam.mail

Change this to be

:0
* ! ^TO_(myid)@([-a-z0-9_]+\.)*myhost\.mydom
* 1^0 ^From:.*alladvantage.com
* 1^0 ^From:.*ameritech.net
* 1^0 ^From:.*bellatlantic.net
{
  :0
  { RULE="Discard probable spam mail, set 1" }
  :0:
  ProbableSpam.mail
}

Apply the same principle for all your recipes in your ~/.procmailrc file. Then, as email has
arrived, you can check which rules have been used by searching the log file with the command 
grep "RULE=" ${HOME}/Mail/procmail.log. If you need this regularly, make the grep search one of
your Unix scripts:

#!/usr/bin/sh
grep "Assigning \"RULE=" ${HOME}/Mail/procmail.log

In the altered procmail recipe, further up, carefully note some of the syntax

  * The location of the lockfile invocation ":".
   
  * Above the RULE="..." line there is no cloning "c" flag in ":0" since setting a variable is a
    non-delivering action. The next line will be reached anyway. In fact, it would be a mistake to
    use a "c" there. It would lead to complications.
   
  * In setting the RULE variable ensure that there is a space after the "{" and before the "}".
    Otherwise the email will go to a folder with rather a long and complicated name.
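
The grep script can be taken one step further to count how often each rule has fired. This is a
sketch with a made-up function name; it assumes that the verbose log contains lines of the form
procmail: Assigning "RULE=...", which is what the grep pattern above searches for.

```shell
#!/bin/sh
# rule_counts LOGFILE: print "COUNT: RULE TEXT" for every distinct RULE
# assignment found in a verbose procmail log, sorted by the output line.
rule_counts() {
  sed -n 's/.*Assigning "RULE=\(.*\)"$/\1/p' "$1" \
    | awk '{ count[$0]++ } END { for (r in count) print count[r] ": " r }' \
    | sort
}
```

For example: rule_counts ${HOME}/Mail/procmail.log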
   
Procmail recipes nesting can get fairly complicated. Consider the following example involving
setting the RULE variable and procmail else if conditions ":0E".

:0
* ^TO_my-mailing-list
{
  :0
  * ^From:.*@([-a-z0-9_]+\.)*myhost\.mydom
    {
      :0
      { RULE="To my-mailing-list, probably legitimate" }
      :0:
      ${DEFAULT}
    }
  :0E
    {
      :0
      { RULE="To my-mailing-list, probably spam" }
      :0:
      Spam.mail
    }
}

# Feedback: There is a method for logging which action took place without setting VERBOSE=yes,
which creates large log files. This method uses the LOG variable:

LOGFILE=$HOME/.MailFilter_log
SHELL=/bin/sh
 
:0 B
* .*spam
{
  LOG="TRAPPED SPAM - "
  :0
    /dev/null
}
 
#- Accept All other mail -#
:0
{
  LOG="ACCEPTED MAIL - "
  :0
  $ORGMAIL
}
 
The output looks something like this:
 
TRAPPED SPAM - From spammer@spam.com Thu May 16 03:52:42 2002
 Subject: Make Money Fast
  Folder: /dev/null 43140
ACCEPTED MAIL - From goodguy@save.com Thu May 16 03:54:08 2002
 Subject: Legitimate email message
  Folder: var/spool/mail/username 4683

My comment: If you look at the example for testing for individual procmail recipes you'll see that
for logging one sets (usually for troubleshooting)

#Troubleshooting:
VERBOSE=yes
LOGABSTRACT=all

For the method in the feedback above, leave those variables out or set

VERBOSE=no

However, do not set

LOGABSTRACT=no

because then you'll miss all but the actual log variable identification. Instead, just leave the
line out.
--------------------------------------------------------------------------------------------------
# How can I detect Korean, Cyrillic, or Chinese to avoid such frequent spam?
 
There was a very good page by Walter Dnes explaining the method. Unfortunately, that page no
longer seems available. The method relies on ad-hoc approximation. In brief, scoring is used to
detect if more than 5 per cent of the characters in the body of the message are high-bit
characters typical of the said language codes. If you have gone through the items in my procmail
FAQ, it should be easy to understand the inventive method given on Walter's page. See the exercise
at the end of the current FAQ involving detecting Korean.
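
To get a feel for the 5 per cent threshold, the share of high-bit characters in a saved message
can be computed in the shell. Note that this is my own rough sketch and not the method on Walter's
page, which uses procmail scoring; the function name is made up, and "high-bit" is taken to mean
any byte in the octal range 200-377.

```shell
#!/bin/sh
# highbit_percent: read a message on stdin and print the integer percentage
# of body bytes that have the high bit set (octal 200-377).
highbit_percent() {
  tmp=$(mktemp) || return 1
  LC_ALL=C sed -e '1,/^$/d' > "$tmp"
  total=$(wc -c < "$tmp" | tr -d ' ')
  high=$(LC_ALL=C tr -cd '\200-\377' < "$tmp" | wc -c | tr -d ' ')
  rm -f "$tmp"
  if [ "$total" -eq 0 ]; then
    echo 0
  else
    echo $(( 100 * high / total ))
  fi
}
```

For example: highbit_percent < suspect.msg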
--------------------------------------------------------------------------------------------------
# How can I change the subject line and include part of the message body into it?
 
I have a cellular phone. I want to save the incoming email normally and also to send a modified
copy to my second account (a Short Message Service). The forwarded copy should include the
original subject AND five lines of the original message text. The original body should not be
included. Is this possible with procmail?
 
Well yes, it is. It takes some figuring out and needs many of the principles presented in the other
items in my proctips collection. It also needs a few tricks with Bourne shell programming. Perhaps
most importantly, this item demonstrates how to put the body of the message into a variable.

# Customize these paths if they do not match yours
SHELL=/usr/bin/sh
SENDMAIL=/usr/lib/sendmail
 
:0
* ^Subject:.*Timo testing
{
  # Put the email intact in the default folder
  :0c:
  ${DEFAULT}
  # The "c" flag above tells the recipe to continue
  # Now we prepare a different version of the message
  :0
  {
    # Get the subject into a variable
    # Expand the possible tabs into blanks
    # Discard any leading and trailing blanks
    # On some systems -xSubject: has to be -x"Subject: "
    SUBJ_=`formail -xSubject: \
      | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
    # Get the body of the message into a variable
    # Accept only the first five lines
    # Discard newlines, i.e. put everything on one line
    BODY_=`sed -e '1,/^$/ d' | head -5 | tr -d '\n'`
  }
  # Prepare and send a message with no body
  # -X "" extracts just the header (discards the body)
  # Plug in the new subject
  # Content fields might cause problems if not discarded
  # Change the To: address
  :0:proc.lock
  | formail -X "" \
      -I"Subject: ${SUBJ_} ${BODY_}" \
      -i"Content-Type:" \
      -i"Content-Length:" \
      -I"To: your@second.address" \
  | ${SENDMAIL} -t
}
 
The line

BODY_=`sed -e '1,/^$/ d' | head -5 | tr -d '\n'`

retrieves the first five lines from the body of the text. It would be more useful to retrieve a
specified number of characters from it. Say we wish to retrieve 160 characters. This is how to do
that.

BODY_=`sed -e '1,/^$/ d' | tr -d '\n' | dd bs=1 count=160`

Solving the alternative of having a maximum of 160 characters in the concatenated SUBJ_ and BODY_
is left as an exercise to the reader.

There also is another, more important improvement that can be made in the action above. Replace 
tr -d '\n' with tr '\n' ' ' so that when the lines are concatenated a space is put in between
them.
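
The BODY_ pipeline can be tested on a saved message as a shell function. The sketch below is a
variation under my own assumptions: it uses cut instead of dd to limit the length (cut does not
write transfer statistics to stderr), it joins the lines with spaces, and it assumes plain
single-byte text. The function name is made up.

```shell
#!/bin/sh
# body_snippet [MAX]: read a message on stdin, join the body lines with single
# spaces and print at most MAX characters (default 160).
body_snippet() {
  sed -e '1,/^$/d' | tr '\n' ' ' | cut -c "1-${1:-160}"
}
```

One possible approach to the exercise of limiting the subject and the body together is to join them
first and cut afterwards, e.g. printf '%s %s\n' "${SUBJ_}" "${BODY_}" | cut -c 1-160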
--------------------------------------------------------------------------------------------------
[tspad] How can I remove the signature from the incoming email?
 
The recipe below assumes that the signature properly adheres to the Internet "-- " convention to
denote where the signature starts.

:0
* ^Subject: Whatever
{
  :0 fbw
  | sed -e '/^-- /,$ d'
  :0:
  ${DEFAULT}
}

Let's look at what we've got:

  * The b flag means feed the body to the pipe.
  * The f flag means use the pipe as a filter.
  * The w flag means wait for the filter or program to finish.
  * This is not a sed FAQ, but in brief:
      + In the sed script the /^-- / matches the first occurrence of the signature designator
        string "-- ".
      + In sed, a lone $ stands for the last line.
      + The d denotes deleting the "pattern space" found.

In the above the sed script will delete everything in the message body starting from the "-- "
until the end of the incoming message. Substituting

sed -e '/^-- /,$ d'
 
with

sed -e '/^-- /,/^$/ d'
 
will instead delete everything starting from the "-- " until the first encountered empty line.
Thus if there is e.g. an attachment after the signature, the attachment will not be thrown away.
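
Both sed variants can be tried on a saved message body before they go into a recipe. In the sketch
below the function names are made up, and the pattern is deliberately stricter than in the recipe:
it requires the line to be exactly "-- " rather than merely to start with it, so that a body line
such as "-- see below" is not taken for a signature separator.

```shell
#!/bin/sh
# strip_sig: delete everything from the first line that is exactly "-- "
# (dash, dash, space) to the end of the input.
strip_sig() {
  sed -e '/^-- $/,$d'
}

# strip_sig_block: delete from that line up to the next empty line only,
# so text after the signature block is kept.
strip_sig_block() {
  sed -e '/^-- $/,/^$/d'
}
```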

--------------------------------------------------------------------------------------------------
[tsb] What unix manuals relating to procmail should I get?
 
Unix manuals are not very helpful as starting points, but after you have got the rudiments under
your belt, you may wish to browse the following manuals for additional information. Below is a
simple "manuals" Bourne shell script. It prepares plain text format files of some of the essential
Unix man manuals for a procmail user, especially suited for offline reading.

Note that the "^H" is not a "^" and an "H", but a CTRL-H, i.e. ASCII 8 (the backspace character).
To make the "manuals" file executable type "chmod u+x manuals".

#!/bin/sh
TODIR=${HOME}/myman
mkdir -p ${TODIR}    # create the target directory if it does not exist yet
echo ${TODIR}
man egrep      | sed -e 's/_^H//g' > ${TODIR}/egrep.man
man formail    | sed -e 's/_^H//g' > ${TODIR}/formail.man
man procmail   | sed -e 's/_^H//g' > ${TODIR}/procmail.man
man procmailex | sed -e 's/_^H//g' > ${TODIR}/procmaex.man
man procmailrc | sed -e 's/_^H//g' > ${TODIR}/procmarc.man
man regexp     | sed -e 's/_^H//g' > ${TODIR}/regexp.man
man sendmail   | sed -e 's/_^H//g' > ${TODIR}/sendmail.man
ls -lF ${TODIR}
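
If typing a literal CTRL-H into the script is awkward in your editor, the backspace can be produced
with printf instead. This is a sketch with a made-up function name. It removes any character that
is followed by a backspace, which takes care of bold overstrikes as well as underlining. Where it
is installed, "col -b" is another common way to do the same job.

```shell
#!/bin/sh
# strip_overstrike: remove "character + backspace" pairs from stdin, which
# undoes both underlining (_ BS x) and bold (x BS x) in formatted man output.
strip_overstrike() {
  bs=$(printf '\b')
  sed -e "s/.${bs}//g"
}
```

For example: man procmailrc | strip_overstrike > ${HOME}/myman/procmarc.man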

Many of the recipes in this FAQ utilize sed and/or awk. Some useful links (note, however, as is
common with links, I can't guarantee that they still are current):

  * The sed FAQ Eric Pement
  * SED - the stream editor Sven Guckes
  * Manipulating Strings with sed
  * SED Patrick Hartigan
   
  * comp.lang.awk FAQ Russell Schulz
  * Awk -- A Pattern Scanning and Processing Language Aho, Kernighan, Weinberger
  * Awk programming links The University of Edinburgh
  * How to Use AWK Patrick Hartigan

--------------------------------------------------------------------------------------------------
[tsm] Is it possible to use procmail to call the vacation program?
 
Yes, it is, but it is not quite as straight-forward as one would expect.

Since this is a procmail advice collection, not a vacation program one, I'll assume that you are
reasonably familiar with the vacation program. If not, start with "man vacation". You have to use
procmail to customize the ~/.vacation.msg file because when invoked via procmail, the vacation
$SUBJECT variable is not necessarily set.

Usually, when vacation is used, it is first called interactively to create the ~/.vacation.msg file
and to replace the ~/.forward file. If you are going to use the procmail solution it is very
important not to do this. In particular, the ~/.forward file must not be touched in any way. The
reason is that in this solution it is used to invoke procmail, not vacation. (The vacation program
is, of course, called by procmail now.)

# Set a number of variables high up in your ~/.procmailrc
#
VACATION=/usr/bin/vacation
ONVACAT=yes
VACFREQ=5d
VACMSG=${HOME}/.vacation.msg
MYNAME_="MyFirstName MyLastName"
MYEMAIL_=myid@myhost.mydom
 
# Get the subject discarding any leading and trailing blanks
# Note: On some systems -xSubject: has to be -x"Subject: "
#
SUBJ_=`formail -xSubject: \
    | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
 
# Prepare the vacation message's base
# This is done only once in ~/.procmailrc
#
:0 cwi
* ONVACAT ?? ^^yes^^
| echo "From: ${MYEMAIL_}" > ${VACMSG} ;\
  echo "Subject: ${MYNAME_}, away from my mail" >> ${VACMSG} ;\
  echo "X-Loop: myid@myhost.mydom" >> ${VACMSG} ;\
  echo "" >> ${VACMSG} ;\
  echo "Thank you for your email about:" >> ${VACMSG} ;\
  echo "\"$SUBJ_\"" >> ${VACMSG} ;\
  echo "" >> ${VACMSG} ;\
  echo "Your email will be seen to when I return." >> ${VACMSG} ;\
  echo "" >> ${VACMSG} ;\
  cat ${HOME}/.signature >> ${VACMSG}
 
# Here we go invoking vacation and also saving the email
# You might have several, different of these recipes
#
:0
* ^Subject:.*Whatever
{
  :0
  { RULE="Testing" }
  :0 cwi
  * ONVACAT ?? ^^yes^^
  * ! ^X-Loop:.*myid@myhost\.mydom
  | ${VACATION} -t${VACFREQ} myid
  :0:
  WhateverFolder
}

# Feedback: Maybe I [Collin Park] can add one more comment: I think you need a global LOCKFILE to
cover the area from when you generate the vacation message to the place where you invoke
$VACATION.
 
Otherwise, message #N may generate .vacation.msg, then message #N+1 overwrites it before #N
invokes $VACATION.
--------------------------------------------------------------------------------------------------
[tsmai] How can I avoid duplicate messages sent in rapid succession?
 
One option, though not the only one, is the following heuristic. You will wish to customize and
streamline it in accordance with your own preferences.
 
#Some variables
FROM2_=`formail -c -I"Reply-To:" -rt -xTo: \
 | tr '\;\|\$\`\\]/' '     ' \
 | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
DFROM2_=`echo /${FROM2_}/ \
 | expand | sed -e 's/[ \<\>\+\?\$]//g'`
SUBJ_=`formail -z -c -xSubject: \
 | expand | tr '\;\|\$\`\\]/' '     ' \
 | sed -e 's/  */ /g' \
 | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
DSUBJ_=`echo /${SUBJ_}/ | expand | sed -e 's/[ \<\>\+\?\$]//g'`
DWC_=/`wc -w`/
 
#Discard doubles
# W Wait for the filter or program to finish,
# suppress any 'Program failure' message.
:0W
* $ ? sed -n 1p LastIn | egrep -is '${DFROM2_}'
* $ ? sed -n 2p LastIn | egrep -is '${DSUBJ_}'
* $ ? sed -n 3p LastIn | egrep -is '${DWC_}'
{
  :0
  { RULE="Discard doubles" }
  :0
  /dev/null
}
 
#Store some information about the latest message
# c then continue
:0Wc
| echo "${DFROM2_}" > LastIn ;\
  echo "${DSUBJ_}" >> LastIn ;\
  echo "${DWC_}" >> LastIn
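
The comparison against LastIn can also be tried outside procmail. The following is only a
minimal sketch under stated assumptions: the helper names squeeze, fingerprint and is_duplicate
are hypothetical, the sender, subject and word count are passed as arguments instead of being
extracted with formail, and the test is an exact string comparison rather than the egrep match
used in the recipe.

```shell
# Hypothetical helpers illustrating the LastIn comparison; this is not procmail code.

# Strip blanks and the characters < > + ? $ before comparing, as the recipes do.
squeeze() {
  printf '%s\n' "$1" | sed -e 's/[ <>+?$]//g'
}

# Print the three-line fingerprint: sender, subject, word count.
fingerprint() {
  printf '/%s/\n' "$(squeeze "$1")"
  printf '/%s/\n' "$(squeeze "$2")"
  printf '/%s/\n' "$3"
}

# Succeed if the fingerprint of the arguments equals the one stored in file $1.
is_duplicate() {
  last=$1; shift
  [ -f "$last" ] && [ "$(fingerprint "$@")" = "$(cat "$last")" ]
}

# Usage: remember one message, then compare a near-identical one against it.
LASTIN=$(mktemp)
fingerprint 'joe@example.com' 'Hello  there' 42 > "$LASTIN"
if is_duplicate "$LASTIN" 'joe@example.com' 'Hello there' 42; then
  echo "duplicate"
fi
rm -f "$LASTIN"
```

Because all blanks are removed, subjects that differ only in spacing count as the same message.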
 
--------------------------------------------------------------------------------------------------
[tsform1] How can I skip logging a certain, matched recipe? Say virus warnings from my postmaster.
 
The solution is rather simple. Direct LOGFILE to /dev/null (or anywhere you may wish) for the
duration of the relevant recipe. For example
 
:
LOGFILE_=${LOGFILE}
LOGFILE=/dev/null
:0:
* ^Subject:.*Virus in a mail for you
* ^From:.*postmaster
VirusWarnings
LOGFILE=${LOGFILE_}
:

Alternatively you could likewise (re)set
VERBOSE=no
LOGABSTRACT=no
but the first solution is the more flexible.
 
--------------------------------------------------------------------------------------------------
[tsb] Could you please solve for me this procmail problem of mine?
 
It is nice that you have found my proctips so useful that you ask for my personal advice.
Nevertheless, if you ask me by email for individualized procmail consultation, my response has to
be the same as for any request for programming advice: I do not do email consultation. If you
have a procmail-related problem, please post your question on Usenet in a newsgroup such as
comp.mail.misc. The added advantage of posting is that in a newsgroup both the question and the
potential answers reach a wider forum. That way everyone will benefit.

On rare occasions I have also been asked to email my own personal ~/.procmailrc or my own
spamfoiling scripts. The answer is a definite no. There are two main reasons. First, that material
is private. Second, I have neither the willingness nor the time to send out material to users on
individual requests. If and when I want to share my material, I make it available for users to
retrieve themselves via WWW or FTP.
--------------------------------------------------------------------------------------------------
[tsf] I liked this material. Do you have anything else on programming, etc?
 
Yes, notably this:

+------------------------------------------------------------+
|                        Programming                         |
|------------------------------------------------------------|
|[tstp]  |Turbo Pascal programming material                  |
|--------+---------------------------------------------------|
|[tsbatc]|MS-DOS batch programming material                  |
|--------+---------------------------------------------------|
|[tspr]  |NT/2000/XP command line programming material links |
|--------+---------------------------------------------------|
|[tsun]  |Unix Bourne shell scripts programming material     |
|------------------------------------------------------------|
|                            Etc                             |
|------------------------------------------------------------|
|[tsfaq] |More links to Timo's FAQ materials                 |
+------------------------------------------------------------+

--------------------------------------------------------------------------------------------------
[tsq] Some exercises

Let's see if we can put the methods presented in this FAQ to work on some tasks, some of which
have come up on Usenet.

Ex.1) Keep a copy of incoming email, and at the same time, get only the first five lines from the
message body and forward it to another account.

# Discard potential email loops
:0
* ^X-Loop: myid@myhost\.mydom
/dev/null

:0
* Any rule(s) you might wish to have
{
  # Keep a copy, but don't stop yet ( the c )
  :0c:
  ${DEFAULT}

  # Rename the Content-Length header field by prefixing it with "Old-"
  # ( -z ensures that whitespace exists between field name and content )
  :0 fwh
  * ^Content-Length:
  | formail -z -i"Content-Length:"

  # Add the loop avoidance
  # ( f for piping; w for waiting for completion; h for headers )
  :0 fwh
  | formail -A"X-Loop: myid@myhost.mydom"

  # Truncate the body ( the b ) to five lines
  :0 fwb
  | head -5

  # Forward to the other account
  :0
  ! myid2@myhost.mydom
}

It is important to handle the Content-Length header field whenever the length of the email is
altered. Otherwise the receiving email program may break the forwarded message when it is read.
The -i switch renames the field to Old-Content-Length, so the information about the original
message length is retained for the receiver.
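
What the body truncation does to a message can be seen with a small stand-alone sketch. It is an
illustration only, not part of the recipe: the helper name truncate_body is hypothetical, and it
assumes that the header ends at the first empty line of the message.

```shell
# Hypothetical helper: keep the header intact, keep only the first N body lines.
truncate_body() {
  n=$1
  awk -v n="$n" '
    inbody { if (++seen <= n) print; next }  # body: print only the first n lines
    /^$/   { inbody = 1 }                    # empty line: the body starts next
           { print }                         # header lines and the separator
  '
}

# Usage: the header survives unchanged, the body is cut to two lines.
printf 'Subject: test\nContent-Length: 14\n\none\ntwo\nthree\n' | truncate_body 2
```

Note that the Content-Length value in the output no longer agrees with the shortened body, which
is the reason the recipe deals with that field before forwarding.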

                                  -----------------------------                                   

Ex.2) Forward the first 10 lines of the message body to the user's second account while preserving
all the original message headers -- I.e. at the receiving side, the user wants to see all the
message travel history and only the first 10 lines of the message body.
 
This is a more complicated version of the first exercise. The transformed task is not trivial,
since when you forward, the original message headers are replaced by your forwarding headers.
Therefore, you must also take care of preserving the original headers. Below is how I would solve
the problem, drawing on several items in this FAQ.

# A trick to extract the subject into a variable
# Note: On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -c -xSubject: | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
 
# The actual recipe to solve the exercise starts here
:0
* Whatever condition(s) you wish to select the messages for forwarding
# Avoid email loops
* ! ^X-Loop: myid@myhost\.mydom
{
  :0c: #If you want to, preserve a full copy of the email, else omit
  ${DEFAULT}
  :0fwh #Preserve the information about the original content length
  * ^Content-Length:
  | formail -z -i"Content-Length:"
  :0fwb #Truncate the body of the message to ten lines
  | head -10
  :0fwh #Insert a blank line at the beginning of the body for clarity
  | cat - ; echo ""
  :0fwh #Store the original headers, quoting them to avoid problems
  | sed -e 's/^/\> /'
  :0fwh #Insert some of your own information before forwarding
  | formail -A"X-Loop: myid@myhost.mydom" \
    -A"X-Info: Forwarded body truncated to 10 lines" \
    -i"Subject: $SUBJ_ (fwd)"
  #Finally, forward the adjusted email
  :0
  !my2ndId@myhost.mydom
}
 
# Discard potential email loops
:0
* ^X-Loop: myid@myhost\.mydom
/dev/null
 
# Feedback: The recipe with head probably needs an "i" on the flags line, as:

    :0 fwbi
    | head -10

since write errors on the pipe are likely for messages larger than a certain size. (I've seen
numbers like 4096 and 10240... it apparently varies with the system.)

                                  -----------------------------                                   

Ex.3) Match a potential [TS999] identification in the Subject header, such as "[TS001] Timo
testing". If found, insert a "Subject id: [TS999]" as the first line in the body of the message.
(The rest of the original subject line must not reappear in the id.)

:0
* ^Subject:.*\/\[TS[0-9]+\]
{
  :0 fhw
  | cat - ; \
  echo "Subject id: ${MATCH}"
  :0:
  ${DEFAULT}
}

But what if you do want to include the rest of the original subject line? In that case use

* ^Subject:.*\/\[TS[0-9]+\].*
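
The effect of extracting only the id can be imitated with sed for experimenting on the command
line. This is a rough sketch and not how procmail itself sets MATCH: the helper name ts_id is
hypothetical, and since sed matches greedily the result may differ from procmail's when a
subject contains several ids.

```shell
# Hypothetical helper: print the [TSnnn] id found in a Subject line, if any.
ts_id() {
  printf '%s\n' "$1" | sed -n 's/^Subject:.*\(\[TS[0-9][0-9]*]\).*/\1/p'
}

# Usage: only the id is printed, not the rest of the subject.
ts_id 'Subject: [TS001] Timo testing'
```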

                                  -----------------------------                                   

Ex.4) Multi-part messages (which typically include attachments) have in their headers a field like
the two examples below:
 
Content-Type: multipart/mixed; boundary=ELM965173874-25050-0_
Content-Type: multipart/mixed; boundary="------------BA45271FBDAA479CECA7E20A"

Write a recipe that inserts into a variable (call it BOUND) the boundary string. Note that the
potential quotes (") are not to be part of that string. Also note that the header might be divided
on multiple lines as in
 
Content-Type: multipart/mixed;
  boundary=ELM965173874-25050-0_
 
There are alternative solutions, which are not necessarily quite equivalent. The first one is to
put the following line(s) high up in your ~/.procmailrc recipe file

BOUND1=`formail -z -x"Content-Type:" \
  | awk -F= '{ print $2 }' \
  | sed -e 's/\"//g' | tr -d '\n'`
 
A second one is:

:0h
* ^Content-Type:
{ BOUND2=`egrep -i 'boundary=' \
  | awk -F= '{ print $2 }' | sed -e 's/\"//g'` }
 
This was not in the exercise, but you can then have recipes like

:0:
* ! BOUND2 ?? ^^^^
WhateverFolder
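
The awk/sed/tr pipeline of the first solution can be tested separately from procmail. The sketch
below assumes that the field value arrives on standard input in the form formail would print it;
the helper name boundary_of is hypothetical.

```shell
# Hypothetical helper: read the value of a Content-Type field on stdin
# and print the boundary string without quotes and without a newline.
boundary_of() {
  awk -F= '{ print $2 }' | sed -e 's/"//g' | tr -d '\n'
}

# Usage with a folded header field: the first line has no "=", so it contributes nothing.
printf '%s\n' 'multipart/mixed;' '  boundary="------------BA45271FBDAA479CECA7E20A"' \
  | boundary_of
echo
```

One limitation of splitting on "=" is that a boundary string which itself contains an "=" would
be cut short.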
 
                                  -----------------------------                                   

Ex.5) Identify if the arriving email is in Korean. If so, return the message to the sender and
his/her postmaster. Ignore a potential Reply-To: field in the header. Avoid email loops. Avoid
forgeries which appear to come from your own host. Avoid forgeries which lack a host name. Be
careful not to take Finnish/Swedish or French as Korean.
 
This is quite a difficult exercise with many details involved.

# Get the sender's address, ignore Reply-To:
FROM_=`formail -c -I"Reply-To:" -rt -xTo: \
  | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
 
# Get the sender's host
FHOST_=`echo "${FROM_}" | awk -F@ '{ print $2 }'`
 
# Your path to sendmail
SENDMAIL="/usr/lib/sendmail"
 
# Reject probable Korean email using character scoring
:0
* ! ^X-Loop:.*myid@myhost\.mydom
* ! $ ? echo ${FHOST_} | fgrep -is 'myhost.mydom'
* $ ? echo ${FHOST_} | fgrep -is '.'
{
  :0BD
  *  -1^1 .
  *   2^1 =[0-9A-F][0-9A-F]
  *  20^1 [¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿]
  *  20^1 [ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß]
  *  20^1 [àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]
  *  20^1 =[89A-F][0-9A-F]
  * -20^1 [åÅäÄöÖàáâçèéêë]
  * -20^1 =(E5|C5|E4|C4|F6|D6|E0|E1|E2|E7|E8|E9|EA|EB)
  {
    :0
    { RULE="Probable Korean email" }
    #
    :0c:${HOME}/procmail.lock
    | expand | sed -e 's/[ ]*$//g' \
      | sed -e 's/^/ /' > ${HOME}/procmail.reject.korean
    #
    :0:${HOME}/procmail.lock
    | (formail -r -I"Subject: Autorejected email" \
      -I"To: ${FROM_}" \
      -I"Cc: postmaster@${FHOST_}" \
      -A"X-Loop: myid@myhost.mydom" ; \
      echo "--- begin rejected probable Korean email ---" ; \
      echo "" ; \
      cat ${HOME}/procmail.reject.korean ; \
      echo "--- end of rejected probable Korean email ---" ; \
      rm -f ${HOME}/procmail.reject.korean) \
        | ${SENDMAIL} -t
  }
}
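
To get a feeling for how the weights balance, the quoted-printable part of the scoring can be
approximated with awk. This is a simplified sketch under stated assumptions, not a reimplementation
of procmail's scoring: the helper name qp_score is hypothetical, only the conditions written with
"=" escapes are counted, and the raw 8-bit character classes are left out.

```shell
# Hypothetical helper: approximate the quoted-printable conditions of the recipe
# for the text on stdin and print the resulting score.
qp_score() {
  awk '
    {
      line = $0
      chars  += length(line)                           # weight -1 per character
      qp     += gsub(/=[0-9A-F][0-9A-F]/, "&", line)   # weight  2 per escape
      high   += gsub(/=[89A-F][0-9A-F]/, "&", line)    # weight 20 per 8-bit escape
      nordic += gsub(/=(E5|C5|E4|C4|F6|D6|E0|E1|E2|E7|E8|E9|EA|EB)/, "&", line)
    }
    END { print 2 * qp + 20 * high - 20 * nordic - chars }
  '
}

# Usage: a positive score corresponds to the condition block matching.
printf '%s\n' '=C7=D1=B1=DB abc' | qp_score
printf '%s\n' 'p=E4iv=E4=E4' | qp_score
```

In the second line every 8-bit escape is one of the Finnish/Swedish or French characters, so the
+20 and -20 weights cancel and the per-character penalty keeps the total below zero.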
                                  -----------------------------                                   

Ex.6) If the subject of the email contains the identifier [INFO], in capitals, put the body of the
incoming email into a temporary file. Ensure that the name of the temporary file is unique. Insert
the full subject line at the top of the temporary file. (Why, and what happens next, is beyond the
scope of this exercise.)
 
#Get the subject discarding any leading and trailing blanks
#Note: On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -xSubject: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
 
# Assign a temporary file name
TMPFILE_=proctemp.$$
 
:0D
* ^Subject.*\[INFO\]
{
  :0 fwbi
    | echo "Subject: ${SUBJ_}" > ${TMPFILE_}; \
    echo >> ${TMPFILE_}; \
    cat >> ${TMPFILE_}
}
                                  -----------------------------                                   

Ex.7) If the email comes from a certain sender, check if the time-zone information is present in
the Date header. If not, add it assuming +3 hours.
 
#Get the date discarding any leading and trailing blanks
DATE_=`formail -xDate: \
  | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
 
:0
* ^From:.*TheCertainSender
* ! ^Date:.*(EET|DST|GMT)
{
  :0 fwhi
  | formail -i"Date: ${DATE_} +0300 (EET DST)"
  :0:
  ${DEFAULT}
}
                                  -----------------------------                                   

Ex.8) The simple spamfoiling recipe below won't work. Correct it.
 
:0:
* !^TO$USER@xxxxxxx.xxx
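ProbableSpam.mail
 
A corrected version, which sets USER explicitly, adds the $ flag so that the variable is expanded
in the condition, and escapes the dots of the domain:
 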
:0
{
  :0
  { USER=`whoami` }
  :0:
  * $ ! ^TO_${USER}@([-a-z0-9_]+\.)*xxxxxxx\.xxx
  ProbableSpam.mail
}
 
The ([-a-z0-9_]+\.)* part, which allows for subdomains of the host, is optional.

Another solution:

:0:
* $ ! ^TO_${LOGNAME}@([-a-z0-9_]+\.)*xxxxxxx\.xxx
ProbableSpam.mail
                                  -----------------------------                                   

Ex.9) Insert at the beginning of the subject the date/time of receiving the incoming message in
the YYYYMMDD HHMMSS format.
 
:0
* Whatever rules
{
  :0
  { SUBJ_=`formail -c -xSubject: \
    | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'` }
  :0
  { DATETIME_=`date "+%Y%m%d %H%M%S"` }
  :0 fhwi
  | formail -I"Subject: ${DATETIME_} ${SUBJ_}"
  :0:
  ${DEFAULT}
}
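
The stamp itself is easy to check on the command line. A minimal sketch, with the hypothetical
helper name stamp_subject; it uses %H, which gives a zero-padded hour, whereas %k (where it is
supported) pads single-digit hours with a blank.

```shell
# Hypothetical helper: prefix a subject with the current date and time
# in the YYYYMMDD HHMMSS format.
stamp_subject() {
  printf '%s %s\n' "$(date '+%Y%m%d %H%M%S')" "$1"
}

# Usage
stamp_subject 'Timo testing'
```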
                                  -----------------------------                                   

Ex.10) This is partly based on an actual incident. Consider the following recipe with three small
but crucial syntax errors and one omission. Find them.
 
:0
* ^From:.*(\
(abuse(-news)?|acct_closed)@
(pacificnet\.net|\
mindspring\.net|\
InfoAve\.net|\
netcom\.com\|
yahoo\.com|\
alladvantage\.com|\
hotmail\.com))
* ^TO_(myid|myFirstName\.mySecondName)@([-a-z0-9_]+\.)*myhost\.mydom
{
  :0
  {RULE="Abuse reception notes"}
  :0
  ReceivedNotes
}

The answer is a bit further down
  :
  :
  :
  :
  :
  :
  :
  :
  :
  :
  :
  :
  :
  :
 
:0
* ^From:.*(\
(abuse(-news)?|acct_closed)@\
(pacificnet\.net|\
mindspring\.net|\
InfoAve\.net|\
netcom\.com|\
yahoo\.com|\
alladvantage\.com|\
hotmail\.com))
* ^TO_(myid|myFirstName\.mySecondName)@([-a-z0-9_]+\.)*myhost\.mydom
{
  :0
  { RULE="Abuse reception notes" }
  :0:
  ReceivedNotes
}

The three syntax errors were the missing continuation backslash after the "@", the "\|" in place
of "|\" after netcom\.com, and the missing spaces inside the braces of the RULE assignment. The
omission was the lockfile colon, ":0:" instead of ":0", on the recipe that delivers to
ReceivedNotes.
                                  -----------------------------                                   

Ex.11) Write a recipe to match the subject line below. The (RECENT) may or may not be there, and
the numbers will change from posting to posting.
Subject: Re: [SpamCop:(RECENT)38.204.225.29,id:16135684] Make lotsof $$$

:0:
* ^Subject: Re: \[SpamCop:(\(RECENT\))?[0-9\.]+,id:[0-9]+\]
WhateverFolder
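
A pattern like this is convenient to try out with grep before putting it into ~/.procmailrc. The
sketch below assumes that grep -E with -i is a close enough stand-in for procmail's own
case-insensitive matching; the names SPAMCOP_RE and is_spamcop are hypothetical.

```shell
# The condition from the recipe, written as a POSIX extended regular expression.
# (The closing bracket needs no backslash outside a bracket expression.)
SPAMCOP_RE='^Subject: Re: \[SpamCop:(\(RECENT\))?[0-9.]+,id:[0-9]+]'

# Hypothetical helper: succeed if the given header line matches.
is_spamcop() {
  printf '%s\n' "$1" | grep -Eiq "$SPAMCOP_RE"
}

# Usage: with and without (RECENT), and a line that should not match.
for s in \
  'Subject: Re: [SpamCop:(RECENT)38.204.225.29,id:16135684] Make lotsof $$$' \
  'Subject: Re: [SpamCop:38.204.225.29,id:16135684] Make lotsof $$$' \
  'Subject: Re: SpamCop is down'
do
  if is_spamcop "$s"; then echo "match:    $s"; else echo "no match: $s"; fi
done
```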
                                  -----------------------------                                   

Ex.12) It is fairly common that spam email has the same sender and recipient in the From: and To:
fields. Devise a recipe that detects such postings.

This is not quite as simple as it first sounds, since it is advisable to take into account the
fact that the contents of the two fields may not be quite identical even when the actual
addresses are the same. Thus, as one of the possible solutions, I would use regular expression
matching both ways as below. By default, variable comparisons are regular expression matches, not
strict equalities. Also note the conditions for avoiding email loops and for not falsely targeting
email which one may have sent to oneself.

WHOFROM=`formail -xFrom: \
  | expand \
  | sed -e 's/  */ /g' \
  | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
 
WHOTO=`formail -xTo: \
  | expand \
  | sed -e 's/  */ /g' \
  | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
 
:0:
* -100^0 ^X-Loop: myid@myhost\.mydom
* -100^0 ^TO_(myid|myFirst\.mySecond)@([-a-z0-9_]+\.)*myhost\.mydom
* -100^0 ^From:.*LegitimateMailingList
* 1^0 $ WHOFROM ?? ${WHOTO}
* 1^0 $ WHOTO ?? ${WHOFROM}
ProbableSpam.mail
                                  -----------------------------                                   

Ex.13) Write a (spam avoidance) recipe to detect email with more than seven recipients in the
"To:" header field. Assume for simplicity that each address will have exactly one "@" character in
it.
 
:0
* ^Subject:.*The information you requested
{
  :0
  {
    WHOTO=`formail -z -xTo:`
    COUNT=`echo ${WHOTO} | sed -e 's/[^@]//g' | wc -c`
    COUNT1=`expr ${COUNT} - 1`
    ISGT=`expr ${COUNT1} \> 7`
  }
  :0:
  * ISGT ?? ^^1^^
  ProbableSpam.mail
}
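
The counting step can be verified on its own. A minimal sketch with the hypothetical helper names
count_recipients and too_many; as in the exercise, it assumes exactly one "@" per address.

```shell
# Hypothetical helper: count the "@" characters in a To: field value.
count_recipients() {
  # Delete everything except "@", then count; wc -c also counts the newline.
  raw=$(printf '%s\n' "$1" | sed -e 's/[^@]//g' | wc -c)
  # expr exits with status 1 when the result is 0, hence the "|| true".
  expr $raw - 1 || true
}

# Hypothetical helper: succeed if there are more than seven recipients.
too_many() {
  [ "$(count_recipients "$1")" -gt 7 ]
}

# Usage
count_recipients 'a@example.com, b@example.com, c@example.com'
```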
                                  -----------------------------                                   

Ex.14) Make procmail forward email that arrives between 9am and 5pm to a predefined daytime email
address.
 
:0
# Omit the condition line below if this is for all email
* ^Subject:.*Whatever
{
  :0
  {
    TIME=`date +%H%M`
    ISGT=`expr ${TIME} \> 0900`
    ISLT=`expr ${TIME} \< 1700`
  }
  :0
  * ISGT ?? ^^1^^
  * ISLT ?? ^^1^^
  ! daytime_forward_address
}
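
The two expr comparisons can be checked for arbitrary times without waiting for the clock. A
minimal sketch with the hypothetical helper name is_daytime, which takes the time as an HHMM
argument and falls back to the current time.

```shell
# Hypothetical helper: succeed if the HHMM time lies strictly between 0900 and 1700.
is_daytime() {
  t=${1:-$(date +%H%M)}
  # expr prints 1 or 0; it exits with status 1 when it prints 0, hence "|| true".
  gt=$(expr "$t" \> 0900 || true)
  lt=$(expr "$t" \< 1700 || true)
  [ "$gt" = 1 ] && [ "$lt" = 1 ]
}

# Usage
for t in 0830 0900 1200 1659 1700; do
  if is_daytime "$t"; then echo "$t forward"; else echo "$t keep"; fi
done
```

As written, both comparisons are strict, so email arriving at exactly 09:00 or 17:00 is not
forwarded.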
                                  -----------------------------                                   

Ex.15) Write a Procmail recipe which detects if there is a Word document attached to the incoming
email.
 
# Email with a Word document attached
:0
* ^Content-Type: multipart/
{
  :0 B
  * ^Content-.*attachment.*name=.*\.(doc|rtf)
  {
    :0
    { RULE="Email with a Word document attached" }
    :0:
    WordAttachmentEmail
  }
}
                                  -----------------------------                                   

Ex.16) Write a recipe to detect a "whatever pattern" on exactly the second line of the body of an
incoming message. Ignore case in the pattern.
 
:0B:
* ? sed -n 2p | egrep -is 'whatever pattern'
WhateverPatternMail
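
The pipeline in the condition can be tried directly from the shell. A minimal sketch, assuming
that the message body arrives on standard input; the helper name second_line_matches is
hypothetical, and grep -Eiq is used in place of egrep -is.

```shell
# Hypothetical helper: read a message body on stdin and succeed if its
# second line contains the pattern given as $1, ignoring case.
second_line_matches() {
  sed -n 2p | grep -Eiq "$1"
}

# Usage
if printf 'first line\nSome Whatever Pattern here\nthird line\n' \
    | second_line_matches 'whatever pattern'; then
  echo "second line matches"
fi
```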
 
A tip: Even though there is no direct relation to procmail, my collection of useful MS-DOS batch
files and tricks contains several examples of sed (and awk) usage. So does my collection of
useful NT/2000/XP script tricks and tips.
                                  -----------------------------                                   

Ex.17) Write a spam detection recipe that does the following:
1. Check the body of the message against the keywords (collected spam sites' www addresses etc.)
in a BlackList.lst pattern-file. The pattern-file might contain something like:
   This letter may come to you as a surprise
   Urgent business proposal
   cheap-medz.com
   discreetdelivery.net
   http://homemarketplace.cjb.net
   mailto:reklamapoezd@
   quityourjobworkforus
   statesmoneyz.com
   www.badcrednp4u.biz
2. If a KEEPSPAM variable has been set to "yes" save the spam to Spam.mail, truncated to 100
lines. If not, discard the message.

# Probable spam mail, by message body
:0B
* $ ? fgrep -is -f BlackList.lst
{
  :0
  * KEEPSPAM ?? ^^yes^^
  {
    :0:MyProcmail.lock
    | sed -n 1,100p >> Spam.mail
  }
  :0E
  {
    :0
    /dev/null
  }
}
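
The pattern-file test can be rehearsed with a throwaway file before it is trusted with real
email. A minimal sketch: the names BLACKLIST and is_blacklisted are hypothetical, the file is
created with mktemp, and grep -Fiq stands in for fgrep -is.

```shell
# Hypothetical demonstration of the pattern-file test with a temporary file.
BLACKLIST=$(mktemp)
cat > "$BLACKLIST" <<'EOF'
Urgent business proposal
cheap-medz.com
EOF

# Succeed if the text on stdin contains any pattern line as a fixed string,
# ignoring case.
is_blacklisted() {
  grep -Fiq -f "$BLACKLIST"
}

# Usage
if printf 'Hello,\nvisit CHEAP-MEDZ.COM today\n' | is_blacklisted; then
  echo "probable spam"
fi
```

Take care that the pattern file contains no empty lines, since an empty pattern matches every
line, and no stray leading or trailing blanks, since the patterns are taken literally.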

--------------------------------------------------------------------------------------------------
[tsm] Acknowledgements for useful advice and/or feedback:

 Aughey, John
 Bump, Jorey
 Davey, David
 Dnes, Walter
 Eriksson, Era
 Guenther, Philip
 Hebeisen, Christoph
 Hirvonen, Hannu
 Melish, Jacob
 Menezes, Evandro
 Novak, Curtis
 Park, Collin
 Pettigrew, John
 van Tol, Ruud
 Van Steenkist, Vernon

Any errors and inadequacies are, however, solely my own responsibility.

S: A legal note: The author shall not be liable to the user, the reply target or any third party
for any direct, indirect or consequential loss or damage arising from using, abusing, or a failure
to be able to use, the information in this message/file howsoever caused. No warranty is given
that all the information contained is correct, or that it is current.

--------------------------------------------------------------------------------------------------
