
regm.py

Here is a Python script I wrote to find messages in mailboxes quickly using regular expressions.
It can be useful for people with huge mail directories (and especially for Mutt users). regm.py is released under the GNU General Public License.

Documentation

The output of regm --help:

regm version 1.0 15/07/01

Filter and extract messages from mailboxes with regular expressions.

installation:
        - change the defaults at the beginning of the script to fit your 
          needs, it's GPL code.
        - put it with your python scripts (needs Python 2.0) and do a 
          ln -s regm.py ~/bin/regm
bugs:
- parse encoded attachment
- only tested on Linux
- not usable as a library

usage:
regm [options] [search string] [input files]

examples of use:
regm hello ~/mail/* > ~/tmp/mbox
regm -v -f ~/tmp/mbox -b hello -o -b bye ~/mail/*
regm -vx -f ~/tmp/mbox 'hello||bye'  ~/mail/*
regm -v -f ~/tmp/mbox -n -b hello ~/mail/*
regm -v -f ~/tmp/mbox -h '^to:.*@free.fr' ~/mail/*
regm -vx 't:@free.fr' > ~/tmp/mbox
regm -vN -f ~/tmp/mbox -h '^to:.*@free.fr' ~/mail/*
regm -xm 'f:joe&&hello&&!t:@free.fr'
regm -p '%40d  sub: %s\n %5B\n\n' "" ~/mail/inbox
regm -v -p '%a\n\n' '' ~/news/fr.misc.bavardages.dinosaures
regm -p '%D %T\n' "" ~/mail/sent
regm -U '' duplicate.mbox > clean.mbox

Default file path for mbox is ~/mail/*

All search strings are regexes. In expert mode (-x) you can use the "||",
"&&" and "!" operators in the same string as separators between different
regexes.

Each output message is left completely unmodified by the filter.

options:
-h string       search string in message header
-b string       search string in message body
-n              negation of the following -h or -b 
-N              global negation (invert the filter output)
-o              "or" (between -h or -b). Default is "and".
-f file         output file. There is a warning if the output file exists.
                Default output is stdout.
-x              xpert mode
                no -h or -b option, use of && || !, the search string
                must be after all options (and before input path)
                's:' is '^Subject: .*'
                'd:' is '^Date: .*'
                'f:' is '^From: .*'
                'e:' is '^Sender: .*'
                't:' is '^To: .*'
                'c:' is '^Cc: .*'
                'r:' is '^Reply-To: .*'
                'i:' is '^Message-ID: .*'
                'g:' is '^References: .*'
                'a:' is '^Approved: .*'
                'x:' is '^X-Loop: .*'
                'n:' is '^newsgroups: .*'
                'h:' is equivalent to the -h option, default is 
                        searching in body
                
-u              case sensitive. Default is case insensitive
-p string       output format, syntax : \n \t and 
                %[number][sdfetcrixnagBDFETCR] for subject, date, from,... 
                B is body, other uppercase are for stripped mail 
                addresses and D for date with "%D" format
-m              direct launch of mutt on the result (using temp output
                file)
-U              discard duplicate messages (with same message-id)
-D string       change output date format of -p "%D" option
-q              quiet
-v              verbosity
--help          this help
--version       version

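To make the expert mode (-x) syntax more concrete, here is a minimal Python sketch of how such a search string could be evaluated against one message. It is not regm's actual code: the function names match_term and match_expression are hypothetical, and the operator precedence ("||" loosest, then "&&", then "!") is an assumption, since the help text does not state it. Only the shortcut table is copied from the help output.

```python
import re

# Shortcut prefixes copied from the help text; everything else is hypothetical.
HEADER_SHORTCUTS = {
    "s:": r"^Subject: .*",
    "d:": r"^Date: .*",
    "f:": r"^From: .*",
    "e:": r"^Sender: .*",
    "t:": r"^To: .*",
    "c:": r"^Cc: .*",
    "r:": r"^Reply-To: .*",
    "i:": r"^Message-ID: .*",
    "g:": r"^References: .*",
    "a:": r"^Approved: .*",
    "x:": r"^X-Loop: .*",
    "n:": r"^newsgroups: .*",
}

# Case insensitive by default, as documented; MULTILINE lets "^" match
# at the start of every header line.
DEFAULT_FLAGS = re.IGNORECASE | re.MULTILINE


def match_term(term, header, body, flags=DEFAULT_FLAGS):
    """Evaluate one term: optional leading '!', optional shortcut prefix."""
    negate = term.startswith("!")
    if negate:
        term = term[1:]
    prefix = term[:2]
    if prefix in HEADER_SHORTCUTS:
        found = re.search(HEADER_SHORTCUTS[prefix] + term[2:], header, flags)
    elif prefix == "h:":
        found = re.search(term[2:], header, flags)
    else:
        # Default is searching in the body.
        found = re.search(term, body, flags)
    return (found is None) if negate else (found is not None)


def match_expression(expr, header, body):
    """Assumed precedence: '||' binds loosest, then '&&', then '!'."""
    return any(
        all(match_term(term, header, body) for term in group.split("&&"))
        for group in expr.split("||")
    )
```

With this sketch, match_expression('f:joe&&hello&&!t:@free.fr', header, body) is true for a message from joe whose body contains "hello" and whose To: line does not mention @free.fr. An empty search string matches every message, which is consistent with the examples that pass "" together with -p or -U. A regex that itself contains "||" or "&&" would be split incorrectly by this simple approach.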
	
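The -U option discards messages that share a message-id. The idea can be sketched with Python's standard mailbox module as below. This is an illustration only, not how regm is implemented: the function name unique_messages is hypothetical, and keeping messages that have no Message-ID header is an assumption.

```python
import mailbox


def unique_messages(mbox_path):
    """Yield messages from an mbox file, skipping repeated Message-IDs."""
    seen = set()
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            message_id = message.get("Message-ID")
            # Assumption: messages lacking a Message-ID are always kept.
            if message_id is not None:
                key = message_id.strip()
                if key in seen:
                    continue
                seen.add(key)
            yield message
    finally:
        box.close()
```

For each message-id, the first message encountered is kept and later copies are dropped, so the order of the remaining messages is preserved.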

Files

Other tools that you may consider:

  • grepmail http://grepmail.sourceforge.net
    Search mailboxes for mail matching a regular expression.
    Grepmail is a lot more powerful than regm, with many options.
  • archivemail http://archivemail.sourceforge.net
    A tool written in Python for archiving and compressing old email in mailboxes.
    Extraction based on date, gzip of the resulting mailbox, and many other useful options.