Fighting Spam
with open source software
Charly Kühnast
Municipal Datacenter for the Lower Rhine Area Internet Infrastructure
Introduction: KRZN and spam filtering
~ 11,000 users
2 e-mails per user per day = ~ 22,000 e-mails per day
That is, about 20k e-mails that we actually want. But we get quite a lot more.
Today: 6,000,000 spam mails per day
[Chart: spam in millions per day, H1/2007 to H2/2008]
Averages
On average, ...
… 99.65% of incoming SMTP traffic is unwanted.
… we receive 5,300 spam mails per minute.
However, peaks have exceeded 25,000 spam mails per minute.
So, where does all this spam come from?
Botnets
A trojan is written to infiltrate as many PCs (and even servers) as possible.
The trojan's author then has full command over each infected machine. It is now a remote-controlled bot (or drone).
If a sizeable number of PCs have been infected, they form a botnet.
Botnets are weapons. They can...
… saturate network connections (DDoS)
… infect other systems to expand the botnet
… be used for data and identity theft
Botnets can grow very large
Several botnets with more than 1,000,000 drones exist.
They are powerful enough to cut whole countries off the internet (which happened to Estonia in 2007).
For a fistful of dollars
Anyone can rent (a part of) a botnet and make it send spam.
It's not even expensive (1 US$ per bot per day, Chinese botnets are cheaper).
Botnets generate a lot of collateral damage, but the ROI is great.
Conversion rate
Scientists at UCSD gained control over 80,000 bots (1.5%) of the “Storm” botnet and tracked its actions for 30 days.
For every mail that led to a purchase of pharmacy products, 12,500,000 mails had to be sent.
Can botnets be destroyed?
It happens, but not very often. In Nov '08, a spammer-friendly hosting provider (McColo) was shut down.
Part II
Now you know what the problem is.
Let's look at a possible solution.
[Diagram: filter chain of DNSBL, header checks, address verification, image-spam filter, anti-virus, content filter]
Spam filters are step-by-step systems.
Each step eliminates more spam.
The KRZN filter uses six steps. Open source software is used for each of them.
An e-mail that survives all filtering steps is considered clean and may proceed to its final destination.
[Diagram: the six filter steps and the software used for each]
DNSBL: Postfix / PolicyD-weight
header checks: Postfix / PolicyD-weight
Address Verification: Postfix (built-in feature)
Content Filter: SpamAssassin + ext. rulesets
Image-spam filter: FuzzyOCR
Anti-Virus: ClamAV + ext. pattern sources
DNSBL
[Diagram: mail-out.sender.net connects to my.spamfilter.net; the filter asks the DNSBL list host "Spammer?" and gets the answer "No".]
DNSBLs are very, very, very effective tools. However, they must be used with care.
Is the DNSBL provider trustworthy?
What happens when a DNSBL ceases to exist? Why not build your own DNSBL?
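How a DNSBL is asked: the lookup is an ordinary DNS query. The octets of the sender's IPv4 address are reversed and prepended to the list's zone name; if that name resolves, the address is listed. A minimal Python sketch, assuming IPv4 only. The function names and the injectable `resolve` parameter are illustrative and not part of Postfix or PolicyD-weight:

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up, e.g. 3.180.102.83.<zone> for 83.102.180.3."""
    # IPv4Address raises a ValueError subclass for anything that is not IPv4.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Return True if the DNSBL zone has an A record for this IP."""
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # Assumption: any lookup failure (including NXDOMAIN) means "not listed".
        return False
```

Treating every lookup failure as "not listed" is a deliberate choice in this sketch: if a list becomes unreachable, mail keeps flowing instead of being rejected.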
Build your own DNSBL
Set up a few e-mail accounts without any filtering. Spread these e-mail addresses.
Poll the accounts once per minute and extract the sending server's IP address.
Add the IP to your blacklist and have it removed after 48 hours if no further spam came in from this IP.
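The 48-hour rule can be sketched as a small in-memory blacklist in Python. This is only an illustration: the class name `TrapBlacklist` and the injectable clock are hypothetical, and a real setup would publish the entries through a DNS zone rather than keep them in a dictionary.

```python
import time

EXPIRY_SECONDS = 48 * 3600  # an entry lapses 48 hours after the last spam seen


class TrapBlacklist:
    """Remembers when each IP last sent spam to a trap account."""

    def __init__(self, clock=time.time):
        self._last_seen = {}
        self._clock = clock

    def report_spam(self, ip):
        # Every new spam mail restarts the 48-hour period for this IP.
        self._last_seen[ip] = self._clock()

    def is_listed(self, ip):
        seen = self._last_seen.get(ip)
        if seen is None:
            return False
        if self._clock() - seen >= EXPIRY_SECONDS:
            # No further spam within 48 hours: drop the entry.
            del self._last_seen[ip]
            return False
        return True
```

The poller would call `report_spam()` for each sending IP it extracts from the trap accounts.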
Ask more than one DNSBL
You might want to reject mails only when they are listed in more than one DNSBL.
## DNSBL settings
@dnsbl_score = (
#   HOST,               BAD SCORE,  GOOD SCORE,  LOG NAME
    'list.dsbl.org',    3.5,        0,           'DSBL_ORG',
    'cbl.abuseat.org',  3.5,        0,           'ABUSEAT',
    'sbl.hsnr.de',      3.5,        0,           'HSNR_DE',
);
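The idea behind the scores: each list contributes points, and a mail is rejected only when the total is high enough. The following Python sketch mirrors the table above. The threshold of 7.0 (two hits at 3.5 each) is an assumption made for illustration, and PolicyD-weight itself combines these scores with further checks that are not shown here.

```python
DNSBL_SCORES = [
    # (zone, score if listed, score if not listed, log name)
    ("list.dsbl.org", 3.5, 0.0, "DSBL_ORG"),
    ("cbl.abuseat.org", 3.5, 0.0, "ABUSEAT"),
    ("sbl.hsnr.de", 3.5, 0.0, "HSNR_DE"),
]

REJECT_THRESHOLD = 7.0  # assumption: reject only if at least two lists agree


def dnsbl_verdict(listed_zones, scores=DNSBL_SCORES, threshold=REJECT_THRESHOLD):
    """Return (reject, total score, log names of the lists that matched)."""
    total = 0.0
    hits = []
    for zone, bad_score, good_score, log_name in scores:
        if zone in listed_zones:
            total += bad_score
            hits.append(log_name)
        else:
            total += good_score
    return total >= threshold, total, hits
```

With these values, a sender listed only at cbl.abuseat.org scores 3.5 and passes; a second listing pushes the total to 7.0 and the mail is rejected.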
Header Checks (Postfix / PolicyD-weight)
With access to the mail headers, a policy daemon can
- throttle connections if too many mails
  - come in from the same sender
  - come in to the same recipient
- make use of
  - greylisting
  - SPF/DKIM checks
  - HELO checks
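Greylisting in a nutshell: the first delivery attempt for an unknown combination of client IP, sender and recipient is answered with a temporary error, and a retry after a waiting period is accepted. Real mail servers retry; many spam bots do not. A minimal Python sketch, with a hypothetical class name and an assumed delay of 300 seconds:

```python
import time


class Greylist:
    """Defers unknown (client IP, sender, recipient) triplets for a while."""

    def __init__(self, delay=300, clock=time.time):
        self._delay = delay          # assumed waiting period in seconds
        self._clock = clock
        self._first_seen = {}

    def check(self, client_ip, sender, recipient):
        """Return 'defer' (answer with a temporary error) or 'accept'."""
        triplet = (client_ip, sender.lower(), recipient.lower())
        now = self._clock()
        # Remember the first attempt; later attempts are compared against it.
        first = self._first_seen.setdefault(triplet, now)
        if now - first >= self._delay:
            return "accept"
        return "defer"
```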
HELO randomization (same server, different HELO):
Apr 24 12:41:11 connect from rectal.post.ru[83.102.180.3]
Apr 24 12:41:32 connect from triplex.post.ru[83.102.180.3]
Apr 24 12:42:04 connect from hole.post.ru[83.102.180.3]
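A pattern like this can be spotted by counting how many different names one IP address shows up with. The Python sketch below assumes log lines in the "connect from name[ip]" form shown above; the function names and the limit of two names per IP are hypothetical.

```python
import re
from collections import defaultdict

# Matches e.g. "connect from hole.post.ru[83.102.180.3]"
CONNECT_RE = re.compile(r"connect from (?P<name>[^\[\s]+)\[(?P<ip>[0-9.]+)\]")


def names_per_ip(log_lines):
    """Collect the set of names each client IP has been seen with."""
    seen = defaultdict(set)
    for line in log_lines:
        match = CONNECT_RE.search(line)
        if match:
            seen[match.group("ip")].add(match.group("name").lower())
    return seen


def suspicious_ips(log_lines, max_names=2):
    """Return the IPs that appeared under more than max_names different names."""
    return sorted(
        ip for ip, names in names_per_ip(log_lines).items() if len(names) > max_names
    )
```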
Header Checks
Occasionally, a spammer will use your own server's name as a HELO string...
Incidentally, I'm not
Address verification
Recipient address verification:
Mails to non-existent addresses should be rejected as early as possible.
Sender address verification:
Mails from non-existent addresses are considered bad form. However, this doesn't stop people from sending them (newsletters, order confirmations...)
Address verification
Recipient address verification is easy if you have a list of all valid addresses.
Needless to say, usually you don't, because there are lots of different mail servers in your organization.
The solution is to have your spam filter make dummy connections to the destination mail server.
[Diagram: a mail arrives at the spam filter with "To: [email protected]"; the spam filter asks the mail server "Does [email protected] exist?"]
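Such a dummy connection walks through the SMTP dialogue up to RCPT TO and then quits without sending any data. Postfix can do this by itself; the Python sketch below only illustrates the principle with the standard `smtplib` module. The function name, the three return values and the `smtp_factory` parameter are assumptions made for this example.

```python
import smtplib


def verify_recipient(address, mail_host, smtp_factory=smtplib.SMTP):
    """Ask mail_host whether it would accept mail for address.

    Returns 'exists', 'missing' or 'unknown'. No message is ever sent.
    """
    server = smtp_factory(mail_host, 25, timeout=10)
    try:
        server.helo()
        server.mail("")  # empty sender, i.e. the null address "<>"
        code, _reply = server.rcpt(address)
    finally:
        try:
            server.quit()
        except smtplib.SMTPException:
            pass  # the probe result matters, not a clean goodbye
    if 200 <= code < 300:
        return "exists"
    if 400 <= code < 500:
        return "unknown"  # temporary failure: do not reject on this basis
    return "missing"
```

Caching the answers would keep the filter from probing the destination server for every single mail.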
Content Filter
The content filter is depicted here as a single step.
SpamAssassin: hundreds of individual checks are applied to the content and structure of the e-mail.
If a check is a “hit”, its points are added to the mail's total spam score.
A spam mail, 18-Nov-08:
From: "Dickson" <[email protected]>
Subject: INVESTIGATION ON BEHALF OF OUR BANK
Date: Tue, 18 Nov 2008 11:28:20 -0000
To: undisclosed-recipients:;

Dear Sir/Madam,
I am conducting a standard process investigation on behalf of our Bank an international banking conglomerate. This investigation involves a client and also the circumstances surrounding investments made by this client with our Bank.
Our client died intestate and nominated no successor in title over the investments made with our bank. The essence of this communication with you is to request you provide us information/comment on this issue so that I can use my position in the bank to establish your eligibility to assume status of successor in title to the deceased.
...and what the content filter made of it:
X-Spam-Score: 16.376
X-Spam-Report:
* 2.0 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
*     [Blocked - see <http://www.spamcop.net/bl.shtml?217.171.129.66>]
* 0.6 SPF_SOFTFAIL SPF: sender does not match SPF record (softfail)
* 2.1 SUBJ_ALL_CAPS Subject is all capitals
* 1.6 DEAR_SOMETHING BODY: Contains 'Dear (something)'
* 0.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60%
*     [score: 0.5368]
* 0.5 RAZOR2_CHECK Listed in Razor2 (http://razor.sf.net/)
* 1.5 RAZOR2_CF_RANGE_E4_51_100 Razor2 gives engine 4 confidence level
*     above 50%
*     [cf: 100]
* 0.5 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
*     [cf: 100]
* 3.7 PYZOR_CHECK Listed in Pyzor (http://pyzor.sf.net/)
* 0.0 DIGEST_MULTIPLE Message hits more than one network digest check
* 0.8 MSOE_MID_WRONG_CASE MSOE_MID_WRONG_CASE
If the total score exceeds a “warning” threshold, the mail's subject line will be modified:
[*Spam?*] original subject line
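The scoring principle, reduced to two checks: every rule that hits adds its points, and the subject line is tagged once the total exceeds the warning threshold. This Python sketch is not how SpamAssassin implements its rules; the two tests are simplified stand-ins that borrow names and scores from the report above, and the threshold of 3.0 is an assumption.

```python
import re

RULES = [
    # (rule name, score, test on subject and body)
    ("SUBJ_ALL_CAPS", 2.1, lambda subject, body: subject.isupper()),
    ("DEAR_SOMETHING", 1.6,
     lambda subject, body: re.search(r"^\s*Dear\s+\S+", body, re.MULTILINE) is not None),
]

WARN_THRESHOLD = 3.0  # hypothetical "warning" threshold


def score_mail(subject, body, rules=RULES):
    """Return (total score, list of (rule name, score) for every rule that hit)."""
    hits = [(name, score) for name, score, test in rules if test(subject, body)]
    return sum(score for _name, score in hits), hits


def tag_subject(subject, total, threshold=WARN_THRESHOLD):
    """Prefix the subject line if the total score exceeds the threshold."""
    if total > threshold:
        return "[*Spam?*] " + subject
    return subject
```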
Content Filter
SpamAssassin comes with a large set of anti-spam rules, but you can still add more to it.
Content Filter
sa-update will fetch rules from the SpamAssassin Rule Emporium (SARE) and various other sources like
- openprotect.com
- daryl.dostech.ca
Content Filter
sa-update example:
sa-update -D --channelfile /etc/spamassassin/channels.text --gpgkeyfile /etc/spamassassin/keys.text
channels.text:
updates.spamassassin.org
saupdates.openprotect.com
70_sare_stocks.cf.sare.sa-update.dostech.net
70_sare_adult.cf.sare.sa-update.dostech.net
[...more...]
Spam containers
Spammers usually use text-only or HTML messages. But sometimes “containers” are used, such as
- Images, e.g. animated .gifs
- PDFs
- Flash
- .doc, .rtf, .ppt
- MP3
Image to text
FuzzyOCR extracts text from images and feeds it into SpamAssassin's content filter.
FuzzyOCR even works with images that are
- distorted,
- animated,
Virus Filter
With ClamAV, you can use virus patterns that you (or someone you trust) have made.
These “unofficial” pattern files can be used to catch anything, not just viruses or malware.
For example, they can be aimed at spam (surprise!), phishing and attachments that aren't exactly spam, but unwanted nonetheless.
SaneSecurity and MSRBL provide pattern files for ClamAV and a shell script (“unofficial-sigs.sh”) to download them.
Virus Filter: third-party files
rsync://rsync.sanesecurity.net/sanesecurity/phish.ndb
rsync://rsync.sanesecurity.net/sanesecurity/scam.ndb
rsync://rsync.sanesecurity.net/sanesecurity/junk.ndb
rsync://rsync.sanesecurity.net/sanesecurity/rogue.hdb
rsync://rsync.sanesecurity.net/sanesecurity/spear.ndb
rsync://rsync.sanesecurity.net/sanesecurity/spamimg.hdb
rsync://rsync.sanesecurity.net/sanesecurity/lott.ndb
rsync://rsync.sanesecurity.net/sanesecurity/spam.ldb
rsync://rsync.mirror.msrbl.com/msrbl/MSRBL-Images.hdb
rsync://rsync.mirror.msrbl.com/msrbl/MSRBL-SPAM.ndb
Your own AV patterns
HTML.Phishing.Bank-66:3:*:6c696d6974656420616363657373
HTML.Phishing.Bank-66: name (shows up in logfile)
3: file type, 3 = HTML
*: offset, * = anywhere in the file
6c696d6974656420616363657373: hex-encoded string ("limited access")
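A pattern line in this format can be assembled with a few lines of Python. This is a sketch under the assumption that the phrase is plain ASCII; the helper name `ndb_signature` is made up for this example.

```python
def ndb_signature(name, text, target_type=3, offset="*"):
    """Build a 'name:type:offset:hex' pattern line from a plain-text phrase."""
    hex_string = text.encode("ascii").hex()
    return "%s:%d:%s:%s" % (name, target_type, offset, hex_string)


def decode_pattern(hex_string):
    """Turn the hex-encoded part of a pattern back into readable text."""
    return bytes.fromhex(hex_string).decode("ascii")
```

`ndb_signature("HTML.Phishing.Bank-66", "limited access")` reproduces the pattern line shown above.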
Your own AV patterns
sigtool --md5 thisisspam.gif >> /path/to/my-patterns.hdb
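What ends up in the .hdb file is one line per file: its MD5 checksum, its size in bytes and a name. The Python sketch below builds such a line to illustrate the format; the helper name is hypothetical, and sigtool remains the tool to use in practice.

```python
import hashlib
import os


def hdb_signature(path, name=None):
    """Return an 'md5:size:name' line for the file at path."""
    with open(path, "rb") as handle:
        data = handle.read()
    digest = hashlib.md5(data).hexdigest()
    # If no name is given, fall back to the file name.
    return "%s:%d:%s" % (digest, len(data), name or os.path.basename(path))
```

A checksum pattern only matches byte-identical files, so it suits attachments that are sent out unchanged in large numbers.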
Creating pattern files against “Container spam” is even
[Diagram: filter chain of DNSBL, header checks, address verification, image-spam filter, anti-virus, content filter]
Kills 97% of incoming spam
Kills 3% of incoming spam
Thank you! Questions?