Fighting Spam with open source software

(1)

Charly Kühnast

Municipal Datacenter for the Lower Rhine Area (KRZN)

Internet Infrastructure

(2)

Introduction: KRZN and spam filtering

~ 11,000 users

2 e-mails per user per day = ~ 22,000 e-mails per day

That is, roughly 20k e-mails that we actually want. But we get quite a lot more.

(3)

Today: 6,000,000 spam mails per day

[Chart: spam in millions per day, by half-year from H1/2007 to H2/2008; y-axis 0 to 7]

(4)

(5)

Averages

On average, ...

… 99.65% of incoming SMTP traffic is unwanted.

… we have 5,300 incoming spam mails per minute.

However, peaks have reached > 25,000 spams/min.

(6)

(7)

So, where does all this spam come from?

(8)

(9)

Botnets

A trojan is written to infiltrate as many PCs (and servers, even) as possible

The trojan's author then has full command over each infected machine. It is now a remote-controlled bot (or drone)

If a sizeable number of PCs have been

(10)

Botnets are weapons. They can...

… saturate network connections (DDoS)

… infect other systems to expand the botnet

… be used for data and identity theft

(11)

Botnets can grow very large

Several botnets with more than 1,000,000 drones exist.

They are powerful enough to cut whole countries off the internet (which happened to Estonia in 2007)

(12)

For a fistful of dollars

Anyone can rent (a part of) a botnet and make it send spam

It's not even expensive (1 US$ per bot per day; Chinese botnets are cheaper)

Botnets generate a lot of collateral damage, but the ROI is great

(13)

Conversion rate

Researchers at UCSD gained control over 80,000 bots (1.5%) of the “Storm” botnet and tracked its actions for 30 days.

For every mail that led to a purchase of pharmacy products,

12,500,000

(14)

Can botnets be destroyed?

It happens, but not very often. In Oct '08, a spammer-friendly hosting provider (McColo) was shut down:

(15)

Part II

Now you know what the problem is.

Let's look at a possible solution.

(16)

Spam filters are step-by-step systems. Each step eliminates more spam.

The KRZN filter uses six steps. Open source software is used for each of them:

- DNSBL
- header checks
- Address Verification
- Image-spam filter
- Anti-Virus
- Content Filter

An e-mail that survives all filtering steps is considered clean and may proceed to its final destination.
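The step-by-step idea can be sketched in a few lines of Python. This is only an illustration, not the KRZN implementation: the two step functions, the sample IP addresses and the recipient list are hypothetical stand-ins.

```python
from typing import Callable, List, Optional, Tuple

# A filter step inspects a message and returns a rejection reason,
# or None if the message may continue to the next step.
FilterStep = Callable[[dict], Optional[str]]


def run_pipeline(message: dict, steps: List[Tuple[str, FilterStep]]) -> str:
    """Pass the message through each step in order; stop at the first rejection."""
    for name, step in steps:
        reason = step(message)
        if reason is not None:
            return f"rejected by {name}: {reason}"
    return "clean"


# Hypothetical stand-ins for two of the six steps.
def dnsbl_step(message: dict) -> Optional[str]:
    blocked = {"192.0.2.66"}  # illustrative blacklist, not real data
    return "client IP is listed" if message["client_ip"] in blocked else None


def recipient_step(message: dict) -> Optional[str]:
    known = {"alice@example.org"}  # illustrative list of valid recipients
    return None if message["rcpt"] in known else "unknown recipient"


STEPS = [("DNSBL", dnsbl_step), ("address verification", recipient_step)]

print(run_pipeline({"client_ip": "192.0.2.66", "rcpt": "alice@example.org"}, STEPS))
print(run_pipeline({"client_ip": "198.51.100.7", "rcpt": "alice@example.org"}, STEPS))
```

Putting the cheap checks first means the expensive ones only ever see the mail that survived.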

(17)

- DNSBL: Postfix / PolicyD-weight
- header checks: Postfix / PolicyD-weight
- Address Verification: Postfix (built-in feature)
- Image-spam filter: FuzzyOCR
- Anti-Virus: ClamAV + ext. pattern sources
- Content Filter: SpamAssassin + ext. rulesets

(18)

(19)

DNSBL

[Diagram: mail-out.sender.net connects to my.spamfilter.net, which asks the DNSBL list host: "Spammer?" The answer in this example is "No".]

(20)

DNSBL

DNSBLs are very, very, very effective tools. However, they must be used with care.

Is the DNSBL provider trustworthy?

What happens when a DNSBL ceases to exist? Why not build your own DNSBL?
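A DNSBL is queried over plain DNS: the client's IPv4 octets are reversed, the list's zone is appended, and the resulting name is looked up. A minimal sketch, assuming an IPv4 client; the function names are made up for this example, and the IP is the one from the log excerpt later in this talk.

```python
import socket


def dnsbl_query_name(ipv4: str, zone: str) -> str:
    """Build the DNS name to look up: reversed octets followed by the list's zone."""
    octets = ipv4.split(".")
    if len(octets) != 4:
        raise ValueError(f"not an IPv4 address: {ipv4!r}")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ipv4: str, zone: str) -> bool:
    """Return True if the DNSBL answers with an A record for this client IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ipv4, zone))
        return True   # any A record (typically 127.0.0.x) means "listed"
    except socket.gaierror:
        return False  # no such name: not listed (a DNS failure looks the same here)


print(dnsbl_query_name("83.102.180.3", "cbl.abuseat.org"))
# prints: 3.180.102.83.cbl.abuseat.org
```

Note that `is_listed` cannot tell "not listed" from "list is gone", which is exactly why a vanished DNSBL is a risk.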

(21)

Build your own DNSBL

Set up a few e-mail accounts without any filtering. Spread these e-mail addresses

Poll the accounts once per minute and extract the sending server's IP address

Add the IP to your blacklist and have it removed after 48 hours, if no further spam from this IP came in
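The bookkeeping behind such a home-made blacklist might look like the sketch below. Polling the trap accounts and publishing the list via DNS are left out; the class and method names are hypothetical.

```python
import time
from typing import Dict, Optional

EXPIRY_SECONDS = 48 * 3600  # 48 hours, as described above


class Blacklist:
    """Keeps the time of the most recent spam seen from each IP."""

    def __init__(self) -> None:
        self.last_seen: Dict[str, float] = {}

    def report_spam(self, ip: str, now: Optional[float] = None) -> None:
        # Each new spam from the same IP restarts its 48-hour clock.
        self.last_seen[ip] = time.time() if now is None else now

    def expire(self, now: Optional[float] = None) -> None:
        # Drop entries that saw no further spam during the last 48 hours.
        now = time.time() if now is None else now
        self.last_seen = {
            ip: seen for ip, seen in self.last_seen.items()
            if now - seen < EXPIRY_SECONDS
        }

    def is_listed(self, ip: str) -> bool:
        return ip in self.last_seen
```

Calling `expire()` from the same job that polls the trap accounts keeps the list current without a separate cleanup process.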

(22)

Ask more than one DNSBL

You might want to reject mails only when they are listed in more than one DNSBL.

## DNSBL settings
@dnsbl_score = (
#   HOST,               BAD SCORE, GOOD SCORE, LOG NAME
    'list.dsbl.org',    3.5,       0,          'DSBL_ORG',
    'cbl.abuseat.org',  3.5,       0,          'ABUSEAT',
    'sbl.hsnr.de',      3.5,       0,          'HSNR_DE',
);
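How such scores add up to a decision can be sketched as follows. This is not policyd-weight's own code; the table mirrors the settings above, and the threshold of 5.0 is a hypothetical value chosen so that one listing is not enough but two are.

```python
from typing import Set

# (zone, bad score, good score), mirroring the layout of the settings above.
DNSBL_SCORES = [
    ("list.dsbl.org", 3.5, 0.0),
    ("cbl.abuseat.org", 3.5, 0.0),
    ("sbl.hsnr.de", 3.5, 0.0),
]

REJECT_THRESHOLD = 5.0  # hypothetical: one listing (3.5) stays below, two (7.0) exceed it


def total_score(listed_in: Set[str]) -> float:
    """Add the bad score for each list that has the client, the good score otherwise."""
    return sum(bad if zone in listed_in else good for zone, bad, good in DNSBL_SCORES)


def should_reject(listed_in: Set[str]) -> bool:
    return total_score(listed_in) >= REJECT_THRESHOLD


print(should_reject({"cbl.abuseat.org"}))                 # one listing
print(should_reject({"cbl.abuseat.org", "sbl.hsnr.de"}))  # two listings
```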

(23)

Header checks: Postfix / PolicyD-weight

(24)

Header Checks

With access to the mail headers, a policy daemon can

- throttle connections if too many mails
  - come in from the same sender
  - come in to the same recipient
- make use of
  - greylisting
  - SPF/DKIM checks
  - HELO checks
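Of the techniques listed, greylisting is the easiest to show in code: the first delivery attempt for an unknown (client IP, sender, recipient) triplet is answered with a temporary error, and only a retry after some delay is accepted. A minimal sketch; the 300-second delay and all names are assumptions for illustration.

```python
from typing import Dict, Tuple

GREYLIST_DELAY = 300  # seconds a new triplet has to wait; hypothetical value


class Greylist:
    """Remembers when each (client IP, sender, recipient) triplet was first seen."""

    def __init__(self) -> None:
        self.first_seen: Dict[Tuple[str, str, str], float] = {}

    def check(self, client_ip: str, sender: str, recipient: str, now: float) -> str:
        triplet = (client_ip, sender, recipient)
        first = self.first_seen.setdefault(triplet, now)
        if now - first < GREYLIST_DELAY:
            return "defer"   # the server would answer with a temporary 4xx error
        return "accept"      # the sender retried after the delay, like a real MTA
```

The technique relies on regular mail servers retrying after a temporary error while many spam-sending bots do not.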

(25)

Header Checks

HELO randomization (same server, different HELO):

Apr 24 12:41:11 connect from rectal.post.ru[83.102.180.3]
Apr 24 12:41:32 connect from triplex.post.ru[83.102.180.3]
Apr 24 12:42:04 connect from hole.post.ru[83.102.180.3]

Occasionally, a spammer will use your own server's name as a HELO string...

Incidentally, I'm not
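Both HELO tricks can be spotted mechanically. The sketch below counts the distinct names one IP presents in log lines like the excerpt above, and checks whether a HELO string claims to be our own server. The regular expression, the limit of two names and the function names are assumptions for illustration; `my.spamfilter.net` is the example filter host used in this talk.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set

LOG_LINES = [
    "Apr 24 12:41:11 connect from rectal.post.ru[83.102.180.3]",
    "Apr 24 12:41:32 connect from triplex.post.ru[83.102.180.3]",
    "Apr 24 12:42:04 connect from hole.post.ru[83.102.180.3]",
]

CONNECT_RE = re.compile(r"connect from (?P<name>\S+)\[(?P<ip>[0-9.]+)\]")


def names_per_ip(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Collect every name that each client IP has presented."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        match = CONNECT_RE.search(line)
        if match:
            seen[match.group("ip")].add(match.group("name"))
    return seen


def randomizing_ips(lines: Iterable[str], max_names: int = 2) -> List[str]:
    """IPs that showed up under more than max_names different names."""
    return [ip for ip, names in names_per_ip(lines).items() if len(names) > max_names]


def claims_to_be_us(helo: str, own_names: Set[str]) -> bool:
    """True if a remote client greets us with one of our own host names."""
    return helo.lower().rstrip(".") in own_names


print(randomizing_ips(LOG_LINES))
print(claims_to_be_us("My.Spamfilter.net", {"my.spamfilter.net"}))
```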

(26)

(27)

Address verification

Recipient address verification:

Mails to non-existent addresses should be rejected as early as possible.

Sender address verification:

Mails from non-existent addresses are considered bad form. However, this doesn't stop people from sending them (newsletters, order confirmations...)

(28)

Address verification

Recipient address verification is easy if you have a list of all valid addresses.

Needless to say, usually you don't, because there are lots of different mail servers in your organization.

The solution is to have your spam filter make dummy connections to the destination mail server.
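Such a dummy connection is an SMTP session that stops right after RCPT TO, so no mail is ever sent. In the KRZN setup this is a built-in Postfix feature; the Python sketch below only illustrates the idea. The function names, the port and the timeout are assumptions.

```python
import smtplib


def classify_rcpt_reply(code: int) -> str:
    """Map the destination server's reply to RCPT TO onto a filter decision."""
    if 200 <= code < 300:
        return "accept"   # the mailbox exists
    if 500 <= code < 600:
        return "reject"   # permanent error, e.g. 550 user unknown
    return "defer"        # temporary error: let the sender retry later


def probe_recipient(mail_server: str, recipient: str) -> str:
    """Open a dummy SMTP session and stop after RCPT TO; no mail is sent."""
    with smtplib.SMTP(mail_server, 25, timeout=10) as smtp:
        smtp.helo()
        smtp.mail("")  # empty envelope sender, i.e. MAIL FROM:<>
        code, _reply = smtp.rcpt(recipient)
    return classify_rcpt_reply(code)
```

Caching the answers for a while would keep the filter from probing the destination server for every single mail.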

(29)

[Diagram: a mail "To: [email protected]" arrives at the spam filter, which asks the mail server: "Does [email protected] exist?"]

(30)

(31)

Content Filter

The content filter is depicted here as a single step.

(32)

SpamAssassin: hundreds of individual checks are applied to the content and structure of the e-mail.

Each check that “hits” adds points to the mail's total spam score.
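The scoring principle fits in a few lines. The two rule names and their points are borrowed from the SpamAssassin report for the sample mail in this talk; the tests themselves are simplified guesses and not SpamAssassin's real rule definitions.

```python
import re
from typing import List, Tuple

# (rule name, points, test). The tests are simplified stand-ins for illustration.
RULES = [
    ("SUBJ_ALL_CAPS", 2.1, lambda msg: msg["subject"].isupper()),
    ("DEAR_SOMETHING", 1.6,
     lambda msg: re.search(r"^Dear \S", msg["body"], re.MULTILINE) is not None),
]


def spam_score(msg: dict) -> Tuple[float, List[str]]:
    """Run every rule; each hit adds its points to the total."""
    hits = [(name, points) for name, points, test in RULES if test(msg)]
    return sum(points for _, points in hits), [name for name, _ in hits]


score, hit_names = spam_score({
    "subject": "INVESTIGATION ON BEHALF OF OUR BANK",
    "body": "Dear Sir/Madam,\nI am conducting a standard process investigation",
})
print(round(score, 1), hit_names)
```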

(33)

A spam mail, 18-Nov-08:

From: "Dickson"<[email protected]>
Subject: INVESTIGATION ON BEHALF OF OUR BANK
Date: Tue, 18 Nov 2008 11:28:20 -0000
To: undisclosed-recipients:;

Dear Sir/Madam,

I am conducting a standard process investigation on behalf of our Bank an international banking conglomerate. This investigation involves a client and also the circumstances surrounding investments made by this client with our Bank.

Our client died intestate and nominated no successor in title over the investments made with our bank. The essence of this communication with you is to request you provide us information/comment on this issue so that I can use my position in the bank to establish your eligibility to assume status of successor in title to the deceased.

(34)

...and what the content filter made of it:

X-Spam-Score: 16.376
X-Spam-Report:
* 2.0 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
*     [Blocked - see <http://www.spamcop.net/bl.shtml?217.171.129.66>]
* 0.6 SPF_SOFTFAIL SPF: sender does not match SPF record (softfail)
* 2.1 SUBJ_ALL_CAPS Subject is all capitals
* 1.6 DEAR_SOMETHING BODY: Contains 'Dear (something)'
* 0.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60%
*     [score: 0.5368]
* 0.5 RAZOR2_CHECK Listed in Razor2 (http://razor.sf.net/)
* 1.5 RAZOR2_CF_RANGE_E4_51_100 Razor2 gives engine 4 confidence level
*     above 50%
*     [cf: 100]
* 0.5 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
*     [cf: 100]
* 3.7 PYZOR_CHECK Listed in Pyzor (http://pyzor.sf.net/)
* 0.0 DIGEST_MULTIPLE Message hits more than one network digest check
* 0.8 MSOE_MID_WRONG_CASE MSOE_MID_WRONG_CASE

(35)

Content Filter

If the total score exceeds a “warning” threshold, the mail's subject line will be modified:

[*Spam?*] original subject line
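The subject rewrite itself is a one-line decision. In this sketch the threshold of 5.0 is a hypothetical value, since the talk does not state the one actually used, and the function name is made up.

```python
WARNING_THRESHOLD = 5.0  # hypothetical; the real KRZN value is not given
TAG = "[*Spam?*]"


def tag_subject(subject: str, score: float) -> str:
    """Prefix the subject once if the score exceeds the warning threshold."""
    if score > WARNING_THRESHOLD and not subject.startswith(TAG):
        return f"{TAG} {subject}"
    return subject


print(tag_subject("INVESTIGATION ON BEHALF OF OUR BANK", 16.376))
# prints: [*Spam?*] INVESTIGATION ON BEHALF OF OUR BANK
```

Checking for an existing tag avoids stacking prefixes when a mail passes the filter twice.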

(36)

Content Filter

SpamAssassin comes with a large set of anti-spam rules, but you can still add more to it.

sa-update will fetch rules from the SpamAssassin Rule Emporium (SARE) and various other sources like

- openprotect.com
- daryl.dostech.ca

(37)

Content Filter

sa-update example:

sa-update -D --channelfile /etc/spamassassin/channels.text --gpgkeyfile /etc/spamassassin/keys.text

channels.text:

updates.spamassassin.org
saupdates.openprotect.com
70_sare_stocks.cf.sare.sa-update.dostech.net
70_sare_adult.cf.sare.sa-update.dostech.net
[...more...]

(38)

(39)

Spam containers

Spammers usually use text-only or HTML messages. But sometimes “containers” are used, such as

- Images, e.g. animated .gifs
- PDFs
- Flash
- .doc, .rtf, .ppt
- MP3

(40)

(41)

Image to text

FuzzyOCR extracts text from images and feeds it into SpamAssassin's content filter.

FuzzyOCR even works with images that are

- distorted,
- animated,
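Text recognized from a distorted image usually contains misread characters, so exact word matching would miss it. The sketch below shows approximate matching with Python's `difflib`; it is not FuzzyOCR's own algorithm, and the word list and the 0.8 threshold are assumptions for illustration.

```python
from difflib import SequenceMatcher
from typing import List

SPAM_WORDS = ["viagra", "pharmacy", "lottery"]  # illustrative word list


def fuzzy_hits(ocr_text: str, threshold: float = 0.8) -> List[str]:
    """Return the list words that some token in the OCR output closely resembles."""
    hits = []
    for token in ocr_text.lower().split():
        for word in SPAM_WORDS:
            if SequenceMatcher(None, token, word).ratio() >= threshold:
                hits.append(word)
                break  # one match per token is enough
    return hits


print(fuzzy_hits("cheap v1agra from our pharrnacy"))
```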

(42)

(43)

(44)

(45)

Virus Filter

With ClamAV, you can use virus patterns that you, or someone you trust, have written.

These “unofficial” pattern files can be used to catch anything, not just viruses or malware.

For example, they can be aimed at spam (surprise!), phishing and attachments that aren't exactly spam, but unwanted nonetheless.

(46)

Virus Filter: third-party files

SaneSecurity and MSRBL provide pattern files for ClamAV and a shell script (“unofficial-sigs.sh”) to download them.

rsync://rsync.sanesecurity.net/sanesecurity/phish.ndb
rsync://rsync.sanesecurity.net/sanesecurity/scam.ndb
rsync://rsync.sanesecurity.net/sanesecurity/junk.ndb
rsync://rsync.sanesecurity.net/sanesecurity/rogue.hdb
rsync://rsync.sanesecurity.net/sanesecurity/spear.ndb
rsync://rsync.sanesecurity.net/sanesecurity/spamimg.hdb
rsync://rsync.sanesecurity.net/sanesecurity/lott.ndb
rsync://rsync.sanesecurity.net/sanesecurity/spam.ldb
rsync://rsync.mirror.msrbl.com/msrbl/MSRBL-Images.hdb
rsync://rsync.mirror.msrbl.com/msrbl/MSRBL-SPAM.ndb

(47)

(48)

Your own AV patterns

HTML.Phishing.Bank-66:3:*:6c696d6974656420616363657373

HTML.Phishing.Bank-66: name (shows up in logfile)

3: file type, 3 = HTML

*: Offset

6c696d6974656420616363657373: hex-encoded string
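The hex string is simply the ASCII text to be matched. A small helper, hypothetical and for illustration only, that assembles a line in the name:type:offset:hex layout explained above:

```python
def ndb_signature(name: str, text: str, target_type: int = 3, offset: str = "*") -> str:
    """Build a pattern line: name, file type, offset, hex-encoded search string."""
    return f"{name}:{target_type}:{offset}:{text.encode('ascii').hex()}"


print(ndb_signature("HTML.Phishing.Bank-66", "limited access"))
# prints: HTML.Phishing.Bank-66:3:*:6c696d6974656420616363657373
```

So the example pattern fires on any HTML part that contains the phrase "limited access".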

(49)

Your own AV patterns

sigtool --md5 thisisspam.gif >> /path/to/my-patterns.hdb

Creating pattern files against “Container spam” is even
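For reference, the line that `sigtool --md5` writes has the form hash:size:filename. The Python sketch below reproduces that layout with the standard library; the function name is made up, and in practice sigtool itself should be used.

```python
import hashlib
from pathlib import Path


def hdb_line(path: str) -> str:
    """Return an .hdb-style entry: MD5 of the file, its size in bytes, its name."""
    data = Path(path).read_bytes()
    return f"{hashlib.md5(data).hexdigest()}:{len(data)}:{Path(path).name}"
```

Because the pattern is a hash of the whole file, it only catches byte-identical copies of that file.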

(50)

DNSBL header checks Address Verification Image-spam filter Anti-Virus Content Filter

Kills 97% of incoming spam

Kills 3% of incoming spam

(51)

(52)

Thank you! Questions?