Effective Open-Source Spam Filtering

(1)

Effective Open-Source Spam Filtering for Enterprise

Chris Lewis, Thomas Choi

October 2008, VB2008, Ottawa

(2)

Agenda

• Introduction
  • Background
  • Something New - Rationale
  • The Open-Source Project
    • Basic Requirements
    • Components
    • Integration
    • Test/Performance
• Advanced Techniques

(3)

Introduction/Authors

Chris Lewis
Senior Security Analyst/Anti-Spam, Nortel
Senior Technical Advisor, MAAWG
Member, Canadian Federal Anti-Spam Task Force

Thomas Choi
Nortel
Ph.D. Student, Carleton University

(4)

Background

• Spam became a problem in 1994/1995
  • Initially in Usenet
  • Clearly would transition to email
• Commenced email anti-spam program in 1997
  • Extremely customized Lyris Mailshield implementation
• VB2004 “Corporate Spam Fighting: 5 years of success and lessons Learned” by Chris Lewis and John Morris – don't forget those lessons!

(5)

Something New - Rationale

• Lyris Mailshield has stood us in good stead
• But it is getting a little elderly: higher volumes, difficult to extend with newer techniques
• Review of many other vendor offerings:
  • All missing one or more critical features
  • Integrated poorly with existing infrastructure
  • Not, or poorly, extensible/configurable
  • Not as effective as current solution

(6)

Rationale ... Continued

• Needed open architecture/modular/easy extension
• Low capital/license cost (free obviously best!)
• Use standard components to minimize development costs
• Use existing basic low-medium size server class hardware
• Focus on 3rd party/popular filtering methodologies, simple ad-hoc filtering capabilities, plus our own “secret sauce”.

(7)

The Open Source Project

• Basic Requirements – Functional Specification
• Component Selection
• Integration
• Back end
• Testing

(8)

Basic Requirements - Filter

• Support multiple recipient domains
• Configurable per-domain handling
• Per-domain filter enable
• Configurable archiving/quarantine/disposition (pass, filter, trap)
• Output routing
• Full logging
• NEVER bounce or silent blackhole (except trap)
• Plugin architecture – each technique an […]
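As an illustration of the per-domain requirements, the handling rules could live in a small lookup table keyed by recipient domain. This is only a sketch: the `DomainPolicy` class, its field names, the `POLICIES` table and the domains are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass

# Dispositions named on the slide: pass, filter, trap.
DISPOSITIONS = {"pass", "filter", "trap"}


@dataclass(frozen=True)
class DomainPolicy:
    """Hypothetical per-domain settings; the field names are assumptions."""
    disposition: str          # one of DISPOSITIONS
    quarantine: bool = True   # keep blocked mail for later review
    route_to: str = ""        # output routing target for accepted mail

    def __post_init__(self):
        if self.disposition not in DISPOSITIONS:
            raise ValueError(f"unknown disposition: {self.disposition}")


# Example table; every domain here is a placeholder.
POLICIES = {
    "example.com": DomainPolicy("filter", route_to="mail1.example.com"),
    "legacy.example.net": DomainPolicy("pass", quarantine=False,
                                       route_to="mail2.example.net"),
    "trap.example.org": DomainPolicy("trap"),
}


def policy_for(recipient):
    """Return the policy for a recipient address, or None when the
    domain is not one this filter accepts mail for."""
    domain = recipient.rpartition("@")[2].lower()
    return POLICIES.get(domain)
```

A filter would call `policy_for()` once per recipient and let each plugin consult the result, so that enabling or disabling filtering for a domain is a data change rather than a code change.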

(9)

Basic Filter Requirements ... Continued

• Fault tolerant (eg: failover)
• Support 3rd party facilities, eg:
  • DNSBL (IP blacklists)
  • SURBL/URIBL (URI blacklists)
  • “informational” lookups (eg: ASN)
  • Content Scoring filter
  • Anti-virus
• Arbitrary ad-hoc string filters anywhere/on anything
• Direct/real-time feedback to filtering

(10)

Basic “Not filter” Requirements

• Full end-user quarantine view/forward
• End-user (recipient) notification (if desired)
• Full logs in database/arbitrary queries
• (Almost) fully automated false positive handling (forward, filter tune, notification/explanation)
• Operational and Management metrics
• Post-facto analysis and automated filter tuning

(11)

Components, Filter, Open-Source

• Core SMTP listening engine/agent: Qpsmtpd (Hansen, Sergeant et al.). 100% Perl implementation (really!)
  • Async (event driven) mode
  • Very high performance – 20M+/day on small servers
  • Entirely flexible by plugin interface
  • Actively supported & robust
  • Has many sample plugins
• SpamAssassin (popular scoring addon filter). (Perl)
• ClamAV (*ix-based) anti-virus signature-based engine
• Nearly two dozen ad-hoc filtering plugins, few more than a dozen lines.

(12)

Components, Filter, Glue

• A spam filter is more than just a filter, needs:
  • Start/stop/reboot/monitoring
  • Log & quarantine handling and transfer
  • Extended filtering heuristic processes (for things that take too long for real-time)

(13)

Components, Backend

• PostgreSQL database
• Apache (admin and user interface)
• Interface to corporate user databases (push to filters)
• Admin (research, false positive, configuration, deployment) interface CGIs
• User interfaces (configuration and quarantine)
• Quarantine management
• Rsync – log, quarantine, configuration, software

(14)

Integration

[Diagram. Labels: Internet, DMZ, QPSMTPD, Plugins, SpamAssassin, ClamAV, Spam, Non-Spam, Mail servers, PostgreSQL Spam Database, Apache, Config, Users, Rejection Notices, False Positive Reports, DNSBL, 3rd Party BL, CORWAN]

(15)

Test/Performance

• Spamtrap operating 9 months
• Performance heavily depends on “early pruning”
  • “Cheap” tests first
  • Prune filtering subsequent to block decision
  • “Expensive” (body scans, SpamAssassin, ClamAV) tests last
• Volumes: typical 7M/server (50-100/sec), mostly spamtrap
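The "early pruning" idea can be sketched as a pipeline that sorts checks by cost and stops at the first block decision. This is a minimal illustration, not the project's actual plugin ordering: the check functions, their relative costs and the sample addresses are all assumptions.

```python
def run_checks(message, checks):
    """Run checks in ascending cost order and stop at the first block.

    `checks` is a list of (name, relative_cost, function) tuples; each
    function returns True to block. Returns (blocking_check_or_None,
    names_of_checks_actually_run).
    """
    executed = []
    for name, _cost, check in sorted(checks, key=lambda c: c[1]):
        executed.append(name)
        if check(message):
            return name, executed   # blocked: skip all costlier checks
    return None, executed           # nothing blocked the message


# Hypothetical checks, cheapest to most expensive.
def on_local_blocklist(msg):
    return msg["ip"] in {"192.0.2.66"}


def bad_helo(msg):
    return "." not in msg["helo"]


def body_scan(msg):
    # Stands in for an expensive scan such as SpamAssassin or ClamAV.
    return "cheap pills" in msg["body"].lower()


CHECKS = [
    ("body_scan", 100, body_scan),
    ("blocklist", 1, on_local_blocklist),
    ("helo", 2, bad_helo),
]
```

With this ordering, a message from a blocklisted IP never reaches the body scan, which is where most of the per-message cost would otherwise be spent.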

(16)

Advanced Techniques

• State of Affairs
• Hide!
• Banner delays
• Bot fingerprinting
• DNSBLs (local and/or otherwise)
• DNSBL infrastructure
• Bounces & BATV
• Ones we've omitted and why

(17)

State of Affairs

• Underground economy (spam, phish, spyware, CC, mules) increasing
• Some LE believe larger than International Drug trade
• BOTs responsible for 80%+ of all spam.
• Most getting good at stopping BOTs (<1% deliverability)
• => BOTs shifting to reputation theft (relay through legit MTAs)
• State of Anti-Virus: disaster. (new BOT caught by AV 23% of the time by battery of 35 AV tools, only increases to 50% by 30 days)
• Inadequate AV => can’t find BOT, let alone remediate

(18)

Hide!

• Make it difficult for BOTs to email you.
• BOTs not full MTAs, high volume/throughput requirements.
• Primary MX – “refuse connections” (Google for “nolisting”)
• Tertiary MX – “always retry”
• Dumb bots try once (primary or tertiary), get refusal or retry, and give up. Real MTAs do right thing.
• As much as 50% of BOT spam simply vanishes.
• […]
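The effect of this MX layout can be illustrated with a toy simulation. It assumes, as the slide implies but does not state, that a working MX sits between the refusing primary and the always-retry tertiary; the hostnames and preference values are placeholders.

```python
# Outcomes a sending host can see when it tries one MX.
REFUSED, TEMPFAIL, ACCEPTED = "refused", "tempfail", "accepted"

# Hypothetical MX layout: (preference, host, behaviour).
MX_RECORDS = [
    (10, "mx1.example.com", REFUSED),   # primary: nothing listens on port 25
    (20, "mx2.example.com", ACCEPTED),  # the real filter (assumed)
    (30, "mx3.example.com", TEMPFAIL),  # tertiary: always says "retry later"
]


def real_mta_deliver(mx_records):
    """A standards-following MTA walks the MX list in preference order
    until one host accepts the message."""
    for _pref, host, behaviour in sorted(mx_records):
        if behaviour == ACCEPTED:
            return host
    return None  # a real MTA would queue the message and retry later


def dumb_bot_deliver(mx_records, pick="primary"):
    """A fire-and-forget bot tries exactly one MX and never retries."""
    ordered = sorted(mx_records)
    _pref, host, behaviour = ordered[0] if pick == "primary" else ordered[-1]
    return host if behaviour == ACCEPTED else None
```

Here `real_mta_deliver(MX_RECORDS)` reaches the working MX, while a bot that tries only the first or only the last entry delivers nothing.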

(19)

Banner Delays

• Most BOTs impatient, and won’t retry
• 20-40 second banner delays =>
  • BOTs give up in disgust
• Some legit MTAs equally impatient, may need to […]
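A banner delay can be sketched with Python's `asyncio` streams: hold the connection open for a configurable time before sending the 220 greeting, and give up on clients that disconnect first. This is an illustrative sketch, not the Qpsmtpd plugin used in the talk. The hostname is a placeholder, and treating a client that sends data before the banner as a failure is an extra assumption that the slide does not mention.

```python
import asyncio

BANNER = b"220 mx.example.com ESMTP\r\n"   # placeholder hostname


async def greet_after_delay(reader, writer, delay):
    """Wait `delay` seconds before sending the SMTP banner.

    Returns True if the banner was sent, False if the client hung up
    or started talking during the delay.
    """
    try:
        # A well-behaved client sends nothing before the banner. If
        # read() completes, the peer disconnected (b"") or talked early.
        await asyncio.wait_for(reader.read(1), timeout=delay)
        early = True
    except asyncio.TimeoutError:
        early = False
    if early:
        writer.close()
        return False
    writer.write(BANNER)
    await writer.drain()
    return True
```

The function can be wrapped in a handler passed to `asyncio.start_server`, with `delay` set in the 20-40 second range from the slide. Keeping the delay a parameter makes it possible to shorten it for known-impatient legitimate senders.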

(20)

BOT Fingerprinting

• Most BOTs have fingerprints in the headers and SMTP protocol that can be caught by pattern matching.
• Some mutate, some don’t.
• Srizbi > 50% of all spam.
• Feed source IP of detections back into local […]
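Fingerprinting by pattern matching might look like the sketch below. The two signatures are invented for illustration (the talk gives no actual bot fingerprints), and `local_blocklist` is a plain in-memory set standing in for wherever detections are recorded.

```python
import re

# Hypothetical signatures: name -> (session field, compiled pattern).
FINGERPRINTS = {
    "helo-is-bare-ip": ("helo", re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")),
    "fake-mailer-header": ("headers",
                           re.compile(r"^X-Mailer: ExampleBot/\d", re.M)),
}

# Source IPs of detected bots; an assumption about where they are kept.
local_blocklist = set()


def fingerprint(session):
    """Return the name of the first matching signature, or None.

    `session` is a dict with 'ip', 'helo' and 'headers' keys. On a
    match the source IP is recorded so later connections from the same
    host can be rejected by a cheap IP check.
    """
    for name, (field, pattern) in FINGERPRINTS.items():
        if pattern.search(session.get(field, "")):
            local_blocklist.add(session["ip"])
            return name
    return None
```

Because some bots mutate their fingerprints, keeping the signatures in a data table rather than in code makes them easier to update.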

(21)

DNSBL (DNS Blacklist)

• Hundreds of 3rd party DNSBLs (IP based, domain based, URIBL filtering etc)
• A handful are both reliable and effective.
• There are DNSBLs effective to 70-80%+ of all […]
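For readers unfamiliar with the mechanism, an IP-based DNSBL is queried by reversing the octets of the IPv4 address, appending the list's zone, and looking up an A record. The sketch below shows this; the zone name `dnsbl.example.org` is a placeholder, not a real list.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up: reversed IPv4 octets plus zone."""
    addr = ipaddress.IPv4Address(ip)          # raises ValueError if invalid
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def dnsbl_lookup(ip, zone):
    """Return the A record (typically 127.0.0.x) if the IP is listed,
    or None if the name does not resolve."""
    try:
        return socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return None
```

For example, checking 192.0.2.1 against the placeholder zone looks up `1.2.0.192.dnsbl.example.org`.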

(22)

DNSBL Merge

• High volume receivers may impose undue loading on 3rd party DNSBL infrastructure.
• Occasional erratic delays (including DDOS on DNSBL)
• => Host them locally
• We use rbldnsd – very high performance DNS server designed for high-performance serving of DNSBL zones.
• We combine multiple 3rd party zones (plus ones we create ourselves) into a single zone.
• Each DNSBL source distinguishable by return code, […]
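The merge step can be sketched as follows: each source list is assigned its own 127.0.0.x return code, and the combined zone maps every listed IP to the codes of all sources that list it. The source names and code assignments are hypothetical, and the sketch deliberately stops short of writing rbldnsd's zone file format.

```python
# Hypothetical assignment of one return code per source list.
SOURCES = {
    "list-a": "127.0.0.2",
    "list-b": "127.0.0.3",
    "local-bots": "127.0.0.4",
}


def merge_zones(zones):
    """Combine several lists into one zone.

    `zones` maps a source name to an iterable of IP strings. The result
    maps each IP to a sorted list of return codes, one per source that
    lists it.
    """
    merged = {}
    for source, ips in zones.items():
        code = SOURCES[source]
        for ip in ips:
            merged.setdefault(ip, set()).add(code)
    return {ip: sorted(codes) for ip, codes in merged.items()}


def sources_for(answers):
    """Translate return codes from a lookup back into source names."""
    by_code = {code: name for name, code in SOURCES.items()}
    return [by_code[a] for a in answers if a in by_code]
```

A single query against the merged zone then answers "is this IP listed, and by whom?" in one round trip instead of one query per list.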

(23)

Filtering/Bounces & BATV

• Accepting then bouncing email with forged from => bounce storms (aka backscatter/blowback) => evil
• Simple blackholing also evil
• Aim is inline reject, with remediation information.
• Support costs of receiving end of blowback often exceed spam
• BATV (Bounce Address Tag Validation) see http://mipassoc.org/batv/
• When sending email, encode bounce address (MAIL FROM) […]
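The idea behind BATV can be sketched as signing the envelope sender on the way out and verifying the tag when a bounce comes back. The code below is loosely modelled on the `prvs=` tag layout (key digit, three-digit expiry day, six hex digits of an HMAC-SHA1) described in the BATV draft; the exact HMAC input, the secret, and the expiry handling are simplifying assumptions rather than a conforming implementation.

```python
import hashlib
import hmac
import re
import time

SECRET = b"change-me"   # placeholder key; keep the real one private

_TAGGED = re.compile(r"prvs=([0-9])([0-9]{3})([0-9a-f]{6})=(.+)")


def _tag(key_num, day, address):
    """Build the 10-character tag: key digit, expiry day, truncated MAC."""
    msg = f"{key_num}{day:03d}{address}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha1).hexdigest()
    return f"{key_num}{day:03d}{digest[:6]}"


def sign(address, now=None, lifetime_days=7, key_num=0):
    """Return the tagged bounce address to use as MAIL FROM."""
    now = time.time() if now is None else now
    day = (int(now // 86400) + lifetime_days) % 1000
    return f"prvs={_tag(key_num, day, address)}={address}"


def verify(tagged, now=None):
    """Return the original address if the tag is valid and unexpired,
    otherwise None (the bounce is for mail we never sent)."""
    now = time.time() if now is None else now
    match = _TAGGED.fullmatch(tagged)
    if not match:
        return None
    key_num, day, mac, address = match.groups()
    expected = _tag(int(key_num), int(day), address)
    if not hmac.compare_digest(key_num + day + mac, expected):
        return None
    today = int(now // 86400) % 1000
    # Days are stored modulo 1000; a large forward distance means the
    # expiry day has already passed (valid for lifetimes under 500 days).
    if (int(day) - today) % 1000 > 500:
        return None
    return address
```

A bounce addressed to an untagged or badly tagged recipient can then be rejected inline, which removes backscatter from forged senders without blackholing anything.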

(24)

Omitted Techniques & Why

• Greylisting – (force retry of “new senders”).
  • Increasing reports of BOTs doing retry.
  • Doesn’t prevent spam-by-reputation-hijacking
• Bayesian – needs training, in many cases defeated
• Checksumming (Razor/DCC et al.) –
  • Detects bulk, not spam per se
  • Problematic when outsourcing user-contact (eg: HR)
  • Needs whitelisting
