Chapter 3: Current State of the Art Combating Spam
3.4 Filtering
3.4.2 Network Based Filtering
An alternate approach to content based filtering is network level filtering, which typically comprises of a blacklist. A blacklist comprises of known bad IP addresses that have been used to send spam. A mail server uses a blacklist when it receives a connection from a sending server. Specifically, the receiving server checks the sending server’s IP address against the blacklist to determine if the sending server’s IP address is listed.. If it is listed then the mail is rejected. However, if the sending server’s IP address is not listed in the blacklist then it is inconclusive as to what the filtering system should do with messages from the sending server.
There is a wide range of different publicly available blacklists where each list has a specific focus. For example, the Composite Blocking List (CBL) [40] and Not Just Another Bogus List (NJABL-PROXY) [41] are focused on compromised machines whereas the Spamhaus [44] blacklist [6] list is focused on known spam sources that are not necessarily compromised machines.
When a receiving mail server receives an incoming SMTP connection, the receiving server can query a blacklist to determine whether the incoming mail server is listed on that blacklist or not. The query is performed by doing a reverse DNS lookup on the blacklist server. The response from the blacklist server helps the receiving server know whether to block or allow such e-mails. For example, if a receiving mail server received a connection from a sending server with an IP address of 1.2.3.4 and wanted to know
whether the sending server was listed on the Spamhaus blacklist then the receiving server would perform the following DNS query:
4.3.2.1.xbl.spamhaus.org
If the IP address is listed in the Spamhaus blacklist then the response would be
127.0.0.2
If the IP address is not listed in the Spamhaus blacklist then the response would be
NXDOMAIN.
By examining the response received from the blacklist server, the receiving server will be able to determine whether the IP is listed and whether or not they should accept the e-mail.
One problem with blacklists is that they are detection based and suffer from lag times. For example, in [42] of the 4295 spam zombie hosts IP addresses queried against the Spamhaus blacklist, only 255 were blacklisted after 46 days of continual activity. This implies that there is a lag time from when a new spam source starts to send spam, to the time when the IP address of the source is listed in the blacklist [45]. The actual criteria used to determine whether an IP gets added to the list and how long that IP remains in that list is unclear and as a result, false positives can also result from using such lists. In fact, some blacklists are poorly maintained such that legitimate users are unable to effectively use IPs once used by spammers [46]. Furthermore, since the list is detection based and zombies are known to generate a large amount of spam immediately upon infection, it is possible that hundreds of spams can get through a filter that utilize such blacklisting approaches [42]. Another problem with blacklists is that the major blacklists [39][40][41] are IP-based. In particular, since many spam zombies are known to reside
on end user machines with broadband connection, typically on dynamic IP space [14], using IP addresses to list spam zombies may not be effective solution especially if said zombies are continuously assigned new IP addresses. With thousands of new malicious sources being generated everyday, such reactive based approaches are always a step behind the attacker.
Since spammers tend to make efficient usage of their spam-related websites, it is not surprising that many spams contain similar Universal Resource Identifiers (URIs), which is a standard means of addressing resources on the web. This leads us to another type of blacklist called the URI blacklist, which lists URIs, found in spams. An example of such list is the Spam URI Realtime Blocklists (SURBL) [48].
Unfortunately, URI blacklists suffer from similar problems as IP based blacklists where the list is generated on a detection-basis and a spam containing a newly created URI will not be detected by such list. Furthermore, there is also the risk of false positives. For example, in cases such as phishing e-mails where a ‘clean’ URI is obfuscated and actually directs the user to the spammer’s site, the ‘clean’ URI may end up getting listed in that list.
Since the above-mentioned blacklists are owned by third party volunteer organizations, it makes sense to have a local whitelist that can be used in the event of a false positive. In particular, such whitelists are typically designed to allow mail from a particular IP or e-mail address to bypass all spam filtering. One issue with this is that the machine using the whitelisted IP address may become compromised giving the spammer unfiltered access to recipient’s inbox. Similarly, if the spammer knew that a particular e-
mail address was whitelisted, he or she could forge that e-mail address in their spam to ensure that the spam gets through the filter. As a result, whitelists need to be carefully maintained so that they do not get abused, which leads to additional overhead for the e- mail administrator.
For more information regarding the drawbacks of using network filtering, we refer the reader to [4].