Objectionable Content Filtering

The Objectionable Content Filter (OCF) defines a list of keywords. A message is blocked if any of those words appears in it.

The Objectionable Content Filter provides enhanced content filtering functionality and flexibility, allowing users to restrict content of any form including objectionable words or phrases, offensive content and/or confidential information.

This list is managed by the end user and can be updated and customized to meet the specific needs of any organization. Rules can also be applied to both inbound and outbound messages, preventing unwanted content from entering an organization and prohibiting the release of sensitive information.

OCF can still detect listed words when a message disguises them by substituting look-alike characters or inserting separators.

For example, OCF will detect the word "spam", even if it is disguised as "sp@m" or "s_p_a_m".
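This guide does not describe how OCF undoes such disguises. As a rough illustration only, the sketch below assumes two simple steps: map common look-alike characters back to letters, then drop everything that is not a letter. The substitution table and both function names are hypothetical and are not ePrism's actual algorithm.

```python
import re

# Hypothetical look-alike substitutions; the real OCF rules are not documented here.
LOOKALIKES = str.maketrans({"@": "a", "0": "o", "1": "i", "3": "e", "$": "s"})

def normalize(token: str) -> str:
    """Lower-case a token, undo look-alike characters, and strip non-letters."""
    token = token.lower().translate(LOOKALIKES)
    return re.sub(r"[^a-z]", "", token)

def contains_blocked_word(message: str, blocked_words: list[str]) -> bool:
    """Return True if any whitespace-separated token normalizes to a blocked word."""
    blocked = {word.lower() for word in blocked_words}
    return any(normalize(token) in blocked for token in message.split())
```

With this sketch, both "sp@m" and "s_p_a_m" normalize to "spam". Multi-word phrases and letters separated by spaces are not handled.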

Select Mail Delivery -> Anti-Spam -> OCF to configure the objectionable content filter.

Actions

You can set actions for both inbound and outbound messages. The following actions can be set:

• Just log — Log the event and take no further action.

• Reject mail — The message is rejected with notification to the sending system.

• Quarantine mail — The message is placed into quarantine.

• Discard mail — The message is discarded without notification to the sending system.

Notifications

Notifications for inbound and outbound messages can be enabled for all recipients, the sender, and the administrator. The content for the Inbound and Outbound notification can be customized.

See “Customizing Notification and Annotation Messages” on page 273 for a full list of system variables that can be used in the notification.

Upload and Download Filter List

A predefined list of objectionable words is included with the ePrism Email Security Appliance.

To customize the list and to add or remove words, click Download File to download the list to a local system.

Use a text editor to edit the file using one word or phrase per line. When finished, upload the file by clicking the Upload File button.

RBL (Real-time Blackhole List)

RBLs contain the addresses of known sources of spam and are maintained by both commercial and non-commercial organizations. The RBL mechanism is based on DNS. Every server that attempts to connect to ePrism will be looked up on the specified RBL servers using DNS. If the server is blacklisted, then a configurable action can be taken, such as rejecting the mail, or flagging the message in its header or subject.
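As an illustration of the DNS-based lookup, the sketch below builds the query name in the conventional DNS blocklist format (the IPv4 octets reversed, followed by the list's zone) and treats any successful answer as "listed". The zone rbl.example.org and the function names are placeholders. How ePrism itself issues and interprets these queries is not described in this guide.

```python
import ipaddress
import socket

def rbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name used to look up an IPv4 address in a DNS-based list."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

def is_listed(ip: str, zone: str) -> bool:
    """Return True if the zone answers the query.

    Any lookup failure (including a time-out) is treated as "not listed"
    in this sketch.
    """
    try:
        socket.gethostbyname(rbl_query_name(ip, zone))
        return True
    except OSError:
        return False
```

For example, `rbl_query_name("192.0.2.99", "rbl.example.org")` yields `99.2.0.192.rbl.example.org`.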

Note the following considerations when using RBL:

• If the RBL server is not available, the DNS request times out. This may affect performance and requires monitoring for timed-out connections. Remove any servers which you do not use to prevent time-outs.

• If a message that you want to receive is blocked by an RBL, add an item to the Pattern Based Message Filtering list to "Trust" (train for STA) or "Accept" (do not train for STA) this message.

• Choose your RBLs carefully. St. Bernard provides a default server, but we recommend you review RBL providers (both commercial and free) as some servers are more reliable than others, while some may not exist after a certain period of time. It is recommended for stability and accuracy that a commercial RBL service be used.

Caution: The default RBL server in ePrism (rbl-plus.mail-abuse.org) is a commercial RBL provider. To work properly, you must purchase a subscription to this service.

Configuring RBLs

Select Mail Delivery -> Anti-Spam from the menu. Click Realtime Blackhole List (RBL) to configure RBLs.

• Enable RBLs — Select this check box to enable RBLs.

• Check Relays — This setting addresses spammers who relay their messages, usually illegally, through an intermediate server. Information about the originating server is carried in the message headers, which ePrism checks against the RBL. For example, set Check Relays to "2" for ePrism to check the last two relays.

• Action — Specify one of the following actions:

Just log: An entry is made in the log, and no other action is taken.

Modify Subject Header: The text specified in Action Data will be inserted into the message subject line.

Add header: An "X-" mail header will be added as specified in the Action Data.

Redirect to: The message will be delivered to the mail address specified in Action Data.

Reject mail: The mail will not be accepted, and the connecting mail server is forced to return it.

BCC: The message will be copied to the mail address specified in Action Data.

• Action data — Depending on the specified action:

Modify Subject Header: The specified text will be inserted into the subject line, such as [RBL].

Add header: A message header will be added with the specified text, such as [RBL].

Redirect to: Send the message to a mailbox such as [email protected]. You can also specify a domain such as spam.example.com.

Note: The Add header field can be left blank, if required. If you specify a header such as [RBL], the header will be written as "X-Reject: [RBL]". If you use the form RBL:[RBL_List], the header will be written as "X-RBL:[RBL_List]".
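The two header forms in the note above can be sketched as follows. This is an illustration of the described output only. The function name is hypothetical, and the result for a blank field is not stated in this guide, so the sketch does not try to model it.

```python
def format_added_header(action_data: str) -> str:
    """Render non-blank Add header action data as described in the note."""
    name, separator, value = action_data.partition(":")
    if separator and name.strip():
        # "NAME:VALUE" form: the name is prefixed with "X-".
        return f"X-{name.strip()}:{value}"
    # No explicit name: the text becomes the value of an X-Reject header.
    return f"X-Reject: {action_data}"
```

Here `format_added_header("[RBL]")` gives `X-Reject: [RBL]`, and `format_added_header("RBL:[RBL_List]")` gives `X-RBL:[RBL_List]`.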

RBL Domains

Click Edit to modify the list of your RBL domain servers. Click Update when finished.

Caution: The default RBL server in ePrism (rbl-plus.mail-abuse.org) is a commercial RBL provider. To work properly, you must purchase a subscription to this service.
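The Check Relays setting described under Configuring RBLs depends on relay addresses recorded in the message headers. The sketch below shows one way such addresses could be collected. It assumes that relays are read from the Received headers, that the newest headers appear first, and that each relay address is written in square brackets. These are assumptions about typical mail headers, not a description of ePrism's parser, and the function name is hypothetical.

```python
import re
from email import message_from_string

# Matches an IPv4 address written in square brackets, e.g. [203.0.113.5].
IP_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")

def last_relays(raw_message: str, check_relays: int) -> list[str]:
    """Return the IPv4 addresses found in the newest `check_relays` Received headers."""
    message = message_from_string(raw_message)
    received = message.get_all("Received") or []
    addresses = []
    for header in received[:check_relays]:
        match = IP_IN_BRACKETS.search(header)
        if match:
            addresses.append(match.group(1))
    return addresses
```

Each returned address could then be looked up on the configured RBL servers in the same way as the connecting server.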

DCC (Distributed Checksum Clearinghouse)

DCC is based on a number of servers that maintain databases of message checksums derived from numeric values that uniquely identify a message. DCC provides a simple but very effective way to successfully identify spam and control its disposition while updating its database with new spam message types.

Mail users and ISPs all over the world submit checksums of all messages received. The database records how many of each message is submitted. If requested, the DCC server can return a count of how many instances of a message have been received. ePrism uses this count to determine the disposition of a message.

A DCC server receives no mail, address, headers, or any similar information, but only the cryptographically secure checksums of such information. A DCC server cannot determine the text or other information that corresponds to the checksums it receives. It only acts as a clearinghouse of counts of checksums computed by clients.
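The clearinghouse idea can be illustrated with a toy model in which the client sends only a checksum and the server only counts how often it has seen that checksum. The sketch is not the DCC protocol: the Clearinghouse class, the whitespace normalization, and the use of SHA-256 are assumptions made for the example, and real DCC clients compute their own checksum types.

```python
import hashlib
from collections import Counter

class Clearinghouse:
    """Toy stand-in for a DCC server: it stores checksum counts and nothing else."""

    def __init__(self) -> None:
        self._counts = Counter()

    def report(self, checksum: str) -> int:
        """Record one sighting and return the total count for this checksum."""
        self._counts[checksum] += 1
        return self._counts[checksum]

def body_checksum(body: str) -> str:
    """Checksum of the lower-cased, whitespace-normalized body (SHA-256 is an assumption)."""
    normalized = " ".join(body.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Because the server sees only the hex digest, it cannot recover the message text. It can only report how many times that digest has been submitted.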

DCC interacts with ePrism’s other spam controls as follows:

• Mail is checked by DCC after it has been filtered by Specific Access Patterns and Pattern Based Message Filters. Messages that trigger an "accept" rule will not be processed by DCC.

• All messages classified as "bulk" by DCC (those that exceed the locally set threshold) are passed to the STA engine for analysis as spam unless the specified action is "reject".

Note: You must allow a connection on UDP port 6277 on your firewall or router to allow communications with a DCC server. If this port is not available, DCC server calls will fail and slow down mail delivery.

DCC Considerations

When implementing DCC, consider the following:

• Educate your user community about this tool and ask them to submit mailing lists and other bulk mail sources that need to be whitelisted. This step is crucial if DCC and STA are to work properly.

• Configure your initial disposition for bulk mail to be Modify Subject Header. Users will see all the bulk mail and will quickly identify any sources of mail they want to whitelist. Users can also create local filter rules in their mail clients to put all tagged mail into a folder.

Configuring DCC

Select Mail Delivery -> Anti-Spam on the menu, and then DCC to configure Distributed Checksum Clearinghouse.

Threshold Settings

The threshold is used to determine what should happen to mail when it has been classified.

• If bulk exceeds — DCC returns a number showing how many times the message has been identified. This can be zero (unique and therefore not bulk) or another number, such as 1352, indicating that the message has been reported 1351 prior times.

It may also return the value "many". This is a special DCC value returned when DCC has seen a message in such volume and at such frequency that it is almost certainly "bulk".

For DCC to be useful, you need to specify a threshold that will trigger an action. It is

• Action — The action can be one of the following:

Just log: An entry is made in the log, and no other action is taken.

Modify Subject Header: The text specified in Action Data will be inserted into the message subject line.

Add header: An "X-" mail header will be added as specified in the Action Data.

Redirect to: The message will be delivered to the mail address specified in Action Data.

Reject mail: The mail will not be accepted, and the connecting mail server is forced to return it.

BCC: The message will be copied to the mail address specified in Action Data.

• Action data — Depending on the specified action:

Modify Subject Header: The specified text will be inserted into the subject line, such as [DCC_BULK].

Add header: A message header will be added with the specified text, such as [DCC_BULK].

Redirect to: Send the message to a mailbox such as [email protected]. You can also specify a domain such as spam.example.com.

Note: The Add header field can be left blank, if required. If you specify a header such as [DCC_BULK], the header will be written as "X-Reject: [DCC_BULK]". If you use the form DCC_REJECT:[BULK], the header will be written as "X-DCC_REJECT:[BULK]".
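Returning to the If bulk exceeds setting under Threshold Settings, the comparison between the reported count and the local threshold can be sketched as below. The function name is hypothetical. Treating "many" as above every numeric threshold and using a strict "greater than" comparison are assumptions drawn from the wording "exceeds".

```python
from typing import Union

def exceeds_bulk_threshold(reported: Union[int, str], threshold: int) -> bool:
    """Decide whether a reported DCC count should trigger the configured action."""
    if isinstance(reported, str):
        if reported.strip().lower() == "many":
            return True  # assumed to rank above any numeric threshold
        reported = int(reported)
    return reported > threshold
```

A count of 0 (a unique message) never triggers the action, while a count of 1352 does for any threshold below 1352.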

DCC Trusted and Blocked List

You can create exceptions to DCC’s bulk classifications by using the Trusted and Blocked List. In many cases, it may be easier to specify such exceptions using Pattern Based Message Filters, in which case the mail bypasses both DCC and STA.

Note: In most cases, use the Pattern Based Message Filter menu for creating exceptions.

The DCC trusted and blocked list feature is useful for removing legitimate bulk mail, such as mailing lists, from consideration as bulk while letting it be scanned by STA for spam characteristics.

Click Edit to add entries to the Trusted and Blocked lists.

DCC Servers

The default DCC servers supplied will cover most cases and should not be changed without careful consideration.

Click Edit in the DCC Servers section to configure your DCC server settings, if required.

STA (Statistical Token Analysis)

STA is a sophisticated method of identifying spam based on statistical analysis of mail content.

Simple text matches can lead to false positives because a word or phrase can have many meanings depending on the context. STA provides a way to accurately measure how likely any particular message is to be spam without having to specify every word and phrase.

STA achieves this by deriving a measure of a word or phrase contributing to the likelihood of a message being spam. This is based on the relative frequency of words and phrases in a large number of spam messages. From this analysis, it creates a table of "discriminators" (words associated with spam) and associated measures of how likely a message is spam.

When a new incoming message is received, STA analyzes the message, extracts the discriminators (words and phrases), finds their measures from the table, and aggregates these measures to produce a spam metric for the message.
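The idea of deriving a per-token measure from relative frequencies can be sketched as follows. The formula is an assumption chosen for illustration (the share of a token's occurrence rate that comes from spam, scaled to 0-100), and it compares spam against legitimate mail. The guide does not give the formula STA actually uses, and the function name is hypothetical.

```python
from collections import Counter

def token_measures(spam_msgs: list[str], legit_msgs: list[str]) -> dict[str, float]:
    """Score each token 0-100 by how strongly it is associated with spam."""
    # Count each token at most once per message.
    spam_counts = Counter(tok for msg in spam_msgs for tok in set(msg.split()))
    legit_counts = Counter(tok for msg in legit_msgs for tok in set(msg.split()))
    measures = {}
    for token in spam_counts.keys() | legit_counts.keys():
        spam_rate = spam_counts[token] / max(len(spam_msgs), 1)
        legit_rate = legit_counts[token] / max(len(legit_msgs), 1)
        measures[token] = 100 * spam_rate / (spam_rate + legit_rate)
    return measures
```

A token seen only in spam scores 100, a token seen only in legitimate mail scores 0, and a token that is equally common in both scores 50.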

STA uses three sources of data to build its run-time database:

• The initial tables supplied by St. Bernard based on analysis of known spam.

• Tables derived from an analysis of local legitimate mail. This is referred to as "local learning" or "training".

• Mail identified as "bulk" by DCC is also analyzed to provide an example of local spam.

How STA Works

Consider the following simple message:

---
Subject: Get rich quick!!!!

Click on http://getrichquick.com to earn millions!!!!!
---

STA will break the message down into the following tokens:

Get

earn

millions!!!!!

Each token is looked up in the database and a metric is retrieved. The token "Click" has a high measure of 91, whereas the word "to" is neutral (indicating neither spam nor legitimate mail).

These measures are aggregated using statistical methods to give the overall score for the message of 98. Based on the resulting cumulative score, the message can then be rejected, quarantined, annotated, or forwarded according to how the local threshold is set.
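The guide does not name the statistical method used to aggregate the measures. The sketch below uses a naive-Bayes style combination as a stand-in, with tokens split on whitespace and unknown tokens treated as neutral (50). Apart from the measure of 91 for "Click" quoted above, any values in the table passed to it are hypothetical, and the sketch is not expected to reproduce the score of 98 from the example.

```python
import math

NEUTRAL = 50.0  # assumed measure for tokens with no evidence either way

def spam_score(message: str, measures: dict[str, float]) -> float:
    """Combine per-token measures (0-100) into a single 0-100 score."""
    log_spam = 0.0
    log_legit = 0.0
    for token in message.split():
        # Clamp so that one token can never force the score to exactly 0 or 100.
        p = min(max(measures.get(token, NEUTRAL) / 100.0, 0.01), 0.99)
        log_spam += math.log(p)
        log_legit += math.log(1.0 - p)
    # Cap the exponent to avoid overflow in exp() for very long messages.
    difference = min(log_legit - log_spam, 700.0)
    return 100.0 / (1.0 + math.exp(difference))
```

Neutral tokens leave the score unchanged, so a message containing only "Click" (91) and "to" (neutral) scores about 91. Each further high-measure token pushes the score closer to 100.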

STA Considerations

Several factors can affect the accuracy of STA:

• Is STA seeing all local mail? — The more local or outbound mail that STA sees, the more accurate it will be. It is recommended that ePrism process all inbound and outbound mail.

• "Trusted" and "Untrusted" mail must be properly identified — If STA treats a local source of mail as "untrusted", it will not be used for training. Treating an external unknown source of mail as "trusted" will exempt this mail from spam processing. Similarly, using "untrusted" mail for training may insert spam into the STA database.

• Add your own definitions of "valid" or "spam" mail — Instead of simply creating a Pattern Based Message Filtering rule that rejects mail, you can label it as "spam" which sends the message to STA for training before rejecting it. Trusted external sources of mail can be labeled as "trusted" which sends the message to STA for training before delivery. STA’s advanced features allow you to upload your own lists of neutral words, spam, and legitimate mail.

Configuring STA

Select Mail Delivery -> Anti-Spam on the menu, and then select STA to configure Statistical Token Analysis.

STA can be enabled to filter spam immediately after installation. It is recommended that you start STA by running in "Training Only" mode to gather an initial sample of legitimate mail and spam.

When enabled, STA will always run in training mode and analyze all local mail. Local mail is assumed to be not spam and the frequency of the words found in this mail may therefore be used to modify the values supplied by St. Bernard’s master list. For example, a mortgage company may use the word "refinance" quite frequently in its regular mail. The likelihood of this word suggesting spam would therefore be reduced.

• Training Only — STA will analyze local mail but will NOT classify incoming mail.

• Scanning and Training — STA will analyze local mail AND will classify incoming mail.

When a sufficient number of local messages have been analyzed (minimum of 48 hours, 4-5 days recommended), switch to Scanning and Training to start classifying incoming mail.

Setting Thresholds

STA measures the likelihood of spam for each message it processes. This likelihood is represented by a number between 0 and 100. The closer to 100, the more likely the message is to be spam. You can set both an Upper and Lower Threshold. Leave the field blank to disable the action.

It is recommended that you initially set the Upper Threshold to a high value, such as 95, and then slowly lower it as the training improves. Then set the Lower Threshold, if required.

Messages typically fall into three groups:

• Over 90 — Almost certainly spam.

• Between 55 and 90 — Possibly spam.

• Less than 55 — Almost certainly legitimate mail.

ePrism provides an upper and lower threshold to manage the mail that has been classified.
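How a score maps onto the two thresholds can be sketched as below. The function name is hypothetical, None stands for a threshold field left blank, and treating a score equal to a threshold as triggering it is an assumption, since the guide does not say whether the comparison is inclusive.

```python
from typing import Optional

def threshold_hit(score: float, upper: Optional[float], lower: Optional[float]) -> str:
    """Return which threshold, if any, a 0-100 STA score triggers."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if upper is not None and score >= upper:
        return "upper"
    if lower is not None and score >= lower:
        return "lower"
    return "none"
```

With an Upper Threshold of 95 and a Lower Threshold of 55, a score of 98 triggers the upper action, a score of 70 triggers the lower action, and a score of 30 triggers neither.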

For each threshold, the range of available actions is as follows:

• Action — The action can be one of the following:

Just log: An entry is made in the log, and no other action is taken.

Modify Subject Header: The text specified in Action Data will be inserted into the message subject line.

Add header: An "X-" mail header will be added as specified in the Action Data.

Redirect to: The message will be delivered to the mail address specified in Action Data.

Reject mail: The mail will not be accepted, and the connecting mail server is forced to return it.

BCC: The message will be copied to the mail address specified in Action Data.

• Action data — Depending on the specified action:

Modify Subject Header: The specified text will be inserted into the subject line, such as [STA_SPAM].

Add header: A message header will be added with the specified text, such as [STA_SPAM].

Redirect to: Send the message to a mailbox such as [email protected]. You can also specify a domain such as spam.example.com.

enabled), and local training. Since the database is not built for the first time until 12 hours after installation, you can use this option to immediately rebuild the STA database.

Delete Training

Click the Delete Training button to remove all training material. You should delete all training material if your ePrism system has been misconfigured and starts to treat "trusted" mail as "untrusted" or vice versa.

STA Advanced Options

Click the Advanced button to reveal additional STA options. These options are for advanced STA configuration only, and it is highly recommended that the default values be used. Modifications to the default values may decrease STA accuracy and should be used with care.

Neutral Words

Neutral words are words that may or may not indicate spam. For example, a mortgage company may want to build a neutral word list that includes "refinance" or "mortgage" because these words show up quite frequently in spam mail. Adding them to the neutral word list reduces the likelihood of these words suggesting spam to a neutral value.

• Default Neutral Words — Select the check box to enable the St. Bernard neutral words list.

This list helps prevent pollution of the STA database. It is recommended that you leave this option enabled.

• Uploaded Neutral Words — Enables use of the uploaded neutral words list.

You must upload a file using the Upload Neutral Words button. The file must be in text format, and contain a list of neutral words with one word per line. Uploading a new list will replace the previous neutral words list.

Note: During the upload of a neutral words list, the system will automatically rebuild the STA database. This process may take some time to complete.
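The effect of a neutral words list can be sketched as follows. The sketch assumes that the "neutral value" is the midpoint of the 0-100 scale and that neutral words simply have their measure replaced by it. Neither detail is stated in this guide, and the function names are hypothetical.

```python
NEUTRAL = 50.0  # assumed midpoint of the 0-100 scale

def parse_neutral_words(file_text: str) -> set[str]:
    """Read one word per line, ignoring blank lines and surrounding whitespace."""
    return {line.strip() for line in file_text.splitlines() if line.strip()}

def apply_neutral_words(measures: dict[str, float], neutral: set[str]) -> dict[str, float]:
    """Return a copy of the token table with every neutral word set to the neutral value."""
    return {token: (NEUTRAL if token in neutral else value)
            for token, value in measures.items()}
```

For a mortgage company, a file containing "refinance" and "mortgage" on separate lines would reset both words to the neutral value while leaving all other tokens untouched.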

STA and Languages

The STA spam database is based on English language spam. As a result, it may not be initially responsive to spam created in other languages. STA’s ability to learn means that it can readily adapt to other languages. Ensure that DCC is enabled because all mail identified as "bulk" by DCC will be used by STA to train as spam. Assuming that some of these messages are in the local language, STA will build a database that reflects that language. STA will train on local legitimate mail from the moment the system is started. This will help properly characterize the local language use and prevent it from being classified as spam.

It is recommended that you use the "spam" action in Pattern Based Message Filters (PBMF), and select "Train as STA Spam" in the PBMF Preferences. Messages specified as "spam" will be forwarded to STA and will increase its database of local language words.

In document M1000, M2000, M3000. eprism User Guide (Page 117-135)