Before you can define a Record resource, you must obtain a text-only delimited file of the data source that you want a Structured Data policy to reference as a Record resource. You then map the columns in that data file to the field names that define the Record resource. Structured Data policy templates require data files that contain the columns a given policy needs. For instance, if you use the Price Information Structured Data policy template, your data file must include columns for each item's SKU number and price. Check the policy template before you request the data file to make sure that your database administrator can provide data for every field that the Structured Data policy requires. If your data file does not contain all the fields that the policy template requires, you will need to modify the conditions specified in the template accordingly.
See “Compliance policy templates” on page 116.
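Before you ask for the data file, it can help to confirm that its header row covers the columns a template needs. The following is a minimal sketch, assuming a pipe-delimited file with a header row; the function name `missing_columns` and the column names `sku` and `price` are hypothetical stand-ins, and the real required fields come from the policy template itself.

```python
import csv
import io

def missing_columns(data_file, required, delimiter="|"):
    """Return the required column names that are absent from the header row.

    Assumes the first row of the delimited file is a header row and compares
    names case-insensitively.
    """
    reader = csv.reader(data_file, delimiter=delimiter)
    header = next(reader, [])  # an empty file yields no columns
    present = {name.strip().lower() for name in header}
    return [col for col in required if col.lower() not in present]

# Hypothetical column names for the Price Information template.
required = ["sku", "price"]
print(missing_columns(io.StringIO("sku|price|description\nA-100|19.99|Widget\n"), required))  # []
print(missing_columns(io.StringIO("sku|description\nA-100|Widget\n"), required))  # ['price']
```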
You can map each column in your data-source file to a Record resource field name in one of two ways:
■ As a customized field name – In this case, a Structured Data policy references the Record resource field by its column heading in the delimited data-source text file. Under Data Source Attributes, check the Data source file contains a header row box to indicate that the data source file contains a header row. Data in columns identified by a customized field name are treated as type WORD and are not validated when the index is created.
■ As a system pattern field – In this case, Symantec Mail Security matches data in a Record-resource field against a system pattern defined by a set of regular expressions. Data in your data-source file must conform to one of the regular expressions for a data file to be successfully uploaded and indexed. You do not need to check the Data source file contains a header row box if you define all fields as system patterns.
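The difference between the two mapping styles can be modeled in a few lines of Python. This is a hypothetical sketch, not Symantec Mail Security's validator: `SYSTEM_PATTERNS`, `validate_row`, and the list-based mapping are illustrative names, and the two regular expressions are rough approximations written from the US ZIP code and SSN/ITIN descriptions in Table 5-14 rather than the product's own expressions.

```python
import re

# Hypothetical approximations of two system patterns (not the product's regexes).
SYSTEM_PATTERNS = {
    # 5 digits, optionally a hyphen and a 4-digit extension.
    "US ZIP code": re.compile(r"\d{5}(-\d{4})?"),
    # 9 digits, continuous or grouped 3-2-4; this sketch assumes one consistent
    # separator (hyphen or space). Groups cannot be 000, 00, or 0000.
    "SSN/ITIN": re.compile(r"(?!000)\d{3}([- ]?)(?!00)\d{2}\1(?!0000)\d{4}"),
}

def validate_row(row, mapping):
    """Return the indexes of columns whose values fail their system pattern.

    `mapping` lists, per column, a system pattern name, or None for a
    customized field name. Customized fields are treated as type WORD and are
    not validated, so they never appear in the result.
    """
    errors = []
    for index, (value, pattern_name) in enumerate(zip(row, mapping)):
        if pattern_name is None:
            continue  # customized field: stored as WORD, no validation
        if not SYSTEM_PATTERNS[pattern_name].fullmatch(value.strip()):
            errors.append(index)
    return errors

mapping = [None, "US ZIP code", "SSN/ITIN"]  # column 0 uses a customized field name
print(validate_row(["Smith", "90210", "777-77-7777"], mapping))  # []
print(validate_row(["Smith", "9021", "000-12-3456"], mapping))   # [1, 2]
```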
When mapping columns in your data file to the Record resource definition, make sure that the following conditions are met:
■ Symantec recommends that you use customized field names in the Record-resource definition that are similar to the field names used by the Structured Data policy template to identify matches. For example, if you are using the Resumes template, you should map the first_name and last_name columns in your data-source file to similarly named customized fields in the Record resource definition.
■ The system-pattern fields must correspond to the data types used by the Structured Data policy template to identify matches. For example, if you are using the Customer Data Protection template, you should use system pattern fields that correspond to SSN, CCN, Phone, and Email columns.
■ The data in each column of the data file should match the type of system pattern selected for that column's field name.
■ Make sure that your data-source file contains at least the minimum number of columns that you will use to define a Record resource view. For example, a Structured Data policy that calls for a minimum of three fields to trigger a rule must be able to reference at least those three fields, and each of them must be mapped to the Record resource. Similarly, when you use the EU Data Protection Directives policy template, any view that accesses the EU Data Protection Directives rule should be configured to match entries in at least 4 of these 5 fields: last name, email, phone, account number, and username.
■ The data file should not exceed 1.5 GB. Larger files cannot be indexed when uploaded as a Record resource.
■ Set Maximum Allowable Errors to the percentage of total rows that can return errors while processing still continues. If you set the percentage too low, an otherwise useful data-source file may fail to finish processing. If you set the percentage too high, the resulting Record resource may return too many false positives because fewer matches were found.
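The size limit and the error percentage can be checked before upload. The sketch below is a hypothetical model of how Maximum Allowable Errors behaves as described above (processing continues only while invalid rows stay at or below the chosen percentage); the function names are illustrative, and treating 1.5 GB as 1.5 × 1024³ bytes is an assumption, since the guide does not say how the product counts gigabytes.

```python
import os

# Assumption: "1.5GB" is read here as 1.5 * 1024**3 bytes.
MAX_FILE_BYTES = int(1.5 * 1024 ** 3)

def file_size_ok(path):
    """Return True if the data-source file is within the assumed 1.5 GB limit."""
    return os.path.getsize(path) <= MAX_FILE_BYTES

def within_error_threshold(total_rows, error_rows, max_error_percent):
    """Return True if the share of invalid rows stays within the threshold.

    Hypothetical model of Maximum Allowable Errors. An empty file is treated
    as a failure because there is nothing to index.
    """
    if total_rows == 0:
        return False
    return 100.0 * error_rows / total_rows <= max_error_percent

print(within_error_threshold(10000, 150, 2))  # True  (1.5% invalid)
print(within_error_threshold(10000, 300, 2))  # False (3.0% invalid)
```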
Symantec Mail Security uses regular expression system patterns to find data in a Record resource. Use the following table to match the types of data in your data file to the system patterns for the Structured Data record resource that you want to define.
Table 5-14 Record resource system patterns

System pattern: Credit Card number
■ MasterCard: Any 16-digit number beginning with 5 whose second digit is a number from 1 to 5, separated into 4 groups of 4 digits by spaces or hyphens. Examples: 5369 7777 8888 9999; 5369-7777-8888-9999
■ VISA: Any 16-digit number beginning with 4 and separated into 4 groups of 4 digits by a space or hyphen, or any 13-digit number beginning with 4. Examples: 4567 1234 5678 9123; 4123-6666-7777-8888; 4123456789012
■ American Express: Any 15-digit number beginning with 34 or 37 and separated into 3 groups of 4, 6, and 5 digits, respectively, by spaces or hyphens. Examples: 3442 456789 12345; 3758 456789 12345
■ Diners Club card: Any 14-digit number beginning with 30, 36, or 38 and separated into 3 groups of 4, 6, and 4 digits, respectively, by spaces or hyphens. Examples: 3056 123456 7890; 3667 123456 7890; 3878 123456 7890; 3056-123456-7890; 3667-123456-7890; 3842-123456-7890
■ Discover card: Any 16-digit number beginning with 6011 and separated into 4 groups of 4 digits by spaces or hyphens. Examples: 6011 1234 5678 9012; 6011-1234-5678-9012
■ Enroute card: Any 15-digit number beginning with 2014 or 2149 and separated into 3 groups of 4, 6, and 5 digits by spaces or hyphens. Examples: 2014 123456 78901; 2149-123456-78901
■ JCB: Any 16-digit number beginning with 3 and separated into 4 groups of 4 digits by a space or hyphen, or any 15-digit number beginning with 2131 or 1800 and followed by 11 digits. Examples: 3123 4567 8901 1234; 3123-4567-8901-1234; 213112345678901; 180012345678901

System pattern: Email
Description: Any alphanumeric string, which can be divided by an underscore (_), hyphen (-), or period, followed by the @ sign, an alphanumeric string, a period, and one of the domain-name extensions listed. Symantec Mail Security cannot validate two-letter top-level domains in which one or both letters are uppercase. It does, however, validate uppercase three-letter domains. For example, it will not validate an address whose two-letter top-level domain contains an uppercase letter, but it will validate mister_smith@senate.GOV.
Examples: jabberwocky@symantec.com; mister_smith@senate.gov; tom.swift@gadgets.arpa; t-rex9@nature.museum

System pattern: IP address
Description: Any grouping of four 1-, 2-, or 3-digit numbers separated by periods, optionally followed by a backslash and a 1- or 2-digit number. Symantec Mail Security does not parse any terminal characters other than a 1- or 2-digit numeral preceded by a backslash. Thus, 10.113.14.10a is not interpreted as a valid IP address.
Examples: 1.2.3.4; 10.0.0.0; 18.255.30.41; 10.0.10.0\24; 10.0.10.0\1; 10.0.10.0\0

System pattern: Number
Description: Symantec Mail Security also recognizes European-style numbers, in which the comma serves as the decimal point and periods separate groups of 3 digits. Fractions must be preceded by a numeral, including zero (0) if necessary, and expressed as a decimal. Although numbers of 8 digits or fewer that contain commas are supported, Symantec recommends that you use tab- or pipe-delimited data-source text files if your numbers contain commas. This avoids the possibility that commas in numbers are mistaken for field delimiters. Numbers larger than 8 digits are interpreted as type WORD.
Examples: 10; 10.99; 0.33; 9999; 10,000; 6.999,66; 99.999.999; -9,999; -10.99

System pattern: Percent
Description: Symantec Mail Security also recognizes European-style numbers, in which the comma serves as the decimal point and periods separate groups of 3 digits. Fractions of a percent must be preceded by a numeral, including zero (0) if necessary, and expressed as a decimal. Only numbers immediately adjacent to the percent sign (%), with no space, are regarded as valid percentages. The following patterns will not produce a match: -.32%, .32, 32 percent, 32per, 5 ¾%.
Examples: 76%; 23.4%; 56.78%; -1.089,01%; -0.32%

System pattern: US phone
Description: Any 10-digit phone number beginning with a digit from 2 to 9, optionally preceded by 1 and a hyphen or a space-hyphen-space. The 3-digit area code can be enclosed in parentheses or not, and is followed by a space, hyphen, or period and the 7-digit number. The 7 digits are grouped into 3 and 4 digits separated by a space, hyphen, or period, or are not separated at all.
Examples: (238) 832 5555; (238) 832-5555; 238-832-5555; 238 8325555; 1-238-832-5555; 238.832.5555; 1 - (238) 8325555; 1 - (238) 832-5555; 1 - (238) 832 5555

System pattern: US ZIP code
Description: Any 5-digit ZIP code, or a 5-digit code plus a 4-digit extension separated by a hyphen.
Examples: 90210; 89412-4321

System pattern: SSN/ITIN
Description: Any 9-digit number, either continuous or separated into 3 groups of 3, 2, and 4 digits by a hyphen or space. The first group cannot be 000, the second group must be greater than or equal to 1 (that is, not 00), and the last group must be greater than zero (not 0000).
Examples: 777-77-7777; 777 77 7777; 123456789
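A card number must both fit one of the credit card layouts above and pass the Luhn checksum test before it produces a match. The sketch below pairs a standard Luhn check with a hypothetical regular expression for the MasterCard row only; `MASTERCARD`, `luhn_ok`, and `is_mastercard_match` are illustrative names, the regex assumes the same separator is used between every group, and none of this is the product's own validator.

```python
import re

# Hypothetical approximation of the MasterCard row: 5, then 1-5, then 14 more
# digits in 4 groups of 4, using one consistent separator (space or hyphen).
MASTERCARD = re.compile(r"5[1-5]\d{2}([ -])\d{4}\1\d{4}\1\d{4}")

def luhn_ok(number):
    """Luhn checksum: double every second digit from the right, sum, mod 10."""
    digits = [int(c) for c in number if c in "0123456789"]
    total = 0
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:  # every second digit, left of the check digit
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0

def is_mastercard_match(text):
    """True only if text has the MasterCard layout AND passes the Luhn test."""
    return bool(MASTERCARD.fullmatch(text)) and luhn_ok(text)

print(is_mastercard_match("5555 5555 5555 4444"))  # True
print(is_mastercard_match("5555-5555-5555-4443"))  # False: fails the Luhn test
print(is_mastercard_match("6555 5555 5555 4444"))  # False: wrong prefix
```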
Keep in mind the following issues when you map data file columns to Record resource system patterns:
■ All credit card numbers must pass the Luhn checksum test to produce a match: the Luhn total of the digits, modulo 10, must equal 0. The Luhn test distinguishes valid card numbers from random collections of digits.
■ If your data source contains adjacent fields that you map as number patterns, Symantec recommends that you use a tab or pipe instead of a comma as your field delimiter. Otherwise, the Record resource validator may interpret the numbers from two adjacent fields as a single number belonging to the first field, with its digits grouped by a comma, and return an error. For example, the validator could interpret two adjacent, comma-delimited fields, Age and Weight, in one row as the single value 25,150 under one column rather than as 25 and 150 under the separate Age and Weight columns.
■ Symantec Mail Security does not match rows occurring more than 99 times.
■ Symantec Mail Security cannot match entries consisting of a single character. Each entry must contain at least two alphanumeric characters.

To define a record resource
1
In the Control Center, click Compliance > Resources > Add.
2
Specify a Record resource name.
3
Specify an optional description.
4
Under Data Source Attributes, select the appropriate delimiter character for your data source file. The supported delimiter characters are Tab (Tab key), Comma (","), or Pipe ("|").
5
Check the Data source file contains a header row checkbox if your data source file contains a header row. A header row is neither processed nor included in the Record resource.
Note: CRLF line breaks that precede the first row of a data set are included in the row count. Instead of skipping the actual header row, as it would for a data set without leading CRLFs, Symantec Mail Security Appliance treats the first CRLF as the header row and returns the values from all subsequent rows, including the actual header row. If the system pattern mappings do not match the values in the header row, Symantec Mail Security Appliance counts the actual header row as invalid because it returns values other than those expected. For example, if one column is mapped to the US ZIP code pattern and one or more CRLFs begin the data set, Symantec Mail Security counts the actual header row as a normal row that is expected to contain a 5-digit number in that column. When the actual header row returns a WORD value instead of a 5-digit number, Symantec Mail Security Appliance counts it as an invalid row. Symantec Mail Security Appliance ignores CRLFs that occur within or at the end of a data set; such CRLFs are not counted as rows.
6
Under Error Threshold, indicate the maximum allowable percentage of errors that can occur before processing is halted.
7
Under Mapping, indicate the field names to associate with the columns in the data source file. You can select a field name from the drop-down list, or you can enter a custom field name for your Record resource by selecting Customize... from the drop-down list.
When you select Customize..., an adjacent blank text field becomes available in which you enter a unique custom field name. A custom field name cannot be the same as any of the predefined Field Names or as any other custom field name in the Record resource. These custom fields are then available to use when you create a View that references the Record resource.
8
Change the order of one or more columns and their associated Field Names by checking the box next to the Field Names and then clicking Move Up or Move Down. Use Move Up and Move Down when you want to modify the mappings for the Record resource without deleting all the mappings and then recreating them. For example, a new version of the data in your Record resource might have columns 2 and 3 interchanged. You can move up the mapping for column 3 to interchange the mappings for columns 3 and 2.
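Two of the points above, the delimiter choice in step 4 and the leading-CRLF behavior in the note under step 5, can be handled by preparing the file before upload. The following is a hypothetical preprocessing sketch, not a Symantec tool: `to_pipe_delimited` is an illustrative name, and it assumes the source export wraps any comma-containing field (such as "10,000") in double quotes so that Python's `csv.reader` can split fields reliably. It drops blank lines before the first row, skips blank rows inside the data (which the product ignores anyway), and rewrites the file with the Pipe delimiter.

```python
import csv
import io

def to_pipe_delimited(source_text):
    """Convert a quoted, comma-delimited export to pipe-delimited text.

    - Drops blank lines before the first row, since leading CRLFs are counted
      as rows and can cause the real header row to be read as data.
    - Assumes fields that contain commas are wrapped in double quotes.
    """
    lines = source_text.splitlines()
    while lines and not lines[0].strip():
        lines.pop(0)  # remove leading empty rows only
    out = io.StringIO()
    writer = csv.writer(out, delimiter="|", lineterminator="\r\n")
    for row in csv.reader(lines):
        if row:  # skip blank rows inside the data set
            writer.writerow(row)
    return out.getvalue()

sample = '\r\n\r\nAge,Weight,Amount\r\n25,150,"10,000"\r\n'
print(repr(to_pipe_delimited(sample)))  # 'Age|Weight|Amount\r\n25|150|10,000\r\n'
```

With the pipe as delimiter, a value such as 10,000 keeps its comma without being mistaken for a field boundary, and the header row is again the first row of the file.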