separate match key for each of the processes.
The match key dictionary for each process is contained in a file named either PLACE.DCT (PLACE dictionary file) or
MTCHADDR.DCT (MTCHADDR dictionary file).
The dictionaries are ASCII files that can be maintained with a standard text editor. Each line of the dictionary represents a field of the match key. The format of a dictionary line is:
<field-identifier> <field-type> <field-length> <missing> [ ;
<comments>]
The field identifier is a two-character field name (case-insensitive) that must be unique over all the dictionaries defined.
The field type defines how information is to be placed in the field.
The following are the only possible values, and one must be chosen for each field defined:
N or C must be coded to indicate whether the field is a character field or a numeric field. Numeric fields are right-justified and filled with leading blanks in the key, and character fields are left-justified and filled with trailing blanks.
NS may be coded to indicate that leading zeros should be stripped from a numeric field. For example, house numbers would be defined with type NS and ZIP Codes would be defined with type N, since leading zeros matter with ZIP Codes but not with house numbers.
M indicates mixed alphas and numerics. This type causes numeric values to be right-justified in the fields, and alpha values to be left-justified. Leading zeros, if present, are retained. This type helps retain the proper sort order for these fields. The U.S. Postal Service uses this type of field for house numbers and apartment numbers. For example, in a four-character type M field, 102 becomes b102, and A3 becomes A3bb (where b represents a space [blank]).
MN indicates mixed name. It is generally used for representing street names in sort order. Field values beginning with an alphabetic character are left justified. Field values beginning with a number are indented as if the number were a separate three-character field. For example, the following list demonstrates the correct sorting order:
MAIN CHERRY HILL
The match key
bb2ND b13TH 123RD
Notice that single-digit numbers are indented two spaces, two-digit numbers are indented one space, and three-two-digit numbers and alphabetics are not indented. This keeps numbers in a proper sort order. 2ND precedes 13TH. The U.S. Postal Service uses this type of field for street names in the ZIP+4 files.
The third operand is the field length in characters.
The fourth operand is a missing value identifier. This is included for compatibility with the interactive matching library. You should just code an X for this operand.
Optional comments may follow a semicolon. For example:
HN N 7 X ; House number
SN MN 25 X ; Street name
MV M 6 X ; Multiunit value
CT C 20 X ; City name
This sample key defines four fields. It creates a fixed record 48 characters long. It contains a house number field, identified by the abbreviation HN; a street name field, SN; a multiunit value field, MV; and a city field, CT. The house number is numeric, the street name is mixed-name, the multiunit value is mixed, and the city name is character. Notice the missing value operand (X).
The first line of a match key dictionary should be the following:
\FORMAT\ SORT=N
This line prevents the table from being sorted on field identifier.
The fields in the match key will be in the same order that they are defined in the dictionary. There cant be any comments preceding this line.
The following subsections present the match key dictionaries for the processes supplied with the system.
Name match key
The NAME process is supplied with the following default match key:
\FORMAT\ SORT=N
;
; Name match key dictionary
;
TL C 4 X ; Title (Mr., Mrs., Dr., etc.)
FN C 16 X ; First name
MN C 16 X ; Middle name
LN C 16 X ; Last name
RK C 4 X ; Rank (Jr., Sr., and so on)
XN C 4 X ; Soundex of last name
To see how fields would be moved to the match key, consider the following example:
MR. JOHN CHARLES THOMAS, JR.
TL = MR FN = JOHN MN = CHARLES LN = THOMAS RK = JR XN = T530
Soundex is a phonetic encoding scheme that is used to aid searching despite spelling errors. For example, TOMAS, THOMIS, and TOMS, would all have a Soundex code of T530.
You may change any of the field lengths as needed. If other changes are made (new fields added, fields are deleted, and so on) you must make appropriate changes to the pattern files.
Remember that the two-character field names must be unique across all processes.
Place match key
The PLACE process is supplied with the following default match key:
\FORMAT\ SORT=N
;
; Place match key dictionary
;
The state abbreviation is the two-character standard USPS code for the state name. There are two ZIP Code fields, comprising both parts of a ZIP+4 Code. If there is no four-digit add-on code, then this field will be blank.
Consider the following example of standardized place information:
BERKELEY SPRINGS, WVA 12345-6789 CT = BERKELEY SPRINGS
XC = B624
SA = WV ZP = 12345 Z4 = 6789
You can change field lengths, except for the Soundex fields and the state abbreviation. If the state abbreviation or postal code fields are different, then appropriate changes must be made to the pattern files.
Street address match key
The MTCHADDR process is supplied with the following default match key:
\FORMAT\ SORT=N
;
; Street address match key
;
HN N 8 X ; House Number
CHN 8 X ; Coordinate house number
HS C 4 X ; House Number Suffix A, 1/2, and so on UTC 4 X ; Multiunit type UVC 10 X ; Multiunit value XS C 4 X ; Soundex of Street Name XRC 4 X ; Reverse Soundex of street name
Consider this example of an address:
1234-04 1/2 NORTH CEDAR KNOLLS LN EAST, SUITE 560-A HN = 1234
CH = 04 HS = 1/2 PD = N PT =
SN = CEDAR KNOLLS ST = LN
SD = E UT = STE UV = 560-A XS = C360 XR = S452
You may change field lengths (except for Soundex fields) as desired. However, deleting and adding fields will require changes to the pattern files.
The reverse Soundex code is used to facilitate address matching.
It is a Soundex code of the street name spelled backwards. Thus, the two Soundex codes are developed from CEDAR KNOLLS and SLLONK RADEC.
It should be clear from the discussion in the previous section that each standardization process requires a match key definition, a classification table, and a pattern file. The next section describes the classification table.
Classification table format