
Modifying the .cls file

In document How To Write A Geocoding Program (Page 46-52)

The classification table takes a particular value in the address and assigns it a standardized abbreviation or value and a token type value. You can use the .cls file to add, remove, or modify street directions, types, and ordinal suffixes. For example, if you don’t want “AVENUE” to be standardized as “AVE” because your reference data records “AV” as the street type, you can change it in the table. If your databases use street types that are not found in the existing table, such as “CLOSE”, you can add them to the table. For example, you can add a new line like this:

CLOSE CLOSE T
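The three columns described above can be pictured with a short Python sketch. It is only an illustration, not part of the rule base: the names ClsEntry and parse_cls_line are hypothetical, and the sketch assumes that columns are separated by whitespace, that the first field is the keyword, that the last field is the token type, and that a leading semicolon marks a comment.

```python
from typing import NamedTuple, Optional


class ClsEntry(NamedTuple):
    keyword: str      # value as it appears in the address
    standard: str     # standardized abbreviation or value
    token_type: str   # token type value, for example T for a street type


def parse_cls_line(line: str) -> Optional[ClsEntry]:
    """Parse one classification line; return None for blanks and comments."""
    text = line.strip()
    if not text or text.startswith(";"):
        return None
    parts = text.split()
    if len(parts) < 3:
        return None
    # Assumption: first field is the keyword, last field is the token type,
    # and everything in between is the standardized value.
    return ClsEntry(parts[0], " ".join(parts[1:-1]), parts[-1])


print(parse_cls_line("CLOSE CLOSE T"))
```

A real .cls file may carry additional columns or quoting rules that this sketch does not model, so check it against your own file before relying on it.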

When editing the .cls file, remember that any change you make applies to every address that is standardized. If you need to handle a special case (for example, letting North stay part of the name in “North Bend” while leaving it as a direction in all other cases), you can add a special routine to the .pat file.

For more information, see Chapter 7, ‘The pattern file’.

Modifying the .cls file to change standardization of ordinal suffixes

1. Open the .cls file in Notepad.

2. Scroll down to the ordinal numbers you wish to change, for example, FIRST.

3. If you wish to change your data so it is standardized as a full word (rather than the numeral), change the second column to the full word.

4. When you are done editing the .cls file, click the File menu and click Save.


Modifying the .cls file to change AVE to AV

1. Open the .cls file in Notepad.

2. Scroll down to Avenue (use the Find tool in the Edit menu).

3. Change the text in the second column from AVE to AV.

4. Click the File menu and click Save.


If your address data and reference data abbreviate a term differently, you may need to change how the .cls file abbreviates it. The second column should match the form used in the reference data.
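The same edit can be scripted when many tables need it. The following Python sketch is one possible approach, not a tool supplied with the software: set_standard is a hypothetical helper, the sample lines are illustrative, and the three-column whitespace-separated layout is an assumption.

```python
def set_standard(lines, keyword, new_standard):
    """Return a copy of lines with the second column changed for keyword."""
    updated = []
    for line in lines:
        parts = line.split()
        # Leave comments (leading semicolon), blanks, and short lines alone.
        if (len(parts) >= 3 and not parts[0].startswith(";")
                and parts[0].upper() == keyword.upper()):
            # Note: rebuilding the line drops any original column alignment.
            line = f"{parts[0]} {new_standard} {parts[-1]}"
        updated.append(line)
    return updated


# Illustrative sample lines, not copied from a real .cls file.
table = ["AVENUE AVE T", "STREET ST T"]
print(set_standard(table, "AVENUE", "AV"))
```

Keep a backup copy of the original .cls file before writing the changed lines back to disk.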

Removing Spanish street types for datasets that store prefix types in the street name field

1. Open the .cls file.

2. Scroll down to Avenida.

3. Comment out Avenida by placing a semicolon in front of it.

4. Repeat the process with Calle and Paseo, if desired.

5. Click the File menu and click Save.

In some cases, a prefix value that should stay in the street name (for example, Calle in Calle Real) is removed from it. This happens because the rule base treats these words as prefix street types. You can fix this in the .cls file so that these words are classified as part of the street name.
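Commenting out entries with a semicolon, as in the steps above, can also be sketched in Python. This is a minimal illustration: comment_out is a hypothetical helper, and the standardized values in the sample lines are made up for the example.

```python
def comment_out(lines, keywords):
    """Prefix a semicolon to every line whose first column is in keywords."""
    targets = {k.upper() for k in keywords}
    result = []
    for line in lines:
        parts = line.split()
        # Lines that are already commented start with ';' and never match.
        if parts and parts[0].upper() in targets:
            line = ";" + line
        result.append(line)
    return result


# Sample lines; the second and third columns are illustrative only.
table = ["AVENIDA AVE T", "CALLE CLL T", "ROAD RD T"]
print(comment_out(table, ["Avenida", "Calle", "Paseo"]))
```

Commenting a line out rather than deleting it makes the change easy to reverse later.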


Adding new keywords and standard abbreviations to the .cls file

1. Open the .cls file in Notepad.

2. Scroll down to the bottom of the file.

3. Type the words you want to add (for example, Close) in the first column.

4. Type the words as they should be standardized in the second column.

5. Type a T in the third column to standardize as a type.

6. Click the File menu and click Save.

In some cases, your data may contain a street type that the rule base does not recognize. In a case like this, you can add the type to the .cls file, enter T in the third column to classify it as a type, and save the .cls file. By doing this, you ensure that the word will be standardized correctly.
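Adding a keyword can be sketched the same way. In this Python illustration, add_entry is a hypothetical helper; writing entries in uppercase and skipping keywords that are already present are assumptions of the sketch, not rules stated by the software.

```python
def add_entry(lines, keyword, standard, token_type="T"):
    """Append 'KEYWORD STANDARD TYPE' unless the keyword is already listed."""
    key = keyword.upper()
    for line in lines:
        parts = line.split()
        if parts and parts[0].upper() == key:
            return list(lines)  # already defined; leave the table unchanged
    return list(lines) + [f"{key} {standard.upper()} {token_type}"]


# Illustrative starting table with a single entry.
table = ["AVENUE AVE T"]
print(add_entry(table, "Close", "Close"))
```

The new line goes at the end of the list, mirroring the instruction to scroll to the bottom of the file.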


Changing abbreviated names in the address data to match data in the reference files

1. Open the .cls file in Notepad.

2. Scroll down to MLK.

3. Change the second column from MARTIN LUTHER KING to MLK.

4. Click the File menu and click Save.

Your address data should now be the same as your reference file data, which will result in much better candidate scores.

When you are dealing with abbreviations for names of streets (for example, MLK or JFK), you may notice that all your candidate matching scores are low. This may be because the .cls file is standardizing the abbreviation to something other than what is in the reference data file. In other words, if the .cls file is standardizing MLK to MARTIN LUTHER KING, but the reference file contains MLK, the match score will be low because it won’t be able to find the street name. To fix this problem, you can edit the .cls file so that MLK is standardized to MLK, and MARTIN LUTHER KING is standardized to MARTIN LUTHER KING.
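The effect of that mismatch can be shown with a deliberately simplified Python sketch. The real matcher computes candidate scores rather than testing for equality, so this only illustrates why the standardized name and the reference name stop agreeing; standardize and both tables are hypothetical.

```python
def standardize(tokens, table):
    """Replace each token with its standardized value when the table has one."""
    return " ".join(table.get(t.upper(), t.upper()) for t in tokens)


reference_name = "MLK"  # street name as stored in the reference data

# Hypothetical tables: before and after editing the MLK entry.
default_table = {"MLK": "MARTIN LUTHER KING"}
edited_table = {"MLK": "MLK"}

print(standardize(["MLK"], default_table) == reference_name)  # False
print(standardize(["MLK"], edited_table) == reference_name)   # True
```

With the edited table, the standardized address and the reference data carry the same street name, which is the condition the procedure above aims for.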

Similarly, you may run into this sort of trouble when you are working with instances of ST.

ST is a special case, since it can represent Street, Saint, the st in ordinal numbers (first, twenty-first, and so on), or Suite. The classification table is not the best place to deal with instances of ST, because it can’t take different situations into account. Instead, ST is handled in the pattern rules. For more information on dealing with instances of ST, see Chapter 7, ‘The pattern file’.


Chapter 7: The pattern file

IN THIS CHAPTER

• Overview of the pattern file

• Pattern rules

• Actions

• Modifying the pattern file

• Dealing with street intersections

• Editing intersection .xat/.pat files

• Adding custom routines to the pattern file

The pattern file (.pat extension) is critical to the standardization process because it defines pattern rules and actions. This chapter looks at how the pattern file is set up, examines the different rules and actions that are available, and shows you how to modify the pattern file.

The pattern file (.pat extension) contains pattern rules and actions for standardizing an address and converting the recognized operands into match key fields.

The example below shows the three parts of the pattern file. The POST action section is optional and contains actions that are executed after patterns in the main section and subroutines are processed for the record. The pattern/action section shows that patterns and actions must be grouped together. This section can contain as many pattern–action sequence pairs as are necessary.

The last section shows where subroutines are located.
