Working with Specific Types of Address Data
Exercise 2: Use the Address Cleanse Transform
Exercise Objectives
After completing this exercise, you will be able to:
• Use the USA Address Cleanse transform configuration to parse, cleanse, and standardize data
• Use the Query transform to create a new column for information code descriptions
• Assign content types for input data to assist the definition of mappings
Business Example
After analyzing the target data for the Beta Businesses, you have decided that the address data requires parsing, cleansing, and standardization. You would also like to add a column that provides descriptive information for the information code values.
Task 1:
Create Datastores and import metadata for the Beta Businesses and Delta databases.
1. Log in to the Data Services Designer.
If you have already logged in to the Designer, you may proceed to the next step.
Note: Since this lesson could be used in a context where no other lesson/exericse in this course is being consumed, this step from the previous exercise has been duplicated.
2. If the BETA, DQSTANDALONE AND DELTA datastores already exist, please continue to the next step. If they do not exist, then create these datastores according to the following directions.
Note: Since this lesson could be used in a context where no other lesson/exericse in this course is being consumed, this step from the previous exercise has been duplicated.
In your Local Object Library, create a new source Datastore for the BETA Businesses database.
Continued on next page
3. In your Local Object Library, create a new Datastore for the DQSTANDALONE database.
4. In your Local Object Library, create a new Datastore for the DELTA staging database.
Task 2:
In the USA_Customer table in the Beta datastore, set the content type for certain fields to Name.
1. In the USA_Customer table in the BETA datastore, set the content type for certain fields to Name.
Task 3:
Within the Omega project, create a batch job Beta_Customers_Address-Cleanse_Job and a dataflow Beta_Customers_AddressCleanse_DF containing the USA Address Cleanse transform with a source table and a target template table.
1. If the Omega project already exists, please continue to the next step. If it does not exist, then create a new project called Omega.
Note: Since this lesson could be used in a context where no other lesson/exericse in this course is being consumed, this step from the previous exercise has been duplicated.
2. In the Omega project, create a new batch job Beta_Customers_Address-Cleanse_Jobwith a new data flow called Beta_Customers_Address-Cleanse_DF.
3. In the workspace for Beta_Customers_AddressCleanse_DF, add the USA_Customerstable from the Beta datastore as the source object connected to the USA Address Cleanse transform with a target template table AddressCleansed_USAin the Delta datastore.
Task 4:
Configure the USA Address Cleanse transform to parse the names and addresses in the USA_Customers table.
1. Map input columns to output columns for those columns that should be processed by the USA Address Cleanse transform and output additional columns, which parse the original information based on the USA engine.
Continued on next page
Task 5:
Execute the Beta_Customers_AddressCleanse_Job with the default options after saving all objects.
1. Execute the job with the default execution properties after saving all created objects.
Task 6:
Add a Query transform to the data flow to use a lookup function to provide the definition for AC_INFO_CODE values and execute the batch job again to see the results.
1. Add the Query transform to the workspace between the USA Address Cleanse transform and the target template table AddressCleansed_USA.
2. Use the Functions wizard to create a new lookup expression using the lookup_ext function.
3. Execute the job with the default execution properties after saving all created objects.
Solution 2: Use the Address Cleanse Transform
Task 1:
Create Datastores and import metadata for the Beta Businesses and Delta databases.
1. Log in to the Data Services Designer.
If you have already logged in to the Designer, you may proceed to the next step.
Note: Since this lesson could be used in a context where no other lesson/exericse in this course is being consumed, this step from the previous exercise has been duplicated.
a) From the Windows Terminal Server (WTS) training environment desktop, choose Start → Programs → SAP Business Objects Data Services 4.0 SP1
→ Data Services Designer.
b) In the dialog box, enter your assigned User ID.
c) Enter your password, which is the same as your User ID.
d) Choose the Log on button.
e) In the list of repositories, select your repository DSREPO##, where ## is the number portion of your User ID.
f) Choose OK.
2. If the BETA, DQSTANDALONE AND DELTA datastores already exist, please continue to the next step. If they do not exist, then create these datastores according to the following directions.
Note: Since this lesson could be used in a context where no other lesson/exericse in this course is being consumed, this step from the previous exercise has been duplicated.
In your Local Object Library, create a new source Datastore for the BETA Businesses database.
a) In the Local Object Library, select the Datastores tab and right-click in the Datastores workspace to select the New option from the context menu.
b) In the resulting dialog box, use the options:
Continued on next page
Field Value
Datastore name BETA
Datastore type Database
Database type Microsoft SQL Server
Database version Microsoft SQL Server 2005 Database server name Supplied by the Instructor
Database name BETA
User name sourceuser (lower case)
Password sourcepass (lower case)
Click OK to finish.
c) In the Local Object Library, right-click the BETA datastore to select the Open option from the context menu.
The tables from the BETA database are displayed on the Metadata window on the right.
d) Import the metadata for the BETA database source tables by selecting all the tables, right-click on them and from the menu, choose the option Import.
• source.addrcodes
Note: This database is a read-only database.
e) View the data for the USA_Customers table and confirm that Jane Hartley from Planview Inc. is the first customer record by right clicking on the table in the Local Object Library and choose the option View data.
Continued on next page
3. In your Local Object Library, create a new Datastore for the DQSTANDALONE database.
a) In the Local Object Library, select the Datastores tab and right-click in the Datastores workspace to select the New option from the context menu.
b) In the resulting dialog box, use the options:
Field Value
Datastore name DQSTANDALONE
Datastore type Database
Database type Microsoft SQL Server
Database version Microsoft SQL Server 2005 Database server name Supplied by the Instructor
Database name DQSTANDALONE
User name sourceuser (lower case)
Password sourcpass (lower case)
Choose OK to finish.
c) In the Local Object Library, right-click the DQSTANDALONE datastore to select the Open option from the context menu.
The tables from the DQSTANDALONE database are displayed on the Metadata window on the right.
d) Import the metadata for the DQSTANDALONE database source tables by selecting all the tables, right-click on them and from the menu, choose the option Import.
Note: This database is a read-only database.
Continued on next page
4. In your Local Object Library, create a new Datastore for the DELTA staging database.
a) In the Local Object Library, select the Datastores tab and right-click in the Datastores workspace to select the New option from the context menu.
b) In the resulting dialog box, use the options:
Field Value
Datastore name DELTA
Datastore type Database
Database type Microsoft SQL Server
Database version Microsoft SQL Server 2005 Database server name Supplied by the Instructor
Database name DELTA## (where ## is the
group number assigned by your Instructor)
User name student## or consult your instructor Password student## or consult your instructor Choose OK to finish.
c) You do not have to import any metadata.
Note: This database is a read/write database and will be used as the target for your template tables.
Task 2:
In the USA_Customer table in the Beta datastore, set the content type for certain fields to Name.
1. In the USA_Customer table in the BETA datastore, set the content type for certain fields to Name.
a) In the Local Object Library, select the Datastores tab and open the node for the BETA datastore.
b) Open the node Tables and right-click on the table USA_Customer to choose the option Edit.
Continued on next page
c) Right-click the field Cust_Name to select the option Properites and choose the type Name from the drop-down list for the Content Type associated with the field.
d) Select OK to close the dialog box.
e) Close the table metadata by selecting the Back button.
Task 3:
Within the Omega project, create a batch job Beta_Customers_Address-Cleanse_Job and a dataflow Beta_Customers_AddressCleanse_DF containing the USA Address Cleanse transform with a source table and a target template table.
1. If the Omega project already exists, please continue to the next step. If it does not exist, then create a new project called Omega.
Note: Since this lesson could be used in a context where no other lesson/exericse in this course is being consumed, this step from the previous exercise has been duplicated.
a) From the Project menu, choose the option New → Project.
b) When the Project New dialog box appears, enter Omega in the Project name field.
c) Select Create so that the new project appears in the Project area.
2. In the Omega project, create a new batch job Beta_Customers_Address-Cleanse_Jobwith a new data flow called Beta_Customers_Address-Cleanse_DF.
a) In the Project area, right-click the project name and choose New Batch Job from the menu.
b) Enter the name of the job as Beta_Customers_AddressCle-anse_Job.
c) Press Enter to commit the change.
d) Open the job Beta_Customers_AddressCleanse_Job by double-clicking it.
e) Select the Data Flow icon in the Tool Palette.
f) Select the workspace where you want to add the data flow.
g) Enter Beta_Customers_AddressCleanse_DF as the name.
Continued on next page
h) Press Enter to commit the change.
i) Double-click the data flow to open the data flow workspace.
3. In the workspace for Beta_Customers_AddressCleanse_DF, add the USA_Customerstable from the Beta datastore as the source object connected to the USA Address Cleanse transform with a target template table AddressCleansed_USAin the Delta datastore.
a) In the Local Object Library, select the Datastores tab and then select the USA_Customerstable from the Beta datastore.
b) Select and drag the object to the data flow workspace and in the context menu, choose the option Make Source.
c) On the Transforms tab of the Local Object Library, expand the Data Quality Global Address Cleansebranch to select and drag the USA Address Cleansetransform to the data flow workspace.
d) Connect the source table to the transform by selecting the source table and while holding down the mouse button, drag to the transform. Release the mouse button to create the link.
e) In the Tool Palette, select the icon for a template table and then select in the data flow workspace. Enter AddressCleansed_USA as the table name in the DELTA## datastore.
f) Connect the transform to the target table by selecting the transform and while holding down the mouse button, drag to the target table. Release the mouse button to create the link.
Task 4:
Configure the USA Address Cleanse transform to parse the names and addresses in the USA_Customers table.
1. Map input columns to output columns for those columns that should be processed by the USA Address Cleanse transform and output additional columns, which parse the original information based on the USA engine.
a) In the data flow workspace, double-click the USA Address Cleanse transform to open the transform editor.
b) In the transform editor for the USA Address Cleanse transform, map all columns from the Input Schema to the Output Schema by selecting all the columns in the Input Schema and dragging them to the Output Schema.
c) Map these fields from the Schema In to the Input tab of the parameters area:
Continued on next page
Schema In Input Mapping
Address1 MULTILINE1
Address2 MULTILINE2
City LOCALITY1
PostalCode POSTCODE
Create the mapping by selecting the field form the Schema In area and dragging them to the field in the Input tab.
Note: The fields Firm, Region, and Country are already mapped, because these content types are relevant to address cleanse transforms.
d) On the Options tab of the parameters area, ensure that only the USA engine is enabled.
Note: Other settings may be modified to conform to business rules.
In this case, accept the default address options.
e) On the Output tab, filter the output columns based on best practice by selecting the radio button Best Practice.
f) Since selection order determines output order, add these output columns in the order specified to the output schema after the columns you have already mapped. Use the check boxes to make your selections.
Continued on next page