• No results found

Map the data element or elements to the target data element—First map the element or elements to the target element, then working with a data modeler, a data integration

Data Integration Analysis Phase

Step 4: Perform Source\Target Data Mappings

5. Map the data element or elements to the target data element—First map the element or elements to the target element, then working with a data modeler, a data integration

analyst would create mappings for the three core key structures that followed the cus-tomer key example shown in Figure 6.22.

ptg

1. Source-to-Enterprise Data Warehouse Data Mappings Source File/

Table

Source Field Source

Domain

Mapping Rule Subject Area File Column Name Column Definition Target

Domain

Mandatory Key

Customer Subject Area

CUST.dat Customer_Number The unique identifier assigned to a customer.

INTEGER(10) Yes Primary

d

CUST.dat Source_System_Identifier The identifier of the source system that the data was sourced.

VARCHAR(4) Yes Primary

SYS 1 CUST FILE CUST_# Varchar(04) Pad last 6 digits CUST.dat Source_System_Code The unique identifier of the application or system from which the information last used to update the entity instance was populated.

VARCHAR(10) Yes Primary

em

CUST.dat Customer_Number The unique identifier assigned to a customer.

INTEGER(10) Yes Primary

d

CUST.dat Source_System_Identifier The identifier of the source system that the data was sourced.

VARCHAR(4) Yes Primary

SYS 2 CUST FILE ID Decimal(10) Translate decimal to Varchar

CUST.dat Source_System_Code The unique identifier of the application or system from which the information last used to update the entity instance was populated.

VARCHAR(10) Yes Primary

CUST.dat Customer_Number The unique identifier assigned to a customer.

INTEGER(10) Yes Primary

d

CUST.dat Source_System_Identifier The identifier of the source system that the data was sourced.

VARCHAR(4) Yes Primary

SYS 3 CUST FILE CUST_ID Decimal(10) Translate Decimal to Varchar

CUST.dat Source_System_Code The unique identifier of the application or system from which the information last used to update the entity instance was populated.

VARCHAR(10) Yes Primary

Figure 6.22 Common customer key

6. Map technical mapping requirements to each target’s subject area data element—

Build in any mapping business rules, which may be as simple as padding or trimming the field, to aggregating and/or calculating amounts.

This mapping from the Wheeler customer subject area provides a simple padding example, as shown in Figure 6.23.

1. Source-to-Enterprise Data Warehouse Data Mappings

Source File/

Table

Source Field Source Domain

Mapping Rule Subject Area File Column Name Column Definition Target Domain

Mandatory Key

Customer Subject Area

SYS 2 CUST FILE F_NAME Char(15) Pad last 5 digits CUST.dat Purchaser_First_Name The first name of the purchaser Varchar(20) Yes No SYS 2 CUST FILE L_NAME Char(15) Pad last 5 digits CUST.dat Purchaser_Last_Name The last name of the purchaser Varchar(20) Yes No

Figure 6.23 Applying technical requirement: padding data elements

7. Reconcile definitional (data governance) issues between source systems—Resolve any data element (attribute)–level definitional differences between the different sources and the target data element.

ptg

Data Integration Analysis Phase 139

This task addresses the very first point in the book. Addressing the technical challenges of data integration are difficult enough; determining the correct interpretation of a data element, whether it is simply two fields being merged into one, or a calculation, requires attention from the data integration analyst performing the mapping, the data modeler that created the target element and target definition, and the business subject matter experts that understand each of the source data element definitions that are being mapped to the target.

The completed Wheeler data warehouse source-to-EDW mapping document is illustrated in Figure 6.24.

ptg

ptg

Data Integration Analysis Phase 141

Source File/

Table

Source Field Source Domain

Mapping Rule Subject Area File

Column Name Target Domain Product Subject Area

Create a system- generated ID

PROD.dat Product_Id INTEGER(10)

Must be assigned

"SYS1"

PROD.dat Source_System_Identifier VARCHAR(4)

SYS 1 PROD FILE Item Number Varchar(04) 1.Translate Varchar to integer. 2. Pad last 6 digits.

PROD.dat Source_System_Code INTEGER(10)

SYS 1 PROD FILE Description Char(30) Pad last 10 digits. PROD.dat Product_Name CHAR(40)

er

"Rubber," "Wheels,"

or "Bearings."

Product_Type CHAR(40)

Insert "No Source System Value"

Product_Code VARCHAR(20)

SYS 1 PROD FILE Cost Decimal(12,2) Trim first 5 digits. PROD.dat Product_Cost Decimal 7,2

SYS 1 PROD FILE Price Decimal(12,2) Trim first 5 digits. PROD.dat Product_Price Decimal 7,2 SYS 1 PROD FILE Inventory Decimal(12,2) Trim first 5 digits. PROD.dat Inventory Decimal 7,2

Create a system- generated ID

PROD.dat Product_Id INTEGER(10)

Must be assigned

"SYS1"

PROD.dat Source_System_Identifier VARCHAR(4)

SYS 2 PROD FILE Item ID Integer(06) 1.Translate Integer to Varchar. 2. Pad last 4 digits.

PROD.dat Source_System_Code VARCHAR(10)

SYS 2 PROD FILE Inventory Name Char(30) 1. Pad last 10 digits. PROD.dat Product_Name CHAR(40)

er

"Rubber," "Wheels,"

or "Bearings."

Product_Type CHAR(40)

Insert "No Source System Value"

Product_Code VARCHAR(20)

SYS 2 PROD FILE Cost Decimal(12,2) Trim first 5 digits. PROD.dat Product_Cost Decimal 7,2

SYS 2 PROD FILE Price Decimal(12,2) Trim first 5 digits. PROD.dat Product_Price Decimal 7,2 SYS 2 PROD FILE Inventory Decimal(12,2) Trim first 5 digits. PROD.dat Inventory Decimal 7,2

Create a system- generated ID

PROD.dat Product_Id INTEGER(10)

Must be assigned

"SYS1"

PROD.dat Source_System_Identifier VARCHAR(4)

SYS 3 PROD FILE ID Number Integer(06) 1.Translate Integer to Varchar. 2. Pad last 4 digits.

PROD.dat Source_System_Code VARCHAR(10)

SYS 3 PROD FILE Name Char(30) 1. Pad last 10 digits. PROD.dat Product_Name CHAR(40)

er

"Rubber," "Wheels,"

or "Bearings."

Product_Type CHAR(40)

Insert "No Source System Value"

Product_Code VARCHAR(20)

SYS 3 PROD FILE Cost Decimal(12,2) Trim first 5 digits. PROD.dat Product_Cost Decimal 7,2

SYS 3 PROD FILE Price Decimal(12,2) Trim first 5 digits. PROD.dat Product_Price Decimal 7,2 SYS 3 PROD FILE Inventory Decimal(12,2) Trim first 5 digits. PROD.dat Inventory Decimal 7,2

Figure 6.24 Wheeler source-to-data warehouse data mapping

ptg

ptg

Data Integration Analysis Phase 143

1. Source-to-Enterprise Data Warehouse Data Mappings Source File/

Table

Source Field Source Domain

Mapping Rule Subject Area File Column Name Column Definition Target

Domain

Mandatory Key Note

Order Subject Area

SYS 2 ORDR FILE PROD_PRICE Decimal(05,2) 1.Translate Decimal to Integer. 2. Trim the first digit.

PROD.dat Product_Price The per unit price that Wheeler charges

their customers.

Decimal 7,2 Yes No

SYS 2 ORDR FILE AMNT_ORDR Decimal(08,2) Translate Decimal to Integer

ORDR.dat Quantity_Ordered The per unit quantity of the product ordered INTEGER(07)

Use the same system-generated ID

PROD.dat Order_Number This number represents a single

occurrence of an order.

INTEGER(07) Yes Primary

SYS 2 ORDR FILE LINE_2 Decimal(2,2) Insert "2" into the field.

Order_Line_Number The unique identifier for one occurrence of a status code on a order.

INTEGER(04) Yes Primary

SYS 2 ORDR FILE ITEM_ID Integer(06) 1. Pad the first 4

digits. 2. Perform a lookup and match the

"Item_ID" with the product table

"Source_System_Co de" in the

"Product_Id.Source _System_Identifier.S ource_System_Cod e" primary key, once matched insert the

"Product_ID" value from that row.

ORDR.dat Product_Id The unique identifier of a Wheeler product. INTEGER(10) Yes Foreign

SYS 2 ORDR FILE PROD_PRICE Decimal(05,2) 1.Translate Decimal to Integer. 2. Trim the first digit.

PROD.dat Product_Price The per unit price that Wheeler charges

their customers.

Decimal 7,2 Yes No

SYS 2 ORDR FILE AMNT_ORDR Decimal(08,2) Translate Decimal to Integer

ORDR.dat Quantity_Ordered The per unit quantity of the product ordered INTEGER(07)

Create a system- generated ID

PROD.dat Order_Number This number represents a single

occurrence of a order.

INTEGER(07) Yes Primary

Must be Assigned

"SYS1"

PROD.dat Source_System_Identifier The identifier of the source system that the data was sourced.

VARCHAR(4) Yes Primary

SYS 3 ORDR FILE ORD _# Decimal(05,2) 1.Translate Decimal

to Varchar. 2. Trim the last 2 digits, pad the first 5.

ORDR.dat Source_System_Code The unique identifier of the application or system from which the information last used to update the entity instance was populated.

VARCHAR(10) Yes Primary

SYS 3 ORDR FILE STS Char(07) 1. Translate Char to

VarChar. 2. Pad the last 25 digits

ORDR.dat Terms The terms of payment for the order. VARCHAR(30) Yes No

SYS 3 ORDR FILE DTE Integer(08) Translate Integer to

Date

ORDR.dat Order_Date The date that the order was placed. Date Yes No

Insert "00/00/0000" ORDR.dat Effective_Date The date that the order will take effect. Date No No

SYS 3 ORDR FILE CUST_# Varchar(04) 1. Translate Varchar

to Integer. 2. Pad the first 6 digits. 3.

Perform a lookup and match the "Cust_#"

with the customer table " Cust_ID" , once matched insert the "Cust_ID" value from that row

ORDR.dat Cust_Id The unique identifier assigned to a

customer.

INTEGER(10) Yes Foreign

Use the same system-generated ID

PROD.dat Order_Number This number represents a single

occurrence of a order.

INTEGER(07) Yes Primary

SYS 3 ORDR FILE LN_1 Decimal(2,2) Insert "1" into the field.

Order_Line_Number The unique identifier for one occurrence of a status code on a order.

INTEGER(04) Yes Primary

SYS 3 ORDR FILE ID_NUMBER Integer(06) 1. Pad the first 4

digits. 2. Perform a lookup and match the

"Item_ID" with the product table

"Source_System_Co de" in the

"Product_Id.Source _System_Identifier.S ource_System_Cod e" primary key, once matched insert the

"Product_ID" value from that row.

ORDR.dat Product_Id The unique identifier of a Wheeler product. INTEGER(10) Yes Foreign

SYS 3 ORDR FILE PROD_PRICE Decimal(05,2) 1. Pad the first digit. PROD.dat Product_Price The per unit price that Wheeler charges

their customers.

Decimal 7,2 Yes No

SYS 3 ORDR FILE AMNT_ORDR Decimal(08,2) 1.Translate Decimal to Integer. 2. Trim the first digit, and the last 2 digits.

ORDR.dat Quantity_Ordered The per unit quantity of the product ordered INTEGER(07)

Use the same system-generated ID

PROD.dat Order_Number This number represents a single

occurrence of an order.

INTEGER(07) Yes Primary

SYS 3 ORDR FILE LN_2 Decimal(2,2) Insert "2" into the field.

Order_Line_Number The unique identifier for one occurrence of a status code on an order.

INTEGER(04) Yes Primary

SYS 3 ORDR FILE ID_NUMBER Integer(06) 1. Pad the first 4

digits. 2. Perform a lookup and match the

"Item_ID" with the product table

"Source_System_Co de" in the

"Product_Id.Source _System_Identifier.S ource_System_Cod e" primary key, once matched insert the

"Product_ID" value from that row.

ORDR.dat Product_Id The unique identifier of a Wheeler product. INTEGER(10) Yes Foreign

SYS 3 ORDR FILE PROD_PRICE Decimal(05,2) 1. Pad the first digit. PROD.dat Product_Price The per unit price that Wheeler charges

their customers.

Decimal 7,2 Yes No

SYS 3 ORDR FILE AMNT_ORDR Decimal(08,2) 1.Translate Decimal to Integer. 2. Trim the first digit, and the last 2 digits.

ORDR.dat Quantity_Ordered The per unit quantity of the product ordered INTEGER(07)

Use the same system-generated ID

PROD.dat Order_Number This number represents a single

occurrence of a order.

INTEGER(07) Yes Primary

SYS 3 ORDR FILE LN_3 Decimal(2,2) Insert "3" into the field.

Order_Line_Number The unique identifier for one occurrence of a status code on a order.

INTEGER(04) Yes Primary

SYS 3 ORDR FILE ID_NUMBER Integer(06) 1. Pad the first 4

digits. 2. Perform a lookup and match the

"Item_ID" with the product table

"Source_System_Co matched insert the

"Product_ID" value from that row.

ORDR.dat Product_Id The unique identifier of a Wheeler Product. INTEGER(10) Yes Foreign

SYS 3 ORDR FILE PROD_PRICE Decimal(05,2) 1. Pad the first digit. PROD.dat Product_Price The per unit price that Wheeler charges

their customers.

Decimal 7,2 Yes No

SYS 3 ORDR FILE AMNT_ORDR Decimal(08,2) 1.Translate Decimal to Integer. 2. Trim the first digit, and the last 2 digits.

ORDR.dat Quantity_Ordered The per unit quantity of the product ordered INTEGER(07)

ptg Data Mapping Checkpoint

Version: 1.0 Released:

Quality Control Process/Procedure/Task Review Perform Data Mapping to Source Systems

Roles and Responsibilities

1. Were critical transaction-level data elements confirmed?

2. Were key data aggregations and calculations confirmed?

3. Were technical requirements mapped to each source system?

1. Were definitional (data governance) issues between source systems reconciled?

Key: R-Responsible, A-Approves, S-Supports, I-Informs, C-Consults

The deliverables review on this checklist conforms to standards:

Comments

Completely Partially Not at all Project Manager

Data Steward Business Analyst

Data Integration Architect

Data Integration

Architect

Metadata Specialist

A A A B S A

Yes No N/A Comments Content Owner:

Dept Name:

Figure 6.25 Data mapping sign-off form

Once all the source fields have been mapped to the target data model, plan for two to three review (and renovation) sessions with the business stakeholders on confirming the completeness and accuracy of the data mappings.

Pay careful attention on calculations and key mapping confirmations.

Finally, it is a very good best practice to have an internal review session with formal check-points by peers or peer groups prior to a final sign-off on the data mappings with the end user.

Figure 6.25 provides an example of a formal sign-off sheet for data mapping.

ptg

Summary

In this chapter, we began our second case study with the emphasis on working through the entire data integration life cycle tasks and deliverables. In subsequent chapters, we cover phases of the data integration life cycle, and provide case studies for each phase. This case study was based on integrating three order management systems for the Wheeler Company into an enterprise data warehouse and product line profitability data mart.

Before starting the case study, the chapter first reviewed the important concept of where calculations and aggregations could go in the different layers of a data warehouse and the advan-tages and disadvanadvan-tages of each approach.

The first task that the case study covered was how to graphically scope out the data integra-tion project by building a “picture” of the intended Wheeler data integraintegra-tion processes in a con-ceptual data integration diagram. Once documented and the scope is identified and confirmed, attention is moved to the source systems.

Next, we performed source systems profiling and analysis to have a good understanding of the underlying Wheeler source system data.

Finally, we mapped the source data to the target database; in the Wheeler case study, it was the data warehouse data model. We reviewed the fact that data mapping is not a one-to-one exer-cise, but requires both a horizontal and vertical view of the sources to target.

Chapter 7, “Data Integration Logical Design,” focuses on using the analysis phase deliver-ables such as the source-to-target mapping document and the Data Quality Criteria Work Book as sources for building out the logical design deliverables such as the logical data integration archi-tecture and logical data integration models.

Summary 145

ptg

ptg In a data integration project, the logical design phase transforms the data integration

require-ments (e.g., the data mappings) into logical business designs. It segrequire-ments those mappings into logical units of work, using the data integration modeling technique and reference architecture.

The logical design phase also completes the analysis on data quality by focusing on the tar-get’s data quality criteria, both technical and business.

It is also important to begin to determine the physical volume sizing of the proposed data integration application on the data integration environment.

The tasks for the data integration logical design phase include the following: