Data Integration Analysis Phase
Step 4: Perform Source\Target Data Mappings
5. Map the data element or elements to the target data element—First map the element or elements to the target element, then working with a data modeler, a data integration
analyst would create mappings for the three core key structures that followed the cus-tomer key example shown in Figure 6.22.
ptg
1. Source-to-Enterprise Data Warehouse Data Mappings Source File/
Table
Source Field Source
Domain
Mapping Rule Subject Area File Column Name Column Definition Target
Domain
Mandatory Key
Customer Subject Area
CUST.dat Customer_Number The unique identifier assigned to a customer.
INTEGER(10) Yes Primary
d
CUST.dat Source_System_Identifier The identifier of the source system that the data was sourced.
VARCHAR(4) Yes Primary
SYS 1 CUST FILE CUST_# Varchar(04) Pad last 6 digits CUST.dat Source_System_Code The unique identifier of the application or system from which the information last used to update the entity instance was populated.
VARCHAR(10) Yes Primary
em
CUST.dat Customer_Number The unique identifier assigned to a customer.
INTEGER(10) Yes Primary
d
CUST.dat Source_System_Identifier The identifier of the source system that the data was sourced.
VARCHAR(4) Yes Primary
SYS 2 CUST FILE ID Decimal(10) Translate decimal to Varchar
CUST.dat Source_System_Code The unique identifier of the application or system from which the information last used to update the entity instance was populated.
VARCHAR(10) Yes Primary
CUST.dat Customer_Number The unique identifier assigned to a customer.
INTEGER(10) Yes Primary
d
CUST.dat Source_System_Identifier The identifier of the source system that the data was sourced.
VARCHAR(4) Yes Primary
SYS 3 CUST FILE CUST_ID Decimal(10) Translate Decimal to Varchar
CUST.dat Source_System_Code The unique identifier of the application or system from which the information last used to update the entity instance was populated.
VARCHAR(10) Yes Primary
Figure 6.22 Common customer key
6. Map technical mapping requirements to each target’s subject area data element—
Build in any mapping business rules, which may be as simple as padding or trimming the field, to aggregating and/or calculating amounts.
This mapping from the Wheeler customer subject area provides a simple padding example, as shown in Figure 6.23.
1. Source-to-Enterprise Data Warehouse Data Mappings
Source File/
Table
Source Field Source Domain
Mapping Rule Subject Area File Column Name Column Definition Target Domain
Mandatory Key
Customer Subject Area
SYS 2 CUST FILE F_NAME Char(15) Pad last 5 digits CUST.dat Purchaser_First_Name The first name of the purchaser Varchar(20) Yes No SYS 2 CUST FILE L_NAME Char(15) Pad last 5 digits CUST.dat Purchaser_Last_Name The last name of the purchaser Varchar(20) Yes No
Figure 6.23 Applying technical requirement: padding data elements
7. Reconcile definitional (data governance) issues between source systems—Resolve any data element (attribute)–level definitional differences between the different sources and the target data element.
ptg
Data Integration Analysis Phase 139
This task addresses the very first point in the book. Addressing the technical challenges of data integration are difficult enough; determining the correct interpretation of a data element, whether it is simply two fields being merged into one, or a calculation, requires attention from the data integration analyst performing the mapping, the data modeler that created the target element and target definition, and the business subject matter experts that understand each of the source data element definitions that are being mapped to the target.
The completed Wheeler data warehouse source-to-EDW mapping document is illustrated in Figure 6.24.
ptg
ptg
Data Integration Analysis Phase 141
Source File/
Table
Source Field Source Domain
Mapping Rule Subject Area File
Column Name Target Domain Product Subject Area
Create a system- generated ID
PROD.dat Product_Id INTEGER(10)
Must be assigned
"SYS1"
PROD.dat Source_System_Identifier VARCHAR(4)
SYS 1 PROD FILE Item Number Varchar(04) 1.Translate Varchar to integer. 2. Pad last 6 digits.
PROD.dat Source_System_Code INTEGER(10)
SYS 1 PROD FILE Description Char(30) Pad last 10 digits. PROD.dat Product_Name CHAR(40)
er
"Rubber," "Wheels,"
or "Bearings."
Product_Type CHAR(40)
Insert "No Source System Value"
Product_Code VARCHAR(20)
SYS 1 PROD FILE Cost Decimal(12,2) Trim first 5 digits. PROD.dat Product_Cost Decimal 7,2
SYS 1 PROD FILE Price Decimal(12,2) Trim first 5 digits. PROD.dat Product_Price Decimal 7,2 SYS 1 PROD FILE Inventory Decimal(12,2) Trim first 5 digits. PROD.dat Inventory Decimal 7,2
Create a system- generated ID
PROD.dat Product_Id INTEGER(10)
Must be assigned
"SYS1"
PROD.dat Source_System_Identifier VARCHAR(4)
SYS 2 PROD FILE Item ID Integer(06) 1.Translate Integer to Varchar. 2. Pad last 4 digits.
PROD.dat Source_System_Code VARCHAR(10)
SYS 2 PROD FILE Inventory Name Char(30) 1. Pad last 10 digits. PROD.dat Product_Name CHAR(40)
er
"Rubber," "Wheels,"
or "Bearings."
Product_Type CHAR(40)
Insert "No Source System Value"
Product_Code VARCHAR(20)
SYS 2 PROD FILE Cost Decimal(12,2) Trim first 5 digits. PROD.dat Product_Cost Decimal 7,2
SYS 2 PROD FILE Price Decimal(12,2) Trim first 5 digits. PROD.dat Product_Price Decimal 7,2 SYS 2 PROD FILE Inventory Decimal(12,2) Trim first 5 digits. PROD.dat Inventory Decimal 7,2
Create a system- generated ID
PROD.dat Product_Id INTEGER(10)
Must be assigned
"SYS1"
PROD.dat Source_System_Identifier VARCHAR(4)
SYS 3 PROD FILE ID Number Integer(06) 1.Translate Integer to Varchar. 2. Pad last 4 digits.
PROD.dat Source_System_Code VARCHAR(10)
SYS 3 PROD FILE Name Char(30) 1. Pad last 10 digits. PROD.dat Product_Name CHAR(40)
er
"Rubber," "Wheels,"
or "Bearings."
Product_Type CHAR(40)
Insert "No Source System Value"
Product_Code VARCHAR(20)
SYS 3 PROD FILE Cost Decimal(12,2) Trim first 5 digits. PROD.dat Product_Cost Decimal 7,2
SYS 3 PROD FILE Price Decimal(12,2) Trim first 5 digits. PROD.dat Product_Price Decimal 7,2 SYS 3 PROD FILE Inventory Decimal(12,2) Trim first 5 digits. PROD.dat Inventory Decimal 7,2
Figure 6.24 Wheeler source-to-data warehouse data mapping
ptg
ptg
Data Integration Analysis Phase 143
1. Source-to-Enterprise Data Warehouse Data Mappings Source File/
Table
Source Field Source Domain
Mapping Rule Subject Area File Column Name Column Definition Target
Domain
Mandatory Key Note
Order Subject Area
SYS 2 ORDR FILE PROD_PRICE Decimal(05,2) 1.Translate Decimal to Integer. 2. Trim the first digit.
PROD.dat Product_Price The per unit price that Wheeler charges
their customers.
Decimal 7,2 Yes No
SYS 2 ORDR FILE AMNT_ORDR Decimal(08,2) Translate Decimal to Integer
ORDR.dat Quantity_Ordered The per unit quantity of the product ordered INTEGER(07)
Use the same system-generated ID
PROD.dat Order_Number This number represents a single
occurrence of an order.
INTEGER(07) Yes Primary
SYS 2 ORDR FILE LINE_2 Decimal(2,2) Insert "2" into the field.
Order_Line_Number The unique identifier for one occurrence of a status code on a order.
INTEGER(04) Yes Primary
SYS 2 ORDR FILE ITEM_ID Integer(06) 1. Pad the first 4
digits. 2. Perform a lookup and match the
"Item_ID" with the product table
"Source_System_Co de" in the
"Product_Id.Source _System_Identifier.S ource_System_Cod e" primary key, once matched insert the
"Product_ID" value from that row.
ORDR.dat Product_Id The unique identifier of a Wheeler product. INTEGER(10) Yes Foreign
SYS 2 ORDR FILE PROD_PRICE Decimal(05,2) 1.Translate Decimal to Integer. 2. Trim the first digit.
PROD.dat Product_Price The per unit price that Wheeler charges
their customers.
Decimal 7,2 Yes No
SYS 2 ORDR FILE AMNT_ORDR Decimal(08,2) Translate Decimal to Integer
ORDR.dat Quantity_Ordered The per unit quantity of the product ordered INTEGER(07)
Create a system- generated ID
PROD.dat Order_Number This number represents a single
occurrence of a order.
INTEGER(07) Yes Primary
Must be Assigned
"SYS1"
PROD.dat Source_System_Identifier The identifier of the source system that the data was sourced.
VARCHAR(4) Yes Primary
SYS 3 ORDR FILE ORD _# Decimal(05,2) 1.Translate Decimal
to Varchar. 2. Trim the last 2 digits, pad the first 5.
ORDR.dat Source_System_Code The unique identifier of the application or system from which the information last used to update the entity instance was populated.
VARCHAR(10) Yes Primary
SYS 3 ORDR FILE STS Char(07) 1. Translate Char to
VarChar. 2. Pad the last 25 digits
ORDR.dat Terms The terms of payment for the order. VARCHAR(30) Yes No
SYS 3 ORDR FILE DTE Integer(08) Translate Integer to
Date
ORDR.dat Order_Date The date that the order was placed. Date Yes No
Insert "00/00/0000" ORDR.dat Effective_Date The date that the order will take effect. Date No No
SYS 3 ORDR FILE CUST_# Varchar(04) 1. Translate Varchar
to Integer. 2. Pad the first 6 digits. 3.
Perform a lookup and match the "Cust_#"
with the customer table " Cust_ID" , once matched insert the "Cust_ID" value from that row
ORDR.dat Cust_Id The unique identifier assigned to a
customer.
INTEGER(10) Yes Foreign
Use the same system-generated ID
PROD.dat Order_Number This number represents a single
occurrence of a order.
INTEGER(07) Yes Primary
SYS 3 ORDR FILE LN_1 Decimal(2,2) Insert "1" into the field.
Order_Line_Number The unique identifier for one occurrence of a status code on a order.
INTEGER(04) Yes Primary
SYS 3 ORDR FILE ID_NUMBER Integer(06) 1. Pad the first 4
digits. 2. Perform a lookup and match the
"Item_ID" with the product table
"Source_System_Co de" in the
"Product_Id.Source _System_Identifier.S ource_System_Cod e" primary key, once matched insert the
"Product_ID" value from that row.
ORDR.dat Product_Id The unique identifier of a Wheeler product. INTEGER(10) Yes Foreign
SYS 3 ORDR FILE PROD_PRICE Decimal(05,2) 1. Pad the first digit. PROD.dat Product_Price The per unit price that Wheeler charges
their customers.
Decimal 7,2 Yes No
SYS 3 ORDR FILE AMNT_ORDR Decimal(08,2) 1.Translate Decimal to Integer. 2. Trim the first digit, and the last 2 digits.
ORDR.dat Quantity_Ordered The per unit quantity of the product ordered INTEGER(07)
Use the same system-generated ID
PROD.dat Order_Number This number represents a single
occurrence of an order.
INTEGER(07) Yes Primary
SYS 3 ORDR FILE LN_2 Decimal(2,2) Insert "2" into the field.
Order_Line_Number The unique identifier for one occurrence of a status code on an order.
INTEGER(04) Yes Primary
SYS 3 ORDR FILE ID_NUMBER Integer(06) 1. Pad the first 4
digits. 2. Perform a lookup and match the
"Item_ID" with the product table
"Source_System_Co de" in the
"Product_Id.Source _System_Identifier.S ource_System_Cod e" primary key, once matched insert the
"Product_ID" value from that row.
ORDR.dat Product_Id The unique identifier of a Wheeler product. INTEGER(10) Yes Foreign
SYS 3 ORDR FILE PROD_PRICE Decimal(05,2) 1. Pad the first digit. PROD.dat Product_Price The per unit price that Wheeler charges
their customers.
Decimal 7,2 Yes No
SYS 3 ORDR FILE AMNT_ORDR Decimal(08,2) 1.Translate Decimal to Integer. 2. Trim the first digit, and the last 2 digits.
ORDR.dat Quantity_Ordered The per unit quantity of the product ordered INTEGER(07)
Use the same system-generated ID
PROD.dat Order_Number This number represents a single
occurrence of a order.
INTEGER(07) Yes Primary
SYS 3 ORDR FILE LN_3 Decimal(2,2) Insert "3" into the field.
Order_Line_Number The unique identifier for one occurrence of a status code on a order.
INTEGER(04) Yes Primary
SYS 3 ORDR FILE ID_NUMBER Integer(06) 1. Pad the first 4
digits. 2. Perform a lookup and match the
"Item_ID" with the product table
"Source_System_Co matched insert the
"Product_ID" value from that row.
ORDR.dat Product_Id The unique identifier of a Wheeler Product. INTEGER(10) Yes Foreign
SYS 3 ORDR FILE PROD_PRICE Decimal(05,2) 1. Pad the first digit. PROD.dat Product_Price The per unit price that Wheeler charges
their customers.
Decimal 7,2 Yes No
SYS 3 ORDR FILE AMNT_ORDR Decimal(08,2) 1.Translate Decimal to Integer. 2. Trim the first digit, and the last 2 digits.
ORDR.dat Quantity_Ordered The per unit quantity of the product ordered INTEGER(07)
ptg Data Mapping Checkpoint
Version: 1.0 Released:
Quality Control Process/Procedure/Task Review Perform Data Mapping to Source Systems
Roles and Responsibilities
1. Were critical transaction-level data elements confirmed?
2. Were key data aggregations and calculations confirmed?
3. Were technical requirements mapped to each source system?
1. Were definitional (data governance) issues between source systems reconciled?
Key: R-Responsible, A-Approves, S-Supports, I-Informs, C-Consults
The deliverables review on this checklist conforms to standards:
Comments
Completely Partially Not at all Project Manager
Data Steward Business Analyst
Data Integration Architect
Data Integration
Architect
Metadata Specialist
A A A B S A
Yes No N/A Comments Content Owner:
Dept Name:
Figure 6.25 Data mapping sign-off form
Once all the source fields have been mapped to the target data model, plan for two to three review (and renovation) sessions with the business stakeholders on confirming the completeness and accuracy of the data mappings.
Pay careful attention on calculations and key mapping confirmations.
Finally, it is a very good best practice to have an internal review session with formal check-points by peers or peer groups prior to a final sign-off on the data mappings with the end user.
Figure 6.25 provides an example of a formal sign-off sheet for data mapping.
ptg
Summary
In this chapter, we began our second case study with the emphasis on working through the entire data integration life cycle tasks and deliverables. In subsequent chapters, we cover phases of the data integration life cycle, and provide case studies for each phase. This case study was based on integrating three order management systems for the Wheeler Company into an enterprise data warehouse and product line profitability data mart.
Before starting the case study, the chapter first reviewed the important concept of where calculations and aggregations could go in the different layers of a data warehouse and the advan-tages and disadvanadvan-tages of each approach.
The first task that the case study covered was how to graphically scope out the data integra-tion project by building a “picture” of the intended Wheeler data integraintegra-tion processes in a con-ceptual data integration diagram. Once documented and the scope is identified and confirmed, attention is moved to the source systems.
Next, we performed source systems profiling and analysis to have a good understanding of the underlying Wheeler source system data.
Finally, we mapped the source data to the target database; in the Wheeler case study, it was the data warehouse data model. We reviewed the fact that data mapping is not a one-to-one exer-cise, but requires both a horizontal and vertical view of the sources to target.
Chapter 7, “Data Integration Logical Design,” focuses on using the analysis phase deliver-ables such as the source-to-target mapping document and the Data Quality Criteria Work Book as sources for building out the logical design deliverables such as the logical data integration archi-tecture and logical data integration models.
Summary 145
ptg
ptg In a data integration project, the logical design phase transforms the data integration
require-ments (e.g., the data mappings) into logical business designs. It segrequire-ments those mappings into logical units of work, using the data integration modeling technique and reference architecture.
The logical design phase also completes the analysis on data quality by focusing on the tar-get’s data quality criteria, both technical and business.
It is also important to begin to determine the physical volume sizing of the proposed data integration application on the data integration environment.
The tasks for the data integration logical design phase include the following: