Customer
the legal name for the customer) as used by the financial
ll
Data Quality Criteria: Male, Female, Unknown
r
Source System Unique Key Text The unique identifier of the customer in the source system. Source_Sys_Unique_Key_Text VARCHAR(32) Yes Must be not null
Source System Code The unique identifier of the source system. Source_Sys_Code VARCHAR(20) Yes Must be not null
Customer Type Identifier The unique identifier assigned to the customer type. For example, commercial, retail
Customer_Type_Id SMALLINT Yes Must be not null
Customer Effective Date The date on which the customer first became relevant to the financial institution.
Cust_Effective_Date DATE Yes Must be not null and a date
field Customer End Date The date on which the customer ceased to be relevant to the
financial institution.
Last Update Run Identifier Last_Update_Run_Id INTEGER(10) Yes Must be not null
Created Run Identifier Created_Run_Id INTEGER(10) Yes Must benotnull
Figure 5.9 Data quality checkpoint definition example
Performing Source\Target Data Mappings
This task maps each source system data element’s technical and business definition to the intended target element (or data elements). For example, for every expected derived or
transac-ptg
Overview of Data Mapping
Data mapping, one of the most critical aspects of data integration, is the process of conforming data elements between one or (usually) more sources to a target data model. Data mapping is used as a first step for a wide variety of data integration tasks, including the following:
• Data transformation or data mediation between a data source and a destination, which includes the identification of all data relationships as part of this data lineage analysis
• The discovery of hidden sensitive data, for example, the last four digits of a Social Secu-rity number hidden in another user ID as part of a data masking or de-identification project for multiple databases into a single database
For example, a company that would like to transmit and receive purchases and invoices with other companies might use data mapping to create data maps from a company’s data to stan-dardized ANSI ASC X12 messages for items such as purchase orders and invoices. Figure 5.10 illustrates a typical data mapping example where three system primary keys, Customer #, Cus-tomer Number (using Social Security number), and CusCus-tomer #, are used to build an overall cus-tomer key, Involved Party.
System 1 Customer # Alpha 15
System 2 Customer Number Social Security 9
System 3 Customer # Numeric 06
Involved Party ID Alphanumeric 20
Figure 5.10 Typical data mapping example
Data mapping is not a technical task; it is a business analysis task and is one of the most important tasks in any data integration project.
Data mapping is also not a one-to-one concept. It requires both “horizontal” and “vertical”
analysis of the one-to-many sources to (usually) one target, as demonstrated in Figure 5.11; it requires deep business knowledge of the particular industry.
ptg For example, for integration loans from multiple loan systems, a data integration analyst
with knowledge of banking is needed. For the integration of multiple product masters for auto-motive parts, a data integration analyst with knowledge of manufacturing would be needed to explain the business rules and relationships of their particular data.
Types of Data Mapping
Data mapping is a series of design patterns or “types” that requires the different types of analysis, as follows:
• One-to-one data mapping—The simplest type of data mapping is a one-to-one (see Figure 5.12). Even in this scenario, there is a level of transformation that is needed. In this mapping, the data elements need to be translated from Integer to VarChar to not have data mapping errors in the data integration jobs.
Performing Source\Target Data Mappings 113
Horizontal Analysis
Vertical Analysis
System 1 Customer # Alpha 15
System 2 Customer Number Social Security 9
System 3 Customer # Numeric 06
Involved Party ID Alphanumeric 20
Figure 5.11 The multidimensional analysis aspect of data mapping
Source File/
Table
Source Field Source Domain
Mapping Rule
Column Name Target Domain
Mandatory Key
CS1001 SOC-SEC-# INTEGER (09) Translate Integer to Varchar
Social_Sec_Number VARCHAR(09) Yes Yes
Figure 5.12 One-to-one data mapping scenario
• One-to-many data mapping—One-to-many scenarios often occur when data is being mapped from a second normal form data model to a third normal form data model, as displayed in Figure 5.13. In this example, the Customer File data elements are mapped to a normalized relational database. The data mapper will need to analyze what data ele-ments map to what table. For example:
CUST_ID maps to Customer Number in the Customer_Table and to the
ptg The mapping to perform this normalization creates the one-to-many mapping shown in
Figure 5.14.
Figure 5.13 One-to-many data mapping example: one file to two tables
Source File/
Table
Source Field Source Domain
Mapping Rule
Target Table Column Name Target Domain
Mandatory Key
CUST FILE CUST_ID CHAR (15) None. Customer Customer Number VARCHAR(15) Yes Yes
CUST FILE CUST_FNAME CHAR (20) None. Customer Customer First Name VARCHAR(20) Yes No
CUST FILE CUST_LNAME Customer CustomerLastName No
Increment from 1
Address Address Id Yes Yes Yes
CUST FILE CUST_ID CHAR (15) None. Address Customer Number VARCHAR(15) Yes No
CUST FILE CHAR (20) None. Address Address Line 1 Yes No
0)
CUST FILE CITY CHAR (20) None. Address City VARCHAR(20) Yes No
CUST FILE STATE CHAR (20) None. Address State VARCHAR(20) Yes No
CUST FILE ZIP CHAR (09) None. Address Zip VARCHAR(09) Yes No
Figure 5.14 One-to-many data mapping example
• Many-to-one data mapping—The next mapping scenario, shown in Figure 5.15, requires a horizontal mapping view and is a typical mapping situation that rationalizes multiple source customer keys to one new customer key, in this example the Customer_Number attribute.
Customer File 1 CUST_ID INTEGER (09)
Customer Table
Customer_Number Integer (10) Source_System_Id Integer (02) Source_System_Number Integer (10) Customer File 2
CST Packed Decimal (08)
Customer File 3 Customer_ID Numeric (07)
Customer File 4 CUST_NUM Decimal (07)
Figure 5.15 Many-to-one data mapping example: four files to one table
ptg This mapping is illustrated in Figure 5.16 as each of the source customer IDs are
mapped to the target customer ID.
Performing Source\Target Data Mappings 115
et
Mandatory Key
r
If source system 1, then move "1" to the field, else if source system 2, then move "2" to the field, else if source system 3, then move "3" to the field, else if source system 4, then move "4" to the field, else if "U".
Customer Source_System_Id Integer (02) Yes Yes
CUSTOMER FILE 1CUST_ID INTEGER(09) Padaltsdigti Cutsomer Source_System_Number Integer (10) Yes No CUSTOMER FILE 2CST Packed Decimla(08)TransaltepadDecimatloIntege.rPadaltstwodigtis Cutsomer Source_System_Number Integer(10) No
CUSTOMER FILE 3Cutsomer_ID Numeric(07) TransaltepadNumercitoIntege.rPad latsthreedigtis Address Source_System_Number Integer (10) Yes No CUSTOMER FILE 4CUST_NUM Deicmla(07) TransaltepadDecimatloIntege.rPadaltsthreedigtis Address Source_System_Number Integer (10) Yes No
Figure 5.16 Many-to-one data mapping example
Key Source\Target Data Mapping Task Steps Key source-to-target data mapping steps include the following:
1. Determine the target subject areas—If applicable, review the target data model to group the target tables into logical subject areas.
2. Identify the target data element or elements by subject area—For each of the sub-ject areas (such as customer or product), determine what data elements fit within that grouping.
3. Review all the source systems for candidate data elements—Review the other sources for potential one-to-many source data elements for the target data element.
4. Map the candidate data element or elements to the target data element—Map the identified source data element to target data element. For this deliverable, document dif-ferences in technical metadata such as format (e.g., VarChar versus Char) and length.
5. Review each source and target data element for one-to-many or many-to-one requirements—Perform both a vertical and horizontal review of the sources against the target data element.
6. Map technical mapping requirements to each target’s subject area data element—
Build in any mapping business rules, which may be as simple as padding or trimming the field, to aggregating and/or calculating amounts.
7. Reconcile definitional (data governance) issues between source systems—Resolve any data element (attribute)–level definitional differences between the different sources and the target data element.
ptg
Summary
This chapter covered the data integration analysis tasks, steps, and techniques necessary to determine the requirements for a data integration solution.
The first task is to graphically scope the project by building a “picture” of the intended data integration processes in a conceptual data integration diagram. Once documented and the scope is identified and confirmed, attention is moved to the source systems.
Much of the time spent in difficult downstream development phase errors are a result of a lack of knowledge of the source systems (not the target); therefore, a significant amount of time and effort needs to be spent on determining the structures, the content, and the explicit and implicit business rules of the data.
Gaining an understanding of this data requires an iterative approach of profiling and ana-lyzing the data first within the file or table (e.g., columnar profiling) and then across the data files or tables.
We reviewed the fact that data mapping is not a one-to-one exercise but requires both a hor-izontal and vertical view of the sources to target.
The key theme of iterative design was embedded in all the tasks in this chapter. For example, the understanding of the data sources and how to map those sources to the target usually requires more than one pass to get it right.
The next chapter begins the next of a multichapter case study that goes through the entire data integration life cycle. Chapter 6, “Data Integration Analysis Case Study,” focuses on apply-ing the analysis techniques in this chapter to the Wheeler Automotive Company.
End-of-Chapter Questions
Question 1.
How does a conceptual data integration model help define scope?
Question 2.
What are the reasons why source system data discovery is so difficult?
Question 3.
Define data profiling.
Question 4.
Define data mapping.
Question 5.
Using the following diagram, what type of data mapping scenario is this?
System 1 Customer # Alpha 15
System 2 Customer Number Social Security 9
System 3 Customer # Numeric 06
Involved Party ID Alphanumeric 20
ptg