Build the Logical Extract DI Models - Case Study: Customer Loan Data Warehouse Project

Case Study: Customer Loan Data Warehouse Project

Step 3: Build the Logical Extract DI Models

The ﬁrst question is how we structure the logical extract data integration model or models, one or many. For our case study, there are only three sources: the customer hub, commercial loan, and retail loan.

It is best to put all three sources on the same diagram for the sake of simplicity. In practice, however, there are some things to consider:

• Multiple data sources—Most projects have many, many sources. In a new data ware-house build-out, a typical data integration project can have from 20 to 30 sources, which at a conceptual and high level can potentially be displayed on one page, but not with any detail.

• Modularity 101—Following the development technique of one function per process, focusing on one source per extract data integration model will be reﬁned from analysis through design into building one data integration job per source system.

In addition, we will need to build three logical extract data integration models, one per source system. These activities include the following:

ptg

Step 3: Build the Logical Extract DI Models 73

• Conﬁrming the subject area focus from the data mapping document

• Reviewing whether the existing data integration environment can fulﬁll the requirements

• Determining the business extraction rules

Conﬁrm the Subject Area Focus from the Data Mapping Document

Conﬁrm the target database subject areas. Subject is deﬁned as a logical grouping or “super type”

of entities/tables surrounding a business concept. An example is the Party concept, which may have multiple entities such as Party, which includes the following entities:

• Customer

• Employee

• Individual

By grouping the entities/tables from subject areas such as Party into a target subject area, a common target is created that multiple source systems can be mapped in such a way to be con-formed into a common format, as shown in Figure 4.6 from our earlier example.

So what are the subject areas for the data warehouse and data mart? By reviewing the data models, a pattern can be determined for logical groupings for subject areas. In the tables deﬁned for both models, a clear grouping can be observed:

• Customer

• Customers

• Addresses

• Loan

• Loans

• Products

So for our logical data integration models, the following subject area ﬁles will be used:

• CUST.dat

COB-TYPE PIC S9(3)

COM Field Name Length and Type

LN-TYP-IXR PIC S10(2)

RETL Field Name Length and Type Loan Type Decimal 10.2

EDW Field Name Length and Type

Figure 4.6 Subject area mappings

ptg

Review Whether the Existing Data Integration Environment Can Fulﬁll the Requirements

One of the major tenets of building data integration models and components from the models is reuse. It is our nature to build ﬁrst and then look for reuse opportunities! So to break that bad habit, let’s look ﬁrst, especially in a maturing data integration environment if a model exists and then build new if necessary.

Determine the Business Extraction Rules

Determine what needs to occur to extract or capture the data from the source system.

For batch, determine when and how the ﬁles need to be captured:

• From the source system?

• From an extract directory?

• When (e.g., 3:00 a.m.)?

For real time, determine when and how the transactional packets need to be captured:

• From a message queue?

• From the source system log?

Control File Check Processing

An important aspect of extraction is confirming that the data extract is correct. The best practice used to verify file extracts is control file check, which is a method to ensure that the captured files meet predefined quality criteria, as shown in Figure 4.7.

111 112 113 114 115

$90,000

$11,000

$120,000

$45,000

$38,000

$304,000

5 $304,000

Loan File Control File Total Loans Loan Amount Loan File

Loan Number Loan Amount

Figure 4.7 Sample control ﬁles

Complete the Logical Extract Data Integration Models

The ﬁnal step is to assemble the requirements into the logical extract data integration models.

Figures 4.8, 4.9, and 4.10 illustrate the customer hub logical extract data integration model, the commercial loan logical extract data integration model, and the retail loan logical extract data integration model.

ptg

Step 3: Build the Logical Extract DI Models 75

Model Name: Customer Logical Extract Data Integration Model Project: Customer Loan

Life Cycle Type: Logical DI Architecture Layer: Extract

Extract Header

& Detail from the Customer

Hub Verify the

Header and Detail Extract

with the Control File

Format into the CUST.dat Subject Area

File Customer

Hub Application

Header

Detail

CUST.dat Subject Area File

Figure 4.8 Customer logical extract data integration model

Commercial Loan Application

COM 010

COM 200

Model Name: Domestic Order Management Logical Extract Data Integration Model Project: Wheeler Enterprise Data Integration

Life Cycle Type: Logical DI Architecture Layer: Extract

Extract COM 010 and COM 200 from the Commercial Loan System

Verify the COM 010 and COM 200 Extracts with the Control File

Format COM010 into the CUST.dat Subject Area

File

Format COM200 into the LOAN.dat Subject Area

File

LOAN.dat Subject Area File

CUST.dat Subject Area File

Figure 4.9 Commercial loan logical extract data integration model

ptg

Retail Loan Application

RETL 010 RETL 020

LOAN.dat Subject Area File

CUST.dat Subject Area File Format RETL

010 into the CUST.dat Subject Area

File

Format RETL 020 into the LOAN.dat Subject Area

File Extract RETL

010 and RETL 020 from the Retail Loan System

Verify the RETL 010 and

RETL 020 Extracts with the Control File Model Name: Retail Loan Logical Extract Data Integration Model Project: Customer Loan

Life Cycle Type: Logical DI Architecture Layer: Extract

Figure 4.10 Retail loan logical extract data integration model

Final Thoughts on Designing a Logical Extract DI Model

One of the key themes is to get the “big” picture before design; it is best practice to ﬁrst identify all the sources then evaluate each of the data sources in its entirety.

In addition, to leverage the “read once, write many” best practice, when extracting from a source, rather than only extracting the data elements needed for a speciﬁc target, it is best to extract the entire ﬁle for both current and potentially future sourcing needs.

When extracting a limited set of data for a single application or database, it is highly probable that there will be the need to extend the application, or rewrite the application, or in the worst case, write another extract from the same source system.

In document Data Integration Blueprint and Modeling (Page 97-101)