Technical Implementation of the Data Acquisition Process

Chapter 6. The Data Warehouse Definitions

6.2 Technical Implementation of the Data Acquisition Process

We used Visual Warehouse to implement the data acquisition process in this example. We implemented two different scenarios. One scenario uses the target database including the staging tables on a DB2 UDB for AIX; the other scenario was implemented using DB2 UDB on NT.

The administration of both target systems is done from the Windows NT workstation ’KHANKA’, which has the Visual Warehouse server function installed. This server function is not available on AIX.

The Visual Warehouse control database ’VWCNTL01’ resides on the server system ’KHANKA’. The target warehouse databases are located on the Windows NT system ’PALAU’ (SJNTDWH1) and on the AIX system ’AZOV’ (SJAXDWH1).

The transformation from the source data to the target model is defined and managed by Visual Warehouse using Visual Warehouse Business Views (BVs).

The following features of Visual Warehouse were used to support the data transformation:

• Transient BVs for the staging data

• Scheduling and cascading capabilities to automate the data acquisition process

• Work in Progress view of Visual Warehouse to monitor the execution of the data acquisition process

• Visual Warehouse Meta Data Catalog to document and drive the data acquisition process

The BVs for the intermediate sources can be created as transient BVs, that is, the population of the transient BV is triggered by the target BV, and the data is kept only temporarily until the data has arrived in the final dimension table safely. To do this, the Transient Data check box has to be selected on the Information folder of the Business View notebook.

This feature helps you to manage the space in the data warehouse more efficiently, without the administrative overhead needed to get rid of temporary result tables manually.

The actual control flow during the data acquisition process is managed with the scheduling and cascading function of Visual Warehouse.

The transient BV will be referenced by the target BV created as a dimension table for OLAP. If this target BV is populated due to an explicit request or a schedule, the transient BVs are populated first, then the target BV is filled, and the transient BV will be deleted. A transient BV cannot be scheduled! Another way to automate the population of BVs belonging to one target BV (dimension table), is to schedule the execution of the next BV to be populated by the one which just completed. To make sure that all the sources have been populated before the target BV is run, you will have to set this up as a chain. So the first BV will be populated by request, or scheduled by another event, and the next BV will be scheduled by this one, and so on, until the target BV is scheduled as the last one. To do this, you can select the Starts option from the Schedule Type pulldown menu in the Schedule folder of the Business

View notebook. You can add the appropriate BV from the selection panel,

which is then displayed.

Once the BVs and the relationships between them have been defined, the entire data acquisition process can be monitored from the work in progress view within Visual Warehouse Administrator. Every planned, ongoing, and finished execution of every BV is listed in this view. At any point in time, the administrator can identify what the status of the tables in the warehouse is. If there were errors during the execution of a BV, the details of the error situation are logged and made accessible from this work-in-progress view as well.

All the information that is generated during the data acquisition process—that is, information about the structure of the data sources and target tables, mapping information, BV descriptive and administrative information, and statistical information about the execution of the BVs—is stored in the Visual Warehouse meta data catalog. It can be enriched with business definitions and can be accessed by the business analysts in order to help them

understand the content and status of the warehouse as well as to help them to navigate within the warehouse structures.

The design and implementation of a manageable data warehouse

infrastructure is not a trivial task, many different functions are involved. A data staging area is needed to support the transformation of the data from an application centric model to a business centric model. The staging process can have several steps until the final target data model is reached, depending on the diversity of the data sources and the complexity of the transformations. We could solve most of the requirements using Visual Warehouse in this example.

However, in some cases, the functions provided by Visual Warehouse may not be sufficient to address all data quality and data conversion issues. You may have to incorporate user-written code to implement specific

transformations or you may want to use generated code from tools such as ETI Extract and Vality Integrity to support your data acquisition and cleansing process.

Visual Warehouse supports the integration of these other components and allows for the control and management of these components from a single point.

Appendix A. The OS/390 Environment

This appendix provides detailed information and samples of the OS/390 environment.

In document Getting Started with Data Warehouse and Business Intelligence (Page 190-193)