• No results found

Aggregate and Surrogate Key Process

In document Data Warehouse Design with UML (Page 176-184)

9.3 Modeling ETL processes

9.4.3 Aggregate and Surrogate Key Process

In Figure 9.10, we have represented a loading process into a DW. The grain level of Sales(the data source) is ticket line, but we need the daily total sales in the DW (Sales fact table at the right hand side of Figure 9.10). Therefore, Sales are grouped and summed up by product and date inSummedSales. Then,SurrogatedSalesadds a surrogate key (Time) based on theDate attribute before loading the data into theDW. This surrogate key is used in theDWto establish a relation between Sales fact andTime dimension (Sales contains a foreign key toTime). Finally,SalesLoaderloads summarized sales into Sales fact table,ProductsLoader loads the product list into Products dimension, andTimeLoader loads time data intoTime dimension.

9.5. Conclusions 151

Figure 9.10: Aggregate and surrogate key process

9.5

Conclusions

In this chapter, we have presented the modeling ofETLprocesses as part of our integrated and global approach for DWdesign. Thanks to the use of theUML, we can seamlessly model different aspects of a DW architecture such as operational data sources, logical schema and ETLprocesses in an integrated manner. In this way, it is very easy to detect inconsistencies between the different schemas of aDW

architecture and it helps the designer to estimate the feasibility of the development.

Our approach for modelingETLprocesses defines a set of UML

stereotypes that represent the most common ETL tasks such as the integration of different data sources, the transformation between source and target attributes, the generation of surrogate keys, and so on. Furthermore, thanks to the use of the UMLpackage mech- anism, largeETLprocesses can be easily modeled in differentETL

packages obtaining a very simple but yet powerful approach. Thanks to its simplicity, our approach facilitates the design and subsequent maintenance of ETLprocesses at any modeling phase. Finally, we have implemented our approach in Rational Rose through theRose Extensibility Interface (REI)[107].

Part III

Physical Level

Chapter 10

Physical Modeling of

Data Warehouses

During the few last years, few efforts have been dedicated to the modeling of the physical design (i.e. the physical structures that will host data together with their corresponding implementations) of aDWfrom the early stages of aDWproject. In this chapter, we present a proposal for the modeling of the physical design of DWby using thecomponent

diagram and deployment diagram of UML. With these diagrams, we

can anticipate important physical design decisions that may reduce the overall development time of aDWsuch as replicating dimension tables, horizontal partitioning of a fact table, the use of particular servers for certainETLprocesses and so on.

Contents

10.1 Introduction . . . 157 10.2 UML Component and Deployment Di-

agrams . . . 158

10.2.1 Component Diagram . . . 158 10.2.2 Deployment Diagram . . . 159

10.3 Data Warehouse Physical Design . . . . 162

10.3.1 Source Physical Schema . . . 165 10.3.2 Data Warehouse Physical Schema . . . 166 10.3.3 Integration Transportation Diagram . . 167 10.3.4 Client Physical Schema . . . 169 10.3.5 Customization Transportation Diagram 169

10.1. Introduction 157

10.1

Introduction

Unfortunately, most of the research efforts in designing and modeling

DW has been focused on the development of MD data models [2],

For more infor- mation about the multidimensional modeling, consult section 6.2, pp. 56.

while the interest on the physical design of DW has been very poor (see related work in Chapter 3). Nevertheless, an outstanding phys- ical design is of a vital importance and highly influences the overall performance of theDW [93] and the ulterior maintenance.

In DW, as in any other software project, once the conceptual and logical design have been accomplished, we have to deal with the physical design that implements the corresponding specification. Nevertheless, inDWand mainly due to the large volume of data that they manage, we normally face with a high number of implementa- tion problems such as the storage of fact tables in different hard disks, copying the same table, vertical and horizontal partitioning and so on. Due to the idiosyncrasy of DW, we can adopt several decisions regarding the physical design from the early stages of aDWproject (in which final users, designers and analysts, and administrators par- ticipate). We believe that these decisions will normally reduce the total development time of theDW. It should be taken into consider- ation that we are not saying to accomplish the conceptual modeling of a DW taking into account physical issues, instead we argue to model physical aspects and ulterior implementations together with the conceptual modeling of theDW from the early stages of aDW

project.

In this chapter, we present a proposal to accomplish the physical design of DW from early stages of a DW project. To accomplish this, we propose the use of the component diagram and deployment diagram of UML. Bothcomponent and deployment diagrams must be defined at the same time by DW designers and people who will be in charge of the ulterior implementation and maintenance. This is mainly due to the fact that, while the former know how to design and build aDW, the latter have a better knowledge in the corresponding implementation and the real hardware and software needs for the correct functioning of theDW.

The modeling of the physical design of aDWfrom the early stages of a DWproject with our proposal provides us many advantages:

• We deal with important aspects of the implementation before we start with the implementation process, and therefore, we can reduce the total development time of the DW. This is mainly due to the fact that, after the conceptual modeling has been accomplished, we can have enough information to take some decisions regarding the implementation of the DW structures such as replicating dimension tables or making the horizontal

partitioning of a fact table.

• We have a rapid feedback if we have a problem with theDW

implementation as we can easily track a problem to find out its main reasons.

• It facilitates the communication between all people involved in the design of a DW since all of them use the same notation (based on theUML) for modeling different aspects of a DW.

• It helps us choose both hardware and software on which we in- tend to implement theDW. This also allows us to compare and evaluate different configurations based on user requirements.

• It allows us to verify that all different parts of theDW(fact and dimension tables,ETLprocesses,OLAPtools, etc.) perfectly fit together.

The rest of this chapter is organized as follows. In Section 10.2, we present main issues that can be specified by using both compo- nent and deployment diagrams of UML. In Section 10.3, we describe our proposal for using both component and deployment diagrams for the physical design of DW. Finally, in Section 10.4, we present our conclusions.

In document Data Warehouse Design with UML (Page 176-184)