
Data Warehouse Engineering Process

In document Data Warehouse Design with UML (Page 45-48)

During the last few years, different data models [49, 20, 133, 55, 132] (see Section 3.3 for a review of the most important proposals), both conceptual and logical, have been proposed for DW design. These approaches are based on their own visual modeling languages or make use of a well-known graphical notation, such as the ER

widely accepted as a standard DW model, because they present some important shortcomings.

On the other hand, different DW methods [63, 50, 20, 48, 23, 87] have also been proposed. However, each of them exhibits at least one of the following problems: they do not address the whole DW process, they do not include a visual modeling language, they do not propose a clear set of steps or phases, or they are based on a specific implementation (e.g., the star schema in relational databases).

In [63], different case studies of Data Marts (DM) are presented. The MD modeling is based on the star schema and its different variations (snowflake and fact constellation). Moreover, the BUS matrix architecture, which integrates the design of several DM, is proposed. Although we consider this work a fundamental reference in the MD field (R. Kimball provides a very sound discussion of star schema design), it lacks a formal method for the design of a DW. Furthermore, the conceptual and logical models coincide in this proposal, and the concepts behind the BUS matrix architecture are a compilation of the personal experiences of the authors and the problems they have faced while building enterprise DW from DM. In [65], the DW lifecycle with its most relevant phases is presented: different tools and techniques are suggested, but a method (and a model) for the whole process is not proposed.

In [50], the authors propose the Dimensional-Fact Model (DFM), a particular notation for DW conceptual design. Moreover, they also propose how to derive a DW schema from the data sources described by ER schemas. From our point of view, this proposal is only oriented to the conceptual and logical design of the DW, because it does not consider important aspects such as the design of ETL processes. Furthermore, the authors assume a relational implementation of the DW and the existence of ER schemas for all the data sources, which is often not the case. Finally, we think that the use of a particular notation makes this proposal difficult to apply.

In [20], the authors present the Multidimensional Model, a logical model for OLAP systems, and show how it can be used in the design of MD databases. The authors also propose a general design method, aimed at building an MD schema starting from an operational database described by an ER schema. Although the design steps are described in a logical and coherent way, the DW design is based only on the operational data sources, which is insufficient from our point of view, because the final users’ requirements are very important in DW design.

In [47], a framework to build a DW in three basic steps (planning, design and implementation, and support and enhancement) is presented. The author highlights the importance of using a method: “Successfully implementing a DW requires a proven framework, or blueprint”. Nonetheless, a model for the analysis and design of a DW is not provided: only the activities to be carried out and the decisions to be taken are shown.

In [34], a method based on the UML for DW design is presented. From our point of view, the most significant aspect of this proposal is the incorporation of UML use cases in order to specify the role of each member of the DW development team. Apart from that, this method does not study in depth some relevant aspects such as the conceptual or logical design of the DW, or the ETL processes; because of this, this approach cannot be considered a detailed method.

In [28], different DW architectures and the activities needed for the construction of a DW are discussed. Although the book has a chapter dedicated to “DW design methodology”, only the steps needed for the construction of the “preferred architecture of a DW” are presented, and the modeling is based on the star schema.

In [87], the building of the star schema (and its different variations) from the conceptual schemas of the operational data sources is proposed once again. And again, it is assumed that the data sources are defined by means of ER schemas. This approach differs in that it does not propose a particular graphical notation for the conceptual design of the DW, but uses the ER graphical notation instead.

In [6], the authors mainly focus on the definition of MD hierarchies, but they also sketch a DW design method based on the three usual modeling levels (conceptual, logical, and physical). The conceptual design is based on the UML, but the authors propose their Unified Multidimensional Model for the logical design.

Most recently, in [23] another method for DW design is proposed. This method is based on an MD model called IDEA and it proposes a set of steps that address the conceptual, logical, and physical design of a DW. One of its most important advantages, with respect to the previous proposals, is that the operational data sources together with the final users’ requirements are considered in the design. Nevertheless, this method only considers the data modeling and does not address other relevant aspects, such as the ETL processes.

In [21], different DW development methods are analyzed and a new method is proposed. This method stands out because it integrates the management of metadata. However, it lacks a model that can be used to reflect and document the DW design.

In [118], a comparison of DW methodologies is presented. The comparison is based on using a common set of attributes to determine which methodology to use in a particular data warehousing project. Nevertheless, the authors only focus on commercial methodologies. Moreover, the authors state that “...the field of data warehousing is not very mature” and “None of the methodologies reviewed in this article has achieved the status of a widely recognized standard as yet”.

Finally, the only work we know of that uses the UML for the design of a DW is [51], which explains the modeling of the star and snowflake schemas using the UML. However, this work only addresses a single step of the DW design process and does not propose a UML extension for DW design: it only shows how to achieve the star schema using the UML.

Therefore, based on the previous considerations, we believe that there is currently no general, standard, formal method that covers the main steps of DW design.
