When the homogeneity condition is dropped, the correctness of a rollup opera-tion does not follow from the cube dependence graph. The problem can be easily realized by noting that the cube dependence graph of a single dimension corre-sponds to its hierarchy schema. The anomalous behavior of the cube dependence graph has deep implications for OLAP query processing, since the cube depen-dence graph is used in algorithms to compute and maintain data cubes (Mumick, Quass, & Mumick, 1997), and to speed up OLAP query processing (Harinarayan et al., 1996).
Implications for Relational OLAP Models
Kimball (1996) argues that the best logical model to place OLAP queries is the star schema. In the star schema the dimension consists of a single table where the categories are arranged as attributes. We will refer to this table as a star dimen-sion. The table organizes the dimension elements as tuples, and allows simple tuple browsing to place filters in OLAP queries. A star dimension requires the hierarchy schema to have a single bottom category, which is the key attribute of the table. The edges in the hierarchy schema are regarded as functional depen-dencies over the table.
Another relational realization of OLAP data is the snowflake schema, which yields snowflake dimensions. A snowflake dimension has one table per category c, which stores together all the rollup relations to the parent categories of c. The tables are normalized avoiding redundant data. However, in contrast to the star dimension, the snowflake dimension requires to process join operations over its tables, in order to assemble tuples for browsing and aggregate query processing.
Relationship among Graph, Star, and Snowflake Dimensions
Dimensions in the graph model we presented (we will refer to them simply as di-mensions, in contrast to star and snowflake dimensions) can be easily translated to a star dimension (Hurtado et al., 1999), provided the dimension is homogeneous and has a single bottom category. The table has as attributes the categories of the dimension. For each bottom element m of the graph dimension, there is a tuple in the table composed of the ancestors of m. As an example, Table 1 shows the star dimension corresponding to the dimension of Figure 4.
The translation of a star dimension to a graph dimension requires capturing the rollup relations from the table. It is simply obtained by projecting the table over
Handling Structural Heterogeneity in OLAP 4
the categories involved in the rollup relation. As an example, the rollup relation Γ[Brand,Category] can be obtained by the following query:
SELECT Brand,Category FROM Table_Product.
The translation between graph and snowflake dimensions is straightforward.
The problems reported in this chapter also appear in star and snowflake dimensions.
We next explain further implications of heterogeneity for them.
Allowing Heterogeneity
In a relational table each element of dimension table needs one entry for each attri-bute. Thus in order to allow structural heterogeneity in star and snowflake schemas, null values should be allowed in the tables. It is not easy to do this since functional dependencies should be interpreted in presence of null values. In order to allow nulls, Lehner et al. (1998) propose weak functional dependencies, that is, functional dependencies A → B that do not constrain tuples when they have null values in the attribute B. The attributes that participate in the right sides of weak functional dependencies are treated outside the hierarchy schema as descriptive attributes for the categories. Weak functional dependencies can be used in snowflake dimensions.
However, in star dimensions we may also need to interpret functional dependen-cies when nulls appear in the left side of the functional dependence. An additional problem is that due to the denormalized nature of star dimensions, heterogeneity may lead to a proliferation of null values in the table. Due to these problems, some researchers have stated that the star schema does not allow structural heterogene-ity (Jagadish et al., 1999).
Table 1.
Product Brand Category Department
p1 b1 c1 d1
p2 b1 c2 d1
p3 b2 c1 d1
p4 b2 c2 d1
p5 b3 c2 d1
44 Hurtado & Gutierrez
Implications to the Representation of the Hierarchy Domain
In the presence of heterogeneity there is no precise correspondence between dimen-sions and star dimensions. The hierarchy domain (i.e., the hierarchical arrangement of elements) of a heterogeneous dimension may not be correctly captured by the star dimension. As an example, if we represent the heterogeneous dimension of Figure 4 as a star dimension, we cannot recover the original child/parent relation back. Indeed, if we try to recover the rollup relation Γ[Brand,Category], we obtain the relation with a single pair [Brand: b3,Category:c2]. The original rollup relation is show in Table 2.
Among other problems, this implies that the standard semantics of drilldown and rollup operations differ for both models. Notice also that in this case, although the dimension is heterogeneous, its star representation does not have null values. This situation illustrates that, as explained in the introductory section, heterogeneity is not only caused by the nonapplicability of attributes (e.g., dimension of Figure 2) but also by the mixture of hierarchies in the dimension (e.g., dimension of Figure 4).
Adapting. Heterogenous. Dimensions
The general approach to handle structural heterogeneity in OLAP is to adapt or transform the dimensions in order to obtain homogeneous data. In this section we examine two approaches along this idea. First, we explain the use of null elements, and then we explore structural adjustments of heterogeneous dimension necessary to obtain homogeneous dimensions. In the latter case, the hierarchy schemas of the original dimensions are modified.
Table 2.
Brand Category
b1 c1
b1 c2
b2 c1
b2 c2
b3 c2
Handling Structural Heterogeneity in OLAP 4