• No results found

4.6 Area 1 : Data-Driven Menu

4.6.1 Star Joins

For computing data-driven menus in a DW system, if m is the number of dimensions in the schema, we need to compute m queries [Kes08] q1, . . . , qm, where qi+1returns the next, i.e., the

(i + 1)-th, menu to display:

qi+1:

SELECT DISTINCT Di+1.value FROM Di+1

WHERE key IN

(SELECT FactTable.f keyi+1 FROM FactTable, D1, . . . , Di

WHERE FactTable.f key1 = D1.key AND ... AND FactTable.f keyi = Di.key AND D1.value IN (values1) AND ... AND Di.value IN (valuesi)) In query qi+1, the result column Di+1.value returns the options to be displayed in the (i + 1)-th

menu, while values1, . . . , valuesi in the subquery are the options the user has selected,

respectively, in the 1-st,. . . , i-th menus. As shown in Figure 4.4, for computing q1, . . . , qm, a

dimension Diis joined with the fact table m − i + 1 times in total.

q1 q2 q3 D1 D1 D2 D1 D2 D3

(a) State of the art

q1part. q2part. q3part. D1 D2 D3 (b) Partial Evaluation

Example 14. In this example we compute the first three menus of Feedbase.ch applying state of the art solutions. Query q1 computes the Feeds menu by displaying the name of the feeds for

which data exists in the fact table. q1 : SELECT DISTINCT Feed.name

FROM Feed

WHERE Feed.key IN

(SELECT FactTable.f eedF key FROM FactTable )

Query q2 computes the Nutrients menu by displaying the name of the nutrients measured on the

selected feeds (e.g., Hay and Barley).

q2 :

SELECT DISTINCT Nutrient.name FROM Nutrient

WHERE Nutrient.key IN

(SELECT FactTable.nutrientF key FROM FactTable, Feed

WHERE FactTable.F eedF key=Feed.key AND Feed.name IN (’Hay’,’Barley’))

Finally, query q3 computes the Time menu by displaying the years during which the selected

nutrients have been measured on the selected feeds.

q3 :

SELECT DISTINCT Time.year FROM Time

WHERE Time.key IN

(SELECT FactTable.timeF key FROM FactTable, Feed, Nutrient

WHERE FactTable.F eedF key = Feed.key AND

FactTable.N utrientF key = Nutrient.key AND Feed.name IN (’Hay’,’Barley’) AND

Nutrient.name IN (’Crude Protein’,’Vit. A’) )

The reader can see that, in the IN subuery of q1, . . . , qm, the dimensions are redundantly joined

to the fact table.

In Figure 4.5 we show the query plan for q3. The plan shows that the Time (i.e., the (i + 1)-

th dimension) is scanned and a hash join with the subquery is computed. The subquery is a star-join between the fact table and the Feed and Nutrient (i.e., the 1-st, . . . , i-th) dimensions.

4.6 Area 1 : Data-Driven Menu 107 Commercial DBMSs such as PostgreSQL decompose a star-join with i dimensions, as i binary joins (obtaining a right- or left-deep execution tree [CMX13]). Thus, as shown in Figure 4.5, first the join between the FactTable and the Feed dimension is computed, then the one with the Nutrient dimension. Although selection push-down is applied to the dimensions before the star-join, such a deep query execution tree is not efficient since single dimensions are not selective. Thus, many measurements are retrieved and fetched from the fact table by the first join.

Figure 4.5: Building of the Time menu in the SFDW using Hashing.

In Figure 4.6 we show that the previous query plan is not suitable for building data-driven menus in the SFDW since it takes around 5 seconds to complete, i.e., it does not provide interactive performances to the user. This is a problem in our system since for each request in Feedbase.ch the user computes many data-driven menus (each resulting in a new star-join): for example if (s)he is interested in the nutritive content of the Cereals during Summer in the Northern cantons of Switzerland then (s)he will make selections on 4 menus (feeds, nutrients, seasons, and cantons) causing a total response time of 20 seconds. In Figure 4.6 we also show that although an index can be used to access only selected tuples of the fact table (and avoid hashing the entire fact table) it does not reduce the response time. This is so because single dimension keys are not selective in DWs (e.g., many tuples exist with the same feed value), and many tuples are returned by an indexed join (e.g., the first). Using a composite index slightly increases the performances since it allows to compute all i joins without fetching any tuple from the fact table (all foreign keys are stored in the index). However the query would still require 3 seconds to complete (thus, 12 seconds waiting time for 4 menus) since many keys are returned by the first join. In the figure we also show the runtime of using a denormalized fact table. It consists in storing in the fact table also the dimensional attributes, so that each condition involving them can be evaluated without joining the fact table with the dimensions. However the size of each tuple grows with

one order of magnitude making a scan of the fact table half a minute. Indexes help to fetch only selected tuples from the denormalized fact table but, because the index is built on the dimensional attributes values (and not on the dimension keys), it takes several seconds to answer the query.

0.1 1 10 SELECTIVE UNSELECTIVE runtime [sec] Part. Evaluation Single-Idx StarJ Composite-Idx StarJ Hash StarJ Idx Denorm. FactTable Denorm. FactTable

Figure 4.6: Building of the Time menu in the SFDW.

Summarizing, for q1, . . . , qm, two drawbacks need to be addressed for reducing the response

time of the user:

1. a dimension Diis in total joined to the fact table m − i + 1 times

2. the join with D1 returns many false positives, i.e., tuples that do not satisfy the predicates

on D2or D3, etc.

We will now show how recently proposed techniques can be used to address the second drawback above; then we will introduce partial evaluation for also ensuring that each dimension is not accessed more than once.