View Adaptation Using Auxiliary Attributes

Literature Review

2.4 View Adaptation Using Auxiliary Attributes

The work presented in [MD96] either extends or provides more efficient methods based on the centralised approach for data warehouses. They proposed a view adaptation mechanism that saves network communication cost, mainly the data transferring cost, and does not rely on foreign keys. Their approach augments base relations and derived materialised views with a join-count and a derive-count attribute, respectively. The join-count on a base relation indicates how many times a tuple joins with tuples in other relations and the derive-count attribute represents the number of derivations of each view tuple. They support the situation that dangling tuples are allowed.

Unlike the approach in [GMR95], which relies on the appearance of the foreign key at-tributes, in this approach, when changes take place in the SELECT clause, it stores join attributes as extra information with views and with an extra join-count attribute indicating the number of times a tuple joins with other tuples in the second relation. Similar to the

CustomerID OrderDate OrderT otal

c1 25/03/2010 1 c1 04/02/2010 2 c2 10/01/2010 2 c2 15/01/2010 1 c3 01/01/2010 5

(a) Materialised View V

CustomerID AddressID FirstName LastName join-count

c1 a1 f1 l1 2

c2 a2 f2 l2 2

c3 a3 f3 l3 1

c4 a4 f4 l4 0

c5 a5 f5 l5 0

(b) Customer (Augmented with join-count at-tribute)

Table 2.3: Sample View and Augmented Relation

previous approach, the purpose of augmenting the view with extra data is to facilitate the identification of the relevant data required for adaptation and additionally, to reduce the network communication cost. Example 2.6 gives an illustration of a change takes place in the SELECT clause. Table 2.2 shows the sample data of the Customer and Orders relations that are used in Example 2.6.

Assume we wish to define a view V , which stores all details of the customer orders includ-ing customer id, order date and order total. The view is expressed in Example 2.6 and the materialised data of the view is shown in Table 2.3a.

Example 2.6 (Handle Changes with a Join-Count Attribute) CREATEVIEW V AS SELECT CustomerID, OrderDate,

OrderTotal FROM Customer & Orders

Suppose one would like to add customer names into V . This involves retrieving all tuples from the Customer relation and joining them with the materialised view V . However, by observation, one would easily find that customer c4 and c5 (see Table 2.2) have no orders at all. Therefore, retrieving c4 and c5 from the base relation is a redundant process, which may become a significant cost when the number of redundant tuples are large and the Customer relation is stored on a different site on the network. By augmenting the base relation with an additional join-count attribute as shown in Table 2.3b, the process can avoid unnecessary data transferring cost caused by the dangling tuples, e.g., c4 and c5. The old view is then

FirstName LastName OrderT otal

deri ve-count

f1 l1 1 1

f1 l1 2 1

f2 l2 1 1

f2 l2 2 1

f3 l3 5 1

(a) Before Adaptation

FirstName LastName deri ve-count

f1 l1 2

f2 l2 2

f3 l3 1

f4 l4 1

f5 l5 1

(b) After Adaptation

Table 2.4: Change in FROM Clause

adapted by joining the old materialised view with tuples retrieved from the base relation and projecting out the new attributes that are desired, e.g., in this case, FirstName and LastName.

When removing a relation from the FROM clause, the previous approach [GMR95] works under very restricted parameters: when duplicate tuples are maintained and where dangling tuples are permitted in the view. As depicted in Example 2.7, the algorithm presented in this work does not have such restrictions as extra or redundant tuples can be identified by using the join-count and derive-count attributes.

Define a view V containing order information of all customers, the view is expressed in Example 2.7 where the materialised data contained in V is shown in Table 2.4a with an additional derive-count attribute.

Example 2.7 (Using Join-Count and Derive-Count Attributes) CREATEVIEW V AS SELECT FirstName, LastName, OrderTotal

FROMCustomer & Orders

Suppose one would like to delete relation Orders from V , which leads to the deletion of OrderTotal column in V as it is part of the relation Orders. The materialised view becomes less restricted after deleting and therefore, tuples that were previously eliminated should now be added into the view. In this case, customer c4 and c5 should be added. The following steps are used to achieve the adaptation:

• Execute the statement SELECT FirstName, LastName, derive-count INTO I FROM V. The purpose of this is to temporarily store attributes that will not be deleted into a relation I.

• Find tuples in Customer relation (Table 2.3b) where the join-count is zero and store them in a temporarily relation C, i.e., c4 and c5.

• Compute the new view V⁰= I ∪ C; also update the derive-count (see cells in red in Table 2.4b);

The new materialisation results from the change is shown in Table 2.4b. As shown in Example 2.7 extra data transferring costs are eliminated when retrieving data from base relations and it is not necessary to retrieve data with join-count greater than 0 as they are already in the materialised view. The purpose of these examples is to demonstrate how this work addresses the shortcomings in [GMR95] where network communication costs were not considered. These costs are crucial in systems such as data warehouses as we will show in our experiments.

Summary and Issues. One of the problem involved in this approach is that, depending on the number of joins existing in a view, each join relation involved in the FROM clause is augmented by one or more join-count and derive-count attributes. When there are many relations involved in the query and each relation contains a large number of tuples, a signifi-cant amount of additional data is added into each relation. Additionally, views are treated as single entities and adaptation algorithms cannot detect and reuse exiting materialised data to further improve the adaptation performance or to reduce the number of accesses to the source data.

The next approach we are about to present considers the reusability of the old materiali-sation. Rather than augmenting the materialised view, their purpose is to find maximum reusability of the existing materialisation for the view adaptation process.

In document The XFM view adaptation mechanism: An essential component for XML data warehouses (Page 30-34)