
Ernest Teniente

4.3 Query Processing

Deductive DBMSs must provide a query-processing system able to answer queries specified in terms of views as well as in terms of base predicates. The subject of query processing deals with finding answers to queries requested on a certain DB. A query evaluation procedure finds answers to queries according to the DB semantics.

In Datalog syntax, a query requested on a deductive DB has the form ?- W(x), where x is a vector of variables and constants, and W(x) is a conjunction of literals. The answer to the query is the set of instances of x such that W(x) is true according to the EDB and to the IDB. Following are several examples.

?- Ancestor(John, Mary) returns true if John is an ancestor of Mary and false otherwise.

?- Ancestor(John, x) returns as a result all persons x that have John as an ancestor.

?- Ancestor(y, Mary) returns as a result all persons y that are ancestors of Mary.

?- Ancestor(y, Mary) ∧ Ancestor(y, Joe) returns all common ancestors y of Mary and Joe.

Two basic approaches compute the answers of a query Q:

• Bottom-up (forward chaining). The query evaluation procedure starts from the base facts and applies all deductive rules until no new consequences can be deduced. The requested query is then evaluated against the whole set of deduced consequences, which is treated as if it were base information.

• Top-down (backward chaining). The query evaluation procedure starts from a query Q and applies deductive rules backward by trying to deduce new conditions required to make Q true. The conditions are expressed in terms of the predicates that define Q, and they can be understood as simple subqueries that, appropriately combined, provide the same answers as Q. The process is repeated until the conditions are expressed only in terms of base facts.

Sections 4.3.1 and 4.3.2 each present a query evaluation procedure that follows one of these approaches and comment on its advantages and drawbacks. Section 4.3.3 explains magic sets, which is a mixed approach aimed at achieving the advantages of the other two procedures. We present the main ideas of each approach, illustrate them by means of an example, and then discuss their main contributions. A more exhaustive explanation of previous work in query processing and several optimization techniques behind each approach can be found in most books on deductive DBs (see, for instance, [1, 8, 9, 24]).

The following example will be used to illustrate the differences among the three basic approaches.

Example 4.2

Consider a subset of the rules in Example 4.1, with some additional facts:

Father(Anthony, John) Mother(Susan, Anthony)

Father(Anthony, Mary) Mother(Susan, Rose)

Father(Jack, Anthony) Mother(Rose, Jennifer)

Father(Jack, Rose) Mother(Jennifer, Monica)

Parent(x,y)←Father(x,y) (rule R1)

Parent(x,y)←Mother(x,y) (rule R2)

GrandMother(x,y)←Mother(x,z)∧Parent(z,y) (rule R3)

4.3.1 Bottom-Up Query Evaluation

The naive procedure for evaluating queries bottom-up consists of two steps. The first step is aimed at computing all facts that are a logical consequence of the deductive rules, that is, to obtain the minimal Herbrand model of the deductive DB. That is achieved by iteratively considering each deductive rule until no more facts are deduced. In the second step, the query is solved against the set of facts computed by the first step, since that set contains all the information deducible from the DB.

Example 4.3

A bottom-up approach would proceed as follows to answer the query ?- GrandMother(x, Mary), that is, to obtain all grandmothers x of Mary:

1. All the information that can be deduced from the DB in Example 4.2 is computed by the following iterations:

a. Iteration 0: All base facts are deduced.

b. Iteration 1: Applying rule R1 to the result of iteration 0, we get

Parent(Anthony, John) Parent(Jack, Anthony)

Parent(Anthony, Mary) Parent(Jack, Rose)

c. Iteration 2: Applying rule R2 to the results of iterations 0 and 1, we also get

Parent(Susan, Anthony) Parent(Rose, Jennifer)

Parent(Susan, Rose) Parent(Jennifer, Monica)

d. Iteration 3: Applying rule R3 to the results of iterations 0 to 2, we further get

GrandMother(Rose, Monica) GrandMother(Susan, Mary)

GrandMother(Susan, Jennifer) GrandMother(Susan, John)

e. Iteration 4: The first step is over since no more new consequences are deduced when rules R1, R2, and R3 are applied to the result of previous iterations.

2. The query ?- GrandMother(x, Mary) is applied against the set containing the 20 facts obtained during iterations 0 to 4 (8 base facts, 8 Parent facts, and 4 GrandMother facts). Because the fact GrandMother(Susan, Mary) belongs to this set, the obtained result is x = Susan, which means that Susan is the only grandmother of Mary known by the DB.
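The two steps of Example 4.3 can be sketched in Python as follows. This is only an illustrative sketch, not the procedure of any particular DBMS: the names FATHER, MOTHER, apply_rules, and minimal_model are assumptions introduced here, and each round applies R1, R2, and R3 together rather than one rule per iteration, so the rounds do not correspond one to one to the iterations listed above.

```python
# Base facts (EDB) of Example 4.2, stored as sets of (x, y) pairs.
FATHER = {("Anthony", "John"), ("Anthony", "Mary"),
          ("Jack", "Anthony"), ("Jack", "Rose")}
MOTHER = {("Susan", "Anthony"), ("Susan", "Rose"),
          ("Rose", "Jennifer"), ("Jennifer", "Monica")}


def apply_rules(parent, grandmother):
    """Apply R1, R2 and R3 once to the facts known so far (set-oriented)."""
    new_parent = FATHER | MOTHER                      # R1 and R2
    new_grandmother = {(x, y)                         # R3
                       for (x, z) in MOTHER
                       for (z2, y) in parent
                       if z == z2}
    return parent | new_parent, grandmother | new_grandmother


def minimal_model():
    """Step 1: iterate until no new consequences are deduced."""
    parent, grandmother = set(), set()
    while True:
        next_parent, next_grandmother = apply_rules(parent, grandmother)
        if (next_parent, next_grandmother) == (parent, grandmother):
            return parent, grandmother
        parent, grandmother = next_parent, next_grandmother


parent, grandmother = minimal_model()

# Step 2: evaluate ?- GrandMother(x, Mary) against the computed facts.
answers = {x for (x, y) in grandmother if y == "Mary"}
total_facts = len(FATHER) + len(MOTHER) + len(parent) + len(grandmother)
print(answers, total_facts)
```

Note that the whole model is computed before the constant Mary is looked at, which is precisely the behavior discussed next.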

Bottom-up methods can naturally be applied in a set-oriented fashion, that is, by taking as input the entire extensions of DB predicates. Despite this important feature, bottom-up query evaluation presents several drawbacks.

• It deduces consequences that are not relevant to the requested query. In the preceding example, the procedure has computed several facts about parents and grandmothers that are not needed to compute the query, for instance, Parent(Jennifer, Monica), Parent(Rose, Jennifer), Parent(Jack, Anthony), or GrandMother(Susan, Jennifer).

• The order of selection of rules is relevant to evaluate queries efficiently. Computing the answers to a certain query must be performed as efficiently as possible. In that sense, the order in which rules are taken into account during query processing is important for achieving maximum efficiency. For instance, if we had considered rule R3 instead of rule R1 in the first iteration of the previous example, no consequence would have been derived, and R3 would have had to be applied again after R1.

• Computing negative information must be performed in a stratified way. Negative information is handled by means of the CWA, which assumes as false all information that cannot be shown to be true. Therefore, if negative derived predicates appear in the body of deductive rules, we must first apply the rules that define those predicates to ensure that the CWA is applied successfully. That is, the computation must be performed stratum by stratum.
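The last drawback can be illustrated with a small Python sketch. The predicates Person, HasChild, and Childless and their rules are hypothetical and do not belong to Example 4.2. They are introduced here only to show why a negated derived predicate must be fully computed before the CWA is applied to it.

```python
# A subset of the Parent facts of Example 4.3, plus a hypothetical Person predicate.
PARENT = {("Rose", "Jennifer"), ("Jennifer", "Monica")}
PERSON = {"Rose", "Jennifer", "Monica"}

# Hypothetical rules with negation:
#   HasChild(x)  <- Parent(x, y)                  (stratum 1)
#   Childless(x) <- Person(x) AND NOT HasChild(x) (stratum 2)

# Stratum 1: compute HasChild completely.
has_child = {x for (x, _) in PARENT}

# Stratum 2: only now is it safe to apply the CWA to HasChild.
childless = {x for x in PERSON if x not in has_child}

# Wrong order: applying the CWA while HasChild is still empty would
# conclude that every person is childless.
premature = {x for x in PERSON if x not in set()}

print(childless, premature)
```

Evaluated stratum by stratum, only Monica is derived as childless, whereas the premature evaluation wrongly derives all three persons.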

4.3.2 Top-Down Query Evaluation

Given a certain query Q, the naive procedure to evaluate Q top-down is aimed at obtaining a set of subqueries Qi such that Q’s answer is just the union of the answers of each subquery Qi. To obtain those subqueries, each derived predicate P in Q must be replaced by the body of the deductive rules that define P. Because we only replace predicates in Q by their definition, the evaluation of the resulting queries is equivalent to the evaluation of Q, when appropriately combined. Therefore, the obtained subqueries are “simpler,” in some sense, because they are defined by predicates “closer” to the base predicates.

Substituting queries by subqueries is repeated until we get queries that contain only base predicates. When those queries are reached, they are evaluated against the EDB to provide the desired result. Constants of the initial query Q are used during the process because they point to the base facts that are relevant to the computation.

Example 4.4

The top-down approach to compute ?- GrandMother(x, Mary) works as follows:

1. The query is reduced to Q1: ?- Mother(x, z) ∧ Parent(z, Mary) by using rule R3.

2. Q1 is reduced to two subqueries, by using either R1 or R2:

Q2a: ?- Mother(x, z) ∧ Father(z, Mary)

Q2b: ?- Mother(x, z) ∧ Mother(z, Mary)

3. Query Q2a is reduced to Q3: ?- Mother(x, Anthony) because the DB contains the fact Father(Anthony, Mary).

4. Query Q2b does not provide any answer because no fact matches Mother(z, Mary).

5. Query Q3 is evaluated against the EDB and gives x = Susan as a result.
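The reduction steps of Example 4.4 can be sketched in Python as follows. The sketch makes several assumptions that are not part of the text: literals are tuples, variables are strings starting with a lowercase letter, rule heads contain only distinct variables (true for R1 to R3), the rules are not recursive, and the helper names unfold and evaluate are hypothetical. It also evaluates the literals of each base subquery from left to right, instead of first using the fact Father(Anthony, Mary) as step 3 does.

```python
import itertools

# Base facts (EDB) of Example 4.2 as (predicate, arg1, arg2) tuples.
FACTS = {
    ("Father", "Anthony", "John"), ("Father", "Anthony", "Mary"),
    ("Father", "Jack", "Anthony"), ("Father", "Jack", "Rose"),
    ("Mother", "Susan", "Anthony"), ("Mother", "Susan", "Rose"),
    ("Mother", "Rose", "Jennifer"), ("Mother", "Jennifer", "Monica"),
}
# Rules R1, R2 and R3 as (head, body) pairs.
RULES = [
    (("Parent", "x", "y"), [("Father", "x", "y")]),
    (("Parent", "x", "y"), [("Mother", "x", "y")]),
    (("GrandMother", "x", "y"), [("Mother", "x", "z"), ("Parent", "z", "y")]),
]
DERIVED = {head[0] for head, _ in RULES}
_fresh = itertools.count()  # used to rename body-only variables apart


def is_var(term):
    # Assumed convention: variables start with a lowercase letter.
    return term[0].islower()


def unfold(query):
    """Yield the subqueries over base predicates obtained by replacing
    each derived literal by the body of every rule that defines it."""
    for i, literal in enumerate(query):
        if literal[0] not in DERIVED:
            continue
        for head, body in RULES:
            if head[0] != literal[0]:
                continue
            n = next(_fresh)
            sub = dict(zip(head[1:], literal[1:]))  # head variable -> query term
            new_body = [
                (b[0],) + tuple(sub.get(t, f"{t}{n}") if is_var(t) else t
                                for t in b[1:])
                for b in body
            ]
            yield from unfold(query[:i] + new_body + query[i + 1:])
        return
    yield query  # no derived literal left


def evaluate(query, binding=None):
    """Yield the variable bindings that satisfy a base query in the EDB."""
    binding = binding or {}
    if not query:
        yield binding
        return
    literal, rest = query[0], query[1:]
    for fact in FACTS:
        if fact[0] != literal[0]:
            continue
        b = dict(binding)
        if all((b.setdefault(t, v) == v) if is_var(t) else (t == v)
               for t, v in zip(literal[1:], fact[1:])):
            yield from evaluate(rest, b)


subqueries = list(unfold([("GrandMother", "x", "Mary")]))
answers = {b["x"] for q in subqueries for b in evaluate(q)}
print(subqueries, answers)
```

The two subqueries produced by unfold correspond to Q2a and Q2b, and the final answer is the union of their answers.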

At first glance, the top-down approach might seem preferable to the bottom-up approach, because it takes into account the constants in the initial query during the evaluation process. For that reason, the top-down approach does not take into account all possible consequences of the DB but only those that are relevant to perform the computation. However, the top-down approach also presents several inconveniences:

• Top-down methods are usually one tuple at a time. Instead of reasoning on the entire extension of DB predicates, as the bottom-up method does, the top-down approach considers base facts one by one as soon as they appear in the definition of a certain subquery. For that reason, top-down methods tend to be less efficient.

• Top-down may not terminate. In the presence of recursive rules, a top-down evaluation method could enter an infinite loop and never terminate its execution. That would happen, for instance, if we consider the derived predicate Ancestor in Example 4.1 and we assume that a top-down computation always starts by reducing a query about Ancestor to queries about Ancestor again.

• It is not always possible to determine, at definition time, whether a top-down algorithm terminates. Thus, in a top-down approach, if the method is taking too long to produce an answer, we do not know whether it will ever finish its execution.

• Repetitive subqueries. During the process of reducing the original query to simpler subqueries that provide the same result, a certain subquery may be requested several times. In some cases, that may cause reevaluation of the subquery, thus reducing efficiency of the whole evaluation.
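The termination problem can be reproduced with a short Python sketch. It assumes a left-recursive formulation of Ancestor, chosen here only to trigger the loop, and a hypothetical depth guard (DepthExceeded, max_depth) that stops the demonstration. None of these names come from the text.

```python
# A subset of the Parent facts of Example 4.3.
PARENT = {("Rose", "Jennifer"), ("Jennifer", "Monica")}


class DepthExceeded(Exception):
    """Raised by the illustrative depth guard (not part of the method)."""


def descendants_recursive_first(x, depth=0, max_depth=25):
    """Answer ?- Ancestor(x, y), always reducing to Ancestor first.

    Assumed rule order:
      Ancestor(x, y) <- Ancestor(x, z) AND Parent(z, y)
      Ancestor(x, y) <- Parent(x, y)
    """
    if depth > max_depth:
        raise DepthExceeded(f"still reducing Ancestor({x}, _) at depth {depth}")
    # The first rule requests Ancestor(x, z): the very same subquery again.
    intermediate = descendants_recursive_first(x, depth + 1, max_depth)
    via_rule_1 = {y for z in intermediate for (p, y) in PARENT if p == z}
    via_rule_2 = {y for (p, y) in PARENT if p == x}
    return via_rule_1 | via_rule_2


def descendants_base_first(x):
    """Same query, but the base predicate Parent is evaluated first:
      Ancestor(x, y) <- Parent(x, y)
      Ancestor(x, y) <- Parent(x, z) AND Ancestor(z, y)
    Terminates here because the Parent facts contain no cycle."""
    direct = {y for (p, y) in PARENT if p == x}
    return direct | {y for z in direct for y in descendants_base_first(z)}


try:
    descendants_recursive_first("Rose")
    looped = False
except DepthExceeded:
    looped = True

print(looped, descendants_base_first("Rose"))
```

Without the guard, the first function would keep requesting the same subquery until the interpreter's recursion limit is reached. The second function terminates only because this particular set of Parent facts is acyclic.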

4.3.3 Magic Sets

The magic sets approach is a combination of the previous approaches, aimed at providing the advantages of the top-down approach when a set of deductive rules is evaluated bottom-up. Given a deductive DB D and a query Q on a derived predicate P, this method is aimed at rewriting the rules of D into an equivalent DB D′ by taking Q into account. The goal of rule rewriting is to introduce the simulation of top-down into D′ in such a way that a bottom-up evaluation of rules in D′ will compute only the information necessary to answer Q. Moreover, the result of evaluating Q on D′ is equivalent to querying Q on D.

Intuitively, this is performed by expressing the information of Q as extensional information and by rewriting the deductive rules of D used during the evaluation of Q. Rule rewriting is performed by incorporating the information of Q in the body of the rewritten rules.

Example 4.5

Consider again Example 4.2 and assume now that it also contains the following deductive rules defining the derived predicate Ancestor:

Ancestor(x,y)←Parent(x,y)

Ancestor(x,y)←Parent(x,z)∧Ancestor(z,y)

Rewritten “magic” rules for evaluating bottom-up the query ?- Ancestor(Rose, x) are as follows:

Magic_Anc(Rose)

Ancestor(x,y)←Magic_Anc(x)∧Parent(x,y) (rule R1)

Magic_Anc(z)←Magic_Anc(x)∧Parent(x,z) (rule R2)

Ancestor(x,y)←Magic_Anc(x)∧Parent(x,z)∧Ancestor(z,y) (rule R3)

Assuming that all facts about Parent are already computed, in particular, Parent(Rose, Jennifer) and Parent(Jennifer, Monica), a naive bottom-up evaluation of the rewritten rules would proceed as follows:

1. The first step consists of seven iterations.

a. Iteration 1: Ancestor(Rose, Jennifer) is deduced by applying R1.

b. Iteration 2: Magic_Anc(Jennifer) is deduced by applying R2.

c. Iteration 3: No new consequences are deduced by applying R3.

d. Iteration 4: Ancestor(Jennifer, Monica) is deduced by applying R1.

e. Iteration 5: Magic_Anc(Monica) is deduced by applying R2.

f. Iteration 6: Ancestor(Rose, Monica) is deduced by applying R3.

g. Iteration 7: No new consequences are deduced by applying R1, R2, and R3.

2. The obtained result is {Ancestor(Rose, Jennifer), Ancestor(Rose, Monica)}.

Note that by computing rewritten rules bottom-up, we only deduce the information relevant to the requested query. That is achieved by means of the Magic_Anc predicate, which is included in the body of all rules, and by the fact Magic_Anc(Rose), which allows us to compute only Rose’s descendants.
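The effect of the rewriting can be checked with the following Python sketch, which evaluates the rewritten rules of Example 4.5 bottom-up and compares the result with a bottom-up evaluation of the original Ancestor rules. The function names magic_ancestors and plain_ancestors are assumptions introduced for illustration, and each round applies R1, R2, and R3 together, so the rounds do not match the seven iterations above one to one.

```python
# Parent facts as computed in Example 4.3 (rules R1 and R2 of Example 4.2).
PARENT = {
    ("Anthony", "John"), ("Anthony", "Mary"), ("Jack", "Anthony"),
    ("Jack", "Rose"), ("Susan", "Anthony"), ("Susan", "Rose"),
    ("Rose", "Jennifer"), ("Jennifer", "Monica"),
}


def magic_ancestors(seed):
    """Bottom-up evaluation of the rewritten rules for ?- Ancestor(seed, x)."""
    magic = {seed}              # the fact Magic_Anc(seed)
    ancestor = set()
    while True:
        # R1: Ancestor(x, y) <- Magic_Anc(x) AND Parent(x, y)
        new_ancestor = {(x, y) for (x, y) in PARENT if x in magic}
        # R2: Magic_Anc(z) <- Magic_Anc(x) AND Parent(x, z)
        new_magic = {z for (x, z) in PARENT if x in magic}
        # R3: Ancestor(x, y) <- Magic_Anc(x) AND Parent(x, z) AND Ancestor(z, y)
        new_ancestor |= {(x, y)
                         for (x, z) in PARENT if x in magic
                         for (z2, y) in ancestor if z2 == z}
        if new_ancestor <= ancestor and new_magic <= magic:
            return magic, ancestor
        ancestor |= new_ancestor
        magic |= new_magic


def plain_ancestors():
    """Bottom-up evaluation of the original, unrewritten Ancestor rules."""
    ancestor = set(PARENT)
    while True:
        new = {(x, y) for (x, z) in PARENT
               for (z2, y) in ancestor if z == z2} - ancestor
        if not new:
            return ancestor
        ancestor |= new


magic, ancestor = magic_ancestors("Rose")
answers = {y for (x, y) in ancestor if x == "Rose"}
print(answers, len(ancestor), len(plain_ancestors()))
```

On these facts the rewritten rules derive 3 Ancestor facts, whereas the original rules derive 17, most of which are irrelevant to the query about Rose.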