6 – Deductive & Object-Oriented Databases

6.1 Deductive Databases and Datalog

6.1.6 Datalog Evaluation: Datalog to Algebra

The model-theoretic semantics gives the meaning of a Datalog program, and this meaning is declarative. From a processor's point of view we therefore want a mapping from declarative constructs to procedural ones, so that the program can actually be evaluated. A natural target is the relational algebra. A straightforward algorithm translates a simple subset of Datalog programs into algebraic expressions that take the EDB as input and evaluate the program.

Let us describe the subset of Datalog programs for which we can provide an algebraic evaluation. The restrictions are: the program is positive (no negated sub-goals); each variable appearing in a rule is limited; and no intensional predicate (i.e. a predicate defined by rules) is recursive. For programs of this type the model-theoretic semantics, the proof-theoretic semantics, and the algebraic evaluation coincide. The Datalog programs in this category are indeed simple; the example above, for instance, is recursive and therefore falls outside the conversion that follows.

Deductive & Object-Oriented Databases - Page [ 130]

The evaluation algorithm starts by ordering the predicates of a program according to their evaluation dependencies, which are easily extracted by examining each predicate's sub-goals. For example, the evaluation of predicate PARENT (in the above example) depends on having evaluated its sub-goals, i.e. the predicates MOTHER and FATHER. The EDB predicates, like MOTHER and FATHER, do not depend on any other predicate. Some predicates depend on themselves (i.e. they are recursive), ANC being one; this is why we restrict this evaluation method to non-recursive programs. From this dependency relation one can build a graph, called a dependency graph. In a dependency graph each predicate of the program is represented as a node, and each dependency is depicted as a directed edge. The dependency graph of the example program is presented in figure 6.1.

[Figure: graph with nodes mother(x,y), father(x,y), parent(x,y) and anc(x,y); the edges are not recoverable from the extracted text]

Figure 6.1 – Predicate dependency graph – negated sub-goals are adorned with a negation symbol
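The ordering step described above can be sketched in Python. This is a minimal sketch under stated assumptions: the `RULES` encoding and the helper names are hypothetical, and the two anc rules are assumed (the text only states that ANC is recursive). Recursive predicates are detected and set aside, and the remaining predicates are ordered so that each one follows the predicates it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical encoding: head predicate -> list of rule bodies (sub-goal names).
# The two anc rules are an assumption made for illustration.
RULES = {
    "parent": [["mother"], ["father"]],
    "anc": [["parent"], ["parent", "anc"]],
}

def dependency_graph(rules):
    """Map each IDB predicate to the set of predicates its rule bodies mention."""
    return {head: {p for body in bodies for p in body}
            for head, bodies in rules.items()}

def recursive_predicates(graph):
    """Return the predicates that can reach themselves along dependency edges."""
    def reaches_itself(start):
        seen, stack = set(), list(graph.get(start, ()))
        while stack:
            p = stack.pop()
            if p == start:
                return True
            if p not in seen:
                seen.add(p)
                stack.extend(graph.get(p, ()))
        return False
    return {p for p in graph if reaches_itself(p)}

graph = dependency_graph(RULES)
recursive = recursive_predicates(graph)

# Only the non-recursive part can be ordered topologically.
non_recursive = {p: deps for p, deps in graph.items() if p not in recursive}
order = list(TopologicalSorter(non_recursive).static_order())
```

With these rules, `recursive` contains only `anc`, and `order` lists `mother` and `father` (in some order) before `parent`.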

Following the predicate dependency graph of a program, one starts by identifying predicates whose sub-goals have all been evaluated already. For each such predicate one extracts from the program all the rules that have it as their head. For each of these rules we form a relation that is the natural join of the sub-goals' relations (which, by the dependency graph, have already been evaluated). During this process one also restricts the new relation according to any constants or built-in predicates found in the sub-goals. The next step is to project, from the natural join, the attributes required by the rule head. If the predicate being evaluated is defined by several rules, one takes the union of the relations computed for each rule. Some technical details are left out here; the interested reader can refer to the whole procedure in Ullman's textbook [ULLMA90] (algorithm 3.1 p.109, V. I).

How do we evaluate the query parent(x,y)?

eval(parent(x,y)) == UNION(mother(x,y),father(x,y)).

Assume we have another rule that introduces the predicate grandparent(x,y); the rule and the evaluation of a query on it follow:

grandparent(x,y) <- parent(x,z), parent(z,y).

The evaluation of the query grandparent(x,y) follows:

eval(grandparent(x,y)) ==
PROJECT([1st,4th], SELECT([2nd=3rd],PRODUCT(PARENT,PARENT))).

Remark: 2nd and 3rd represent the second and third columns of the PRODUCT result.
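The two algebraic expressions can be tried out on small relations. The following is a minimal sketch that represents a relation as a Python set of tuples; the function names and the sample facts are made up for illustration, and columns are numbered from 1 to match the positional notation used in the expressions.

```python
def union(r, s):
    """Set union of two relations with the same arity."""
    return r | s

def product(r, s):
    """Cartesian product: concatenate every tuple of r with every tuple of s."""
    return {a + b for a in r for b in s}

def select_eq(r, i, j):
    """Keep the tuples whose i-th and j-th columns (1-based) are equal."""
    return {t for t in r if t[i - 1] == t[j - 1]}

def project(r, cols):
    """Keep only the listed columns (1-based), in the given order."""
    return {tuple(t[c - 1] for c in cols) for t in r}

# Made-up EDB facts, read as mother(x,y): x is the mother of y.
MOTHER = {("ann", "bob")}
FATHER = {("carl", "bob"), ("bob", "dan")}

# eval(parent(x,y)) == UNION(mother(x,y), father(x,y))
PARENT = union(MOTHER, FATHER)

# eval(grandparent(x,y)) == PROJECT([1st,4th], SELECT([2nd=3rd], PRODUCT(PARENT, PARENT)))
GRANDPARENT = project(select_eq(product(PARENT, PARENT), 2, 3), [1, 4])
```

On these facts `GRANDPARENT` holds the pairs (ann, dan) and (carl, dan). The product followed by an equality selection is exactly the natural join on the shared variable z, written out with positional columns.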

It is relatively easy to show that this algorithm produces the facts that can be proved from the database and that these facts (i.e. IDB and EDB) correspond to the unique minimal model.

6.1.6.1 Recursive Datalog Evaluation

We now need an algorithm that computes the minimal model of a positive, recursive Datalog program. It is not hard to accept that the IDB rules are a basis for a constructive build-up of the IDB relations. This observation, together with our previous evaluation algorithm for non-recursive programs, forms the framework of the required computation. The crucial point is that in a recursive rule a predicate is mentioned both in the head and in the body. The technique used to evaluate recursive rules is the fixpoint computation. In the fixpoint evaluation of a recursive rule we start with an empty relation for each IDB predicate, evaluate the rules over the EDB and the current IDB relations, and repeat, deriving new facts for the recursive relation in each round, until a round derives no new facts. This technique is usually referred to as naïve evaluation and is attributed to Chang [CHANG88]. The pseudo code that follows is from [ULLMA90].

remark: let us assume that the datalog program has k EDB predicates
remark: and M IDB relations

FOR I := 1 TO M DO
    Pi := ∅; -- set predicate extent to empty
REPEAT
    FOR I := 1 TO M DO
        Qi := Pi;
    FOR I := 1 TO M DO
        Pi := eval(pi, R1, …, Rk, Q1, …, QM);
UNTIL Pi = Qi for all I, 1 <= I <= M;
OUTPUT Pi's
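The loop above can be sketched in Python. This is a minimal sketch rather than Ullman's code: the anc rules, the sample facts and the names `eval_parent`, `eval_anc` and `naive_evaluation` are assumptions made for illustration. Each evaluator plays the role of eval(pi, ...) for one IDB predicate and reads only the relations of the previous round.

```python
def eval_parent(edb, idb):
    # parent(x,y) <- mother(x,y).   parent(x,y) <- father(x,y).
    return edb["mother"] | edb["father"]

def eval_anc(edb, idb):
    # Assumed rules: anc(x,y) <- parent(x,y).   anc(x,y) <- parent(x,z), anc(z,y).
    base = set(idb["parent"])
    step = {(x, y)
            for (x, z) in idb["parent"]
            for (z2, y) in idb["anc"]
            if z == z2}
    return base | step

def naive_evaluation(edb, evaluators):
    """Re-evaluate every IDB predicate until no relation changes."""
    p = {name: set() for name in evaluators}               # Pi := empty
    while True:
        q = {name: set(rel) for name, rel in p.items()}    # Qi := Pi
        for name, evaluate in evaluators.items():
            p[name] = evaluate(edb, q)                     # Pi := eval(pi, R.., Q..)
        if p == q:                                         # UNTIL Pi = Qi for all i
            return p

# Made-up EDB facts forming the chain ann -> bob -> dan -> eve.
EDB = {"mother": {("ann", "bob")}, "father": {("bob", "dan"), ("dan", "eve")}}
IDB = naive_evaluation(EDB, {"parent": eval_parent, "anc": eval_anc})
```

On this EDB the result `IDB["anc"]` contains the three parent facts plus (ann, dan), (bob, eve) and (ann, eve). Note that every round recomputes all facts derived so far, which is the inefficiency that semi-naïve evaluation addresses.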

The EDB together with the IDB relations derived at the fixed point forms a model of the program. It is a well-known result that Datalog programs have a unique minimal model and that this coincides with their unique minimal fixed point. What remains to be shown is that the naïve evaluation of a program relative to an EDB does reach a fixed point, and that this fixed point coincides with the minimal model of the program.

To show that the naïve evaluation converges to a fixed point we need to establish that each evaluation round is monotonic (enlarging the input relations never shrinks the output) and that there is an upper limit on the number of rounds. The relational algebra operations ‘union’, ‘select’, ‘project’ and ‘product’ are monotonic (the ‘diff’ operator is not). As the algebraic expression built for an evaluation round uses only these operators, each round is monotonic. To establish the upper limit on the number of rounds, note that the arguments of every derived fact are constants from the program or the EDB, and that the number of predicates and their arities are known; the number of distinct facts is therefore finite, and each round before the last must add at least one new fact. The final point of the proof is to establish that the fixed point reached is indeed the least fixed point.
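The bound on the number of rounds can be written out explicitly. The symbols are introduced here for illustration and are not from the source: n is the number of distinct constants in the program and the EDB, a_i is the arity of IDB predicate p_i, and N is the resulting bound on the total number of IDB facts.

```latex
|P_i| \le n^{a_i}
\qquad\Longrightarrow\qquad
\sum_{i=1}^{M} |P_i| \;\le\; \sum_{i=1}^{M} n^{a_i} \;=:\; N
```

Since the relations start empty and each round is monotonic, the relations never shrink from one round to the next. Every round except the last adds at least one fact, so the REPEAT loop runs at most N + 1 times.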

The naïve evaluation approach is greatly improved by semi-naïve evaluation, which restricts the work in each round to the newly derived facts (i.e. the facts generated in the previous round); Bancilhon is credited with its introduction [BANCI85]. A totally different approach is the magic set approach [BANCI86], which combines the bottom-up and top-down approaches by passing evaluation information sideways and rewriting the original program to reflect this. A control algorithm is also required to drive the evaluation, and it must decide whether results are produced a set at a time or a fact at a time.
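The idea behind semi-naïve evaluation can be illustrated on a single linear recursive rule. The sketch below is an assumption-laden illustration, not the algorithm as published in [BANCI85]: it assumes the rules anc(x,y) <- parent(x,y) and anc(x,y) <- parent(x,z), anc(z,y), and the function name and sample facts are hypothetical. In each round the recursive sub-goal is joined only with `delta`, the facts that were new in the previous round.

```python
def semi_naive_anc(parent):
    """Compute anc from parent, joining only with the newest facts each round."""
    anc = set(parent)     # facts from the non-recursive rule anc(x,y) <- parent(x,y)
    delta = set(parent)   # facts that were new in the latest round
    while delta:
        # anc(x,y) <- parent(x,z), anc(z,y), with anc restricted to delta
        derived = {(x, y)
                   for (x, z) in parent
                   for (z2, y) in delta
                   if z == z2}
        delta = derived - anc   # keep only the facts not seen before
        anc |= delta
    return anc

# Made-up facts forming the chain ann -> bob -> dan -> eve.
PARENT = {("ann", "bob"), ("bob", "dan"), ("dan", "eve")}
ANC = semi_naive_anc(PARENT)
```

Restricting the join to `delta` is sound here because the rule has a single recursive sub-goal: any new anc fact must use at least one fact derived in the previous round. Rules with several recursive sub-goals need one delta variant per recursive sub-goal.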