
Basic Concepts of Deductive Databases

Ernest Teniente

4.2 Basic Concepts of Deductive Databases

This section presents the basic notions needed to understand the problems behind deductive DBs. More precisely, it provides a formal definition of deductive DBs, a brief overview of their semantics, a discussion of the advantages provided by the intensional information, and a comparison of the expressive powers of relational and deductive DBs. The language used to define deductive DBs is usually called Datalog. For that reason, deductive DBs are sometimes known as Datalog programs.

4.2.1 Definition of a Deductive Database

A deductive DB consists of three finite sets: a set of facts, a set of deductive rules, and a set of integrity constraints. Facts state basic information that is known to be true in the DB. Deductive rules allow the derivation of new facts from other facts stored in the DB. Integrity constraints correspond to conditions that each state of the DB should satisfy.

To formally define those concepts, we need to introduce the following terminology. A term is either a variable or a constant. If P is a predicate symbol and t1, …, tn are terms, then P(t1, …, tn) is an atom. The atom is ground if all terms ti are constants. A literal is either an atom or a negated atom.

Facts, deductive rules, and integrity constraints are represented as clauses of the general form:

A0 ← L1 ∧ … ∧ Ln with n ≥ 0

where A0 is an atom denoting the conclusion and each Li is a literal representing a condition. A0 is called the head and L1 ∧ … ∧ Ln the body. Variables in the conclusion and in the conditions are assumed to be universally quantified over the whole formula. For that reason, quantifiers are always omitted.

Facts are represented by means of clauses with an empty body, that is, atoms. For instance, a fact stating that Mary is the mother of Bob would be represented as Mother(Mary, Bob). A deductive DB contains only ground facts.

Deductive rules define intensional information, that is, information that is not explicitly stored in the DB. That intensional information is gathered by means of facts about derived predicates. The definition of a derived predicate P is the set of all deductive rules that have P in their head. For example, the rules “if x is the mother of y, then x is a parent of y” and “if x is the father of y, then x is a parent of y” define the derived predicate Parent(x,y), which corresponds to the classical notion of parent. That can be represented as:

Parent(x,y) ← Mother(x,y)

Parent(x,y) ← Father(x,y)

Integrity constraints are usually represented by means of clauses with an empty head, also called denials. A denial has the following form:

← L1 ∧ … ∧ Ln with n ≥ 1

and states that L1 ∧ … ∧ Ln may never hold in a DB containing that integrity constraint. Representation by denials entails no loss of generality because any integrity constraint expressed as a general first-order formula can be transformed into an equivalent set of clauses containing, at least, an integrity constraint in denial form [16].

For the sake of uniformity, the head of each integrity constraint usually contains an inconsistency predicate ICn, which is just a possible name given to that constraint. This is useful for information purposes because ICn allows the identification of the constraint to which it refers. If a fact ICi is true in a certain DB state, then the corresponding integrity constraint is violated in that state. For instance, an integrity constraint stating that nobody may be father and mother at the same time could be represented as IC2 ← Father(x,y) ∧ Mother(x,z).

A deductive DB D is a triple D = (F, DR, IC), where F is a finite set of ground facts, DR a finite set of deductive rules, and IC a finite set of integrity constraints. The set F of facts is called the extensional part of the DB (EDB), and the sets DR and IC together form the so-called intensional part (IDB).

Database predicates are traditionally partitioned into base and derived predicates, also called views. A base predicate appears in the EDB and, possibly, in the body of deductive rules and integrity constraints. A derived (or view) predicate appears only in the IDB and is defined by means of some deductive rule. In other words, facts about derived predicates are not explicitly stored in the DB and can only be derived by means of deductive rules. Every deductive DB can be defined in this form [17].

Example 4.1

This example is of a deductive DB describing family relationships.

Facts

Father(John, Tony)

Mother(Mary, Bob)

Father(Peter, Mary)

Deductive Rules

Parent(x,y) ← Father(x,y)

Parent(x,y) ← Mother(x,y)

GrandMother(x,y) ← Mother(x,z) ∧ Parent(z,y)

Ancestor(x,y) ← Parent(x,y)

Ancestor(x,y) ← Parent(x,z) ∧ Ancestor(z,y)

Nondirect-anc(x,y) ← Ancestor(x,y) ∧ ¬Parent(x,y)

Integrity Constraints

IC1(x) ← Parent(x,x)

IC2(x) ← Father(x,y) ∧ Mother(x,z)

The deductive DB in this example contains three facts stating extensional data about fathers and mothers, and six deductive rules defining the intensional notions of parent, grandmother, and ancestor, whose meaning is hopefully self-explanatory, and of nondirect-anc, which defines nondirect ancestors as those ancestors that are not direct parents. Two integrity constraints state that nobody can be the parent of himself or herself and that nobody can be father and mother at the same time.

Note that inconsistency predicates may also contain variables that allow the identification of the individuals that violate a certain integrity constraint. For instance, the evaluation of IC2(x) would give as a result the different values of x that violate that constraint.
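As a rough illustration of how evaluating IC2(x) returns the offending individuals, the following Python sketch intersects the first arguments of the Father and Mother facts. The function name ic2_violations and the extra fact Mother(John, Ann) are hypothetical; the extra fact is not part of Example 4.1 and is added only so that the constraint has something to report.

```python
# Facts of Example 4.1, stored as (predicate, arg1, arg2) tuples.
facts = {
    ("Father", "John", "Tony"),
    ("Mother", "Mary", "Bob"),
    ("Father", "Peter", "Mary"),
}


def ic2_violations(facts):
    """Return every x such that Father(x, y) and Mother(x, z) both hold."""
    fathers = {f[1] for f in facts if f[0] == "Father"}
    mothers = {f[1] for f in facts if f[0] == "Mother"}
    return fathers & mothers


# Example 4.1 as given satisfies IC2, so no individual is reported.
print(ic2_violations(facts))

# Hypothetical fact (an assumption, not in the original example).
updated = facts | {("Mother", "John", "Ann")}
print(ic2_violations(updated))
```

With the hypothetical fact added, John appears both as a father and as a mother, so he is the only value of x returned.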

4.2.2 Semantics of Deductive Databases

A semantics is required to define the information that holds true in a particular deductive DB. This is needed, for instance, to be able to answer queries posed to that DB. In the absence of negative literals in the body of deductive rules, the semantics of a deductive DB can be defined as follows [18].

An interpretation, in the context of deductive DBs, consists of an assignment of a concrete meaning to constant and predicate symbols. A certain clause can be interpreted in several different ways, and it may be true under a given interpretation and false under another. If a clause C is true under an interpretation, we say that the interpretation satisfies C. A fact F follows from a set S of clauses if each interpretation satisfying every clause of S also satisfies F.

The Herbrand base (HB) is the set of all facts that can be expressed in the language of a deductive DB, that is, all facts of the form P(c1, …, cn) such that all ci are constants. A Herbrand interpretation is a subset J of HB that contains all ground facts that are true under this interpretation. A ground fact P(c1, …, cn) is true under the interpretation J if P(c1, …, cn) ∈ J. A rule of the form A0 ← L1 ∧ … ∧ Ln is true under J if for each substitution θ that replaces variables by constants, whenever L1θ ∈ J ∧ … ∧ Lnθ ∈ J, then it also holds that A0θ ∈ J.

A Herbrand interpretation that satisfies a set S of clauses is called a Herbrand model of S. The least Herbrand model of S is the intersection of all possible Herbrand models of S. Intuitively, it contains the smallest set of facts required to satisfy S. The least Herbrand model of a deductive DB D defines exactly the facts that are satisfied by D.
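The definition of a Herbrand model can be checked mechanically on small examples. The sketch below is one possible encoding, not a standard API: atoms are tuples, variables are strings starting with "?" (an assumed convention), and only the two negation-free Parent rules of Example 4.1 are considered. It tries every substitution of constants for variables and reports whether an interpretation J satisfies the facts and the rules.

```python
from itertools import product


def substitute(atom, theta):
    """Apply the substitution theta to an atom (predicate, term, ...)."""
    return (atom[0], *[theta.get(t, t) for t in atom[1:]])


def satisfies(J, head, body, constants):
    """True if head <- body holds under J for every ground substitution."""
    variables = sorted(
        {t for atom in [head, *body] for t in atom[1:] if t.startswith("?")}
    )
    for values in product(constants, repeat=len(variables)):
        theta = dict(zip(variables, values))
        body_holds = all(substitute(b, theta) in J for b in body)
        if body_holds and substitute(head, theta) not in J:
            return False
    return True


facts = {
    ("Father", "John", "Tony"),
    ("Mother", "Mary", "Bob"),
    ("Father", "Peter", "Mary"),
}
rules = [
    (("Parent", "?x", "?y"), [("Father", "?x", "?y")]),
    (("Parent", "?x", "?y"), [("Mother", "?x", "?y")]),
]
constants = sorted({t for f in facts for t in f[1:]})


def is_model(J):
    """J is a Herbrand model if it contains the facts and satisfies each rule."""
    return facts <= J and all(satisfies(J, h, b, constants) for h, b in rules)


J1 = facts | {("Parent", "John", "Tony")}
J2 = J1 | {("Parent", "Mary", "Bob"), ("Parent", "Peter", "Mary")}
print(is_model(J1), is_model(J2))
```

Here J1 fails because it contains Father(Peter, Mary) without Parent(Peter, Mary), while J2 satisfies both rules.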

For instance, it is not difficult to see that the Herbrand interpretation {Father(John,Tony), Father(Peter,Mary), Mother(Mary,Bob), Parent(John,Tony)} is not a Herbrand model of the DB in Example 4.1: it contains Father(Peter,Mary) but not Parent(Peter,Mary), so the first rule for Parent is not satisfied. Instead, the interpretation {Father(John,Tony), Father(Peter,Mary), Mother(Mary,Bob), Parent(John,Tony), Parent(Peter,Mary), Parent(Mary,Bob), Ancestor(John,Tony), Ancestor(Peter,Mary), Ancestor(Mary,Bob), Ancestor(Peter,Bob)} is a Herbrand model of the facts and negation-free rules of that DB, that is, leaving aside the rule for Nondirect-anc, whose body contains a negative literal. In particular, it is their least Herbrand model.
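One simple way to obtain the least Herbrand model of a negation-free DB is to apply the rules repeatedly until no new fact appears. The following Python sketch does this naively by trying every ground substitution; it is meant only as an illustration under assumed conventions (atoms as tuples, variables prefixed with "?"), not as one of the evaluation procedures of the cited references. It uses the facts of Example 4.1 and its five rules without negation.

```python
from itertools import product


def ground(atom, theta):
    """Apply the substitution theta to an atom (predicate, term, ...)."""
    return (atom[0], *[theta.get(t, t) for t in atom[1:]])


def least_model(facts, rules, constants):
    """Naive fixpoint: add derivable heads until nothing changes."""
    model = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            variables = sorted(
                {t for a in [head, *body] for t in a[1:] if t.startswith("?")}
            )
            for values in product(constants, repeat=len(variables)):
                theta = dict(zip(variables, values))
                if all(ground(b, theta) in model for b in body):
                    new_fact = ground(head, theta)
                    if new_fact not in model:
                        model.add(new_fact)
                        changed = True
    return model


facts = {
    ("Father", "John", "Tony"),
    ("Mother", "Mary", "Bob"),
    ("Father", "Peter", "Mary"),
}
rules = [
    (("Parent", "?x", "?y"), [("Father", "?x", "?y")]),
    (("Parent", "?x", "?y"), [("Mother", "?x", "?y")]),
    (("GrandMother", "?x", "?y"),
     [("Mother", "?x", "?z"), ("Parent", "?z", "?y")]),
    (("Ancestor", "?x", "?y"), [("Parent", "?x", "?y")]),
    (("Ancestor", "?x", "?y"),
     [("Parent", "?x", "?z"), ("Ancestor", "?z", "?y")]),
]
constants = sorted({t for f in facts for t in f[1:]})

model = least_model(facts, rules, constants)
for fact in sorted(model):
    print(fact)
```

The result contains the three base facts, three Parent facts, and four Ancestor facts, including Ancestor(Peter, Bob). No GrandMother fact is derived, because Bob, the only child of a mother, has no children in the EDB.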

Several problems arise if the semantics of deductive DBs is extended to handle negative information. In the presence of negative literals, the semantics is given by means of the closed world assumption (CWA) [19], which considers as false all information that cannot be proved to be true. For instance, given a fact R(a), the CWA would conclude that ¬R(a) is true if R(a) does not belong to the EDB and if it is not derived by means of any deductive rule, that is, if R(a) is not satisfied by the clauses in the deductive DB.

This poses a first problem regarding negation. Given a predicate Q(x), there is a finite number of values x for which Q(x) is true. However, that is not the case for negative literals, where infinitely many values may exist. For instance, the values x for which ¬Q(x) is true will be all possible values of x except those for which Q(x) is true.

To ensure that negative information can be fully instantiated before being evaluated and, thus, to guarantee that only a finite set of values is considered for negative literals, deductive DBs are restricted to be allowed. That is, any variable that occurs in a deductive rule or in an integrity constraint has an occurrence in a positive literal in the body of that rule or constraint. For example, the rule P(x) ← Q(x) ∧ ¬R(x) is allowed, while P(x) ← S(x) ∧ ¬T(x,y) is not. Nonallowed rules can be transformed into allowed ones as described in [16]. For instance, the last rule is equivalent to this set of allowed rules: {P(x) ← S(x) ∧ ¬aux-T(x), aux-T(x) ← T(x,y)}.
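The allowedness condition is purely syntactic, so it can be tested by collecting variables. The sketch below assumes a hypothetical encoding in which a body is a list of (is_positive, atom) pairs and variables are strings starting with "?"; it checks the four rules mentioned above.

```python
def variables(atom):
    """Variables of an atom (predicate, term, ...); variables start with '?'."""
    return {t for t in atom[1:] if t.startswith("?")}


def is_allowed(head, body):
    """True if every variable of the rule occurs in a positive body literal."""
    all_vars = variables(head)
    positive_vars = set()
    for is_positive, atom in body:
        all_vars |= variables(atom)
        if is_positive:
            positive_vars |= variables(atom)
    return all_vars <= positive_vars


# P(x) <- Q(x) AND NOT R(x)
rule_a = (("P", "?x"), [(True, ("Q", "?x")), (False, ("R", "?x"))])
# P(x) <- S(x) AND NOT T(x, y): y occurs only in a negative literal
rule_b = (("P", "?x"), [(True, ("S", "?x")), (False, ("T", "?x", "?y"))])
# The transformed pair of rules using the auxiliary predicate aux-T
rule_c = (("P", "?x"), [(True, ("S", "?x")), (False, ("aux-T", "?x"))])
rule_d = (("aux-T", "?x"), [(True, ("T", "?x", "?y"))])

for name, (head, body) in [("a", rule_a), ("b", rule_b),
                           ("c", rule_c), ("d", rule_d)]:
    print(name, is_allowed(head, body))
```

Only rule_b is rejected; both rules of its transformed version pass the test.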

To define the semantics of deductive DBs with negation, the notion of Herbrand interpretation must be generalized to be applicable also to negative literals. Now, given a Herbrand interpretation J, a positive fact F will be satisfied in J if F ∈ J, while a negative fact ¬F will be satisfied in J if F ∉ J. The notion of Herbrand model is defined as before.

Another important problem related to the semantics of negation is that a deductive DB may, in general, allow several different interpretations. As an example, consider this DB:

R(a)

P(x)←R(x)∧ ¬Q(x)

Q(x)←R(x)∧ ¬P(x)

This DB allows us to consider as true either {R(a), Q(a)} or {R(a), P(a)}. R(a) is always true because it belongs to the EDB, while P(a) or Q(a) is true depending on the truth value of the other. Therefore, it is not possible to agree on a unique semantics for this DB.
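Because the Herbrand base of this DB has only three facts, the claim can be verified by brute force. The sketch below enumerates all eight interpretations, keeps those that are Herbrand models under the generalized notion of satisfaction, and then keeps the minimal ones. The encoding of ground rules as (head, positive atoms, negated atoms) is an assumption made for this illustration.

```python
from itertools import combinations

herbrand_base = ["R(a)", "P(a)", "Q(a)"]
facts = {"R(a)"}
# Ground rules: (head, positive body atoms, negated body atoms).
rules = [
    ("P(a)", ["R(a)"], ["Q(a)"]),  # P(a) <- R(a) AND NOT Q(a)
    ("Q(a)", ["R(a)"], ["P(a)"]),  # Q(a) <- R(a) AND NOT P(a)
]


def is_model(J):
    """J must contain the facts and satisfy every ground rule."""
    if not facts <= J:
        return False
    for head, positive, negated in rules:
        body_holds = (all(a in J for a in positive)
                      and all(a not in J for a in negated))
        if body_holds and head not in J:
            return False
    return True


candidates = [set(c)
              for size in range(len(herbrand_base) + 1)
              for c in combinations(herbrand_base, size)]
models = [J for J in candidates if is_model(J)]
# A model is minimal if no other model is a proper subset of it.
minimal = [J for J in models if not any(K < J for K in models)]

for J in minimal:
    print(sorted(J))
```

There are two incomparable minimal models, so the intersection of all models, {R(a)}, is not itself a model, and no least Herbrand model exists.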

To avoid that problem, deductive DBs usually are restricted to being stratified. A deductive DB is stratified if derived predicates can be assigned to different strata in such a way that a derived predicate that appears negatively in the body of some rule can be computed by the use of only predicates in lower strata. Stratification allows the definition of recursive predicates, but it restricts the way negation appears in those predicates. Roughly, the semantics of stratified DBs is provided by the application of the CWA stratum by stratum [14]. Given a stratified deductive DB D, the evaluation stratum by stratum always produces a minimal Herbrand model of D [20].

For instance, the preceding DB, with its mutually negative rules for P and Q, is not stratifiable, while the DB of Example 4.1 is stratifiable, with this possible stratification: S1 = {Father, Mother, Parent, GrandMother, Ancestor} and S2 = {Nondirect-anc}.
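A stratification can be computed by repeatedly raising the stratum of a head predicate until every dependency is respected. The sketch below is a simple iterative version of that idea, written for illustration and not taken from the cited references: a head must sit at least as high as each positive body predicate and strictly higher than each negated one, and the DB is reported as not stratifiable once a stratum exceeds the number of predicates. Rules are abstracted to predicate names only, which is an assumed encoding.

```python
def stratify(rules):
    """rules: list of (head_pred, [(body_pred, is_negated), ...]).

    Returns {predicate: stratum}, or None if no stratification exists.
    """
    preds = {h for h, _ in rules} | {p for _, body in rules for p, _ in body}
    stratum = {p: 1 for p in preds}
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for pred, negated in body:
                required = stratum[pred] + (1 if negated else 0)
                if stratum[head] < required:
                    if required > len(preds):
                        return None  # negation inside a recursive cycle
                    stratum[head] = required
                    changed = True
    return stratum


# Predicate dependencies of Example 4.1.
example_4_1 = [
    ("Parent", [("Father", False)]),
    ("Parent", [("Mother", False)]),
    ("GrandMother", [("Mother", False), ("Parent", False)]),
    ("Ancestor", [("Parent", False)]),
    ("Ancestor", [("Parent", False), ("Ancestor", False)]),
    ("Nondirect-anc", [("Ancestor", False), ("Parent", True)]),
]
# The DB with P(x) <- R(x) AND NOT Q(x) and Q(x) <- R(x) AND NOT P(x).
p_q_example = [
    ("P", [("R", False), ("Q", True)]),
    ("Q", [("R", False), ("P", True)]),
]

print(stratify(example_4_1))
print(stratify(p_q_example))
```

For Example 4.1 the sketch places Nondirect-anc in stratum 2 and every other predicate in stratum 1, matching the stratification given above; for the P and Q rules it returns None.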

Determining whether a deductive DB is stratifiable is a decidable problem and can be performed in polynomial time [6]. In general, several stratifications may exist. However, all possible stratifications of a deductive DB are equivalent because they yield the same semantics [5].
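Evaluation stratum by stratum can be sketched as follows for Example 4.1, using the stratification S1, S2 given earlier. Each stratum is computed to a fixpoint before the next one starts, so a negative literal is tested only against predicates that are already complete, which is how the CWA is applied. The encoding (atoms as tuples, "?"-prefixed variables, body literals as (is_positive, atom) pairs) and the naive grounding strategy are assumptions of this illustration.

```python
from itertools import product


def ground(atom, theta):
    """Apply the substitution theta to an atom (predicate, term, ...)."""
    return (atom[0], *[theta.get(t, t) for t in atom[1:]])


def evaluate_stratum(model, rules, constants):
    """Naive fixpoint for the rules of one stratum."""
    model = set(model)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            atoms = [head] + [a for _, a in body]
            variables = sorted(
                {t for a in atoms for t in a[1:] if t.startswith("?")}
            )
            for values in product(constants, repeat=len(variables)):
                theta = dict(zip(variables, values))
                # A positive literal must be in the model, a negated one absent.
                holds = all((ground(a, theta) in model) == positive
                            for positive, a in body)
                if holds and ground(head, theta) not in model:
                    model.add(ground(head, theta))
                    changed = True
    return model


POS, NEG = True, False
stratum1 = [
    (("Parent", "?x", "?y"), [(POS, ("Father", "?x", "?y"))]),
    (("Parent", "?x", "?y"), [(POS, ("Mother", "?x", "?y"))]),
    (("GrandMother", "?x", "?y"),
     [(POS, ("Mother", "?x", "?z")), (POS, ("Parent", "?z", "?y"))]),
    (("Ancestor", "?x", "?y"), [(POS, ("Parent", "?x", "?y"))]),
    (("Ancestor", "?x", "?y"),
     [(POS, ("Parent", "?x", "?z")), (POS, ("Ancestor", "?z", "?y"))]),
]
stratum2 = [
    (("Nondirect-anc", "?x", "?y"),
     [(POS, ("Ancestor", "?x", "?y")), (NEG, ("Parent", "?x", "?y"))]),
]
facts = {
    ("Father", "John", "Tony"),
    ("Mother", "Mary", "Bob"),
    ("Father", "Peter", "Mary"),
}
constants = sorted({t for f in facts for t in f[1:]})

model = set(facts)
for rules in (stratum1, stratum2):
    model = evaluate_stratum(model, rules, constants)

nondirect = {f for f in model if f[0] == "Nondirect-anc"}
print(nondirect)
```

The only nondirect ancestor derived is Peter with respect to Bob, since Peter is an ancestor of Bob through Mary but not a parent of Bob.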

A deeper discussion of the implications of possible semantics of deductive DBs can be found in almost all books explaining deductive DBs (see, for instance, [5, 6, 8, 9, 11, 14]). Semantics for negation (stratified or not) is discussed in depth in [5, 21]. Several procedures for computing the least Herbrand model of a deductive DB are also described in those references. We will describe the main features of these procedures when dealing with query evaluation in Section 4.3.

4.2.3 Advantages Provided by Views and Integrity Constraints

The concept of view is used in DBs to delimit the DB content relevant to each group of users. A view is a virtual data structure, derived from base facts or other views by means of a definition function. Therefore, the extension of a view does not have an independent existence because it is completely defined by the application of the definition function to the extension of the DB. In deductive DBs, views correspond to derived predicates and are defined by means of deductive rules. Views provide the following advantages.

• Views simplify the user interface, because users can ignore the data that are not relevant to them. For instance, the view GrandMother(x,y) in Example 4.1 provides only information about the grandmother x and the grandson or granddaughter y. However, the information about the parent of y is hidden by the view definition.

• Views favor logical data independence, because they allow changing the logical data structure of the DB without having to perform corresponding changes to other rules. For instance, assume that the base predicate Father(x,y) must be replaced by two different predicates Father1(x,y) and Father2(x,y), each of which contains a subset of the occurrences of Father(x,y). In this case, if we consider Father(x,y) as a view predicate and define it as

Father(x,y) ← Father1(x,y)

Father(x,y) ← Father2(x,y)

we do not need to change the rules that refer to the original base predicate Father.

• Views make certain queries easier or more natural to define, since by means of them we can refer directly to the concepts instead of having to provide their definition. For instance, if we want to ask about the ancestors of Bob, we do not need to define what we mean by ancestor since we can use the view Ancestor to obtain the answers.

• Views provide a protection measure, because they prevent users from accessing data external to their view. Users authorized to access only GrandMother do not know the information about parents.

Real DB applications use many views. However, the power of views can be exploited only if a user does not distinguish a view from a base fact. That implies the need to perform query and update operations on the views, in addition to the same operations on the base facts.

Integrity constraints correspond to requirements to be satisfied by the DB. In that sense, they impose conditions on the allowable data in addition to the simple structure and type restrictions imposed by the basic schema definitions. Integrity constraints are useful, for instance, for catching data-entry errors, as a correctness criterion when writing DB updates, or to enforce consistency across data in the DB.

When an update is performed, some integrity constraint may be violated. That is, if applied, the update, together with the current content of the DB, may falsify some integrity constraint. There are several possible ways of resolving such a conflict [22].

• Reject the update.

• Apply the update and make additional changes in the extensional DB to make it obey the integrity constraints.

• Apply the update and ignore the temporary inconsistency.

• Change the intensional part of the knowledge base (deductive rules and/or integrity constraints) so that violated constraints are satisfied.

All those policies may be reasonable, and the correct choice of a policy for a particular integrity constraint depends on the precise semantics of the constraint and of the DB.

Integrity constraints facilitate program development if the conditions they state are directly enforced by the DBMS, instead of being handled by external applications. Therefore, deductive DBMSs should also include some capability to deal with integrity constraints.

4.2.4 Deductive Versus Relational Databases

Deductive DBs appeared as an extension of the relational ones, since they made extensive use of intensional information in the form of views and integrity constraints. However, current relational DBs also allow defining views and constraints. So exactly what is the difference nowadays between a deductive DB and a relational one?

An important difference lies in the data definition language (DDL) used: Datalog in deductive DBs or SQL [23] in most relational DBs. We do not want to raise here the discussion about which language is more natural or easier to use. That is a matter of taste and personal background. It is important, however, to clarify whether Datalog or SQL can define concepts that cannot be defined by the other language. This section compares the expressive power of Datalog, as defined in Section 4.2.1, with that of the SQL2 standard. We must note that, in the absence of recursive views, Datalog is known to be equivalent to relational algebra (see, for instance, [5, 7, 14]).

Base predicates in deductive DBs correspond to relations. Therefore, base facts correspond to tuples in relational DBs. In that way, it is not difficult to see the clear correspondence between the EDB of a deductive DB and the logical contents of a relational one.

Deductive DBs allow the definition of derived predicates, but SQL2 also allows the definition of views. For instance, predicate GrandMother in Example 4.1 could be defined in SQL2 as

CREATE VIEW grandmother AS
    SELECT mother.x, parent.y
    FROM mother, parent
    WHERE mother.z = parent.z

Negative literals appearing in deductive rules can be defined by means of the NOT EXISTS operator from SQL2. Moreover, views defined by more than one rule can be expressed by the UNION operator from SQL2.
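Both correspondences can be tried out with any SQL engine that supports views. The sketch below uses SQLite through Python's sqlite3 module purely as a convenient stand-in for an SQL2 system. The column names x and y, and the derived predicate Root (a parent with no recorded parent), are hypothetical and introduced only for this illustration; Root is not part of Example 4.1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE father (x TEXT, y TEXT);
    CREATE TABLE mother (x TEXT, y TEXT);
    INSERT INTO father VALUES ('John', 'Tony'), ('Peter', 'Mary');
    INSERT INTO mother VALUES ('Mary', 'Bob');

    -- Parent(x,y) <- Father(x,y)   and   Parent(x,y) <- Mother(x,y):
    -- a view defined by two rules becomes a UNION.
    CREATE VIEW parent AS
        SELECT x, y FROM father
        UNION
        SELECT x, y FROM mother;

    -- Hypothetical rules: Root(x) <- Parent(x,y) AND NOT HasParent(x),
    -- HasParent(x) <- Parent(z,x). The negative literal becomes NOT EXISTS.
    CREATE VIEW root AS
        SELECT DISTINCT p.x
        FROM parent AS p
        WHERE NOT EXISTS (SELECT 1 FROM parent AS q WHERE q.y = p.x);
""")

roots = [row[0] for row in conn.execute("SELECT x FROM root ORDER BY x")]
print(roots)
```

With the facts of Example 4.1, Mary is excluded from root because Peter is recorded as her father, while John and Peter remain.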

SQL2 also allows the definition of integrity constraints, either at the level of table definition or as assertions representing conditions to be satisfied by the DB. For instance, the second integrity constraint in Example 4.1 could be defined as

CREATE ASSERTION ic2 CHECK (NOT EXISTS (
    SELECT father.x FROM father, mother
    WHERE father.x = mother.x ))

On the other hand, key and referential integrity constraints and exclusion dependencies, which are defined at the level of table definition in SQL2, can also be defined as inconsistency predicates in deductive DBs.

Although SQL2 can define views and constraints, it does not provide a mechanism to define recursive views. Thus, for instance, the derived predicate Ancestor could not be defined in SQL2. In contrast, Datalog is able to define recursive views, as we saw in Example 4.1. In fact, that is the main difference between the expressive power of Datalog and that of SQL2, a limitation to be overcome by SQL3, which will also allow the definition of recursive views by means of a Datalog-like language.
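For comparison, the recursive query facility that was eventually standardized (WITH RECURSIVE, introduced in SQL:1999) follows the two Ancestor rules almost literally. The sketch below runs such a query in SQLite through Python's sqlite3 module; storing Parent directly as a table with columns x and y is a simplifying assumption made here, not something prescribed by the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (x TEXT, y TEXT);
    INSERT INTO parent VALUES
        ('John', 'Tony'), ('Mary', 'Bob'), ('Peter', 'Mary');
""")

query = """
    WITH RECURSIVE ancestor(x, y) AS (
        -- Ancestor(x,y) <- Parent(x,y)
        SELECT x, y FROM parent
        UNION
        -- Ancestor(x,y) <- Parent(x,z) AND Ancestor(z,y)
        SELECT parent.x, ancestor.y
        FROM parent JOIN ancestor ON parent.y = ancestor.x
    )
    SELECT x, y FROM ancestor ORDER BY x, y
"""
ancestors = conn.execute(query).fetchall()
for row in ancestors:
    print(row)
```

The base branch plays the role of the nonrecursive rule and the second branch that of the recursive one; UNION removes duplicates, so the iteration stops once no new pair is produced.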

Commercial relational DBs do not yet provide the full expressive power of SQL2. That limitation probably will be overcome in the next few years; perhaps then commercial products will tend to provide SQL3. If that is achieved, there will be no significant difference between the expressive power of Datalog and that of commercial relational DBs.

Despite these minor differences, all problems studied so far in the con-