6 – Deductive & Object-Oriented Databases
6.1 Deductive Databases and Datalog
6.1.3 Deductive Databases
6.1.4.1 Datalog – the language
A simple program in Datalog is (line numbers on the left are not part of Datalog):
1. anc(X, Y) <- parent(X, Y).
2. anc(X, Y) <- anc(X,Z),parent(Z, Y).
3. parent(X, Y) <- father(X, Y).
4. parent(X, Y) <- mother(X, Y).
5. father(eric, mario).
6. mother(alice, simon).
7. mother(alice, mario).
Deductive & Object-Oriented Databases - Page [ 125]
Lines 5, 6 and 7 denote facts, with the last line stating that ‘alice is the mother of mario’. Lines 1, 2, 3 and 4 are rules through which other facts can be inferred. Rule 4 says that if ‘X is the mother of Y’ then ‘X is a parent of Y’. Note that the scope of a variable is the rule in which it appears, and multiple occurrences of a variable in a rule imply that the same constant is substituted for each occurrence.
A query is an expression; the "?-" symbol is used prior to the query as a command prompt.
For example, the following query looks for a mother and a father of the same individual:
?- mother(X,C), father(Y, C).
A possible response to the query would be X=alice, Y=eric and C=mario.
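The way this query is answered can be sketched in Python as a join on the shared variable C. This is only an illustration, not how any particular Datalog engine works; the set-of-tuples encoding and the function name `answer_query` are assumptions made for the sketch, and the facts are the three facts of the example program (lines 5, 6 and 7).

```python
# Facts of the example program, encoded as sets of (X, Y) pairs (an assumed encoding).
father = {("eric", "mario")}
mother = {("alice", "simon"), ("alice", "mario")}

def answer_query():
    """Evaluate ?- mother(X, C), father(Y, C) as a join on the shared variable C."""
    return [
        {"X": x, "Y": y, "C": c}
        for (x, c) in sorted(mother)
        for (y, c2) in sorted(father)
        if c == c2  # the same constant must be substituted for both occurrences of C
    ]

print(answer_query())
```

Only the binding with C=mario survives the join, since simon has no recorded father.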
6.1.4.2 Syntax
The basic elements of Datalog syntax are constants, predicates, variables (starting with an uppercase letter), and terms. A Datalog term is a constant or a variable. The usual logical connectives (‘and’, ‘or’, ‘not’, and ‘implication’) and quantifiers (universal and existential) are also used.
A Datalog well formed formula (i.e. wff) is defined inductively as:
If p is an n-ary predicate and t1, … , tn are terms then p(t1, … , tn) is an atomic wff.
If A and B are wffs then so are the following: not A, A and B, A or B, A <- B, A -> B, A <-> B.
If A is a wff and X is a variable then so are the following:
∀X (A) Remark: X is bound within A.
∃X (A) Remark: X is bound within A.
Those wff that contain no variables (e.g. facts) are called ground wff. A wff is closed if every variable in the formula is quantified. Conversely if there is a variable in a wff that is not quantified then the wff is not closed.
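As a small illustration of the ground test for atomic wffs, the sketch below encodes an atom as a Python tuple (predicate, arg1, ..., argn). The encoding and the helper names `is_variable` and `is_ground` are assumptions made here; only the uppercase convention for variables comes from the text.

```python
def is_variable(term):
    # Convention from the text: variables start with an uppercase letter.
    return term[:1].isupper()

def is_ground(atom):
    """An atom (predicate, arg1, ..., argn) is ground if none of its arguments is a variable."""
    _predicate, *args = atom
    return not any(is_variable(a) for a in args)

print(is_ground(("father", "eric", "mario")))  # a fact: no variables
print(is_ground(("parent", "X", "mario")))     # contains the variable X
```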
A clause in Datalog is a closed wff of the following form:
∀X1, … , ∀Xm ( A1 or … or Ak or not B1 or … or not Bl )
Remark: each Ai and Bj is an atomic wff, and X1, … , Xm are all the variables occurring in them.
A definite clause is a Datalog clause with exactly one positive atomic wff and zero or more negative atomic wffs (definite clauses are a special case of Horn clauses). In this structure, called a rule, the positive atomic wff is the head and the negative atomic wffs form the body. For example, rule 1 of the program above is the clause ∀X, ∀Y ( anc(X, Y) or not parent(X, Y) ), written as anc(X, Y) <- parent(X, Y). A rule with an empty body is sometimes called a unit clause, and a unit clause without any variables is a fact. A definite clause program (or positive Datalog logic program) is a collection of definite clauses.
6.1.4.3 Semantics
It is interesting to note that there are three different, but equivalent, ways to describe the semantics of a definite clause program: the model theoretic, the proof theoretic, and the operational. (Ullman in [ULLMA90] also mentions ad hoc semantics, as in Prolog, but quickly sets it aside.) In essence, the model theoretic semantics gives a declarative meaning to a program, while the operational semantics couples a program’s bottom-up evaluation to deductive databases. These are the two descriptions of interest in this study. For completeness, the proof theoretic interpretation is related to SLD-resolution and top-down program evaluation.
The model theoretic semantics, often called Tarskian semantics, has two main aims: the first is to enumerate the objects (or individuals) comprising the domain of discourse and their interrelationships; the second is to map the language’s symbols to those objects. Consequently the semantics of the language depends both on the world one is trying to represent and on how the constants and predicate symbols in the syntax correspond to individuals and properties of that world. This correspondence is the interpretation. We are interested in interpretations that make a Datalog program true, in which case the interpretation is called a model of the program. The following steps formalise these ideas.
Given that L is our language (e.g. Datalog) and P is a positive Datalog program then we initially select a non-empty set of elements U, called the domain of interpretation. Then an interpretation of L is defined as:
1. For each constant in L, an assignment of an element in U.
2. For every predicate p in L, an assignment of a mapping from U^n into {TRUE, FALSE} (where n is the arity of p).
It is important to state that for definite programs it suffices to assume that constants represent themselves in the interpretation (an observation due to Löwenheim, Skolem and Herbrand); these particular interpretations are called Herbrand interpretations. The Herbrand universe for L, denoted U_L, is the set of all ground terms that can be built from L; since a Datalog term is only a constant or a variable, this is simply the set of constants of L (if L is devoid of constants then one introduces a single constant). The Herbrand base of L, denoted B_L, is the set of ground atoms that can be generated by assigning elements of U_L to the arguments of the predicates of L.
For a Datalog program P, the Herbrand universe, U_P, and the Herbrand base, B_P, are defined respectively as U_L and B_L of the language L whose constants and predicates are exactly those appearing in P.
The number of possible interpretations quickly becomes astronomical. In our example, with just four predicates (each having two arguments) and four constants, the size of the Herbrand base B_P is 64 (i.e. 4 * 4 * 4: four predicates, each with four choices for either argument). Since any subset of B_P is a Herbrand interpretation, there are 2^64 Herbrand interpretations.
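The count can be reproduced with a short Python sketch that enumerates the Herbrand base of the example program. The tuple encoding of atoms and the variable names are assumptions made for illustration.

```python
from itertools import product

# Constants and predicates (name -> arity) appearing in the example program.
constants = ["eric", "mario", "alice", "simon"]
predicates = {"anc": 2, "parent": 2, "father": 2, "mother": 2}

# Herbrand base: every predicate applied to every combination of constants.
herbrand_base = {
    (p, *args)
    for p, arity in predicates.items()
    for args in product(constants, repeat=arity)
}

# Every subset of the Herbrand base is a Herbrand interpretation.
num_interpretations = 2 ** len(herbrand_base)

print(len(herbrand_base))
print(num_interpretations)
```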
Let us now consider the ground instances of a rule r in P. If ground(r) denotes the set of ground instances of r (obtained by assigning constants from the Herbrand universe U_P to the variables in r), then ground(P) denotes the set of all ground instances of all the rules in P.
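A minimal Python sketch of this grounding step follows, assuming rules are encoded as (head, body) pairs of tuple atoms and that the program includes the three facts of lines 5, 6 and 7. The names `ground` and `ground_program` mirror the notation of the text but are otherwise hypothetical helpers.

```python
from itertools import product

CONSTANTS = ["eric", "mario", "alice", "simon"]

# A rule is (head, body); an atom is a tuple (predicate, arg1, arg2).
RULES = [
    (("anc", "X", "Y"), [("parent", "X", "Y")]),
    (("anc", "X", "Y"), [("anc", "X", "Z"), ("parent", "Z", "Y")]),
    (("parent", "X", "Y"), [("father", "X", "Y")]),
    (("parent", "X", "Y"), [("mother", "X", "Y")]),
    (("father", "eric", "mario"), []),
    (("mother", "alice", "simon"), []),
    (("mother", "alice", "mario"), []),
]

def is_variable(term):
    # Variables start with an uppercase letter.
    return term[:1].isupper()

def substitute(atom, binding):
    # Replace each variable by its bound constant; constants are left unchanged.
    return (atom[0],) + tuple(binding.get(t, t) for t in atom[1:])

def ground(rule):
    """Yield every ground instance of a rule over CONSTANTS."""
    head, body = rule
    variables = sorted({t for atom in [head] + body for t in atom[1:] if is_variable(t)})
    for values in product(CONSTANTS, repeat=len(variables)):
        binding = dict(zip(variables, values))
        yield substitute(head, binding), [substitute(a, binding) for a in body]

def ground_program(rules):
    """All ground instances of all rules, i.e. ground(P)."""
    return [g for rule in rules for g in ground(rule)]

print(len(ground_program(RULES)))
```

Rules with two variables contribute 16 ground instances each, the recursive rule with three variables contributes 64, and each fact contributes exactly one.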
Given the enumeration of P through ground(P), one can check each element of ground(P) against an interpretation. A ground atom is satisfied by an interpretation if it belongs to the interpretation; a ground rule is satisfied if its head is in the interpretation whenever all the atoms of its body are. An interpretation that satisfies all rules in P is called a model for P. The following interpretations for the program used earlier give a tangible idea of these definitions.
Remark: interpretation one
I1 = { father(eric, mario), mother(alice, simon), mother(alice, mario)}.
Comment:- Is I1 a model? Rules 5, 6 & 7 are alright. But what about 4? The body mother(alice, simon) is in this interpretation but the head parent(alice, simon) is not. Therefore I1 is not a model of the above P.
Remark: interpretation two
I2 = { father(eric, mario), mother(alice, simon), mother(alice, mario), parent(eric, mario),
parent(alice, simon), parent(alice, mario), anc(eric, mario), anc(alice, simon), anc(alice, mario), anc(simon, mario)}.
Comment:- Is I2 a model? Rules 5, 6 & 7 are alright, as are rules 3 and 4, and with some perseverance one can check that rules 1 and 2 are also satisfied by I2. Note that the interpretation item anc(simon, mario) does not contradict any rule or imply that parent(simon, mario) should be present (which in fact it is not). I2 is a model of the above P.
Remark: interpretation three
I3 = I2 – { anc(simon, mario) }.
Comment:- I3 is a model of the above P.
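The three interpretations can be checked mechanically. The Python sketch below grounds every rule and tests whether the head is present whenever the whole body is; the tuple encoding and the helper names `ground` and `is_model` are assumptions made for illustration.

```python
from itertools import product

CONSTANTS = ["eric", "mario", "alice", "simon"]

# A rule is (head, body); an atom is a tuple (predicate, arg1, arg2).
RULES = [
    (("anc", "X", "Y"), [("parent", "X", "Y")]),
    (("anc", "X", "Y"), [("anc", "X", "Z"), ("parent", "Z", "Y")]),
    (("parent", "X", "Y"), [("father", "X", "Y")]),
    (("parent", "X", "Y"), [("mother", "X", "Y")]),
    (("father", "eric", "mario"), []),
    (("mother", "alice", "simon"), []),
    (("mother", "alice", "mario"), []),
]

def substitute(atom, binding):
    return (atom[0],) + tuple(binding.get(t, t) for t in atom[1:])

def ground(rule):
    """Yield every ground instance of a rule over CONSTANTS."""
    head, body = rule
    variables = sorted({t for a in [head] + body for t in a[1:] if t[:1].isupper()})
    for values in product(CONSTANTS, repeat=len(variables)):
        binding = dict(zip(variables, values))
        yield substitute(head, binding), [substitute(a, binding) for a in body]

def is_model(interpretation, rules=RULES):
    """True if every ground rule with its body in the interpretation also has its head there."""
    for rule in rules:
        for head, body in ground(rule):
            if all(a in interpretation for a in body) and head not in interpretation:
                return False
    return True

I1 = {("father", "eric", "mario"), ("mother", "alice", "simon"), ("mother", "alice", "mario")}
I3 = I1 | {
    ("parent", "eric", "mario"), ("parent", "alice", "simon"), ("parent", "alice", "mario"),
    ("anc", "eric", "mario"), ("anc", "alice", "simon"), ("anc", "alice", "mario"),
}
I2 = I3 | {("anc", "simon", "mario")}

for name, interp in [("I1", I1), ("I2", I2), ("I3", I3)]:
    print(name, is_model(interp))
```

I1 fails on the ground instances of rules 3 and 4, while I2 and I3 satisfy every ground rule; the recursive rule 2 is satisfied vacuously because no ancestor pair ends in eric or alice, the only individuals who are parents.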
Clearly, there can be many interpretations of a program that are models. A useful result is that the intersection of two models is another model. A further observation, following from this result, is that there are models from which no element can be removed without losing the model property. For example, removing any element from interpretation I3 leaves an interpretation that is no longer a model of the program. Models with this property are called minimal models. Furthermore, if there is a model that is contained in every other model then it is called the least model. At this point one can easily prove an important result for positive Datalog programs: every program has a least model (one can follow the proof in [ULLMA90]).
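For the example program the least model can be computed by a naive bottom-up iteration: start from the empty set and keep adding the heads of ground rules whose bodies are already present, until nothing new appears. The Python sketch below is one simple way to do this and is not taken from the source; the tuple encoding and the helper names are assumptions. It also checks the minimality property by removing each atom in turn.

```python
from itertools import product

CONSTANTS = ["eric", "mario", "alice", "simon"]

# A rule is (head, body); an atom is a tuple (predicate, arg1, arg2).
RULES = [
    (("anc", "X", "Y"), [("parent", "X", "Y")]),
    (("anc", "X", "Y"), [("anc", "X", "Z"), ("parent", "Z", "Y")]),
    (("parent", "X", "Y"), [("father", "X", "Y")]),
    (("parent", "X", "Y"), [("mother", "X", "Y")]),
    (("father", "eric", "mario"), []),
    (("mother", "alice", "simon"), []),
    (("mother", "alice", "mario"), []),
]

def substitute(atom, binding):
    return (atom[0],) + tuple(binding.get(t, t) for t in atom[1:])

def ground(rule):
    """Yield every ground instance of a rule over CONSTANTS."""
    head, body = rule
    variables = sorted({t for a in [head] + body for t in a[1:] if t[:1].isupper()})
    for values in product(CONSTANTS, repeat=len(variables)):
        binding = dict(zip(variables, values))
        yield substitute(head, binding), [substitute(a, binding) for a in body]

GROUND_RULES = [g for rule in RULES for g in ground(rule)]

def is_model(interpretation):
    """True if no ground rule has its body satisfied and its head missing."""
    return all(
        head in interpretation or not all(a in interpretation for a in body)
        for head, body in GROUND_RULES
    )

def least_model():
    """Naive bottom-up iteration up to a fixpoint."""
    model = set()
    while True:
        derived = {head for head, body in GROUND_RULES if all(a in model for a in body)}
        if derived <= model:  # nothing new was derived: fixpoint reached
            return model
        model |= derived

LEAST = least_model()
for atom in sorted(LEAST):
    print(atom)

# Minimality: dropping any single atom breaks the model property.
print(all(not is_model(LEAST - {atom}) for atom in LEAST))
```

The result contains the three facts, the three parent atoms and the three anc atoms, which is exactly interpretation I3; anc(simon, mario) is never derived.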