8 Extended Database Concepts

(1)

8 Extended Database Concepts

DOOD

• deductive and

• object–oriented databases

DOOD databases offer advanced features for

• data modelling and • database programming

(2)

8.1 Deductive Databases and Logic Programming

The ease of handling the data structure of terms and the powerful built–in

control structure of backtracking are features that distinguish PROLOG from

other programming languages.

PROLOG is very well–suited for embedded database programming.

In the database context, frequently a restricted version is used, which is called DATALOG – the basis of deductive databases.

• PROLOG and DATALOG are declarative languages; they can access

databases and XML documents.

• Relations and complex objects (like, e.g., XML documents) can be

represented as term structures.

• With the help of declarative rules, we can represent integrity constraints

(3)

Basic Syntax of PROLOG

Constant Symbol: a, 10, ’Smith, John B.’

Variable Symbol: X, Lname (starts with a capital letter) Term: f(t₁, . . . , t_n),

with function symbol f and terms ti

a, X (constant and variable symbols are terms), f(g(a,b),X,10), a*(b+c) (complex terms),

[LNAME, . . . , DNO] (this is a list)

Predicate Symbol: employee, e_works_on_p, transitive_closure Atom: p(t₁, . . . , tn),

(4)

Extra–Logical Features

• assert and retract can update the internal PROLOG database

• meta–predicates like maplist, findall, and ddbase_aggregate

form control structures

• the cut “!” freezes variable bindings and cuts off alternative computations paths

(5)

Terms in Infix / Prefix Form

It is possible to define binary, infix functors ⊙ in PROLOG. Then, the binary

term ⊙(t1, t2) can be represented in infix form as t1 ⊙ t2.

• The infix term 1955-01-09 representing a date has the prefix form

-(-(1955,01),09).

• The infix term _a*(b+c) representing an arithmetic expression has the prefix form _*(a,+(b,c)).

The operator trees for the terms above are given in the following:

-1955 -01 09 R R * b + c a R R

(6)

When an infix functor ⊙ is used multiple times in a term a ⊙ b ⊙ c, then there

are rules in PROLOG that determine whether a and b or b and c are joined first

in the prefix form.

• The infix term 1955-01-09 representing a date has the prefix form

-(-(1955,01),09)_.

• The infix term T:As:Es representing an XML element has the prefix

form :(T,:(As,Es)).

The operator trees for the terms above are given in the following:

-1955 -01 09 R R : As : Es T R R

(7)

Term Representation for Lists

A non–empty list is represented as a binary term structure .(X, Xs)_{, where}

• the head X _{is the first element and}

• the tail Xs _{represents the rest of the list.}

The list functor ”._{” is binary, and the empty list is represented by ”}[]_”.

Additionally, PROLOG supports the compact list notation [X₁,X₂,...,X_n].

It is equivalent to [X₁|[X₂,...,X_n]].

[c] = .(c, [])

[a, b, c] = .(a, .(b, .(c, [])))

(8)

Term Representation for XML

An XML element

can be represented by a complex term in field notation (FN):

table:[name:employee]:[

attribute:[name:’FNAME’]:[] ].

This infix form is using the binary functor ”:”.

(9)

Facts, Rules, and Goals

Literal: atom A oder negated atom not_(A)

Fact: A

with atom A; e.g.,

employee(’John’, ’B’, ’Smith’, ...) Rule: A |{z} head :- B₁, . . . , B_m | {z } body

with atom A and literals Bi, example later

Goal: :- B₁, . . . , B_m

with literals Bi

A set of facts for the same predicate symbol corresponds to a relation in databases. Rules generalize views. Goals are used for expressing queries.

(10)

8.1.1 PROLOG as a Database Language

• PROLOG can be used for representing tables from relational databases.

The tuples of a table become PROLOG facts with the same predicate

symbol – usually, the table name is used.

• The data dictionary of a relational database can also be represented

using PROLOG facts. This can be done using PROLOG terms that

correspond to an XML representation of the data dictionary.

• Queries and integrity constraints can be represented as PROLOG rules.

Conjunctive queries are posed in the form of PROLOG goals, which are

then evaluated using the PROLOG rules.

• DATALOG is a restricted version of PROLOG without function symbols

and extra–logical features. DATALOG ensures termination and the

(11)

Database Tables

in MySQL:

EMPLOYEE

FNAME MINIT LNAME SSN BDATE ADDRESS SEX SALARY SUPERSSN DNO John B Smith 444444444 1955-01-09 731 Fondren, Houston, TX M 30000 222222222 5 Franklin T Wong 222222222 1945-12-08 638 Voss, Houston, TX M 40000 111111111 5 Alicia J Zelaya 777777777 1958-07-19 3321 Castle, Spring, TX F 25000 333333333 4 Jennifer S Wallace 333333333 1931-06-20 291 Berry, Bellaire, TX F 43000 111111111 4 Ramesh K Narayan 555555555 1952-09-15 975 Fire Oak, Humble, TX M 38000 222222222 5 Joyce A English 666666666 1962-07-31 5631 Rice, Houston, TX F 25000 222222222 5 Ahmad V Jabbar 888888888 1959-03-29 980 Dallas, Houston, TX M 25000 333333333 4 James E Borg 111111111 1927-11-10 450 Stone, Houston, TX M 55000 NULL 1

A database table p can be represented by a set of PROLOG facts, namely one

(12)

WORKS_ON

ESSN PNO HOURS

111111111 20 NULL 222222222 2 10.0 222222222 3 10.0 333333333 20 15.0 333333333 30 20.0 444444444 1 32.5 444444444 2 7.5 555555555 3 40.0 666666666 1 20.0 666666666 2 20.0 777777777 10 10.0 777777777 30 30.0 888888888 10 35.5 888888888 30 5.0 PROJECT

PNAME PNUMBER PLOCATION DNUM

ProductX 1 Bellaire 5 ProductY 2 Sugarland 5 ProductZ 3 Houston 5 Computerization 10 Stafford 4 Reorganization 20 Houston 1 Newbenefits 30 Stafford 4

(13)

Database Tables in PROLOG: employee(’John’, ’B’, ’Smith’, 444444444, 1955-01-09, ’731 Fondren, Houston, TX’, ’M’, 30000, 222222222, 5). employee(’Franklin’, ’T’, ’Wong’, ...). ... works_on(444444444, 1, 32.5). works_on(444444444, 2, 7.5). ... department(’Research’, 5, 222222222, 1978-05-22). ... project(’ProductX’, 1, ’Bellaire’, 5). ...

We do not quote the date values. Then, they are terms, and we can access their components more conveniently without string parsing.

(14)

Views and Queries vs. Rules and Goals

• SQL VIEW:

CREATE VIEW E_WORKS_ON_P AS

SELECT FNAME, LNAME, PNAME, HOURS FROM EMPLOYEE, WORKS_ON, PROJECT WHERE EMPLOYEE.SSN = WORKS_ON.ESSN

AND PROJECT.PNUMBER = WORKS_ON.PNO

• PROLOG rule:

e_works_on_p(FNAME, LNAME, PNAME, HOURS) :-works_on(SSN, P, HOURS),

employee(FNAME, _, LNAME, SSN, _,_,_,_,_,_), project(PNAME, P, _,_).

(15)

• SQL SELECT:

SELECT *

FROM E_WORKS_ON_P

The SELECT statement calls the view. • PROLOG goal:

?- e_works_on_p(FNAME, LNAME, PNAME, HOURS).

The query is submitted to the PROLOG interpreter as a goal.

(16)

Integrity Constraints in PROLOG

• Primary Key Constraint for Employee:

primary_key_violation(employee, X, Y) :-X = employee(_,_,_, SSN, _,_,_,_,_,_), Y = employee(_,_,_, SSN, _,_,_,_,_,_), call(X), call(Y), X \= Y.

• Foreign Key Constraint for Employee:

foreign_key_violation(employee(’DNO’), X) :-X = employee(_,_,_,_,_,_,_,_,_, DNO),

call(X),

not(department(_, DNO, _,_)).

In DDBASE, the primary and foreign key contraints of a relational database

(17)

Argument Positions vs. Field Notation (FN)

• Like in other programming languages, the arguments ti of an atom

p(t₁, . . . , tn) are handed over by position in PROLOG. E.g., in

works_on(S, P, H),

the first position t₁ = S is the SSN of an employee who has worked on the project with the number t₂ = P for t₃ = H hours.

• In the database context, we could use a meta–interpreter for accessing

arguments in field notation – in a more abstract way – by their

corresponding attribute name. Then, according to the database schema,

works_on(’PNO’:P, ’ESSN’:S)

means that the employee S _{has worked on the project} P_{. The order of the}

(18)

Semantic Constraints in Field Notation (FN)

• No employee should earn more than his manager: trigger(salary, X, Y)

:-employee(’SSN’:X, ’SALARY’:S1, ’SUPERSSN’:Y), employee(’SSN’:Y, ’SALARY’:S2),

S1 > S2.

• Which employee works on a foreign project ?

trigger(employee_works_on_foreign_project, E, P) :-works_on(’ESSN’:E, ’PNO’:P),

employee(’SSN’:E, ’DNO’:D1),

project(’PNUMBER’:P, ’DNUM’:D2), D1 \= D2.

FN abstracts from argument positions: employee(’SSN’:E, ’DNO’:D1)

(19)

Bottom–Up Evaluation of DATALOG

• The set of all given facts for a predicate corresponds to a relation. • A rule without function symbols corresponds to a VIEW statement

defining a relation for the head predicate.

• The relations for the body predicates are derived using rules themselves.

Thus, it can happen that a rule transitively helps to derive tuples for one of its body predicates (recursion).

E.g., the second rule for supervisor is directly recursive.

• The bottom–up evaluation iteratively enlarges the relations for the

predicates by repeatedly evaluating all rules until a fixpoint is reached. Thus, e.g., all transitive supervisors can be derived, which is usually not possible using SQL systems.

(20)

Recursion and Transitive Closure 444444444 555555555 666666666 777777777 888888888 222222222 333333333 111111111 ? j R j

The following recursive DATALOG rule set derives the transitive supervisor

relation on the social security numbers: supervisor(SSN_1, SSN_2) :-direct_supervisor(SSN_1, SSN_2). supervisor(SSN_1, SSN_2) :-direct_supervisor(SSN_1, SSN_3), supervisor(SSN_3, SSN_2). direct_supervisor(SSN_1, SSN_2) :-employee(_,_,_, SSN_2, _,_,_,_, SSN_1, _). SSN_1 SSN_3 SSN_2 direct s. supervisor ? ?

(21)

The first iteration derives the facts for direct_supervisor _{from the}

facts for employee:

direct_supervisor(111111111, 222222222). direct_supervisor(111111111, 333333333). direct_supervisor(222222222, 444444444). direct_supervisor(222222222, 555555555). direct_supervisor(222222222, 666666666). direct_supervisor(333333333, 777777777). direct_supervisor(333333333, 888888888).

The second iteration translates these facts to the corresponding 7 facts for

supervisor_.

supervisor(111111111, 222222222). ...

(22)

The third iteration derives the 5 new facts that 111111111 is the transitive (indirect) supervisor of the employees 444444444 to 888888888:

supervisor(111111111, 444444444). supervisor(111111111, 555555555). supervisor(111111111, 666666666). supervisor(111111111, 777777777). supervisor(111111111, 888888888).

Since the hierarchy is of limited depth 2 here, the relations corresponding to

these facts could also be derived in SQL.

For arbitrary hierarchies of unlimited depth, however, it is usually not possible to derive the transitive supervisors in SQL.

(23)

In principle, all rules can be used in all iterations. But, a rule can only fire and derive facts, as soon as facts for the body atoms have been derived in previous iterations. From then on, the rule can always be used to derive the same facts. One of the purposes of efficient bottom–up evaluation is to avoid these

redundant derivations – especially in the presence of recursion.

Finally, in iteration 4, the 5 facts for transitive supervisors are derived. Iteration 5 does not derive any new facts.

(24)

Comparison with SQL

• Non–recursive DATALOG could be simulated in SQL by mapping the

rules to View statements – or to INSERT statements whose result is computed using a SELECT statement.

• Recursion brings higher expressivity to DATALOG.

• There are DATALOG extensions which allow for default negation and

aggregate operations as well.

• The rule–based approach of DATALOG supports modularization:

instead of one single, complex VIEW or SELECT statement in SQL, a set

(25)

Transitive closure cannot be formulated in standard SQL systems. Some

relational database systems, however, offer limited forms of recursion – cf. SQL:2003.

CREATE RECURSIVE VIEW supervisor(Emp, Sup) AS SELECT Emp, Sup

FROM direct_supervisor

UNION

SELECT D.Emp, S.Sup

FROM direct_supervisor D, supervisor S WHERE D.Sup = S.Emp

This assumes a table direct_supervisor with the attributes Emp

and Sup. Obviously, this SQL implementation is structurally equivalent to

the following shorter rule implementation (“;_{” means “or”).} supervisor(Emp, Sup)

:-( direct_supervisor:-(Emp, Sup)

(26)

8.1.2 The Deductive Database System DDBASE

The deductive database system DDBASE, which is part of the DDK,

combines PROLOG and DATALOG. It can process • relational databases and

• XML documents

within the same query using ODBC and FNQuery, respectively:

DDBASE

RDB XML

ODBC FNQuery

U

(27)

ODBC

The following PROLOG rule accesses a relational database – given by the

connection handle mysql – using the ODBC library of SWI PROLOG.

generate_html_table(Salary, table:Rows)

:-concat(’SELECT fname, minit, lname, salary \

FROM employee WHERE salary >= ’, Salary, Query), Types = [types([atom,atom,atom,integer])],

findall( Row,

( odbc_query(mysql, Query, row(F,M,L,S), Types), Row = tr:[td:[F], td:[M], td:[L], td:[S]] ), Rows ).

The query string Query is obtained by concatenating a partial select statement with the value for the salary. Types gives the types of the components of the result tuples.

(28)

The findall Statement

• The call odbc_query(mysql, Query, row(F,M,L,S), Types) returns the values F,M,L,S for the attributes fname, minit, lname, salary of the table employee.

• By backtracking, the findall statement produces a list Rows of PROLOG terms Row of the form tr:[td:[F], td:[M],

td:[L], td:[S]], which represent XML elements in FNQuery. • For a given Salary, the call generate_html_table(Salary,

table:Rows) produces a PROLOG term table:Rows, which represents the following HTML table in FNQuery.

(29)

The generated HTML table <table> <tr><th>Fname</th><th>Minit</th><th>Lname</th><th>Salary</th></tr> <tr><td>John</td><td>B</td><td>Smith</td><td>30000</td></tr> <tr><td>Franklin</td><td>T</td><td>Wong</td><td>40000</td></tr> <tr><td>Jennifer</td><td>S</td><td>Wallace</td><td>43000</td></tr> <tr><td>Ramesh</td><td>K</td><td>Narayan</td><td>38000</td></tr> <tr><td>James</td><td>E</td><td>Borg</td><td>55000</td></tr> </table>

(30)

By ODBC, we can make SQL tables available in DDBASE: employee(A,B,C,D,E,F,G,H,I,J) :-ddbase_call(odbc(mysql), company:employee(A,B,C,D,E,F,G,H,I,J)). works_on(A,B,C) :-ddbase_call(odbc(mysql), company:works_on(A,B,C)).

It is also possible to generate these rules in DDBASE, which avoids the

error–prone, repeated use of so many variable symbols.

The call ddbase_connect(odbc(mysql), M, Database:Table)

asserts a corresponding rule in a PROLOG module M.

The following two aggregation statements refer to the predicate

employee/10 provided by ODBC. In the second statement, the facts for

works_on/3 _{are derived using F}NQuery from an XML document

(31)

Aggregation Queries

The meta–predicate ddbase_aggregate/3 _{in the following query}

groups over the employees:

• for every employee – given by FNAME,MINIT,LNAME,SSN _–

• the corresponding list of all tuples [PNO,HOURS] _{is computed:}

?- ddbase_aggregate( [F, M, L, S, list([P,H])], ( works_on(S, P, H), employee(F, M, L, S, _,_,_,_,_,_) ), Tuples ), Attributes = [’FNAME’,’MINIT’,’LNAME’,’SSN’,’[PNO,HOURS]’], xpce_display_table(Attributes, Tuples).

(32)

Tuples = [

[’Ahmad’, ’V’, ’Jabbar’, ’888888888’, [[10, 35.5], [30, 5.0]]], [’Alicia’, ’J’, ’Zelaya’, ’777777777’, [[10, 10.0], [30, 30.0]]], ... ]

Thus, DDBASE can produce nested (NF2) tables, which is not possible

(33)

Aggregation on RDB and XML

For every Ssn in the table employee, the following query groups all corresponding entries from the document works_on.xml:

?- ddbase_aggregate( [Ssn, list([Pno, Hours])], ( employee(_,_,_, Ssn, _,_,_,_,_,_),

Row := doc(’works_on.xml’)/row::[@’ESSN’=Ssn], Pno := Row@’PNO’, Hours := Row@’HOURS’ ),

Tuples ).

Tuples = [

[’111111111’, [[’20’, ’0.0’]]],

[’222222222’, [[’2’, ’10.0’], [’3’, ’10.0’]]], ... ] The resulting list Tuples represents an NF2 relation.

A query optimizer could rearrange the Goal in ddbase_aggregate/3 by changing the order of the calls to the predicate employee/10 and the XML

(34)

In DDBASE, we can define arbitary binary aggregation predicates.

ddbase_aggregate/3 groups over all variable symbols that occur standalone in the result template [Ssn, list([Pno, Hours])]; in this case, this is Ssn.

• For every Ssn, the above call to ddbase_aggregate/3 computes the list Xs _{of all corresponding pairs} [Pno, Hours]_.

• Then, the call list(Xs, Pairs) _{simply passes} Xs _to Pairs_.

• Thus, ddbase_aggregate/3 _{produces a nested tuple} [Ssn,

Pairs] _{for every} Ssn_. Pairs _{is a list of lists; it represents a relation.}

(35)

The following statement aggregates the working hours of the employees of the departments:

?- ddbase_aggregate( [Dno, sum(Hours)],

( employee(_,_,_, Ssn, _,_,_,_,_, Dno),

Row := doc(’works_on.xml’)/row::[@’ESSN’=Ssn], H := Row@’HOURS’, atom_number(H, Hours) ),

Tuples ).

Tuples = [[1, 0.0], [4, 115.5], [5, 140.0]]

The attribute value H of the attribute ’Hours’ of Row is an atom that has to be converted to a number Hours.

The template [Dno, sum(Hours)] leads to a grouping on the department numbers. For every Dno, first the list Xs of all corresponding Hours is

computed, and then the sum is computed by the call sum(Xs, Sum); thus, we obtain a standard result tuple [Dno, Sum]_.

(36)

8.1.3 PROLOG as a Programming Language

In the following, we will present PROLOG implementations of well–known

algorithms for searching in graphs and binary search trees. The benefits of PROLOG are

• the elegant handling of data structures (lists, trees, XML), • (implicit) backtracking, and

• the compact representation of case distinctions in different rules.

The algorithms are typically recursive. Recursion can be formulated nicely due to the compact list access.

(37)

Graph Search

(38)

The graph is given by PROLOG facts for the predicates graph_arc/2 and graph_sink/1. Labyrinth: 6 -a _b c d e f g _h _i graph_arc(i, f). graph_arc(i, h). graph_arc(h, g). graph_arc(g, d). graph_arc(d, e). graph_arc(d, a). graph_arc(a, b). graph_arc(b, c). graph_sink(c).

(39)

Search for Simple Paths

The predicate graph_search/2 searches for a simple path from a given node to a sink of a graph:

% graph_search(+Node, ?Path) <-graph_search(X, Path)

:-graph_search(X, [X], Path).

Another predicate graph_search/3 _{with the same predicate symbol but a}

different arity is called.

Notation for arguments in the comment line:

(40)

Visited

Path =

[Y1=Y, _{. . .} ,Yn=Z]

X Y Z

- -

-• A call graph_search(X, Visited, Nodes) _{with a bound}

argument X, which is not a sink, and a list Visited of already visitied nodes

– uses an edge from X to not yet vistited successor node Y, and then – calculates a path Path from Y to a sink Z, which does not visit Y and

the nodes in Visited.

If no path from Y to a sink can be found, then another successor node of

X must be used (Backtracking).

• The result Nodes = [X|Path] is a simple path from X to a sink of the graph.

(41)

The predicate graph_search/3 is recursive, because of its second rule:

% graph_search(+Node, +Visited, ?Path) <-graph_search(X, _, [X])

:-graph_sink(X).

graph_search(X, Visited, [X|Path]) :-graph_edge(X, Y),

not(member(Y, Visited)),

write(user, ’->’), write(user, Y), graph_search(Y, [Y|Visited], Path).

Visited

Path =

[Y₁=Y, . . . ,Yn=Z]

X Y Z

- -

-Termination is ensured by the fact that already visited nodes cannot be visited

(42)

The following rule symmetrisises the predicate graph_arc/2:

graph_edge(X, Y) :-( graph_arc:-(X, Y)

; graph_arc(Y, X) ).

Thus, it is not necessary to explicitely list the inverse edges. E.g., from

graph_arc(i, f).

we obtain

graph_edge(i, f). graph_edge(f, i).

(43)

• The initial call graph_search(X, [X], Path) calculates a simple path from X to a sink of the graph.

– If X is a sink, then the first rule for graph_search/3 computes

Path as an empty list.

– Otherwise, the recursive, second rule choses a successor node Y using

graph_edge(X, Y)_{, and then it continues searching from there.}

• Further paths can be searched for by backtracking.

– Alternative successor nodes Y _{can be used in the second rule.}

– In the implementation above, we can continue searching beyond a sink by using the second rule instead of the fist one.

(44)

Implicit and Explicit Backtracking

In PROLOG, backtracking is used automatically (implicitly).

In a procedural language, backtracking has to be implemented explicitly. In a direct translation of the code above to a procedural environment, a call

graph_edge(X, Y) _{can only produce a single successor node} Y _of X _–

if there is no path from Y to a sink, then the computation fails. Moreover, at most one solution could be computed.

If we implement the graph search procedurally using explicit backtracking, then we get more lines of more complex code than in PROLOG.

(45)

Computation

• The predicate graph_search/2 use depth first search, and it calculates simple paths (without duplicate nodes).

• With the call graph_search(+Node, -Path), we can calculate all simple paths from Node to a sink (graph_sink) by backtracking:

?- graph_search(i, Path). ->f->h->g->d->e->a->b->c Path = [i, h, g, d, a, b, c] ?- graph_search(e, Path). ->d->a->b->c Path = [e, d, a, b, c] ; ->g->h->i->f No

(46)

• If we add another edge graph_arc(e, b) to the graph (i.e., we tear down the wall between e and b), then there appears another simple path

[e, b, c] from e to the sink c.

• All results can be calculated by backtracking and findall/3:

graph_arc(e, b). ?- findall( Path,

graph_search(e, Path), Paths ).

Paths = [[e, d, a, b, c], [e, b, c]] Yes

(47)

The Meta–Predicate findall/3

Finding of all solutions for a goal:

findall( X, goal(X), Xs )

The DDK allows for the following equivalent set notation:

Xs <= { X | goal(X) }

Further important meta–predicates are checklist/2 _and maplist/3 _for

lists, as well as the predicates for loops (control structures) from the library

(48)

Binary Search Trees

% search_in_tree(+Key, +Tree) <-search_in_tree(Key, Tree)

:-Tree = tree(Root, Lson, Rson), ( Key = Root

; Key < Root ->

search_in_tree(Key, Lson) ; Key > Root ->

search_in_tree(Key, Rson) ).

(49)

Term Representation of a Search Tree

tree(5,

tree(4, nil, nil), tree(9,

tree(6, nil, nil),

tree(10, nil, nil) ) )

5

10 6

9 4

(50)

% insert_into_tree(+Key, +Tree, ?New_Tree) <-insert_into_tree(Key, Tree, New_Tree)

:-Tree = tree(Root, Lson, Rson), ( Key = Root ->

New_Tree = Tree ; Key < Root ->

insert_into_tree(Key, Lson, L), New_Tree = tree(Root, L, Rson) ) ; K > Root ->

insert_into_tree(Key, Rson, R), New_Tree = tree(Root, Lson, R) ). insert_into_tree(Key, nil New_Tree)

(51)

Important Concepts

• Terms (for Data and Control Structures) and Unification • Backtracking

• SLDNF–Resolution

PROLOG allows for

• declarative programming, • compact programs, and

(52)

Data Structures, Operations, and Control Structures

• The restriction to a few basic data types and a single complex data type,

namely the terms, which is generic and subsumes all the other types, standardizes the data structures.

• There are no explicit type declarations.

• There exists a large collection of generic operations that are applicable to

terms – and thus to all data types.

• Frequently, meta–predicates are used.

• In addition to standard control structures, such as branching

(if–then–else), loops (for, while), and recursion, user–defined control structures can be built as meta–predicates.

(53)

Software Engineering Aspects

PROLOG supports abstraction and compact code, and thus stimulates

refactoring:

• The generic type of terms with generic operations supports abstraction

and code reuse.

• User–defined control structures allow for further abstraction.

• Unification, implicit backtracking, and abstaining from explicit type

declarations, result in very compact code and support rapid prototyping.

• Declarativity makes the code much more readable and thus extensible.

Switching from conventional programming languages to the logic

programming paradigm is difficult and usually requires a lot of training and effort.

(54)

Course on Deductive Databases

Topics:

• foundations and applications of PROLOG and DATALOG,

data modelling and programming;

• the deductive database system DDBASE; • efficient evaluation of DATALOG programs;

• further language constructs in the DDK (DISLOG Developers’ Kit):

– complex data structures,

– default negation and disjunction;

(55)

8.2 Semantic Web Databases

Knowledge representation in the Semantic Web (Web 2.0) is based on ontologies and logic.

Reasoning Tasks:

• search: query answering;

• knowledge engineering / modelling:

analysis of the structure of the ontologies for anomalies.

Knowledge engineering and reasoning in the Semantic Web is based on ontology editors and specialized databases. It can further be supported by deductive databases and logic programming techniques.

(56)

In the Semantic Web, it is possible to reason about

• the ontology / taxonomy (i.e., the schema) and • the instances.

This is called terminological or assertional (T–Box or A–Box) reasoning, respectively. This makes search in the Semantic Web more effective.

• In the following printer ontology, we could search for a printer from HP,

and the result could be a laser–jet printer from HP, since the system knows that hpLaserJetPrinter _{is a sub–class of} hpPrinter_.

• It can also be derived, that all laser–jet printers from HP are no laser

writers from Apple; in this case, this is very easy, since it is explicitely stored in the ontology.

Moreover, we will show in the following how to support knowledge

(57)

The Web Ontology Language (OWL)

In OWL, we can mix concepts from

• rdf (Resource Description Framework) for defining instances and

• rdfs (rdf Schema) for defining the schema

of an application. Moreover, tags with the namespace owl are allowed.

The Semantic Web Rule Language (SWRL) incorporates logic programming

rules into OWL ontologies.

There exist well–known, powerful tools for asking queries on and for

(58)

The Printer Ontology

product

printer

personalPrinter laserJetPrinter hpPrinter

hpLaserJetPrinter hpApplePrinter hpProduct appleLaserWriter {disjoint} ibmLaserPrinter

(59)

The Printer Ontology in OWL <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:xsd="http://www.w3.org/2001/XLMSchema#" xmlns="file:/protege/Ontologies/p.owl#"> <owl:Ontology rdf:about="">

<owl:VersionInfo> Printer Example, Version 1.3, 02.02.2013 </owl:VersionInfo> </owl:Ontology>

<owl:Class rdf:ID="printer"/>

<owl:Class rdf:ID="laserJetPrinter">

<rdfs:subClassOf rdf:resource="#printer"/> </owl:Class> ...

(60)

The following owl:Class element defines the class appleLaserWriter:

<owl:Class rdf:ID="appleLaserWriter"> <rdfs:comment>

Apple laser writers are laser jet printers </rdfs:comment>

<rdfs:subClassOf rdf:resource="#laserJetPrinter"/> <owl:disjointWith rdf:resource="#hpLaserJetPrinter"/> </owl:Class>

The rdfs:subClassOf sub–element states that appleLaserWriter is a

sub–class of laserJetPrinter. The owl:disjointWith sub–element

states that appleLaserWriter is disjoint from hpLaserJetPrinter.

Observe, that we refer using the attribute rdf:resource and a “#”, whereas

(61)

The following owl:Class element defines a class of printers from a joint

venture of HP and Apple:

<owl:Class rdf:ID="hpApplePrinter"> <rdfs:comment>

Printers from a joint venture of HP and Apple </rdfs:comment>

<rdfs:subClassOf rdf:resource="#hpLaserJetPrinter"/> <rdfs:subClassOf rdf:resource="#appleLaserWriter"/> </owl:Class>

The existence of such printers would contradict the disjointWith

restriction between the classes hpLaserJetPrinter and apperLaserWriter.

The emptiness of the class hpApplePrinter _{can be detected by reasoners}

(62)

Every laserJetPrinter is a printer, and every hpPrinter is an hpProduct: <owl:Class rdf:ID="printer"/> <owl:Class rdf:ID="laserJetPrinter"> <rdfs:subClassOf rdf:resource="#printer"/> </owl:Class> <owl:Class rdf:ID="hpProduct"/> <owl:Class rdf:ID="hpPrinter"> <rdfs:subClassOf rdf:resource="#hpProduct"/> </owl:Class>

(63)

Redundant subClassOf Relation

Since hpLaserJetPrinter is a sub–class of hpPrinter and hpPrinter

is a sub–class of hpProduct, it is redundant to explicitely state that hpLaserJetPrinter is a sub–class of hpProduct.

<owl:Class rdf:ID="hpLaserJetPrinter"> <rdfs:subClassOf rdf:resource="#laserJetPrinter"/> <rdfs:subClassOf rdf:resource="#hpPrinter"/> <rdfs:subClassOf rdf:resource="#hpProduct"/> <owl:disjointWith rdf:resource="#appleLaserWriter"/> </owl:Class>

This redundancy is not an error. We could simply consider it as an anomaly, that should be reported to the knowledge engineer.

(64)

Instances

Finally, we have some instances of some of the defined classes: <appleLaserWriter rdf:ID="1001"/>

As mentioned before, there cannot exist instances of the class

(65)

(66)

The ontology editor Protégé has some plugged in reasoners, such as

• FaCT++, • HermiT, and • Racer.

In the session that is shown in the screenshot above, the emptiness of the class

hpApplePrinter was be detected by the ontology reasoner FaCT++. It is inferred that the class hpApplePrinter is EquivalentTo the

empty class Nothing. By clicking the question mark, an explanation can be shown.

There are also databases for handling rdf _{data, so called triple stores, such}

as Sesame or Jena. They use extensions of SQL– most notably SPARQL – as

(67)

Declarative Queries in FNQuery

Complex XML data structures in PROLOG:

’owl:Class’:[’rdf:ID’:’appleLaserWriter’]:[ ’rdfs:comment’:[’Apple laser ...’], ’rdfs:subClassOf’:[ ’rdf:resource’:’#laserJetPrinter’]:[], ’owl:disjointWith’:[ ’rdf:resource’:’#hpLaserJetPrinter’]:[] ]

An XML element is represented as a term structure T:As:C, called FN–triple.

• T is the tag of the element,

• As is the list of the attribute/value pairs A:V of the element, and

(68)

FNSELECT

In an OWL knowledge base Owl, there exists an isa relation between two classes C1 and C2, if a subclassOf relation is stated explicitely, or

if C1 _{was defined as the interesection of} C2 _{and some other classes:} % isa(+Owl, ?C1, ?C2) <-isa(Owl, C1, C2) :-C := Owl/’owl::-Class’::[@’rdf:ID’=:-C1], ( R2 := C/’rdfs:subClassOf’@’rdf:resource’ ; R2 := C/’owl:intersectionOf’/’owl:Class’@’rdf:about’ ), owl_reference_to_id(R2, C2). % owl_reference_to_id(+Reference, ?Id) <-owl_reference_to_id(Reference, Id)

:-( concat:-(’#’, Id, Reference) ; Id = Reference ).

(69)

Disjointness of Classes % disjointWith(+Owl, ?C1, ?C2) <-disjointWith(Owl, C1, C2) :-R2 := Owl/’owl:Class’::[@’rdf:about’=R1] /’owl:disjointWith’@’rdf:resource’, owl_reference_to_id(R1, C1), owl_reference_to_id(R2, C2).

In the following, we often suppress the ontology argument Owl_.

Transitive Closure of isa

% subClassOf(?C1, ?C2) <-subClassOf(C1, C2)

:-isa(C1, C2).

subClassOf(C1, C2)

(70)

Anomalies in Ontologies Cycle ?- isa(C1, C2), subClassOf(C2, C1). C1 = personalPrinter, C2 = printer Partition Error ?- disjointWith(C1, C2), subClassOf(C, C1), subClassOf(C, C2). C = hpApplePrinter, C1 = hpLaserJetPrinter, C2 = appleLaserWriter

(71)

Incompleteness

?- isa(C1, C), isa(C2, C), isa(C3, C),

disjointWith(C1, C2), not(disjointWith(C2, C3)). C = laserJetPrinter,

C1 = hpLaserJetPrinter, C2 = appleLaserWriter, C3 = ibmLaserPrinter

The class C has three sub–classes C1, C2 and C3, from which only the two sub–classes C1 and C2 are declared as disjoint in the knowledge base.

The fact that C2 _and C3 _{are disjoint and that} C1 _and C3 _{are disjoint as well,}

possibly was forgotten by the knowledge engineer during the creation of the knowledge base.

(72)

Redundant subClassOf/instanceOf Relations % redundant_isa(?Chain) <-redundant_isa(C1->C2->C3) :-isa(C1, C2), subClassOf(C2, C3), isa(C1, C3). ?- redundant_isa(Chain).

Chain = hpLaserJetPrinter -> hpPrinter -> hpProduct

The sub–class relation between C1 and C3 can be derived by transitivity over the class C2_.

Here, isa(C1, C2), subClassOf(C2, C3), requires that this deduction is done over at least two levels.

(73)

Undefined Reference

During the development of an ontology in OWL, it is possible that we

reference a class that we have not yet defined. % undefined_reference(+Owl, ?Ref) <-undefined_reference(Owl, Ref) :-rdf_reference(Owl, Ref), not(owl_class(Owl, Ref)). rdf_reference(Owl, Ref) :-( R := Owl/descendant_or_self::’*’@’rdf:resource’ ; R := Owl/descendant_or_self::’*’@’rdf:about’ ), owl_reference_to_id(R, Ref). owl_class(Owl, Ref) :-Ref := Owl/’owl:Class’@’rdf:ID’.

If we load such an ontology into Protégé, then the ontology reasoners may produce wrong results, even for unrelated parts of the ontology.

(74)

8.3 Object–Oriented Databases

Application Domains

• engineering (CAD/CAM, CIM) • image and graphics databases • scientific applications

• geo–databases

• multimedia systems

(75)

Influences and concepts from other areas of computer science:

• programming languages:

abstract data typs and encapsulation completeness (w.r.t. expressivity)

• software engineering:

modularisation, code extensibility and reuse

• artificial intelligence:

concepts for knowledge representation, classification

• databases:

(76)

8.3.1 Complex Objects

Every object has a unique object identifier OID.

This value is invisible for the user. It is only used internally by the system to identify an object and to allow for references between different objects.

An object o is represented by a triple h i, c, v i: • i is the unique object identifier,

• c is a type constructor, • v is the value of o.

Type Constructors: atom, tuple, set, list, array

(77)

Given an object o = h i, c, v i.

• If c = atom, then v is an atomic value. • If c = tuple, then

v = h a₁ : i₁, a₂ : i₂, . . . , an : in i

is a tuple with attribute names aj and OID’s ij.

• If c = set, then

v = { i1, i2, . . . , in }

is a set of OID’s ij.

(78)

A complex object can be represented by a graph.

Two complex objects o₁ = h i₁, c₁, v₁ i and o₂ = h i₂, c₂, v₂ i are called • deeply equal, if c₁ = c₂ and v₁ = v₂.

• shallow equal, if their graphs are isomorphic and the atomic values in

the corresponding leaves are the same.

OODDL: Object–Oriented Data Definition Language

Nowadays, complex objects are frequently stored and managed using XML

(79)

Example (Complex Objects)

The complex objects o₁, o₂, and o₃ contain the atomic objects o₄, o₅, and o₆ as sub–objects: o₁ = h i1, tuple,h a1 : i4, a2 : i6 i i, o₂ = h i2, tuple,h a1 : i4, a2 : i6 i i, o₃ = h i₃, tuple,h a₁ : i₅, a₂ : i₆ i i, o₄ = h i4, atom,10 i, o₅ = h i₅, atom,10 i, o₆ = h i₆, atom,20 i.

(80)

tuple tuple

atom

atom atom atom

o o o o o o i i i i i i <a :10, a :20> <a :10, a :20> a a a a 1 1 1 1 1 1 2 2 2 2 3 3 4 4 6 6 5 5 6 6 : : : : : :

(81)

Nested Relations

Given a set U of attributes with domains dom(A), A ∈ U .

Formats R and domains dom(R) are defined recursively:

• R = (A1, . . . , An, R1, . . . , Rm) with Ai ∈ U , 1 ≤ i ≤ n, and formats

Rj, 1 ≤ j ≤ m, is a format with

dom(R) = dom(A₁) × . . . × dom(An) × 2dom(R1) × . . . × 2dom(R2).

• If m = 0, then R is a basic format.

A nested tuple over a format R is an element of dom(R).

A nested relation or NF2–Relation (Non–First–Normal–Form) over R is a subset of dom(R).

(82)

Example (NF2–Relation) formats

Children = (Cname, BDate, Sex) Graduations = (Type, Date)

Employees = (Id, Name, Address, Children, Graduations) NF2–Relation over the format Employees:

Employees

Id Name Address Children Graduations Cname Bdate Sex Type Date

Mary 120261 F

driv_lic 121255

100 Joe LA Peter 041465 M

phd_cs 021565

John 082270 M

200 Theo NY Mary 051578 F driv_lic 082686

(83)

8.3.2 Features of Object–Orientation

Encapsulation of Structure and Behaviour

In the relational data model there exist generic operatios for searching, inserting, deleting, and updating tuples, which can be applied to arbitrary relation schemas.

In object–oriented databases there are visible and hidden attributes.

• The visible attributes can be accessed by a declarative query language. • The hidden attributes are accessed by sending messages (message

passing) between the objects.

Each object type “has” integrity conditions, which are realised in the access operations.

(84)

Type and Class Hierarchies, Inheritance

A type is given by its

• type name, • attributes, and

• operations (methods).

(85)

A type hierarchy is an acyclic, binary relation of the set of types: Person Student Faculty Grad_Student Supertype Subtype R ? ? specialization: ↓ generalization: ↑

(86)

A sub–type inherits the functions of the super–type (inheritance). Additionally, the sub–type has its own functions.

→ multiple inheritance, selective inheritance

A class is a set of objects, which usually are of the same type. Usually, the set of all stored objects of each type forms a class.

(87)

The type system in OODBs can be extended at run time.

Frequently, the non–standard data type BLOB (binary large object) is used for

• raster pixel pictures and • long text strings.

(88)

Polymorphism (Operator Overloading)

The same operator name can have different implementations.

The implementation which is suitable for a certain object is determined at run time, when the type of the object is known (late binding).

E.g., the function “area” for calculating the area is implemented differently for different geometrical objects.

GEOMETRY_OBJEKT: Shape, Area, Centerpoint RECTANGLE subtype-of GEOMETRY_OBJECT

(Shape=’rectangle’): Width, Height TRIANGLE subtype-of GEOMETRY_OBJECT

(Shape=’triangle’): Side1, Side2, Angle CIRCLE subtype-of GEOMETRY_OBJECT

(89)

Multiple and Selective Inheritance

• Multiple inheritance occurs in a type hierarchy, if a type T is a sub–type

of several super–types T₁, . . . , Tn:

T₁ T₂ . . . Tn

T RU

Then T inherits the functions of T₁, . . . , Tn; this can lead to ambiguities.

• Selective inheritance occurs, if a type should only inherit some special

functions of one of its super–types T′_{. In this case, the undesired}

(90)

Versions and Configurations

Many database applications require the management of different versions

versions of complex objects:

• software projects • CAD applications.

A version graph shows the relations between the different versions of an object.

A configuration of a complex object is a composition of compatible versions for the sub–objects.

(91)

8.3.3 Examples: COMPANY and UNIVERSITY Database

In the following we will see the

• types, • classes,

• methods, and • some queries

(92)

The COMPANY Database as an OODB

DNAME DNUMBER MGR LOCATIONS EMPLOYEES PROJECTS tuple

set set set

tuple atom atom MANAGER atom atom atom atom tuple tuple tuple tuple tuple i i i i i i i i i i i i i i i i o o o o o o o o o o o 8 8 5 5 4 4 9 9 7 7 10 10 11 11 1 1 2 ₂ 3 ₃ 6 6 MANAGERSTARTDATE 15 16 17 v v v v v v 5 Research

Houston Bellaire Sugarland

v v 22-May-78 10 11 4 9 7 v 1 2 3 v 6 ₁₃ ₁₄ : : : : : : : : : : : :... :... :... :... :... 5

(93)

Complex Objects o1 = h i1,atom, Houstoni, o2 = h i2,atom, Bellairei, o3 = h i3,atom, Sugarlandi, o4 = h i4,atom, 5i, o5 = h i5,atom, Researchi, o6 = h i6,atom, 22-May-78i, o7 = h i7,set,{ i1, i2, i3 } i,

o8 = h i8,tuple,hDNAME : i5,DNUMBER : i4,MGR : i9,

LOCATIONS : i7,EMPLOYEES : i10,PROJECTS : i11i i,

o9 = h i9,tuple,hMANAGER : i12,MANAGERSTARTDATE : i6 i i,

o10 = h i10,set,{ i12, i13, i14 } i,

(94)

Data Types

define type Date:

tuple( year: integer, month: integer, day: integer );

define type Employee:

tuple( name: string, ssn: string,

birthdate: Date, sex: char, dept: Department );

define type Department:

tuple( dname: string, dnumber: integer,

mgr: tuple( manager: Employee, startdate: Date ), locations: set(string),

employees: set(Employee), projects: set(Project) );

(95)

Classes

define class Employee:

type tuple( name: string, ssn: string,

birthdate: Date, sex: char,

dept: Department );

operations

age(e: Employee): integer, create_new_emp: Employee,

(96)

define class Department:

type tuple ( dname: string, dnumber: integer,

mgr: tuple( manager: Employee, startdate: Date ), locations: set (string),

employees: set (Employee), projects: set (Project) );

operations

number_of_emps(d: Department): integer, create_new_dept: Department,

destroy_dept(d: Department): boolean,

add_emp(d: Department, e: Employee): boolean, remove_emp(d: Department, e: Employee): boolean;

(97)

define class DepartmentSet:

type set (Department);

operations

create_dept_set: DepartmentSet,

destroy_dept_set(ds: DepartmentSet): boolean,

add_dept(ds: DepartmentSet, d: Department): boolean, remove_dept(ds: DepartmentSet, d: Department): boolean;

persistent name AllDepartments: DepartmentSet;

/* AllDepartments is a persistent named object of type set(Department) */

. . .

d := create_new_dept;

/* creates new department object in the variable d */ b := add_dept(AllDepartments, d);

(98)

The UNIVERSITY Database as an OODB

Data Types

type Phone: tuple( area_code: integer, number: integer );

type Date: tuple( year: integer, month: integer, day: integer );

(99)

Classes

class Person

type tuple( ssn: string,

name: tuple( firstname: string, middlename: string, lastname: string ), address: tuple( number: integer, street: string, apt_no: string,

city: string, state: string, zipcode: string ), birthdate: Date,

sex: character );

method age: integer

(100)

class Student inherit Person

type tuple(

class: string, majors_in: Department, minors_in: Department, registered_in: set(Section),

transcript: set ( tuple(

grade: character, ngrade: real, section: Section ) ) );

method grade_point_average: real, change_class: boolean, change_major(new_major: Department): boolean;

end

class Grad_Student inherit Student

type tuple(

degrees: set ( tuple ( college: string, degree: string, year: integer ) ), advisor: Faculty );

(101)

class Faculty inherit Person

type tuple(

salary: real, rank: string, foffice: string, fphone: Phone, belongs_to: set(Department),

grants: set(Grants), advises: set(Student) ),

method promote_faculty, give_raise(percent: real),

end

class Department

type tuple(

dname: string, office: string, dphone: Phone, members: set(Faculty), majors: set(Student), chairperson: Faculty, courses: set(Course) ),

method add_major(s: Student), remove_major(s: Student):boolean

(102)

class Section

type tuple(

sec_num: integer, qtr: Quartar, year: Year,

students: set ( tuple( stud: Student, grade: character ) ), course: Course,

teacher: Instructor ),

method change_grade(s: Student, g: character);

end

class Course

type tuple(

cname: string, cnumber: string, cdescription: string, sections: set(Section),

offering_dept: Department );

(103)

Methods

method body age: integer in class Person

{

int a; Date d; d=today();

a=d->year - self->birthdate->year;

if ((d->month < self->birthdate->month) ||

(d->_month == self->_birthdate->_{month) &&}

(d->_{day < self}->_birthdate->_day)) --_a;

return a;

(104)

method body grade_point_average: real in class Student { float sum=0.0; int count=0; struct { char gr; float ngrade; o2_Section sec; } t;

for (t in self->transcript) {

/* increment sum by ngrade, count by 1 */ sum += t->ngrade; ++count;

}

return sum/count;

(105)

method body

change_major (new_major: Department): boolean in class Student

{

if (self->majors_in->remove_major(self)) {

return 0; } else { new_major->_{add_major(self);} self->_{majors_in=new_major;} return 1; } }

(106)

method body

remove_major(s: Student): boolean in class Department

{

if (s in self->majors) {

/* –= apply set difference to remove object s from set of majors */ self->majors –= set(s);

return 1;

}

else return 0;

(107)

method body

add_major(s: Student) in class Department

{

/* += apply set union to add object s to set of majors */ self->majors += set(s);

}

/* a persistent root to hold all persistent Person objects */

name All_Persons: set(Person);

/* a persistent root to hold a single Person object */

(108)

run body {

/* create a new Person object p */ o2 Person p = new Person;

*p = tuple (

ssn: ”222222222”, name: tuple(

firstname: ”Franklin”, middlename: ”T”, lastname: ”Wong” ), address: tuple( number: 638, street: ”Voss Road”,

city: ”Houston”, state: ”Texas”, zipcode: ”77079” ), birthdate: tuple( year: 1945, month: 12, day: 8 ),

sex: M );

/* p becomes persistent by attaching to persistent root */ All_Persons += set(p);

(109)

/* now put values in persitent named object John_Smith */ John_Smith->ssn=”444444444”,

John_Smith->name: tuple(

firstname: ”John”, middlename: ”B”, lastname: ”Smith”),

John_Smith->address: tuple( number: 731, street: ”Fondren Road”, city: ”Houston”, state: ”Texas”, zipcode: ”77036” ),

John_Smith->birthdate: tuple( year: 1955, month: 1, day: 9 ), John_Smith->sex:M;

(110)

Queries

select tuple (

fname: s.name.firstname, lname: s.name.lastname ) from s in Student

where s.majors_in.dname = ”Computer Science”

select tuple(

fname: s.name.firstname, lname: s.name.lastname, transcript: select tuple(

cname: sc.section.course.cname, sec_no: sc.section.sec_num, quarter: sc.section.qtr, year: sc.section.year, grade: sc.grade ) from sc in sec )

from s in Student, sec in s.transcript