8
Extended Database Concepts
DOOD
• deductive and
• object–oriented databases
DOOD databases offer advanced features for
• data modelling and • database programming
8.1
Deductive Databases and Logic Programming
The ease of handling the data structure of terms and the powerful built–in
control structure of backtracking are features that distinguish PROLOG from
other programming languages.
PROLOG is very well–suited for embedded database programming.
In the database context, frequently a restricted version is used, which is called DATALOG – the basis of deductive databases.
• PROLOG and DATALOG are declarative languages; they can access
databases and XML documents.
• Relations and complex objects (like, e.g., XML documents) can be
represented as term structures.
• With the help of declarative rules, we can represent integrity constraints
Basic Syntax of PROLOG
Constant Symbol: a, 10, ’Smith, John B.’
Variable Symbol: X, Lname (starts with a capital letter) Term: f(t1, . . . , tn),
with function symbol f and terms ti
a, X (constant and variable symbols are terms), f(g(a,b),X,10), a*(b+c) (complex terms),
[LNAME, . . . , DNO] (this is a list)
Predicate Symbol: employee, e_works_on_p, transitive_closure Atom: p(t1, . . . , tn),
Extra–Logical Features
• assert and retract can update the internal PROLOG database
• meta–predicates like maplist, findall, and ddbase_aggregate
form control structures
• the cut “!” freezes variable bindings and cuts off alternative computations paths
Terms in Infix / Prefix Form
It is possible to define binary, infix functors ⊙ in PROLOG. Then, the binary
term ⊙(t1, t2) can be represented in infix form as t1 ⊙ t2.
• The infix term 1955-01-09 representing a date has the prefix form
-(-(1955,01),09).
• The infix term a*(b+c) representing an arithmetic expression has the prefix form *(a,+(b,c)).
The operator trees for the terms above are given in the following:
-1955 -01 09 R R * b + c a R R
When an infix functor ⊙ is used multiple times in a term a ⊙ b ⊙ c, then there
are rules in PROLOG that determine whether a and b or b and c are joined first
in the prefix form.
• The infix term 1955-01-09 representing a date has the prefix form
-(-(1955,01),09).
• The infix term T:As:Es representing an XML element has the prefix
form :(T,:(As,Es)).
The operator trees for the terms above are given in the following:
-1955 -01 09 R R : As : Es T R R
Term Representation for Lists
A non–empty list is represented as a binary term structure .(X, Xs), where
• the head X is the first element and
• the tail Xs represents the rest of the list.
The list functor ”.” is binary, and the empty list is represented by ”[]”.
Additionally, PROLOG supports the compact list notation [X1,X2,...,Xn].
It is equivalent to [X1|[X2,...,Xn]].
[c] = .(c, [])
[a, b, c] = .(a, .(b, .(c, [])))
Term Representation for XML
An XML element
<table name="employee">
<attribute name="FNAME"/> </table>
can be represented by a complex term in field notation (FN):
table:[name:employee]:[
attribute:[name:’FNAME’]:[] ].
This infix form is using the binary functor ”:”.
Facts, Rules, and Goals
Literal: atom A oder negated atom not(A)
Fact: A
with atom A; e.g.,
employee(’John’, ’B’, ’Smith’, ...) Rule: A |{z} head :- B1, . . . , Bm | {z } body
with atom A and literals Bi, example later
Goal: :- B1, . . . , Bm
with literals Bi
A set of facts for the same predicate symbol corresponds to a relation in databases. Rules generalize views. Goals are used for expressing queries.
8.1.1 PROLOG as a Database Language
• PROLOG can be used for representing tables from relational databases.
The tuples of a table become PROLOG facts with the same predicate
symbol – usually, the table name is used.
• The data dictionary of a relational database can also be represented
using PROLOG facts. This can be done using PROLOG terms that
correspond to an XML representation of the data dictionary.
• Queries and integrity constraints can be represented as PROLOG rules.
Conjunctive queries are posed in the form of PROLOG goals, which are
then evaluated using the PROLOG rules.
• DATALOG is a restricted version of PROLOG without function symbols
and extra–logical features. DATALOG ensures termination and the
Database Tables
in MySQL:
EMPLOYEE
FNAME MINIT LNAME SSN BDATE ADDRESS SEX SALARY SUPERSSN DNO John B Smith 444444444 1955-01-09 731 Fondren, Houston, TX M 30000 222222222 5 Franklin T Wong 222222222 1945-12-08 638 Voss, Houston, TX M 40000 111111111 5 Alicia J Zelaya 777777777 1958-07-19 3321 Castle, Spring, TX F 25000 333333333 4 Jennifer S Wallace 333333333 1931-06-20 291 Berry, Bellaire, TX F 43000 111111111 4 Ramesh K Narayan 555555555 1952-09-15 975 Fire Oak, Humble, TX M 38000 222222222 5 Joyce A English 666666666 1962-07-31 5631 Rice, Houston, TX F 25000 222222222 5 Ahmad V Jabbar 888888888 1959-03-29 980 Dallas, Houston, TX M 25000 333333333 4 James E Borg 111111111 1927-11-10 450 Stone, Houston, TX M 55000 NULL 1
A database table p can be represented by a set of PROLOG facts, namely one
WORKS_ON
ESSN PNO HOURS
111111111 20 NULL 222222222 2 10.0 222222222 3 10.0 333333333 20 15.0 333333333 30 20.0 444444444 1 32.5 444444444 2 7.5 555555555 3 40.0 666666666 1 20.0 666666666 2 20.0 777777777 10 10.0 777777777 30 30.0 888888888 10 35.5 888888888 30 5.0 PROJECT
PNAME PNUMBER PLOCATION DNUM
ProductX 1 Bellaire 5 ProductY 2 Sugarland 5 ProductZ 3 Houston 5 Computerization 10 Stafford 4 Reorganization 20 Houston 1 Newbenefits 30 Stafford 4
Database Tables in PROLOG: employee(’John’, ’B’, ’Smith’, 444444444, 1955-01-09, ’731 Fondren, Houston, TX’, ’M’, 30000, 222222222, 5). employee(’Franklin’, ’T’, ’Wong’, ...). ... works_on(444444444, 1, 32.5). works_on(444444444, 2, 7.5). ... department(’Research’, 5, 222222222, 1978-05-22). ... project(’ProductX’, 1, ’Bellaire’, 5). ...
We do not quote the date values. Then, they are terms, and we can access their components more conveniently without string parsing.
Views and Queries vs. Rules and Goals
• SQL VIEW:
CREATE VIEW E_WORKS_ON_P AS
SELECT FNAME, LNAME, PNAME, HOURS FROM EMPLOYEE, WORKS_ON, PROJECT WHERE EMPLOYEE.SSN = WORKS_ON.ESSN
AND PROJECT.PNUMBER = WORKS_ON.PNO
• PROLOG rule:
e_works_on_p(FNAME, LNAME, PNAME, HOURS) :-works_on(SSN, P, HOURS),
employee(FNAME, _, LNAME, SSN, _,_,_,_,_,_), project(PNAME, P, _,_).
• SQL SELECT:
SELECT *
FROM E_WORKS_ON_P
The SELECT statement calls the view. • PROLOG goal:
?- e_works_on_p(FNAME, LNAME, PNAME, HOURS).
The query is submitted to the PROLOG interpreter as a goal.
Integrity Constraints in PROLOG
• Primary Key Constraint for Employee:
primary_key_violation(employee, X, Y) :-X = employee(_,_,_, SSN, _,_,_,_,_,_), Y = employee(_,_,_, SSN, _,_,_,_,_,_), call(X), call(Y), X \= Y.
• Foreign Key Constraint for Employee:
foreign_key_violation(employee(’DNO’), X) :-X = employee(_,_,_,_,_,_,_,_,_, DNO),
call(X),
not(department(_, DNO, _,_)).
In DDBASE, the primary and foreign key contraints of a relational database
Argument Positions vs. Field Notation (FN)
• Like in other programming languages, the arguments ti of an atom
p(t1, . . . , tn) are handed over by position in PROLOG. E.g., in
works_on(S, P, H),
the first position t1 = S is the SSN of an employee who has worked on the project with the number t2 = P for t3 = H hours.
• In the database context, we could use a meta–interpreter for accessing
arguments in field notation – in a more abstract way – by their
corresponding attribute name. Then, according to the database schema,
works_on(’PNO’:P, ’ESSN’:S)
means that the employee S has worked on the project P. The order of the
Semantic Constraints in Field Notation (FN)
• No employee should earn more than his manager: trigger(salary, X, Y)
:-employee(’SSN’:X, ’SALARY’:S1, ’SUPERSSN’:Y), employee(’SSN’:Y, ’SALARY’:S2),
S1 > S2.
• Which employee works on a foreign project ?
trigger(employee_works_on_foreign_project, E, P) :-works_on(’ESSN’:E, ’PNO’:P),
employee(’SSN’:E, ’DNO’:D1),
project(’PNUMBER’:P, ’DNUM’:D2), D1 \= D2.
FN abstracts from argument positions: employee(’SSN’:E, ’DNO’:D1)
Bottom–Up Evaluation of DATALOG
• The set of all given facts for a predicate corresponds to a relation. • A rule without function symbols corresponds to a VIEW statement
defining a relation for the head predicate.
• The relations for the body predicates are derived using rules themselves.
Thus, it can happen that a rule transitively helps to derive tuples for one of its body predicates (recursion).
E.g., the second rule for supervisor is directly recursive.
• The bottom–up evaluation iteratively enlarges the relations for the
predicates by repeatedly evaluating all rules until a fixpoint is reached. Thus, e.g., all transitive supervisors can be derived, which is usually not possible using SQL systems.
Recursion and Transitive Closure 444444444 555555555 666666666 777777777 888888888 222222222 333333333 111111111 ? j R j
The following recursive DATALOG rule set derives the transitive supervisor
relation on the social security numbers: supervisor(SSN_1, SSN_2) :-direct_supervisor(SSN_1, SSN_2). supervisor(SSN_1, SSN_2) :-direct_supervisor(SSN_1, SSN_3), supervisor(SSN_3, SSN_2). direct_supervisor(SSN_1, SSN_2) :-employee(_,_,_, SSN_2, _,_,_,_, SSN_1, _). SSN_1 SSN_3 SSN_2 direct s. supervisor ? ?
The first iteration derives the facts for direct_supervisor from the
facts for employee:
direct_supervisor(111111111, 222222222). direct_supervisor(111111111, 333333333). direct_supervisor(222222222, 444444444). direct_supervisor(222222222, 555555555). direct_supervisor(222222222, 666666666). direct_supervisor(333333333, 777777777). direct_supervisor(333333333, 888888888).
The second iteration translates these facts to the corresponding 7 facts for
supervisor.
supervisor(111111111, 222222222). ...
The third iteration derives the 5 new facts that 111111111 is the transitive (indirect) supervisor of the employees 444444444 to 888888888:
supervisor(111111111, 444444444). supervisor(111111111, 555555555). supervisor(111111111, 666666666). supervisor(111111111, 777777777). supervisor(111111111, 888888888).
Since the hierarchy is of limited depth 2 here, the relations corresponding to
these facts could also be derived in SQL.
For arbitrary hierarchies of unlimited depth, however, it is usually not possible to derive the transitive supervisors in SQL.
In principle, all rules can be used in all iterations. But, a rule can only fire and derive facts, as soon as facts for the body atoms have been derived in previous iterations. From then on, the rule can always be used to derive the same facts. One of the purposes of efficient bottom–up evaluation is to avoid these
redundant derivations – especially in the presence of recursion.
Finally, in iteration 4, the 5 facts for transitive supervisors are derived. Iteration 5 does not derive any new facts.
Comparison with SQL
• Non–recursive DATALOG could be simulated in SQL by mapping the
rules to View statements – or to INSERT statements whose result is computed using a SELECT statement.
• Recursion brings higher expressivity to DATALOG.
• There are DATALOG extensions which allow for default negation and
aggregate operations as well.
• The rule–based approach of DATALOG supports modularization:
instead of one single, complex VIEW or SELECT statement in SQL, a set
Transitive closure cannot be formulated in standard SQL systems. Some
relational database systems, however, offer limited forms of recursion – cf. SQL:2003.
CREATE RECURSIVE VIEW supervisor(Emp, Sup) AS SELECT Emp, Sup
FROM direct_supervisor
UNION
SELECT D.Emp, S.Sup
FROM direct_supervisor D, supervisor S WHERE D.Sup = S.Emp
This assumes a table direct_supervisor with the attributes Emp
and Sup. Obviously, this SQL implementation is structurally equivalent to
the following shorter rule implementation (“;” means “or”). supervisor(Emp, Sup)
:-( direct_supervisor:-(Emp, Sup)
8.1.2 The Deductive Database System DDBASE
The deductive database system DDBASE, which is part of the DDK,
combines PROLOG and DATALOG. It can process • relational databases and
• XML documents
within the same query using ODBC and FNQuery, respectively:
DDBASE
RDB XML
ODBC FNQuery
U
ODBC
The following PROLOG rule accesses a relational database – given by the
connection handle mysql – using the ODBC library of SWI PROLOG.
generate_html_table(Salary, table:Rows)
:-concat(’SELECT fname, minit, lname, salary \
FROM employee WHERE salary >= ’, Salary, Query), Types = [types([atom,atom,atom,integer])],
findall( Row,
( odbc_query(mysql, Query, row(F,M,L,S), Types), Row = tr:[td:[F], td:[M], td:[L], td:[S]] ), Rows ).
The query string Query is obtained by concatenating a partial select statement with the value for the salary. Types gives the types of the components of the result tuples.
The findall Statement
• The call odbc_query(mysql, Query, row(F,M,L,S), Types) returns the values F,M,L,S for the attributes fname, minit, lname, salary of the table employee.
• By backtracking, the findall statement produces a list Rows of PROLOG terms Row of the form tr:[td:[F], td:[M],
td:[L], td:[S]], which represent XML elements in FNQuery. • For a given Salary, the call generate_html_table(Salary,
table:Rows) produces a PROLOG term table:Rows, which represents the following HTML table in FNQuery.
The generated HTML table <table> <tr><th>Fname</th><th>Minit</th><th>Lname</th><th>Salary</th></tr> <tr><td>John</td><td>B</td><td>Smith</td><td>30000</td></tr> <tr><td>Franklin</td><td>T</td><td>Wong</td><td>40000</td></tr> <tr><td>Jennifer</td><td>S</td><td>Wallace</td><td>43000</td></tr> <tr><td>Ramesh</td><td>K</td><td>Narayan</td><td>38000</td></tr> <tr><td>James</td><td>E</td><td>Borg</td><td>55000</td></tr> </table>
By ODBC, we can make SQL tables available in DDBASE: employee(A,B,C,D,E,F,G,H,I,J) :-ddbase_call(odbc(mysql), company:employee(A,B,C,D,E,F,G,H,I,J)). works_on(A,B,C) :-ddbase_call(odbc(mysql), company:works_on(A,B,C)).
It is also possible to generate these rules in DDBASE, which avoids the
error–prone, repeated use of so many variable symbols.
The call ddbase_connect(odbc(mysql), M, Database:Table)
asserts a corresponding rule in a PROLOG module M.
The following two aggregation statements refer to the predicate
employee/10 provided by ODBC. In the second statement, the facts for
works_on/3 are derived using FNQuery from an XML document
Aggregation Queries
The meta–predicate ddbase_aggregate/3 in the following query
groups over the employees:
• for every employee – given by FNAME,MINIT,LNAME,SSN –
• the corresponding list of all tuples [PNO,HOURS] is computed:
?- ddbase_aggregate( [F, M, L, S, list([P,H])], ( works_on(S, P, H), employee(F, M, L, S, _,_,_,_,_,_) ), Tuples ), Attributes = [’FNAME’,’MINIT’,’LNAME’,’SSN’,’[PNO,HOURS]’], xpce_display_table(Attributes, Tuples).
Tuples = [
[’Ahmad’, ’V’, ’Jabbar’, ’888888888’, [[10, 35.5], [30, 5.0]]], [’Alicia’, ’J’, ’Zelaya’, ’777777777’, [[10, 10.0], [30, 30.0]]], ... ]
Thus, DDBASE can produce nested (NF2) tables, which is not possible
Aggregation on RDB and XML
For every Ssn in the table employee, the following query groups all corresponding entries from the document works_on.xml:
?- ddbase_aggregate( [Ssn, list([Pno, Hours])], ( employee(_,_,_, Ssn, _,_,_,_,_,_),
Row := doc(’works_on.xml’)/row::[@’ESSN’=Ssn], Pno := Row@’PNO’, Hours := Row@’HOURS’ ),
Tuples ).
Tuples = [
[’111111111’, [[’20’, ’0.0’]]],
[’222222222’, [[’2’, ’10.0’], [’3’, ’10.0’]]], ... ] The resulting list Tuples represents an NF2 relation.
A query optimizer could rearrange the Goal in ddbase_aggregate/3 by changing the order of the calls to the predicate employee/10 and the XML
In DDBASE, we can define arbitary binary aggregation predicates.
ddbase_aggregate/3 groups over all variable symbols that occur standalone in the result template [Ssn, list([Pno, Hours])]; in this case, this is Ssn.
• For every Ssn, the above call to ddbase_aggregate/3 computes the list Xs of all corresponding pairs [Pno, Hours].
• Then, the call list(Xs, Pairs) simply passes Xs to Pairs.
• Thus, ddbase_aggregate/3 produces a nested tuple [Ssn,
Pairs] for every Ssn. Pairs is a list of lists; it represents a relation.
The following statement aggregates the working hours of the employees of the departments:
?- ddbase_aggregate( [Dno, sum(Hours)],
( employee(_,_,_, Ssn, _,_,_,_,_, Dno),
Row := doc(’works_on.xml’)/row::[@’ESSN’=Ssn], H := Row@’HOURS’, atom_number(H, Hours) ),
Tuples ).
Tuples = [[1, 0.0], [4, 115.5], [5, 140.0]]
The attribute value H of the attribute ’Hours’ of Row is an atom that has to be converted to a number Hours.
The template [Dno, sum(Hours)] leads to a grouping on the department numbers. For every Dno, first the list Xs of all corresponding Hours is
computed, and then the sum is computed by the call sum(Xs, Sum); thus, we obtain a standard result tuple [Dno, Sum].
8.1.3 PROLOG as a Programming Language
In the following, we will present PROLOG implementations of well–known
algorithms for searching in graphs and binary search trees. The benefits of PROLOG are
• the elegant handling of data structures (lists, trees, XML), • (implicit) backtracking, and
• the compact representation of case distinctions in different rules.
The algorithms are typically recursive. Recursion can be formulated nicely due to the compact list access.
Graph Search
The graph is given by PROLOG facts for the predicates graph_arc/2 and graph_sink/1. Labyrinth: 6 -a b c d e f g h i graph_arc(i, f). graph_arc(i, h). graph_arc(h, g). graph_arc(g, d). graph_arc(d, e). graph_arc(d, a). graph_arc(a, b). graph_arc(b, c). graph_sink(c).
Search for Simple Paths
The predicate graph_search/2 searches for a simple path from a given node to a sink of a graph:
% graph_search(+Node, ?Path) <-graph_search(X, Path)
:-graph_search(X, [X], Path).
Another predicate graph_search/3 with the same predicate symbol but a
different arity is called.
Notation for arguments in the comment line:
Visited
Path =
[Y1=Y, . . . ,Yn=Z]
X Y Z
- -
-• A call graph_search(X, Visited, Nodes) with a bound
argument X, which is not a sink, and a list Visited of already visitied nodes
– uses an edge from X to not yet vistited successor node Y, and then – calculates a path Path from Y to a sink Z, which does not visit Y and
the nodes in Visited.
If no path from Y to a sink can be found, then another successor node of
X must be used (Backtracking).
• The result Nodes = [X|Path] is a simple path from X to a sink of the graph.
The predicate graph_search/3 is recursive, because of its second rule:
% graph_search(+Node, +Visited, ?Path) <-graph_search(X, _, [X])
:-graph_sink(X).
graph_search(X, Visited, [X|Path]) :-graph_edge(X, Y),
not(member(Y, Visited)),
write(user, ’->’), write(user, Y), graph_search(Y, [Y|Visited], Path).
Visited
Path =
[Y1=Y, . . . ,Yn=Z]
X Y Z
- -
-Termination is ensured by the fact that already visited nodes cannot be visited
The following rule symmetrisises the predicate graph_arc/2:
graph_edge(X, Y) :-( graph_arc:-(X, Y)
; graph_arc(Y, X) ).
Thus, it is not necessary to explicitely list the inverse edges. E.g., from
graph_arc(i, f).
we obtain
graph_edge(i, f). graph_edge(f, i).
• The initial call graph_search(X, [X], Path) calculates a simple path from X to a sink of the graph.
– If X is a sink, then the first rule for graph_search/3 computes
Path as an empty list.
– Otherwise, the recursive, second rule choses a successor node Y using
graph_edge(X, Y), and then it continues searching from there.
• Further paths can be searched for by backtracking.
– Alternative successor nodes Y can be used in the second rule.
– In the implementation above, we can continue searching beyond a sink by using the second rule instead of the fist one.
Implicit and Explicit Backtracking
In PROLOG, backtracking is used automatically (implicitly).
In a procedural language, backtracking has to be implemented explicitly. In a direct translation of the code above to a procedural environment, a call
graph_edge(X, Y) can only produce a single successor node Y of X –
if there is no path from Y to a sink, then the computation fails. Moreover, at most one solution could be computed.
If we implement the graph search procedurally using explicit backtracking, then we get more lines of more complex code than in PROLOG.
Computation
• The predicate graph_search/2 use depth first search, and it calculates simple paths (without duplicate nodes).
• With the call graph_search(+Node, -Path), we can calculate all simple paths from Node to a sink (graph_sink) by backtracking:
?- graph_search(i, Path). ->f->h->g->d->e->a->b->c Path = [i, h, g, d, a, b, c] ?- graph_search(e, Path). ->d->a->b->c Path = [e, d, a, b, c] ; ->g->h->i->f No
• If we add another edge graph_arc(e, b) to the graph (i.e., we tear down the wall between e and b), then there appears another simple path
[e, b, c] from e to the sink c.
• All results can be calculated by backtracking and findall/3:
graph_arc(e, b). ?- findall( Path,
graph_search(e, Path), Paths ).
Paths = [[e, d, a, b, c], [e, b, c]] Yes
The Meta–Predicate findall/3
Finding of all solutions for a goal:
findall( X, goal(X), Xs )
The DDK allows for the following equivalent set notation:
Xs <= { X | goal(X) }
Further important meta–predicates are checklist/2 and maplist/3 for
lists, as well as the predicates for loops (control structures) from the library
Binary Search Trees
% search_in_tree(+Key, +Tree) <-search_in_tree(Key, Tree)
:-Tree = tree(Root, Lson, Rson), ( Key = Root
; Key < Root ->
search_in_tree(Key, Lson) ; Key > Root ->
search_in_tree(Key, Rson) ).
Term Representation of a Search Tree
tree(5,
tree(4, nil, nil), tree(9,
tree(6, nil, nil),
tree(10, nil, nil) ) )
5
10 6
9 4
% insert_into_tree(+Key, +Tree, ?New_Tree) <-insert_into_tree(Key, Tree, New_Tree)
:-Tree = tree(Root, Lson, Rson), ( Key = Root ->
New_Tree = Tree ; Key < Root ->
insert_into_tree(Key, Lson, L), New_Tree = tree(Root, L, Rson) ) ; K > Root ->
insert_into_tree(Key, Rson, R), New_Tree = tree(Root, Lson, R) ). insert_into_tree(Key, nil New_Tree)
Important Concepts
• Terms (for Data and Control Structures) and Unification • Backtracking
• SLDNF–Resolution
PROLOG allows for
• declarative programming, • compact programs, and
Data Structures, Operations, and Control Structures
• The restriction to a few basic data types and a single complex data type,
namely the terms, which is generic and subsumes all the other types, standardizes the data structures.
• There are no explicit type declarations.
• There exists a large collection of generic operations that are applicable to
terms – and thus to all data types.
• Frequently, meta–predicates are used.
• In addition to standard control structures, such as branching
(if–then–else), loops (for, while), and recursion, user–defined control structures can be built as meta–predicates.
Software Engineering Aspects
PROLOG supports abstraction and compact code, and thus stimulates
refactoring:
• The generic type of terms with generic operations supports abstraction
and code reuse.
• User–defined control structures allow for further abstraction.
• Unification, implicit backtracking, and abstaining from explicit type
declarations, result in very compact code and support rapid prototyping.
• Declarativity makes the code much more readable and thus extensible.
Switching from conventional programming languages to the logic
programming paradigm is difficult and usually requires a lot of training and effort.
Course on Deductive Databases
Topics:
• foundations and applications of PROLOG and DATALOG,
data modelling and programming;
• the deductive database system DDBASE; • efficient evaluation of DATALOG programs;
• further language constructs in the DDK (DISLOG Developers’ Kit):
– complex data structures,
– default negation and disjunction;
8.2
Semantic Web Databases
Knowledge representation in the Semantic Web (Web 2.0) is based on ontologies and logic.
Reasoning Tasks:
• search: query answering;
• knowledge engineering / modelling:
analysis of the structure of the ontologies for anomalies.
Knowledge engineering and reasoning in the Semantic Web is based on ontology editors and specialized databases. It can further be supported by deductive databases and logic programming techniques.
In the Semantic Web, it is possible to reason about
• the ontology / taxonomy (i.e., the schema) and • the instances.
This is called terminological or assertional (T–Box or A–Box) reasoning, respectively. This makes search in the Semantic Web more effective.
• In the following printer ontology, we could search for a printer from HP,
and the result could be a laser–jet printer from HP, since the system knows that hpLaserJetPrinter is a sub–class of hpPrinter.
• It can also be derived, that all laser–jet printers from HP are no laser
writers from Apple; in this case, this is very easy, since it is explicitely stored in the ontology.
Moreover, we will show in the following how to support knowledge
The Web Ontology Language (OWL)
In OWL, we can mix concepts from
• rdf (Resource Description Framework) for defining instances and
• rdfs (rdf Schema) for defining the schema
of an application. Moreover, tags with the namespace owl are allowed.
The Semantic Web Rule Language (SWRL) incorporates logic programming
rules into OWL ontologies.
There exist well–known, powerful tools for asking queries on and for
The Printer Ontology
product
printer
personalPrinter laserJetPrinter hpPrinter
hpLaserJetPrinter hpApplePrinter hpProduct appleLaserWriter {disjoint} ibmLaserPrinter
The Printer Ontology in OWL <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:xsd="http://www.w3.org/2001/XLMSchema#" xmlns="file:/protege/Ontologies/p.owl#"> <owl:Ontology rdf:about="">
<owl:VersionInfo> Printer Example, Version 1.3, 02.02.2013 </owl:VersionInfo> </owl:Ontology>
<owl:Class rdf:ID="printer"/>
<owl:Class rdf:ID="laserJetPrinter">
<rdfs:subClassOf rdf:resource="#printer"/> </owl:Class> ...
The following owl:Class element defines the class appleLaserWriter:
<owl:Class rdf:ID="appleLaserWriter"> <rdfs:comment>
Apple laser writers are laser jet printers </rdfs:comment>
<rdfs:subClassOf rdf:resource="#laserJetPrinter"/> <owl:disjointWith rdf:resource="#hpLaserJetPrinter"/> </owl:Class>
The rdfs:subClassOf sub–element states that appleLaserWriter is a
sub–class of laserJetPrinter. The owl:disjointWith sub–element
states that appleLaserWriter is disjoint from hpLaserJetPrinter.
Observe, that we refer using the attribute rdf:resource and a “#”, whereas
The following owl:Class element defines a class of printers from a joint
venture of HP and Apple:
<owl:Class rdf:ID="hpApplePrinter"> <rdfs:comment>
Printers from a joint venture of HP and Apple </rdfs:comment>
<rdfs:subClassOf rdf:resource="#hpLaserJetPrinter"/> <rdfs:subClassOf rdf:resource="#appleLaserWriter"/> </owl:Class>
The existence of such printers would contradict the disjointWith
restriction between the classes hpLaserJetPrinter and apperLaserWriter.
The emptiness of the class hpApplePrinter can be detected by reasoners
Every laserJetPrinter is a printer, and every hpPrinter is an hpProduct: <owl:Class rdf:ID="printer"/> <owl:Class rdf:ID="laserJetPrinter"> <rdfs:subClassOf rdf:resource="#printer"/> </owl:Class> <owl:Class rdf:ID="hpProduct"/> <owl:Class rdf:ID="hpPrinter"> <rdfs:subClassOf rdf:resource="#hpProduct"/> </owl:Class>
Redundant subClassOf Relation
Since hpLaserJetPrinter is a sub–class of hpPrinter and hpPrinter
is a sub–class of hpProduct, it is redundant to explicitely state that hpLaserJetPrinter is a sub–class of hpProduct.
<owl:Class rdf:ID="hpLaserJetPrinter"> <rdfs:subClassOf rdf:resource="#laserJetPrinter"/> <rdfs:subClassOf rdf:resource="#hpPrinter"/> <rdfs:subClassOf rdf:resource="#hpProduct"/> <owl:disjointWith rdf:resource="#appleLaserWriter"/> </owl:Class>
This redundancy is not an error. We could simply consider it as an anomaly, that should be reported to the knowledge engineer.
Instances
Finally, we have some instances of some of the defined classes: <appleLaserWriter rdf:ID="1001"/>
<appleLaserWriter rdf:ID="1002"/> <hpLaserJetPrinter rdf:ID="1003"/> <hpLaserJetPrinter rdf:ID="1004"/>
As mentioned before, there cannot exist instances of the class
The ontology editor Protégé has some plugged in reasoners, such as
• FaCT++, • HermiT, and • Racer.
In the session that is shown in the screenshot above, the emptiness of the class
hpApplePrinter was be detected by the ontology reasoner FaCT++. It is inferred that the class hpApplePrinter is EquivalentTo the
empty class Nothing. By clicking the question mark, an explanation can be shown.
There are also databases for handling rdf data, so called triple stores, such
as Sesame or Jena. They use extensions of SQL– most notably SPARQL – as
Declarative Queries in FNQuery
Complex XML data structures in PROLOG:
’owl:Class’:[’rdf:ID’:’appleLaserWriter’]:[ ’rdfs:comment’:[’Apple laser ...’], ’rdfs:subClassOf’:[ ’rdf:resource’:’#laserJetPrinter’]:[], ’owl:disjointWith’:[ ’rdf:resource’:’#hpLaserJetPrinter’]:[] ]
An XML element is represented as a term structure T:As:C, called FN–triple.
• T is the tag of the element,
• As is the list of the attribute/value pairs A:V of the element, and
FNSELECT
In an OWL knowledge base Owl, there exists an isa relation between two classes C1 and C2, if a subclassOf relation is stated explicitely, or
if C1 was defined as the interesection of C2 and some other classes: % isa(+Owl, ?C1, ?C2) <-isa(Owl, C1, C2) :-C := Owl/’owl::-Class’::[@’rdf:ID’=:-C1], ( R2 := C/’rdfs:subClassOf’@’rdf:resource’ ; R2 := C/’owl:intersectionOf’/’owl:Class’@’rdf:about’ ), owl_reference_to_id(R2, C2). % owl_reference_to_id(+Reference, ?Id) <-owl_reference_to_id(Reference, Id)
:-( concat:-(’#’, Id, Reference) ; Id = Reference ).
Disjointness of Classes % disjointWith(+Owl, ?C1, ?C2) <-disjointWith(Owl, C1, C2) :-R2 := Owl/’owl:Class’::[@’rdf:about’=R1] /’owl:disjointWith’@’rdf:resource’, owl_reference_to_id(R1, C1), owl_reference_to_id(R2, C2).
In the following, we often suppress the ontology argument Owl.
Transitive Closure of isa
% subClassOf(?C1, ?C2) <-subClassOf(C1, C2)
:-isa(C1, C2).
subClassOf(C1, C2)
Anomalies in Ontologies Cycle ?- isa(C1, C2), subClassOf(C2, C1). C1 = personalPrinter, C2 = printer Partition Error ?- disjointWith(C1, C2), subClassOf(C, C1), subClassOf(C, C2). C = hpApplePrinter, C1 = hpLaserJetPrinter, C2 = appleLaserWriter
Incompleteness
?- isa(C1, C), isa(C2, C), isa(C3, C),
disjointWith(C1, C2), not(disjointWith(C2, C3)). C = laserJetPrinter,
C1 = hpLaserJetPrinter, C2 = appleLaserWriter, C3 = ibmLaserPrinter
The class C has three sub–classes C1, C2 and C3, from which only the two sub–classes C1 and C2 are declared as disjoint in the knowledge base.
The fact that C2 and C3 are disjoint and that C1 and C3 are disjoint as well,
possibly was forgotten by the knowledge engineer during the creation of the knowledge base.
Redundant subClassOf/instanceOf Relations % redundant_isa(?Chain) <-redundant_isa(C1->C2->C3) :-isa(C1, C2), subClassOf(C2, C3), isa(C1, C3). ?- redundant_isa(Chain).
Chain = hpLaserJetPrinter -> hpPrinter -> hpProduct
The sub–class relation between C1 and C3 can be derived by transitivity over the class C2.
Here, isa(C1, C2), subClassOf(C2, C3), requires that this deduction is done over at least two levels.
Undefined Reference
During the development of an ontology in OWL, it is possible that we
reference a class that we have not yet defined. % undefined_reference(+Owl, ?Ref) <-undefined_reference(Owl, Ref) :-rdf_reference(Owl, Ref), not(owl_class(Owl, Ref)). rdf_reference(Owl, Ref) :-( R := Owl/descendant_or_self::’*’@’rdf:resource’ ; R := Owl/descendant_or_self::’*’@’rdf:about’ ), owl_reference_to_id(R, Ref). owl_class(Owl, Ref) :-Ref := Owl/’owl:Class’@’rdf:ID’.
If we load such an ontology into Protégé, then the ontology reasoners may produce wrong results, even for unrelated parts of the ontology.
8.3
Object–Oriented Databases
Application Domains
• engineering (CAD/CAM, CIM) • image and graphics databases • scientific applications
• geo–databases
• multimedia systems
Influences and concepts from other areas of computer science:
• programming languages:
abstract data typs and encapsulation completeness (w.r.t. expressivity)
• software engineering:
modularisation, code extensibility and reuse
• artificial intelligence:
concepts for knowledge representation, classification
• databases:
8.3.1 Complex Objects
Every object has a unique object identifier OID.
This value is invisible for the user. It is only used internally by the system to identify an object and to allow for references between different objects.
An object o is represented by a triple h i, c, v i: • i is the unique object identifier,
• c is a type constructor, • v is the value of o.
Type Constructors: atom, tuple, set, list, array
Given an object o = h i, c, v i.
• If c = atom, then v is an atomic value. • If c = tuple, then
v = h a1 : i1, a2 : i2, . . . , an : in i
is a tuple with attribute names aj and OID’s ij.
• If c = set, then
v = { i1, i2, . . . , in }
is a set of OID’s ij.
A complex object can be represented by a graph.
Two complex objects o1 = h i1, c1, v1 i and o2 = h i2, c2, v2 i are called • deeply equal, if c1 = c2 and v1 = v2.
• shallow equal, if their graphs are isomorphic and the atomic values in
the corresponding leaves are the same.
OODDL: Object–Oriented Data Definition Language
Nowadays, complex objects are frequently stored and managed using XML
Example (Complex Objects)
The complex objects o1, o2, and o3 contain the atomic objects o4, o5, and o6 as sub–objects: o1 = h i1, tuple,h a1 : i4, a2 : i6 i i, o2 = h i2, tuple,h a1 : i4, a2 : i6 i i, o3 = h i3, tuple,h a1 : i5, a2 : i6 i i, o4 = h i4, atom,10 i, o5 = h i5, atom,10 i, o6 = h i6, atom,20 i.
tuple tuple
atom
atom atom atom
o o o o o o i i i i i i <a :10, a :20> <a :10, a :20> a a a a 1 1 1 1 1 1 2 2 2 2 3 3 4 4 6 6 5 5 6 6 : : : : : :
Nested Relations
Given a set U of attributes with domains dom(A), A ∈ U .
Formats R and domains dom(R) are defined recursively:
• R = (A1, . . . , An, R1, . . . , Rm) with Ai ∈ U , 1 ≤ i ≤ n, and formats
Rj, 1 ≤ j ≤ m, is a format with
dom(R) = dom(A1) × . . . × dom(An) × 2dom(R1) × . . . × 2dom(R2).
• If m = 0, then R is a basic format.
A nested tuple over a format R is an element of dom(R).
A nested relation or NF2–Relation (Non–First–Normal–Form) over R is a subset of dom(R).
Example (NF2–Relation) formats
Children = (Cname, BDate, Sex) Graduations = (Type, Date)
Employees = (Id, Name, Address, Children, Graduations) NF2–Relation over the format Employees:
Employees
Id Name Address Children Graduations Cname Bdate Sex Type Date
Mary 120261 F
driv_lic 121255
100 Joe LA Peter 041465 M
phd_cs 021565
John 082270 M
200 Theo NY Mary 051578 F driv_lic 082686
8.3.2 Features of Object–Orientation
Encapsulation of Structure and Behaviour
In the relational data model there exist generic operatios for searching, inserting, deleting, and updating tuples, which can be applied to arbitrary relation schemas.
In object–oriented databases there are visible and hidden attributes.
• The visible attributes can be accessed by a declarative query language. • The hidden attributes are accessed by sending messages (message
passing) between the objects.
Each object type “has” integrity conditions, which are realised in the access operations.
Type and Class Hierarchies, Inheritance
A type is given by its
• type name, • attributes, and
• operations (methods).
A type hierarchy is an acyclic, binary relation of the set of types: Person Student Faculty Grad_Student Supertype Subtype R ? ? specialization: ↓ generalization: ↑
A sub–type inherits the functions of the super–type (inheritance). Additionally, the sub–type has its own functions.
→ multiple inheritance, selective inheritance
A class is a set of objects, which usually are of the same type. Usually, the set of all stored objects of each type forms a class.
The type system in OODBs can be extended at run time.
Frequently, the non–standard data type BLOB (binary large object) is used for
• raster pixel pictures and • long text strings.
Polymorphism (Operator Overloading)
The same operator name can have different implementations.
The implementation which is suitable for a certain object is determined at run time, when the type of the object is known (late binding).
E.g., the function “area” for calculating the area is implemented differently for different geometrical objects.
GEOMETRY_OBJEKT: Shape, Area, Centerpoint RECTANGLE subtype-of GEOMETRY_OBJECT
(Shape=’rectangle’): Width, Height TRIANGLE subtype-of GEOMETRY_OBJECT
(Shape=’triangle’): Side1, Side2, Angle CIRCLE subtype-of GEOMETRY_OBJECT
Multiple and Selective Inheritance
• Multiple inheritance occurs in a type hierarchy, if a type T is a sub–type
of several super–types T1, . . . , Tn:
T1 T2 . . . Tn
T RU
Then T inherits the functions of T1, . . . , Tn; this can lead to ambiguities.
• Selective inheritance occurs, if a type should only inherit some special
functions of one of its super–types T′. In this case, the undesired
Versions and Configurations
Many database applications require the management of different versions
versions of complex objects:
• software projects • CAD applications.
A version graph shows the relations between the different versions of an object.
A configuration of a complex object is a composition of compatible versions for the sub–objects.
8.3.3 Examples: COMPANY and UNIVERSITY Database
In the following we will see the
• types, • classes,
• methods, and • some queries
The COMPANY Database as an OODB
DNAME DNUMBER MGR LOCATIONS EMPLOYEES PROJECTS tuple
set set set
tuple atom atom MANAGER atom atom atom atom tuple tuple tuple tuple tuple i i i i i i i i i i i i i i i i o o o o o o o o o o o 8 8 5 5 4 4 9 9 7 7 10 10 11 11 1 1 2 2 3 3 6 6 MANAGERSTARTDATE 15 16 17 v v v v v v 5 Research
Houston Bellaire Sugarland
v v 22-May-78 10 11 4 9 7 v 1 2 3 v 6 13 14 : : : : : : : : : : : :... :... :... :... :... 5
Complex Objects o1 = h i1,atom, Houstoni, o2 = h i2,atom, Bellairei, o3 = h i3,atom, Sugarlandi, o4 = h i4,atom, 5i, o5 = h i5,atom, Researchi, o6 = h i6,atom, 22-May-78i, o7 = h i7,set,{ i1, i2, i3 } i,
o8 = h i8,tuple,hDNAME : i5,DNUMBER : i4,MGR : i9,
LOCATIONS : i7,EMPLOYEES : i10,PROJECTS : i11i i,
o9 = h i9,tuple,hMANAGER : i12,MANAGERSTARTDATE : i6 i i,
o10 = h i10,set,{ i12, i13, i14 } i,
Data Types
define type Date:
tuple( year: integer, month: integer, day: integer );
define type Employee:
tuple( name: string, ssn: string,
birthdate: Date, sex: char, dept: Department );
define type Department:
tuple( dname: string, dnumber: integer,
mgr: tuple( manager: Employee, startdate: Date ), locations: set(string),
employees: set(Employee), projects: set(Project) );
Classes
define class Employee:
type tuple( name: string, ssn: string,
birthdate: Date, sex: char,
dept: Department );
operations
age(e: Employee): integer, create_new_emp: Employee,
define class Department:
type tuple ( dname: string, dnumber: integer,
mgr: tuple( manager: Employee, startdate: Date ), locations: set (string),
employees: set (Employee), projects: set (Project) );
operations
number_of_emps(d: Department): integer, create_new_dept: Department,
destroy_dept(d: Department): boolean,
add_emp(d: Department, e: Employee): boolean, remove_emp(d: Department, e: Employee): boolean;
define class DepartmentSet:
type set (Department);
operations
create_dept_set: DepartmentSet,
destroy_dept_set(ds: DepartmentSet): boolean,
add_dept(ds: DepartmentSet, d: Department): boolean, remove_dept(ds: DepartmentSet, d: Department): boolean;
persistent name AllDepartments: DepartmentSet;
/* AllDepartments is a persistent named object of type set(Department) */
. . .
d := create_new_dept;
/* creates new department object in the variable d */ b := add_dept(AllDepartments, d);
The UNIVERSITY Database as an OODB
Data Types
type Phone: tuple( area_code: integer, number: integer );
type Date: tuple( year: integer, month: integer, day: integer );
Classes
class Person
type tuple( ssn: string,
name: tuple( firstname: string, middlename: string, lastname: string ), address: tuple( number: integer, street: string, apt_no: string,
city: string, state: string, zipcode: string ), birthdate: Date,
sex: character );
method age: integer
class Student inherit Person
type tuple(
class: string, majors_in: Department, minors_in: Department, registered_in: set(Section),
transcript: set ( tuple(
grade: character, ngrade: real, section: Section ) ) );
method grade_point_average: real, change_class: boolean, change_major(new_major: Department): boolean;
end
class Grad_Student inherit Student
type tuple(
degrees: set ( tuple ( college: string, degree: string, year: integer ) ), advisor: Faculty );
class Faculty inherit Person
type tuple(
salary: real, rank: string, foffice: string, fphone: Phone, belongs_to: set(Department),
grants: set(Grants), advises: set(Student) ),
method promote_faculty, give_raise(percent: real),
end
class Department
type tuple(
dname: string, office: string, dphone: Phone, members: set(Faculty), majors: set(Student), chairperson: Faculty, courses: set(Course) ),
method add_major(s: Student), remove_major(s: Student):boolean
class Section
type tuple(
sec_num: integer, qtr: Quartar, year: Year,
students: set ( tuple( stud: Student, grade: character ) ), course: Course,
teacher: Instructor ),
method change_grade(s: Student, g: character);
end
class Course
type tuple(
cname: string, cnumber: string, cdescription: string, sections: set(Section),
offering_dept: Department );
Methods
method body age: integer in class Person
{
int a; Date d; d=today();
a=d->year - self->birthdate->year;
if ((d->month < self->birthdate->month) ||
(d->month == self->birthdate->month) &&
(d->day < self->birthdate->day)) --a;
return a;
method body grade_point_average: real in class Student { float sum=0.0; int count=0; struct { char gr; float ngrade; o2_Section sec; } t;
for (t in self->transcript) {
/* increment sum by ngrade, count by 1 */ sum += t->ngrade; ++count;
}
return sum/count;
method body
change_major (new_major: Department): boolean in class Student
{
if (self->majors_in->remove_major(self)) {
return 0; } else { new_major->add_major(self); self->majors_in=new_major; return 1; } }
method body
remove_major(s: Student): boolean in class Department
{
if (s in self->majors) {
/* –= apply set difference to remove object s from set of majors */ self->majors –= set(s);
return 1;
}
else return 0;
method body
add_major(s: Student) in class Department
{
/* += apply set union to add object s to set of majors */ self->majors += set(s);
}
/* a persistent root to hold all persistent Person objects */
name All_Persons: set(Person);
/* a persistent root to hold a single Person object */
run body {
/* create a new Person object p */ o2 Person p = new Person;
*p = tuple (
ssn: ”222222222”, name: tuple(
firstname: ”Franklin”, middlename: ”T”, lastname: ”Wong” ), address: tuple( number: 638, street: ”Voss Road”,
city: ”Houston”, state: ”Texas”, zipcode: ”77079” ), birthdate: tuple( year: 1945, month: 12, day: 8 ),
sex: M );
/* p becomes persistent by attaching to persistent root */ All_Persons += set(p);
/* now put values in persitent named object John_Smith */ John_Smith->ssn=”444444444”,
John_Smith->name: tuple(
firstname: ”John”, middlename: ”B”, lastname: ”Smith”),
John_Smith->address: tuple( number: 731, street: ”Fondren Road”, city: ”Houston”, state: ”Texas”, zipcode: ”77036” ),
John_Smith->birthdate: tuple( year: 1955, month: 1, day: 9 ), John_Smith->sex:M;
Queries
select tuple (
fname: s.name.firstname, lname: s.name.lastname ) from s in Student
where s.majors_in.dname = ”Computer Science”
select tuple(
fname: s.name.firstname, lname: s.name.lastname, transcript: select tuple(
cname: sc.section.course.cname, sec_no: sc.section.sec_num, quarter: sc.section.qtr, year: sc.section.year, grade: sc.grade ) from sc in sec )
from s in Student, sec in s.transcript