

3.4 Logic Specialisation

3.4.2 Relational Algebra Machine

Next, we define the Relational Algebra Machine (RAM) language, the target language of the first specialisation. RAM is an abstract machine that we have developed for Soufflé and use as a semantic model for evaluating translated input programs. The machine is specifically tailored to execute relational algebra programs produced by semi-naïve evaluation. A RAM program contains relational algebra operations to compute the results produced by clauses, and it can efficiently model Datalog fixpoint evaluation schemes through imperative constructs, including statement composition for sequencing operations and loop constructs with exit conditions. Additionally, RAM contains relation management operations to keep track of previous, current and new knowledge, as required by efficient evaluation schemes such as semi-naïve evaluation.

3.4.2.1 Execution Model

The abstract machine operates solely on relations and has no notion of variables or memory. Thus, the evaluation of a RAM program entails maintaining a collection of relations ⟨R1, ..., Rk⟩ as the state for executing the program. The relations R1, ..., Rk are fixed throughout the execution of a RAM program, i.e., no relation is added to or deleted from the state whilst executing the program. However, the contents of a relation may change. Some of the relations that the program operates on are pre-loaded with data, e.g., the tuples defined by the facts in the original input program. We define the RAM state s as a map from the relation names in the Datalog program to the sets of tuples defining the relations, i.e., (R1 ↦ {t1, t2, ...}, ..., Rn ↦ {t1, t2, ...}). Given a state s, s[R] denotes a map access, i.e., the element mapped to R. The notation [R ↦ e] denotes a map update, i.e., replacing the value mapped to R with e. Note that two relations can be updated simultaneously as follows: [R1 ↦ e1, R2 ↦ e2]. Maps are closed under intersection, union, and complement. τ denotes a mapping from variables to sets of values. A mapping is assigned by τ ←ₜ S, where t is mapped in τ to a set of values S.
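The state model described above can be illustrated with a small Python sketch. This is only one possible reading, not the Soufflé implementation: the names State, access and update, and the relation names edge and path, are hypothetical and chosen for illustration. The sketch assumes that a simultaneous update reads all right-hand sides from the old state before writing, and that the set of relation names never changes.

```python
from typing import Dict, FrozenSet, Mapping, Tuple

Row = Tuple[int, ...]
Relation = FrozenSet[Row]
State = Dict[str, Relation]  # relation name -> set of tuples


def access(s: State, name: str) -> Relation:
    """s[R]: return the set of tuples currently mapped to relation R."""
    return s[name]


def update(s: State, changes: Mapping[str, Relation]) -> State:
    """s[R1 -> e1, R2 -> e2]: return a new state with the given relations replaced.

    All values in `changes` are computed by the caller from the old state,
    so several relations are updated simultaneously.
    """
    unknown = set(changes) - set(s)
    if unknown:
        # The collection of relations is fixed during execution.
        raise KeyError(f"cannot add relations to the state: {sorted(unknown)}")
    return {**s, **changes}


# A state with one pre-loaded relation and one empty relation.
s0: State = {"edge": frozenset({(1, 2), (2, 3)}), "path": frozenset()}

# Single update: path now holds the tuples of edge.
s1 = update(s0, {"path": access(s0, "edge")})

# Simultaneous update: both right-hand sides are read from s1.
s2 = update(s1, {"edge": frozenset(), "path": access(s1, "edge")})
```

Returning a fresh dictionary instead of mutating in place mirrors the functional style of the semantics, where each statement maps a state to a new state.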

3.4.2.2 Syntax and Semantics

In the design of RAM, we attempt to limit expressivity in order to avoid errors in translation, yet the language must be expressive enough to model all required constructs. We therefore employ just enough constructs to represent the Datalog evaluation mechanisms described above. The RAM constructs are divided into control flow statements, operations, relational management, and values and conditions. The control flow constructs allow a RAM program to model the iteration of the semi-naïve algorithm. Operations allow for the modelling of nested-loop joins for clause evaluation, and relational management allows modelling of the book-keeping aspects of the semi-naïve algorithm.

    S ∈ Stmt  → loop S1; [exit C1;] ... Sn; [exit Cn;] end
    S ∈ Stmt  → S1; S2
    S ∈ Stmt  → merge R1 into R2
    S ∈ Stmt  → swap R1, R2
    S ∈ Stmt  → purge R
    S ∈ Stmt  → insert O

    O ∈ Oper  → search R as t [where C] do O
    O ∈ Oper  → project (V1, ..., Vk) into R
    C ∈ Cond  → C1 and C2
    C ∈ Cond  → V1 rel V2
    C ∈ Cond  → not exists R(V1, ..., Vk)
    V ∈ Value → R.v
    V ∈ Value → t(v)
    V ∈ Value → count(R)
    V ∈ Value → const

Figure 3.7: RAM BNF grammar definition

Control Flow. The RAM syntax is defined in Figure 3.7. RAM has two statements for control flow: sequences of statements, and a loop statement with multiple exit statements. The sequencing of statements S1; S2 is necessary to order the computations of relations that depend on each other. The order among relations stems from the strongly connected component graph of the dependencies between relations [90]. Loop constructs are necessary for computing fixpoints of recursively defined relations. Mutually recursive relations are grouped in a single strongly connected component, and the computations of the clauses of these relations are iterated until no further knowledge can be obtained. The semantic function for control statements takes as an argument a function over a state s, where a state is defined as a mapping from relation names to sets of tuples. The control flow loop is defined as the least fixpoint of the function F : (S → S) → (S → S), as shown below:

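\[
F(\alpha)(s) =
\begin{cases}
\alpha(\mathcal{S}[\![S_i]\!]\, s) & \text{if } \neg\, \mathcal{C}[\![C_k]\!]\, s \text{ for all } C_k \text{ where } k < i \\
s & \text{otherwise}
\end{cases}
\]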

Here, we execute a statement only if none of the preceding exit conditions triggered an exit. The sequence statement is defined by the composition of two statement executions. This type of control flow models the fixpoint characteristics of the loop in the semi-naïve algorithm.
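One operational reading of these two control flow constructs is sketched below in Python. It is an illustration under stated assumptions rather than the definition used by RAM: statements are modelled as functions from state to state, exit conditions as predicates on the state, and the names Stmt, Cond, seq, loop and grow are hypothetical. The sketch assumes that an exit condition is checked immediately after the statement it follows and that the loop returns the state at the point where a condition first holds.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

State = Dict[str, FrozenSet[Tuple[int, ...]]]
Stmt = Callable[[State], State]   # a statement transforms a state
Cond = Callable[[State], bool]    # an exit condition inspects a state


def seq(s1: Stmt, s2: Stmt) -> Stmt:
    """S1; S2 is function composition: run S1, then S2 on its result."""
    return lambda s: s2(s1(s))


def loop(body: List[Tuple[Stmt, Optional[Cond]]]) -> Stmt:
    """loop S1; [exit C1;] ... Sn; [exit Cn;] end.

    Repeats the body until some exit condition holds. If no condition
    ever holds, the loop does not terminate.
    """
    def run(s: State) -> State:
        while True:
            for stmt, exit_cond in body:
                s = stmt(s)
                if exit_cond is not None and exit_cond(s):
                    return s
    return run


def grow(s: State) -> State:
    """Hypothetical statement: add the successor of the largest number in N."""
    largest = max(s["N"])[0]
    return {**s, "N": s["N"] | {(largest + 1,)}}


program = loop([(grow, lambda s: len(s["N"]) >= 3)])
result = program({"N": frozenset({(0,)})})
```

In this usage example the loop body runs twice, so N ends up holding the tuples (0,), (1,) and (2,).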


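\[
\begin{aligned}
\mathcal{S}[\![\texttt{loop } S_1; [\texttt{exit } C_1;] \ldots S_n; [\texttt{exit } C_n;] \texttt{ end}]\!] &::= \mathrm{lfp}(F) \\
\mathcal{S}[\![S_1; S_2]\!] &::= \lambda s.\, \mathcal{S}[\![S_2]\!](\mathcal{S}[\![S_1]\!]\, s) \\
\mathcal{S}[\![\texttt{merge } R_1 \texttt{ into } R_2]\!] &::= \lambda s.\, s[R_2 \mapsto s[R_2] \cup s[R_1]] \\
\mathcal{S}[\![\texttt{swap } R_1, R_2]\!] &::= \lambda s.\, s[R_1 \mapsto s[R_2], R_2 \mapsto s[R_1]] \\
\mathcal{S}[\![\texttt{purge } R]\!] &::= \lambda s.\, s[R \mapsto \emptyset] \\
\mathcal{S}[\![\texttt{insert } O]\!] &::= \lambda s.\, \mathcal{O}[\![O]\!]\, s\, \tau' \\
\mathcal{O}[\![\texttt{search } R \texttt{ as } t\ [\texttt{where } C] \texttt{ do } O]\!] &::= \lambda s. \lambda \tau.\, \mathcal{O}[\![O]\!]\, s\, (\tau \leftarrow_t \{ v \in R \mid \mathcal{C}[\![C]\!](v) \}) \\
\mathcal{O}[\![\texttt{project } (V_1, \ldots, V_k) \texttt{ into } R]\!] &::= \lambda s. \lambda \tau.\, s[R \mapsto (\mathcal{V}[\![V_1]\!]\tau \times \ldots \times \mathcal{V}[\![V_k]\!]\tau)] \\
\mathcal{C}[\![C_1 \texttt{ and } C_2]\!] &::= \lambda s, \tau.\, \mathcal{C}[\![C_1]\!]\tau, s \wedge \mathcal{C}[\![C_2]\!]\tau, s \\
\mathcal{C}[\![V_1\ \mathit{rel}\ V_2]\!] &::= \lambda s, \tau.\, \mathcal{V}[\![V_1]\!]\tau, s\ \mathit{rel}\ \mathcal{V}[\![V_2]\!]\tau, s \\
\mathcal{C}[\![\texttt{not exists } R(V_1, \ldots, V_k)]\!] &::= \lambda s, \tau.\, (\mathcal{V}[\![V_1]\!]\tau\, s, \ldots, \mathcal{V}[\![V_k]\!]\tau\, s) \notin s[R] \\
\mathcal{V}[\![R.v]\!] &::= \lambda s, \tau.\, R.v \\
\mathcal{V}[\![t(v)]\!] &::= \lambda s, \tau.\, \tau(t)(v) \\
\mathcal{V}[\![\texttt{count}(R)]\!] &::= \lambda s, \tau.\, \mathrm{card}(s[R]) \\
\mathcal{V}[\![\mathit{const}]\!] &::= \lambda s, \tau.\, \mathit{const}
\end{aligned}
\]

Figure 3.8: RAM semantics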

Relational Management. The RAM statements for relational management are defined by the next three constructs in Figures 3.7 and 3.8. The statement merge adds all of the tuples of relation R1 to relation R2. The statement swap swaps the contents of two relations. The statement purge deletes all tuples in relation R. Statements can be sequenced by a semicolon, S1; S2, such that S1 is executed prior to S2.
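The three relation management statements translate directly into state-to-state functions. The Python sketch below is an illustration only, assuming a state represented as a dictionary from relation names to frozen sets of tuples; the function names mirror the RAM keywords, while the relation names A and B are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Tuple

State = Dict[str, FrozenSet[Tuple[int, ...]]]
Stmt = Callable[[State], State]


def merge(src: str, dst: str) -> Stmt:
    """merge src into dst: dst receives the union of both relations."""
    return lambda s: {**s, dst: s[dst] | s[src]}


def swap(r1: str, r2: str) -> Stmt:
    """swap r1, r2: exchange contents; both reads use the old state."""
    return lambda s: {**s, r1: s[r2], r2: s[r1]}


def purge(r: str) -> Stmt:
    """purge r: replace the contents of r with the empty set."""
    return lambda s: {**s, r: frozenset()}


state: State = {"A": frozenset({(1,)}), "B": frozenset({(2,)})}
merged = merge("A", "B")(state)    # B now holds (1,) and (2,)
swapped = swap("A", "B")(state)    # A holds (2,), B holds (1,)
purged = purge("A")(state)         # A is empty, B is untouched
```

Building the new dictionary in a single expression gives swap its simultaneous-update behaviour: both s[r1] and s[r2] are read before either key is rewritten.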

Example 10 (Semi-Naïve). Here we show how a semi-naïve iteration can be described using RAM.

    insert (number(0)) into I;
    merge I into ∆I;
    loop
      ...
      exit I' = ∅;
      merge I' into I;
      swap ∆I, I';
      purge I'
    end loop;
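To make the book-keeping in this loop concrete, the following Python sketch runs the same merge, exit, swap and purge steps for a transitive closure computation. The rule evaluated inside the loop is a hypothetical stand-in for the elided body of the example, and the function name semi_naive_closure and the edge relation are assumptions made for illustration. The sets total, delta and new play the roles of I, ∆I and I' respectively.

```python
from typing import Set, Tuple

Edge = Tuple[int, int]


def semi_naive_closure(edges: Set[Edge]) -> Set[Edge]:
    """Compute the transitive closure of `edges` with semi-naive iteration."""
    total: Set[Edge] = set(edges)   # I: all knowledge so far (seeded with facts)
    delta: Set[Edge] = set(total)   # ∆I: merge I into ∆I
    new: Set[Edge] = set()          # I': knowledge found in this iteration

    while True:                     # loop
        # Hypothetical rule body, standing in for the elided "...":
        #   path(x, z) :- ∆path(x, y), edge(y, z), not exists path(x, z).
        for (x, y) in delta:
            for (y2, z) in edges:
                if y == y2 and (x, z) not in total:
                    new.add((x, z))
        if not new:                 # exit I' = ∅
            break
        total |= new                # merge I' into I
        delta, new = new, delta     # swap ∆I, I'
        new.clear()                 # purge I'
    return total


closure = semi_naive_closure({(1, 2), (2, 3), (3, 4)})
```

Only the tuples discovered in the previous iteration are joined against the edge relation, which is the saving that semi-naïve evaluation offers over re-evaluating the rule on all of I.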



Nested-Loop Joins. The insert statement is used to model rule evaluation. To evaluate an insert, we instantiate a new loop state τ′, where the prime denotes a new empty map. This map associates tuple names with sets of tuples. An important feature of a RAM program is its ability to express nested-loop joins. To implement nested-loop joins, an insert statement contains a relational algebra operation O that combines cross-product, selection and projection operations.

The operations in an insert are defined in the next two lines of Figures 3.7 and 3.8, which define their syntax and semantics, respectively. The search traverses all tuples in relation R and tests whether, for a tuple t, the condition C holds. If it holds, the attached operation O is executed recursively, passing on the currently selected tuple of the traversal and the selected tuples of the outer traversals. If the condition does not hold, the operation O is skipped and the next tuple is assessed, until the end of the relation is reached. The condition C is referred to as a primitive search condition. It is a restricted formula, as defined in Definition 6, consisting of a conjunction of equality predicates with right-hand-side attribute variables t.v from the tuple t ∈ R in the search, and left-hand-side constants tj.vj obtained from a tuple further up the nested-loop join. In the semantics of Figure 3.8, a search updates the nested-loop state by mapping the tuple t to the filtered tuples of the search.

The project operation selects a set of attribute variables ti.vi, ..., tk.vk from the tuples of the relations in the nested-loop join and projects their values onto the target relation R. In the semantic definition, we update the global state, as the relation projected onto may not be in the nested-loop join traversal. The syntax of a condition, used by the search operation as well as by other statements, is listed next in Figures 3.7 and 3.8.

A condition can be a conjunction of conditions; a binary relation over two values, where rel is one of the binary relations =, ≠, <, ≤, > and ≥; or a non-existence check, which tests that the tuple (V1, ..., Vk) cannot be found in relation R. We refer the reader to [141] to see that relational algebra is indeed expressed in the semantics of RAM operations. The syntax and semantics of values are given by the remaining definitions of Figures 3.7 and 3.8. Here, a value can be a reference to an attribute variable of a relation R.v, a tuple value t.v, the number of tuples in a relation, or a constant value. We further clarify the semantics of nested-loop joins with an example:

Example 11 (RAM Nested-Loop Join). Consider a non-recursive Datalog rule P(x,y) :- R1(x,y),R2(y, x).

We can evaluate this rule with cascading searches (for-all loops) on R1 and R2, with primitive equality conditions on the attributes of R1 and R2. This can be represented as the following RAM program:

    search R1 as t do
      search R2 as t1 where R2.x = t.y and R2.y = t.x do
        project (t.x, t.y) into P
      end
    end
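The nested-loop structure of this program corresponds to two nested for loops with a filter on the inner one. The Python sketch below illustrates that reading for the rule P(x,y) :- R1(x,y), R2(y,x). It assumes that both relations name their first and second attributes x and y, and that the inner condition compares the R2 tuple t1 with the outer R1 tuple t; the names Pair and evaluate_rule are hypothetical.

```python
from collections import namedtuple
from typing import Set

# Hypothetical tuple type: both relations use attribute names x and y.
Pair = namedtuple("Pair", ["x", "y"])


def evaluate_rule(r1: Set[Pair], r2: Set[Pair]) -> Set[Pair]:
    """Evaluate P(x, y) :- R1(x, y), R2(y, x) as a nested-loop join."""
    p: Set[Pair] = set()
    for t in r1:                                  # search R1 as t
        for t1 in r2:                             # search R2 as t1 where ...
            if t1.x == t.y and t1.y == t.x:       # primitive equality conditions
                p.add(Pair(t.x, t.y))             # project (t.x, t.y) into P
    return p


r1 = {Pair(1, 2), Pair(3, 4)}
r2 = {Pair(2, 1), Pair(4, 5)}
p = evaluate_rule(r1, r2)
```

Here only Pair(1, 2) is derived, because R2 contains the reversed tuple (2, 1) but not (4, 3). Unlike the project semantics in Figure 3.8, the sketch accumulates tuples into P across iterations, which is a simplification made for readability.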