Analysis - The constraint aware approach - XML-to-SQL Query Translation

5.5 The constraint aware approach

5.5.4 Analysis

Theorem 4 Given a tree XML-to-Relational mapping T along with the integrity constraints that hold on the underlying relational schema, and a path expression query P, the constraint aware approach outputs a correct equivalent SQL query. The running time of the query translation algorithm is polynomial in the size of the input, while the running time of the precomputation stage may be exponential in the size of the input.

Proof: For a given path expression query $Q$, let $Q_1$ be the SQL query obtained by applying the constraint aware translation algorithm. Let $Q_2$ be the SQL query obtained by applying the baseline query translation algorithm. Recall that $Q_2 = \bigcup_{n \in S} \mathit{rtol}(n)$, where $S = \mathit{PathId}(Q)$. We show that $Q_1 = Q_2$ under multiset semantics, which proves that the constraint aware approach always outputs a correct equivalent SQL query. From the algorithm in Figure 35, we have

$$Q_1 = \mathit{SQL}_{bij} \cup \mathit{SQL}_{nonbij}$$

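Similarly, $Q_2$ can be written as

$$Q_2 = \bigcup_{n \in S_{bij}} \mathit{rtol}(n) \;\cup \bigcup_{n \in S_{nonbij}} \mathit{rtol}(n)$$

By definition, we have

$$\mathit{SQL}_{nonbij} = \bigcup_{n \in S_{nonbij}} \mathit{rtol}(n)$$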

So, we need to show that

$$\mathit{SQL}_{bij} = \bigcup_{n \in S_{bij}} \mathit{rtol}(n)$$

Since each relational column being projected in these two queries is bijectively mapped, it suffices to prove that

$$\mathit{keySQL}_{bij} = \bigcup_{n \in S_{bij}} \mathit{keyrtol}(n)$$

Here $\mathit{keySQL}_{bij}$ refers to the query $\mathit{SQL}_{bij}$ augmented with the appropriate key column(s) in the projection clause. We show this equality by proving containment in both directions. In the following discussion, we assume that no two nodes in $S$ are annotated with different column names from the same relation. Otherwise, we partition $S$ based on the annotations, and the following proof shows that the equality holds for each partition.

1. $\mathit{keySQL}_{bij} \supseteq \bigcup_{n \in S_{bij}} \mathit{keyrtol}(n)$: If we show that every tuple $t$ appearing in the result of the RHS query also appears in the result of the LHS query, the containment result holds under set semantics. We first show this to be true. Let $t$ belong to relation $R$, and let $R.C$ be the bijective column being projected. Let $n_1$ be the schema node such that $t$ occurs in the result of $\mathit{keyrtol}(n_1)$. The existence of a unique schema node $n_1$ follows from the fact that $R.C$ is bijectively mapped. Consider the same node $n_1$ in the LHS query, $\mathit{keySQL}_{bij}$. Since node $n_1 \in S_{bij}$, at the […] a node sequence $NS = \langle \mathit{Conflict\_lda}(n_1), \ldots, n_1 \rangle$ corresponding to node $n_1$. In the Grouping phase, we create a basic SQL query (say $Q_3$) corresponding to $\mathit{RelSeq}(NS)$. It can be seen that tuple $t$ occurs in the result of query $Q_3$. This is due to the fact that the relation sequence corresponding to $Q_3$ is a suffix of the relation sequence corresponding to $\mathit{keyrtol}(n_1)$. Moreover, the where clause in $Q_3$ is of the form $C_{NS}$ OR (conditions corresponding to other nodes with the same relation sequence as $n_1$). Notice that $C_{NS}$ is a subset of the where clause of $\mathit{keyrtol}(n_1)$. As a result, tuple $t$ occurs in the result of query $Q_3$. This implies that $t$ occurs in the result of $\mathit{keySQL}_{bij}$. Hence, we have the required containment result under set semantics.

Since R.C is bijectively mapped, t appears exactly once in the result of the RHS query. So, we have the containment result under multiset semantics.

2. $\mathit{keySQL}_{bij} \subseteq \bigcup_{n \in S_{bij}} \mathit{keyrtol}(n)$: Here, if we show that every tuple $t$ occurring in the result of the LHS query also occurs in the result of the RHS query, we prove the containment under set semantics. Since the column under consideration is bijectively mapped, there are no duplicates in the result of the RHS query. So, if we also show that no tuple appears multiple times in the result of the LHS query, we prove the containment under multiset semantics.

We first show that if a tuple $t$ occurs in the result of the LHS query, it also occurs in the result of the RHS query. Since the corresponding column is bijectively mapped, there is some schema node $n$ such that $\mathit{keyrtol}(n)$ has $t$ in its result set. We next show that $n \in S_{bij}$, i.e., the root-to-leaf path corresponding to $n$ matches the query.

Assume that the tuple is produced by a basic SQL query $Q_4$ in $\mathit{keySQL}_{bij}$. Let […] There is a condition $C_{n_i}$ that was satisfied by some evaluation making tuple $t$ appear in the result. Let $n_i$ be the corresponding schema node. Now $t$ appears in the results of both the prefix-eliminated query for $n_i$ and $\mathit{keyrtol}(n)$. This implies that either (i) $n_i = n$ or (ii) $n \in S_{bij}$ and the prefix-eliminated queries of $n_i$ and $n$ are combinable. Otherwise, the lda for $n_i$ would have been a higher ancestor (steps 1-5 of the prefix-elimination algorithm, Figure 36). In either case, we have that $n \in S_{bij}$. Combining this with the fact that $t$ appears in the result of $\mathit{keyrtol}(n)$, we have that $t$ appears in the result of $\bigcup_{n \in S_{bij}} \mathit{keyrtol}(n)$.

We next show that no tuple appears multiple times in the result of the LHS query. Assume the contrary. Suppose a tuple $t$ appears more than once in the result. Since the relational column being projected is bijectively mapped, a single basic SQL query in $\mathit{keySQL}_{bij}$ will not produce duplicates. So, there are two basic SQL queries in $\mathit{keySQL}_{bij}$ that have $t$ in their result set. Let these queries be $Q_5$ and $Q_6$. As in the previous case, we can find the corresponding schema nodes $n_5$ and $n_6$ that cause $t$ to appear in the result of $Q_5$ and $Q_6$, respectively. Let $NS_5$ and $NS_6$ be the corresponding prefix-eliminated node sequences. Since the column is bijectively mapped, there is some schema node $n$ such that $\mathit{keyrtol}(n)$ has $t$ in its result set. Using the same argument as in the previous case, we see that $n \in S_{bij}$. Let $NS$ be the prefix-eliminated node sequence for $n$. The two node sequences $NS$ and $NS_5$ are combinable, as otherwise this pair violates the condition in step 7 of the prefix-elimination algorithm (Figure 36). Similarly, the node sequences $NS$ and $NS_6$ are also combinable. Since combinability is an equivalence relation (in particular, symmetric and transitive), this implies that $NS_5$ and $NS_6$ are also combinable. This contradicts the assumption that $n_5$ and […] LHS query.

We have shown that $Q_1 = Q_2$ under multiset semantics, which proves that the constraint aware approach always outputs a correct equivalent SQL query.
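The shape of the correctness argument (set containment in both directions, plus the absence of duplicates on each side) can be illustrated with a minimal Python sketch. This is not the translation algorithm itself. The function name `equal_by_proof_strategy` and the sample `(key, value)` tuples are hypothetical, introduced only for illustration, and multisets are modelled with `collections.Counter`.

```python
from collections import Counter


def equal_by_proof_strategy(lhs, rhs):
    """Sufficient check for multiset equality, mirroring the proof outline.

    Returns True only if (1) every RHS tuple occurs in the LHS,
    (2) every LHS tuple occurs in the RHS, and (3) neither side
    contains a duplicate. Together these imply Counter(lhs) == Counter(rhs).
    """
    lhs_bag, rhs_bag = Counter(lhs), Counter(rhs)
    lhs_contains_rhs = set(rhs_bag) <= set(lhs_bag)   # direction 1
    rhs_contains_lhs = set(lhs_bag) <= set(rhs_bag)   # direction 2
    lhs_no_dups = all(count == 1 for count in lhs_bag.values())
    rhs_no_dups = all(count == 1 for count in rhs_bag.values())
    return lhs_contains_rhs and rhs_contains_lhs and lhs_no_dups and rhs_no_dups


# Hypothetical key-augmented result tuples (key, projected value).
rhs_result = [(1, "a"), (2, "b"), (3, "c")]
lhs_same = [(3, "c"), (1, "a"), (2, "b")]               # same tuples, other order
lhs_with_dup = [(1, "a"), (1, "a"), (2, "b"), (3, "c")]  # one tuple repeated

same_ok = equal_by_proof_strategy(lhs_same, rhs_result)
dup_ok = equal_by_proof_strategy(lhs_with_dup, rhs_result)
```

In this sketch the duplicated tuple in `lhs_with_dup` passes both set containment checks but fails the duplicate check, which is why the proof needs the separate "no tuple appears multiple times" step before concluding equality under multiset semantics.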

The running time of the algorithm is polynomial, as the while loop terminates in a polynomial number of iterations and the cost of each step in the algorithm is polynomial. The precomputation phase may have an exponential cost since the subroutines for the UQC, EQI and DUP problems may have exponential running times. □

Theorem 5 Given sound and complete algorithms $A$ and $A'$ for the UQC and DUP problems over the class C, the XML-to-SQL Query Translation problem for a bijective tree XML view under metric PrefixMetric can be solved in polynomial time.

Proof: We need to show that the SQL query output by the constraint aware algorithm is optimal under metric PrefixMetric. In other words, it suffices to show that for each node $n \in S$, the constraint aware algorithm identifies the lowest ancestor up to which we need to go.

Since we have sound and complete algorithms for UQC and DUP, the lda computation is also complete. Consider a node $n \in S$. Let $\mathit{C\_lda}(n)$ be the final ancestor chosen by the constraint aware algorithm. Let $Q_1$ be a correct SQL query and $Q_1^n$ be the corresponding fragment that returns results corresponding to the root-to-leaf path ending in $n$. Suppose $Q_1^n$ corresponds to stopping at the ancestor $n_1$. We argue that $n_1$ is not a proper descendant of $\mathit{C\_lda}(n)$. Otherwise, $Q_1^n$ will return results corresponding to some leaf node $\notin S$ or not combinable with $\mathit{RelSeq}(\mathit{C\_lda}(n), n)$. The former implies that $Q_1$ does not return the correct result and the latter implies that $Q_1$ returns du[…] query. Hence, $n_1$ is at least as high as $\mathit{C\_lda}(n)$, implying that the query output by the constraint aware algorithm is optimal under metric PrefixMetric. □

In document XML-to-SQL Query Translation (Page 152-157)