7.2.1. Regular nested word languages and XML Schema

In this subsection, we prove an exponential-time upper bound for games with target, validation and replacement languages given by DNWA. Furthermore, we prove that a corresponding lower bound already holds for games with a single function symbol, DTD replacement and target languages, and a single XML Schema validation language, so these bounds are tight for arbitrary combinations of XML Schema and DNWA target, validation and replacement languages.

Proposition 7.2. For the class of validation games with target, validation and replacement languages specified by DNWA, and strategies without replay, JWin is in EXPTIME.

Proof. We describe an alternating polynomial-space algorithm $A$ which, given a validation game $G = (\Sigma, \Gamma, R, V, T, \mathrm{err})$ and an input string $w \in WF(\Sigma)$, decides whether Juliet has a winning strategy on $w$ in $G$. To simplify the algorithm's presentation, we assume $\Gamma = \{1, \dots, k\}$.

The basic idea behind the algorithm $A$ is to use alternation to simulate a play of Juliet and Romeo on $w$ while aggregating over sub-plays on replacement strings in a manner similar to the effect technique used in Chapters 3 to 6. In other words, while the algorithm simulates a play on a given input string, it does not track the current string itself through the play, but only stores enough relevant information about sub-plays regarding the target and validation automata to decide validity of substrings and containment of the final string in the target language.

Thanks to the lack of replay, succinct representations do not need to take Call moves inside of replacement strings into account. On the other hand, unlike in Chapters 3 to 6, it does not suffice to aggregate sub-plays into states of the target language automaton, due to the need to take care of validation. As an additional complication, just tracking states of the validation automata is not enough, either, since we need to deal with nested function symbols and thus potentially with the impact of strings on several copies of the same validation automaton.

To solve this problem, $A$ tracks aggregated transition functions for each relevant DNWA: rather than keeping some string $v$ obtained by an alternating rewriting in memory, it tracks the transitions induced by $v$ starting from each state of each DNWA. It does so by going through the input string from left to right. Every time it encounters an opening tag, it initialises a new aggregated transition function for each automaton (storing the old ones on a stack); every time it encounters a closing tag, it guesses existentially a move by Juliet and universally a response by Romeo (if applicable) and aggregates those onto the transition functions it has stored so far. After the entire input string has been processed in this manner, all that remains is to check whether the aggregated transition function for the rewriting that $A$ has guessed corresponds to a string in $T$.

More concretely, $A$ tracks a $(k+1)$-tuple $t$ of aggregated transition functions, one for each validation DNWA as well as the target DNWA. That is, if $A$ has already processed a prefix $u\langle a\rangle v$ of $w$, with $v \in WF(\Sigma)$ being the longest well-nested suffix, the tuple $t$ contains, for each DNWA with transition function $\delta$, the function $\delta^*(\cdot, v')$, where $v'$ is the rewriting of $v$ according to a play which $A$ has guessed using alternation. If $A$ next reads some opening tag $\langle b\rangle$, it pushes $t$ to a stack and proceeds on the string nested inside the $b$ tags, starting with a new tuple of identity functions; if it reads a closing tag $\langle /a\rangle$ instead, it guesses Juliet's strategy choice on $\langle /a\rangle$ and (if applicable) a reply by Romeo, pops the function tuple $t'$ it has pushed with the corresponding $\langle a\rangle$ tag, and computes from $t$, $t'$ and the strategy choices of Juliet and Romeo a new function tuple corresponding to the rewriting of $\langle a\rangle v\langle /a\rangle$.³

To describe $A$ in formal detail, let $A_i = (Q_i, \Sigma, \delta_i, s_i, F_i)$ be a DNWA in normal form for $V_i$ (for each $i \in [k]$), and let $A_T = (Q_T, \Sigma, \delta_T, s_T, F_T)$ be a DNWA in normal form for $T$. A function tuple is a tuple $(f_1, \dots, f_k, f_T)$, where $f_T : Q_T \to Q_T$, and (for each $i$) $f_i : Q_i \to Q_i$. By $\mathrm{id}$ we denote the function tuple that has an identity function $\mathrm{id}_i$ in each entry. For two function tuples $t = (f_1, \dots, f_k, f_T)$ and $t' = (f'_1, \dots, f'_k, f'_T)$, let $t \circ t' = (f_1 \circ f'_1, \dots, f_k \circ f'_k, f_T \circ f'_T)$ denote their component-wise composition. Furthermore, for $a \in \Sigma$ and a function tuple $t = (f_1, \dots, f_k, f_T)$, let $a(t) = (a(f_1), \dots, a(f_k), a(f_T))$, where, for each $i \in \{1, \dots, k\} \cup \{T\}$, $a(f_i) : Q_i \to Q_i$ is the function defined for each $q \in Q_i$ by $a(f_i)(q) = \delta_i(f_i(\delta_i(q, \langle a\rangle)), q, \langle /a\rangle)$, i.e. the aggregated transition function simulating a transition with $\langle a\rangle$ before, and with $\langle /a\rangle$ after applying $f_i$.
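The operations on function tuples defined above can be illustrated by a minimal Python sketch. Everything named here is an assumption made for illustration and not part of the original construction: the class `ToyDNWA`, the encoding of a function $Q_i \to Q_i$ as a tuple indexed by states, the toy automaton built by `make_even_children_dnwa`, and the reading of composition as "apply the first argument, then the second".

```python
from typing import Dict, List, Tuple

State = int
Fn = Tuple[State, ...]  # Fn[q] is the image of state q


class ToyDNWA:
    """Hypothetical stand-in for one DNWA in normal form."""

    def __init__(self, n_states: int,
                 call: Dict[Tuple[State, str], State],
                 ret: Dict[Tuple[State, State, str], State]) -> None:
        self.n = n_states
        self.call = call  # call[(q, a)]: state after reading <a> in state q
        self.ret = ret    # ret[(p, q, a)]: state after reading </a> in state p,
                          # where q was the state just before the matching <a>


def identity(aut: ToyDNWA) -> Fn:
    return tuple(range(aut.n))


def compose(first: Fn, then: Fn) -> Fn:
    """Assumed order: apply `first`, then `then`."""
    return tuple(then[first[q]] for q in range(len(first)))


def wrap(aut: ToyDNWA, a: str, f: Fn) -> Fn:
    """a(f)(q) = delta(f(delta(q, <a>)), q, </a>)."""
    return tuple(aut.ret[(f[aut.call[(q, a)]], q, a)] for q in range(aut.n))


def wrap_tuple(auts: List[ToyDNWA], a: str, t: Tuple[Fn, ...]) -> Tuple[Fn, ...]:
    """Component-wise a(t) for a function tuple t."""
    return tuple(wrap(aut, a, f) for aut, f in zip(auts, t))


def compose_tuple(t1: Tuple[Fn, ...], t2: Tuple[Fn, ...]) -> Tuple[Fn, ...]:
    """Component-wise composition of two function tuples."""
    return tuple(compose(f, g) for f, g in zip(t1, t2))


def make_even_children_dnwa() -> ToyDNWA:
    """Toy automaton over the single label 'a'.

    States 0 and 1 record the parity of the siblings read so far and
    state 2 is a sink, entered when an element has an odd number of children.
    """
    n = 3
    call = {(q, "a"): (0 if q < 2 else 2) for q in range(n)}
    ret = {(p, q, "a"): (1 - q if p == 0 and q < 2 else 2)
           for p in range(n) for q in range(n)}
    return ToyDNWA(n, call, ret)
```

For instance, `wrap(aut, "a", identity(aut))` is the aggregated transition function of a single childless element, and composing it with itself gives the function for two such siblings, without either string being stored.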

The algorithm $A$ tracks a current function tuple $t$, initialised to $\mathrm{id}$, and a stack of function tuples, initialised to $\varepsilon$. It reads the input string $w$ from left to right. With the current tuple $t$ being $(f_1, \dots, f_k, f_T)$, $A$ proceeds as follows:

• On an opening tag $\langle a\rangle$ with $a \in \Sigma$, $A$ pushes $(f_1, \dots, f_k, f_T)$ to the stack and then resets $t$ to $\mathrm{id}$.

• On a closing tag $\langle /a\rangle$, $A$ guesses existentially a strategy choice for Juliet on $\langle /a\rangle$, if $a \in \Gamma$.

(i) Depending on the precise case, $A$ then summarises the information about the subgame on the current substring into a function tuple $t''$ as follows.

(a) If $a \in \Sigma \setminus \Gamma$ (and Juliet must therefore play Read on $\langle /a\rangle$), or if $a \in \Gamma$ and Juliet plays Read on $\langle /a\rangle$, then $A$ sets $t'' := a(t)$;

(b) If $a = i$ for some $i \in \Gamma$, Juliet plays Call on $\langle /i\rangle$, and $f_i(s_i) \notin F_i$ (i.e. Juliet's Call on $\langle /a\rangle$ is invalid), $t'' := \mathrm{err}(t)$;

(c) If $a = i$ for some $i \in \Gamma$, Juliet plays Call on $\langle /i\rangle$, and $f_i(s_i) \in F_i$ (i.e. Juliet plays a valid Call on $\langle /a\rangle$), $A$ guesses universally a function tuple $t''$ corresponding to Romeo's choice of replacement string, and verifies that there is indeed a replacement string in $R_i$ inducing $t''$.

(ii) Finally, $A$ pops $t' = (f'_1, \dots, f'_k, f'_T)$ from the stack and replaces $t$ by $t' \circ t''$.

At the end of $w$, $A$ accepts if and only if $f_T(s_T) \in F_T$ holds for the final function tuple $t = (f_1, \dots, f_k, f_T)$.
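The stack discipline of this procedure can be sketched for the special case in which only case (a) ever applies, i.e. Juliet reads every closing tag, so that no existential or universal guessing is needed. The sketch below is an illustration under stated assumptions, not the algorithm $A$ itself: it handles a single toy automaton instead of a $(k+1)$-tuple, the token encoding and all function names are hypothetical, and composition is taken to mean "apply the popped function first". The function `direct_run` is included only to cross-check the aggregated result against an ordinary run of the automaton.

```python
from typing import Dict, List, Tuple

Fn = Tuple[int, ...]     # Fn[q] is the image of state q
Token = Tuple[str, str]  # ("open", a) or ("close", a)

O: Token = ("open", "a")
C: Token = ("close", "a")


def toy_dnwa():
    """Hypothetical toy DNWA: states 0/1 count sibling parity, 2 is a sink."""
    n = 3
    call = {(q, "a"): (0 if q < 2 else 2) for q in range(n)}
    ret = {(p, q, "a"): (1 - q if p == 0 and q < 2 else 2)
           for p in range(n) for q in range(n)}
    return n, call, ret


def wrap(n: int, call: Dict, ret: Dict, a: str, f: Fn) -> Fn:
    """a(f)(q) = delta(f(delta(q, <a>)), q, </a>)."""
    return tuple(ret[(f[call[(q, a)]], q, a)] for q in range(n))


def compose(first: Fn, then: Fn) -> Fn:
    """Assumed order: apply `first`, then `then`."""
    return tuple(then[first[q]] for q in range(len(first)))


def read_only_pass(n: int, call: Dict, ret: Dict, tokens: List[Token]) -> Fn:
    """Left-to-right pass with a stack of functions, case (a) only."""
    ident = tuple(range(n))
    t, stack = ident, []
    for kind, a in tokens:
        if kind == "open":
            stack.append(t)                  # remember the function so far
            t = ident                        # start afresh inside the element
        else:
            t2 = wrap(n, call, ret, a, t)    # case (a): t'' := a(t)
            t1 = stack.pop()                 # step (ii): pop t'
            t = compose(t1, t2)              # ... and continue with t' then t''
    return t


def direct_run(n: int, call: Dict, ret: Dict, tokens: List[Token], start: int) -> int:
    """Ordinary run of the toy DNWA from `start`, used as a cross-check."""
    q, hier = start, []
    for kind, a in tokens:
        if kind == "open":
            hier.append(q)
            q = call[(q, a)]
        else:
            q = ret[(q, hier.pop(), a)]
    return q
```

On any well-nested token list, entry `q` of the function returned by `read_only_pass` should agree with `direct_run` started in `q`; the full algorithm differs in that cases (b) and (c) may substitute a different $t''$ at a closing tag.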

Assuming that the verification in case (c) is possible in alternating polynomial space, it is clear that $A$ indeed only requires alternating polynomial space; furthermore, under the same assumption, it is easy to see (if somewhat tedious to prove) that $A$ is correct, as it directly simulates a play of $G$ on $w$, aggregating well-nested rewritten strings into their induced transition functions.

It remains to be explained how we verify, for a function tuple $t'' = (f''_1, \dots, f''_k, f''_T)$ and $i \in \Gamma$, whether there exists a string $v \in R_i$ inducing the transition functions in $t''$, i.e. with $f''_j(q_j) = \delta_j^*(q_j, v)$ for each $j \in \{1, \dots, k, T\}$ and $q_j \in Q_j$. For a single tuple, this is possible in deterministic exponential time by computing a product automaton consisting, for each $j \in \{1, \dots, k, T\}$, of $|Q_j|$ many copies of $A_j$, each starting in a different state $q \in Q_j$ and having $f''_j(q)$ as its only accepting state, and a single unmodified copy of a DNWA for $R_i$. Obviously, this product automaton is of exponential size and is nonempty if and only if there is a string $v \in R_i$ such that for each $j \in \{1, \dots, k, T\}$ and each $q_j \in Q_j$ it holds that $\delta_j^*(q_j, v) = f''_j(q_j)$, and emptiness for this product automaton can be checked in deterministic exponential time. Since EXPTIME = APSPACE, this yields the desired complexity for verification in case (c) above and completes the proof.

³ This technique is somewhat similar to (and inspired by) the algorithm for determinising nondeterministic NWA [AM09].
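The product construction can be illustrated in a deliberately simplified setting. The sketch below replaces DNWA by ordinary DFAs over flat words, so it is not the construction from the proof, only the same idea one level down: a product state holds, for every automaton and every start state, the state currently reached, plus one state of an automaton for the replacement language. All names (`find_inducing_word`, the encoding of automata as dictionaries) are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Fn = Tuple[int, ...]             # Fn[q] is the state reached from start state q
Delta = Dict[Tuple[int, str], int]


def find_inducing_word(alphabet: str,
                       automata: List[Tuple[int, Delta]],
                       targets: List[Fn],
                       repl: Tuple[Delta, int, Set[int]]) -> Optional[str]:
    """Search the product automaton for a word v in the replacement language
    whose induced transition function on automaton j equals targets[j].

    automata: list of (number of states, transition table) per DFA
    repl:     (transition table, start state, accepting states) of a DFA
              for the replacement language
    Returns a shortest witness word, or None if the product is empty.
    """
    r_delta, r_start, r_finals = repl
    goal = tuple(tuple(f) for f in targets)
    # One copy per automaton and per start state, all starting "at themselves".
    start = (tuple(tuple(range(n)) for n, _ in automata), r_start)
    seen = {start}
    queue = deque([(start, "")])
    while queue:
        (fns, r), word = queue.popleft()
        if r in r_finals and fns == goal:
            return word
        for c in alphabet:
            nxt_fns = tuple(tuple(delta[(p, c)] for p in f)
                            for f, (_, delta) in zip(fns, automata))
            nxt = (nxt_fns, r_delta[(r, c)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, word + c))
    return None
```

The number of reachable product states is bounded by the product of $|Q_j|^{|Q_j|}$ over all automata times the size of the replacement automaton, which is exponential in the input, in line with the bound used in the proof. Breadth-first search is chosen here only so that a shortest witness is returned.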

The corresponding lower bound proof once again follows one of the standard techniques for lower bounds: Reducing from the complement of an existence problem (in this case the intersection emptiness problem for XML Schema), tasking Romeo with providing a witness for existence (here: a nested word encoding a tree in the intersection), and having Juliet verify the correctness of this witness.

Proposition 7.3. For the class of validation games with target, validation and replacement languages specified by XML Schemas, and strategies without replay, JWin is EXPTIME-hard. This lower bound already holds for games with a DTD target language and one single function symbol, whose replacement language is given by a DTD.

Proof. The proof is by reduction from the intersection emptiness problem for XML Schema: Given XML Schemas $S_1, \dots, S_n$ over a common label alphabet $\Sigma$, is $L(S_1) \cap \dots \cap L(S_n) = \emptyset$? This problem is known to be EXPTIME-complete [MNS09].

The basic idea behind the reduction is to construct from $S_1, \dots, S_n$ a game in which Romeo chooses some string $v \in WF(\Sigma)$, and Juliet picks an index $i \in [n]$ such that $v \notin L(S_i)$; in this way, Romeo has a winning strategy (which basically consists of choosing a string from $L(S_1) \cap \dots \cap L(S_n)$) if and only if $L(S_1) \cap \dots \cap L(S_n)$ is nonempty. The rest of the proof consists of setting up this game $G$ and an input string $w$ such that the above idea can be carried out with a replay-free game with only a single function symbol, and making sure that neither player is able to win the game by cheating (for instance, Juliet not playing any Call moves and thereby denying Romeo the option of choosing a string $v$).

The input string is $w = \langle f\rangle^{n+1}\langle /f\rangle^{n+1}$, with $f \notin \Sigma$ being the only function symbol in $G$. The idea is that Juliet is supposed to Call the innermost $\langle /f\rangle$, to which Romeo replies with a string $v \in WF(\Sigma)$, leading to a string of the form $\langle f\rangle^{n} v \langle /f\rangle^{n}$. The validation language for $f$ should be set up such that it contains, for all $i \in [n]$, all strings from $L(S_i)$ enclosed in $i$ opening and closing $f$-tags. In this way, Juliet can provoke a return of $\langle \mathrm{err}\rangle\langle /\mathrm{err}\rangle$ to a Call on some $\langle /f\rangle$ if and only if the string $v$ given by Romeo is not in $L(S_i)$ for some $i$.

In order for Romeo to be able to return the string $v$, Juliet's first Call to the innermost $\langle /f\rangle$ must be valid; therefore, the validation language $V_f$ must contain the string $\langle f\rangle\langle /f\rangle$. At first glance, it might seem that having the validation language $V_f$ consist of $\langle f\rangle\langle /f\rangle$ as well as all $\langle f\rangle^{i} v \langle /f\rangle^{i}$ for all $i \in [n]$ and $v \in L(S_i)$ would suffice; however, with a validation language of this shape, Juliet would always be able to win the game by playing Call on some $\langle /f\rangle$ that is not the innermost, since it is Juliet's goal to provoke a return of $\langle \mathrm{err}\rangle\langle /\mathrm{err}\rangle$ and this naïve validation language would yield such an error on an input of multiple nested $f$ tags. Therefore $V_f$ must also contain all strings of the form $\langle f\rangle^{i}\langle /f\rangle^{i}$ for $i \in [n]$. In the above sketch, however, this would allow Romeo the option to cheat by always returning the empty string to any Call by Juliet, never allowing her to make an invalid Call. Therefore, we preface all replacement strings yielded by Romeo with a special symbol $\# \notin \Sigma$, i.e. the replacement language for $f$ is

$R_f = \{\langle \#\rangle\langle /\#\rangle v \mid v \in WF(\Sigma)\}$.

Combining this with our previous considerations, we obtain the following validation language for f :

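$V_f = \{\langle f\rangle^{i}\langle /f\rangle^{i} \mid i \in [n+1]\} \cup \{\langle f\rangle^{i}\langle \#\rangle\langle /\#\rangle v \langle /f\rangle^{i} \mid i \in [n],\ v \in L(S_i)\}$

Finally, the target language consists of all strings in which Juliet let Romeo choose a candidate string $v$ and successfully showed that $v \notin L(S_i)$ for some $i$:

$T = \{\langle f\rangle^{i}\langle \mathrm{err}\rangle\langle /\mathrm{err}\rangle\langle /f\rangle^{i} \mid i \geq 0\}$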

From the above considerations, it should be clear that Juliet has a winning strategy on $w$ in $G$ if and only if $L(S_1) \cap \dots \cap L(S_n) = \emptyset$. Furthermore, it is easy to see that DTDs for $R_f$ and $T$ can be efficiently constructed.
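To make the shape of $w$, $V_f$ and $T$ concrete, the following Python sketch encodes nested words as lists of tag strings and replays the play that the construction intends. It rests on several assumptions that are not in the original text: the schemas are given as hypothetical membership oracles (`schemas[i-1]` decides $L(S_i)$), an invalid Call is assumed to replace the called substring by $\langle \mathrm{err}\rangle\langle /\mathrm{err}\rangle$, and `intended_play` follows only the one strategy described in the proof sketch rather than searching over all strategies.

```python
from typing import Callable, List, Optional, Tuple

Word = List[str]                  # tags written as "<x>" and "</x>"
Oracle = Callable[[Word], bool]   # hypothetical membership test for some L(S_i)


def input_string(n: int) -> Word:
    """w = <f>^(n+1) </f>^(n+1)."""
    return ["<f>"] * (n + 1) + ["</f>"] * (n + 1)


def strip_f(word: Word) -> Optional[Tuple[int, Word]]:
    """Split word as <f>^i middle </f>^i; None if the outer f-tags do not match up."""
    i = 0
    while i < len(word) and word[i] == "<f>":
        i += 1
    j = 0
    while j < len(word) and word[len(word) - 1 - j] == "</f>":
        j += 1
    if i != j:
        return None
    return i, word[i:len(word) - i]


def in_Vf(word: Word, schemas: List[Oracle]) -> bool:
    """Membership in the validation language V_f defined above."""
    parsed = strip_f(word)
    if parsed is None:
        return False
    i, middle = parsed
    n = len(schemas)
    if not middle:
        return 1 <= i <= n + 1
    if middle[:2] == ["<#>", "</#>"] and 1 <= i <= n:
        return schemas[i - 1](middle[2:])
    return False


def in_T(word: Word) -> bool:
    """Membership in the target language T defined above."""
    parsed = strip_f(word)
    return parsed is not None and parsed[1] == ["<err>", "</err>"]


def intended_play(v: Word, schemas: List[Oracle]) -> bool:
    """Outcome for Juliet if Romeo answers her first Call with <#></#>v."""
    n = len(schemas)
    if not in_Vf(["<f>", "</f>"], schemas):  # first Call must be valid
        return False
    reply = ["<#>", "</#>"] + list(v)
    for i in range(1, n + 1):
        candidate = ["<f>"] * i + reply + ["</f>"] * i
        if not in_Vf(candidate, schemas):    # invalid Call: v is not in L(S_i)
            final = ["<f>"] * (n - i) + ["<err>", "</err>"] + ["</f>"] * (n - i)
            return in_T(final)
    # v lies in every L(S_i): Juliet can only Read, and the result is not in T.
    return in_T(["<f>"] * n + reply + ["</f>"] * n)
```

With two oracles whose languages share a string, Romeo can answer with that string and `intended_play` reports a loss for Juliet; any answer outside some $L(S_i)$ lets her reach $T$.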

An XML Schema $S_f$ for $V_f$ is also quite easy to construct: Assume without loss of generality that the type alphabets for $S_1, \dots, S_n$ (as defined in Section 2.2) are pairwise disjoint. We take additional types $f_i$ for each $i \in \{0, \dots, n\}$ which are all labelled with $f$, with $f_0$ being the root type for $S_f$, and add for each $i \in [n]$ rules $f_{i-1} \to f_i \mid \# r_i$, where $r_i$ is the root type of $S_i$. Finally, we add rules $f_n \to \varepsilon$ and $\# \to \varepsilon$. The content models of these additional rules are clearly deterministic, and since no $f_i$ or $\#$ appears in any $S_i$, the schema thus constructed is single-type. This concludes the proof.

In document Context-free games on strings and nested words (Page 143-147)