Determinacy and rewriting of functional top–down and MSO tree transformations

(1)

Determinacy and Rewriting of Functional

Top-Down and MSO Tree Transformations

I

M. Benedikta, J. Engelfrietb, S. Manethc,∗

a_{Department of Computer Science, University of Oxford, United Kingdom} b_{Leiden Institute of Advanced Computer Science, Leiden University, The Netherlands}

c_{School of Informatics, University of Edinburgh, United Kingdom}

Abstract

A query is determined by a view, if the result of the query can be reconstructed from the result of the view. We consider the problem of deciding for two given (functional) tree transformations, whether one is determined by the other. If the view transformation is induced by a tree transducer that may copy, then determi-nacy is undecidable. For a large class of noncopying views, namely compositions of extended linear top-down tree transducers, we show that determinacy is decid-able, where queries are either deterministic top-down tree transducers (with regular look-ahead) or deterministic MSO tree transducers. We show that if a query is de-termined by a view, then it can be rewritten into a query working over the view which is in the same class of transducers as the query. The proof relies on the de-cidability of equivalence for the considered classes of queries, and on their closure under composition.

Keywords: view-query determinacy, top-down tree transducers, MSO definable tree transducers

1. Introduction

Given a transformation between data structures, i.e., a function from data struc-tures to data strucstruc-tures, a basic question is what sort of information it preserves. In some contexts, one desires a transformation that is “fully information-preserving”: one can recover the input structure from the output structure. In other cases it may be acceptable, or even important, to hide certain pieces of information in the

in-put; but necessarily there issomeimportant information in the input that must be

recoverable from the output. This notion has been studied in the database commu-nity [1, 2]: a queryqis determined by another queryvif there exists a function f

I_{Benedikt and Maneth were supported by the Engineering and Physical Sciences Research}

Coun-cil project “Enforcement of Constraints on XML streams” (EPSRC EP/G004021/1).

∗

Corresponding author

Email addresses:[email protected](M. Benedikt),

(2)

such thatq=v◦f, where◦denotes sequential composition, i.e.,q(s)= f(v(s)) for every inputs. The queryvis referred to as “view”. Note that nothing is said about

how efficiently f can be computed (or if it can be computed at all). We can then

strengthen determinacy by requiring the function f to lie within a certain classR; then f is a “rewriting inR”. These notions have received considerable attention in the database setting [1, 2, 3, 4, 5].

In this paper we study determinacy and rewriting for classes of tree transforma-tions (or, tree translatransforma-tions), as in [6, 7]. Injectivity is undecidable for deterministic top-down tree transducers [8, 9]; hence, one cannot decide if the identity query is determined by such a transducer. This even holds for transducers that copy the input tree only once. Therefore, as in [6, 7], we restrict our attention to views

in-duced bylinear tree transducers, i.e., noncopying tree transducers. For the same

reason we restrict to a single view (while in database research, normally multiple views are considered). Our main result is that determinacy is decidable for views that are compositions of extended linear top-down tree transducers (with regular look-ahead) and for queries that are either deterministic top-down tree transducers

(with regular look-ahead) or deterministicmsotree transducers (wheremsomeans

“definable in Monadic Second-Order logic”). Extended transducers generalize the left-hand sides of conventional finite-state tree transducers, from one input symbol to an arbitrary “pattern tree”. They were introduced in [10] and have recently been studied in [11, 12, 13, 14]. Extended linear transducers are convenient because (1) they are more powerful than ordinary linear top-down or bottom-up transducers and (2) they allow to elegantly capture the inverses of translations.

As an example of determinacy, consider the transformation v taking binary

trees as input, with internal nodes labeleda,b,c, and leaves labeledl. It relabels

the b nodes asa nodes, and otherwise copies the tree as is. A linear top-down

tree transducer realizing this translationvhas a single statepand these translation rules:

p(a(x,y)) → a(p(x),p(y)) p(b(x,y)) → a(p(x),p(y))

p(c(x,y)) → c(p(x),p(y)) p(l) → l

Letq₀be the identity query, i.e.,q₀(s)= sfor every input trees. Since information about the (labels of)bnodes andanodes is lost in the viewv, from the outputv(s)

we cannot determine the answer sto the queryq0. In contrast, information about

thelnodes and their relationship tocnodes is maintained. Consider, for example,

the following query q1 that removes a and b nodes (and their second subtrees),

realized by the top-down tree transducer with the following translation rules:

p(a(x,y)) → p(x) p(b(x,y)) → p(x)

p(c(x,y)) → c(p(x),p(y)) p(l) → l

Clearly,q1is determined byv. In fact,q1 =v◦ f where f is realized by the same

(3)

Our decision procedure for determinacy establishes several results that are

in-teresting on their own. For a viewvrealized by an extended linear top-down tree

transducer, its inverse v−1 is a binary relation on trees. Our approach converts

v−1 into a composition of two nondeterministic translations, a translationτ1 of a

very simple form and a translationτ2 in the same class asv. We then construct

uniformizersu₁,u₂ofτ1, τ2 and compose them to form a uniformizeruofv−1. A uniformizerof a binary relationRis a functionusuch thatu⊆Randuhas the same domain asR; thususelects one of the possibly several elements thatRassociates with an element of its domain. It is easy to see that a queryqis determined byvif and only ifq=v◦u◦q, whereuis any uniformizer ofv−1. We show that ifqis a deterministic top-down ormsotree translation, then so isv◦u◦q. This is achieved

by proving thatu₁,u₂, andvare deterministic top-downand_msotree translations.

Since our two query classes are closed under composition andu = u1◦u2, this

shows thatv◦u◦qis in the same class asq. We then decideq = v◦u◦q, and hence determinacy, making use of the decidability of equivalence for deterministic

top-down ormsotree translations ([15, 16] or [17]). The same proof also shows

that ifqis determined byv, thenu◦qis a rewriting belonging to the same class as the queryq.

The results of this paper were first presented at MFCS 2013 (see [18]).

Related Work. The notion of view-query determinacy was introduced by Segoufin and Vianu in [1]. They focus on relational queries definable in first-order logic and show that if such queries are determined over arbitrary structures, then they can be rewritten in first-order logic, but that if they are determined over fi-nite structures, they may require a much more powerful type of relational query to be rewritten. Nash, Segoufin, and Vianu [2] summarize a number of other results

on the relational case. Due to the differing data models and notions of equality

used in relational queries and tree transducers, results on determinacy for queries in the relational case do not (directly) apply to tree transducers, and vice versa. In the context of unranked trees, determinacy is considered in Groz’s thesis [19] for XML views and queries, see also [20]. Two notions of determinacy are consid-ered, depending on whether or not the output trees preserve provenance informa-tion (i.e., node identities) from the input document. It is shown that both noinforma-tions of determinacy are undecidable for views and queries defined using a transformation language that can select subtrees using regular XPath filters. On the positive side, it is shown that if the views are “interval-bounded” – there is a bound on the num-ber of consecutive nodes skipped along a path – then determinacy can be tested effectively.

The most related work is that of Hashimoto et al. [6, 7], who consider the

(4)

the functionality test for bottom-up tree transducers in [21] to extended bottom-up tree transducers with “grafting” (needed for the inverse of the view). Our main result generalizes the above result of [6, 7], and provides an alternative proof of it. In fact, the classes of views and queries for which we can decide determinacy are larger than those of [6, 7]. In particular, our class of queries is much larger than their class, as top-down tree transducers and MSO tree transducers are much more powerful than bottom-up tree transducers (for defining partial functions). As explained above, for a given viewvand queryq, rather than testing functionality ofv−1◦q, our approach is to test the equivalenceq = v◦u◦q, whereuis a uni-formizer ofv−1. This implies that, for the fixed class of views, our approach is in fact applicable to any class of queries that has a decidable equivalence problem and is closed under left-composition with everyvandu. This is discussed in more detail in Section 7.2 and in the Conclusion.

2. Preliminaries

Fork ∈ _N= {0,1, . . .}let [k] denote the set{1, . . . ,k}. For a binary relationR

we denote byR−1its inverse{(y,x)| (x,y)∈R}. For a setA,R(A) ={y| ∃x ∈A: (x,y) ∈ R}, and for an element x,R(x) = R({x}). IfR ⊆ B×C for setsBandC, then the range ofRisran(R)=R(B) and the domain ofRisdom(R)=R−1(C). For

two relationsRandS we denote their composition “Rfollowed byS” byR◦S,

i.e., for every element x, (R◦S)(x) = S(R(x)). Note that this is in contrast to the conventional use of◦. IfR,Sare classes of binary relations and n ≥ 1, then

R ◦ S = {R◦S | R ∈ R,S ∈ S}, Rn = {R1◦ · · · ◦Rn | Ri ∈ Rfori ∈ [n]},

R∗=S

m≥1Rm, andR−1={R−1|R∈ R}. A relationRisfunctionalif it is a partial

function, i.e., ifR(x) is either empty or a singleton, for every element x. IfR is a class of relations, then fu-R denotes the class of all partial functions inR. The identity on a setAis the functionidA ={(x,x) | x∈ A}. For a relationRand a set

A, the domain restriction ofRtoAisidA◦R= {(x,y) ∈R| x ∈ A}, and the range

restriction ofRtoAisR◦idA ={(x,y)∈R|y∈A}.

Determinacy. We define determinacy and rewritability, following [2]. Let

Q,Vbe classes of partial functions and let q ∈ Q andv ∈ V. We say thatq is

determined by v, if there exists a partial function f such thatq = v◦ f. Note that the latter means that the domains ofqandv◦ f coincide, and thatq(s) = f(v(s))

for each sin that domain. Determinacy forQunderVis the problem that takes

as inputq ∈ Qandv ∈ V and outputs “yes” if qis determined by v, and “no”

otherwise. Determinacy says that there is a functional dependency ofqonv, with

no limit on how complex it is to reconstruct the answer toq from the answer to

v. A finer notion requires that the reconstruction be in a given class: a classRof

partial functions iscomplete forV-to-Qrewritings, if for everyq ∈ Qandv ∈ V

such thatqis determined byv, there is an f ∈ Rwithq=v◦ f.

(5)

that rank_Σ(a) = k and define Σ(k) as the set {a ∈ Σ | rank_Σ(a) = k}. The set of (ranked, ordered, node-labeled, finite) trees overΣ, denoted byT_Σ, is the set of strings defined recursively as the smallest setT such thata(s₁, . . . ,sk) ∈ T if

a∈ Σ(k)_,_k _≥_{0, and} _s

1, . . . ,sk ∈T. For a treea() we simply writea. For a setT0

of trees, we denote byT_Σ(T0) the smallest set of treesT containing T0 such that

a(s₁, . . . ,sk)∈T ifa∈Σ(k),k≥0, ands1, . . . ,sk ∈T. Thus,TΣ=TΣ(∅). A subset

ofT_Σis called a tree language overΣ.

We represent the nodes of a tree as usual by Dewey notation, i.e., by strings

of positive natural numbers. The empty stringεrepresents the root node and, for

i∈_N₊={1,2, . . .},virepresents theith child of the nodev. Every nodevof a tree

shas a label inΣ, denotedlab(s,v). Formally, the setV(s)⊆N∗₊of nodes of the tree sis inductively defined as:V(s)= {ε} ∪ {iv|i∈[k],v∈V(si)}ifs=a(s1, . . . ,sk),

a ∈ Σ(k)_{, and} _s

1, . . . ,sk ∈ TΣ; moreover, lab(s, ε) = a andlab(s,iv) = lab(si,v).

The subtree of s rooted at v ∈ V(s) is denoted by s/v; formally, s/ε = s and if

s=a(s₁, . . . ,sk) thens/iv= si/v. The size of a tree is the number of its nodes.

LetT1, . . . ,Tnbe sets of trees. For treess1, . . . ,snthat are not subtrees of each

other, and a trees, we denote bys[si ←Ti|i∈[n]] the set of trees obtained froms

by replacing each occurrence of a subtreesiofsby a tree fromTi(where different

occurrences ofsi need not be replaced by the same tree). If everyTiis a singleton

{ti}, then we writes[si ←ti|i∈[n]].

We fix a countably infinite setX={x1,x2, . . .}ofvariables, which are assumed

to have rank 0. Fork ∈_N, letXk ={x1, . . . ,xk}. A trees∈TΣ(X) islinearif each

variable xi occurs at most once in s. For k ∈ N, an Xk-context (overΣ) is a tree

C inT_Σ(Xk) such that each variable xi ∈ Xk occurs exactly once inC. For such a

contextC and trees s₁, . . . ,sk,C[s1, . . . ,sk] denotes the treeC[xi ← si | i ∈ [k]].

For a ranked alphabetQwithQ(1)= Qwe denote byQ(Xk) the set of trees{q(xi)|

q∈Q,i∈[k]}.

LetΣand∆be ranked alphabets. Atree homomorphism f fromT_Σ toT_∆ is

given by a mapping that assigns a tree fa ∈ T∆(Xk) to everya ∈Σ(k),k ≥ 0. The

mapping f is then defined by f(a(s1, . . . ,sk)) = fa[xi ← f(si) | i∈ [k]] for every

a ∈ Σ(k), k ≥ 0, and s₁, . . . ,sk ∈ TΣ. The tree homomorphism f islinear if fa

is linear, i.e., each variable xi occurs at most once in fa, nondeletingif every xi,

i∈[k], occurs at least once in fa, andnonerasingif fa < Xk, for alla ∈Σ(k) with

k≥0.

A (deterministic bottom-up)finite tree automaton (fta, for short) over Σis a

tupleA=(P,Σ,F, δ) wherePis a finite set of states,Σis a ranked alphabet,F⊆ P

is the set of final states, andδis the transition function. For everya ∈Σ(k),k ≥0, and p1, . . . ,pk ∈ P, δ(a,p1, . . . ,pk) is an element of P. The transition function

δgives rise to a mapping δ∗ from T_Σ to P. It is defined by δ∗(a(s1, . . . ,sk)) =

δ(a, δ∗(s₁), . . . , δ∗(sk)) fora ∈ Σ(k),k ≥ 0, and s1, . . . ,sk ∈ TΣ. For convenience,

we denote the functionδ∗as well byδ. Thus, for a nodevof an input trees,δ(s/v) is the state reached byAat nodev, whenAprocessess. The language accepted by

(6)

We will use the following elementary and well-known fact.

Lemma 1. LetA1, . . . ,Anbe ftas overΣ,n≥1. There exist ftasA0₁, . . . ,A0noverΣ

such that

(1) A0₁, . . . ,A0_nhave the same set of states and the same transition function, and (2) L(A0_i)=L(Ai) for everyi∈[n].

Proof. LetAi = (Pi,Σ,Fi, δi) for everyi∈[n]. We now defineA0_i = (P0,Σ,F0_i, δ0)

to be the usual product ofAiand all automata (Pj,Σ,Pj, δj) for j, i. That means

thatP0 =P1× · · · ×Pn,F_i0= P1× · · · ×Pi−1×Fi×Pi+1× · · · ×Pn, and, fora∈Σ(k),

δ0

(a,(p1,1, . . . ,pn,1), . . . ,(p1,k, . . . ,pn,k))=

(δ1(a,p1,1, . . . ,p1,k), . . . , δn(a,pn,1, . . . ,pn,k)).

Thus, A0_i ignores all components of a state except theith. It should therefore be

clear thatL(A0_i)=L(Ai).

For an ftaA=(P,Σ,F, δ) we extend the transition functionδto trees inT_Σ(Xk),

in particular toXk-contexts, as follows. Let p1, . . . ,pk ∈ P. Fori∈[k], we define

δ(xi,p1, . . . ,pk) = pi. For a ∈ Σ(n), n ≥ 0, and s1, . . . ,sn ∈ TΣ(Xk), we define

δ(a(s1, . . . ,sn),p1, . . . ,pk) = δ(a, δ(s1,p1, . . . ,pk), . . . , δ(sn,p1, . . . ,pk)). Clearly,

for anXk-contextC and trees s1, . . . ,sk ∈ TΣ, δ(C[s1, . . . ,sk]) = δ(C,p1, . . . ,pk)

whereδ(si)= pi fori∈[k].

Convention: All lemmas, propositions, theorems, etc., stated in this paper (except in Section 5) areeffective.

3. Extended Top-Down Tree Transducers

In this section we define extended top-down tree transducers (with regular look-ahead), prove a normal form for them, and discuss a number of their properties. Extended top-down tree transducers are studied in, e.g., [10, 11, 12, 13, 14].

3.1. Definition

An extended top-down tree transducer with regular look-ahead (etR

trans-ducer, for short) is a tuple M = (Q,Σ,∆,I,R,A) where Qis a ranked alphabet

of states all of rank 1,Σand∆are ranked alphabets of input and output symbols,

respectively, I ⊆ Qis the set of initial states,A = (P,Σ,F, δ) is an fta called the

look-ahead automaton, andRis a finite set of rules of the form

q(C)→ζ hp1, . . . ,pki

where q ∈ Q, k ≥ 0, C , x1 is an Xk-context over Σ, ζ ∈ T∆(Q(Xk)), and

p1, . . . ,pk ∈P. For every stateq∈Qwe define theq-translation[[q]]M ⊆TΣ×T∆

as follows. For an input tree s ∈ T_Σ, theq-translation [[q]]M(s) of s is the

(7)

s1, . . . ,sk ∈TΣ, ifs=C[s1, . . . ,sk] andδ(si)= pifor everyi∈[k], thenT contains

the set of trees

ζ[q0(xi)←[[q0]]M(si)|q0∈Q,i∈[k]].

Thetranslation[[M]]realized by Mis the binary relation

{(s,t)∈T_Σ×T_∆| s∈L(A),t∈ ∪q∈I[[q]]M(s)}.

The class of all translations realized byetRtransducers is denotedetR(and

simi-larly for other types of transducers). TwoetRtransducersM1andM2are equiva-lentif [[M1]]=[[M2]].

Since we do not allow the left-hand side of a rule to be of the formq(x₁) with

q∈Q, the application of a rule always consumes part of the input tree, i.e., ouretR

transducers are “epsilon-free” (cf. [11, 12, 14]).

The etR transducer M is linear (or noncopying), if the right-hand side ζ of

each rule is linear, i.e., each variablexi occurs at most once inζ. We use “l” to

abbreviate “linear”, i.e., aneltRtransducer is a linearetRtransducer, and soeltR

is the class of translations realized by linearetRtransducers.

Transducerswithout look-ahead are defined by transducers with a trivial

one-state look-ahead automaton (acceptingT_Σ); this is indicated by omitting the super-script “r” for transducers and classes. In the notation for such transducers we omit

the look-ahead automaton and we omit the sequence of look-ahead states at the end of each rule.

TheetRtransducerMis an (ordinary, not extended)top-down tree transducer

with regular look-ahead(tRtransducer) if the left-hand side contextCof each of its rules contains exactly one symbol inΣ, more precisely, each rule is of the form

q(a(x₁, . . . ,xk))→ζhp1, . . . ,pkiwitha∈Σ(k)andk≥0.

A tR transducer M is finite-copying (a tR_fc transducer) if each input node is

translated only a bounded number of times. Formally this means there exists a numberK such that for every p ∈ P, s ∈ T_Σ({}), andt ∈ [[Mp]](s), ifoccurs

exactly once in s, thenoccurs≤ K times int; here we assume thatis a new

input and output symbol of rank 0, and thatMpisMextended with the look-ahead

transitionδ()= pand the rulesq()→for every stateq.

AtRtransducer isdeterministicif it has exactly one initial state and for each

q,a, andhp1, . . . ,pkiit has at most one rule as above. Determinism is denoted by

the letter “d”, thus we havedtR anddtR_fctransducers. Note that their translations

are functional.

Example 2. LetΣ = ∆ = {a(3),b(2),e(0)}. We will define aneltRtransducerM =

(Q,Σ,∆,I,R,A) such that

[[M]]={(a(b(s₁,s₂),b(s₃,s₄),e),a(s₁,s₂,s₃))|s₁,s₂,s₃,s₄∈T{b,e}}.

This translation cannot be realized by anltR transducer (as can easily be shown

using the proof of [12, Lemma 4.7] and the fact that ltR is closed under

(8)

LetA = (P,Σ,F, δ) with P = F = {p,p} and with δ(e) = p, δ(b,p,p) = p and δ(. . .) = p in all remaining cases. Thus, for s ∈ T_Σ,δ(s) = pif s ∈ T{b,e}, and

δ(s) = potherwise. The set of states ofMisQ = {q₀,q}, whereq₀ is the unique initial state, i.e.,I = {q₀}. The rules ofMare defined as follows. First,Rcontains the rule

q₀(a(b(x₁,x₂),b(x₃,x₄),e))→a(q(x₁),q(x₂),q(x₃))hp,p,p,pi.

Second,Rcontains the rulesq(b(x1,x2))→b(q(x1),q(x2))hp0,p00ifor all p0,p00 ∈ {p,p}, and the ruleq(e) → e. Clearly, [[q]]M is the identity on T{b,e}, and [[M]] =

[[q0]]Mis the required translation.

3.2. Dead Ends

In this subsection we prove a simple normal form foretR transducers, to be

used in the proofs of Lemmas 25 and 34. We will need the following

elemen-tary fact, that allows us to add as much regular look-ahead information to anetR

transducer as we wish. It is an obvious extension of Lemma 1.

Lemma 3. LetM =(Q,Σ,∆,I,R,A₁) be anetRtransducer, and letA2, . . . ,Anbe

ftas overΣ,n≥ 2. There exist anetRtransducer M0 = (Q,Σ,∆,I,R0,A0₁) and ftas

A0₂, . . . ,A0_noverΣsuch that

(1) A0₁,A0₂, . . . ,A0_nhave the same set of states and the same transition function, (2) L(A0_i)=L(Ai) for everyi∈[n],

(3) [[q]]M0 =[[q]]_M for everyq∈Q, and [[M0]]=[[M]], and

(4) ifMis aneltRtransducer, atRtransducer, or atR_fctransducer, then so isM0.

Proof. LetAi =(Pi,Σ,Fi, δi) for everyi∈[n]. We defineA0_i =(P0,Σ,F0_i, δ0) as in

the proof of Lemma 1 (withP0 =P1× · · · ×Pn).

Moreover, ifq(C) → ζ hp1, . . . ,pkiis a rule of M, then the setR0 of rules of

M0contains all rulesq(C) →ζ hp0₁, . . . ,p0_kisuch that the first component of p0_i is

pi for everyi∈[k]. Thus,M0ignores all components of a look-ahead state except

the first.

It should be clear thatM0andA0₂, . . . ,A0_nas defined above, satisfy the require-ments. For the finite-copying property, note that [[M₍0_p_1,...,_p

n)]](s)=[[Mp1]](s) for all

pi ∈Pi ands∈TΣ({}).

AnetRtransducerMiswithout dead endsif the following holds for every rule

q(C) → ζ hp1, . . . ,pki, everyq(xi) that occurs inζ (withq ∈ Qandi∈ [k]), and

everys∈T_Σ: ifδ(s)= pi, thens∈dom([[q]]M). Intuitively this means thatMdoes

not have unsuccessful computations.

Lemma 4. For everyetRtransducer M there is an equivalentetR transducer M0

without dead ends. IfMis aneltR,tR, ortRfctransducer, then so isM

0_{. Moreover,}

(9)

Proof. Let M = (Q,Σ,∆,I,R,A) with A = (P,Σ,F, δ). Since dom([[q]]M) is a

regular tree language for everyq ∈ Q(because etR = tR by Lemma 11 and the

domain of atRtranslation is regular by [22, Corollary 2.7]), it is easy to transform

MintoM0by adding appropriate look-ahead information, as follows.

For everyq∈Q, letAq= (P,Σ,Fq, δ) be an fta such thatL(Aq)= dom([[q]]M).

Note that by Lemma 3 we can assume that these automata and the automaton A

have the same set of states and the same transition function. We letM0 be theetR

transducer (Q,Σ,∆,I,R0,A), whereR0 is the set of rulesq(C) →ζ hp1, . . . ,pkiin

Rsuch that ifq(xi) occurs inζthenpi ∈Fq.

It should be clear that [[M0]] = [[M]]. Obviously, if M is aneltR, tR, ortR_fc

transducer, then so is M0. For the finite-copying property, note that [[M0_p]](s) ⊆

[[Mp]](s) for every p∈Pands∈TΣ({}).

Example 5. Let Σ = ∆ = {a(1),b(0),c(0)}. Let A = (P,Σ,F, δ) be the fta with P={p_e,po}andF={pe}such that, fors∈TΣ,δ(s)= peif and only if the number

ofa’s insis even. We define anltRtransducerM=(Q,Σ,∆,I,R,A) such that

[[M]]={(a2nb,a2nb)|n≥0} ∪ {(a2nc,c)|n≥0}.

LetQ= I = {qb,qc}, and letRcontain the rulesqb(b) →bandqc(c)→c, and for

everyp∈ {p_e,p_o}the rulesqb(a(x1))→a(qb(x1))hpiandqc(a(x1))→qc(x1)hpi.

Clearlydom([[qb]]M) = T{a,b}anddom([[qc]]M) = T{a,c}. So,Misnotwithout dead

ends; for instance, the computation ofMwith input treea(a(c)) that starts in initial stateqbis unsuccessful.

Now letBbe the fta with two statespbandpc, both final, such thatδB(anb)= pb

andδB(anc) = pc for everyn ≥ 0 (whereδB is the transition function of B), and

let A0 be the usual product of A and B (see the proof of Lemma 1). Moreover,

let M0 = (Q,Σ,∆,I,R0,A0) be the ltR transducer with the rules qb(b) → b and

qc(c)→c, and for everyp∈ {pe,po}the rulesqb(a(x1))→a(qb(x1))h(p,pb)iand

qc(a(x1)) →qc(x1) h(p,pc)i. It is easy to see that M0is equivalent to Mand that

M0is without dead ends.

3.3. Properties

One of the two classes of queries that we are interested in, isdtR. We will need

the following two basic results. The first is from [22, Theorem 2.11], the second is proved in [15] (see also [16]).

Proposition 6. _dtRis closed under composition.

Proposition 7. It is decidable for twodtRtransducers whether they are equivalent.

As we will show in Section 5, determinacy is undecidable if the view transduc-ers copy. Therefore, when we prove determinacy results in Section 7, we define views using linear transducers. To be precise,

(10)

i.e., the class of all partial functions that are compositions of translations ineltR.

This class properly contains fu-eltRby the proof of Theorem 5.2 of [12] (see also

the proof of Theorem 34 of [14]). It will follow from Proposition 6 and Corol-lary 33 (in Section 6.3) that the class of views fu-(eltR)∗is properly contained in

the class of queriesdtR.

From [11, Theorem 17] we recall a useful characterization of eltR as a class

of so-called bimorphisms. We note that in [11] the classeltRis denoted l-xtopRef,

where “ef” stands for “epsilon-free”.

Proposition 8. _eltRis the class of all tree translations{(f(r),g(r))|r ∈R}where

f is a linear nondeleting nonerasing tree homomorphism,gis a linear tree

homo-morphism, andRis a regular tree language.

From this proposition we obtain some basic properties ofeltR. They are

imme-diate from the fact that the class of regular tree languages is closed under inverse tree homomorphisms, intersection, and linear tree homomorphisms (noting that

{(f(r),g(r))|r∈R}= f−1◦idR◦g).

Corollary 9. Ifτ∈_eltR_and_R_{is a regular tree language, then}_τ(_R_{) and}_τ−1₍_R_{) are}

regular tree languages. In particular,dom(τ) andran(τ) are regular tree languages.

For the next corollary we observe that ifτ = {(f(r),g(r)) | r ∈ Rτ}, then its domain restriction to the setRis

idR◦τ={(f(r),g(r))|r ∈Rτ∩ f−1(R)},

and its range restriction toRis

τ◦idR ={(f(r),g(r))|r∈Rτ∩g−1(R)}.

Corollary 10. Ifτ∈_eltR_and_R_{is a regular tree language, then}_id

R◦τandτ◦idR

are ineltR.

It is shown in [12, Theorem 4.8] thatetR = tR. The importance of extended

top-down tree transducers is that in the linear case they are more powerful than

ordinary top-down tree transducers, i.e.,ltR ( eltR, see Lemmas 4.6 and 4.7 of

[12] (and Example 2). Here we show, using the construction in the proof of [12, Theorem 4.8] thateltR is included in tR_fc (as already observed before [11,

Theo-rem 19]). The inclusion is proper by Corollary 9 becausetR_fccontains translations

that do not preserve regularity.

Lemma 11. _etR ₌

tRandeltR (tR_fc.

Proof. We use the construction in the proof of [12, Theorem 4.8], in our

termi-nology, for an etR transducer M = (Q,Σ,∆,I,R,A) with A = (P,Σ,F, δ). Let

(11)

let V be the set of all nodes of subtrees of those contexts, i.e., the union of all

V(C/v) withC ∈ Candv∈V(C). For anXk-contextC∈ Candp1, . . . ,pk ∈P, let

AC,p1,...,pkbe an fta overΣsuch that

L(AC,p1,...,pk)={C[s1, . . . ,sk]| si ∈TΣ, δ(si)= pifori∈[k]}.

SinceCandPare finite, Lemma 1 shows that there exist ftasA0 = (P0,Σ,F0, δ0) andA0_C_,_p_1,...,_p

k = (P

0_,_Σ_,_F

C,p1,...,pk, δ

0_{) such that}_L₍_A0₎ ₌ _L₍_A_{) and} _L₍_A0

C,p1,...,pk) = L(AC,p1,...,pk). Note that these automata have the same set of statesP

0_{and the same}

transition functionδ0.

We now construct thetRtransducerM0 =(Q× V,Σ,∆,I× {ε},R0,A0) with the

following rules.

(i) Ifq(C)→ζhp1, . . . ,pkiis a rule inRandC=a(C1, . . . ,Cm) witha∈Σ(m),

m≥0, andC1, . . . ,Cm∈TΣ(Xk), thenR0contains all rules

hq, εi(a(x1, . . . ,xm))→ζ0 hp0₁, . . . ,p0mi

such thatδ0(a,p0₁, . . . ,p0_m)∈FC,p1,...,pk and

ζ0₌_ζ[

q0(xi)← hq0,vi(xj)|q0∈Q,i∈[k],j∈[m],v∈V(Cj),lab(Cj,v)= xi].

Note that sinceCis anXk-context, jandvare uniquely determined byi.

(ii) Ifa∈ Σ(m)_,_m_≥ _{0, and}_hq_,_{ivi ∈} _Q_{× V}_with_i _∈_[_m_{] and}_v _∈

N∗₊, thenR0

contains all rules

hq,ivi(a(x1, . . . ,xm))→ hq,vi(xi)hp0₁, . . . ,p0mi

withp0₁, . . . ,p0_m∈P0. Thus, the look-ahead is irrelevant for these rules.

It should be intuitively clear that M0is equivalent to M. In fact, it is straight-forward to show by induction on the structure of s ∈ T_Σ that [[hq,vi]]M0(s) = [[q]]M(s/v) if v ∈ V(s) and [[hq,vi]]M0(s) = ∅ otherwise. Hence [[hq, εi]]_M0(s) = [[q]]M(s) for everyq∈I; sinceL(A0)=L(A), this shows that [[M0]]=[[M]].

It remains to prove thatM0is finite-copying ifMis linear. LetK∈N+be such that ifq(C)→ζhp₁, . . . ,pkiis a rule ofM, thenk ≤K. Considerhq,vi ∈Q× V,

p ∈ P0, s ∈ T_Σ({}) with one occurrence of, and t ∈ [[hq,vi]]M0

p(s). We claim

that the number of occurrences ofintis

(1) ≤K ifv∈V(s) andoccurs ins/v

(2) =0 ifv∈V(s) anddoes not occur in s/v, and

(3) =1 ifv<V(s) andvhas a prefixwsuch thatlab(s,w)=.

Note that ifv_<V(s) andvhas no prefixwwithlab(s,w)=, then [[hq,vi]]M0p(s)=

∅ (by the rules of M0 of type (ii)). Note also that (again by the rules of M0 of

type (ii)) ifv ∈ V(s), thent ∈ [[hq, εi]]M0

p(s/v). That proves Claim (2). We prove

Claims (1) and (3) by induction on the structure ofs. Ifs =thent= , and so

(12)

We first consider the case thatv,ε, i.e.,v= iv0 withi∈[m] andv0 ∈N∗₊. By

a rule of type (ii),t ∈[[hq,v0i]]M0p(si). Clearly, ifvands satisfy the conditions of

Claim (1), then so dov0andsi, and hence Claim (1) is true by induction. The same

holds for Claim (3).

It remains to consider the case that v = ε, for which we only have to prove

Claim (1). Lethq, εi(a(x₁, . . . ,xm))→ζ0 hp0₁, . . . ,p0mibe the rule ofM0of type (i)

that is applied to s to obtain t, constructed from a rule q(C) → ζ hp₁, . . . ,pki

of M. Let hq0₁,v1i(xj1), . . . ,hq 0

n,vni(xjn) be all occurrences of subtrees of ζ

0 _in

(Q× V)(Xm). Note that since M is linear,n ≤ k ≤ K. We also observe that all

sequences jrvr,r ∈ [n], are different and, in fact, no such sequence is a prefix of

another one, because they are different leaves ofC(with label inXk). So, there are

t₁, . . . ,tnsuch thatt =ζ0[hq0r,vri(xjr) ←tr | r ∈[n]] andtr ∈[[hq

0

r,vri]]M0

p(sjr) for

everyr∈[n]. We now consider two subcases.

Case 1: There is nor ∈[n] such thatvr ∈V(sjr) andoccurs insjr/vr. Then,

by induction, the number of occurrences ofintris≤ 1 by Claims (2) and (3),

and so the number of occurrences ofintis≤n≤ K.

Case 2: There existsr0 ∈[n] such thatvr0 ∈ V(sjr₀) andoccurs in sjr₀/vr0. Thenoccurs in s/jr0vr0 and so we obtain by the above observation, for every

r ,r0, thatdoesnotoccur insjr/vr (ifvr ∈V(sjr)) andvr hasnoprefixwsuch

thatlab(sjr,w)= (ifvr<V(sjr)); hence, by induction,does not occur intrby

Claim (2). By induction and Claim (1), tr0 has at most K occurrences of, and

hence so hast.

Example 12. Consider theeltR transducer M = (Q,Σ,∆,I,R,A) of Example 2. Let Bbe the fta with states pa,pb,pe, all final, such that, for all s1,s2,s3 ∈ TΣ,

δB(a(s1,s2,s3)) = pa, δB(b(s1,s2)) = pb, and δB(e) = pe. Let A0 be the usual

product ofAandB, with set of statesP0. Then an equivalenttR_fc transducer M0 =

(Q0,Σ,∆,I0,R0,A0) has set of statesQ0 = {q0,q,q1,q2}, withI0 = I = {q0}, where q₀andqstand forhq₀, εiandhq, εi, andqistands forhq,ii. Moreover,M0has the

rule

q0(a(x1,x2,x3))→a(q1(x1),q2(x1),q1(x2))h(p,pb),(p,pb),(p,pe)i,

for allp0,p00∈P0the rules

q1(b(x1,x2)) → q(x1)hp0,p00i, q₂(b(x₁,x₂)) → q(x₂)hp0,p00i,

q(b(x1,x2)) → b(q(x1),q(x2))hp0,p00i,

and the ruleq(e)→e.

Since our views are taken from the class fu-(eltR)∗, the next proposition is

relevant.

Proposition 13. For an eltR transducer M it is decidable whether [[M]] is

(13)

Proof. By Lemma 11,etR = tR. The result follows because functionality is

de-cidable for tR transducers by [15] (see the sentence after Theorem 8 of [15]).

However, we do not know whether functionality is decidable for the

composi-tion of the translacomposi-tions realized by a given sequence ofeltR transducers. On the

other hand, we will show in Corollary 36 (in Section 6.3) that our class fu-(eltR)∗is

in fact equal to the class (fu-eltR)∗of compositions of functionaleltRtranslations.

Thus, sequences of functionaleltRtransducers form an effective representation of

the class of views fu-(eltR)∗. We observe also that three such transducers suffice:

it is shown in [14] that (eltR)∗=(eltR)3.

We finally note that, by [14, Lemma 15], (eltR)∗ = elt∗and hence our class

fu-(eltR)∗of views equals fu-elt∗.

4. MSO Definable Tree Translations

A deterministicmsograph transducer is a device that transforms graphs by

us-ing formulas from monadic second-order logic to define the output graph in terms of a finite number of copies of the input graph. For a formal definition see, e.g., [23, Chapter 7] where it is called a parameterless monadic second-order definition scheme. We say that it is adeterministic_msotree transducer(dmsottransducer) if

it transforms trees into trees, see, e.g., [23, Chapter 8]. We note that the translation

realized by a dmsottransducer is functional. The class dmsot of these

transla-tions is the second of the two classes of queries that we are interested in. It is investigated, e.g., in [24, 25, 26, 27].1 It is incomparable with the class of queries

dtR.

We will only need the following three results ondmsot. The first result is easy

to prove, but also follows from [25, Theorem 7.1]. The second result is well known, see, e.g., [23, Theorem 7.14]. The third result is from [17, Corollary 10].

Proposition 14. _dtR_fc⊆_dmsot.

It is proved in [26, Theorem 7.4] that, in fact,dtR_fcis the intersection of our two

classes of queries, i.e.,dtR_fc =dtR∩dmsot.

Proposition 15. _dmsotis closed under composition.

It will follow from Propositions 14, 15 and Corollary 33 (in Section 6.3) that our class of views fu-(eltR)∗is properly contained in the class of queriesdmsot.

Proposition 16. It is decidable for twodmsottransducers whether they are

equiv-alent.

1_{In [24, 25, 26] the class contains total functions only, and is denoted}

(14)

5. Undecidability Results

The simplest kind of query is the identity on the set of inputs. Letiddenote

the class of translations that are the identity onT_Σ, for any ranked alphabetΣ. As observed in [6], a functionvis injective if and only ifqis determined byv, where

qis the identity ondom(v). Since the injectivity problem for tree homomorphisms is undecidable by [9], one obtains (as stated in [6, Theorem 17]) undecidability of

the determinacy problem forid underhom, where hom ⊆ dtis the class of tree

homomorphisms.

The tree homomorphisms used in the proof of [9] are not linear, and hence they are not even finite-copying. Here we show that determinacy is already undecidable for identity queries if the view makes at most one copy of the input tree. To be

pre-cise, we show that determinacy is undecidable foridunder views that are realized

by what we will here call “total copy-once” dt transducers (t-dtco transducers).

A dt transducer M is total if for each state q ∈ Q and input symbol a ∈ Σ(k),

k ≥ 0, it has a rule with left-hand sideq(a(x1, . . . ,xk)). For such a transducer,

dom([[M]])=T_Σ. It iscopy-onceif for every ruleq(a(x₁, . . . ,xk))→ζ,

• the initial stateq0ofMdoes not occur inζ,

• ifq=q0then each variablexioccurs at most twice inζ, and

• ifq_,q₀thenζis linear.

Thus the transducer copies at most once, at the root of the input tree. The undecid-ability of injectivity for nontotaldtco transducers was proved by ´Esik in [8], and

in his PhD thesis (in Hungarian). Our proof for totaldtco transducers is a slight

variation of ´Esik’s proof.

Theorem 17. Determinacy foridunder t-dtcois undecidable.

Proof. We use an encoding of the Modified Post Correspondence Problem (mpcp).

An instance of this problem consists of two sequences of strings (α1, . . . , αn) and

(β1, . . . , βn),n≥1, and it has a solution if there exist numbersi1, . . . ,ik ∈[n],k≥0,

such that the stringsα1αi1· · ·αik andβ1βi1· · ·βik are equal. It is well known to be

undecidable whether anmpcpinstance has a solution. LetΩbe the ranked alphabet

consisting of all symbols that appear in the stringsαi andβi, where each symbol

has rank one. For a stringα=a1· · ·amand a treet, we writeα(t) to mean the tree

a1(a2(· · ·am(t)· · ·)). We define the t-dtco transducerM = (Q,Σ,∆,{q0},R) where Q = {q0,qa,qb,qid},Σ = {i(1) | i ∈[n]} ∪ {a(1),b(1),e(0)}, and∆ = Σ∪Ω∪ {f(2)}.

The state q_id realizes the identity on T_Σ; for everyσ ∈ Σ(k), k ≥ 0, it has the rule qid(σ(x1, . . . ,xk)) → σ(qid(x1), . . . ,qid(xk)) in R. For i ∈ [n] we define the

following rules inR.

q0(i(x1)) → i(qid(x1)) qa(i(x1)) → αi(qa(x1))

q0(e) → e qb(i(x1)) → βi(qb(x1))

(15)

The transducerMtranslates every input tree of which the root is not labeled by a

or bybinto itself. It translates every input treeai1· · ·ik(s) where the root ofs∈TΣ

is not a number, into f(i₁· · ·ik(s), α1αi1· · ·αik(e)), and similarly forbandβij. This

implies that the givenmpcpinstance has no solution if and only if [[M]] is injective

if and only if the identityidT_Σ is determined by [[M]], and hence it is undecidable

whetheridT_Σ is determined by [[M]].

Since, obviously, every dtco transducer is a dtfc transducer, anddtfc is

(ef-fectively) included in dmsot by Proposition 14, Theorem 17 immediately gives

undecidability of determinacy foridunderdmsot, which slightly strengthens

The-orem 19 of [6] (where the identities on all regular tree languages are allowed as query).

One often considers that a query q is determined by a finite tuple of views

F = (v1, . . . ,vn). The extended definition states that q is determined byF if it

is determined by the single view ¯v, where ¯v(s) = (v1(s), . . . ,vn(s)) for every input

s. In this case one has undecidability even when the views are total deterministic finite-state string transformations. In fact, for a pair of viewsF = (v₁,v₂) consist-ing of two total deterministic finite-state strconsist-ing transducersv1 andv2, the proof is

the same as for Theorem 17, but producing theith branch of the output tree by the viewvi. Thus, in what follows we consider only a single noncopying view.

6. Inverses and Uniformizers of Linear Transducers

As Theorem 17 shows, determinacy cannot be decided under view transducers that copy, not even for a single initial copy at the root of the input tree. We therefore restrict our attention to linear view transducers. In the first two subsections of this section we consider arbitrary linear transducers. In the third subsection, and in Section 7, we restrict attention to functional linear translations.

Letτandube translations (i.e., binary relations). We say thatuis auniformizer

of τ if u ⊆ τ and dom(u) = dom(τ). Note that, for technical reasons, in this

definition we do not require u to be a function (as we did in the Introduction).

We will, however, be mainly interested in functional uniformizers. The notion of uniformizer is relevant for deciding whether a queryqis determined by a viewv.

Lemma 18. Let q,v,u be partial functions such that u is a uniformizer of v−1. Then,qis determined byvif and only ifq=v◦u◦q.

Proof. If q = v◦u ◦q, then q = v◦ f where f is the partial function u◦q.

So, q is determined byv. Vice versa, ifq = v◦ f for a partial function f, then

v◦u◦q=v◦u◦v◦ f =v◦ f =qbecauseu◦vis the identity onran(v).2

2_{We know that}_u_⊆_v−1_and_dom₍_u₎₌_dom₍_v−1_{). Since}_ran₍_u₎_⊆_ran₍_v−1₎₌_dom₍_v_{), we have}

(16)

Example 19. Consider the viewv = [[M]] whereMis theeltRtransducer of

Ex-ample 2, i.e.,

v={(a(b(s1,s2),b(s3,s4),e),a(s1,s2,s3))|s1,s2,s3,s4∈T{b,e}}.

Letqbe the query

q={(a(b(s₁,s₂),b(s₃,s₄),e), b(s₁,s₁))|s₁,s₂,s₃,s₄∈T{b,e}}.

Using the construction in the proof of Lemma 11 it is easy to see that q ∈ _dtR_fc

and henceqis in bothdtRanddmsot. Clearly,qis determined byv; for example,

q=v◦ f where

f ={(a(s₁,s₂,s₃),b(s₁,s₁))|s₁,s₂,s₃ ∈T_Σ}.

A functional uniformizer ofv−1is

u={(a(s1,s2,s3),a(b(s1,s2),b(s3,e),e))| s1,s2,s3∈T{b,e}}.

Note thatu ∈ _lt, as can easily be verified. Note also thatu◦v = _idran(v) where

ran(v)={a(s1,s2,s3)| s1,s2,s3∈T{b,e}}. Obviously,

u◦q={(a(s1,s2,s3), b(s1,s1))| s1,s2,s3∈T{b,e}}

andq=v◦(u◦q).

6.1. Inverses

Because of Lemma 18, we are interested in uniformizers of inverse views.

Thus, given aneltR transducer M, we would like to construct a transducer

real-izing its inverse [[M]]−1. Since Mcan translate the set of all input trees inT_Σto a single output tree, a transducer realizing [[M]]−1may need to translate a tree back

to any tree in T_Σ. This is not possible by our extended top-down tree

transduc-ers because the height of an output tree is linearly bounded by the height of the input tree. The next, easy lemma “factors out” this problem by decomposing an

eltR transducer into a component that can be inverted as anelttransducer, and a

component of a very simple form: a “projection”.

Let n-eltdenote the class ofnondeleting nonerasingelttransducers: those in

which every rule is of the formq(C) →ζ such that each variable inXk occurs in

ζ andζ < Q({x1}). The phrase “nondeleting” indicates that we do not drop any

inputxi, thus “deleting” an entire subtree from the input. The phrase “nonerasing”

indicates that we do not have a rule such asq(a(x1,b)) →q0(x1), which “erases”

the symbolsaandb.

Let∆be a ranked alphabet and letHbe a ranked alphabet disjoint from∆with

(17)

h∈ Handπd = d(x1, . . . ,xk) for everyd ∈∆(k).3 We denote byprojthe class of

all projections.

Lemma 20. _eltR ⊆n-elt◦proj.

Proof. It is proved in [10] (where n-eltis denotedtid) that Proposition 8 holds for

n-elt whengis also required to be nondeleting and nonerasing. Thus, it suffices

to prove that every linear tree homomorphismg can be decomposed intog0◦π,

whereg0is a linear nondeleting nonerasing tree homomorphism andπ∈_proj. As

shown in Lemma I-2-1-3-5 of [28] we can take the ranked alphabetH = {#(nn) |

1 ≤ n ≤ m+1}, wheremis the maximal rank of the symbols inΣ, and we can

defineg0_a=#n+1(ga,xi1, . . . ,xin) for everya∈Σ

(k)_,_k_≥_{0, where} _x

i1, . . . ,xin are the

variables fromXkthat do not occur inga. It should be clear thatg=g0◦π∆,H.

It is easy to see that n-elt is closed under inverse. In fact, as shown in [10],

one just “inverts” the rules of the given transducer.

Lemma 21. n-elt−1⊆n-elt.

Proof. A rule of an n-elttransducerMis of the form

q(C)→C0[q1(x1), . . . ,qk(xk)]

whereC andC0 are Xk-contexts , x1 (and q,q1, . . . ,qk ∈ Q). An n-elt

trans-ducer realizing [[M]]−1is obtained fromMby changing each such rule into the rule

q(C0)→C[q1(x1), . . . ,qk(xk)].

The previous two lemmas give us the next corollary.

Corollary 22. (eltR)−1⊆proj−1◦elt.

Example 23. Let Mbe theeltR transducer of Example 2. Then [[M]] = [[M0]]◦

π_∆,HwhereH={#(2)₂ }andM0is theelttransducer with the rule

q0(a(b(x1,x2),b(x3,x4),e))→#2(a(q(x1),q(x2),q(x3)),q(x4))

and the rulesq(b(x1,x2)) → b(q(x1),q(x2)) andq(e) → e. Then we obtain that

[[M]]−1=π−_∆1_,_H◦[[M00]] whereM00is theelttransducer with the rule

q0(#2(a(x1,x2,x3),x4))→a(b(q(x1),q(x2)),b(q(x3),q(x4)),e)

and the same rulesq(b(x1,x2))→b(q(x1),q(x2)) andq(e)→e.

3_Thus,_π₍_h₍_s_{1, . . . ,}_s_k))₌_π₍_s

1) forh∈H(k)andπ(d(s1, . . . ,sk))=d(π(s1), . . . , π(sk)) ford∈∆(k),

(18)

6.2. Uniformizers

We say that the sequence τ1, . . . , τn of translations iscompatibleifran(τi) ⊆

dom(τi+1) for everyi∈[n−1].

Lemma 24. Ifu1, . . . ,un are uniformizers of τ1, . . . , τn, respectively, and the

se-quenceτ1, . . . , τnis compatible, thenu1◦ · · · ◦unis a uniformizer ofτ1◦ · · · ◦τn.

Proof. Obviously u1◦ · · · ◦un ⊆ τ1 ◦ · · · ◦τn. Since the sequenceτ1, . . . , τn is

compatible,dom(τ1◦ · · · ◦τn) = dom(τ1). Sinceran(ui) ⊆ ran(τi) ⊆ dom(τi+1) =

dom(ui+1), the sequenceu1, . . . ,unis also compatible and sodom(u1◦ · · · ◦un) =

dom(u1). This shows thatdom(u1◦ · · · ◦un)=dom(τ1◦ · · · ◦τn).

For classesT,U of translations we say thatT has uniformizers in U if for

everyτ∈ T we can construct a uniformizeruofτsuch thatu∈ U.

Our goal in this subsection is to show that (eltR)∗and ((eltR)∗)−1 have

uni-formizers in (dtR_fc)∗. We do this by decomposing into compatible translations,

con-structing uniformizers indtR_fcfor them, and then obtaining a uniformizer in (dtR_fc)∗

by Lemma 24. A similar idea was used in [29] to obtain uniformizers indtR for

compositions of top-down and bottom-up tree translations. Note thatdtRis closed

under composition by Proposition 6; in fact,dtR_fc is also closed under composition

(see, e.g., [30, Theorem 5.4]), but this fact will not be needed in this paper.

We start with a slight generalization of the Lemma in [29] (which says thatt

has uniformizers indtR).

Lemma 25. _tRhas uniformizers indtR, andtR_fchas uniformizers indtR_fc.

Proof. Let M = (Q,Σ,∆,I,R,A) be atR transducer. Obviously, we may assume

that I is a singleton (as required for determinism).4 By Lemma 4 we may also

assume thatMis without dead ends. Suppose thatMhas two rules

q(a(x1, . . . ,xk))→ζi hp1, . . . ,pki,

fori=1,2 withζ1,ζ2, and let M0be obtained fromMby removing one of these

rules. Since M is without dead ends, [[q]]M0 is a uniformizer of [[q]]_M for every

q ∈ Q.5 Hence [[M0]] is a uniformizer of [[M]], andM0 is still without dead ends. Repeating this process, one finally obtains adtR transducer M00 such that [[M00]]

is a uniformizer of [[M]]. The construction obviously preserves the finite-copying

property.

The next result is immediate from Lemmas 11 and 25.

4_{Introduce a new state}_q

0 (which is the unique new initial state), and for every ruleq(C) → ζhp1, . . . ,pkiwhereqis an old initial state, add the ruleq0(C)→ζhp1, . . . ,pki.

(19)

Corollary 26. _eltRhas uniformizers indtRfc.

Theorem 27. (eltR)∗has uniformizers in (dtR_fc)∗.

Proof. Letτ1, . . . , τn beeltRtranslations. We defineeltRtranslations ρ1, . . . , ρn

such thatρ1◦ · · · ◦ρn = τ1◦ · · · ◦τn and the sequenceρ1, . . . , ρn is compatible.

We letρn = τn, and fori∈ [n−1] we letρi be the range restrictionτi◦idRofτi

toR= dom(ρi+1). The languageRis regular by Corollary 9, andτi◦idR is ineltR

by Corollary 10. We obtain uniformizers indtR_fcfor theρi by Corollary 26, and a

uniformizer forρ1◦ · · · ◦ρnin (dtRfc)

∗_{by Lemma 24.}

We now turn to the inverses of the translations in (eltR)∗. Since the relation “is

a uniformizer of” is transitive, it suffices to show that ((eltR)∗)−1has uniformizers

in (eltR)∗.

Anftatransduceris an ftaA, seen as a tree transducer realizing the translation

[[A]] = idL(A), which is the identity on L(A); composing a tree translationτ with

[[A]] amounts to restricting the range ofτtoL(A):τ◦[[A]]={(s,t)∈τ|t∈L(A)}.

Lemma 28. _proj−1◦_ftahas uniformizers inlt.

Proof. Let τ = π−1 ◦[[A]] where π ∈ proj and A is an fta transducer. Thus,

π = π_Σ,H for disjoint ranked alphabets Σ and H such that H(0) = ∅, and A is

an fta (Q,Σ∪H,F, δ). Let Cbe the set of all X1-contextsC over Σ∪H such

that the left-most leaf ofC has label x1 and all the ancestors of this leaf have

labels in H. Note thatπ(C[t]) = π(t) for everyC ∈ C andt ∈ T_Σ∪H. For every

a ∈ Σ(k) _and_q_,_q

1, . . . ,qk ∈ Q, letC(a,q,q1, . . . ,qk) be the set of allC ∈ Csuch

thatδ(C[a(x1, . . . ,xk)],q1, . . . ,qk)=q.6LetC0(a,q,q1, . . . ,qk) be one (fixed) such

C– since the setC(a,q,q₁, . . . ,qk) is effectively regular,7one can always compute

such an elementC if the set is nonempty. If there does not exist such aC then

C0(a,q,q1, . . . ,qk) is undefined.

We now construct anlttransducerMsuch that [[M]] is a uniformizer ofτ. We

defineM=(Q,Σ,Σ∪H,F,R0) whereR0consists of all rules

q(a(x₁, . . . ,xk))→C0(a,q,q1, . . . ,qk)[a(q1(x1), . . . ,qk(xk))]

such thatC0(a,q,q1, . . . ,qk) is defined. Intuitively, for s ∈ TΣ, Msimulates

top-down the state behavior ofAon some treetinπ−1(s) and, at each node ofs, outputs a context inCon whichAhas the same state behavior as on the context inCthat is “above” the corresponding node int. Formally, the correctness of the construction follows from the following two claims, in whichq∈Q,s∈T_Σ, andt∈T_Σ∪H.

Claim 1.Ift∈[[q]]M(s), thenπ(t)= sandδ(t)=q.

6_{Recall again that the transition function}_δ_{was extended to contexts at the end of Section 2.} 7_{It is the intersection of the regular tree language}_C_{with the tree language accepted by the fta}

(20)

The proof is by structural induction on s. If s = a(s1, . . . ,sk), then, by the above

rule,t=C[a(t1, . . . ,tk)] whereC=C0(a,q,q1, . . . ,qk) andti ∈[[qi]]M(si) for every

i ∈ [k]. By induction, π(ti) = si andδ(ti) = qi. Thenπ(t) = π(a(t1, . . . ,tk)) =

a(π(t1), . . . , π(tk))= sandδ(t)=δ(C[a(x1, . . . ,xk)],q1, . . . ,qk)=q.

Claim 2.Ifδ(t)=q, thenπ(t)∈dom([[q]]M).

The proof is by induction on the size oft. Clearly,tis of the formC[a(t₁, . . . ,tk)] for

(unique)C ∈ C,k ≥ 0,a∈Σ(k)andt1, . . . ,tk ∈TΣ∪H. Letqi = δ(ti) andsi = π(ti)

fori∈[k]. By induction,si ∈dom([[qi]]M), i.e., there existst_i0∈[[qi]]M(si). Lets=

a(s₁, . . . ,sk), soπ(t)= s. Then, by the above rule,C0(a,q,q1, . . . ,qk)[a(t0₁, . . . ,t0_k)]

is in [[q]]M(s), and so s ∈ dom([[q]]M). Note that C0(a,q,q1, . . . ,qk) is defined

becauseC∈ C(a,q,q1, . . . ,qk):δ(C[a(x1, . . . ,xk)],q1, . . . ,qk)=δ(t)=q.

Takingq∈F, Claim 1 shows that [[M]]⊆τ, and Claim 2 shows thatdom([[M]]) equalsdom(τ) (in fact, if s ∈ dom(τ) then there exists tsuch that s = π(t) and δ(t) = q∈ F, and sos ∈ dom([[q]]M) ⊆ dom([[M]])). Hence [[M]] is a uniformizer

ofτ.

Lemma 29. (eltR)−1has uniformizers inelt2.

Proof. Let τ ∈ (eltR)−1. By Corollary 22, τ = τ1 ◦τ2 with τ1 ∈ proj−1 and

τ2 ∈elt. By Corollary 9 we can construct an ftaAsuch thatL(A)=dom(τ2). Then

τ=τ1◦[[A]]◦τ2and the translationsτ1◦[[A]], τ2are compatible (by definition of A). Forτ1◦[[A]]∈proj−1◦ftawe obtain, by Lemma 28, a uniformizeru1inelt.

Thenu₁◦τ₂is a uniformizer inelt2forτby Lemma 24.

Theorem 30. ((eltR)∗)−1has uniformizers in (eltR)∗.

Proof. The proof is analogous to the one of Theorem 27. Letτ1, . . . , τn be eltR

translations. We defineeltRtranslationsρisuch thatρ−₁1◦ · · · ◦ρn−1=τ−₁1◦ · · · ◦τ−n1

and the sequence ρ−₁1, . . . , ρ−n1 is compatible, i.e., ran(ρ

−1

i ) ⊆ dom(ρ

−1

i+1). We let

ρn = τn, and for i ∈ [n−1] we let ρi be the domain restriction idR◦τi ofτi to

R = ran(ρi+1). The language Ris regular by Corollary 9, and idR◦τi is in eltR

by Corollary 10. We obtain uniformizers inelt2for theρ−i1by Lemma 29, and a

uniformizer in (eltR)∗forρ₁−1◦ · · · ◦ρ−n1by Lemma 24.

The next corollary is immediate from Theorems 27 and 30.

Corollary 31. ((eltR)∗)−1has uniformizers in (dtR_fc)∗.

Example 32. We continue Example 23. Recall thatM00 is theelttransducer with

the rule

q0(#2(a(x1,x2,x3),x4))→a(b(q(x1),q(x2)),b(q(x3),q(x4)),e)

and the rulesq(b(x₁,x₂))→b(q(x₁),q(x₂)) andq(e)→e. The domain of [[M00]] is

(21)

Clearly, dom([[M00]]) = L(A) where A = (Q,∆∪H,F, δ) withQ = {q,qa,qf,q},

F = {qf}, andδ(e) = q, δ(b,q,q) = q, δ(a,q,q,q) = qa, δ(#2,qa,q) = qf and

δ(. . .)=qin the remaining cases. With the terminology of the proof of Lemma 28, it is easy to see thatC(a,qf,q,q,q) = {#2(x1,s) | s ∈ T{b,e}} and so we can take

C0(a,qf,q,q,q) = #2(x1,e). Also, C0(b,q,q,q) = C0(e,q) = x1. From this we

obtain that a uniformizer u₁ of π−_∆1_,_H ◦[[A]] is realized by the (deterministic) lt

transducer with the rule

qf(a(x1,x2,x3))→#2(a(q(x1),q(x2),q(x3)),e)

and the rules q(b(x₁,x₂)) → b(q(x₁),q(x₂)) and q(e) → e. Then a (functional) uniformizeruof [[M]]−1 =π−_∆1_,_H◦[[M00]] inelt2isu=u1◦[[M00]]. Composing the

translations of the two transducers, it is clear that

u={(a(s1,s2,s3),a(b(s1,s2),b(s3,e),e))| s1,s2,s3∈T{b,e}}.

Thus, for the viewv = [[M]], u is a uniformizer of v−1, as already observed in

Example 19.

6.3. The Class of Views

As observed before, our class of views is fu-(eltR)∗. It is immediate from

Theorem 27 that every such view is in (dtR_fc)∗, as stated in the next corollary.8

Corollary 33. fu-(eltR)∗((dtR_fc)∗.

In view of Theorem 27 and Corollary 31, a natural question to ask now is whether (eltR)∗ and ((eltR)∗)−1 have uniformizers in fu-(eltR)∗. We prove a

slightly stronger statement. We first strengthen Corollary 26.

Lemma 34. _eltR_{has uniformizers in}

fu-eltR.

Proof. LetM =(Q,Σ,∆,I,R,A) be aneltRtransducer, withA= (P,Σ,F, δ). The

proof is similar to the one of Lemma 25, but more complicated. As in that proof,

we may assume thatIis a singleton and, by Lemma 4, thatMis without dead ends.

We will construct aneltRtransducerM0that simulatesMand, at each moment

of a computation, applies exactly one of the applicable rules of M. Since M is

without dead ends, this ensures that [[M0]] is a functional uniformizer of [[M]]. We fix an arbitrary linear order on the rules ofM. The transducerM0will always pick the first applicable rule ofM, in this order.

LetC ⊆T_Σ(X) be the finite set of contexts in the left-hand sides of the rules of

M. As in the proof of Lemma 11, for anXk-contextC ∈ Cand p1, . . . ,pk ∈ P, let

AC,p1,...,pkbe an fta overΣsuch that

L(AC,p1,...,pk)={C[s1, . . . ,sk]| si ∈TΣ, δ(si)= pifori∈[k]}.

8_{The inclusion is proper by Corollary 9, because}

dtRfccontains translations that do not preserve

(22)

It follows from Lemma 1 that there exist ftasA0 = (P0,Σ,F0, δ0) andA0_C_,_p_1,...,_p k =

(P0,Σ,FC,p1,...,pk, δ

0_{) such that} _L₍_A0₎ ₌ _L₍_A_{) and} _L₍_A0

C,p1,...,pk) = L(AC,p1,...,pk).

These automata have the same set of states P0 and the same transition function

δ0 .

We now define theeltRtransducerM0 =(Q,Σ,∆,I,R0,A0) with the following

set of rulesR0. A ruleq(C) →ζ hp0₁, . . . ,p0_kiis inR0 if and only if there is a rule

q(C)→ζhp1, . . . ,pkiinRsuch that

• δ0(C,p0₁, . . . ,p0_k)∈FC,p1,...,pk, 9_and

• there is no smaller (in the given order) ruleq(C)→ζhp₁, . . . ,pmiinRsuch

thatq=qandδ0(C,p0₁, . . . ,p0_k)∈F_C_,_p

1,...,pm.

It should be clear thatM0satisfies the requirements. Formally, it is straightforward to show for everyq∈Qands∈T_Σthat [[q]]M0(s)⊆[[q]]_M(s) and that [[q]]_M0(s) is a singleton if [[q]]M(s),∅(by induction on the size ofs). SinceI is a singleton, this

shows that [[M0]] is a functional uniformizer of [[M]].

The next corollary strengthens Theorem 27. Its proof is the same as the one of Theorem 27, using Lemma 34 instead of Corollary 26.

Corollary 35. (eltR)∗has uniformizers in (fu-eltR)∗.

As a direct consequence of Corollary 35 we obtain that our class of views equals (fu-eltR)∗, as already observed in Section 3.

Corollary 36. fu-(eltR)∗=(fu-eltR)∗.

Note that, since (eltR)∗=(eltR)3by [14] and, clearly, (eltR)3has uniformizers

in (fu-eltR)3, we have that fu-(eltR)∗=(fu-eltR)3.

Similarly to Corollary 35, we strengthen Corollary 31 in the next corollary. It follows immediately from Theorem 30 and Corollary 35.

Corollary 37. ((eltR)∗)−1has uniformizers in (fu-eltR)∗.

7. Decidability of Determinacy and Rewriting

Consider a queryq, a viewv, and a uniformizeruofv−1, each of them being a partial function. By Lemma 18,qis determined byvif and only ifq=v◦u◦q. For queries indtR ordmsot, equivalence is decidable and they are effectively closed

under left composition withdtR_fc. Thus, ifvanduare in (dtR_fc)∗, then we can decide

determinacy. This holds for the views in the class (eltR)∗. In fact, if vis in this

class, then v−1 has a uniformizer u in (dtR_fc)∗ by Corollary 31, and v itself is in

(dtR_fc)∗by Corollary 33.

The main results of this paper are presented in the next three theorems.

(23)

Theorem 38. Determinacy is decidable fordtRand fordmsotunder fu-(eltR)∗.

Proof. Letv∈fu-(eltR)∗. By Corollary 33,vis in (dtR_fc)∗, effectively.10By

Corol-lary 31, one can construct (transducers for) a uniformizeru ∈(dtR_fc)∗ofv−1. If a

queryqis adtR(dmsot) translation, then so areu◦qandv◦u◦q, by Propositions 6,

14 and 15, effectively. We can decide ifq= v◦u◦q(i.e., ifqis determined byv

by Lemma 18) because equivalence is decidable fordtRanddmsottransducers by

Propositions 7 and 16.

The proof of Theorem 38 also proves the next theorem (taking f =u◦q).

Theorem 39. LetV = fu-(eltR)∗,v ∈ V, and letqbe adtR(dmsot) translation

such thatqis determined byv. AdtR(dmsot) translationfcan be constructed such

thatq=v◦f. That is,dtRanddmsotare complete forV-to-dtRandV-to-dmsot

rewritings, respectively.

As defined in [3], a classQof queries (and views)admits view-based rewritings

ifQis complete forQ-to-Qrewritings, i.e., for everyq∈ Qandv ∈ Qsuch thatq

is determined byv, there is an f ∈ Qwithq = v◦ f. Intuitively this means that

the query languageQhas the nice property that it need not be extended to allow

rewriting of queries with respect to views.11 In the next theorem we show that,

considered as a class of (linear) queries, the class fu-(eltR)∗ admits view-based

rewritings. Recall that fu-(eltR)∗is a subclass of both classes of queriesdtRand

dmsot(effectively), by Corollary 33 and Propositions 6, 14, and 15.

Theorem 40. Let V = fu-(eltR)∗. Determinacy is decidable for V under V.

Moreover,Vis complete forV-to-Vrewritings.

Proof. The first statement is a special case of Theorem 38. To prove the second

statement, letq,v ∈ Vsuch thatqis determined byv. By Corollary 37,v−1has a

uniformizeru∈ V. Then, by Lemma 18,q=v◦ f with f =u◦q∈ V.

7.1. Weakly Determined Queries

A query qis determined by a view vif there exists a partial function f such

thatq = v◦ f. For practical purposes, this could be weakened toq ⊆ v◦ f. For

a given input s, one first checks if s ∈ dom(q), and if so, obtainsq(s) as f(v(s)).

We say thatq is weakly determined by vif there exists a partial function f such

thatq ⊆ v◦ f. As an example considerq = {(1,1)} andv = {(1,1),(2,1)}. Then

qisnot determined byv, but is weakly determined byv. Let Q,V,R be classes

10_{This means that if}_v₌_[[_M

1]]◦ · · · ◦[[Mk]] for giveneltRtransducersM1, . . . ,Mk, then one can

construct_dtR

fctransducersM 0 1, . . . ,M

0

nsuch thatv=[[M

0

1]]◦ · · · ◦[[M 0

n]].

11_{Similarly, in the other direction, it is nice when}_Q_{is closed under composition: for every view} v∈ Qand every queryq∈ Q“on that view”, the “viewless” queryv◦qis inQ. In this case one could say thatQadmits removal of views. This holds for_dtR_,

(24)

of partial functions. We say thatRiscomplete for weakV-to-Qrewritings, if for

everyq ∈ Qandv ∈ V such thatqis weakly determined byv, there is an f ∈ R

withq⊆v◦ f.

We now show that our main results also hold for weak determinacy and weak

rewritings. Clearly,qis weakly determined byvif and only ifqis determined by

the domain restrictionidR◦vofvtoR=dom(q), with the same functions finvolved.

Forq ∈_dtR_,_dom₍_q_{) is e}_ff_{ectively regular by [22, Corollary 2.7]; the same holds}

forq ∈ _dmsot becausedom(q) ismsodefinable by definition, and hence regular

by [31, 32]. And ifv ∈ fu-(eltR)∗, thenidR◦vis in fu-(eltR)∗ for every regular

tree languageR, by Corollary 10. This shows that Theorems 38 and 39 also hold

for weak determinacy.

Corollary 41. LetV=fu-(eltR)∗. Weak determinacy is decidable fordtRand for

dmsotunderV. The classesdtR,dmsot, andVare complete for weakV-to-dtR,

weakV-to-dmsot, and weakV-to-Vrewritings, respectively.

7.2. Extended Bottom-Up Tree Transducers

Finally, we compare our results with those in [6, 7]. The views in that paper are (partial functions) realized by extended linearbottom-uptree transducers, and the queries are realized by (ordinary)bottom-uptree transducers.

An extended bottom-up tree transducer (eb transducer, for short) is a tuple

M = (Q,Σ,∆,F,R) where Qis a ranked alphabet of states all of rank 1,Σand∆

are ranked alphabets of input and output symbols, respectively,F ⊆ Qis a set of

final states, andRis a finite set of rules of the form

C[q1(x1), . . . ,qk(xk)]→q(ζ)

wherek≥0,C_, x₁is anXk-context overΣ,q1, . . . ,qk,q∈Q, andζ∈T∆(Xk). For

every stateq∈ Qwe define theq-translation[[q]]M ⊆ TΣ×T∆as follows. For an

input trees∈T_Σ, theq-translation [[q]]M(s) of sis the smallest set of treesT ⊆ T∆

such that for every ruleC[q₁(x₁), . . . ,qk(xk)] → q(ζ) and all s1, . . . ,sk ∈ TΣ, if

s = C[s1, . . . ,sk] and ti ∈ [[qi]]M(si) for every i ∈ [k], thenT contains the tree

ζ[xi ←ti|i∈[k]]. Thetranslation realized by Mis [[M]]=∪q∈F[[q]]M.

The transducer Mislinear (anelbtransducer), if the right-hand sideq(ζ) of

each rule is linear, i.e., each variablexioccurs at most once inζ.

The transducer Mis an (ordinary, not extended)bottom-up tree transducer(b

transducer) if the left-hand side contextCof each of its rules contains exactly one symbol inΣ, i.e., each rule is of the forma(q1(x1), . . . ,qk(xk))→q(ζ) witha∈Σ(k)

andk ≥0.

Extended bottom-up tree transducers are studied in, e.g., [10, 13, 6]. In [13, 6, 7], the left-hand side of a rule is allowed to be of the formq1(x1) withq1∈Q, and

the corresponding (larger) class of translations is denotedxbot. It is easy to show

(25)

that [[M]] is a partial function (as is the case for views and queries), in such a way that linearity ofMis preserved.12

In the next proposition we show that for linear extended transducers, top-down (with look-ahead) gives the same translations as bottom-up, just as for nonextended transducers (see [22, Theorem 2.8]). This result was already pointed out below Proposition 5 in [13] (see also [33, Theorem 3.1]). Based on Proposition 8, its proof is a standard one, relating bimorphisms to finite-state transducers (see [34] and [10]).

Proposition 42. _eltR =_elb.

Proof. Letbimdenote the class of all tree translations{(f(r),g(r))| r ∈R}where

f is a linear nondeleting nonerasing tree homomorphism,gis a linear tree

homo-morphism, andRis a regular tree language. In accordance with Proposition 8, we

show thatbim=elb.

bim⊆elb: Letτ={(f(r),g(r))|r ∈Rτ}be abimtranslation, where f :TΩ→ T_Σ, g : T_Ω → T_∆, andRτ ⊆ T_Ω. Let A = (Q,Ω,F, δ) be an fta acceptingRτ. We construct theelbtransducerM = (Q,Σ,∆,F,R) with the following rules. Let ω ∈ Ω(k), k ≥ 0, and letq,q1, . . . ,qk ∈ Qsuch thatδ(ω,q1, . . . ,qk) = q. Then R

contains the ruleC[q₁(x₁), . . . ,qk(xk)] → q(ζ), whereC = fω andζ = gω. Note

that sincef is linear and nondeleting,Cis anXk-context, and since fis nonerasing,

C ,x1. Also,ζis linear becausegis.

It should be clear that [[M]] = τ. Formally, it is straightforward to show, for everyq∈Q,s ∈T_Σandt∈T_∆, thatt∈[[q]]M(s) if and only if there existsr ∈TΩ

such thatδ(r) = q, f(r) = s and g(r) = t. The proof in the if direction is by induction on the structure ofr, and the one in the only-if direction by induction on the size ofs.

elb ⊆ bim: This inclusion holds by the proof of Lemma 6 of [13]. For

com-pleteness’ sake we also prove it here. LetM=(Q,Σ,∆,F,R) be anelbtransducer.

We turn the set of rulesRinto a ranked alphabetΩby defining the rank of a rule

ω : C[q1(x1), . . . ,qk(xk)] → q(ζ) to be k. We define the regular tree language

D = L(A) where A = (Q∪ {⊥},Ω,F, δ) is the fta such that, for the rule ωabove, δ(ω,q1, . . . ,qk) = q andδ(ω,q0₁, . . . ,q0_k) = ⊥ for all q0₁, . . . ,q0_k ∈ Q∪ {⊥} with

(q0₁, . . . ,q0_k) , (q1, . . . ,qk). Intuitively, Dcan be viewed as the set of derivation

trees of M. Finally we define the tree homomorphisms f andgsuch that, for the

ruleωabove, fω =Candgω =ζ.

Letτ be thebim translation{(f(r),g(r)) | r ∈ D}, and let theelb transducer

M0=(Q∪ {⊥},Σ,∆,F,R0) be associated withτas in the proof ofbim⊆elbabove

(withR_τ= D). So, [[M0]]=τ. SinceR⊆ R0and the rules inR0−Rall lead to the

nonfinal state⊥, it is obvious that [[M0]]=[[M]], and henceτ=[[M]].

12_{First, repeatedly add rules as follows: if}_C_[_q

1(x1), . . . ,qk(xk)]→ q(ζ) andq(x1) → q0(ζ0) are

rules, then add the ruleC[q1(x1), . . . ,qk(xk)] → q0(ζ0[x1 ← ζ]). Second, remove all rules with