Look-ahead removal for total deterministic top-down tree transducers


Joost Engelfriet¹, Sebastian Maneth², and Helmut Seidl³

¹ LIACS, Leiden University, The Netherlands

² School of Informatics, University of Edinburgh, United Kingdom

³ Institut für Informatik, Technische Universität München, Germany

Abstract. Top-down tree transducers are a convenient formalism for describing tree transformations. They can be equipped with regular look-ahead, which allows them to inspect a subtree before processing it. In certain cases, such a look-ahead can be avoided and the transformation can be realized by a transducer without look-ahead. Removing the look-ahead from a transducer, if possible, is technically highly challenging. For a restricted class of transducers with look-ahead, namely those that are total, deterministic, ultralinear, and bounded erasing, we present an algorithm that, for a given transducer from that class, (1) decides whether it is equivalent to a total deterministic transducer without look-ahead, and (2) constructs such a transducer if the answer is positive. For the whole class of total deterministic transducers with look-ahead we present a similar algorithm, which assumes that a so-called difference bound is known for the given transducer. The designer of a transducer can usually also determine a difference bound for it.

1 Introduction

Many simple tree transformations can be modeled by top-down tree transducers [26, 27]. They have recently been used in XML database theory (e.g., [9, 18, 20, 23–25]), in computational linguistics (e.g., [19, 21, 22]), and in picture generation [4]. A top-down tree transducer is a finite-state device that scans the input tree in a (parallel) top-down fashion, simultaneously producing the output tree in a (parallel) top-down fashion. A more expressive (but also more complex) model for specifying tree translations is the top-down tree transducer with regular look-ahead [6]. It consists of a top-down tree transducer and a finite-state bottom-up tree automaton, called the look-ahead automaton. We may think of its execution in two phases: in the first phase the input tree is relabeled by attaching to each input node the active state of the automaton, called the look-ahead state at that node. In the second phase the top-down tree transducer is executed over the relabeled tree, thus possibly making use of the look-ahead information in the new input labels. As an example, consider a deterministic transducer M_ex whose look-ahead automaton checks whether the input tree contains a leaf labeled a. If so, then M_ex outputs a, and otherwise it outputs a copy of the input tree. It should be clear that there is no deterministic top-down tree transducer (without look-ahead) that realizes the same translation as M_ex. The intuitive reason is that in general the complete input tree must be read and buffered in memory before the appropriate choice of output can be made. How can we formally prove that indeed no deterministic top-down tree transducer (without look-ahead) can realize this translation? In general, is there a method to determine for a given top-down tree transducer with look-ahead whether or not its translation can be realized by a top-down tree transducer without look-ahead? And if the answer is yes, can such a transducer be constructed from the given one?

In this paper we give two partial answers to these questions, where we restrict ourselves to total deterministic transducers (which will not be mentioned any more in the remainder of this introduction). For such transducers we provide a general method as discussed above. However, part of the method is not automatic, but depends on additional knowledge about the given transducer with look-ahead (which can usually be determined by the designer of the transducer). For a restricted type of transducers (where the restrictions concern the capability of the transducer to copy and erase) that knowledge can also be obtained automatically, which means that for a thus restricted transducer with look-ahead it is decidable whether its translation can be realized by a (nonrestricted) transducer without look-ahead, and if so, such a transducer can be constructed from the given transducer.

The main notion on which our method is based is that of a difference tree of a top-down tree transducer with regular look-ahead. Consider two trees obtained from one input tree by replacing one of its leaves by two different look-ahead states of the transducer M. Compare now the two output trees of M on these input trees, where M treats the look-ahead state as representing an input subtree for which the look-ahead automaton of M arrives in that state at the root of the subtree. Since M is total and deterministic, these output trees exist and are unique, respectively. By removing the largest common prefix of the two output trees (i.e., every node of which every ancestor has the same label in each of the two trees), we obtain a number of output subtrees that we call difference trees of M. Intuitively, the largest common prefix is the part of the output that does not depend on the two possible look-ahead states of the subtree, whereas a difference tree is a part of the output that can be produced because M knows the look-ahead state of the subtree. Thus, the set diff(M) of all difference trees of M can be viewed as a measure of the impact of the look-ahead on the behaviour of M. For the example transducer M_ex above, diff(M_ex) consists of the one-node tree a and all trees of which no leaf is labeled a (with one leaf representing a subtree without a-labeled leaves); thus, diff(M_ex) is infinite.

[...] equivalent to the (canonical earliest) dtla M. Then the dtla M is at least as early as N. In other words, at each moment of the translation, M may be ahead of N but not vice versa, i.e., the output of N is a prefix of that of M, which is because M has additional information through its look-ahead. The output of N is the part of M's output that does not depend on the look-ahead state. Thus, when removing the output of N from that of M, the remaining trees are difference trees of M. Since N must be able to simulate M, it has to store these difference trees in its states. Hence, diff(M) must be finite. Moreover, it turns out that the above description of N's behaviour completely determines N, and so, roughly speaking, N can be constructed from M and diff(M). Note that since diff(M_ex) is infinite, the translation of M_ex cannot be realized by a dtop.

A natural number h is a difference bound for a dtla M if the following holds: if M has finitely many difference trees, then h is an upper bound on their height; in other words, if a tree in diff(M) has height > h, then diff(M) is infinite. Our first main result is that it is decidable for a given dtla M for which a difference bound is also given, whether M is equivalent to a dtop N, and if so, such a dtop N can be constructed. We do not know whether a difference bound can be computed for every dtla M, but the designer of M will usually be able to determine diff(M) and hence a difference bound for M. Our second main result is that a difference bound can be computed for dtlas that are ultralinear and bounded erasing. Ultralinearity means that the transducer cannot copy an input subtree when it is in a cycle (i.e., in a computation that starts and ends in the same state). Thus it is weaker than the linear property (which forbids copying) but stronger than the finite-copying property [11, 8]. The latter implies that the size of the output tree of an ultralinear dtla is linear in the size of its input tree. Bounded erasing means that the transducer has no cycle in which no output is produced. The proof that a difference bound can be computed for ultralinear and bounded erasing dtlas is based on pumping arguments that are technically involved.

The paper is structured as follows. Section 2 contains basic terminology, in particular concerning prefixes of trees. Section 3 defines the dtla (deterministic top-down tree transducer with regular look-ahead) and discusses some of its basic properties. It also explains the treatment of look-ahead states that occur in the input tree. In Section 4 we define the notions of difference tree and difference bound, illustrated by some examples. In Section 5 we discuss some normal forms for dtlas, in particular look-ahead uniformity, which is technically convenient. We prove that for every dtla M there is an equivalent canonical earliest dtla M′ (which is also look-ahead uniform), and we show how to compute a difference bound for M′ from one of M. Our first main result is proved in Section 6, which is divided into three subsections. Section 6.1 starts with the definition of a difference tuple of a dtla M, which generalizes the notion of difference tree by considering all look-ahead states of M rather than just two. If N is a dtop equivalent to M, then its states are in one-to-one correspondence with the difference tuples of M (assuming that both N and M are canonical earliest), see Lemma 26, and its axiom and rules are completely determined by M, see Lemmas 29 and 31. In Section 6.2 we present the algorithm that computes N from M for a given difference bound for M

[...] some basic properties of dtlas: the links that exist between an input tree and its corresponding output tree, and for each node of the output tree, its origin in the input tree. In Section 8 the problem of computing a difference bound for a dtla M is reduced to that of computing two related upper bounds: an output bound for M and an ancestral bound for M. An output bound can be computed for every dtla. Finally, in Section 9, an ancestral bound is computed for every ultralinear and bounded erasing dtla. The computed output bound (in the previous section) and ancestral bound for a dtla M are both based on pumping arguments (simple for the output bound, complicated for the ancestral bound). In both cases a part of the input tree on which M has a cyclic computation is pumped in such a way that the corresponding output tree contains arbitrarily large difference trees. In the ancestral case the pumping argument is technically based on the fact that M cannot copy and must produce output during its cyclic computations. Since the pumping of trees makes it hard to address nodes by the usual Dewey notation, a dependency graph is defined for M such that a cyclic computation of M corresponds to a cycle in its dependency graph; pumping the input tree then corresponds to repeating a cycle in the graph. At the end of Section 9 we consider two other classes of dtlas for which equivalence to a dtop is decidable (and if so, such a dtop can be constructed): output-monadic dtlas and depth-uniform dtlas. Output-monadic means that every node of an output tree has at most one child. Depth-uniform means, in its simplest form, that all states in the right-hand sides of the rules of the dtla are at the same depth.

Related Work. For deterministic string transducers with regular look-ahead, look-ahead removal is decidable, i.e., it is decidable whether a given transducer with look-ahead is equivalent to a transducer without look-ahead, and if so, such a transducer can be constructed. This was proved in [3] (see also [2, Theorem IV.6.1]), for so-called subsequential functions. We extend that result (for the total case) by proving that look-ahead removal is decidable for output-monadic dtlas.

Look-ahead has been investigated for other types of tree transducers. For macro tree transducers [12, 8] and streaming tree transducers [1], regular look-ahead can always be removed. The same is true for nondeterministic visibly pushdown transducers [15]. For deterministic visibly pushdown transducers the addition of regular look-ahead increases their power, but the decidability of look-ahead removal for these transducers is not studied in [15].

In [16] the deterministic multi bottom-up tree transducer (dmbot) was introduced and shown to have (effectively) the same expressive power as the dtla. Thus, our results can also be viewed as partial answers to the question whether it is decidable for a given dmbot to be equivalent to a dtop.

Note. The results of this paper were first presented at DLT 2014, see [10].

2 Preliminaries

The set of natural numbers is N = {0, 1, 2, ...}, and N+ = {1, 2, ...}. For k ∈ N we [...] and on N ∪ {∞}, with n < ∞ for all n ∈ N. For a set S ⊆ N ∪ {∞}, lub S denotes the least upper bound of the elements of S. If S is finite and nonempty, then lub S is the maximal element of S. Also, lub ∅ = 0.

The domain of a partial function f is denoted dom(f). For a set A, we denote by A* the set of sequences, or strings, of elements of A. A string (a1, ..., an) ∈ A* will be denoted a1···an, unless there is a danger of confusion. The concatenation of two strings u and v is denoted u·v or just uv, and the empty string is denoted ε. A string u is a prefix (postfix) of a string v if there exists a string w such that v = uw (v = wu); it is a proper prefix (postfix) if w ≠ ε. The length of a string u is denoted |u|. The cardinality of a set A is denoted |A|.

A directed edge-labeled graph G over a set A (of edge labels) consists of a set V of nodes and a set E ⊆ V × A × V of edges. An edge (u, a, v) is said to be an edge with label a that starts at u and ends at v, or shortly, from u to v. A (directed) path π in G is a sequence e1···en with n ≥ 0 and ei ∈ E for every i ∈ [n], such that for every m ∈ [n−1] the edge em+1 starts at the node where em ends. If n = 0, i.e., π = ε, then π is a path from u to u for every u ∈ V. If n ≥ 1, then π is a path from the start node of e1 to the end node of en. If π is a path from u to v, then it is said to start at u and end at v. A path is a cycle if it is nonempty and starts and ends at the same node.

Top-down tree transducers, which will be recalled in Section 3, work on ranked trees. This means that the number of children of a node of a tree is determined by the symbol at that node. A ranked alphabet Σ is a finite set of symbols such that each symbol a ∈ Σ is implicitly equipped with a rank rk(a) ∈ N. For k ∈ N we define Σ^(k) = {a ∈ Σ | rk(a) = k}. To avoid trivialities, we assume that Σ^(0) ≠ ∅. To indicate that σ ∈ Σ has rank k, we also write it as σ^(k).

The set TΣ of (finite, ordered, ranked) trees over the ranked alphabet Σ is the smallest set (of terms) such that a(t1, ..., tk) ∈ TΣ if k ∈ N, a ∈ Σ^(k), and t1, ..., tk ∈ TΣ. If a ∈ Σ^(0), then we also write a for the tree a(). If a ∈ Σ^(1) and t ∈ TΣ, then we also write at for a(t). More generally, for a string w = a1···an with n ∈ N and ai ∈ Σ^(1), we write w(t) = a1···an(t) for the tree a1(a2(···an(t)···)); in particular ε(t) = t.

We represent the nodes of a tree in Dewey notation, i.e., by strings of positive natural numbers. The empty string ε represents the root node and, for i ∈ N+, vi represents the ith child of the node v (and v is the parent of vi). Every node v of a tree t has a label in Σ, denoted lab(t, v). Formally, the set V(t) ⊆ (N+)* of nodes (together with their labels) of the tree t is inductively defined as: V(t) = {ε} ∪ {iv | i ∈ [k], v ∈ V(ti)} if t = a(t1, ..., tk), a ∈ Σ^(k), and t1, ..., tk ∈ TΣ; moreover, lab(t, ε) = a and lab(t, iv) = lab(ti, v). A node u is an ancestor of node v (and v is a descendant of u) if u is a prefix of v; it is a proper ancestor/descendant if it is a proper prefix. For ∆ ⊆ Σ, we define V∆(t) = {v ∈ V(t) | lab(t, v) ∈ ∆}; for a ∈ Σ, we write Va(t) instead of V{a}(t). The subtree of t rooted at v ∈ V(t) is denoted by t/v; formally, t/ε = t and if t = a(t1, ..., tk) then t/iv = ti/v. The size of t, denoted size(t), is its number |V(t)| of nodes. The height of t, denoted ht(t), is the maximal length of its nodes, i.e., max{|v| | v ∈ V(t)}. As an example, if t = σ(σ(a, b), τ(b)), then V(t) = {ε, 1, (1,1), (1,2), 2, (2,1)}, lab(t, (1,2)) = b, Vb(t) = {(1,2), (2,1)}, t/1 = σ(a, b), size(t) = 6, and ht(t) = 2. The height of a tuple of trees t̄ = (t1, ..., tk), [...]


Let ∆ be a ranked alphabet such that every symbol in Σ ∩ ∆ has the same rank in Σ and ∆. For a set of trees T ⊆ T∆, we define Σ(T) ⊆ TΣ∪∆ to be the set of trees a(t1, ..., tk) such that k ∈ N, a ∈ Σ^(k), and t1, ..., tk ∈ T, and we define TΣ(T) ⊆ TΣ∪∆ to be the smallest set of trees T′ such that T ∪ Σ(T′) ⊆ T′. Note that TΣ(∅) = TΣ.

A Σ-pattern is an upper portion, or prefix, of a tree in TΣ. Formally the set PΣ of Σ-patterns is defined to be the set of trees TΣ({⊥}), where ⊥ is a new symbol of rank zero that is not in Σ. If t0 is a pattern containing exactly k occurrences of ⊥, and t1, ..., tk is a sequence of k patterns, then the pattern t = t0[t1, ..., tk] is obtained from t0 by replacing the ith occurrence of ⊥ (in left-to-right order) by ti. A Σ-context is a Σ-pattern that contains exactly one occurrence of ⊥. The set of all Σ-contexts is denoted CΣ. Thus, for C ∈ CΣ and t ∈ TΣ, the tree C[t] ∈ TΣ is obtained from the context C by replacing the unique occurrence of ⊥ in C by t.

On the set PΣ we define a partial order ⊑ as follows: for patterns t and t′ in PΣ, t′ is a prefix of t, denoted t′ ⊑ t, if t = t′[t1, ..., tk] for suitable patterns t1, ..., tk; equivalently, Va(t′) ⊆ Va(t) for every a ∈ Σ. Obviously, ⊥ ⊑ t for every pattern t. We note that in [9] the inverse of the partial order ⊑ is used. Every nonempty set Π of Σ-patterns has a greatest lower bound ⊓Π in PΣ, called the largest common prefix of the patterns in Π; it is the unique pattern t′ such that for every v ∈ (N+)* and a ∈ Σ, v ∈ Va(t′) if and only if (1) v ∈ Va(t) for every t ∈ Π and (2) every proper ancestor of v is in V(t′). This implies the following easy lemma.

Lemma 1. Let Π be a nonempty subset of TΣ, and let v ∈ (N+)*.

Then, v ∈ V⊥(⊓Π) if and only if

(1) v ∈ V(t) for every t ∈ Π,

(2) lab(t1, v̂) = lab(t2, v̂) for every proper ancestor v̂ of v and all t1, t2 ∈ Π, and

(3) there exist t1, t2 ∈ Π such that lab(t1, v) ≠ lab(t2, v).

For instance, ⊓{σ(τ(a), b), σ(b, b)} = σ(τ(a), b) ⊓ σ(b, b) = σ(⊥, b).

For t, t′ ∈ TΣ and v ∈ V(t), we denote by t[v ← t′] the tree that is obtained from t by replacing its subtree t/v by t′. More precisely, if C is the unique context in CΣ such that C ⊑ t and C/v = ⊥, then t[v ← t′] = C[t′].

Let S be a subset of TΣ such that no s ∈ S is a subtree of s′ ∈ S with s ≠ s′. For a tree t ∈ TΣ and a partial function ψ : S → TΣ, we define t[s ← ψ(s) | s ∈ S] to be the result of replacing every subtree s of t by ψ(s), for every s ∈ S. More precisely, t[s ← ψ(s) | s ∈ S] = t[v1 ← ψ(t/v1)]···[vk ← ψ(t/vk)] where {v1, ..., vk} = {v ∈ V(t) | t/v ∈ S}. Note that vi is not an ancestor of vj, for i ≠ j, and hence the order of the substitutions [vi ← ψ(t/vi)] is irrelevant. Note also that t[s ← ψ(s) | s ∈ S] is defined if and only if ψ(t/vi) is defined for every i ∈ [k].

To formulate the rules of top-down tree transducers, we use variables xi, with i ∈ N, which are assumed to have rank 0. The set {x0, x1, x2, ...} of all such variables is denoted X. For k ∈ N, we denote {x1, ..., xk} by Xk; note that X0 = ∅.

3 Deterministic Top-Down Tree Transducers

[...] are the ranked input and output alphabets, respectively, and P is a finite nonempty set of look-ahead states. The function A maps look-ahead states to trees in T∆(Q({x0})); for p ∈ P, the tree A(p) is called the p-axiom of M. The finite set R provides at most one rule

q(a(x1:p1, ..., xk:pk)) → ζ

for every state q, every input symbol a of rank k ≥ 0 and every sequence p1, ..., pk of look-ahead states. The right-hand side ζ of the rule is a tree in T∆(Q(Xk)), i.e., ζ = t[q1(x_{i_1}), ..., qr(x_{i_r})] for some pattern t ∈ P∆, r = |V⊥(t)|, qj ∈ Q, and x_{i_j} ∈ {x1, ..., xk} for j ∈ [r]; we will denote ζ also by rhs(q, a, p1, ..., pk). Finally, δ is the transition function of the (total deterministic bottom-up) look-ahead automaton (P, δ). That means that δ(a, p1, ..., pk) ∈ P for every k ≥ 0, a ∈ Σ^(k), and p1, ..., pk ∈ P.

Examples of dtlas are given in the next section. Whenever we consider a dtla with the name M, it will be understood that its components are named (Q, Σ, ∆, R, A, P, δ). When necessary we provide the components of a dtla M with the subscript M. Then we have QM, ΣM, ∆M, RM, rhsM, etc. We denote by maxrhs(M) the maximal height of the axioms and the right-hand sides of the rules of M.

We now define the semantics of the dtla M, starting with the semantics of its look-ahead automaton (P, δ). The transition function δ gives rise to a function δ* that maps TΣ to P. It is defined by δ*(a(s1, ..., sk)) = δ(a, δ*(s1), ..., δ*(sk)) for a ∈ Σ^(k) and s1, ..., sk ∈ TΣ. For convenience, we denote the function δ* by δ as well. For p ∈ P we denote by ⟦p⟧M the set of trees s ∈ TΣ that have look-ahead state p, i.e., δ(s) = p; we drop the subscript M from ⟦p⟧M whenever it is clear from the context. Note that {⟦p⟧ | p ∈ P} is a partition of TΣ. For a node u of an input tree s ∈ TΣ, we also say that δ(s/u) is the look-ahead state at u.

For q ∈ Q, s ∈ TΣ, and u ∈ V(s), we define rhs(q, s, u) = rhs(q, a, p1, ..., pk) where lab(s, u) = a ∈ Σ^(k) and pi = δ(s/ui) for every i ∈ [k]. Intuitively, rhs(q, s, u) is the right-hand side of the rule that is applied when M arrives at node u in state q (if that rule exists); it is uniquely determined by the label of u and the look-ahead states at its children.

A sentential form of M for s ∈ TΣ is a tree in T∆(Q(V(s))), where the nodes in V(s) are viewed as symbols of rank 0. For sentential forms ξ, ξ′ we write ξ ⇒_s ξ′ if there exist v ∈ V(ξ), q ∈ Q, and u ∈ V(s) such that

ξ/v = q(u) and ξ′ = ξ[v ← rhs(q, s, u)[xi ← ui | i ∈ N+]].

This will be called a computation step of M in state q at nodes u and v. It is easy to see that the rewriting in computations is confluent (i.e., if ξ ⇒*_s ξ1 and ξ ⇒*_s ξ2, then there exists a sentential form ξ̄ such that ξ1 ⇒*_s ξ̄ and ξ2 ⇒*_s ξ̄). Hence, if ξ ⇒*_s t ∈ T∆ and ξ ⇒*_s ξ′, then ξ′ ⇒*_s t; thus, computations that start with a given sentential form lead to a unique tree in T∆ (if it exists).

The dtla M realizes a partial function ⟦M⟧ : TΣ → T∆, called its translation. Let s ∈ TΣ and δ(s) = p ∈ P. The output tree ⟦M⟧(s) of the transducer M for the input tree s is the unique tree t ∈ T∆ such that A(p)[x0 ← ε] ⇒*_s t (if it exists). For readability, we will write M(s) instead of ⟦M⟧(s).

Two dtlas M1 and M2 are equivalent if they realize the same translation, i.e., if ⟦M1⟧ = ⟦M2⟧.

Intuitively, a sentential form ξ consists of output that has already been produced by M; moreover, ξ/v = q(u) means that M has arrived at node u of the input tree s in state q and, starting in that state, will translate the input subtree s/u into the output subtree M(s)/v. Note that several parallel copies of M can arrive at u for different nodes of ξ, i.e., there may exist nodes v′ ≠ v such that ξ/v′ = q′(u), where q′ may also be equal to q.

A sentential form ξ for s is reachable if A(p)[x0 ← ε] ⇒*_s ξ where p = δ(s). Thus, if M(s) is defined and ξ is a reachable sentential form for s, then ξ ⇒*_s M(s).

We also define the semantics of every state q of M as a partial function ⟦q⟧M : TΣ → T∆ as follows. For s ∈ TΣ, ⟦q⟧M(s) is the unique tree t ∈ T∆ such that q(ε) ⇒*_s t (if it exists). For readability, we will write qM(s) instead of ⟦q⟧M(s). The following lemma is easy to prove.

Lemma 2. Let s ∈ TΣ and t ∈ T∆.

(1) For every q ∈ Q and u ∈ V(s),

q(u) ⇒*_s t if and only if qM(s/u) = t.

(2) For every sentential form ξ for s,

ξ ⇒*_s t if and only if t = ξ[q(u) ← qM(s/u) | q ∈ Q, u ∈ V(s)].

(3) If δ(s) = p, then

M(s) = A(p)[q(x0) ← qM(s) | q ∈ Q].

(4) For every q̄ ∈ Q, if s = a(s1, ..., sk), then

q̄M(s) = rhs(q̄, a, δ(s1), ..., δ(sk))[q(xi) ← qM(si) | q ∈ Q, i ∈ [k]].

Proof. (1) follows from the obvious bijection between the nodes of s/u and the nodes of s with prefix u (i.e., the descendants of u in s).

(2) is obvious from (1) and the fact that the computation steps ξ ⇒_s ξ′ of M are context-free. In fact, the computations of M on s can be viewed as derivations of a context-free grammar with the set of nonterminals Q(V(s)) and with rules q(u) → rhs(q, s, u)[xi ← ui | i ∈ N+].

(3) and (4) follow from (2), taking ξ = A(p)[x0 ← ε] and ξ = rhs(q̄, s, ε)[xi ← i | i ∈ [k]], respectively. □

Note that (3) and (4) of Lemma 2 form an alternative way of defining the semantics ofM (recursively).
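The recursive reading of Lemma 2(3) and 2(4) translates almost literally into an interpreter. The following Python sketch is an illustration under stated assumptions rather than the paper's formalism: trees are nested tuples (label, child_1, ..., child_k), a state call q(x_i) in an axiom or right-hand side is encoded as the triple ("call", q, i) (so the output alphabet is assumed not to contain the label "call"), and the dictionaries delta, rules, and axioms as well as all function names are hypothetical. As sample data it uses an instance of the transducer M_ex of the Introduction over the assumed input alphabet {f of rank 2, a, b of rank 0}.

```python
def look_ahead(delta, s):
    """δ*(s): run the bottom-up look-ahead automaton on the input tree s."""
    return delta[(s[0], tuple(look_ahead(delta, c) for c in s[1:]))]

def instantiate(delta, rules, zeta, args):
    """Replace every call q(x_i) in zeta by q_M(args[i])."""
    if zeta[0] == "call":
        _, q, i = zeta
        return run_state(delta, rules, q, args[i])
    return (zeta[0], *(instantiate(delta, rules, z, args) for z in zeta[1:]))

def run_state(delta, rules, q, s):
    """q_M(s) following Lemma 2(4); raises KeyError if no rule applies."""
    ps = tuple(look_ahead(delta, c) for c in s[1:])
    zeta = rules[(q, s[0], ps)]
    return instantiate(delta, rules, zeta, dict(enumerate(s[1:], start=1)))

def translate(delta, rules, axioms, s):
    """M(s) following Lemma 2(3)."""
    return instantiate(delta, rules, axioms[look_ahead(delta, s)], {0: s})

# Sample data (assumed): look-ahead state "pa" means "some leaf is labeled a".
delta = {("a", ()): "pa", ("b", ()): "pb"}
for p1 in ("pa", "pb"):
    for p2 in ("pa", "pb"):
        delta[("f", (p1, p2))] = "pb" if (p1, p2) == ("pb", "pb") else "pa"
axioms = {"pa": ("a",), "pb": ("call", "q", 0)}
rules = {
    ("q", "f", ("pb", "pb")): ("f", ("call", "q", 1), ("call", "q", 2)),
    ("q", "b", ()): ("b",),
}

b = ("b",)
copied = translate(delta, rules, axioms, ("f", b, ("f", b, b)))
collapsed = translate(delta, rules, axioms, ("f", b, ("f", ("a",), b)))
```

The sketch recomputes look-ahead states at every step, which keeps it short but is quadratic in the worst case; a practical implementation would first relabel the input tree bottom-up, as in the two-phase view of the Introduction. Note that the sample transducer is total although it is not complete: the state q has no rule for inputs with look-ahead state pa, but it is never called on such inputs.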

Convention. For a given dtla M it can be assumed that it is reduced, i.e., that all its states and look-ahead states are reachable in the following sense. A look-ahead state p is reachable if ⟦p⟧M ≠ ∅. A state q is reachable if q occurs in an axiom, or if q occurs [...] that it has certain properties, we allow the dtla to be non-reduced, but we will ensure that those properties are preserved under reduction (i.e., removing unreachable states and look-ahead states and the rules in which they occur). However, that will not be mentioned explicitly. □

A deterministic top-down tree transducer (dtop for short) is a dtla M with trivial look-ahead automaton (P, δ), i.e., P is a singleton. Whenever convenient, we drop (P, δ) from the tuple defining M, we identify A with the unique axiom A(p), we write a rule as q(a(x1, ..., xk)) → ζ rather than q(a(x1:p, ..., xk:p)) → ζ (where p is the unique look-ahead state of M) and we denote ζ by rhs(q, a).

A dtla M is proper (a dtpla for short) if it is not a dtop, i.e., if |P| ≥ 2. Obviously, to decide whether M is equivalent to a dtop, we may assume that M is proper.

A dtla M is total if dom(⟦M⟧) = TΣ, i.e., if its translation ⟦M⟧ is a total function. Note that it is decidable whether M is total, because dom(⟦M⟧) is effectively a regular tree language (cf. [6, Corollary 2.7]). From now on we mostly consider total dtlas.

A dtla M is complete if rhs(q, a, p1, ..., pk) is defined for every q ∈ Q, k ∈ N, a ∈ Σ^(k), and p1, ..., pk ∈ P. By Lemma 2(4), this means that qM(s) is defined for every q ∈ Q and s ∈ TΣ. Thus, by Lemma 2(3), if M is complete, then M is total.

A dtla M is linear if (1) for every p ∈ P, the variable x0 occurs at most once in A(p), and (2) for every rule q(a(x1:p1, ..., xk:pk)) → ζ, each variable xi occurs at most once in ζ.

A dtla M is ultralinear if there is a mapping µ : Q → N such that for every rule q(a(x1:p1, ..., xk:pk)) → ζ the following two properties hold for every q̄(xi) that occurs in ζ: (1) µ(q̄) ≥ µ(q), and (2) if µ(q̄) = µ(q), then xi occurs only once in ζ. Obviously, every linear dtla is ultralinear. We note that ultralinearity was first defined for context-free grammars, in [17].

A dtla M is nonerasing if it does not have erasing rules. A rule of M is an erasing rule if its right-hand side is in Q(X), i.e., contains no symbols from ∆.

A dtla M is bounded erasing (for short, b-erasing) if there is no cycle in the directed graph EM with the set of nodes Q and an edge from q to q′ if there is an erasing rule of the form q(a(x1:p1, ..., xk:pk)) → q′(xj). Obviously, every nonerasing dtla is b-erasing.
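Since b-erasing is defined as acyclicity of the graph EM, it can be tested by a standard depth-first search. The sketch below is one possible way to do this, under assumptions: rules are given as a dictionary from (q, a, (p1, ..., pk)) to a right-hand side encoded as a nested tuple, a right-hand side of the form q′(xj) is encoded as the triple ("call", q′, j), and the function names are hypothetical.

```python
def erasing_graph(rules):
    """Edges q -> q' of E_M, one for each erasing rule q(a(...)) -> q'(x_j)."""
    edges = {}
    for (q, _a, _ps), zeta in rules.items():
        if zeta[0] == "call":      # the right-hand side is a bare state call
            edges.setdefault(q, set()).add(zeta[1])
    return edges

def is_bounded_erasing(rules):
    """True if E_M has no cycle (three-colour depth-first search)."""
    edges = erasing_graph(rules)
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}

    def has_cycle(q):
        colour[q] = GREY
        for r in edges.get(q, ()):
            c = colour.get(r, WHITE)
            if c == GREY or (c == WHITE and has_cycle(r)):
                return True
        colour[q] = BLACK
        return False

    return not any(colour.get(q, WHITE) == WHITE and has_cycle(q)
                   for q in list(edges))

# Hypothetical rule sets over a unary input symbol "s" and a leaf "e".
acyclic = {
    ("q1", "s", ("p",)): ("call", "q2", 1),          # erasing: q1 -> q2
    ("q2", "s", ("p",)): ("s", ("call", "q1", 1)),   # produces output
    ("q1", "e", ()): ("e",),
    ("q2", "e", ()): ("e",),
}
cyclic = dict(acyclic)
cyclic[("q2", "s", ("p",))] = ("call", "q1", 1)      # erasing: q2 -> q1
```

With these rule sets, is_bounded_erasing(acyclic) holds while is_bounded_erasing(cyclic) does not, because the second set lets the transducer alternate between q1 and q2 without producing output.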

Note that all the above properties, except properness, are preserved under reduction.

Look-ahead states in input trees. Let M = (Q, Σ, ∆, R, A, P, δ) be a total dtla. To analyze the behaviour of M for different look-ahead states, we will consider input trees s̄ with occurrences of p ∈ P, viewed as input symbol of rank zero, representing an absent subtree s with δ(s) = p. If M arrives in state q at a p-labeled leaf of s̄, then M will output the new symbol ⟨q, p⟩ of rank zero, representing the absent output tree qM(s). In this way, M translates input trees s̄ ∈ TΣ(P) to output trees in T∆(Q × P). Without loss of generality we assume that P and Σ are disjoint, and so are Q × P and ∆.

Formally, we extend M to a dtla M° = (Q, Σ°, ∆°, R°, A, P, δ°) where Σ° = [...] q(p) → ⟨q, p⟩ for all q ∈ Q and p ∈ P such that qM(s) is defined for some s ∈ ⟦p⟧M, and δ° is the extension of δ such that δ°(p) = p for every p ∈ P.

For notational simplicity, we will denote δ°(s̄), M°(s̄), and qM°(s̄) by δ(s̄), M(s̄), and qM(s̄), respectively, for every input tree s̄ ∈ TΣ(P). But note that we do not drop ° from ⟦p⟧M°, ⟦M°⟧, and ⟦q⟧M°, i.e., ⟦p⟧M, ⟦M⟧, and ⟦q⟧M keep their meaning. We will use the following elementary lemma, which expresses the above intuition.

Lemma 3. Let M be a total dtla. Let s̄ be a tree in TΣ(P), and for every p ∈ P, let s_p be a tree in TΣ(P) such that δ(s_p) = p. Then δ(s̄[p ← s_p | p ∈ P]) = δ(s̄) and

M(s̄[p ← s_p | p ∈ P]) = M(s̄)[⟨q, p⟩ ← qM(s_p) | q ∈ Q, p ∈ P]. (1)

Proof. Let s′ = s̄[p ← s_p | p ∈ P]. It should be clear that δ(s′/v) = δ(s̄/v) for every node v of s̄. Let p′ = δ(s′) = δ(s̄).

We first assume that s_p ∈ TΣ for every p ∈ P. Thus s′ ∈ TΣ, and so M(s′) is defined. Let U be the set of nodes u of s̄ such that lab(s̄, u) ∈ P, and for u ∈ U, let p_u = lab(s̄, u). Now consider a computation A(p′)[x0 ← ε] ⇒*_s′ ξ of M such that ξ ∈ T∆(Q(U)) and none of the computation steps is at a node u ∈ U (and hence not at a descendant of u). Then, by the observation above, also A(p′)[x0 ← ε] ⇒*_s̄ ξ. By Lemma 2(2), M(s′) = ξ[q(u) ← qM(s_{p_u}) | q ∈ Q, u ∈ U]. Thus, qM(s_{p_u}) is defined for every q(u) that occurs in ξ, and so q(p_u) → ⟨q, p_u⟩ is a rule of M°. Hence ξ ⇒*_s̄ ξ[q(u) ← ⟨q, p_u⟩ | q ∈ Q, u ∈ U] and so M(s̄) = ξ[q(u) ← ⟨q, p_u⟩ | q ∈ Q, u ∈ U]. This proves Equation (1) for the case where s_p ∈ TΣ. It also implies that M° is total, because for every s̄ ∈ TΣ(P) one can choose some s_p ∈ ⟦p⟧M for each p ∈ P, and then M(s̄[p ← s_p | p ∈ P]) is defined and hence so is M(s̄). Since M° is total, the previous argument also proves Equation (1) for the general case where s_p ∈ TΣ(P). □

Note that the proof of Lemma 3 shows that if M is total, then so is M°. From Lemma 3 we immediately obtain the next lemma for Σ-contexts. Note that for every C ∈ CΣ and p ∈ P, the tree M(C[p]) is in T∆(Q × {p}).

Lemma 4. Let M be a total dtla. Let C ∈ CΣ, s ∈ TΣ(P), and p ∈ P such that δ(s) = p. Then δ(C[s]) = δ(C[p]) and M(C[s]) = M(C[p])[⟨q, p⟩ ← qM(s) | q ∈ Q].

Proof. Apply Lemma 3 with s̄ = C[p] and s_p = s. □

In Section 6 the next lemma will be needed.

Lemma 5. Let M be a total dtop. Then M is complete, and for every q ∈ Q there exists C ∈ CΣ such that ⟨q, p⟩ occurs in M(C[p]), where P = {p}.

Proof. It is convenient to assume that p = ⊥ (and so C[p] = C). We first prove the second statement. Since we assume, by convention, that every state q of M is reachable, we proceed by induction on the definition of reachability. If q(x0) occurs in the axiom A of M, then C = ⊥ satisfies the requirement because M(⊥) = A[q̄(x0) ← ⟨q̄, ⊥⟩ | q̄ ∈ Q]. Now let q(a(x1, ..., xk)) → ζ be a rule of M such that q is reachable, and let ζ/z = [...] C[a(s1, ..., sk)]. By Lemma 4, M(C′) = M(C)[⟨q̄, ⊥⟩ ← q̄M(a(s1, ..., sk)) | q̄ ∈ Q] and by Lemma 2(4), qM(a(s1, ..., sk)) = ζ[q̄(xi) ← q̄M(si) | q̄ ∈ Q, i ∈ [k]]. Then M(C′)/vz = qM(a(s1, ..., sk))/z = q′M(sj) = ⟨q′, ⊥⟩. So, ⟨q′, ⊥⟩ occurs in M(C′).

To show that M is complete, let q ∈ Q and a ∈ Σ^(k). We have shown that there exist C ∈ CΣ and v ∈ V(M(C)) such that M(C)/v = ⟨q, ⊥⟩. Let C′ be as above. Then M(C′)/v = qM(a(s1, ..., sk)). Hence rhs(q, a) is defined by Lemma 2(4). □

4 Difference Trees

Let M be a total dtla. We wish to decide whether M is equivalent to a dtop. Let C be a Σ-context and let p, p′ ∈ P. As explained in the Introduction, we are interested in the difference between the output of M on input C[p] and its output on input C[p′], see also Lemma 4. Intuitively, a dtop N that is equivalent to M does not know whether the subtree s of an input tree C[s] has look-ahead state p or p′, and hence, when reading the context C, it can output at most the largest common prefix M(C[p]) ⊓ M(C[p′]) of the output trees M(C[p]) and M(C[p′]). Recall that M(C[p]) denotes M°(C[p]), which is defined because M° is total (as shown in the proof of Lemma 3). Let v be a node of M(C[p]) ⊓ M(C[p′]) with label ⊥. Then we say that M(C[p])/v is a difference tree of M (and hence, by symmetry, so is M(C[p′])/v). Thus, a difference tree is a part of the output that can be produced by M because it knows that s has look-ahead state p (or p′). Intuitively, to simulate M, the dtop N must store the difference trees in its state. Hence, for N to exist, there should be finitely many difference trees (as will be proved in Corollary 28). We denote the set of all difference trees of M by diff(M), for varying C, p, p′, and v. Thus we define

diff(M) = {M(C[p])/v | C ∈ CΣ, p ∈ P, ∃p′ ∈ P : v ∈ V⊥(M(C[p]) ⊓ M(C[p′]))},

which is a subset of $\mathcal{T}_\Delta(Q \times P)$. We define the number $\mathrm{lubdiff}(M) \in \mathbb{N} \cup \{\infty\}$ to be the least upper bound of the heights of all difference trees of $M$, i.e.,

$$\mathrm{lubdiff}(M) = \mathrm{lub}\{\mathrm{ht}(t) \mid t \in \mathrm{diff}(M)\}$$

(for the definition of $\mathrm{lub}$ see Section 2). Intuitively, $\mathrm{lubdiff}(M)$ gives a measure of how much the transducer $M$ makes use of its look-ahead information. Clearly, $\mathrm{lubdiff}(M)$ is finite (i.e., is in $\mathbb{N}$) if and only if $\mathrm{diff}(M)$ is finite. We will say that a number $h(M) \in \mathbb{N}$ is a difference bound for $M$ if the following holds: if $\mathrm{diff}(M)$ is finite, then $\mathrm{lubdiff}(M) \le h(M)$. Thus, if $\mathrm{diff}(M)$ is infinite, then any natural number is a difference bound for $M$; if $\mathrm{diff}(M)$ is finite, then any upper bound of the heights of the (finitely many) difference trees of $M$ is a difference bound for $M$.
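The notions of largest common prefix and difference tree can be tried out on small finite trees. The following Python sketch is only an illustration: it assumes a tree is encoded as a pair `(label, children)`, and the names `common_prefix`, `difference_trees`, and `height` are hypothetical helpers, not notation from the paper. It treats two given output trees, not a whole transducer.

```python
BOTTOM = ("⊥", ())  # the special symbol ⊥ of rank 0

def height(t):
    """Height of a tree; a single leaf has height 0."""
    _, kids = t
    return 0 if not kids else 1 + max(height(k) for k in kids)

def common_prefix(t1, t2):
    """Largest common prefix: equal labels are kept, a mismatch becomes ⊥."""
    (l1, k1), (l2, k2) = t1, t2
    if l1 != l2 or len(k1) != len(k2):
        return BOTTOM
    return (l1, tuple(common_prefix(a, b) for a, b in zip(k1, k2)))

def difference_trees(t1, t2):
    """List (v, t1/v, t2/v) for every ⊥-labelled node v of the common prefix."""
    found = []
    def walk(a, b, path):
        if a[0] != b[0] or len(a[1]) != len(b[1]):
            found.append((path, a, b))
            return
        for i, (x, y) in enumerate(zip(a[1], b[1]), start=1):
            walk(x, y, path + (i,))
    walk(t1, t2, ())
    return found

# Two sample output trees over a made-up alphabet {f, g, a, b}.
a, b = ("a", ()), ("b", ())
t1 = ("f", (a, ("g", (b,))))
t2 = ("f", (a, b))
prefix = common_prefix(t1, t2)
diffs = difference_trees(t1, t2)
# Maximal height of the difference trees found for this one pair of outputs.
bound = max((max(height(x), height(y)) for _, x, y in diffs), default=0)
```

For a transducer, $\mathrm{diff}(M)$ collects such subtrees over all contexts and all pairs of look-ahead states, so a finite computation of this kind can only give a lower bound on $\mathrm{lubdiff}(M)$.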


A node $v \in V_\bot(M(C[p]) \sqcap M(C[p']))$ will be called a difference node of $M(C[p])$ and $M(C[p'])$. It is characterized in the next lemma, which follows immediately from Lemma 1.

Lemma 6. Let $C \in \mathcal{C}_\Sigma$, $p, p' \in P$, and $v \in \mathbb{N}_+^*$. Then $v$ is a difference node of $M(C[p])$ and $M(C[p'])$ if and only if

(1) $v \in V(M(C[p])) \cap V(M(C[p']))$,

(2) $\mathrm{lab}(M(C[p]), \hat v) = \mathrm{lab}(M(C[p']), \hat v)$ for every proper ancestor $\hat v$ of $v$, and

(3) $\mathrm{lab}(M(C[p]), v) \neq \mathrm{lab}(M(C[p']), v)$.

Note that if $v$ is a difference node of $M(C[p])$ and $M(C[p'])$, then $p \neq p'$. Thus, in the definition of $\mathrm{diff}(M)$ we can assume that $p \neq p'$. Hence, if $M$ is a dtop then $\mathrm{diff}(M) = \emptyset$ and so $\mathrm{lubdiff}(M) = 0$. Note also that, to compute $\mathrm{lubdiff}(M)$ for a dtla $M$, it suffices to consider difference trees of non-zero height, i.e., difference nodes $v$ that are not leaves of $M(C[p])$.

We now give some examples of dtlas with their sets of difference trees.

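Example 7. Let $\Sigma = \Delta = \{\sigma^{(1)}, a^{(0)}, b^{(0)}\}$, which means that $\Sigma$ and $\Delta$ are the ranked alphabet $\{\sigma, a, b\}$ with $\mathrm{rk}(\sigma) = 1$ and $\mathrm{rk}(a) = \mathrm{rk}(b) = 0$. We consider the following total dtla $M = (Q, \Sigma, \Delta, R, A, P, \delta)$ with $M(\sigma^n a) = a$ and $M(\sigma^n b) = \sigma^n b$ for every $n \in \mathbb{N}$. It is, in fact, the dtla $M_{\mathrm{ex}}$ of the Introduction, for this particular input alphabet. Its set of look-ahead states is $P = \{p_a, p_b\}$ with transition function $\delta$ defined by $\delta(a) = p_a$, $\delta(b) = p_b$, $\delta(\sigma, p_a) = p_a$, and $\delta(\sigma, p_b) = p_b$. Its set of states is $Q = \{q\}$, its two axioms are $A(p_a) = a$ and $A(p_b) = q(x_0)$, and its set $R$ of rules contains the two rules $q(\sigma(x_1{:}p_b)) \to \sigma(q(x_1))$ and $q(b) \to b$.

Clearly, $\mathcal{C}_\Sigma = \{\sigma^n\bot \mid n \in \mathbb{N}\}$ and for $C = \sigma^n\bot$ we have $M(C[p_a]) = a$ and $M(C[p_b]) = \sigma^n\langle q, p_b\rangle$. Since $M(C[p_a]) \sqcap M(C[p_b]) = \bot$, the only difference node of $M(C[p_a])$ and $M(C[p_b])$ is $\varepsilon$, and we obtain the difference trees $M(C[p_a])$ and $M(C[p_b])$. Hence, $\mathrm{diff}(M) = \{a\} \cup \{\sigma^n\langle q, p_b\rangle \mid n \in \mathbb{N}\}$ and $\mathrm{lubdiff}(M) = \infty$. Since $\mathrm{diff}(M)$ is infinite, $M$ is not equivalent to a dtop, as will be shown in Corollary 28. □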
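To see the growth of the difference trees of Example 7 concretely, the dtla can be simulated on monadic trees. The sketch below is an assumption-laden illustration: a monadic tree is encoded as a Python list of symbols ending in a leaf, the leaves `"pa"` and `"pb"` stand for look-ahead states placed at the hole of a context (as in $M^\circ$), and the function names are hypothetical.

```python
def delta(tree):
    """Look-ahead state of a monadic tree; it is determined by the leaf."""
    return {"a": "pa", "b": "pb", "pa": "pa", "pb": "pb"}[tree[-1]]

def q(tree):
    """State q: q(σ(x1:pb)) → σ(q(x1)), q(b) → b, and q(pb) → <q,pb> in M°."""
    head, rest = tree[0], tree[1:]
    if head == "σ":
        if delta(rest) != "pb":
            raise ValueError("q has no rule for look-ahead state pa")
        return ["σ"] + q(rest)
    if head == "b":
        return ["b"]
    if head == "pb":
        return ["<q,pb>"]
    raise ValueError("q is undefined on " + head)

def run(tree):
    """Apply the axiom selected by the look-ahead state of the whole tree."""
    return ["a"] if delta(tree) == "pa" else q(tree)

def difference_heights(n):
    """Heights of the two difference trees for the context σ^n ⊥."""
    out_a = run(["σ"] * n + ["pa"])   # always the single leaf a
    out_b = run(["σ"] * n + ["pb"])   # σ^n <q,pb>
    return len(out_a) - 1, len(out_b) - 1

heights = [difference_heights(n) for n in range(5)]
```

The second component of `difference_heights(n)` equals `n`, which matches the observation that no finite bound on the heights exists for this transducer.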

Example 8. Let $\Sigma = \Delta = \{\sigma^{(1)}, \tau^{(1)}, a^{(0)}, b^{(0)}\}$ and consider the following total dtla $M$. For an input tree $s$ with leaf $a$, $M$ outputs the top-most 80 unary symbols of $s$ and the leaf $a$ if $\mathrm{size}(s) > 80$, and it outputs $s$ if $\mathrm{size}(s) \le 80$. The same is true for an input tree with leaf $b$, with 30 unary symbols instead of 80. The look-ahead automaton of $M$ is similar to the one of the previous example: $P = \{p_a, p_b\}$ with $\delta(a) = p_a$, $\delta(b) = p_b$, and $\delta(\gamma, p) = p$ for $\gamma \in \{\sigma, \tau\}$ and $p \in P$. The set of states is $Q = \{q^a_i \mid i \in [80]\} \cup \{q^b_i \mid i \in [30]\}$. The axioms are $A(p_a) = q^a_{80}(x_0)$ and $A(p_b) = q^b_{30}(x_0)$. For the states with superscript $a$, $M$ has the rules $q^a_i(a) \to a$ for $i \in [80]$, and for $\gamma \in \{\sigma, \tau\}$, the rules $q^a_i(\gamma(x_1{:}p_a)) \to \gamma(q^a_{i-1}(x_1))$ for $1 < i \le 80$, and $q^a_1(\gamma(x_1{:}p_a)) \to \gamma(a)$. The rules for the states with superscript $b$ are similar (in the obvious way, with $b$ instead of $a$ and 30 instead of 80).

Every $\Sigma$-context $C$ is of the form $C = w(\bot)$ with $w \in \{\sigma, \tau\}^*$, cf. Section 2 for this notation. For such a context we have $M(C[p_a]) = w(\langle q^a_{80-|w|}, p_a\rangle)$ if $|w| < 80$,


$w(\langle q^a_{80-|w|}, p_a\rangle)$ and $M(C[p_b]) = w(\langle q^b_{30-|w|}, p_b\rangle)$, hence $M(C[p_a]) \sqcap M(C[p_b]) = w(\bot)$, which gives the difference trees $\langle q^a_{i+50}, p_a\rangle$ and $\langle q^b_i, p_b\rangle$, where $i = 30 - |w|$. For $w = w_1 z$ with $|w_1| = 30$ and $|z| < 50$, we get that $M(C[p_a]) = w_1 z(\langle q^a_{80-|w_1 z|}, p_a\rangle)$ and $M(C[p_b]) = w_1(b)$, hence $M(C[p_a]) \sqcap M(C[p_b]) = w_1(\bot)$, with the difference trees $z(\langle q^a_{50-|z|}, p_a\rangle)$ and $b$. Finally, if $w = w_1 z w_2$ with $|w_1| = 30$ and $|z| = 50$, then $M(C[p_a]) = w_1 z(a)$ and $M(C[p_b]) = w_1(b)$, hence again $M(C[p_a]) \sqcap M(C[p_b]) = w_1(\bot)$, now with the difference trees $z(a)$ and (again) $b$. Hence, $\mathrm{lubdiff}(M) = 50$.

It is not difficult to see that there exists a dtop $N$ equivalent to $M$. It outputs the top-most 30 symbols, and then it stores in its state the next 50 (or less) unary symbols of the input tree $s$, and depending on the leaf label of $s$, it outputs these symbols and $a$ or it outputs $b$. Formally, its set of states is $Q_N = Q_{N,1} \cup Q_{N,2}$ with $Q_{N,1} = \{q_i \mid i \in [30]\}$ and $Q_{N,2} = \{q_z \mid z \in \{\sigma, \tau\}^*, |z| \le 50\}$, and its axiom is $q_{30}(x_0)$. The rules of $N$ are the following, for every $q_i \in Q_{N,1}$, $q_z \in Q_{N,2}$, and $\gamma \in \{\sigma, \tau\}$. First, the rules $q_i(\gamma(x_1)) \to \gamma(q_{i-1}(x_1))$ for $i \neq 1$, and $q_1(\gamma(x_1)) \to \gamma(q_\varepsilon(x_1))$. Second, $q_z(\gamma(x_1)) \to q_{z\gamma}(x_1)$ for $|z| < 50$, and $q_z(\gamma(x_1)) \to q_z(x_1)$ for $|z| = 50$. Third, and finally, $q_i(a) \to a$, $q_i(b) \to b$, $q_z(a) \to z(a)$, and $q_z(b) \to b$. □
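The rules of the dtop $N$ of Example 8 can be checked against the intended translation by a small simulation. The sketch below is an illustration under stated assumptions: a monadic input tree $w(\ell)$ is encoded as a string `word` over `σ`, `τ` plus a leaf character, the constants 30 and 50 are exposed as the hypothetical parameters `top` and `buf` so that an exhaustive check stays small, and the function names are not from the paper.

```python
from itertools import product

def spec(word, leaf, top=30, buf=50):
    """Intended translation: keep top+buf symbols above a, top symbols above b."""
    keep = top + buf if leaf == "a" else top
    return word[:keep] + leaf

def dtop_n(word, leaf, top=30, buf=50):
    """Run the dtop N; the state is ('count', i) for q_i or ('buffer', z) for q_z."""
    out = []
    state = ("count", top)                       # axiom q_top(x0)
    for g in word:
        kind, val = state
        if kind == "count":
            out.append(g)                        # q_i(γ(x1)) → γ(...)
            state = ("count", val - 1) if val != 1 else ("buffer", "")
        elif len(val) < buf:
            state = ("buffer", val + g)          # q_z(γ(x1)) → q_{zγ}(x1)
        # otherwise |z| = buf: q_z(γ(x1)) → q_z(x1), the state is unchanged
    kind, val = state
    if kind == "count" or leaf == "b":
        out.append(leaf)                         # q_i(a) → a, q_i(b) → b, q_z(b) → b
    else:
        out.append(val + leaf)                   # q_z(a) → z(a)
    return "".join(out)

def agrees(top, buf, max_len):
    """Compare N with the specification on all inputs up to a given length."""
    for n in range(max_len + 1):
        for letters in product("στ", repeat=n):
            w = "".join(letters)
            for leaf in "ab":
                if dtop_n(w, leaf, top, buf) != spec(w, leaf, top, buf):
                    return False
    return True
```

Calling `agrees(2, 3, 7)` exercises all three phases (counting, buffering, full buffer) with scaled-down constants; the number of states of $N$ grows exponentially in `buf`, which is why the buffer is kept as a string here instead of enumerating $Q_{N,2}$.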

Example 9. Let $\Sigma = \{\sigma^{(2)}, aa^{(0)}, ab^{(0)}, ba^{(0)}, bb^{(0)}\}$ where we view $aa$, $ab$, $ba$, and $bb$ as symbols, and let $\Delta = \{\sigma^{(3)}, \#^{(2)}, a^{(0)}, b^{(0)}\} \cup \Sigma^{(0)}$. We consider the following total dtla $M$ such that $M(aa) = aa$, $M(ab) = ab$, $M(ba) = ba$, $M(bb) = bb$, and for every $s_1, s_2 \in \mathcal{T}_\Sigma$, $M(\sigma(s_1, s_2)) = \sigma(M(s_1), M(s_2), \#(y, z))$ where $y \in \{a, b\}$ is the first letter of the label of the left-most leaf of $\sigma(s_1, s_2)$ and $z \in \{a, b\}$ is the second letter of the label of its right-most leaf. Its look-ahead automaton has four states $p_{yz}$ with $y, z \in \{a, b\}$, such that $\delta(yz) = p_{yz}$ and $\delta(\sigma, p_{wx}, p_{yz}) = p_{wz}$ for all $w, x, y, z \in \{a, b\}$. It has one state $q$, its axioms are $A(p_{yz}) = q(x_0)$, and its rules are $q(yz) \to yz$ and

$$q(\sigma(x_1{:}p_{wx}, x_2{:}p_{yz})) \to \sigma(q(x_1), q(x_2), \#(w, z))$$

for all $w, x, y, z \in \{a, b\}$.

Consider a $\Sigma$-context $C$ and the trees $M(C[p_{aa}])$ and $M(C[p_{ba}])$. Let $u$ be the node of $C$ with $C/u = \bot$. It is easy to see that the difference nodes of $M(C[p_{aa}])$ and $M(C[p_{ba}])$ are the node $u$ and all nodes $v \cdot (3, 1)$ such that $v \neq u$ is a node of $C$ and $u$ is the left-most leaf of $C/v$. That gives the difference trees $M(C[p_{aa}])/u = \langle q, p_{aa}\rangle$, $M(C[p_{ba}])/u = \langle q, p_{ba}\rangle$, $M(C[p_{aa}])/v \cdot (3, 1) = a$ and $M(C[p_{ba}])/v \cdot (3, 1) = b$. In this way we obtain that $\mathrm{diff}(M) = \{a, b\} \cup \{\langle q, p_{yz}\rangle \mid y, z \in \{a, b\}\}$. Thus, $\mathrm{lubdiff}(M) = 0$.

Clearly, there is a dtop $N$ equivalent to $M$. It has three states $q, q_1, q_2$, axiom $q(x_0)$, and rules $q(yz) \to yz$,

$$q(\sigma(x_1, x_2)) \to \sigma(q(x_1), q(x_2), \#(q_1(x_1), q_2(x_2))),$$

$q_i(\sigma(x_1, x_2)) \to q_i(x_i)$ for $i = 1, 2$, $q_1(yz) \to y$, and $q_2(yz) \to z$ for $y, z \in \{a, b\}$. □
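The equivalence of the dtla $M$ and the dtop $N$ of Example 9 can be spot-checked on sample inputs. In the sketch below (an illustration only) a leaf is encoded as a two-letter string and an inner node as a tuple `("σ", left, right)`; the function names `lookahead`, `run_dtla`, `q`, `q1`, `q2` are chosen to mirror the text but are otherwise hypothetical.

```python
def lookahead(s):
    """δ: the state p_{yz}, encoded as the pair (y, z)."""
    if isinstance(s, str):
        return (s[0], s[1])
    _, left, right = s
    return (lookahead(left)[0], lookahead(right)[1])

def run_dtla(s):
    """The dtla M: the look-ahead states of the subtrees supply #(w, z)."""
    if isinstance(s, str):
        return s                                  # q(yz) → yz
    _, left, right = s
    w, z = lookahead(left)[0], lookahead(right)[1]
    return ("σ", run_dtla(left), run_dtla(right), ("#", w, z))

def q1(s):
    """q1 walks to the left-most leaf and outputs its first letter."""
    return s[0] if isinstance(s, str) else q1(s[1])

def q2(s):
    """q2 walks to the right-most leaf and outputs its second letter."""
    return s[1] if isinstance(s, str) else q2(s[2])

def q(s):
    """The dtop N: extra states q1 and q2 replace the look-ahead."""
    if isinstance(s, str):
        return s
    _, left, right = s
    return ("σ", q(left), q(right), ("#", q1(left), q2(right)))

sample = ("σ", ("σ", "ab", "ba"), ("σ", "bb", ("σ", "aa", "ab")))
same = q(sample) == run_dtla(sample)
```

The dtop pays for the missing look-ahead by traversing subtrees again with `q1` and `q2`, which is why it is not linear even though its difference trees have height 0.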


$\delta(\sigma, p_o, p_o) = p_e$, and $\delta(\sigma, p_e, p_o) = \delta(\sigma, p_o, p_e) = p_o$. Its set of states is empty, and its axioms are $A(p_e) = e$ and $A(p_o) = o$.

For every $\Sigma$-context $C$, $\{M(C[p_e]), M(C[p_o])\} = \{e, o\}$. Hence $\mathrm{diff}(M) = \{e, o\}$ and $\mathrm{lubdiff}(M) = 0$. Although $\mathrm{diff}(M)$ is finite, there is obviously no dtop equivalent to $M$. □

5 Normal Forms

In this section we prove normal forms for total dtlas. For each of these normal forms we consider its effect on $\mathrm{lubdiff}(M)$. We start with a simple normal form in which each axiom consists of one state, more precisely, is in $Q(\{x_0\})$.

A dtla $M$ is initialized if for every $p \in P$ there is a state $q_{0,p}$ such that $A(p) = q_{0,p}(x_0)$. The states $q_{0,p}$ are called initial states; they are not necessarily distinct. Note that for an initialized dtla, $M(s) = q_M(s)$ where $q = q_{0,p}$ and $p = \delta(s)$, for every $s \in \mathcal{T}_\Sigma$. The dtlas and dtops in Examples 8 and 9 are initialized. The dtlas in Examples 7 and 10 are not.

Recall that $\mathrm{maxrhs}(M)$ is the maximal height of the axioms and the right-hand sides of the rules of $M$.

Lemma 11. For every total dtla $M$ an equivalent initialized dtla $M'$ can be constructed, with the same look-ahead automaton as $M$, such that $|Q_{M'}| = |Q_M| + 1$, $\mathrm{maxrhs}(M') \le 2 \cdot \mathrm{maxrhs}(M)$, and

$$\mathrm{lubdiff}(M') \le \mathrm{lubdiff}(M) \le \max\{\mathrm{lubdiff}(M'), \mathrm{maxrhs}(M)\}.$$

If $M$ is ultralinear or b-erasing, then so is $M'$.

Proof. To construct $M'$ from $M$, we introduce a new state $q_0$. For every $a \in \Sigma^{(k)}$ and $p_1, \dots, p_k \in P$ we add the rule

$$q_0(a(x_1{:}p_1, \dots, x_k{:}p_k)) \to A(p)[q(x_0) \leftarrow \mathrm{rhs}(q, a, p_1, \dots, p_k) \mid q \in Q],$$

where $p = \delta(a, p_1, \dots, p_k)$. The right-hand side of this rule is defined: since every look-ahead state is reachable, there exist trees $s_i$ with $\delta(s_i) = p_i$, and so $\delta(a(s_1, \dots, s_k)) = p$; since $M$ is total, $M(a(s_1, \dots, s_k))$ is defined and hence $\mathrm{rhs}(q, a, p_1, \dots, p_k)$ is defined for every $q$ that occurs in $A(p)$. After adding the above rules, we change $A(p)$ into $q_0(x_0)$ for every $p \in P$.

Then $M'$ is initialized, with $q_{0,p} = q_0$ for all $p \in P$. It should be clear from Lemma 2(3/4) that $M'$ is equivalent to $M$. It should also be clear that for every $C \in \mathcal{C}_\Sigma$ and $p \in P$, if $C \neq \bot$ then $M'(C[p]) = M(C[p])$. Moreover, for $C = \bot$ we have $M'(p) = \langle q_0, p\rangle$ and $M(p) = A(p)[q(x_0) \leftarrow \langle q, p\rangle \mid q \in Q]$. Since $\mathrm{ht}(M'(p)/\varepsilon) = 0$ and $\mathrm{ht}(M(p)/v) \le \mathrm{ht}(A(p)) \le \mathrm{maxrhs}(M)$ for every $v \in V(M(p))$, this implies the required inequalities for $\mathrm{lubdiff}(M)$ and $\mathrm{lubdiff}(M')$.


Note that it follows from the inequalities for $\mathrm{lubdiff}(M)$ and $\mathrm{lubdiff}(M')$ that if $h(M')$ is a difference bound for $M'$, then $\max\{h(M'), \mathrm{maxrhs}(M)\}$ is a difference bound for $M$. In fact, if $\mathrm{diff}(M)$ is finite, then $\mathrm{diff}(M')$ is finite because $\mathrm{lubdiff}(M') \le \mathrm{lubdiff}(M)$, hence $\mathrm{lubdiff}(M') \le h(M')$, from which it follows that $\mathrm{lubdiff}(M) \le \max\{\mathrm{lubdiff}(M'), \mathrm{maxrhs}(M)\} \le \max\{h(M'), \mathrm{maxrhs}(M)\}$.

We continue with a basic and technically convenient normal form in which every state of the dtla only translates input trees that have the same look-ahead state; moreover, the rules satisfy a generalized completeness condition. It is closely related to the uniform i-transducer in [9].

A dtla $M$ is look-ahead uniform (for short, la-uniform) if there is a mapping $\rho : Q \to P$ (called la-map) satisfying the following conditions:

(1) For every $p \in P$ and $q \in Q$, if $q(x_0)$ occurs in $A(p)$, then $\rho(q) = p$.

(2) For every rule $q(a(x_1{:}p_1, \dots, x_k{:}p_k)) \to \zeta$ in $R$,

(a) $\rho(q) = \delta(a, p_1, \dots, p_k)$ and

(b) for every $\bar q \in Q$ and $i \in [k]$, if $\bar q(x_i)$ occurs in $\zeta$, then $\rho(\bar q) = p_i$.

(3) For every $q \in Q$, $a \in \Sigma^{(k)}$, and $p_1, \dots, p_k \in P$ such that $\delta(a, p_1, \dots, p_k) = \rho(q)$, there is a rule $q(a(x_1{:}p_1, \dots, x_k{:}p_k)) \to \zeta$ in $R$.
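Conditions (1) to (3) are directly checkable for a given candidate la-map. The following Python sketch illustrates this on the dtla of Example 7; the dictionary encoding of axioms, rules, and $\delta$ is an assumption made for this illustration (only the state calls occurring in a right-hand side are recorded, not the full tree), and `is_la_uniform` is a hypothetical name.

```python
def is_la_uniform(rho, axioms, rules, delta):
    """Check conditions (1)-(3) for the candidate la-map rho.

    rho:    state -> look-ahead state
    axioms: look-ahead state p -> set of states q with q(x0) occurring in A(p)
    rules:  (q, a, (p1, ..., pk)) -> set of (q', i) with q'(x_i) in the rhs
    delta:  (a, (p1, ..., pk)) -> look-ahead state
    """
    # (1) states called by the axiom A(p) must be mapped to p
    for p, states in axioms.items():
        if any(rho[s] != p for s in states):
            return False
    # (2a) and (2b) for every rule
    for (state, a, ps), calls in rules.items():
        if rho[state] != delta[(a, ps)]:
            return False
        if any(rho[callee] != ps[i - 1] for callee, i in calls):
            return False
    # (3) a rule exists whenever the look-ahead transition matches rho(q)
    for state in rho:
        for (a, ps), p in delta.items():
            if p == rho[state] and (state, a, ps) not in rules:
                return False
    return True

# The dtla of Example 7.
delta7 = {("a", ()): "pa", ("b", ()): "pb",
          ("σ", ("pa",)): "pa", ("σ", ("pb",)): "pb"}
axioms7 = {"pa": set(), "pb": {"q"}}          # A(pa) = a, A(pb) = q(x0)
rules7 = {("q", "σ", ("pb",)): {("q", 1)},    # q(σ(x1:pb)) → σ(q(x1))
          ("q", "b", ()): set()}              # q(b) → b

ok = is_la_uniform({"q": "pb"}, axioms7, rules7, delta7)
bad = is_la_uniform({"q": "pa"}, axioms7, rules7, delta7)
```

With $\rho(q) = p_b$ the check succeeds, in line with the remark on Example 7 that follows; the alternative map $\rho(q) = p_a$ already violates condition (1).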

Clearly, the dtla $M$ of Example 7 is la-uniform with $\rho(q) = p_b$, and similarly, the one of Example 8 is la-uniform with $\rho(q^y_i) = p_y$ for $y \in \{a, b\}$. Note that a dtop is la-uniform if and only if it is complete (if and only if it is total, by Lemma 5). In general, an la-uniform dtla is not complete; in fact, it is easy to see that every complete la-uniform dtla is a dtop.

We will need the following straightforward properties of an la-uniform dtla.

Lemma 12. Let $M$ be an la-uniform dtla with la-map $\rho$.

(1) $\mathrm{dom}([\![q]\!]_M) = [\![\rho(q)]\!]_M$ for every $q \in Q$.

(2) $M$ is total.

(3) $M^\circ$ is la-uniform with the same la-map $\rho$ as $M$.

(4) Let $\xi$ be a reachable sentential form for $s \in \mathcal{T}_\Sigma$. For all $v \in V(\xi)$, $q \in Q$, and $u \in V(s)$, if $\xi/v = q(u)$ then $\rho(q) = \delta(s/u)$.

Proof. (1) We prove by structural induction on $s \in \mathcal{T}_\Sigma$ that $q_M(s)$ is defined if and only if $\delta(s) = \rho(q)$. Let $s = a(s_1, \dots, s_k)$ and $\delta(s_i) = p_i$ for $i \in [k]$. Then $\delta(s) = \delta(a, p_1, \dots, p_k)$. Thus, by conditions (2)(a) and (3) above, $\delta(s) = \rho(q)$ if and only if there is a rule $q(a(x_1{:}p_1, \dots, x_k{:}p_k)) \to \zeta$ in $R$. For such a rule, by condition (2)(b) above, if $\bar q(x_i)$ occurs in $\zeta$, then $\rho(\bar q) = p_i$ and hence, by induction, $\bar q_M(s_i)$ is defined. It now follows from Lemma 2(4) that $\delta(s) = \rho(q)$ if and only if $q_M(s)$ is defined.

(2) This is immediate from (1) and Lemma 2(3), by condition (1) above.

(3) By (1), the set $R^\circ$ of rules of $M^\circ$ is obtained from $R$ by adding all the rules $q(p) \to \langle q, p\rangle$ such that $\rho(q) = p$. Hence $\rho$ also satisfies conditions (2) and (3) above for $M^\circ$ (and condition (1) above, because $M^\circ$ has the same axioms as $M$).

(4) The easy proof is by induction on the length of the computation $A(\delta(s))[x_0 \leftarrow \varepsilon] \Rightarrow^*$


Note that by (3) of this lemma, for every $C \in \mathcal{C}_\Sigma$ and $p \in P$, the tree $M(C[p])$ is in $\mathcal{T}_\Delta(Q_p \times \{p\})$ where $Q_p = \{q \in Q \mid \rho(q) = p\}$.

We now prove that la-uniformity is a normal form for total dtlas.

Lemma 13. For every total dtla $M$ an equivalent la-uniform dtla $M'$ can be constructed, with the same look-ahead automaton as $M$, such that $|Q_{M'}| = |Q_M| \cdot |P_M|$, $\mathrm{maxrhs}(M') = \mathrm{maxrhs}(M)$, and $\mathrm{lubdiff}(M') = \mathrm{lubdiff}(M)$. If $M$ is initialized, ultralinear or b-erasing, then so is $M'$.

Proof. We observe that it may be assumed that $M$ is complete: if $\mathrm{rhs}(q, a, p_1, \dots, p_k)$ is undefined, then we add the (dummy) rule $q(a(x_1{:}p_1, \dots, x_k{:}p_k)) \to d$ where $d$ is any element of $\Delta^{(0)}$.

We construct $M'$ as follows. The state set of $M'$ is $Q_{M'} = Q \times P$. Every axiom $A(p)$ of $M$ is changed into $A(p)[q(x_0) \leftarrow \langle q, p\rangle(x_0) \mid q \in Q]$, and every rule

$$q(a(x_1{:}p_1, \dots, x_k{:}p_k)) \to \zeta$$

is changed into the rule

$$\langle q, p\rangle(a(x_1{:}p_1, \dots, x_k{:}p_k)) \to \zeta[\bar q(x_i) \leftarrow \langle \bar q, p_i\rangle(x_i) \mid \bar q \in Q,\ i \in [k]]$$

where $p = \delta(a, p_1, \dots, p_k)$.

It should be clear that $M'$ satisfies conditions (1) and (2) of the definition of la-uniformity with la-map $\rho$ such that $\rho(\langle q, p\rangle) = p$; since $M$ is complete, condition (3) is also satisfied. Hence $M'$ is la-uniform.

Since $M$ and $M'$ are total, so are $M^\circ$ and $(M')^\circ$ (see the proof of Lemma 3). Obviously, for every computation of $(M')^\circ$ on an input tree $\bar s \in \mathcal{T}_\Sigma(P)$, one obtains a computation of $M^\circ$ on that input tree by changing every $\langle q, p\rangle(u)$ that occurs in a sentential form into $q(u)$, and every $\langle\langle q, p\rangle, p\rangle$ into $\langle q, p\rangle$. Hence $M'(\bar s) = M(\bar s)[\langle q, p\rangle \leftarrow \langle\langle q, p\rangle, p\rangle \mid q \in Q,\ p \in P]$. This implies that $M'$ is equivalent to $M$. It also implies that $\mathrm{lubdiff}(M') = \mathrm{lubdiff}(M)$, as can easily be verified.

Obviously, if $M$ is ultralinear with mapping $\mu_M : Q \to \mathbb{N}$, then so is $M'$ with the mapping $\mu$ such that $\mu(\langle q, p\rangle) = \mu_M(q)$. Moreover, if there is an edge from $\langle q, p\rangle$ to $\langle q', p_j\rangle$ in $E_{M'}$, then there is an edge from $q$ to $q'$ in $E_M$. Hence, if $M$ is b-erasing, then so is $M'$. □

Note that since $\mathrm{lubdiff}(M') = \mathrm{lubdiff}(M)$, the la-uniform dtla $M'$ has the same difference bounds as $M$.

Example 14. The dtla $M$ of Example 9 is not la-uniform. We change it into an la-uniform dtla by the construction in the proof of Lemma 13 (but keep calling it $M$). Then it has set of states $Q = \{q_{yz} \mid y, z \in \{a, b\}\}$ where $q_{yz}$ abbreviates $\langle q, p_{yz}\rangle$, so $\rho(q_{yz}) = p_{yz}$. Its axioms are $A(p_{yz}) = q_{yz}(x_0)$, and its rules are $q_{yz}(yz) \to yz$ and

$$q_{wz}(\sigma(x_1{:}p_{wx}, x_2{:}p_{yz})) \to \sigma(q_{wx}(x_1), q_{yz}(x_2), \#(w, z))$$


From now on we mainly consider la-uniform dtlas. For an la-uniform dtla $M$, its la-map will be denoted $\rho$ (or $\rho_M$ when necessary).

Finally we generalize the normal form for dtops in [9] to total dtlas. For this normal form it is essential that dtlas need not be initialized, i.e., that arbitrary axioms are allowed.

A dtla $M$ is earliest if it is la-uniform and, for every state $q$ of $M$, the set

$$\mathrm{rlabs}_M(q) := \{\mathrm{lab}(q_M(s), \varepsilon) \mid s \in \mathrm{dom}([\![q]\!]_M)\} \subseteq \Delta$$

is not a singleton. This is equivalent with requiring that $\sqcap\{q_M(s) \mid s \in \mathrm{dom}([\![q]\!]_M)\} = \bot$, cf. the definition of earliest in [9]. In other words, $M$ is not earliest if it has a state $q$ for which the roots of all output trees $q_M(s)$ have the same label; intuitively, the node with that label could be produced earlier in the computation of $M$.

A dtla $M$ is canonical if it is earliest and $[\![q]\!]_M \neq [\![q']\!]_M$ for all distinct states $q, q'$ of $M$. Since it is required that $M$ is la-uniform, the earliest and canonical properties are appropriately relativized with respect to each look-ahead state, see Lemma 12(1).

It is easy to see that the dtla $M$ of Example 8 is canonical. It is earliest because $\mathrm{rlabs}_M(q^a_i) = \{\sigma, \tau, a\}$ and $\mathrm{rlabs}_M(q^b_i) = \{\sigma, \tau, b\}$. For an input tree $w(a)$, $[\![q^a_i]\!]_M$ outputs the first $i$ symbols of $w$ and the leaf $a$ if $|w| \ge i$, and it outputs $w(a)$ if $|w| < i$. For an input tree $w(b)$, $[\![q^a_i]\!]_M$ is undefined. And the analogous statement holds for $q^b_i$.

Similarly, the dtla $M$ of Example 14 (which is the la-uniform version of the dtla of Example 9) is canonical: for all $y, z \in \{a, b\}$, $\mathrm{rlabs}_M(q_{yz}) = \{yz, \sigma\}$ and $[\![q_{yz}]\!]_M$ is the restriction of $[\![M]\!]$ to $[\![p_{yz}]\!]_M$.

The term 'canonical' suggests that any two equivalent canonical dtlas $M_1$ and $M_2$ are the same, modulo a renaming of their states and look-ahead states. That is indeed true for dtops, as shown in [9, Theorem 15], but it does not hold for arbitrary dtlas: for instance, the dtla $M$ of Example 14 and the dtop $N$ of Example 9 are both canonical, and they are equivalent but not the same. It is, however, true if $M_1$ and $M_2$ have the same look-ahead automaton (by a proof similar to the one of [9, Theorem 15]), but that fact will not be needed in what follows.

For an la-uniform dtla $M$ the sets $\mathrm{rlabs}_M(q)$ can be computed in a standard way. In fact, consider the directed graph with set of nodes $Q \cup \Delta$ and with the following edges: for every rule $q(a(x_1{:}p_1, \dots, x_k{:}p_k)) \to \zeta$ of $M$, if $\mathrm{lab}(\zeta, \varepsilon) = d \in \Delta$ then there is an edge $q \to d$, and if $\zeta = q'(x_j)$ then there is an edge $q \to q'$. Note that the subgraph induced by $Q$ is the graph $E_M$, as in the definition of a b-erasing dtla in Section 3. It is straightforward to show that $\mathrm{rlabs}_M(q) = \{d \in \Delta \mid q \to^* d\}$, as follows:

($\subseteq$) Structural induction on $s = a(s_1, \dots, s_k)$, such that $q_M(s)$ has root label $d$. By Lemma 2(4), $q_M(s) = \zeta[\bar q(x_i) \leftarrow \bar q_M(s_i) \mid \bar q \in Q,\ i \in [k]]$ where $\zeta = \mathrm{rhs}(q, s, \varepsilon)$. If $\zeta$ has root label $d \in \Delta$, then there is an edge $q \to d$. If $\zeta = q'(x_j)$, then $q_M(s) = q'_M(s_j)$ and so $q \to q' \to^* d$ by induction.

($\supseteq$) Induction on the length of $q \to^* d$. If $q \to d$ then $q_M(a(s_1, \dots, s_k)) = \zeta[\bar q(x_i) \leftarrow \bar q_M(s_i) \mid \bar q \in Q,\ i \in [k]]$ by Lemma 2(4), for any $s_i \in [\![p_i]\!]$, and so $q_M(a(s_1, \dots, s_k))$ has root label $d$. We have used that $M$ is la-uniform: if $\bar q(x_i)$ occurs in $\zeta$, then $\rho(\bar q) = p_i$ and hence $\bar q_M(s_i)$ is defined by Lemma 12(1). If $q \to q' \to^* d$ then $q_M(a(s_1, \dots, s_k)) = q'_M(s_j)$ has root label $d$, because by induction there exists
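The graph characterization of $\mathrm{rlabs}_M(q)$ amounts to a reachability computation. The sketch below is a minimal illustration, assuming each rule is abstracted to a triple `(state, kind, target)` where `kind` is `"out"` when the right-hand side has root label `target` in $\Delta$ and `"state"` when the right-hand side is a single state call; the state and symbol names in `RULES` are modelled on Example 16 later in this section, written here as plain strings.

```python
def rlabs(rules):
    """Map every state q to the set {d | q ->* d} of reachable root labels."""
    direct, erase = {}, {}
    for state, kind, target in rules:
        table = direct if kind == "out" else erase
        table.setdefault(state, set()).add(target)
    states = set(direct) | set(erase)
    for targets in erase.values():
        states |= targets
    result = {}
    for start in states:
        seen, stack, labels = {start}, [start], set()
        while stack:                       # depth-first search along q -> q'
            cur = stack.pop()
            labels |= direct.get(cur, set())
            for nxt in erase.get(cur, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        result[start] = labels
    return result

RULES = [
    ("qa", "out", "a"), ("qa", "out", "σa"),
    ("qb", "out", "b"), ("qb", "out", "σb"),
    ("qab", "out", "σab"),                    # only one possible root label
    ("q1ab", "out", "σab"),
    ("q1ab", "state", "qa"), ("q1ab", "state", "qb"),
]

labels = rlabs(RULES)
# States whose rlabs set is a singleton are the ones that violate "earliest".
singletons = sorted(q for q, ds in labels.items() if len(ds) == 1)
```

Here `qab` is reported as the only state with a singleton set, while `q1ab` reaches every label through its erasing edges.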


We now prove that canonicalness is a normal form for total dtlas. For an la-uniform dtla $M$, let $\mathrm{fix}(M)$ be a fixed subset of $\mathcal{T}_\Sigma$ such that for every $p \in P$ there is a unique $s \in \mathrm{fix}(M)$ with $\delta(s) = p$. Thus, $\mathrm{fix}(M)$ is a set of representatives of the equivalence classes $[\![p]\!]$, $p \in P$. Since every $[\![p]\!]$ is a regular tree language, a particular $\mathrm{fix}(M)$ can be computed from $M$. For every $p \in P$, let $s_p$ be the unique tree in $\mathrm{fix}(M)$ with $\delta(s_p) = p$. We define

$$\mathrm{sumfix}(M) = \sum_{q \in Q} \mathrm{size}(q_M(s_{\rho(q)})).$$

Note that $\mathrm{sumfix}(M)$ is in $\mathbb{N}$ and can be computed from $M$.

Theorem 15. For every la-uniform dtla $M$ an equivalent canonical dtla $\mathrm{can}(M)$ can be constructed, with the same look-ahead automaton as $M$, such that

$$\mathrm{lubdiff}(M) - \mathrm{sumfix}(M) \le \mathrm{lubdiff}(\mathrm{can}(M)) \le \mathrm{lubdiff}(M) + \mathrm{sumfix}(M).$$

Proof. We first prove the statement of this theorem for the case where $M$ is earliest. Since the equivalence of two dtlas is decidable (see [13] and [9, Corollary 19]), it is decidable for two states $q, q'$ of $M$ whether or not $[\![q]\!]_M = [\![q']\!]_M$. If this holds, then $q'$ can be replaced by $q$ in every axiom and every right-hand side of a rule, thus making $q'$ unreachable and hence superfluous. Since in $M(C[p])$ every $\langle q', p\rangle$ is replaced by $\langle q, p\rangle$, $\mathrm{lubdiff}(M)$ does not change. Thus, repeating this procedure one obtains a canonical dtla $\mathrm{can}(M)$ equivalent to $M$, with $\mathrm{lubdiff}(\mathrm{can}(M)) = \mathrm{lubdiff}(M)$.

For the interested reader we observe that for an earliest dtla $M$ the equivalence relation $\equiv$ on $Q$ defined by $q \equiv q'$ if and only if $[\![q]\!]_M = [\![q']\!]_M$, can in fact easily be computed by fixpoint iteration, because it is the largest equivalence relation on $Q$ such that if $q \equiv q'$ then (a) $\rho(q) = \rho(q')$ and (b) if $\mathrm{rhs}(q, a, p_1, \dots, p_k) = t[q_1(x_{i_1}), \dots, q_r(x_{i_r})]$ where $t \in \mathcal{P}_\Sigma$ and $r = |V_\bot(t)|$, then $\mathrm{rhs}(q', a, p_1, \dots, p_k) = t[q'_1(x_{i_1}), \dots, q'_r(x_{i_r})]$ with $q_j \equiv q'_j$ for every $j \in [r]$. The straightforward proof of this is left to the reader, cf. the proof of [9, Theorem 13]. Thus, the full dtla equivalence test of [13, 9] is not needed.

It remains to be proved that every la-uniform dtla $M$ can be transformed into an equivalent earliest dtla $M'$, with the same look-ahead automaton, such that the distance between $\mathrm{lubdiff}(M')$ and $\mathrm{lubdiff}(M)$ is at most $\mathrm{sumfix}(M)$. If $M$ is not earliest, then we obtain $M'$ by repeatedly applying the following transformation step.

Transformation. We transform $M$ into a dtla $N$ with the same look-ahead automaton. Let $Q_1$ be the (nonempty) set of states $q \in Q$ such that $\mathrm{rlabs}_M(q)$ is a singleton, and for every $q \in Q_1$ let $\mathrm{rlabs}_M(q) = \{d_q\}$ and $m_q = \mathrm{rk}(d_q)$. The set of states of $N$ is

$$Q_N := (Q - Q_1) \cup \{\langle q, i\rangle \mid q \in Q_1,\ i \in [m_q]\}.$$

When $M$ arrives in state $q \in Q_1$ at a node $u$ of an input tree $s$, $N$ will first output the symbol $d_q$ and then arrive at node $u$ in the states $\langle q, 1\rangle, \dots, \langle q, m_q\rangle$, to compute the direct subtrees of the tree $q_M(s/u)$, where $\langle q, i\rangle$ computes the $i$th direct subtree $q_M(s/u)/i$. So, to describe $N$, we define for every tree $\zeta \in \mathcal{T}_\Delta(Q(\Omega))$ where $\Omega$ is any set of symbols of rank 0, the tree $\zeta\Phi_\Omega = \zeta[q(\omega) \leftarrow d_q(\langle q, 1\rangle(\omega), \dots, \langle q, m_q\rangle(\omega)) \mid q \in Q_1,\ \omega \in \Omega]$. For every $p \in P$, the $p$-axiom of $N$ is $A(p)\Phi_{\{x_0\}}$. Every rule q(a(x1:


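$q \in Q - Q_1$, and into the $m_q$ rules $\langle q, i\rangle(a(x_1{:}p_1, \dots, x_k{:}p_k)) \to \zeta\Phi_{X_k}/i$ (with $i \in [m_q]$) if $q \in Q_1$. Note that in the latter case the root of $\zeta\Phi_{X_k}$ has label $d_q$ and so its $i$th direct subtree is well defined. That is clear if $\mathrm{lab}(\zeta, \varepsilon) = d_q$. If $\zeta = q'(x_j)$, then $q' \in Q_1$ and $d_{q'} = d_q$; so $\zeta\Phi_{X_k} = d_q(\langle q', 1\rangle(x_j), \dots, \langle q', m_q\rangle(x_j))$ and one obtains the rules $\langle q, i\rangle(a(x_1{:}p_1, \dots, x_k{:}p_k)) \to \langle q', i\rangle(x_j)$ for $i \in [m_q]$.

If $M$ has la-map $\rho$, then $N$ is la-uniform with la-map $\rho_N$ such that $\rho_N(q) = \rho(q)$ for $q \in Q - Q_1$ and $\rho_N(\langle q, i\rangle) = \rho(q)$ for $q \in Q_1$ and $i \in [m_q]$.

It should be clear intuitively that $N$ is equivalent to $M$. Formally it can easily be shown for every $s \in \mathcal{T}_\Sigma$ and every reachable sentential form $\xi$ of $M$ for $s$, that $\xi\Phi_{V(s)}$ is a reachable sentential form of $N$ for $s$ (where each computation step of $M$ is simulated by one or $m_q$ computation steps of $N$), and hence $N(s) = M(s)$. We will compute $\mathrm{lubdiff}(N)$ below; to do that we need to extend the previous statement to trees $\bar s \in \mathcal{T}_\Sigma(P)$. Let $\Psi$ be the substitution $[\langle q, p\rangle \leftarrow d_q(\langle\langle q, 1\rangle, p\rangle, \dots, \langle\langle q, m_q\rangle, p\rangle) \mid q \in Q_1,\ p \in P]$. Then, for every reachable sentential form $\xi$ of $M^\circ$ for $\bar s$, the tree $\xi\Phi_{V(s)}\Psi$ is a reachable sentential form of $N^\circ$ for $\bar s$, and hence $N(\bar s) = M(\bar s)\Psi$.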

Repetition. Using Lemma 2(3/4), it can easily be shown for every $q \in Q$ and $s \in [\![\rho(q)]\!]$, that $q_N(s) = q_M(s)$ if $q \notin Q_1$, and that $\langle q, i\rangle_N(s) = q_M(s)/i$ for every $i \in [m_q]$ if $q \in Q_1$. From this (and assuming that $\mathrm{fix}(N) = \mathrm{fix}(M)$), it should be clear that $\mathrm{sumfix}(N) < \mathrm{sumfix}(M)$, because, for $q \in Q_1$ and $s \in [\![\rho(q)]\!]$,

$$\sum_{i \in [m_q]} \mathrm{size}(\langle q, i\rangle_N(s)) = \sum_{i \in [m_q]} \mathrm{size}(q_M(s)/i) = \mathrm{size}(q_M(s)) - 1.$$

Hence, the repetition of the above transformation stops after at most $\mathrm{sumfix}(M)$ steps, with an earliest dtla $M'$ equivalent to $M$.

Difference trees. It now remains to prove that the distance between $\mathrm{lubdiff}(N)$ and $\mathrm{lubdiff}(M)$ is at most 1, i.e., that $\mathrm{lubdiff}(N) \le \mathrm{lubdiff}(M) + 1$ and $\mathrm{lubdiff}(M) \le \mathrm{lubdiff}(N) + 1$. Consider $C \in \mathcal{C}_\Sigma$ and $p, p' \in P$ with $p \neq p'$. Recall from above that $N(C[p]) = M(C[p])\Psi = M(C[p])[\langle q, p\rangle \leftarrow d_q(\langle\langle q, 1\rangle, p\rangle, \dots, \langle\langle q, m_q\rangle, p\rangle) \mid q \in Q_1]$ and similarly for $p'$. We observe that if $v$ is a node of $M(C[p])$, then each proper ancestor of $v$ has the same label in $M(C[p])$ and $N(C[p])$, and similarly for $p'$.

Let $v$ be a difference node of $N(C[p])$ and $N(C[p'])$ that is not a leaf of $N(C[p])$. Then $v \in V(M(C[p]))$. If also $v \in V(M(C[p']))$, then Lemma 6 implies that $v$ is also a difference node of $M(C[p])$ and $M(C[p'])$ (in fact, by the above observation every proper ancestor of $v$ has the same label in $M(C[p])$ and $M(C[p'])$; if $v$ would have the same label in $M(C[p])$ and $M(C[p'])$, then that label would be in $\Delta$ because $p \neq p'$, and hence $v$ would have the same label in $N(C[p])$ and $N(C[p'])$). Consequently, $\mathrm{ht}(N(C[p])/v) = \mathrm{ht}(M(C[p])\Psi/v) \le \mathrm{ht}(M(C[p])/v) + 1 \le \mathrm{lubdiff}(M) + 1$. If $v \notin V(M(C[p']))$, then the parent $\hat v$ of $v$ is a difference node of $M(C[p])$ and $M(C[p'])$ (in fact, the label of $\hat v$ in $M(C[p'])$ is $\langle q, p'\rangle$ for some $q \in Q_1$, and so $\hat v$ has different labels in $M(C[p])$ and $M(C[p'])$ because $p \neq p'$). Hence, in this case, $\mathrm{ht}(N(C[p])/v) \le \mathrm{ht}(N(C[p])/\hat v) \le \mathrm{ht}(M(C[p])/\hat v) + 1 \le \mathrm{lubdiff}(M) + 1$. This proves that $\mathrm{lubdiff}(N) \le \mathrm{lubdiff}(M) + 1$.

Now let $v$ be a difference node of $M(C[p])$ and $M(C[p'])$ that is not a leaf of


$\mathrm{ht}(N(C[p])/v) \le \mathrm{lubdiff}(N)$. Now assume that $v$ is not a difference node of $N(C[p])$ and $N(C[p'])$. Then, by Lemma 6, the label of $v$ in $M(C[p'])$ is $\langle q, p'\rangle$ for some $q \in Q_1$ and $\mathrm{lab}(N(C[p]), v) = \mathrm{lab}(N(C[p']), v) = d_q$. This implies, again by Lemma 6, that the children of $v$ are difference nodes of $N(C[p])$ and $N(C[p'])$. Let $vi$ be a child of $v$ for which $\mathrm{ht}(M(C[p])/vi)$ is maximal. Then we have $\mathrm{ht}(M(C[p])/v) = \mathrm{ht}(M(C[p])/vi) + 1 \le \mathrm{ht}(N(C[p])/vi) + 1 \le \mathrm{lubdiff}(N) + 1$. This proves that $\mathrm{lubdiff}(M) \le \mathrm{lubdiff}(N) + 1$. □

Note that it follows from the inequalities for $\mathrm{lubdiff}(\mathrm{can}(M))$ that if $h(M)$ is a difference bound for $M$, then $h(M) + \mathrm{sumfix}(M)$ is a difference bound for $\mathrm{can}(M)$. In fact (similar to the argument after Lemma 11), if $\mathrm{diff}(\mathrm{can}(M))$ is finite, then $\mathrm{diff}(M)$ is finite because $\mathrm{lubdiff}(M) \le \mathrm{lubdiff}(\mathrm{can}(M)) + \mathrm{sumfix}(M)$, hence $\mathrm{lubdiff}(M) \le h(M)$ and hence $\mathrm{lubdiff}(\mathrm{can}(M)) \le \mathrm{lubdiff}(M) + \mathrm{sumfix}(M) \le h(M) + \mathrm{sumfix}(M)$. Note also that the transformation in the above proof does not preserve the ultralinear property (as can be seen in the next example).

Example 16. In this example we denote by $Y$ the nonempty subsets of $\{a, b\}$, i.e., $Y = \{\{a\}, \{b\}, \{a, b\}\}$. Let $\Sigma = \{\sigma^{(2)}, a^{(0)}, b^{(0)}\}$ and $\Delta = \{\sigma_y^{(2)} \mid y \in Y\} \cup \{a^{(0)}, b^{(0)}\}$. We consider an la-uniform dtla $M$ such that $M(a) = a$, $M(b) = b$, and $M(\sigma(s_1, s_2)) = \sigma_y(M(s_1), M(s_2))$ for $s_1, s_2 \in \mathcal{T}_\Sigma$, where $y$ is the set of labels of the leaves of $\sigma(s_1, s_2)$. Its set of look-ahead states is $P = \{p_y \mid y \in Y\}$ and $\delta$ is defined in the obvious way: $\delta(a) = p_{\{a\}}$, $\delta(b) = p_{\{b\}}$, and $\delta(\sigma, p_y, p_z) = p_{y \cup z}$ for $y, z \in Y$. Its set of states is $Q = \{q_y \mid y \in Y\}$ with $\rho(q_y) = p_y$, its axioms are $A(p_y) = q_y(x_0)$ for every $y \in Y$, and its set $R$ consists of the rules $q_{\{a\}}(a) \to a$, $q_{\{b\}}(b) \to b$, and

$$q_{y \cup z}(\sigma(x_1{:}p_y, x_2{:}p_z)) \to \sigma_{y \cup z}(q_y(x_1), q_z(x_2))$$

for $y, z \in Y$.

The dtla $M_1$ that is obtained from $M$ by identifying all its states into one state $q$, is equivalent to $M$; it has $\mathrm{rlabs}_{M_1}(q) = \Delta$, but it is not la-uniform. However, $M$ is not earliest: in fact, $\mathrm{rlabs}_M(q_{\{a\}}) = \{\sigma_{\{a\}}, a\}$ and $\mathrm{rlabs}_M(q_{\{b\}}) = \{\sigma_{\{b\}}, b\}$, but $\mathrm{rlabs}_M(q_{\{a,b\}}) = \{\sigma_{\{a,b\}}\}$. Let $N$ be the dtla obtained from $M$ by applying the transformation in the proof of Theorem 15 once. Then $Q_1 = \{q_{\{a,b\}}\}$. We will write the states $\langle q_{\{a,b\}}, 1\rangle$ and $\langle q_{\{a,b\}}, 2\rangle$ as $q_{1\{a,b\}}$ and $q_{2\{a,b\}}$, respectively. So, $N$ has states $q_{\{a\}}$, $q_{\{b\}}$, $q_{1\{a,b\}}$, and $q_{2\{a,b\}}$. Its axioms are $A_N(p_y) = q_y(x_0)$ for $y = \{a\}$ or $y = \{b\}$ (just as in $M$), and $A_N(p_y) = \sigma_y(q_{1y}(x_0), q_{2y}(x_0))$ for $y = \{a, b\}$. For $y = \{a\}$ or $y = \{b\}$, its set $R_N$ of rules contains the rule

$$q_y(\sigma(x_1{:}p_y, x_2{:}p_y)) \to \sigma_y(q_y(x_1), q_y(x_2))$$

plus the rules $q_{\{a\}}(a) \to a$ and $q_{\{b\}}(b) \to b$ (just as in $M$). Moreover, for $y = \{a, b\}$, $R_N$ contains the rules

$$q_{1y}(\sigma(x_1{:}p_y, x_2{:}p_z)) \to \sigma_y(q_{1y}(x_1), q_{2y}(x_1))$$

$$q_{2y}(\sigma(x_1{:}p_z, x_2{:}p_y)) \to \sigma_y(q_{1y}(x_2), q_{2y}(x_2))$$

for all $z \in Y$, plus the rules

$$q_{1y}(\sigma(x_1{:}p_w, x_2{:}p_z)) \to q_w(x_1)$$


for all $w, z \in Y$ with $w \neq y$ and $w \cup z = y$. For $N$ we have $\mathrm{rlabs}_N(q_{\{a\}}) = \{\sigma_{\{a\}}, a\}$ and $\mathrm{rlabs}_N(q_{\{b\}}) = \{\sigma_{\{b\}}, b\}$ as for $M$, and we have $\mathrm{rlabs}_N(q_{1\{a,b\}}) = \mathrm{rlabs}_N(q_{2\{a,b\}}) = \Delta$. Hence $N$ is earliest. Obviously, $N$ is also canonical, and so $N = \mathrm{can}(M)$.

Clearly, $N$ is not ultralinear, because condition (2) of the definition of ultralinearity cannot be satisfied for the rule $q_{1y}(\sigma(x_1{:}p_y, x_2{:}p_z)) \to \sigma_y(q_{1y}(x_1), q_{2y}(x_1))$. Since $M$ is linear, this shows that ultralinearity is not preserved. After the definition of canonicalness we observed (without proof) that the dtla $N = \mathrm{can}(M)$ is unique, modulo a renaming of states, and so the nonpreservation of ultralinearity in the proof of Theorem 15 is, in fact, unavoidable. □

6 Difference Tuples and the Algorithm

In this section we introduce the notion of difference tuple, generalizing the notion of difference trees by considering all look-ahead states of a dtla simultaneously. Based on that notion, we present an algorithm which, for a given total dtla $M$ and a difference bound for $M$, decides whether $M$ is equivalent to a dtop, and if so, constructs such a dtop. By the results of the previous section, we may assume $M$ to be canonical.

6.1 Difference Tuples

Let $M$ be a total dtpla and let $P = \{\hat p_1, \dots, \hat p_n\}$, where the order of the look-ahead states is fixed as indicated. Recall that a dtpla is a proper dtla, i.e., a dtla that is not a dtop, hence $n \ge 2$. For a given context $C$ consider the trees $M(C[\hat p_1]), \dots, M(C[\hat p_n])$. Intuitively, the largest common prefix of all these trees does not depend on the look-ahead. In contrast, the subtrees of the above trees that are not part of the largest common prefix, do depend on the look-ahead information.

For trees $t_1, \dots, t_n \in \mathcal{T}_\Delta(Q \times P)$ we define

$$\mathrm{diftup}(t_1, \dots, t_n) := \{(t_1/v, \dots, t_n/v) \mid v \in V_\bot(\sqcap\{t_1, \dots, t_n\})\},$$

which is a set of $n$-tuples in $\mathcal{T}_\Delta(Q \times P)^n$. We will say that $(t_1/v, \dots, t_n/v)$ is the difference tuple of $t_1, \dots, t_n$ at such a node $v$. We define the set of difference tuples of $M$ as

$$\mathrm{diftup}(M) := \bigcup_{C \in \mathcal{C}_\Sigma} \mathrm{diftup}(M(C[\hat p_1]), \dots, M(C[\hat p_n])).$$

For a $\Sigma$-context $C$, we define $\mathrm{pref}(M, C) \in \mathcal{P}_\Delta$ as

$$\mathrm{pref}(M, C) := \sqcap\{M(C[p]) \mid p \in P\} = \sqcap\{M(C[\hat p_1]), \dots, M(C[\hat p_n])\}.$$

Note that $\mathrm{pref}(M, C)$ is a $\Delta$-pattern because a node with label $\langle q, \hat p_i\rangle$ in $M(C[\hat p_i])$ cannot have the same label in $M(C[\hat p_j])$ for $i \neq j$. Note also that


We define the number $\mathrm{lubdiftup}(M) \in \mathbb{N} \cup \{\infty\}$ to be the least upper bound of the heights of the components of all difference tuples of $M$, i.e.,

$$\mathrm{lubdiftup}(M) = \mathrm{lub}\{\mathrm{ht}(\bar t) \mid \bar t \in \mathrm{diftup}(M)\}.$$

Difference tuples are introduced for the following reason, cf. Lemma 4. We wish to decide whether $M$ is equivalent to a dtop. If there exists a dtop $N$ that is equivalent to $M$, then we expect intuitively for any $s \in \mathcal{T}_\Sigma$ and $C \in \mathcal{C}_\Sigma$, that $N(C[s]) = t[q_{1N}(s), \dots, q_{rN}(s)]$ where $t = \mathrm{pref}(M, C) = \sqcap\{M(C[\hat p_1]), \dots, M(C[\hat p_n])\}$ and $r = |V_\bot(t)|$. Thus, since $N$ does not know the look-ahead state $\delta_M(s)$ of $s$, it translates $C$ into the largest common prefix of the output trees $M(C[\hat p_1]), \dots, M(C[\hat p_n])$. Moreover, if the $i$th occurrence of $\bot$ is at node $v_i$ of $t$ for $i \in [r]$, then we expect the difference tuple $(M(C[\hat p_1])/v_i, \dots, M(C[\hat p_n])/v_i)$ of these output trees at $v_i$ to be stored in the state $q_i$ of $N$; in this way $N$ is prepared to continue its simulation of $M$ on the subtree $s$. This will be proved in Lemma 24, under the condition that $M$ is canonical and $N$ is earliest. If $N$ is also canonical, then its states are in one-to-one correspondence with the difference tuples of $M$, as will be proved in Lemma 26.

Before giving some examples, we show that $\mathrm{lubdiftup}(M)$ equals $\mathrm{lubdiff}(M)$, defined in Section 4. This implies that $\mathrm{diftup}(M)$ is finite if and only if $\mathrm{diff}(M)$ is finite.

Lemma 17. For every total dtpla $M$, $\mathrm{lubdiff}(M) = \mathrm{lubdiftup}(M)$.

Proof. ($\le$) We show that every difference tree is a subtree of a component of a difference tuple. Consider a difference tree $M(C[p])/v$ with $C \in \mathcal{C}_\Sigma$, $p \in P$, and $v$ a difference node of $M(C[p])$ and $M(C[p'])$ where $p' \in P$, i.e., $v \in V_\bot(M(C[p]) \sqcap M(C[p']))$. Since $\mathrm{pref}(M, C) \sqsubseteq M(C[p]) \sqcap M(C[p'])$, there is an ancestor $\hat v$ of $v$ such that $\hat v \in V_\bot(\mathrm{pref}(M, C))$. Thus, $M(C[p])/\hat v$ is a component of a difference tuple, and $M(C[p])/v$ is one of its subtrees.

($\ge$) We show that every component of a difference tuple is a difference tree. Consider $M(C[p])/v$ with $C \in \mathcal{C}_\Sigma$, $p \in P$, and $v \in V_\bot(\mathrm{pref}(M, C))$. By Lemma 1, each proper ancestor of $v$ has the same label in all $M(C[\bar p])$, $\bar p \in P$, but $v$ does not have the same label in all $M(C[\bar p])$. Thus, there exists $p' \in P$ such that $v$ has different labels in $M(C[p])$ and $M(C[p'])$. Then $v$ is a difference node of $M(C[p])$ and $M(C[p'])$ by Lemma 6. □

We now give examples of $\mathrm{diftup}(M)$ for several total dtplas $M$. In the remainder of this subsection, we will use the dtla of Example 8 as a running example.

Example 18. For the dtla $M$ of Example 7, with the order $P = \{p_a, p_b\}$, we obtain that $\mathrm{diftup}(M) = \{(a, \sigma^n\langle q, p_b\rangle) \mid n \in \mathbb{N}\}$.

For the dtla $M$ of Example 8, also with the order $P = \{p_a, p_b\}$, we obtain that $\mathrm{diftup}(M)$ consists of all pairs

$(\langle q^a_{i+50}, p_a\rangle, \langle q^b_i, p_b\rangle)$ for $i \in [30]$,

$(z(\langle q^a_{50-|z|}, p_a\rangle), b)$ for $z \in \{\sigma, \tau\}^*$ with $|z| < 50$, and


For the dtla $M$ of Example 14 (which is the la-uniform version of the dtla of Example 9) it is not difficult to see that $\mathrm{diff}(M) = \{a, b\} \cup \{\langle q_{yz}, p_{yz}\rangle \mid y, z \in \{a, b\}\}$, and that the set $\mathrm{diftup}(M)$ consists of the three 4-tuples $(a, a, b, b)$, $(a, b, a, b)$, and $(\langle q_{aa}, p_{aa}\rangle, \langle q_{ab}, p_{ab}\rangle, \langle q_{ba}, p_{ba}\rangle, \langle q_{bb}, p_{bb}\rangle)$, where we have taken the order $P = \{p_{aa}, p_{ab}, p_{ba}, p_{bb}\}$.

For the dtla $M$ of Example 10, $\mathrm{diftup}(M) = \{(e, o), (o, e)\}$.

In the above examples, the components of the difference tuples are exactly the difference trees. As another example, let $\Sigma = \Delta = \{\sigma^{(1)}, a^{(0)}, b^{(0)}, c^{(0)}\}$ and consider the dtla $M$ with $P = \{p_a, p_b, p_c\}$, $\delta(y) = p_y$ and $\delta(\sigma, p_y) = p_y$ for $y \in \{a, b, c\}$, $Q = \emptyset$, $A(p_a) = a$, $A(p_b) = \sigma(b)$, and $A(p_c) = \sigma(\sigma(c))$. Thus, for every $n \in \mathbb{N}$, $M$ translates $\sigma^n a$ into $a$, $\sigma^n b$ into $\sigma b$, and $\sigma^n c$ into $\sigma\sigma c$. Since $a \sqcap \sigma b = \bot$, $a \sqcap \sigma\sigma c = \bot$, and $\sigma b \sqcap \sigma\sigma c = \sigma\bot$, we obtain that $\mathrm{diff}(M) = \{a, \sigma b, \sigma\sigma c, b, \sigma c\}$. Since $a \sqcap \sigma b \sqcap \sigma\sigma c = \bot$, we obtain that $\mathrm{diftup}(M) = \{(a, \sigma b, \sigma\sigma c)\}$. Thus, $b$ and $\sigma c$ are difference trees that are not components of a difference tuple (but are subtrees of such components). Note that there is a dtop with one state that is equivalent to $M$. □
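The last example, with the three outputs $a$, $\sigma b$, and $\sigma\sigma c$, is small enough to recompute mechanically. Since $Q = \emptyset$, the outputs do not depend on the context, so it suffices to compare these three trees. The sketch below is illustrative: trees are encoded as pairs `(label, children)`, and the helper names `split` and `show` are hypothetical.

```python
from itertools import combinations

def split(trees):
    """Tuples (t1/v, ..., tn/v) for all ⊥-nodes v of the common prefix of trees."""
    if len({t[0] for t in trees}) > 1:        # root labels differ: v is a ⊥-node
        return [tuple(trees)]
    found = []
    for kids in zip(*(t[1] for t in trees)):  # descend into the i-th children
        found += split(list(kids))
    return found

def show(t):
    """Render a monadic tree such as σ(σ(c)) as the string 'σσc'."""
    return t[0] + "".join(show(k) for k in t[1])

leaf = lambda x: (x, ())
out_a = leaf("a")                             # A(pa) = a
out_b = ("σ", (leaf("b"),))                   # A(pb) = σ(b)
out_c = ("σ", (("σ", (leaf("c"),)),))         # A(pc) = σ(σ(c))
outputs = [out_a, out_b, out_c]

# diff(M): compare the outputs pairwise.
diff = set()
for t1, t2 in combinations(outputs, 2):
    for x, y in split([t1, t2]):
        diff |= {show(x), show(y)}

# diftup(M): compare all outputs simultaneously.
diftup = [tuple(show(t) for t in tup) for tup in split(outputs)]
```

The pairwise comparison of $\sigma b$ and $\sigma\sigma c$ descends below the shared root $\sigma$ and produces the extra difference trees $b$ and $\sigma c$, whereas the simultaneous comparison already stops at the root.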

In the next lemmasN is a total dtop, equivalent toM. We assume that the unique look-ahead state ofN is⊥. So,N◦translates input trees inTΣ({⊥}), in particularΣ -contexts, into output trees inT∆(QN × {⊥}); for aΣ-contextCwe of course writeC

instead ofC[⊥]. The unique axiomAN(⊥)is denoted byAN, a ruleq(a(x1:⊥, . . . , xk: ⊥)) → ζ is writtenq(a(x1, . . . , xk)) → ζ, andζ is denotedrhsN(q, a). For a tree

t∈ T∆(QN × {⊥})we define the patterntΦ∈ P∆bytΦ=t[hq,⊥i ← ⊥ |q∈QN];

similarly, fort∈ T∆(QN(X))we definetΦ=t[q(xi)← ⊥ |q∈QN, i∈N].

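Let $M$ be a canonical dtpla and $N$ a dtop such that $\llbracket M \rrbracket = \llbracket N \rrbracket$. We first show that the translation of an input tree by $M$ is always ahead of its translation by $N$, in a uniform way. An aheadness mapping from $N$ to $M$ is a function $\phi : Q_N \times P_M \to \mathcal{T}_\Delta(Q_M \times P_M)$ such that for every $C \in \mathcal{C}_\Sigma$ and $p \in P_M$,

$$M(C[p]) = N(C)[\langle q, \bot\rangle \leftarrow \phi(q, p) \mid q \in Q_N]. \qquad (2)$$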

Note that $\phi(q, p)$ must be in $\mathcal{T}_\Delta(\{\langle \bar{q}, p\rangle \mid \bar{q} \in Q_M, \rho_M(\bar{q}) = p\})$. Intuitively, $\phi$ defines the exact amount in which $M$ is ahead of $N$, which is independent of $C$. This amount is stored in each state $q$ of $N$, for every look-ahead state $p$ of $M$.

The next lemma provides an obvious equivalent formulation of Equation (2), using the substitution $\Phi$ defined above.

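Lemma 19. For every $C \in \mathcal{C}_\Sigma$ and $p \in P_M$, Equation (2) is equivalent to the following two conditions:

(a) $N(C)\Phi \sqsubseteq M(C[p])$, and

(b) for every $v \in V_\bot(N(C)\Phi)$ and $q \in Q_N$, if $N(C)/v = \langle q, \bot\rangle$, then $\phi(q, p) = M(C[p])/v$.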
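To make the equivalence stated in Lemma 19 concrete, the following Python sketch evaluates both formulations on a small tree. It is a minimal illustration under assumptions of our own: trees are nested tuples, a leaf $\langle q, p\rangle$ is a 1-tuple whose label is the pair `(q, p)`, and the trees `nc`, `good`, `bad` and the mapping `phi` are hypothetical toy data that stand in for $N(C)$, $M(C[p])$ and $\phi$; they are not produced by actual transducers.

```python
BOT = ("⊥",)

def is_state_leaf(t):
    # A leaf ⟨q, p⟩ is encoded as a 1-tuple whose label is the pair (q, p).
    return isinstance(t[0], tuple)

def substitute(t, phi, p):
    """Right-hand side of Equation (2): t[⟨q,⊥⟩ ← ϕ(q,p) | q ∈ Q_N]."""
    if is_state_leaf(t):
        q, _ = t[0]
        return phi[(q, p)]
    return (t[0],) + tuple(substitute(c, phi, p) for c in t[1:])

def pattern(t):
    """tΦ: replace every leaf ⟨q,⊥⟩ by ⊥."""
    if is_state_leaf(t):
        return BOT
    return (t[0],) + tuple(pattern(c) for c in t[1:])

def is_prefix(pat, t):
    """pat ⊑ t: pat arises from t by replacing some subtrees with ⊥."""
    if pat == BOT:
        return True
    return (pat[0] == t[0] and len(pat) == len(t)
            and all(is_prefix(a, b) for a, b in zip(pat[1:], t[1:])))

def state_leaves(t, path=()):
    """Yield (v, q) for every node v of t labeled ⟨q,⊥⟩."""
    if is_state_leaf(t):
        yield path, t[0][0]
    else:
        for i, c in enumerate(t[1:], start=1):
            yield from state_leaves(c, path + (i,))

def subtree(t, path):
    """Return t/v for the node v given by path (child i sits at index i)."""
    for i in path:
        t = t[i]
    return t

def equation_2(nc, mcp, phi, p):
    return mcp == substitute(nc, phi, p)

def conditions_a_b(nc, mcp, phi, p):
    if not is_prefix(pattern(nc), mcp):          # condition (a)
        return False
    return all(phi[(q, p)] == subtree(mcp, v)    # condition (b)
               for v, q in state_leaves(nc))

# Hypothetical toy data: N(C) = f(⟨q1,⊥⟩, a) and ϕ(q1, p) = g(⟨r,p⟩).
nc = ("f", (("q1", "⊥"),), ("a",))
phi = {("q1", "p"): ("g", (("r", "p"),))}
good = ("f", ("g", (("r", "p"),)), ("a",))   # satisfies Equation (2)
bad = ("f", ("a",), ("a",))                  # satisfies (a) but violates (b)
```

On this data both `equation_2` and `conditions_a_b` accept `good` and reject `bad`. Condition (b) only inspects the nodes at which $N(C)$ carries a state, which is why it is checked after the prefix condition (a) has guaranteed that these nodes exist in $M(C[p])$.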


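Let us now compute $N(C)$. For $|w| < 30$ we obtain that $N(C) = w(\langle q_{30-|w|}, \bot\rangle)$. Since in this case $M(C[p_a]) = w(\langle q^a_{80-|w|}, p_a\rangle)$ and $M(C[p_b]) = w(\langle q^b_{30-|w|}, p_b\rangle)$, Equation (2) requires for $i = 30 - |w|$, and hence for every $i \in [30]$, that

$$\phi(q_i, p_a) = \langle q^a_{i+50}, p_a\rangle \quad\text{and}\quad \phi(q_i, p_b) = \langle q^b_i, p_b\rangle.$$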

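Note that in this case $M$ is not properly ahead of $N$. For $w = w_1 z$ with $|w_1| = 30$ and $|z| < 50$, we get that $N(C) = w_1(\langle q_z, \bot\rangle)$. Since $M(C[p_a]) = w_1 z(\langle q^a_{80-|w_1 z|}, p_a\rangle)$ and $M(C[p_b]) = w_1(b)$, the mapping $\phi$ should satisfy

$$\phi(q_z, p_a) = z(\langle q^a_{50-|z|}, p_a\rangle) \quad\text{and}\quad \phi(q_z, p_b) = b, \quad\text{for } |z| < 50.$$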

Finally, if $w = w_1 z w_2$ with $|w_1| = 30$ and $|z| = 50$, then $N(C) = w_1(\langle q_z, \bot\rangle)$. Since $M(C[p_a]) = w_1 z(a)$ and $M(C[p_b]) = w_1(b)$, we get that

$$\phi(q_z, p_a) = z(a) \quad\text{and}\quad \phi(q_z, p_b) = b, \quad\text{for } |z| = 50.$$

Clearly, the above requirements define $\phi$ uniquely, and hence $\phi$ is an aheadness mapping from $N$ to $M$ (and, in fact, the unique one). $\Box$
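The case analysis of this example can be checked mechanically. The Python sketch below is a sanity check under stated assumptions, not an implementation of the transducers: it encodes only the closed forms of $N(C)$, $M(C[p_a])$, $M(C[p_b])$ and $\phi$ given above, represents a monadic tree $u(\ell)$ as the pair `(u, leaf)`, and assumes that $w$ is a string over the unary symbols $\sigma$ and $\tau$. The function names are hypothetical.

```python
def n_out(w):
    """N(C) for C = w(⊥), as (output prefix, state of N at the ⊥-leaf)."""
    if len(w) < 30:
        return w, ("q", 30 - len(w))
    return w[:30], ("q", w[30:80])           # z = w[30:80], so |z| <= 50

def m_out(w, p):
    """M(C[p]) for C = w(⊥), as (output prefix, final leaf)."""
    if p == "pa":
        if len(w) < 80:
            return w, ("qa", 80 - len(w), "pa")
        return w[:80], "a"
    if len(w) < 30:
        return w, ("qb", 30 - len(w), "pb")
    return w[:30], "b"

def phi(state, p):
    """The aheadness mapping ϕ(q, p) derived in the example, as (prefix, leaf)."""
    _, index = state
    if isinstance(index, int):               # state q_i with i in [30]
        if p == "pa":
            return "", ("qa", index + 50, "pa")
        return "", ("qb", index, "pb")
    z = index                                # state q_z with |z| <= 50
    if p == "pb":
        return "", "b"
    if len(z) == 50:
        return z, "a"
    return z, ("qa", 50 - len(z), "pa")

def equation_2_holds(w, p):
    prefix, state = n_out(w)
    phi_prefix, phi_leaf = phi(state, p)
    # Substituting ϕ(q, p) for ⟨q, ⊥⟩ in a monadic tree concatenates the prefixes.
    return m_out(w, p) == (prefix + phi_prefix, phi_leaf)

# Check Equation (2) for one input string of every length from 0 to 100.
checked = all(equation_2_holds(("στ" * 50)[:n], p)
              for n in range(101) for p in ("pa", "pb"))
```

The three branches of `phi` correspond to the three cases $|w| < 30$, $30 \le |w| < 80$, and $|w| \ge 80$ of the example. The index arithmetic $80 - |w| = (30 - |w|) + 50$ and $80 - |w_1 z| = 50 - |z|$ is what makes the substituted tree coincide with the output of $M$.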

Lemma 21. Let $M$ be a canonical dtpla and $N$ a dtop such that $\llbracket M \rrbracket = \llbracket N \rrbracket$. Then there is a unique aheadness mapping from $N$ to $M$.

Proof. We first show that $M$ is ahead of $N$, i.e., that all output symbols produced by $N$ on a given input context are also produced by $M$.

Claim 1. Let $p \in P_M$ and let $C$ be a $\Sigma$-context. For every $d \in \Delta$, $V_d(N(C)) \subseteq V_d(M(C[p]))$. Equivalently, $N(C)\Phi \sqsubseteq M(C[p])$.

Proof. We show that every node $v$ of $N(C)$ with label $d \in \Delta$ is also a node of $M(C[p])$, with the same label. The proof is by induction on the length of $v$, as follows. Since $v$'s proper ancestors are in $V_\Delta(N(C))$, the induction hypothesis implies that $v$ is a node of $M(C[p])$. Consider an arbitrary $s \in \llbracket p \rrbracket_M$. By Lemma 4, $v$ has label $d$ in $N(C[s])$. Since $\llbracket M \rrbracket = \llbracket N \rrbracket$, $M(C[s]) = N(C[s])$ and so $v$ has label $d$ in $M(C[s])$. Suppose that $v$ does not have label $d$ in $M(C[p])$. Then, again by Lemma 4, $v$ must have some label $\langle q, p\rangle$ in $M(C[p])$ such that $q_M(s)$ has root label $d$. Since this holds for every $s \in \llbracket p \rrbracket_M$, we obtain that $\mathrm{rlabs}_M(q) = \{d\}$, contradicting the fact that $M$ is earliest. Note that, since $M$ is la-uniform, $\rho_M(q) = p$ by Lemma 12(3) and hence $\llbracket p \rrbracket_M = \mathrm{dom}(\llbracket q \rrbracket_M)$ by Lemma 12(1). This proves the claim.

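Next we show that the amount in which $M$ is ahead of $N$ is independent of $C$.

Claim 2. Let $C_1, C_2$ be $\Sigma$-contexts, $v_1, v_2 \in \mathbb{N}_+^*$, $q \in Q_N$, and $p \in P_M$. If $N(C_1)/v_1 = N(C_2)/v_2 = \langle q, \bot\rangle$, then $M(C_1[p])/v_1 = M(C_2[p])/v_2$.

Proof. By Claim 1, $v_i$ is a node of $M(C_i[p])$. Let $t_i \in \mathcal{T}_\Delta(Q_M \times \{p\})$ denote the tree $M(C_i[p])/v_i$. For every $s \in \llbracket p \rrbracket_M$, $N(C_1[s])/v_1 = N(C_2[s])/v_2 = q_N(s)$ by Lemma 4, and so $M(C_1[s])/v_1 = M(C_2[s])/v_2$. Hence, again by Lemma 4, $t_1\Psi_s = t_2\Psi_s$ for all $s \in \llbracket p \rrbracket_M$, where $\Psi_s = [\langle q, p\rangle \leftarrow q_M(s) \mid q \in Q_M]$. Suppose that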