Tree recognizers - Recognizable tree languages

4.4 Recognizable tree languages

4.4.1 Tree recognizers

Since trees are hierarchical structures, they can be processed either starting from the root and moving towards the leaves (top-down), or starting from the leaves and going up towards the root (bottom-up). In both cases the control may be nondeterministic or deterministic.

We shall see that one of these four types of recognizers accepts a strictly smaller class than the recognizable tree languages, while the other three are equivalent. We start with nondeterministic top-down recognizers, first studied by Magidor and Moran (1969).

Definition 4.4.1. A nondeterministic top-down (ndT) ΣX-recognizer is a system TR = (Q, Σ, X, P, I), specified as follows.

(1) Q is a unary ranked alphabet of states such that Q ∩ (Σ ∪ X) = ∅.

(2) Σ and X are the input alphabets.

(3) I ⊆ Q is the set of initial states.

(4) P is a finite set of rules, each of one of the following two types:

(NDT1) q(d) → d, where d ∈ X ∪ Σ₀ and q ∈ Q;

(NDT2) q(f(ξ₁, . . . , ξ_m)) → f(q₁(ξ₁), . . . , q_m(ξ_m)), where m ≥ 1, f ∈ Σ_m and q, q₁, . . . , q_m ∈ Q.

Since we defined trees as terms, we give the semantics of an ndT recognizer as a term rewriting system. So, for any s, t ∈ T_{Σ∪Q}(X), s ⇒_TR t means that t is obtained from s by replacing an occurrence of

(1) a subtree q(d) of s by d, where q(d) → d is a rule of type (NDT1) in P, or

(2) a subtree q(f(t₁, . . . , t_m)) of s by the tree f(q₁(t₁), . . . , q_m(t_m)), using a rule q(f(ξ₁, . . . , ξ_m)) → f(q₁(ξ₁), . . . , q_m(ξ_m)) of type (NDT2) appearing in P.

Then, the tree language accepted by TR is the set

T(TR) := {t ∈ T_Σ(X) | q(t) ⇒*_TR t for some q ∈ I}.

Thus, TR accepts a ΣX-tree t if and only if it is possible to choose an initial state in which TR starts at the root of t and then successively apply a transition rule at each node of t in such a way that every leaf is reached in a state q that agrees with the label d of the leaf, i.e., P contains the rule q(d) → d. A ΣX-tree language R is called recognizable, or regular, if R = T(TR) for some ndT ΣX-recognizer TR. Let Rec denote the family of all recognizable tree languages. Hence, by our general convention, Rec_Σ(X) and Rec_Σ denote the sets of all recognizable ΣX-tree languages and all recognizable Σ-tree languages, respectively.

An ndT Σ-recognizer TR = (Q, Σ, P, I) of a Σ-tree language T ⊆ T_Σ is obtained by an obvious modification of Definition 4.4.1. Let Rec^vf be the family of recognizable tree languages without a leaf alphabet.

Example 4.4.2. The system TR = ({q_f, q_g, q_x}, {f/2, g/1}, {x}, P, {q_f}), where P consists of the rules

q_f(f(ξ₁, ξ₂)) → f(q_g(ξ₁), q_g(ξ₂))
q_g(g(ξ₁)) → g(q_g(ξ₁))
q_g(g(ξ₁)) → g(q_x(ξ₁))
q_x(x) → x,

is an ndT recognizer, and T(TR) = {f(gⁿ(x), g^m(x)) | n, m ≥ 1}. A sample computation in TR is

q_f(f(g(x), g(g(x)))) ⇒_TR f(q_g(g(x)), q_g(g(g(x))))
⇒²_TR f(g(q_x(x)), g(q_g(g(x))))
⇒_TR f(g(q_x(x)), g(g(q_x(x))))
⇒²_TR f(g(x), g(g(x))),

which shows that the tree f(g(x), g(g(x))) is accepted by the machine.

If a nondeterministic top-down tree recognizer has exactly one initial state and, in the accepting process, at most one rule can be applied at each node of the tree, then it becomes a deterministic top-down recognizer. In this case, the recognition power is weaker because the machine has to make the decision of acceptance separately at each leaf without any information about the tree outside the path leading from the root to that leaf.

Definition 4.4.3. A deterministic top-down (dT) ΣX-recognizer is an ndT ΣX-recognizer TR = (Q, Σ, X, P, I) with exactly one initial state, at most one rule of type (NDT1) for each pair (d, q) ∈ (X ∪ Σ₀) × Q, and at most one rule of type (NDT2) for any m ≥ 1, f ∈ Σ_m and q ∈ Q.

If I = {q0}, we write TR = (Q, Σ, X, P, q0). A ΣX-tree language is deterministic recognizable (dT-recognizable) if it is recognized by a dT ΣX-recognizer. The family of all dT-recognizable tree languages is denoted by DRec. An example is given next.

Example 4.4.4. If Σ = {f/2, g/1} and X = {x}, then TR = ({q₀, q₁}, Σ, X, P, q₀) with P consisting of the following rules

q₀(f(ξ₁, ξ₂)) → f(q₁(ξ₁), q₁(ξ₂))
q₀(g(ξ₁)) → g(q₁(ξ₁))
q₁(f(ξ₁, ξ₂)) → f(q₀(ξ₁), q₀(ξ₂))
q₁(g(ξ₁)) → g(q₀(ξ₁))
q₀(x) → x

is a deterministic top-down ΣX-recognizer accepting the language of all ΣX-trees in which each path from the root to a leaf is of even length.

We also note the following (see Magidor and Moran, 1969, Gécseg and Steinby, 1984, Gécseg and Steinby, 1997, Engelfriet, 1975c, for example).

Theorem 4.4.5. DRec ⊂ Rec.

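Proof. It is clear from the definitions that any dT ΣX-recognizer is also an ndT ΣX-recognizer. Let Σ = {f/2} and X = {x, y}. Obviously, the finite set {f(x, y), f(y, x)} is in Rec_Σ(X). If there is a dT ΣX-recognizer that accepts the trees f(x, y) and f(y, x), then it must also accept the trees f(x, x) and f(y, y). Indeed, P contains at most one rule q₀(f(ξ₁, ξ₂)) → f(q₁(ξ₁), q₂(ξ₂)) for the initial state q₀, so accepting f(x, y) requires the rules q₁(x) → x and q₂(y) → y, and accepting f(y, x) requires the rules q₁(y) → y and q₂(x) → x. Consequently, we get that {f(x, y), f(y, x)} ∉ DRec_Σ(X), and hence DRec ⊂ Rec.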
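The key step of the proof can also be checked by brute force on small instances. The Python sketch below enumerates every dT ΣX-recognizer with at most a given number of states over Σ = {f/2} and X = {x, y}, and counts those that accept both f(x, y) and f(y, x) but reject f(x, x) or f(y, y). It is only a finite sanity check under our own encoding (states are the integers 0, ..., n−1 with initial state 0; all function names are hypothetical), not a replacement for the argument.

```python
from itertools import product

LEAVES = ("x", "y")


def accepts(state, tree, leaf_rules, node_rule):
    """Deterministic top-down run; node_rule[q] is None or a pair (q1, q2)."""
    if isinstance(tree, str):
        return (state, tree) in leaf_rules
    successors = node_rule[state]
    if successors is None:
        return False
    return all(
        accepts(q, child, leaf_rules, node_rule)
        for q, child in zip(successors, tree[1:])
    )


def all_dt_recognizers(n):
    """Yield every dT recognizer with states 0..n-1 and initial state 0."""
    states = range(n)
    leaf_pairs = [(q, d) for q in states for d in LEAVES]
    # For each state: no rule for f, or exactly one pair of successors.
    choices = [None] + [(p, q) for p in states for q in states]
    for mask in product([False, True], repeat=len(leaf_pairs)):
        leaf_rules = {pair for pair, keep in zip(leaf_pairs, mask) if keep}
        for node_rule in product(choices, repeat=n):
            yield leaf_rules, node_rule


def check(max_states):
    """Return (number accepting both mixed trees, number of counterexamples)."""
    accepting_both = 0
    counterexamples = 0
    for n in range(1, max_states + 1):
        for leaf_rules, node_rule in all_dt_recognizers(n):
            def acc(tree):
                return accepts(0, tree, leaf_rules, node_rule)

            if acc(("f", "x", "y")) and acc(("f", "y", "x")):
                accepting_both += 1
                if not (acc(("f", "x", "x")) and acc(("f", "y", "y"))):
                    counterexamples += 1
    return accepting_both, counterexamples
```

Calling check(3) inspects all recognizers with up to three states; in agreement with the proof, no counterexample can occur.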

Let us now consider the bottom-up generalization of the finite automaton. The precise definition follows.

Definition 4.4.6. A nondeterministic bottom-up (ndB) ΣX-recognizer is a system BR = (Q, Σ, X, P, F), where

(1) Q, Σ and X have the same meaning as in Definition 4.4.1,

(2) F ⊆ Q is the set of final states, and

(3) P is a finite set of rules, each of one of the following two types:

(NDB1) d → q(d) with d ∈ X ∪ Σ₀ and q ∈ Q;

(NDB2) f(q₁(ξ₁), . . . , q_m(ξ_m)) → q(f(ξ₁, . . . , ξ_m)) with m ≥ 1, f ∈ Σ_m and q, q₁, . . . , q_m ∈ Q.

The next-configuration relation ⇒_BR is defined for any ndB recognizer BR as a term rewriting system as follows. For any s, t ∈ T_{Σ∪Q}(X), s ⇒_BR t means that t is obtained from s by replacing an occurrence of

(1) d in s by the tree q(d), where d → q(d) is a rule of type (NDB1) in P, or

(2) a subtree f(q₁(t₁), . . . , q_m(t_m)) of s by the tree q(f(t₁, . . . , t_m)), using a rule f(q₁(ξ₁), . . . , q_m(ξ_m)) → q(f(ξ₁, . . . , ξ_m)) of type (NDB2) appearing in P.

Thus, the tree language accepted by BR is the set

T(BR) := {t ∈ T_Σ(X) | t ⇒*_BR q(t) for some q ∈ F}.

Moreover, BR is deterministic if there are no two transition rules in P with the same left-hand side. Next we shall give an example.

Example 4.4.7. The device BR = ({q_f, q_g, q_x}, {f/2, g/1}, {x}, P, {q_f}), where P consists of the rules

f(q_g(ξ₁), q_g(ξ₂)) → q_f(f(ξ₁, ξ₂))
g(q_g(ξ₁)) → q_g(g(ξ₁))
g(q_x(ξ₁)) → q_g(g(ξ₁))
x → q_x(x)

is a bottom-up tree recognizer recognizing the tree language {f(gⁿ(x), g^m(x)) | n, m ≥ 1}. The computation of BR on f(g(x), g(g(x))) is

f(g(x), g(g(x))) ⇒²_BR f(g(q_x(x)), g(g(q_x(x))))
⇒²_BR f(q_g(g(x)), g(q_g(g(x))))
⇒_BR f(q_g(g(x)), q_g(g(g(x))))
⇒_BR q_f(f(g(x), g(g(x)))).

Note that BR is actually deterministic.
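A bottom-up run can be simulated by computing, for every subtree t, the set of all states q with t ⇒*_BR q(t). The Python sketch below does this for the recognizer BR of Example 4.4.7; the tuple encoding of trees and the names LEAF_RULES, NODE_RULES, states_of and accepts are assumptions made for illustration only.

```python
from itertools import product

# Assumed encoding: a leaf is a string; an inner node is a tuple
# (symbol, child_1, ..., child_m).

# (NDB1) rules d -> q(d), stored as d -> set of states q.
LEAF_RULES = {"x": {"q_x"}}

# (NDB2) rules f(q_1(xi_1), ..., q_m(xi_m)) -> q(f(xi_1, ..., xi_m)),
# stored as (f, (q_1, ..., q_m)) -> set of states q.
NODE_RULES = {
    ("f", ("q_g", "q_g")): {"q_f"},
    ("g", ("q_g",)): {"q_g"},
    ("g", ("q_x",)): {"q_g"},
}

FINAL = {"q_f"}


def states_of(tree):
    """Set of all states q such that tree =>* q(tree)."""
    if isinstance(tree, str):
        return set(LEAF_RULES.get(tree, set()))
    symbol, children = tree[0], tree[1:]
    child_states = [states_of(child) for child in children]
    reachable = set()
    # Try every combination of states reachable at the children.
    for combo in product(*child_states):
        reachable |= NODE_RULES.get((symbol, combo), set())
    return reachable


def accepts(tree):
    """True if some final state is reachable at the root."""
    return bool(states_of(tree) & FINAL)
```

Because BR is deterministic, states_of returns a set with at most one element for every tree here; for a genuinely nondeterministic recognizer the same function tracks all reachable states at once.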

Obviously, any deterministic bottom-up ΣX-recognizer is by definition an ndB ΣX-recognizer that contains at most one rule for each possible left-hand side. On the other hand, any ndB ΣX-recognizer can be made deterministic by the usual subset construction for finite automata (see Engelfriet (1975c, Theorem 3.8) or Gécseg and Steinby (1984, Theorem 2.6), for example). Thus, deterministic and nondeterministic bottom-up tree recognizers define the same class of tree languages. Furthermore, any ndB ΣX-recognizer defined as a term rewriting system becomes an equivalent ndT ΣX-recognizer of the kind introduced in Definition 4.4.1 when all rules are reversed and final states are turned into initial states (see Example 4.4.7). Moreover, the converse transformation yields an equivalent ndB ΣX-recognizer for any given ndT ΣX-recognizer (Engelfriet, 1975c, Theorem 3.17). Hence, ndB tree recognizers recognize exactly the recognizable tree languages.
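The reversal of rules can be illustrated on Example 4.4.7. The following Python sketch stores the ndB rules as plain data, reverses them into ndT rules with the final states as initial states, and compares the two recognizers on all ΣX-trees up to a small height. The data layout and all function names are hypothetical choices of ours; agreement on finitely many trees is of course only evidence, not a proof of equivalence.

```python
from itertools import product

# ndB rules of Example 4.4.7 (leaf: string, inner node: tuple).
NDB_LEAF = [("x", "q_x")]                      # d -> q(d)
NDB_NODE = [                                   # f(q_1.., q_m..) -> q(f(..))
    ("f", ("q_g", "q_g"), "q_f"),
    ("g", ("q_g",), "q_g"),
    ("g", ("q_x",), "q_g"),
]
NDB_FINAL = {"q_f"}


def reverse_rules(leaf_rules, node_rules, final):
    """Reverse every rule; final states become initial states."""
    ndt_leaf = {(q, d) for d, q in leaf_rules}          # q(d) -> d
    ndt_node = {}
    for symbol, child_states, q in node_rules:          # q(f(..)) -> f(..)
        ndt_node.setdefault((q, symbol), []).append(child_states)
    return ndt_leaf, ndt_node, set(final)


NDT_LEAF, NDT_NODE, NDT_INITIAL = reverse_rules(NDB_LEAF, NDB_NODE, NDB_FINAL)


def bottom_up_states(tree):
    if isinstance(tree, str):
        return {q for d, q in NDB_LEAF if d == tree}
    symbol, children = tree[0], tree[1:]
    reachable = set()
    for combo in product(*(bottom_up_states(c) for c in children)):
        reachable |= {q for f, qs, q in NDB_NODE if f == symbol and qs == combo}
    return reachable


def bottom_up_accepts(tree):
    return bool(bottom_up_states(tree) & NDB_FINAL)


def top_down_from(state, tree):
    if isinstance(tree, str):
        return (state, tree) in NDT_LEAF
    symbol, children = tree[0], tree[1:]
    return any(
        len(states) == len(children)
        and all(top_down_from(q, c) for q, c in zip(states, children))
        for states in NDT_NODE.get((state, symbol), [])
    )


def top_down_accepts(tree):
    return any(top_down_from(q, tree) for q in NDT_INITIAL)


def trees_up_to(height):
    """All trees over {f/2, g/1} and {x} of height at most `height`."""
    if height == 0:
        return ["x"]
    smaller = trees_up_to(height - 1)
    return (["x"] + [("g", t) for t in smaller]
            + [("f", s, t) for s in smaller for t in smaller])
```

The reversed rule set coincides with the rules of Example 4.4.2, and the two acceptance functions agree on every tree produced by trees_up_to(3).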

In what follows we also speak generally about tree recognizers without specifying the alphabets.

In document Syntax-directed translations, tree transformations and bimorphisms (Page 64-67)