Inference of recognizable tree sets. P. García and J. Oncina DSIC-II/47/93


INFERENCE OF RECOGNIZABLE TREE SETS*

P. García† and J. Oncina‡

† Departamento de Sistemas Informáticos y Computación. Universidad Politécnica de Valencia.

‡ Departamento de Tecnología Informática y Computación. Universidad de Alicante.

e-mail: [email protected]

Abstract

Recognizable tree sets are a generalization of regular languages. We show that the class of recognizable tree sets is identifiable in the limit from complete presentation and we propose an inference algorithm with this property. This algorithm can be used for the inference of context-free grammars from positive structural information and negative data. The proposed algorithm constructs its hypothesis in polynomial time with the size of the data.

Key words: Inductive Inference, Recognizable Tree Sets

Running head: Learning tree sets

* Work supported in part by the Spanish CICYT under grants TIC93-0633-CO2 and TIC1026/92-CO2.

I. INTRODUCTION

The inference of recognizable tree sets [Gécsec and Steinby,84] has recently been the object of renewed interest with regard to the problem of learning context-free languages from structural information [Sak,88,92]. An algorithm is proposed in [Sak,92] that identifies in the limit any context-free language from positive structural information. This algorithm is, in reality, an algorithm capable of identifying a subclass of recognizable tree sets, the 0-reversible tree sets, from positive data. On the other hand, for any context-free language a reversible grammar exists such that the set of skeletons of its derivation trees is 0-reversible. In order to correctly learn context-free languages, Sakakibara’s algorithm requires its positive structural information to have been extracted from a reversible grammar that generates the unknown language.

However, if the form of context-free grammar that provides the structural information is unrestricted, the class of the context-free languages is no longer identifiable in the limit from only positive structural information. This is a direct result of the fact that the family of recognizable tree sets is not identifiable in the limit from positive data.

In this paper, we propose an algorithm that identifies in the limit any recognizable tree set from complete presentation (positive and negative). Thus, our algorithm allows for the correct inference of any context-free language from complete structural information, regardless of the type of context-free grammar that is assumed to provide the learning skeletons. The algorithm presented here is a generalization of the regular language inference algorithm proposed by [Oncina and García,92].

This algorithm is based on the state clustering technique applied to the so-called "Subtree Automaton" associated with the given positive data. The subtree automaton is the tree counterpart of the conventional Prefix Tree Acceptor for strings. The algorithm merges the states of the Subtree Automaton in a certain order, using the negative data to avoid certain groupings. For finite data sets, the final result is a Non-Deterministic Tree Automaton which is consistent with the data. The algorithm converges in the limit to the minimum Deterministic Tree Automaton of the tree set to be inferred. Moreover, the algorithm can be used in such a way that it works in time polynomial in the size of the input data.

II. PRELIMINARIES AND NOTATION

Let N be the set of natural numbers and N* the free monoid generated by N with "." as the operation and λ as the identity. We define u ≤ w for u, w ∈ N* iff there exists v ∈ N* such that w = u.v (u < w if u ≤ w and u ≠ w). For x ∈ N*, we define the length of x, denoted by |x|, as follows:

|λ| = 0, |x.n| = |x| + 1 for n ∈ N.

D ⊆ N* is a tree domain iff it satisfies:

a) v ∈ D and u < v implies u ∈ D;
b) if u.i ∈ D, i ∈ N, then u.j ∈ D for 1 ≤ j ≤ i.

A ranked alphabet V is a finite set associated with a finite relation r ⊆ (V × N). Vₙ denotes the subset of V: {σ ∈ V | (σ, n) ∈ r}.

A tree t over a ranked alphabet V is a mapping t : D → V, with D being a tree domain called the domain of t and denoted by dom(t). The set of finite trees over V will be called Vᵀ. The alphabet can be seen as a set of function symbols having different arities, in such a way that Vᵀ can be considered as the set of terms over V. Let t ∈ Vᵀ and x ∈ dom(t). The depth of x is defined as depth(x) = |x|, and the depth of t as depth(t) = max{depth(x) | x ∈ dom(t)}.

The subtree of t rooted at x, denoted as t/x, is defined as: dom(t/x) = {y | x.y ∈ dom(t)} and (t/x)(y) = t(x.y) ∀y ∈ dom(t/x). If t ∈ Vᵀ, then Sub(t) is the set of subtrees of t, that is, Sub(t) = {t/x | x ∈ dom(t)}, and for a set T ⊆ Vᵀ, Sub(T) = ∪_{t∈T} Sub(t).

Let $ ∉ V be a new symbol of arity zero. We denote by Vᵀ_$ the set of trees in (V ∪ {$})ᵀ which contain exactly one symbol $. For s ∈ Vᵀ_$ and t ∈ (Vᵀ ∪ Vᵀ_$), we define the $-replacement s#t as:

\[
(s \# t)(x) =
\begin{cases}
s(x) & \text{if } x \in \mathrm{dom}(s),\ s(x) \neq \$ \\
t(y) & \text{if } x = z.y,\ s(z) = \$,\ y \in \mathrm{dom}(t)
\end{cases}
\]

For t ∈ Vᵀ and T ⊆ Vᵀ, we define the quotient t⁻¹T as:

\[
t^{-1}T =
\begin{cases}
\{\, s \in V^T_\$ \mid s \# t \in T \,\} & \text{if } t \in V^T - V_0 \\
t & \text{if } t \in V_0
\end{cases}
\]
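As an illustration of the $-replacement and of the quotient, the following sketch computes t⁻¹T for a finite tree set T and a tree t that is not a leaf. The tuple encoding of trees (leaf = string, internal node = (label, children...)) and the function names are assumptions of this example.

```python
HOLE = "$"

def replace(s, t):
    """s # t: substitute t for the unique HOLE leaf of s."""
    if s == HOLE:
        return t
    if isinstance(s, str):
        return s
    return (s[0],) + tuple(replace(c, t) for c in s[1:])

def contexts(u, t):
    """All s with exactly one HOLE such that s # t == u (one per occurrence of t in u)."""
    found = {HOLE} if u == t else set()
    if not isinstance(u, str):
        for i in range(1, len(u)):
            for c in contexts(u[i], t):
                found.add(u[:i] + (c,) + u[i + 1:])
    return found

def quotient(t, T):
    """t^{-1}T for a finite tree set T, for t outside V0 (the leaf case is not handled here)."""
    return set().union(*(contexts(u, t) for u in T))

ab = ("σ", "a", "b")
T = {("σ", ab, ("σ", "c")), ("σ", ab, ("σ", "c", ("σ", "c")))}
q = quotient(ab, T)   # the two contexts σ($,σ(c)) and σ($,σ(c,σ(c)))
```

Plugging ab back into every context in q with replace gives a tree of T again, which is exactly the defining property of the quotient.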

Let V be a ranked alphabet and m the greatest arity of the symbols in V. A non-deterministic tree automaton (NTA) is defined as the four-tuple A = (Q,V,δ,F), where Q is a finite set of states with Q ∩ V₀ = ∅, F ⊆ Q is the set of final states, and δ = (δ₀,δ₁,...,δₘ) is the set of state transition functions defined by:

\[
\delta_n : V_n \times (Q \cup V_0)^n \to 2^Q, \quad n = 1,2,\dots,m; \qquad \delta_0(a) = a, \ \forall a \in V_0
\]

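δ can be extended to operate on trees as follows:

\[
\delta(t) =
\begin{cases}
\bigcup_{q_1 \in \delta(t_1),\dots,q_n \in \delta(t_n)} \delta_n(\sigma, q_1,\dots,q_n) & \text{if } t = \sigma(t_1,\dots,t_n),\ n \ge 1 \\
\delta_0(a) & \text{if } t = a \in V_0
\end{cases}
\]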

A tree t ∈ Vᵀ is accepted by A if δ(t) ∩ F ≠ ∅. The set of trees accepted by A is defined as T(A) = {t ∈ Vᵀ | δ(t) ∩ F ≠ ∅}. A deterministic tree automaton (DTA) is an NTA such that for every q₁,...,qₖ ∈ (Q ∪ V₀) and every σ ∈ Vₖ, δₖ(σ, q₁,...,qₖ) has at most one element.
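The bottom-up extension of δ to trees and the acceptance test can be sketched as follows. The encoding is an assumption of this example: trees are nested tuples with string leaves, and the transition functions are stored in one dictionary mapping (symbol, tuple of arguments) to a set of states. The automaton shown, with states named A, B and C, is hypothetical.

```python
from itertools import product

def run(delta, t):
    """δ(t): what can be reached at the root of t; a leaf a yields {a}, mirroring δ0(a) = a."""
    if isinstance(t, str):
        return {t}
    reached = set()
    # Try every combination of results for the children, as in the union over q1,...,qn.
    for args in product(*(run(delta, c) for c in t[1:])):
        reached |= delta.get((t[0], args), set())
    return reached

def accepts(delta, final, t):
    """t is accepted iff δ(t) ∩ F is not empty."""
    return bool(run(delta, t) & final)

def is_deterministic(delta):
    """Every transition has at most one target state."""
    return all(len(targets) <= 1 for targets in delta.values())

# Hypothetical automaton over the skeletal alphabet {σ} with leaves a, b, c.
delta = {
    ("σ", ("a", "b")): {"A"},
    ("σ", ("a", "A", "b")): {"A"},
    ("σ", ("c",)): {"B"},
    ("σ", ("c", "B")): {"B"},
    ("σ", ("A", "B")): {"C"},
}
final = {"C"}
ok = accepts(delta, final, ("σ", ("σ", "a", "b"), ("σ", "c")))
```

An undefined transition simply contributes the empty set, so a tree such as σ(σ(c),σ(c)) is rejected by this automaton.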

A = (Q,V,δ,F) is isomorphic to A′ = (Q′,V,δ′,F′) if and only if there exists a bijection ϕ: Q → Q′ such that ϕ(F) = F′, and for every q₁,...,qₖ ∈ (Q ∪ V₀) and every σ ∈ Vₖ, ϕ(δₖ(σ, q₁,...,qₖ)) = δ′ₖ(σ, q′₁,...,q′ₖ), where q′ᵢ = ϕ(qᵢ) if qᵢ ∈ Q and q′ᵢ = qᵢ if qᵢ ∈ V₀, for i = 1,...,k.

The tree automaton A′ = (Q′,V,δ′,F′) is a subautomaton of A = (Q,V,δ,F) if and only if Q′ ⊆ Q, F′ ⊆ F and for every q₁,...,qₖ ∈ (Q′ ∪ V₀) and every σ ∈ Vₖ, either δ′ₖ(σ, q₁,...,qₖ) = δₖ(σ, q₁,...,qₖ) or δ′ₖ(σ, q₁,...,qₖ) is undefined.

Let A = (Q,V,δ,F) be a tree automaton. If Q′ ⊆ Q, then the subautomaton of A induced by Q′ is defined as A′ = (Q′,V,δ′,F′), where F′ = Q′ ∩ F and q ∈ δ′ₖ(σ, q₁,...,qₖ) if and only if q ∈ Q′, q₁,...,qₖ ∈ (Q′ ∪ V₀) and q ∈ δₖ(σ, q₁,...,qₖ).

Let A = (Q,V,δ,F) be a tree automaton. If π is a partition of Q, the quotient automaton of A induced by π is defined as A/π = (Q′,V,δ′,F′), where Q′ = π, F′ = {B ∈ π | B ∩ F ≠ ∅} and for every B₁,...,Bₖ ∈ (Q′ ∪ V₀) and σ ∈ Vₖ, B ∈ δ′ₖ(σ, B₁,...,Bₖ) if there exist q ∈ B and qᵢ ∈ Bᵢ ∈ Q′ or qᵢ = Bᵢ ∈ V₀, i = 1,...,k, such that q ∈ δₖ(σ, q₁,...,qₖ).

The tree set accepted by A/π is a superset of the tree set accepted by A, i.e. T(A) ⊆ T(A/π). If π′ refines π, then T(A/π′) ⊆ T(A/π).
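A minimal sketch of the quotient construction, under the assumption that the partition is given as a dictionary from each state to its block (a frozenset) and that the transitions are stored as a dictionary from (symbol, arguments) to a set of states. The two-state automaton and the state names q1 and q3 are hypothetical.

```python
from itertools import product

def run(delta, t):
    """Bottom-up evaluation; a leaf (a string) evaluates to itself."""
    if isinstance(t, str):
        return {t}
    out = set()
    for args in product(*(run(delta, c) for c in t[1:])):
        out |= delta.get((t[0], args), set())
    return out

def quotient_automaton(delta, final, block_of):
    """A/π: every state is replaced by its block; leaf symbols are left untouched."""
    to_block = lambda q: block_of.get(q, q)
    new_delta = {}
    for (sym, args), targets in delta.items():
        key = (sym, tuple(to_block(q) for q in args))
        new_delta.setdefault(key, set()).update(to_block(q) for q in targets)
    # A block is final as soon as it contains one final state.
    return new_delta, {to_block(q) for q in final}

# Hypothetical automaton that accepts only σ(c,σ(c)).
delta = {("σ", ("c",)): {"q1"}, ("σ", ("c", "q1")): {"q3"}}
final = {"q3"}
B = frozenset({"q1", "q3"})
q_delta, q_final = quotient_automaton(delta, final, {"q1": B, "q3": B})
```

After merging q1 and q3, the quotient still accepts σ(c,σ(c)) and additionally accepts σ(c) and every longer chain of this shape, which illustrates T(A) ⊆ T(A/π).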

Let T be a recognizable tree set. The canonical tree automaton that recognizes T is defined as A(T) = (Q,V,δ,F), where:

Q = {t⁻¹T | t ∈ Sub(T) − V₀}; F = {t⁻¹T | t ∈ T};
δₖ(σ, s₁⁻¹T,...,sₖ⁻¹T) = σ(s₁,...,sₖ)⁻¹T, for s₁,...,sₖ, σ(s₁,...,sₖ) ∈ Sub(T);
δ₀(a) = a, a ∈ V₀.

Let S ⊆ Vᵀ be a finite set. We define the subtree automaton of S as SA(S) = (Q,V,δ,F), where:

Q = Sub(S) − V₀; F = S;
δₖ(σ, s₁,...,sₖ) = σ(s₁,...,sₖ), for s₁,...,sₖ, σ(s₁,...,sₖ) ∈ Sub(S);
δ₀(a) = a, a ∈ V₀.
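The subtree automaton is straightforward to build once trees are hashable values. The sketch below assumes the nested-tuple encoding (leaf = string, internal node = (label, children...)) and uses each non-leaf subtree directly as the name of its state; the sample is the positive sample of Example 1.

```python
def sub(t):
    """All subtrees of t, including t itself and its leaves."""
    out = {t}
    if not isinstance(t, str):
        for c in t[1:]:
            out |= sub(c)
    return out

def subtree_automaton(sample):
    """SA(S): Q = Sub(S) - V0, F = S, and δ(σ, s1,...,sk) = σ(s1,...,sk)."""
    states = {u for t in sample for u in sub(t) if not isinstance(u, str)}
    # The children of a state are themselves states (or leaves), so they serve as arguments.
    delta = {(u[0], u[1:]): {u} for u in states}
    return states, delta, set(sample)

ab, c1 = ("σ", "a", "b"), ("σ", "c")
sample = [
    ("σ", ab, c1),
    ("σ", ab, ("σ", "c", c1)),
    ("σ", ("σ", "a", ab, "b"), c1),
]
states, delta, final = subtree_automaton(sample)
```

The construction assumes that no tree of the sample is a single leaf, so that F is a subset of Q.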

For a context-free grammar G = (N,Σ,P,S) and for each symbol X ∈ N ∪ Σ, we define the set of trees from G rooted at X as:

\[
D_X(G) =
\begin{cases}
\{a\} & \text{if } X = a \in \Sigma \\
\{\, X(t_1,\dots,t_n) \mid X \to X_1 \cdots X_n \in P,\ t_i \in D_{X_i}(G),\ 1 \le i \le n \,\} & \text{if } X \in N
\end{cases}
\]

A skeletal alphabet is a ranked alphabet with exactly one symbol, whose arities are greater than or equal to one. If Sk denotes such an alphabet and V₀ is an alphabet of symbols whose arity is zero, a tree over Sk ∪ V₀ is called a skeleton (all its inner nodes are labeled by σ and the leaves are symbols from V₀). Given (V,r) and (V′,r′), a transcription is a mapping π: V → V′ with π(Vₙ) ⊆ V′ₙ. The function π can be extended to π: Vᵀ → V′ᵀ by making π(σ(t₁,...,tₙ)) = π(σ)(π(t₁),...,π(tₙ)). Let V′ = (Sk ∪ V₀), and for any a ∈ V₀ let π(a) = a and π(A) = σ for A ∈ Vₙ, n ≥ 1; in that case, if t ∈ Vᵀ, π(t) gives the skeleton of t, which will be denoted as sk(t). For a set T ⊆ Vᵀ, the set of skeletons associated with the trees in T is sk(T) = ∪_{t∈T} sk(t).

If Sk = {σ}, then the set of trees over Sk ∪ Σ which are skeletons associated with trees in D_S(G) is denoted by sk(D(G)). Both D_S(G) and sk(D(G)) are regular tree sets.

Let t be a derivation tree or a skeleton of a context-free grammar with a terminal alphabet Σ (we admit that t may be an element of Σ). The frontier string of t, FS(t), is defined as:

\[
FS(t) =
\begin{cases}
t & \text{if } t \in \Sigma \\
FS(t_1) \cdots FS(t_k) & \text{if } t = \sigma(t_1,\dots,t_k)
\end{cases}
\]
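Both the skeleton map sk and the frontier string FS are one-line recursions. The sketch assumes the nested-tuple encoding of trees, with terminals as strings at the leaves; the derivation tree used here is a hypothetical one.

```python
SIGMA = "σ"

def skeleton(t):
    """sk(t): keep the leaves, relabel every inner node with σ."""
    return t if isinstance(t, str) else (SIGMA,) + tuple(skeleton(c) for c in t[1:])

def frontier(t):
    """FS(t): concatenation of the leaves from left to right."""
    return t if isinstance(t, str) else "".join(frontier(c) for c in t[1:])

# Hypothetical derivation tree S(A(a,b), B(c)).
d = ("S", ("A", "a", "b"), ("B", "c"))
sk_d = skeleton(d)
```

Taking the skeleton never changes the frontier, since only the labels of inner nodes are rewritten.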

Let A = (Q, Sk ∪ Σ, δ, F) be a DTA for a set of skeletons. There exists a context-free grammar G = (N,Σ,P,S) such that sk(D(G)) = T(A), which can be obtained as follows:

N = Q ∪ {S};
P = {δₙ(σ,q₁,...,qₙ) → q₁...qₙ | n ≥ 1, σ ∈ Skₙ, q₁,...,qₙ ∈ (Q ∪ Σ)} ∪ {S → q₁...qₙ | n ≥ 1, δₙ(σ,q₁,...,qₙ) ∈ F}.
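The construction of P can be read directly as a loop over the transitions. In this sketch the DTA is assumed to be stored as a dictionary from (σ, arguments) to a single state, productions are (left-hand side, right-hand side) pairs, and the automaton with states A, B and C is hypothetical.

```python
def grammar_from_dta(delta, final, start="S"):
    """One production per transition, plus a start production for each transition into F."""
    productions = set()
    for (_sym, args), q in delta.items():
        productions.add((q, args))          # δ(σ, q1,...,qn) → q1...qn
        if q in final:
            productions.add((start, args))  # S → q1...qn when δ(σ, q1,...,qn) ∈ F
    return productions

# Hypothetical DTA for skeletons; states are A, B, C and the terminals are a, b, c.
delta = {
    ("σ", ("a", "b")): "A",
    ("σ", ("a", "A", "b")): "A",
    ("σ", ("c",)): "B",
    ("σ", ("c", "B")): "B",
    ("σ", ("A", "B")): "C",
}
P = grammar_from_dta(delta, {"C"})
```

For this automaton the result contains S → AB, C → AB, A → ab, A → aAb, B → c and B → cB.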

III. INFERENCE ALGORITHM

Given a positive sample S₊ taken from a set of trees T ⊆ Vᵀ, a total order relation can be defined in Sub(S₊) − V₀ (the set of states of SA(S₊)) by sorting the trees in Sub(S₊) − V₀ according to their depth and, if two trees have the same depth, according to a fixed arbitrary order. We will denote by "<" the order defined in this way. This order can be extended to the blocks of any partition π of Sub(S₊) − V₀ as follows:

∀B, B′ ∈ π: B < B′ if and only if ∃t ∈ B such that ∀t′ ∈ B′, t < t′.

Given a partition π of Sub(S₊) − V₀ and B, B′ ∈ π, the result of the operation merge(π,B,B′) is the partition obtained by grouping B and B′ in the same block and leaving the remaining blocks of π unchanged.

Let S = S₊ ∪ S₋ be a finite sample of an unknown tree set T, and let r be the number of states of A₀ = SA(S₊). With input S, the algorithm outputs a tree automaton A₀/πᵣ, where πᵣ is obtained by the following iterative procedure: π₁ is the trivial partition of Sub(S₊) − V₀ (the states of A₀).

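For n = 1,...,r−1, with tₙ₊₁ denoting the (n+1)-th state of A₀ in the order < and δⁿ the transition functions of A₀/πₙ:

\[
\pi_{n+1} =
\begin{cases}
\mathrm{merge}(\pi_n, B, t_{n+1}) & \text{if } \exists B_1,\dots,B_k, B \in \pi_n,\ \exists \sigma \in V_k \text{ such that } \{B, t_{n+1}\} \subseteq \delta^n_k(\sigma, B_1,\dots,B_k),\ B < t_{n+1}, \\
& \text{and } B \text{ is the first block in } \delta^n_k(\sigma, B_1,\dots,B_k) \text{ such that } T(A_0/\mathrm{merge}(\pi_n,B,t_{n+1})) \cap S_- = \emptyset \\
\mathrm{merge}(\pi_n, B, t_{n+1}) & \text{if } B < t_{n+1} \text{ and } B \text{ is the first block in } \pi_n \text{ such that } T(A_0/\mathrm{merge}(\pi_n,B,t_{n+1})) \cap S_- = \emptyset \\
\pi_n & \text{otherwise}
\end{cases}
\]

Example 1. Let

S₊ = { σ(σ(a,b),σ(c)), σ(σ(a,b),σ(c,σ(c))), σ(σ(a,σ(a,b),b),σ(c)) } and
S₋ = { σ(σ(c),σ(c)), σ(σ(a,b),σ(a,b)), σ(a,b), σ(c), σ(a,σ(a,b),b), σ(c,σ(c)) }.

Then

Sub(S₊) − V₀ = { σ(c), σ(a,b), σ(c,σ(c)), σ(a,σ(a,b),b), σ(σ(a,b),σ(c)), σ(σ(a,b),σ(c,σ(c))), σ(σ(a,σ(a,b),b),σ(c)) }.

These trees are the states of A₀, named in increasing order:

q₁ = σ(c), q₂ = σ(a,b), q₃ = σ(c,σ(c)), q₄ = σ(a,σ(a,b),b), q₅ = σ(σ(a,b),σ(c)), q₆ = σ(σ(a,b),σ(c,σ(c))), q₇ = σ(σ(a,σ(a,b),b),σ(c)).

A₀ is defined by:

δ₀⁰(a) = a; δ₀⁰(b) = b; δ₀⁰(c) = c;
δ₁⁰(σ,c) = q₁; δ₂⁰(σ,a,b) = q₂; δ₂⁰(σ,c,q₁) = q₃; δ₂⁰(σ,q₂,q₁) = q₅; δ₂⁰(σ,q₂,q₃) = q₆; δ₂⁰(σ,q₄,q₁) = q₇;
δ₃⁰(σ,a,q₂,b) = q₄;
F⁰ = {q₅,q₆,q₇}

π₁ = {q₁, q₂, q₃, q₄, q₅, q₆, q₇}, therefore A₀/π₁ = A₀.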

A₀/merge(π₁,q₁,q₂) is defined by:

δ₀²(a) = a; δ₀²(b) = b; δ₀²(c) = c;
δ₁²(σ,c) = [q₁,q₂]; δ₂²(σ,a,b) = [q₁,q₂]; δ₂²(σ,c,[q₁,q₂]) = q₃; δ₂²(σ,[q₁,q₂],[q₁,q₂]) = q₅; δ₂²(σ,[q₁,q₂],q₃) = q₆; δ₂²(σ,q₄,[q₁,q₂]) = q₇;
δ₃²(σ,a,[q₁,q₂],b) = q₄;
F² = {q₅,q₆,q₇}

We can observe that σ(σ(c),σ(c)) ∈ T(A₀/merge(π₁,q₁,q₂)); then T(A₀/merge(π₁,q₁,q₂)) ∩ S₋ ≠ ∅, and then π₂ = π₁.

A₀/merge(π₂,q₁,q₃) is defined by:

δ₀³(a) = a; δ₀³(b) = b; δ₀³(c) = c;
δ₁³(σ,c) = [q₁,q₃]; δ₂³(σ,a,b) = q₂; δ₂³(σ,c,[q₁,q₃]) = [q₁,q₃]; δ₂³(σ,q₂,[q₁,q₃]) = {q₅,q₆}; δ₂³(σ,q₄,[q₁,q₃]) = q₇;
δ₃³(σ,a,q₂,b) = q₄;
F³ = {q₅,q₆,q₇}

We can see that T(A₀/merge(π₂,q₁,q₃)) ∩ S₋ = ∅; then π₃ = merge(π₂,q₁,q₃) = {[q₁,q₃], q₂, q₄, q₅, q₆, q₇}.

A₀/merge(π₃,[q₁,q₃],q₄) is defined by:

δ₀⁴(a) = a; δ₀⁴(b) = b; δ₀⁴(c) = c;
δ₁⁴(σ,c) = [q₁,q₃,q₄]; δ₂⁴(σ,a,b) = q₂; δ₂⁴(σ,c,[q₁,q₃,q₄]) = [q₁,q₃,q₄]; δ₂⁴(σ,q₂,[q₁,q₃,q₄]) = {q₅,q₆}; δ₂⁴(σ,[q₁,q₃,q₄],[q₁,q₃,q₄]) = q₇;
δ₃⁴(σ,a,q₂,b) = [q₁,q₃,q₄];
F⁴ = {q₅,q₆,q₇}

We can see that σ(σ(c),σ(c)) ∈ T(A₀/merge(π₃,[q₁,q₃],q₄)); then T(A₀/merge(π₃,[q₁,q₃],q₄)) ∩ S₋ ≠ ∅.

A₀/merge(π₃,q₂,q₄) is defined by:

δ₀⁴(a) = a; δ₀⁴(b) = b; δ₀⁴(c) = c;
δ₁⁴(σ,c) = [q₁,q₃]; δ₂⁴(σ,a,b) = [q₂,q₄]; δ₂⁴(σ,c,[q₁,q₃]) = [q₁,q₃]; δ₂⁴(σ,[q₂,q₄],[q₁,q₃]) = {q₅,q₆,q₇};
δ₃⁴(σ,a,[q₂,q₄],b) = [q₂,q₄];
F⁴ = {q₅,q₆,q₇}

We can see that T(A₀/merge(π₃,q₂,q₄)) ∩ S₋ = ∅; then π₄ = merge(π₃,q₂,q₄) = {[q₁,q₃], [q₂,q₄], q₅, q₆, q₇}.

A₀/merge(π₄,[q₁,q₃],q₅) is defined by:

δ₀⁵(a) = a; δ₀⁵(b) = b; δ₀⁵(c) = c;
δ₁⁵(σ,c) = [q₁,q₃,q₅]; δ₂⁵(σ,a,b) = [q₂,q₄]; δ₂⁵(σ,c,[q₁,q₃,q₅]) = [q₁,q₃,q₅]; δ₂⁵(σ,[q₂,q₄],[q₁,q₃,q₅]) = {[q₁,q₃,q₅],q₆,q₇};
δ₃⁵(σ,a,[q₂,q₄],b) = [q₂,q₄];
F⁵ = {[q₁,q₃,q₅], q₆, q₇}

We can see that σ(c) ∈ T(A₀/merge(π₄,[q₁,q₃],q₅)); then T(A₀/merge(π₄,[q₁,q₃],q₅)) ∩ S₋ ≠ ∅.

A₀/merge(π₄,[q₂,q₄],q₅) is defined by:

δ₀⁵(a) = a; δ₀⁵(b) = b; δ₀⁵(c) = c;
δ₁⁵(σ,c) = [q₁,q₃]; δ₂⁵(σ,a,b) = [q₂,q₄,q₅]; δ₂⁵(σ,c,[q₁,q₃]) = [q₁,q₃]; δ₂⁵(σ,[q₂,q₄,q₅],[q₁,q₃]) = {[q₂,q₄,q₅],q₆,q₇};
δ₃⁵(σ,a,[q₂,q₄,q₅],b) = [q₂,q₄,q₅];
F⁵ = {[q₂,q₄,q₅], q₆, q₇}

We can see that σ(a,b) ∈ T(A₀/merge(π₄,[q₂,q₄],q₅)); then T(A₀/merge(π₄,[q₂,q₄],q₅)) ∩ S₋ ≠ ∅, and then π₅ = π₄.

In the same way, we can see that:

σ(c) ∈ T(A₀/merge(π₅,[q₁,q₃],q₆)), then T(A₀/merge(π₅,[q₁,q₃],q₆)) ∩ S₋ ≠ ∅;
σ(a,b) ∈ T(A₀/merge(π₅,[q₂,q₄],q₆)), then T(A₀/merge(π₅,[q₂,q₄],q₆)) ∩ S₋ ≠ ∅;
but T(A₀/merge(π₅,q₅,q₆)) ∩ S₋ = ∅,

then π₆ = merge(π₅,q₅,q₆) = {[q₁,q₃], [q₂,q₄], [q₅,q₆], q₇}.

σ(c) ∈ T(A₀/merge(π₆,[q₁,q₃],q₇)), then T(A₀/merge(π₆,[q₁,q₃],q₇)) ∩ S₋ ≠ ∅;
σ(a,b) ∈ T(A₀/merge(π₆,[q₂,q₄],q₇)), then T(A₀/merge(π₆,[q₂,q₄],q₇)) ∩ S₋ ≠ ∅;
but T(A₀/merge(π₆,[q₅,q₆],q₇)) ∩ S₋ = ∅,

then π₇ = merge(π₆,[q₅,q₆],q₇) = {[q₁,q₃], [q₂,q₄], [q₅,q₆,q₇]}.

A₀/π₇ is defined by:

δ₀⁷(a) = a; δ₀⁷(b) = b; δ₀⁷(c) = c;
δ₁⁷(σ,c) = [q₁,q₃]; δ₂⁷(σ,a,b) = [q₂,q₄]; δ₂⁷(σ,c,[q₁,q₃]) = [q₁,q₃]; δ₂⁷(σ,[q₂,q₄],[q₁,q₃]) = [q₅,q₆,q₇];
δ₃⁷(σ,a,[q₂,q₄],b) = [q₂,q₄];
F⁷ = {[q₅,q₆,q₇]}
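The whole procedure fits in a short program, given here as a sketch rather than as the authors' implementation. It assumes trees encoded as nested tuples with string leaves, represents a partition as a dictionary from each state to its block (a frozenset), and expects the caller to supply the states already sorted by depth. All function names are choices of this example. The data are the samples of Example 1.

```python
from itertools import product

def quotient(order, pos, block_of):
    """Transitions and final blocks of A0/π, where A0 = SA(S+) and states are trees."""
    blk = lambda u: block_of.get(u, u)            # leaf symbols stand for themselves
    delta = {}
    for u in order:
        key = (u[0], tuple(blk(c) for c in u[1:]))
        delta.setdefault(key, set()).add(block_of[u])
    return delta, {block_of[t] for t in pos}

def run(delta, t):
    """Set of blocks reachable at the root of t (a leaf reaches only itself)."""
    if isinstance(t, str):
        return {t}
    out = set()
    for args in product(*(run(delta, c) for c in t[1:])):
        out |= delta.get((t[0], args), set())
    return out

def consistent(order, pos, neg, block_of):
    """True when no negative tree is accepted by A0/π."""
    delta, final = quotient(order, pos, block_of)
    return not any(run(delta, t) & final for t in neg)

def merged(block_of, b, t):
    """merge(π, B, t): the partition with B and the block of t joined."""
    joined = b | block_of[t]
    return {u: joined if blk in (b, block_of[t]) else blk
            for u, blk in block_of.items()}

def infer(order, pos, neg):
    """order: states of SA(S+) sorted by depth (ties in any fixed order)."""
    rank = {u: i for i, u in enumerate(order)}
    first = lambda b: min(rank[u] for u in b)     # blocks compare by smallest member
    block_of = {u: frozenset([u]) for u in order}
    for t in order[1:]:
        earlier = sorted({b for b in block_of.values() if first(b) < rank[t]}, key=first)
        delta, _ = quotient(order, pos, block_of)
        # Blocks that share a transition target set with t are tried first.
        siblings = [b for b in earlier
                    if any({b, block_of[t]} <= tgt for tgt in delta.values())]
        for b in siblings + earlier:
            candidate = merged(block_of, b, t)
            if consistent(order, pos, neg, candidate):
                block_of = candidate
                break
    return block_of

s = lambda *kids: ("σ",) + kids                   # shorthand for σ(...)
q1, q2 = s("c"), s("a", "b")
q3, q4 = s("c", q1), s("a", q2, "b")
q5, q6, q7 = s(q2, q1), s(q2, q3), s(q4, q1)
pos = [q5, q6, q7]
neg = [s(q1, q1), s(q2, q2), q2, q1, q4, q3]
blocks = set(infer([q1, q2, q3, q4, q5, q6, q7], pos, neg).values())
```

On these data the sketch ends with the three blocks [q₁,q₃], [q₂,q₄] and [q₅,q₆,q₇] of π₇. It rebuilds the quotient automaton from scratch for every trial, which keeps the code short but is not the incremental update discussed in the complexity section.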

Example 2. Let S₊ and S₋ be the same as in Example 1, but now considered as structural information of a context-free grammar.

We may construct a context-free grammar G = (N,Σ,P,S) from A₀/π₇, according to what was seen in Section II, so that sk(D(G)) = T(A₀/π₇).

If we call A = [q₂,q₄], B = [q₁,q₃] and C = [q₅,q₆,q₇], G is defined by N = {S,A,B,C}, Σ = {a,b,c} and P:

S → AB; C → AB; A → aAb | ab; B → cB | c

The rule C → AB can be eliminated since C is not accessible.

IV. IDENTIFICATION IN THE LIMIT

Definition 1. The set of short subtrees of T ⊆ Vᵀ, SSub(T), is defined as:

SSub(T) = {t ∈ Sub(T) | ∀s ∈ Vᵀ (s⁻¹T = t⁻¹T ⇒ s ≥ t)}

If T is a recognizable set, then SSub(T) is finite and its cardinality is the number of states of A(T), the canonical tree automaton for T, plus the cardinality of V₀.

Definition 2. The kernel of T ⊆ Vᵀ, K(T), is defined as:

K(T) = {σ(s₁,...,sₖ) ∈ Sub(T) | σ ∈ Vₖ, s₁,...,sₖ ∈ SSub(T)} ∪ V₀

Definition 3. Let S = S₊ ∪ S₋ be a finite sample of the tree set T ⊆ Vᵀ. We say that S is a representative sample of T if:

i) ∀t ∈ K(T) ∃s ∈ Vᵀ_$ | s#t ∈ S₊ (if t ∈ T, then s = $);
ii) ∀t₁ ∈ SSub(T) ∀t₂ ∈ K(T) (t₁⁻¹T ≠ t₂⁻¹T ⇒ ∃s ∈ Vᵀ_$ | (s#t₁ ∈ S₊ and s#t₂ ∈ S₋) or (s#t₁ ∈ S₋ and s#t₂ ∈ S₊)).

Let A₀ be the subtree automaton of a representative sample of a recognizable set of trees T. Now consider a partition π of the states of A₀ in which two states s ∈ SSub(T) and t ∈ K(T) that do not represent the same state of the canonical acceptor of T belong to the same block. Then the quotient automaton A₀/π accepts some negative sample.

Lemma 1. Let T be a recognizable set, let S = (S₊,S₋) be a representative sample of T, let π be a partition of Sub(S₊) − V₀ and let s ∈ SSub(T), t ∈ K(T) be such that B(t,π) = B(s,π), where B(u,π) denotes the block of π that contains u. If T(A₀/π) ∩ S₋ = ∅, then t⁻¹T = s⁻¹T.

Proof

Suppose T(A₀/π) ∩ S₋ = ∅ and t⁻¹T ≠ s⁻¹T.

Since s ∈ SSub(T) and t ∈ K(T), some r ∈ Vᵀ_$ will exist such that r#t ∈ S₊ and r#s ∈ S₋, or r#t ∈ S₋ and r#s ∈ S₊. We will consider the first case (the second is analogous):

Since B(t,π) = B(s,π) and r#t ∈ S₊, then r#t ∈ T(A₀/π) and also r#s ∈ T(A₀/π). But r#s ∈ S₋ and then T(A₀/π) ∩ S₋ ≠ ∅, contradicting the hypothesis.

■

Lemma 2. Let T be a recognizable set and let S = (S₊,S₋) be a representative sample of T. Let π be a partition of Sub(S₊) − V₀ and let t, s ∈ Vᵀ be such that t⁻¹T = s⁻¹T. Then:

T(A₀/π) ∩ S₋ = ∅ implies that T(A₀/merge(π,t,s)) ∩ S₋ = ∅.

Proof

Suppose that there exists an r ∈ T(A₀/merge(π,t,s)) ∩ S₋ with r ∉ T(A₀/π). In this case, r can be expressed as r = t₁#t₂ with t₂⁻¹T = t⁻¹T and t₁ ∈ s⁻¹T, or t₂⁻¹T = s⁻¹T and t₁ ∈ t⁻¹T. Then, since t⁻¹T = s⁻¹T, r ∈ T and r ∉ S₋, contradicting the hypothesis.

■

Remark 1. Note that any partition π of Sub(S₊) − V₀ obtained by the algorithm is such that T(A₀/π) ∩ S₋ = ∅.

Let πᵢ be the partition of Sub(S₊) − V₀ obtained in the i-th iteration of the algorithm with input S = (S₊,S₋) (note that it only affects the first i states). If A₀/πᵢ = (Qᵢ, V, δᵢ, Fᵢ), let A′ᵢ = (Q′ᵢ, V, δ′ᵢ, F′ᵢ) be the subautomaton induced by the first i states.

Theorem 1. Let S = (S₊,S₋) be a representative sample of the recognizable set T. Then A′ᵢ is isomorphic to a subautomaton of A(T).

Proof

Let ϕᵢ: Q′ᵢ → Q be defined as ϕᵢ(B) = B⁻¹T for B ∈ Q′ᵢ.

We will see that for every iteration i of the algorithm, ϕᵢ is an injective function such that ϕᵢ(F′ᵢ) ⊆ F and, for all B₁,...,Bₖ ∈ (Q′ᵢ ∪ V₀) and for all σ ∈ Vₖ, ϕᵢ(δ′ₖⁱ(σ, B₁,...,Bₖ)) = δₖ(σ, B₁⁻¹T,...,Bₖ⁻¹T).

By induction on the number of iterations:

If i = 1, A₀/π₁ = A₀ and A′₁ = ({t₁}, V, δ′₁, F′₁) with:

\[
F'_1 =
\begin{cases}
\{t_1\} & \text{if } t_1 \in S_+ \\
\emptyset & \text{otherwise}
\end{cases}
\]

and δ′₁ is defined as follows: if t₁ = σ(a₁,...,aₖ) with aᵢ ∈ V₀, 1 ≤ i ≤ k, then

δ′₀¹(aᵢ) = aᵢ, 1 ≤ i ≤ k; δ′ₖ¹(σ, a₁,...,aₖ) = t₁; and δ′ₖ¹ is undefined otherwise.

But then

ϕ₁(δ′ₖ¹(σ, a₁,...,aₖ)) = ϕ₁(t₁) = t₁⁻¹T,
δₖ(σ, a₁⁻¹T,...,aₖ⁻¹T) = σ(a₁,...,aₖ)⁻¹T = t₁⁻¹T.

Then ϕ₁(δ′ₖ¹(σ, a₁,...,aₖ)) = δₖ(σ, a₁⁻¹T,...,aₖ⁻¹T).

Suppose the result holds for i ≤ n and let i = n+1. If tₙ₊₁ ∈ Sub(S₊) is the state considered in iteration n+1, then a unique σ ∈ Vₖ and a unique set B₁,...,Bₖ ∈ (Q′ₙ ∪ V₀) will exist such that tₙ₊₁ ∈ δₖⁿ(σ, B₁,...,Bₖ). Let s₁,...,sₖ be the smallest trees in B₁,...,Bₖ, respectively.

CLAIM. There exists an s ∈ K(T) such that tₙ₊₁ ∈ δ′ₙ(s).

It is sufficient to see that s₁,...,sₖ ∈ SSub(T) by the induction hypothesis, and s = σ(s₁,...,sₖ). We can consider that the state tₙ₊₁ is a state in K(T).

1) If there exists a block B ∈ Qₙ such that B ∈ δ′ₖⁿ(σ, B₁,...,Bₖ), then it holds that tₙ₊₁⁻¹T = B⁻¹T and, since T(A₀/πₙ) ∩ S₋ = ∅, by Lemma 2 T(A₀/merge(πₙ,B,tₙ₊₁)) ∩ S₋ = ∅. Then πₙ₊₁ = merge(πₙ,B,tₙ₊₁) and tₙ₊₁ ∈ B and, as a consequence, ϕₙ₊₁ continues being an injective function. Moreover, Q′ₙ₊₁ = Q′ₙ and, if tₙ₊₁ ∈ Fₙ, by construction of S₊ there will exist t ∈ B, t < tₙ₊₁, such that t ∈ Fₙ, and then F′ₙ₊₁ = F′ₙ and δ′ₙ₊₁ = δ′ₙ.

2) If there exists a block B < tₙ₊₁ that is the first block in πₙ such that T(A₀/merge(πₙ,B,tₙ₊₁)) ∩ S₋ = ∅, then B ∈ SSub(T) and tₙ₊₁ ∈ K(T) and, by Lemma 1, tₙ₊₁⁻¹T = B⁻¹T. When πₙ₊₁ is obtained, tₙ₊₁ ∈ B and, as a consequence, ϕₙ₊₁ continues being an injective function. In the same way, F′ₙ₊₁ = F′ₙ.

On the other hand, δ′ₙ₊₁ = δ′ₙ ∪ {B ∈ δ′ₖⁿ⁺¹(σ, B₁,...,Bₖ)}. But

ϕᵢ(δ′ₖⁿ⁺¹(σ, B₁,...,Bₖ)) = ϕᵢ(B) = ϕᵢ(σ(s₁,...,sₖ)) = σ(s₁,...,sₖ)⁻¹T = δₖ(σ, s₁⁻¹T,...,sₖ⁻¹T) = δₖ(σ, B₁⁻¹T,...,Bₖ⁻¹T).

3) If no B < tₙ₊₁ exists such that T(A₀/merge(πₙ,B,tₙ₊₁)) ∩ S₋ = ∅, then by Lemma 2 no B < tₙ₊₁ will exist such that tₙ₊₁⁻¹T = B⁻¹T. In this case tₙ₊₁ ∈ SSub(T) and ϕᵢ continues being an injective function. Moreover, Q′ₙ₊₁ = Q′ₙ ∪ {tₙ₊₁} and ϕᵢ(F′ₙ₊₁) ⊆ F. Finally, δ′ₙ₊₁ = δ′ₙ ∪ {tₙ₊₁ ∈ δ′ₖⁿ⁺¹(σ, B₁,...,Bₖ)} and

ϕᵢ(δ′ₖⁿ⁺¹(σ, B₁,...,Bₖ)) = ϕᵢ(tₙ₊₁) = ϕᵢ(σ(s₁,...,sₖ)) = σ(s₁,...,sₖ)⁻¹T = δₖ(σ, s₁⁻¹T,...,sₖ⁻¹T) = δₖ(σ, B₁⁻¹T,...,Bₖ⁻¹T).

■

Corollary 1. The proposed algorithm identifies in the limit the family of recognizable tree sets.

Proof

It is a consequence of Theorem 1 and is based on the fact that any complete presentation of a recognizable tree set includes an initial finite sequence which is a representative sample.

■

V. COMPLEXITY

The membership of a tree t in the language of a tree automaton A can be computed by means of a dynamic programming algorithm with a time complexity of O(|A| · |t| · m_A), where |A| is the number of functions δ of the automaton, m_A is the maximum arity of the functions δ and |t| is the number of nodes of the tree t. Then, the verification of whether a tree language intersects the set of trees S₋ has a complexity of O(|A| · m_A · ||S₋||), where ||S₋|| denotes the total number of nodes of the trees of the tree set S₋.

Keeping in mind that the state that is merged always appears as the father of a single function δ, and using an adequate data structure, we can compute each new automaton from the last one in a time that is independent of the size of the problem.

The worst case occurs when no pair of states can be merged. In this case we have to try to merge each state with all the previous states. That is, a total of O(|A₀|²) trials and, for each one, we have to see if the language recognized by the automaton intersects S₋. The number of states of the obtained automaton is always one less than the original. Moreover, since we are assuming a worst case, no state can be merged and the number of states of all the automata is constant and is equal to |A₀| − 1. And since |A₀| is O(||S₊||), the worst case overall complexity of the algorithm is:

\[
O\left( \|S_+\|^3 \, m_A \, \|S_-\| \right)
\]

VI. CONTEXT-FREE INFERENCE

We have already seen in Example 2 how the proposed algorithm can be used to learn context-free languages (CFL) from positive and negative structural information. However, when the target is CFL inference, the expression T(A₀/merge(πₙ,B,tₙ₊₁)) ∩ S₋ of the algorithm can be changed to Fr(T(A₀/merge(πₙ,B,tₙ₊₁))) ∩ S′₋, where Fr(T) denotes the set of frontier strings of the trees in T and S′₋ is a set of ordinary negative strings rather than structural data (trees).

To see that the modified algorithm can identify any CFL in the limit, it is enough to see that if Fr(T(A₀/merge(πₙ,B,tₙ₊₁))) ∩ S′₋ = ∅ then, obviously, T(A₀/merge(πₙ,B,tₙ₊₁)) ∩ S₋ = ∅. And if Fr(T(A₀/merge(πₙ,B,tₙ₊₁))) ∩ S′₋ ≠ ∅, it is because there exists a tree accepted by A₀/merge(πₙ,B,tₙ₊₁) whose frontier is a string rejected by the CFG. But this tree is a possible negative sample and, in the limit, it must appear in S₋. And whenever this happens, T(A₀/merge(πₙ,B,tₙ₊₁)) ∩ S₋ ≠ ∅.

Both algorithms produce the same results in the limit and since the first algorithm converges to a correct solution, so does the second one.

We can observe that the second algorithm will need fewer negative samples than the first. But this advantage is obtained by increasing the computational complexity. Now the cost of verifying whether some string is a frontier of some tree accepted by the tree automaton (that is, whether the string is in the CFL represented by the automaton) is O(|A| |c|³), where |c| is the length of the string. Then, the complexity of the algorithm is given by O(||S₊||³ |S′₋| (m_{S′₋})³), where ||S₊|| is the number of nodes of the trees in the set S₊, |S′₋| is the number of strings in S′₋ and m_{S′₋} is the length of the longest string in S′₋.
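The test Fr(T(A)) ∩ S′₋ ≠ ∅ needs a procedure that decides whether a string is the frontier of some accepted tree. The sketch below is one simple way to do it, a fixpoint over substrings in the spirit of chart parsing; it is not tuned to reach the cubic bound quoted above. The dictionary encoding of the automaton and the state names A, B, C are assumptions of this example, and state names must differ from the characters of the strings.

```python
def frontier_accepts(delta, final, string):
    """True if some tree accepted by the automaton has `string` as its frontier."""
    n = len(string)
    # derives[(i, j)] holds the symbols (terminals or states) that can yield string[i:j].
    derives = {(i, j): set() for i in range(n) for j in range(i + 1, n + 1)}
    for i, ch in enumerate(string):
        derives[(i, i + 1)].add(ch)

    def fits(args, i, j):
        """Can the argument sequence be laid out exactly over string[i:j]?"""
        if not args:
            return i == j
        # The first argument takes string[i:m]; each remaining one needs at least one character.
        return any(args[0] in derives[(i, m)] and fits(args[1:], m, j)
                   for m in range(i + 1, j - len(args) + 2))

    changed = True
    while changed:                       # iterate until no new symbol can be added
        changed = False
        for (_sym, args), targets in delta.items():
            for span in derives:
                new = targets - derives[span]
                if new and fits(args, *span):
                    derives[span] |= new
                    changed = True
    return n > 0 and bool(derives[(0, n)] & final)

# Hypothetical automaton whose frontier language is {a^n b^n c^m | n, m >= 1}.
delta = {
    ("σ", ("a", "b")): {"A"},
    ("σ", ("a", "A", "b")): {"A"},
    ("σ", ("c",)): {"B"},
    ("σ", ("c", "B")): {"B"},
    ("σ", ("A", "B")): {"C"},
}
final = {"C"}
hit = frontier_accepts(delta, final, "aabbcc")
```

In the modified algorithm, a candidate merge would be rejected as soon as this test succeeds for some string of S′₋.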

REFERENCES

F. Gécsec and M. Steinby. (1984) "Tree Automata". Akadémiai Kiadó, Budapest.

J. Oncina and P. García. (1992) "Inferring regular languages in polynomial update time". In Pattern Recognition and Image Analysis. N. Pérez de la Blanca, A. Sanfeliu and E. Vidal eds. World Scientific, pp. 49-64.

Y. Sakakibara. (1990) "Learning context-free grammars from structural data in polynomial time". Theoretical Computer Science, 76, pp. 223-242.

Y. Sakakibara. (1992) "Efficient Learning of Context-Free Grammars from Positive Structural Examples". Information and Computation, 97, pp. 23-60.