Inference of recognizable tree sets P. García and J. Oncina
INFERENCE OF RECOGNIZABLE TREE SETS*
P. García† and J. Oncina‡
† Departamento de Sistemas Informáticos y Computación. Universidad Politécnica de Valencia.
‡ Departamento de Tecnología Informática y Computación. Universidad de Alicante.
e-mail: [email protected]
Abstract
Recognizable tree sets are a generalization of regular languages. We show that the class of recognizable tree sets is identifiable in the limit from complete presentation and we propose an inference algorithm with this property. This algorithm can be used for the inference of context-free grammars from positive structural information and negative data. The proposed algorithm constructs its hypothesis in polynomial time with the size of the data.
Key words: Inductive Inference, Recognizable Tree Sets
Running head: learning tree sets
* Work supported in part by the Spanish CICYT under grants TIC93-0633-CO2 and TIC1026/92-CO2.
I. INTRODUCTION
The inference of recognizable tree sets [Gécseg and Steinby,84] has recently been the object of renewed interest with regard to the problem of learning context-free languages from structural information [Sak,88,92]. An algorithm is proposed in [Sak,92] that identifies in the limit any context-free language from positive structural information. This algorithm is, in reality, an algorithm capable of identifying a subclass of recognizable tree sets, the 0-reversible tree sets, from positive data. On the other hand, for any context-free language a reversible grammar exists such that the set of skeletons of its derivation trees is 0-reversible. In order to correctly learn context-free languages, Sakakibara’s algorithm requires its positive structural information to have been extracted from a reversible grammar that generates the unknown language.
However, if the form of context-free grammar that provides the structural information is unrestricted, the class of the context-free languages is no longer identifiable in the limit from only positive structural information. This is a direct result of the fact that the family of recognizable tree sets is not identifiable in the limit from positive data.
In this paper, we propose an algorithm that identifies in the limit any recognizable tree set from complete presentation (positive and negative). Thus, our algorithm allows for the correct inference of any context-free language from complete structural information regardless of the type of context-free grammar that is assumed to provide the learning skeletons. The algorithm which we present here is a generalization of the regular language inference algorithm proposed by [Oncina and García,92].
This algorithm is based on the state clustering technique applied to the so-called "Subtree Automaton" associated with the given positive data. The subtree automaton is, for trees, the equivalent of the conventional Prefix Tree Acceptor for strings. The algorithm merges the states of the Subtree Automaton in a certain order, using the negative data to avoid certain groupings. For finite data sets, the final result is a Non-Deterministic Tree Automaton which is consistent with the data. The algorithm converges in the limit to the minimum Deterministic Tree Automaton of the tree set to be inferred. Moreover, the algorithm can be used in such a way that it runs in time polynomial in the size of the input data.
II. PRELIMINARIES AND NOTATION
Let N be the set of natural numbers and N* the free monoid generated by N with "." as the operation and λ as the identity. We define u≤w for u,w∈N* iff there exists v∈N* such that w = u.v (u<w if u≤w and u≠w). For x∈N*, we define the length of x, denoted by |x|, as follows:
|λ| = 0, |x.n| = |x| + 1 for n∈N.
D ⊆ N* is a tree domain iff it satisfies:
a) v∈D and u<v implies u∈D;
b) if u.i∈D, i∈N, then u.j∈D for 1≤j≤i.
A ranked alphabet V is a finite set associated with a finite relation r ⊆(V x N).
Vn denotes the subset of V: {σ∈V| (σ, n)∈r}.
A tree t over a ranked alphabet V is a mapping t : D →V with D being a tree
domain called domain of t and denoted by dom(t). The set of finite trees over V will be called VT. The alphabet can be seen as a set of function symbols having different
arities in such a way that VT can be considered as the set of terms over V. Let t∈VT
and x∈dom(t). The depth of x is defined as depth(x) = |x|, and the depth of t as depth(t) = max{depth(x) | x∈dom(t)}.
The subtree of t rooted at x, denoted as t/x, is defined as: dom(t/x) = {y | x.y∈dom(t)} and (t/x)(y) = t(x.y) ∀y∈dom(t/x). If t∈VT, then Sub(t) is the set of subtrees of t, that is, Sub(t) = {t/x | x∈dom(t)}, and for the set T ⊆ VT, Sub(T) = ∪_{t∈T} Sub(t).
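As an illustration only, the definitions above can be sketched in Python under one assumed encoding that is not part of the paper: a symbol of arity zero is a string, and σ(t1,...,tn) is the tuple (σ, t1, ..., tn). Node addresses are tuples of 1-based child indices, so λ is the empty tuple. The function names are hypothetical.

```python
def subtree_at(t, x):
    """t/x: the subtree of t rooted at address x (a tuple of 1-based indices)."""
    for i in x:
        t = t[i]          # t[0] is the label, t[i] is the i-th child
    return t


def depth(t):
    """Depth of a tree: 0 for a leaf, otherwise 1 + the deepest child."""
    if isinstance(t, str):
        return 0
    return 1 + max(depth(child) for child in t[1:])


def subtrees(t):
    """Sub(t): the set of all subtrees t/x for x in dom(t)."""
    found = {t}
    if not isinstance(t, str):
        for child in t[1:]:
            found |= subtrees(child)
    return found


# The tree sigma(sigma(a, b), sigma(c)) in the assumed encoding.
t = ("σ", ("σ", "a", "b"), ("σ", "c"))
```

Tuples are hashable, so subtrees can be collected in ordinary Python sets, which matches the set-valued definition of Sub.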
Let $∉V be a new symbol of arity zero. We denote by VT$ the set of trees in (V∪{$})T which contain exactly one symbol $. For s∈VT$ and t∈(VT∪VT$), we define the $-replacement s#t as:
\[ (s \# t)(x) = \begin{cases} s(x) & \text{if } x \in dom(s),\ s(x) \neq \$ \\ t(y) & \text{if } x = z.y,\ s(z) = \$,\ y \in dom(t) \end{cases} \]
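A minimal sketch of the $-replacement, assuming (this encoding is not from the paper) that leaves are strings and σ(t1,...,tn) is the tuple (σ, t1, ..., tn). The name replace_hole is hypothetical.

```python
HOLE = "$"


def replace_hole(s, t):
    """s # t: copy s, putting t where the single $ leaf of s occurs."""
    if s == HOLE:
        return t
    if isinstance(s, str):        # any other leaf is kept unchanged
        return s
    return (s[0],) + tuple(replace_hole(child, t) for child in s[1:])


context = ("σ", HOLE, ("σ", "c"))          # sigma($, sigma(c))
filled = replace_hole(context, ("σ", "a", "b"))
```

Here filled is the tree σ(σ(a,b),σ(c)); replacing into the trivial context $ returns t itself.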
For t∈VT and T ⊆ VT, we define the quotient t⁻¹T as:
\[ t^{-1}T = \begin{cases} \{ s \in V^T_\$ \mid s \# t \in T \} & \text{if } t \in V^T - V_0 \\ t & \text{if } t \in V_0 \end{cases} \]
Let V be a ranked alphabet and m the greatest arity of the symbols in V. A non-deterministic tree automaton (NTA) is defined as the four-tuple A = (Q,V,δ,F), where Q is a finite set of states with Q∩V0 = ∅, F ⊆ Q is the set of final states, and δ = (δ0,δ1,...,δm) is the set of state transition functions defined by:
\[ \delta_n : V_n \times (Q \cup V_0)^n \to 2^Q,\quad n = 1,2,\dots,m; \qquad \delta_0(a) = a,\ \forall a \in V_0 \]
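δ can be extended to operate on trees as follows:
\[ \delta(t) = \begin{cases} \bigcup_{q_1 \in \delta(t_1),\dots,q_n \in \delta(t_n)} \delta_n(\sigma, q_1,\dots,q_n) & \text{if } t = \sigma(t_1,\dots,t_n),\ n \ge 1 \\ \delta_0(a) & \text{if } t = a \in V_0 \end{cases} \]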
A tree t∈VT is accepted by A if δ(t)∩F≠∅. The set of trees accepted by A is defined as T(A) = {t∈VT | δ(t)∩F≠∅}. A deterministic tree automaton (DTA) is an NTA such that for every q1,...,qk∈(Q∪V0) and every σ∈Vk, δk(σ, q1,...,qk) has at most one element.
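The extension of δ to trees can be sketched as a bottom-up evaluation. The encoding is an assumption made for this sketch: leaves are strings, inner nodes are tuples (σ, t1, ..., tn), and the transition functions are one dictionary mapping (σ, (q1,...,qn)) to a set of states. The small automaton used to exercise it is hypothetical.

```python
from itertools import product


def run(delta, t):
    """delta(t): the set of states (or the leaf itself) reached at the root."""
    if isinstance(t, str):
        return {t}                                  # delta_0(a) = a
    child_states = [run(delta, child) for child in t[1:]]
    reached = set()
    for args in product(*child_states):            # every choice q1,...,qn
        reached |= delta.get((t[0], args), set())
    return reached


def accepts(delta, final, t):
    """t is accepted when delta(t) contains a final state."""
    return bool(run(delta, t) & final)


# Hypothetical automaton: A covers a^n b^n shapes, B covers chains of c.
delta = {
    ("σ", ("a", "b")): {"A"},
    ("σ", ("a", "A", "b")): {"A"},
    ("σ", ("c",)): {"B"},
    ("σ", ("c", "B")): {"B"},
    ("σ", ("A", "B")): {"C"},
}
final = {"C"}
```

Missing dictionary entries play the role of undefined transitions, so a tree with no applicable transition simply reaches the empty set of states.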
A = (Q,V,δ,F) is isomorphic to A’ = (Q’,V,δ’,F’) if and only if there exists a bijection ϕ: Q→Q’ such that ϕ(F) = F’, and for every q1,...,qk∈(Q∪V0) and every σ∈Vk, ϕ(δk(σ, q1,...,qk)) = δ’k(σ, q’1,...,q’k), where q’i = ϕ(qi) if qi∈Q and q’i = qi if qi∈V0 for i = 1,...,k.
The tree automaton A’ = (Q’,V,δ’,F’) is a subautomaton of A = (Q,V,δ,F) if and only if Q’⊆Q, F’⊆F and for every q1,...,qk∈(Q’∪V0) and every σ∈Vk, δ’k(σ, q1,...,qk) = δk(σ, q1,...,qk) or δ’k(σ, q1,...,qk) is undefined.
Let A = (Q,V,δ,F) be a tree automaton. If Q’⊆Q, then the subautomaton of A induced by Q’ is defined as A’ = (Q’,V,δ’,F’), where F’ = Q’∩F and q∈δ’k(σ, q1,...,qk) if and only if q∈Q’, q1,...,qk∈(Q’∪V0) and q∈δk(σ, q1,...,qk).
Let A = (Q,V,δ,F) be a tree automaton. If π is a partition of Q, the quotient automaton of A induced by π is defined as A/π = (Q’,V,δ’,F’), where Q’ = π, F’ = {B∈π | B∩F ≠ ∅} and for every B1,...,Bk∈(Q’∪V0) and σ∈Vk, B∈δ’k(σ, B1,...,Bk) if there exist q∈B and qi∈Bi∈Q’ or qi = Bi∈V0, i=1,...,k, such that q∈δk(σ, q1,...,qk).
The tree set accepted by A/π is a superset of the tree set accepted by A, i. e.
T(A) ⊆T(A/π ). If π’ refines π , then T(A/π’)⊆T(A/π ).
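The quotient construction can be sketched as a relabeling of every transition. This is an illustration under assumed conventions, not the authors' implementation: transitions are a dictionary from (σ, (q1,...,qn)) to a set of states, and the partition is given as a hypothetical mapping block_of from each state to a block label, with leaves left untouched.

```python
def quotient(delta, final, block_of):
    """Build A/pi: replace each state by its block in every transition."""
    def rep(x):
        return block_of.get(x, x)       # leaves are not in block_of

    new_delta = {}
    for (symbol, args), targets in delta.items():
        key = (symbol, tuple(rep(a) for a in args))
        new_delta.setdefault(key, set()).update(rep(q) for q in targets)
    new_final = {rep(q) for q in final}
    return new_delta, new_final


# Two-state automaton accepting only sigma(c, sigma(c)).
delta = {("σ", ("c",)): {"q1"}, ("σ", ("c", "q1")): {"q3"}}
final = {"q3"}

# Merging q1 and q3 into one block B.
merged_delta, merged_final = quotient(delta, final, {"q1": "B", "q3": "B"})
```

After the merge the single block is final and loops on itself, so the quotient also accepts σ(c) and every longer chain, which illustrates T(A) ⊆ T(A/π).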
Let T be a recognizable tree set. The canonical tree automaton that recognizes T is defined as A(T) = (Q,V,δ,F), where:
Q = {t⁻¹T | t∈Sub(T) - V0}; F = {t⁻¹T | t∈T};
δk(σ, s1⁻¹T,...,sk⁻¹T) = σ(s1,...,sk)⁻¹T, for s1,...,sk,σ(s1,...,sk)∈Sub(T); δ0(a) = a, a∈V0.
Let S ⊆ VT be a finite set. We define the subtree automaton of S as SA(S) = (Q,V,δ,F), where:
Q = Sub(S) - V0; F = S;
δk(σ, s1,...,sk) = σ(s1,...,sk), for s1,...,sk,σ(s1,...,sk)∈Sub(S); δ0(a) = a, a∈V0.
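A minimal sketch of the subtree automaton construction, assuming leaves are strings and inner nodes are tuples (σ, t1, ..., tn). Each state is represented by the subtree itself, which is a choice made for this sketch; the function name is hypothetical.

```python
def subtree_automaton(sample):
    """SA(S): one state per non-leaf subtree, one transition per state."""
    delta = {}

    def visit(t):
        if not isinstance(t, str):
            args = tuple(visit(child) for child in t[1:])
            delta[(t[0], args)] = {t}     # delta_k(sigma, s1..sk) = sigma(s1..sk)
        return t

    for tree in sample:
        visit(tree)
    return delta, set(sample)             # F = S


sample = [("σ", ("σ", "a", "b"), ("σ", "c"))]
delta, final = subtree_automaton(sample)
```

Shared subtrees produce the same dictionary key, so they are stored once, in the same way that a prefix tree acceptor shares common prefixes.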
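For a context-free grammar G=(N,Σ,P,S) and for each symbol X∈N∪Σ, we define the set of trees from G rooted at X as:
\[ D_X(G) = \begin{cases} \{a\} & \text{if } X = a \in \Sigma \\ \{ X(t_1,\dots,t_n) \mid X \to X_1 \cdots X_n \in P,\ t_i \in D_{X_i}(G),\ 1 \le i \le n \} & \text{if } X \in N \end{cases} \]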
A skeletal alphabet is a ranked alphabet with exactly one symbol, whose arities are greater than or equal to one. If Sk denotes such an alphabet and V0 is an alphabet of symbols whose arity is zero, a tree over Sk∪V0 is called a skeleton (all its inner nodes are labeled by σ and the leaves are symbols from V0). Given (V,r) and (V’,r’), a transcription is a mapping π: V→V’ with π(Vn) ⊆ V’n. The function π can be extended to π: VT→V’T by making π(σ(t1,...,tn)) = π(σ)(π(t1),...,π(tn)). Let V’ = (Sk∪V0), and for any a∈V0, π(a) = a and π(A) = σ for A∈Vn, n≥1; in that case, if t∈VT, π(t) gives the skeleton of t, which will be denoted as sk(t).
For the set T ⊆ VT, the set of skeletons associated to the trees in T is sk(T) = ∪_{t∈T} sk(t). If Sk = {σ}, then the set of trees over Sk∪Σ which are skeletons associated to trees in DS(G) is denoted by sk(D(G)). Both DS(G) and sk(D(G)) are regular tree sets.
Let t be a derivation tree or a skeleton of a context-free grammar with a terminal alphabet Σ (we admit that t may be an element of Σ). The frontier string of t, FS(t), is defined as:
\[ FS(t) = \begin{cases} t & \text{if } t \in \Sigma \\ FS(t_1) \cdots FS(t_k) & \text{if } t = \sigma(t_1,\dots,t_k) \end{cases} \]
Let A = (Q,Sk∪Σ,δ,F) be a DTA for a set of skeletons. There exists a context-free grammar G=(N,Σ,P,S) such that sk(D(G)) = T(A), which can be obtained as follows:
N = Q ∪ {S};
P = {δn(σ,q1,...,qn) → q1...qn | n≥1, σ∈Skn, q1,...,qn∈(Q∪Σ)} ∪ {S → q1...qn | n≥1, δn(σ,q1,...,qn)∈F}
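The skeleton, the frontier string and the grammar construction can be sketched as follows. The encoding is assumed for illustration: leaves are one-character strings, inner nodes are tuples (label, t1, ..., tn), and a DTA is a dictionary from (σ, (q1,...,qn)) to a one-element set. All function names and the sample automaton are hypothetical.

```python
def skeleton(t, sigma="σ"):
    """sk(t): relabel every inner node with sigma, keep the leaves."""
    if isinstance(t, str):
        return t
    return (sigma,) + tuple(skeleton(child, sigma) for child in t[1:])


def frontier(t):
    """FS(t): concatenation of the leaves, left to right."""
    if isinstance(t, str):
        return t
    return "".join(frontier(child) for child in t[1:])


def grammar_from_dta(delta, final, start="S"):
    """One rule q -> q1...qn per transition, plus S -> q1...qn when q is final."""
    rules = set()
    for (_, args), targets in delta.items():
        for q in targets:
            rules.add((q, args))
            if q in final:
                rules.add((start, args))
    return rules


derivation = ("S", ("A", "a", "b"), ("B", "c"))
delta = {("σ", ("a", "b")): {"A"}, ("σ", ("a", "A", "b")): {"A"}}
rules = grammar_from_dta(delta, {"A"})
```

Each rule is a pair (left-hand side, tuple of right-hand-side symbols); with A final, the sketch yields A → ab, A → aAb, S → ab and S → aAb.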
III. INFERENCE ALGORITHM
Given a positive sample S+ taken from a set of trees T ⊆ VT, a total order relation can be defined in Sub(S+) - V0 (the set of states of SA(S+)) by sorting the trees in Sub(S+) - V0 according to their depth and, if two trees have the same depth, according to a fixed arbitrary order. We will denote by "<" the order defined in this way. This order can be extended to the blocks of any partition π of Sub(S+) - V0 as follows:
∀ B,B’∈π, B<B’ if and only if ∃t∈B such that ∀t’∈B’, t<t’.
Given a partition π of Sub(S+)-V0 and B,B’∈π, the result of the operation merge(π,B,B’) is the partition obtained by grouping B and B’ in the same block and leaving the other blocks unchanged.
Let S = S+ ∪ S- be a finite sample of an unknown tree set T, and let r be the number of states of A0 = SA(S+). With input S, the algorithm outputs a tree automaton, A0/πr, where πr is obtained by the following iterative procedure: π1 is the trivial partition in Sub(S+)-V0 (the states of A0), and
\[ \pi_{n+1} = \begin{cases}
merge(\pi_n, B, t_{n+1}) & \text{if } \exists B_1,\dots,B_k, B \in \pi_n,\ \exists \sigma \in V_k \mid \{B, t_{n+1}\} \subseteq \delta^n_k(\sigma, B_1,\dots,B_k),\ B < t_{n+1} \\
 & \text{and } B \text{ is the first block in } \delta^n_k(\sigma, B_1,\dots,B_k) \text{ such that } T(A_0/merge(\pi_n, B, t_{n+1})) \cap S_- = \emptyset \\
merge(\pi_n, B, t_{n+1}) & \text{if } B < t_{n+1} \text{ and } B \text{ is the first block in } \pi_n \text{ such that } T(A_0/merge(\pi_n, B, t_{n+1})) \cap S_- = \emptyset \\
\pi_n & \text{otherwise}
\end{cases} \]
Example 1. Let
S+ = { σ(σ(a,b),σ(c)), σ(σ(a,b),σ(c,σ(c))), σ(σ(a,σ(a,b),b),σ(c)) } and
S- = { σ(σ(c),σ(c)), σ(σ(a,b),σ(a,b)), σ(a,b), σ(c), σ(a,σ(a,b),b), σ(c,σ(c)) }.
Sub(S+)-V0 = { σ(c), σ(a,b), σ(c,σ(c)), σ(a,σ(a,b),b), σ(σ(a,b),σ(c)), σ(σ(a,b),σ(c,σ(c))), σ(σ(a,σ(a,b),b),σ(c)) }
q1 = σ(c), q2 = σ(a,b), q3 = σ(c,σ(c)), q4 = σ(a,σ(a,b),b), q5 = σ(σ(a,b),σ(c)), q6 = σ(σ(a,b),σ(c,σ(c))), q7 = σ(σ(a,σ(a,b),b),σ(c)).
A0 is defined by:
\[ \delta^0_0(a) = a;\ \delta^0_0(b) = b;\ \delta^0_0(c) = c;\ \delta^0_1(\sigma,c) = q_1;\ \delta^0_2(\sigma,a,b) = q_2;\ \delta^0_2(\sigma,c,q_1) = q_3;\ \delta^0_2(\sigma,q_2,q_1) = q_5;\ \delta^0_2(\sigma,q_2,q_3) = q_6;\ \delta^0_2(\sigma,q_4,q_1) = q_7;\ \delta^0_3(\sigma,a,q_2,b) = q_4;\quad F_0 = \{q_5,q_6,q_7\} \]
π1 = {q1, q2, q3, q4, q5, q6, q7}, therefore A0/π1 = A0.
A0/merge(π1,q1,q2) is defined by:
\[ \delta^2_0(a) = a;\ \delta^2_0(b) = b;\ \delta^2_0(c) = c;\ \delta^2_1(\sigma,c) = [q_1,q_2];\ \delta^2_2(\sigma,a,b) = [q_1,q_2];\ \delta^2_2(\sigma,c,[q_1,q_2]) = q_3;\ \delta^2_2(\sigma,[q_1,q_2],[q_1,q_2]) = q_5;\ \delta^2_2(\sigma,[q_1,q_2],q_3) = q_6;\ \delta^2_2(\sigma,q_4,[q_1,q_2]) = q_7;\ \delta^2_3(\sigma,a,[q_1,q_2],b) = q_4;\quad F_2 = \{q_5,q_6,q_7\} \]
We can observe that σ(σ(c),σ(c)) ∈ T(A0/merge(π1,q1,q2)); then T(A0/merge(π1,q1,q2)) ∩ S- ≠ ∅ and then π2 = π1.
A0/merge(π2,q1,q3) is defined by:
\[ \delta^3_0(a) = a;\ \delta^3_0(b) = b;\ \delta^3_0(c) = c;\ \delta^3_1(\sigma,c) = [q_1,q_3];\ \delta^3_2(\sigma,a,b) = q_2;\ \delta^3_2(\sigma,c,[q_1,q_3]) = [q_1,q_3];\ \delta^3_2(\sigma,q_2,[q_1,q_3]) = \{q_5,q_6\};\ \delta^3_2(\sigma,q_4,[q_1,q_3]) = q_7;\ \delta^3_3(\sigma,a,q_2,b) = q_4;\quad F_3 = \{q_5,q_6,q_7\} \]
We can see that T(A0/merge(π2,q1,q3)) ∩ S- = ∅; then π3 = merge(π2,q1,q3) = {[q1,q3], q2, q4, q5, q6, q7}.
A0/merge(π3,[q1,q3],q4) is defined by:
\[ \delta^4_0(a) = a;\ \delta^4_0(b) = b;\ \delta^4_0(c) = c;\ \delta^4_1(\sigma,c) = [q_1,q_3,q_4];\ \delta^4_2(\sigma,a,b) = q_2;\ \delta^4_2(\sigma,c,[q_1,q_3,q_4]) = [q_1,q_3,q_4];\ \delta^4_2(\sigma,q_2,[q_1,q_3,q_4]) = \{q_5,q_6\};\ \delta^4_2(\sigma,[q_1,q_3,q_4],[q_1,q_3,q_4]) = q_7;\ \delta^4_3(\sigma,a,q_2,b) = [q_1,q_3,q_4];\quad F_4 = \{q_5,q_6,q_7\} \]
We can see that σ(σ(c),σ(c)) ∈ T(A0/merge(π3,[q1,q3],q4)); then T(A0/merge(π3,[q1,q3],q4)) ∩ S- ≠ ∅.
A0/merge(π3,q2,q4) is defined by:
\[ \delta^4_0(a) = a;\ \delta^4_0(b) = b;\ \delta^4_0(c) = c;\ \delta^4_1(\sigma,c) = [q_1,q_3];\ \delta^4_2(\sigma,a,b) = [q_2,q_4];\ \delta^4_2(\sigma,c,[q_1,q_3]) = [q_1,q_3];\ \delta^4_2(\sigma,[q_2,q_4],[q_1,q_3]) = \{q_5,q_6,q_7\};\ \delta^4_3(\sigma,a,[q_2,q_4],b) = [q_2,q_4];\quad F_4 = \{q_5,q_6,q_7\} \]
We can see that T(A0/merge(π3,q2,q4)) ∩ S- = ∅; then π4 = merge(π3,q2,q4) = {[q1,q3], [q2,q4], q5, q6, q7}.
A0/merge(π4,[q1,q3],q5) is defined by:
\[ \delta^5_0(a) = a;\ \delta^5_0(b) = b;\ \delta^5_0(c) = c;\ \delta^5_1(\sigma,c) = [q_1,q_3,q_5];\ \delta^5_2(\sigma,a,b) = [q_2,q_4];\ \delta^5_2(\sigma,c,[q_1,q_3,q_5]) = [q_1,q_3,q_5];\ \delta^5_2(\sigma,[q_2,q_4],[q_1,q_3,q_5]) = \{[q_1,q_3,q_5],q_6,q_7\};\ \delta^5_3(\sigma,a,[q_2,q_4],b) = [q_2,q_4];\quad F_5 = \{[q_1,q_3,q_5],q_6,q_7\} \]
We can see that σ(c) ∈ T(A0/merge(π4,[q1,q3],q5)); then T(A0/merge(π4,[q1,q3],q5)) ∩ S- ≠ ∅.
A0/merge(π4,[q2,q4],q5) is defined by:
\[ \delta^5_0(a) = a;\ \delta^5_0(b) = b;\ \delta^5_0(c) = c;\ \delta^5_1(\sigma,c) = [q_1,q_3];\ \delta^5_2(\sigma,a,b) = [q_2,q_4,q_5];\ \delta^5_2(\sigma,c,[q_1,q_3]) = [q_1,q_3];\ \delta^5_2(\sigma,[q_2,q_4,q_5],[q_1,q_3]) = \{[q_2,q_4,q_5],q_6,q_7\};\ \delta^5_3(\sigma,a,[q_2,q_4,q_5],b) = [q_2,q_4,q_5];\quad F_5 = \{[q_2,q_4,q_5],q_6,q_7\} \]
We can see that σ(a,b) ∈ T(A0/merge(π4,[q2,q4],q5)); then T(A0/merge(π4,[q2,q4],q5)) ∩ S- ≠ ∅, and then π5 = π4.
In the same way, we can see that:
σ(c) ∈ T(A0/merge(π5,[q1,q3],q6)), then T(A0/merge(π5,[q1,q3],q6)) ∩ S- ≠ ∅;
σ(a,b) ∈ T(A0/merge(π5,[q2,q4],q6)), then T(A0/merge(π5,[q2,q4],q6)) ∩ S- ≠ ∅;
but T(A0/merge(π5,q5,q6)) ∩ S- = ∅, then π6 = merge(π5,q5,q6) = {[q1,q3], [q2,q4], [q5,q6], q7}.
σ(c) ∈ T(A0/merge(π6,[q1,q3],q7)), then T(A0/merge(π6,[q1,q3],q7)) ∩ S- ≠ ∅;
σ(a,b) ∈ T(A0/merge(π6,[q2,q4],q7)), then T(A0/merge(π6,[q2,q4],q7)) ∩ S- ≠ ∅;
but T(A0/merge(π6,[q5,q6],q7)) ∩ S- = ∅, then π7 = merge(π6,[q5,q6],q7) = {[q1,q3], [q2,q4], [q5,q6,q7]}.
A0/π7 is defined by:
\[ \delta^7_0(a) = a;\ \delta^7_0(b) = b;\ \delta^7_0(c) = c;\ \delta^7_1(\sigma,c) = [q_1,q_3];\ \delta^7_2(\sigma,a,b) = [q_2,q_4];\ \delta^7_2(\sigma,c,[q_1,q_3]) = [q_1,q_3];\ \delta^7_3(\sigma,a,[q_2,q_4],b) = [q_2,q_4];\ \delta^7_2(\sigma,[q_2,q_4],[q_1,q_3]) = [q_5,q_6,q_7];\quad F_7 = \{[q_5,q_6,q_7]\} \]
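The merging loop of Example 1 can be replayed with a simplified sketch. It is not the authors' implementation: only the second merging rule is coded (each state is tried against every earlier block, in order, and the first merge that accepts no negative tree is kept), and the first rule, which gives priority to blocks reached by the same transition, is omitted. Trees are assumed to be encoded as strings (leaves) and tuples (σ, t1, ..., tn); all function names are hypothetical.

```python
from itertools import product


def depth(t):
    return 0 if isinstance(t, str) else 1 + max(depth(c) for c in t[1:])


def subtree_automaton(sample):
    """SA(S+): each non-leaf subtree is a state reached by one transition."""
    delta = {}

    def visit(t):
        if not isinstance(t, str):
            delta[(t[0], tuple(visit(c) for c in t[1:]))] = {t}
        return t

    for tree in sample:
        visit(tree)
    return delta, set(sample)


def quotient(delta, final, block_of):
    def rep(x):
        return block_of.get(x, x)            # leaves map to themselves

    new_delta = {}
    for (sym, args), targets in delta.items():
        key = (sym, tuple(rep(a) for a in args))
        new_delta.setdefault(key, set()).update(rep(q) for q in targets)
    return new_delta, {rep(q) for q in final}


def run(delta, t):
    if isinstance(t, str):
        return {t}
    reached = set()
    for args in product(*(run(delta, c) for c in t[1:])):
        reached |= delta.get((t[0], args), set())
    return reached


def consistent(delta, final, block_of, negatives):
    """True when the quotient automaton accepts no tree of S-."""
    q_delta, q_final = quotient(delta, final, block_of)
    return not any(run(q_delta, t) & q_final for t in negatives)


def infer(states, positives, negatives):
    """states must list Sub(S+) - V0 in the order '<' (depth, then fixed)."""
    delta, final = subtree_automaton(positives)
    block_of = {t: i for i, t in enumerate(states)}   # trivial partition
    for i, t in enumerate(states):
        # A block is labeled by the index of its smallest state, so sorting
        # the labels enumerates the earlier blocks in the order '<'.
        for b in sorted({block_of[s] for s in states[:i]}):
            trial = dict(block_of)
            trial[t] = b
            if consistent(delta, final, trial, negatives):
                block_of = trial
                break
    return block_of


q1, q2 = ("σ", "c"), ("σ", "a", "b")
q3, q4 = ("σ", "c", q1), ("σ", "a", q2, "b")
q5, q6, q7 = ("σ", q2, q1), ("σ", q2, q3), ("σ", q4, q1)
states = sorted([q1, q2, q3, q4, q5, q6, q7], key=depth)   # stable sort
negatives = [("σ", q1, q1), ("σ", q2, q2), q2, q1, q4, q3]
blocks = infer(states, [q5, q6, q7], negatives)
```

On this sample the sketch groups the states as {q1,q3}, {q2,q4} and {q5,q6,q7}. Every trial merge rebuilds the quotient from scratch, which keeps the code short but is slower than the incremental update discussed in the complexity section.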
Example 2. Let S+ and S- be the same as in Example 1, but now considered as
structural information of a context-free grammar.
We may construct a context-free grammar G=(N,Σ,P,S), from Α0/π7, according
to what was seen in section II, so that sk(D(G)) = T(Α0/π7 )
If we call A = [q2,q4], B = [q1,q3] and C = [q5,q6,q7], G is defined by N = {S,A,B,C}, Σ = {a,b,c} and P:
S → AB; C → AB; A → aAb | ab; B → cB | c
The rule C → AB can be eliminated since C is not accessible.
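Reading the rules off the transitions of A0/π7 with the construction of section II gives S → AB, A → aAb | ab and B → cB | c once the inaccessible C is dropped. The following sketch enumerates the short strings of that grammar. The enumeration routine is an illustrative assumption, not part of the paper, and it relies on every symbol deriving at least one terminal and on the absence of unit rules.

```python
rules = {
    "S": [("A", "B")],
    "A": [("a", "A", "b"), ("a", "b")],
    "B": [("c", "B"), ("c",)],
}


def strings(symbol, max_len):
    """All terminal strings of length <= max_len derivable from symbol."""
    if symbol not in rules:                      # terminal symbol
        return {symbol} if max_len >= 1 else set()
    out = set()
    for rhs in rules[symbol]:
        partial = {""}
        for i, x in enumerate(rhs):
            rest = len(rhs) - i - 1              # later symbols need >= 1 char each
            partial = {p + s
                       for p in partial
                       for s in strings(x, max_len - len(p) - rest)}
        out |= partial
    return out


short = sorted(strings("S", 6), key=lambda w: (len(w), w))
```

Every enumerated string consists of n letters a, then n letters b, then at least one c, which is the shape suggested by the positive sample.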
IV. IDENTIFICATION IN THE LIMIT
Definition 1. The set of Short Subtrees of T⊆VT, SSub(T), is defined as:
SSub(T) = {t∈Sub(T) | ∀s∈VT (s⁻¹T = t⁻¹T ⇒ s≥t)}
If T is a recognizable set, then SSub(T) is finite and its cardinal is the number of states of A(T), the canonical tree automaton for T, plus the cardinal of V0.
Definition 2. The Kernel of T⊆VT, K(T), is defined as:
K(T) = {σ(s1,...,sk)∈Sub(T) | σ∈Vk, s1,...,sk∈SSub(T)} ∪ V0
Definition 3. Let S = S+ ∪ S- be a finite sample of the tree set T⊆VT. We say that S is a representative sample of T if:
i) ∀t∈K(T) ∃s∈VT$ | s#t∈S+ (if t∈T, then s = $)
ii) ∀t1∈SSub(T) ∀t2∈K(T) (t1⁻¹T ≠ t2⁻¹T ⇒ ∃s∈VT$ | (s#t1∈S+ and s#t2∈S-) or (s#t1∈S- and s#t2∈S+))
Let A0 be the Subtree Automaton of a representative sample of a recognizable set of trees T. Now consider a partition π of the states of A0 in which two states s∈SSub(T) and t∈K(T) that do not represent the same state of the canonical acceptor of T belong to the same block. Then the quotient automaton A0/π accepts some negative sample.
Lemma 1. Let T be a recognizable set, let S = (S+,S-) be a representative sample of T, let π be a partition of Sub(S+) - V0 and s∈SSub(T), t∈K(T) such that B(t,π) = B(s,π). If T(A0/π) ∩ S- = ∅, then t⁻¹T = s⁻¹T.
Proof
Suppose T(A0/π) ∩ S- = ∅ and t⁻¹T ≠ s⁻¹T.
Since s∈SSub(T) and t∈K(T), some r∈VT$ will exist such that r#t∈S+ and r#s∈S-, or r#t∈S- and r#s∈S+. We will consider the first case (the second is analogous):
Since B(t,π) = B(s,π) and r#t∈S+, then r#t∈T(A0/π) and also r#s∈T(A0/π). But r#s∈S- and then T(A0/π) ∩ S- ≠ ∅, contradicting the hypothesis.
■
Lemma 2. Let T be a recognizable set and let S = (S+,S-) be a representative sample of T. Let π be a partition in Sub(S+) - V0 and let t,s∈VT be such that t⁻¹T = s⁻¹T. Then:
T(A0/π) ∩ S- = ∅ implies that T(A0/merge(π,t,s)) ∩ S- = ∅.
Proof
Suppose that there exists an r∈T(A0/merge(π,t,s)) ∩ S- and r∉T(A0/π). In this case, r can be expressed as r = t1#t2 with t2⁻¹T = t⁻¹T and t1∈s⁻¹T, or t2⁻¹T = s⁻¹T and t1∈t⁻¹T. Then, since t⁻¹T = s⁻¹T, r∈T and r∉S-, contradicting the hypothesis.
■
Remark 1. Note that any partition π of Sub(S+) - V0 obtained by the algorithm
is such that T(A0/π) ∩ S- = ∅ .
Let πi be the partition of Sub(S+) - V0 obtained in the i-th iteration of the algorithm, with input S = (S+,S-) (note that it only affects the first i states). If A0/πi = (Qi, V, δi, Fi), let A’i = (Q’i, V, δ’i, F’i) be the subautomaton induced by the first i states.
Theorem 1. Let S = (S+,S-) be a representative sample of the recognizable set T. Then A’i is isomorphic to a subautomaton of A(T).
Proof
Let ϕi: Q’i→Q be defined as ϕi(B) = B⁻¹T for B∈Q’i.
We will see that for every iteration i of the algorithm, ϕi is an injective function such that ϕi(F’i) ⊆ F and, for all B1,...,Bk∈(Q’i∪V0) and for all σ∈Vk,
\[ \phi_i(\delta'^{\,i}_k(\sigma, B_1,\dots,B_k)) = \delta_k(\sigma, B_1^{-1}T,\dots,B_k^{-1}T). \]
By induction in the number of iterations:
If i=1, A0/π1 = A0 and A’1 = ({t1}, V, δ’1, F’1) with:
\[ F'_1 = \begin{cases} \{t_1\} & \text{if } t_1 \in S_+ \\ \emptyset & \text{otherwise} \end{cases} \]
and δ’1 is defined as follows: if t1 = σ(a1,...,ak) with ai∈V0, 1≤i≤k, then
\[ \delta'^{\,1}_0(a_i) = a_i,\ 1 \le i \le k; \qquad \delta'^{\,1}_k(\sigma, a_1,\dots,a_k) = t_1; \qquad \delta'^{\,1}_k \text{ undefined otherwise.} \]
But then
\[ \phi_1(\delta'^{\,1}_k(\sigma, a_1,\dots,a_k)) = \phi_1(t_1) = t_1^{-1}T, \qquad \delta_k(\sigma, a_1^{-1}T,\dots,a_k^{-1}T) = \sigma(a_1,\dots,a_k)^{-1}T = t_1^{-1}T. \]
Then
\[ \phi_1(\delta'^{\,1}_k(\sigma, a_1,\dots,a_k)) = \delta_k(\sigma, a_1^{-1}T,\dots,a_k^{-1}T). \]
Suppose the result holds for i≤n and let i = n+1. If tn+1∈Sub(S+) is the state considered in the iteration n+1, then a unique σ∈Vk and a unique set B1,...,Bk∈(Q’n∪V0) will exist such that tn+1∈δ^n_k(σ, B1,...,Bk). Let s1,...,sk be the smallest trees in B1,...,Bk, respectively.
CLAIM. There exists an s∈K(T) such that tn+1∈δ’n(s).
It is sufficient to see that s1,...,sk∈SSub(T) by the induction hypothesis, and s = σ(s1,...,sk). We can consider that the state tn+1 is a state in K(T).
1) If there exists a block B∈Qn such that B∈δ’^n_k(σ, B1,...,Bk), then it holds that tn+1⁻¹T = B⁻¹T and, since T(A0/πn) ∩ S- = ∅, by Lemma 2 T(A0/merge(πn,B,tn+1)) ∩ S- = ∅. Then πn+1 = merge(πn,B,tn+1) and tn+1∈B and, as a consequence, ϕn+1 continues being an injective function. Moreover, Q’n+1 = Q’n and if tn+1∈Fn, by construction of S+ there will exist t∈B, t<tn+1 such that t∈Fn, and then F’n+1 = F’n and δ’n+1 = δ’n.
2) If there exists a block B<tn+1 that is the first block in πn such that T(A0/merge(πn,B,tn+1)) ∩ S- = ∅, then B∈SSub(T) and tn+1∈K(T) and by Lemma 1 tn+1⁻¹T = B⁻¹T and, when πn+1 is obtained, tn+1∈B and, as a consequence, ϕn+1 continues being an injective function. In the same way F’n+1 = F’n.
On the other hand δ’n+1 = δ’n ∪ {B∈δ’^{n+1}_k(σ, B1,...,Bk)}. But
\[ \phi_i(\delta'^{\,n+1}_k(\sigma, B_1,\dots,B_k)) = \phi_i(B) = \phi_i(\sigma(s_1,\dots,s_k)) = \sigma(s_1,\dots,s_k)^{-1}T = \delta_k(\sigma, s_1^{-1}T,\dots,s_k^{-1}T) = \delta_k(\sigma, B_1^{-1}T,\dots,B_k^{-1}T). \]
3) If no B<tn+1 exists such that T(A0/merge(πn,B,tn+1)) ∩ S- = ∅, then by Lemma 2, no B<tn+1 will exist such that tn+1⁻¹T = B⁻¹T. In this case tn+1∈SSub(T) and ϕi continues being an injective function. Moreover, Q’n+1 = Q’n ∪ {tn+1} and ϕi(F’n+1) ⊆ F. Finally, δ’n+1 = δ’n ∪ {tn+1∈δ’^{n+1}_k(σ, B1,...,Bk)} and
\[ \phi_i(\delta'^{\,n+1}_k(\sigma, B_1,\dots,B_k)) = \phi_i(t_{n+1}) = \phi_i(\sigma(s_1,\dots,s_k)) = \sigma(s_1,\dots,s_k)^{-1}T = \delta_k(\sigma, s_1^{-1}T,\dots,s_k^{-1}T) = \delta_k(\sigma, B_1^{-1}T,\dots,B_k^{-1}T). \]
■
Corollary 1. The proposed algorithm identifies in the limit the family of
recognizable tree sets.
Proof It is a consequence of Theorem 1 and is based on the fact that any
complete presentation of a recognizable tree set includes an initial finite sequence which is a representative sample.
■
V. COMPLEXITY
Membership of a tree t in the language of a tree automaton A can be decided by means of a Dynamic Programming algorithm with a time complexity given by O(|A| |t| mA), where |A| is the number of functions δ of the automaton, mA is the maximum arity of the functions δ and |t| is the number of nodes of the tree t. Then, verifying whether a tree language intersects the set of trees S- has a complexity of O(|A| mA ||S-||), where ||S-|| denotes the total number of nodes of the trees of the tree set S-.
Keeping in mind that the state that is merged always appears as the father of a single function δ, and using an adequate data structure, we can compute each new automaton from the last one in a time that is independent of the size of the problem.
The worst case occurs when no pair of states can be merged. In this case we have to try to merge each state with all the previous states. That is, a total of O(|A0|²) trials and, for each one, we have to see if the language recognized by the automaton intersects S-. The number of states of the obtained automaton is always one less than the original. Moreover, since we are assuming a worst case, no state can be merged and the number of states of all the automata is constant and is equal to |A0|-1. And since |A0| is O(||S+||), the worst case overall complexity of the algorithm is:
VI. CONTEXT-FREE INFERENCE
We have already seen in Example 2 how the proposed algorithm can be used to learn Context-Free Languages (CFL) from positive and negative structural information. However, when the target is CFL inference, the expression T(A0/merge(πn,B,tn+1)) ∩ S- of the algorithm can be replaced by Fr(T(A0/merge(πn,B,tn+1))) ∩ S’-, where S’- is a set of ordinary negative strings rather than structural data (trees).
To see that the modified algorithm can identify any CFL in the limit, it is enough to see that if Fr(T(A0/merge(πn,B,tn+1))) ∩ S’- = ∅ then, obviously, T(A0/merge(πn,B,tn+1)) ∩ S- = ∅. And if Fr(T(A0/merge(πn,B,tn+1))) ∩ S’- ≠ ∅, it is because there exists a tree accepted by A0/merge(πn,B,tn+1) whose frontier is a string rejected by the CFG. But this tree is a possible negative sample and, in the limit, it must appear in S-. And, whenever this happens, T(A0/merge(πn,B,tn+1)) ∩ S- ≠ ∅.
Both algorithms produce the same results in the limit and since the first algorithm converges to a correct solution, so does the second one.
We can observe that the second algorithm will need fewer negative samples than the first. But this advantage is obtained by increasing the computational complexity. Now the cost of verifying whether some string is a frontier of some tree accepted by the tree automaton (that is, whether the string is in the CFL represented by the automaton) is O(|A||c|³), where |c| is the length of the string. Then, the complexity of the algorithm is given by O(||S+||³ |S’-| (mS’-)³), where ||S+|| is the number of nodes of the trees in the set S+, |S’-| is the number of strings in S’- and mS’- is the length of the longest string in S’-.
REFERENCES
F. Gécseg and M. Steinby. (1984) "Tree Automata". Akadémiai Kiadó, Budapest.
J. Oncina and P. García. (1992) "Inferring regular languages in polynomial update time". In Pattern Recognition and Image Analysis. N. Pérez de la Blanca, A. Sanfeliu and E. Vidal eds. World Scientific, 49-64.
Y. Sakakibara. (1990) "Learning context-free grammars from structural data in polynomial time". Theoretical Computer Science, 76, pp. 223-242.
Y. Sakakibara. (1992) "Efficient Learning of Context-Free Grammars from Positive Structural Examples". Information and Computation, 97, pp. 23-60.