
6.2 An Automaton Model for Unranked Regular Rooted Graph Languages

6.2.2 Membership Test for a Tree using Hyper Graph Automata

This section presents an algorithm that tests whether an unranked tree belongs to a language represented by a hyper graph based automaton. In contrast to the standard approach for ranked trees shown in [21] and introduced earlier, the algorithm operates directly on the unranked tree model, without first transcribing data instances into a ranked tree representation. Calculus rules are used to describe the algorithm in a non-deterministic way. Rules have the following shape:

\[ \frac{C_1 \;\cdots\; C_n \qquad e_1 : a_1 \;\cdots\; e_n : a_n}{e : a}\ (\text{EXAMPLE}) \]

The $C_i$ denote constraints on $e, a, e_i, a_i$. The $e, e_i$ are trees or content lists of trees, i.e. sequences of trees that all share the same parent node. The $a, a_i$ denote either states or transitions of an automaton. An expression $e : a$ is also called a configuration of the automaton. The rules relate configurations of automata. Two different kinds of configuration exist: (1) configurations of the shape $t : \tau$, where $t$ is a tree and $\tau$ is a transition, and (2) configurations of the shape $[t_1, \dots, t_n] : S$, where $[t_1, \dots, t_n]$ is a list of trees and $S$ is a state of the automaton.

\[ \frac{\tau \in R_A \qquad (s, l, c, e) = \tau \qquad s_e \in F_A}{t : \tau}\ (\text{ROOT}) \]

The ROOT rule matches the root of the data tree if the set of root transitions contains a transition from which a complete derivation tree can be found.

\[ \frac{c \in F_A}{[\,] : c}\ (\text{END}) \]

The END rule accepts a configuration that consists of an empty list and a state belonging to the set of final states.

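\[ \frac{\tau \in \Delta_A \qquad \tau = (s, l, c, e) \qquad [t_1, \dots, t_n] : c}{l[t_1, \dots, t_n] : \tau}\ (\text{NODE}) \]

\[ \frac{n \geq 1 \qquad \tau \in \Delta_A \qquad \tau = (s, l, c, e) \qquad t_1 : \tau \qquad [t_2, \dots, t_n] : e}{[t_1, t_2, \dots, t_n] : s}\ (\text{LIST}) \]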

In a successful derivation, applications of the NODE and the LIST rule are interwoven, all branches end with an application of the END rule, and the root of the derivation tree is an application of the ROOT rule. A tree for which no derivation exists is not valid with respect to the given automaton. A valid tree may have several derivations.

Example of a Tree Recognition using a Hyper Graph Automaton. Given the tree a[b[], b[a[b[], b[]]]] and the automaton A presented in the former example (example 6.1), the following derivation is a possible recognition:

Operational Semantics of the Recognition Rules. The rules presented above give an abstract description of a recognition algorithm. They capture neither the control flow nor the decisions to be taken in case of ambiguity. An automaton may be able to recognize a data tree using several different derivations. A concrete algorithm should therefore be designed to choose one of those derivations, possibly driven by other parameters.

As an example, a simple algorithm that accepts as soon as one applicable rule succeeds is sketched below:

Algorithm 6.2.1: MEMBERSHIPTEST(A = (Q, Σ, ∆, S, F), e : a, ρ)

comment: ρ denotes the set of rules

comment: e denotes either a tree or a list of trees

comment: either a ∈ Q or a ∈ ∆

comment: to check a tree t, call MEMBERSHIPTEST(A, t : s, R) with s ∈ S

for each rule with constraints C1 ... Cm, premises e1 : a1 ... en : an and conclusion eρ : aρ in ρ do

    if C1 ∧ · · · ∧ Cm = true then

        if MEMBERSHIPTEST(A, e1 : a1, ρ) = true ∧ · · · ∧ MEMBERSHIPTEST(A, en : an, ρ) = true

            then return (true)

return (false)
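The algorithm above, specialised to the four rules, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in this work: a transition (s, l, c, e) is read as "source state, label, content state, state for the following siblings", the side condition of ROOT is read as "the end state of the root transition is final", and the automaton `A` below is hypothetical (it is not the automaton of example 6.1). The names `Tree`, `Automaton`, `check_node`, `check_list` and `accepts` are chosen for illustration only.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Tree:
    label: str
    children: Tuple["Tree", ...] = ()


# A transition tau = (s, l, c, e). Assumed reading: source state, label,
# content state, state for the following siblings.
Transition = Tuple[str, str, str, str]


@dataclass(frozen=True)
class Automaton:
    delta: FrozenSet[Transition]  # all transitions
    roots: FrozenSet[Transition]  # root transitions R_A
    final: FrozenSet[str]         # final states F_A


def check_node(A: Automaton, t: Tree, tau: Transition) -> bool:
    """NODE: l[t1..tn] : tau holds if tau = (s, l, c, e) is in delta and [t1..tn] : c."""
    _s, l, c, _e = tau
    return tau in A.delta and t.label == l and check_list(A, t.children, c)


def check_list(A: Automaton, ts: Tuple[Tree, ...], state: str) -> bool:
    """END for the empty list, LIST otherwise."""
    if not ts:
        return state in A.final  # END: [] : c requires c to be final
    # LIST: try every transition leaving `state`; the first success wins.
    return any(
        tau[0] == state
        and check_node(A, ts[0], tau)
        and check_list(A, ts[1:], tau[3])
        for tau in A.delta
    )


def accepts(A: Automaton, t: Tree) -> bool:
    """ROOT (assumed reading): some root transition types t and ends in a final state."""
    return any(tau[3] in A.final and check_node(A, t, tau) for tau in A.roots)


# Hypothetical automaton: an 'a' node has exactly two 'b' children,
# a 'b' node is either empty or contains a single 'a' node.
A = Automaton(
    delta=frozenset({
        ("qr", "a", "p0", "qf"),  # also a root transition
        ("p0", "b", "cb", "p1"),
        ("p1", "b", "cb", "qf"),
        ("cb", "a", "p0", "qf"),
    }),
    roots=frozenset({("qr", "a", "p0", "qf")}),
    final=frozenset({"qf", "cb"}),
)

b = Tree("b")
doc = Tree("a", (b, Tree("b", (Tree("a", (b, b)),))))  # a[b[], b[a[b[], b[]]]]

print(accepts(A, doc))              # True
print(accepts(A, Tree("a", (b,))))  # False: the second 'b' child is missing
```

Because `any` stops at the first successful alternative, the sketch resolves ambiguity by taking whichever derivation is found first. It may re-examine the same subtree for several transitions, which is the combinatorial behaviour of a naive top-down search.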

An Upper Bound Complexity for Membership Test of Tree Shaped Data. Various ways of validating documents with automata have been proposed in "Tree Automata Techniques and Applications" [21]. The non-deterministic bottom-up approach is easily adapted to the approach presented here. The upper bound on the complexity is polynomial in the number of nodes and the number of states. Applied to the algorithm above, this can be explained as follows:

1. On each node, at most all rules may be applicable, each with all states or edges to be checked (depending on the rule: some apply to states, others to edges). The number of rules is constant (there are 4 rules), and the number of edges is linear in the size of the grammar, as shown in section 6.3.

2. A naive top-down approach could easily result in a combinatorial search, yielding exponential complexity. A bottom-up approach is an easy way to avoid this:

(a) Each child node is reached up in a recursive application in all its possible typings. Note that reaching up in the recursive application corresponds to reaching

• the typed sequence of nodes to its parent node in applications of the NODE typing rule,

• the typed sequence of following siblings to its directly preceding sibling in applications of the LIST typing rule.

(b) Only the typed contributions that can contribute to a successful typing in a given recursion are kept. If they occur in several successful typings in that recursion, they can be reused without being recalculated, since their validity is independent of their context.

3. When checking a node against a type (state or transition), this type may have to be checked against all types returned by the recursive calls (at most two of them exist: one for the following sibling and one for the content model). Since a node can have at most as many types as there are edges, this step is quadratic in the number of edges.

4. Hence, as a consequence of (1), each node has at most "number of edges" types, and as a consequence of (2), the possible types of a node never need to be calculated more than once. As a consequence of (3), each node takes at most quadratic time in the computation. This gives a polynomial complexity of $O(N \times M^2)$, where $N$ is the size of the tree (the number of nodes) and $M$ is the size of the automaton (with the number of edges as an upper bound).
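The bottom-up strategy of steps (2) to (4) can be illustrated with the following Python sketch, which computes for every node the set of all transitions that type it, and for every suffix of a content list the set of all states that type it. It is a sketch under assumptions rather than a definitive implementation: a transition (s, l, c, e) is read as "source state, label, content state, state for the following siblings", the root is accepted when a root transition types it and its end state is final, and the automaton `A` is hypothetical. The names `node_types`, `list_types` and `accepts_bottom_up` are illustrative.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Tree:
    label: str
    children: Tuple["Tree", ...] = ()


Transition = Tuple[str, str, str, str]  # (s, l, c, e), assumed reading as above


@dataclass(frozen=True)
class Automaton:
    delta: FrozenSet[Transition]
    roots: FrozenSet[Transition]
    final: FrozenSet[str]


def node_types(A: Automaton, t: Tree, visited: List[str]) -> Set[Transition]:
    """All transitions tau with t : tau (NODE), computed once per node."""
    visited.append(t.label)  # bookkeeping only: records each typed node
    content_states = list_types(A, t.children, visited)
    return {tau for tau in A.delta
            if tau[1] == t.label and tau[2] in content_states}


def list_types(A: Automaton, ts: Tuple[Tree, ...], visited: List[str]) -> Set[str]:
    """All states S with [t1..tn] : S, built from the last sibling to the first."""
    states = set(A.final)  # END: the empty list is typed by every final state
    for child in reversed(ts):
        taus = node_types(A, child, visited)
        # LIST: keep the source state s when the rest of the list is typed by e.
        states = {tau[0] for tau in taus if tau[3] in states}
    return states


def accepts_bottom_up(A: Automaton, t: Tree) -> Tuple[bool, int]:
    """Return (accepted, number of nodes typed)."""
    visited: List[str] = []
    taus = node_types(A, t, visited)
    ok = any(tau in A.roots and tau[3] in A.final for tau in taus)
    return ok, len(visited)


# Hypothetical automaton: an 'a' node has exactly two 'b' children,
# a 'b' node is either empty or contains a single 'a' node.
A = Automaton(
    delta=frozenset({
        ("qr", "a", "p0", "qf"),
        ("p0", "b", "cb", "p1"),
        ("p1", "b", "cb", "qf"),
        ("cb", "a", "p0", "qf"),
    }),
    roots=frozenset({("qr", "a", "p0", "qf")}),
    final=frozenset({"qf", "cb"}),
)

b = Tree("b")
doc = Tree("a", (b, Tree("b", (Tree("a", (b, b)),))))  # a[b[], b[a[b[], b[]]]]

print(accepts_bottom_up(A, doc))              # (True, 6)
print(accepts_bottom_up(A, Tree("a", (b,))))  # (False, 2)
```

Each node is typed exactly once (6 typings for the 6 nodes of the example tree), and the work per node is bounded by a pass over the transitions, which corresponds to the reuse of typed contributions described in step (2b).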