Chapter 5 Sufficient completeness

5.4 Tree automata-based checking

The sufficient completeness checker in the previous section deals well with conditional rules, but has difficulty with specifications that use the other important extension of CERM systems: rewriting modulo axioms. This section presents a second sufficient completeness checker, based on equational tree automata techniques, that is capable of checking specifications with rewriting modulo axioms. Due to the arbitrary conditions that may appear in memberships, it appears quite difficult to apply tree automata techniques to arbitrary MEL specifications. However, our tool supports order-sorted, left-linear, and unconditional specifications with rewriting modulo any combination of associativity, commutativity, and identity axioms. This class appears quite small relative to the much more general class of arbitrary MEL specifications, but it contains many interesting specifications that existing tools have not been able to check.

As an example, our tree automata-based checker is capable of handling the NAT-LIST example from Section 2.4 which we reproduce below:

fmod NAT-LIST is
  protecting NAT .
  sort NeList List .
  subsorts Nat < NeList < List .

  op nil : -> List [ctor] .
  op __ : NeList NeList -> NeList [ctor assoc id: nil] .
  op __ : List List -> List [assoc id: nil] .

  var N : Nat .
  var L : List .

  op head : NeList -> Nat .
  eq head(N L) = N .
  op end : NeList -> Nat .
  eq end(L N) = N .

  op reverse : List -> List .
  eq reverse(N L) = reverse(L) N .
  eq reverse(nil) = nil .
endfm

In this specification, the operator nil is a constructor, while the operator __ is overloaded: it is defined on all lists, but only a constructor on non-empty lists. The operations head and end are partial operations which are only defined on non-empty lists while reverse is defined on all lists.

The sufficient completeness checker described in Section 5.3 is not able to show the sufficient completeness of NAT-LIST. That checker would specialize the associative append symbol to construct ever wider terms, such as head(l1 (l2 l3)), that still would not match any equation. For this approach to work, one must be able to bound the width of the terms considered. For unsorted and many-sorted left-linear specifications with rewriting modulo AC, this is possible, as shown by Jouannaud and Kounalis [82]. However, it appears quite difficult to extend their results to the order-sorted case.

To deal with the order-sorted case, we cast the sufficient completeness problem with rewriting modulo axioms as a decision problem for equational tree automata [123]. Equational tree automata extend regular tree automata by allowing some of the symbols to have equational properties such as associativity and commutativity. The automaton then recognizes languages that are closed modulo those equational properties. This is important because, when rewriting modulo axioms, the set of reducible terms contains not only the terms that syntactically match a rule, but also the terms that are equivalent modulo the axioms to syntactically matching terms.
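To make the closure requirement concrete, the following Python sketch decides whether the pattern N L from the equation head(N L) = N matches a list term modulo associativity and identity. It is only an illustration under stated assumptions: the encoding of terms (integers for naturals, the string "nil", and tuples tagged "cat" for the juxtaposition operator) and the function names flatten and match_head are hypothetical, and neither the checker nor CETA is claimed to work this way.

```python
def flatten(term):
    """Canonical form of a list term modulo associativity and identity.

    Hypothetical encoding: an int is a natural number, "nil" is the empty
    list, and ("cat", left, right) stands for the juxtaposition `left right`.
    """
    if term == "nil":
        return ()  # identity elements disappear
    if isinstance(term, tuple) and term[0] == "cat":
        # Associativity: the bracketing of nested juxtapositions is forgotten.
        return flatten(term[1]) + flatten(term[2])
    return (term,)


def match_head(arg):
    """Match the pattern `N L` (N : Nat, L : List) against arg modulo the axioms.

    Returns the binding (N, L), with L in flattened form, or None when no
    term equivalent to arg is an instance of the pattern.
    """
    elems = flatten(arg)
    if not elems:
        return None  # nil has no first element
    return elems[0], elems[1:]


left_nested = ("cat", ("cat", 1, 2), 3)
right_nested = ("cat", 1, ("cat", 2, 3))
print(flatten(left_nested) == flatten(right_nested))
print(match_head(left_nested))
print(match_head(7))
print(match_head(("cat", "nil", "nil")))
```

The term (1 2) 3 is not a syntactic instance of N L, since N would have to be bound to the list 1 2, yet it is matched once bracketing is ignored. Likewise, the single number 7 is matched only because identity allows L to be bound to nil. A recognizer for reducible terms therefore has to accept whole equivalence classes, not just syntactic instances.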

The idea of using tree automata in checking sufficient completeness is not a new one. Tree automata techniques were used to yield a sufficient completeness checking algorithm that was optimal from the complexity theory point of view. It was shown in [90] that sufficient completeness was EXPTIME-hard, but an exponential time algorithm for checking it was unknown. This problem remained open for several years, until [32] described an exponential time algorithm for checking sufficient completeness that worked by casting the problem as a decision problem for reduction tree automata [24].

The results in [32] depend on the fact that reduction tree automata are capable of recognizing the reducible terms of a rewrite system. For rewriting modulo axioms A, this set of terms must be closed modulo A. In casting sufficient completeness of a specification as an emptiness problem for an equational tree automaton with equations A, we lose the support for non-linear constraints that reduction tree automata provide, but gain the ability to recognize equationally closed sets of terms.

The class of CERM systems that our tool can handle corresponds to the order-sorted subset of systems that are ground weakly-normalizing and ground sort-preserving and that have left-linear rules. This class is defined precisely below:

Definition 5.4.1. A CERM system R/A is TA checkable when

(a) R/A is ground weakly reductive and ground sort-preserving.

(b) Every axiom in R has the form $\alpha \text{ if } x_1 : s_1 \wedge \cdots \wedge x_n : s_n$, where $x_1, \ldots, x_n$ are distinct, $\mathrm{vars}(\alpha) \subseteq \{x_1, \ldots, x_n\}$, and each variable appears at most once in the left-hand side of $\alpha$.

The details of how to convert the sufficient completeness property for rewriting modulo axioms into a propositional emptiness problem were originally presented in [74]. Our presentation here is slightly different, but captures the same basic idea. The key idea is to construct an automaton $\mathcal{A}_{SC}$ with two different types of states: (1) for each sort $s \in S$, $\mathcal{A}_{SC}$ contains states $c_s$ and $d_s$, where $c_s$ recognizes terms with sort $s$ using only the memberships in $M_\Omega$, and $d_s$ recognizes terms with a defined root operator and constructors underneath; (2) $\mathcal{A}_{SC}$ contains states for recognizing intermediate subterms in the left-hand sides of rules in $R$, as well as a state $r$ which recognizes $R/A$-reducible terms. We then define the propositional formula

$$\varphi = \neg r \wedge \bigvee_{s \in S} (d_s \wedge \neg c_s).$$

If we recall from Section 2.3 that A denotes the underlying unsorted equational theory obtained from the axioms A, then it is not difficult to show the following theorem relating sufficient completeness and $L_\varphi(\mathcal{A}_{SC}/A)$.

Theorem 5.4.2. Let $R/A$ be a TA checkable CERM system with rules $R$, memberships $M$, and a signature $\Sigma = (K, F, S)$. Given a set of memberships $M_\Omega \subseteq M$, let $R_\Omega = R \cup M_\Omega$.

There effectively exist an equational tree automaton $\mathcal{A}_{SC}$ and a propositional formula $\varphi$ such that $R/A$ is sufficiently complete relative to the constructor memberships $M_\Omega$ iff $L_\varphi(\mathcal{A}_{SC}/A) = \emptyset$.

Proof. We observe that, since the memberships in a TA checkable specification do not have equations in their conditions, if $t$ is $R/A$-irreducible and $R/A \vdash t : s$ or, respectively, $R_\Omega/A \vdash t : s$, then $M/A \vdash t : s$ or, respectively, $M_\Omega/A \vdash t : s$.

Every TA checkable CERM system $R/A$ is ground weakly reductive and ground sort-preserving. Consequently, we can reduce checking the sufficient completeness of $R/A$ relative to $M_\Omega$ to checking defined reducibility of $R/A$ relative to $M_\Delta = M - M_\Omega$. We check this by defining a language $L_\varphi(\mathcal{A}_{SC}/A)$ which contains an equivalence class $[t] \in T_A$ iff there exists an $R/A$-irreducible ground term $t \in T_\Sigma$ for which there exists

• a membership $(\forall \overline{x} : \overline{s})\ l : s$ in $M_\Delta$ and

• a ground substitution $\theta : \overline{x} \to T_\Sigma$

such that $M_\Omega/A \vdash \theta(x) : s_x$ for all variables $x \in \overline{x}$, $t =_A l\theta$, and $M_\Omega/A \nvdash t : s$.

Since the equations in $A$ are kind independent (see Def. 2.6.1), we have that $L_\varphi(\mathcal{A}_{SC}/A) = \emptyset$ iff $R/A$ is sufficiently complete relative to $M_\Omega$.

In order to define $L_\varphi(\mathcal{A}_{SC}/A)$, we define the set $I_R$, which denotes the non-variable strict subterms appearing in the left-hand sides of clauses in $R$. The elements in $I_R$ are further annotated with the sorts bound to each variable in a clause. Specifically,

$$I_R = \{\, t[\overline{x} : \overline{s}] \mid (\forall \overline{x} : \overline{s})\ \alpha \text{ in } R \wedge C[t] \in \mathrm{lhs}(\alpha) \wedge t \notin X \wedge C \neq \Box \,\}.$$

The set of states $Q$ of the automaton $\mathcal{A}_{SC}$ is

$$Q = \{q_\top, r\} \cup \{c_s, d_s \mid s \in S\} \cup \{c_t \mid t \in I_R\}.$$

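To simplify later notation, for each variable $x$ appearing in the left-hand side of a rule $(\forall \overline{x} : \overline{s})\ \alpha$ in $R$, we identify $c_{x[\overline{x}:\overline{s}]}$ with $c_{s_x}$, where $s_x$ is the sort associated with $x$ in $\overline{x} : \overline{s}$.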

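We define the clauses in $\mathcal{A}_{SC}$ as follows.

• For each term $f(t_1, \ldots, t_n)[\overline{x} : \overline{s}] \in I_R$, $\mathcal{A}_{SC}$ contains

$$c_{f(t_1, \ldots, t_n)[\overline{x}:\overline{s}]}(f(x_1, \ldots, x_n)) \Leftarrow c_{t_1[\overline{x}:\overline{s}]}(x_1), \ldots, c_{t_n[\overline{x}:\overline{s}]}(x_n).$$

• For each constructor membership $(\forall \overline{x} : \overline{s})\ f(t_1, \ldots, t_n) : s$ in $M_\Omega$, $\mathcal{A}_{SC}$ contains

$$c_s(f(x_1, \ldots, x_n)) \Leftarrow c_{t_1[\overline{x}:\overline{s}]}(x_1), \ldots, c_{t_n[\overline{x}:\overline{s}]}(x_n).$$

• For each constructor membership $(\forall \overline{x} : \overline{s})\ x : s$ in $M_\Omega$, $\mathcal{A}_{SC}$ contains

$$c_s(x) \Leftarrow c_{s_x}(x).$$

• For each defined membership $(\forall \overline{x} : \overline{s})\ f(t_1, \ldots, t_n) : s$ in $M_\Delta$, $\mathcal{A}_{SC}$ contains

$$d_s(f(x_1, \ldots, x_n)) \Leftarrow c_{t_1[\overline{x}:\overline{s}]}(x_1), \ldots, c_{t_n[\overline{x}:\overline{s}]}(x_n).$$

• For each defined membership $(\forall \overline{x} : \overline{s})\ x : s$ in $M_\Delta$, $\mathcal{A}_{SC}$ contains

$$d_s(x) \Leftarrow d_{s_x}(x).$$

• For each operator $f : k_1 \ldots k_n \to k$ in $F$, $\mathcal{A}_{SC}$ contains

$$q_\top(f(x_1, \ldots, x_n)) \Leftarrow q_\top(x_1), \ldots, q_\top(x_n).$$

• For each rule $(\forall \overline{x} : \overline{s})\ f(t_1, \ldots, t_n) \to u$ in $R$, $\mathcal{A}_{SC}$ contains

$$r(f(x_1, \ldots, x_n)) \Leftarrow c_{t_1[\overline{x}:\overline{s}]}(x_1), \ldots, c_{t_n[\overline{x}:\overline{s}]}(x_n).$$

• For each rule $(\forall \overline{x} : \overline{s})\ y \to u$ in $R$, $\mathcal{A}_{SC}$ contains

$$r(x) \Leftarrow c_{s_y}(x) \quad\text{and}\quad r(x) \Leftarrow d_{s_y}(x).$$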

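• Finally, for each operator $f \in F$ with arity $n$ and index $i \in [1, n]$, $\mathcal{A}_{SC}$ contains

$$r(f(x_1, \ldots, x_n)) \Leftarrow q_\top(x_1), \ldots, q_\top(x_{i-1}), r(x_i), q_\top(x_{i+1}), \ldots, q_\top(x_n).$$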

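By induction on $t \in T_\Sigma$, we have

$$t \in L_{c_s}(\mathcal{A}_{SC}) \iff M_\Omega \vdash t : s, \text{ and}$$

$$t \in L_{c_{u[\overline{x}:\overline{s}]}}(\mathcal{A}_{SC}) \iff (\exists \theta)\ t = u\theta \wedge (\forall x \in \overline{x})\ M_\Omega \vdash \theta(x) : s_x.$$

We can use these results, together with the fact that $L_{q_\top}(\mathcal{A}_{SC}) = T_\Sigma$, to show that $t \in L_r(\mathcal{A}_{SC})$ iff $t$ is $R$-reducible, and to show that the terms in $L_{d_s}(\mathcal{A}_{SC})$ are those whose root has a sort using a defined membership and whose subterms are constructor terms.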
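The clause construction described above can be sketched in Python for a small specification without axioms. Everything concrete here is an assumption made for illustration: the Peano-style addition example, the tuple encoding of terms, state names such as c_[s(x)], and the function build_clauses. The sketch covers only the clauses for non-variable left-hand sides and the clauses for $q_\top$; clauses for variable left-hand sides and for propagating reducibility from subterms are left out.

```python
def is_var(term):
    # Hypothetical encoding: a variable is a str, an application is a tuple
    # (symbol, arg1, ..., argn); constants are 1-tuples such as ("0",).
    return isinstance(term, str)


def show(term):
    if is_var(term):
        return term
    symbol, *args = term
    return symbol + ("(" + ",".join(show(a) for a in args) + ")" if args else "")


def build_clauses(var_sorts, ctor_mbs, defined_mbs, rule_lhss, arities):
    """Return clauses (head_state, symbol, premise_states) for the automaton."""

    def c(term):
        # A variable is identified with the state of its sort.
        return "c_" + var_sorts[term] if is_var(term) else "c_[" + show(term) + "]"

    clauses = set()

    def add_subterms(term):
        # One clause per non-variable strict subterm of a left-hand side.
        for arg in term[1:]:
            if not is_var(arg):
                clauses.add((c(arg), arg[0], tuple(c(b) for b in arg[1:])))
                add_subterms(arg)

    def add_root(head, term):
        add_subterms(term)
        clauses.add((head, term[0], tuple(c(a) for a in term[1:])))

    for lhs in rule_lhss:
        add_root("r", lhs)                 # rule left-hand sides
    for term, sort in ctor_mbs:
        add_root("c_" + sort, term)        # constructor memberships
    for term, sort in defined_mbs:
        add_root("d_" + sort, term)        # defined memberships
    for symbol, n in arities.items():
        clauses.add(("q_top", symbol, ("q_top",) * n))
    return clauses


clauses = build_clauses(
    var_sorts={"x": "Nat", "y": "Nat"},
    ctor_mbs=[(("0",), "Nat"), (("s", "x"), "Nat")],
    defined_mbs=[(("plus", "x", "y"), "Nat")],
    rule_lhss=[("plus", ("0",), "y"), ("plus", ("s", "x"), "y")],
    arities={"0": 0, "s": 1, "plus": 2},
)
for clause in sorted(clauses):
    print(clause)
```

Each printed triple (q, f, (q1, ..., qn)) stands for the clause q(f(x1, ..., xn)) ⇐ q1(x1), ..., qn(xn). Because left-hand sides are linear, every premise constrains a different argument, which is what keeps the construction within tree automata.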

To reduce sufficient completeness to a propositional emptiness problem for equational tree automata, we define the formula

$$\varphi = \neg r \wedge \bigvee_{s \in S} (d_s \wedge \neg c_s).$$

This formula defines a language accepting the irreducible terms that are accepted by $L_{d_s}(\mathcal{A}_{SC}/A)$ for some sort $s \in S$ but are not constructor terms with sort $s$. By the restrictions on $A$ in Definition 2.6.1 of CERM systems, it follows from Theorem 3.1.1 that $L_\varphi(\mathcal{A}_{SC}/A)$ contains exactly the counterexamples to defined reducibility. Thus $R/A$ is sufficiently complete iff $L_\varphi(\mathcal{A}_{SC}/A) = \emptyset$ by Theorem 5.2.7.
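For a plain tree automaton without axioms, the emptiness test for $\varphi$ can be sketched as a subset construction: collect every set of states reached by some ground term, then evaluate $\varphi$ on each set. The Python code below is a simplified illustration and not CETA's algorithm, which must also handle equational axioms. The clause list encodes a hypothetical addition example, all names are made up, and the last three clauses, which propagate reducibility from subterms, are written in an assumed form.

```python
from itertools import product

# A clause q(f(x1..xn)) <= q1(x1), ..., qn(xn) is stored as (q, f, (q1, ..., qn)).
# Hypothetical example: constructors 0 and s, defined symbol plus, and rules
# plus(0, y) -> y and plus(s(x), y) -> s(plus(x, y)).
CLAUSES = [
    ("c_[0]", "0", ()),
    ("c_[s(x)]", "s", ("c_Nat",)),
    ("c_Nat", "0", ()),
    ("c_Nat", "s", ("c_Nat",)),
    ("d_Nat", "plus", ("c_Nat", "c_Nat")),
    ("q_top", "0", ()),
    ("q_top", "s", ("q_top",)),
    ("q_top", "plus", ("q_top", "q_top")),
    ("r", "plus", ("c_[0]", "c_Nat")),
    ("r", "plus", ("c_[s(x)]", "c_Nat")),
    # Assumed form: a term is reducible if one of its arguments is.
    ("r", "s", ("r",)),
    ("r", "plus", ("r", "q_top")),
    ("r", "plus", ("q_top", "r")),
]
ARITIES = {"0": 0, "s": 1, "plus": 2}


def reachable_state_sets(clauses, arities):
    """Map each reachable set of states to one ground term that reaches it."""
    found = {}
    changed = True
    while changed:
        changed = False
        for f, n in arities.items():
            # Try f on every combination of state sets found so far.
            for combo in product(list(found.items()), repeat=n):
                arg_sets = [states for states, _ in combo]
                reached = frozenset(
                    head for head, g, premises in clauses
                    if g == f and all(p in s for p, s in zip(premises, arg_sets))
                )
                if reached not in found:
                    found[reached] = (f, *(witness for _, witness in combo))
                    changed = True
    return found


def phi(states, sorts):
    """Not reducible, and defined but not a constructor term at some sort."""
    return "r" not in states and any(
        "d_" + s in states and "c_" + s not in states for s in sorts
    )


def counterexample(clauses, arities, sorts):
    for states, witness in reachable_state_sets(clauses, arities).items():
        if phi(states, sorts):
            return witness
    return None


print(counterexample(CLAUSES, ARITIES, ["Nat"]))

# Dropping the clause for the rule plus(0, y) -> y makes the example incomplete.
incomplete = [c for c in CLAUSES if c != ("r", "plus", ("c_[0]", "c_Nat"))]
print(counterexample(incomplete, ARITIES, ["Nat"]))
```

With the full clause list no reachable state set satisfies $\varphi$, so the sketch reports no counterexample. After the clause for plus(0, y) is removed, the term plus(0, 0) reaches a set containing d_Nat but neither r nor c_Nat, and is returned as a witness. The loop terminates because there are only finitely many subsets of states.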

The decidability of the above emptiness problem depends on the particular axioms A. It is decidable when the axioms in the specification are any combination of associativity, commutativity, and identity, except when a symbol is associative but not commutative. For the case of commutativity alone, this was shown in [123]. For symbols that are both associative and commutative, this was shown in [125]. Identity equations can be transformed into identity rewrite rules using a specialized completion procedure along the lines of coherence completion in [140], and then we can extend the emptiness test to only recognize terms that are in normal form with respect to identity rewrite rules.

For symbols that are associative and not commutative, the problem is undecidable. However, for these associative symbols, we can use the semi-algorithm presented in Chapter 3. That semi-algorithm always shows non-emptiness if a language is non-empty, and shows emptiness if the language is empty and certain regularity conditions are satisfied. For sufficient completeness checking, this means that we can always find counterexamples to sufficient completeness if they exist, and can show sufficient completeness for most practical specifications, where the sorts in a specification are used to model regular data structures such as lists or non-empty lists.

The implementation of the tree automata-based SCC has two major components: an analyzer written in Maude that generates the tree automaton emptiness problem from a Maude specification, and a C++ library called CETA that performs the emptiness check.

Analyzer. The analyzer accepts commands from the user, generates a propositional emptiness problem from a Maude specification, forwards the problem to CETA, and presents the user with the results. If the specification is not sufficiently complete, the tool shows the user a counterexample illustrating the error. The analyzer consists of approximately 900 lines of Maude code, and exploits Maude’s support for reflection. The specifications it checks are also written in Maude.

If the user asks the tool to check the sufficient completeness of a specification that is not left-linear and unconditional, the tool transforms the specification by renaming variables and dropping conditions into a checkable order-sorted left-linear specification. Even if the tool is able to verify the sufficient completeness of the transformed specification, it warns the user that it cannot show the sufficient completeness of the original specification. However, any counterexamples found in the transformed specification are also counterexamples in the original specification. We have found this feature quite useful to identify errors in Maude specifications falling outside the decidable class — including the sufficient completeness checker itself.
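The transformation described above can be sketched as follows. This Python fragment is a minimal illustration, not the analyzer's Maude code: the tuple encoding of terms, the triple (lhs, rhs, condition) used for rules, the renaming scheme X_2, X_3, ..., and the names linearize and relax are all assumptions. The sketch also assumes that the generated names do not clash with existing variables.

```python
def linearize(term, seen):
    """Rename repeated variable occurrences so that each variable occurs once.

    Hypothetical encoding: a variable is a str, an application is a tuple
    (symbol, arg1, ..., argn). `seen` counts the occurrences met so far.
    """
    if isinstance(term, str):
        seen[term] = seen.get(term, 0) + 1
        n = seen[term]
        # The first occurrence keeps its name; later ones get fresh names.
        return term if n == 1 else f"{term}_{n}"
    symbol, *args = term
    return (symbol, *(linearize(a, seen) for a in args))


def relax(rule):
    """Make a rule left-linear and unconditional."""
    lhs, rhs, _condition = rule
    # Only the left-hand side decides which terms are reducible, so the
    # right-hand side is kept as it is and the condition is dropped.
    return (linearize(lhs, {}), rhs, None)


# Hypothetical non-left-linear, conditional rule:
#   member(X, cons(X, L)) -> true  if  X > 0
rule = (("member", "X", ("cons", "X", "L")), ("true",), "X > 0")
print(relax(rule))
```

Each relaxed rule applies to at least the terms that the original rule applies to, so a term that no relaxed rule can reduce is also irreducible in the original specification.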

CETA. The propositional tree automaton generated by the analyzer is forwarded to the CETA tree automata library, which we have developed. CETA is a complex C++ library with approximately 10 thousand lines of code. Emptiness checking is performed by a subset construction algorithm extended with support for associativity and commutativity axioms, as described in Chapter 3. CETA is so large because the subset construction algorithm relies on quite complex algorithms on context-free grammars, semilinear sets, and finite automata.

We have found that CETA performs quite well for our purposes. Most examples can be verified in seconds. A table with a few of the checked specifications from the Maude prelude, the Maude primer [108], and the Maude book [28] is shown in Figure 5.1. All successfully checked modules are sufficiently complete; modules are shown in italics if the sufficient completeness checker identified errors in early versions. The column labeled $|E|$ gives the total number of sorts, operators, and equations in the theory $E$, while the column labeled $|E_{TA}|$ gives the total number of states, operators, and rules in the corresponding automaton. The current version of the checker is not fast enough to verify itself within our time limit of 30 minutes, but it has been able to identify real sufficient completeness errors in early versions of the checker.

As an example, in Figure 5.2, we present a tool session in which we check two specifications: NAT-LIST from the previous section, and NAT-LIST-ERROR, which updates NAT-LIST to change the operator declaration of head from op head : NeList -> Nat to op head : List -> Nat. Since the NAT-LIST specification contains an associative symbol, it falls outside the class known to be decidable. However, the CETA library is still able to show that the language of the automaton given by the sufficient completeness analyzer is empty, and therefore that the specification is sufficiently complete. The checker also finds the correct counterexample for NAT-LIST-ERROR.

Module        |E|   |E_TA|  Time     Module          |E|   |E_TA|  Time
TRUTH-VALUE     3      22   0.33s    META-LEVEL      610    2011   2.52s
TRUTH           6      22   0.35s    COUNTER          56     206   0.44s
BOOL           19      60   0.38s    LOOP-MODE       116     439   0.69s
EXT-BOOL       25      74   0.38s    CONFIGURATION    18     105   0.35s
NAT            55     204   0.47s    NAT-CONS         33     135   0.37s
INT            96     262   0.55s    MY-NAT-LIST      30     109   0.36s
RAT           197     397   1.20s    NAT-LIST-FIX     33     114   0.45s
FLOAT          56     206   0.42s    BLACKBOARD       60     217   0.50s
STRING         74     288   0.57s    CHESS-COVER      80     308   0.53s
CONVERSION    262     677   1.35s    DIE-HARD         62     238   0.53s
RANDOM         56     208   0.45s    JOSEPHUS         63     245   0.51s
NAT-LIST       90     291   0.65s    JOSEPHUS-GEN     64     251   0.51s
QID-LIST      113     401   0.70s    KHUN-PHAN        66     258   0.46s
QID-SET       128     431   0.74s    CHIPS            70     273   0.45s
META-TERM     143     447   0.69s    RABBIT-HOP       68     254   0.54s
META-MODULE   499    1538   1.89s    CC-LOOP        1381    3837   >30m

Figure 5.1: SCC benchmarks

In document Decision Procedures for Equationally Based Reasoning (Page 127-133)