The subject of developing certified algorithms for deciding regular expression equivalence within theorem provers is not new. In recent years, much attention has been directed to this subject, resulting in several formalisations, some of which are based on derivatives, and which span three different interactive theorem provers, namely Coq, Isabelle [79], and Matita [12].
The most complete of these developments is the one of Braibant and Pous [18]: the authors formalised Kozen’s proof of the completeness of KA [57] and also developed efficient tactics to decide KA equalities by computational reflection. Their construction is based on the classical automata-based process for deciding regular expression equivalence, without minimisation of the automata involved. Moreover, they use a variant of Ilie and Yu’s method [52] for constructing automata from regular expressions, and the comparison is performed using Karp’s [49] direct comparison of automata. The resulting development is quite general (it is able to prove (in)equivalence of expressions in several models of Kleene algebra) and it is also quite efficient due to a careful choice of the data structures involved.
The works closest to ours are those of Coquand and Siles [29], and of Nipkow and Krauss [66]. Coquand and Siles implemented a procedure for regular expression equivalence based on Brzozowski’s derivative method, supported by a new construction of finite sets in type theory. They prove their algorithm correct and complete. Nipkow and Krauss’ development is also based on Brzozowski’s derivatives, and it is a compact and elegant development carried out in the Isabelle theorem prover. However, the authors did not formalise the termination and completeness of the algorithm. In particular, termination is far from being a trivial subject, as demonstrated both by the work presented in this thesis and by the work of Coquand and Siles.
More recently, Asperti presented a development [11] of an algorithm based on pointed regular expressions, which are regular expressions containing internal points. These points indicate which part of the regular expression has already been processed (transformed into a DFA) and, therefore, which part remains to be processed. The development is also quite short and elegant and provides an alternative to the algorithms based on Brzozowski’s derivatives, since it does not require normalisation modulo a suitable set of axioms to prove the finiteness of the number of states of the corresponding DFA.
In Table 3.2 we compare our development with the one of Braibant and Pous. We do not present a comparison with the other two Coq developments since they clearly exhibit worse performance than both ours and that of Braibant and Pous. For technical reasons, we were not able to test the development of Asperti. In these experiments we used data sets of 1000 regular expressions generated uniformly at random, and the experiments were conducted on a 15-inch MacBook Pro with a 2.3 GHz Intel Core i7 processor and 4 GB of RAM.

alg./(k, n)   (2, 5)            (2, 10)           (2, 20)
              eq      ineq      eq      ineq      eq      ineq
equivP        0.003   0.002     0.008   0.003     0.020   0.004
ATBR          0.059   0.016     0.080   0.042     0.258   0.099

alg./(k, n)   (4, 20)           (4, 50)           (10, 100)
              eq      ineq      eq      ineq      eq      ineq
equivP        0.035   0.004     0.172   0.010     0.776   0.016
ATBR          0.261   0.029     0.436   0.358     1.525   0.874

alg./(k, n)   (20, 200)         (50, 500)         (50, 1000)
              eq      ineq      eq      ineq      eq      ineq
equivP        2.211   0.048     9.957   0.121     17.768  0.149
ATBR          3.001   1.654     5.876   2.724     16.682  12.448

Table 3.2: Comparison of the performances.
It is clear from Table 3.2 that the work of Braibant and Pous scales better than ours on equivalence tests for larger families of regular expressions, but it is drastically slower than ours with respect to regular expression inequivalence. For smaller families of regular expressions, our procedure is faster than theirs in both cases. The values k and n in Table 3.2 are the same measures that were used in Table 3.1, presented in the previous section for the analysis of the performance of equivP.
3.5 Conclusions
In this chapter we have described the mechanisation, within the Coq proof assistant, of the procedure equivP for deciding regular expression equivalence based on the notion of partial derivatives. This procedure decides the (in)equivalence of regular expressions by iteratively comparing their partial derivatives. The main advantage of our method, when compared to the ones based on Brzozowski’s derivatives, is that it does not require normalisation modulo the associativity, commutativity and idempotence of the + operator in order to prove the finiteness of the number of derivatives and the termination of the corresponding algorithms.
The performance exhibited by our algorithm is satisfactory. Nevertheless, there is room for improvement. One main point of improvement is the development of intermediate tactics that are able to automate common proof steps. An interesting continuation of our development is its extension to support extended regular expressions, that is, regular expressions containing intersection and complement. The recent work of Caron, Champarnaud and Mignot [23] extends the notion of partial derivative to handle these extended regular expressions, and its addition to our formalisation should not pose any major difficulty.
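The iterated comparison of partial derivatives summarised at the start of this section can be illustrated outside Coq. The following is a minimal Python sketch and not the verified equivP implementation: the tuple encoding of regular expressions and the names nullable, concat, pderiv, pderiv_set and equiv are assumptions made only for this illustration. Since derivatives are kept as sets, associativity, commutativity and idempotence of + hold by construction, so no explicit normalisation step appears.

```python
# Regular expressions as nested tuples (a hypothetical encoding, not the Coq one):
# ("empty",), ("eps",), ("sym", a), ("alt", r, s), ("cat", r, s), ("star", r)
EMPTY, EPS = ("empty",), ("eps",)

def nullable(r):
    """True iff the language of r contains the empty word."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "sym"):
        return False
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])  # remaining case: "cat"

def concat(rs, s):
    """Concatenate s to the right of every expression in the set rs."""
    return frozenset(s if r == EPS else ("cat", r, s) for r in rs)

def pderiv(a, r):
    """Partial derivative of r with respect to symbol a, as a set of expressions."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return frozenset()
    if tag == "sym":
        return frozenset([EPS]) if r[1] == a else frozenset()
    if tag == "alt":
        return pderiv(a, r[1]) | pderiv(a, r[2])
    if tag == "cat":
        left = concat(pderiv(a, r[1]), r[2])
        return left | pderiv(a, r[2]) if nullable(r[1]) else left
    return concat(pderiv(a, r[1]), r)  # remaining case: "star"

def pderiv_set(a, rs):
    """Partial derivative of a set of expressions: union of the derivatives."""
    return frozenset().union(*(pderiv(a, r) for r in rs))

def equiv(r1, r2, alphabet):
    """Explore pairs of derivative sets; fail when the two sides disagree on nullability."""
    start = (frozenset([r1]), frozenset([r2]))
    seen, todo = {start}, [start]
    while todo:
        s1, s2 = todo.pop()
        if any(map(nullable, s1)) != any(map(nullable, s2)):
            return False  # some word is accepted by exactly one side
        for a in alphabet:
            nxt = (pderiv_set(a, s1), pderiv_set(a, s2))
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return True
```

For instance, with a = ("sym", "a") and b = ("sym", "b"), the call equiv(("star", ("alt", a, b)), ("star", ("cat", ("star", a), ("star", b))), "ab") evaluates to True, whereas equiv(("star", a), ("cat", a, ("star", a)), "a") evaluates to False, since only the first expression accepts the empty word.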
Another point that we wish to address is the representation of partial derivatives in a linear way, similarly to the work of Almeida et al. This representation has the advantage of reducing the number of symbols involved in the derivation process whenever some of the symbols lead to derivatives whose result is the empty set.
Chapter 4
Equivalence of KAT Terms
Kleene algebra with tests (KAT) [59,64] is an algebraic system that extends Kleene algebra, the algebra of regular expressions, by considering a subset of tests whose elements satisfy the axioms of Boolean algebra. The addition of tests brings a new level of expressivity in the sense that in KAT we are able to express imperative program constructions, rather than just non-deterministic choice, sequential composition and iteration on a set of actions, as is the case with regular expressions.
KAT is especially suited to capturing and verifying properties of simple imperative programs, since it provides an equational way to deal with partial correctness and program equivalence. In particular, KAT subsumes propositional Hoare logic (PHL) [65,60] in the sense that PHL’s deductive rules become theorems of KAT. Consequently, proving that a given program p is partially correct using the deductive system of PHL is tantamount to checking that p is partially correct by equational reasoning in KAT. Moreover, some Horn formulas [43, 44] of KAT can be reduced to standard equalities, which can then be decided automatically using one of the available methods [101,64,62].
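As an illustration of how imperative constructions and partial correctness assertions become KAT terms and equations, the encoding commonly used in the KAT literature can be written as follows. The notation is an assumption made for this illustration: b and c are tests, p and q are programs, \overline{b} denotes the negation of the test b, and juxtaposition denotes sequential composition.

```latex
\begin{align*}
p \,;\, q &\;\triangleq\; pq \\
\mathbf{if}\ b\ \mathbf{then}\ p\ \mathbf{else}\ q &\;\triangleq\; bp + \overline{b}\,q \\
\mathbf{while}\ b\ \mathbf{do}\ p &\;\triangleq\; (bp)^{*}\,\overline{b} \\
\{b\}\ p\ \{c\} &\;\Longleftrightarrow\; bp\,\overline{c} = 0
\end{align*}
```

Under this reading, the partial correctness assertion {b} p {c} states that no execution of p that starts in a state satisfying b can end in a state violating c.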
In this chapter we present a mechanically verified implementation of a procedure to decide the equivalence of KAT terms using partial derivatives. The decision procedure is an extension of the procedure introduced and described in the previous chapter. The Coq development is available in [75].
4.1 Kleene Algebra with Tests
A KAT is a KA extended with an embedded Boolean algebra (BA). Formally, a KAT is an algebraic structure
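\[
(K, T, +, \cdot, {}^{*}, \overline{\phantom{b}}, 0, 1),
\]
such that $(K, +, \cdot, {}^{*}, 0, 1)$ is a KA, $(T, +, \cdot, \overline{\phantom{b}}, 0, 1)$ is a Boolean algebra and $T \subseteq K$. Therefore, KAT satisfies the axioms of KA and the axioms of Boolean algebra, that is, the set of axioms (3.8–3.22) and the following ones, for $b, c, d \in T$:
\begin{align}
bc &= cb \tag{4.1}\\
b + (cd) &= (b + c)(b + d) \tag{4.2}\\
\overline{b + c} &= \overline{b}\,\overline{c} \tag{4.3}\\
b + \overline{b} &= 1 \tag{4.4}\\
bb &= b \tag{4.5}\\
b + 1 &= 1 \tag{4.6}\\
b + 0 &= b \tag{4.7}\\
\overline{bc} &= \overline{b} + \overline{c} \tag{4.8}\\
b\,\overline{b} &= 0 \tag{4.9}\\
\overline{\overline{b}} &= b \tag{4.10}
\end{align}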
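As a sanity check on axioms (4.1) to (4.10), one can verify them exhaustively on a small concrete Boolean algebra. The sketch below is only an illustration under stated assumptions: it reads (4.3), (4.4), (4.8), (4.9) and (4.10) with negation in the usual places of the standard axiomatisation of Boolean algebra, and it takes the subsets of a hypothetical three-element universe as tests, with + as union, · as intersection and negation as set complement. The names UNIVERSE, TESTS, AXIOMS and failed_axioms are not part of the Coq development.

```python
from itertools import combinations, product

UNIVERSE = frozenset({0, 1, 2})  # hypothetical three-atom universe
ZERO, ONE = frozenset(), UNIVERSE

def plus(b, c):
    return b | c          # + is interpreted as set union

def times(b, c):
    return b & c          # multiplication is interpreted as intersection

def neg(b):
    return UNIVERSE - b   # negation is interpreted as complement

# Every subset of the universe plays the role of a test in T.
TESTS = [frozenset(s)
         for k in range(len(UNIVERSE) + 1)
         for s in combinations(sorted(UNIVERSE), k)]

# One predicate per axiom; each takes (b, c, d) even when it ignores some of them.
AXIOMS = {
    "4.1": lambda b, c, d: times(b, c) == times(c, b),
    "4.2": lambda b, c, d: plus(b, times(c, d)) == times(plus(b, c), plus(b, d)),
    "4.3": lambda b, c, d: neg(plus(b, c)) == times(neg(b), neg(c)),
    "4.4": lambda b, c, d: plus(b, neg(b)) == ONE,
    "4.5": lambda b, c, d: times(b, b) == b,
    "4.6": lambda b, c, d: plus(b, ONE) == ONE,
    "4.7": lambda b, c, d: plus(b, ZERO) == b,
    "4.8": lambda b, c, d: neg(times(b, c)) == plus(neg(b), neg(c)),
    "4.9": lambda b, c, d: times(b, neg(b)) == ZERO,
    "4.10": lambda b, c, d: neg(neg(b)) == b,
}

def failed_axioms():
    """Return the labels of the axioms violated by some triple of tests."""
    return [name for name, holds in AXIOMS.items()
            if not all(holds(b, c, d) for b, c, d in product(TESTS, repeat=3))]
```

Calling failed_axioms() on this model returns an empty list, which confirms that the powerset interpretation satisfies all ten axioms; it is a finite check on one model, not a proof about Boolean algebras in general.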