Transformational Development of LEX


Kolyang, Junbo Liu and Burkhart Wolff

Universität Bremen¹

Abstract

In this paper we present a transformational development of an efficient implementation of a lexical scanner, corresponding to the well-known LEX in the UNIX system. Based on a formal requirement specification of LEX written in the algebraic specification language SPECTRUM, the development is guided by global plans and realised by applications of correctness-preserving transformations (partly developed in the PROSPECTRA project [HK 93]). Optimization is obtained naturally as the end product of the formal development. Our transformational approach is partly an alternative to common "invent and verify" techniques. The development is formally presented and embedded in the terminology of the KORSO Framework.

I Introduction

KORSO² (a cooperative project by 13 universities and one company, funded by the German Ministry of Research and Technology) aims at the development of correct software through formal methods. Its goal is to capture all phases of the life cycle of a software product and to integrate formal methods into these phases in order to achieve correctness. These phases traditionally include an informal problem description, the production of a formal requirement specification, and a stepwise development of efficient implementations from formal requirement and design specifications. A significant extension of classical methodologies consists in a formalisation of the development process for reuse and redevelopment. In the project, four working groups for handling various aspects of the development of correct software were founded: "Methodology", "Language", "Tools" and "Case Studies". In the working group "Case Studies", a number of smaller and larger examples have been tackled by different research groups, applying different methods and techniques. LEX, corresponding to the well-known scanner generator in the UNIX system, is only one of the case studies for illustrating different instances of the KORSO methodology framework. Because of its well-understood description and manageable size, LEX has been chosen by several research groups (as presented in [HET 92], [BDDG 93], among others). LEX also attracted us for similar reasons. We hope that the development of LEX presented here highlights the essential advantages of our synthesis-oriented approach for achieving correct software.

1. Universität Bremen, FB3 Informatik, Postfach 330 440, 28334 Bremen, Germany. E-mail: {kol, liu, bu}@informatik.uni-bremen.de

2. This work has been partially supported by the German Ministry of Research and Technology (BMFT) as part of the project “KORSO - Korrekte Software”

The transformational methodology presented here is based on [CIP 85], the ESPRIT research project PROSPECTRA [HK 93], and its descendant in KORSO [KLSW 94]. For the specific problem at hand and its treatment, it is instantiated to the SPECTRUM [Broy+ 93] specification language and its semantics. Briefly speaking, our development model is as follows: starting from a formal requirement specification written in SPECTRUM, a number of subsequent design specifications and implementations (also expressed in SPECTRUM) are developed through correctness-preserving transformations such that each specification is a correct specialisation or implementation of its ancestor(s). Because correctness-preserving transformations are compositional, verifying the correctness of the whole development is decomposed into reducing the applicability conditions of the applied transformation rules. The transformational development, in contrast to the "invent and verify" refinement approach, therefore seems more manageable, especially with respect to verification. Moreover, by formalising development processes, programming and optimization knowledge can be formally specified and used in the form of transformation strategies guiding the choice of transformation rules for each development step. We consider our development method an instantiation of the KORSO methodological framework presented in [Wirsing et al. 92].

I.1 The LEX Problem

The problem of lexical analysis has its roots in the world of formal languages. The corresponding regular grammars have been widely investigated and are applied in standard compiler technology ([ASU 86] gives a detailed introduction to this field). Regular grammars can be represented by regular expressions, recursively defined by:

• ε is a regular expression; L(ε) contains only the empty string;

• a character c is a regular expression; L(c) contains only the word c;

• given regular expressions r and s denoting the languages L(r) and L(s):

  • (r) | (s) is a regular expression denoting L(r) ∪ L(s),

  • r o s is a regular expression denoting L(r)L(s),

  • r* is a regular expression denoting L(r)*;

• there are no other regular expressions.

A string matches a regular expression if it belongs to its language. The LEX problem consists in splitting a given string into a list of successive prefixes, each of which matches one regular expression out of a given list of regular expressions. In case of ambiguity, LEX chooses the longest matching prefix and, among the regular expressions matching that prefix, the first one in the list. For historical reasons and for convenience, these two properties are summarized by the predicate "longest_prefix_match".
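The recursive definition of regular expressions and of matching translates almost directly into a naive membership test. The following Python sketch is illustrative only: the tuple encoding and the names EPS, char, alt, cat, star and matches are assumptions made here, not part of the specification, and the function tries every split of the word, so it is exponential in the worst case.

```python
# Regular expressions as nested tuples (an encoding chosen for this sketch).
EPS = ("eps",)

def char(c):
    return ("chr", c)

def alt(r, s):
    return ("alt", r, s)

def cat(r, s):
    return ("cat", r, s)

def star(r):
    return ("star", r)

def matches(r, w):
    """Return True iff the word w belongs to L(r)."""
    tag = r[0]
    if tag == "eps":
        return w == ""
    if tag == "chr":
        return w == r[1]
    if tag == "alt":
        return matches(r[1], w) or matches(r[2], w)
    if tag == "cat":
        # Try every way of splitting w into a left and a right part.
        return any(matches(r[1], w[:i]) and matches(r[2], w[i:])
                   for i in range(len(w) + 1))
    if tag == "star":
        # Peel off a non-empty first factor so that the recursion terminates.
        return w == "" or any(matches(r[1], w[:i]) and matches(r, w[i:])
                              for i in range(1, len(w) + 1))
    raise ValueError(f"unknown regular expression: {r!r}")

a_or_b = alt(char("a"), char("b"))
print(matches(star(char("a")), "aa"))      # True
print(matches(cat(a_or_b, a_or_b), "bb"))  # True
print(matches(cat(a_or_b, a_or_b), "b"))   # False
```

The efficient algorithm aimed at in this paper avoids exactly this repeated splitting of the input.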


I.1.1 Example

The string s1 = "aa" matches the regular expression r1 = *a, and s2 = "bb" matches the regular expression r2 = (a | b) o (a | b); both are longest prefix matches of the corresponding strings with respect to the given patterns. The result of the LEX problem for the input ("aabbc", [*a, (a | b) o (a | b)]) is therefore ([*a, (a | b) o (a | b)], ‘c’).

We started our development of LEX with the formal requirement specification [HET 92] given in appendix A. The LEX problem is specified here in a structured way: there are three related units of specifications: REGEXP, MATCH and SCAN. The unit REGEXP contains just the definition of regular expressions as informally presented above. The unit MATCH, based on the unit REGEXP, specifies the predicates match and longest-prefix-match. Finally, the function scan in the unit SCAN realises the lexical analysis. It takes a string and a list of regular expressions and returns a list of matched regular expressions and the rest of the string, which cannot be matched.
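The informal behaviour of scan on the example can be reproduced with a direct, deliberately naive Python sketch. It is not the implementation developed in this paper: Python's re module stands in for the predicate matches, the patterns "a*" and "[ab][ab]" stand in for *a and (a | b) o (a | b), and the names longest_prefix_match and scan are chosen for this illustration.

```python
import re

def longest_prefix_match(s, patterns):
    """Return (prefix, index) for the longest non-empty prefix of s matched by
    some pattern, preferring the earliest pattern on ties; None if none exists."""
    for n in range(len(s), 0, -1):          # longest prefix first
        for k, pat in enumerate(patterns):  # earliest pattern first
            if re.fullmatch(pat, s[:n]):
                return s[:n], k
    return None

def scan(s, patterns):
    """Strip longest prefix matches off s until no non-empty prefix matches."""
    matched = []
    while True:
        hit = longest_prefix_match(s, patterns)
        if hit is None:
            return matched, s
        prefix, k = hit
        matched.append(patterns[k])
        s = s[len(prefix):]                 # prefix is non-empty, so s shrinks

print(scan("aabbc", ["a*", "[ab][ab]"]))    # (['a*', '[ab][ab]'], 'c')
```

Restricting the search to non-empty prefixes is what makes the loop terminate; the outer loop around a repeated prefix search is the straightforward, non-linear solution.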

Unfortunately, the requirement specification given in [HET 92] has several drawbacks, which, according to our observations, preclude implementation freedom because of overspecification and, more seriously, may result in implementations that are obviously not desired. The undesired implementations that can be constructed from the requirement specification are caused by underspecification in the termination case. More analysis and explanation can be found in appendix B. For these reasons, the requirement specification was rewritten into a slightly more abstract, non-recursive specification of the function scan. In the course of development, the respective recursive version corresponding to [HET 92] is produced.

I.2 The Plan of the Development

Our ultimate goal of the development is an efficient algorithm with respect to the formal requirement specification given in the next sections. By an efficient algorithm, we mean an algorithm that

• works with one character look-ahead; the complexity of the core matching function should be linear;

• is "state-transition-based", where the set of states is required to depend only on the list of regular expressions.

Thus, we aim at an algorithm that works analogously to the well-known power state automaton. Instead of inventing the tabulation of the state transition function, and then constructing an efficient algorithm and proving its correctness, we prefer to derive these concepts of solutions by a transformational development. As a consequence, our ultimate goal is divided into four subgoals according to four stages in the development. While the reasons behind each subgoal will be explained in detail in the transformation strategies and the development plan in later chapters, we illustrate each subgoal informally. The subgoals can be roughly represented by the following figure:

DescriptiveScan stands for the abstract, descriptive and non-operational requirement specification of the lexical scanner, which is the first subgoal in the development. Since no specific method supports the translation from an informal specification to a formal requirement specification, this goal is achieved by the developer’s understanding of LEX and the SPECTRUM specification language. RecursiveScan, a recursive version of the DescriptiveScan specification, specifies the main function of the problem in a recursive way such that the problem is constructive and further development can focus on operational improvement. This subgoal is achieved by the transformations. ConstructiveScan, a particular constructive version of RecursiveScan, aims at making a particular development strategy applicable. Finally EfficientScan, an efficient implementation of ConstructiveScan, aiming at the aforementioned efficient solution, is achieved by the synthesis strategy "filter fusion". A closely related approach is "global search" as described in [SL 90].

According to the subgoals discussed above, the transformational development of LEX relies on the following four intermediate stages derived from the development plan:

• Making the specification recursive

• Developing a constructive version

• Optimizing through filter fusion

• Extracting the state function

After a restructuring of the requirement specification we start making the top level function, scan, recursive. This aim is achieved by the application of the transformation rule SPLITOFPOST. In the next step, we focus on making the specification constructive. The rules applied here are lifted theorems out of a set of lemmata included in the library of theories about maxima, sets and sequences. The whole problem is then reduced to a dynamic programming problem [BM 93] of the form:

F(x) = maxR(<) (P ➢ Gen(x))

where maxR(<) is a function which selects the optimal solution out of a set of possible solutions, P is a predicate, ➢ filters all solutions satisfying the given predicate, and Gen is a generator of all possible solution candidates.
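For the longest prefix match, the generate, filter and select scheme can be spelled out as follows. This is a hedged Python sketch, not the SPECTRUM development: gen, p, max_r and f are names chosen here for Gen, P, maxR(<) and F, Python's re module stands in for the predicate matches, and the ordering (longer prefix first, then earlier pattern) is the one described for LEX.

```python
import re

def gen(patterns, s):
    """Gen: every candidate (non-empty prefix of s, index of a pattern)."""
    return [(s[:n], k)
            for n in range(1, len(s) + 1)
            for k in range(len(patterns))]

def p(patterns, candidate):
    """P: the candidate's pattern matches the candidate's prefix."""
    prefix, k = candidate
    return re.fullmatch(patterns[k], prefix) is not None

def max_r(candidates):
    """maxR(<): longer prefixes are better; on ties the earlier pattern wins."""
    return max(candidates, key=lambda c: (len(c[0]), -c[1]), default=None)

def f(patterns, s):
    """F(x) = maxR(<) (P filter Gen(x)), written as generate, filter, select."""
    return max_r([c for c in gen(patterns, s) if p(patterns, c)])

patterns = ["a*", "[ab][ab]"]
print(f(patterns, "aabbc"))   # ('aa', 0)
print(f(patterns, "bbc"))     # ('bb', 1)
print(f(patterns, "c"))       # None
```

All candidates are built before any of them is tested, which is the inefficiency that the subsequent fusion steps are meant to remove.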

(Figure: DescriptiveScan ⇑ RecursiveScan ⇑ ConstructiveScan ⇑ EfficientScan)

When LEX has been broken down for enabling the use of dynamic programming strategies, we use optimization techniques to increase efficiency. Here we focus on the fusion of the predicate with the generator of all possible solutions. This method is known as "filter fusion". It is mainly concerned with answering the following questions: Instead of generating all solution candidates and then testing them against the predicate, is there any way of improving the generator such that it produces only acceptable candidates in one step? Is there any possibility of generating just admissible solutions, those where the predicate P is satisfied? In this special case, the generator, Gen, is reduced to the building of a matrix. This can be done in three ways, namely, in rows, columns or diagonals. In a later chapter we will concentrate on the fusion of the order relation maxR(<) with the rest of the function.
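To give an impression of what a fused generator can look like, the sketch below produces only the matching prefixes of a string in a single left-to-right pass. It uses Brzozowski derivatives, a standard technique substituted here for illustration; it is not the derivation carried out in this paper. The tuple encoding, the helper NUL (a regular expression for the empty language, which the specification does not have) and the function names are assumptions of the sketch.

```python
EPS, NUL = ("eps",), ("nul",)   # NUL (empty language) is a helper of this sketch

def nullable(r):
    """True iff the empty word belongs to L(r)."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("nul", "chr"):
        return False
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])          # "cat"

def deriv(r, c):
    """Regular expression for the words w such that c followed by w is in L(r)."""
    tag = r[0]
    if tag in ("eps", "nul"):
        return NUL
    if tag == "chr":
        return EPS if r[1] == c else NUL
    if tag == "alt":
        return ("alt", deriv(r[1], c), deriv(r[2], c))
    if tag == "star":
        return ("cat", deriv(r[1], c), r)
    head = ("cat", deriv(r[1], c), r[2])              # "cat"
    return ("alt", head, deriv(r[2], c)) if nullable(r[1]) else head

def matching_prefix_lengths(r, s):
    """Yield every n with s[:n] in L(r), reading s once with no backtracking."""
    if nullable(r):
        yield 0
    for n, c in enumerate(s, start=1):
        r = deriv(r, c)      # the current expression plays the role of a state
        if nullable(r):
            yield n

a_or_b = ("alt", ("chr", "a"), ("chr", "b"))
print(list(matching_prefix_lengths(("star", ("chr", "a")), "aab")))   # [0, 1, 2]
print(list(matching_prefix_lengths(("cat", a_or_b, a_or_b), "abb")))  # [2]
```

No candidate prefix is ever generated and then rejected; the test has been absorbed into the generator. The derivatives are not simplified here, so the intermediate expressions can grow, which is acceptable for a sketch only.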

Last but not least is the extraction of the states, which are typical of automata. The main point of this development is the derivation of the state transition function of the automaton. Here every step is derived from a previously given version of the problem by applying lemmata and theorems from the library.

A few words should also be said about the style of the development, especially with respect to correctness issues. Throughout this paper, we will assume the consistency of the basic data structures SEQ, SET, NAT and CHAR. Furthermore, we will assume the correctness of the transformations used; however, we will apply a very restricted set of basic transformations.

I.3 A Critique of the Requirement Specification

For the development of a lexical scanner, it is important to understand the formal specification of LEX, especially its essential parts like regular expressions and the function scan.

I.3.1 Some Problems with the Requirement Specification [HET 92]

The specification of LEX in [HET 92] has been used as a design specification and as a basis for further development. It turned out that this specification is erroneous. Below, we reproduce a piece of the specification and show its erroneous character through a simple example.

SCAN = { --- Signature of auxiliary functions
  scan: String × Seq Regexp → Seq Regexp × String
  axioms ∀ s, s´: String; rs, ts: Seq Regexp in
    scan(s, rs) = (ts, s´) ⇒
      s´ is_postfix_of s ∧
      (∀ r: Regexp. r ∈ ts ⇒ r ∈ rs) ∧
      ((ts = ε) ⇒
        s = s´ ∧
        ∀ s´´: String; r: Regexp. r ∈ rs ∧ s´´ ≠ ε ⇒
          ¬((s´´, r) is_prefix_match_of s)) ∧
      ((ts ≠ ε) ⇒
        ∃ s1, s2: String, s = s1^s2 ∧ scan(s2, rs) = rest(ts, s´) ∧
          (s1, first ts) is_longest_prefix_match_of (s, rs)
      );
  endaxioms}

For instance, scan(s, [ε]) = ([ε, ε, ε, ε, ε, ...], s) is a valid result according to the above specification, and it obviously causes non-termination during computation. We believe that the errors in this specification stem from the fact that scan is defined recursively, because its authors wanted a rather constructive design specification of LEX. As an alternative, we prefer a more abstract and non-recursive version, which allows a more careful study of the preconditions of the algorithm and opens the perspective to even more efficient algorithms¹. Its transformation into a recursive version can easily be done by a standard transformation rule. In the following, we present a loosely specified LEX that consists of three modules: REGEXP, MATCH and SCAN.

1. The best known algorithms for scan have linear complexity. The recursive specification suggests an outermost loop around a function stripping off the next longest prefix match. Unfortunately, this results in an n²-solution (also chosen by the Berlin group).

I.3.2 Improved Specification of LEX

Firstly, regular expressions are specified.

REGEXP = { enriches CHAR;
  sort Regexp;
  ε : Regexp; --- empty expression
  mkreg : char → Regexp; --- any character
  .o. : Regexp × Regexp → Regexp; --- concatenation
  .|. : Regexp × Regexp → Regexp; --- union
  *. : Regexp → Regexp; --- iteration
  Regexp freely generated by ε, mkreg, *, .o., .|.};

The important module MATCH is now specified below. It is just a translation of the informal problem description given as an introductory example into a formal text. The head of the module includes or imports sequences specified in appendix A and the data structure REGEXP specified above.

MATCH = { enriches NAT + SEQ + REGEXP;
  .matches. : Regexp × Seq Char → Bool;
  .isPrefixMatchOf. : Seq Char × Regexp × Seq Char → Bool;
  .isMatchOf. : (Seq Char × Regexp) × (Seq Char × Seq Regexp) → Bool;
  .isLongestPrefixMatchOf. : (Seq Char × Regexp) × (Seq Char × Seq Regexp) → Bool;
  axioms ∀ rs: Seq Regexp; c: Char; r₁, r₂, r: Regexp; s₁, s₂, s: Seq Char in
    ε matches s = (s = ε);
    mkreg(c) matches s = (%c = s);
    r₁ | r₂ matches s = r₁ matches s ∨ r₂ matches s;
    r₁ o r₂ matches s = ∃ s₁, s₂: Seq Char. s = s₁ ++ s₂ ∧ r₁ matches s₁ ∧ r₂ matches s₂;
    *(r) matches s = (s = ε) ∨ ∃ ss: Seq (Seq Char). s = flatten ss ∧
      ∀ st: Seq Char. st ∈ ss ⇒ r matches st;
    (s₁, r) isMatchOf (s₂, rs) ⇔ r ∈ rs ∧ s₁ isPrefixOf s₂ ∧ r matches s₁;
    (s₁, r₁) isLongestPrefixMatchOf (s, rs) ⇔ (s₁, r₁) isMatchOf (s, rs) ∧
      ∀ s₂: Seq Char; r₂: Regexp. (s₂, r₂) ≠ (s₁, r₁) ∧ (s₂, r₂) isMatchOf (s, rs) ⇒
        #s₂ < #s₁ ∨ (#s₂ = #s₁ ∧ pos(r₁, rs) < pos(r₂, rs));
    (s₁, r) isPrefixMatchOf (s₂) = r matches s₁ ∨ s₁ isPrefixOf s₂;
  endaxioms};
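The axioms for isMatchOf and isLongestPrefixMatchOf can be read as executable predicates once the universally quantified pairs are restricted to prefixes of the input, which is sound because only prefixes can satisfy isMatchOf. The Python sketch below is an illustration under assumptions: Python's re module stands in for matches, regular expressions are given as pattern strings, and pos(r, rs) is assumed to be the index of the first occurrence of r in rs.

```python
import re

def is_match_of(s1, r, s2, rs):
    """(s1, r) isMatchOf (s2, rs): r is in rs, s1 is a prefix of s2, r matches s1."""
    return r in rs and s2.startswith(s1) and re.fullmatch(r, s1) is not None

def is_longest_prefix_match_of(s1, r1, s, rs):
    """(s1, r1) isLongestPrefixMatchOf (s, rs), following the axiom literally."""
    if not is_match_of(s1, r1, s, rs):
        return False
    for n in range(len(s) + 1):          # only prefixes of s can be matches
        s2 = s[:n]
        for r2 in rs:
            if (s2, r2) != (s1, r1) and is_match_of(s2, r2, s, rs):
                shorter = len(s2) < len(s1)
                same_length_later = (len(s2) == len(s1)
                                     and rs.index(r1) < rs.index(r2))
                if not (shorter or same_length_later):
                    return False
    return True

rs = ["a*", "[ab][ab]"]
print(is_longest_prefix_match_of("aa", "a*", "aabbc", rs))        # True
print(is_longest_prefix_match_of("aa", "[ab][ab]", "aabbc", rs))  # False
print(is_longest_prefix_match_of("", "a*", "c", rs))              # True
```

The last call shows that an empty prefix can be a longest prefix match when a pattern accepts the empty string, so a scanner built on this predicate needs an additional condition excluding empty segments.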

The module SCAN now formally specifies LEX. Notice the difference between the loosely specified LEX without recursion and the original specification.

SCAN = { enriches MATCH;
  scan: Seq Regexp × Seq Char → Seq Regexp × Seq Char;
    scan(rs, s) = (rs₁, s₁) ⇔ ∃ strS: Seq (Seq Char).
      flatten strS ++ s₁ = s ∧ #rs₁ = #strS ∧
      strS ≠ ε ⇒ ∀ i: Nat, s₂: Seq Char. i < #rs₁ ∧ s₂ = flatten (drop(i, strS)) ++ s₁ ⇒
        (strS ! i, rs₁ ! i) isLongestPrefixMatchOf (s₂, rs) ∧ strS ! i ≠ ε ∧
      ∀ r₂: Regexp; s₂: Seq Char. (s₂, r₂) isMatchOf (s₁, rs) ⇒ s₂ = ε
  endaxioms}

Fig. 2: Initial Dependency Graph drawn with daVinci V1.3 (units shown: ETC, SET, MAX, CHAR, REGEXP, MATCH, SCAN, NAT, SEQ)

Fig. 2 shows the development graph as defined by the requirement specification. The units on the horizontal line represent the library of basic data types, and the ones on the vertical line the starting point of the development.

I.4 Overview of the Paper

In chapter 2, we establish the foundation of the development, especially with respect to correctness issues: throughout this paper, we will assume the consistency of basic data structures such as SEQ, SET, NAT and CHAR (see appendix A). This is important, since we deliberately concentrated all lemmata stating basic properties over these structures there such that they can be reused in further developments. Furthermore, we present our formal specification of transformations in SPECTRUM via parameterised specifications. All the transformations employed in the subsequent development are described as black boxes. A first attempt is made to understand the development process as a formal object by partly describing our development plan as a formal object.

In chapter 3, a recursive specification of the top level function, scan, is obtained. We apply a transformation rule called SPLITOFPOST [HK 93] that recursifies an operational specification unit.

In chapter 4, a more constructive specification is obtained. The goal of the development is to yield a function that is written in the style of dynamic programming problems following [BM 93]. After this important step we focus our attention on a subgoal called PREMATCH. Here a product of regular expressions and the initial parts of the string to be matched is built and then filtered by a predicate called matches. At this stage of the development, we do not yet take care of the longest match.

In chapter 5, we adopt an advanced design and analysis technique of dynamic programming typically applied to optimization problems. It consists of applying the development technique of filter fusion, reusing the transformational proof several times in order to get a systematic, non-post-hoc development of the critical preconditions of a fusion theorem adequate to our problem context.

In chapter 6, we instantiate the fusion theorem. We compute states out of the development process by using an advanced fusion strategy that leads to a state transition function. After the instantiation we eliminate the applicability condition.

In chapter 7, we develop the second fusion that computes the longest prefix match out of the solutions as described in chapter 6.

In chapter 8, we increase efficiency by tabulating the states. After a short overview about tabulation techniques, we compute an automaton for LEX.

We conclude in the last chapter with a brief evaluation of the whole development. The main results are summarized and future work is presented.

Appendix A contains the basic data types, while in appendix B, we present the original requirement specification. Appendix C contains development alternatives based on definitions of cartesian products. All these depend on how the product is built. Given the product of regular expressions and inits of a given string as a matrix, several strategies can be used: a vertical, a horizontal and a diagonal one. We use one of these methods to underline the efficiency of the development relying on compiler techniques.

I.7 How to Read this Paper

Every chapter consists of a main part containing the formal development and a summary where the main results are put together. Where it was possible, we have presented dependency graphs that summarize the transformational development at the end of each chapter. The chapters build on each other. On a first read, the reader may skip the formal proofs and concentrate on the summaries.

II Formal Preliminaries

II.1 Foundational Definitions of our Framework

Transformation in our sense is a form of deduction, applying transformation rules in a kind of term-rewriting process. A transformation rule is a pair of terms l and r, written as l ⇒ r. If a term t matches l, then a term t' can be constructed out of r from the match of t and l. In this case we say that the transformation application (l ⇒ r)(t) is reduced to t'. This reduction is also called a transformation step. Rules can even be higher-order: e.g. (l ⇒ r) ⇒ (c l ⇒ c r) is a possible transformation rule, if c is a constructor of a higher-order term language and l, r, t, t´ range only over terms (but not over transformation rules). The rule arrow binds to the right. We will sloppily use the word "transformation" for the concept, the step and the rule.

Contexts are term functions that can be represented by λ-abstractions: λξ. c (c ξ) is the context composed of the two nested constructors c around a "hole" ξ. Context sensitive transformation rules allow the definition of applicability conditions Φ_χ based on annotation functions (or: attributes). Here, (l ⇒_Φχ r)(t) is reduced to t' if matching and reconstruction are possible and the applicability condition Φ_χ is fulfilled. Annotation functions assign to each pair of a term t (w.r.t. t') and its context χ a value like a type or the set of bound variables. With applicability conditions it is possible to require a particular type for a sub-expression or that a certain variable does not occur free in it. We will not go into more detail here; the interested reader is referred to [WS 93].

Let C be the set of Hindley-Milner typed constants and V the set of variables. Let base-terms M be M := C | V | M M. Let E be a set of conditional equations over M and D_Σ be a set of values for all congruence classes over M modulo E and modulo α, β and η conversion (as a consequence of the typing, equality is decidable). The syntax T of transformation expressions is defined by: T := C | V | T ⇒_Φχ T | T T | T; T | [V] T, where ";" denotes the sequential composition and [x]T a form of scoping based on universal quantification. The typing of rules is analogous to the typing of expressions. We will usually omit the applicability condition Φ_χ of a transformation arrow. In order to describe the semantics of transformation rules, we first introduce the set ℰ as the least solution (w.r.t. set inclusion) of the following set equation:

ℰ = D_Σ ∪ (ℰ × ℰ)

The semantic function Sem has the type Sem: T → ℘(ℰ), where ℘(ℰ) is the powerset of ℰ. Sem is defined as follows:

Sem[c] = {[t]_Σ}
Sem[r ⇒_Φχ l] = {(t, t') | p ∈ Sem[rσ] ∧ p =_E t ∧ q ∈ Sem[lσ] ∧ q =_E t' ∧ Φ_χ(t, t')}
Sem[S T] = {t | (t', t) ∈ Sem[S] ∧ t ∈ Sem[T]}
Sem[S; T] = {(t, t'') | ∃t'. (t, t') ∈ Sem[S] ∧ (t', t'') ∈ Sem[T]}
Sem[[x] T] = {(t, t') | ∃t''. (t, t') ∈ Sem[T {x := t''}]}

where =_E is a term congruence modulo E and σ is a substitution replacing the free variables in r and l (also called matching variables) with other terms. The transformation (r ⇒ l)(t) reduces to t' iff (t, t') ∈ Sem[r ⇒ l]. We will write t (r ⇒ l) t' instead of "(r ⇒ l)(t) reduces to t'". A transformation r ⇒ l is applicable to t iff ∃t'. t (r ⇒ l) t'. The crucial focus-transformation (apply a rule R in a particular context) is defined by:

F =def χ ⇒ (a ⇒ b) ⇒ χ(a) ⇒ χ(b)

Here the variables a, b, χ are term variables, while in the axiom R(F(λξ.ξ))R the variable R is a transformation variable (for which we will only use R, S, T). F has the type F: (α → β) → (α → α) → (β → β). The focus transformation of context sensitive functions is defined analogously:

F =def χ ⇒ (a ⇒_Φχ' b) ⇒ (χ(a) ⇒_Φ(χ ο χ') χ(b))

where ο denotes function composition and χ ο χ' realizes the adjustment to the local context, over which context predicates are assumed to hold.
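The notions of a transformation step and of the focus transformation can be made concrete on first-order terms. The following Python sketch is an illustration under assumptions, not the framework itself: terms are nested tuples whose first component is the constructor name, matching variables are strings starting with "?", a context is given as a path of argument positions, and applicability conditions are left out.

```python
def match(pattern, term, subst):
    """Extend subst so that the instantiated pattern equals term; None on failure."""
    if isinstance(pattern, str) and pattern.startswith("?"):   # matching variable
        if pattern in subst:
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if (isinstance(pattern, tuple) and isinstance(term, tuple)
            and len(pattern) == len(term)):
        for p, t in zip(pattern, term):
            subst = match(p, t, subst)
            if subst is None:
                return None
        return subst
    return subst if pattern == term else None

def instantiate(term, subst):
    """Replace the matching variables in term by their bindings."""
    if isinstance(term, str) and term.startswith("?"):
        return subst[term]
    if isinstance(term, tuple):
        return tuple(instantiate(t, subst) for t in term)
    return term

def apply_rule(rule, term):
    """One transformation step (l => r)(t); None if the rule is not applicable."""
    lhs, rhs = rule
    subst = match(lhs, term, {})
    return None if subst is None else instantiate(rhs, subst)

def focus(path, rule, term):
    """Apply rule to the subterm at path, leaving the surrounding context as is."""
    if not path:
        return apply_rule(rule, term)
    i = path[0]
    inner = focus(path[1:], rule, term[i])
    return None if inner is None else term[:i] + (inner,) + term[i + 1:]

ZERO = ("zero",)
add_zero = (("add", "?x", ZERO), "?x")                     # add(x, zero) => x
print(apply_rule(add_zero, ("add", ("succ", ZERO), ZERO)))  # ('succ', ('zero',))
print(focus((1,), add_zero, ("succ", ("add", ZERO, ZERO)))) # ('succ', ('zero',))
```

With the empty path, focus coincides with plain rule application, which mirrors the axiom relating F applied to the identity context λξ.ξ with the rule R itself.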

II.1.1 SPECTRUM-Transformations and Their Correctness

We will not formally define a concrete instance of our transformational framework with SPECTRUM. Such an instance would comprise:

• a complete definition of the abstract syntax of SPECTRUM,

• the definition of a suitable matching combinator theory E,

• the definition of annotation functions assigning the type and the set of bound variables to pairs of contexts and expressions (i.e. a function that realises the complete static analysis of SPECTRUM),

• the definition of an annotation function assigning to pairs of contexts and expressions a function assigning to each type the set of its constructors (if known). This annotation is necessary for some transformations, for example CASEDISTINCT.

Since SPECTRUM is still under development and parts of the formal definition are not yet stabilised, this is clearly not possible. For this reason, we will only define an abstract instance, i.e. one described by transformation relations, and discuss its connection to the SPECTRUM-calculus.

II.1.2 A Notation for the Abstract Syntax of SPECTRUM

We will use the concrete syntax to denote SPECTRUM abstract syntax terms. A context term will be used to denote a higher order abstract syntax term, for example:

λξ.SCAN ={enriches MATCH + NEXT-MATCH;

scan: Seq Regexp × Seq Char → Seq Regexp × Seq Char;

axioms ∀ s:Seq Char. rs:Seq Regexp in

ξ

endaxioms}

Since SPECTRUM is the object language, "variables" like s in the sense of SPECTRUM are constants in the sense of our framework, while the variables in the common sense can be interpreted as "holes" in SPECTRUM terms.

II.1.3 An Interface to Context Correctness

The context correctness of a SPECTRUM-specification depends partly on the imported specifications. As a consequence, context correctness can only be decided in an import-closed collection of units. For this reason, we assume a sequence of SPECTRUM-specifications, called a system. Furthermore, we assume a function CC: system → bool that decides if a system is context correct. Global transformations are relations on context correct systems; local transformations T (i.e. transformations on expressions) are relations that require the existence of a system context that allows the embedding in a global transformation:

t (T) t' ⇒ ∃ χ. CC(χ(t)) ∧ CC(χ(t'))

II.1.4 The Semantic Foundation of "Simple" Transformations

The semantic foundation (w.r.t. the object language) of the local transformations that we use throughout this paper are logical inference rules from the SPECTRUM calculus. For example, the existential quantifier introduction of SPECTRUM:

Γ |− A[x:=t]
------------- (∃
Γ |− ∃x.A

yields the foundation for the local transformation (performing backwards style):

EXELIM =def t ⇒ (∃x.A ⇒ A{x:=t})

Many transformations are simple reformulations of logical rules; hence, their correctness is not difficult to prove.

II.1.5 The Semantic Foundation of "Complex" Transformations

Of course, the crown jewels of transformational program development are more complex transformations having the character of "development strategies", "design tactics" (cf. [SL 90]) or "design methods" (cf. [HK 93]). These transformations realise more abstract goals like "recursifying a function definition" in a program synthesis style. Correctness is a critical issue here. Fortunately, these transformations can be separated into a logical core (or synthesis theorem) and tactical sugar. It turns out that the logical core can be seen as a theorem to be proven in the logic of the object language. The form of synthesis theorems is inspired by the informal scheme of a transformation given in [HK 93], p. 100.

Fig. 3: Transformation scheme

INPUT: Pattern of program fragment
OUTPUT: Pattern of program fragment
Let: Parameters with static constraints
Where: Constraints on pattern variables
Such that: Verification Conditions

Translated into a scheme of a synthesis theorem, this can be reformulated as follows:

SYN = param X = { sort S1, ..., Sm;
                  F1 : Typ1, -- major matching variables
                  ...
                  Fm : Typm }
      body { enriches X <predefined operation symbols>
             axioms ∀ <minor matching variables> in
               ∃ <parameters of the transformation>. <verification condition>
               ⇒
               <input fragment> ≅ <output fragment>
             endaxioms}

Here ≅ denotes less definedness, implication, equality or equivalence.

An instance of this scheme with the development strategy "Split of Postcondition" is given in the next chapter. The logical correctness of a local transformation can be proven by verifying the consistency of the underlying synthesis theorem in the logic of the object language. The application of a transformation SYN to a structure A at position p can be understood as a tactical program that (we assume the usual backward-chaining proof-strategy):

• constructs a substitution σ via unification of <input fragment> and the subfragment of A at p,

• refines A via enrichment with SYN, instantiated with the substitution: ... enriches SYN[σ(S1)/S1, ..., σ(Fm)/Fm]; ...

• expands this enrichment via unfolding,

• applies "quantor introduction" for the parameters of the transformation (suitable substitutions are usually demanded from the user),

• applies "implication introduction" to describe the behaviour of the input/output fragment,

• starts a subproof for the <verification condition>.

As a result, an axiom has been produced in A stating that the subfragment at p is equal to (or equivalent to, less defined than, or implies, corresponding to the operator ≅) the instantiated output pattern of SYN. This tactical program is exactly what we meant by "tactical sugar" in the introduction to this section. In the light of the antinomy "logical core" vs. "tactical sugar", the transformational approach can be seen as a tactically highly supported program deduction style that allows the synthesis of the next subgoals of the program development together with the construction of the verification condition.

II.2 The Synthesis Theorem of “Split-of-Post-Condition”

Here we specialise the transformation rule given above ([HK 93], pages 99-127) in a SPECTRUM-like logic. We present a syntactic translation of the PROSPECTRA rule "SplitOf Post" ([HK 93], page 109). Since SPECTRUM does not have transformation rules as in PROSPECTRA, we are not reasoning according to a formal semantics of the rules w.r.t. SPECTRUM.

SOPC = param X = { sort R, S;
                   f : S → R;
                   Inv, B : S × R → bool }
       body { enriches X, NAT;
              for : nat → (S × R → R) → S × R → R;
              axioms ∀ i, h, x, y in
                for (0) (h) (x, y) = y;
                i > 0 ⇒ for (i) (h) (x, y) = for (pred i) (h) (x, h (x, y));
              endaxioms;
              axioms ∀ x, y, z in
                ∀ E : S → R, H : S × R → R.                 -- "parameter"
                (Inv(x, E(x)) ∧ Inv(x, y) ∧ ¬B(x, y) ⇒ Inv(x, H(x, y)) ∧
                 ∃ i. B(x, for (i) (H) (x, E x))            -- "appl. cond."
                ) ⇒
                (z = f(x) ⇒ Inv(x, z) ∧ B(x, z))            -- "input pattern"
                ⇔ f(x) = letrec g(x, y) = if B(x, y) then y else g(x, H(x, y)) endif
                         in g(x, E x)
                         endlet                             -- "output pattern"
              endaxioms; }

The part of the specification labelled with param denotes the input fragment as defined by the transformation rule SPLITOFPOST. It contains the matching variables that are provided by the input pattern. The axioms and the applicability conditions are translated into a SPECTRUM-like logic. Proving the applicability condition with concrete parameters ensures the correctness of the employed transformation. Therefore, if a specification is matched with the input pattern and the reduction of the applicability conditions with instantiated parameters is successful, we can get a recursive definition of the function from the original axiomatic specification.
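The output pattern of SOPC can be read operationally: start from E(x) and apply H until B holds. The Python sketch below illustrates this reading; the names for_ and split_of_post and the integer square root instance are hypothetical examples chosen here, and the while loop is an iterative rendering of the tail-recursive g, not the SPECTRUM text itself.

```python
def for_(i, h, x, y):
    """for(0)(h)(x, y) = y; for(i)(h)(x, y) = for(i - 1)(h)(x, h(x, y))."""
    for _ in range(i):
        y = h(x, y)
    return y

def split_of_post(e, h, b):
    """Build f from the parameters E and H and the termination test B."""
    def f(x):
        y = e(x)                 # g(x, E x)
        while not b(x, y):       # if B(x, y) then y else g(x, H(x, y))
            y = h(x, y)
        return y
    return f

# Hypothetical instance: integer square root, specified by Inv and B.
inv = lambda x, z: z * z <= x               # Inv(x, z)
b = lambda x, z: (z + 1) * (z + 1) > x      # B(x, z)
e = lambda x: 0                             # E establishes Inv(x, E(x))
h = lambda x, y: y + 1                      # H preserves Inv while B is false

isqrt = split_of_post(e, h, b)
print(isqrt(10), isqrt(16))                 # 3 4
print(b(10, for_(3, h, 10, 0)))             # True
```

The last line checks one witness for the applicability condition: some number i of iterations of H starting from E(x) reaches a value satisfying B, which is what guarantees that the loop terminates.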

II.3.1 Overview of the Transformations Used in Our Development

In this paper, we content ourselves with very elementary presentation and re-arrangement of the formal development based on the laws for the focus transformation in the previous section. We will not present all transformations used in this SPECTRUM-like framework. We will just mention them here. They are informally described below (cf. [HK 93]):

FOLD, UNFOLD: expr➥ expr;

These transformations take an expression and return another expression. Their goal is to produce recursive definitions or to prepare for further simplification. A variant of these transformation rules is:

FOLDASM, UNFOLDASM:expr➥(expr➥expr)➥ expr➥expr;

which transform the conclusion of an implication A ⇒ B by using an equation E, which must occur in A, at the context X, which must be a proper context of B.

ASSUMPTION(A), HASSUMPTION(A)(B): expr➥expr;

create, at an arbitrary place of a specification, implications of the form A ⇒ A and A ⇒ (B ⇒ B), respectively.

takes the name of a specification unit in the system and replaces this unit enrichment by the content of the named specification.

EXELIM(t): expr ➥ expr;

eliminates an existential quantifier in an expression.

ENRICHSIG(name): unit➥ unit;

takes the name of a specification unit and returns a new unit where the axioms of the enriched unit are implicitly contained in the new unit produced.

CASEDISTINCTION: name➥ expr➥ expr;

The goal of this transformation is to introduce conditions which lead to a case distinction in the original definition. This may cause simplification of right-hand sides of equations, giving a chance for achieving a recursive definition. name is a variable whose sort must be freely generated and that occurs free in expr.

SPLITOFPOST: expr ➥ expr ➥ expr ➥ expr;

This transformation was discussed in the previous section. Its first two arguments stand for the parameters E and H, the third for the boolean expression to be transformed.

There are many other transformations that we cannot all mention here. The purpose of some of them is suggested by their names, such as LET, SWAP and MOD_PONENS (fold with an equation from the assumption of an implication).

III Towards a Recursive Version

III.1 Preparing the Application of the SPLITOFPOST- Transformation

In order to produce a recursive version of scan out of a non-recursive one, we apply the transformation SPLITOFPOST (represented in the form of the SPECTRUM specification SOPC), which has been given in the previous section. We concentrate for the moment on the "input pattern", which will be matched with a subterm of the unit SCAN in order to be transformed. Of course, this subterm will be a logical expression, that is:

scan(s, rs) = (rs₁, s₁)⇔ . . . ∧ . . .

After performing a preparatory transformation that applies symmetry to the equality (such that z in the pattern z = f(x) ⇒ Inv(x, z) ∧ B(x, z) may match (rs₁, s₁) in the above specification), the matching will construct an assignment for Inv and B automatically. The skill of the implementor consists in providing the right parameters for E and H as defined by the following applicability condition:

(Inv(x, E(x)) ∧ Inv(x, y) ∧ ¬B(x, y) ⇒ Inv(x, H(x, y)) ∧ ∃ i. B(x, for (i) (H) (x, E x))).

H can be seen as the body of the recursion and E as the starting value. The resulting specification contains an additional implication. Further logical transformations will allow the elimination of the precondition of this implication (the instantiated applicability condition), such that we finally deduce the unconditional right-hand side of the equivalence (the instantiated output pattern), containing the recursive definition of scan. Informally, the body of the recursion H must contain a function that produces the next longest prefix match and inserts it into the global result of scan. In order not to blow up our target specification too much, we introduce a new unit containing an auxiliary function, which will enable us, by an enrichment, to build the recursive version of the function. This new unit provides next_match, a function that will be used in the instantiation of the parameters of SPLITOFPOST.

NEXT_MATCH= {enriches MATCH;

next_match: Seq Regexp×Seq Regexp ×Seq Char→ Seq Regexp×Seq Char;

axioms∀s, s₁:Seq Char; rs, rs₁:Seq Regexp, ∀s₂:Seq; rs₂:Seq Regexp in

δ(next_match (rs, rs₁, s₁))

next_match (rs, rs₁, s₁) = (rs₂, s₂)⇔

∃ s3:Seq Char. s1 = s3 ++ s2∧s3≠ ε ∧ ∃r: Regexp. rs2 = rs1 ++ %r∧

(s₃,r) isLongestPrefixMatchOf (s₁, rs);

next_match (rs, rs1, s1) = (rs1, s1) ⇔ ∀s2:Seq; rs2:Seq Regexp

(s₂,r₂) isMatchOf (s₁, rs)⇒s₂ = ε);

endaxioms}

where next_match computes the next longest match of s₁ and rs₁ with respect to the input regular expression list rs. Now we can formally apply the transformation SPLITOFPOST to the aforementioned version. The plan of this sub-development is sketched further below.
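The two axioms of next_match can be read operationally as in the following Python sketch. It is a hedged illustration only: regular expressions are represented by Python `re` pattern strings rather than the sort Regexp, the helper `longest_prefix_match` is a hypothetical name, and the tie-breaking rule (the earliest expression in rs wins among equally long matches) is an assumption that the specification does not state.

```python
import re


def longest_prefix_match(s1, rs):
    """Return (s3, r) where s3 is the longest non-empty prefix of s1 that is
    matched by some r in rs, or None if no such prefix exists."""
    best = None
    for r in rs:
        for n in range(len(s1), 0, -1):       # try long prefixes first
            if re.fullmatch(r, s1[:n]):
                if best is None or n > len(best[0]):
                    best = (s1[:n], r)         # strictly longer: replace
                break
    return best


def next_match(rs, rs1, s1):
    """First axiom: consume the longest prefix s3 and append its regexp r.
    Second axiom: if only empty matches exist, return the input unchanged."""
    m = longest_prefix_match(s1, rs)
    if m is None:
        return rs1, s1
    s3, r = m
    return rs1 + [r], s1[len(s3):]
```

For example, with rs = ["[a-z]+", "[0-9]+", " "], one step on "abc 12" recognises "abc" and leaves " 12" as the remaining input.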

III.1.1 The Instantiation of SPLITOFPOST

As mentioned, the transformation SPLITOFPOST must be "applied" to an expression that is matchable to the input pattern. For this purpose, we have to combine it with the focus transformation F and a proper context (let us denote this context as Con₁):

F(Con₁)(SPLITOFPOST(E)(H)) where E and H still have to be defined.

The application of the transformation internally constructs the following substitutions:

S |→ Seq Char × Seq Regexp;

R |→ Seq Regexp × Seq Char;

f |→ scan;

Inv |→ λx:S, z:R. ∃ strS:Seq(Seq Char).

flatten strS ++ π₂(z) = π₁(x) ∧ #π₁(z) = #strS ∧ strS ≠ ε ⇒ ∀ i:Nat, s₂:Seq Char. i < #π₁(z) ∧

s₂ = flatten (drop(i, strS)) ++ π₂(z) ⇒

(strS ! i, π₁(z) ! i) isLongestPrefixMatchOf (s₂, π₂(x)) ∧ (strS ! i ≠ ε)

B |→ λx:S, z:R. ∀ r₂:Regexp; s₂:Seq Char. (s₂, r₂) isMatchOf (π₂(z), π₂(x)) ⇒ s₂ = ε;

z |→ (rs₁, s₁)

x |→ (s, rs)

together with the dependent substitution constraints, where π_i is the i-th projection:

s = π₁(x)

rs = π₂(x)

rs₁ = π₁(z)

s₁ = π₂(z)

As parameters of the transformation, we choose:

E |→ λx:S. (ε, s);

H |→ λx:S, y:R. next_match(π₂(x), y);

E is the starting value and H the body of the recursion.

III.1.2 Intermediate Results

The plan of this sub-development is: Enriching by NEXT-MATCH ⇓ Preparing SPLITOFPOST ⇓ Instantiating SPLITOFPOST ⇓ Eliminating applicability conditions.

To keep the transformed unit from growing too large and cluttered with detail, we maintain the substitution variables Inv, S, R, B and H as textual abbreviations in the following:

SCAN1 = {enriches NAT+ MATCH+NEXT-MATCH;

scan : Seq Regexp × Seq Char → Seq Regexp × Seq Char;

for : nat → (S × R → R) → S × R → R;

axioms ∀ i, h, x, y in

for(0)(h)(x, y) = y;

i > 0 ⇒ for(i)(h)(x, y) = for(pred i)(h)(x, h(x, y));

endaxioms;

axioms ∀ s, s₁:Seq Char; r:Regexp; rs₁,rs₂:Seq Regexp in

(rs1, s1) = scan(s, rs) ⇔ Inv((s, rs),(rs1, s1)) ∧ B((s, rs),(rs1, s1)) ∧

(∀ x,y. Inv(x, (ε,s))∧Inv(x,y)∧ ¬ B(x,y)⇒ Inv(x,H(x, y))∧

∃ i. B(x, for(i)(H)(x, (ε,s)) )

⇒

((rs₁, s₁) = scan(rs₁, s₁)⇔ Inv((rs₁, s₁), (rs₁, s₁))∧ B((rs₁, s₁), (rs₁, s₁)))

⇔

scan(rs₁, s₁) = letrec g(x,y) = if B(x,y) then y else g(x, H(x, y)) endif in g((rs₁, s₁), (ε,s))

endlet endaxioms }

III.2 The Elimination (Proof) of ScanSplit, the Applicability Condition

We focus on the applicability condition ScanSplit, for the correctness of the application of the transformation depends on the successful reduction of ScanSplit, that is, on whether it can be proved in the given context. Since, in first-order logic, the universal quantifier distributes over conjunction and the domains of the variables are non-empty, the following simplifications can be made on ScanSplit:

∀ x, y. Inv(x, (ε, π₁(x))) ∧ Inv(x, y) ∧

¬B(x, y) ⇒ Inv(x, H(x, y)) ∧

∃ i. B(x, for(i)(H)(x, (ε, π₁(x))))

⇔ ∀ x, y. Inv(x, (ε, π₁(x))) ∧ ---- A

∀ x, y. Inv(x, y) ∧ ¬B(x, y) ⇒ Inv(x, H(x, y)) ∧ ---- B

∀ x, y. ∃ i. B(x, for(i)(H)(x, (ε, π₁(x)))) ---- C

ScanSplit can be divided into three parts A, B and C; this enables a modular construction of the proofs.

STEP 1: The proof of the first part of ScanSplit, A.

∀ x, y. Inv(x, (ε, π₁(x)))

⇔ ∀ x. Inv(x, (ε, π₁(x)))

⇔ ∀ s, rs. Inv((s, rs), (ε, s))

⇔ ∀ s, rs. ∃ strS:Seq(Seq Char).

flatten strS ++ s = s ∧ #ε = #strS ∧ strS ≠ ε ⇒ ∀ i:Nat, s₂:Seq Char. i < #ε ∧

s₂ = flatten (drop(i, strS)) ++ ε ⇒

⇔ ∀ s, rs. ε ++ s = s ∧ #ε = #ε ∧True

⇔ ∀ s, rs. s = s∧ 0 = 0

⇔ True

STEP 2: The proof of the second part of ScanSplit, B.

∀ x, y. Inv(x, y) ∧ ¬B(x, y) ⇒ Inv(x, H(x, y))

⇔ ∀ s, rs, rs1, s1. Inv(s, rs, rs1, s1) ∧ ¬B(s, rs, rs1, s1) ⇒ Inv(s, rs, H(s, rs, rs1, s1))

⇔ ∀ s, rs, rs1, s1. (∃ strS:Seq(Seq Char). flatten strS ++ s1 = s ∧ #rs1 = #strS ∧ strS ≠ ε ⇒ ∀ i:Nat, s₂:Seq Char. i < #rs1 ∧

s₂ = flatten (drop(i, strS)) ++ s1 ⇒

(strS ! i, rs1 ! i) isLongestPrefixMatchOf(s₂, rs)) ∧ strS ! i ≠ ε

∧ ¬(∀ r₃:Regexp; s₃:Seq Char. (s₃, r₃) isMatchOf (s1, rs) ⇒ s₃ = ε)

⇒ Inv(s, rs, H(s, rs, rs1, s1))

We establish the following prerequisite of the proof, namely

∀ s, rs, rs1, s1. (∃ strS:Seq(Seq Char).

flatten strS ++ s1 = s ∧ #rs1 = #strS ∧ strS ≠ ε ⇒ ∀ i:Nat, s₂:Seq Char. i < #rs1 ∧

s₂ = flatten (drop(i, strS)) ++ s1 ⇒

(strS ! i, rs1 ! i) isLongestPrefixMatchOf(s₂, rs)) ∧ strS ! i ≠ ε

∧ ¬(∀ r₃:Regexp; s₃:Seq Char. (s₃, r₃) isMatchOf (s1, rs) ⇒ s₃ = ε)

and setting

C1 = flatten strS ++ s1 = s ∧ #rs1 = #strS

C2 = strS ≠ ε ⇒ ∀ i:Nat, s₂:Seq Char. i < #rs1 ∧ s₂ = flatten (drop(i, strS)) ++ s1 ⇒ (strS ! i, rs1 ! i) isLongestPrefixMatchOf(s₂, rs) ∧ strS ! i ≠ ε

C3 = ¬(∀ r₃:Regexp; s₃:Seq Char. (s₃, r₃) isMatchOf (s1, rs) ⇒ s₃ = ε)

⇒ Inv(s, rs, next_match (rs, rs1, s1))

By definition of next_match, we can split the proof into two cases,

Case 1: (next_match (rs, rs₁, s₁) = (rs₂, s₂)) ⇒ Inv(s, rs, rs2, s2)

⇒ ∃ strS1. flatten strS1 ++ s2 = s ∧ #rs2 = #strS1 ∧ strS1 ≠ ε ⇒ ∀ i:Nat, s₃:Seq Char. i < #rs2 ∧

s₃ = flatten (drop(i, strS1)) ++ s2 ⇒

(strS1 ! i, rs2 ! i) isLongestPrefixMatchOf(s₃, rs) ∧ strS1 ! i ≠ ε

For the following substitutions and derived properties

flatten strS ++ s1 = s (condition C1)

flatten strS1 ++ s2 = s (proof goal)

#rs1 = #strS

#strS + 1 = #strS1

#rs2 = #rs1 + 1

⇒ ∃ strS1. flatten strS ++ s2 = s ∧ #rs1 = #strS ∧

strS1 ≠ ε ⇒ ∀ i:Nat, s₃:Seq Char. i < #rs2 ∧

(strS1 ! i, rs2 ! i) isLongestPrefixMatchOf(s₃, rs) ∧ strS1 ! i ≠ ε

⇒ ∃ strS1. (since #strS1 = #strS + 1 this means strS1 ≠ ε)

True ⇒ ∀ i:Nat, s₃:Seq Char. i < #rs2 ∧

(strS1 ! i, rs2 ! i) isLongestPrefixMatchOf(s₃, rs) ∧ strS1 ! i ≠ ε

C2 = strS ≠ ε ⇒ ∀ i:Nat, s₂:Seq Char. i < #rs1 ∧ s₂ = flatten (drop(i, strS)) ++ s1 ⇒

(strS ! i, rs1 ! i) isLongestPrefixMatchOf(s₂, rs) ∧ strS ! i ≠ ε

We proceed by induction over the length of strS1.

case i = 0

⇒ ∃ strS1. (since #strS1 = #strS + 1 this means strS1 ≠ ε)

True ⇒ (True ∧ s₃ = s2 ⇒ (first(strS1), first(rs2)) isLongestPrefixMatchOf(s₂, rs) ∧ first(strS1) ≠ ε)

Because

first(strS1) = first(strS)

first(rs2) = first(rs1)

⇒ (since #strS1 = #strS + 1 this means strS1 ≠ ε)

True ⇒ (True ∧ s₃ = s₂ ⇒ (first(strS), first(rs1)) isLongestPrefixMatchOf(s₂, rs) ∧ first(strS) ≠ ε)

Compared to the presupposed conditions, it yields

⇒ True

case i = #strS = #rs1

⇒ ∃ strS1. (since #strS1 = #strS + 1 this means strS1 ≠ ε)

True ⇒ (True ∧ s₃ = flatten (drop(#strS, strS1)) ++ s2 ⇒ (strS1 ! #strS, rs2 ! #rs1) isLongestPrefixMatchOf(s₃, rs) ∧ strS1 ! #strS ≠ ε)

if (drop(#strS, strS1)) = s4

strS1 ! #strS = s4

rs2 ! #rs1 = r4

⇒ (since #strS1 = #strS + 1 this means strS1 ≠ ε)

True ⇒ (True ∧ s₃ = s4 ++ s2 ⇒ (s4, r4) isLongestPrefixMatchOf(s₃, rs) ∧ s4 ≠ ε)

And these are exactly the conditions mentioned above.

Case 2: (next_match (rs, rs1, s1) = (rs1, s1))

⇒ Inv(s, rs, rs1, s1)

By fold

⇔ ∀ s, rs, rs1, s1. Inv(s, rs, rs1, s1) ∧ ¬(∀ r₂:Regexp; s₂:Seq Char. (s₂, r₂) isMatchOf (s1, rs) ⇒ s₂ = ε)

⇒ Inv(s, rs, rs1, s1)

By case distinction

(∀ r₂:Regexp; s₄:Seq Char. (s₄, r₂) isMatchOf (s1, rs) ⇒ s₄ = ε) = True

⇔ ∀ s, rs, rs1, s1. Inv(s, rs, rs1, s1) ∧ ¬True ⇒ Inv(s, rs, rs1, s1)

⇔ ∀ s, rs, rs1, s1. False ⇒ Inv(s, rs, rs1, s1)

⇔ ∀ s, rs, rs1, s1. True

(∀ r₂:Regexp; s₄:Seq Char. (s₄, r₂) isMatchOf (s1, rs) ⇒ s₄ = ε) = False

⇔ ∀ s, rs, rs1, s1. Inv(s, rs, rs1, s1) ∧ ¬False ⇒ Inv(s, rs, rs1, s1)

⇔ ∀ s, rs, rs1, s1. Inv(s, rs, rs1, s1) ⇒ Inv(s, rs, rs1, s1)

⇔ ∀ s, rs, rs1, s1. True

We now turn our attention to the termination condition in the applicability condition.

STEP 3: The proof of the third part of ScanSplit, C.

∀ x, y. ∃ i. B(x, for(i)(H)(x, (ε,π1(x))))

⇔ ∀ x, ∃ i. B(x, for(i)(H)(x, (ε,π₁(x))))

⇔ ∀ s, rs,∃ i. B((s, rs), for(i)(H)((s, rs), (ε, s)))

We then carry out the proof by induction on the inverse construction of the character sequence s, that is, the empty list and a list constructed by adding an element to its end. The idea behind this induction is to simulate the computation of the longest matchable sequence with respect to an arbitrary sequence. Suppose we have computed the longest matchable sequence of a given sequence; when an element is added to the end of the sequence, either the whole sequence becomes a longest matchable sequence, or the longest matchable sequence is that of the original sequence. We will guide our inductive proof in the following by this informal description.

Induction Basis:(s =ε) and we let i = 0

∀ rs. B((ε, rs), for(0)(H)((ε, rs), (ε,ε)))

unfold ⇔ ∀ rs. B((ε, rs), next_match(rs, ε,ε)) (1)

By the definition of next_match, we have the following derivations.

∀ s. ε = s ++ ε ⇒ s = ε

⇔ ∀ s. (¬ ε = s ++ ε) ∨ s = ε

⇔ ¬ ∃ s. ¬((¬ ε = s ++ ε) ∨ s = ε)

⇔ ¬ ∃ s. ((ε = s ++ ε) ∧ s ≠ ε)

⇔ ¬(∃ s₃:Seq Char. ε = s₃ ++ ε ∧ s₃ ≠ ε ∧ ∃ r:Regexp. ε = ε ++ %r

∧ (s₃, r) isLongestPrefixMatchOf (ε, rs));

We continue the proof of (1) by substituting next_match and unfolding B as follows.

(1)

⇔ ∀ rs. B((ε, rs), (ε,ε))

unfold ⇔ ∀ rs: Seq Regexp;∀ r₂:Regexp; s₂:Seq Char. (s₂, r₂) isMatchOf (ε, rs)⇒ s₂ = ε

⇔ (s₂ isPrefixOf ε) ⇒ s₂ = ε

⇔ True

The above proof establishes the induction basis. Recalling that the list is considered in inverse construction, for the induction step we have

Induction Step: (s = s’ ++ %a) and we assume that there is a natural number i: Nat such that

∀ rs. B((s’, rs), for(i)(H)((s’, rs), (ε, s’))) (2)

With this induction hypothesis we want to prove that there is a j: Nat such that

∀ rs. B((s’, rs), for(i)(H)((s’, rs), (ε, s’)))

⇔ ∀ rs. B((s’ ++ %a, rs), for(j)(H)((s’ ++ %a, rs), (ε, s’ ++ %a))) (3)

We claim that the wanted natural number is j = i + 1, that is,

∀ rs. B((s’ ++ %a, rs), for(i+1)(H)((s’ ++ %a, rs), (ε, s’ ++ %a))) (4)

is provable. By definition of the function for, we can assume that there is a pair

(rs₁: Seq Regexp; s₁: Seq Char) = for(i+1)(H)((s’ ++ %a, rs), (ε, s’ ++ %a))

and by substituting the function call of for and unfolding B we obtain

(4)

unfold ⇔ ∀ rs: Seq Regexp; ∀ r₂:Regexp; s₂:Seq Char. (s₂, r₂) isMatchOf (s₁, rs) ⇒ s₂ = ε (5)

In order to prove that (5) is true, we have to analyse the properties of the results computed by the function call for(i+1)(H)((s’ ++ %a, rs), (ε, s’ ++ %a)). After the analysis, we can perform the proof by case distinction, for we will show that the results of the function call for can be expressed by two outputs. First, let us make the following notational assumptions.

(rsᵏ, sᵏ) = for(k)(H)((s, rs), (ε, s))

(rsᵏ₁, sᵏ₁) = for(k)(H)((s ++ %a, rs), (ε, s ++ %a))

We first establish the following lemmas.

∀ rs, s₁, s₂. (s₂ isPrefixOf s₁) ∧ (∀ r₃, s₃. ¬ (s₃, r₃) isLongestPrefixMatchOf (s₁, rs)) ⇒ ∀ r₃, s₃. ¬ (s₃, r₃) isLongestPrefixMatchOf (s₂, rs)

∀ rs, s₁, s₂. (s₂ isPrefixOf s₁) ∧ (∀ r₃, s₃. (s₃ isPrefixOf s₂)

∧ ((s₃, r₃) isLongestPrefixMatchOf (s₁, rs)) ⇒ (s₃, r₃) isLongestPrefixMatchOf (s₂, rs))

The next lemma we want to prove is

(sᵏ₁ = ε) ∨ ((sᵏ₁ = sᵏ ++ %a) ∧ rsᵏ₁ = rsᵏ)

by induction on the natural number k.

Induction Basis: (k = 0)

(rs⁰, s⁰) = for(0)(H)((s, rs), (ε, s))

⇒ (rs⁰, s⁰) = next_match(rs, ε, s)

(rs⁰₁, s⁰₁) = for(0)(H)((s ++ %a, rs), (ε, s ++ %a))

⇒ (rs⁰₁, s⁰₁) = next_match(rs, ε, s ++ %a)

Case I: ∀ r₂, s₂. (s₂, r₂) isLongestPrefixMatchOf (s ++ %a, rs) ⇒ s₂ = ε (I.1)

⇒ next_match(rs, ε, s ++ %a) = (ε, s ++ %a)

⇒ (rs⁰₁, s⁰₁) = (ε, s ++ %a) (I.2)

⇒ ∀ r₂, s₂. (s₂, r₂) isLongestPrefixMatchOf (s, rs) ⇒ s₂ = ε

⇒ next_match(rs, ε, s) = (ε, s)

⇒ (rs⁰, s⁰) = (ε, s)

⇒ s⁰₁ = s⁰ ++ %a

⇒ s⁰₁ = ε ∨ ((s⁰₁ = s⁰ ++ %a) ∧ rs⁰₁ = rs⁰)

Case II: ∃ s₃:Seq Char. (s ++ %a) = s₃ ++ s⁰₁ ∧ s₃ ≠ ε ∧ ∃ r:Regexp. rs⁰₁ = ε ++ %r ∧ (s₃, r) isLongestPrefixMatchOf (s ++ %a, rs);

Case II.1: (s ++ %a) = s₃

⇒ s⁰₁ = ε

⇒ s⁰₁ = ε ∨ ((s⁰₁ = s⁰ ++ %a) ∧ rs⁰₁ = rs⁰)

Case II.2: s₃ isPrefixOf s

⇒ ∃ s₄:Seq Char. (s ++ %a) = s₃ ++ s₄ ++ %a

⇒ s = s₃ ++ s₄ ∧ s⁰₁ = s₄ ++ %a

⇒ (s₃, r) isLongestPrefixMatchOf (s, rs)

⇒ ∃ s₄:Seq Char. s = s₃ ++ s₄ ∧ s₃ ≠ ε ∧ ∃ r:Regexp. rs⁰₁ = ε ++ %r ∧ (s₃, r) isLongestPrefixMatchOf (s, rs);

⇒ (rs⁰₁, s₄) = next_match(rs, ε, s)

⇒ (rs⁰, s⁰) = (rs⁰₁, s₄)

⇒ (rs⁰, s⁰ ++ %a) = (rs⁰₁, s₄ ++ %a)

⇒ (rs⁰, s⁰ ++ %a) = (rs⁰₁, s⁰₁)

⇒ s⁰₁ = ε ∨ ((s⁰₁ = s⁰ ++ %a) ∧ rs⁰₁ = rs⁰)

Induction Step: we assume that

(sᵏ₁ = ε) ∨ ((sᵏ₁ = sᵏ ++ %a) ∧ rsᵏ₁ = rsᵏ)

and want to prove that

(sᵏ⁺¹₁ = ε) ∨ ((sᵏ⁺¹₁ = sᵏ⁺¹ ++ %a) ∧ rsᵏ⁺¹₁ = rsᵏ⁺¹)

(rsᵏ⁺¹, sᵏ⁺¹) = for(k+1)(H)((s, rs), (ε, s))

⇒ (rsᵏ⁺¹, sᵏ⁺¹) = next_match(rs, rsᵏ, sᵏ)

(rsᵏ⁺¹₁, sᵏ⁺¹₁) = for(k+1)(H)((s ++ %a, rs), (ε, s ++ %a))

⇒ (rsᵏ⁺¹₁, sᵏ⁺¹₁) = next_match(rs, rsᵏ₁, sᵏ₁)

Case I: ∀ r₂, s₂. (s₂, r₂) isLongestPrefixMatchOf (sᵏ₁, rs) ⇒ s₂ = ε

⇒ (rsᵏ₁, sᵏ₁) = next_match(rs, rsᵏ₁, sᵏ₁)

⇒ (rsᵏ₁, sᵏ₁) = (rsᵏ⁺¹₁, sᵏ⁺¹₁)

⇒ (sᵏ₁ = ε) ∨ ((sᵏ₁ = sᵏ ++ %a) ∧ rsᵏ₁ = rsᵏ)

⇒ sᵏ⁺¹₁ = ε ∨ ((sᵏ⁺¹₁ = sᵏ ++ %a) ∧ rsᵏ⁺¹₁ = rsᵏ)

⇒ ∀ r₂, s₂. (s₂, r₂) isLongestPrefixMatchOf (sᵏ, rs) ⇒ s₂ = ε

⇒ (sᵏ⁺¹ = sᵏ) ∧ (rsᵏ⁺¹ = rsᵏ)

⇒ (sᵏ⁺¹₁ = ε) ∨ ((sᵏ⁺¹₁ = sᵏ⁺¹ ++ %a) ∧ rsᵏ⁺¹₁ = rsᵏ⁺¹)

Case II: ∃ s₃:Seq Char. sᵏ₁ = s₃ ++ sᵏ⁺¹₁ ∧ s₃ ≠ ε ∧ ∃ r:Regexp. rsᵏ⁺¹₁ = rsᵏ₁ ++ %r ∧ (s₃, r) isLongestPrefixMatchOf (sᵏ₁, rs);

by induction hypothesis (sᵏ₁ = ε) ∨ ((sᵏ₁ = sᵏ ++ %a) ∧ rsᵏ₁ = rsᵏ)

Case II.1: sᵏ₁ = ε

⇒ ε = s₃ ++ sᵏ⁺¹₁ ∧ s₃ ≠ ε

⇒ contradiction

Case II.2: (sᵏ₁ = sᵏ ++ %a) ∧ rsᵏ₁ = rsᵏ

⇒ sᵏ ++ %a = s₃ ++ sᵏ⁺¹₁

Case II.2.1: sᵏ ++ %a = s₃

⇒ sᵏ ++ %a = sᵏ ++ %a ++ sᵏ⁺¹₁

⇒ sᵏ⁺¹₁ = ε

⇒ (sᵏ⁺¹₁ = ε) ∨ ((sᵏ⁺¹₁ = sᵏ⁺¹ ++ %a) ∧ rsᵏ⁺¹₁ = rsᵏ⁺¹)

Case II.2.2: s₃ isPrefixOf sᵏ

⇒ ∃ s₄. sᵏ⁺¹₁ = s₄ ++ %a ∧ sᵏ ++ %a = s₃ ++ s₄ ++ %a ∧ s₃ ≠ ε ∧ ∃ r. rsᵏ⁺¹₁ = rsᵏ₁ ++ %r ∧ (s₃, r) isLongestPrefixMatchOf (sᵏ ++ %a, rs);

⇒ sᵏ = s₃ ++ s₄ ∧ s₃ ≠ ε ∧ ∃ r. rsᵏ⁺¹₁ = rsᵏ ++ %r ∧ (s₃, r) isLongestPrefixMatchOf (sᵏ, rs);

⇒ (rsᵏ⁺¹₁, s₄) = next_match(rs, rsᵏ, sᵏ)

⇒ (rsᵏ⁺¹₁, s₄) = (rsᵏ⁺¹, sᵏ⁺¹)

⇒ (rsᵏ⁺¹₁, s₄ ++ %a) = (rsᵏ⁺¹, sᵏ⁺¹ ++ %a)

⇒ (rsᵏ⁺¹₁, sᵏ⁺¹₁) = (rsᵏ⁺¹, sᵏ⁺¹ ++ %a)

⇒ (sᵏ⁺¹₁ = ε) ∨ ((sᵏ⁺¹₁ = sᵏ⁺¹ ++ %a) ∧ rsᵏ⁺¹₁ = rsᵏ⁺¹)

Other lemmas we need are

∀ k ∈ NAT. (sᵏ⁺¹ = sᵏ ⇒ rsᵏ⁺¹ = rsᵏ)

∀ k ∈ NAT. (sᵏ⁺¹₁ = sᵏ₁ ⇒ rsᵏ⁺¹₁ = rsᵏ₁)

∀ k ∈ NAT. (sᵏ = ε ⇒ sᵏ⁺¹ = ε)

∀ k ∈ NAT. (sᵏ₁ = ε ⇒ sᵏ⁺¹₁ = ε)

∀ k ∈ NAT. (k ≥ i) ⇒ (sᵏ⁺¹ = sᵏ)

With these lemmas, we continue our main proof, again by case distinction on (sʲ₁ = ε ∨ sʲ₁ = sʲ ++ %a).

Case I: sʲ₁ = ε

∀ rs, r₂, s₂. (s₂, r₂) isMatchOf (sʲ₁, rs)

⇒ ∀ rs, r₂, s₂. (s₂, r₂) isMatchOf (ε, rs)

⇒ s₂ isPrefixOf ε

⇒ s₂ = ε

Case II: sʲ₁ = sʲ ++ %a

⇒ sʲ₁ = sⁱ⁺¹ ++ %a

⇒ sʲ₁ = sⁱ ++ %a

Recalling that j = i + 1, and by case distinction on (sⁱ₁ = ε ∨ sⁱ₁ = sʲ ++ %a), we can derive

Case II.1: sⁱ₁ = ε

⇒ sʲ₁ = ε

⇒ contradiction

Case II.2: sⁱ₁ = sʲ ++ %a

⇒ sʲ₁ = sⁱ ++ %a

⇒ sʲ₁ = sⁱ₁

⇒ (rsʲ₁, sʲ₁) = (rsⁱ₁, sⁱ₁)

⇒ ∀ r₂, s₂. (s₂, r₂) isMatchOf (sʲ₁, rs) ⇒ s₂ = ε

This finishes the reduction of the applicability condition in the application of the transformation SPLITOFPOST.

III.3 The Transformational Development

Written as a formal transformational development, our proof activities can be represented as follows:

CREATEUNIT("NEXT-MATCH", "{ ... << text of NEXT-MATCH>>...}");

F(C_Scan)(ENRICHSIG("NEXT-MATCH"));

F(C_Scanbody; C_EquivLeft)(SWAP);

F(C_Scanbody)(SPLITOFPOST(λx. (ε,ε))(λx, y. next_match(π₂(x),y)));

F(C_Scanbody; C_A)(STEP1);

F(C_Scanbody; C_B)(STEP2);

F(C_Scanbody; C_C)(STEP3);

F(C_Scanbody; C_D)(ANDELIM);

F(C_Scanbody; C_E)(ANDELIM);

F(C_Scanbody; C_F)(IMPLELIM);

F(C_Scanbody; C_G)(DISCHBIMPL);

Here, CREATEUNIT introduces NEXT-MATCH into the development graph. Then, focussed on the structure Scan (where C_Scan represents the context of the unit Scan in the development graph), the next transformation produces an enrichment statement in the signature. The following transformation swaps the left- and right-hand sides of the equivalence (as preparation for the next step); then follows the application of the instantiated SPLITOFPOST. The variables STEP1, STEP2 and STEP3 represent the developments for "INV holds in the termination case", "INV is invariant during recursion" and "the recursion terminates" discussed in the previous sections. Since we are not going to reuse them elsewhere, they are omitted here.

The remaining transformations perform boolean simplification and the syntactical elimination of the previous version of scan from the specification (for the final result, see the next section).

As contexts, let C_Scanbody be defined by

C_Scan; λξ.SCAN ={ enriches MATCH + NEXT-MATCH;

scan: Seq Regexp × Seq Char → Seq Regexp × Seq Char;

axioms ∀ s:Seq Char. rs:Seq Regexp in ξ

endaxioms

and let C_EquivLeft = λξ.ξ ⇔ ∃ strS:Seq(Seq Char...). The following contexts are constructed analogously and are omitted here.

As a simple introductory example of meta-development, we apply some algebraic laws to our transformational development above and generalize it (i.e. refine the relations by stronger ones in the sense of set inclusion) via the introduction of quantification over arbitrary contexts and ^ (relate to all normal forms; in other words: apply as long as possible). This leads to the following transformational development, which has the same effect on our target, but is more readable:

CREATEUNIT("NEXT-MATCH", "{ ... << text of NEXT-MATCH>>...}");

F(C_Scan)(ENRICHSIG("NEXT-MATCH"));

F(C_Scanbody)( F(C_EquivLeft)(SWAP);

SPLITOFPOST(λx. (ε,ε))(λx, y. next_match(π₂(x),y));

F(C_A)(STEP1);

F(C_B)(STEP2);

F(C_C)(STEP3);

([ξ]. F(ξ)(ANDELIM ∪ DISCHBIMPL))^);

Together with the tactical combinators, it is possible to represent tactical programs similar to the ones well known from provers like COQ or Isabelle, while still maintaining a clean and simple (relational) semantics. The last relational expression specifies a boolean simplification tactic; the computability of this relation is ensured by the fact that the rules ANDELIM and DISCHBIMPL are terminating.
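The idea behind the last tactic, ([ξ]. F(ξ)(ANDELIM ∪ DISCHBIMPL))^, can be illustrated by a small Python sketch. Everything in it is an assumption made for illustration: formulas are nested tuples, the two rules are hypothetical readings of the rule names (removing a True conjunct, and discharging an implication whose premise is True), the relational union is approximated by a deterministic "first rule that applies", and `anywhere` plays the role of the quantification over arbitrary contexts.

```python
def and_elim(e):
    """Hypothetical reading of ANDELIM: drop a True conjunct."""
    if isinstance(e, tuple) and e[0] == "and":
        if e[1] is True:
            return e[2]
        if e[2] is True:
            return e[1]
    return None          # rule not applicable


def disch_impl(e):
    """Hypothetical reading of DISCHBIMPL: True => P becomes P."""
    if isinstance(e, tuple) and e[0] == "impl" and e[1] is True:
        return e[2]
    return None


def union(*rules):
    """Deterministic approximation of the union of relations."""
    def t(e):
        for rule in rules:
            out = rule(e)
            if out is not None:
                return out
        return None
    return t


def anywhere(t):
    """Apply t at the first position (context) where it is applicable."""
    def go(e):
        out = t(e)
        if out is not None:
            return out
        if isinstance(e, tuple):
            for k in range(1, len(e)):
                sub = go(e[k])
                if sub is not None:
                    return e[:k] + (sub,) + e[k + 1:]
        return None
    return go


def exhaust(t):
    """The ^ combinator: apply t as long as possible (needs terminating rules)."""
    def go(e):
        while True:
            out = t(e)
            if out is None:
                return e
            e = out
    return go


simplify = exhaust(anywhere(union(and_elim, disch_impl)))
```

Since each rule application makes the term strictly smaller, `exhaust` terminates here, which corresponds to the termination argument given for ANDELIM and DISCHBIMPL.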

III.4 Summary

The following figure shows the development graph at an early stage of the development. Associated units are shown; the proof obligation is contained in the rhombus and the transformation rules are in the ellipse.

ScanSplit contains the system-generated applicability condition. One can imagine that the applicability condition is generated automatically during the application of an appropriate transformation rule.

In the following we give the result of this section in full. We have not instantiated the applicability condition since it has been proved true in the previous section; instantiating it would have made the unit too large.

SCAN1 = {enriches NAT+MATCH+NEXT_MATCH;

scan : Seq Regexp × Seq Char → Seq Regexp × Seq Char;

for : nat → (S × R → R) → S × R → R;

axioms ∀ i, h, x, y in

for(0)(h)(x, y) = y;

i > 0 ⇒ for(i)(h)(x, y) = for(pred i)(h)(x, h(x, y));

endaxioms;

axioms ∀ s, s₁:Seq Char; r:Regexp; rs₁,rs₂:Seq Regexp in

(rs₁, s₁) = scan(s, rs) ⇔ Inv((s, rs),(rs₁, s₁)) ∧ B((s, rs),(rs₁, s₁)) ∧ (∀ x,y. Inv(x, (ε,s))∧Inv(x,y)∧ ¬ B(x,y)⇒ Inv(x,H(x, y))∧

∃ i. B(x, for(i)(H)(x, (ε,s)) )

⇒

((rs₁, s₁) = scan(rs₁, s₁)

⇔ flatten strS ++s1 =s ∧ #rs1= #strS∧

strS ≠ ε ⇒ ∀ i:Nat, s₂:Seq Char. i < #rs1 ∧

s₂ = flatten (drop(i, strS)) ++ s1 ⇒

∧ ¬(∀ r₃:Regexp; s₃:Seq Char. (s₃, r₃) isMatchOf (s1, rs1)⇒ s₃ = ε;)

⇔

scan(rs₁, s₁) =

letrec g(s, rs, rs1, s1) =

if (∀ r₃:Regexp; s₃:Seq Char. (s₃, r₃) isMatchOf (s1, rs1) ⇒ s₃ = ε)

then (rs1, s1) else g(s, rs, next_match (rs, rs1, s1)) endif

in g((rs₁, s₁), (ε, s))

endlet

endaxioms }

If we convert this definition into a functional program (e.g. OPAL), a decent compiler would recognize that in each recursive call next_match has to be evaluated only once. This could also be achieved by a transformation for abstraction corresponding to "constant subexpression elimination", cf. [HK 93]. Furthermore, scan is defined tail-recursively, such that we have now established "the outermost loop".
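As a rough illustration of this "outermost loop", the tail-recursive definition can be written as an ordinary loop in which next_match is evaluated once per iteration. The Python sketch below makes several assumptions that are not in the specification: next_match is passed in as a parameter, the termination test B is replaced by the observation that next_match returned its input unchanged, and `literal_next_match` is a hypothetical stand-in in which every "regular expression" is just a literal string.

```python
def scan(s, rs, next_match):
    """Loop form of g: start from (ε, s) and iterate next_match until it
    no longer consumes input."""
    rs1, s1 = [], s                           # starting value E x = (ε, s)
    while True:
        rs2, s2 = next_match(rs, rs1, s1)     # evaluated once per iteration
        if (rs2, s2) == (rs1, s1):            # nothing matched: B holds
            return rs1, s1
        rs1, s1 = rs2, s2


def literal_next_match(rs, rs1, s1):
    """Toy stand-in for next_match: the longest literal in rs that is a
    non-empty prefix of s1 is consumed; ties go to the earliest entry."""
    hits = [r for r in rs if r and s1.startswith(r)]
    if not hits:
        return rs1, s1
    r = max(hits, key=len)
    return rs1 + [r], s1[len(r):]
```

With the literals ["if", "i", " "], scanning "ifif x" recognises "if", "if" and " " and stops with "x" as unmatched rest.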

Fig. 4: Development Graph and Associated Units

[Figure: development graph drawn with daVinci V1.3, containing the units ETC, SET, MAX, CHAR, REGEXP, MATCH, SCAN, SCAN1, NEXT_MATCH, NAT and SEQ, together with ScanSplit and SOPC(E,H).]

IV Making the Specification Constructive

IV.1 Making matches Constructive

The function matches as defined in the unit MATCH is not constructive, since it contains existential quantifiers. We recall the original function here, omitting the sorts and some axioms:

MATCH={enriches REGEXP;

.matches. :Regexp×Seq Char→Bool;

axioms ...

ε matches s = (s = ε);

mkreg(c) matches s = (%c = s);

r₁ | r₂ matches s = r₁ matches s ∨ r₂ matches s;

r₁ o r₂ matches s = ∃ s₁, s₂ :Seq Char. s = s₁ ++ s₂ ∧ r₁ matches s₁ ∧ r₂ matches s₂;

*(r) matches s = (s = ε) ∨ ∃ ss:Seq (Seq Char). s = flatten ss ∧

∀ st:Seq Char. st ∈ ss ⇒ st matches r;

....

endaxioms}

Out of the above specification, we define a constructive version of matches by eliminating all the quantifiers. We introduce a new function called splits that realises the relation between the existence of a partition and its functional equivalence. By means of higher-order functions, such as map or reduce, the new constructive function matches is defined as follows:

MATCH1={enriches REGEXP;

.matches. : Regexp× Seq Char→ Bool;

axioms ...

ε matches s = (s = ε);

mkreg(c) matches s = (%c = s);

r₁ | r₂ matches s = r₁ matches s ∨ r₂ matches s;

r₁ o r₂ matches s = ¬((〈λx. r₁ matches x, λx. r₂ matches x〉 ➢ splits (s)) = ε);

*(r) matches s = (ε | r o *(r)) matches s;

...

endaxioms}

First, the formula *(r) matches s = ...; is equivalently transformed via application of SPLITOFPOST, folding with ε matches s = (s = ε), r₁ o r₂ matches s = ..., and r₁ | r₂ matches s = ... Second, the formula r₁ o r₂ matches s = ...; is transformed via unfolding of the laws CHARN_SPLIT, λ-abstraction and FUN_TUPLE, CHARN_FILTER and CHARN_ELEM from the library (see Appendix A3 and A7). We refrain from a formal treatment of this development here.
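The constructive reading of matches in MATCH1 can be sketched in Python as follows. This is an illustration under assumptions, not the library functions referred to above: regular expressions are encoded as tagged tuples ("eps", "chr", "alt", "cat", "star" are hypothetical tags), `splits` enumerates all two-part partitions of s, and filtering the splits is expressed with `any`. One deliberate deviation is labelled in the code: in the star case the first part must be non-empty, since a literal reading of *(r) = ε | r o *(r) would not terminate when r matches ε.

```python
def splits(s):
    """All ways of cutting s into two parts (s1, s2) with s = s1 ++ s2."""
    return [(s[:k], s[k:]) for k in range(len(s) + 1)]


def matches(r, s):
    """Quantifier-free matches over a small tuple-encoded regexp syntax."""
    tag = r[0]
    if tag == "eps":                      # ε matches s = (s = ε)
        return s == ""
    if tag == "chr":                      # mkreg(c) matches s = (%c = s)
        return s == r[1]
    if tag == "alt":                      # r1 | r2
        return matches(r[1], s) or matches(r[2], s)
    if tag == "cat":                      # r1 o r2: some split satisfies both
        return any(matches(r[1], a) and matches(r[2], b) for a, b in splits(s))
    if tag == "star":                     # *(r) = ε | r o *(r)
        # Assumption: require a non-empty first part to guarantee termination.
        return s == "" or any(
            a != "" and matches(r[1], a) and matches(r, b) for a, b in splits(s)
        )
    raise ValueError(f"unknown regexp tag: {tag}")
```

The existential quantifier over s₁ and s₂ has become a finite search over `splits(s)`, which is the essential step that makes the definition executable.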

IV.2 Making next_match constructive

The unit NEXT_MATCH was introduced in order to encapsulate auxiliary functions necessary to make SPLITOFPOST applicable. Now it will itself be the object of development, in order to make it constructive.

NEXT_MATCH= {enriches MATCH;

next_match: Seq Regexp×Seq Regexp ×Seq Char→ Seq Regexp×Seq Char;

axioms∀s, s1:Seq Char; rs, rs1:Seq Regexp, ∃s2:Seq; rs2:Seq Regexp in

δ(next_match (rs, rs1, s1))

next_match (rs, rs1, s1) = (rs2, s2) ⇐ ∃ s3:Seq Char. s1 = s3 ++ s2 ∧

∃ r:Regexp. rs2 = rs1 ++ %r ∧ (s3, r) isLongestPrefixMatchOf (s1, rs);

next_match (rs, rs1, s1) = (rs1, s1) ⇐

¬(∃ s3:Seq Char. s1 = s3 ++ s2 ∧ ∃ r:Regexp. rs2 = rs1 ++ %r ∧ (s3, r) isLongestPrefixMatchOf (s1, rs));

endaxioms}

In order to make the function next_match constructive (let us call the constructive version next), it is necessary to eliminate the existential quantifier. Therefore we introduce a logic transformation rule called skolemization.

if true then E else F endif = E

if false then E else F endif = F

if p then E else F endif = p

Since Bool is freely generated by true and false, it follows that:

(δ(B) ∧ B ⇒ E ∧ ¬B ⇒ F) = if B then E else F endif

We now apply the above equation as a transformation rule.

next_match(rs, rs1, s1) = if ∃ s3. s1 = s3 ++ s2 ∧

∃ r. rs2 = rs1 ++ %r ∧

(s3, r) isLongestPrefixMatchOf (s1, rs)

then (rs2, s2) else (rs1, s1) endif

By lifting the two existential quantifiers and by elimination of the existential quantification for rs2 (substitution: σ = {rs2 → rs1 ++ %r}) we get the next version of NEXT_MATCH:

NEXT_MATCH1= {enriches MATCH ;

next_match: Seq Regexp× Seq Regexp ×Seq Char→ Seq Regexp×Seq Char;

axioms ∀ s, s1:Seq Char; rs, rs1:Seq Regexp, ∃ s2:Seq;

∃ s3:Seq Char; ∃ r:Regexp; in

δ(next_match (rs, rs1, s1))

next_match (rs, rs1, s1) = if rs1 ++ %r = rs1 ++ %r ∧ s1 = s3 ++ s2 ∧ (s3, r) isLongestPrefixMatchOf (s1, rs)

then (rs1 ++ %r, s2) else (rs1, s1) endif

endaxioms}

We focus on the definition of next_match:

next_match (rs, rs1, s1)

(by ex. elim. s2 → drop(s1, |s3|)) (by boolean simplification)

= if s1 = s3 ++ drop(s1, |s3|) ∧

(s3, r) isLongestPrefixMatchOf (s1, rs)

then (rs1 ++ %r, drop(s1, |s3|)) else (rs1, s1)

endif

(by applying theorem s1 = s3 ++ drop(s1, |s3|) from SEQ)

= if (s3, r) isLongestPrefixMatchOf (s1, rs) then (rs1 ++ %r, drop(s1, |s3|)) else (rs1, s1)

endif

(by boolean simplification)

= if (s3, r) isLongestPrefixMatchOf (s1, rs) then (rs1 ++ %r, drop(s1, |s3|)) else (rs1, s1)

endif

(by fold with LongestPrefixMatchOfS defined below)

= if (s3, r) ∈ LongestPrefixMatchOfS(s1, rs)

then (rs1 ++ %r, drop(s1, |s3|)) else (rs1, s1)

endif

(by let introduction)

= let S = LongestPrefixMatchOfS(s1, rs) in if (s3, r) ∈ S

then let (a, c) = if S ≠ {} then arb(S)

else ([], []) endif in (rs1 ++ %r, drop(s1, |s3|)) endlet else (rs1, s1) endif endlet

Here we create a new auxiliary unit NEXT_LONGEST_PREFIX_MATCH as follows:

NEXT_LONGEST_PREFIX_MATCH = {enriches MATCH;

isMatchOfS: (Seq Char × Seq Regexp) → Set(Seq Char × Regexp);

LongestPrefixMatchOfS: Seq Char × Seq Regexp → Set(Seq Char × Regexp)

axioms ∀ s, rs. ∀ s1, r, r1. in

δ(LongestPrefixMatchOfS(s, rs));

(s1, r) ∈ LongestPrefixMatchOfS (s, rs) ⇔ (s1, r) isLongestPrefixMatchOf (s, rs);

δ((s1, r1) isMatchOf (s, rs));

(s1, r1) isMatchOf (s, rs) ⇔ (s1, r1) ∈ isMatchOfS (s, rs)

|LongestPrefixMatchOfS(s, rs)| ≤ 1

endaxioms; }

Remark:

This conversion of predicates into set-valued functions is very common in constructivation phases; in [BM 93b] this fact gives rise to a theoretical investigation of a schematic operation "power set transpose", which has exactly this effect. Since this operation needs the machinery of category theory, we content ourselves with simulating it on the level of SPECTRUM specifications. The definedness of the transposed functions is a requirement that would lead to the inconsistency of the specification if the predicate is not computable. If a refinement of the specification to a program is possible, then the existence of this model discharges the pending consistency proof here.
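The effect of this conversion can be sketched in Python: instead of asking whether a given pair (s1, r) satisfies a predicate, we compute the set of all pairs that do. The sketch is an assumption-laden illustration: regular expressions are Python `re` pattern strings, the function names are hypothetical counterparts of isMatchOfS, LongestPrefixMatchOfS and arb, and the bound |LongestPrefixMatchOfS(s, rs)| ≤ 1 is enforced by an assumed tie-breaking rule (earliest expression in rs).

```python
import re


def is_match_of_set(s, rs):
    """Set-valued counterpart of isMatchOf: all (prefix, regexp) pairs
    such that the regexp matches the prefix of s (empty prefix included)."""
    return {(s[:n], r) for r in rs for n in range(len(s) + 1)
            if re.fullmatch(r, s[:n])}


def longest_prefix_match_set(s, rs):
    """Set-valued counterpart of isLongestPrefixMatchOf; the result has at
    most one element. Ties are broken by position in rs (an assumption)."""
    cands = [(p, r) for (p, r) in is_match_of_set(s, rs) if p != ""]
    if not cands:
        return set()
    best = max(cands, key=lambda pr: (len(pr[0]), -rs.index(pr[1])))
    return {best}


def arb(S):
    """Pick an arbitrary element of a non-empty set."""
    return next(iter(S))


def next_match(rs, rs1, s1):
    """next_match driven by the test S ≠ {} instead of a quantifier."""
    S = longest_prefix_match_set(s1, rs)
    if S:
        s3, r = arb(S)
        return rs1 + [r], s1[len(s3):]
    return rs1, s1
```

Because the set has at most one element, `arb` behaves deterministically here, which is what makes the membership test in the derivation replaceable by a non-emptiness test.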

We return to the specification NEXT_MATCH1, enrich it by NEXT_LONGEST_PREFIX_MATCH and MAX, and focus again on the right-hand side of next_match:

(apply theorem S≠{}⇔ ∃a. a∈ S from SETS)

= let S = LongestPrefixMatchOfS(s1,rs) in if S≠ {}

then let (a,c) =if S≠{} then arb(S)

else ([],[]) endif in(rs1 ++ %c, drop(s1, |s3|))