VIII Increasing Efficiency by Tabulation
VIII.2 Tabulation Techniques: A Short Overview
Efficiency is a crucial need in formal development methodology. One way towards efficiency is to compute a function value only once and to store the result in a conveniently represented table for later use. This technique is known as tabulation. Various representations have been proposed, but all must be capable of storing those previously computed values of a function which are needed to continue the evaluation of the same function with different parameters. They must also be capable of determining, for a given argument, whether or not the function value has previously been computed, and if so what it is.
Tabulation as a means of improving the efficiency of recursive programs is well known, and it forms a major component of an algorithmic design technique that later came to be known as dynamic programming [Bell 57]. Furthermore, it can be seen as a technique for program transformation. Three techniques are mentioned and discussed in [Bird 80]:
Dependency Graphs. A dependency graph induces a partial ordering of its vertices, and the problem of efficient tabulation can be viewed partially as the problem of embedding this partial order in a linear order, that is, arranging the nodes in a linear sequence.
Overtabulation. It consists of embedding one scheme in another, where function values are also calculated that are not crucial for the final answer; i.e. a sort of general embedding over tables.
Exact Tabulation. Only those function values actually needed for calculating the corresponding function value are computed and stored in the table.
All table operations can be carried out in constant time. If the subsidiary functions have constant complexity, then the time spent on table operations, apart from initialization, increases the overall running time of the tabulation only by a constant factor.
The concrete realization can be done by hash tables or nonlinear structures like balanced trees.
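The difference between exact tabulation and overtabulation can be illustrated by a minimal Python sketch. The example function (binomial coefficients) and the names binom_exact and binom_over are illustrative assumptions and are not taken from [Bird 80]; a dictionary plays the role of the hash table, and 0 <= k <= n is assumed.

```python
def binom_exact(n, k, table=None):
    """Exact tabulation: compute and store only the values actually needed."""
    if table is None:
        table = {}
    if (n, k) in table:              # has this value been computed before?
        return table[(n, k)]
    if k == 0 or k == n:
        value = 1
    else:
        value = binom_exact(n - 1, k - 1, table) + binom_exact(n - 1, k, table)
    table[(n, k)] = value            # store the result for later use
    return value


def binom_over(n, k):
    """Overtabulation: fill every entry up to row n in a fixed linear order.

    Returns the requested value together with the number of table entries.
    """
    table = {}
    for i in range(n + 1):           # rows in increasing order
        for j in range(i + 1):       # each entry depends only on the previous row
            if j == 0 or j == i:
                table[(i, j)] = 1
            else:
                table[(i, j)] = table[(i - 1, j - 1)] + table[(i - 1, j)]
    return table[(n, k)], len(table)
```

Both functions return the same value; the overtabulating version also computes entries that the final answer does not depend on, but it needs no test for whether an entry is already present because the linear order guarantees it.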
VIII.2.1 Tabulating the States
Because the set of states is finite (a well-founded order on the length of the initials), the states can be tabulated. This is also the task of the precomputation hidden in the function Q. A last step of our development is to maintain efficiency by choosing a suitable representation of the table. We furthermore assume the existence of a constant-time look-up function.
Example
In the following example we will loosely use the following notations: the list notation [] denotes sets of regular expressions, "" denotes strings, and {} is reserved for sets of states.
Let the following be given: R = [ab, a*a] and s = "abc". We want to compute the states and to build the corresponding automaton. The construction is shown below:
Fig. 8: The Computed States and the Corresponding Automaton
T "a" [ab, a*a] = (∪ / {}) (T2 a * [ab, a*a]) = T2 a ab ∪ T2 a *a = {b} ∪ {*a}
T "b" [ab, a*a] = {}
T "c" [ab, a*a] = {}
T "a" [b, *a] = {} ∪ {*a} ∪ {ε} = {*a, ε}
T "b" [b, *a] = {ε}
T "c" [b, *a] = {}
T "a" [*a, ε] = {*a}
T "b" [*a, ε] = {}
T "c" [*a, ε] = {}
[Automaton diagram: nodes labelled with the computed state sets, starting from {ab, a*a} and ending in {ε}; edges labelled a and b.]
The figure above shows the tabulation of the states computed by the function T. The computation proceeds character by character. As one can easily see, calling the function T with the character parameter b or c produces the empty set, since these characters cannot be matched in the example. This is the main source of efficiency: if a character cannot be recognized as the head of a regular expression, the whole regular expression is discarded.
After processing the first character, we proceed with the successor state computed by the first call of the function T, as done above. The function is thus called repeatedly with the subsequent characters of the string to be matched.
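The character-wise computation and tabulation of states can be sketched in Python as follows. This is not the function T of our development: it swaps in partial derivatives of regular expressions as a stand-in technique, reads a*a as (a*)a, and therefore yields state sets that differ in detail from those in the figure. All names (residuals, step, table) are hypothetical.

```python
EPS = ("eps",)                        # the empty string


def char(c):
    return ("chr", c)


def seq(r, s):
    # Smart constructor: drop a leading or trailing empty string.
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ("seq", r, s)


def star(r):
    return ("star", r)


def nullable(r):
    """True if r matches the empty string."""
    if r[0] in ("eps", "star"):
        return True
    if r[0] == "seq":
        return nullable(r[1]) and nullable(r[2])
    return False                      # a single character


def residuals(r, c):
    """Expressions that remain to be matched after reading character c from r."""
    if r[0] == "chr":
        return {EPS} if r[1] == c else set()
    if r[0] == "seq":
        out = {seq(p, r[2]) for p in residuals(r[1], c)}
        if nullable(r[1]):
            out |= residuals(r[2], c)
        return out
    if r[0] == "star":
        return {seq(p, r) for p in residuals(r[1], c)}
    return set()                      # nothing can be read from the empty string


table = {}                            # (state, character) -> successor state


def step(state, c):
    """Successor state, computed at most once per (state, character) pair."""
    key = (state, c)
    if key not in table:
        table[key] = frozenset(p for r in state for p in residuals(r, c))
    return table[key]


# R = [ab, a*a]
start = frozenset({seq(char("a"), char("b")), seq(star(char("a")), char("a"))})
```

Calling step(start, "b") or step(start, "c") yields the empty state, mirroring the observation that expressions whose head does not match are discarded; repeated calls with the same arguments are answered from the table.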
VIII.2.2 How to Read The Longest Match on The Automaton
The start state is marked by an edge that enters the graph from outside. Every node is labelled with the whole set of regular expressions against which the string is to be matched. The edges are always labelled with the recognized character.
To build the longest prefix match, we proceed as follows:
We simply walk through the graph starting from the initial state, collecting the labels on the edges and concatenating them. If we reach the end state {ε}, the collected and concatenated string is the longest prefix match of the given regular expression(s). The difference between the initial string and the matched string is the unmatched string that the function scan returns as part of its result.
For instance, for the example discussed above, we have (informally): LongestPrefixMatchOfS [ab, a*a] "abc" = {(ab,"c")}
The first element of the result pair is the longest prefix match, and the second is the rest of the string that has not been matched. One can easily imagine formalizing this knowledge by means of a filter that describes all potential end states.
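Reading the longest match off a tabulated automaton can be sketched as follows. The transition table below is a hand-written, hypothetical deterministic automaton for R = [ab, a*a] (with a*a read as (a*)a); the state names q0 to q3 and the function name longest_prefix_match are assumptions made for this illustration. Instead of stopping at the first end state, the sketch remembers the last end state passed, which is one way to obtain the longest rather than the shortest match.

```python
# Hypothetical automaton for R = [ab, a*a]; missing entries mean "no transition".
TRANSITIONS = {
    ("q0", "a"): "q1",    # "a" has been read
    ("q1", "b"): "q2",    # "ab" has been read
    ("q1", "a"): "q3",    # "aa" has been read
    ("q3", "a"): "q3",    # further a's
}
END_STATES = {"q1", "q2", "q3"}


def longest_prefix_match(s, start="q0"):
    """Return (matched prefix, unmatched rest), or None if no prefix matches."""
    state = start
    last_end = None                   # length of the longest match seen so far
    for i, c in enumerate(s):
        state = TRANSITIONS.get((state, c))   # constant-time table look-up
        if state is None:             # character cannot be matched: stop walking
            break
        if state in END_STATES:
            last_end = i + 1
    if last_end is None:
        return None
    return s[:last_end], s[last_end:]
```

For the string "abc" this walk collects the labels a and b and returns the pair ("ab", "c"), in agreement with the informal result above.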
VIII.3 Summary
In this chapter, we have presented tabulation techniques used for making recursive programs efficient. The states have been computed by the function T and the corresponding function Q. An automaton has been constructed for the corresponding example.
The section on how to read the longest prefix match enables the reader to collect the labels on the edges and to concatenate them into the matched string.
IX Conclusion
IX.1 Results
It is a widespread opinion in the community of formal program development that development techniques like filter fusion, which have their origin in work on the Bird-Meertens calculus, are far too specific to be usable in a wider context. This judgement is not justified by our experience, where such techniques turned out to be easy to combine with "data-type-refinements" (see Chapter V), or with traditional transformations like "SPLITOFPOST" (which can also be seen as quite representative for "design tactics" like "global search" of [Smith 88]). Moreover, it seems that an LCF-like specification formalism like SPECTRUM can serve as a good formal framework for combining these different development techniques, benefiting from their different achievements and advantages.
Compared to the approach of [BDDG 93], our experience seems to validate our claim that data-structure changes - like regular expressions implemented by non-deterministic graphs, non-deterministic graphs implemented by deterministic graphs, graphs by arrays etc. - can and should be avoided, at least in the earlier stages of the development and at least on the level of abstract development plans. Data-structure changes are not only more complicated and error-prone, but also invite ad-hoc decisions that have to be corrected later on by specific ad-hoc constructions in the algorithm (like ε-edge introduction in the implementation of regular expressions, which culminates in a special elimination phase in the classical literature and, hence, in the usual implementations - see section).
We are fully aware of the fact that this document, with its complicated proof structure, contains formal errors. We believe in Reif’s postulate that "any specification contains an error, even the smallest one". This was the case for the original specification, whose errors were detected during verification also by other authors, and our own specifications and proofs are not essentially different. Hence, for the goal of a correct implementation, we believe that a formal, machine-checked verification is indispensable (see also: Future Work). Mathematicians, even within the formal program development community, are often not very patient with this argument, stressing the fact that mathematics flourished for thousands of years without the existence of theorem provers and that "you can’t do anything on machine, that you can’t do with your brain" (E. Dijkstra).
We feel that the latter argument is not right, and that it reveals a deeper fact about the difference between formal program development and mathematics: due to a shift from quantity to quality, human beings cannot maintain the consistency of formal proofs, just as humans cannot compute "by brain" whether 23845638465-1 is a prime number or not, although they are able to multiply and divide small numbers and are perfectly aware of the nature of computing. The success of mathematics is a consequence of its freedom to keep models simple via abstraction. Its maxim is: "whenever a definition, a model, a theory turns out to become too complex, look for a higher degree of abstraction". Although this maxim also applies in many fields of theoretical computer science, it does not apply to the field of formal program development, where the problems and their discrete modelling are basically predetermined.
Summing up, the following points seem most interesting and relevant to us:
error-prone and less implementation-oriented. It leaves more freedom for alternative developments.
(2). The uniform approach makes our transformational development more manageable by specifying problems as well as development steps in SPECTRUM.
(3). The investigation of the refinement of specifications in SPECTRUM is based on a formulation of a transformation rule as a parameterised specification with a particular interpretation. We outline how the correctness proof can be carried out in the SPECTRUM-Logic.
(4). Our transformational development attempts to be straightforward and is synthesis-oriented, i.e. important concepts of the algorithmic solution (state, powerset-automaton) are derived during the development. Thus, large proof-obligations are avoided that result from refinement steps necessary to introduce these concepts in the algorithmic specification. Further, ad-hoc constructions well known from the classical literature are avoided.
(5). The development process is partly presented as a formal object, and is thus a first step towards reuse of development methods in "finding the right intermediate lemma-situations" and even similar problem areas.
(6). An efficient implementation of LEX is achieved.
(7). The presentation of the problem and the development are done according to the KORSO terminology and documentation guidelines.