
TECHNIQUES FOR SIMPLIFICATION OF LARGE THEORIES


(
  [(theory [theory name]
     (extends [extended theory]*)
     (axioms [axiom]*)
  )]*
)

[goal formula]*

Figure 5.4. New haRVey syntax.

iii) specification criterion: in elementary specifications, the above criteria can also be used to determine whether an operation defined in the specification is a hierarchy-persistent enrichment of the rest;

iv) recursive criterion: these three criteria can be applied recursively until the set of axioms becomes stable.

To implement this approach in haRVey, we first had to change its input syntax to add support for structured specifications. We adopted the syntax shown in Figure 5.4, which allows users to create hierarchic specifications.

With these changes, we can represent structured specifications by giving names to the theories and enriching them through the extends clause. When working with hierarchy-persistent specifications, we can then apply the algorithm of Reifi et al. to reduce the given background theory.

After implementing this solution, the second step was to try it with the Krakatoa files. This conversion mainly involved two kinds of files: the file that contains the model of the memory (see Appendix A), and the files that model each of the classes being verified (automatically generated in each run). In both cases the structuring was straightforward and could easily be discovered by looking at the symbols used in each axiom. In fact, in most cases we ended up with one axiom per theory.
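To illustrate how the extends clause allows only part of a structured background theory to be loaded, the following Python sketch collects the axioms of a chosen theory together with those of every theory it extends, directly or transitively. It is a minimal sketch under our own assumptions and not the reduction algorithm cited above: the `Theory` class, the `collect_axioms` function, the theory names and the axiom placeholders are all hypothetical, and the choice of the starting theories for a given goal is left to the caller.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Theory:
    """One named theory of the structured syntax (hypothetical model)."""
    name: str
    extends: List[str] = field(default_factory=list)
    axioms: List[str] = field(default_factory=list)


def collect_axioms(theories: Dict[str, Theory], roots: List[str]) -> List[str]:
    """Return the axioms of the root theories and of everything they extend."""
    visited = set()
    axioms: List[str] = []
    pending = list(roots)
    while pending:
        name = pending.pop()
        if name in visited:
            continue  # a theory reachable twice is loaded only once
        visited.add(name)
        axioms.extend(theories[name].axioms)
        pending.extend(theories[name].extends)
    return axioms


# Hypothetical hierarchy with one axiom per theory (placeholder names).
THEORIES = {
    "arith": Theory("arith", [], ["ax_arith"]),
    "memory": Theory("memory", ["arith"], ["ax_memory"]),
    "purse": Theory("purse", ["memory"], ["ax_purse"]),
    "flag": Theory("flag", ["memory"], ["ax_flag"]),
}

# Only the theories reachable from "purse" contribute axioms.
print(sorted(collect_axioms(THEORIES, ["purse"])))
```

In this toy hierarchy, a goal attached to the hypothetical `purse` theory never sees the axiom of `flag`, which is the kind of saving the structured approach aims at.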

5.4 EXPERIMENTAL RESULTS

To measure the real impact of the presented techniques on the verification of Java/JavaCard applications, we used as a base a set of classes found in [56], where they are used to show the functionalities of Krakatoa with the Coq proof assistant. We used four classes:

Lesson1 Comprises three basic functions: one to find the maximum of two numbers, and two implementations of a square root computation;

Arrays Problems related to arrays: finding the maximal value in a given array, verifying whether a given number is maximal in an interval, and shifting the array;

Purse An implementation of a pseudo purse applet, which deals with operations such as deposits and withdrawals;

Flag An implementation of Dijkstra’s Dutch flag problem.

All the source code for these classes, with their respective JML specification, can be found in Appendixes B, C, D and E, respectively.

To carry out the tests, we ran haRVey on the proof obligations generated by Krakatoa for each of these classes, using each of the reduction techniques we developed and, when possible, a combination of them. A total of 54 proof obligations were generated (eight for Lesson1, nineteen for Arrays, eighteen for Purse and nine for Flag). Table 5.1 shows the first set of results, obtained using the original structure of haRVey (without theory structuring). In this table, the first column shows the kind of test we ran: no reduction at all (norm), the tailor theory algorithm (tailor), the “local” equivalence removal (eql), or the “full” equivalence removal (eqf). The next columns show how many problems could be verified, how many haRVey could not decide, and how many timed out (with a timeout of 30 seconds). The last column presents the total time taken by the verifications that finished (those that were verified and those that could not be decided). All tests were run on a machine with a 2 GHz processor and 512 MB of RAM, running Linux with kernel 2.6 and E prover version 0.82.

               Verified   Can not decide   Timed out   Proof Time
norm           14         0                40          1.821
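The way each row of these result tables is assembled can be sketched in Python as follows. This is only an illustration under our own assumptions: the `summarize` function, the status labels and the example runs are hypothetical (the runs are made-up values, not measurements), and the sketch only mirrors the rule stated above that the proof time adds up the runs that finished, leaving out those that timed out.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical status labels, one per result column.
STATUSES = ("verified", "cannot_decide", "timed_out")


def summarize(runs: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Build one table row from (status, seconds) pairs."""
    runs = list(runs)
    counts = Counter(status for status, _ in runs)
    # Only runs that finished contribute to the proof time.
    proof_time = sum(sec for status, sec in runs if status != "timed_out")
    row: Dict[str, float] = {status: counts[status] for status in STATUSES}
    row["proof_time"] = proof_time
    return row


# Made-up runs for illustration only; not data from our experiments.
EXAMPLE_RUNS = [
    ("verified", 0.5),
    ("verified", 1.25),
    ("cannot_decide", 0.25),
    ("timed_out", 30.0),
]

print(summarize(EXAMPLE_RUNS))
```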

Although in these cases our techniques bring no gain in the number of verified proofs, Table 5.2 shows that the size of the theories was reduced by an average of 25%. The table shows the size of the original theory for each of these problems, as well as the new sizes after applying each of the reduction techniques. To assess the effect of our techniques in a broader scenario, we also ran the same kind of tests on two different sets of problems, one coming from [16] and the other gathered from a set of problems related to software reuse. In both cases, the theory reduction resulted in more obligations being verified.

While the techniques used so far do yield a reduced background theory, its size is still too big to be fully handled by haRVey. Table 5.3 presents the results obtained using the last approach, the structured theory. The results follow the same scheme as the previous ones, showing the number of proofs verified using the new structured theory alone and together with the other reduction approaches.

Problem    original   tailor   eql   eqf   eql + tailor   eqf + tailor
Lesson1    79         60       55    51    48             44
Arrays     79         59       55    51    47             43
Purse      79         59       55    51    47             43
Flag       76         60       56    51    49             44

Table 5.2. Reduction of the theory size.
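The relative reductions behind Table 5.2 can be recomputed with a few lines of Python. The sketch below copies the sizes from the table; the names `SIZES`, `reduction_percent` and `mean_reduction` are ours and purely illustrative. Averaged over the four classes, the tailor column alone gives a reduction of roughly 24%, and the combined columns reduce the theory further.

```python
# Theory sizes copied from Table 5.2 (number of axioms).
SIZES = {
    "Lesson1": {"original": 79, "tailor": 60, "eql": 55, "eqf": 51,
                "eql + tailor": 48, "eqf + tailor": 44},
    "Arrays": {"original": 79, "tailor": 59, "eql": 55, "eqf": 51,
               "eql + tailor": 47, "eqf + tailor": 43},
    "Purse": {"original": 79, "tailor": 59, "eql": 55, "eqf": 51,
              "eql + tailor": 47, "eqf + tailor": 43},
    "Flag": {"original": 76, "tailor": 60, "eql": 56, "eqf": 51,
             "eql + tailor": 49, "eqf + tailor": 44},
}


def reduction_percent(original: int, reduced: int) -> float:
    """Percentage of axioms removed with respect to the original theory."""
    return 100.0 * (original - reduced) / original


def mean_reduction(technique: str) -> float:
    """Average reduction of one technique over the four classes."""
    values = [reduction_percent(row["original"], row[technique])
              for row in SIZES.values()]
    return sum(values) / len(values)


for technique in ("tailor", "eql", "eqf", "eql + tailor", "eqf + tailor"):
    print(f"{technique:>13}: {mean_reduction(technique):.1f}%")
```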

               Verified   Can not decide   Timed out   Proof Time
norm           24         1                29          2.340
eql            23         2                29          2.243
tailor         24         1                29          2.302
eqf            23         2                29          2.208
eql + tailor   23         2                29          2.253
eqf + tailor   23         2                29          2.218

Table 5.3. Results with structured theory.

Clearly, the results obtained are better than those in the original case, by an average of 70%. We also obtained a very good reduction in the size of the theories with this approach. In most cases we reduced the theory by a factor of four, and in some cases (the problems from Lesson1) the new theory had as few as 5 axioms. It is also important to observe that, in this case, the tailor theory algorithm had no effect on the size of the theory, since the structured approach had, in most cases, just one axiom in each of the defined theories, and that the equivalence removal resulted in at most four axioms being removed. Table 5.4 shows the minimal and maximal size of the background theory in each case.

Problem    original   tailor    eql       eqf       eql + tailor   eqf + tailor
Lesson1    5 - 6      5 - 6     3 - 4     2 - 3     3 - 4          2 - 3
Arrays     15 - 18    15 - 18   12 - 15   10 - 13   12 - 15        10 - 13
Purse      14 - 30    14 - 30   11 - 27   10 - 23   11 - 27        10 - 23
Flag       13 - 23    13 - 23   12 - 22   9 - 18    12 - 22        9 - 18

Table 5.4. Reduction of the theory size with structured theory.
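The gain in verified obligations and the reduction factors discussed above can be checked with the short Python sketch below. The figures are copied from the norm rows of Tables 5.1 and 5.3, from the original column of Table 5.2 and from the original column of Table 5.4; the function and variable names are ours and purely illustrative.

```python
from typing import Tuple


def relative_gain(before: int, after: int) -> float:
    """Percentage increase from `before` to `after`."""
    return 100.0 * (after - before) / before


def reduction_factors(original: int, smallest: int, largest: int) -> Tuple[float, float]:
    """Lowest and highest factor by which the theory shrinks."""
    return original / largest, original / smallest


# Verified obligations in the norm rows of Tables 5.1 and 5.3.
VERIFIED_FLAT = 14
VERIFIED_STRUCTURED = 24

# (original size from Table 5.2, minimal and maximal size from Table 5.4).
STRUCTURED_SIZES = {
    "Lesson1": (79, 5, 6),
    "Arrays": (79, 15, 18),
    "Purse": (79, 14, 30),
    "Flag": (76, 13, 23),
}

print(f"gain in verified obligations: "
      f"{relative_gain(VERIFIED_FLAT, VERIFIED_STRUCTURED):.1f}%")
for problem, (original, smallest, largest) in STRUCTURED_SIZES.items():
    low, high = reduction_factors(original, smallest, largest)
    print(f"{problem}: reduced by a factor between {low:.1f} and {high:.1f}")
```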

Another point that deserves attention is a phenomenon that occurs with our underlying theorem prover. Through a careful examination of the execution traces of this and other examples, we found that the E prover [74, 75, 76] has some issues that can lead to proof obligations no longer being verified after the theory reduction.

5.4.1 Heuristic problems

On more than one occasion, we have faced problems with respect to the underlying theorem prover. This happens because the E prover has a set of heuristics from which it can choose when verifying a problem. The chosen heuristic determines things such as the term ordering and precedence. The default behavior of haRVey is to let the E prover automatically choose a heuristic for each problem. The problem is that, in some cases, simple changes to the problem at hand (such as the addition or removal of one axiom) lead to a different heuristic being chosen and, as a consequence, to very different results.

(

(forall balance this (<-> (Purse_invariant balance this) (<= 0 (acc balance this))))

Figure 5.5. E prover problem with JML obligation.

Take for example the proof obligation in Figure 5.5, whose background theory contains an axiom that obviously has no relation to the formula we want to verify. While this proof can be trivially verified by haRVey, as soon as we remove this “useless” axiom, haRVey reports that it cannot decide the problem.

What happens here is that, without the additional axiom, the E prover is not able to derive new clauses that haRVey uses to continue its verification process and, as a consequence, haRVey cannot decide the problem.

Another case where we detected this kind of behavior arose while testing our approaches on the problems from [16]. In the case shown in Figure 5.6, removing the two clauses shown changes the heuristic chosen by the E prover and, while the problem is still provable, the proof is 10 times slower than for the original problem.

To verify the real consequences of this in our cases, we chose the heuristic that resulted in the most verified obligations and forced the E prover to use it. We then ran all the tests again and obtained the results shown in Table 5.5.

               Verified   Can not decide   Timed out   Proof Time
norm           29         4                21          41.757

Table 5.5. Results with fixed heuristic.
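The selection step used to produce Table 5.5, that is, keeping the heuristic that verified the most obligations, can be sketched in Python as follows. This is a minimal sketch under our own assumptions: the heuristic names `h_a`, `h_b` and `h_c` are placeholders and not real E prover heuristics, the outcome lists are made up, and `pick_best_heuristic` is a hypothetical helper rather than part of haRVey.

```python
from typing import Dict, List


def verified_count(outcomes: List[str]) -> int:
    """Number of obligations reported as verified."""
    return sum(1 for outcome in outcomes if outcome == "verified")


def pick_best_heuristic(results: Dict[str, List[str]]) -> str:
    """Return the heuristic with the most verified obligations.

    On a tie, the heuristic listed first in `results` is kept.
    """
    return max(results, key=lambda name: verified_count(results[name]))


# Made-up outcomes of three placeholder heuristics on four obligations.
RESULTS = {
    "h_a": ["verified", "timed_out", "timed_out", "cannot_decide"],
    "h_b": ["verified", "verified", "timed_out", "verified"],
    "h_c": ["verified", "verified", "cannot_decide", "timed_out"],
}

print(pick_best_heuristic(RESULTS))
```

A heuristic selected this way is only the best one for the set of obligations it was measured on; running the same selection on another set of problems may return a different choice.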

(
  (forall U V (<-> (lt U V) (gt V U)))
  (forall U V (<-> (geq U V) (leq V U)))
  ...
)
(->
  (forall U V (->
    (and
      (leq num 0 U)
      (leq U (minus (plus num 1 (minus num 135300 num 1)) num 1)))
    (= (sum num 0 (minus num 5 num 1) (a select3 q U V)) num 1)))
  (forall W X (->
    (and (leq num 0 W) (leq W (minus num 135300 num 1)))
    (= (sum num 0 (minus num 5 num 1) (a select3 q W X)) num 1))))

Figure 5.6. Other problem with E prover.

With this fixed heuristic we get better results (20% on average) than when letting the E prover automatically decide which heuristic to use. But we must consider two things here: i) while in this case we obtained better results with the chosen heuristic, it is quite possible that this same heuristic will not yield the best results on other problems; and ii) even if the chosen heuristic yields better results, we cannot guarantee that it is indeed the best heuristic that can be used for this problem.

As can be seen, we need a way to find (an approximation of) the best heuristic for each problem, or even for a set of problems. These results pinpoint a new issue in the consolidation of the automated theorem prover haRVey, one that needs a careful study, which should be conducted in cooperation with the authors of the E prover.