Generating triggers - Performance of program verification with VerCors

After eliminating complex subscripts we can try to generate trigger sets for universal quantifiers, at the end of the rewriting phase. This ensures that the generated triggers will not be invalidated by other transformations. Since the selection and the main elements of a COL quantifier are rewritten to an implica- tion in the body of the Viper quantifier, we have to look for trigger candidates in the selection and the main of the COL quantifier. These trigger candidates have to adhere to the restrictions that Viper imposes on triggers. A trigger candidate is thus an expression that mentions at least one of the quantified variables, has some sort of structure (thus is not the variable itself) and does not contain an accessibility predicate. From this set of trigger candidates we compose valid trigger sets. We do this by first generating the powerset of the set with triggers (excluding the empty set, since the empty set is not a valid trigger set). From this powerset we select all the sets that mention all quantified variables, and are thereby valid trigger sets. We chose to consider all possible valid combinations of trigger expressions, to make sure that we do not inadvertently block neces- sary quantifier instantiations. The intuition is that these triggers still contain

more abstractions (e.g. domain encodings used by VerCors) than triggers that would be inferred by Viper. Thereby our triggers are more specific and cause less spurious quantifier instantiations in the SMT solver.

5.4 Implementation

To enable the rewriting of complex subscripts in quantifiers and adding trig-

gers, the optional command line flag --triggers has been introduced. This

command line flag should receive an integer value. Internally, the binary rep- resentation of the value is used to switch on or off the rewriting of complex subscripts and trigger generation. If the second least bit is set (value & 2), the complex subscripts are rewritten. If the least significant bit is set (value & 1), then triggers are generated. This gives us the following possible values:

0 No additional transformation will be made.

1 Will add triggers if possible, without any further transformations. 2 Will rewrite complex subscripts to a new quantified variable, and will not

add triggers.

3 Will rewrite complex subscripts and add triggers if possible.

5.5 Results

To check the effect of rewriting complex subscripts and adding triggers to quantifiers, we compared verification times for the set of representable examples from

Section 3.2. We compare verification times for versionVct3 of the tool, for all

four options for the--triggerflag: No additional transformations (Vct3), only

add triggers (Vct3-t1), only rewrite complex subscripts in quantifiers (Vct3-t2)

and rewriting complex subscrips in quantifiers plus adding triggers (Vct3-t3).

For each version the example is verified 5 times, and the average is shown in Table 5.1.

However from Chapter 4 we know that examples using multi-dimensional ar-

rays could not be verified with versionVct3of the tool. Therefore we have also

applied our changes to implement the triggers features to theVct3-a version

of the tool. This gives us the versionsVct3-a-t1(Vct3-aplus trigger genera-

tion),Vct3-a-t2(Vct3-aplus rewriting complex subscripts in quantifiers) and

Vct3-a-t3(Vct3-a plus rewriting complex subscripts and trigger generation). In Table 5.2 are the verification times for the examples using this version.

References to the exact points in the Git history for version Vct3-t and

File Vct3 Vct3-t1 Vct3-t2 Vct3-t3 floats/TestFloat.java - 17483 - 17516 openmp/add-spec-simd.c 15514 15646 - - openmp/addvec2.pvl 14673 14857 - - basic/CollectionTest.pvl 8483 8429 8400 8395 futures/TestFuture.pvl 9568 9730 9558 9492 manual/option.pvl 10275 10352 10299 10202 floats/TestHist.java 20226 19997 20297 20212 arrays/DutchNationalFlag.pvl 13887 13913 13870 13951 waitnotify/Queue.pvl 8283 8458 8504 8340 carp/summation-kernel-1.pvl 19694 19770 19817 19865 type-casts/TypeExample1. java 8022 8161 8144 8152 layers/LFQHist.java 16130 16717 16398 16441 witnesses/TreeWandSilver. java 29072 29841 30250 29682

Table 5.1: Comparing total verification times (in ms) for versionVct3, with and

without triggers. Sorted by relative speedup fromVct3toVct3-t3.

5.6 Discussion

From the results in Table 5.1 we can note multiple things. First of all we see

that in theTestFloat.javaexample the added triggers eliminate the matching

loop that caused the verification to run indefinitely. In the next two examples we see that the complexity that is introduced by rewriting complex subscripts to additional quantified variables cause the verification to fail. Further we see that generating triggers does not have a positive effect on verification times

for these examples. Next we see that the Challenge2.pvl only succeeds for

version Vct3. However in Section 4.3 we already concluded that verification

of this example is inconsistent. Therefore we can not base conclusions on this result. For the remaining examples we see that the added transformations are of a minimal effect on the total verification times.

For the results with the new array encoding in combination with generating

triggers from Table 5.2, we can note the same for theTestFloat.javaexample,

namely that the added triggers help break the matching loop that caused this example to run indefinitely. Then there are the same two examples that fail to verify when complex subscripts are rewritten to additional quantified variables. But there also are two examples that benefit significantly from the added trans-

formations: TheTestHist.javaexample takes only 67 % of the time to verify

if we rewrite complex subscripts and add triggers, And theprefixsum-drf.pvl

example is also 29 % faster when we rewrite complex subscripts and add triggers. For the remaining examples we see that again the effect on total verification time is minimal.

File Vct3-a Vct3-a-t1 Vct3-a-t2 Vct3-a-t3 floats/TestFloat. java - 32693 - 34219 openmp/ add-spec-simd.c 18842 19329 - - openmp/addvec2.pvl 18530 19629 - - floats/TestHist.java 34278 24412 31877 23122 case-studies/ prefixsum-drf.pvl 98715 96133 68190 69797 arrays/ DutchNationalFlag. pvl 14036 14153 14673 13975 manual/option.pvl 10103 10266 11123 10072 type-casts/ TypeExample1.java 8068 7993 8002 8081 futures/TestFuture. pvl 9741 9880 9708 9819 layers/LFQHist.java 16481 16457 16662 16657 carp/ histogram-submatrix. c 18095 18380 18058 18324 basic/ CollectionTest.pvl 8391 8289 8394 8500 carp/ summation-kernel-1. pvl 18934 19517 21016 19222 waitnotify/Queue.pvl 8197 8190 8304 8351 witnesses/ TreeWandSilver.java 29334 29502 30290 30001

Table 5.2: Comparing total verification times (in ms) for versionVct3-a, with

Because the effect of rewriting complex subscripts in quantifiers and/ or adding triggers can have a positive or negative effect depending on the context,

we chose to not enable this feature by default. Only theTestFloat.java ex-

ample does not verify in the Vct3 version, and can be verified when triggers

are generated. However we think that the matching loop is caused by a more fundamental problem, that needs to be solved in the VCTFloat domain that is used by VerCors to model floating point numbers. For the other examples, rewriting complex subscripts and generating triggers does not necessarily have

a positive effect on the performance. Therefore we have kept the --trigger

option with the integer value to specify what transformations the user wants to apply. In this way expert users that think they can benefit of these features can still use them, but for users unaware of these features the safest way is to not use them by default.

Chapter 6

Related work

This research is focused very specifically at the performance of program verification with VerCors. In literature we can look more broadly at research and comparison of performance between different verification techniques. Cassios et al have published an experience report in which Symbolic Execution (SE) is compared to Verification Condition Generation (VNG) [11]. However, to the best of our knowledge, there is no report on the analysis of specifically which parts of a program have the most influence on the performance of verification.

Leino et al identify matching loops in quantified expressions as a significant contributor to instabilities in performance and user experience in program verification [13]. They propose to move trigger logic away from the SMT solver and into the high level verifier. The paper presents three techniques for trigger selection that are implemented in the Dafny verifier: Quantifier splitting, trigger sharing and matching loop detection. First terms that could be part of a trigger are collected. Then it enumerates subsets of these terms and rejects terms that are not allowed. The trigger candidates are evaluated to see if they can cause a matching loop. If this is the case, then the trigger is also removed from the Pool. In practice users tend to collect related conditions under a single quantifier. However, the technique to prevent matching loops may cause essential triggers to be eliminated if the conditions in the quantifier are only weakly related. To mitigate this problem, the quantifier is split into multiple quantifiers. Thereby preventing the case that a possible matching loop for one condition eliminates an essential trigger for another condition. To recover the triggers that were eliminated by the matching loop detection for the quantifier that is split, the pool of triggers is enriched with the trigger candidates that derive from the same split quantifier. This is what they call trigger sharing. The techniques are implemented in the Dafny verification tool. They show significant performance gains on Dafny’s test suite and larger verification problems.

Chapter 7

Conclusion

In this research we wanted to investigate what performance bottlenecks in program verification with VerCors were, in order to optimize performance of the tool. First we have identified two program constructs that could cause poor performance in verification, namely the encoding of (injectivity of) arrays and the use of universal quantifiers without triggers. We have also developed a framework that can help identify which program constructs take relatively more time to verify. First results have shown that the framework can identify the constructs that were also identified by experts as possibly being problematic. We think this framework can help identify new problematic constructs as the tool evolves. If the encoding of existing concurrency abstractions in VerCors changes, the learned normalized verification times for the nodes that model this abstraction will automatically change accordingly. Thus if the change has a negative effect on performance, this will show up in higher verification times. Further, if new abstractions are added to the tool, for instance by adding new node types in the COL AST, then they can easily be added to the framework. This makes it possible to immediately monitor how the performance of verification of this abstraction relates to verification of the rest of the programs.

Next we have developed a new encoding for arrays. Before this research the support for multi-dimensional arrays was broken in VerCors, due to an update in the back-end verifier. Therefore we had to compare results to an older version of VerCors. We have seen a verification speedup of 4 times in cases where array injectivity was a relevant property.

Finally we have investigated if we could benefit from generating trigger sets for universal quantifiers in VerCors. We have seen that in specific cases adding triggers can improve performance. However additional transformations that are needed to be able to generate valid triggers can have a negative effect.

7.1 Future work

In order to improve the framework to monitor normalized verification times for AST nodes, future work could focus on also considering the structure of the syntax tree. For instance taking into account how deep (invariants for) loops are nested, or if certain expressions are used in special contexts.

In the current situation, the learning of normalized verification times for AST nodes is done before the AST is rewritten. This is at the highest level of abstraction. To identify more specifically which part of the transformation causes an abstraction to be slow to verify, it could also be interesting to add counting passes further on in the transformation chain, in order to monitor unit times at different abstraction levels.

With the new array encoding, multi-dimensional arrays no longer have to be flattened to a one-dimensional array. This eliminates the need for a non-linear calculation to map indices from the original array to the flattened array. In VerCors there are rewrite rules that rewrite quantified expressions with subscripts to help the back-end verifier to reason about the non-linear arithmetic. It would be interesting to investigate if eliminating these rewrite rules would have a positive effect on performance.

In transforming abstractions VerCors can generate various quantified expressions. In subsequent transformations these quantifiers might be split and rewritten further. It could be interesting to investigate how these transformations influence the ability to generate valid trigger sets, and how this influences performance.

Currently, the only strategy to select trigger sets for quantified expressions is to generate the powerset of all trigger candidates and filter the valid sets. Future work could focus on how to reduce the set of triggers without eliminating essential trigger expressions.

Bibliography

[1] Afshin Amighi, Stefan Blom, Saeed Darabi, Marieke Huisman, Wojciech Mostowski, and Marina Zaharieva-Stojanovski. Verification of concurrent

systems with VerCors. InInternational School on Formal Methods for the

Design of Computer, Communication and Software Systems, pages 172– 216. Springer, 2014.

[2] Mike Barnett, Bor-Yuh Evan Chang, Robert DeLine, Bart Jacobs, and K Rustan M Leino. Boogie: A modular reusable verifier for object-oriented

programs. InInternational Symposium on Formal Methods for Components

and Objects, pages 364–387. Springer, 2005.

[3] Nils Becker, Peter M¨uller, and Alexander J. Summers. The axiom profiler:

Understanding and debugging smt quantifier instantiations. In Tom´aˇs Vo-

jnar and Lijun Zhang, editors, Tools and Algorithms for the Construction

and Analysis of Systems, pages 99–116, Cham, 2019. Springer International Publishing.

[4] Stefan Blom, Saeed Darabi, Marieke Huisman, and Wytse Oortwijn. The VerCors tool set: Verification of parallel and concurrent software. In Nadia

Polikarpova and Steve Schneider, editors,Integrated Formal Methods, pages

102–110, Cham, 2017. Springer International Publishing.

[5] John Boyland. Checking interference with fractional permissions. InInter-

national Static Analysis Symposium, pages 55–72. Springer, 2003.

[6] Lilian Burdy, Yoonsik Cheon, David R Cok, Michael D Ernst, Joseph R Kiniry, Gary T Leavens, K Rustan M Leino, and Erik Poll. An overview

of JML tools and applications. International journal on software tools for

technology transfer, 7(3):212–232, 2005.

[7] Leonardo De Moura and Nikolaj Bjørner. Z3: An efficient smt solver. In International conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 337–340. Springer, 2008.

[8] David Detlefs, Greg Nelson, and James B. Saxe. Simplify: A theorem

[9] Christian Haack, Marieke Huisman, Cl´ement Hurlin, and Afshin Amighi.

Permission-based separation logic for multithreaded java programs. arXiv

preprint arXiv:1411.0851, 2014.

[10] Charles Antony Richard Hoare. An axiomatic basis for computer program-

ming. Communications of the ACM, 12(10):576–580, 1969.

[11] I. T. Kassios, P. M¨uller, and M. Schwerhoff. Comparing verification condi-

tion generation with symbolic execution: an experience report. In R. Joshi,

P. M¨uller, and A. Podelski, editors, Verified Software Theories Tools Ex-

periments (VSTTE), volume 7152 ofLecture Notes in Computer Science, pages 196–208. Springer-Verlag, 2012.

[12] K Rustan M Leino, Peter M¨uller, and Jan Smans. Verification of concurrent

programs with Chalice. InFoundations of Security Analysis and Design V,

pages 195–222. Springer, 2009.

[13] K Rustan M Leino and Cl´ement Pit-Claudel. Trigger selection strategies

to stabilize program verifiers. In International Conference on Computer

Aided Verification, pages 361–381. Springer, 2016.

[14] Peter M¨uller, Malte Schwerhoff, and Alexander J Summers. Viper: A

verification infrastructure for permission-based reasoning. InInternational

Conference on Verification, Model Checking, and Abstract Interpretation, pages 41–62. Springer, 2016.

[15] John C Reynolds. Separation logic: A logic for shared mutable data struc-

tures. InLogic in Computer Science, 2002. Proceedings. 17th Annual IEEE

Symposium on, pages 55–74. IEEE, 2002.

[16] UTwente-FMT. VerCors example repository. https://github.com/

utwente-fmt/vercors/tree/master/examples, 2018. [Online; accessed 24-October-2018].

Appendix A

VerCors repository history

Over time various versions of the VerCors tool have been key in the context of this thesis. With regard to the encoding of arrays these were the points where

the old encoding was still working (Vct1), the version on which the new encoding

is build (Vct2) and a more recent version at the time of writing (Vct3). The

Git commit hashes that mark these points of time in the history of the VerCors

repository1 _{are the following:}

Vct1 e6f0a2e198ca4e0ccf1fd3e3baae3bb7421cbaec

Vct2 50262799fdef5ec65bd396e5d19cac76fd6c07f5

Vct3 6f266bacd9e2bb056eff0dddf0fc08ffdca22be8

A.1 Time learning framework

The implementation of the time learning framework as duscussed in Section 3.3

is based on theVct1version of the tool. The Git commit hash that marks this

point in the VerCors repository is:

7befcbef70fea0425f80092bfaadcc329b514cc0

A.2 New array encoding

The new array encoding, as discussed in Section 4.2, that was based on Vct2,

can be found inVct2-a. InVct3-athe new array encoding is merged with the

later version ov VerCors. The Git commit hashes that mark these points in the VerCors repository are the following:

Vct2-a 716a46ad2af31426011287adec1daeecb90d4070

Vct3-a eed4c0530d161f269f05ed8dde0026bfb2431d11

A.3 Trigger generation

The implementation for generating triggers, as discussed in Chapter 5, based

on versionVct3of the VerCors tool, can be found in versionVct3-t. In version

Vct3-a-tthe implementation of triggers is merged with the new array encoding from Section 4.2. The Git commit hashes that mark these points in the VerCors repository are:

Vct3-t 96922644acb6aa27ab8d4f9d94354f34b159cc0e

Appendix B

Set of representable

examples

Below is the list of examples that is used to compare performance of different versions of the VerCors tool. The examples used are as they were current at commit 7b738ddb45483e43a96f34420c147555e7349f7c.

These examples are all compatible with versionVct1of VerCors, and are fixed

where possible for the other versions of the tool.

type-casts/TypeExample1.java floats/TestHist.java basic/CollectionTest.pvl carp/summation-kernel-1.pvl floats/TestFloat.java openmp/addvec2.pvl carp/histogram-submatrix.c witnesses/TreeWandSilver.java futures/TestFuture.pvl case-studies/prefixsum-drf.pvl arrays/DutchNationalFlag.pvl manual/option.pvl waitnotify/Queue.pvl layers/LFQHist.java

openmp/add-spec-simd.c

Appendix C

Results

In this appendix are the results as presented in this thesis. Measurements are done with a system with the following specifications:

Dell Precision M2800

Processor: Intel(R) Core(TM) i7-4710MQ CPU @ 2.50GHz, 2501 Mhz, 4

Core(s), 8 Logical Processor(s).

Memory: 8.00 GB

Operating system: Windows 10 Pro 1803 X64.

C.1 Normalized verification time per AST node

In this section are the learned unit times as discussed in Section 3.3 and 3.4. In Listing C.1 are the learned unit times. In Listing C.2 is the learned absolute

In document Performance of program verification with VerCors (Page 33-52)