Invariants - SRC RR 160 pdf

In practice, almost all pivot fields are injective (one-to-one), that is, if b is a pivot field and u and v are distinct objects in the domain of b, then b[u] and b[v] are distinct (or they are both nil). The reason for this is easily seen by considering the prototypical example involving a dynamic dependency, shown in Figure 10. The call to R from P modifies c[b[t]]. This affects the value of a[t]. If the pivot field b were not injective, it would also affect a[u] for any u such that b[u] = b[t].

unit M
    type T
    spec var a: T → . . .
    ...
    proc P(t: T) modifies a[t]

unit N
    type U
    spec var c: U → . . .
    ...
    proc R(u: U) modifies c[u]

unit MImpl import M, N
    var b: T → U
    depends a[t: T] on c[b[t]]
    ...
    impl P(t: T) is . . . R(b[t]) . . . end

In general, when a[t] is modified by changing part of its representation c[b[t]] , the only hope for showing that the modification obeys the modifies list

modifies a[t]

is to require the injectivity of b .
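The effect of a non-injective pivot can be observed in a small executable model. The Python sketch below is not part of the specification language. It assumes, purely for illustration, that the fields b and c are dictionaries, that the abstract value a[t] is simply c[b[t]], and that R increments c[u]; the names abstract_a and changed_by_P are hypothetical.

```python
# Hypothetical model: fields are dictionaries keyed by object name.
# The abstraction a[t] = c[b[t]] is an assumption made for illustration.

def abstract_a(b, c, t):
    """Value of the abstract field a[t], which depends on c[b[t]]."""
    return c[b[t]]

def R(c, u):
    """Models proc R(u: U) modifies c[u]."""
    c[u] += 1

def P(b, c, t):
    """Models impl P(t: T), which calls R(b[t])."""
    R(c, b[t])

def changed_by_P(b, c, t, domain):
    """Return the objects s whose abstract value a[s] is changed by P(t)."""
    before = {s: abstract_a(b, c, s) for s in domain}
    P(b, c, t)
    return sorted(s for s in domain if abstract_a(b, c, s) != before[s])

# Injective pivot: only a["t"] changes, as "modifies a[t]" promises.
injective = changed_by_P({"t": "u1", "s": "u2"}, {"u1": 0, "u2": 0}, "t", ["t", "s"])

# Non-injective pivot: b["s"] == b["t"], so a["s"] changes as well.
shared = changed_by_P({"t": "u1", "s": "u1"}, {"u1": 0, "u2": 0}, "t", ["t", "s"])
```

In the second call the modification escapes the modifies list, which is the situation that requiring injectivity rules out.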

Note that although we find injectivity necessary to be able to verify interesting programs, we have not found injectivity to be a requirement for soundness.

By the way, it is surprisingly difficult to verify a procedure that initializes an injective field. While showing that a command like

b[t] :=new(U)

maintains the injectivity of b is easy, a command like

b[t] :=NewU()

does not verify, even if procedure NewU is specified to ensure ¬alloc[result] ∧ alloc′[result], that is, that result is unallocated in the pre-state and allocated in the post-state. The checker dreams up the possibility that NewU allocates a new U object, squirrels it away into some b field, and then returns it. To cope with this problem, we enrich the specification language with the expression virgin[x], which means that x is not, and has never been, the value of any object field or global variable. The details are found in a paper by Leino and Stata [34].
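The scenario that the checker must rule out can be made concrete with a short Python sketch. Everything here is a hypothetical model: Heap.new stands in for new(U), None stands in for nil, and sneaky_new_u is an invented procedure that returns a freshly allocated object but also stores it in a b field first.

```python
# Hypothetical model of the scenario the checker has to consider.

def is_injective(b):
    """True if distinct objects have distinct non-nil (non-None) b values."""
    values = [v for v in b.values() if v is not None]
    return len(values) == len(set(values))

class Heap:
    def __init__(self):
        self.next_id = 0

    def new(self):
        """Models new(U): returns a freshly allocated object identity."""
        self.next_id += 1
        return f"u{self.next_id}"

def sneaky_new_u(heap, b, victim):
    """Returns a freshly allocated object, but stores it in b[victim] first."""
    result = heap.new()
    b[victim] = result
    return result

heap = Heap()
b = {"s": None, "t": None}

b["t"] = heap.new()                   # direct allocation keeps b injective
ok_after_new = is_injective(b)

b["t"] = sneaky_new_u(heap, b, "s")   # result is fresh, yet b["s"] == b["t"]
ok_after_sneaky = is_injective(b)
```

A postcondition that only says the result is freshly allocated does not exclude sneaky_new_u, whereas a result that has never been the value of any field does.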

How should a programmer use the specification language to record the design decision that a field is to be injective? One might first try to include this as part of the representation of an object’s validity, producing a rep declaration like

rep valid[t: T] ≡ . . . ∧ (b[t] = nil ∨ (∀s: T :: s ≠ t ⇒ b[s] ≠ b[t]))

But this seems problematical. It makes valid[t] depend not just on b[t], but on b[s] for all s of the appropriate type. It seems perverse to think of this unbounded collection of b[s]'s as part of the "representation" of valid[t].

A simpler and better strategy is to extend the specification language with the notion of a program invariant: a declaration of the form

invariant J

records the intention that the predicate J hold at every procedure call and return. For example, to specify the injectivity of b , the following program invariant can be used:

invariant ∀t, u: T :: t ≠ nil ∧ u ≠ nil ∧ t ≠ u ⇒ b[t] = nil ∨ b[t] ≠ b[u]

The checker enforces program invariants with two checks. First, it checks that J is true at the "beginning of time". Second, it checks that every procedure respects J (assuming that all the procedures it calls respect J), that is, it conjoins J to the pre- and postcondition of every procedure implementation and procedure call.
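Conjoining J to every pre- and postcondition has a simple dynamic analogue, sketched below in Python. This is only an illustration: the checker described here reasons statically, whereas the hypothetical decorator respects tests J at run time on entry to and return from a procedure. The names InvariantViolation, b_injective, and set_pivot are invented for the example.

```python
import functools

class InvariantViolation(Exception):
    """Raised when the program invariant fails at a call or a return."""

def respects(invariant):
    """Wrap a procedure so the invariant is checked at call and at return."""
    def wrap(proc):
        @functools.wraps(proc)
        def checked(state, *args):
            if not invariant(state):
                raise InvariantViolation(f"on entry to {proc.__name__}")
            result = proc(state, *args)
            if not invariant(state):
                raise InvariantViolation(f"on return from {proc.__name__}")
            return result
        return checked
    return wrap

def b_injective(state):
    """J: distinct objects have distinct non-nil (non-None) b values."""
    values = [v for v in state["b"].values() if v is not None]
    return len(values) == len(set(values))

@respects(b_injective)
def set_pivot(state, t, u):
    state["b"][t] = u

state = {"b": {"t1": None, "t2": None}}   # all pivots nil: J holds initially
set_pivot(state, "t1", "u1")               # preserves J

try:
    set_pivot(state, "t2", "u1")           # makes b["t1"] == b["t2"]
    violation_detected = False
except InvariantViolation:
    violation_detected = True
```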

The beginning-of-time test is straightforward and presents no modularity problems. It consists of the following proof obligation for each declared program invariant J:

(∀t :: t = nil ∨ ¬alloc[t]) ⇒ J

That is, J must hold in a state in which no non-nil objects have been allocated. More precisely, this proof obligation must follow from the background predicate. If J contains free variables of primitive types like integers, then it must hold regardless of their values. To enforce invariants about global variables, the init-vars technique described in Section 9.2 is more useful. In our experience, we mostly use program invariants to assert universally quantified properties of objects of a certain type, like injectivity. In this case, the beginning-of-time test passes trivially.

The second test, that every procedure respects J, involves subtle modularity issues. The basic idea is simple: when in a scope D the checker desugars a specification (either in reasoning about a procedure call or in checking a procedure implementation), it adds to the pre- and postcondition all invariants whose declarations are in D. If the program consists of a single global scope, then the soundness of this approach is clear: the change to the pre- and postconditions is the same for reasoning about the calls as for checking the implementations. If the program consists of many scopes, then modularity requirements must be imposed to achieve soundness, by ensuring that primitive steps in a scope where the invariant is not visible cannot falsify the invariant. We will build up to the correct modularity requirements in stages. To begin with, we assume that the invariant contains concrete variables only.

The first modularity requirement for invariants that comes to mind is: a program invariant must be declared near all of its free variables.

Two declarations are near one another if they are contained in the same unit. It follows that they are visible in the same scopes.

This simple modularity requirement achieves soundness because an invariant cannot be falsified except by modifying its free variables. Thus, those procedures whose implementation lies outside the scope of the invariant preserve the invariant because they cannot mention any of its free variables. The rest of the procedures are proved to maintain the invariant.

Unfortunately, this simple requirement is too strong because of the special concrete variable alloc , which represents the set of allocated objects and occurs implicitly in almost all invariants: recall from Section 8.1 that a quantification

∀t: T :: . . .

is desugared to

∀t: T :: alloc[t] ⇒ . . .

Consequently, it is necessary to loosen the simple rule to allow program invariants to mention alloc. This introduces the danger of a procedure falsifying an invariant invisible to it by modifying alloc. We address this difficulty by observing that the only way a procedure can directly modify alloc is by performing an allocation, and we can demand of an invariant that it be maintained by any allocation in any portion of the program in which it is not visible. To this end, we say that an invariant J passes the blind allocation test for a type T if J is invariant under new(T).

This brings us to the second version of the modularity requirement for invariants:

(0) a program invariant must be declared near all of its free concrete variables, except alloc, and

(1) for all types T, either (a) T is declared near the invariant, or (b) the invariant passes the blind allocation test for T, or (c) T is not mentioned in the invariant.

Here's a sketch of a justification for this version of the modularity requirement: Because of (0), the only invariant-falsifying primitive steps that we need to worry about are those that modify alloc, that is, expressions of the form new(T) for some type T. But it is impossible for the expression new(T) to falsify the invariant, because for such a T, neither (a) nor (b) nor (c) could hold: not (a), since if T is declared near the invariant, the invariant is visible wherever new(T) can be called; not (b), since the blind allocation test explicitly checks that new(T) maintains the invariant; and not (c), since new(T) cannot falsify the invariant if the invariant doesn't mention T and passes the blind allocation test for T.

In order to pass the blind allocation test, a programmer must choose appropriate default values for the fields of an object type. For example, if a pivot is specified to be injective, its default value should be nil .
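The role of the default value can be illustrated with a finite check in Python. This is a sketch under stated assumptions, not the checker's proof obligation: it tests the injectivity invariant on a handful of hand-written sample states rather than proving it for all states, it models nil as None, and the function names are hypothetical.

```python
def injective_on_allocated(alloc, b):
    """J: distinct allocated objects have distinct b values, unless b is nil."""
    values = [b[t] for t in alloc if b[t] is not None]
    return len(values) == len(set(values))

def passes_blind_allocation(default, sample_states):
    """Check on sample states that new(T) preserves J when the fresh
    object's b field holds `default`. A finite test, not a proof."""
    for alloc, b in sample_states:
        if not injective_on_allocated(alloc, b):
            continue  # only states that satisfy J are relevant
        fresh = object()                  # identity distinct from all others
        new_alloc = alloc | {fresh}
        new_b = {**b, fresh: default}
        if not injective_on_allocated(new_alloc, new_b):
            return False
    return True

samples = [
    (set(), {}),
    ({"t1"}, {"t1": None}),
    ({"t1"}, {"t1": "u0"}),
    ({"t1", "t2"}, {"t1": "u0", "t2": "u1"}),
]

nil_default_ok = passes_blind_allocation(None, samples)
shared_default_ok = passes_blind_allocation("u0", samples)
```

With nil as the default, a newly allocated object never collides with an existing pivot value; with any fixed non-nil default, some sample state already contains that value and the allocation falsifies the invariant.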

Let us return to a problem that we touched on in Section 9.1, namely the problem of declaring a data type containing unique identifiers:

unit U
    type T
    spec var valid: T → bool
    proc init(t: T): T
        modifies valid[t]
        ensures valid[t] ∧ result = t

unit UImpl import U
    var id: T → int
    . . . (other fields) . . .
    rep valid[t: T] ≡ . . .
    var gcount: int
    impl init(t: T): T is
        id[t] := gcount; gcount := gcount + 1
        . . .
        result := t
    end

To record the design decisions about id and gcount , one can add to UImpl the program invariants:

invariant ∀t: T :: t ≠ nil ∧ valid[t] ⇒ id[t] < gcount

invariant ∀t, u: T :: t ≠ nil ∧ u ≠ nil ∧ valid[t] ∧ valid[u] ∧ t ≠ u ⇒ id[t] ≠ id[u]
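As an informal check of these two invariants, the unit can be modeled in Python. The class below is a hypothetical executable stand-in for UImpl: because the rep of valid is elided in the text, the sketch simply assumes that valid[t] holds exactly for the objects that init has been applied to, and it represents that as a set.

```python
class UImplModel:
    """Hypothetical executable model of unit UImpl."""

    def __init__(self):
        self.gcount = 0      # models var gcount: int
        self.id = {}         # models var id: T -> int
        self.valid = set()   # assumption: objects t for which valid[t] holds

    def init(self, t):
        """Models impl init(t: T): T."""
        self.id[t] = self.gcount
        self.gcount += 1
        self.valid.add(t)
        return t

    def ids_below_gcount(self):
        """First invariant: valid[t] implies id[t] < gcount."""
        return all(self.id[t] < self.gcount for t in self.valid)

    def ids_unique(self):
        """Second invariant: distinct valid objects have distinct ids."""
        ids = [self.id[t] for t in self.valid]
        return len(ids) == len(set(ids))

m = UImplModel()
for name in ["x", "y", "z"]:
    m.init(name)
```

Both predicates hold after every call to init in this model, because each call assigns the current value of gcount and then increments it.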

In this approach, the statements about id and gcount that were problematical to place in the rep declaration (see Section 9.1) have been moved into program invariants. The rep declaration for valid[t] concerns only fields of t. This seems an improvement, but this approach still has two problems: one is giving the public init method the license to modify the private variable gcount, the other is allowing the abstract variable valid to appear in a program invariant.

To solve the first problem, we can introduce an abstract variable, say istate for internal state, in the interface U :

spec var istate: any

We then allow init to modify istate , but istate has no other occurrences in the interface:

proc init(t: T): T

modifies valid[t],istate

ensures valid[t] ∧ result=t

Finally, we add the entire dependency of istate on gcount to the module UImpl :

depends istate on gcount

which by downward closure gives init the license to modify gcount .
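The way a dependency extends a modifies list can be sketched as a reachability computation. The Python below is a deliberately simplified model: it treats variables as plain names, ignores the distinction between static and dynamic dependencies and between fields of different objects, and the function downward_closure is a hypothetical helper rather than the checker's actual algorithm. The dependencies of valid are elided in the text and are therefore left out.

```python
def downward_closure(modifies, depends):
    """Return every variable licensed by a modifies list, following
    'depends x on y' edges from x to y transitively."""
    licensed = set(modifies)
    worklist = list(modifies)
    while worklist:
        x = worklist.pop()
        for y in depends.get(x, ()):
            if y not in licensed:
                licensed.add(y)
                worklist.append(y)
    return licensed

# In a scope that sees only the interface U, no dependencies are visible.
seen_from_interface = downward_closure(["valid", "istate"], {})

# In UImpl, "depends istate on gcount" is visible.
seen_from_uimpl = downward_closure(["valid", "istate"], {"istate": ["gcount"]})
```

Clients of U see only that init may modify valid[t] and istate, while inside UImpl the same modifies list also covers gcount.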

The second problem is that the invariants mention valid[t] , but so far we have considered invariants containing concrete variables only. We cannot just eliminate the occurrences of valid[t] , since no default value for id will make the second invariant pass the blind allocation test for T . The blind allocation test is needed, since T is mentioned in the invariant but T and the invariant are not declared near one another.

One way to solve the second problem is to allow abstract variables in program invariants. We believe that it is sound to do so, provided that the invariant satisfies (0) and (1) from above, and also, for each abstract variable a appearing in the invariant:

(2) all dependencies of a are static, and

(3) either (a) the invariant is declared near a , or (b) the invariant is declared near every rep declaration of a and near every dependency of a .

However, this story is getting more complicated than we like. Perhaps it is best simply to forbid abstract variables from appearing in program invariants. If we do, we need some other way of dealing with the occurrences of valid[t] in the program invariants in the unique identifiers example. This we can do simply by inlining them, that is, by replacing valid[t] by whatever expression is given as its rep. Although awkward, this entails no loss of modularity or information hiding, since the invariants occur in a scope ( UImpl ) where the representation of valid[t] is visible.

10 Implementation status

Almost everything described in this paper has been implemented in the Modula-3 Extended Static Checker. Exceptions are:

0. the checker implements only the individual residues, not the shared residue sres described in Section 6 beginning on page 34,

1. the checker does not enforce the dependency segregation restriction of Section 7.1 on page 45, but instead uses a more general way of computing the dynamic closure ("upward closure of dynamic predecessors"), which does not necessitate the restriction,

2. the checker does not enforce the disjoint ranges requirement of Section 7.2 (and as mentioned in that section, we leave it to the programmer to avoid abstract aliasing), and

3. the checker does not implement the initialization order checking of Section 9.2.

Our experience with the checker is described in more detail in our companion paper [8]. We have applied the checker to thousands of lines of code, both from the Modula-3 libraries and from programs that use the libraries. In specifying the libraries, we constantly used static and dynamic dependencies.

After experimenting with our Modula-3 checker, we embarked on another project to build an extended static checker for Java [13, 32]. In the ESC/Java project, we circumvented most of the difficulties described in this paper by omitting data abstraction from the annotation language. To partially make up for the omission, we provide object invariants [33] and ghost variables, but the fundamental basis of our decision was to accept less thorough checking in order to produce a simpler checker.

11 Related work

Most work on data abstraction seems to be directed at one of two goals: algorithm design or structuring large systems.

When data abstraction is used for algorithm design, the representation is "inlined" into the site of use as the refinement step of the design [5, 16, 25, 22, 39, 17, 14]. Consequently, the work on this kind of data abstraction is largely unconnected with the large system structuring problems that we are concerned with in this paper. This is not to deny that the underlying mathematics of data abstraction applies to both enterprises. Indeed, our first verification condition generator did not use explicit functionalization of abstract variables but instead used the "change of coordinates" approach common in algorithm refinement. However, we found that the result was that our theorem-prover was constantly forced to apply the "one-point rule" and that for our purposes, explicit functionalization is preferred.

Turning to data abstraction for the purpose of structuring large systems, the earliest treatments were in contexts where there was no independent information-hiding mechanism (like our units) and therefore the problems addressed in the present paper did not arise, or were ignored in the semi-formal treatments in the literature. These treatments include Milner's definition of simulation [37], Hoare's classic treatment of abstraction functions [18], and the influential work of Liskov and Guttag and the rest of the CLU community [35].

The first programming language to support information hiding in the way our units do was Mesa [38], with its definition modules and implementation modules. The Mesa designers appear to have been influenced by Parnas’s classic paper on decomposing systems into modules [46]. Mesa in turn influenced Modula [50], Modula-2 [51], Modula-3 [44], Oberon-2 [40], and Ada [4]. Ernst, Hookway, and Ogden have studied the problem of specifying Modula-2 programs where the objects of a module may share some global state [12]. These authors share our concern for modular verification, but the possible scopes they consider are not rich enough to allow subclasses or the RdRep interface of our example.

Another, rather different, approach to hiding information is to classify declarations as public or private. This approach is used in Oberon [49], C++ [11], and Java [15]. In the course of the ESC/Java project [13, 32], we used the modularity requirements of the units approach to guide our design for visibility of invariants in the public/private approach [33].

Explicit dependency declarations, one of the central ideas of this paper, were introduced in Leino's PhD thesis [28] in 1995. Between that time and this, they have been applied in a number of contexts: they played a central rôle in ESC for Modula-3 [8], and they were incorporated in the specification languages JML [27] and Larch/C++ [26] and in the programming logic of Müller and Poetzsch-Heffter [41]. Another application (or reformulation) of dependency declarations is Leino's technique of Data Groups [30].

As described in Section 7.2, our best attempt at a solution to the problem of abstract aliasing [7] is not fully satisfactory. We do find that our framework of modular soundness and dynamic dependencies has allowed us to give a more incisive definition of the problem than other approaches in the literature, such as Hogg's Islands [20], Almeida's Balloons [3], Utting's Extended Local Stores [48], the Flexible Aliasing Protection of Noble et al. [45], and Boyland's Alias Burying [6].

A few other researchers have employed declarations similar to our depends declaration connecting an abstract variable to the (more) concrete variables in its representation. Daniel Jackson’s Aspect system features dependencies much like ours, but his motivation seems to be to avoid the need for reasoning about the details of the actual representation, whereas we have argued that dependency declarations are necessary even in the presence of full representation declarations [21]. The COLD specification language of Jonkers includes abstract variables (called functions) and dependency declarations between them, but COLD seems not to allow an abstract variable to appear in a modifies list, so it doesn’t address many of the problems we have wrestled with [23].

12 Conclusions

We have applied precise formal methods to systems programs that are typical examples of the programming techniques used by careful and experienced contemporary programmers. We found that the formal methods described in the verification literature are inadequate to deal with the patterns of data abstraction and modularization in these programs. We have developed new formal methods to address these shortcomings.

Central to the new methods is the concept of an abstraction dependency, which is a kind of abstraction of an abstraction function, in the same sense that an opaque type is an abstraction of a concrete type. A dependency specifies one or more of the variables that occur in an abstraction function, but hides the detailed definition of the function. Just as an opaque type may be widely visible in a multi-module program, while the corresponding concrete type may be visible only narrowly, we discovered that it is often useful to make a dependency more widely visible than the abstraction function itself.

Different kinds of abstraction dependencies occur in different styles of design. Top-down programming leads to static dependencies, where an abstract field of an object is represented in terms of other fields of that same object. Bottom-up programming with reusable libraries leads to dynamic dependencies, where an abstract field of an object is represented in terms of fields of other objects, reachable indirectly from the first object.

We have shown how to verify programs in the presence of static and dynamic dependencies by rewriting modifies lists, preconditions, and postconditions.

For static dependencies, we have two simple modularity requirements, which are laws for the placement of dependency declarations in a multi-module program. The requirements do not seem to preclude any useful designs, and we have a formal proof of modular soundness for the requirements. The formal proof makes use of our identification of modular soundness with the monotonicity of verifiability with respect to scope. For dynamic dependencies, we have several modularity requirements, but no soundness theorem, nor any confidence that the list of requirements is complete.

In our experience with static checking of contemporary program libraries, we have found that we use dependencies constantly in our annotations. We have also found that dependencies provide a new perspective on old problems like the problem of encapsulation and rep exposure.

Acknowledgments

At several points in the paper, we have remarked that implementing our ideas in a realistic program checker was critical to many of our discoveries. Here we will remark that Dave Detlefs was critical: he wrote the majority of the code, and he was often the first to see the methodological implications of practical issues.

In Section 6, we attributed the subtle example program to Jim Saxe's penetrating intelligence; we also would like to thank him for his help with cyclic dependencies and other thorny problems.

Every project owes a debt to its devil’s advocates, and to our ESC project,
