Extending ILR - Theory and Implementation of Software Bounded Model Checking

This section extends ILR by an additional sort representing heap allocation states as well as multiple functions operating on this sort.

5.2.1 Sorts

Support for C’s heap memory allocation is implemented as an extension to the ILR language as well as an extension to the term rewriting systems for encoding. The theory uses pairs of pointers and integers to encode individual objects, but these pairs are part of neither the language extension nor the term rewriting system. Instead, they are used only in axioms and proofs, and the presented extension introduces only a single new sort, which represents the heap memory allocation state as a whole. The new sort is shown in table 5.1. In this table, ⟨N⟩ indicates the width of the heap model’s pointers.

Sort Family              Pattern    Examples
Heap allocation state    h⟨N⟩       h16, h32, h64

Table 5.1: Heap sort

Apart from the sort for the heap state itself, the language extension also requires sorts for indices and sizes. The theory, as presented here, is not restricted to a single index sort: any sort that provides the function + and the predicates < and ≤, each with the usual meaning, can serve as an index sort. Bitvectors and mathematical integers are both conceivable for this, though we will restrict ourselves to ILR’s pointer sort for indices and its integer sort for sizes and offsets, both of which are bitvectors.

As already done in section 3.7 and chapter 4, we use placeholders instead of concrete sorts because the theory is independent of the target architecture’s bitwidth. These placeholders are shown in table 5.2. Note that we intentionally reuse the placeholders P and I as these are the sorts used in LLBMC.

Sort Placeholder    Instantiatable Sorts
H                   Heap allocation state
P                   Pointer sort
I                   Size and offset sort

Table 5.2: Heap sort placeholders used in the following and their possible instantiations

For consistency with the rewrite rules presented in other parts of this thesis, we will use ILR’s notation for the addition and comparison operations, namely add_{P,I}, eq_I, and leu_I. To improve the readability of the axioms and proofs in this chapter, a simplified notation is used for these. First, we assume a single target architecture with a single bitwidth for pointers, and we assume all integers to be of the same bitwidth.³ This allows us to omit the subscripts indicating the functions’ sorts. Furthermore, we use the usual mathematical symbols for addition, equality, and integer comparison instead of ILR’s function symbols. Finally, we omit the predicate ⟨·⟩ for the boolean sort.

The variable naming conventions used in this chapter are shown in table 5.3. As usual, whenever one of these names is used, its sort is implied to be the one given in the table.

Sort Family              Variables
Heap allocation state    h, h1, h2
Pointers                 p, q, r, o
Sizes and offsets        s, t, u

Table 5.3: Heap variable naming conventions

For the axiomatization of the heap theory, we use pairs of pointers and integers to represent objects, as defined in [ISOC99, section 6.2.6.1]. Given an object (p, s), we will call p its address and s its size. Furthermore, the interval [p, p + s) will be called the object’s memory range. Equality of pairs is defined by

∀p, s, q, t (p, s) = (q, t) ↔ (p = q ∧ s = t).

According to [ISOC99, section 6.3.2.1], an lvalue may only be referenced if it refers to an object. From a low-level perspective, this means that a memory range may only be accessed if it is completely contained in an object’s memory range. This motivates the definition of the auxiliary predicate contains:

Definition 5.1 (Contains). The predicate contains (P × I × P × I) expresses that a memory range (p, s) contains another memory range (q, t) and is defined by the following axiom:

∀p, s, q, t contains(p, s, q, t) ↔ p ≤ q ∧ q + t ≤ p + s.

Similarly, [ISOC99, section 7.20.3] requires objects to be mutually disjoint, which motivates the following definition of the auxiliary predicate disjoint:

Definition 5.2 (Disjoint). The predicate disjoint (P × I × P × I) expresses that two memory ranges (p, s) and (q, t) do not overlap and is defined by the following axiom:

∀p, s, q, t disjoint(p, s, q, t) ↔ p + s < q ∨ q + t < p.
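As a minimal sketch of Definitions 5.1 and 5.2, the two predicates can be written over unbounded Python integers. Using plain integers instead of ILR bitvectors (and thereby ignoring wrap-around) is an assumption made purely for illustration, as are the example addresses.

```python
def contains(p: int, s: int, q: int, t: int) -> bool:
    """Range (p, s) contains range (q, t): p <= q and q + t <= p + s."""
    return p <= q and q + t <= p + s


def disjoint(p: int, s: int, q: int, t: int) -> bool:
    """Ranges (p, s) and (q, t) do not overlap: p + s < q or q + t < p."""
    return p + s < q or q + t < p


# An 8-byte object at address 0x100 contains the 4-byte range at 0x104 ...
inner_ok = contains(0x100, 8, 0x104, 4)
# ... but not a 4-byte range starting at 0x106, which would end at 0x10A.
inner_bad = contains(0x100, 8, 0x106, 4)
# Two ranges separated by a gap are disjoint.
apart = disjoint(0x100, 8, 0x200, 8)
# With the strict inequality of Definition 5.2, directly adjacent ranges
# are not classified as disjoint.
adjacent = disjoint(0x100, 8, 0x108, 8)
```

The last case follows directly from the strict comparison in the axiom: for adjacent ranges, p + s equals q, so neither disjunct holds.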

Note that contains and disjoint are not part of ILR; they are rule templates as described in section 2.1.6. In the following, the two can simply be assumed to be rewritten by the following rewrite rules:

contains_{P,I}(p, s, q, t) −→ and(leu_P(p, q), leu_P(add_{P,I}(q, t), add_{P,I}(p, s))) (5.1a)

disjoint_{P,I}(p, s, q, t) −→ or(ltu_P(add_{P,I}(p, s), q), ltu_P(add_{P,I}(q, t), p)) (5.1b)
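The right-hand sides of rules (5.1a) and (5.1b) can be sketched over fixed-width values as follows. The sketch assumes that add denotes addition modulo 2^N and that leu and ltu denote unsigned comparisons; the width N = 32 and the Python helper names are illustrative choices, not part of ILR.

```python
N = 32                  # assumed pointer and integer width in bits
MASK = (1 << N) - 1     # values are reduced modulo 2**N


def add(a: int, b: int) -> int:
    """Bitvector addition: wraps around at 2**N."""
    return (a + b) & MASK


def leu(a: int, b: int) -> bool:
    """Unsigned less-or-equal on N-bit values."""
    return (a & MASK) <= (b & MASK)


def ltu(a: int, b: int) -> bool:
    """Unsigned less-than on N-bit values."""
    return (a & MASK) < (b & MASK)


def contains(p: int, s: int, q: int, t: int) -> bool:
    # Right-hand side of rule (5.1a).
    return leu(p, q) and leu(add(q, t), add(p, s))


def disjoint(p: int, s: int, q: int, t: int) -> bool:
    # Right-hand side of rule (5.1b).
    return ltu(add(p, s), q) or ltu(add(q, t), p)


inside = contains(0x1000, 16, 0x1004, 4)
separate = disjoint(0x1000, 16, 0x2000, 16)
wrapped = add(0xFFFFFFF0, 0x20)   # the sum exceeds 2**32 and wraps around
```

Unlike the integer reading of the axioms, sums in this sketch can wrap around, so the comparisons operate on the wrapped values.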

³ While LLBMC allows for objects and heap allocation states of different target architectures in the same formula, we assume that pointers are only used on their own architecture and we therefore will not take differing bitwidths into account.

Given the definition of disjoint, we can now define a heap allocation state as the set of currently allocated objects where each object is identified by a base pointer and a size:

Definition 5.3 (Heap Allocation State). Given a bitwidth n ∈ ℕ, a heap allocation state h of sort h⟨n⟩ is a set of pairs (P × I) where

∀(p, s) ∈ h p ≠ 0, and (5.2)

∀(p, s) ∈ h, (q, t) ∈ h (p, s) ≠ (q, t) → disjoint(p, s, q, t). (5.3)
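The two conditions of Definition 5.3 can be checked mechanically for a finite set of pairs. The following is a minimal sketch that assumes unbounded integers for pointers and sizes; the function name is_heap_state is hypothetical.

```python
from itertools import combinations


def disjoint(p: int, s: int, q: int, t: int) -> bool:
    """Definition 5.2, read over unbounded integers."""
    return p + s < q or q + t < p


def is_heap_state(h: set) -> bool:
    """Check conditions (5.2) and (5.3) for a set of (pointer, size) pairs."""
    # (5.2): no object is located at the null address.
    no_null = all(p != 0 for (p, _s) in h)
    # (5.3): distinct objects are pairwise disjoint. Elements of a set are
    # distinct and disjoint is symmetric, so unordered pairs suffice.
    pairwise = all(disjoint(p, s, q, t)
                   for (p, s), (q, t) in combinations(h, 2))
    return no_null and pairwise


well_formed = is_heap_state({(0x100, 8), (0x200, 16)})
has_null = is_heap_state({(0, 8)})
overlapping = is_heap_state({(0x100, 8), (0x104, 8)})
```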

Both the requirement for p ≠ 0 and the one for disjointness of objects follow directly from [ISOC99, section 7.20.3.3]. The additional constraint concerning alignment mentioned in that section is omitted in this thesis.

The terms heap and heap state are usually used to refer to the contents of dynamically allocated memory objects. Because we do not handle the memory’s contents in this chapter, and for the sake of brevity, we will refer to the heap allocation state as heap or heap state exclusively.

5.2.2 Functions

The core functions of the theory of dynamic memory allocation are listed in table 5.4. The function empty_H creates a heap constant without any allocated objects. The function malloc_H allocates a memory object given by a pointer to its lowest address and its size, while the function free_H deallocates a previously allocated memory object. The functions validaccess_H and validfree_H check if memory allocation related operations are valid. The predicate allocatable_H asserts that a memory object given by a pair of a pointer and a size can be allocated.

Symbol : Signature                    Interpretation
empty_H : → H                         Empty heap
malloc_H : H × P × I → H              Heap allocation
free_H : H × P → H                    Heap deallocation
validaccess_H : H × P × I → bool      Access validity
validfree_H : H × P → bool            Deallocation validity
allocatable_H : H × P × I → bool      Allocatability

Table 5.4: Functions encoding heap state (|H| = |P| = |I|)

An empty heap does not contain any objects:

empty_H = ∅ (5.4)

The function malloc is the counterpart to C’s malloc function. In contrast to C’s function, it does not return a pointer to the allocated object but takes the pointer as an argument instead.

There are two reasons for this, one of which is that malloc already returns the modified heap state. This could be handled in multiple ways, e.g. by returning the modified heap state and the pointer as a pair. Another approach is to return only the modified heap state and provide a separate function for retrieving a pointer to the last allocated object on a heap.

The second and more important reason for choosing the mentioned approach is that we consider malloc to have two separate concerns: first, it identifies an allocatable memory range, and second, it modifies the heap state to contain an object allocated at this address. The presented signature of malloc allows separating these two concerns cleanly. With the proposed solution, malloc itself is only concerned with modifying the heap allocation state by adding an object at a given memory range. We provide the function allocatable_H for identifying suitable memory ranges, but its use is not enforced, and it is perfectly fine to identify allocatable memory ranges in other ways. For example, LLBMC can be configured either to use allocatable_H to identify allocatable memory ranges or to use fixed addresses which do not overlap by construction.

This separation of concerns is not without its disadvantages, however. The approach makes it possible to pass invalid memory ranges to malloc, namely those that overlap with ranges already contained in the heap. To handle this, the “good” and the “bad” case need to be treated separately:

∀h, p, s (p ≠ 0 ∧ ∀(r, u) ∈ h disjoint(r, u, p, s)) →
    malloc(h, p, s) = h ∪ {(p, s)} (5.5a)

∀h, p, s (p = 0 ∨ ∃(r, u) ∈ h ¬disjoint(r, u, p, s)) →
    malloc(h, p, s) = h (5.5b)
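Axioms (5.5a) and (5.5b) can be read as a single total function on sets of pairs. The following is a minimal sketch under the assumption that heap states are modelled as Python frozensets of (pointer, size) tuples over unbounded integers; the example addresses are arbitrary.

```python
def disjoint(p: int, s: int, q: int, t: int) -> bool:
    """Definition 5.2, read over unbounded integers."""
    return p + s < q or q + t < p


def malloc(h: frozenset, p: int, s: int) -> frozenset:
    """Heap state after requesting object (p, s), per (5.5a) and (5.5b)."""
    good = p != 0 and all(disjoint(r, u, p, s) for (r, u) in h)
    if good:
        return h | {(p, s)}   # (5.5a): the new object is added
    return h                  # (5.5b): an invalid request changes nothing


empty = frozenset()
h1 = malloc(empty, 0x100, 8)
h2 = malloc(h1, 0x200, 16)
h3 = malloc(h2, 0x104, 8)     # overlaps (0x100, 8), heap stays as it is
h4 = malloc(h2, 0, 8)         # null address, heap stays as it is
```

Because the two premises are negations of each other, exactly one of the branches applies to any request.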

The function free_H removes an object from the heap allocation state if it was previously allocated and does nothing if it was not (see [ISOC99, section 7.20.3.2]):

∀h, p, q, t free_H(h, p) = h \ {(q, t) : q = p} (5.6)

Deallocating a non-allocated object is undefined behavior in C, but we strongly prefer the function free_H to be total. If free_H is called for a non-existent object, the only sensible thing to do is to ignore the request. Because of this, free_H is a no-op if the pointer does not point at a currently allocated object.
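The totality of the deallocation function, as given by axiom (5.6), can be illustrated with a short sketch. It assumes heap states are modelled as frozensets of (pointer, size) tuples; the example addresses are arbitrary.

```python
def free(h: frozenset, p: int) -> frozenset:
    """Remove every object whose address equals p, per (5.6)."""
    return frozenset((q, t) for (q, t) in h if q != p)


h = frozenset({(0x100, 8), (0x200, 16)})
after_free = free(h, 0x100)   # removes the object (0x100, 8)
after_noop = free(h, 0x104)   # no object starts at 0x104: heap unchanged
```

Note that a pointer into the interior of an object, such as 0x104 above, does not match any address and therefore leaves the heap state untouched.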

A memory access is either a read or a write operation. This can be a load or a store instruction but also a memcpy or a similar function. A memory access to a range of memory is valid if and only if an object exists that contains the range (see [ISOC99, section 6.2.4]):

∀h, p, s validaccess_H(h, p, s) ↔ ∃(r, u) ∈ h contains(r, u, p, s) (5.7)
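A minimal sketch of axiom (5.7), assuming heap states are modelled as sets of (pointer, size) tuples over unbounded integers; the name valid_access and the example addresses are illustrative.

```python
def contains(p: int, s: int, q: int, t: int) -> bool:
    """Definition 5.1, read over unbounded integers."""
    return p <= q and q + t <= p + s


def valid_access(h, p: int, s: int) -> bool:
    """(5.7): some allocated object (r, u) contains the range (p, s)."""
    return any(contains(r, u, p, s) for (r, u) in h)


h = {(0x100, 8), (0x200, 16)}
in_bounds = valid_access(h, 0x204, 4)     # inside the object at 0x200
overrun = valid_access(h, 0x104, 8)       # runs past the end of (0x100, 8)
unallocated = valid_access(h, 0x300, 1)   # no object covers this address
```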

A deallocation is valid if and only if the pointer to be deallocated is the address of an object in the heap allocation state (see [ISOC99, section 7.20.3.2]):

∀h, p validfree_H(h, p) ↔ ∃(r, u) ∈ h r = p (5.8)
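Axiom (5.8) only compares addresses, which the following sketch makes explicit. It assumes heap states are modelled as sets of (pointer, size) tuples; the name valid_free is illustrative.

```python
def valid_free(h, p: int) -> bool:
    """(5.8): some allocated object (r, u) has address r equal to p."""
    return any(r == p for (r, _u) in h)


h = {(0x100, 8), (0x200, 16)}
base_ptr = valid_free(h, 0x200)       # address of an allocated object
interior_ptr = valid_free(h, 0x204)   # points into, not at, an object
```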

The function allocatable_H can be used to ensure that a pointer points at a suitable memory range. A memory range is allocatable if and only if it does not overlap with any previously allocated object and its address is not null (see [ISOC99, section 7.20.3]):

∀h, p, s allocatable_H(h, p, s) ↔ (p ≠ 0 ∧ ∀(r, u) ∈ h disjoint(r, u, p, s)) (5.9)
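As a minimal sketch, axiom (5.9) can be used to pick an address from a list of candidates. Heap states are assumed to be sets of (pointer, size) tuples over unbounded integers, and the candidate list is hypothetical; how candidate addresses are actually chosen is not prescribed by the theory.

```python
def disjoint(p: int, s: int, q: int, t: int) -> bool:
    """Definition 5.2, read over unbounded integers."""
    return p + s < q or q + t < p


def allocatable(h, p: int, s: int) -> bool:
    """(5.9): p is non-null and (p, s) is disjoint from every object in h."""
    return p != 0 and all(disjoint(r, u, p, s) for (r, u) in h)


h = {(0x100, 8), (0x200, 16)}
candidates = [0, 0x104, 0x108, 0x110]   # hypothetical candidate addresses
# Pick the first candidate at which an 8-byte object could be allocated.
first_fit = next(p for p in candidates if allocatable(h, p, 8))
```

The right-hand side of (5.9) is exactly the premise of (5.5a), so malloc adds an object to the heap precisely when allocatable holds for the requested range.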

5.3 A Partial Decision Procedure Based on Term
