Model checking for symbolic-heap separation logic with inductive predicates


James Brotherston

University College London, UK


Nikos Gorogiannis

Middlesex University, UK


Max Kanovich

University College London, UK and National Research University Higher School of Economics, Russia

Reuben Rowe

University College London, UK


Abstract

We investigate the model checking problem for symbolic-heap separation logic with user-defined inductive predicates, i.e., the problem of checking that a given stack-heap memory state satisfies a given formula in this language, as arises e.g. in software testing or runtime verification.

First, we show that the problem is decidable; specifically, we present a bottom-up fixed point algorithm that decides the problem and runs in exponential time in the size of the problem instance.

Second, we show that, while model checking for the full language is EXPTIME-complete, the problem becomes NP-complete or PTIME-solvable when we impose natural syntactic restrictions on the schemata defining the inductive predicates. We additionally present NP and PTIME algorithms for these restricted fragments.

Finally, we report on the experimental performance of our procedures on a variety of specifications extracted from programs, exercising multiple combinations of syntactic restrictions.

Categories and Subject Descriptors D.2.4 [Software / Program Verification]: Model checking; F.3.1 [Specifying and Verifying and Reasoning about Programs]: Logics of programs, Assertions

Keywords Separation logic, model checking, inductive definitions, complexity, runtime verification, program testing.

1. Introduction

In modern computer science, model checking is most commonly considered to be the problem of deciding whether a given Kripke structure or transition system $S$, typically representing a program or system, satisfies, or is a model of, a given formula $A$ of modal or temporal logic [18]; this property is usually written as $S \models A$. More generally, in mathematical logic, $S$ might be a mathematical structure of virtually any kind and $A$ a formula in some appropriate logic for such structures (see e.g. [20] for the cases of first-order and monadic second-order logic).

In this paper, we investigate the model checking problem as it arises in the setting of separation logic with user-defined inductive predicates. Separation logic is an established formalism for the verification of imperative pointer programs, comprising both an assertion language of formulas based on bunched logic and a Hoare-style system of triples manipulating the pre- and postconditions of programs [23, 29]. Given a program annotated with separation logic assertions, one can try to prove statically that each assertion holds at the appropriate program point; a long line of research in this area has resulted in a number of tools that are capable of doing this automatically at least some of the time for industrial code (see e.g. [7, 8, 14, 16, 19, 24, 28]). Alternatively, one might also try to test dynamically whether properties hold: simply execute the program and check whether each assertion is satisfied by the actual memory state of the program at that point (this is sometimes known as run-time verification). Such an approach obviously necessitates a method for deciding, for any memory state $S$ and separation logic formula $A$, whether or not $S \models A$: a model checking problem.

While this is straightforward for simple formulas, it becomes much more complicated when arbitrary user-defined inductive predicates, describing complex shape properties of the memory, are permitted. Our first contribution is a general model checking procedure (in the sense above) for the most commonly considered symbolic heap fragment of separation logic, extended with a general schema for user-defined inductive predicates. Since our definition schema allows inductive predicates to denote possibly-empty heap memories, and any heap trivially decomposes into itself combined with the empty heap, a naive top-down approach based on backtracking search will generally fail to terminate. Instead, we employ a bottom-up approach based on computing the fixed point of all “sub-models” of the original memory that satisfy one of the defined inductive predicates. The crucial insight is that, for any given model checking query, the witnesses for the existentially quantified variables can be chosen from a fixed set of values given in advance. Our algorithm decides the model checking problem for our logic, in (worst-case) exponential time in the query size. Indeed, we show that this problem is EXPTIME-complete.

However, the inductive predicate definitions encountered in verification practice often fall within much more well-behaved fragments of our general inductive schemata. Our second main contribution is an analysis of the model checking problem in cases where the syntactic form of inductive definitions is restricted in various ways (e.g., when recursion is forbidden in cases where the heap might be empty). We show that, for different combinations of these restrictions, the model checking problem can become NP-complete or even PTIME-solvable; and, in such cases, we give concrete model checking algorithms that fall within the appropriate complexity bound.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Copyright is held by the owner/author(s).

POPL’16, January 20–22, 2016, St. Petersburg, FL, USA. ACM 978-1-4503-3549-2/16/01...$15.00

Finally, we provide an implementation of our general model checking algorithm, and of our specialised algorithm for the polynomial-time fragment, within the CYCLIST theorem proving framework; this implementation is available online [1]. We evaluate their performance on a range of examples gathered from the separation logic community, as well as some hand-crafted examples. Our experimental results seem to bear out that our model checking methods are practical for runtime verification applications when suitable syntactic restrictions are present, and for offline testing (such as in unit test suites) in the general case.

Related work. Runtime verification for separation logic was addressed first in [26], and then more recently in the VeriFast tool [3]. In both cases, model checking works only for classes of recursive predicates that are restricted in various ways, and comes without any formal correctness claims or complexity bounds. As far as we know, the present paper is the first to specifically address model checking for symbolic-heap separation logic with general inductive predicates from a fully formal perspective. However, the logic itself has attracted considerable recent interest amongst the verification community. The aforementioned automated program verification tools based on separation logic [7, 8, 14, 16, 19, 24, 28] are all based on symbolic heaps, and increasingly targeted at verifying specifications involving user-defined rather than hard-coded predicates. Indeed, there are now even tools capable of automatically generating the definitions of inductive predicates needed for analysis [11, 25]. On the theoretical side, the satisfiability problem for our logic was recently shown decidable [10] and its entailment problem undecidable [4], although decidability results have been obtained for restricted classes of entailments [5, 22]. Alongside these theoretical developments, there are automated tools geared towards the proof [13, 17] and disproof [12] of entailments, as needed to support program verification.

The remainder of this paper is structured as follows. Section 2 introduces our fragment of separation logic, and Section 3 develops our general model checking procedure for it. This model checking problem is then shown to be EXPTIME-complete in Section 4. We present our restricted fragments in Section 5, and establish their various complexities in Section 6. Section 7 presents details of our implementation and experiments, and Section 8 concludes.

2. $\mathsf{SL}^{\mathsf{SH}}_{\mathsf{ID}}$: Symbolic-heap Separation Logic with Inductively Defined Predicates

In this section we present our fragment $\mathsf{SL}^{\mathsf{SH}}_{\mathsf{ID}}$ of separation logic, which restricts the syntax of formulas to symbolic heaps as introduced in [5, 6], but allows arbitrary user-defined inductive predicates over these, as considered e.g. in [9, 10, 11, 22].

We often use vector notation to abbreviate tuples, e.g. $\mathbf{x}$ for $(x_1, \ldots, x_m)$. We write $\mathrm{proj}_i$ for the $i$th projection function on tuples, and we often abuse notation by treating a tuple $\mathbf{x}$ as the set containing exactly the elements occurring in $\mathbf{x}$. If $X$ and $Y$ are sets, we write $X \mathbin{\#} Y$ as a shorthand for $X \cap Y = \emptyset$.

2.1 Syntax

A term is either a variable in the infinite set $\mathsf{Var}$, or the constant $\mathsf{nil}$. We write $x, y, z$, etc. to range over variables, and $t, u$, etc. to range over terms. We assume a finite set $\mathcal{P} = \{P_1, \ldots, P_n\}$ of predicate symbols, each with associated arity.

Definition 2.1 (Symbolic heap). Spatial formulas $F$ and pure formulas $\pi$ are given by the following grammar:

$$F ::= \mathsf{emp} \mid x \mapsto \mathbf{t} \mid P\mathbf{t} \mid F * F \qquad\qquad \pi ::= t = t \mid t \neq t$$

where $x$ ranges over variables, $t$ over terms, $P$ over predicate symbols and $\mathbf{t}$ over tuples of terms (matching the arity of $P$ in $P\mathbf{t}$).

A symbolic heap is given by $\exists\mathbf{z}.\, \Pi : F$, where $\mathbf{z}$ is a tuple of (distinct) variables, $F$ is a spatial formula and $\Pi$ is a finite set of pure formulas. Whenever one of $\Pi$, $F$ is empty, we omit the colon. We write $\mathit{FV}(A)$ for the set of free variables occurring in a symbolic heap $A$; by convention, the bound variable names in $A$ are chosen disjoint from the free variables $\mathit{FV}(A)$.

Definition 2.2. An inductive rule set is a finite set of inductive rules, each of the form $A \Rightarrow P\mathbf{x}$, where $A$ is a symbolic heap (called the body of the rule), $P\mathbf{x}$ a formula (called its head), $\mathbf{x}$ is a tuple of distinct variables and $\mathit{FV}(A) \subseteq \mathbf{x}$.

For convenience, we sometimes drop existential quantifiers from inductive rules $A \Rightarrow P\mathbf{x}$: in that case, any variables occurring in $A$ but not in $\mathbf{x}$ are implicitly existentially quantified.

As usual, the inductive rules with $P$ in their head should be read as exhaustive, disjunctive clauses of an inductive definition of $P$. The formal semantics appears below.

2.2 Semantics

We use a RAM model employing heaps of records. We assume a countably infinite set $\mathsf{Val}$ of values of which an infinite subset $\mathsf{Loc} \subset \mathsf{Val}$ are addressable locations; we insist on at least one non-addressable value $\mathit{nil} \in \mathsf{Val} \setminus \mathsf{Loc}$.

A stack is a function $s : \mathsf{Var} \to \mathsf{Val}$; we extend stacks to terms by setting $s(\mathsf{nil}) =_{\mathrm{def}} \mathit{nil}$, and write $s[z \mapsto a]$ for the stack defined as $s$ except that $s[z \mapsto a](z) = a$. We extend stacks pointwise to act on tuples of terms.

A heap is a partial function $h : \mathsf{Loc} \rightharpoonup_{\mathrm{fin}} (\mathsf{Val}\ \mathsf{List})$ mapping finitely many locations to records, i.e. arbitrary-length tuples of values; we set $\mathrm{dom}(h)$ to be the set of locations on which $h$ is defined, and $e$ to be the empty heap that is undefined on all locations. We write $\circ$ for composition of domain-disjoint heaps: if $h_1$ and $h_2$ are heaps, then $h_1 \circ h_2$ is the union of $h_1$ and $h_2$ when $\mathrm{dom}(h_1) \mathbin{\#} \mathrm{dom}(h_2)$, and undefined otherwise. Finally, we define the cover of a heap $h$ as

$$\mathrm{cover}(h) =_{\mathrm{def}} \mathrm{dom}(h) \cup \{ b \in \mathsf{Val} \mid b \in h(a),\ a \in \mathrm{dom}(h) \},$$

i.e., the set of all values mentioned anywhere in $h$.
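As a minimal sketch of the heap model just described, the following Python fragment represents a heap as a dictionary from locations to records (tuples of values). The names `compose` and `cover` and the dictionary encoding are illustrative assumptions, not part of the paper or its implementation.

```python
from typing import Dict, Hashable, Optional, Set, Tuple

Value = Hashable
Heap = Dict[Value, Tuple[Value, ...]]  # location -> record (hypothetical encoding)


def compose(h1: Heap, h2: Heap) -> Optional[Heap]:
    """Return the composition of h1 and h2, or None when their domains overlap."""
    if h1.keys() & h2.keys():
        return None  # composition is undefined on overlapping domains
    return {**h1, **h2}


def cover(h: Heap) -> Set[Value]:
    """All values mentioned anywhere in h: its domain plus every record field."""
    values = set(h)
    for record in h.values():
        values.update(record)
    return values
```

Returning `None` for the undefined case is one possible design choice; raising an exception would serve equally well.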

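Definition 2.3. Let $\Phi$ be a fixed inductive rule set. Then we say that a stack-heap pair $(s, h)$ is a model of a symbolic heap $A$ if the relation $s, h \models_\Phi A$ holds, defined by structural induction on $A$:

$$\begin{array}{lcl}
s, h \models_\Phi t_1 = t_2 & \Leftrightarrow & s(t_1) = s(t_2) \\
s, h \models_\Phi t_1 \neq t_2 & \Leftrightarrow & s(t_1) \neq s(t_2) \\
s, h \models_\Phi \mathsf{emp} & \Leftrightarrow & h = e \\
s, h \models_\Phi x \mapsto \mathbf{t} & \Leftrightarrow & \mathrm{dom}(h) = \{s(x)\} \text{ and } h(s(x)) = s(\mathbf{t}) \\
s, h \models_\Phi P\mathbf{t} & \Leftrightarrow & (s(\mathbf{t}), h) \in \llbracket P \rrbracket^\Phi \\
s, h \models_\Phi F_1 * F_2 & \Leftrightarrow & \exists h_1, h_2.\ h = h_1 \circ h_2 \text{ and } s, h_1 \models_\Phi F_1 \text{ and } s, h_2 \models_\Phi F_2 \\
s, h \models_\Phi \exists\mathbf{z}.\, \Pi : F & \Leftrightarrow & \exists \mathbf{a} \in \mathsf{Val}^{|\mathbf{z}|}.\ s[\mathbf{z} \mapsto \mathbf{a}], h \models_\Phi \pi \text{ for all } \pi \in \Pi \text{ and } s[\mathbf{z} \mapsto \mathbf{a}], h \models_\Phi F
\end{array}$$

where the semantics $\llbracket P \rrbracket^\Phi$ of the inductive predicate $P$ under $\Phi$ is defined below.

If $A$ contains no inductive predicates, then its satisfaction relation does not depend on the inductive rules $\Phi$, and we typically write $s, h \models A$ in that case. Similarly, if $\Pi$ is a set of pure formulas, we write $s \models \Pi$ to mean that $s, h \models_\Phi \Pi$ for any heap $h$ and inductive rule set $\Phi$.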

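The following definition gives the standard semantics of the inductive predicate symbols $\mathbf{P}$ according to a fixed inductive rule set $\Phi$, i.e., as the least fixed point of an $n$-ary monotone operator constructed from $\Phi$:

Definition 2.4. First, for each predicate $P_i \in \mathcal{P}$ with arity $\alpha_i$ say, we define $\tau_i = \mathrm{Pow}(\mathsf{Val}^{\alpha_i} \times \mathsf{Heap})$ (where $\mathrm{Pow}(-)$ is powerset). We also partition the rule set $\Phi$ into $\Phi_1, \ldots, \Phi_n$, where $\Phi_i$ is the set of all inductive rules in $\Phi$ of the form $A \Rightarrow P_i\mathbf{x}$.

Now let each $\Phi_i$ be indexed by $j$ (i.e., $\Phi_{i,j}$ is the $j$-th rule defining $P_i$), and for each inductive rule $\Phi_{i,j}$ of the form $\exists\mathbf{z}.\, \Pi : F \Rightarrow P_i\mathbf{x}$, we define the operator $\varphi_{i,j} : \tau_1 \times \ldots \times \tau_n \to \tau_i$ by:

$$\varphi_{i,j}(\mathbf{Y}) =_{\mathrm{def}} \{ (s(\mathbf{x}), h) \mid s, h \models_{\mathbf{Y}} \Pi : F \}$$

where $\mathbf{Y} \in \tau_1 \times \ldots \times \tau_n$ and $\models_{\mathbf{Y}}$ is the satisfaction relation defined above, except that $\llbracket P_i \rrbracket^{\mathbf{Y}} =_{\mathrm{def}} \mathrm{proj}_i(\mathbf{Y})$. We then finally define the tuple $\llbracket \mathbf{P} \rrbracket^\Phi \in \tau_1 \times \ldots \times \tau_n$ by:

$$\llbracket \mathbf{P} \rrbracket^\Phi =_{\mathrm{def}} \mu\mathbf{Y}.\, \Big( \bigcup_j \varphi_{1,j}(\mathbf{Y}), \ldots, \bigcup_j \varphi_{n,j}(\mathbf{Y}) \Big)$$

where $\mu$ is the least fixed point constructor. We write $\llbracket P_i \rrbracket^\Phi$ as an abbreviation for $\mathrm{proj}_i(\llbracket \mathbf{P} \rrbracket^\Phi)$.

Note that in computing $\varphi_{i,j}(\mathbf{Y})$ above, we strip the existential quantifiers $\exists\mathbf{z}$ from the body of the inductive rule $\Phi_{i,j}$, taking advantage of the convention that the existentially bound variables $\mathbf{z}$ are disjoint from the free variables $\mathbf{x}$ in $\Phi_{i,j}$.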

3. A Model Checking Algorithm for $\mathsf{SL}^{\mathsf{SH}}_{\mathsf{ID}}$

In this section we develop a decision procedure for the model checking problem in our logic $\mathsf{SL}^{\mathsf{SH}}_{\mathsf{ID}}$. Formally, this problem is stated as follows:

Model checking problem (MC). Given an inductive rule set $\Phi$, stack $s$, heap $h$ and symbolic heap $A$, decide whether $s, h \models_\Phi A$.

We observe that whether $s, h \models_\Phi A$ depends not on the entire (infinite) valuation of $s$ but only on the values of $s$ on $\mathit{FV}(A)$, which is finite; thus an instance of MC can also be viewed as finite. In fact, the problem can be simplified further by noting that, if we can solve the case when $A = P\mathbf{x}$, for $P$ an inductive predicate, then the general case follows almost immediately:

Restricted model checking problem (RMC). Given an inductive rule set $\Phi$, tuple of values $\mathbf{a}$ (from $\mathsf{Val}$), heap $h$ and predicate symbol $P$, decide whether $(\mathbf{a}, h) \in \llbracket P \rrbracket^\Phi$.

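Proposition 3.1. MC and RMC are (polynomially) equivalent.

Proof. Given an instance $(\Phi, \mathbf{a}, h, P)$ of RMC, where $m = |\mathbf{a}|$ is the arity of $P$, we define the corresponding instance of MC to be $(\Phi, s, h, P\mathbf{x})$, where $\mathbf{x}$ is an $m$-tuple of distinct variables and $s$ is any stack satisfying $s(\mathbf{x}) = \mathbf{a}$. Then, clearly,

$$s, h \models_\Phi P\mathbf{x} \;\Leftrightarrow\; (s(\mathbf{x}), h) \in \llbracket P \rrbracket^\Phi \;\Leftrightarrow\; (\mathbf{a}, h) \in \llbracket P \rrbracket^\Phi.$$

Conversely, let $(\Phi, s, h, A)$ be an instance of MC. Let $\mathit{FV}(A) = \mathbf{x}$, let $Q$ be a predicate symbol of arity $|\mathbf{x}|$ not occurring in $\Phi$, and define $\Phi' = \Phi \cup \{A \Rightarrow Q\mathbf{x}\}$. We then define the corresponding instance of RMC to be $(\Phi', s(\mathbf{x}), h, Q)$. By construction,

$$s, h \models_\Phi A \;\Leftrightarrow\; s, h \models_{\Phi'} Q\mathbf{x} \;\Leftrightarrow\; (s(\mathbf{x}), h) \in \llbracket Q \rrbracket^{\Phi'}.$$

Both reductions are trivially computable in polynomial time.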
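The second reduction (from MC to RMC) can be sketched as follows. This is only an illustration under assumed encodings: rules are hypothetical `(head_symbol, head_variables, body)` triples, the body `A` is treated as an opaque value, and the function name `mc_to_rmc` is invented for this example.

```python
def mc_to_rmc(rules, stack, heap, body, free_vars):
    """Turn an MC instance (rules, stack, heap, A) into an RMC instance.

    rules:     list of (head_symbol, head_variables, body) triples (assumed encoding)
    stack:     dict from variable names to values
    body:      the symbolic heap A, kept opaque in this sketch
    free_vars: the set FV(A)
    """
    # Simplification: we only look at head symbols, assuming every predicate
    # symbol used in a body also heads some rule.
    used = {head for head, _, _ in rules}
    q = "Q"
    while q in used:  # pick a predicate symbol not occurring in the rule set
        q += "'"
    x = tuple(sorted(free_vars))              # fix an order on FV(A)
    new_rules = list(rules) + [(q, x, body)]  # add the rule A => Q x
    a = tuple(stack[v] for v in x)            # the tuple s(x)
    return new_rules, a, heap, q
```

The result is the tuple of the extended rule set, the values `s(x)`, the unchanged heap and the fresh symbol, mirroring the instance built in the proof.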

Thus it suffices to formulate a decision procedure for the restricted problem RMC. Before diving into the details of our decision procedure, let us motivate its development by making two main observations about this problem.

1. One might be tempted to adopt a top-down approach to the problem by applying inductive rules backwards to $P\mathbf{x}$, obtaining smaller model-checking problems (in the size of the heap $h$) as recursive instances. Unfortunately, our general schema for inductive rules does not guarantee that the models of subformulas of the body of an inductive rule are strictly smaller than the models of the entire body, and so such an approach might fail to terminate. For example, suppose $((a, b), h) \in \llbracket P \rrbracket^\Phi$, and is generated by the inductive rule

$$\exists z.\ P\,x\,z * P\,z\,y \Rightarrow P\,x\,y.$$

Then we know that, for some $c \in \mathsf{Val}$, we should have both $((a, c), h_1) \in \llbracket P \rrbracket^\Phi$ and $((c, b), h_2) \in \llbracket P \rrbracket^\Phi$, where $h = h_1 \circ h_2$; but we do not know that $h_1$, $h_2$ are smaller than $h$; either might be the empty heap $e$. (Indeed, it is quite possible that all of $h, h_1, h_2$ are empty.)

Therefore, we adopt a bottom-up approach: we attempt to compute all tuples in $\llbracket P_i \rrbracket^\Phi$ that are “sub-models” of $(\mathbf{a}, h)$, by iteratively applying the inductive rules until we reach a fixed point. (In fact, we have to do this for all inductive predicates $\mathbf{P}$ simultaneously, in order to account for possible mutual dependency among them.) This process is guaranteed to terminate provided that there are only finitely many such sub-models.

2. The principal remaining difficulty is one of completeness: i.e., can we guarantee that any $(\mathbf{a}, h) \in \llbracket P \rrbracket^\Phi$ can be generated by applying the inductive rules in $\Phi$ to sub-models of $(\mathbf{a}, h)$? In fact this point is quite delicate, due to the presence of unrestricted existential quantification in our inductive rules. For example, suppose $(a, e) \in \llbracket P \rrbracket^\Phi$ is generated by the rule

$$\exists z.\ z \neq x : Q\,x\,z \Rightarrow P\,x.$$

Then we know that for some $b \in \mathsf{Val}$, we have $((a, b), e) \in \llbracket Q \rrbracket^\Phi$, where $b \neq a$ and $b$ (trivially) does not appear in the empty heap $e$. Thus we must allow our sub-models to mention fresh, or “spare”, values not mentioned in $\mathbf{a}$ or $h$.

Fortunately, as we show (Lemma 3.7), only finitely many such spare values are needed for any given rule set $\Phi$; these can be “recycled” as needed at each fresh application of an inductive rule in our fixed point computation.
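The first observation can be made concrete with a small sketch that enumerates every way of writing a heap as a composition of two domain-disjoint heaps. The function name `splits` and the dictionary encoding of heaps are assumptions made only for illustration.

```python
from itertools import combinations


def splits(h):
    """Yield every pair (h1, h2) of domain-disjoint heaps whose composition is h."""
    locs = list(h)
    for r in range(len(locs) + 1):
        for chosen in combinations(locs, r):
            h1 = {loc: h[loc] for loc in chosen}
            h2 = {loc: h[loc] for loc in locs if loc not in chosen}
            yield h1, h2


h = {1: (2,), 2: ("nil",)}
all_splits = list(splits(h))
# The trivial split (empty heap, h) is always a candidate, so a recursive
# top-down call on the second component need not see a smaller heap.
trivial_split_present = ({}, h) in all_splits
```

Because the pair consisting of the empty heap and `h` itself always occurs, a naive recursion over such splits has no decreasing measure, which is the termination problem described in observation 1.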

We now formally define our fixed point construction computing all tuples $(\mathbf{b}, h') \in \llbracket P \rrbracket^\Phi$ that are sub-models of a given $(\mathbf{a}, h)$, where “sub-model” means that $h' \subseteq h$ and $\mathbf{b}$ consists of values from $\mathbf{a}$, $\mathrm{cover}(h)$, the null value $\mathit{nil}$ and a suitably chosen set of “spare” values.

Definition 3.2. Let $\Phi$ be an inductive rule set, $\mathbf{a}$ a tuple of values (from $\mathsf{Val}$) and $h$ a heap. We define the set $\mathsf{Good}(\mathbf{a}, h)$ (of “good sub-model values for $(\mathbf{a}, h)$”) by

$$\mathsf{Good}(\mathbf{a}, h) = \mathbf{a} \cup \{\mathit{nil}\} \cup \mathrm{cover}(h).$$

Now let $\beta$ be the maximum number of (free and bound) variable names appearing in any inductive rule in $\Phi$. We define $\mathsf{Spare}_\Phi(\mathbf{a}, h)$ to be a set of $\beta$ fresh values (from $\mathsf{Val}$) that do not occur in $\mathsf{Good}(\mathbf{a}, h)$.
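A minimal sketch of Definition 3.2, assuming heaps are dictionaries from locations to tuples and that the null value is the string `"nil"`. The function names and the choice of `("spare", i)` pairs as fresh values are illustrative assumptions; any values outside the good set would do.

```python
from itertools import count

NIL = "nil"  # assumed encoding of the null value


def cover(h):
    """All values mentioned anywhere in the heap h."""
    return set(h) | {v for record in h.values() for v in record}


def good(a, h):
    """Good(a, h): the values of a, the null value, and the cover of h."""
    return set(a) | {NIL} | cover(h)


def spare(beta, good_values):
    """Return a list of beta fresh values that do not occur in good_values."""
    fresh = []
    for i in count():
        if len(fresh) == beta:
            return fresh
        candidate = ("spare", i)  # hypothetical shape for fresh values
        if candidate not in good_values:
            fresh.append(candidate)
```

For example, `good((1, NIL), {1: (2,)})` collects the values `1`, `2` and `"nil"`, and `spare(3, ...)` then supplies three values outside that set.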

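Definition 3.3. Let $\Phi$ be an inductive rule set, $\mathbf{a}$ a tuple of values and $h$ a heap. For each inductive rule $\Phi_{i,j} \in \Phi$ of the form $\exists\mathbf{z}.\, \Pi : F \Rightarrow P_i\mathbf{x}$, we define an operator $\psi_{i,j} : \tau_1 \times \ldots \times \tau_n \to \tau_i$ as follows:

$$\psi_{i,j}(\mathbf{Y}) =_{\mathrm{def}} \left\{ (s(\mathbf{x}), h') \;\middle|\; \begin{array}{l} s, h' \models_{\mathbf{Y}} \Pi : F \text{ and } h' \subseteq h \text{ and} \\ s(\mathbf{x} \cup \mathbf{z}) \subseteq \mathsf{Good}(\mathbf{a}, h) \cup \mathsf{Spare}_\Phi(\mathbf{a}, h) \end{array} \right\}$$

where $\mathsf{Good}(\mathbf{a}, h)$ and $\mathsf{Spare}_\Phi(\mathbf{a}, h)$ are the sets of values given by Definition 3.2. It should be clear that each $\psi_{i,j}$ is a monotone operator. Thus we define the tuple $\mathsf{MC}^\Phi(\mathbf{a}, h) \in \tau_1 \times \ldots \times \tau_n$ by

$$\mathsf{MC}^\Phi(\mathbf{a}, h) =_{\mathrm{def}} \mu\mathbf{Y}.\, \Big( \bigcup_j \psi_{1,j}(\mathbf{Y}), \ldots, \bigcup_j \psi_{n,j}(\mathbf{Y}) \Big)$$

We write $\mathsf{MC}^\Phi_i(\mathbf{a}, h)$ as an abbreviation for $\mathrm{proj}_i(\mathsf{MC}^\Phi(\mathbf{a}, h))$. For the remainder of this section, we shall assume a fixed instance of $\mathsf{MC}^\Phi(\mathbf{a}, h)$, given by choosing inductive rule set $\Phi$, tuple of values $\mathbf{a}$ and heap $h$.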

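It should be fairly obvious by comparing the constructions in Definitions 2.4 and 3.3 that $\mathsf{MC}^\Phi_i(\mathbf{a}, h)$ can only contain tuples that are already elements of $\llbracket P_i \rrbracket^\Phi$. The following lemma formalises that claim.

Lemma 3.4 (Soundness). $\mathsf{MC}^\Phi(\mathbf{a}, h) \subseteq \llbracket \mathbf{P} \rrbracket^\Phi$.

Proof. We proceed by fixed point induction on the tuple of sets $\mathsf{MC}^\Phi(\mathbf{a}, h)$. That is, we assume the inclusion $\mathbf{Y} \subseteq \llbracket \mathbf{P} \rrbracket^\Phi$ holds for some tuple of sets $\mathbf{Y} = (Y_1, \ldots, Y_n) \in \tau_1 \times \ldots \times \tau_n$, and must show it holds for $(\bigcup_j \psi_{1,j}(\mathbf{Y}), \ldots, \bigcup_j \psi_{n,j}(\mathbf{Y}))$. This means, assuming that $(\mathbf{b}, h') \in \psi_{i,j}(\mathbf{Y})$ for some inductive rule $\Phi_{i,j}$, we must show that $(\mathbf{b}, h') \in \llbracket P_i \rrbracket^\Phi$. Without loss of generality, we can consider $\Phi_{i,j}$ to be written in the form:

$$\exists\mathbf{z}.\, \Pi : y_1 \mapsto \mathbf{u}_1 * \ldots * y_k \mapsto \mathbf{u}_k * P_{j_1}\mathbf{x}_1 * \ldots * P_{j_m}\mathbf{x}_m \Rightarrow P_i\mathbf{x}.$$

By construction of $\psi_{i,j}(\mathbf{Y})$, there is a stack $s$ with $s(\mathbf{x}) = \mathbf{b}$ and

$$s, h' \models_{\mathbf{Y}} \Pi : y_1 \mapsto \mathbf{u}_1 * \ldots * y_k \mapsto \mathbf{u}_k * P_{j_1}\mathbf{x}_1 * \ldots * P_{j_m}\mathbf{x}_m.$$

This means that $s \models \Pi$ and $h' = h_1 \circ \ldots \circ h_{k+m}$, where

$$\begin{array}{ll}
s, h_i \models_{\mathbf{Y}} y_i \mapsto \mathbf{u}_i & \text{for all } 1 \le i \le k, \\
\text{and } s, h_{k+i} \models_{\mathbf{Y}} P_{j_i}\mathbf{x}_i & \text{for all } 1 \le i \le m.
\end{array}$$

In particular, for any $1 \le i \le m$ we have $(s(\mathbf{x}_i), h_{k+i}) \in Y_{j_i}$ and thus, by the induction hypothesis, $(s(\mathbf{x}_i), h_{k+i}) \in \llbracket P_{j_i} \rrbracket^\Phi$. That is,

$$\begin{array}{ll}
s, h_i \models_\Phi y_i \mapsto \mathbf{u}_i & \text{for all } 1 \le i \le k, \\
\text{and } s, h_{k+i} \models_\Phi P_{j_i}\mathbf{x}_i & \text{for all } 1 \le i \le m.
\end{array}$$

Putting everything together, we have

$$s, h' \models_\Phi \Pi : y_1 \mapsto \mathbf{u}_1 * \ldots * y_k \mapsto \mathbf{u}_k * P_{j_1}\mathbf{x}_1 * \ldots * P_{j_m}\mathbf{x}_m.$$

Therefore, $s, h' \models_\Phi P_i\mathbf{x}$, i.e., $(\mathbf{b}, h') \in \llbracket P_i \rrbracket^\Phi$ as required.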

Next, we must show that $\mathsf{MC}^\Phi_i(\mathbf{a}, h)$ contains all $(\mathbf{b}, h') \in \llbracket P_i \rrbracket^\Phi$ that are “sub-models” of $(\mathbf{a}, h)$. To do this, we need to argue that for any element $(\mathbf{b}, h') \in \llbracket P_i \rrbracket^\Phi$ that is “almost a sub-model” of $(\mathbf{a}, h)$ in that $h' \subseteq h$ but $\mathbf{b}$ contains “bad” values (not in $\mathsf{Good}(\mathbf{a}, h)$ or $\mathsf{Spare}_\Phi(\mathbf{a}, h)$), there are corresponding sub-models in $\mathsf{MC}^\Phi_i(\mathbf{a}, h)$, obtained by substituting “spare” values for “bad” ones. The following definition captures the relevant notion of substitution.

Definition 3.5. A finite partial function $\theta : \mathsf{Val} \rightharpoonup_{\mathrm{fin}} \mathsf{Val}$ is called a substitution for $\mathsf{MC}^\Phi(\mathbf{a}, h)$ if it is injective, and, for all $b \in \mathrm{dom}(\theta)$,

$$\begin{array}{ll}
\theta(b) = b & \text{if } b \in \mathsf{Good}(\mathbf{a}, h), \text{ and} \\
\theta(b) \in \mathsf{Spare}_\Phi(\mathbf{a}, h) & \text{if } b \notin \mathsf{Good}(\mathbf{a}, h).
\end{array}$$

Next, the following technical lemma, which will be crucial to completeness, captures the fact that we can “recycle” values as needed. Roughly speaking, it says that we can extend a substitution on the values $\mathbf{b}$ instantiating the head of an inductive rule to a substitution on the values $V \supseteq \mathbf{b}$ instantiating the head of the rule and the existentially quantified variables in its body. This relies on the fact that, by construction, there are at least as many spare values in $\mathsf{Spare}_\Phi(\mathbf{a}, h)$ as there are variables in any inductive rule.

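Lemma 3.6. Let $\theta$ be a substitution for $\mathsf{MC}^\Phi(\mathbf{a}, h)$ such that $\mathrm{dom}(\theta) \supseteq \mathbf{b}$, and $V \subset \mathsf{Val}$ a (finite) set of values with $\mathbf{b} \subseteq V$ and $|V| \le |\mathsf{Spare}_\Phi(\mathbf{a}, h)|$. Let $\mathsf{Spare}_\Phi(\mathbf{a}, h) \setminus \theta(\mathbf{b}) = \{d_1, \ldots, d_m\}$ and let $V \setminus (\mathbf{b} \cup \mathsf{Good}(\mathbf{a}, h)) = \{e_1, \ldots, e_k\}$. Then the function $\theta' : V \to \mathsf{Val}$, defined by

$$\theta'(c) =_{\mathrm{def}} \begin{cases}
c & \text{if } c \in \mathsf{Good}(\mathbf{a}, h) \\
\theta(c) & \text{if } c \in \mathbf{b} \setminus \mathsf{Good}(\mathbf{a}, h) \\
d_i & \text{if } c = e_i \text{ for some } 1 \le i \le k,
\end{cases}$$

is also a substitution for $\mathsf{MC}^\Phi(\mathbf{a}, h)$, with $\theta'(\mathbf{b}) = \theta(\mathbf{b})$.

Proof. For convenience, we abbreviate $\mathsf{Good}(\mathbf{a}, h)$ by $\mathcal{G}$ and $\mathsf{Spare}_\Phi(\mathbf{a}, h)$ by $\mathcal{S}$ in this proof.

First, since $V \subset \mathsf{Val}$ is finite, $\theta'$ is indeed a finite partial function $\mathsf{Val} \rightharpoonup_{\mathrm{fin}} \mathsf{Val}$. We argue that $\theta'$ is well-defined. The three cases of its definition above are non-overlapping by construction, the first case is trivially well-defined and the second case is well-defined since $\mathrm{dom}(\theta) \supseteq \mathbf{b}$. Thus we just need to show that the third case is well-defined, which means showing that $k \le m$, i.e.,

$$|V \setminus (\mathbf{b} \cup \mathcal{G})| \le |\mathcal{S} \setminus \theta(\mathbf{b})|.$$

Since $\theta$ is injective by assumption, $|\theta(\mathbf{b})| = |\mathbf{b}|$. Thus, as $|V| \le |\mathcal{S}|$, we have $|V| - |\mathbf{b}| \le |\mathcal{S}| - |\theta(\mathbf{b})|$. Then, using standard set theory, we have as required

$$\begin{array}{rcll}
|V \setminus (\mathbf{b} \cup \mathcal{G})| & \le & |V \setminus \mathbf{b}| & \\
& = & |V| - |\mathbf{b}| & (\text{since } \mathbf{b} \subseteq V) \\
& \le & |\mathcal{S}| - |\theta(\mathbf{b})| & (\text{by the above}) \\
& \le & |\mathcal{S} \setminus \theta(\mathbf{b})|. &
\end{array}$$

Next we argue that $\theta'$ is indeed a substitution for $\mathsf{MC}^\Phi(\mathbf{a}, h)$. It is easy to see that $\theta'(c) = c$ if $c \in \mathsf{Good}(\mathbf{a}, h)$ and $\theta'(c) \in \mathsf{Spare}_\Phi(\mathbf{a}, h)$ otherwise. We just need to show $\theta'$ is injective. This follows from the fact that the three definitional cases of $\theta'$ are given by three injective functions with pairwise disjoint ranges: $\mathcal{G}$, $\theta(\mathbf{b})$ $(\subseteq \mathcal{S})$ and $\mathcal{S} \setminus \theta(\mathbf{b})$, respectively. Hence if $\theta'(c_1) = \theta'(c_2)$ then both $c_1$ and $c_2$ fall into the same definitional case of $\theta'$, and so $c_1 = c_2$ by injectivity of the corresponding function. Thus indeed $\theta'$ is a substitution for $\mathsf{MC}^\Phi(\mathbf{a}, h)$ as required.

Finally, to see that $\theta'(\mathbf{b}) = \theta(\mathbf{b})$, observe that $\theta'(c) = \theta(c)$ immediately if $c \in \mathbf{b} \setminus \mathcal{G}$, and if $c \in \mathbf{b} \cap \mathcal{G}$ then $\theta'(c) = c = \theta(c)$, using the fact that $\theta$ is a substitution for $\mathsf{MC}^\Phi(\mathbf{a}, h)$.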
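The construction of the extended substitution in Lemma 3.6 can be sketched directly from its three cases. The encoding is an assumption made for illustration: substitutions are Python dictionaries, the spare values are passed as a list so that an order on the unused ones is fixed, and the name `extend_substitution` is invented here.

```python
def extend_substitution(theta, b, V, good, spare):
    """Build the extended substitution on V from theta, following Lemma 3.6.

    theta: dict defined at least on the values in b
    b:     values instantiating the head of the rule
    V:     finite set of values with b inside V and len(V) <= len(spare)
    good:  the set Good(a, h)
    spare: list of the spare values Spare(a, h)
    """
    b = set(b)
    used = {theta[c] for c in b}
    unused = [d for d in spare if d not in used]  # the values d_1, ..., d_m
    # The values e_1, ..., e_k; sorting by repr only fixes a deterministic order.
    new_bad = sorted((c for c in V if c not in b and c not in good), key=repr)
    if len(new_bad) > len(unused):
        raise ValueError("not enough spare values; the lemma needs |V| <= |Spare|")
    extended = {}
    for c in V:
        if c in good:
            extended[c] = c          # first case: good values are fixed
        elif c in b:
            extended[c] = theta[c]   # second case: agree with theta on b
    extended.update(zip(new_bad, unused))  # third case: e_i maps to d_i
    return extended
```

The `ValueError` branch corresponds to the side condition that the proof shows can never be violated under the hypotheses of the lemma.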

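Lemma 3.7 (Completeness). Let $(\mathbf{b}, h') \in \llbracket P_i \rrbracket^\Phi$ and $h' \subseteq h$, and let $\theta$ be a substitution for $\mathsf{MC}^\Phi(\mathbf{a}, h)$ with $\mathrm{dom}(\theta) \supseteq \mathbf{b}$. Then $(\theta(\mathbf{b}), h') \in \mathsf{MC}^\Phi_i(\mathbf{a}, h)$.

Proof. We proceed by fixed point induction on $\llbracket \mathbf{P} \rrbracket^\Phi$. That is, we assume the lemma holds for $\mathbf{Y} = (Y_1, \ldots, Y_n) \in \tau_1 \times \ldots \times \tau_n$, and must show it holds for $(\bigcup_j \varphi_{1,j}(\mathbf{Y}), \ldots, \bigcup_j \varphi_{n,j}(\mathbf{Y}))$. This means, assuming that $(\mathbf{b}, h') \in \varphi_{i,j}(\mathbf{Y})$ for some inductive rule $\Phi_{i,j}$, where $h' \subseteq h$ and we have a $\theta$ satisfying the conditions of the lemma, we must show that $(\theta(\mathbf{b}), h') \in \mathsf{MC}^\Phi_i(\mathbf{a}, h)$.

Without loss of generality, we may consider $\Phi_{i,j}$ to be written in the form:

$$\exists\mathbf{z}.\, \Pi : y_1 \mapsto \mathbf{u}_1 * \ldots * y_k \mapsto \mathbf{u}_k * P_{j_1}\mathbf{x}_1 * \ldots * P_{j_m}\mathbf{x}_m \Rightarrow P_i\mathbf{x}.$$

By construction of $\varphi_{i,j}$, we have a stack $s$ such that $s(\mathbf{x}) = \mathbf{b}$ and

$$s, h' \models_{\mathbf{Y}} \Pi : y_1 \mapsto \mathbf{u}_1 * \ldots * y_k \mapsto \mathbf{u}_k * P_{j_1}\mathbf{x}_1 * \ldots * P_{j_m}\mathbf{x}_m.$$

This means that $s \models \Pi$ and $h' = h_1 \circ \ldots \circ h_{k+m}$, where

$$\begin{array}{ll}
s, h_i \models y_i \mapsto \mathbf{u}_i & \text{for all } 1 \le i \le k, \\
\text{and } s, h_{k+i} \models_{\mathbf{Y}} P_{j_i}\mathbf{x}_i & \text{for all } 1 \le i \le m.
\end{array}$$

The latter two statements can be rewritten as follows:

$$\begin{array}{ll}
\mathrm{dom}(h_i) = \{s(y_i)\} \text{ and } h_i(s(y_i)) = s(\mathbf{u}_i) & \text{for all } 1 \le i \le k, \\
\text{and } (s(\mathbf{x}_i), h_{k+i}) \in Y_{j_i} & \text{for all } 1 \le i \le m.
\end{array}$$

Recall that $\mathbf{x}$ and $\mathbf{z}$ describe respectively the sets of all free and bound variables appearing in the inductive rule $\Phi_{i,j}$. We have that $s(\mathbf{x} \cup \mathbf{z}) \subset \mathsf{Val}$ is finite, and $\mathbf{b} = s(\mathbf{x}) \subseteq s(\mathbf{x} \cup \mathbf{z})$ and $|s(\mathbf{x} \cup \mathbf{z})| \le |\mathsf{Spare}_\Phi(\mathbf{a}, h)|$ by construction. Therefore, by taking $V = s(\mathbf{x} \cup \mathbf{z})$ in Lemma 3.6, and noting $\mathrm{dom}(\theta) \supseteq \mathbf{b}$ by assumption, we can obtain a substitution $\theta'$ for $\mathsf{MC}^\Phi(\mathbf{a}, h)$ with $\mathrm{dom}(\theta') = s(\mathbf{x} \cup \mathbf{z})$ and $\theta'(\mathbf{b}) = \theta(\mathbf{b})$.

Now, since $\theta'$ is injective, it is easy to see that $s \circ \theta' \models \Pi$ (where $\circ$ here denotes function composition, applying $s$ first). Additionally, since $s(y_i)$ and all values in $s(\mathbf{u}_i)$ lie in $\mathrm{cover}(h) \subseteq \mathsf{Good}(\mathbf{a}, h)$, and $\theta'$ is the identity on $\mathsf{Good}(\mathbf{a}, h)$, we have by construction, for all $1 \le i \le k$,

$$\mathrm{dom}(h_i) = \{\theta'(s(y_i))\} \text{ and } h_i(\theta'(s(y_i))) = \theta'(s(\mathbf{u}_i)), \quad \text{i.e. } s \circ \theta', h_i \models y_i \mapsto \mathbf{u}_i.$$

Notice that, for any $1 \le i \le m$, we have both $h_{k+i} \subseteq h' \subseteq h$ and $\mathrm{dom}(\theta') \supseteq s(\mathbf{x}_i)$. Therefore, by the induction hypothesis,

$$(\theta'(s(\mathbf{x}_i)), h_{k+i}) \in \mathsf{MC}^\Phi_{j_i}(\mathbf{a}, h) \quad \text{for all } 1 \le i \le m.$$

Putting everything together, we obtain

$$s \circ \theta', h' \models_{\mathsf{MC}^\Phi(\mathbf{a}, h)} \Pi : y_1 \mapsto \mathbf{u}_1 * \ldots * y_k \mapsto \mathbf{u}_k * P_{j_1}\mathbf{x}_1 * \ldots * P_{j_m}\mathbf{x}_m.$$

As $(s \circ \theta')(\mathbf{x} \cup \mathbf{z}) \subseteq \mathsf{Good}(\mathbf{a}, h) \cup \mathsf{Spare}_\Phi(\mathbf{a}, h)$ by construction, we obtain by the definition of $\mathsf{MC}^\Phi(\mathbf{a}, h)$ (Definition 3.3):

$$((s \circ \theta')(\mathbf{x}), h') \in \mathsf{MC}^\Phi_i(\mathbf{a}, h).$$

Finally, as $s(\mathbf{x}) = \mathbf{b}$ and $\theta'$ coincides with $\theta$ on $\mathbf{b}$, we have $(s \circ \theta')(\mathbf{x}) = \theta'(s(\mathbf{x})) = \theta(\mathbf{b})$. Thus we obtain as required

$$(\theta(\mathbf{b}), h') \in \mathsf{MC}^\Phi_i(\mathbf{a}, h).$$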

Lemma 3.8. For each $1 \le i \le n$,

$$(\mathbf{a}, h) \in \llbracket P_i \rrbracket^\Phi \;\Leftrightarrow\; (\mathbf{a}, h) \in \mathsf{MC}^\Phi_i(\mathbf{a}, h).$$

Proof. The $(\Leftarrow)$ direction follows directly from Lemma 3.4. The $(\Rightarrow)$ direction follows from Lemma 3.7 by taking $(\mathbf{b}, h')$ there to be $(\mathbf{a}, h)$, and $\theta$ to be the identity function on $\mathbf{a}$ (noting that $\mathbf{a} \subseteq \mathsf{Good}(\mathbf{a}, h)$, so this is trivially a substitution in the sense of Definition 3.5).

Lemma 3.9. $\mathsf{MC}^\Phi(\mathbf{a}, h)$ is finite and computable.

Proof. By construction (Definition 3.3), $\mathsf{MC}^\Phi(\mathbf{a}, h)$ can only contain tuples of the form $(\mathbf{b}, h')$, where $h' \subseteq h$ and $\mathbf{b}$ is a finite tuple of values, drawn from the finite set $\mathsf{Good}(\mathbf{a}, h) \cup \mathsf{Spare}_\Phi(\mathbf{a}, h)$. As the heap $h$ is also finite, each such $(\mathbf{b}, h')$ is a finite object and there can be only finitely many of them. Hence $\mathsf{MC}^\Phi(\mathbf{a}, h)$ is finite.

To see that $\mathsf{MC}^\Phi(\mathbf{a}, h)$ is computable, observe that it is defined as the least fixed point of a monotone operator. It is well known that this least fixed point can be approached iteratively in approximant stages, starting from the $n$-tuple $(\emptyset, \ldots, \emptyset)$. Since $\mathsf{MC}^\Phi(\mathbf{a}, h)$ is finite, there can be only finitely many such approximants. To see that each one is computable, it suffices to show that any $\psi_{i,j}(\mathbf{Y})$ is computable, given that $\mathbf{Y} \in \tau_1 \times \ldots \times \tau_n$ is computable and the inductive rule $\Phi_{i,j}$ is of the form $\exists\mathbf{z}.\, \Pi : F \Rightarrow P_i\mathbf{x}$, say. This is quite clear: First, there are only finitely many membership candidates $(\mathbf{b}, h')$ with $h' \subseteq h$ and $\mathbf{b} \subseteq \mathsf{Good}(\mathbf{a}, h) \cup \mathsf{Spare}_\Phi(\mathbf{a}, h)$. Second, since whether $s, h' \models_{\mathbf{Y}} \Pi : F$ depends only on the values $s$ assigns to the variables appearing in $\Pi : F$, for each candidate $(\mathbf{b}, h')$ it suffices to pick an arbitrary stack $s$ with $s(\mathbf{x}) = \mathbf{b}$ and $s(\mathbf{z}) \subseteq \mathsf{Good}(\mathbf{a}, h) \cup \mathsf{Spare}_\Phi(\mathbf{a}, h)$. Finally, for any such $s$, $h'$ and computable $\mathbf{Y}$ it is straightforward to decide whether $s, h' \models_{\mathbf{Y}} \Pi : F$.

Theorem 3.10. The model checking problem MC is decidable. That is, for any stack $s$, heap $h$, inductive rule set $\Phi$ and symbolic heap $A$, it is decidable whether $s, h \models_\Phi A$.

Proof. By Proposition 3.1, it suffices to show that RMC is decidable. Let $(\Phi, \mathbf{a}, h, P_i)$ be an instance of RMC. By Lemma 3.8, deciding whether $(\mathbf{a}, h) \in \llbracket P_i \rrbracket^\Phi$ is equivalent to deciding whether $(\mathbf{a}, h) \in \mathsf{MC}^\Phi_i(\mathbf{a}, h)$. By Lemma 3.9, we have that $\mathsf{MC}^\Phi_i(\mathbf{a}, h) = \mathrm{proj}_i(\mathsf{MC}^\Phi(\mathbf{a}, h))$ is a finite and computable set. Hence it is decidable whether $(\mathbf{a}, h) \in \mathsf{MC}^\Phi_i(\mathbf{a}, h)$.

Remark 3.11. Since satisfiability for our logic is known to be decidable [10], one might imagine that we can simply reduce model checking to satisfiability: encode the state $(s, h)$ as a formula $\gamma(s, h)$, so that $s, h \models_\Phi A$ iff $\gamma(s, h) \wedge A$ is satisfiable. Unfortunately, this does not work for our logic since standard conjunction $\wedge$ between arbitrary symbolic heaps is not permitted.

Remark 3.12. In practice, we might sometimes want to consider “intuitionistic” model checking queries, of the following form: Given an inductive rule set $\Phi$, stack $s$, heap $h$ and formula $A$, decide whether there is an $h' \subseteq h$ such that $s, h' \models_\Phi A$. As in Proposition 3.1, we may assume without loss of generality that $A = P_i\mathbf{x}$ for some predicate symbol $P_i$. This problem is clearly also decidable: letting $s(\mathbf{x}) = \mathbf{a}$, we simply check whether $(\mathbf{a}, h') \in \mathsf{MC}^\Phi_i(\mathbf{a}, h)$ for some $h'$. Correctness follows similarly to Lemma 3.8. Indeed, all of our correctness and complexity results in this paper adapt straightforwardly to intuitionistic queries.

We conclude this section by deducing some immediate consequences of Theorem 3.10 for the entailment problem in $\mathsf{SL}^{\mathsf{SH}}_{\mathsf{ID}}$.

Definition 3.13. Given an inductive rule set $\Phi$ and symbolic heaps $A$, $B$, we say the entailment $A \vdash_\Phi B$ is valid if $s, h \models_\Phi A$ implies $s, h \models_\Phi B$ for all stacks $s$ and heaps $h$, and invalid otherwise.

It was shown in [4] that the set of valid entailments is not recursively enumerable (and, therefore, validity is, in general, undecidable). However, it does turn out to be co-recursively enumerable.

Corollary 3.14. For any entailment $A \vdash_\Phi B$, the set of its countermodels, $\{(s, h) \mid s, h \models_\Phi A \text{ and } s, h \not\models_\Phi B\}$, is recursively enumerable.

Proof. First, the set of all heaps is recursively enumerable, since heaps are finite objects. Second, although stacks are not finite objects, it clearly suffices to enumerate only the values of $s$ on the finite set of variables $\mathit{FV}(A) \cup \mathit{FV}(B) = \mathbf{x}$, say. Thus we can recursively enumerate all “representative candidates” of the form $(s(\mathbf{x}), h)$. Finally, for any such candidate model, we can decide whether $s, h \models_\Phi A$ and $s, h \not\models_\Phi B$ by Theorem 3.10.

Corollary 3.15. For any inductive rule set $\Phi$, the set of invalid entailments over $\Phi$ is recursively enumerable.

Proof. The set of all symbolic heaps over $\Phi$ is recursively enumerable, so the set of all entailments $A \vdash_\Phi B$ is also enumerable. Next, note that the set of countermodels of a given entailment is enumerable (Corollary 3.14). Thus the invalid entailments are recursively enumerable simply by enumerating all entailments and, for each of these, dovetailing the process of searching for a countermodel.
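The dovetailing argument can be illustrated with a generic sketch. Everything here is an assumption made for illustration: `entailment(i)` and `candidate(j)` stand for hypothetical enumerations of all entailments and all candidate models, and `is_countermodel` stands for the decidable check provided by Theorem 3.10.

```python
from itertools import count, islice


def invalid_entailments(entailment, candidate, is_countermodel):
    """Enumerate the entailments that have a countermodel, by dovetailing.

    The search walks the diagonals i + j = n of the infinite grid of
    (entailment index, candidate index) pairs, so every pair is eventually
    examined even though both enumerations are infinite.
    """
    reported = set()
    for n in count():
        for i in range(n + 1):
            j = n - i
            if i not in reported and is_countermodel(entailment(i), candidate(j)):
                reported.add(i)
                yield entailment(i)


# Toy usage with integers in place of entailments and models: entailment e is
# treated as invalid exactly when e is odd, with its only countermodel being 2 * e.
first_three = list(islice(
    invalid_entailments(lambda i: i, lambda j: j, lambda e, c: e % 2 == 1 and c == 2 * e),
    3,
))
```

In the toy usage the generator never terminates on its own, which matches the semi-decision character of the corollary; `islice` is used to take a finite prefix.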


4. Complexity of General Model Checking

In this section we investigate the computational complexity of the general model checking problem MC, as described in the previous section. Specifically, we show that MC is EXPTIME-complete, and is still NP-hard in the size of the heap when the underlying inductive rule set is fixed in advance.

In the following, we write $\|o\|$ to denote the length of (some reasonable) encoding of a finite mathematical object $o$.

Lemma 4.1. MC is EXPTIME-hard.

Proof. By Proposition 3.1, it suffices to show that the restricted model checking problem, RMC, is EXPTIME-hard. This is by reduction from the following special case of the satisfiability problem for $\mathsf{SL}^{\mathsf{SH}}_{\mathsf{ID}}$, which was shown to be EXPTIME-hard in [10]: given an inductive rule set $\Phi$ containing no occurrences of $\mapsto$, and a predicate symbol $P$ from $\Phi$ of arity 0, decide whether there exists a model $s, h$ such that $s, h \models_\Phi P$. Since $\Phi$ contains no occurrences of $\mapsto$ and $P$ no free variables, this means deciding whether $e \in \llbracket P \rrbracket^\Phi$ (recall $e$ is the empty heap). Thus, given any instance $(\Phi, P)$ of the above problem, the corresponding instance of RMC is simply given by $(\Phi, (), e, P)$.

Lemma 4.2. MC ∈ EXPTIME.

Proof. By Proposition 3.1, it suffices to show RMC ∈ EXPTIME. By Lemma 3.8, deciding a given instance $I = (\Phi, \mathbf{a}, h, P_i)$ of RMC can be done by computing $MC^{\Phi}(\mathbf{a}, h)$ and checking whether $(\mathbf{a}, h) \in \mathrm{proj}_i(MC^{\Phi}(\mathbf{a}, h))$. Thus it suffices to show that $MC^{\Phi}(\mathbf{a}, h)$ can be computed in time exponential in $m =_{\mathrm{def}} \|I\|$.

Recall that $MC^{\Phi}(\mathbf{a}, h)$ is obtained by a fixed point construction of a monotone operator (Definition 3.3):

$$MC^{\Phi}(\mathbf{a}, h) =_{\mathrm{def}} \mu Y.\ \Big(\bigcup_j \psi_{1,j}(Y), \ldots, \bigcup_j \psi_{n,j}(Y)\Big)$$

This least fixed point can be approached iteratively from below, starting from $(\emptyset, \ldots, \emptyset)$. Writing $N = |MC^{\Phi}(\mathbf{a}, h)|$, this process will reach a fixed point in at most $N$ iterations. Let $T$ be the maximum number of polynomial-time steps required to compute any $\psi_{i,j}(Y)$, given the earlier fixed point approximant $Y$. Since each iteration involves the computation of $\psi_{i,j}(Y)$ for every inductive rule $\Phi_{i,j} \in \Phi$, it is clear that computing $MC^{\Phi}(\mathbf{a}, h)$ requires $N \cdot |\Phi| \cdot T$ polynomial-time steps.

Now, to obtain an upper bound for $N$, observe that by construction $MC^{\Phi}(\mathbf{a}, h)$ contains only pairs of the form $(\mathbf{b}, h')$ such that $h' \subseteq h$, the length of $\mathbf{b}$ is bounded by the maximum arity of any predicate, which is bounded by $\|\Phi\|$, and $\mathbf{b} \subseteq \mathrm{Good}(\mathbf{a}, h) \cup \mathrm{Spare}_\Phi(\mathbf{a}, h)$, a set whose size is bounded by $\|\mathbf{a}\| + \|h\| + 1 + \|\Phi\| = \|I\| + 1$ (the extra 1 comes from nil). Therefore, we obtain

$$N \le (\|I\| + 1)^{\|\Phi\|} \cdot 2^{\|h\|} = O(2^{\mathrm{poly}(m)})$$

Next, we obtain an upper bound for $T$. Given $Y$ and the inductive rule $\Phi_{i,j}$ of the form $\exists \mathbf{z}.\ \Pi : F \Rightarrow P_i\mathbf{x}$, say, it clearly suffices, for any “candidate” $(\mathbf{b}, h')$ of the above form, to decide whether or not $(\mathbf{b}, h') \in \psi_{i,j}(Y)$. This means checking whether $h' \subseteq h$ and, for every valuation of the variables $\mathbf{x} \cup \mathbf{z}$ into $\mathrm{Good}(\mathbf{a}, h) \cup \mathrm{Spare}_\Phi(\mathbf{a}, h)$, checking whether $s(\mathbf{x}) = \mathbf{b}$ and $s, h' \models_Y \Pi : F$ (where $s$ is any stack obtained by extending the chosen valuation). The heap inclusion check can be done in polynomial time, and the number of possible valuations is (easily) bounded by $N$. To check $s, h' \models_Y \Pi : F$ we might need to consider every possible division of $h'$ into a number of “subheaps” bounded by the maximum number of $*$s in any rule, in turn bounded by $\|\Phi\|$ (as per the proofs of Lemmas 3.4 and 3.7). There are at most $2^{\|h\| \cdot \|\Phi\|}$ such combinations. Finally, as $Y$ might contain up to $N$ elements, checking whether a chosen division of $h'$ satisfies $F$ with respect to $Y$ might take up to $N$ steps. All other checks are polynomial, so we obtain

$$T \le N \cdot N \cdot N \cdot 2^{\|h\| \cdot \|\Phi\|} = O(2^{\mathrm{poly}(m)})$$

Therefore, altogether, the computation of $MC^{\Phi}(\mathbf{a}, h)$ requires at most $N \cdot |\Phi| \cdot T = O(2^{\mathrm{poly}(m)})$ polynomial-time steps.
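The bottom-up iteration used in this proof can be illustrated on a single predicate. The following is a minimal Python sketch, not the paper's implementation: it assumes heaps with one-field cells represented as dictionaries, hard-codes the two rules of the list-segment predicate ls from Example 5.3, and uses the hypothetical names `ls_models` and `check_ls`. Each approximant is a set of pairs ((x, y), d), read as "the subheap with domain d satisfies ls(x, y)".

```python
NIL = "nil"

def ls_models(heap):
    """Least fixed point of the rule operator for ls over subheaps of `heap`."""
    values = set(heap) | set(heap.values()) | {NIL}
    approx = set()                      # start from the empty approximant
    while True:
        # Rule  x = y : emp => ls(x, y)
        new = {((v, v), frozenset()) for v in values}
        # Rule  exists z. x |-> z * ls(z, y) => ls(x, y)
        for (z, y), dom in approx:
            for x in heap:
                if heap[x] == z and x not in dom:
                    new.add(((x, y), dom | {x}))
        if new == approx:               # nothing new was derived: fixed point
            return approx
        approx = new

def check_ls(heap, x, y):
    """Does the whole heap satisfy ls(x, y)?"""
    return ((x, y), frozenset(heap)) in ls_models(heap)
```

The number of candidate pairs is exponential in the size of the heap in general, which is the source of the bound on N above.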

Theorem 4.3. MC is EXPTIME-complete.

Proof. Immediate by Lemmas 4.1 and 4.2.

Typically, in program verification applications, the definitions of the inductive predicates are fixed in advance. Thus, it is also of interest to know how the complexity of MC varies in the size of the heap $h$ over a fixed inductive rule set $\Phi$.

Proposition 4.4. MC is NP-hard in $\|h\|$.

Proof. By Proposition 3.1, it suffices to show RMC is NP-hard in $\|h\|$. We exhibit a polynomial-time reduction from the following triangle partition problem, known to be NP-complete [21]: given a graph $G = (V, E)$ with $|V| = 3q$ for some $q > 0$, decide whether there is a partition of $G$ into triangles.

First, we fix the following inductive rule set $\Phi_{PT}$:

$$x \mapsto \mathsf{nil} \Rightarrow V(x)$$
$$e \mapsto (x, y) * e' \mapsto (y, x) \Rightarrow E(x, y)$$
$$V(x) * V(y) * V(z) * E(x, y) * E(y, z) * E(z, x) \Rightarrow T$$
$$E(x, y) * J \Rightarrow J \qquad \mathsf{emp} \Rightarrow J \qquad T * P \Rightarrow P \qquad J \Rightarrow P$$

Now we give the reduction. For any instance $G = (V, E)$ of the above triangle partition problem, first write $V = \{v_1, \ldots, v_n\}$ and $E = \{e_1, \ldots, e_m\}$. We let $a_1, \ldots, a_n$ and $b_1, \ldots, b_{2m}$ be distinct addresses in Loc, and define a heap $h_G$, with $\mathrm{dom}(h_G) = \{a_1, \ldots, a_n, b_1, \ldots, b_{2m}\}$, as follows:

$h_G(a_i) = \mathsf{nil}$ for $1 \le i \le n$,

$h_G(b_{2k-1}) = (a_i, a_j)$ for $1 \le k \le m$ and $e_k = \{v_i, v_j\}$,

$h_G(b_{2k}) = (a_j, a_i)$ for $1 \le k \le m$ and $e_k = \{v_i, v_j\}$.

The required instance of RMC is then given by $(\Phi_{PT}, (), h_G, P)$ (note that $\Phi_{PT}$ and $P$ are fixed for any $G$). Clearly it is polynomial-time computable. For correctness, we need to show that $G$ has a partition into triangles if and only if $h_G \in \llbracket P \rrbracket^{\Phi_{PT}}$. This follows from the following easy observations, for any subheap $h$ of $h_G$ and values $c, d$ (the formal details are easily reconstructed):

• $(c, h) \in \llbracket V \rrbracket^{\Phi_{PT}}$ iff $(c, h)$ exactly represents a vertex in $G$;

• $((c, d), h) \in \llbracket E \rrbracket^{\Phi_{PT}}$ iff $((c, d), h)$ exactly represents an (undirected) edge in $G$;

• $h \in \llbracket T \rrbracket^{\Phi_{PT}}$ iff $h$ exactly represents a triangle in $G$;

• $h \in \llbracket J \rrbracket^{\Phi_{PT}}$ iff $h$ exactly represents some collection of edges in $G$;

• $h \in \llbracket P \rrbracket^{\Phi_{PT}}$ iff $h$ exactly represents a collection of non-overlapping triangles in $G$ covering all vertices in $G$, plus a collection of “leftover” edges from $G$, i.e. iff $G$ has a partition into triangles.
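The construction of the heap from the graph in this reduction is mechanical and can be sketched in a few lines of Python. This is an illustration only: the function name `triangle_heap` and the tagged-tuple addresses ("a", i) and ("b", k) are assumptions made for the example, and each undirected edge is stored as two cells, one per orientation.

```python
def triangle_heap(vertices, edges):
    """Build a heap h_G for the graph G = (vertices, edges).

    Addresses are tagged tuples: ("a", i) for the i-th vertex and
    ("b", k) for the k-th edge cell (two cells per undirected edge).
    """
    index = {v: i for i, v in enumerate(vertices, start=1)}
    heap = {("a", i): "nil" for i in index.values()}
    for k, (u, v) in enumerate(edges, start=1):
        a_u, a_v = ("a", index[u]), ("a", index[v])
        heap[("b", 2 * k - 1)] = (a_u, a_v)   # one orientation of the edge
        heap[("b", 2 * k)] = (a_v, a_u)       # the reverse orientation
    return heap
```

The resulting heap has n + 2m cells, so the reduction is clearly polynomial in the size of the graph.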

5. Restricted Fragments

According to Theorem 4.3, the general model checking problem is EXPTIME-complete. In practice, however, one frequently encounters definition schemas that are more restrictive than our general schema. Here and in the next section, we investigate the computational complexity of model checking when various natural syntactic restrictions are imposed on predicate definitions. Informally, the restrictions we consider are the following:


MEM: “Memory-consuming” rules, which only permit recursion in the presence of explicit non-empty memory.

CV: “Constructively valued” rules, in which the values of all variables occurring in a rule body are uniquely determined by the values of variables occurring in its head, together with the heap.

DET: “Deterministic” rules, in which the pure (dis)equality conditions in the rules for a predicate $P$ are mutually exclusive.

Arity: The maximum arity of any predicate is fixed in advance.

Importantly, each of the above restrictions can be described in a clear syntactical way. The restrictions DET and CV have appeared previously in the literature, in various guises (e.g. [16, 3]). Together, as we will show in Section 6.5, they imply precision, the notion that a formula unambiguously circumscribes the part of the heap on which it is true [27]. The restriction MEM, as far as we know, is novel, but is seemingly crucial in reducing the complexity of model checking from EXPTIME down to NP or PTIME.

In Section 6 we show that the general EXPTIME bound can be reduced to PSPACE, to NP, or even to PTIME for the fragments defined by different combinations of the above restrictions. The following table summarises our results:

            (none)      CV          DET         CV+DET

non-MEM     EXPTIME     EXPTIME     EXPTIME     ≥PSPACE

MEM         NP          NP          NP          PTIME

Remark 5.1. For each of the combinations MEM, MEM+CV, MEM+DET, their NP-completeness holds even when the arity of the predicates involved is fixed in advance.

In contrast, notwithstanding EXPTIME-hardness, for the fragment defined only by non-memory-consuming rules, model checking can be resolved in PTIME once the arity is fixed, but the degree of the polynomial is proportional to the maximal arity of the predicates involved.

We now formally introduce the restrictions MEM, CV and DET.

Definition 5.2 (MEM). An inductive rule set is said to be memory-consuming (a.k.a. “in MEM”) if every rule in it is of the form

$$\Pi : \mathsf{emp} \Rightarrow P\mathbf{x}, \quad\text{or}\quad \exists \mathbf{z}.\ \Pi : F * x \mapsto \mathbf{t} \Rightarrow P\mathbf{x}.$$

In practice, most predicate definitions in the literature fall into MEM: one or more pointers are “consumed” when recursing.

Example 5.3. The following definitions of binary predicates ls, defining possibly-cyclic list segments by head recursion, and rls, defining possibly-cyclic list segments by tail recursion, are both in MEM. Both definitions “consume” a pointer when recursing.

$$x = y : \mathsf{emp} \Rightarrow \mathsf{ls}(x, y)$$
$$\exists z.\ x \mapsto z * \mathsf{ls}(z, y) \Rightarrow \mathsf{ls}(x, y) \qquad (1)$$

and

$$x = y : \mathsf{emp} \Rightarrow \mathsf{rls}(x, y)$$
$$\exists z.\ x \neq y : \mathsf{rls}(x, z) * z \mapsto y \Rightarrow \mathsf{rls}(x, y) \qquad (2)$$
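Membership in MEM is a purely syntactic property of rule bodies, which the following Python sketch makes explicit. The `Rule` representation (a head plus separate lists of points-to atoms and predicate calls) and the name `is_memory_consuming` are hypothetical choices made for this illustration; pure constraints play no role in the check and are omitted.

```python
from collections import namedtuple

# Hypothetical rule representation: the spatial body is split into
# points-to atoms and predicate calls.
Rule = namedtuple("Rule", ["head", "points_to", "calls"])

def is_memory_consuming(rules):
    """A rule is in MEM if its body is emp or contains a points-to atom."""
    return all(r.points_to or not r.calls for r in rules)

# The two rules for ls from Example 5.3.
ls_rules = [
    Rule(head=("ls", "x", "y"), points_to=[], calls=[]),
    Rule(head=("ls", "x", "y"),
         points_to=[("x", ("z",))], calls=[("ls", "z", "y")]),
]

# A rule that recurses without consuming memory, e.g.  Q * Q => P.
non_mem_rules = [Rule(head=("P",), points_to=[], calls=[("Q",), ("Q",)])]
```

Here `is_memory_consuming(ls_rules)` holds, whereas `is_memory_consuming(non_mem_rules)` does not.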

Definition 5.4 (CV). A variable $z$ occurring in an inductive rule $\exists \mathbf{y}.\ \Pi : F \Rightarrow P_i\mathbf{x}$ is said to be constructively valued in that rule if: (a) $z \in \mathbf{x}$, or (b) there is a variable $w$ also occurring in the rule such that $w$ is constructively valued, and either

• $\Pi \models z = w$, or,

• $w \mapsto \mathbf{u}$ is a subformula of $F$ and $z \in \mathbf{u}$.

An inductive rule is constructively valued if all its variables are, and an inductive rule set is constructively valued (a.k.a. “in CV”) if all its rules are.

Example 5.5. The existentially quantified variable $z$ is constructively valued in the definition of ls in Example 5.3 (1), but not in the definition of rls (2).
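The set of constructively valued variables of a rule can be computed as a simple closure, sketched below in Python. The function name and the argument encoding are assumptions made for the example; in particular, the entailment $\Pi \models z = w$ is approximated by the equalities written explicitly in the pure part, which the closure then propagates transitively.

```python
def constructively_valued(head_vars, equalities, points_to):
    """Return the set of constructively valued variables of one rule.

    head_vars  -- variables occurring in the rule head
    equalities -- pairs (z, w) of variables equated in the pure part
    points_to  -- pairs (w, targets), one per atom  w |-> targets
    """
    cv = set(head_vars)                 # clause (a)
    changed = True
    while changed:                      # close under clause (b)
        changed = False
        derived = set()
        for z, w in equalities:
            if w in cv:
                derived.add(z)
            if z in cv:
                derived.add(w)
        for w, targets in points_to:
            if w in cv:
                derived.update(targets)
        if not derived <= cv:
            cv |= derived
            changed = True
    return cv
```

For the recursive rule of ls, the atom x ↦ z makes z constructively valued; for the recursive rule of rls, z occurs only as the source of z ↦ y and so is not.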

Definition 5.6 (DET). A predicate $P_i$ is said to be deterministic (in an inductive rule set $\Phi$) if for any two distinct rules of the form

$$\exists \mathbf{z}.\ \Pi : F \Rightarrow P_i\mathbf{x} \quad\text{and}\quad \exists \mathbf{z}'.\ \Pi' : F' \Rightarrow P_i\mathbf{x},$$

there exists no stack $s$ such that $s, e \models \exists \mathbf{z}.(\Pi : \mathsf{emp})$ and $s, e \models \exists \mathbf{z}'.(\Pi' : \mathsf{emp})$. An inductive rule set $\Phi$ is deterministic (a.k.a. “in DET”) if all predicates defined in $\Phi$ are deterministic.

We note that whether $\Phi$ is deterministic is decidable in polynomial time via a simple procedure that eliminates pure sub-formulas employing quantified variables [10].

Example 5.7. In Example 5.3, the definition of rls (2) is deterministic, but the definition of ls (1) is not.

6. Complexity of Restricted Fragments

In this section, we investigate the computational complexity of model checking for different combinations of the restrictions MEM, CV and DET introduced in the previous section.

The main technical tool underpinning the following complexity results is the notion of an unfolded inductive tree. The idea is simple: in order to show that $(\mathbf{a}, h) \in \llbracket P \rrbracket^{\Phi}$, we repeatedly unfold the rules from $\Phi$ backwards (from head to body), instantiating variables with values and matching $\mapsto$ assertions with pointers in $h$ as we go according to the rule constraints.

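Definition 6.1. For the sake of brevity, given an elementary formula $Q(x_1, .., x_n)$, we write $\widetilde{Q}$ to denote a statement $Q(a_1, .., a_n)$ obtained by instantiating the variables in $Q$ with values and heap pointers in the obvious way. For example, $\widetilde{x} \mapsto \widetilde{y}$ represents the one-cell heap that contains $\widetilde{y}$ at location $\widetilde{x}$. Then we specify the unfolded inductive tree by induction (see Figure 1):

(a) Let $\widetilde{Q_m}$ be generated by an instantiated rule $\rho_m$ of the form

$$\widetilde{\Pi_m} : h_m \Rightarrow \widetilde{Q_m}$$

Then we make a tree, $T_{\widetilde{Q_m}}$, consisting of one edge labelled by $\rho_m$; the root is labelled by $\widetilde{Q_m}$, the leaf by $h_m$.

[Diagram: the tree $T_{\widetilde{Q_m}}$, with root $\widetilde{Q_m}$, a single edge labelled $\rho_m$, and leaf $h_m$.]

(b) Suppose that an instantiated rule $\rho$ of the form

$$\widetilde{\Pi} : \widetilde{Q_0} * \widetilde{Q_1} * \cdots * \widetilde{Q_m} \Rightarrow \widetilde{R}$$

generates $\widetilde{R}$, and $T_{\widetilde{Q_0}}, T_{\widetilde{Q_1}}, \ldots, T_{\widetilde{Q_m}}$ are inductive trees having been already constructed for $\widetilde{Q_0}, \widetilde{Q_1}, \ldots, \widetilde{Q_m}$, resp.

Then we make a tree, $T_{\widetilde{R}}$, by taking a new root with $m+1$ outgoing edges all labelled by $\rho$, labelling the root by $\widetilde{R}$, and connecting our root with the roots of $T_{\widetilde{Q_0}}, T_{\widetilde{Q_1}}, \ldots, T_{\widetilde{Q_m}}$, resp.

[Diagram: the tree $T_{\widetilde{R}}$, with root $\widetilde{R}$ and $m+1$ edges labelled $\rho$ leading to the roots of the subtrees $T_{\widetilde{Q_0}}, T_{\widetilde{Q_1}}, T_{\widetilde{Q_2}}, \ldots, T_{\widetilde{Q_m}}$.]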


Proposition 6.2. The restricted model checking problem RMC can be solved by an exhaustive search for an unfolded inductive tree for the query $(\mathbf{a}, h) \in \llbracket P \rrbracket^{\Phi}$, where the values for instantiated existential variables are drawn from the set $\mathrm{Good}(\mathbf{a}, h) \cup \mathrm{Spare}_\Phi(\mathbf{a}, h)$, as per Definition 3.2.

Proof. (Sketch) The soundness of the approach is obvious. Termination follows from the fact that the range of permissible values is finite, and completeness from the results in Section 3 which show that it suffices to confine our attention to values drawn from the polynomial-size set $\mathrm{Good}(\mathbf{a}, h) \cup \mathrm{Spare}_\Phi(\mathbf{a}, h)$.

However, the above procedure might still generate an exponential number of leaves labelled by the empty heap $e$:

Example 6.3. Let $\Phi_n$ be the set of inductive rules (for $1 \le j \le n$):

$$P_1 * P_2 \Rightarrow P_0,$$
$$P_{2j+1} * P_{2j+2} \Rightarrow P_{2j-1}, \qquad P_{2j+1} * P_{2j+2} \Rightarrow P_{2j},$$
$$\mathsf{emp} \Rightarrow P_{2n+1}, \qquad \mathsf{emp} \Rightarrow P_{2n+2}.$$

Then any unfolded inductive tree for the query $s, e \models_{\Phi_n} P_0$ has $2^{n+1}$ leaves labelled by $e$.
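The leaf count in Example 6.3 can be checked by a short recursion over the rule set. The Python sketch below (the function name `leaves_of_P0` is an assumption made for the illustration) relies on the fact that every predicate of $\Phi_n$ has exactly one rule, so the unfolded tree is unique; memoisation keeps the computation itself linear in n even though the tree it describes is exponential.

```python
from functools import lru_cache

def leaves_of_P0(n):
    """Number of leaves of the unfolded inductive tree for P_0 under Phi_n."""
    @lru_cache(maxsize=None)
    def leaves(k):
        if k in (2 * n + 1, 2 * n + 2):     # rule  emp => P_k
            return 1
        j = 0 if k == 0 else (k + 1) // 2   # P_k unfolds to P_{2j+1} * P_{2j+2}
        return leaves(2 * j + 1) + leaves(2 * j + 2)
    return leaves(0)
```

Each unfolding step doubles the number of leaves, and there are n + 1 such steps between P_0 and the emp rules.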

Nevertheless, in the case of MEM, we are able to reduce EXPTIME to NP by proving that the number of leaves labelled by $e$ can be bounded by $|\mathrm{dom}(h)|$.

6.1 An Upper NP-Bound for MEM

Here, we show that, when we restrict to memory-consuming rules (MEM), model checking becomes an NP problem.

Theorem 6.4. We can design an NP procedure to determine, given a set of memory-consuming inductive rules $\Phi$, tuple of values $\mathbf{a}$, heap $h$ and predicate symbol $P$, whether $(\mathbf{a}, h) \in \llbracket P \rrbracket^{\Phi}$.

Proof. (Sketch) Taking into account the bounds provided by Section 3, we look for an unfolded inductive tree such that within each rule instance $\rho$ from $\Phi$ used in the tree all values are taken only from a set of polynomial size, which is fixed in advance.

To provide an NP procedure, it suffices to prove Lemma 6.5 and Lemma 6.6 below. The crucial issue is that, in contrast to Example 6.3, the number of leaves labelled by $e$ can be bounded by the size of $\mathrm{dom}(h)$ when all rules are memory-consuming.

Lemma 6.5. According to Section 3, $(\mathbf{a}, h) \in \llbracket P \rrbracket^{\Phi}$ iff there is an unfolded tree for the appropriately instantiated $\widetilde{P}$ such that $h$ is the $*$-composition of all heaps labelling the leaves of the tree.

Lemma 6.6. The number of nodes in the above inductive trees for $\widetilde{P}$ is bounded by $2(m+1) \cdot |\mathrm{dom}(h)|$, where $m$ is the maximal number of predicate symbols in the body of the rules.

Proof. It is clear that the number of leaves labelled by non-empty heaplets is bounded by $|\mathrm{dom}(h)|$. Let $v$ be a leaf labelled by $e$. Then either its parent $w$ is the root of the tree, or the edge of the form $(v', w)$ incoming to $w$ is labelled by some $\rho$, providing thereby a leaf $v_0$ with its incoming edge $(v', v_0)$ labelled by the same $\rho$, such that $v_0$ is labelled by a non-empty $\widetilde{x} \mapsto \widetilde{t}$ (Figure 2 shows such a $\rho$ in MEM). Since no more than $m$ leaves labelled by $e$ can be associated with one and the same $v_0$ specified in such a way, the total number of leaves is bounded by $(m+1) \cdot |\mathrm{dom}(h)|$.

It remains to apply an induction to conclude the proof.

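6.2 NP-Hardness for MEM+CV

Here we show NP-hardness for the restricted fragment MEM+CV. The proof is by reduction from the 3-partition problem [21].

[Diagram: a tree with root $\widetilde{R}$ whose outgoing edges, all labelled $\rho$, lead to the leaf $\widetilde{x} \mapsto \widetilde{t}$ and to the subtrees for $\widetilde{Q_1}, \widetilde{Q_2}, \ldots, \widetilde{Q_m}$; the node $\widetilde{Q_m}$ has a single edge labelled $\rho_m$ to a leaf labelled $e$.]

Figure 2. $\rho = \widetilde{\Pi} : \widetilde{x} \mapsto \widetilde{t} * \widetilde{Q_1} * \widetilde{Q_2} * \cdots * \widetilde{Q_m} \Rightarrow \widetilde{R}$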

Definition 6.7. By means of the length of circular lists, we resolve a key issue: representing integers as logic formulas.

By a ring-formula of length $\ell$, with leading variable $x$, we mean a formula of the form ($x'_1, x'_2, \ldots, x'_\ell$ are fresh):

$$x \mapsto x'_1 * x'_1 \mapsto x'_2 * x'_2 \mapsto x'_3 * \cdots * x'_{\ell-1} \mapsto x'_\ell * x'_\ell \mapsto x'_1$$

Given a 3-partition problem instance, i.e., a bound $B$ and a multiset $S = \{s_1, s_2, \ldots, s_{3m}\}$, we introduce a linearly ordered list $\mathbf{X}$ of distinct variables: $\mathbf{X} = x_1, x_2, \ldots, x_i, \ldots, x_{3m}$.

Then we encode each of the numbers $s_i$ by a ring-formula, $S_i(x_i)$, of length $s_i$, with the leading variable $x_i$.

The whole $S$ is encoded as a concrete heap $h_S$, which is a collection of $3m$ disjoint “circular” lists of the form $S_i(a_i)$.

With an appropriate stack $s_S$, $s_S, h_S \models \Pi(\mathbf{X}) : \varphi_S(\mathbf{X})$, where

$$\varphi_S(\mathbf{X}) = S_1(x_1) * S_2(x_2) * \cdots * S_{3m}(x_{3m})$$

and $\Pi(\mathbf{X})$ says $(x \neq y)$ for all distinct variable names $x$ and $y$ mentioned, explicitly or implicitly, in $\varphi_S(\mathbf{X})$.

Define a set of inductive rules $\Phi_S$ as follows. To keep the predicate arities fixed and, at the same time, maintain the $i$-th position inside the tuple $\mathbf{X}$, we define predicates $Q_i(x)$ by the rules

$$x \neq \mathsf{nil} : \mathsf{emp} \Rightarrow Q_i(x) \qquad (3)$$

For $i < j < k$, we use “goal” predicates $R_{ijk}(x, y, z)$ with the rules

$$x \neq y,\ y \neq z,\ z \neq x : \mathsf{emp} \Rightarrow R_{ijk}(x, y, z) \qquad (4)$$

In the case of $i < j < k$ and $s_i + s_j + s_k = B$, we add the rule:

$$S_i(x) * S_j(y) * S_k(z) * R_{ijk}(x, y, z) \Rightarrow R_{ijk}(x, y, z) \qquad (5)$$

Lemma 6.8. Let $h_S$ and $s_S$ be defined as above. Then

$$s_S, h_S \models_{\Phi_S} \Pi(\mathbf{X}) : \mathop{*}_{i=1}^{3m} Q_i(x_i) * \mathop{*}_{i<j<k} R_{ijk}(x_i, x_j, x_k)$$

iff there is a complete 3-partition on $S$ - i.e., $S$ can be partitioned in groups of three, say $s_i$, $s_j$, and $s_k$, so that $s_i + s_j + s_k = B$.

Proof. Each $R_{ijk}(a_i, a_j, a_k)$ at the top is generated either by (4), with emp, or by (5), with $S_i(a_i)$, $S_j(a_j)$, $S_k(a_k)$ being ‘consumed’. The latter provides the corresponding group of $s_i$, $s_j$, $s_k$.

Corollary 6.9. (a) In the case of the memory-consuming rules, the model checking problem is NP-complete (even if the arity of the predicates involved is at most 3).

(b) For the memory-consuming rules with constructively valued variables, model checking is still NP-complete (even if the arity of the predicates involved is at most 3).

Proof. This follows from Sections 6.1 and 6.2.

6.3 NP-Hardness for MEM+DET

The challenge - to simulate intrinsically non-deterministic 3-SAT by deterministic memory-consuming rules, with, moreover, keeping the arity of predicates fixed - is solved using generalised versions of the linked list segments inductively defined in Example 5.3.

Namely, within a list fragment leading from $x$ to $y$, by means of $\mathsf{RLS}(x, u, y)$, defined below, we will keep the information about


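Here we abbreviate $\mathbf{X} = x_0, x_1, .., x_n$, $\mathbf{Q} = q_0, q_1, .., q_\ell$, $\mathbf{\Xi} = \xi_0, \xi_1, .., \xi_m$, and $\Pi_i = \{x_j \neq q \mid j \neq i,\ q \in \mathbf{Q}\} \cup \{z \neq z' \mid z, z' \in \mathbf{Q} \cup \mathbf{\Xi}\}$.

The final “empty configuration” is generated by the “backward” rule (recall that $\widehat{\xi_0}$ is the blank symbol, $\widehat{\xi_1}$ and $\widehat{\xi_2}$ are end markers):

$$\Pi_0,\ x_0 = q_0,\ x_1 = \xi_1,\ x_2 = x_3 = \cdots = x_{n-1} = \xi_0,\ x_n = \xi_2 : \mathsf{emp} \Rightarrow T(\mathbf{X}, \mathbf{Q}, \mathbf{\Xi})$$

An instruction “if in state $\widehat{q_k}$ looking at $\widehat{\xi_s}$, replace it by $\widehat{\xi_{s'}}$, move to the right, and go into $\widehat{q_{k'}}$” is simulated by $n$ rules (here $0 \le i < n$):

$$\Pi_i,\ x_i = q_k,\ x_{i+1} = \xi_s,\ y_i = \xi_{s'},\ y_{i+1} = q_{k'} : T(x_0, x_1, .., x_{i-1}, y_i, y_{i+1}, x_{i+2}, .., x_n, \mathbf{Q}, \mathbf{\Xi}) \Rightarrow T(\mathbf{X}, \mathbf{Q}, \mathbf{\Xi})$$

An instruction “if in state $\widehat{q_k}$ looking at $\widehat{\xi_s}$, replace it by $\widehat{\xi_{s'}}$, move to the left, and go into $\widehat{q_{k'}}$” is simulated in a similar “backward” way.

An alternating instruction “if in state $\widehat{q_k}$, run two copies of the configuration in parallel but with states $\widehat{q_{k'}}$ and $\widehat{q_{k''}}$, resp.” is simulated by ($0 \le i < n$):

$$\Pi_i,\ x_i = q_k,\ y_i = q_{k'},\ z_i = q_{k''} : T(x_0, .., x_{i-1}, y_i, x_{i+1}, .., x_n, \mathbf{Q}, \mathbf{\Xi}) * T(x_0, .., x_{i-1}, z_i, x_{i+1}, .., x_n, \mathbf{Q}, \mathbf{\Xi}) \Rightarrow T(\mathbf{X}, \mathbf{Q}, \mathbf{\Xi})$$

Figure 3. Simulating a Turing machine $M$ running in space $n$, in a backward manner - “from outputs to inputs”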

A linked list $\mathsf{RLS}(x, u, y)$ is formed by attaching a new tail:

$$x = y : \mathsf{emp} \Rightarrow \mathsf{RLS}(x, x, y)$$
$$\exists u.\ x \neq z : \mathsf{RLS}(x, u, y) * y \mapsto (y, z) \Rightarrow \mathsf{RLS}(x, y, z)$$

Notice that the rules for $\mathsf{RLS}(x, u, y)$ are both memory-consuming and deterministic. The fact that $u$ is not constructively valued is a key ingredient in our reduction, which allows us to cope with the non-deterministic problem 3-SAT.

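Definition 6.10. By means of the following heaplets $h^{(0)}_{ab}$ and $h^{(1)}_{ab}$ (here $a \neq b$) we represent the truth values, “false” and “true”, resp.:

$$h^{(\delta)}_{ab} = \begin{cases} a \mapsto (a, b), & \text{for } \delta = 0, \\ a \mapsto (a, b) * b \mapsto (b, b), & \text{for } \delta = 1. \end{cases}$$

Lemma 6.11. Assuming $a \neq b$, let $h^{(\delta)}_{ab} \models_{\mathsf{RLS}} \mathsf{RLS}(a, c, b)$ for some $c$. Then:

$$\big((\delta = 0) \wedge (a = c \neq b)\big) \vee \big((\delta = 1) \wedge (a \neq c = b)\big).$$

Proof. $h^{(0)}_{ab} \models_{\mathsf{RLS}} \mathsf{RLS}(a, a, b)$, and $h^{(1)}_{ab} \models_{\mathsf{RLS}} \mathsf{RLS}(a, b, b)$.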

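Let $\varphi \equiv (\widehat{C_1} \wedge \cdots \wedge \widehat{C_m})$ be a formula of $m$ clauses over $n$ linearly ordered propositional variables $p_1, p_2, .., p_n$, where each $\widehat{C_j}$ is of the form (here, for any $q$, we denote $q^1 = q$ and $q^0 = \neg q$):

$$\widehat{C_j}(q_1, q_2, q_3) \equiv (q_1^{\varepsilon_{j,1}} \vee q_2^{\varepsilon_{j,2}} \vee q_3^{\varepsilon_{j,3}})$$

Each $\widehat{C_j}$ is encoded by a predicate $C_j(\alpha_1, \gamma_1, \alpha_2, \gamma_2, \alpha_3, \gamma_3)$ with the following rule $\xi_j$ (for the sake of readability, we squeeze three deterministic rules into one but with disjunction):

$$\big((\alpha_1 \neq \gamma_1)^{\varepsilon_{j,1}} \vee (\alpha_2 \neq \gamma_2)^{\varepsilon_{j,2}} \vee (\alpha_3 \neq \gamma_3)^{\varepsilon_{j,3}}\big) \wedge \bigwedge_{k \neq \ell} (\alpha_k \neq \alpha_\ell) : \mathsf{emp} \Rightarrow C_j(\alpha_1, \gamma_1, \alpha_2, \gamma_2, \alpha_3, \gamma_3)$$

Example 6.12. Here we represent $p_i$ as “$x_i \neq u_i$”. So satisfiability of $\widehat{C}(p_1, p_2, p_3)$ of the form $(p_1 \vee \neg p_2 \vee p_3)$ is reformulated as

$$(x_1 \neq u_1) \vee (x_2 = u_2) \vee (x_3 \neq u_3) : C(x_1, u_1, x_2, u_2, x_3, u_3)$$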

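We take the following linearly ordered variables: $\mathbf{W} = w_0$, $\mathbf{X} = x_1, x_2, .., x_{2n}$, $\mathbf{U} = u_1, u_2, .., u_{2n}$, $\mathbf{Y} = y_1, y_2, .., y_{2n}$.

The challenge - to maintain the arity fixed and, at the same time, to “keep the $i$-th position” inside the long $\mathbf{X}$, $\mathbf{U}$, $\mathbf{Y}$ - is solved by taking predicates $Q_i(x, u, y)$ with the rules $\kappa_i$:

$$x \neq y,\ u = x : \mathsf{emp} \Rightarrow Q_i(x, u, y), \qquad x \neq y,\ u = y : \mathsf{emp} \Rightarrow Q_i(x, u, y).$$

The key points of our reduction are encapsulated in the rule $\omega_{ok}$:

$$\exists \mathbf{X}, \mathbf{U}, \mathbf{Y}, \mathbf{W}.\ \Pi(\mathbf{X}, \mathbf{Y}, \mathbf{W}) : w_0 \mapsto w_0 * \mathop{*}_{i=1}^{2n} Q_i(x_i, u_i, y_i) * \mathop{*}_{i=1}^{2n} \mathsf{RLS}(x_i, u_i, y_i) * \mathop{*}_{j=1}^{m} C_j(x_{i_{j,1}}, u_{i_{j,1}}, x_{i_{j,2}}, u_{i_{j,2}}, x_{i_{j,3}}, u_{i_{j,3}}) \Rightarrow ok$$

where $\Pi(\mathbf{X}, \mathbf{Y}, \mathbf{W})$ says $(x \neq y)$ for all distinct variable names $x$ and $y$ mentioned either in $\mathbf{X}$ or in $\mathbf{Y}$ or in $\mathbf{W}$.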

Definition 6.13. Define a heap $H_{\mathsf{RLS}}$ as a collection of $n$ disjoint heaplets of the form $h^{(0)}_{ab}$, and $n$ disjoint heaplets of the form $h^{(1)}_{ab}$ (we assume $a \neq b$), and a heap $H_\varphi$ as a loop of the form $d_0 \mapsto d_0$.

Lemma 6.14. With the empty input tuple of values,

$$((), H_{\mathsf{RLS}} \circ H_\varphi) \in \llbracket ok \rrbracket^{\,\omega_{ok} \,\cup\, \mathsf{RLS} \,\cup\, \bigcup_{j=1}^{m} \xi_j \,\cup\, \bigcup_{i=1}^{2n} \kappa_i} \qquad (6)$$

if and only if $(\widehat{C_1} \wedge \cdots \wedge \widehat{C_m})$ is satisfiable.

Proof. (Sketch) The $(\Rightarrow)$ direction is the hardest. Suppose that (6) is valid. Then $ok$ is generated by $\omega_{ok}$ with an unfolded inductive tree for $ok$. We can show that, for some values $a_1, a_2, .., a_{2n}$, $c_1, c_2, .., c_{2n}$, $b_1, b_2, .., b_{2n}$, the heap $H_{\mathsf{RLS}}$ can be partitioned into $2n$ disjoint heaplets of the form $h^{(\delta_i)}_{a_i b_i}$, so that we get the following: $h^{(\delta_i)}_{a_i b_i} \models_{\mathsf{RLS}} \mathsf{RLS}(a_i, c_i, b_i)$. Now, using the rules $\xi_j$ and $\kappa_i$ and Lemma 6.11, we can prove the desired satisfiability (see also Example 6.12): $\varphi(\delta_1, \delta_2, \ldots, \delta_{n-1}, \delta_n) = 1$.

The $(\Leftarrow)$ direction follows essentially by reading the above line of reasoning “bottom-up”.

Corollary 6.15. For deterministic and memory-consuming rules, model checking is still NP-complete (even if the arity of the predicates involved is at most 6).

6.4 PSPACE- and EXPTIME-Hardness for Non-MEM Rules

Unexpectedly, with non-memory-consuming rules, such as

$$\exists \mathbf{z}.\ \Pi : Q_1\mathbf{u}_1 * Q_2\mathbf{u}_2 * \cdots * Q_m\mathbf{u}_m \Rightarrow P\mathbf{x},$$

model checking becomes more complex. Namely:

Theorem 6.16. (a) For inductive rule sets in CV, model checking is EXPTIME-complete.

(b) For inductive rule sets in CV+DET, model checking is PSPACE-hard.

(c) For inductive rule sets in DET, model checking is EXPTIME-complete.

Proof. (Sketch) We prove all three lower bounds by simulating Turing machines in a backward manner - “from outputs to inputs”.

Let $M$ be a Turing machine that accepts in space $n$, with states $\widehat{q_0}, \widehat{q_1}, .., \widehat{q_\ell}$, and tape symbols $\widehat{\xi_0}, \widehat{\xi_1}, .., \widehat{\xi_m}$. Here $\widehat{q_1}$ is the initial state, $\widehat{q_0}$ is an accept state, $\widehat{\xi_0}$ is the blank symbol, and $M$ acts in the space $n$ between two unerased markers, say $\widehat{\xi_1}$ and $\widehat{\xi_2}$.

Let $M$ always jump, and let no instruction of $M$ start with $\widehat{q_0}$.

By $(\eta_1, \eta_2, .., \eta_{i-1}, q_k, \eta_i, .., \eta_n)$ we formalize that “in state $\widehat{q_k}$, $M$ scans the $i$-th square, when $\widehat{\eta_1}, \widehat{\eta_2}, .., \widehat{\eta_{i-1}}, \widehat{\eta_i}, .., \widehat{\eta_n}$ is printed on its tape”. We encode $M$ by means of the rules $\Phi_M$ given in Figure 3, where the “tape” predicate $T$ depicts $M$’s configurations:

$$T(\underbrace{x_0, x_1, .., x_n}_{\text{configuration}},\ \underbrace{q_0, q_1, .., q_\ell}_{\text{states}},\ \underbrace{\xi_0, \xi_1, .., \xi_m}_{\text{tape symbols}})$$

Lemma 6.17. Let $M$ be a deterministic TM or an alternating TM [15]. Then $s, e \models_{\Phi_M} T(q_1, \xi_1, \xi_0, \xi_0, \xi_0, .., \xi_0, \xi_2, \mathbf{Q}, \mathbf{\Xi})$ if and only if $M$ can go from the initial “empty configuration” to the final


All variables in $\Phi_M$ occur in $\mathbf{X} \cup \mathbf{Q} \cup \mathbf{\Xi}$, hence, they are constructively valued, which provides item (a) in Theorem 6.16.

Whenever $M$ is deterministic, $\Phi_M$ is deterministic, which provides item (b) in Theorem 6.16.

To answer the challenging (c), $M$’s non-deterministic instruction “if in state $\widehat{q_k}$ looking at $\widehat{\xi_s}$, go either into $\widehat{q_{k_1}}$ or into $\widehat{q_{k_2}}$” is simulated by the deterministic rules (here $0 \le i < n$, $\ell = 1, 2$):

$$\exists y.\ \Pi_i,\ x_i = q_k,\ x_{i+1} = \xi_s,\ y \notin \mathbf{\Xi},\ y_{i+1} = \xi_{d(s,k)} : T(x_0, .., x_{i-1}, y, y_{i+1}, x_{i+2}, .., x_n, \mathbf{Q}, \mathbf{\Xi}) \Rightarrow T(\mathbf{X}, \mathbf{Q}, \mathbf{\Xi}),$$

$$\Pi_i,\ x_i = q_{d'(s,k,k_\ell)},\ x_{i+1} = \xi_{d(s,k)},\ y_i = q_{k_\ell},\ y_{i+1} = \xi_s : T(x_0, .., x_{i-1}, y_i, y_{i+1}, x_{i+2}, .., x_n, \mathbf{Q}, \mathbf{\Xi}) \Rightarrow T(\mathbf{X}, \mathbf{Q}, \mathbf{\Xi})$$

where, however, $y$ is not constructively valued.

The idea behind the encoding is as follows. First, $M$ goes non-deterministically into any state $y$, encrypting the situation by a special ‘double’ $\widehat{\xi_{d(s,k)}}$. To continue a computation, the lucky guess should be one of our specially introduced states $\widehat{q_{d'(s,k,k_1)}}$ and $\widehat{q_{d'(s,k,k_2)}}$. As a result, $M$ finishes either in $\widehat{q_{k_1}}$ or in $\widehat{q_{k_2}}$.

Remark 6.18. The above EXPTIME-hardness necessarily employs predicates of unbounded arity (cf. Remark 5.1).

6.5 Polynomial Time for MEM+CV+DET

We next show that the model checking problem is in PTIME whenever $\Phi$ is in MEM+CV+DET. Essentially, predicates in CV+DET make a top-down procedure fully deterministic; and, the size of any possible proof is linear in the size of the heap, if $\Phi$ is in MEM.

Where $h$ is a heap and $v$ is a value such that $v \in \mathrm{dom}(h)$, we write $h \mathbin{\dot{-}} v$ to denote the heap $h'$ that has domain $\mathrm{dom}(h) \setminus \{v\}$ and agrees with $h$ on its domain. This operation is lifted in the obvious way to sets of values, i.e., $h \mathbin{\dot{-}} V$ where $V \subseteq \mathrm{dom}(h)$.

Definition 6.19. The construct $s, h \vdash_\Phi A \leadsto h'$ is called a reduction, where $s$ is a stack, $h, h'$ are heaps and $A$ is a symbolic heap with inductive predicate occurrences defined in $\Phi$. We say that the above reduction is valid if $h' \subseteq h$ and $s, h \mathbin{\dot{-}} \mathrm{dom}(h') \models_\Phi A$.

Figure 4 presents a proof system for reductions. A proof is, as usual, a tree whose leaves are labelled by axioms and internal nodes are labelled by inference rules accordingly. We say that a reduction $R$ is provable if there exists a proof whose root is labelled by $R$.

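Lemma 6.20. (Soundness) For any set of rules $\Phi$, formula $A$, stack $s$ and heap $h$, if $s, h \vdash_\Phi A \leadsto h'$ is provable then it is valid.

Proof. Follows by induction over the structure of the proof.

Let $\bar{\varphi}(Y) =_{\mathrm{def}} \big(\bigcup_j \varphi_{1,j}(Y), \ldots, \bigcup_j \varphi_{n,j}(Y)\big)$ (cf. Def. 3.3). Set $\bar{\varphi}^0 = (\emptyset, \ldots, \emptyset)$ and $\bar{\varphi}^{\alpha+1} = \bar{\varphi}(\bar{\varphi}^\alpha)$ for any ordinal $\alpha$. Clearly, $(\mathbf{a}, h) \in \llbracket P_i \rrbracket^{\Phi}$ iff there is an ordinal $\alpha$ such that $(\mathbf{a}, h) \in \mathrm{proj}_i(\bar{\varphi}^\alpha)$. Next, we write $s, h \models^\alpha_\Phi \Pi : F$ if it is the case that $s, h \models_\Phi \Pi : F$ and for every $P_i\mathbf{t}$ in $F$, $(s(\mathbf{t}), h') \in \mathrm{proj}_i(\bar{\varphi}^\alpha)$ for the appropriate subheap $h' \subseteq h$. We extend this to quantified formulas using the same ordinal, and to valid reductions in the obvious manner. Finally, we say that $s, h \vdash_\Phi A \leadsto h'$ is $\alpha$-supported if $\alpha$ is the least ordinal such that $s, h \mathbin{\dot{-}} \mathrm{dom}(h') \models^\alpha_\Phi A$.

Let $\mathcal{R}$ be the set of valid, constructively valued reductions. We define an ordering $\prec$ over $\mathcal{R}$. Let $R_i \equiv s_i, h_i \vdash_\Phi A_i \leadsto h'_i$ be an $\alpha_i$-supported reduction in $\mathcal{R}$, for $i \in \{1, 2\}$. Then $R_1 \prec R_2$ iff either: (1) $\alpha_1 < \alpha_2$; or (2) $\alpha_1 = \alpha_2$ and

(a) $A_1 \equiv \Pi_1 : F$ and $A_2 \equiv \Pi_2 : F$ and $\Pi_1 \subset \Pi_2$, or,

(b) $A_1 \equiv F$ and $A_2 \equiv \sigma * F$ for some atomic $\sigma$, or,

(c) $A_1 \equiv \exists \mathbf{x}.\ B$ and $A_2 \equiv \exists \mathbf{y}.\ B$ and $\mathbf{x} \subset \mathbf{y}$.

The ordering $\prec$ is easily seen to be well-founded.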

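Lemma 6.21. (Completeness) For any set of rules $\Phi$ and formula $A$ that are in CV, and any stack $s$ and heap $h$, if $s, h \vdash_\Phi A \leadsto h'$ is valid then it is provable.

Proof. We proceed by well-founded induction over $(\mathcal{R}, \prec)$, i.e., we show that if all $R' \in \mathcal{R}$ such that $R' \prec R$ are provable, then so is $R \in \mathcal{R}$.

Let $R \in \mathcal{R}$ be the reduction $s, h \vdash_\Phi \exists \mathbf{v}.\ A \leadsto h'$. As $R$ is constructively valued, there must be some variable $x \in \mathbf{v}$ such that (i) there is a free variable $y$ such that $x = y$ is a subformula of $A$, or, (ii) $x$ appears in the right-hand side of a subformula $y \mapsto \mathbf{t}$ of $A$. Let $\mathbf{z} =_{\mathrm{def}} \mathbf{v} \setminus \{x\}$. As $R$ is valid we have $h' \subseteq h$ and $s, h \mathbin{\dot{-}} \mathrm{dom}(h') \models^\alpha_\Phi \exists \mathbf{v}.\ A$, thus there is a stack $s'$ such that $s'(FV(\exists \mathbf{z}.\ A)) = s(FV(\exists \mathbf{z}.\ A))$ and $s', h \mathbin{\dot{-}} \mathrm{dom}(h') \models^\alpha_\Phi \exists \mathbf{z}.\ A$.

If clause (i) is true then clearly $s'(x) = s'(y)$ by the semantics of $=$, and $s'(y) = s(y)$ as $y$ is free. Thus without loss of generality we can set $s' = s[x \mapsto s(y)]$. Therefore the reduction $R' \equiv s[x \mapsto s(y)], h \vdash_\Phi \exists \mathbf{z}.\ A \leadsto h'$ is valid and constructively valued. It is easy to see that $R'$ must be $\alpha$-supported (otherwise we can derive a contradiction with the assumption that $\alpha$ is the least such ordinal), thus $R' \prec R$ by clause (2c) of the definition of $\prec$. By the inductive hypothesis, $R'$ is provable and the rule ($\exists{=}$) can be applied, thus proving $R$. Clause (ii) is similar and uses ($\exists{\mapsto}$).

Next, let $R \in \mathcal{R}$ be of the form $s, h \vdash_\Phi x = y, \Pi : F \leadsto h'$, thus, $h' \subseteq h$ and $s, h \mathbin{\dot{-}} \mathrm{dom}(h') \models^\alpha_\Phi x = y, \Pi : F$. It is easy to see that the reduction $R' \equiv s, h \vdash_\Phi \Pi : F \leadsto h'$ is valid, constructively valued and $\alpha$-supported. By clause (2a) of the definition of $\prec$ it follows that $R' \prec R$. Thus by the inductive hypothesis, $R'$ is provable and the rule ($=$) applies, therefore $R$ is also provable. Disequalities are treated similarly via the rule ($\neq$).

Now, let $R \in \mathcal{R}$ be of the form $s, h \vdash_\Phi \sigma * F \leadsto h'$. Thus, $h' \subseteq h$ and $s, h \mathbin{\dot{-}} \mathrm{dom}(h') \models^\alpha_\Phi \sigma * F$. Therefore there are two disjoint heaps $h_\sigma, h_F$ such that $h \mathbin{\dot{-}} \mathrm{dom}(h') = h_\sigma \circ h_F$, and $s, h_\sigma \models^{\alpha_\sigma}_\Phi \sigma$ and $s, h_F \models^{\alpha_F}_\Phi F$, where $\alpha = \max\{\alpha_\sigma, \alpha_F\}$. Thus $h = h_\sigma \circ h_F \circ h'$, and $s, h \mathbin{\dot{-}} \mathrm{dom}(h_F \circ h') \models^{\alpha_\sigma}_\Phi \sigma$ and $s, h_F \circ h' \mathbin{\dot{-}} \mathrm{dom}(h') \models^{\alpha_F}_\Phi F$. Therefore the reductions $R_\sigma \equiv s, h \vdash_\Phi \sigma \leadsto h_F \circ h'$ and $R_F \equiv s, h_F \circ h' \vdash_\Phi F \leadsto h'$ are both valid and constructively valued. In addition, $R_\sigma, R_F \prec R$: either $\alpha_\sigma < \alpha$ (resp., $\alpha_F < \alpha$) where clause (1) of the definition of $\prec$ applies, or $\alpha_\sigma = \alpha$ ($\alpha_F = \alpha$) and clause (2b) applies. By the inductive hypothesis, $R_\sigma$ and $R_F$ are provable, and so is $R$ via ($*$).

Finally, suppose $R \in \mathcal{R}$ has the form $s, h \vdash_\Phi P_i\mathbf{t} \leadsto h'$, meaning that $h' \subseteq h$ and $s, h \mathbin{\dot{-}} \mathrm{dom}(h') \models^\alpha_\Phi P_i\mathbf{t}$. By Def. 3.3, this means that $(s(\mathbf{t}), h \mathbin{\dot{-}} \mathrm{dom}(h')) \in \llbracket P_i \rrbracket^{\Phi}$ which in turn means that there is a rule $\exists \mathbf{v}.\ \Pi : F \Rightarrow P_i\mathbf{x}$ in $\Phi$ and some stack $s'$ such that $s', h \mathbin{\dot{-}} \mathrm{dom}(h') \models^{\alpha'}_\Phi \Pi : F$, and $s'(\mathbf{x}) = s(\mathbf{t})$ (equality of tuples), and $\alpha' < \alpha$. Trivially, $s', h \mathbin{\dot{-}} \mathrm{dom}(h') \models^{\alpha'}_\Phi \exists \mathbf{v}.\ \Pi : F$. Thus the query $R' \equiv s', h \vdash_\Phi \exists \mathbf{v}.\ \Pi : F \leadsto h'$ is valid, constructively valued and $\alpha'$-supported. By the inductive hypothesis, $R'$ is provable and therefore so is $R$ by applying ($P_i$).

The cases for $\mapsto$ and emp are easily treated with rules ($\mapsto$) and (emp).

We recall the notion of precision: a formula $A$ is precise iff for every stack $s$ and heap $h$, there is at most one $h' \subseteq h$ such that $s, h' \models_\Phi A$. Precision entails that if $s, h \vdash_\Phi A \leadsto h'$ is valid then there is no other $h'' \neq h'$ such that $s, h \vdash_\Phi A \leadsto h''$ is valid. Thus precision allows the deterministic application of ($*$).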

Lemma 6.22. Let $\Phi$ be a set of rules in CV+DET. Then any formula in CV using predicates defined in $\Phi$ is precise.

Proof. Observe the following points: if $\Sigma$ and $\Sigma'$ are precise then so is $\Sigma * \Sigma'$; if $\Sigma$ is precise then so is $\Pi : \Sigma$, for any $\Pi$; if $\Pi : \Sigma$ is precise then so is $A \equiv \exists \mathbf{v}.(\Pi : \Sigma)$, provided all variables in $\mathbf{v}$ are constructively valued in $A$. Finally, note that the problem reduces to guaranteeing that every formula of the form $P_i\mathbf{t}$ is precise.

Thus we need to show that for every tuple of values $\mathbf{a}$ and heap $h$ there is at most one $h' \subseteq h$ such that $(\mathbf{a}, h') \in \llbracket P_i \rrbracket^{\Phi}$. This follows by fixpoint induction. Suppose there are values $\mathbf{a}$ and


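$$\dfrac{s(x) = s(y) \qquad s, h \vdash_\Phi \Pi : F \leadsto h'}{s, h \vdash_\Phi x = y, \Pi : F \leadsto h'}\ (=) \qquad \dfrac{s(x) \neq s(y) \qquad s, h \vdash_\Phi \Pi : F \leadsto h'}{s, h \vdash_\Phi x \neq y, \Pi : F \leadsto h'}\ (\neq)$$

$$\dfrac{s(x) \in \mathrm{dom}(h) \qquad h(s(x)) = s(\mathbf{t})}{s, h \vdash_\Phi x \mapsto \mathbf{t} \leadsto h \mathbin{\dot{-}} s(x)}\ (\mapsto) \qquad \dfrac{s, h \vdash_\Phi \sigma \leadsto h' \qquad s, h' \vdash_\Phi F \leadsto h''}{s, h \vdash_\Phi \sigma * F \leadsto h''}\ (*)$$

$$\dfrac{x \in \mathbf{v} \qquad y \notin \mathbf{v} \qquad s[x \mapsto s(y)], h \vdash_\Phi \exists(\mathbf{v} \setminus \{x\}).(x = y, \Pi : F) \leadsto h'}{s, h \vdash_\Phi \exists \mathbf{v}.(x = y, \Pi : F) \leadsto h'}\ (\exists{=})$$

$$\dfrac{x \in \mathbf{v} \qquad y \notin \mathbf{v} \qquad s(y) \in \mathrm{dom}(h) \qquad \exists i.\ t_i \equiv x \qquad s[x \mapsto h(s(y))_i], h \vdash_\Phi \exists(\mathbf{v} \setminus \{x\}).(\Pi : F * y \mapsto \mathbf{t}) \leadsto h'}{s, h \vdash_\Phi \exists \mathbf{v}.(\Pi : F * y \mapsto \mathbf{t}) \leadsto h'}\ (\exists{\mapsto})$$

$$\dfrac{}{s, h \vdash_\Phi \mathsf{emp} \leadsto h}\ (\mathsf{emp}) \qquad \dfrac{(\exists \mathbf{v}.(\Pi : F) \Rightarrow P_i\mathbf{x}) \in \Phi \qquad s, e \models \exists \mathbf{v}.(\Pi : \mathsf{emp}) \qquad s[\mathbf{x} \mapsto s(\mathbf{t})], h \vdash_\Phi \exists \mathbf{v}.(\Pi : F) \leadsto h'}{s, h \vdash_\Phi P_i\mathbf{t} \leadsto h'}\ (P_i)$$

Figure 4. Proof rules for reductions. The formula $\sigma$ in the rule ($*$) is atomic.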

$\bigcup_j \varphi_{i,j}(Y)$. As the rules are precise there must be $k \neq l$ such that $(\mathbf{a}, h_1) \in \varphi_{i,k}(Y)$ and $(\mathbf{a}, h_2) \in \varphi_{i,l}(Y)$. This, however, directly contradicts determinism.

Finally, we establish that proof search is a deterministic, terminating decision procedure when all predicates are in CV+DET; and, its runtime is naturally bounded by a polynomial in the input size, if in addition, all rules are in MEM.

Theorem 6.23. Let $A$ be a formula in CV and $\Phi$ a set of rules in MEM+CV+DET. Then, for any stack $s$ and heap $h$, checking whether $s, h \models_\Phi A$ can be performed in polynomial time.

Proof. First, note that $s, h \models_\Phi A$ iff $s, h \vdash_\Phi A \leadsto e$ is provable.

Observe that the structure of a given reduction dictates that there is at most one applicable rule from Fig. 4. Rules about quantifiers, ($\exists{=}$) and ($\exists{\mapsto}$), form an exception but the order of their applications, as well as the choice of quantified variable to eliminate next, is immaterial to the provability of a CV reduction, thus a fixed order can be used.

As $\Phi$ is in DET, ($P_i$) can be used with at most one rule and this rule can be found in polytime by evaluating the side condition of ($P_i$) for all rules. This means no back-tracking is required over $\Phi$.

The rule ($*$) resembles a cut in that the intermediate heap $h'$ does not appear in the conclusion. This can be seen as a source of non-determinism as many choices for $h'$ may have to be checked (due to the fact that $h' \subseteq h$). However, as all formulas involved are precise (Lemma 6.22) if there is such a heap $h'$ it is unique. In addition, observe that only axioms impose constraints on the RHS heap. Thus, we avoid back-tracking by using meta-variables for the heap $h'$ and order the proof search to first prove the left-hand subgoal of ($*$). If the left subgoal of ($*$) is proven, then $h'$ is instantiated by axioms and the search continues in the right subgoal of ($*$). Otherwise, the goal reduction is clearly invalid.

These observations together guarantee that the proof search is fully deterministic.

Now, for MEM rules each application of ($P_i$) leads to at least one subgoal that requires ($\mapsto$), and there cannot be more instances of ($\mapsto$) in a proof than the size of the heap $h$ in the root reduction. Thus the size of the proof is $O(\|h\|)$, and as the search is deterministic, runtime is linear as well.

Remark 6.24. Deciding intuitionistic queries (cf. Remark 3.12) in MEM+CV+DET can be done in PTIME. This follows easily by noting that the proof search described in Theorem 6.23 actually computes a heap for the RHS of any reduction. Given the precision of the formulas involved (Lemma 6.22), this means that we can answer correctly intuitionistic queries using the same algorithm.

7. Implementation and Evaluation

We implemented the general model checking algorithm described in Section 3 as well as the CV+DET algorithm described in Section 6.5, in about 1400 lines of OCaml code. Our implementation is part of the Cyclist theorem proving framework [13], which provides support for inductive definitions, and in particular for our logic SLSHID. Both model checking algorithms are parameterised over the datatypes for heap locations and ground values (e.g. integers, booleans, strings, etc.) and thus may be instantiated to handle models where heap locations have arbitrary representations (e.g. hex strings) and heap cells contain arbitrary data. We employed a number of techniques to improve the efficiency of the implementation, including pre-computing the models for points-to subformulas, using hashsets to store submodels, and using bit vectors to represent heaps. We also implemented the intuitionistic version of our algorithms as per Remark 3.12 and Remark 6.24. The code and test suite for our tool are available online [1].
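As a rough illustration of the bit-vector idea, the sketch below encodes subheaps of a fixed heap as integer bit masks over its domain, so that the disjointness test and union needed for the separating conjunction become single bitwise operations. This is an assumption about how such an encoding might look, written in Python for brevity; it is not taken from the OCaml implementation, and the names `SubheapEncoder` and `disjoint_union` are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set


class SubheapEncoder:
    """Map subsets of a fixed heap domain to integer bit masks (illustrative only)."""

    def __init__(self, locations: Iterable[str]) -> None:
        # Assign each distinct location a bit position; locations may have
        # any hashable, sortable representation, e.g. hex strings.
        self.index: Dict[str, int] = {
            loc: i for i, loc in enumerate(sorted(set(locations)))
        }

    def encode(self, locations: Iterable[str]) -> int:
        """Return the bit mask with one bit set per given location."""
        mask = 0
        for loc in locations:
            mask |= 1 << self.index[loc]
        return mask

    def decode(self, mask: int) -> Set[str]:
        """Return the set of locations whose bits are set in the mask."""
        return {loc for loc, i in self.index.items() if (mask >> i) & 1}


def disjoint_union(a: int, b: int) -> Optional[int]:
    """Combine two subheap masks, or return None if their domains overlap."""
    return None if a & b else a | b
```

For instance, with the domain `["0x10", "0x20", "0x30"]`, combining the masks for `{"0x10"}` and `{"0x20", "0x30"}` yields the mask of the full domain, whereas combining a non-empty mask with itself yields `None`. Integer masks are also hashable, which fits storing submodels in hash-based sets.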

We tested the performance of our implementation across a range of ‘typical’ predicate definitions gathered from the verification community, and a number of hand-crafted definitions designed to elicit the worst-case, exponential performance. We extracted models from a number of example programs at runtime using an extension of GDB which supports scripting using Python [2]. All tests were carried out on a 2.93GHz Intel Core i7-870 processor with 8GB RAM.

We note that all tests carried out were on positive instances. This was decided for two reasons. First, the worst-case performance can be exhibited with positive instances as shown below. Second, when using runtime checks, for instance in code contracts or offline test suites, negative instances usually lead to program termination because they indicate that some pre- or postcondition, or invariant is no longer satisfied. Thus the runtime on positive instances is a much more important measure of performance.

‘Typical’ Performance. Testing our implementation against typical, real-world data requires sourcing programs annotated with separation logic assertions. We identified 6 programs from the suite of examples in the VeriFast distribution [24] containing non-trivial inductive predicates which translate into our assertion language:

(i) stack.c: a stack data-structure implemented using linked lists.

(ii) queue.c: a lock-free concurrent queue based on list segments.

(iii) set.c: a concurrent set data-structure based on linked lists.

(iv) schorr-waite.c: an implementation of the Schorr-Waite graph marking algorithm over binary trees.

(v) iter.c: a list data-structure with an iterator pointing into the list.

(vi) composite.c: an example of the composite design pattern, where each node of a tree must maintain local data consistent with a global property.