Model Checking for Symbolic-Heap Separation Logic with
Inductive Predicates
James Brotherston
University College London, UK
Nikos Gorogiannis
Middlesex University, UK
Max Kanovich
University College London, UK and National Research University Higher School of Economics, Russia
Reuben Rowe
University College London, UK
Abstract
We investigate the model checking problem for symbolic-heap separation logic with user-defined inductive predicates, i.e., the problem of checking that a given stack-heap memory state satisfies a given formula in this language, as arises e.g. in software testing or runtime verification.
First, we show that the problem is decidable; specifically, we present a bottom-up fixed point algorithm that decides the problem and runs in exponential time in the size of the problem instance.
Second, we show that, while model checking for the full language is EXPTIME-complete, the problem becomes NP-complete or PTIME-solvable when we impose natural syntactic restrictions on the schemata defining the inductive predicates. We additionally present NP and PTIME algorithms for these restricted fragments.
Finally, we report on the experimental performance of our procedures on a variety of specifications extracted from programs, exercising multiple combinations of syntactic restrictions.
Categories and Subject Descriptors D.2.4 [Software / Program
Verification]: Model checking; F.3.1 [Specifying and Verifying and Reasoning about Programs]: Logics of programs, Assertions
Keywords Separation logic, model checking, inductive definitions, complexity, runtime verification, program testing.
1. Introduction
In modern computer science, model checking is most commonly considered to be the problem of deciding whether a given Kripke structure or transition system $S$, typically representing a program or system, satisfies, or is a model of, a given formula $A$ of modal or temporal logic [18]; this property is usually written as $S \models A$. More generally, in mathematical logic, $S$ might be a mathematical structure of virtually any kind and $A$ a formula in some appropriate logic for such structures (see e.g. [20] for the cases of first-order and monadic second-order logic).
In this paper, we investigate the model checking problem as it arises in the setting of separation logic with user-defined inductive predicates. Separation logic is an established formalism for the verification of imperative pointer programs, comprising both an assertion language of formulas based on bunched logic and a Hoare-style system of triples manipulating the pre- and postconditions of programs [23, 29]. Given a program annotated with separation logic assertions, one can try to prove statically that each assertion holds at the appropriate program point; a long line of research in this area has resulted in a number of tools that are capable of doing this automatically at least some of the time for industrial code (see e.g. [7, 8, 14, 16, 19, 24, 28]). Alternatively, one might also try to test dynamically whether properties hold: simply execute the program and check whether each assertion is satisfied by the actual memory state of the program at that point (this is sometimes known as run-time verification). Such an approach obviously necessitates a method for deciding, for any memory state $S$ and separation logic formula $A$, whether or not $S \models A$: a model checking problem.
While this is straightforward for simple formulas, it becomes much more complicated when arbitrary user-defined inductive predicates, describing complex shape properties of the memory, are permitted.
Our first contribution is a general model checking procedure (in the sense above) for the most commonly considered symbolic heap fragment of separation logic, extended with a general schema for user-defined inductive predicates. Since our definition schema allows inductive predicates to denote possibly-empty heap memories, and any heap trivially decomposes into itself combined with the empty heap, a naive top-down approach based on backtracking search will generally fail to terminate. Instead, we employ a bottom-up approach based on computing the fixed point of all “sub-models” of the original memory that satisfy one of the defined inductive predicates. The crucial insight is that, for any given model checking query, the witnesses for the existentially quantified variables can be chosen from a fixed set of values given in advance. Our algorithm decides the model checking problem for our logic, in (worst-case) exponential time in the query size. Indeed, we show that this problem is EXPTIME-complete.
In practice, however, the inductive predicate definitions encountered in verification often fall within much more well-behaved fragments of our general inductive schemata. Our second main contribution is an analysis of the model checking problem in cases where the syntactic form of inductive definitions is restricted in various ways (e.g., when recursion is forbidden in cases where the heap might be empty). We show that, for different combinations of these restrictions, the model checking problem can become NP-complete or even PTIME-solvable; and, in such cases, we give concrete model checking algorithms that fall within the appropriate complexity bound.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Copyright is held by the owner/author(s).
POPL’16, January 20–22, 2016, St. Petersburg, FL, USA ACM. 978-1-4503-3549-2/16/01...$15.00
Finally, we provide an implementation of our general model checking algorithm, and of our specialised algorithm for the polynomial-time fragment, within the CYCLIST theorem proving framework; this implementation is available online [1]. We evaluate their performance on a range of examples gathered from the separation logic community, as well as some hand-crafted examples. Our experimental results seem to bear out that our model checking methods are practical for runtime verification applications when suitable syntactic restrictions are present, and for offline testing (such as in unit test suites) in the general case.
Related work. Runtime verification for separation logic was addressed first in [26], and then more recently in the Verifast tool [3]. In both cases, model checking works only for classes of recursive predicates that are restricted in various ways, and comes without any formal correctness claims or complexity bounds. As far as we know, the present paper is the first to specifically address model checking for symbolic-heap separation logic with general inductive predicates from a fully formal perspective. However, the logic itself has attracted considerable recent interest amongst the verification community. The aforementioned automated program verification tools based on separation logic [7, 8, 14, 16, 19, 24, 28] are all based on symbolic heaps, and increasingly targeted at verifying specifications involving user-defined rather than hard-coded predicates. Indeed, there are now even tools capable of automatically generating the definitions of inductive predicates needed for analysis [11, 25]. On the theoretical side, the satisfiability problem for our logic was recently shown decidable [10] and its entailment problem undecidable [4], although decidability results have been obtained for restricted classes of entailments [5, 22]. Alongside these theoretical developments, there are automated tools geared towards the proof [13, 17] and disproof [12] of entailments, as needed to support program verification.
The remainder of this paper is structured as follows. Section 2 introduces our fragment of separation logic, and Section 3 develops our general model checking procedure for it. This model checking problem is then shown to be EXPTIME-complete in Section 4. We present our restricted fragments in Section 5, and establish their various complexities in Section 6. Section 7 presents details of our implementation and experiments, and Section 8 concludes.
2. $\mathrm{SL}^{\mathrm{SH}}_{\mathrm{ID}}$: Symbolic-heap Separation Logic with Inductively Defined Predicates
In this section we present our fragment $\mathrm{SL}^{\mathrm{SH}}_{\mathrm{ID}}$ of separation logic, which restricts the syntax of formulas to symbolic heaps as introduced in [5, 6], but allows arbitrary user-defined inductive predicates over these, as considered e.g. in [9, 10, 11, 22].
We often use vector notation to abbreviate tuples, e.g. $\mathbf{x}$ for $(x_1, \ldots, x_m)$. We write $\mathrm{proj}_i$ for the $i$th projection function on tuples, and we often abuse notation by treating a tuple $\mathbf{x}$ as the set containing exactly the elements occurring in $\mathbf{x}$. If $X$ and $Y$ are sets, we write $X \mathrel{\#} Y$ as a shorthand for $X \cap Y = \emptyset$.
2.1 Syntax
A term is either a variable in the infinite set $\mathsf{Var}$, or the constant $\mathsf{nil}$. We write $x, y, z$, etc. to range over variables, and $t, u$, etc. to range over terms. We assume a finite set $\mathcal{P} = \{P_1, \ldots, P_n\}$ of predicate symbols, each with associated arity.
Definition 2.1 (Symbolic heap). Spatial formulas $F$ and pure formulas $\pi$ are given by the following grammar:
\[ F ::= \mathsf{emp} \mid x \mapsto \mathbf{t} \mid P\mathbf{t} \mid F * F \qquad\qquad \pi ::= t = t \mid t \neq t \]
where $x$ ranges over variables, $t$ over terms, $P$ over predicate symbols and $\mathbf{t}$ over tuples of terms (matching the arity of $P$ in $P\mathbf{t}$).
A symbolic heap is given by $\exists \mathbf{z}.\ \Pi : F$, where $\mathbf{z}$ is a tuple of (distinct) variables, $F$ is a spatial formula and $\Pi$ is a finite set of pure formulas. Whenever one of $\Pi$, $F$ is empty, we omit the colon. We write $FV(A)$ for the set of free variables occurring in a symbolic heap $A$; by convention, the bound variable names in $A$ are chosen disjoint from the free variables $FV(A)$.
Definition 2.2. An inductive rule set is a finite set of inductive rules, each of the form $A \Rightarrow P\mathbf{x}$, where $A$ is a symbolic heap (called the body of the rule), $P\mathbf{x}$ a formula (called its head), $\mathbf{x}$ is a tuple of distinct variables and $FV(A) \subseteq \mathbf{x}$.
For convenience, we sometimes drop existential quantifiers from inductive rules $A \Rightarrow P\mathbf{x}$: in that case, any variables occurring in $A$ but not in $\mathbf{x}$ are implicitly existentially quantified.
As usual, the inductive rules with $P$ in their head should be read as exhaustive, disjunctive clauses of an inductive definition of $P$. The formal semantics appears below.
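To make Definition 2.2 concrete, the following minimal sketch encodes the two usual rules of a list-segment predicate as plain Python data and checks the side conditions of the definition (distinct head variables, and every body variable either in the head or existentially bound). The dictionary layout, the predicate name `ls` and the helpers `body_variables` and `is_well_formed` are assumptions made for illustration; they are not notation from the paper.

```python
NIL = "nil"  # the constant term nil

# Hypothetical encoding of an inductive rule  (exists z. Pi : F) => P x  as a dict:
#   head / exists : tuples of variable names
#   pure          : list of (op, t, u) with op in {"=", "!="}
#   pts           : points-to atoms (x, record_of_terms)
#   calls         : predicate atoms (Q, argument_terms)
LS_RULES = [
    # x = y : emp  =>  ls(x, y)
    {"pred": "ls", "head": ("x", "y"), "exists": (),
     "pure": [("=", "x", "y")], "pts": [], "calls": []},
    # exists z. x |-> z * ls(z, y)  =>  ls(x, y)
    {"pred": "ls", "head": ("x", "y"), "exists": ("z",),
     "pure": [], "pts": [("x", ("z",))], "calls": [("ls", ("z", "y"))]},
]

def body_variables(rule):
    """All variables (terms other than nil) occurring in the rule body."""
    terms = set()
    for _, t, u in rule["pure"]:
        terms |= {t, u}
    for x, record in rule["pts"]:
        terms |= {x, *record}
    for _, args in rule["calls"]:
        terms |= set(args)
    return terms - {NIL}

def is_well_formed(rule):
    """Check the side conditions of Definition 2.2 on one rule."""
    head, exists = rule["head"], rule["exists"]
    distinct_head = len(set(head)) == len(head)
    bound_disjoint = not (set(head) & set(exists))
    # FV(A) is a subset of x: body variables are head variables or bound by the quantifier.
    closed = body_variables(rule) <= set(head) | set(exists)
    return distinct_head and bound_disjoint and closed
```

Dropping the quantifier on `z` without listing it in `exists` makes the second rule ill-formed under this encoding, which mirrors the convention that unlisted body variables must be implicitly quantified.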
2.2 Semantics
We use a RAM model employing heaps of records. We assume a countably infinite set $\mathsf{Val}$ of values of which an infinite subset $\mathsf{Loc} \subset \mathsf{Val}$ are addressable locations; we insist on at least one non-addressable value $\mathit{nil} \in \mathsf{Val} \setminus \mathsf{Loc}$.
A stack is a function $s : \mathsf{Var} \to \mathsf{Val}$; we extend stacks to terms by setting $s(\mathsf{nil}) =_{\mathrm{def}} \mathit{nil}$, and write $s[z \mapsto a]$ for the stack defined as $s$ except that $s[z \mapsto a](z) = a$. We extend stacks pointwise to act on tuples of terms.
A heap is a partial function $h : \mathsf{Loc} \rightharpoonup_{\mathrm{fin}} (\mathsf{Val}\ \mathsf{List})$ mapping finitely many locations to records, i.e. arbitrary-length tuples of values; we set $\mathrm{dom}(h)$ to be the set of locations on which $h$ is defined, and $e$ to be the empty heap that is undefined on all locations. We write $\circ$ for composition of domain-disjoint heaps: if $h_1$ and $h_2$ are heaps, then $h_1 \circ h_2$ is the union of $h_1$ and $h_2$ when $\mathrm{dom}(h_1) \mathrel{\#} \mathrm{dom}(h_2)$, and undefined otherwise. Finally, we define the cover of a heap $h$ as
\[ \mathrm{cover}(h) =_{\mathrm{def}} \mathrm{dom}(h) \cup \{ b \in \mathsf{Val} \mid b \in h(a),\ a \in \mathrm{dom}(h) \}, \]
i.e., the set of all values mentioned anywhere in $h$.
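The heap operations just defined can be sketched in a few lines of Python. This is only an illustration under stated assumptions: heaps are modelled as dictionaries from locations to tuples, an undefined composition is signalled by `None`, and the function names `compose` and `cover` are chosen here for readability.

```python
from typing import Dict, Hashable, Optional, Set, Tuple

Value = Hashable
Heap = Dict[Value, Tuple[Value, ...]]  # location -> record (tuple of values)

def compose(h1: Heap, h2: Heap) -> Optional[Heap]:
    """Composition of heaps: the union if the domains are disjoint, else None."""
    if h1.keys() & h2.keys():
        return None  # domains overlap, so the composition is undefined
    return {**h1, **h2}

def cover(h: Heap) -> Set[Value]:
    """dom(h) together with every value stored in some record of h."""
    values = set(h.keys())
    for record in h.values():
        values.update(record)
    return values
```

Note that the empty heap `{}` is a unit for `compose`, which is exactly why a heap always decomposes into itself and the empty heap.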
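Definition 2.3. Let $\Phi$ be a fixed inductive rule set. Then we say that a stack-heap pair $(s, h)$ is a model of a symbolic heap $A$ if the relation $s, h \models_\Phi A$ holds, defined by structural induction on $A$:
\[
\begin{array}{lcl}
s, h \models_\Phi t_1 = t_2 & \Leftrightarrow & s(t_1) = s(t_2) \\
s, h \models_\Phi t_1 \neq t_2 & \Leftrightarrow & s(t_1) \neq s(t_2) \\
s, h \models_\Phi \mathsf{emp} & \Leftrightarrow & h = e \\
s, h \models_\Phi x \mapsto \mathbf{t} & \Leftrightarrow & \mathrm{dom}(h) = \{s(x)\} \text{ and } h(s(x)) = s(\mathbf{t}) \\
s, h \models_\Phi P_i\mathbf{t} & \Leftrightarrow & (s(\mathbf{t}), h) \in \llbracket P_i \rrbracket^\Phi \\
s, h \models_\Phi F_1 * F_2 & \Leftrightarrow & \exists h_1, h_2.\ h = h_1 \circ h_2 \text{ and } s, h_1 \models_\Phi F_1 \text{ and } s, h_2 \models_\Phi F_2 \\
s, h \models_\Phi \exists \mathbf{z}.\ \Pi : F & \Leftrightarrow & \exists \mathbf{a} \in \mathsf{Val}^{|\mathbf{z}|}.\ s[\mathbf{z} \mapsto \mathbf{a}], h \models_\Phi \pi \text{ for all } \pi \in \Pi \text{ and } s[\mathbf{z} \mapsto \mathbf{a}], h \models_\Phi F
\end{array}
\]
where the semantics $\llbracket P_i \rrbracket^\Phi$ of the inductive predicate $P_i$ under $\Phi$ is defined below.
If $A$ contains no inductive predicates, then its satisfaction relation does not depend on the inductive rules $\Phi$, and we typically omit the subscript $\Phi$, writing $s, h \models A$. Similarly, if $\Pi$ is a set of pure formulas, we write $s \models \Pi$ to mean that $s, h \models_\Phi \Pi$ for any heap $h$ and inductive rule set $\Phi$.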
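For formulas without quantifiers or inductive predicates, the clauses of Definition 2.3 collapse to a direct check, sketched below. The sketch assumes that a formula is given as a list of pure atoms plus a list of points-to atoms joined by the separating conjunction (an empty list standing for emp), that stacks and heaps are Python dictionaries, and that the value nil is the string `"nil"`; the function name `satisfies` is hypothetical.

```python
NIL = "nil"

def satisfies(s, h, pure, pts):
    """Check  s, h |= Pi : x1 |-> t1 * ... * xk |-> tk  (k = 0 encodes emp).

    pure: list of (op, t, u) with op in {"=", "!="};
    pts:  list of (x, record_of_terms).
    """
    def val(t):
        # Stacks are extended to terms by mapping the constant nil to the value nil.
        return NIL if t == NIL else s[t]

    for op, t, u in pure:
        if (val(t) == val(u)) != (op == "="):
            return False
    # Each points-to atom describes a one-cell heap; the separating conjunction
    # requires these cells to be at distinct locations and to make up h exactly.
    cells = {val(x): tuple(val(t) for t in record) for x, record in pts}
    return len(cells) == len(pts) and cells == h
```

The final comparison `cells == h` reflects the precise reading of the points-to clause: a formula that describes only part of the heap is not satisfied by the whole heap.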
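The following definition gives the standard semantics of the inductive predicate symbols $\mathcal{P}$ according to a fixed inductive rule set $\Phi$, i.e., as the least fixed point of an $n$-ary monotone operator constructed from $\Phi$:
Definition 2.4. First, for each predicate $P_i \in \mathcal{P}$ with arity $\alpha_i$ say, we define $\tau_i = \mathrm{Pow}(\mathsf{Val}^{\alpha_i} \times \mathsf{Heap})$ (where $\mathrm{Pow}(-)$ is powerset). We also partition the rule set $\Phi$ into $\Phi_1, \ldots, \Phi_n$, where $\Phi_i$ is the set of all inductive rules in $\Phi$ of the form $A \Rightarrow P_i\mathbf{x}$.
Now let each $\Phi_i$ be indexed by $j$ (i.e., $\Phi_{i,j}$ is the $j$-th rule defining $P_i$), and for each inductive rule $\Phi_{i,j}$ of the form $\exists \mathbf{z}.\ \Pi : F \Rightarrow P_i\mathbf{x}$, we define the operator $\varphi_{i,j} : \tau_1 \times \ldots \times \tau_n \to \tau_i$ by:
\[ \varphi_{i,j}(\mathbf{Y}) =_{\mathrm{def}} \{ (s(\mathbf{x}), h) \mid s, h \models_{\mathbf{Y}} \Pi : F \} \]
where $\mathbf{Y} \in \tau_1 \times \ldots \times \tau_n$ and $\models_{\mathbf{Y}}$ is the satisfaction relation defined above, except that $\llbracket P_i \rrbracket^{\mathbf{Y}} =_{\mathrm{def}} \mathrm{proj}_i(\mathbf{Y})$. We then finally define the tuple $\llbracket \mathbf{P} \rrbracket^\Phi \in \tau_1 \times \ldots \times \tau_n$ by:
\[ \llbracket \mathbf{P} \rrbracket^\Phi =_{\mathrm{def}} \mu \mathbf{Y}.\ \Big( \bigcup_j \varphi_{1,j}(\mathbf{Y}), \ldots, \bigcup_j \varphi_{n,j}(\mathbf{Y}) \Big) \]
where $\mu$ is the least fixed point constructor. We write $\llbracket P_i \rrbracket^\Phi$ as an abbreviation for $\mathrm{proj}_i(\llbracket \mathbf{P} \rrbracket^\Phi)$.
Note that in computing $\varphi_{i,j}(\mathbf{Y})$ above, we strip the existential quantifiers $\exists \mathbf{z}$ from the body of the inductive rule $\Phi_{i,j}$, taking advantage of the convention that the existentially bound variables $\mathbf{z}$ are disjoint from the free variables $\mathbf{x}$ in $\Phi_{i,j}$.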
3. A Model Checking Algorithm for $\mathrm{SL}^{\mathrm{SH}}_{\mathrm{ID}}$
In this section we develop a decision procedure for the model checking problem in our logic $\mathrm{SL}^{\mathrm{SH}}_{\mathrm{ID}}$. Formally, this problem is stated as follows:
Model checking problem (MC). Given an inductive rule set $\Phi$, stack $s$, heap $h$ and symbolic heap $A$, decide whether $s, h \models_\Phi A$.
We observe that whether $s, h \models_\Phi A$ depends not on the entire (infinite) valuation of $s$ but only on the values of $s$ on $FV(A)$, which is finite; thus an instance of MC can also be viewed as finite. In fact, the problem can be simplified further by noting that, if we can solve the case when $A = P\mathbf{x}$, for $P$ an inductive predicate, then the general case follows almost immediately:
Restricted model checking problem (RMC). Given an inductive rule set $\Phi$, tuple of values $\mathbf{a}$ (from $\mathsf{Val}$), heap $h$ and predicate symbol $P$, decide whether $(\mathbf{a}, h) \in \llbracket P \rrbracket^\Phi$.
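Proposition 3.1. MC and RMC are (polynomially) equivalent.
Proof. Given an instance $(\Phi, \mathbf{a}, h, P)$ of RMC, where $m = |\mathbf{a}|$ is the arity of $P$, we define the corresponding instance of MC to be $(\Phi, s, h, P\mathbf{x})$, where $\mathbf{x}$ is an $m$-tuple of distinct variables and $s$ is any stack satisfying $s(\mathbf{x}) = \mathbf{a}$. Then, clearly,
\[ s, h \models_\Phi P\mathbf{x} \ \Leftrightarrow\ (s(\mathbf{x}), h) \in \llbracket P \rrbracket^\Phi \ \Leftrightarrow\ (\mathbf{a}, h) \in \llbracket P \rrbracket^\Phi. \]
Conversely, let $(\Phi, s, h, A)$ be an instance of MC. Let $FV(A) = \mathbf{x}$, let $Q$ be a predicate symbol of arity $|\mathbf{x}|$ not occurring in $\Phi$, and define $\Phi' = \Phi \cup \{A \Rightarrow Q\mathbf{x}\}$. We then define the corresponding instance of RMC to be $(\Phi', s(\mathbf{x}), h, Q)$. By construction,
\[ s, h \models_\Phi A \ \Leftrightarrow\ s, h \models_{\Phi'} Q\mathbf{x} \ \Leftrightarrow\ (s(\mathbf{x}), h) \in \llbracket Q \rrbracket^{\Phi'}. \]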
Both reductions are trivially computable in polynomial time.
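The two reductions in this proof are purely syntactic, as the following sketch suggests. It assumes a hypothetical dictionary encoding of rules and symbolic heaps (keys `pred`, `head`, `exists`, `pure`, `pts`, `calls`, and `free` for the free variables of a formula); the function names and the way a fresh predicate name is generated are likewise illustrative choices.

```python
def mc_to_rmc(rules, stack, heap, formula):
    """MC instance (rules, stack, heap, A) -> RMC instance (rules', s(x), heap, Q)."""
    free_vars = tuple(sorted(formula["free"]))  # fix an order on FV(A) = x
    used = {r["pred"] for r in rules} | {q for r in rules for q, _ in r["calls"]}
    fresh = "Q"
    while fresh in used:  # pick a predicate symbol not occurring in the rule set
        fresh += "'"
    new_rule = {"pred": fresh, "head": free_vars, "exists": formula["exists"],
                "pure": formula["pure"], "pts": formula["pts"],
                "calls": formula["calls"]}  # the rule  A => Q x
    values = tuple(stack[x] for x in free_vars)
    return rules + [new_rule], values, heap, fresh

def rmc_to_mc(rules, values, heap, pred):
    """RMC instance (rules, a, heap, P) -> MC instance (rules, s, heap, P x)."""
    xs = tuple(f"x{i}" for i in range(len(values)))  # distinct variables
    stack = dict(zip(xs, values))  # any stack with s(x) = a will do
    formula = {"free": set(xs), "exists": (), "pure": [], "pts": [],
               "calls": [(pred, xs)]}
    return rules, stack, heap, formula
```

Both functions do a constant amount of work per rule and per variable, in line with the polynomial-time claim.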
Thus it suffices to formulate a decision procedure for the restricted problem RMC. Before diving into the details of our decision procedure, let us motivate its development by making two main observations about this problem.
1. One might be tempted to adopt a top-down approach to the problem by applying inductive rules backwards to $P\mathbf{x}$, obtaining smaller model-checking problems (in the size of the heap $h$) as recursive instances. Unfortunately, our general schema for inductive rules does not guarantee that the models of subformulas of the body of an inductive rule are strictly smaller than the models of the entire body, and so such an approach might fail to terminate. For example, suppose $((a, b), h) \in \llbracket P \rrbracket^\Phi$, and is generated by the inductive rule
\[ \exists z.\ Pxz * Pzy \Rightarrow Pxy. \]
Then we know that, for some $c \in \mathsf{Val}$, we should have both $((a, c), h_1) \in \llbracket P \rrbracket^\Phi$ and $((c, b), h_2) \in \llbracket P \rrbracket^\Phi$, where $h = h_1 \circ h_2$; but we do not know that $h_1$, $h_2$ are smaller than $h$; either might be the empty heap $e$. (Indeed, it is quite possible that all of $h, h_1, h_2$ are empty.)
Therefore, we adopt a bottom-up approach: we attempt to compute all tuples in $\llbracket P_i \rrbracket^\Phi$ that are “sub-models” of $(\mathbf{a}, h)$, by iteratively applying the inductive rules until we reach a fixed point. (In fact, we have to do this for all inductive predicates $P$ simultaneously, in order to account for possible mutual dependency among them.) This process is guaranteed to terminate provided that there are only finitely many such sub-models.
2. The principal remaining difficulty is one of completeness: i.e., can we guarantee that any $(\mathbf{a}, h) \in \llbracket P \rrbracket^\Phi$ can be generated by applying the inductive rules in $\Phi$ to sub-models of $(\mathbf{a}, h)$? In fact this point is quite delicate, due to the presence of unrestricted existential quantification in our inductive rules. For example, suppose $(a, e) \in \llbracket P \rrbracket^\Phi$ is generated by the rule
\[ \exists z.\ z \neq x : Qxz \Rightarrow Px. \]
Then we know that for some $b \in \mathsf{Val}$, we have $((a, b), e) \in \llbracket Q \rrbracket^\Phi$, where $b \neq a$ and $b$ (trivially) does not appear in the empty heap $e$. Thus we must allow our sub-models to mention fresh, or “spare”, values not mentioned in $\mathbf{a}$ or $h$.
Fortunately, as we show (Lemma 3.7), only finitely many such spare values are needed for any given rule set $\Phi$; these can be “recycled” as needed at each fresh application of an inductive rule in our fixed point computation.
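We now formally define our fixed point construction computing all tuples $(\mathbf{b}, h') \in \llbracket \mathbf{P} \rrbracket^\Phi$ that are sub-models of a given $(\mathbf{a}, h)$, where “sub-model” means that $h' \subseteq h$ and $\mathbf{b}$ consists of values from $\mathbf{a}$, $\mathrm{cover}(h)$, the null value $\mathit{nil}$ and a suitably chosen set of “spare” values.
Definition 3.2. Let $\Phi$ be an inductive rule set, $\mathbf{a}$ a tuple of values (from $\mathsf{Val}$) and $h$ a heap. We define the set $\mathrm{Good}(\mathbf{a}, h)$ (of “good sub-model values for $(\mathbf{a}, h)$”) by
\[ \mathrm{Good}(\mathbf{a}, h) = \mathbf{a} \cup \{\mathit{nil}\} \cup \mathrm{cover}(h). \]
Now let $\beta$ be the maximum number of (free and bound) variable names appearing in any inductive rule in $\Phi$. We define $\mathrm{Spare}_\Phi(\mathbf{a}, h)$ to be a set of $\beta$ fresh values (from $\mathsf{Val}$) that do not occur in $\mathrm{Good}(\mathbf{a}, h)$.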
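The two sets of Definition 3.2 are easy to compute for a concrete query. The sketch below assumes heaps are Python dictionaries from locations to tuples and the value nil is the string `"nil"`; the choice of tagged tuples `("spare", k)` as fresh values is an assumption of this illustration, and any values outside the good set would serve equally well.

```python
def cover(h):
    """dom(h) together with every value stored in a record of h."""
    return set(h) | {v for record in h.values() for v in record}

def good(a, h, nil="nil"):
    """Good(a, h): the values of a, the null value, and the cover of h."""
    return set(a) | {nil} | cover(h)

def spare(beta, good_values):
    """beta fresh values that do not occur among the good values."""
    fresh, k = [], 0
    while len(fresh) < beta:
        candidate = ("spare", k)
        if candidate not in good_values:  # skip anything already in use
            fresh.append(candidate)
        k += 1
    return fresh
```

Since the number of spare values depends only on the rule set, the universe of values considered by the algorithm grows linearly with the size of the query.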
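Definition 3.3. Let $\Phi$ be an inductive rule set, $\mathbf{a}$ a tuple of values and $h$ a heap. For each inductive rule $\Phi_{i,j} \in \Phi$ of the form $\exists \mathbf{z}.\ \Pi : F \Rightarrow P_i\mathbf{x}$, we define a corresponding operator $\psi_{i,j} : \tau_1 \times \ldots \times \tau_n \to \tau_i$ as follows:
\[ \psi_{i,j}(\mathbf{Y}) =_{\mathrm{def}} \left\{ (s(\mathbf{x}), h') \ \middle|\ \begin{array}{l} s, h' \models_{\mathbf{Y}} \Pi : F \text{ and } h' \subseteq h \text{ and} \\ s(\mathbf{x} \cup \mathbf{z}) \subseteq \mathrm{Good}(\mathbf{a}, h) \cup \mathrm{Spare}_\Phi(\mathbf{a}, h) \end{array} \right\} \]
where $\mathrm{Good}(\mathbf{a}, h)$ and $\mathrm{Spare}_\Phi(\mathbf{a}, h)$ are the sets of values given by Definition 3.2. It should be clear that each $\psi_{i,j}$ is a monotone operator. Thus we define the tuple $\mathrm{MC}^\Phi(\mathbf{a}, h) \in \tau_1 \times \ldots \times \tau_n$ by
\[ \mathrm{MC}^\Phi(\mathbf{a}, h) =_{\mathrm{def}} \mu \mathbf{Y}.\ \Big( \bigcup_j \psi_{1,j}(\mathbf{Y}), \ldots, \bigcup_j \psi_{n,j}(\mathbf{Y}) \Big) \]
We write $\mathrm{MC}^\Phi_i(\mathbf{a}, h)$ as an abbreviation for $\mathrm{proj}_i(\mathrm{MC}^\Phi(\mathbf{a}, h))$.
For the remainder of this section, we shall assume a fixed instance of $\mathrm{MC}^\Phi(\mathbf{a}, h)$, given by choosing inductive rule set $\Phi$, tuple of values $\mathbf{a}$ and heap $h$.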
It should be fairly obvious by comparing the constructions in Definitions 2.4 and 3.3 that $\mathrm{MC}^\Phi_i(\mathbf{a}, h)$ can only contain tuples that are already elements of $\llbracket P_i \rrbracket^\Phi$. The following lemma formalises that claim.
Lemma 3.4 (Soundness). $\mathrm{MC}^\Phi(\mathbf{a}, h) \subseteq \llbracket \mathbf{P} \rrbracket^\Phi$.
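Proof. We proceed by fixed point induction on the tuple of sets $\mathrm{MC}^\Phi(\mathbf{a}, h)$. That is, we assume the inclusion $\mathbf{Y} \subseteq \llbracket \mathbf{P} \rrbracket^\Phi$ holds for some tuple of sets $\mathbf{Y} = (Y_1, \ldots, Y_n) \in \tau_1 \times \ldots \times \tau_n$, and must show it holds for $(\bigcup_j \psi_{1,j}(\mathbf{Y}), \ldots, \bigcup_j \psi_{n,j}(\mathbf{Y}))$. This means, assuming that $(\mathbf{b}, h') \in \psi_{i,j}(\mathbf{Y})$ for some inductive rule $\Phi_{i,j}$, we must show that $(\mathbf{b}, h') \in \llbracket P_i \rrbracket^\Phi$. Without loss of generality, we can consider $\Phi_{i,j}$ to be written in the form:
\[ \exists \mathbf{z}.\ \Pi : y_1 \mapsto \mathbf{u}_1 * \ldots * y_k \mapsto \mathbf{u}_k * P_{j_1}\mathbf{x}_1 * \ldots * P_{j_m}\mathbf{x}_m \Rightarrow P_i\mathbf{x}. \]
By construction of $\psi_{i,j}(\mathbf{Y})$, there is a stack $s$ with $s(\mathbf{x}) = \mathbf{b}$ and
\[ s, h' \models_{\mathbf{Y}} \Pi : y_1 \mapsto \mathbf{u}_1 * \ldots * y_k \mapsto \mathbf{u}_k * P_{j_1}\mathbf{x}_1 * \ldots * P_{j_m}\mathbf{x}_m. \]
This means that $s \models \Pi$ and $h' = h_1 \circ \ldots \circ h_{k+m}$, where
\[ s, h_i \models_{\mathbf{Y}} y_i \mapsto \mathbf{u}_i \text{ for all } 1 \le i \le k, \quad \text{and} \quad s, h_{k+i} \models_{\mathbf{Y}} P_{j_i}\mathbf{x}_i \text{ for all } 1 \le i \le m. \]
In particular, for any $1 \le i \le m$ we have $(s(\mathbf{x}_i), h_{k+i}) \in Y_{j_i}$ and thus, by the induction hypothesis, $(s(\mathbf{x}_i), h_{k+i}) \in \llbracket P_{j_i} \rrbracket^\Phi$. That is,
\[ s, h_i \models_\Phi y_i \mapsto \mathbf{u}_i \text{ for all } 1 \le i \le k, \quad \text{and} \quad s, h_{k+i} \models_\Phi P_{j_i}\mathbf{x}_i \text{ for all } 1 \le i \le m. \]
Putting everything together, we have
\[ s, h' \models_\Phi \Pi : y_1 \mapsto \mathbf{u}_1 * \ldots * y_k \mapsto \mathbf{u}_k * P_{j_1}\mathbf{x}_1 * \ldots * P_{j_m}\mathbf{x}_m. \]
Therefore, $s, h' \models_\Phi P_i\mathbf{x}$, i.e., $(\mathbf{b}, h') \in \llbracket P_i \rrbracket^\Phi$ as required.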
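Next, we must show that $\mathrm{MC}^\Phi_i(\mathbf{a}, h)$ contains all $(\mathbf{b}, h') \in \llbracket P_i \rrbracket^\Phi$ that are “sub-models” of $(\mathbf{a}, h)$. To do this, we need to argue that for any element $(\mathbf{b}, h') \in \llbracket P_i \rrbracket^\Phi$ that is “almost a sub-model” of $(\mathbf{a}, h)$ in that $h' \subseteq h$ but $\mathbf{b}$ contains “bad” values (not in $\mathrm{Good}(\mathbf{a}, h)$ or $\mathrm{Spare}_\Phi(\mathbf{a}, h)$), there are corresponding sub-models in $\mathrm{MC}^\Phi_i(\mathbf{a}, h)$, obtained by substituting “spare” values for “bad” ones. The following definition captures the relevant notion of substitution.
Definition 3.5. A finite partial function $\theta : \mathsf{Val} \rightharpoonup_{\mathrm{fin}} \mathsf{Val}$ is called a substitution for $\mathrm{MC}^\Phi(\mathbf{a}, h)$ if it is injective, and, for all $b \in \mathrm{dom}(\theta)$,
\[ \theta(b) = b \text{ if } b \in \mathrm{Good}(\mathbf{a}, h), \quad \text{and} \quad \theta(b) \in \mathrm{Spare}_\Phi(\mathbf{a}, h) \text{ if } b \notin \mathrm{Good}(\mathbf{a}, h). \]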
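Next, the following technical lemma, which will be crucial to completeness, captures the fact that we can “recycle” values as needed. Roughly speaking, it says that we can extend a substitution on the values $\mathbf{b}$ instantiating the head of an inductive rule to a substitution on the values $V \supseteq \mathbf{b}$ instantiating the head of the rule and the existentially quantified variables in its body. This relies on the fact that, by construction, there are at least as many spare values in $\mathrm{Spare}_\Phi(\mathbf{a}, h)$ as there are variables in any inductive rule.
Lemma 3.6. Let $\theta$ be a substitution for $\mathrm{MC}^\Phi(\mathbf{a}, h)$ such that $\mathrm{dom}(\theta) \supseteq \mathbf{b}$, and $V \subset \mathsf{Val}$ a (finite) set of values with $\mathbf{b} \subseteq V$ and $|V| \le |\mathrm{Spare}_\Phi(\mathbf{a}, h)|$. Let $\mathrm{Spare}_\Phi(\mathbf{a}, h) \setminus \theta(\mathbf{b}) = \{d_1, \ldots, d_m\}$ and let $V \setminus (\mathbf{b} \cup \mathrm{Good}(\mathbf{a}, h)) = \{e_1, \ldots, e_k\}$. Then the function $\theta' : V \to \mathsf{Val}$, defined by
\[ \theta'(c) =_{\mathrm{def}} \begin{cases} c & \text{if } c \in \mathrm{Good}(\mathbf{a}, h) \\ \theta(c) & \text{if } c \in \mathbf{b} \setminus \mathrm{Good}(\mathbf{a}, h) \\ d_i & \text{if } c = e_i \text{ for some } 1 \le i \le k, \end{cases} \]
is also a substitution for $\mathrm{MC}^\Phi(\mathbf{a}, h)$, with $\theta'(\mathbf{b}) = \theta(\mathbf{b})$.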
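Proof. For convenience, we abbreviate $\mathrm{Good}(\mathbf{a}, h)$ by $\mathcal{G}$ and $\mathrm{Spare}_\Phi(\mathbf{a}, h)$ by $\mathcal{S}$ in this proof.
First, since $V \subset \mathsf{Val}$ is finite, $\theta'$ is indeed a finite partial function $\mathsf{Val} \rightharpoonup_{\mathrm{fin}} \mathsf{Val}$. We argue that $\theta'$ is well-defined. The three cases of its definition above are non-overlapping by construction, the first case is trivially well-defined and the second case is well-defined since $\mathrm{dom}(\theta) \supseteq \mathbf{b}$. Thus we just need to show that the third case is well-defined, which means showing that $k \le m$, i.e.,
\[ |V \setminus (\mathbf{b} \cup \mathcal{G})| \le |\mathcal{S} \setminus \theta(\mathbf{b})|. \]
Since $\theta$ is injective by assumption, $|\theta(\mathbf{b})| = |\mathbf{b}|$. Thus, as $|V| \le |\mathcal{S}|$, we have $|V| - |\mathbf{b}| \le |\mathcal{S}| - |\theta(\mathbf{b})|$. Then, using standard set theory, we have as required
\[
\begin{array}{rcll}
|V \setminus (\mathbf{b} \cup \mathcal{G})| & \le & |V \setminus \mathbf{b}| & \\
& = & |V| - |\mathbf{b}| & \text{(since } \mathbf{b} \subseteq V\text{)} \\
& \le & |\mathcal{S}| - |\theta(\mathbf{b})| & \text{(by the above)} \\
& \le & |\mathcal{S} \setminus \theta(\mathbf{b})|. &
\end{array}
\]
Next we argue that $\theta'$ is indeed a substitution for $\mathrm{MC}^\Phi(\mathbf{a}, h)$. It is easy to see that $\theta'(c) = c$ if $c \in \mathrm{Good}(\mathbf{a}, h)$ and $\theta'(c) \in \mathrm{Spare}_\Phi(\mathbf{a}, h)$ otherwise. We just need to show $\theta'$ is injective. This follows from the fact that the three definitional cases of $\theta'$ are given by three injective functions with pairwise disjoint ranges: $\mathcal{G}$, $\theta(\mathbf{b})$ $(\subseteq \mathcal{S})$ and $\mathcal{S} \setminus \theta(\mathbf{b})$, respectively. Hence if $\theta'(c_1) = \theta'(c_2)$ then both $c_1$ and $c_2$ fall into the same definitional case of $\theta'$, and so $c_1 = c_2$ by injectivity of the corresponding function. Thus indeed $\theta'$ is a substitution for $\mathrm{MC}^\Phi(\mathbf{a}, h)$ as required.
Finally, to see that $\theta'(\mathbf{b}) = \theta(\mathbf{b})$, observe that $\theta'(c) = \theta(c)$ immediately if $c \in \mathbf{b} \setminus \mathcal{G}$, and if $c \in \mathbf{b} \cap \mathcal{G}$ then $\theta'(c) = c = \theta(c)$, using the fact that $\theta$ is a substitution for $\mathrm{MC}^\Phi(\mathbf{a}, h)$.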
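The construction of the extended substitution in Lemma 3.6 can be followed step by step in code. The sketch below assumes substitutions are Python dictionaries and that the spare values and the set $V$ are given as lists (so that the enumerations $d_1, \ldots, d_m$ and $e_1, \ldots, e_k$ have a fixed order); the function name `extend_substitution` is hypothetical.

```python
def extend_substitution(theta, b, V, good, spare):
    """Extend theta (defined on b) to all of V, following the three cases of Lemma 3.6."""
    used = {theta[c] for c in b}                        # theta(b)
    free_spares = [d for d in spare if d not in used]   # d_1, ..., d_m
    fresh = [c for c in V if c not in good and c not in b]  # e_1, ..., e_k
    if len(fresh) > len(free_spares):
        raise ValueError("not enough spare values: need |V| <= |Spare|")
    theta2 = {}
    for c in V:
        if c in good:
            theta2[c] = c          # case 1: good values are left unchanged
        elif c in b:
            theta2[c] = theta[c]   # case 2: agree with theta on b
    theta2.update(zip(fresh, free_spares))  # case 3: e_i is sent to d_i
    return theta2
```

The `ValueError` branch corresponds to the inequality $k \le m$ established in the proof; under the hypothesis $|V| \le |\mathrm{Spare}_\Phi(\mathbf{a}, h)|$ it is never taken.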
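Lemma 3.7 (Completeness). Let $(\mathbf{b}, h') \in \llbracket P_i \rrbracket^\Phi$ and $h' \subseteq h$, and let $\theta$ be a substitution for $\mathrm{MC}^\Phi(\mathbf{a}, h)$ with $\mathrm{dom}(\theta) \supseteq \mathbf{b}$. Then $(\theta(\mathbf{b}), h') \in \mathrm{MC}^\Phi_i(\mathbf{a}, h)$.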
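Proof. We proceed by fixed point induction on $\llbracket \mathbf{P} \rrbracket^\Phi$. That is, we assume the lemma holds for $\mathbf{Y} = (Y_1, \ldots, Y_n) \in \tau_1 \times \ldots \times \tau_n$, and must show it holds for $(\bigcup_j \varphi_{1,j}(\mathbf{Y}), \ldots, \bigcup_j \varphi_{n,j}(\mathbf{Y}))$. This means, assuming that $(\mathbf{b}, h') \in \varphi_{i,j}(\mathbf{Y})$ for some inductive rule $\Phi_{i,j}$, where $h' \subseteq h$ and we have a $\theta$ satisfying the conditions of the lemma, we must show that $(\theta(\mathbf{b}), h') \in \mathrm{MC}^\Phi_i(\mathbf{a}, h)$.
Without loss of generality, we may consider $\Phi_{i,j}$ to be written in the form:
\[ \exists \mathbf{z}.\ \Pi : y_1 \mapsto \mathbf{u}_1 * \ldots * y_k \mapsto \mathbf{u}_k * P_{j_1}\mathbf{x}_1 * \ldots * P_{j_m}\mathbf{x}_m \Rightarrow P_i\mathbf{x}. \]
By construction of $\varphi_{i,j}$, we have a stack $s$ such that $s(\mathbf{x}) = \mathbf{b}$ and
\[ s, h' \models_{\mathbf{Y}} \Pi : y_1 \mapsto \mathbf{u}_1 * \ldots * y_k \mapsto \mathbf{u}_k * P_{j_1}\mathbf{x}_1 * \ldots * P_{j_m}\mathbf{x}_m. \]
This means that $s \models \Pi$ and $h' = h_1 \circ \ldots \circ h_{k+m}$, where
\[ s, h_i \models y_i \mapsto \mathbf{u}_i \text{ for all } 1 \le i \le k, \quad \text{and} \quad s, h_{k+i} \models_{\mathbf{Y}} P_{j_i}\mathbf{x}_i \text{ for all } 1 \le i \le m. \]
The latter two statements can be rewritten as follows:
\[ \mathrm{dom}(h_i) = \{s(y_i)\} \text{ and } h_i(s(y_i)) = s(\mathbf{u}_i) \text{ for all } 1 \le i \le k, \quad \text{and} \quad (s(\mathbf{x}_i), h_{k+i}) \in Y_{j_i} \text{ for all } 1 \le i \le m. \]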
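Recall that $\mathbf{x}$ and $\mathbf{z}$ describe respectively the sets of all free and bound variables appearing in the inductive rule $\Phi_{i,j}$. We have that $s(\mathbf{x} \cup \mathbf{z}) \subset \mathsf{Val}$ is finite, and $\mathbf{b} = s(\mathbf{x}) \subseteq s(\mathbf{x} \cup \mathbf{z})$ and $|s(\mathbf{x} \cup \mathbf{z})| \le |\mathrm{Spare}_\Phi(\mathbf{a}, h)|$ by construction. Therefore, by taking $V = s(\mathbf{x} \cup \mathbf{z})$ in Lemma 3.6, and noting $\mathrm{dom}(\theta) \supseteq \mathbf{b}$ by assumption, we can obtain a substitution $\theta'$ for $\mathrm{MC}^\Phi(\mathbf{a}, h)$ with $\mathrm{dom}(\theta') = s(\mathbf{x} \cup \mathbf{z})$ and $\theta'(\mathbf{b}) = \theta(\mathbf{b})$.
Now, since $\theta'$ is injective, it is easy to see that $s \circ \theta' \models \Pi$ (where $\circ$ here denotes function composition). Additionally, since $s(y_i), s(\mathbf{u}_i) \subseteq \mathrm{cover}(h) \subseteq \mathrm{Good}(\mathbf{a}, h)$ and $\theta'$ is the identity on $\mathrm{Good}(\mathbf{a}, h)$, we have by construction, for all $1 \le i \le k$,
\[ \mathrm{dom}(h_i) = \{\theta'(s(y_i))\} \text{ and } h_i(\theta'(s(y_i))) = \theta'(s(\mathbf{u}_i)), \quad \text{i.e.} \quad s \circ \theta', h_i \models y_i \mapsto \mathbf{u}_i. \]
Notice that, for any $1 \le i \le m$, we have both $h_{k+i} \subseteq h' \subseteq h$ and $\mathrm{dom}(\theta') \supseteq s(\mathbf{x}_i)$. Therefore, by the induction hypothesis,
\[ (\theta'(s(\mathbf{x}_i)), h_{k+i}) \in \mathrm{MC}^\Phi_{j_i}(\mathbf{a}, h) \text{ for all } 1 \le i \le m. \]
Putting everything together, we obtain
\[ s \circ \theta', h' \models_{\mathrm{MC}^\Phi(\mathbf{a}, h)} \Pi : y_1 \mapsto \mathbf{u}_1 * \ldots * y_k \mapsto \mathbf{u}_k * P_{j_1}\mathbf{x}_1 * \ldots * P_{j_m}\mathbf{x}_m. \]
As $(s \circ \theta')(\mathbf{x} \cup \mathbf{z}) \subseteq \mathrm{Good}(\mathbf{a}, h) \cup \mathrm{Spare}_\Phi(\mathbf{a}, h)$ by construction, we obtain by the definition of $\mathrm{MC}^\Phi(\mathbf{a}, h)$ (Definition 3.3):
\[ ((s \circ \theta')(\mathbf{x}), h') \in \mathrm{MC}^\Phi_i(\mathbf{a}, h). \]
Finally, as $s(\mathbf{x}) = \mathbf{b}$ and $\theta'$ coincides with $\theta$ on $\mathbf{b}$, we have $(s \circ \theta')(\mathbf{x}) = \theta'(s(\mathbf{x})) = \theta(\mathbf{b})$. Thus we obtain as required
\[ (\theta(\mathbf{b}), h') \in \mathrm{MC}^\Phi_i(\mathbf{a}, h). \]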
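Lemma 3.8. For each $1 \le i \le n$,
\[ (\mathbf{a}, h) \in \llbracket P_i \rrbracket^\Phi \ \Leftrightarrow\ (\mathbf{a}, h) \in \mathrm{MC}^\Phi_i(\mathbf{a}, h). \]
Proof. The $(\Leftarrow)$ direction follows directly from Lemma 3.4. The $(\Rightarrow)$ direction follows from Lemma 3.7 by taking $(\mathbf{b}, h')$ there to be $(\mathbf{a}, h)$, and $\theta$ to be the identity function on $\mathbf{a}$ (noting that $\mathbf{a} \subseteq \mathrm{Good}(\mathbf{a}, h)$, so this is trivially a substitution in the sense of Definition 3.5).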
Lemma 3.9. $\mathrm{MC}^\Phi(\mathbf{a}, h)$ is finite and computable.
Proof. By construction (Definition 3.3), $\mathrm{MC}^\Phi(\mathbf{a}, h)$ can only contain tuples of the form $(\mathbf{b}, h')$, where $h' \subseteq h$ and $\mathbf{b}$ is a finite tuple of values, drawn from the finite set $\mathrm{Good}(\mathbf{a}, h) \cup \mathrm{Spare}_\Phi(\mathbf{a}, h)$. As the heap $h$ is also finite, each such $(\mathbf{b}, h')$ is a finite object and there can be only finitely many of them. Hence $\mathrm{MC}^\Phi(\mathbf{a}, h)$ is finite.
To see that $\mathrm{MC}^\Phi(\mathbf{a}, h)$ is computable, observe that it is defined as the least fixed point of a monotone operator. It is well known that this least fixed point can be approached iteratively in approximant stages, starting from the $n$-tuple $(\emptyset, \ldots, \emptyset)$. Since $\mathrm{MC}^\Phi(\mathbf{a}, h)$ is finite, there can be only finitely many such approximants. To see that each one is computable, it suffices to show that any $\psi_{i,j}(\mathbf{Y})$ is computable, given that $\mathbf{Y} \in \tau_1 \times \ldots \times \tau_n$ is computable and the inductive rule $\Phi_{i,j}$ is of the form $\exists \mathbf{z}.\ \Pi : F \Rightarrow P_i\mathbf{x}$, say. This is quite clear: First, there are only finitely many membership candidates $(\mathbf{b}, h')$ with $h' \subseteq h$ and $\mathbf{b} \subseteq \mathrm{Good}(\mathbf{a}, h) \cup \mathrm{Spare}_\Phi(\mathbf{a}, h)$. Second, since whether $s, h' \models_{\mathbf{Y}} \Pi : F$ depends only on the values $s$ assigns to the variables appearing in $\Pi : F$, for each candidate $(\mathbf{b}, h')$ it suffices to pick an arbitrary stack $s$ with $s(\mathbf{x}) = \mathbf{b}$ and $s(\mathbf{z}) \subseteq \mathrm{Good}(\mathbf{a}, h) \cup \mathrm{Spare}_\Phi(\mathbf{a}, h)$. Finally, for any such $s$, $h'$ and computable $\mathbf{Y}$ it is straightforward to decide whether $s, h' \models_{\mathbf{Y}} \Pi : F$.
Theorem 3.10. The model checking problem MC is decidable. That is, for any stack $s$, heap $h$, inductive rule set $\Phi$ and symbolic heap $A$, it is decidable whether $s, h \models_\Phi A$.
Proof. By Proposition 3.1, it suffices to show that RMC is decidable. Let $(\Phi, \mathbf{a}, h, P_i)$ be an instance of RMC. By Lemma 3.8, deciding whether $(\mathbf{a}, h) \in \llbracket P_i \rrbracket^\Phi$ is equivalent to deciding whether $(\mathbf{a}, h) \in \mathrm{MC}^\Phi_i(\mathbf{a}, h)$. By Lemma 3.9, we have that $\mathrm{MC}^\Phi_i(\mathbf{a}, h) = \mathrm{proj}_i(\mathrm{MC}^\Phi(\mathbf{a}, h))$ is a finite and computable set. Hence it is decidable whether $(\mathbf{a}, h) \in \mathrm{MC}^\Phi_i(\mathbf{a}, h)$.
Remark 3.11. Since satisfiability for our logic is known to be decidable [10], one might imagine that we can simply reduce model checking to satisfiability: encode the state $(s, h)$ as a formula $\gamma(s, h)$, so that $s, h \models_\Phi A$ iff $\gamma(s, h) \wedge A$ is satisfiable. Unfortunately, this does not work for our logic since standard conjunction $\wedge$ between arbitrary symbolic heaps is not permitted.
Remark 3.12. In practice, we might sometimes want to consider “intuitionistic” model checking queries, of the following form: Given an inductive rule set $\Phi$, stack $s$, heap $h$ and formula $A$, decide whether there is an $h' \subseteq h$ such that $s, h' \models_\Phi A$. As in Proposition 3.1, we may assume without loss of generality that $A = P_i\mathbf{x}$ for some predicate symbol $P_i$. This problem is clearly also decidable: letting $s(\mathbf{x}) = \mathbf{a}$, we simply check whether $(\mathbf{a}, h') \in \mathrm{MC}^\Phi_i(\mathbf{a}, h)$ for some $h'$. Correctness follows similarly to Lemma 3.8. Indeed, all of our correctness and complexity results in this paper adapt straightforwardly to intuitionistic queries.
We conclude this section by deducing some immediate consequences of Theorem 3.10 for the entailment problem in $\mathrm{SL}^{\mathrm{SH}}_{\mathrm{ID}}$.
Definition 3.13. Given an inductive rule set $\Phi$ and symbolic heaps $A, B$, we say the entailment $A \vdash_\Phi B$ is valid if $s, h \models_\Phi A$ implies $s, h \models_\Phi B$ for all stacks $s$ and heaps $h$, and invalid otherwise.
It was shown in [4] that the set of valid sequents is not recursively enumerable (and, therefore, validity is, in general, undecidable). However, it does turn out to be co-recursively enumerable.
Corollary 3.14. For any entailment $A \vdash_\Phi B$, the set of its countermodels, $\{(s, h) \mid s, h \models_\Phi A \text{ and } s, h \not\models_\Phi B\}$, is recursively enumerable.
Proof. First, the set of all heaps is recursively enumerable, since heaps are finite objects. Second, although stacks are not finite objects, it clearly suffices to enumerate only the values of $s$ on the finite set of variables $FV(A) \cup FV(B) = \mathbf{x}$, say. Thus we can recursively enumerate all “representative candidates” of the form $(s(\mathbf{x}), h)$. Finally, for any such candidate model, we can decide whether $s, h \models_\Phi A$ and $s, h \not\models_\Phi B$ by Theorem 3.10.
Corollary 3.15. For any inductive rule set $\Phi$, the set of invalid entailments over $\Phi$ is recursively enumerable.
Proof. The set of all symbolic heaps over $\Phi$ is recursively enumerable, so the set of all entailments $A \vdash_\Phi B$ is also enumerable. Next, note that the set of countermodels of a given entailment is enumerable (Corollary 3.14). Thus the invalid entailments are recursively enumerable simply by enumerating all entailments and, for each of these, dovetailing the process of searching for a countermodel.
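The dovetailing argument can be pictured as a walk along the diagonals of a grid whose rows are entailments and whose columns are candidate models. The sketch below is schematic: the enumerations `entailment_at` and `candidate_at` and the decision procedure `is_countermodel` are assumed to be supplied by the caller (the last one being available by Theorem 3.10), and all names are hypothetical.

```python
from itertools import count

def diagonal_pairs():
    """Enumerate all pairs (i, j) of naturals, one anti-diagonal at a time."""
    for d in count():
        for i in range(d + 1):
            yield i, d - i

def enumerate_invalid(entailment_at, candidate_at, is_countermodel):
    """Yield every entailment that has a countermodel, each exactly once.

    entailment_at(i) : the i-th entailment in some fixed enumeration
    candidate_at(j)  : the j-th candidate model in some fixed enumeration
    is_countermodel  : decides whether a candidate refutes an entailment
    """
    reported = set()
    for i, j in diagonal_pairs():
        if i in reported:
            continue  # this entailment has already been shown invalid
        if is_countermodel(entailment_at(i), candidate_at(j)):
            reported.add(i)
            yield entailment_at(i)
```

Every pair (i, j) is reached after finitely many steps, so an invalid entailment is eventually reported, while the search for a valid one simply never succeeds. This is why the procedure gives recursive enumerability of invalidity rather than a decision procedure.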
4. Complexity of General Model Checking
In this section we investigate the computational complexity of the general model checking problem MC, as described in the previous section. Specifically, we show that MC is EXPTIME-complete, and is still NP-hard in the size of the heap when the underlying inductive rule set is fixed in advance.
In the following, we write $\|o\|$ to denote the length of (some reasonable) encoding of a finite mathematical object $o$.
Lemma 4.1. MC is EXPTIME-hard.
Proof. By Proposition 3.1, it suffices to show that the restricted model checking problem, RMC, is EXPTIME-hard. This is by reduction from the following special case of the satisfiability problem for $\mathrm{SL}^{\mathrm{SH}}_{\mathrm{ID}}$, which was shown to be EXPTIME-hard in [10]: given an inductive rule set $\Phi$ containing no occurrences of $\mapsto$, and a predicate symbol $P$ from $\Phi$ of arity 0, decide whether there exists a model $s, h$ such that $s, h \models_\Phi P$. Since $\Phi$ contains no occurrences of $\mapsto$ and $P$ no free variables, this means deciding whether $e \in \llbracket P \rrbracket^\Phi$ (recall $e$ is the empty heap). Thus, given any instance $(\Phi, P)$ of the above problem, the corresponding instance of RMC is simply given by $(\Phi, (), e, P)$.
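Lemma 4.2. MC $\in$ EXPTIME.
Proof. By Proposition 3.1, it suffices to show RMC $\in$ EXPTIME. By Lemma 3.8, deciding a given instance $I = (\Phi, \mathbf{a}, h, P_i)$ of RMC can be done by computing $\mathrm{MC}^\Phi(\mathbf{a}, h)$ and checking whether $(\mathbf{a}, h) \in \mathrm{proj}_i(\mathrm{MC}^\Phi(\mathbf{a}, h))$. Thus it suffices to show that $\mathrm{MC}^\Phi(\mathbf{a}, h)$ can be computed in time exponential in $m =_{\mathrm{def}} \|I\|$.
Recall that $\mathrm{MC}^\Phi(\mathbf{a}, h)$ is obtained by a fixed point construction of a monotone operator (Definition 3.3):
\[ \mathrm{MC}^\Phi(\mathbf{a}, h) =_{\mathrm{def}} \mu \mathbf{Y}.\ \Big( \bigcup_j \psi_{1,j}(\mathbf{Y}), \ldots, \bigcup_j \psi_{n,j}(\mathbf{Y}) \Big) \]
This least fixed point can be approached iteratively from below, starting from $(\emptyset, \ldots, \emptyset)$. Writing $N = |\mathrm{MC}^\Phi(\mathbf{a}, h)|$, this process will reach a fixed point in at most $N$ iterations. Let $T$ be the maximum number of polynomial-time steps required to compute any $\psi_{i,j}(\mathbf{Y})$, given the earlier fixed point approximant $\mathbf{Y}$. Since each iteration involves the computation of $\psi_{i,j}(\mathbf{Y})$ for every inductive rule $\Phi_{i,j} \in \Phi$, it is clear that computing $\mathrm{MC}^\Phi(\mathbf{a}, h)$ requires $N \cdot |\Phi| \cdot T$ polynomial-time steps.
Now, to obtain an upper bound for $N$, observe that by construction $\mathrm{MC}^\Phi(\mathbf{a}, h)$ contains only pairs of the form $(\mathbf{b}, h')$ such that $h' \subseteq h$, the length of $\mathbf{b}$ is bounded by the maximum arity of any predicate, which is bounded by $\|\Phi\|$, and $\mathbf{b} \subseteq \mathrm{Good}(\mathbf{a}, h) \cup \mathrm{Spare}_\Phi(\mathbf{a}, h)$, whose size is bounded by $\|\mathbf{a}\| + \|h\| + 1 + \|\Phi\| = \|I\| + 1$ (the extra 1 comes from $\mathit{nil}$). Therefore, we obtain
\[ N \le (\|I\| + 1)^{\|\Phi\|} \cdot 2^{\|h\|} = O(2^{\mathrm{poly}(m)}) \]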
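To get a feel for how quickly this bound grows, it can be evaluated for concrete sizes. The small sketch below simply transcribes the inequality above, taking $\|I\| = \|\mathbf{a}\| + \|h\| + \|\Phi\|$ as in the text; the function name and its three integer parameters are assumptions of this illustration, and the result is an upper bound, not the actual size of the fixed point.

```python
def bound_on_N(size_a: int, size_h: int, size_phi: int) -> int:
    """Upper bound (|I| + 1)^|Phi| * 2^|h| on the number of sub-models."""
    size_instance = size_a + size_h + size_phi   # |I| = |a| + |h| + |Phi|
    tuples = (size_instance + 1) ** size_phi     # choices for the tuple b
    subheaps = 2 ** size_h                       # choices for h' within h
    return tuples * subheaps
```

For a fixed rule set the first factor is polynomial in the heap size, so the exponential growth in $\|h\|$ comes entirely from the number of sub-heaps.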
Next, we obtain an upper bound for $T$. Given $Y$ and the inductive rule $\Phi_{i,j}$ of the form $\exists \mathbf{z}.\ \Pi : F \Rightarrow P_i\mathbf{x}$, say, it clearly suffices, for any "candidate" $(\mathbf{b}, h')$ of the above form, to decide whether or not $(\mathbf{b}, h') \in \psi_{i,j}(Y)$. This means checking whether $h' \subseteq h$ and, for every valuation of the variables $\mathbf{x} \cup \mathbf{z}$ into $Good(\mathbf{a}, h) \cup Spare_{\Phi}(\mathbf{a}, h)$, checking whether $s(\mathbf{x}) = \mathbf{b}$ and $s, h' \models_Y \Pi : F$ (where $s$ is any stack obtained by extending the chosen valuation). The heap inclusion check can be done in polynomial time, and the number of possible valuations is (easily) bounded by $N$. To check $s, h' \models_Y \Pi : F$ we might need to consider every possible division of $h'$ into a number of "sub-heaps" bounded by the maximum number of $*$s in any rule, in turn bounded by $\|\Phi\|$ (as per the proofs of Lemmas 3.4 and 3.7). There are at most $2^{\|h\| \cdot \|\Phi\|}$ such combinations. Finally, as $Y$ might contain up to $N$ elements, checking whether a chosen division of $h'$ satisfies $F$ with respect to $Y$ might take up to $N$ steps. All other checks are polynomial, so we obtain
$$T \le N \cdot N \cdot N \cdot 2^{\|h\| \cdot \|\Phi\|} = O(2^{poly(m)})$$
Therefore, altogether, the computation of $MC^{\Phi}(\mathbf{a}, h)$ requires at most $N \cdot |\Phi| \cdot T = O(2^{poly(m)})$ polynomial-time steps.
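The iteration from below used in this proof can be illustrated by a minimal sketch. The function `least_fixed_point` and the operator `toy_step` are hypothetical names introduced only for illustration; `toy_step` is a stand-in for the tuple of operators $\bigcup_j \psi_{i,j}$ and is not the operator of Definition 3.3.

```python
from typing import Callable, FrozenSet, Tuple

Approx = Tuple[FrozenSet, ...]  # one set of candidate models per predicate


def least_fixed_point(step: Callable[[Approx], Approx], n_preds: int):
    """Iterate a monotone operator from (∅, ..., ∅) until nothing changes.

    Returns the fixed point and the number of strictly growing rounds.
    """
    current: Approx = tuple(frozenset() for _ in range(n_preds))
    rounds = 0
    while True:
        nxt = step(current)
        if nxt == current:
            return current, rounds
        current = nxt
        rounds += 1


def toy_step(y: Approx) -> Approx:
    """Hypothetical monotone operator: collect the even numbers below 10."""
    (evens,) = y
    grown = {0} | {k + 2 for k in evens if k + 2 < 10}
    return (frozenset(evens | grown),)


fixed, rounds = least_fixed_point(toy_step, 1)
# Each round adds at least one element, so rounds <= |fixed point|,
# mirroring the bound of at most N iterations used in the proof.
```

In this toy run the number of rounds equals the size of the fixed point, which is the worst case permitted by the argument above.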
Theorem 4.3. MC is EXPTIME-complete.
Proof. Immediate by Lemmas 4.1 and 4.2.
Typically, in program verification applications, the definitions of the inductive predicates are fixed in advance. Thus, it is also of interest to know how the complexity of MC varies in the size of the heap $h$ over a fixed inductive rule set $\Phi$.
Proposition 4.4. MC is NP-hard in $\|h\|$.
Proof. By Proposition 3.1, it suffices to show RMC is NP-hard in $\|h\|$. We exhibit a polynomial-time reduction from the following triangle partition problem, known to be NP-complete [21]: given a graph $G = (V, E)$ with $|V| = 3q$ for some $q > 0$, decide whether there is a partition of $G$ into triangles.
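First, we fix the following inductive rule set $\Phi_{PT}$:
$x \mapsto nil \Rightarrow V(x)$
$e \mapsto (x, y) * e' \mapsto (y, x) \Rightarrow E(x, y)$
$V(x) * V(y) * V(z) * E(x, y) * E(y, z) * E(z, x) \Rightarrow T$
$E(x, y) * J \Rightarrow J$, $\quad emp \Rightarrow J$, $\quad T * P \Rightarrow P$, $\quad J \Rightarrow P$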
Now we give the reduction. For any instance $G = (V, E)$ of the above triangle partition problem, first write $V = \{v_1, \ldots, v_n\}$ and $E = \{e_1, \ldots, e_m\}$. We let $a_1, \ldots, a_n$ and $b_1, \ldots, b_{2m}$ be distinct addresses in Loc, and define a heap $h_G$, with $dom(h_G) = \{a_1, \ldots, a_n, b_1, \ldots, b_{2m}\}$, as follows:
$h_G(a_i) = nil$ for $1 \le i \le n$,
$h_G(b_{2k-1}) = (a_i, a_j)$ for $1 \le k \le m$ and $e_k = \{v_i, v_j\}$,
$h_G(b_{2k}) = (a_j, a_i)$ for $1 \le k \le m$ and $e_k = \{v_i, v_j\}$.
The required instance of RMC is then given by $(\Phi_{PT}, (), h_G, P)$ (note that $\Phi_{PT}$ and $P$ are fixed for any $G$). Clearly it is polynomial-time computable. For correctness, we need to show that $G$ has a partition into triangles if and only if $h_G \in [\![P]\!]^{\Phi_{PT}}$. This follows from the following easy observations, for any subheap $h$ of $h_G$ and values $c, d$ (the formal details are easily reconstructed):
• $(c, h) \in [\![V]\!]^{\Phi_{PT}}$ iff $(c, h)$ exactly represents a vertex in $G$;
• $((c, d), h) \in [\![E]\!]^{\Phi_{PT}}$ iff $((c, d), h)$ exactly represents an (undirected) edge in $G$;
• $h \in [\![T]\!]^{\Phi_{PT}}$ iff $h$ exactly represents a triangle in $G$;
• $h \in [\![J]\!]^{\Phi_{PT}}$ iff $h$ exactly represents some collection of edges in $G$;
• $h \in [\![P]\!]^{\Phi_{PT}}$ iff $h$ exactly represents a collection of non-overlapping triangles in $G$ covering all vertices in $G$, plus a collection of "leftover" edges from $G$, i.e. iff $G$ has a partition into triangles.
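As an illustration of the reduction, the following sketch builds a heap in the shape of $h_G$ from a graph and, separately, decides triangle partition by brute force as a reference. It assumes that edge $e_k$ occupies the two cells $b_{2k-1}$ and $b_{2k}$; the tuple encoding of addresses and the names `build_heap` and `has_triangle_partition` are hypothetical. The brute-force check is exponential and is not the model checking procedure itself.

```python
from itertools import combinations


def build_heap(vertices, edges):
    """Return a heap shaped like h_G as a dict from addresses to contents."""
    addr = {v: ("a", i) for i, v in enumerate(vertices, start=1)}
    heap = {addr[v]: "nil" for v in vertices}          # a_i |-> nil
    for k, (u, v) in enumerate(edges, start=1):
        heap[("b", 2 * k - 1)] = (addr[u], addr[v])    # one direction of e_k
        heap[("b", 2 * k)] = (addr[v], addr[u])        # the reverse direction
    return heap


def has_triangle_partition(vertices, edges):
    """Brute-force reference: can the vertices be split into triangles?"""
    edge_set = {frozenset(e) for e in edges}

    def solve(remaining):
        if not remaining:
            return True
        first = remaining[0]
        for x, y in combinations(remaining[1:], 2):
            triangle = {frozenset((first, x)), frozenset((first, y)),
                        frozenset((x, y))}
            if triangle <= edge_set:
                rest = [v for v in remaining if v not in (first, x, y)]
                if solve(rest):
                    return True
        return False

    return solve(list(vertices))


V = [1, 2, 3, 4, 5, 6]
E = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
h_G = build_heap(V, E)
```

Here the edge (3, 4) plays the role of a "leftover" edge: it belongs to no triangle of the partition, which is why the rule set needs the predicate $J$.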
5. Restricted Fragments
According to Theorem 4.3, the general model checking problem is EXPTIME-complete. In practice, however, one frequently encounters definition schemas that are more restrictive than our general schema. Here and in the next section, we investigate the computational complexity of model checking when various natural syntactic restrictions are imposed on predicate definitions. Informally, the restrictions we consider are the following:
MEM: "Memory-consuming" rules, which only permit recursion in the presence of explicit non-empty memory.
CV: "Constructively valued" rules, in which the values of all variables occurring in a rule body are uniquely determined by the values of variables occurring in its head, together with the heap.
DET: "Deterministic" rules, in which the pure (dis)equality conditions in the rules for a predicate $P$ are mutually exclusive.
Arity: The maximum arity of any predicate is fixed in advance.
Importantly, each of the above restrictions can be described in a clear syntactical way. The restrictions DET and CV have appeared previously in the literature, in various guises (e.g. [16, 3]). Together, as we will show in Section 6.5, they imply precision, the notion that a formula unambiguously circumscribes the part of the heap on which it is true [27]. The restriction MEM, as far as we know, is novel, but is seemingly crucial in reducing the complexity of model checking from EXPTIME down to NP or PTIME.
In Section 6 we show that the general EXPTIME can be reduced to PSPACE, to NP, or even to PTIME for the fragments defined by different combinations of the above restrictions. The following table summarises our results:

            (neither)   CV        DET       CV+DET
non-MEM     EXPTIME     EXPTIME   EXPTIME   ≥PSPACE
MEM         NP          NP        NP        PTIME

Remark 5.1. For each of the combinations MEM, MEM+CV, MEM+DET, their NP-completeness holds even when the arity of the predicates involved is fixed in advance.
In contrast, notwithstanding EXPTIME-hardness in general, for the fragment defined only by non-memory-consuming rules, model checking can be resolved in PTIME when the arity is fixed, but the degree of the polynomial is proportional to the maximal arity of the predicates involved.
We now formally introduce the restrictions MEM, CV and DET.
Definition 5.2 (MEM). An inductive rule set is said to be memory-consuming (a.k.a. "in MEM") if every rule in it is of the form
$\Pi : emp \Rightarrow P\mathbf{x}$, or $\exists \mathbf{z}.\ \Pi : F * x \mapsto \mathbf{t} \Rightarrow P\mathbf{x}$.
In practice, most predicate definitions in the literature fall into MEM: one or more pointers are "consumed" when recursing.
Example 5.3. The following definitions of binary predicates $ls$, defining possibly-cyclic list segments by head recursion, and $rls$, defining possibly-cyclic list segments by tail recursion, are both in MEM. Both definitions "consume" a pointer when recursing.
$x = y : emp \Rightarrow ls(x, y)$
$\exists z.\ x \mapsto z * ls(z, y) \Rightarrow ls(x, y)$    (1)
and
$x = y : emp \Rightarrow rls(x, y)$
$\exists z.\ x \ne y : rls(x, z) * z \mapsto y \Rightarrow rls(x, y)$    (2)
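To make the reading of rules (1) concrete, here is a minimal sketch of a checker for $ls$ over heaps with one-field cells, represented as Python dictionaries. The name `sat_ls` and the dictionary encoding are assumptions made for illustration; the check is exact, i.e. the whole heap must be consumed.

```python
def sat_ls(heap, x, y):
    """Decide whether `heap` is exactly a list segment from x to y, per rules (1)."""
    if not heap:
        return x == y                # base rule: x = y : emp
    if x not in heap:
        return False                 # recursive rule needs the cell x |-> z
    rest = dict(heap)
    z = rest.pop(x)                  # z is read off the heap, so no search is needed
    return sat_ls(rest, z, y)        # continue with ls(z, y) on the remaining heap
```

For instance, `sat_ls({1: 2, 2: 1}, 1, 1)` holds, which illustrates why these segments are called possibly-cyclic, whereas `sat_ls({1: 2, 5: 6}, 1, 2)` fails because the cell at 5 is never consumed.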
Definition 5.4 (CV). A variable $z$ occurring in an inductive rule $\exists \mathbf{y}.\ \Pi : F \Rightarrow P_i\mathbf{x}$ is said to be constructively valued in that rule if: (a) $z \in \mathbf{x}$, or (b) there is a variable $w$ also occurring in the rule such that $w$ is constructively valued, and either
• $\Pi \models z = w$, or,
• $w \mapsto \mathbf{u}$ is a subformula of $F$ and $z \in \mathbf{u}$.
An inductive rule is constructively valued if all its variables are, and an inductive rule set is constructively valued (a.k.a. "in CV") if all its rules are.
Example 5.5. The existentially quantified variable $z$ is constructively valued in the definition of $ls$ in Example 5.3 (1), but not in the definition of $rls$ (2).
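Definition 5.4 is a closure condition, so the constructively valued variables of a rule can be computed by saturation. The sketch below assumes a hypothetical encoding of a rule as its head variables, a list of explicit equalities and a list of points-to atoms; the entailment $\Pi \models z = w$ is approximated by the equalities listed explicitly.

```python
def constructively_valued(head, equalities, points_to):
    """Return the set of constructively valued variables of one rule.

    head:       tuple of head variables (clause (a))
    equalities: list of pairs (z, w) standing for z = w in the pure part
    points_to:  list of pairs (w, targets) standing for w |-> targets
    """
    cv = set(head)
    changed = True
    while changed:
        changed = False
        for z, w in equalities:
            if (z in cv) != (w in cv):          # exactly one side known so far
                cv |= {z, w}
                changed = True
        for src, targets in points_to:
            if src in cv and not set(targets) <= cv:
                cv |= set(targets)              # contents of a known cell
                changed = True
    return cv


# Recursive rules of Example 5.3: x |-> z for ls, z |-> y for rls.
ls_cv = constructively_valued(("x", "y"), [], [("x", ("z",))])
rls_cv = constructively_valued(("x", "y"), [], [("z", ("y",))])
```

On these two rules the sketch agrees with Example 5.5: `z` is reached from `x` in the $ls$ rule, but nothing in the $rls$ rule leads from the head variables to `z`.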
Definition 5.6 (DET). A predicate $P_i$ is said to be deterministic (in an inductive rule set $\Phi$) if for any two distinct rules of the form
$\exists \mathbf{z}.\ \Pi : F \Rightarrow P_i\mathbf{x}$ and $\exists \mathbf{z}'.\ \Pi' : F' \Rightarrow P_i\mathbf{x}$,
there exists no stack $s$ such that $s, e \models \exists \mathbf{z}.(\Pi : emp)$ and $s, e \models \exists \mathbf{z}'.(\Pi' : emp)$. An inductive rule set $\Phi$ is deterministic (a.k.a. "in DET") if all predicates defined in $\Phi$ are deterministic.
We note that whether $\Phi$ is deterministic is decidable in polynomial time via a simple procedure that eliminates pure sub-formulas employing quantified variables [10].
Example 5.7. In Example 5.3, the definition of $rls$ (2) is deterministic, but the definition of $ls$ (1) is not.
6. Complexity of Restricted Fragments
In this section, we investigate the computational complexity of model checking for different combinations of the restrictions MEM, CV and DET introduced in the previous section.
The main technical tool underpinning the following complexity results is the notion of an unfolded inductive tree. The idea is simple: in order to show that $(\mathbf{a}, h) \in [\![P]\!]^{\Phi}$, we repeatedly unfold the rules from $\Phi$ backwards (from head to body), instantiating variables with values and matching $\mapsto$ assertions with pointers in $h$ as we go according to the rule constraints.
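Definition 6.1. For the sake of brevity, given an elementary formula $Q(x_1, .., x_n)$, we write $\widetilde{Q}$ to denote a statement $Q(a_1, .., a_n)$ obtained by instantiating the variables in $Q$ with values and heap pointers in the obvious way. For example, $\tilde{x} \mapsto \tilde{y}$ represents the one-cell heap that contains $\tilde{y}$ at location $\tilde{x}$. Then we specify the unfolded inductive tree by induction (see Figure 1):
(a) Let $\widetilde{Q_m}$ be generated by an instantiated rule $\rho_m$ of the form
$$\widetilde{\Pi_m} : h_m \Rightarrow \widetilde{Q_m}$$
Then we make a tree, $T_{\widetilde{Q_m}}$, consisting of one edge labelled by $\rho_m$; the root is labelled by $\widetilde{Q_m}$, the leaf by $h_m$.
[Diagram: the tree $T_{\widetilde{Q_m}}$, a root labelled $\widetilde{Q_m}$ joined by one edge labelled $\rho_m$ to a leaf labelled $h_m$.]
(b) Suppose that an instantiated rule $\rho$ of the form
$$\widetilde{\Pi} : \widetilde{Q_0} * \widetilde{Q_1} * \cdots * \widetilde{Q_m} \Rightarrow \widetilde{R}$$
generates $\widetilde{R}$, and $T_{\widetilde{Q_0}}, T_{\widetilde{Q_1}}, \ldots, T_{\widetilde{Q_m}}$ are inductive trees having been already constructed for $\widetilde{Q_0}, \widetilde{Q_1}, \ldots, \widetilde{Q_m}$, resp.
Then we make a tree, $T_{\widetilde{R}}$, by taking a new root with $m+1$ outgoing edges all labelled by $\rho$, labelling the root by $\widetilde{R}$, and connecting our root with the roots of $T_{\widetilde{Q_0}}, T_{\widetilde{Q_1}}, \ldots, T_{\widetilde{Q_m}}$, resp.
[Diagram: the tree $T_{\widetilde{R}}$, a root labelled $\widetilde{R}$ with $m+1$ edges labelled $\rho$ leading to the roots of the subtrees $T_{\widetilde{Q_0}}, T_{\widetilde{Q_1}}, T_{\widetilde{Q_2}}, \ldots, T_{\widetilde{Q_m}}$.]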
Proposition 6.2. The restricted model checking problem RMC can be solved by an exhaustive search for an unfolded inductive tree for the query $(\mathbf{a}, h) \in [\![P]\!]^{\Phi}$, where the values for instantiated existential variables are drawn from the set $Good(\mathbf{a}, h) \cup Spare_{\Phi}(\mathbf{a}, h)$, as per Definition 3.2.
Proof. (Sketch) The soundness of the approach is obvious. Termination follows from the fact that the range of permissible values is finite, and completeness from the results in Section 3 which show that it suffices to confine our attention to values drawn from the polynomial-size set $Good(\mathbf{a}, h) \cup Spare_{\Phi}(\mathbf{a}, h)$.
However, the above procedure might still generate an exponential number of leaves labelled by the empty heap $e$:
Example 6.3. Let $\Phi_n$ be the set of inductive rules (for $1 \le j \le n$):
$P_1 * P_2 \Rightarrow P_0$,
$P_{2j+1} * P_{2j+2} \Rightarrow P_{2j-1}$, $\quad P_{2j+1} * P_{2j+2} \Rightarrow P_{2j}$,
$emp \Rightarrow P_{2n+1}$, $\quad emp \Rightarrow P_{2n+2}$.
Then any unfolded inductive tree for the query $s, e \models_{\Phi_n} P_0$ has $2^{n+1}$ leaves labelled by $e$.
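The leaf count in Example 6.3 can be checked by a short recurrence over the predicate indices. The function name `leaves_of_P0` is hypothetical; memoisation lets the count be computed quickly even though the tree itself has exponentially many leaves.

```python
from functools import lru_cache


def leaves_of_P0(n):
    """Number of leaves of the unfolded inductive tree for P_0 in Phi_n."""

    @lru_cache(maxsize=None)
    def leaves(i):
        if i in (2 * n + 1, 2 * n + 2):
            return 1                      # emp => P_{2n+1} and emp => P_{2n+2}
        if i == 0:
            return leaves(1) + leaves(2)  # P_1 * P_2 => P_0
        j = (i + 1) // 2                  # P_{2j-1} and P_{2j} share one body
        return leaves(2 * j + 1) + leaves(2 * j + 2)

    return leaves(0)
```

Each level doubles the number of leaves, and there are $n+1$ levels between $P_0$ and the two base predicates, which gives $2^{n+1}$.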
Nevertheless, in the case of MEM, we are able to reduce EXPTIME to NP by proving that the number of leaves labelled by $e$ can be bounded linearly in $|dom(h)|$.
6.1 An Upper NP-Bound for MEM
Here, we show that, when we restrict to memory-consuming rules (MEM), model checking becomes an NP problem.
Theorem 6.4. We can design an NP procedure to determine, given a set of memory-consuming inductive rules $\Phi$, tuple of values $\mathbf{a}$, heap $h$ and predicate symbol $P$, whether $(\mathbf{a}, h) \in [\![P]\!]^{\Phi}$.
Proof. (Sketch) Taking into account the bounds provided by Section 3, we look for an unfolded inductive tree such that within each rule instance $\rho$ from $\Phi$ used in the tree all values are taken only from a set of polynomial size, which is fixed in advance.
To provide an NP procedure, it suffices to prove Lemma 6.5 and Lemma 6.6 below. The crucial issue is that, in contrast to Example 6.3, the number of leaves labelled by $e$ can be bounded linearly in the size of $dom(h)$ when all rules are memory-consuming.
Lemma 6.5. According to Section 3, $(\mathbf{a}, h) \in [\![P]\!]^{\Phi}$ iff there is an unfolded tree for the appropriately instantiated $\widetilde{P}$ such that $h$ is the $*$-composition of all heaps labelling the leaves of the tree.
Lemma 6.6. The number of nodes in the above inductive trees for $\widetilde{P}$ is bounded by $2(m+1) \cdot |dom(h)|$, where $m$ is the maximal number of predicate symbols in the body of the rules.
Proof. It is clear that the number of leaves labelled by non-empty heaplets is bounded by $|dom(h)|$. Let $v$ be a leaf labelled by $e$. Then either its parent $w$ is the root of the tree, or the edge of the form $(v', w)$ incoming to $w$ is labelled by some $\rho$, providing thereby a leaf $v_0$ with its incoming edge $(v', v_0)$ labelled by the same $\rho$, such that $v_0$ is labelled by a non-empty $\tilde{x} \mapsto \tilde{t}$ (Figure 2 shows such a $\rho$ in MEM). Since no more than $m$ leaves labelled by $e$ can be associated with one and the same $v_0$ specified in such a way, the total number of leaves is bounded by $(m+1) \cdot |dom(h)|$. It remains to apply an induction to conclude the proof.
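6.2 NP-Hardness for MEM+CV
Here we show NP-hardness for the restricted fragment MEM+CV. The proof is by reduction from the 3-partition problem [21].
[Diagram: a root labelled $\widetilde{R}$ with edges labelled $\rho$ leading to a leaf labelled $\tilde{x} \mapsto \tilde{t}$ and to the roots of the subtrees $T_{\widetilde{Q_1}}, T_{\widetilde{Q_2}}, \ldots$; the node $\widetilde{Q_m}$ has a single edge labelled $\rho_m$ to a leaf labelled $e$.]
Figure 2. $\rho = \widetilde{\Pi} : \tilde{x} \mapsto \tilde{t} * \widetilde{Q_1} * \widetilde{Q_2} * \cdots * \widetilde{Q_m} \Rightarrow \widetilde{R}$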
Definition 6.7. By means of the length of circular lists, we resolve a key issue: representing integers as logic formulas.
By a ring-formula of length $\ell$, with leading variable $x$, we mean a formula of the form ($x'_1, x'_2, \ldots, x'_\ell$ are fresh):
$$x \mapsto x'_1 * x'_1 \mapsto x'_2 * x'_2 \mapsto x'_3 * \cdots * x'_{\ell-1} \mapsto x'_\ell * x'_\ell \mapsto x'_1$$
Given a 3-partition problem instance, i.e., a bound $B$ and a multiset $S = \{s_1, s_2, \ldots, s_{3m}\}$, we introduce a linearly ordered list $X$ of distinct variables: $X = x_1, x_2, \ldots, x_i, \ldots, x_{3m}$.
Then we encode each of the numbers $s_i$ by a ring-formula, $S_i(x_i)$, of length $s_i$, with the leading variable $x_i$.
The whole $S$ is encoded as a concrete heap $h_S$, which is a collection of $3m$ disjoint "circular" lists of the form $S_i(a_i)$.
With an appropriate stack $s_S$, $s_S, h_S \models \Pi(X) : \varphi_S(X)$, where
$$\varphi_S(X) = S_1(x_1) * S_2(x_2) * \cdots * S_{3m}(x_{3m})$$
and $\Pi(X)$ says $(x \ne y)$ for all distinct variable names $x$ and $y$ mentioned, explicitly or implicitly, in $\varphi_S(X)$.
Define a set of inductive rules $\Phi_S$ as follows. To keep the predicate arities fixed and, at the same time, maintain the $i$-th position inside the tuple $X$, we define predicates $Q_i(x)$ by the rules
$x \ne nil : emp \Rightarrow Q_i(x)$    (3)
For $i < j < k$, we use "goal" predicates $R_{ijk}(x, y, z)$ with the rules
$x \ne y,\ y \ne z,\ z \ne x : emp \Rightarrow R_{ijk}(x, y, z)$    (4)
In the case of $i < j < k$ and $s_i + s_j + s_k = B$, we add the rule:
$S_i(x) * S_j(y) * S_k(z) * R_{ijk}(x, y, z) \Rightarrow R_{ijk}(x, y, z)$    (5)
Lemma 6.8. Let $h_S$ and $s_S$ be as defined above. Then
$$s_S, h_S \models_{\Phi_S} \Pi(X) : \mathop{*}_{i=1}^{3m} Q_i(x_i) \;*\; \mathop{*}_{i<j<k} R_{ijk}(x_i, x_j, x_k)$$
iff there is a complete 3-partition on $S$, i.e., $S$ can be partitioned in groups of three, say $s_i$, $s_j$, and $s_k$, so that $s_i + s_j + s_k = B$.
Proof. Each $R_{ijk}(a_i, a_j, a_k)$ at the top is generated either by (4), with $emp$, or by (5), with $S_i(a_i)$, $S_j(a_j)$, $S_k(a_k)$ being 'consumed'. The latter provides the corresponding group of $s_i$, $s_j$, $s_k$.
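The encoding of numbers by ring-formulas can be sketched concretely. The code below builds the heaplet described by a ring-formula of a given length and recovers the length by walking the ring; the tuple encoding of the fresh addresses and the names `ring_heap` and `ring_length` are assumptions made for illustration only.

```python
def ring_heap(lead, length):
    """Heaplet for a ring-formula: lead |-> n_1, n_1 |-> n_2, ..., n_l |-> n_1."""
    nodes = [(lead, i) for i in range(1, length + 1)]   # fresh ring addresses
    heap = {lead: nodes[0]}
    for i, node in enumerate(nodes):
        heap[node] = nodes[(i + 1) % length]            # close the ring at the end
    return heap


def ring_length(heap, lead):
    """Recover the encoded number by walking once around the ring."""
    first = heap[lead]
    count, node = 1, heap[first]
    while node != first:
        node = heap[node]
        count += 1
    return count


# A hypothetical multiset fragment, one ring per number.
S = [2, 3, 4]
h_S = {}
for i, s in enumerate(S, start=1):
    h_S.update(ring_heap(f"a{i}", s))
```

Each ring of length $\ell$ occupies $\ell + 1$ cells (the leading cell plus the ring), so the heap stays polynomial in the unary size of the instance.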
Corollary 6.9. (a) In the case of the memory-consuming rules, the model checking problem is NP-complete (even if the arity of the predicates involved is at most 3).
(b) For the memory-consuming rules with constructively valued variables, model checking is still NP-complete (even if the arity of the predicates involved is at most 3).
Proof. This follows from Sections 6.1 and 6.2.
6.3 NP-Hardness for MEM+DET
The challenge - to simulate intrinsically non-deterministic 3-SAT by deterministic memory-consuming rules, while, moreover, keeping the arity of predicates fixed - is solved using generalised versions of the linked list segments inductively defined in Example 5.3.
Namely, within a list fragment leading from $x$ to $y$, by means of $RLS(x, u, y)$, defined below, we will keep the information about
Here we abbreviate $X = x_0, x_1, .., x_n$, $Q = q_0, q_1, .., q_\ell$, $\Xi = \xi_0, \xi_1, .., \xi_m$, $\Pi_i = \{x_j \ne q \mid j \ne i,\ q \in Q\} \cup \{z \ne z' \mid z, z' \in Q \cup \Xi\}$.
The final "empty configuration" is generated by the "backward" rule (recall that $\hat{\xi}_0$ is the blank symbol, $\hat{\xi}_1$ and $\hat{\xi}_2$ are end markers):
$\Pi_0,\ x_0 = q_0,\ x_1 = \xi_1,\ x_2 = x_3 = \cdots = x_{n-1} = \xi_0,\ x_n = \xi_2 : emp \Rightarrow T(X, Q, \Xi)$
An instruction "if in state $\hat{q}_k$ looking at $\hat{\xi}_s$, replace it by $\hat{\xi}_{s'}$, move to the right, and go into $\hat{q}_{k'}$", is simulated by $n$ rules (here $0 \le i < n$):
$\Pi_i,\ x_i = q_k,\ x_{i+1} = \xi_s,\ y_i = \xi_{s'},\ y_{i+1} = q_{k'} : T(x_0, x_1, .., x_{i-1}, y_i, y_{i+1}, x_{i+2}, .., x_n, Q, \Xi) \Rightarrow T(X, Q, \Xi)$
An instruction "if in state $\hat{q}_k$ looking at $\hat{\xi}_s$, replace it by $\hat{\xi}_{s'}$, move to the left, and go into $\hat{q}_{k'}$", is simulated in a similar "backward" way.
An alternating instruction "if in state $\hat{q}_k$, run two copies of the configuration in parallel but with states $\hat{q}_{k'}$ and $\hat{q}_{k''}$, resp.", is simulated by ($0 \le i < n$):
$\Pi_i,\ x_i = q_k,\ y_i = q_{k'},\ z_i = q_{k''} : T(x_0, .., x_{i-1}, y_i, x_{i+1}, .., x_n, Q, \Xi) * T(x_0, .., x_{i-1}, z_i, x_{i+1}, .., x_n, Q, \Xi) \Rightarrow T(X, Q, \Xi)$
Figure 3. Simulating a Turing machine $M$ running in space $n$, in a backward manner - "from outputs to inputs"
A linked list $RLS(x, u, y)$ is formed by attaching a new tail:
$x = y : emp \Rightarrow RLS(x, x, y)$
$\exists u.\ x \ne z : RLS(x, u, y) * y \mapsto (y, z) \Rightarrow RLS(x, y, z)$
Notice that the rules for $RLS(x, u, y)$ are both memory-consuming and deterministic. The fact that $u$ is not constructively valued is a key ingredient in our reduction, which allows us to cope with the non-deterministic problem 3-SAT.
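Definition 6.10. By means of the following heaplets $h^{(0)}_{ab}$ and $h^{(1)}_{ab}$ (here $a \ne b$) we represent the truth values, "false" and "true", resp.:
$$h^{(\delta)}_{ab} = \begin{cases} a \mapsto (a, b), & \text{for } \delta = 0, \\ a \mapsto (a, b) * b \mapsto (b, b), & \text{for } \delta = 1. \end{cases}$$
Lemma 6.11. Assuming $a \ne b$, let $h^{(\delta)}_{ab} \models_{RLS} RLS(a, c, b)$ for some $c$. Then:
$$\big((\delta = 0) \wedge (a = c \ne b)\big) \vee \big((\delta = 1) \wedge (a \ne c = b)\big).$$
Proof. $h^{(0)}_{ab} \models_{RLS} RLS(a, a, b)$, and $h^{(1)}_{ab} \models_{RLS} RLS(a, b, b)$.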
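Lemma 6.11 can be checked mechanically on small instances. The following sketch decides $RLS(x, u, y)$ on heaps of two-field cells by trying both rules; the existential variable of the recursive rule is searched over the addresses of the remaining heap together with the first argument, which covers both ways the inner $RLS$ can be derived. The names `sat_rls` and `truth_heaplet` and the dictionary encoding are hypothetical.

```python
def sat_rls(heap, x, m, z):
    """Decide whether `heap` satisfies RLS(x, m, z) exactly."""
    # Rule 1: x = y : emp => RLS(x, x, y), so all three arguments coincide.
    if not heap and x == m == z:
        return True
    # Rule 2: exists u. x != z : RLS(x, u, y) * y |-> (y, z) => RLS(x, y, z),
    # instantiated with y = m.
    if x != z and heap.get(m) == (m, z):
        rest = {k: v for k, v in heap.items() if k != m}
        candidates = set(rest) | {x}        # u is x (rule 1) or a remaining cell
        return any(sat_rls(rest, x, u, m) for u in candidates)
    return False


def truth_heaplet(a, b, delta):
    """The heaplet h^(delta)_ab of Definition 6.10."""
    return {a: (a, b)} if delta == 0 else {a: (a, b), b: (b, b)}


false_witnesses = [c for c in ("a", "b")
                   if sat_rls(truth_heaplet("a", "b", 0), "a", c, "b")]
true_witnesses = [c for c in ("a", "b")
                  if sat_rls(truth_heaplet("a", "b", 1), "a", c, "b")]
```

The middle argument is forced to be `a` on the "false" heaplet and `b` on the "true" one, so the value of $c$ records the truth value even though it is not constructively valued.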
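Let $\varphi \equiv (\hat{C}_1 \wedge \cdots \wedge \hat{C}_m)$ be a formula of $m$ clauses over linearly ordered $n$ propositional variables $p_1, p_2, .., p_n$, and each $\hat{C}_j$ is of the form (here, for any $q$, we denote $q^1 = q$ and $q^0 = \neg q$):
$$\hat{C}_j(q_1, q_2, q_3) \equiv (q_1^{\varepsilon_{j,1}} \vee q_2^{\varepsilon_{j,2}} \vee q_3^{\varepsilon_{j,3}})$$
Each $\hat{C}_j$ is encoded by a predicate $C_j(\alpha_1, \gamma_1, \alpha_2, \gamma_2, \alpha_3, \gamma_3)$ with the following rule $\xi_j$ (for the sake of readability, we squeeze three deterministic rules into one but with disjunction):
$$\big((\alpha_1 \ne \gamma_1)^{\varepsilon_{j,1}} \vee (\alpha_2 \ne \gamma_2)^{\varepsilon_{j,2}} \vee (\alpha_3 \ne \gamma_3)^{\varepsilon_{j,3}}\big) \wedge \bigwedge_{k \ne \ell} (\alpha_k \ne \alpha_\ell) : emp \Rightarrow C_j(\alpha_1, \gamma_1, \alpha_2, \gamma_2, \alpha_3, \gamma_3)$$
Example 6.12. Here we represent $p_i$ as "$x_i \ne u_i$". So satisfiability of $\hat{C}(p_1, p_2, p_3)$ of the form $(p_1 \vee \neg p_2 \vee p_3)$ is reformulated as
$$(x_1 \ne u_1) \vee (x_2 = u_2) \vee (x_3 \ne u_3) : C(x_1, u_1, x_2, u_2, x_3, u_3)$$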
We take the following linearly ordered variables: $W = w_0$, $X = x_1, x_2, .., x_{2n}$, $U = u_1, u_2, .., u_{2n}$, $Y = y_1, y_2, .., y_{2n}$.
The challenge - to maintain the arity fixed and, at the same time, to "keep the $i$-th position" inside the long $X$, $U$, $Y$ - is solved by taking predicates $Q_i(x, u, y)$ with the rules $\kappa_i$:
$x \ne y,\ u = x : emp \Rightarrow Q_i(x, u, y)$, $\quad x \ne y,\ u = y : emp \Rightarrow Q_i(x, u, y)$.
The key points of our reduction are encapsulated in the rule $\omega_{ok}$:
$$\exists X, U, Y, W.\ \Pi(X, Y, W) : w_0 \mapsto w_0 * \mathop{*}_{i=1}^{2n} Q_i(x_i, u_i, y_i) * \mathop{*}_{i=1}^{2n} RLS(x_i, u_i, y_i) * \mathop{*}_{j=1}^{m} C_j(x_{i_{j,1}}, u_{i_{j,1}}, x_{i_{j,2}}, u_{i_{j,2}}, x_{i_{j,3}}, u_{i_{j,3}}) \Rightarrow ok$$
where $\Pi(X, Y, W)$ says $(x \ne y)$ for all distinct variable names $x$ and $y$ mentioned either in $X$ or in $Y$ or in $W$.
Definition 6.13. Define a heap $H_{RLS}$ as a collection of $n$ disjoint heaplets of the form $h^{(0)}_{ab}$, and $n$ disjoint heaplets of the form $h^{(1)}_{ab}$ (we assume $a \ne b$), and a heap $H_\varphi$ as a loop of the form $d_0 \mapsto d_0$.
Lemma 6.14. With the empty input tuple of values,
$$((), H_{RLS} \circ H_\varphi) \in [\![ok]\!]^{\,\omega_{ok} \,\cup\, RLS \,\cup\, \bigcup_{j=1}^{m} \xi_j \,\cup\, \bigcup_{i=1}^{2n} \kappa_i} \qquad (6)$$
if and only if $(\hat{C}_1 \wedge \cdots \wedge \hat{C}_m)$ is satisfiable.
Proof. (Sketch) The $(\Rightarrow)$ direction is the hardest. Suppose that (6) is valid. Then $ok$ is generated by $\omega_{ok}$ with an unfolded inductive tree for $ok$. We can show that, for some values $a_1, a_2, .., a_{2n}$, $c_1, c_2, .., c_{2n}$, $b_1, b_2, .., b_{2n}$, the heap $H_{RLS}$ can be partitioned into $2n$ disjoint heaplets of the form $h^{(\delta_i)}_{a_i b_i}$, so that we get the following: $h^{(\delta_i)}_{a_i b_i} \models_{RLS} RLS(a_i, c_i, b_i)$. Now, using the rules $\xi_j$ and $\kappa_i$ and Lemma 6.11, we can prove the desired satisfiability (see also Example 6.12): $\varphi(\delta_1, \delta_2, \ldots, \delta_{n-1}, \delta_n) = 1$.
The $(\Leftarrow)$ direction follows essentially by reading the above line of reasoning "bottom-up".
Corollary 6.15. For deterministic and memory-consuming rules, model checking is still NP-complete (even if the arity of the predicates involved is at most 6).
6.4 PSPACE- and EXPTIME-Hardness for Non-MEM Rules
Unexpectedly, with non-memory-consuming rules, such as
$\exists \mathbf{z}.\ \Pi : Q_1\mathbf{u}_1 * Q_2\mathbf{u}_2 * \cdots * Q_m\mathbf{u}_m \Rightarrow P\mathbf{x}$,
model checking becomes more complex. Namely:
Theorem 6.16. (a) For inductive rule sets in CV, model checking is EXPTIME-complete.
(b) For inductive rule sets in CV+DET, model checking is PSPACE-hard.
(c) For inductive rule sets in DET, model checking is EXPTIME-complete.
Proof. (Sketch) We prove all three lower bounds by simulating Turing machines in a backward manner - "from outputs to inputs".
Let $M$ be a Turing machine that accepts in space $n$, with states $\hat{q}_0, \hat{q}_1, .., \hat{q}_\ell$, and tape symbols $\hat{\xi}_0, \hat{\xi}_1, .., \hat{\xi}_m$. Here $\hat{q}_1$ is the initial state, $\hat{q}_0$ is an accept state, $\hat{\xi}_0$ is the blank symbol, and $M$ acts in the space $n$ between two unerased markers, say $\hat{\xi}_1$ and $\hat{\xi}_2$.
Let $M$ always jump, and let no instruction of $M$ start with $\hat{q}_0$.
By $(\eta_1, \eta_2, .., \eta_{i-1}, q_k, \eta_i, .., \eta_n)$ we formalize that "in state $\hat{q}_k$, $M$ scans the $i$-th square, when $\hat{\eta}_1, \hat{\eta}_2, .., \hat{\eta}_{i-1}, \hat{\eta}_i, .., \hat{\eta}_n$ is printed on its tape". We encode $M$ by means of the rules $\Phi_M$ given in Figure 3, where the "tape" predicate $T$ depicts $M$'s configurations:
$$T(\underbrace{x_0, x_1, .., x_n}_{\text{configuration}},\ \underbrace{q_0, q_1, .., q_\ell}_{\text{states}},\ \underbrace{\xi_0, \xi_1, .., \xi_m}_{\text{tape symbols}})$$
Lemma 6.17. Let $M$ be a deterministic TM or an alternating TM [15]. Then $s, e \models_{\Phi_M} T(q_1, \xi_1, \xi_0, \xi_0, \xi_0, .., \xi_0, \xi_2, Q, \Xi)$ if and only if $M$ can go from the initial "empty configuration" to the final
All variables in $\Phi_M$ occur in $X \cup Q \cup \Xi$, hence, they are constructively valued, which provides item (a) in Theorem 6.16.
Whenever $M$ is deterministic, $\Phi_M$ is deterministic, which provides item (b) in Theorem 6.16.
To answer the challenging (c), $M$'s non-deterministic instruction "if in state $\hat{q}_k$ looking at $\hat{\xi}_s$, go either into $\hat{q}_{k_1}$ or into $\hat{q}_{k_2}$" is simulated by the deterministic rules (here $0 \le i < n$, $\ell = 1, 2$):
$\exists y.\ \Pi_i,\ x_i = q_k,\ x_{i+1} = \xi_s,\ y \notin \Xi,\ y_{i+1} = \xi_{d(s,k)} : T(x_0, .., x_{i-1}, y, y_{i+1}, x_{i+2}, .., x_n, Q, \Xi) \Rightarrow T(X, Q, \Xi)$,
$\Pi_i,\ x_i = q_{d'(s,k,k_\ell)},\ x_{i+1} = \xi_{d(s,k)},\ y_i = q_{k_\ell},\ y_{i+1} = \xi_s : T(x_0, .., x_{i-1}, y_i, y_{i+1}, x_{i+2}, .., x_n, Q, \Xi) \Rightarrow T(X, Q, \Xi)$
where, however, $y$ is not constructively valued.
The idea behind the encoding is as follows. First, $M$ goes non-deterministically into any state $y$, recording the situation by a special 'double' $\hat{\xi}_{d(s,k)}$. To continue a computation, the lucky guess should be one of our specially introduced states $\hat{q}_{d'(s,k,k_1)}$ and $\hat{q}_{d'(s,k,k_2)}$. As a result, $M$ finishes either in $\hat{q}_{k_1}$ or in $\hat{q}_{k_2}$.
Remark 6.18. The above EXPTIME-hardness necessarily employs predicates of unbounded arity (cf. Remark 5.1).
6.5 Polynomial Time for MEM+CV+DET
We next show that the model checking problem is in PTIME whenever $\Phi$ is in MEM+CV+DET. Essentially, predicates in CV+DET make a top-down procedure fully deterministic; and, the size of any possible proof is linear in the size of the heap, if $\Phi$ is in MEM.
Where $h$ is a heap and $v$ is a value such that $v \in dom(h)$, we write $h \mathbin{\dot{-}} v$ to denote the heap $h'$ that has domain $dom(h) \setminus \{v\}$ and agrees with $h$ on its domain. This operation is lifted in the obvious way to sets of values, i.e., $h \mathbin{\dot{-}} V$ where $V \subseteq dom(h)$.
Definition 6.19. The constructs, h Φ A h0 is called a
reduction, wheresis a stack,h, h0are heaps andAis a symbolic
heap with inductive predicate occurrences defined inΦ. We say that
the above reduction isvalidifh0⊆hands, h−. dom(h0)|=ΦA.
Figure 4 presents a proof system for reductions. A proof is, as usual, a tree whose leaves are labelled by axioms and internal nodes are labelled by inference rules accordingly. We say that a reduction
Ris provable if there exists a proof whose root is labelled byR.
Lemma 6.20. (Soundness) For any set of rulesΦ, formulaA, stack
sand heaph, ifs, hΦA h0is provable then it is valid.
Proof. Follows by induction over the structure of the proof.
Letφ(Y) =¯ def (Sjϕ1,j(Y), . . . ,
S
jϕn,j(Y))(cf. Def. 3.3).
Set φ¯0 = (∅, . . . ,∅)
and φ¯α+1 = ¯φ( ¯φα)
for any ordinal α.
Clearly, (a, h) ∈ JPiK
Φ
iff there is an ordinal α such that
(a, h)∈proji( ¯φ
α
). Next, we writes, h|=αΦΠ :Fif it is the case
thats, h|=ΦΠ :Fand for everyPitinF,(s(t), h0)∈proji( ¯φ
α
)
for the appropriate subheaph0 ⊆ h. We extend this to quantified
formulas using the same ordinal, and to valid reductions in the
obvi-ous manner. Finally, we say thats, hΦA h0isα-supported
ifαis theleast ordinalsuch thats, h−. dom(h0)|=αΦA.
LetRbe the set of valid, constructively valued reductions. We
define an ordering≺overR. LetRi ≡si, hiΦ Ai h0ibe
anαi-supported reduction inR, fori∈ {1,2}. ThenR1≺R2iff
either: (a)α1< α2; or (b)α1=α2and
1.A1≡Π1:FandA2≡Π2:FandΠ1⊂Π2, or,
2.A1≡F andA2≡σ∗Ffor some atomicσ, or,
3.A1≡ ∃x. BandA2≡ ∃y. Bandx⊂y.
The ordering≺is easily seen to be well-founded.
Lemma 6.21. (Completeness) For any set of rulesΦand formula
Athat are inCV, and any stacksand heaph, ifs, hΦA h0
is valid then it is provable.
Proof.We proceed by well-founded induction over(R,≺), i.e., we
show that if allR0∈ Rsuch thatR0 ≺Rare provable, then so is
R∈ R.
LetR ∈ Rbe the reduction s, h Φ ∃v. A h0. AsR
is constructively valued, there must be some variablex∈ vsuch
that (i) there is a free variableysuch thatx =yis a subformula
of A, or, (ii) xappears in the right-hand side of a subformula
y7→ tofA. Letz=defv\ {x}. AsRis valid we haveh0 ⊆h
ands, h−. dom(h0)|=α
Φ∃v. A, thus there is a stacks
0
such that
s0(F V(∃z. A)) =s(F V(∃z. A))ands0, h−.dom(h0)|=α
Φ∃z. A.
If clause (i) is true then clearlys0(x) = s0(y)by the
seman-tics of=, ands0(y) = s(y) asy is free. Thus without loss of
generality we can sets0 =s[x 7→s(y)]. Therefore the reduction
R0≡s[x7→s(y)], hΦ∃z. A h0is valid and constructively
valued. It is easy to see thatR0must beα-supported (otherwise we
can derive a contradiction with the assumption thatαis the least
such ordinal), thusR0 ≺Rby clause (2c) of the definition of≺.
By the inductive hypothesis,R0is provable and the rule (∃=) can
be applied, thus provingR. Clause (ii) is similar and uses (∃7→).
Next, letR∈ Rbe of the forms, hΦx=y,Π :F h0,
thus,h0 ⊆ h and s, h−. dom(h0) |=αΦ x = y,Π : F. It is
easy to see that the reductionR0 ≡ s, h Φ Π : F h0
is valid, constructively valued andα-supported. By clause (2a) of
the definition of≺it follows thatR0 ≺R. Thus by the inductive
hypothesis,R0is provable and the rule (=) applies, thereforeRis
also provable. Disequalities are treated similarly via the rule (6=).
Now, letR∈ Rbe of the forms, hΦ σ∗F h0. Thus,
h0 ⊆hands, h−. dom(h0) |=α
Φ σ∗F. Therefore there are two
disjoint heapshσ, hF such thath−. dom(h0) = hσ ◦hF, and
s, hσ |=αΦσ σ and s, hF |=
αF
Φ F, whereα = max{ασ, αF}.
Thush = hσ ◦hF ◦h0, ands, h−. dom(hF ◦h0) |=αΦσ σ
and s, hF ◦h0 −. dom(h0) |=αΦF F. Therefore the reductions
Rσ≡s, hΦσ hF◦h0andRF ≡s, hF ◦h0ΦF h0
are both valid and constructively valued. In addition,Rσ, RF ≺R:
eitherασ < α(resp.,αF < α) where clause (1) of the definition
of≺applies, orασ =α(αF =α) and clause (2b) applies. By the
inductive hypothesis,RσandRFare provable, and so isRvia (∗).
Finally, supposeR ∈ Rhas the forms, hΦ Pit h0,
meaning thath0 ⊆hands, h−. dom(h0) |=α
Φ Pit. By Def. 3.3,
this means that(s(t), h−. dom(h0))∈JPiK
Φwhich in turn means
that there is a rule ∃v.Π : F ⇒ Pix in Φ and some stack
s0 such that s0, h−. dom(h0) |=αΦ0 Π : F, and s
0
(x) = s(t)
(equality of tuples), andα0 < α. Trivially,s0, h−. dom(h0) |=αΦ0
∃v.Π : F. Thus the queryR0 ≡ s0, h Φ ∃v.Π : F h0
is valid, constructively valued andα0-supported. By the inductive
hypothesis,R0is provable and therefore so isRby applying (Pi).
The cases for7→,empeasily treated with rules (7→), (emp).
We recall the notion of precision: a formulaAis precise iff for
every stacksand heaph, there is at most oneh0 ⊆ hsuch that
s, h0 |=Φ A. Precision entails that ifs, hΦ A h0is valid
then there is no otherh00 6= h0 such thats, hΦ A h00is
valid. Thus precision allows the deterministic application of (∗).
Lemma 6.22. Let Φbe a set of rules in CV+DET. Then any
formula inCVusing predicates defined inΦis precise.
Proof.Observe the following points: ifΣandΣ0are precise then
so isΣ∗Σ0; ifΣis precise then so isΠ : Σ, for anyΠ; ifΠ : Σis
precise then so isA≡ ∃v.(Π : Σ), provided all variables invare
constructively valued inA. Finally, note that the problem reduces
to guaranteeing that every formula of the formPitis precise.
Thus we need to show that for every tuple of values a and
heaphthere is at most oneh0 ⊆ hsuch that(a, h0) ∈ JPiK
Φ.
This follows by fixpoint induction. Suppose there are valuesaand
s(x) =s(y) s, hΦΠ :F h0 (=) s, hΦx=y,Π :F h0 s(x)6=s(y) s, hΦΠ :F h0 (6=) s, hΦx6=y,Π :F h0 s(x)∈dom(h) h(s(x)) =s(t) (7→) s, hΦx7→t h−. s(x) s, hΦσ h0 s, h0ΦF h00 (∗) s, hΦσ∗F h00 x∈v y /∈v s[x7→s(y)], hΦ∃(v\ {x}).(x=y,Π :F) h0 (∃=) s, hΦ∃v.(x=y,Π :F) h0
x∈v y /∈v s(y)∈dom(h) ∃i. ti≡x s[x7→h(s(y))i], hΦ∃(v\ {x}).(Π :F∗y7→t) h0
(∃7→) s, hΦ∃v.(Π :F∗y7→t) h0 (emp) s, hΦemp h (∃v.(Π :F)⇒Pix)∈Φ s, e|=∃v.(Π :emp) s[x7→s(t)], hΦ∃v.(Π :F) h0 (Pi) s, hΦPit h0
Figure 4. Proof rules for reductions. The formulaσin the rule (∗) is atomic.
S
jϕi,j(Y). As the rules are precise there must bek 6= lsuch
that(a, h1) ∈ ϕi,k(Y)and(a, h2) ∈ ϕi,l(Y). This, however,
directly contradicts determinism.
Finally, we establish that proof search is a deterministic, terminating decision procedure when all predicates are in CV+DET; and, its runtime is naturally bounded by a polynomial in the input size, if in addition, all rules are in MEM.
Theorem 6.23. Let $A$ be a formula in CV and $\Phi$ a set of rules in MEM+CV+DET. Then, for any stack $s$ and heap $h$, checking whether $s, h \models_\Phi A$ can be performed in polynomial time.
Proof. First, note thats, h|=ΦAiffs, hΦA eis provable.
Observe that the structure of a given reduction dictates that there is at most one applicable rule from Fig. 4. Rules about quantifiers, ($\exists{=}$) and ($\exists{\mapsto}$), form an exception but the order of their applications, as well as the choice of quantified variable to eliminate next, is immaterial to the provability of a CV reduction, thus a fixed order can be used.
As $\Phi$ is in DET, ($P_i$) can be used with at most one rule and this rule can be found in polytime by evaluating the side condition of ($P_i$) for all rules. This means no back-tracking is required over $\Phi$.
The rule ($*$) resembles a cut in that the intermediate heap $h'$ does not appear in the conclusion. This can be seen as a source of non-determinism as many choices for $h'$ may have to be checked (due to the fact that $h' \subseteq h$). However, as all formulas involved are precise (Lemma 6.22) if there is such a heap $h'$ it is unique. In addition, observe that only axioms impose constraints on the RHS heap. Thus, we avoid back-tracking by using meta-variables for the heap $h'$ and order the proof search to first prove the left-hand subgoal of ($*$). If the left subgoal of ($*$) is proven, then $h'$ is instantiated by axioms and the search continues in the right subgoal of ($*$). Otherwise, the goal reduction is clearly invalid.
These observations together guarantee that the proof search is fully deterministic.
Now, for MEM rules each application of ($P_i$) leads to at least one subgoal that requires ($\mapsto$), and there cannot be more instances of ($\mapsto$) in a proof than the size of the heap $h$ in the root reduction. Thus the size of the proof is $O(\|h\|)$, and as the search is deterministic, runtime is linear as well.
Remark 6.24. Deciding intuitionistic queries (cf. Remark 3.12) in MEM+CV+DET can be done in PTIME. This follows easily by noting that the proof search described in Theorem 6.23 actually computes a heap for the RHS of any reduction. Given the precision of the formulas involved (Lemma 6.22), this means that we can answer correctly intuitionistic queries using the same algorithm.
7. Implementation and Evaluation
We implemented the general model checking algorithm described in Section 3 as well as the CV+DET algorithm described in Section 6.5, in about 1400 lines of OCaml code. Our implementation is part of the CYCLIST theorem proving framework [13] which provides support for inductive definitions, and in particular for our logic $\mathrm{SL}^{\mathrm{SH}}_{\mathrm{ID}}$. Both model checking algorithms are parameterised over the datatypes for heap locations and ground values (e.g. integers, booleans, strings, etc.) and thus may be instantiated to handle models where heap locations have arbitrary representations (e.g. hex strings) and heap cells contain arbitrary data. We employed a number of techniques to improve the efficiency of the implementation, including pre-computing the models for points-to subformulas, using hashsets to store submodels, and using bit vectors to represent heaps. We also implemented the intuitionistic version of our algorithms as per Remark 3.12 and Remark 6.24. The code and test suite for our tool are available online [1].
We tested the performance of our implementation across a range of 'typical' predicate definitions gathered from the verification community, and a number of hand-crafted definitions designed to elicit the worst-case, exponential performance. We extracted models from a number of example programs at runtime using an extension of GDB which supports scripting using Python [2]. All tests were carried out on a 2.93GHz Intel Core i7-870 processor with 8GB RAM.
We note that all tests carried out were on positive instances. This was decided for two reasons. First, the worst-case performance can be exhibited with positive instances as shown below. Second, when using runtime checks, for instance in code contracts or offline test suites, negative instances usually lead to program termination because they indicate that some pre- or postcondition, or invariant is no longer satisfied. Thus the runtime on positive instances is a much more important measure of performance.
'Typical' Performance. Testing our implementation against typical, real-world data requires sourcing programs annotated with separation logic assertions. We identified 6 programs from the suite of examples in the Verifast distribution [24] containing non-trivial inductive predicates which translate into our assertion language:
(i) stack.c: a stack data-structure implemented using linked lists.
(ii) queue.c: a lock-free concurrent queue based on list segments.
(iii) set.c: a concurrent set data-structure based on linked lists.
(iv) schorr-waite.c: an implementation of the Schorr-Waite graph marking algorithm over binary trees.
(v) iter.c: a list data-structure with an iterator pointing into the list.
(vi) composite.c: an example of the composite design pattern, where each node of a tree must maintain local data consistent with a global property.