A Verified SAT Solver Framework with Learn, Forget, Restart, and Incrementality

(1)

HAL Id: hal-01904579

https://hal.inria.fr/hal-01904579

Submitted on 25 Oct 2018

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

A Verified SAT Solver Framework with Learn, Forget,

Restart, and Incrementality

Jasmin Christian Blanchette, Mathias Fleury, Peter Lammich, Christoph

Weidenbach

To cite this version:

Jasmin Christian Blanchette, Mathias Fleury, Peter Lammich, Christoph Weidenbach. A Verified SAT

Solver Framework with Learn, Forget, Restart, and Incrementality. Journal of Automated Reasoning,

Springer Verlag, 2018, 61 (1-4), pp.333-365. �10.1007/s10817-018-9455-7�. �hal-01904579�

(2)

J. Autom. Reasoning manuscript No. (will be inserted by the editor)

A Verified SAT Solver Framework with

Learn, Forget, Restart, and Incrementality

Jasmin Christian Blanchette · Mathias Fleury · Peter Lammich · Christoph Weidenbach

Received: date / Accepted: date

Abstract We developed a formal framework for CDCL (conflict-driven clause learning) us-ing the Isabelle/HOL proof assistant. Through a chain of refinements, an abstract CDCL cal-culus is connected first to a more concrete calcal-culus, then to a SAT solver expressed in a func-tional programming language, and finally to a SAT solver in an imperative language, with total correctness guarantees. The framework offers a convenient way to prove metatheorems and experiment with variants, including the DPLL (Davis–Putnam–Logemann–Loveland) calculus. The imperative program relies on the two-watched-literal data structure and other optimizations found in modern solvers. We used Isabelle’s Refinement Framework to auto-mate the most tedious refinement steps. The most noteworthy aspects of our work are the inclusion of rules for forget, restart, and incremental solving and the application of stepwise refinement.

Keywords SAT solvers · CDCL · DPLL · Proof assistants · Isabelle/HOL

1 Introduction

Researchers in automated reasoning spend a substantial portion of their work time develop-ing logical calculi and provdevelop-ing metatheorems about them. These proofs are typically carried out with pen and paper, which is error-prone and can be tedious. Today’s proof assistants

Jasmin Christian Blanchette

Vrije Universiteit Amsterdam, Department of Computer Science, Section of Theoretical Computer Science, De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands

E-mail: [email protected]

Jasmin Christian Blanchette · Mathias Fleury · Christoph Weidenbach

Max-Planck-Institut für Informatik, Saarland Informatics Campus E1 4, 66123 Saarbrücken, Germany E-mail: {jasmin.blanchette,mathias.fleury,christoph.weidenbach}@mpi-inf.mpg.de

Mathias Fleury

Saarbrücken Graduate School of Computer Science, Universität des Saarlandes, Saarland Informatics Campus E1 3, 66123 Saarbrücken, Germany

E-mail: [email protected] Peter Lammich

Institut für Informatik, Technische Universität München, Boltzmannstraße 3, Garching, Germany E-mail: [email protected]

(3)

are easier to use than their predecessors and can help reduce the amount of tedious work, so it makes sense to use them for this kind of research.

In this spirit, we started an effort, called IsaFoL (Isabelle Formalization of Logic) [4], that aims at developing libraries and a methodology for formalizing modern research in the field, using the Isabelle/HOL proof assistant [45, 46]. Our initial emphasis is on established results about propositional and first-order logic. In particular, we are formalizing large parts of Weidenbach’s forthcoming textbook, tentatively called Automated Reasoning—The Art of Generic Problem Solving. Our inspiration for formalizing logic is the IsaFoR (Isabelle Formalization of Rewriting) project [55], which focuses on term rewriting.

The objective of formalization work is not to eliminate paper proofs, but to complement them with rich formal companions. Formalizations help catch mistakes, whether superficial or deep, in specifications and theorems; they make it easy to experiment with changes or variants of concepts; and they help clarify concepts left vague on paper.

This article presents our formalization of CDCL (conflict-driven clause learning) based on Automated Reasoning, derived as a refinement of Nieuwenhuis, Oliveras, and Tinelli’s abstract presentation of CDCL [43]. It is the algorithm implemented in modern propo-sitional satisfiability (SAT) solvers. We start with a family of formalized abstract DPLL (Davis–Putnam–Logemann–Loveland) [17] and CDCL [3, 6, 40, 42] transition systems from Nieuwenhuis et al. (Section 3). Some of the calculi include rules for learning and forgetting clauses and for restarting the search. All calculi are proved sound and complete, as well as terminating under a reasonable strategy.

The abstract CDCL calculus is refined into the more concrete calculus presented in Auto-mated Reasoningand recently published [57] (Section 4). The latter specifies a criterion for learning clauses representing first unique implication points [6, Chapter 3], with the guaran-tee that learned clauses are not redundant and hence derived at most once. The correctness results (soundness, completeness, termination) are inherited from the abstract calculus. The calculus also supports incremental solving.

The concrete calculus is refined further to obtain a verified, but very naive, functional program extracted using Isabelle’s code generator (Section 5). The final refinement step derives an imperative SAT solver implementation with efficient data structures, including the well-known two-watched-literal optimization (Section 6).

Our work is related to other verifications of SAT solvers, which largely aimed at in-creasing their trustworthiness (Section 7). This goal has lost some of its significance with the emergence of formats for certificates that are easy to generate, even in highly optimized solvers, and that can be processed efficiently by verified checkers [16, 33]. In contrast, our focus is on formalizing the metatheory of CDCL, with the following objectives:

– Develop a basic library of formalized results and a methodology aimed at researchers who want to experiment with calculi.

– Study and connect the members of the CDCL family, including newer extensions. – Check the proofs in Automated Reasoning and provide a formal companion to the book. – Assess the suitability of Isabelle/HOL for formalizing logical calculi.

Compared with the other verified SAT solvers, the most noteworthy features of our frame-work are the inclusion of rules for forget, restart, and incremental solving and the applica-tion of stepwise refinement [59] to transfer results. The framework is available as part of the IsaFoL repository [20].

Any formalization effort is a case study in the use of a proof assistant. We depended heavily on the following features of Isabelle:

(4)

– Isar [58] is a textual proof format inspired by the pioneering Mizar system [41]. It makes it possible to write structured, readable proofs—a requisite for any formalization that aims at clarifying an informal proof.

– Sledgehammer [8, 48] integrates superposition provers and SMT (satisfiability modulo theories) solvers in Isabelle to discharge proof obligations. The SMT solvers, and one of the superposition provers [56], are built around a SAT solver, resulting in a situation where SAT solvers are employed to prove their own metatheory.

– Locales [2, 25] parameterize theories over operations and assumptions, encouraging a modular style. They are useful to express hierarchies of concepts and to reduce the num-ber of parameters and assumptions that must be threaded through a formal development. – The Refinement Framework [30] can be used to express refinements from abstract data structures and algorithms to concrete, optimized implementations. This allows us to reason about simple algebraic objects and yet obtain efficient programs. The Sepref tool [31] builds on the Refinement Framework to derive an imperative program, which can be extracted to Standard ML and other programming languages. For example, Isa-belle’s algebraic lists can be refined to mutable arrays in ML.

An earlier version of this work was presented at IJCAR 2016 [10]. This article extends the conference paper with a description of the refinement to an imperative implementation (Sections 2.4 and 6) and of the formalization of Weidenbach’s DPLL calculus (Section 4.1). To make the paper more accessible, we expanded the background material about Sledge-hammer (Section 2.1) and Isar (Section 2.2).

2 Isabelle/HOL

Isabelle [45,46] is a generic proof assistant that supports several object logics. The metalogic is an intuitionistic fragment of higher-order logic (HOL) [15]. The types are built from type variables 0a,0b, ... and n-ary type constructors, normally written in postfix notation (e.g,0a list). The infix type constructor 0a ⇒0bis interpreted as the (total) function space from0ato0b. Function applications are written in a curried style without parentheses (e.g.,

f x y). Anonymous functions x 7→ tx are written λx. tx. The notation t :: τ indicates that

term t has type τ. Propositions are terms of type prop, a type with at least two values. Symbols belonging to the signature (e.g.,f) are uniformly called constants, even if they are functions or predicates. No syntactic distinction is enforced between terms and formulas. The metalogical operators are universal quantificationV

:: (0a ⇒ prop) ⇒ prop, implication =⇒ :: prop ⇒ prop ⇒ prop, and equality ≡ :: 0a ⇒0a ⇒ prop. The notationVx. p

xabbreviates V(λx. p

x) and similarly for other binder notations.

Isabelle/HOL is the instantiation of Isabelle with HOL, an object logic for classical HOL extended with rank-1 (top-level) polymorphism and Haskell-style type classes. It ax-iomatizes a type bool of Booleans as well as its own set of logical symbols (∀, ∃,False,

True, ¬, ∧, ∨, −→, ← →, =). The object logic is embedded in the metalogic via a constant

Trueprop:: bool ⇒ prop, which is normally not printed. In practice, the distinction between the two logical levels is important operationally but not semantically.

Isabelle adheres to the tradition that started in the 1970s by the LCF system [22]: All inferences are derived by a small trusted kernel; types and functions are defined rather than axiomatized to guard against inconsistencies. High-level specification mechanisms let us define important classes of types and functions, notably inductive datatypes, inductive

(5)

pred-icates, and recursive functions. Internally, the system synthesizes appropriate low-level def-initions and derives the user specifications via primitive inferences.

Isabelle developments are organized as collections of theory files that build on one an-other. Each file consists of definitions, lemmas, and proofs expressed in Isar [58], Isabelle’s input language. Isar proofs are expressed either as a sequence of tactics that manipulate the proof state directly or in a declarative, natural-deduction format inspired by Mizar. Our formalization almost exclusively employs the more readable declarative style.

2.1 Sledgehammer

The Sledgehammer subsystem [8,48] integrates automatic theorem provers in Isabelle/HOL, including CVC4, E, LEO-II, Satallax, SPASS, Vampire, veriT, and Z3. Upon invocation, it heuristically selects relevant lemmas from the thousands available in loaded libraries, translates them along with the current proof obligation to SMT-LIB or TPTP, and invokes the automatic provers. In case of success, the machine-generated proof is translated to an Isar proof that can be inserted into the formal development, so that the external provers do not need to be trusted.

Sledgehammer is part of most Isabelle users’ workflow, and we invoke it dozens of times a day (according to the log files it produces). For example, while formalizing some results that depend on multisets, we found ourselves needing the basic property

lemma |A| + |B| = |A ∪ B| + |A ∩ B|

where A and B are finite multisets, ∪ denotes union defined such that for each element x, the multiplicity of x in A ∪ B is the maximum of the multiplicities of x in A and B, ∩ denotes intersection, and | | denotes cardinality. This lemma was not available in Isabelle’s underdeveloped multiset library, so we invoked Sledgehammer. Within 30 seconds, the tool came back with a brief proof text invoking a suitable tactic with a list of ten lemmas from the library, which we could insert into our formalization:

by (metis (no_types) Multiset.diff_right_commute add.assoc add_left_cancel monoid_add_class.add.right_neutral multiset_inter_commute multiset_inter_def size_union sup_commute sup_empty sup_multiset_def)

The generated proof refers to 10 library lemmas by name and applies the metis search tactic.

2.2 Isar

Without Sledgehammer, proving the above property could easily have taken 5 to 15 minutes. A manual proof, expressed in Isar’s declarative style, might look like this:

proof –

have |A| + |B| = |A + B| by auto

also have A ] B = (A ∪ B) ] (A ∩ B) unfolding multiset_eq_iff proof clarify

fix a

have count(A ] B) a =countA a +countB a by simp

moreover have count(A ∪ B ] A ∩ B) a =count(A ∪ B) a +count(A ∩ B) a by simp

(6)

moreover have count(A ∩ B) a =min(countA a) (countB a) by auto ultimately showcount(A ] B) a =count(A ∪ B ] A ∩ B) a by auto qed

ultimately show |A| + |B| = |A ∪ B| + |A ∩ B| by simp qed

The count function returns the multiplicity of an element in a multiset. The ] operator denotes the disjoint union operation—for each element, it computes the sum of the multi-plicities in the operands (as opposed to the maximum for ∪).

In Isar proofs, intermediate properties are introduced using have and proved using a tac-tic such as simp and auto. Proof blocks (proof . . . end) can be nested. The advantage of Isar proofs over one-line metis proofs is that we can follow and understand the steps. However, for lemmas about multisets and other background theories, we are usually content if we can get a proof automatic and carry on with formalizing the more interesting foreground theory.

2.3 Locales

Isabelle locales are a convenient mechanism for structuring large proofs. A locale fixes types, constants, and assumptions within a specified scope. A schematic example follows:

locale X = fixesc:: τ0_a assumes A0a,c begin hbodyi end

The definition of locale X implicitly fixes a type0a, explicitly fixes a constantc whose type τ0_amay depend on0a, and states an assumption A0_a,c:: prop over0aandc. Definitions made within the locale may depend on0aandc, and lemmas proved within the locale may additionally depend on A0_a,c. A single locale can introduce several types, constants, and as-sumptions. Seen from the outside, the lemmas proved inXare polymorphic in type variable

0_{a, universally quantified over}_c_{, and conditional on A}₀ a,c.

Locales support inheritance, union, and embedding. To embedYintoX, or makeYa sublocale ofX, we must recast an instance ofY into an instance ofX, by providing, in the context ofY, definitions of the types and constants ofXtogether with proofs of X’s assumptions. The command

sublocale Y⊆Xt

emits the proof obligation Aυ,t, where υ and t :: τυmay depend on types and constants available inY. After the proof, all the lemmas proved inXbecome available inY, with0a andc:: τ0_ainstantiated with υ and t :: τ_υ.

2.4 Refinement Framework

The Refinement Framework [30] provides definitions, lemmas, and tools that assist in the verification of functional and imperative programs via stepwise refinement [59]. The frame-work defines a programming language that is built on top of a nondeterminism monad. A program is a function that returns an object of type0a nres:

(7)

datatype 0a nres = FAIL|RES(0a set)

The Isabelle syntax is similar to that of Standard ML and other typed functional program-ming languages: The type is freely generated by its two constructors,FAIL::0a nresand

RES::0a set ⇒0a nres. The set X inRESXspecifies the possible values that can be returned. The return statement is defined as a constantRETURNx =RES{x} and specifies a single value, whereasRES{n | n> 0} indicates that an unspecified positive number is returned. The simplest program is a semantic specification of the possible outputs, encapsulated in a

RESconstructor. The following example is a nonexecutable specification of the function that subtracts 1 from every element of the list xs (with 0 − 1 defined as 0 on natural numbers):

definition sub1_spec:: nat list ⇒ nat list nres where

sub1_specxs =RETURN(map(λx. x − 1) xs)

Program refinement uses the same source and target language. The refinement relation ≤ is defined byRESX ≤RESY ←→ X ⊆ Y and r ≤FAILfor all r. For example, the concrete pro-gramRETURN2 refines (≤) the abstract programRES{n | n> 0}, meaning that all concrete behaviors are possible in the abstract version. The bottom elementRES{} is an unrefinable program; the top elementFAILrepresents a run-time failure (e.g., a failed assertion) or di-vergence.

Refinement can be used to change the program’s data structures and algorithms, towards a more deterministic and usually more efficient program for which executable code can be generated. We can refine the previous specification to a program that uses a ‘while’ loop:

definition sub1_imp:: nat list ⇒ nat list nres where

sub1_impxs =do {

(i,zs) ←WHILET(λ(i, ys). i < |ys|)

(λ(i, ys). do {

ASSERT(i< |ys|);

let zs =list_updateys i ((ys! i) − 1);

RETURN(i + 1, zs) })

(0, xs);

RETURNzs }

The program relies on the following constructs. The ‘do’ is a Haskell-inspired syntax for ex-pressing monadic computations (here, on the nondeterminism monad). TheWHILET

com-binator takes a condition, a loop body, and a start value. In our example, the loop’s state is a pair of the form (i, ys). TheTsubscript in the combinator’s name indicates that the loop

must not diverge. Totality is necessary for code generation. TheASSERTstatement takes an assertion that must always be true when the statement is executed. Finally, the xs ! i operation returns the (i + 1)st element of xs, andlist_updatexs i yreplaces the (i + 1)st element by y.

To prove the refinement lemmasub1_impxs ≤sub1_specxs, we can use the refine_vcg proof method provided by the Refinement Framework. This method heuristically aligns the statements of the two programs and generates proof obligations, which are passed to the user. If the abstract program has the formRESXorRETURNx, as is the case here, refine_vcg applies Hoare-logic-style rules to generate the verification conditions. For our example, two of the resulting proof obligations correspond to the termination of the ‘while’ loop and the correctness of the assertion. We can use the measure λ(i, ys). |ys| − i to prove termination.

In a refinement step, we can also change the types. For our small program, if we assume that the natural numbers in the list are all nonzero, we can replace them by integers and

(8)

use the subtraction operation on integers (for which 0 − 1 = −1 6= 0). The program remains syntactically identical except for the type annotation:

definition sub1_impint:: int list ⇒ int list nres where

sub1_impintxs = hsame body assub1_impi

We want to establish the following relation: If all elements in xs :: nat list are nonzero and the elements of ys :: int list are positionwise numerically equal to those of xs, then any list of integers returned bysub1_impintysis positionwise numerically equal to some

list returned bysub1_impxs. The framework lets us express preconditions and connections between types using higher-order relations called relators:

(sub1_impint,sub1_imp)

∈ [λxs. ∀i ∈ xs. i 6= 0] hint_of_nat_relilist_rel→ hhint_of_nat_relilist_relinres_rel

The relationint_of_nat_rel:: (int × nat) set relates natural numbers with their integer coun-terparts (e.g., (5 :: int, 5 :: nat) ∈int_of_nat_rel). The syntax of relators mimics that of types; for example, if R is the relation for0a, then hRilist_relis the relation for0a list, and hRinres_relis the relation for0a nres. The ternary relator [p] R → S , for functions0a ⇒0b, lifts the relations R and S for0aand0bunder precondition p.

The Imperative HOL library [14] defines a heap monad that can express imperative programs with side effects. On top of Imperative HOL, a separation logic, with assertion type assn, can be used to express relations0a ⇒0b ⇒ assnbetween plain values, of type0a, and data structures on the heap, of type0b. For example,array_assnR::0a list ⇒0b array ⇒ assn relates lists of0aelements with mutable arrays of0belements, where R ::0a ⇒0b ⇒ assnis used to relate the elements. The relation between the ! operator on lists and its heap-based counterpartArray.nthcan be expressed as follows:

(λ(xs, i).Array.nthxs i), (λ(xs, i).RETURN(xs ! i)) ∈ [λ(xs, i). i < |xs|](array_assnR)k×nat_assnk→ R

The arguments’ relations are annotated withk _{(“keep”) or}d _{(“destroy”) superscripts that}

indicate whether the previous value can still be accessed after the operation has been per-formed. Reading an array leaves it unchanged, whereas updating it destroys the old array.

The Sepref tool automates the transition from the nondeterminism monad to the heap monad. It keeps track of the values that are destroyed and ensures that they are not used later in the program. Given a suitable source program, it can automatically generate the target program and prove the corresponding refinement lemma automatically. The main difficulty is that some low-level operations have side conditions, which we must explicitly discharge by adding assertions at the right points in the source program to guide Sepref.

The following command generates a heap program called sub1_imp_code from the source programsub1_impint:

sepref_definition sub1_imp_code is

sub1_impint:: [λ_.True] (array_assn nat_assn)d→array_assn nat_assn

by sepref

The generated array-based program is

sub1_imp_codexs =do {

(i,zs) ←heap_WHILET(λ(i, ys). do { zs ←Array.lenys; return(i< zs) })

(9)

(0, xs);

returnzs }

The end-to-end refinement theorem, obtained by composing the refinement lemmas, is (sub1_imp_code,sub1_spec)

∈ [λxs. ∀i ∈ xs. i 6= 0] (array_assn int_of_nat_assn)d_→_array_{_}_{assn int}_{_}_of_{_}_nat_{_}_assn

If we want to execute the program efficiently, we can translate it to Standard ML using Isabelle’s code generator [23]. The following imperative code, including its dependencies, is generated (in slightly altered form):

fun sub1_imp_code xs = (fn () => let

val (i, zs) =

heap_WHILET (fn (i, ys) => fn () => i < len heap_int ys) (fn (i, ys) => fn () =>

let val z = nth heap_int ys i - 1 in (i + 1, upd heap_int i z ys) end) (0, xs) ();

in zs end);

The ML idiom(fn () =>...) ()is inserted to delay the evaluation of the body, so that the side effects occur in the intended order.

3 Abstract CDCL

The abstract CDCL calculus by Nieuwenhuis et al. [43] forms the first layer of our refine-ment chain. The formalization relies on basic Isabelle libraries for lists and multisets and on custom libraries for propositional logic. Properties such as partial correctness and termina-tion (given a suitable strategy) are inherited by subsequent layers.

3.1 Propositional Logic

The DPLL and CDCL calculi distinguish between literals whose truth value has been de-cided arbitrarily and those that are entailed by the current decisions; for the latter, it is sometimes useful to know which clause entails it. To capture this information, we introduce a type of annotated literals, parameterized by a type0vof propositional variables and a type

0_cls_{of clauses:}

datatype 0v literal = datatype (0v,0cls) ann_literal =

Pos0v Decided(0v literal) | Neg0v | Propagated(0v literal)0cls

The simpler calculi do not use0cls; they take0cls = unit, a singleton type whose unique value is (). Informally, we write A, ¬ A, and L†for positive, negative, and decision literals, and we write LC (with C ::0cls) or simply L (if0cls = unitor if the clause C is irrelevant) for propagated literals. The unary minus operator is used to negate a literal, with −(¬ A) = A.

As is customary in the literature [1, 57], clauses are represented by multisets, ignoring the order of literals but not repetitions. A0v clauseis a (finite) multiset over0v literal. Clauses

(10)

are often stored in sets or multisets of clauses. To ease reading, we write clauses using logical symbols (e.g., ⊥, L, and C ∨ D for /0, {L}, and C ] D). Given a clause C, we write ¬C for the formula that corresponds to the clause’s negation.

Given a set or multiset I of literals, I C is true if and only if C and I share a literal. This is lifted to sets and multisets of clauses or formulas: I N ←→ ∀C ∈ N. I C. A set or multiset is satisfiable if there exists a consistent set or multiset of literals I such that I N. Finally, N N0←→ ∀I. I N −→ I N 0. These notations are also extended to formulas.

3.2 DPLL with Backjumping

Nieuwenhuis et al. present CDCL as a set of transition rules on states. A state is a pair (M, N), where M is the trail and N is the multiset of clauses to satisfy. In a slight abuse of terminology, we will refer to the multiset of clauses as the “clause set.” The trail is a list of annotated literals that represents the partial model under construction. The empty list is written . Somewhat nonstandardly, but in accordance with Isabelle conventions for lists, the trail grows on the left: Adding a literal L to M results in the new trail L · M, where · ::0_{a ⇒}0_{a list ⇒}0_{a list. The concatenation of two lists is written M @ M}0_{. To lighten the}

notation, we often build lists from elements and other lists by simple juxtaposition, writing MLM0for M @ L · M0.

The core of the CDCL calculus is defined as a transition relationDPLL_NOT+BJ, an ex-tension of classical DPLL [17] with nonchronological backtracking, or backjumping. The

NOTpart of the name refers to Nieuwenhuis, Oliveras, and Tinelli. The calculus consists of three rules, starting from an initial state (, N). In the following, we abuse notation, implic-itly converting’s first operand from a list to a set and ignoring annotations on literals:

Propagate (M, N) =⇒DPLL_NOT+BJ(LM, N)

if N contains a clause C ∨ L such that M ¬C and L is undefined in M (i.e., neither M L nor M −L)

Decide (M, N) =⇒DPLL_NOT+BJ(L†M, N)

if the atom of L occurs in N and is undefined in M

Backjump (M0L†M, N) =⇒DPLL_NOT+BJ (L0M, N)

if N contains a conflicting clause C (i.e., M0L†M_{¬C) and there exists a clause C}0∨ L0 such that N C0∨ L0_{, M}_¬C0_{, and L}0_{is undefined in M but occurs in N or in M}0_L†

TheBackjump rule is more general than necessary for capturing DPLL, where it suffices to negate the leftmost decision literal. The general rule can also express nonchronological backjumping, if C0∨ L0_{is a new clause derived from N (but not necessarily in N).}

We represented the calculus as an inductive predicate. For the sake of modularity, we formalized the rules individually as their own predicates and combined them to obtain

DPLL_NOT+BJ :

inductive DPLL_NOT+BJ::0st ⇒0st ⇒ bool where

propagateS S0=⇒DPLL_NOT+BJS S0 |decideS S0=⇒DPLL_NOT+BJS S0 |backjumpS S0=⇒DPLL_NOT+BJ S S0

Since there is no call toDPLL_NOT+BJin the assumptions, we could also have used a plain definition here, but the inductive command provides convenient introduction and elimina-tion rules. The predicate operates on states of type 0st. To allow for refinements, this type is kept as a parameter of the calculus, using a locale that abstracts over it and that provides basic operations to manipulate states:

(11)

locale dpll_state = fixes

trail::0st ⇒ (0v, unit) ann_literal list and

clauses::0st ⇒0v clause multiset and

prepend_trail:: (0v, unit) ann_literal ⇒0st ⇒0st and

tl_trail::0st ⇒0st and

add_clause::0v clause ⇒0st ⇒0st and

remove_clause::0v clause ⇒0st ⇒0st assumes

state(prepend_trailL S) = (L ·trailS,clausesS)and

state(tl_trailS) = (tl(trailS),clausesS) and

state(add_clsC S) = (trailS,add_msetC (clausesS)) and

state(remove_clsC S) = (trailS,remove_allC (clausesS))

wherestateconverts an abstract state of type0stto a pair (M, N). Inside the locale, states are compared extensionally: S ∼ S0is true if the two states have identical trails and clause sets (i.e., ifstateS =stateS0). This comparison ignores any other fields that may be present in concrete instantiations of the abstract state type0st.

Each calculus rule is defined in its own locale, based ondpll_stateand parameterized by additional side conditions. Complex calculi are built by inheriting and instantiating lo-cales providing the desired rules. For example, the following locale provides the predicate corresponding to theDeciderule, phrased in terms of an abstract DPLL state:

locale decide_ops=dpll_state+ fixesdecide_conds::0st ⇒0st ⇒ bool begin

inductive decide::0st ⇒0st ⇒ bool where

undefined_lit(trailS) L =⇒

atm_ofL ∈atms_of(clausesS ) =⇒

S0∼prepend_trail(DecidedL) S =⇒

decide_condsS S0=⇒

decideS S0 end

Following a common idiom, theDPLL_NOT+BJcalculus is distributed over two locales: The first locale,DPLL_NOT+BJ_ops, defines theDPLL_NOT+BJcalculus; the second locale,

DPLL_NOT+BJ , extends it with an assumption expressing a structural invariant overDPLL_ NOT+BJthat is instantiated when proving concrete properties later. This cannot be achieved with a single locale, because definitions may not precede assumptions.

Theorem 1 (Termination [20,wf_dpll_bj]) The relationDPLL_NOT+BJ is well founded. Termination is proved by exhibiting a well-founded relation ≺ such that S0≺ S whenever S =⇒DPLL_NOT+BJS0. Let S = (M, N) and S0= (M0, N0) with the decompositions

M = MnL†n· · · M1L†₁M0 M0= Mn00L0†_n0· · · M0₁L0†₁M₀0 where the trail segments M0,..., Mn, M00,..., M

0

n0 contain no decision literals. Let V be the

number of distinct variables occurring in the initial clause set N. Now, let ν M = V − |M|, indicating the number of unassigned variables in the trail M. Nieuwenhuis et al. define ≺ such that S0≺ S if

(12)

(1) there exists an index i ≤ n, n0such that [ν M0₀, ..., ν M_i−10 ] = [ν M0, ..., ν Mi−1] and ν M0i<

ν Mi; or

(2) [ν M0, ..., ν Mn] is a strict prefix of [ν M00, ..., ν M 0 n0].

This order is not to be confused with the lexicographic order: We have [0] ≺ by condition (2), whereas <lex[0]. Yet the authors justify foundedness by appealing to the

well-foundedness of <lex on bounded lists over finite alphabets. In our proof, we clarify and

simplify matters by mapping states S to lists|M0|,...,|Mn|, without appealing to ν. Using

the standard lexicographic order, states become larger with each transition:

Propagate [k1,...,kn]<lex [k1,...,kn+ 1] Decide [k1,...,kn]<lex [k1,...,kn,0]

Backjump [k1,...,kn]<lex [k1,...,kj+ 1] with j ≤ n

The lists corresponding to possible states are bounded by the list [V, . . . , V] consisting of V occurrences of V, thereby delimiting a finite domain D = {[k1,...,kn] | k1,...,kn,n ≤ V}.

We take ≺ to be the restriction of >lexto D. A variant of this approach is to encode lists

into a measure µVM = ∑ni=0|Mi| Vn−i and let S0≺ S ←→µV M0> µVM, building on the

well-foundedness of > over bounded sets of natural numbers.

A final state is a state from which no transitions are possible. Given a relation =⇒, we write =⇒! for the right-restriction of its reflexive transitive closure to final states (i.e., S0=⇒!Sif and only if S0=⇒∗S ∧ ∀S0. S 6=⇒ S0).

Theorem 2 (Partial Correctness [20,full_dpll_backjump_final_state_from_init_state]) If (, N) =⇒!

DPLL_NOT+BJ(M, N), then N is satisfiable if and only if M N.

We first prove structural invariants on arbitrary states (M0, N) reachable from (, N), namely: (1) each variable occurs at most once in M0; (2) if M0= M2LM1where L is

prop-agated, then M1, N L. From these invariants, together with the constraint that (M, N) is a

final state, it is easy to prove the theorem.

3.3 Classical DPLL

The locale machinery allows us to derive a classical DPLL calculus from DPLL with back-jumping. We call this calculusDPLL_NOT . It is achieved through aDPLL_NOTlocale that restricts theBackjumprule so that it performs only chronological backtracking:

Backtrack (M0L†M, N) =⇒DPLL_NOT(−L · M, N)

if N contains a conflicting clause and M0contains no decision literals

Because of the locale parameters,DPLL_NOT is strictly speaking a family of calculi. Lemma 3 (Backtracking [20,backtrack_is_backjump]) TheBacktrackrule is a special case of theBackjumprule.

TheBackjumprule depends on two clauses: a conflict clause C and a clause C0∨ L0_that

justifies the propagation of L0. The conflict clause is specified byBacktrack. As for C0∨ L0_,

given a trail M0L†Mdecomposable as MnL†Mn−1Ln−1† · · · M1L†₁M0where M0,..., Mn

con-tain no decision literals, we can take C0= −L1∨ · · · ∨ −Ln−1.

Consequently, the inclusionDPLL_NOT⊆DPLL_NOT+BJ holds. In Isabelle, this is ex-pressed as a locale instantiation:DPLL_NOTis made a sublocale ofDPLL_NOT+BJ, with a side condition restricting the application of theBackjumprule. The partial correctness and

(13)

termination theorems are inherited from the base locale.DPLL_NOTinstantiates the abstract state type0stwith a concrete type of pairs. By discharging the locale assumptions emerging with the sublocale command, we also verify that these assumptions are consistent. Roughly:

locale DPLL_NOT = begin

type_synonym 0v state = (0v, unit, unit) ann_literal list ×0v clause multiset inductivebacktrack::0v state ⇒0v state ⇒ bool where . . .

end

sublocale DPLL_NOT⊆dpll_state fst snd(λL (M, N). (L · M, N)) . . .

sublocale DPLL_NOT⊆DPLL_NOT+BJ_ops . . . (λC L S S0.DPLL.backtrackS S0) . . . sublocale DPLL_NOT⊆DPLL_NOT+BJ. . .

If a conflict cannot be resolved by backtracking, we would like to have the option of stopping even if some variables are undefined. A state (M, N) is conclusive if M N or if Ncontains a conflicting clause and M contains no decision literals. ForDPLL_NOT, all final states are conclusive, but not all conclusive states are final.

Theorem 4 (Partial Correctness [20,dpll_conclusive_state_correctness]) If (, N) =⇒∗DPLL_NOT

(M, N) and (M, N) is a conclusive state, N is satisfiable if and only if M N.

The theorem does not require stopping at the first conclusive state. In an implementation, testing M N can be expensive, so a solver might fail to notice that a state is conclusive and continue for some time. In the worst case, it will stop in a final state—which is guaranteed to exist by Theorem 1. In practice, instead of testing whether M N, implementations typically apply the rules until every literal is set. When N is satisfiable, this produces a total model.

3.4 The CDCL Calculus

The abstract CDCL calculus extendsDPLL_NOT+BJwith a pair of rules for learning new lemmas and forgetting old ones:

Learn (M, N) =⇒CDCL_NOT(M, N ] {C}) if N C and each atom of C is in N or M

Forget (M, N ] {C}) =⇒CDCL_NOT(M, N) if N C

In practice, theLearnrule is normally applied to clauses built exclusively from atoms in M, because the learned clause is false in M. This property eventually guarantees that the learned clause is not redundant (e.g., it is not already contained in N).

We call this calculusCDCL_NOT . In general,CDCL_NOTdoes not terminate, because it is possible to learn and forget the same clause infinitely often. But for some instantiations of the parameters with suitable restrictions onLearnandForget, the calculus always terminates.

Theorem 5 (Termination [20, wf_cdclNOT_no_learn_and_forget_infinite_chain]) LetC be an instance of theCDCL_NOTcalculus(i.e., C⊆CDCL_NOT ). If Cadmits no infinite chains consisting exclusively ofLearnandForgettransitions, thenCis well founded.

In many SAT solvers, the only clauses that are ever learned are the ones used for back-tracking. If we restrict the learning so that it is always done immediately before backjump-ing, we can be sure that some progress will be made between aLearnand the nextLearnor

(14)

CDCL_NOT (Section 3.4) DPLL_NOT (Section 3.3) DPLL_NOT+BJ (Section 3.2) CDCL_NOT+restart (Section 3.5) CDCL_NOT_merge (Section 3.4) CDCL_NOT_merge+restartT (Section 3.5) (a) Syntactic dependencies

CDCL_NOT (Section 3.4) DPLL_NOT (Section 3.3) DPLL_NOT+BJ (Section 3.2) CDCL_NOT+restart (Section 3.5) CDCL_NOT_merge (Section 3.4) CDCL_NOT_merge+restartT (Section 3.5) (b) Refinements

Fig. 1: Connections between the abstract calculi

Learn+Backjump (M0L†M, N) =⇒CDCL_NOT_merge(L0M, N ] {C0∨ L0})

if C0, L, L0, M, M0, N satisfyBackjump’s side conditions

The calculus variant that performs this rule instead ofLearnandBackjumpis calledCDCL_ NOT_merge. Because a singleLearn+Backjumptransition corresponds to two transitions in

CDCL_NOT, the inclusionCDCL_NOT_merge ⊆CDCL_NOTdoes not hold. Instead, we have

CDCL_NOT_merge⊆CDCL_NOT +. Each step ofCDCL_NOT_mergecorresponds to a single step inCDCL_NOTor a two-step sequence consisting ofBackjumpfollowed byLearn.

3.5 Restarts

Modern SAT solvers rely on a dynamic decision literal heuristic. They periodically restart the proof search to apply the effects of a changed heuristic. This helps the calculus focus on a part of the initial clauses where it can make progress. Upon a restart, some learned clauses may be removed, and the trail is reset to . Since our calculus has aForgetrule, theRestart

rule needs only to clear the trail. AddingRestarttoCDCL_NOT yieldsCDCL_NOT+restart. However, this calculus does not terminate, becauseRestartcan be applied infinitely often.

A working strategy is to gradually increase the number of transitions between succes-sive restarts. This is formalized via a locale parameterized by a base calculusC and an unbounded function f :: nat ⇒ nat. Nieuwenhuis et al. require f to be strictly increasing, but unboundedness is sufficient.

The extended calculus C+restartToperates on states of the form (S, n), where S is a state in the base calculus and n counts the number of restarts. To simplify the presentation, we assume that bases states S are pairs (M, N). The calculusC+restartTstarts in the state ((, N),0) and consists of two rules:

Restart (S, n) =⇒C+restartT((, N0), n + 1) if S =⇒mC(M0, N0) and m ≥ f n

Finish (S, n) =⇒C+restartT(S0, n + 1) if S =⇒0 !CS

0

The symbol =⇒Crepresents the base calculusC’s transition relation, and =⇒mCdenotes an

(15)

in Section 4.5, we will review an alternative strategy based on the number of conflicts or learned clauses. Termination relies on a measure µV associated withCthat may not increase

from restart to restart: If S =⇒∗CS

0_=⇒

restartT S00, then µVS00≤µVS. The measure may

depend on V, the number of variables occurring in the problem.

We instantiated the locale parameter C withCDCL_NOT_mergeand f with the Luby sequence (1, 1, 2, 1, 1, 2, 4, . . . ) [35], with the restriction that no clause containing duplicate literals is ever learned, thereby bounding the number of learnable clauses and hence the number of transitions taken byC.

Figure 1a summarizes the syntactic dependencies between the calculi reviewed in this section. An arrow C−→B indicates thatC is defined in terms ofB. Figure 1b presents the refinements between the calculi. An arrowC=⇒Bindicates that we provedC⊆B∗or some stronger result—either by locale embedding (sublocale) or by simulatingC’s behavior in terms ofB.

4 A Refined CDCL towards an Implementation

TheCDCL_NOTcalculus captures the essence of modern SAT solvers without imposing a policy on when to apply specific rules. In particular, theBackjumprule depends on a clause C0∨ L0to justify the propagation of a literal, but does not specify a procedure for coming up with this clause. For Automated Reasoning, Weidenbach developed a calculus that is more specific in this respect, and closer to existing solver implementations, while keeping many aspects unspecified [57]. This calculus,CDCL_W, is also formalized in Isabelle and connected toCDCL_NOT.

4.1 The New DPLL Calculus

Independently from the previous section, we formalized DPLL as described in Automated Reasoning. The calculus operates on states (M, N), where M is the trail and N is the initial clause set. It consists of three rules:

Propagate (M, N) =⇒DPLL_W(LM, N) if C ∨ L ∈ N ]U, M ¬C, and L is undefined in M

Decide (M, N) =⇒DPLL_W(L†M, N) if L is undefined in M and occurs in N

Backtrack (M0K†M, N) =⇒DPLL_W(−K · M, N)

if N contains a conflicting clause and M0contains no decision literals

Backtrackperforms chronological backtracking: It undoes the last decision and picks the opposite choice. Conclusive states forDPLL_Ware defined as forDPLL_NOT(Section 3.3).

The termination and partial correctness proofs given by Weidenbach depart from Nieuw-enhuis et al. We also formalized them:

Theorem 6 (Termination [20,wf_dpllW]) The relationDPLL_Wis well founded.

Termination is proved by exhibiting a well-founded relation that includesDPLL_W. Let Vbe the number of distinct variables occurring in the clause set N. The weight ν L of a literal Lis 2 if L is a decision literal and 1 otherwise. The measure is

µ (Lk· · · L1, N) = ν L1, ..., ν Lk, 3, ..., 3

| {z }

V−koccurrences

(16)

Lists are compared using the lexicographic order, which is well founded because there are finitely many literals and all lists have the same length. It is easy to check that the measure decreases with each transition:

Propagate [k1,...,km,3,3,...,3] >lex [k1,...,km,1,3,...,3] Decide [k1,...,km,3,3,...,3] >lex [k1,...,km,2,3,...,3] Backtrack [k1,...,km,2,l1,...,ln]>lex [k1,...,km,1,3,...,3]

Theorem 7 (Partial Correctness [20,dpllW_conclusive_state_correctness]) If (, N) =⇒∗DPLL_W

(M, N) and (M, N) is a conclusive state, N is satisfiable if and only if M N.

The proof is analogous to the proof of Theorem 2. Some lemmas are shared between both proofs. Moreover, we can link Weidenbach’s DPLL calculus with the version we de-rived fromDPLL_NOT+BJin Section 3.3:

Theorem 8 (DPLL [20,dpllW_dpllNOT]) For all states S that satisfy basic structural invari-ants, S =⇒DPLL_WS0if and only if S =⇒DPLL_NOTS0.

This provides another way to establish Theorems 6 and 7. Conversely, the simple mea-sure that appears in the above termination proof can also be used to establish the termination of the more generalDPLL_NOT+BJcalculus (Theorem 1).

4.2 The New CDCL Calculus

TheCDCL_Wcalculus operates on states (M, N, U, D), where M is the trail; N and U are the sets of initial and learned clauses, respectively; and D is a conflict clause, or the distin-guished clause > if no conflict has been detected.

In the trail M, each decision literal L is marked as such (L†—i.e.,DecidedL), and each propagated literal L is annotated with the clause C that caused its propagation (LC_—i.e.,

PropagatedL C). The level of a literal L in M is the number of decision literals to the right of the atom of L in M, or 0 if the atom is undefined. The level of a clause is the highest level of any of its literals, with 0 for ⊥, and the level of a state is the maximum level (i.e., the number of decision literals). The calculus assumes that N contains no clauses with duplicate literals and never produces clauses containing duplicates.

The calculus starts in a state (, N, /0, >). The following rules apply as long as no conflict has been detected:

Propagate (M, N, U, >) =⇒CDCL_W(LC∨LM, N, U, >)

if C ∨ L ∈ N ] U, M ¬C, and L is undefined in M

Decide (M, N, U, >) =⇒CDCL_W(L†M, N, U, >) if L is undefined in M and occurs in N

Conflict (M, N, U, >) =⇒CDCL_W(M, N, U, D) if D ∈ N ] U and M ¬ D

Restart (M, N, U, >) =⇒CDCL_W(, N, U, >) if M 6 N

Forget (M, N, U ] {C}, >) =⇒CDCL_W(M, N, U, >) if M 6 N and M contains no literal LC

ThePropagateandDeciderules generalize theirDPLL_Wcounterparts. Once a conflict clause has been detected and stored in the state, the following rules cooperate to reduce it and backtrack, exploring a first unique implication point [6, Chapter 3]:

Skip (LC_{M, N, U, D) =⇒}

(17)

Resolve (LC∨L_{M, N, U, D ∨ −L) =⇒}

CDCL_W(M, N, U, C ∪ D)

if D has the same level as the current state

Jump (M0K†M, N, U, D ∨ L) =⇒CDCL_W(LD∨LM, N, U ] {D ∨ L}, >)

if L has the level of the current state, D has a lower level, and K and D have the same level

Exhaustive application of these three rule corresponds to a single step by the combined learning and nonchronological backjumping ruleLearn+BackjumpfromCDCL_NOT_merge . TheLearn+Backjumprule is even more general and can be used to express learned clause minimization [54].

InResolve, C ∪ D is the same as C ∨ D (i.e., C ] D), except that it keeps only one copy of the literals that belong to both C and D. When performing propagations and processing con-flict clauses, the calculus relies on the invariant that clauses never contain duplicate literals. Several other structural invariants hold on all states reachable from an initial state, including the following: The clause annotating a propagated literal of the trail is a member of N ] U. Some of the invariants were not mentioned in the textbook (e.g., whenever LCoccurs in the trail, L is a literal of C). Formalization helped develop a better understanding of the data structure and clarify the book.

LikeCDCL_NOT,CDCL_Whas a notion of conclusive state. A state (M, N, U, D) is con-clusiveif D = > and M N or if D = ⊥ and N is unsatisfiable. The calculus always termi-nates, but without a suitable strategy, it can block in an inconclusive state. At the end of the following derivation, neitherSkipnorResolvecan process the conflict further:

(, {A, B}, /0, >) =⇒Decide (¬ A†, {A, B}, /0, >)

=⇒Decide (¬ B†¬ A†, {A, B}, /0, >)

=⇒Conflict(¬ B†¬ A†, {A, B}, /0, A)

4.3 A Reasonable Strategy

To prove correctness, we assume a reasonable strategy:PropagateandConflictare preferred overDecide;RestartandForgetare not applied. (We will lift the restriction onRestartand

Forgetin Section 4.5.) The resulting calculus,CDCL_W+stgy , refinesCDCL_Wwith the as-sumption that derivations are produced by a reasonable strategy. This asas-sumption is enough to ensure that the calculus can backjump after detecting a nontrivial conflict clause other than ⊥. The crucial invariant is the existence of a literal with the highest level in any con-flict, so thatResolvecan be applied. The textbook suggests preferringConficttoPropagate

andPropagateto the other rules. While this makes sense in an implementation, it is not needed for any of our metatheoretical results.

Theorem 9 (Partial Correctness [20,full_cdclW_stgy_final_state_conclusive_from_init_state]) If (, N, /0, >) =⇒!

CDCL_W+stgy S0and N contains no clauses with duplicate literals, S0is a

conclusive state.

Once a conflict clause has been stored in the state, the clause is first reduced by a chain ofSkipandResolvetransitions. Then, there are two scenarios: (1) the conflict is solved by aJump, at which point the calculus may resume propagating and deciding literals; (2) the reduced conflict is ⊥, meaning that N is unsatisfiable—i.e., for unsatisfiable clause sets, the calculus generates a resolution refutation.

(18)

TheCDCL_W+stgycalculus is designed to have respectable complexity bounds. One of the reasons for this is that the same clause cannot be learned twice:

Theorem 10 (No Relearning [20,cdclW_stgy_distinct_mset_clauses]) If we have (, N, /0, >) =⇒∗CDCL_W+stgy(M, N, U, D), then noJumptransition is possible from the latter state causing

the addition of a clause from N ] U to U.

The formalization of this theorem posed some challenges. The informal proof in Auto-mated Reasoningis as follows (with slightly adapted notations):

Proof By contradiction. Assume CDCL learns the same clause twice, i.e., it reaches a state (M, N, U, D ∨ L) whereJumpis applicable and D ∨ L ∈ N ] U. More precisely, the state has the form (Kn· · · K2K₁†M2K†M1, N, U, D ∨ L) where the Ki, i > 1 are

propagated literals that do not occur complemented in D, as otherwise D cannot be of level i. Furthermore, one of the Kiis the complement of L. But now, because

D ∨ Lis false in Kn· · · K2K†1M2K†M1and D ∨ L ∈ N ] U instead of deciding K†1the

literal L should be propagated by a reasonable strategy. A contradiction. Note that none of the Kican be annotated with D ∨ L. ut

Many details are missing. To find the contradiction, we must show that there exists a state in the derivation with the trail M2K†M1, and such that D ∨ L ∈ N ] U. The textbook does

not explain why such a state is guaranteed to exist. Moreover, inductive reasoning is hidden under the ellipsis notation (Kn· · · K2). Such a high-level proof might be suitable for humans,

but the details are needed in Isabelle, and Sledgehammer alone cannot fill in such large gaps, especially if induction is needed. The first version of the formal proof was over 700 lines long and is among the most difficult proofs we carried out.

We later refactored the proof. Following the book, each transition inCDCL_W+stgywas normalized by applyingPropagateandConflictexhaustively. For example, we defined De-cide+stgyso that S =⇒Decide+stgyU ifPropagate andConflictcannot be applied to S and

S =⇒DecideT =⇒!Propagate,ConflictU for some state T. However, normalization is not

neces-sary. It is simpler to define S =⇒Decide+stgyT as S =⇒DecideT, with the same condition on

Sas before. This change shortened the proof by about 200 lines. In a subsequent refactor-ing, we further departed from the book: We proved the invariant that all propagations have been performed before deciding a new literal. The core argument (“the literal L should be propagated by a reasonable strategy”) remains the same, but we do not have to reason about past transitions to argue about the existence of an earlier state. The invariant also makes it possible to generalize the statement of Theorem 10: We can start from any state that satisfies the invariant, not only from an initial state. The final version of the proof is 250 lines long.

Using Theorem 10 and assuming that only backjumping has a cost, we get a complexity of O(3V_{), where V is the number of different propositional variables. If}_Conflict_{is always}

preferred overPropagate, the learned clause is never redundant in the sense of ordered res-olution [57], yielding a complexity bound of O(2V). We have not formalized this yet.

In Automated Reasoning, and in our formalization, Theorem 10 is also used to establish the termination ofCDCL_W+stgy. However, the argument for the termination ofCDCL_NOT

also applies toCDCL_Wirrespective of the strategy, a stronger result. To lift this result, we must show thatCDCL_WrefinesCDCL_NOT.

4.4 Connection with Abstract CDCL

It is interesting to show thatCDCL_WrefinesCDCL_NOT_merge, to establish beyond doubt thatCDCL_Wis a CDCL calculus and to lift the termination proof and any other general

(19)

results aboutCDCL_NOT_merge. The states are easy to connect: We interpret aCDCL_Wtuple (M, N, U, C) as aCDCL_NOTpair (M, N ] U), ignoring C.

The main difficulty is to relate the low-level conflict-relatedCDCL_Wrules to their high-level counterparts. Our solution is to introduce an intermediate calculus, calledCDCL_W_ merge, that combines consecutive low-level transitions into a single transition. This calculus refines bothCDCL_WandCDCL_NOT_ mergeand is sufficiently similar toCDCL_Wso that we can transfer termination and other properties fromCDCL_NOT_mergetoCDCL_Wthrough it. Whenever theCDCL_Wcalculus performs a low-level sequence of transitions of the form

Conflict(Skip|Resolve)∗Jump?, theCDCL_W_mergecalculus performs a single transition of a new rule that subsumes all four low-level rules:

Reduce+Maybe_Jump S =⇒CDCL_W_mergeS00

if S =⇒ConflictS0=⇒!Skip,Resolve,JumpS

00

When simulatingCDCL_W_mergein terms ofCDCL_NOT, two interesting scenarios arise. First,Reduce+Maybe_Jump’s behavior may comprise a backjump: The rule can be simulated usingCDCL_NOT_merge’sLearn+Backjumprule. The second scenario arises when the conflict clause is reduced to ⊥, leading to a conclusive final state. Then,Reduce+Maybe_Jumphas no counterpart inCDCL_NOT_merge. The two calculi are related as follows: If S =⇒CDCL_W_merge

S0, either S =⇒CDCL_NOT_mergeS0or S is a conclusive state. SinceCDCL_NOT_mergeis well

founded, so isCDCL_W_merge . This implies thatCDCL_WwithoutRestartterminates. SinceCDCL_W_merge is mostly a rephrasing ofCDCL_W, it makes sense to restrict it to a reasonable strategy that prefersPropagateandReduce+Maybe_Jump overDecide, yielding

CDCL_W_merge+stgy. The two strategy-restricted calculi have the same end-to-end behavior: S =⇒!CDCL_W_merge+stgyS

0 _←

→ S =⇒!CDCL_W+stgyS

0

4.5 A Strategy with Restart and Forget

We could use the same strategy for restarts as in Section 3.5, but we prefer to exploit The-orem 10, which asserts that no relearning is possible. Since only finitely many different duplicate-free clauses can ever be learned, it is sufficient to increase the number of learned clauses between two restarts to ensure termination. This criterion is the norm in modern SAT solvers. The lower bound on the number of learned clauses is given by an unbounded function f :: nat ⇒ nat. In addition, we allow an arbitrary subset of the learned clauses to be forgotten upon a restart but otherwise forbidForget. The calculusC+restartLthat realizes these ideas is defined by the two rules

Restart (S,n) =⇒C+restartL(S000,n + 1)

if S =⇒∗CS0=⇒RestartS00=⇒∗ForgetS000and |learnedS0| − |learnedS| ≥ f n

Finish (S,n) =⇒C+restartL(S0,n + 1) if S =⇒!CS0

We formally proved thatCDCL_W+stgy+restartLis totally correct. Figure 2 summarizes the situation, following the conventions of Figure 1.

4.6 Incremental Solving

SMT solvers combine a SAT solver with theory solvers (e.g., for uninterpreted functions and linear arithmetic). The main loop runs the SAT solver on a clause set. If the SAT solver

(20)

CDCL_W (Section 4.2) CDCL_NOT_merge (Section 3.4) CDCL_W+stgy (Section 4.3) CDCL_W_merge (Section 4.4) CDCL_W+stgy+restartL (Section 4.5) CDCL_W_merge+stgy (Section 4.4)

(a) Syntactic dependencies

CDCL_W (Section 4.2) CDCL_NOT_merge (Section 3.4) CDCL_W+stgy (Section 4.3) CDCL_W_merge (Section 4.4) CDCL_W+stgy+restartL (Section 4.5) CDCL_W_merge+stgy (Section 4.4) (b) Refinements

Fig. 2: Connections involving the refined calculi

answers “unsatisfiable,” the SMT solver is done; otherwise, the main loop asks the theory solvers to provide further, theory-motivated clauses to exclude the current candidate model and force the SAT solver to search for another one. This design crucially relies on incremen-tal SAT solving: The possibility of adding new clauses to the clause set C of a conclusive satisfiable state and of continuing from there.

As a step towards formalizing SMT, we designed a calculusCDCL_W+stgy+incrthat pro-vides incremental solving on top ofCDCL_W+stgy :

Add_NonconflictC (M, N, U, >) =⇒CDCL_W+stgy+incrS0

if M 6 ¬C and (M, N ] {C}, U, >) =⇒!

CDCL_W+stgyS

0 Add_ConflictC (M0LM, N, U, >) =⇒CDCL_W+stgy+incrS0

if LM ¬C, −L ∈ C, M0contains no literal of C, and (LM, N ] {C}, U, C) =⇒!

CDCL_W+stgyS

0

We first run theCDCL_W+stgycalculus on a clause set N, as usual. If N is satisfiable, we can add a nonempty, duplicate-free clause C to the set of clauses and apply one of the two above rules. These rules adjust the state and relaunchCDCL_W+stgy.

Theorem 11 (Partial Correctness [20,incremental_conclusive_state]) If state S is conclusive and S =⇒CDCL_W+stgy+incrS0, then S0is conclusive.

The key is to prove that the structural invariants that hold forCDCL_W+stgy still hold after adding the new clause to the state. Then the proof is easy because we can reuse the invariants we have already proved aboutCDCL_W+stgy.

5 A Naive Functional Implementation of CDCL

Sections 3 and 4 presented variants of DPLL and CDCL as parameterized transition sys-tems, formalized using locales and inductive predicates. We now present a deterministic SAT solver that implementsCDCL_W+stgy, expressed as a functional program in Isabelle.

When implementing a calculus, we must make many decisions regarding the data struc-tures and the order of rule applications. Our functional SAT solver is very naive and does not feature any optimizations beyond those already present in theCDCL_W+stgycalculus; in

(21)

Section 6, we will refine the calculus further to capture the two-watched-literal optimization and present an imperative implementation relying on mutable data structures.

For our functional implementation, we choose to represent states by tuples (M, N, U, D), where propositional variables are coded as natural numbers and multisets as lists. Each tran-sition rule inCDCL_W+stgyis implemented by a corresponding function. For example, the function that implements thePropagaterule is given below:

definition do_propagate_step:: state ⇒ state where

do_propagate_stepS = (case S of

(M, N, U, >) ⇒

(casefind_first_unit_propagationM (N@ U) of

Some(L, C) ⇒ (PropagatedL C · M, N, U, >)

|None⇒ S) | S ⇒ S)

The functions corresponding to the different rules are combined into a single function that performs one step. The combinatordo_if_not_equaltakes a list of functions implement-ing rules and tries to apply them in turn, until one of them has an effect on the state:

fun do_cdcl_step:: state ⇒ state where

do_cdcl_stepS =do_if_not_equal[do_conflict_step,do_propagate_step, do_skip_step,do_resolve_step,do_backtrack_step,do_decide_step] S The main loop appliesdo_cdcl_stepuntil the transition has no effect:

function do_all_cdclW_stgy:: state ⇒ state where

do_all_cdclW_stgyS = (let S0=do_cdcl_stepSin

if S0= S then S elsedo_all_cdcl_stgyS0)

The main loop is a recursive program, specified using the function command [27]. For Isabelle to accept the recursive definition of the main loop as a terminating program, we must discharge a proof obligation stating that its call graph is well founded. This is a priori unprovable: The solver is not guaranteed to terminate if starting in an arbitrary state.

To work around this, we restrict the input by introducing a subset type that contains a strong enough structural invariant, including the duplicate-freedom of all the lists in the data structure. With the invariant in place, it is easy to show that the call graph is included in theCDCL_W+stgy calculus, allowing us to reuse its termination argument. The partial correctness theorem can then be lifted, meaning that the SAT solver is a decision procedure for propositional logic.

The final step is to extract running code. Using Isabelle’s code generator [23], we can translate the program to Haskell, OCaml, Scala, or Standard ML. The resulting program is syntactically analogous to the source program in Isabelle, including its dependencies, and uses the target language’s facilities for datatypes and recursive functions with pattern matching. Invariants on subset types are ignored; when invoking the solver from outside Isabelle, the caller is responsible for ensuring that the input satisfies the invariant. The entire program is about 520 lines long in Standard ML. It is not efficient, due to its extensive reliance on lists, but it satisfies the need for a proof of concept.

6 An Imperative Implementation of CDCL

As an impure functional language, Standard ML provides assignment and mutable arrays. We use these features to derive an imperative SAT solver that is much more efficient than the

(22)

4. ¬ A B 3. ¬ A ¬ B C 2. ¬ A ¬ B ¬C 1. ¬ B C A (a) ¬ A B C ¬ B ¬ A ¬C ¬ B ¬ A ¬ B C A (b) ¬ A B C ¬ B ¬ A ¬C ¬ B ¬ A A C ¬ B (c)

Fig. 3: Evolution of the two-watched-literal data structure on an example

functional implementation. We start by integrating the two-watched-literal optimization into

CDCL_W+stgy. Then we refine the calculus to apply rules deterministically, and we generate code that uses arrays to represent clauses and clause sets.

The resulting SAT solver is orders of magnitude faster than the naive functional imple-mentation described in the previous section. However, it is one to two orders of magnitude slower than DPT 2.0 [21], the fastest imperative OCaml solver we know of, because it does not implement restarts or any sophisticated heuristics for learned clause minimization. We expect that many missing heuristics will be straightforward to implement. Due to inefficient memory handling, our solver is not competitive with state-of-the-art solvers.

6.1 The Two-Watched-Literal Scheme

The two-watched-literal (2WL or TWL) scheme [42] is a data structure that makes it pos-sible to efficiently identify candidate clauses for unit propagation and conflict. In each non-unit clause, we distinguish two watched literals—the other literals are unwatched. Initially, any of a non-unit clause’s literals can be chosen to be watched. In the simplest version of the scheme, the solver maintains the following invariant for each non-unit clause:

(α) A watched literal may be false only if all the unwatched literals are false.

As a consequence of this invariant, setting an unwatched literal will never yield a candidate for propagation or conflict, because the two watched literals can then only be true or unset.

For each literal L, the clauses that contain a watched L are chained together in a list (typically a linked list). When a literal L becomes true, the solver needs only to iterate through the list associated with −L to find candidates for propagation or conflict. For each candidate clause, there are four possibilities:

1. If some of the unwatched literals are not false, we restore the invariant by updating the clause: We start watching one of the non-false unwatched literals instead of −L. 2. Otherwise, we consider the clause’s other watched literal:

2.1. If it is not set, we can propagate it. 2.2. If it is false, we have found a conflict. 2.3. If it is true, there is nothing to do.

In Automated Reasoning, a weaker invariant is used, inspired by MiniSat [18]:

(β) A watched literal may be false only if the other watched literal is true or all the un-watched literals are false.

(23)

This invariant is easier to establish than (α): If the other watched literal is true, there is nothing to do, regardless of the truth value of the unwatched literals. The four-step procedure above can easily be adapted, by pulling step 2.3 to the front.

To illustrate how the solver maintains the invariant, whether (α) or (β), we consider the small problem shown in Figure 3. The clauses are numbered from 1 to 4. Gray cells identify watched literals. Thus, clause 1 is ¬ B ∨ C ∨ A, where ¬ B and C are watched.

1. We start with an empty trail and an arbitrary choice of watched literals (Figure 3a). 2. We decide to make A true. The trail becomes A†. In clauses 2 and 3, we exchange ¬ A

with another literal to restore the invariant (Figure 3b).

3. We propagate B from clause 4. The trail becomes B A†. In clause 1, we exchange ¬ B with A to restore the invariant (Figure 3c).

4. From clauses 2 and 3, we find out that we can propagate ¬C and C. We choose C. The trail becomes CB A†. Clause 2 is in conflict. The decision made in step 2 was wrong, so we backtrack.

Upon backtracking, there is no need to update the data structure. A key property for the data structure’s efficiency is that the invariant is preserved when we remove literals from the trail. In MiniSat and other implementations, propagation is performed immediately whenever a suitable clause is discovered, and when a conflict is detected, the solver stops updating the data structure and processes the conflict. Using this more efficient strategy, the following scenario is possible for the example of Figure 3:

1. We start with an empty trail and the same watched literals as before (Figure 3a). 2. We decide to make A true. The trail becomes A†_.

3. We propagate B from clause 4. The trail becomes B A†_.

4. We propagate C from clause 3. The trail becomes CB A†. Clause 2 is in conflict. The decision made in step 2 was wrong, so we backtrack.

By making the right arbitrary choices, we could go from propagation to propagation without having to update the clauses. However, neither invariant holds for clauses 1 to 3 after step 3. To capture the new state of affairs, we need a more precise invariant and a richer notion of state that take into account any pending updates. The new invariant is as follows:

(γ) If there are no pending updates for the clause and no conflict is being processed, in-variant (β) holds.

An update is represented by a pair (L,C), where L is a literal that has become false and C is a clause that has L as one of its watched literals. Each time a literal L is added to the trail, all possible updates (−L, C) are added to the set of pending updates, which is initially empty. Whenever a conflict is detected, the updates are reset to /0. Pending updates can be processed at any time by the calculus.

6.2 The CDCL Calculus with Watched Literals

CDCL with the 2WL data structure is defined as an abstract calculusCDCL_TWLthat refines

CDCL_W+stgy. Nonunit clauses are represented asTWL_ClauseW UW, where W is the mul-tiset of watched literals (of cardinality 2) and UW the mulmul-tiset of unwatched literals. Unit clauses are represented as singleton multisets. The state must also keep track of pending updates. States have the form (M, N, U, D, NP, UP, WS, Q), where

(24)

– N is the initial nonunit clause set in 2WL format; – U is the learned nonunit clause set in 2WL format; – D is a conflict clause or >;

– NP is the initial unit clause set; – UP is the learned unit clause set;

– WS is a multiset of literal–clause pairs (L,C) indicating that clause C must be updated with respect to literal L;

– Q is a set of literals for which further updates are pending.

NPand UP do not influence the calculus; they are ghost components that are useful for connecting a 2WL state to the format expected byCDCL_W:

stateW_of(M, N,U, D,NP,UP,WS, Q) = (M,clausesW_ofN ] NP,clausesW_ofU ] UP, D)

TheclausesW_offunction converts a 2WL clause set to a standard clause set.

The first two rules ofCDCL_TWLhave direct counterparts inCDCL_W:

Propagate (M, N,U,>,NP,UP,{(L,C)} ] WS, Q) =⇒CDCL_TWL

(L0CM, N,U,>,NP,UP,WS,{−L0} ] Q)

ifwatchedC = {L, L0}, L0is not set in M, and ∀K ∈unwatchedC. −K ∈ M

Conflict (M, N,U,>,NP,UP,{(L,C)} ] WS, Q) =⇒CDCL_TWL(M, N,U,C,NP,UP, /0, /0)

ifwatchedC = {L, L0}, −L0∈ M, and ∀K ∈unwatchedC. −K ∈ M

For both rules, the side condition ∀K ∈unwatchedC. −K ∈ M is necessary because

invari-ant (β) is not required to hold for C while a (L,C) update is pending.

The next rules manipulate the state’s 2WL-specific components, without affecting the state’s semantics as seen throughstateW_of:

Update (M, N,U,>,NP,UP,{(L,C)} ] WS, Q) =⇒CDCL_TWL(M, N0,U0,>,NP,UP,WS, Q)

if K ∈unwatchedC, −K /∈ M, and N0_{and U}0_{are obtained from N and U by replacing}

C =TWL_ClauseW UWwithTWL_Clause(W − {L} ] {K}) (UW − {K} ] {L})

Ignore (M, N,U,>,NP,UP,{(L,C)} ] WS, Q) =⇒CDCL_TWL(M, N,U,>,NP,UP,WS, Q)

ifwatchedC = {L, L0} and L0_{∈ M}

Next_Literal (M, N,U,>,NP,UP, /0,{L} ] Q) =⇒CDCL_TWL

(M, N,U,>,NP,UP,{(L,C). L ∈watchedC ∧ C ∈ N ] U}, Q)

As in CDCL_W+stgy , propagations and conflicts are preferred over decisions. This is achieved by checking that WS and Q are empty when making a decision:

Decide (M, N,U,>,NP,UP, /0, /0) =⇒CDCL_TWL(L†M, N,U,>,NP,UP, /0,{−L})

if L is not defined in M and appears in N

The restriction onDecideis enough to ensure that the reasonable strategy is applied in

CDCL_TWL.SkipandResolveare as before, except that they also preserve the 2WL-specific components of the state. TheJumprule is replaced by two rules, because of the distinction between unit and nonunit clauses:

Jump_Nonunit (M0K†_{M, N,U, D ∨ L,NP,UP, /0, /0) =⇒}

CDCL_TWL

(LD∨L_{M, N,U ] {D ∨ L},>,NP,UP, /0,{L})}

if D 6= ⊥ and L satisfies the conditions onJump Jump_Unit (M0K†_{M, N,U, L,NP,UP, /0, /0) =⇒}

CDCL_TWL