• No results found

Hierarchies for efficient clausal entailment checking: With applications to satisfiability and knowledge compilation.

N/A
N/A
Protected

Academic year: 2021

Share "Hierarchies for efficient clausal entailment checking: With applications to satisfiability and knowledge compilation."

Copied!
157
0
0

Loading.... (view fulltext now)

Full text

(1)

Swansea University E-Theses

_________________________________________________________________________

Hierarchies for efficient clausal entailment checking: With

applications to satisfiability and knowledge compilation.

Gwynne, Matthew

How to cite:

_________________________________________________________________________

Gwynne, Matthew (2014) Hierarchies for efficient clausal entailment checking: With applications to satisfiability and knowledge compilation.. thesis, Swansea University.

http://cronfa.swan.ac.uk/Record/cronfa42854

Use policy:

_________________________________________________________________________

This item is brought to you by Swansea University. Any person downloading material is agreeing to abide by the terms of the repository licence: copies of full text items may be used or reproduced in any format or medium, without prior permission for personal research or study, educational or non-commercial purposes only. The copyright for any work remains with the original author unless otherwise specified. The full-text must not be sold in any format or medium without the formal permission of the copyright holder. Permission for multiple reproductions should be obtained from the original author.

Authors are personally responsible for adhering to copyright and publisher restrictions when uploading content to the repository.

Please link to the metadata record in the Swansea University repository, Cronfa (link given in the citation reference above.)

(2)

Hierarchies for efficient clausal entailm ent checking:

w ith applications to satisfiability and knowledge compilation

M atthew Gwynne

Subm itted to Swansea U niversity in

fulfilment of the requirem ents for th e Degree of

D octor of Philosophy.

Swansea University

Prifysgol Abertawe

Swansea University

Julv 2013

(3)

ProQuest Number: 10821244

All rights reserved INFORMATION TO ALL USERS

The qu ality of this repro d u ctio n is d e p e n d e n t upon the q u ality of the copy subm itted. In the unlikely e v e n t that the a u th o r did not send a c o m p le te m anuscript and there are missing pages, these will be note d . Also, if m aterial had to be rem oved,

a n o te will in d ica te the deletion.

uest

ProQuest 10821244

Published by ProQuest LLC(2018). C op yrig ht of the Dissertation is held by the Author.

All rights reserved.

This work is protected against unauthorized copying under Title 17, United States C o d e M icroform Edition © ProQuest LLC.

ProQuest LLC.

789 East Eisenhower Parkway P.O. Box 1346

(4)

D eclaration

This work has not been previously accepted in substance for any degree and is not being con­ currently subm itted in candidature for any degree.

S ig n e d ... v ... (candidate)

D a t e i ...

S tatem en t 1

This thesis is the result of my own investigations, except where otherwise stated. O ther sources are acknowledged bv footnotes griviner exolicit references. A bibliography is appended.

Signed . _ (candidate)

D ate .... ...\ . . $ / . . d i r . / . W . t & : ...

S tatem en t 2

I hereby give my consent for my thesis, if accepted, to be available for photocopying and for

i n t e r - l i b r a r v l o a n , a n d f o r t.hp t.it.lp a n d sum m ary to be made available to outside organisations.

Signed (candidate)

(5)

A b str a c t

Given the NP-completeness of the satisfiability problem, th e search for classes of CNF clause- set w ith poly-time satisfiability checking is of particular interest and applicability (e.g., Horn, 2-C C S, SCUTZ). In areas such as knowledge com pilation (KC) and constraint translation (to C N F), even stronger properties such as clausal entailm ent (i.e., a CN F F logically implies a clause C ) need to be efficiently decidable. This thesis introduces several new hierarchies of clause-sets w ith efficient clausal-entailm ent checking, which extend and subsume existing hierarchies.

T he main object of study is the UCk hierarchy, generalising the UC class of unit-refuta- tion complete clause-sets, introduced in [51] as a targ e t class for knowledge compilation. Via th e theory of “hardness” of clause-sets as developed in [104, 110, 2] the class UC is naturally generalised to UCk, containing those clause-sets which are “unit-refutation complete of level fc” , which is the same as having hardness a t most k. Relating UCk to poly-time satisfiability the class SCUTZ (Single Lookahead Unit Resolution) is considered, which was introduced in [141] as an umbrella class for efficient (poly-time) SAT solving. SjCUTZ generalises to SCUTZk in the same way as UC and one of th e core insights of this thesis is th a t SCUTZk — UCk holds for all k (including SjCUTZ = UC\). One can thus exploit both SAT and KC stream s of intuitions and m ethods for investigations of these hierarchies. As applications it is shown th a t m em bership decision for fixed levels of these hierarchies is coNP-com plete and th a t SCUTZk (= UCk) strongly subsumes hierarchies from [36, 10], as well as other hierarchies for efficient SAT solving.

A natural question is w hether each level allows more succinct representations. A nother core result of this thesis is an answer in the positive sense. W ithout new variables, each level of these hierarchies offer exponentially more succinct representations. This means th a t there are boolean functions w ith only exponential-size equivalent clause-sets at level k, but w ith poly­ size equivalent clause-sets a t level k + 1. This shows th a t UCk forms a knowledge compilation hierarchy inbetween the CN F and PI classes from [46], w ith query efficiency similar to th e PI class (i.e., sets of prim e im plicates), allowing one to trad e query tim e for succinctness.

In the outlook, representations allowing new variables are considered. It is envisioned, due to the strong connections between th e “hardness” notion from [104, 110, 2] and resolution com­ plexity, th a t the hierarchies defined might act as a SA T-orientated alternatives to constraint- based notions of “good” SAT representation such as “m aintaining arc-consistency via unit-clause- propagation” . In this direction it is shown th a t several common CNF representations already fit into th e UCk scheme, and experim ents are provided dem onstrating th a t m odern DPLL and CDCL solvers are able to solve CNF clause-sets a t higher levels oiUCk and WCfc, while solvers can perform poorly when using much larger representations in UC\. This provides m otivation for considering higher levels of UCk for representations in satisfiability solving, and adds an additional dimension to th e question of “good” SAT representations.

(6)

A ck n ow led gem ents

Firstly I ’d like to th an k my examiners, Arnold Beckmann and Ian Gent. Their comments and questions helped shape this thesis into a b etter and more well presented whole and for th a t I am very grateful.

My supervisor, Oliver Kullm ann, needs special m ention for his tireless effort in guiding me not ju st academically in my work b u t also in term s of being a better, more productive person. I th in k it will still be m any years to come before I fully appreciate all of th e lessons I have learnt, and am still learning, from working w ith Oliver. I can not th an k him enough.

I am eternally indebted to the generousity and encouragement of Swansea University Com­ p u ter Science D epartm ent. They have both supported me financially throughout my tim e there and offered so many opportunities to excel. I am particularly grateful to members of the Theory group, including, in no particular order, Ulrich, Monika, John, Arnold, Jens, M arkus, Anton and so m any others, who all helped to make me feel welcome and enlighten me w ith a constant supply of new ideas.

In the same way, I m ust m ention the com panionship of those w ith whom I shared the same journey. W ithout friends like Phillip Jam es, Em m a Thom , Fredrik Forsberg, Liam O ’Reilly, Jennifer Pearson, Tom Owen and m any more, I think my sanity would have slowly ebbed away. My thanks also extend to Stephen Richards, who stoically and patiently suffered living with me throughout my PhD, w ith barely a word of complaint, no m atter how much deserved.

My longer suffering family deserve so m any thanks. I cherish your love and support through­ out my life and appreciate how much such things m atter in shaping one’s p ath in life.

Lastly, but most im portantly, I am grateful for the tireless, absolute and loving support of my partner Emma. T hank you for sticking w ith me throughout the entire journey and for making everything w orth it.

(7)
(8)

C ontents

1 I n tr o d u c tio n 7

1.1 Clausal entailm ent in knowledge compilation and S A T ... 8

1.2 Classes and hierarchies for SAT knowledge c o m p ilatio n ... 12

1.3 Strict hierarchies of re p re se n ta tio n s... 16

1.4 “G ood” SAT representations ... 18

1.5 Sum m ary of r e s u lts ... 23

2 P re lim in a ries on c la u se -se ts an d b o o le a n fu n ctio n s 27 2.1 Boolean fu n c tio n s... 27

2.2 Clause-sets ... 30

2.3 Forced lite ra ls /a s s ig n m e n ts ... 33

2.4 Reductions ... 34

2.5 R e so lu tio n ... 36

2.6 Prim e implicates and im plicants ... 38

2.7 Special classes of clause and clause-sets ... 41

2.8 C onstraint Program m ing and s a tis fia b ility ... 46

3 T h e h a rd n ess o f c la u se -se ts 51 3.1 Generalised unit-clause-propagation ... 52

3.2 Generalised input re s o lu tio n ... 54

3.3 Hardness of unsatisfiable c l a u s e - s e t s ... 55

3.4 Hardness of (arbitrary) c l a u s e - s e ts ... 58

3.5 Propagation-hardness ... 61

3.6 W idth-based h a r d n e s s ... 64

3.7 Rem arks on the term “hardness” ... 66

4 C la sses an d h ierarch ies for S A T k n o w led g e c o m p ila tio n 67 4.1 The SLUR class and e x te n s io n s ... 67

4.2 UCk'- unit-refutation complete clause-sets a t le v e l- k ... 70

4.3 Fundam ental properties of U C k ... 71

4.4 T he SLUR h ie r a r c h y ... 76

4.5 O p t im is a tio n ... 79

4.6 Interleaved hierarchies: UCk-, 'PCk, and W C k... 81

(9)

5 S trict h ierarch ies for eq u iv a len t r e p r e se n ta tio n s 87

5.1 Minimal premise sets and doped c la u s e - s e ts ... 87

5.2 Doping tree c lau se -sets... 92

5.3 Lower b o u n d s ... 97

6 A n a ly sin g th e T s e itin tr a n sla tio n 103 6.1 CNF re p re s e n ta tio n s ... 104

6.2 T he canonical t r a n s l a t i o n ... 106

6.3 The reduced canonical translation ... I l l 6.4 XOR-clause-sets ... 112

7 E x p e r im e n ts 119 7.1 The in s ta n c e s ... 119

7.2 Environm ent and so lv e rs... 121

7.3 Look-ahead s o lv e r s ... 122

7.4 Conflict-driven s o l v e r s ... 122

7.5 D iscu ssio n ... 123

8 C o n clu sio n an d o u tlo o k 135 8.1 Strictness of h ie r a r c h ie s ... 135

8.2 Hard boolean functions handled by o ra c le s ... 138

8.3 Exploring ( t-,w -) h a rd n e s s ... 138

8.4 A theory of “good” SAT re p re s e n ta tio n s ... 139

8.5 Compilation p r o c e d u r e s ... 141

8.6 Translating th e Schaefer c la s s e s ... 141

(10)

C hapter 1

In trod u ction

The (boolean) satisfiability problem (SAT) has been a much studied research area, especially in the last 20 years, seeing applications in verification (see [102]) and bounded model checking (see [22]), scheduling and planning (see [137]), com binatorial m athem atics (see [164]), cryptanalysis (see [122, 41, 11, 150]), and also further afield in areas such as bioinformatics (see [72]); for an overview see [23]. A large p art of the success and popularity of SAT and m odern SAT solvers over other m ethods comes from the simplicity and rich underlying structure of Conjunctive Normal Form (CNF) clause-sets (see C hapter 2), used as the standardised representation format. Given the NP-completeness of th e SAT problem (as proven by Cook in [38]), an im portant them e is the search for relevant classes C of clause-sets F for which one can (at least) decide satisfiability in polynomial time (th a t is, deciding w hether F logically implies the em pty clause); see Section 1.19 in [63] for some basic information.

However, w hat if one wants a stronger property th a n mere poly-tim e satisfiability checking? W hat if one wants to be able to check satisfiability of F under any partial instantiation? Then the (known and fixed) satisfiability of F does not necessarily help. Such properties are im portant in knowledge compilation, where one compiles (offline) a knowledge base into some representa­ tion (e.g., a CNF clause-set) and then wants th a t to be able to perform many (online) queries efficiently (e.g., checking satisfiability under a partial assignment). For this task one requires th a t clausal entailm ent queries (deciding w hether F logically implies some given clause, rather th an just th e em pty clause) can be decided in polynomial time; see [46] for an overview.

Such properties are also im portant when one is interested in translating constraints or boolean functions to some CNF F which will form only part of a larger SAT problem. Most SAT solvers search for a satisfying assignment by building up partial assignments, piece by piece. Search times can be exponentially reduced if the solver can (efficiently) detect the unsatisfiability of F under this partial assignment, and so halt search “in the wrong direction” as soon as possible. This is th e underlying idea behind “good” representations in satisfiability by “m aintaining arc- consistency” via unit-clause propagation, i.e., detect assignments which m ust be m ade (to satisfy the CNF) as early as possible, using linear tim e unit-clause propagation.

This thesis brings together these two areas to form hierarchies, based on hardness notions from [104, 110], for S A T knowledge compilation: compiling all of th e “knowledge” of a constraint or boolean function, (i.e., w hether it is satisfiable under each p artial assignment), into the CNF representation, such th a t partial assignments m ade by m odern SAT solvers (the “queries” ) are efficient. T hree hierarchies are constructed: UCk, based on the UC class introduced by Del Val in [51] for knowledge compilation and strongly connected to tree-resolution space complexity; VCk, based on the VC class introduced in [26] and intim ately connected w ith well-known concept of

(11)

“m aintaining arc-consistency via unit-clause-propagation” (see Section 2.8 of C hapter 2); and the WC/c hierarchy, extending WC = UC b u t now using (improved) asymmetric based notions of f u l /-resolution w idth complexity from [104, 110]. Each fixed level of each of these hierarchies offer poly-tim e clausal entailm ent (as well as other queries) for knowledge compilation, while subsum ing existing classes, and in C hapter 5 it is shown th a t each level allows for exponentially more compression th a n the level below (i.e., the entire hierarchies themselves are useful for knowledge compilation). Furtherm ore, in C hapter 7 it is shown SAT solvers are able to utilise th e strong connections of these hierarchies to resolution to quickly find short proofs for small compressed clause-sets in UCk for low enough k, while dem onstrating poor perform ance on “easier” b u t much larger instances in UC\. Together, the results from C hapter 5 and C hapter 7 dem onstrate both th a t these hierarchies as a whole are useful, not ju st UCi, and also th a t finding “good” SAT representations is a m ulti-dimensional problem, w ith at least “hardness” and size dimensions to consider.

1.1

Clausal entailm ent in know ledge com pilation and SAT

As an illustrative example, consider the boolean function / given by the tru th table in Figure 1.1. W hile in reality this function has been chosen for its ability to concisely convey th e key concepts of this thesis, for th e sake of argum ent, consider two scenarios and interpretations of / :

1. S c e n a rio 1: / encodes the following product configuration problem (see [146] for work on using UC for product configuration problem s):

(a) there is a product (e.g., a com puter, car etc) which can be configured w ith 4 different pd>rts • • • i ^4?

(b) the valid configurations for p are given by those sets S C {i1?. . . , i4} where f(<ps) — 1 and the partial assignment ips '■ {vj '■ ij € 5} x {0,1} is defined by

j^O otherwise.

2. S c e n a rio 2: / encodes scheduling constraints at tim e t in th e following scheduling problem: (a) there are 4 employees e i , . . . , e4 working at a factory;

(b) a valid scheduling of employees to tim e slot t m ust obey the constraints in / , i.e., the valid sets of employees 5 C { e i , . . . , e4} which can be scheduled together at tim e t is given by those sets for which f(<ps) — 1 and th e partial assignment <ps '■ {vi '■ ei £ 5} x {0,1} is defined by

/ 1 Ci G 5 Vs\Vi) := \

0 otherwise.

In this case, / is ju st one constraint in th e larger scheduling problem. Each tim e slot t' € T (for a set of times T) will have its own scheduling constraints for e\ , . . . , e4. In general these constraints might even overlap (e.g., employees m ight not want to work back to back). So overall there is a boolean function / := / A / ' where f encodes together all scheduling constraints a t times other th an t.

(12)

V l v 2 v 3 v 4 f ( v i , V 2 , V 3 , V 4 ) 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 1 0 0 1 0 0 0 0 1 0 1 1 0 1 1 0 1 0 1 1 1 0 V l v 2 v 3 v 4 f ( v i , v 2 , v 3 , v 4 ) 1 0 0 0 0 1 0 0 1 0 1 0 1 0 0 1 0 1 1 1 1 1 0 0 0 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1

Figure 1.1: T ru th for boolean function / : {0, {q^ j y

In th e above scenarios we m ight w ant to consider two different tasks:

1. In Scenario 1, users will a tte m p t to build different configurations of product p and the system m ust be able to quickly retu rn w hether these configurations are valid. T h at is, we w ant to represent / in such a way as to be able to efficiently answer queries regarding its satisfiability under certain assignments.

2. In Scenario 2, th e larger scheduling problem / is given only implicitly as th e conjunction of the smaller scheduling problem s for each tim eslot, and so the task here is find repre­ sentations for / (and for / ' ) such th a t it is efficient to decide w hether there exists a valid scheduling of the employees across tim eslots (i.e., deciding satisfiability for / ) .

To answer these two tasks we will examine th e following 3 Conjunctive Normal Form (CNF) clause-set representations of / where for each CNF th e order and operations have been ex­ tracted away (recall th a t boolean conjunction ( A ) and disjunction ( V ) are both associative and com m utative). F 0 '■= { { V l , v 5 , n } , { v 2 , V 3 , V j } , { v 2 , V 3 , V 4 } , { v 5 , V 3 , V 4 } , { v i , V 3 , V 4 } , { v 1 , V 2 } }• V V ^ V y . ' ---V--- s , --- ' V — . y — ^ S v --- ' " ^ ' C j C 2 C3 C4 C5 Ce F i := { fol,V j,V j } : { V 2 , V 3 , V 4 },{ V 2, V3 ,^4}, {^2, ^3,^4}, C i C2 C3 C4 Cq F 2 := { {«1 , ^ , V 4 } , { v 2 , V 3 , v j } , { v 2 , V 3 , V 4 } , { V 2 i V 3 , V 4 } , { v 1 , V 3 , V 4 } }. ' v ' ' s, ' ^ v / v , y s '---v---' C x C 2 C 3 C 4 C 5

For each of th e above clause-sets th e CNF interpretation is applied, i.e.,

/ =

a

V i = A V i = A V x

C G Po x G C* C G -Pi x G C C G P2 %G C

So, for example, interp rettin g Fq as a CNF, we have

f — (v 1 V -1U3 V -1U4) A (v2 V v3 V v4) A (v2 V ~^v3 V v4) A (~iv2 V v3 V v4) A (tq V ^ 3 V v4) A (iq V v2)

T a s k 1: Fo, F\ and F2 are expected to be able to “efficiently” answer queries in the form of

assignm ents to th e variables to check th e consistency of potential product configurations. To illustrate th e different possibilities and an initial insight into th e UCk hierarchy (introduced in C hapter 2), consider these representations of F under th e following queries.

(13)

1. Consider Fo and the p artial assignment (fc6 (v i ~ > 0>v2 —> 0) (he-, is set to 0 and

V2 is set to 0). By applying stan d ard simplifications (i.e., removal of satisfied clauses and

falsified literals) on th e C NF we see im mediately th a t (pc6 * / (i.e., / after applying ty?c6)

is unsatisfiable, i.e., any product configuration not containing either ii or 1 2 is invalid:

<PCe * F o = { { ^1, ^3, V 4 } , { ^2, ^3, ^ } , { v2, U3i M , { u 2, U 3 , U4} , { u i , V3, U4} , { v l , V 2 } }

X X X ✓ X X X

= { {U3, V4}, {^3,^4}, {^3,^4}, {v3,U4}, J.}.

T h a t (ui —>■ 0, V2 —> 0) * / is unsatisfiable follows via th e observation th a t the em pty clause

(i.e., th e em pty disjunction) is derived.

2. Consider Fi and th e query ipc*, (vi —> 0,^3 —> 0, v4 —»• 0). Now when applying the same simplifications as for th e above case, we do not im mediately derive the em pty clause and so do not “im m ediately” detect th a t <pc5 * f = 0:

V C S * F \ = { { V l , V 3 , V 4 } , { V 2 , V 3 , V ^ } , { v 2 , V 5 , V 4 } , { v 2 , V 3 , V 4 } , { v 1 , V 2 } }

X ✓ ✓ X ✓ ✓ X X X X

= { {V 3,w },{^2},{v2} }•

However, observe th a t further setting V2 to 0 would yield th e em pty clause (due to {^2})

- therefore if one were to satisfy ipc5 * Fi then such a satisfying assignment m ust set V2

to 1. T h a t is, we m ust satisfy th e unit-clause {^2}- P ropagating (V2 —> 1) then yields the

em pty clause due to th e other unit-clause {^2}. In this way, this simple form of lookahead

( “unit-clause propagation” - see Section 2.4 of C hapter 2) determ ines the unsatisfiability of <pc5 * F\ and so th e incom patibility of th e corresponding product configuration.

3. Finally, consider F2 and th e query ipc6 '■= (ui 0,V2 —> 0). This time, applying the same

simplifications above yields a C N F clause-set w ithout any unit-clauses and so unit-clause propagation does nothing:

VC6 * F0 = {{ui,U3,U4},{u2,U3,W },{U2,t£,U4},{U2,U3,M ,{t>l,U3,U4}}

X X X ✓ X

Instead, we can apply the same reasoning b u t one level higher. T h a t is, checking w hether a literal is forced by setting it and then checking via unit-clause propagation w hether th a t assignm ent results in / being unsatisfiable (this is failed literal elimination).

More precisely, in this case we have (V3 —> 0) * (<Pc6 * Fo) = {{W}? {^4}} which then via

unit-clause-propagation is obviously unsatisfiable. Therefore, all satisfying assignments to (<pc6 * Fo) m ust set V3 to 1, however ( 0 3 —> 1) * ((fc6 * Fo) = {{W}-, {^4}} which again by

unit-clause propagation is unsatisfiable. Therefore (<p c6 * Fo) is unsatisfiable.

These three representations offer a trade-off between th e size of the representation (note Fo is larger th a n F\ and F2) and the level of lookahead necessary to decide clausal entailment (i.e.,

w hether some assignm ent falsifies / ) . T he above examples are small bu t in general the trade-offs can be exponential as will be shown in C hapter 5.

T a s k 2 In this case, we assume a fixed C NF clause-set representation F ' of f and vary th e representation of / over F o ,F i and F2. This yields th ree CNF clause-set representations Fo :=

(14)

Fo <p I (fi —>0) | (v2-»0) 1 X ±

Figure 1.2: (P a rtia l) b ack track in g trees for Fq,F[ and F2 respectively.

d eterm in e th e satisfiability of / (i.e., th e satisfiability of Fo, F i or equivalently F2) and hence

th e existence of a valid schedule in th e underlying scheduling problem .

T h e m ost fu ndam ental tech n iq u e in m odern SAT solving, underlying b o th m odern D P L L J) (see [88]) and C D C L 2> ( see [121]) m e th o d s is b acktracking (albeit augm ented w ith clause-learning and non-chronological b ackjum ping in C D C L ). T h a t is, essentially th e solver a tte m p ts to find a satisfying assignm ent for th e in p u t C N F by building up a satisfying assignm ent piece-by- piece, applying sim ple red u ctio n s as th e solver proceeds, an d b acktracking and try in g a different assignm ent w hen a conflict (i.e., th e e m p ty clause) is d etected . By considering th e (p artial)

b acktracking trees given in F ig u re 1.2 one can see th a t in each case, depending 011 th e choices

th e SAT solver makes, th e sub-problem on / can becom e unsatisfiable (i.e., th e c u rre n t p a rtia l assignm ent falsifies F 0 resp. F\ resp. F2) b u t th e solver does not d e te c t th is until m uch la te r (i.e., th e solver continues to b ack track until it d etects th e e m p ty clause). In p a rtic u la r, we see th a t

1. W hen Fo is m ade unsatisfiable u n d er th e p a rtia l assignm ent (t>i —>• 0,4*2 —>• 0), th is is

im m ediately detected by th e SAT solver because th e clause {^1, ^2} yields th e e m p ty clause

under th is assignm ent; no fu rth e r backtracking is necessary.

2. W hen F \ is m ade unsatisfiable u n d er th e p a rtia l assignm ent ( v \ —» 0,^3 —» 0,^4 —>■ 0)

th e n backtracking m ust still lookahead one level to d eterm in e th is (essentially m im icking unit-clause p ro p ag atio n ).

3. W hen F2 is m ade u n satisfiab le un d er th e p a rtia l assignm ent (iq —> 0, i>2 —> 0) th e n th e solver m ust now b ack track even m ore, looking ahead two levels ra th e r th a n one.

A gain as in Task 1, we see a trade-off, F 0 is larger b u t w hen an inconsistent assignm ent is m ade, th is is im m ediately obvious and no fu rth e r b acktracking is necessary. In general, a solver m ay (unnecessarily) need to search an ex p o n en tial size back track in g tre e to d eterm in e unsatisfiabil­ ity, and so th e search for re p re sen tatio n s where backtracking tre e s are sh o rt, or at least short backtracking trees exist are g u aran teed to exist, are very useful in reducing solver tim e.

T h e aim of th is thesis is to develop n a tu ra l and elegant C N F “targ et-classes” to represent boolean functions such th a t th e above queries and u n satisfiab ility d etectio n are efficient. In

n T h e D av is-P u tn am -L o g em an n -L o v elan d (D P L L ) alg o rith m ap p lies b ack track in g , along w ith u n it-c la u se e lim ­

in a tio n (see D efinition 2.4.4) an d pure literal e lim in a tio n (see D efinition 2.4.3) ap p lied a t every n o d e in th e tree.

2)T h e C onflict-D riven C lause L earn in g (C D C L ) alg o rith m a p p lie s b a ck trac k in g w ith clause learning an d non-

(15)

both cases, this essentially boils down to requiring th a t the representation allows efficient clausal entailment checking, i.e., checking unsatisfiability is efficient under any partial assignment. The meaning of the naturality and elegance conditions requirement is not ju st th a t the classes defined are not “ad-hoc” but also th a t the classes defined connect well w ith existing concepts and classes (e.g., resolution and existing poly-time SAT classes - see C hapter 4).

1.2

Classes and hierarchies for SAT know ledge com pilation

The concept of “SAT knowledge com pilation” comes about by bringing together two previously unconnected stream s of research from SAT and knowledge compilation:

S L U R The SLUR algorithm is an incom plete linear-tim e SAT-decision algorithm , based on look-ahead via unit-clause propagation.

U C The class UC of unit-refutation com plete clause-sets enables clausal-entailm ent decision in linear time via unit-clause propagation.

T h a t both stream s are based on unit-clause propagation, which is also the basic tool for efficient SAT solving, is considered here as an essential feature. It means th a t actually we have some form of “SAT knowledge com pilation” , where th e “knowledge” is compiled in such a way th a t a SAT solver can “understand” it!

1.2.1 The quest for SLUR hierarchies

In the year 1995 in [141] the SLUR algorithm was introduced, a simple incomplete non-determ inistic SAT-decision algorithm, which always succeeded on various classes w ith polynom ial-tim e SAT decision where previously only rather com plicated algorithms were known. The com putation is divided into two phases for input-clause-set F: F irst we check via unit-clause propagation (UCP) for unsatisfiability. If this check fails, th en we assume F is satisfiable, and guess a sat­ isfying assignment, using UCP-look-ahead for the guessed assignments to avoid obviously false assignments. The class SCUTZ contains those F where this algorithm always succeeds (i.e., de­ term ines unsatisfiability in the first phase, or always finds a satisfying assignment in th e second phase).

The natural question arises, whether SCUTZ can be turned into a hierarchy, covering in the limit all clause-sets. A generalisation of SLUR has been considered in [64] under the nam e “ISLUR” (improved SLUR), allowing a polynom ial num ber p(£(F)) of backtracks (for a fixed polynomial p, in the input-size 2(F)), in th e unsatisfiability as well as in the satisfiability phase of the SLUR algorithm , before giving up. It is mentioned th a t ISLUR gives up on every large enough “sparse” clause-set (which are “typical” as random k-CNF clause-sets), when no variable occurs “too often” . This was considered to be “disappointing” — bu t from the point of this thesis the value of the class SCUTZ lies not in being a “big” class of clause-sets w ith polynom ial-tim e SAT solving, but in establishing a basic targ e t class for representations of boolean functions w ith very strong properties via clause-set. For all fixed k there exists a polynomial p such th a t the fc-th level of the hierarchy, SCUTZk (introduced in C hapter 4), is contained in the class ISLUR (those clause-sets where the ISLUR algorithm never gives up). So all levels are negligible when considering th e above sparse clause-sets, b u t as we show in C hapter 5, nevertheless this hierarchy is proper regarding good representations of boolean functions, and the param eter k is meaningful and robust (not ju st a numerical param eter like the polynomial p).

In [36, 10] it was finally proved th a t m em bership decision of SCUTZ is coNP-complete, and three hierarchies, S£UTZ(k),SCUTZ*(k) and CANON(fc) were presented. It still seemed th a t none

(16)

of these hierarchies is th e final answer, though they all introduce a certain natu ral intuition. This thesis now presents w hat seems th e n atu ral “limit hierarchy” , called here SCUTZk, which unifies the two basic intuitions em bodied in SCUTZ(k), SCUTZ*(k) on the one hand and CANON(fc) on th e other hand.

In order to do so a precise analysis of the SCUTZ-c\ass is needed. A SLUR transition relation F > F ' between clause-sets F, F ' is introduced, which makes precise one non-determ inistic step of th e SLUR-algorithm. T his transition from F to F ' happens when assigning a (single) literal in such a way th a t U CP does not create the em pty clause. The core idea of the classes SCUTZ(k) and SCUTZ*(k) is to strengthen th e transition relation by requesting th a t not ju st one literal is choosable, b u t actually k literals can be chosen, while th e difference between them is th a t SCUTZ*(k) performs U CP inbetween the choices, while th e weaker class SCUTZ(k) does not.

Before th e solution can be described, th e S£UTZk-hiera,rchy, the second source of “SAT knowl­ edge com pilation” approach m ust be discussed, th e class UC of “unit-refutation complete clause- sets” , which is related to the stream embodied by CANON(fc).

1.2.2

U nit-refu tation com pleteness and “hardness”

In th e year 1994 in [51] the class UC was introduced, containing clause-sets F such th a t clausal entailm ent, th a t is, w hether F \= C holds (clause C follows logically from F , i.e., C is an implicate of F ), can be decided by unit-clause propagation. The m otivation was knowledge compilation, th a t is, to have a m ore succinct alternative to th e use of the set of all prime implicates (i.e., all m inim al CN F clauses entailed by F ; see Section 2.6) of a given clause-set Fo (clausal database), for which one seeks an equivalent F such th a t clausal entailm ent can be decided quickly.

A second development is im portant here, nam ely the development of the notion of “hardness” in [104, 110, 2]. T he first source [104] from 1999 introduced the notion of hardness as a measure hd0 : CCS —> N o , assigning natu ral num bers to clause-sets in th e following way (using S A T C CCS for th e satisfiable clause-sets, and U S A T := CCS \ S A T ) :

• hdo(F ) := 0 for th e simplest clause-sets F G CCS regarding SAT decision, containing the em pty clause (i.e., ± G F ) or being em pty (i.e., F = T ).3)

• hd0(F ) < k for k > 1 iff there is a literal x such th a t for F ' := (x —>■ 0) * F (setting x to 0) we have h d o (F ') < k — 1 and either F ' G U S A T and hdo((x —»■ 1) * F ) < k, or F ' G S A T - T he precise value of h d (F ) is then the minimum k such th a t h d (F ) < k.

In fact, for unsatisfiable clause-sets F , h d o ( F ) is th e minimum “H orton-Strahler” num ber of any

tree-resolution refutation of F and [104] showed th a t from h d o ( F ) we get upper and lower bounds

on th e tree-resolution complexity of F (see Section 3.3 of C hapter 3). h d o can be com puted in

tim e 0 (£ ( F ) • n ( F ) 2k~ 2) (essentially via breath-first search - see [104]).

T he second source [110] from 2004 generalised this approach to constraint satisfaction prob­ lems (and beyond). T he th ird source [2] from 2008 considered hdo(F ) on unsatisfiable clause-sets F € U S A T , relating it to backdoors, cycle-cutsets and treew idth, and performing an experim en­ tal study on random instances. Also in [2] we find a different extension of hdo : U S A T —> No to a m easure hd : CCS —>■ No, using for satisfiable instances F G S A T the m axim isation over all unsatisfiable sub-instances obtained by applying partial assignments. This hardness notion is

3) A ctu ally a tw o-dim ensional family h d ^ s o f such m easures was introduced, based on oracles U C U S A T , S C S A T for deciding unsatisfiability resp. satisfiability, and settin g h d j^ s fF ) := 0 for F E U U S . In this thesis we consider only th e sim plest base case hdo = hdw0l50 , where Uq := { FCCS : l £ f } and <S := { T } . Oracle S does not play a role in th e settin g o f th is thesis, which is fully unsatisfiability-based. See Subsection 4.3.2 for more inform ation on th ese hierarchies, and see Section 6.1 for discussions on relativised hardness.

(17)

harder to measure: as is shown in Theorem 4.4.5 of this thesis, determ ining w hether h d (F ) < k holds for a fixed k > 1 is coNP-complete, while h d o {F) < k can be decided in polynomial tim e (for fixed k). Nevertheless it is the central measure for this thesis, and is considered as measuring “representation hardness” , while h d o measures “solver hardness” .4)

As will be shown in Theorem 4.2.2, h d (F ) < k is equivalent to the property of F , th a t all implicates of F (i.e., all clauses C with F \= C) can be derived by fc-times nested input resolution from F , a generalisation of input resolution as introduced and studied in [104, 110].5) So we obtain th a t UC is precisely the class of clause-sets F w ith h d (F ) < 1 ! It is th en natural to define the hierarchy UCk via th e property h d (F ) < k. The hierarchy CANON(A;) is based on resolution trees of height a t most k, which is a special case of fc-times nested input resolution, and so we have CANON(Ar) C UCk (see Theorem 4.4.6).

1.2.3

Bringing SLUR and UC together

In order to get back to SLUR, we need to emphasise the two-sided natu re of the hardness measure, as developed in [104, 110]. In Subsection 1.2.2 the proof-theoretic side of it was discussed. The algorithmic side is given by th e reductions r^ : CCS —>■ CCS (introduced in [104]), which perform certain forced assignments:

1. ri is unit-clause propagation (UCP), assigning x —> 1 for unit-clauses {x} until all are eliminated.

2. T2 is (complete) failed-literal elim ination, assigning, while possible, x —> 1 for literals x

such th a t the assignment x —> 0 yields a contradiction via r i ; see Section 5.2.1 in [88] for the usage of failed literals in SAT solvers (so-called “look-ahead solvers” ), and see Section 7.2.2 in [114] for the general explanation of r2 being the “look-ahead version” of ri.

3. In general r^+i is the “look-ahead version” of r^, assigning, while possible, x -» 1 for literals x such th a t the assignment x -> 0 yields a contradiction via r^.

For unsatisfiable F the hardness h d (F ) is equal to th e minimal k such th a t rk{F) detects unsat­ isfiability of F , i.e., rfc(F) = {_L}. This yields the basic observation UC C SJCUTZ — and actually we have UC - SCUTZ !

So by replacing the use of ri in the SLUR algorithm by r^ (using the analysis in C hapter 3 via the transition relation) we obtain a natural hierarchy SCUTZk, which includes the previous SLUR-hierarchies SCUTZ(k) and SCUTZ*(k), as well as various other classes and hierarchies for poly-time SAT, and where we have SCUTZk — UCk• This equality of these two hierarchies, and th e dual perspective offered (algorithmic and proof theoretic), is th e argum ent th a t th e “limit hierarchy” for SLUR has been found.

1.2.4 PC and arc-consistency

A similar class to UC, bu t coming initially from a SAT perspective, is th e class VC, introduced in [26], of propagation complete clause-sets. W hereas UC treats ri as a consistency-checker, VC uses ri to detect all forced assignments under any partial assignment. As UC is generalised to UCk, FC is now generalised to the hierarchy VCk in C hapter 4. VCk th en provides interm ediate

4^hd(F) actually captures tree-like resolution (in a sense). In Section 3.6.2 a w idth-based measure of hardness is discussed, which captures dag-like resolution. T his thesis considers the tree-hardness as the natural starting point.

5)Equivalently, as shown in [104, 110], one can say th a t all im plicates C have a tree-resolution proof using space at m ost k + 1.

(18)

layers to th e UCk hierarchy (i.e., we have VCq C UCq C VC\ C UC\ C VC2 • • •) and relates more

directly to th e notion of a clause-set “m aintaining arc-consistency via unit-clause propagation” (see Section 2.8 in C hapter 2).

T h a t “arc-consistency” is “m aintained” via unit-clause propagation means th a t for all (par­ tial) assignments to the variables of th e constraint, if there is a forced assignment (i.e., some variable which m ust be set to a particular value to avoid inconsistency), then unit-clause prop­ agation (UCP) is sufficient to find and set this assignment. M aintaining arc-consistency and propagation-com pleteness may a t a glance seem the same concept (both ask for forced literals to be propagated by ri). However there is an essential difference. W hen translating a constraint into SAT, typically one does not ju st use th e variables of th e constraint, bu t one adds auxiliary vari­ ables to allow for a com pact representation. Now when one speaks of m aintaining arc-consistency, one only cares about assignments to the constraint variables. However propagation-completeness deals only w ith the representation clause-set, thus can not know about the distinction between original and auxiliary variables, and thus it is a property on the (partial) assignments over all variables! A SAT representation, which m aintains arc-consistency via UCP, will in general not be propagation-com plete, due to assignm ents over b oth constraint and new variables yielding a forced assignm ent or even an inconsistency which U CP doesn’t detect; see Example 6.2.5 and Lem m a 6.2.6. In contrast to this, for the basic concepts of “good” representations investigated in this thesis, considering all variables is a fundam ental feature.

1.2.5

WCjt: a hierarchy based on full resolution

As already m entioned in Section 1.2.2 (to be further elaborated in C hapter 3), there are strong proof theoretic connections for UCk to tree-resolution. In [95] th e argum ent is made th a t tree- resolution com plexity can not provide a good m easure of hardness of instances for SAT solving, citing th e ability of m odern Conflict Driven Clause Learning (CDCL) solvers to sim ulate ex­ ponentially more powerful full-resolution (see [4, 129, 130] for evidence th a t CDCL solvers can sim ulate full resolution). However, the aim of UCk is not to measure hardness, bu t to offer a target class for SAT translation. In this respect tree-resolution complexity measures are ideal, because they provide th e strongest translations, and upper-bound measures for full resolution.

UCk being based on the hdo, th e hardness measure from [104], which has strong connections to tree-resolution, m eans th a t if an unsatisfiable (sub-)clause-set has a high tree-resolution com­ plexity (see Definition 2.5.3 in C hapter 2) th en it has a high hardness, even if there exists a short resolution proof for it. For tighter targ e t classes in th e case of full resolution, the notion of w idth-hardness as introduced in [77, 78] is also considered, based on the w idth-based hierarchies of unsatisfiable clause-sets in [104, 110]. T h a t is, a clause-set is in WCfc, the hierarchy of clause- sets of w idth-hardness fc, iff under any partial assignment resulting in an unsatisfiable clause-set there is a “fc-resolution” refutation as introduced in [99]. Here, unlike th e typical notion of w idth, resolutions where only one parent clause needs to have length a t m ost k are allowed, thus prop­ erly generalising unit-resolution (one could speak of “asym m etric w idth” here, compared to the stan d ard “sym m etric w idth” ). This allows the sim ulation of nested input resolution, and thus we have UCk ^ WC/c for all k, whereas otherwise in the standard (sym metrical) sense even Horn clause-sets require unbounded w idth (recall th a t H O C UC\). W hile UCk offers th e stronger guarantee th a t short tree-resolution proofs exist, WCk relaxes this condition to allow clause-sets which m ight have only short fu l/-resolution proofs. In this way, one can prove lowerbounds for w idth-hardness and upper-bounds for (tree-)hardness and hence prove th e precise width- and tree- hardness for certain classes of clause-sets (see e.g., Section 1.3 and C hapter 5).

(19)

1.3

Strict hierarchies o f representations

Considering these hierarchies as targ et classes for SAT translations and knowledge compilation, we know at least th a t for every k G No and every boolean function / there is some CNF representation F G U C q (the prime im plicate representation of F). A natural question to ask is

w hether going from e.g., UCk to UCk+i means th a t certain boolean functions can be represented in polynomial size which only had exponential-size representations before. T h a t is, in term s of representation, does each level offer more, or does th e hierarchy collapse to U C \l

In [51] (Example 2) a separation was already shown between U C q (clause-sets containing

all of their prime im plicates) and UC\ = UC, and the question was raised of th e worst-case growth when compiling from an arb itra ry CNF clause-set F to some equivalent F ' G UC. This question was p artly answered in [17] (although th e connection was not made), where examples are provided of poly-size clause-sets w ith only super-polynomial size representations in UC, even when allowing new variables (see Subsections 6.1 and 8.2 for more on the connection between [17] and UCk)- This shows a super-polynomial lower-bound on the worst-case growth, but no m ethod or new (larger) target-class for knowledge-compilation. Theorem 5.3.13 in C hapter 5 now answers the question of worst-case growth from [51] in full generality w ith th e hierarchy UCk- Each level of UCk is exponentially more expressive th a n the previous (i.e., w ith possible exponential blow-up when compiling from some F G UCk+i to equivalent F ' G UCk), and so each level offers its own new, larger class for knowledge compilation, a t the expense of increased query tim e (now 0 (£ ( F ) • n ( F ) 2k~2) for UCk compared to 0 (£ (F )) for UC). This separation, between UCk+i and UCk for arb itrary k, is more involved th an th e simple separation in [51], due to the param eterised use of more advanced polynom ial-tim e m ethods th a n ri (UCP). Especially th e separation between U C q and UC\ is ra th e r simple, since U C q does not allow any form of

compression (i.e., simply checking for th e em pty clause propagates nothing and assigns to no variables).

1.3.1 Separating the UCk and WCk hierarchies

A sequence (f h ) h e N of boolean functions, which separates UCk+i from UCk w .r.t. clause-sets

equivalent to f h in UCk+i resp. UCk, should have th e following properties:

1. A large n um ber o f p rim e im p licates: th e num ber of prim e implicates for fh should at least grow super-polynomially in h , since otherwise already the set of prime implicates is a small clause-set in U Cq (see Definition 3.4.1) equivalent to f h

-2. E a sily ch a ra cterised p rim e im p licates: th e prime implicates of fh should be easily characterised, since otherwise we can not understand how clause-sets equivalent to f h look

like.

3. P o ly -siz e rep resen ta tio n s: there m ust exist short clause-sets in UCk+i equivalent to fh for all / i g N .

[148] introduced a special type of boolean functions, called there Non-repeating U nate Decision trees (NUD), by adding new variables to each clause of clause-sets in SM.U§=i, which is the class of unsatisfiable hitting clause-sets of deficiency 5 = 1 (see Section 5.2.1 of C hapter 5). These boolean functions have a large number of prim e implicates (the maximum regarding the origi­ nal number of clauses), and thus are n atu ral to consider as candidates to separate th e levels of UCk- Theorem 5.1.18 in Section 5.1 shows th a t it is actually the underlying S M U $ =i clause-sets th a t contribute the structure. The clause-sets in S A iU $ = i are exactly those with th e m aximum num ber of “minimal premise sets” , and th en “doping” elements of SM .Us=i yields clause-sets

(20)

w ith th e maximal number of prim e implicates. Theorem 5.3.13 th en uses th e tree structure of i S M .U {5= i to prove lower bounds on the size of equivalent representations of doped S M .U s =\

(clause-sets in

UCk-To be able to prove properties about all equivalent representations of some CNF clause- sset F, we m ust be able to understand its com binatorial stru ctu re in relation to th e set of all iits prime implicates. T he notion of minimal unsatisfiability (MU) and m inim ally unsatisfiable ^subsets (MUS) is im portant in understanding th e combinatorics of unsatisfiable clause-sets (see [[100, 120]). To understand the stru ctu re of satisfiable clause-sets and their associated boolean ffunctions, Section 5.1 of C hapter 5 now considers the concept of “m inim al premise sets” (MPS) iintroduced in [116]. The notion of MPS generalises th a t of MUS by considering clause-sets F w h ich are minimal w .r.t implying any clause C rather th a n ju st those implying JL (the em pty cclause). And accordingly one can consider the minimal-premise subsets (MPSS) of a clause-set IF.

Every prime implicate C of a clause-set F has an associated MPSS (just consider th e minimal ssub-clause-set of F th a t implies C ), bu t not every MPSS of F yields a prim e im plicate of F (e.g., oconsider the MPSS {C } for some non-prime clause C e F). By “doping” th e clause-set, i.e., aadding a new unique variable to every clause, every clause in an MPSS F ' makes a unique ccontribution to its derived clause C. This results in a new clause-set D (F ) which has an exact (correspondence between its minimal premise sets (which are (essentially) also those of F) and its {prime implicates. In this way, by considering clause-sets F w ith a very structured set of minimal {premise subsets, we can derive clause-sets D (F ) with very structured set of prim e implicates.

Section 5.3 of C hapter 5 introduces th e basic m ethod (see Theorem 5.3.4) for lower bounding fche size of equivalent clause-sets of a given hardness, via the transversal num ber of “trigger hiypergraphs” . Using this lower bound m ethod, Theorem 5.3.12 shows a lower bound on the rmatching number of th e trigger hypergraph of doped “extrem al” «SAdZ^= i-clause-sets. From tchis follows immediately Theorem 5.3.13, th a t for every k G No there are polysize clause-sets in

lUCk+i, where every equivalent clause-set in WCk is of exponential size. T hus the UCk as well as tihe WCk hierarchy is strict regarding equivalence of polysize clause-sets.

1L.3.2

Strict hierarchies for knowledge com pilation

L ooking at UCk = SCUTZk again from the UC perspective, i.e., for knowledge compilation, the q uestio n is where does it fit w ith respect to existing knowledge com pilation classes? [34] gives am overview of the CNF-based targ e t languages (prime implicates, UC, 2-CCS, Horn clause- scets). [59] consider disjunctions of simple CNF classes. [46] provides an overview of targ et ccompilation languages based on “nested” (graph-based) classes, nam ely variants of NNF, DNNF amd BDDs. In all cases query complexity and succinctness is investigated. This thesis focuses om CNF representations, w ith the hope in the outlook towards good representations for current rcesolution-based SAT solvers. All of the CNF classes studied in [34, 46, 59] are included at the fiirst three levels of the hierarchy UCk, namely, sets of prim e implicates in U Cq, (renam able) Horn

cllause-sets in UC\ =UC, and 2-C C S in UC 2

-We will see in Theorem 4.7.1 in C hapter 4 th a t UCk,VCk and WCk have th e same poly-time qiueries as the PI knowledge com pilation class (i.e., th e class of clause-sets which are sets of pirime implicates; see Section 2.6 of C hapter 2). This is shown in com parison to other knowledge ccompilation classes in Figure 1.4 (w ith full discussion in C hapter 4). Along w ith separation reesult in Theorem 5.3.13, this means th a t UCk, VCk and WCk form interm ediate layers between tHie PI and CNF knowledge com pilation classes, offering increasing more succinct representations fuirther up the hierarchy, at the cost of polynomially more query effort (illustrated in Figure 1.5). F urtherm ore, every fixed level of each hierarchy is a complete class w ith respect to representation

(21)

N o ta t io n Q u e ry

C O poly-time consistency (satisfiability) checking

V A poly-tim e validity (tautology) checking

C E poly-time clausal entailm ent checking

IM poly-tim e im plicant check

E Q poly-time equivalence checking

S E poly-time semantic entailm ent

C T poly-tim e model counting

M E poly-tim e model enum eration

Figure 1.3: N otations for knowledge com pilation queries. Full details are available in Theorem 4.7.1 in C hapter 4 and in [46].

of boolean functions (as are all in Figure 1.5), unlike classes such as 2-C C S (CNF clause-sets w ith clauses of size a t most two) or H O (Horn clause-sets) and other hierarchies for polynomial tim e satisfiability like VSHOk (generalised renam able Horn clause-sets; see Section 2.7.2.1); each of which is included a t some fixed level oiU C k C W C k .

An im portant point here is th a t, although th e separation between levels oiUCk, VCk and WCk is only w .r.t to equivalent representations (i.e., w ithout new variables), this perfectly fits into the knowledge compilation framework. Allowing the addition of new variables, i.e., moving to classes such as 3UC from [25], means th a t queries such as SE (semantic entailm ent, i.e., checking w hether one CNF clause-set F £ 3 UC entails another F ' £ 3 UC on free variables) are no longer poly-time decidable because unlike for CNF clause-sets (w ithout new variables) one can no longer enum erate sets of falsifying assignments on th e free (original) variables by enum erating clauses of the clause-set (i.e., for a CN F clause-set F we have F |= F ' iff VC £ F ' : F |= C which is not the case for existentially quantified CNFs). In particular, it is shown in [25] th a t 3 UC, as well as other query classes built by taking th e closure of UC under disjunction, does not allow poly-time VA, IM, EQ or SE queries (as shown in Figure 1.4) while UC, and now more generally UCk, does.

1.4

“G oo d ” SAT representations

So we have seen th a t th e UCk, VCk and WCk hierarchies are promising target classes for knowl­ edge com pilation (KC) and polynomial tim e satisfiability, offering interm ediate layers between the CNF and PI classes in KC. Looking back a t the S C U V k perspective and connections to resolution (and hence to m odern SAT solvers), in the outlook these classes should also provide good target classes for SAT translation, i.e., th e translation to CNFs to be solved by m odern SAT solvers. In general, for translations to SAT a typical p ath is

P r o b l e m —> C o n s tr a in ts —> B o o le a n f u n c tio n s —>■ SA T .

' '

f o c u s o f t h i s t h e s is

The VCk,UCk and WCk classes now offer “good” targ et classes for representing boolean functions (i.e., after a constraint is encoded as a boolean function) in the sense th a t they guarantee the existence of short tree- resp. full-resolution proofs. In [4, 129, 130] we see evidence th a t modern CDCL SAT solvers are able to find resolution proofs w ith low complexity in expected polynomial tim e. Furtherm ore, in C hapter 7, we see experim ental evidence th a t when the level of UCk is low enough m odern state-of-the-art SAT solvers are able to solve these problems relatively quickly.

(22)

L C O V A C E IM EQ SE C T M E NNF o o o o o o o o DNNF o o o o o d-NNF o o o o o o o o s-NNF o o o o o o o o f-NNF o o o o o o o o d-DNNF ? o ✓ sd-DNNF ? o BDD o o o o o o o o FBDD ? o OBDD o OBDD< DNF o o o o o CNF o o o o o o PI o IP ✓ ✓ ✓ o ✓ MODS ✓ ✓ u c k o r c k o W C k ✓ ✓ ✓ o ✓ 3UCoo o o oU C [ V ,3}oo o o o

Figure 1.4: Subsets of the NNF language and their corresponding polytim e queries, ^ m e a n s “query possible in polytim e” , o means “not possible in polytim e (in general) unless P = N P ” and ? th a t there is no known result either way. Results and definitions for UCk, VCk and WCk are from Theorem 4.7.1 and C hapter 4 resp., results and definitions for 3UC and UC[ V , 3 ] are from [25] and all other results and definitions are from [46]. See Figure 1.3 for query descriptions. [An NNF-language represents a boolean function as a rooted DAG (directed acyclic graph) w ith

A and V a t th e nodes and 1, 0, literal X or literal ->X at th e leaves. Subsets of NNF add constraints on th e DAG such as decomposability (D), determ inism (d-) and smoothness (s-) - see [46] for definitions.]

(23)

P I IP UC UCy C N F D N F N N F WC w c F B D D O B D D D N N F M O D S - u c k - x d -D N N F s d -D N N F O B D D

Figure 1.5: Illustration of th e succinctness relations between knowledge com pilation classes from [46] and those introduced in this thesis. If there is a solid directed p a th (using only —») from class L to class L' then every boolean function with poly-size representations in L ' has a poly-size representation in L (i.e., L is a t least as succinct as L'). If there is no solid directed p ath from L to L ' but there is a directed p ath (using some --->) then it is unknown w hether L is a t least as succinct as language L ' . No directed p ath from L to L' indicates th a t there are boolean functions for which L has a succinct (poly-size) representation b u t L' does not. Results relating to UCk, VCk and WCk are from Theorem 5.3.13; all other results are from [46]; some of which rely on the non-collapse of the polynomial hierarchy. More details are given in Section 4.7 of C hapter 4.

(24)

Furtherm ore, a more direct relation between SAT solving and UCk is possible. The class UCk uses generalised UCP, namely the reduction r*,. Especially r2, which is (complete) failed-literal

elim ination, is used in look-ahead SAT solvers (see [88] for an overview) such as OKsolver ([108]),

march ([89, 87]) and Satz ([119]). Also conflict-driven solvers such as CryptoMiniSat ([149])

and PicoSAT ([18, 20]) integrate r2 during search, and solvers such as Lingeling ([20, 21]) use

r2 as a preprocessing technique. Furtherm ore, in general r*, is used, in even stronger versions,

in th e Stalmarck-solver (see [153, 84, 143], and see Section 3.5 of [104] for a discussion of the connections to r^), and via breadth-first “branch/m erge” rules in HeerHugo (see [73]).

1.4.1

N ew variables: the relative vs absolute condition

C ertain results in this thesis apply only to representing boolean functions without new variables. On th e one hand, fixing the boolean function has certain advantages. We saw for knowledge com­ pilation th a t considering only equivalent representations had the upside th a t certain queries (i.e., VA, EQ, and SE) were poly-time decidable, while not being polytim e decidable once existential quantification was introduced. W hen finding “good” SAT representations, avoiding the use of new variables allows the enum eration and optim isation of representations within a target-class, i.e., the space of all representations is finite and th e optim um representation can be searched for (e.g., see [74]).

On the other hand, when translating constraints to satisfiability, new variables can play a pivotal role, both in term s of succinctness (for example, the XOR clauses in Section 6.4 of C hapter 6 have no short representation w ithout new variables at all) and in term s of power, e.g., many representations which “m aintain arc-consistency via U C P ” do this via the introduction of new variables with certain semantic meaning (see e.g., cardinality constraint translations in [7]). W hen using UCk, VCk and WCk as target-classes for SAT tran slatio n an im portant question is not ju st whether one uses new variables, bu t w hat being in UCk etc means in the presence of new variables.

The most prevalent notion of “good” representation in satisfiability (of some higher level constraint) is one th a t “m aintains arc-consistency via unit-clause propagation” . This is a relative notion th a t requires th a t for any partial assignm ent (instantiation) of the constraint variables (applied to th e representation) unit-clause propagation on th e CNF representation sets variables which are “forced” at the constraint level (see Section 2.8 for an explanation). As w ith the 3UC knowledge compilation class, there is a distinction between th e bound new variables and the free original variables - the properties regarding propagation only apply to p artial assignments over the original variables. Translating a constraint “fully” as a clause-set into one of the UCk, VCk, or WCk hierarchies is instead an absolute condition - th e clause-set knows nothing about the constraint and all partial assignments m ust be considered - a m odern SAT solver can branch on any variable (including new).

In fact, in [94] it is shown th a t conflict-driven solvers w ith branching restricted to input variables have only superpolynomial run-tim e on E P H P ^, an Extended Resolution extension to the well-known pigeon-hole formulas, while unrestricted branching determ ines unsatisfiability quickly (see Section 8.2 for more on this). Also experim entally it is dem onstrated in [96] th a t input-restricted branching can have a detrim ental effect on solver tim es and proof sizes for m odern CDCL solvers. This adds m otivation to the fundam ental choice of considering all variables (rather th a n ju st input variables), when deciding w hat properties we w ant for SAT translations. This is called here the “absolute (representation) condition” , taking also the auxiliary variables into account, while the “relative condition” only considers th e original variables. Besides avoiding the creation of hard unsatisfiable sub-problems, th e absolute condition also enables one to study th e targ et classes, like VC, on their own, w ithout relation to w hat is represented, i.e., UCk is just a class of clause-sets.

References

Related documents

The IESO and all market participants are required to prepare emergency preparedness plans to ensure grid reliability. As part of the emergency planning process, the IESO and

The object of the present Contract consists of the promotion and sale in international markets of the products manufactured by the Company as specified in Annex 1 of the

The capital stock in the wage equation is divided by the labor force in- stead of the level of employment to avoid the introduction of cyclical noise but of course since ln L − ln LF

We examined the effect of the type of the LDL receptor gene mutations and of common gene polymorphisms possibly affecting HDL metabolism [cholesterol ester transfer protein

We find that due to network externalities, prices for both products in the closed source model and for the complementary product in the open source model will begin low in order

This study provides a bibliometric analysis of North American tourism research by using papers published in Annals of Tourism Research , Journal of Travel Research and

1 Although the basic perioperative transesophageal echocardiographic examination consensus statement outlines the specific views required to obtain these measurements, 8 the use

Basic design tasks that have to be accomplished during the design process, determining the structure of a simulation run.. The duration and sector of the tasks as well as