Automated design of boolean satisfiability solvers employing evolutionary computation

(1)

Masters Theses Student Theses and Dissertations

Summer 2016

Automated design of boolean satisfiability solvers employing

evolutionary computation

Alex Raymond Bertels

Follow this and additional works at: https://scholarsmine.mst.edu/masters_theses Part of the Artificial Intelligence and Robotics Commons

Department: Department:

Recommended Citation Recommended Citation

Bertels, Alex Raymond, "Automated design of boolean satisfiability solvers employing evolutionary computation" (2016). Masters Theses. 7549.

https://scholarsmine.mst.edu/masters_theses/7549

This thesis is brought to you by Scholars' Mine, a service of the Missouri S&T Library and Learning Resources. This work is protected by U. S. Copyright Law. Unauthorized use including reproduction for redistribution requires the permission of the copyright holder. For more information, please contact [email protected].

(2)

EMPLOYING EVOLUTIONARY COMPUTATION

by

ALEX RAYMOND BERTELS

A THESIS

Presented to the Faculty of the Graduate School of the

MISSOURI UNIVERSITY OF SCIENCE AND TECHNOLOGY

In Partial Fulfillment of the Requirements for the Degree

MASTER OF SCIENCE IN COMPUTER SCIENCE

2016

Approved by

Dr. Daniel Tauritz, Advisor Dr. Bruce McMillin

(3)

(4)

ABSTRACT

Modern society gives rise to complex problems which sometimes lend them-selves to being transformed into Boolean satisfiability (SAT) decision problems; this thesis presents an example from the program understanding domain. Current conflict-driven clause learning (CDCL) SAT solvers employ all-purpose heuristics for making decisions when finding truth assignments for arbitrary logical expressions called SAT instances. The instances derived from a particular problem class exhibit a unique underlying structure which impacts a solver’s effectiveness. Thus, tailoring the solver heuristics to a particular problem class can significantly enhance the solver’s per-formance; however, manual specialization is very labor intensive. Automated devel-opment may apply hyper-heuristics to search program space by utilizing problem-derived building blocks. This thesis demonstrates the potential for genetic program-ming (GP) powered hyper-heuristic driven automated design of algorithms to create tailored CDCL solvers, in this case through custom variable scoring and learnt clause scoring heuristics, with significantly better performance on targeted classes of SAT problem instances. As the run-time of GP is often dominated by fitness evaluation, evaluating multiple offspring in parallel typically reduces the time incurred by fitness evaluation proportional to the number of parallel processing units. The naive syn-chronous approach requires an entire generation to be evaluated before progressing to the next generation; as such, heterogeneity in the evaluation times will degrade the performance gain, as parallel processing units will have to idle until the longest evalu-ation has completed. This thesis shows empirical evidence justifying the employment of an asynchronous parallel model for GP powered hyper-heuristics applied to SAT solver space, rather than the generational synchronous alternative, for gaining speed-ups in evolution time. Additionally, this thesis explores the use of a multi-objective GP to reveal the trade-off surface between multiple CDCL attributes.

(5)

ACKNOWLEDGMENTS

Before applying to graduate school, I was asked why I wanted to get a master’s degree. Was it for a higher salary or because I was not ready to abandon the warm embrace of academia? While both can be perks of the position, I was primarily eager to make my mark on the field I had spent my undergraduate years immersed in. As every academic can attest to, each attempt to make a contribution is not without sacrifice, and we must acknowledge those people in our lives that give us strength in the times that we struggle. Without a doubt, I am forever indebted to my advisor, Dr. Daniel Tauritz. He gave me encouragement in the face of overwhelming obstacles. From my freshman year, Dr. Tauritz has never passed the opportunity to push me to my limits; I am certainly better for all of his efforts.

I must also thank Dr. Samuel Mulder and Dr. Bruce McMillin. Dr. Mulder has always steered me in the right direction during my semesters in school and sum-mers at internships. Dr. McMillin was more than willing to provide guidance in my course work and ensured that I was worthy of this degree.

I am grateful for all my family and friends that encouraged me through the sleepless nights and holidays away from home. I would especially like to thank my brother, Jacob, my sister, Meredith, and, most importantly, my parents, Matthew and Michele, for their love and support throughout my entire life. To them and all those that I could not have completed this degree without, I dedicate my thesis.

I appreciate the funding providing by Sandia National Laboratories through their Critical Skills Master’s Program that made my graduate studies possible. San-dia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

(6)

TABLE OF CONTENTS

Page

ABSTRACT . . . iii

ACKNOWLEDGMENTS . . . iv

LIST OF ILLUSTRATIONS . . . viii

LIST OF TABLES . . . ix

SECTION 1 INTRODUCTION . . . 1

2 RELATED WORK . . . 7

3 MOTIVATIONAL STUDY OF STRUCTURE IN SAT INSTANCES . . . 11

3.1 TRANSFORMING PROGRAM UNDERSTANDING PROBLEMS . 11 3.1.1 Specifications . . . 11 3.1.2 Design . . . 12 3.1.3 Pseudocode Parsing . . . 13 3.1.3.1 Assignment operation . . . 13 3.1.3.2 Comparison operations . . . 13 3.1.3.2.1 Equal . . . 14 3.1.3.2.2 Greater than . . . 14 3.1.3.2.3 Less than . . . 14 3.1.3.3 Arithmetic operations . . . 15 3.1.3.3.1 Addition . . . 15 3.1.3.3.2 Subtraction . . . 15

(7)

3.1.4 Resultant DIMACS SAT Instance . . . 15

3.2 CONDITIONAL CODE EXECUTION . . . 17

3.2.1 Branching . . . 17

3.2.2 Looping . . . 19

3.2.3 Remarks on Program Understanding Structure . . . 20

4 METHODOLOGY . . . 21 4.1 ADSSEC VERSION 1.0 . . . 21 4.1.1 Heuristic Representation . . . 22 4.1.2 Objective . . . 23 4.1.3 Evolutionary Algorithm . . . 25 4.1.3.1 Population initialization . . . 25

4.1.3.2 Parent selection and variation operators . . . 26

4.1.3.3 Survival selection . . . 26 4.1.3.4 Termination . . . 27 4.2 ADSSEC VERSION 2.0 . . . 27 4.2.1 Heuristic Representations . . . 27 4.2.2 Objective . . . 30 4.2.3 Selection . . . 31 5 EXPERIMENTATION . . . 32

5.1 ASYNCHRONOUS VERSUS SYNCHRONOUS APPROACHES . . . 32

5.1.1 Experimental Setup . . . 32

5.1.2 Results and Discussion . . . 35

5.2 ADSSEC VERSION 1.0 EXPERIMENT . . . 37

5.2.2 Results . . . 39

(8)

5.3 ADSSEC VERSION 2.0 EXPERIMENT . . . 45

5.3.2 Results and Discussion . . . 48

6 CONCLUSION . . . 54

7 FUTURE WORK . . . 56

BIBLIOGRAPHY . . . 59

(9)

LIST OF ILLUSTRATIONS

Figure Page

1.1 DPvis Structure of Application SAT Instances . . . 2

1.2 The conflict shows that under those assignments ofaand bthe expression cannot be true. . . 4

2.1 Search space of ADSSEC (not to scale) . . . 10

5.1 Boxplots of evolution time relative to the mean of the evolution time for the asynchronous runs on each respective dataset . . . 35

5.2 k-colorable graph cactus plot comparing number of decisions to solve . . 40

5.3 Modularity cactus plot comparing number of decisions to solve . . . 41

5.4 k-colorable graph cactus plot comparing CPU time needed to solve . . . 43

5.5 Modularity cactus plot comparing CPU time needed to solve . . . 43

5.6 Evolved heuristic for k-colorable graph instances . . . 45

5.7 Evolved heuristic for modularity instances . . . 46

5.8 Modularity cactus plot comparing number of decisions to solve . . . 49

5.9 Modularity cactus plot comparing number of conflict literals to solve . . 49

5.10 Modularity cactus plot comparing runtime to solve . . . 50

5.11 Modularity cactus plot comparing runtime to solve with top solvers . . . 50

5.12 k-colorable graph cactus plot comparing number of decisions to solve . . 51

5.13 k-colorable graph cactus plot comparing number of conflict literals to solve 52 5.14 k-colorable graph cactus plot comparing runtime to solve . . . 53

5.15 k-colorable graph cactus plot comparing runtime to solve with evolved variable scoring heuristics of both datasets . . . 53

(10)

LIST OF TABLES

Table Page

3.1 Clauses and Variables needed to represent example using n-bit values . . 16 3.2 Extended Examples . . . 17 5.1 ADSSEC EA parameter settings . . . 34 5.2 ANOVA results of evolution time in seconds comparing both models of

ADSSEC . . . 36 5.3 ADSSEC EA parameter settings . . . 39 5.4 Statistical Comparison on mean CPU time . . . 42 5.5 Metrics of worst modularity instances for MiniSat and the evolved variant 44 5.6 ADSSEC EA parameter settings . . . 47 5.7 Wilcoxon Signed-Ranks Test for Paired Samples comparing CPU time of

(11)

In some cases, real-world problems can be modelled as assertions on inter-actions between components in a defined domain; when these assertions are found to be correct, the assertions have been satisfied. Boolean satisfiability (SAT) in-stances – the subset of assertion problems that can be represented with Boolean components and associations – are typically divided further into three distinct sets determined by the method used to generate each instance. The less restrictive being random generation, with hand crafted and hard combinatorial instances meeting a few more criteria, and the industrial (or application) class being a direct mapping of real-world problems. Industrial SAT instances inherit associations from their parent problems. Whether the instance is derived from mathematical applications, such as graph coloring or solving Polarium puzzles, or computing-related fields (e.g., cryptog-raphy and scheduling) [1], will determine the “structure” of the Boolean expressions and their respective variables. Figure 1.1 illustrates the variable dependency graphs constructed by the DPvis tool∗ on several application instances. The left, top-right, and bottom-left are program understanding propositional formulas from the bounded model checker, BMC†. These particular instances are barrel2, longmult2, and queueinvar2 respectively. The fourth formula, called flat30-1, was taken from the graph coloring dataset flat30-60‡. There are very few pairs of nodes connected by multiple edges in the graph coloring figure, while every edge in the other structures is accompanied by at least one other edge. The higher collocations in the program understanding instances suggest that variable assignments are much more dependent on the assignment of other variables than in other applications. The relationship

∗ http://www-sr.informatik.uni-tuebingen.de/~sinz/DPvis/ † http://www.cs.cmu.edu/~modelcheck/bmc/bmc-benchmarks.html ‡ http://www.cs.ubc.ca/~hoos/SATLIB/benchm.html

(12)

becomes more apparent when considering the domain from which program under-standing instances are derived.

Figure 1.1 DPvis Structure of Application SAT Instances

Digital systems store integers as binary values and interpret arithmetic and comparison operations on these values with logic gates. Occasionally, the values are presented as variables, rather than constants, that depend on the outcome of previously performed operations. The logic gates establish a bit-wise relationship between the given integer values and/or variables and the resultant value. While the result of an arithmetic operation is another n-bit integer, comparison operators

(13)

produce a 1-bit status (i.e., 1 for true and 0 for false). All assignments, arithmetic operations, and comparisons with a status that is true can be represented as boolean expressions that are satisfied by at least one set of bits. A path through a program is simply the conjunction of multiple expressions. Given that any boolean expression can be transformed intoAND,OR, andNOTgates, a program path can be translated into a SAT instance. Many SAT-solvers have been developed to scale to large instances with low complexity. SAT solvers can be employed to verify paths with predetermined values. Also, if solutions can be found to paths that contain variables with unknown values, then the solution will contain a possible value for that variable.

If no solution is found for a given SAT instance, then the instance can be either undetermined or unsatisfiable. A solution can be proven unsatisfiable if a contradiction is found or if all possible variable assignments have been attempted. In this case, the path through the program will never be executed. However, if the SAT solver has not performed an exhaustive search and cannot find a contradiction, then nothing can be stated about the path through the program.

As previously mentioned, classes of structured SAT instances are created by encoding a specific problem class in SAT. The variable interactions or associations in each instance in a class define a distinct structure. Empirical evidence shows that each SAT solver has an ideal underlying instance structure and that each class of structured instances has an optimal solver [2, 3, 4]. Efficiently solving instances in a specific class requires finding the solver and parameter configuration that perform best for that class.

SAT instances are often structured in the conjunctive normal form (CNF), where clauses are the disjunctive components [5]. Each clause is composed of literals, and each literal can be either a variable or the negation of a variable (e.g.,x or¬x). As SAT solvers attempt to find a solution to an instance or prove that the instance is unsatisfiable, the solver must make decisions for the assignment of Boolean variables – often determined by a variable scoring heuristic – in the logical expression. A simple

(14)

example of variable assignments is illustrated by Figure 1.2. When many of the decisions cause conflicts in the Boolean statement, a conflict-driven clause learning (CDCL) solver determines that a restart is necessary. Restart heuristics dictate how frequently the solver backtracks to either the beginning of the search or to some stored decision. Static heuristics will restart a search after a set number of conflicts have occurred; smaller limits have been shown to be effective for certain satisfiable problem classes [6]. However, an empirical analysis comparing static to dynamic heuristics indicates that restarts dependent on detailed state-related information are much more effective [6]. Evolving heuristics that borrow primitives from both types may improve the performance for a targeted class.

(x∨ ¬a) | {z } clause ∧(¬x |{z} literal ∨b) = T a =T and b=F (x∨F)∧(¬x∨F) CONFLICT(x6=T/F) a=Fand b=F (x∨T)∧(¬x∨F) NO CONFLICT (x=F) Figure 1.2 The conflict shows that under those assignments ofaandbthe expression

cannot be true.

When a CDCL solver encounters a conflict, a “reason” for the conflict is constructed and stored. These “reasons” are called learnt clauses [7]; and, learnt clauses are employed to help make decisions. CDCL solvers can reach millions of conflicts in a run and cannot afford to keep all the learnt clauses. As such, some learnt clauses must be discarded. The clauses that are the most useful are kept and the rest are forgotten. The usefulness of a learnt clause is computed through a scoring heuristic. Generating novel learnt clause scoring heuristics will ensure that the most important clauses are not removed. Developing primitives from CDCL state-related values can make evolving these heuristics possible.

(15)

While evolving CDCL heuristics in a population, each solver containing a gen-erated heuristic must be evaluated against several instances from the target problem classes to measure the overall effectiveness of the new heuristic. When dealing with sampled fitness, some instances may take significantly more computational time to evaluate than others. This often leads to large variations in the computational time needed to evaluate the fitness of a trial solution. Additionally, this occurs when the complexity of the representation in a population can significantly vary, such as is often the case in Genetic Programming (GP), where typically limits are placed on genotype size (tree depth in Koza style GP) and larger genotypes are penalized to create par-simony pressure to combat bloat [8, 9]. Hyper-heuristics are a type of meta-heuristic which search program space for the purpose of automating the design of algorithms. They typically employ GP and their fitness evaluation relies on a sample consisting of multiple test cases [10, 11]. With hyper-heuristics, especially in the case of evolv-ing CDCL heuristics, the evaluation time varies drastically with the difficulty of the datasets, the sample size, and the quality of the heuristic. Additionally, these factors can also result in lengthy evolution times.

Given a distributed computing resource, such as a multi-core machine, or parallel cluster, hyper-heuristics implemented as Evolutionary Algorithms (EAs) are often able to reduce overall runtime by distributing individuals in the population to be evaluated concurrently. Synchronous Parallel EAs (SPEAs) maintain the generational step that is typical to most EAs; however, this approach suffers from idle CPU cycles when the fitness evaluation times vary. Asynchronous Parallel EAs (APEAs) eliminate this wasted time by producing offspring as slave nodes in the distributed computing resource become idle.

This thesis demonstrates how structure is introduced into a particular problem class, namely program understanding. In the interest of targeting solvers to particular problem classes, the design and modifications of the ADSSEC (Automated Design of SATSolvers employing Evolutionary Computation) system is discussed in detail.

(16)

Empirical studies are included supporting the performance gain of APEAs when compared to SPEAs on global populations of CDCL SAT solver variable scoring heuristics, as opposed to distributed populations such as in island-model EAs and diffusion EAs [12, 13, 14, 15]. Additionally, this thesis investigates the quality of the heuristics produced by the ADSSEC system as well as the effectiveness of its approach. Possible directions for this research are presented to address the limitations and expansion of automatically designing CDCL SAT solvers.

This thesis makes the following contributions:

• Provides empirical evidence of the substantial performance gains of the asyn-chronous hyper-heuristics approach over the synasyn-chronous alternative.

• Introduces a generative hyper-heuristic approach employing genetic program-ming to automatically construct variable scoring heuristics and learnt clause scoring heuristics for a CDCL solver.

• Significantly decreases CDCL SAT solver heuristic evolution time using an asyn-chronous parallel evolutionary algorithm.

• Evolves novel variable scoring heuristics that target classes of structured SAT instances.

• Demonstrates the potential of the hyper-heuristic approach to create efficient solver portfolios targeting particular problem classes.

• Discusses the automated design of CDCL SAT solver restart schemes as well as combining multiple components during evolution.

(17)

2 RELATED WORK

Populations within a parallel evolutionary algorithm (PEA) can either be structured as a single, centralized population or as multiple decentralized subpopula-tions [15]. Distributing subpopulasubpopula-tions over available machines achieves near-linear scalability, with the sole overhead due to inter-population communication through interchange of select individuals at typically fixed time intervals called epochs. This allows for each subpopulation to evolve semi-independently, while slowly diffusing genetic material throughout all populations. Alba et al. have reported on the effec-tiveness of various behaviors, particularly distributed and cellular reproduction, in distributed PEAs [12, 13, 14].

Durillo et al. have shown empirical evidence supporting the significant im-provement in terms of various quality metrics when employing APEAs rather than SPEAs for NSGA-II [16]. The APEA master process creates and sends individuals to be evaluated as the slave processors become idle. In the generational version, the population is replaced when enough offspring have been generated. With the steady-state alternative, the offspring are considered as each is received. The researchers employed homogeneous populations as the test cases during experimentation. While these results still apply to heterogeneous populations, an in-depth runtime analysis should be completed to measure performance.

Those that have specifically addressed heterogeneous populations note that APEAs are biased toward individuals with shorter evaluation times [17, 18, 19]. This is a result of the master process receiving those individuals sooner and more often, flooding the population. This potentially reduces the search space that can be reached within the runtime. Yagoubi and Schoenauer attempt to circumvent this with a duration-based selection on the received offspring [18]. This supposed defect can be taken advantage of in various situations, one of which is evolving genetic programs,

(18)

which must use a mechanism such as parsimony pressure or must minimize a size-related objective value to prevent any individual from becoming too large. If the size of the individual is proportionate to the evaluation time, then the bias provided by heterogeneous evaluation times can be used to produce an implicit parsimony pressure [20, 21]. This characteristic may not necessarily be present in some hyper-heuristics if the size of the genotype has little effect on the evaluation time, as is noted in the present application.

This research applies generative hyper-heuristics [22], approaches to automat-ically develop heuristics or algorithms, to generate CDCL SAT solvers tailored for an arbitrary, but particular, problem class, to populate solver portfolios. Alternatively to a generative approach, selective hyper-heuristics are provided with pre-existing heuristics from which to choose the best option [22]. While not quite as flexible as generating new heuristics, selecting heuristics can be beneficial if multiple compo-nents need to be matched together and effective heuristics are known. In this case, only a single heuristic is being modified during evolution and the generative approach can explore more of the search space. As is typical, the hyper-heuristic employs ge-netic programming (GP) to automatically reorganize and manipulate the algorithmic primitives constituting the heuristic [11, 23]. These primitives can be as general as state- related variables and binary operations or as specific as carefully constructed functions with tunable inputs. ADSSEC is a hyper-heuristics framework that uses CDCL state-based information and binary operations to automate the development of new restart or learnt clause scoring heuristics. ADSSEC’s primitives are more granular than statements in source code and are therefore much more versatile in developing new solver components than the line substitution approach proposed by Petke et al. designed to optimize entire SAT solvers [24, 25].

Ideally, a single solver would be able to adapt to an application at runtime. Tools such as ParamILS [26], SMAC [27], and − most recently − SpySMAC [28] automatically tailor the parameter configurations of reasonably versatile solvers to

(19)

particular datasets. While the parameter configurations are adjustable, most of the internal methods of solvers remain the same. The effectiveness of adapting a solver to a problem class solely through parameter optimization is limited by the appropri-ateness of the solver’s architecture for that problem class.

Running all computable solvers with all configurations simultaneously would guarantee the shortest possible time to solve a given instance. However, obtainable resources restrict this parallel procedure to a subset of existing solvers with carefully selected parameters. This method is referred to as a portfolio approach. Xu et al. were able to predict which solvers in a portfolio were able to perform well in particular domains [29, 30]. They did this by calculating values for a set of instances, testing the portfolio on the instances, and using machine learning to relate solvers to a given instance. This pairing of solvers with instance classes allowed Xu et al. to reduce the portfolio to the best suited solvers. Hutter et al. expanded on this work by employing these calculated values to predict the runtime of SAT solvers [31]. Portfolios of algorithms provide high flexibility in discovering the rightexisting solver for the job assuming that the right solver is in the portfolio to begin with.

Previous work has automatically evolved variable selection techniques for stochastic local search (SLS) solvers [32, 33, 34, 35, 36]. However, CDCL solvers are still the most efficient SAT solvers, and although Biere and Fr¨ohlich demon-strated that restart and variable selection schemes drastically impact CDCL solver efficiency for specific problem classes [6, 37], no known work focuses on automatically evolving CDCL heuristic functions. It is believed that appropriate CDCL operations will increase the effectiveness of a CDCL SAT solver in targeting classes of instances with unique structure. The diagram in Figure 2.1 illustrates how ADSSEC-generated SAT solvers relate to the entire SAT solver space.

(20)

(21)

3 MOTIVATIONAL STUDY OF STRUCTURE IN SAT INSTANCES

As a motivational study of the structure in SAT instances, the following sec-tions describe a brief working example that investigates an encoding of the program understanding program class. This transformation is merely a demonstration of in-teractions of clauses and variables in application-based instances. The encoding is not a novel contribution of this thesis. Clarke et al. provide a much more expansive and detailed explanation of this problem class [38, 39].

3.1 TRANSFORMING PROGRAM UNDERSTANDING PROBLEMS

In order to illuminate the encapsulation of structure in SAT instances, a tool for converting rudimentary pseudocode into the CNF format poses an introduction to the program understanding SAT problem class. At the current state of the framework, src2sat (read as source-to-SAT) is attempting to answer what possible inputs to a program will follow the path defined by the pseudocode.

3.1.1 Specifications. The steps that src2sat needs to take to discover the possible pseudocode inputs are as follows:

1. Parse the pseudocode representing a path through a program. This includes storing variables to be represented asn-bit integers.

2. Translate each arithmetic, comparison, and assignment operation into equiv-alent logic gate operations. These resulting boolean expressions will all be combined as a logical conjunction.

3. The conjunction will be converted to the DIMACS CNF to be used later as a SAT instance.

4. Solving this SAT instance will provide an input that will follow this path in a program.

(22)

3.1.2 Design. The first step in designing src2sat was to develop a simple pseudocode language that could represent the same operations that might be seen in very basic C programs. When following a path through a program, certain conditions must be met in order for the program to enter specific branches. These conditions along the path can be asserted as being true in this particular application. In this sec-tion, the following pseudocode will be employed as a working example during src2sat’s conversion process. To keep the example simple, the integers will be represented with 3-bits. input x z = 2 s = x + 1 assert(s < 2) y = s - z assert(y > -2)

Both positive and negative values are supported by src2sat. There are a few differences when including negative values. Negative values require the use of two’s complement; in that situation, arithmetic operators must allow the most significant carry bit− or borrow bit − to be either a 0 or a 1. If only positive values are used, that bit can be set to0. Also, when comparing a positive to negative value, the most significant bit must be checked. If that bit is not checked, then negative values will register as being greater. Allowing for both will add additional clauses to the SAT instance.

Floating-point values, or floats, have not been incorporated into src2sat. In addition to needing support for multiple types, these numbers would require different logical representations for each operation. Floats often consist of three components and are structured as a significand multiplied by some base to a given exponent. Operations would need to consider that operands may not necessarily have the same type, and, more importantly, the same structure.

(23)

3.1.3 Pseudocode Parsing. Each program will contain at least one line that starts with the keyword input. These are the only commands that will not produce a boolean expression. The instruction simply indicates that the program needs to store that variable in the symbol table as an n-bit integer. In the example, the input variable will be stored as

x=x2x1x0

All other instructions will produce boolean expressions that were designed from logic gates that at some level can be represented with just ANDs,ORs, and NOTs. However, to make things easier and less verbose, other more complex boolean operations were used, and the translation takes place in the next step.

The symbol table stores each variable name with their respective number of assignments. Given that the result of each assignment to a specific variable is inde-pendent, additional instances of that variable are needed to store each assignment’s result. For example, the first timex is assigned in the pseudocode, the boolean rela-tion will usex0. The second assignment to x will usex1 in the expression, the third

x2, and so on.

3.1.3.1 Assignment operation. In src2sat’s current state, the assignment operation always has a single variable on the left-hand side. The right-hand side can be another variable, an integer, or an expression. To ensure that the value of bit

a on the left-hand side is equivalent to bit b on the right-hand side, the following expression will be generated

a XNOR b or a⊕b

In the example, the line z = 2 would be interpreted as

(z2⊕0)(z1⊕1)(z0⊕0)

3.1.3.2 Comparison operations. All comparison operators will appear as an assert statement (i.e., assert(condition)). The comparison operations imple-mented so far are equivalence, not equal, greater than, greater than or equal to, less

(24)

than, andless than or equal to. Each allow for both sides of the operator to be either an integer or a variable. The logical disjunction of theequal operator and thegreater than and less than operators allows the formations of the greater than or equal to

and less than or equal to operators respectively.

3.1.3.2.1 Equal. Asserting that two values are equal is the same as assigning one value to the other. This is handled in exactly the same manner as the assign-ment operation previously described. To define thenot equal operator, the resulting expression forequal is simply negated.

3.1.3.2.2 Greater than. In deciding whether one value is greater than an-other, the boolean expression must rely on the relation between several bits in the two values. The logical statement that defines a > b for three bit integers is

((a2⊕0)·(b2⊕1)) +

((a2⊕b2)·(a2b2 + (x2a1b1) + (x2x1a0b0))); wherexi = (ai⊕bi)

The statementassert(y > -2) would have the following translation

((y₂⊕0)·(1⊕1)) +

((y₂⊕1)·(y₂1 + (y₂ ⊕1)y₁1 + (y₂⊕1)(y₁⊕1)y₀0)))

3.1.3.2.3 Less than. The representation for a < b is as follows (assuming three bit integers)

((a2⊕1)·(b2⊕0)) +

((a2⊕b2)·(a2b2 + (x2a1b1) + (x2x1a0b0))); wherexi = (ai⊕bi)

The statementassert(s < 2) would be translated to the following expression

((s2⊕1)·(0⊕0)) +

(25)

3.1.3.3 Arithmetic operations. Similar to the comparison operators, both sides of an arithmetic operator must be either an integer or a variable. Both addi-tion and subtracaddi-tion introduce new variables to handle the carry and borrow bits. Currently src2sat only allows for integer values.

3.1.3.3.1 Addition. The boolean expression for addition is based on the logic gates that construct a full adder. Two expressions are needed with the addition of the carry bits (c). The solution is represented by the variables.

(s⊕(a⊕b⊕cin))·(cout⊕(ab + acin + bcin))

The expression associated with the solution,s, in the example s = x + 1 is defined as

(s2⊕(x2⊕0⊕c2))·(s1⊕(x1⊕0⊕c1))·(s0⊕(x0⊕1⊕0))

and the carry bits are handled by

(c3⊕(x20 + x2c2 + 0c2))·(c2⊕(x10 + x1c1 + 0c1))· (c1⊕(x01 + x00 + 1·0))

3.1.3.3.2 Subtraction. The subtraction operator is similar to the addition operator; however, the operation relies on the outcome of the borrow bits (d).

(s⊕(a⊕b⊕din))·(dout⊕(¬ab + ¬adin + bdin))

The statementy = s - z is interpreted with the following two expressions

(y₂⊕(s2⊕z2⊕d2))·(y₁⊕(s1⊕z1⊕d1))·(y₀⊕(s0⊕z0 ⊕0)) (d3⊕(¬s2z2 + ¬s2d2 + z2d2))·

(d2⊕(¬s1z1 + ¬s1d1 + z1d1))· (d1⊕(¬s0z0 + ¬s00 + z00))

3.1.4 Resultant DIMACS SAT Instance. The src2sat program is de-pendent upon the PyEDA library (version 0.25)∗. This library can take the boolean

(26)

expressions as input and produce a DIMACS SAT instance. This library can also solve the generated instance. With the example pseudocode above, the SAT instance produced contains 18 variables in 59 clauses when assuming 3-bit integers. The num-ber of clauses and variables is dependent on the numnum-ber of bits needed to represent the values (Table 3.1). Also, three possible inputs would follow this path through a program. When solving the SAT instance,xis found to be0,7, and-8. The last two results are the boundary conditions when arithmetic operations can cause the values to wrap around from positive to negative and negative to positive.

Table 3.1 Clauses and Variables needed to represent example using n-bit values

n-bits Clauses Variables

3 18 59 4 24 83 5 30 107 6 36 131 7 42 155 8 48 179 9 54 203 10 60 227 .. . ... ... 32 192 755

The clause-to-variable ratio relies on varying aspects of the program path being analyzed. More variables, rather than values, used throughout the path requires that more bits be solved for in the SAT instance. In the example, if the variable,

z, had been replaced by the value 2 along the path, then there would have only been 15 clauses and 40 variables. The bloat introduced by usingz could potentially be mitigated by employing some preprocessing techniques on the path before the conversion.

(27)

Table 3.2 shows extended versions of the earlier example. These cases indicate that each operation has a different cost with regard to the number of clauses in the SAT instance. Additionally, the use of variables and values in the source code have an influence over the number of variables in the SAT instance. In the left extended version, the 3-bit conversion results in a SAT instance with 30 clauses and 79 variables. On the right side, the result contains 30 variables and 113 variables. A positive linear relationship exists between the number of lines and variable references in the source code and the number of clauses and variables in the SAT instance, respectively.

Table 3.2 Extended Examples

input x input x z = 2 z = 2 s = x + 1 s = x + 1 assert(s < 2) assert(s < 2) y = s - z y = s - z assert(y > -2) assert(y > -2) m = 0 + 0 m = x + x n = 0 - 0 n = m - x assert(1 == 1) assert(x == n)

3.2 CONDITIONAL CODE EXECUTION

While not implemented in src2sat, branching and looping are instrumental in representing most common languages. The following sections provide a brief overview how these affect the encoding of program understanding to SAT instances.

3.2.1 Branching. Branching in programs requires that some subset of state-ments are associated with, or implied by, one or more conditions. Each branch may make certain assertions or produce assignments to variables that determine future

(28)

code execution. The outcome of a set of branches can be represented with a disjunc-tion of implicadisjunc-tions. Below is a simple example of a ternary operadisjunc-tion:

a = b ? 0 : 1

In this case, if b is true (1), then a becomes 0; and, if b is false (0), then a becomes

1. This can be converted to the following disjunction:

(¬b·a) + (b· ¬a)

A more general scenario would be to see the ternary structured as anif-else branch. See the following pseudocode:

... if (c) x = 1 else x = 3 assert(x > 2) ...

In order to determine if there is a solution to the SAT instance − or a path through the program − where x > 2, the statements in both branches must first be associated with their respective conditions. Given that the logic operations be-hind the assignment operator and greater than operator are discussed above, these operators will be used in the boolean representation. Also as previously mentioned, each assignment to the same variable must be tied to a unique identifier. The pseu-docode will be reinterpreted such thatx0 = 1,x1 = 3, and x2 > z. With these new symbols, the relationship can be expressed as

((c ·(x2 = x0)) + (¬c ·(x2 = x1)))·(x2 > 2)

With the inclusion of elseif conditions and the increase in scope complexity, these expressions become more difficult to represent. The dependencies of variables and conditions are more manageable when stored as edges in a tree or graph.

(29)

3.2.2 Looping. Loops provide additional complexity in that the number of iterations is linearly related to the number of clauses needed to represent the loop in the SAT instance. If the number of iterations is known, then the loop can be “unrolled” before generating clauses. Other cases cannot be addressed with such simplicity.

One approach to loops with an unknown number of iterations is to generate multiple SAT instances. Initially, develop a boolean expression when the loop is not executed. If a solution cannot be found for the instance, then iterate through the loop once in the expression. Continue to increase the number of iterations until the instance can be satisfied. If some prior knowledge about the loop is known (e.g.,

do-whileloops), then this method may be modified.

The greatest common denominator (gcd) problem will be used as an example to show how this process would take place. The pseudocode is structured as follows:

function gcd(a, b) while b != 0 t := b b := a mod b a := t return a

In this example, assume that a and b will have known values. This initial boolean expression would attempt to solve for:

0 iterations

b0 == 0

If bdoes not equal zero, then no solution can be found for the generated SAT instance; and, another iteration of the loop will need to be included.

1 iteration

t0 = b0

(30)

a1 = t0 b1 == 0

This process will continue until the expression is satisfied.

2 iterations t0 = b0 b1 = a0 mod b0 a1 = t0 t1 = b1 b2 = a1 mod b1 a2 = t1 b2 == 0 .. .

This may require a maximum number of iterations or maximum solver wall time to determine when a SAT instance is no longer feasible to generate and solve for the given problem.

3.2.3 Remarks on Program Understanding Structure. The pseudocode language serves as a proof-of-concept. The src2sat tool is still far from encapsulating more commonly used languages. The pseudocode would better represent the pro-gram understanding problem class by incorporating conditional code execution and memory addressing. A possible starting point would be to use an assembly language (e.g., x86, ARM) as a reference. Also, specific paths will need to be extracted from the program to answer the question of possible inputs. While verification was the focus in this example, other questions may be answered with this conversion. More general than program understanding, the above working example demonstrates how structure is introduced in industrial SAT instances. In attempting to take advantage of that structure, there is certainly potential to discover or develop SAT solvers specif-ically tailored for this class of SAT instances. The remainder of this thesis details an approach to automate the design of problem-specific SAT solvers.

(31)

4 METHODOLOGY

The ADSSEC (Automated Design of SAT Solvers employing Evolutionary

Computation) system is a hyper-heuristics framework designed to automate the de-velopment of SAT solvers. The SAT solvers generated by ADSSEC are trained against specific sets of instances that represent a particular encoding or problem class. ADSSEC aims at discovering solver heuristics that perform well on datasets of interest, not necessarily produce a solver that is better in general. The following two sections feature the initial and updated implementations of the system, as well as fur-ther discussion on the integration of potential heuristic representations and existing CDCL SAT solver mechanisms.

4.1 ADSSEC VERSION 1.0

Influenced by the success of Fukunaga in improving SLS runtimes by evolv-ing specific heuristics [34], ADSSEC uses GP to evolve variable scorevolv-ing heuristics to automatically target CDCL solvers to specific classes of structured instances. In particular, the system employs Koza-style GP [40] as it is well suited to representing the parse trees of variable scoring heuristics. These heuristics assign high scores to more important variables leading CDCL solvers to select influential variables when making a decision. Biere and Fr¨ohlich demonstrated that the current best variable scoring heuristics are roughly equal in runtime performance when evaluated across many instance classes [37]. While no one heuristic function is best in all cases, each instance class has at least one heuristic that provides the best average runtime. Biere and Fr¨ohlich’s work motivated the choice to adapt CDCL solver variable scoring heuristics. ADSSEC creates an initial population of variable scoring heuristics and evolves the population through mutation and recombination. ADSSEC evaluates

(32)

these heuristics by replacing the variable selection heuristic in MiniSat [7], a com-monly used efficient and deterministic CDCL solver with dense source code. While ADSSEC employs a standard parent selection before producing offspring, its unique survival selection was specifically constructed for use by APEAs. ADSSEC returns the heuristics from the final population after reaching the termination criteria.

4.1.1 Heuristic Representation. Mapping variable scoring heuristic func-tions to objects easily manipulated in a GP is fairly straight-forward. Each scoring heuristic is represented as a parse tree where non-terminal nodes are operators and terminal nodes are state-related values.

Derived from currently implemented variable scoring heuristics [37], ADSSEC defines the following terminal nodes:

• Score (s): The previous score of the variable.

• Conflict Index (i): The current number of conflicts encountered.

• MiniSat Variable Decay Amount (f): Initially used as the rate of variable score decay in MiniSat. Now f is just used to derive MiniSat’s Variable Increment Value (MiniSat Default: 0.95).

• MiniSat’s Variable Increment Amount (g = (1/f)i): The amount MiniSat in-creases the score of a variable.

• Constant (C): A constant value in {1,2,3, . . . ,10} ∪ {0.0,0.1,0.2, . . . ,0.9}.

• Special Component (H): hm_i =        0.5·s if m divides i evenly s otherwise (4.1) wherem is a power of 2: {2,4,8, . . . ,1024}.

ADSSEC defines the following non-terminal (binary) operator nodes: Addi-tion (+), SubtracAddi-tion (−), Multiplication (∗), and Division (/). These arithmetic

(33)

operators may be applied because all the terminal nodes relate to either integer or floating-point values.

These nodes permit the evolution of novel schemes while still representing current schemes. For example, consider MiniSat’s current variable scoring heuristic:

s0 =s+g .

In ADSSEC, the following parse tree represents this heuristic:

+

g s

Put into words, the updated score of the variable is the sum of the previous score and the variable increment value maintained by MiniSat.

Again, ADSSEC evolves the parse tree genetic encodings. To evaluate each variable scoring heuristic, the parse tree is converted into a C++ statement. The original variable scoring heuristic is replaced in a pre-built MiniSat 2.2 by compiling and linking in the new variable scoring heuristic (C++ statement). The resulting solver is executed on test instances to evaluate the effectiveness of the heuristic. This method takes advantage of MiniSat and the performance derived from being implemented in C++ while reducing development time.

4.1.2 Objective. The objective score represents how well an evolved version of MiniSat performs on a provided training set of instances. The true objective score is a function (e.g., average) over all instances in the problem class being targeted. However, as it is infeasible to compute over a potentially infinite set of instances, instead a small sampling of instances provides an approximation as is discussed later. Determining the best performance measure to use in ADSSEC is difficult. Tradi-tionally the intent is to reduce the runtime needed to either find a solution or prove unsatisfiability. However, because ADSSEC evaluates several instances in parallel on the same hardware, runtimes for individual instances are inconsistent even with a deterministic solver. Therefore, runtime is substituted with the number of variable

(34)

decisions, which is a more consistent metric. An improved variable scoring heuristic should reduce this value. The per-instance sub-score for an evolved variant is the ratio of the number of decisions needed by the variant to that needed by the original selection scheme.

The objective score is then simply the average of all the instance sub-scores. Given that all the sub-scores are expressed relative to the original MiniSat, any evolved individual that performs identically to MiniSat’s variable scoring heuristic will end up with an objective score of 1.0. Lower scores indicate better schemes.

Occasionally, the EA will construct inadequate heuristics that cause the solver to require an inordinate number of decisions to reach a conclusion. The limiting func-tions are defined to prevent wasting evaluation time on such heuristics. ADSSEC relies on default MiniSat’s performance to approximate reasonable limits for any given SAT instance. Initially, ADSSEC limits an evolved MiniSat to three times the number of decisions the original MiniSat needed to solve that instance. While the multiplier of three is user-configurable, manual tuning indicated that this limit was fairly generous without wasting an excessive amount of evaluation time. These generous limits are required to collect decent heuristics in the population – decent heuristics provide complex genetic material for later optimization; they do not time out on all tested instances, but are generally worse than the original MiniSat. How-ever, as ADSSEC progresses through the evolutionary process, the interest shifts to exploiting the heuristics that are strictly better than the original. As such, the deci-sion limit linearly decreases down to the exact number of decideci-sions MiniSat needed for a specific instance or the average number of decisions for the sample set. For example, if ADSSEC is to complete 5000 evaluations throughout the run and the decision limit multiplier decreases from 3.0 to 1.0, then the multiplier is decremented by ((3.0−1.0)/5000 = 0.0004 after each evaluation.

Ideally, an accurate objective score would be determined by executing the evolved variable scoring heuristic against the entire dataset of interest. Because this is

(35)

generally too costly, ADSSEC utilizes strike-based sampling to gauge the effectiveness of a MiniSat variant. ADSSEC randomly selects a user-defined number of instances from the given training set to evaluate a variant. At the start of evolution, there is a bias toward selecting easier instances. The initial population contains mostly low-quality heuristics and evaluating these heuristics against difficult instances wastes evaluation time. As such, a bias is placed in favor of selecting easier instances in early evolution. This bias linearly transforms to a uniform selection at the end of the evolutionary process. For each instance in this selection, ADSSEC executes the evolved variant and assigns a sub-score ratio as described before. If that variant reaches the decision limit for that instance, then the variant receives a strike and a sub-score of the current decision-limit multiplier. After a variant reaches a user-defined number of strikes, all remaining sub-scores are assigned the current decision-limit multiplier.

4.1.3 Evolutionary Algorithm. The following sections define the evolu-tionary operations implemented within ADSSEC.

4.1.3.1 Population initialization. To create each individual in the initial population of variable scoring heuristics, ADSSEC randomly generates a parse tree from the primitives, or nodes, described previously (Section 4.1.1). First, as experi-mentation shows that no single terminal node produces an effective scoring scheme, ADSSEC assigns a random operator node to the root of the tree. ADSSEC then assigns two random nodes to the left and right branches of the operator node. There is a 50% chance that each node will be terminal (versus non-terminal). If the node is non-terminal, then ADSSEC repeats the process and assigns a random operator node. If the node is terminal, then there is a 50% chance that the node will be the previously assigned score,s; if nots, ADSSEC randomly assigns one of the other ter-minal node options. This bias was introduced because most current schemes appear to rely on the previous variable score. The maximum depth of a tree generated for the initial population was manually tuned to eight. Smaller depths contained much

(36)

less genetic diversity while larger trees produced complex heuristics that rarely solved instances in the decision limits.

4.1.3.2 Parent selection and variation operators. ADSSEC uses one of two methods to develop a single offspring (variant): mutation or recombination. For mutation, ADSSEC simply randomly selects a subtree in a random individual’s parse tree and replaces it with a new branch generated using the rules established in population initialization (Section 4.1.3.1) – without a depth limitation. Typically, APEAs mitigate tree growth by promoting an implicit parsimony pressure. This is under the assumption that smaller trees have shorter evaluation times and return to the population sooner. However, the strike-based sampling terminated bad heuristics quickly, which partially eliminated the implicit pressure. Fortunately, most overly-complicated heuristics receive poor objective scores and are removed during survival selection. For recombination, ADSSEC implements a sub-tree crossover: the system randomly selects two individuals in the population and replaces a random branch from the first parent with a random branch from the second parent. Again, this procedure only produces a single child.

4.1.3.3 Survival selection. The survival selection chooses which individ-uals in the population continue into the next generation. In ADSSEC, genetically diverse selection is encouraged so that smaller parse trees (which are generated more easily) do not flood the population. Certain small heuristics have adequate perfor-mance and, had one been discovered early on in evolution, could be spread throughout the population if diversity was not maintained.

Crowding functions are selection functions that excel at promoting genetic diversity in the population [41]. In a standard crowding function, an offspring com-petes with its closest parent, either replacing the parent or being dropped from the population in favor of the parent. In an APEA, however, generations are not clearly delineated and a parent can have multiple offspring being evaluated simultaneously. An asynchronous crowding function was developed that allows offspring to compete

(37)

with either their parents or any ‘siblings’ – or potentially descendants of siblings – that replaced the parents. A computationally cheap distance function compares histograms of node types (e.g., addition, constant, conflicts, etc.) to determine the closest remaining relative in the current population. To provide a fair comparison be-tween the models, the SPEA version of ADSSEC employs the same crowding selection method.

Parents have to be uniformly selected at random to ensure that each individual has an equal chance of providing genetic material to the pool. Additionally, uniform selection allows an equal chance of producing offspring, which can eliminate less fit parents from the population with the employed survival selection.

4.1.3.4 Termination. ADSSEC terminates the evolutionary cycle after com-pleting a user-defined number of evaluations. However, throughout the run individu-als may be replaced by randomly generated parse trees if the population has become stale or has converged. If the best individual has not been improved in a user-defined number of evaluations, ADSSEC introduces new material to the gene pool. Cur-rently, all variants whose performance is worse than that of the original MiniSat are replaced. This mechanism is useful in restarting the exploration of the variable scoring heuristics search space.

4.2 ADSSEC VERSION 2.0

The second version of ADSSEC attempts to address the limitations of the previous version. Also, ADSSEC Version 2.0 introduces a representation for learnt clause scoring heuristics and discusses the potential for incorporating restart schemes.

4.2.1 Heuristic Representations. Similar to the representation in evolv-ing CDCL variable scorevolv-ing heuristics, the primitives for the restart and learnt clause scoring heuristics need to be derived from modern CDCL solvers. Recently, these two heuristics of several solvers have adopted an attribute of learnt clauses as a metric

(38)

of interest. This measurement, labeled as the Literal Block Distance (LBD), essen-tially indicates how many decisions were made that contributed to the creation of a learnt clause. Audemard and Simon describe the significance of this value as well as how LBDs can be applied to restart schemes [42, 43]. Their solver, Glucose, stores the previous W LBDs and, if the average of the LBDs in that window multiplied by a constant (K) exceeds a calculated threshold, then a restart is triggered. Biere and Fr¨ohlich attempt to replace this window with an exponential moving average where the more recent LBDs are more influential [6]. This function still contains the constant (K), but substitutes the window size (W) with an exponential smoothing parameter (α). Currently, the MiniSat restart simply backtracks to the start when a set number of conflicts (C) has been reached. Evolving this heuristic will require selecting both the approach and the values of the associated parameters. Recom-bination of individuals in the population will require that the selected approaches share parameters. Audemard and Simon conclude that solvers with successful restart schemes will still perform poorly without useful learnt clauses [43]. As such, effec-tive learnt clause scoring heuristics will need to be discovered before evolving restart schemes, which are not currently implemented in ADSSEC Version 2.0.

The MiniSat default learnt clause scoring heuristic determines the usefulness of a learnt clause by how active the clause is [7]. If a learnt clause is employed when a decision is made, then the activity score of that clause is bumped. The more active clauses are assumed to be more useful. Less active clauses are removed to allow for more learnt clauses to be stored. The Glucose solver takes a different approach to determining the usefulness of a learnt clause. When a solver makes a decision for the assignment of a variable, this can propagate to assigning values for other variables without needing to make a decision. The decisions that propagate through many variables tend to be important and present themselves as low LBD scores in learnt clauses. Glucose assumes that the useful learnt clauses will have smaller LDB scores [42]. ADSSEC manipulates primitives derived from both of these

(39)

assumptions to evolve novel learnt clause scoring heuristics. In particular, the system uses Koza-style GP [40] as it is well suited to represent the parse trees representing the heuristics. ADSSEC defines the following terminal nodes:

• Score (s): The previous score of the learnt clause.

• Conflict Index (i): The current number of conflicts encountered.

• Minisat’s Clause Increment (gi_{): The amount MiniSat increases the score of}

a clause. As this value grows exponentially, ADSSEC only allows for g to be selected from the continuous range of [1.0−1.1].

• LBD (lbd): The LBD of the learnt clause.

• LBD Inverse (lbdI): Computed as (1/lbd). The higher learnt clause scores are kept and, therefore, low LBDs are more useful.

• Number of Literals (n): ADSSEC Version 2.0 promotes solvers that use minimal memory, so smaller learnt clauses are preferred.

• Constant (C): A constant value in {1,2,3, . . . ,10} ∪ {0.0,0.1,0.2, . . . ,0.9}. ADSSEC also defines the following non-terminal (binary) operator nodes for the learnt clause scoring heuristics: Addition (+), Subtraction (−), Multiplication (∗), and Division (/). Similar to the operators in the variable scoring heuristics, these arithmetic operators may be applied because all the terminal nodes relate to either integer or floating-point values.

Similar to the variable scoring heuristic, the learnt clause scoring heuristic is translated to C++ and then replaces the original heuristic. Also, a few additional lines are added for maintaining the values of nodes with type gi_{. Due to the varying}

complexity of the heuristics, a design decision had to be made when addressing the reduction of learnt clauses. MiniSat considers two conditions when selecting which learnt clauses to remove. After the learnt clauses are sorted, MiniSat removes the clauses with the lowest scores until the count is half of the stored learnt clauses. MiniSat will continue to remove clauses with scores less than a set threshold: gi

(40)

evolved heuristics may no longer be consistent with this threshold and, as such, this particular threshold cannot be used. However, ADSSEC may benefit from having a mechanism for eliminating useless learnt clauses, so the threshold was set at one divided by the number of learnt clauses. Any clauses with small or negative scores will be removed from the set. This allows much of the MiniSat code to remain unchanged and for ADSSEC to utilize existing functionality with the new heuristics.

4.2.2 Objective. While measuring SAT solver performance is typically de-pendent solely on runtime, concurrently executing multiple solvers heavily skews these results. Using the number of decisions the solver needed to find a solution is a more consistent metric. However, this value alone is not always proportionate to the runtime of a solver and cannot be the only objective score. As supported by the results obtained using Version 1.0, ADSSEC Version 2.0 includes the total number of learnt clause literals (i.e., conflict literals) as a second objective. This promotes solver variants that can solve instances in as few decisions as possible while also minimizing memory allocation.

The previous version of the ADSSEC system suffered from an inadequate “timeout” technique. For a given instance being executed by an evolved MiniSat variant, the solver was set to terminate if a solution had not been discovered in a calculated number of decisions relative to the default MiniSat’s performance. Unfor-tunately, these limitations could be unreasonable for the evolved variant if MiniSat quickly solved the instance. Version 2.0 loosens these bounds to be dependent on the maximum of either the average number decisions for the sample set of instances or the number of decisions MiniSat needed. This modified method should allow each heuristic a fair comparison against the original MiniSat heuristic on the entire sample set rather than a single instance. Additionally, the objective sub-scores were modi-fied to be ratios relative to the average number of decisions and conflict literals for the entire sample set rather than just the same instance. This promotes an overall improvement rather than an improvement on an instance-by-instance basis. Also,

(41)

should a heuristic be re-discovered and evaluated multiple times, the objective scores will be averaged over the number of occurrences. Such a check verifies or corrects the previous evaluations and prevents a single heuristic from flooding the population.

4.2.3 Selection. The crowding survivor selection could no longer be used with the addition of the second objective. Instead, a multi-objective approach similar to the NSGA-II algorithm is employed [44]. This allows individuals in the popula-tion to be removed without requiring that direct offspring be the replacement. The uniform parent selection was altered to ak-tournament selection.

(42)

5 EXPERIMENTATION

In the following three experiments, ADSSEC is employed to collect results, and an analysis of these results offers a discussion on the design of such a system. The first aims to more clearly understand the potential for asynchronous parallel approaches in hyper-heuristics, particularly in the case of ADSSEC Version 1.0. Utilizing the same version of ADSSEC, the second experiment considers the effectiveness of auto-matically constructing SAT solvers for specific sets of instances. Finally, the third explores and empirically analyzes the performance of an expanded multi-objective version of ADSSEC (Version 2.0).

5.1 ASYNCHRONOUS VERSUS SYNCHRONOUS APPROACHES

This section empirically evaluates the performance of asynchronous parallel evolutionary algorithms (APEAs) when compared to synchronous parallel evolution-ary algorithms (SPEAs) on global populations of CDCL SAT solver variable scoring heuristics.

5.1.1 Experimental Setup. Ideally, one would want to construct entire solvers for a given problem, and ADSSEC demonstrates the obstacles and, more importantly, the potential of adapting a single component of a CDCL solver. The experiments with the prototype ADSSEC system require datasets that have:

• instances that ADSSEC can feasibly train on in a short period of time (each instance should require at most a few seconds for the original MiniSat to solve)

• enough instances to sufficiently represent a distinct instance class for both train-ing and testtrain-ing

• instances that are difficult enough that the instance can benefit from a fitted heuristic

(43)

Unfortunately, these requirements make many of the usual SAT datasets inappropri-ate for the initial prototype experiments. Instances from previous SAT competitions attempt to challenge the capabilities of the solvers, so many require too significant an amount of time to solve. Many publicly available datasets contain too few instances to sufficiently represent the distinct problem class or the instances are so simple that nothing is gained by creating a fitted heuristic. The idea behind generating datasets for ADSSEC is that generators provide enough control to meet these criteria while preventing bias by hand-selecting specific instances.

A modularity-based generator developed by Jes´us Gir´aldez Cru∗ was used to create 80 instances for the datasets. These instances simulate an underlying structure that may be found in an industrial class of instances. Each CNF contained 5000 variables in 19000 clauses. The generator allows the user to specify the structure of each instance; the tool generated 90 communities with 3 literals per clause. The seeds 1 through 40 were used. Half of the instances were configured with a modularity of 0.85 and the other 40 with 0.9. A majority of the instances were satisfiable while 32 were unsatisfiable. ADSSEC was trained on select subsets of 32 instances where the sample size was 15 allowing up to 5 strikes per evaluation. To obtain the subsets, the 80 instances were first sorted by the number of decisions default MiniSat needed to solve each. Then, the instances were divided into five groups of sixteen instances, where Group 0 needed the least number of decisions and Group 4 needed the most decisions. The final subsets used for training were constructed as follows:

• Dataset A: combined Group 0 and Group 1

• Dataset B: combined Group 0 and Group 2

• Dataset C: combined Group 0 and Group 3

• Dataset D: combined Group 0 and Group 4

∗

(44)

Dataset A has the least variation in evaluation time and Dataset D has the most. Some instances can be solved in less than a thousandth of a second while the longer ones can take several seconds.

The computationally extensive search made automated tuning of ADSSEC’s parameters infeasible; thus, manual tuning was used to discover the configuration in Table 5.1. ADSSEC created an initial population of 20 random individuals. The master process used 20 slaves processes for evaluating offspring, either synchronously producing offspring at each generation or asynchronously creating new offspring as each node became available.

ADSSEC selected parents uniformly for either recombination or mutation – with a mutation probability of 0.10 and, subsequently, a recombination rate of 0.90 – and used a shared crowding method for survival selection. ADSSEC was config-ured to terminate after 1000 evaluations and restart after 100 evaluations without improvement.

Table 5.1 ADSSEC EA parameter settings

Population Offspring Mutation Crossover

(µ) (λ) Rate Rate

20 20 0.10 0.90

Termination Restart Dec. Limit Sample

Evaluations Evaluations Multiplier Size

1000 100 3.0 →1.0 15 (5 strikes)

ADSSEC was executed on a machine with dual Intel Xeon E5-2630 v3 2.4 GHz octa-core processors and 128 GB 2133 MHz DDR4 RDIMM ECC RAM running Ubuntu 14.04. Both the synchronous and asynchronous models were run 8 times on Datasets A, B, C, and D.

(45)

5.1.2 Results and Discussion. The aggregate user time across all pro-cesses was collected from each run of ADSSEC for both the synchronous model and asynchronous model on the datasets. The variance in the number of decisions needed to solve the instances in each dataset directly influences the variation in evaluation time for each heuristic. Increased variation in evaluation times allows for greater speed-ups in the asynchronous model over the synchronous approach. As illustrated in Figure 5.1, the growth in variance of MiniSat decisions in each dataset results in more evident speed-ups for the asynchronous evolution times. Additionally, there may be a proportional increase in variation of evolution time for both models.

(a) Dataset A (b) Dataset B (c) Dataset C (d) Dataset D

Figure 5.1 Boxplots of evolution time relative to the mean of the evolution time for the asynchronous runs on each respective dataset. The average relative asynchronous evolution time will always be at 1.0 for each plot. Using default MiniSat, Dataset A had an average of 7,441 decisions and stan-dard deviation of 11,798 decisions. Dataset B had an average of 99,313 decisions and standard deviation of 118,761 decisions. Dataset C had an average of 297,925 decisions and standard deviation of 309,996 decisions. Dataset D had an average of 522,668 decisions and standard deviation of 565,955 decisions. The average and standard deviation of the decisions needed by MiniSat to solve the instances used in each dataset describe the difficulty and variation that can be expected to influence evaluation times.

(46)

The results from ANOVA tests confirm the improvement provided by the asynchronous method (see Table 5.2); as the p-values are all approximately zero, there is very high confidence in this conclusion. In Dataset D, the synchronous method needed an average of 5.24 hours to complete the same number of evaluations that asynchronous finished in an average of 3.19 hours. These results were obtained where the longest time to solve an instance is approximately five seconds. In the read-world, industrial instances can require several minutes to hours to complete. Employing the asynchronous evolution when training ADSSEC on datasets containing those instances would measure speed-ups in CPU days or weeks.

Table 5.2 ANOVA results of evolution time in seconds comparing both models of ADSSEC Synchronous Asynchronous Dataset A Mean 515.1038 435.1900 Variance 35.4130 33.4400 P-value 0.0000 Dataset B Mean 1,647.6300 1,198.9300 Variance 11,554.0380 1,455.4835 P-value 0.0000 Dataset C Mean 15,453.0900 9,260.6925 Variance 493,279.9961 112,446.0049 P-value 0.0000 Dataset D Mean 18,865.0775 11,470.6363 Variance 3,539,070.1957 230,474.1322 P-value 0.0000

(47)

5.2 ADSSEC VERSION 1.0 EXPERIMENT

This section investigates the quality of the heuristics produced by the initial single-objective version of the ADSSEC system as well as the effectiveness of its approach.

5.2.1 Experimental Setup. Ideally, one would want to construct entire solvers for a given problem, and ADSSEC demonstrates the obstacles and, more importantly, the potential of adapting a single component of a CDCL solver. The experiments with the prototype ADSSEC system require datasets that have:

• instances that ADSSEC can feasibly train on in a short period of time (each instance should require seconds to minutes for the original MiniSat to solve)

• enough instances to sufficiently represent a distinct instance class for both train-ing and testtrain-ing

• instances that are difficult enough that the instance can benefit from a fitted heuristic

Unfortunately, these requirements make many of the usual SAT datasets inappropri-ate for the initial prototype experiments. Many publicly available datasets contain too few instances to sufficiently represent the distinct problem class or the instances are so simple that nothing is gained by creating a fitted heuristic. Also, instances from previous SAT competitions attempt to challenge the capabilities of the solvers, so many require too significant an amount of time to solve. The time needed for a single evaluation is the product of two numbers: (1) the amount of time needed by a solver to find a solution to the average instance in the dataset and (2) the number of instances needed for a representative sample. Additionally, one must consider approx-imately how many evaluations are needed to discover an improved heuristic and how many physical machines are available to ADSSEC. Applying ADSSEC to hard prob-lems becomes more feasible with smaller samples, fewer evaluations, and more/faster CPUs. The idea behind generating datasets for ADSSEC is that generators provide