
3.2 Background Reasoning

3.3.1 Performance

Although there are many high-level descriptions of Cooper’s algorithm, there are few descriptions of actual implementation details. One of the few is Phan and Hansen [PH15], who describe an implementation optimized for parallelism. To test their implementation, the authors used a parametric form of the pigeon-hole problem encoded in Peano Arithmetic.

This section describes a selection of parametric problems in the language of Σ_Z and the performance of Beagle’s Cooper solver on them. There are five problem classes, one of which is encoded in two ways. Table 3.1 reports the results of Beagle and CVC4 (version 1.4) on the problem instances, along with the parameters used and each problem’s satisfiability status. The following sections describe each problem class along with the meaning of its parameters. In general, the problem instances reported in Table 3.1 were chosen to show points where the performance of either solver changed, or to illustrate an apparent relationship between some parameter and the solving time.

These experiments were carried out on a Linux desktop with a quad-core Intel i7 chip running at 2.8 GHz, with 8 GB of RAM, although the host JVM was configured with a maximum heap size of 4 GB (relevant for Beagle). The CPU time limit was 60 seconds soft (solver’s heuristic target time) and 65 seconds hard (unresponsive processes killed).

The values in the status column reflect the expected result of a proof attempt, based on the construction of the problem. They have the following meanings:

• “Theorem/Counter-Sat” results indicate that the formulas have a designated conjecture formula which will be negated by the solver.

• “Satisfiable/Unsatisfiable” results indicate that the formulas have no designated conjecture.

• “?” indicates that the status of the problem is unknown.

The rationale behind comparing with CVC4 is that CVC4 implements projection-style BG reasoning []. As can easily be observed from the table, CVC4’s implementation is far more sophisticated than that of Beagle; however, the class of problems for which QE is suited is not completely subsumed by projection-style reasoning.

Systems of Linear Equations. Given equations

a_00·x + a_01·y + a_02·z ≈ 0
a_10·x + a_11·y + a_12·z ≈ 0
a_20·x + a_21·y + a_22·z ≈ 0

for fixed integer coefficients a_ij, check if there exists an assignment to the variables that satisfies all equations. There can be no solution, a single solution, or infinitely many solutions, depending on the choice of coefficients. Cooper’s algorithm is especially sensitive to the size of coefficients, hence choosing larger coefficients provides good test cases for the instantiation phase.

Problem            Parameter                    Status          Cooper   CVC4
Frobenius          S = {7, 8}                   Satisfiable       1.31   -
                   S = {17, 18}                 Satisfiable       1.11   -
                   S = {34, 35}                 Satisfiable       3.14   -
                   S = {11, 17, 25}             Satisfiable       5.60   -
                   S = {53, 24, 27}             Counter-Sat       4.74   -
                   S = {179, 89, 90}            Satisfiable       -      -
                   S = {3, 11, 17, 25}          Satisfiable       -      -
nQueens            n = 3                        Unsatisfiable     0.1    0
                   n = 4                        Satisfiable       1.29   0.01
                   n = 8                        Satisfiable      31.91   0.03
Subset-sum         |S| = 2, n = 111, k = 5 (1)  Theorem           0.1    0.01
                   |S| = 2, n = 111, k = 5 (3)  Counter-Sat       1.1    0.01
                   |S| = 3, n = 13, k = 3 (1)   Counter-Sat       0.93   0
                   |S| = 5, n = 55, k = 7 (1)   ?                 -      -
Pigeon-Hole Ex.    p = 5, h = 6                 Satisfiable       0.88   0
                   p = 7, h = 6                 Unsatisfiable    10.15   0.62
                   p = 10, h = 9                Unsatisfiable     -      -
                   p = 10, h = 11               Satisfiable       1.78   0.01
Pigeon-Hole Rel.   p = 5, h = 6                 Satisfiable       2.87   0
                   p = 7, h = 6                 Unsatisfiable    12.96   0.1
                   p = 10, h = 9                Unsatisfiable     -      1.33
                   p = 10, h = 11               Satisfiable       8.17   0.8
Linear Equations   n = 3                        Satisfiable       0.78   0
                   n = 3                        Satisfiable      11.98   0
                   n = 3                        Satisfiable       -      0
                   n = 3                        Unsatisfiable     0.79   0
                   n = 3                        Unsatisfiable    25.1    0
                   n = 5                        Unsatisfiable     -      0

Table 3.1: Cooper performance on representative instances of problems

Run time is proportional to the lcm of the coefficients, and it does not appear to matter whether the instance is satisfiable or unsatisfiable. The exception is where one equation is a constant multiple of another; this case can be easily detected.

CVC4 has a built-in linear Diophantine equation solver, which likely explains the excellent performance on this problem set.
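The two observations above can be illustrated with a minimal Python sketch. It assumes that the relevant quantity is the lcm of the coefficients of each variable across the equations (the value to which Cooper's algorithm normalises a variable's coefficients before eliminating it), and it detects proportional equations by checking that all 2×2 minors vanish. The function names `column_lcms` and `is_constant_multiple` are hypothetical and are not taken from Beagle.

```python
from math import lcm  # math.lcm requires Python 3.9+


def column_lcms(rows):
    """lcm of the absolute non-zero coefficients of each variable (column)."""
    lcms = []
    for column in zip(*rows):
        nonzero = [abs(c) for c in column if c != 0]
        # A variable that occurs nowhere gets 0 as a marker value.
        lcms.append(lcm(*nonzero) if nonzero else 0)
    return lcms


def is_constant_multiple(row_a, row_b):
    """True if the two coefficient rows are proportional (all 2x2 minors are 0)."""
    n = len(row_a)
    return all(
        row_a[i] * row_b[j] == row_a[j] * row_b[i]
        for i in range(n)
        for j in range(i + 1, n)
    )


# Illustrative coefficients only; these are not the benchmark instances.
rows = [[2, 3, 5], [4, 6, 10], [7, -9, 11]]
print(column_lcms(rows))
print(is_constant_multiple(rows[0], rows[1]))
```

Larger or pairwise coprime coefficients drive the column lcms up quickly, which is consistent with the run-time behaviour reported for this problem class.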

Frobenius problem. Given a set of k positive numbers {a_1, ..., a_k} whose gcd is 1, find the maximum number that cannot be expressed as a sum a_1·x_1 + ... + a_k·x_k for non-negative x_i. For the set {11, 17, 25}, the problem is equivalent to showing that the following formula is satisfiable:

∃y. ((∀k_1,k_2,k_3. ((0 ≤ k_1 ∧ 0 ≤ k_2 ∧ 0 ≤ k_3) ⇒ y ≉ 11·k_1 + 17·k_2 + 25·k_3)) ∧
  (∀x. ((∀k_1,k_2,k_3. ((0 ≤ k_1 ∧ 0 ≤ k_2 ∧ 0 ≤ k_3) ⇒ x ≉ 11·k_1 + 17·k_2 + 25·k_3)) ⇒ x ≤ y))).

The difficulty of the problem can be adjusted by changing k or the values a_i in the set {a_1, ..., a_k}.

Problems in the table simply check the formula above with the set of coefficients given. The final problem is counter-satisfiable (i.e., the negation of the above formula is satisfiable), as the parameters do not have gcd 1.

There are analytic solutions for k = 2 and k = 3. The run time of Cooper grows with the Frobenius number, which at least for k = 2 and k = 3 is proportional to a_1 × ... × a_k. However, the instance {3, 11, 17, 25} has Frobenius number 19 yet it is not solved, suggesting that the difficulty also scales with k.
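The Frobenius numbers quoted here can be cross-checked independently of any prover. The following is a minimal brute-force sketch, not the method used by either solver: it marks which totals are reachable as non-negative combinations and returns the largest unreachable one. The function name `frobenius_number` is hypothetical, and the search limit relies on the classical bound that the Frobenius number is below the product of the smallest and largest element.

```python
from functools import reduce
from math import gcd


def frobenius_number(values):
    """Largest integer that is not a non-negative combination of `values`.

    Returns None when gcd(values) != 1 (infinitely many unreachable numbers)
    and -1 when every non-negative integer is reachable.
    """
    values = sorted(set(values))
    if reduce(gcd, values) != 1:
        return None
    # Classical bound: the Frobenius number is below min(values) * max(values).
    limit = values[0] * values[-1]
    reachable = [False] * (limit + 1)
    reachable[0] = True
    for total in range(1, limit + 1):
        reachable[total] = any(
            total >= v and reachable[total - v] for v in values
        )
    return max((t for t in range(limit + 1) if not reachable[t]), default=-1)


print(frobenius_number([7, 8]))           # k = 2: 7*8 - 7 - 8 = 41
print(frobenius_number([3, 11, 17, 25]))  # 19, as stated in the text
```

For k = 2 the result agrees with the closed form a_1·a_2 − a_1 − a_2, which gives a quick sanity check on the two-element instances.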

Subset sum game. Consider a two-player game where, given a set of non-zero numbers S and a number n, the players alternately subtract a value in S from n until 0 is reached. Values in S are not removed during play. A player wins when they reach exactly 0, and loses if forced to subtract a value making the running sum negative. Hence, a player can also win if they force the other player to make a losing move. The problem is to show that, given a fixed set S and positive numbers n and k, there is a winning strategy for the first player in k steps. Expressed as a first-order formula, for S = {1, 3, 4} and n = 11, k = 3:

∃x_1. (((x_1 ≈ 1 ∨ x_1 ≈ 3 ∨ x_1 ≈ 4) ∧ 11 − x_1 ≥ 0) ∧
  ∀y_1. (((y_1 ≈ 1 ∨ y_1 ≈ 3 ∨ y_1 ≈ 4) ∧ 11 − x_1 − y_1 ≥ 0) ⇒
    ∃x_2. ((x_2 ≈ 1 ∨ x_2 ≈ 3 ∨ x_2 ≈ 4) ∧ 11 − x_1 − y_1 − x_2 ≈ 0)))

Although there are other, more effective algorithms for proving the existence of winning strategies, the value in this problem lies in the fact that the number of quantifiers can be adjusted by setting the number of steps k.

Problem instances listed in Table 3.1 have parameters |S|, n, k, where values in S are chosen from the range [1, ⌊n/2⌋] and k must be odd. Instances are allowed to be infeasible, e.g., if k·max(S) < n.

Problem difficulty appears to scale with the number of possible move sequences, roughly |S|^k. For example, where both |S| and k are small, the problems are easily solved regardless of the size of n. (In fact, if n is too large then the problem is always infeasible.) Conversely, if both |S| and k are large, then problems become difficult, even if n is small.
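To make the quantifier structure concrete, the game formula can be evaluated directly by brute force. The sketch below is one reading of the formula given for k = 3, generalised to any odd k: every intermediate move must keep the running value non-negative, and the final move of the first player must reach exactly 0. The function name `first_player_wins` is hypothetical; this is not how either prover solves the problem.

```python
from functools import lru_cache


def first_player_wins(moves, n, k):
    """Evaluate the quantifier alternation of the game formula by brute force.

    `moves` is the set S, `n` the starting number, `k` the odd number of steps.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd")
    moves = tuple(sorted(set(moves)))

    @lru_cache(maxsize=None)
    def wins(remaining, steps):
        if steps == 1:
            # Innermost existential: the first player must reach exactly 0.
            return remaining in moves
        for x in moves:  # exists x_i with a non-negative result
            if remaining - x < 0:
                continue
            # forall y_i: every legal reply must leave a winning position.
            if all(
                wins(remaining - x - y, steps - 2)
                for y in moves
                if remaining - x - y >= 0
            ):
                return True
        return False

    return wins(n, k)


print(first_player_wins({1, 3, 4}, 11, 3))
print(first_player_wins({1, 3, 4}, 5, 3))
```

Without memoisation the recursion explores up to |S|^k move sequences, which matches the scaling described above. Under this reading the example instance S = {1, 3, 4}, n = 11, k = 3 evaluates to False, since the second player can always reply with 1 and leave a remainder larger than 4.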

SAT problems. Boolean SAT problems can be encoded simply by replacing each Boolean variable x with the literal x′ ≥ 0, or by x′ ≈ 1 and then adding x ≈ 0 ∨ x ≈ 1. Checking satisfiability of the SAT problem is equivalent to checking satisfiability of the existential closure of the Σ_Z-formula. Certain common SAT problems have more efficient encodings.

The test problems include two encodings of the pigeon-hole problem. The first uses integer-sorted variables, one for each ‘pigeon’, and restricts the values each variable may take to be in [0, h−1] (h is the number of holes). The variable takes the value of the hole the pigeon is in. This is the existential (Ex.) encoding in the table. The second encoding uses a simple Boolean encoding where each Boolean variable (p_{x,y} means pigeon x is in hole y) is replaced with x ≥ 0. In the table, parameter p is the number of pigeons and h the number of holes.
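As an illustration of the existential encoding, the following sketch generates an instance as SMT-LIB 2 text. Several points are assumptions rather than facts from the text: the input format of the original benchmark files is not stated, SMT-LIB is used here only because it is widely supported, the pairwise disequalities are one plausible way to state that no two pigeons share a hole, and the function name `pigeonhole_existential_smt2` is hypothetical. Free constants stand in for the existentially quantified variables.

```python
from itertools import combinations


def pigeonhole_existential_smt2(pigeons, holes):
    """SMT-LIB 2 text for the integer ("Ex.") encoding of the pigeon-hole problem."""
    names = [f"x{i}" for i in range(pigeons)]
    lines = ["(set-logic QF_LIA)"]
    for name in names:
        lines.append(f"(declare-const {name} Int)")
        # Each pigeon sits in a hole numbered 0 .. holes - 1.
        lines.append(f"(assert (and (<= 0 {name}) (<= {name} {holes - 1})))")
    for a, b in combinations(names, 2):
        # Assumed constraint: no two pigeons share a hole.
        lines.append(f"(assert (not (= {a} {b})))")
    lines.append("(check-sat)")
    return "\n".join(lines)


print(pigeonhole_existential_smt2(3, 2))
```

The encoding needs only p integer variables, compared with p·h Boolean variables for the relational encoding, at the cost of p·(p−1)/2 disequalities.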

There is also an encoding of the n-queens problem with integer-sorted variables, where the ith variable represents the column position of the queen in the ith row.
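The expected status of small n-queens instances can be checked with a short brute-force sketch that uses the same representation, one integer per row giving the column of that row's queen. The constraints used (distinct columns and no shared diagonal) are the standard ones and are assumed to match the benchmark encoding; the function name `solve_n_queens` is hypothetical.

```python
from itertools import permutations


def solve_n_queens(n):
    """Return one assignment (column of the queen in each row), or None."""
    for columns in permutations(range(n)):
        # permutations() already gives distinct columns; check the diagonals.
        if all(
            abs(columns[i] - columns[j]) != j - i
            for i in range(n)
            for j in range(i + 1, n)
        ):
            return columns
    return None


print(solve_n_queens(3))  # None: no placement exists for n = 3
print(solve_n_queens(4))  # (1, 3, 0, 2)
```

This agrees with the statuses listed in Table 3.1: n = 3 is unsatisfiable, while n = 4 and n = 8 are satisfiable.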

All such SAT problems are limited to a single quantifier. The only adjustable parameters are the number of variables and the number of possible assignments.

The results show better performance of the Cooper solver on the existential encoding, with faster run times for each given parameter. As is typical for pigeon-hole problems, satisfiable instances are solved much more easily than unsatisfiable ones for similar parameter values. It is interesting to note the performance degradation of the SMT solver (CVC4) on the p = 10, h = 9 instance of the existential encoding compared to the Boolean encoding.

Summary. It is encouraging to see that the provers have somewhat complementary capabilities. Problems solved were in line with predictions made from an understanding of the search styles of the two algorithms. Cooper eliminates variables by considering the values of literals modulo divisibility constraints; this means it is strongly affected by coefficient values and only weakly by Boolean structure (i.e., depth of formulas, size of disjuncts/conjuncts). The CVC4 solver appears to be based on a projection method (a version of the Omega test), which operates on conjunctions. This requires a conversion to DNF, which can be accomplished more efficiently using a SAT solver working on the propositional abstraction of the LIA formula. The result is that more ‘Boolean-like’ problems are dealt with efficiently, such as the relationally encoded pigeon-hole problem, while more ‘LIA-like’ problems suffer somewhat. This is mitigated by component solvers which allow solving, e.g., linear systems of equations efficiently.

This suggests that the encoding must be taken into account when using the above theory solvers: ‘Boolean-like’ encodings will work better when sent to projection-style solvers, while arithmetic encodings will work better with Cooper.

In document Disproving in First-Order Logic with Definitions, Arithmetic and Finite Domains (Page 69-72)