An undirected weighted graph is a pair $(V, w)$ where $V$ is a set of vertexes and $w : V \times V \to \mathbb{R}^+ \cup \{0\}$ is the edge weight function that satisfies $w(x, y) = w(y, x)$.
This definition generalizes the classical notion of graph $(V, E)$, where $E \subseteq V \times V$, by taking $w(x, y) = 1$ if $(x, y) \in E$ and $w(x, y) = 0$ otherwise. The degree of a vertex $x$ is defined as $\deg(x) = \sum_{y \in V} w(x, y)$. A bipartite graph is a tuple $(V_1, V_2, w)$ where $V_1$ and $V_2$ are two disjoint sets of vertexes, and $w : V_1 \times V_2 \to \mathbb{R}^+ \cup \{0\}$ is the edge weight function.
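As a small illustration of these definitions, the following Python sketch (not part of the original tooling) stores the weight function of an undirected graph as a dictionary keyed by unordered pairs, which enforces $w(x, y) = w(y, x)$, and computes $\deg(x)$. The helper names `graph_from_edges` and `degree` are hypothetical. A classical graph $(V, E)$ is encoded with $w(x, y) = 1$ for every edge, so the degree reduces to the usual number of neighbors.

```python
def graph_from_edges(edges):
    """Classical graph (V, E) as a weight function: w(x, y) = 1 if (x, y) in E.

    Pairs absent from the returned dict implicitly have weight 0. Keys are
    frozensets, so w(x, y) and w(y, x) are the same entry (undirected graph).
    """
    return {frozenset(e): 1.0 for e in edges}


def degree(weights, x):
    """deg(x) = sum over y in V of w(x, y); absent pairs contribute 0."""
    return sum(w for edge, w in weights.items() if x in edge)


w = graph_from_edges([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])
print(degree(w, "c"))  # 3.0: vertex c has three neighbors, each with weight 1
```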
Given a SAT instance, we construct two graphs, following two models. In the Variable Incidence Graph model (VIG, for short), vertexes represent variables, and edges represent the existence of a clause relating two variables. A clause $x_1 \lor \cdots \lor x_n$ results in $\binom{n}{2}$ edges, one for every pair of variables. Notice also that more than one clause may relate two given variables. To preserve this information, we put a higher weight on edges connecting variables related by more clauses. Moreover, to give the same relevance to all clauses, we weight the contribution of a clause to an edge by $1/\binom{n}{2}$. This way, the sum of the weights of the edges generated by a clause (of length at least 2) is always 1.
Definition 2.1 (Variable Incidence Graph (VIG)). Given a SAT instance $\Gamma$ over the set of variables $X$, its variable incidence graph is a graph $(X, w)$ with set of vertexes the set of Boolean variables, and weight function:

$$w(x, y) = \sum_{\substack{c \in \Gamma \\ x, y \in c}} \frac{1}{\binom{|c|}{2}}$$

where $|c|$ is the length of the clause $c$.
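The following Python sketch builds the VIG weight function of Definition 2.1 from a formula given as a list of clauses. It assumes a DIMACS-like encoding in which each literal is a non-zero integer whose absolute value is the variable, and that no clause repeats a variable, so that $|c|$ equals the number of distinct variables in $c$. The function name `build_vig` is hypothetical, and this is not the implementation used in this thesis.

```python
from collections import defaultdict
from itertools import combinations
from math import comb


def build_vig(clauses):
    """VIG weights: w(x, y) = sum over clauses c with x, y in c of 1 / C(|c|, 2).

    Clauses are lists of non-zero ints (DIMACS-style literals); the sign is
    dropped because VIG vertexes are variables. Assumes no clause repeats a
    variable. Returns {(x, y): weight} with x < y; absent pairs have weight 0.
    """
    w = defaultdict(float)
    for clause in clauses:
        variables = sorted({abs(lit) for lit in clause})
        n = len(variables)
        if n < 2:
            continue  # a unit clause generates no edge
        for x, y in combinations(variables, 2):
            w[(x, y)] += 1 / comb(n, 2)
    return dict(w)


vig = build_vig([[1, -2, 3], [1, 2]])
# Variables 1 and 2 share both clauses, so w(1, 2) = 1/3 + 1/1.
```

Each clause of length $n \ge 2$ contributes $\binom{n}{2}$ edges of weight $1/\binom{n}{2}$, so its total contribution is 1, as required by the model.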
In the Clause-Variable Incidence Graph model (CVIG, for short), vertexes represent either variables or clauses, and edges represent the occurrence of a variable in a clause. As in the VIG model, we try to give the same relevance to all clauses, so every edge connecting a variable $x$ with a clause $c$ containing it has weight $1/|c|$. This way, the sum of the weights of the edges generated by a clause is also 1 in this model.
Definition 2.2 (Clause-Variable Incidence Graph (CVIG)). Given a SAT instance $\Gamma$ over the set of variables $X$, its clause-variable incidence graph is a bipartite graph $(X, \{c \mid c \in \Gamma\}, w)$, with vertexes the set of variables and the set of clauses, and weight function:
$$w(x, c) = \begin{cases} 1/|c| & \text{if } x \in c \\ 0 & \text{otherwise} \end{cases}$$
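Analogously, the following is a minimal sketch of the CVIG weight function of Definition 2.2, under the same assumptions on the clause encoding (DIMACS-like integer literals, no repeated variable within a clause). Clause vertexes are identified here by their index in the input list, which is an illustrative choice; `build_cvig` is a hypothetical name.

```python
def build_cvig(clauses):
    """CVIG weights as a bipartite map {(variable, clause_index): 1/|c|}.

    Only pairs with x in c are stored; all other pairs have weight 0.
    Assumes no clause repeats a variable, so |c| = len(clause).
    """
    w = {}
    for i, clause in enumerate(clauses):
        for lit in clause:
            w[(abs(lit), i)] = 1 / len(clause)
    return w


cvig = build_cvig([[1, -2, 3], [1, 2]])
# Clause 0 has three variables, so each of its edges weighs 1/3.
```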
Other graph representations of SAT instances, which may be useful in future work, are discussed in Section 9.2.1.
Chapter 3
Related Work
In this chapter, we summarize related work on the underlying structure of real-world problems, with special emphasis on SAT instances, and its relation to the cost of solving such problems, as well as on the generation of random SAT instances with properties of real-world benchmarks. First, we introduce general background on structure in search problems in Section 3.1. Then, we review related work on the generation of pseudo-industrial random SAT instances in Section 3.2. Finally, we dedicate two separate sections to two works that inspired this thesis. On the one hand, Section 3.3 summarizes the analysis of the scale-free structure of industrial SAT formulas [Ansótegui et al., 2009a]. That work studies whether the number of variable occurrences and the clause size follow power-law distributions. It is worth noting that all the tools needed to compute these structure features were re-implemented and integrated into our graph structure features software. Moreover, the empirical results have been evaluated on a set of benchmarks distinct from the one used in [Ansótegui et al., 2009a], complementing the conclusions drawn in that work. On the other hand, Section 3.4 reviews the pseudo-industrial random SAT instance generator based on the scale-free structure of real-world problems [Ansótegui et al., 2009b]. This generator produces random SAT instances with a power-law distribution in the number of variable occurrences. The authors also use this generator to analyze the performance of CDCL SAT solving techniques, with the aim of better understanding their success on application benchmarks.
We follow this approach to propose a new pseudo-industrial random SAT instance generator based on the notions of community structure and modularity (instead of scale-free structure). This generator is presented in Chapter 6.
3.1 Structure in search problems
The topology of a graph has a major impact on the cost of solving search problems on it. Gent et al. [1999] analyze the impact of a small-world topology on the cost of finding solutions to graph coloring problems. Walsh [2001] does the same for scale-free graphs. Walsh [1999] analyzes the small-world topology of many graphs associated with search problems in Artificial Intelligence. He also shows that the cost of solving these search problems can have a heavy-tailed distribution. Therefore, we can expect that SAT solving, viewed as a search process on a graph (the formula), will be affected by the topology of this graph.
As mentioned before, Ansótegui et al. [2009a] studied the scale-free structure of real-world SAT instances, analyzing whether the number of variable occurrences and the clause size follow power-law distributions. We review this work in Section 3.3.
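Such an analysis starts from two empirical frequency distributions: how many variables occur exactly $k$ times, and how many clauses have length $k$. The Python sketch below computes only these raw frequencies, assuming DIMACS-like integer literals; the function name is hypothetical. Deciding whether the frequencies actually follow a power law requires a separate fitting step that is not shown here.

```python
from collections import Counter


def occurrence_and_size_distributions(clauses):
    """Raw frequencies behind a scale-free analysis of a CNF formula.

    Returns two Counters:
      occ_dist[k]  = number of variables occurring exactly k times
      size_dist[k] = number of clauses of length k
    Literals are non-zero ints; x and -x count as occurrences of variable x.
    """
    occurrences = Counter(abs(lit) for clause in clauses for lit in clause)
    occ_dist = Counter(occurrences.values())
    size_dist = Counter(len(clause) for clause in clauses)
    return occ_dist, size_dist


occ_dist, size_dist = occurrence_and_size_distributions([[1, -2, 3], [1, 2], [-1, 4]])
# Variable 1 occurs 3 times, variable 2 twice, variables 3 and 4 once each.
```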
Power-law distributions also appear in other aspects of SAT solving. Gomes et al. [2004] show that the CPU time of different executions of a solver on a formula (with different random variable selections) follows a power-law distribution. Gomes and Selman [1997] and Gomes et al. [1998] propose the use of randomization and rapid restart techniques to prevent solvers from falling into the long tail of power-law distributions.
Biere and Sinz [2006] show that many SAT instances can be decomposed into connected components, and how to handle them within a SAT solver. They discuss how this decomposition into connected components can be used to improve the performance of SAT solvers. Since our notion of community structure is more general, it might be more useful for analyzing and improving the performance of SAT solvers.
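To make the notion of decomposition concrete, the following sketch groups the clauses of a formula into connected components using a generic union-find over variables. It is an illustration under our own assumptions (DIMACS-like integer literals, a hypothetical function name, empty clauses ignored), not the method of Biere and Sinz [2006]. Since different components share no variables, the formula is satisfiable if and only if every component is satisfiable, so each one could be handed to a solver separately.

```python
def connected_components(clauses):
    """Group clause indices into variable-disjoint connected components.

    Two clauses fall in the same component if a chain of clauses sharing
    variables links them. Empty clauses have no variables and are skipped.
    """
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Merge all variables that co-occur in some clause.
    for clause in clauses:
        variables = [abs(lit) for lit in clause]
        for v in variables[1:]:
            union(variables[0], v)

    # Assign each clause to the component of its first variable.
    groups = {}
    for i, clause in enumerate(clauses):
        if clause:
            groups.setdefault(find(abs(clause[0])), []).append(i)
    return list(groups.values())


comps = connected_components([[1, -2], [3, 4], [2, 5], [-4]])
# Clauses 0 and 2 share variable 2; clauses 1 and 3 share variable 4.
```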
Our work on the community structure of SAT instances [Ansótegui et al., 2012] has already had an impact on the SAT community. A pioneering work on using community structure to speed up solvers was presented by Martins et al. [2013]. In particular, they propose to solve Maximum Satisfiability (MaxSAT) formulas by partitioning them according to the community structure and incrementally adding to the MaxSAT solver the sets of clauses related to communities. This approach is improved in [Neves et al., 2015]. Sonobe et al. [2014] use the partition obtained from the community structure to improve the performance of a parallel SAT solver. Newsham et al. [2014] show that community structure is correlated with the runtime of CDCL SAT solvers. In [Newsham et al., 2015], a tool for visualizing the community structure of SAT instances is presented, and some empirical observations about the evolution of this structure with respect to CDCL SAT solvers are enumerated. Ganian and Szeider [2015] use community structure as inspiration to define a structural parameter of CNF instances, and they give tractable algorithms for SAT and #SAT when this structural parameter is fixed.
Other structural features of SAT instances have also been studied. For instance, Katsirelos and Simon [2012] study the centrality of the variables picked by a CDCL solver. Simon [2014] uses observations of SAT solver performance on industrial problems to better understand some characteristics of unsatisfiability proofs. Other notions of structure have also been used to better understand the branching heuristics of modern SAT solvers and to improve them [Liang et al., 2015a,b].