Discussion - Statistical Methods for the Analysis of Stochastic Optimisers

3. Statistical Methods for the Analysis of Stochastic Optimisers


Definition 3.3 A landscape L = (S, N, f) is connected at a level l if, and only if, there exists a path in the neighbourhood graph G_N between any two solutions s, s′ ∈ S, visiting only solutions t with f(t) ≤ l. If no restriction on f(t) is given, we say that L is connected, without further specification of the level.

The neighbourhood definition plays a fundamental role in the definition of connected landscapes. Most neighbourhood structures result in a connected landscape, if we do not impose any restriction on the level. But things may change with the introduction of constraints and with the adoption of particular search strategies.
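Definition 3.3 can be checked mechanically on a small, explicitly enumerated landscape. The following is a minimal sketch, not part of the original analysis: it assumes that only solutions with f(s) ≤ l are required to be mutually reachable, and the names `connected_at_level`, `line_neighbours` and `evaluate`, as well as the toy landscape, are hypothetical.

```python
from collections import deque

def connected_at_level(solutions, neighbours, f, level=None):
    """Check connectivity of the landscape restricted to f(t) <= level.

    Assumption: only solutions with f(s) <= level must be mutually
    reachable; level=None means that no restriction is imposed.
    """
    allowed = {s for s in solutions if level is None or f(s) <= level}
    if len(allowed) <= 1:
        return True
    start = next(iter(allowed))
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search in the neighbourhood graph
        s = queue.popleft()
        for t in neighbours(s):
            if t in allowed and t not in seen:
                seen.add(t)
                queue.append(t)
    return seen == allowed

# Toy landscape (hypothetical): five solutions on a line, barrier at solution 2.
values = [0, 1, 5, 1, 0]

def line_neighbours(s):
    return [t for t in (s - 1, s + 1) if 0 <= t < len(values)]

def evaluate(s):
    return values[s]

unrestricted = connected_at_level(range(5), line_neighbours, evaluate)
restricted = connected_at_level(range(5), line_neighbours, evaluate, level=1)
print(unrestricted, restricted)
```

In this toy case the landscape is connected without a level restriction, but the barrier splits it into two regions once the level is set below the barrier height.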

We illustrate the possible pitfall with an example. In general, search landscapes are multi-dimensional and extremely difficult to visualise. In Figure 3.10 we reduce the search space to two dimensions and provide a three-dimensional view which includes the evaluation function level and the contours of quality levels. A landscape similar to the one in Figure 3.10(a) may occur in a highly constrained problem with hard and soft constraints weighted in a single evaluation function. The higher barrier may then consist of all those solutions that violate some hard constraints, since these are much more severely weighted in the evaluation function. The search space is in this case clearly connected if no level restriction is imposed on the local search.

If we adopt a different strategy, that is, if we decide to weigh only the soft constraints in the evaluation function and to forbid visiting infeasible solutions, then the search landscape becomes disconnected even without any level restriction on the local search. This case is visualised in Figure 3.10(b). Clearly, in a disconnected landscape the search may become trapped in single separated regions. In addition, SLS methods tend to limit the magnitude of the worsening steps they accept in order to escape from local optima. A typical case is Simulated Annealing, which uses a probabilistic acceptance criterion that links the probability of moving from a solution to a worse one to the size of the barrier between the two solutions in the landscape. As the search proceeds and the probabilistic criterion becomes tighter, even small barriers become insurmountable. Thus the search landscape may become characterised by many small basins of attraction, as depicted in Figure 3.10(c).
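The tightening of the acceptance criterion can be made concrete with a small sketch. It assumes the standard Metropolis rule for a minimisation problem, which is the usual choice in Simulated Annealing but is not spelled out in the text; the function names and the temperature values are hypothetical.

```python
import math
import random

def acceptance_probability(delta, temperature):
    """Metropolis rule (assumed here): delta = f(new) - f(current), minimisation."""
    if delta <= 0:
        return 1.0  # improving or sideways moves are always accepted
    return math.exp(-delta / temperature)

def accept(delta, temperature, rng=random):
    """Draw a random decision according to the acceptance probability."""
    return rng.random() < acceptance_probability(delta, temperature)

# The same barrier of height 1 becomes harder to cross as the temperature drops.
for temperature in (10.0, 1.0, 0.1):
    print(temperature, acceptance_probability(1.0, temperature))
```

Under this rule the probability of climbing a fixed barrier decays exponentially as the temperature decreases, which is one way to read the fragmentation into small basins of attraction described above.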

3.10. Discussion

We reviewed the main methods for analysing stochastic optimisers. We tried to collect and organise the tools of applied statistics that are most relevant in the context of combinatorial optimisation. In this process, we recognised that a statistical analysis of algorithms leaves many issues open. We will try to collect them in the conclusions of this thesis.

(a) All constraints weighted in the evaluation function.

(b) The path through solutions that violate some constraints is not allowed (vertical white line).

Figure 3.10.: A simplified two dimensional search space landscape with a continuous evaluation function for three different SLS strategies.

Of central importance for the whole thesis is the definition of the statistical methods for the analysis of experiments on the basis of solution quality. We will provide evidence in Chapter 4 of the difficulty of adapting parametric methods to the analysis of stochastic optimisers. We will instead compare permutation tests and rank-based tests and give preference to the most powerful. A simulation study is reported in Appendix B for validating and comparing these two approaches. When the behaviour of the tests is similar, the choice of the test depends on the real application, because rank-based tests discard the magnitude of the differences on single instances while permutation tests take these magnitudes into account. However, the experience in this thesis seems to confirm the widespread belief that parametric tests remain robust under considerable violations of some of their assumptions, while the implementation of permutation tests still requires calibration efforts.
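The difference between the two families of tests can be illustrated on paired data. The sketch below is only an illustration under stated assumptions: it uses an exact sign-flip permutation test on the differences between two algorithms over the same instances, and obtains a rank-based counterpart by applying the same procedure to the signed ranks. It is not the procedure calibrated in this thesis, and all names and data are hypothetical.

```python
from itertools import product

def sign_flip_pvalue(diffs):
    """Exact two-sided paired permutation test on the sum of the differences."""
    observed = abs(sum(diffs))
    count = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        statistic = abs(sum(s * d for s, d in zip(signs, diffs)))
        if statistic >= observed - 1e-12:  # small tolerance for float inputs
            count += 1
    return count / total

def signed_ranks(diffs):
    """Signed ranks of the differences (assumes non-zero, distinct magnitudes)."""
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0] * len(diffs)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank if diffs[i] > 0 else -rank
    return ranks

def signed_rank_pvalue(diffs):
    """Rank-based test: the same sign-flip procedure applied to signed ranks."""
    return sign_flip_pvalue(signed_ranks(diffs))

# Hypothetical per-instance differences; only the last magnitude changes.
diffs_small = [1, 2, 3, 4, -5]
diffs_large = [1, 2, 3, 4, -50]

for diffs in (diffs_small, diffs_large):
    print(diffs, sign_flip_pvalue(diffs), signed_rank_pvalue(diffs))
```

Both data sets have identical signed ranks, so the rank-based p-value does not change, whereas the permutation test reacts to the size of the single large difference.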

Of particular relevance is also sequential analysis, which constitutes the central aspect of the experimental methodology for the development of SLS algorithms described in Chapter 6. A time-dependent analysis will be used only to corroborate, or to restrict the extent of, the results inferred by the previous methods.

The space devoted to landscape analysis in this thesis, instead, is marginal and certainly insufficient. This is a very attractive area of research, as it could unveil the mystery connected with local search algorithms and provide rational arguments for their use. But it certainly requires full-time research and strong skills in a series of disciplines such as Mathematics, Programming, Statistics, Probability Theory, and Physics, plus a certain degree of creativity in envisaging new hypotheses to test.

Chapter 4. Graph Colouring

In which we deal with Stochastic Local Search algorithms for solving large instances of the graph colouring problem. We introduce new methods, study in depth the use of large scale neighbourhoods, and provide extensive experimental results on benchmark graphs.

4.1. Introduction

The interest in graph colouring originates from the question of how many different colours are necessary to colour the countries of a map in such a way that no pair of adjacent countries receives the same colour. This question, which can be formalised by the use of planar graphs, has spawned an enormous amount of mathematical research, and only rather recently has the conjecture that four colours are enough for colouring planar graphs finally been proved true (Appel et al., 1977).

The graph colouring problem (GCP) for more general graphs than planar ones remains a central problem of graph theory with many important real-life applications. Some of these applications were already mentioned in the introduction of this thesis: register allocation (Allen et al., 2002), job scheduling (Leighton, 1979), air traffic flow management (Barnier and Brisset, 2002), light wavelength assignment in optical networks (Zymolka et al., 2003), and timetabling (de Werra, 1985). Another real-life situation that can be modelled as a GCP is testing for unintended short circuits in printed circuit boards, where “nets” can be partitioned into supernets that can be tested simultaneously, thus speeding up the testing process (Garey et al., 1976). Two applications of graph colouring have also been pointed out in Mathematics: solving the algebraic structure of Quasigroups (Gomes and Shmoys, 2002), and numerical estimation of large, sparse Jacobian matrices (Hossain and Steihaug, 2002). In statistics, the problem of constructing Latin square designs is also solved through graph colouring (Lewandowski and Condon, 1996; see also footnote 2 of Chapter 3, on page 45). All these applications entail solving large GCP instances where the size of the graphs usually exceeds 100 vertices.

In this chapter, we study the application of SLS methods to the GCP. We introduce a new local search with a very large scale neighbourhood, and we are the first to implement and test other variants of SLS methods that were successful for the satisfiability problem in propositional logic and for constraint satisfaction problems. We empirically assess the performance of construction heuristics, iterative improvement procedures, and high-performing SLS algorithms. In this latter case, we compare our new algorithms with state-of-the-art algorithms through a rigorous experimental analysis. The final outcome is an unbiased evaluation of approximate algorithms for solving large GCP instances which will become an important reference in the empirical analysis and choice of algorithms for this problem. We also attempt a more detailed map of algorithms in relation to structural characteristics of the graphs to be solved. Finally, for some algorithms we provide more in-depth analyses showing the behaviour with respect to run time, investigating reasonable stopping times, or explaining why they perform unexpectedly poorly.

4.2. Formal definition of the problem and notation

In the graph colouring problem we are given an undirected graph G = (V, E) and a set of colours Γ. The finite set V is the set of vertices, while the set E ⊂ V × V is the set of edges.

A colouring is a mapping ϕ : V → Γ that assigns exactly one colour to each vertex. The set of colours is written as a set of natural numbers: Γ = {1, . . . , k}, and, hence, |Γ| = k. Equivalently, a colouring can be seen as a partition C of the set of vertices into k subsets, that is, C = {C_1, . . . , C_k}, where the vertices in the colour class C_i are coloured with colour i ∈ Γ. The assignment ϕ and the partition C are two equivalent ways of defining a colouring and we will use them interchangeably, as is convenient in the exposition.

A colouring is said to be feasible (or legal) if there are no pairs of vertices u, v ∈ V such that (u, v) ∈ E and ϕ(u) = ϕ(v). Conversely, we say that a colouring is infeasible if there exists an edge (u, v) ∈ E such that ϕ(u) = ϕ(v). In this case we say that the end vertices of such an edge are in conflict.
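The feasibility condition translates directly into a check over the edge list. The following minimal sketch assumes that a colouring is stored as a dictionary from vertices to colours; the function names and the example graph are hypothetical.

```python
def conflicting_edges(edges, colouring):
    """Edges whose two end vertices receive the same colour."""
    return [(u, v) for u, v in edges if colouring[u] == colouring[v]]

def is_feasible(edges, colouring):
    """A colouring is feasible iff no edge is in conflict."""
    return not conflicting_edges(edges, colouring)

# Hypothetical example: a triangle 0-1-2 with a pendant vertex 3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
legal = {0: 1, 1: 2, 2: 3, 3: 1}
illegal = {0: 1, 1: 1, 2: 2, 3: 2}

print(is_feasible(edges, legal), conflicting_edges(edges, illegal))
```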

The decision version of the GCP, called the vertex k-colouring problem, consists in finding a feasible colouring using a defined number k of colours. It can formally be defined as

Input: An undirected graph G = (V, E) and a set of colours Γ with |Γ| = k ≤ |V|.

Question: Is there a k-colouring ϕ : V → Γ such that ϕ(u) ≠ ϕ(v) for all (u, v) ∈ E?

The chromatic number χ(G) is a characteristic of the graph and corresponds to the smallest k such that a feasible k-colouring exists. The optimisation version of the GCP, also known as the chromatic number problem, consists in determining χ(G), and can be formalised as

Input: An undirected graph G = (V, E) and a set of colours Γ with |Γ| = k ≤ |V|.

Question: Which is the smallest k such that a feasible k-colouring exists?

The chromatic number problem can be approached by solving a decreasing sequence of k-colouring problems until for some k a feasible colouring cannot be found. In this case, the best feasible colouring uses k + 1 colours and this is the chromatic number of the graph.
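The decreasing sequence of k-colouring problems can be sketched as follows. This is an illustration only: it assumes an exact backtracking solver for the decision problem, so that a failure at k really proves that no feasible k-colouring exists. With an incomplete method such as an SLS algorithm, the same loop yields only an upper bound on the chromatic number. The function names are hypothetical and the graph is assumed to be non-empty.

```python
def k_colouring(vertices, adj, k):
    """Exact backtracking search; returns a feasible k-colouring or None."""
    colouring = {}

    def extend(i):
        if i == len(vertices):
            return True
        v = vertices[i]
        for c in range(1, k + 1):
            # Colour c is admissible if no already coloured neighbour uses it.
            if all(colouring.get(u) != c for u in adj[v]):
                colouring[v] = c
                if extend(i + 1):
                    return True
                del colouring[v]
        return False

    return dict(colouring) if extend(0) else None

def chromatic_number(vertices, adj):
    """Solve k-colouring for k = |V|, |V| - 1, ... until it becomes infeasible."""
    k = len(vertices)
    best = None
    while k >= 1:
        colouring = k_colouring(vertices, adj, k)
        if colouring is None:
            break
        best = colouring
        k -= 1
    return k + 1, best

# Hypothetical example: a cycle on five vertices.
vertices = [0, 1, 2, 3, 4]
adj = {v: [(v - 1) % 5, (v + 1) % 5] for v in vertices}
chi, colouring = chromatic_number(vertices, adj)
print(chi, colouring)
```

The exhaustive search is exponential in the worst case and is meant only for very small graphs.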

Next, we introduce a few definitions from graph theory which will be used throughout the rest of the chapter. The order of a graph is the number of vertices in the graph, i.e., n = |V|. The edge density ρ(G) is the proportion of |E| with respect to the potential edges of G, i.e., $\rho(G) = |E| / \binom{|V|}{2}$. Graphs with a number of edges that is roughly quadratic in the number of vertices are usually called dense, as opposed to sparse graphs, which exhibit, instead, a linear dependence.¹


The degree of a vertex, denoted by d(v), is the number of edges which have v as one of their end points. The maximal degree ∆(G) and the minimal degree δ(G) are, respectively, the maximal and the minimal vertex degree over all vertices in V. The average degree of G is given by $d(G) = \frac{1}{|V|} \sum_{v \in V} d(v)$.
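The quantities just defined can be computed in one pass over the edge list. The sketch below assumes a simple undirected graph with at least two vertices, given as a vertex list and an edge list; the function name and the example are hypothetical.

```python
from math import comb

def graph_statistics(vertices, edges):
    """Order, edge density and degree statistics of a simple undirected graph."""
    n = len(vertices)
    degree = {v: 0 for v in vertices}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {
        "order": n,
        "density": len(edges) / comb(n, 2),   # |E| over the potential edges
        "max_degree": max(degree.values()),
        "min_degree": min(degree.values()),
        "avg_degree": sum(degree.values()) / n,
    }

# Hypothetical example: a path on four vertices.
stats = graph_statistics([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3)])
print(stats)
```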

If all the vertices of G are pairwise adjacent, then G is complete, while a set of vertices is independent (or stable) if no two of its elements are adjacent. An independent set is also called a proper set, while if one or more edges exist between its vertices it is said to be improper. A path is a non-empty graph P_{l−1} = (V, E) with V = {v_1, v_2, . . . , v_l} and E = {(v_1, v_2), (v_2, v_3), . . . , (v_{l−1}, v_l)}, where all v_i are distinct. A cycle is a graph C_l = (V, E ∪ {(v_l, v_1)}) with V, E taken from the path definition and l ≥ 3. The length of a path or cycle is the number of edges it contains, i.e., l − 1 or l, respectively. A subgraph of G = (V, E) is a graph G′ = (V′, E′) with V′ ⊆ V and E′ ⊆ E. A non-empty graph G is called connected if any two of its vertices are linked by a path in G. A connected component is a maximal connected subgraph of G. A complete graph on r vertices is denoted by K_r. The greatest number r such that G contains K_r as a subgraph is called the clique number of the graph and is denoted by ω(G).

In order to distinguish the value of the chromatic number and the clique number from approximations of their values we sometimes use for the approximations the notation χ̂(G) and ω̂(G), respectively. Clearly, we have χ̂(G) ≥ χ(G) and ω̂(G) ≤ ω(G).

We also introduce the following notation which will serve for the description of our algorithms (U and T are two subsets of V , and, for brevity, if U = {v} we write U = v).

• V^c is the set of vertices in V that are involved in at least one conflict, i.e., V^c = {v ∈ V : ∃u ∈ V, ϕ(v) = ϕ(u), (u, v) ∈ E};

• A_U(v) is the set of vertices in U adjacent to the vertex v, i.e., A_U(v) = {u ∈ U : (u, v) ∈ E};²

• E_U(T) with T ⊆ U is the set of edges that connect vertices in T with vertices in U, i.e., E_U(T) = {(u, v) ∈ E : u ∈ U, v ∈ T}. Note that because T ⊆ U, E_U(T) also contains all edges (u, v) with u, v in T. As a special case, we denote by E^c_U the set of edges between vertices in U that have both end vertices in the same colour class, i.e., E^c_U = {(u, v) ∈ E : u, v ∈ U, ϕ(u) = ϕ(v)}. Clearly, $E^c_V = \bigcup_{i=1}^{k} E^c_{C_i}$.

The following relations between the number of adjacent vertices |A_U(v)| and the number of incident edges |E_U(T)| follow directly from the definitions above and from the set-theoretic principle that a set contains no element more than once.

Remark 4.1 |E_U(v)| = |A_U(v)|, ∀v ∈ U, U ⊆ V.

¹ […] tends to infinity. In addition, the concept of a sparse graph is distinct from the concept of low-density graphs. By sparse we mean families of graphs for which the number of edges |E| is in O(|V|) and so the density decreases with increasing |V|. Meanwhile, by low-density graphs we mean families of graphs which have O(|V|^2) edges but for which the density remains small but constant with increasing |V| (for example, we will encounter uniform randomly generated graphs which maintain a fixed density of around 0.1 independently of their size).

² In Graph Theory, the set A is often referred to as the neighbourhood of vertex v. Here we avoid this […]

Remark 4.2 $|E_U(U)| = \left| \bigcup_{v \in U} E_U(v) \right| = \frac{1}{2} \sum_{v \in U} |E_U(v)|$, ∀U ⊆ V.
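The set notation and the two remarks can be checked on a small example. The sketch below assumes a simple undirected graph without self-loops whose edges are stored as a set of two-element frozensets, so that multiplicities are avoided by construction. The function names `adjacent`, `incident` and `conflicting` are hypothetical stand-ins for A_U(v), E_U(T) and V^c.

```python
def adjacent(U, v, edges):
    """A_U(v): vertices of U adjacent to v."""
    return {u for u in U if frozenset((u, v)) in edges}

def incident(U, T, edges):
    """E_U(T) with T a subset of U: edges inside U with at least one end in T."""
    return {e for e in edges if e <= U and e & T}

def conflicting(edges, colouring):
    """V^c: end vertices of the edges whose two ends share a colour."""
    return {v for e in edges for v in e
            if len({colouring[u] for u in e}) == 1}

# Hypothetical example: triangle 0-1-2 with a pendant vertex 3.
edges = {frozenset(e) for e in [(0, 1), (1, 2), (0, 2), (2, 3)]}
U = {0, 1, 2}

# Remark 4.1: |E_U(v)| = |A_U(v)| for every v in U.
remark_1 = all(len(incident(U, {v}, edges)) == len(adjacent(U, v, edges)) for v in U)

# Remark 4.2: |E_U(U)| equals half the sum of |E_U(v)| over v in U.
union = set().union(*(incident(U, {v}, edges) for v in U))
half_sum = sum(len(incident(U, {v}, edges)) for v in U) / 2
remark_2 = len(incident(U, U, edges)) == len(union) == half_sum

print(remark_1, remark_2, conflicting(edges, {0: 1, 1: 1, 2: 2, 3: 3}))
```

The factor one half in Remark 4.2 reflects that each edge inside U is counted once from each of its two end vertices.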

4.3. Known theoretical results, complexity, and approximations

The focus of our study is on finding the chromatic number. For some special graphs,³ χ(G) is known; such cases are the following.

• χ(G) = 1 iff G is totally disconnected;

• χ(K_n) = n;

• χ(C_n) = 3 for n odd and χ(C_n) = 2 for n even, with n > 1;

• $\chi(\overline{C}_{2n+1}) = n + 1$, where $\overline{C}_{2n+1}$ is the complement (i.e., a graph on the same set of vertices but with all edges not present in the original graph) of an odd cycle of order at least 5;

• χ(S_n) = 2, where S_n is a star graph (i.e., a tree with one vertex of degree n and all others of degree 1) and n > 1;

• χ(W_n) = 3 for n odd and χ(W_n) = 4 for n even, where W_n is a wheel graph and n > 2;

• χ(G) ≥ 3 iff G has a cycle of odd length;

• χ(G) ≤ 4 for any planar graph G. This famous result, called the Four Colour Theorem, seems to have been first conjectured in a letter from De Morgan to Hamilton in 1852. Recently, shorter proofs, yet still based on computer, appeared by Robertson et al. (1996).⁴
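Several of the values listed above can be confirmed by exhaustive enumeration on small instances. The sketch below is a sanity check only, exponential in the number of vertices. It reads W_n as the wheel on n vertices in total (one hub plus a rim of n − 1 vertices), which is the convention under which the stated parities hold, and all function names are hypothetical.

```python
from itertools import product

def chromatic_number(n, edges):
    """Smallest k admitting a feasible colouring, by exhaustive enumeration."""
    for k in range(1, n + 1):
        for colours in product(range(k), repeat=n):
            if all(colours[u] != colours[v] for u, v in edges):
                return k

def complete(n):
    return [(u, v) for u in range(n) for v in range(u + 1, n)]

def cycle(n):
    return [(i, (i + 1) % n) for i in range(n)]

def complement(n, edges):
    present = {frozenset(e) for e in edges}
    return [e for e in complete(n) if frozenset(e) not in present]

def star(n):
    # One centre (vertex 0) of degree n and n leaves: n + 1 vertices in total.
    return [(0, i) for i in range(1, n + 1)]

def wheel(n):
    # Assumed convention: hub 0 plus a rim cycle on vertices 1..n-1.
    rim = n - 1
    spokes = [(0, i) for i in range(1, n)]
    return spokes + [(1 + i, 1 + (i + 1) % rim) for i in range(rim)]

checks = {
    "K_4": chromatic_number(4, complete(4)),
    "C_5": chromatic_number(5, cycle(5)),
    "C_6": chromatic_number(6, cycle(6)),
    "complement of C_7": chromatic_number(7, complement(7, cycle(7))),
    "S_3": chromatic_number(4, star(3)),
    "W_5": chromatic_number(5, wheel(5)),
    "W_6": chromatic_number(6, wheel(6)),
}
print(checks)
```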

The k-colouring problem for arbitrary k was shown to be NP-complete by Karp (1972). It is known to be solvable in polynomial time for k = 2 and, for arbitrary k, on the following special graphs: comparability graphs, chordal graphs, circular arc graphs, (3,1) graphs, and interval graphs (Garey and Johnson, 1979). The problem remains NP-complete for k = 3 on graphs having no vertex degree exceeding 4 (Stockmeyer, 1973), and for arbitrary k on intersection graphs for straight line segments in the plane and on circle graphs.
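The polynomial case k = 2 can be illustrated with a breadth-first search that alternates two colours and stops as soon as an edge joins two vertices of the same colour, which happens exactly when the graph contains an odd cycle. This is a standard textbook procedure given here as a sketch; the function name and the adjacency-dictionary representation are assumptions.

```python
from collections import deque

def two_colouring(adj):
    """Return a feasible 2-colouring as a dict, or None if an odd cycle exists."""
    colour = {}
    for start in adj:                 # handle every connected component
        if start in colour:
            continue
        colour[start] = 1
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in colour:
                    colour[u] = 3 - colour[v]   # alternate between 1 and 2
                    queue.append(u)
                elif colour[u] == colour[v]:
                    return None                 # conflict: odd cycle found
    return colour

def cycle_adjacency(n):
    return {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}

even = two_colouring(cycle_adjacency(6))
odd = two_colouring(cycle_adjacency(5))
print(even, odd)
```

Each vertex and edge is examined a constant number of times, so the running time is linear in |V| + |E|.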

³ A formal definition of the special graphs mentioned in this section is not important in our discussion. The interested reader is referred to Diestel (2000) or, for a comprehensive survey on the topic, to Brandstädt et al. (1999).

⁴ Cahit (2004) has recently proposed a non-computer proof but his paper is not yet published and a history of incorrect “proofs” for this problem suggests caution. Note that an O(n^2) time complexity algorithm for four-colouring planar graphs can be derived from the proof of Robertson et al. (1996) but it seems not practical with regard to implementation. With regard to our introductory problem of colouring geographic maps, in practice the need arises for models with more general graphs than the planar ones. Countries with exclaves like Russia and single point bordering countries make the four-colour theorem not applicable. In practice, the assignment task is solved by simple construction heuristics such as the DSATUR, described later (Freimer, 2000).

Approximability results | Non-approximability results

Factor | Due to | Factor | Due to | Assumption

O(n (log log n)^2 / log^3 n) | Halldórsson (1993) | O(n^{1−ε}) | Feige and Kilian (1998) | NP ≠ ZPP

O(n (log log n / log n)^3) | Berger and Rompel (1990) | O(n^{1/5−ε}) | Bellare et al. (1998) | NP ≠ coRP

O(n (log log n / log n)^2) | Wigderson (1983) | O(n^{1/7−ε}) | Feige and Kilian (1998) | P ≠ NP

O(n / log n) | Johnson (1974) | O(n^{1/15−ε}) | Bellare et al. (1998) | P ≠ NP

Table 4.1.: Bounds on the approximability and non-approximability of the chromatic number problem. The strength of the results increases going upwards in the table. The two conditions NP ⊈ ZPP and NP ⊈ coRP are known to be equivalent.

“Bad” results are known even on the approximability of the chromatic number. For most existing polynomial-time graph colouring algorithms the absolute performance ratio can be as bad as O(n), where n = |V| (we recall that the absolute performance ratio is the largest ratio of colours used by the algorithm to the chromatic number over all possible graphs, see also Section 2.2). The best absolute performance guarantee for an approximation algorithm, O(n (log log n)^2 / log^3 n), is given by Halldórsson (1993).

Even worse, there exist results showing that, unless P = NP, no polynomial-time approximation algorithm can achieve certain approximation ratios. We report some of these results in Table 4.1. The tightest bound on polynomial-time approximation of the chromatic number is presented by Feige and Kilian (1998), who state that unless NP ⊆ ZPP it is intractable to approximate χ(G) to within n^{1−ε} for any constant ε > 0. The class ZPP, with P ⊆ ZPP, arises in the theory of randomised computation and comprises problems that can be solved in expected polynomial time by a probabilistic algorithm that makes no errors. More details on approximability for graph colouring can be found in the survey by Paschos (2003) and on the Internet (Crescenzi and Viggo, 2004).

In the general case, lower and upper bounds on the chromatic number exist. A simple upper bound is given by ∆(G) + 1. Brooks' theorem tightens this bound to χ(G) ≤ ∆(G) if G is connected and is neither a complete graph nor an odd cycle. A lower bound is instead ω(G). However, finding the maximum clique in a graph is also an NP-hard problem (Garey and Johnson, 1979). Tighter lower bounds for specific classes of graphs may be found in the literature. Johri and Matula (1982) provide tables of lower bounds for random graphs obtained with the probabilistic method, which is a versatile approach for graph theory on random graphs. They also provide tables of estimates of the chromatic number. Finally, a conjecture based on a theorem of Bollobás and Thomason (1985) states that, for random graphs of sufficiently large order and with edge density equal to 0.5, χ(G) is with high probability equal to |V| / (2 log_2 |V|) (Bollobás, 2004).⁵
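The upper bound ∆(G) + 1 has a constructive reading: a sequential greedy procedure that gives each vertex the smallest colour not used by its already coloured neighbours never needs more than ∆(G) + 1 colours, because a vertex has at most ∆(G) neighbours. The sketch below illustrates this argument only; it is not one of the construction heuristics evaluated in this chapter, and the names are hypothetical.

```python
def greedy_colouring(adj, order=None):
    """Assign to each vertex the smallest colour unused by its coloured neighbours."""
    colouring = {}
    for v in (order if order is not None else list(adj)):
        used = {colouring[u] for u in adj[v] if u in colouring}
        c = 1
        while c in used:   # at most d(v) colours can be blocked
            c += 1
        colouring[v] = c
    return colouring

# Hypothetical example: a cycle on five vertices, where the maximal degree is 2.
adj = {v: [(v - 1) % 5, (v + 1) % 5] for v in range(5)}
colouring = greedy_colouring(adj)
max_degree = max(len(neighbours) for neighbours in adj.values())
print(max(colouring.values()), max_degree + 1)
```

The number of colours actually used depends on the vertex order, so the result is an approximation χ̂(G) ≥ χ(G) rather than the chromatic number itself.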

⁵ The complete formula would be $\frac{n \log(1/q)}{2 \log n} (1 + o(1))$, where q = 1 − p, with p being the edge density, and o(1) denotes a function of n converging to zero as n → ∞. For graphs of order 1000 and edge density 0.5, o(1) is still about 0.6 as shown in Bollobás and Thomason (1985), hence the approximation given in the text is actually valid only for graphs of order much larger than those to which we will address our attention. The probabilistic approximations provided by Bollobás and Thomason (1985) and Johri and Matula (1982) are very close as, for example, for n = 1000 and p = 0.5 they indicate χ(G) ≈ 80 and χ(G) ≈ 85, respectively. The tables in Johri and Matula (1982) present, however, more detailed results. Other improved probabilistic results are those of Luczak (1991) and Achlioptas and Naor (2005). The latter states that the chromatic number of a random graph G_{n,p} with p = d/n is either k_d or k_d + 1, where k_d is the smallest integer k such that d < 2k log k. For n = 1000 and p = 0.5 this entails a χ(G) of 59

It seems reasonable to speculate that graphs with a large chromatic number must have large cliques and hence small girth (i.e., the length of the shortest cycle in a graph). Yet, this may not be true. Erdős (1961) showed that for any two positive integers g and k there exists a graph of girth at least g with chromatic number at least k. This important result suggests that a large chromatic number is not always caused by some dense local substructures, which could be easily detectable, but rather it may occur as a purely global phenomenon: in fact a graph of large girth, locally (around each vertex), looks just like a tree and is locally 2-colourable.

Graphs in which the local structure directly implies the chromatic number are perfect graphs. A perfect graph is a graph G such that for every induced subgraph of G, the size of the largest clique equals the chromatic number. Classes of graphs that are perfect include the empty graph, complete graphs, bipartite graphs, line graphs of bipartite graphs, and
