Improved Approximation Algorithms for Balanced Partitioning Problems

(1)

Partitioning Problems

Harald Räcke

1

_{and Richard Stotz}

2

1 Technische Universität München, Garching, Germany [email protected]

2 Technische Universität München, Garching, Germany [email protected]

Abstract

We present approximation algorithms for balanced partitioning problems. These problems are notoriously hard and we present new bicriteria approximation algorithms, that approximate the optimal cost and relax the balance constraint.

In the first scenario, we consider Min-Max k-Partitioning, the problem of dividing a graph intok equal-sized parts while minimizing the maximum cost of edges cut by a single part. Our approximation algorithm relaxes the size of the parts by (1 +ε) and approximates the optimal cost byO(log1.5nlog logn), for every 0< ε <1. This is the first nontrivial algorithm for this problem that relaxes the balance constraint by less than 2.

In the second scenario, we consider strategies to find a minimum-cost mapping of a graph of processes to a hierarchical network with identical processors at the leaves. This Hierarch-ical Graph Partitioning problem has been studied recently by Hajiaghayi et al. who presented an (O(logn),(1 +ε)(h+ 1)) approximation algorithm for constant network heightsh. We use spreading metrics to give an improved (O(logn),(1 +ε)h) approximation algorithm that runs in polynomial time for arbitrary network heights.

1998 ACM Subject Classification F.2.2 Nonnumerical Algorithms and Problems

Keywords and phrases graph partitioning, dynamic programming, scheduling

Digital Object Identifier 10.4230/LIPIcs.STACS.2016.58

1 Introduction

The problem of scheduling the processes of a parallel application onto the nodes of a parallel system can in its full generality be viewed as a graph mapping problem. Given a graph G= (VG, EG) that represents theworkload, and a graphH = (VH, EH) that represents the

physical computing resources, we want to map G onto H such that on the one hand the processing load is well balanced, and on the other hand the communication load is small.

More concretely, nodes of Grepresent processes and are weighted by a weight function w:VG→R+. Edges ofGrepresent communication requirements between processes, and are

weighted bycG:EG→R+. Nodes and edges ofH represent processors and communication links, respectively, and may also be weighted withwH :VH →R+representing the processing capacity of a physical node, andcH :EH →R+ representing the bandwidth capacity of a communication link. The goal is to map nodes of Gto nodes ofH, and to map edges of G to paths inH between the corresponding end-points such that a) every node inH obtains approximately the same weight and b) the communication load is small.

This general problem is very complex, and research has therefore focused on special cases. Many results can be interpreted as mapping to a star graphH withkleaves where the center

licensed under Creative Commons License CC-BY

(2)

nodexhaswH(x) = 0, i.e., processes may not be mapped to the center (cH(e) is assumed

to be uniform). When the goal is to minimizetotal communication load(sum over all edge loads inH) this problem is known as Min-Sumk-partitioning. You want to partitionGinto kdisjoint piecesV1, . . . , Vk such that the weight of any piece is at mostw(Vi)≤w(V)/kbut

the total capacity of edges between partitions is minimized.

For this problem it is not possible to obtain meaningful approximation guarantees without allowing a slight violation of the balance constraints (w(Vi)≤w(V)/k). Even et al. [5] obtain

a bicriteria approximation guarantee of (O(logn),2), which means they obtain a solution that may violate balance constraints by a factor of 2, and that has a cost that is at most an O(logn)-factor larger than the cost of the optimum solution that doesnot violate balance constraints. This has been improved by Krauthgamer et al. [9] to (O(√lognlogk),2). If one aims to violate the balance constraints by less than 2, completely different algorithms seem to be required that are based on Dynamic Programming. Andreev and Räcke [1] obtain an (O(log1.5_n),_{1 +}_{ε)-approximation which has been improved by Feldmann and Foschini to} (O(logn),1 +ε). For the case that the weight functionwH is non-uniform Krauthgamer et

al. [10] obtain an (O(logn),O(1))-approximation, which has been improved by Makarychev and Makarychev [11] to (O(√lognlogk),5 +ε).

When the goal is to minimize thecongestion(maximum load of a link) in the star network H, instead of the total communication load, the problem turns into the so-called Min-Max k-Partitioning problem. You want to partition a given graphGintok-partsV1, . . . , Vk of equal

size such that maxic(Vi) is minimized, wherec(Vi) denotes the total weight of edges leaving

Vi in G. For this problem Bansal et al. [2] obtain an (O(

√

lognlogk),2 +ε)-approximation but no non-trivial approximation that violates the balance constraint by less than 2 is known.

Hajiaghayi et al. [7] consider the mapping problem whenH is not just a star but exhibits parallelism on many different levels. For example, a large parallel system may consist of many workstations, each equipped with several CPUs, etc. Such parallel systems are usually highly symmetric. Therefore, Hajiaghayi et al. model them as a tree in which all subtrees routed at a specific level are isomorphic. They develop an approximation algorithm for a Hierarchical Graph Partitioning problem which then gives rise to a scheduling algorithm for such aregular tree topology with an approximation guarantee of (O(logn),(1 +ε)(h+ 1)). This algorithm runs in polynomial time as long as the heighthof the hierarchy is constant.

1.1 Our Contribution

In this paper we first consider the Min-Maxk-Partitioning problem and show how to obtain a polylogarithmic approximation on the cost while only violating the balance constraints by a factor of 1 +ε. For this we use a dynamic programming approach on trees, and then use an approximation of the graph by asingle tree to obtain our result. Note that results that approximate the graph by a convex combination of trees [12] are not useful for obtaining an approximation for Min-Maxk-Partitioning, as the objective function is not linear.

ITheorem 1. There is a polynomial-time(1+ε,1+ε)-approximation algorithm for Min-Max k-Partitioning on trees. This gives rise to an (O(log1.5_n_{log log}_n),_{1 +}_{ε)-approximation} algorithm for general graphs.

Then we consider the Hierarchical Graph Partitioning problem as introduced by Hajiaghayi et al. [7]. We give a slight improvement on the factor by which the balance constraints are violated, while maintaining the same approximation w.r.t. cost. More crucially, our result also works if the heighthis not constant.

(3)

ITheorem 2. There exists an(O(logn),(1 +ε)h)approximation algorithm for Hierarchical Graph Partitioning whose running time is polynomial in hand n.

Our technique is heavily based on spreading metrics and we show that a slight variation of the techniques in [5] can be applied.

At first glance the result of Theorem 2 does not look very good as the factor by which the balance constraints are violated seems to be very high as it depends onh. However, we give some indication that a large dependency on hmight be necessary. We show that an approximation guarantee of α≤h/2 implies an algorithm for the following approximate version of parallel machine scheduling (PMS). Given a PMS instanceI, with a setT of tasks, a length wt, t ∈T for every task, and a deadline d, we define the g-copied instance of I

(g∈N) as the instance where each task is copiedg times and the deadline is multiplied by g. The approximate problem is the following: GivenI, either certify thatI is not feasible or give a solution to theg-copied instance ofI for someg≤α. It seems unlikely that this problem can be solved efficiently for constantα, albeit we do not have a formal hardness proof. These results have been deferred to the full version.

1.2 Basic Notation

Throughout this paper,G= (V, E) denotes the input graph with n=|V|vertices. Its edges are given a cost functionc:E→Nand its vertices are weighted byw:V →N. For a subset of verticesS⊆V,w(S) denotes the total weight of vertices inS andc(S) denotes the total cost of edges leavingS, i.e.,c(S) =P

e={x,y}∈E,|{x,y}∩S|=1c(e). We will sometimes refer to w(S) as thesize ofS and toc(S) as itsboundary cost. In order to simplify the presentation, we assume that all edge costs and vertex weights are polynomially bounded in the number of verticesn. This can always be achieved with standard rounding techniques at the loss of a factor (1 +ε) in the approximation guarantee.

Apartition P of the vertex setV is a collectionP ={Pi}i of pairwise disjoint subsets of

V withS

iPi=V. We define two different cost functions for partitions, namely costsum(P) =

P

ic(Pi) and costmax(P) = maxic(Pi). We drop the superscript whenever the type is clear

from the context. A (k, ν)-balancedpartition is a partition into at mostkparts, such that the inequality w(Pi) ≤ ν·w(V)/k holds for all parts. The costs of the (k,1)-balanced

partition with minimum cost w.r.t. costsum_{and cost}max_{are denoted with}_OPTsum_{(k, G) and} OPTmax(k, G), respectively.

Ahierarchical partition H= (P1, . . . ,Ph) of heighthis a sequence ofhpartitions, where

P`+1 is arefinement ofP`, i.e., for everyS∈ P`+1, there is a setS0 ∈ P` withS ⊆S0. We

call P` the level-`-partition ofH. For any cost vector ~µ∈ Rh, the cost of His given by cost~µ(H) =P

h

`=1µ`costsum(P`). A (~k, ν)-balanced hierarchical partition is a hierarchical

partition whereP` is (k`, ν)-balanced. The minimum cost of a (~k,1)-balanced hierarchical

partition w.r.t.~µis denoted withOPTµ~(~k, G).

An (α, β)-approximate hierarchical partition with respect to~µand~kis a (~k, β)-balanced hierarchical partition whose cost is at mostα·OPTµ~(~k, G). An (α, β)-approximation algorithm

for Hierarchical Graph Partitioning finds an (α, β)-approximate hierarchical partition for any given input graph and cost vector. This also gives an (α, β)-approximation for scheduling on regular tree topologies (see [7]).

2 Min-Max K-Partitioning

In this section, we present an approximation algorithm for Min-Max k-Partitioning. Recall that this problem asks to compute a (k,1)-balanced partition P of a given graph that

(4)

minimizes costmax(P). We first consider instances, where the input graph is a treeT = (V, E). A suitable approximation of the graph by a tree (see [3], [8], [13]) allows to extend the results to arbitrary graphs with a small loss in the approximation factor. Our algorithm is a decision procedure, i.e., it constructs for a given boundb a solution whose cost is at most (1 +ε)bor proves thatb <OPTmax(k, G). A (1 +ε,1 +ε)-approximation algorithm follows by using binary search. In order to simplify the presentation, we assume throughout this chapter that all vertex weights are 1 and thatkdivides n. We further assume that the approximation constantεis at most 1.

The algorithm heavily relies on the construction of adecomposition ofT, which is defined as follows. A decomposition ofT w.r.t.kandbis a partitionD={Di}i, whose partsDi

are connected components, have sizew(Di)≤n/kand boundary costc(Di)≤b. With each

decompositionDwe associate a corresponding vector set ID. This setID contains a vector

(c(Di)/b, w(Di)k/n) that encodes the size and boundary cost ofDi in a normalized way. By

thisID contains (most of) the information aboutDand our algorithm can just work with

ID instead of the full decompositionD. Note that every vector inID is bounded by (1,1).

We call a partition ofID intok subsets{Ii}i, s.t. the vectors in each subset sum to at

mostαin both dimensions, anα-packingof ID. The decompositionDis calledα-feasible

if anα-packing ofID exists. Note that this impliesα≥1. Also note that this definition

of feasibility may be nonstandard, as feasibility incorporates the decomposition cost, i.e., a 1-packing has costb(the bound that we guessed for the optimal solution).

Determining whether there is an α-packing of a given vector set is an instance of the NP-hard Vector Scheduling problem. Chekuri and Khanna [4] gave a polynomial-time approximation scheme (PTAS) for the Vector Scheduling problem which leads to the following result.

ILemma 3. Let D be an α-feasible decomposition of T w.r.t. k and b, then there is a polynomial-time algorithm to construct a(k,(1 +ε)α)-balanced partition ofT of cost at most (1 +ε)α·b, for anyε >0.

Proof. The proof has been deferred to the full version. _J

An optimal (but slow) algorithm for Min-Maxk-Partitioning could iterate over all possible decompositions of the graph and check if any of these decompositions is 1-feasible. If a 1-feasible decomposition is found, the corresponding 1-packing is computed, otherwise the boundbis rejected.

We modify this approach to obtain a polynomial-time approximation algorithm. As the number of decompositions is exponential, we partition them into a polynomial number of classes. This classification is such that decompositions of the same class are similar w.r.t. feasibility. More precisely, we show that if a decomposition is α-feasible, then all decompositions of the same class are (1 +ε)α-feasible.

We present a dynamic program that, for every class, checks whether a decomposition in that class exists and that computes some representative decomposition that is in this class. Then, we check the feasibility of every representative. As an exact check is NP-hard, we use the approximate Vector Scheduling as described above. If a 1-feasible decomposition exists, then a representative of the same class has been computed and furthermore, this representative is (1 +ε)-feasible. Consequently, we obtain a (1 +ε)2_{-packing of the representative, which} induces a (k,(1 +ε)2)-balanced partition ofT with cost at most (1 +ε)2b.

(5)

2.1 Classification of Decompositions

We now describe the classification of decompositions using the vector setID. A vector~p∈ ID

issmall, if both its coordinates are at mostε3 , otherwise it islarge. Small and large vectors are henceforth treated separately.

We define a type of a large vector ~p= (p1, p2) as follows. Informally, the type of a large vector classifies the value of both coordinates. Let i = 0 if p1 < ε4, otherwise let i be the integer such that p1 lies in the interval [(1 +ε)i−1ε4,(1 +ε)iε4). Let j= 0 if p2< ε4, otherwise letj be the integer such thatp2lies in [(1 +ε)j−1ε4,(1 +ε)jε4). Note thatiand j can each taket=dlog₁₊_ε(1/ε4_)e_{+ 1 different values because}_~_p_{is upper-bounded by (1,}_1). This is a consequence of normalizing the vectors inID as mentioned earlier. The pair (i, j)

is thetype of ~p. We number types from 1, . . . , t2 _arbitrarily.

We next define thetype of a small vector ~q= (q1, q2). Informally, the type of a small vector approximates theratioq1/q2of its coordinates. Lets=dlog1+ε(1/ε)eand subdivide

[ε,1/ε] into 2sintervals of the form [(1 +ε)i_,_{(1 +}_ε)i+1_{) for}_i₌_{−s, . . . , s}₋_{1. Crop the first} and the last interval atεand 1/ε respectively. Add two outer intervals [0, ε) and [1/ε,∞) to obtain a discretization of the positive reals into 2s+ 2 types−(s+ 1), . . . , s. The vector ~

q= (q1, q2) has typei, if its ratioq1/q2falls into thei-th interval.

Using these terms, we define the size signature and ratio signature of a vector setID.

The size signature stores the number of large vectors of every type while the ratio signature stores thesum of small vectors of every type. LetI_Dlarge denote the subset of large vectors andIsmall

D,j denote the subset of small vectors of type j.

IDefinition 4(Signature). LetDbe a set of disjoint connected components. The vector ~

`= (`1, . . . , `t2) is thesize signature ofI_D, ifI_Dlarge contains exactly`_i vectors of typeifor

i= 1, . . . , t2. The vector~h= (~h−(s+1), . . . , ~hs) is theratio signatureofID if~hj=P~p∈Ismall D,j ~p

forj=−(s+ 1), . . . , s. We call~g= (~`, ~h) the signature ofDandID.

We prove in the following that decompositions with the same signature have nearly the same feasibility properties. We start with a technical claim whose proof is omitted here.

IClaim 5. Let 0< ε≤1 ands=dlog₁₊_ε(1/ε)e. Thens·ε3_≤_2ε.

ILemma 6. Assume that Dis anα-feasible decomposition of signature~g and thatD0 _has

the same signature. ThenD0 _is_{(1 + 9ε)α-feasible.}

Proof. Given anα-packing{Ii}i ofID, we describe a (1 + 9ε)α-packing{Ii0}i of ID0. For

every vectorp~∈ ID0, we have to select a setI_i0 to add this vector to. We only argue about the

first coordinate of the resulting packing, the reasoning for the second coordinate is analogous. We start with the large vectors. We choose an arbitrary bijection π : I_Dlarge0 → I

large

D

such that only vectors of the same type are matched. This can be done easily, sinceID and

ID0 have the same size signature. Now, we add a vector~p∈ Ilarge

D0 to setI_i0 if and only if

π(~p)∈Ii.

As vector ~p= (p1, p2) andπ(~p) share the same signature, their sizes are similar in both coordinates. More precisely if the first coordinate of ~p is larger than ε4_{, then the first} coordinate ofπ(~p) is at most (1 +ε)p1. If p1 is smaller thanε4, thenp2 must be larger than ε3 _because_~_p_{is a large vector. Consequently there can be at most}_α/ε3 _{large vectors with} such a small first coordinate in the same part ofα-packing{Ii}i. Their sum is bounded by

(6)

αεin the first coordinate. Let (Li,large, Ri,large) denote the sum of large vectors inIi and let

(L0_i,_large, R0_i,_large) denote the sum of large vectors in I_i0. We have

L0_i,_large≤(1 +ε)Li,large+αε . (1)

We now consider small vectors; recall that they are classified by their ratio. Let (Li,j, Ri,j)

denote the sum of small vectors of typej inIi. We call typesj≥0 left-heavyas they fulfill

Li,j≥Ri,j, and accordingly we call types j <0right-heavy as they fulfillLi,j< Ri,j.

For left-heavy types, add vectors ofIsmall

D0_,j toI_i0 until the sum (L0_i,j, R0_i,j) of these vectors

exceedsLi,j in the first coordinate. It follows that L0i,j≤Li,j+ε3, as the excess is at most

one small vector. Summing over all left-heavy types gives

s X j=0 L0_i,j≤(s+ 1)ε3+ s X j=0 Li,j≤3εα+ s X j=0 Li,j , (2)

where we usedα≥1 and Claim 5 in the last step. For right-heavy types, add vectors ofIsmall

D0_,j to I_i0 until the sum of these vectors exceeds

Ri,j in the second coordinate. It follows thatR0i,j≤Ri,j+ε3. We remark that the above

procedure distributes all small vectors of any typej. This follows because on the one hand the ratio signature ensures that the sum of vectors of typej is the same inID andID0,

and on the other hand our Greedy distribution assigns at least as much mass in the heavier coordinate as in the solution for {Ii}i. It remains to show an upper bound on L0i,j for

right-heavy types.

First consider Type−(s+ 1), which corresponds to interval [0, ε). Combining the upper bound onR0_i,j for right-heavy types and the fact thatL0_i,₋₍_s₊₁₎/R0_i,₋₍_s₊₁₎< ε, it follows that L0_i,₋₍_s₊₁₎≤ε(Ri,−(s+1)+ε3). Asε3≤αandRi,−(s+1)≤α, this impliesL0i,−(s+1)≤2εα.

The remaining types correspond to an interval [(1 +ε)j,(1 +ε)j+1) and bothLi,j/Ri,j

andL0_i,j/R0_i,j must lie in that interval. We derive a bound onL0_i,j as follows:

L0_i,j≤(1 +ε)j+1R0_i,j≤(1 +ε)j+1(Ri,j+ε3)≤(1 +ε)j+1(

Li,j

(1 +ε)j +ε

3₎

= (1 +ε)Li,j+ (1 +ε)j+1ε3≤(1 +ε)Li,j+ε3 .

The first step uses the fact that L0_i,j/R0_i,j <(1 +ε)j+1. The second step uses the upper bound onR0_i,j for right-heavy types and the third step uses the boundLi,j/Ri,j≥(1 +ε)j.

Asj <0 for right-heavy types andε >0, the last step follows. Summing up the bounds on all right-heavy types and using Claim 5 gives

−1 X j=−(s+1) L0_i,j≤sε3+ 2εα+ (1 +ε) −1 X j=−(s+1) Li,j≤4εα+ (1 +ε) −1 X j=−(s+1) Li,j . (3)

We now combine all bounds derived for part I_i0. Let (Li, Ri) be the sum of vectors in

Ii and let (L0i, R0i) denote the sum of vectors inIi0. Clearly L0i =L0i,large+

Ps

j=−(s+1)L

0

i,j

and a respective equality holds forLi. The first termL0i,large is upper-bounded in Equation (1). The sumPs

j=0L0i,j is upper-bounded in Equation (2), and the sum

P−1

(7)

upper-bounded in Equation (3). Combining these bounds gives L0_i=L0_i,_large+ −1 X j=−(s+1) L0_i,j+ s X j=0 L0_i,j ≤(1 +ε)Li,large+ 8εα+ s X j=0 Li,j+ (1 +ε) −1 X j=−(s+1) Li,j = (1 +ε)Li+ 8εα≤(1 + 9ε)α .

The last step follows from the assumption that{Ii}i is anα-packing. J

2.2 Finding Decompositions of the Tree

We now present a dynamic program that computes a set of decompositions _D with the following property: If there exists a decomposition of T with signature~g, thenDcontains a decomposition with that signature. The dynamic program adapts a procedure given by Feldmann and Foschini [6]. We first introduce some terminology.

Fix an arbitrary rootr of the tree and some left-right ordering among the children of each internal vertex. This defines theleftmost andrightmost sibling among the children. For each vertexv, letLv be the set of vertices that are contained in the subtree rooted atvor in

a subtree rooted at some left sibling ofv. The dynamic program iteratively constructs sets of connected components of a special form, defined as follows.

Alower frontier Fis a set of disjoint connected components with nodes, such that vertices that are not covered byF form a connected component including the root. In addition we require that components of a lower frontier have at most sizen/kand at most cost b. We define thecost of a lower frontierF as the total cost of edges with exactly one endpoint covered byF. Note that this may be counter-intuitive, because the cost of a lower frontier does not consider the boundary cost of the individual connected components.

The algorithm determines for every vertex v, signature~g, cost κ≤ b and number of verticesm≤n, if there exists a lower frontier ofLv with signature~g that coversm vertices

with cost κ. It stores this information in a table Dv(~g, m, κ) and computes the entries

recursively using a dynamic program. Along with each positive entry, the table stores the corresponding lower frontier. In the following paragraphs, we describe the dynamic program in more detail.

First, consider the case wherev is a leaf and the leftmost among its siblings. Letvp be

the parent ofv. AsLv ={v}, the lower frontier ofLv is either empty or a single connected

component{v}. Therefore Dv((~0, ~0),0,0) = 1 andDv(~g(1, c(v, vp)),1, c(v, vp)) = 1. Here,

we use~g(|S|, c(S)) to denote the signature of a set that just contains connected component S (recall thatc(S) denotes the cost of S). For all other values,Dv(~g, m, κ) = 0.

Second, we consider the case wherev is neither a leaf, nor the leftmost among its siblings. The case whenv is a leaf but not leftmost sibling and the case whenv is an inner vertex that is leftmost sibling and follow from an easy adaption. Letwbe the left sibling ofvand uits rightmost child. Observe thatLv is the disjoint union ofLu,Lw and{v}.

If the edge from v to its parent is not cut by a lower frontierF of Lv, then v is not

covered by F. Consequently F splits into two lower frontiers Fu andFw of Lu andLw

respectively. Denote their signatures with~gu and~gw; they must sum up to~g. IfFucovers

(8)

needs costκ−κu. Hence, if the edge fromv to its parent is not cut,Dv(~g, m, κ) is equal to _ mu≤m, κu≤κ, ~ gu+~gw=~g (Dw(~gw, m−mu, κ−κu)∧Du(~gu, mu, κu)) . (4)

If the edge fromv to its parentvp is cut by the lower frontierF ofLv, then F covers

the entire subtree ofv. It follows thatF consists of three disjoint parts: a component S that containsv, a lower frontierFu ofLuand a lower frontierFw ofLw. Denote the cost of

Fu byκu and the cost ofFwbyκw. LetTv denote the subtree ofv. The costc(S) equals

κu+c(v, vp), becauseS is connected tovpin the direction of the root and with the highest

vertices ofFu in the direction of the leaves. The costκof the lower frontierF isκw+c(v, vp),

because all edges leavingFu have their endpoints inF. The signatures of the three parts of

F must sum up to~g, therefore~g=~gu+~gw+~g(|S|, c(S)). Hence, if the edge fromv to its

parent is cut,Dv(~g, m, κ) is equal to

_ |S|≤n/k, c(S)≤b, ~ gu+~gw+~g(|S|,c(S))=~g Dw(~gw, m−|Tv|, κ−c(v, vp))∧Du(~gu,|Tv|−|S|, c(S)−c(v, vp)) . (5)

We conclude thatDv(~g, m, κ) = 1, if Term (4) or Term (5) evaluate to 1.

The running time of the dynamic program depends on the number of signatures that need to be considered at each vertex. The following lemma bounds their number, its proof is omitted due to space constraints.

ILemma 7. At vertex v, the dynamic program considers|Lv|t+4s+4(2b)2s+2γ signatures,

where|Lv| is the size ofLv andγ= (k/ε2)t

2

.

As the number of signatures is polynomial innandk, it follows that the above dynamic program only needs a polynomial number of steps.

IObservation 8. Let _Dbe the set of decompositions corresponding to positive entries of Dr(~g, n,0) as computed by the dynamic program. Then for any decomposition of T with

signature~g, there exists a decomposition inDwith the same signature.

Combining the dynamic program with the properties of signatures, we obtain an approxima-tion algorithm for Min-Maxk-Partitioning on trees.

ITheorem 9. There is a polynomial-time(1+ε,1+ε)-approximation algorithm for Min-Max k-Partitioning on trees.

Proof. Follows from Lemmas 6, 7 and Observation 8. The running time is polynomial as both the dynamic program and the Vector Scheduling subroutine run in polynomial time. _J

We extend our results to arbitrary undirected graphs G= (V, E, c) with the help of a decomposition tree that acts as a cut sparsifier (see Räcke and Shah [13]). Theorem 1 follows by standard arguments that are deferred to the full version.

(9)

Algorithm 1 merge(H0 _{= (P}0

1, . . . ,Ph0), ~k)

P1←Merge two sets ofP0

1 with minimum combined weight until|P10|=k1.

for`←2, . . . , hdo d←k`/k`−1. for allP`−1,i∈ P`−1 do Q← {P0_{∈ P}0 `|P0∩P`−1,i6=∅}. //Subclusters ofP`−1,i P`,1← ∅, . . . , P`,d← ∅. whileQ6=∅do

AssignQ’s next element toP`,j with smallest weight.

end while P`← P`∪SjP`,j. end for end for ReturnH= (P1, . . . ,Ph).

3 Hierarchical Partitioning

In this section, we present a bicriteria approximation algorithm for Hierarchical Graph Partitioning. In the Hierarchical Graph Partitioning problem, we are given a graphG= (V, E) and parameters~kand~µ. We are asked to compute a (~k,1)-balanced hierarchical partition P of Gthat minimizes cost~µ(P) among all such partitions. We assume furthermore that

a (~k,1)-balanced partition ofGexists and thereforek`+1/k` is integral for all levels. This

assumption is fulfilled in particular when~kis derived from a regular hierarchical network. Note that this is an extension of Min-Sum k-Partitioning and that cost~µ minimizes the

weightedsum of all edges cut by all subpartitions. This contrasts with the objective function costmax considered in the previous section.

Our algorithm relies on the construction of graph separators. Graph separators are partitions in which the maximum weight of a part is bounded, but the number of parts is arbitrary. More precisely, aσ-separatorofGis a partitionPofG, such thatw(Pi)≤w(V)/σ

for everyi.

For any positive vector~σ∈Rh, a~σ-hierarchical separator ofGis a hierarchical partition of G, where on every level P` is a σ`-separator. Note that any (k, ν)-balanced partition

is a (k/ν)-separator and any (~k, ν)-balanced hierarchical partition is a (~k/ν)-hierarchical separator. An α-approximate ~σ-hierarchical separator w.r.t. ~k and ~µ is a σ-hierarchical separator whose cost is at mostα·OPT~µ(~k, G).

Approximate hierarchical separators can be transformed into approximate hierarchical partitions using the merging procedure described in Algorithm 1. It is essentially a greedy strategy for bin packing on every level that makes sure that partitions remain refinements of each other. The next lemma states that the approximation error of the merging procedure is linear in the heighth.

ILemma 10. LetH0 _{= (P}0

1, . . . ,Ph0)be anα-approximate~k/(1 +ε)-hierarchical separator

w.r.t.~k andµ. Then Algorithm 1 constructs an~ (α,(1 +ε)(h+ 1))-approximate hierarchical partitionHw.r.t. ~µand~k.

Moreover, if P0

1 contains at most k1 parts, then H is an (α,(1 +ε)h)-approximate hierarchical partition w.r.t. ~µand~k.

Proof. We use induction overh. Forh= 1, assume without loss of generality thatP1,1∈ P1 has maximum weight. Ifw(P1,1)>2(1 +ε)w(V)/k1, then some merging must have occurred

(10)

in the first line of the algorithm. It follows that all distinct parts P1,i, P1,j ∈ P1 have

combined weight at least 2(1 +ε)w(V)/k1. This is a contradiction to the fact that the total weight of all parts isw(V).

Assume that the claim holds for someh≥1, i.e., the weight of all parts ofPhis

upper-bounded by (1 +ε)(h+ 1)w(V)/kh. Used=kh+1/kh. Assume without loss of generality

thatPh+1,1receives the last element S fromQwhen considering Ph,i in the fourth line of

the algorithm; this implies that the weight ofPh+1,1is smallest before receivingS. To derive a contradiction, assume thatw(Ph+1,1)>(1 +ε)(h+ 2)w(V)/kh+1. It follows that

w(Ph,i)≥w(S) + d X j=1 w(Ph+1,1\S)> d X j=1 (w(Ph+1,1)−w(S)) > d·(1 +ε)(h+ 2)w(V) kh+1 −(1 +ε)w(V) kh+1

The last line equals (1+ε)(h+1)·w(V)/khwhich is a contradiction to the induction hypothesis.

For the first inequality, we used that partPh+1,1 had the smallest weight before receivingS. The last step uses the fact that the weight of allS∈Qis at most (1 +ε)w(V)/kh+1.

As merging sets does not increase the cost of a hierarchical partition, the approximation factorαon the cost does not change. This finishes the proof of the first statement.

To see the second statement, observe that ifP0

1 contains at mostk1 parts, then there is nothing to do in the first line of the algorithm. As no merging occurs, every set inP0

1 has at most weight (1 +ε)w(V)/k1. With a similar induction argument as above, it follows that the merging procedure returns an (α,(1 +ε)h)-approximate hierarchical partition. _J In the following, we describe an algorithm that computes anO(logn)-approximate~k/(1 + ε)-hierarchical separatorH0 _{= (P}0

1, . . . ,Ph0). We use ˜µ` as a shorthand forPh_j₌_`µj.

Our algorithm first computes partitionP0

1 using the Min-Sumk-Partitioning algorithm by Feldmann and Foschini [6]. It can easily be adapted to handle polynomial vertex weights.

The following theorem states the result for the Min-Sumk-Partitioning algorithm.

ITheorem 11([6]). There is a polynomial-time algorithm that computes a(k1,1+ε)-balanced partition ofG with cut cost at mostO(logn)·OPTsum(k1, G).

Note that ˜µ1·OPTsum(k1, G) is an upper bound on the cost of any (~k,1)-balanced hierarchical partition.

The algorithm to construct (P0

2, . . . ,Ph0) is an adaptation of the algorithm by Even et al.

for Min-Sumk-Partitioning. Note that the following description is basically the description in [5]. It relies on a distance assignment{d(e)}e∈E to the edges of G, thespreading metric.

The distances are used in a procedure called cut-proc that permits to find within a subgraph ofGa subset of vertices T, whose cut cost is bounded by its volume vol(T). The volume of a set of vertices is approximately the weighted length of the edges in its cut.

Given the procedure cut-proc, the algorithm constructs all the remaining partitions (P0

2, . . . ,Ph0) as follows. First recall that P10 is already given by Theorem 11. For ` ≥2, the algorithm constructs P0

` by examining all parts of P`0−1 separately. On a given part P_`0₋₁∈ P0

`−1, it uses procedure cut-proc to identify a subgraphH ofP`0−1. This subgraph becomes a part ofP0

` and cut-proc is restarted onP`−1\H. This process is repeated until less than (1 +ε)w(V)/k` vertices are left. The remaining vertices also constitute a part of

P0

`.

It is important to note during the following discussion that the boundary costc(S) of a setS⊆V depends on the subgraph thatS was chosen from. Sometimes, we will chooseS from a subgraphH ofG; in this case,c(S) only counts the cost of edges leavingS towardsH.

(11)

The spreading metric used is an optimal solution to the following linear program, called spreading metrics LP. minX e∈E c(e)d(e) s.t. X u∈S distV(v, u)·w(u)≥µ˜` w(S)−w(V) k` , 1≤`≤h, S ⊆V, v∈S (6) 0≤d(e)≤µ˜1, e∈E .

Using the distancesd(e), distS(v, u) denotes the length of the shortest path between vertices

v anduin subgraph S. Letτ denote the optimal value of the spreading metrics LP. The following two lemmas prove the basic properties of solutions of the spreading metrics LP. Their proofs are omitted due to space constraints.

ILemma 12. Any (~k,1)-balanced hierarchical partition H= (P1, . . . ,Ph)induces a feasible

solution to the spreading metrics LP of value cost~µ(H).

Proof. The proof has been deferred to the full version. _J The above lemma implies that τ is a lower bound on costµ~(H). We define the radius of

vertexv in someS⊆V as radius(v, S) := max{distS(v, u)|u∈S}.

The following lemma proves that any set of sufficient weight has at least constant radius.

ILemma 13. IfS ⊆V satisfies w(S)>(1 +ε)w(V)/k` for some level `, then for every

vertexv∈S,radius(v, S)> ₁₊ε_εµ˜`.

Proof. The proof has been deferred to the full version. _J Even though the spreading metrics LP contains an exponential number of constraints, it can be solved in polynomial time using the ellipsoid algorithm and a polynomial-time separation oracle. By rewriting the Constraints (6), we obtain for all levels`, S ⊆V andv ∈S the constraintP

u∈S(distV(u, v)−µ˜`)w(u)≥w(V)/k`.

For any vertex v, the left-hand side of the constraint is minimized when the subset Sv = {u∈ V | distV(v, u) ≤ µ˜`} is chosen. Consequently, h single-source shortest path

computations from every vertex provide a polynomial-time separation oracle for the spreading metrics LP.

The notation introduced next serves the definition of procedure cut-proc in Algorithm 2. A ball B(v, r) in subgraphG0_{= (V}0_{, E}0_{) is the set of vertices in}_V0 _{whose distance to}_v _is

strictly less thanr, thusB(v, r) :={u∈V0|distV0(u, v)< r}. The set E(v, r) denotes the

set of edges leavingB(v, r). The volume ofB(v, r), denoted by vol(v, r), is defined by vol(v, r) :=η·τ+ X

(x,y)∈E(v,r), x∈B(v,r)

c(x, y)(r−distB(v,r)(v, x)), (7)

whereη= _hn1 . Note that the volume only considers the edges in the cut, not the edges within the ball, which is different from [5]. The following lemma is a corollary of Even et al. [5].

ILemma 14. Given a subgraphG0= (V0, E0) that satisfiesw(V0)>(1 +ε)w(V)/k`, and

given a spreading metric {d(e)}e, procedure cut-proc finds a subset T ⊆V0 that satisfies

(12)

Algorithm 2cut-proc(V0, E0,{c(e)}e∈E0,{d(e)}_e_∈_E0,µ˜_`) Choose an arbitraryv∈V0 ˜ r← ε 1+εµ˜` T← {v} v0 ←closest vertex toT in V0\T whilec(T)> 1 ˜ r·ln( vol(v,r˜)

vol(v,0))·vol(v,distV0(v, v

0₎₎_do

T ←T∪ {v0_}

v0 ←closest vertex toT in V0\T

end while

ReturnT

Proof. The proof has been deferred to the full version. J

The approximation guarantee on the cost induced by partitionsP10 toPh0 is given by the

next theorem.

ITheorem 15. The sequenceH0_{= (P}0

1, . . . ,Ph0) is a~k/(1 +ε)-hierarchical separator with

|P0

1| ≤k1 andcost~µ(H)≤α·OPT~µ(~k, G), whereα=O(logn).

Proof. The fact thatH0 _{is a}~_{k/(1 +}_{ε)-hierarchical separator follows immediately from the} construction and Algorithm 2. In particular, Theorem 11 ensures that|P0

1| ≤k1. The cost ofP0 2 toPh0 w.r.t.~µequals Ph `=2µ`P 0

`. By Lemma 14, the algorithm constructs setsT` for

P_`0 whose costc(T`) is at most

1 +ε ε˜µ` ·ln _vol(v `, ε˜µ`/(1 +ε)) vol(v`,0) ·vol(v`,distV0(v_`, v_`0)), (8)

wherev` andv0`are some vertices inT`. Recall from the description of the algorithm that

the left-hand side of this inequality ignores all edges that have been cut on a higher level or that have been cut previously on the same level. By using the shorthand ˜µ`, we account for

cuts on subsequent levels, thus obtainingPh

`=2µ`P`≤

Ph

`=2µ˜`

P

T`c(T`).

Observe that vol(v,0)≥ητ and vol(v, εµ˜`/(1 +ε))≤(1 +η)τ for allv ∈V. We apply

this to Equation (8) and obtain c(T`)≤ 1 +ε ε˜µ` ·ln _{1 +}_η η ·vol(v`,distV0(v_`, v0_`)) . (9) It follows that h X `=2 X T` ˜ µ`·c(T`)≤ 1 +ε ε ln _{1 +}_η η h X `=2 X T` vol(v`,distV0(v_`, v0_`)) = 1 +ε ε ln 1 +η η h X `=2 X T` (ητ+ X e∈E(v`,distV0(v`,v`0)) c(e)d(e)) ≤ 1 +ε ε ln _{1 +}_η η (hnητ+ h X `=2 X T` X e c(e)d(e)) .

We plug in Equation (9) to obtain the first step and use the definition of the volume in Equation (7) for the second step.

We now claim that every edgee∈Eappears at most once in the triple sum above. Ifeis cut by cut-proc forP0

(13)

on the same level`. Moreover, edgeeis not considered on all remaining levels`0> `, because procedure cut-proc is initiated on these levels with subgraphs of the parts ofP`. It follows

that the triple sum is upper-bounded byτ. Usingη = 1

hn and the fact thatεis constant, we conclude that h X `=2 X T` ˜ µ`·c(T`)≤ 1 +ε ε ln _{1 +}_η η (hnη+ 1)τ =O(logn)τ .

Recall that by Theorem 11, the weighted cost of the first partition ˜µ1·costsum(P0

1) is at most O(logn) times the cost of the optimal hierarchical partition OPT~µ(~k, G). As furthermore

τ≤OPT~µ(~k, G), the theorem follows. J

In other words, the above theorem states that H0 is an O(logn)-approximate~k/(1 + ε)-hierarchical separator. Combining Lemma 10 and Theorem 15 gives the following.

ITheorem 16. There exists a(O(logn),(1 +ε)h)approximation algorithm for Hierarchical Graph Partitioning whose running time is polynomial in hand n.

References

1 Konstantin Andreev and Harald Räcke. Balanced graph partitioning. Theory Comput. Syst., 39(6):929–939, 2006. doi:10.1007/s00224-006-1350-7.

2 Nikhil Bansal, Uriel Feige, Robert Krauthgamer, Konstantin Makarychev, Viswanath Naga-rajan, Joseph (Seffi) Naor, and Roy Schwartz. Min-max graph partitioning and small set expansion. InProc. of the 52nd FOCS, pages 17–26, 2011. doi:10.1109/FOCS.2011.79.

3 Marcin Bienkowski, Miroslaw Korzeniowski, and Harald Räcke. A practical algorithm for constructing oblivious routing schemes. In Proc. of the 15th SPAA, pages 24–33, 2003.

doi:10.1145/777412.777418.

4 Chandra Chekuri and Sanjeev Khanna. On multidimensional packing problems. SIAM J. Comput., 33(4):837–851, 2004. doi:10.1137/S0097539799356265.

5 Guy Even, Joseph (Seffi) Naor, Satish Rao, and Baruch Schieber. Fast approximate graph partitioning algorithms.SIAM J. Comput., 28(6):2187–2214, 1999. Also inProc. 8th SODA, 1997, pp. 639–648. doi:10.1137/S0097539796308217.

6 Andreas E. Feldmann and Luca Foschini. Balanced partitions of trees and applications. In Proc. of the 29th STACS, pages 100–111, 2012. doi:10.4230/LIPIcs.STACS.2012.100.

7 Mohammad Taghi Hajiaghayi, Theodore Johnson, Mohammad Reza Khani, and Barna Saha. Hierarchical graph partitioning. In Proc. of the 26th SPAA, pages 51–60, 2014.

doi:10.1145/2612669.2612699.

8 Chris Harrelson, Kirsten Hildrum, and Satish Rao. A polynomial-time tree decomposition to minimize congestion. In Proc. of the 15th SPAA, pages 34–43, 2003. doi:10.1145/

777412.777419.

9 Robert Krauthgamer, Joseph (Seffi) Naor, and Roy Schwartz. Partitioning graphs into balanced components. In Proc. of the 20th SODA, pages 942–949, 2009. arXiv:soda:

1496770.1496872.

10 Robert Krauthgamer, Joseph (Seffi) Naor, Roy Schwartz, and Kunal Talwar. Non-uniform graph partitioning. In Proc. of the 25th SODA, pages 1229–1243, 2014. arXiv:soda:

2634165.

11 Konstantin Makarychev and Yuri Makarychev. Nonuniform graph partitioning with un-related weights. In Proc. of the 41st ICALP, pages 812–822, 2014. doi:10.1007/

(14)

12 Harald Räcke. Optimal hierarchical decompositions for congestion minimization in net-works. InProc. of the 40th STOC, pages 255–264, 2008.

13 Harald Räcke and Chintan Shah. Improved guarantees for tree cut sparsifiers. InProc. of the 22nd ESA, pages 774–785, 2014. doi:10.1007/978-3-662-44777-2_64.