**Partitioning Problems**

**Harald Räcke**¹ and **Richard Stotz**²

**1** **Technische Universität München, Garching, Germany**
**raecke@in.tum.de**

**2** **Technische Universität München, Garching, Germany**
**stotz@in.tum.de**

**Abstract**

We present approximation algorithms for balanced partitioning problems. These problems are notoriously hard, and we present new bicriteria approximation algorithms that approximate the optimal cost and relax the balance constraint.

In the first scenario, we consider Min-Max $k$-Partitioning, the problem of dividing a graph into $k$ equal-sized parts while minimizing the maximum cost of edges cut by a single part. Our approximation algorithm relaxes the size of the parts by $(1+\varepsilon)$ and approximates the optimal cost by $O(\log^{1.5} n \log\log n)$, for every $0 < \varepsilon < 1$. This is the first nontrivial algorithm for this problem that relaxes the balance constraint by less than 2.

In the second scenario, we consider strategies to find a minimum-cost mapping of a graph of processes to a hierarchical network with identical processors at the leaves. This Hierarchical Graph Partitioning problem has been studied recently by Hajiaghayi et al., who presented an $(O(\log n), (1+\varepsilon)(h+1))$ approximation algorithm for constant network heights $h$. We use spreading metrics to give an improved $(O(\log n), (1+\varepsilon)h)$ approximation algorithm that runs in polynomial time for arbitrary network heights.

**1998 ACM Subject Classification** F.2.2 Nonnumerical Algorithms and Problems

**Keywords and phrases** graph partitioning, dynamic programming, scheduling

**Digital Object Identifier** 10.4230/LIPIcs.STACS.2016.58

**1 Introduction**

The problem of scheduling the processes of a parallel application onto the nodes of a parallel system can in its full generality be viewed as a graph mapping problem. Given a graph $G = (V_G, E_G)$ that represents the *workload*, and a graph $H = (V_H, E_H)$ that represents the *physical computing resources*, we want to map $G$ onto $H$ such that on the one hand the processing load is well balanced, and on the other hand the communication load is small.

More concretely, nodes of $G$ represent processes and are weighted by a weight function $w : V_G \to \mathbb{R}^+$. Edges of $G$ represent communication requirements between processes, and are weighted by $c_G : E_G \to \mathbb{R}^+$. Nodes and edges of $H$ represent processors and communication links, respectively, and may also be weighted with $w_H : V_H \to \mathbb{R}^+$ representing the processing capacity of a physical node, and $c_H : E_H \to \mathbb{R}^+$ representing the bandwidth capacity of a communication link. The goal is to map nodes of $G$ to nodes of $H$, and to map edges of $G$ to paths in $H$ between the corresponding end-points such that a) every node in $H$ obtains approximately the same weight and b) the communication load is small.

This general problem is very complex, and research has therefore focused on special cases. Many results can be interpreted as mapping to a star graph $H$ with $k$ leaves where the center node $x$ has $w_H(x) = 0$, i.e., processes may not be mapped to the center ($c_H(e)$ is assumed to be uniform). When the goal is to minimize the *total communication load* (sum over all edge loads in $H$), this problem is known as Min-Sum $k$-Partitioning. The goal is to partition $G$ into $k$ disjoint pieces $V_1, \dots, V_k$ such that the weight of every piece satisfies $w(V_i) \le w(V)/k$ and the total capacity of edges between partitions is minimized.

© Harald Räcke and Richard Stotz; licensed under Creative Commons License CC-BY

For this problem it is not possible to obtain meaningful approximation guarantees without allowing a slight violation of the balance constraints ($w(V_i) \le w(V)/k$). Even et al. [5] obtain a bicriteria approximation guarantee of $(O(\log n), 2)$, which means they obtain a solution that may violate balance constraints by a factor of 2, and that has a cost that is at most an $O(\log n)$-factor larger than the cost of the optimum solution that does *not* violate balance constraints. This has been improved by Krauthgamer et al. [9] to $(O(\sqrt{\log n \log k}), 2)$. If one aims to violate the balance constraints by less than 2, completely different algorithms seem to be required that are based on Dynamic Programming. Andreev and Räcke [1] obtain an $(O(\log^{1.5} n), 1+\varepsilon)$-approximation, which has been improved by Feldmann and Foschini to $(O(\log n), 1+\varepsilon)$. For the case that the weight function $w_H$ is non-uniform, Krauthgamer et al. [10] obtain an $(O(\log n), O(1))$-approximation, which has been improved by Makarychev and Makarychev [11] to $(O(\sqrt{\log n \log k}), 5+\varepsilon)$.

When the goal is to minimize the *congestion* (maximum load of a link) in the star network $H$, instead of the total communication load, the problem turns into the so-called Min-Max $k$-Partitioning problem. The goal is to partition a given graph $G$ into $k$ parts $V_1, \dots, V_k$ of equal size such that $\max_i c(V_i)$ is minimized, where $c(V_i)$ denotes the total weight of edges leaving $V_i$ in $G$. For this problem Bansal et al. [2] obtain an $(O(\sqrt{\log n \log k}), 2+\varepsilon)$-approximation, but no non-trivial approximation that violates the balance constraint by less than 2 is known.

Hajiaghayi et al. [7] consider the mapping problem when $H$ is not just a star but exhibits parallelism on many different levels. For example, a large parallel system may consist of many workstations, each equipped with several CPUs, etc. Such parallel systems are usually highly symmetric. Therefore, Hajiaghayi et al. model them as a tree in which all subtrees rooted at a specific level are isomorphic. They develop an approximation algorithm for a Hierarchical Graph Partitioning problem which then gives rise to a scheduling algorithm for such a *regular tree topology* with an approximation guarantee of $(O(\log n), (1+\varepsilon)(h+1))$. This algorithm runs in polynomial time as long as the height $h$ of the hierarchy is constant.

**1.1 Our Contribution**

In this paper we first consider the Min-Max $k$-Partitioning problem and show how to obtain a polylogarithmic approximation on the cost while only violating the balance constraints by a factor of $1+\varepsilon$. For this we use a dynamic programming approach on trees, and then use an approximation of the graph by a *single* tree to obtain our result. Note that results that approximate the graph by a convex combination of trees [12] are not useful for obtaining an approximation for Min-Max $k$-Partitioning, as the objective function is not linear.

▶ **Theorem 1.** *There is a polynomial-time $(1+\varepsilon, 1+\varepsilon)$-approximation algorithm for Min-Max $k$-Partitioning on trees. This gives rise to an $(O(\log^{1.5} n \log\log n), 1+\varepsilon)$-approximation algorithm for general graphs.*

Then we consider the Hierarchical Graph Partitioning problem as introduced by Hajiaghayi
et al. [7]. We give a slight improvement on the factor by which the balance constraints are
violated, while maintaining the same approximation w.r.t. cost. More crucially, our result
also works if the height*h*is not constant.

▶ **Theorem 2.** *There exists an $(O(\log n), (1+\varepsilon)h)$ approximation algorithm for Hierarchical Graph Partitioning whose running time is polynomial in $h$ and $n$.*

Our technique is heavily based on spreading metrics and we show that a slight variation of the techniques in [5] can be applied.

At first glance the result of Theorem 2 does not look very good, as the factor by which the balance constraints are violated depends on $h$ and therefore seems very high. However, we give some indication that a large dependency on $h$ might be necessary. We show that an approximation guarantee of $\alpha \le h/2$ implies an algorithm for the following approximate version of parallel machine scheduling (PMS). Given a PMS instance $I$, with a set $T$ of tasks, a length $w_t$, $t \in T$, for every task, and a deadline $d$, we define the *$g$-copied instance of $I$* ($g \in \mathbb{N}$) as the instance where each task is copied $g$ times and the deadline is multiplied by $g$. The approximate problem is the following: Given $I$, either certify that $I$ is not feasible or give a solution to the $g$-copied instance of $I$ for some $g \le \alpha$. It seems unlikely that this problem can be solved efficiently for constant $\alpha$, albeit we do not have a formal hardness proof. These results have been deferred to the full version.
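To make the $g$-copied instance concrete, the following minimal Python sketch builds it and tests feasibility by brute force on a toy instance. The number of machines, the helper names `g_copied` and `feasible`, and the exhaustive search are illustrative assumptions and do not appear in the paper.

```python
from itertools import product


def g_copied(lengths, deadline, g):
    """Return the g-copied instance: every task g times, deadline times g."""
    return [w for w in lengths for _ in range(g)], g * deadline


def feasible(lengths, deadline, machines):
    """Brute-force check (toy sizes only): can all tasks be assigned to
    `machines` identical machines so that every load is <= deadline?"""
    for assignment in product(range(machines), repeat=len(lengths)):
        loads = [0] * machines
        for w, m in zip(lengths, assignment):
            loads[m] += w
        if max(loads) <= deadline:
            return True
    return False


# Three tasks of length 2, deadline 3, two machines (hypothetical instance).
lengths, deadline, machines = [2, 2, 2], 3, 2
copied_lengths, copied_deadline = g_copied(lengths, deadline, 2)
```

In this toy instance the original problem is infeasible, while its 2-copied instance admits a schedule, which illustrates how copying relaxes the problem.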

**1.2 Basic Notation**

Throughout this paper, $G = (V, E)$ denotes the input graph with $n = |V|$ vertices. Its edges are given a cost function $c : E \to \mathbb{N}$ and its vertices are weighted by $w : V \to \mathbb{N}$. For a subset of vertices $S \subseteq V$, $w(S)$ denotes the total weight of vertices in $S$ and $c(S)$ denotes the total cost of edges leaving $S$, i.e., $c(S) = \sum_{e = \{x,y\} \in E,\ |\{x,y\} \cap S| = 1} c(e)$. We will sometimes refer to $w(S)$ as the *size* of $S$ and to $c(S)$ as its *boundary cost*. In order to simplify the presentation, we assume that all edge costs and vertex weights are polynomially bounded in the number of vertices $n$. This can always be achieved with standard rounding techniques at the loss of a factor $(1+\varepsilon)$ in the approximation guarantee.

A *partition* $\mathcal{P}$ of the vertex set $V$ is a collection $\mathcal{P} = \{P_i\}_i$ of pairwise disjoint subsets of $V$ with $\bigcup_i P_i = V$. We define two different cost functions for partitions, namely $\mathrm{cost}^{\mathrm{sum}}(\mathcal{P}) = \sum_i c(P_i)$ and $\mathrm{cost}^{\mathrm{max}}(\mathcal{P}) = \max_i c(P_i)$. We drop the superscript whenever the type is clear from the context. A *$(k, \nu)$-balanced* partition is a partition into at most $k$ parts, such that the inequality $w(P_i) \le \nu \cdot w(V)/k$ holds for all parts. The costs of the $(k,1)$-balanced partition with minimum cost w.r.t. $\mathrm{cost}^{\mathrm{sum}}$ and $\mathrm{cost}^{\mathrm{max}}$ are denoted with $\mathrm{OPT}^{\mathrm{sum}}(k, G)$ and $\mathrm{OPT}^{\mathrm{max}}(k, G)$, respectively.
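The definitions of boundary cost, the two partition costs and $(k,\nu)$-balance can be sketched in a few lines of Python. The edge-dictionary representation and the function names are assumptions made for this illustration only.

```python
def boundary_cost(edges, part):
    """c(S): total cost of edges with exactly one endpoint in `part`."""
    return sum(c for (x, y), c in edges.items() if (x in part) != (y in part))


def cost_sum(edges, partition):
    """cost^sum: sum of the boundary costs of all parts."""
    return sum(boundary_cost(edges, p) for p in partition)


def cost_max(edges, partition):
    """cost^max: largest boundary cost of a single part."""
    return max(boundary_cost(edges, p) for p in partition)


def is_balanced(weights, partition, k, nu):
    """(k, nu)-balanced: at most k parts, each of weight <= nu * w(V) / k."""
    total = sum(weights.values())
    return len(partition) <= k and all(
        sum(weights[v] for v in p) * k <= nu * total for p in partition
    )


# Toy path a - b - c - d with unit vertex weights (hypothetical data).
edges = {("a", "b"): 3, ("b", "c"): 1, ("c", "d"): 2}
weights = {v: 1 for v in "abcd"}
partition = [{"a", "b"}, {"c", "d"}]
```

Note that every cut edge is counted twice in `cost_sum`, once for each part it leaves, in line with the definition of $\mathrm{cost}^{\mathrm{sum}}$ as a sum of boundary costs.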

A *hierarchical partition* $\mathcal{H} = (\mathcal{P}_1, \dots, \mathcal{P}_h)$ of height $h$ is a sequence of $h$ partitions, where $\mathcal{P}_{\ell+1}$ is a *refinement* of $\mathcal{P}_\ell$, i.e., for every $S \in \mathcal{P}_{\ell+1}$, there is a set $S' \in \mathcal{P}_\ell$ with $S \subseteq S'$. We call $\mathcal{P}_\ell$ the level-$\ell$-partition of $\mathcal{H}$. For any cost vector $\vec{\mu} \in \mathbb{R}^h$, the cost of $\mathcal{H}$ is given by $\mathrm{cost}_{\vec{\mu}}(\mathcal{H}) = \sum_{\ell=1}^{h} \mu_\ell\, \mathrm{cost}^{\mathrm{sum}}(\mathcal{P}_\ell)$. A *$(\vec{k}, \nu)$-balanced* hierarchical partition is a hierarchical partition where $\mathcal{P}_\ell$ is $(k_\ell, \nu)$-balanced. The minimum cost of a $(\vec{k}, 1)$-balanced hierarchical partition w.r.t. $\vec{\mu}$ is denoted with $\mathrm{OPT}_{\vec{\mu}}(\vec{k}, G)$.

An $(\alpha, \beta)$-approximate hierarchical partition with respect to $\vec{\mu}$ and $\vec{k}$ is a $(\vec{k}, \beta)$-balanced hierarchical partition whose cost is at most $\alpha \cdot \mathrm{OPT}_{\vec{\mu}}(\vec{k}, G)$. An $(\alpha, \beta)$-approximation algorithm for Hierarchical Graph Partitioning finds an $(\alpha, \beta)$-approximate hierarchical partition for any given input graph and cost vector. This also gives an $(\alpha, \beta)$-approximation for scheduling on regular tree topologies (see [7]).
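As a small illustration of the hierarchical cost, the sketch below checks the refinement property and evaluates $\mathrm{cost}_{\vec{\mu}}$ on a toy path graph. The data layout (lists of frozensets per level) and all function names are assumptions chosen for this example.

```python
def boundary_cost(edges, part):
    """c(S): total cost of edges with exactly one endpoint in `part`."""
    return sum(c for (x, y), c in edges.items() if (x in part) != (y in part))


def cost_sum(edges, partition):
    return sum(boundary_cost(edges, p) for p in partition)


def is_refinement(finer, coarser):
    """Every set of `finer` is contained in some set of `coarser`."""
    return all(any(s <= t for t in coarser) for s in finer)


def hierarchical_cost(edges, levels, mu):
    """cost_mu(H) = sum over levels of mu_l * cost^sum(P_l)."""
    for upper, lower in zip(levels, levels[1:]):
        if not is_refinement(lower, upper):
            raise ValueError("each level must refine the previous one")
    return sum(m * cost_sum(edges, p) for m, p in zip(mu, levels))


# Toy path a - b - c - d (hypothetical data), two levels.
edges = {("a", "b"): 3, ("b", "c"): 1, ("c", "d"): 2}
level1 = [frozenset("ab"), frozenset("cd")]
level2 = [frozenset("a"), frozenset("b"), frozenset("c"), frozenset("d")]
total = hierarchical_cost(edges, [level1, level2], mu=(5, 1))
```

Here level 1 contributes $5 \cdot 2$ and level 2 contributes $1 \cdot 12$, so the total cost is 22.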

**2 Min-Max $k$-Partitioning**

In this section, we present an approximation algorithm for Min-Max $k$-Partitioning. Recall that this problem asks to compute a $(k,1)$-balanced partition $\mathcal{P}$ of a given graph that minimizes $\mathrm{cost}^{\mathrm{max}}(\mathcal{P})$. We first consider instances where the input graph is a tree $T = (V, E)$. A suitable approximation of the graph by a tree (see [3], [8], [13]) allows us to extend the results to arbitrary graphs with a small loss in the approximation factor. Our algorithm is a decision procedure, i.e., it constructs for a given bound $b$ a solution whose cost is at most $(1+\varepsilon)b$ or proves that $b < \mathrm{OPT}^{\mathrm{max}}(k, G)$. A $(1+\varepsilon, 1+\varepsilon)$-approximation algorithm follows by using binary search. In order to simplify the presentation, we assume throughout this section that all vertex weights are 1 and that $k$ divides $n$. We further assume that the approximation constant $\varepsilon$ is at most 1.
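The binary search over the bound $b$ can be sketched as follows. The oracle `decide` is a hypothetical stand-in for the decision procedure: it is assumed to return a solution of cost at most $(1+\varepsilon)b$, or `None` as a certificate that $b < \mathrm{OPT}^{\mathrm{max}}(k, G)$. Integer costs and a known upper bound `upper` are further assumptions of this sketch.

```python
def search_bound(decide, upper):
    """Binary search for a bound b with decide(b) accepted and decide(b-1)
    rejected (or b == 0). Since a rejection certifies b - 1 < OPT and costs
    are integral, the accepted bound satisfies b <= OPT."""
    best = decide(upper)
    if best is None:
        raise ValueError("upper must be at least the optimal cost")
    lo, hi = -1, upper  # invariant: lo rejected (or -1), hi accepted
    while hi - lo > 1:
        mid = (lo + hi) // 2
        solution = decide(mid)
        if solution is None:
            lo = mid
        else:
            hi, best = mid, solution
    return hi, best


# Toy oracle (hypothetical): the optimum is 7, any b >= 7 is accepted.
def toy_decide(b):
    return f"solution for b={b}" if b >= 7 else None


bound, solution = search_bound(toy_decide, upper=100)
```

The search does not require the oracle to be monotone below the optimum: the invariant only uses that a rejected bound is strictly smaller than the optimum and that the returned solution costs at most $(1+\varepsilon)$ times the accepted bound.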

The algorithm heavily relies on the construction of a *decomposition of $T$*, which is defined as follows. A decomposition of $T$ w.r.t. $k$ and $b$ is a partition $\mathcal{D} = \{D_i\}_i$, whose parts $D_i$ are connected components, have size $w(D_i) \le n/k$ and boundary cost $c(D_i) \le b$. With each decomposition $\mathcal{D}$ we associate a corresponding *vector set* $\mathcal{I}_{\mathcal{D}}$. For every part $D_i$, this set $\mathcal{I}_{\mathcal{D}}$ contains a vector $(c(D_i)/b,\ w(D_i)k/n)$ that encodes the size and boundary cost of $D_i$ in a normalized way. In this way, $\mathcal{I}_{\mathcal{D}}$ contains (most of) the information about $\mathcal{D}$ and our algorithm can just work with $\mathcal{I}_{\mathcal{D}}$ instead of the full decomposition $\mathcal{D}$. Note that every vector in $\mathcal{I}_{\mathcal{D}}$ is bounded by $(1,1)$.

We call a partition of $\mathcal{I}_{\mathcal{D}}$ into $k$ subsets $\{I_i\}_i$, such that the vectors in each subset sum to at most $\alpha$ in both dimensions, an *$\alpha$-packing* of $\mathcal{I}_{\mathcal{D}}$. The decomposition $\mathcal{D}$ is called *$\alpha$-feasible* if an $\alpha$-packing of $\mathcal{I}_{\mathcal{D}}$ exists. Note that this implies $\alpha \ge 1$. Also note that this definition of feasibility may be nonstandard, as feasibility incorporates the decomposition cost, i.e., a 1-packing has cost $b$ (the bound that we guessed for the optimal solution).
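A minimal sketch of the vector set and of verifying a given $\alpha$-packing is shown below. Representing a decomposition as a list of `(boundary_cost, size)` pairs and the function names are assumptions for illustration; finding a packing (rather than checking one) is the hard part and is not attempted here.

```python
from fractions import Fraction


def vector_set(components, b, n, k):
    """Map each component (boundary_cost, size) to (c(D_i)/b, w(D_i)*k/n)."""
    return [(Fraction(c, b), Fraction(w * k, n)) for c, w in components]


def is_alpha_packing(groups, k, alpha):
    """Check that `groups` has k subsets whose vectors sum to at most alpha
    in both coordinates."""
    return len(groups) == k and all(
        sum(p[0] for p in g) <= alpha and sum(p[1] for p in g) <= alpha
        for g in groups
    )


# Toy decomposition (hypothetical): n = 8 vertices, k = 2 parts, bound b = 4.
vectors = vector_set([(2, 2), (2, 2), (4, 4)], b=4, n=8, k=2)
good = [[vectors[0], vectors[1]], [vectors[2]]]
bad = [[vectors[0], vectors[2]], [vectors[1]]]
```

The grouping `good` is a 1-packing, whereas `bad` overloads its first subset and is only a $3/2$-packing.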

Determining whether there is an $\alpha$-packing of a given vector set is an instance of the NP-hard Vector Scheduling problem. Chekuri and Khanna [4] gave a polynomial-time approximation scheme (PTAS) for the Vector Scheduling problem which leads to the following result.

▶ **Lemma 3.** *Let $\mathcal{D}$ be an $\alpha$-feasible decomposition of $T$ w.r.t. $k$ and $b$. Then there is a polynomial-time algorithm to construct a $(k, (1+\varepsilon)\alpha)$-balanced partition of $T$ of cost at most $(1+\varepsilon)\alpha \cdot b$, for any $\varepsilon > 0$.*

**Proof.** The proof has been deferred to the full version. ◀

An optimal (but slow) algorithm for Min-Max $k$-Partitioning could iterate over all possible decompositions of the graph and check if any of these decompositions is 1-feasible. If a 1-feasible decomposition is found, the corresponding 1-packing is computed, otherwise the bound $b$ is rejected.

We modify this approach to obtain a polynomial-time approximation algorithm. As the number of decompositions is exponential, we partition them into a polynomial number of *classes*. This classification is such that decompositions of the same class are similar w.r.t. feasibility. More precisely, we show that if a decomposition is $\alpha$-feasible, then all decompositions of the same class are $(1+\varepsilon)\alpha$-feasible.

We present a dynamic program that, for every class, checks whether a decomposition in that class exists and that computes some representative decomposition that is in this class. Then, we check the feasibility of every representative. As an exact check is NP-hard, we use the approximate Vector Scheduling as described above. If a 1-feasible decomposition exists, then a representative of the same class has been computed and furthermore, this representative is $(1+\varepsilon)$-feasible. Consequently, we obtain a $(1+\varepsilon)^2$-packing of the representative, which induces a $(k, (1+\varepsilon)^2)$-balanced partition of $T$ with cost at most $(1+\varepsilon)^2 b$.

**2.1 Classification of Decompositions**

We now describe the classification of decompositions using the vector set $\mathcal{I}_{\mathcal{D}}$. A vector $\vec{p} \in \mathcal{I}_{\mathcal{D}}$ is *small* if both its coordinates are at most $\varepsilon^3$; otherwise it is *large*. Small and large vectors are henceforth treated separately.

We define the *type of a large vector* $\vec{p} = (p_1, p_2)$ as follows. Informally, the type of a large vector classifies the value of both coordinates. Let $i = 0$ if $p_1 < \varepsilon^4$; otherwise let $i$ be the integer such that $p_1$ lies in the interval $[(1+\varepsilon)^{i-1}\varepsilon^4, (1+\varepsilon)^{i}\varepsilon^4)$. Let $j = 0$ if $p_2 < \varepsilon^4$; otherwise let $j$ be the integer such that $p_2$ lies in $[(1+\varepsilon)^{j-1}\varepsilon^4, (1+\varepsilon)^{j}\varepsilon^4)$. Note that $i$ and $j$ can each take $t = \lceil \log_{1+\varepsilon}(1/\varepsilon^4) \rceil + 1$ different values because $\vec{p}$ is upper-bounded by $(1,1)$. This is a consequence of normalizing the vectors in $\mathcal{I}_{\mathcal{D}}$ as mentioned earlier. The pair $(i, j)$ is the *type of $\vec{p}$*. We number the types $1, \dots, t^2$ arbitrarily.
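The type of a large vector can be computed as in the following sketch. Exact rational arithmetic is used to avoid rounding at interval boundaries; the function names and the choice of returning the pair $(i, j)$ directly (rather than a number in $1, \dots, t^2$) are assumptions of this illustration.

```python
from fractions import Fraction


def coordinate_class(x, eps):
    """0 if x < eps^4, otherwise the integer i with
    (1+eps)^(i-1) * eps^4 <= x < (1+eps)^i * eps^4."""
    if x < eps ** 4:
        return 0
    i = 1
    while x >= (1 + eps) ** i * eps ** 4:
        i += 1
    return i


def num_classes(eps):
    """t = ceil(log_{1+eps}(1/eps^4)) + 1, computed without floating point."""
    m = 0
    while (1 + eps) ** m < 1 / eps ** 4:
        m += 1
    return m + 1


def large_type(p, eps):
    """Type (i, j) of a large vector p = (p1, p2)."""
    if p[0] <= eps ** 3 and p[1] <= eps ** 3:
        raise ValueError("vector is small, not large")
    return coordinate_class(p[0], eps), coordinate_class(p[1], eps)


eps = Fraction(1, 2)
```

For $\varepsilon = 1/2$ this gives $t = 8$, and the coordinate classes range over $0, \dots, 7$ for coordinates in $[0, 1]$.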

We next define the *type of a small vector* $\vec{q} = (q_1, q_2)$. Informally, the type of a small vector approximates the *ratio* $q_1/q_2$ of its coordinates. Let $s = \lceil \log_{1+\varepsilon}(1/\varepsilon) \rceil$ and subdivide $[\varepsilon, 1/\varepsilon]$ into $2s$ intervals of the form $[(1+\varepsilon)^{i}, (1+\varepsilon)^{i+1})$ for $i = -s, \dots, s-1$. Crop the first and the last interval at $\varepsilon$ and $1/\varepsilon$ respectively. Add two outer intervals $[0, \varepsilon)$ and $[1/\varepsilon, \infty)$ to obtain a discretization of the positive reals into $2s+2$ types $-(s+1), \dots, s$. The vector $\vec{q} = (q_1, q_2)$ has type $i$ if its ratio $q_1/q_2$ falls into the $i$-th interval.
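The ratio-based type of a small vector can be sketched as follows, again with exact rationals. Treating a vector with $q_2 = 0$ as belonging to the outermost interval $[1/\varepsilon, \infty)$ is an assumption of this sketch, as is the function name.

```python
from fractions import Fraction


def ratio_type(q, eps):
    """Type in {-(s+1), ..., s} of a small vector q = (q1, q2),
    determined by the interval containing the ratio q1/q2."""
    s = 0
    while (1 + eps) ** s < 1 / eps:  # s = ceil(log_{1+eps}(1/eps))
        s += 1
    if q[1] == 0:
        return s  # assumption: infinite ratio falls into [1/eps, inf)
    ratio = Fraction(q[0]) / Fraction(q[1])
    if ratio < eps:
        return -(s + 1)  # outer interval [0, eps)
    if ratio >= 1 / eps:
        return s  # outer interval [1/eps, inf)
    i = -s
    while ratio >= (1 + eps) ** (i + 1):
        i += 1
    return i


eps = Fraction(1, 2)  # gives s = 2, i.e. types -3, ..., 2
```

Types $j \ge 0$ then correspond to ratios of at least 1, and types $j < 0$ to ratios below 1.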

Using these terms, we define the size signature and ratio signature of a vector set $\mathcal{I}_{\mathcal{D}}$. The size signature stores the number of large vectors of every type while the ratio signature stores the *sum* of small vectors of every type. Let $\mathcal{I}^{\mathrm{large}}_{\mathcal{D}}$ denote the subset of large vectors and $\mathcal{I}^{\mathrm{small}}_{\mathcal{D},j}$ denote the subset of small vectors of type $j$.

▶ **Definition 4** (Signature). Let $\mathcal{D}$ be a set of disjoint connected components. The vector $\vec{\ell} = (\ell_1, \dots, \ell_{t^2})$ is the *size signature* of $\mathcal{I}_{\mathcal{D}}$ if $\mathcal{I}^{\mathrm{large}}_{\mathcal{D}}$ contains exactly $\ell_i$ vectors of type $i$ for $i = 1, \dots, t^2$. The vector $\vec{h} = (\vec{h}_{-(s+1)}, \dots, \vec{h}_{s})$ is the *ratio signature* of $\mathcal{I}_{\mathcal{D}}$ if $\vec{h}_j = \sum_{\vec{p} \in \mathcal{I}^{\mathrm{small}}_{\mathcal{D},j}} \vec{p}$ for $j = -(s+1), \dots, s$. We call $\vec{g} = (\vec{\ell}, \vec{h})$ the signature of $\mathcal{D}$ and $\mathcal{I}_{\mathcal{D}}$.

We prove in the following that decompositions with the same signature have nearly the same feasibility properties. We start with a technical claim whose proof is omitted here.

▶ **Claim 5.** *Let $0 < \varepsilon \le 1$ and $s = \lceil \log_{1+\varepsilon}(1/\varepsilon) \rceil$. Then $s \cdot \varepsilon^3 \le 2\varepsilon$.*

**Proof.** The proof has been deferred to the full version. ◀
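Although the proof of Claim 5 is deferred, the inequality is easy to sanity-check numerically. The sketch below evaluates it on a grid of values of $\varepsilon$; it uses floating-point logarithms and is a plausibility check under that assumption, not a proof.

```python
from math import ceil, log


def claim5_holds(eps):
    """Check s * eps^3 <= 2 * eps for s = ceil(log_{1+eps}(1/eps))."""
    s = ceil(log(1 / eps) / log(1 + eps))
    return s * eps ** 3 <= 2 * eps


# Grid of eps values in (0, 1] with step 0.001.
grid = [i / 1000 for i in range(1, 1001)]
all_hold = all(claim5_holds(eps) for eps in grid)
```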

▶ **Lemma 6.** *Assume that $\mathcal{D}$ is an $\alpha$-feasible decomposition of signature $\vec{g}$ and that $\mathcal{D}'$ has the same signature. Then $\mathcal{D}'$ is $(1+9\varepsilon)\alpha$-feasible.*

**Proof.** Given an $\alpha$-packing $\{I_i\}_i$ of $\mathcal{I}_{\mathcal{D}}$, we describe a $(1+9\varepsilon)\alpha$-packing $\{I'_i\}_i$ of $\mathcal{I}_{\mathcal{D}'}$. For every vector $\vec{p} \in \mathcal{I}_{\mathcal{D}'}$, we have to select a set $I'_i$ to add this vector to. We only argue about the first coordinate of the resulting packing; the reasoning for the second coordinate is analogous.

We start with the large vectors. We choose an arbitrary bijection $\pi : \mathcal{I}^{\mathrm{large}}_{\mathcal{D}'} \to \mathcal{I}^{\mathrm{large}}_{\mathcal{D}}$ such that only vectors of the same type are matched. This can be done easily, since $\mathcal{I}_{\mathcal{D}}$ and $\mathcal{I}_{\mathcal{D}'}$ have the same size signature. Now, we add a vector $\vec{p} \in \mathcal{I}^{\mathrm{large}}_{\mathcal{D}'}$ to set $I'_i$ if and only if $\pi(\vec{p}) \in I_i$.

As vector $\vec{p} = (p_1, p_2)$ and $\pi(\vec{p})$ share the same type, their sizes are similar in both coordinates. More precisely, if the first coordinate of $\vec{p}$ is larger than $\varepsilon^4$, then the first coordinates of $\vec{p}$ and $\pi(\vec{p})$ differ by a factor of at most $(1+\varepsilon)$. If $p_1$ is smaller than $\varepsilon^4$, then $p_2$ must be larger than $\varepsilon^3$ because $\vec{p}$ is a large vector. Consequently there can be at most $\alpha/\varepsilon^3$ large vectors with such a small first coordinate in the same part of the $\alpha$-packing $\{I_i\}_i$. Their sum is bounded by $\alpha\varepsilon$ in the first coordinate. Let $(L_{i,\mathrm{large}}, R_{i,\mathrm{large}})$ denote the sum of large vectors in $I_i$ and let $(L'_{i,\mathrm{large}}, R'_{i,\mathrm{large}})$ denote the sum of large vectors in $I'_i$. We have

$$L'_{i,\mathrm{large}} \le (1+\varepsilon) L_{i,\mathrm{large}} + \alpha\varepsilon. \tag{1}$$

We now consider small vectors; recall that they are classified by their ratio. Let $(L_{i,j}, R_{i,j})$ denote the sum of small vectors of type $j$ in $I_i$. We call types $j \ge 0$ *left-heavy* as they fulfill $L_{i,j} \ge R_{i,j}$, and accordingly we call types $j < 0$ *right-heavy* as they fulfill $L_{i,j} < R_{i,j}$.

For left-heavy types, add vectors of $\mathcal{I}^{\mathrm{small}}_{\mathcal{D}',j}$ to $I'_i$ until the sum $(L'_{i,j}, R'_{i,j})$ of these vectors exceeds $L_{i,j}$ in the first coordinate. It follows that $L'_{i,j} \le L_{i,j} + \varepsilon^3$, as the excess is at most one small vector. Summing over all left-heavy types gives

$$\sum_{j=0}^{s} L'_{i,j} \le (s+1)\varepsilon^3 + \sum_{j=0}^{s} L_{i,j} \le 3\varepsilon\alpha + \sum_{j=0}^{s} L_{i,j}, \tag{2}$$

where we used $\alpha \ge 1$ and Claim 5 in the last step.

For right-heavy types, add vectors of $\mathcal{I}^{\mathrm{small}}_{\mathcal{D}',j}$ to $I'_i$ until the sum of these vectors exceeds $R_{i,j}$ in the second coordinate. It follows that $R'_{i,j} \le R_{i,j} + \varepsilon^3$. We remark that the above procedure distributes all small vectors of any type $j$. This follows because on the one hand the ratio signature ensures that the sum of vectors of type $j$ is the same in $\mathcal{I}_{\mathcal{D}}$ and $\mathcal{I}_{\mathcal{D}'}$, and on the other hand our greedy distribution assigns at least as much mass in the heavier coordinate as in the solution for $\{I_i\}_i$. It remains to show an upper bound on $L'_{i,j}$ for right-heavy types.

First consider type $-(s+1)$, which corresponds to the interval $[0, \varepsilon)$. Combining the upper bound on $R'_{i,j}$ for right-heavy types and the fact that $L'_{i,-(s+1)}/R'_{i,-(s+1)} < \varepsilon$, it follows that $L'_{i,-(s+1)} \le \varepsilon(R_{i,-(s+1)} + \varepsilon^3)$. As $\varepsilon^3 \le \alpha$ and $R_{i,-(s+1)} \le \alpha$, this implies $L'_{i,-(s+1)} \le 2\varepsilon\alpha$.

The remaining types correspond to an interval $[(1+\varepsilon)^{j}, (1+\varepsilon)^{j+1})$ and both $L_{i,j}/R_{i,j}$ and $L'_{i,j}/R'_{i,j}$ must lie in that interval. We derive a bound on $L'_{i,j}$ as follows:

$$\begin{aligned}
L'_{i,j} &\le (1+\varepsilon)^{j+1} R'_{i,j} \le (1+\varepsilon)^{j+1}\bigl(R_{i,j} + \varepsilon^3\bigr) \le (1+\varepsilon)^{j+1}\Bigl(\frac{L_{i,j}}{(1+\varepsilon)^{j}} + \varepsilon^3\Bigr) \\
&= (1+\varepsilon) L_{i,j} + (1+\varepsilon)^{j+1}\varepsilon^3 \le (1+\varepsilon) L_{i,j} + \varepsilon^3 .
\end{aligned}$$

The first step uses the fact that $L'_{i,j}/R'_{i,j} < (1+\varepsilon)^{j+1}$. The second step uses the upper bound on $R'_{i,j}$ for right-heavy types and the third step uses the bound $L_{i,j}/R_{i,j} \ge (1+\varepsilon)^{j}$. As $j < 0$ for right-heavy types and $\varepsilon > 0$, the last step follows. Summing up the bounds on all right-heavy types and using Claim 5 gives

$$\sum_{j=-(s+1)}^{-1} L'_{i,j} \le s\varepsilon^3 + 2\varepsilon\alpha + (1+\varepsilon)\sum_{j=-(s+1)}^{-1} L_{i,j} \le 4\varepsilon\alpha + (1+\varepsilon)\sum_{j=-(s+1)}^{-1} L_{i,j}. \tag{3}$$

We now combine all bounds derived for part $I'_i$. Let $(L_i, R_i)$ be the sum of vectors in $I_i$ and let $(L'_i, R'_i)$ denote the sum of vectors in $I'_i$. Clearly $L'_i = L'_{i,\mathrm{large}} + \sum_{j=-(s+1)}^{s} L'_{i,j}$ and a respective equality holds for $L_i$. The first term $L'_{i,\mathrm{large}}$ is upper-bounded in Equation (1). The sum $\sum_{j=0}^{s} L'_{i,j}$ is upper-bounded in Equation (2), and the sum $\sum_{j=-(s+1)}^{-1} L'_{i,j}$ is upper-bounded in Equation (3). Combining these bounds gives

$$\begin{aligned}
L'_i &= L'_{i,\mathrm{large}} + \sum_{j=-(s+1)}^{-1} L'_{i,j} + \sum_{j=0}^{s} L'_{i,j} \\
&\le (1+\varepsilon) L_{i,\mathrm{large}} + 8\varepsilon\alpha + \sum_{j=0}^{s} L_{i,j} + (1+\varepsilon)\sum_{j=-(s+1)}^{-1} L_{i,j} \\
&\le (1+\varepsilon) L_i + 8\varepsilon\alpha \le (1+9\varepsilon)\alpha .
\end{aligned}$$

The last step follows from the assumption that $\{I_i\}_i$ is an $\alpha$-packing, i.e., $L_i \le \alpha$. ◀

**2.2 Finding Decompositions of the Tree**

We now present a dynamic program that computes a set of decompositions $\mathbb{D}$ with the following property: If there exists a decomposition of $T$ with signature $\vec{g}$, then $\mathbb{D}$ contains a decomposition with that signature. The dynamic program adapts a procedure given by Feldmann and Foschini [6]. We first introduce some terminology.

Fix an arbitrary root $r$ of the tree and some left-right ordering among the children of each internal vertex. This defines the *leftmost* and *rightmost* sibling among the children. For each vertex $v$, let $L_v$ be the set of vertices that are contained in the subtree rooted at $v$ or in a subtree rooted at some left sibling of $v$. The dynamic program iteratively constructs sets of connected components of a special form, defined as follows.

A *lower frontier* $\mathcal{F}$ is a set of disjoint connected components with nodes, such that vertices that are not covered by $\mathcal{F}$ form a connected component including the root. In addition we require that components of a lower frontier have at most size $n/k$ and at most cost $b$. We define the *cost* of a lower frontier $\mathcal{F}$ as the total cost of edges with exactly one endpoint covered by $\mathcal{F}$. Note that this may be counter-intuitive, because the cost of a lower frontier does not consider the boundary cost of the individual connected components.

The algorithm determines for every vertex $v$, signature $\vec{g}$, cost $\kappa \le b$ and number of vertices $m \le n$, if there exists a lower frontier of $L_v$ with signature $\vec{g}$ that covers $m$ vertices with cost $\kappa$. It stores this information in a table $D_v(\vec{g}, m, \kappa)$ and computes the entries recursively using a dynamic program. Along with each positive entry, the table stores the corresponding lower frontier. In the following paragraphs, we describe the dynamic program in more detail.

First, consider the case where $v$ is a leaf and the leftmost among its siblings. Let $v_p$ be the parent of $v$. As $L_v = \{v\}$, the lower frontier of $L_v$ is either empty or a single connected component $\{v\}$. Therefore $D_v((\vec{0}, \vec{0}), 0, 0) = 1$ and $D_v(\vec{g}(1, c(v, v_p)), 1, c(v, v_p)) = 1$. Here, we use $\vec{g}(|S|, c(S))$ to denote the signature of a set that just contains connected component $S$ (recall that $c(S)$ denotes the cost of $S$). For all other values, $D_v(\vec{g}, m, \kappa) = 0$.

Second, we consider the case where $v$ is neither a leaf, nor the leftmost among its siblings. The case when $v$ is a leaf but not the leftmost sibling and the case when $v$ is an inner vertex that is the leftmost sibling follow from an easy adaptation. Let $w$ be the left sibling of $v$ and let $u$ be the rightmost child of $v$. Observe that $L_v$ is the disjoint union of $L_u$, $L_w$ and $\{v\}$.
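The sets $L_v$ and the disjoint-union observation can be checked on a small ordered tree. In this sketch the tree is given as a dictionary from each vertex to its ordered list of children; this representation and the function names are assumptions for illustration.

```python
def subtree(children, v):
    """All vertices in the subtree rooted at v."""
    nodes = {v}
    for child in children.get(v, []):
        nodes |= subtree(children, child)
    return nodes


def left_sets(children, root):
    """Return {v: L_v}, where L_v holds the vertices in the subtree of v
    or in the subtree of some left sibling of v."""
    result = {root: subtree(children, root)}
    stack = [root]
    while stack:
        parent = stack.pop()
        accumulated = set()
        for child in children.get(parent, []):  # left-to-right order
            accumulated |= subtree(children, child)
            result[child] = set(accumulated)
            stack.append(child)
    return result


# Toy ordered tree (hypothetical): r has children a, b; a has c; b has d, e.
children = {"r": ["a", "b"], "a": ["c"], "b": ["d", "e"]}
L = left_sets(children, "r")
```

For $v = b$, the left sibling is $w = a$ and the rightmost child is $u = e$, so $L_b$ splits into $L_e = \{d, e\}$, $L_a = \{a, c\}$ and $\{b\}$.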

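If the edge from $v$ to its parent is not cut by a lower frontier $\mathcal{F}$ of $L_v$, then $v$ is not covered by $\mathcal{F}$. Consequently $\mathcal{F}$ splits into two lower frontiers $\mathcal{F}_u$ and $\mathcal{F}_w$ of $L_u$ and $L_w$ respectively. Denote their signatures with $\vec{g}_u$ and $\vec{g}_w$; they must sum up to $\vec{g}$. If $\mathcal{F}_u$ covers $m_u$ vertices with cost $\kappa_u$, then $\mathcal{F}_w$ covers $m - m_u$ vertices and needs cost $\kappa - \kappa_u$. Hence, if the edge from $v$ to its parent is not cut, $D_v(\vec{g}, m, \kappa)$ is equal to

$$\bigvee_{\substack{m_u \le m,\ \kappa_u \le \kappa, \\ \vec{g}_u + \vec{g}_w = \vec{g}}} \bigl( D_w(\vec{g}_w, m - m_u, \kappa - \kappa_u) \wedge D_u(\vec{g}_u, m_u, \kappa_u) \bigr). \tag{4}$$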

If the edge from $v$ to its parent $v_p$ is cut by the lower frontier $\mathcal{F}$ of $L_v$, then $\mathcal{F}$ covers the entire subtree of $v$. It follows that $\mathcal{F}$ consists of three disjoint parts: a component $S$ that contains $v$, a lower frontier $\mathcal{F}_u$ of $L_u$ and a lower frontier $\mathcal{F}_w$ of $L_w$. Denote the cost of $\mathcal{F}_u$ by $\kappa_u$ and the cost of $\mathcal{F}_w$ by $\kappa_w$. Let $T_v$ denote the subtree of $v$. The cost $c(S)$ equals $\kappa_u + c(v, v_p)$, because $S$ is connected to $v_p$ in the direction of the root and with the highest vertices of $\mathcal{F}_u$ in the direction of the leaves. The cost $\kappa$ of the lower frontier $\mathcal{F}$ is $\kappa_w + c(v, v_p)$, because all edges leaving $\mathcal{F}_u$ have their endpoints in $\mathcal{F}$. The signatures of the three parts of $\mathcal{F}$ must sum up to $\vec{g}$, therefore $\vec{g} = \vec{g}_u + \vec{g}_w + \vec{g}(|S|, c(S))$. Hence, if the edge from $v$ to its parent is cut, $D_v(\vec{g}, m, \kappa)$ is equal to

$$\bigvee_{\substack{|S| \le n/k,\ c(S) \le b, \\ \vec{g}_u + \vec{g}_w + \vec{g}(|S|, c(S)) = \vec{g}}} D_w\bigl(\vec{g}_w, m - |T_v|, \kappa - c(v, v_p)\bigr) \wedge D_u\bigl(\vec{g}_u, |T_v| - |S|, c(S) - c(v, v_p)\bigr). \tag{5}$$

We conclude that $D_v(\vec{g}, m, \kappa) = 1$ if Term (4) or Term (5) evaluates to 1.

The running time of the dynamic program depends on the number of signatures that need to be considered at each vertex. The following lemma bounds their number; its proof is omitted due to space constraints.

▶ **Lemma 7.** *At vertex $v$, the dynamic program considers $|L_v|^{t+4s+4}(2b)^{2s+2}\gamma$ signatures, where $|L_v|$ is the size of $L_v$ and $\gamma = (k/\varepsilon^2)^{t^2}$.*

**Proof.** The proof has been deferred to the full version. ◀

As the number of signatures is polynomial in $n$ and $k$, it follows that the above dynamic program only needs a polynomial number of steps.

▶ **Observation 8.** *Let $\mathbb{D}$ be the set of decompositions corresponding to positive entries of $D_r(\vec{g}, n, 0)$ as computed by the dynamic program. Then for any decomposition of $T$ with signature $\vec{g}$, there exists a decomposition in $\mathbb{D}$ with the same signature.*

Combining the dynamic program with the properties of signatures, we obtain an approximation algorithm for Min-Max $k$-Partitioning on trees.

▶ **Theorem 9.** *There is a polynomial-time $(1+\varepsilon, 1+\varepsilon)$-approximation algorithm for Min-Max $k$-Partitioning on trees.*

**Proof.** Follows from Lemmas 6, 7 and Observation 8. The running time is polynomial as both the dynamic program and the Vector Scheduling subroutine run in polynomial time. ◀

We extend our results to arbitrary undirected graphs *G*= (V, E, c) with the help of a
decomposition tree that acts as a cut sparsifier (see Räcke and Shah [13]). Theorem 1 follows
by standard arguments that are deferred to the full version.

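**Algorithm 1** merge($\mathcal{H}' = (\mathcal{P}'_1, \dots, \mathcal{P}'_h)$, $\vec{k}$)

$\mathcal{P}_1 \leftarrow$ Merge two sets of $\mathcal{P}'_1$ with minimum combined weight until $|\mathcal{P}'_1| = k_1$.
**for** $\ell \leftarrow 2, \dots, h$ **do**
$d \leftarrow k_\ell / k_{\ell-1}$.
**for all** $P_{\ell-1,i} \in \mathcal{P}_{\ell-1}$ **do**
$Q \leftarrow \{P' \in \mathcal{P}'_\ell \mid P' \cap P_{\ell-1,i} \ne \emptyset\}$. // Subclusters of $P_{\ell-1,i}$
$P_{\ell,1} \leftarrow \emptyset, \dots, P_{\ell,d} \leftarrow \emptyset$.
**while** $Q \ne \emptyset$ **do**
Assign $Q$'s next element to the $P_{\ell,j}$ with smallest weight.
**end while**
$\mathcal{P}_\ell \leftarrow \mathcal{P}_\ell \cup \bigcup_j P_{\ell,j}$.
**end for**
**end for**
Return $\mathcal{H} = (\mathcal{P}_1, \dots, \mathcal{P}_h)$.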
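One possible Python rendering of the merging procedure is sketched below. It is an illustration under stated assumptions, not the authors' implementation: levels are lists of frozensets, `ks` is the vector $\vec{k}$ with integral ratios, merging on the first level stops once at most $k_1$ parts remain, ties are broken by list order, and empty bins are dropped.

```python
def merge(levels, ks, weight):
    """Greedy merging of a hierarchical separator into a hierarchical
    partition. levels[l] must refine levels[l - 1]."""
    def w(part):
        return sum(weight[v] for v in part)

    # Level 1: repeatedly merge the two lightest sets.
    top = [frozenset(p) for p in levels[0]]
    while len(top) > ks[0]:
        top.sort(key=w)
        top = [top[0] | top[1]] + top[2:]
    result = [top]

    # Levels 2..h: pack the subclusters of every parent part into d bins.
    for l in range(1, len(levels)):
        d = ks[l] // ks[l - 1]
        subclusters = [frozenset(p) for p in levels[l]]
        current = []
        for parent in result[-1]:
            queue = [p for p in subclusters if p & parent]
            bins = [frozenset() for _ in range(d)]
            for sub in queue:
                j = min(range(d), key=lambda i: w(bins[i]))  # lightest bin
                bins[j] = bins[j] | sub
            current.extend(b for b in bins if b)
        result.append(current)
    return result


# Toy input (hypothetical): 8 unit-weight vertices, k = (2, 4).
weight = {v: 1 for v in range(1, 9)}
levels = [
    [{1, 2}, {3, 4}, {5, 6, 7, 8}],
    [{1}, {2}, {3, 4}, {5, 6}, {7}, {8}],
]
merged = merge(levels, ks=[2, 4], weight=weight)
```

Because every level is packed inside the parts of the already merged level above it, the output levels remain refinements of each other.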

**3 Hierarchical Partitioning**

In this section, we present a bicriteria approximation algorithm for Hierarchical Graph Partitioning. In the Hierarchical Graph Partitioning problem, we are given a graph $G = (V, E)$ and parameters $\vec{k}$ and $\vec{\mu}$. We are asked to compute a $(\vec{k}, 1)$-balanced hierarchical partition $\mathcal{P}$ of $G$ that minimizes $\mathrm{cost}_{\vec{\mu}}(\mathcal{P})$ among all such partitions. We assume furthermore that a $(\vec{k}, 1)$-balanced partition of $G$ exists and therefore $k_{\ell+1}/k_\ell$ is integral for all levels. This assumption is fulfilled in particular when $\vec{k}$ is derived from a regular hierarchical network. Note that this is an extension of Min-Sum $k$-Partitioning and that $\mathrm{cost}_{\vec{\mu}}$ minimizes the weighted *sum* of all edges cut by all subpartitions. This contrasts with the objective function $\mathrm{cost}^{\mathrm{max}}$ considered in the previous section.

Our algorithm relies on the construction of *graph separators*. Graph separators are partitions in which the maximum weight of a part is bounded, but the number of parts is arbitrary. More precisely, a *$\sigma$-separator* of $G$ is a partition $\mathcal{P}$ of $G$, such that $w(P_i) \le w(V)/\sigma$ for every $i$.

For any positive vector $\vec{\sigma} \in \mathbb{R}^h$, a *$\vec{\sigma}$-hierarchical separator* of $G$ is a hierarchical partition of $G$, where on every level $\mathcal{P}_\ell$ is a $\sigma_\ell$-separator. Note that any $(k, \nu)$-balanced partition is a $(k/\nu)$-separator and any $(\vec{k}, \nu)$-balanced hierarchical partition is a $(\vec{k}/\nu)$-hierarchical separator. An *$\alpha$-approximate $\vec{\sigma}$-hierarchical separator* w.r.t. $\vec{k}$ and $\vec{\mu}$ is a $\vec{\sigma}$-hierarchical separator whose cost is at most $\alpha \cdot \mathrm{OPT}_{\vec{\mu}}(\vec{k}, G)$.
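The separator condition is simple to state in code. The sketch below checks it for a single level; the function name and the use of exact fractions for $\sigma$ are assumptions of this illustration.

```python
from fractions import Fraction


def is_separator(weights, partition, sigma):
    """sigma-separator: every part has weight at most w(V) / sigma.
    The number of parts is not restricted."""
    total = sum(weights.values())
    return all(sum(weights[v] for v in p) * sigma <= total for p in partition)


# Toy data (hypothetical): 8 unit-weight vertices in parts of size 3, 3, 2.
weights = {v: 1 for v in range(8)}
partition = [{0, 1, 2}, {3, 4, 5}, {6, 7}]
# The partition is (4, 3/2)-balanced, hence a (4 / (3/2))-separator.
sigma = Fraction(4) / Fraction(3, 2)
```

The example matches the remark that a $(k, \nu)$-balanced partition is a $(k/\nu)$-separator: with $k = 4$ and $\nu = 3/2$ every part has weight at most $3 = w(V)/\sigma$ for $\sigma = 8/3$.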

Approximate hierarchical separators can be transformed into approximate hierarchical partitions using the merging procedure described in Algorithm 1. It is essentially a greedy strategy for bin packing on every level that makes sure that partitions remain refinements of each other. The next lemma states that the approximation error of the merging procedure is linear in the height $h$.

▶ **Lemma 10.** *Let $\mathcal{H}' = (\mathcal{P}'_1, \dots, \mathcal{P}'_h)$ be an $\alpha$-approximate $\vec{k}/(1+\varepsilon)$-hierarchical separator w.r.t. $\vec{k}$ and $\vec{\mu}$. Then Algorithm 1 constructs an $(\alpha, (1+\varepsilon)(h+1))$-approximate hierarchical partition $\mathcal{H}$ w.r.t. $\vec{\mu}$ and $\vec{k}$.*

*Moreover, if $\mathcal{P}'_1$ contains at most $k_1$ parts, then $\mathcal{H}$ is an $(\alpha, (1+\varepsilon)h)$-approximate hierarchical partition w.r.t. $\vec{\mu}$ and $\vec{k}$.*

**Proof.** We use induction over $h$. For $h = 1$, assume without loss of generality that $P_{1,1} \in \mathcal{P}_1$ has maximum weight. If $w(P_{1,1}) > 2(1+\varepsilon)w(V)/k_1$, then some merging must have occurred in the first line of the algorithm. It follows that all distinct parts $P_{1,i}, P_{1,j} \in \mathcal{P}_1$ have combined weight at least $2(1+\varepsilon)w(V)/k_1$. This is a contradiction to the fact that the total weight of all parts is $w(V)$.

Assume that the claim holds for some $h \ge 1$, i.e., the weight of all parts of $\mathcal{P}_h$ is upper-bounded by $(1+\varepsilon)(h+1)w(V)/k_h$. Use $d = k_{h+1}/k_h$. Assume without loss of generality that $P_{h+1,1}$ receives the last element $S$ from $Q$ when considering $P_{h,i}$ in the fourth line of the algorithm; this implies that the weight of $P_{h+1,1}$ is smallest before receiving $S$. To derive a contradiction, assume that $w(P_{h+1,1}) > (1+\varepsilon)(h+2)w(V)/k_{h+1}$. It follows that

$$\begin{aligned}
w(P_{h,i}) &\ge w(S) + \sum_{j=1}^{d} w(P_{h+1,1} \setminus S) > \sum_{j=1}^{d} \bigl(w(P_{h+1,1}) - w(S)\bigr) \\
&> d \cdot \Bigl( (1+\varepsilon)(h+2)\frac{w(V)}{k_{h+1}} - (1+\varepsilon)\frac{w(V)}{k_{h+1}} \Bigr).
\end{aligned}$$

The last line equals $(1+\varepsilon)(h+1) \cdot w(V)/k_h$, which is a contradiction to the induction hypothesis. For the first inequality, we used that part $P_{h+1,1}$ had the smallest weight before receiving $S$. The last step uses the fact that the weight of all $S \in Q$ is at most $(1+\varepsilon)w(V)/k_{h+1}$.

As merging sets does not increase the cost of a hierarchical partition, the approximation
factor*α*on the cost does not change. This finishes the proof of the first statement.

To see the second statement, observe that if $\mathcal{P}'_1$ contains at most $k_1$ parts, then there is nothing to do in the first line of the algorithm. As no merging occurs, every set in $\mathcal{P}'_1$ has weight at most $(1+\varepsilon)w(V)/k_1$. With a similar induction argument as above, it follows that the merging procedure returns an $(\alpha, (1+\varepsilon)h)$-approximate hierarchical partition. ◀
In the following, we describe an algorithm that computes an $\mathcal{O}(\log n)$-approximate $\vec{k}/(1+\varepsilon)$-hierarchical separator $\mathcal{H}' = (\mathcal{P}'_1, \dots, \mathcal{P}'_h)$. We use $\tilde{\mu}_\ell$ as a shorthand for $\sum_{j=\ell}^{h} \mu_j$.
Our algorithm first computes partition $\mathcal{P}'_1$ using the Min-Sum $k$-Partitioning algorithm
by Feldmann and Foschini [6]. It can easily be adapted to handle polynomial vertex weights.

The following theorem states the result for the Min-Sum $k$-Partitioning algorithm.

▶ **Theorem 11** ([6]). *There is a polynomial-time algorithm that computes a $(k_1, 1+\varepsilon)$-balanced partition of $G$ with cut cost at most $\mathcal{O}(\log n) \cdot \mathrm{OPT}_{\mathrm{sum}}(k_1, G)$.*

Note that $\tilde{\mu}_1 \cdot \mathrm{OPT}_{\mathrm{sum}}(k_1, G)$ is a lower bound on the cost of any $(\vec{k}, 1)$-balanced hierarchical
partition.

The algorithm to construct $(\mathcal{P}'_2, \dots, \mathcal{P}'_h)$ is an adaptation of the algorithm by Even et al. for Min-Sum $k$-Partitioning. Note that the following description is essentially the description in [5]. It relies on a distance assignment $\{d(e)\}_{e \in E}$ to the edges of $G$, the *spreading metric*. The distances are used in a procedure called cut-proc that finds, within a subgraph of $G$, a subset of vertices $T$ whose cut cost is bounded by its volume $\mathrm{vol}(T)$. The volume of a set of vertices is approximately the weighted length of the edges in its cut.

Given the procedure cut-proc, the algorithm constructs all the remaining partitions $(\mathcal{P}'_2, \dots, \mathcal{P}'_h)$ as follows. First recall that $\mathcal{P}'_1$ is already given by Theorem 11. For $\ell \ge 2$, the algorithm constructs $\mathcal{P}'_\ell$ by examining all parts of $\mathcal{P}'_{\ell-1}$ separately. On a given part $P'_{\ell-1} \in \mathcal{P}'_{\ell-1}$, it uses procedure cut-proc to identify a subgraph $H$ of $P'_{\ell-1}$. This subgraph becomes a part of $\mathcal{P}'_\ell$ and cut-proc is restarted on $P'_{\ell-1} \setminus H$. This process is repeated until fewer than $(1+\varepsilon)w(V)/k_\ell$ vertices are left. The remaining vertices also constitute a part of $\mathcal{P}'_\ell$.
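The level-by-level refinement described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `refine_level` and the callable `cut_proc` (taking the remaining vertex set and returning the piece to split off) are hypothetical names, and the stopping rule is interpreted in terms of vertex weight so that it matches the precondition under which cut-proc is applied.

```python
from typing import Callable, Dict, Hashable, List, Set

Vertex = Hashable


def refine_level(
    prev_partition: List[Set[Vertex]],
    weight: Dict[Vertex, float],
    k_level: int,
    eps: float,
    cut_proc: Callable[[Set[Vertex]], Set[Vertex]],
) -> List[Set[Vertex]]:
    """Build the partition of one level by carving pieces out of each part
    of the previous level; the result refines prev_partition."""
    total = sum(weight.values())
    # Assumed stopping threshold: (1 + eps) * w(V) / k_l.
    threshold = (1 + eps) * total / k_level
    new_partition: List[Set[Vertex]] = []
    for part in prev_partition:
        remaining = set(part)
        # Keep splitting while the remainder is still too heavy.
        while sum(weight[v] for v in remaining) > threshold:
            piece = set(cut_proc(remaining))
            if not piece or not piece <= remaining:
                raise ValueError("cut_proc must return a non-empty subset")
            new_partition.append(piece)
            remaining -= piece
        # Whatever is left also becomes a part of the new level.
        if remaining:
            new_partition.append(remaining)
    return new_partition
```

Because every new part is carved out of a single part of the previous level, the partitions produced this way are refinements of each other by construction.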

It is important to note during the following discussion that the boundary cost $c(S)$ of a
set $S \subseteq V$ depends on the subgraph that $S$ was chosen from. Sometimes, we will choose $S$
from a subgraph $H$ of $G$; in this case, $c(S)$ only counts the cost of edges leaving $S$ towards $H$.

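The spreading metric used is an optimal solution to the following linear program, called
spreading metrics LP.

$$
\begin{aligned}
\min\ & \sum_{e \in E} c(e)\,d(e) \\
\text{s.t.}\ & \sum_{u \in S} \mathrm{dist}_V(v,u) \cdot w(u) \ge \tilde{\mu}_\ell \left( w(S) - \frac{w(V)}{k_\ell} \right), && 1 \le \ell \le h,\ S \subseteq V,\ v \in S \qquad (6) \\
& 0 \le d(e) \le \tilde{\mu}_1, && e \in E .
\end{aligned}
$$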

Using the distances $d(e)$, $\mathrm{dist}_S(v,u)$ denotes the length of the shortest path between vertices $v$ and $u$ in subgraph $S$. Let $\tau$ denote the optimal value of the spreading metrics LP.
The following two lemmas prove the basic properties of solutions of the spreading metrics
LP. Their proofs are omitted due to space constraints.

▶ **Lemma 12.** *Any $(\vec{k}, 1)$-balanced hierarchical partition $\mathcal{H} = (\mathcal{P}_1, \dots, \mathcal{P}_h)$ induces a feasible solution to the spreading metrics LP of value $\mathrm{cost}_{\vec{\mu}}(\mathcal{H})$.*

**Proof.** The proof has been deferred to the full version. ◀

The above lemma implies that $\tau$ is a lower bound on $\mathrm{cost}_{\vec{\mu}}(\mathcal{H})$. We define the radius of vertex $v$ in some $S \subseteq V$ as $\mathrm{radius}(v, S) := \max\{\mathrm{dist}_S(v,u) \mid u \in S\}$.

The following lemma proves that any set of sufficient weight has at least constant radius.

▶ **Lemma 13.** *If $S \subseteq V$ satisfies $w(S) > (1+\varepsilon)w(V)/k_\ell$ for some level $\ell$, then for every vertex $v \in S$, $\mathrm{radius}(v, S) > \frac{\varepsilon}{1+\varepsilon}\tilde{\mu}_\ell$.*

**Proof.** The proof has been deferred to the full version. ◀
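Although the proof is deferred, the bound can be traced back to Constraint (6). The following derivation is one possible route, offered as a sketch rather than as the argument of the full version. It assumes that $d$ is feasible for the spreading metrics LP, that $\tilde{\mu}_\ell > 0$, and it uses $\mathrm{dist}_V(v,u) \le \mathrm{dist}_S(v,u) \le \mathrm{radius}(v,S)$ for all $u \in S$.

```latex
\begin{align*}
\mathrm{radius}(v,S) \cdot w(S)
  &\ge \sum_{u \in S} \mathrm{dist}_V(v,u)\, w(u)
   \ge \tilde{\mu}_\ell \left( w(S) - \frac{w(V)}{k_\ell} \right)
   && \text{by (6)} \\
\Longrightarrow\quad
\mathrm{radius}(v,S)
  &\ge \tilde{\mu}_\ell \left( 1 - \frac{w(V)}{k_\ell\, w(S)} \right)
   > \tilde{\mu}_\ell \left( 1 - \frac{1}{1+\varepsilon} \right)
   = \frac{\varepsilon}{1+\varepsilon}\, \tilde{\mu}_\ell
   && \text{since } w(S) > \frac{(1+\varepsilon)\,w(V)}{k_\ell} .
\end{align*}
```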
Even though the spreading metrics LP contains an exponential number of constraints, it can
be solved in polynomial time using the ellipsoid algorithm and a polynomial-time separation
oracle. By rewriting the Constraints (6), we obtain for all levels $\ell$, $S \subseteq V$ and $v \in S$ the
constraint $\sum_{u \in S} (\mathrm{dist}_V(u,v) - \tilde{\mu}_\ell)\, w(u) \ge -\tilde{\mu}_\ell\, w(V)/k_\ell$.

For any vertex $v$, the left-hand side of the constraint is minimized when the subset
$S_v = \{u \in V \mid \mathrm{dist}_V(v,u) \le \tilde{\mu}_\ell\}$ is chosen. Consequently, $h$ single-source shortest path
computations from every vertex provide a polynomial-time separation oracle for the spreading metrics LP.
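A separation oracle along these lines might look as follows. This is a sketch under stated assumptions, not the authors' code: the graph is given as a symmetric adjacency dictionary whose values are the current LP lengths $d(e)$, levels are 0-indexed, and the function names are hypothetical. The check uses Constraint (6) with the $\tilde{\mu}_\ell\, w(S)$ term moved to the left-hand side, so that the right-hand side $-\tilde{\mu}_\ell\, w(V)/k_\ell$ no longer depends on $S$.

```python
import heapq
from typing import Dict, Hashable, List, Optional, Set, Tuple

Vertex = Hashable
Graph = Dict[Vertex, Dict[Vertex, float]]  # u -> {v: d(u, v)}, symmetric


def shortest_paths(graph: Graph, source: Vertex) -> Dict[Vertex, float]:
    """Dijkstra's algorithm with the LP values d(e) as edge lengths."""
    dist = {source: 0.0}
    heap = [(0.0, 0, source)]
    counter = 1  # tie-breaker so that vertices are never compared
    while heap:
        d_u, _, u = heapq.heappop(heap)
        if d_u > dist[u]:
            continue  # stale heap entry
        for v, length in graph[u].items():
            if d_u + length < dist.get(v, float("inf")):
                dist[v] = d_u + length
                heapq.heappush(heap, (dist[v], counter, v))
                counter += 1
    return dist


def find_violated_constraint(
    graph: Graph,
    weight: Dict[Vertex, float],
    mu: List[float],
    k: List[int],
    tol: float = 1e-9,
) -> Optional[Tuple[int, Vertex, Set[Vertex]]]:
    """Return (level, v, S) for a violated constraint (6), or None."""
    h = len(mu)
    total = sum(weight.values())
    # mu_tilde[l] = mu[l] + ... + mu[h - 1]
    mu_tilde = [0.0] * h
    running = 0.0
    for l in range(h - 1, -1, -1):
        running += mu[l]
        mu_tilde[l] = running
    for v in graph:
        dist = shortest_paths(graph, v)
        for l in range(h):
            # Only vertices within distance mu_tilde[l] contribute
            # non-positive terms, so this set minimizes the left-hand side.
            S = {u for u, d_u in dist.items() if d_u <= mu_tilde[l]}
            lhs = sum((dist[u] - mu_tilde[l]) * weight[u] for u in S)
            if lhs < -mu_tilde[l] * total / k[l] - tol:
                return l, v, S
    return None
```

The distances from $v$ do not depend on the level, so the sketch reuses one shortest-path computation per vertex for all $h$ levels.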

The notation introduced next serves the definition of procedure cut-proc in Algorithm 2.
A ball $B(v,r)$ in subgraph $G' = (V', E')$ is the set of vertices in $V'$ whose distance to $v$ is strictly less than $r$, thus $B(v,r) := \{u \in V' \mid \mathrm{dist}_{V'}(u,v) < r\}$. The set $E(v,r)$ denotes the set of edges leaving $B(v,r)$. The volume of $B(v,r)$, denoted by $\mathrm{vol}(v,r)$, is defined by

$$
\mathrm{vol}(v,r) := \eta \cdot \tau + \sum_{\substack{(x,y) \in E(v,r),\\ x \in B(v,r)}} c(x,y)\bigl(r - \mathrm{dist}_{B(v,r)}(v,x)\bigr), \qquad (7)
$$

where $\eta = \frac{1}{hn}$. Note that the volume only considers the edges in the cut, not the edges within
the ball, which is different from [5]. The following lemma is a corollary of Even et al. [5].

▶ **Lemma 14.** *Given a subgraph $G' = (V', E')$ that satisfies $w(V') > (1+\varepsilon)w(V)/k_\ell$, and given a spreading metric $\{d(e)\}_e$, procedure cut-proc finds a subset $T \subseteq V'$ that satisfies*

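**Algorithm 2** cut-proc$(V', E', \{c(e)\}_{e \in E'}, \{d(e)\}_{e \in E'}, \tilde{\mu}_\ell)$

Choose an arbitrary $v \in V'$

$\tilde{r} \leftarrow \frac{\varepsilon}{1+\varepsilon}\,\tilde{\mu}_\ell$

$T \leftarrow \{v\}$

$v' \leftarrow$ closest vertex to $T$ in $V' \setminus T$

**while** $c(T) > \frac{1}{\tilde{r}} \cdot \ln\left( \frac{\mathrm{vol}(v, \tilde{r})}{\mathrm{vol}(v, 0)} \right) \cdot \mathrm{vol}(v, \mathrm{dist}_{V'}(v, v'))$ **do**

$T \leftarrow T \cup \{v'\}$

$v' \leftarrow$ closest vertex to $T$ in $V' \setminus T$

**end while**

Return $T$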

**Proof.** The proof has been deferred to the full version. ◀
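To make the region-growing loop of cut-proc concrete, the following sketch implements it on a small adjacency-dictionary graph. Several choices are assumptions made for illustration and are not fixed by the text: the names and signatures are hypothetical, the product $\eta\tau$ is passed in as the positive number `eta_tau`, the start vertex is supplied by the caller, $G'$ is assumed connected, and "closest vertex to $T$" is read as the next vertex in order of distance from the start vertex, so that $T$ is a ball around it (up to ties). Distances inside a ball coincide with distances in $G'$ because edge lengths are non-negative.

```python
import heapq
import math
from typing import Dict, Hashable, Set, Tuple

Vertex = Hashable
# Adjacency of G': u -> {v: (c(u, v), d(u, v))}, symmetric.
Graph = Dict[Vertex, Dict[Vertex, Tuple[float, float]]]


def distances_from(graph: Graph, source: Vertex) -> Dict[Vertex, float]:
    """Dijkstra's algorithm using the spreading metric d(e) as lengths."""
    dist = {source: 0.0}
    heap = [(0.0, 0, source)]
    counter = 1  # tie-breaker so that vertices are never compared
    while heap:
        d_u, _, u = heapq.heappop(heap)
        if d_u > dist[u]:
            continue  # stale heap entry
        for v, (_, length) in graph[u].items():
            if d_u + length < dist.get(v, math.inf):
                dist[v] = d_u + length
                heapq.heappush(heap, (dist[v], counter, v))
                counter += 1
    return dist


def volume(graph: Graph, dist: Dict[Vertex, float], r: float, eta_tau: float) -> float:
    """vol(v, r) as in Equation (7); only edges leaving the ball count."""
    ball = {u for u, d_u in dist.items() if d_u < r}
    vol = eta_tau
    for x in ball:
        for y, (cost, _) in graph[x].items():
            if y not in ball:
                vol += cost * (r - dist[x])
    return vol


def boundary_cost(graph: Graph, T: Set[Vertex]) -> float:
    """c(T): total cost of edges of G' with exactly one endpoint in T."""
    return sum(cost for x in T for y, (cost, _) in graph[x].items() if y not in T)


def cut_proc(graph: Graph, mu_tilde: float, eps: float,
             eta_tau: float, start: Vertex) -> Set[Vertex]:
    dist = distances_from(graph, start)
    order = sorted(dist, key=dist.get)  # stable sort keeps start first
    r_tilde = eps / (1 + eps) * mu_tilde
    log_ratio = math.log(volume(graph, dist, r_tilde, eta_tau)
                         / volume(graph, dist, 0.0, eta_tau))
    T = {start}
    for nxt in order[1:]:
        bound = log_ratio / r_tilde * volume(graph, dist, dist[nxt], eta_tau)
        if boundary_cost(graph, T) <= bound:
            break  # the while-condition of Algorithm 2 no longer holds
        T.add(nxt)
    return T
```

The loop also stops when no vertices remain outside $T$, a guard that the pseudocode leaves implicit.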

The approximation guarantee on the cost induced by partitions $\mathcal{P}'_1$ to $\mathcal{P}'_h$ is given by the next theorem.

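▶ **Theorem 15.** *The sequence $\mathcal{H}' = (\mathcal{P}'_1, \dots, \mathcal{P}'_h)$ is a $\vec{k}/(1+\varepsilon)$-hierarchical separator with $|\mathcal{P}'_1| \le k_1$ and $\mathrm{cost}_{\vec{\mu}}(\mathcal{H}') \le \alpha \cdot \mathrm{OPT}_{\vec{\mu}}(\vec{k}, G)$, where $\alpha = \mathcal{O}(\log n)$.*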

**Proof.** The fact that $\mathcal{H}'$ is a $\vec{k}/(1+\varepsilon)$-hierarchical separator follows immediately from the construction and Algorithm 2. In particular, Theorem 11 ensures that $|\mathcal{P}'_1| \le k_1$. The cost of $\mathcal{P}'_2$ to $\mathcal{P}'_h$ w.r.t. $\vec{\mu}$ equals $\sum_{\ell=2}^{h} \mu_\ell \mathcal{P}'_\ell$. By Lemma 14, the algorithm constructs sets $T_\ell$ for $\mathcal{P}'_\ell$ whose cost $c(T_\ell)$ is at most

$$
\frac{1+\varepsilon}{\varepsilon\tilde{\mu}_\ell} \cdot \ln\left( \frac{\mathrm{vol}(v_\ell, \varepsilon\tilde{\mu}_\ell/(1+\varepsilon))}{\mathrm{vol}(v_\ell, 0)} \right) \cdot \mathrm{vol}(v_\ell, \mathrm{dist}_{V'}(v_\ell, v'_\ell)), \qquad (8)
$$

where $v_\ell$ and $v'_\ell$ are some vertices in $T_\ell$. Recall from the description of the algorithm that the left-hand side of this inequality ignores all edges that have been cut on a higher level or that have been cut previously on the same level. By using the shorthand $\tilde{\mu}_\ell$, we account for cuts on subsequent levels, thus obtaining $\sum_{\ell=2}^{h} \mu_\ell \mathcal{P}'_\ell \le \sum_{\ell=2}^{h} \tilde{\mu}_\ell \sum_{T_\ell} c(T_\ell)$.

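Observe that $\mathrm{vol}(v, 0) \ge \eta\tau$ and $\mathrm{vol}(v, \varepsilon\tilde{\mu}_\ell/(1+\varepsilon)) \le (1+\eta)\tau$ for all $v \in V$. We apply this to Equation (8) and obtain

$$
c(T_\ell) \le \frac{1+\varepsilon}{\varepsilon\tilde{\mu}_\ell} \cdot \ln\left( \frac{1+\eta}{\eta} \right) \cdot \mathrm{vol}(v_\ell, \mathrm{dist}_{V'}(v_\ell, v'_\ell)). \qquad (9)
$$

It follows that

$$
\begin{aligned}
\sum_{\ell=2}^{h} \sum_{T_\ell} \tilde{\mu}_\ell \cdot c(T_\ell)
&\le \frac{1+\varepsilon}{\varepsilon} \ln\left( \frac{1+\eta}{\eta} \right) \sum_{\ell=2}^{h} \sum_{T_\ell} \mathrm{vol}(v_\ell, \mathrm{dist}_{V'}(v_\ell, v'_\ell)) \\
&= \frac{1+\varepsilon}{\varepsilon} \ln\left( \frac{1+\eta}{\eta} \right) \sum_{\ell=2}^{h} \sum_{T_\ell} \Bigl( \eta\tau + \sum_{e \in E(v_\ell, \mathrm{dist}_{V'}(v_\ell, v'_\ell))} c(e)\,d(e) \Bigr) \\
&\le \frac{1+\varepsilon}{\varepsilon} \ln\left( \frac{1+\eta}{\eta} \right) \Bigl( hn\eta\tau + \sum_{\ell=2}^{h} \sum_{T_\ell} \sum_{e} c(e)\,d(e) \Bigr).
\end{aligned}
$$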

We plug in Equation (9) to obtain the first step and use the definition of the volume in Equation (7) for the second step.

We now claim that every edge $e \in E$ appears at most once in the triple sum above. If $e$ is
cut by cut-proc for $\mathcal{P}'_\ell$, it is not cut again
on the same level $\ell$. Moreover, edge $e$ is not considered on any of the remaining levels $\ell' > \ell$, because
procedure cut-proc is initiated on these levels with subgraphs of the parts of $\mathcal{P}'_\ell$. It follows
that the triple sum is upper-bounded by $\tau$.
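Using $\eta = \frac{1}{hn}$ and the fact that $\varepsilon$ is constant, we conclude that

$$
\sum_{\ell=2}^{h} \sum_{T_\ell} \tilde{\mu}_\ell \cdot c(T_\ell) \le \frac{1+\varepsilon}{\varepsilon} \ln\left( \frac{1+\eta}{\eta} \right) (hn\eta + 1)\,\tau = \mathcal{O}(\log n)\,\tau .
$$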
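For concreteness, substituting $\eta = \frac{1}{hn}$ evaluates the two $\eta$-dependent factors of this bound explicitly. The computation below is a worked restatement; reading the result as $\mathcal{O}(\log n)\,\tau$ for constant $\varepsilon$ additionally assumes that $\ln h = \mathcal{O}(\log n)$.

```latex
\begin{align*}
\frac{1+\eta}{\eta} &= \frac{1}{\eta} + 1 = hn + 1,
\qquad hn\eta + 1 = 2, \\
\frac{1+\varepsilon}{\varepsilon}\,
\ln\!\left( \frac{1+\eta}{\eta} \right) (hn\eta + 1)\,\tau
  &= \frac{2(1+\varepsilon)}{\varepsilon}\, \ln(hn + 1)\, \tau .
\end{align*}
```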

Recall that by Theorem 11, the weighted cost of the first partition $\tilde{\mu}_1 \cdot \mathrm{cost}_{\mathrm{sum}}(\mathcal{P}'_1)$ is at most
$\mathcal{O}(\log n)$ times the cost of the optimal hierarchical partition $\mathrm{OPT}_{\vec{\mu}}(\vec{k}, G)$. As furthermore
$\tau \le \mathrm{OPT}_{\vec{\mu}}(\vec{k}, G)$, the theorem follows. ◀

In other words, the above theorem states that $\mathcal{H}'$ is an $\mathcal{O}(\log n)$-approximate $\vec{k}/(1+\varepsilon)$-hierarchical separator. Combining Lemma 10 and Theorem 15 gives the following.

▶ **Theorem 16.** *There exists an $(\mathcal{O}(\log n), (1+\varepsilon)h)$-approximation algorithm for Hierarchical Graph Partitioning whose running time is polynomial in $h$ and $n$.*

**References**

**1** Konstantin Andreev and Harald Räcke. Balanced graph partitioning. *Theory Comput. Syst.*, 39(6):929–939, 2006. doi:10.1007/s00224-006-1350-7.

**2** Nikhil Bansal, Uriel Feige, Robert Krauthgamer, Konstantin Makarychev, Viswanath Nagarajan, Joseph (Seffi) Naor, and Roy Schwartz. Min-max graph partitioning and small set expansion. In *Proc. of the 52nd FOCS*, pages 17–26, 2011. doi:10.1109/FOCS.2011.79.

**3** Marcin Bienkowski, Miroslaw Korzeniowski, and Harald Räcke. A practical algorithm for constructing oblivious routing schemes. In *Proc. of the 15th SPAA*, pages 24–33, 2003. doi:10.1145/777412.777418.

**4** Chandra Chekuri and Sanjeev Khanna. On multidimensional packing problems. *SIAM J. Comput.*, 33(4):837–851, 2004. doi:10.1137/S0097539799356265.

**5** Guy Even, Joseph (Seffi) Naor, Satish Rao, and Baruch Schieber. Fast approximate graph partitioning algorithms. *SIAM J. Comput.*, 28(6):2187–2214, 1999. Also in *Proc. 8th SODA*, 1997, pp. 639–648. doi:10.1137/S0097539796308217.

**6** Andreas E. Feldmann and Luca Foschini. Balanced partitions of trees and applications. In *Proc. of the 29th STACS*, pages 100–111, 2012. doi:10.4230/LIPIcs.STACS.2012.100.

**7** Mohammad Taghi Hajiaghayi, Theodore Johnson, Mohammad Reza Khani, and Barna Saha. Hierarchical graph partitioning. In *Proc. of the 26th SPAA*, pages 51–60, 2014. doi:10.1145/2612669.2612699.

**8** Chris Harrelson, Kirsten Hildrum, and Satish Rao. A polynomial-time tree decomposition to minimize congestion. In *Proc. of the 15th SPAA*, pages 34–43, 2003. doi:10.1145/777412.777419.

**9** Robert Krauthgamer, Joseph (Seffi) Naor, and Roy Schwartz. Partitioning graphs into balanced components. In *Proc. of the 20th SODA*, pages 942–949, 2009. arXiv:soda:1496770.1496872.

**10** Robert Krauthgamer, Joseph (Seffi) Naor, Roy Schwartz, and Kunal Talwar. Non-uniform graph partitioning. In *Proc. of the 25th SODA*, pages 1229–1243, 2014. arXiv:soda:2634165.

**11** Konstantin Makarychev and Yuri Makarychev. Nonuniform graph partitioning with unrelated weights. In *Proc. of the 41st ICALP*, pages 812–822, 2014. doi:10.1007/

**12** Harald Räcke. Optimal hierarchical decompositions for congestion minimization in networks. In *Proc. of the 40th STOC*, pages 255–264, 2008.

**13** Harald Räcke and Chintan Shah. Improved guarantees for tree cut sparsifiers. In *Proc. of the 22nd ESA*, pages 774–785, 2014. doi:10.1007/978-3-662-44777-2_64.