A Pass-efficient Recursive Contraction Algorithm


The BFS growth algorithm gives an optimum trade-off between space $\tilde{O}(n^{1+1/k})$ and approximation $2k-1$, but uses $O(k)$ passes, which is less desirable. For example, to achieve a semi-streaming space bound, the number of passes is $O(\log n)$. While this is interesting, a natural question arises: can we produce a spanner in fewer passes?

In what follows, we answer the question in the affirmative and provide an algorithm that uses $O(\log k)$ passes at the expense of a worse approximation factor.

The main intuition for the improvement in the number of passes is the following: in the BFS algorithm we grow regions of small diameter (at various granularities), and in each pass we grow each region by one edge. Thus the growth of the regions is slow. Moreover, in each of these steps we use $O(n)$ space (if the graph is dense), yet the space allowed is $\tilde{O}(n^{1+1/k})$, and we expect the extra space to matter precisely when the graphs are dense. But if we are growing BFS trees, the extra edges are simply not useful. We therefore relax the BFS constraint, which allows us to grow the regions faster. The algorithm RecurseConnect is presented below.

Algorithm RecurseConnect:

1. The algorithm proceeds in phases, which correspond to the passes. In phase $i$, we have a graph $\tilde{G}_i$ which is a contraction of the graph $G = \tilde{G}_0$; that is, subsets of vertices of $G$ have been merged into a collection of supervertices. This process proceeds recursively, and we maintain $|\tilde{G}_i| \le n^{1-(2^i-1)/k}$. Therefore after $\log k$ passes we have a graph of size at most $\sqrt{n}$, and we can remember the connectivity between every pair of vertices in $O(n)$ space. We describe how to obtain $\tilde{G}_{i+1}$ from $\tilde{G}_i$ below.

2. For each vertex in $\tilde{G}_i$ we sample $n^{2^i/k}$ distinct neighbors. Vertices $p, q$ in $\tilde{G}_i$ (note that $p, q$ are subsets of the original vertex set) are neighbors in $\tilde{G}_i$ if there exists an edge $(u, v) \in G$ such that $u \in p$ and $v \in q$. To do this, for each vertex in $\tilde{G}_i$, we independently partition the vertex set of $\tilde{G}_i$ into $\tilde{O}(n^{2^i/k})$ subsets, and use an $\ell_0$ sampler for each subset of the partition. This can be achieved in $\tilde{O}(n^{1/k})$ space per vertex and in total $\tilde{O}(n^{1+1/k})$ space, using the hypothesis $|\tilde{G}_i| \le n^{1-(2^i-1)/k}$. Note that using the standard coupon-collector argument we can also find all the vertices in $\tilde{G}_i$ whose degree is at most $n^{2^i/k}$.

3. The set of sampled edges in $\tilde{G}_i$ gives us a graph $H_i$. We now choose a clustering of $H_i$ where the centers of the clusters are denoted by $C_i$. Consider the subset $S_i$ of vertices of $H_i$ which have degree at least $n^{2^i/k}$; we will ensure that $C_i$ is a maximal (not maximum) subset of $S_i$ which is independent in $H_i^2$. This is a standard construction used for the approximate $k$-center problem. More specifically: we start from the set $C_i^0$ being an arbitrary vertex in $H_i$. We repeatedly augment $C_i^j$ to $C_i^{j+1}$ by adding vertices which (i) are at distance at least 3 (as measured in number of hops in $H_i$) from each vertex in $C_i^j$, and (ii) have degree at least $n^{2^i/k}$. Denote the final $C_i^j$, when we cannot add any more vertices, as $C_i$. Observe that $|C_i| \le |\tilde{G}_i| / n^{2^i/k} \le n^{1-(2^{i+1}-1)/k}$.

4. For each vertex $p \in C_i$, all neighbors of $p$ in $H_i$ are assigned to $p$. For each vertex $q$ with degree at least $n^{2^i/k}$ in $\tilde{G}_i$, if it is not chosen in $C_i$, we have a center $p$ in

5. We now collapse all the vertices assigned to $p \in C_i$ into a single vertex, and these $|C_i|$ vertices define $\tilde{G}_{i+1}$.
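The center selection in step 3 and the neighbor assignment in step 4 can be illustrated offline. The following is a minimal sketch, not the streaming implementation: it assumes $H_i$ is already available as a symmetric adjacency dictionary, scans only vertices of degree at least the threshold, and skips the treatment of high-degree vertices that are not centers. The function names `hop_distances`, `choose_centers` and `assign_neighbors` are hypothetical.

```python
from collections import deque


def hop_distances(adj, source, limit):
    """BFS from `source`; returns hop counts of all vertices within `limit` hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == limit:
            continue  # do not expand beyond the hop limit
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def choose_centers(adj, threshold):
    """Greedy maximal set of vertices with degree >= threshold that are
    pairwise at least 3 hops apart (assumption: scan in sorted order)."""
    centers = []
    blocked = set()  # vertices within 2 hops of an already chosen center
    for v in sorted(adj):
        if len(adj[v]) >= threshold and v not in blocked:
            centers.append(v)
            blocked.update(hop_distances(adj, v, 2))
    return centers


def assign_neighbors(adj, centers):
    """Assign every center and each of its neighbors to that center."""
    owner = {}
    for p in centers:
        owner[p] = p
        for q in adj[p]:
            owner[q] = p
    return owner


# Two stars (centers 0 and 4) joined through the path 3 - 8 - 5.
adj = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0, 8}, 8: {3, 5},
       5: {8, 4}, 4: {5, 6, 7}, 6: {4}, 7: {4}}
centers = choose_centers(adj, threshold=3)
owner = assign_neighbors(adj, centers)
```

Because chosen centers are at least 3 hops apart, no vertex is adjacent to two centers, so the neighbor assignment is unambiguous.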

It is immediate that after at most $\log k$ passes $\tilde{G}_i$ shrinks below $\sqrt{n}$. We now prove the approximation achieved by the above algorithm. Observe that the distances are now primarily in the collapsed vertices.
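To make the shrinkage concrete, the size invariant $|\tilde{G}_i| \le n^{1-(2^i-1)/k}$ can be tabulated per phase. The sketch below is only arithmetic on the exponent, assuming $k$ is a power of two and logarithms are base 2; the helper names are hypothetical.

```python
import math


def size_exponents(k):
    """Exponents e_i such that |G_i| <= n**e_i, for i = 0, ..., log2(k)."""
    phases = int(math.log2(k))
    return [1 - (2**i - 1) / k for i in range(phases + 1)]


def passes_until_sqrt(k):
    """Smallest i for which the bound n**(1 - (2**i - 1)/k) is at most sqrt(n)."""
    i = 0
    while 1 - (2**i - 1) / k > 0.5:
        i += 1
    return i


exponents = size_exponents(8)   # bound on |G_i| as a power of n, for k = 8
passes = passes_until_sqrt(8)   # phases needed before the bound drops to sqrt(n)
```

For $k = 8$ the exponents are $1, 7/8, 5/8, 1/8$, so the bound falls to at most $\sqrt{n}$ in the third phase, which matches $\log k = 3$.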

Lemma 4.2.1. The distance between any adjacent $u, v \in G$ is at most $5^{\log k} - 1 = k^{\log_2 5} - 1$.

Proof. Define the maximum distance between any $u, v$ which are in the same collapsed set in $\tilde{G}_i$ as $a_i$. Note that $a_1 = 4$ since the clustering $C_1$ has radius 2, and therefore any collapsed pair is at a distance at most 4. For $i \ge 1$ observe that $a_{i+1} = 5a_i + 4$, that is, $a_{i+1} + 1 = 5(a_i + 1)$, so $a_i = 5^i - 1$ and the result follows.
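The recurrence in the proof is easy to check numerically. The following sketch iterates $a_1 = 4$, $a_{i+1} = 5a_i + 4$ and compares the result against $5^i - 1$ and against $k^{\log_2 5} - 1$ for $k = 2^i$; the function name is hypothetical and logarithms are assumed to be base 2.

```python
import math


def max_collapsed_distance(phases):
    """Iterate a_1 = 4, a_{i+1} = 5 * a_i + 4 and return a_phases."""
    a = 4
    for _ in range(phases - 1):
        a = 5 * a + 4
    return a


# Closed form 5**i - 1, checked for the first few phases.
closed_form_ok = all(max_collapsed_distance(i) == 5**i - 1 for i in range(1, 11))

# With k = 8 (three phases), 5**3 - 1 = 124 agrees with k**log2(5) - 1.
k = 8
stretch = max_collapsed_distance(int(math.log2(k)))
```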

Theorem 4.2.2. RecurseConnect gives us a $(k^{\log_2 5} - 1)$-approximation using $\log k$ passes and $\tilde{O}(n^{1+1/k})$ space.

Chapter 5 Graph Sparsification

Chapter Outline: In this chapter, we present the semi-streaming graph sparsification algorithm. We first discuss the minimum cut algorithm to demonstrate the high-level idea and then apply the same idea to graph sparsification. Both algorithms require access to random bits for edges, which are not available in the insertion-only model.

5.1 A Warmup: The Minimum Cut Problem

To warm up, we start with a one-pass semi-streaming algorithm (Algorithm 12) for the minimum cut problem. This will introduce some of the ideas used in the subsequent sections on sparsification. The algorithm is based on Karger’s Uniform Sampling Lemma (Lemma 2.2.1) [67].

Algorithm 12 The MinCut Algorithm. Steps 1-5 are performed together in a single pass. Step 6 is performed in post-processing.

1: Let $h_i : E \to \{0, 1\}$ be a uniform hash function for $i \in \{1, \ldots, \lfloor 2 \log n \rfloor\}$.
2: for $i = 0$ to $\lfloor 2 \log n \rfloor$ in parallel do
3:     Let $G_i$ be the subgraph of $G$ containing edges $e$ such that $\prod_{j \le i} h_j(e) = 1$ ($G_0 = G$).
4:     Let $H_i \leftarrow$ k-Connectivity($G_i$) (Algorithm 7 for the insertion-only model or Algorithm 8 for the dynamic model) for $k = O(\epsilon^{-2} \log n)$
5: end for
6: return $2^j \lambda(H_j)$ where $j = \min\{i : \lambda(H_i) < k\}$

The algorithm generates a sequence of graphs $G = G_0 \supseteq G_1 \supseteq G_2 \supseteq \ldots$ where $G_i$ is formed by independently removing each edge in $G_{i-1}$ with probability $1/2$. Simultaneously we use k-Connectivity to construct a sequence of graphs $H_0, H_1, H_2, \ldots$ where $H_i$ contains all edges in $G_i$ that participate in a cut of size $k$ or less. The idea is that if $i$ is not too large, $\lambda(G)$ can be approximated via $\lambda(G_i)$, and if $\lambda(G_i) \le k$ then $\lambda(G_i)$ can be calculated from $H_i$.
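The subsample-and-rescale idea can be sketched offline as follows. This is an illustration under explicit assumptions, not the streaming algorithm: the edges are held in memory, the hash functions are replaced by a seeded pseudo-random generator, and the k-Connectivity sketch is replaced by an exact Stoer-Wagner minimum cut computed directly on $G_i$. The names `min_cut`, `subsample_levels` and `approx_min_cut` are hypothetical.

```python
import math
import random


def min_cut(n, edges):
    """Stoer-Wagner global minimum cut of an undirected multigraph on 0..n-1."""
    w = [[0] * n for _ in range(n)]
    for u, v in edges:
        w[u][v] += 1
        w[v][u] += 1
    active = list(range(n))
    best = float("inf")
    while len(active) > 1:
        # Maximum-adjacency ordering of the remaining (merged) vertices.
        weights = {v: 0 for v in active}
        order = []
        while weights:
            u = max(weights, key=weights.get)
            cut_of_phase = weights.pop(u)
            order.append(u)
            for v in weights:
                weights[v] += w[u][v]
        s, t = order[-2], order[-1]
        best = min(best, cut_of_phase)
        # Merge the last vertex t into s.
        for v in active:
            w[s][v] += w[t][v]
            w[v][s] += w[v][t]
        active.remove(t)
    return best


def subsample_levels(edges, levels, seed=0):
    """G_0 = G; G_i keeps each edge of G_{i-1} independently with probability 1/2."""
    rng = random.Random(seed)
    graphs = [list(edges)]
    for _ in range(levels):
        graphs.append([e for e in graphs[-1] if rng.random() < 0.5])
    return graphs


def approx_min_cut(n, edges, k, seed=0):
    """Return 2**j * lambda(G_j) for the first level j with lambda(G_j) < k."""
    levels = math.floor(2 * math.log2(n))
    for i, g in enumerate(subsample_levels(edges, levels, seed)):
        lam = min_cut(n, g)
        if lam < k:
            return (2 ** i) * lam
    return None  # no level fell below k (unlikely with this many levels)


cycle = [(i, (i + 1) % 6) for i in range(6)]
estimate = approx_min_cut(6, cycle, k=5)
```

On the 6-cycle with $k = 5$ the minimum cut of $G_0$ is already below $k$, so the sketch returns the exact value from level 0 without any rescaling.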

Theorem 5.1.1. Assuming access to fully independent random hash functions, there exists a single-pass, $O(\epsilon^{-2} n \log^4 n)$-space algorithm that $(1+\epsilon)$-approximates the minimum cut in the dynamic graph stream model. The requirement of fully independent random hash functions will be eliminated in Theorem 5.4.2.

such edges. On the other hand, if a cut value is larger than $k$, the witness contains at least $k$ edges that cross the cut. Therefore, if $G_i$ is not $k$-edge-connected, we can correctly find a minimum cut in $G_i$ using the corresponding witness.

Let $\lambda(G)$ be the minimum cut size of $G$ and let

$$i^* = \left\lfloor \log \max\left\{1, \frac{\lambda \epsilon^2}{6 \log n}\right\} \right\rfloor.$$

For $i \le i^*$, the edge weights in $G_i$ are all $2^i$ and therefore $G_i$ approximates all the cut values in $G$ w.h.p. by Lemma 2.2.1. Therefore, if MinCut returns a minimum cut from $G_i$ with $i \le i^*$, the returned cut is a $(1+\epsilon)$-approximation.

By the Chernoff bound, the number of edges in $G_{i^*}$ that cross the minimum cut of $G$ is $O(\epsilon^{-2} \log n) \le k$ with high probability. Hence, MinCut terminates at some $i \le i^*$ and returns a $(1+\epsilon)$-approximate minimum cut with high probability.
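As a small numeric illustration of the threshold level, the sketch below evaluates $i^*$ and the expected number of minimum-cut edges surviving in $G_{i^*}$, namely $\lambda / 2^{i^*}$. It assumes both logarithms are base 2 and that $i^*$ is rounded down to an integer; the function names are hypothetical.

```python
import math


def i_star(lam, eps, n):
    """Level i* = floor(log2(max(1, lam * eps**2 / (6 * log2(n)))))."""
    return math.floor(math.log2(max(1.0, lam * eps**2 / (6 * math.log2(n)))))


def expected_crossing_edges(lam, eps, n):
    """Expected number of min-cut edges of G that survive in G_{i*}."""
    return lam / 2 ** i_star(lam, eps, n)


# Example: n = 1024, eps = 0.5, lambda = 4096.
level = i_star(4096, 0.5, 1024)
crossing = expected_crossing_edges(4096, 0.5, 1024)
lower = 6 * 0.5**-2 * math.log2(1024)   # 6 * eps^-2 * log n
```

In this example the expected count lies between $6\epsilon^{-2}\log n$ and twice that value, which is the $O(\epsilon^{-2} \log n)$ regime the Chernoff argument relies on.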

In document Analyzing Massive Graphs in the Semi-streaming Model (Page 67-73)