Minimum Spanning Tree (MST) - Analyzing Massive Graphs in the Semi-streaming Model

In this section, we present algorithms for finding exact and approximate MSTs. In the insertion-only model, we have a single-pass, semi-streaming algorithm for finding an exact MST. In the dynamic model, however, finding an exact MST takes multiple passes.

3.2.1 Exact Minimum Spanning Tree

We assume that no two edges have the same weight. We achieve this by breaking ties arbitrarily (for example, using lexicographic order when two edges have the same weight). Then there is a unique MST, and the following lemma and the algorithm for the exact MST (Algorithm 9) follow from Prim's algorithm.

Lemma 3.2.1. Let $T_1$ be the MST of $G_1$. Then, the MST of $T_1 \cup G_2$ is the MST of $G_1 \cup G_2$.

Algorithm 9 The MST Algorithm (for insertion-only streams)

1: Initialize a spanning tree T = ∅.

2: repeat

3: Read the next n edges from the stream (fewer if the stream ends). Let H be the set of edges read.

4: Compute the minimum spanning tree of T ∪ H.

5: Remember the new minimum spanning tree as T.

6: until the end of the stream

7: Output T as the minimum spanning tree.
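Algorithm 9 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's implementation: the names `kruskal` and `streaming_mst` and the edge encoding `(weight, u, v)` on vertices `0..n-1` are hypothetical, and Kruskal's algorithm with a union-find structure stands in for the unspecified offline MST computation in step 4.

```python
from typing import Iterable, List, Tuple

Edge = Tuple[float, int, int]  # (weight, u, v); an assumed encoding


def kruskal(n: int, edges: Iterable[Edge]) -> List[Edge]:
    """Return a minimum spanning forest of `edges` on vertices 0..n-1."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    forest: List[Edge] = []
    for w, u, v in sorted(edges):  # tuple order also breaks weight ties
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((w, u, v))
    return forest


def streaming_mst(n: int, stream: Iterable[Edge]) -> List[Edge]:
    """One pass over an insertion-only stream, holding fewer than 2n edges."""
    tree: List[Edge] = []
    batch: List[Edge] = []
    for edge in stream:
        batch.append(edge)
        if len(batch) == n:                  # step 3: a full batch H
            tree = kruskal(n, tree + batch)  # steps 4-5: MST of T ∪ H
            batch = []
    if batch:                                # last, possibly partial batch
        tree = kruskal(n, tree + batch)
    return tree
```

By Lemma 3.2.1, discarding the non-tree edges after each batch loses nothing, so the memory in use never exceeds the n - 1 tree edges plus the n buffered edges.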

Now we proceed to the algorithm for dynamic streams. Our starting point is Borůvka's algorithm for finding the MST. This algorithm proceeds in $O(\log n)$ phases. In each phase, the minimum weight edge incident on each vertex is added and the resulting connected components are collapsed to form new vertices. This algorithm can be implemented in the dynamic stream setting in $O(\log^2 n)$ passes by emulating each phase in $O(\log n)$ passes of the dynamic graph stream. We emulate a phase as follows: In the first pass, we $\ell_0$-sample an incident edge on each vertex without considering the weights. Suppose we sample an edge with weight $w_v$ on vertex $v$. In the next pass, we again $\ell_0$-sample incident edges but this time we ignore all edges of weight at least $w_v$ on vertex $v$ when we construct the sketch. Repeating this process $O(\log n)$ times finds the minimum weight edge incident on each vertex. Thus the algorithm takes $O(\log^2 n)$ passes as claimed. In the rest of this section, we transform the algorithm into a new algorithm that uses $O(n^{1+1/p})$ space and $O(p)$ passes, which translates to an $O(\log n/\log\log n)$-pass, $O(n\,\mathrm{polylog}\,n)$-space algorithm.
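The emulation of a single phase at one vertex can be illustrated with a small simulation. This is only a sketch under assumptions: uniform sampling with `random.Random.choice` stands in for an $\ell_0$-sampler (a real sampler works on a linear sketch of the stream and may fail with small probability), the function name `lightest_by_resampling` is hypothetical, and the incident edge weights are assumed to be distinct and held in a plain list.

```python
import random
from typing import List, Tuple


def lightest_by_resampling(weights: List[float],
                           rng: random.Random) -> Tuple[float, int]:
    """Return (lightest incident weight, number of passes used)."""
    current = rng.choice(weights)  # first pass: sample without looking at weights
    passes = 1
    while True:
        passes += 1                # one more pass over the stream
        # Only edges strictly lighter than the current sample enter the sketch.
        lighter = [w for w in weights if w < current]
        if not lighter:            # the sampler reports an empty set: we are done
            return current, passes
        current = rng.choice(lighter)
```

Each sample is uniform among the edges lighter than the previous one, so the number of surviving candidates shrinks by a constant factor in expectation per pass, which is where the $O(\log n)$ passes per phase come from.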

Reducing the Number of Passes.

Our first step is to show that $O(\log n \log\log n)$ passes suffice. The algorithm is based on the observation that the number of vertices under consideration in the $i$th phase is at most $n/2^i$. Hence, during the $i$th phase we can afford to sample $t_i = 2^i$ incident edges without violating the semi-streaming space restriction. Therefore, the $i$th phase can be emulated in $O(\log_{t_i} n)$ passes, which implies that the total number of passes is $\sum_{i=1}^{\log n} \log_{2^i} n = O(\log n \log\log n)$.
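The bound on the total number of passes can be checked in one line. The step below uses only the identity $\log_{2^i} n = (\log_2 n)/i$; the harmonic number $H_m = \sum_{i=1}^{m} 1/i$ is notation introduced here for illustration and does not appear in the original text.

```latex
\sum_{i=1}^{\log n} \log_{2^i} n
  \;=\; \sum_{i=1}^{\log n} \frac{\log n}{i}
  \;=\; \log n \cdot H_{\log n}
  \;=\; O(\log n \log\log n),
\qquad \text{since } H_m = O(\log m).
```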

The next step is to reduce the number of phases. The basic idea is to not just find the lightest incident edge for each vertex, but to find the $k$ lightest edges. It follows from the next lemma that this allows us to reduce the number of vertices by a factor of $k$.

Lemma 3.2.2. In a simple weighted graph $G = (V, E)$, if $E' \subset E$ contains the $k$ lightest incident edges on each vertex, then we can identify the lightest edge in the cut $(S, V \setminus S)$ for any subset $S \subset V$ of size at most $k$.

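Proof. If $|S| \le k$ then we know the lightest edge incident on each $v \in S$ in $E(S, V \setminus S)$ because $v$ has at most $k-1$ neighbors in $S$: since $G$ is simple, at most $k-1$ of the edges incident on $v$ stay inside $S$, so the lightest edge leaving $S$ from $v$ is among the $k$ lightest edges incident on $v$. Taking the minimum over all $v \in S$ gives the lightest edge in the cut.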
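Lemma 3.2.2 can be checked on a small example. The sketch below is illustrative only: the helper names `k_lightest_per_vertex` and `lightest_cut_edge` and the edge encoding `(weight, u, v)` are assumptions, and the graph is held in memory rather than sketched from a stream.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Edge = Tuple[float, int, int]  # (weight, u, v) in a simple graph; assumed encoding


def k_lightest_per_vertex(n: int, edges: List[Edge], k: int) -> Set[Edge]:
    """Build E': for every vertex keep its k lightest incident edges."""
    incident: Dict[int, List[Edge]] = {v: [] for v in range(n)}
    for e in edges:
        _, u, v = e
        incident[u].append(e)
        incident[v].append(e)
    kept: Set[Edge] = set()
    for v in range(n):
        kept.update(sorted(incident[v])[:k])
    return kept


def lightest_cut_edge(edges: Iterable[Edge], side: Set[int]) -> Optional[Edge]:
    """Lightest edge with exactly one endpoint in `side`, or None if none crosses."""
    crossing = [e for e in edges if (e[1] in side) != (e[2] in side)]
    return min(crossing, default=None)
```

For any set `side` of at most `k` vertices, calling `lightest_cut_edge` on the output of `k_lightest_per_vertex` returns the same edge as calling it on the full edge list, even though the former stores at most `k * n` edges.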

Unfortunately, after the first phase the graph under consideration is a multi-graph and the above lemma does not apply directly. For example, the lightest $k$ incident edges on $v$ may all connect $v$ to the same neighbor. However, we can rectify this situation by constructing $O(\log n)$ random partitions $P_1, \ldots, P_{O(\log n)}$ of the vertices where each partition is of size $2k$. For each vertex $v$ and each partition $P = \{V_1, \ldots, V_{2k}\}$ we find the lightest edge from $v$ to each $V_i$. Let $E'$ be the set of edges collected.

Lemma 3.2.3. With high probability, $E'$ contains the $k$ lightest edges to distinct neighbors.

Proof. Let $N_v$ be the $k$ closest neighbors of $v$ and consider $u \in N_v$. With high probability there exists a partition $P = \{V_1, \ldots, V_{2k}\}$ where $u$ is the only element in $V_i \cap N_v$ for some $i$. Hence, we identify the lightest edge between $u$ and $v$.
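The random-partition construction can be sketched as follows. This is an offline illustration under assumptions, not the streaming implementation: the function name `collect_edges`, the parameter `rounds` (standing in for the $O(\log n)$ partitions), and the edge encoding `(weight, u, v)` are hypothetical, the multigraph is assumed to have no self-loops, and each vertex is assigned to one of the $2k$ parts independently and uniformly at random.

```python
import random
from typing import Dict, List, Set, Tuple

Edge = Tuple[float, int, int]  # (weight, u, v); parallel edges are allowed


def collect_edges(n: int, edges: List[Edge], k: int, rounds: int,
                  rng: random.Random) -> Dict[int, Set[Edge]]:
    """For each vertex, the lightest edge into each part of each random partition."""
    collected: Dict[int, Set[Edge]] = {v: set() for v in range(n)}
    for _ in range(rounds):
        # part[b] is the index of the set V_i that contains vertex b.
        part = [rng.randrange(2 * k) for _ in range(n)]
        best: Dict[Tuple[int, int], Edge] = {}  # (vertex, part index) -> lightest edge
        for e in edges:
            _, u, v = e
            for a, b in ((u, v), (v, u)):       # the edge as seen from endpoint a
                key = (a, part[b])
                if key not in best or e < best[key]:
                    best[key] = e
        for (a, _), e in best.items():
            collected[a].add(e)
    return collected
```

Two properties hold regardless of the random choices: the lightest edge incident on a vertex is always collected, and every collected edge is the lightest among its parallel copies. Whether all $k$ closest neighbors are separated depends on the partitions, which is the high-probability part of Lemma 3.2.3.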

The next theorem is proved by carefully combining the above idea with the $O(\log n \log\log n)$-pass algorithm.

Theorem 3.2.4. There exists an $O(p)$-pass, $\tilde{O}(n^{1+1/p})$-space algorithm that finds the MST of a dynamic graph stream. In particular, there is a semi-streaming algorithm that uses $O(\log n/\log\log n)$ passes.

Proof. Suppose at some point we have $n^{1+1/p}/t$ remaining vertices. Then we can find the $\sqrt{t}$ closest neighbors of a vertex in $O(\log_{\sqrt{t}} n)$ passes as follows: construct $O(\log n)$ random partitions each of size $2\sqrt{t}$ and then use $O(\log_{\sqrt{t}} n)$ successive batches of $\sqrt{t}$ [...] in each set of each partition. Let $n_i$ be the number of vertices at the start of the $i$th phase and define $t_i = n^{1+1/p}/n_i$. Then $t_1 = n^{1/p}$, $t_2 = n^{3/(2p)}$, $t_3 = n^{9/(4p)}, \ldots$ and hence $\sum_i \log_{t_i}(n) = O(p)$.
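The final sum can be spelled out as a geometric series. The derivation below assumes that each phase reduces the number of vertices by exactly a factor $\sqrt{t_i}$, that is, $n_{i+1} = n_i/\sqrt{t_i}$; this is the assumption under which the listed values $t_1, t_2, t_3$ arise, and a larger reduction only makes the sum smaller.

```latex
t_{i+1} = \frac{n^{1+1/p}}{n_{i+1}} = \frac{n^{1+1/p}}{n_i}\,\sqrt{t_i} = t_i^{3/2}
\quad\Longrightarrow\quad
t_i = n^{(3/2)^{i-1}/p},
\qquad
\log_{t_i} n = p\left(\tfrac{2}{3}\right)^{i-1},
\qquad
\sum_{i \ge 1} \log_{t_i} n \;\le\; p \sum_{i \ge 1} \left(\tfrac{2}{3}\right)^{i-1} = 3p = O(p).
```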

3.2.2 Approximate Minimum Spanning Trees

Unless we have $\Omega(n^2)$ space, it is not possible to compute the exact minimum spanning tree in dynamic streams. However, we can $(1+\epsilon)$-approximate the MST in one pass with near-linear space. We reduce the problem of estimating the weight of the MST to the problem of counting the number of connected components in graphs. The reduction uses an idea due to Chazelle et al. [27]. Consider a graph $G$ whose edge weights are in the range $[1, W]$ where $W = \mathrm{poly}(n)$. We will assume that $G$ is connected, but our algorithm can be used to estimate the weight of the minimum weight spanning forest if $G$ is not connected. Let $G_i$ be the subgraph of $G$ consisting of all edges whose weight is at most $w_i = (1+\epsilon)^i$ and let $\mathrm{cc}(H)$ denote the number of connected components of a graph $H$.

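Lemma 3.2.5. Let $T$ be a minimum spanning tree of $G$ and set $r = \log_{1+\epsilon} W$. Then

$$w(T) \le n - (1+\epsilon)^r + \sum_{i=0}^{r} \lambda_i\,\mathrm{cc}(G_i) \le (1+\epsilon)\,w(T)$$

where $\lambda_i = (1+\epsilon)^{i+1} - (1+\epsilon)^i$.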

Proof. Consider the graph $G'$ formed by rounding each edge weight up to the nearest power of $(1+\epsilon)$. Then it is clear that $w(T) \le w(T') \le (1+\epsilon)\,w(T)$ where $T'$ is a minimum spanning tree of $G'$. It remains to compute $w(T')$ and we do this by considering the operation of Kruskal's algorithm on $G'$. Kruskal's algorithm will first add $n - \mathrm{cc}(G_0)$ edges of weight 1, then $\mathrm{cc}(G_0) - \mathrm{cc}(G_1)$ edges of weight $(1+\epsilon)$, etc. The total weight of edges added will be

$$w(T') = (n - \mathrm{cc}(G_0)) + \sum_{i=0}^{r-1} (1+\epsilon)^{i+1}\,\bigl(\mathrm{cc}(G_i) - \mathrm{cc}(G_{i+1})\bigr)$$

which simplifies to give the claimed quantity.

Hence, we can estimate the weight of the minimum spanning tree using the connectivity algorithm.
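The reduction to component counting can be illustrated offline. The sketch below evaluates the Kruskal-based expression for $w(T')$ from the proof using nothing but the counts $\mathrm{cc}(G_i)$. It is a sketch under assumptions: the names `count_components` and `rounded_mst_weight` and the edge encoding `(weight, u, v)` are hypothetical, the graph is assumed connected with weights at least 1, and a union-find over the stored edges stands in for the streaming connectivity algorithm, which is not reproduced here.

```python
from typing import List, Tuple

Edge = Tuple[float, int, int]  # (weight, u, v) with weight >= 1; assumed encoding


def count_components(n: int, edges: List[Edge]) -> int:
    """cc(H): number of connected components on vertices 0..n-1."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n
    for _, u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return components


def rounded_mst_weight(n: int, edges: List[Edge], eps: float) -> float:
    """w(T') for a connected graph, computed only from the counts cc(G_i)."""
    w_max = max(w for w, _, _ in edges)
    r = 0
    while (1 + eps) ** r < w_max:  # smallest r with (1 + eps)^r >= w_max
        r += 1
    # cc[i] is the component count of G_i, the edges of weight <= (1 + eps)^i.
    cc = [count_components(n, [e for e in edges if e[0] <= (1 + eps) ** i])
          for i in range(r + 1)]
    total = float(n - cc[0])       # edges of rounded weight 1
    for i in range(r):             # edges of rounded weight (1 + eps)^(i+1)
        total += (1 + eps) ** (i + 1) * (cc[i] - cc[i + 1])
    return total
```

On a connected graph the returned value lies between $w(T)$ and $(1+\epsilon)\,w(T)$, because it equals the MST weight after every edge weight is rounded up to a power of $(1+\epsilon)$.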

Theorem 3.2.6. There exists a single-pass, $O(\epsilon^{-1} \cdot n \cdot \log^4 n)$-space, $O(m \cdot \mathrm{polylog}\,n)$-time algorithm that $(1+\epsilon)$-approximates the weight of the minimum spanning tree of a dynamic graph.
