A Simple Sparsification - Analyzing Massive Graphs in the Semi-streaming Model

See Algorithm 13 for a simple sparsification algorithm. The algorithm extends the MinCut algorithm by taking into account the connectivity of different edges.

Lemma 5.2.1. Assuming access to fully independent random hash functions, Simple-Sparsification uses $O(\epsilon^{-2} n \log^5 n)$ space and the number of edges in the sparsification is $O(\epsilon^{-2} n \log^3 n)$.

Proof. Each of the $O(\log n)$ instances of k-Connectivity runs in $O(kn \log^2 n)$ space.

Algorithm 13 The Simple-Sparsification Algorithm. Steps 1-5 are performed in a single pass. Step 6 is performed in post-processing.

1: Let $h_i : E \to \{0,1\}$ be a uniform hash function for $i \in \{1, \ldots, \lfloor 2\log n \rfloor\}$.
2: for $i = 0$ to $\lfloor 2\log n \rfloor$ in parallel do
3:     Let $G_i$ be the subgraph of $G$ containing edges $e$ such that $\prod_{j \le i} h_j(e) = 1$ ($G_0 = G$).
4:     Let $H_i \leftarrow$ k-Connectivity($G_i$) (Algorithm 7 for the insertion-only model or Algorithm 8 for the dynamic model) for $k = O(\epsilon^{-2} \log n)$.
5: end for
6: For each edge $e = (u, v)$, find $j = \min\{i : \lambda_e(H_i) < k\}$. If $e \in H_j$, add $e$ to the sparsification with weight $2^j$. Return the constructed sparsification.

Since the number of edges returned is $O(kn \log n)$, the number of edges in the sparsification is also bounded by $O(\epsilon^{-2} n \log^3 n)$.
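To make the steps of Algorithm 13 concrete, the following is a minimal offline sketch in Python; it is not a streaming implementation. Several pieces are assumptions made only for illustration: the function names are hypothetical, a seeded pseudo-random generator stands in for the fully independent hash functions, $\log$ is taken base 2, and k-Connectivity (Algorithms 7 and 8, not reproduced here) is replaced by the classical certificate formed by the union of $k$ successive edge-disjoint spanning forests. The graph is assumed simple, with edges given as tuples `(u, v)`.

```python
import math
import random
from collections import deque

def spanning_forest(n, edges):
    """Maximal spanning forest of the graph (range(n), edges) via union-find."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    forest = []
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((u, v))
    return forest

def k_connectivity_certificate(n, edges, k):
    """Assumed stand-in for k-Connectivity: union of k successive forests."""
    remaining, cert = list(edges), set()
    for _ in range(k):
        forest = spanning_forest(n, remaining)
        if not forest:
            break
        cert.update(forest)
        taken = set(forest)
        remaining = [e for e in remaining if e not in taken]
    return cert

def local_connectivity(n, edges, s, t, cap):
    """Edge connectivity between s and t (unit capacities), capped at cap."""
    residual, adj = {}, [[] for _ in range(n)]
    for u, v in edges:
        residual[(u, v)] = residual[(v, u)] = 1
        adj[u].append(v)
        adj[v].append(u)
    flow = 0
    while flow < cap:
        prev, queue = {s: None}, deque([s])
        while queue and t not in prev:  # BFS for an augmenting path
            x = queue.popleft()
            for y in adj[x]:
                if y not in prev and residual[(x, y)] > 0:
                    prev[y] = x
                    queue.append(y)
        if t not in prev:
            break
        y = t
        while prev[y] is not None:  # push one unit of flow along the path
            x = prev[y]
            residual[(x, y)] -= 1
            residual[(y, x)] += 1
            y = x
        flow += 1
    return flow

def simple_sparsification(n, edges, k, seed=0):
    rng = random.Random(seed)  # stand-in for the hash functions h_1, h_2, ...
    levels = math.floor(2 * math.log2(n))
    # depth[e] = largest i <= levels with h_1(e) = ... = h_i(e) = 1
    depth = {}
    for e in edges:
        i = 0
        while i < levels and rng.random() < 0.5:
            i += 1
        depth[e] = i
    # Steps 2-5: H_i is a certificate of G_i = {e : depth[e] >= i}
    H = [k_connectivity_certificate(n, [e for e in edges if depth[e] >= i], k)
         for i in range(levels + 1)]
    # Step 6: find the first level where e is not k-connected in H_j
    sparsifier = {}
    for e in set().union(*H):
        u, v = e
        for j, H_j in enumerate(H):
            if local_connectivity(n, H_j, u, v, k) < k:
                if e in H_j:
                    sparsifier[e] = 2 ** j
                break
    return sparsifier
```

For example, on the complete graph on 6 vertices with `k=10`, no edge is 10-connected, so every edge is kept at level 0 with weight 1.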

As mentioned earlier, the analysis of our sparsification result uses a modification of Theorem 2.2.3 that arises from the fact that we will not be able to independently sample each edge. The proof of Theorem 2.2.3 is based on the following version of the Chernoff bound.

Lemma 5.2.2 (Fung et al. [46]). Consider any subset $C$ of unweighted edges, where each edge $e \in C$ is sampled independently with probability $p_e$ for some $p_e \in (0,1]$ and given weight $1/p_e$ if selected in the sample. Let the random variable $X_e$ denote the weight of edge $e$ in the sample; if $e$ is not selected, then $X_e = 0$. Then, for any $p \le p_e$ for all edges $e$, any $\epsilon \in (0,1]$, and any $N \ge |C|$, the following bound holds:
\[
P\left[\,\left|\sum_{e \in C} X_e - |C|\right| \ge \epsilon N\right] < 2\exp(-0.38\,\epsilon^2 p N) .
\]
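As a numerical illustration of Lemma 5.2.2, the sketch below samples every edge of a set independently with the same probability $p$ and weight $1/p$, and compares the observed deviation frequency with the stated bound. The function names and the uniform choice $p_e = p$ are assumptions made for this example only.

```python
import math
import random

def cut_sampling_bound(eps, p, N):
    """Right-hand side of Lemma 5.2.2: 2 exp(-0.38 eps^2 p N)."""
    return 2.0 * math.exp(-0.38 * eps ** 2 * p * N)

def sampled_weight(num_edges, p, rng):
    """Total weight when each edge is kept independently w.p. p at weight 1/p."""
    kept = sum(1 for _ in range(num_edges) if rng.random() < p)
    return kept / p

def empirical_tail(num_edges, p, eps, N, trials, seed=0):
    """Fraction of trials in which the sampled weight deviates by >= eps * N."""
    rng = random.Random(seed)
    bad = sum(
        1 for _ in range(trials)
        if abs(sampled_weight(num_edges, p, rng) - num_edges) >= eps * N
    )
    return bad / trials
```

Calling `empirical_tail(200, 0.5, 0.3, 200, 2000)` and `cut_sampling_bound(0.3, 0.5, 200)` lets one check that the observed frequency stays below the bound for these parameters.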

We will need to prove an analogous lemma for our sampling procedure. Consider the Simple-Sparsification algorithm as a sampling process that determines the edge weight in the sparsification. Initially, the edge weights are all 1. For each round $i = 1, 2, \ldots$, if an edge $e$ is not $k$-connected in $G_{i-1}$, we freeze the edge weight. For an edge $e$ that is not frozen, we sample the edge with probability 1/2. If the edge is sampled, we double the edge weight; otherwise, we assign weight 0 to the edge.
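The doubling process for a single edge can be simulated directly. The sketch below is illustrative only: it assumes the round at which the edge is frozen is given as a fixed input (in the algorithm it depends on the connectivity of $G_{i-1}$), and the function names are hypothetical. An edge frozen at round $r$ survives $r - 1$ fair coin flips, so it ends with weight $2^{r-1}$ or 0, and its expected weight stays 1.

```python
import random

def final_weight(freeze_round, rng):
    """Final weight of an edge whose weight is frozen at round freeze_round >= 1."""
    weight = 1
    for _ in range(1, freeze_round):  # rounds 1 .. freeze_round - 1
        if rng.random() < 0.5:
            weight *= 2               # sampled: double the weight
        else:
            return 0                  # not sampled: weight becomes 0
    return weight

def mean_weight(freeze_round, trials, seed=0):
    """Empirical mean of the final weight over independent trials."""
    rng = random.Random(seed)
    return sum(final_weight(freeze_round, rng) for _ in range(trials)) / trials
```

With `freeze_round=4`, every outcome is either 0 or 8, and `mean_weight(4, 20000)` should be close to 1.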

Definition 5.2.3. Let $X_{e,i}$ be random variables that represent the edge weight of $e$ at round $i$ and let $X_e$ be the final edge weight of $e$. Let $p_e = \min\{253\,\lambda_e^{-1}\epsilon^{-2}\log^2 n,\ 1\}$, where $\lambda_e$ is the edge-connectivity of $e$, and let $p'_e = \min\{4p_e, 1\}$. Let $B_e$ be the event that the edge weight of $e$ is not frozen until round $\lfloor \log 1/p'_e \rfloor$ and let $B_C = \cup_{e \in C} B_e$ for a set $C$ of edges.

In the above process, freezing an edge weight at round $i$ is equivalent to sampling an edge with probability $1/2^{i-1}$. We will use Azuma’s inequality, which is an exponentially decaying tail inequality for dependent random processes, instead of Lemma 5.2.2.

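Lemma 5.2.4 (Azuma’s inequality). A sequence of random variables $X_1, X_2, X_3, \ldots$ is called a martingale if for all $i \ge 1$,
\[
E[X_{i+1} \mid X_1, \ldots, X_i] = X_i .
\]
If $|X_{i+1} - X_i| \le c_i$ almost surely for all $i$, then
\[
P[\,|X_n - X_1| \ge t\,] < 2\exp\left(-\frac{t^2}{2\sum_i c_i^2}\right) .
\]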
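A quick way to get a feel for Azuma's inequality is to compare it with the simplest martingale, a fair $\pm 1$ random walk, for which every $c_i = 1$. The sketch below is only an illustration under that assumption; the function names are hypothetical.

```python
import math
import random

def azuma_bound(t, increments):
    """2 exp(-t^2 / (2 sum_i c_i^2)) for the increment bounds c_i."""
    return 2.0 * math.exp(-t ** 2 / (2.0 * sum(c * c for c in increments)))

def walk_tail(steps, t, trials, seed=0):
    """Empirical probability that a fair +/-1 walk ends at distance >= t."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        position = sum(1 if rng.random() < 0.5 else -1 for _ in range(steps))
        if abs(position) >= t:
            hits += 1
    return hits / trials
```

For 100 steps and $t = 20$ the bound evaluates to $2e^{-2}$, and `walk_tail(100, 20, 2000)` can be compared against it.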

We prove the following lemma, which is identical to Lemma 5.2.2 if no bad event $B_e$ occurs.

Lemma 5.2.5. Let $C$ be a set of edges. For any $p \le p_e$ for all $e \in C$ and any $N \ge |C|$, we have
\[
P\left[\,\neg B_C \text{ and } \left|\sum_{e \in C} X_e - |C|\right| \ge \epsilon N\right] < 2\exp(-0.38\,\epsilon^2 p N) .
\]

Proof. Suppose that we sample edges one by one and let $Y_{i,j}$ be the total weight of edges in $C$ after $j$ steps at round $i$. If $Y_{i,0} \ge |C| + N$ for any $i$, we stop the sampling process.

For each step in round $i$, we change the edge weight from $2^{i-1}$ to either $2^i$ or 0 with equal probability. The expectation of the edge weight is $2^{i-1}$ and therefore $E[Y_{i,j} \mid Y_{i,j-1}] = Y_{i,j-1}$. In addition, there are at most $(|C|+N)/2^{i-1}$ random variables $Y_{i,j}$ at round $i$, since otherwise $Y_{i,0}$ has to be greater than $|C|+N$ and we would have stopped the sampling process. So
\[
\sum_{i' < i} \sum_j |Y_{i',j} - Y_{i',j-1}|^2
\le \sum_{i' < i} \frac{|C|+N}{2^{i'-1}}\, 2^{2(i'-1)}
= \sum_{i' < i} 2^{i'-1}(|C|+N)
\le 2^{i+1} N .
\]

Now the following inequality follows from Azuma’s inequality:
\[
P[\,|Y_{i,0} - |C|| \ge \epsilon N\,] < 2\exp\left(-\frac{\epsilon^2 N}{2^{i+2}}\right) .
\]
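Spelled out, this step applies Lemma 5.2.4 with $t = \epsilon N$ and with the sum of squared increments bounded by $2^{i+1}N$, as computed in the preceding display:

```latex
\[
P\bigl[\,|Y_{i,0} - |C|| \ge \epsilon N\,\bigr]
< 2\exp\left(-\frac{(\epsilon N)^2}{2 \cdot 2^{i+1} N}\right)
= 2\exp\left(-\frac{\epsilon^2 N}{2^{i+2}}\right) .
\]
```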

Let $i = \lfloor \log \max\{1/(4p), 1\} \rfloor$. If $B_C$ does not occur, $Y_{i,0} = \sum_{e \in C} X_e$. From the definition of $i$, either $i = 0$ or $2^{-(i+2)} \ge 0.38p$. If $i = 0$, obviously $Y_{i,0} = |C|$. If $2^{-(i+2)} \ge 0.38p$, we get the desired result: $P[\,|Y_{i,0} - |C|| \ge \epsilon N\,] < 2\exp(-0.38\,\epsilon^2 p N)$.
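The case split on $i$ can be verified directly from its definition. A short check, assuming $\log$ denotes the base-2 logarithm:

```latex
\[
p \ge \tfrac{1}{4}
\;\Rightarrow\; \max\{1/(4p),\,1\} = 1
\;\Rightarrow\; i = 0 ,
\]
\[
p < \tfrac{1}{4}
\;\Rightarrow\; i \le \log\frac{1}{4p}
\;\Rightarrow\; 2^{i} \le \frac{1}{4p}
\;\Rightarrow\; 2^{-(i+2)} \ge p \ge 0.38\,p .
\]
```

In the second case the exponent $\epsilon^2 N / 2^{i+2}$ from the Azuma step is therefore at least $0.38\,\epsilon^2 p N$.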

Theorem 5.2.6. Assuming access to fully independent random hash functions, there exists a single-pass, $O(\epsilon^{-2} n \log^5 n)$-space $(1 \pm \epsilon)$-sparsification algorithm in the semi-streaming model.

Proof. By replacing Lemma 5.2.2 by Lemma 5.2.5, we can conclude that, with high probability, either Simple-Sparsification produces a sparse graph that approximates every cut or $B_e$ occurs for some edge $e$. Consider an edge $e = (u, v)$ and some minimum $u$-$v$ cut of cut value $\lambda_e$. For $i = \lfloor \log 1/p'_e \rfloor$, the expected number of edges of this cut that remain in $G_i$ is smaller than $k/2$ (assuming that we use a sufficiently large constant to decide $k$). By the Chernoff bound, $e$ is not $k$-connected in $G_i$ with high probability. By the union bound, with high probability $B_e$ does not occur for any $e$, and we obtain the desired result.
