moved point p ∈ P. Since the minimum pairwise distance between distinct points in P is 1 and Inequality (8.2) holds for at least (1 − σ/2) · n² pairs of points (p, q) ∈ P × P with probability at least 1 − δ/3, we have that
and slack σ/2 with probability at least 1 − δ/3. Furthermore, each point lies on a grid with cell size ε/(18 · √(d(ε′, σ′, δ′))) and the maximum pairwise distance of points is O(√
On the obtained point set, we run our construction from Sections 7.2 and 8.2 with precision parameter ε″ := ε/3, slack parameter σ″ := σ/2, and error probability parameter δ″ := δ/3. Then, with a total error probability of δ, the resulting point set P′ embeds P with distortion (1 + ε/3) · (1 + ε/3) ≤ (1 + ε) and slack σ. It follows from the above and Theorem 7 that we also have P′ ⊂ {1, . . . , ∆′}^{d′} with spread ∆′ ∈ O(√d · n∆/(ε² · √(σδ))) and dimension d′ ∈ O(1/(ε²σδ)).
As explained in the proof of Theorem 9, we have to ensure that σ″ > 22d/n (cf. Theorem 7) since we use the construction given in Section 7.2. However, this is implicitly required by the fact that the space requirement of a streaming algorithm has to be sublinear in n and the space requirement of our streaming algorithm is ω(1/σ).
Finally, we analyze the complexity of our construction. Due to Theorem 10, each point in P can be embedded into the low-dimensional space ℝ^{d(ε′,σ′,δ′)} in O(d · log²(d)/(ε²σδ)) time using O(log(d)/(ε²σδ)) space. Due to Theorems 7 and 9, the construction from Sections 7.2 and 8.2 applied to a set of points with dimension O(1/(ε²σδ)) and spread O(√d · n∆/(ε² · √(σδ))) has both an update time and space requirement of
O
The size of the set of representatives follows from Lemma 7.2.7.
8.3 Max-Cut in High Dimensions
In this section, we show how to embed a set of high-dimensional Euclidean points into a low-dimensional Euclidean space such that the sum of the pairwise distances is well preserved. Afterwards, we use this result to design a streaming algorithm that implicitly computes a (1 ± ε)-approximation of the max-cut problem for a dynamic data stream of high-dimensional Euclidean points.
Let ϕ : P → ℝ^{d(ε,δ)} be the Johnson-Lindenstrauss embedding where each point is mapped into a Euclidean space of dimension d(ε, δ) ∈ Θ(1/(ε²δ²)). We will show that, for a pair of points (p, q) ∈ P × P, the expected value of |D(ϕ(p), ϕ(q)) − D(p, q)| is at most δε · D(p, q) and that |D(ϕ(p), ϕ(q)) − D(p, q)| is sharply concentrated around its expected value with probability 1 − δ. This leads to the following lemma:
Lemma 8.3.1. Let ε, 0 < ε < 1, be a precision parameter, let δ, 0 < δ < 1, be an error variable Yi(p) as explained in the proof of Theorem 10. We define the embedding ϕ for the point p by
\[
  \varphi(p) := \frac{1}{\sqrt{d(\varepsilon,\delta)}} \cdot \bigl(Y_1(p), \ldots, Y_{d(\varepsilon,\delta)}(p)\bigr)^{T}.
\]
Following the construction in the proof of Theorem 10, each point can be embedded using O(log(d)/(ε²δ²)) space and by performing O(d/(ε²δ²)) arithmetic and finite field operations on elements of O(log(d)) bits. Furthermore, since ϕ is a linear function, we have ϕ(p − q) = ϕ(p) − ϕ(q) for all pairs (p, q) ∈ ℝ^d × ℝ^d.
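The linearity property can be illustrated with a small sketch. It assumes fully independent random ±1 signs stored in an explicit matrix, which is a simplification: the construction from the proof of Theorem 10 generates the variables Y_i(p) with far less space. The names `make_embedding`, `dist`, `d`, and `k` are hypothetical and chosen only for this illustration.

```python
import math
import random

def make_embedding(d, k, seed=0):
    """Return a linear map phi: R^d -> R^k built from random +/-1 signs."""
    rng = random.Random(seed)
    # Assumption: fully independent signs kept in memory. The source derives
    # the Y_i(p) differently in order to meet its space bound.
    signs = [[rng.choice((-1.0, 1.0)) for _ in range(d)] for _ in range(k)]
    scale = 1.0 / math.sqrt(k)

    def phi(p):
        # Y_i(p) is the inner product of row i with p; scale by 1/sqrt(k).
        return [scale * sum(s * x for s, x in zip(row, p)) for row in signs]

    return phi

def dist(a, b):
    """Euclidean distance D(a, b)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

d, k = 200, 400
phi = make_embedding(d, k, seed=1)
rng = random.Random(2)
p = [rng.randint(1, 100) for _ in range(d)]
q = [rng.randint(1, 100) for _ in range(d)]

# Linearity: phi(p - q) agrees with phi(p) - phi(q) up to rounding error.
lhs = phi([x - y for x, y in zip(p, q)])
rhs = [a - b for a, b in zip(phi(p), phi(q))]
linearity_gap = max(abs(a - b) for a, b in zip(lhs, rhs))

# Relative error of the embedded distance for this single pair.
rel_err = abs(dist(phi(p), phi(q)) - dist(p, q)) / dist(p, q)
```

Because ϕ is linear, the error for a pair (p, q) depends only on the difference vector p − q, which is what the analysis below exploits.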
Now, let p and q be any two points in ℝ^d. We define ν := p − q and Y(ν) := ‖ϕ(ν)‖² to be the random variable for the squared length of ϕ(ν). Then, as explained in the proof of Theorem 10, the expected value of Y(ν) is E[Y(ν)] = ‖ν‖², and we can upper bound the
The expected value of err(p, q) is given by
E [err(p, q)]
It follows that, in order to upper bound the expected value of err(p, q), we have to upper bound the probability that err(p, q) > εδ/5 · 2^i · ‖ν‖ for each i ∈ ℕ₀. Let ℓ be any fixed
By Chebyshev’s inequality, we can upper bound this probability by
Pr
Now, the expected value of err(p, q) can be upper bounded by
Due to Markov’s inequality, it follows that
Pr
Due to linearity of expectation, we have
Pr
Given any Euclidean point set P, the embedding described above is useful for all geometric problems that satisfy the following four properties:
(i) The cost of an optimal solution for P is a function whose set of input parameters is a subset of all pairwise distances of P .
(ii) The cost of an optimal solution for P is at least Σ_{p∈P} Σ_{q∈P} 1/c · D(p, q), where c ≥ 1 is any small constant.
(iii) If the distance D(p, q) between any two points p, q ∈ P is increased or decreased by any value α > 0, the cost of an optimal solution for P is increased or decreased by at most O(α).
(iv) The complexity of all known (1 ± ε)-approximation algorithms depends exponentially on the dimension of P .
To handle these problems, we first embed the input points and afterwards apply any efficient (1 ± ε)-approximation algorithm on the embedded points.
One suitable problem is the max-cut problem in the dynamic geometric data stream model.
Definition 8.3.2 (Euclidean Max-Cut Problem). For a set P ⊂ ℝ^d, the Euclidean max-cut problem is to find a partition of P into two subsets C1 and C2 such that the sum
\[
  \mathrm{Cut}(P, C_1, C_2) := \sum_{(p,q) \in C_1 \times C_2} D(p, q)
\]
of inter-cluster distances is maximized.
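To make the definition concrete, the following sketch evaluates Cut(P, C1, C2) and finds a maximum cut by exhaustive search. It is only an illustration for very small point sets and is not the algorithm of [44]. The function names are hypothetical.

```python
import itertools
import math

def dist(a, b):
    """Euclidean distance D(a, b)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cut_value(points, side):
    """Sum of D(p, q) over p in C1 and q in C2; side[i] is True for C1."""
    c1 = [p for p, s in zip(points, side) if s]
    c2 = [p for p, s in zip(points, side) if not s]
    return sum(dist(p, q) for p in c1 for q in c2)

def max_cut_brute_force(points):
    """Try all 2^(n-1) partitions; feasible only for very small n."""
    n = len(points)
    best_value, best_side = 0.0, (True,) * n
    # Fix the first point in C1 so that mirrored partitions are skipped.
    for rest in itertools.product((True, False), repeat=n - 1):
        side = (True,) + rest
        value = cut_value(points, side)
        if value > best_value:
            best_value, best_side = value, side
    return best_value, best_side

# Two pairs of nearby points that are far apart from each other.
points = [(1, 1), (1, 2), (5, 1), (5, 2)]
best_value, best_side = max_cut_brute_force(points)

# Sum of D(p, q) over all ordered pairs in P x P.
total_ordered = sum(dist(p, q) for p in points for q in points)
```

For this input the best partition separates the left pair from the right pair, with value 8 + 2·√17, which is well above a quarter of the sum over all ordered pairs.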
Obviously, the max-cut problem satisfies Properties (i) and (iii). Furthermore, it is shown in [44] that Property (ii) is satisfied for c = 4. Concerning Property (iv), the authors of [44] gave an efficient (1 ± ε)-approximation for the max-cut problem in low dimensions that has the following properties:
Lemma 8.3.3 ([44]). Let ε, 0 < ε < 1, be a precision parameter. Given a stream of m Insert and Delete operations of points from a discrete Euclidean space {1, . . . , ∆}^d, where d is a constant, there exists a streaming algorithm that computes with probability at least 2/3, for the current point set P with cardinality n, a data structure of size O(log³(∆m) · log⁴(∆)/ε^{2d+4}) from which an implicit (1 ± ε)-approximate solution for the max-cut problem can be extracted in poly(exp(1/ε)^{O(1)}, (1/ε)^d, log(∆), log(n), log(m)) time.
An update can be processed in O(log²(∆) · log(∆m)) time.
By combining the embedding given in Lemma 8.3.1 with the approximation algorithm presented in [44], we can implicitly compute a (1 ± ε)-approximation for the max-cut problem on dynamic geometric data streams of high-dimensional points.
Theorem 12. Let ε, 0 < ε < 1, be a precision parameter. Given a stream of m Insert and Delete operations of points from a discrete high-dimensional Euclidean space {1, . . . , ∆}^d, there is a randomized streaming algorithm that has a space requirement of O(log⁷(d∆mn)/ε^{O(1/ε²)}) and computes with probability at least 5/8, for the current point set P of size n, a data structure from which an implicit (1 ± ε)-approximation for the max-cut problem can be extracted in poly(exp(1/ε)^{O(1)}, (1/ε)^{1/ε²}, log(d), log(∆), log(n), log(m)) time. An update requires O(d · log²(d)/ε² + log³(d∆nm/ε)) time.
Proof. We proceed as in the proof of Theorem 11. First, we embed the discrete high-dimensional Euclidean point set P into a low-dimensional Euclidean space. This embedding induces a small multiplicative error on the cost of a maximum cut. Then, we apply the snap-rounding technique, i.e., we impose an appropriately fine grid on the target space and move each embedded point to its nearest grid point. This movement of the points induces an additive error, which can be charged against a lower bound on the cost of a maximum cut for P to get a small multiplicative error. Finally, by applying the techniques described in [44] to the embedded and moved points, we obtain the results stated in the theorem. Next, we explain our construction in more detail.
In the first step, we apply the embedding ϕ : P → P′ given in Lemma 8.3.1 with precision parameter ε′ := ε/16 and error probability parameter δ′ := 1/24 to P. Then, we have that
\[
  \sum_{(p,q) \in P \times P} \bigl| D(\varphi(p), \varphi(q)) - D(p, q) \bigr| \;\le\; \varepsilon' \cdot \sum_{(p,q) \in P \times P} D(p, q)
\]
is true with probability at least 1 − δ′. Since Property (ii) (on page 160) is satisfied for c = 4 [44], we have MaxCut(P) ≥ 1/4 · Σ_{(p,q)∈P×P} D(p, q). Due to the fact that each cut maximum cut of P′. It follows from the above that
X
In the second step, we apply the snap-rounding technique. We impose a square grid on the target space ℝ^{d(ε′,δ′)} with d(ε′, δ′) ∈ Θ(1/(ε²δ²)), where each cell has side length ε/(16 · √(d(ε′, δ′))), and move each point in P′ to its nearest grid point. Let P″ be the set of points that we obtain after moving the points in P′. Each point is moved by a distance of at most √(d(ε′, δ′))/2 · ε/(16 · √(d(ε′, δ′))) = ε/32, so each pairwise distance changes by at most ε/16.
Thus, the movement of the points induces an additive error of at most εn²/16 on the sum of the pairwise distances. Since Property (ii) (on page 160) is satisfied for c = 4 [44] and the minimum pairwise distance of P is 1, a lower bound on the cost of a maximum cut for P is n²/4. Hence, we have εn²/16 ≤ ε/4 · MaxCut(P). Due to Inequality (8.3), we get
with probability at least 1 − 1/24. In addition, we can upper bound the diameter of P″ as follows. Since the maximum pairwise distance of P is √d · ∆, the value n² · √d · ∆ is an upper bound on the cost of a maximum cut for P. Since the diameter of a point set is a lower bound on the cost of a maximum cut of the point set, we get
diam(P′) ≤ MaxCut(P′) ≤
where the second inequality follows from Inequality (8.3). As a result, the diameter of P″ is O(√d · ∆n²). Furthermore, each point in P″ lies on a grid with cell size ε/(16 · √(d(ε′, δ′))).
Thus, by scaling the point space by 16 · √(d(ε′, δ′))/ε, we get a set of points from a discrete low-dimensional space {1, . . . , ∆′}^{d′} with ∆′ ∈ O(√d · ∆n²/ε²) and d′ ∈ O(1/ε²).
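The snap-rounding and scaling steps can be checked numerically. The sketch below is only an illustration under stated assumptions: `k` stands in for the target dimension d(ε′, δ′), the input points are random rather than the output of the embedding, and the function names are hypothetical. With cell side ε/(16 · √k), the nearest grid point is at most ε/32 away, so each pairwise distance changes by at most ε/16.

```python
import math
import random

def dist(a, b):
    """Euclidean distance D(a, b)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def snap_to_grid(point, side):
    """Move a point to the nearest vertex of a grid with the given cell side."""
    return [side * round(x / side) for x in point]

def to_integer_grid(point, side):
    """Snap and rescale by 1/side, which yields integer coordinates."""
    return [round(x / side) for x in point]

eps, k = 0.5, 64                  # k plays the role of d(eps', delta')
side = eps / (16 * math.sqrt(k))  # cell side length used in the proof

rng = random.Random(0)
points = [[rng.uniform(0, 10) for _ in range(k)] for _ in range(20)]
snapped = [snap_to_grid(p, side) for p in points]

# Largest movement of a single point; bounded by sqrt(k) * side / 2 = eps / 32.
max_move = max(dist(p, s) for p, s in zip(points, snapped))

# Largest change of a pairwise distance; bounded by 2 * eps / 32 = eps / 16.
max_change = max(
    abs(dist(a, b) - dist(sa, sb))
    for a, sa in zip(points, snapped)
    for b, sb in zip(points, snapped)
)

integer_points = [to_integer_grid(p, side) for p in points]
```

Summing the per-pair bound over all n² ordered pairs gives the additive error εn²/16 used in the proof. The integer coordinates produced here are not shifted into the range {1, . . . , ∆′}; a translation would be needed for that.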
On the scaled point set, we run the approximation algorithm of [44] with precision parameter ε″ := ε/3. Due to Lemma 8.3.3 and our calculations above, with probability at least 23/24 − 1/3 = 5/8, we can compute a point set P‴ such that construction computes an implicit (1 ± ε)-approximate solution for the max-cut problem with probability at least 5/8.
Note that our construction works in the streaming model, where the first two steps are used to transform a stream of high-dimensional points into a stream of low-dimensional points. Due to Lemma 8.3.1, the transformation of one high-dimensional input point requires O(log(d)/ε²) space and O(d · log²(d)/ε²) time. Finally, since we apply the approximation algorithm of [44] to a stream of points with dimension O(1/ε²) and spread O(√d · ∆n²/ε²), the complexity of our construction is as claimed in the theorem.
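The stream transformation can be sketched as follows. This is a minimal illustration and not the construction analyzed above: it assumes an explicitly stored matrix of independent ±1 signs (which does not meet the stated space bound), and it uses a `Counter` as a hypothetical stand-in for the data structure of [44]. The class name `StreamTransformer` is likewise an assumption. The point of the sketch is that the map is fixed once, so a Delete of a point is sent to the same low-dimensional grid point as its earlier Insert.

```python
import math
import random
from collections import Counter

class StreamTransformer:
    """Maps high-dimensional points to low-dimensional integer grid points."""

    def __init__(self, d, k, eps, seed=0):
        rng = random.Random(seed)
        # Assumption: explicit sign matrix, drawn once and reused for all updates.
        self.signs = [[rng.choice((-1.0, 1.0)) for _ in range(d)] for _ in range(k)]
        self.scale = 1.0 / math.sqrt(k)
        self.side = eps / (16 * math.sqrt(k))

    def transform(self, p):
        # Step 1: linear embedding into R^k.
        embedded = (self.scale * sum(s * x for s, x in zip(row, p))
                    for row in self.signs)
        # Step 2: snapping and rescaling by 1/side give integer coordinates.
        return tuple(round(y / self.side) for y in embedded)

transformer = StreamTransformer(d=50, k=16, eps=0.5, seed=3)
low_dim_points = Counter()  # hypothetical stand-in for the structure of [44]

stream = [
    ("Insert", (1,) * 50),
    ("Insert", (2,) * 50),
    ("Delete", (1,) * 50),
]
for op, p in stream:
    image = transformer.transform(p)
    low_dim_points[image] += 1 if op == "Insert" else -1

low_dim_points = +low_dim_points  # drop entries whose count fell to zero
```

After the three updates only the image of the second point remains, mirroring the current point set of the high-dimensional stream.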