
4.3 Quality and Complexity of the Kinetic Data Structure

4.3.2 Complexity

In the remainder of this chapter, we analyze our KDS in terms of its compactness, locality, responsiveness, and efficiency (see Section 2.4.3 for definitions of these attributes).

Lemma 4.2.1 already implies that our KDS is compact and local. Next, we prove that it is also responsive and efficient.

Lemma 4.3.9. Each update operation requires O(log^{d+1}(n) · log(nR)) time and O(log(nR)) status changes.

Proof. Due to Lemma 4.2.1, the time to update the event queue is O(log(nR)). Except for algorithm Restore, all further steps require a constant number of range queries on T₁ and T₂. Due to Lemma 4.1.8, this requires O(log^{d+1}(n)) time. Next, we examine the time needed for algorithm Restore. We first consider the running time for restoring the invariant at points with radius 2^k. The number of cubelets with radius 2^k in C(p_h(t), 6·2^{k+1}) is 12^d, where p_h(t) is the point that triggered the event. For each cubelet, the query for its open or closed points can be answered by one range query on T₁ or T₂, which requires O(log^{d+1}(n)) time due to Lemma 4.1.8. Afterwards, at most one point has to be inserted into and deleted from each of T₁ and T₂, which can again be done in O(log^{d+1}(n)) time according to Lemma 4.1.8.
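To make the count of 12^d cubelets concrete, the following Python sketch enumerates the centers of the radius-2^k cubelets that tile the cube around the triggering point. It rests on assumptions that this section does not spell out: C(q, r) is taken to be the axis-parallel cube of side length 2r centered at q, its radius is read as 6·2^{k+1} (which is what yields exactly 12 cubelets per axis), and the cubelets are aligned with this cube. The function name `cubelet_centers` is hypothetical; in the KDS, each center would correspond to one range query on T₁ or T₂.

```python
from itertools import product


def cubelet_centers(p, k):
    """Centers of the radius-2^k cubelets tiling C(p, 6 * 2^(k+1)).

    Hypothetical helper with an assumed convention: C(q, r) is the
    axis-parallel cube of side length 2r centered at q, and the cubelets
    are aligned with it.
    """
    d = len(p)
    small = 2 ** k                 # cubelet radius (half side length)
    big = 6 * 2 ** (k + 1)         # radius of the enclosing cube
    per_axis = big // small        # side ratio 2*big / (2*small) = 12
    # Offsets of the cubelet centers from p along a single axis.
    offsets = [-big + small * (2 * a + 1) for a in range(per_axis)]
    return [tuple(p[i] + off[i] for i in range(d))
            for off in product(offsets, repeat=d)]


if __name__ == "__main__":
    for d in (1, 2, 3):
        centers = cubelet_centers((0,) * d, k=2)
        print(d, len(centers))     # 12, 144, 1728 cubelets, i.e. 12^d
```

Because d is a constant, the 12^d range queries per radius contribute only a constant factor.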

Since d is a constant, 12^d is a constant as well. By summation over all O(log(nR)) radii, we get a total running time of O(log^{d+1}(n) · log(nR)).
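Written out as a sum, the running-time bound for Restore reads as follows. The index set 𝒦 of handled radii 2^k and the symbol T_Restore are notation introduced here for illustration only; the proof itself just uses that O(log(nR)) radii are handled.

```latex
T_{\mathrm{Restore}}
  = \sum_{k \in \mathcal{K}} 12^d \cdot O\bigl(\log^{d+1}(n)\bigr)
  = |\mathcal{K}| \cdot 12^d \cdot O\bigl(\log^{d+1}(n)\bigr)
  = O\bigl(\log^{d+1}(n) \cdot \log(nR)\bigr),
  \qquad |\mathcal{K}| = O\bigl(\log(nR)\bigr),\; 12^d = O(1).
```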

There can exist at most one open facility with radius 2^k in a cubelet with radius 2^k because otherwise at least one open facility would violate the invariant. Hence, the number of open facilities with radius 2^k that are closed while running algorithm Restore is constant. Furthermore, we open at most one facility in each cubelet, so the number of opened facilities with radius 2^k is also constant. Since we handle O(log(nR)) radii, there are O(log(nR)) status changes per event.

Since our KDS processes a total number of O(n²log²(nR)) events (see Lemma 4.2.1), the total processing time is bounded by O(n² log^{d+1}(n) · log³(nR)). To measure the efficiency as defined in Section 2.4.3, we use a result from [46]. In [46], Gao et al. investigated a problem in the KDS framework which is closely related to the mobile facility location problem. In particular, they provided a randomized KDS to maintain a set of centers among moving points in the plane such that, given a specified radius, all the points are covered by balls of the given radius centered at the chosen center points. Gao et al. showed that the size
of the center set is at most a constant factor larger than the minimum one. To prove the efficiency of their KDS, they showed that there is a set of n points moving linearly on the real line that forces any c-approximate cover to change Ω(n²/c²) times. With some minor modifications, their result can be transferred to the facility location problem.

Lemma 4.3.10. For any constant c > 1, there exists a set P of n points moving linearly on the real line such that any c-approximate solution to the mobile facility location problem for P undergoes Ω(n²/c²) status changes.

Proof. We assume that c is an integer and that n = 2cm for some integer m ≥ 12c².

Let P be the set of n moving points defined as follows. We partition P into m groups, each containing 2c points. Let the j-th point in the i-th group be denoted by p_{i,j}, where 0 ≤ i < m and 0 ≤ j < 2c. The initial position of all the points in the i-th group is i · 2m, and the point p_{i,j} moves with speed j · 2m. Let p_{i,j}(t) be the position of p_{i,j} at the point of time t. Then, we have

$$p_{i,j}(t) = (i + jt) \cdot 2m$$

for 0 ≤ i < m, 0 ≤ j < 2c, and t ≥ 0. Note that the points frequently change their ranks on the line during the time period from 0 to m. Afterwards, no two points change their ranks anymore, since two points p_{i,j} and p_{i',j'} with j ≠ j' meet only at time t = (i' − i)/(j − j'), which is smaller than m because |i' − i| < m, and points with equal speeds never meet.
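The following Python sketch builds the point set P for small parameters and computes, with exact rational arithmetic, the latest time at which two of its points coincide. It illustrates the claim that all rank changes happen before time m. The helper names are hypothetical, and the parameter choices in the usage lines are merely small values satisfying m ≥ 12c².

```python
from fractions import Fraction
from itertools import combinations


# Hypothetical helper names; the construction follows Lemma 4.3.10.
def position(i, j, t, m):
    """Position p_{i,j}(t) = (i + j*t) * 2m of point j in group i."""
    return (i + j * t) * 2 * m


def build_points(c, m):
    """Index pairs (i, j) with 0 <= i < m and 0 <= j < 2c, i.e. n = 2cm points."""
    return [(i, j) for i in range(m) for j in range(2 * c)]


def last_crossing_time(c, m):
    """Latest time t >= 0 at which two distinct points share a position."""
    latest = Fraction(0)
    for (i, j), (i2, j2) in combinations(build_points(c, m), 2):
        if j == j2:
            continue                      # equal speeds: these never meet
        t = Fraction(i2 - i, j - j2)      # solves i + j*t == i2 + j2*t
        if t >= 0:
            assert position(i, j, t, m) == position(i2, j2, t, m)
            latest = max(latest, t)
    return latest


if __name__ == "__main__":
    print(last_crossing_time(1, 12))      # 11, i.e. m - 1 < m
    print(last_crossing_time(2, 48))      # 47
```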

Let us consider the configuration of P at any point of time t₁ := k + 3c/m, for some integer k < m. At the point of time t₁, the location of the point p_{i,j} is

$$p_{i,j}(t_1) = \left(i + jk + \frac{3cj}{m}\right) \cdot 2m = 2(i + jk)m + 6cj .$$

Let p_{i,j} and p_{i',j'} be any two distinct points. In case that i + jk ≠ i' + j'k, the distance between p_{i,j} and p_{i',j'} at the point of time t₁ is

$$|p_{i,j}(t_1) - p_{i',j'}(t_1)| > 2m - 12c^2 \ge 4c ,$$

since the terms 2(i + jk)m and 2(i' + j'k)m differ by at least 2m, whereas the terms 6cj and 6cj' differ by at most 6c(2c − 1) < 12c², and since 2m − 12c² ≥ 12c² ≥ 4c for m ≥ 12c². In case that i + jk = i' + j'k, we have j' ≠ j since p_{i,j} and p_{i',j'} are distinct. Then, it follows that the distance between p_{i,j} and p_{i',j'} at the point of time t₁ is

$$|p_{i,j}(t_1) - p_{i',j'}(t_1)| = 6c\,|j - j'| \ge 6c .$$

Thus, at the point of time t₁, no two points are within distance 4c of each other.
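As a numerical sanity check of the separation argument, the sketch below computes the smallest distance between two points of P at every time t₁ = k + 3c/m with 0 ≤ k < m, using exact fractions. The helper `min_gap_at_t1` is hypothetical; sorting the positions suffices because the closest pair of points on a line is always adjacent in sorted order.

```python
from fractions import Fraction


def min_gap_at_t1(c, m, k):
    """Smallest pairwise distance among the points p_{i,j} at t1 = k + 3c/m.

    Hypothetical helper for checking the separation claim of Lemma 4.3.10.
    """
    t1 = k + Fraction(3 * c, m)
    positions = sorted((i + j * t1) * 2 * m
                       for i in range(m) for j in range(2 * c))
    # On a line, the closest pair is adjacent after sorting.
    return min(b - a for a, b in zip(positions, positions[1:]))


if __name__ == "__main__":
    c, m = 2, 48                      # satisfies m >= 12 c^2
    gaps = [min_gap_at_t1(c, m, k) for k in range(m)]
    print(min(gaps) > 4 * c)          # True
```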

Assuming that the opening costs as well as the demands of all the points in P are 1, we next analyze an optimal solution for P at the point of time t₁. Since the distance between any two points in P is greater than 4c at the point of time t₁, serving a point by a facility at another point costs more than opening a facility at the point itself. Hence, the unique optimal solution is to open a facility at each input point, which leads to a total cost of n. It follows that any c-approximate solution has a cost of at most cn. Let us now consider an approximate solution in which only a (1 − α)-fraction of the input points are open facilities. Then, the cost of this solution is more than n − αn + 4c · αn, since each of the αn closed points is more than 4c away from its nearest open facility. To ensure that the cost is at most cn, α must be smaller than (c − 1)/(4c − 1). Since 1/4 > (c − 1)/(4c − 1) for c > 1, we obtain that any c-approximate solution must open more than 3n/4 facilities.
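The bound on α follows from combining "cost > (1 − α)n + 4cαn" with "cost ≤ cn". For c > 1 (so that 4c − 1 > 0), the worked derivation is:

```latex
(1-\alpha)\,n + 4c\,\alpha n < c\,n
  \iff \alpha\,(4c - 1) < c - 1
  \iff \alpha < \frac{c-1}{4c-1},
\qquad
\frac{c-1}{4c-1} < \frac{1}{4}
  \iff 4c - 4 < 4c - 1 .
```

Hence fewer than n/4 points are closed, and more than 3n/4 are open facilities.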

Next, we consider the configuration of P at any point of time t₂ := k, for some integer k < m. Since p_{i,j}(t₂) = (i + jk) · 2m, where 0 ≤ i < m, 0 ≤ j < 2c, and k < m, each point is located at a position 2sm for some s ∈ {0, . . . , m − 1 + (2c − 1)k}. Hence, at the point of time t₂, the points occupy at most m + 2ck distinct positions. Opening one facility at each occupied position serves every point at distance 0, so the optimal facility location cost is at most m + 2ck. Since every open facility costs 1, a c-approximate solution may have at most c(m + 2ck) open facilities.
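The counting step at t₂ = k can be checked directly for small parameters. The following Python sketch (hypothetical helper name) counts the distinct occupied positions; for 0 ≤ k < m it comes out as exactly m + (2c − 1)k, which is at most m + 2ck, because the ranges of values i + jk for consecutive j overlap when k < m.

```python
def distinct_positions_at_t2(c, m, k):
    """Number of distinct positions occupied by P at the integer time t2 = k.

    Hypothetical helper; positions follow p_{i,j}(t) = (i + j*t) * 2m.
    """
    return len({(i + j * k) * 2 * m for i in range(m) for j in range(2 * c)})


if __name__ == "__main__":
    c, m = 2, 48                      # satisfies m >= 12 c^2
    for k in range(m):
        count = distinct_positions_at_t2(c, m, k)
        assert count == m + (2 * c - 1) * k <= m + 2 * c * k
    print("positions at t2 = k never exceed m + 2ck")
```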

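Hence, between the points of time t₂ = k and t₁ = k + 3c/m, any c-approximate solution undergoes at least 3n/4 − c(m + 2ck) = n/4 − 2c²k status changes, where we use cm = n/2. Since 3c/m < 1, the time intervals [k, k + 3c/m] are pairwise disjoint for different values of k. Summing up over all k ∈ {0, . . . , K − 1}, the number of status changes is at least

$$\sum_{k=0}^{K-1}\left(\frac{n}{4} - 2c^2 k\right) > \frac{Kn}{4} - c^2 K^2 .$$

Setting K = n/(8c²) < m, we have established that the total number of changes is Ω(n²/c²).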
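Substituting K = n/(8c²) makes the final bound explicit. With n = 2cm, this choice equals m/(4c), which is indeed smaller than m, and every summand n/4 − 2c²k is positive for k < K:

```latex
\sum_{k=0}^{K-1}\Bigl(\frac{n}{4} - 2c^2 k\Bigr)
  = \frac{Kn}{4} - c^2 K(K-1)
  > \frac{Kn}{4} - c^2 K^2
  \;\overset{K = n/(8c^2)}{=}\;
  \frac{n^2}{32c^2} - \frac{n^2}{64c^2}
  = \frac{n^2}{64c^2}
  = \Omega\!\left(\frac{n^2}{c^2}\right).
```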

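By Lemma 4.3.10, applied with the constant approximation factor c = 64d + 1 of our KDS (see Theorem 4 below), any such solution undergoes Ω(n²) status changes in the worst case. Since our KDS processes O(n²log²(nR)) events in total, it has an efficiency value of O(log²(nR)). Hence, the KDS for the mobile facility location problem is efficient.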
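In terms of the usual KDS notion of efficiency, taken here as the ratio between the number of events the KDS processes and the number of changes the maintained structure must undergo in the worst case (Section 2.4.3 gives the precise definition used in this work), the bound can be sketched as follows, with c = 64d + 1 constant for constant d:

```latex
\frac{O\bigl(n^2 \log^2(nR)\bigr)}{\Omega\bigl(n^2 / c^2\bigr)}
  = O\bigl(c^2 \log^2(nR)\bigr)
  = O\bigl(\log^2(nR)\bigr)
  \quad\text{for } c = 64d + 1 = O(1).
```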

We summarize our results in the following theorem:

Theorem 4. Let P be a set of n independently moving points in R^d, where d is a constant dimension. Then, there exists a deterministic KDS for the mobile facility location problem that maintains at any point of time t a set F(t) ⊆ P(t) such that we have

$$\mathrm{FacLoc}(P(t), F(t)) < (64d + 1) \cdot \mathrm{FacLoc}(P(t), F^*(t)) .$$

Let

$$R = \frac{\max_{p_i \in P} f_i \cdot \max_{p_i \in P} d_i}{\min_{p_i \in P} f_i \cdot \min_{p_i \in P} d_i} ,$$

where f_i and d_i are the opening cost and the demand of a point p_i, respectively. Then, the KDS has a space requirement of O(n(log^d(n) + log(nR))), and each event requires O(log(nR)) status changes and O(log^{d+1}(n) · log(nR)) update time. In case that the trajectories can be described by bounded-degree polynomials, the total number of updates is O(n²log²(nR)), which results in a total processing time of O(n² log^{d+1}(n) · log³(nR)). A flight plan update involves O(log(nR)) certificates and requires O(log²(nR)) time.
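The parameter R of Theorem 4 is a simple ratio of extreme opening costs and demands. The following Python sketch computes it from plain lists (the function name `ratio_r` and the list-based input are hypothetical). With unit opening costs and demands, as in the lower-bound construction of Lemma 4.3.10, R = 1 and the bounds depend on n alone.

```python
from fractions import Fraction


def ratio_r(opening_costs, demands):
    """R = (max f_i * max d_i) / (min f_i * min d_i) as in Theorem 4.

    Hypothetical helper: opening_costs[i] and demands[i] are f_i and d_i
    of point p_i, assumed to be positive.
    """
    if len(opening_costs) != len(demands) or not opening_costs:
        raise ValueError("need one opening cost and one demand per point")
    numerator = max(opening_costs) * max(demands)
    denominator = min(opening_costs) * min(demands)
    return Fraction(numerator) / Fraction(denominator)  # exact for integers


if __name__ == "__main__":
    print(ratio_r([1, 1, 1], [1, 1, 1]))   # 1 (unit costs and demands)
    print(ratio_r([1, 2, 4], [3, 6, 3]))   # 8, i.e. (4 * 6) / (1 * 3)
```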

5 Facility Location in Data Streams

This chapter deals with a constant-factor approximation algorithm for the cost of the uniform facility location problem over dynamic geometric data streams in a discrete Euclidean space {1, . . . , ∆}^d, where d is a constant. The starting point of our algorithm is the work of Indyk [64], which gives the best previous approach for approximating the cost of the uniform facility location problem over dynamic geometric data streams and guarantees an approximation factor of O(log²(∆)). In [64], Indyk defines a certain partition of the space into nested square grids and a set of cells in this partition such that the number of these cells gives an O(log(∆))-approximation. While estimating the number of these cells, the algorithm of [64] loses another O(log(∆)) factor.
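To illustrate what a partition into nested square grids looks like, the following Python sketch assigns a point of {1, . . . , ∆}^d to its cell in the grid of side length 2^ℓ for every level ℓ. It uses unshifted dyadic grids as a simplifying assumption, since this section does not spell out the exact construction of [64], and the function names are hypothetical. The nesting property means that each cell of level ℓ lies inside exactly one cell of level ℓ + 1.

```python
def grid_cell(p, level):
    """Integer index of the level-`level` cell (side 2**level) containing p.

    Hypothetical helper; simplifying assumption: unshifted dyadic grids
    over {1, ..., Delta}^d.
    """
    return tuple((x - 1) >> level for x in p)


def cell_chain(p, delta):
    """Cells containing p at levels 0, 1, ..., ceil(log2(delta))."""
    top = (delta - 1).bit_length()    # smallest L with 2**L >= delta
    return [grid_cell(p, level) for level in range(top + 1)]


if __name__ == "__main__":
    chain = cell_chain((5, 12), delta=16)
    print(chain)   # [(4, 11), (2, 5), (1, 2), (0, 1), (0, 0)]
    # Nesting: the parent of a cell is obtained by halving its indices.
    assert all(tuple(x >> 1 for x in lo) == hi
               for lo, hi in zip(chain, chain[1:]))
```

At the top level, the whole space {1, . . . , ∆}^d falls into a single cell.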

In Section 5.1, we use a similar partition of the space into nested square grids, and we show that opening a facility in each cell of a subset of the cells defined in [64] leads to a constant-factor approximation of the facility location cost. Moreover, in Section 5.3, we propose an algorithm that maintains this cost sufficiently well in the dynamic geometric data stream model. In this way, we obtain a streaming algorithm for approximating the cost of the uniform facility location problem that improves the approximation factor of the best previous algorithm from O(log²(∆)) to a constant.

5.1 Definition of a Good Estimator

Let P := {p₁, . . . , p_n} be a set of n points from a discrete Euclidean space {1, . . . , ∆}^d, where d is a constant. In the streaming context, P will refer to the current point set, i.e., the set of points obtained after having applied an input sequence of insertions and deletions.
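The meaning of "current point set" can be illustrated by replaying a stream of updates explicitly. The Python sketch below does this with an ordinary set; it only illustrates the semantics, since a streaming algorithm cannot afford to store P itself. The (operation, point) encoding and the function name `apply_stream` are hypothetical.

```python
def apply_stream(updates):
    """Replay (op, point) updates and return the resulting point set P.

    Hypothetical encoding: op is "insert" or "delete"; point is a tuple in
    {1, ..., Delta}^d. Deleting an absent point is ignored here (an
    assumption; the dynamic stream model normally only deletes existing points).
    """
    current = set()
    for op, point in updates:
        if op == "insert":
            current.add(point)
        elif op == "delete":
            current.discard(point)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return current


if __name__ == "__main__":
    stream = [("insert", (1, 2)), ("insert", (3, 3)), ("delete", (1, 2))]
    print(apply_stream(stream))   # {(3, 3)}
```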

In this section, we will define a good estimator for the uniform Euclidean facility location problem (see Section 2.2 for a definition). Before we derive our estimator for the general case, we show how to deal with some special cases.

In document Approximation Techniques for Facility Location and Their Applications in Metric Embeddings (Page 86-89)