4.3 Quality and Complexity of the Kinetic Data Structure
4.3.2 Complexity
In the remainder of this chapter, we analyze our KDS in terms of its compactness, locality, responsiveness, and efficiency (see Section 2.4.3 for definitions of these attributes).
Lemma 4.2.1 already implies that our KDS is compact and local. Next, we prove that the requirements for responsiveness and efficiency are also fulfilled.
Lemma 4.3.9. Each update operation requires $O(\log^{d+1}(n) \cdot \log(nR))$ time and $O(\log(nR))$ status changes.
Proof. Due to Lemma 4.2.1, the time to update the event queue is $O(\log(nR))$. Except for algorithm Restore, all further steps require a constant number of range queries on $T_1$ and $T_2$. Due to Lemma 4.1.8, this requires $O(\log^{d+1}(n))$ time. Next, we examine the time needed for algorithm Restore. Consider the running time for restoring the invariant at points with radius $2^k$. The number of cubelets with radius $2^k$ in $C(p_h(t), 6 \cdot 2^{k+1})$ is $12^d$, where $p_h(t)$ is the point that triggered the event. The query for open or closed points in one cubelet can be answered by one range query on $T_1$ or $T_2$. Due to Lemma 4.1.8, this requires $O(\log^{d+1}(n))$ time. Afterwards, at most one point has to be inserted into and deleted from $T_1$ and $T_2$, which can be done in $O(\log^{d+1}(n))$ time according to Lemma 4.1.8.
By summation over all radii, we get a total running time of $O(\log^{d+1}(n) \cdot \log(nR))$.
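The bound for algorithm Restore can be spelled out as follows. This is a sketch under the assumption that $C(p, r)$ denotes an axis-parallel cube with half side length $r$ and that the cubelets form a grid inside it; both conventions come from earlier sections and are only assumed here. The number of radii, $O(\log(nR))$, is the one used in this proof.

```latex
% Cubelets per axis: ratio of the two radii (assumed to be half side lengths).
\[
  \frac{6 \cdot 2^{k+1}}{2^{k}} = 12
  \quad\Longrightarrow\quad
  12^{d} \text{ cubelets of radius } 2^{k} .
\]
% Per radius: one range query per cubelet plus a constant number of
% insertions and deletions, each costing O(log^{d+1}(n)) by Lemma 4.1.8.
\[
  \underbrace{O(\log(nR))}_{\text{radii}} \cdot
  \Bigl( 12^{d} \cdot O(\log^{d+1}(n)) + O(\log^{d+1}(n)) \Bigr)
  = O\bigl(\log^{d+1}(n) \cdot \log(nR)\bigr),
\]
% where the last step uses that d is a constant.
```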
There can exist at most one open facility with radius $2^k$ in a cubelet with radius $2^k$ because otherwise at least one open facility would violate the invariant. Hence, the number of open facilities with radius $2^k$ that are closed while running algorithm Restore is constant. Furthermore, we open at most one facility in each cubelet, so the number of opened facilities with radius $2^k$ is also constant. Due to the fact that we handle $O(\log(nR))$ radii, there are $O(\log(nR))$ status changes per event.
Since our KDS processes a total number of $O(n^2 \log^2(nR))$ events (see Lemma 4.2.1), the total processing time is bounded by $O(n^2 \log^{d+1}(n) \cdot \log^3(nR))$. To measure the efficiency as defined in Section 2.4.3, we use a result from [46]. In [46], Gao et al. investigated a problem in the KDS framework which is closely related to the mobile facility location problem. In particular, they provided a randomized KDS to maintain a set of centers among moving points in the plane such that, given a specified radius, all the points are covered by balls of the given radius centered at the chosen center points. Gao et al. showed that the size
of the center set is at most a constant factor larger than the minimum one. To prove the efficiency of their KDS, they showed that there is a set of n points moving linearly on the real line that forces any c-approximate cover to change $\Omega(n^2/c^2)$ times. With some minor modifications, their result can be transferred to the facility location problem.
Lemma 4.3.10. For any constant $c > 1$, there exists a set P of n points moving linearly on the real line such that any c-approximate solution to the mobile facility location problem for P undergoes $\Omega(n^2/c^2)$ status changes.
Proof. We assume that c is an integer and $n = 2cm$ with $m \ge 12c^2$ being also an integer.
Let P be the set of n moving points which is defined as follows. We partition P into m groups, each containing 2c points. Let the j-th point in the i-th group be denoted by $p_{i,j}$, where $0 \le i < m$ and $0 \le j < 2c$. The initial position of all the points in the i-th group is $i \cdot 2m$. Now, we let the point $p_{i,j}$ move with speed $j \cdot 2m$. Let $p_{i,j}(t)$ be the position of $p_{i,j}$ at the point of time t. Then, we have
\[ p_{i,j}(t) = (i + jt) \cdot 2m , \]
for $0 \le i < m$, $0 \le j < 2c$, and $t \ge 0$. Note that, in the time period from 0 to m, the points often change their ranks on the line. Afterwards, no two points will change their rank any more.
Let us consider the configuration of P at any point of time $t_1 := k + 3c/m$, for some integer $k < m$. At the point of time $t_1$, the location of the point $p_{i,j}$ is
\[ p_{i,j}(t_1) = \left( i + jk + \frac{3cj}{m} \right) \cdot 2m = 2(i + jk)m + 6cj . \]
Let $p_{i,j}$ and $p_{i',j'}$ be any two distinct points. In case that $i + jk \ne i' + j'k$, the distance between $p_{i,j}$ and $p_{i',j'}$ is
\[ |p_{i,j}(t_1) - p_{i',j'}(t_1)| > 2m - 12c^2 \ge 4c \]
at the point of time $t_1$. In case that $i + jk = i' + j'k$, we have $j' \ne j$ since $p_{i,j}$ and $p_{i',j'}$ are distinct. Then, it follows that the distance between $p_{i,j}$ and $p_{i',j'}$ is
\[ |p_{i,j}(t_1) - p_{i',j'}(t_1)| \ge 6c \]
at the point of time $t_1$. Thus, at the point of time $t_1$, no two points are within distance 4c of each other.
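The separation claim can be checked numerically for small parameters. The following is a minimal sketch, not part of the proof: the function names `positions` and `min_gap` are hypothetical, and exact rational arithmetic is used so that the time $t_1 = k + 3c/m$ introduces no rounding error.

```python
from fractions import Fraction


def positions(c, m, t):
    """Positions p_{i,j}(t) = (i + j*t) * 2m for 0 <= i < m and 0 <= j < 2c."""
    return [(i + j * t) * 2 * m for i in range(m) for j in range(2 * c)]


def min_gap(c, m, k):
    """Smallest pairwise distance among all points at time t1 = k + 3c/m."""
    t1 = k + Fraction(3 * c, m)
    pts = sorted(positions(c, m, t1))
    # After sorting, the closest pair is always a pair of neighbours.
    return min(b - a for a, b in zip(pts, pts[1:]))


if __name__ == "__main__":
    c, m = 2, 48  # m = 12 * c**2, the smallest value the proof allows
    for k in range(m):
        assert min_gap(c, m, k) > 4 * c
    print("all gaps exceed 4c for c =", c, "and m =", m)
```

Such a check covers only the chosen values of c and m; the general statement rests on the two case distinctions above.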
Assuming that the opening costs as well as the demands of all the points in P are 1, we next analyze an optimal solution for P at the point of time $t_1$. Since the distance between any two points in P is greater than 4c at the point of time $t_1$, the only optimal solution is to open a facility at each input point. This leads to a total cost of n. It follows that any c-approximate solution can have a cost of at most cn. Let us now consider an approximate solution in which only a $(1 - \alpha)$-fraction of the input points are open facilities. Then, the cost for this solution is more than $n - \alpha n + 4c \cdot \alpha n$ since the distance between any two points is greater than 4c at the point of time $t_1$. To ensure that the cost is at most cn, $\alpha$ must be smaller than $(c - 1)/(4c - 1)$. Since $1/4 > (c - 1)/(4c - 1)$ for $c > 1$, we obtain that any c-approximate solution must open more than $3n/4$ facilities.
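The bound on $\alpha$ follows from a short rearrangement, written out here as a worked step using only the quantities from the paragraph above.

```latex
% Cost of a solution that opens a (1 - alpha)-fraction of the points,
% combined with the c-approximation requirement (cost at most cn):
\[
  n - \alpha n + 4c \cdot \alpha n < cn
  \;\Longleftrightarrow\;
  \alpha (4c - 1) < c - 1
  \;\Longleftrightarrow\;
  \alpha < \frac{c - 1}{4c - 1} .
\]
% Comparison with 1/4, valid for every c > 1:
\[
  \frac{c - 1}{4c - 1} < \frac{1}{4}
  \;\Longleftrightarrow\;
  4c - 4 < 4c - 1 ,
\]
% so alpha < 1/4 and more than (1 - 1/4) n = 3n/4 points are open facilities.
```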
Next, we consider the configuration of P at any point of time $t_2 := k$, for some integer $k < m$. Since $p_{i,j}(t_2) = (i + jk) \cdot 2m$, where $0 \le i < m$, $0 \le j < 2c$, and $k < m$, each point is located at a position $2sm$ for some $s \in \{0, \dots, m + 2ck\}$. It follows that, at the point of time $t_2$, there exist at most $m + 2ck$ open facilities in an optimal solution, and the optimal facility location cost is at most $m + 2ck$. Thus, a c-approximate solution may have at most $c(m + 2ck)$ open facilities.
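The collapse of the point set onto few positions at integer times can also be illustrated for small parameters. This is a hedged sketch; the function name `distinct_positions` is hypothetical and the check only covers the values tried.

```python
def distinct_positions(c, m, k):
    """Number of distinct positions (i + j*k) * 2m occupied at time t2 = k."""
    return len({(i + j * k) * 2 * m for i in range(m) for j in range(2 * c)})


if __name__ == "__main__":
    c, m = 2, 48  # n = 2cm = 192 points
    for k in range(m):
        # All n points share at most m + 2ck positions at time k.
        assert distinct_positions(c, m, k) <= m + 2 * c * k
    print("position counts stay within m + 2ck")
```

For small k the n points occupy far fewer than n positions, which is what allows a cheap solution at time $t_2$.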
Hence, between the points of time $t_1$ and $t_2$, any c-approximate solution undergoes at least $3n/4 - c(m + 2ck) = n/4 - 2c^2 k$ status changes. Summing up over all $k \in \{0, \dots, K - 1\}$, the number of status changes is at least
\[ \sum_{k=0}^{K-1} \left( \frac{n}{4} - 2c^2 k \right) > \frac{Kn}{4} - c^2 K^2 . \]
Setting $K = n/(8c^2) < m$, we have established that the total number of changes is $\Omega(n^2/c^2)$.
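For completeness, the final step can be evaluated explicitly. The derivation below uses only $n = 2cm$ and the choice of K from the proof.

```latex
% Closed form of the sum and the strict lower bound:
\[
  \sum_{k=0}^{K-1} \left( \frac{n}{4} - 2c^2 k \right)
  = \frac{Kn}{4} - c^2 K (K - 1)
  > \frac{Kn}{4} - c^2 K^2 .
\]
% Plugging in K = n / (8 c^2):
\[
  \frac{Kn}{4} - c^2 K^2
  = \frac{n^2}{32 c^2} - \frac{n^2}{64 c^2}
  = \frac{n^2}{64 c^2}
  = \Omega\!\left( \frac{n^2}{c^2} \right) .
\]
% The choice of K is admissible because n = 2cm gives
\[
  K = \frac{2cm}{8 c^2} = \frac{m}{4c} < m .
\]
```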
Due to Lemma 4.3.10 and the fact that we process a total number of $O(n^2 \log^2(nR))$ events, our KDS has an efficiency value of $O(\log^2(nR))$. Hence, the KDS for the mobile facility location problem is efficient.
We summarize our results in the following theorem:
Theorem 4. Let P be a set of n independently moving points in $\mathbb{R}^d$, where d is a constant dimension. Then, there exists a deterministic KDS for the mobile facility location problem that maintains at any point of time t a set $F(t) \subseteq P(t)$ such that we have
FacLoc(P (t), F (t)) < (64d + 1) · FacLoc(P (t), F∗(t)) .
Let $R = (\max_{p_i \in P} f_i \cdot \max_{p_i \in P} d_i) / (\min_{p_i \in P} f_i \cdot \min_{p_i \in P} d_i)$, where $f_i$ and $d_i$ are the opening cost and the demand of a point $p_i$, respectively. Then, the KDS has a space requirement of $O(n(\log^d(n) + \log(nR)))$ and each event requires $O(\log(nR))$ status changes and $O(\log^{d+1}(n) \cdot \log(nR))$ update time. In case that the trajectories can be described by bounded-degree polynomials, the total number of updates is $O(n^2 \log^2(nR))$, which results in a total processing time of $O(n^2 \log^{d+1}(n) \cdot \log^3(nR))$. A flight plan update involves $O(\log(nR))$ certificates and requires $O(\log^2(nR))$ time.
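The parameter R from the theorem is straightforward to compute from the input. The helper below is a minimal sketch; the name `cost_demand_spread` is hypothetical, and the requirement that all opening costs and demands be positive is an assumption made here so that the quotient is well defined.

```python
def cost_demand_spread(opening_costs, demands):
    """Return R = (max f_i * max d_i) / (min f_i * min d_i).

    Assumes (not stated in the theorem) that all values are positive.
    """
    if min(opening_costs) <= 0 or min(demands) <= 0:
        raise ValueError("opening costs and demands must be positive")
    return (max(opening_costs) * max(demands)) / (
        min(opening_costs) * min(demands)
    )


if __name__ == "__main__":
    # Uniform costs and demands give R = 1, so log(nR) reduces to log(n).
    print(cost_demand_spread([1, 1, 1], [1, 1, 1]))
    print(cost_demand_spread([1, 2, 4], [1, 3]))
```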
5 Facility Location in Data Streams
This chapter deals with a constant-factor approximation algorithm for the cost of the uniform facility location problem over dynamic geometric data streams in a discrete Euclidean space $\{1, \dots, \Delta\}^d$, where d is a constant. The starting point of our algorithm is the work of Indyk [64]. It gives the best previous approach for approximating the cost of the uniform facility location problem over dynamic geometric data streams and guarantees an approximation factor of $O(\log^2(\Delta))$. In [64], Indyk defines a certain partition of the space into nested square grids and a set of cells in this partition such that the number of these cells gives an $O(\log(\Delta))$-approximation. During the approximation process to estimate the number of these cells, the algorithm of [64] loses another $O(\log(\Delta))$ factor.
In Section 5.1, we use a similar partition of the space into nested square grids, and we show that opening a facility in each cell of a subset of the cells defined in [64] leads to a constant-factor approximation of the facility location cost. Moreover, in Section 5.3, we propose an algorithm that maintains this cost sufficiently well in the dynamic geometric data stream model. In this way, we obtain a streaming algorithm for approximating the cost of the uniform facility location problem that improves the approximation factor of the best previous one from $O(\log^2(\Delta))$ to a constant.
5.1 Definition of a Good Estimator
Let $P := \{p_1, \dots, p_n\}$ be a set of n points from a discrete Euclidean space $\{1, \dots, \Delta\}^d$, where d is a constant. In the streaming context, P will refer to the current point set, i.e., the set of points obtained after having applied an input sequence of insertions and deletions.
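The meaning of "current point set" can be made concrete with a small reference implementation. This is only a sketch of the semantics of P, not a streaming algorithm, since it stores every point explicitly. The update encoding as pairs `(op, point)`, the function name `apply_stream`, and the use of a multiset are assumptions made for illustration.

```python
from collections import Counter


def apply_stream(updates):
    """Return the current point multiset after a sequence of updates.

    Each update is a pair (op, point) with op in {"insert", "delete"} and
    point a tuple of d integer coordinates (a hypothetical encoding).
    """
    current = Counter()
    for op, point in updates:
        if op == "insert":
            current[point] += 1
        elif op == "delete":
            if current[point] == 0:
                raise ValueError(f"deleting absent point {point}")
            current[point] -= 1
            if current[point] == 0:
                del current[point]  # keep only points that are present
        else:
            raise ValueError(f"unknown operation {op!r}")
    return current


if __name__ == "__main__":
    stream = [("insert", (1, 2)), ("insert", (3, 4)), ("delete", (1, 2))]
    print(sorted(apply_stream(stream)))
```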
In this section, we will define a good estimator for the uniform Euclidean facility location problem (see Section 2.2 for a definition). Before we derive our estimator for the general case, we show how to deal with some special cases.