An optimal randomized algorithm for d-variate zonoid depth


www.elsevier.com/locate/comgeo


Pat Morin

School of Computer Science, Carleton University, Canada

Received 5 September 2006; received in revised form 15 December 2006; accepted 18 December 2006; available online 12 January 2007

Communicated by T. Chan

Abstract

A randomized linear expected-time algorithm for computing the zonoid depth [R. Dyckerhoff, G. Koshevoy, K. Mosler, Zonoid data depth: Theory and computation, in: A. Prat (Ed.), COMPSTAT 1996—Proceedings in Computational Statistics, Physica-Verlag, Heidelberg, 1996, pp. 235–240; K. Mosler, Multivariate Dispersion, Central Regions and Depth. The Lift Zonoid Approach, Lecture Notes in Statistics, vol. 165, Springer-Verlag, New York, 2002] of a point with respect to a fixed dimensional point set is presented.

Keywords: Robust statistics; Data depth; Zonoids; Zonotopes; Support vector machines

1. Introduction

Let $S$ be a set of $n$ points in $\mathbb{R}^d$. For a real number $k \ge 1$, the $k$-zonoid of $S$ is defined as

$$Z_k(S) = \left\{ \sum_{p \in S} \lambda_p p \;:\; 0 \le \lambda_p \le 1/k \text{ for all } p \in S \text{ and } \sum_{p \in S} \lambda_p = 1 \right\}$$

[8,19]. Notice that, for $k = 1$, the 1-zonoid of $S$ is the convex hull of $S$, i.e., $Z_1(S) = \operatorname{conv}(S)$. As $k$ increases, $Z_k(S)$ becomes smaller and smaller until the limiting case $k = n$, for which $Z_n(S)$ consists of a single point, the mean of $S$.

The zonoid depth of a point $p \in \operatorname{conv}(S)$ with respect to $S$ is defined as

$$Z(p, S) = \sup\{k : p \in Z_k(S)\},$$

and is a real number in the interval $[1, n]$.
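For intuition, the one-dimensional case ($d = 1$) can be computed directly. The Python sketch below is only an illustration, not the algorithm of this paper. It assumes that, for real data, $Z_k(S)$ is the interval whose endpoints come from the greedy rule stated as equation (1) in Section 3, and, because $Z_k(S)$ shrinks as $k$ grows, it locates $Z(p, S)$ by bisection on $k$. The function names and the fixed number of bisection steps are hypothetical choices.

```python
import math


def zonoid_interval(values, k):
    """Endpoints (lo, hi) of Z_k(S) for 1-D data S = values, 1 <= k <= n."""
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    m = math.floor(k)        # points receiving the full coefficient 1/k
    leftover = 1.0 - m / k   # coefficient of the next point (0 if k is an integer)

    def extreme(ordered):
        total = sum(ordered[:m]) / k
        if leftover > 0:     # only possible when m < k <= n, so ordered[m] exists
            total += leftover * ordered[m]
        return total

    hi = extreme(sorted(values, reverse=True))  # largest values first
    lo = extreme(sorted(values))                # smallest values first
    return lo, hi


def zonoid_depth_1d(p, values):
    """Approximate Z(p, S) = sup{k : p in Z_k(S)} by bisection on k."""
    n = len(values)

    def inside(k):
        lo, hi = zonoid_interval(values, k)
        return lo <= p <= hi

    if not inside(1.0):
        return None          # p is outside conv(S), where Z(p, S) is undefined
    if inside(float(n)):
        return float(n)      # p is the mean of S
    a, b = 1.0, float(n)     # invariant: p in Z_a(S) and p not in Z_b(S)
    for _ in range(100):     # each step halves the interval [a, b]
        mid = (a + b) / 2
        if inside(mid):
            a = mid
        else:
            b = mid
    return a
```

For $S = \{0, 1, 2, 3\}$ and $p = 1$, the lower endpoint of $Z_k(S)$ equals $1$ at $k = 3$ and exceeds it for larger $k$, so the sketch returns a value very close to 3.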

Dyckerhoff et al. [8] give an algorithm to compute $Z(p, S)$ by solving a linear program in the variables $\{\lambda_p : p \in S\}$.

To obtain an efficient algorithm they make use of the fact that most of the constraints on the λ’s are independent of S. The worst-case running time of their algorithm is unclear.

Bern and Eppstein [1] study zonoids (also called reduced convex hulls) in the context of support vector machines used in machine learning. Among other things, they solve a more general problem than that of zonoid depth: Given two sets $S_1$ and $S_2$ in $\mathbb{R}^d$, compute the minimum value $k$ such that $Z_k(S_1) \cap Z_k(S_2)$ is non-empty. Their algorithm has a running time of $O(n (L d \log n)^{O(1)})$, where $L$ is the number of bits used to describe the points in $S_1$ and $S_2$.

Their algorithm uses Khachiyan's ellipsoid method for linear programming [13] to exploit the fact that, for a given direction $v$, it is easy (see Section 3) to test whether there is a hyperplane orthogonal to $v$ that separates $Z_k(S_1)$ and $Z_k(S_2)$.
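To see why this test is easy, note that a hyperplane orthogonal to $v$ strictly separates $Z_k(S_1)$ and $Z_k(S_2)$ exactly when the interval of values $v \cdot x$ over one zonoid lies strictly before the interval over the other. Each interval endpoint is the maximum of a linear function over a $k$-zonoid, which the greedy rule of equation (1) in Section 3 gives directly. A minimal Python sketch under these assumptions (strict separation, $1 \le k \le \min(|S_1|, |S_2|)$, hypothetical function names):

```python
import math


def support(points, k, v):
    """max{v . x : x in Z_k(points)}: weight 1/k on the top projections."""
    proj = sorted((sum(a * b for a, b in zip(p, v)) for p in points), reverse=True)
    m = math.floor(k)
    value = sum(proj[:m]) / k
    leftover = 1 - m / k
    if leftover > 0:         # fractional k: the next projection gets the rest
        value += leftover * proj[m]
    return value


def separated_along(S1, S2, k, v):
    """True if a hyperplane orthogonal to v strictly separates Z_k(S1) and Z_k(S2)."""
    neg_v = [-c for c in v]
    hi1, lo1 = support(S1, k, v), -support(S1, k, neg_v)
    hi2, lo2 = support(S2, k, v), -support(S2, k, neg_v)
    return hi1 < lo2 or hi2 < lo1
```

For $S_1 = \{(0,0), (2,0)\}$ and $S_2 = \{(1,0), (3,0)\}$ the convex hulls ($k = 1$) overlap along $v = (1, 0)$, but for $k = 2$ each zonoid shrinks to its mean and the test succeeds.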

The zonoid depth decision problem asks, given $p$, $k$ and $S$, whether $p \in Z_k(S)$. Ogryczak and Tamir [20] show that the dual of the zonoid depth decision problem can be reduced to a linear multiple-choice knapsack problem that can be solved in $O(n)$ time using the algorithms of Zemel [23] or Dyer [9]. Zemel and Dyer's algorithms are, in turn, modifications of Megiddo's $O(n)$ time algorithm for linear programming in fixed dimensions [17,18]. While these results optimally solve the zonoid depth decision problem, further machinery is needed to turn them into an algorithm for computing the zonoid depth of $p$ with respect to $S$.

Gopala and Morin [12] consider algorithms for bivariate ($d = 2$) zonoid depth and give a randomized $O(n)$ expected time algorithm for computing $Z(p, S)$ when $p$ and $S$ are in $\mathbb{R}^2$. Their algorithm is a combination of two techniques, namely a prune-and-search algorithm due to Lo et al. [15] for searching the $k$-level of a line arrangement and an optimization method due to Chan [3] for efficiently converting decision algorithms into optimization algorithms. While the latter technique extends efficiently into arbitrary (constant) dimensions [4], the former technique, unfortunately, does not.

The current paper extends and bridges the above results by giving an $O(n)$ expected time algorithm to compute $Z(p, S)$ when $p$ and $S$ are in $\mathbb{R}^d$ for any constant dimension $d$. The algorithm uses a recent method, due to Chan [4], for solving linear programs with many constraints that are defined implicitly by a small number of objects. Besides being the first linear-time algorithm for solving the zonoid depth problem in constant dimensions, the current results are interesting for two other reasons:

1. Zonoid depth is one of many definitions of depth proposed in the robust statistics literature [14]. Perhaps the gold standard in this regard is Tukey (halfspace) depth [22]:

$$T(p, S) = \min\{|h \cap S| : h \text{ is a closed halfspace containing } p\}.$$

Tukey depth and zonoid depth have an interesting feature in common: under duality, the combinatorial structure of the depth-$k$ contour is determined by the $k$-level and the $(n - k + 1)$-level of a set of hyperplanes. The structure of $k$-levels has been extensively studied by combinatorial geometers [16, Chapter 11], although our understanding of their complexity is still not complete, even in two dimensions.

The current result shows a divergence in the computational complexity of Tukey and zonoid depth. In constant dimensions $d \ge 3$ the fastest algorithms for computing the Tukey depth of a point have running times of $O(n^{d-1})$ [5], whereas the current result shows that zonoid depth can be computed in $O(n)$ expected time in any constant dimension $d$. When the dimension grows arbitrarily large, the situation for Tukey depth is even worse: computing $T(p, S)$ is NP-hard in general [2], while the result of Bern and Eppstein [1] yields a polynomial-time algorithm for computing $Z(p, S)$ in any dimension. Thus, together these results show that zonoid depth is computationally more tractable than Tukey depth in both large and small dimensions.

2. Our algorithm makes use of Chan's recent technique for solving implicit linear programs in small dimensions [4]. Interestingly, this technique was introduced in order to solve a problem related to Tukey depth, namely the problem of finding a point $p$ that maximizes $T(p, S)$. Unfortunately, the resulting algorithm runs in $O(n \log n + n^{d-1})$ time, limiting its usefulness for dimensions $d \ge 3$.¹ Indeed, although Chan's technique itself does not asymptotically increase the running time as the dimension $d$ increases, it seems that most applications of the technique either break down or have quickly increasing running times as $d$ increases.² The current result is therefore an atypical example that illustrates the full utility of this extremely powerful technique.
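To make the definition of Tukey depth $T(p, S)$ concrete, here is a brute-force planar sketch in Python; it is not one of the algorithms cited above. It relies on two observations: a closed halfplane containing $p$ can be translated until $p$ lies on its boundary without gaining points, and as that boundary rotates about $p$ the count changes only at directions perpendicular to some $q - p$, where it can only be larger than nearby. So it samples one direction inside each open arc between such critical directions, in $O(n^2)$ time. The function name and the $10^{-12}$ threshold for merging repeated directions are hypothetical choices.

```python
import math


def tukey_depth_2d(p, S):
    """Brute-force T(p, S) for a point p and a list S of points in the plane."""
    px, py = p
    vecs = [(qx - px, qy - py) for qx, qy in S]
    # Critical directions u: those perpendicular to some nonzero q - p.
    crit = []
    for vx, vy in vecs:
        if vx == 0 and vy == 0:
            continue                       # q = p lies in every such halfplane
        a = math.atan2(vy, vx)
        crit.append((a + math.pi / 2) % (2 * math.pi))
        crit.append((a - math.pi / 2) % (2 * math.pi))
    if not crit:
        return len(S)                      # every point of S coincides with p
    crit.sort()
    crit.append(crit[0] + 2 * math.pi)     # close the circle of directions
    best = len(S)
    for a, b in zip(crit, crit[1:]):
        if b - a < 1e-12:
            continue                       # skip repeated critical directions
        mid = (a + b) / 2                  # a direction strictly inside the arc
        ux, uy = math.cos(mid), math.sin(mid)
        # Closed halfplane {q : (q - p) . u >= 0}, with p on its boundary.
        count = sum(1 for vx, vy in vecs if vx * ux + vy * uy >= 0)
        best = min(best, count)
    return best
```

For the four corners of the unit square, the center has depth 2, a corner has depth 1, and a point outside the square has depth 0.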

In the following, all points, vectors, and hyperplanes are assumed to live in $\mathbb{R}^d$, and $\mathcal{H}^d$ denotes the set of all hyperplanes in $\mathbb{R}^d$. The notation $x_i$ denotes the $i$th coordinate of the point $x$. We use the $\cdot$ notation to denote the inner product of two points/vectors, i.e., $x \cdot y = \sum_{i=1}^{d} x_i y_i$. For a set $S$ of $n$ points and a non-zero vector $r$, $S^r_1, \dots, S^r_n$ is the sequence of elements of $S$ ordered by decreasing projections onto $r$, i.e., $S^r_i \cdot r \ge S^r_{i+1} \cdot r$ for all $1 \le i \le n - 1$.

¹ In fact, this running time is probably optimal. See Chan [4, Section 1.4] for details.
² One notable exception is parametric minimum spanning trees [11].


For a point $x$ and a hyperplane $h$, we denote by $x \downarrow h$ the $d$th coordinate of the vertical projection of $x$ onto $h$ (the height of $x$ when dropped onto $h$). For a set $H$ of $n$ hyperplanes, let $H^x_i$ be the $i$th hyperplane in $H$ encountered by a downward vertical ray originating at $(x_1, \dots, x_{d-1}, +\infty)$. For ease of notation we use the shorthand $H^x_{-i} = H^x_{|H|-i+1}$. For $i > |H|$ we use the convention that $H^x_i$ (respectively, $H^x_{-i}$) is the "horizontal hyperplane at infinity" $\{x : x_d = -\infty\}$ (respectively, $\{x : x_d = +\infty\}$).

The remainder of this paper is organized as follows: Section 2 reviews Chan’s generalized optimization technique. Section 3 discusses properties of zonoids in primal and dual space. Section 4 presents an algorithm to answer the zonoid depth decision problem. Finally, Section 5 describes our algorithm for computing Z(p, S).

2. Chan’s generalized optimization technique

Chan [4] used the following theorem to provide an $O(n \log n)$ time algorithm for maximum Tukey depth.³ In the following, and throughout the remainder of the paper, we use the shorthand $\cap S$ to denote $\bigcap_{s \in S} s$.

Theorem 1. (See Chan [4].) Let $\mathcal{H}$ denote the set of all halfspaces in $\mathbb{R}^d$, let $\mathcal{P}$ denote the set of all possible inputs to some problem, let $f : \mathcal{P} \to 2^{\mathcal{H}}$ be any function mapping problem inputs to sets of halfspaces, let $g : \mathbb{R}^d \to \mathbb{R}$ be any linear objective function, and let $D(n) = \Omega(n^{\epsilon})$, for some $\epsilon > 0$, be a non-decreasing function of $n$. Suppose that $f$ and $g$ satisfy:

0. Given inputs $P_1, \dots, P_d \in \mathcal{P}$, each of constant size, a point $p \in \cap(f(P_1) \cup \dots \cup f(P_d))$ maximizing $g(p)$ can be found in constant time.

1. Given a point $p \in \mathbb{R}^d$ and an input $P \in \mathcal{P}$ of size $n$, there exists a $D(n)$ time algorithm to determine whether $p \in \cap f(P)$.

2. There exist constants $\alpha < 1$ and $r$ such that, for any input $P \in \mathcal{P}$ of size $n$, it is possible to compute, in $D(n)$ time, inputs $P_1, \dots, P_r$, each of size at most $\alpha n$, such that $\cap f(P) = \cap(f(P_1) \cup \dots \cup f(P_r))$.

Then there exists a randomized $O(D(n))$ expected time algorithm to compute, for any input $P \in \mathcal{P}$ of size $n$, a point $p \in \cap f(P)$ that maximizes $g(p)$.

It is worth noting that the codomain of the function $f$ may contain infinite sets. That is, it is acceptable (and common) to have inputs $P \in \mathcal{P}$ that generate an infinite number of constraints, i.e., $|f(P)| = \infty$.

3. Properties of primal and dual zonoids

The $k$-zonoid $Z_k(S)$ is a convex polytope. The extreme-most vertex of $Z_k(S)$ in direction $x$ can be obtained as a convex combination of the $\lceil k \rceil$ extreme-most points of $S$ in direction $x$. More precisely,

$$\arg\max_{p} \{p \cdot x : p \in Z_k(S)\} = \sum_{i=1}^{\lfloor k \rfloor} \frac{1}{k} S^x_i + \left(1 - \frac{\lfloor k \rfloor}{k}\right) S^x_{\lceil k \rceil} \qquad (1)$$

[1,12]. Intuitively, we assign the maximum allowable coefficient ($1/k$) to each of the $\lfloor k \rfloor$ extreme-most points, and the "leftover" $(1 - \lfloor k \rfloor / k)$ is assigned to the next point.
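Equation (1) is a greedy rule, much like a fractional knapsack: sort $S$ by projection onto $x$, give $1/k$ to each of the first $\lfloor k \rfloor$ points and the remainder to the next one. A minimal Python sketch, assuming $1 \le k \le |S|$ and a direction $x$ with no ties among the projections (with ties the maximizer need not be unique); `extreme_vertex` is a hypothetical name. It also returns the coefficient vector, so one can check it against the constraints in the definition of $Z_k(S)$.

```python
import math


def extreme_vertex(S, k, x):
    """Point of Z_k(S) maximizing p . x, via equation (1); returns (point, coefficients)."""
    n, dim = len(S), len(S[0])
    # S^x_1, S^x_2, ...: indices of S ordered by decreasing projection onto x.
    order = sorted(range(n), key=lambda i: -sum(a * b for a, b in zip(S[i], x)))
    m = math.floor(k)
    lam = [0.0] * n
    for i in order[:m]:
        lam[i] = 1 / k                   # full coefficient for the top floor(k) points
    if m < k:
        lam[order[m]] = 1 - m / k        # leftover goes to S^x_{ceil(k)}
    point = tuple(sum(lam[i] * S[i][j] for i in range(n)) for j in range(dim))
    return point, lam
```

On the corners of the unit square with $x = (1, 2)$, $k = 1$ returns the corner $(1, 1)$, $k = 2$ returns the midpoint $(0.5, 1)$ of the top edge, and $k = 1.5$ returns $(2/3, 1)$.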

We wish to arrive at a situation in which we can apply Theorem 1, and this is best done by working in the dual. Consider the point-hyperplane duality function $\varphi$ given by

$$\varphi(x) = \{y \in \mathbb{R}^d : y_d = x_1 y_1 + \dots + x_{d-1} y_{d-1} - x_d\}$$

when $x$ is a point in $\mathbb{R}^d$, and

$$\varphi(X) = \{\varphi(x) : x \in X\}$$

when $X$ is a subset of $\mathbb{R}^d$. See Edelsbrunner's book [10] for properties of this duality.
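The duality $\varphi$ is easy to compute with. The Python sketch below assumes every hyperplane is non-vertical and stores it by hypothetical coefficients $(a, b)$ meaning $y_d = a \cdot (y_1, \dots, y_{d-1}) + b$, so that $x \downarrow \varphi(p) = p_1 x_1 + \dots + p_{d-1} x_{d-1} - p_d$. It also checks one standard property of this duality: $p$ lies above $\varphi(q)$ if and only if $q$ lies above $\varphi(p)$, because both conditions say $p_d + q_d > \sum_{i < d} p_i q_i$.

```python
def phi(p):
    """Dual hyperplane of p, stored as (a, b) meaning y_d = a . (y_1..y_{d-1}) + b."""
    return tuple(p[:-1]), -p[-1]


def drop(x, h):
    """x↓h: the height of hyperplane h = (a, b) above (x_1, ..., x_{d-1})."""
    a, b = h
    return sum(ai * xi for ai, xi in zip(a, x[:-1])) + b


def above(x, h):
    """True if x lies strictly above the hyperplane h."""
    return x[-1] > drop(x, h)
```

For example, $\varphi((3, 1))$ is the line $y_2 = 3 y_1 - 1$, whose height above $y_1 = 2$ is $5$.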


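Let $H = \varphi(S)$. Then, under this duality, the dual $k$-zonoid $\varphi(Z_k(S))$ is the set of all hyperplanes in $\mathbb{R}^d$ that intersect neither of two convex sets $A_k(S)$ and $B_k(S)$. That is,

$$\varphi(Z_k(S)) = \{h \in \mathcal{H}^d : h \cap (A_k(S) \cup B_k(S)) = \emptyset\},$$

where

$$A_k(S) = \left\{x \in \mathbb{R}^d : x_d \ge \sum_{i=1}^{\lfloor k \rfloor} \frac{1}{k}\, x \downarrow H^x_i + \left(1 - \frac{\lfloor k \rfloor}{k}\right) x \downarrow H^x_{\lceil k \rceil}\right\} \qquad (2)$$

and

$$B_k(S) = \left\{x \in \mathbb{R}^d : x_d \le \sum_{i=1}^{\lfloor k \rfloor} \frac{1}{k}\, x \downarrow H^x_{-i} + \left(1 - \frac{\lfloor k \rfloor}{k}\right) x \downarrow H^x_{-\lceil k \rceil}\right\}. \qquad (3)$$

The definitions of $A_k(S)$ and $B_k(S)$ follow from (1) and the duality $\varphi$. The two sets $A_k(S)$ and $B_k(S)$ are convex, unbounded from above and below, respectively, and piecewise linear. Indeed, the linear pieces of $A_k(S)$ (respectively, $B_k(S)$) are in correspondence with the linear pieces of the $k$-level (respectively, the $(n - k + 1)$-level) of the hyperplanes in $H$.⁴ Thus, $A_k(S)$ and $B_k(S)$ are (unbounded) convex polyhedra that are implicitly defined by the hyperplanes in $H$, and it is these implicit "linear programs" that will ultimately allow us to apply Theorem 1.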
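Testing membership in $A_k(S)$ and $B_k(S)$ only needs the heights of the dual hyperplanes above one point. The Python sketch below sorts these heights, which costs $O(n \log n)$; linear time needs selection instead, as in the proof of Theorem 2. It assumes $1 \le k \le n$ and uses hypothetical function names.

```python
import math


def dual_heights(x, S):
    """Heights x↓phi(p) = p_1 x_1 + ... + p_{d-1} x_{d-1} - p_d for all p in S."""
    return [sum(a * b for a, b in zip(p[:-1], x[:-1])) - p[-1] for p in S]


def greedy_level(heights, k):
    """sum_{i <= floor(k)} h_i / k + (1 - floor(k)/k) h_{ceil(k)}, heights in given order."""
    m = math.floor(k)
    value = sum(heights[:m]) / k
    if m < k:
        value += (1 - m / k) * heights[m]
    return value


def in_A(x, S, k):
    """x in A_k(S), equation (2): H^x_1, H^x_2, ... are the highest hyperplanes at x."""
    return x[-1] >= greedy_level(sorted(dual_heights(x, S), reverse=True), k)


def in_B(x, S, k):
    """x in B_k(S), equation (3): H^x_{-1}, H^x_{-2}, ... are the lowest hyperplanes at x."""
    return x[-1] <= greedy_level(sorted(dual_heights(x, S)), k)
```

For $S = \{(0,0), (1,1), (2,0)\}$ the dual lines have heights $0, 0, 2$ above $x_1 = 1$; with $k = n = 3$ both boundaries meet at height $2/3$, the dual of the mean of $S$.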

4. The decision algorithm

Next we consider the following decision problem: Given a point set $S$ and a real number $k \ge 1$, is the origin contained in $Z_k(S)$? By translation, a solution to this problem allows us to test if an arbitrary point $p \in \mathbb{R}^d$ is contained in $Z_k(S)$.

One approach to solving this problem is to compute the intersection of $Z_k(S)$ with the vertical line $\{x \in \mathbb{R}^d : x_1 = \dots = x_{d-1} = 0\}$ through the origin and then check whether this intersection contains the origin.

Under the duality $\varphi$, the above strategy is equivalent to finding the lowest point on $A_k(S)$ and the highest point on $B_k(S)$ and checking that these points lie above and below, respectively, the hyperplane $\{x \in \mathbb{R}^d : x_d = 0\}$. In the remainder, we focus on determining the lowest point in $A_k(S)$; finding the highest point in $B_k(S)$ is done in a symmetric manner. However, before we can proceed, we need to define a slightly more general problem involving weights.

Let $S$ be a set of $n$ points in $\mathbb{R}^d$ and let $w : S \to \mathbb{N}$ be a function assigning positive integer weights to the elements of $S$. We denote by $S^w$ the multiset in which each element $p \in S$ occurs $w(p)$ times. The $w$-weighted zonoid $Z_k(S, w)$ is simply the $k$-zonoid of the multiset $S^w$, i.e., $Z_k(S, w) = Z_k(S^w)$. As with standard zonoids, the weighted zonoid $Z_k(S, w)$ dualizes to the set of all hyperplanes that intersect neither of two convex regions $A_k(S, w)$ and $B_k(S, w)$, where $A_k(S, w) = A_k(S^w)$ and $B_k(S, w) = B_k(S^w)$.

This definition of weighted zonoids allows us to naturally define subproblems. For a subset $C \subseteq S$, define the total weight

$$w(C) = \sum_{p \in C} w(p)$$

and the weighted mean

$$\mu(C) = \frac{1}{w(C)} \sum_{p \in C} p \times w(p).$$

The contraction of $(S, w)$ by $C$ is obtained by replacing the points of $C$ by their weighted average, $\mu(C)$. More precisely, the contraction of $(S, w)$ by $C$ is the pair $(R, v)$ where

$$R = (S \setminus C) \cup \{\mu(C)\}$$

and

$$v(p) = \begin{cases} w(p) & \text{if } p \in S \setminus C, \\ w(C) & \text{if } p = \mu(C). \end{cases}$$

⁴ When $k$ is not an integer, there is a bijection between the pieces of $A_k(S)$ and the $k$-level of $H$. When $k$ is an integer there is an injection

The following lemma shows that contraction can only shrink zonoids:

Lemma 1. If $(R, v)$ is a contraction of $(S, w)$ by $C$, then $Z_k(R, v) \subseteq Z_k(S, w)$.

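Proof. Let $x$ be any point in $Z_k(R, v)$. By the definition of zonoids, $x = \sum_{q \in R^v} \lambda_q q$ for some coefficients with $0 \le \lambda_q \le 1/k$ and $\sum_{q \in R^v} \lambda_q = 1$. The $w(C)$ copies of $\mu(C)$ in $R^v$ are identical points, so we may replace each of their coefficients by their average, $\lambda_{\mu(C)}$, without changing $x$ or violating the constraints. Since $(R \setminus \{\mu(C)\})^v = (S \setminus C)^w$ and $w(C)\,\mu(C) = \sum_{p \in C^w} p$, we obtain

$$x = \sum_{p \in (R \setminus \{\mu(C)\})^v} \lambda_p p + \sum_{q \in \{\mu(C)\}^v} \lambda_{\mu(C)} q = \sum_{p \in (S \setminus C)^w} \lambda_p p + \sum_{p \in C^w} \lambda_{\mu(C)} p \in Z_k(S, w),$$

since the coefficients on the right still lie in $[0, 1/k]$ and sum to 1, as required. □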
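Contraction and Lemma 1 can be checked numerically. The Python sketch below stores $(S, w)$ as a dictionary from points to weights, uses exact `fractions.Fraction` arithmetic, and compares support functions $\max\{u \cdot x : x \in Z_k\}$ of the two weighted zonoids using the greedy rule (1) on the multisets. Since $Z_k(R, v) \subseteq Z_k(S, w)$, the support function can only decrease in every direction $u$. This is a sanity check in a few directions, not a proof, and the function names are hypothetical.

```python
import math
from fractions import Fraction


def contract(w, C):
    """Contraction of (S, w) by C; w maps points (tuples) to positive integer weights."""
    wC = sum(w[p] for p in C)
    dim = len(next(iter(C)))
    mu = tuple(sum(Fraction(p[j]) * w[p] for p in C) / wC for j in range(dim))
    v = {p: wt for p, wt in w.items() if p not in C}
    v[mu] = v.get(mu, 0) + wC        # merge with an equal point of S \ C, if any
    return v


def support(w, k, u):
    """max{u . x : x in Z_k(S, w)}, by the greedy rule (1) on the multiset S^w."""
    proj = sorted((sum(Fraction(a) * b for a, b in zip(p, u))
                   for p, wt in w.items() for _ in range(wt)), reverse=True)
    k = Fraction(k)
    m = math.floor(k)
    value = sum(proj[:m]) / k
    if m < k:
        value += (1 - m / k) * proj[m]
    return value
```

Contracting the right edge $\{(4,0), (4,4)\}$ of a square into $(4, 2)$ with weight 2 leaves the support in direction $(1, 0)$ unchanged but lowers the support in direction $(0, 1)$ from 4 to 3 when $k = 2$.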

We now have all the tools required to apply Theorem 1 to solve our decision problem.

Theorem 2. Given a set $S$ of $n$ points in $\mathbb{R}^d$ and a function $w : S \to \mathbb{N}$ that is computable in constant time, the point $x \in A_k(S, w)$ such that $x_d$ is minimum can be found in $O(n)$ expected time.

Proof. Let $f$ be the function that maps the pair $(S, w)$ onto a set of halfspaces whose intersection is $A_k(S, w)$, and let the objective function be $g(x) = -x_d$, so that maximizing $g$ minimizes $x_d$. We need to show that the functions $f$ and $g$ satisfy conditions 0–2 of Theorem 1.

To satisfy condition 0 of Theorem 1, we can enumerate all the linear constraints generated by each of the $d$ subproblems and use any linear programming algorithm to find a point $x$ that satisfies all constraints and such that $x_d$ is minimum. There are only $d$ subproblems, each of constant size, so this step takes constant time, as required.

To satisfy condition 1 of Theorem 1, we observe that testing if $x \in A_k(S, w)$ simply involves checking if $x$ satisfies (2). Let $H = \varphi(S)$. This check can be accomplished by using a $D(n) = O(n)$ time weighted selection algorithm [7, Exercise 9-2] to compute the smallest index $t$ and the hyperplanes $H^x_1, \dots, H^x_t$ such that $\sum_{i=1}^{t} w(\varphi^{-1}(H^x_i)) \ge k$. Once this is done, we need only check (2), which, in the weighted setting, becomes

$$x_d \ge \sum_{i=1}^{t-1} \frac{1}{k} (x \downarrow H^x_i) \times w(\varphi^{-1}(H^x_i)) + \frac{1}{k} (x \downarrow H^x_t) \times \left(k - \sum_{i=1}^{t-1} w(\varphi^{-1}(H^x_i))\right).$$
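The weighted test reads directly as code. In this Python sketch, sorting the dual heights stands in for the linear-time weighted selection algorithm of [7], so it runs in $O(n \log n)$ rather than $D(n) = O(n)$ time; the name `in_A_weighted` and the dictionary representation of $(S, w)$ are hypothetical. It assumes $k \le w(S)$.

```python
def in_A_weighted(x, w, k):
    """Check x in A_k(S, w) using the weighted form of (2).

    w maps the points p of S (tuples in R^d) to positive integer weights.
    """
    xs = x[:-1]
    # (x↓phi(p), w(p)) pairs, highest dual hyperplane first: H^x_1, H^x_2, ...
    items = sorted(((sum(a * b for a, b in zip(p[:-1], xs)) - p[-1], wt)
                    for p, wt in w.items()), reverse=True)
    rhs, acc = 0.0, 0                # acc = total weight of H^x_1, ..., H^x_{t-1}
    for height, wt in items:
        if acc + wt >= k:            # this is H^x_t: first prefix weight reaching k
            rhs += height * (k - acc) / k
            return x[-1] >= rhs
        rhs += height * wt / k
        acc += wt
    raise ValueError("k exceeds the total weight w(S)")
```

With unit weights this reduces to (2); with $w((2,0)) = 2$ and $w((0,0)) = 1$ and $k = 2$, the threshold above $x_1 = 1$ is the height 2 of the doubled line alone.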

To satisfy condition 2 of Theorem 1, we make use of cuttings [16, Section 6.5]. In particular, we use the fact that, in $O(n)$ time, it is possible to partition $\mathbb{R}^d$ into $r = O(1)$ simplices $\Delta_1, \dots, \Delta_r$ such that the interior of each simplex is intersected by at most $n/2$ of the hyperplanes in $\varphi(S)$. For each simplex $\Delta_i$ we create a subproblem $(S_i, w_i)$ as follows: Let $C_i \subseteq S$ contain every point $p \in S$ such that $\varphi(p)$ is above the interior of $\Delta_i$. We first construct the pair $(T_i, w_i)$ by contracting $(S, w)$ by $C_i$. Next, we obtain $S_i$ by removing from $T_i$ every point $p$ such that $\varphi(p)$ is below the interior of $\Delta_i$. The subproblems $(S_i, w_i)$ for $1 \le i \le r$ that we obtain in this manner are each of size at most $n/2 + 2$.
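Given one simplex $\Delta_i$ of the cutting, building $(S_i, w_i)$ needs only the position of each dual hyperplane relative to $\Delta_i$. Because $x \downarrow \varphi(p) - x_d$ is affine in $x$, $\varphi(p)$ lies above the interior of $\Delta_i$ exactly when it is on or above all $d + 1$ vertices, and similarly for below. The Python sketch below assumes the cutting is already given (constructing it is the hard part and is not shown), represents $(S, w)$ as a dictionary, and uses hypothetical names; it contracts the points whose duals lie above $\Delta_i$ and deletes those whose duals lie below.

```python
from fractions import Fraction


def dual_height(p, x):
    """x↓phi(p) = p_1 x_1 + ... + p_{d-1} x_{d-1} - p_d."""
    return sum(Fraction(a) * b for a, b in zip(p[:-1], x[:-1])) - p[-1]


def position(p, simplex):
    """'above', 'below' or 'crossing': phi(p) relative to the interior of the simplex."""
    gaps = [dual_height(p, v) - v[-1] for v in simplex]   # simplex: its d+1 vertices
    if all(g >= 0 for g in gaps):
        return "above"
    if all(g <= 0 for g in gaps):
        return "below"
    return "crossing"


def subproblem(w, simplex):
    """(S_i, w_i): contract points whose duals lie above, delete those below."""
    pos = {p: position(p, simplex) for p in w}
    above = [p for p in w if pos[p] == "above"]
    sub = {p: wt for p, wt in w.items() if pos[p] == "crossing"}
    if above:
        wC = sum(w[p] for p in above)
        dim = len(above[0])
        mu = tuple(sum(Fraction(p[j]) * w[p] for p in above) / wC for j in range(dim))
        sub[mu] = sub.get(mu, 0) + wC    # the mean's dual also lies above the simplex
    return sub
```

In the plane, with the triangle $(-1,-2), (1,-2), (0,2)$, the horizontal dual lines of $(0,-5)$ and $(0,-7)$ lie above it and are contracted into $(0,-6)$ with weight 2, the dual of $(0,5)$ lies below and is deleted, and the two crossing lines are kept.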

It follows from Lemma 1 (the contraction step) and the definition of $Z_k(S, w)$ (the deletion step) that $Z_k(S_i, w_i) \subseteq Z_k(S, w)$. In the dual, this means that $A_k(S_i, w_i) \supseteq A_k(S, w)$. To satisfy condition 2 of Theorem 1, we must show that $\bigcap_{i=1}^{r} A_k(S_i, w_i) = A_k(S, w)$. To do this, consider any point $x$ on the boundary of $A_k(S, w)$. It is sufficient to show that $x$ is also on the boundary of at least one region $A_k(S_i, w_i)$ for $1 \le i \le r$. The point $x$ is defined by $\lceil k \rceil$ hyperplanes $h_1, \dots, h_{\lceil k \rceil} \in \varphi(S^w)$ in the sense that

$$x_d = \sum_{j=1}^{\lfloor k \rfloor} \frac{1}{k} (x \downarrow h_j) + \left(1 - \frac{\lfloor k \rfloor}{k}\right) (x \downarrow h_{\lceil k \rceil}).$$


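Let $q$ be the vertical projection of $x$ onto $h_{\lceil k \rceil}$. There is some simplex $\Delta_i$ that contains $q$. Observe that each of $h_1, \dots, h_{\lceil k \rceil - 1}$ is either completely above the interior of $\Delta_i$ or intersects $\Delta_i$. Furthermore, any hyperplane in $\varphi(S)$ that is completely above $\Delta_i$ is one of $h_1, \dots, h_{\lceil k \rceil - 1}$. Therefore, the subproblem $(S_i, w_i)$ is obtained from $(S, w)$ by contracting $C_i \subseteq \{\varphi^{-1}(h_1), \dots, \varphi^{-1}(h_{\lceil k \rceil - 1})\}$ and then deleting some subset of $S \setminus \{\varphi^{-1}(h_1), \dots, \varphi^{-1}(h_{\lceil k \rceil - 1})\}$. Let $I = \varphi(S_i^{w_i})$. Then, at the point $x$, the right-hand side of the inequality defining $A_k(S_i, w_i)$ satisfies

$$\sum_{j=1}^{\lfloor k \rfloor} \frac{1}{k}\, x \downarrow I^x_j + \left(1 - \frac{\lfloor k \rfloor}{k}\right) x \downarrow I^x_{\lceil k \rceil} = \sum_{j=1}^{\lfloor k \rfloor} \frac{1}{k} (x \downarrow h_j) + \left(1 - \frac{\lfloor k \rfloor}{k}\right) (x \downarrow h_{\lceil k \rceil}) = x_d.$$

The first equality follows from the fact that the contraction operation that creates $(S_i, w_i)$ simply takes the weighted mean of $k' < k$ hyperplanes $h_1, \dots, h_{k'} \in \varphi(S^w)$ and replaces these with $k'$ copies $I^x_1, \dots, I^x_{k'} \in \varphi(S_i^{w_i})$ of the mean; since $x \downarrow \varphi(p)$ is an affine function of $p$, the copies contribute the same total height as the hyperplanes they replace.

Thus $x$ is on the boundary of $A_k(S_i, w_i)$, as required. We have now satisfied all three conditions necessary to apply Theorem 1, completing the proof. □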

5. The optimization algorithm

In the previous section we showed, given $p$, $S$ and $k$, how to answer the question: Is $p \in Z_k(S)$? In this section we consider the optimization problem, given $p$ and $S$: What is the maximum value of $k$ such that $p \in Z_k(S)$? For this problem, we apply Theorem 1 again, this time on a problem in $\mathbb{R}^{d+1}$. To do this, we use the $(d+1)$st coordinate of our space to represent the value $k$, so that we consider the polytope whose cross-sections are the $k$-zonoids for all $1 \le k \le n$.

Formally, consider the point set

$$Z(S) = \{p \in \mathbb{R}^{d+1} : (p_1, \dots, p_d) \in Z_{p_{d+1}}(S)\}.$$

The set $Z(S)$ is a convex polytope in $\mathbb{R}^{d+1}$. Dualizing $Z(S)$ as before gives two regions $A(S)$ and $B(S)$. Recall that the zonoid depth of $p$ with respect to $S$ is

$$Z(p, S) = \sup\{k : p \in Z_k(S)\} = \inf\{k : p \notin Z_k(S)\}.$$

If we assume (by translation) that $p$ is the origin, then, in the dual, $p$ becomes the hyperplane $h_0 = \{x : x_d = 0\}$, and $p \in Z_k(S)$ if and only if $h_0$ intersects neither $A_k(S)$ nor $B_k(S)$. In particular, the value $k$ we are searching for is the minimum of $k_A$ and $k_B$, where

$$k_A = \min\{x_{d+1} : x \in h_0 \cap A(S)\} \quad \text{and} \quad k_B = \min\{x_{d+1} : x \in h_0 \cap B(S)\}.$$

In words, we want the minimum value of $k$ such that $A_k(S)$ or $B_k(S)$ intersects the hyperplane $h_0 = \{x \in \mathbb{R}^d : x_d = 0\}$.

Theorem 3. Given a set $S$ of $n$ points in $\mathbb{R}^d$ and a point $p \in \mathbb{R}^d$, the maximum value $k$ such that $p \in Z_k(S)$ can be found in $O(n)$ expected time.

Proof sketch. The proof is another application of Theorem 1 to find the values $k_A$ and $k_B$ described above. We focus only on finding $k_A$, as finding $k_B$ is a symmetric problem. The details are much the same as in Theorem 2, so we only sketch them. As before, we generalize $A(S)$ and $B(S)$ to the weighted setting using multisets and let $f(S, w)$ be the function that maps $(S, w)$ onto the set of linear constraints that define $h_0 \cap A(S^w)$.

As before, $f$ satisfies condition 0 of Theorem 1 since, for constant-size subproblems, we can explicitly generate the polytopes $Z(S_1, w_1), \dots, Z(S_{d+1}, w_{d+1})$, compute their common intersection with $h_0$ and find a point in the …

The decision problem we must solve to satisfy condition 1 of Theorem 1 is the problem of testing whether a point $p \in \mathbb{R}^{d+1}$ is contained in $h_0 \cap A(S)$. But this is simply a matter of checking that $p \in h_0$ and that $(p_1, \dots, p_d)$ is in $A_{p_{d+1}}(S)$, a problem for which we described an $O(n)$ time algorithm in the proof of Theorem 2.

The partitioning into subproblems required to satisfy condition 2 of Theorem 1 can be done in exactly the same manner as described in the proof of Theorem 2. To see that this partitioning works in the current case, we need only observe that the partitioning makes no use of the value $k$ and the argument used to show its correctness holds for all values of $k$. This completes the proof sketch. □

We conclude with a few remarks about the constants in the algorithm. The use of Chan's technique yields a simple-to-implement algorithm, but this algorithm has extremely large constants hidden in the asymptotic notation. Even using the most sophisticated forms of cuttings and fixed-dimensional linear programming, the expected running time of the algorithm is given by the recurrence

$$T(n) = (\log r)(cd)^{d/2 + O(1)} \times T(n/r) + O(r^d n),$$

where $r$ is an integer parameter and $c$ is a constant that occurs in Clarkson's randomized linear programming algorithm [6]. Thus, even to obtain an expected running time of $O(n)$, we require $r / \log r > (cd)^{d/2 + O(1)}$. It might be possible to reduce the dependence on $d$ somewhat by engineering a hybrid algorithm that uses Chan's original optimization technique [3] in conjunction with the decision algorithm of Ogryczak and Tamir [20]. However, the latter algorithm already includes a factor of $d!$ in the running time, so the running time of the resulting algorithm will still have a superpolynomial dependence on $d$.
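The requirement on $r$ comes from unrolling the recurrence. Writing $a = (\log r)(cd)^{d/2 + O(1)}$ and $b = O(r^d)$ for its two coefficients, and assuming for illustration a constant-time base case at some size $n_0$, a sketch of the calculation is

$$T(n) = a\,T(n/r) + b\,n = b\,n \sum_{j=0}^{L-1} \left(\frac{a}{r}\right)^{j} + a^{L}\, T(n_0), \qquad L = \log_r (n / n_0).$$

If $a < r$, the geometric series is at most $r/(r - a)$ and $a^L = (n/n_0)^{\log_r a} = o(n)$, so the bound is $O(r^d n)$, which is $O(n)$ for constant $r$. If $a \ge r$, each of the $L = \Theta(\log n)$ terms of the sum is at least $b\,n$, so the bound grows like $n \log n$. Hence this bound is linear only when $a < r$, that is, when $r / \log r > (cd)^{d/2 + O(1)}$.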

References

[1] M.W. Bern, D. Eppstein, Optimization over zonotopes and training support vector machines, in: Workshop on Algorithms and Data Structures, 2001, pp. 111–121.

[2] N. Chakravarti, Some results concerning post-infeasibility analysis, European Journal of Operational Research 73 (1994) 139–143.

[3] T.M. Chan, Geometric applications of a randomized optimization technique, Discrete & Computational Geometry 22 (4) (1999) 547–567.

[4] T.M. Chan, An optimal randomized algorithm for maximum Tukey depth, in: Proc. 15th ACM-SIAM Symposium on Discrete Algorithms (SODA), 2004, pp. 423–429.

[5] T.M. Chan, Low-dimensional linear programming with violations, SIAM Journal on Computing 34 (2005) 879–893.

[6] K.L. Clarkson, Las Vegas algorithms for linear and integer programming when the dimension is small, Journal of the ACM 42 (2) (1995) 488–499.

[7] T.H. Cormen, C.E. Leiserson, R.L. Rivest, C. Stein, Introduction to Algorithms, second ed., McGraw-Hill, New York, 2001.

[8] R. Dyckerhoff, G. Koshevoy, K. Mosler, Zonoid data depth: Theory and computation, in: A. Prat (Ed.), COMPSTAT 1996—Proceedings in Computational Statistics, Physica-Verlag, Heidelberg, 1996, pp. 235–240.

[9] M.E. Dyer, An O(n) algorithm for the multiple-choice knapsack linear program, Mathematical Programming 29 (1) (1984) 57–63.

[10] H. Edelsbrunner, Algorithms in Combinatorial Geometry, Springer-Verlag, Heidelberg, 1997.

[11] D. Eppstein, Setting parameters by example, SIAM Journal on Computing 82 (2003) 638–653.

[12] H. Gopala, P. Morin, Algorithms for bivariate zonoid depth, Computational Geometry: Theory and Applications (2006). Special issue of selected papers from the 16th Canadian Conference on Computational Geometry (CCCG 2004).

[13] L.G. Khachiyan, A polynomial time algorithm for linear programming, Soviet Mathematics Doklady 20 (1979) 1092–1096.

[14] R. Liu, J.M. Parelius, K. Singh, Multivariate analysis by data depth: Descriptive statistics, graphics and inference, The Annals of Statistics 27 (3) (1999) 783–858.

[15] C.-Y. Lo, J. Matoušek, W. Steiger, Algorithms for ham-sandwich cuts, Discrete & Computational Geometry 11 (1994) 433–452.

[16] J. Matoušek, Lectures on Discrete Geometry, Springer-Verlag, New York, 2002.

[17] N. Megiddo, Linear time algorithms for linear programming in $\mathbb{R}^3$ and related problems, SIAM Journal on Computing 12 (1983) 759–776.

[18] N. Megiddo, Linear programming in linear time when the dimension is fixed, Journal of the ACM 31 (1984) 114–127.

[19] K. Mosler, Multivariate Dispersion, Central Regions and Depth. The Lift Zonoid Approach, Lecture Notes in Statistics, vol. 165, Springer-Verlag, New York, 2002.

[20] W. Ogryczak, A. Tamir, Minimizing the sum of the k largest functions in linear time, Information Processing Letters 85 (2003) 117–122.

[21] M. Sharir, E. Welzl, A combinatorial bound for linear programming and related problems, in: Proceedings of Symposium on Theoretical Aspects of Computer Science (STACS '92), 1992, pp. 569–588.

[22] J.W. Tukey, Mathematics and the picturing of data, in: R.D. James (Ed.), Proceedings of the International Congress of Mathematicians, vol. 2, Vancouver Canada, August 1974, pp. 523–531.

[23] E. Zemel, An O(n) algorithm for the linear multiple choice knapsack problem and related problems, Information Processing Letters 18 (1984) 123–128.