
It is common to define pairwise similarities of points via a decreasing function of the distance between them. That is, for a decreasing function $k : \mathbb{R}^+ \to \mathbb{R}^+$, the similarity function $s$ may be written as

$$
s(P, i, j) = k\left(\frac{d(p_i, p_j)}{\sigma}\right), \tag{3.21}
$$

where $d(\cdot,\cdot)$ is a metric and $\sigma > 0$ is a scaling parameter. We have found that our proposed projection pursuit method can be susceptible to outliers when the standard Euclidean distance is used, especially when minimising $\lambda_2(L(\boldsymbol{\theta}))$. In this subsection we discuss how to embed a balancing constraint into the distance function. This balancing mechanism steers the projection pursuit away from projections that separate only a few points from the remainder of the data set.
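The following is a minimal sketch of (3.21) in Python. It builds the similarity matrix of a projected data set under two assumptions: the Gaussian kernel $k(x) = \exp(-x^2/2)$ and the Euclidean metric. Both the kernel choice and the helper name `similarity_matrix` are illustrative assumptions, not taken from the text.

```python
import numpy as np

def similarity_matrix(P: np.ndarray, sigma: float) -> np.ndarray:
    """Pairwise similarities s(P, i, j) = k(d(p_i, p_j) / sigma), as in (3.21).

    P is an (l, N) array whose columns are the projected points p_i.
    Assumes (hypothetically) k(x) = exp(-x**2 / 2) and d = Euclidean metric.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    diffs = P[:, :, None] - P[:, None, :]          # (l, N, N) coordinate differences
    dists = np.sqrt(np.sum(diffs ** 2, axis=0))    # (N, N) Euclidean distances
    return np.exp(-0.5 * (dists / sigma) ** 2)     # decreasing function of distance

# Three points on a line (l = 1); the point at 5.0 is far from the other two.
P = np.array([[0.0, 1.0, 5.0]])
S = similarity_matrix(P, sigma=1.0)
```

Any other decreasing $k$ can be substituted in the last line of the function without changing the structure.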

The normalisation of the graph cut objective given in (3.2) is extremely effective at emphasising balanced partitions in the general spectral clustering problem (von Luxburg, 2007). However, we have found that the projection pursuit formulation sometimes needs further emphasis on balance, especially in high-dimensional applications. Consider the extreme case where $d > N$. Then the projection equation $V^\top X = P$ is an underdetermined system of linear equations. Therefore, for any desired projected data set $P$ there exist $\boldsymbol{\theta} \in \Theta$ and $c \in \mathbb{R} \setminus \{0\}$ such that $V(\boldsymbol{\theta})^\top X = cP$. In other words, the projected data can be made to have any distribution, up to a scaling constant. In particular, we can generally find projections that separate a small group of points from the remainder of the data so strongly that the normalisation in (3.2) cannot produce a balanced partition. In practice we have observed this situation even for problems of moderate dimension. Zhang et al. (2009) and Pavlidis et al. (2015) have previously noted the importance of a balancing constraint in projection pursuit for clustering.
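The $d > N$ argument can be illustrated numerically. The sketch below draws random data with more dimensions than points. It then chooses an arbitrary target one-dimensional projection in which a single point lies far from a tight group, and solves $v^\top X = p$ by least squares. Because the system is underdetermined, the fit is (generically) exact. Rescaling $v$ to unit length, as a projection vector must be, only multiplies the projection by the constant $c = 1/\|v\|$. The random data and the target projection are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 50, 20                      # more dimensions than data points
X = rng.standard_normal((d, N))    # columns are the data points x_i

# An arbitrary "desired" projected data set: one far outlier, the rest tight.
p_target = np.zeros(N)
p_target[0] = 10.0
p_target[1:] = rng.normal(scale=0.1, size=N - 1)

# Solve X^T v = p_target. With d > N this system is underdetermined,
# so lstsq returns an exact (minimum-norm) solution for generic X.
v, *_ = np.linalg.lstsq(X.T, p_target, rcond=None)

# A projection vector must have unit length; normalising only rescales p.
c = 1.0 / np.linalg.norm(v)
v_unit = c * v
p_achieved = v_unit @ X            # equals c * p_target up to rounding error
```

With $d < N$ the same least-squares solve would in general leave a nonzero residual. This is why the issue is most pronounced in high dimensions.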

Balanced partitions are emphasised through a compact constraint set $\Delta$, which may be defined using the distribution of the projected data set $P$. We define the metric $d(\cdot,\cdot)$ so that distances between points extending beyond $\Delta$ are reduced, which increases the similarity of points outside $\Delta$ with the other points. If $P$ is $l$-dimensional, we define $\Delta$ as the rectangle $\Delta = \prod_{i=1}^{l} \Delta_i$, where each $\Delta_i$ is an interval in $\mathbb{R}$ defined using the distribution of the $i$-th component of $P$. A convenient way of increasing similarities with points lying outside $\Delta$ is through a transformation $T : \mathbb{R}^l \to \mathbb{R}^l$,

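defined as follows:

$$
T(y) = \left(t_{\Delta_1}(y_1), \ldots, t_{\Delta_l}(y_l)\right), \tag{3.22}
$$

$$
t_{\Delta_i}(z) :=
\begin{cases}
-\delta\left(\min \Delta_i - z + (\delta(1-\delta))^{1/\delta}\right)^{1-\delta} + \delta\,(\delta(1-\delta))^{\frac{1-\delta}{\delta}}, & z < \min \Delta_i,\\[4pt]
z - \min \Delta_i, & z \in \Delta_i,\\[4pt]
\delta\left(z - \max \Delta_i + (\delta(1-\delta))^{1/\delta}\right)^{1-\delta} - \delta\,(\delta(1-\delta))^{\frac{1-\delta}{\delta}} + \operatorname{Diam}(\Delta_i), & z > \max \Delta_i,
\end{cases} \tag{3.23}
$$

where $\delta \in (0, 0.5]$ is the distance-reducing parameter. Each $t_{\Delta_i}$ is linear on $\Delta_i$ but has derivative strictly less than one outside $\Delta_i$. Hence $\|T(x) - T(y)\| \le \|x - y\|$ for any $x, y \in \mathbb{R}^l$, with strict inequality whenever either $x$ or $y$ lies outside $\Delta$. We define $T$ this way so that it is continuously differentiable even at the boundaries of $\Delta$, and therefore does not affect the differentiability properties of the similarity function $s$. Figure 3.1 illustrates how $T$ influences distances and similarities in the univariate case.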
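The sketch below implements (3.22)–(3.23) in Python to make the construction concrete. Writing $a = (\delta(1-\delta))^{1/\delta}$, the constant $\delta(\delta(1-\delta))^{(1-\delta)/\delta}$ equals $\delta a^{1-\delta}$. This is exactly what makes $t_{\Delta_i}$ continuous at the interval endpoints, and the choice of $a$ gives slope one there. The function names `t_interval` and `transform` are hypothetical, and intervals are passed as lower and upper endpoints.

```python
import numpy as np

def t_interval(z, lo, hi, delta):
    """Distance-reducing map t_{Delta_i} of (3.23) for Delta_i = [lo, hi]."""
    if not 0.0 < delta <= 0.5:
        raise ValueError("delta must lie in (0, 0.5]")
    z = np.atleast_1d(np.asarray(z, dtype=float))
    a = (delta * (1.0 - delta)) ** (1.0 / delta)   # offset giving slope 1 at the boundary
    b = delta * a ** (1.0 - delta)                 # = delta * (delta(1-delta))^((1-delta)/delta)
    out = z - lo                                   # middle case: z in [lo, hi]
    left, right = z < lo, z > hi
    # Concave power-law tails: value and first derivative match the linear part.
    out[left] = b - delta * (lo - z[left] + a) ** (1.0 - delta)
    out[right] = (hi - lo) + delta * (z[right] - hi + a) ** (1.0 - delta) - b
    return out

def transform(Y, lows, highs, delta):
    """Apply T of (3.22) to an (l, N) array: row i uses Delta_i = [lows[i], highs[i]]."""
    return np.vstack([t_interval(Y[i], lows[i], highs[i], delta)
                      for i in range(Y.shape[0])])

# Points inside Delta = [-1, 2] are only shifted; the point at 12 is pulled in.
z = np.array([-1.0, 0.5, 2.0, 12.0])
tz = t_interval(z, lo=-1.0, hi=2.0, delta=0.3)
```

Outside the interval the derivative $\delta(1-\delta)(\,\cdot + a)^{-\delta}$ starts at one and decays towards zero. By the mean value theorem, each coordinate map is therefore non-expansive. Smaller $\delta$ flattens the tails more aggressively.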

In the context of projection pursuit it is convenient to define a full-dimensional convex constraint set $\boldsymbol{\Delta} \subset \mathbb{R}^d$. The univariate constraint intervals, which we now index by the corresponding projection angles, are then defined via the projection of $\boldsymbol{\Delta}$ onto each $V(\boldsymbol{\theta})_i$. That is,

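$$
\Delta_{\boldsymbol{\theta} i} := \left[\min\left\{V(\boldsymbol{\theta})_i^\top x \;\middle|\; x \in \boldsymbol{\Delta}\right\},\ \max\left\{V(\boldsymbol{\theta})_i^\top x \;\middle|\; x \in \boldsymbol{\Delta}\right\}\right]. \tag{3.24}
$$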
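Equation (3.24) only requires minimising and maximising a linear function over $\boldsymbol{\Delta}$. As a purely hypothetical illustration (the text itself uses an ellipsoid), suppose $\boldsymbol{\Delta}$ were a polytope given by its vertices. A linear function over a polytope attains both extremes at vertices, so each interval is simply the range of the projected vertices. The helper `projected_intervals_polytope` is an illustrative name, not part of the method described here.

```python
import numpy as np

def projected_intervals_polytope(vertices: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Intervals [min v_i^T x, max v_i^T x] over x in conv(vertices), as in (3.24).

    vertices: (m, d) array of polytope vertices (a hypothetical choice of Delta).
    V:        (d, l) array whose columns are the projection vectors V(theta)_i.
    Returns an (l, 2) array of interval endpoints.
    """
    proj = vertices @ V                    # (m, l): each vertex projected onto each column
    return np.column_stack([proj.min(axis=0), proj.max(axis=0)])

# Unit square in R^2, projected onto the first axis and onto the diagonal.
square = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
V = np.column_stack([np.array([1.0, 0.0]), np.array([1.0, 1.0]) / np.sqrt(2.0)])
intervals = projected_intervals_polytope(square, V)
```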

In our implementation, we define $\boldsymbol{\Delta}$ to be a scaled covariance ellipsoid centred at the mean of the data. The projections of $\boldsymbol{\Delta}$ are thus given by intervals of the form

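$$
\Delta_{\boldsymbol{\theta} i} = \left[\mu_{\boldsymbol{\theta} i} - \beta\sigma_{\boldsymbol{\theta} i},\ \mu_{\boldsymbol{\theta} i} + \beta\sigma_{\boldsymbol{\theta} i}\right], \tag{3.25}
$$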

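where $\mu_{\boldsymbol{\theta} i}$ and $\sigma_{\boldsymbol{\theta} i}$ are the mean and standard deviation of the $i$-th component of the projected data set $P(\boldsymbol{\theta})$, and the parameter $\beta \ge 0$ determines the width of the projected constraint interval $\Delta_{\boldsymbol{\theta} i}$.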
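Why (3.25) follows from (3.24) can be checked directly. Take the ellipsoid $\boldsymbol{\Delta} = \{x : (x - m)^\top \Sigma^{-1} (x - m) \le \beta^2\}$, where $m$ and $\Sigma$ are the sample mean and covariance, and a unit vector $v$. The Cauchy–Schwarz inequality in the $\Sigma^{-1}$ inner product gives $\max_{x \in \boldsymbol{\Delta}} v^\top x = v^\top m + \beta\sqrt{v^\top \Sigma v}$. Here $v^\top m$ and $\sqrt{v^\top \Sigma v}$ are exactly the mean and standard deviation of the projected data. The Python sketch below computes the intervals from projected data and compares them with this support-function formula. It assumes unit-norm projection vectors and the $N-1$ (sample) normalisation for both the covariance and the standard deviation. The helper names are hypothetical.

```python
import numpy as np

def constraint_intervals(X: np.ndarray, V: np.ndarray, beta: float) -> np.ndarray:
    """Intervals [mu_i - beta*sd_i, mu_i + beta*sd_i] of (3.25).

    X: (d, N) data matrix whose columns are observations.
    V: (d, l) projection matrix with unit-norm columns.
    Returns an (l, 2) array of interval endpoints.
    """
    P = V.T @ X                          # projected data, shape (l, N)
    mu = P.mean(axis=1)
    sd = P.std(axis=1, ddof=1)           # sample standard deviation
    return np.column_stack([mu - beta * sd, mu + beta * sd])

def ellipsoid_support(X: np.ndarray, v: np.ndarray, beta: float) -> float:
    """max of v^T x over the ellipsoid {x : (x - m)^T S^{-1} (x - m) <= beta^2}."""
    m = X.mean(axis=1)
    S = np.cov(X)                        # rows are variables; N - 1 normalisation
    return float(v @ m + beta * np.sqrt(v @ S @ v))

rng = np.random.default_rng(1)
X = rng.standard_normal((3, 200)) * np.array([[1.0], [2.0], [0.5]])
V = np.linalg.qr(rng.standard_normal((3, 2)))[0]   # two orthonormal projection vectors
beta = 1.5
iv = constraint_intervals(X, V, beta)
```

The lower endpoint follows from the same formula applied to $-v$, since $\min_x v^\top x = -\max_x (-v)^\top x$.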

Figure 3.2 shows two-dimensional projections of the 64-dimensional optical recognition of handwritten digits dataset¹. The leftmost plot shows the PCA projection, which is used to initialise the projection pursuit. The remaining plots show the projections obtained by minimising $\lambda_2(L(\boldsymbol{\theta}))$ for several values of $\beta$. For $\beta = \infty$, i.e., an unconstrained projection, the projection pursuit focuses on only a few points and leaves the remainder of the dataset almost unaffected by the projection. Setting $\beta = 2.5$ causes the projection pursuit to focus on a larger proportion of the tail of the data distribution. Setting $\beta = 1.5$, however, allows the projection pursuit to identify the cluster structure in the data. It then finds a projection that separates three of the clusters well, namely those shown in black, orange and turquoise in the top right plot.