Introduction - Low Density Cluster Separators for Large, High Dimensional, Mixed and Non Linear

In the density-based approach to clustering, clusters are defined as subsets of observations belonging to contiguous regions of high probability density, concentrated around the modes of some unknown probability density function $p_x$, which may be estimated by a non-parametric density estimate $\hat{p}_x$. As discussed in Chapter 4, the inaccuracy of density estimation in even moderate dimensions restricts the direct location of clusters associated with high-density regions of $\hat{p}_x$ to low-dimensional problems (Rinaldo and Wasserman, 2010).

However, it is possible to apply the alternative formulation of locating low-density cluster boundaries that separate these high-density regions. This is known as the low-density separation assumption. These low-density cluster separators may be located using one-dimensional orthogonal projections of the data, making this alternative formulation applicable to high-dimensional datasets. However, evaluating the density intersected by a cluster boundary is computationally intractable for boundaries of arbitrary shape, and the resulting separator is therefore restricted to be a linear cluster boundary (a hyperplane).
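
As an illustration of how a one-dimensional projection can expose a low-density separator, the following minimal sketch evaluates a Gaussian kernel density estimate of the projected data along a fixed direction and picks the offset with the lowest estimated density. The function name `projected_density`, the Gaussian kernel, the bandwidth `h`, the fixed direction `v`, the grid search and the synthetic data are all assumptions made for this example; searching over directions is omitted, so this is not the method of Chapter 4.

```python
import numpy as np

def projected_density(X, v, b, h):
    """Gaussian KDE of the data projected onto the unit vector v, evaluated at offsets b."""
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)              # normalise the projection direction
    proj = X @ v                           # one-dimensional projections v.x_i
    b = np.atleast_1d(np.asarray(b, dtype=float))
    z = (b[:, None] - proj[None, :]) / h   # standardised distances to each projection
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (len(proj) * h * np.sqrt(2.0 * np.pi))

# Synthetic example: two Gaussian clusters in 10 dimensions (illustrative data only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (200, 10)),
               rng.normal(1.0, 1.0, (200, 10))])
true_labels = np.repeat([0, 1], 200)

v = np.ones(10)                            # fixed direction, chosen by hand here
grid = np.linspace(-3.0, 3.0, 121)         # candidate hyperplane offsets
density = projected_density(X, v, grid, h=0.5)
b_star = grid[np.argmin(density)]          # offset with the lowest projected density

# Bi-partition induced by the hyperplane {x : v.x = b_star}.
labels = (X @ (v / np.linalg.norm(v)) > b_star).astype(int)
accuracy = np.mean(labels == true_labels)
```

The only density evaluation needed is one-dimensional, whatever the dimension of `X`, which is what makes the formulation usable in high dimensions.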

In Chapter 4, we proposed approaches to locate high-density clusters using a collection of minimum density hyperplane (MDH) separators. These identify linear cluster boundaries that intersect regions of minimal density while separating the regions of contiguous high probability density around the modes of $\hat{p}_x$, since the subsets of observations in these regions are associated with clusters. This approach is capable of locating high-quality partitions in arbitrarily oriented subspaces. However, the ability to correctly identify clusters that are not linearly separable is an attractive property of density-based clustering generally, and the restriction to linear cluster boundaries imposed by the MDH is an important limitation.

This chapter addresses this limitation, allowing the application of our approaches to low-density cluster separation to high-dimensional datasets whose clusters cannot be correctly identified by a collection of hyperplane separators in the space of the original observations. We first map the data non-linearly into a feature space and then seek an MDH in that feature space (the KMDH), where the hyperplane separator corresponds to a non-linear separator in the input space. The potentially infinite dimensionality of the feature space means it is not feasible to calculate the mapped observations (feature vectors) explicitly. However, we provide a formulation that permits the location of the KMDH in the feature space using the kernel matrix of pairwise inner products between the feature vectors, which is computed directly by applying the kernel function to the original observations. This also permits the KMDH to be computed for any dataset for which a kernel matrix can be constructed, including data with discrete or non-numeric attributes.
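
To make concrete why the kernel matrix is sufficient, the sketch below uses the standard identity that a projection vector of the form $v = \sum_i \alpha_i \phi(x_i)$ satisfies $v \cdot \phi(x_j) = (K\alpha)_j$ and $\|v\|^2 = \alpha^\top K \alpha$. It checks this numerically for a homogeneous quadratic kernel, chosen only because its feature map can be written down explicitly for two-dimensional inputs. The function names, the choice of kernel and the random coefficients `alpha` are assumptions for illustration, not the formulation of Section 5.2.

```python
import numpy as np

def quad_kernel(X):
    """Kernel matrix for k(x, y) = (x.y)^2."""
    return (X @ X.T) ** 2

def quad_features(X):
    """Explicit feature map of k(x, y) = (x.y)^2 for two-dimensional inputs."""
    return np.column_stack([X[:, 0] ** 2,
                            np.sqrt(2.0) * X[:, 0] * X[:, 1],
                            X[:, 1] ** 2])

def feature_space_projections(K, alpha):
    """Projections of all feature vectors onto v = sum_i alpha_i phi(x_i), scaled to unit norm."""
    norm_v = np.sqrt(alpha @ K @ alpha)    # ||v|| computed from the kernel matrix alone
    return (K @ alpha) / norm_v

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))               # illustrative data
alpha = rng.normal(size=30)                # arbitrary expansion coefficients

K = quad_kernel(X)
implicit = feature_space_projections(K, alpha)   # uses only K

Phi = quad_features(X)                     # explicit feature vectors, for checking only
v = Phi.T @ alpha
explicit = Phi @ v / np.linalg.norm(v)

agree = np.allclose(implicit, explicit)
```

For kernels whose feature space is infinite dimensional, the explicit route is unavailable, but the `implicit` computation is unchanged.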

The location of the KMDH involves a non-smooth, non-convex optimisation problem over $n$ variables, where $n$ is the number of observations. In many applications of interest $n$ can be very large, in which case an exhaustive search over all $n$ dimensions for the KMDH is infeasible, and unlikely to be necessary to locate a high-quality separator. To overcome this we propose an approximation method, which we call the subspace KMDH (S-KMDH), that seeks hyperplanes in a subspace of the feature space. This reduces the search space for a low-density separator, and avoids searching over dimensions of the feature space that are unlikely to be meaningful for cluster separation.

Since any projection vectors that permit a meaningful cluster separator will lie in the $n$-dimensional space spanned by the feature vectors, the KMDH may be equivalently located using the projections of the feature vectors onto an $n$-dimensional orthonormal basis of the feature space that spans the same space as the feature vectors. For the practical location of the KMDH we take this approach, using the orthonormal basis defined by kernel principal component analysis (KPCA) (Schölkopf et al., 1998). This also permits an intuitive specification of an appropriate subspace for S-KMDH, which is located using the projections of the feature vectors onto the first $n' \ll n$ kernel principal components.
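
The projections onto the kernel principal components can be obtained from an eigendecomposition of the kernel matrix. The sketch below follows the textbook KPCA construction, in which the projections onto the $k$-th component are $\sqrt{\lambda_k}\,u_k$ for eigenpairs $(\lambda_k, u_k)$ of the centred kernel matrix. The Gaussian (RBF) kernel, its parameter `gamma`, the centring step, the function names and the value `n_components=5` are assumptions for illustration and may differ from the choices made in Section 5.3.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian kernel matrix k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def centre_kernel(K):
    """Kernel matrix of the feature vectors after subtracting their mean."""
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return J @ K @ J

def kpca_projections(K, n_components):
    """Projections of the centred feature vectors onto the leading kernel principal components."""
    eigvals, eigvecs = np.linalg.eigh(centre_kernel(K))
    order = np.argsort(eigvals)[::-1][:n_components]   # largest eigenvalues first
    lam = np.clip(eigvals[order], 0.0, None)            # guard against tiny negative values
    return eigvecs[:, order] * np.sqrt(lam)             # column k is sqrt(lambda_k) * u_k

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))               # illustrative data
K = rbf_kernel(X, gamma=0.5)

Z_full = kpca_projections(K, n_components=50)   # full n-dimensional representation
Z_sub = kpca_projections(K, n_components=5)     # reduced subspace with n' = 5
```

With all $n$ components retained, the rows of `Z_full` reproduce the inner products of the centred feature vectors, so a hyperplane search on `Z_full` is equivalent to one in the span of the feature vectors. `Z_sub` keeps only the leading directions, which is the kind of reduced search space the subspace approximation relies on.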

The remainder of this chapter is organised as follows. Section 5.2 presents the formulation of the MDH in feature space (KMDH), and its approximation over a smaller subspace of the feature space (S-KMDH), using the kernel matrix directly. Section 5.3 then describes how we locate the KMDH and S-KMDH in practice, using the projections of the feature vectors onto the kernel principal components. Next, Section 5.4 discusses how we combine the bi-partitions resulting from hyperplane separators of the feature vectors in a divisive algorithm to produce a complete clustering. Section 5.5 provides an empirical evaluation of the clustering results from the proposed divisive algorithm, using bi-partitions from the KMDH and S-KMDH at each level of the hierarchy. The proposed approaches are compared to alternative kernel-based clustering algorithms across benchmark datasets with varying characteristics. Conclusions are given in Section 5.6.
