2.4. Clustering Algorithms
2.4.3. Density-Based Methods
Density-based methods construct clusters according to spatial density: a cluster extends in the direction of densely distributed objects. Consequently, these methods are well suited to discovering clusters of arbitrary shape, are more robust against outliers, and scale well (Berkhin, 2006) (Saraee et al., 2007). Density-based clustering can be classified into density-based connectivity methods and density-function methods. The first algorithm in the density-based connectivity class is DBSCAN (Density-Based Spatial Clustering of Applications with Noise) (Han et al., 2001). The performance of DBSCAN depends on a good estimate of its input parameters. Therefore, OPTICS (Ordering Points To Identify the Clustering Structure) extends DBSCAN to compute the clustering parameters (Ankerst et al., 1999). One of the most popular density-function methods is DENCLUE (DENsity based CLUstEring) (Hinneburg and Keim, 1998), which is also influenced by its input parameters. In summary, clustering according to density requires careful identification of parameters based on the characteristics of the clustered objects, so these parameters must either be predefined or extracted through additional computation. This approach has also been criticised for not being fully informative and for creating unbalanced clusters (Han et al., 2001).
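To make the role of the input parameters concrete, the following is a minimal sketch of density-based connectivity in the style of DBSCAN. It is not the implementation from the cited works: the function names, the parameter names `eps` (neighbourhood radius) and `min_pts` (minimum neighbourhood size), and the brute-force neighbour search are assumptions made for illustration.

```python
from math import dist

NOISE = -1  # label used here for points that belong to no cluster (assumed convention)


def region_query(points, i, eps):
    """Return the indices of all points within distance eps of points[i], including i."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels
```

In this sketch, changing `eps` or `min_pts` changes which points count as core points, and therefore how many clusters are found and how many points are labelled as noise, which is the parameter sensitivity discussed above.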
[...] has low complexity, O(N), although its complexity increases exponentially with the number of dimensions (Sheikholeslami et al., 1998). Lastly, more sophisticated grid-based methods have been developed to cluster high-dimensional objects, such as CLIQUE (CLustering in QUEst) (Agrawal et al., 1998) and MAFIA (Merging of Adaptive Finite Intervals) (Goil et al., 1999).
2.4.5. Graph Theory-Based Clustering
Graph theory-based clustering can be classified into graph-growing, flows, spectral-partitioning, recursive-graph-bisection, geometric-partitioning and gene-expression-data clustering methods (Buluç et al., 2015). In addition to these six classes, some other graph theory-based approaches are categorised under different types, such as Linkage Metrics (Xu and Wunsch II, 2005) and CHAMELEON (Karypis et al., 1999) (see Section 2.4.2, (A) Agglomerative (bottom-up)). Both Linkage Metrics and CHAMELEON are agglomerative clustering methods, but they use the distances between graph nodes to describe object similarity (Xu and Wunsch II, 2005). The six classes of graph theory-based clustering are summarised below:
A) Graph-Growing Methods: Most versions of the graph-growing method use the BFS (Breadth-First Search) algorithm as their basis (Buluç et al., 2015). The graph-growing method was later developed into a newer version called the Bubble Framework (Walshaw et al., 1995). The Bubble Framework is an iterative method that splits the graph into more than two clusters (k > 2). Its drawback for the present work is that the k starting nodes should be well distributed over the graph, whereas the controllers' positions are constrained to the centre of the graph. In addition, it has a high complexity due to the iterative optimisation of the method.
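The growth step behind these methods can be sketched as a multi-source BFS in which every node joins the seed whose search front reaches it first. This is only an illustrative sketch: the function name `grow_clusters`, the adjacency-dictionary representation and the example graph are assumptions, and the iterative re-seeding of the Bubble Framework is not included.

```python
from collections import deque


def grow_clusters(adjacency, seeds):
    """Assign each reachable node to the seed whose BFS front reaches it first.

    adjacency: dict mapping node -> iterable of neighbouring nodes.
    seeds: list of starting nodes, one per cluster.
    Returns a dict mapping node -> index of the owning seed.
    """
    owner = {seed: idx for idx, seed in enumerate(seeds)}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in owner:
                owner[neighbour] = owner[node]
                queue.append(neighbour)
    return owner


# Example: a path graph 0-1-2-3-4-5 (hypothetical data).
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 5] for i in range(6)}
balanced = grow_clusters(path, [0, 5])   # seeds at opposite ends
skewed = grow_clusters(path, [0, 1])     # seeds next to each other
```

With seeds at opposite ends of the path, each cluster receives three nodes; with adjacent seeds, the first cluster contains only its own seed. This illustrates why the starting nodes need to be well distributed over the graph.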
B) Flows Method: The Flows Method, or HCS (Highly Connected Sub-graphs), uses the popular max-flow min-cut algorithm (Ford and Fulkerson, 2009) to divide the graph into clusters that have a minimum number of cut edges (links) between them. Because it generates unbalanced clusters, this approach is almost always used as a subroutine inside another algorithm (Buluç et al., 2015).
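The following sketch shows how a minimum cut between two chosen nodes splits a graph, and why the result can be unbalanced. It uses the BFS-based (Edmonds-Karp) variant of augmenting-path max-flow, which is an assumption made for brevity rather than the exact procedure of the cited works; the function name, the capacity-dictionary format and the example graph are also hypothetical.

```python
from collections import deque


def min_cut_partition(capacity, source, sink):
    """Split the nodes into (source side, sink side) along a minimum source-sink cut.

    capacity: dict mapping node -> {neighbour: capacity}. An undirected link is
    given as two opposite arcs with the same capacity.
    """
    # Residual capacities, with missing reverse arcs initialised to 0.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)

    def reachable_parents():
        """BFS over arcs with spare residual capacity; returns node -> parent."""
        parents = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        return parents

    while True:
        parents = reachable_parents()
        if sink not in parents:
            break  # no augmenting path left: the flow is maximal
        path = []
        v = sink
        while parents[v] is not None:
            path.append((parents[v], v))
            v = parents[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck

    source_side = set(parents)  # nodes still reachable from the source
    return source_side, set(residual) - source_side
```

For example, on a four-node clique with a fifth node attached by a single link, cutting that one link is the cheapest separation, so the two sides contain four nodes and one node respectively.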
C) Spectral Partitioning Methods: The earliest model of the spectral partitioning method is spectral bisection. It has been refined several times, by Donath and Hoffman (1972), Fiedler (1975) and others. Hendrickson and Leland (1995) extended the spectral method to split the graph into more than two clusters by applying multiple eigenvectors. Although this method does not require a high computational complexity to obtain the multiple eigenvectors, it can only partition the graph into four or eight clusters. Spectral clustering is used in many controller placement studies, such as (Xiao et al., 2014) and (He et al., 2017) (see Sections 2.3.1 Minimizing network latency and 2.3.2 Maximizing resilience and reliability), where it exhibits high computational complexity and does not guarantee balanced clusters.
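Spectral bisection can be sketched as follows: the graph is split according to the sign of the entries of the Laplacian eigenvector that belongs to the second-smallest eigenvalue (the Fiedler vector). To stay within the standard library, this sketch approximates that eigenvector with a shifted power iteration, which is a substitute chosen for illustration; practical implementations normally rely on a dedicated eigensolver. The function name, the iteration count and the example graph are assumptions.

```python
def fiedler_bisection(adjacency, iterations=500):
    """Bisect a connected graph by the sign of an approximate Fiedler vector.

    adjacency: dict mapping node -> list of neighbours (undirected, unweighted).
    """
    nodes = sorted(adjacency)
    n = len(nodes)
    index = {node: i for i, node in enumerate(nodes)}
    degree = [len(adjacency[node]) for node in nodes]
    shift = 2 * max(degree)  # upper bound on the largest Laplacian eigenvalue

    # Start from a vector orthogonal to the constant vector.
    x = [i - (n - 1) / 2 for i in range(n)]
    for _ in range(iterations):
        # y = (shift * I - L) x, where (L x)_i = degree_i * x_i - sum of x over neighbours.
        y = [
            (shift - degree[i]) * x[i] + sum(x[index[nb]] for nb in adjacency[node])
            for i, node in enumerate(nodes)
        ]
        mean = sum(y) / n
        y = [v - mean for v in y]  # project out the constant eigenvector
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]

    part_a = {node for i, node in enumerate(nodes) if x[i] < 0}
    return part_a, set(nodes) - part_a


# Example: two triangles joined by the single link 2-3 (hypothetical data).
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
halves = fiedler_bisection(two_triangles)
```

On this example the two triangles end up in different halves. Note that nothing in the sign-based split forces the halves to have equal size on less symmetric graphs.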
D) Recursive Graph Bisection (RGB) Method: The RGB method uses the shortest-path distance between graph nodes instead of the Euclidean distance (Simon, 1991). The algorithm needs to find the two furthest nodes for every cluster, which requires a high computational complexity; therefore, it uses a heuristic method to simplify this step, such as SPARSPAK (George and Liu, 1984). The drawback of the RGB method is that it can only create 2^k balanced clusters, where k = 1, 2, 3, etc.
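A minimal sketch of this idea is given below. Instead of an exact search for the two furthest nodes, it uses a double BFS sweep to pick an approximate end node, which is an assumed stand-in for the heuristics mentioned above; the function names and the `levels` parameter (the exponent k) are also hypothetical.

```python
from collections import deque


def bfs_distances(adjacency, start, allowed):
    """Hop distances from start to every node reachable inside the allowed set."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v in allowed and v not in distance:
                distance[v] = distance[u] + 1
                queue.append(v)
    return distance


def bisect(adjacency, nodes):
    """Split nodes into two halves ordered by distance from an approximate end node."""
    allowed = set(nodes)
    first = bfs_distances(adjacency, min(allowed), allowed)
    end = max(first, key=lambda v: (first[v], v))  # farthest node from an arbitrary start
    distance = bfs_distances(adjacency, end, allowed)
    ordered = sorted(allowed, key=lambda v: (distance.get(v, float("inf")), v))
    half = len(ordered) // 2
    return ordered[:half], ordered[half:]


def recursive_graph_bisection(adjacency, nodes, levels):
    """Return up to 2**levels clusters by bisecting recursively."""
    if levels == 0 or len(nodes) < 2:
        return [sorted(nodes)]
    left, right = bisect(adjacency, nodes)
    return (recursive_graph_bisection(adjacency, left, levels - 1)
            + recursive_graph_bisection(adjacency, right, levels - 1))
```

Because every call halves its input, the number of clusters is always a power of two, which is the restriction noted above.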
E) Geometric-Partitioning Methods: In geometric partitioning, the graph is divided into regions (clusters) according to the coordinates of the nodes (objects), which indicate node similarity. The simplest method in this approach is Recursive Coordinate Bisection (RCB) (Buluç et al., 2015) (Simon, 1991). There are several improvements to RCB, such as the inertial partitioning method (Williams, 1991) and the random spheres algorithm (Miller et al., 1991). The weakness of RCB is that it produces long, skinny clusters. Also, it can only cluster the graph into 2^k balanced clusters (Simon, 1991). A further geometric-partitioning method is the Space-Filling Curves method (Zumbusch, 2000) (Castro et al., 2005). Space-Filling Curves partitioning is a mixture of the grid-partitioning and Space-Filling Curves techniques. The drawback of this method is the difficulty of determining an adequate density of grid cells. Also, it is used to cluster large-scale systems with approximate clustering and acceptable computational complexity, but it does not serve the present work, which requires exact clustering. Finally, it does not consider the connectivity of the graph nodes, which is the basic attribute of a wired network.
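The RCB idea described above can be sketched as follows: at each level the nodes are sorted along the coordinate axis with the larger extent and split at the median. The function name, the two-dimensional coordinate dictionary and the `levels` parameter (the exponent k) are assumptions made for this illustration.

```python
def recursive_coordinate_bisection(points, levels):
    """Split named 2-D points into up to 2**levels clusters of near-equal size.

    points: dict mapping node name -> (x, y) coordinates.
    """
    names = sorted(points)
    if levels == 0 or len(names) < 2:
        return [names]
    xs = [points[name][0] for name in names]
    ys = [points[name][1] for name in names]
    # Cut across the axis along which the nodes are spread the most.
    axis = 0 if max(xs) - min(xs) >= max(ys) - min(ys) else 1
    ordered = sorted(names, key=lambda name: (points[name][axis], name))
    half = len(ordered) // 2
    left = {name: points[name] for name in ordered[:half]}
    right = {name: points[name] for name in ordered[half:]}
    return (recursive_coordinate_bisection(left, levels - 1)
            + recursive_coordinate_bisection(right, levels - 1))
```

The sketch uses only the node coordinates; the links of the graph never enter the computation, so two nodes that are close in space but not connected can still be placed in the same cluster.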
F) Gene Expression Data Clustering: One algorithm developed to group similar gene expression patterns into clusters is CLICK (CLuster Identification via Connectivity Kernels) (Sharan and Shamir, 2000). The major disadvantages of CLICK
reduce clustering quality (Sharan and Shamir, 2000). Another graph theory-based algorithm developed to cluster gene expression data is CAST (Cluster Affinity Search Technique) (Zhang, 2006) (Berrar et al., 2003). The downsides of CAST are the need to define the affinity threshold (t) and its final clearing step (Bellaachia et al., 2002).