3.10 Novelty and Contribution of the MPACA

3.10.2 Variations of the MPACA from the Traditional Clustering Algorithms

As highlighted in section (2.2.5), the MPACA falls under the graph-theoretic clustering algorithms, section (2.2.4.4). The graph represents the problem domain, where node proximity in space also signifies a higher degree of relevance. Ants in the MPACA traverse the graph structure and lay pheromone to further link nodes of higher relevance. The effect of this pheromone causes ants to form higher densities around certain regions at the expense of others.

The MPACA does not utilise a direct probabilistic mechanism to create clusters. Instead, it uses the density of the ant population, rather than that of the original objects, to define the clusters. It then assigns nodes to the nearest ant colony, which is a representative of a cluster. This is a density function which differs from that presented in section (2.2.4.3), since the MPACA uses the frequency of ants, which are themselves proxies of node content, rather than node density itself. This can be interpreted as a density-by-proxy or a mocked density function. Ants in the MPACA are not interested in nodes per se, but rather in the features that make up the node. Encounters between ants, and between ants and features, drive a two-fold merging mechanism that is applied to both features and colonies. The MPACA merges features carried by the ants depending on the frequency of their encounters, and ants from different smaller colonies merge into bigger ones using a similar mechanism. Colonies merge once a critical number of ants is reached, as set by a colony-level threshold. This can be considered equivalent to MinPts in DBSCAN [Ester et al., 1996b], since once the colony-level threshold is exceeded, two colonies merge.
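The density-by-proxy idea and the threshold-based merge can be illustrated with a minimal Python sketch. It is not the MPACA implementation: the function names, the use of the mean ant position as the colony representative, and the Euclidean distance are all assumptions made for illustration only.

```python
from math import dist

def colony_centroids(ant_positions, ant_colony):
    """Mean position of the ants in each colony.

    Assumption: the colony is represented by the centroid of its ants,
    so that ants (not nodes) define where a cluster lies.
    """
    sums, counts = {}, {}
    for pos, colony in zip(ant_positions, ant_colony):
        acc = sums.setdefault(colony, [0.0] * len(pos))
        for i, x in enumerate(pos):
            acc[i] += x
        counts[colony] = counts.get(colony, 0) + 1
    return {c: tuple(v / counts[c] for v in sums[c]) for c in sums}

def assign_nodes(nodes, centroids):
    """Label each node with the colony whose representative is nearest."""
    return [min(centroids, key=lambda c: dist(node, centroids[c]))
            for node in nodes]

def should_merge(encounters, colony_threshold):
    """Two colonies merge once the colony-level threshold is exceeded
    (hypothetical counter; plays a role comparable to MinPts in DBSCAN)."""
    return encounters > colony_threshold
```

For example, four ants split across two colonies yield two representatives, and each node is then labelled by the closer one, in a manner reminiscent of the K-Means assignment step.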

The MPACA can be viewed as a hybridisation of the algorithm methodologies presented, since it uses the concepts of density as in the DBSCAN [Ester et al., 1996b] to form clusters, and the K-Means [MacQueen, 1967] to allocate data elements to one cluster or the other. It also uses a normalisation function, sharing similarities with Grid-based clustering (section 2.2.4.3). These classical clustering approaches serve as a benchmark for results.

3.10.3 Variations from Ant Based Clustering Literature

This subsection gives, in part, a global overview of how the MPACA varies from the algorithms presented in the literature overview, chapter (2), based on the typology groupings presented earlier. In this text, ACO stands for typologies of type IV and their sub-derivative algorithms.

Type I: Akin to the AntTree algorithm, ants in the MPACA use self-assembly. The difference is that, rather than building fixed structures in which the structure itself represents the cluster, ants in the MPACA merge into colonies, forming a dynamic structure. In the MPACA it is this colony of ants and the features they carry that ultimately represent a cluster. The structure is dynamic, as ants can easily move from one colony to another. In the MPACA ants follow other ants which are searching for similar features. When a locale is found which has many ants searching for similar features, there is an increased tendency for the ant to stay fixed within it. The modus operandi of the AntTree, as discussed in section (2.6.1), remains that of building a hierarchical structure from which cluster definitions are extracted. This is distinct from the MPACA, which is based on ACO principles (type IV).

Type II: The MPACA uses a graph-based architecture where data elements are represented as nodes, which are fixed during the clustering process. This differs substantially from the SACA approach, in which data elements are represented as blocks within a 2D space and are moved around by ants to form clusters of higher density. Hybrid-SACA approaches, such as the ACLUSTER and APC, introduce the notion of pheromone-driven ants. In these algorithms the final spatial positioning of data elements is a result of the clustering process itself, whilst in the MPACA graph nodes are fixed. The notion of graph clustering clearly separates the MPACA from any type II method discussed earlier in section (2.6.2).

Type III: The ANTCLUST algorithm introduced clustering based on the agreement of a colonial odour between ants. That is, the ants determine what pheromone values best represent the colony. This approach introduces a “learning” function within ants, as ants adjust their pheromone according to their interaction with other ants. This mechanism is akin to the MPACA, where ants, rather than learning the pheromone carried by other ants, learn which features they should follow and which colonies they should join. The similarities with ANTCLUST are nonetheless limited, as discussed with respect to its relevance to the MPACA in section (2.6.3).

Type IV: The MPACA is graph based, and shares analogies with the core ACO principles. That is, ants traverse graph space, and during these traversals deposit pheromone traces. This chapter has reviewed two different ACO (graph) based approaches. In the first instance, the approach is used purely for optimisation purposes, where at the end of each iteration a fitness function is applied to calculate the best tours performed by the ants and to determine the quantity of pheromone that is to be deposited. Other multi-objective, multi-colony and multi-pheromone implementations use an extension of this approach. All make use of the core ant colony optimisation mechanism; thus, even if they do provide a multi-colony aspect, they are still distinct from the MPACA. Although some previous ant models have multiple pheromones, none of them has a different pheromone associated with each distinguishing feature of the objects being analysed. This is a key innovation of the proposed model.
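One way to picture a pheromone per distinguishing feature is a store keyed by both edge and feature, as in the Python sketch below. The class name, the string feature labels, and the evaporation step are hypothetical; evaporation is borrowed from standard ACO practice and is not stated here as part of the MPACA.

```python
from collections import defaultdict

class FeaturePheromone:
    """Pheromone levels kept separately per (edge, feature) pair.

    Illustrative only: a single-pheromone model would key on the edge alone.
    """

    def __init__(self, evaporation=0.1):
        self.evaporation = evaporation      # assumed decay rate per step
        self.levels = defaultdict(float)    # (edge, feature) -> amount

    def deposit(self, edge, feature, amount=1.0):
        """An ant carrying `feature` reinforces `edge` for that feature only."""
        self.levels[(edge, feature)] += amount

    def evaporate(self):
        """Decay every trace by the same factor (standard ACO assumption)."""
        for key in self.levels:
            self.levels[key] *= 1.0 - self.evaporation

    def level(self, edge, feature):
        """Current amount of the pheromone for `feature` on `edge`."""
        return self.levels.get((edge, feature), 0.0)
```

Because traces for different features never mix, an ant following one feature is unaffected by deposits made for another feature on the same edge.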

In the second case, one finds the closest resemblance to the MPACA, where pheromone traces are used to create connections between nodes of higher relevance, and where node proximity also signifies a higher degree of relevance. This approach follows on from graph-theoretic clustering, section (2.2.4.4). Much like ACODF, the MPACA uses the ant population in specific areas of higher density to create clusters. The MPACA also shares similarities with ODUEC: in ODUEC colonies compete to colonise nodes, whereas in the MPACA colonies seem to compete to take more ants into their fold. Ants in the MPACA form into bigger colonies depending on the frequency of their encounters. If an ant detects many other ants belonging to one colony rather than another, there is a higher likelihood of the ant joining that colony. Once more, akin to ODUEC, the MPACA uses multiple pheromones. The effect of this pheromone causes ants to form in higher densities around certain regions.
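The encounter-driven choice of colony can be sketched as follows. This is a minimal illustration that assumes the likelihood of joining a colony is directly proportional to the number of encounters with its members; the actual MPACA rule may differ, and both function names are hypothetical.

```python
import random
from collections import Counter

def join_probabilities(encountered_colonies):
    """Share of recent encounters attributable to each colony."""
    counts = Counter(encountered_colonies)
    total = sum(counts.values())
    return {colony: n / total for colony, n in counts.items()}

def choose_colony(encountered_colonies, rng=random):
    """Pick a colony to join, favouring those encountered more often.

    Assumption: proportional (roulette-wheel) selection over encounter counts.
    """
    probs = join_probabilities(encountered_colonies)
    colonies = list(probs)
    return rng.choices(colonies, weights=[probs[c] for c in colonies], k=1)[0]
```

An ant that has met three members of colony "a" and one of colony "b" would, under this assumption, join "a" three times out of four on average.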

Despite the MPACA being a graph-based clustering algorithm, a substantial effort has been invested in pursuing a graph architecture that is scalable and efficient.