Current Limitations and Recommended Improvements

Developing and implementing a new computational intelligence algorithm involves a number of phases. The most important is clarifying how the algorithm differs from alternatives and how these differences affect performance. These questions have been the focus of this thesis, but other issues are also relevant. Optimisation is a particular concern because the learning process is comparatively slow for ACO, making it difficult to explore, for example, the best parameter settings. This is where parallelisation may have its biggest influence: speeding up the learning process will help to show where the MPACA needs adjustment and how it can be fitted to different types of data sets, such as those with uneven class sizes.

5.4.1 Parallel versus Non-Parallel

A parallel version of the MPACA was implemented using the .NET 4.5 framework, a Microsoft technology that simplifies parallel implementation. Parallelisation was applied to the movement of the ants, which is where it would provide the maximum advantage. In this version, all ants are generated as separate parallel threads. Each ant moves independently of every other ant, as there is no prescribed sequence in which ants move.
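
The idea of moving ants as independent parallel tasks can be illustrated with a minimal Python sketch. It is not the thesis's .NET implementation: the toy graph `GRAPH`, the functions `move_ant`, `run_ants_parallel` and `run_ants_sequential`, and the per-ant seeding are all hypothetical, and pheromone deposition on shared edges is deliberately left out. The sketch only shows that, when no movement order is imposed, each ant can be submitted as its own task and the outcome matches a sequential run.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy graph: node -> list of neighbouring nodes.
GRAPH = {
    0: [1, 2],
    1: [0, 2, 3],
    2: [0, 1, 3],
    3: [1, 2],
}


def move_ant(ant_id, start_node, steps, seed):
    """Walk one ant for `steps` moves; it never reads another ant's state."""
    rng = random.Random(seed)  # private RNG, so ants do not share state
    node = start_node
    path = [node]
    for _ in range(steps):
        node = rng.choice(GRAPH[node])
        path.append(node)
    return ant_id, path


def run_ants_parallel(n_ants, steps, max_workers=4):
    """Submit every ant as an independent task; execution order is not fixed."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(move_ant, ant_id, ant_id % len(GRAPH), steps, seed=ant_id)
            for ant_id in range(n_ants)
        ]
        return dict(f.result() for f in futures)


def run_ants_sequential(n_ants, steps):
    """Reference version: the same ants moved one after another."""
    return dict(
        move_ant(ant_id, ant_id % len(GRAPH), steps, seed=ant_id)
        for ant_id in range(n_ants)
    )
```

Because each ant owns its random generator and writes only to its own path, the parallel and sequential runs return identical paths. In CPython, threads do not speed up CPU-bound work of this kind, so the sketch demonstrates the structure of the approach and not its timing behaviour.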

Cluster quality does not differ between the standard and parallel approaches. However, the difference in execution speed is significant. A preliminary comparison of execution speeds is given in table (5.1). It is the result of executing the Iris dataset in standard and in parallel mode for 100 instances each, with both experiments using exactly the same set of parameters. Results indicate that the parallel version is approximately 16% faster.
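
One way such a comparison could be summarised is as the relative reduction in mean execution time. The sketch below is an assumption about how a figure of this kind might be computed; the text does not state which definition of "faster" was used, the function name `relative_time_reduction` is hypothetical, and the timings in the usage lines are invented placeholders, not the values of table (5.1).

```python
from statistics import mean


def relative_time_reduction(standard_times, parallel_times):
    """Fraction by which the mean parallel run time is below the standard one."""
    mean_std = mean(standard_times)
    mean_par = mean(parallel_times)
    return (mean_std - mean_par) / mean_std


# Placeholder timings in seconds, for illustration only.
example_standard = [10.0, 10.2, 9.8]
example_parallel = [8.4, 8.5, 8.3]
reduction = relative_time_reduction(example_standard, example_parallel)
print(f"Mean run time reduced by {reduction:.1%}")
```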

Although parallel execution can significantly shorten execution time, improvements in the data structures used, and hardware superior to that currently available to the author, could further accelerate MPACA processing.

5.4.2 Parameters and Parameter Adjustment

Parameters are an important influence on the model’s operation, and all optimisation algorithms depend on finding the values that provide the best fit. The persistent problem is that ACO methods are computationally expensive and time consuming, so the MPACA code requires careful optimisation to achieve the necessary execution speed. Faster execution would help the search for a best fit, but more work is needed on better search methods. There may also be room for improvement within the algorithm by merging or eliminating certain parameters. Experiments already carried out on the parameters show that some change their settings very little when optimal values are learnt, which suggests that they could be built into the architecture and governed by fewer parameters.

5.4.3 Termination Criteria

Termination occurs when the ants reach a dynamic equilibrium between colony populations, or when a maximum number of iterations has been reached. In many cases where the clusters to be recognised are relatively well dispersed, this stabilisation of colonies occurs rapidly. The mechanism is therefore successful on balanced datasets. However, a clear limitation arises with uneven datasets. Varying population sizes can cause premature termination, as too many ants join a small set of colonies; conversely, in some cases they can prevent the algorithm from building the correct colonies and terminating adequately. This limitation is discussed next.
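
The two stopping conditions can be sketched as a small monitor class. This is an illustrative reading of "dynamic equilibrium", not the MPACA's actual test: the class name `EquilibriumMonitor`, the sliding `window`, the `tolerance` expressed as a fraction of the total ant population, and the default values are all assumptions introduced here.

```python
from collections import deque


class EquilibriumMonitor:
    """Signals termination when colony sizes stop changing or a cap is hit."""

    def __init__(self, window=5, tolerance=0.02, max_iterations=1000):
        self.window = window                  # iterations compared for stability
        self.tolerance = tolerance            # allowed drift, as a share of all ants
        self.max_iterations = max_iterations  # hard stop
        self.history = deque(maxlen=window)
        self.iteration = 0

    def update(self, colony_sizes):
        """Record one iteration's colony populations; return True to stop."""
        self.iteration += 1
        self.history.append(dict(colony_sizes))
        if self.iteration >= self.max_iterations:
            return True
        if len(self.history) < self.window:
            return False
        return self._is_stable()

    def _is_stable(self):
        total = sum(self.history[-1].values())
        if total == 0:
            return False
        colonies = set().union(*self.history)  # every colony seen in the window
        for colony in colonies:
            counts = [snapshot.get(colony, 0) for snapshot in self.history]
            if (max(counts) - min(counts)) / total > self.tolerance:
                return False
        return True
```

Under these assumptions the drift of each colony is measured against the whole population, so a very small colony can gain or lose most of its ants without exceeding the tolerance. This is one way in which an equilibrium test of this kind could stop too early on uneven datasets.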

5.4.4 Tackling Uneven Datasets

The MPACA is shown to return results comparable to those in the literature for many datasets. However, it has failed to return good quality results on datasets that are unevenly distributed, as is the case with the yeast dataset. This failure is evaluated in section (4.4.2), and it opens further avenues for improvement.

At present the algorithm has difficulty defining smaller clusters that are spatially positioned within a larger cluster. If the number of nodes representing the cluster to be learnt/recognised is too small, it will fail to generate the critical mass required for the creation of colonies. This occurs because ants join colonies depending on ant encounters, and a small, denser concentration within a larger concentration runs counter to this process. Alternatively, if the colony level merge threshold is set to an overly high value, the smaller cluster region within the larger cluster will form a colony to define it. However, the larger cluster might not have a single colony that represents it in full, but instead a number of smaller colonies. Thus, the larger cluster might fail to be recognised, which is also an undesired result.

A possible solution, which could potentially address both the learning of uneven datasets and correct termination, is to investigate further the influence of the visibility parameter. This discussion builds on the material presented in chapters (3) and (4), where lower visibility is shown to slow termination, whilst higher visibility accelerates it. If this parameter were made self-adjusting according to node concentration, nodes in regions of higher concentration would automatically have lower visibility, whilst sparser nodes would have higher visibility. Linking this parameter with node density should therefore, in theory, improve the results attained.
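
A minimal sketch of how visibility might be linked to node density follows. It is a proposal-level illustration only: the density estimate (a neighbour count within a fixed `radius`), the linear mapping, and the names `local_density`, `adaptive_visibility`, `v_min` and `v_max` are assumptions made here and are not part of the MPACA as described.

```python
import math


def local_density(points, index, radius):
    """Count the other points lying within `radius` of points[index]."""
    centre = points[index]
    return sum(
        1
        for j, other in enumerate(points)
        if j != index and math.dist(centre, other) <= radius
    )


def adaptive_visibility(points, radius, v_min, v_max):
    """Map each node's local density to a visibility in [v_min, v_max].

    The densest node receives v_min and the sparsest receives v_max,
    with linear interpolation in between.
    """
    densities = [local_density(points, i, radius) for i in range(len(points))]
    lo, hi = min(densities), max(densities)
    if hi == lo:
        # Uniform density: fall back to the midpoint of the range.
        return [(v_min + v_max) / 2.0] * len(points)
    return [
        v_max - (d - lo) / (hi - lo) * (v_max - v_min)
        for d in densities
    ]
```

With three tightly grouped points and one isolated point, the grouped points would receive the lower bound and the isolated point the upper bound. Whether a linear mapping is appropriate, and how `radius` should be chosen, would need to be established experimentally.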
