3.8 The MPACA Parameters
3.8.2 Baseline Analysis
This set of experiments builds on the research methodology presented later in section (4.3). In brief, the MPACA executes a number of parameter configurations and, depending on the quality of the results attained, the parameters are narrowed further to improve results. Parameters are not adjusted during an MPACA execution; they are adjusted only between execution instances. To establish a set of baseline values, the MPACA has been applied for 1,000 instances on varied parameter settings. The applicable parameter ranges are given in table (3.1). Each of the experiments that follow combines the results attained from these baseline experiments with results from 50 execution runs for each tested parameter.
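The narrowing step between executions can be illustrated with a minimal sketch. It assumes that each run yields a single scalar quality score (higher is better) and that a range is narrowed to the span covered by the best-scoring runs. The function name `narrow_range`, the `keep_fraction` argument, and the scoring itself are hypothetical and are not taken from the MPACA implementation.

```python
def narrow_range(runs, keep_fraction=0.25):
    """Return a narrowed (low, high) range for one parameter.

    runs: list of (parameter_value, quality) pairs, one per execution.
    keep_fraction: hypothetical share of best-scoring runs to retain.
    """
    if not runs:
        raise ValueError("at least one run is required")
    # Rank runs from best to worst quality.
    ranked = sorted(runs, key=lambda run: run[1], reverse=True)
    keep = max(1, int(len(ranked) * keep_fraction))
    best_values = [value for value, _ in ranked[:keep]]
    # The narrowed range spans the values used by the retained runs.
    return min(best_values), max(best_values)


# Illustrative data only: (parameter value, quality score).
example_runs = [(1, 0.2), (3, 0.9), (5, 0.8), (7, 0.4),
                (9, 0.1), (4, 0.85), (6, 0.5), (2, 0.3)]
narrowed = narrow_range(example_runs, keep_fraction=0.5)
```

The narrowing is applied only between executions, which matches the statement that parameters are never adjusted while the MPACA is running.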
These experiments only demonstrate the internal interaction of parameters and their influence on each other. The further comparisons described in section (2.2.3) are not undertaken here; they are presented in chapter (4). The units applicable to this experimental configuration also apply to the other experiments presented in chapter (4) and are as follows:
1. Maximum Edge Length - the maximum number of steps on each edge;
2. Step Size - equates to a fraction of a SD;
3. Ant Complement - an integral unit representing the number of ants present per feature per node;
4. Detection Range for Ordinal Dimensions - steps above or below a mean;
5. Quantity of Pheromone (Ph.) Deposited - an integral value representing the pheromone, Q;
6. Maximum Coefficient of Ph. Deposited - an integral value representing a coefficient (for example Q × 2);
7. Minimum Tolerance Value - an integral unit;
8. Evaporation Rate - a percentage applied to pheromone quantity, Q;
Parameter               start   mean    SD
Max. Edge Length        5       8       1.42
Step Size               0.1     0.1     0
Ant Complement          1       5.09    4.51
Detection Range         1       1.5     0.5
Ph. Qty. Deposited      100     175     56
Max. Ph. Coefficient    1       1.49    0.5
Min. Tolerance          1       1       0
Evaporation Rate (%)    0.01    0.06    0.04
Residual Value          0       1.01    0.82
Feature Merging         3       4.51    1.5
Colony Merging          3       4.55    1.5
Visibility              3       3.5     0.5
Time-window             50      74      25
TABLE 3.1: The MPACA parameter settings as applied to the Square1 dataset. Values were varied over 1,000 instances of the algorithm and are summarised by a minimum (start), mean, and standard deviation (SD). Note: isolated runs are also executed on a per-parameter basis (approximately 50 runs each) and are not included in the above baseline evaluation.
10. Feature Merging Threshold - an integral count;
11. Colony Merging Threshold - a separate integral count;
12. Visibility on Edge - an integral representing steps within an edge that an ant can see through; and
13. Time-window - the number of time stamped intervals which are analysed.
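The parameter set and its units can be summarised as a small configuration object. The following is a minimal sketch only: the class name `MpacaParams` and all field names are hypothetical, and the defaults are the start values from table (3.1). The two helper methods encode assumptions drawn from the unit descriptions, namely that the detection range is measured in steps of `step_size` standard deviations around a mean, and that the maximum deposit is the coefficient multiplied by Q.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MpacaParams:
    """Hypothetical container; defaults are the 'start' column of table (3.1)."""
    max_edge_length: int = 5            # maximum number of steps on each edge
    step_size: float = 0.1              # fraction of a standard deviation (SD)
    ant_complement: int = 1             # ants per feature per node
    detection_range: int = 1            # steps above or below a mean
    pheromone_quantity: int = 100       # Q
    max_pheromone_coefficient: int = 1  # multiplier applied to Q
    min_tolerance: int = 1
    evaporation_rate_pct: float = 0.01  # percentage applied to Q
    residual_value: float = 0.0
    feature_merging_threshold: int = 3
    colony_merging_threshold: int = 3
    visibility: int = 3                 # steps within an edge an ant can see
    time_window: int = 50               # time-stamped intervals analysed

    def detection_interval(self, mean, sd):
        """Assumed reading: range steps, each worth step_size * SD."""
        half_width = self.detection_range * self.step_size * sd
        return (mean - half_width, mean + half_width)

    def max_deposit(self):
        """Assumed reading of the coefficient, for example Q x 2."""
        return self.pheromone_quantity * self.max_pheromone_coefficient


baseline = MpacaParams()
low, high = baseline.detection_interval(mean=10.0, sd=2.0)
```

The object is frozen, which reflects the point that parameters stay fixed during a single MPACA execution and change only between execution instances.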
Due to the difficulty in measuring the interaction of all parameters concurrently, parameters are grouped together by their perceived action, as follows:
1. Domain initialisation, in which the maximum edge length and step size are analysed;
2. Ant initialisation, analysing the ant complement and detection range parameters;
3. Pheromone deposition and movement, in which the quantity, minimum and maximum amounts of pheromone present on each edge, evaporation rate, and also the residual parameter are explored; and
4. Merging thresholds, in which feature and colony merging are explored together with the edge visibility and the time-window parameters.
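The grouping above suggests varying one group at a time while the remaining parameters stay at their baseline values. The sketch below illustrates that idea under stated assumptions: the parameter keys, the `group_sweep` helper, and the mapping of the "minimum" pheromone amount onto `min_tolerance` are hypothetical and are not confirmed by the text.

```python
from itertools import product

# Hypothetical keys; the grouping follows the four groups listed above.
PARAMETER_GROUPS = {
    "domain_initialisation": ["max_edge_length", "step_size"],
    "ant_initialisation": ["ant_complement", "detection_range"],
    "pheromone": ["pheromone_quantity", "max_pheromone_coefficient",
                  "min_tolerance", "evaporation_rate_pct", "residual_value"],
    "merging": ["feature_merging_threshold", "colony_merging_threshold",
                "visibility", "time_window"],
}


def group_sweep(baseline, group, candidates):
    """Yield configurations that vary only the parameters of one group.

    baseline: dict of parameter name -> baseline value.
    candidates: dict of parameter name -> list of values to try;
                parameters without candidates keep their baseline value.
    """
    names = PARAMETER_GROUPS[group]
    value_lists = [candidates.get(name, [baseline[name]]) for name in names]
    for combination in product(*value_lists):
        config = dict(baseline)  # copy so the baseline is never mutated
        config.update(zip(names, combination))
        yield config


# Illustrative baseline using start values for two of the groups.
baseline = {"max_edge_length": 5, "step_size": 0.1,
            "ant_complement": 1, "detection_range": 1}
configs = list(group_sweep(baseline, "domain_initialisation",
                           {"max_edge_length": [5, 8, 11]}))
```

Sweeping within a group keeps the number of configurations manageable, at the cost of not observing interactions between parameters that sit in different groups.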
The evaluations that follow combine parameter-specific experiments with the baseline evaluation, as per table (3.1). Each analysis has a varying optimality range, as optimality differs between the initial phases of execution and the latter stages of execution. The optimal value should in theory build towards the “best” clustering solution. For each parameter, this analysis aims to determine the optimal value, its degree of influence, the influence it has on other parameters, and why such behaviour takes place.
Graphs are used to represent four key benchmarks, as described next.
Figure (a) shows the average feature combinations carried by ants. For the given 2D problem domain, the optimal average feature combination value at the end of processing should be within the range of 1.5 to 1.7. Higher values, especially at an early stage, indicate premature stagnation, whilst lower values towards the end indicate that feature merging is not occurring. This lack of feature merging is an indication that not enough ant encounters are taking place.
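As an illustration of this benchmark, the sketch below computes the mean number of features carried per ant and compares it with the 1.5 to 1.7 band quoted above. The representation of an ant as a set of feature names and both function names are assumptions made for the example.

```python
def average_feature_combination(ant_features):
    """Mean number of features carried per ant.

    ant_features: list of sets, one set of feature names per ant
    (a hypothetical representation chosen for this sketch).
    """
    if not ant_features:
        raise ValueError("at least one ant is required")
    return sum(len(features) for features in ant_features) / len(ant_features)


def classify_feature_average(average, low=1.5, high=1.7):
    """Place the average relative to the end-of-run band for a 2D domain."""
    if average > high:
        return "above"   # may indicate premature stagnation if seen early
    if average < low:
        return "below"   # may indicate that feature merging is not occurring
    return "within"


# Five ants in a 2D domain with features "x" and "y".
ants = [{"x"}, {"x", "y"}, {"y"}, {"x", "y"}, {"x", "y"}]
average = average_feature_combination(ants)
status = classify_feature_average(average)
```

In a 2D domain the value is bounded by 1 (no merging) and 2 (every ant carries both features), so the quoted band corresponds to roughly half to two thirds of ants carrying both features.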
Figure (b) shows the distribution of ants in the top N colonies versus other smaller colonies. The absorption of ants in the top N colonies is an indication of a good clustering solution. Under optimal conditions this value should range between 0.6 and 0.8, which indicates that enough ants are joining the larger colonies, whilst allowing some free ants to exist. It is unlikely that all ants combine correctly into the major colonies, and the absence of minor colonies can indicate over-aggregation. An important consideration is that sudden merging of features and colonies does not imply correct cluster formation.
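This benchmark can be read as the share of all ants that belong to the N largest colonies. A minimal sketch follows, assuming colony membership is available as a list of colony sizes; the function name `top_n_share` is hypothetical.

```python
def top_n_share(colony_sizes, n):
    """Fraction of all ants that belong to the n largest colonies."""
    total = sum(colony_sizes)
    if total == 0:
        raise ValueError("no ants are assigned to any colony")
    largest = sorted(colony_sizes, reverse=True)[:n]
    return sum(largest) / total


# Illustrative sizes: two major colonies and four minor ones.
sizes = [40, 30, 10, 10, 5, 5]
share = top_n_share(sizes, n=2)
within_band = 0.6 <= share <= 0.8
```

A share close to 1.0 would mean that almost no minor colonies or free ants remain, which is the over-aggregation case noted above.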
Figure (c) shows the repetition in node traversals, where a high value indicates less randomness, whilst a lower value is indicative of a more varied search. In the opening phases, repetition should be low, indicating that ants are spread across the graph nodes. As time progresses, more ants should be repeating the sequence of nodes that they traverse, which is indicative of correct pheromone path following. Under optimal conditions this value should range between 0.5 and 0.6, as some freedom in movement should always be present.
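The text does not define how repetition is computed, so the sketch below shows only one possible reading: the fraction of ants whose most recent sequence of nodes is identical to the sequence immediately before it. The function name `repetition_rate`, the `cycle_length` argument, and this definition are all assumptions made for illustration.

```python
def repetition_rate(paths, cycle_length):
    """Fraction of ants repeating their previous node sequence.

    paths: list of node-id lists, one traversal history per ant.
    cycle_length: hypothetical number of nodes compared per sequence.
    Ants with fewer than 2 * cycle_length visited nodes are ignored.
    """
    if cycle_length < 1:
        raise ValueError("cycle_length must be at least 1")
    eligible = [p for p in paths if len(p) >= 2 * cycle_length]
    if not eligible:
        return 0.0
    repeating = sum(
        1 for p in eligible
        if p[-cycle_length:] == p[-2 * cycle_length:-cycle_length]
    )
    return repeating / len(eligible)


# Two ants repeat a three-node sequence, one does not, one is too short.
paths = [[1, 2, 3, 1, 2, 3], [1, 2, 3, 2, 1, 3], [4, 5, 6, 4, 5, 6], [1, 2]]
rate = repetition_rate(paths, cycle_length=3)
```

Under this reading, a low rate in the opening phases and a rate settling in the 0.5 to 0.6 band later on would match the behaviour described above.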
Figure (d) shows the progressive decline in the number of colonies present and the termination criteria. This is also time dependent, where the initial colony count is reduced as more colonies are engulfed by larger colonies. Coupled with figure (b), whilst the majority of ants are expected to be in the top N colonies, a number of smaller colonies are likely to exist, indicating correct functionality. Termination is not necessarily shown in these experiments given the short time cycles used. Together, the decline in colony counts and the stabilisation of colony membership are indicative of correct termination.
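A simple way to detect the stabilisation described above is to check whether the colony count has stopped changing over the most recent observations. The sketch below is one hedged interpretation: the function name `has_stabilised` and the `window` argument are hypothetical, and the actual MPACA termination criteria may also consider colony membership rather than counts alone.

```python
def has_stabilised(colony_counts, window):
    """True if the colony count is unchanged over the last `window` samples.

    colony_counts: colony count recorded at each time-stamped interval.
    window: hypothetical number of trailing samples that must agree.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(colony_counts) < window:
        return False  # not enough history to judge
    recent = colony_counts[-window:]
    return len(set(recent)) == 1


# Illustrative decline: smaller colonies are engulfed by larger ones.
counts = [20, 12, 7, 5, 4, 4, 4]
stable_short = has_stabilised(counts, window=3)
stable_long = has_stabilised(counts, window=4)
```

With short time cycles the window may never be satisfied, which is consistent with termination not necessarily being shown in these experiments.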