4.3 Architectural Design of the Prototype
4.3.1 Parameter Setting Interface
Before the prototype is executed, the parameters for both the GA and the ANN need to be defined. As this is only a prototype of our approach, there is no graphical user interface (GUI) for defining them. Instead, the parameters are set directly in the code statements shown in Figure A.1 in Appendix A. Table 4.2 presents the parameter settings of the prototype.
Table 4.2: The summary of the GANN interface parameters.
Parameter           Description
RUN COUNT           The whole GANN process cycle. The default value is 5000.
INPUT ROWS          The number of samples to be read from the data set.
INPUT COLS          The number of features to be read from the data set.
CLASS COUNT         The number of class clusters in the data set.
HIST COUNT          The number of ranked features produced by GANN. The default value is consistent with the INPUT COLS parameter.
HIST MIN            The minimum frequency of the correctly labelled samples in each feature produced by GANN. The default value is 0.
HIST MAX            The maximum frequency of the correctly labelled samples in each feature produced by GANN. The default value is consistent with the INPUT ROWS parameter.
GA POPSIZE          The population sizes to be examined {100, 200, 300}. Users can input integer values that are higher or lower than the defined sizes.
GA EVALS            The evaluation sizes to be examined {5000, 10000, 15000, 20000, 25000, 30000, 35000, 40000, 45000, 50000}. Users can input integer values that are higher or lower than the defined sizes.
GA PRECISION        The fitness confidence (accuracy) level of GANN. The default value is consistent with the INPUT ROWS parameter.
GA MUTATIONDIST     The mutation point (the point on the chromosome to be changed) of the string (offspring). The default value is 0.5.
GA MUTATIONRATE     The mutation rate. The default value is 0.1.
GA XFACTOR          The cut-point on the parent strings. The default value is 2 (single-point crossover).
GA CROSSOVER        The execution of the crossover function. The default setting is true. Users can deactivate this function by setting the parameter to false.
GA TOURNAMENTSIZE   The size of the tournament selection. The default value is 2.
MLP ISIZE           The input nodes in the input layer. The default value is 10.
MLP HSIZE           The hidden nodes in the hidden layer. The default value is 5.
MLP OSIZE           The output nodes in the output layer. The value must be consistent with the CLASS COUNT parameter.
MLP ACT             The activation function. The available functions include binary sigmoid, linear, tanh and threshold.
MLP SIZE            The network weights including the bias nodes. Users can amend the number of bias nodes in the hidden and output layers.
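As an illustration of how the settings in Table 4.2 relate to one another, the following minimal Python sketch collects them in a single configuration object. It is not the prototype's code (which is given in Figure A.1); the class name GannConfig and the lower-case field names are hypothetical. Parameters for which the table states no default are left as required arguments, and the "consistent with" defaults are resolved from the data set parameters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GannConfig:
    """Hypothetical container for the Table 4.2 parameters."""
    # Required: Table 4.2 states no default for these.
    input_rows: int
    input_cols: int
    class_count: int
    ga_popsize: int          # examined sizes: 100, 200, 300
    ga_evals: int            # examined sizes: 5000 ... 50000
    mlp_act: str             # binary sigmoid, linear, tanh or threshold
    # Defaults taken from Table 4.2.
    run_count: int = 5000
    hist_min: int = 0
    ga_mutationdist: float = 0.5
    ga_mutationrate: float = 0.1
    ga_xfactor: int = 2      # 2 = single-point crossover
    ga_crossover: bool = True
    ga_tournamentsize: int = 2
    mlp_isize: int = 10
    mlp_hsize: int = 5
    # None means "follow the data set parameter named in Table 4.2".
    hist_count: Optional[int] = None     # follows input_cols
    hist_max: Optional[int] = None       # follows input_rows
    ga_precision: Optional[int] = None   # follows input_rows

    def __post_init__(self) -> None:
        if self.hist_count is None:
            self.hist_count = self.input_cols
        if self.hist_max is None:
            self.hist_max = self.input_rows
        if self.ga_precision is None:
            self.ga_precision = self.input_rows

    @property
    def mlp_osize(self) -> int:
        # The output layer must match the number of classes.
        return self.class_count
```

Deriving MLP OSIZE from CLASS COUNT, rather than storing it separately, is one way to enforce the consistency requirement stated in the table.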
Three data set parameters, INPUT ROWS, INPUT COLS and CLASS COUNT, were used to describe the experimental data set. INPUT ROWS is the total number of samples in the data set, INPUT COLS is the total number of features in the data set and CLASS COUNT is the number of classes in the data set.
Figure 4.4: GANN Prototype: A high level of architectural design.
The layout of the results is defined by three parameters, i.e. HIST COUNT, HIST MIN and HIST MAX. HIST COUNT indicates the total number of ranked features to be displayed by the prototype, while HIST MIN and HIST MAX indicate the cut-off range, i.e. the minimum and the maximum frequency of correctly labelled samples, for each ranked feature to be displayed by the prototype.
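The effect of the three layout parameters can be sketched as a simple filter over per-feature frequencies. The function below is a hypothetical illustration rather than the prototype's implementation: it assumes that the cut-off range is inclusive, that features are ranked in descending order of frequency, and that ties are broken by feature index.

```python
def select_ranked_features(freq, hist_count, hist_min, hist_max):
    """Return up to hist_count (feature, frequency) pairs within the cut-off range.

    freq maps a feature index to the frequency of correctly labelled
    samples recorded for that feature (an assumed data layout).
    """
    # Keep only features whose frequency lies inside [hist_min, hist_max].
    kept = [(f, n) for f, n in freq.items() if hist_min <= n <= hist_max]
    # Rank by descending frequency; break ties by feature index.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return kept[:hist_count]
```

With the default settings (HIST MIN of 0, HIST MAX equal to INPUT ROWS and HIST COUNT equal to INPUT COLS), such a filter would keep every feature and only impose the ranking.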
For the GA to optimise the chromosomes in the population, eight parameters were included: GA POPSIZE, GA EVALS, GA PRECISION, GA MUTATIONDIST, GA MUTATIONRATE, GA XFACTOR, GA CROSSOVER and GA TOURNAMENTSIZE. GA POPSIZE indicates the population size, GA EVALS indicates the evaluation size and GA PRECISION is the fitness accuracy of the prototype. GA MUTATIONDIST and GA MUTATIONRATE are the mutation point and the mutation rate for the mutation operator, GA XFACTOR is the cut point for the crossover operator and GA CROSSOVER activates or deactivates the crossover function. GA TOURNAMENTSIZE is the number of chromosomes competing in the tournament.
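To make the roles of these parameters concrete, the following Python sketch shows one common way to implement tournament selection, single-point crossover and per-gene mutation. It is a sketch under stated assumptions, not the prototype's code: a chromosome is assumed to be a list of feature indices, a higher fitness is assumed to be better, and all function names are hypothetical. GA MUTATIONDIST is not modelled because its exact semantics are not specified beyond the description in Table 4.2.

```python
import random


def tournament_select(population, fitness, size, rng):
    """Pick `size` random competitors and return the fittest one."""
    contenders = rng.sample(range(len(population)), size)
    best = max(contenders, key=lambda i: fitness[i])
    return population[best]


def single_point_crossover(parent_a, parent_b, rng):
    """Join the head of parent_a to the tail of parent_b at one cut point."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]


def mutate(chromosome, rate, n_features, rng):
    """Replace each gene with a random feature index with probability `rate`."""
    return [rng.randrange(n_features) if rng.random() < rate else gene
            for gene in chromosome]


def make_offspring(population, fitness, n_features, rng,
                   tournament_size=2, mutation_rate=0.1, crossover=True):
    """Produce one new string from two tournament-selected parents."""
    p1 = tournament_select(population, fitness, tournament_size, rng)
    p2 = tournament_select(population, fitness, tournament_size, rng)
    # The crossover flag mirrors GA CROSSOVER: when false, copy one parent.
    child = single_point_crossover(p1, p2, rng) if crossover else list(p1)
    return mutate(child, mutation_rate, n_features, rng)
```

The keyword defaults mirror the defaults of GA TOURNAMENTSIZE, GA MUTATIONRATE and GA CROSSOVER in Table 4.2. Passing a seeded random.Random instance as rng keeps runs reproducible.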
To compute the fitness values for the GA chromosomes, five ANN parameters were used, i.e. MLP ISIZE, MLP HSIZE, MLP OSIZE, MLP ACT and MLP SIZE. The first three parameters (MLP ISIZE, MLP HSIZE and MLP OSIZE) indicate the number of nodes in the input, hidden and output layers, respectively. MLP ACT is the activation function for the hidden layer and MLP SIZE is the total number of ANN weights, including the bias weights.
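The relationship between MLP SIZE and the three layer sizes can be illustrated with a short calculation. The sketch below assumes a fully connected network with a single hidden layer, in which every bias node feeds all nodes of the following layer; the function name and the hidden_bias and output_bias arguments are hypothetical.

```python
def mlp_weight_count(isize, hsize, osize, hidden_bias=1, output_bias=1):
    """Total number of weights in a fully connected one-hidden-layer MLP.

    hidden_bias and output_bias are the numbers of bias nodes feeding
    the hidden and output layers, respectively.
    """
    input_to_hidden = (isize + hidden_bias) * hsize
    hidden_to_output = (hsize + output_bias) * osize
    return input_to_hidden + hidden_to_output
```

Under these assumptions, the default MLP ISIZE of 10 and MLP HSIZE of 5 with a two-class data set (MLP OSIZE of 2) give (10 + 1) × 5 + (5 + 1) × 2 = 67 weights.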
INITIALISE GANN parameters
REPEAT until termination criteria A (Max. no. of iteration) is satisfied {
    Generate GA population
    Calculate fitness values for each string in the population {
        DO WHILE EOF {
            a. Generate network weights
            b. RUN ANN
            c. Compute centroid values for each class using target output
            d. Calculate distance between samples and classes
            e. Label samples to its nearest class
        }
    }
    REPEAT until termination criteria B (Max. no. of fitness evaluation || predefined precision value) is satisfied {
        Select 2 strings as parents for reproduction
        Perform GA operators {
            For network evolution {
                a. Crossover 2 set of network weights to produce new set of network weights
                b. Mutate the new set of weights
            }
            For feature evolution {
                a. Crossover 2 strings to produce new set of string
                b. Mutate the new string
            }
        }
        Calculate fitness value for new string
        Replace the least fit string with the new string
    }
    Calculate the number of correctly labelled samples
}
PRODUCE summary results
Figure 4.5: The pseudocode of the GANN prototype.
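Steps c to e of the fitness calculation in Figure 4.5, followed by the count of correctly labelled samples, can be sketched in Python as follows. This is an illustrative reading of the pseudocode rather than the prototype's code: it assumes that a class centroid is the mean ANN output vector of the samples belonging to that class, that distance is Euclidean, and that the fitness is the number of samples whose nearest centroid matches their target class. The function name centroid_fitness is hypothetical.

```python
import math


def centroid_fitness(outputs, targets):
    """Count samples whose nearest class centroid matches their target class.

    outputs: one ANN output vector (list of floats) per sample.
    targets: the target class label of each sample.
    """
    # Step c: centroid of each class = mean output vector of its samples.
    sums, counts = {}, {}
    for vec, cls in zip(outputs, targets):
        acc = sums.setdefault(cls, [0.0] * len(vec))
        for j, value in enumerate(vec):
            acc[j] += value
        counts[cls] = counts.get(cls, 0) + 1
    centroids = {cls: [s / counts[cls] for s in total]
                 for cls, total in sums.items()}

    # Steps d and e: label each sample with its nearest centroid,
    # then count the labels that agree with the target class.
    correct = 0
    for vec, cls in zip(outputs, targets):
        nearest = min(centroids, key=lambda c: math.dist(vec, centroids[c]))
        if nearest == cls:
            correct += 1
    return correct
```

Under this reading, the largest possible fitness equals the number of samples, which is consistent with the default value of GA PRECISION following INPUT ROWS.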
achieved (GA PRECISION). The RUN COUNT parameter performs external looping over the entire extraction process, and the GA EVALS parameter internally repeats the fitness computation module.