Chapter 3 Research Methodologies
3.7 MSGA: A New Evolutionary Algorithm Based on Multiple Seeds
To address the main limitations of the single seed based genetic algorithm, a novel framework named MSGA (Multiple Seeds Based Genetic Algorithm) is presented to obtain a stronger search ability. The novel features of this method are as follows:
1) m-Domain Model. The proposed method subdivides the whole solution space into m domains of equal size. The purpose of dividing the solution space is to obtain a seed from each domain and thereby maintain diversity when generating the initial population.
2) m-Seeds Selection Process. The proposed approach generates n chromosomes from each domain. From the members of each domain, it selects the single chromosome with the highest fitness value as the seed for that domain.
3) Initialize Population based on m-Seeds. For each seed chromosome, the next step is to generate n individuals by randomly changing any position of the seed chromosome. In this way, m seeds generate m×n individuals, which are used as the initial population for MSGA.
4) Method Implementation and Evaluation. To demonstrate the effectiveness of the proposed approach, a large number of studies are carried out on association rule mining approaches and different crossover and mutation operators. To demonstrate the feasibility of the proposed method, a number of experiments are conducted to mine a reduced set of interesting association rules by optimizing conditional probability using different crossover and mutation operators. To compare a single seed based approach with the proposed method, the same set of experiments is applied to different data sets for mining BARs using different crossover and mutation operators, with the population initialised from a single seed. Experimental results in Chapter 5 show that the multiple seeds based method demonstrates satisfactory performance over different single seed based methods.
The major focus of this algorithm is to apply the multiple seeds based generation mechanism so that the evolutionary process starts from a diversified initial population with good coverage, which in turn generates a large number of high quality rules. This process also helps to automate the selection of a seed chromosome without depending on a data set. The approach is intended to keep the evolutionary algorithm from being trapped in a local optimum at an early stage and to allow multiple convergences over the whole solution space, so that an overall global optimum can be reached. The characteristics of the multiple seeds based genetic algorithm are described in detail in Chapter 4.
Figure 7: The architecture of MSGA
The traditional genetic algorithm uses a single seed to generate an initial population. The basic idea of a single seed based genetic algorithm (SSGA) is to randomly select a chromosome from a large solution space and to generate the initial population from it. Because the seed is selected randomly, SSGA may suffer from premature convergence and extract only a small number of high quality rules from a large data set.
On the other hand, some seeds may have a low fitness value but could still generate a large number of high quality rules because they explore a large area of the solution space. Randomly selecting a seed chromosome and generating an initial population based on that chromosome cannot guarantee that the population covers the whole solution space. Because the initial population has a significant effect on the results obtained after several generations, population diversity, including good coverage of a large solution space, is important when generating an initial population, as it balances exploration and exploitation in the search. Thus, a global optimum can be achieved.
[Figure 7 labels recovered from the flowchart. Main flow boxes: Encoding; Generate Multiple Seeds; Generate Initial Population; Evaluate fitness value of individuals; Selection; Crossover; Mutation; Replace offspring into population; Stopping Condition (with a "No" branch); End. Side panels: Solution Space; K-Domain Solution Space (equally subdivide the solution space into k domains); Multiple Archives (1. gather chromosomes from each domain with fitness value; 2. rank chromosomes based on fitness values in each domain; example rows: 1 3 2 5 4.56, 2 8 1 9 3.13, 2 7 3 4 1.25); 1. select top chromosomes as seeds from m archives; 2. generate m×n population from m seeds by mutating any bits in a seed.]
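The coverage concern can be illustrated with a minimal sketch. It assumes, purely for illustration, that chromosomes are fixed-length bit strings, that the solution space is indexed by the integer value of the bit string, and that domains are m equal contiguous index ranges (with m dividing the size of the space). The names `single_seed_population` and `domains_covered` are hypothetical and do not come from this thesis.

```python
import random


def single_seed_population(size, length, rng):
    """Build a population by one-bit mutations of one random seed (assumed SSGA-style init)."""
    seed = [rng.randint(0, 1) for _ in range(length)]
    population = []
    for _ in range(size):
        child = list(seed)
        child[rng.randrange(length)] ^= 1  # flip one randomly chosen bit
        population.append(child)
    return seed, population


def domains_covered(population, m):
    """Count how many of m equal index ranges contain at least one individual.

    Assumption: a chromosome's position in the solution space is the integer
    value of its bit string, and m divides 2 ** length.
    """
    length = len(population[0])
    space = 2 ** length
    covered = set()
    for chrom in population:
        value = int("".join(map(str, chrom)), 2)
        covered.add(value * m // space)  # index of the domain holding this individual
    return len(covered)


rng = random.Random(1)
seed, population = single_seed_population(size=50, length=16, rng=rng)
print(domains_covered(population, m=8))  # at most 4 under these assumptions
```

Under these assumptions, with m a power of two, a population built by one-bit mutations of a single seed can touch at most log2(m) + 1 domains (4 of 8 in the call above), however large the population is. This is one concrete way in which a single seed may fail to cover the whole solution space.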
The basic idea of this research is to divide the whole solution space equally into m domains. From each domain, the method randomly generates n individuals and stores them in an archive, so m domains produce m archives. The chromosomes of each archive are ranked by fitness value: a chromosome with a higher fitness value has a higher rank. From each archive, the top ranked chromosome is selected as a seed. By mutating any bits of a seed, each seed generates n individuals. Therefore, m seeds generate m×n individuals, which are used as the initial population for the multiple seeds based genetic algorithm. The architecture of MSGA is shown in Figure 7.
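The initialization steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this thesis: chromosomes are assumed to be fixed-length bit strings, domains are assumed to be m near-equal contiguous ranges of the integer value of the bit string, each individual is assumed to differ from its seed by exactly one flipped bit, and `fitness` is a placeholder for the actual rule-quality measure. All function names are hypothetical.

```python
import random


def random_chromosome_in_domain(d, m, length, rng):
    """Sample a bit string whose integer value lies in domain d of m (assumed domain model)."""
    space = 2 ** length
    lo = d * space // m
    hi = (d + 1) * space // m  # exclusive upper bound; requires m <= 2 ** length
    value = rng.randrange(lo, hi)
    return [(value >> i) & 1 for i in reversed(range(length))]


def mutate_one_bit(seed, rng):
    """Return a copy of the seed with one randomly chosen bit flipped."""
    child = list(seed)
    child[rng.randrange(len(child))] ^= 1
    return child


def msga_initial_population(m, n, length, fitness, rng):
    """Build m archives, pick the top-ranked chromosome of each as a seed,
    then generate n individuals per seed (m * n individuals in total)."""
    seeds = []
    for d in range(m):
        archive = [random_chromosome_in_domain(d, m, length, rng) for _ in range(n)]
        archive.sort(key=fitness, reverse=True)  # rank by fitness, best first
        seeds.append(archive[0])
    population = [mutate_one_bit(s, rng) for s in seeds for _ in range(n)]
    return seeds, population


rng = random.Random(0)
# fitness=sum (number of 1 bits) is only a stand-in for the real fitness function.
seeds, population = msga_initial_population(m=4, n=5, length=8, fitness=sum, rng=rng)
print(len(seeds), len(population))  # 4 20
```

Because every seed comes from a different domain, the resulting population starts from m separate regions of the solution space, although a mutated individual may leave its seed's domain when a high-order bit is flipped.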