
Chapter 2: Intelligent System Techniques

2.4 Genetic Algorithm (GA)

The Genetic Algorithm (GA) is based on natural genetics, in which each individual in a population is represented by a number of chromosomes. These genetic structures, called ‘genotypes’, can be used to select the best solution to a problem based on their ‘fitness’ (Michalewicz, 1992, Holland, 1992).

In IS, GAs are used as optimization and search algorithms that apply the same concepts as natural genetics, mimicking evolution in nature (Goldberg, 1989, Randy L. Haupt, 2004, Negnevitsky, 2005, Z. Didekova, 2009). The idea of the GA comes from John Holland; he and his team tried to explain the adaptive processes of natural systems and, at the same time, to design artificial systems software that retains the important mechanisms of natural systems (Goldberg, 1989).


The GA involves three main processes: selection, crossover and mutation. The selection process determines which pairs of chromosomes contribute most to producing the best solutions to the problem. It is based on the fitness function, as in equation (2.3) (Michalewicz, 1992, Negnevitsky, 2005). A commonly used selection function is the roulette wheel, with slots sized according to the fitness values, as in equation (2.4) (Michalewicz, 1992). Negnevitsky (2005) describes a method for finding the maximum value of the function 15x − x². Figure 2.4 shows the selection process for this function.
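The fitness evaluation described above can be sketched in a few lines of Python. This is a minimal illustration, assuming 4-bit binary chromosomes that encode integers x from 0 to 15. The six bit strings and the helper names `decode` and `fitness` are illustrative assumptions and are not read from Figure 2.4.

```python
def decode(chromosome: str) -> int:
    """Interpret a binary string such as '1100' as an unsigned integer."""
    return int(chromosome, 2)


def fitness(x: int) -> int:
    """Objective used in the text: f(x) = 15x - x^2."""
    return 15 * x - x * x


# Illustrative population (an assumption made for this sketch).
population = ["1100", "0100", "0001", "1110", "0111", "1001"]

# Fitness of each chromosome and total fitness F of the population.
fitness_values = [fitness(decode(c)) for c in population]
total_fitness = sum(fitness_values)

# Share of the roulette wheel given to each chromosome.
selection_probabilities = [f / total_fitness for f in fitness_values]

for c, f, p in zip(population, fitness_values, selection_probabilities):
    print(f"{c}  x={decode(c):2d}  fitness={f:2d}  share={p:.3f}")
```

Each chromosome's share of the total fitness is the fraction of the wheel it would occupy in a fitness-proportionate selection scheme.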

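$$F = \sum_{i=1}^{pop\_size} eval(v_i) \tag{2.3}$$

$$\text{selected chromosome} = \begin{cases} v_1 \ (\text{chromosome 1}), & r < q_1 \\ v_i, & \text{otherwise, with } q_{i-1} < r \le q_i \end{cases} \tag{2.4}$$

where r is a random number in the range [0, 1] and q_i = \sum_{j=1}^{i} eval(v_j)/F is the cumulative selection probability of chromosome v_i.

Figure 2.4: Selection process (Negnevitsky, 2005)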


In this selection process, the area of each slice of the pie chart depends on the fitness value of the corresponding chromosome in the population: the higher the fitness value, the larger the slice. With roulette wheel selection, the wheel is spun six times because the population contains six chromosomes, and the larger a chromosome's slice, the higher the chance that it will be selected.
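A roulette wheel of this kind can be sketched as follows. This is a minimal illustration rather than the implementation used in the cited works: the function names, the labels `X1` to `X6` and the fitness values are assumptions, and the fitness values are assumed to be non-negative with a positive total.

```python
import random
from itertools import accumulate


def roulette_select(population, fitness_values, rng):
    """Pick one chromosome with probability proportional to its fitness."""
    total = sum(fitness_values)
    # Cumulative probabilities mark the right-hand edge of each slice.
    cumulative = list(accumulate(f / total for f in fitness_values))
    r = rng.random()  # random number in [0, 1)
    for chromosome, edge in zip(population, cumulative):
        if r < edge:
            return chromosome
    return population[-1]  # guard against floating-point rounding


def spin_wheel(population, fitness_values, rng):
    """Spin the wheel once per chromosome to build the mating pool."""
    return [roulette_select(population, fitness_values, rng)
            for _ in population]


# Illustrative labels and fitness values (assumptions for this sketch).
labels = ["X1", "X2", "X3", "X4", "X5", "X6"]
values = [36, 44, 14, 14, 56, 54]
mating_pool = spin_wheel(labels, values, random.Random(0))
print(mating_pool)
```

The mating pool has the same size as the population, so a fit chromosome may appear in it several times while a weak one may not appear at all.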

The crossover process uses a crossover probability, as shown in Figure 2.5 for the same problem as in Figure 2.4. The crossover operator takes a pair of chromosomes from the selection output and, with this probability, exchanges parts of them to produce new offspring. It can be either a 1-point or a multi-point crossover, depending on the problem being solved; in Figure 2.5, crossover is applied to the pairs X6 and X2, and X1 and X5. If no crossover occurs, as for the pair X2 and X5, the chromosomes are cloned, i.e. the offspring are exact copies of the parents (Negnevitsky, 2005). The MATLAB GA and Direct Search Toolbox provides a variety of crossover functions that can be used with any model of an application (Mathwork, 2004).
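A 1-point crossover governed by a crossover probability can be sketched as below. This is a minimal illustration, assuming chromosomes are equal-length binary strings with at least two genes; the function name and the example parents are hypothetical. In this sketch a pair that does not cross over is passed on unchanged, which is the usual convention.

```python
import random


def single_point_crossover(parent_a, parent_b, crossover_probability, rng):
    """Return two offspring produced from a pair of parent chromosomes."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    # With probability (1 - crossover_probability) the pair is left as is.
    if rng.random() >= crossover_probability:
        return parent_a, parent_b
    # Choose a cut point strictly inside the chromosome and swap the tails.
    point = rng.randint(1, len(parent_a) - 1)
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


rng = random.Random(1)
print(single_point_crossover("1111", "0000", 1.0, rng))  # always crosses
print(single_point_crossover("1111", "0000", 0.0, rng))  # never crosses
```

A multi-point variant would choose several cut points and alternate the source parent between them.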


The last stage is mutation, which generally comes after crossover but sometimes before it. Mutation changes a randomly selected gene in a chromosome. A mutation can improve a solution, but it can also degrade it, so the mutation probability is usually given a small value, typically in the range 0.001 to 0.01, although this also depends on the problem being handled (Michalewicz, 1992, Negnevitsky, 2005). The process is nevertheless useful because it helps to prevent the search algorithm from becoming trapped in a local optimum, by keeping the GA from converging too fast before it has sampled the entire search region (Holland, 1992, Negnevitsky, 2005, Randy L. Haupt, 2004). An example of mutation is shown in Figure 2.6, which considers the same problem as in the crossover process (Negnevitsky, 2005, Mathwork, 2004). The figure shows that only X1 and X2 are mutated, at the second gene and the third gene respectively.
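Mutation on a binary chromosome can be sketched as a bit flip. This is a minimal illustration, assuming each gene is considered independently and flipped with the mutation probability, which is one common convention; the function name is hypothetical.

```python
import random


def mutate(chromosome, mutation_probability, rng):
    """Flip each gene of a binary string with the given probability."""
    genes = []
    for gene in chromosome:
        if rng.random() < mutation_probability:
            genes.append("1" if gene == "0" else "0")  # flip the bit
        else:
            genes.append(gene)  # leave the gene unchanged
    return "".join(genes)


rng = random.Random(42)
# With a probability in the typical range most chromosomes are unchanged.
print(mutate("1100", 0.01, rng))
```

Because the probability is small, a 4-bit chromosome is altered only occasionally, which keeps some diversity in the population without disrupting good solutions too often.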

Figure 2.6: Mutation example (Negnevitsky, 2005)

The overall basic GA process flowchart is shown in Figure 2.7. Each cycle of the main processes (selection, crossover and mutation) produces new offspring as output, and these offspring usually have better fitness values for the problem, as in the example shown in Figure 2.8 (Negnevitsky, 2005). The figure shows the new offspring for the next generation, which become the starting data for the selection process in Figure 2.3. The cycle is repeated until the GA achieves its target or reaches the stopping criterion that has been set, such as the specified number of generations. Each time the GA starts, it randomly picks the chromosomes and determines their fitness values from the fitness function. The dataset can be encoded as either binary or decimal numbers, whichever is more beneficial to the system.
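The overall cycle can be sketched as a short Python program for the function 15x − x². This is a minimal illustration under stated assumptions, not the procedure of the cited works: the parameter values (population size 6, crossover probability 0.7, mutation probability 0.01, 50 generations), all names, and the use of `random.choices` as a stand-in for the roulette wheel are choices made for this sketch.

```python
import random

CHROMOSOME_LENGTH = 4        # 4 bits encode x in 0..15
POPULATION_SIZE = 6          # must be even so chromosomes can be paired
CROSSOVER_PROBABILITY = 0.7  # assumed value
MUTATION_PROBABILITY = 0.01  # upper end of the typical range
MAX_GENERATIONS = 50         # stopping criterion
TARGET_FITNESS = 56          # maximum of 15x - x^2 for integer x (x = 7 or 8)


def fitness(chromosome):
    x = int(chromosome, 2)
    return 15 * x - x * x


def random_chromosome(rng):
    return "".join(rng.choice("01") for _ in range(CHROMOSOME_LENGTH))


def select_parents(population, rng):
    """Fitness-proportionate selection of a full mating pool."""
    weights = [fitness(c) for c in population]
    if sum(weights) == 0:  # every chromosome has zero fitness
        return rng.choices(population, k=len(population))
    return rng.choices(population, weights=weights, k=len(population))


def crossover(a, b, rng):
    if rng.random() >= CROSSOVER_PROBABILITY:
        return a, b  # no crossover: copy the pair unchanged
    point = rng.randint(1, CHROMOSOME_LENGTH - 1)
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(chromosome, rng):
    return "".join(
        ("1" if g == "0" else "0") if rng.random() < MUTATION_PROBABILITY else g
        for g in chromosome
    )


def run_ga(seed=None):
    rng = random.Random(seed)
    population = [random_chromosome(rng) for _ in range(POPULATION_SIZE)]
    best = max(population, key=fitness)
    for _ in range(MAX_GENERATIONS):
        if fitness(best) >= TARGET_FITNESS:
            break  # target reached
        parents = select_parents(population, rng)
        offspring = []
        for i in range(0, POPULATION_SIZE, 2):
            child_a, child_b = crossover(parents[i], parents[i + 1], rng)
            offspring += [mutate(child_a, rng), mutate(child_b, rng)]
        population = offspring  # new population for the next generation
        best = max(population + [best], key=fitness)
    return best, fitness(best)


best_chromosome, best_fitness = run_ga(seed=1)
print(best_chromosome, int(best_chromosome, 2), best_fitness)
```

The loop stops either when the target fitness is reached or when the set number of generations has been completed, whichever comes first.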

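Figure 2.7: Basic GA process flowchart

[Flowchart steps: START; determine the size of the chromosomes and the size of the population; randomly initialize the chromosomes; Selection; Crossover; Mutation (if any); New population; Are the rules or the conditions met? If NO, the cycle repeats; if YES, END.]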


Figure 2.8: Original data and new offspring data for generation (i+1) (Negnevitsky, 2005)

2.4.1 Purpose of Using GA

For optimization and feature selection, the GA is one of the best optimization and search techniques, as it is able to avoid local minima and can search many regions of the solution space simultaneously (Muhd Khairulzaman Abdul Kadir, 2012). Therefore, the GA is used in this thesis to optimize the network architecture and to select the best features for a given application in order to achieve better prediction, which can also extend the decision-making process in the study cases.