Optimization of DWM-WIN Parameters using GA
CHAPTER 4. OPTIMIZATION OF DWM-WIN PARAMETERS
Table 4.2: Parameters used in GA.
Parameter               Value
Population size         20
Number of generations   10
Crossover rate          0.8
...
Crossover operator      Scattered function
Selection function      Stochastic uniform
Mutation function       Adaptive feasible
tations were conducted with Matlab. The DWM-WIN error rate is used as the fitness function in GA. Note that we test the method on datasets without concept drift to show that the optimized method can identify fixed concepts. An assessment of DWM-WIN on datasets with concept drift is detailed in the next chapter.
4.4.1 Impact of applying GA in DWM-WIN
For DWM-WIN, we divide the data into 20 and 40 blocks to obtain different batch sizes. For GA, we use the parameter settings shown in Table (4.2). The crossover rate specifies the fraction of the next generation that is produced by crossover; the remaining individuals in the next generation are produced by mutation. The scattered function creates a random binary vector, then takes each gene from the first parent where the vector is 1 and from the second parent where the vector is 0, and combines the genes to form the child. For example, if the first parent is [a, b, c, d, e, f, g, h], the second parent is [1, 2, 3, 4, 5, 6, 7, 8] and the random crossover vector is [1, 1, 0, 0, 1, 0, 0, 0], the child is [a, b, 3, 4, e, 6, 7, 8]. Adaptive feasible is a mutation function that randomly generates directions that are adaptive with respect to the last successful or unsuccessful generation. Here, directions refer to step lengths that satisfy the bounds and linear constraints; they are also called search regions.
Table (4.3) compares the error rates of DWM-WIN with and without optimization. The second and third columns of this table report the mean error rate over the different runs, whereas the last column, BestFitness, gives the best DWM-WIN error rate obtained during these runs.
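The scattered crossover described above can be illustrated with a short Python sketch. It is not the Matlab implementation used in the experiments; the function name `scattered_crossover` and its `mask` argument are hypothetical and only serve to reproduce the worked example from the text.

```python
import random

def scattered_crossover(parent1, parent2, mask=None, rng=random):
    """Combine two parents gene by gene using a binary mask.

    Where the mask is 1 the gene comes from parent1, where it is 0 the
    gene comes from parent2. If no mask is given, a random one is drawn.
    """
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same length")
    if mask is None:
        mask = [rng.randint(0, 1) for _ in parent1]
    return [g1 if bit == 1 else g2
            for g1, g2, bit in zip(parent1, parent2, mask)]

# Worked example from the text, with the crossover vector fixed by hand.
child = scattered_crossover(
    list("abcdefgh"),
    [1, 2, 3, 4, 5, 6, 7, 8],
    mask=[1, 1, 0, 0, 1, 0, 0, 0],
)
print(child)  # ['a', 'b', 3, 4, 'e', 6, 7, 8]
```

Passing the mask explicitly makes the example reproducible; in an actual GA run the mask would be drawn at random for every pair of parents.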
Experimental results show that using GA to select the best combination of the parameters (β, θ, η) for DWM-WIN lowers the mean error rate on three of the four datasets and leaves it unchanged on Pima (0.222), where the best run nevertheless reaches 0.2206. In Table (4.3), 40 batches were used for each dataset. For these datasets, DWM-WIN with the parameters optimized by GA performs at least as well as DWM-WIN without optimization.
Table 4.3: Comparison between error rates of DWM-WIN before and after GA parameter optimization, using 40 batches with population size 20.
Dataset           DWM-WIN   DWM-WIN-GA   BestFitness
Pima              0.222     0.222        (0.2206)
Iris              0.09      0.0844       (0.0801)
Tictactoe         0.2415    0.2146       (0.2027)
Credit Approval   0.124     0.113        (0.1053)
4.4.2 Best parameter values
GA is applied to find the best combination of DWM-WIN parameters. The optimal combination varies from one subset of the data to another and affects the fitness function. Note that [Kolter and Maloof, 2007] used random values of the three parameters, and [Mejri et al., 2013] used many cross-validations to select the best values. In both cases, the choice of these parameters is neither automatic nor principled. In this chapter, these parameters are determined automatically, which improves the accuracy of the dynamic ensemble method. This selection also enables the algorithm to cope with online datasets and to update the parameter values each time a new stream of data arrives. Table (4.4) gives the optimal parameter values using 20 batches for the 4 simulated datasets. Different optimal combinations of the three parameters were obtained. The parameter β takes values between 0 and 1. If β = 0, a classifier that predicts an instance incorrectly is removed from the ensemble and another one takes its place; consequently, a small number of classifiers is obtained at the end. If θ = 0, a classifier is removed only when its weight reaches 0, which can only occur when β = 0. Otherwise, if β ≠ 0, no classifier can be removed from the ensemble and a large number of classifiers is obtained at the end.
Table 4.4: Best parameter values of best performances (20 batches).
Dataset     θ              β        η        BestFitness
Tictactoe   4.31 × 10^-5   0        1.095    (0.2027)
Pima        0.1            0.5077   1.6929   (0.2206)
Iris        0.1            0        1.01     (0.0801)
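The roles of β and θ described above can be made concrete with a minimal sketch of a single weight update. It assumes that a wrong classifier's weight is multiplied by β and that a classifier is dropped once its weight is no longer strictly above θ; the function name `update_weights` is hypothetical, and the other steps of DWM-WIN (adding classifiers, windowing, the parameter η) are deliberately left out.

```python
def update_weights(weights, mistakes, beta, theta):
    """One simplified weight update for an ensemble.

    weights  : current classifier weights (floats)
    mistakes : booleans, True where the classifier mispredicted
    beta     : multiplicative penalty in [0, 1]
    theta    : removal threshold (assumption: drop when weight <= theta)
    """
    penalized = [w * beta if wrong else w
                 for w, wrong in zip(weights, mistakes)]
    return [w for w in penalized if w > theta]

# beta = 0: a single mistake sends the weight to 0, so the classifier is dropped.
print(update_weights([1.0, 1.0], [True, False], beta=0.0, theta=0.0))  # [1.0]

# beta != 0 with theta = 0: the weight shrinks but stays positive, so nothing is dropped.
print(update_weights([1.0, 1.0], [True, False], beta=0.5, theta=0.0))  # [0.5, 1.0]
```

Under these assumptions, the two calls reproduce the two limiting cases discussed in the text: a small ensemble when β = 0, and an ensemble that never shrinks when θ = 0 and β ≠ 0.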
4.4.3 Effect of selecting appropriate starting population on the error rate of DWM-WIN based on GA optimization
The population size is an important parameter of GA, and we studied its influence on the reliability of the optimization. To isolate its effect on GA accuracy, we vary the population size and keep all other parameters fixed.
Table (4.5) presents simulations on the different datasets with population sizes 50, 100, 300 and 500, using 20 batches. The error rates depend on the population size, and changes in the population size can affect the algorithm's accuracy. For small population sizes, the results are mixed. In general, the larger the initial population, the better GA is able to determine the best combination of the parameters and to decrease the error rate of the DWM-WIN algorithm.
For the Credit Approval dataset, for example, the fitness value improves from 10.53 % with a population size of 20, as shown in Table (4.4), to 8.7 % with a population size of 500 in Table (4.5). This indicates that a large population size reduces the error rate. The error rates of DWM-WIN-GA for the largest population size, 500, are lower than the other error rates in most cases across the datasets.
Also, the values corresponding to the population size 300 are in most cases below the values corresponding to population sizes 50 and 100. The best parameter combinations are given in Table (4.5).
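To make the role of the population size concrete, the following Python sketch runs a very small GA over (β, θ, η) with a configurable population size. Several parts are assumptions rather than the setup used in the experiments: the fitness `surrogate_error` is a toy function standing in for the DWM-WIN error rate, the bounds on θ and η are hypothetical (only β in [0, 1] comes from the text), tournament selection replaces stochastic uniform selection, a clipped Gaussian step replaces adaptive feasible mutation, and one elite individual is carried over per generation.

```python
import numpy as np

# Search box for (beta, theta, eta). Only beta in [0, 1] is from the text;
# the theta and eta ranges are hypothetical.
LOWER = np.array([0.0, 0.0, 1.0])
UPPER = np.array([1.0, 0.1, 2.0])
TARGET = np.array([0.5, 0.05, 1.5])  # arbitrary optimum of the toy fitness

def surrogate_error(params):
    """Toy stand-in for the DWM-WIN error rate (NOT the real fitness)."""
    return float(np.sum(((params - TARGET) / (UPPER - LOWER)) ** 2))

def run_ga(fitness, pop_size=20, generations=10, crossover_rate=0.8, seed=0):
    """Minimize `fitness`; returns (best individual, best score, history)."""
    rng = np.random.default_rng(seed)
    n_genes = LOWER.size
    pop = rng.uniform(LOWER, UPPER, size=(pop_size, n_genes))
    history = []

    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        pop = pop[np.argsort(scores)]          # best individual first
        history.append(float(scores.min()))

        def pick():
            # Binary tournament on the sorted population (lower index wins).
            i, j = rng.integers(pop_size, size=2)
            return pop[min(i, j)]

        children = [pop[0].copy()]             # keep one elite individual
        n_cross = int(round(crossover_rate * (pop_size - 1)))
        for _ in range(n_cross):
            # Scattered crossover: random binary mask chooses the parent per gene.
            mask = rng.integers(0, 2, size=n_genes).astype(bool)
            children.append(np.where(mask, pick(), pick()))
        while len(children) < pop_size:
            # Remaining individuals come from mutation, clipped to the bounds.
            step = rng.normal(0.0, 0.1, size=n_genes) * (UPPER - LOWER)
            children.append(np.clip(pick() + step, LOWER, UPPER))
        pop = np.array(children)

    scores = np.array([fitness(ind) for ind in pop])
    best = int(np.argmin(scores))
    history.append(float(scores[best]))
    return pop[best], history[-1], history

for size in (20, 50, 100):
    _, score, _ = run_ga(surrogate_error, pop_size=size)
    print(size, score)
```

Because the fitness is a toy function, the printed values say nothing about DWM-WIN itself; the sketch only shows where the population size enters the loop and how the other settings of Table (4.2) would be held fixed while it varies.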
4.4.4 Comparison of DWM-WIN-GA with other classification methods
To show the effectiveness of the proposed DWM-WIN-GA algorithm against traditional methods, we compared its error rate with that of CART [Breiman et al., 1984]. For CART, all observations are used together rather than incrementally. Table (4.6), based on 4 datasets, shows that the proposed method achieves lower error rates than CART. Consequently, the optimized DWM-WIN-GA outperforms DWM-WIN as well as traditional classification methods such as decision trees.
Table 4.5: Best parameter values of best performance (20 batches).
Dataset                            θ     β        η        BestFitness
Tictactoe (pop size = 50)          0.1   0        1        (0.2027)
Pima (pop size = 50)               0.1   0        1        (0.2142)
Iris (pop size = 50)               0     0        1        (0.074)
Credit Approval (pop size = 50)    0.1   0        1.1909   (0.1003)
Tictactoe (pop size = 100)         0     1        1        (0.2027)
Pima (pop size = 100)              0.1   0.01     1        (0.2041)
Iris (pop size = 100)              0.1   1        1        (0.08)
Credit Approval (pop size = 100)   0     0.01     1        (0.1053)
Tictactoe (pop size = 300)         0     0.858    1.254    (0.197)
Pima (pop size = 300)              0     0.01     1        (0.2011)
Iris (pop size = 300)              0.1   0.01     1        (0.074)
Credit Approval (pop size = 300)   0     0.7143   1.6235   (0.095)
Tictactoe (pop size = 500)         0     0.1      1        (0.2263)
Pima (pop size = 500)              0.1   0.1      1.190    (0.198)
Iris (pop size = 500)              0.1   0.1      1        (0.063)
Credit Approval (pop size = 500)   0     1        1        (0.087)
Table 4.6: Comparison between the error rates of DWM-WIN with GA parameter optimization, using 20 batches with population size 500, and the CART of [Breiman et al., 1984].
Dataset           CART (Breiman)   DWM-WIN-GA
Pima              0.3              0.198
Iris              0.26             0.063
Tictactoe         0.4              0.2263
Credit Approval   0.15             0.087
4.5 Conclusion
We introduced an improved version of the DWM-WIN algorithm of [Mejri et al., 2013], called DWM-WIN-GA, which uses GA as an optimization technique. Experimental results show that combining several classifiers through a dynamic ensemble method with GA optimization improves the accuracy on many datasets. This optimization technique adapts to different population sizes and numbers of batches while automating the choice of the parameter values for each dataset. In most cases, the larger the population size, the better the performance of the optimized DWM-WIN-GA and the lower the error rate. As future work, it would be interesting to investigate feature selection optimization and to apply it to multivariate SPC.