The Impact of Various Parameter Tunings on MAFS Performance


3.7 The Impact of Various Parameter Tunings on MAFS Performance

The next experiments examine the effect of parameter tuning through a series of empirical tests on the same datasets used in the previous tests. All previous tests used the parameters set by (Zhu, Ong et al. 2007), shown in Table 3.6. The FSGATC method, which corresponds to the GA wrapper method, is used for comparison (Abualigah, Khader et al. 2016).

Table 3.6 Parameters Used with MAFS

Parameter                   Value
Search Strategy             Genetic Search Method
Population size             50
Solution size               50
Local search length         8
Number of generations       200
Probability of crossover    0.6
Crossover Type              Uniform Crossover
Probability of mutation     0.1
Local Search                TRUE
Local Search Method         Filter Ranking
Local Search Strategy       Improvement First
Selection Type              Linear Rank Selection
Stopping criterion          6000 Objective Function Evaluations
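For reference, the baseline settings of Table 3.6 can be collected in a single configuration object. The following is a minimal sketch only: the class name `MAFSConfig`, its field names, and the helper `budget_exhausted` are illustrative assumptions and are not taken from the MAFS implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MAFSConfig:
    """Baseline settings transcribed from Table 3.6 (field names are hypothetical)."""
    search_strategy: str = "genetic"
    population_size: int = 50
    solution_size: int = 50
    local_search_length: int = 8
    generations: int = 200
    crossover_prob: float = 0.6
    crossover_type: str = "uniform"
    mutation_prob: float = 0.1
    local_search: bool = True
    local_search_method: str = "filter_ranking"
    local_search_strategy: str = "improvement_first"
    selection_type: str = "linear_rank"
    max_evaluations: int = 6000  # stopping criterion: objective function evaluations


def budget_exhausted(evaluations_used: int, cfg: MAFSConfig) -> bool:
    """Return True once the evaluation budget of the stopping criterion is spent."""
    return evaluations_used >= cfg.max_evaluations
```

Freezing the dataclass keeps the baseline immutable, so each tuning experiment has to state explicitly which value it overrides.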

In this section, experiments are conducted with different parameter combinations to reveal their effects on the selected feature subsets. Table 3.7 shows the parameters that were tuned, chosen based on their significance to MAFS.

The crossover type was tuned first, comparing the uniform, one-point, and multi-point crossover types. For the local search range w, five different values were compared, and for the local search length l, three different values were tested. Three options were also tested for the classifier used as the wrapper's fitness function: the lwl (Locally Weighted Learning) classifier, and knn with k = 1 and k = 3. The knn classifier with k = 3 achieved the highest accuracy of the three options. For the local search range, the best value was w = 10, which achieved the lowest error rate in the last generation among the tested values.
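The three crossover types compared here can be illustrated with a short sketch. It assumes that each solution is a binary feature mask stored as a list of 0/1 values, and that multi-point crossover uses two cut points by default; neither detail is specified in this section, and the function names are illustrative.

```python
import random
from typing import List, Tuple

Mask = List[int]  # assumed encoding: 1 = feature selected, 0 = feature dropped


def one_point(a: Mask, b: Mask, rng: random.Random) -> Tuple[Mask, Mask]:
    """Swap the tails of the two parents after a single random cut."""
    cut = rng.randint(1, len(a) - 1)  # requires at least two genes
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def multi_point(a: Mask, b: Mask, rng: random.Random, points: int = 2) -> Tuple[Mask, Mask]:
    """Swap every other segment between `points` random cuts."""
    cuts = sorted(rng.sample(range(1, len(a)), points))
    c1, c2 = list(a), list(b)
    swap, prev = False, 0
    for cut in cuts + [len(a)]:
        if swap:
            c1[prev:cut], c2[prev:cut] = b[prev:cut], a[prev:cut]
        swap = not swap
        prev = cut
    return c1, c2


def uniform(a: Mask, b: Mask, rng: random.Random, swap_prob: float = 0.5) -> Tuple[Mask, Mask]:
    """Decide independently for each gene whether the parents exchange it."""
    c1, c2 = list(a), list(b)
    for i in range(len(a)):
        if rng.random() < swap_prob:
            c1[i], c2[i] = b[i], a[i]
    return c1, c2
```

All three operators leave the parents unchanged and return two children, so they can be swapped into the same genetic loop without further changes.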

Table 3.7 Values Tested for the Parameter Tuning

No.   Crossover      w     l    Classifier
1     One-point      10    3    lwl (locally weighted learning)
2     Multi-point    1     8    knn, k = 3
3     Uniform        15    1    knn, k = 1
4     _              5     _    _
5     _              25    _    _
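To make the role of the classifier in the wrapper concrete, the sketch below scores a feature mask by the leave-one-out error of a knn classifier restricted to the selected features. This is an assumed formulation for illustration only: the function name `loo_knn_error`, the squared Euclidean distance, and the leave-one-out protocol are not taken from this section, and the lwl classifier is not covered.

```python
from collections import Counter
from typing import List, Sequence


def loo_knn_error(X: Sequence[Sequence[float]], y: Sequence[int],
                  mask: List[int], k: int = 3) -> float:
    """Leave-one-out knn error rate using only the features where mask[j] == 1."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 1.0  # an empty feature subset is treated as the worst case
    errors = 0
    for i, xi in enumerate(X):
        # Squared Euclidean distance to every other sample on the selected features.
        dists = sorted(
            (sum((xi[c] - xj[c]) ** 2 for c in cols), j)
            for j, xj in enumerate(X) if j != i
        )
        votes = Counter(y[j] for _, j in dists[:k])
        predicted = votes.most_common(1)[0][0]
        errors += predicted != y[i]
    return errors / len(X)
```

A lower value means a better subset, so a genetic search would minimise this quantity, or equivalently maximise one minus it.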

Regarding the local search length, the tested values were 3, 8, and 1. Other values within the range from 1 to 10 were also tried, but their results were almost identical to those of the three tested values. With the local search range kept at its best value selected previously, the best value for the local search length was 8. Finally, the crossover experiments showed that multi-point crossover performed better than the other two types. All other parameters were kept as shown in Table 3.6.
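The procedure described above, in which one parameter is tuned while the others are held fixed and its best value is then carried forward, can be sketched as follows. The candidate values are those of Table 3.7. The tuning order, the starting values, and in particular `toy_error` are assumptions: `toy_error` is a made-up stand-in for a full MAFS run and does not reproduce any measured error rate.

```python
from typing import Any, Callable, Dict, List

Params = Dict[str, Any]


def tune_one_at_a_time(
    baseline: Params,
    candidates: Dict[str, List[Any]],
    error_rate: Callable[[Params], float],
) -> Params:
    """Tune parameters sequentially, fixing each at its best value before moving on."""
    best = dict(baseline)
    for name, values in candidates.items():  # dicts preserve insertion order
        # Score each candidate with all other parameters held at their current values.
        scored = [(error_rate({**best, name: v}), i) for i, v in enumerate(values)]
        _, best_index = min(scored)  # ties go to the earliest candidate
        best[name] = values[best_index]
    return best


# Candidate values as listed in Table 3.7; the order of tuning is an assumption.
CANDIDATES: Dict[str, List[Any]] = {
    "crossover": ["one-point", "multi-point", "uniform"],
    "w": [10, 1, 15, 5, 25],
    "l": [3, 8, 1],
    "classifier": ["lwl", "knn3", "knn1"],
}


def toy_error(p: Params) -> float:
    """Hypothetical error surface for demonstration; NOT the thesis results."""
    return (
        (0.0 if p["crossover"] == "multi-point" else 0.1)
        + abs(p["w"] - 10) / 100
        + abs(p["l"] - 8) / 100
        + (0.0 if p["classifier"] == "knn3" else 0.1)
    )
```

In practice `error_rate` would run the full feature selection and return the error in the last generation. Because each parameter is tuned only once, this approach costs far fewer runs than a full grid, but it can miss interactions between parameters.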

The results in Table 3.8 indicate that the MAFS method can improve document clustering performance. The MAFS method achieved more accurate results for the D2, D3, and D5 datasets, whereas its ADDC values increased slightly for D1 and D4. However, the corresponding F-measure values for D2 and D4 were much higher than those obtained by the ALL and FSGATC methods. The slight increase in ADDC for D1 and D4 can be tolerated given the larger gain achieved by the MAFS method on the external measure. For D2, D3, and D5, the F-measure was higher after using the MAFS method, and at the same time MAFS produced smaller ADDC values for these datasets than FSGATC and ALL.

Table 3.8 also shows that, in all datasets, k-means performance improved after using the MAFS method, as observed from the F-measure. For D2, D3, and D5, the MAFS method obtained smaller ADDC values together with a higher F-measure. For D1 and D4, although the FSGATC method obtained smaller ADDC values than the MAFS method, its external measure values were still lower than those achieved by MAFS.

The clearest observation from Table 3.8 is that the proposed MAFS method outperforms the ALL and FSGATC methods. The trends of the ADDC measure and the F-measure suggest that the relationship between them falls into three cases, listed below:

A. When the internal measure decreases, the external measure increases, which is an ideal convergence state. For example, this happened with the MAFS method for the D2, D3, and D5 datasets using the spherical k-means, and it is also clear for D1, D3, and D5 using the k-means.

B. The second case happens when the internal measure does not significantly decrease while the corresponding external measure increases significantly, which indicates a notable improvement in the clustering accuracy.

C. Finally, the worst case that might happen is when there is no improvement in the external measure, but this was not observed in the results of the proposed MAFS method for any of the datasets used. It can be concluded that MAFS performed well and with more stability than the ALL and FSGATC methods for all datasets.
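The three cases can be expressed as a simple rule on the change in the two measures. The sketch below is only one possible reading: the function name `relationship_case` and the tolerance `tol`, which decides what counts as a significant decrease in ADDC, are assumptions, and for simplicity any positive F-measure gain is treated as an improvement.

```python
def relationship_case(addc_before: float, addc_after: float,
                      f_before: float, f_after: float,
                      tol: float = 0.05) -> str:
    """Classify the ADDC / F-measure relationship as case 'A', 'B', or 'C'.

    `tol` is a hypothetical threshold for a significant drop in ADDC.
    """
    addc_drop = addc_before - addc_after  # positive when the internal measure decreases
    f_gain = f_after - f_before           # positive when the external measure increases
    if f_gain <= 0:
        return "C"  # no improvement in the external measure
    if addc_drop > tol:
        return "A"  # internal measure decreases and external measure increases
    return "B"      # external measure improves without a clear drop in ADDC
```

With the spherical k-means values of Table 3.8, comparing ALL with MAFS on D2 (ADDC 0.67 to 0.15, F-measure 0.31 to 0.36) gives case A, while comparing FSGATC with MAFS on D1 (ADDC 0.22 to 0.24, F-measure 0.52 to 0.81) gives case B.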

Table 3.8 Average Results of 20 Spherical k-means Runs

                      Spherical k-means        k-means
Dataset   Method      ADDC     F-measure       ADDC     F-measure
D1        ALL         0.57     0.69            0.56     0.52
          MAFS        0.24     0.81            0.22     0.70
          FSGATC      0.22     0.52            0.24     0.56
D2        ALL         0.67     0.31            0.20     0.15
          MAFS        0.15     0.36            0.54     0.66
          FSGATC      0.17     0.17            _0.82    _0.1
D3        ALL         0.54     0.20            0.86     0.27
          MAFS        0.26     0.28            0.47     0.33
          FSGATC      0.33     0.24            0.73     0.33
D4        ALL         0.85     0.94            0.82     0.83
          MAFS        0.82     0.94            0.76     0.89
          FSGATC      0.63     0.83            0.63     0.83
D5        ALL         0.65     0.74            0.69     0.92
          MAFS        0.37     0.75            0.48     0.93
          FSGATC      0.51     0.47            ₁        _0.33

In document Document clustering with optimized unsupervised feature selection and centroid allocation (Page 80-83)