4.2 Experiment II: SDS and undersampling
4.2.1 Experiment setup
As shown in Table 4.2, the dataset is imbalanced: it contains 451 subscribers and 3668 non-subscribers. To balance the dataset, the model used stochastic diffusion search (SDS) to undersample the majority class and SMOTE to oversample the minority class. At the algorithmic level, the model made predictions using an SVM with the RBF kernel, with γ set to 1.00 and C set to 0.00. To prepare the dataset for the SVM, the following pre-processing steps were taken: all nominal values were converted to numerical values, and all values were normalised to remove differences in scale among the attributes. For all experiments, 10-fold cross-validation was applied.
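The two pre-processing steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used: the helper names are hypothetical, nominal attributes are mapped to integer codes in first-seen order (one-hot encoding would be an equally plausible reading of "converted to numerical"), and normalisation is taken to be min-max scaling to [0, 1], which matches the 0.00 to 1.00 range referred to later in this section. The toy columns are invented for the example.

```python
from typing import Dict, List, Sequence, Union

Column = Sequence[Union[str, float]]


def encode_nominal(column: Sequence[str]) -> List[float]:
    """Map each distinct nominal value to an integer code, in first-seen order."""
    codes: Dict[str, int] = {}
    encoded: List[float] = []
    for value in column:
        if value not in codes:
            codes[value] = len(codes)
        encoded.append(float(codes[value]))
    return encoded


def min_max_normalise(column: Sequence[float]) -> List[float]:
    """Rescale a numeric column to [0, 1]; a constant column becomes all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(x - lo) / (hi - lo) for x in column]


def preprocess(columns: Dict[str, Column]) -> Dict[str, List[float]]:
    """Convert nominal columns to numbers, then normalise every column."""
    result = {}
    for name, col in columns.items():
        # Assumption: a column is nominal if its first value is a string.
        numeric = encode_nominal(col) if isinstance(col[0], str) else [float(x) for x in col]
        result[name] = min_max_normalise(numeric)
    return result


# Toy columns for illustration only.
data = {"job": ["admin", "technician", "admin", "retired"], "age": [30, 45, 60, 30]}
print(preprocess(data))  # {'job': [0.0, 0.5, 0.0, 1.0], 'age': [0.0, 0.5, 1.0, 0.0]}
```

Normalising every attribute to the same range matters for the RBF kernel, whose distance term would otherwise be dominated by attributes with large raw values.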
To oversample the minority class to approximately 2000 instances, the SMOTE algorithm was used with the following configuration:
• The class value was set to 0 so that the minority class was detected automatically.
• The number of nearest neighbours was set to 5, so synthetic instances were generated from each instance's 5 nearest neighbours.⁴
• The percentage of instances to create was set to 345%, which oversampled the minority class to 2006 subscribers (451 original and 1555 synthetic instances). The majority class was then undersampled to 2000 using SDS to balance the dataset.
• The random seed used for sampling was set to 0.
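The SMOTE configuration above can be illustrated with a simplified sketch. It assumes numeric, normalised attributes and uses Euclidean distance to find neighbours; the function names are hypothetical, and the way seed instances are cycled through is a simplification, not necessarily how the SMOTE implementation used here distributes them. With a percentage of 345 and 451 minority instances, integer arithmetic gives 451 × 345 // 100 = 1555 synthetic instances, so the minority class grows to 451 + 1555 = 2006, matching Table 4.8.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def k_nearest(i: int, data: Sequence[Vector], k: int) -> List[int]:
    """Indices of the k nearest minority instances to data[i], excluding i itself."""
    others = [j for j in range(len(data)) if j != i]
    others.sort(key=lambda j: math.dist(data[i], data[j]))
    return others[:k]


def smote(minority: Sequence[Vector], percentage: int, k: int = 5,
          seed: int = 0) -> List[Vector]:
    """Create percentage% new minority instances by interpolation (simplified SMOTE)."""
    rng = random.Random(seed)  # fixed seed for reproducibility (Python's RNG)
    n_new = len(minority) * percentage // 100  # 451 * 345 // 100 == 1555
    neighbours = [k_nearest(i, minority, k) for i in range(len(minority))]
    synthetic: List[Vector] = []
    for t in range(n_new):
        i = t % len(minority)          # cycle through the original instances
        j = rng.choice(neighbours[i])  # one of its k nearest neighbours
        gap = rng.random()             # random position on the segment from i to j
        synthetic.append([x + gap * (y - x) for x, y in zip(minority[i], minority[j])])
    return synthetic


rng = random.Random(42)
minority = [[rng.random(), rng.random()] for _ in range(451)]  # toy minority class
new = smote(minority, percentage=345, k=5)
print(len(minority) + len(new))  # 2006
```

Because each synthetic point lies on a segment between two real minority instances, the new data stays inside the region already occupied by the minority class rather than duplicating existing rows.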
The aim was to oversample the minority class to a size comparable with that of the undersampled majority class (see Table 4.8 for the size of the dataset before and after balancing). The next section describes the balancing techniques used in this work.
TABLE 4.8: The label class before and after balancing the dataset

                   Number of instances in the dataset
Label              Imbalanced    Balanced
Subscribers        451           2006
Non-subscribers    3668          2000
Balancing the Dataset
There are several methods for dealing with class imbalance in a dataset. The proposed model balanced the dataset by combining two approaches: undersampling the majority class and oversampling the minority class, the latter using SMOTE. Undersampling was performed with SDS, whose performance was then compared against random undersampling and against undersampling with Euclidean distance (ED).

⁴ As mentioned in Experiment 4.1, the number of nearest neighbours was set to five based …
Applying SDS for undersampling
The initial experiment used SDS to undersample the majority class from 3668 to 2000 non-subscribers, using 100 agents.⁵ First, a model (a randomly selected non-subscriber) was taken from the search space (all non-subscribers), and the agents were set to find its closest match among the remaining items of the search space. Once a match, or the most similar item, was found, it was removed from the majority class with the aim of removing redundant data. Because the goal is to shrink the search space without discarding useful data, deleting the item closest to a randomly selected model, rather than an arbitrary item, reduces the risk of losing useful data: the retained model still represents the deleted item. This hypothesis was later validated (in Section 4.2.3) by examining the spread and central tendency of the data before and after undersampling (McCluskey and Lalkhen, 2007).
Following the initialisation phase, in which each agent was allocated a hypothesis from the search space (a random non-subscriber), in the test phase a randomly selected micro-feature (attribute) of the agent's hypothesis was compared with the corresponding micro-feature of the model. If the hypothesis's micro-feature lay within a specific threshold (discussed below) of the model's micro-feature, the agent was set to active; otherwise it was set to inactive. This process was repeated for all agents.
In the next phase, the diffusion phase, a passive recruitment mode was applied: each inactive agent chose another agent at random and adopted its hypothesis if that agent was active. If the chosen agent was inactive, the selecting agent picked a new random hypothesis (i.e., a random non-subscriber from the search space). This process was repeated for all inactive agents.
The test-diffusion cycle was repeated 10 times (a value chosen empirically as the best), after which the non-subscriber held by the largest number of active agents was removed from the search space and the model was moved to a separate list (the model list). This ensured that while the most similar item was removed from the search space, the model, which represents the deleted item, was kept and used later in the classification process. This process was repeated until the dataset was undersampled.
In brief, picking a random non-subscriber as a model and deleting the item most similar to it was repeated until the size of the search space plus the model list equalled the required number (i.e., 2000, which is close to the size of the oversampled minority class, 2006).
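Putting the steps together, the whole SDS undersampling loop might look like the sketch below. It is a hypothetical, self-contained illustration that follows the description (100 agents, 10 cycles, a threshold test on one random micro-feature, removal of the item with the most active agents, and the model kept in a model list). Two further assumptions are labelled in the comments: the model leaves the search space while it is being matched, and if no agent is active at the end of the cycles, the item held by the most agents is removed.

```python
import random
from collections import Counter
from typing import List, Sequence

Vector = Sequence[float]


def sds_undersample(majority: Sequence[Vector], target: int, threshold: float,
                    n_agents: int = 100, n_cycles: int = 10,
                    seed: int = 0) -> List[Vector]:
    """Undersample until len(search space) + len(model list) == target."""
    rng = random.Random(seed)
    space = list(majority)       # search space
    models: List[Vector] = []    # model list, kept for classification
    while len(space) + len(models) > target and len(space) > 1:
        # Assumption: the model leaves the search space while it is being matched.
        model = space.pop(rng.randrange(len(space)))
        hyps = [rng.randrange(len(space)) for _ in range(n_agents)]
        active = [False] * n_agents
        for _ in range(n_cycles):
            # Test phase: compare one random micro-feature with the model's.
            for a, h in enumerate(hyps):
                f = rng.randrange(len(model))
                active[a] = abs(space[h][f] - model[f]) <= threshold
            # Diffusion phase: passive recruitment of inactive agents.
            for a in range(n_agents):
                if not active[a]:
                    other = rng.randrange(n_agents - 1)
                    if other >= a:
                        other += 1  # poll a different agent
                    hyps[a] = hyps[other] if active[other] else rng.randrange(len(space))
        # Remove the item with the most active agents; assumption: if none is
        # active, fall back to the item held by the most agents.
        votes = Counter(h for h, act in zip(hyps, active) if act) or Counter(hyps)
        del space[votes.most_common(1)[0][0]]
        models.append(model)
    return space + models


rng = random.Random(7)
# Toy majority class with discrete normalised values so exact matches occur.
majority = [[rng.choice([0.0, 0.5, 1.0]) for _ in range(4)] for _ in range(60)]
reduced = sds_undersample(majority, target=40, threshold=0.0)
print(len(reduced))  # 40
```

Each iteration deletes one item and moves one model to the model list, so the combined size shrinks by exactly one per iteration; undersampling 3668 non-subscribers to 2000 therefore takes 1668 iterations.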
In the experiments reported in this work, three different thresholds, 1.00, 0.50, and 0.00, were used, generating three different datasets of non-subscribers, each of size 2000. Because the input dataset was normalised to the range 0.00 to 1.00, SDS with a threshold of 1.00 undersampled the data randomly; a threshold of 0.00 required an exact micro-feature match with the model; and a threshold of 0.50 lay between random and exact-match undersampling.
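The effect of the threshold can be stated compactly. Write h_f and m_f for the values of micro-feature f in an agent's hypothesis and in the model, and τ for the threshold, and assume that "within the threshold" means a difference no greater than τ:

```latex
\text{agent active} \iff |h_f - m_f| \le \tau,
\qquad
h_f,\, m_f \in [0, 1] \;\Longrightarrow\; |h_f - m_f| \le 1 .
```

With τ = 1 every test therefore succeeds, all agents stay on their initial random hypotheses, and the removed item is effectively chosen at random; with τ = 0 a test succeeds only when h_f = m_f.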
Applying Euclidean distance for undersampling
Euclidean distance (ED) is a metric that measures the distance between points in space. Over the years, it has been widely used for dimensionality reduction in databases (Keogh et al. (2001) and Beckmann, Ebecken, and Lima (2015)). Although it is a comprehensive metric, applying it to the undersampling problem is computationally expensive; it was used in this experiment as a baseline against which to compare the proposed, computationally cheaper swarm intelligence technique. ED was used to undersample the majority class as follows: in each iteration, a model was picked at random and its ED to every element in the search space was calculated; once all distances had been calculated, the element closest to the model was removed. This process was repeated until the search space was reduced to the required size (i.e., 2000 entries).
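The ED baseline can be sketched as below. This is a hypothetical illustration, not the code used in the experiments: it assumes numeric, normalised vectors, keeps the randomly picked model in the search space (the description does not move it to a separate list), and breaks ties in favour of the first closest element. Each iteration computes a distance to every remaining element, which is the source of the computational expense mentioned above.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def ed_undersample(majority: Sequence[Vector], target: int,
                   seed: int = 0) -> List[Vector]:
    """Repeatedly delete the element closest (in ED) to a randomly picked model."""
    if target < 1:
        raise ValueError("target must be at least 1")
    rng = random.Random(seed)
    space = list(majority)
    while len(space) > target:
        m = rng.randrange(len(space))  # index of the randomly picked model
        closest, closest_d = -1, math.inf
        for i, item in enumerate(space):
            if i == m:
                continue  # skip the model itself
            d = math.dist(space[m], item)  # Euclidean distance
            if d < closest_d:
                closest, closest_d = i, d
        del space[closest]
    return space


print(ed_undersample([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]], target=2))  # [[0.0, 0.0], [1.0, 1.0]]
```

In the toy call the duplicated point is removed whichever model is drawn, which shows how the procedure targets redundant data.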
4.2.2 Results
In this experiment, several performance measures were used: accuracy, sensitivity, specificity, Area Under the Curve (AUC), F-measure, and precision. The experimental results show that the new approach (i.e., combining SDS at threshold 0.00 to undersample the majority class with SMOTE to oversample the minority class) achieved the best accuracy, specificity, F-measure, and precision, as shown in Table 4.9.
TABLE 4.9: Performance measurements comparison

Measure        SDS threshold 0.00   SDS threshold 0.50   SDS threshold 1.00   Euclidean distance
Accuracy       90.46%               88.56%               88.56%               89.47%
Sensitivity    95.46%               96.06%               96.06%               96.76%
Specificity    85.45%               81.04%               81.04%               82.15%
AUC            0.959                0.96                 0.96                 0.965
F-measure      90.93%               89.41%               89.41%               90.91%
Precision      86.82%               83.67%               83.67%               84.48%
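The measures in Table 4.9 are linked, which helps in reading the results: accuracy is a class-size-weighted average of sensitivity and specificity, and the F-measure is the harmonic mean of precision and sensitivity. The sketch below reconstructs accuracy, precision, and F-measure for SDS at threshold 0.00 from its reported sensitivity and specificity. It assumes that subscribers are the positive class and that the measures were pooled over the balanced dataset of Table 4.8 (2006 subscribers, 2000 non-subscribers); the function names are hypothetical.

```python
def accuracy_from_rates(sens: float, spec: float, n_pos: int, n_neg: int) -> float:
    """Accuracy = (TP + TN) / (P + N) with TP = sens * P and TN = spec * N."""
    return (sens * n_pos + spec * n_neg) / (n_pos + n_neg)


def precision_from_rates(sens: float, spec: float, n_pos: int, n_neg: int) -> float:
    """Precision = TP / (TP + FP) with FP = (1 - spec) * N."""
    tp = sens * n_pos
    fp = (1 - spec) * n_neg
    return tp / (tp + fp)


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (sensitivity)."""
    return 2 * precision * recall / (precision + recall)


# SDS threshold 0.00 (Table 4.9); class sizes after balancing (Table 4.8).
sens, spec = 0.9546, 0.8545
n_pos, n_neg = 2006, 2000
acc = accuracy_from_rates(sens, spec, n_pos, n_neg)
prec = precision_from_rates(sens, spec, n_pos, n_neg)
f1 = f_measure(prec, sens)
print(f"accuracy={acc:.2%}  precision={prec:.2%}  F-measure={f1:.2%}")
```

Under these assumptions the reconstructed accuracy and F-measure round to the reported 90.46% and 90.93%, and precision comes out at about 86.81% against the reported 86.82%, a gap consistent with the inputs themselves being rounded. Because the two classes are almost equal in size, accuracy is close to the plain mean of sensitivity and specificity, so a gain in specificity raises accuracy almost as much as the same gain in sensitivity would.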
As shown in Table 4.9, the proposed model achieved higher accuracy because of its higher specificity. Likewise, its higher F-measure is attributable to its higher precision compared with ED undersampling. ED undersampling, however, yielded higher sensitivity and AUC, which is consistent with its much higher computational expense; this claim is explored further in the next section, together with a more in-depth discussion of SDS and of the impact of the different thresholds on the results. The results show that SDS, using threshold 0.00 as the distance criterion to the model, removed redundant data while achieving better specificity than ED. Because the dataset is normalised between 0 and 1, threshold 0.00 requires the compared attributes to match exactly, so the difference between corresponding attributes is minimised; threshold 0.00 therefore finds the item in the majority class whose attributes most closely correspond to the model's. SDS undersampling was also faster than ED undersampling, requiring less computation time. Moreover, the results reported in this experiment show that the proposed method offers promising results compared with previous work on the same dataset, as shown in Table 4.10.
TABLE 4.10: Results for previous models on the direct marketing dataset

Model                                   AUC     Accuracy   Sensitivity   Specificity
Moro, Laureano, and Cortez, 2011        0.938   NA         NA            NA
Moro, Cortez, and Rita, 2014            0.8     NA         NA            NA
Feng, Zhang, and Liao, 2014             NA      83%        NA            NA
Elsalamony, 2014                        NA      90.09%     59.06%        93.23%
Bahnsen, Aouada, and Ottersten, 2015    NA      88.28%     NA            NA
Proposed Model                          0.959   90.46%     95.46%        85.45%