This section discusses the results obtained from the presented methodology. (Section 7.1) Represents the results obtained from the preprocess step, (Section 7.2) Represents the final results obtained. All the results are assessed by AUC of the ROC curve as proposed by the challenge committee
4.1. P
REPROCESS RESULTSThe results of the preprocess step are presented in this section. The dataset contains three different labels: Churn, Appetency and Up-selling. Each of these labels will be used to construct a separate dataset and every dataset will be preprocessed in different manner. For every step a 10-fold validation was used and 80% of the dataset was used to train the model and the rest 20% was used as a test set to assess the performance
4.1.1.
Imputing Missing Values for the three datasets
We start preprocessing this dataset by comparing different techniques to impute missing values. Table 7.1 shows the performance of different missing values imputation techniques in combination with three base classifiers and their results were assessed by using AUC of the ROC curve after the imputation. The best performance in each column is in bold. The results show that different techniques yielded different results with different classifiers.
4.1.1.1.
Imputing missing values for Churn dataset
Different imputing methods was performed on the churn dataset to impute the missing values, to assess the performance of each imputation technique the AUC of the ROC curve of each technique is measures against
C4.5 DT, RF and GBM as shown in Table 3. The highest AUC was achieved by Hot-Deck imputation technique
and so this technique will be used to impute the missing value in the churn dataset. Mean Hot-Deck
Hot-Deck with Indicators
MiceC4.5 0.5061 0.5116 0.5231 0.5169
RF 0.6407 0.6531 0.6888 0.6398
GBM 0.6683 0.6794 0.7162 0.6641
Table 3 The AUC results for Churn Dataset Missing Value Imputation
4.1.1.2.
Imputing missing values for Appetency dataset
The same technique is used for Appetency dataset; the best performing technique was Hot-Deck imputation. So, this technique will be used to impute the missing value in the Appetency dataset. The results are shown in Table 4.
a.
Mean Hot-Deck
Hot-Deck with Indicators
MiceC4.5 0.5 0.5 0.5 0.5
RF 0.7464 0.7686 0.7445 0.7407
GBM 0.8027 0.8107 0.7975 0.8067
30
4.1.1.3.
Imputing missing values for Up-Selling dataset
The same technique is used for Up-Selling dataset, the best performing technique was MICE imputation. So, this technique will be used to impute the missing value in the Appetency dataset. The results are shown in Table 5.
a.
Mean Hot-Deck
Hot-Deck with Indicators
MiceC4.5 0. 6969 0. 6465 0. 6934 0. 6262
RF 0. 8115 0. 8334 0. 8229 0. 8380
GBM 0. 8394 0. 8544 0. 8445 0. 8579
Table 5 The AUC results for Up-Selling Missing Value Imputation
4.1.2.
Balancing the datasets
This subsection discusses different algorithms to balance the datasets, three algorithms were used and their results were assessed by using AUC after the balancing. The algorithm with the highest AUC will be used to balance the dataset.
4.1.2.1.
Balancing the Churn Dataset
Different balancing algorithms were used to balance the dataset. To assess the performance of each algorithm two classifiers (RF and GBM) were used and their AUC were measured before and after applying each algorithm. The initial AUC of the RF was 0.6796 and for GBM was 0.7032306. After applying SMOTE the AUC results increased by more than 10% which can indicate that SMOTE algorithm overfitted the dataset which is common for SMOTE algorithm and so the OSS was used instead for balancing the dataset. The results are shown in Table 6.
a.
SMOTE
TOMEK
OSSRF 0.8565 0.6533 0.6592
GBM 0.8414 0.6865 0.6818
Table 6 The AUC results for Churn Dataset Balancing
4.1.2.2.
Balancing the Appetency Dataset
The same technique is used for Appetency dataset. The initial AUC of the RF was 0.7760 and for GBM was 0.7814. SMOTE algorithm behaved the same as Churn Dataset and so the OSS was used to balance the Appetency dataset. Results are shown in Table 7.
a.
SMOTE
TOMEK
OSSRF 0.8830 0.7522 0.7608
GBM 0.8774 0.7744 0.7834
31
4.1.2.3.
Balancing the Up-Selling dataset
The same technique is used for Up-Selling dataset. The initial AUC of the RF was 0.8172 and for GBM was 0.8427. SMOTE algorithm behaved the same as Churn Dataset and so the OSS was used to balance the Appetency dataset. Results are shown in Table 8.
a.
SMOTE
TOMEK
OSSRF 0. 8754 0. 8358 0. 8345
GBM 0. 8692 0. 8513 0. 8583
Table 8 The AUC results for Up-Selling Dataset Balancing
4.1.3.
Feature Selection
This subsection discusses different algorithms to select the most important features in the dataset, the feature importance matrix of RF and GBM was used to decide which features are more important for the model to be selected. The features that will score the highest AUC will be used.
4.1.3.1.
Feature Selection of the Churn Dataset
The initial AUC for RF was 0.6796 and for GBM was 0.7032. Using Random Forests to select the most important 25 features in the dataset, then to run RF on these important features and to assess the AUC. Also, the GBM is used to assess the predictive power of these features by running the algorithm on the original dataset and the reduced dataset. Feature selected using GBM achieved higher AUC than RF and so it will be selected for feature Selection. Results are shown in Table 9.
a.
Feature Selection using Random Forests
Feature Selection using Gradient Boosting Machine
RF 0. 6153 0. 6635
GBM 0. 6482 0. 6935
Table 9 The AUC results for Churn Dataset after Feature Selection
4.1.3.2.
Feature Selection of the Appetency Dataset
The initial AUC for RF was 0.7760 and for GBM was 0.7814. Using Random Forests to select the most important 25 features in the dataset, then to run RF on these important features and to see the AUC. Also, the GBM is used to assess the predictive power of these features by running the algorithm on the original dataset and the reduced dataset. Feature selected using GBM achieved higher AUC than RF and so it will be selected for feature Selection. Results are shown in Table 10.
a.
Feature Selection using Random Forests
Feature Selection using Gradient Boosting Machine
RF 0. 6953 0. 7579
GBM 0. 7158 0. 7712
32
4.1.3.3.
Feature Selection of the Up-Selling Dataset
The initial AUC for RF was 0. 8172661 and for GBM was 0. 8427427. Using Random Forests to select the most important 25 features in the dataset, then to run RF on these important features and to see the AUC. Also, the GBM is used to assess the predictive power of these features by running the algorithm on the original dataset and the reduced dataset. Feature selected using Random Forests achieved higher AUC than GBM and so it will be selected for feature Selection. Results are shown in Table 11.
b.
Feature Selection using Random Forests
Feature Selection using Gradient Boosting Machine
RF 0. 8251 0. 8256
GBM 0. 8423 0. 8380
Table 11 The AUC results for Up-Selling Dataset after Feature Selection
4.2. F
INAL MODEL AUC RESULTSThis section discusses the results obtained after applying Random Forests, GBM and Stacking them to the cleaned dataset obtained from the preprocess step. Every classifier is applied to the three datasets and the AUC of the ROC curve was measured. From Table 12 we can see that Stacking did increase the performance of the classifiers by an average of 2%.
Churn Appetency Up-Selling Average
Random Forest 0.6635 0.7579 0.8251 0.7488
GBM 0.6935 0.7712 0.8423 0.769
Stacking 0.7111 0.78 0.845 0.7787
Table 12 The AUC results of the Final model
4.3. A
NALYSIS OF THE RESULTSThe machine used was Core i5 with 8GB RAM and all the experiments have been conducted using R statistical software. Training and testing ratio for each dataset was fixed at 80:20 ratios. Class imbalance was handled only for the training datasets. Testing datasets were kept separated all over the experiment and were used only in the end to assess the performance of the final classifiers. The present research work aims at identifying an appropriate workflow to deal with a challenging dataset that contains missing values, imbalance, high cardinality and of mixed nature. Also, this work aims at comparing the performance of different decision trees- based ensemble techniques to detect churn. After analyzing the results, it has shown that ensemble techniques improve the performance of the classifiers and the stacking of these ensembles further improved the results. Churn dataset was the hardest dataset whose AUC was the lowest compared to the other two datasets. Imputation techniques used increased the performance of the classifiers with different measures, the best performing technique was the hot-deck imputation technique which yielded good results for Churn and appetency datasets, while MICE gave the best performance for the Up-Sell dataset. The best performing balancing technique was the OSS which gave a descent increase in the AUC without overfitting the datasets. The feature importance of GBM yielded an outstanding result compared to the Random Forests for churn and appetency datasets while Random Forests was the same as GBM for the up-sell dataset. The Stacking the ensembles increased a lot the classifiers performance. The final average AUC result was 0.7787 which will
33 place us in the top 10% of this competition. The following table presents the top winner using the same dataset.
Table 13 The KDD 2009 winners (Guyon, Lemaire, Boullé, Dror, & Vogel, 2009)
4.4. K
EYF
INDINGS Multiple methods for imputation produces better results than single imputation methods.
SMOTE should be treated with careful cause in some cases it can cause overfitting.
Wrapper methods for feature selection can take less computational effort and produce better results than filter methods.
Tree-based ensembles produced better results than single trees through decreasing the variance in the dataset.
Churn can be predicted by using tree-based ensembles with high accuracy.
H2O R package is a powerful tool that’s more than capable of handling large, deep and heterogeneous datasets.
34