Data analysis - A Churn prediction model based on gaussian processes

In this section, I analyse some of the selected features and their relation to each other and to the target value. I started with the most informative of the features selected in the previous section. Figure 5.2 presents the number of occurrences of each value of feature51 and feature144 among the customers, together with the proportion of churning customers (in red) for each value. The figure shows that these features, like the majority of the features in the data set, follow a Gaussian distribution, with a small variance for feature144 and a larger one for feature51. It also shows why feature144 was one of the most informative features: all the churners have the value 270 for this feature.

(a) Feature51 (b) Feature144

Figure 5.2: Two of the most informative features and their relation to the target value
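The quantities plotted in figure 5.2 (the number of customers per feature value and the share of churners among them) can be computed as in the following minimal sketch. The function name churn_by_value and the toy data are hypothetical and serve only as an illustration; they are not taken from the data set.

```python
from collections import Counter


def churn_by_value(values, churned):
    """Return {value: (count, churn_proportion)} for one discrete feature."""
    totals = Counter(values)
    churners = Counter(v for v, c in zip(values, churned) if c)
    return {v: (n, churners[v] / n) for v, n in sorted(totals.items())}


# Hypothetical toy data (assumption, not the thesis data set).
feature = [270, 270, 270, 265, 265, 280]
churned = [True, False, True, False, False, False]

summary = churn_by_value(feature, churned)
for value, (count, proportion) in summary.items():
    print(value, count, round(proportion, 2))
```

A value whose churn proportion is much higher than the overall churn rate is what makes a feature informative in this sense.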

Then, I built a self-organizing map (SOM) representation of the data. The SOM is a data exploration technique based on neural networks. It is used to present high-dimensional data instances in a two-dimensional space (called the map). A SOM is composed of a set of neurons. Each neuron has an assigned weight, which is initialised to a small or random value. The input data is then fed to the neurons, and the weight of each neuron is adjusted until it fits the input data [12]. At the end of the process, each neuron represents a cluster of customers that are considered to have similar behaviour. This process makes it possible to verify whether the selected features show any clear separation between the churners and the non-churners.
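To make the training procedure concrete, here is a minimal sketch of a SOM in plain Python. It is not the tool used to produce the map in this chapter. The update rule (best matching unit, Gaussian neighbourhood, linearly decaying learning rate), the function names and the toy data are all assumptions chosen for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid position of the neuron whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda pos: sum((w - xi) ** 2 for w, xi in zip(weights[pos], x)),
    )


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols map and return {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per neuron, initialised to small random values.
    weights = {
        (r, c): [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood shrinks
        for x in rng.sample(data, len(data)):    # shuffled pass over the data
            bmu = best_matching_unit(weights, x)
            for pos, w in weights.items():
                d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    # Pull the weight towards the input, scaled by grid distance.
                    w[i] += lr * h * (x[i] - w[i])
    return weights


# Hypothetical toy data: two well-separated groups of "customers".
data = [(0.0, 0.1), (0.1, 0.0), (-0.1, 0.0), (0.0, -0.1),
        (5.0, 5.1), (5.1, 5.0), (4.9, 5.0), (5.0, 4.9)]
som = train_som(data, rows=3, cols=3)
print(best_matching_unit(som, (0.0, 0.0)), best_matching_unit(som, (5.0, 5.0)))
```

On clearly separated data such as this toy set, the two groups are mapped to different neurons. When groups overlap, as the churners and non-churners do in this data set, customers of both kinds share the same neurons.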

In the map (figure 5.3), the neurons are portrayed by pentagons, and the circles inside each pentagon show the cluster of customers represented by the neuron and the proportion of churners in it. The darker a pentagon is, the larger the distance between the customers it represents and those represented by its neighbours. As the figure shows, a large majority of the pentagons are white. This means that, judging by the features, no clear distinction exists between the clusters of customers represented by these pentagons, and no clear border exists between the two categories of customers (churners and non-churners).

CHAPTER 5. DATA PREPROCESSING

Figure 5.3: SOM presentation of the data

Figure 5.4: 2D Presentation of features 144 and 93

Further, I studied the information that a combination of two features can bring about the target value. For that purpose, I made 2D plots representing one feature as a function of another. The results for different combinations of features are illustrated in figures 5.4, 5.5 and 5.6, where the red crosses represent the churning customers and the blue crosses the non-churning ones. As the figures show, no clear separation was found between the two categories of customers. Indeed, the churning customers are scattered over all the possible values of the features.

Figure 5.5: 2D Presentation of features 51 and 93

Finally, the results of this section show that the correlation between the features and the target value is weak. This makes training on the data set and predicting the churning customers rather difficult.

Chapter 6 Experiments and evaluation

In this chapter, I apply Gaussian process classification to the data set after preprocessing. I used a Matlab implementation of Gaussian processes developed by Rasmussen and Nickisch [20]. During the experiments, I test different choices of covariance functions and compare the results of the respective predictions. To do that, we need a better accuracy measurement criterion than the percentage of correct classifications because, as we have seen in chapter 5, the data contain 7.3% churners. Thus, a classification algorithm that classifies all customers as non-churners ends up with 92.7% accuracy according to the percentage of correct classifications. That is a high score that does not reflect how poorly the algorithm performed. Section 6.1 addresses this problem. Section 6.2 presents and discusses the results obtained by Gaussian process classification. Finally, section 6.3 compares those results with those of the methods previously used for churn prediction.

6.1 Accuracy measurement criteria

The percentage of correct classifications is not, as seen previously, a proper accuracy measurement. The problem with this metric arises when the churners are a minority in the data set. What we actually need is a criterion that captures how many churners were detected by the classification algorithm.
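The following short sketch illustrates the problem on a hypothetical sample of 1000 customers with the 7.3% churn rate of the data set. The sample size and the function names are assumptions made for the example.

```python
def accuracy(y_true, y_pred):
    """Fraction of customers that are classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def detected_churners(y_true, y_pred):
    """Fraction of true churners (label 1) that are predicted as churners."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / sum(y_true)


# Hypothetical sample: 73 churners (1) and 927 non-churners (0).
y_true = [1] * 73 + [0] * 927
# A trivial classifier that labels every customer as a non-churner.
y_pred = [0] * 1000

acc = accuracy(y_true, y_pred)
det = detected_churners(y_true, y_pred)
print(acc, det)
```

The trivial classifier reaches 92.7% correct classifications while detecting none of the churners.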

The receiver operating characteristic (ROC) curve, on the other hand, is a graphical plot of the true positive rate, also known as sensitivity (in our case the number of churning customers detected by the classification algorithm divided by the true number of churners), versus the false positive rate, which equals one minus the specificity (in our case the number of non-churning customers wrongly classified as churners divided by the true number of non-churners). If the predictions are the churning probability p, and we suppose that a customer is classified as a churner if p > S for 0 < S < 1, then the ROC curve presents the sensitivity as a function of the false positive rate for each value of S.
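As an illustration, the sketch below computes one point of the ROC curve for a given threshold S. Here the false positive rate is the number of non-churners classified as churners divided by the number of non-churners. The function name roc_point, the labels and the probabilities are hypothetical.

```python
def roc_point(y_true, p, threshold):
    """Return (false positive rate, true positive rate) for the rule p > threshold."""
    tp = sum(1 for y, pi in zip(y_true, p) if y == 1 and pi > threshold)
    fp = sum(1 for y, pi in zip(y_true, p) if y == 0 and pi > threshold)
    positives = sum(y_true)
    negatives = len(y_true) - positives
    return fp / negatives, tp / positives


# Hypothetical labels (1 = churner) and predicted churn probabilities.
y_true = [1, 1, 0, 1, 0, 0]
p = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]

curve = [roc_point(y_true, p, s) for s in (0.8, 0.5, 0.2)]
for fpr, tpr in curve:
    print(round(fpr, 3), round(tpr, 3))
```

Lowering S moves the point towards the upper right of the plot: more churners are detected, but more non-churners are wrongly flagged as well.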

The area under the ROC curve (AUC) is a good performance measure in our case, since it reduces the ROC curve to a single number representing the expected performance of the classification. A random method has a ROC curve along the identity line, with an AUC of 0.5. Classification algorithms are expected to have an AUC higher than this. The larger the AUC, the better the classification performance.
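The AUC can also be read as the probability that a randomly chosen churner receives a higher predicted probability than a randomly chosen non-churner, with ties counted as one half. The sketch below computes it in this pairwise form rather than by integrating the curve. It is meant as an illustration only: the function name and the data are hypothetical, and the quadratic loop is only practical for small samples.

```python
def auc(y_true, p):
    """AUC as the fraction of (churner, non-churner) pairs ranked correctly."""
    pos = [pi for y, pi in zip(y_true, p) if y == 1]
    neg = [pi for y, pi in zip(y_true, p) if y == 0]
    score = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                score += 1.0
            elif a == b:
                score += 0.5  # ties count as half a correct ranking
    return score / (len(pos) * len(neg))


# Hypothetical labels (1 = churner) and predicted churn probabilities.
y_true = [1, 1, 0, 1, 0, 0]
p = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]

auc_model = auc(y_true, p)
# A predictor that gives every customer the same probability carries no information.
auc_constant = auc(y_true, [0.5] * len(y_true))
print(round(auc_model, 3), auc_constant)
```

The uninformative predictor obtains an AUC of 0.5, which matches the identity-line case described above.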
