Chapter 3: Methodology
3.7. Poor data throughput root cause analysis (RCA)
This section proposed a deep learning approach based on a DNN architecture to train and evaluate a model for a poor throughput root cause analysis using both the radio and the core network performance indicators of a 3G packet-switched network as inputs. The approach is a user-centric one and is based on user perception of QoS from categorization between βBADβ for 3G data throughput less than 500Kbps and βGOODβ throughput higher than 500Kbps. To ensure end-to-end visibility, both the performance from the radio and the core network have been correlated based on subscribers as common keys. For the radio part, the radio conditions were derived from the measurement reports considering essentially two parameters: The energy per chip and noise (EcNo) to provide information about the quality of the radio signal and the received signal code power (RSCP) to provide the radio signal strength. For the core network part, seven
57
parameters were considered: the total time of data connection (act_dl), the total bytes transmitted on the downlink (bytes_dl), the total bytes retransmitted on the downlink (retransbytes_dl), the latency from the core to the user equipment (latency_dl), the total number of events (event), the number of successful DNS transactions (dns_successful) and number of unsuccessful DNS transactions (dns_failure).
3.7.1. Data preparation
The data was hourly aggregated and cleaned from unnecessary information. Each record represented an IMSI aggregated transaction for an hour with different new computed metrics based on the two radio conditions information from the measurement report and the seven core network parameters. The seven core network parameters reused the equations (14), (15), (16) and (17) to compute the four core network metrics which are the round-trip time on the downlink (rtt_dl), the retransmission rate on the downlink (rtx_dl), the DNS success rate (dns_sr) and the throughput on the downlink (thp_dl). Instead, the two radio parameters based on equations (22) and (23) computed new metrics with good EcNo (ecno_good), the sum of transactions with bad EcNo (ecno_bad), the sum of transactions with critical EcNo (ecno_critical), the sum of transactions with good RSCP (rscp_good), the sum of transactions with bad RSCP (rscp_bad), the sum of transactions with critical RSCP (rscp_critical).
ππππ = {
ππππ, Threshold EcNo > β10ππ΅
πππ, β 15ππ΅ < Threshold EcNo < β10ππ΅
ππππ‘ππππ, Threshold EcNo β€ β15ππ΅
(22)
ππ ππ = {
ππππ, Threshold RSCP > β90ππ΅π
πππ, β 100ππ΅ < Threshold RSCP < β90ππ΅π
ππππ‘ππππ, Threshold RSCP β€ β100ππ΅π
(23)
A total number of 44,711 observations that were later split for the training, validation and testing of the model considering the average thp_dl higher than 500Kbps as βGOODβ and less than 500Kbps as βBADβ. Table 9 provides the full list of attributes with their description.
58
Table 9: Poor data throughput attributes description
Attributes name Description Example
X1 IMSI ************13992 X2 thp_dl BAD X3 rtx_dl 21% X4 dns_sr 96% X5 rtt_dl 111msec X6 ecno_good 9 X7 ecno_bad 14 X8 ecno_critical 2 X9 rscp_good 7 X10 rscp_bad 13 X11 rscp_critical 5
3.7.2. Model design
3.7.2.1. Training and testing approach
The dataset has been divided into two new datasets with a ratio of 70% for the training and validation dataset (31,298 Observations) and 30% for the testing dataset (13,413 Observations). nine attributes were used as predictors: rtx_dl, dns_sr, rtt_dl, ecno_good, ecno_bad, ecno_critical, rscp_good, rscp_bad and rscp_critical. The response variable for training and testing was the thp_dl that was one-hot encoded to provide two outputs (first output for βBADβ throughput response and the second output for the βGOODβ throughput response) each with
59
binary possibility of 0 or 1 (0 for βNoβ and 1 for βYesβ). It must be noted that having two outputs instead of one will not improve the results in case of binary classification, but we have chosen this approach to enable dynamic configuration using one-hot encoding to support multiclass cases for the future. The normalization transformation based on equation (8) was applied to all the attributes except the targets outputs which were already binary 0 or 1 as result of one-hot encoding.
The DNN training has an objective of finding the neural network parameters that would minimize the loss or cost function while improving the performance. The proposed model was a sequential model as a linear stack of layers as shown in Figure 25.
Figure 25: Proposed DNN architecture
The activation function used for the hidden layers was the rectified linear unit (ReLu) which is one of the common modern non-linear activation functions as shown in Figure 26 in comparison to traditional non-linear activation functions such as the Hyperbolic Tangent.
60
Figure 26: Non-linear activation functions
The output layer instead used the Softmax activation function which output the probability of an instance to belong to a specific class or category. The configuration summary of the model built is described in Table 10.
Table 10: Proposed deep neural network architecture
Layer type Number of units Activation function
Input layer 9 -
Hidden layer 1 5 ReLu
Hidden layer 2 5 ReLu
61
One of the difficult tasks in deep learning is the optimization of the learning rate, since small learning rate causes many iterations until convergence and trapping in local minima while large learning rate causes overshooting. Different methods are used to guess the learning rate when using the stochastic gradient descent (SGD), but most of them are time consuming. We used the approach proposed by Kingma and Ba [96] with the adaptive learning rate optimizer such as the βAdamβ providing better performance as shown in Figure 27 [96].
Figure 27: Convolutional neural networks training cost comparison [96]
With adaptive learning rate, the learning rate is no longer fixed and can be made larger or smaller depending on the size of the gradient, how fast learning is happening and the size of weights. We used the minibatch technique to fit model since it provides much accurate estimation of the gradient with smoother convergence allowing larger learning rates and therefore faster training. The parameters used to compile and fit the model are shown Table 11.
62
Table 11: Proposed deep neural network model parameters
Model parameters Value
optimizer Adam
loss function Cross-entropy Optimization metric Accuracy
batch size 32
epochs 50
validation split 20%
The accuracy and the loss of both training and validation phase were used to tune the parameters while the F1_Score and the AUC were used to evaluate the final model.
3.7.2.2. Root cause analysis (RCA) approach
Very often the blackbox data mining techniques such as DNN are not used in the environment of root cause analysis because of lack of explanation from the results, but this study proposed a root cause analysis using the feature importance during the prediction phase to derive the predictors that impacted the most the final output of the throughput. The user-based root cause analysis required a different approach since the poor QoE of different users might have different reasons. To be able to understand single poor throughput reason, we used a R library based on the local interpretable model-agnostic explanations (LIME) technique which provides explanation of complex machine learning classifiers.
While the model could retrieve the features importance for any subscribers in the testing dataset, for demonstration, only the first four subscribers have been used to outline the influence of predictors focusing on the output response βBADβ for poor data throughput. For this, the interest was only on the top four features based on importance using a kernel width of 0.7.
63