Poor data throughput root cause analysis (RCA)

Chapter 3: Methodology

3.7. Poor data throughput root cause analysis (RCA)

This section proposed a deep learning approach based on a DNN architecture to train and evaluate a model for a poor throughput root cause analysis using both the radio and the core network performance indicators of a 3G packet-switched network as inputs. The approach is a user-centric one and is based on user perception of QoS from categorization between “BAD” for 3G data throughput less than 500Kbps and “GOOD” throughput higher than 500Kbps. To ensure end-to-end visibility, both the performance from the radio and the core network have been correlated based on subscribers as common keys. For the radio part, the radio conditions were derived from the measurement reports considering essentially two parameters: The energy per chip and noise (EcNo) to provide information about the quality of the radio signal and the received signal code power (RSCP) to provide the radio signal strength. For the core network part, seven

parameters were considered: the total time of data connection (act_dl), the total bytes transmitted on the downlink (bytes_dl), the total bytes retransmitted on the downlink (retransbytes_dl), the latency from the core to the user equipment (latency_dl), the total number of events (event), the number of successful DNS transactions (dns_successful) and number of unsuccessful DNS transactions (dns_failure).

3.7.1. Data preparation

The data was hourly aggregated and cleaned from unnecessary information. Each record represented an IMSI aggregated transaction for an hour with different new computed metrics based on the two radio conditions information from the measurement report and the seven core network parameters. The seven core network parameters reused the equations (14), (15), (16) and (17) to compute the four core network metrics which are the round-trip time on the downlink (rtt_dl), the retransmission rate on the downlink (rtx_dl), the DNS success rate (dns_sr) and the throughput on the downlink (thp_dl). Instead, the two radio parameters based on equations (22) and (23) computed new metrics with good EcNo (ecno_good), the sum of transactions with bad EcNo (ecno_bad), the sum of transactions with critical EcNo (ecno_critical), the sum of transactions with good RSCP (rscp_good), the sum of transactions with bad RSCP (rscp_bad), the sum of transactions with critical RSCP (rscp_critical).

𝑒𝑐𝑛𝑜 = {

𝑔𝑜𝑜𝑑, Threshold EcNo > −10𝑑𝐵

𝑏𝑎𝑑, − 15𝑑𝐵 < Threshold EcNo < −10𝑑𝐵

𝑐𝑟𝑖𝑡𝑖𝑐𝑎𝑙, Threshold EcNo ≤ −15𝑑𝐵

(22)

𝑟𝑠𝑐𝑝 = {

𝑔𝑜𝑜𝑑, Threshold RSCP > −90𝑑𝐵𝑚

𝑏𝑎𝑑, − 100𝑑𝐵 < Threshold RSCP < −90𝑑𝐵𝑚

𝑐𝑟𝑖𝑡𝑖𝑐𝑎𝑙, Threshold RSCP ≤ −100𝑑𝐵𝑚

(23)

A total number of 44,711 observations that were later split for the training, validation and testing of the model considering the average thp_dl higher than 500Kbps as “GOOD” and less than 500Kbps as “BAD”. Table 9 provides the full list of attributes with their description.

Table 9: Poor data throughput attributes description

Attributes name Description Example

X1 IMSI ************13992 X2 thp_dl BAD X3 rtx_dl 21% X4 dns_sr 96% X5 rtt_dl 111msec X6 ecno_good 9 X7 ecno_bad 14 X8 ecno_critical 2 X9 rscp_good 7 X10 rscp_bad 13 X11 rscp_critical 5

3.7.2. Model design

3.7.2.1. Training and testing approach

The dataset has been divided into two new datasets with a ratio of 70% for the training and validation dataset (31,298 Observations) and 30% for the testing dataset (13,413 Observations). nine attributes were used as predictors: rtx_dl, dns_sr, rtt_dl, ecno_good, ecno_bad, ecno_critical, rscp_good, rscp_bad and rscp_critical. The response variable for training and testing was the thp_dl that was one-hot encoded to provide two outputs (first output for “BAD” throughput response and the second output for the “GOOD” throughput response) each with

binary possibility of 0 or 1 (0 for “No” and 1 for “Yes”). It must be noted that having two outputs instead of one will not improve the results in case of binary classification, but we have chosen this approach to enable dynamic configuration using one-hot encoding to support multiclass cases for the future. The normalization transformation based on equation (8) was applied to all the attributes except the targets outputs which were already binary 0 or 1 as result of one-hot encoding.

The DNN training has an objective of finding the neural network parameters that would minimize the loss or cost function while improving the performance. The proposed model was a sequential model as a linear stack of layers as shown in Figure 25.

Figure 25: Proposed DNN architecture

The activation function used for the hidden layers was the rectified linear unit (ReLu) which is one of the common modern non-linear activation functions as shown in Figure 26 in comparison to traditional non-linear activation functions such as the Hyperbolic Tangent.

Figure 26: Non-linear activation functions

The output layer instead used the Softmax activation function which output the probability of an instance to belong to a specific class or category. The configuration summary of the model built is described in Table 10.

Table 10: Proposed deep neural network architecture

Layer type Number of units Activation function

Input layer 9 -

Hidden layer 1 5 ReLu

Hidden layer 2 5 ReLu

One of the difficult tasks in deep learning is the optimization of the learning rate, since small learning rate causes many iterations until convergence and trapping in local minima while large learning rate causes overshooting. Different methods are used to guess the learning rate when using the stochastic gradient descent (SGD), but most of them are time consuming. We used the approach proposed by Kingma and Ba [96] with the adaptive learning rate optimizer such as the “Adam” providing better performance as shown in Figure 27 [96].

Figure 27: Convolutional neural networks training cost comparison [96]

With adaptive learning rate, the learning rate is no longer fixed and can be made larger or smaller depending on the size of the gradient, how fast learning is happening and the size of weights. We used the minibatch technique to fit model since it provides much accurate estimation of the gradient with smoother convergence allowing larger learning rates and therefore faster training. The parameters used to compile and fit the model are shown Table 11.

Table 11: Proposed deep neural network model parameters

Model parameters Value

optimizer Adam

loss function Cross-entropy Optimization metric Accuracy

batch size 32

epochs 50

validation split 20%

The accuracy and the loss of both training and validation phase were used to tune the parameters while the F1_Score and the AUC were used to evaluate the final model.

3.7.2.2. Root cause analysis (RCA) approach

Very often the blackbox data mining techniques such as DNN are not used in the environment of root cause analysis because of lack of explanation from the results, but this study proposed a root cause analysis using the feature importance during the prediction phase to derive the predictors that impacted the most the final output of the throughput. The user-based root cause analysis required a different approach since the poor QoE of different users might have different reasons. To be able to understand single poor throughput reason, we used a R library based on the local interpretable model-agnostic explanations (LIME) technique which provides explanation of complex machine learning classifiers.

While the model could retrieve the features importance for any subscribers in the testing dataset, for demonstration, only the first four subscribers have been used to outline the influence of predictors focusing on the output response “BAD” for poor data throughput. For this, the interest was only on the top four features based on importance using a kernel width of 0.7.

In document Quadri-dimensional approach for data analytics in mobile networks (Page 70-77)