Case Study: Mine Pump - Experimental Evaluation of the EDSM-Markov Algorithms

5.2 Experimental Evaluation of the EDSM-Markov Algorithms

5.3.2 Case Study: Mine Pump

The second case study is the mine pump system introduced by Damas et al. [142], which must satisfy the following requirement: the pump must be switched off whenever the water level is below a low threshold. Damas et al. [142] presented a simplified LTS specification of the mine pump that can be used to evaluate LTS inference methods. In this case study, the LTS has 10 states, an alphabet of 8 labels, and 13 transitions.

Figure 5.28: BCR scores of water mine pump case study (panels: trace length multiplier L = 0.3, 0.5, 1.0; x-axis: trace number 1, 2, 4, 8; y-axis: BCR score; series: EDSM-Markov k = 1, EDSM-Markov k = 2, EDSM-Markov k = 3, SiccoN)

Figure 5.28 illustrates the BCR scores of the LTSs inferred for the mine pump case study using EDSM-Markov and SiccoN for different numbers of traces. The EDSM-Markov learner inferred LTSs with higher BCR scores in the majority of cases, especially when more than one trace was used and k > 1. With k = 1, EDSM-Markov learned LTSs less accurately than SiccoN when the number of traces was 4 or 8, because the Markov model trained with k = 1 was less accurate than those trained with k = 2 or k = 3. Figure 5.28 also shows that SiccoN performs well on heavily branching traces compared to EDSM-Markov with k = 1. Notably, the EDSM-Markov k = 2 and k = 3 learners inferred LTSs with BCR scores much higher than those obtained using SiccoN even when only 1 or 2 traces were considered. This indicates that the trained Markov models predicted the labels of outgoing transitions well during the process of merging states.

Additionally, SiccoN performed better when the number of traces was 8 than when it was 1 or 2. SiccoN infers LTSs well whenever the traces are heavily branching, which explains why it generates LTSs of the mine pump case study with BCR scores close to those obtained by EDSM-Markov k = 2 and EDSM-Markov k = 3 in that setting.
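To make the BCR scores discussed above concrete, the following is a minimal sketch of how a balanced classification rate can be computed, assuming the usual definition as the mean of sensitivity and specificity over a set of test sequences. The function names and the accept/reject verdicts in the example are hypothetical and are not taken from the experiments.

```python
def bcr(tp, tn, fp, fn):
    """Balanced classification rate: mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # accepted sequences correctly accepted
    specificity = tn / (tn + fp) if tn + fp else 0.0  # rejected sequences correctly rejected
    return (sensitivity + specificity) / 2


def bcr_from_verdicts(expected, predicted):
    """Compute BCR from paired accept/reject verdicts (True means accepted).

    `expected` holds the verdicts of the reference LTS and `predicted`
    those of the inferred LTS on the same test sequences.
    """
    tp = sum(1 for e, p in zip(expected, predicted) if e and p)
    tn = sum(1 for e, p in zip(expected, predicted) if not e and not p)
    fp = sum(1 for e, p in zip(expected, predicted) if not e and p)
    fn = sum(1 for e, p in zip(expected, predicted) if e and not p)
    return bcr(tp, tn, fp, fn)


# Hypothetical verdicts for five test sequences (illustration only).
reference_verdicts = [True, True, True, False, False]
inferred_verdicts = [True, True, False, False, True]
score = bcr_from_verdicts(reference_verdicts, inferred_verdicts)
```

Under this definition an over-generalized model, which accepts sequences that the reference LTS rejects, loses specificity and therefore BCR even if it accepts every valid sequence.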

Table 5.10 reports the p-values obtained from the Wilcoxon signed-rank statistical test on the BCR scores. The null hypothesis H0 is that there is no significant difference between the BCR scores of the LTSs inferred using EDSM-Markov and SiccoN. In most configurations the resulting p-values were less than 0.05, so H0 could be rejected. There was a significant difference between SiccoN and EDSM-Markov k = 1 when the number of traces was 8, where SiccoN performed better than EDSM-Markov k = 1. On the other hand, H0 could not be rejected when the number of traces was 4, denoting that there was no significant difference between SiccoN and EDSM-Markov k = 1.

l    Comparison                   1 trace        2 traces       4 traces       8 traces
0.3  EDSM-Markov k=1 vs. SiccoN   0.52           0.03           0.75           2.18 × 10^-05
0.3  EDSM-Markov k=2 vs. SiccoN   0.06           0.002          7.33 × 10^-04  9.55 × 10^-07
0.3  EDSM-Markov k=3 vs. SiccoN   0.017          8.40 × 10^-04  0.003          9.55 × 10^-07
0.5  EDSM-Markov k=1 vs. SiccoN   0.002          5.82 × 10^-04  0.40           7.76 × 10^-04
0.5  EDSM-Markov k=2 vs. SiccoN   0.003          2.58 × 10^-05  5.33 × 10^-05  1.16 × 10^-06
0.5  EDSM-Markov k=3 vs. SiccoN   0.06           0.003          7.08 × 10^-05  1.16 × 10^-06
1.0  EDSM-Markov k=1 vs. SiccoN   0.003          0.024          0.39           2.98 × 10^-05
1.0  EDSM-Markov k=2 vs. SiccoN   2.13 × 10^-04  5.03 × 10^-05  2.60 × 10^-06  1.14 × 10^-06
1.0  EDSM-Markov k=3 vs. SiccoN   1.29 × 10^-04  1.81 × 10^-06  2.58 × 10^-06  1.14 × 10^-06

Table 5.10: p-values of Wilcoxon signed rank test of water mine case study for BCR scores
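As an illustration of the kind of paired comparison reported here, the following is a minimal pure-Python sketch of a two-sided exact Wilcoxon signed-rank test. It is not the tool used to produce the reported p-values; the function name and the sample scores are hypothetical, zero differences are dropped, and tied absolute differences receive average ranks.

```python
def wilcoxon_exact_p(xs, ys):
    """Two-sided exact Wilcoxon signed-rank p-value for paired samples."""
    # Paired differences; rounding makes tie detection robust for floats.
    diffs = [round(x - y, 9) for x, y in zip(xs, ys)]
    diffs = [d for d in diffs if d != 0]  # zero differences carry no sign
    n = len(diffs)
    if n == 0:
        return 1.0

    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = average_rank
        i = j + 1

    # Doubled ranks are integers, so the null distribution can be counted exactly.
    doubled = [int(round(2 * r)) for r in ranks]
    total = sum(doubled)
    observed = sum(r for r, d in zip(doubled, diffs) if d > 0)

    # counts[s] = number of sign assignments whose positive ranks sum to s.
    counts = [0] * (total + 1)
    counts[0] = 1
    for r in doubled:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]

    # The null distribution is symmetric, so double the smaller tail.
    smaller = min(observed, total - observed)
    tail = sum(counts[: smaller + 1])
    return min(1.0, 2 * tail / 2 ** n)


# Hypothetical BCR scores of two learners on the same eight runs.
learner_a = [0.95, 0.97, 0.93, 0.99, 0.96, 0.98, 0.94, 0.92]
learner_b = [0.80, 0.85, 0.70, 0.90, 0.75, 0.88, 0.60, 0.65]
p_value = wilcoxon_exact_p(learner_a, learner_b)
reject_h0 = p_value < 0.05
```

With paired scores from two learners, H0 is rejected at the 0.05 level when the returned p-value falls below 0.05, which is the decision rule applied to the tabulated values.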

Figure 5.29 shows the structural-similarity scores of the mined LTSs for the water mine pump case study. The results support the hypothesis that, when k = 2 and 3, EDSM-Markov generates LTS models that are structurally closer to the reference LTS than those inferred using SiccoN. The structural-similarity scores of the LTSs inferred using SiccoN were low, denoting that the synthesized LTSs were over-generalized. Furthermore, the structural-similarity scores of the LTSs inferred using the EDSM-Markov learner were worse than those of the other learners when k = 1; this is attributed to the low precision scores of the trained Markov model.

Figure 5.29: Structural-similarity scores of water mine pump case study (panels: trace length multiplier L = 0.3, 0.5, 1.0; x-axis: trace number 1, 2, 4, 8; y-axis: structural-similarity score; series: EDSM-Markov k = 1, EDSM-Markov k = 2, EDSM-Markov k = 3, SiccoN)

Moreover, Figure 5.29 shows that SiccoN inferred better LTSs than EDSM-Markov when k = 1, because the EDSM-Markov learner predicted the labels of outgoing transitions incorrectly. When k = 2 and 3, EDSM-Markov identifies LTSs with higher structural-similarity scores than SiccoN, as shown in Figure 5.29.

Table 5.11 summarizes the p-values obtained from the Wilcoxon signed-rank statistical test for the mine pump case study. The null hypothesis H0 is that the structural-similarity values of EDSM-Markov and SiccoN are not significantly different. The test reported p-values below the 0.05 significance level for nearly all numbers of traces considered. Therefore, H0 could be rejected, and this means that the structural-similarity scores of the two learners differed significantly. The exception is that H0 could not be rejected when the number of traces was 2 and l = 0.5 and the structural-similarity scores of the LTSs mined using EDSM-Markov k = 1 were compared to the scores attained by SiccoN, suggesting that there was no significant difference between the structural-similarity scores in that setting.

l    Comparison                   1 trace        2 traces       4 traces       8 traces
0.3  EDSM-Markov k=1 vs. SiccoN   1.25 × 10^-05  1.04 × 10^-04  0.004          1.78 × 10^-04
0.3  EDSM-Markov k=2 vs. SiccoN   0.04           3.09 × 10^-04  1.79 × 10^-06  1.03 × 10^-06
0.3  EDSM-Markov k=3 vs. SiccoN   0.001          1.16 × 10^-05  1.78 × 10^-06  1.03 × 10^-06
0.5  EDSM-Markov k=1 vs. SiccoN   1.13 × 10^-05  0.10           0.01           1.52 × 10^-05
0.5  EDSM-Markov k=2 vs. SiccoN   3.40 × 10^-05  1.82 × 10^-06  1.80 × 10^-06  1.17 × 10^-06
0.5  EDSM-Markov k=3 vs. SiccoN   1.17 × 10^-05  2.47 × 10^-06  1.80 × 10^-06  1.17 × 10^-06
1.0  EDSM-Markov k=1 vs. SiccoN   1.53 × 10^-05  1.91 × 10^-04  0.002          8.12 × 10^-05
1.0  EDSM-Markov k=2 vs. SiccoN   1.07 × 10^-05  7.97 × 10^-06  1.77 × 10^-06  1.16 × 10^-06
1.0  EDSM-Markov k=3 vs. SiccoN   4.00 × 10^-06  1.81 × 10^-06  1.77 × 10^-06  1.16 × 10^-06

Table 5.11: p-values of Wilcoxon signed rank test of water mine case study for structural-similarity scores

The precision and recall scores of the Markov models were computed during the experiments in order to study the influence of the Markov models on the accuracy of the inferred LTSs. Figure 5.30 illustrates the precision and recall scores of the trained Markov models for different settings of the prefix length k and different numbers of traces. The precision scores of the Markov models trained with k = 1 were very low compared to the other settings of k, which explains why the EDSM-Markov learner performed worse than SiccoN when k = 1. The EDSM-Markov learner over-generalized whenever the precision score was very low (say below 0.5), and this happened when k = 1. In contrast, the precision scores were very high (above 0.8) when k = 2 or 3, which had a significantly positive effect on the BCR and structural-similarity scores.
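The effect of the prefix length on precision can be illustrated with a small sketch. It assumes a simplified Markov model that maps each window of k labels to the set of labels observed immediately after it, and it scores those predictions against a hand-written table of expected labels. The labels, traces, and helper names are hypothetical, and the scoring is a simplification rather than the exact procedure used in the experiments.

```python
from collections import defaultdict


def train_markov(traces, k):
    """Map each length-k window of labels to the labels seen right after it."""
    model = defaultdict(set)
    for trace in traces:
        for i in range(len(trace) - k):
            model[tuple(trace[i:i + k])].add(trace[i + k])
    return model


def precision_recall(model, expected):
    """Score predicted next labels against the expected ones for each prefix."""
    tp = fp = fn = 0
    for prefix, truth in expected.items():
        predicted = model.get(prefix, set())
        tp += len(predicted & truth)  # predicted and really permitted
        fp += len(predicted - truth)  # predicted but not permitted
        fn += len(truth - predicted)  # permitted but never predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical traces and expected labels (illustration only).
traces = [["a", "b", "a", "c"], ["a", "c"]]
expected = {("a",): {"b"}, ("b",): {"a", "c"}}

model = train_markov(traces, k=1)
precision, recall = precision_recall(model, expected)
```

In this toy example the k = 1 model predicts both "b" and "c" after "a" because it cannot distinguish the contexts in which "a" occurs, which is the kind of over-prediction that lowers precision for short prefixes.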

Figure 5.31 shows the number of inconsistencies computed for the reference LTS of the water mine case study after training the Markov models. When k = 3, the mean BCR score of the LTSs inferred using EDSM-Markov was high (say above 0.95) when the number of traces was 4 or 8. This can be attributed to the low inconsistency score in this case, as shown in Figure 5.31. In contrast, EDSM-Markov achieved low BCR scores when the number of traces was very small and k = 3; this indicates that the Markov models were not trained well enough to observe sequences of length k + 1.

Figure 5.30: Markov precision and recall scores of water mine case study (panels: trace length multiplier L = 0.3, 0.5, 1.0; x-axis: trace number 1, 2, 4, 8; y-axis: percentage %; series: precision and recall for k = 1, k = 2, k = 3)

In document Improving Software Model Inference by Combining State Merging and Markov Models (Page 172-176)