Experiments Using EDSM - DFA Inference Competitions


3.1.5 Experiments Using EDSM

To evaluate the efficiency of the EDSM algorithm when the alphabet size is large, the experiments described in Section 3.1.1 were repeated for different variants of the EDSM algorithm. When only positive traces were considered, the total number of sequences was |Q| × 5. A further experiment was conducted in this section to study the impact of negative traces; there, the total number of traces was again |Q| × 5, with half of the sequences positive and the other half negative.
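The trace budget described above can be sketched as a small calculation. This is only an illustration: the function name `trace_budget` and its parameters are hypothetical, and the handling of an odd total (the extra trace is counted as negative) is an assumption, since the text does not say how an odd number of traces is split.

```python
def trace_budget(num_states, multiplier=5, with_negatives=False):
    """Return (positive, negative) trace counts for a target LTS with |Q| = num_states.

    The total is |Q| * multiplier, as stated in the text. When negatives are
    included, the total is split in half; for an odd total the extra trace is
    assigned to the negative set (an assumption made for this sketch).
    """
    if num_states <= 0:
        raise ValueError("num_states must be positive")
    total = num_states * multiplier
    if not with_negatives:
        return total, 0
    positive = total // 2
    return positive, total - positive


# A target with 10 states yields 50 traces in both settings.
positive_only = trace_budget(10)
mixed = trace_budget(10, with_negatives=True)
```

Under these assumptions, `positive_only` is `(50, 0)` and `mixed` is `(25, 25)`.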

Figure 3.7 shows two groups of box plots representing the BCR scores of the inferred LTSs in two cases: when negative samples are provided together with positive ones (the left group of box plots), and when only positive samples are included in the inference process (the right group).

[Box plots. Panels: Positive and Negative; Positive Only. x-axis: Different EDSM Thresholds (EDSM>=0, EDSM>=1, EDSM>=2, EDSM>=3, EDSM>=4). y-axis: BCR scores (0.0 to 1.0).]

Figure 3.7: BCR scores obtained using the EDSM algorithm for different EDSM threshold values

The right group consists of box plots representing the BCR scores of LTSs inferred from positive sequences only using different EDSM learners. Figure 3.7 shows that EDSM over-generalizes LTSs when the threshold is set to zero or one; this is due to the absence of negative samples, which would otherwise control the generalization by preventing many incorrect mergers. The horizontal line in the right group in Figure 3.7 shows that the BCR values are 0.5, which indicates that the learner is effectively merging states at random (over-generalization).
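To see why a BCR of 0.5 signals over-generalization, consider the following minimal sketch. It assumes the usual definition of the balanced classification rate as the mean of sensitivity and specificity; the definition used in this work is given in an earlier section that is not reproduced here, and the function name `bcr` and the counts below are illustrative only.

```python
def bcr(tp, fn, tn, fp):
    """Balanced classification rate: mean of sensitivity and specificity.

    tp/fn count positive test sequences accepted/rejected by the model;
    tn/fp count negative test sequences rejected/accepted by the model.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sequence")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# An over-generalized model that accepts every sequence: all positives are
# accepted (sensitivity 1.0) but no negative is rejected (specificity 0.0).
accept_everything = bcr(tp=100, fn=0, tn=0, fp=100)
```

Here `accept_everything` evaluates to 0.5: a model that has collapsed into accepting every sequence scores no better than chance on a balanced test set.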

In the absence of negative samples, one may constrain the merging process by increasing the threshold to two, three, or four. Figure 3.7 illustrates that when the threshold is three, the average BCR is around 0.67. However, EDSM under-generalizes LTSs when the threshold is greater than two: many pairs of states are blocked from being merged, which hinders generalization.
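The effect of the threshold on merging can be illustrated with a simplified sketch. It is not the EDSM implementation used in the experiments: the merge scores are supplied directly rather than computed from the evidence, and the names `select_merge`, `scores`, and the state labels are hypothetical.

```python
from typing import Dict, Optional, Tuple

Pair = Tuple[str, str]


def select_merge(scores: Dict[Pair, int], threshold: int) -> Optional[Pair]:
    """Return the highest-scoring candidate pair whose score meets the threshold.

    Returns None when every candidate is blocked, in which case a
    state-merging learner would keep the states separate.
    """
    eligible = {pair: score for pair, score in scores.items() if score >= threshold}
    if not eligible:
        return None
    return max(eligible, key=eligible.get)


# Hypothetical candidate pairs with precomputed merge scores.
scores = {("q0", "q3"): 0, ("q1", "q4"): 2, ("q2", "q5"): 3}

permissive = select_merge(scores, threshold=0)  # every pair is eligible
strict = select_merge(scores, threshold=4)      # every pair is blocked
```

With a threshold of zero even the pair with no supporting evidence remains a candidate, which is how incorrect mergers can occur; with a threshold of four no pair qualifies, so no merge is made at all.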

As noted by Walkinshaw et al. [100], the accuracy of the inferred models is much lower with a low merging threshold than with a high one. Moreover, with a very low threshold, the language of the inferred models accepts many false-positive sequences. The study in this section agrees with their findings [100], in that the BCR scores can be improved by increasing the EDSM threshold from two to three.

The left group of box plots shown in Figure 3.7 summarizes the BCR scores attained by variants of the EDSM algorithm when both positive and negative sequences were supplied. Compared with the case where only positive samples are provided, the results show that EDSM performs better if negative samples are available and the merging threshold is one or two. For instance, the average BCR score of LTSs inferred when negative samples are available and the threshold is zero is 0.60, compared to 0.5 with positive samples only.

[Box plots. Panels: Positive and Negative; Positive Only. x-axis: Different EDSM Thresholds (EDSM>=0, EDSM>=1, EDSM>=2, EDSM>=3, EDSM>=4). y-axis: Structural-similarity scores (0.0 to 1.0).]

Figure 3.8: Structural-similarity scores of LTSs inferred using the EDSM algorithm for different EDSM threshold values

Additionally, the impact of the EDSM threshold on the structural-similarity scores of the LTSs inferred using EDSM is illustrated in Figure 3.8. When only positive sequences are provided and the merging threshold is one or two, the average structural-similarity score of the inferred LTSs is zero; this denotes that the models are over-generalized.

It is clear that the structural-similarity scores achieved by the learners are very sensitive to the presence of negative samples and to the setting of the EDSM threshold. The average structural-similarity score attained by EDSM is nearly 0.5 when the threshold is three, which is higher than the scores obtained with the other settings of the EDSM threshold. During the experiments, the ratio of correctness for the number of states was computed as follows:

$$
\text{ratio of correctness} = \frac{\text{number of states of the LTS inferred using a learner}}{\text{number of states of the target LTS}} \tag{3.1}
$$
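Equation 3.1 can be computed directly, as in the minimal sketch below. The function name `ratio_of_correctness` and the state counts in the examples are illustrative and are not taken from the experiments.

```python
def ratio_of_correctness(inferred_states: int, target_states: int) -> float:
    """Ratio of the inferred LTS's state count to the target LTS's state count (Eq. 3.1)."""
    if target_states <= 0:
        raise ValueError("target_states must be positive")
    if inferred_states < 0:
        raise ValueError("inferred_states must be non-negative")
    return inferred_states / target_states


# Illustrative values for a target LTS with 20 states.
exact = ratio_of_correctness(20, 20)         # same number of states as the target
too_many = ratio_of_correctness(30, 20)      # too few mergers were made
too_few = ratio_of_correctness(1, 20)        # nearly everything was merged
```

A ratio of 1.0 means the inferred model has as many states as the target; values above 1.0 indicate that mergers that should have been made were not, and values well below 1.0 indicate that too many states were merged.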

The ratio of correctness for the number of states of LTSs inferred using different EDSM learners is shown in Figure 3.9. It is apparent from Figure 3.9 that the number of states is affected by the setting of the EDSM threshold. The EDSM learner generates LTSs with a number of states close to that of the hidden target LTSs when the EDSM threshold equals two. When the threshold is three or four, many mergers that should be made are not made. This indicates that the setting of the EDSM threshold is critical.

When positive and negative sequences are provided, the number of states is also affected by the setting of the EDSM threshold, as shown in Figure 3.10. The inferred LTSs have more states than the target LTSs if the threshold is set to three or four. On the other hand, the number of states of the inferred models is very close to that of the target LTSs when the EDSM threshold is two.

[Box plots, one panel per state number (5 to 50 in steps of 5). x-axis: Different learners (EDSM>=0, EDSM>=1, EDSM>=2, EDSM>=3, EDSM>=4, Target). y-axis: Ratio of correctness.]

Figure 3.9: Ratio of correctness for the number of states of learnt LTSs using different EDSM learners from positive samples only

In document Improving Software Model Inference by Combining State Merging and Markov Models (Page 78-82)