1.6 Related Work

1.6.3 Optimizations to the Active Learning Framework

Recent optimizations of the learning setting have involved reducing learning algorithm overhead, ensuring the quality of successive hypotheses during learning, and improving testing. We refer to [119] for a thorough overview of earlier advancements.

Our experience in Chapters 4 and 6 has shown that, in terms of the number of tests needed to learn a model, the best-performing algorithms are those which require the fewest tests to incorporate counterexamples into their structure and produce new hypotheses. In other words, learners with the least overhead are the most efficient. Recent advancements have tackled this overhead through shorter counterexamples and optimized data structures.

Aarts et al. [2] noted that shorter counterexamples reduce the overall number of tests required, as they result in shorter suffixes. Consequently, they incorporated into their RA learning algorithm the counterexample-shrinking techniques introduced by Koopman et al. [131]. These techniques eliminate from a counterexample single transitions, or sequences which form loops in the last hypothesis. Their effectiveness was evidenced in [8], which compared existing RA learning approaches. Therein, loop elimination was shown to have a marked effect in shortening counterexamples, and consequently in reducing the number of tests needed for learning.
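To make the idea of loop elimination concrete, the following is a minimal sketch for Mealy machines, not the procedure of [2] or [131]. It assumes the hypothesis is given as a transition dictionary and the SUL as a function from an input word to its output list; all names (`run`, `shrink_by_loop_removal`, the toy machines) are hypothetical. A segment of the counterexample is dropped when the hypothesis is in the same state before and after it, provided the shortened word still exposes a difference. Note that each such check costs one test on the SUL.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A Mealy machine as a dictionary: (state, input) -> (next_state, output).
Transitions = Dict[Tuple[str, str], Tuple[str, str]]


def run(trans: Transitions, start: str, word: Sequence[str]) -> List[str]:
    """Return the outputs the machine produces when reading `word`."""
    state, outputs = start, []
    for symbol in word:
        state, out = trans[(state, symbol)]
        outputs.append(out)
    return outputs


def shrink_by_loop_removal(
    hyp: Transitions,
    start: str,
    sul: Callable[[Sequence[str]], List[str]],
    cex: Sequence[str],
) -> List[str]:
    """Greedily remove hypothesis loops while the word stays a counterexample."""
    word = list(cex)
    improved = True
    while improved:
        improved = False
        # visited[i] is the hypothesis state reached before reading word[i].
        visited = [start]
        for symbol in word:
            visited.append(hyp[(visited[-1], symbol)][0])
        for i in range(len(word)):
            # Try the longest loop starting at position i first.
            for j in range(len(word), i, -1):
                if visited[i] != visited[j]:
                    continue
                candidate = word[:i] + word[j:]
                # One SUL test per candidate: keep it only if it still fails.
                if candidate and run(hyp, start, candidate) != sul(candidate):
                    word, improved = candidate, True
                    break
            if improved:
                break
    return word


# Toy example (hypothetical): the SUL outputs "1" on every "b" after the first.
SUL_MODEL = {
    ("s0", "a"): ("s0", "0"), ("s0", "b"): ("s1", "0"),
    ("s1", "a"): ("s1", "0"), ("s1", "b"): ("s1", "1"),
}
# The hypothesis has a single state and always outputs "0".
HYP = {("q0", "a"): ("q0", "0"), ("q0", "b"): ("q0", "0")}


def sul(word: Sequence[str]) -> List[str]:
    return run(SUL_MODEL, "s0", word)


shrunk = shrink_by_loop_removal(HYP, "q0", sul, ["a", "a", "b", "a", "b"])
print(shrunk)  # ['b', 'b']
```

In this toy run the five-symbol counterexample shrinks to two symbols, so any suffixes the learner derives from it are correspondingly shorter.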

Reduction in test numbers can also be attained by using optimized data structures. Observation tables provide an intuitive, yet costly way of encoding observations. The cost lies not so much in the memory footprint as in the number of tests needed to close a table and build a new hypothesis. Some of these tests might be meaningless with regard to incorporating the essence of the last counterexample, yet are needed as a side-effect of the structure used and of the redundancies often present in counterexamples. Kearns and Vazirani [127] introduced discrimination trees (which they call 'classification trees') as an alternative to observation tables. More advanced algorithms such as the Observation Pack [116] and TTT [122] incorporate similar structures. The work in [122] benchmarks different learning algorithms and shows the effectiveness of TTT and, more generally, of algorithms based on discrimination trees.
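As an illustration of why discrimination trees can save tests, the sketch below shows the basic "sift" operation for a DFA-style tree, under simplifying assumptions: inner nodes carry a discriminating suffix, leaves carry the access sequence of a hypothesis state, and the tree is complete. The class and function names, as well as the example language, are hypothetical and not taken from [127], [116] or [122]. Classifying a word costs one membership query per inner node on its path, rather than one query per column of an observation table.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

Word = Tuple[str, ...]


@dataclass
class Node:
    discriminator: Optional[Word] = None  # set on inner nodes only
    access: Optional[Word] = None         # set on leaves only
    children: Dict[bool, "Node"] = field(default_factory=dict)

    def is_leaf(self) -> bool:
        # The empty suffix () is a valid discriminator, so compare with None.
        return self.discriminator is None


def sift(root: Node, word: Word, member: Callable[[Word], bool]) -> Tuple[Node, int]:
    """Walk from the root to a leaf; return the leaf and the queries spent."""
    node, queries = root, 0
    while not node.is_leaf():
        outcome = member(word + node.discriminator)  # one membership query
        queries += 1
        node = node.children[outcome]
    return node, queries


# Hypothetical target language: words over {a, b} that end in "ab".
def ends_in_ab(word: Word) -> bool:
    return word[-2:] == ("a", "b")


# A tree separating three states with access sequences (), (a) and (a, b).
leaf_eps = Node(access=())
leaf_a = Node(access=("a",))
leaf_ab = Node(access=("a", "b"))
inner = Node(discriminator=("b",), children={True: leaf_a, False: leaf_eps})
root = Node(discriminator=(), children={True: leaf_ab, False: inner})

leaf, used = sift(root, ("b", "a", "a"), ends_in_ab)
print(leaf.access, used)  # ('a',) 2
```

Here the word `baa` is classified into the state reached by `a` after two queries, while a word ending in `ab` is classified after a single query at the root.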

A different line of work focuses on the quality of hypotheses generated in learning. The aim is to ensure that every new hypothesis is at least as good as the last. Hypotheses are compared on the basis of a distance metric. Smetsers et al. [197] formalize a metric based on the minimal-length counterexample which distinguishes a hypothesis from the SUL. A hypothesis is better if the minimal-length counterexample is longer. This metric follows the remark of Alfaro et al. [75] that a potential bug in the far-away future is less troubling than a potential bug today. Smetsers et al. integrate the metric into L∗ by adding a check performed on each newly generated hypothesis, comparing it to the previous one. This comparison results either in a quality guarantee or in a new counterexample for the learner. Later work by Van den Bos et al. [207] enhances Angluin's framework by adding a general Comparator component to perform the comparison based on a given metric. They also introduce a new metric centered on the distance of a hypothesis to a set of logs. While ensuring a notion of quality was the main goal, both works note a decrease in the number of tests as a (desirable) side-effect of enforcing these metrics.
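The notion of a minimal-length counterexample can be illustrated by a breadth-first search over the product of two Mealy machines. The sketch below is only an illustration under a strong assumption: it compares hypotheses against an explicit reference model, which in a black-box setting is not available (the cited works instead compare a new hypothesis with the previous one). The function name and the toy machines are hypothetical.

```python
from collections import deque
from typing import Dict, Optional, Sequence, Tuple

Transitions = Dict[Tuple[str, str], Tuple[str, str]]
Word = Tuple[str, ...]


def shortest_distinguishing_word(
    m1: Transitions, s1: str, m2: Transitions, s2: str, inputs: Sequence[str]
) -> Optional[Word]:
    """Return a shortest word on which the two machines' outputs differ."""
    queue = deque([(s1, s2, ())])
    seen = {(s1, s2)}
    while queue:
        a, b, word = queue.popleft()
        for x in inputs:
            next_a, out_a = m1[(a, x)]
            next_b, out_b = m2[(b, x)]
            if out_a != out_b:
                return word + (x,)
            if (next_a, next_b) not in seen:
                seen.add((next_a, next_b))
                queue.append((next_a, next_b, word + (x,)))
    return None  # the machines are equivalent


# Hypothetical reference model: outputs "1" on every "b" after the second.
REF = {
    ("t0", "a"): ("t0", "0"), ("t0", "b"): ("t1", "0"),
    ("t1", "a"): ("t1", "0"), ("t1", "b"): ("t2", "0"),
    ("t2", "a"): ("t2", "0"), ("t2", "b"): ("t2", "1"),
}
# Two candidate hypotheses of different sizes.
H1 = {("q0", "a"): ("q0", "0"), ("q0", "b"): ("q0", "0")}
H2 = {
    ("h0", "a"): ("h0", "0"), ("h0", "b"): ("h1", "0"),
    ("h1", "a"): ("h1", "0"), ("h1", "b"): ("h1", "1"),
}

cex1 = shortest_distinguishing_word(REF, "t0", H1, "q0", ("a", "b"))
cex2 = shortest_distinguishing_word(REF, "t0", H2, "h0", ("a", "b"))
print(cex1, cex2)  # ('b', 'b', 'b') ('b', 'b')
```

In this toy example the larger hypothesis `H2` is the worse one under the metric, since its shortest counterexample has length 2 against length 3 for `H1`.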

Finally, learning correct models cannot be done without effective testers. Our case studies have used the model-based algorithm introduced in [191]. The novelty of the algorithm lies in forming a test by appending, to a sequence of inputs leading to a state, an adaptive distinguishing sequence which distinguishes this state. By comparison, other model-based algorithms (W-method [65] and Wp-method [92]) append other forms of separating sequences. The conception of this algorithm was prompted by the failure of the W-method and Wp-method algorithms to find counterexamples to an invalid hypothesis in an industrial case study [190]. On the note of separating sequences for states in the model, Smetsers et al. [195] propose a more efficient algorithm for computing them, which can be used to enhance the performance of classical test algorithms like the W-method.
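The general shape of such model-based tests (an access sequence, followed by a middle part, followed by a separating sequence) can be sketched as follows. This is a simplified rendering of a W-method-style suite, not the algorithm of [191]: the characterization set is assumed to be supplied by the caller, and the function names and the `extra_states` parameter are hypothetical.

```python
from collections import deque
from itertools import product
from typing import Dict, Iterable, Iterator, Sequence, Set, Tuple

Transitions = Dict[Tuple[str, str], Tuple[str, str]]
Word = Tuple[str, ...]


def state_cover(trans: Transitions, start: str, inputs: Sequence[str]) -> Dict[str, Word]:
    """Map every reachable state to a shortest input word that reaches it."""
    cover: Dict[str, Word] = {start: ()}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for x in inputs:
            nxt, _ = trans[(state, x)]
            if nxt not in cover:
                cover[nxt] = cover[state] + (x,)
                queue.append(nxt)
    return cover


def words_up_to(inputs: Sequence[str], max_len: int) -> Iterator[Word]:
    """Yield all input words of length 0..max_len."""
    for n in range(max_len + 1):
        yield from product(inputs, repeat=n)


def w_method_suite(
    trans: Transitions,
    start: str,
    inputs: Sequence[str],
    w_set: Iterable[Word],
    extra_states: int = 0,
) -> Set[Word]:
    """Build tests of the form access . middle . separating suffix."""
    suffixes = list(w_set)
    suite: Set[Word] = set()
    for access in state_cover(trans, start, inputs).values():
        for middle in words_up_to(inputs, extra_states + 1):
            for suffix in suffixes:
                suite.add(access + middle + suffix)
    return suite


# Hypothetical two-state hypothesis; the input "b" separates its states.
HYP = {
    ("h0", "a"): ("h0", "0"), ("h0", "b"): ("h1", "0"),
    ("h1", "a"): ("h1", "0"), ("h1", "b"): ("h1", "1"),
}
suite = w_method_suite(HYP, "h0", ("a", "b"), [("b",)])
print(sorted(suite))
```

The suite grows exponentially in `extra_states`, which is one reason why the choice of separating sequences matters for the total number of tests.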

A different approach used in [16] adapts model-based mutation testing (shown to be effective in [14, 15]) for learning Mealy machines. The resulting algorithm compares favorably to that in [191] on the TCP and MQTT models inferred in [88, 200]. Yoo et al. [225] propose a different test approach whereby testing is done using a combination of the W-method and random sampling. Effectiveness is shown through learning experiments on the DNP3 protocol. Alternatively, in a context where several implementations are learned simultaneously, counterexamples can be derived from differences between the hypotheses generated, as done in [21].

Testing approaches presented so far have been mainly guided by models (model-based). In the context of conformance testing, however, it is specifications that we want to check, so it is natural to design tests on the basis of these specifications. To that end, works proposing the integration of model checking with model learning [149–152, 170] use counterexamples supplied by the model checker to drive the learning process.

Moving away from black-box settings, white-box methods can also prove effective. For example, the learning algorithms in [63] and [157] use symbolic or concolic execution [130] to instantiate tests exploring paths in programs. The effectiveness of symbolic execution compared to black-box approaches was shown in case studies involving the concolic execution tool JDart [94, 145]. Smetsers et al. [196] propose an alternative approach whereby testing is done by fuzzing. Their combined model learning and fuzzing approach scored very well in the RERS 2016 challenge [180], a competition which aims at comparing verification techniques and tools. For fuzzing, they used the tool American Fuzzy Lop (AFL) [13], which helped uncover behaviors that were not found using an adapted Wp-method.