1.5 Model Learning Algorithms and Contributions
1.5.4 Learning Systems with Tree Queries
Cassel et al. [53, 58] present a different approach to learning RAs, which incorporates the handling of parameterized behavior into the learning algorithm itself. Their algorithm, SL∗, uses tree queries in place of simple tests. A tree query comprises a concrete prefix and a symbolic suffix. The algorithm poses tree queries to a tree oracle, which answers them by generating Symbolic Decision Trees (SDTs) that describe the SUL’s behavior for the suffix after the prefix. An SDT is a data structure that compactly encodes the observations made by running a large number of tests on the SUL. Each SDT obtained from a tree query is stored in a table similar to the one used by L∗, in the cell corresponding to the query’s concrete prefix and symbolic suffix. Closedness checks are performed by comparing the SDTs in different rows for equivalence.
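As an illustration of the tabled structure and its closedness check, here is a minimal sketch in Python. It is not RALib’s implementation: the class `SDTTable`, its fields, and the encodings of prefixes and suffixes are hypothetical, and plain equality of hashable values stands in for SDT equivalence (real equivalence must also account for how registers are named, which the sketch ignores).

```python
from collections.abc import Hashable
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical encodings (not RALib's API): a concrete prefix is a tuple of
# (action, parameters) pairs, a symbolic suffix is a tuple of action names, and an
# SDT is any hashable value whose equality stands in for SDT equivalence.
Prefix = Tuple[Tuple[str, Tuple[int, ...]], ...]
Suffix = Tuple[str, ...]
SDT = Hashable


@dataclass
class SDTTable:
    tree_oracle: Callable[[Prefix, Suffix], SDT]
    suffixes: List[Suffix] = field(default_factory=list)
    short_prefixes: List[Prefix] = field(default_factory=list)  # upper rows
    extensions: List[Prefix] = field(default_factory=list)      # lower rows
    cells: Dict[Tuple[Prefix, Suffix], SDT] = field(default_factory=dict)

    def fill(self) -> None:
        """Pose one tree query for every (prefix, suffix) cell that is still empty."""
        for prefix in self.short_prefixes + self.extensions:
            for suffix in self.suffixes:
                if (prefix, suffix) not in self.cells:
                    self.cells[(prefix, suffix)] = self.tree_oracle(prefix, suffix)

    def row(self, prefix: Prefix) -> Tuple[SDT, ...]:
        return tuple(self.cells[(prefix, s)] for s in self.suffixes)

    def unclosed_row(self) -> Optional[Prefix]:
        """Return an extension whose row matches no short-prefix row, or None if closed."""
        upper = {self.row(p) for p in self.short_prefixes}
        for ext in self.extensions:
            if self.row(ext) not in upper:
                return ext
        return None


# Toy oracle: the "SDT" only records the parity of the prefix length.
table = SDTTable(
    tree_oracle=lambda prefix, suffix: "even" if len(prefix) % 2 == 0 else "odd",
    suffixes=[()],
    short_prefixes=[()],
    extensions=[(("connect", ()),)],
)
table.fill()
print(table.unclosed_row())  # (('connect', ()),)
```

As in L∗, an unclosed row would then be moved into the upper part of the table and the table refilled; the sketch leaves that step to the caller.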
Figure 1.8: RALib’s architecture. The SL∗ algorithm poses tree queries to the tree oracle, which runs tests on the SUL and returns SDTs built from the resulting observations.
Canonical implementations of a tree oracle permit the generation of more succinct models. The framework of Cassel et al. supports learning RAs with advanced relations by providing canonical tree oracles for these relations. In [58], Cassel et al. formalize tree oracles for equalities and inequalities (involving the <, > and = relations). They also give an intuition on how various combinations of relations are handled, including inequalities over sums with constants, and use a prototype implementation to learn simple models involving these combinations. Cassel et al. [54] introduce RALib, an open-source implementation of this approach which supports equality and inequality relations.
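To give a feel for what a tree oracle for <, > and = has to explore, the following sketch picks one test value for a fresh parameter in each region that these relations carve out around the known register values. This is only an illustration of the general idea under our own assumptions, not the construction formalized in [58]; the function `representative_values` is hypothetical. Rationals are used so that a value strictly between two registers always exists.

```python
from fractions import Fraction
from typing import List


def representative_values(registers: List[Fraction]) -> List[Fraction]:
    """Pick one test value per region that <, > and = carve out around the registers.

    For sorted distinct registers r1 < ... < rk the regions are
    (-inf, r1), {r1}, (r1, r2), {r2}, ..., {rk}, (rk, +inf): 2k + 1 in total.
    """
    rs = sorted(set(registers))
    if not rs:
        return [Fraction(0)]           # no registers: a single region, any value will do
    values = [rs[0] - 1]               # strictly below the smallest register
    for lo, hi in zip(rs, rs[1:]):
        values += [lo, (lo + hi) / 2]  # equal to lo, then strictly between lo and hi
    values += [rs[-1], rs[-1] + 1]     # equal to the largest register, then above it
    return values


print([str(v) for v in representative_values([Fraction(20), Fraction(10)])])
# ['9', '10', '15', '20', '21']
```

With integer parameters, two registers holding consecutive values would leave the "strictly between" region empty, which a real oracle would have to detect and skip.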
Before discussing contributions, we give an intuition on the structure of SDTs and on how a tree oracle can be implemented. Figure 1.9 shows SDTs that a tree oracle may construct for a tree query with the concrete prefix connect ack(10) and the symbolic suffix msg(p) ack(p). An SDT symbolically describes all instantiations of a suffix that, when appended to the prefix, form valid traces of the SUL. These instantiations lead to accepting states in the SDT, whereas those forming invalid traces lead to rejecting states. In the case of Protocol B, the suffix forms a valid trace if the parameter of msg equals the parameter of ack in the prefix, and the parameter of the suffix’s ack equals the successor of that value.
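To make the structure of an SDT concrete, the sketch below encodes the maximally abstract SDT of Figure 1.9b as a tree of guarded branches and checks whether a concrete instantiation of the suffix reaches an accepting leaf. The classes `Leaf` and `Node` and the function `accepts` are hypothetical; guards are Python predicates over the current parameter p and the prefix register r1 only (in general, guards may also refer to earlier suffix parameters, which this sketch does not support).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

# Hypothetical encoding: guards are predicates over the current suffix parameter p
# and the prefix registers (here only r1); guard texts are kept for readability.
Guard = Callable[[int, Dict[str, int]], bool]


@dataclass
class Leaf:
    accepting: bool


@dataclass
class Node:
    action: str
    branches: List[Tuple[str, Guard, "Tree"]]


Tree = Union[Leaf, Node]


def accepts(tree: Tree, suffix: List[Tuple[str, int]], regs: Dict[str, int]) -> bool:
    """Follow, for each concrete suffix action, the unique branch whose guard holds."""
    for name, value in suffix:
        if not isinstance(tree, Node) or tree.action != name:
            raise ValueError(f"suffix action {name} does not match the tree")
        tree = next(sub for _, guard, sub in tree.branches if guard(value, regs))
    if not isinstance(tree, Leaf):
        raise ValueError("suffix is shorter than the tree")
    return tree.accepting


# The maximally abstract SDT of Figure 1.9b (suffix msg(p) ack(p)).
abstract = Node("msg", [
    ("p == r1", lambda p, r: p == r["r1"], Node("ack", [
        ("p == r1 + 1", lambda p, r: p == r["r1"] + 1, Leaf(True)),
        ("p != r1 + 1", lambda p, r: p != r["r1"] + 1, Leaf(False)),
    ])),
    ("p != r1", lambda p, r: p != r["r1"], Node("ack", [
        ("true", lambda p, r: True, Leaf(False)),
    ])),
])

regs = {"r1": 10}  # prefix connect ack(10)
print(accepts(abstract, [("msg", 10), ("ack", 11)], regs))  # True
print(accepts(abstract, [("msg", 10), ("ack", 12)], regs))  # False
print(accepts(abstract, [("msg", 20), ("ack", 21)], regs))  # False
```

Because the guards of sibling branches are mutually exclusive and cover all values, exactly one branch applies at each step.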
Figure 1.9: SDTs for prefix connect ack(10) and suffix msg(p) ack(p); r1 refers to the first parameter in the prefix. (a) Maximally refined SDT: msg(p) branches on p == r1, p == r1 + 1 and p != r1 ∧ p != r1 + 1. (b) Maximally abstract SDT: msg(p) branches on p == r1 and p != r1. In both trees, the branch p == r1 is followed by ack(p) with guards p == r1 + 1 and p != r1 + 1, and every other msg(p) branch is followed by ack(p) with guard true.

To answer a tree query, a tree oracle as presented in Chapter 5 first generates a maximally refined tree, which explores all possible parameter configurations for the suffix given the relations. In our example, we consider equality and successor relations. Consequently, we have to explore the cases in which a suffix parameter is equal to, the successor of, or different from previous parameters. This requires executing three tests, which may result in the concrete traces:
connect() ack(10) msg(10) ack(11) (equal)
connect() ack(10) msg(11) nok() (successor)
connect() ack(10) msg(20) nok() (different)
Note that only the first trace ends in ack and thus matches our suffix. The others do not, so the cases they encode lead to rejecting states in the tree. In the matching trace, the output value of the final ack is the successor of the value of the previous ack. Because the SUL is deterministic, this automatically invalidates similar traces whose last ack carries a value other than this successor.
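The three tests and their interpretation can be sketched as follows. The function `toy_sul` is a hypothetical stand-in for Protocol B that we made consistent with the three traces above (it is not the protocol’s actual definition), and `refined_msg_branches` is likewise hypothetical; the choice of r1 + 10 as the "different" value is arbitrary, mirroring msg(20) in the third trace.

```python
from typing import Dict, List, Optional, Tuple

Action = Tuple[str, Tuple[int, ...]]


def toy_sul(inputs: List[Action]) -> List[Action]:
    """Hypothetical stand-in for Protocol B, consistent with the three traces above:
    connect() is answered with ack(10); msg(p) is answered with ack(p + 1) if p equals
    the last acknowledged value, and with nok() otherwise."""
    outputs: List[Action] = []
    last_ack: Optional[int] = None
    for name, params in inputs:
        if name == "connect":
            last_ack = 10
            outputs.append(("ack", (last_ack,)))
        elif name == "msg" and last_ack is not None and params[0] == last_ack:
            last_ack = params[0] + 1
            outputs.append(("ack", (last_ack,)))
        else:
            outputs.append(("nok", ()))
    return outputs


def refined_msg_branches() -> Dict[str, Optional[int]]:
    """Run one test per case for the parameter of msg (equal, successor, different).

    Maps each refined guard to the only accepting value of ack's parameter, or to
    None if the whole branch is rejecting."""
    prefix: List[Action] = [("connect", ())]
    r1 = toy_sul(prefix)[-1][1][0]   # parameter of ack in the prefix
    fresh = r1 + 10                  # hypothetical pick for "different from r1 and r1 + 1"
    cases = {"p == r1": r1, "p == r1 + 1": r1 + 1, "p != r1 and p != r1 + 1": fresh}
    branches: Dict[str, Optional[int]] = {}
    for guard, value in cases.items():
        name, params = toy_sul(prefix + [("msg", (value,))])[-1]
        # Only an ack output matches the suffix msg(p) ack(p); by determinism, the
        # observed value is the only accepting one and every other value is rejected.
        branches[guard] = params[0] if name == "ack" else None
    return branches


print(refined_msg_branches())
# {'p == r1': 11, 'p == r1 + 1': None, 'p != r1 and p != r1 + 1': None}
```

The accepting value 11 equals r1 + 1, which the oracle records symbolically as the guard p == r1 + 1 on ack.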
Once it has built a maximally refined tree, the oracle compresses it into an equivalent maximally abstract tree by merging equivalent subtrees and their respective branches, and returns this tree as its answer. This compression step is needed to ensure that learning converges and to produce compact models. Notice that the SDT shown in Figure 1.9a is maximally refined only in terms of its input parameters; in terms of its output parameters it is already maximally abstract. This was done to ease exposition, and also because producing maximally abstract subtrees for output parameters is greatly simplified by the determinism requirement: at most one refined output branch can lead to an accepting state, while all others necessarily lead to rejecting states and can therefore simply be merged.
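The compression step can be sketched as a bottom-up pass that merges sibling branches whose (already compressed) subtrees are equal, taking the union of their guards. This is a simplified illustration under our own assumptions: guards are encoded as sets of refined cases, and structural equality stands in for SDT equivalence; `compress` and the case names are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple, Union

# Hypothetical encoding: a leaf is True (accepting) or False (rejecting); a node is
# (action, branches), each branch pairing a guard with a subtree. A guard is the set
# of refined cases it covers: "r1" (p == r1), "r1+1" (p == r1 + 1), "other".
Guard = FrozenSet[str]
Tree = Union[bool, Tuple[str, Tuple[Tuple[Guard, "Tree"], ...]]]
ALL: Guard = frozenset({"r1", "r1+1", "other"})


def compress(tree: Tree) -> Tree:
    """Compress children first, then merge sibling branches whose subtrees are equal.

    Structural equality stands in for SDT equivalence here; merging two branches
    takes the union (disjunction) of their guards."""
    if isinstance(tree, bool):
        return tree
    action, branches = tree
    merged: Dict[Tree, Guard] = {}
    for guard, sub in branches:
        sub = compress(sub)
        merged[sub] = merged.get(sub, frozenset()) | guard
    return (action, tuple((guard, sub) for sub, guard in merged.items()))


# Figure 1.9a: refined on msg's parameter, already abstract on ack's parameter.
ack_after_equal = ("ack", ((frozenset({"r1+1"}), True), (ALL - {"r1+1"}, False)))
ack_rejecting = ("ack", ((ALL, False),))
refined = ("msg", (
    (frozenset({"r1"}), ack_after_equal),
    (frozenset({"r1+1"}), ack_rejecting),
    (frozenset({"other"}), ack_rejecting),
))

abstract = compress(refined)
# The two rejecting msg branches merge into one whose guard covers {"r1+1", "other"},
# i.e. p != r1, as in Figure 1.9b.
print(len(abstract[1]))  # 2
```

Translating a merged set of cases back into a compact guard (here, the complement of p == r1 is p != r1) is a separate simplification step that the sketch does not perform.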
Contribution In Chapter 5, we extend RALib and use it to generate models of, and check, the TCP client implementations of FreeBSD and Linux. This is the first practical case study involving an RA learner. We frame the case study within the learning-based testing framework introduced by Meinke [151], where learning is used as a means of building tests that are more likely to uncover problems. The case study produced detailed concrete models with data that also captured abnormal scenarios. It also led to the discovery of two new violations. Conducting experiments led to uncovering a bug whereby, while closing a data connection, the Linux TCP client processed and acknowledged certain invalid segments. Uncovering the bug was made possible by the exploratory tests that learning involves. The bug was acknowledged and subsequently fixed by the developers. Analyzing the models, we discovered another violation of the RFC, concerning the size of the TCP receive window, which was also acknowledged by the developers.
Getting RALib to the point where it could learn TCP involved several steps, some of which are detailed in the chapter. First, we provide an implementation of the tree oracle for a setting of equalities and inequalities over sums with constants (Protocol B would fit in such a setting). We then adapt the Determinizer concept developed in Chapter 4 to this setting and connect it to the framework. We also implement suffix optimizations for these relations to make the approach more scalable.
The concept of suffix optimization was introduced in [57] for a setting of equalities, but had not been implemented for our specific setting. This optimization involves annotating the symbolic suffixes obtained from counterexamples with the relations they capture within the counterexample. When processing tree queries with such a suffix, the tree oracle considers only these relations instead of all enabled relations, which reduces the number of tests needed to answer the tree query. To give a concrete example, in Figure 1.9, knowing that we only have to test the parameter of msg for equality (and not also for the successor relation) would reduce the number of tests from 3 to 2. The reduction becomes (much) more pronounced once we consider more relations, or suffixes and prefixes with more parameters.
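The effect of the annotation on the example of Figure 1.9 can be sketched as follows: the oracle generates one value for msg’s parameter per relation it must consider, plus one fresh value satisfying none of them. The helper `values_to_test` and the relation names are hypothetical, and the fresh value is chosen arbitrarily as 10 above the largest value already used.

```python
from typing import Dict, List


def values_to_test(r1: int, relations: List[str]) -> Dict[str, int]:
    """Hypothetical helper: one value for msg's parameter per relation to r1 that the
    tree oracle must consider, plus one fresh value that satisfies none of them."""
    candidates = {"equal": r1, "successor": r1 + 1}
    values = {rel: candidates[rel] for rel in relations}
    values["different"] = max(values.values(), default=r1) + 10
    return values


all_relations = ["equal", "successor"]    # every enabled relation
annotated = ["equal"]                     # relations captured by the counterexample
print(values_to_test(10, all_relations))  # {'equal': 10, 'successor': 11, 'different': 21}
print(values_to_test(10, annotated))      # {'equal': 10, 'different': 20}
```

Each suffix parameter contributes its own set of cases, so the savings per parameter compound as suffixes grow longer.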