Regular Model Checking

Our first application scenario for automata learning in the context of verification is Regular Model Checking. Regular Model Checking is a special type of Model Checking in which the program (or system) at hand is modeled symbolically in terms of finite automata. More precisely, configurations of a program are modeled as finite words, sets of configurations as regular languages, and the program’s semantics (i.e., its transitions) as a rational relation over pairs of configurations. A key motivation for modeling programs in terms of regular languages is that (deterministic) finite automata provide an efficient representation of large and infinite state spaces.

We here consider the verification of safety properties, as in the original Regular Model Checking framework [KMM+]. More precisely, we want to verify that a given program does not allow for an execution that starts in a set I of initial configurations and ends in a dedicated set B of bad configurations; the bad configurations describe conditions of the program that must not occur during its execution. However, this question is undecidable in general, and tools for Regular Model Checking are therefore necessarily incomplete (i.e., they give the correct answer on termination but are not guaranteed to halt). Nonetheless, good results of such algorithms have been reported for a large number of practical applications [Leg, BFL].

In general, verification techniques can broadly be classified into two categories: white-box techniques, which have complete access to the internals of the program in question, and black-box techniques, which are (largely) agnostic of the program and typically obtain their information from an external source.

The majority of existing tools and techniques for Regular Model Checking, such as Faster [BFL] and T(o)rmc [Leg], fall into the first category, as they directly manipulate the input-automata. A general advantage of white-box techniques is that they can use every bit of information available to reason about the given program. At the same time, however, complete knowledge of a (potentially complex) program can also be a disadvantage: building a verification procedure that takes the whole program into account is hard, simply because it has to handle the complex logic of the program. In the context of Regular Model Checking, for instance, the size of the input-automata (which provides a natural measure for the complexity of the program) is a prime factor for the actual performance of algorithms. In fact, the applicability of tools such as Faster and T(o)rmc is often seriously limited in practical applications by the size of the input-automata.

Footnote: We assume familiarity with the fundamentals of Model Checking and refer the reader to Baier and Katoen [BK] for an introduction to this topic.

Black-box techniques, on the other hand, offer an elegant solution to this problem as they avoid working directly on the given automata. Automata learning seems particularly useful in this context: since such techniques are agnostic to the complexity of the program at hand and obtain their information from sample executions or from an external information source, they typically produce simple solutions. Another promising feature is that automata learning algorithms are good at generalizing from little information, whereas white-box algorithms often struggle with the comprehensive knowledge they possess about the program. Moreover, the complexity of learning-based algorithms usually depends on the final result but not immediately on the input. Therefore, developing learning-based black-box techniques for Regular Model Checking seems worthwhile.

The objective of this chapter is to study how automata learning can advance the state of the art in Regular Model Checking. To this end, we develop several learning-based algorithms and contrast them with existing tools. Our algorithms are based upon an approach commonly used to verify safety properties: we search for an invariant of the program that contains at least the reachable configurations and is disjoint from the set of bad configurations. More precisely, we aim for a set Inv of program configurations that

• contains all of the initial configurations in I;
• contains none of the bad configurations in B; and
• is inductive (i.e., c ∈ Inv implies c′ ∈ Inv for all transitions (c, c′) of the program).

Such a set is in fact enough to prove a program correct because it witnesses that there is no way to reach a bad configuration from an initial configuration. Since program configurations are modeled as finite words, we use DFAs as representations of invariants. We introduce all necessary formalisms and notations in Section..
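As a small illustration of the three conditions above, the following sketch checks them for a system whose configurations and transitions are listed explicitly as finite Python sets. This is a simplifying assumption made only for the example: the chapter works with regular languages and DFAs, and the function name `is_invariant` and the token-passing toy system are hypothetical.

```python
def is_invariant(inv, initial, bad, transitions):
    """Check the three invariant conditions on explicitly listed sets.

    inv, initial, bad: sets of configurations (here: strings).
    transitions: set of pairs (c, c_next) of configurations.
    """
    contains_initial = initial <= inv          # all of I is in Inv
    excludes_bad = not (inv & bad)             # Inv and B are disjoint
    inductive = all(c_next in inv              # Inv is closed under transitions
                    for c, c_next in transitions if c in inv)
    return contains_initial and excludes_bad and inductive


# Toy system: a single token ("1") moves to the right in a word of length 3.
initial = {"100"}
bad = {"000"}                                  # the token has been lost
transitions = {("100", "010"), ("010", "001")}

good = is_invariant({"100", "010", "001"}, initial, bad, transitions)
not_inductive = is_invariant({"100", "010"}, initial, bad, transitions)
print(good, not_inductive)
```

The second candidate fails only the inductivity condition: it contains "010" but not its successor "001".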

The pivotal idea of the algorithms presented next is to separate the program to verify from the actual invariant synthesis algorithm and use Angluin’s learning framework as a standard protocol for data exchange: a membership query corresponds to the question whether a configuration is to be included in an invariant, whereas an equivalence query corresponds to the question whether a conjecture accepts an invariant. In the context of Regular Model Checking, however, we cannot simply apply standard learning algorithms because we cannot build a teacher who has precise knowledge about an invariant (as this would entail solving the Regular Model Checking problem beforehand). The situation becomes even more intricate because there might exist several valid invariants, which means that the teacher’s target concept is no longer unique. Indeed, the lack of a precise teacher for invariants is a great challenge, and we have to carefully design new learning settings to be able to apply automata learning techniques.

In the course of this chapter, we develop a family of three different types of algorithms for computing invariants in Regular Model Checking. In Section., we start with a white-box algorithm. Although this algorithm is not yet based on automata learning, it serves as a basis for the subsequent learning-based algorithms. In addition, it is of independent interest in that it avoids many difficulties that existing white-box techniques have to deal with. Subsequently, in Section., we develop two algorithms that combine active and passive learning. These algorithms obtain their information about the initial and bad configurations from a teacher, but they still need access to the transducer of the program. Thus, one might think of these algorithms as halfway between white-box and black-box approaches. Finally, in Section., we modify the algorithms of Section. to operate completely in a black-box fashion. The algorithms of Section. obtain all of their information solely from a teacher, whose task it is to reason about the program. Thus, these algorithms can also be applied in situations in which the program cannot be modeled in terms of regular languages, the program is given as a black box, or the input-automata are too large for white-box techniques.

The remainder of this introduction gives an overview of each type of algorithm and concludes with a summary of topics also covered in this chapter.

A white-box algorithm. Our first algorithm works in a complete white-box fashion: it takes two NFAs accepting the sets of initial and bad configurations as well as an asynchronous transducer as input and produces a DFA accepting an invariant as output. Internally, the algorithm constructs logic formulas that postulate the existence of such a DFA and delegates their satisfiability checks to an underlying logic solver. More precisely, our algorithm creates and solves a sequence of logic formulas ϕ_n, where n ∈ N+, that depend on the input automata and have the following two properties: first, ϕ_n is satisfiable if and only if there exists a DFA with n states that accepts an invariant; second, a model of ϕ_n acts as a blueprint to construct such a DFA. Starting with n = 1 and increasing n until ϕ_n becomes satisfiable guarantees that the algorithm finds a smallest DFA accepting an invariant (in terms of the number of states), provided that one exists.
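The search for a smallest DFA can be sketched as follows. This is not the solver-based algorithm described above: as a loudly stated assumption, the satisfiability check of ϕ_n is replaced by brute-force enumeration of all DFAs with n states, the sets I and B are given as DFAs rather than NFAs, and the transducer is assumed to be length-preserving rather than asynchronous. All names (`smallest_invariant`, `is_invariant`, `TOKEN_MOVE`, and so on) and the token-passing example are hypothetical.

```python
from itertools import product

SIGMA = "01"  # alphabet of the toy example (assumption)

# A DFA is (delta, accepting) with delta[(state, letter)] -> state; state 0 is initial.
INIT = ({(0, "1"): 1, (0, "0"): 2, (1, "0"): 1, (1, "1"): 2,
         (2, "0"): 2, (2, "1"): 2}, {1})                            # language 10*
BAD = ({(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 1}, {0})   # language 0*
# Transducer: edges (state, read, write, next_state), initial state 0, final states.
# It rewrites exactly one occurrence of "10" into "01" (the token moves right).
TOKEN_MOVE = ([(0, "0", "0", 0), (0, "1", "1", 0), (0, "1", "0", 1),
               (1, "0", "1", 2), (2, "0", "0", 2), (2, "1", "1", 2)], {2})


def run(dfa, word):
    """Return True if the DFA accepts the word."""
    delta, accepting = dfa
    state = 0
    for letter in word:
        state = delta[state, letter]
    return state in accepting


def reachable(start, successors):
    """All nodes reachable from start in the graph given by successors."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in successors(node):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_invariant(cand, init, bad, trans):
    """Exact check of the three invariant conditions via product automata."""
    (d, acc), (di, ai), (db, ab), (edges, final) = cand, init, bad, trans
    with_init = reachable((0, 0), lambda s: [(di[s[0], a], d[s[1], a]) for a in SIGMA])
    if any(p in ai and q not in acc for p, q in with_init):
        return False                      # some initial configuration is rejected
    with_bad = reachable((0, 0), lambda s: [(db[s[0], a], d[s[1], a]) for a in SIGMA])
    if any(p in ab and q in acc for p, q in with_bad):
        return False                      # some bad configuration is accepted
    steps = reachable((0, 0, 0), lambda s: [(t2, d[s[1], a], d[s[2], b])
                                            for t1, a, b, t2 in edges if t1 == s[0]])
    # Inductive iff no transition leads from an accepted to a rejected word.
    return not any(t in final and q1 in acc and q2 not in acc for t, q1, q2 in steps)


def smallest_invariant(init, bad, trans, max_states=4):
    """Try n = 1, 2, ... and return (n, dfa) for the first n that admits an invariant."""
    for n in range(1, max_states + 1):
        keys = [(q, a) for q in range(n) for a in SIGMA]
        for targets in product(range(n), repeat=len(keys)):
            delta = dict(zip(keys, targets))
            for bits in product([False, True], repeat=n):
                accepting = {q for q in range(n) if bits[q]}
                if is_invariant((delta, accepting), init, bad, trans):
                    return n, (delta, accepting)
    return None


n, dfa = smallest_invariant(INIT, BAD, TOKEN_MOVE)
print(n, dfa)
```

Because the candidates are tried in order of increasing n, the first DFA found has the minimal number of states. Enumeration is exponential in n and only serves to make the search loop concrete; the chapter delegates this step to a logic solver.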

We implement the formulas ϕ_n in two different logics, which already proved their efficacy in the context of passive learning (cf. Section .): Propositional Boolean Logic and the logic of uninterpreted functions over the naturals. Since mature solvers for both logics are available, the hope is (and we substantiate this hope experimentally) that a solver-based algorithm also proves its effectiveness in the context of Regular Model Checking.

Although the algorithm sketched above is white-box and does not use automata learning, it still offers an advantage over existing tools in that it avoids manipulating the input-automata. Broadly speaking, tools such as T(o)rmc and Faster create a sequence of DFAs, starting with a DFA accepting the set of initial configurations, by applying the transition relation until a DFA accepts an inductive set that is disjoint from the set of bad configurations. Both tools use abstraction techniques during their computation (acceleration in the case of Faster and extrapolation in the case of T(o)rmc) in order to make the sequence of DFAs converge in finite time. However, a serious drawback of such approaches is that they often produce huge intermediate results (although T(o)rmc tries to reduce their size during the computation), even if their final result itself is small. In contrast, our solver-based algorithm uses a highly optimized logic solver to handle expensive calculations (our experiments show that the solvers used performed well on the resulting formulas). Moreover, our algorithm is deliberately designed to search for a smallest invariant, which helps to keep the logic formulas and the effort to solve them as small as possible.

Semi-black-box algorithms. Our semi-black-box algorithms learn an invariant in interaction with a teacher. The underlying idea is to abstract from the exact sets of initial and bad configurations by sampling them. More precisely, our algorithms maintain a sample S = (S+, S−) that consists of a finite set S+ approximating the initial configurations and a finite set S− approximating the bad configurations. In every iteration, the algorithms compute a DFA that is consistent with S and inductive with respect to the transducer by means of a logic solver (which involves encoding the transducer into the logic formula as in the case of the white-box algorithm). The resulting DFA contains at least the configurations reachable from S+ via the transitions defined by the transducer and does not contain any configuration in S−. If all initial configurations and no bad configurations are contained, the DFA accepts an invariant. If this is not the case, the respective approximation needs to be refined.

Footnote: We describe the functioning of both Faster and T(o)rmc in more detail in the section about related work.



The actual learning takes place in an Angluin-style active learning setting. More precisely, we assume a setting in which the teacher answers queries as described next.

Membership query: On a membership query, the teacher returns whether the configuration in question is an initial configuration (the teacher returns “yes”) or whether it is a bad configuration (the teacher returns “no”). If the configuration is neither initial nor bad, the teacher does not know whether an invariant includes or excludes the configuration (as the problem is learning an unknown invariant) and answers “don’t know”.

Equivalence query: On an equivalence query, the teacher only permits conjectures that accept an inductive set. Then, checking whether the conjecture accepts an invariant amounts to checking whether it classifies the sets of initial and bad configurations correctly. Should that not be the case, the teacher can easily identify a counterexample.

We call such teachers incomplete because they can only provide incomplete information about an invariant.
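The two query types of an incomplete teacher can be sketched as follows. The sketch makes several assumptions that are not part of the text: the sets I and B are given as Python predicates, conjectures are passed as predicates rather than DFAs, and the equivalence query only inspects words up to a fixed length, so a "no counterexample" answer holds within that bound only. The class name `IncompleteTeacher` and the example languages 10* and 0* are hypothetical.

```python
import re
from itertools import product


class IncompleteTeacher:
    """Teacher that knows I and B but not an invariant (bounded sketch)."""

    def __init__(self, is_initial, is_bad, alphabet="01", max_len=6):
        self.is_initial = is_initial
        self.is_bad = is_bad
        self.alphabet = alphabet
        self.max_len = max_len      # assumption: bounded search instead of automata

    def membership(self, word):
        if self.is_initial(word):
            return "yes"            # every invariant must contain the word
        if self.is_bad(word):
            return "no"             # no invariant may contain the word
        return "don't know"         # invariants may differ on this word

    def equivalence(self, accepts):
        """Conjecture is assumed inductive; only I and B are checked.

        Returns None if no misclassified word is found within the bound,
        else a pair (word, should_be_accepted).
        """
        for k in range(self.max_len + 1):
            for letters in product(self.alphabet, repeat=k):
                word = "".join(letters)
                if self.is_initial(word) and not accepts(word):
                    return word, True
                if self.is_bad(word) and accepts(word):
                    return word, False
        return None


teacher = IncompleteTeacher(
    is_initial=lambda w: re.fullmatch("10*", w) is not None,   # I = 10*
    is_bad=lambda w: re.fullmatch("0*", w) is not None,        # B = 0*
)
answers = [teacher.membership(w) for w in ("100", "000", "010")]
print(answers)
```

The word "010" is neither initial nor bad, so the teacher cannot commit to an answer, which is exactly the situation a learner in this setting has to cope with.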

We propose two learners capable of learning from incomplete teachers (i.e., learners that can handle “don’t know” answers and produce conjectures accepting inductive sets), which differ in the strategy to sample and refine sets of program configurations. The first learner follows the idea of the CEGAR framework [CGJ+]: if the abstraction of either the initial or the bad configurations is too coarse and a conjecture does not accept an invariant, the teacher returns a counterexample and the learner refines the abstraction accordingly. The second learner follows a more elaborate procedure based on Angluin’s learning algorithm, where additional membership queries ask whether individual configurations belong to an invariant. These queries refine the abstraction further and remove the need to generate a new conjecture in every step. To complete our semi-black-box approach, we describe by way of example how to construct an incomplete teacher, assuming that the program at hand is given in terms of finite automata. Note, however, that the semi-black-box setting does not prescribe in which form the sets of initial and bad configurations have to be given but only requires an incomplete teacher to work.

Black-box algorithms Finally, we turn the incomplete teacher setting from above into an Angluin-style black-box learning setting. The learner is now completely agnostic of the program being verified and obtains all information exclusively from a teacher, who reasons about the program. Therefore, the learner’s objective becomes to build a conjecture that satisfies the teacher’s demands.

However, this learning setting comes with an inherent problem. Consider an equivalence query with a DFA accepting a set P of program configurations; if P is not inductive, say due to a transition (c, c′) with c ∈ P and c′ ∉ P, then the teacher is stuck: since he does not know an invariant, the teacher has no information about whether c′ should be included in or c should be excluded from a conjecture. In current state-of-the-art learning algorithms for invariant synthesis [CGP, AMNa, ACMN], the teacher cheats: whenever a conjecture is not inductive, he makes an arbitrary choice in the hope that this choice will still result in an invariant. However, this makes the setting nonrobust, “causing divergence, blocking the learner from learning the simplest concepts, and introducing arbitrary bias that is very hard to control” [GLMNa, Page].

As a solution to this problem, we recently proposed the so-called ICE-learning framework (learning via implications, counterexamples, and examples) [GLMNa, GLMN], which extends Angluin’s learning setting with implication counterexamples: if the DFA conjectured on an equivalence query accepts a noninductive set P, the teacher returns a pair of program configurations witnessing a violation of inductivity (i.e., he returns a pair (c, c′) such that c ∈ P, c′ ∉ P, and c′ is reachable from c via a transition of the program). This way, a teacher can precisely communicate why a conjecture is not inductive even without knowing an invariant.

The exact learning setting we use for Regular Model Checking is a combination of the incomplete teacher setting and the ICE-learning framework. A teacher for this black-box setting answers queries as follows.

Membership query: On a membership query, the teacher works in the same way as the incomplete teacher and returns “yes”, “no”, or “don’t know”.

Equivalence query: Equivalence queries are no longer restricted. Given a conjecture, the teacher returns “yes” if it accepts an invariant, a (classical) counterexample if it classifies the initial or bad configurations incorrectly, or an implication-counterexample if the conjecture’s language is not inductive.
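The three possible answers to an unrestricted equivalence query can be sketched as follows. As in any such toy version, several assumptions are made that the text does not prescribe: conjectures, I, and B are Python predicates, the program is a successor function on words, classical counterexamples are searched before implication counterexamples, and only words up to a fixed length are inspected (so "yes" holds within that bound only). The names `ice_equivalence` and `token_moves` are hypothetical.

```python
import re
from itertools import product


def words(alphabet, max_len):
    """All words over the alphabet up to the given length, shortest first."""
    for k in range(max_len + 1):
        for letters in product(alphabet, repeat=k):
            yield "".join(letters)


def token_moves(word):
    """Toy program: rewrite one occurrence of "10" into "01"."""
    return [word[:i] + "01" + word[i + 2:]
            for i in range(len(word) - 1) if word[i:i + 2] == "10"]


def ice_equivalence(accepts, is_initial, is_bad, successors,
                    alphabet="01", max_len=6):
    """Bounded sketch of an ICE teacher's answer to an equivalence query."""
    for w in words(alphabet, max_len):
        if is_initial(w) and not accepts(w):
            return ("positive", w)             # initial configuration rejected
        if is_bad(w) and accepts(w):
            return ("negative", w)             # bad configuration accepted
    for c in words(alphabet, max_len):
        if accepts(c):
            for c_next in successors(c):
                if not accepts(c_next):
                    return ("implication", c, c_next)   # inductivity violated
    return ("yes",)


def is_initial(w):
    return re.fullmatch("10*", w) is not None   # I = 10*


def is_bad(w):
    return re.fullmatch("0*", w) is not None    # B = 0*


starts_with_one = ice_equivalence(lambda w: w.startswith("1"),
                                  is_initial, is_bad, token_moves)
odd_ones = ice_equivalence(lambda w: w.count("1") % 2 == 1,
                           is_initial, is_bad, token_moves)
print(starts_with_one, odd_ones)
```

The conjecture "starts with 1" classifies I and B correctly but is not inductive, so the teacher can report the pair ("10", "01") without deciding which of the two words belongs to an invariant.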

This new learning setting requires us to construct a learning algorithm that can handle both “don’t know” answers and implication-counterexamples. To this end, we adapt our semi-black-box algorithms to encode implications rather than the program’s transducer in the logic formulas. Moreover, we describe by way of example how to build an appropriate teacher, again assuming that the program at hand is given in terms of finite automata. As in the case of the semi-black-box setting, the black-box setting does not prescribe in which form the program has to be given but only requires an ICE-teacher to work.

Evaluation and further applications Based on a prototype implementation, we compare our algorithms with Faster and T(o)rmc. To this end, we use two benchmark
