
Quantified Invariants of Linear Data Structures

5.2 Learning Quantified Data Automata


Our goal in this section is to learn QDAs using existing active learning algorithms, such as Angluin’s algorithm, which was developed to infer the canonical DFA for a regular language. Therefore, we begin this section by analyzing the notion of canonical automata for QDAs. The result of this analysis then allows us to reduce the learning of QDAs to the learning of Moore machines.

5.2.1 Canonical QDAs

Recall that QDAs define two kinds of languages, namely a data word language and a valuation word language. We begin by observing that we cannot hope for a unique minimal QDA on the level of data words.

To see why, consider the QDA A depicted in the figure below over PV = ∅ and Y = {y1, y2}. It accepts all valuation words in which

• d(y1) ≤ d(y2) if y1 occurs before y2 and y1, y2 are both on even positions;

• y2 < y1; or

• at least one of y1 and y2 does not occur at an even position.

Hence, A accepts the data word language that consists of all data words for which the data at even positions is sorted. Since each QDA has to ensure that each variable occurs exactly once, the number of states of A is minimal for defining this language of data words.

However, a QDA in which we replace the transition δ(q6, b) = q5 by the transition δ(q6, b) = q1 accepts the same language of data words. This new QDA checks the sortedness only for all y1, y2 with y2 = y1 + 2, which is sufficient. This shows that the transition structure of a state-minimal QDA for a given language of data words is not unique.
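The claim that checking only neighboring even positions (y2 = y1 + 2) is sufficient can be sanity-checked by brute force. The following Python sketch is an illustration only and not part of the original construction: it works directly on data words (lists of values) rather than on QDAs, and the function names, the small data domain, and the length bound are assumptions made for the example.

```python
from itertools import product


def even_positions(data):
    """0-based indices of the cells at even positions (positions are 1-based)."""
    return [i for i in range(len(data)) if (i + 1) % 2 == 0]


def sorted_all_pairs(data):
    """d(y1) <= d(y2) for every pair of even positions with y1 before y2."""
    even = even_positions(data)
    return all(data[i] <= data[j] for i in even for j in even if i < j)


def sorted_adjacent_pairs(data):
    """d(y1) <= d(y2) only for even positions with y2 = y1 + 2."""
    return all(data[i] <= data[i + 2]
               for i in even_positions(data) if i + 2 < len(data))


def checks_agree(max_len=6, domain=(0, 1, 2)):
    """Compare both checks on all data words up to max_len over a small domain."""
    return all(sorted_all_pairs(w) == sorted_adjacent_pairs(w)
               for n in range(max_len + 1)
               for w in product(domain, repeat=n))
```

Both checks describe the same set of data words because ≤ is transitive, which is why two QDAs with different transition structures can define the same data word language.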

On the level of valuation words, on the other hand, we can show the existence of canonical minimal automata.

Theorem.. For each QDA A there is a unique minimal QDA Aminthat accepts the same

set of valuation words.

An intuitive explanation is that the automaton model is deterministic and, since all universally quantified variables are in different positions, a QDA cannot derive any information about the relation between the data during its run. Let us prove this theorem formally.

[Figure: state diagram of the QDA A with states q0 to q7; one state is labeled true, one state is labeled d(y1) ≤ d(y2), and the transitions are labeled with b, (b, y1), and (b, y2).]

Figure. A QDA expressing the property over lists that the data on even positions is sorted. Missing transitions lead to a sink state labeled with false, which is not shown for the sake of readability. All states drawn as a single circle are implicitly labeled with the formula false.

Proof of Theorem.. Consider a language Lval(Π × D)of valuation words that can be accepted by a QDA, and let w ∈ Π∗ be a symbolic word. Then, there has to be a formula ϕwin the lattice that precisely characterizes all valuation words v ∈ Lvalthat

extend w with data (i.e., that satisfy sw(v) = w). Since we assume all formulas in the lattice to be pairwise nonequivalent, ϕwis uniquely determined. One obtains ϕwby

considering for each valuation word v with sw(v) = w the greatest lower bound ϕv of all formulas in the lattice that v satisfies and then taking the least upper bound of all these ϕv.

In fact, the formula ϕ_w is independent of a particular QDA accepting L_val. To see why, consider two QDAs, say A and A′, that both accept L_val. In addition, assume that the QDA A reaches state q on reading w and that the QDA A′ reaches state q′ on reading w. In this situation, both q and q′ have to be labeled with the same data formula because A and A′ would otherwise accept different languages of valuation words. This proves that ϕ_w is unique and only depends on the given language of valuation words.

Thus, a language of valuation words can be seen as a function that assigns a formula to every symbolic word, and one can think of a QDA as a Moore machine that computes this function. Moreover, for each Moore machine, there exists a unique minimal Moore machine that computes the same function (e.g., see Kohavi [Koh]). This proves the theorem.
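The uniqueness argument rests on the standard minimization of Moore machines. As a minimal sketch, assuming a complete transition function given as a dictionary, the following Python code computes the minimal machine by partition refinement: states start out grouped by their output and are split until no two states in a group disagree on the group of a successor. The function name, the machine encoding, and the toy machine (whose outputs are written as formula strings) are assumptions made for this illustration.

```python
def minimize_moore(alphabet, delta, output, initial):
    """Return (initial, delta, output) of the minimal equivalent Moore machine.

    delta maps (state, letter) to a state and must be total on reachable states;
    output maps each state to its output symbol.
    """
    # Collect the reachable states in discovery order.
    order, seen = [initial], {initial}
    for q in order:
        for a in alphabet:
            p = delta[q, a]
            if p not in seen:
                seen.add(p)
                order.append(p)

    def renumber(keys):
        # Give every distinct key a small integer block id.
        ids = {}
        return {q: ids.setdefault(keys[q], len(ids)) for q in order}

    block = renumber(output)  # initial partition: group states by output
    while True:
        signature = {q: (block[q], tuple(block[delta[q, a]] for a in alphabet))
                     for q in order}
        refined = renumber(signature)
        if len(set(refined.values())) == len(set(block.values())):
            break  # no block was split, so the partition is stable
        block = refined

    min_delta = {(block[q], a): block[delta[q, a]] for q in order for a in alphabet}
    min_output = {block[q]: output[q] for q in order}
    return block[initial], min_delta, min_output


# Toy machine: states 1 and 2 carry the same output and behave identically.
toy_delta = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 3, (1, "b"): 3,
             (2, "a"): 3, (2, "b"): 3, (3, "a"): 3, (3, "b"): 3}
toy_output = {0: "true", 1: "d(y1) <= d(y2)", 2: "d(y1) <= d(y2)", 3: "false"}
min_init, min_delta, min_output = minimize_moore("ab", toy_delta, toy_output, 0)
```

In the toy machine, states 1 and 2 are merged, so the result has three states. Comparing outputs by equality matches the setting of the proof only because the formulas of the lattice are assumed to be pairwise nonequivalent.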


5.2.2 Learning QDAs by Learning Moore Machines

The proof of Theorem. suggests viewing QDAs as Moore machines, and our goal is to use existing learning algorithms for Moore machines to learn QDAs. To this end, we separate the structure of valuation words (i.e., the length of the words, the cells the pointer variables point to, and so on) from the data contained in the words. We do so by introducing what we callformula words.

Definition. (Formula word). Let PV be a finite set of pointer variables, Y a finite, nonempty set of universally quantified variables, and F a lattice over a finite set F of formulas. Aformula word is a pair (w, ϕ) ∈ (Π∗×F) consisting of a symbolic word

w ∈ Π∗(as before, Σ = 2PVand Π = Σ × (Y ∪ {−})) and a data formula ϕ ∈ F.

Note that a formula word does not contain elements of the data domain; it simply consists of the symbolic word that depicts the pointers into the list (modeled using Σ), a valuation for the quantified variables (modeled using Y ∪ {−}), as well as a formula over the data domain. Hence, a formula word represents a set of valuation words, namely those that extend its symbolic word with data and whose data component satisfies the data formula.

Example.. The formula word 

({head}, y1)b(b, y2)({tail}, −), d(y1) ≤ d(y2)

represents thathead and y1point to the first cell of a list, y2points to the third cell,

andtail points to the last cell; the data formula is d(y1) ≤ d(y2). /

A QDA A = (Q, Π, q0, δ, f) over the set F of data formulas accepts a formula word (w, ϕ) ∈ Π∗ × F if A reaches a state q ∈ Q on reading the symbolic word w and f(q) = ϕ. Given a QDA A, we define the language L_for(A) ⊆ Π∗ × F of formula words accepted by A in the usual way. Moreover, we call a language L ⊆ Π∗ × F of formula words QDA-acceptable if there exists a QDA A with L_for(A) = L.
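The acceptance condition for formula words can be made concrete with a small sketch. The encoding below is an assumption made for illustration: a letter of Π is a pair of a frozenset of pointer variables and a variable name (or "-"), formulas are plain strings compared syntactically, and missing transitions lead to an implicit sink labeled false. The toy automaton is hypothetical and is not the QDA from the figure.

```python
from dataclasses import dataclass

B = (frozenset(), "-")  # the blank symbol b: no pointer variable, no quantified variable


@dataclass
class QDA:
    initial: str
    delta: dict         # maps (state, letter) to a state; may be partial
    formula: dict       # the labeling f; unlabeled states are treated as "false"
    sink: str = "sink"  # implicit target of all missing transitions

    def run(self, word):
        """Return the state reached on reading the symbolic word."""
        q = self.initial
        for letter in word:
            q = self.delta.get((q, letter), self.sink)
        return q

    def formula_of(self, word):
        """Return the data formula f(q) of the state q reached on the word."""
        return self.formula.get(self.run(word), "false")

    def accepts(self, word, phi):
        """A formula word (w, phi) is accepted if the state reached on w is labeled phi."""
        return self.formula_of(word) == phi


# The symbolic word of the example: ({head}, y1) b (b, y2) ({tail}, -).
HEAD_Y1 = (frozenset({"head"}), "y1")
Y2 = (frozenset(), "y2")
TAIL = (frozenset({"tail"}), "-")
word = [HEAD_Y1, B, Y2, TAIL]

toy = QDA(
    initial="q0",
    delta={("q0", HEAD_Y1): "q1", ("q1", B): "q1", ("q1", Y2): "q2",
           ("q2", B): "q2", ("q2", TAIL): "q3"},
    formula={"q3": "d(y1) <= d(y2)"},
)
```

With these definitions, `toy.accepts(word, "d(y1) <= d(y2)")` holds, whereas the same symbolic word paired with any other formula is rejected, since every symbolic word is assigned exactly one formula.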

Note that not every language of formula words is QDA-acceptable; for instance, consider the language

L⋆_for = {(b^i (p, y) b^i, true) | i ≥ 1}.

A standard pumping argument shows that L⋆_for cannot be accepted by a QDA since the number of blanks at the beginning and at the end of a word has to match. Furthermore, words whose symbolic component is not of the form b^i (p, y) b^i are not present in L⋆_for, but a QDA necessarily assigns a unique formula to every symbolic word.

In fact, every QDA-acceptable language L_for of formula words has to fulfill at least the following constraints:

• For every symbolic word w ∈ Π∗, there exists a formula ϕ such that (w, ϕ) ∈ L_for.

• If (w, ϕ) ∈ L_for and (w, ϕ′) ∈ L_for, then ϕ = ϕ′.

• The number of different formulas occurring in formula words in L_for is finite.

These constraints allow us to treat QDAs as Moore machines that read symbolic words and output data formulas. In fact, a QDA-acceptable language L_for ⊆ Π∗ × F is an alternative representation of a Moore machine-computable mapping f : Π∗ → F. One easily deduces that two QDAs (over the same lattice of formulas) that accept the same set of valuation words also accept the same set of formula words (assuming that all formulas in the lattice are pairwise nonequivalent). Thus, we can easily reduce the problem of learning QDAs to the problem of learning Moore machines. Note that we intentionally do not view a QDA as a device that computes a mapping but as a device that accepts a language. We do this to ease the description in later sections.

Before we describe how to reduce the learning of QDAs to the learning of Moore machines, let us briefly discuss the latter.

Actively Learning Moore Machines

In the context of actively learning Moore machines, the target concept is a Moore machine-computable function f : Σ∗ → Γ that maps each word u over the input alphabet Σ to an output f(u) taken from an output alphabet Γ. Note that we obtain Angluin’s original setting by letting Γ = {0, 1}.

Given a Moore machine-computable function f : Σ∗→ Γ, a teacher for f answers queries as follows.

Membership query On a membership query with a word u ∈ Σ∗, the teacher returns the function value f (u).

Equivalence query On an equivalence query with a Moore machine M, the teacher checks whether f_M = f is satisfied. If this is the case, he returns “yes”. If this is not the case, he returns a counterexample u ∈ Σ∗ with f(u) ≠ f_M(u).

Note that the learner and teacher do not need to agree a priori on the output alphabet since the learner can obtain this knowledge through membership queries.

One can straightforwardly adapt observation table-based learning algorithms, such as Angluin’s algorithm and Rivest and Schapire’s algorithm, to learn Moore machines. The idea is to lift the Nerode congruence to Moore machine-computable functions f : Σ∗ → Γ by defining

u ∼_f v if and only if ∀w ∈ Σ∗ : f(uw) = f(vw),

where u, v ∈ Σ∗. Then, it is indeed enough to adapt the mapping T of an observation table and the way conjectures are generated.

More precisely, one changes the mapping T of an observation table O = (R, S, T) to a mapping T : (R ∪ R · Σ) · S → Γ. Moreover, one no longer produces a DFA as a conjecture but a Moore machine M = (Q, Σ, Γ, q0, δ, λ) whose output function is defined by λ([[u]]_O) = T(u) where u ∈ R. Everything else (i.e., the notion of O-equivalence, the notions of closedness and consistency, and the functioning of the algorithm) is left unchanged. Chen et al. [CFC+] demonstrate this adaptation for the case |Γ| = 3.
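To illustrate the adaptation, the following Python sketch learns a Moore machine from an observation table whose entries range over an arbitrary output alphabet Γ. It is a simplified variant and not Angluin’s algorithm verbatim: counterexamples are handled in the style of Maler and Pnueli by adding all their suffixes to S, which keeps the rows of R pairwise distinct, so only closedness has to be checked. The teacher is also an assumption of this example: it approximates equivalence queries by testing all words up to a fixed length, which is sufficient for the small target used here but is not an exact check in general.

```python
from itertools import product


def run(hyp, word):
    """Output of a Moore machine hyp = (initial, delta, out) on a word."""
    q, delta, out = hyp
    for a in word:
        q = delta[q, a]
    return out[q]


def learn_moore(alphabet, member, equivalent):
    """Learn a Moore machine using membership and equivalence queries."""
    R, S, T = [()], [()], {}  # representatives, suffixes, cached table entries

    def t(word):
        if word not in T:
            T[word] = member(word)  # membership query
        return T[word]

    def row(u):
        return tuple(t(u + s) for s in S)

    while True:
        # Close the table: every one-letter extension of a representative
        # must have the same row as some representative.
        changed = True
        while changed:
            changed = False
            rows = {row(r) for r in R}
            for r in list(R):
                for a in alphabet:
                    if row(r + (a,)) not in rows:
                        R.append(r + (a,))
                        rows.add(row(r + (a,)))
                        changed = True
        # Build the conjecture: one state per row, with output T(u) for u in R.
        state = {row(r): i for i, r in enumerate(R)}
        delta = {(state[row(r)], a): state[row(r + (a,))]
                 for r in R for a in alphabet}
        out = {state[row(r)]: t(r) for r in R}
        hyp = (state[row(())], delta, out)
        ce = equivalent(hyp)  # equivalence query
        if ce is None:
            return hyp
        for i in range(len(ce)):  # add all nonempty suffixes of the counterexample
            if ce[i:] not in S:
                S.append(ce[i:])


def bounded_teacher(alphabet, target, max_len=6):
    """Hypothetical teacher: equivalence is only tested up to length max_len."""
    def equivalent(hyp):
        for n in range(max_len + 1):
            for w in product(alphabet, repeat=n):
                if run(hyp, w) != target(w):
                    return w
        return None
    return target, equivalent


# Target with |Γ| = 3: the number of occurrences of "a" modulo 3.
def count_a_mod_3(word):
    return word.count("a") % 3


member, equivalent = bounded_teacher(("a", "b"), count_a_mod_3)
learned = learn_moore(("a", "b"), member, equivalent)
```

For this target, the table closes with three pairwise distinct rows, and the resulting conjecture has three states, one per residue class.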

The algorithms adapted in this way learn the unique minimal Moore machine for the target function in time polynomial in the size of the minimal Moore machine and the length of the longest counterexample returned by the teacher. Thus, we immediately obtain the following remark.

Remark .. Given a teacher for a Moore machine-computable function, who can

answer membership and equivalence queries, the unique minimal Moore machine for this function can be learned in time polynomial in the size of the minimal Moore machine and the length of the longest counterexample returned by the teacher. Actively Learning QDAs

For the task of actively learning QDAs, we assume that the teacher has access to a QDA-acceptable language L ⊆ Π∗×F of formula words and answers queries as follows.

Membership query On a membership query, the learner provides a symbolic word w ∈ Π∗, and the teacher returns the unique formula ϕ ∈ F with (w, ϕ) ∈ L. Note that such a formula is guaranteed to exist since L is a QDA-acceptable language.

Equivalence query On an equivalence query with a QDA A, the teacher checks whether L_for(A) = L is satisfied. If this is the case, he returns “yes”. If this is not the case, then there exists a formula word (w, ϕ) such that (w, ϕ) ∈ L_for(A) if and only if (w, ϕ) ∉ L (since both L_for(A) and L contain a formula word of the form (w′, ϕ′) for every w′ ∈ Π∗), and the teacher returns w as counterexample.
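As a minimal sketch of such a teacher, assume that the target language is available to the teacher as an explicit reference QDA with a complete transition function. This is an assumption made for illustration; the text above only requires access to a QDA-acceptable language. A membership query then amounts to running the reference QDA, and an equivalence query to a breadth-first search over pairs of states that returns a shortest symbolic word on which the two automata disagree. Letters are abbreviated to strings, and the function names and toy automata are hypothetical.

```python
from collections import deque


def membership_query(target, word):
    """Return the unique formula that the target assigns to the symbolic word."""
    q, delta, formula = target
    for a in word:
        q = delta[q, a]
    return formula[q]


def equivalence_query(target, hypothesis, alphabet):
    """Return None ("yes") or a shortest symbolic word w on which the formulas differ.

    Both automata are triples (initial, delta, formula) with complete delta.
    """
    (q0, d1, f1), (p0, d2, f2) = target, hypothesis
    queue = deque([(q0, p0, ())])
    seen = {(q0, p0)}
    while queue:
        q, p, w = queue.popleft()
        if f1[q] != f2[p]:
            return w  # (w, f2[p]) is accepted by the hypothesis but is not in L
        for a in alphabet:
            nxt = (d1[q, a], d2[p, a])
            if nxt not in seen:
                seen.add(nxt)
                queue.append((*nxt, w + (a,)))
    return None


# Toy automata over two abstract letters "b" and "y".
reference = ("q0",
             {("q0", "b"): "q0", ("q0", "y"): "q1",
              ("q1", "b"): "q1", ("q1", "y"): "q1"},
             {"q0": "false", "q1": "true"})
conjecture = ("p0",
              {("p0", "b"): "p0", ("p0", "y"): "p0"},
              {"p0": "false"})
```

Here `equivalence_query(reference, conjecture, ("b", "y"))` returns the word `("y",)`, on which the reference QDA yields true while the conjecture yields false.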

Since a teacher for QDAs answers queries in the same manner as a teacher for Moore machines and each QDA-acceptable language contains only finitely many different data formulas, we can reduce the learning of QDAs to the learning of Moore machines. This allows us to apply off-the-shelf learning algorithms, such as Angluin’s or Rivest and Schapire’s algorithm, and we obtain the following result.

Theorem.. Given a teacher for a QDA-acceptable language of formula words, who can

 5 Quantified Invariants of Linear Data Structures can be learned in time polynomial in the size of the minimal QDA and the length of the longest counterexample returned by the teacher.

We end this section by remarking that learning QDAs on the level of valuation words (or formula words) has the drawback that one has to fix a set of valuation words that represents the language of data words one is interested in. In Section ., we present an implementation of a teacher who answers queries based on information derived from runs of actual code manipulating lists and arrays. Such a teacher does not know an invariant and is necessarily imprecise. Thus, the teacher might answer the queries with respect to a language of valuation words that requires a large QDA or is not even QDA-acceptable at all. However, in the setting of learning invariants, a learner does not need to learn the exact language the teacher “has in mind”. It suffices if a learner arrives at some QDA that represents an invariant. Since invariants are often not very complex, the hope is that a learner succeeds in learning an invariant from the incomplete information provided by the teacher. We substantiate this hope in Section., where we present results on learning invariants for various programs.