This section describes the membership query generator of the ModifiedQSM and MarkovQSM algorithms. It consists of two sub-generators that are designed to work together as one generator; they are described in Sections 6.2.1 and 6.2.2. Two definitions are introduced before the generators themselves. The set of sequences that lead to a state $q$ from the initial state $q_0$ is denoted by $Seq(q)$ and is defined in Definition 6.1.
Definition 6.1. Given a state $q \in Q$ and the current automaton $A$, $Seq(q) = \{ w \in L(A) \mid \hat{\delta}(q_0, w) = q \}$.
The shortest sequences that lead to a state $q$ from the initial state $q_0$ are denoted by $Sp(q)$ and are defined in Definition 6.2. The set $Sp(q)$ is a subset of the short prefixes $Sp(L)$ of the language $L$ that is identified by the automaton $A$.
Definition 6.2. Given a state $q \in Q$ and the current automaton $A$, let $Seq(q)$ denote the set of sequences that lead to $q$ from the initial state $q_0$. A sequence $w \in Seq(q)$ is said to be a shortest sequence if there is no other sequence $y \in Seq(q)$ that is shorter than $w$: $Sp(q) = \{ w \in Seq(q) \mid \nexists\, y \in Seq(q) \setminus \{w\} \text{ such that } |w| > |y| \}$.
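As an illustration of Definitions 6.1 and 6.2, the following minimal Python sketch computes $Sp(q)$ with a breadth-first search. The dictionary encoding of the automaton, the state and label names, and the function name `shortest_prefixes` are assumptions made for this illustration; they are not taken from the ModifiedQSM or MarkovQSM implementations.

```python
from collections import deque

# Hypothetical deterministic automaton: TRANSITIONS[state][label] = target.
# Every state is assumed to be accepting (prefix-closed LTS).
TRANSITIONS = {
    "A": {"Load": "B"},
    "B": {"Edit": "C", "Close": "D"},
    "C": {"Edit": "G", "Save": "E"},
    "D": {"Load": "K"},
    "G": {"Save": "H"},
    "H": {"Close": "I"},
}


def shortest_prefixes(transitions, initial, target):
    """Return Sp(target): all shortest label sequences from initial to target."""
    result = []
    best = {initial: 0}            # shortest known distance to each state
    queue = deque([(initial, ())])
    while queue:
        state, word = queue.popleft()
        if result and len(word) > len(result[0]):
            break                  # every remaining word is longer
        if state == target:
            result.append(word)
            continue
        for label, nxt in sorted(transitions.get(state, {}).items()):
            # ">=" keeps ties, so several equally short words all survive.
            if best.get(nxt, len(word) + 1) >= len(word) + 1:
                best[nxt] = len(word) + 1
                queue.append((nxt, word + (label,)))
    return result


print(shortest_prefixes(TRANSITIONS, "A", "C"))  # [('Load', 'Edit')]
print(shortest_prefixes(TRANSITIONS, "A", "A"))  # [()]
```

The initial state is reached by the empty sequence only, which is why the second call returns a list holding one empty tuple rather than an empty list.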
6.2.1 Dupont’s QSM Queries
The main generator of membership queries in the QSM algorithm was introduced by Dupont et al. [36] and is called the Dupont generator in this thesis. The Dupont generator is responsible for generating queries about new scenarios (sequences) that appear as a consequence of merging states. In other words, it asks about sequences that belong to the language of the merged automaton but do not belong to the language of the current solution (the LTS hypothesis). The objective of asking these queries is to prevent bad generalizations (state merges) of the inferred models [36]. Hence, it is considered the essential (main) generator in ModifiedQSM and MarkovQSM.
Let $Suff(q_b, A)$ denote the set of suffixes of the blue state $q_b$ in the current automaton $A$, and let $Sp(q_r)$ denote the shortest sequences that lead to the red state $q_r$ from the root state $q_0$ in $A$. The membership queries are generated by concatenating each sequence belonging to $Sp(q_r)$ with each suffix that belongs to $Suff(q_b, A)$ and not to $Suff(q_r, A)$. A generated membership query is therefore a sequence $s \cdot y$ such that $s \in Sp(q_r)$ and $y \in Suff(q_b, A)$. Thus, the generated query $s \cdot y$ belongs to $L(A')$ and does not belong to $L(A)$. The construction of the Dupont queries is given in Definition 6.3.
Definition 6.3. Given a pair of red/blue states $(q_r, q_b)$ with $q_r, q_b \in Q$, the current automaton $A$, and the merged automaton $A'$, the Dupont queries are defined by:
$Dupont_{queries} = \{ s \cdot y \mid s \in Sp(q_r),\ y \in Suff(q_b, A) \}$ such that $s \cdot y \in L(A') \setminus L(A)$.
The following two examples show how to construct the membership queries for a recursive and a non-recursive merge of states, respectively.
Figure 6.1: The first example of computing the Dupont queries. Panel (a) shows the current automaton $A$; panel (b) shows the merged automaton $A'$.
Example 6.1. Figure 6.1(a) shows the current automaton during the induction process. Suppose that state B is chosen to merge with state A. The shortest sequence that leads to the red state A is the empty sequence, so $Sp(q_r) = \{\langle\rangle\}$. The $Suff(q_b, A)$ set contains the following sequences: $\{\langle \mathit{Edit}, \mathit{Edit}, \mathit{Save}, \mathit{Close} \rangle, \langle \mathit{Edit}, \mathit{Save} \rangle, \langle \mathit{Close}, \mathit{Load} \rangle\}$. The Dupont queries are generated by concatenating $Sp(q_r)$ with each suffix in the $Suff(q_b, A)$ set as described in Definition 6.3. In this way, the following membership queries are generated: $Dupont_{queries} = \{\langle \mathit{Edit}, \mathit{Edit}, \mathit{Save}, \mathit{Close} \rangle, \langle \mathit{Edit}, \mathit{Save} \rangle, \langle \mathit{Close}, \mathit{Load} \rangle\}$. The generated queries belong to $L(A')$, which is shown in Figure 6.1(b), and do not belong to $L(A)$.
Example 6.2. Figure 6.2(a) illustrates the current automaton during the inference process. Suppose that state G is chosen to merge with state C. Figure 6.2(b) shows the merged automaton $A'$ computed by merging the chosen pair of states. The shortest sequence that leads to the red state C gives $Sp(q_r) = \{\langle \mathit{Load}, \mathit{Edit} \rangle\}$. The $Suff(q_b, A)$ set contains the following sequences: $\{\langle \mathit{Save}, \mathit{Close} \rangle\}$. The Dupont queries are generated by concatenating $Sp(q_r)$ with each suffix in the $Suff(q_b, A)$ set as described in Definition 6.3. In this way, the following membership queries are generated: $Dupont_{queries} = \{\langle \mathit{Load}, \mathit{Edit}, \mathit{Save}, \mathit{Close} \rangle\}$. The query belongs to $L(A')$ and does not belong to $L(A)$.
Figure 6.2: The second example of computing the Dupont queries. Panel (a) shows the current automaton $A$; panel (b) shows the merged automaton $A'$.
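The construction in Definition 6.3 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the thesis implementation: the automaton is a hypothetical tree whose transitions were chosen to be consistent with the suffix sets quoted in Examples 6.1 and 6.2, every state is assumed to be accepting, and $Suff(q_b, A)$ is taken to be the set of label sequences from $q_b$ to a leaf. The sketch does not build $A'$; it relies on the assumption that folding the subtree of $q_b$ into $q_r$ makes every $s \cdot y$ traceable in $A'$, so only the condition $s \cdot y \notin L(A)$ is checked. All function names are illustrative.

```python
# Hypothetical tree-shaped automaton: A[state][label] = target state.
A = {
    "A": {"Load": "B"},
    "B": {"Edit": "C", "Close": "D"},
    "C": {"Edit": "G", "Save": "E"},
    "D": {"Load": "K"},
    "G": {"Save": "H"},
    "H": {"Close": "I"},
}


def accepts(transitions, initial, word):
    """True if word can be traced from the initial state (all states accept)."""
    state = initial
    for label in word:
        state = transitions.get(state, {}).get(label)
        if state is None:
            return False
    return True


def suffixes(transitions, state):
    """Label sequences from state down to each leaf (assumes a tree, no cycles)."""
    outgoing = transitions.get(state, {})
    if not outgoing:
        return [()]
    return [(label,) + rest
            for label, target in sorted(outgoing.items())
            for rest in suffixes(transitions, target)]


def dupont_queries(transitions, initial, sp_red, blue):
    """Concatenate each s in Sp(q_r) with each y in Suff(q_b, A); keep s.y not in L(A)."""
    candidates = [s + y for s in sp_red for y in suffixes(transitions, blue)]
    return [q for q in candidates if not accepts(transitions, initial, q)]


# Example 6.2: red state C with Sp(C) = {<Load, Edit>}, blue state G.
print(dupont_queries(A, "A", [("Load", "Edit")], "G"))
# [('Load', 'Edit', 'Save', 'Close')]
```

Calling `dupont_queries(A, "A", [()], "B")` with the empty sequence as the only element of $Sp(q_r)$ returns the three queries of Example 6.1.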
6.2.2 One-step Generator
The second generator of membership queries is called one-step. It is motivated by the observation that the membership queries constructed by the Dupont generator are insufficient to prevent merging inequivalent pairs of states. The example below shows a case in which the Dupont generator does not generate any query. It is important to highlight that the one-step queries are only present in ModifiedQSM and MarkovQSM.
Example 6.3. Consider the automaton shown in Figure 6.3, and suppose that state C is chosen to merge with state B. The Dupont generator will not generate any queries, since merging the states does not add new scenarios to the merged (red) node: no label is added to the red state if the EDSM merges them. In this case, the set $Dupont_{queries}$ is empty. This example shows that the pair of states (B, C) is compatible for merging using the QSM learner because both are accepting states. However, they are inequivalent with respect to the language of the reference LTS.
Figure 6.3: An example of computing the one-step generator
It is therefore useful to ask extra membership queries in order to detect incompatible pairs of states and avoid merging them. One way is to ask about the labels of the outgoing transitions of the red state that lead to an accepting state, where those labels do not overlap with the labels of the outgoing transitions of the blue state. In other words, the labels (elements of the alphabet) of outgoing transitions leading to an accepting state that belong to $\Sigma^{out}_{q_r}$ and do not belong to $\Sigma^{out}_{q_b}$ are asked from the blue state. This is inspired by the k-tails algorithm, in which a pair of states is deemed equivalent if they share the same suffixes of length k. It is worth mentioning that k-tails suffixes lead to accepting states.
The one-step generator constructs queries by collecting the shortest sequences from the root state to the blue state, $Sp(q_b)$, and picking one of them. The chosen sequence $s \in Sp(q_b)$ is then concatenated with each label of an outgoing transition of the red state that leads to an accepting state and for which no transition with the same label leaves the blue state. The construction of the one-step queries is defined formally in Definition 6.4.
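Definition 6.4. Given a pair of red/blue states $(q_r, q_b)$ with $q_r, q_b \in Q$ and the current automaton $A$, the one-step queries are defined by $oq = \{ s \cdot \langle \sigma \rangle \mid s \in Sp(q_b),\ \sigma \in \Sigma^{out}_{q_r} \setminus \Sigma^{out}_{q_b} \wedge q' \in \delta(q_r, \sigma) \text{ such that } q' \in F^{+} \}$.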
Example 6.4. Consider the pair of states shown in Figure 6.3 above, and suppose that state C is chosen to merge with state B. The shortest path to the blue state from the root state gives $Sp(C) = \{\langle \mathit{Load}, \mathit{Edit} \rangle\}$. The set $\Sigma^{out}_{q_r} \setminus \Sigma^{out}_{q_b}$ contains only one label: $\{\mathit{Close}\}$. In this example, the one-step generator produces only one membership query by concatenating $Sp(C)$ with the Close label. This yields the following query:
$onestep_{queries} = \{\langle \mathit{Load}, \mathit{Edit}, \mathit{Close} \rangle\}$
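Definition 6.4 can be sketched in Python in a few lines. The automaton below is a hypothetical reconstruction chosen to be consistent with Examples 6.3 and 6.4 (the exact transitions of Figure 6.3 are an assumption), and the function and variable names are illustrative only.

```python
# Hypothetical automaton: A[state][label] = target state.
A = {
    "A": {"Load": "B"},
    "B": {"Edit": "C", "Close": "D"},
    "C": {"Edit": "G"},
    "D": {"Load": "K"},
    "G": {"Edit": "N"},
}
# Assumed set F+ of accepting states (here: every state).
ACCEPTING = {"A", "B", "C", "D", "G", "N", "K"}


def one_step_queries(transitions, accepting, sp_blue, red, blue):
    """Build s.<sigma> for s in Sp(q_b) and each label sigma that leaves the
    red state towards an accepting state but does not leave the blue state."""
    red_out = transitions.get(red, {})
    blue_out = transitions.get(blue, {})
    labels = sorted(label for label, target in red_out.items()
                    if label not in blue_out and target in accepting)
    return [s + (label,) for s in sp_blue for label in labels]


# Example 6.4: red state B, blue state C, Sp(C) = {<Load, Edit>}.
print(one_step_queries(A, ACCEPTING, [("Load", "Edit")], "B", "C"))
# [('Load', 'Edit', 'Close')]
```

Here the outgoing labels of B are Edit and Close, while C only has Edit, so Close is the single label asked from the blue state.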