Chapter 4 Mining Parametric Specifications
4.3 Slicing Traces
4.3.3 Trace Slicing Algorithm
As discussed in Section 4.3.2, the number of trace slices is $(n/m)^m$ in the worst case. Since all trace slices can be distinct, this number gives a lower bound for all trace slicing algorithms. This lower bound is hard to achieve, though, since computing complete and connected parameter bindings may require several combining operations. For example, $\langle P_0 \mapsto p_0, P_1 \mapsto p_{1,1}, P_2 \mapsto p_{2,1}, \ldots, P_m \mapsto p_{m,1} \rangle$ in Figure 4.8 can be obtained only after at least $m$ combining operations: $(((\langle P_0 \mapsto p_0, P_1 \mapsto p_{1,1} \rangle \sqcup \langle P_0 \mapsto p_0, P_2 \mapsto p_{2,1} \rangle) \sqcup \langle P_0 \mapsto p_0, P_3 \mapsto p_{3,1} \rangle) \sqcup \ldots \sqcup \langle P_0 \mapsto p_0, P_m \mapsto p_{m,1} \rangle)$. Furthermore, a trace slicing algorithm needs to search for compatible parameter bindings in order to create combined ones, and this search can be expensive.
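The combining operation can be illustrated with a small Python sketch. It assumes that a parameter binding is represented as a dictionary from parameter names to values; the names `compatible`, `combine`, and `combine_all` are hypothetical and are not taken from the implementation described in this chapter.

```python
from typing import Dict, List, Optional

# Assumption: a parameter binding is a partial map from parameter names to values.
Binding = Dict[str, str]


def compatible(a: Binding, b: Binding) -> bool:
    """Return True if a and b agree on every parameter that both bind."""
    return all(b[p] == v for p, v in a.items() if p in b)


def combine(a: Binding, b: Binding) -> Optional[Binding]:
    """Return the combined binding, or None if a and b are incompatible."""
    if not compatible(a, b):
        return None
    return {**a, **b}


def combine_all(bindings: List[Binding]) -> Optional[Binding]:
    """Fold combine over the list from left to right, like the nested expression."""
    result: Optional[Binding] = {}
    for theta in bindings:
        if result is None:
            return None
        result = combine(result, theta)
    return result


# The bindings <P0 -> p0, Pi -> p_{i,1}> carried by events, here for m = 4.
m = 4
pairs = [{"P0": "p0", f"P{i}": f"p{i},1"} for i in range(1, m + 1)]
full = combine_all(pairs)  # binds P0 and P1..P4
```

The complete binding only emerges after the whole chain has been folded, which is why an algorithm that must discover such chains by searching for compatible pairs cannot easily meet the lower bound.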
Figure 4.9 shows jMiner's trace slicer, called Slicer. This trace slicer traverses the given parametric trace only once and does not output spurious trace slices, such as ones that correspond to incomplete or unconnected parameter bindings. It has two stages: it first processes the entire parametric trace, event by event, constructing intermediate results $\Gamma$; it then constructs the set of trace slices $\Psi$, each corresponding to a complete and connected parameter binding.
During the first stage, the algorithm stores in $\Gamma$ intermediate trace slices only for parameter bindings that are carried by events; i.e., it does not combine parameter bindings yet. The second stage, ConstructConnected, constructs $\Omega$, which holds all possible connected parameter bindings, by combining compatible ones in the loop on lines 2–3. For each complete and connected parameter binding $\theta$, its corresponding trace slice is finally constructed on lines 4–6. $\Delta$ collects all intermediate trace slices corresponding to $\theta$'s sub-bindings. MergeTraces is essentially the merge function of merge sort, using the position of events in the trace for comparison; recall that events in trace slices are listed chronologically.
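The two stages can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not jMiner's implementation: bindings are modeled as frozensets of (parameter, value) pairs, a hash map stands in for the intermediate results, and the names `binding`, `compatible`, and `slicer` are hypothetical. The standard-library `heapq.merge` plays the role of MergeTraces by merging lists that are already sorted by trace position.

```python
import heapq
from typing import Dict, FrozenSet, List, Set, Tuple

Binding = FrozenSet[Tuple[str, str]]  # hashable set of (parameter, value) pairs
Event = Tuple[str, Binding]           # base event paired with its binding


def binding(**kv: str) -> Binding:
    """Convenience constructor, e.g. binding(c="c1", i="i1")."""
    return frozenset(kv.items())


def compatible(a: Binding, b: Binding) -> bool:
    """True if the union binds each parameter to a single value."""
    union = a | b
    return len({p for p, _ in union}) == len(union)


def slicer(params: Set[str], trace: List[Event]) -> Dict[Binding, List[str]]:
    # Stage 1 (HandleEvent): a single pass that records (position, base event)
    # under exactly the binding carried by each event.
    gamma: Dict[Binding, List[Tuple[int, str]]] = {}
    for pos, (base, theta) in enumerate(trace):
        gamma.setdefault(theta, []).append((pos, base))

    # Stage 2, lines 1-3: close omega under combination of compatible
    # bindings that share at least one (parameter, value) pair.
    omega: Set[Binding] = set(gamma)
    changed = True
    while changed:
        changed = False
        for t1 in list(omega):
            for t2 in list(omega):
                if t1 & t2 and compatible(t1, t2) and (t1 | t2) not in omega:
                    omega.add(t1 | t2)
                    changed = True

    # Stage 2, lines 4-6: for each complete binding, merge the intermediate
    # slices of its sub-bindings by position in the trace.
    psi: Dict[Binding, List[str]] = {}
    for theta in omega:
        if {p for p, _ in theta} == params:
            delta = [events for sub, events in gamma.items() if sub <= theta]
            psi[theta] = [base for _, base in heapq.merge(*delta)]
    return psi


trace = [
    ("create", binding(c="c1", i="i1")),
    ("update", binding(c="c1")),
    ("next", binding(i="i1")),
    ("create", binding(c="c1", i="i2")),
    ("next", binding(i="i2")),
    ("update", binding(c="c2")),
]
slices = slicer({"c", "i"}, trace)
```

In this example only the bindings for (c1, i1) and (c1, i2) receive a slice. No slice is produced for c2, because no event connects c2 to an iterator.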
Theorem 1. After running Slicer on $\tau \in E\langle X \rangle^*$,
1. $\Psi(\theta)$ is defined iff $\theta$ is $\tau$-connected and $\mathrm{Dom}(\theta) = X$;
2. If $\Psi(\theta)$ is defined, then $\Psi(\theta) = \tau{\upharpoonright}_{\theta}$.
This theorem states that all trace slices corresponding to $\tau$-connected and complete parameter bindings can be retrieved from $\Psi$. Below is the proof of this theorem.
Lemma 1. After finishing the loop on lines 2–3 in ConstructConnected, a parameter binding $\theta$ is $\tau$-connected iff $\theta \in \Omega$.
Proof. ($\Leftarrow$) According to Definition 7, all parameter bindings added to $\Omega$ on line 1 in ConstructConnected are $\tau$-connected because $\Gamma(\theta)$ is defined only if $e\langle\theta\rangle \in \tau$. All parameter bindings added on line 3 are also $\tau$-connected because $\theta_1$ and $\theta_2$ are $\tau$-connected and compatible, and $\theta_1 \sqcap \theta_2 \neq \bot$ from the condition on line 2.
($\Rightarrow$) We prove this by well-founded induction on $\sqsubset$, which is possible because the minimal element $\bot$ exists. Suppose that the property holds for all $\theta'$ such that $\theta' \sqsubset \theta$. It must then be shown that the property holds for $\theta$ as well. If $\theta$ comes from an event, as
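Input : $X$, $\tau = e_1\langle\theta_1\rangle\, e_2\langle\theta_2\rangle \ldots e_n\langle\theta_n\rangle$
Output: $\Psi \in [[X \rightharpoonup V_X] \rightharpoonup E^*]$
Global: $\Gamma \in [[X \rightharpoonup V_X] \rightharpoonup E^*]$

Function Slice()
1   for $i \leftarrow 1$ to $n$ do
2       HandleEvent($e_i\langle\theta_i\rangle$)
3   ConstructConnected()

Function HandleEvent($e\langle\theta\rangle$)
1   if $\Gamma(\theta)$ undefined then
2       $\Gamma(\theta) \leftarrow \epsilon$
3   $\Gamma(\theta) \leftarrow \Gamma(\theta)\, e$

Function ConstructConnected()
1   $\Omega \leftarrow \{\theta \mid \Gamma(\theta) \text{ is defined}\}$
2   while $\exists\, \theta_1, \theta_2 \in \Omega$ compatible, $\theta_1 \sqcap \theta_2 \neq \bot$, $\theta_1 \sqcup \theta_2 \notin \Omega$ do
3       $\Omega \leftarrow \Omega \cup \{\theta_1 \sqcup \theta_2\}$
4   foreach $\theta \in \Omega$ s.t. $\mathrm{Dom}(\theta) = X$ do
5       $\Delta = \{\Gamma(\theta') \mid \theta' \sqsubseteq \theta \text{ and } \Gamma(\theta') \text{ is defined}\}$
6       $\Psi(\theta) \leftarrow$ MergeTraces($\Delta$)

Figure 4.9: Slicer: trace slicing algorithm.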
in the first case of Definition 7, then the property holds because $\theta$ belongs to $\Omega$ as per line 1 in ConstructConnected. If $\theta$ is $\theta_1 \sqcup \theta_2$, as in the second case of Definition 7, then both $\theta_1$ and $\theta_2$ belong to $\Omega$ by the induction hypothesis, resulting in $\theta \in \Omega$ as per line 3.
Lemma 2. After running Slicer, $\Psi(\theta)$ is defined iff $\theta$ is $\tau$-connected and $\mathrm{Dom}(\theta) = X$.
Proof. ($\Rightarrow$) Line 6 in ConstructConnected is the only place where $\Psi(\theta)$ is defined. From the condition on line 4 and Lemma 1, $\theta$ is $\tau$-connected and $\mathrm{Dom}(\theta) = X$.
($\Leftarrow$) From Lemma 1, $\Omega$ contains all $\tau$-connected parameter bindings. Therefore, if $\theta$ is $\tau$-connected and $\mathrm{Dom}(\theta) = X$, then the body of the loop on lines 4–6 is executed for $\theta$ and, consequently, defines $\Psi(\theta)$.
Lemma 3. After running Slicer, if $\Psi(\theta)$ is defined, then $\Psi(\theta) = \tau{\upharpoonright}_{\theta}$.
Proof. We first show that $\Psi(\theta)$ preserves the order of events in $\tau$. Each $\Gamma(\theta')$ preserves the order because HandleEvent processes events in chronological order and line 3 appends each event to $\Gamma(\theta')$. Since MergeTraces is the same as the merge function of merge sort and all input lists to MergeTraces are sorted, the result of MergeTraces is also sorted.
It remains to show that $\Psi(\theta)$ returned from MergeTraces keeps the base event of $e'\langle\theta'\rangle$ iff $\theta' \sqsubseteq \theta$.
($\Leftarrow$) After running HandleEvent for all events in $\tau$, if there is an event $e'\langle\theta'\rangle$ in $\tau$ with $\theta' \sqsubseteq \theta$, then line 2 in HandleEvent defines $\Gamma(\theta')$, resulting in $\Gamma(\theta') \in \Delta$ (line 5 in ConstructConnected). Since line 3 in HandleEvent stores the base event of $e'\langle\theta'\rangle$ in $\Gamma(\theta')$, MergeTraces dispatches this base event to $\Psi(\theta)$.
($\Rightarrow$) $\Gamma(\theta')$ keeps an event only if its parameter binding is $\theta'$ (line 3 in HandleEvent), and $\Gamma(\theta')$ is considered for merging only if $\theta' \sqsubseteq \theta$ (line 5 in ConstructConnected). Thus, $\Psi(\theta)$ keeps the base event of $e'\langle\theta'\rangle$ only if $\theta' \sqsubseteq \theta$.
From Lemma 2 and Lemma 3, Theorem 1 holds.
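The characterization used in the proof of Lemma 3 also gives a direct way to compute a single slice, which can serve as a reference when testing a slicer on small traces. The sketch below assumes that the slice for a binding keeps exactly the base events whose binding is a sub-binding, in trace order; the frozenset representation and the names `binding` and `reference_slice` are illustrative assumptions.

```python
from typing import FrozenSet, List, Tuple

Binding = FrozenSet[Tuple[str, str]]  # set of (parameter, value) pairs
Event = Tuple[str, Binding]           # base event paired with its binding


def binding(**kv: str) -> Binding:
    """Convenience constructor, e.g. binding(c="c1", i="i1")."""
    return frozenset(kv.items())


def reference_slice(trace: List[Event], theta: Binding) -> List[str]:
    """Keep, in trace order, the base events whose binding is a sub-binding of theta."""
    return [base for base, sub in trace if sub <= theta]


trace = [
    ("create", binding(c="c1", i="i1")),
    ("update", binding(c="c1")),
    ("next", binding(i="i1")),
    ("create", binding(c="c1", i="i2")),
    ("next", binding(i="i2")),
]
slice_c1_i2 = reference_slice(trace, binding(c="c1", i="i2"))
```

Unlike Slicer, this function scans the whole trace once per binding and does not decide which bindings are connected, so it is only useful as a check on individual slices.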
The complexity of Slicer is analyzed below. Slicer first calls HandleEvent $n$ times and, assuming that a self-balancing binary search tree is used for $\Gamma$, the complexity of each call is $O(\log n)$. The loop on lines 2–3 in ConstructConnected can pick $\theta_1$ and $\theta_2$ from $\Omega \times \Omega$, and each iteration takes $O(m)$ time for checking the compatibility and combining the two parameter bindings. There are $|\Omega|$ iterations of the loop on lines 4–6, with each iteration taking $O(n)$ time. The running time of the entire algorithm is thus $O(n \log n + |\Omega|^2 \cdot m + |\Omega| \cdot n) = O(n \log n + |\Omega|^2 \cdot m)$. Since the algorithm creates all possible connected parameter bindings, $|\Omega|$ can be calculated as follows: the number of connected ones with $|\mathrm{Dom}(\theta)| = k + 1$ is $\binom{m}{k} \cdot (n/m)^k$ because we can choose $k$ parameters and there are $n/m$ parameter values for each parameter. Thus, by the binomial theorem, we have

$$|\Omega| = \sum_{k=0}^{m} \binom{m}{k} \cdot \left(\frac{n}{m}\right)^{k} = \left(\frac{n}{m} + 1\right)^{m},$$

and the time complexity of Slicer is $O(n \log n + (n/m + 1)^{2m} \cdot m) = O((n/m + 1)^{2m} \cdot m)$.

As for the space complexity, Slicer needs to maintain $O(|\Omega|)$ connected parameter bindings of length $O(m)$ during trace slicing. It also needs space for $(n/m)^m$ trace slices of size $n$, as illustrated in Figure 4.8. Therefore, the space complexity is $O((n/m + 1)^m \cdot m + (n/m)^m \cdot n) = O((n/m + 1)^m \cdot n)$.

Slicer iterates through all possible connected parameter bindings in the loop on lines 2–3 in ConstructConnected. Since this step turned out to be expensive, two optimizations have been applied. First, instead of blindly picking a pair of parameter bindings from $\Omega$ and combining them, the implementation proceeds in a bottom-up manner. At the first step, it picks two parameter bindings, $\theta_1$ and $\theta_2$, such that $|\mathrm{Dom}(\theta_1)| = |\mathrm{Dom}(\theta_2)| = k$, and creates $\theta_1 \sqcup \theta_2$, if necessary. After handling all parameter bindings that bind $k$ parameters, it picks parameter bindings that bind $k + 1$ parameters, and so on, until $k$ reaches the size of $X$, the set of parameters. This way, a parameter binding is considered for compatibility within only a limited window, reducing the number of iterations.
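One possible reading of the bottom-up scheme is sketched below in Python. It is an assumption-laden illustration rather than the actual implementation: bindings are frozensets of (parameter, value) pairs bucketed by size, and, unlike the description above, a binding of size k is paired with bindings of size at most k (not only exactly k) so that the sketch still produces every connected binding. The names `connected_bottom_up`, `compatible`, and `binding` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Binding = FrozenSet[Tuple[str, str]]  # set of (parameter, value) pairs


def binding(**kv: str) -> Binding:
    """Convenience constructor, e.g. binding(P0="p0", P1="a")."""
    return frozenset(kv.items())


def compatible(a: Binding, b: Binding) -> bool:
    """True if the union binds each parameter to a single value."""
    union = a | b
    return len({p for p, _ in union}) == len(union)


def connected_bottom_up(seeds: Iterable[Binding], num_params: int) -> Set[Binding]:
    """Close the seed bindings under combination, one size level at a time."""
    levels: Dict[int, Set[Binding]] = defaultdict(set)
    for theta in seeds:
        levels[len(theta)].add(theta)

    for k in range(1, num_params + 1):
        # A genuinely new combination is strictly larger than both operands,
        # so every level up to k is already final when level k is processed.
        window = [t for j in range(1, k + 1) for t in levels[j]]
        for t1 in list(levels[k]):
            for t2 in window:
                if t1 & t2 and compatible(t1, t2):
                    combined = t1 | t2
                    levels[len(combined)].add(combined)

    return set().union(*levels.values())


seeds = [
    binding(P0="p0", P1="a"),
    binding(P0="p0", P2="b"),
    binding(P0="p0", P3="c"),
]
omega = connected_bottom_up(seeds, num_params=4)
```

Because each pair is examined once, at the level of its larger operand, the sweep avoids repeatedly rescanning the whole set, which is the effect that the limited window is meant to achieve.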
The second optimization is to group parameter bindings so that all parameter bindings in the same group bind exactly the same set of parameters. Grouping also reduces the number of iterations on lines 2–3 in ConstructConnected. For example, if $\langle P \mapsto p_1, Q \mapsto q_1 \rangle$ is chosen as $\theta_1$, all parameter bindings that belong to the group corresponding to $\{R, S\}$ will be excluded from the list of candidates for $\theta_2$ because any parameter binding in this group would result in $\theta_1 \sqcap \theta_2 = \bot$.
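The grouping idea can be sketched as follows, assuming that groups are keyed by the set of parameters a binding binds; the frozenset representation and the names `group_by_domain` and `candidates` are hypothetical. A whole group is skipped when its parameters are disjoint from those of the chosen binding, since no member of it can share a parameter value with that binding.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Iterator, List, Tuple

Binding = FrozenSet[Tuple[str, str]]  # set of (parameter, value) pairs
Domain = FrozenSet[str]               # set of bound parameter names


def binding(**kv: str) -> Binding:
    """Convenience constructor, e.g. binding(P="p1", Q="q1")."""
    return frozenset(kv.items())


def domain(theta: Binding) -> Domain:
    """The set of parameters bound by theta."""
    return frozenset(p for p, _ in theta)


def group_by_domain(bindings: Iterable[Binding]) -> Dict[Domain, List[Binding]]:
    """Put bindings that bind the same parameters into the same group."""
    groups: Dict[Domain, List[Binding]] = defaultdict(list)
    for theta in bindings:
        groups[domain(theta)].append(theta)
    return groups


def candidates(theta1: Binding, groups: Dict[Domain, List[Binding]]) -> Iterator[Binding]:
    """Yield only bindings from groups that share a parameter with theta1."""
    dom1 = domain(theta1)
    for dom, members in groups.items():
        if dom & dom1:  # disjoint domains cannot share a (parameter, value) pair
            yield from members


theta1 = binding(P="p1", Q="q1")
groups = group_by_domain([
    binding(R="r1", S="s1"),
    binding(Q="q1", R="r1"),
    binding(P="p2"),
])
remaining = list(candidates(theta1, groups))
```

Grouping only prunes by parameter names. A surviving candidate such as the binding of P to p2 still has to pass the compatibility check against the chosen binding.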