
Chapter 4 Mining Parametric Specifications

4.3 Slicing Traces

4.3.3 Trace Slicing Algorithm

As discussed in Section 4.3.2, the number of trace slices is $(n/m)^m$ in the worst case. Since all trace slices can be distinct, this number gives a lower bound for all trace slicing algorithms. This lower bound is hard to achieve, though, since computing complete and connected parameter bindings may require several combining operations. For example, $\langle P_0 \mapsto p_0, P_1 \mapsto p_{1,1}, P_2 \mapsto p_{2,1}, \ldots, P_m \mapsto p_{m,1} \rangle$ in Figure 4.8 can be obtained only after at least $m - 1$ combining operations: $(((\langle P_0 \mapsto p_0, P_1 \mapsto p_{1,1} \rangle \sqcup \langle P_0 \mapsto p_0, P_2 \mapsto p_{2,1} \rangle) \sqcup \langle P_0 \mapsto p_0, P_3 \mapsto p_{3,1} \rangle) \sqcup \ldots \sqcup \langle P_0 \mapsto p_0, P_m \mapsto p_{m,1} \rangle)$. Furthermore, in order to create combined parameter bindings, a trace slicing algorithm needs to search for compatible ones, which can be expensive.
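The combining operation can be made concrete with a minimal Python sketch. It assumes that a parameter binding is represented as a dictionary from parameter names to values; the helper names `compatible`, `meet` and `combine` are illustrative and are not taken from the jMiner implementation. Folding the $m$ two-parameter bindings of the example from left to right applies $\sqcup$ once per additional binding.

```python
def compatible(t1, t2):
    """Bindings are compatible when they agree on every parameter they share."""
    return all(t2[p] == v for p, v in t1.items() if p in t2)


def meet(t1, t2):
    """theta1 ⊓ theta2: the pairs common to both (the empty dict stands for ⊥)."""
    return {p: v for p, v in t1.items() if p in t2 and t2[p] == v}


def combine(t1, t2):
    """theta1 ⊔ theta2, defined only for compatible bindings."""
    if not compatible(t1, t2):
        raise ValueError("cannot combine incompatible bindings")
    return {**t1, **t2}


m = 4  # illustrative size, not a value from the text
# Bindings carried by events: <P0 -> p0, Pi -> p_{i,1}> for i = 1..m.
event_bindings = [{"P0": "p0", f"P{i}": f"p{i},1"} for i in range(1, m + 1)]

# Left-to-right fold, counting how often the combining operation is applied.
full, joins = event_bindings[0], 0
for theta in event_bindings[1:]:
    full = combine(full, theta)
    joins += 1
```

Here `full` binds all of `P0` through `P4`, and every pair of the original bindings shares only `P0`, which is what makes them connected.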

Figure 4.9 shows jMiner's trace slicer, called Slicer. This trace slicer traverses the given parametric trace only once and does not output spurious trace slices, such as ones that correspond to incomplete or unconnected parameter bindings. It has two stages: it first processes the entire parametric trace, event by event, constructing intermediate results $\Delta$; and then it constructs the set of trace slices $\Psi$, each corresponding to a complete and connected parameter binding.

During the first stage, this algorithm stores in $\Delta$ intermediate trace slices only for parameter bindings that are carried by events; i.e., it does not combine parameter bindings yet. The second stage, ConstructConnected, constructs $\Omega$ holding all possible connected parameter bindings by combining compatible ones in the loop on lines 2–3. For each complete and connected parameter binding, its corresponding trace slice is finally constructed on lines 4–6. $\Gamma$ collects all intermediate trace slices corresponding to $\theta$'s sub-bindings. MergeTraces is essentially the merge function of merge sort, using the position of events in the trace for comparison; recall that events in trace slices are listed chronologically.
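As a rough illustration of MergeTraces, the sketch below assumes that each intermediate trace slice is a list of `(position, base_event)` pairs already sorted by position. The function name `merge_traces` and this pair representation are assumptions made for the example. Python's standard `heapq.merge` performs the k-way merge of sorted inputs.

```python
import heapq


def merge_traces(slices):
    """Merge chronologically sorted slices of (position, event) pairs.

    Each input list must already be sorted by position; the result lists
    the base events of all inputs in trace order.
    """
    merged = heapq.merge(*slices, key=lambda pair: pair[0])
    return [event for _, event in merged]


# Three intermediate slices whose events interleave in the original trace.
gamma = [
    [(0, "a"), (3, "d")],
    [(1, "b")],
    [(2, "c"), (4, "e")],
]
result = merge_traces(gamma)
```

Because every event has a unique position in the trace, ties never occur and the merged order is fully determined.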

Theorem 1. After running Slicer on $\tau \in E\langle X \rangle^*$,

1. $\Psi(\theta)$ is defined iff $\theta$ is $\tau$-connected and $\mathrm{Dom}(\theta) = X$;

2. If $\Psi(\theta)$ is defined, then $\Psi(\theta) = \tau{\upharpoonright}_{\theta}$.

This theorem states that all trace slices corresponding to $\tau$-connected and complete parameter bindings can be retrieved from $\Psi$. Below is the proof of this theorem.

Lemma 1. After finishing the loop on lines 2–3 in ConstructConnected, a parameter binding $\theta$ is $\tau$-connected iff $\theta \in \Omega$.

Proof. ($\Leftarrow$) According to Definition 7, all parameter bindings added to $\Omega$ on line 1 in ConstructConnected are $\tau$-connected because $\Delta(\theta)$ is defined only if $e\langle\theta\rangle \in \tau$. All parameter bindings added on line 3 are also $\tau$-connected because $\theta_1$ and $\theta_2$ are $\tau$-connected and compatible, and $\theta_1 \sqcap \theta_2 \neq \bot$ from the condition on line 2.

($\Rightarrow$) We prove this by well-founded induction on $\sqsubseteq$, which is possible because the minimal element $\bot$ exists. Suppose that the property holds for all $\theta'$ such that $\theta' \sqsubseteq \theta$. It must then be shown that the property holds for $\theta$ as well. If $\theta$ comes from an event, as in the first case of Definition 7, then the property holds because $\theta$ belongs to $\Omega$ as per line 1 in ConstructConnected. If $\theta$ is $\theta_1 \sqcup \theta_2$, as in the second case of Definition 7, then both $\theta_1$ and $\theta_2$ belong to $\Omega$ by the induction hypothesis, resulting in $\theta \in \Omega$ as per line 3.

    Input : $X$, $\tau = e_1\langle\theta_1\rangle\, e_2\langle\theta_2\rangle \ldots e_n\langle\theta_n\rangle$
    Output: $\Psi \in [[X \to V_X] \rightharpoondown E^*]$
    Global: $\Delta \in [[X \rightharpoondown V_X] \rightharpoondown E^*]$

    Function Slice()
    1   for $i \leftarrow 1$ to $n$ do
    2       HandleEvent($e_i\langle\theta_i\rangle$)
    3   ConstructConnected()

    Function HandleEvent($e\langle\theta\rangle$)
    1   if $\Delta(\theta)$ undefined then
    2       $\Delta(\theta) \leftarrow \epsilon$
    3   $\Delta(\theta) \leftarrow \Delta(\theta)\, e$

    Function ConstructConnected()
    1   $\Omega \leftarrow \{\theta \mid \Delta(\theta) \text{ is defined}\}$
    2   while $\exists\, \theta_1, \theta_2 \in \Omega$ compatible, $\theta_1 \sqcap \theta_2 \neq \bot$, $\theta_1 \sqcup \theta_2 \notin \Omega$ do
    3       $\Omega \leftarrow \Omega \cup \{\theta_1 \sqcup \theta_2\}$
    4   foreach $\theta \in \Omega$ s.t. $\mathrm{Dom}(\theta) = X$ do
    5       $\Gamma = \{\Delta(\theta') \mid \theta' \sqsubseteq \theta \text{ and } \Delta(\theta') \text{ is defined}\}$
    6       $\Psi(\theta) \leftarrow$ MergeTraces($\Gamma$)

Figure 4.9: Slicer: Trace Slicing algorithm.
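The two stages of Figure 4.9 can be sketched in Python as follows. This is an unoptimized illustration under stated assumptions, not the jMiner implementation: a parameter binding is modeled as a `frozenset` of `(parameter, value)` pairs, so that $\sqcup$ becomes set union, $\sqcap$ becomes set intersection, and $\sqsubseteq$ becomes the subset relation. The function name `slicer` and the trace format, a list of `(base_event, binding)` pairs, are hypothetical.

```python
import heapq
from itertools import combinations


def slicer(params, trace):
    """Return Psi: complete, connected bindings mapped to their trace slices.

    params is the parameter set X; trace is a list of (base_event, binding)
    pairs, where each binding is a dict from parameter names to values.
    """
    # Stage 1 (HandleEvent): append each event to the slice of its own binding.
    delta = {}
    for pos, (event, theta) in enumerate(trace):
        delta.setdefault(frozenset(theta.items()), []).append((pos, event))

    # Stage 2 (ConstructConnected), lines 1-3: close Omega under combining.
    omega = set(delta)
    changed = True
    while changed:
        changed = False
        for t1, t2 in combinations(list(omega), 2):
            joined = t1 | t2
            # Compatible iff the union still maps each parameter to one value.
            is_compatible = len(dict(joined)) == len(joined)
            if is_compatible and (t1 & t2) and joined not in omega:
                omega.add(joined)
                changed = True

    # Lines 4-6: merge the slices of all sub-bindings of each complete binding.
    psi = {}
    for theta in omega:
        if {p for p, _ in theta} == set(params):
            gamma = [s for sub, s in delta.items() if sub <= theta]
            psi[theta] = [event for _, event in heapq.merge(*gamma)]
    return psi


trace = [
    ("a", {"P": "p1"}),
    ("b", {"P": "p1", "Q": "q1"}),
    ("c", {"Q": "q2"}),
    ("d", {"P": "p1", "Q": "q2"}),
    ("e", {"Q": "q1"}),
]
psi = slicer({"P", "Q"}, trace)
```

For this trace the slice for the binding of `P` to `p1` and `Q` to `q1` is `a b e`, and the slice for `P` to `p1` and `Q` to `q2` is `a c d`; the event `a` appears in both because its binding is a sub-binding of each.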

Lemma 2. After running Slicer, $\Psi(\theta)$ is defined iff $\theta$ is $\tau$-connected and $\mathrm{Dom}(\theta) = X$.

Proof. ($\Rightarrow$) Line 6 in ConstructConnected is the only place $\Psi(\theta)$ is defined. From the condition on line 4 and Lemma 1, $\theta$ is $\tau$-connected and $\mathrm{Dom}(\theta) = X$.

($\Leftarrow$) From Lemma 1, $\Omega$ contains all $\tau$-connected parameter bindings. Therefore, if $\theta$ is $\tau$-connected and $\mathrm{Dom}(\theta) = X$, then the body of the loop on lines 4–6 is executed and, consequently, defines $\Psi(\theta)$.

Lemma 3. After running Slicer, if $\Psi(\theta)$ is defined, $\Psi(\theta) = \tau{\upharpoonright}_{\theta}$.

Proof. We first show that $\Psi(\theta)$ preserves the order of events as in $\tau$. $\Delta(\theta')$ preserves the order because HandleEvent processes events in chronological order and line 3 appends each event to $\Delta(\theta')$. Since MergeTraces is the same as the merge function of a merge sort and all input lists to MergeTraces are sorted, the result of MergeTraces is also sorted.

Now, showing that $\Psi(\theta)$ returned from MergeTraces keeps the base event of $e'\langle\theta'\rangle$ iff $\theta' \sqsubseteq \theta$ will complete the proof.

($\Leftarrow$) After running HandleEvent for all events in $\tau$, if there is an event $e'\langle\theta'\rangle$ in $\tau$ with $\theta' \sqsubseteq \theta$, then line 2 in HandleEvent defines $\Delta(\theta')$, resulting in $\Delta(\theta') \in \Gamma$ (line 5 in ConstructConnected). Since line 3 in HandleEvent stores the base event of $e'\langle\theta'\rangle$ in $\Delta(\theta')$, MergeTraces dispatches the base event of $e'\langle\theta'\rangle$ to $\Psi(\theta)$.

($\Rightarrow$) $\Delta(\theta')$ keeps an event only if its parameter binding is $\theta'$ (line 3 in HandleEvent), and $\Delta(\theta')$ is considered to be merged only if $\theta' \sqsubseteq \theta$ (line 5 in ConstructConnected). Thus, $\Psi(\theta)$ keeps the base event of $e'\langle\theta'\rangle$ only if $\theta' \sqsubseteq \theta$.

From Lemma 2 and Lemma 3, ๎ขeorem 1 holds.

Below the complexity of Slicer is analyzed. It first calls HandleEvent $n$ times, and, assuming that a self-balancing binary search tree is used for $\Delta$, the complexity of HandleEvent is $O(\log n)$. The loop on lines 2–3 in ConstructConnected can pick $\theta_1$ and $\theta_2$ from $\Omega \times \Omega$, and each iteration takes $O(m)$ time for checking the compatibility and combining the two parameter bindings. There are $|\Omega|$ iterations of the loop on lines 4–6, with each iteration taking $O(m)$ time. The running time of the entire algorithm is thus $O(n \log n + |\Omega|^2 \cdot m + |\Omega| \cdot m) = O(n \log n + |\Omega|^2 \cdot m)$. Since the algorithm creates all possible connected parameter bindings, $|\Omega|$ can be calculated as follows in the worst case: the number of connected ones with $|\mathrm{Dom}(\theta)| = i + 1$ is $\binom{m}{i} \cdot \left(\frac{n}{m}\right)^i$ because we can choose $i$ parameters and there are $\frac{n}{m}$ parameter values for each parameter. Thus, we have $|\Omega| = \sum_{i=1}^{m} \binom{m}{i} \cdot \left(\frac{n}{m}\right)^i = \left(\frac{n}{m} + 1\right)^m - 1$ by the binomial theorem, and the time complexity of Slicer is $O\left(n \log n + \left(\frac{n}{m} + 1\right)^{2m} \cdot m\right) = O\left(\left(\frac{n}{m} + 1\right)^{2m} \cdot m\right)$.

As for the space complexity, it needs to maintain $O(|\Omega|)$ connected parameter bindings of length $O(m)$ during trace slicing. It also needs space for $\left(\frac{n}{m}\right)^m$ trace slices of size $m$ as illustrated in Figure 4.8. Therefore, the space complexity is $O\left(\left(\frac{n}{m} + 1\right)^m \cdot m + \left(\frac{n}{m}\right)^m \cdot m\right) = O\left(\left(\frac{n}{m} + 1\right)^m \cdot m\right)$.

Slicer iterates through all possible connected parameter bindings in the loop on lines 2–3 in ConstructConnected. Since this step turned out to be expensive, two optimizations have been applied. First, instead of blindly picking a pair of parameter bindings from $\Omega$ and combining them, the implementation proceeds in a bottom-up manner. At the first step, it picks two parameter bindings ($\theta_1$ and $\theta_2$) such that $|\mathrm{Dom}(\theta_1)| = |\mathrm{Dom}(\theta_2)| = p$, and creates $\theta_1 \sqcup \theta_2$, if necessary. After handling all parameter bindings with $p$ parameters, it picks parameter bindings with $p + 1$ parameters, and so on, until $p$ reaches the size of $X$, the set of parameters. This way, a parameter binding is considered for compatibility within only a limited window, reducing the number of iterations.
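The size of $\Omega$ is what makes the combining loop expensive, and it can be checked by brute force on small instances. The sketch below assumes the worst-case trace has the shape suggested by the example bindings of Figure 4.8: every event binds $P_0$ to the single value $p_0$ and one other parameter $P_i$ to one of $k = n/m$ values. The function names are illustrative, and the closure is the unoptimized loop on lines 2–3, not the bottom-up variant.

```python
from itertools import combinations


def connected_bindings(event_bindings):
    """Close a set of bindings (frozensets of pairs) under the combining step."""
    omega = set(event_bindings)
    changed = True
    while changed:
        changed = False
        for t1, t2 in combinations(list(omega), 2):
            joined = t1 | t2
            # Compatible iff each parameter still has exactly one value.
            is_compatible = len(dict(joined)) == len(joined)
            if is_compatible and (t1 & t2) and joined not in omega:
                omega.add(joined)
                changed = True
    return omega


def worst_case_bindings(m, k):
    """Event bindings <P0 -> p0, Pi -> p_{i,j}> for i = 1..m and j = 1..k."""
    return {
        frozenset({("P0", "p0"), (f"P{i}", f"p{i},{j}")})
        for i in range(1, m + 1)
        for j in range(1, k + 1)
    }


# Map (m, k) to (|Omega|, number of complete bindings) for a few small cases.
counts = {}
for m, k in [(2, 2), (3, 2), (3, 3)]:
    omega = connected_bindings(worst_case_bindings(m, k))
    complete = sum(1 for theta in omega if len(theta) == m + 1)
    counts[(m, k)] = (len(omega), complete)
```

Under these assumptions the number of connected bindings is $(k + 1)^m - 1$, the binomial sum over $i = 1, \ldots, m$, and the number of complete ones is $k^m$, matching the $(n/m)^m$ trace slices of the worst case.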

The second optimization is to group parameter bindings so that all parameter bindings in the same group bind exactly the same parameters. Grouping also reduces the number of iterations on lines 2–3 in ConstructConnected. For example, if $\langle P \mapsto p_1, Q \mapsto q_1 \rangle$ is chosen as $\theta_1$, all parameter bindings that belong to the group corresponding to $\{R, S\}$ will be excluded from the list of candidates for $\theta_2$ because any parameter binding in this group would result in $\theta_1 \sqcap \theta_2 = \bot$.
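A minimal sketch of the grouping idea is given below. It assumes that groups are keyed by the set of bound parameters, as the $\{R, S\}$ example suggests, and that bindings are dictionaries; the names `group_by_domain` and `candidates` are hypothetical. Whole groups whose parameter set is disjoint from that of $\theta_1$ are skipped without inspecting their members.

```python
from collections import defaultdict


def group_by_domain(bindings):
    """Group bindings (dicts) by the set of parameters they bind."""
    groups = defaultdict(list)
    for theta in bindings:
        groups[frozenset(theta)].append(theta)  # a dict iterates over its keys
    return groups


def candidates(theta1, groups):
    """Yield bindings that may still combine with theta1.

    Groups sharing no parameter with theta1 are skipped as a whole, since
    the meet with any of their members is ⊥. Surviving candidates still
    need the usual compatibility check on the shared parameters.
    """
    dom1 = frozenset(theta1)
    for dom2, members in groups.items():
        if dom1.isdisjoint(dom2):
            continue
        for theta2 in members:
            if theta2 is not theta1:
                yield theta2


theta1 = {"P": "p1", "Q": "q1"}
bindings = [
    theta1,
    {"R": "r1", "S": "s1"},
    {"R": "r2", "S": "s2"},
    {"Q": "q1", "R": "r1"},
]
groups = group_by_domain(bindings)
remaining = list(candidates(theta1, groups))
```

In this example both members of the `{R, S}` group are excluded in one step, and only the binding over `{Q, R}` remains as a candidate for $\theta_2$.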