

5.3 Shared Optimized NEEL Pattern Execution

5.3.2 Advanced Sub-expression Sharing with Different Negative Components

Beyond prior work [WDR06, BDG+07, MM09], we now also tackle the case of sub-expression sharing with different negative components. Namely, sub-patterns contain the same projected positive event types while their negative event types may differ. Besides saving CPU resources, we achieve the added benefit that one sequence result may satisfy several such expressions. If we constructed the results for such normalized event expressions of a nested query separately, we might inadvertently produce duplicate results, namely one for each of these different event expressions. This would not only waste CPU resources on re-computation but also incur the costs associated with duplicate removal.

We observe that such event expressions with common positive event types return the same results and differ only in the negation filters they apply. The main idea is that we record the constraints of non-occurrence and non-projected occurrence for each expression at compile time. At run time, as we construct each sequence result, we keep track of which of the given constraints are satisfied (or, rather, violated). We stop the evaluation early for unsatisfied event expressions.

Expression-vs-Negative Map (EMap). To facilitate the advanced sequence result generation, we design a data structure, EMap, that records the negative components and non-projected positive components of an expression together with their positions. Columns in the map correspond to negative components and non-projected positive components, with their positions, in the shared expressions, while rows list the expression identifiers. If the same negative component or non-projected positive component exists in different positions in an expression, that component is listed multiple times in EMap. At compile time, the cell entry Map[i, j], indicated by its row i and column j, is assigned a “1” if the negative event type indicated by column j is listed in the specified position in expression Ei, and a “0” otherwise. One negative component may exist in more than one location in different queries.

Result Vector Indicator (RVI). For each partial sequence result, we maintain a Result Vector Indicator (RVI), which is represented by a bit array. The columns of RVI are the same as the ones in EMap. During query execution, an RVI is maintained to check whether the current partial result is indeed a correct match. We mark the cell entry <i, j> for a column that corresponds to a negative component or a non-projected positive component as “1” if at run time the component assigned to that column evaluates to true in the specified position in the event stream (not found for a negative component; found for a non-projected positive component).
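The two structures can be pictured with a minimal Python sketch. It is an illustration only, not the implementation used in this work: the names build_emap and new_rvi are hypothetical, each column is assumed to be a (component, position) pair, and the RVI is assumed to start with all bits at 0 until a constraint is found to hold.

```python
def build_emap(expressions):
    """Compile-time step: derive the columns and one bit row per expression.

    expressions maps an expression id to its list of (component, position)
    constraints. Both the input shape and this function are illustrative.
    """
    columns = []
    for constraints in expressions.values():
        for constraint in constraints:
            # The same component at a different position is a distinct column.
            if constraint not in columns:
                columns.append(constraint)
    emap = {
        expr_id: [1 if col in constraints else 0 for col in columns]
        for expr_id, constraints in expressions.items()
    }
    return columns, emap


def new_rvi(columns):
    """Run-time step: one bit per EMap column (assumed to start at 0)."""
    return [0] * len(columns)


# Three expressions that share positive components but differ in negation.
expressions = {
    "E1": [("!C", ("W", "O"))],
    "E2": [("!D,C", ("W", "O"))],
    "E3": [("!S,D,C", ("W", "O"))],
}
columns, emap = build_emap(expressions)
rvi = new_rvi(columns)
```

Keying columns on the (component, position) pair is one way to honor the rule that a component occurring at several positions is listed several times.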

Lemma 3 We stop query evaluation early for a sub-expression Ei if logically AND-ing the bit vector of the row for Ei in EMap with the RVI for the partial result yields “0”.

Proof: When the logical AND of the bit vector of the row for Ei in EMap with the RVI for the partial result is “0”, then, since the bits considered in the EMap row are all “1”, at least one of the corresponding bits in RVI is “0”. We can thus conclude that at least one negative component evaluated to false (found) or one non-projected positive component evaluated to false (not found). According to the semantics of the SEQ operator with negation 5.4, such a partial result does not satisfy Ei. □
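One way to read the test in Lemma 3 is as a bitmask check: an expression may continue only if every bit set in its EMap row is also set in the RVI. The sketch below follows that reading. The function names can_continue and prune are hypothetical, and storing each row and the RVI as a Python integer is an assumption made for brevity.

```python
def can_continue(emap_row: int, rvi: int) -> bool:
    """True if every constraint required by the row is marked "1" in the RVI."""
    return (emap_row & rvi) == emap_row


def prune(emap: dict, rvi: int) -> list:
    """Return the ids of expressions whose evaluation should continue."""
    return [expr_id for expr_id, row in emap.items() if can_continue(row, rvi)]


# A row requiring columns 0 and 2, checked against two different RVIs.
row = 0b101
keep_going = can_continue(row, 0b111)  # both required bits are set
stop_early = not can_continue(row, 0b100)  # bit 0 is missing, so stop
```

Comparing the masked value with the row itself, rather than with zero, keeps the check correct when a row requires more than one constraint.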

Example 29 The normalization procedure rewrites Q1 = SEQ(Recycle, Washing, ! SEQ(Sharpening, Disinfection, Checking), Operating) into the expression in Figure 5.5. Figure 5.6(a) shows the shared instance stacks for all three expressions. Figures 5.6(b) and 5.6(c) show the EMap and RVI structures, respectively. The negative component for E1 is (! Checking), for E2 (! Disinfection, Checking) (Checking is not a positive component as it is not listed in the projection list), and for E3 (! Sharpening, Disinfection, Checking). When event instance o20 of type Operating arrives, the sequence construction is initiated. When evaluating the partial result <w5, o20>, we mark the cell under (! S, D, C) in RVI as “1” because <d6, c16> exists between w5 and o20 and no Sharpening event si with 5 < i < 6 exists. Similarly, the cells under (! D, C) and (! C) are marked “0”, as shown in Figure 5.6(c). We continue the result construction for E3 because the AND of the bits in the result vector RVI in Figure 5.6(c) with the row for E3 in the EMap in Figure 5.6(b) is “1”. Result computation for E1 and E2 is stopped early by Lemma 3 because the AND of such bits is “0”.

SEQ(Recycle, Washing, ! Checking, Operating) OR
Proj_{R, W, O} SEQ(Recycle, Washing, ! Disinfection, Checking, Operating) OR
Proj_{R, W, O} SEQ(Recycle, Washing, ! Sharpening, Disinfection, Checking, Operating)

Figure 5.5: Normalized Expression for Q1

[Figure: (a) Shared Instance Stacks: Recycle (r1, r2), Washing (w5), Sharpening (s4, s10, s12), Disinfection (d6), Checking (c16), Operating (o20). (b) Expression-vs-Negative Map (EMap): rows E1, E2, E3; columns (! S, D, C), (! D, C), (! C) at position (W, O). (c) Result Vector Indicator (RVI) when evaluating partial result <w5, o20>: (! S, D, C) = 1, (! D, C) = 0, (! C) = 0.]

Figure 5.6: Bit-Marking Example
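The bit-marking of Example 29 can be replayed with a short Python sketch. It rests on stated assumptions: the subscripts of the event instances are treated as timestamps, each constraint is checked by a simplified reading of the normalized expressions in Figure 5.5, and the EMap rows are stored as lists of required columns. All names are hypothetical.

```python
# Timestamps of the relevant instances; the partial result <w5, o20>
# spans the open interval (5, 20).
stacks = {"S": [4, 10, 12], "D": [6], "C": [16]}
start, end = 5, 20


def between(event_type, lo, hi):
    """Timestamps of event_type strictly inside the open interval (lo, hi)."""
    return [t for t in stacks[event_type] if lo < t < hi]


# (!C): no Checking between w5 and o20.
not_c = int(not between("C", start, end))

# (!D, C): some Checking with no Disinfection between w5 and it.
not_d_c = int(any(not between("D", start, c) for c in between("C", start, end)))

# (!S, D, C): some Disinfection followed by a Checking, with no Sharpening
# between w5 and that Disinfection.
not_s_d_c = int(any(
    not between("S", start, d) and bool(between("C", d, end))
    for d in between("D", start, end)
))

rvi = {"!S,D,C": not_s_d_c, "!D,C": not_d_c, "!C": not_c}

# Sparse form of the EMap rows: the columns each expression requires.
emap = {"E1": ["!C"], "E2": ["!D,C"], "E3": ["!S,D,C"]}
survivors = [e for e, required in emap.items() if all(rvi[c] for c in required)]
print(rvi, survivors)
```

Under these assumptions only E3 survives for <w5, o20>, which agrees with the example: c16 violates (! C), d6 precedes c16 and so violates (! D, C), and no Sharpening event falls between w5 and d6.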

Lemma 4 No duplicate results will be produced because we conduct sequence construction only once for all expressions in a group.

Proof: We output a sequence result for a group of shared expressions S if and only if ∃ Ei in S for which the logical AND of the bit vector of the row for the sub-expression Ei with the current result’s RVI is “1”. Each sequence result is output only once for a group of shared expressions. This implies that all the non-existence constraints in at least one of the clustered expressions are
