Indexing - Preprocessing Step - General Boolean Expressions in Publish-Subscribe Systems

4.2 Preprocessing Step

4.2.3 Indexing

Having outlined the encoding scheme for subscription trees, we now describe the second part of preprocessing: the indexing of subscriptions, including their predicates. Once this step has been performed for a subscription s, s is registered with the system, that is, s is included in the event filtering process described in Section 4.3.

The overall indexing step consists of two parts, predicate indexing and subscription indexing.

Predicate Indexing

Predicate indexing utilizes one-dimensional index structures. They are specialized with respect to a certain attribute domain and, if necessary, target the efficient implementation of one particular filter function for this domain. For example, equality predicates on integer or float domains could utilize hash tables as index structures; Patricia trees can be used for string domains. Domains of a fixed enumerable size allow for the development of specialized, highly efficient data structures, for example, those described in [AJL02]. The goal of predicate indexing is to provide the filtering algorithm with the means to efficiently determine all predicates that are fulfilled by an incoming event message.
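The following Python sketch is for illustration only. It shows two one-dimensional indexes for a single attribute, each specialized for one filter function: a hash table (a Python dict) for equality predicates, and a sorted list searched by binary search for `<` predicates. The class names `EqualityIndex` and `LessThanIndex` are hypothetical. The sorted list stands in for whatever specialized structure a real system would use. The point is only that each index returns the identifiers of all predicates that a given attribute value fulfills.

```python
from bisect import bisect_right, insort
from collections import defaultdict


class EqualityIndex:
    """Hypothetical hash-based index for predicates `attribute = operand`."""

    def __init__(self):
        self._by_operand = defaultdict(list)  # operand -> list of predicate ids

    def insert(self, operand, pred_id):
        self._by_operand[operand].append(pred_id)

    def matching(self, value):
        # A single hash lookup yields every equality predicate fulfilled by value.
        return list(self._by_operand.get(value, []))


class LessThanIndex:
    """Hypothetical sorted index for numeric predicates `attribute < operand`."""

    def __init__(self):
        self._entries = []  # (operand, pred_id) pairs, kept sorted by operand

    def insert(self, operand, pred_id):
        insort(self._entries, (operand, pred_id))

    def matching(self, value):
        # value < operand holds exactly for the operands greater than value,
        # which form a suffix of the sorted list.
        start = bisect_right(self._entries, (value, float("inf")))
        return [pred_id for _, pred_id in self._entries[start:]]


# Usage on hypothetical predicates over the attributes Condition and Price.
condition_eq = EqualityIndex()
condition_eq.insert("new", 1)
condition_eq.insert("used", 2)

price_lt = LessThanIndex()
price_lt.insert(100, 3)
price_lt.insert(50, 4)

print(condition_eq.matching("used"))  # [2]
print(sorted(price_lt.matching(60)))  # [3]
```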

Before indexing, each predicate p is assigned a unique identifier id(p). If subscriptions contain common predicates, that is, predicates specifying the same attribute name an, filter function f, and operand op, these predicates in different subscriptions are assigned the same predicate identifier.

The identification of common predicates is accomplished by a lookup in the respective predicate index. If p is not indexed yet, it is inserted into the index structure and associated with a new predicate identifier id(p). Otherwise, if p is found in the index, the already assigned identifier is used.
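Here is a minimal sketch of this identifier assignment. It assumes that the triple (attribute name, filter function, operand) uniquely describes a predicate. For brevity, one dictionary stands in for the lookup in the respective predicate index. The class name `PredicateRegistry` is hypothetical.

```python
from itertools import count


class PredicateRegistry:
    """Hypothetical helper: one identifier per distinct predicate."""

    def __init__(self):
        self._ids = {}          # (attribute, filter function, operand) -> id(p)
        self._fresh = count(1)  # source of new predicate identifiers

    def register(self, attribute, function, operand):
        key = (attribute, function, operand)
        if key not in self._ids:  # p not indexed yet: assign a new identifier
            self._ids[key] = next(self._fresh)
        return self._ids[key]     # common predicate: reuse its identifier


registry = PredicateRegistry()
a = registry.register("Price", "<", 100)
b = registry.register("Condition", "=", "new")
c = registry.register("Price", "<", 100)  # same predicate in another subscription
print(a, b, c)  # 1 2 1
```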

Predicate indexing also records, in a predicate-subscription association table, how predicates are used by subscriptions. This table stores information about the occurrence of each predicate p in subscriptions. For this purpose, each subscription s is assigned a unique subscription identifier id(s). The predicate-subscription association table thus maps predicate identifiers to sets of subscription identifiers, that is, it stores (id(p_i), {id(s_j), ..., id(s_l)}) tuples (see Figure 4.4 on page 108 for a graphic illustration). If a subscription s contains the same predicate p several times (e.g., as in Subscription Class 2 in Figure 3.2, page 81), p is associated with s more than once in this table. Thus, the subscription set is in fact a multiset. A similar table is used in the conjunctive counting algorithm (see Section 2.3.3, page 40).
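The multiset semantics can be sketched in Python with one `Counter` per predicate identifier. This records a subscription that contains the same predicate twice with multiplicity 2. The names `association` and `associate` are hypothetical.

```python
from collections import Counter, defaultdict

# Predicate-subscription association table: id(p) -> multiset of id(s).
association = defaultdict(Counter)


def associate(pred_id, sub_id):
    """Record one occurrence of predicate pred_id in subscription sub_id."""
    association[pred_id][sub_id] += 1


# Subscription 10 contains predicate 1 twice, subscription 11 contains it once.
associate(1, 10)
associate(1, 10)
associate(1, 11)
print(association[1])  # Counter({10: 2, 11: 1})
```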

Subscription Indexing

The second part of the indexing step is subscription indexing. The subscription index structures created during this process allow the filtering algorithm to efficiently determine all subscriptions that are fulfilled by an incoming event message (see Section 4.3 for a description of the filtering algorithm).

Firstly, the subscription indexing process encodes the filter expression of a subscription s, as described in Section 4.2.2. The memory address loc(s) of the encoded subscription tree of s is then stored in the subscription location table, mapping subscription identifiers to memory addresses. This table thus stores (id (s), loc(s)) tuples for all indexed subscriptions s (see bottom part of Figure 4.4 for a graphic illustration of this table).

Secondly, subscription indexing calculates a subscription-specific property, pmin(s): the minimal number of predicates that must be fulfilled for subscription s to be fulfilled (referred to in short as the minimal number of fulfilled predicates). The value of this property is inserted into the minimum predicate count vector, which stores (s, pmin(s)) tuples.
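Both per-subscription structures can be sketched as follows. The sketch assumes dense integer subscription identifiers starting at 0, so that the minimum predicate count vector can be a plain list indexed by id(s). A Python object reference stands in for the memory address loc(s). The encoded tree is an opaque placeholder for the encoding of Section 4.2.2. The class name `SubscriptionIndex` is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SubscriptionIndex:
    """Hypothetical container for the two per-subscription tables."""

    # Subscription location table: id(s) -> loc(s); an object reference
    # stands in for the memory address of the encoded subscription tree.
    location: Dict[int, object] = field(default_factory=dict)
    # Minimum predicate count vector indexed by id(s); 0 marks unused slots,
    # since every registered subscription has pmin(s) >= 1.
    min_predicates: List[int] = field(default_factory=list)

    def register(self, sub_id: int, encoded_tree: object, pmin: int) -> None:
        self.location[sub_id] = encoded_tree
        missing = sub_id + 1 - len(self.min_predicates)
        if missing > 0:  # grow the vector so that sub_id is a valid index
            self.min_predicates.extend([0] * missing)
        self.min_predicates[sub_id] = pmin


index = SubscriptionIndex()
index.register(0, b"<encoded tree of s0>", 4)
index.register(2, b"<encoded tree of s2>", 1)
print(index.min_predicates)  # [4, 0, 1]
```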

For each subscription s, pmin(s) can be calculated recursively by analyzing the structure of its filter expression, as encoded in the subscription tree. Algorithm 1 shows the pseudocode for calculating this property for a node n of a subscription tree. It works as follows:

• For a leaf node nl, pmin(nl) is equal to 1 (Line 3 of Algorithm 1).

• For a disjunctive node nd, pmin(nd) equals the minimum value of pmin(nj) for all children nj of nd (Lines 5 to 9).

• For a conjunctive node nc, pmin(nc) is the sum of the values pmin(nj) of all children nj of nc (Lines 11 to 12).
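Taken together, these three rules define pmin recursively. They can be restated compactly as the following recurrence, where children(n) denotes the set of child nodes of n (this notation is ours):

```latex
p_{\min}(n) =
\begin{cases}
1 & \text{if } n \text{ is a leaf node},\\[2pt]
\min_{c \in \mathrm{children}(n)} p_{\min}(c) & \text{if } n \text{ is a disjunctive node},\\[2pt]
\sum_{c \in \mathrm{children}(n)} p_{\min}(c) & \text{if } n \text{ is a conjunctive node}.
\end{cases}
```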

Algorithm 1: Calculation of the minimal number of fulfilled predicates

Input: A node n of a subscription tree
Output: The minimal number of fulfilled predicates pmin(n)

GetMinPredicates(n)
(1)  result ← 0
(2)  if n is a leaf node
(3)      result ← 1
(4)  else if n is a disjunctive node
(5)      foreach c in n.children
(6)          if result = 0
(7)              result ← GetMinPredicates(c)
(8)          else
(9)              result ← min(result, GetMinPredicates(c))
(10) else if n is a conjunctive node
(11)     foreach c in n.children
(12)         result ← result + GetMinPredicates(c)
(13) return result
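Algorithm 1 translates directly into a recursive Python function. This sketch assumes a hypothetical `Node` class whose `kind` field distinguishes leaf, disjunctive (`"or"`), and conjunctive (`"and"`) nodes. It differs from Algorithm 1 in two ways. First, it uses Python's `min()` and `sum()` over the children in place of the result = 0 sentinel of Lines 6 to 9. Second, it raises an error for unknown node kinds, where Algorithm 1 would silently return 0. The usage part rebuilds a tree shaped like the one in Figure 4.3. Which leaf predicates sit under n9 and n10 is assumed for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical subscription tree node."""

    kind: str  # "leaf", "or" (disjunctive), or "and" (conjunctive)
    children: List["Node"] = field(default_factory=list)
    label: str = ""  # predicate text of a leaf, for readability only


def get_min_predicates(n: Node) -> int:
    """Return pmin(n), the minimal number of fulfilled predicates below n."""
    if n.kind == "leaf":
        return 1  # Line 3: a leaf is a single predicate
    if n.kind == "or":
        # Lines 5-9: fulfilling the cheapest child fulfills the disjunction.
        return min(get_min_predicates(c) for c in n.children)
    if n.kind == "and":
        # Lines 11-12: all children of a conjunction must be fulfilled.
        return sum(get_min_predicates(c) for c in n.children)
    raise ValueError(f"unknown node kind: {n.kind!r}")


def leaf(label: str) -> Node:
    return Node("leaf", label=label)


# Tree shaped like Figure 4.3; the pairing of leaves under n9 and n10 is assumed.
n9 = Node("and", [leaf("Condition = new"), leaf("Price < B")])
n10 = Node("and", [leaf("Condition = used"), leaf("Price < C")])
n8 = Node("or", [n9, n10])
n7 = Node("and", [leaf("Title ~ A"), n8, leaf("Ending < 1 day")])
print(get_min_predicates(n7))  # 4
```

The printed value matches pmin(s) = pmin(n7) = 4 from Example 4.5.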

The minimal number of fulfilled predicates of a subscription s, pmin(s), is equal to the value of this property for the root node n of the filter expression of s, that is, pmin(s) = pmin(n). We illustrate this calculation in the following example:

Example 4.5 (Calculation of pmin(s) for subscription s). Figure 4.3 illustrates the structure of subscriptions of Subscription Class 1 and names all 10 nodes of the subscription tree. The calculation of pmin(s) for a subscription s of this class proceeds as follows:

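Figure 4.3: Subscription tree of Subscription Class 1 with named nodes. [Figure: root n7 (AND) with children n1, n8 (OR), and n2; n8 has the conjunctive (AND) children n9 and n10, whose children are the leaves n3, n4 and n5, n6, respectively. Leaf predicates: Ending < 1 day, Title ~ A, Price < B, Price < C, Condition = new, Condition = used.]

• For the leaf nodes, it holds pmin(n1) = pmin(n2) = pmin(n3) = pmin(n4) = pmin(n5) = pmin(n6) = 1.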

• For the conjunctive nodes above the leaf level (n9 and n10), it holds pmin(n9) = pmin(n3) + pmin(n4) = 2 and pmin(n10) = pmin(n5) + pmin(n6) = 2.

• For the disjunctive node n8, it holds pmin(n8) = min(pmin(n9), pmin(n10)) = min(2, 2) = 2.

• For the conjunctive root node n7, it holds pmin(n7) = pmin(n1) + pmin(n8) + pmin(n2) = 1 + 2 + 1 = 4.

• Finally, for subscription s it holds pmin(s) = pmin(n7) = 4.

Once both the subscription indexing process and the previously described predicate indexing process have been performed for a subscription s, BoP considers s in its event filtering algorithm, described in the following section.
