
One of the design goals of our filtering approach is to provide a solution for general-purpose content-based pub-sub systems. There are also specific extensions and optimizations to the previously presented generic filtering algorithm that significantly improve its performance.

4.5.1 Pure Conjunctive Subscriptions

If subscribers register pure conjunctive subscriptions, BoP handles them with only a little overhead compared to the specialized counting algorithm. In the registration process, BoP analyzes and encodes (see Section 4.2.2) the filter expression of each subscription s. If s is a pure conjunctive subscription, the structural component of the encoded root node of the subscription tree contains an operator identifier that is different from the operator identifier of an ordinary conjunction. The root node is the only inner node in case of conjunctive subscriptions. The filtering algorithm subsequently avoids the evaluation of s in final subscription matching, that is, the evaluation method for the subscription tree just returns true without accessing the leaf nodes.

BoP can apply this method because, in a conjunctive subscription s, pmin(s) is always equal to the total number of predicates of s. Hence every conjunctive candidate constitutes a fulfilled subscription. The minimal overhead of our filtering approach, in comparison to the counting approach, is to retrieve the memory address of the subscription tree, consulting the subscription location table once. Our experiments confirmed that there is only a marginal overhead of a fraction of a millisecond per filtered event message for processing up to 300,000 subscriptions.
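The following minimal sketch illustrates the idea under simplifying assumptions. The operator identifiers, the `Node` class, and the function names are hypothetical and do not reflect the byte-level encoding of Section 4.2.2; the `stats` counter exists only to make leaf accesses observable.

```python
from dataclasses import dataclass, field

# Hypothetical operator identifiers; OP_PURE_AND marks the root of a
# pure conjunctive subscription and differs from an ordinary OP_AND.
LEAF, OP_AND, OP_OR, OP_PURE_AND = 0, 1, 2, 3


@dataclass
class Node:
    op: int
    predicate_id: int = -1              # only meaningful for leaves
    children: list = field(default_factory=list)


def evaluate(node, fulfilled, stats):
    """Evaluate a subscription tree against the set of fulfilled predicate ids."""
    if node.op == OP_PURE_AND:
        # A pure conjunctive candidate is always fulfilled: no leaf is accessed.
        return True
    if node.op == LEAF:
        stats["leaf_accesses"] += 1
        return node.predicate_id in fulfilled
    results = [evaluate(child, fulfilled, stats) for child in node.children]
    return all(results) if node.op == OP_AND else any(results)


def final_matching(hit_count, p_min, root, fulfilled, stats):
    """A subscription is a candidate only if hit_count reaches p_min."""
    if hit_count < p_min:
        return False
    return evaluate(root, fulfilled, stats)


# Conjunctive subscription with three predicates, hence p_min = 3.
s = Node(OP_PURE_AND, children=[Node(LEAF, 1), Node(LEAF, 2), Node(LEAF, 3)])
stats = {"leaf_accesses": 0}
print(final_matching(3, 3, s, {1, 2, 3}, stats), stats["leaf_accesses"])
```

With three fulfilled predicates the subscription becomes a candidate and is reported as fulfilled without a single leaf access; with fewer hits it never reaches the evaluation step.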

4.5.2 Short-Circuiting

For general Boolean subscriptions, BoP applies a short-circuiting optimization. However, due to the memory-aware encoding scheme of subscription trees (see Section 4.2.2), full short-circuiting can only be applied to root nodes of subscription trees. Inner nodes use partial short-circuiting, that is, nodes are not fully bypassed but only accessed to determine their width in bytes. BoP thus avoids the evaluation of Boolean expressions and the access of the fulfilled predicate vector. For root nodes, on the other hand, BoP applies the full bypass method.

As presented in Section 4.2.2, we also experimented with an alternative encoding scheme that stores the widths of the children of a node and thus allows for full bypassing of any nodes [BH05b]. This alternative scheme requires more memory resources, but led to the same efficiency properties as the applied scheme in empirical experiments.

4.5.3 Order of Children

BoP applies a routing optimization (see Chapter 6) that estimates the selectivity of the nodes of subscription trees. The filtering algorithm uses this information and re-orders the children of a node according to the selectivity estimate. For conjunctions, BoP orders children by increasing selectivity. It is thus more likely to detect a non-fulfilled candidate early in the evaluation process (in final subscription matching). For disjunctions, children are arranged by decreasing selectivity estimate. Hence, BoP determines fulfilled candidates early and avoids their further evaluation.
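A minimal sketch of this re-ordering is given below. It assumes that the selectivity estimate of a node is the estimated fraction of events that fulfill it, which is an interpretation chosen for this illustration; the `Node` class and the `reorder` function are likewise hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    op: str                       # "AND", "OR", or "P" for a predicate leaf
    selectivity: float            # assumed: estimated fraction of fulfilling events
    children: list = field(default_factory=list)


def reorder(node):
    """Recursively sort children so that short-circuiting triggers early."""
    for child in node.children:
        reorder(child)
    if node.op == "AND":
        # Least likely children first: a false child ends the evaluation.
        node.children.sort(key=lambda c: c.selectivity)
    elif node.op == "OR":
        # Most likely children first: a true child ends the evaluation.
        node.children.sort(key=lambda c: c.selectivity, reverse=True)


leaves = [Node("P", 0.5), Node("P", 0.1), Node("P", 0.9)]
conjunction = Node("AND", 0.045, list(leaves))
disjunction = Node("OR", 0.955, list(leaves))
reorder(conjunction)
reorder(disjunction)
print([c.selectivity for c in conjunction.children])
print([c.selectivity for c in disjunction.children])
```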

4.5.4 Filtering Shortcut

All approaches in this dissertation work with the subscription or advertisement-forwarding scheme as routing algorithm (see Section 2.4.5, page 46), depending on the application of advertisements. This allows for the implementation of a shortcut optimization to avoid the evaluation of most candidates in final subscription matching. The same shortcut can be applied if subscribers that have registered several subscriptions only need to be notified about matching events, but not about which of their subscriptions are fulfilled by the message.

BoP uses a hash table (mapping a neighbor broker to a Boolean value) to record whether any non-local subscription that was forwarded by a particular neighbor broker is fulfilled by the incoming event message e. Because e needs to be routed to a neighbor regardless of how many of the forwarded subscriptions are fulfilled, BoP only needs to evaluate the respective candidates until one fulfilled subscription is found. Proceeding in that way avoids the evaluation of the majority of candidate subscriptions in the distributed pub-sub system. The same approach can also be used for subscribers with the properties described before. A shortcut optimization that inspired this approach was proposed in [CW03] in combination with subscriptions restricted to disjunctive normal form (treating a set of conjunctive subscriptions as one subscription).
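The shortcut can be sketched as follows. The function name, the representation of candidates as (neighbor, subscription) pairs, and the `is_fulfilled` callback are assumptions made for this illustration.

```python
def neighbors_to_notify(candidates, is_fulfilled):
    """Return the neighbors that must receive the event and the number of
    subscription evaluations that were actually performed.

    candidates:   iterable of (neighbor, subscription) pairs
    is_fulfilled: callable performing final subscription matching
    """
    matched = {}                          # neighbor broker -> True once fulfilled
    evaluated = 0
    for neighbor, subscription in candidates:
        if matched.get(neighbor, False):
            continue                      # shortcut: neighbor already receives e
        evaluated += 1
        if is_fulfilled(subscription):
            matched[neighbor] = True
    return set(matched), evaluated


candidates = [("B1", "s1"), ("B1", "s2"), ("B2", "s3"), ("B1", "s4"), ("B2", "s5")]
fulfilled = {"s1", "s5"}
print(neighbors_to_notify(candidates, lambda s: s in fulfilled))
```

In this example only three of the five candidates are evaluated, because every further candidate forwarded by B1 is skipped once s1 is found to be fulfilled.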

4.5.5 Minimal Number of Fulfilled Predicates

The calculation algorithm for the minimal number of fulfilled predicates pmin(s) (see Section 4.2.3) is based on the structure of subscription trees. Considering the semantics of subscriptions, however, can lead to a larger value for pmin(s). Generally, the higher pmin(s) for subscription s, the less frequently s occurs as a candidate subscription in the filtering process. Hence the overall filtering performance is improved for larger values of pmin(s).

To exemplify the potential increase of pmin(s), let us consider a subscription s of Subscription Class 2, for example, s2 (see Section 3.3.1, page 80). According to Section 4.2.3, it holds that pmin(s) = 5. However, one can derive that pmin(s) = 6 when considering the semantics of s2 (Figure 3.2, page 81): every fulfilled subscription has to specify either a used or a new book copy. For a new copy, predicates p5 and p9 are always fulfilled, whereas predicates p8 and p12 are always fulfilled for used book copies. Either one of these two conditions always holds in practice, leading to two fulfilled predicates. Hence the system still works correctly if pmin(s) is increased by one, leading to pmin(s) = 6.

The general goal of this optimization is to incorporate semantic dependencies among predicates into subscriptions.
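The effect of semantic dependencies on pmin(s) can be illustrated with a small sketch. It rests on several assumptions: the structural rule used here (sum over conjunctions, minimum over disjunctions, no negations) is a common formulation and stands in for the algorithm of Section 4.2.3, the example subscription is hypothetical and is not s2, and the exhaustive enumeration is only feasible for very small subscriptions.

```python
from itertools import product

# A tree is either a predicate name (str) or a tuple (op, child, child, ...).


def p_min_structural(tree):
    """Assumed structural rule: AND sums its children, OR takes the minimum."""
    if isinstance(tree, str):
        return 1
    op, *children = tree
    values = [p_min_structural(c) for c in children]
    return sum(values) if op == "AND" else min(values)


def evaluate(tree, assignment):
    if isinstance(tree, str):
        return assignment[tree]
    op, *children = tree
    results = [evaluate(c, assignment) for c in children]
    return all(results) if op == "AND" else any(results)


def predicates(tree):
    if isinstance(tree, str):
        return {tree}
    return set().union(*(predicates(c) for c in tree[1:]))


def p_min_semantic(tree, is_possible):
    """Smallest number of fulfilled predicates over all assignments that can
    occur in practice (is_possible) and that fulfill the subscription."""
    names = sorted(predicates(tree))
    best = None
    for values in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, values))
        if is_possible(assignment) and evaluate(tree, assignment):
            count = sum(values)
            best = count if best is None else min(best, count)
    return best


# Hypothetical subscription: (price<20 AND is_new) OR price<10.
s = ("OR", ("AND", "price<20", "is_new"), "price<10")


def price_dependency(a):
    # Semantic dependency: price<10 implies price<20.
    return a["price<20"] or not a["price<10"]


print(p_min_structural(s), p_min_semantic(s, price_dependency))
```

In this hypothetical example the structural value is 1, whereas the dependency between the two price predicates guarantees at least two fulfilled predicates for every fulfilled subscription.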

So far, we have not integrated this extended semantic analysis of subscriptions into BoP. We plan to do so in the future.

4.5.6 Exploiting Event Types

Event types, on the one hand, define the semantics of subscriptions. On the other hand, one can exploit these types to improve the filtering process: messages can only match subscriptions if they specify the same type (see Definition 4.4, page 98). A filtering algorithm can thus neglect subscriptions of any type other than the one stated by the event message. This restriction is automatically exploited in predicate matching (only predicate indexes of attributes of the respective type are evaluated, cf. Section 4.3.1). However, candidate subscription matching (see Section 4.3.2), in the generic way we described previously, offers some optimization potential.

The general idea for this optimization is to compact the hit vector, populated in candidate subscription matching, in order to reduce the number of comparisons that are required to identify candidates. A way of compacting this structure, while still using an efficient array implementation, is to use an advanced handling of subscription identifiers. Firstly, these identifiers contain two parts, one stating the event type and one stating a unique identifier for this type. This allows a specialized hit vector (as an array) to only contain entries for one event type. Secondly, subscription identifiers should not contain holes, that is, the identifier space should be densely populated. This can be achieved by reissuing these identifiers to subscribers, or by adding another level of indirection, that is, internal identifiers differ from those used by subscribers. We plan to fully integrate this extension into BoP in the future.
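One possible realization of the indirection variant is sketched below. The class and method names are hypothetical, and the swap-with-last removal is merely one way of keeping the per-type identifier space free of holes; it is not taken from BoP.

```python
class TypedSubscriptionIndex:
    """Maps external subscription ids to (event type, dense local index)."""

    def __init__(self):
        self.slot = {}       # external id -> (event type, local index)
        self.ext_ids = {}    # event type -> external ids, position = local index
        self.p_min = {}      # event type -> p_min values, position = local index

    def register(self, ext_id, event_type, p_min):
        ids = self.ext_ids.setdefault(event_type, [])
        self.slot[ext_id] = (event_type, len(ids))
        ids.append(ext_id)
        self.p_min.setdefault(event_type, []).append(p_min)

    def unregister(self, ext_id):
        event_type, idx = self.slot.pop(ext_id)
        ids, mins = self.ext_ids[event_type], self.p_min[event_type]
        last_id, last_min = ids.pop(), mins.pop()
        if idx < len(ids):                # removed entry was not the last one
            ids[idx], mins[idx] = last_id, last_min
            self.slot[last_id] = (event_type, idx)

    def candidates(self, event_type, hits):
        """hits: external ids of subscriptions of event_type, one occurrence
        per fulfilled predicate (as produced by predicate matching)."""
        ids = self.ext_ids.get(event_type, [])
        mins = self.p_min.get(event_type, [])
        hit_vector = [0] * len(ids)       # covers one event type only
        for ext_id in hits:
            hit_vector[self.slot[ext_id][1]] += 1
        return [ids[i] for i, h in enumerate(hit_vector) if h >= mins[i]]


index = TypedSubscriptionIndex()
index.register("a", "book", 2)
index.register("b", "auction", 1)
index.register("c", "book", 1)
print(index.candidates("book", ["a", "c"]))
index.unregister("a")
print(index.candidates("book", ["c"]), len(index.ext_ids["book"]))
```

The hit vector for an event of type "book" has as many entries as there are book subscriptions, independent of the number of subscriptions registered for other types, and it stays dense after a subscription is removed.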