
2.5 Current Routing Optimizations

2.5.5 Implications in Practice

Having analyzed existing routing optimizations in the previous subsections, we identify three common problems in recent approaches:

1. Current optimizations are only applicable to restricted conjunctive subscription and advertisement languages. That is, the requirement to support more general subscriptions and advertisements is not fulfilled.

2. The optimization potential of current optimizations depends on the existing relationships among the registered subscriptions or advertisements. That is, current optimizations are only practically applicable in restricted application scenarios.

3. The benefit of current optimizations needs to be paid back when deregistering subscriptions or advertisements. That is, current optimizations assume relatively static subscription patterns.

The development of a routing optimization that targets only one of these problems already constitutes a valuable contribution to current practice. We make such a contribution in this dissertation by presenting two optimization approaches in Chapter 6 and Chapter 7. These approaches not only target one of the identified problems, but tackle all three of these current shortcomings. Based on our novel approaches and their evaluation, we can verify the second part of our central hypothesis (page 6).

Part 1 of this hypothesis regards the unsuitability of canonical conversion for filtering algorithms in general-purpose pub-sub systems. We already hinted, both in this chapter and in Chapter 1, at the problems that occur when current conjunctive solutions are applied to general Boolean expressions. In the following section, we look into these problems in more detail.

2.6 Influences of Canonical Conversion

It is common knowledge that a general Boolean expression, such as one included in a subscription or an advertisement, can be rewritten into a canonical form. We refer to this rewriting process as canonical conversion. A candidate for a canonical form is the disjunctive normal form [Men97]. For content-based pub-sub systems, it is questionable whether a canonical conversion approach should be taken for subscriptions and advertisements.

Database Management Systems. In database management systems, the restricting clause in database queries is typically internally converted into a canonical form before its execution. Queries are rewritten by database management systems to allow for a common starting point to perform query optimization [JK84]. This optimization is then applied to the conjunctive components of a disjunctive normal form [KMPS94] by employing a selection of predefined conversion rules. Finally, the database management system creates access plans for different ways of processing the query and executes the cheapest plan [JK84].

Pub-Sub Systems. The conversion is already implicitly applied in content-based pub-sub systems if taking the data storage view (see Section 2.1.1, page 15): the transient counterparts to database queries, event messages, are restricted to a canonical property. They are defined as attribute-value pairs with default conjunctive semantics.

Content-based pub-sub systems thus build on the foundation of database management systems with respect to the canonical property of transient data. The conversion of subscriptions and advertisements (stored data), however, does not have an equivalent in database management systems; in these systems, it would correspond to the conversion of all data to a predefined canonical form, such as a flat-file format.

Main Problem: Explosion in Complexity. Our main argument against the practice of converting general Boolean subscriptions or advertisements into disjunctive normal forms is its influence on the memory requirements for their storage (and indexing): a disjunctive normal form, in the worst case, is exponential in size compared to the equivalent general Boolean expression. This implication is consistently acknowledged in the pub-sub area [CCC+01, MFB02].
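The worst case can be made concrete with a small sketch. The tuple encoding of expressions, the function names, and the test expression (a1 or b1) and ... and (an or bn) are assumptions chosen for illustration; they are not taken from any of the cited systems. The sketch omits negation and does not simplify the resulting disjunctive normal form.

```python
from itertools import product

# Hypothetical encoding (an assumption of this sketch):
#   ("pred", name) | ("and", e1, e2, ...) | ("or", e1, e2, ...)

def to_dnf(expr):
    """Return the DNF as a set of conjunctions (frozensets of predicate names)."""
    kind = expr[0]
    if kind == "pred":
        return {frozenset([expr[1]])}
    children = [to_dnf(child) for child in expr[1:]]
    if kind == "or":
        # A disjunction of DNFs is the union of their conjunctions.
        return set().union(*children)
    if kind == "and":
        # A conjunction distributes: pick one conjunction from every child.
        return {frozenset().union(*combo) for combo in product(*children)}
    raise ValueError(f"unknown node type: {kind}")

def count_predicates(expr):
    """Number of predicate leaves in the unconverted expression."""
    if expr[0] == "pred":
        return 1
    return sum(count_predicates(child) for child in expr[1:])

def worst_case(n):
    """Build (a0 or b0) and ... and (a(n-1) or b(n-1)), which has 2n predicates."""
    clauses = [("or", ("pred", f"a{i}"), ("pred", f"b{i}")) for i in range(n)]
    return ("and", *clauses)

for n in (2, 4, 8):
    expr = worst_case(n)
    dnf = to_dnf(expr)
    stored = sum(len(conjunction) for conjunction in dnf)
    # Columns: n, predicates before conversion, conjunctions, predicates after
    print(n, count_predicates(expr), len(dnf), stored)
```

For this family of expressions, the original expression holds 2n predicates, whereas its disjunctive normal form consists of 2^n conjunctions with n predicates each.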

An exponential increase in size might not occur that often in practice. However, even relatively small increases in complexity already favor the use of general Boolean subscriptions and advertisements over the equivalent canonical form. We show this property throughout this dissertation.

The underlying reason for the inappropriateness of conversion in pub-sub systems is found in (i) the opposite problem definitions in content-based pub-sub and database management systems, and (ii) the opposing application of canonical conversion in these systems. A database management system deals with a small number of transient and canonically converted queries at one point in time. Instead, in a content-based pub-sub system, a large number of stored and canonically converted subscriptions is registered, and they need to be continuously matched against incoming messages (transient and canonical by definition).

The increase in resources (both memory and computational) required in pub-sub systems when performing conversion is thus, in absolute terms, much higher than in the case of simultaneously executing, for example, a two-digit number of database queries at one point in time. Additionally, pub-sub systems lack sufficient solutions to optimize subscriptions in general application settings (see Section 2.3), even though this optimization is the reason for conversion in database management systems.

The increased complexity when converting subscriptions and advertisements affects the main algorithms for pub-sub systems, including the filtering algorithm, the routing algorithm, and the overlapping calculation algorithm.

Consequences: Scalability. The filtering algorithm in pub-sub systems is applied in each individual broker component; its scalability is mainly determined by the memory requirements (space-scalability, see Section 2.2). Canonical conversion increases the size of subscriptions, and thus their memory requirements for storage and indexing. Hence the scalability of individual brokers decreases. Even though there exists some redundancy among converted subscriptions, current filtering algorithms cannot exploit this property (see Section 2.3). A Boolean filtering approach, on the other hand, does not convert in the first place.
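To illustrate what working on unconverted subscriptions means, the following minimal sketch evaluates a general Boolean subscription directly against an event message given as attribute-value pairs. It is a naive recursive evaluation, not one of the filtering algorithms discussed in this dissertation; the tuple encoding, the operator table, and the attribute names are hypothetical.

```python
import operator

# Comparison operators supported by this sketch (an assumption, not a fixed language).
OPS = {
    "==": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}

def matches(expr, event):
    """Evaluate a Boolean subscription tree against an event (dict of attribute-value pairs)."""
    kind = expr[0]
    if kind == "pred":
        _, attribute, op, value = expr
        # An event that lacks the attribute cannot fulfil the predicate.
        return attribute in event and OPS[op](event[attribute], value)
    if kind == "and":
        return all(matches(child, event) for child in expr[1:])
    if kind == "or":
        return any(matches(child, event) for child in expr[1:])
    if kind == "not":
        return not matches(expr[1], event)
    raise ValueError(f"unknown node type: {kind}")

# Hypothetical subscription: category == "books" and (price < 10 or bids > 5)
subscription = (
    "and",
    ("pred", "category", "==", "books"),
    ("or", ("pred", "price", "<", 10), ("pred", "bids", ">", 5)),
)

print(matches(subscription, {"category": "books", "price": 25, "bids": 7}))
print(matches(subscription, {"category": "books", "price": 25, "bids": 2}))
```

The subscription is stored once, with three predicates, and is registered as a single entry; a conjunctive system would instead register the two conjunctions of its disjunctive normal form.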

The specifications of publishers (advertisements) need to be handled similarly to subscriptions in pub-sub systems [Müh02]. Hence comparable problems and implications regarding scalability arise when converting advertisements.

The applied routing algorithm distributes subscriptions and advertisements as routing entries within the broker network. Thus, if subscriptions and advertisements increase in their overall size, the respective routing tables become larger. The effects of canonical conversion on central brokers are therefore multiplied in the overall network due to the distribution of subscriptions and advertisements. Besides these effects of canonical conversion on memory requirements, the network load for distributing subscriptions and advertisements increases, also affecting overall system scalability (see Figure 2.9, page 27).

Consequences: Efficiency. The influence of canonical conversion on system efficiency is twofold. On the one hand, filtering algorithms specialized for conjunctive subscriptions exploit the property of only handling conjunctions, and thus do not need to consider the Boolean combination of predicates in subscriptions (see Section 2.3). The same advantageous property holds for the algorithms to calculate the overlap between subscriptions and advertisements. On the other hand, as an argument against canonical conversion, the size of the problem that needs to be solved by the filtering algorithm or the overlapping calculation algorithm increases. Firstly, conjunctive algorithms need to work on more subscriptions and advertisements (due to their conversion). Secondly, the overall number of predicates within the converted subscriptions or advertisements is much higher.

For the overlapping algorithm, these influences are more severe than for the filtering algorithm. Both subscriptions and advertisements are converted canonically. Hence, both inputs to the algorithm, potentially, are exponential in size, resulting in a multiple explosion of the problem size.
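As a rough illustration of this compounding effect, suppose that a subscription s converts into 2^n conjunctions and an advertisement a into 2^m conjunctions, and that a conjunctive overlapping algorithm tests every converted subscription against every converted advertisement. These are assumptions made only for this sketch; the symbols n, m, and C(s, a) are introduced here and do not appear elsewhere in the text. The number of pairwise tests is then

```latex
\[
  C(s, a) \;=\; |\mathrm{DNF}(s)| \cdot |\mathrm{DNF}(a)|
          \;=\; 2^{n} \cdot 2^{m}
          \;=\; 2^{\,n+m} .
\]
```

Under these assumptions, the exponents of both inputs add up, whereas the unconverted pair consists of exactly one subscription and one advertisement.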

The same overall argument can be applied to the routing task. Routing table entries, on an individual basis, are less complex after conversion than before conversion. That is, routing entries contain fewer predicates, which are conjunctively combined by definition. However, the number of routing entries increases exponentially in the worst case, due to canonical conversion.

Conclusions. We give an overview of the identified, twofold influences of canonical conversion on the event filtering, event routing, and overlapping tasks in Figure 2.14. Advantages of conversion are presented on the left-hand side whereas disadvantages are shown on the right-hand side of the figure.

Contemplating the depicted dual effects of canonical conversion instantly raises the question of the benefit of solely conjunctive content-based pub-sub systems. They are clearly advantageous if an application area only requires conjunctive subscriptions and advertisements. However, for scenarios necessitating general Boolean subscriptions and advertisements, this benefit evidently degrades and even transforms into a drawback. Within this dissertation, we show this behavior, and the general advantages of supporting general Boolean subscriptions and advertisements.

Filtering task
  Advantage: individual subscription more efficient to filter
  Disadvantage: more subscriptions to filter
Routing task
  Advantage: less complex individual routing entries
  Disadvantages: more routing entries; more subscriptions and advertisements to distribute
Overlapping task
  Advantage: less complex individual subscriptions and advertisements
  Disadvantage: relation of more subscriptions and advertisements required

Figure 2.14: Overview of the influences of canonical conversion on event filtering task, routing task, and overlapping task.

2.7 Summary

Within this chapter, we introduced the general concepts and algorithms for content-based pub-sub systems. Furthermore, we started to analyze recent approaches and to identify their implications.

Content-based pub-sub systems show a range of similarities to database management systems, but there are also fundamental differences between these two kinds of systems. Their most severe dissimilarity is in the vast number of simultaneously registered subscriptions in pub-sub compared to only a moderate number of concurrently processed queries in database management systems. Current content-based pub-sub systems further increase not only the number of registered subscriptions but also the number of registered advertisements due to their sole support of conjunctive expressions. General Boolean subscriptions and advertisements thus need to be converted into disjunctive normal forms to become processable.

This canonical conversion has major influences on the scalability characteristics of content-based pub-sub systems and also on their efficiency properties. Conversion affects the filtering and overlapping calculation tasks in central broker components, as well as the routing tasks within the distributed system. Solutions to these tasks thus experience an explosion in their memory requirements due to the conversion approach taken. The effect of this increased memory use is a degradation of overall system scalability. Regarding efficiency, the influences of conversion are twofold. They decrease the complexity of individual subproblems that need to be solved, but they strongly increase the overall problem size.

The immediate question emerging out of these observations is whether canonical conversion is a suitable operation in content-based pub-sub systems. Within this dissertation, we make a case for the application of content-based pub-sub systems that internally work on general Boolean subscriptions and advertisements. We do so by providing the required filtering, overlapping calculation, and routing solutions supporting these expressions. With the help of our proposals, we show that systems for application scenarios involving general Boolean subscriptions and advertisements can benefit from these more compact expressions: their support leads to an extended system scalability and system efficiency. We introduce one potential application scenario, serving as a running example throughout, in the following chapter.

Application Scenario: Online Auctions

In this chapter, we introduce an example application scenario for content-based pub-sub systems: online auctions. We gave an initial illustration of some pub-sub functionalities in this scenario in Example 1.1 (page 3). Generally, active notification mechanisms, as offered by pub-sub systems, are highly desirable in online auctions to allow for an efficient dissemination of process-related information [CB02]. We further elaborate on online auctions in general and the benefits of integrating pub-sub mechanisms in Section 3.1.

Subsequently, we analyze the patterns of typical event messages for online auctions (Section 3.2). This is followed by the definition of exemplary subscriptions (Section 3.3) and advertisements (Section 3.4) for this scenario. We use these instances throughout this dissertation to better describe and exemplify our approaches, to apply the developed models, and finally to practically analyze and evaluate our proposals. To further enhance this chapter, we sketch other valuable application scenarios for content-based pub-sub systems in Section 3.5.

The event distributions and the subscription and advertisement examples we present in the following sections are based on our analysis¹ of auction items on eBay². We restricted this analysis to book auctions, in particular to fiction books offered in the United States. Our results allow for the derivation of a typical event load in online auction settings (as we show in Section 3.2.3).

Combining these typical event distributions with our example subscriptions and advertisements allows for the experimental evaluation of our approaches using this semi-realistic scenario (see Chapter 5 and Chapter 8). This is a valuable advantage over recent evaluations, mostly using purely artificial test settings. The assumptions made to create these artificial workloads are rather strong and hardly ever described in detail. This circumstance does not allow for the repeatability of experiments or comparative evaluations of different approaches by different researchers. This chapter is intended to close this gap, and to describe and provide the foundations of a more realistic test setting.

¹ The analysis was undertaken on July 8, 2005.
² http://www.ebay.com/