
The majority of existing work on content-based pub-sub systems focuses on subscriptions and advertisements in conjunctive form. Therefore, subscriptions and advertisements using operators other than conjunction are not directly supported by these systems. However, various applications, in particular more high-level areas such as electronic commerce settings, require subscriptions and advertisements in a general Boolean form (see Chapter 3).

To approach the support of general Boolean subscriptions and advertisements, it is typically argued that systems only need to support conjunctions because any general Boolean expression can be converted to disjunctive normal form. Then, each conjunctive element of such a form can be treated as an individual subscription or advertisement by the system (provided it supports more than one subscription or advertisement per client). At first glance, this argument appears to be sound and was provided, for example, by Mühl and Fiege [MF01], and by Pietzuch [Pie04].
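The conversion argument can be illustrated with a minimal sketch. The tuple encoding of expressions, the function name `to_dnf`, and the example predicates are assumptions made for illustration only; they are not taken from the cited systems, and negation is deliberately left out.

```python
from itertools import product

# Hypothetical encoding (an assumption for this sketch): a predicate is a
# string; a compound expression is a tuple ("and", e1, e2, ...) or
# ("or", e1, e2, ...). Negation is not handled.
def to_dnf(expr):
    """Return the DNF of expr as a set of conjunctions.

    Each conjunction is a frozenset of predicate strings.
    """
    if isinstance(expr, str):
        return {frozenset([expr])}
    op, *args = expr
    parts = [to_dnf(a) for a in args]
    if op == "or":
        # The DNF of a disjunction is the union of its operands' conjunctions.
        return set().union(*parts)
    if op == "and":
        # Distribute "and" over "or": one conjunction per combination of
        # one conjunction taken from each operand.
        return {frozenset().union(*combo) for combo in product(*parts)}
    raise ValueError(f"unknown operator: {op!r}")

# Illustrative general Boolean subscription with two disjunctions.
subscription = (
    "and",
    ("or", "price < 100", "seller = acme"),
    ("or", "category = books", "category = music"),
)

# Each element would be registered as an individual conjunctive subscription.
conjunctions = to_dnf(subscription)
for c in sorted(sorted(c) for c in conjunctions):
    print(" AND ".join(c))
```

In this sketch, one subscription with four predicates becomes four conjunctive subscriptions with two predicates each, all of which a conjunction-only system would have to store and match separately.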

However, on examining the influence of conversion in content-based pub-sub systems more closely, it is questionable whether the conversion approach is a suitable means for these systems. Already one of the fundamental works in the pub-sub area by Yan and García-Molina [YGM94], targeting the selective dissemination of information (SDI, as introduced by Salton [Sal68]), addresses the implications of the required conversion when only conjunctions are supported. Yan and García-Molina argue that the handling of general subscriptions as disjunctive normal forms may not be the most efficient processing strategy for subscriptions containing disjunctions [YGM94]⁸. However, within their SIFT system [YGM99] they apply the conversion approach and leave the required investigation of the influence of conversion to future work. Research analyzing the effects of conversion has not been undertaken so far, either by Yan and García-Molina or by other researchers.

Instead of investigating the suitability of conversion, the majority of subsequent work in the pub-sub area (i.e., after the seminal work of Yan and García-Molina [YGM94]) has built on their approach without scrutinizing the suitability of converting general Boolean subscriptions and advertisements. As identified previously, various application areas intuitively require disjunctions. For these systems, an investigation of the advantages and disadvantages of supporting general Boolean expressions is even more pressing than for systems targeting the original, pure text-based SDI approach, which allows for the handling of the majority of the existing disjunctions in a specialized way [YGM94]. General content-based pub-sub systems do not offer such an opportunity for handling disjunctions.

The consequences of the conversion approach for pub-sub systems are twofold:

1. Disjunctive normal forms require more memory for storage. They are, in fact, exponential in size in the worst case compared to the original general Boolean form. These memory requirements directly influence the scalability of pub-sub systems (see Section 2.2).

2. The advantageous effect of optimizing algorithms with respect to conjunctions (as done by Yan and García-Molina, and most subsequent work on content-based pub-sub systems) is counterbalanced by the overall increase in the number of subscriptions, and thus the overall increase in the size of the problem to process, after conversion. Even though algorithms might need to compute the result of a common subexpression only once, this result has to be incorporated into all subscriptions and advertisements containing the subexpression.
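The worst case behind these two consequences can be made concrete with a short worked example. The subscription $s$ below is a hypothetical one chosen for illustration: a conjunction of $n$ two-way disjunctions over $2n$ distinct predicates $a_i$ and $b_i$.

```latex
\[
  s \;=\; \bigwedge_{i=1}^{n} \left( a_i \lor b_i \right)
\]
% Distributing \land over \lor yields one conjunction for every way of
% choosing either a_i or b_i from each of the n clauses:
\[
  \mathrm{DNF}(s) \;=\;
  \bigvee_{(c_1,\dots,c_n) \,\in\, \{a_1,b_1\} \times \cdots \times \{a_n,b_n\}}
  \;\bigwedge_{i=1}^{n} c_i
\]
% Size before and after conversion:
\[
  \underbrace{2n}_{\text{predicates in } s}
  \qquad \text{versus} \qquad
  \underbrace{2^{n}}_{\text{conjunctions}} \cdot
  \underbrace{n}_{\text{predicates each}}
  \;=\; n \cdot 2^{n}
\]
```

For $n = 10$, a single subscription with 20 predicates turns into $2^{10} = 1024$ conjunctive subscriptions containing $10 \cdot 1024 = 10240$ predicate occurrences in total. Because all predicates are distinct, none of these conjunctions subsumes another, so none can be dropped.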

⁸ SDI is one of the historically “original” terms for what evolved into pub-sub systems [Hin03]. Solutions to the filtering problem in the SDI area have been applied to the filtering problem in the content-based pub-sub area [CW03]. The implications and drawbacks of these solutions thus remain in content-based pub-sub systems.

These two effects may not disadvantage systems that perform only a small number of conversions at a given time. For example, database management systems effectively apply the conversion of queries to normal forms [JK84]. The conversion approach is reasonable in these systems because they evaluate only a few queries simultaneously; queries are transient in these systems. Additionally, database management systems apply query optimization algorithms based on the converted form. Content-based pub-sub systems, however, typically hold large numbers of subscriptions and advertisements, inherently creating a high system load. These subscriptions and advertisements are stored by the system at all times. Additionally, existing pub-sub approaches for general application settings cannot optimize based on the converted forms as database management systems do.

Hence, the suitability of a conversion approach in these systems is questionable: conversion inflates an already large problem size without enabling an advantageous optimization later on (such optimization being the motivation for conversion in database management systems). We elaborate on the advantages and disadvantages of conversion for the algorithms in content-based pub-sub systems in Section 2.6.

The general topic of conversion to disjunctive normal form and how this influences content-based pub-sub systems recurs throughout this dissertation. Within this work, we will answer the question as to the usefulness of conversion in a step-by-step manner. We outline the contributions of this dissertation in the following section.