
4.5.3 Specialized Filter Objects

Another possibility is to use specialized filter objects, an approach that can also be combined with class filters. Such a filter implements a match method that evaluates whether a notification matches the filter instance. It can also implement methods for covering and merging. Figure 4.12 shows the implementation of a QuoteFilter in Java. Note that such filters can also be built on top of a more generic filter library that offers, for example, set-oriented filters.
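To illustrate what a set-oriented filter from such a generic library might look like, the following is a minimal sketch that assumes a notification has already been reduced to the single attribute value being tested. The class `SetFilter` and its methods are hypothetical names chosen for this example, not an API defined in this work. Covering reduces to set inclusion, and merging to set union, which yields a perfect merger.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical set-oriented filter: accepts a value if it belongs to a fixed set.
final class SetFilter<T> {
    private final Set<T> accepted;

    SetFilter(Collection<? extends T> values) {
        this.accepted = Set.copyOf(values); // immutable defensive copy
    }

    // Matching: the value must be one of the accepted values.
    boolean match(T value) {
        return accepted.contains(value);
    }

    // Covering: this filter covers other if it accepts every value other accepts.
    boolean covers(SetFilter<? extends T> other) {
        return accepted.containsAll(other.accepted);
    }

    // Merging: the union of the accepted sets accepts exactly the values
    // accepted by at least one input filter (a perfect merger).
    @SafeVarargs
    static <T> SetFilter<T> merge(SetFilter<T>... filters) {
        Set<T> union = new HashSet<>();
        for (SetFilter<T> f : filters) {
            union.addAll(f.accepted);
        }
        return new SetFilter<>(union);
    }
}

class SetFilterDemo {
    public static void main(String[] args) {
        SetFilter<String> tech = new SetFilter<>(Set.of("IBM", "MSFT"));
        SetFilter<String> ibm = new SetFilter<>(Set.of("IBM"));
        SetFilter<String> sap = new SetFilter<>(Set.of("SAP"));
        SetFilter<String> merged = SetFilter.merge(tech, sap);
        System.out.println(tech.covers(ibm));    // true
        System.out.println(ibm.covers(tech));    // false
        System.out.println(merged.match("SAP")); // true
    }
}
```

A QuoteFilter like the one in Figure 4.12 could delegate its covering and merging to such a class instead of manipulating symbol sets directly.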

public class QuoteFilter {
    // The constructor, getSymbolSet() and unionOfSymbolSets() are omitted;
    // Event and QuoteEvent are notification types defined elsewhere.

    // This filter covers qf if its symbol set is a superset of qf's symbol set.
    public boolean covers(QuoteFilter qf) {
        return getSymbolSet().isSuperSet(qf.getSymbolSet());
    }

    // The merger accepts the union of the symbol sets of all given filters.
    public static QuoteFilter merge(QuoteFilter[] qf) {
        return new QuoteFilter(QuoteFilter.unionOfSymbolSets(qf));
    }

    // A notification matches if it is a quote for one of this filter's symbols.
    public boolean match(Event e) {
        if (!(e instanceof QuoteEvent)) return false;
        return getSymbolSet().contains(((QuoteEvent) e).getSymbol());
    }
}

Figure 4.12: Implementation of a QuoteFilter in Java

4.6 Related Work

Support of Routing Optimizations. Elvin [101] supports quenching, in which notifications are first evaluated against a broader subscription that covers the disjunction of all subscriptions, but no algorithms for quenching are described.
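The two-stage evaluation behind quenching can be sketched as follows. Since no algorithm is described for Elvin, everything here is an illustrative assumption: notifications carry one numeric value, subscriptions are closed intervals, and the broad filter is the smallest interval containing all subscriptions. The names `RangeSub` and `QuenchingBroker` are hypothetical and do not reflect Elvin's interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interval subscription lo <= x <= hi over a numeric attribute.
record RangeSub(double lo, double hi) {
    boolean match(double x) { return lo <= x && x <= hi; }
}

final class QuenchingBroker {
    private final List<RangeSub> subscriptions = new ArrayList<>();
    // Broad filter covering the disjunction of all subscriptions (null if none).
    private RangeSub cover;

    void subscribe(RangeSub r) {
        subscriptions.add(r);
        cover = (cover == null) ? r
              : new RangeSub(Math.min(cover.lo(), r.lo()), Math.max(cover.hi(), r.hi()));
    }

    // Returns the matching subscriptions. Notifications outside the cover are
    // rejected by a single test without looking at individual subscriptions.
    List<RangeSub> match(double x) {
        List<RangeSub> result = new ArrayList<>();
        if (cover == null || !cover.match(x)) {
            return result; // quenched
        }
        for (RangeSub r : subscriptions) {
            if (r.match(x)) result.add(r);
        }
        return result;
    }
}
```

The cover is imprecise: a value falling into a gap between subscriptions passes the first stage but matches nothing, so the broad filter only saves work for notifications far outside all subscriptions.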

Siena [19, 17] exploits covering among filters and uses overlapping of filters to support advertisements. The data/filter model of Siena is similar to structured records but allows multiple attribute filters on the same attribute. If multiple attribute filters are imposed on an attribute, they are interpreted differently for subscriptions and advertisements: for a subscription they are interpreted conjunctively, whereas for an advertisement they are interpreted disjunctively. Hence, the model is not symmetric. This choice causes no problems with the simple types and operators supported in Siena, but it hinders routing optimizations for more complex types and operators. Algorithms that determine covering or overlapping among filters are not presented.
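The asymmetry can be made concrete with numeric constraints on a single attribute. In the following sketch, which is an illustration and not Siena's implementation, each constraint is assumed to be a closed interval (so `x >= c`, `x <= c` and `x = c` are special cases). Read conjunctively, the constraints of a subscription denote the intersection of their intervals; read disjunctively, those of an advertisement denote the union. The names `Bound`, `SienaStyle`, `conjunction` and `overlaps` are hypothetical.

```java
import java.util.List;

// Hypothetical closed-interval constraint lo <= x <= hi on one numeric attribute.
record Bound(double lo, double hi) {
    boolean isEmpty() { return lo > hi; }

    Bound intersect(Bound o) {
        return new Bound(Math.max(lo, o.lo), Math.min(hi, o.hi));
    }
}

final class SienaStyle {
    // Subscription reading: all constraints must hold, so intersect them.
    static Bound conjunction(List<Bound> constraints) {
        Bound acc = new Bound(Double.NEGATIVE_INFINITY, Double.POSITIVE_INFINITY);
        for (Bound b : constraints) acc = acc.intersect(b);
        return acc;
    }

    // Advertisement reading: any constraint may hold, i.e. the union of intervals.
    // The advertisement overlaps the subscription if some advertised interval
    // intersects the subscription's (conjunctive) interval.
    static boolean overlaps(List<Bound> advertisement, List<Bound> subscription) {
        Bound sub = conjunction(subscription);
        for (Bound ad : advertisement) {
            if (!ad.intersect(sub).isEmpty()) return true;
        }
        return false;
    }
}
```

The same two constraints thus denote an interval when they appear in a subscription and a union of intervals when they appear in an advertisement, which is why one filter cannot simply be reused in both roles.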

Matching Algorithms. Yan and Garcia-Molina [120] describe several matching algorithms, including the predicate counting, the key, and the tree algorithm, in the context of text documents that are matched against keyword-based profiles. They also present performance results obtained by simulation.
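The core idea of predicate counting for keyword profiles can be sketched briefly. The sketch assumes that a profile is a non-empty conjunction of keywords, so a document matches a profile if it contains all of them; an inverted index maps each keyword to the profiles that use it, and a counter per profile is incremented for every distinct document word found in the index. The class `CountingMatcher` is a hypothetical name and a simplification, not the authors' code.

```java
import java.util.*;

// Hypothetical counting matcher for conjunctive keyword profiles
// (profiles are assumed to be non-empty).
final class CountingMatcher {
    private final Map<String, List<Integer>> index = new HashMap<>(); // keyword -> profile ids
    private final List<Integer> sizes = new ArrayList<>();            // profile id -> #keywords

    int addProfile(Set<String> keywords) {
        int id = sizes.size();
        sizes.add(keywords.size());
        for (String k : keywords) {
            index.computeIfAbsent(k, key -> new ArrayList<>()).add(id);
        }
        return id;
    }

    // A profile matches once its counter reaches its number of keywords.
    List<Integer> match(Collection<String> documentWords) {
        int[] counts = new int[sizes.size()];
        List<Integer> matched = new ArrayList<>();
        for (String w : new HashSet<>(documentWords)) { // count each word once
            for (int id : index.getOrDefault(w, List.of())) {
                if (++counts[id] == sizes.get(id)) matched.add(id);
            }
        }
        return matched;
    }
}
```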

Fabre et al. [37] and Pereira et al. [89] present matching algorithms that exploit similarities among predicates. In a first step, the satisfied predicates are computed; after that, the number of satisfied predicates of each subscription is counted using an association table. Two variants of this algorithm are described, which incorporate special treatment of equality tests and of constraints having only inequality tests.
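The two phases can be sketched as follows. Identical predicates are stored only once, and an association table maps each predicate to the subscriptions that contain it. As an assumption about what special treatment of equality tests might look like (not the published variants), equality predicates are found by a single hash lookup per event attribute, while the remaining predicates, here only `<` tests on integer attributes, are evaluated one by one. The names `Pred` and `TwoPhaseMatcher` are hypothetical.

```java
import java.util.*;

// Hypothetical test on one integer attribute: op is '=' or '<'.
record Pred(String attr, char op, int value) {}

final class TwoPhaseMatcher {
    private final Map<Pred, Integer> predIds = new HashMap<>();  // identical predicates are shared
    private final List<Pred> nonEquality = new ArrayList<>();    // evaluated one by one
    private final List<List<Integer>> assoc = new ArrayList<>(); // predicate id -> subscription ids
    private final List<Integer> subSizes = new ArrayList<>();    // subscription id -> #predicates

    int subscribe(List<Pred> conjunction) {
        int sub = subSizes.size();
        subSizes.add(conjunction.size());
        for (Pred p : conjunction) {
            Integer id = predIds.get(p);
            if (id == null) {
                id = assoc.size();
                predIds.put(p, id);
                assoc.add(new ArrayList<>());
                if (p.op() != '=') nonEquality.add(p);
            }
            assoc.get(id).add(sub);
        }
        return sub;
    }

    List<Integer> match(Map<String, Integer> event) {
        // Phase 1: collect the ids of all satisfied predicates.
        List<Integer> satisfied = new ArrayList<>();
        for (Map.Entry<String, Integer> e : event.entrySet()) {
            Integer id = predIds.get(new Pred(e.getKey(), '=', e.getValue())); // one hash lookup
            if (id != null) satisfied.add(id);
        }
        for (Pred p : nonEquality) {
            Integer v = event.get(p.attr());
            if (v != null && v < p.value()) satisfied.add(predIds.get(p));
        }
        // Phase 2: count satisfied predicates per subscription via the association table.
        int[] counts = new int[subSizes.size()];
        List<Integer> matched = new ArrayList<>();
        for (int id : satisfied) {
            for (int sub : assoc.get(id)) {
                if (++counts[sub] == subSizes.get(sub)) matched.add(sub);
            }
        }
        return matched;
    }
}
```

Because each shared predicate is evaluated only once per event, the cost of phase 1 depends on the number of distinct predicates rather than on the number of subscriptions.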

Hanson et al. [54] present a predicate matching algorithm for database rule systems that indexes the most selective predicate of each rule, as determined by the query optimizer. They use a special indexing data structure, the interval binary search tree, to support the efficient evaluation of interval tests.
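The principle of indexing only the most selective predicate can be sketched as follows: each rule is registered under one predicate, an event probes the index for that predicate's attribute, and only the rules found there are checked against their remaining predicates. Two simplifications are assumptions of this sketch and not part of the described algorithm: the narrowest interval stands in for the optimizer's selectivity estimate, and a linear scan per attribute stands in for the interval binary search tree. The names `IntervalTest` and `SelectivePredicateIndex` are hypothetical.

```java
import java.util.*;

// Hypothetical interval test lo <= event[attr] <= hi.
record IntervalTest(String attr, double lo, double hi) {
    boolean holds(Map<String, Double> event) {
        Double v = event.get(attr);
        return v != null && lo <= v && v <= hi;
    }
}

final class SelectivePredicateIndex {
    private record Entry(IntervalTest indexed, List<IntervalTest> rule, int ruleId) {}

    private final Map<String, List<Entry>> byAttribute = new HashMap<>();
    private int nextId = 0;

    // Rules must be non-empty. The narrowest interval is used as a stand-in
    // for the optimizer's estimate of the most selective predicate.
    int addRule(List<IntervalTest> rule) {
        IntervalTest best = Collections.min(rule,
                Comparator.comparingDouble((IntervalTest t) -> t.hi() - t.lo()));
        int id = nextId++;
        byAttribute.computeIfAbsent(best.attr(), k -> new ArrayList<>())
                   .add(new Entry(best, rule, id));
        return id;
    }

    List<Integer> match(Map<String, Double> event) {
        List<Integer> result = new ArrayList<>();
        for (String attr : event.keySet()) {
            // Linear scan as a stand-in for an interval search structure.
            for (Entry e : byAttribute.getOrDefault(attr, List.of())) {
                if (e.indexed().holds(event)
                        && e.rule().stream().allMatch(t -> t.holds(event))) {
                    result.add(e.ruleId());
                }
            }
        }
        return result;
    }
}
```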

[...] automata theory. They show how a set of conjunctions of predicates, each dependent on exactly one attribute, can be transformed into a deterministic finite state automaton. The paper considers different types of test predicates and derives complexity results. Their algorithm is very efficient, but its worst-case space complexity is exponential. The proposed solution is also not suited for dynamic environments, because the automaton has to be rebuilt from scratch whenever subscriptions change.

Pu et al. [70, 112] present indexing strategies for continual queries based on trigger patterns. In particular, they describe a strategy that uses an index on the most selective predicate. More complex indexing strategies exploit similarities among trigger patterns to reduce processing costs. They restrict optimizations to constraints on a single attribute that involve at most one constant.

Gryphon uses the content-based matching algorithm presented by Aguilera et al. [1]. This algorithm traverses a parallel search tree in which non-leaf nodes correspond to simple tests, edges leaving non-leaf nodes represent test results, and leaf nodes are associated with matched subscriptions. Banavar et al. [8] present a multicast routing algorithm that executes the matching algorithm at each broker. The algorithm presented is limited to equality tests.
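A simplified parallel search tree for equality tests can be sketched as follows; it illustrates the traversal described above and is not Aguilera et al.'s implementation. Attributes are assumed to be tested in a fixed order, each node has one edge per tested value plus a "don't care" edge for subscriptions without a test on that attribute, and matching follows both the edge for the event's value and the "don't care" edge, which is what makes the search parallel. `ParallelSearchTree` and its methods are hypothetical names.

```java
import java.util.*;

// Hypothetical parallel search tree for equality tests over a fixed attribute order.
final class ParallelSearchTree {
    private static final class Node {
        final Map<String, Node> byValue = new HashMap<>();      // edge per test result
        Node star;                                              // "don't care" edge
        final List<Integer> subscriptions = new ArrayList<>();  // used at leaves only
    }

    private final List<String> attributes;
    private final Node root = new Node();

    ParallelSearchTree(List<String> attributes) { this.attributes = attributes; }

    // constraints maps attribute -> required value; missing attributes are "don't care".
    void insert(int subscriptionId, Map<String, String> constraints) {
        Node n = root;
        for (String a : attributes) {
            String v = constraints.get(a);
            if (v == null) {
                if (n.star == null) n.star = new Node();
                n = n.star;
            } else {
                n = n.byValue.computeIfAbsent(v, k -> new Node());
            }
        }
        n.subscriptions.add(subscriptionId);
    }

    List<Integer> match(Map<String, String> event) {
        List<Integer> out = new ArrayList<>();
        search(root, 0, event, out);
        return out;
    }

    // Follows both the edge for the event's value and the "don't care" edge.
    private void search(Node n, int depth, Map<String, String> event, List<Integer> out) {
        if (n == null) return;
        if (depth == attributes.size()) { out.addAll(n.subscriptions); return; }
        String v = event.get(attributes.get(depth));
        if (v != null) search(n.byValue.get(v), depth + 1, event, out);
        search(n.star, depth + 1, event, out);
    }
}
```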

Answering Queries using Views. Covering relations are known from database theory, in particular from the area of answering queries using views [53, 115]. There, the question is whether the result set of a given query Q can be obtained solely from a set of predefined views V whose elements can be combined by the usual relational operators, i.e., whether Q is covered by some combination of the views in V. Answering this question for relational expressions is NP-hard even without comparison operators. Even if only the union operator is allowed, this scenario is still more general than the one presented here. Although special cases have been investigated, no closely related approach seems to exist.
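In the simplest special case, with a single numeric attribute where the query and each view select a closed interval and only union is allowed, covering can be decided by sorting the views and sweeping over them. The following toy sketch illustrates only this one-dimensional case, not the general relational problem; `Span` and `UnionCover` are hypothetical names.

```java
import java.util.*;

// Hypothetical closed interval [lo, hi] standing in for a query or a view
// over a single numeric attribute.
record Span(double lo, double hi) {}

final class UnionCover {
    // True if q is contained in the union of the views (sort by lower bound, then sweep).
    static boolean coveredByUnion(Span q, List<Span> views) {
        List<Span> sorted = new ArrayList<>(views);
        sorted.sort(Comparator.comparingDouble(Span::lo));
        double need = q.lo(); // [q.lo, need] is covered except possibly the point need itself
        for (Span v : sorted) {
            if (v.hi() < need) continue;     // ends before the uncovered part
            if (v.lo() > need) return false; // gap between need and v.lo
            if (v.hi() >= q.hi()) return true;
            need = v.hi();
        }
        return false;
    }
}
```

Since the views are sorted by their lower bounds, a view starting beyond the first uncovered point proves a gap, because every later view starts even further to the right.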

Semantic Caching. Lee and Chu [65] describe a semantic caching algorithm for conjunctive point queries that exploits covering between conjunctive predicates to find cache entries that cover a given query. However, this work is restricted to point queries involving the equivalence and the like operators. Godfrey and Gryz [49] describe an architecture for predicate-based caching that is similar to answering queries using views. It is therefore not surprising that the problems their algorithms address are NP-complete, too. Keller and Basu [61] propose a predicate-based caching scheme for client/server database architectures. They perfectly merge predicates in the cache to obtain a more compact cache description and to speed up query processing. Their algorithm has exponential time complexity.
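For conjunctive equality predicates, covering between a cached query and a new query is simple: the cached query covers the new one if every equality test of the cached query also appears in the new query, because then every tuple satisfying the new query also satisfies the cached one. The following sketch illustrates this idea and is not Lee and Chu's algorithm. It assumes that queries are maps from attributes to required values and that cached tuples contain all queried attributes, and it ignores the like operator. `PointQueryCache` is a hypothetical name.

```java
import java.util.*;

// Hypothetical semantic cache for conjunctive equality (point) queries.
// A query maps attribute -> required value; a tuple maps attribute -> value.
final class PointQueryCache {
    private final Map<Map<String, String>, List<Map<String, String>>> entries = new LinkedHashMap<>();

    void put(Map<String, String> query, List<Map<String, String>> result) {
        entries.put(Map.copyOf(query), List.copyOf(result));
    }

    // cached covers query if every equality in cached also appears in query.
    static boolean covers(Map<String, String> cached, Map<String, String> query) {
        return query.entrySet().containsAll(cached.entrySet());
    }

    // Answers query by filtering the tuples of a covering entry,
    // or returns Optional.empty() on a cache miss.
    Optional<List<Map<String, String>>> answer(Map<String, String> query) {
        for (var e : entries.entrySet()) {
            if (covers(e.getKey(), query)) {
                List<Map<String, String>> out = new ArrayList<>();
                for (Map<String, String> t : e.getValue()) {
                    if (t.entrySet().containsAll(query.entrySet())) out.add(t);
                }
                return Optional.of(out);
            }
        }
        return Optional.empty();
    }
}
```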

Query Merging. Crespo et al. [27] propose merging queries that are evaluated periodically against a database. As an example, they use geographical queries represented by rectangles. Before the queries are processed, a merging algorithm combines similar queries and outputs a set of merged queries whose answers contain all tuples of the original queries. Their aim is to find a cost-optimal set of mergers. They show that query merging is NP-complete in the general case and discuss optimal and heuristic algorithms.
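For rectangle queries, the bounding box of two rectangles is a merged query whose answer contains all tuples of both originals; its cost is the extra area it retrieves. The following greedy heuristic is an illustrative assumption, not one of Crespo et al.'s algorithms: it repeatedly merges the pair whose bounding box adds the least extra area, as long as that extra area stays below a threshold `maxWaste`. `Rect` and `QueryMerger` are hypothetical names.

```java
import java.util.*;

// Hypothetical axis-aligned rectangle query [x1, x2] x [y1, y2].
record Rect(double x1, double y1, double x2, double y2) {
    double area() { return (x2 - x1) * (y2 - y1); }

    Rect boundingBox(Rect o) {
        return new Rect(Math.min(x1, o.x1), Math.min(y1, o.y1),
                        Math.max(x2, o.x2), Math.max(y2, o.y2));
    }
}

final class QueryMerger {
    // Greedily merges the pair with the least extra area until that extra area
    // exceeds maxWaste. Each merged rectangle contains all tuples of the queries
    // it replaces. The "waste" can be negative when the rectangles overlap.
    static List<Rect> merge(List<Rect> queries, double maxWaste) {
        List<Rect> qs = new ArrayList<>(queries);
        while (true) {
            int bi = -1, bj = -1;
            double best = Double.POSITIVE_INFINITY;
            for (int i = 0; i < qs.size(); i++) {
                for (int j = i + 1; j < qs.size(); j++) {
                    Rect a = qs.get(i), b = qs.get(j);
                    double waste = a.boundingBox(b).area() - a.area() - b.area();
                    if (waste < best) { best = waste; bi = i; bj = j; }
                }
            }
            if (bi < 0 || best > maxWaste) return qs;
            Rect merged = qs.get(bi).boundingBox(qs.get(bj));
            qs.remove(bj); // bj > bi, so remove the larger index first
            qs.remove(bi);
            qs.add(merged);
        }
    }
}
```

After merging, each original query is answered by filtering the result of its merged query, which is what trades fewer database evaluations against larger intermediate results.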

Geometrical Algorithms. In the field of geometrical algorithms [95], problems such as polygon inclusion, intersection, and containment of convex polygons are investigated. These algorithms can be integrated with the work presented here to support efficient matching, covering, and merging of notifications containing geometric objects. Such objects are prevalent, for example, in geographical information systems.
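As an example of such an integration, matching a location against a filter for a convex region reduces to a point-in-convex-polygon test, and covering between such filters reduces to vertex containment, because a convex polygon contains another polygon exactly when it contains all of that polygon's vertices. The sketch assumes vertices given in counterclockwise order; a point lies inside or on the boundary if it lies to the left of or on every directed edge. `Pt` and `ConvexPolygon` are hypothetical names.

```java
// Hypothetical convex polygon with vertices in counterclockwise order.
record Pt(double x, double y) {}

final class ConvexPolygon {
    private final Pt[] v;

    ConvexPolygon(Pt... vertices) { this.v = vertices.clone(); }

    // Cross product of (b - a) and (p - a): >= 0 means p is left of or on edge a->b.
    private static double cross(Pt a, Pt b, Pt p) {
        return (b.x() - a.x()) * (p.y() - a.y()) - (b.y() - a.y()) * (p.x() - a.x());
    }

    // Matching: p is inside or on the boundary iff it is left of or on every edge.
    boolean contains(Pt p) {
        for (int i = 0; i < v.length; i++) {
            if (cross(v[i], v[(i + 1) % v.length], p) < 0) return false;
        }
        return true;
    }

    // Covering: a convex polygon contains another polygon iff it contains all its vertices.
    boolean covers(ConvexPolygon other) {
        for (Pt p : other.v) {
            if (!contains(p)) return false;
        }
        return true;
    }
}
```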

In document Large-Scale Content-Based Publish-Subscribe Systems (Page 121-124)