Viability: Performance Evaluation - Complex Event Processing as a Service in Multi-Cloud Environments

This section discusses the viability of AGeCEP as a formal foundation for developing generic CEP algorithms and management procedures. The analysis focuses on the time required to transform CEP queries using both simple and complex graph rewriting rules.

The AGG tool [143] was used to define and apply the graph rewriting rules. The experiments were conducted on a server with two six-core processors (Intel Xeon E5-2630, 2.6 GHz) and 96 GB of RAM, running Ubuntu Linux 14.04 and Java 1.7.0_75.

5.4.1 Simple Policy

The first experiment verified the execution time and scalability of the actions executed by the Comb policy (Section 5.3.1). This is a simple policy that consists of a single rewriting rule in which only two vertices are matched.
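To make the idea of a two-vertex rewriting rule concrete, the following is a minimal sketch of an operator-combination step on a toy graph. It is not the rule defined in Section 5.3.1 and it does not use AGG: the dictionary encoding, the function name `combine_filters`, and the example query (a producer, two filters, and a consumer) are assumptions made for illustration only.

```python
# Toy attributed graph: operators keyed by id, edges as (source, target) pairs.
# The encoding and all names are illustrative assumptions, not AGG structures.

def combine_filters(operators, edges):
    """Apply one combination step: fuse the first edge (a, b) whose endpoints
    are both combinable, where a has no other output and b has no other input.
    Returns (operators, edges, applied)."""
    for a, b in edges:
        if not (operators[a]["combinable"] and operators[b]["combinable"]):
            continue
        outputs_of_a = [e for e in edges if e[0] == a]
        inputs_of_b = [e for e in edges if e[1] == b]
        if len(outputs_of_a) != 1 or len(inputs_of_b) != 1:
            continue
        merged = f"{a}+{b}"
        new_ops = {k: v for k, v in operators.items() if k not in (a, b)}
        new_ops[merged] = {"impl": operators[a]["impl"], "combinable": True}
        new_edges = []
        for s, t in edges:
            if (s, t) == (a, b):
                continue  # the matched edge disappears with the fusion
            s = merged if s in (a, b) else s
            t = merged if t in (a, b) else t
            new_edges.append((s, t))
        return new_ops, new_edges, True
    return operators, edges, False


# Hypothetical query with a sequence of two combinable filters.
ops = {
    "p1": {"impl": "kafka", "combinable": False},
    "f1": {"impl": "filter", "combinable": True},
    "f2": {"impl": "filter", "combinable": True},
    "c1": {"impl": "service", "combinable": False},
}
query_edges = [("p1", "f1"), ("f1", "f2"), ("f2", "c1")]

new_ops, new_edges, applied = combine_filters(ops, query_edges)
```

In this sketch the match involves exactly two vertices (the two adjacent filters), which is why the cost of a single application stays small; the experiment measures how that cost accumulates over many queries.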

The total number of queries to which the rule was applied varied from 100 to 1000, and for each number, three query compositions were tested. In the first composition, 20% of the queries were clones of query q1 (Figure 4.6a), and 80% were clones of q2 (Figure 4.6b). In the second and third compositions, query q1 represented 50% and 80% of the total queries respectively. Note that only query q1 has a sequence of combinable filters f1 and f2.

The graph in Figure 5.7 shows the average execution time of 30 runs along with the 99% confidence interval. The growth in execution time is close to linear. For all three compositions, 100 queries were processed in less than one second, and 1000 queries in less than 14 seconds. For the 80% composition, this is equivalent to rewriting 800 queries according to the operator combination policy.
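As an illustration of how an average over 30 runs and its 99% confidence interval can be computed, the sketch below uses a Student's t interval. The source does not state which interval construction was used, so the t-based formula is an assumption, and the timing values are synthetic placeholders rather than measured data.

```python
import math
import statistics

# Two-sided 99% critical value of Student's t for 29 degrees of freedom,
# i.e. valid only for samples of exactly 30 runs.
T_CRIT_99_DF29 = 2.756


def mean_with_ci(samples, t_crit=T_CRIT_99_DF29):
    """Return (mean, half_width) of a t-based confidence interval."""
    n = len(samples)
    mean = statistics.mean(samples)
    standard_error = statistics.stdev(samples) / math.sqrt(n)
    return mean, t_crit * standard_error


# Synthetic execution times in seconds (placeholders, not experimental results).
runs = [1.00 + 0.01 * (i % 5) for i in range(30)]
mean, half_width = mean_with_ci(runs)
print(f"{mean:.3f} s +/- {half_width:.3f} s")
```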

5.4.2 Complex Policy

This experiment verified the performance and scalability of complex sequences of actions. The analysis was divided into two parts. First, the execution time for applying the Dupl policy (Section 5.3.2) was assessed. Next, the execution time for applying Dupl followed by RemMS (Section 5.3.3) was analyzed.

Figure 5.7: Comb policy execution time.

Figure 5.8: Query q2 - optimized version. (Graph with producers p1 and p2, split jsonSplit, parsers j1_1 and j1_2, filters f1 and f2, merge xml1, and consumer c1.)

Both parts were executed using the same numbers of queries and the same query compositions as in the previous experiment. In this case, however, the duplication was applied only to the operator j1 belonging to query q2 clones (Figure 4.6b). Note that after j1 duplication, the newly created merge forms a void sequence with the fSplit operator. Applying RemMS therefore caused this sequence to be removed, resulting in the query depicted in Figure 5.8.

Figure 5.9a depicts the execution time of the Dupl policy as a function of the number of queries for all three compositions. For each duplication, four rewriting rules were applied: P_dupl^init1 once to create the two instances of j1 connected to a new split and merge; P_dupl^init2 twice to redirect j1 inputs (p1 and p2) to the new split; and P_dupl^init3 once to connect the new merge to the j1 successor (fSplit). For the 20% composition, 1000 queries were processed in less than 40

(a) Dupl execution time. (b) Dupl followed by RemMS execution time.

Figure 5.9: Dupl and RemMS policies execution times.

The execution time to apply Dupl followed by RemMS is shown in Figure 5.9b. To execute the RemMS policy, three more rewriting rules were applied: P_rem^byp twice to connect each instance of j1 to an instance of f12, and P_rem^sup once to remove the redundant merge and split. Therefore, Dupl followed by RemMS requires the application of seven rewriting rules in total. The graph clearly shows exponential growth that is especially pronounced in the 50% and 80% scenarios. In these scenarios, rewriting all queries may take minutes. Indeed, for the 80% scenario there are no data points for 900 and 1000 queries because the execution time exceeded the established timeout of 15 minutes.
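The rule counts given in the text (four rule applications per duplication, three more for RemMS, seven in total) can be tallied for a batch of rewritten queries as follows. This is simple bookkeeping under the assumption that each rewritten query undergoes exactly one duplication; the function name and the batch size of 100 are illustrative.

```python
# Rule applications per rewritten query, as described in the text.
DUPL_RULES = 1 + 2 + 1   # create instances with split/merge, redirect two inputs, connect merge
REMMS_RULES = 2 + 1      # bypass twice, then remove the redundant merge and split


def rule_applications(num_duplications):
    """Total rule applications, assuming one duplication per rewritten query."""
    return {
        "dupl": num_duplications * DUPL_RULES,
        "dupl_then_remms": num_duplications * (DUPL_RULES + REMMS_RULES),
    }


totals = rule_applications(100)
print(totals)
```

The number of rule applications grows only linearly with the number of rewritten queries, so any faster growth in execution time has to come from the cost of each individual application, which includes finding a match in the graph.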

It is important to discuss these results under proper assumptions about how these rules will be applied in practice. Finding homomorphisms in graphs is a well-known NP-complete problem [55]. Nevertheless, most of the time, these rules will be applied to a much smaller number of queries. For example, SQO policies are executed in response to new queries, and therefore only those queries need to be analyzed. Similarly, most runtime management rewriting rules are applied only to the small subset of running queries that need to be rewritten. For instance, as described in Section 5.3.2, duplication is performed only after a bottleneck has been pinpointed. The extreme cases described in this section were investigated for theoretical purposes and for completeness of analysis.
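To give an intuition for why match finding becomes expensive as the host graph grows, the sketch below enumerates candidate vertex mappings by brute force. It is not how AGG searches for matches, and it checks only injective mappings, which is a simplification of the general homomorphism problem; the function names and the cloned toy query are assumptions for illustration.

```python
from itertools import permutations


def find_matches(pattern_edges, host_edges):
    """Enumerate injective vertex mappings that send every pattern edge
    to a host edge. Returns (matches, number_of_candidates_tried)."""
    pattern_nodes = sorted({v for e in pattern_edges for v in e})
    host_nodes = sorted({v for e in host_edges for v in e})
    host = set(host_edges)
    matches, tried = [], 0
    for image in permutations(host_nodes, len(pattern_nodes)):
        tried += 1
        mapping = dict(zip(pattern_nodes, image))
        if all((mapping[s], mapping[t]) in host for s, t in pattern_edges):
            matches.append(mapping)
    return matches, tried


def cloned_queries(k):
    """Host graph made of k disjoint copies of a hypothetical 4-operator query."""
    edges = []
    for i in range(k):
        edges += [(f"p{i}", f"f1_{i}"), (f"f1_{i}", f"f2_{i}"), (f"f2_{i}", f"c{i}")]
    return edges


pattern = [("a", "b")]  # a two-vertex pattern: one operator feeding another
matches_1, tried_1 = find_matches(pattern, cloned_queries(1))
matches_2, tried_2 = find_matches(pattern, cloned_queries(2))
```

With a two-vertex pattern the number of candidates grows quadratically in the number of host vertices, and it grows faster for patterns with more vertices. Restricting the search to the few queries that actually need rewriting keeps the host graph, and therefore the candidate space, small.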

5.5 Summary

To demonstrate the feasibility of AGeCEP for specification and enforcement of self-management policies, this chapter introduced the design of an autonomic manager based on AGeCEP and a selection of five policies built on this design. Furthermore, it investigated the viability of AGeCEP by executing performance measurements of query reconfigurations. By considering both expressiveness and performance, these results suggest that AGeCEP can be effectively used to develop algorithms for application and integration into diverse modern CEP systems.

The next chapter discusses CEPSim, a simulator of cloud-based CEP systems that uses AGeCEP as its query representation model.

Complex Event Processing Simulator

This chapter¹ introduces CEPSim, a simulator that has been developed to overcome the difficulties of evaluating CEP systems and of comparing query management and processing approaches. The chapter starts with a discussion about CEPSim motivation and benefits. Following this discussion, Sections 6.2 and 6.3 introduce CEPSim design principles and the foundational concepts on top of which CEPSim is built. Finally, the simulation algorithms and a thorough evaluation of CEPSim are presented in Sections 6.4 and 6.5.

6.1 Motivation

The resurgence of interest in CEP systems caused by the new Big Data world has been accompanied by the use of cloud environments as their runtime platform. Clouds are usually leveraged to provide the low latency and scalability needed by modern applications [25, 69, 128]. Other systems, such as the CEPaaS system proposed in this research, also explore cloud computing to facilitate offering CEP functionalities in the services model. In this context, the development of efficient operator placement and scheduling strategies is essential to achieve the required quality of service. However, validating these strategies at the required Big Data scale in a cloud environment is a difficult problem and constitutes a research problem per se.

First, cloud environments are subject to variations that make it difficult to reproduce the environment and conditions of an experiment [56]. Moreover, setting up and maintaining large cloud environments is laborious and error-prone, and may be associated with a high financial cost. Finally, there are also many challenges related to generating and storing the volume of data required by Big Data experiments.

Simulators have been used in many different fields to overcome the difficulty of executing repeatable and reproducible experiments. Early research into distributed systems [105] and grid computing [33] used simulators, as does the more recent field of cloud computing [34, 92, 119]. Generally, cloud computing simulators make it possible to model cloud environments and to simulate different workloads running on them. Nonetheless, these simulators are mostly based on application models and simulation algorithms that cannot properly represent the dynamics of CEP systems. To overcome these limitations, this research presents CEPSim, a flexible simulator of cloud-based CEP systems.

¹ The content of this chapter has been published as a conference paper [76] and as a journal paper [77].

CEPSim extends CloudSim [34] using a query model based on AGeCEP and introduces simulation algorithms based on a novel abstraction called event sets. CEPSim can be used to model different types of clouds, including public, private, hybrid, and multi-cloud environments, and to simulate execution of user-defined queries on them. In addition, it can also be customized with various operator placement and scheduling strategies. These features enable system architects and researchers to analyze the scalability and performance of cloud-based CEP systems and to easily compare the effects of different query processing strategies.
