
Despite a recent surge of interest in CEP motivated by its use in Big Data scenarios, today the CEP market is still dominated by a few proprietary solutions [86, 123, 139] that require large investments for their acquisition, but are still not as flexible as desired. Alternatively, on the other side of the spectrum, many companies adopt open-source, low-level systems [17, 18, 153] whose deployment demands intense technical training and high operating costs.

To address these problems, this research proposes the creation of a CEP as a Service (CEPaaS) system to enable the offering of CEP functionalities in the cloud services model. This model brings many advantages to the system users, such as no up-front investment, low maintenance cost, and ubiquitous access via the Internet.

Nevertheless, offering such a service involves many challenges, which is reflected in the limited number of similar services today. First, low latency is essential to many CEP use cases, but is difficult to achieve in a service environment because there is no control over the locations of event sources and consumers. In addition, some use cases impose an unpredictable and variable load on the system, which therefore needs elasticity capabilities.

Moreover, CEPaaS is inherently multi-tenant, which has many implications for the system architecture and design. For instance, a multi-tenant system must be highly available because an outage affects many customers and can seriously damage the service provider's reputation. It is also necessary to control the resource usage of user queries and guarantee their isolation so that they do not interfere with each other.

Because the service is offered to anyone with Internet access, the system is expected to scale primarily in the number of queries rather than in the input event rate of a small number of queries. Finally, by targeting such a wide spectrum of users, the system must be usable by non-specialists, but at the same time should not prohibit the definition of custom processing logic by advanced users. The next sections discuss in detail the architecture, design and implementation of a CEPaaS system that aims to solve the mentioned challenges.

7.2 System Overview

To handle the challenges associated with offering CEP as a managed service, the CEPaaS system is built on three main pillars: a multi-cloud architecture, container management systems (CMS), and an extensible multi-tenant design. The first two are leveraged at the architectural level to provide a scalable and fault-tolerant runtime environment for queries. The third provides a novel design in which the system applicability increases with the number of users.

Figure 7.1 shows an overview of the system architecture. The figure depicts one primary and two secondary deployments of the system, each one running in a different cloud. In this context, cloud is loosely defined as a cluster of servers offered by a cloud provider that are connected via a high speed network and are geographically close to each other. In terms of Amazon's and Google's nomenclature, this definition implies that the servers from a cloud are running in the same region or zone.

This architecture is not strictly compliant with the multi-cloud definition provided in Section 2.2.3, which requires clouds managed by different providers. The CEPaaS system, on the other hand, only demands clouds that are physically apart. Note, however, that this less strict architecture already brings the two most important advantages of multi-cloud to the CEPaaS system. First, it increases system availability, as it is possible to continue to process user queries even if an entire cloud goes off-line. Second, it enables exploration of the geographical diversity of clouds, creating the possibility of a strategic deployment in which system resources are positioned close to event sources and consumers.
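The two advantages above, availability and geographical proximity, can be illustrated with a minimal sketch. It is not taken from the CEPaaS implementation: the `Deployment` record, its fields, and the `closestOnline` method are hypothetical names, and the latency values are assumed to be measured elsewhere.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical descriptor of one deployment (one cloud); not part of the original system. */
record Deployment(String name, boolean online, double latencyMs) {}

final class DeploymentSelector {
    private DeploymentSelector() {}

    /**
     * Returns the online deployment with the lowest latency to an event source.
     * Off-line clouds are skipped, so queries can still be placed while at least
     * one deployment remains available.
     */
    static Optional<Deployment> closestOnline(List<Deployment> deployments) {
        return deployments.stream()
                .filter(Deployment::online)
                .min(Comparator.comparingDouble(Deployment::latencyMs));
    }
}
```

For instance, given a primary deployment that is off-line and two secondary deployments with assumed latencies of 40 ms and 25 ms, `closestOnline` returns the 25 ms deployment; with every deployment off-line it returns an empty `Optional`.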

It is important to emphasize that the architecture does not need to be modified whether the clouds are managed by different providers or not. In both cases, all three deployments from the figure contain a set of system components that are required for running user queries. The primary deployment also hosts components used for user interaction. Note that the number of secondary deployments is not fixed and depends on the quality of the service that the provider wants to offer.

Figure 7.1: CEPaaS system architecture. [Diagram: one primary deployment and two secondary deployments. Labels shown: Container Mgmt System (CMS), Registry (Images), Message Broker (Messages), Config Manager, Query Analyzer & Manager (QAM), CEPaaS Core, CEPaaS Web (UI, API), Data Storage, and user queries Q1 to Q7. Legend: System Components, User Queries, Provided Components, Communication Boundary.]

Another important aspect of the CEPaaS architecture is that every deployment is managed by a CMS, which is either provided as a managed service, such as Amazon Container Service [11] and Google Container Engine [60], or is pre-installed in the cloud servers. By encapsulating every system component as an application container, it is possible to isolate and control their resource usage. This encapsulation also facilitates and encourages independent upgrade of system functionalities. These benefits are similar to the ones brought by VMs, yet with less execution overhead and more efficient usage of resources (Section 2.3). Moreover, the infrastructure provided by a CMS guarantees that all containers are constantly running, which drastically simplifies the implementation of fault-tolerance in the system.

It is important to note that even user queries are executed as application containers in CEPaaS. This is a very important design decision that brings two additional benefits to the system. First, scalability in the number of queries is naturally handled as new query containers are created and scheduled by the CMS. Second, because queries have different resource requirements and workload profiles, an intelligent scheduling strategy can significantly increase the utilization level of the cloud servers.
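To make the utilization argument concrete, the sketch below packs query containers onto servers with a first-fit decreasing heuristic. This section does not specify the scheduling strategy used by CEPaaS or by any particular CMS, so the heuristic is only an illustrative stand-in. The class name, the use of CPU millicores as the single resource dimension, and the sample values are all assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative first-fit decreasing placement of query containers (hypothetical helper). */
final class FirstFitDecreasing {
    private FirstFitDecreasing() {}

    /**
     * Places each CPU request (in millicores) on the first server with enough
     * spare capacity, considering the largest requests first.
     * Returns the resulting load of every server that was used.
     */
    static List<Integer> pack(List<Integer> cpuRequests, int serverCapacity) {
        List<Integer> sorted = new ArrayList<>(cpuRequests);
        sorted.sort(Comparator.reverseOrder());

        List<Integer> serverLoads = new ArrayList<>();
        for (int request : sorted) {
            if (request > serverCapacity) {
                throw new IllegalArgumentException("Request exceeds server capacity: " + request);
            }
            boolean placed = false;
            for (int i = 0; i < serverLoads.size(); i++) {
                if (serverLoads.get(i) + request <= serverCapacity) {
                    serverLoads.set(i, serverLoads.get(i) + request);
                    placed = true;
                    break;
                }
            }
            if (!placed) {
                serverLoads.add(request); // open a new server for this container
            }
        }
        return serverLoads;
    }
}
```

With assumed requests of 500, 1500, 1000, 700 and 300 millicores and servers of 2000 millicores, `pack` fills two servers completely. A real scheduler would also weigh memory, network, and workload variability, which this sketch ignores.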

On top of this architecture, the CEPaaS system adopts an extensible multi-tenant design based on a query template mechanism that relieves users from learning query definition languages. In the CEPaaS system, queries are created by simply instantiating query templates. In addition, advanced users can still create new query templates based on a library of operator templates or create new operator templates based on a Java API. Finally, because query and vertex templates can be shared among customers, this design promotes a strong library of operators and queries that is maintained and reinforced by the users themselves.
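The idea of creating a query by instantiating a template can be sketched as follows. This is not the CEPaaS Java API, which is not described in this section: the `QueryTemplate` record, the `${...}` placeholder notation, and the textual query body are hypothetical and serve only to show how a user could supply parameter values without writing a query definition.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical query template: a body with ${name} placeholders and its declared parameters. */
record QueryTemplate(String name, String body, Set<String> parameters) {

    /**
     * Instantiates the template by replacing every placeholder with its bound value.
     * The bindings must cover exactly the declared parameters.
     */
    String instantiate(Map<String, String> bindings) {
        if (!bindings.keySet().equals(parameters)) {
            throw new IllegalArgumentException(
                    "Expected bindings for " + parameters + " but got " + bindings.keySet());
        }
        String query = body;
        for (Map.Entry<String, String> binding : bindings.entrySet()) {
            query = query.replace("${" + binding.getKey() + "}", binding.getValue());
        }
        return query;
    }
}
```

A non-specialist user would then only provide values, for example binding `threshold` to `30` and `email` to `ops@example.com` in a template whose body is `filter(temperature > ${threshold}) -> notify(${email})`. An advanced user would be the one who authors and shares the template itself.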