
Automatic, multi-grained elasticity-provisioning for the Cloud

Architecture Definition of the Elasticity Provisioning Platform

Deliverable no.: D3.1


Table of Contents

1 Introduction
1.1 Purpose of this document
1.2 Document structure
2 Automated cloud services deployment and management state of the art
2.1 Resource management tools
2.1.1 Cloud specific resource management tools
2.1.2 General purpose resource management tools
2.2 Automated cloud resource management approaches
2.2.1 Industrial/opensource approaches on automated cloud resource management
2.2.2 Research approaches on automated cloud resource management
2.3 Application modeling/profiling
2.4 Multi-tier application tracing
3 Elasticity provisioning platform requirements
3.1 Actors
3.2 Use cases
3.3 Functional requirements
3.4 Non-functional requirements
4 Elasticity provisioning platform architecture
4.1 Overview
4.2 Platform components
4.2.1 Resource Provisioner
4.2.2 Decision module
4.2.3 Application profiler
4.2.4 CELAR DataBase
4.2.5 CELAR manager
4.3 Elasticity provisioning platform deployment overview
4.4 Elasticity provisioning platform workflows of supported functions
4.4.1 Deploy application
4.4.2 Decide elastic operation
4.4.3 Perform elastic operation
4.4.4 Profile application
5 Conclusions


List of Figures

Figure 1-1 CELAR system architecture
Figure 3-1 UML diagram of the elasticity provisioning platform use cases
Figure 4-1 Elasticity platform architecture
Figure 4-2 Elasticity platform interaction during the lifecycle of an application
Figure 4-3 Decision module overview
Figure 4-4 Application profiler
Figure 4-5 Application deployment loop
Figure 4-6 Decision loop
Figure 4-7 Perform elastic operation loop
Figure 4-8 Profiling loop

List of Tables

Table 1 Elasticity Provisioning Platform Use Cases

List of Abbreviations

UI User Interface
API Application Programming Interface
SaaS Software as a Service
IaaS Infrastructure as a Service
PaaS Platform as a Service
MPC Model-Predictive Control
AWS Amazon Web Services
EBS Elastic Block Storage
S3 Simple Storage Service
EC2 Elastic Compute Cloud
VM Virtual Machine
SLO Service Level Objective
AMI Amazon Machine Image

Deliverable Title: Architecture Definition of the Elasticity Provisioning Platform
Deliverable no.: D3.1
Filename: CELAR_D3.1_finalrelease.docx
Author(s): Dimitrios Tsoumakos, Ioannis Konstantinou, Nikolaos Papailiou, Ioannis Giannakopoulos, Christos Mantas, Georgiana Copil, Daniel Moldovan
Date: 05-04-2013

Start of the project: 01-10-2013
Duration: 36 months
Project coordinator organization: ATHENA RESEARCH AND INNOVATION CENTER IN INFORMATION COMMUNICATION & KNOWLEDGE TECHNOLOGIES (ATHENA)

Due date of deliverable: 31/03/2013
Actual submission date: 05/04/2013

Dissemination Level
[X] PU Public
[ ] PP Restricted to other programme participants (including the Commission Services)
[ ] RE Restricted to a group specified by the consortium (including the Commission Services)
[ ] CO Confidential, only for members of the consortium (including the Commission Services)

Deliverable status version control

Version | Date | Author
1.1 | 5/4/2013 | Dimitrios Tsoumakos, Ioannis Konstantinou, Nikolaos Papailiou, Ioannis Giannakopoulos, Georgiana Copil, Daniel Moldovan, Christos Mantas
1.0 | 29/3/2013 | Dimitrios Tsoumakos, Ioannis Konstantinou, Nikolaos Papailiou, Ioannis Giannakopoulos, Georgiana Copil, Daniel Moldovan, Christos Mantas
0.95 | 27/3/2013 | Dimitrios Tsoumakos, Ioannis Konstantinou, Nikolaos Papailiou, Ioannis Giannakopoulos, Georgiana Copil, Daniel Moldovan, Christos Mantas
0.8 | 14/3/2013 | Dimitrios Tsoumakos, Ioannis Konstantinou, Nikolaos Papailiou, Ioannis Giannakopoulos, Georgiana Copil, Daniel Moldovan, Christos Mantas
0.7 | 13/03/2013 | Dimitrios Tsoumakos, Ioannis Konstantinou, Nikolaos Papailiou, Ioannis Giannakopoulos
0.6 | 12/03/2013 | Dimitrios Tsoumakos, Ioannis Konstantinou, Nikolaos Papailiou, Ioannis Giannakopoulos
0.5 | 11/03/2013 | Dimitrios Tsoumakos, Ioannis Konstantinou, Nikolaos Papailiou, Ioannis Giannakopoulos
0.4 | 10/03/2013 | Dimitrios Tsoumakos, Ioannis Konstantinou, Nikolaos Papailiou, Ioannis Giannakopoulos
0.3 | 07/03/2013 | Dimitrios Tsoumakos, Ioannis Konstantinou, Nikolaos Papailiou, Ioannis Giannakopoulos
0.2 | 06/03/2013 | Dimitrios Tsoumakos, Ioannis Konstantinou, Nikolaos Papailiou, Ioannis Giannakopoulos
0.1 | 28/02/2013 | Dimitrios Tsoumakos, Ioannis Konstantinou, Nikolaos Papailiou, Ioannis Giannakopoulos


Abstract

The aim of this document is to present the architecture of the Elasticity Provisioning Platform. We survey approaches for automated cloud services deployment and management, compare them with CELAR’s Elasticity Provisioning Platform, and identify their relative strengths and weaknesses. We then present the platform’s functional and non-functional requirements. The platform is broken down into sub-modules, each of which is described in detail, and the platform’s deployment is outlined. Finally, the platform’s basic functions are described as workflows, in the form of sequence diagrams that depict each module’s interaction with the other CELAR sub-modules.

Keywords

1 Introduction

Elasticity, i.e., the property of a system to expand or contract its resources according to user demand, offers multiple advantages, especially in situations where the workload is not static or predictable. In these situations, system operators resort to over-provisioning, where the amount of resources needed during peak hours is constantly reserved, which incurs extra costs for the system operator. The wide adoption of cloud computing has made it possible to allocate or de-allocate computing resources according to user demand in a pay-as-you-go manner by utilizing well-defined management APIs that interact with the cloud provisioning platform. Although this functionality may form the basis for elastic resource allocation, a system operator still needs to design and implement many other mechanisms in order to provide fully automated, configurable and reliable elastic functionality.

The CELAR platform allows an application provider that operates over cloud infrastructures to automatically scale up or down any part of its application by utilizing user-defined policies. Users describe their application topology (i.e., the different components or modules and their interactions) through a user-friendly UI. For each component, the user can select from a list of available scaling actions and define its scaling policy. CELAR takes care of the correct initial launching and configuration of the application. During the application’s lifetime, a scalable monitoring module constantly tracks application-specific and infrastructure-specific metrics. The application’s performance, as observed through the monitoring system, is fed to CELAR’s elasticity platform, which decides and performs the appropriate scaling actions according to the user-defined policies. The actions are performed by translating higher-level commands into application-specific scripts or commands, which are executed on the deployed application in an automated manner. An overview of the CELAR architecture is depicted in Figure 1-1.

The automatic fine-grained elasticity of multi-layered applications in the cloud requires a management layer that can efficiently handle all the challenging requirements posed by this functionality. In CELAR, this role is played by the Elasticity Provisioning Platform. The right of Figure 1-1 shows the Elasticity Platform’s place in the overall CELAR architecture. In essence, the Elasticity Provisioning Platform is responsible for efficient, automatic and versatile application resource management: it interacts with the cloud provider at the SaaS/PaaS/IaaS layers (center of Figure 1-1), following user-provided scaling policies or rules. User input, such as the application description or launch requests, is provided by the UI/c-Eclipse framework (top of Figure 1-1), whereas the application’s performance is reported by the Monitoring module (left of Figure 1-1).


Figure 1-1 CELAR system architecture

1.1 Purpose of this document

The aim of this document is to define and present the system architecture of the Elasticity Provisioning middleware. The definitions of the platform’s functional and non-functional requirements drive the design of the platform’s modules. A detailed description of each sub-module offers a better understanding of its role and helps during the platform development phase. Moreover, describing the most important platform actions as workflows over both internal and external platform components clarifies how the components interact during the basic platform functions.

1.2 Document structure

The rest of this document is structured as follows. In Section 2 we give an overview of the state of the art in automated cloud services deployment and management. We present both commercial/opensource approaches and research approaches that address specific problems arising during automated application resource management and profiling/tracing, in order to compare them with the CELAR vision. In Section 3 we describe in detail the functional and non-functional requirements of the Elasticity Provisioning Platform. In Section 4 we present the platform components in detail, describe the platform’s deployment, and present the workflows of the basic platform tasks, which depict how the components interact with each other in order to offer the requested functionality. We conclude this document in Section 5.

2 Automated cloud services deployment and management state of the art

Automated cloud services deployment and management is an active area of interest to both the industrial and the academic community: the ability to automate the configuration and deployment of complex multi-tier applications in the cloud in a dynamic and adaptive manner presents interesting research challenges and offers a new business model to companies operating on the cloud.

A software system running on the cloud consists of general cloud resources and application (i.e., software) modules. The efficient management (i.e., automatic, adaptive, flexible and generic resource handling) that CELAR targets requires the translation of these abstract resources and modules into specific cloud resources, either fine-grained (such as number of CPUs, memory size, available bandwidth, storage types and sizes, etc.) or coarse-grained (e.g., number of virtual machines of a specific type/size). Moreover, specific application deployment descriptions are required, which state in a coherent way how these resources interact and how they are configured and correctly installed, so that the application can be correctly re-configured and set up after changes to the cloud infrastructure (i.e., IaaS resource scaling up or down).

Proper and efficient management of this diverse set of resources and interactions requires the following information:

• Definition of the available management actions: each resource needs to be modeled and all the possible management actions need to be defined in order to correctly identify the entire space of the available actions.

• Definition of automated management functionality: in order to support automatic and unattended resource management, the triggering conditions that will initiate management actions need to be modeled and correctly stated.

• Definition of the means of measuring management efficiency: in order to evaluate the outcome of management actions, a clear view of the application’s performance and how it was affected by the management decisions is needed.
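The three kinds of information listed above can be pictured with a small data model. The following is only a minimal sketch under stated assumptions: the class names, the `web_tier` resource, the `latency_ms` metric and its thresholds are all hypothetical and are not taken from the CELAR design.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass(frozen=True)
class ManagementAction:
    """One entry in the space of available actions for a resource."""
    resource: str  # hypothetical resource name, e.g. "web_tier"
    name: str      # hypothetical action name, e.g. "add_vm"


@dataclass(frozen=True)
class Trigger:
    """A condition over monitored metrics that initiates an action."""
    condition: Callable[[Metrics], bool]
    action: ManagementAction


def fired_actions(triggers: List[Trigger], metrics: Metrics) -> List[ManagementAction]:
    """Return the actions whose triggering condition holds for the metrics."""
    return [t.action for t in triggers if t.condition(metrics)]


def efficiency(before: Metrics, after: Metrics, metric: str) -> float:
    """Relative improvement of a lower-is-better metric after an action."""
    return (before[metric] - after[metric]) / before[metric]


# Illustrative values only (assumptions, not CELAR policies).
add_vm = ManagementAction("web_tier", "add_vm")
remove_vm = ManagementAction("web_tier", "remove_vm")
triggers = [
    Trigger(lambda m: m["latency_ms"] > 500.0, add_vm),
    Trigger(lambda m: m["latency_ms"] < 100.0, remove_vm),
]
```

With these assumed thresholds, a latency of 800 ms fires only `add_vm`, and a drop from 800 ms to 400 ms after the action gives an efficiency of 0.5.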

Our state-of-the-art analysis is structured as follows:

We begin with a brief description of both general purpose and cloud specific resource management tools. While these tools do not automatically perform any resource management, we present them because they are utilized during the automation process. We then describe the automated cloud resource management approaches, i.e., the works most directly related to CELAR with regard to automatic elasticity; these are divided into industrial/opensource and research approaches. We conclude our state-of-the-art analysis with a brief description of works targeting application modeling/profiling and multi-tier application tracing. Although these works do not directly address automated service deployment and management, their findings will be used during the monitoring and decision making process of the CELAR platform.

2.1 Resource management tools

In this section, we give an overview of resource management tools and platforms that can be used to perform automatic resource management. We present industrial/opensource cloud specific solutions along with a suite of general purpose management tools that can be used to automate the resource configuration procedure. In general, current cloud providers and cloud software offer management by abstracting the available resources and providing a simple API to interact with them.

2.1.1 Cloud specific resource management tools

All commercial solutions from major cloud vendors, such as Amazon AWS [Aws], RackSpace cloud [Rackspace], GoGrid [Gogrid], Google’s recent Compute Engine [Googleceng], Microsoft’s Windows Azure Virtual Machines [Azurevms], IBM smart cloud [IBMcloud], HP Cloud services [Hpcloud], etc., along with open-source IaaS management software that can be used to deploy private clouds, like Eucalyptus [Eucalytpus], OpenStack [Openstack], OpenNebula [Opennebula], etc., follow a similar pattern regarding resource management. They break down and abstract the available resources into the general categories of computation, memory and network; they pre-package mixtures of these resources into a set of instance types, each one representing a different resource size; and they offer these instance types to end users through a management API which is used to determine and regulate their lifecycle.

Storage abstraction and management is a bit different. In general, storage can be categorized into ephemeral storage (i.e., storage that is destroyed upon virtual machine termination), transactional persistent block storage (e.g., Amazon’s EBS and OpenStack’s block storage, offered to virtual machines as raw block devices) and “write once read many” object storage (e.g., Amazon’s S3 and OpenStack’s Swift, offered to the user as a key-value datastore). In all cases, storage management is supported through a set of provider-specific APIs that regulate the storage’s size and define read/write operations and attach/detach commands.

Many commercial and open-source solutions that manage multiple IaaS clouds in order to support multi-cloud operations are offered at the IaaS layer. RightScale [Rightscale], the pioneer in this area, followed by other companies/solutions like Scalr [Scalr], enStratus [Enstratus], Kaavo [Kaavo] and SlipStream [SlipStream], offer multi-vendor IaaS management solutions. To achieve this, each system supports a set of different IaaS providers and translates higher-level management commands to the IaaS-specific commands. In CELAR, cloud interoperability is addressed with the use of SlipStream, which offers all the above functionality.
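The translation of a higher-level management command into provider-specific commands can be sketched with a simple connector abstraction. This is an illustrative sketch only: `ProviderA`, `ProviderB`, their size names and the command strings are invented for the example and do not correspond to SlipStream or to any real provider’s API or CLI.

```python
from abc import ABC, abstractmethod
from typing import List


class CloudConnector(ABC):
    """Provider-independent interface used by the management layer."""

    @abstractmethod
    def start_vm(self, size: str) -> str:
        """Return the provider-specific command for starting one VM."""


class ProviderA(CloudConnector):
    # Hypothetical mapping from abstract sizes to provider instance types.
    SIZES = {"small": "a1.small", "large": "a1.large"}

    def start_vm(self, size: str) -> str:
        return f"providerA run-instance --type {self.SIZES[size]}"


class ProviderB(CloudConnector):
    # A second hypothetical provider with different naming.
    SIZES = {"small": "flavor-1", "large": "flavor-4"}

    def start_vm(self, size: str) -> str:
        return f"providerB server create --flavor {self.SIZES[size]}"


def scale_out(connector: CloudConnector, size: str, count: int) -> List[str]:
    """Translate one high-level command into provider-specific commands."""
    return [connector.start_vm(size) for _ in range(count)]
```

The caller issues the same `scale_out` request regardless of the provider; only the connector passed in determines which concrete commands are produced.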

2.1.2 General purpose resource management tools

Complex intra-application configuration management is also addressed by a number of software packages that, although they do not directly target cloud infrastructures, can easily be utilized inside a cloud setting. Although it is not CELAR’s purpose to propose, design and build a new management tool, we briefly present a representative set of these applications in order to give the reader a better understanding of how abstract configuration and management commands can be compiled into specific deployment commands and scripts.


Chef [Chef] is a management tool, used for configuring cloud services. Chef can automatically tweak the operating systems and the applications that run in data centers by using Chef scripts. A Chef script contains a set of “recipes” written by the user in order to define how Chef will manage server applications and how they are to be configured. Each recipe describes a series of resources along with their state: which applications should be running, which packages should be installed, which files should be written, etc. Chef can be executed in a standalone mode (chef-solo) or a server/client mode and it can be integrated with many cloud based platforms.

Puppet [Puppet] is another configuration management tool, which is used to configure cloud services. Users describe system resources and their state in a declarative way, in files called Puppet manifests. The manifests are then compiled into system-specific catalogs containing resources and resource dependencies, which are then applied to the target systems. The resource abstraction layer enables users to describe applications in high-level terms (e.g., users, packages, services, etc.) without specifying OS-specific commands.

CFEngine (ConFiguration Engine) [Cfengine] is a popular open source configuration management system which provides automated configuration and maintenance of large-scale computer systems. In contrast to the aforementioned configuration tools, CFEngine does not use recipes or lists of tasks that should be executed; its main idea is that the management system should itself execute the appropriate tasks in order to achieve the desired final state. This means that when the system wants to achieve a new state, it executes a number of steps in order to converge to this state. Although the ideal state cannot be guaranteed, the system tries to approach it in a best-effort manner. The CFEngine system is highly portable, as it can be used on both UNIX and Windows systems.

Salt [Salt] is another open source configuration management system. It has been designed to be highly modular and easily extensible. Salt enables the user to configure and manage the system by writing appropriate functions (in Python) and providing a function and a target.
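The idea of converging to a desired state, as described for CFEngine above, can be illustrated with a few lines of code. This is a simplified sketch of the general concept only; it does not use CFEngine’s policy language or any real tool’s API, and the state keys such as `nginx` and `ntp` are hypothetical.

```python
from typing import Dict, List, Tuple

State = Dict[str, str]


def plan(current: State, desired: State) -> List[Tuple[str, str]]:
    """Return the (key, value) changes still needed to reach the desired state."""
    return [(key, value) for key, value in desired.items() if current.get(key) != value]


def converge(current: State, desired: State, max_rounds: int = 3) -> State:
    """Repeatedly apply the outstanding changes until none remain.

    In this toy model every change succeeds, so one round is enough; a real
    system would re-check after each round because individual steps can fail.
    """
    state = dict(current)
    for _ in range(max_rounds):
        changes = plan(state, desired)
        if not changes:
            break  # already in the desired state, nothing to do
        for key, value in changes:
            state[key] = value
    return state
```

Because the plan is computed from the difference between the current and the desired state, running `converge` on a system that is already compliant performs no changes, which is the property that distinguishes this style from executing a fixed list of commands.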

SlipStream [Slipstream] is an application that allows automated provisioning and creation of cloud resources. SlipStream provides simple access to a cloud infrastructure. It can provide repeatable single-click deployment of multi-component distributed applications on any cloud infrastructure; it hides the cloud-specific instantiation of images and contextualization while keeping deployment scripts versioned and organized.

2.2 Automated cloud resource management approaches

In this section we present approaches that automatically handle resource management in a cloud environment. We first describe industrial/opensource approaches and we continue with research works.

2.2.1 Industrial/opensource approaches on automated cloud resource management

Commercial offerings such as Google’s AppEngine [Appengine], Microsoft’s Azure [Azure], Heroku [Heroku], CloudBees [Cloudbees], and Engine Yard [Engineyard] offer tools to easily create and deploy a cloud application on premises. The management of the utilized resources is then performed automatically by the provider, without allowing the user to control the underlying resources: users access a sandboxed system in which they have only a limited set of available operations, without the ability to monitor and manage underlying IaaS-related aspects, since this is performed by the provider in a user-agnostic manner. Scaling up and down of resources is performed transparently to the user.

To address these limitations, systems such as Amazon’s Beanstalk [Beanstalk] and AWS OpsWorks [Opsworks], Red Hat’s OpenShift [Openshift], VMware’s Cloud Foundry [Cloudfoundry] and Windows Azure compute PaaS management [Azure] have been proposed. All these systems offer functionality similar to the previous PaaS offerings, with the main difference that they give the user access to the underlying IaaS infrastructure. They all follow a general pattern: application modules are defined and IaaS resources are assigned to each module by utilizing the notion of node “roles”, according to the application architecture. For instance, in a typical web serving application, there are two types of roles: the web server role and the datastore role. According to the application’s needs and user-defined parameters, the population of the different node roles regulates the utilized resources. These systems offer out-of-the-box support for a number of popular software packages such as databases, web servers, key-value stores, messaging systems, etc.

In the Beanstalk [Beanstalk] case, users create their code and upload it to the AWS management console, and a set of different Amazon services, including Elastic Load Balancer [Awselb], Auto Scaling [Awsautoscaling], and CloudWatch [Cloudwatch], are auto-configured and deployed. After the initial deployment, the user can alter the default settings of these services and modify both the application’s resources and the manner in which the system will respond to sudden load changes.

In the AWS OpsWorks [Opsworks] case, users create a number of layers that define how to configure a set of resources which are managed together, including software configuration and installation scripts. Each layer may consist of a number of application instances. Every time a new instance is initialized, the predefined configuration steps are taken automatically.

In the AWS CloudFormation [Cloudformation] case, users can create and manage a collection of AWS resources, and provision and update them. Users can use templates to describe their resources and any associated dependencies between them. Once the resources are deployed, users can modify and update them, allowing version control of the infrastructure.

In the Cloudify [Cloudify] case, users describe their application using external blueprints (recipes) used for application deployment and post-deployment configuration. These blueprints hide any cloud-specific configuration from the user, making the application highly portable between different IaaS providers while still allowing fine-tuned application deployment.

In the OpenShift [Openshift] case, the application is deployed onto a number of RHEL Linux machines which take a “multi-tenant on the OS” approach, allowing many users to run on the same virtual machine. Auto-scaling rules such as “scale up if the number of concurrent requests exceed 90% of max concurrent requests over 1 period” or “scale down if the number of concurrent requests fall below 49.9% of max concurrent requests over 3 consecutive periods” are used. Users can freely alter the infrastructure and the management/scaling rules. Nevertheless, OpenShift does not provide a completely isolated virtualization framework and it is heavily dependent on RHEL software.
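The two auto-scaling rules quoted above can be expressed as a small decision function. The sketch below is not OpenShift code: the function name, the representation of the history as a list of per-period request counts, and the `"hold"` outcome are assumptions made for illustration; only the 90% and 49.9% thresholds and the period counts come from the quoted rules.

```python
from typing import List


def scaling_decision(
    history: List[int],
    max_concurrent: int,
    up_ratio: float = 0.90,
    down_ratio: float = 0.499,
    down_periods: int = 3,
) -> str:
    """Decide a scaling action from per-period concurrent request counts.

    `history` holds one count per monitoring period, most recent last.
    """
    if not history:
        return "hold"
    # Scale up as soon as the latest period exceeds 90% of the maximum.
    if history[-1] > up_ratio * max_concurrent:
        return "scale_up"
    # Scale down only after 3 consecutive periods below 49.9% of the maximum.
    recent = history[-down_periods:]
    if len(recent) == down_periods and all(
        count < down_ratio * max_concurrent for count in recent
    ):
        return "scale_down"
    return "hold"
```

Requiring several consecutive quiet periods before scaling down, while scaling up after a single busy period, makes the rule react quickly to load spikes but avoids releasing resources on a brief dip.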

The Cloud Foundry [Cloudfoundry] PaaS platform management system runs on VMware’s infrastructure and utilizes its vSphere virtualization platform. The platform supports a number of runtimes and applications such as Java, Ruby, Node.js, MySQL, Redis, etc., and the management follows the same concept: the application is deployed utilizing an initial pool of resources, along with some basic scaling decisions, and the user can later alter both the resource size and the scaling manner. We also note the heavy dependence on VMware’s products, which may lead to vendor lock-in.

The Windows Azure [Azure] compute PaaS platform also provides a management API to launch a PaaS application by utilizing the notion of a Cloud service. Cloud services consist of different types of cloud resources, whose size the user can regulate according to their requirements.

RightScale [Rightscale] is another commercial solution that aims at automated cluster provisioning, providing tools for Server, Cluster and Application Monitoring. It also provides auto-scaling triggered by alerts based on built-in and custom metrics. The insertion of new nodes is automated using Javascript macros, but resize decisions are not evaluated cost-wise. The applications targeted by RightScale are mainly conventional web servers (Apache, Nginx, etc.) backed by relational DB datastores. The system is not application-aware (it cannot be trained for a particular backend) and the provisioning of Dynamic NoSQL Applications aimed at large-scale, dynamic workloads is not addressed. CELAR, in contrast, will be able to support Distributed NoSQL systems, use application-specific information if available, and take cost into account for its resize decisions.

2.2.2 Research approaches on automated cloud resource management

Recent approaches to managing cloud resources utilize findings from the field of control theory, since in essence the whole management task can be abstracted as the automatic control of a system. The work of [Lim 2010] presents policies for elastically scaling a Hadoop storage tier of multi-tier Web services based on automated control. The authors utilize an integral controller for a specific metric that dictates the number of VMs needed to satisfy a pre-defined Service Level Objective (e.g., response time should be less than 3 sec). Although this is an interesting approach, it is focused only on Hadoop clusters, whereas we opt for a more generic approach.
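To make the notion of an integral controller concrete, the following sketch shows one step of a generic discrete-time integral control law that adjusts a VM count from the error between a measured metric and its target. It is a textbook-style illustration, not the controller of [Lim 2010]: the gain, the bounds and the function name are assumptions, and only the 3-second response-time target is borrowed from the example in the text.

```python
def integral_step(
    current_vms: int,
    measured: float,
    target: float,
    gain: float,
    min_vms: int = 1,
    max_vms: int = 100,
) -> int:
    """One step of a discrete integral controller for the VM count.

    The new actuator value is the previous one plus gain * error, so the
    errors are accumulated (integrated) in the VM count over successive steps.
    """
    error = measured - target  # positive when the SLO is violated
    proposed = current_vms + gain * error
    # Clamp to the allowed range and round to a whole number of VMs.
    return max(min_vms, min(max_vms, round(proposed)))


# With a 3-second target, a measured response time of 5 seconds and an
# assumed gain of 1 VM per second of error, 4 VMs become 6.
next_vms = integral_step(current_vms=4, measured=5.0, target=3.0, gain=1.0)
```

When the measured value sits exactly at the target the error is zero and the VM count stays unchanged, which is the steady state such a controller drives toward.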

The work of [Trushkowsky 2011] also utilizes findings from the field of control theory. It targets dynamic resource allocation for distributed storage systems which need to maintain strict performance SLOs. The authors opt for model-predictive control (MPC), which can yield improvements over classical closed-loop control systems, and during scaling decisions they also manage data placement and replication factor while trying to minimize actual data transfers. In our case, we are targeting a broader range of applications, resources and available cloud providers.

The automatic scaling of a generic distributed storage system is also the subject of [Gandhi 2012]. The authors present AutoScale, a scaling methodology that monitors traffic and takes resizing actions by utilizing the capacity inference algorithm, which calculates the required number of nodes over a broad range of applied workloads. Their approach is also limited to specific application architectures, namely distributed key-value datastores, and it considers only a narrow space of available management actions, i.e., server additions and removals.

In the work of [Shen 2011], the authors present an elastic resource scaling system that can adaptively manage virtual machine resource usage according to applied workloads and agreed SLOs. The system is installed in each hypervisor of the cloud operator’s hardware and regulates the resource usage of each individual virtual machine. It can adaptively set the percentage of CPU usage along with the memory size of each virtual machine, and it can dictate VM migrations. Nevertheless, it is applied only at the virtual machine layer, without considering higher-level application semantics; it can manage only CPU and memory resources; and it needs extensive changes to the cloud provider’s underlying cloud management software.

PRESS [Gong 2010] is an example of an application resource elasticity framework which automatically scales virtual machines according to the workloads while considering energy consumption and SLOs. The framework uses prediction to reduce under- and over-provisioning errors and uses CPU frequency scaling to achieve energy savings with minimal impact on SLOs. As in the case of [Shen 2011], PRESS needs changes to the cloud provider’s internal architecture and software, whereas CELAR does not require specific changes to the software running on the IaaS provider.

In the work of [Marshall 2010], the authors design and present a model of an elastic site resource manager that dynamically consolidates remote cloud resources on demand, based on predefined policies, together with a prototype that extends Torque clusters. Nevertheless, their approach targets mostly cluster-based systems, whereas more complex applications are not addressed.

A similar approach is followed in the work of [Herodotou 2011]. In this work, the authors present the Elastisizer, a system that identifies the correct size of a Hadoop cluster and launches the appropriate resources, according to the datasets and job types posed by the users. Using application profiling and previously collected traces, the authors calculate the correct size of a Hadoop cluster according to the amount of processing needed for the submitted jobs. Nevertheless, their approach does not handle dynamic workload changes, and it is applied only to specific data-intensive applications.

The works in [Xiong 2011], [Soror 2008] solve the problem of optimizing VM resources (CPU, memory, etc.) to achieve maximum performance using relational DBs. The work of [Xiong 2011] performs more fine-grained application-level resource tuning (e.g., altering the database replication factor according to demand), whereas the work of [Soror 2008] addresses only the interference of different databases running on a single physical host. Although these approaches efficiently solve specific resource allocation and management problems, we opt for a more generic, application- and cloud-agnostic methodology that can handle more cases (e.g., not only relational databases) using a variety of fine-grained and coarse-grained elasticity actions.

In [Elmroth 2011] the authors select predictive elasticity, admission control and virtual machine placement as key issues in managing cloud applications. They describe a hierarchical control mechanism with a separate controller for each application tier, each controller having a set of low-level sub-controllers for memory, storage, bandwidth and CPU. CELAR also provides multi-level hierarchical control, but at different levels of the application’s logical structure, not at the virtual machine level.

The work in [Malkowski 2010] defines three types of models (horizontal scale model, empirical model, and workflow forecast models) for scaling the workflow according to the SLAs and the predicted workloads. Wang et al. [Wang Q. 2011] take a different approach to elasticity management, focusing on software resource allocation (e.g., thread pool size management, DB connection pool, etc.). The authors emphasize that software resource management is as important as hardware resource management for cloud applications, and that both under- and over-provisioning of software resources can introduce performance bottlenecks. Kranas et al. [Kranas 2012] propose a framework for automatic scalability that uses an application graph as a base model for the application structure/topology and introduces elasticity as a service cross-cutting different cloud stack layers (SaaS, PaaS, IaaS), for automatically allocating/de-allocating resources to the different application nodes belonging to the application graph.

As opposed to resource-level elasticity management, CELAR defines a framework for higher level elasticity management, controlling the application elastically at different levels, not only from the cloud infrastructure point of view.

The economic challenges introduced by the pay-per-use cloud computing model are analyzed in [Suleiman 2011], promoting the comparison and selection of cloud service providers based on their elastic capabilities. The authors of [Sivadon 2012] also focus on optimizing the cost of resource provisioning, highlighting the complexity of cloud VM selection, given the different existing subscription schemes and types of VM, each with different characteristics. The work describes several provisioning stages, from initial deployment to run-time on-demand provisioning. In our work we use a similar approach as a base tool, using knowledge about the application's logical structure and monitoring data to perform elasticity actions at different levels, provisioning resources while respecting client-predefined cost and quality.

Such works support the idea that cost elasticity is an important property of elastic cloud computing. CELAR supports cost elasticity, allowing cloud clients to be elastic with respect to how much they pay for their cloud services.

Works such as [Iosup 2011] targeting cloud provider performance analysis towards migrating grid systems on public cloud infrastructure, or [Kitsos 2012], which targets performance degradation due to virtual machine collocation, acknowledge that quality management is a non-trivial task due to the heterogeneity of available services.

CELAR takes a multi-dimensional approach to cloud elasticity management, considering cost elasticity, resources elasticity and quality elasticity as core dimensions of cloud service elasticity.

2.3 Application modeling/profiling

Taking automated elastic decisions for an application requires a way to predict their impact on the application’s performance. Although approaches with zero or little initial knowledge that incrementally increases during the initial decisions can be utilized, they may have a negative impact during the early deployment phase. In order to be able to predict the application behavior, under different deployment and load scenarios, systems require both a generic application model and a way to train it using benchmarking tests. Poor application modeling could lead to increased decision errors that will affect the elasticity services provided by CELAR. In this section we present an overview of research approaches that try to tackle the problem of modeling, benchmarking and profiling distributed applications. Most modeling research approaches focus on multi-tier applications [Rui 2012, Urgaonkar 2005, Urgaonkar 2008, Xiong 2009] and use queuing theory for application modeling. A multi-tier architecture provides a flexible, modular approach for designing web applications. Each application tier provides certain functionality to its preceding tier and uses the functionality provided by its successor to carry out its part of the overall request processing. The proposed models are based on networks of queues, where the queues represent different tiers of the application (i.e., a queue represents a message or interaction flow between the different tiers).


In order to estimate tier latencies, models require several parameters as input. In practice, these parameters can be estimated by monitoring the application as it services its workload. Most of the above research approaches utilize application-specific stress tests that are manually generated for every different application. In contrast, for the purposes of CELAR we consider that a generic benchmarking and profiling approach is required.
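To make the queueing-based modeling described above more concrete, the following minimal sketch estimates the end-to-end latency of a multi-tier application. It is an illustration under strong assumptions that do not come from the cited works: Poisson arrivals, exponential service times, every request visiting every tier exactly once, and each server of a tier treated as an independent M/M/1 queue receiving an equal share of the load. The names Tier, tier_latency and end_to_end_latency are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    service_rate: float  # requests/second a single server can process (mu)
    servers: int = 1     # identical servers; load assumed to be split evenly


def tier_latency(tier, arrival_rate):
    """Mean response time of one tier, each server modeled as an M/M/1 queue."""
    per_server_arrivals = arrival_rate / tier.servers
    if per_server_arrivals >= tier.service_rate:
        return float("inf")  # saturated tier: the queue grows without bound
    return 1.0 / (tier.service_rate - per_server_arrivals)


def end_to_end_latency(tiers, arrival_rate):
    """Sum of per-tier mean response times (requests visit each tier once)."""
    return sum(tier_latency(t, arrival_rate) for t in tiers)


tiers = [Tier("web", service_rate=100.0, servers=2), Tier("db", service_rate=150.0)]
print(end_to_end_latency(tiers, arrival_rate=100.0))
```

In such a model the service rates are exactly the parameters that would have to be estimated from monitoring or benchmarking data before the model can be used to predict the effect of adding or removing servers.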

Research on application benchmarking is divided into two main categories: direct experimentation in a real cloud environment and full cloud system simulators. Several cloud simulators have been developed, including CloudSim [Calheiros 2011], SPECI [Sriram 2010], GroudSim [Ostermann 2011], and iCanCloud [Nunez 2011]. The full simulation approach suffers from parameter explosion and thus systems need to trade precision for speed, even with advanced parallel and distributed simulation techniques. Furthermore, simulation results are error-sensitive and require in-depth analysis to avoid misleading conclusions [Mills 2011].

On the other hand, direct experimentation in real cloud environments emerges as the best choice for obtaining accurate performance results. A few cloud benchmarking tools were developed to achieve automated experiment configuration, application workload generation, and on-line performance monitoring [Yigitbasi 2009, Ye 2010, Cooper 2010, Qi 2012]. In [Qi 2012], a discrete-event simulation is used to generate benchmarking scenarios for general applications deployed in the cloud. The authors of [Wang C. 2011] describe a framework for scalable real-time data collection and aggregation called Monalytics, defining a distributed hierarchical monitoring, data aggregation and analysis mechanism. Providing higher-level metrics from a cloud monitoring system is a non-trivial task, since this information cannot be collected at the virtual machine level, as it requires application-specific knowledge.

The presented works focus either on detailed application modeling or on benchmarking. CELAR requires both a general application model and an automated way to train the model's parameters. For these reasons, a profiling module will be responsible for generating application benchmarking scenarios that will provide an initial training set for the application model. Application benchmarking will take into account the time and the cost it consumes. Thus, the profiling module should be able to select the most valuable profiling configurations that can be executed within the time and cost budgets of each application.
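The budget-constrained selection of profiling configurations mentioned above can be sketched as a simple greedy heuristic. This is only an illustration, not the selection method of the CELAR profiler: the ProfilingRun fields, the notion of a numeric "value" per run and the density ranking are all assumptions introduced here, and run durations and costs are assumed to be strictly positive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfilingRun:
    name: str
    hours: float   # estimated duration of the benchmarking run
    cost: float    # estimated monetary cost of the run
    value: float   # hypothetical score of how informative the run is


def select_runs(candidates, max_hours, max_cost):
    """Greedily pick runs with the best value per share of budget consumed."""
    def density(run):
        # Normalize both resources so that time and cost are comparable.
        return run.value / (run.hours / max_hours + run.cost / max_cost)

    chosen, hours, cost = [], 0.0, 0.0
    for run in sorted(candidates, key=density, reverse=True):
        if hours + run.hours <= max_hours and cost + run.cost <= max_cost:
            chosen.append(run)
            hours += run.hours
            cost += run.cost
    return chosen


candidates = [
    ProfilingRun("small", hours=1.0, cost=2.0, value=3.0),
    ProfilingRun("medium", hours=2.0, cost=5.0, value=5.0),
    ProfilingRun("large", hours=4.0, cost=12.0, value=6.0),
]
print([r.name for r in select_runs(candidates, max_hours=4.0, max_cost=10.0)])
```

A greedy ranking does not guarantee an optimal selection; it is used here only because it is short and makes the trade-off between informativeness, time and cost explicit.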

2.4 Multi-tier application tracing

Another area that has been extensively researched is multi-tier tracing of distributed systems. This function has traditionally been used for performance debugging and will also be helpful in CELAR's Interceptor Module (a more detailed description is given in D1.1) for providing information about the state in which the system operates. Google's Dapper [Sigelman 2010] is an example of a production-level tracing system that has been applied successfully to a large number of cloud applications to date, despite its short lifetime. Dapper, however, builds upon the concepts of some earlier research systems, namely Microsoft's Magpie [Barham 2003], X-Trace [Fonseca 2007] and Pinpoint [Chen 2002]. What those systems have in common, together with Whodunit [Chanda 2007] and Stardust [Thereska 2006], is the use of annotations to trace cross-tier function calls and messages and produce the "Call-Graph". This implies the ability to interfere with the application's code (i.e., a clear-box system), or at least its runtime environment. Systems that treat the application as a complete "Black-Box" include Project5 [Aguilera 2003] and its later work, WAP5 [Reinolds 2006], as well as [Cohen 2004] and Microsoft's Sherlock [Bahl 2007], Orion [Chen 2008] and eXpose [Kandula 2008]. In order for such a system to be used continuously with low overhead, it needs to employ sampling and effective data management for its measurements. If so, it can be helpful for application profiling and performance analysis (similar to [Paradyn 1995]).

CELAR will leverage knowledge from those systems and incorporate it in its interceptor module. The interceptor will be able to deduce causal relationships between the actions taken on each of the particular subsystems of the user’s application. By doing so in a sampling manner it will extract useful information for the application’s performance state and make this information available to the Decision Module without considerable overhead.

3 Elasticity provisioning platform requirements

The Elasticity Provisioning Platform is a “middleware” that hosts all operations central to CELAR. Its main functionality is to perform the appropriate elasticity actions according to the monitored workload and the user-defined policies, and to ensure the correct application deployment according to user-defined descriptions. In order to provide this functionality, it needs to interact with the Application Management Platform, so as to obtain user-provided application information; with the Cloud Information and Performance Monitor layer, in order to receive real-time load information about individual application components; with the IaaS platform, to perform the appropriate scaling actions; and with the Application Expert, to store and utilize application-specific scripts or commands.

Based on the Application Driven Use Cases presented in D1.1, in this section we present the use cases for the Elasticity Provisioning Platform. We start with a brief description of the actors that interact with the platform, continue with the description of the use cases, and finally give an overview of the derived functional and non-functional requirements of the Elasticity Provisioning Platform. Since the Elasticity Provisioning Platform is an internal CELAR module that does not directly interact with all the CELAR “human” actors described in D1.1, in the following we consider as actors the external CELAR modules that will directly interact with the Elasticity Provisioning Platform. The only exception is the CELAR Expert actor, who has a more direct interaction with the platform.

3.1 Actors

The Elasticity Provisioning Platform will have a direct interaction with a number of different actors, each of whom will have specific responsibilities. These actors are:

IaaS/Application Platform: The IaaS actor represents the IaaS and the infrastructure on which CELAR is located. The IaaS will accept commands from specific modules of the CELAR platform and will be responsible for resource allocation, as well as for providing hypervisor-level metrics to the platform (a detailed description of the monitoring metrics is given in D1.1). This actor will interact with the Elasticity Provisioning Platform when new IaaS resources are needed or released during the application’s lifetime.

Cloud Information and Performance Monitor: The role of this layer is to collect, process and distribute monitoring metrics to CELAR components and to subscribed users. While an application is up and running, Monitoring Agents collect data via probes at the virtual machine, virtual cluster, cloud and application levels. In order to monitor and collect intra-application performance using application-level metrics (e.g., latency/throughput), the Interceptor sub-module is proposed (a more detailed description is given in D1.1). This actor provides information to the Elasticity Provisioning Platform about the application’s performance status, which is used during scaling decisions.

Application Management: Based on the actors defined in D1.1, the Application Management layer will be used to describe, submit and deploy a specific application to the CELAR platform. The module will provide an intuitive graphical user interface (GUI) that will enable the Application Expert to describe the application in an efficient way. Information about application topology, elasticity models, optimum policy requirements, etc., will be collected using this module. The Application Expert uses this module to deploy the described application by selecting resources and available applications existing in the cloud platform (a more detailed description is given in D1.1). Moreover, this module is used by the Application User. The Application Management actor will provide the Elasticity Provisioning Platform with the required information about how and when an application is deployed and configured.

CELAR Expert: The CELAR expert is the actor responsible for the maintenance of the platform. In addition, the CELAR expert can execute Application Profiling as described in D1.1, if needed, and can also create custom-made scripts that model the resizing actions of the application.

3.2 Use cases

In the following table we give an overview of the Elasticity Provisioning Platform Use Cases.

Table 1 Elasticity Provisioning Platform Use Cases

| Use case name | Actor | Description |
| Provide IaaS resources | IaaS/Application platform | Cloud resources are allocated or de-allocated using the IaaS management software whenever a scaling action is decided by the elasticity provisioning platform. |
| Provide Metrics | Cloud Information and Performance Monitor | The Cloud Information and Performance Monitor module provides fresh and up-to-date performance metrics that depict the application’s status, and are used by the elasticity provisioning platform during scaling decisions. Metrics can be both generic VM related (CPU usage, etc) and application or module specific metrics (by utilizing the Interceptor sub-module). |
| Describe Application | Application Management | The Application Management module stores the application topology/description along with its elasticity requirements and available actions per component/module, as they were described by the application users. This information is used by the Elasticity Provisioning platform during scaling actions and decisions. |
| Deploy Application | Application Management | The Application Management module also is used by the Elasticity Provisioning platform during the application deployment. All the appropriate scripts, VM types and configuration commands that are given to the Application Management module by the application user are being utilized for the application deployment. |
| Profile Application | CELAR Expert | Application profiling is the process where the application is executed under different deployment setups in order to export conclusions about the application's behaviour under different loads/committed resources. The CELAR expert controls the profiling process. The execution of the profiling process is being carried out by the Elasticity Provisioning platform, and the results of the profiling process are persistently kept. |
| Application Script Generation | CELAR Expert | The CELAR expert with the help of the Application Expert creates custom scripts so that resizing actions can be correctly implemented at the application in hand. These scripts are being stored and executed by the Elasticity Provisioning platform during scaling decisions. |

These functionalities are presented in the following figure as a UML use-case diagram.

[Figure: UML use-case diagram of the Elasticity Provisioning Platform. Actors: CELAR Expert, IaaS/Application platform, Application Management, Cloud Information and Performance Monitor. Use cases: Provide IaaS resources, Profile Application, Application-script Generation, Provide Metrics, Deploy Application, Describe Application.]

Figure 3-1 UML diagram of the elasticity provisioning platform use cases

3.3 Functional requirements

In the following section we present the functional requirements of the elasticity provisioning platform that derive from the previous use cases along with the general CELAR use cases as described in the deliverable D1.1.

Deploy Application:

The goal of this function is to successfully configure and deploy a user-defined application. The user describes the application using the c-Eclipse framework, clearly specifying the application layers and modules, the required software for each module, the interaction between them and, possibly, an initial amount of resources needed. The function takes as input a higher-level model describing the application structure, possibly utilizing the TOSCA [Tosca] specification language. It outputs a fully functional application deployment in the cloud. Each of the application’s modules will be registered to be monitored and managed by the CELAR platform: after the initial bootstrapping, the application’s status will be monitored and CELAR will adjust its resources according to user-defined policies and performance metrics, until the application user decides to terminate it. The application’s lifecycle consists of its initial deployment (bootstrapping), its CELAR-monitoring/management phase, and its termination phase. The initial deployment and termination are user-driven processes, whereas the monitoring/management phase is automated through the CELAR platform.

Decide elastic operation:

The goal of this function is to decide the appropriate scaling action, according to the user-defined policy and the current monitored application workload. We note here that scaling actions can be made at the component/module level, and involve a number of different scaling decisions, according to the module type (e.g., centralized versus distributed, etc.). For example, in a module consisting of a NoSQL database, a scaling action would be the addition of an extra VM server, whereas in a module consisting of a MySQL database a scaling action would be the addition of extra memory or CPUs. The decision making process takes as input a higher-level user-defined policy and the current monitored application performance/load status. The user-defined policy is provided by the application description. It outputs a higher-level “add resource” or “remove resource” command which will result in the reservation and deployment of the requested resources, along with the installation or configuration of the required software libraries, according to the software type.
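As an illustration of the input/output contract described above, the following sketch maps a user-defined policy and the currently monitored load to an “add resource” or “remove resource” command. The threshold-based rule, the Policy fields and the metric name are assumptions made for this example; the text does not prescribe a particular decision algorithm.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    metric: str      # name of the monitored metric the policy refers to
    upper: float     # scale up when the metric exceeds this value
    lower: float     # scale down when the metric falls below this value
    min_units: int   # never go below this many resource units
    max_units: int   # never go above this many resource units


def decide(policy, observed, current_units):
    """Return "add resource", "remove resource", or None if no action is needed."""
    value = observed[policy.metric]
    if value > policy.upper and current_units < policy.max_units:
        return "add resource"
    if value < policy.lower and current_units > policy.min_units:
        return "remove resource"
    return None


policy = Policy(metric="cpu_utilization", upper=0.8, lower=0.3, min_units=1, max_units=4)
print(decide(policy, {"cpu_utilization": 0.9}, current_units=2))
```

What a "resource unit" means depends on the module type: for the NoSQL example in the text it would be a VM, whereas for the MySQL example it would be an amount of memory or a number of CPUs.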

Perform elastic operation:

During the lifetime of an application, as described in Figure 4-2, workload variations will lead to a number of elastic operations which will interact with the application resources. In the majority of applications, whenever a resource change occurs in a specific module, a number of different management actions need to be performed in order to make the application “aware” of the specific change. This function is responsible for this role. It differs from the previous function (deciding the elastic operation) in that it implements the higher-level “add resource” or “remove resource” command. Higher-level decision actions are translated into specific commands that allocate or de-allocate cloud infrastructure resources during a scale-up or a scale-down decision respectively. For instance, when the decision making module requests more virtual machines, a command must be issued to the cloud provider to launch more virtual machines. For this part, in order to support better automation and cloud interoperability, the Resource Provisioner component will be utilized.

For instance, whenever a new NoSQL node is added into a cluster, in many cases the cluster “master” needs to be informed about this change, and perform the appropriate actions. It takes as input a higher-level “add resource” or “remove resource” command originating from the decision making module. It outputs a fully functional application deployment which has adjusted its utilized resources according to the scaling action. Each application’s module will be monitored and managed by the CELAR platform.


Maintain application status:

The goal of this function is to ensure the correct application operation. For fault tolerance, the platform must verify that all the application components and modules are functioning correctly (i.e., they are up and running). Moreover, when elasticity actions fail to complete (for instance, due to an IaaS error), the platform must ensure that the actions are re-executed. The actual application state (as reported by the appropriate modules) must match the application state that the Elasticity Platform maintains: whenever an inconsistency is observed, the appropriate maintenance actions need to be taken.
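The state-matching requirement above can be sketched as a reconciliation step that compares the state the platform maintains with the state reported by the running application and derives corrective actions. The dictionary representation (module name mapped to number of running instances) and the action tuples are assumptions made for this illustration.

```python
def reconcile(desired, actual):
    """Compare maintained (desired) and reported (actual) instance counts.

    Both arguments map a module name to a number of running instances.
    Returns a list of (action, module, count) tuples that would bring the
    actual state back in line with the desired one.
    """
    actions = []
    for module in sorted(set(desired) | set(actual)):
        want = desired.get(module, 0)
        have = actual.get(module, 0)
        if have < want:
            actions.append(("add", module, want - have))
        elif have > want:
            actions.append(("remove", module, have - want))
    return actions


# One web instance is missing (e.g. a failed scaling action) and an
# unexpected cache instance is still running.
print(reconcile({"web": 2, "db": 1}, {"web": 1, "db": 1, "cache": 1}))
```

Running such a comparison periodically also covers the re-execution of failed elasticity actions, since an action that did not complete shows up as a difference between the two states.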

Profile application:

In order for the decision making module to take well-informed and correct scaling decisions, some form of abstract knowledge about the application behavior under different workloads and resource allocations is required. Although this application profiling may not be strictly necessary, it will greatly enhance the elasticity platform’s ability to perform correct scaling actions. The profiling can be performed offline (in a non-production environment), by applying a number of representative workloads to various application “sizes” and measuring some key performance indicators, application-specific or not. The suite of representative workloads, the range of applicable application resource sizes and any application-specific key performance indicators need to be provided by the application expert. This action takes as input representative workloads, resource sizes and key performance indicators. It outputs knowledge about the application behavior/performance under different workloads and application resource sizes.

3.4 Non-functional requirements

The aforementioned functional requirements of the Elasticity Provisioning Platform create a new set of constraints regarding its design and implementation, namely the non-functional requirements:

Abstraction:

The CELAR platform will be applied to a number of different application setups and deployments. In order to achieve this, the platform must offer a high level of abstraction regarding resource management configuration and deployment. In other words, application users must be able to describe their needs in a very generic/abstract way, and the elasticity provisioning platform must be able to correctly translate these requirements into a number of application specific actions/scripts. Although this translation may involve in the beginning the application user, there must be a straightforward and easy to perform way of translating these abstract actions into specific application scripts/actions.

High-availability:

Since the Elasticity Provisioning Platform manages the entire lifecycle of a specific application, it is very important that it operates seamlessly and without any downtime, because platform downtime may lead to application downtime or application termination. Redundancy through replication, check-pointing and other solutions will be investigated to fulfill this requirement.

Scalability:

CELAR targets web applications that serve millions of users under heavy load and applications that manage and analyze large amounts of data. Because of this, the system must be able to scale in order to manage increasing application deployments without any problems: for instance, decision response time and accuracy must not be affected by the number of available scaling decisions or the number of reserved resources. What is more, the time needed for a scaling action to be applied must be relatively constant and largely independent of the application size (i.e., the size of allocated and used resources such as CPUs, disks, number of processes, etc.), except where this is unavoidable (for instance, when redistributing a large amount of data of a sharded database during a node addition or removal).

Efficient and robust scaling actions:

Scaling actions interact with the application operation. This interaction needs to be seamless to the end user of the application. It is very important that the user does not perceive any performance degradation during the time it takes to perform the action or, when this is unavoidable (as in the case of legacy modules that were not designed with scalability in mind), that the degradation lasts for a very short period of time. Users should benefit from elastic features without noticing any performance degradation, even a transient one. What is more, the system must be able to respond quickly to load changes: decisions must be fast, in order to avoid under-provisioned or over-provisioned situations for a long period of time. Nevertheless, the platform needs to avoid oscillations caused by erratic workloads: load smoothing by utilizing moving averages can help the system avoid unnecessary scaling actions.
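The moving-average smoothing mentioned above can be sketched as follows. The window size, the sample values and the 0.8 scale-up threshold are arbitrary values chosen for the illustration, and the class name SmoothedLoad is hypothetical.

```python
from collections import deque


class SmoothedLoad:
    """Simple moving average over the most recent load samples."""

    def __init__(self, window):
        self.samples = deque(maxlen=window)  # old samples drop out automatically

    def add(self, value):
        """Record a new sample and return the current moving average."""
        self.samples.append(value)
        return sum(self.samples) / len(self.samples)


SCALE_UP_THRESHOLD = 0.8  # assumed threshold, for illustration only
raw = [0.4, 0.4, 0.95, 0.4]  # a single short spike in an otherwise calm load

smoother = SmoothedLoad(window=3)
for sample in raw:
    average = smoother.add(sample)
    print(sample > SCALE_UP_THRESHOLD, average > SCALE_UP_THRESHOLD)
```

The raw spike crosses the threshold while the smoothed value does not, so a decision based on the moving average would skip the unnecessary scaling action. The trade-off is slower reaction: a larger window damps oscillations more strongly but delays the response to genuine load changes.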

4 Elasticity provisioning platform architecture

4.1 Overview

The Elasticity Provisioning Platform hosts all operations central to CELAR. It is the main CELAR component and takes input from almost all other CELAR modules. Its main functionality is to perform the appropriate elasticity actions according to the monitored workload and the user-defined policies, and to ensure the correct application deployment according to user-defined descriptions.

4.2 Platform components

Taking into account both the functional and the non-functional requirements, we have broken down the elasticity provisioning platform into the following sub-modules:


1 Resource Provisioner: a module that undertakes the task of reserving or releasing cloud resources and updating the application configuration in order for it to function using the new resources committed or freed.

2 Decision Module: takes real-time, informed decisions on the type and quantity of resources that need to be added to or removed from a running application.

3 Application Profiler: a module that will allow the monitoring and measurement for the characterization of the application’s behavior over representative resource provisioning and load scenarios.

4 CELAR Manager: a module that handles action orchestration in the elasticity platform; it operates upon the CELAR DataBase and provides fault-tolerance.

5 CELAR DataBase: a module that stores information generally useful to other CELAR components (monitoring data, application data, etc.).

The following figure shows the Elasticity Provisioning Platform’s role in the CELAR architecture. When a user wants to use the CELAR system to automatically manage a new application, they describe the application components and layers by utilizing the Application Management module. They also describe the permissible scaling actions and policies for each component or module, and this information is stored in the CELAR DataBase system. Using this information, along with some “hints” about the range of utilized resources and observed workloads, the Application Profiler, with the help of the CELAR expert, can execute an initial profiling in order to collect information about the application behavior/performance in different system states. This information is then passed to the Decision module and will be used during future scaling decisions. When the application is bootstrapped, the Monitoring module collects and stores all the required performance metrics. These metrics are fed to the Decision module, which, according to the specific policy and observed workload, takes decisions about resource allocation (scale-up) or de-allocation (scale-down). These decisions are passed to the Resource Provisioner module, where they are translated into application-specific commands that are applied through the Cloud and Application Orchestration modules at the IaaS and SaaS/PaaS layers.


[Figure: diagram showing the Elasticity Platform components (CELAR Manager, Decision Module, Application Profiler, Resource Provisioner, Cloud Orchestration, Application Orchestration, CELAR DataBase) together with the Application Management Platform, the Cloud Information and Performance Monitor, and the Cloud Provider layers (SaaS/PaaS, IaaS, Physical Layer).]

Figure 4-1 Elasticity platform architecture

[Figure: application lifecycle phases (Describe, Submit, Deploy, Profile, Monitor and Manage, Terminate) and the phases in which the Elasticity Provisioning Platform is involved.]

Figure 4-2 Elasticity platform interaction during the lifecycle of an application

In the following, we give a brief description of how the Elasticity Provisioning Platform’s components interact during an application’s lifecycle, as described in D1.1. This interaction is depicted in Figure 4-2. The Application Management component, after a command issued by the Application user, deploys the application utilizing the CELAR Manager module. The CELAR Manager runs until the application’s termination. When the deployment is initiated, the lifecycle moves to the profiling phase: in this phase, the Application Profiler gets information about the application structure and permissible elasticity actions from the CELAR DataBase, and utilizes the Resource Provisioner to launch some representative application configurations, to which a set of client workloads will be applied. The results of these executions will be stored in the CELAR DataBase module. When the profiling phase is completed, the lifecycle moves to the Monitor and Manage phase, which is the main phase of the application’s lifecycle. During this phase, the Decision Making module constantly gathers information from the monitoring module and the CELAR DataBase, performs elasticity actions utilizing the Resource Provisioner module and informs the CELAR DataBase about the performed actions. Finally, upon an Application user’s request, the lifecycle moves to the Terminate phase, in which the application is terminated using the Resource Provisioner module, and the CELAR platform stores this information in the CELAR DataBase.
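The lifecycle described above can be summarized as a small state machine. The sketch below only encodes the linear order of the phases named in Figure 4-2; the class and method names are hypothetical, and the real platform may allow transitions that are not modeled here.

```python
PHASES = ["Describe", "Submit", "Deploy", "Profile", "Monitor and Manage", "Terminate"]


class Lifecycle:
    """Tracks the current phase of one application and moves it forward."""

    def __init__(self):
        self.index = 0

    @property
    def phase(self):
        return PHASES[self.index]

    def advance(self):
        """Move to the next phase; a terminated application cannot advance."""
        if self.index == len(PHASES) - 1:
            raise RuntimeError("application is already terminated")
        self.index += 1
        return self.phase


lifecycle = Lifecycle()
while lifecycle.phase != "Monitor and Manage":
    lifecycle.advance()
print(lifecycle.phase)  # the application stays here until the user requests termination
```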

We now give a detailed description of each platform module.

4.2.1 Resource Provisioner

Input from: Decision Module, CELAR Manager

Type: (resource_type, [resource_handle], resource_description)

Output to: CELAR DataBase, IaaS/Application

The Resource Provisioner module is responsible for translating and executing the higher level elastic commands of the Decision Making module into application specific (PaaS) and IaaS commands. It is also responsible for tracking and ensuring the correct deployment and configuration of all the application modules. The following higher level steps are performed:

Cloud account configuration: Information about cloud user credentials (such as private keys, user authentication tokens, etc) according to the underlying IaaS needs to be configured.

Deployment definition: The module will allow the user to select a base preexisting image from the IaaS provider (for instance, a vanilla Debian image), install the appropriate binaries and applications, and take a snapshot with all the necessary binaries pre-installed. Image snapshot references (the AMI equivalent of AWS) are attached to different application components (for instance, a different AMI for an HBase server/client instance and a different AMI for a MySQL server instance). Deployment scripts that are executed upon image instantiation or termination are defined, along with the preferred VM types (number of CPUs, RAM, etc).

Launch deployment.

Monitor deployment status: The module must be able to monitor the correct operation of the deployed application. Real-time tracking of the deployed virtual machines will be performed so that it is ensured that the application is up and running.

The launch deployment task requires both IaaS and PaaS (application-specific) interactions, which are performed by the Cloud Orchestration and the Application Orchestration modules respectively.

The Cloud Orchestration module translates higher-level elasticity commands from the Decision Making module into specific IaaS commands. For instance, when a decision for a new virtual machine is made, the Cloud Orchestration module utilizes the underlying cloud provider’s IaaS API, launches a new VM, and may block the Resource Provisioner’s execution cycle until the VM is considered launched (i.e., until the VM is up and running at the IaaS level and booted with a valid IP and the appropriate user credentials to allow the Application Orchestration module to log in). In another example, where extra storage space is required, the Cloud Orchestrator will contact the underlying IaaS cloud and execute the appropriate commands to create and hot-plug a larger disk volume to the VM. After the volume has been allocated and hot-plugged, the Cloud Orchestrator can then inform the Application Orchestrator to continue its workflow of actions.
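The blocking behavior described above (waiting until a newly launched VM is running and has a valid IP) can be sketched as a polling loop with a timeout. The get_status callable is a hypothetical stand-in for a query to the cloud provider’s IaaS API, and the "state" and "ip" keys of its result are assumptions of this example, not the fields of any particular provider; the credential check mentioned in the text is omitted for brevity.

```python
import time


def wait_until_running(get_status, timeout_s=300.0, poll_s=5.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Block until the VM reports state "running" with an IP, then return the IP.

    get_status: hypothetical callable returning a dict such as
                {"state": "running", "ip": "10.0.0.5"}.
    Raises TimeoutError if the VM is not ready within timeout_s seconds.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status.get("state") == "running" and status.get("ip"):
            return status["ip"]
        if clock() >= deadline:
            raise TimeoutError("VM did not become ready in time")
        sleep(poll_s)


# Simulated provider responses, standing in for real IaaS API calls.
responses = iter([
    {"state": "pending"},
    {"state": "running", "ip": None},
    {"state": "running", "ip": "10.0.0.5"},
])
print(wait_until_running(lambda: next(responses), poll_s=0.0))
```

The timeout matters for the fault-tolerance requirement: a launch that never completes should surface as an error that can be retried rather than block the Resource Provisioner indefinitely.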

The Application Orchestration module will be responsible for the orchestration of the application’s resources during its execution, in a manner similar to [Tsoumakos 2013]. In particular, when new resources are allocated for the application (e.g., VM creation, storage increase, etc.) using the Cloud Orchestrator, the Application Orchestrator must be able to execute the appropriate tasks so that the application can utilize the newly created resources. Similarly, when a de-allocation action is decided (e.g., storage decrease, VM de-allocation, etc.), the Application Orchestrator will ensure that the application is not affected by the resizing action and maintains its state. These actions can be accomplished by utilizing deployment and configuration management tools. In any case, when the Cloud Orchestrator commits a resizing action, the Application Orchestrator will follow a number of predefined steps (in the form of a workflow or a custom-made script which will be applied per application tier) in order to orchestrate the application and confirm that the resizing action was performed in an application-transparent manner. For instance, during an HBase node addition, the Application Orchestrator will be informed by the Cloud Orchestrator that a new VM with a specific IP is up and running. When this happens, it will log in to the machine (using, for instance, passwordless SSH), transfer the appropriate XML configuration files, start the appropriate Java processes and inform the HBase master about the newly joined node.

The aforementioned deployment script will be defined once per application and per resizing action (e.g., one script providing the necessary tasks for removing an HBase node). These scripts will be created by the Application Expert in cooperation with the CELAR engineer. Eventually, they will be stored in the CELAR DataBase, where they can serve as a library of scripts for new applications.

At this stage, CELAR plans to use the SlipStream [SlipStream] application as the Resource Provisioner. SlipStream is an application that allows automated provisioning and creation of cloud resources and provides simple access to a cloud infrastructure. It can provide repeatable single-click deployment of multi-component distributed applications on any cloud infrastructure; it hides cloud-specific instantiation of images and contextualization while keeping deployment scripts versioned and organized. SlipStream will soon be released under an open source license. The work done by the Cloud and Application Orchestration modules will then be integrated into the SlipStream environment: the higher-level elasticity commands will be transformed into SlipStream API calls that, in turn, will directly interact with the IaaS layer (using hooks into the most popular cloud provider APIs).

4.2.2 Decision module


Type: Current application load status, Static Information (Cloud Description, Cloud Services, VM types, Pricing Schemes, Performance, etc.), Application Profile, Application Elasticity Requirements and Application Structure/Topology

Output to: CELAR Manager, and/or other interested parties (reporting)

Type: Application Deployment Action Plan, Application Elasticity Control Action Plan, Cost Estimation, Application Elasticity Analysis Report

4.2.2.1 Overview

The Decision Module generates action plans and sends them for enforcement. These plans target: (i) the cloud deployment of new applications submitted to CELAR, and (ii) keeping the behavior of running elastic cloud applications within limits acceptable to the client.

When a new application is described and submitted to CELAR, the Application user can initiate the “Deploy Application” use case through the Application Submission Module. Once the Elasticity Provisioning Platform environment has been launched and configured through the CELAR Manager process, the Decision Module is notified. It then uses information about the application structure, elasticity restrictions, application profile, the elasticity actions available for each logical application component, and the cloud services description to generate an Application Deployment Action Plan and a Cost Estimation for running the target application on the cloud.

After an elastic application has been deployed, the Decision Module retrieves application monitoring data at run-time and analyses whether the application behaves within the defined cost, resource and performance restrictions. If restriction violations are detected, an Application Elasticity Control Action Plan, based on the elastic actions defined for the target application, is generated and sent to the Resource Provisioner through the CELAR Manager.
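As a rough illustration of the run-time check, the sketch below compares monitored values against upper bounds and reports the restrictions that are violated. The `Restriction` type, the metric names and the restriction to upper bounds only are simplifying assumptions; the actual restrictions in CELAR are expressed differently.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Restriction:
    metric: str    # hypothetical metric name, e.g. "cost_per_hour"
    upper: float   # maximum acceptable value


def find_violations(metrics: Dict[str, float],
                    restrictions: List[Restriction]) -> List[Restriction]:
    """Return the restrictions whose monitored metric exceeds its upper bound."""
    return [r for r in restrictions
            if r.metric in metrics and metrics[r.metric] > r.upper]
```

An empty result means that no control action plan is needed for the current monitoring snapshot.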

The following subsections describe the conceptual components of the Decision Module and their role in the architecture.


Figure 4-3 Decision module overview

4.2.2.2 Components

The Decision Module contains three main components (Figure 4-3): the Learning Engine, the Analysis Engine and the Planning Engine, together with a support component, the Data Processing Unit.

4.2.2.2.1 Data processing unit

In order to support fine-grained analysis and planning, the data used by the analysis component is passed through a Data Processing Unit. While many elasticity control mechanisms focus on analyzing the raw monitoring data retrieved at the virtual or physical infrastructure level, we argue that analyzing only such low-level metrics does not provide enough information to support correct analysis of application behavior. The Data Processing Unit is introduced to address this shortcoming. It refines the data received by the Decision Module, aggregating the metrics monitored at the virtual infrastructure level into higher-level ones according to the application structure/topology. Using such an aggregation mechanism, we can extract higher-level information, such as overall CPU usage, performance and cost for a specific application tier or for the entire application.

Deriving higher-level metrics from the low-level ones retrieved from existing cloud monitoring systems is non-trivial, since the monitoring system needs more information than can be collected at the virtual machine level. Instrumentation is usually applied as a technique for gathering application-level information, but we argue that many cloud users might not want to instrument their applications for various reasons, such as security or data privacy. The CELAR approach differs in that we provide a mechanism for defining higher-level metrics based on data gathered at the virtual machine level from existing monitoring systems, without intervening in the client application or modifying existing monitoring systems. This approach enables the Decision Module to use data retrieved from any monitoring data source, increasing its flexibility and applicability.
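The aggregation idea can be sketched as follows: per-VM values are grouped by the tier each VM belongs to and then combined. The function name `aggregate_by_tier`, the VM-to-tier mapping and the choice of the arithmetic mean are illustrative assumptions; other metrics (e.g., cost) would more naturally be summed.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict


def aggregate_by_tier(vm_cpu: Dict[str, float],
                      topology: Dict[str, str]) -> Dict[str, float]:
    """Average per-VM CPU usage into one value per application tier.

    `topology` maps each VM identifier to the name of its tier.
    """
    per_tier = defaultdict(list)
    for vm, cpu in vm_cpu.items():
        per_tier[topology[vm]].append(cpu)
    return {tier: mean(values) for tier, values in per_tier.items()}
```

Because only the VM-level readings and the application topology are required, no instrumentation of the client application is needed for this kind of metric.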

4.2.2.2.2 Analysis engine

The Analysis Engine uses cloud provider and application description data, together with monitoring data and application elasticity restrictions retrieved from the Data Processing Unit. The elasticity restrictions are translated into SYBL elasticity specifications, which are enforced by the SYBL runtime integrated within the Planning Engine [Copil 2013]. During the deployment of a new cloud application, the engine analyses the existing cloud provider services and compiles a report of the services that respect the cost, resource and performance restrictions and can therefore be used to deploy the new application. For already deployed applications, the Analysis Engine detects whether the application monitoring data signals a violation of the elasticity restrictions. The analysis result is sent to the Planning Engine.
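The deployment-time analysis amounts to filtering the available cloud services against the application’s restrictions. The sketch below shows one way this could look; the `Offer` fields and the three thresholds are hypothetical and much simpler than a real cloud service description.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Offer:
    name: str
    vcpus: int
    memory_gb: float
    price_per_hour: float


def compatible_offers(offers: List[Offer], min_vcpus: int,
                      min_memory_gb: float, max_price_per_hour: float) -> List[Offer]:
    """Keep only the offers that satisfy the resource and cost restrictions."""
    return [o for o in offers
            if o.vcpus >= min_vcpus
            and o.memory_gb >= min_memory_gb
            and o.price_per_hour <= max_price_per_hour]
```

The resulting list corresponds to the report of compatible services that is handed to the Planning Engine.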

4.2.2.2.3 Planning engine

The Planning Engine acts on the analysis result. During the deployment of a new application, it selects, from the compatible cloud services discovered by the Analysis Engine, the subset that provides the best performance and the most resources for the lowest cost. After the application is deployed, it uses the result of the application behavior analysis to generate, if necessary, an action plan from the available elasticity actions that would bring the target application’s behavior within the boundaries accepted by the client.
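Balancing performance, resources and cost is a multi-objective problem, and the text does not fix a particular trade-off. As one possible simplification, the sketch below ranks candidates by performance per unit cost. The `Service` type, the single `performance` score and the ratio itself are assumptions for illustration, not the Planning Engine’s actual selection rule; prices are assumed to be strictly positive.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Service:
    name: str
    performance: float      # hypothetical benchmark score, higher is better
    price_per_hour: float   # assumed to be strictly positive


def pick_service(candidates: List[Service]) -> Service:
    """Pick the candidate with the highest performance per unit cost."""
    if not candidates:
        raise ValueError("no compatible cloud services")
    return max(candidates, key=lambda s: s.performance / s.price_per_hour)
```

A weighted sum or a Pareto-front computation would be natural alternatives when several criteria must be kept separate.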

The elastic control of cloud applications usually focuses on a single part of the application (e.g., the storage part) or on specific application types. CELAR instead provides multi-level control over the application. We consider an application to be composed of components (i.e., application parts that each consist of a functional or data unit) and complex components (i.e., logical groupings of components). The Planning Engine uses the application structure to generate multi-level action plans, applying elasticity actions at different levels of the application hierarchy: over the underlying virtual machines, over the existing tiers (such as a data or business end) or over the entire application.
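The component hierarchy can be pictured as a tree in which leaves are components and inner nodes are complex components. The sketch below, with a hypothetical `Node` type and `vms_under` helper, shows how an action addressed to any level of the hierarchy can be resolved to the virtual machines it affects.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)  # sub-components, if any
    vms: List[str] = field(default_factory=list)          # VMs hosting this component


def vms_under(node: Node) -> List[str]:
    """Collect the VMs of a component or complex component, recursively."""
    found = list(node.vms)
    for child in node.children:
        found.extend(vms_under(child))
    return found
```

Calling `vms_under` on the root resolves an application-wide action, while calling it on an inner node resolves an action scoped to a single tier.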

The resulting action plan is sent through the CELAR Manager to the Resource Provisioner module, which interprets and executes it.
