Flexible Distributed Capacity Allocation and Load Redirect Algorithms for Cloud Systems

(1)

Flexible Distributed Capacity Allocation and

Load Redirect Algorithms for Cloud Systems

Danilo Ardagna∗, Sara Casolari∗∗, Barbara Panicucci∗ ∗_{Politecnico di Milano, Dipartimento di Elettronica Informazione}

∗∗_{Universit`a di Modena e Reggio Emilia, Dipartimento di Ingegneria dell’Informazione}

Email:{ardagna,panicucci}@elet.polimi.it,{sara.casolari}@unimore.it

Abstract—In Cloud computing systems, resource management is one of the main issues. Indeed, in any time instant resources have to be allocated to handle effectively workload fluctuations, while providing Quality of Service (QoS) guarantees to the end users. In such systems, workload prediction-based autonomic computing techniques have been developed. In this paper we propose capacity allocation techniques able to coordinate mul-tiple distributed resource controllers working in geographically distributed cloud sites. Furthermore, capacity allocation solutions are integrated with a load redirection mechanism which forwards incoming requests between different domains. The overall goal is to minimize the costs of the allocated virtual machine instances, while guaranteeing QoS constraints expressed as a threshold on the average response time. We compare multiple heuristics which integrate workload prediction and distributed non-linear optimization techniques. Experimental results show how our solutions significantly improve other heuristics proposed in the literature (5-35% on average), without introducing significant QoS violations.

Keywords: Infrastructure-as-a-Service Clouds, Perfor-mance Modeling and Management, Capacity Allocation, Load Balancing, QoS.

I. INTRODUCTION

Cloud computing is an emerging paradigm that aims at streamlining the on-demand provisioning of software, hard-ware, and data as services, providing end-users with flexible and scalable services accessible through the Internet [15]. Modern cloud infrastructures live in an open world, charac-terized by continuous changes in the environment and in the requirements they have to meet. Continuous changes occur autonomously and unpredictably, and they are out of control of the cloud provider. Therefore, in order to provide infrastructure or software as a service, advanced solutions have to be devel-oped able to dynamically adapt the cloud infrastructure, while providing continuous service and performance guarantees.

In this paper we propose workload prediction-based capac-ity allocation techniques able to coordinate multiple distributed resource controllers working in geographically distributed cloud sites. We propose also a dynamic load redirection mech-anism which allows to make near-instantaneous and intelligent decisions on the requests that have to be redirected during peak loads from heavily loaded sites to other sites. Requests’ distribution is optimized according to the average response time of incoming requests and the QoS requirements of end users.

In cloud systems centralized approaches to capacity alloca-tion and load balancing have several critical design limitaalloca-tions

including lack scalability and high network communication cost (such as network bottleneck congestion) [5], [25]. Cen-tralized solutions are not suitable for geographically distributed systems, such as the cloud or more in general massively distributed systems [3], [19], [16], since no entity has global information about all of the system resources. Therefore, efficient decentralized solutions are mandatory. Distributed resource management policies have been proposed to govern efficiently geographically distributed systems that cannot im-plement centralized decisions and support strong interactions among the remote nodes [3]. Sometimes, local decisions could lead the system even to unstable oscillations [18]. It is, thus, difficult to determine the best control mechanism at each node in isolation, so that the overall system performance is opti-mized. Dynamically choosing when, where and how allocate resources and coordinating the resource allocation accordingly is an open problem and is becoming more and more relevant with the advances of clouds [16]. One of the first contributions for resource management in geographically distributed systems has been proposed in [3], where novel autonomic distributed load balancing algorithms have been proposed. In distributed streaming networks, authors in [19] have proposed a joint admission control and resource allocation scheme.In our work, the capacity allocation and load redirect of multiple class of requests are modeled as non-linear programming problems and solved with decomposition techniques exploiting predictive models of the incoming workload at each physical site. We compare our approach with other heuristics proposed in the literature [13], [27], [26] obtaining 5-35% cost savings, with-out incurring in significant Service Level Agreement (SLA) violations. To the best of our knowledge this paper is the very first contribution that proposes an analytical solution to the capacity allocation and load redirection for cloud systems.

The remainder of the paper is organized as follows. The next Section introduces the problem under study, while Section III describes our main design assumptions. The prediction tech-niques used in our work are introduced in Section IV. The op-timization problem formulation is presented in Section V. The experimental results demonstrating the quality and efficiency of our solutions are reported in Section VI. Conclusions are finally drawn in Section VII.

II. PROBLEMSTATEMENT

In this paper we take the perspective of a Web service provider which offers multiple transactional Web Services 2011 IEEE 4th International Conference on Cloud Computing

(2)

Local workload manager _Virtualized Servers IaaS Provider !"#$%&'()&*+",-().,"$.#( /&#01&#-( !)( !)( !)( 234( 234( 235( !)( 236( 7( 7( i=1 i=2 i=3 i=4 !!"#$!""#$%#$!&"$$ !"#$%!&#$%'$%!(#%% !"#_$%!&#_$%'$%!(#_%% Local workload manager

Local CA and LR manager Local WS arrival rates

Virtualized Servers Execution rate of local arrivals

!!"#$!%"#$&#$!'"$$

!"#$%!&#$%'$%!(#%%

!"#$%!&#$%'$%!(#%%Redirect rate of local arrivals

!"#$%"#& & &

Fig. 1. Cloud System Reference Framework.

(WSs) hosted at multiple sites of an Infrastructure as a Service (IaaS) provider. The hosted WSs represents different applica-tions which can be heterogeneous with respect to resource de-mands, workload intensities, and QoS requirements. Services with different QoS and workload profiles are categorized into independent WS classes.

An SLA contract, associated with each WS classkis estab-lished between the WS provider and its end users. It specifies the QoS levels, expressed in terms of average response time

Rk, the WS provider must meet while responding to end users

requests for a given service class. Overall, the system serves a set K of WS classes and average response time thresholds are denoted withRk.

Applications are hosted in virtual machines (VMs) which are provided on demand by the IaaS provider. For the sake of simplicity, we assume that each VM hosts a single Web service application. Multiple VMs implementing the same WS class can run in parallel at each physical location. In that case, we assume that the running VMs are homogeneous in terms of RAM and CPU capacity and evenly share the incoming workload (this corresponds to the solution currently implemented by IaaS providers [2]). Furthermore, services can be located on multiple sites (see Figure 1). For example, if we consider Amazon Inc. with its Elastic Compute Cloud (EC2) [2] as IaaS provider, EC2 allows software providers to dynamically deploy VMs on five regions located around the world which are further spread on multiple availability zones. IaaS providers usually charge software providers on a hourly basis [2]. Hence, the WS provider has to face the Capacity Allocation (CA) problem which consists on determining every hour the optimal number of VMs for each WS class in each IaaS site according to the average load predicted on a hourly basis, while guaranteeing SLA constraints. In the following we will denote by T1 the mid-long time scale adopted for

VMs provisioning. On the other hand, if a site resources are insufficient (e.g., because of an unpredictable workload

fluctuation) and the computing conditions become critical, incoming requests can be redirected to other sites. As in other approaches, dynamic Load Redirection (LR) [5], [27] is performed periodically everyT2<< T1 time instants (see

Figure 2) at a more fine-grained time scale (e.g., 5 to 10 minutes) on the basis of a short-term prediction of future WS workloads [1], [8] or can be triggered by a monitoring system in order to react to unexpected events (e.g., system failures). By considering two different time scales, we are able to capture two types of information related to the IaaS sites [27]. In particular, the fine-grained workload traces exhibit a high variability due to the short-term variations of the typical Web-based workload and for this reason, the fine-grained time scale provides useful information for the dynamic load redirections. Instead, increasing the time scale, the workload traces are more smoothed and are not characterized by the instantaneous peaks of the fine-grained time scale. This aspect allows us to use the mid-long time scale to predict the workload trend that represents useful information for the capacity allocation algorithm. In the following, we will denote by I the set of IaaS sites. Each site is associated with its resource attribute (available VMs capacity). For simplicity we assume that VMs are homogeneous in terms of their computing capacity. In fact, even in case of heterogeneous VMs, cloud providers have a limited set of available configurations, sayS, and therefore a site with heterogeneous resources can be modelled asS sites with homogeneous resources. The capacity of VMs at site i

is denoted withCi_{, while for each site}_i_{and WS class}_k _we

denote withΛbi_k the arrival rate predicted at the time scaleT1

coming from the time zone where the site is located, while we denote with b

b Λ i

k the corresponding prediction at time scale T2 (see Figure 2).

The objective of the CA problem is to determine the number of VMs able to serve Λbi_k requests/s, while minimizing VMs

costs and guaranteeing thatRk≤Rk.We assume that a WS

provider can establish two different contracts with the IaaS provider. Namely, it may be possible to access VMs on a

pure on-demand basis and the WS provider will be charged

on a hourly basis (see e.g., Amazon EC2on-demand pricing scheme, [2]). Otherwise, it may be possible to pay a fixed annual flat rate for each VM and then access the VMs on a pay-per-use basis with a fee lower than the pure on-demand

case (see e.g., Amazon EC2 reserved instances pricing scheme, [2]). The time unit cost (e.g.,$per hour of VM usage) for the use offlat VMs at sitei is denoted byci, while the cost for

VMson demandwill be denoted byec

i_{, with}_ci_< e

ci_{. The CA}

problem solution determines every T1 time unit the number

offlat VMs to be allocated to WS classk at site i,Ni

k,and

the number of on demandVMs to be allocated to classk at sitei,Mi

k.We will denote withN i

, the number offlat VMs available at sitei(obtained with the annual flat contract).

On the other hand, the LR problem aims at determining (everyT2time instants) the execution rate of local arrivals for

classk at site i,xi

k,and the redirect rate of classk at site i

toward the other sites,zi

k,in order to satisfy the predictionΛbb i k

(3)

!"# !"# !$#!"## !!"## $# $#

Fig. 2. Prediction model time scales.

for the local arrivals, while guaranteeing thatRk≤Rk.

For the sake of clarity, the notation adopted in this paper is summarized in Table I.

III. DESIGNASSUMPTIONS

Our dynamic CA and LR techniques combine a workload predictor and an optimization model. In the following we will model each WS class hosted in a VM as an M/G/1 queue [10] as authors in [8] and we assume that requests are served according to the processor sharing scheduling discipline which is common among Web service containers. Future performance for each WS class are obtained on the basis of the prediction of future workloads. The optimization model uses these estimates to determine the number of VM intances Ni

k and Mki, the

execution rate of local arrivals for each classxik,and, possibly,

the workload redirect to other sites z_ki.

For the sake of simplicity, if the workload is redirected to other sites, the fraction of workload to individual sites is inversely proportional to the network delay or equivalently is directly proportional to the “conductance”gi,j _{of the network}

link between site i and j defined as gi,j = 1/di,j. In other

words, if we define the “equivalent conductance” Gi _{at site}

i as Gi ₌ P

j∈I,i6=j

gi,j_{, the overall load at site} _i _{due to the}

redirect of other sites is given by:

X

j∈I,j6=i

gj,i_zj

k

Gj .

At the time scaleT2, the total rate of classkrequests executed

at siteiis the sum of the requests executed from local arrival, i.e. xi

k ≤ Λik, and the requests executed from the redirect

which, according to the previous equation is given by:

xi_k+X

j6=i

gj,i_zj

k

Gj . (1)

In other words, in our LR scheme requests can be redirected only once. Otherwise multiple hops could penalize too much some individual requests, increasing the overall response time variance of requests within the same WS class.

IV. WORKLOADPREDICTIONMODELS

For each siteiand WS classk, our CA and LR algorithms use two different predictions of the (real) incoming workload

Λi

k coming from the time zone where the site is located.

System Parameters

I Set of sites

K Set of WS classes

Ci _{VM instances capacity at site}_i

ci _{Time unit cost for}_flat_{VMs at site}_i

ec

i _{Time unit cost for}_{on demand}_{VMs at site}_i

Ni Number offlatVMs available at sitei T1 Long term CA time horizon

T2 Short term LR time horizon

Λi

k Real local arrival rate for WS classkat sitei

b

Λi

k Local arrival rate prediction for WS classkat siteiat time scaleT1

b b

Λ i

k Local arrival rate prediction for WS classkat siteiat time scaleT2

µk Maximum service rate of a capacity 1 VM for executing WS classkrequests

di,j_{, i}₆₌_j _{Delay (s) for requests redirecting from} siteito sitej

gi,j₌ 1

di,j, i6=j “Conductance” of the communication link (i,j) Gi₌P

j

gi,j_{, i}₆₌_j _{“Equivalent conductance” seen from} siteito the other sites

Ri

k Response time for executing WS classk request at sitei

Rk WS class requestkthreshold Decision Variables

Ni

k Number offlatVMs allocated for classk re-quest at sitei

Mi

k Number ofon demandVMs allocated for class

krequest at sitei xi

k Execution rate of local arrivals for WS classk request at sitei

zi

k Redirect of WS classkrequest at sitei toward other sites

TABLE I

CAANDLRPROBLEMS PARAMETERS AND DECISION VARIABLES.

In order to predict the local arrival rate Λi

k, we take

into account a simple well known model, the Exponential Smoothing (ES). Our choice for a simple model is motivated by the application context characterized by short-time pre-dictions suitable for autonomic decisions subject to real-time constraints in cloud systems. ES is an intuitive forecasting method that unequally weights the samples of the input time series Λi

k [17]. Non-uniform weighting is achieved through

smoothing parameters which determine how much importance is assigned to each sample. ES models have been adopted in many fields for decades [12] and are suitable to runtime and non-stationary applications. In our work, we consider a ver-sions of ES, where parameters are dynamically chosen in order to adapt the prediction model to the workload fluctuations that characterize the modern cloud systems.

In the following description of the ES-based prediction model, we consider the time scale T1. At sample t, the

ES model predicts the local arrival rate at T1 steps ahead, b

Λi

k(t+T1), as a weighted average of the last sample Λik(t)

and of corresponding predicted sampleΛbi_k(t), that is equal to: b Λi k(T1) = 1 T1 T1 X t=1 Λi k(t) b Λik(t+T1) =γik(t)Λbi_k(t) + (1−γ_ki(t))Λi_k(t), t > T₁

(4)

whereΛbi_k(T1)is the initial predicted value and0< γki(t)<1

is thesmoothing factorat current sampletrelated to the sitei

and the class k that determines how much weight is given to each sample. We obtain a dynamic ES model by re-evaluating the smoothing factorγi

k(t)at each prediction samplet. There

are different proposals for the dynamic estimation of γi

k(t)

(e.g., [24], [17]). Although there is no consensus, a widely used procedure is proposed by Trigg and Leach [24]. They define the smoothing parameter as the absolute value of the ratio of the smoothed error,Ai

k(t), to the absolute error,Eki(t),

γi_k(t) =A

i k(t)

Ei

k(t). The smoothed and absolute errors are equal to:

Aik(t) =φik(t) + (1−φ)Aik(t−T1)

Eki(t) =φ|ik(t)|+ (1−φ)Eki(t−T1)

wherei_k(t)is the forecast error at samplet,i_k(t) = Λi_k(t)−

b

Λi

k(t), and φ is set arbitrary, with 0.2 being a common

choice [24]. This dynamical choice of γi

k(t)should improve

the prediction quality and should limit the delay problem related to the traditional ES model based on a static choice of the γ_ki parameter. The considered model is expected to be useful in contexts characterized by time series with non stationary behaviour and a variable noise component. We use an analogous implementation of the ES prediction model to predict the local arrival rate at time granularityT2,Λbb

i

k. For the

sake of simplicity, in the remainder of the paper thetsample index will be omitted.

V. OPTIMIZATIONPROBLEMFORMULATION

As discussed in Section II, Capacity Allocation and Load Redirect are performed with different time scales. The Capac-ity Allocation problem is formulated in the next Section, while our Load Redirect mechanism is presented in Section V-B.

A. Capacity Allocation problem

The CA problem is solved with T1 time period and aims

at minimizing the overall costs for flat and on demand VM instances of multiple distributed IaaS sites, while guaranteeing that the average response time of each class is lower than the SLA threshold. The CA determines the number of VMs Ni k

and Mi

k required to serve the arrival rate Λbi_k. In this phase

the LR mechanism is neglected. Preliminary results, indeed, have shown that the LR mechanism, even if significant at the lower time scale T2, introduces a limited increment to each

class local incoming workload Λbi_k which is comparable with

the workload prediction accuracy obtained in practice. If we denote withµkthe maximum service rate of a capacity

1 VM for executing WS class k requests, the response time for executing locally WS class k at sitei is given byRi

k = 1 Ci_µ k− b Λi k N i k+M ik

. In particular it must be (M/G/1 equilibrium condition) Λbi_k < Ciµk(Nki +Mki), and the total response

time for class krequest over all sites is:

Rk= X i b Λi kRik P j b Λj_k. (2)

Hence, after some basic algebra, the CA problem can be formulated as: (CA) min Ni k,M i k P k P i ci_Ni k+ec i_Mi k subject to b Λi_k< Ciµk(Nki+M i k) ∀k∈K,∀i∈I X i b Λi k(Nki+Mki) Ci_µ k(Nki+Mki)−Λbi_k ≤Rk X j b Λj_k ∀k∈K X k∈K N_ki≤Ni,∀i∈I,

where the last constraints family guarantees that the number of VMs allocated to the whole set of classes at site i is at most equal to the number of flat VMs available at each site. Note that, in the problem formulation we have not imposed variables Ni

k and Mki to be integer, as in reality

they are. In fact, requiring variables to be integer makes the solution much more difficult since non linear constraints are introduced. We therefore decide to deal with continuous variables, actually considering a relaxation of the real problem. However, preliminary experimental results have shown that if the optimal values of the variables are fractional and they are rounded to the closest integer solution, the gap between the solution of the real integer problem and the relaxed one is very small, justifying the use of a relaxed model. Furthermore, we can always choose a rounding to an integer solution which preserves the feasibility and the corresponding gap in the objective function is a few percentage points.

The CA problem has a linear objective function over a con-vex set. Hence, the global optimum solution can be obtained solving CA in parallel at each site by adopting standard non linear solvers. This requires that each site broadcasts its Λbi_k

predictions which can be obtained however considering only local information. Since this broadcast is performed everyT1

time instants, the network overhead for the CA solution is very limited.

B. Load Redirect problem

Once the number of on demand instances has been deter-mined, local requests can be dynamically redirected to other sites with time granularityT2in order to, e.g., avoid episodic

local congestions due to the variability of the incoming work-load at time granularityT2around its hourly average prediction

(see Figure 2).

According to equation (1), the response time for executing locally WS class k at site i (i.e., without considering the network delay due to redirects) is given by:

b Rik= 1 Ci_µ k− xi k+ P j6=i gj,i zj_k Gj Ni k+M i k .

During time interval T2 the number of executions of classk

request at site i is T2xik+T2 P

j6=i

gj,i_zj

k

Gj , and the response time for remote requests is given by both Rbi_k and the delay

P j6=i dj,i·gj,i z j k Gj P j6=i gj,i zj k Gj

(5)

classkrequest at siteiis: Rik=Rbi_k+ P j6=i zj_k Gj xi k+ P j6=i gj,i_zj k Gj ,

and the total response time for class k request over all sites is: Rk= X i xi k+ P j6=i gj,i_zj k Gj Ri k P j b b Λ j k .

The goal of our load redirect scheme is to cooperatively minimize request average response times. Formally the LR can be formulated as a constraint programming problem since

Rk ≤ Rk must hold and the cost for request execution

is determined by the CA solution and is not influenced by the LR decision variables. However, in order to provide an efficient distributed solution, in our LR problem formulation we consider the total requests response time as the metric to be minimized. Preliminary experimental results have shown, indeed, that introducing an objective function allows to speed up the distributed algorithm convergence relying on standard non linear solvers. The LR problem can be formulated as follows: (LR) min xi k,z i k X k X i      (Ni k+Mki) xik+ P j6=i gj,iz_kj Gj Ci_µ k(Nki+Mki)−(xik+ P j6=i gj,i_zj k Gj ) +X j6=i zj_k Gj      subject to xik+zik=bbΛ i k ∀k∈K, i∈I, (3) xik+ X j6=i gj,i_zj k Gj < C i_µ k(Nki+Mki) ∀k∈K, i∈I, (4) xi_k, zi_k≥0 ∀k∈K, i∈I.

Constraints (3) ensure that the overall classkrequests at node

iare locally executed or are redirected toward the other sites, while constraints (4) guarantee that VMs saturation conditions are avoided.

(LR) defines a centralized load balancing problem: All the system information (i.e., the local incoming workload predictionsb

b Λ i

k) has to be gathered together and used to get the

optimal workload balancing. However, for large scale cloud systems, this centralized load balancing scheme is not suitable. Even assuming that the broadcast of b

b Λ i

k values does not add

a significant network overhead in the system (indeed T2 is

around 5-10 minutes), the solution of the (LR) problem for large system cannot be obtained within theT2time limit with

the non linear solvers currently available. For this reason, we have devised a distributed decomposable solution for problem (LR) relying on Lagrangian techniques and obtaining closed formulas for elementary problems to be solved by applying the Karush Kuhn Tucker (KKT) conditions.

Our implementation supports a distributed protocol for (LR) solution in which each site solves its problem using both local

information and information received from other sites. In par-ticular, as discussed in the following, we develop an iterative method and at each iterationzi

k is the only information that

will be shared among sites.

Our decomposition technique is founded on duality theory in optimization, [9]. First of all, we observe that the optimiza-tion problem is convex (see [4]), the duality gap is zero and then the global optimum solution can be identified [23] solving the primal via the dual. Secondly, (LR) can be decomposed into|K|independent sub-problems (one for every class) which can be obtained from (LR) simply omitting thek index.

Furthermore, LR is characterized by two types of coupling: Coupling constraints, and coupled utilities [23]. Indeed, each term in the objective function not only depends on the local variable xi

k (xi in the following) but also on the variables of

the other siteszi

k(zi). The key idea to address coupled utilities

is to introduce auxiliary variables and additional equality constraints. The (LR) problem solution then can be obtained by solving|K|problems as:

(LRk) min xi_,yi_,zi_,wi X i " (Ni+Mi) xi+yi Ci_µ_(Ni₊_Mi₎₋_(xi₊_yi₎+w i # subject to xi₊_zi₌b b Λ i ∀i∈I, (5) xi+yi< Ciµ(Ni+Mi) ∀i∈I, (6) yi=X j6=i gj,i_zj Gj ∀i∈I, (7) wi=X j6=i zj Gj ∀i∈I, (8) xi, yi, zi, wi≥0 ∀i∈I, (9)

where xi_{, y}i_,_and_wi _{are local variables at site} _i_{. Next, we}

consider the Lagrangian: (RP) min xi_,yi_,zi_,wi X i " (Ni+Mi) xi+yi Ci_µ_(Ni₊_Mi₎₋_(xi₊_yi₎+w i₊ +Θi yi− X j6=i gj,i_zj Gj +ηi wi− X j6=i zj Gj  

subject to constraints (5), (6), and (9), where theΘi’s andηi’s

are theconsistency prices[23].

By exploiting the decomposable structure of the Lagrangian, the relaxed problem (RP) further separates into |I| subprob-lems: (SU Bi) min xi_,yi_,zi_,wi (Ni_+Mi₎ _xi_+yi Ci_µ_(Ni_+Mi₎₋_(xi_+yi₎+w i₊ +Θi yi−P j6=i gj,izj Gj +ηi wi−P j6=i zj Gj

subject to constraints (5), (6), and (9).

The optimal value of (RP) for a given set of Θi’s andηi’s

defines the dual functionL(Θ, η)and the dual problem is then given by:

(6)

(D) maxΘ,ηL(Θ, η).

The dual problem can be solved by using a subgradient method: Given initial values Θi(0) andηi(0) the iterates are

generated by Θi(t+ 1) = Θi(t) +αt yi− X j6=i gj,i_zj Gj , (10) ηi(t+ 1) =ηi(t) +βt wi− X j6=i zj Gj , (11)

wheretis the iteration index andαt, βtare sufficiently small

positive parameters (see [23]). Note that, each update step in this approach uses data from all of the sites. This method naturally lends itself to a distributed implementation: Each site

iupdates its primal variablesziwhich in turn are broadcasted

toward other sites. Then, each siteiupdates its dual variables

Θi, ηiusing only local (sub-gradient) information. The scheme

for the solution of each sub problem (LRk) is reported in

Algorithm 1. The solution of each sub-problem (SU Bi) can

be obtained also very efficiently by closed formula derived by the KKT conditions. Details are omitted here for space limitations and can be found in [4].

The procedure reported in Algorithm 1 is stopped when the percentage difference of the objective function of two consecutive iterations is within a given precision.

Algorithm 1:Lagrangian Distributed

Optimiza-tion Procedure

1) Initialization: Sett= 0,Θ(0)andη(0)equal to some values;

2) Each site solves its problem (SU Bi) and broadcast the solutionzi(not the auxiliary variablesyiandwi); 3) Price updating: Each site iterates the consistency prices

with the iterate in (10) and (11);

4) Set t ← t+ 1 and go to Step 2 (until satisfying stopping criterion);

VI. EXPERIMENTALRESULTS

The resource management algorithms proposed have been evaluated for a variety of system and workload configura-tions. Section VI-A presents the experimental settings and the results on the scalability of our algorithms. Section VI-B presents a cost-benefit evaluation of our solution compared with other heuristics and state-of-the-art techniques [13], [27], [26]. Finally, Section VI-C shows the results of the application of our resource management algorithms in a real prototype environment deployed in Amazon EC2.

A. Algorithm Performance

To evaluate the efficiency of the proposed algorithms, we have used a large set of randomly generated instances. All tests have been performed on VMWare virtual machine based on Ubuntu 9.10 server running on an Intel Nehalem dual socket quad-core system with 32 GB of RAM. The virtual machine has a physical core dedicated with guaranteed performance and 4 GB of memory reserved. We used SNOPT 7.2.4 as non linear solver [20].

The number of cloud sites|I|has been varied between 20 and 60, the number of request classes |K| between 100 and 1000. We would like to remark that, even if the number of cloud sites is small in reality (e.g., Amazon owns 14 avail-ability zones spread over five different regions worldwide),

we consider up to 60 sites. Recall from Section II that in our approach a site withS VM configurations is modelled byS

sites with a single VM configuration.

The maximum service rate of a capacity one VM for executing classk requests,µk,is setRk= 3/µk,as in [26].

Experimental results (see [4] for further details) have shown that the average execution time required to solve instances of maximum size is lower than 3 minutes and one minute for the CA and LR problems, respectively. Hence, both the CA and LR mechanisms can be adopted at the considered time scales.

B. Comparison with alternative literature proposals

We performed a cost-benefit evaluation of our approach considering other heuristics with a twofold aim: On the one hand we compared our solution with other state-of-the-art techniques which exploit the utilization principle and deter-mine the number of VM instances according to an utilization threshold upper bound [13], [27], [26], [2]. On the other hand, the research question we addressed was concerning the effectiveness of the LR mechanism in the cloud. Indeed, in cloud systems the resource provisioning can be performed in very few minutes and hence instead of redirecting the load to other sites one can argue that the allocation of additional VMs to manage peak of traffics could be more effective. In this Section we report the results of the comparison of our CA+LR mechanism with a set of solutions which perform a more fine grained CA at multiple time scales. In the remainder of this Section the following alternative solutions will be considered:

• Heuristic 1: The CA is performed on a 5 minutes

time horizon and the number of VMs is determined according to utilization thresholds as in other approaches proposed in the literature [13], [27], [26] and currently implemented also by IaaS providers (see, e.g., the very recent release of Amazon AWS Elastic Beanstalk [2]). In the evaluation, a life span of one hour for each instantiated VM has been considered. The number of VMs is determined such that the utilization of the VMs is equal to a given thresholdτ1. The VM provisioning is

further triggered if the prediction of the VMs utilization is higher than a second thresholdτ2> τ1. Multiple analyses

have been performed by adopting different thresholds:

(τ1, τ2)=(40%,50%),(50%,60%), and(60%,80%).

• Heuristic 2: Same as Heuristic 1 but the number of

VMs is determined by optimally solving the CA problem reported in Section V-A every 5 minutes.

• Heuristic 3:Same asHeuristic 2 but with a 10 minutes

time horizon.

The performance parameters of the request classes have been randomly generated as in Section VI-A, while the local incoming workload has been obtained from the traces of a very large dynamic Web-based system implementing a multi-tier logical architecture described in [11]. In our experiments the following daily traces have been considered with 5 minutes sample time interval:

(7)

where the number of clients requests changes following the bi-modal requests profile shown in [7].

• Heavy day scenario:It exhibits a 40%increment in the

number of the client requests with respect to the baseline workload.

• Noisy day scenario: It is characterized by the same

request profile belonging to the heavy day scenario with an additional noise component (we added a white noise with zero mean and standard deviation equal to 10% of the heavy day peak). In this way, we increase the system variability in order to prove the accuracy of the prediction model and the robustness of our overall solution also in highly variable contexts.

All scenarios are representative of the typical Web-based workload that is characterized by heavy-tailed distribu-tions [14], [6]. Moreover, the heavy scenarios add burst arrivals and flash crowds [21] that contribute to augment the request skew, and they represent a more stressful testbed for prediction models. The motivation behind this choice is to demonstrate that our prediction algorithm works even in critical scenarios and our CA+LR mechanism are robust to workload variability, although the toughest goal of predicting hot spot events remains an open issue beyond the scope of this paper. In particular, the prediction model considered in this paper is able to provide an accurate prediction quality that, in terms of mean square error [12], is always lower than 10%.

Overall we have considered 12 sites, which we assume are located in 12 different time zones with a one hour time lag and the normal, heavy, and noisy traces have been skewed accordingly.

In the following quantitative analysis we set T1=1 hour,

andT2 = 5minutes. Figure 3 plots, as an example, the VM

costs over the 24 hours for the noisy day scenarios (the normal and heavy cases are very similar), while Table II reports the percentage savings of our approach with respect to the other heuristics considering the total costs over the whole day. Figures 4 and 5 reports, as an example, the plot of the ratio

Rk

Rk of the average response time with respect to the response time threshold of a class considered as a reference example at site 1. The plot shape is pretty general and is independent of the considered site and request class. As the results show, the Heuristic 1 is very sensitive to the thresholds adopted. The

(40%,50%)case is very conservative, it is around 35% more

expensive than our approach but always allows to guarantee the response time threshold (the ratio is strictly lower than 1). Vice versa the (60%,80%) case provides costs close to our solution (only 2-4% higher) but introduces a very large number of SLA violations especially in the noisy day scenario (see Figure 5). Vice versa, our solution introduced overall only 37 violations over the 3456 time intervals considered in the 12 sites, over the whole day. Furthermore, Heuristics 1 is more sensitive to traffic variability. Heuristics 2 and 3 perform better than Heuristics 1, since the number of VMs is optimally determined by the (CA) problem solutions. However, the LR mechanism is still effective since allows to reduce costs by

Alternative solution % Savings

Normal day Heavy day Noisy day Heuristic 1 -(40%,50%) 35.47 34.86 36.84 Heuristic 1 -(50%,60%) 19.53 18.83 21.4 Heuristic 1 -(60%,80%) 3.12 2.25 4.93 Heuristic 2 4.3 3.26 4.44 Heuristic 3 11.56 10.27 6.98 TABLE II

VMPERCENTAGE COST SAVINGS OVER THE24HOURS OBTAINED BY OUR APPROACH.

4-12%. The fine grained resource allocation introduced by Heuristics 2 and 3 indeed ends into an over-provisioning and better performance (see Figures 4 and 5), while the LR mechanism allows to forward traffic spikes to other locations without overcoming in any additional capacity allocation or significant SLA violations.

C. Amazon EC2 Test

The effectiveness of our resource management algorithms has been also evaluated on Amazon EC2 performing experi-ments running the JSP implementation of the SPECweb2005 (http://www.spec.org/web2005/) benchmark. In particular, we have considered the banking workload, which simulates the access to an on line banking Web site implementing a full HTTPS load.

The Web server (Apache Tomcat 5.5.27 in our setup) has been deployed on a large instance, while the load generators, the client coordinator, and the back-end simulator have been hosted by extra-large Amazon instances (in this way we are guaranteed that they are not the system bottleneck). The test is performed deploying VM instances in Virginia and North California Amazon regions. We have obtained an estimate of the maximum service rate parameters and the network delay among different Amazon sites by performing an extensive off line profiling along the lines of [22].

We set R = 0.7 seconds as threshold for the average response time and the overall test lasts one hour. We have generated an appropriate traffic profile and run the CA algo-rithm at time 0 and at time 40 minutes. The LR algoalgo-rithm is run every 10 minutes. During the first 40 minutes the CA solution allocates two on demand Web server instances at the two Amazon sites. During the last 20 minutes the load is evenly shared by introducing the Amazon Elastic Load Balancer among three on demand Web server instances in the Virginia region and five in North California. The Virginia local incomig workload is redirected to North California from minute 30 to 40, while it is redirected from North California to Virginia during the last 20 minutes. Figures 6 and 7 show the the overall traffic served at the two sites.

Figure 8 reports the end users average response time and shows that our CA+LR algorithms are effective since the system provide performance according to the SLA for most of the time and it is able to react to abrupt workload variations.

VII. CONCLUSIONS

We proposed prediction-based distributed CA and LR al-gorithms for IaaS cloud system minimizing the cost of the

(8)

Fig. 3. VM instances costs for the noisy day scenario.

Fig. 4. Response time threshold ratio for a reference class, normal day scenario.

Fig. 5. Response time threshold ratio for a reference class, noisy day scenario.

Fig. 6. Overall traffic served at Virginia EC2 site.Fig. 7._{EC2 site.}Overall traffic served at North CaliforniaFig. 8. Average response time measured for the SPECweb2005 banking workload.

running VMs. Experimental results shown that our solutions significantly improve other heuristics proposed in the literature (5-35% on average), without introducing significant QoS vio-lations. Future work will extend the validation of our solution considering a larger experimental setup.

REFERENCES

[1] B. Abraham and J. Ledolter.Statistical Methods for Forecasting. John Wiley and Sons, 1983.

[2] Amazon Inc. Amazon Elastic Cloud. http://aws.amazon.com/ec2/. [3] M. Andreolini, S. Casolari, and M. Colajanni. Autonomic request

management algorithms for geographically distributed internet-based systems. InSASO, 2008.

[4] D. Ardagna, S. Casolari, and B. Panicucci. Flexible distributed capacity allocation and load redirect algorithms for cloud systems. Politecnico di Milano, Tech. Report 2011.3. http://home.dei.polimi.it/ardagna/CloudCALR2011.pdf, 2011. [5] D. Ardagna, B. Panicucci, M. Trubian, and L. Zhang. Energy-Aware

Autonomic Resource Allocation in Multi-tier Virtualized Environments. IEEE Trans. on Services Computing,available on line.

[6] M. Arlitt, D. Krishnamurthy, and J. Rolia. Characterizing the scalability of a large Web-based shopping system. 1(1):44–69, Aug. 2001. [7] Y. Baryshnikov, E. Coffman, G. Pierre, D. Rubenstein, M. Squillante,

and Y. Yimwadsana. Predictability of web server traffic congestion. In WCW Proc., 2005.

[8] M. Bennani and D. Menasc´e. Resource Allocation for Autonomic Data Centers Using Analytic Performance Models. InIEEE Int’l Conf. Autonomic Computing Proc., 2005.

[9] D. Bertsekas.Nonlinear Programming. Athena Scientific, 1999. [10] G. Bolch, S. Greiner, H. de Meer, and K. Trivedi. Queueing Networks

and Markov Chains. J. Wiley, 1998.

[11] H. W. Cain, R. Rajwar, M. Marden, and M. H. Lipasti. An architectural evaluation of Java TPC-W. InHPCA Proc., 2001.

[12] S. Casolari and M. Colajanni. On the selection of models for runtime prediction of system resources. Autonomic Systems, Springer (Eds. Danilo Ardagna, Li Zhang), 2010.

[13] L. Cherkasova and P. Phaal. Session-Based Admission Control: A Mechanism for Peak Load Management of Commercial Web Sites.IEEE Transactions on Computers, 51(6), June 2002.

[14] M. E. Crovella, M. S. Taqqu, and A. Bestavros. Heavy-tailed probability distributions in the World Wide Web. InA Practical Guide To Heavy Tails, pages 3–26. Chapman and Hall, New York, 1998.

[15] M. D. Dikaiakos, D. Katsaros, P. Mehra, G. Pallis, and A. Vakali. Cloud Computing: Distributed Internet Computing for IT and Scientific Research.IEEE Internet Computing, 13(5):10–13, 2009.

[16] H. Erdogmus. Cloud computing: Does nirvana hide behind the nebula? IEEE Softw., 26(2):4–6, 2009.

[17] S. Everette and J. Gardner. Exponential smoothing: State of the art. Journal of Forecasting, 4, 1985.

[18] P. Felber, T. Kaldewey, and S. Weiss. Proactive hot spot avoidance for web server dependability. Reliable Distributed Systems, IEEE Symposium on, pages 309–318, 2004.

[19] H. Feng, Z. Liu, C. H. Xia, and L. Zhang. Load shedding and distributed resource control of stream processing networks. Perform. Eval., 64(9-12):1102–1120, 2007.

[20] P. E. Gill, W. Murray, and M. A. Saunders. SNOPT: An SQP algorithm for large-scale constrained optimization.SIAM Journal of Optimization, 12:979–1006, 2002.

[21] J. Jung, B. Krishnamurthy, and M. Rabinovich. Flash crowds and denial of service attacks: Characterization and implications for CDNs and Web sites. InWWW2002 Proc., Honolulu, HW, May 2002.

[22] G. Pacifici, W. Segmuller, M. Spreitzer, and A. Tantawi. Cpu demand for web serving: Measurement analysis and dynamic estimation.Perform. Eval., 65(6-7):531–553, 2008.

[23] D. P. Palomar and M. Chiang. A tutorial on decomposition methods for network utility maximization. IEEE J. Sel. Areas Commun, 24:1439– 1451, 2006.

[24] D. Trigg and A. Leach. Exponential smoothing with an adaptive response rate.Operational Research Quarterly, 18, 1967.

[25] B. Urgaonkar, G. Pacifici, P. J. Shenoy, M. Spreitzer, and A. N. Tantawi. Analytic modeling of multitier Internet applications.ACM Transaction on Web, 1(1), January 2007.

[26] A. Wolke and G. Meixner. Twospot: A cloud platform for scaling out web applications dynamically. InServiceWave, 2010.

[27] X. Zhu, D. Young, B. Watson, Z. Wang, J. Rolia, S. Singhal, B. McKee, C. Hyser, D.Gmach, R. Gardner, T. Christian, and L. Cherkasova:. 1000 islands: An integrated approach to resource management for virtualized data centers.Journal of Cluster Computing, 12(1):45–57, 2009.

ACKNOWLEDGEMENT

The work of Danilo Ardagna and Barbara Panicucci has been partially supported by the GAME-IT and IDEAS- ERC Project 227977-SMScom research projects. Sara Casolari ac-knowledges the support of MIUR-PRIN project DOTS-LCCI. Thanks are expressed to Prof. Michele Colajanni for his fruitful comments on the preliminary versions of this paper.