
Business-Driven Long-term Capacity Planning for SaaS Applications

David Candeia, Ricardo Araújo Santos and Raquel Lopes

Abstract—Capacity planning is one of the activities performed by Information Technology departments over the years. It aims at estimating the amount of resources needed to offer a computing service. This activity contributes to achieving high Quality of Service levels and to pursuing better economic results for companies. In the Cloud Computing context, one plausible scenario is to have Software-as-a-Service (SaaS) providers that build their IT infrastructure by acquiring resources from Infrastructure-as-a-Service (IaaS) providers. SaaS providers can reduce operational costs and complexity by buying instances from a reservation market, but they then need to predict the number of instances needed in the long term. This work investigates how important capacity planning is in this context and how simple business-driven heuristics for long-term capacity planning affect the profit achieved by SaaS providers. Simulation experiments were performed using synthetic e-commerce workloads. Our analysis shows that the proposed heuristics increase SaaS provider profit by 9.6501% per year, on average. These results demonstrate that capacity planning is still an important activity that contributes to increasing SaaS provider profit. In addition, good capacity planning may also avoid a bad reputation due to unacceptable performance, a gain that is very hard to measure.

Index Terms—Capacity Planning, Cloud Computing, Software-as-a-Service.

1 INTRODUCTION

Information Technology (IT) infrastructure management is a discipline that aims at achieving stability and control of an IT infrastructure [1]. IT management is important to meet QoS requirements and to achieve an efficient use of the infrastructure. Decisions made in this discipline affect the business bottom line of the infrastructure owner and, because of this, IT infrastructure management has evolved to consider business aspects [1].

Planning the amount of computing resources needed to deliver a computing service (i.e., application) is one of the activities of an IT infrastructure management platform, which is called capacity planning. Before Cloud Computing, capacity planning typically involved over-provisioning of the IT infrastructure [2] as a common solution to deal with workload peaks.

Cloud Computing has brought some novelties: providers offer computing services in different markets; clients can buy computing services and start them quickly. Three main types of services are commonly presented [3]: Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS); all these services are acquired in a pay-per-use manner. In this paper, we consider a scenario in which a SaaS provider is a consumer of one IaaS provider [4].

IaaS providers typically offer several types of virtual instance configurations, with different amounts of CPU units, memory and storage. Furthermore, IaaS providers usually offer virtual instances in two different markets, each one with a different charging strategy and QoS: (i) in the on-demand market, available instances of the IaaS provider are acquired with no long-term commitments by paying a usage fee for each usage charging period (typically an hour). The IaaS provider does not guarantee that the consumer will receive the amount of instances required; (ii) in the reservation market, instances can be reserved for longer future intervals (typically greater than one year) by paying an upfront reservation fee. As the IaaS consumer uses reserved instances, she pays a discounted usage fee (compared to the on-demand usage fee) for each usage charging period. The IaaS provider ensures that reserved instances will be available whenever the IaaS consumer wishes to use them within the reservation interval.

David Candeia is with Instituto Federal de Educação, Ciência e Tecnologia da Paraíba, Campina Grande, Paraíba, Brasil. E-mail: [email protected]. Raquel Lopes and Ricardo Araújo Santos are with Universidade Federal de Campina Grande - UFCG. E-mail: [email protected], [email protected]

Manuscript received May 28, 2014

Cloud Computing also brought some challenges for capacity planning. First, a capacity planner now has access to several types of instances and markets to build the IT infrastructure. With so many options, the capacity planning algorithm may achieve better results, but at the cost of higher complexity. Second, the capacity planner has to deal with the uncertainty of future workload prediction, which is typically a very hard task to accomplish. This task is especially hard for SaaS providers that have contracts that may span from one week to some months and users that may quit whenever they want, as in an open system. It is hard to predict the real usage of the system by each tenant. Finally, it is also a challenge to model the impact that capacity planning decisions may have on the business. Impacts on the QoS of the services offered may lead to service level agreement (SLA) violations, which may lead to penalties and bad reputation, or even to losing important consumers. All these aspects make capacity planning in the Cloud Computing context a non-trivial problem.

Considering the growing number of applications running in the cloud, whether academic or industrial, it is important to study capacity planning heuristics and strategies. Good capacity planning reserves resources that are expected to be well used in the reservation market, in order to reduce costs, instead of only using the more expensive resources from the on-demand market. For rarely used resources, the on-demand market might be the best choice. Also, IaaS providers ensure that reserved resources are available whenever they are needed, which helps to improve the availability of the SaaS application being offered. Capacity planners can combine resources from the on-demand and reservation markets in order to improve application QoS and reduce costs.

Fig. 1. A workload example

Imagine an application with varying resource demand, as presented in the curve of Figure 1. It is clear that if 32 instances are reserved, many instances will not be used efficiently, which results in wasted money. Certainly, the best decision is to reserve between 14 and 18 instances from the reservation market, and to acquire more instances from the on-demand market when needed. Although the rationale behind this idea is apparently very simple, in practice it is difficult to design such a good heuristic, mainly because it is hard to predict the real demand of the applications. Besides, there is the risk that requests for instances are not satisfied by the on-demand market.
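The rationale above can be illustrated with a minimal sketch. The demand series, the prices and the function name `total_cost` are hypothetical and chosen only for illustration; they are not taken from Figure 1 or from any IaaS price list.

```python
def total_cost(demand, reserved, fee, reserved_rate, on_demand_rate):
    """Cost of serving an hourly demand series with `reserved` reserved
    instances, topping up with on-demand instances when demand is higher."""
    cost = reserved * fee  # upfront reservation fees
    for needed in demand:
        used_reserved = min(needed, reserved)
        cost += used_reserved * reserved_rate               # discounted usage fee
        cost += (needed - used_reserved) * on_demand_rate   # on-demand top-up
    return cost

# Hypothetical demand: 16 instances most of the time, with a short peak of 32.
demand = [16] * 900 + [32] * 100

# Hypothetical prices (assumptions, not real IaaS prices).
costs = {
    r: total_cost(demand, r, fee=20.0, reserved_rate=0.03, on_demand_rate=0.06)
    for r in (0, 16, 32)
}
best = min(costs, key=costs.get)  # number of reserved instances with lowest cost
```

Under these assumed numbers, reserving for the base load and buying the peak on demand is cheaper than either reserving nothing or reserving for the peak.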

This work focuses on capacity planning for long future intervals (e.g., one year), evaluating heuristics that use a workload prediction to define how many instances must be acquired from the reservation market. We propose two heuristics based on concepts from the literature and compare them with four other heuristics, one of them being an optimal solution. Analysing the results, we show the importance of performing capacity planning, the improvements in the SaaS provider profit (mean values of 10.04% and 9.25%) and the small number of SLA violations achieved by the proposed heuristics.

1.1 Contributions and Structure of the Paper

The major contributions of this work are three-fold: (i) we propose a utility model to guide and evaluate capacity planning. Our model considers the pay-as-you-go aspect of Cloud Computing, the revenue of the SaaS provider and the costs related to offering a SaaS application; (ii) we propose two capacity planning heuristics that combine usual concepts from the literature, the proposed utility model, the acquisition of instances from the on-demand and reservation markets and the usage of different reservation markets. Our focus is on evaluating the combination of these points in both simple proposed heuristics. One of the heuristics also performs a simple evaluation of the possibility of using instances of different types; (iii) we compare the proposed heuristics to four reference heuristics using simulation experiments and a synthetic e-commerce workload based on real world parameters [5] [6].

The rest of this paper is organized as follows: Section 2 discusses the related work. Section 3 presents the utility model proposed. Section 4 presents the two capacity planning heuristics proposed. Section 5 presents the simulation environment used in our experiments and Section 6 contains the evaluation of proposed heuristics. Finally, Section 7 discusses conclusions and possible future research directions.

2 RELATED WORK

Capacity planning studies have followed several streams in the literature over the years, each with its particular features. Recently, the study of elasticity solutions was highlighted [7]. A first analysis can separate studies into two worlds: reactive and predictive approaches [7]. Reactive approaches act only after a condition is satisfied, while predictive approaches anticipate system load to estimate the amount of necessary resources.

Regarding workload prediction, we can distinguish long-term and short-term capacity planning solutions. On the one hand, long-term capacity planning [8], [9], [10], [11] uses workload predictions for a long future interval of time (e.g., a year) and estimates the amount of computing resources needed to deliver the application during such interval. In this work, we assume that long-term capacity planning results can guide the acquisition of instances at an IaaS provider reservation market. On the other hand, short-term capacity planning [12], [13], [14], [15], [16] uses workload predictions for managing resources for a short future interval of time (e.g., an hour), optimising the amount that will be acquired or released.

The estimation of the amount of necessary resources can be based on operational metrics or business metrics. Capacity planning based on operational metrics focuses on meeting operational metric targets. These targets can be defined in terms of availability [17], response time [18], CPU usage, power consumption [7] or a combination of such metrics [19].

Using only operational metrics to plan an infrastructure, without considering metrics such as cost and revenue, can lead to an infrastructure configuration that meets operational targets but is economically infeasible. In order to deal with this problem, business-driven IT management solutions [1] aim at combining operational and business metrics in the decision-making. Capacity planning solutions based on business metrics can consider infrastructure costs [20] [21] [14] [15] [10] [11] [16], loss incurred due to consumer defection [8] [9], SLA violations [8] [9] [7] and business profit/revenue [22] [23] [24] [25] [26] when planning IT infrastructures. As higher-level models, business models enable the merging of operational metrics, such as power consumption, with business metrics [26].

Our study resembles other capacity planning studies as we consider business metrics in the capacity planning. However, some aspects distinguish our study from the others: (i) we modeled the profit of a real SaaS provider's business using a linear utility function that combines the revenue for the different plans with the different penalties incurred due to SLA violations; (ii) to the best of our knowledge, there are no studies that combine profit evaluation and the capacity planning of SaaS providers using instances acquired from different markets of an IaaS provider; (iii) the proposed heuristics combine this utility model with concepts of resource utilization and Queueing Theory to produce good results. So, our work can be seen as complementary to previous studies.

3 UTILITY MODEL

Utility [27] is a microeconomics concept used to state the preference of agents (i.e., service providers and their consumers), with higher values typically stating greater preference. Agents use such preferences to guide their behavior: they attempt to achieve the outcomes they most prefer. A utility function maps a space of outcomes onto utility values. A utility function can combine different aspects and metrics, simplifying the evaluation and choice process done by agents.

The proposed utility model maps the profit of a SaaS provider, obtained as a result of offering an application, onto a utility value. A capacity planning agent can use this model to build a capacity plan that maximizes the utility value, which translates to maximizing the SaaS provider profit. Our utility model combines three main components: (i) the revenue obtained from charging consumers that use the SaaS application; (ii) the cost of buying instances from an IaaS provider; (iii) penalties related to SLA violations.

3.1 Revenue Model

The utility model proposed considers that SaaS providers can offer one or many plans to their consumers, so that each consumer chooses the plan that best fits its need and contracts it in order to use the SaaS application. As a result of evaluating current SaaS providers, the revenue model developed aims at covering the main aspects discovered: (i) SaaS consumers are typically charged periodically (i.e., per month or year); (ii) each application has its usage restrictions specified in the plans offered by the provider; (iii) a contract established between a SaaS provider and a SaaS consumer defines the provider's reimbursement rules.

A SaaS provider develops and offers an application $A$ to a set $U$ of SaaS consumers, $U = \{u_1, u_2, \ldots, u_{|U|}\}$. In order to offer this application, the provider builds a portfolio of plans $P = \{p_1, p_2, \ldots, p_{|P|}\}$, where each plan $p_j$ aims at meeting the demand of a specific class of consumers, so it is expected that $|P| < |U|$. Each consumer $u_k \in U$ chooses and signs a contract related to a plan $p_j \in P$ in order to use application $A$.

After signing the contract of plan $p_j$, a consumer $u_k$ can use application $A$ for an interval $[n_k^b, n_k^e]$, where $n_k^e \geq n_k^b$ (for example, if a plan $p_j$ is semiannual then $n_k^e - n_k^b = 6$ months). For simplicity, all plans offered by a SaaS provider are accounted by using the same fixed usage period duration (e.g., one month) and period $n$ with value 1 marks the launch of the application. Also, we consider that new consumers can only enter the system just before a new usage period $n$ starts. As time passes, $n$ is incremented to indicate the current usage period.

By the time $u_k$ signs the contract (i.e., in $n_k^b$), the SaaS provider must configure and deploy application $A$ to serve this specific consumer. To this end, the SaaS consumer $u_k$ can be requested to pay a configuration fee $I_j^b$. This fee depends on the plan $p_j$ contracted. During the term of $p_j$, as well as in sequential intervals in which the consumer renews the contract, the configuration fee $I_j^b$ is not charged again. The provider will only charge this fee again if the consumer changes the contracted plan. The function $i_k^b : \mathbb{N}^+ \to \{0, I_j^b\}$ given by

$$i_k^b(n) = \begin{cases} I_j^b & \text{if } n = n_k^b \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

defines whether consumer $u_k$ must pay a configuration fee in a usage period $n$.

In order to use application $A$ during each usage period $n$, the consumer $u_k$ must pay a usage fee $I_j$ to the SaaS provider. This fee should be enough to cover the SaaS provider's costs of acquiring the resources necessary to offer application $A$ to consumer $u_k$. The fee $I_j$ depends on the plan $p_j$ contracted by consumer $u_k$. Regardless of whether the consumer remains in her plan or changes to a new one, the fee $I_j$ is always charged. The function $i_k^{us} : \mathbb{N}^+ \to \{0, I_j\}$ given by

$$i_k^{us}(n) = \begin{cases} I_j + e_{j,n} & \text{if } n_k^b \leq n \leq n_k^e \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

defines the usage fee paid by a consumer $u_k$ in a usage period $n$.

Each plan $p_j$ defines the set of resources that can be used by a SaaS consumer while using application $A$. For example, it is common that during a usage period $n$ a consumer $u_k$ uses a certain amount of storage and transfers a certain amount of data over the network. Each plan $p_j$ defines limits for each computing resource a consumer can use during a usage period $n$. If during a usage period $n$ a consumer $u_k$ exceeds such limits, the SaaS provider charges $u_k$ an extra fee $e_{j,n}$. This extra fee is proportional to the amount of extra resources used by the consumer.

Finally, a plan $p_j$ is associated with a service level agreement (SLA), represented here as $SLA_j$. For simplicity, an SLA is defined by the tuple $\langle A^{MIN}, T^{MAX} \rangle$, whose values must apply for each usage period $n$. $A^{MIN}$ represents the lowest required availability for application $A$ and $T^{MAX}$ represents a response time percentile accepted for request processing (e.g., 95% of requests must be processed within 8 seconds). According to SaaS provider business evaluations, it may be feasible to define an SLA for each plan $p_j$ in order to offer a higher quality of service to plans that contribute more to the business. Furthermore, if the SaaS provider violates $SLA_j$, a function $M_j(n)$ indicates, for a certain usage period $n$, the penalty that the provider must pay to the corresponding consumer. The value of the penalty is proportional to the intensity of the violation and may be defined differently for each plan offered. Penalty payments are included in the cost model presented in Section 3.2.

Given the above aspects of plans, it is necessary to define how the SaaS provider charges each consumer $u_k$. During a usage period $n$ the SaaS provider must account for the resource consumption of each consumer $u_k$. With this accounting, the provider calculates the amount of resources used inside the plan limits and the amount of extra resources used.

The revenue obtained by a SaaS provider from the payment of a consumer $u_k$, in any period $n$, is given as a combination of usage and configuration fees:

$$i_k(n) = i_k^b(n) + i_k^{us}(n) \tag{3}$$

Evaluating the set $U$ of consumers that contracted application $A$, we can calculate the total revenue obtained by a SaaS provider in a usage period $n$ as:

$$i(n) = \sum_{k=1,\, u_k \in U}^{|U|} i_k(n) \tag{4}$$

The revenue obtained by a SaaS provider during an interval $D$ of usage periods, where $D = [n^b, n^e]$ and $n^e \geq n^b$, is given by the function $\iota : [n^b, n^e] \to \mathbb{R}^+$, where $n^b, n^e \in \mathbb{N}^+$:

$$\iota(D) = \sum_{n=n^b}^{n^e} i(n) \tag{5}$$

We can use the revenue model presented above to calculate the revenue obtained by a SaaS provider in a past interval $D$, or even to estimate the revenue in a future interval $D$. In the latter case, it is necessary to characterize the future set $P$ of SaaS provider plans and to estimate the future workload that will be submitted by a set $U$ of estimated future consumers.
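As an illustration, Equations (1) to (5) can be sketched in a few lines of Python. The `Contract` class, its field names and the sample values are hypothetical; they only mirror the symbols of the revenue model and are not part of the original work.

```python
from dataclasses import dataclass

@dataclass
class Contract:
    """One consumer u_k bound to a plan p_j (illustrative structure)."""
    config_fee: float   # I_j^b
    usage_fee: float    # I_j
    start: int          # n_k^b
    end: int            # n_k^e
    extra_fees: dict    # usage period n -> extra fee e_{j,n}

def consumer_revenue(c, n):
    setup = c.config_fee if n == c.start else 0.0                  # Eq. (1)
    if c.start <= n <= c.end:
        usage = c.usage_fee + c.extra_fees.get(n, 0.0)             # Eq. (2)
    else:
        usage = 0.0
    return setup + usage                                           # Eq. (3)

def period_revenue(contracts, n):
    return sum(consumer_revenue(c, n) for c in contracts)          # Eq. (4)

def interval_revenue(contracts, n_b, n_e):
    return sum(period_revenue(contracts, n)
               for n in range(n_b, n_e + 1))                       # Eq. (5)

# Hypothetical consumers: one pays a configuration fee and an extra fee in period 2.
contracts = [
    Contract(config_fee=10.0, usage_fee=5.0, start=1, end=3, extra_fees={2: 1.5}),
    Contract(config_fee=0.0, usage_fee=8.0, start=2, end=4, extra_fees={}),
]
revenue = interval_revenue(contracts, 1, 4)
```

The same functions serve both uses mentioned above: fed with recorded contracts they compute past revenue, and fed with estimated contracts they give a revenue forecast.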

3.2 Cost Model

As a SaaS provider acquires computing resources from an IaaS provider to build its IT infrastructure, we consider that the following costs exist: (i) costs related to using acquired resources; (ii) costs related to reserving resources. Besides these costs, a SaaS provider spends money as a result of SLA violations, which may lead to the payment of penalties to SaaS consumers.

Each IaaS provider offers a set $O$ of resource classes. These resource classes can be, for example: (i) virtual instances; (ii) storage resources; (iii) data transfer resources. Each resource class $o \in O$ defines a set $S^o$ of resource types offered in this class. Each resource type $s \in S^o$ is associated with a usage cost $c_s$, charged per minimal charge unit of the type. For example, considering the Amazon EC2¹ service, a small instance ($s = small$ and $o = virtual\ instances$) has a usage cost $c_{small} = \$0.06$² for each hour of usage. We consider that all resources from the same class $o$ are charged according to the same minimal charge unit (i.e., for each hour).

The IaaS provider has an accounting system that registers, for each period $n$ and for each SaaS provider, the amount of resources used, as well as their types. This system is then queried to report total resource consumption. For each resource type $s$ and each period $n$, counters $a_s^n$ are incremented every time the SaaS provider uses a resource of type $s$ within period $n$. For example, suppose that during the first accounting period, $n = 1$, a large instance ($s = large$) has been used for 10 hours. In this scenario, the counter $a_{large}^1$ would be 10.

The cost of the SaaS provider associated with IaaS resource usage in a period $n$, whether obtained in the on-demand or reservation markets, is defined by the function $c_a : \mathbb{N}^+ \to \mathbb{R}^+$ given by:

$$c_a(n) = \sum_{o \in O} \left[ \sum_{s \in S^o} a_s^n \cdot c_s \right] \tag{6}$$

Even the use of reserved resources can be accounted for in the equation above, since those resources are related to a type $s$ and a usage cost $c_s$ representing the fees practiced in the reservation market.
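Equation (6) amounts to a weighted sum over the usage counters. The sketch below is a minimal illustration; the nested-dictionary layout and the unit costs are assumptions made for the example, not values from an IaaS provider.

```python
def usage_cost(counters, unit_costs):
    """Eq. (6) for one period n.

    counters[o][s] holds the counter a_s^n of resource type s in class o;
    unit_costs[s] holds the usage cost c_s per minimal charge unit.
    """
    return sum(amount * unit_costs[s]
               for types in counters.values()
               for s, amount in types.items())

# A large instance used for 10 hours, as in the example above, plus
# 100 hours of small instances. Unit costs are hypothetical.
counters = {"virtual instances": {"large": 10, "small": 100}}
unit_costs = {"large": 0.25, "small": 0.0625}
cost = usage_cost(counters, unit_costs)
```

Reserved instances fit the same function if they are registered as separate types (for example a hypothetical `small_reserved`) with their discounted usage cost.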

Besides the costs related to resource usage, the SaaS provider has another cost related to the act of reserving resources in advance from the IaaS provider. A reservation of resources of type $s \in S^o$ is always associated with an amount of resources reserved ($r_s$), an upfront reservation fee ($f_s$) and the interval in which such resources will be available for use. Thus, we can define a reservation contract as:

$$V = \langle o, s, r_s, f_s, n_s^b, n_s^e \rangle$$

where $n_s^b$ and $n_s^e$ indicate, respectively, the period at which resources become available and the time limit to use such resources. It is noteworthy that the interval $[n_s^b, n_s^e]$ should be defined considering the periods in which the SaaS provider will be using resources to offer its application. The set $\gamma$ represents the set of reservation contracts established between the SaaS provider and the IaaS provider. It is important to remember that current IaaS providers only offer the possibility of reserving processing resources (i.e., virtual instances), but the cost model proposed here is flexible enough to consider other classes of resources that may become available for reservation in the future.

¹ http://aws.amazon.com/ec2/previous-generation/
² Amazon EC2 service values in 2013.

Upfront reservation fees paid by the SaaS provider can be amortized over the interval $[n_s^b, n_s^e]$. Thus, each period $n$ has a cost component related to the amortization of the reservation contracts defined in $\gamma$. This cost component can be calculated using the function $c_v : \mathbb{N}^+ \to \mathbb{R}^+$ given by:

$$c_v(n) = \begin{cases} \sum_{\langle o, s, r_s, f_s, n_s^b, n_s^e \rangle \in \gamma} \dfrac{f_s \cdot r_s}{n_s^e - n_s^b} & \text{if } n_s^b \leq n \leq n_s^e \\ 0 & \text{otherwise} \end{cases} \tag{7}$$

Having defined the cost components related to resource usage, $c_a(n)$, and resource reservations, $c_v(n)$, the total cost of a SaaS provider in a period $n$ can be calculated using the function $c : \mathbb{N}^+ \to \mathbb{R}^+$ given by:

$$c(n) = c_a(n) + c_v(n) + p(n) \tag{8}$$

where $p(n) = \sum_{u_k \in U} M_j(n)$ represents all penalties paid by a SaaS provider to its consumers in a period $n$. A provider must pay a penalty to a consumer $u_k$ whenever $SLA_j$, established in the contracted plan $p_j$, is violated. An SLA violation, as mentioned in Section 3.1, is related to availability or response time violations, according to the restrictions established in the plan $p_j$ contracted by the consumer. The function $M_j(n)$ can be used to model different penalty values according to violation intervals or to model single penalty values.
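The amortized reservation cost and the total cost per period can be sketched as follows. The `Reservation` tuple mirrors the contract $V$; the function names and the sample values are hypothetical. The sketch applies the interval condition to each contract individually, which is one reading of Equation (7), and assumes $n_s^e > n_s^b$ so that the division is defined.

```python
from collections import namedtuple

# Mirrors V = <o, s, r_s, f_s, n_s^b, n_s^e>; field names are illustrative.
Reservation = namedtuple("Reservation", "resource_class type amount fee start end")

def reservation_cost(contracts, n):
    """Amortized reservation fees charged to period n, following Eq. (7)."""
    total = 0.0
    for v in contracts:
        if v.start <= n <= v.end:
            # Assumes v.end > v.start, as the formula divides by n_s^e - n_s^b.
            total += v.fee * v.amount / (v.end - v.start)
    return total

def period_cost(usage, contracts, penalties, n):
    """Eq. (8): usage cost c_a(n) + reservation cost c_v(n) + penalties p(n)."""
    return usage + reservation_cost(contracts, n) + sum(penalties)

# Hypothetical contract: 10 small instances, fee of 120.0 each, periods 1 to 13.
v = Reservation("virtual instances", "small", 10, 120.0, 1, 13)
cost = period_cost(usage=50.0, contracts=[v], penalties=[5.0, 2.5], n=5)
```

Here `penalties` stands for the values $M_j(n)$ already evaluated for the consumers whose SLA was violated in period $n$.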

Finally, it is possible to evaluate the total cost of a SaaS provider in an interval $D$, where $D = [n^b, n^e]$ and $n^e \geq n^b$, using the function $\alpha : [n^b, n^e] \to \mathbb{R}^+$, where $n^b, n^e \in \mathbb{N}^+$:

$$\alpha(D) = \sum_{n=n^b}^{n^e} c(n) \tag{9}$$

We can use the cost model presented here to calculate a SaaS provider cost in a past interval $D$, or even to estimate the cost in a future interval $D$. In the latter case, it is necessary to estimate the future workload and the usage of resources from an IaaS provider in each period $n$.

3.3 Utility Model

The utility function³ proposed in this work is defined in terms of the profit achieved by a SaaS provider. Once the revenue model and the cost model are defined, the utility function of a SaaS provider in an interval $D$, where $D = [n^b, n^e]$ and $n^e \geq n^b$, is defined by the function $\upsilon : [n^b, n^e] \to \mathbb{R}$, where $n^b, n^e \in \mathbb{N}^+$:

$$\upsilon(D) = \iota(D) - \alpha(D) \tag{10}$$

We can use this utility function to evaluate the utility obtained by a SaaS provider in a past interval D, and also to estimate the future utility of a SaaS provider (in a business-driven capacity planning). During the capacity planning, the function υ allows an agent to estimate SaaS provider utility over a set of possible capacity plans and select the most beneficial plan to the business.
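The selection step can be sketched as follows, assuming that per-period revenue and cost estimates are already available for each candidate plan. The plan names and the numbers are hypothetical.

```python
def utility(revenue_per_period, cost_per_period):
    """Eq. (10): utility over interval D as revenue minus cost."""
    return sum(revenue_per_period) - sum(cost_per_period)

def best_plan(candidates):
    """candidates maps a plan name to (revenues, costs) estimated over D."""
    return max(candidates, key=lambda name: utility(*candidates[name]))

# Two hypothetical capacity plans evaluated over a two-period interval.
candidates = {
    "reserve_10": ([100, 100], [60, 70]),
    "reserve_20": ([100, 100], [80, 80]),
}
chosen = best_plan(candidates)
```

Because penalties are part of the cost term, a plan that under-provisions is penalized through higher cost rather than through a separate constraint.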

4 CAPACITY PLANNING HEURISTICS

We propose two capacity planning heuristics: (i) one based on instance utilization (UT); and (ii) one based on Queueing Theory (QN). Both heuristics receive as input the prediction of a future workload for a time interval $D$, where $D$ is the interval being planned. This prediction can be obtained from historical data of the SaaS application execution. Both heuristics consider more than one reservation market, each market offering a better cost according to reserved instance usage. Both heuristics use the utility model and the workload prediction to produce a capacity plan indicating the amount and type of instances to reserve in each reservation market.

4.1 Heuristic based on Instance Utilization - UT

This heuristic focuses on evaluating a trace of instance usage. The trace indicates the amount of instances used and the corresponding amount of hours during which these instances were used (e.g., a SaaS provider used 19 instances of type small for 1000 hours and 5 instances of type small for 1500 hours). This trace must be consistent with the future workload prediction. UT uses this trace as input to Algorithm 1.

If a trace of instance usage does not exist, one can be produced by simulating the processing of the predicted workload. We consider that this simulation uses a workload prediction composed of request arrival times and processing demands. The processing demand estimation considers a base instance. In the simulation, a Dynamic Provisioning System (DPS) periodically (i.e., hourly) acquires on-demand instances from an IaaS provider. The simulated DPS is based on the behavior of a real DPS. We assume that the DPS chooses the correct type and amount of instances in order to meet the SLAs established between the SaaS provider and its clients. After simulation, for example, a trace may indicate that 20 instances were used for 300 hours and 30 instances were used for 5000 hours.

³ A more detailed version of the utility model can be found at http://www.lsd.ufcg.edu.br/~davidcmm/utilitymodel

Once we have a trace of instance usage, we have to adapt it for Algorithm 1. We consider that if 2 instances were used for 20 hours and 3 instances were used for 20 hours, in fact, 2 instances were used for 40 hours. When 3 instances are acquired to process the workload, we consider that we can keep the 2 instances previously acquired and add another instance to meet the workload demand. In this example, Algorithm 1 would receive two tuples indicating instance usage: ⟨40, 2⟩, indicating that 2 instances were used for 40 hours, and ⟨20, 3⟩, indicating that 3 instances were used for 20 hours.
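This adaptation is a cumulative sum over the trace. In the minimal sketch below, the input layout (a dictionary from amount of instances to hours) and the name `adapt_trace` are assumptions made for illustration.

```python
def adapt_trace(raw):
    """Turn {amount_of_instances: hours} into (usage, amount) tuples.

    Hours observed with a larger amount of instances also count as usage
    of every smaller amount, since the smaller set of instances is kept.
    Tuples are returned in ascending order of amount.
    """
    adapted = []
    for amount in sorted(raw):
        hours = sum(h for a, h in raw.items() if a >= amount)
        adapted.append((hours, amount))
    return adapted

# The example from the text: 2 instances for 20 hours, 3 instances for 20 hours.
tuples = adapt_trace({2: 20, 3: 20})
```

For the example in the text, `tuples` holds the pairs (40, 2) and (20, 3), matching the two tuples passed to Algorithm 1.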

UT uses the cost model proposed in Section 3.2 and instance usage to plan the infrastructure. For each instance type $s$ and each reservation market, UT calculates the minimal utilization rate that makes a reserved instance cheaper than an on-demand instance (line 3). A utilization rate represents the percentage of a time interval (e.g., 50% of the reservation interval) in which the instance is used. This rate is calculated based on the usage costs ($c_s$) of the on-demand and reservation markets and on the reservation fee ($f_s$). For example, UT can find that a small instance should be used for at least 50% of the reservation interval in $reservationMarket_1$ in order to be cheaper than an on-demand small instance. Also, UT can find that a small instance should have a utilization rate of at least 70% in $reservationMarket_2$. Using this information, UT sorts reservation markets from the one with the lowest minimal utilization rate to the one with the highest minimal utilization rate (line 4).
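The text does not give the exact formula for the minimal utilization rate. One plausible break-even derivation, offered here as an assumption, equates the cost of a reserved instance (upfront fee plus discounted usage) with the cost of an on-demand instance over the same interval. All names and values below are hypothetical.

```python
def minimal_utilization(fee, on_demand_rate, reserved_rate, interval_hours):
    """Utilization rate u at which reserving breaks even with on-demand.

    Assumed break-even condition:
        fee + reserved_rate * u * H = on_demand_rate * u * H
    which gives u = fee / ((on_demand_rate - reserved_rate) * H).
    """
    saving_per_hour = on_demand_rate - reserved_rate
    if saving_per_hour <= 0:
        return float("inf")  # reserving can never pay off
    return fee / (saving_per_hour * interval_hours)

# Hypothetical market: fee 125.0, hourly rates 0.5 (on-demand) and 0.25 (reserved),
# planning interval of 1000 hours.
rate = minimal_utilization(125.0, 0.5, 0.25, 1000)
```

With these assumed prices the instance must be busy for at least half of the interval before reserving becomes the cheaper option; a rate above 1 would mean the market never pays off within the interval.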

In the next step, UT calculates, for each instance type $s$ and amount of instances used ($amount$), obtained from the trace, the average utilization rate of such instances (line 7). UT looks for the reservation market with the greatest minimal utilization rate that is lower than or equal to the average instance utilization rate (lines 8 to 12). This market is selected as $bestMarket$ (line 10) and will be used to reserve instances. For example, suppose that $reservationMarket_1$ has a minimal utilization rate of 50% and $reservationMarket_2$ has a minimal utilization rate of 70%. Also, suppose that the average utilization rate for 10 instances of type small is 90%. Analysing such values, UT reserves these instances in $reservationMarket_2$ for the interval being planned.

After choosing the market that offers the best costs for reserving $amount$ instances of type $s$, UT evaluates the amount of instances to reserve. Instances of the same type can be reserved in different reservation markets. To consider this, UT calculates the amount of instances of type $s$ to reserve in $bestMarket$ as the difference between the current amount of instances being evaluated ($amount$) and the total amount of instances of type $s$ already reserved (lines 13 and 14). After evaluating the whole set of instance usage data, UT has a capacity plan ($capacityPlan$) containing the type and amount of instances to reserve in each reservation market (line 17).

Algorithm 1. UT reservation algorithm.

 1: function UTRESERVATION
    Input: Sets (cons_s) containing tuples ⟨usage, amount⟩ indicating the amount of hours used by each amount of instances of type s acquired. Tuples are sorted in ascending order of amount of instances used.
    Input: A set (reservationMarkets) containing the reservation markets that can be used to reserve instances.
    Output: UT returns a capacity plan (capacityPlan) containing the type, amount of instances to be reserved and reservation markets to be used.
 2:   for all s in type_1, type_2, ..., type_n do
 3:     Calculate minimal utilization rate (minimalUtilization_s^market) for each reservation market in reservationMarkets
 4:     Sort reservationMarkets, in ascending order, according to minimalUtilization_s^market
 5:     totalReserved ← 0
 6:     for all ⟨usage, amount⟩ in cons_s do
 7:       instancesUtilization ← usage / (planning interval length in hours)
 8:       for all market in reservationMarkets do
 9:         if instancesUtilization ≥ minimalUtilization_s^market then
10:           bestMarket ← market
11:         end if
12:       end for
13:       capacityPlan[bestMarket][s] += amount − totalReserved
14:       totalReserved += amount − totalReserved
15:     end for
16:   end for
17:   return capacityPlan
18: end function
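A possible Python rendering of Algorithm 1 is sketched below. It assumes the minimal utilization rates have already been computed (line 3) and are passed in as a dictionary. The pseudocode does not say what happens when no market qualifies; the sketch assumes that such instances are left to the on-demand market. Function and parameter names are illustrative.

```python
def ut_reservation(consumption, min_utilization, interval_hours):
    """Sketch of Algorithm 1.

    consumption: {type: [(usage_hours, amount), ...]} sorted by amount.
    min_utilization: {type: {market: minimal utilization rate}} (line 3).
    Returns {market: {type: amount of instances to reserve}}.
    """
    plan = {}
    for s, tuples in consumption.items():
        # Markets from lowest to highest minimal utilization rate (line 4).
        markets = sorted(min_utilization[s], key=min_utilization[s].get)
        total_reserved = 0                                    # line 5
        for usage, amount in tuples:                          # line 6
            utilization = usage / interval_hours              # line 7
            best = None
            for market in markets:                            # lines 8 to 12
                if utilization >= min_utilization[s][market]:
                    best = market
            if best is None:
                continue  # assumption: no reservation pays off, use on-demand
            extra = amount - total_reserved                   # lines 13 and 14
            plan.setdefault(best, {}).setdefault(s, 0)
            plan[best][s] += extra
            total_reserved += extra
    return plan

# Hypothetical input: one-year interval (8760 hours), two reservation markets.
plan = ut_reservation(
    {"small": [(8000, 5), (5000, 10), (1000, 19)]},
    {"small": {"market1": 0.5, "market2": 0.7}},
    8760,
)
```

In this hypothetical run, the 5 most used instances go to the market with the 70% threshold, the next 5 go to the market with the 50% threshold, and the remaining 9 are not reserved.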

4.2 Heuristic based on Queue Networks - QN

The QN heuristic uses Queueing Theory concepts [28] such as mean arrival rate and mean service time. Such concepts are used to model the IT infrastructure that will be used to process the workload. We consider that instances are used to process requests and that queues are formed according to the workload submitted. The steps of the algorithm are presented in Algorithm 2.

Instead of using information about each request to be submitted in the predicted workload, QN uses a workload summary. This summary contains estimations for each hour of the future workload. Each hourly estimation is composed of: request mean arrival rate ($\bar{\lambda}$); request mean service time ($\bar{S}$); mean number of users ($N$); user mean think time ($Z$); instance utilization rate target ($\rho$). The utilization rate target $\rho$ represents the maximum utilization expected for an instance, for example, a maximum utilization of 70%. The workload summary can be estimated considering historical workload traces and workload growth estimates.

Using the workload summary (especially λ̄ and S̄), QN estimates the total CPU demand (T) needed to process the future workload (line 2). QN also uses the cost model proposed in Section 3.2 to find the minimal utilization rate (minimalUtilization_s^market) that makes a reserved instance cheaper than an on-demand instance. To do this, QN combines the usage costs (c_s) of the on-demand and reservation markets with the reservation fee (f_s) to find minimalUtilization_s^market (line 5).

Using T and minimalUtilization_s^market, QN calculates the largest number of reserved instances of type s that could be used to process the workload (lines 6 to 10). This value limits the amount of instances in the plans that will be evaluated. For example, suppose that for a certain workload MAX_small = 10 and MAX_large = 3. QN evaluates all 44 capacity plans resulting from combinations containing from 0 to 10 small instances and from 0 to 3 large instances (11 × 4 = 44).
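The demand estimate and the enumeration of candidate plans can be illustrated with a short sketch. The function names and the dictionary keys ("S", "lam") are hypothetical, and the units are an assumption: service time and arrival rate are taken to be expressed so that their product is CPU-hours per hour.

```python
import math
from itertools import product


def total_cpu_demand(summary):
    """T: sum over hours of mean service time times mean arrival rate (line 2)."""
    return sum(hour["S"] * hour["lam"] for hour in summary)


def max_reserved(total_demand, min_utilization, interval_hours):
    """Upper bound on reserved instances for one type and market (line 6)."""
    return math.floor(total_demand / (min_utilization * interval_hours))


def candidate_plans(max_per_type):
    """All combinations with 0..MAX_s instances of each type s (line 12)."""
    types = list(max_per_type)
    ranges = [range(max_per_type[t] + 1) for t in types]
    return [dict(zip(types, combo)) for combo in product(*ranges)]


# The example from the text: MAX_small = 10 and MAX_large = 3.
plans = candidate_plans({"small": 10, "large": 3})
```

The number of candidate plans is the product of (MAX_s + 1) over all types, so it grows quickly as more instance types are considered.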

After choosing the plans to evaluate, QN estimates the utility of each of these plans (lines 14 to 33). For each hour of the predicted workload, QN determines the amount of instances to be used from the on-demand and reservation markets. First, QN distributes arriving requests (according to the mean arrival rate λ̄) among reserved instances, calculating the amount of incoming requests that can be processed without violating the response time established in the SLA and without exceeding the instance utilization rate target ρ (lines 17 to 20).

If not all requests can be processed using reserved instances, QN assumes that on-demand instances can be acquired. The throughput of these instances is used to find the amount of on-demand instances needed to process the remaining requests (lines 21 and 22). QN also considers a risk (line 23) that the on-demand market denies service (i.e., cannot provide the amount of on-demand instances needed). In this case, some requests are not processed and the SLA might be violated (line 24). We consider that the on-demand market can deny service for two reasons: (i) the SaaS provider has reached the limit of instances that can be acquired from the IaaS provider; (ii) the IaaS provider does not have enough instances to offer⁴.
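The hourly split between reserved and on-demand capacity described above can be sketched as follows. This is an illustration under simplifying assumptions, not the paper's model: it applies only the utilization target ρ (the SLA response time check is omitted), it measures capacity in virtual CPUs, and it treats the denial risk as the expected fraction of overflow requests left unserved. All names are hypothetical.

```python
import math


def split_hour(lam, service_time, reserved_cpus, rho, denial_risk):
    """lam: mean arrivals per second; service_time: mean CPU seconds per request.
    Returns (rate served by reserved CPUs, on-demand CPUs needed,
    expected rate of requests left unserved)."""
    # A CPU kept at utilization rho completes rho / service_time requests per second.
    reserved_rate = min(lam, reserved_cpus * rho / service_time)
    remaining = lam - reserved_rate
    on_demand_cpus = math.ceil(remaining * service_time / rho) if remaining > 0 else 0
    expected_unserved = remaining * denial_risk
    return reserved_rate, on_demand_cpus, expected_unserved


# Hypothetical hour: 10 req/s, 0.25 s per request, 2 reserved CPUs, rho = 50%.
example_split = split_hour(10.0, 0.25, 2, 0.5, 0.1)
```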

After estimating the amount of instances to be used from the on-demand and reservation markets, QN looks for the best reservation market to buy such instances (lines 26 to 28). QN calculates the cost of acquiring reserved instances in each reservation market using the cost model proposed in Section 3.2. Then, QN chooses the market that gives the lowest cost.

Finally, QN calculates an estimated utility value for the capacity plan being evaluated using the utility model presented in Section 3.3 (line 29). After estimating the utility of this plan, QN checks whether it is the plan with the greatest utility so far (lines 30 to 32) in order to return it as the plan to be used in the infrastructure (line 34).

⁴ Such risk of instance denial is real for current IaaS players, as can be seen for example at Amazon AWS: http://aws.amazon.com/ec2/purchasing-options/

Algorithm 2. QN reservation algorithm.

1: function QNRESERVATION
Input: A set (reservationMarkets) containing the reservation markets that can be used to reserve instances
Input: A summary (predictedWorkload) of the predicted workload for an interval D containing, for each hour of the predicted workload: λ̄, S̄, N, Z
Input: Instances utilization target: ρ
Output: QN returns a capacity plan (capacityPlan) containing the type, amount of instances to be reserved and reservation markets to be used.
2:  T = Σ_{m=1}^{planning interval hours} S̄_m ∗ λ̄_m
3:  for all market in reservationMarkets do
4:    for all s in type_1, type_2, ..., type_n do
5:      Calculates minimalUtilization_s^market based on c_s^on-demand, c_s^market, f_s
6:      demand_s^market ← ⌊T / (minimalUtilization_s^market ∗ planning interval hours)⌋
7:    end for
8:    if demand_s^market ≥ MAX_s then
9:      MAX_s ← demand_s^market
10:   end if
11: end for
12: possiblePlans ← builds all possible capacity plans with amount of instances from 0 to MAX_s
13: capacityPlan ← null
14: for all plan ∈ possiblePlans do
15:   utility[plan] ← 0
16:   for all hour in predictedWorkload do
17:     for all s in type_1, type_2, ..., type_n do
18:       resReq_s ← the amount of requests processed using reserved instances (instance utilization limited to ρ)
19:       reservedHours_s += ⌈resReq ∗ S̄_m⌉
20:     end for
21:     onDemReq ← the amount of requests processed using on-demand instances
22:     onDemandHours += ⌈onDemReq ∗ S̄_m⌉
23:     notProcessed += the amount of requests not processed in current hour
24:     violations += the amount of requests that violated the SLA in current hour
25:   end for
26:   for all s in type_1, type_2, ..., type_n do
27:     Choose market ∈ markets that gives the lowest cost for reservedHours_s
28:   end for
29:   utility[plan] = estimateReceipt() − estimateCost(reservedHours, onDemandHours) − estimatePenalties(notProcessed, violations)
30:   if utility[plan] ≥ utility[capacityPlan] then
31:     capacityPlan ← plan
32:   end if
33: end for
34: return capacityPlan
35: end function
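The selection step of Algorithm 2 (lines 13 to 34) amounts to keeping the plan with the highest estimated utility. The sketch below shows that step in isolation. The utility estimator is passed in as a function because the utility model of Section 3.3 is not reproduced here. Starting the comparison from negative infinity is an assumption that stands in for the comparison against the initial null plan on line 30.

```python
def best_plan(plans, estimate_utility):
    """Return (plan, utility) for the plan with the highest estimated utility.
    Ties go to the later plan, mirroring the >= test on line 30."""
    chosen, chosen_utility = None, float("-inf")
    for plan in plans:
        utility = estimate_utility(plan)
        if utility >= chosen_utility:
            chosen, chosen_utility = plan, utility
    return chosen, chosen_utility


# Hypothetical utility: revenue of 100, cost of 7 per reserved instance,
# and a penalty of 20 per instance missing below a need of 3.
def toy_utility(plan):
    reserved = plan["small"]
    return 100 - 7 * reserved - 20 * max(0, 3 - reserved)


toy_choice = best_plan([{"small": n} for n in range(6)], toy_utility)
```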

5 SIMULATOR

5.1 Simulation Model

The proposed heuristics were evaluated through simulation experiments. Existing simulators, such as CloudSim⁵, were not used for two reasons: (i) the difficulty of adapting them to support the utility model proposed (Section 3); (ii) the amount of details that are not the focus of this work (e.g., virtual machine allocation models and energy consumption models). Instead, we developed an extension⁶ of the SaaSim framework⁷ [29] considering Verification & Validation techniques proposed by [30].

Our simulation model considers a SaaS provider offering one application to its consumers. The SaaS provider IT infrastructure is composed of virtual instances acquired from an IaaS provider. The simulation has two main phases: (i) capacity planning, in which one heuristic is used to build a capacity plan; and (ii) workload execution, in which the workload is processed using the instances reserved in the capacity planning phase and extra on-demand instances that might be acquired.

In the first phase, a capacity planning heuristic considers a workload prediction for a future interval D to build the capacity plan. A perfect workload predictor is unlikely to be available in a real scenario, so to model the predictor precision we consider a prediction error related to the amount of SaaS clients submitting requests to a SaaS provider. For example, an error of 10% means that if the real workload is composed of 100 clients, the predictor estimates a workload composed of 110 clients. On the other hand, an error of −10% means that if the real workload is composed of 100 clients, the predictor estimates a workload composed of 90 clients.
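The prediction error described above scales the number of clients seen by the predictor. A one-line sketch (the function name is hypothetical, and rounding to a whole number of clients is an assumption) reproduces the two examples from the text:

```python
def predicted_clients(real_clients, error):
    """error is a signed fraction: 0.10 overestimates by 10%, -0.10 underestimates."""
    return round(real_clients * (1 + error))


over = predicted_clients(100, 0.10)    # 110 clients
under = predicted_clients(100, -0.10)  # 90 clients
```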

The second phase aims at evaluating the capacity planning performed in the first phase. We simulate workload processing using reserved and on-demand instances acquired from an IaaS provider. At the end of the simulation, we calculate the utility obtained by the SaaS provider as a result of using the capacity plan produced. It is important to remember that the workload of each SaaS consumer is an aggregation of requests submitted by end users.

We consider that, as requests arrive to be processed, a weighted round-robin load balancer distributes them among the available instances. The round-robin policy used considers the amount of virtual CPUs in each instance and distributes requests proportionally to that amount (i.e., an instance with 2 CPUs receives twice the requests received by a one-CPU instance).
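A weighted round-robin policy of this kind can be sketched in a few lines. This is one simple way to realise the proportional split, not the simulator's code: each instance name appears in the cycle once per virtual CPU. The instance names are hypothetical.

```python
from collections import Counter
from itertools import cycle, islice


def weighted_round_robin(cpus_per_instance):
    """Endless iterator of instance names, each repeated once per CPU in a cycle."""
    schedule = [name for name, cpus in cpus_per_instance.items() for _ in range(cpus)]
    return cycle(schedule)


# Dispatch 300 requests over one 1-CPU instance and one 2-CPU instance.
balancer = weighted_round_robin({"small-1": 1, "large-1": 2})
dispatched = Counter(islice(balancer, 300))
```

With these weights the 2-CPU instance receives 200 of the 300 requests and the 1-CPU instance receives 100.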

⁵ http://www.cloudbus.org/cloudsim/
⁶ Available at http://code.google.com/p/saasim-david
⁷ Available at http://github.com/ricardoas/saasim

Fig. 2. System Model: general view of queues and request processing

Each instance processes requests according to a consolidated model [31]. An instance can process m requests in parallel, controlled by a set of m tokens that represent available threads (Figure 2). As a request arrives, it acquires a token and enters the processing queue. If no tokens are available, the request waits in a backlog queue, which follows a first-come first-served (FCFS) policy, until a token is available. If the backlog is full, the request is discarded. The processing queue works in a time sharing policy⁸.
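The token and backlog mechanism can be sketched as a small class. The class and attribute names are hypothetical, and the time sharing of the CPU among requests in the processing queue is deliberately left out; only admission, waiting and discarding are modelled.

```python
from collections import deque


class InstanceQueues:
    def __init__(self, tokens, backlog_size):
        self.free_tokens = tokens          # m tokens, one per available thread
        self.backlog_size = backlog_size
        self.backlog = deque()             # FCFS waiting queue
        self.processing = []
        self.discarded = []

    def arrive(self, request):
        if self.free_tokens > 0:
            self.free_tokens -= 1
            self.processing.append(request)
        elif len(self.backlog) < self.backlog_size:
            self.backlog.append(request)
        else:
            self.discarded.append(request)  # backlog full

    def finish(self, request):
        self.processing.remove(request)
        if self.backlog:
            # The released token goes straight to the oldest waiting request.
            self.processing.append(self.backlog.popleft())
        else:
            self.free_tokens += 1


# Two tokens and a backlog of one: the fourth simultaneous request is dropped.
queues = InstanceQueues(tokens=2, backlog_size=1)
for name in ["r1", "r2", "r3", "r4"]:
    queues.arrive(name)
snapshot = (list(queues.processing), list(queues.backlog), list(queues.discarded))
queues.finish("r1")
```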

Besides processing demand, each request has a data transfer demand. Also, each SaaS consumer has a storage demand related to hosting its application and user records. We assume that the IaaS provider meets these two demands regardless of the choice and negotiation of instances. The associated cost is calculated according to the model presented in Section 3.2.

Web applications typically present a variable workload [32], so a DPS is used to control the amount of instances in the short term. We consider a perfect DPS that knows the future workload and uses this information to buy instances from the IaaS provider. Although this simplification is unrealistic, it allows us to focus on evaluating the quality of the capacity planning performed.

5.2 Simulation Model Instance

A preliminary full factorial design pointed to workload prediction error and capacity planning heuristic as the main factors that influence SaaS provider profit. Experiments conducted later also pointed to the on-demand denial of service risk as another important factor. Our experiments explored several combinations of such factors, while other variables received fixed values. Our analysis is not exhaustive and different levels could be used for these fixed variables, but we are confident that our approach was enough to evaluate the trends of the heuristics and to suggest ideas for future work.

⁸ A request can use the CPU for an interval ∆, typically very small, and then the CPU is allocated to another request in the processing queue. Thus, all requests are simultaneously processed and delays related to contention are captured.

TABLE 1
SaaS provider monthly fees

Plan      Price
Bronze    $24.95
Gold      $79.95
Diamond   $299.95

Well-known IaaS and SaaS providers were used as the basis to instantiate the utility model proposed in Section 3. For the revenue model, three plans offered by BigCommerce in 2011 were considered (Table 1): Bronze, Gold and Diamond. BigCommerce charges its consumers monthly (n equals 1 month). A contribution margin of 30% was chosen for each plan according to what is practiced in the market⁹.

Regarding the SLA, the availability (A_MIN) and response time limit (T_MAX) were instantiated in ranges. We consider that the SaaS provider establishes a response time limit (T_MAX) of 2 seconds. If processing a request takes more than 2 seconds, we consider that the request was lost. If less than 0.1% of the requests are lost, due to response time or availability problems, the SaaS provider does not pay any penalty to its consumer. If less than 1% of the requests are lost, the provider pays a penalty corresponding to 25% of the value of the plan contracted. If less than 5% of the requests are lost, the provider pays a penalty corresponding to 50% of the value of the plan. Finally, if more than 5% of the requests are lost, the penalty corresponds to the whole value of the contract.

Regarding the cost model, the IaaS provider simulated was based on the prices of the Amazon EC2 service in 2013. Three instance types were considered: small (1 virtual CPU), large (2 virtual CPUs) and xlarge (4 virtual CPUs). The only difference considered between these three types is the amount of virtual CPUs. After an instance is requested from an IaaS provider there is a period, considered here as 5 minutes [21], to start the instance and the application. We also considered three reservation markets: light utilization, medium utilization and heavy utilization. Each of these markets offers a better cost depending on the usage of reserved instances. Hourly usage costs of such instances (Table 2) and upfront reservation fees (Table 3) are presented.
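The SLA penalty tiers described above map the fraction of lost requests to a share of the plan price. A minimal sketch follows. The function name is hypothetical, and the treatment of the exact boundaries (for instance, exactly 5% lost being charged in full) is an assumption, since the text does not state it.

```python
def sla_penalty(lost_fraction, plan_price):
    """Penalty owed for one consumer, given the fraction of requests lost."""
    if lost_fraction < 0.001:   # under 0.1% lost: no penalty
        return 0.0
    if lost_fraction < 0.01:    # under 1% lost: 25% of the plan value
        return 0.25 * plan_price
    if lost_fraction < 0.05:    # under 5% lost: 50% of the plan value
        return 0.50 * plan_price
    return float(plan_price)    # otherwise: the whole value of the contract


# Hypothetical plan priced at 100 currency units.
penalties = [sla_penalty(f, 100.0) for f in (0.0005, 0.005, 0.03, 0.2)]
```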

⁹ http://biz.yahoo.com/p/sum qpmd.html

TABLE 2
Virtual instances usage price for each IaaS provider market

IaaS provider market   Small    Large    Xlarge
On-demand              $0.06    $0.24    $0.48
Light                  $0.034   $0.136   $0.271
Medium                 $0.021   $0.084   $0.168
Heavy                  $0.014   $0.056   $0.112

TABLE 3
Virtual instances upfront fees for a one year reservation

IaaS provider market   Small   Large   Xlarge
Light                  $61     $243    $486
Medium                 $139    $554    $1108
Heavy                  $169    $676    $1352

One of the reasons the on-demand market denies service is that the IaaS provider does not have enough instances to offer. To model this aspect, we consider an on-demand denial of service risk, which represents the probability that a request for an instance from the on-demand market is not served. We consider that the capacity planning interval D has three types of intervals according to workload variations over the interval [5]. We associate a denial of service risk with each type of interval, and intervals with higher workload present a higher denial of service risk. We consider two risk scenarios: (i) risks of 1%, 5% and 10%; (ii) risks of 5%, 10% and 50%. The first scenario represents an IaaS provider that cares about its reputation and the quality of the service offered, while the second scenario represents a provider that does not care so much about its reputation.
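The two risk scenarios can be written down as a small lookup plus a random draw. This is only a sketch of how such a risk might be applied in a simulation: the scenario names, the interval type names ("light", "typical", "peak") and the function name are hypothetical labels, while the probabilities are the ones given above.

```python
import random

RISK_SCENARIOS = {
    "reputation-aware": {"light": 0.01, "typical": 0.05, "peak": 0.10},
    "reputation-agnostic": {"light": 0.05, "typical": 0.10, "peak": 0.50},
}


def on_demand_granted(scenario, interval_type, rng):
    """True when the on-demand market serves the request for an instance."""
    return rng.random() >= RISK_SCENARIOS[scenario][interval_type]


# Fixed seed so the example is repeatable.
rng = random.Random(42)
granted = sum(on_demand_granted("reputation-agnostic", "peak", rng) for _ in range(10000))
```

With a denial risk of 50%, roughly half of the 10000 simulated requests are granted.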

We simulated a workload corresponding to an interval of 1 year (i.e., D = 1 year). A total of 100 SaaS consumers were uniformly distributed among the three plans offered by the SaaS provider. Workload prediction errors initially considered were −20%, 0% and 20%. For each combination of the levels of these variables, a total of 70 different synthetic workloads were simulated to calculate confidence intervals of 95%.

Arlitt et al. [5] show that an e-commerce workload has some peaks during the day (between 9:00 and 21:00) and during the week (some days have more and others less load than typical days). A workload peak corresponds to twice the mean amount of requests, while lighter periods correspond to 50% of the mean amount of requests. These invariants were combined with SaaS plan prices, contribution margin and usage limits to calculate the request arrival rate of each SaaS provider plan during a year (Table 4). Moreover, some special events (e.g., Christmas) cause peak loads compared to typical weeks [5]. Workloads used in simulations were generated by GEIST [33], while workload predictions were derived from these workloads. GEIST generates a workload assuming a Poisson distribution as the marginal distribution of the arrival process and then adds multifractal and self-similarity properties. Finally, we considered request processing demands based on [6].

TABLE 4
Requests arrival rate for a typical week of the workload

Workload days   Bronze        Gold          Diamond
Typical day     0.058 req/s   0.176 req/s   0.650 req/s
Peak day        0.117 req/s   0.350 req/s   1.300 req/s
Light day       0.029 req/s   0.090 req/s   0.325 req/s

6 EVALUATION

QN and UT heuristics were compared to four reference strategies/heuristics: 1) a baseline strategy that only uses instances acquired from the on-demand market, named ON; 2) a heuristic that reserves 20% of the instances needed to process the workload peak, using only small instances from the heavy utilization reservation market, named ST¹⁰; 3) a heuristic (COHR'), based on [10], that uses the three reservation markets considered and a prediction of the amount of instances to be used. It tests a set of possible capacity plans containing from 0 to an upper bound amount of instances in order to choose the capacity plan with the lowest estimated cost; 4) an optimal strategy, named OP, that knows the exact amount of instances that will be used by the DPS to process the future workload. It tests a set of capacity plans containing from the smallest to the highest amount of instances that will be used by the DPS and chooses the capacity plan with the best estimated utility value.

Our analysis focuses on two metrics: 1) the SaaS provider utility (Section 3.3); and 2) the gain, in percentage, obtained by each heuristic in comparison to the utility obtained by our baseline strategy (ON). This gain is given by:

gain(υ_A(D), υ_ON(D)) = 100 ∗ (υ_A(D) − υ_ON(D)) / |υ_ON(D)|    (11)

First, we verify the feasibility of the capacity planning performed by the evaluated heuristics/strategies. The null hypothesis υ_ST(D) = υ_UT(D) = υ_QN(D) = υ_COHR'(D) = υ_ON(D) was rejected according to the analysis of variance (ANOVA) performed, with a p-value of 1.952e−12. A post-hoc analysis was performed to evaluate whether any heuristic obtained utilities similar to the ones obtained by ON. We concluded that υ_UT(D), υ_QN(D), υ_COHR'(D), υ_ST(D) > υ_ON(D). The evaluated heuristics present gains that differ from each other and from zero, so they increase the utility of the SaaS provider in comparison to using ON.
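The gain metric of Equation (11) is simple to compute. The sketch below uses a hypothetical function name and made-up utility values. The second example shows why the denominator takes the absolute value: the sign of the gain stays meaningful when the baseline utility is negative.

```python
def gain(utility_a, utility_on):
    """Percentage gain of heuristic A over the ON baseline, as in Equation (11)."""
    return 100 * (utility_a - utility_on) / abs(utility_on)


gain_positive_baseline = gain(110.0, 100.0)   # baseline earns 100, heuristic earns 110
gain_negative_baseline = gain(-90.0, -100.0)  # baseline loses 100, heuristic loses 90
```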

According to Shapiro-Wilk normality tests, heuristics utilities are normally distributed while gains are not. So, in order to compare heuristics, we performed Student's t-tests for heuristics utilities and Wilcoxon signed-rank tests for gains. Analysing the results of such tests (Table 5) we could observe that QN and UT always present the best results. QN performs better than the other heuristics when the on-demand market risk or the workload prediction error increases. QN is the only heuristic that directly uses the on-demand market risk, so as the risk increases QN perceives that the on-demand market cannot be trusted to provide instances and tries to reserve more instances (Figure 3) in order to avoid denying service to application end users. The other heuristics do not notice this need in the scenarios simulated and do not increase the amount of instances to be reserved.

¹⁰ The ST heuristic reserves 20% of the amount of instances needed to process the workload peak since 20% is an expected utilization for an infrastructure that is planned for a workload peak [2].

TABLE 5
T-tests and Wilcoxon tests results

Prediction error   Risks of 1%, 5% and 10%   Risks of 5%, 10% and 50%
-20%               UT > QN > ST ≥ COHR'      QN > UT > ST > COHR'
0%                 UT > QN > ST > COHR'      QN > UT > COHR' > ST
20%                QN > UT > ST > COHR'      QN > UT > ST > COHR'

TABLE 6
Average gains for different prediction error levels

Heuristics   Prediction error of -20%   Prediction error of 0%   Prediction error of 20%
QN           [8.83%; 9.05%]             [10.2%; 10.43%]          [10.27%; 10.36%]
UT           [10.23%; 10.28%]           [10.64%; 10.68%]         [9.24%; 9.32%]
ST           [7.70%; 7.83%]             [8.89%; 8.97%]           [8.99%; 9.09%]
COHR'        [4.64%; 4.74%]             [7.12%; 8.09%]           [4.21%; 4.31%]

Since the capacity planning of the evaluated heuristics increased the SaaS provider utility in comparison to the ON strategy, the next step of the post-hoc analysis consisted of quantifying the gains obtained. Table 6 presents the gains obtained by each heuristic at different workload prediction errors for the scenario of risks of 1%, 5% and 10%. Although the best (10.66%, error of 0%) and worst (4.5%, error of 20%) average gains obtained seem small, the absolute savings they represent grow with the profit of the SaaS provider.

The difference in heuristics gains can be explained by analysing reservations, presented in Figure 3, and instance utilization (i.e., the percentage of the reservation interval in which the instance was used), presented in Figure 4. We can observe that QN and UT instances were reserved in two markets (heavy and light utilization) and that, in both markets, instances were heavily used. Since the lowest expected utilization for the light market is 28%, for medium is 50% and for heavy is 75%, reserved instances were cheaper than on-demand instances. UT reserved a larger amount of well used instances in the heavy market, thus obtaining the best cost reductions and greater utilities. QN obtained good cost reductions by reserving different instance types (i.e., small, large and xlarge). QN's variation in instance types reduced the absolute amount of instances reserved and increased their usage.

Fig. 3. Instances reserved by each heuristic for a prediction error of 0%: (a) risks of 1%, 5% and 10%; (b) risks of 5%, 10% and 50%. Each panel shows reserved CPUs per heuristic (COHR', QN, ST, UT), split by instance type (small, large, xlarge).

As expected, the ST heuristic obtained cost reductions in comparison to ON since small reserved instances were well used. However, more instances could be reserved instead of being acquired from the on-demand market in order to achieve better cost reductions. The instances reserved by the COHR' heuristic were also well used, but more instances could be reserved. Moreover, in the evaluated scenarios the COHR' choice of reservation markets could be improved (e.g., instances that were reserved in the light market could be reserved in the medium or heavy markets for better cost reductions and utility improvements).

By analysing the optimal heuristic (OP) results, we could observe that OP obtained average gains of 10.72% and 11.51% for risks of 1%, 5%, 10% and risks of 5%, 10%, 50%, respectively¹¹. In order to compare OP with the other heuristics, we computed an efficiency as the ratio between the gain obtained by each heuristic and the gain obtained by the OP heuristic. Comparing heuristics and OP for workload prediction errors of 0% (the best scenario for heuristics), UT achieves an efficiency of 99.42% (±0.16%) for risks of 1%, 5%, 10% and QN achieves an efficiency of 96.19% (±1.08%) for risks of 5%, 10%, 50%. These values were calculated with a confidence interval of 95%. So, in such scenarios QN and UT achieve utilities that are close to the utilities obtained by OP. Table 7 presents the efficiencies achieved by QN, UT and ST heuristics considering the whole set of scenarios simulated.

By analysing such efficiencies, we can observe that as the on-demand risk increases UT and ST are the heuristics most affected. In such scenarios, improvements in the heuristics can be investigated in order to obtain greater gains and efficiencies. In all scenarios the worst percentages of requests lost were 0.047% for risks of 1%, 5% and 10% and 0.05% for risks of 5%, 10% and 50%.

¹¹ Confidence intervals are [10.706%; 10.747%] for risks of 1%, 5% and 10% and [9.37%; 13.655%] for risks of 5%, 10%, 50%.

TABLE 7
Heuristics Average Efficiencies

Heuristics   Risks of 1%, 5% and 10%   Risks of 5%, 10% and 50%
QN           [85.5497%; 88.2974%]      [74.1320%; 79.3194%]
UT           [85.5053%; 87.6603%]      [64.5329%; 68.2585%]
ST           [74.9753%; 76.9642%]      [45.4674%; 49.8758%]

Finally, considering the difficulty in workload prediction, we performed a sensitivity analysis of the workload prediction error (Figure 5). This analysis attempted to reflect the possibility of using predictors that result in different prediction errors. We considered the following prediction errors: −40%, −20%, 0%, 20% and 40%. As expected, the analysis demonstrated that the reduction of the workload prediction error improves the gains obtained by the evaluated heuristics. Also, the QN heuristic deals better with positive prediction errors (i.e., overestimating the workload) due to its more conservative prediction. This is an important conclusion since providers may try to overestimate instance consumption in order to reduce the risk of denying service to end users. For these providers, QN is a better choice.

6.1 Discussion about the heuristics

There are some reasons why one heuristic makes better decisions than others. We discuss some of them below. Firstly, ON does not use reserved instances, which may reduce its chance of being the best in terms of cost. At least one instance must always be allocated to the application, otherwise the application would be unavailable, and a reserved instance that is used all the time is cheaper than an on-demand instance.

Secondly, ST is a very simple heuristic that considers only peak load to make its capacity planning decision. In contrast, QN and UT heuristics consider details of the load history, such as average demand, waiting times, resource usage, etc. With more detailed information it is possible to make better decisions. For instance, if the peak load is 20 times greater than the average load and it happens just for a couple of minutes during the year, ST will make a bad decision, allocating far more nodes than really needed. QN and UT are able to identify such surges, making decisions that are more appropriate given the workload history patterns. Besides, both QN and UT have a kind of what-if engine inside, which simulates decisions made by the DPS considering the past workload. With this engine QN and UT may search for the best capacity plan according to the business utility value. ST does not consider different possibilities; it is essentially deterministic, based only on the previous peak load and on small instances.

Fig. 4. Instances utilization for a workload prediction error of 0% and risks of 1%, 5% and 10%: (a) QN; (b) UT; (c) ST; (d) COHR'. Each panel plots utilization (0 to 1) per instance, grouped by market (heavy, light, medium, on-demand) and by instance type (small, large, xlarge).

QN considers the on-demand market risk, so as the risk increases QN tries to reserve more instances, almost keeping the gains obtained (Figure 5b) while other heuristics obtain lower utilities. This indicates that it is important to consider this risk, especially if it is high.

COHR' does not consider the on-demand market risk, and the adaptations performed to verify the cost improvement of exchanging small instances for large or xlarge instances resulted in instances from the heavy market with utilizations lower than the threshold of 75%, which is inefficient.

Finally, QN and UT are less efficient than OP mainly due to prediction errors. For instance, UT with no prediction errors leads to a utility very similar to the optimal. The same occurs with QN.

Fig. 5. Sensitivity analysis of workload prediction error: (a) risks of 1%, 5% and 10%; (b) risks of 5%, 10% and 50%. Each panel plots the gain in comparison to ON (%) against the prediction error (%) for the COHR', QN, ST and UT heuristics.

6.2 Validity Threats

In order to enable the investigation of the proposed problem some simplifications were made, resulting in validity threats. Regarding external validity, a synthetic e-commerce workload was used. Request arrivals were generated using an outdated workload generator (GEIST) [33] since more recent workload generators based on recent workload studies were not found. Although our utility model covers many IaaS and SaaS provider business models, our experiments were based on information from one IaaS provider and one SaaS provider and do not account for sensitivity to their cost choices. Regarding construct validity, we modeled the SaaS application as a black box single tier application and user sessions were not considered.

7 CONCLUSIONS AND FUTURE WORK

Analysing our simulation experiments using synthetic e-commerce workloads we demonstrated that capacity planning should not be neglected when offering a SaaS application deployed on instances acquired from an IaaS provider. We developed a utility model that considers business aspects related to offering a SaaS application. This model guides the capacity planning performed by the proposed heuristics, QN and UT. The proposed heuristics were compared to three other solutions: (i) a baseline strategy that uses only on-demand instances (ON); (ii) a heuristic (COHR') based on [10]; and (iii) a heuristic that considers workload peak to determine the amount of instances to reserve (ST). Analysing our results, all heuristics improve SaaS provider utility in comparison to ON. QN and UT present the best results, improving SaaS provider utility, on average, by 10.04% and 9.25%, respectively. Also, such heuristics lose 0.05% of the requests in the worst cases.

Our sensitivity analysis demonstrated that workload prediction errors influence the results obtained by the evaluated heuristics. Large SaaS providers tend to have huge amounts of historical data and invest in good prediction techniques. As a consequence, they get small prediction errors and achieve better capacity planning results. However, smaller SaaS providers may not have access to such possibilities, obtaining larger errors and failing to get the best out of capacity planning heuristics.

The simplifications considered result in validity threats that should be explored in future work. We plan to use real e-commerce workloads to validate the results obtained by each heuristic considered in this work. Although we used synthetic workloads in this work, the utility models proposed as well as the synthetic e-commerce workload generated considered information from real IaaS and SaaS providers. So, we believe that an overview of heuristics behavior could be established. A more detailed application model can be considered using many tiers and user sessions. Finally, improvements in heuristics can be investigated, especially for scenarios of large workload prediction errors.

ACKNOWLEDGMENTS

The authors would like to thank Siqi Shen [10] for providing specifications about the original COHR heuristic.

REFERENCES

[1] A. Moura, J. Sauve, and C. Bartolini, “Business-driven IT management - upping the ante of IT: exploring the linkage between IT and business to improve both IT and business results,” Communications Magazine, IEEE, vol. 46, no. 10, pp. 148–153, 2008.
[2] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, “Above the Clouds: A Berkeley View of Cloud Computing Cloud Computing: An Old Idea Whose Time Has (Finally) Come,” Computing, pp. 07–013, 2009.
[3] L. Vaquero, L. Rodero-Merino, J. Caceres, and M. Lindner, “A break in the clouds: towards a cloud definition,” ACM SIGCOMM Computer Communication Review, vol. 39, no. 1, pp. 50–55, 2008.