Combined Power and Performance Management of Virtualized Computing Environments Serving Session-based Workloads


Dara Kusic, Member, IEEE, Nagarajan Kandasamy, Member, IEEE, and Guofei Jiang

Abstract—This paper develops an online resource provisioning framework for combined power and performance management in a virtualized computing environment serving session-based workloads. We pose this management problem as one of sequential optimization under uncertainty and solve it using limited lookahead control (LLC), a form of model-predictive control. The approach accounts for the switching costs incurred when provisioning virtual machines and explicitly encodes the risk of provisioning resources in an uncertain and dynamic operating environment. We experimentally validate the control framework on a server cluster supporting three online services. When managed using LLC, our cluster setup saves, on average, 41% in power-consumption costs over a twenty-four hour period when compared to a system operating without dynamic control. Finally, we use trace-based simulations to analyze LLC performance on server clusters larger than our testbed and show how concepts from approximation theory can be used to further reduce the computational burden of controlling large systems.

Key words: Power management, resource provisioning, virtualization technologies, predictive control

I. INTRODUCTION

Web-based services such as online banking and shopping must respond continuously to users in real time, and as people have become more reliant on such services, their expectations for quality of service (QoS) have risen dramatically. Online services are enabled by enterprise applications, defined as any software that simultaneously provides services to a large number of users over a computer network. These applications are typically hosted on computer clusters comprising heterogeneous networked servers, housed in a physical facility called a data center. A typical data center serves a variety of companies and users, and the computing resources needed to support such a wide range of online services leave server rooms in a state of sprawl. Moreover, each new service to be supported often results in the acquisition of new hardware, leading to very low server utilization levels.

Virtualization is a promising approach to reduce power consumption by consolidating multiple online services onto fewer computing resources within a data center. This technology allows a single server to be shared among multiple performance-isolated platforms called virtual machines (VMs), where each virtual machine can, in turn, host multiple enterprise applications. Virtualization also enables on-demand or utility computing, a dynamic resource provisioning model in which computing resources such as CPU and memory are made available to applications only as needed, rather than allocated statically based on the peak workload. This paper develops an online control framework for combined power/performance management in a virtualized computing environment, focusing on session-based workloads. The proposed framework allows system administrators to maintain service-level agreements (SLAs) with clients while achieving higher server utilization and energy efficiency by dynamically provisioning VMs, consolidating the workload, and turning servers on and off as needed.

Manuscript received May 28, 2010; revised October 3, 2010. The associate editor coordinating the review of this paper and approving it for publication was J. Hellerstein.

D. Kusic is with the Coriell Institute for Medical Research Camden, Camden, NJ 08103. N. Kandasamy is with the Electrical and Computer Engineering Department, Drexel University, Philadelphia, PA 19104. G. Jiang is with the Robust and Secure System Group, NEC Laboratories America, Princeton, NJ 08540.

This material is based on work supported by the National Science Foundation under grants DGE-0538476, CNS-0643888, and Grant# 0937060 to the Computing Research Association for the CIFellows Project.

Control theory offers a promising methodology to automate key system management tasks in data centers. It allows us to solve a general class of problems using the same basic control concepts and to verify the feasibility of the control scheme before deployment. Previous research has focused on using PID control for performance management, and we refer the reader to Hellerstein et al. for a detailed discussion of this topic [1]. The authors show how to develop closed-loop feedback controllers that manage the performance of client-server systems such as the IBM Lotus Domino server and the Apache HTTP server. For example, the Lotus Domino server uses remote procedure calls for client/server interaction, and each client request results in the creation of a certain number of these calls. The PID controller regulates the maximum number of clients that can be admitted into the system such that the total number of remote procedure calls is maintained at a specified set point. PID control has also been successfully applied to problems such as task scheduling [2], QoS adaptation in web servers [3], load balancing in file servers [4], and power management in virtualized servers [5]. In these designs, the closed-loop feedback controller is derived under stability and sensitivity requirements, assuming a linear time-invariant system, an unconstrained state space, and a continuous input/output domain. Other researchers have developed more advanced state-space and multi-input multi-output (MIMO) methods to manage computing systems [6]–[8]. These methods can take into account multi-objective cost functions and dynamic operating constraints while optimizing performance.

PID control is not suitable for managing computing systems that exhibit hybrid and non-linear behavior, or where control options must be chosen from a finite set. Also, actions such as turning on servers and VMs usually have an associated dead time (the delay between a control input and the corresponding system response) on the order of minutes, requiring predictive control in which control inputs are provided in anticipation of future changes in operating conditions. Therefore, in this work, we pose the power/performance management problem as one of sequential optimization under uncertainty and solve it using the proactive method of limited lookahead control (LLC), a form of model-predictive control wherein the idea is to solve a multi-objective optimization problem that maximizes the performance objective over a given prediction horizon, and then periodically roll this horizon forward. The LLC approach is applicable to computing systems that exhibit non-linear behavior, and when tuning options must be chosen from a finite set at any given time. It also models the various switching and opportunity costs associated with turning servers and VMs on or off. For example, profits may be lost while waiting for a VM and its host to be turned on. Other switching costs include the SLA violations incurred when migrating a VM between servers, and the power consumed while a machine is being powered up or down and not performing any useful work.

The major contributions of this paper are as follows:

• In an operating environment where the workload is highly variable, excessive switching activity may actually reduce the profit generated, especially in the presence of the switching costs described above. Thus, each provisioning decision made by the controller is risky, and we explicitly encode risk into the LLC formulation using preference functions to order possible controller decisions.

• We validate the control framework using an experimental setup of heterogeneous servers supporting three online services, enabled by the Trade6, DVD Store, and RUBBoS applications. The cluster processes a time-varying, session-based workload in which both the number of requests and the transaction mix can change dynamically at run time.

• We use trace-based simulations to analyze controller performance on clusters larger than our testbed, showing how concepts from approximation theory can be integrated within the LLC framework to further reduce the computational burden of controlling large systems. We use a neural network (NN) to learn the decision-making tendencies of the controller via offline simulations. At run time, given the current state and environment inputs, the NN provides an approximate solution, which is used as a starting point around which to perform a quick local search to obtain the final control decision.

Experimental results show that the server cluster, when managed using the proposed LLC approach, saves, on average, 41% in power-consumption costs over a twenty-four hour period when compared to a system operating without dynamic control. The execution-time overhead of the controller is quite low, making it practical for online performance management. We also characterize the effects of different risk-preference functions on control performance, finding that a risk-averse controller reduces the number of SLA violations while maintaining energy savings. A risk-averse controller also reduces server switching activity, a beneficial result when excessive power-cycling of servers is a concern.

This paper builds upon our previous work [9], [10], specifically with regard to its consideration of session-based workloads, migration of live VMs across servers during system operation, and the use of approximation theory to improve controller scalability. We also present a more comprehensive experimental analysis of LLC performance, including the effect of user-defined policies and dynamic transaction mixes. The paper is organized as follows. Section II describes our experimental setup. Section III formulates the control problem and describes the controller implementation. Section IV presents experimental results evaluating the control performance. Section V addresses controller scalability using trace-based simulations. Section VI discusses related work on resource provisioning in virtualized computing environments, and Section VII concludes the paper.

II. TESTBED DESCRIPTION

The computing cluster consists of the nine servers detailed in Fig. 1, networked via a gigabit switch. Virtualization of this cluster is enabled by VMware's ESX Server, and the operating system on each VM is SUSE Enterprise Linux. The ESX server controls the disk space, memory, and CPU share (in MHz) allotted to the VMs, and provides an application programming interface (API) to support the remote management of VMs. Our controller uses this API to dynamically turn on VMs on the hosts and to assign CPU shares to the virtual machines. Migration of VMs between hosts is performed using the VMotion API. To physically turn off a host machine, we follow the shutdown command with an Intelligent Platform Management Interface (IPMI) command to power down the chassis. Hosts are remotely powered on using the wake-on-LAN protocol. We discuss the system architecture in greater detail in Section II-B.

A. The Enterprise Applications

We use three multi-tier, transaction-based applications for testing in our virtualized hosting environment. The Gold application is DVD Store, an open-source emulation of an e-commerce site that we host on an Apache Tomcat application server with DB2 as the database component (http://linux.dell.com/dvdstore). The Silver application is IBM's Trade6 benchmark, a stock-trading service that allows users to browse, buy, and sell stocks. Trade6 is integrated within the IBM WebSphere Application Server and uses DB2 as the database component. This execution environment is then distributed across multiple servers comprising the application and database tiers. The Bronze application is RUBBoS, a bulletin-board benchmark similar to Slashdot with the capability to browse for and post messages (http://jmob.objectweb.org/rubbos.html). We host RUBBoS on an Apache Tomcat application server with DB2 as the database component.

The above applications operate on the notion of user sessions. Requests within a session are assumed to depend on one another; that is, the user waits for a response from the system before initiating the next request, and state information is maintained between multiple requests belonging to one user session. Session arrivals exhibit time-of-day variations typical of many enterprise workloads, as shown by the example workload in Fig. 2(a), and the workload intensity can change quite significantly within a short time period. The transaction mixes for all three applications can also vary at run time, from an all-browsing mix (database reads) to an all-buying mix (database writes). The Gold, Silver, and Bronze applications generate revenue as per the non-linear pricing graph shown in Fig. 2(b), which relates the average response time achieved per transaction to a dollar value that the client is willing to pay. Response times below a threshold value result in a reward paid to the service provider, while response times violating the SLA result in a penalty or refund paid to the client.

[Figure: block diagram. A dispatcher distributes workload arrivals $\lambda_1(k)$, $\lambda_2(k)$, $\lambda_3(k)$ to Gold (Tomcat), Silver (WebSphere), and Bronze (Tomcat) VMs on the application-tier hosts Host1 to Host5; the Gold, Silver, and Bronze DB2 databases run in the database tier on Hostdb_1 and Hostdb_2; powered-down hosts sit in a Sleep cluster; VM files reside on an NFS file system. The LLC controller's controllable parameters are host on/off $N(k)$, VM on/off $n_i(k)$, workload fraction $\gamma_{ij}(k)$, and CPU share $f_{ij}(k)$. An inset table lists memory, CPU cores, and CPU speed for the Controller, FS, Host1, Hostdb_1, Host2, Hostdb_2, Host3, Host4, and Host5 machines.]

Fig. 1. The system architecture supporting three services, denoted as Gold, Silver, and Bronze. The controller sets $N(k)$, the number of active hosts; $n_i(k)$, the number of VMs to serve the $i$th application; and $f_{ij}(k)$ and $\gamma_{ij}(k)$, the CPU share and the fraction of workload to distribute to the $j$th VM, respectively. A Sleep cluster holds machines that have been powered off.

B. The System Architecture

Fig. 1 shows the virtualized server environment hosting the three services (termed Gold, Silver, and Bronze) that are, in turn, distributed over the application and database tiers. A dispatcher balances the incoming workload, with arrival rates $\lambda_1$, $\lambda_2$, and $\lambda_3$ for the Gold, Silver, and Bronze services, respectively, across those VMs running the designated application. Hosts not needed during periods of slow workload arrivals are powered down and placed in the Sleep cluster to reduce power consumption.

We focus on dynamic resource provisioning within the application tier only, since in most cases the application layer requires many more CPUs than the database layer for each online service [11]. For example, this ratio can be as high as ten application processors per database processor for the SAP enterprise application. Similarly, Oracle applications have about a five-to-one ratio of application server processors to database processors. Therefore, increasing processor utilization at the application layer by consolidating the workload has the potential for significant energy savings.

Returning to Fig. 1, the controller aims to meet the SLAs of the Gold, Silver, and Bronze services while minimizing the corresponding use of computing resources at the application tier, in terms of the number of hosts and VMs, and the CPU share per VM. A VM's CPU share is specified as an operating frequency. For example, Host5, with eight CPU cores each operating at 1.6 GHz, has 8 × 1.6 = 12.8 GHz of processing capacity that can be dynamically distributed among its VMs. The ESX server limits the maximum number of cores that a VM can use on a host to four, thereby setting an upper bound of 6 GHz for a VM's CPU share, and reserves a total of 800 MHz of CPU share for system management processes. So, on a machine with eight CPU cores, we can, for example, host a 6 GHz Gold VM, a 3 GHz Gold VM, and a 3 GHz Silver VM. Memory is generally the resource limiting the number of VMs a host machine can support. Each VM is reserved 770 MB of RAM, and about 750 MB is reserved for VMware system management. Thus, a host machine with 4 GB of memory can support up to four live VMs (4 × 770 + 750 = 3830 MB, whereas a fifth VM would raise the total to 4600 MB).
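As a rough illustration of these per-host limits, the sketch below checks whether a set of VM CPU shares can be placed on one host. The helper `fits_on_host` is hypothetical and not part of the authors' controller; its constants are the values quoted in the paragraph above, and we assume that 4 GB means 4096 MB and that no other per-host overheads apply.

```python
# Per-host limits quoted in the text; the helper itself is hypothetical.
VM_MAX_SHARE_GHZ = 6.0       # ESX caps a VM at four cores, about 6 GHz
HOST_RESERVED_GHZ = 0.8      # CPU share reserved for system management
VM_RAM_MB = 770              # memory reserved per VM
HOST_RESERVED_RAM_MB = 750   # memory reserved for VMware management


def fits_on_host(cores, core_ghz, mem_mb, vm_shares_ghz):
    """Return True if VMs with the given CPU shares (GHz) can all run on one host."""
    if any(s <= 0 or s > VM_MAX_SHARE_GHZ for s in vm_shares_ghz):
        return False
    usable_ghz = cores * core_ghz - HOST_RESERVED_GHZ
    cpu_ok = sum(vm_shares_ghz) <= usable_ghz + 1e-9   # tolerance for float rounding
    mem_ok = len(vm_shares_ghz) * VM_RAM_MB + HOST_RESERVED_RAM_MB <= mem_mb
    return cpu_ok and mem_ok


# An eight-core 1.6 GHz host with 4 GB, taken as 4096 MB (assumption).
print(fits_on_host(8, 1.6, 4096, [6, 3, 3]))        # True: 12 GHz across 3 VMs
print(fits_on_host(8, 1.6, 4096, [2, 2, 2, 2, 2]))  # False: a fifth VM exceeds 4 GB
```

The CPU check and the memory check are kept separate so that it is easy to see which resource binds; in the examples from the text, memory is the binding constraint once a fifth VM is added.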

[Figure: (a) "New session arrivals to three online applications": number of new session arrivals versus time instance in 30-second increments, for Silver (Trade6), Gold (DVD Store), and Bronze (RUBBoS). (b) "A stepwise pricing SLA for the online services": revenue in dollars versus response time in seconds, with separate SLA thresholds for RUBBoS, Trade6, and DVD Store.]

Fig. 2. (a) A sample workload showing session arrivals to the Gold, Silver, and Bronze applications, plotted at 30-second intervals. (b) A pricing strategy that differentiates the Gold, Silver, and Bronze services.

VMotion allows a live VM to be migrated between hosts at run time, allowing VMs to be consolidated on fewer host machines while preserving the state of active user sessions. VMotion requires that VM configuration files be stored on a network file system. In our setup, a VM is configured to host a particular service at boot time, but the same VM can be dynamically re-assigned by the controller during run time to host a different service as per workload demand.

Finally, we make sure that the workload generated for our system never exceeds the system's capacity to process that workload, to ensure a fair comparison between the controlled and uncontrolled system in our results. To do this, we assume the worst-case scenario (transactions with a maximum service time per request) and determine the number of such requests that can be processed by the cluster while satisfying the SLA. Given the system setup in Fig. 1, we add up the total processing capacity of the system, measured in units of CPU frequency. The five host machines at the application tier provide a cumulative processing rate of about 54 GHz to incoming requests. We choose a policy wherein the three services share the available capacity equally; that is, as part of the offline planning process, we allocate 33% of the overall capacity, or about 18 GHz, to each service.¹ Now, we must determine the worst-case arrival rate that can be processed by a service without SLA violations, and we do this by profiling the corresponding applications. For example, our experiments indicate that a 6 GHz VM hosting Trade6 (the Silver service), when provided with a 0/100 mix of browse/buy requests, can process about 30 requests/second before response times begin to exceed the SLA. Since the Silver service is allocated 18 GHz on our system, enough for three such 6 GHz VMs, the maximum arrival rate tolerated is 3 × 30 = 90 requests per second. Similar calculations for the Gold and Bronze services show that the maximum arrival rates tolerated are 54 and 60 requests per second, respectively.
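The worst-case capacity arithmetic above can be written out as a short sketch. The function name `max_arrival_rate` is hypothetical, and it assumes, as in the Silver example, that a service's share of the cluster is carved into whole VMs of the profiled size.

```python
def max_arrival_rate(total_ghz, n_services, vm_ghz, vm_rate_rps):
    """Worst-case arrival rate (requests/s) one service tolerates under an equal split.

    Hypothetical helper: assumes the service's share is carved into whole VMs of
    size vm_ghz, each able to sustain vm_rate_rps before violating the SLA.
    """
    per_service_ghz = total_ghz / n_services   # 54 / 3 = 18 GHz
    n_vms = int(per_service_ghz // vm_ghz)     # 18 // 6 = 3 whole VMs
    return n_vms * vm_rate_rps


# Silver (Trade6): a 6 GHz VM handles about 30 requests/s at the worst-case mix.
print(max_arrival_rate(54.0, 3, 6.0, 30))  # 90
```

The same call with the profiled per-VM rates of the Gold and Bronze applications would reproduce their limits, provided those rates are also measured on 6 GHz VMs.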

III. PROBLEM FORMULATION AND LLC DESIGN

Given the system architecture in Fig. 1 and the SLA functions in Fig. 2(b), the goal is to maximize the profit generated by the three services under a time-varying workload by dynamically tuning the following parameters: (1) the number of VMs to provision to each application; (2) the number of hosts on which to collocate the VMs, including migrating VMs between hosts; (3) the CPU share to be given to each VM; and (4) the number of servers to power up or down.

We solve the above problem using limited lookahead control (LLC), a technique conceptually based on model predictive control (MPC) [12]. The basic idea is to solve a multi-objective optimization problem that maximizes the performance goal over a given prediction horizon, and then periodically roll this horizon forward. MPC techniques allow performance objectives to be formulated either as set-point regulation problems or as utility optimization problems.

Set-point regulation requires that the underlying parameters be maintained at a prescribed level or follow a prescribed trajectory. Utility optimization is used to maximize (or minimize) a given performance measure represented as a function of state and input variables. A weighted norm is typically used as the cost function, in which the corresponding variables are aggregated with different weights reflecting their contribution to the overall system utility and operation cost.

This paper formulates power/performance management as a utility optimization problem within the LLC framework. The LLC method is quite useful when control actions have dead times, such as switching on a server or VM and waiting for the bootstrap routine, and for control actions that must be chosen from a discrete set, such as the number of hosts and VMs to switch on. Fig. 3(a) shows the basic LLC concept, where the environment input $\omega$ is estimated over the prediction horizon $h$ and used by the system model to forecast future system states $\hat{x}$. At each time step $k$, as shown in Fig. 3(b), the controller finds a feasible sequence $\{u^*(l) \mid l \in [k+1, k+h]\}$ of control actions within the prediction horizon that maximizes the profit generated by the cluster. Then, only the first control action in the chosen sequence, $u(k)$, is applied to the system and the rest are discarded. The process is repeated at time $k+1$, given updated state information and new workload arrivals.
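The receding-horizon idea can be sketched as an exhaustive search over a finite control set. This is a minimal illustration under stated assumptions: `llc_step`, `toy_model`, and `toy_reward` are hypothetical names, the toy queue dynamics and cost are placeholders rather than the paper's system model or SLA-based profit, and a practical controller would prune the search instead of enumerating all $|U|^h$ sequences.

```python
from itertools import product


def llc_step(x, u_set, forecast, model, reward, h):
    """Return the first action of the best h-step control sequence.

    Exhaustive search over all |u_set|**h sequences; only the first action is
    applied, and the search is repeated at the next sampling instant.
    """
    best_value, best_first = float("-inf"), None
    for seq in product(u_set, repeat=h):
        state, value = x, 0.0
        for l, u in enumerate(seq):
            state = model(state, u, forecast[l])   # predicted next state
            value += reward(state, u)              # profit accrued at that step
        if value > best_value:
            best_value, best_first = value, seq[0]
    return best_first


# Toy placeholders (not the paper's model): state = queued requests, u = servers
# on, each server clears 10 requests per step, and each server costs 3 units.
def toy_model(q, u, arrivals):
    return max(q + arrivals - 10 * u, 0)


def toy_reward(q, u):
    return -(q + 3 * u)


first = llc_step(0, range(4), [25, 25], toy_model, toy_reward, h=2)
print(first)  # 3
```

With 25 arrivals forecast in each of the next two steps, turning on all three servers immediately avoids a backlog that would otherwise carry over, so the lookahead picks 3 even though fewer servers would be cheaper in the first step alone.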

¹ This policy is not unique and any good capacity-planning process will

[Figure: (a) LLC schematic with blocks System, Predictive filter, System model, and Optimizer, and signals $\omega(k)$, $\hat{\omega}(l)$, $\hat{x}(l)$, $\hat{u}(l)$, $x(k)$, and $u(k)$. (b) State space explored within the horizon.]

Fig. 3. (a) The LLC schematic and (b) the state space explored by the controller within a horizon of length $h$. The shaded states show the trajectory with the optimal cost, forming the feasible control sequence.

A. Modeling the System Dynamics

We define a virtual computing cluster as a group of VMs distributed across multiple physical machines, cooperating to host one online service. The dynamics of a virtual cluster for the Gold, Silver, and Bronze applications are described by the discrete-time state-space equation

$$x_i(k+1) = \varphi\big(x_i(k), u_i(k), \omega_i(k)\big) \tag{1}$$

where $x_i(k)$ is the state of the cluster, $\omega_i(k)$ is the environment input, and $u_i(k)$ is the control input.² The behavioral model $\varphi$ captures the relationship between the system state, the control inputs that adjust the state parameters, and the environment input. The operating state of the $i$th virtual cluster is denoted as $x_i(k) = \big(q_i(k), r_i(k)\big)$, where $q_i(k)$ is the number of queued requests and $r_i(k)$ is the response time achieved by the cluster.

The control input to the $i$th virtual cluster is denoted as $u_i(k) = \big(N(k), n_i(k), \{f_{ij}(k)\}, \{\gamma_{ij}(k)\}\big)$, where $N(k)$ is the system-wide control variable indicating the number of active host machines, $n_i(k)$ is the number of VMs for the $i$th service, $f_{ij}(k)$ is the CPU share, and $\gamma_{ij}(k)$ is the workload fraction directed to the $j$th virtual machine. The environment input $\omega_i(k) = \{\lambda_i(k), m_i(k)\}$ includes the workload arrival rate $\lambda_i(k)$ and the transaction mix $m_i(k)$ to the $i$th virtual cluster.

An estimate of the environment input $\lambda_i$ for each step along the prediction horizon accounts for the requests generated by existing sessions in the system plus an estimated number of requests from new sessions. The time-varying nature of the workload makes it impossible to assume an a priori distribution, so we use a Kalman filter to estimate the number of future session arrivals [13]. The filter estimates $\lambda$ for time $k$, denoted by $\hat{\lambda}(k)$, using the Holt-Winters forecasting model for time-series data. This model uses an exponentially weighted moving average filter to calculate the mean $u$ and a slope component $b$ to capture the trend as

$$\begin{aligned}
\hat{\lambda}(k) &= u(k-1) + b(k-1) \\
u(k) &= \alpha_0 \lambda(k) + (1-\alpha_0)\big(u(k-1) + b(k-1)\big) + \Upsilon \\
b(k) &= \alpha_1 \big(u(k) - u(k-1)\big) + (1-\alpha_1)\, b(k-1) + \xi
\end{aligned}$$

where $\alpha_0$ and $\alpha_1$ are smoothing constants, and $\Upsilon$ and $\xi$ denote white-noise disturbances. The Kalman filter is trained using representative data to obtain values for $\alpha_0$ and $\alpha_1$ such that the sum of squared errors $\sum_k \big(\lambda(k) - \hat{\lambda}(k)\big)^2$ is minimized for a one-step-ahead forecast. Successive estimates of the environment input for time $k+1$ and beyond are obtained by extending the time series into the future along the slope.

² We use the subscript $i$ to denote the $i$th service class; $i \in \{1, 2, 3\}$ denotes the Gold, Silver, and Bronze services, respectively.
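The level/slope recursion above can be sketched as plain Holt (double exponential) smoothing. This is an illustration only: the noise terms $\Upsilon$ and $\xi$ are dropped, the Kalman-filter wrapper used in the paper is not reproduced, initializing the level and slope from the first two samples is our own assumption, and the fitting of $\alpha_0$ and $\alpha_1$ against the one-step squared error is not shown. The function name `holt_forecast` is hypothetical.

```python
def holt_forecast(history, alpha0, alpha1, h):
    """Holt double exponential smoothing: forecasts for the next h steps.

    Implements the level/slope recursions without the noise terms; the
    Kalman-filter wrapper used in the paper is not reproduced here.
    """
    if len(history) < 2:
        raise ValueError("need at least two observations")
    u, b = history[0], history[1] - history[0]          # assumed initial level and slope
    for lam in history[1:]:
        u_prev = u
        u = alpha0 * lam + (1 - alpha0) * (u_prev + b)  # level u(k)
        b = alpha1 * (u - u_prev) + (1 - alpha1) * b    # slope b(k)
    return [u + m * b for m in range(1, h + 1)]         # extend along the slope


print(holt_forecast([10, 20, 30, 40], 0.5, 0.5, 3))  # [50.0, 60.0, 70.0]
```

On a perfectly linear series the level and slope track the data exactly, so the multi-step forecast simply continues the line; on real session-arrival traces the smoothing constants trade responsiveness against noise.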

Since the actual values of the environment input cannot be measured until the next sampling instant, the corresponding system state for time $k+1$ can only be estimated as

$$\hat{x}_i(k+1) = \varphi\big(x_i(k), u_i(k), \hat{\omega}_i(k)\big). \tag{2}$$

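We develop $\varphi$ as a difference model for each virtual cluster $i$ as

$$\hat{x}_i(k+1) = \begin{bmatrix} \hat{q}_i(k+1) \\ \hat{r}_i(k+1) \end{bmatrix} = \begin{bmatrix} \sum_{j=1}^{n_i(k)} \max\Big\{\big(\gamma_{ij}(k)\,\hat{\lambda}_i(k) - \mu_{ij}(k)\big)\, T_s,\ 0\Big\} \\ g\big(\mu_i(k), \hat{\lambda}_i(k)\big) \end{bmatrix} \tag{3}$$

where

$$\hat{\lambda}_i(k) = \hat{\lambda}^K_i(k) + \frac{q_{ij}(k)}{T_s}, \tag{4}$$

$$\mu_i(k) = \sum_{j=1}^{n_i(k)} \mu_{ij}(k), \qquad \mu_{ij}(k) = p\big(f_{ij}(k), \hat{m}_i(k)\big). \tag{5}$$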

Equation (3) captures the system dynamics over $T_s$, the controller sampling time. The estimated queue length $\hat{q}_i(k+1) \ge 0$ is obtained using the estimated incoming workload $\hat{\lambda}_i(k)$ dispatched to the cluster and the processing rate $\mu_i(k)$. The estimated workload $\hat{\lambda}_i(k)$ to be processed by the virtual cluster is given by the fraction $\gamma_{ij}(k)$ of the Kalman-filter estimate $\hat{\lambda}^K_i(k)$ given to the $j$th VM plus the current queue length of that VM, converted to a rate value. The estimated transaction mix $\hat{m}_i(k)$ is simply the transaction mix observed during the previous sampling interval.
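A single prediction step of Eqs. (3)-(5) might look like the sketch below. We read Eq. (4) per VM, so the $j$th VM's load is $\gamma_{ij}\hat{\lambda}^K_i + q_{ij}/T_s$, and we pass the cluster-wide rates to $g(\cdot)$; both readings are assumptions. The names `predict_cluster_state`, `toy_p`, and `toy_g` are hypothetical, and the stand-ins (5 requests/s per GHz, and an M/M/1-style $1/(\mu - \lambda)$ delay) are placeholders for the profiled lookup tables, not measured values.

```python
def predict_cluster_state(lam_k, queues, gammas, shares, mix, ts, p, g):
    """One-step prediction (q_hat, r_hat) for virtual cluster i, after Eqs. (3)-(5).

    Assumed reading of Eq. (4): VM j receives gamma_j * lam_k plus its backlog
    q_j / ts; g() is given the cluster-wide processing and arrival rates.
    """
    q_next = mu_total = lam_total = 0.0
    for q_j, gamma_j, f_j in zip(queues, gammas, shares):
        lam_j = gamma_j * lam_k + q_j / ts          # dispatched load plus backlog rate
        mu_j = p(f_j, mix)                          # Eq. (5): rate from lookup table
        q_next += max((lam_j - mu_j) * ts, 0.0)     # Eq. (3): leftover requests
        mu_total += mu_j
        lam_total += lam_j
    return q_next, g(mu_total, lam_total)           # Eq. (3): response time


# Hypothetical stand-ins for the profiled tables (not measured values):
def toy_p(share_ghz, mix):
    return 5.0 * share_ghz                          # 5 requests/s per GHz


def toy_g(mu, lam):
    return 1.0 / (mu - lam) if lam < mu else float("inf")   # M/M/1-style delay


q_hat, r_hat = predict_cluster_state(40.0, [0, 30], [0.5, 0.5], [6, 3], "50/50",
                                     30.0, toy_p, toy_g)
print(q_hat, r_hat)  # 180.0 0.25
```

In the example, the 6 GHz VM keeps up with its share of the load, while the 3 GHz VM, which also carries a 30-request backlog, falls 6 requests/s behind and accumulates 180 queued requests over a 30-second interval.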

The processing rate $\mu_i(k)$ of a virtual cluster in (5) depends on the number of VMs as well as the CPU share given to each VM on its host machine. Equation (5) uses the function $p(\cdot)$ to map the CPU share given to the $j$th VM in the cluster to a corresponding processing rate. This function is stored as a lookup table, indexed by the CPU share $f_{ij}(k)$ and the estimated transaction mix $\hat{m}_i(k)$. The estimated response time $\hat{r}_i(k)$ is provided by the function $g(\cdot)$, which maps the processing and arrival rates for a given transaction mix to a response time. The functions $p(\cdot)$ and $g(\cdot)$ are obtained via simulation-based learning, a technique to generate behavioral models of complex systems wherein the system is first simulated for various environmental inputs and the simulation results are used to obtain an approximation structure such as a lookup table, neural network, or regression tree (a lookup table, in our case). We do this by taking a VM offline and measuring the response times achieved by this VM when provided with different CPU shares under a range of workload conditions. This response time includes the latency incurred by requests at both the application and database tiers.

A VM under test for the Gold service is profiled as follows. Since the testbed has only one database per service, we expect the VMs hosting these databases to experience the most stress during system operation (due to multiple VMs accessing the database simultaneously), with a corresponding impact on the achieved response time. So, we use three application servers and one database server for our profiling experiments. We start the VM under test on an application server and two additional VMs on the other servers. The two side VMs are then provided with the maximum CPU share and loaded with a constant workload intensity. Keeping two machines busy during the test ensures that we stress the Gold database to the maximum, mimicking the worst-case operating scenario. We then profile the VM under test by setting its CPU share and measuring the response times achieved under increasing workload intensities. This testing is repeated for transaction mixes having 0/100, 50/50, and 100/0 ratios of browse/buy requests within each user session.

Fig. 4 shows the response times achieved by a VM under different test scenarios for the Silver service.³ The SLA goal is indicated by the horizontal dotted line; our objective is to keep 99.9% of the response times within the SLA. The arrival rate at which response times begin to violate the SLA goal indicates the VM's maximum processing rate $p(\cdot)$ for that application, given that CPU and memory share. For example, Fig. 4 shows that a 6 GHz VM can process approximately 30 Silver requests per second under a 50/50 browse/buy transaction mix before queueing instability occurs. If the VM's CPU share is further constrained, say to 3 GHz, its maximum processing rate decreases and SLA violations occur earlier, at about 20 requests per second.

We obtain $g(\cdot)$ in a similar fashion. Consider, for example, Fig. 4(c) when the arrival rate is 20 requests per second with a 0/100 transaction mix. We assign a CPU share of 3 GHz to the VM under test, holding the two other VMs in the virtual cluster at 6 GHz and stressing them (and the database tier) with the maximum arrival rate that can be tolerated without SLA violations (about 30 requests per second). Now, we start providing 20 requests per second to the VM under test and measure the response time achieved. The experimental data show that 99.9% of requests satisfy a response time of about 1800 ms. The function $g(\cdot)$ will then output a response time of 1800 ms for a VM given this CPU share, workload intensity, and transaction mix.

For the DVD Store and Trade6 applications, workload mixes consisting of more browse requests (database reads) than buy requests (database writes) impose less stress on the system, and these services tolerate higher arrival rates when the workload mix is mostly browse requests. RUBBoS, however, has a large database (about 500,000 users), causing browsing for messages to incur higher response times than posting messages. Therefore, for RUBBoS, a workload of mostly browse requests decreases the maximum tolerated arrival rate. For transaction mixes that lie between the browse/buy mixes shown in Fig. 4, we use linear interpolation to estimate the maximum arrival rates tolerated by the various VMs, given different CPU shares. This interpolation enables the controller to adapt to a variable transaction mix at run time.

³ The profiles for the Gold and Bronze applications are qualitatively similar to that of the Silver application.

Fig. 4. Response times achieved by Trade6, the Silver application, as a function of a VM's CPU share, arrival rate, and workload mix. The shortest response times are achieved with an all-browsing mix.
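The table lookup for $p(\cdot)$, including linear interpolation across transaction mixes, can be sketched as below. The function `interp_rate`, the table layout, and the rates in `TOY_TABLE` are made-up placeholders for illustration, not the measured Trade6 profile; only the idea of profiling at 0/100, 50/50, and 100/0 browse/buy mixes and interpolating in between comes from the text. Because the table stores a rate per profiled mix, the RUBBoS case, where more browsing lowers the tolerated rate, needs no special handling.

```python
from bisect import bisect_right


def interp_rate(table, share_ghz, browse_frac):
    """Max processing rate p(f, m) from a profiled lookup table.

    table maps a CPU share (GHz) to (browse_fraction, rate) pairs sorted by
    browse_fraction. Rates between profiled mixes are linearly interpolated;
    mixes outside the profiled range are clamped to the nearest end point.
    """
    points = table[share_ghz]
    xs = [x for x, _ in points]
    if browse_frac <= xs[0]:
        return points[0][1]
    if browse_frac >= xs[-1]:
        return points[-1][1]
    i = bisect_right(xs, browse_frac)            # xs[i-1] <= browse_frac < xs[i]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (browse_frac - x0) / (x1 - x0)


# Placeholder numbers for illustration only, not the measured Trade6 profile.
TOY_TABLE = {6: [(0.0, 24.0), (0.5, 30.0), (1.0, 36.0)]}
print(interp_rate(TOY_TABLE, 6, 0.75))  # 33.0
```

CPU shares are looked up directly on their discrete grid, so interpolation is needed only along the transaction-mix dimension.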

[Figure: "Power consumption of Dell 1950 and Dell 2950 servers": power in watts versus CPU utilization (%) under a web-driven workload, with operating points for 1 to 5 VMs. Linear fits: Dell 1950, Watts = 208 + 1.21 × CPU (R² = 0.95); Dell 2950, Watts = 224 + 1.16 × CPU (R² = 0.98).]

Fig. 5. The power consumed by two models of Dell servers, the 1950 and 2950, when loaded with VMs hosting an application server. The line is fit from experimentally collected data.

The power consumption of the host machine is also profiled off-line by placing it in the operating states shown in Fig. 5. Using a clamp-style ammeter, we measured the current drawn by the servers as we instantiated VMs. We loaded each VM with an increasing workload intensity before booting the next one. We then multiplied the measured current by the rated wall-supply voltage to compute the power consumption in Watts.

We also measured the power consumed by our servers when booting up, when powering down, and in a standby state, during which only the network card is powered on. The Dell PowerEdge 1950 and 2950 servers consume 218 and 228 Watts, respectively, when booting up, and 213 and 228 Watts, respectively, when powering down. The same machines consume 18 and 20 Watts in a standby state. To determine the cost of operating the host machine during each controller sampling interval $T_s$, the server's power consumption over $T_s$ is multiplied by a dollar cost per kilowatt-hour.
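The sketch below converts a power draw into a dollar cost for one sampling interval. It is only an illustration and rests on three assumptions:

- It uses the linear fits of Fig. 5 as the power model.
- It uses the $0.3/kWh price listed in Fig. 8.
- The function names are hypothetical.

The controller itself ultimately uses a simpler model keyed on the number of active VMs.

```python
# Linear power fits reported in Fig. 5: Watts = intercept + slope * CPU utilization (%).
POWER_FIT = {
    "Dell 1950": (208.0, 1.21),
    "Dell 2950": (224.0, 1.16),
}


def host_power_watts(model: str, cpu_util_pct: float) -> float:
    """Estimate a host's power draw from the Fig. 5 linear fit."""
    intercept, slope = POWER_FIT[model]
    return intercept + slope * cpu_util_pct


def interval_cost_dollars(power_watts: float, ts_minutes: float,
                          price_per_kwh: float = 0.3) -> float:
    """Dollar cost of drawing power_watts for one sampling interval of ts_minutes."""
    energy_kwh = power_watts / 1000.0 * (ts_minutes / 60.0)
    return energy_kwh * price_per_kwh


# A Dell 1950 at about 30% CPU (the observed application-tier peak) over a 2-minute interval.
print(interval_cost_dollars(host_power_watts("Dell 1950", 30.0), ts_minutes=2.0))
```

The same function applies to the measured standby, boot, and shutdown draws (for example, 18 W for a Dell 1950 in standby).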

Inspecting the data in Fig. 5, we can make the following observations regarding the power consumption of the servers, given the underlying applications.

• An idle machine consumes about 70% of the power consumed by a machine running at full CPU utilization. So, to achieve maximum power savings on a lightly loaded machine, it is best to redirect the incoming workload and power down the machine.

• The intensity of the workload directed at the VMs does not significantly affect the power consumption. Our experiments reveal that the Trade6, DVD Store, and RUBBoS applications are not compute intensive. The CPU utilization at the application tier peaks at about 30% before the database tier becomes the bottleneck. As a result, the host machine draws about the same amount of current from the minimum to the maximum arrival rate experienced by the VMs. Our controller therefore uses a simplified model in which power consumption is a function only of the number of active VMs.

The vector $u_i(k)$ to be decided by the controller at sampling time $k$ for each virtual cluster includes:

- $n_i(k) \in \mathbb{Z}^+$, the number of VMs to provision;
- $f_{ij}(k) \in \{3, 4, 5, 6\}$ GHz, the CPU share;
- $\gamma_{ij}(k) \in \mathbb{R}$, the workload fraction to give to the $j$th VM of the cluster;
- $N(k) \in \mathbb{Z}^+$, the number of active hosts;
- a mapping of VMs to host machines.

The size of the virtual cluster $n_i(k)$ may be modified in two ways: by instantiating and shutting down VMs, and by changing the service that a VM provides to the client (e.g., from Gold to Silver, or vice versa). Similarly, the number of operating hosts $N(k)$ may be modified by powering up and shutting down machines. VM migration aids this by enabling the consolidation of VMs onto fewer host machines.

[Figure: finite-state machine with the VM states Off, Wait for Host, Boot, On, and Shut-down. Transitions are labeled with the signals VM_off, VM_wait & Host_boot, VM_boot & Host_on, VM_on, and VM_shutdown.]

| Signal | Description |
|---|---|
| VM wait | VM scheduled for instantiation; host not yet ready |
| VM boot | Instantiate a VM; host is ready |
| VM on | VM instantiated; ready for workload |
| VM shutdown | VM powering down; resources de-allocated |
| VM off | VM is turned off |
| Host boot | Host machine is booting up |
| Host on | Host machine is powered on |

Fig. 6. The finite state machine corresponding to the various operating states of a virtual machine. A VM requires two state transitions to power down, and up to four state transitions to turn on.
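The VM state machine of Fig. 6 can be sketched as a transition function that advances one controller time step at a time. The sketch makes these simplifying assumptions:

- Boot and Shut-down each last exactly one step.
- A VM in the Wait state stays there until its host reports Host_on.

The state names mirror Fig. 6; the function names are hypothetical.

```python
from enum import Enum, auto
from typing import List


class VMState(Enum):
    OFF = auto()
    WAIT = auto()      # VM scheduled for instantiation; host not yet ready
    BOOT = auto()      # instantiating the VM; host is ready
    ON = auto()        # VM ready for workload
    SHUTDOWN = auto()  # VM powering down; resources being de-allocated


def next_vm_state(state: VMState, want_on: bool, host_on: bool) -> VMState:
    """Advance the VM finite-state machine by one controller time step."""
    if state is VMState.OFF:
        if not want_on:
            return VMState.OFF
        return VMState.BOOT if host_on else VMState.WAIT
    if state is VMState.WAIT:
        return VMState.BOOT if host_on else VMState.WAIT  # cycle while host boots
    if state is VMState.BOOT:
        return VMState.ON
    if state is VMState.ON:
        return VMState.ON if want_on else VMState.SHUTDOWN
    return VMState.OFF  # SHUTDOWN completes


def trace(want_on: bool, host_on_per_step: List[bool],
          start: VMState = VMState.OFF) -> List[VMState]:
    """States visited by one VM, given whether its host is on at each step."""
    states = [start]
    for host_on in host_on_per_step:
        states.append(next_vm_state(states[-1], want_on, host_on))
    return states


# Host is off for two steps, then on: Off -> Wait -> Wait -> Boot -> On (four transitions).
print(trace(True, [False, False, True, True]))
```

Powering down from On takes two transitions (On, then Shut-down, then Off), which matches the Fig. 6 caption.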

Control delays present additional challenges that must be modeled within the problem. The CPU share $f_{ij}(k)$ and workload distribution $\gamma_{ij}(k)$ to VMs can be actuated immediately. That is, there is negligible time delay between deciding the control input and realizing its effect on the system.

Realizing the controller-specified number of host machines $N(k)$, however, incurs a delay; for example, about three minutes are required to power up a host machine. Similarly, realizing the number of VMs $n_i(k)$ incurs a delay to power up or shut down virtual machines. The duration of this delay depends on the state of the host machine to which a VM is assigned. Migrating VMs using VMotion and changing applications on a live VM are each assigned a delay of 30 seconds.

To handle control delays, the controller maintains state information for each host machine and VM for every step of the prediction horizon, using the finite-state machine shown in Fig. 6. For example, if the controller decides to turn on a VM, it must first evaluate the state of the host machine to which the VM has been assigned. If that host is currently turned off, the VM transitions from the Off state to the VM wait state, cycling for another time step while the host machine powers up. Then, the VM transitions to the VM boot state. It is ready to process incoming workload when it transitions to the On state. The controller evaluates each state for its


B. The Profit Maximization Problem

Let $x_i(k)$ denote the operating state of the $i$th virtual cluster, and let

$$u_i(k) = \big(N(k),\; n_i(k),\; \{f_{ij}(k)\},\; \{\gamma_{ij}(k)\}\big)$$

be the decision vector at time $k$. The controller estimates the profits generated from time $k+1$ through time $k+h$ as

$$P\big(x(k), u(k)\big) = \sum_{l=k+1}^{k+h} \Big[ \sum_{i=1}^{3} H_i\big(r_i(l)\big) - O\big(u(l)\big) - S\big(\Delta N(l), \Delta n(l)\big) \Big]. \tag{6}$$

Here the revenue $H_i(r_i(k))$ is obtained from the corresponding SLA function $H_i$. This function classifies the response time achieved per transaction as either "satisfies SLA" or "violates SLA," and maps it to a reward or a refund, respectively.

The power-consumption cost incurred in operating $N(k)$ machines is given by $O(k) = \sum_{j=1}^{N(k)} O\big(N_j(k)\big)$. This sums the power-consumption costs $O(N_j)$ incurred by the host machines in their current operating states.

The switching cost incurred by the system due to provisioning decisions is denoted by $S(\Delta N(k), \Delta n(k))$. It accounts for two things: the transient power-consumption costs incurred when powering VMs and their hosts up or down, and the dollar cost incurred when migrating a VM.
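A minimal sketch of evaluating (6) over a horizon follows. It assumes the predicted number of SLA-satisfying and SLA-violating requests per service is already available for each step. It also assumes $H_i$ is applied as a per-request reward or refund. The `StepForecast` structure and the reward/refund values are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StepForecast:
    """Predicted quantities for one step l of the horizon (hypothetical structure)."""
    ok_requests: Sequence[int]         # per service i: requests expected to meet the SLA
    violating_requests: Sequence[int]  # per service i: requests expected to violate it
    power_cost: float                  # O(u(l)), dollars
    switching_cost: float              # S(dN(l), dn(l)), dollars


def horizon_profit(steps: Sequence[StepForecast],
                   reward: Sequence[float], refund: Sequence[float]) -> float:
    """Eq. (6): sum over l of [sum_i H_i(r_i(l)) - O(u(l)) - S(dN(l), dn(l))]."""
    total = 0.0
    for s in steps:
        # Two-level SLA function H_i: reward per satisfied request, refund per violation.
        revenue = sum(ok * rw - bad * rf for ok, bad, rw, rf in
                      zip(s.ok_requests, s.violating_requests, reward, refund))
        total += revenue - s.power_cost - s.switching_cost
    return total


step = StepForecast([100, 100, 100], [1, 0, 0], power_cost=5.0, switching_cost=1.0)
print(horizon_profit([step, step], reward=[1.0] * 3, refund=[2.0] * 3))
```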

Our experiments indicate that migrating a VM takes much less than 30 seconds, from the time the migration request is issued to the time the state transfer is complete, and during that brief period, approximately 10% of the requests directed at this VM violate their SLA. So, the cost of migrating a VM is the refund paid to 10% of the estimated number of requests arriving during the migration interval. In terms of the latency incurred, migration is preferable to the alternative approach for VM consolidation that involves shutting down a VM on a machine and instantiating a new one on another machine, especially since the local cache of the new VM needs to be warmed up.
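The migration cost described above is a simple product, sketched below. Three inputs are assumptions for illustration only:

- the arrival rate at the migrating VM;
- the per-request refund;
- the migration interval, taken as the 30-second migration delay of Fig. 8.

```python
def migration_cost(arrival_rate_per_s: float, migration_s: float,
                   refund_per_request: float, violation_frac: float = 0.10) -> float:
    """Refund owed for the ~10% of requests expected to violate the SLA during a migration."""
    expected_requests = arrival_rate_per_s * migration_s
    return violation_frac * expected_requests * refund_per_request


# Hypothetical: 50 requests/s at the VM, a 30-second interval, and a $0.02 refund per violation.
print(migration_cost(50.0, 30.0, 0.02))
```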

Due to the above-mentioned costs, excessive switching of hosts or VMs caused by workload variability may actually increase SLA violations and reduce profits. Therefore, we convert the profit-generation function in (6) to a corresponding utility function. This function quantifies a controller's preference between different provisioning decisions in such a risky environment.

We augment the estimated environment input $\hat{\lambda}(k)$ with an uncertainty band $\hat{\lambda}(k) \pm \varepsilon(k)$. Here $\varepsilon(k)$ denotes the past observed error between the actual and forecasted arrival rates, averaged over a window. For each control input, the next-state equation in (2) must now consider three possible arrival-rate estimates: $\hat{\lambda}(k) - \varepsilon(k)$, $\hat{\lambda}(k)$, and $\hat{\lambda}(k) + \varepsilon(k)$. Together these form a set of possible future states $X(k)$ that the system may enter. Given $X(k)$, we obtain the corresponding set of profits generated by these states as $P(X(k), u(k))$ and define the quadratic utility function

$$U\big(P(\cdot)\big) = A\,\bar{u}\big(P(\cdot)\big) - \beta\Big(\nu\big(P(\cdot)\big) + \bar{u}\big(P(\cdot)\big)^2\Big), \tag{7}$$

where:

- $A > 2\,\bar{u}(P(\cdot))$ is a constant;
- $\bar{u}(P(\cdot))$ is the algebraic mean of the estimated profits;
- $\nu(P(\cdot))$ is the corresponding variance;
- $\beta \in \mathbb{R}$ is a risk-preference factor. The data-center operator can tune it to achieve the desired controller behavior, from risk averse (β > 0), to risk neutral (β = 0), to risk seeking (β < 0).

The utility function in (7) is an instance of the mean-variance model with constant risk aversion [14]. This model is used in stock portfolio management, where the problem is to allocate funds among multiple shares so as to maximize the expected return while reducing the corresponding variance (or risk).
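Utility (7) can be evaluated over the three arrival-rate scenarios as sketched below. Two points are assumptions:

- The paper does not state whether $\nu$ is the population or the sample variance; this sketch uses the population variance of the three scenario profits.
- The profit functions and constants in the example are hypothetical.

```python
from statistics import fmean, pvariance
from typing import Callable


def risk_aware_utility(profit_of: Callable[[float], float], lam_hat: float,
                       eps: float, beta: float, A: float) -> float:
    """Eq. (7) over the arrival-rate scenarios lam_hat - eps, lam_hat, lam_hat + eps."""
    profits = [profit_of(lam_hat - eps), profit_of(lam_hat), profit_of(lam_hat + eps)]
    mean = fmean(profits)
    var = pvariance(profits)  # population variance of the three scenario profits
    if A <= 2 * mean:
        raise ValueError("Eq. (7) requires A > 2 * mean profit")
    return A * mean - beta * (var + mean ** 2)


def flat(lam: float) -> float:
    """Profit that does not depend on the arrival rate."""
    return 10.0


def sensitive(lam: float) -> float:
    """Same mean profit as flat() around lam = 100, but higher variance."""
    return 10.0 + (lam - 100.0)


for beta in (-2.0, 0.0, 2.0):
    print(beta, risk_aware_utility(flat, 100.0, 5.0, beta, A=100.0),
          risk_aware_utility(sensitive, 100.0, 5.0, beta, A=100.0))
```

Both options have the same mean profit. With β = 2 the low-variance option scores higher, with β = −2 the high-variance option scores higher, and with β = 0 they tie. This matches the intuition given next.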

Intuitively, we can explain (7) as follows. Given a choice between two operating states with equal mean profit but different variance, a risk-averse controller will choose the state with the smaller variance. For example, when the incoming workload is highly variable, a risk-averse controller will tend to provision for the mean arrival rate. A risk-seeking controller, by contrast, will favor states with higher variance, provisioning resources optimistically in case either the highest or the lowest possible arrival rate occurs.

Given the utility function in (7), we formulate the resource provisioning problem as one of utility maximization:

$$\text{Compute:}\quad \max_{u} \sum_{l=k+1}^{k+h} U\big(P(X(l), u(l)), u(l)\big) \tag{8}$$

$$\text{Subject to:}\quad N(l) \le 5, \qquad n_i(l) \ge K_{\min},\; i = 1,2,3,$$

$$\sum_{j=1}^{n_i(l)} \gamma_{ij}(l) = 1,\; i = 1,2,3, \qquad \text{and} \qquad \sum_{i=1}^{3} \sum_{j=1}^{n_i(l)} e_{ijz}(l)\, f_{ij}(l) \le F^{\max}_{z},\; z = 1, \dots, 5,$$

where $h$ denotes the prediction-horizon length. The constraints work as follows:

- $N(l) \le 5$ ensures that the number of operating servers never exceeds the total number of application-tier servers in the testbed.
- $n_i(l) \ge K_{\min}$ forces the controller to operate at least $K_{\min}$ VMs in each cluster at all times, to accommodate a sudden spike in request arrivals (including those caused by flash crowds). In our experiments, $K_{\min}$ is set to one.
- The third constraint requires the workload fractions within each cluster to sum to one.
- The final constraint uses a decision variable $e_{ijz}(l) \in \{0,1\}$, which indicates whether the $j$th VM of the $i$th application is allocated to host $z \in [1,5]$. It ensures that the cumulative CPU share given to the VMs on host $z$ does not exceed $F^{\max}_z$, the maximum capacity available on that host.
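A candidate decision can be checked against the constraints of (8) as sketched below. The encoding is hypothetical: each VM is a tuple (host index, CPU share in GHz, workload fraction), which makes the indicator $e_{ijz}$ implicit. Hosts are indexed from 0 rather than 1. The 12 GHz per-host capacity in the example is an assumption, since $F^{\max}_z$ is not given here.

```python
from typing import Dict, List, Sequence, Tuple

# (host index z, CPU share f_ij in GHz, workload fraction gamma_ij)
Placement = Tuple[int, float, float]


def is_feasible(n_hosts: int,
                vms: Dict[int, List[Placement]],
                host_capacity_ghz: Sequence[float],
                k_min: int = 1,
                max_hosts: int = 5,
                tol: float = 1e-9) -> bool:
    """Check one time step's decision against the constraints listed under Eq. (8)."""
    if n_hosts > max_hosts:                                     # N(l) <= 5
        return False
    used = [0.0] * len(host_capacity_ghz)
    for placements in vms.values():
        if len(placements) < k_min:                             # n_i(l) >= K_min
            return False
        if abs(sum(g for _, _, g in placements) - 1.0) > tol:   # sum_j gamma_ij = 1
            return False
        for z, f, _ in placements:                              # e_ijz = 1 for this host
            used[z] += f
    # sum_i sum_j e_ijz * f_ij <= F_z^max for every host z
    return all(u <= cap + tol for u, cap in zip(used, host_capacity_ghz))


caps = [12.0] * 5  # hypothetical per-host capacity in GHz
ok = {1: [(0, 6.0, 0.5), (1, 6.0, 0.5)], 2: [(0, 6.0, 1.0)], 3: [(1, 3.0, 1.0)]}
print(is_feasible(2, ok, caps))
```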

C. The LLC Implementation

We have implemented a centralized controller to decide the control vector for the system at each sampling instant. Given a system state $x(k)$ at time $k$ and the estimated workload arrival rate for time $k+1$, the controller first generates a set of valid next states (or system configurations). These are defined in terms of the number of hosts and VMs to power on, the allocation of VMs to hosts, and the CPU share to VMs. Each state is assigned a utility value following (7).

When lookahead control techniques are applied to real-world systems, the state space is typically quite large. Therefore, to achieve real-time control of such systems, heuristics are often used to limit the search process [12].

In our case, the controller first generates the set of valid next states for time $k+1$. It then does not explore any additional configurations within the prediction horizon from time steps $k+2$ to $k+h$. Instead, it simply lets the configurations generated at time $k+1$ evolve within the prediction horizon, as shown in Fig. 7.

Note that some of the states generated at time $k+1$ must evolve within the horizon before their final impact can be evaluated. For example, consider a system configuration, generated at time $k+1$, that requires a server to be powered up and then a VM to be started on that server. We need to look ahead at least five minutes (based on the time delays assumed in Section IV) to evaluate the impact of this control decision on system performance. If the sampling time $T_s$ is set to two minutes, the controller needs to look ahead at least three time steps to fully evaluate the impact of the decision on both energy savings and SLA violations.

[Figure: the trajectory $x(k)$, $\hat{x}(k+1)$, $\hat{x}(k+2)$, $\hat{x}(k+3)$, …, $\hat{x}(k+h)$ explored over the prediction horizon $[k+1, k+h]$.]

Fig. 7. The trajectory explored by the controller within a prediction horizon of length h. The shaded states show the best trajectory chosen by the controller.
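The search heuristic can be sketched as follows. Only the decision at $k+1$ is varied; each candidate is then held fixed while the state evolves to $k+h$, and the candidate with the largest summed utility wins. Several parts are hypothetical:

- the `evolve` and `utility` callbacks;
- the toy numbers in the example;
- the use of a single utility per step, rather than (7) over uncertain states.

```python
from typing import Any, Callable, Iterable, Tuple


def llc_choose(x_k: Any,
               candidates: Iterable[Any],
               evolve: Callable[[Any, Any, int], Any],
               utility: Callable[[Any, int], float],
               h: int) -> Tuple[Any, float]:
    """Return the k+1 configuration whose held trajectory has the largest summed utility.

    From k+2 to k+h the configuration is not changed; the state merely evolves
    (for example, pending host or VM boots complete).
    """
    best_cfg, best_val = None, float("-inf")
    for cfg in candidates:
        x, total = x_k, 0.0
        for step in range(1, h + 1):
            x = evolve(x, cfg, step)    # predicted state at time k + step
            total += utility(x, step)
        if total > best_val:            # strict '>' keeps the first of tied options
            best_cfg, best_val = cfg, total
    return best_cfg, best_val


# Toy example: one VM is live, demand calls for three, and a newly
# requested VM only comes online two steps after the decision.
DEMAND = 3


def evolve(live: int, target: int, step: int) -> int:
    return target if step >= 2 else min(live, target)


def utility(live: int, step: int) -> float:
    return 10 * min(live, DEMAND) - 3 * live  # served demand minus power cost


print(llc_choose(1, range(1, 5), evolve, utility, h=1))
print(llc_choose(1, range(1, 5), evolve, utility, h=3))
```

With h = 1 every candidate looks identical, because no new VM has come online yet. With h = 3 the controller sees the benefit of scaling to three VMs. This illustrates why the horizon must cover the actuation delays.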

IV. EXPERIMENTAL RESULTS

This section presents results for various LLC realizations, evaluated on their ability to maximize the profit generated by the testbed in Fig. 1 over a 24-hour period. According to (6), profit is maximized when both energy consumption and SLA violations are minimized. So, we evaluate the controller using two criteria:

1. the energy savings achieved over an ‘uncontrolled system’ in which all machines remain in a powered-on state;
2. the number of SLA violations incurred by the controlled system.

In the uncontrolled system, the initial configuration of three 6 GHz VMs for each service is unchanged over the 24-hour period. This configuration ensures that each service meets its SLA goal under the worst-case workload arrival rate. The worst case is a 0/100 browse/buy transaction mix for the DVD Store and Trade6 applications, and a 100/0 mix for the RUBBoS application.

Fig. 8 shows the system and controller parameters used in our experiments. The measured times are as follows:

- powering up a server takes about 2 min. 30 sec.;
- booting a VM and initializing an application on it takes about 1 min. 45 sec.;
- executing the control kernel takes about two seconds.

We set the controller sampling time to two minutes. Therefore, to evaluate a control decision that involves powering up a host machine and booting a VM on that host, the controller must look ahead at least three sampling intervals. That is, the minimum lookahead horizon length is three steps.

The time needed to turn off a VM includes the time to shut down the VM and de-allocate its resources on the host machine. Requests belonging to existing sessions on the de-allocated VM are re-routed to other live VMs hosting the same application.

| Parameter | Value |
|---|---|
| Cost per kWatt hour | $0.3 |
| Time delay to power on a VM | 1 min. 45 sec. |
| Time delay to power off a VM | 45 sec. |
| Time delay of a VM migration | 30 sec. |
| Time delay to power on a host | 2 min. 55 sec. |
| Time delay to power off a host | 1 min. 30 sec. |
| Prediction horizon | ≥ 3 steps |
| Control sampling period | 2 min. |
| Initial config. for each service | 3 VMs @ 6 GHz each |

Fig. 8. The system and controller parameters used in the experiments.
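The minimum horizon follows from a ceiling division of the chained actuation delays by the sampling period. The sketch uses the Fig. 8 values. It assumes the host power-up and VM boot happen back to back and must both complete inside the horizon. Using the 2 min. 30 sec. power-up time quoted in the text instead gives the same result.

```python
import math
from typing import Sequence


def min_lookahead_steps(delays_s: Sequence[float], ts_s: float) -> int:
    """Smallest number of sampling intervals that spans a chain of sequential delays."""
    return math.ceil(sum(delays_s) / ts_s)


HOST_ON_S = 2 * 60 + 55   # Fig. 8: time delay to power on a host
VM_ON_S = 1 * 60 + 45     # Fig. 8: time delay to power on a VM
TS_S = 2 * 60             # Fig. 8: control sampling period

# (175 + 105) s / 120 s = 2.33, so the controller needs at least 3 steps.
print(min_lookahead_steps([HOST_ON_S, VM_ON_S], TS_S))
```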

[Figure: (a) New session arrivals to three online applications, Gold (DVD Store), Silver (Trade6), and Bronze (RUBBoS), versus time instance in 30-second increments. The Kalman filter estimate for time k+1 is shown as a dotted grey line. (b) Distribution of requests per session, as frequency per 10,000 sample sessions.]

Fig. 9. (a) Workload showing new session arrivals to the Gold, Silver, and Bronze applications, plotted at 30-second intervals. The predictions provided by the Kalman filter for the Silver application are superimposed on the original trace. (b) Distribution of the number of requests per session for a sample of 10,000 sessions. The number of requests follows a long-tailed Pareto distribution up to a maximum of 20 requests per session.

| Performance metric | β = −2 | β = 0 | β = 2 |
|---|---|---|---|
| Avg. energy savings | 37% | 41% | 41% |
| Avg. SLA violations | 4,868 | 4,142 | 3,250 |
| Avg. host switching activity | 76 | 66 | 60 |
| Avg. VM migrations | 81 | 58 | 48 |
| Avg. session re-routes | 57 | 47 | 40 |

Fig. 10. Control performance for the six workloads over a 24-hour period, when operating in the risk-seeking, risk-neutral, and risk-averse regimes.

The control kernel is written in Matlab, compiled to a C-language executable, and invoked every two minutes. The workload generator sends browse and buy/sell/post requests to the three services. A scheduler triggers it to start new sessions within the system every 30 seconds. This is the time granularity of the World Cup ’98 (WC’98) workload traces used to synthesize our workload [15]. The traces have the desirable characteristics of burstiness and variability for stressing web applications, as seen in Fig. 2(a). The scheduler interprets each data point in Fig. 9(a) as the number of new sessions arriving during a 30-second interval, similar to an open-loop workload.

The session length, in terms of the number of requests per session, follows the long-tailed distribution shown in Fig. 9(b), which is typical of web-based workloads [16]. The distribution has a mean of five requests per session and ranges from a minimum of two requests per session to a maximum of 20. The workload generator randomly selects a value from this distribution as the number of requests for each session it schedules. The time to issue the first request of each new session is uniformly distributed within the 30-second period. The remaining requests are issued after a four-second user think time following the receipt of each response, as in a closed-loop workload.

The combination of open-loop user arrivals and departures with closed-loop sessions forms a partly-open workload. Schroeder et al. [17] advocate this as a realistic model for transactional applications.
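The partly-open generator can be sketched with Python's standard `random` module. The sketch makes these assumptions:

- The Pareto shape `alpha` is an assumed value; only the 2 to 20 range and a mean of about five are reported, so it would need tuning to match.
- Truncation is done by resampling.
- The constant response time used for the closed-loop part is hypothetical.

```python
import random
from typing import List, Tuple

INTERVAL_S = 30.0      # time granularity of the WC'98 trace
THINK_TIME_S = 4.0     # user think time after each response (closed loop)
MIN_REQ, MAX_REQ = 2, 20


def session_length(rng: random.Random, alpha: float = 1.5) -> int:
    """Long-tailed (Pareto) request count, truncated to [MIN_REQ, MAX_REQ] by resampling."""
    while True:
        n = int(MIN_REQ * rng.paretovariate(alpha))  # paretovariate() returns values >= 1
        if n <= MAX_REQ:
            return n


def schedule_interval(new_sessions: int, t0: float,
                      rng: random.Random) -> List[Tuple[float, int]]:
    """Open-loop part: (first-request time, request count) for each session in one trace point."""
    return [(t0 + rng.uniform(0.0, INTERVAL_S), session_length(rng))
            for _ in range(new_sessions)]


def request_times(first_at: float, n_requests: int, response_s: float) -> List[float]:
    """Closed-loop part: each request follows the previous response plus the think time."""
    times = [first_at]
    for _ in range(n_requests - 1):
        times.append(times[-1] + response_s + THINK_TIME_S)
    return times


rng = random.Random(42)
for first_at, n in schedule_interval(3, 0.0, rng):
    print(n, request_times(first_at, n, response_s=0.5))
```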

Using the workload generation scheme described above, we synthesized six different workload sets, termed WL A through WL F, for our experiments. Future session arrivals are predicted by a Kalman filter and sent to the controller. The filter trains itself on a small portion of the workload (the first 40 time steps); once trained, it provides effective estimates. Fig. 9(a) shows the Kalman-filter predictions for the Silver application, superimposed on the original trace. The average absolute error between the predicted and actual values is about 5% for the one-step-ahead estimate. It increases by about 1% for each subsequent estimate within the prediction horizon.
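The paper does not specify the filter's state model. The sketch below therefore assumes a scalar random-walk ("local level") Kalman filter, initialized on the first 40 points. Splitting the variance of first differences equally between process and measurement noise is a crude heuristic of ours, not the authors' method.

```python
from statistics import pvariance
from typing import List, Sequence


class LocalLevelKalman:
    """Scalar Kalman filter for a random-walk model of session arrivals.

    State:       x_k = x_{k-1} + w_k,  w_k ~ N(0, q)
    Observation: z_k = x_k + v_k,      v_k ~ N(0, r)
    """

    def __init__(self, q: float, r: float, x0: float, p0: float = 1.0) -> None:
        self.q, self.r, self.x, self.p = q, r, x0, p0

    def predict(self) -> float:
        """Forecast of the next observation; for a random walk it equals the current level."""
        return self.x

    def update(self, z: float) -> None:
        p_prior = self.p + self.q             # time update of the error variance
        gain = p_prior / (p_prior + self.r)   # Kalman gain
        self.x += gain * (z - self.x)         # measurement update of the level
        self.p = (1.0 - gain) * p_prior


def fit_on_training(trace: Sequence[float], n_train: int = 40) -> LocalLevelKalman:
    """Initialize on the first n_train points (the noise split is a crude assumption)."""
    window = list(trace[:n_train])
    diffs = [b - a for a, b in zip(window, window[1:])]
    var_d = pvariance(diffs) or 1.0   # under the model, Var(z_k - z_{k-1}) = q + 2r
    kf = LocalLevelKalman(q=var_d / 3.0, r=var_d / 3.0, x0=window[0])
    for z in window[1:]:
        kf.update(z)
    return kf


def one_step_forecasts(trace: Sequence[float], n_train: int = 40) -> List[float]:
    """Forecast each point after the training window using only the points before it."""
    kf = fit_on_training(trace, n_train)
    preds = []
    for z in trace[n_train:]:
        preds.append(kf.predict())
        kf.update(z)
    return preds
```

Under a random-walk model, the k+2 to k+h forecasts equal the k+1 forecast while their uncertainty grows. This is qualitatively consistent with the error growth reported above.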

[Figure: Virtual and host machine switching activity, one-hour snapshot (time instances 500 to 530) for Workload 1. Shows the number of VMs in the live state (0 to 4) on each host: Apollo, Chronos, Demeter, Eros, and Poseidon.]

Fig. 11. VM and host machine switching activity over a one-hour period for WL A.

When the controller decides to shut down a VM, new sessions are not sent to it. Requests belonging to existing sessions on that VM are re-routed to other VMs hosting the same application. The scheduler instructs the workload generator where to re-route future requests belonging to the existing sessions. Since session lengths follow a long-tailed distribution, only a few long sessions will linger, typically on the order of a few minutes, so the burden on the ‘backup’ VMs will not be significant. We use a simple round-robin scheme for re-routing sessions onto the backup VMs.

We also use a clustered configuration for our IBM WebSphere and Apache Tomcat installations, which enables sessions to be replicated across all live instances of the application servers. Session replication ensures that state information for each session is shared by all application servers in the cluster via peer-to-peer communication. This allows requests belonging to a session to be re-routed between VMs with no interruption in service to the end user.
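Round-robin re-routing takes only a few lines with `itertools.cycle`. Because session state is replicated, any live backup VM can accept any session. The session and VM identifiers below are hypothetical.

```python
from itertools import cycle
from typing import Dict, List


def reroute_sessions(session_ids: List[str], backup_vms: List[str]) -> Dict[str, str]:
    """Assign sessions from a de-allocated VM to live backup VMs in round-robin order."""
    if not backup_vms:
        raise ValueError("at least one live backup VM is required")
    targets = cycle(backup_vms)
    return {sid: next(targets) for sid in session_ids}


print(reroute_sessions(["s1", "s2", "s3"], ["vm-a", "vm-b"]))
```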

A. Performance of Risk-Aware Control

This section examines the effect of tuning the risk-preference parameter β on controller performance. Our objective is to identify a value for β that maximizes energy savings and minimizes SLA violations. We start by identifying the regime in which the controller performs best: risk seeking (β < 0), risk neutral (β = 0), or risk averse (β > 0).

Fig. 10 compares the controller performance in the three regimes, over six workloads, using the following metrics:

- the energy savings achieved over an uncontrolled system during a 24-hour period;
- the number of SLA violations;
- the host machine switching activity⁴;
- the number of live VM migrations;
- the number of sessions re-routed from a de-allocated VM to backup VMs.

The workload for the three online services was synthesized using three 24-hour WC’98 traces having the same burstiness and workload variability seen earlier in Fig. 2(a).

Our experiments indicate that a risk-averse controller with β = 2 outperforms a risk-seeking controller in energy savings and SLA violations, and reduces switching activity. The energy savings average about 41% when compared to an uncontrolled system over a 24-hour period. The number of SLA violations remains very low, about 0.01% of the total number of requests handled by the system.

Fig. 11 shows the number of live VMs on each host machine, as allocated by the controller at each time instance, over a one-hour window for workload WL A. A host showing zero VMs has been powered off. While switching activity enables dynamic consolidation and energy conservation, it also involves power-cycling of servers, which can reduce the lifespan and reliability of the hardware. However, demand for energy-efficient controls is increasing. Server models capable of supporting ‘standby’ and ‘hibernate’ states (similar to laptop and desktop models) are expected to continue to emerge on the market, making switching activity less of a reliability concern.

In all experiments, on average, 99.99% of the measured response times for all three applications consistently meet their SLA goals under the control scheme. Our experiments indicate that even in an adequately provisioned system without dynamic control, about 0.01% or more of requests violate the SLA. This is due to the shape of the distribution and to background processes such as Java garbage collection. Therefore, the proactive control scheme has no more negative effect on performance than can be expected of an uncontrolled system with normal variations.

⁴ Switching activity is defined as the number of times servers are powered

[Figure: (a) Number of VMs assigned to the Gold service and (b) CPU share assigned to the Gold service over time, for three risk-preference regimes (risk seeking, β = −2; risk neutral, β = 0; risk averse, β = 2) under workload WL_A.]

Fig. 12. (a) The effect of the risk-preference regime on VM switching activity for Workload WL A. As β increases, the controller re-assigns VMs less frequently. (b) The effect of β on CPU share. As β increases, the controller re-distributes CPU share less frequently in accordance with the VM assignment.

Fig. 12 illustrates the switching activity of the controller for VM assignment and CPU share to the Gold service under workload WL A. The lower energy savings of the risk-seeking controller are due to its optimistic switching activity, which can reduce power savings under a noisy workload. The risk-averse controller, on the other hand, differs from both the risk-seeking and risk-neutral controllers in two ways. It reduces the overall switching activity and, even more importantly, it differs in when switching decisions are made. Thus, we conclude that a risk-averse controller performs best under a noisy, time-varying workload.

Next, we tune the value of β for a risk-averse control implementation. Fig. 13 shows the performance of risk-neutral (β = 0) and risk-averse (β > 0) controllers in terms of the energy savings achieved over an uncontrolled system during a 24-hour period. Although the energy savings decrease slightly, the number of SLA violations drops noticeably from β = 0 to β = 2, an average reduction of about 30% compared to the risk-neutral case. This is due to the conservative manner in which the risk-averse controller switches machines. Around β = 4, however, the SLA violations begin to increase as the controller becomes overly risk-averse and begins to shun useful provisioning decisions. We conclude that a β value of two is sufficient to save energy and keep SLA violations to a minimum while reducing switching activity.

[Figure: Energy savings (%) over the uncontrolled system versus β (0 to 5) for six workloads (WL_A to WL_F), 3-step lookahead controller.]

Fig. 13. The effect of β on the energy savings achieved for the six workloads. As β increases, the energy savings and the number of SLA violations decrease slightly due to more conservative switching activity, up to a point.

| Workload | ES (h = 3) | SLA (h = 3) | ES (h = 4) | SLA (h = 4) | ES (h = 5) | SLA (h = 5) |
|---|---|---|---|---|---|---|
| WL A | 44% | 1,263 | 42% | 3,330 | 36% | 4,024 |
| WL B | 36% | 1,335 | 32% | 1,573 | 34% | 1,807 |
| WL C | 42% | 1,874 | 42% | 1,425 | 42% | 1,947 |
| WL D | 42% | 2,311 | 44% | 2,986 | 40% | 5,734 |
| WL E | 42% | 8,818 | 42% | 8,765 | 41% | 10,056 |
| WL F | 40% | 5,899 | 39% | 6,142 | 34% | 25,346 |
| Avg. | 41% | 3,583 | 40% | 4,036 | 38% | 8,152 |

Fig. 14. Energy savings (ES) achieved and SLA violations (SLA) incurred by a risk-averse controller (β = 2) with increasing horizon length.

B. Effect of Tuning Prediction Horizon Length

Having established a best-performing value of β, we now study the effect of tuning the prediction-horizon length h on the performance of a risk-averse controller with β = 2. The energy savings in Fig. 14 remain relatively constant when h is three or four, and decrease slightly when h is five. The average number of SLA violations increases slightly from h = 3 to h = 4 (from 3,583 to 4,036). It then roughly doubles from h = 4 to h = 5 (to 8,152). Therefore, we conclude that a prediction horizon of three steps is sufficient for good performance in the general case.

Longer horizons can be explored at very little additional computational cost: execution times for h > 5 remain under two seconds. However, the prediction error typically increases with horizon length, and the probability of the controller making errant control decisions increases correspondingly. Fig. 15(a) shows the increase in SLA violations as we extend the horizon further into the future, up to h = 8. By contrast, if the controller had perfect knowledge of future workload arrivals, the SLA violations would generally decrease as h increases, as shown in Fig. 15(b).

[Figure: Number of SLA violations for six workloads (WL_A to WL_F) versus lookahead horizon length (3 to 8). (a) With Kalman-filter predictions; (b) with an “oracle” controller.]

Fig. 15. (a) Effect of horizon length on SLA violations when β = 2. The violations show an increasing trend with horizon length due to the inaccuracies in workload forecasting. (b) The effect of horizon length on SLA violations when the controller has perfect knowledge of the future. Here β = 0 since there are no prediction errors. The SLA violations generally show a decreasing trend with horizon length.

C. Effect of User-Defined Policies

Excessive power cycling of a server can reduce its reliability, leading to software errors and disk failures. To limit repeated power cycling, a system administrator may wish to specify a policy to leave a server powered on for a minimum time period once it is switched on. Such a policy acts as an additional operating constraint on the controller. We examined two policies: one leaves a server on for a minimum of 30 minutes once it is switched on, and the other for a minimum of 60 minutes. These policies reduce the number of SLA violations by 44% or more, by cutting switching activity by 58% or more, at the expense of a 2 to 4% decrease in power savings.
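A minimum on-time policy can be enforced as a filter on candidate configurations before they are scored. In this sketch the host names and timestamps are hypothetical. Hosts with no recorded power-on time are treated as having been on long enough, which is an assumption.

```python
from typing import Dict, Set


def can_power_off(host: str, now_min: float, powered_on_at: Dict[str, float],
                  min_on_min: float = 30.0) -> bool:
    """Policy check: a host may be switched off only after min_on_min minutes of uptime."""
    started = powered_on_at.get(host, float("-inf"))  # unknown hosts: assume long uptime
    return now_min - started >= min_on_min


def respects_min_on_time(candidate_on: Set[str], current_on: Set[str], now_min: float,
                         powered_on_at: Dict[str, float], min_on_min: float = 30.0) -> bool:
    """Reject a candidate configuration that would turn off a host too soon."""
    turned_off = current_on - candidate_on
    return all(can_power_off(h, now_min, powered_on_at, min_on_min) for h in turned_off)


on_at = {"host1": 0.0, "host2": 50.0}
current = {"host1", "host2"}
print(respects_min_on_time({"host2"}, current, 60.0, on_at))  # turns off host1 (60 min up)
print(respects_min_on_time({"host1"}, current, 60.0, on_at))  # turns off host2 (10 min up)
```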

D. Time-Varying Transaction Mix

[Figure: (a) Number of new session arrivals and transaction mixes (browse/buy) to the three online applications, Gold (G), Silver (S), and Bronze (B), over 24 hours. The annotated mixes are: G=50/50, S=50/50, B=50/50; G=83/17, S=17/83, B=33/67; G=17/83, S=50/50, B=67/33; G=0/100, S=83/17, B=17/83; G=83/17, S=83/17, B=33/67. (b) Sampled system performance.]

Fig. 16. (a) Time-varying transaction mixes for the three services over a 24-hour period. (b) A sampling of the performance of a risk-averse (β = 2) controller with horizon length h = 3 and a policy to leave a host machine on for at least 30 minutes.

Over a 24-hour run of the controller, we vary the transaction mix of the workload every four hours to measure the effect on response times and SLA violations. The changing transaction mixes, composed of browse/buy (or post) requests, are shown in Fig. 16(a). We use a three-step lookahead, risk-averse controller (β = 2) with a policy to leave servers powered on for a minimum of 30 minutes.

Using the system model captured in Fig. 4, we use linear interpolation to estimate the maximum processing rate achieved by a VM for any transaction mix, given some CPU share. If the transaction mix in Fig. 16(a) is dominated by browse requests, we interpolate between an all-browse mix and a 50/50 mix. If the transaction mix is mostly buy (or post) requests, we interpolate between an all-buy (post) mix and a 50/50 mix.

Fig. 16(b) shows the system performance over two selected intervals. The Bronze application, RUBBoS, shows higher response times for transaction mixes dominated by browse requests, due to its large database. The Gold and Silver applications have higher response times for transaction mixes dominated by purchase requests.
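The piecewise interpolation rule just described is sketched below. The three profiled rates would be read from Fig. 4 for the VM's current CPU share; the numbers in the example are hypothetical.

```python
def max_arrival_rate(browse_frac: float, rate_all_browse: float,
                     rate_50_50: float, rate_all_buy: float) -> float:
    """Interpolate a VM's maximum tolerated arrival rate for a given transaction mix.

    browse_frac = 1.0 is an all-browse mix and 0.0 an all-buy (or all-post) mix.
    The three rates are profiled values for the same CPU share.
    """
    if not 0.0 <= browse_frac <= 1.0:
        raise ValueError("browse_frac must lie in [0, 1]")
    if browse_frac >= 0.5:                    # browse-dominated: 50/50 to 100/0
        w = (browse_frac - 0.5) / 0.5
        return rate_50_50 + w * (rate_all_browse - rate_50_50)
    w = (0.5 - browse_frac) / 0.5             # buy-dominated: 50/50 to 0/100
    return rate_50_50 + w * (rate_all_buy - rate_50_50)


# Hypothetical profile: 100 req/s all-browse, 80 req/s at 50/50, 40 req/s all-buy.
print(max_arrival_rate(0.83, 100.0, 80.0, 40.0))
```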

E. Effect of Prediction Errors

Finally, in an uncertain operating environment, controller decisions are not optimal. The controller does not have perfect knowledge of future environment inputs, and its decisions are made over a limited prediction horizon. Our final series of tests compares the practical controller implementation against an “oracle” with perfect knowledge of future workload arrivals. We selected the best-performing risk-averse, three-step lookahead controller with β = 2 and compared it against an oracle over the same prediction-horizon length.

Fig. 17 shows that having perfect knowledge of the future does not greatly affect energy savings. However, SLA violations are reduced by an average of 27%. The improved performance of the oracle is expected, considering that the error in the Kalman estimates for typical workloads starts at 5% and increases by 1% for each prediction step thereafter.

| Workload | ES (Kalman predictions) | SLA (Kalman predictions) | ES (oracle) | SLA (oracle) | % reduction in violations |
|---|---|---|---|---|---|
| WL A | 42% | 1,263 | 44% | 1,105 | 13% |
| WL B | 32% | 1,335 | 38% | 1,342 | −1% |
| WL C | 42% | 1,874 | 36% | 1,456 | 22% |
| WL D | 44% | 2,311 | 37% | 1,469 | 36% |
| WL E | 42% | 8,818 | 42% | 7,270 | 18% |
| WL F | 39% | 5,899 | 35% | 3,034 | 47% |
| Avg. | 41% | 3,583 | 39% | 2,613 | 27% |

Fig. 17. Energy savings (ES) and SLA violations (SLA) for a three-step lookahead, risk-averse (β = 2) controller using workload predictions with and without errors. The average energy saving is about the same with and without perfect workload predictions. However, perfect predictions reduce SLA violations by an average of 27% relative to the Kalman-filter predictions.
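As a quick arithmetic check, the 27% figure is the reduction in the average violation count across the six workloads. It is not the mean of the per-workload percentages, which would be lower. The counts below are copied from Fig. 17.

```python
# SLA violations per workload, copied from Fig. 17.
kalman = {"WL A": 1263, "WL B": 1335, "WL C": 1874, "WL D": 2311, "WL E": 8818, "WL F": 5899}
oracle = {"WL A": 1105, "WL B": 1342, "WL C": 1456, "WL D": 1469, "WL E": 7270, "WL F": 3034}


def pct_reduction(baseline: float, improved: float) -> float:
    """Percentage reduction of `improved` relative to `baseline`."""
    return 100.0 * (baseline - improved) / baseline


avg_kalman = sum(kalman.values()) / len(kalman)
avg_oracle = sum(oracle.values()) / len(oracle)
print(round(avg_kalman), round(avg_oracle), round(pct_reduction(avg_kalman, avg_oracle)))
# prints: 3583 2613 27
```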

V. CONTROLLER SCALABILITY

This section uses trace-based simulations to evaluate techniques aimed at improving the controller's scalability for server clusters larger than our testbed. First, we parallelize the search process by developing a multi-threaded controller. The overall state space is decomposed into sub-spaces, and individual threads determine the optimal trajectory within each sub-space in parallel.

Concepts from approximation theory are also used within the LLC framework to further reduce the computational burden. The relevant approximations are made in the optimization of the control variables input to the system. We use a neural network (NN) to learn the tendencies of the controller, in terms of its decision making, via off-line simulations. At run time, given the current state and environment inputs, the NN provides an approximate solution. This solution serves as a starting point around which a local search obtains the final control decision. The use of machine learning for controlling resource allocation in data centers has also been proposed by the authors of [18] and [19].
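The decomposed search can be sketched with `concurrent.futures`. The paper does not say how the state space is partitioned, so this assumes a simple strided split of an explicit candidate list, with each worker returning its local best.

In CPython, threads do not speed up pure-Python, CPU-bound scoring because of the global interpreter lock. A `ProcessPoolExecutor`, or a compiled scoring kernel like the paper's compiled Matlab, would be needed for a real speedup. The example objective is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Sequence, Tuple


def best_in_subspace(subspace: Sequence[Any],
                     score: Callable[[Any], float]) -> Tuple[float, Any]:
    """Exhaustively search one sub-space; returns (best score, best configuration)."""
    return max(((score(c), c) for c in subspace), key=lambda t: t[0])


def parallel_search(candidates: Sequence[Any], score: Callable[[Any], float],
                    n_threads: int = 8) -> Tuple[Any, float]:
    """Split the candidates into strided sub-spaces and search each in its own thread."""
    subspaces = [candidates[i::n_threads] for i in range(n_threads)]
    subspaces = [s for s in subspaces if len(s) > 0]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = list(pool.map(lambda s: best_in_subspace(s, score), subspaces))
    best_score, best_cfg = max(results, key=lambda t: t[0])
    return best_cfg, best_score


print(parallel_search(list(range(1000)), lambda c: -(c - 637) ** 2))
```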

Fig. 18 summarizes the execution-time overhead incurred by the risk-averse controller if our system were to grow from five to twenty servers (up to 80 VMs). Our current testbed generates about 1,024 possible control options for VM assignments to the five servers, and this number grows by about three orders of magnitude each time five more servers are added. As expected, parallelizing the search yields a significant speedup: using eight threads on an eight-core machine to search the state space, we achieve a speedup of about five times over the sequential case. Finally, by using a back-propagation, feed-forward NN to provide approximate control decisions, we can reduce the execution time of a single-threaded 3-step lookahead controller for 20 host machines to about 28 seconds, and to five seconds by multi-threading the NN-based controller. The NN has one hidden layer and is trained using 2,800 data points collected over four workloads similar to that shown in Fig. 2(a). The average energy savings decrease slightly, from 39% in the baseline case to 37% with the NN-based controller. The average SLA violations of the NN-controlled system, although still small, increase from 0.01% in the baseline case to about 0.04%.

System Size | Control Options | Execution Time with Eight Threads (LLC) | Execution Time with Eight Threads (NN Approx.)
5 Hosts (20 VMs) | 10^3 | 0.3 sec. | –
10 Hosts (40 VMs) | 10^6 | 26 sec. | < 1 sec.
15 Hosts (60 VMs) | 10^9 | 301 sec. | 3 sec.
20 Hosts (80 VMs) | 10^12 | 2,241 sec. | 5 sec.

Fig. 18. Execution times of the controller for simulated larger clusters when using multi-threading and neural-network (NN) approximation to improve scalability. The simulations were executed on a server with two quad-core CPUs at 1.6 GHz and 4 GB of memory.
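The following Python sketch illustrates the run-time role of the approximator: a cheap approximate policy proposes a starting control, and a greedy local search over neighboring controls refines it into the final decision. The function `approx_policy` is a hypothetical placeholder for the trained NN (it simply spreads demand evenly across hosts), and `toy_cost`, `SETTINGS`, and `DEMAND` are illustrative stand-ins, not the controller's actual cost model or control encoding.

```python
from itertools import product

# Hypothetical toy problem: each of five hosts picks one of four settings
# (e.g., number of VMs), giving 4**5 = 1,024 candidate controls.
SETTINGS = (0, 1, 2, 3)
DEMAND = 7  # hypothetical number of VMs needed for the forecast load


def toy_cost(control):
    """Toy stand-in for the LLC cost: pay for every VM, heavily penalize shortfall."""
    supply = sum(control)
    return supply + 10 * max(0, DEMAND - supply)


def approx_policy(demand, num_hosts):
    """Hypothetical placeholder for the trained NN: spread demand evenly and
    round, clipping to the allowed settings. Cheap but only approximate."""
    share = min(max(round(demand / num_hosts), SETTINGS[0]), SETTINGS[-1])
    return (share,) * num_hosts


def neighbors(control):
    """Yield controls that change exactly one host's setting by one step."""
    for i, value in enumerate(control):
        for step in (-1, 1):
            new_value = value + step
            if SETTINGS[0] <= new_value <= SETTINGS[-1]:
                yield control[:i] + (new_value,) + control[i + 1:]


def local_search(start):
    """Greedy descent: move to the cheapest neighbor until no neighbor lowers the cost."""
    current = (toy_cost(start), start)
    while True:
        best_neighbor = min((toy_cost(n), n) for n in neighbors(current[1]))
        if best_neighbor[0] >= current[0]:
            return current
        current = best_neighbor


def brute_force(num_hosts):
    """Exhaustive reference over all candidate controls, for comparison."""
    return min((toy_cost(c), c) for c in product(SETTINGS, repeat=num_hosts))


if __name__ == "__main__":
    start = approx_policy(DEMAND, 5)
    print(start, local_search(start), brute_force(5))
```

Starting from `(1, 1, 1, 1, 1)`, the search reaches `(7, (1, 1, 1, 1, 3))` after a few dozen cost evaluations, and its cost matches the exhaustive optimum from `brute_force(5)`, which examines all 1,024 candidates. In general, greedy local search can stop at a local optimum, so its result is not guaranteed to match an exhaustive search.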

VI. RELATED WORK

Researchers proposed dynamic resource provisioning policies focusing on power management in data centers even before the advent of virtualization technology [20], [21]. Virtualization makes workload consolidation easier and exposes additional control knobs that give the data-center operator fine-grained control over resources such as the CPU and memory. Prior work on resource provisioning in virtualized environments [22]–[24] differs from ours in that we apply a predictive control technique to manage both power and performance in a heterogeneous computing environment, encoding the risk of provisioning decisions and accounting for various switching costs.

Khanna et al. propose a reactive control technique that selects a minimal VM configuration while accounting for the impact of VM migration on CPU and memory consumption, but not for the time delays of control actions [25]. Furthermore, no static power is saved, since host machines are not switched off. The server consolidation techniques proposed by Tsai et al. [26] and Steinder et al. [27] reduce the power consumed by a single application executing on homogeneous servers by migrating VMs between servers, but they likewise neither switch hosts on and off nor account for control costs. PID control has been proposed for managing virtualized computing environments, including power management and CPU utilization [5], [28]. Hierarchical control strategies have also been used to manage multiple VMs in a coordinated fashion [29]–[31]. For example, Xu et al. [29] develop a multi-level optimization scheme using fuzzy logic to allocate CPU shares to VMs hosting two enterprise applications on a single host; a global controller arbitrates requests for CPU share from local controllers within the VMs, aiming to maximize the profit generated by the server. Wang et al. [30] develop a two-layer control architecture for VMs that uses a primary loop to load-balance among VMs and a secondary loop to control CPU frequency for power efficiency. Xu et al. [32] use model-predictive control (MPC) to provision computing resources in a clustered web-hosting environment, but without accounting for power consumption.

VII. CONCLUSION

We have presented an optimization framework for power/performance management in a virtualized computing system hosting online services with session-based workloads. The proposed scheme achieves higher server utilization and energy efficiency by dynamically provisioning VMs, consolidating the workload, and turning servers on and off as needed.