Approximation Modeling for the Online Performance Management of Distributed Computing Systems


Dara Kusic, Student Member, IEEE, Nagarajan Kandasamy, Member, IEEE, and Guofei Jiang, Member, IEEE

Abstract: A promising method of automating management tasks in computing systems is to formulate them as control or optimization problems in terms of performance metrics. For an online optimization scheme to be of practical value in a distributed setting, however, it must successfully tackle the curses of dimensionality and modeling. This paper develops a hierarchical control framework to solve performance management problems in distributed computing systems operating in a data center. Concepts from approximation theory are used to reduce the computational burden of controlling such large-scale systems. The relevant approximations are made in the construction of the dynamical models to predict system behavior as well as in the solution of the associated control equations. Using a dynamic resource provisioning problem as a case study, we show that a computing system managed by the proposed control framework with approximation models realizes profit gains that are in the best case within 1% of a controller using an explicit model of the system.

Key words: Approximation modeling, neural network, regression tree, utility computing, limited lookahead control, dynamic optimization

I. INTRODUCTION

Web-based services such as online banking and shopping must respond continuously to external events in real time. As people have become more reliant on online services, their expectations for dependability and quality of service (QoS) have risen dramatically. Online services are enabled by enterprise applications, defined as any software hosted on a server which simultaneously provides services to a large number of users over a computer network [1]. Enterprise applications are typically hosted on a distributed computing environment (DCE) comprising heterogeneous and networked servers housed in a physical facility called a data center, usually a stand-alone building. Data centers can be privately owned, and host applications serving a variety of companies and uses, ranging from e-commerce services to scientific applications and online gaming.

To operate DCEs effectively while maintaining the desired QoS, multiple performance-related parameters must be dynamically tuned to adapt to changing application modes and operating conditions. Key management tasks that can be automated in a DCE include power management, load balancing, and resource provisioning (dynamically adjusting the allocation of computing resources between the multiple services supported by the system).

D. Kusic and N. Kandasamy are with the Electrical and Computer Engineering Department, Drexel University, Philadelphia, PA 19104. E-mail: [email protected] and [email protected]. D. Kusic is supported by NSF grant DGE-0538476 and N. Kandasamy acknowledges support from NSF grant CNS-0643888.

G. Jiang is with the Robust and Secure System Group, NEC Laboratories America, Princeton, NJ 08540. E-mail: [email protected]

A promising method of automating system management tasks is to formulate them as control/optimization problems in terms of cost or performance metrics. Researchers from academia and industry have recently applied control theory to several system management tasks. Classical PID (proportional, integral, and derivative) or reactive control has been successfully applied to selected problems such as task scheduling, load balancing, and power management in web servers [2]–[4]. Others have used more advanced concepts from model-predictive control and limited-lookahead control (LLC) to pose system performance goals as optimization problems and solve them online under dynamic operating constraints [5], [6]. The basic idea here is to solve an optimization problem that maximizes the performance objective over a given prediction horizon, and then periodically roll this horizon forward. The LLC approach, for example, allows for multiple performance goals to be represented as optimization problems under explicit operating constraints and solved for every control step. This approach is also applicable to computing systems with complex non-linear behavior where tuning options must be chosen from a finite set at any given time.

The above control and optimization techniques aim to manage the performance of a stand-alone server or a small-scale system comprising a few servers. Significant research challenges must still be addressed to achieve real-time control of a large-scale DCE with multiple interacting components. For an optimization scheme to be of practical value in a distributed setting, it must successfully tackle the so-called “curses” of modeling and dimensionality. The number of available tuning options is typically quite large in distributed systems and the corresponding search space grows exponentially with each new variable. Complex, and possibly time-varying, component behavior as well as inter-component interactions must be accurately modeled and carefully managed at run-time to achieve system-wide performance goals. Finally, the system management task is further complicated when these distributed components must communicate with each other to solve the overall problem.

This paper develops an optimization framework to solve a class of performance management problems (e.g., power management, on-demand computing, and dynamic resource provisioning) in distributed enterprise systems and applications operating in a data center environment. We focus on tackling these problems using decentralized hierarchical control structures wherein the overall problem is decomposed into a set of simpler sub-problems and solved in cooperative fashion by multiple controllers. We also show how concepts from approximation theory can be integrated within the framework to further reduce the computational burden of controlling non-linear systems. The relevant approximations will be made in two places: in the construction of the dynamical models to predict system behavior, and in the optimization of the control variables input to the system.

We illustrate the above concepts in the context of dynamic resource provisioning in a heterogeneous DCE. Here, the provisioning problem is to decide an optimal allocation of computing resources (the number of servers and their CPU operating frequencies) for multiple online services under a dynamic workload to maximize profit while reducing system operating costs. This performance optimization problem may need to be re-solved continuously as environmental events such as time-varying workload arrivals and server failures are observed. Moreover, since workload intensity can change quite quickly in enterprise systems [7], they must adapt to such variations and provision resources over short time scales, usually on the order of tens of seconds or a few minutes.

We will study how approximation structures such as neural networks and regression trees can help reduce the difficulties of modeling a complex non-linear computing system with multiple interacting levels of control, as well as reduce the associated control overhead. The key to developing an approximation structure is to fit measurements obtained from the controlled system, i.e., training data, to an abstract mathematical representation that accepts one or more features from each training point as input and provides an estimate of system behavior (e.g., response time) as the output. We further extend the use of approximation models to learn the controller behavior itself and make provisioning decisions very quickly. Simulations using workload traces from the 1998 World Cup Soccer (WC’98) web site show that a computing system managed by a control framework using approximation models realizes profit gains that are, in the best case, within 1% of a controller using an explicit model based upon first-principles while incurring low control overhead.

The paper is organized as follows. Section II discusses related work on resource provisioning and Section III discusses system modeling assumptions, the control problem formulation, and basic LLC concepts. Section IV explains the control hierarchy and application of the different modeling techniques. Section V compares the performance results of an LLC-managed system using approximation models against that using explicit models. Section VI discusses optimality considerations and Section VII concludes the paper.

II. RELATED WORK

This section reviews some recent research addressing modeling techniques for approximating complex systems, particularly in the context of DCEs.

A variety of engineering applications including the optimization of electromagnetic devices [8], telecommunications simulation [9], and multiprocessor design space exploration [10] use approximation models toward a solution, citing computational efficiency as a primary motivation. Approximation models have been applied to a general class of closed-loop systems [11]–[14] with considerable success; our work extends the use of approximation models to the scenario in which the system has no set-point or reference value to match. Parekh et al. use statistical ARMA (auto-regressive moving average) models to approximate the closed-loop queuing system of a Lotus Notes server in [15], achieving a high degree of accuracy in identifying the target system. While the controller in [15] seeks to maintain a desired set-point of a single application, our controller solves a more complex optimization problem that aims to maximize profits across multiple applications, more similar to the utility maximization addressed in [16] that uses online reinforcement learning as an approximation technique. While [16] considers the switching costs to be negligible, we account for such costs of control in our optimization problem.

In choosing an approximation technique, [17] finds neural networks to be more accurate estimators than regression models for approximating software reliability growth models. The effectiveness of using neural networks for identification and control of complex nonlinear systems is demonstrated in [18], [19], and though neural networks typically model systems with continuous input and output, we show that the neural network can be adapted with quantization to accommodate discrete DCE systems. Memory management enabled by neural network modeling in [20] manages a computing system to achieve a target response time. As in our work, [20] considers time-varying workload intensities, but our scenario consists of multiple applications running simultaneously as separate services, and our controller optimizes for profit rather than targeting a specific performance value.

Regression models are used to make simulation-free inferences about performance and power of various applications in [21], achieving a modeling accuracy within 5% of the actual values. The regression models use a comparatively small number of training samples with respect to the design space. Our work similarly seeks to reduce the number of simulations required to estimate performance data, though we are also concerned with training point coverage of the entire operating region. Regression techniques applied to energy estimation of battery-powered embedded devices in [22] produce low estimation errors of 1-6%, although we expect moderate error rates in our work as we estimate multiple parameters essential to the control task.

III. PRELIMINARIES

This section presents the system modeling assumptions, including the pricing strategy used to differentiate between the online services, and the regression tree and neural network approximation techniques. We also discuss basic LLC concepts.

A. System Model

We assume a DCE hosting three independent online services, labeled as “Gold”, “Silver”, and “Bronze” and indexed using $i \in \{1, 2, 3\}$. Fig. 1 shows the system model in which requests for the Gold, Silver, and Bronze services arrive with time-varying rates $\lambda_1(k)$, $\lambda_2(k)$, and $\lambda_3(k)$, respectively, where $k$ denotes the time instant, and are routed to a computer cluster dedicated to hosting that service. Each cluster comprises heterogeneous computers with different processing capacities working independently to service incoming requests. Finally, computers contributing excess capacity during periods of slow workload arrivals are powered down and placed in the Sleep cluster to reduce system power consumption.

Fig. 1. The system model comprising the Gold, Silver and Bronze service clusters and a Sleep cluster that holds machines in a powered-off state. (Diagram labels: workload $\lambda(k)$; dispatchers; per-service arrivals $\lambda_1(k)$, $\lambda_2(k)$, $\lambda_3(k)$; class sizes $n_{i1}(k), \ldots, n_{im}(k)$; response times $r_{i1}(k), \ldots, r_{im}(k)$.)

Fig. 2. A pricing strategy (or SLA) differentiating the online services. (Plot titled “Stepwise Non-Linear Pricing of Service Level Agreements”: revenue in dollars versus response time in ms, with curves for the Gold, Silver, and Bronze SLAs.)

Fig. 3. An example workload representing client requests for the three online services hosted by the computing system. (Plot titled “1998 World Cup HTTP Requests”: arrival rate per 30-second interval versus time instance, with Gold, Silver, and Bronze workloads.)

Online services are enabled by enterprise applications. In our model, instances of three enterprise applications are hosted on stand-alone servers. Components of a typical enterprise application include databases, application and business logic, and middleware software. The electronic trading application Trade3 is one specific example [23]. This transaction-based application uses a database component (DB2), and is integrated within a middleware infrastructure (IBM WebSphere) to obtain the application environment. For each online service within our enterprise application, an appropriate fraction of the incoming workload is distributed to each machine in the service clusters.

Our system model groups servers within each cluster into $m$ different performance classes as shown in Fig. 1. Servers belonging to a performance class have similar processing speeds and capabilities. In the optimization problem formulated in later sections, a server belonging to the $j$th performance class can only be switched to the $j$th performance class of another cluster. If $N_i(k)$ denotes the number of servers within cluster $i$ at time $k$, then $N_i(k) = \sum_{j=1}^{m} n_{ij}(k)$, where $n_{ij}(k)$ denotes the number of servers within performance class $j$ of the cluster. A dispatcher within each cluster distributes an appropriate fraction of the incoming workload to each performance class. Requests are routed to individual servers within a class in round-robin fashion for load balancing (since servers within a class are assumed to be homogeneous in terms of their processing capability) and processed in first-come, first-served fashion. The average response time achieved by the $j$th performance class within the cluster is denoted by $r_{ij}(k)$.

The revenue generated by a service hosted in our DCE is specified via a pricing scheme that relates the achieved response time to a dollar value that clients are willing to pay. The Gold, Silver, and Bronze services generate revenue as per the non-linear pricing graph shown in Fig. 2, a commonly used reward/refund strategy in which the response time of a completed request is translated into a dollar amount to be collected from the client. Gold customers expect the best performance in terms of the lowest response times, and pay premium prices for this expectation, while the Silver and Bronze customers expect moderate and basic service, respectively. When the response time violates the service level agreement (SLA), the service provider pays a penalty to the client [24].
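A stepwise SLA of the kind plotted in Fig. 2 can be sketched as a lookup over response-time bands. The breakpoints and dollar amounts below are hypothetical placeholders chosen only for illustration; they are not the values used in the paper.

```python
from bisect import bisect_right

# Hypothetical SLA for one service: response-time breakpoints (ms) and the
# per-request revenue (dollars) earned in each band. Illustrative values only.
BREAKPOINTS_MS = [200.0, 400.0]
REVENUE_PER_BAND = [7e-6, 3e-6, -3e-6]  # the last band is a refund (penalty)


def sla_revenue(response_time_ms, breakpoints=BREAKPOINTS_MS,
                revenue=REVENUE_PER_BAND):
    """Return the per-request revenue for an achieved response time."""
    if response_time_ms < 0:
        raise ValueError("response time must be non-negative")
    # bisect_right selects the band: [0, 200) -> 0, [200, 400) -> 1, >= 400 -> 2.
    return revenue[bisect_right(breakpoints, response_time_ms)]
```

A negative entry in the last band models the penalty paid to the client when the SLA is violated.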

B. System Dynamics

Assuming perfect knowledge of the system, the continuous dynamics of a computing cluster $i$ in Fig. 1 is described at time $k$ by the discrete-time state-space equation

$$s_i(k+1) = \phi\big(s_i(k), u_i(k), \omega_i(k)\big) \tag{1}$$

where $s_i(k)$ and $\omega_i(k)$ denote the operating state and environment inputs, respectively, and the behavioral function $\phi$ captures the relationship between the observed system parameters relevant to profit generation and the control inputs $u_i(k)$ that adjust these parameters. All state variables are assumed observable and correspond directly to system outputs. The operating state of cluster $i$ is denoted as $s_i(k) = \big(\{r_{ij}(k)\}, \{O_{ij}(k)\}\big)$, where $r_{ij}$ denotes the response time achieved by the $j$th performance class and $O_{ij}$ denotes the corresponding operating cost.

The lookahead nature of LLC requires that the relevant environment parameters $\omega_i(k)$ to a cluster, namely the number of workload arrivals $\lambda_i(k)$ and the per-request processing time $\mu_i(k)$, be estimated for the corresponding prediction horizon. The time-varying nature of the workload makes it impossible to assume an a priori distribution for workload arrivals. Therefore, we forecast the environment inputs to the system as $\hat{\omega}_i(k) = \big(\hat{\lambda}_i(k), \hat{\mu}_i(k)\big)$ using online predictive filters. A Kalman filter [25], implementing a time-series ARIMA model, estimates the number of workload arrivals $\hat{\lambda}_i(k)$, and an exponentially-weighted moving-average (EWMA) filter estimates the average request processing time as $\hat{\mu}_i(k) = \pi \cdot \mu_i(k) + (1 - \pi) \cdot \mu_i(k-1)$, where $\pi$ is a smoothing constant.
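The smoothing filter for the processing time can be sketched in a few lines. This is a minimal illustration that follows the formula exactly as stated in the text, blending the current and previous measurements; the function name and the default value of the smoothing constant are assumptions. Many EWMA implementations instead feed back the previous estimate rather than the previous measurement.

```python
def ewma_processing_time(samples, pi=0.5):
    """Smooth measured processing times mu(k) as pi*mu(k) + (1 - pi)*mu(k-1).

    `samples` is a non-empty list of measurements; `pi` is the smoothing
    constant in [0, 1]. The first sample has no predecessor, so it is
    paired with itself (an assumption of this sketch).
    """
    if not samples:
        raise ValueError("need at least one sample")
    if not 0.0 <= pi <= 1.0:
        raise ValueError("pi must lie in [0, 1]")
    estimates = []
    previous = samples[0]
    for mu in samples:
        estimates.append(pi * mu + (1.0 - pi) * previous)
        previous = mu
    return estimates
```

A larger smoothing constant tracks recent measurements more closely, while a smaller one damps short-lived spikes.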

Since the current values of the environment inputs cannot be measured until the next sampling instant, the corresponding system state can only be estimated as

$$\hat{s}_i(k+1) = \phi\big(s_i(k), u_i(k), \hat{\omega}_i(k)\big) \tag{2}$$

We develop $\phi$ for each cluster $i$ using the following difference equations.

$$q_{ij}(k+1) = q_{ij}(k) + \big(\gamma_{ij}(k) \cdot \hat{\lambda}_i(k) - p_{ij}(k)\big) \cdot T_s \tag{3}$$

$$p_{ij}(k) = n_{ij}(k) \cdot \frac{1}{\hat{\mu}_i(k)} \cdot \frac{f_{ij}(k)}{f_{\max}} \tag{4}$$

$$r_{ij}(k) = \frac{\hat{\mu}_i(k)}{f_{\max}} + \frac{q_{ij}(k)}{p_{ij}(k)} \tag{5}$$

$$O_{ij}(k) = n_{ij}(k) \cdot \big(c_0 + c_1 \cdot f_{ij}(k)^3\big) \tag{6}$$

If $T_s$ is the controller sampling time, then the queueing dynamics for the $j$th performance class is given in (3) using the current queue length $q_{ij}(k)$, the fraction of the incoming workload $\gamma_{ij}(k) \cdot \lambda_i(k)$ dispatched to the class, and the processing rate $p_{ij}(k)$. The processing rate of a performance class depends on the number of servers $n_{ij}(k)$ and the average request processing time $\mu_i(k)$. As noted earlier, each server within a cluster can be operated within a limited set of frequencies. Therefore, if the time required to process a request while operating at the maximum frequency $f_{\max}$ is $\mu$, then the corresponding processing time achieved while operating at some frequency $f(k) \le f_{\max}$ is $\mu \cdot \frac{f_{\max}}{f(k)}$. This simple linear relationship between operating frequency and processing time has been found to be adequate by other researchers and is widely used [26].

The average response time $r_{ij}(k)$ achieved by the $j$th performance class includes the waiting time for requests in the queue and the processing time incurred on a server (as per its current operating frequency). The operating cost $O_{ij}(k)$ is due to the total power consumed by servers in the $j$th performance class of the $i$th cluster; an active server incurs a base cost of $c_0$ (due to the power consumed by the hard disk, power-supply transformers, etc.) as well as a dynamic cost $c_1 \cdot f_{ij}(k)^3$ depending on the current operating frequency [27]. Section V provides the specific values used for the constants $c_0$ and $c_1$ in our simulations.
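The difference equations (3)-(6) translate almost directly into a one-step update for a single performance class. The sketch below is illustrative only: the function and argument names are assumptions, the response-time term follows equation (5) as printed, and the clamp that keeps the queue non-negative is an added assumption that the equations do not state.

```python
def class_step(q, n, f, gamma, lam_hat, mu_hat, f_max, t_s, c0, c1):
    """Advance one performance class by one sampling interval.

    Returns (q_next, p, r, cost) following equations (3)-(6).
    Requires n > 0 and f > 0 so that the processing rate is non-zero.
    """
    # (4): aggregate processing rate of n servers running at frequency f.
    p = n * (1.0 / mu_hat) * (f / f_max)
    # (5): response time as printed, a service term plus the queueing delay.
    r = mu_hat / f_max + q / p
    # (6): base power cost plus a cubic frequency-dependent cost per server.
    cost = n * (c0 + c1 * f ** 3)
    # (3): queue growth from dispatched arrivals minus processed requests.
    q_next = q + (gamma * lam_hat - p) * t_s
    # Added assumption: a queue length cannot become negative.
    return max(q_next, 0.0), p, r, cost
```

Calling this function once per performance class and per sampling interval yields the state estimate that a lookahead controller would evaluate for each candidate control input.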

The difference model captured by (3)-(6) adequately represents the system dynamics when the incoming workload is CPU intensive, i.e., when the processor is the bottleneck resource. Other authors have used similar difference equations to model server performance under a CPU-intensive workload [2], [23]. Given the large memory capacity of modern servers and sophisticated cache-replacement algorithms, request-processing delays can be almost exclusively limited to cache access, eliminating accesses to the hard disk.

Given the above setting, the vector $u_i(k)$ to be decided by the controller at sampling time $k$ for each cluster includes the number of servers $n_{ij}(k)$ to provision to the $j$th performance class, the fraction of incoming requests $\gamma_{ij}(k) \cdot \lambda_i(k)$, where $\gamma_{ij} \in [0, 1]$ and $\sum_{j=1}^{m} \gamma_{ij} = 1$, to dispatch to each performance class, and the operating frequency $f_{ij}(k)$ to assign to the servers within a performance class.

Finally, our DCE assumes an admission policy wherein requests arriving at a rate exceeding the SLA are dropped. This policy ensures that the system is able to meet some minimum QoS goals given an initial cluster configuration, guards against monetary losses, and ensures a fair comparison between the controlled and uncontrolled system. For a given system configuration in terms of individual cluster sizes and their processing capabilities, we determine the worst-case workload scenario in terms of workload arrivals and request processing time that can be handled by the system while generating no profit, i.e., the break-even point. If the arrival rate violates this upper bound, requests are dropped.
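The admission policy amounts to capping the arrivals accepted in each sampling interval. In this minimal sketch the break-even bound `max_arrivals` is assumed to have been computed beforehand for the given cluster configuration; the function name is hypothetical.

```python
def admit(arrivals, max_arrivals):
    """Split one interval's arrivals into (admitted, dropped) requests.

    `max_arrivals` is the break-even bound for the current configuration,
    assumed here to be supplied by a separate offline calculation.
    """
    if arrivals < 0 or max_arrivals < 0:
        raise ValueError("counts must be non-negative")
    admitted = min(arrivals, max_arrivals)
    return admitted, arrivals - admitted
```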

C. Control Concepts

Given the pricing scheme in Fig. 2, the controller must periodically tune the following parameters to maximize the profit generated by the DCE: (1) the number of servers to provision per cluster, including switching machines between the different services; (2) the fraction of the incoming workload to distribute to each performance class within a cluster; (3) the operating frequencies of the servers within each cluster; and (4) the number of servers to power down. A practical solution to this control problem must address the following key issues.

• Hybrid system behavior. Computing systems exhibit hybrid behavior comprising both discrete-event and time-based dynamics [28], and the control decisions issued to the system are typically limited to a finite set at any given time. In Fig. 1, for example, only a limited number of servers can be moved between clusters, and each server can only choose from a finite and discrete set of possible operating frequencies.

• Optimization under constraints. In addition to maximizing the revenue generated from the clients, our system must also minimize its operating cost. The corresponding nonlinear cost function includes multiple control variables and must be optimized under dynamic constraints.

• Control actions with dead times. Actions such as powering up a server on demand incur some dead time, i.e., the delay between a control decision and the corresponding system response, requiring proactive control where decisions must be provided in anticipation of future changes in operating conditions.

• Cost of control. In a practical setting, certain control decisions themselves incur significant cost. In our example, revenue is lost for some time while a server switches between clients, as a different operating system and/or application environment is loaded to service the new client.

Fig. 4. The controller schematic. (Diagram blocks: predictive filter, system model, optimizer, and system; signals: $\omega(k)$, $\hat{\omega}(k+1)$, $\hat{s}(k+1)$, $\hat{u}(k+1)$, $s(k)$, $u(k)$.)

Fig. 5. The state-space trajectory explored by the controller within a lookahead horizon of length $h$. (Diagram: from $s(k)$ through the estimated states $\hat{s}(k+1)$, $\hat{s}(k+2)$, $\hat{s}(k+3)$, up to $\hat{s}(k+h)$ over the prediction horizon $[k+1, k+h]$.)

We pose the resource-provisioning problem as one of sequential optimization under uncertainty within a limited-lookahead control (LLC) framework. The LLC approach, previously introduced by the authors in [6], [29], is a predictive control scheme allowing for multi-objective optimization and explicit constraint handling. The method applies to systems in which control inputs must be chosen from a finite set within the discrete domain, and systems with complex non-linear dynamics and dead times. The LLC problem is periodically re-solved with observed environmental inputs such as time-varying workload patterns. Fig. 3, for example, shows traces from the World Cup 98 web site. The workload, comprising HTTP requests plotted at 30-second granularity, clearly shows time-of-day variations where the number of requests changes quite significantly within a matter of minutes.

Fig. 4 shows the basic LLC framework. The relevant environment input $\omega$ is estimated over the prediction horizon $h$ and used by the system model to forecast future system states $\hat{s}$. The controller optimizes the forecast behavior per the specified QoS requirements by selecting the best control inputs to apply to the system. At each time step $k$, the controller finds a feasible sequence $\{u^*(l) \mid l \in [k+1, k+h]\}$ of decisions within the prediction horizon as shown in Fig. 5. Then, only the first control input is applied to the system and the rest of the sequence is discarded. This optimization procedure is repeated at time $k+1$ when the new system state is available. The LLC approach accommodates cost functions where performance goals can be posed as set-point regulation, where key operating parameters must be maintained at a specified level or follow a certain trajectory (e.g., the average response time in web servers), or as utility optimization, where the system aims to maximize its utility (e.g., the profit-maximization problem considered in this paper). It is also possible to consider control costs as part of the cost function, indicating that certain trajectories toward the desired goal are preferable over others in terms of their cost to the system.
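The receding-horizon procedure described above can be sketched as follows. This is a minimal illustration that enumerates every control sequence exhaustively, which is a simplification and not the search strategy of the paper; the callables `model` and `utility`, the list `forecast`, and the function name are hypothetical placeholders supplied by the caller.

```python
from itertools import product


def llc_step(state, controls, horizon, model, utility, forecast):
    """Return the first input of the best control sequence over the horizon.

    model(state, u, env) -> next state, utility(state, u) -> float, and
    forecast[l] is the estimated environment input for lookahead step l.
    """
    best_seq, best_value = None, float("-inf")
    # Exhaustive enumeration of all sequences drawn from the finite control set.
    for seq in product(controls, repeat=horizon):
        s, total = state, 0.0
        for l, u in enumerate(seq):
            s = model(s, u, forecast[l])   # forecast the next state
            total += utility(s, u)         # accumulate utility along the path
        if total > best_value:
            best_seq, best_value = seq, total
    # Only the first decision is applied; the rest of the sequence is discarded.
    return best_seq[0]
```

The caller would invoke `llc_step` again at the next sampling instant with the newly observed state, which rolls the horizon forward by one step.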

D. The Profit Maximization Problem

Returning to our provisioning problem, if $s_i(k)$ denotes the system state and $u_i(k) = \big(n_{ij}(k), \gamma_{ij}(k), f_{ij}(k)\big)$ the decision vector for the $j$th performance class at time $k$, the profit $R_i\big(s_i(k), u_i(k)\big)$ generated by service $i$ is

$$R_i\big(s_i(k), u_i(k)\big) = \sum_{j=1}^{m} \Big[ H_i\big(r_{ij}(k)\big) - O_{ij}(k) - S_w\big(\Delta n_{ij}(k)\big) \Big] \tag{7}$$

where $H_i(r_{ij}(k))$, the revenue generated by the $j$th performance class, is determined using the achieved response time $r_{ij}(k)$ and the corresponding SLA function $H_i$. The power-consumption cost incurred in operating $n_{ij}(k)$ machines at a frequency $f_{ij}(k)$ is given by $O_{ij}(k)$, and $S_w(\Delta n_{ij}(k))$ denotes the switching cost incurred by a performance class due to the provisioning decision. This switching cost is a function of the number of computers being moved across services, including the Sleep cluster, and accounts for transient power-consumption costs incurred when starting up computers as well as the opportunity cost that accumulates during the time a computer is unavailable to perform any useful service.
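Equation (7) is a sum over performance classes of revenue minus operating cost minus switching cost. The following sketch assumes the SLA function and the switching-cost function are passed in as callables; all names are illustrative and no particular form of either function is implied by the paper.

```python
def service_profit(response_times, power_costs, servers_moved,
                   sla, switching_cost):
    """Profit of one service per equation (7), summed over its classes.

    response_times[j], power_costs[j] and servers_moved[j] hold r_ij(k),
    O_ij(k) and delta n_ij(k) for class j; `sla` plays the role of H_i and
    `switching_cost` the role of S_w (both hypothetical callables).
    """
    return sum(
        sla(r) - o - switching_cost(dn)
        for r, o, dn in zip(response_times, power_costs, servers_moved)
    )
```

Summing `service_profit` over the three services and over the lookahead steps gives the objective that the controller maximizes.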

Assuming three services ($i \in \{1, 2, 3\}$), the profit-maximization problem is posed as

$$\text{Compute:} \quad \max_{u} \sum_{l=k+1}^{k+h} \sum_{i=1}^{3} \hat{R}_i\big(s_i(l), u_i(l)\big) \tag{8}$$

$$\text{Subject to:} \quad n_{ij}(l) \ge K_{\min}, \qquad \sum_{i=1}^{3} \sum_{j=1}^{m} n_{ij}(l) \le N(l), \qquad \sum_{j=1}^{m} \gamma_{ij}(l) = 1$$

where $h$ is the length of the prediction horizon and $N(k)$ denotes the number of available servers in the computing system. System administrators can specify additional operating constraints; for example, in (8) the controller must maintain a minimum performance class size of $K_{\min}$ at all times to accommodate sudden (and rare) bursts of traffic caused by flash crowds, thereby reserving capacity for conservative operation.
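Before a candidate decision is scored, it can be screened against the constraints of (8). The sketch below assumes the class sizes and workload fractions are stored as nested lists indexed by service and performance class; the function name and the numerical tolerance are assumptions.

```python
def is_feasible(n, gamma, k_min, n_available, tol=1e-9):
    """Check the constraints of (8) for one lookahead step.

    n[i][j] is the size of class j in service i and gamma[i][j] is the
    fraction of service i's workload dispatched to class j.
    """
    # Every performance class keeps at least k_min servers.
    if any(n_ij < k_min for row in n for n_ij in row):
        return False
    # The allocation cannot exceed the servers available in the system.
    if sum(sum(row) for row in n) > n_available:
        return False
    # Within each service the dispatch fractions sum to one.
    return all(abs(sum(row) - 1.0) <= tol for row in gamma)
```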

If the controller may move up to two machines from each of the Gold, Silver and Bronze classes ($i = 3$) at any given time, each having an initial cluster size of $n = 21$ as noted in Table I, where the minimum prediction horizon is $h = 2$, then the number of combinations to explore is given by the expression

$$\sum_{h=0}^{2} \left( \sum_{k=0}^{2} \binom{21}{k} \right)^{3h} = 5.5 \times 10^{15} \tag{9}$$

To efficiently explore the large control space in (9), we employ a hierarchical control structure, localized searches, and approximating structures used in place of the combinatorial search space to reduce the control overhead, as discussed in Section IV.

Fig. 6. The control hierarchy showing L2, L1 and L0 controllers superimposed upon the Gold, Silver and Bronze services and $m$ performance classes within each service cluster. (Diagram labels: one L2 controller setting $n_{ij}(k)$; one L1 controller per service cluster setting $\gamma_{ij}(k)$; L0 controllers per performance class setting $f_{ij}(k)$; the Sleep cluster.)

IV. CONTROLLER DESIGN

Where control inputs must be chosen from discrete values, the LLC problem in (8) will show an exponential increase in worst-case complexity with an increasing number of control options and longer prediction horizons. Thus, centralized controller implementations cannot be used to provide fast resource provisioning decisions in a DCE.

We implement a hierarchical control structure such as the one superimposed on the DCE in Fig. 6 to reduce the dimensionality of the optimization problem in (8). The entire control vector $u_i(k)$ to each cluster $i$ consists of $u_i(k) = \big(n_{ij}(k), f_{ij}(k), \gamma_{ij}(k)\big)\ \forall j$, where $n_{ij}(k)$ is the number of machines, $\gamma_{ij}(k)$ is the fraction of workload, and $f_{ij}(k)$ is the operating frequency to set within each performance class $j$. In the hierarchical structure in Fig. 6, the following control layers determine these decision variables.

• The L0 Control Layer: At the lowest level, L0 controllers, local to a cluster’s performance class, decide the operating frequency $f_{ij}$ to assign to the CPUs. Due to the small number of components under its control, the relatively small size of the control space of discrete operating frequencies, and the near-zero control cost, the L0 controller has a small execution time and operates frequently, on the order of seconds with a lookahead horizon of $h = 1$. Each L0 controller uses the model $\phi$, given by (3)-(6), to capture the underlying cluster dynamics.

• The L1 Control Layer: The L1 controllers, local to each service cluster, decide $\gamma_{ij}$, the fraction of the incoming workload to distribute to each performance class. The L1 controller operates with a lookahead horizon of $h = 1$.

• The L2 Control Layer: An L2 controller with a global view of the system decides $n_{ij}$, the size of the performance classes in each service cluster. The L2 control decision is subject to the constraint $\Delta n_{ij} \le 2$ to make the control problem more tractable by limiting the size of the search space. The control cost at this level includes the time to boot a machine, or the time to load applications for a new client class, and we set the lookahead horizon of the L2 controller to $h > 1$.

Controllers at different levels of the hierarchy can operate at different time scales. Since the L2 controller accounts for the latency of turning on new machines and loading new applications, a cost denoted as $S_w(\Delta n_{ij}(k))$, the L2 controller operates on a longer time scale, on the order of minutes. The control decisions made by the L2 controller serve as additional operating constraints on the L1 controller, and the L1 controller can operate on the same time scale as the L2 controller to distribute the workload among the performance classes. The L0 controller typically reacts to short-term fluctuations in the environment on the order of seconds.
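The multi-rate operation of the hierarchy can be illustrated with a simple scheduler that reports which controllers are due at a given time. The periods below (30 seconds for L0, 120 seconds for L1 and L2) are hypothetical values chosen only to reflect the seconds-versus-minutes distinction described above.

```python
def controllers_due(t_seconds, l0_period=30, l2_period=120):
    """List the controllers that should run at time t, ordered top-down.

    L1 is assumed to share the L2 period, and L2 is listed first because
    its decisions constrain the lower layers. Periods are illustrative.
    """
    due = []
    if t_seconds % l2_period == 0:
        due.extend(["L2", "L1"])
    if t_seconds % l0_period == 0:
        due.append("L0")
    return due
```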

To realize the hierarchical structure in Fig. 6, each high-level controller must know the behavior of the components comprising the immediate lower level. For example, to solve the combinatorial optimization problem of determining $n_{ij}$, the size of the performance classes in each service cluster, the L2 controller must be able to quickly quantify the behavior of each service cluster, including its L1 and L0 controllers, for various choices of $n_{ij}$. From the viewpoint of the L2 controller, this can be achieved using one of the following strategies:

• Simulate, at run time, the behavior of all lower-level controllers and components for various choices of nij.

• Simulate the behavior of downstream components and controllers in an offline fashion as part of a supervised learning process, and then construct an approximation of the cluster dynamics to use at run time.
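The second strategy separates an expensive offline phase from a cheap run-time lookup. The sketch below uses a nearest-grid-point table as a deliberately simple stand-in for the regression tree or neural network; the callable `simulate` and the choice of input features (class size and arrival rate) are assumptions made for illustration.

```python
def build_table(simulate, n_values, lam_values):
    """Offline phase: run the detailed simulation at every grid point."""
    return {(n, lam): simulate(n, lam) for n in n_values for lam in lam_values}


def approximate(table, n, lam):
    """Run-time phase: return the stored result for the nearest grid point.

    The nearest class size is matched first, then the nearest arrival rate.
    """
    key = min(table, key=lambda point: (abs(point[0] - n), abs(point[1] - lam)))
    return table[key]
```

The run-time cost is a single pass over the stored grid, independent of how long each offline simulation took.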

In Fig. 6, the L1 controllers explicitly simulate the behavior of the underlying L0 controllers and managed components at run time due to the small size of the control space and short lookahead horizon (h= 1).

At the L2 layer, however, the size of the search space makes simulation-based optimization too costly to perform at run time. Recall that the provisioning decisions made by the L2 controller incur switching costs, i.e., when a server is being loaded with new applications needed for a different service class, it does not generate any revenue, and this unavailability period is assumed to be equal to the L2 sampling interval. Thus, to estimate the impact of an L2 provisioning decision, the controller must track the behavior of the system over a prediction horizon of h ≥ 2 steps: one step to load a new application environment on the server, and one step to estimate the profit generated when the server is back online. The increased depth of the prediction horizon results in an exponential increase of the search space, especially when the number of control inputs is large.
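The effect of the horizon on the search space can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the one-cluster toy model (`toy_step`, `toy_profit`), the action set {-1, 0, +1}, and the revenue and power constants are hypothetical and are not the system model used in this paper. The sketch only shows that exhaustive lookahead scores |U|^h sequences, and that a server which earns nothing while it is being loaded is never added under a one-step horizon.

```python
from itertools import product


def count_sequences(num_actions, horizon):
    """Number of control sequences an exhaustive lookahead must score: |U|^h."""
    return num_actions ** horizon


def toy_step(servers, delta):
    """Hypothetical state update: servers online after applying the action."""
    return servers + delta


def toy_profit(servers, delta, demand=4, revenue=1.0, power=0.2):
    """Hypothetical one-step profit.

    A server that is being added (delta > 0) earns nothing during this step
    but already draws power; a removed server stops serving immediately.
    """
    active = servers + min(delta, 0)
    powered = servers + delta
    return revenue * min(active, demand) - power * powered


def best_sequence(state, actions, horizon, step, profit):
    """Score every control sequence of length `horizon` and keep the best."""
    best_seq, best_value = None, float("-inf")
    for seq in product(actions, repeat=horizon):
        s, total = state, 0.0
        for u in seq:
            total += profit(s, u)
            s = step(s, u)
        if total > best_value:
            best_seq, best_value = seq, total
    return best_seq, best_value


if __name__ == "__main__":
    actions = (-1, 0, 1)
    for h in (1, 2, 3):
        seq, value = best_sequence(3, actions, h, toy_step, toy_profit)
        print(h, count_sequences(len(actions), h), seq, round(value, 2))
```

In this toy setting the one-step search keeps the cluster size unchanged, while the two-step search adds a server in the first step because the benefit only appears one step later.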

For the foregoing reasons, the behavior of the system components managed by the L1 and L0 control layers is learned via extensive offline simulations, and approximated by a neural network or regression tree within the L2 controller. To further reduce the control overhead, we will also use approximation modeling techniques to learn the behavior of the L2 controller as it adjusts nij in response to both the environment inputs


A. Neural Network Modeling

Applications in economics, engineering, pure science and gaming often apply neural networks to represent nonlinear systems for estimation and forecasting purposes. Neural networks are abstract mathematical models that are trained offline using simulation data to predict one or more outcomes from an input vector. Neural networks are a good choice for behavioral approximation when the exact relationship between the inputs and outputs of a system is impossible or difficult to define, such as in complex, nonlinear, delayed or high-dimensional systems. Although typically applied to data in the continuous domain, neural networks can accommodate the discrete domain with some quantized output filtering.

The basic architecture of a neural network consists of an input layer, an output layer and one or more hidden layers. The hidden layers introduce additional nonlinearity between the input and output layers and are added as needed. Input values {x1, ..., xZ} are each multiplied by a scalar weight, and to that product an input bias may be added to raise positive values and decrease negative values. The weighted sum of the inputs is then passed to an activation function, which may be a linear, sigmoid, or other nonlinearly shaped function. The activation function limits the amplitude of an output. Adjustable parameters of a neural network include the initial input weights, the presence of input biases, and the number of hidden layers.

The general form of a fully-connected network consists of units of one layer connected to all units of the next layer. The feedforward neural network passes the output y of one layer as the input to the next layer [30]. Backpropagation enables updating of the synaptic (edge) weights w and biases b during network training. Mathematically, the output of a neuron can be expressed as:

$$y = \psi\left(\sum_{z=1}^{Z} x_z \cdot w_z + b\right) \qquad (10)$$

where ψ denotes the activation function. During the training phase, updates to the edge weights seek to minimize the squared error between the neural network’s estimated output and the desired output. The neural network becomes increasingly nonlinear as the edge weights are updated. Neural networks typically make use of an additional parameter, a momentum factor, to avoid getting stuck at local minima and terminating the training process prematurely. The advantage of neural networks over other models is that they can be updated online in response to changing dynamics in the system.
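The neuron output in (10) and the role of the momentum factor can be sketched as follows. This is a minimal illustration in plain Python, not the Matlab toolbox implementation used for the experiments. The function names, the sigmoid choice, and the particular momentum update rule (a velocity term accumulated across gradient steps) are assumptions, since the exact training rule is not specified here.

```python
import math


def sigmoid(v):
    """Sigmoid activation; bounds the neuron output to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def neuron_output(x, w, b, activation=sigmoid):
    """Eq. (10): y = psi(sum_z x_z * w_z + b)."""
    if len(x) != len(w):
        raise ValueError("inputs and weights must have the same length")
    return activation(sum(xz * wz for xz, wz in zip(x, w)) + b)


def layer_output(x, weights, biases, activation=sigmoid):
    """Fully-connected layer: every neuron sees the whole input vector."""
    return [neuron_output(x, w, b, activation) for w, b in zip(weights, biases)]


def momentum_step(w, grad, velocity, learning_rate, momentum):
    """One gradient step with momentum (assumed form of the update).

    The velocity keeps a decaying memory of earlier gradients, which helps
    the weights roll through shallow local minima of the squared error.
    """
    new_velocity = [momentum * v - learning_rate * g
                    for v, g in zip(velocity, grad)]
    new_w = [wi + vi for wi, vi in zip(w, new_velocity)]
    return new_w, new_velocity


if __name__ == "__main__":
    x = [1.0, 2.0]
    print(neuron_output(x, [0.5, -0.25], 0.0))
    print(layer_output(x, [[0.5, -0.25], [1.0, 1.0]], [0.0, -3.0]))
```

Passing `activation=lambda v: v` gives the linear activation used in an output layer.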

B. Regression Tree Modeling

Regression trees, sometimes referred to as classification trees, are a form of recursive partitioning that can be represented as a tree. Recursive partitioning is particularly well-suited to discrete data and nonlinear system dynamics [31]. The tree construction process sorts historic data points so as to maximize statistical criteria for one or more predicted values of a chosen system variable at a time. The partitioning function is called recursively, and ceases when there are fewer than K training points to be partitioned. Tree pruning collapses two or more regions into one for a more compact representation. A regression tree representation is typically less compact than that of a neural network. Regression trees are static models in the sense that they are generated offline and must be reconstructed when changes in the system behavior or structure occur. Mathematically, the output of a regression tree can be expressed as [32]:

$$y = \sum_{d=1}^{D} c_d \, I(x \in \mathrm{Region}_d) \qquad (11)$$

where $x = (x_1, \ldots, x_Z)$ and D denotes the total number of mappable regions or possible output values. Recursive partitioning assigns the values of the Z input variables $x = (x_1, \ldots, x_Z)$ to D regions $(\mathrm{Region}_1, \ldots, \mathrm{Region}_D)$, scaling the incidence $I \in \{0, 1\}$ that an input vector falls within a particular region by the output value $c_d$. Input values of selected system variables dictate a path through the tree until terminating at a leaf node, resulting in a predicted output.

Fig. 7. A modified control schematic of Fig. 4 showing replacement of the system model with an approximation model. (Blocks: predictive filter, approximation model, optimizer, system.)
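The recursive partitioning behind (11) can be sketched in a few lines. This is a simplified illustration, not the Matlab statistics toolbox routine used in the experiments. The split criterion (minimum summed squared error), the midpoint thresholds, and the names `build_tree` and `predict` are assumptions, and pruning is omitted. Each leaf corresponds to one region, and its stored mean plays the role of the output value c_d.

```python
def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(points, min_points):
    """Recursively partition `points`, a list of (x, y) with x a tuple.

    Partitioning stops when a node holds fewer than `min_points` samples
    (the K of the text) or when no split reduces the squared error.
    """
    ys = [y for _, y in points]
    leaf = {"value": mean(ys)}
    if len(points) < min_points:
        return leaf
    best = None
    for z in range(len(points[0][0])):
        levels = sorted({x[z] for x, _ in points})
        for lo, hi in zip(levels, levels[1:]):
            t = (lo + hi) / 2.0
            left = [p for p in points if p[0][z] <= t]
            right = [p for p in points if p[0][z] > t]
            if not left or not right:
                continue
            score = sse([y for _, y in left]) + sse([y for _, y in right])
            if best is None or score < best[0]:
                best = (score, z, t, left, right)
    if best is None or best[0] >= sse(ys):
        return leaf
    _, z, t, left, right = best
    return {"input": z, "threshold": t,
            "left": build_tree(left, min_points),
            "right": build_tree(right, min_points)}


def predict(tree, x):
    """Follow the path dictated by x down to a leaf and return its value."""
    while "value" not in tree:
        branch = "left" if x[tree["input"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["value"]


if __name__ == "__main__":
    # A discontinuous target: 0 below x = 5, 10 from x = 5 upward.
    data = [((float(i),), 0.0 if i < 5 else 10.0) for i in range(10)]
    tree = build_tree(data, min_points=2)
    print(predict(tree, (2.0,)), predict(tree, (7.3,)))
```

A single split at 4.5 reproduces the step exactly, which is the kind of discontinuous mapping that a piecewise-constant model captures naturally.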

C. System Behavior Approximation

From the viewpoint of the L2 controller, the behavior of a service cluster is learned by simulating the underlying L1 and L0 control layers in Fig. 6 with a large set of training data, taken from the domains of the L2 control set, the environment input ω, and the system state s = (n, q). Then, we use the collected data to train two different approximation models, a regression tree and a neural network, that the L2 controller can use to evaluate control actions at run time. The approximation models provide, as output, a cost for the current state together with the next-state information. Fig. 7 shows the approximation structure as it would appear in the L2 control schematic.

D. Control Action Approximation

As the number of service classes i or the length of the prediction horizon h increases, the increased control complexity may prohibit real-time operation at the L2 level even when using approximation models for the lower-level control layers. In such cases, it is advantageous to train an approximation model to predict the behavior of the L2 controller itself. By learning the control action under a variety of conditions, two components in the LLC shown in Fig. 4 can be replaced with one approximation model as shown in Fig. 8, effectively eliminating the combinatorial search process within the optimizer and the need for a model of downstream components. Removing this time-intensive search from the control process makes the infrastructure more scalable. In Section V, we show the performance results obtained by replacing these components of the L2 controller with an approximation model represented by a regression tree and a neural network in two separate cases for comparison.

Fig. 8. A modified control schematic of Fig. 4 to be applied at the L2 level of control showing replacement of the system model and optimizer with an approximation model. (Blocks: predictive filter, approximation model, system.)

In Section III-C, we discussed the LLC control concept of optimizing a sequence of control inputs over several discrete time steps in a forward horizon. The exploration over a forward horizon implies that the first control input in the sequence is optimized over h future time steps. Therefore, once the approximation model has been trained to learn how the controller behaves under a variety of conditions, we need only supply the approximation model with a vector of estimated environment inputs $\hat{\omega}_i = (\hat{\lambda}_i, \hat{\mu}_i)$, where $\hat{\lambda}_i = \{\hat{\lambda}_i(k+1), \ldots, \hat{\lambda}_i(k+h)\}$ and $\hat{\mu}_i = \{\hat{\mu}_i(k+1), \ldots, \hat{\mu}_i(k+h)\}$, and the approximate control vector $\tilde{u}_i(k) = \{\tilde{n}_{ij}(k)\ \forall j\}$ returned by the model will implicitly have been optimized over the horizon of length h. The control output $\tilde{n}_i(k)$ can be considered as a “seed” value, and a small search space around this value comprising $\{\tilde{n}_{ij}(k) \pm 1\ \forall j\}$ can be explored to improve the control performance.

V. SIMULATION RESULTS

The results presented in this section were generated by simulations executed in Matlab on a 3 GHz Pentium processor with 1 GB of RAM. Simulations using representative workloads show that a computing system controlled using the proposed LLC scheme with approximation modeling achieves profit gains in the best case within 1% of those achieved with explicit modeling using first principles.

Table I shows the system and controller parameters used in our simulations. The set of operating frequencies in Table I could be increased with little cost to the L0 controller that selects from among them. However, the training time for the L2 controller would increase for each element added to the set. The same is true for adding clusters and servers to the initial configuration set in Table I. The prediction horizons are set to the minimum length possible at each level. The L2 controller must look ahead at least two time steps to account for switching delays while the L1 and L0 controllers can suffice with one step. The sampling period for the L1 and L0 controllers is determined by HTTP request traces, collected in 30-second increments, while the L2 controller sampling period is determined by the time to power down and reboot a server, about 1 minute.

The Kalman filter used to estimate the number of request arrivals is first trained using a small portion of the workload (42 samples) and then used to forecast the remainder of the load during controller execution. Fig. 9 shows the workload for the Gold service and the corresponding predictions, which are within 6% of the actual values. The Kalman filter can output as many estimates as necessary; for example, at time k, a 3-step lookahead controller can obtain three estimates from the Kalman filter corresponding to the time steps k+1, k+2, and k+3, although each successive estimate will have increased prediction error.

TABLE I
SIMULATION PARAMETERS

| Parameter | Value |
|---|---|
| Maximum operating frequencies of m = 3 performance classes, normalized to 1.0 | 1.0, 0.8, 0.6 |
| Power consumed by idle server, c0 | 50 Watts |
| Dynamic power consumed, scaling constant c1 | 80 Watts |
| Cost per kilo-Watt hour | $0.17 |
| Switching delay | 1 min. |
| Prediction horizon, L2 | 2 time steps |
| Prediction horizon, L1, L0 | 1 time step |
| Control sampling period, L2 | 1 min. |
| Control sampling period, L1, L0 | 30 sec. |
| Initial configuration, num. servers | (9,7,5)/(9,7,5)/(9,7,5) |
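The multi-step forecasting behavior described for the Kalman filter, in particular the growth of the prediction error with each additional step, can be sketched with a scalar filter. The random-walk (local level) state model and the variance values below are assumptions chosen for brevity; the filter used in the experiments may well include trend or seasonal components that this sketch omits.

```python
def kalman_update(estimate, variance, observation,
                  process_var, measurement_var):
    """One predict/update cycle of a scalar random-walk Kalman filter."""
    # Predict: the level is assumed unchanged, but uncertainty grows.
    prior_var = variance + process_var
    # Update: blend the prediction with the new observation.
    gain = prior_var / (prior_var + measurement_var)
    new_estimate = estimate + gain * (observation - estimate)
    new_variance = (1.0 - gain) * prior_var
    return new_estimate, new_variance


def forecast(estimate, variance, process_var, steps):
    """Forecasts for k+1 .. k+steps as (mean, state variance) pairs.

    Under a random-walk model the mean stays flat while the variance
    increases by `process_var` per step, so later estimates are less certain.
    """
    out = []
    var = variance
    for _ in range(steps):
        var += process_var
        out.append((estimate, var))
    return out


if __name__ == "__main__":
    # Hypothetical arrival counts per 30-second interval.
    arrivals = [410.0, 425.0, 440.0, 432.0, 451.0]
    est, var = arrivals[0], 100.0
    for obs in arrivals[1:]:
        est, var = kalman_update(est, var, obs,
                                 process_var=25.0, measurement_var=100.0)
    for step, (m, v) in enumerate(forecast(est, var, 25.0, 3), start=1):
        print(f"k+{step}: mean={m:.1f} variance={v:.1f}")
```

A 3-step lookahead controller would consume all three pairs returned by `forecast`, weighting the later ones with their larger uncertainty if desired.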

A. Approximation Model Construction

The regression tree approximating the behavior of the system elements downstream from the L2 controller contains more than 600 decision branches. The corresponding neural network uses one input layer and one hidden layer, each having 14 neurons, and one output layer. Sigmoid activation functions are used in the input and hidden layers whereas the output layer uses a linear activation function. For both of these approximation structures, the training data of 5.3 million samples is obtained by extensively simulating the operation of the L1 and L0 controllers for a range of system-state values (n, q) and environment inputs (λ, µ).¹

The regression tree used to learn the behavior of the L2 controller contains about 20 decision branches and is trained using 2720 samples. To collect these samples, we simulate the operation of the L2 controller using two representative workloads synthesized from the WC’98 traces such as those shown in Fig. 3. In this off-line training mode, the L2 controller makes its provisioning decisions under a time-varying workload by fully simulating the behavior of all downstream controllers and components. The neural network to learn the behavior of the L2 controller is trained in similar fashion.

B. Workload Generation and Pricing Structure

We obtained six different synthetic workloads using HTTP-request traces from the WC’98 web site [33]. Fig. 3 shows one such workload synthesized using the WC’98 traces, representing requests made to the Gold, Silver, and Bronze services over a 24-hour period. Request arrivals are plotted in 30-second intervals and exhibit an appreciable amount of noise and variability.

¹ The Matlab statistics and neural network toolboxes are used to train the approximation models.

To generate the processing times for individual requests within the arrival sequences, we assumed processing times within a range reflecting both static and dynamic page requests [34]. We generated a virtual cache comprising 10,000 objects, and the time needed to process an object request was randomly chosen from a uniform distribution over [1.0, 43.0] ms. The distribution of individual requests within the arrival sequence was determined using two key characteristics of most web workloads: popularity of requests commonly following Zipf’s law [35], and temporal locality of requests commonly following a lognormal distribution [36].
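The workload synthesis described above can be sketched as follows, under stated simplifications. The sketch draws per-object processing times uniformly from [1.0, 43.0] ms and samples object popularity from a Zipf-like law. The Zipf exponent of 1.0, the function names, and the fixed seed are assumptions, and the lognormal temporal-locality model is deliberately left out.

```python
import random


def make_virtual_cache(num_objects, min_ms, max_ms, rng):
    """Assign each cached object a fixed processing time in milliseconds."""
    return [rng.uniform(min_ms, max_ms) for _ in range(num_objects)]


def zipf_weights(num_objects, exponent=1.0):
    """Unnormalized Zipf popularity: weight of rank r is 1 / r**exponent."""
    return [1.0 / (rank ** exponent) for rank in range(1, num_objects + 1)]


def sample_requests(service_times, weights, count, rng):
    """Draw `count` requests as (object id, processing time) pairs."""
    ids = rng.choices(range(len(service_times)), weights=weights, k=count)
    return [(i, service_times[i]) for i in ids]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so that runs are repeatable
    cache = make_virtual_cache(10_000, 1.0, 43.0, rng)
    requests = sample_requests(cache, zipf_weights(len(cache)), 1_000, rng)
    mean_ms = sum(t for _, t in requests) / len(requests)
    print(f"{len(requests)} requests, mean processing time {mean_ms:.1f} ms")
```

Because popular objects dominate the request stream, the mean processing time of a sampled sequence depends heavily on which times happened to be assigned to the top-ranked objects.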

The static cluster size for an uncontrolled computing system or the initial configuration for a controlled one is determined using the maximum arrival rate observed within the workload of interest and the maximum service time of a request (set at 43 ms). Given this arrival rate and processing time, the initial configuration for each cluster is simply the number of servers needed to achieve the response time corresponding to a revenue generation of zero dollars—the break-even point in the pricing structure shown in Fig. 2. The X-axis shows the response time in milliseconds, and the Y-axis shows the reward in micro-dollars per request. The point at which the function crosses the X-axis is the maximum response time tolerated by a client class before seeking a refund.

C. Controller Performance

Test cases for the hierarchical controller shown in Fig. 6 were constructed with the following implementations of the L2 controller.

• A “baseline” controller includes an L2 controller that uses an explicit model of system components and fully simulates the actions of the L1 and L0 controllers at run time to make provisioning decisions.

• Controllers denoted as “System model ← (Approximation model)” contain an L2 controller that has learned the behavior of downstream controllers and components with a regression tree or a neural network. The implementation of these test cases is shown in Fig. 7.

• Controllers denoted as “L2 control behavior ← (Approximation model)” contain a regression tree or a neural network that has learned the behavior of the L2 controller itself, obviating the need for a full-fledged optimization process inside the L2 controller. The implementation of these test cases is shown in Fig. 8.

Each of the two approximation-based implementations is tested with both a regression tree and a neural network, thereby generating a total of four test cases in addition to the baseline controller. The performance metric in all cases compares the profit earned by the controlled system against that of an adequately proportioned but uncontrolled system. Fig. 10 shows an example of the profit gains achieved by the baseline controller for the workload shown in Fig. 3 over a 24-hour period, generating about 15% more than the $500 in profit earned by the uncontrolled system. In the best case, our experimental controllers using approximating structures should be able to match the profits earned by the baseline controller.

Fig. 9. The workload for the Gold service cluster and the corresponding predictions obtained using a Kalman filter. (Plot: arrival rate per 30-second interval versus time instance; Gold workload as a solid line, Kalman estimate for t+1 as a dotted line.)

Fig. 10. A comparison of profits generated by LLC-controlled and uncontrolled systems. (Plot: profit in dollars versus time instance, in 30-second intervals.)

Fig. 11. Surface map of the revenue returned by the Gold SLA function in Fig. 2 for various arrival rates and cluster sizes. (Axes: number of machines in the Gold cluster, arrivals per 30 seconds, revenue in dollars.)

For the DCE shown in Fig. 1, a control structure using approximation models at the L2 control layer achieves profits that are, in the average case, within 4.5% of the gains achieved by the baseline controller. Table II compares the profit achieved by the baseline controller against the four approximation-based controller implementations for six representative workloads, labeled WL0 to WL5.


TABLE II
THE PROFIT GAINS ACHIEVED BY LLC-MANAGED DCES OVER AN UNCONTROLLED SYSTEM FOR SIX WORKLOADS SIMILAR TO THAT SHOWN IN FIG. 3 AND THE L2 CONTROL EXECUTION TIMES

| L2 control implementation | L2 control execution time | Profit, WL0 | Profit, WL1 | Profit, WL2 | Profit, WL3 | Profit, WL4 | Profit, WL5 |
|---|---|---|---|---|---|---|---|
| Baseline controller with explicit model | 82 sec. | 20.05% | 15.99% | 13.35% | 13.39% | 18.75% | 23.90% |
| System model ← regression tree | 23 sec. | 17.19% | 15.57% | 13.31% | 13.32% | 18.00% | 21.51% |
| System model ← neural network | 54 sec. | 17.19% | 15.52% | 11.87% | 13.06% | 17.79% | 19.54% |
| L2 control behavior ← regression tree | <1 sec. | 19.16% | 8.86% | 10.64% | 10.76% | 14.65% | 17.36% |
| L2 control behavior ← neural network | <1 sec. | 17.35% | 10.20% | 8.18% | 11.44% | 13.34% | 13.49% |
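As a reading aid for Table II, the following sketch computes, for each approximation-based implementation, the mean and the smallest shortfall relative to the baseline controller, in percentage points of profit gain. The profit values are copied from Table II; the choice of percentage-point differences as the comparison metric and the dictionary keys are assumptions, and the sketch does not attempt to reproduce how the average-case figure quoted in the text was derived.

```python
# Profit gains over the uncontrolled system (percent), WL0..WL5, from Table II.
PROFIT_GAIN = {
    "baseline": [20.05, 15.99, 13.35, 13.39, 18.75, 23.90],
    "system model <- regression tree": [17.19, 15.57, 13.31, 13.32, 18.00, 21.51],
    "system model <- neural network": [17.19, 15.52, 11.87, 13.06, 17.79, 19.54],
    "L2 behavior <- regression tree": [19.16, 8.86, 10.64, 10.76, 14.65, 17.36],
    "L2 behavior <- neural network": [17.35, 10.20, 8.18, 11.44, 13.34, 13.49],
}


def shortfalls(name):
    """Per-workload gap to the baseline, in percentage points."""
    return [b - a for b, a in zip(PROFIT_GAIN["baseline"], PROFIT_GAIN[name])]


def mean_shortfall(name):
    gaps = shortfalls(name)
    return sum(gaps) / len(gaps)


def best_case_shortfall(name):
    return min(shortfalls(name))


if __name__ == "__main__":
    for name in PROFIT_GAIN:
        if name == "baseline":
            continue
        print(f"{name}: mean gap {mean_shortfall(name):.2f} points, "
              f"best case {best_case_shortfall(name):.2f} points")
```

Under this metric the regression tree used as a system model stays closest to the baseline, with its smallest gap occurring on WL2.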

Fig. 12. Gold cluster sizes assigned by the L2 level controller for workload WL4 using a neural network in place of the system model. (Plot: number of machines in Classes 1 to 3 versus time in 30-second increments.)

Fig. 13. Gamma workload distribution assigned by the L1 level controller for the Gold cluster for workload WL4 when the L2 level controller is using a neural network in place of the system model. (Plot: fraction of workload assigned to Classes 1 to 3 versus time.)

The table shows that the control performance using neural networks is inferior to that of the regression tree, largely due to the discontinuous function that the neural network must learn. Fig. 11 shows a surface map of the revenue generated from the Gold service class for various arrival rates and cluster sizes. The surface map shows discontinuities in the function that make it challenging to approximate the function using a neural network.

Fig. 14. Gold cluster sizes assigned by the L2 level controller for workload WL4 using a regression tree in place of the system model. (Plot: number of machines in Classes 1 to 3 versus time in 30-second increments.)

Fig. 15. Gamma workload distribution assigned by the L1 level controller for the Gold cluster for workload WL4 when the L2 level controller is using a regression tree in place of the system model. (Plot: fraction of workload assigned to Classes 1 to 3 versus time.)

In Table II, we note that for workloads WL0, WL1, and WL4, the profit gains achieved by the L2 controller that uses a neural network or regression tree to approximate system behavior are nearly the same, though Figs. 12 and 14 show that the resulting L2 control actions vary greatly depending upon the approximation method used. Since controllers within the hierarchy cooperate to maximize the system utility, the impact of switching decisions made by the L2 controller is counteracted somewhat by the underlying L1 and L0 controllers. For example, consider the L2 controller that switches machines between clusters using a neural network to approximate the system behavior (Fig. 12). The L1 and L0 controllers will counteract these switching decisions by redistributing the workload among the available machines, and by tuning the server operating frequencies. Moreover, since the L1 and L0 controllers execute more frequently than the L2 controller, they can correct for workload-prediction errors and short-term variations in the workload arrivals.

Fig. 16. Operating frequency assigned by the L1 level controller for Class 2 machines in the Gold cluster for workload WL4 when the L2 level controller is using a neural network in place of the system model. (Plot: operating frequency normalized to fmax versus time.)

Fig. 17. Operating frequency assigned by the L1 level controller for Class 2 machines in the Gold cluster for workload WL4 when the L2 level controller is using a regression tree in place of the system model. (Plot: operating frequency normalized to fmax versus time.)

Figs. 13-17 show how the lower-level controllers counteract the decisions of the L2 controller. (Only the Gold cluster is shown in Figs. 13-17 for brevity, although counteraction by lower-level controllers exists in all clusters.) For workload WL4, Figs. 12 and 14 show the Gold cluster sizes assigned by the L2 controller using a neural network and a regression tree, respectively, to model the system dynamics. The L2 controller using a neural network switches machines between clusters frequently and, on average, assigns 1.5 fewer machines to the Gold cluster per time instant than an L2 controller using a regression tree. The L0 controller counteracts by increasing the operating frequency; Figs. 16 and 17 show the operating frequencies (normalized to fmax) for Class 2 machines in the Gold cluster. The operating frequencies assigned by the L0 controller in Fig. 16 are, on average, 15% higher than those shown in Fig. 17.

TABLE III
AVERAGE APPROXIMATION ERROR OF THE REGRESSION TREE AND NEURAL NETWORK MODELS

| Modeling Technique | Average Error |
|---|---|
| Regression tree | 27% |
| Neural network | 30% |

A neural network that approximates L2 control actions allocates a greater proportion of Class 3 machines to the Gold service. Fig. 13 shows that a greater proportion of the workload, 7% on average, is distributed to the Class 3 machines by the L1 level controller as compared to the workload distribution shown in Fig. 15.

The execution-time overhead incurred by the baseline L2 controller with a prediction horizon of 2 steps is 82 seconds², clearly violating the 60-second sampling interval. By contrast, the overhead cost incurred by a similar L2 controller using a regression tree to model the system dynamics is 23 seconds, about 72% less than the overhead incurred by the baseline controller. The overhead cost incurred by an L2 controller using a neural network model of the system is 54 seconds, about 34% less than the overhead cost of the baseline controller. When either approximation model is applied to learn the L2 controller behavior, the overhead cost falls below one second, roughly 1% of the cost incurred by the baseline.

An L2 controller using a 3-step lookahead horizon slightly outperforms its counterpart with a 2-step lookahead horizon, achieving 1-2% in additional profit gain, but adds an order of magnitude to the execution time due to the exponential growth of the control search space, as noted in (9).

VI. OPTIMALITY CONSIDERATIONS

In an approximation process, some discrepancy between an estimated value and the actual value is to be expected. Table III summarizes the average error of the estimated L2 control actions output by the neural network and regression tree, compared to the actual L2 control actions of the baseline controller. Errors may occur due to the interpolation process occurring within the regression tree or due to function misfits within the neural network. Therefore, exploring a small localized area nij(k) ± 1 around the “seed” value returned by the approximation model to adjust for interpolation errors, rounding errors, and functional aberrations will improve control performance.
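The local refinement around the seed value can be sketched as below. The function name `refine_seed`, the per-class bounds, and the toy scoring function are hypothetical; in the controller, the score of a candidate allocation would come from the system model or its approximation. The point of the sketch is only that a ±1 neighborhood over three performance classes contains at most 27 candidates, which is far smaller than the full combinatorial search.

```python
from itertools import product


def refine_seed(seed, evaluate, lower, upper, radius=1):
    """Search the +/- radius neighborhood of a seed allocation.

    seed     -- tuple of machine counts per performance class (the model output)
    evaluate -- callable returning a score (higher is better) for a candidate
    lower, upper -- inclusive bounds applied to every class size
    """
    ranges = [range(max(lower, n - radius), min(upper, n + radius) + 1)
              for n in seed]
    best, best_score = tuple(seed), evaluate(tuple(seed))
    for candidate in product(*ranges):
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


def toy_score(candidate, target=(9, 7, 5)):
    """Hypothetical stand-in for the profit estimate: closeness to a target."""
    return -sum((c - t) ** 2 for c, t in zip(candidate, target))


if __name__ == "__main__":
    seed = (8, 7, 6)  # e.g. an approximation model output off by one
    print(refine_seed(seed, toy_score, lower=0, upper=20))
```

If the seed is already the best candidate in its neighborhood, the function returns it unchanged, so the refinement never degrades the estimated score.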

In an uncertain operating environment, control decisions cannot be shown to be optimal since the controller does not have perfect knowledge of future environment inputs. Furthermore, control decisions are made from a discrete set of inputs chosen from a localized search area explored within a limited prediction horizon. These factors limit control actions to the best from a set of sub-optimal decisions. Therefore, we must be satisfied with the best control action confined within the boundaries of these constraints.

VII. CONCLUSIONS

We have compared approximation modeling techniques to solve a dynamic resource provisioning problem in a DCE. The proposed control framework employs a limited lookahead control approach to maximize profit generation by dynamically shifting computing resources between services, distributing workloads, and reducing power consumption costs. Experiments using WC’98 workload traces indicate that, over the operating period of one day, the controllers using approximation techniques generate profit gains in the best case within 1% of the profits earned by a controller with an explicit model having perfect knowledge of the underlying components, and with low control overhead.

Approximating the behavior of downstream managed components obviates the need for run-time simulations. By substituting an approximation model in place of the iterative optimization process inside the L2 controller, the controller may scale to a large number of service classes and cluster sizes. This result can be exploited to explore a larger control space in training sets and improve the LLC performance.

Finally, online model learning can be used to improve the accuracy of the original model or to maintain the correctness of the model against slow behavioral changes to system components (due to replacements, upgrades, etc.). Achieving robust control using online model learning to adapt to changing operating conditions is a topic of interest in current work.

REFERENCES

[1] Q. Li and M. Bauer, “Understanding the performance of enterprise applications,” in Proc. of IEEE Conference on Systems, Man and Cybernetics, June 2005, pp. 2825–29.

[2] G. Pacifici, M. Spreitzer, A. Tantawi, and A. Youssef, “Performance management for cluster based web services,” IBM Research Report RC22676, IBM Research Labs, Tech. Rep., May 2003.

[3] B. Urgaonkar, P. Shenoy, A. Chandra, and P. Goyal, “Dynamic provisioning of multi-tier internet applications,” in Proc. of IEEE Intl. Conf. on Autonomic Computing (ICAC). IEEE, June 2005, pp. 217–28.

[4] Y. Chen, A. Das, W. Qin, A. Sivasubramaniam, Q. Wang, and N. Gautam, “Managing server energy and operational costs in hosting centers,” in Proc. of ACM SIGMETRICS. ACM, June 2004, pp. 303–14.

[5] S. Abdelwahed, N. Kandasamy, and S. Neema, “Online control for self-management in computing systems,” in Proc. of Real-Time and Embedded Technology and Applications Symposium (RTAS). IEEE, May 2004, pp. 368–75.

[6] D. Kusic and N. Kandasamy, “Risk-aware limited lookahead control for dynamic resource provisioning in enterprise computing systems,” in Proc. of IEEE Intl. Conf. on Autonomic Computing (ICAC). IEEE, June 2006, pp. 74–83.

[7] M. Welsh and D. Culler, “Adaptive overload control for busy internet servers,” in Proc. of USENIX Sym. on Internet Technologies and Systems (USITS), March 2003.

[8] L. Wang and D. Lowther, “Selection of approximation models for electromagnetic device optimization,” IEEE Trans. on Magnetics, vol. 42, no. 4, pp. 1227–30, Apr. 2006.

[9] J. Baras, “Modeling and simulation of telecommunication networks for control and management,” in Proc. IEEE Conf. on Simulation, Dec. 2003, pp. 431–40.

[10] E. Ipek, S. McKee, B. de Supinski, M. Schulz, and R. Caruana, “Efficiently exploring architectural design spaces via predictive modeling,” ACM SIGOPS Operating Systems Review, vol. 40, no. 5, pp. 195–206, Oct. 2006.

[11] D. Ho, J. Li, and Y. Niu, “Adaptive neural control for a class of nonlinearly parametric time-delay systems,” IEEE Trans. on Neural Networks, vol. 16, no. 3, pp. 625–35, May 2005.

[12] S. Limanond and J. Si, “Neural network-based control design: An LMI approach,” IEEE Trans. on Neural Networks, vol. 9, no. 6, pp. 1422–9, Nov. 1998.

[13] J. Zhang and S. Dai, “NN control of discrete-time MIMO systems with input delay,” in Proc. IEEE American Control Conference, Jun. 2006.

[14] M. Polycarpou and A. Helmicki, “Automated fault detection and accommodation: A learning systems approach,” IEEE Trans. on Systems, Man and Cybernetics, vol. 25, no. 11, pp. 1447–58, Nov. 1995.

[15] S. Parekh, N. Gandhi, J. Hellerstein, D. Tilbury, T. Jayram, and J. Bigus, “Using control theory to achieve service level objectives in performance management,” in Proc. IFIP/IEEE Int. Symp. on Integrated Network Management, May 2001, pp. 841–54.

[16] R. Das, G. Tesauro, and W. Walsh, “Model-based and model-free approaches to autonomic computing,” IBM Research Report RC23802 (WO511-125), Thomas J. Watson Research Center, Tech. Rep., Nov. 2005.

[17] S. Aljahdali, A. Sheta, and D. Rine, “Prediction of software reliability: A comparison between regression and neural network non-parametric models,” in Proc. ACS/IEEE Conf. on Computer Systems and Applications, Jun. 2001, pp. 470–3.

[18] K. Narendra and K. Parthasarathy, “Identification and control of dynamical systems using neural networks,” IEEE Trans. on Neural Networks, vol. 1, no. 1, pp. 4–27, Mar. 1990.

[19] K. Narendra, “Neural networks for control theory and practice,” IEEE Trans. on Neural Networks, vol. 1, no. 1, pp. 4–27, Mar. 1990.

[20] J. Bigus, “Applying neural networks to computer system performance tuning,” in Proc. of IEEE Conf. on Neural Networks, Jul. 1994, pp. 2442–7.

[21] B. Lee and D. Brooks, “Accurate and efficient regression modeling for microarchitectural performance and power prediction,” ACM SIGOPS Operating Systems Review, vol. 40, no. 5, pp. 285–94, Oct. 2005.

[22] S. Gurun and C. Krintz, “A run-time, feedback-based energy estimation model for embedded systems,” in Proc. of ACM Conf. on Hardware/Software Codesign and System Synthesis, Oct. 2006, pp. 28–33.

[23] G. Tesauro, N. K. Jong, R. Das, and M. N. Bennani, “A hybrid reinforcement learning approach to autonomic resource allocation,” in Proc. of IEEE Intl. Conf. on Autonomic Computing (ICAC), 2006, pp. 65–73.

[24] J. Zhang, T. Hamalainen, and J. Joutsensalo, “Optimal resource allocation scheme for maximizing revenue in the future IP networks,” in Proc. of the 10th Asia-Pacific Conf. on Comm. and 5th Intl. Sym. on Multi-Dimensional Mobile Comm. IEEE, Sept. 2004, pp. 128–32.

[25] A. C. Harvey, Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge, UK: Cambridge University Press, 2001.

[26] P. Pillai and K. Shin, “Real-time dynamic voltage scaling for low-power embedded operating systems,” in Operating Systems Principles (SOSP), 2001, pp. 89–102.

[27] M. Elnozahy, M. Kistler, and R. Rajamony, “Energy-efficient server clusters,” in Proc. of the 2nd Workshop on Power-Aware Computing Systems (held with HPCA). HPCA, Feb. 2002.

[28] P. Antsaklis and A. Nerode, Eds., Special Issue on Hybrid Control Systems, ser. IEEE Trans. Autom. Control, vol. 43, Apr. 1998.

[29] S. Abdelwahed, N. Kandasamy, and S. Neema, “A control-based framework for self-managing distributed computing systems,” in Proc. ACM Workshop Self-Managing Systems (WOSS), 2004.

[30] S. Haykin, Neural Networks: A Comprehensive Foundation. Prentice-Hall, 1999.

[31] F. E. Harrell Jr., Regression Modeling Strategies. Springer, 2001.

[32] L. Wasserman, All of Nonparametric Statistics. Springer, 2006.

[33] M. Arlitt and T. Jin, “Workload characterization of the 1998 world cup web site,” Hewlett-Packard Labs, Technical Report HPL-99-35R1, Tech. Rep., Sept. 1999.

[34] C. Rusu, A. Ferreira, C. Scordino, and A. Watson, “Energy-efficient real-time heterogeneous server clusters,” April 2006, pp. 418–28.

[35] M. Arlitt and C. Williamson, “Web server workload characterization: The search for invariants,” in Proc. of ACM SIGMETRICS. ACM, 1996, pp. 126–37.

[36] P. Barford and M. Crovella, “Generating representative web workloads for network and server performance evaluation,” in Proc. ACM