Improving Response Time and Energy Efficiency in Server Clusters


Raphael Guerra, Luciano Bertini and J.C.B. Leite

Instituto de Computação - Universidade Federal Fluminense Rua Passo da Pátria, 156, Bloco E, 24.210-240, Niterói, RJ, Brazil

[rguerra, lbertini, julius]@ic.uff.br

Abstract. The development of energy-efficient web server clusters requires the study of different request dispatch policies applied by the central access point to the cluster, the front-end, and/or the application of hardware techniques that allow the best usage of resources. However, energy efficiency should not be attained at the expense of poor response times. This paper describes a technique that tries to balance energy consumption and adequate response times for soft real-time applications in server clusters.

Resumo. The development of energy-efficient web servers requires not only the study of dispatch policies to be applied by the central node, but also the use of hardware techniques that allow a better usage of resources. However, this efficiency cannot be obtained at the expense of meeting the execution deadlines of the requests. This paper describes a technique that tries to balance energy consumption and adequate response times for soft (non-critical) real-time applications in server clusters.

1 Introduction

The development of energy-efficient mechanisms for web server clusters requires the study of different request dispatch policies applied by the central access point to the cluster, the front-end, and/or the application of hardware techniques that allow the best usage of resources. Several works have been published on these policies, and a good review is presented in [Cardellini et al. 2002]. Essentially, they classify the algorithms into those that work at OSI layer 4 and those that work at OSI layer 7. The former are not content aware, i.e., they cannot inspect what content is being requested to make the dispatch decision. The latter, on the other hand, can rely on information extracted from the URL, for purposes such as improving cache affinity, increasing load sharing, and using specialized server nodes to provide, for example, streaming media and dynamic content. Most of the work mentioned, however, aimed to maximize performance, not energy efficiency.

Another important structural characteristic of a server cluster, for which research is still beginning, is the combination of node heterogeneity and energy efficiency. When maintaining a web cluster, a replacement node or a newly added node is naturally different from the old ones. Thus, a cluster is usually homogeneous only when it is first put into service. Another viewpoint on heterogeneity is given in [Lefurgy et al. 2003], on the architecture of commercial servers and the possibilities for energy efficiency in its various subsystems. In that work, they state that mixing power-efficient and performance-efficient processors is important for the support of Internet applications, because these applications require both efficient network-protocol processing and application-level computation. Whatever the motivation, it is necessary to develop new power management techniques aware of the cluster heterogeneity. Furthermore, to support service differentiation, it is necessary to provide some kind of QoS control, for example, at the response time level.

There are two main mechanisms that can be used to reduce the energy consumption in a cluster, without considering memory, disks and other peripherals. The first is DVS (Dynamic Voltage Scaling), which scales the voltage and frequency of the processor to predefined supported levels. The other is the dynamic structure configuration of the server cluster, also called VOVO (Vary-On Vary-Off) or simply dynamic cluster reconfiguration: turning a server off to save energy, or turning a server on to improve performance. Both techniques have been used together by some authors and have proved successful.

Although energy minimization is important, it will not always be desired at maximum levels. For example, the system administrator may wish to speed up the system at a higher energy cost, or it may be desirable to maintain different classes of clients, some of which have more privileges on response time. In an e-commerce application, for example, the clients that have already started a transaction should have better response times than those that are only navigating the site. For this reason, the system must be designed with QoS in mind.

The purpose of this paper is to present a heterogeneous web server cluster model, with the goal of attaining minimum energy expenditure while guaranteeing response time requirements. The techniques used are DVS and cluster reconfiguration, with a content-blind request dispatch algorithm. In this paper, through simulation, we show results that outperform state-of-the-art techniques. The paper is organized as follows: Section 2 presents some related work in energy-efficient web servers. Section 3 presents the system model adopted and Section 4 the problem formulation and solution. Section 5 presents some results and Section 6 presents our conclusions.

2 Related Work

In [Bohrer et al. 2002] the authors applied DVS to a single server, based on utilization limits to change frequency. Also for single servers, the technique of DVS with request delaying is presented in [Elnozahy et al. 2003]. Important works for clustered servers [Chase and Doyle 2001, Chase et al. 2001, Pinheiro et al. 2003, Rajamani and Lefurgy 2003] presented similar ways of applying DVS and cluster reconfiguration, using threshold values based on the utilization or the system load to define the transition points and keep the processor frequencies as low as possible, with the fewest possible active nodes. All these works are summarized in the survey presented in [Bianchini and Rajamony 2004].

The work in [Rusu et al. 2004] evaluates DVS policies for power management in systems with unpredictable workloads. One simple technique, used in [Xu et al. 2005], is the application-oblivious prediction, based on periodical utilization monitoring. They also show more complex techniques which attempt to predict performance needs by monitoring the arrival rate and CPU requirements of each request.

In [Elnozahy et al. 2002] the IVS (Independent Voltage Scaling) and CVS (Coordinated Voltage Scaling) techniques are proposed. In the former, each server node locally decides its frequency value, while in the latter scheme all nodes operate close to the average frequency for the whole cluster. They also combine these DVS techniques with VOVO, which was originally proposed in an earlier version of [Pinheiro et al. 2003]. In this work, only continuous frequencies are considered.

The work in [Sharma et al. 2003] considers DVS in QoS-enabled web server clusters, assuming load balancing in the nodes, which makes the power management problem symmetric across the cluster. A simple reconfiguration technique for a server cluster is presented in [Lien et al. 2004]. Their model assumes an M/M/m queue and the energy consumption is calculated using the system expected waiting time. However, they consider neither heterogeneity nor the DVS capability.

Finally, the papers [Xu et al. 2005] and [Rusu et al. 2006] are the most relevant to our work. The former proposes the LAOVS technique (Load-Aware On-off with independent Voltage Scale), where the number of active nodes is determined using a table calculated off-line, with a load discretization. For each load value, the best number of active nodes is obtained. The local power management is based on DVS, using the same techniques presented in [Rusu et al. 2004]. They do not consider heterogeneity. In [Rusu et al. 2006] they include heterogeneity and QoS restrictions.

3 System Model

In our model, we consider a cluster with a total of $N$ server nodes, of which $n$ are active, one front-end node, and only one type of request. The servers can be turned on and off as needed and their operating frequencies can be adjusted in a discrete way. The front-end node, assumed to work at OSI layer 4, receives the requests from clients and redistributes them to the server nodes, in a content-blind request distribution method. The dispatching algorithm is a random weighted dispatch, where the requests are split into $n$ streams, where $n$ is the number of active nodes in the cluster. The probability of an incoming request being sent to a stream is proportional to the operating frequency of the associated node. This same dispatching technique is used in some commercial web servers based on a layer-4 web switch [Cardellini et al. 2002].
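The random weighted dispatch described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `pick_server` and the example frequencies are hypothetical, and the paper does not describe how the front-end is actually implemented.

```python
import random


def pick_server(f_op, rng=random):
    """Return the index of the server that receives the next request.

    f_op[i] is the operating frequency of server i (0 for an inactive
    node), so an inactive node is never selected and an active node is
    selected with probability f_op[i] / sum(f_op).
    """
    if sum(f_op) <= 0:
        raise ValueError("at least one server must be active")
    return rng.choices(range(len(f_op)), weights=f_op, k=1)[0]


# Hypothetical cluster: three active nodes and one inactive node (MHz).
rng = random.Random(42)  # fixed seed, only to make the demo repeatable
f_op = [1000.0, 400.0, 0.0, 600.0]
counts = [0] * len(f_op)
for _ in range(100_000):
    counts[pick_server(f_op, rng)] += 1
print([c / 100_000 for c in counts])  # observed dispatch fractions
```

The observed fractions should approach 0.5, 0.2, 0 and 0.3, that is, each node's share of the total operating frequency.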

We consider that the requests arrive according to a Poisson process with average arrival rate $\lambda$. The requests are distributed to $N$ queues, each one with a service rate $\mu_i$ (thus allowing for heterogeneous servers). The arrival rate for each queue is $\lambda_i = q_i\lambda$, where $q_i$ is the probability of sending a request to server $i$ and is given by

$$q_i = \frac{f\_op_i}{\sum_{j=1}^{N} f\_op_j}.$$

In this last expression, $f\_op_i$ is the operating frequency of server $i$ and $\sum_{j=1}^{N} f\_op_j$ is the sum of the operating frequencies of all nodes (inactive nodes count as 0). Thus, the probability $q_i$ represents the fraction of the load that server $i$ can handle in the current configuration. An inactive node, obviously, handles no load and has zero probability of receiving a request. The service times of the requests follow an exponential distribution with service rate $\mu$ if executed on the fastest processor at its highest frequency ($MAX\_FREQ$). Thus, the service rates for the queues are given by

$$\mu_i = \frac{\mu\, f\_op_i}{MAX\_FREQ}, \qquad i = 1, \ldots, N.$$

The model is shown in Figure 1.
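To make the per-queue quantities concrete, the following sketch computes the dispatch probabilities, arrival rates and service rates of the model from a frequency vector. The function name `queue_rates` and the numeric values are illustrative assumptions, not taken from the paper.

```python
def queue_rates(lam, mu, f_op, max_freq_all):
    """Return (q, lam_i, mu_i) for the cluster model.

    lam          -- total arrival rate at the front end (requests/s)
    mu           -- service rate at max_freq_all, i.e. at MAX_FREQ
    f_op         -- operating frequencies, 0 for inactive nodes
    max_freq_all -- MAX_FREQ, highest frequency of the fastest processor
    """
    total = sum(f_op)
    if total <= 0:
        raise ValueError("at least one server must be active")
    q = [f / total for f in f_op]                  # dispatch probabilities
    lam_i = [qi * lam for qi in q]                 # per-queue arrival rates
    mu_i = [mu * f / max_freq_all for f in f_op]   # per-queue service rates
    return q, lam_i, mu_i


# Hypothetical example: 120 requests/s, mu = 100 requests/s at 1000 MHz.
q, lam_i, mu_i = queue_rates(120.0, 100.0, [1000.0, 600.0, 0.0, 400.0], 1000.0)
for i, (qi, li, mi) in enumerate(zip(q, lam_i, mu_i)):
    print(f"server {i}: q={qi:.2f} lambda_i={li:.1f} mu_i={mi:.1f}")
```

The inactive node (frequency 0) gets probability, arrival rate and service rate all equal to zero, as the text requires.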

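In the model described, one Poisson process is split into $N$ sequences of requests among the $N$ servers, randomly selected as previously described. It is a well known result that such a random split of a Poisson process yields independent Poisson processes, so each server can be treated as an M/M/1 queue.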

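[Figure 1: the front end receives the request stream at rate $\lambda$ and splits it into $N$ queues; queue $i$ receives requests at rate $\lambda_i = \frac{f\_op_i\,\lambda}{\sum_{j=1}^{N} f\_op_j}$ and is served at rate $\mu_i = \frac{\mu\, f\_op_i}{MAX\_FREQ}$.]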

Figure 1. Cluster model

The response time (deadline) will be used as a QoS parameter and the goal is to keep a predefined fraction $\beta$ of the requests finishing before this deadline. We call $\beta$ the reliability factor. Thus, we should keep the probability $W(t) = \Pr[\text{response time} \le t] \ge \beta$. We calculate the mean value of this probability for the whole cluster by the average of each $W_i$ weighted by the probability $q_i$. The equation for $W(t)$, using the distribution function for the response time of an M/M/1 queue [Kleinrock 1975], is as follows:

$$W(t) = \sum_{i=1}^{N} q_i \left(1 - e^{-(\mu_i - \lambda_i)t}\right) \qquad (1)$$
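Equation (1) is straightforward to evaluate numerically. The sketch below is a minimal illustration; the function name `fraction_on_time` is hypothetical, and returning 0 for an overloaded queue (where $\mu_i \le \lambda_i$ and the M/M/1 formula no longer applies) is an assumption of this sketch, not something stated in the paper.

```python
import math


def fraction_on_time(t, q, lam_i, mu_i):
    """Evaluate W(t) of Equation (1): the weighted M/M/1 probability
    that a request finishes within t seconds."""
    w = 0.0
    for qi, li, mi in zip(q, lam_i, mu_i):
        if qi == 0.0:
            continue                 # inactive node, receives no request
        if mi <= li:
            return 0.0               # assumption: unstable queue, QoS not met
        w += qi * (1.0 - math.exp(-(mi - li) * t))
    return w


# Hypothetical three-node configuration with a 0.05 s deadline.
w = fraction_on_time(0.05, [0.5, 0.3, 0.2], [60.0, 36.0, 24.0], [100.0, 60.0, 40.0])
print(f"W(0.05) = {w:.3f}")
```

For these illustrative numbers $W(0.05) \approx 0.752$, so the configuration would satisfy a reliability factor of 0.7 but not one of 0.8.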

The maximum workload that the system supports, in cycles per second, is $\sum_{i=1}^{N} max\_freq_i$. Without loss of generality, frequencies are normalized by the maximum frequency of all the processors and the parameter $\mu$ refers to this maximum frequency. Thus, the mean number of cycles per request is $MAX\_FREQ/\mu$, and the actual load of the system is $\lambda\, MAX\_FREQ/\mu$, in cycles per second. We can then normalize the system load by the maximum supported load:

$$x = \frac{\lambda\, MAX\_FREQ}{\mu \sum_{i=1}^{N} max\_freq_i} = \frac{\lambda}{\mu \sum_{i=1}^{N} \frac{max\_freq_i}{MAX\_FREQ}} \qquad (2)$$
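As a worked instance of Equation (2), consider the cluster used later in Section 5: two machines of each type in Table 1, with top frequencies of 1000, 132, 1000 and 266 MHz, and a mean execution time of 0.01 s at the highest frequency. The arrival rate $\lambda = 240$ requests/s is an assumption chosen only for illustration.

```latex
\begin{align*}
\sum_{i=1}^{N} max\_freq_i &= 2\,(1000 + 132 + 1000 + 266)\ \text{MHz} = 4796\ \text{MHz},\\
\mu &= \frac{1}{0.01\ \text{s}} = 100\ \text{requests/s}, \qquad MAX\_FREQ = 1000\ \text{MHz},\\
x &= \frac{\lambda\, MAX\_FREQ}{\mu \sum_{i=1}^{N} max\_freq_i}
   = \frac{240 \times 1000}{100 \times 4796} \approx 0.50 .
\end{align*}
```

Under the same values, $x = 1$ corresponds to $\lambda = 100 \times 4796 / 1000 = 479.6$ requests/s, the arrival rate at which the whole cluster running at its top frequencies is fully utilized.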

The actual capacity of the active cluster, given by $\sum_{i=1}^{N} f\_op_i$, must be higher than or equal to the actual workload, in order to keep up with the incoming requests. In other words, using the normalized equations, the normalized workload $x$ must not exceed

$$\frac{\sum_{i=1}^{N} f\_op_i}{\sum_{i=1}^{N} max\_freq_i}.$$

Finally, it is assumed that VOVO and DVS decisions are cluster wide and thus taken only by the front-end node. Load measures are made periodically and the decision to reconfigure the system is taken after the increase or decrease of the load is repeated a predefined number of times.
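The reconfiguration rule in the last paragraph can be read as a simple hysteresis counter. The sketch below is one possible interpretation, with hypothetical names: it assumes that an "increase" means a measurement higher than the previous one, and that an unchanged measurement resets the count. The paper does not specify either detail.

```python
class ReconfigTrigger:
    """Signal a reconfiguration after `repeats` consecutive load
    increases or `repeats` consecutive load decreases."""

    def __init__(self, repeats):
        self.repeats = repeats
        self.prev = None      # previous load measurement
        self.direction = 0    # +1 rising, -1 falling, 0 no trend
        self.run = 0          # length of the current monotone run

    def observe(self, load):
        """Feed one periodic load measurement; return True when the
        front end should reconfigure the cluster."""
        if self.prev is None:
            self.prev = load
            return False
        step = (load > self.prev) - (load < self.prev)
        self.prev = load
        if step == 0:
            self.direction, self.run = 0, 0    # assumption: a tie resets
        elif step == self.direction:
            self.run += 1
        else:
            self.direction, self.run = step, 1
        if self.run >= self.repeats:
            self.direction, self.run = 0, 0    # start counting afresh
            return True
        return False


trigger = ReconfigTrigger(repeats=3)
for load in [0.10, 0.20, 0.30, 0.25, 0.30, 0.35, 0.40]:
    print(load, trigger.observe(load))
```

Requiring several consecutive changes in the same direction keeps short load spikes from causing servers to be switched on and off repeatedly.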

4 Problem Definition and Solution Sketch

The problem to be solved is to establish, for each processor, whether it will be on or off and, in the former case, its operating frequency, subject to energy and timing restrictions.

The solution to the problem is a vector $\{f\_op_1, f\_op_2, \ldots, f\_op_N\}$, where $f\_op_i$ is the operating frequency of processor $i$. We can see this problem as an optimization problem, where the goal is to minimize the total aggregate power expended by the cluster while still guaranteeing an acceptable response time. Let $p_i(f_j)$ be the power consumption of processor $i$ running at its frequency $f_j$. The value $f_0$ will be equal to zero, and will represent that processor $i$ is turned off and consumes no energy. With these assumptions, considering only the active servers, the aggregated power of the cluster is $P = \sum_{i=1}^{N} [\rho\, p_i(f\_op_i) + (1-\rho)\, p_i(idle)]$, where $p_i(idle)$ is the power of processor $i$ when idle, and $\rho$ is the processor utilization. The problem can be stated as follows:

Minimize

$$P = \sum_{i=1}^{N} \left[\rho\, p_i(f\_op_i) + (1-\rho)\, p_i(idle)\right] \qquad (3)$$

subject to

$$\frac{\sum_{i=1}^{N} f\_op_i}{\sum_{i=1}^{N} max\_freq_i} \ge x \qquad (4)$$

and

$$\sum_{i=1}^{N} q_i \left(1 - e^{-(\mu_i - \lambda_i)t}\right) \ge \beta \qquad (5)$$

where $t$ is the predefined expected response time, and $\beta$ is the minimum fraction of the requests that should fulfill the QoS requirement.
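A short derivation may help explain why a single $\rho$, without a server index, appears in the objective. It assumes that $\rho$ denotes the usual M/M/1 utilization $\lambda_i/\mu_i$, which the text does not state explicitly. Substituting the definitions of $\lambda_i$ and $\mu_i$ from Section 3 gives, for every active server $i$:

```latex
\begin{align*}
\rho_i = \frac{\lambda_i}{\mu_i}
  &= \frac{f\_op_i\,\lambda}{\sum_{j=1}^{N} f\_op_j}
     \cdot \frac{MAX\_FREQ}{\mu\, f\_op_i}
   = \frac{\lambda\, MAX\_FREQ}{\mu \sum_{j=1}^{N} f\_op_j}
   = x\,\frac{\sum_{j=1}^{N} max\_freq_j}{\sum_{j=1}^{N} f\_op_j},\\[4pt]
\mu_i - \lambda_i
  &= f\_op_i \left(\frac{\mu}{MAX\_FREQ} - \frac{\lambda}{\sum_{j=1}^{N} f\_op_j}\right).
\end{align*}
```

Under this reading, the frequency-proportional dispatch gives all active servers the same utilization, constraint (4) is equivalent to $\rho \le 1$, and the exponent in (5) grows linearly with $f\_op_i$, so faster nodes contribute a larger on-time probability.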

In order to determine the number of active nodes and their respective operating frequencies, and inspired by the work done in [Rusu et al. 2006], the optimization problem is solved off-line and a number of tables are obtained. That is, assuming we have a normalized workload represented in $r_x$ discrete levels, the desired response time in $r_t$ levels, and $r_r$ different reliability factors $\beta$, we will have a maximum of $r_r \times r_t$ tables, each one with $r_x$ entries. Many techniques could be used to obtain the solution. For this experiment, we used a search algorithm to solve the problem optimally, which proved adequate for the number of nodes considered here ($\le 10$). Although an exact algorithm will be inefficient for a greater number of nodes, this is not a concern in this work. In that case, heuristics such as GRASP or Tabu Search could be used to reduce the off-line computation time.
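The off-line table construction can be illustrated with a brute-force search. The paper does not say which search algorithm was used, so the sketch below simply enumerates every configuration. All names are hypothetical, the utilization is taken as $\rho = \lambda\,MAX\_FREQ/(\mu \sum_j f\_op_j)$ (an assumption), and the fallback to full power when no configuration is feasible follows the best-effort rule described in Section 5.

```python
import itertools
import math


def power_and_qos(choice, idle_w, lam, mu, max_freq_all, t):
    """choice[i] is None (off) or a (frequency, busy_watts) pair.
    Returns (P, W): objective (3) and the left-hand side of (5)."""
    active = [(c, w0) for c, w0 in zip(choice, idle_w) if c is not None]
    total = sum(c[0] for c, _ in active)
    if total <= 0:
        return math.inf, 0.0
    rho = lam * max_freq_all / (mu * total)    # assumed common utilization
    if rho >= 1.0:
        return math.inf, 0.0                   # no spare capacity, see (4)
    p = sum(rho * c[1] + (1.0 - rho) * w0 for c, w0 in active)
    w = sum(c[0] / total
            * (1.0 - math.exp(-c[0] * (mu / max_freq_all - lam / total) * t))
            for c, _ in active)
    return p, w


def build_table(levels, idle_w, mu, max_freq_all, t, beta, step=0.1):
    """Map each normalized workload level x to the cheapest feasible
    configuration for one (t, beta) pair."""
    options = [[None] + lv for lv in levels]   # None means switched off
    full = tuple(lv[-1] for lv in levels)      # best-effort fallback
    capacity = sum(lv[-1][0] for lv in levels)
    table = {}
    for k in range(1, round(1 / step) + 1):
        x = k * step
        lam = x * mu * capacity / max_freq_all   # invert Equation (2)
        best, best_p = full, math.inf
        for choice in itertools.product(*options):
            p, w = power_and_qos(choice, idle_w, lam, mu, max_freq_all, t)
            if w >= beta and p < best_p:
                best, best_p = choice, p
        table[round(x, 10)] = best
    return table


# Two-node example using the XScale and Power PC 405 GP rows of Table 1.
levels = [
    [(150.0, 0.355), (400.0, 0.445), (600.0, 0.675), (800.0, 1.175), (1000.0, 1.875)],
    [(66.0, 0.74), (133.0, 1.09), (200.0, 1.36), (266.0, 1.58)],
]
table = build_table(levels, [0.355, 0.74], mu=100.0, max_freq_all=1000.0,
                    t=0.05, beta=0.8)
for x, cfg in table.items():
    print(x, cfg)
```

The number of candidate configurations is the product of the number of levels (plus one for "off") of every node, which is why this exhaustive approach is only practical for small clusters.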

5 Simulation Results

In our experiments we assumed a server cluster with 8 machines, two of each type shown in Table 1. The requests follow a Poisson process, with average dependent on the desired workload. The execution time of each request follows an exponential distribution with average 0.01 s (if executed at the highest frequency of the fastest processor). In each experiment, a total of $8 \times 10^5$ requests were simulated. To compute the tables referred to in the previous section, a granularity of 0.01 was assumed for the workload. The QoS [...] it is assumed that load measurements are made every 1 s, that changes in the configuration are done after 5 consecutive load increases (decreases), and that $\rho$ is computed every 1 s in the simulation. Finally, it should be mentioned that, in the simulator, the effect of switching a server on and off is taken into account. For the experiments described, switching on a server implies a 33 s penalty and an additional 190 J of energy consumption.
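The authors' simulator is not described in detail, so the following is not a reconstruction of it. It is a minimal single-queue sanity check, with hypothetical names and parameters, that simulates one FIFO M/M/1 queue and compares the observed fraction of on-time requests with the analytic term $1 - e^{-(\mu_i - \lambda_i)t}$ used in Equation (1).

```python
import math
import random


def simulate_mm1(lam, mu, t, n_requests, seed=0):
    """Simulate a FIFO M/M/1 queue and return the fraction of requests
    whose response time (waiting plus service) is at most t seconds."""
    rng = random.Random(seed)
    arrival = 0.0          # arrival instant of the current request
    server_free_at = 0.0   # instant at which the server becomes idle
    on_time = 0
    for _ in range(n_requests):
        arrival += rng.expovariate(lam)              # Poisson arrivals
        start = max(arrival, server_free_at)         # wait if server busy
        server_free_at = start + rng.expovariate(mu)  # exponential service
        if server_free_at - arrival <= t:
            on_time += 1
    return on_time / n_requests


lam, mu, t = 50.0, 100.0, 0.02
observed = simulate_mm1(lam, mu, t, 200_000)
expected = 1.0 - math.exp(-(mu - lam) * t)
print(f"simulated {observed:.3f}, analytic {expected:.3f}")
```

The two values should agree to within sampling noise, which gives some confidence that the M/M/1 response-time formula is being applied correctly before it is used to build the tables.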

Table 1. Processor specifications

| Processor | Frequencies (MHz) | Respective power consumption (W) |
|---|---|---|
| XScale | idle, 150.0, 400.0, 600.0, 800.0, 1000.0 | 0.355, 0.355, 0.445, 0.675, 1.175, 1.875 |
| Power PC 750 | idle, 4.125, 8.25, 16.5, 33, 99, 115.5, 132 | 1.150, 1.150, 1.369, 1.811, 2.661, 4.763, 5.269, 6.533 |
| Power PC 1GHz 750GX | idle, 533, 600, 667, 733, 800, 867, 933, 1000 | 7.63, 7.63, 7.8, 7.97, 8.13, 8.30, 10.35, 12, 12.25 |
| Power PC 405 GP | idle, 66, 133, 200, 266 | 0.74, 0.74, 1.09, 1.36, 1.58 |
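For experimentation, Table 1 can be transcribed into a small data structure. The sketch below does so and computes two reference values for the 8-machine cluster: the power drawn when every machine is on but idle, and the power drawn when every machine is fully busy at its top frequency. The dictionary layout and function name are hypothetical.

```python
# Table 1 as data: idle power in watts and (frequency in MHz, power in W).
TABLE_1 = {
    "XScale": {
        "idle_w": 0.355,
        "levels": [(150.0, 0.355), (400.0, 0.445), (600.0, 0.675),
                   (800.0, 1.175), (1000.0, 1.875)],
    },
    "Power PC 750": {
        "idle_w": 1.150,
        "levels": [(4.125, 1.150), (8.25, 1.369), (16.5, 1.811), (33.0, 2.661),
                   (99.0, 4.763), (115.5, 5.269), (132.0, 6.533)],
    },
    "Power PC 1GHz 750GX": {
        "idle_w": 7.63,
        "levels": [(533.0, 7.63), (600.0, 7.8), (667.0, 7.97), (733.0, 8.13),
                   (800.0, 8.30), (867.0, 10.35), (933.0, 12.0), (1000.0, 12.25)],
    },
    "Power PC 405 GP": {
        "idle_w": 0.74,
        "levels": [(66.0, 0.74), (133.0, 1.09), (200.0, 1.36), (266.0, 1.58)],
    },
}
MACHINES_PER_TYPE = 2  # the experiments use two machines of each type


def cluster_power_bounds(table, per_type):
    """Return (all_idle_watts, all_busy_at_top_frequency_watts)."""
    idle = per_type * sum(p["idle_w"] for p in table.values())
    full = per_type * sum(p["levels"][-1][1] for p in table.values())
    return idle, full


idle_w, full_w = cluster_power_bounds(TABLE_1, MACHINES_PER_TYPE)
print(f"all on and idle: {idle_w:.3f} W, full power: {full_w:.3f} W")
```

The full-power value is $2 \times (1.875 + 6.533 + 12.25 + 1.58) = 44.476$ W, which appears consistent with the 45 W upper end of the power axis in Figure 2.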

To assess our method, we compared it to the one proposed in [Rusu et al. 2006]. Figure 2 shows the average power consumption of our method, for different response time QoS parameters and a constant reliability factor equal to 0.8, and of the method presented in [Rusu et al. 2006], but without the QoS restrictions, so that we can compare both methods in the most energy-efficient situation. For this comparison, we assumed a QoS parameter of 1 s, because this value is high enough for the great majority of requests to be executed in time, resulting in the best energy efficiency. Our method presents better results, even in some cases where there is a tighter response time restriction (workload lower than 0.3). The reason for this is that in our method the search algorithm finds the best configuration for each load level, while the method presented in [Rusu et al. 2006] uses a predefined sequence of machines to be turned on and off, which limits the optimization process. As expected, the smaller the QoS requirement, or the higher the workload, the faster the processors have to work to respond to the requests within the specified deadline, thus consuming more power. This behavior can be clearly seen in the figure. In our implementation, whenever the defined QoS cannot be satisfied we set the cluster to full power, in order to operate at the best effort level. This is the reason why all curves, at some point, meet in the same line (the full power situation for a certain load). For example, for workloads greater than 0.7 in Figure 2, all the configurations with QoS response times of 0.02, 0.05, and 0.07 seconds achieve the same power consumption.

Figure 3 shows the cluster power consumption considering different workloads. In this experiment, the QoS requirement was kept constant at 0.05 s. As can be seen, the effect on power consumption of imposing a higher reliability factor is greater as the workload increases. In this situation, the system rapidly becomes saturated and starts to work at full power (best effort approach). For example, the curve with workload 0.8 becomes saturated for reliability 0.6, and the curve with workload 0.6 becomes saturated only for reliability 0.8.

Finally, Figure 4 shows the actual fraction of requests that have their time demands satisfied, for different $\beta$ and workloads. Ideally, this curve should follow the identity line,


[Figure 2 plot: power (Watts) versus normalized workload, with curves for Rusu2006 and for QoS = 0.02 s, 0.05 s, 0.07 s and 1.00 s.]

Figure 2. Cluster aggregate power for different QoS requirements, with $\beta = 0.8$

[Figure 3 plot: power (Watts) versus reliability factor, with curves for workload = 0.2, 0.4, 0.6 and 0.8.]

Figure 3. Cluster aggregate power for different workloads and QoS requirement of 0.05 s

[...] of met deadlines (or even impossible, due to cluster saturation). As can be seen, with $\beta$ increasing, the curves depart from the identity line and eventually saturate (meaning that the system cannot satisfy the QoS requirement at the specified reliability level, shown as points below the identity line). Additionally, due to the discrete frequencies of the processors, as the workload and the factor $\beta$ decrease, the curves bend upward. This is because the processors have a minimum operating frequency and the requests are being processed at a higher frequency than necessary. This can be clearly seen in the step-like curve for workload 0.2.

[Figure 4 plot: obtained fraction of met QoS requirements versus reliability factor, with curves for workload = 0.20, 0.55 and 0.71, and the identity line.]

Figure 4. Actual fraction of QoS restrictions met, as a function of $\beta$, for different workloads, with QoS = 0.05 s

6 Conclusion

In this paper we presented a technique to minimize energy consumption while maintaining adequate response times for soft real-time applications in web server clusters. The problem is stated as an optimization problem and is solved off-line. During system operation, according to the offered load, the QoS restriction (response times) and the predefined proportion of requests that should have their deadlines met (soft real-time criterion), processors are switched on and off, and the active ones are set to an optimal operating frequency. In our simulations and comparison to other proposals, the technique described here showed promising results.


7 Acknowledgments

The authors would like to thank CNPq, Capes and Faperj for partially funding this research, and the anonymous reviewers for their comments.

References

Bianchini, R. and Rajamony, R. (2004). Power and energy management for server systems. IEEE Computer, 37(11):68–74.

Bohrer, P., Elnozahy, M., Kistler, M., Lefurgy, C., McDowell, C., and Rajamony, R. (2002). The case for power management in web servers. In Graybill, R. and Melhem, R., editors, Power Aware Computing. Kluwer Academic Publishers.

Cardellini, V., Casalicchio, E., Colajanni, M., and Yu, P. S. (2002). The state of the art in locally distributed web-server systems. ACM Computing Surveys, 34(2):263–311.

Chase, J., Anderson, D., Thakur, P., and Vahdat, A. (2001). Managing energy and server resources in hosting centers. In Proceedings of the 18th Symposium on Operating Systems Principles, pages 103–116, Banff, Alberta, Canada.

Chase, J. and Doyle, R. (2001). Balance of power: Energy management for server clusters. In Eighth Workshop on Hot Topics in Operating Systems.

Elnozahy, M., Kistler, M., and Rajamony, R. (2002). Energy-efficient server clusters. In Second Workshop on Power Aware Computing Systems, pages 179–196, Cambridge, MA, USA.

Elnozahy, M., Kistler, M., and Rajamony, R. (2003). Energy conservation policies for web servers. In 4th USENIX Symposium on Internet Technologies and Systems, Seattle, WA, USA.

Kleinrock, L. (1975). Queueing Systems, volume 1. John Wiley and Sons.

Lefurgy, C., Rajamani, K., Rawson, F., Felter, W., Kistler, M., and Keller, T. W. (2003). Energy management for commercial servers. IEEE Computer, 36(12):39–48.

Lien, C.-H., Bai, Y.-W., Lin, M.-B., and Chen, P.-A. (2004). The saving of energy in web server clusters by utilizing dynamic sever management. In 12th IEEE International Conference on Networks, volume 1, pages 253–257, Singapore.

Pinheiro, E., Bianchini, R., Carrera, E. V., and Heath, T. (2003). Dynamic cluster reconfiguration for power and performance. In Compilers and Operating Systems for Low Power. Kluwer Academic Publishers.

Rajamani, K. and Lefurgy, C. (2003). On evaluating request-distribution schemes for saving energy in server clusters. In IEEE International Symposium on Performance Analysis of Systems and Software, pages 111–122, Austin, Texas, USA.

Rusu, C., Ferreira, A., Scordino, C., Watson, A., Melhem, R., and Mossé, D. (2006). Energy-efficient real-time heterogeneous server clusters. In IEEE Real-Time and Embedded Technology and Applications Symposium, San Jose, CA, USA.

Rusu, C., Xu, R., Melhem, R., and Mossé, D. (2004). Energy-efficient policies for request-driven soft real-time systems. In 16th Euromicro Conference on Real-Time Systems, pages 175–183, Catania, Italy.

Sharma, V., Thomas, A., Abdelzaher, T. F., Skadron, K., and Lu, Z. (2003). Power-aware QoS management in web servers. In 24th IEEE Real-Time Systems Symposium, pages 63–72, Cancun, Mexico.

Xu, R., Zhu, D., Rusu, C., Melhem, R., and Mossé, D. (2005). Energy-efficient policies for embedded clusters. SIGPLAN Notices, 40(7):1–10.