POWER OPTIMIZATION IN HIGH AVAILABILITY SERVER CLUSTER

(1)

POWER OPTIMIZATION IN HIGH AVAILABILITY SERVER

CLUSTER

Supeno Djanali*, Hudan Studiawan†

*† Department of Informatics, Faculty of Information Technology, Institut Teknologi Sepuluh Nopember, 60111

Email: [email protected], [email protected]

ABSTRACT

Building a server cluster is one of the methods to provide high availability web server. However, not all of the servers in the cluster are utilized continuously. This condition leads to wasted energy consumed by unused servers. Therefore, to prevent excessive power consumption we need to optimize energy usage. This paper proposes an automated power optimization in high availability server cluster based on queuing model theory. The power is managed by allocating optimal power to each member in the cluster based on received workload. The proposed method could save power up to 70 Watt per checking period (25 seconds).

Keywords: power optimization, high availability, server cluster, queuing theoretic model.

1 INTRODUCTION

A server cluster consists of a set of computers that work together as an integrated computing resource [1]. Not only able to perform complex computing, server cluster also provides high availability services or applications. It means they have maximum availability level. In addition, the system can also cope with failures that are caused by planned or unplanned factors [2]. Cluster servers are widely used in industry or research field.

The example software that provides high availability services are Heartbeat [3] and Pacemaker [4]. Heartbeat is a cluster messaging services and Pacemaker is a cluster resource management. Heartbeat makes one server and the other one is able to detect each other presence. Cluster resource management gives us fail over capability. If there is one server dies, then resources can be diverted to other servers that are still alive.

Nevertheless, the performance of the server cluster that provides high availability has not been matched with the optimization of consumed power. In 2011, growth of clusters and data centers energy consumption costs in the United States is estimated at 7.4 billion dollars per year [5]. We would require power management to save money on a cluster.

Research in the field of power optimization begins in mobile devices environment to conserve battery [6]. However, it needs an adjustment on the power saving methods when we implement them on a cluster server. In [7], energy optimizations techniques are discussed with inter server consolidation and power level turnover on the server. The only resource considered in the study is storage device. Other power optimization techniques are the scheduling of high-performance computing applications between geographically distributed data centers [8]. In data centers that are not geographically distributed, queuing model theory is used to optimize the allocation of power on each server in a data center such that the achieved average response time is minimum [9]. This model proved to give good results in experiments with different scenarios and varying workloads. However, this model still utilizes a combination of hardware and expensive software in providing power allocation.

Before performing the optimization of power, we must calculate the power consumed by a server. Since the power measurement requires expensive type of server or a physical measuring instrument, we choose to use the power estimation techniques. Various researches to estimate power has been proposed. Power estimation in the entire system using performance counters on microprocessors is discussed in [10]. Access to the chipset, processor, memory, disk, and input/output (I/O) device recorded by the microprocessor and used to estimate power. Mantis is a power modeling in blade environment and Itanium servers [11]. Mantis calculate central processing unit (CPU) usage, access to memory, as well as the level of I/O on the hard disk and network interface. The collected data is then calculated by linear program to generate the power estimation formula.

Computing power without using physical measurement tool has also been done on a virtual machine environment [12]. The parameters considered are also still the same as the Mantis. Different from the research described previously, we utilize power estimation that takes into account

(2)

virtual memory utilization on the Linux operating system (swap). The calculation does not use a linear program but multiple linear regression method. This mechanism has been discussed in our previous work [13].

Another aspect in power optimization is the power restrictions or commonly stated as power capping. Methods of limiting the power use by embedding hardware on the server have been developed [14] and achieve precision measurements at 1 Watt power. This method was improved on [15] by increasing the precision to 0.1 Watt. Furthermore, technical limitations by using an embedded controller power is summarized, reviewed its relationship with the shift in power, and discussed in more detail in [16].

In principle, power capping can be done by setting the state on the components of a server that supports state regulation [17]. We prefer to cap the power by manipulating dynamic voltage and frequency scaling (DVFS) to run power restrictions in a more adaptive environment. This feature has been supported by almost all vendors of processors. DVFS has proven to achieve sufficiently low power consumption without sacrificing performance. The basic idea of DVFS techniques is to dynamically change the supply of voltage level to the processor while providing sufficient processor speed (frequency) to process the workload such that the computing time and limits of performance are still optimum [18].

Clusters of servers that provide services with high availability is pretty much researched and implemented. Large Hadron Collider (LHC) project, the highest energy particle accelerator and the largest one in the world, has a cluster consisting of more than 200 servers using Heartbeat and Pacemaker to ensure availability of services [19]. Power management, which coordinates several approaches to overcome server cooling costs and minimize average power consumption, has also been proposed [20].

From the above description, server cluster that provides high availability services have not been equipped with power optimization capabilities. On the other hand, the queuing model theory has been proposed to perform power management in server farms or data centers and has proven to give good enough performance. Therefore, in this paper we propose a method to optimize power in server cluster model based on queuing model theory. We will attach power optimization capabilities on a cluster server environment that provides high availability services. This feature will be implemented on the top of cluster resource management layer. To cover the weakness of

queuing models have been previously proposed, power limitations will utilize state manipulation features that already available on almost any type of processor.

2 RESEARCH METHOD

There are three main steps to optimize power in high availability server cluster. They are power estimation, power capping, and power optimization as the main step of the proposed method. Every step will be described in detail below.

2.1 Power Estimation

We utilize power estimation technique that not only consider main parameters of utilization i.e. processor, memory, hard disk, and data transfer rate on network interface but also adds virtual memory usage [13]. This technique has successfully been implemented in the blade server environment and we employ it in the non-blade one. This method is described briefly as follows.

A daemon (background process) to record the utilization of each server component is run first. Furthermore, make sure clamp meter as voltage measuring tool certainly has been running well and can monitor server power consumption. Then, the benchmark is run to give the load to the processor, memory, hard disk, swap, and I/O network. Various benchmarks provided are intended to overload the server component in accordance with the real work of the server that processes various tasks.

Data from the daemon log file and clamp meter are synchronized so that we obtain component utilization and power consumed per minute. This data is used as input for multiple linear regression method. Multiple linear regression is used because it can represents several independent variables to be predictors in the better way [21]. This method generates equation that can be used to estimate the power from utilization of server components as follows:

y = 59.0625 + 0.366407*x1 + 0.405598*x2 –

0.638004*x3 + 1.5529*x4 – 0.954271*x5 (1)

whereas x1, x2, x3, x4, and x5 are utilization of

processor, memory, hard disk, swap, and data transfer rate on the network, respectively.

2.2 Power Capping and

Power-Frequency Relationship

Power capping is implemented by manipulating DVFS or so-called P-state. This feature has proven to achieve sufficiently low power consumption without sacrificing performance. For example, on

(3)

an Intel Core i3 there are 14 states in MHz unit: 2.926, 2.793, 2.660, 2.527, 2.394, 2.261, 2.128, 1.995, 1.862, 1.729, 1.596, 1.463, 1.330, and 1.197. There are several modes in P-state setting such as performance, ondemand, conservative, powersave, and userspace. Performance mode puts the processor frequency at the maximum rate. The ondemand will dynamically adjust the rise and fall frequency based on the given workload. Conservative is similar to ondemand, but changes made in more subtle frequencies. As opposed to performance, powersave sets the frequency of the processor run at minimum levels. We will use userspace mode where the processor frequency can be set manually. All of this manipulation can be done with root privileges on the operating system.

Restrictions on the minimum power can be performed by adjusting the frequency of the processor at the lowest state. Whereas the maximum power limitation is implemented by adjusting the frequency of the processor at the highest state. Power capping technique proposed in this study can be run in the more adaptive environment because the features DVFS has been supported by almost all vendors of processors.

The main parameter for power optimization on the cluster is a relation between power and processor frequency. In this context, the maximum power allocation on each server is limited. To determine the relationship between power and frequency, we give workload on the cluster. One type of workload that is used is LINPACK [22].

LINPACK is given to the server continuously without any interval of time between the arrival of the first and subsequent loads. This ensures that the server get peak workload performance in accordance with the allocation of power delivered. The relation between power and processor frequency is expressed in a linear equation as follows:

s = sb + α(P - b), (2)

whereas s is the speed or server processor frequency (GHz), sb is fully loaded server processor frequency

at power b (Watt), α is coefficient of slope in power-frequency relationship (GHz/Watt), P is power allocation allocated (Watt), b is minimum power consumed by fully loaded server at allowed processor frequency range (Watt).

2.3 Power Optimization

Server cluster used consist of k servers. Workload, q, is given to the server with the speed λ. Each server receives the workload of qk. Processor

carries out workload by Processor-Sharing (PS)

scheduling algorithm. This algorithm is identical to the Round Robin algorithm with quantum or time slices that close to zero. Round Robin is a load sharing algorithms that are time-sharing. Furthermore, each server has a processor speed of sk

with given power allocation Pk. Distribution of the

load is carried by load balancer (LB). In the high availability architecture, LB should have a maximum availability. Therefore, we add another LB to handle workload distribution when there is a failure on the active LB. Our proposed server cluster is illustrated in Figure 1.

Queuing model theory will minimize the average response time with a certain power allocation given to the cluster. The workload is assumed to come into clusters based on the Poisson distribution. This study uses open-loop configuration. In this configuration, the load coming from outside the system and leave the system after the process is complete. For this type of configuration, queuing theory model used is shown in the Formula 3 [9].

If

If (3)

whereas c is maximum power consumed by fully loaded server at allowed processor frequency range (Watt), P* is optimum power allocation, n is number of active server at power c, m is number of active server at power b, k is total number of server in the cluster, λ is workload arrival rate in server cluster, and λlow = αP.

Power is measured with the Equation (1) that has been generated from the power estimation step. After getting input from the server processor speed and power consumed, the ratio of these two parameters were compared with coefficient α. We continue optimization process by looking at workload arrival rate (λ). Then, the system decides allocation of power to the server whether it is maximum power (c) or the minimum one (b).

Passwordless ssh (secure shell) is used to turn off the remote server. With passwordless ssh, ssh command can be directly followed by the halt command to shut down the server. As for turning on the server, we use WakeOnLan features that have been supported by all latest versions of motherboards. Passwordless ssh and WakeOnLan (WOL) must be configured first. Therefore, this power optimization daemon (background process) can run well. We employ daemon to automate all tasks implemented by the system. Power optimization daemon is installed on the load

(4)

balancer as well as decision-makers in turn on or off a server. Power optimization feature is run on the background of the operating system once every 25 seconds. Time of 25 seconds was chosen because it takes 25 seconds to turn on or off the server from booting on average.

PS

PS Power P1 and speed s1

Power P2 and speed s2

PS Power Pi and speed si

PS Power Pk and speed sk Arrival workload with speed λ

q1 q2 qi qk LB LB

Figure 1. Proposed server cluster architecture

3 RESULT

To test our proposed method, we employ power optimization system in the cluster with each server has the following hardware specifications: Processor Intel Core i3 530 4 GHz, 4 GHz memory, 250 GB hard disk, 4 GB swap area, and Ubuntu Server 10.04 Lucid Lynx with kernel 2.6.32-21-generic. All software used to implement this research is open source software include: Heartbeat as inter server service message, Pacemaker as a cluster resource management, Apache as web server that has proven reliability, MySQL as database server storing various data of server cluster, Haproxy as load balancer application, Httperf and Autobench as a tool to generate traffic to the cluster servers run from the client side.

To find the processor frequency and power relationship, we give CPU-bound workload Linpack as many as five threads at a time. This will ensure processor reaches its peak load, so consumption also reaching the top. Graphical representation of processor frequency and power relationship are presented in Figure 2. Power consumed by each state is mean of three times tests as drawn in Table 1. Data shown in Figure 2 are estimated by linear approximation so that the parameter α can be obtained as follows:

(4)

For ease of calculation in implementation, the minimum power (b) rounded up to 170 watts and maximum power (c) rounded up to 210 Watt. With 840 Watt power allocation, it can be calculated that

n = 840/210 = 4 and m = 840/170 = 5. In other words, there are four servers running with maximum power and five servers are operating at minimum power. Allocation of total power on a server cluster based on queuing theory model should satisfy the following equation: P = m x b = n

x c. This experiment is trying to find the proper allocation of resources for clusters with five servers in it. The constant α is a threshold for decision making whether a server can be switched on or off.

Figure 2. Relationship of processor frequency and consumed power

Power optimization experiments first done with giving workload to a server cluster. Workload generated by the Poisson distribution is given in accordance with the assumption of the theory of queuing models. Therefore, we set random number with the average value of 7000 as the Poisson distribution parameter. We select 7000 because the value is workload threshold when one server must be turned on or off. Random number is displayed on workload column in Table 2.

Table 1. Power consumed by each state State (MHz) Test 1 (Watt) Test 2 (Watt) Test 3 (Watt) Mean (Watt) 1.197 166.15 168.07 167.05 167.09 1.330 189.33 187.45 188.09 188.29 1.463 187.41 187.75 187.97 187.71 1.596 179.60 178.50 180.70 179.60 1.729 180.75 180.25 178.22 179.74 1.862 190.33 190.32 190.65 190.43 1.995 196.15 196.83 197.99 196.99 2.128 200.17 198.95 202.78 201.63 2.261 188.75 189.55 190.98 189.76 2.394 190.87 191.89 192.97 191.91 2.527 195.76 190.76 190.95 192.49 2.660 195.35 193.00 195.00 195.45 2.793 195.67 195.9 195.90 195.82 2.926 210.30 205.5 207.00 207.60

(5)

Power consumed on each server when processing workloads range from 50-70 Watts. With the optimization of this power, active server is adapted to accepted workload. From the experiments performed with httperf application, fifth server is turned on when each server in the cluster receives the workload of 7000 requests/second or more. With this mechanism, a cluster that contains five servers is able to save about 50 to 70 Watts per checking period (25 seconds). High availability features that have been configured on the load balancer can be checked by using the command crm_mon (cluster resource manager monitors) owned by Pacemaker. Screenshot when one load balancer is active or inactive is illustrated in Figure 3. When a load balancer fails, its role will be directly handled by the second one. Fail over takes about 23 seconds. If a cluster server has been tested for one week, then the level of availability clusters, A, can be calculated as follows:

(5)

%. (5) whereas A is availability, MTBF is mean time between failure, and MTTR is mean time to resolve.

Table 2. Power consumed by each state Workload (request/second) Active server 6969 4 6818 4 7048 5 6948 4 6963 4 7004 5 7122 5 7007 5 7026 5 6911 4

Figure 3. Load balancer monitoring

4 CONCLUSION

From the all steps of study that has been done, we can draw conclusion as follows. Power capping in each server in the cluster do not need to use expensive software or hardware because it can be implemented with state manipulation of processor providing features Dynamic Voltage and Frequency Scaling (DVFS). Power optimization in a high availability server cluster based on queuing theory models can be implemented in automated ways. Based on our experiment, the proposed method can save power up to 70 Watt per checking period (25 seconds).

There are several improvements that can be developed to increase the performance of proposed method in this paper. More accurate power capping should consider the hard disk state, memory, and network interface. Optimization of power is not only implemented on the processing server but also to load balancer server.

REFERENCE

[1] K. Kopper, The Linux Enterprise Cluster: Build a Highly Available Cluster with Commodity Hardware and Free Software, San Francisco; No Starch Press, 2005. [2] L. Parziale, A. Dias, L.T. Filho, D. Smith,

J. VanStee, M. Ver, Achieving High Availability on Linux for System z with Linux-HA, New York; IBM Redbooks, 2009.

[3] A. Robertson, The Evolution of The LinuxHA Project, Proceedings of UKUUG LISA/Winter Conference High-Availability and Reliability, The UK's Unix & Open Systems User Group, 2004.

[4] L. Marowsky-Bree, A New Cluster Resource Manager for Heartbeat,

Proceedings of UKUUG LISA/Winter

Conference High-Availability and

Reliability, The UK's Unix & Open Systems User Group, 2004.

[5] U.S. Environmental Protection Agency, Report to Congress on Server and Data Center Energy Efficiency, Public Law 109-431, 2007.

[6] A. Beloglazov, R. Buyya, Y.C. Lee, A. Zomaya, A Taxonomy and Survey of Energy-Efficient Data Centers and Cloud Computing Systems, Cloud Computing and Distributed Systems Laboratory, The University of Melbourne, 2010.

(6)

[7] S. Srikantaiah, A. Kansal, F. Zhao, Energy Aware Consolidation for Cloud Computing,

Cluster Computing, 12, 2009, 1-15.

[8] S.K. Garg, C.S. Yeo, A. Anandasivam, R. Buyya, Environment-Conscious Scheduling of HPC Applications on Distributed Cloud-Oriented Data Centers, Journal of Parallel and Distributed Computing, 71(6), 2010, 732-749.

[9] A. Gandhi, M. Harchol-Balter, R. Das, C. Lefurgy, Optimal Power Allocation in Server Farms, Proceedings of the 11th

International Joint Conference on

Measurement and Modeling of Computer Systems, ACM & SIGMETRICS, 2009, 157-168.

[10] W.L. Bircher, L.K. John, Complete System Power Estimation: A Trickle-Down Approach Based on Performance Events,

Proceedings of the International

Symposium on Performance Analysis Systems and Software, IEEE, 2007, 158-168.

[11] D. Economou, S. Rivoire, C. Kozyrakis, P. Ranganathan, Full-system power analysis and modeling for server environments,

Workshop on Modeling, Benchmarking and Simulation, 2006.

[12] A. Kansal, F. Zhao, J. Liu, N. Kothari, A.A. Bhattacharya, Virtual Machine Power Metering and Provisioning, Proceedings of the 1st_{Symposium of Cloud Computing}_, ACM, 2010, 39-50.

[13] H. Studiawan, S. Djanali, W. Suadi, Estimasi Daya pada Lingkungan Server Blade, Prosiding Seminar Nasional Manajemen Teknologi XII, 2011, C-16-1:C-16-8.

[14] X. Wang, C. Lefurgy, M. Ware, Managing Peak System-level Power with Feedback Control, Austin IBM Research Division, 2007.

[15] C. Lefurgy, X. Wang, M. Ware, Server-level Power Control, Proceedings of the 4th

International Conference on Autonomic Computing, IEEE Computer Society, 2007. [16] C. Lefurgy, X. Wang, M. Ware, Power

capping: a prelude to power shifting,

Cluster Computing, 11(2), 2008, 183-195. [17] D.J. Wong, Monitoring and controlling

power consumption using Linux on System x, Blueprint for Linux on IBM Systems, 2008.

[18] K. Choi, R. Soma, M. Pedram, Dynamic Voltage and Frequency Scaling based on Workload Decomposition, Proceedings of the 2004 International Symposium on Low Power Electronics and Design, ACM, 2004, 174-179.

[19] R. Schwemmer, N. Neufeld, Implementing High Availability with Cots Components and Open-Source Software, Proceedings of

12th _{International} _Conference _on

Accelerator and Large Experimental Physics Control Systems, 2009.

[20] R. Raghavendra, P. Ranganathan, V. Talwar, Z. Wang, X. Zhu, No ‘Power’ Struggles: Coordinated Multi-Level Power Management for the Data Center,

Proceedings of the 13th_{International}

Conference on Architectural Support for Programming Languages and Operating Systems, ACM, 2008, 48-59.

[21] R.E. Walpole, R.H. Myers, S.L. Myers, K. Ye, Probability and Statistics for Engineers and Scientists, New Jersey; Prentice Hall. 2007.

[22] J.J. Dongarra, P. Luszczek, A. Petitet, The LINPACK Benchmark: Past, Present, and Future, Concurrency and Computation: Practice and Experience, 5(9), 2003, 803-820.