Building a Power-Aware Database Management System


Zichen Xu

University of South Florida Tampa, FL, 33620

[email protected]

ABSTRACT

In today's large-scale data centers, energy costs (i.e., the electricity bill) are projected to outgrow those of hardware. Despite a long history of research in energy-saving techniques, especially low-power hardware, little work has been done to improve the power efficiency of data management software. Power-aware computing research at the application level has been found to be synergistic with that at the hardware and OS levels because it can provide more opportunities for energy reduction in the underlying systems. This paper describes the author's thesis work on creating a power-aware database management system (P-DBMS) and initial ideas on the design of such systems, with the focus on a power-aware query optimization module inside the DBMS. We discuss the main technical challenges in designing the optimizer and present our strategies to meet such challenges. We focus our discussion on a power model to accurately estimate the energy costs of query execution plans, and a cost evaluation model for plan selection. An important feature of this work is the formal control-theoretic methods we use to model and optimize the database towards the performance and energy-saving goals. This rigorous design methodology is in sharp contrast to heuristic-based adaptive solutions that rely on extensive empirical evaluation and manual tuning. Our experiments using a power-aware query optimizer under our initial design show that there exists significant potential for power/energy savings.

Categories and Subject Descriptors

H.2 [Information Systems]: DATABASE MANAGEMENT

General Terms

Performance

1. INTRODUCTION

Demand for information-processing capacity is growing dramatically, which has forced vendors to provide faster, larger-scale, and less expensive solutions. In particular, they are seeking servers that offer high scalability, lower unit prices, and simplicity of management [8]. As for maintenance costs, the second largest bill for large database centers comes from the power company. In the U.S. alone, data centers consumed about 60 billion kilowatt-hours (kWh) of electricity in 2006, roughly 1.5% of total U.S. electricity consumption [1]. Furthermore, power consumption in data centers has almost doubled since 2006. Researchers from multiple disciplines are investing effort in energy optimization for data centers.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Proceedings of the Fourth SIGMOD PhD Workshop on Innovative Database Research (IDAR 2010), June 11, 2010, Indianapolis, USA.

During the last decade, much work at the hardware and system levels has addressed the green computing problem. However, there has been very little effort at the application level. The computing community has gradually started to realize the importance of the application level because uncontrolled energy usage in data centers has a negative impact on density, scalability, and related environmental design, and the problem cannot be solved by power-aware hardware and operating systems alone. The common belief now is that power-aware applications are synergistic with green computing efforts at lower levels because they provide information and extra opportunities for energy conservation. In this paper, we address the problem of building power-aware databases. This problem is important because database servers occupy a majority of the computing (and thus energy) resources in a typical data center [9]. Moreover, a DBMS shares many features with an OS: it manages a large share of the computing resources and hides application-level semantics and behavior. Therefore, it is an excellent platform to show the benefits of application-level power-aware design and to validate ideas that can be applied to other application software.

Making a DBMS energy-aware is meaningless without considering database performance. To save energy in databases, we can either make queries run faster (the goal of traditional query optimizer design) or let the system run in lower power states while sacrificing some performance. To study the relation between energy, power, and throughput, [11] suggests the following equations for understanding energy efficiency.

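$$\text{Energy} = \text{Power} \times \text{Time} \tag{1}$$

$$\text{Energy Efficiency} = \frac{\text{Finished Workloads}}{\text{Energy}} = \frac{\text{Finished Workloads}}{\text{Power} \times \text{Time}} = \frac{\text{Throughput}}{\text{Power}} \tag{2}$$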


Eq. (2) suggests TuplePerSecond/Watt as a unit to measure energy efficiency. Moreover, the same equation also tells us two ways to enhance energy efficiency: (1) improving throughput while maintaining the same power level (Watt); or (2) reducing the power cost while not sacrificing too much throughput. As mentioned above, in this paper we focus on the second way and formulate the energy-saving problem in databases as an optimization problem:

Problem 1. Given a performance bound, how do we minimize the energy consumption of query processing in a database system?
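As a rough numeric illustration of Eqs. (1) and (2), the following Python sketch compares two hypothetical runs of the same workload. All numbers (tuple count, wattages, run times) are invented for illustration and are not measurements from this work.

```python
def energy_joules(power_watts, time_seconds):
    """Eq. (1): Energy = Power x Time."""
    return power_watts * time_seconds


def energy_efficiency(finished_tuples, power_watts, time_seconds):
    """Eq. (2): finished work per unit of energy (tuples per joule),
    which equals throughput (tuples/second) divided by power (watt)."""
    return finished_tuples / energy_joules(power_watts, time_seconds)


# Hypothetical runs of the same 1,000,000-tuple workload.
baseline = energy_efficiency(1_000_000, power_watts=40.0, time_seconds=100.0)
power_aware = energy_efficiency(1_000_000, power_watts=32.0, time_seconds=110.0)

print(f"baseline:    {baseline:.1f} tuples/J")
print(f"power-aware: {power_aware:.1f} tuples/J")
```

In this made-up case the second run is 10% slower but draws 20% less power, so it uses less energy overall (3520 J versus 4000 J) and scores higher on Eq. (2). This corresponds to the second route to energy efficiency, which Problem 1 formalizes.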

In this paper, we focus on designing a Power-aware DBMS (P-DBMS) that can significantly reduce energy use with graceful degradation in time performance and little effect on scalability and reliability. To achieve this goal, we propose a query optimizer that takes the energy cost of plans into consideration during query optimization. This requires two models in the DBMS design. The first is a power estimation model, embedded in the cost estimation module of the plan generator, that calculates the estimated power cost of each query execution plan. The second is a cost evaluation model that guides query plan selection. Such a model should capture the system's preference between energy and time performance, so that the DBA controls the preferred balance between power consumption and system performance.

Our aim in this paper is to present the author's PhD thesis work in energy-efficient data management. We review related work on energy consumption in database systems and some experimental studies of database hardware configurations in Section 1.1. In Section 2, we discuss our approach, its challenges, and possible strategies to meet them. In Section 3, we present preliminary results that support our claims and also raise many interesting open questions. Finally, we conclude the paper in Section 4.

1.1 Related Work

Power consumption in databases has just started drawing attention from the research community, as evidenced by the recent Claremont report on database research [2]. [9] reports extensive experimental results on the power consumption patterns of typical commercial database servers. Based on these results, it provides suggestions on how to make the system more power efficient. However, these suggestions focus on utilizing new hardware rather than modifying the DBMS kernel. CIDR 2009 published two position papers that promote energy-related research in the database systems field: [6] introduced an idea to improve query energy efficiency by introducing "Explicit Delays" (QED), which basically reschedules received queries for energy-saving purposes; and [5] explored possible approaches and presented examples of energy-saving opportunities, one of which lies in "Energy-aware Optimizations". However, as these are position papers, technical details were lacking.

Our previous work published in ICDE'10 [13] is, to the best of our knowledge, the first technical work that approaches the power-saving problem in databases by redesigning the DBMS kernel. In that paper, we showed that significant power savings can be achieved by designing a query optimizer that takes power cost into consideration in query evaluation. Some of the key observations reported in our paper were confirmed by another set of experiments presented in [10]. A recent work [11] reports experimental results that also verified the potential for power savings. However, in terms of total energy savings, the authors concluded that the fastest plan is always also the most energy-efficient plan. We believe this conclusion is unnecessarily pessimistic because baseline (i.e., idle) power was not included in the energy cost of the traditional database after the workload is processed, which makes the energy savings achieved by any power-aware system design negligible or nonexistent. Here we argue that baseline power consumption should not be disregarded, since data center servers cannot be turned off after processing a "workload". In real data centers, queries arrive continuously (although the arrival rate may change over time). To that end, we focus on active power savings in our paper. Interestingly, a power-aware DBMS can also lower the baseline power consumption, as mentioned in [6]. We will also discuss this in our paper.

2. OVERVIEW OF OUR APPROACH

2.1 The Big Picture

Our vision of building P-DBMS is to enhance current DBMS components with energy-related functionalities, rather than building these components from scratch. This allows us to leverage the current DBMS architecture that is well-designed for performance-driven query processing. Multiple modules that span the whole work ﬂow of query processing (e.g., query optimizer, buﬀer management, storage manager, . . . ) will have to be revisited. We believe energy savings can be harvested by the following two mechanisms.

1. We may explore the search space during query optimization and look for query plans that have low energy costs (and acceptable performance).

2. We should also design new resource management algorithms within the DBMS to exploit the power-saving modes of energy-aware hardware systems (e.g., CPU, storage devices, and memory). The goal is to provide more opportunities to turn the hardware to low-power modes. Note that, in this paradigm, the DBMS will cooperate with the OS to form a cross-layer framework for saving energy.

Adjusting hardware modes (mechanism 2) seems to be the lowest-hanging fruit in energy-aware DBMS design: it is reported that up to 47% of energy can be saved by controlling the P-states of the CPU in a single database server [6]. However, this thesis proposal focuses on the first mechanism, power-aware query optimization, because we view it as the central control inside the DBMS for managing the cost of workload processing. Research in power-aware query optimization will benefit database servers deployed on regular hardware systems as well as those with energy-aware hardware systems. Once implemented, mechanism 2 will depend on a power-aware query optimizer to show its advantage.

Mechanism 1 is motivated by the observation that there exist power-efficient execution plans (with reasonably good performance) that are ignored by existing query optimizers. In a traditional DBMS, query optimization is essentially the problem of finding a plan with low I/O cost; CPU cost is safely ignored as it is often negligible compared to the I/O cost. This assumption is not suitable for achieving high energy efficiency because the energy consumed by CPUs is often greater than (or at least comparable to) that of the storage systems [9]. This fact is confirmed by our experiments. Consider the following scenario: for two plans A and B of the same query, the I/O cost of plan A is slightly higher than that of plan B, but A requires much less CPU time than B. To solve Problem 1, we could choose A instead of B (which would be selected in a traditional DBMS), since plan B's high CPU requirement translates into a (much) higher power consumption while B is only marginally better than A in performance. Fig. 1 shows a real-world example of such a scenario, with the circled red dot (the one on the left edge of the small figure) representing plan A and the circled green dot representing plan B.

Figure 1: Estimated time and power costs of execution plans visited by the PostgreSQL query optimizer for query #5 in the TPC-H benchmark. This figure is borrowed from [13].

To capture the power-saving opportunities via query optimization, our strategy is to find query plans with low power cost during plan generation. For that purpose, we propose a power-aware query optimizer that incorporates a power model to estimate the power consumption of plans, and a cost evaluation model that takes both performance and energy costs into account. The cost evaluation model is used to select plans for execution, and one of its key features is the ability to dynamically adjust its preference for power over performance in query plan selection. This feature is necessary for meeting the system design goal (i.e., Problem 1) under fluctuations in the workload and other environmental factors.

2.2 Power Model

Basic Design. An existing query optimizer evaluates each generated plan by estimating the numbers of basic operations (e.g., number of tuples to execute, indexed tuples to sort, pages to read/write, ...) needed to process the plan. Those numbers form a vector, which we call the operation vector and denote as $\vec{o}$. The time cost is predicted using a set of static parameters that stand for the resource holding time per basic operation (e.g., CPU time to process an indexed tuple, I/O time to read/write a page). Such parameters form another vector, denoted as $\vec{c}$. They vary from one machine to another with different hardware configurations and are calibrated at the installation stage of the database server. Given the two vectors, the estimated processing time of a plan is given by Equation (3).

$$T = \vec{o}^{T}\vec{c} \tag{3}$$

In developing the power model, the power cost estimation takes advantage of the above mechanisms in existing query optimizers. We still use the operation vector $\vec{o}$, and the problem becomes how to derive an accurate power profile for the basic operations. Given the basic power cost of each operator (in the form of a vector $\vec{c}'$) and the operation vector, the power cost of a plan is given by the following equation.

$$P = \vec{o}^{T}\vec{c}' \tag{4}$$

As a first step in designing the P-DBMS query optimizer, we also aim at static power profiles for the basic operations. The initial values of $\vec{c}'$ can be obtained from hardware specifications provided by the vendors. Then, these parameters must be calibrated by a series of experiments run under the computing environment where the database service will be provided. After repeating such tests a sufficient number of times, historic data are collected to refine those parameters, using the least-squares method to find the best linear unbiased estimator (BLUE) so that the estimation errors are acceptable. We enjoyed reasonable success using such a method [13]: we were able to predict the power costs of certain queries with an error rate as low as 7.2%.
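The calibration step can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the two basic operations, the sample operation vectors, and the "measured" power values are synthetic, and the normal-equation solver is a generic textbook routine rather than the procedure used in [13].

```python
def dot(a, b):
    """Inner product, used for both Eq. (3) and Eq. (4)."""
    return sum(x * y for x, y in zip(a, b))


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def calibrate(op_vectors, measured_power):
    """Least-squares fit of c' from the normal equations (O^T O) c' = O^T p."""
    n = len(op_vectors[0])
    gram = [[sum(o[i] * o[j] for o in op_vectors) for j in range(n)]
            for i in range(n)]
    rhs = [sum(o[i] * p for o, p in zip(op_vectors, measured_power))
           for i in range(n)]
    return solve_linear(gram, rhs)


# Synthetic calibration runs: [tuples processed, pages read] (hypothetical).
runs = [[120.0, 4.0], [80.0, 10.0], [200.0, 2.0], [50.0, 20.0], [150.0, 8.0]]
hidden_costs = [0.05, 0.6]          # assumed "true" per-operation power costs
measured = [dot(o, hidden_costs) for o in runs]

c_prime = calibrate(runs, measured)
estimated_power = dot([100.0, 5.0], c_prime)   # Eq. (4) for a new plan
```

With noise-free synthetic data the fit recovers the hidden per-operation costs; with real measurements the residuals would indicate how well a static linear profile matches the machine.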

The Main Challenge. Although our static power model described above works in most cases we tested, we also noticed errors in predicting power costs, and we encountered scenarios where the values of $\vec{c}'$ do not converge (in which case we had to derive a value by regression). The reason is that the energy cost of an operation may differ under different (database) system states and workload characteristics. For example, the cost of reading a page differs when the system is under different levels of contention. Maintaining only static parameters for the per-operation power costs is therefore an oversimplification.

Online Modeling in Power Estimation. To address the above challenge, we propose an online model estimation method that dynamically adjusts the power model parameters at runtime by taking the current system states and workload features into consideration. An online model estimator [12] is traditionally used to achieve analytical assurance of control accuracy (we will talk about control in Section 2.3) and system stability. In our research, we will apply it to our power models for the purpose of avoiding the effects of significant workload variations or unpredictable changes of DBMS states, which cannot be overcome by the static model.

In particular, we plan to use a Recursive Least Squares (RLS) estimator with directional forgetting [7] to estimate and update the parameter vector $\vec{c}'$ in the power model in Eq. (4). To achieve this goal, we need to extend our model in Eq. (4) and periodically update the value of $\vec{c}'$. Let $\vec{c}' = \{c_1, c_2, \cdots, c_n\}$ and $k = \sum_{j=1}^{n} c_j$. We define a new vector $C' = \{c_1, c_2, \cdots, c_n, k\}$ and denote the value of $C'$ at period $i$ as $C'(i)$. For the operation vector $\vec{o} = \{o_1, o_2, \cdots, o_n\}$ in Eq. (4), we define another vector $O = \{o_1, o_2, \cdots, o_n, 1\}$ and its value at period $i$ as $O(i)$. In each iteration, the real power consumption $p(i)$ is measured and the estimator updates $C'(i)$ as follows:

$$C'(i) = C'(i-1) + \frac{e(i)\,O(i)\,M(i-1)}{\lambda + O(i)\,M(i-1)\,O(i)^{T}} \tag{5}$$

where $e(i) = kp(i) - O(i)^{T}C'(i)$ is the estimation error, $M(i)$ is the covariance matrix, and $\lambda$ is the constant forgetting factor within [0, 1]; a smaller $\lambda$ enables the estimator to forget the history faster.

[Figure 2 plot: axes are E (energy cost) and T (query processing time); the curves shown are C = E + T, C = ET, and C = ET^2.]

Figure 2: Pareto curves formed by different cost functions.

The following routines are invoked at every period of model updating: (1) the RLS estimator records the operation vector $O(i)$ and the total power consumption of plans, $p(i)$; (2) it computes $C'(i)$; (3) it updates $\vec{c}'$ with the $c_i$ values in $C'(i)$.
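The update routine can be sketched as follows. Note the assumptions: this uses the standard exponential-forgetting RLS recursion, not the directional-forgetting variant of [7]; the covariance update and the use of the previous estimate when computing the error are textbook choices that the text above does not spell out; and the data stream is synthetic.

```python
def rls_update(theta, M, o, p, lam=0.98):
    """One RLS step with exponential forgetting (assumed variant).

    theta: current parameter estimate (plays the role of C')
    M:     covariance matrix as a list of lists (kept symmetric)
    o:     augmented operation vector O(i), last entry fixed to 1
    p:     measured power p(i) for this period
    """
    n = len(theta)
    Mo = [sum(M[r][c] * o[c] for c in range(n)) for r in range(n)]
    denom = lam + sum(o[r] * Mo[r] for r in range(n))
    gain = [v / denom for v in Mo]
    # Prediction error computed with the previous estimate (assumption).
    err = p - sum(o[r] * theta[r] for r in range(n))
    new_theta = [theta[r] + gain[r] * err for r in range(n)]
    # Because M is symmetric, (O M)[c] equals Mo[c].
    new_M = [[(M[r][c] - gain[r] * Mo[c]) / lam for c in range(n)]
             for r in range(n)]
    return new_theta, new_M, err


# Synthetic stream: two per-operation costs plus a constant term (hypothetical).
true_params = [0.3, 1.2, 15.0]
theta = [0.0, 0.0, 0.0]
M = [[1000.0 if r == c else 0.0 for c in range(3)] for r in range(3)]

for i in range(200):
    o = [float((7 * i) % 13 + 1), float((5 * i) % 11 + 1), 1.0]
    p = sum(x * y for x, y in zip(o, true_params))
    theta, M, err = rls_update(theta, M, o, p)
```

A large initial covariance expresses low confidence in the starting guess, so the first few periods move the estimate quickly; the forgetting factor then controls how fast older periods are discounted.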

2.3 Plan Evaluation Model

Basic Idea. Towards a system-level optimization goal, the plan evaluation model provides a criterion to evaluate the superiority of alternative query plans with different operation vectors $\vec{o}$. In our case, the optimization goal is stated in Problem 1. Recall the 2D graph shown in Fig. 1, where each plan is represented as a point in the 2D space of power and time costs. It is easy to choose between two plans if one dominates the other in this space. The decision becomes difficult if neither dominates, and the plan evaluation model is designed to make that decision. One way to view the role of the evaluation model is illustrated in Fig. 2. The superiority of a plan is some function of the plan's energy cost $E$ and performance $T$. Such a function defines a series of Pareto curves in the 2D space formed by the domains of $E$ and $T$, and these curves can be used to choose among non-dominated points (plans) in the space. Using different curves allows us to give preference to different areas of the 2D graph during plan selection. For example, the cost function $C = E + T$ favors plans with balanced performance and energy costs, while the function $C = ET$ favors those with either good performance or low energy cost. The function $C = ET^{2}$, in contrast, gives more preference to plans with very short processing time. We propose a metric model with the following format for our power-aware query optimizer.

$$C = PT^{n} = ET^{n-1} \tag{6}$$

where $C$ is the aggregated cost (i.e., lower cost means higher superiority) and $n$ is a coefficient that reflects the relative importance of $P$ and $T$ (we will discuss the choice of $n$ later). Intuitively, since Eq. (6) leaves $C$ unchanged when $P$ shrinks by a factor of $d$ and $T$ grows by a factor of $d^{1/n}$, the query optimizer is willing to accept up to a $d^{1/n}$-fold degradation in performance to achieve a $d$-fold power reduction (for $n > 0$). The model provides different plan selection strategies for different optimization goals through the choice of $n$. When $n = \infty$, we only consider the time cost (i.e., as the traditional DBMS does); for $n = 0$, we optimize towards the lowest power consumption; and for $n = 1$, power and time performance are both taken into consideration (in other words, we optimize towards total energy cost according to Eq. (1)).
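The effect of $n$ on plan selection can be illustrated with a short Python sketch. The two plans and their power/time estimates are hypothetical, and treating $n = \infty$ as "rank by time only" is an implementation choice made here so that the comparison stays well defined.

```python
import math


def plan_cost(power, time, n):
    """Eq. (6): C = P * T^n. For n = infinity, rank by time alone."""
    if math.isinf(n):
        return time
    return power * time ** n


def choose_plan(plans, n):
    """Return the plan with the lowest aggregated cost for a given n."""
    return min(plans, key=lambda plan: plan_cost(plan["power"], plan["time"], n))


# Hypothetical estimates: plan A draws less power, plan B is slightly faster.
plans = [
    {"name": "A", "power": 30.0, "time": 11.0},
    {"name": "B", "power": 40.0, "time": 10.0},
]

for n in (0, 1, 4, math.inf):
    print(n, choose_plan(plans, n)["name"])
```

With these made-up numbers, plan A wins for $n = 0$ and $n = 1$, while plan B wins for $n = 4$ and $n = \infty$: as $n$ grows, the 10% time penalty of A outweighs its 25% power advantage.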

The Main Challenges. The model shown in Eq. (6) provides a means for the DBA to reach the desired trade-off between throughput and energy efficiency. In an ideal world, the system is stable and predictable. The only thing we need to do is to find the right value for $n$, which can then be used until the system preference (i.e., the constraint in Problem 1) changes. However, like many other complex systems, a database is rarely stable and predictable, for reasons such as the following. (1) Modeling errors. The estimation of per-operation costs (i.e., $\vec{c}$ and $\vec{c}'$) cannot be 100% correct (even with the online estimator). The values in the operation vector $\vec{o}$ are also the results of best-effort estimations based on incomplete statistics, which is a source of error we inherit from the existing query optimizer. In short, neither $E$ nor $T$ is perfect in our plan evaluation model (Eq. (6)). (2) Workload behavior may destabilize the DBMS. For example, when the workload intensity increases, the potential for energy savings becomes slim and we should increase $n$ automatically. Given the above reasons, we need to construct a framework that dynamically and automatically changes the evaluation parameter for the purpose of: (1) adapting to the current context, including workload and system status; and (2) minimizing the effects of modeling errors.

Control-Based Query Evaluation Model. To meet the above challenges, we propose a control-theoretic solution. Recently, feedback control theory has been successfully applied to power control in servers and found to outperform commonly used heuristic-based solutions. The benefit of having control theory as a theoretical foundation is that we can have (1) standard approaches to choose the right control parameters so that exhaustive iterations of tuning and testing are avoided; (2) theoretically guaranteed control performance such as accuracy, stability, short settling time, and small overshoot; and (3) quantitative control analysis when the system is suffering unpredictable workload variations. To be specific, we will develop a feedback control loop that monitors the actual system throughput (i.e., the output signal) at runtime and then adjusts $n$ (i.e., the input signal), the coefficient that sets the trade-off between energy and processing time. The primary control objective is to guarantee that the throughput $R$ converges to the set point $R_s$ (i.e., the performance bound in Problem 1) within a limited settling time. In the meantime, the controller tries to minimize the energy consumption, which is our optimization goal in Problem 1.

Rigorous control-theoretic design is based on a dynamic model that describes the response of the DBMS (the plant, in control terminology) to changes of the input $n$. Since an analytical model is difficult to derive for complex systems such as a database, one way to generate such a model is to use system identification [3] techniques. The basic idea is to initially model the system as a difference equation with unknown parameters. Generally, a plant can be modeled as

$$R(k) = \sum_{i=1}^{x} a_i R(k-i) + \sum_{j=1}^{y} b_j n(k-j) \tag{7}$$

where $R(k)$ is the output (i.e., throughput in our problem) at the end of control period $k$; $n(k)$ is the input at period $k$; $x$ and $y$ are the orders of the system output and input, respectively; and $a_i$ and $b_j$ are plant-specific parameters. Then we can determine the order and parameters of the difference equation experimentally. Generally, we can start from some knowledge of the plant and then go back and forth between experiments and hypotheses, so that a more refined model is built each time. In the experiments, we can generate a sequence of pseudo-random digital white noise as control input to stimulate the system and then measure the throughput in each control period. Based on the collected data, we can use the Least Squares Method (LSM) to iteratively estimate the values of the parameters $a_i$ and $b_j$. To verify the accuracy of models of different orders, we can then change the seed of the white noise to generate a different sequence of control input, and compare the actual control output to the output predicted by the estimated models. If the estimated output based on the system model is sufficiently close to the actual output, the model can be used for controller design. Note that system identification is done offline, and only once.
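The identification procedure can be sketched in Python for the simplest case, a first-order model ($x = y = 1$ in Eq. (7)). Everything here is an assumption made for illustration: the simulated plant and its coefficients stand in for a real DBMS, the two input levels are arbitrary, and the closed-form 2x2 least-squares solution replaces a general LSM routine.

```python
import random


def simulate_plant(inputs, a, b, r0):
    """Hypothetical first-order plant: R(k) = a*R(k-1) + b*n(k-1)."""
    outputs = [r0]
    for k in range(1, len(inputs)):
        outputs.append(a * outputs[k - 1] + b * inputs[k - 1])
    return outputs


def white_noise_input(length, seed, levels=(0.5, 2.0)):
    """Pseudo-random two-level input sequence for n (levels are arbitrary)."""
    rng = random.Random(seed)
    return [rng.choice(levels) for _ in range(length)]


def fit_first_order(inputs, outputs):
    """Least-squares estimate of (a, b) via the 2x2 normal equations."""
    s_rr = s_rn = s_nn = s_ry = s_ny = 0.0
    for k in range(1, len(outputs)):
        r_prev, n_prev, r_now = outputs[k - 1], inputs[k - 1], outputs[k]
        s_rr += r_prev * r_prev
        s_rn += r_prev * n_prev
        s_nn += n_prev * n_prev
        s_ry += r_prev * r_now
        s_ny += n_prev * r_now
    det = s_rr * s_nn - s_rn * s_rn
    a_hat = (s_ry * s_nn - s_rn * s_ny) / det
    b_hat = (s_rr * s_ny - s_rn * s_ry) / det
    return a_hat, b_hat


# Identification run (seed 1) on the simulated plant.
train_in = white_noise_input(200, seed=1)
train_out = simulate_plant(train_in, a=0.6, b=20.0, r0=50.0)
a_hat, b_hat = fit_first_order(train_in, train_out)

# Validation run with a different seed: compare one-step-ahead predictions.
test_in = white_noise_input(200, seed=2)
test_out = simulate_plant(test_in, a=0.6, b=20.0, r0=50.0)
max_err = max(
    abs(a_hat * test_out[k - 1] + b_hat * test_in[k - 1] - test_out[k])
    for k in range(1, len(test_out))
)
```

On a real system the measured throughput is noisy, so the validation error would not vanish; comparing it across candidate orders is what guides the choice of $x$ and $y$.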

Once the system model is found, we can apply standard control techniques to design a controller, which tells us how to change the value of $n$ in order to keep the desired throughput. We propose to use Proportional-Integral-Derivative (PID) control theory [4] for this purpose. We skip the details of controller design due to page limits.
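For readers unfamiliar with PID control, a generic discrete-time controller can be sketched as below. This is a textbook form and not the controller designed in this work: the gains, the clamping range for $n$, and the choice to use the controller output directly as the next value of $n$ are all assumptions.

```python
class PIDController:
    """Textbook discrete PID controller (gains are hypothetical)."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, set_point, measured):
        """Return the control output for one control period."""
        error = set_point - measured
        self.integral += error
        derivative = 0.0 if self.prev_error is None else error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def next_n(controller, set_point, measured_throughput, n_min=0.0, n_max=4.0):
    """Map the controller output to the next n, clamped to an assumed range.

    Throughput below the set point gives a positive error, which pushes n up
    (more weight on time); throughput above it lets n fall (more power savings).
    """
    u = controller.update(set_point, measured_throughput)
    return max(n_min, min(n_max, u))


# Example: throughput set point R_s = 100, two consecutive measurements.
controller = PIDController(kp=0.02, ki=0.01, kd=0.0)
n_first = next_n(controller, 100.0, 90.0)
n_second = next_n(controller, 100.0, 95.0)
```

In a design based on the identified model of Eq. (7), the gains would be derived analytically (for example by pole placement) rather than picked by hand as they are here.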

3. PRELIMINARY RESULTS

We thoroughly tested our basic design of the power-aware query optimizer. In our experiments, we modified the PostgreSQL kernel to implement the basic power model and the cost evaluation model with a static $n$ value. We deployed the DBMS on a single server and fed the system with various workloads derived from the TPC-C and TPC-H benchmarks. Fig. 3 shows the instantaneous active power consumption (of the whole server) of the same workload under different values of the cost model parameter $n$. When $n$ gets bigger, more power is consumed at all times. Note that the case of $n = \infty$ basically represents a performance-only query optimizer, while the case of $n = 0$ is one that is concerned only with power. Power dropped abruptly in the first two subfigures (after the 150th second in Fig. 3a and the 470th second in Fig. 3b) because the CPU-bound queries in the workload finished earlier than the I/O-bound queries. Such power drops show the significant share of power consumed by CPUs. In summary, we observed 11-22% power savings and 3-19% energy savings across all the experiments described above. These results provide strong evidence to support our proposed research direction of designing a power-aware query optimizer. Note that, as a proof of concept, the reported power and energy savings were achieved with a static power model and without power-aware hardware. We believe that a more sophisticated energy model and the realization of P-DBMS on top of power-aware hardware systems will produce further significant energy savings.

[Figure 3 panels: a. 500MB database; b. 1GB database; c. 10GB database. Each panel plots Power (watt) against Time (second) for n = ∞, n = 1, and n = 0.]

Figure 3: Power consumption of the TPC-H workloads under three different database sizes.

We also systematically studied the energy/performance patterns of our workloads and found that queries with power-efficient plans that are missed by traditional query optimizers are very common. For example, we found plans that are significantly more energy efficient in 10 out of the 19 queries we studied in TPC-H. The power/energy savings we recorded are caused by executing different query plans, not by random system fluctuations. Details of the above experiments can be found in our ICDE'10 paper [13].

To study the effects of the hardware's power profile on power-aware query optimization, we recently extended the experiments described above by measuring the power consumption of individual server hardware components. Fig. 4 shows the runtime power distribution of our tested server running at its maximum capacity. We attached three power monitors to the server to capture the power consumption of the CPU, the hard drives, and the whole server. As seen from Fig. 4, in such a server with one disk and a dual-core CPU, the active power of the CPU is far greater than that of the disk. This implies that plans with low CPU consumption deserve more attention since they have larger potential to save power. On the other hand, the power cost of the I/O operations makes very little difference. Another fact is that most of the power is consumed when the system is idling.¹ This clearly indicates that lowering the baseline power consumption is the right direction for power-aware database research. We believe the P-DBMS software, when deployed on top of power-aware hardware systems, will yield more dramatic energy reductions.

Figure 4: Power breakdown of the tested database server.

[Figure 5 plot: Power (Watts) against Time (1/18 second) for the CPU and the HDD, with n set to ∞, 2, 1.5, 1, and 0.5.]

Figure 5: CPU and HDD's Peak Power Consumption under an online control signal.

We also explored the feasibility of our control-theoretic framework for power-aware data management. In another experiment (Fig. 5), the value of the cost evaluation model parameter $n$ is changed at runtime to test the DBMS's response to such changes. The test started with $n = 0$, which forces the power-aware optimizer to take power cost as the only factor in plan selection. Then, at the end of the 200th period, a signal is sent to the DBMS to change $n$ to the other extreme, $\infty$, which makes the DBMS select plans based on performance only. Within a few control periods, we can see an abrupt increase in the CPU power consumption (represented by the red line), which then stabilizes. After that, $n$ is changed to a series of smaller values, and the power consumption decreases and then stabilizes again at a lower level. We believe this shows that the parameter $n$ (or its variations) can be used as a control signal in the feedback control loop. In this experiment, a maximum power reduction of 30% was achieved (although at the cost of unacceptable performance). However, when the system was running between the two extreme values of $n$, we obtained much better query processing performance. On the other hand, the power consumption of the hard drive (green line in Fig. 5) shows almost no change although different plans were being executed. This again confirms our experimental results in Fig. 4 that the power cost of the I/O system does not change much under different workloads. It would be interesting to see how new I/O systems built on multi-speed disks or solid-state disks (SSD) behave.

¹ Being idle here means no query is being processed.

4. CONCLUSION

In this paper, we study the problem of improving energy efficiency in data centers from a new angle: building power-aware DBMS software. Our approach takes advantage of proven techniques from the field of cost-based query optimization in traditional DBMSs and combines them with new models that evaluate the power cost of query plans and select plans based on both performance and power costs. Compared to current power-saving solutions at the hardware and operating system levels, our approach is promising in that it can provide significant extra savings. We have implemented our initial design in PostgreSQL, and experimental results support our expectations on the effectiveness of our solution. We believe power-aware DBMS is an area full of exciting research opportunities, since various components in existing DBMSs should be revisited/redesigned to take energy cost into consideration. We advocate that the database community invest more effort in this topic.

5. REFERENCES

[1] U. E. P. Agency. Report to congress on server and data center energy efficiency public law 109-431. 2007.

[2] R. Agrawal, A. Ailamaki, et al. The Claremont report on database research. SIGMOD Rec., 37(3):9–19, 2008.

[3] A. Arasu, B. Babcock, et al. STREAM: The Stanford stream data manager. IEEE Data Eng. Bull., 26(1):19–26, 2003.

[4] G. F. Franklin, J. D. Powell, and M. Workman. Digital Control of Dynamic Systems. Addison-Wesley, 1997.

[5] S. Harizopoulos, M. A. Shah, J. Meza, and P. Ranganathan. Energy efficiency: The new holy grail of data management systems research. In CIDR, 2009.

[6] W. Lang and J. M. Patel. Towards eco-friendly database management systems. In CIDR, 2009.

[7] X. Liu, X. Zhu, P. Padala, Z. Wang, and S. Singhal. Optimal multivariate control for differentiated services on a shared hosting platform.

[8] R. Nambiar and M. Poess. Performance evaluation and benchmarking. In TPCTC, volume 5895 of Lecture Notes in Computer Science. Springer, 2009.

[9] M. Poess and R. O. Nambiar. Energy cost, the key challenge of today's data centers: a power consumption analysis of TPC-C results. PVLDB, 1(2):1229–1240, 2008.

[10] M. Poess and R. O. Nambiar. Tuning servers, storage and database for energy efficient data warehouses. In ICDE, 2010.

[11] D. Tsirogiannis, S. Harizopoulos, and M. A. Shah. Analyzing the energy efficiency of a database server. In SIGMOD, 2010.

[12] Y. Wang, K. Ma, and X. Wang. Temperature-constrained power control for chip multiprocessors with online model estimation. SIGARCH Comput. Archit. News, 37(3):314–324, 2009.

[13] Z. Xu, Y. Tu, and X. Wang. Exploring power-performance tradeoffs in database systems. In