Building a Power-Aware Database Management System
Zichen Xu
University of South Florida Tampa, FL, 33620
[email protected]
ABSTRACT
In today’s large-scale data centers, energy costs (i.e., the electricity bill) are projected to outgrow hardware costs. Despite a long history of research in energy-saving techniques, especially low-power hardware, little work has been done to improve the power efficiency of data management software. Power-aware computing research at the application level has been found to be synergistic with that at the hardware and OS levels because it can provide more opportunities for energy reduction in the underlying systems. This paper describes the author’s thesis work on creating a power-aware database management system (P-DBMS) and initial ideas on the design of such systems, with a focus on a power-aware query optimization module inside the DBMS. We discuss the main technical challenges in designing the optimizer and present our strategies to meet them. We focus our discussion on a power model that accurately estimates the energy costs of query execution plans, and a cost evaluation model for plan selection. An important feature of this work is the formal control-theoretic methods we use to model and optimize the database towards the performance and energy-saving goals. This rigorous design methodology is in sharp contrast to heuristic-based adaptive solutions that rely on extensive empirical evaluation and manual tuning. Our experiments using a power-aware query optimizer under our initial design show that there exists significant potential for power/energy savings.
Categories and Subject Descriptors
H.2 [Information Systems]: DATABASE MANAGEMENT
General Terms
Performance
1.
INTRODUCTION
Demand for the computing capacity of information processing is growing dramatically, which has forced vendors to provide faster, larger-scale and inexpensive solutions. In particular, they are seeking servers that offer high scalability, lower unit price, and simplicity of management [8]. As to maintenance costs, the second largest bill for large database centers comes from the power company. In the US alone, data centers consumed about 60 billion kilowatt-hours (kWh) of electricity in 2006, equaling roughly 1.5% of the total U.S. electricity consumption [1]. Furthermore, the amount of power consumption in data centers has almost doubled since 2006. Researchers from multiple disciplines are investing their efforts in energy optimization in data centers.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Proceedings of the Fourth SIGMOD PhD Workshop on Innovative Database Research (IDAR 2010), June 11, 2010, Indianapolis, USA.
Copyright 2010 ACM 978-1-4503-0191-6/10/06...$10.00.
During the last decade, much work at the hardware and system levels has been done to address the green computing problem. However, efforts at the application level have been very limited. The computing community has gradually started to realize the importance of the application level because uncontrolled energy usage in data centers has a negative impact on density, scalability and related environment design, and the problem cannot be solved by power-aware hardware and operating systems alone. The common belief now is that power-aware application design is synergistic with the green computing efforts at lower levels because it provides information and extra opportunities for energy conservation. In this paper, we address the problem of building power-aware databases. Such a problem is important because database servers occupy a majority of the computing (and thus energy) resources in a typical data center [9]. On the other hand, a DBMS has many OS features: it manages large chunks of computing resources and hides application-level semantics and behavior. Therefore, it is an excellent platform to show the benefits of application-level power-aware design and to validate relevant ideas that can be applied to other application software.
Making a DBMS energy-aware is meaningless without considering database performance. To save energy in databases, we can either make the queries run faster (the goal of traditional query optimizer design), or let the system run in lower power states while sacrificing performance to some extent. To study the relation between energy, power and throughput performance, [11] suggests the following equations to understand what energy efficiency is about.
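$$\text{Energy} = \text{Power} \times \text{Time} \qquad (1)$$

$$\text{Energy Efficiency} = \frac{\text{Finished Workloads}}{\text{Energy}} = \frac{\text{Finished Workloads}}{\text{Power} \times \text{Time}} = \frac{\text{Throughput}}{\text{Power}} \qquad (2)$$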
Eq. (2) suggests TuplePerSecond/Watt as a unit to measure energy efficiency. Moreover, the same equation also tells us two ways to enhance energy efficiency: (1) improving throughput while maintaining the same power level (Watt); or (2) reducing the power cost while not sacrificing too much throughput performance. In this paper, we focus on the second way and formulate the energy-saving problem in databases as an optimization problem:
Problem 1. Given a performance bound, how do we
minimize the energy consumption of query processing in a database system?
In this paper, we focus on designing a Power-aware DBMS (P-DBMS) that can significantly reduce energy use with graceful degradation of time performance and little effect on scalability and reliability. To achieve this goal, we propose the approach of designing a query optimizer that takes the energy cost of plans into consideration in query optimization. This requires two models in DBMS design. The first one is a power estimation model embedded in the cost estimation module of the plan generator to calculate the estimated power cost of each query execution plan. The second is a cost evaluation model to guide query plan selection. Such a model should capture the system’s preference of energy over time performance, so that the DBA retains control over the preferred balance between power consumption and system performance.
Our aim in this paper is to present the author’s PhD thesis work in energy-efficient data management. We review related work on energy consumption in database systems and some experimental studies of database hardware configurations in Section 1.1. In Section 2, we discuss our approach, the challenges, and possible strategies to meet such challenges. In Section 3, we present some preliminary results that support our claim and also raise many interesting open questions. Finally, we conclude the paper in Section 4.
1.1
Related Work
Power consumption in databases has just started drawing attention from the research community, as evidenced by the recent Claremont report on database research [2]. [9] reports extensive experimental results on the power consumption patterns in typical commercial database servers. Based on these results, it provides suggestions on how to make the system more power efficient. However, these suggestions focus on utilizing new hardware rather than modifying the DBMS kernel. CIDR 2009 published two position papers that promote energy-related research in the database systems field: [6] introduced an idea to improve query energy efficiency by introducing “Explicit Delays” (QED), which basically reschedules the queries received for energy-saving purposes; and [5] explored possible ways and presented examples of energy-saving opportunities, one of which lies in “Energy-aware Optimizations”. However, being position papers, they lacked technical details.
Our previous work published in ICDE’10 [13] is, to the best of our knowledge, the first technical work that approaches the power-saving problem in databases by redesigning the DBMS kernel. In that paper, we have shown that significant power savings can be achieved by designing a query optimizer that takes power cost into consideration in query evaluation. Some of the key observations reported in our paper were confirmed by another set of experiments presented in [10]. A recent work [11] reports experimental results that also verified the potential of power savings. In terms of total energy savings, however, the authors concluded that the fastest plan is always also the most energy-efficient plan. We believe this conclusion is unnecessarily pessimistic because the baseline (i.e., idle) power was not included in calculating the energy cost of the traditional database after the workload is processed, making the energy savings achieved by any power-aware system design negligible or non-existent. Here we argue that baseline power consumption should not be disregarded since data center servers cannot be turned off after processing a “workload”. In real data centers, queries arrive continuously (although the arrival rate may change over time). To that end, we focus on active power savings in our paper. Interestingly, a power-aware DBMS can also lower the baseline power consumption, as mentioned in [6]. We will also discuss this in our paper.
2.
OVERVIEW OF OUR APPROACH
2.1
The Big Picture
Our vision of building P-DBMS is to enhance current DBMS components with energy-related functionalities, rather than building these components from scratch. This allows us to leverage the current DBMS architecture that is well-designed for performance-driven query processing. Multiple modules that span the whole work flow of query processing (e.g., query optimizer, buffer management, storage manager, . . . ) will have to be revisited. We believe energy savings can be harvested by the following two mechanisms.
1. We may explore the search space during query optimization and look for query plans that have low energy costs (and acceptable performance).
2. We should also design new resource management algorithms within the DBMS to exploit the power-saving modes of energy-aware hardware systems (e.g., CPU, storage devices, and memory). The goal is to provide more opportunities to turn the hardware to low-power modes. Note that, in this paradigm, the DBMS will cooperate with the OS to form a cross-layer framework for saving energy.
Adjusting hardware mode (mechanism 2) seems to be the lowest-hanging fruit in energy-aware DBMS design: it is reported that up to 47% of energy can be saved by controlling the P-states of the CPU in a single database server [6]. However, this thesis proposal focuses on the first mechanism, power-aware query optimization, because we view it as the central control inside the DBMS for managing the cost of workload processing. Research in power-aware query optimization will benefit database servers deployed on top of regular hardware systems as well as those with energy-aware hardware systems. Once implemented, mechanism 2 will depend on a power-aware query optimizer to show its advantage.
Mechanism 1 is motivated by the observation that there exist power-efficient execution plans (with reasonably good performance) that are ignored by existing query optimizers. In a traditional DBMS, query optimization is essentially the problem of finding a plan with low I/O costs; CPU cost is safely ignored as it is often negligible compared to the I/O cost. Ignoring CPU cost, however, is not suitable for achieving high energy efficiency because the energy consumed by CPUs is often greater than (or at least comparable to) that of the storage systems [9]. This fact is confirmed by our experiments. Consider the following scenario: for two plans A and B of the same query, the I/O cost of plan A is slightly higher than that of plan B, but A requires much less CPU time than B. To solve Problem 1, we could choose A instead of B (which would be selected in a traditional DBMS), since plan B’s high CPU requirement translates into a (much) higher power consumption while it is only marginally better than A in performance. Fig. 1 shows a real-world example of such a scenario, with the circled red dot (the one on the left edge of the small figure) representing plan A and the circled green dot representing plan B.
Figure 1: Estimated time and power costs of execution plans visited by the PostgreSQL query optimizer for query #5 in the TPC-H benchmark. This figure is borrowed from [13].
To capture the power-saving opportunities via query optimization, our strategy is to find query plans with low power cost during plan generation. For that purpose, we propose a power-aware query optimizer that incorporates a power model to estimate the power consumption of plans, and a cost evaluation model that takes both performance and energy costs into account. The cost evaluation model is used to select plans for execution, and one of its key features is the ability to dynamically adjust its preference for power over performance in query plan selection. This feature is necessary for meeting the system design goal (i.e., Problem 1) under fluctuations in the workload and other environmental factors.
2.2
Power Model
Basic Design. An existing query optimizer evaluates each generated plan by estimating the numbers of basic operations (e.g., number of tuples to process, indexed tuples to sort, pages to read/write, . . . ) needed to process such a plan. Those numbers form a vector which we call the operation vector, denoted as $\vec{o}$. The prediction of time cost is accomplished by using a set of static parameters that stand for the resource holding time per basic operation (e.g., CPU time to process an indexed tuple, I/O time to read/write a page). Such parameters form another vector denoted as $\vec{c}$. Those parameters vary from one machine to another with different hardware configurations and are calibrated at the installation stage of the database server. Given the two vectors, the estimated processing time of a plan is given by Equation (3).
$$T = \vec{o}^{\top}\vec{c} \qquad (3)$$
In developing the power model, the power cost estimation takes advantage of the above mechanisms in existing query optimizers. We still use the operation vector $\vec{o}$, and the problem becomes how to derive an accurate power profile for the basic operations. Given the basic power cost of each operation (in the form of a vector $\vec{c}'$) and the operation vector, the power cost of a plan is given by the following equation.
$$P = \vec{o}^{\top}\vec{c}' \qquad (4)$$
As a first step in designing the P-DBMS query optimizer, we also aim at static power profiles for the basic operations. The initial values of $\vec{c}'$ can be obtained from hardware specifications provided by the vendors. Then, these parameters must be calibrated by a series of experiments run in the computing environment where the database service will be provided. After repeating such tests a sufficient number of times, historic data are collected to refine those parameters, using the least-squares method to find the best linear unbiased estimator (BLUE) so that the estimation errors are acceptable. We enjoyed reasonable success using such a method [13]: we were able to predict the power costs of certain queries with an error rate as low as 7.2%.
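As an illustration of Eqs. (3) and (4), the following minimal Python sketch computes the estimated time and power of a plan as inner products of its operation vector with the two parameter vectors. The operation categories and all coefficient values are hypothetical placeholders chosen for readability; they are not calibrated parameters from any real server.

```python
# Illustrative only: the operation categories and coefficients are made up.
OPERATIONS = ("tuples_processed", "indexed_tuples", "pages_read")


def dot(o, c):
    """Inner product o^T c of an operation vector and a cost vector."""
    if len(o) != len(c):
        raise ValueError("vectors must have the same length")
    return sum(oi * ci for oi, ci in zip(o, c))


# Hypothetical static parameters (assumptions, not measured values).
time_cost = (0.01, 0.005, 1.0)        # c : time units per basic operation
power_cost = (0.002, 0.001, 0.0005)   # c': power units per basic operation


def estimate_plan(o):
    """Return (T, P) for a plan with operation vector o, as in Eqs. (3), (4)."""
    return dot(o, time_cost), dot(o, power_cost)


plan_o = (1000, 200, 50)  # hypothetical operation counts for one plan
T, P = estimate_plan(plan_o)
print(f"estimated time = {T:.3f}, estimated power = {P:.3f}")
```

Because both estimates reuse the same operation vector, a power estimate of this form adds only one extra inner product per plan on top of the existing time estimate.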
The Main Challenge. Although the static power model described above works in most cases we tested, we also noticed errors in predicting power costs. We also encountered scenarios where the values of $\vec{c}'$ do not converge (in which case we had to derive a value by regression). This is because the energy cost of an operation may differ under different (database) system states and workload characteristics. For example, the cost of reading a page differs when the system is under different levels of contention. Maintaining only static parameters for the per-operation power costs is apparently an oversimplification.
Online Modeling in Power Estimation. To address the above challenge, we propose an online model estimation method that dynamically adjusts the power model parameters at runtime by taking the current system states and workload features into consideration. An online model estimator [12] is traditionally used to achieve analytical assurance of control accuracy (we will talk about control in Section 2.3) and system stability. In our research, we will apply it to our power models to counter the effects of significant workload variations or unpredictable changes of DBMS states, which cannot be handled by the static model.
In particular, we plan to use a Recursive Least Squares (RLS) estimator with directional forgetting [7] to estimate and update the parameter vector $\vec{c}'$ in the power model in Eq. (4). To achieve this goal, we need to extend our model in Eq. (4) and periodically update the value of $\vec{c}'$. Let $\vec{c}' = \{c_1, c_2, \cdots, c_n\}$ and $k = \sum_{j=1}^{n} c_j$; we define a new vector $C' = \{c_1, c_2, \cdots, c_n, k\}$ and denote the value of $C'$ at period $i$ as $C'(i)$. For the operation vector $\vec{o} = \{o_1, o_2, \cdots, o_n\}$ in Eq. (4), we define another vector $O = \{o_1, o_2, \cdots, o_n, 1\}$ and denote its value at period $i$ as $O(i)$. In each iteration, the real power consumption $p(i)$ is measured and the estimator updates $C'(i)$ based on the following equation:
$$C'(i) = C'(i-1) + \frac{e(i)\,O(i)\,M(i-1)}{\lambda + O(i)\,M(i-1)\,O(i)^{\top}} \qquad (5)$$
where $e(i) = k\,p(i) - O(i)^{\top} C'(i)$ is the estimation error, $M(i)$ is the covariance matrix, and $\lambda$ is the constant forgetting factor within $[0, 1]$; a smaller $\lambda$ enables the estimator to forget the history faster.
Figure 2: Pareto curves formed by different cost functions. (Axes: T, query processing time; E, energy cost. Curves: C = E + T, C = ET, C = ET^2.)
The following routines are invoked at every period of model updating: (1) the RLS estimator records the operation vector $O(i)$ and the total power consumption of plans, $p(i)$; (2) it computes $C'(i)$; (3) it updates $\vec{c}'$ with the $c_i$ values in $C'(i)$.
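The per-period routine above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it uses the textbook RLS update with plain exponential forgetting rather than the directional forgetting of [7], it estimates only the per-operation costs (without the extra component $k$), and the "measured" power values are synthetic, noise-free numbers generated from a hypothetical true parameter vector.

```python
import random


def rls_update(theta, M, phi, y, lam=0.98):
    """One textbook RLS step with exponential forgetting factor lam.

    theta: current parameter estimate, M: covariance matrix (list of rows),
    phi: operation vector for this period, y: measured power for this period.
    Returns the new estimate, the new covariance matrix, and the error.
    """
    n = len(theta)
    error = y - sum(p * t for p, t in zip(phi, theta))
    m_phi = [sum(M[r][c] * phi[c] for c in range(n)) for r in range(n)]
    denom = lam + sum(p * mp for p, mp in zip(phi, m_phi))
    gain = [mp / denom for mp in m_phi]
    new_theta = [t + g * error for t, g in zip(theta, gain)]
    # M is symmetric, so phi^T M equals (M phi)^T and m_phi can be reused.
    new_M = [[(M[r][c] - gain[r] * m_phi[c]) / lam for c in range(n)]
             for r in range(n)]
    return new_theta, new_M, error


def scaled_identity(n, scale):
    """Initial covariance: a large multiple of the identity matrix."""
    return [[scale if r == c else 0.0 for c in range(n)] for r in range(n)]


true_c = [2.0, 0.5, 1.0]      # hypothetical "real" per-operation power costs
rng = random.Random(0)         # fixed seed so the run is repeatable
theta = [0.0, 0.0, 0.0]        # initial guess for c'
M = scaled_identity(3, 1000.0)

for _ in range(50):
    phi = [rng.uniform(0.0, 10.0) for _ in range(3)]   # synthetic counts
    y = sum(p * c for p, c in zip(phi, true_c))        # synthetic power
    theta, M, err = rls_update(theta, M, phi, y)

print([round(t, 3) for t in theta])
```

With noise-free synthetic data the estimate approaches the hypothetical true vector after a few periods; with real measurements, the forgetting factor trades tracking speed against sensitivity to noise.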
2.3
Plan Evaluation Model
Basic Idea. Towards a system-level optimization goal, the plan evaluation model provides a criterion to evaluate the superiority of alternative query plans with different operation vectors $\vec{o}$. In our case, the optimization goal is stated in Problem 1. Recall the 2D graph shown in Fig. 1, where each plan is represented as a point in the 2D space of power and time costs. It is easy to choose between two plans if one dominates the other in this space. The decision becomes difficult if neither one dominates, and the plan evaluation model is designed to make that decision. One way to view the role of the evaluation model is illustrated in Fig. 2. The superiority of a plan is some function of the plan’s energy cost $E$ and performance $T$. Such a function defines a series of Pareto curves in the 2D space formed by the domains of $E$ and $T$, and the Pareto curves can be used to choose among non-dominated points (plans) in the space. Using different curves allows us to give preference to different areas of the 2D graph during plan selection. For example, the cost function $C = E + T$ favors plans with balanced performance and energy costs, while the function $C = ET$ favors those with either good performance or low energy cost. The function $C = ET^2$, in turn, gives more preference to those with very short processing time. We propose a metric model with the following format for our power-aware query optimizer.
$$C = P\,T^{n} = E\,T^{n-1} \qquad (6)$$
where $C$ is the aggregated cost (i.e., lower cost means higher superiority) and $n$ is a coefficient that reflects the relative importance of $P$ and $T$ (we will discuss the choice of $n$ later). Intuitively, this model implies that the query optimizer is willing to accept a $d$-fold degradation in performance in exchange for a $d^n$-fold power reduction. The model provides different plan selection strategies for different optimization goals through the choice of $n$. When $n = \infty$, we only consider the time cost (i.e., as the traditional DBMS does); for $n = 0$, we optimize towards the lowest power consumption; and for $n = 1$, power and time performance are both taken into consideration (in other words, we optimize towards total energy cost according to Eq. (1)).
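To make the role of $n$ concrete, the following Python sketch applies Eq. (6) to two candidate plans. The plan names and their (power, time) estimates are hypothetical values chosen so that one plan is slightly slower but draws noticeably less power; treating $n = \infty$ as a pure time comparison is an implementation choice of this sketch.

```python
import math


def plan_cost(power, time, n):
    """Aggregated cost C = P * T**n from Eq. (6).

    For n = infinity only the time cost is compared, which matches the
    behavior of a performance-only optimizer.
    """
    if math.isinf(n):
        return time
    return power * time ** n


def choose_plan(plans, n):
    """Return the name of the plan with the lowest aggregated cost."""
    return min(plans, key=lambda name: plan_cost(*plans[name], n))


# Hypothetical (power, time) estimates: A is 10% slower but draws 25% less.
plans = {"A": (30.0, 11.0), "B": (40.0, 10.0)}

for n in (0, 1, 4, math.inf):
    print(f"n = {n}: choose plan {choose_plan(plans, n)}")
```

In this example the low-power plan A wins for $n = 0$ and $n = 1$, while the faster plan B wins for $n = 4$ and $n = \infty$, since the 1.1-fold slowdown of A is accepted only while $1.1^n$ does not exceed its 4/3-fold power reduction.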
The Main Challenges. The model shown in Eq. (6) provides a means for the DBA to reach the desired trade-off between throughput and energy efficiency. In an ideal world, the system is stable and predictable. The only thing we need to do is to find the right value for $n$, and it can be used for a long time until the system preference (i.e., the constraint in Problem 1) changes. However, like many other complex systems, a database is rarely stable and predictable, for reasons such as the following. (1) Modeling errors. The estimation of per-operation costs (i.e., $\vec{c}$ and $\vec{c}'$) cannot be 100% correct (even with the online estimator). The values in the operation vector $\vec{o}$ are also the results of best-effort estimations based on incomplete statistics, which is a source of error we inherit from the existing query optimizer. In short, neither $E$ nor $T$ is perfect in our plan evaluation model (Eq. (6)). (2) Workload behavior may make the DBMS unstable. For example, when the workload intensity increases, the potential for energy saving becomes slim and we should increase $n$ automatically. Given the above reasons, we need to construct a framework that dynamically and automatically changes the evaluation parameter for the purpose of: (1) adapting to the current context, including workload and system status; and (2) minimizing the effects of modeling errors.
Control-Based Query Evaluation Model. To meet the above challenges, we propose a control-theoretic solution. Recently, feedback control theory has been successfully applied to power control in servers and found to outperform commonly used heuristic-based solutions. The benefit of having control theory as a theoretical foundation is that we can have (1) standard approaches to choose the right control parameters so that exhaustive iterations of tuning and testing are avoided; (2) theoretically guaranteed control performance such as accuracy, stability, short settling time, and small overshoot; and (3) quantitative control analysis when the system is suffering unpredictable workload variations. To be specific, we will develop a feedback control loop that monitors the actual system throughput (i.e., the output signal) at runtime and then adjusts $n$ (i.e., the input signal), the coefficient that sets the trade-off between energy and processing time. The primary control objective is to guarantee that the throughput $R$ converges to the set point $R_s$ (i.e., the performance bound in Problem 1) within a limited settling time. In the meantime, the controller tries to minimize the energy consumption, which is our optimization goal in Problem 1.
Rigorous control-theoretic design is based on a dynamic model that describes the response of the DBMS (the plant in control terminology) to changes of the input $n$. Since such a model is hard to derive analytically for complex systems such as a database, one way to generate it is to use system identification [3] techniques. The basic idea is to initially model the system as a difference equation with unknown parameters. Generally, a plant can be modeled as
$$R(k) = \sum_{i=1}^{x} a_i R(k-i) + \sum_{j=1}^{y} b_j n(k-j) \qquad (7)$$
where $R(k)$ is the output (i.e., throughput in our problem) at the end of control period $k$; $n(k)$ is the input at period $k$; $x$ and $y$ are the orders of the system output and input, respectively; and $a_i$ and $b_j$ are plant-specific parameters. We can then determine the order and parameters of the difference equation experimentally. Generally, we can start from some knowledge of the plant and then go back and forth between experiments and hypotheses, so that a more refined model is built each time. In the experiments, we can generate a sequence of pseudo-random digital white noise as the control input to stimulate the system and then measure the throughput in each control period. Based on the collected data, we can use the Least Squares Method (LSM) to iteratively estimate the values of parameters $a_i$ and $b_j$. To verify the accuracy of models of different orders, we can then change the seed of the white noise to generate a different sequence of control input, and compare the actual control output to the outputs predicted by the estimated models. If the estimated output based on the system model is sufficiently close to the actual output, the model can be used for controller design. Note that system identification is done offline, once only.
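The identification procedure can be sketched for the simplest case of Eq. (7), a first-order model $R(k) = a\,R(k-1) + b\,n(k-1)$. The Python code below is an illustration only: the plant is simulated with hypothetical parameters rather than measured on a DBMS, the excitation is a pseudo-random binary sequence, and the two parameters are fitted by solving the 2x2 least-squares normal equations directly.

```python
import random


def simulate_plant(a, b, inputs, r0=0.0):
    """Outputs of R(k) = a*R(k-1) + b*n(k-1) for a given input sequence."""
    outputs = [r0]
    for u in inputs[:-1]:
        outputs.append(a * outputs[-1] + b * u)
    return outputs


def fit_first_order(outputs, inputs):
    """Least-squares estimate of (a, b) in R(k) = a*R(k-1) + b*n(k-1)."""
    s_rr = s_ru = s_uu = s_ry = s_uy = 0.0
    for k in range(1, len(outputs)):
        r_prev, u_prev, y = outputs[k - 1], inputs[k - 1], outputs[k]
        s_rr += r_prev * r_prev
        s_ru += r_prev * u_prev
        s_uu += u_prev * u_prev
        s_ry += r_prev * y
        s_uy += u_prev * y
    det = s_rr * s_uu - s_ru * s_ru
    if abs(det) < 1e-12:
        raise ValueError("input is not exciting enough to identify the model")
    # Cramer's rule on the normal equations.
    a = (s_ry * s_uu - s_uy * s_ru) / det
    b = (s_uy * s_rr - s_ry * s_ru) / det
    return a, b


# Hypothetical plant parameters, used only to generate synthetic data.
A_TRUE, B_TRUE = 0.6, 25.0
rng = random.Random(42)
inputs = [rng.choice((0.0, 1.0)) for _ in range(200)]  # pseudo-random input
outputs = simulate_plant(A_TRUE, B_TRUE, inputs)

a_hat, b_hat = fit_first_order(outputs, inputs)
print(f"a = {a_hat:.4f}, b = {b_hat:.4f}")
```

To validate a fitted model in the way described above, one would rerun the experiment with a different seed and compare the measured outputs against the outputs predicted from `a_hat` and `b_hat`.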
Once the system model is found, we can apply standard control techniques to design a controller, which tells us how to change the value of $n$ in order to maintain the desired throughput. We propose to use Proportional-Integral-Derivative (PID) control theory [4] for this purpose. We skip the details of controller design due to page limits.
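Although the controller design itself is omitted here, the general shape of such a loop can be sketched in Python. Everything in this sketch is an assumption made for illustration: the plant is the hypothetical first-order model $R(k) = 0.6\,R(k-1) + 25\,n(k-1)$, the gains are hand-picked so that this particular closed loop is stable, and the derivative gain is set to zero to keep the loop easy to reason about. It is not the controller used in our system.

```python
class PID:
    """Textbook discrete PID controller in positional form."""

    def __init__(self, kp, ki, kd, set_point, out_min=0.0, out_max=10.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.set_point = set_point
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0
        self.prev_error = None

    def step(self, measured):
        """Return the next control input given the measured output."""
        error = self.set_point - measured
        self.integral += error
        derivative = 0.0 if self.prev_error is None else error - self.prev_error
        self.prev_error = error
        raw = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Keep the control input within its allowed range.
        return min(self.out_max, max(self.out_min, raw))


# Hypothetical plant parameters and throughput set point R_s.
A, B = 0.6, 25.0
R_S = 100.0

controller = PID(kp=0.008, ki=0.008, kd=0.0, set_point=R_S)
r = 0.0        # throughput at the start of the run
n = 0.0        # control input (the cost-model parameter)
for _ in range(60):
    n = controller.step(r)   # controller reads throughput, picks n
    r = A * r + B * n        # simulated plant response for the next period

print(f"throughput = {r:.3f}, n = {n:.3f}")
```

For this simulated plant the throughput settles at the set point and the control input settles at the value for which the plant's steady-state output equals $R_s$; on a real DBMS the gains would have to be derived from the identified model.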
3.
PRELIMINARY RESULTS
We thoroughly tested our basic design of the power-aware query optimizer. In our experiments, we modified the PostgreSQL kernel to implement the basic power model and cost evaluation model with a static $n$ value. We deployed the DBMS on a single server and fed the system with various workloads derived from the TPC-C and TPC-H benchmarks. Fig. 3 shows the instantaneous active power consumption (of the whole server) of the same workload under different values of the cost model parameter $n$. When $n$ gets bigger, more power is consumed at all times. Note that the case of $n = \infty$ basically represents a performance-only query optimizer, while the case of $n = 0$ is one that is concerned only with power. The reason why power dropped abruptly in the first two subfigures (after the 150th second in Fig. 3a and the 470th second in Fig. 3b) was that the CPU-bound queries in the workload finished earlier than the I/O-bound queries. Such power drops show the significant share of power consumption by CPUs. In summary, we observed 11-22% power savings and 3-19% energy savings from all the experiments described above. These results provide strong evidence to support our proposed research direction of designing a power-aware query optimizer. Note that, as a proof of concept, the reported power and energy savings were achieved with a static power model and without power-aware hardware. We believe that a more sophisticated energy model and the realization of P-DBMS on top of power-aware hardware systems will produce further significant energy savings.
Figure 3: Power consumption of the TPC-H workloads under three different database sizes. (Panels: a. 500MB database, b. 1GB database, c. 10GB database; axes: Time (second), Power (watt); series: n = ∞, n = 1, n = 0.)
We also systematically studied the energy/performance patterns of our workloads, and found that queries with power-efficient plans that are missed by traditional query optimizers are very common. For example, we found plans that are significantly more energy efficient in 10 out of the 19 queries we studied in TPC-H. The power/energy savings we recorded are caused by executing different query plans, not by random system fluctuations. Details of the above experiments can be found in our ICDE’10 paper [13].
To study the effects of the hardware’s power profile on power-aware query optimization, we recently extended the experiments described above by measuring the power consumption of individual server hardware components. Fig. 4 shows the runtime power distribution of our tested server running at its maximum capacity. We attached three power monitors to the server to capture the power consumption of the CPU, the hard drives, and the whole server. As seen from Fig. 4, in such a server with one disk and a dual-core CPU, the active power of the CPU is far greater than that of the disk. This implies that plans with low CPU consumption deserve more attention since they have larger potential to save power. On the other hand, the power cost of the I/O operations makes very little difference. Another fact is that most of the power is consumed when the system is idling.¹ This clearly points out that lowering the baseline power consumption is the right direction for power-aware database research. We believe the P-DBMS software, when deployed on top of power-aware hardware systems, will render more dramatic energy reductions.
Figure 4: Power breakdown of the tested database server.
Figure 5: CPU and HDD peak power consumption under an online control signal. (Axes: Time (1/18 second), Power (Watts); series: n = ∞, n = 2, n = 1.5, n = 1, n = 0.5; components: CPU, HDD.)
We also explored the feasibility of our control-theoretic framework for power-aware data management. In another experiment (Fig. 5), the value of the cost evaluation model parameter $n$ is changed at runtime to test the DBMS’s response to such changes. The test started with $n = 0$, which forces the power-aware optimizer to take power cost as the only factor in plan selection. Then, at the end of the 200th period, a signal is sent to the DBMS to change $n$ to the other extreme, $\infty$, which makes the DBMS select plans based on performance only. Within a few control periods, we can see an abrupt increase of the CPU power consumption (represented by the red line), which then stabilizes. After that, $n$ is changed to a series of smaller values, and the power consumption starts decreasing and then stabilizes again at a lower level. We believe this shows that the parameter $n$ (or its variations) can be used as a control signal in the feedback control loop. In this experiment, a maximum power reduction of 30% was achieved (but it also rendered unacceptable performance). However, when the system was running between the two extreme values of $n$, we got much better performance in query processing. On the other hand, when reviewing the power consumption of the hard drive (green line in Fig. 5), almost no change in power consumption can be observed although different plans were being executed. This, again, confirms our experimental results given in Fig. 4 that the power cost of the I/O system does not change much under different workloads. It would be interesting to see how new I/O systems built on multi-speed disks or solid-state disks (SSD) behave.
¹ Being idle here means no query is being processed.
4.
CONCLUSION
In this paper, we study the problem of improving energy efficiency in data centers from a new angle: building power-aware DBMS software. Our approach takes advantage of proven techniques from the field of cost-based query optimization in traditional DBMSs and combines them with new models that evaluate the power cost of query plans and select plans based on both performance and power costs. Compared to current power-saving solutions at the hardware and operating system levels, our approach is promising in that it can provide significant extra savings. We have implemented our initial design in PostgreSQL, and experimental results support our expectations on the effectiveness of our solution. We believe power-aware DBMS is an area full of exciting research opportunities since various components in existing DBMSs should be revisited or redesigned to take energy cost into consideration. We advocate that more effort be invested in this topic by the database community.
5.
REFERENCES
[1] U. E. P. Agency. Report to congress on server and data center energy efficiency public law 109-431. 2007.
[2] R. Agrawal, A. Ailamaki, et al. The Claremont report on database research. SIGMOD Rec., 37(3):9–19, 2008.
[3] A. Arasu, B. Babcock, et al. STREAM: The Stanford stream data manager. IEEE Data Eng. Bull., 26(1):19–26, 2003.
[4] G. F. Franklin, J. D. Powell, and M. Workman. Digital Control of Dynamic Systems. Addison-Wesley, 1997.
[5] S. Harizopoulos, M. A. Shah, J. Meza, and P. Ranganathan. Energy efficiency: The new holy grail of data management systems research. In CIDR, 2009.
[6] W. Lang and J. M. Patel. Towards eco-friendly database management systems. In CIDR, 2009.
[7] X. Liu, X. Zhu, P. Padala, Z. Wang, and S. Singhal. Optimal multivariate control for differentiated services on a shared hosting platform.
[8] R. Nambiar and M. Poess. Performance evaluation and benchmarking. In TPCTC, volume 5895 of Lecture Notes in Computer Science. Springer, 2009.
[9] M. Poess and R. O. Nambiar. Energy cost, the key challenge of today’s data centers: a power consumption analysis of TPC-C results. PVLDB, 1(2):1229–1240, 2008.
[10] M. Poess and R. O. Nambiar. Tuning servers, storage and database for energy efficient data warehouses. In ICDE, 2010.
[11] D. Tsirogiannis, S. Harizopoulos, and M. A. Shah. Analyzing the energy efficiency of a database server. In SIGMOD, 2010.
[12] Y. Wang, K. Ma, and X. Wang. Temperature-constrained power control for chip multiprocessors with online model estimation. SIGARCH Comput. Archit. News, 37(3):314–324, 2009.
[13] Z. Xu, Y. Tu, and X. Wang. Exploring power-performance tradeoffs in database systems. In