• No results found

Dynamic Thermal Management for Distributed Systems

N/A
N/A
Protected

Academic year: 2021

Share "Dynamic Thermal Management for Distributed Systems"

Copied!
32
0
0

Loading.... (view fulltext now)

Full text

(1)

Dynamic Thermal Management

for

Distributed Systems

Andreas Weissel • Frank Bellosa

Department of Computer Science 4 (Operating Systems) University of Erlangen-Nuremberg

(2)

Benefits of Dynamic Thermal Management

■ Cooling servers, server clusters

◆cooling facilities often dimensioned for worst-case temperatures or overprovisioned

■ Guarantee temperature limits

◆no need for overprovisioning of cooling units

◆reduced costs (floor space, energy consumption, maintenance, ...) ■ Increased reliability

◆safe operation in case of cooling unit failure

◆avoid local hot-spots in the server room

(3)

Drawbacks of Existing Approaches

■ If critical temperature is reached

◆throttle the CPU:

e.g. halt cycles, reduced duty cycle, reduced speed

■ But: neglect of application-, user- or service-specific requirements due to missing online information about

◆the originator of a specific hardware activation and

◆the amount of energy consumed by that activity

(4)

Outline

■ From events to energy

◆event-monitoring counters

◆on-line estimation of energy consumption ■ From energy to temperature

◆temperature model ■ Energy Containers

◆accounting of energy consumption

◆task-specific temperature management

(5)

Approaches to Energy Characterization

■ Reading of thermal diode embedded in modern CPUs

◆low temporal resolution

◆significant overhead

(6)

Approaches to Energy Characterization

■ Reading of thermal diode embedded in modern CPUs

◆low temporal resolution

◆significant overhead

➜no information about originator of power consumption ■ Counting CPU cycles

◆time as an indicator for energy consumption

◆time as an indicator for contribution to temperature level

◆throttling according to runtime

(7)

Approaches to Energy Characterization

■ P4 (2 GHz) running compute intensive tasks: CPU load of 100%

◆variation between 30–51 W dd nc dd ad re op re nc L1 L2 ler 2 a 60 p r 0 10 20 30 40 50

power consumption [Watt]

idle power consumption: 12.3 W

(8)

From Events to Energy: Event-Monitoring Counters

■ Event counters register energy-critical events in the complete system architecture.

◆several events can be counted simultaneously

◆low algorithmic overhead

◆high temporal resolution

◆fast response ■ Energy estimation

➜correlate a processor-internal event to an amount of energy

◆select several events and use a linear combination of these event counts to compute the energy consumption

(QHUJ\ HYHQWL⋅ ZHLJKWL

L

=

(9)

From Events to Energy: Methodology

■ Measure the energy consumption of training applications

■ Find the events with the highest correlation to energy consumption

■ Compute weights from linear combination of event counts and real power measurements of the CPU

➜solve linear optimization problem:

find the linear combination of these events that produce the minimum estimation error

➜avoid underestimation of energy consumption

PLQ HYHQWL ⋅ZHLJKWL– PHDVXUHGHQHUJ\

L

(10)

From Events to Energy: Methodology

■ Set of events and their weights

■ Limitations of the Pentium 4

◆insufficient events for MMX, SSE & floating point instructions

the case for dedicated Energy Monitoring Counters

event weight [nJ]

time stamp counter 6.17

unhalted cycles 7.12 µop queue writes 4.75 retired branches 0.56 mispred branches 340.46 mem retired 1.73 ld miss 1L retired 13.55

(11)

Outline

■ From events to energy

◆event-monitoring counters

◆on-line estimation of energy consumption

From energy to temperature

◆temperature model ■ Energy Containers

◆accounting of energy consumption

◆task-specific temperature management

(12)

From Energy to Temperature: Thermal Model

■ CPU and heat sink treated as a black box with energy in- and output

◆energy input: electrical energy being consumed

◆energy output: heat radiation and convection

CPU

Energy(#Events) Thermal Capacity Heat Sink Thermal Resistance Energy(Temp)

(13)

From Energy to Temperature: Thermal Model

Energy input: energy consumed by the processor

CPU

Energy(#Events) Thermal Capacity Heat Sink Thermal Resistance Energy(Temp) G7 = F3GW &38SRZHU FRQVXPSWLRQ 3

(14)

From Energy to Temperature: Thermal Model

Energy output: primarily due to convection

CPU

Energy(#Events) Thermal Capacity Heat Sink Thermal Resistance Energy(Temp) G7 = –F 7 7( – )GW DPELHQW WHPSHUDWXUH 7 G7 = F3GW &38SRZHU FRQVXPSWLRQ 3

(15)

From Energy to Temperature: Thermal Model

■ Altogether:

◆energy estimator ➔ power consumption P

◆time stamp counter ➔ time interval dt

◆the constants , and have to be determined

G7 = [F3 F– (7 7– )]GW

(16)

From Energy to Temperature: Thermal Model

■ Altogether:

◆energy estimator ➔ power consumption P

◆time stamp counter ➔ time interval dt

◆the constants , and have to be determined ■ Solving this differential equation yields

G7 = [F3 F– (7 7– )]GW F F 7 7 W( ) –F F ---⋅ H F– W F F ---- ⋅3 7+ + =

(17)

Thermal Model: Dynamic Part

■ Measurements of the processor temperature

◆on a sudden constant power consumption and

◆a sudden power reduction to HLT power.

➜fit an exponential function to the data: coefficient = F

40 45 50 55 60 Idle Active Idle (50 W) (12 W)

(18)

Thermal Model: Static Part

■ Static temperatures and power consumption of the test programs

10 20 30 40 50

power consumption [Watt]

30 35 40 45 50 55 60

(19)

Thermal Model: Static Part

■ Linear function to determine and F 7

35 40 45 50 55 60

(20)

Thermal Model: Implementation

■ Linux 2.6 kernel

■ Periodically compute a temperature estimation from the estimated energy consumption

■ Deviation of a few degrees celsius over 24 hours

◆or if ambient temperature changes

(21)

Thermal Model: Accuracy

40 45 50 55 60 temperature (celsius) estimated temperature measured temperature

(22)

Outline

■ From events to energy

◆event-monitoring counters

◆on-line estimation of energy consumption ■ From energy to temperature

◆temperature model

Energy Containers

◆accounting of energy consumption

◆task-specific temperature management

(23)

Properties of Energy Accounting

■ Accounting to different tasks/activities/clients

◆example: web server serving requests from different client classes

◆e.g. Internet/Intranet, different service contracts ■ “Resource principal” can change dynamically

■ Client/server relationships between processes

(24)

Energy Containers

■ Resource Containers [OSDI ’99] ➔ Energy Containers

◆separation of protection domain and “resource principal” ■ Container Hierarchy

◆root container (whole system)

◆processes are attached to containers

◆this association can be changed

dynamically (client/server relationship) ➜energy is automatically accounted to

the activity responsible for it ■ Energy shares

◆amount of energy available (depending on energy limit)

◆periodically refreshed

◆if a container runs out of energy, its processes are stopped web server

root

Intranet Internet

80% 20%

(25)

Energy Containers

■ Example:

web server working for two clients with different shares

10 20 30 40 50 60

power consumption (Watt)

web server (total) client1 (80% share) client2 (20% share)

(26)

Task-specific Temperature Management

■ Periodically compute an energy limit for the root container (depending on the temperature limit )

■ Dissolve to ➔

■ Energy budgets of all containers are limited according to their shares

■ Tasks are automatically throttled according to their contribution to the current temperature

■ Throttling is implemented by removing tasks from the runqueue 7OLPLW

G7 = [F3 F+ (7 7– )]GW 7≤ OLPLW– 7

(27)

Temperature Management

■ Example: Enforcing a temperature limit of 45º

40 45 50 55 temperature (celsius) estimated temperature measured temperature temperature limit

(28)

Outline

■ From events to energy

◆event-monitoring counters

◆on-line estimation of energy consumption ■ From energy to temperature

◆temperature model ■ Energy containers

◆accounting of energy consumption

◆task-specific temperature management

Infrastructure for temperature management in distributed

systems

(29)

Energy Containers

■ Distributed energy accounting

◆Id transmitted with the network packets (IPv6 extension headers)

◆receiving process attached to the corresponding energy container

◆temperature and energy are cluster-wide accounted and limited

◆transparent to applications and unmodified operating systems

id 1 root Id 2 id 1 root id 2 id 1 root id 2 IPv6 IPv6

(30)

Energy Containers

■ Energy accounting across machine boundaries

◆requests from two different clients represented by two containers

◆web server sends requests to factorization server

➜the energy consumption of the server is correctly accounted to the client

0 500 1000

server #1 (web server)

time [s] 0 500 1000 0 10 20 30 40 50 power consum ption [W] server #2 (factorization) time [s] 1500 1500 container 1 (id 1) root container (total) container 2 (id 2)

(31)

Infrastructure for DTM in Distributed Systems

■ Distributed energy accounting

■ Foundation for policies managing energy and temperature in server clusters

◆account, monitor and limit energy consumption and temperature of each node

■ Examples

◆set equal energy/temperature limits for all servers

➜cluster-wide uniform temperature and power densities, no hot spots in the server room

◆use energy/temperature limits to

➜throttle affected servers in case of a cooling unit failure

(32)

Conclusion

■ Event-monitoring counters enable

◆on-line energy accounting

◆task-specific temperature management

■ Correctly account client/server relations across machine boundaries

■ Transparent to applications and unmodified operating systems

■ Future directions

◆examine more sophisticated energy models

References

Related documents

Revenue marketing is a new competency and when you begin the journey, put simply, “you don’t know what you don’t know.” As your team transitions from a creative organization to

Two out of 15 websites said they offered a whole of market search, eight websites said they offered quotes from a panel and five did not make it clear what type of search was

The research problems include: the changing role of industry in the Polish economy in the era of economic transformation, European integration based and globalization on its share

1 This study explored a manifold range of aspects linked to Asian migrations in Europe, but in this essay we will focus specifically on the findings concerning Chinese

Now we consider a dynamic where countries engage in a repetition of the static game analysed above. We assume at the start of the period, countries are in- formed about the

Recent studies demonstrate that the lanthanide −titanium oxo clusters (LnTOC) generally have excellent luminescence performance not only because there are usually O-bridges instead

Moreover, adults shifted by the correct distance to maximize their expected gain given their own visuomotor precision; an ANOVA comparing actual and optimal aiming-points across

46,47 In longitudinal Pima Indian studies, poor glycemic con- trol of type 2 diabetes was associated with an 11-fold increased risk of progressive bone loss compared to