• No results found

System Energy Efficiency Lab

N/A
N/A
Protected

Academic year: 2021

Share "System Energy Efficiency Lab"

Copied!
22
0
0

Loading.... (view fulltext now)

Full text

(1)

Prof. Tajana Šimunić Rosing Dept. of Computer Science

Sy

stem Energy Efficiency Lab

(2)

Energy efficiency at global scale

National LambdaRail

NSF GreenLight & IRNC: TransLight/StarLight

2.8MW Fuel Cell Power Plant , UCSD Wind from NREL, 2MW Solar @UCSD

(3)

Key issues for distributed

renewable-powered datacenters

Green energy availability varies dramatically

 Instantaneous use leads to significant energy efficiency losses  Prediction is needed

Datacenter computing requires consistent performance

Infrastructure that monitors and manages computation in datacenters has to be aware of performance costs

 Service response times are around 100ms, Max 10% batch job throughput hit

Energy costs of datacenters are typically higher than green

energy availability

Brown energy needs to be present to both supplement green and as “insurance” to meet performance constraints

Improvements in computation & networking infrastructure energy efficiency are necessary (power, thermal and cooling management)

(4)

Datacenter energy efficiency

Barroso & Hölzle, 2009

(5)

Energy efficiency of the infrastructure

Dynamic thermal management (DTM)

• Workload scheduling:

• Machine learning for dynamic adaptation

• Proactive thermal management

Reduces thermal hot spots by average 80% with no performance overhead

• Cooling aware management

Savings of 70% in cooling subsystem

Dynamic power management (DPM)

• HW level: adaptive power gating gives 40% energy savings with no perf. impact • SW level: 92% reduction in performance

variability with DVFS

• Optimal DPM for a class of workloads • Machine learning to adapt

• Select among specialized policies • Measured energy savings of 70%

NSF Project GreenLight

• Green cyber-infrastructure in

energy-efficient mobile facilities

• Closed-loop power and thermal

management

(6)

NSF GreenLight:

Dashboard & History plots

 Multiple sensor data: temperature, fan speed, liquid flow rate & temp, power

 Use measurements to develop models

(7)

GMVQ VM Power Cost Prediction

Goal: Estimate how much a VM consumes and predict what the cost would be if it migrates to another machine

70 W

Power clusters

CPU utilization is not enough!

Tajana Simunic Rosing, UCSD

Approach: Gaussian Mixture Vector Quantization (GMVQ) to fit a GMM to the training data

GMVQ is 3x better than regression

3x

(8)

Energy management with virtualization

CPU0 CPU1 NW CPU2 CPU n HD Hypervisor Guest n I/O CPUs Hardware

I/O Intensive? CPU Intensive? Guest 1 Guest 2 Apps OS Apps OS Apps OS Scheduler Workload Characterization VCPU1 VCPU2 wMPC wIPC util wMPC wIPC util vMPC vIPC vutil vMPC vIPC vutil nMPC nIPC nutil VM1 VM2 wMPC wIPC util wMPC wIPC util VCPU1 VCPU2 VGNODE  nMPCvMPC VM   nIPCvIPC VM   nutilvutil VM   vMPCwMPC VCPU   vIPCwIPC VCPU  vutilutil VCPU  • Scheduling

• Co-locate guests with orthogonal characteristics

• Management policies

• Based on the metrics maintained per guest

 Avg 35% Energy Savings  Avg 40% speedup

vGreen

(9)

vGreen+

Batch Jobs: •MIPS driven Services: •Latency sensitive

 Maximize qMIPS/Watt

 q  QoS ratio < 1

 MIPS  Batch job throughput

 Watt  Power consumption

vgnodes

vgserv

Tajana Simunic Rosing

(10)

Unmodified Xen: RUBIS w batch job

Poor SLA!

vGreen: Rubis & batch job

Great SLA!

Managing multi-tier applications

RUBiS: auction website

Clients

Webserver (PHP)

Database (mysql)

Tajana Simunic Rosing, UCSD

 What happens when we combine:

 Latency sensitive jobs (e.g. RUBIS)

 Throughput sensitive (e.g. batch jobs)

 Preliminary results:

 More than 10x improvement in SLA with

(11)

State of the art:

Baseline: running services and batch jobs on separate servers  Selective Consolidation (tChar): vGreen

Capping (tCap): Cap the CPU allotment to bVM to mitigate

interference effects (Padala@EuroSys’09, Nathuji@EuroSys’10)

Controller

:

 Dynamically control the vCPU allocation of the service VM to maximize the batch job throughput while meeting service response times

VM Scheduling Policies Throughput

Our controller is within 7% of baseline; CPU capping has 25% lower average throughput

Batch job

throughput

(12)

Use a controller to manage virtual CPUs dynamically

Maximize CPUs of various batch jobs while meeting Rubis SLAs

Energy Efficiency Improvements

70% more efficient than running service & batch VMs separately while within 7% of maximum batch job throughput

(13)

Green Energy Prediction

• Data from solar panels at UCSD

• State of the art: exponential weighted average

• EWMA: 32.6 % error

• Extended eEWMA: 23.4% error

Our algorithm:

WCMA: less than 9.6% error

• Data gathered from a wind farm in Lake Benton, available by NREL

• State-of-the-art:

• Integrated predictor: 48.2% error

Our algorithm: 21.2% error

• Combination of a weighted nearest-neighbor (NN) tables and wind power curve models

Predict green energy availability for the next 30min window

(14)

Methodology

Use green energy to schedule “extra” batch jobs.

 MapReduce (MR) type jobs for this purpose.

Initiate more subtasks with the available green energy.

 Increased throughput

 Reduced completion time - limit reduction to 10%  Kill a subtask if the green energy supply level drops

MR arrival

Active MR jobs Servers

Web Request arrival

Green Energy Supply

Server Web Request Queues

(15)

Experimental setup & validation

Globally distributed

datacenters connected

5 datacenters, 12 routers, modeled after ESnet

 Solar traces from UCSD  Wind traces from NREL

Tajana Simunic Rosing

Simulator validation Measured

Value

Simulated Value

Ave. Error

Avg. Power Consumption 246 W 251 W 3%

Rubis QoS ratio 0.08 0.085 6%

Avg. MapReduce Completion Time 112 min 121 min 8%

Jobs run within vGreen VMs on Nehalem server

Rubis used for services with 100ms 90th%ile response time constraint MapReduce used for batch jobs with 10% max job completion time

reduction (max 5 cores on Nehalem server)

(16)

Benefits of Green Energy Prediction

0 0.2 0.4 0.6 0.8 1

Solar GE Efficiency Wind GE Efficiency Combined GE Efficiency

Normali ze d Quant ities instantaneous prediction

Prediction has 93% GE Efficiency

GE Efficiency: ratio of green energy consumed for useful work vs. the total green energy available

Compare our green energy

predictive

jobs scheduler

(17)

Benefits of Green Energy Prediction

0 0.2 0.4 0.6 0.8 1

job completion time

Normali ze d Job Comletion Tim e

w/o GE instantaneou prediction

Prediction has 22% faster batch job completion time vs. instantaneous On average, 5x fewer batch tasks need to be terminated when using GE prediction vs. instantaneous usage

0 0.2 0.4 0.6

Solar GE Job % Wind GE Job % Combined GE Job %

Normali ze d Quant ities instantaneous prediction

38% more jobs complete with prediction vs. instantaneous

GE Job %: ratio of batch jobs completed with GE over all jobs.

(18)

Next steps: Green energy

powered global routing

(jointly with Inder Monga, Esnet)

2.8MW Fuel Cell Power Plant , UCSD Wind from NREL, 2MW Solar @UCSD

(19)

Focus Center Research Program

Sponsors

DOD & DARPA

Applied Materials Novellus Cadence

AMD MICRON Freescale Texas Instr. IBM Xilinx

Intel GlobalFoundries

Intel MICRON

Freescale Texas Instruments IBM Xilinx

GLOBALFOUNDRIES AMD

Director: Prof. Paul Kohl (Georgia Tech). Nanoscale electrical & optical interconnects; energy delivery and thermal management; wireless connectivity; modeling, analysis and assessment of new connectivity solutions.

Interconnect Focus Center [13 Universities]

Director: Prof. Larry Pileggi (CMU). Circuit/module infrastructure; enterprise systems; portable electronics; functional diversity and emerging circuits for post CMOS.

Center for Circuit & Systems Solutions [13 Universities]

Director: Prof. Jan Rabaey (UC-Berkeley). High-level systems design addressing distributed sense and control systems, large-scale and small-large-scale information technologies systems.

Multi-Scale Systems Research Center [10 Universities]

Functional Engineered Nano-Architectonics [14 Univ.]

Director: Prof. Kang Wang (UCLA). Novel materials and processes which enable fabrication of nanoscale devices and interconnects.

Materials, Structures and Devices [15 Universities]

Director: Prof. Dimitri Antoniadis (MIT). Integration of new

materials enabling CMOS extension; carbon-based devices; novel embedded memory; functional diversification; theory, modeling and simulation of new devices.

Director: Prof. Sharid Malik (Princeton). Platform architectures; concurrent systems programming; platform viability; resilient systems and alternative computation models.

Gigascale Systems Research Center [15 Universities]

Raytheon United Technologies

(20)

DSCS Theme

Distributed sense and control systems

Target: Airborne Platforms (Avionics) 5 faculty (reduced from 8), 3 locations

LSS Theme

Large-scale “energy-intensive” systems

Target: Data centers

9 faculty (stable), 5 locations

SSS Theme

Small-scale “energy-frugal” systems

Target: Human enhancement

5 faculty (reduced from 8), 4 locations

MuSyC in a Nutshell

GRAND CHALLENGE : “Energy-smart” distributed systems, that

 Are deeply aware of balance between energy availability and demand

 Adjust behavior through dynamic and adaptive optimization through all

scales of design hierarchy.

• 19 Faculty Distributed over 9 US Universities • 60 Students (many only

partly funded) • 109 publications

• Monthy e-seminars and bi-annual e-workshops 29% 32% 26% 13% 2010-2011 DSCS LSS SSS Center

(21)

Multiscale Systems Center:

Energy Balanced Datacenters

Theme leader: Tajana Simunic Rosing

Realize closed-loop energy management strategies that enable “energy-intensive”

large-scale systems to be orders of magnitude more energy-efficient, while ensuring that mission-critical goals are met.

(22)

Average cooling savings of 70%

relative to state-of-the-art

Combined Energy Thermal & Cooling,

CETC Management Results

Tajana Simunic Rosing, UCSD

CETC: Our policy

DLB: Dynamic load balancing (baseline)

NFMO: Only page migrations allowed

NMM: No memory clustering

NCM: No CPU scheduling optimizations Intel Xeon Dual Socket

Quad core Server; with state-of-the-art PI fan controller

CETC performance overhead < 0.2%

CETC page migration rate < 5 pages/sec -> negligible overhead & high stability

Local ambient Temp. # DIMM CETC % NFMO % NMM % NCM % DLB % 45 oC 8 0.175 0.184 0.093 0.102 0 35 OC 8 0.115 0.120 0.069 0.100 0 45 oC 16 0.175 0.187 0.109 0.102 0

References

Related documents

Waste refractory materials that are not to be processed and reused on site must be sampled prior to sending for disposal at an approved landfill site using a licensed

Core Energy Meditation ™ will teach you how to achieve a Core Energy State, a state of strong, positive, clear, and coherent vibration.. The Core Energy Technique

Energy efficiency, water efficiency and renewable energy improvements may be eligible for Commercial PACE financing.

[r]

Tubing and Filter Changes: Change 0.22-micron filter &amp; IV primary pump tubing (TPN) every 72 hours.. Change IV primary pump tubing (Lipids) every

• The paper presented an architectural pattern that helps software architects to design intelligent enterprise systems that combine interactive features with batch processing to

ASP’s Internet Data Transport Internet Internet Data Data Transport Transport Project Web Site Project Project Web Site Web Site Construction Docs Construction Docs Shop

This study aimed to understand why women choose to wear men’s clothing and to use this to question gender assignment in clothing, in order to develop design concepts for