
vManage: Loosely Coupled Platform and Virtualization Management in Data Centers

Sanjay Kumar (Intel), Vanish Talwar (HP Labs),

Vibhore Kumar (IBM Research), Partha Ranganathan (HP Labs),

Karsten Schwan (Georgia Tech)


Problem – Silos of management domains

Inefficiency and redundancy

High costs for customers

Platform Mgmt. (e.g., server config, power mgmt.)

Virtualization/OS Mgmt. (e.g., VM prov., runtime monitoring)

App Mgmt. (e.g., SLA mgmt., patch updates)

Subsystem-level solutions across layers

An illustrative problem with silos

1 July 2009

[Figure: average power (Watts) and response time (msecs) plotted against time (sec)]

Oscillations among SLA and power violations

Significant violations & instability due to oscillatory behavior

vManage: New enablements for coordinated mgmt.

[Diagram labels: Node(s); Platform Sensors & Actuators; Virtualization & App Sensors & Actuators; Local Virtualization Access Point; Utilization Repository; Management Node; Power Usage Repository; Platform Manager; Cluster Virtualization Manager]

Key building blocks

[Diagram: the same architecture, extended with Coordinator 1, Coordinator 2, and Registry & Proxy Service components alongside the Platform Manager and Cluster Virtualization Manager]

vManage: New enablements for coordinated mgmt.

[Diagram labels: Node(s); Platform Sensors & Actuators; Virtualization & App Sensors & Actuators; Local Virtualization Access Point; Utilization Repository; Management Node; Power Usage Repository; Platform Manager; Cluster Virtualization Manager; Coordinator 1; Coordinator 2; Registry & Proxy Service; Key building blocks; Platform-aware Virtualization Management; Virtualization-aware Platform Management]

Benefits

• Structured & automated
• Loosely-coupled & extensible
• Works with legacy controllers

Challenges

• Discovery & meta-data registration
• Coordinated policies


Coordinated policies

Current State of Art

[Diagram labels: Virtualization Mgr. Requests; VM Resource Requirements; Node 1, Node 2, Node 3, ..., Node n]

Platform-aware Virtualization Manager

[Diagram labels: Virtualization Mgr. Requests; Coordinator; VM Res Rqt + Power Budget; VM migration; Node 1, Node 2, Node 3, ..., Node n]
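The "VM Res Rqt + Power Budget" idea on this slide can be sketched as a filter over candidate nodes: a node qualifies only if it has spare capacity for the VM and stays under its power budget after placement. This is a minimal sketch under stated assumptions. The `Node` fields, the linear watts-per-core power estimate, and `pick_node` are hypothetical names chosen for illustration and are not taken from vManage.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    cpu_free: float        # spare CPU capacity (hypothetical unit: cores)
    power_now: float       # current power draw in watts
    power_budget: float    # power cap for this node in watts
    watts_per_core: float  # assumed linear power cost of one busy core


def pick_node(nodes: List[Node], vm_cpu: float) -> Optional[Node]:
    """Return the first node that satisfies both the VM's CPU requirement
    and its own power budget after placement, or None if no node fits."""
    for node in nodes:
        projected_power = node.power_now + vm_cpu * node.watts_per_core
        if node.cpu_free >= vm_cpu and projected_power <= node.power_budget:
            return node
    return None


nodes = [
    Node("node1", cpu_free=2.0, power_now=290.0, power_budget=300.0, watts_per_core=20.0),
    Node("node2", cpu_free=1.5, power_now=220.0, power_budget=300.0, watts_per_core=20.0),
]
chosen = pick_node(nodes, vm_cpu=1.0)
print(chosen.name if chosen else "no feasible node")
```

Here node1 has enough CPU but would exceed its budget, so a resource-only manager would pick it while the power-aware filter moves on to node2.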

Coordinated policies

Platform-aware Virtualization Manager

[Diagram labels: Virtualization Mgr. Requests; Coordinator; VM Res Rqt + Power Budget + Stability; Node 1, Node 2, Node 3, ..., Node n]

Stability, e.g.,

$$p_r^j(t_0, T) = \frac{1}{T}\int_{t_0}^{t_0+T} F_{v_{M_j}}\big(a^j(t), t\big)\,dt, \quad r \in R$$

Coordinated policies

Current State of Art

[Diagram labels: Power Mgr.; Perf counters; Power Violations; P-State actuators]

Coordinated policies

Virtualization-aware Power Manager

[Diagram labels: Power Mgr.; Coordinator; SLA + power violations; SLA notifications; Power Violations; P-State actuators; VM migration actuators; To Platform-aware Virtualization Manager]
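One way to read this slide is that the power manager gains a second actuator (VM migration, requested through the virtualization manager) and a second input (SLA notifications). The sketch below shows a possible decision rule built from those inputs. The rule itself, the `Action` names, and `decide` are assumptions made for illustration; the slides do not spell out the actual policy.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    LOWER_PSTATE = "lower_pstate"
    REQUEST_MIGRATION = "request_migration"


def decide(power_watts: float, power_budget: float, sla_violated: bool) -> Action:
    """Choose one actuation for a node (illustrative policy, an assumption).

    - Within budget: do nothing.
    - Over budget, SLA healthy: throttle locally with a lower-frequency P-state.
    - Over budget, SLA already violated: throttling would slow the application
      further, so ask the virtualization manager to migrate a VM instead.
    """
    if power_watts <= power_budget:
        return Action.NONE
    if sla_violated:
        return Action.REQUEST_MIGRATION
    return Action.LOWER_PSTATE


print(decide(310.0, 300.0, sla_violated=False).value)
print(decide(310.0, 300.0, sla_violated=True).value)
```

A power manager without SLA notifications has only the throttling branch, which is the kind of silo behavior the earlier slides associate with oscillating SLA and power violations.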

Stability

[Diagram labels: Virtualization Mgr. Requests; Platform-aware Virtualization Manager; Coordinator; VM Res Rqt + Power Budget + Stability; Node 1, Node 2, Node 3, ..., Node n]

Stability criterion

Pick the node with the highest probability that the placement decision remains valid over a certain duration in the future


Stability (contd.)

• Assume offline profiling of applications, or behavior traces, is available

• i.e., the mean, standard deviation, and PDF are known a priori

• The average probability with which host j can provide sufficient resources of a particular type to a set $M_j$ of VMs over a given time interval T is

$$p_r^j(t_0, T) = \frac{1}{T}\int_{t_0}^{t_0+T} F_{v_{M_j}}\big(a^j(t), t\big)\,dt$$

• If we assume the PDF to be that of a normal distribution

$$F_{v_{M_j}}(x, t) = \frac{1}{2}\left(1 + \operatorname{erf}\left(\frac{x - \mu_{v_{M_j}}(t)}{\sigma_{v_{M_j}}(t)\sqrt{2}}\right)\right)$$
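Under the normal-distribution assumption, the stability probability can be evaluated numerically from the profiled mean and standard deviation. The sketch below assumes the average probability is a time average of the normal CDF over the interval starting at t0 with length T, and it reads `a(t)` as the capacity the host can offer at time t. Both readings are assumptions, as are the function names, the host names, and the demand numbers.

```python
import math
from typing import Callable


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Normal CDF: 1/2 * (1 + erf((x - mu) / (sigma * sqrt(2))))."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def average_probability(
    available: Callable[[float], float],  # hypothetical a(t): capacity the host offers
    mu: Callable[[float], float],         # profiled mean demand of the VM set at time t
    sigma: Callable[[float], float],      # profiled standard deviation at time t
    t0: float,
    horizon: float,                       # the interval length T
    steps: int = 200,
) -> float:
    """Midpoint-rule estimate of (1/T) * integral over [t0, t0+T] of F(a(t), t) dt."""
    dt = horizon / steps
    total = 0.0
    for k in range(steps):
        t = t0 + (k + 0.5) * dt
        total += normal_cdf(available(t), mu(t), sigma(t))
    return total / steps


# Two hypothetical hosts facing the same demand profile (mean 3.0, std 1.0).
hosts = {"node1": 3.0, "node2": 4.0}  # constant available capacity per host
scores = {
    name: average_probability(lambda t, c=cap: c, lambda t: 3.0, lambda t: 1.0, 0.0, 60.0)
    for name, cap in hosts.items()
}

# Stability criterion: pick the host with the highest average probability.
best = max(scores, key=scores.get)
print(best, round(scores[best], 4))
```

With time-varying profiles, the three lambdas would be replaced by functions built from the offline traces; the selection step stays the same.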

Prototype implementation

Per-node view

[Diagram labels: Application; OS; Guest VM(s); Dom0; Dom-M; Power Manager; Coordinator; Registry & Proxy Services; SLA Sensors; Virt. Sensors, Actuators; Power Actuators; Xen Hypervisor; iLO; iLO Firmware; Power Sensors; x86 Hardware]

Experimental Setup

• Hardware

− Dual-core dual-socket machines with Intel 5150 processors
− 4 GB memory per machine

• Applications

− RUBiS multi-tier app (4 VMs)
− Nutch Web 2.0 app (1 VM)
− Webserver app (1 VM)
− Batch model app: CPU-intensive custom script

• Sensors and actuators

− Custom SLA sensors
− iLO power sensors and offline model-based calibration
− XenMon for utilization
− CPU freq. driver for P-state actuations
− VM migration actuator (using Xen API)

Evaluation Results

28 VMs run over 20 hours on a 13-node testbed

−10 Nutch instances, 3 RUBiS instances, 6 static webserver instances

[Figure: three bar charts comparing Base (no coordination) with the Coordinated Solution (vManage): Average Power (Watts), Stability (# VM migrations), and SLA Violations (normalized to Base)]

Significantly better QoS (71%)

Improved power savings (10%)


Snapshot of prototype operation

[Figure: two panels, Application response time (sec) vs. Time (sec) and Power usage (Watts) vs. Time (sec), each comparing the traditional manager with our approach]

− Traditional manager at low load
− Traditional manager at high load: violations
− Coordinated manager at low load: lower power
− Coordinated manager at high load: VM migration
− Richness of information & actuation

[Figure: three bar charts for Base, C1, C2, and C3: SLA Violations (normalized to Base), Power (normalized to Base), and Stability (# migrations)]

Benefits of our approach

Additional experimentation

Effects of stabilizer in coordinated solution

C1: “First fit” coordinated placement

C2: “Best fit” coordinated placement

C3: “Stability-aware” coordinated placement
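The three placement variants can be contrasted with a short sketch. The slides only name C1 to C3, so the code uses the textbook definitions of first fit and best fit, and for the stability-aware variant it applies the stability criterion (highest probability that the placement stays valid). All function names, node names, and numbers are hypothetical.

```python
from typing import Dict, Optional


def feasible(free: Dict[str, float], demand: float) -> Dict[str, float]:
    """Nodes whose free capacity covers the VM's demand."""
    return {node: cap for node, cap in free.items() if cap >= demand}


def first_fit(free: Dict[str, float], demand: float) -> Optional[str]:
    """C1: first feasible node in iteration order."""
    for node in feasible(free, demand):
        return node
    return None


def best_fit(free: Dict[str, float], demand: float) -> Optional[str]:
    """C2: feasible node that leaves the least capacity unused."""
    ok = feasible(free, demand)
    return min(ok, key=lambda n: ok[n] - demand) if ok else None


def stability_aware(
    free: Dict[str, float], demand: float, stability: Dict[str, float]
) -> Optional[str]:
    """C3: feasible node with the highest stability probability."""
    ok = feasible(free, demand)
    return max(ok, key=lambda n: stability[n]) if ok else None


free = {"node1": 4.0, "node2": 1.5, "node3": 2.0}       # free capacity per node
stability = {"node1": 0.6, "node2": 0.7, "node3": 0.9}  # assumed probabilities

print(first_fit(free, 1.0), best_fit(free, 1.0), stability_aware(free, 1.0, stability))
```

On the same inputs the three policies pick three different nodes, which is why the number of later migrations can differ between them.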

Related Work

• Several individual mgmt. solutions for virtualization mgmt., platform mgmt., and application mgmt.

− exist in isolated silos and represent partial subsystem-level solutions

• A few recent studies move towards coordinated and unified mgmt. [Raghavendra08], [Nathuji07], [Verma08], [Kephart07, Das08], [Chen08], [Adve02]

− lack a systematic systems/architecture approach to the coordination problem across hw-sw
− some focus on ad-hoc solutions dealing with limited actuators only

• Overall, vManage takes a loosely-coupled and practical approach

− works with most existing mgmt. infrastructures
− easy plug-and-play of coordination solutions
− extensible with multiple actuators

• Real prototype solution with enterprise applications running on a large


Summary

Management silos are a critical and relevant problem

Our contributions

− architecture for cross-layer coordination in mgmt. systems
− mechanisms for unified discovery, coordination policies, and stability
− Xen-based prototypes; experimentation on real testbeds

Future Work

−applying coordination mechanisms to more use cases

−extending the architecture for large scale (targeting millions of managed objects)

References
