Falloc: Fair Network Bandwidth Allocation in IaaS Datacenters via a Bargaining Game Approach

(1)


2 Huazhong University of Science and Technology, Wuhan, China; 3 The Chinese University of Hong Kong

October 8, 2013 @ IEEE ICNP, Göttingen, Germany

Fangming Liu 1,2

In collaboration with Jian Guo 1,2, Haowen Tang 1,2, Yingnan Lian 1,2, Hai Jin 2, and John C.S. Lui 3

(2)

Outline

Motivation

• Fairness is important in datacenter networks

Problem

• How to achieve flexible fairness on bandwidth sharing

Idea

• Cooperation among VMs for flexible bandwidth allocation via a Bargaining Game

Solution

• Distributed cooperative algorithm

Evaluations

(3)

Motivation

Why is fairness important in IaaS datacenter networks (intra-DCNs)?

(4)

IaaS Clouds Hosting Increasingly More Apps

Datacenters for IaaS cloud services

36% growth

(5)


IaaS DCN: Challenges & Opportunities

Today’s IaaS cloud

• Shared & Multiplexed across many tenants

• Pay-per-usage charging model via different types of virtual machines (VMs)

• Only true for: CPU, memory, storage

However

• Intra-DC network resources shared in best effort manner based on traditional protocols, e.g., TCP

• Bandwidth is not fairly shared based on payment

• Unpredictable/varying performance, e.g., job finish times

➢ Lack of performance isolation/performance guarantee for VMs

➢ No charge on quantified intra-DCN bandwidth

• Recall that providers do charge you for CPU, memory, storage…

• Virtualization has matured, except for networking

(6)

Issue I: Intra-DC Network is not fairly shared

Global view

• Different tenants sharing the same underlying intra-DCN

Tenant A

Tenant B

A is more aggressive (UDP, more TCP links)

B is more important (commercial transaction)

A will get more bandwidth

Total throughput

(7)


Issue I: VM-level Fairness in Intra-DCN

In detail

• VMs are sharing congested links

• Relying on TCP’s congestion control → flow-level fairness

• Applications are running in VMs

• The network allocation depends on: 1) VMs running on the same machine, 2) cross-traffic on each link used by the VM

Fairness among users

VMs

Congested links

The congested link is shared based on the number of TCP-flows

Transport layer fairness

VM-level fairness

(8)

Issue II: Bandwidth Guarantee

An existing approach

Server Switch

VMs

Allocate VMs in the topology

Reserve bandwidth for virtual clusters

(9)


Issue III: Utilization

An example of cloud service in DC:

• The networking demands of cloud applications are time-varying

• Low network utilization if statically reserved

[Figure: VM0 (demand of 10 Mbps) and VM1 (demand of 1 Gbps) behind a virtual switch and router; labels: 500 Mbps, 500 Mbps, 1 Gbps; low utilization]
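To make the utilization gap concrete, the following sketch works through the numbers in the example. It assumes that the two 500 Mbps labels are static reservations for VM0 and VM1 on a 1 Gbps link; that reading, and all variable names, are illustrative rather than taken from the talk.

```python
# Assumption: each VM holds a static 500 Mbps reservation on a 1 Gbps link.
LINK_MBPS = 1000
reserved = {"VM0": 500, "VM1": 500}
demand = {"VM0": 10, "VM1": 1000}

# Static reservation: a VM can never use more than its own reservation.
static_used = sum(min(demand[vm], reserved[vm]) for vm in demand)

# Work-conserving sharing: bandwidth a VM leaves unused can go to another VM.
shared_used = min(LINK_MBPS, sum(demand.values()))

static_utilization = static_used / LINK_MBPS
shared_utilization = shared_used / LINK_MBPS
print(f"static: {static_utilization:.0%}, work-conserving: {shared_utilization:.0%}")
```

Under these assumptions VM0 leaves 490 Mbps of its reservation idle while VM1 is capped at 500 Mbps, so roughly half of the link goes unused.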

(10)

A Large Design Space for 3-way Tradeoffs

[Figure: tradeoffs among utilization, fair bandwidth share, and minimum guarantee (predictable performance), seen from the provider and the user]

(11)

Problem

How to achieve flexible fairness on bandwidth sharing for balancing such tradeoffs

(12)

Fairness requirements

What do cloud users want?

• Paying for a fixed bandwidth

• A priority that determines the ratio of shared bandwidth

What do cloud providers want?

• High utilization

• Meeting SLA

(13)


Requirement 1

Guarantee base bandwidth

[Figure labels: t1, t2; Base bandwidth: B]

A base bandwidth

• User: pay for a base bandwidth

How to guarantee

• D<B: allocate enough bandwidth to satisfy the demand

• D>B: limit the upper bound to maintain fairness among VMs

(14)

Requirement 2

Weight

• Important (expensive) jobs have larger weight

How to

• Assign a weight for each VM

• Share the bandwidth beyond the base bandwidth in proportion to the weights

[Figure labels: D - B; Base bandwidth: B]
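The two requirements can be illustrated on a single shared link. The sketch below is a simplified, hypothetical illustration and not the Falloc algorithm itself: it assumes one link, that the base bandwidths together fit within the link capacity, and made-up demands and weights. Each VM first receives min(D, B); the remaining capacity is then split by weight among the VMs whose demand exceeds their base.

```python
def share_link(capacity, vms):
    """Split one link among VMs given as (demand D, base B, weight K) tuples.

    Assumes the base bandwidths together fit within the link capacity.
    """
    # Requirement 1: each VM is first given min(D, B).
    alloc = [min(d, b) for d, b, _ in vms]
    spare = capacity - sum(alloc)
    # Only VMs with D > B compete for bandwidth beyond the base.
    active = [i for i, (d, b, _) in enumerate(vms) if d > b]

    # Requirement 2: split the spare bandwidth in proportion to the weights,
    # re-sharing whatever a VM cannot use because its demand is already met.
    while active and spare > 1e-9:
        total_k = sum(vms[i][2] for i in active)
        share = {i: spare * vms[i][2] / total_k for i in active}
        satisfied = [i for i in active if vms[i][0] - alloc[i] <= share[i]]
        if not satisfied:
            for i in active:
                alloc[i] += share[i]
            break
        for i in satisfied:
            spare -= vms[i][0] - alloc[i]
            alloc[i] = vms[i][0]
        active = [i for i in active if i not in satisfied]
    return alloc


# (demand, base, weight) in Mbps; the numbers are made up for illustration.
vms = [(100, 250, 1), (900, 250, 1), (900, 250, 2)]
print(share_link(1000, vms))
```

In this example the first VM needs less than its base and simply gets its demand, while the 400 Mbps left after the guarantees is divided 1:2 between the other two VMs.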

(15)


Problem

How to achieve these two goals while maintaining high utilization?

(16)

Idea & Solution

Cooperation among VMs: guarantee base bandwidth and network proportionality for VMs via a Bargaining Game approach

(17)


Ideas

Traditional way

• The bandwidth allocation depends on users’ applications

• Selfish: flow-level fairness / unpredictable performance

Why not

• Cloud providers manage the bandwidth allocation

• Cooperation among VMs

• Social welfare: fairness for tenants / performance in SLA / high utilization

How to cooperate in bandwidth allocation for Requirements 1 and 2? Let’s state the problem precisely.

(18)

Model formulation

Resources abstraction

• Non-blocking core (full bisection bandwidth)

• VMs located in servers

[Figure: servers hosting VMs, attached to a non-blocking switch with bandwidths BW1, BW2, ..., BWm]

(19)


Model formulation

We know

• VM placement matrix: $W = [w_{i,j}]_{M \times N}$

• VM demand matrix: $D = [d_{i,j}]_{N \times N}$

• Server bandwidth: $C_m$

• Weight and base bandwidth of VMs: $VM_i \langle B_i, K_i \rangle$

We solve

• The bandwidth allocation from VM to VM: $[r_{i,j}]_{N \times N}$

We apply

(20)

Problem Characterization

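Asymmetric Nash Bargaining Solution

$\max \prod_{i,j} (r_{i,j} - L_{i,j})^{K_{i,j}}$

Maximize the product of utility gains

$L_{i,j} \le r_{i,j} \le U_{i,j}, \quad \forall i, j \in \mathcal{N}$

$\sum_{v_i \in m} r_i^{I} \le C_m, \quad \forall m \in \mathcal{M}$

$\sum_{v_i \in m} r_i^{E} \le C_m, \quad \forall m \in \mathcal{M}$

Constraints for bounds and server capacity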
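The server capacity constraints can be checked mechanically for a candidate allocation. The sketch below is a hypothetical helper, not code from the talk. It assumes that the superscripts I and E denote a VM's total incoming and outgoing rate, that the placement matrix is given in compact form as a list `host` (the server of each VM), and that traffic between two VMs on the same server does not use the server's link.

```python
def server_loads(host, rates):
    """Return per-server (egress, ingress) totals for a VM-to-VM rate matrix.

    host[i] is the server of VM i (a compact form of the placement matrix W);
    rates[i][j] is the rate from VM i to VM j. Traffic between VMs on the same
    server is assumed not to consume server link capacity.
    """
    servers = sorted(set(host))
    egress = {m: 0.0 for m in servers}
    ingress = {m: 0.0 for m in servers}
    for i, row in enumerate(rates):
        for j, r in enumerate(row):
            if host[i] != host[j]:
                egress[host[i]] += r
                ingress[host[j]] += r
    return egress, ingress


def is_feasible(host, rates, capacity):
    """Check the egress and ingress capacity constraint of every server."""
    egress, ingress = server_loads(host, rates)
    return all(egress.get(m, 0.0) <= c and ingress.get(m, 0.0) <= c
               for m, c in capacity.items())


# Three VMs: VM0 and VM1 on server 0, VM2 on server 1 (illustrative numbers).
host = [0, 0, 1]
rates = [[0, 400, 300],
         [0, 0, 500],
         [200, 0, 0]]
print(server_loads(host, rates))
print(is_feasible(host, rates, {0: 1000, 1: 1000}))
```

Here the 400 Mbps between VM0 and VM1 stays inside server 0, so only the 800 Mbps sent to VM2 counts against server 0's outgoing capacity.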

Why the Nash bargaining solution from game theory?

(21)


DCN: An Ideal Network Environment to be viewed as a Harmonious Society

[Figure: servers and a switch, poor VMs and rich VMs, an end server, and the wealth flow in a harmonious society]

• Poor VMs: base bandwidth > bandwidth demand ($B_i > D_i$)

• Rich competitors: base bandwidth ≤ bandwidth demand ($B_i \le D_i$)

• Fairness: 1) minimum guarantee for the poor; 2) maintain proportionality among the rich

(22)

Solution

• Lagrangian relaxation (dual problem), solved by the subgradient method

• Solution to the dual problem:

$\lambda_m = \max(0, \lambda_m - \xi (C_m - r_p))$

$\lambda_m$ can be solved by iteration on each server

• Solution to the primal problem (bandwidth allocation):

$r_{i,j} = L_{i,j} + \frac{K_{i,j}}{\lambda_m + \lambda_l}$

$r_{i,j}$ of a link can be solved with $\lambda$ on the two end servers

Distributed
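The closed form for the rate follows from the stationarity condition of the Lagrangian. The derivation below is a sketch under stated assumptions: VM i sits on server m and VM j on server l, $r_i^{E} = \sum_j r_{i,j}$ and $r_j^{I} = \sum_i r_{i,j}$, $\lambda_m$ and $\lambda_l$ are the multipliers of the sending and receiving server's capacity constraints, and the bounds $L_{i,j} \le r_{i,j} \le U_{i,j}$ are handled separately by clipping.

```latex
\begin{aligned}
&\max_{r}\ \sum_{i,j} K_{i,j}\,\ln\!\left(r_{i,j}-L_{i,j}\right)
  && \text{(logarithm of the product objective)}\\[4pt]
&\mathcal{L}(r,\lambda) = \sum_{i,j} K_{i,j}\ln\!\left(r_{i,j}-L_{i,j}\right)
  + \sum_{m}\lambda_m\Bigl(C_m-\sum_{v_i\in m} r_i^{E}\Bigr)
  + \sum_{l}\lambda_l\Bigl(C_l-\sum_{v_j\in l} r_j^{I}\Bigr)\\[4pt]
&\frac{\partial\mathcal{L}}{\partial r_{i,j}}
  = \frac{K_{i,j}}{r_{i,j}-L_{i,j}}-\lambda_m-\lambda_l = 0
  \;\Longrightarrow\;
  r_{i,j} = L_{i,j}+\frac{K_{i,j}}{\lambda_m+\lambda_l}\\[4pt]
&\frac{\partial\mathcal{L}}{\partial \lambda_m} = C_m-\sum_{v_i\in m} r_i^{E}
  \;\Longrightarrow\;
  \lambda_m \leftarrow \max\Bigl(0,\ \lambda_m-\xi\Bigl(C_m-\sum_{v_i\in m} r_i^{E}\Bigr)\Bigr)
\end{aligned}
```

Taking the logarithm does not change the maximizer, and it turns the product into a sum whose terms each involve only one $r_{i,j}$. That is why each rate can be computed from the multipliers of just its two end servers.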

(23)


Solution

Distributed cooperative algorithm

• Distributed: dual variables $\lambda_m$, $\lambda_l$

$\lambda_m = \max(0, \lambda_m - \xi (C_m - r_p))$

$\lambda_l = \max(0, \lambda_l - \xi (C_l - r_p))$

• Cooperative: bandwidth allocation $r_{i,j}$

$r_{i,j} = L_{i,j} + \frac{K_{i,j}}{\lambda_m + \lambda_l}$

(24)

Algorithm: Falloc (Fair allocation)

How does the algorithm work

• Remaining bandwidth ($C_m - r_p > 0$) → $\lambda_m$ decreases → $r_{i,j}$ increases → remaining bandwidth is allocated

• Exceeded capacity ($C_m - r_p < 0$) → $\lambda_m$ increases → $r_{i,j}$ decreases → excess bandwidth is withdrawn

• Fully utilized ($C_m - r_p = 0$) → $\lambda_m$ stable → $r_{i,j}$ stable

$\lambda_m = \max(0, \lambda_m - \xi (C_m - r_p))$

$r_{i,j} = L_{i,j} + \frac{K_{i,j}}{\lambda_m + \lambda_l}$
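The feedback loop described on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes that $r_p$ is the aggregate rate crossing a server's link, keeps one multiplier per server for each direction, clips every rate to $[L, U]$, and uses made-up flows, step size, and iteration count. The function name and the dictionary fields are hypothetical.

```python
from collections import defaultdict


def falloc_sketch(flows, capacity, step=1e-6, iters=5000):
    """Iterate the price (lambda) and rate updates for a set of VM-to-VM flows.

    flows: list of dicts with src/dst server ids, bounds L and U, weight K.
    capacity: dict mapping server id -> link capacity C_m (each direction).
    """
    lam_out = {m: 1.0 for m in capacity}  # price of each server's egress
    lam_in = {m: 1.0 for m in capacity}   # price of each server's ingress
    rates = [f["L"] for f in flows]
    for _ in range(iters):
        # Rate update: r = L + K / (lambda_m + lambda_l), clipped to [L, U].
        for n, f in enumerate(flows):
            price = lam_out[f["src"]] + lam_in[f["dst"]]
            r = f["U"] if price <= 0.0 else f["L"] + f["K"] / price
            rates[n] = min(max(r, f["L"]), f["U"])
        # Price update: lambda = max(0, lambda - step * (C - load)).
        out_load, in_load = defaultdict(float), defaultdict(float)
        for f, r in zip(flows, rates):
            out_load[f["src"]] += r
            in_load[f["dst"]] += r
        for m, c in capacity.items():
            lam_out[m] = max(0.0, lam_out[m] - step * (c - out_load[m]))
            lam_in[m] = max(0.0, lam_in[m] - step * (c - in_load[m]))
    return rates


# Two flows leave server "s0" (1000 Mbps); weights 1 and 3, lower bound 100 Mbps.
capacity = {"s0": 1000.0, "s1": 1000.0, "s2": 1000.0}
flows = [
    {"src": "s0", "dst": "s1", "L": 100.0, "U": 1000.0, "K": 1.0},
    {"src": "s0", "dst": "s2", "L": 100.0, "U": 1000.0, "K": 3.0},
]
print([round(r, 1) for r in falloc_sketch(flows, capacity)])
```

With these inputs only the outgoing link of `s0` is a bottleneck, so the rates should settle near 300 and 700 Mbps: each flow keeps its 100 Mbps lower bound and the remaining 800 Mbps is split 1:3 by weight. The prices of the uncongested links fall to zero, matching the "remaining bandwidth" case above.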

(25)

Evaluations

(26)

Implementation via SDN

[Figure: VMs, I/O scheduler, VMM, switch]

Priority-based packets on Layer 2

Modify VMM network I/O scheduler

Implemented with OpenFlow

• Run our proposed bandwidth allocation algorithm in a centralized controller

• Enforce the allocation result by forwarding packets through specified queues in the switches

Mininet Evaluation

• An SDN platform running real network protocols and workloads

• The developed code can be moved to a real OpenFlow network without any change

(27)


Fairness

• Guarantee bandwidth for H1 and H3

• Share the bandwidth beyond the base bandwidth proportionally for H2 and H4

• Balance the tradeoff

Base bandwidth: 250 Mbps

(28)

Utilization

(29)


Algorithm efficiency

• Convergence speed under Falloc

• Small step size: slow convergence
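A small experiment can illustrate the step-size effect. The sketch below is a toy under stated assumptions, not the evaluation setup from the talk: a single 1000 Mbps link shared by two flows (lower bounds of 100 Mbps, weights 1 and 3), an initial price of 1.0, no upper bounds, and convergence declared once the unused or excess capacity drops below 1 Mbps. It counts how many price updates each step size needs.

```python
def iterations_to_converge(step, capacity=1000.0, lowers=(100.0, 100.0),
                           weights=(1.0, 3.0), tol=1.0, max_iters=200_000):
    """Count price updates until the link is within `tol` Mbps of full use."""
    lam = 1.0  # arbitrary starting price (an assumption)
    for k in range(1, max_iters + 1):
        # Rate update on a single bottleneck: r = L + K / lambda.
        rates = [low + w / lam for low, w in zip(lowers, weights)]
        slack = capacity - sum(rates)
        if abs(slack) < tol:
            return k
        # Price update; a tiny positive floor replaces max(0, ...) so that
        # the division above stays defined.
        lam = max(1e-12, lam - step * slack)
    return None  # did not converge within max_iters


for step in (1e-6, 1e-7):
    print(step, iterations_to_converge(step))
```

In this toy setting the smaller step size needs many more updates to reach the same tolerance, which is the behaviour the slide summarizes as slow convergence.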

(30)

Summary

Falloc

• An application-layer bandwidth allocation protocol that uses cooperation among VMs, via a bargaining game, in multiplexed IaaS datacenters

• Provides flexible fairness for VMs by balancing the tradeoff between bandwidth guarantee and proportional bandwidth share, while maintaining high network utilization

• Towards mutual benefits for both cloud providers and tenants

• Performance guarantee, fairness and high-utilization under multiplexed

(31)


Q&A

Your suggestions are appreciated!

Thank you!

Prof. Fangming Liu

More details:

http://grid.hust.edu.cn/fmliu/

Email:

[email protected]
