Falloc: Fair Network Bandwidth Allocation in IaaS Datacenters via a Bargaining Game Approach
Fangming Liu 1,2
In collaboration with Jian Guo 1,2, Haowen Tang 1,2, Yingnan Lian 1,2, Hai Jin 2, and John C.S. Lui 3
2 Huazhong University of Science and Technology, Wuhan, China
3 The Chinese University of Hong Kong
October 8, 2013 @ IEEE ICNP, Göttingen, Germany
Outline
Motivation
• Fairness is important in datacenter networks
Problem
• How to achieve flexible fairness on bandwidth sharing
Idea
• Cooperation among VMs for flexible bandwidth allocation via a Bargaining Game
Solution
• Distributed cooperative algorithm
Evaluations
Motivation
Why is fairness important in IaaS datacenter networks (intra-DCNs)?
IaaS Clouds Hosting Increasingly More Apps
Datacenters for IaaS cloud services
36% growth
IaaS DCN: Challenges & Opportunities
Today’s IaaS cloud
• Shared & Multiplexed across many tenants
• Pay-per-usage charging model via different types of virtual machines (VMs)
• Only true for: CPU, memory, storage
However
• Intra-DC network resources shared in best effort manner based on traditional protocols, e.g., TCP
• Bandwidth is not fairly shared based on payment
• Unpredictable/varying performance, e.g., job finish times
Lack of performance isolation/performance guarantee for VMs
NO charge on quantified intra-DCN bandwidth
• Recall that providers do charge you for CPU, memory, storage…
• Virtualization has matured, except for networking…
Issue I: Intra-DC Network is not fairly shared
Global view
• Different tenants sharing the same underlying intra-DCN
Tenant A
Tenant B
A is more aggressive (UDP, more TCP connections)
B is more important (commercial transaction)
A will get more bandwidth
Total throughput
Issue I: VM-level Fairness in Intra-DCN
In detail
• VMs are sharing congested links
• Relying on the flow-level fairness of TCP's congestion control
• Applications are running in VMs
• The network allocation depends on: 1) VMs running on the same machine, 2) cross-traffic on each link used by the VM
Fairness among users
VMs
Congested links
The congested link is shared based on the number of TCP-flows
Transport layer fairness
VM-level fairness
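A minimal sketch of why flow-level fairness favors the aggressive tenant. It idealizes TCP as giving every flow an equal share of the congested link, and the flow counts and the 1000 Mbps capacity are made-up illustration values, not measurements from the talk.

```python
def per_tenant_share(capacity_mbps, flows_per_tenant):
    """Idealized flow-level fairness: each TCP flow gets an equal slice,
    so a tenant's bandwidth is proportional to how many flows it opens."""
    total_flows = sum(flows_per_tenant.values())
    return {tenant: capacity_mbps * n / total_flows
            for tenant, n in flows_per_tenant.items()}


# Hypothetical example: tenant A opens 8 flows, tenant B only 2.
shares = per_tenant_share(1000, {"A": 8, "B": 2})
print(shares)  # A ends up with 4x the bandwidth of B, regardless of payment
```

Under this idealization, a tenant can raise its share simply by opening more connections, which is the VM-level unfairness the slide points at.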
Issue II: Bandwidth Guarantee
An existing approach
Server Switch
VMs
Place VMs in the topology
Reserve bandwidth for virtual clusters
Issue III: Utilization
An example of cloud service in DC:
• The networking demands of cloud applications are time-varying
• Low network utilization if statically reserved
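The numbers in the slide's example can be checked with a few lines. This is a sketch that assumes the 1 Gbps label is the shared link capacity and that each VM holds a static 500 Mbps reservation; the function name is hypothetical.

```python
def static_utilization(capacity, reservations, demands):
    """Link utilization when each VM may use at most its own reservation."""
    used = sum(min(d, r) for d, r in zip(demands, reservations))
    return used / capacity


# VM0 demands 10 Mbps, VM1 demands 1 Gbps, each reserved 500 Mbps (all in Mbps).
u = static_utilization(1000, reservations=[500, 500], demands=[10, 1000])
print(f"{u:.0%}")  # VM0 leaves 490 Mbps idle that VM1 cannot borrow
```

VM1 is capped at 500 Mbps while VM0 uses only 10 Mbps of its reservation, so about half of the link stays idle even though total demand exceeds capacity.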
[Figure: VM0 (demand of 10 Mbps) and VM1 (demand of 1 Gbps) behind a virtual switch and router; link labels 500 Mbps, 500 Mbps, 1 Gbps; low utilization]
A Large Design Space for 3-way Tradeoffs
[Figure: tradeoff among utilization, fair bandwidth share, and minimum guarantee (predictable performance); labels: Provider, User]
Problem
How to achieve flexible fairness on bandwidth sharing for balancing such tradeoffs
Fairness requirements
What do cloud users want?
• Paying for a fixed bandwidth
• A priority that sets the VM's ratio of the shared bandwidth
What do cloud providers want?
• High utilization
• Meeting SLA
Requirement 1
Guarantee base bandwidth
[Figure: timeline with marks t1 and t2 and a level labeled "Base bandwidth: B"]
A base bandwidth
• User: pay for a base bandwidth
How to guarantee
• D < B (demand below the base bandwidth): allocate enough bandwidth to satisfy the demand
• D > B (demand above the base bandwidth): limit the upper bound to maintain fairness among VMs
Requirement 2
Weight
• Important (expensive) jobs have larger weight
How to
• Share the bandwidth beyond the base bandwidth in proportion
Assign a weight for each VM
[Figure labels: D - B; Base bandwidth: B]
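The two requirements can be illustrated on a single shared link. The sketch below is not the Falloc algorithm itself: it is a centralized toy that assumes the sum of guaranteed shares fits in the link, and the VM names, weights, and demands are hypothetical.

```python
def share_link(capacity, vms):
    """vms maps name -> (base B, weight K, demand D), all in Mbps."""
    # Requirement 1: every VM first receives min(D, B).
    alloc = {v: min(d, b) for v, (b, k, d) in vms.items()}
    spare = capacity - sum(alloc.values())
    # Requirement 2: VMs that still want more split the spare by weight.
    active = {v for v, (b, k, d) in vms.items() if d > alloc[v]}
    while spare > 1e-9 and active:
        total_k = sum(vms[v][1] for v in active)
        leftover = 0.0
        for v in list(active):
            offer = spare * vms[v][1] / total_k
            want = vms[v][2] - alloc[v]
            alloc[v] += min(offer, want)
            if want <= offer:            # demand met: return the unused part
                leftover += offer - want
                active.discard(v)
        spare = leftover
    return alloc


vms = {"vm1": (250, 1, 100), "vm2": (250, 1, 600),
       "vm3": (250, 1, 200), "vm4": (250, 2, 900)}
print(share_link(1000, vms))
```

In this example vm1 and vm3 get exactly their demand, while vm2 and vm4 keep their 250 Mbps base and split the remaining 200 Mbps in the ratio 1:2 of their weights.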
Problem
How to achieve these two goals while maintaining high utilization?
Idea & Solution
Cooperation among VMs: guarantee base bandwidth and network proportionality for VMs via a Bargaining Game approach
Ideas
Traditional way
• The bandwidth allocation depends on users' applications
• Selfish: flow-level fairness, unpredictable performance
Why not
• Cloud providers manage the bandwidth allocation
• Cooperation among VMs
• Social welfare: fairness for tenants, performance within the SLA, high utilization
How to cooperate in bandwidth allocation for Requirements 1 and 2? Let's first state the problem precisely.
Model formulation
Resource abstraction
• Non-blocking core (full bisection bandwidth)
• VMs located in servers
[Figure: servers hosting VMs, attached to a non-blocking switch with access bandwidths BW1, BW2, …, BWm]
Model formulation
We know
• VM placement matrix: $W = [w_{i,j}]_{M \times N}$
• VM demand matrix: $D = [d_{i,j}]_{N \times N}$
• Server bandwidth: $C_m$
• Weight and base bandwidth of VMs: $VM_i \langle B_i, K_i \rangle$
We solve
• The bandwidth allocation from VM to VM: $[r_{i,j}]_{N \times N}$
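To make the inputs concrete, here is a small sketch that turns a placement matrix and a demand matrix into the traffic each server would have to carry. It assumes W[m][i] = 1 when VM i sits on server m and D[i][j] is the demand from VM i to VM j, and it ignores traffic between VMs on the same server; these conventions and the sample values are illustrative assumptions.

```python
def server_loads(W, D):
    """Return (egress, ingress) demand per server from placement W and demand D."""
    M, N = len(W), len(W[0])
    # host[i] is the server index that holds VM i.
    host = [next(m for m in range(M) if W[m][i]) for i in range(N)]
    egress, ingress = [0.0] * M, [0.0] * M
    for i in range(N):
        for j in range(N):
            if host[i] != host[j]:       # only cross-server traffic uses the NIC
                egress[host[i]] += D[i][j]
                ingress[host[j]] += D[i][j]
    return egress, ingress


W = [[1, 1, 0],      # server 0 hosts VM0 and VM1
     [0, 0, 1]]      # server 1 hosts VM2
D = [[0, 50, 100],
     [0, 0, 200],
     [150, 0, 0]]    # Mbps
print(server_loads(W, D))
```

Comparing these totals with the server bandwidth $C_m$ shows which servers are contended and therefore where the allocation has to decide who gets what.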
We apply
Problem Characterization
Asymmetric Nash Bargaining Solution
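Maximize the product of utility gain:
$$\max \prod_{i,j \in \mathcal{N}} \left(r_{i,j} - L_{i,j}\right)^{K_{i,j}}$$
Constraints for bound and server capacity:
$$L_{i,j} \le r_{i,j} \le U_{i,j}, \quad \forall i,j \in \mathcal{N}$$
$$\sum_{v_i \in m} r_i^{I} \le C_m, \quad \forall m \in \mathcal{M}$$
$$\sum_{v_i \in m} r_i^{E} \le C_m, \quad \forall m \in \mathcal{M}$$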
Why a Nash bargaining solution from game theory?
DCN: An Ideal Network Environment
to be viewed as a Harmonious Society
[Figure: "harmonious society" analogy with servers, switch, end servers, poor VMs, rich VMs, and wealth flow]
• Poor VMs: base bandwidth > bandwidth demand ($B_i > D_i$)
• Rich VMs: base bandwidth ≤ bandwidth demand ($B_i \le D_i$)
• Fairness: 1) minimum guarantee for the poor; 2) maintain proportionality among the rich
Solution
• Lagrangian relaxation (dual problem), solved by the subgradient method
• Solution to the dual problem:
$$\lambda_m = \max\left(0,\ \lambda_m - \xi\,(C_m - r_p)\right)$$
• Solution to the primal problem (bandwidth allocation):
$$r_{i,j} = L_{i,j} + \frac{K_{i,j}}{\lambda_m + \lambda_l}$$
$\lambda_m$ can be solved by iteration on each server
$r_{i,j}$ of a link can be solved with $\lambda$ on the two end servers
Distributed
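A sketch of where the closed-form rate comes from, added as a reasoning aid rather than taken from the slides. It assumes $r_i^E = \sum_j r_{i,j}$ and $r_j^I = \sum_i r_{i,j}$, that VM $i$ sits on server $m$ and VM $j$ on server $l$, and it writes the egress and ingress multipliers with superscripts; the slides appear to denote the two multipliers at the end servers simply as $\lambda_m$ and $\lambda_l$. The bound constraints $L_{i,j} \le r_{i,j} \le U_{i,j}$ are left out of the relaxation.

```latex
% Sketch under the assumptions stated above; not the authors' derivation.
\begin{align*}
\max \prod_{i,j} (r_{i,j}-L_{i,j})^{K_{i,j}}
  &\;\Longleftrightarrow\;
  \max \sum_{i,j} K_{i,j}\,\ln (r_{i,j}-L_{i,j}) \\
\mathcal{L}(r,\lambda)
  &= \sum_{i,j} K_{i,j}\ln(r_{i,j}-L_{i,j})
   + \sum_{m} \lambda_m^{E}\Bigl(C_m - \sum_{v_i \in m} r_i^{E}\Bigr)
   + \sum_{l} \lambda_l^{I}\Bigl(C_l - \sum_{v_j \in l} r_j^{I}\Bigr) \\
\frac{\partial \mathcal{L}}{\partial r_{i,j}}
  &= \frac{K_{i,j}}{r_{i,j}-L_{i,j}} - \lambda_m^{E} - \lambda_l^{I} = 0
  \;\Longrightarrow\;
  r_{i,j} = L_{i,j} + \frac{K_{i,j}}{\lambda_m^{E}+\lambda_l^{I}}
\end{align*}
```

Because the logarithm is monotone, maximizing the weighted product and maximizing the weighted sum of logs give the same allocation, and the stationarity condition only involves the multipliers of the two servers at the ends of the flow.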
Solution
Distributed cooperative algorithm
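• Distributed: dual variables $\lambda_m$, $\lambda_l$
$$\lambda_m = \max\left(0,\ \lambda_m - \xi\,(C_m - r_p)\right)$$
$$\lambda_l = \max\left(0,\ \lambda_l - \xi\,(C_l - r_p)\right)$$
• Cooperative: bandwidth allocation $r_{i,j}$
$$r_{i,j} = L_{i,j} + \frac{K_{i,j}}{\lambda_m + \lambda_l}$$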
Algorithm: Falloc (Fair allocation)
How does the algorithm work
• Remaining bandwidth ($C_m - r_p > 0$) → $\lambda_m$ decreases → $r_{i,j}$ increases → the remaining bandwidth is allocated
• Capacity exceeded ($C_m - r_p < 0$) → $\lambda_m$ increases → $r_{i,j}$ decreases → the excess bandwidth is withdrawn
• Fully utilized ($C_m - r_p = 0$) → $\lambda_m$ stable → $r_{i,j}$ stable
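The feedback loop above can be simulated in a few lines. This is a toy sketch, not the authors' implementation: it assumes one multiplier per server, counts each flow against both of its end servers, clips rates at the upper bound U, and uses made-up flows and a hand-tuned step size.

```python
def falloc_toy(flows, capacity, step=4e-6, iters=500):
    """flows: list of (src_server, dst_server, L, U, K); capacity: server -> Mbps."""
    lam = {s: 0.0 for s in capacity}          # one dual variable per server
    rates = [0.0] * len(flows)
    for _ in range(iters):
        # Primal step: r = L + K / (lambda_src + lambda_dst), clipped at U.
        for idx, (src, dst, low, up, k) in enumerate(flows):
            price = lam[src] + lam[dst]
            rates[idx] = up if price <= 0 else min(up, low + k / price)
        # Dual step: lambda <- max(0, lambda - step * (C - load)).
        load = {s: 0.0 for s in capacity}
        for (src, dst, _lo, _up, _k), r in zip(flows, rates):
            load[src] += r
            load[dst] += r
        for s in capacity:
            lam[s] = max(0.0, lam[s] - step * (capacity[s] - load[s]))
    return rates, lam


# Hypothetical setup: three flows from server A to server B, 1000 Mbps each side.
capacity = {"A": 1000.0, "B": 1000.0}
flows = [("A", "B", 100.0, 1000.0, 1.0),
         ("A", "B", 200.0, 1000.0, 1.0),
         ("A", "B", 200.0, 1000.0, 2.0)]
rates, lam = falloc_toy(flows, capacity)
print([round(r, 1) for r in rates])   # approximately [225.0, 325.0, 450.0]
```

In this setup the lower bounds add up to 500 Mbps, and the remaining 500 Mbps is split 1:1:2 according to the weights K, which matches the proportional sharing the bargaining solution is meant to produce. The step size is tuned for these numbers; other capacities or weights would need a different value.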
$$\lambda_m = \max\left(0,\ \lambda_m - \xi\,(C_m - r_p)\right)$$
$$r_{i,j} = L_{i,j} + \frac{K_{i,j}}{\lambda_m + \lambda_l}$$
Evaluations
Implementation via SDN
[Figure: VMs, VMM I/O scheduler, switch]
Priority-based packets on Layer 2
Modify VMM network I/O scheduler
Implemented with OpenFlow
• Run our proposed bandwidth allocation algorithm in a centralized controller
• Enforce the allocation result by forwarding packets through specified queues in the switches
Mininet Evaluation
• An SDN platform running real network protocols and workloads
• The developed code can be moved to a real OpenFlow network without any change
Fairness
• Guarantee bandwidth for H1 and H3
• Share the bandwidth beyond the base bandwidth proportionally for H2 and H4
• Balance the tradeoff
Base bandwidth: 250 Mbps
Utilization
Algorithm efficiency
• Convergence speed under Falloc
• Small step size: slow
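The effect of the step size can be seen on a one-link toy problem. The sketch below counts dual iterations until the link is within 1 Mbps of full; the capacity, lower bounds, weights, starting multiplier, and step values are all hypothetical, and the multiplier is floored at a tiny positive value instead of 0 only to avoid dividing by zero.

```python
def iterations_to_converge(step, capacity=1000.0, lows=(100.0, 200.0, 200.0),
                           weights=(1.0, 1.0, 2.0), lam=0.02, tol=1.0,
                           max_iters=100_000):
    """Count subgradient iterations until |C - load| < tol on a single link."""
    for n in range(max_iters):
        # Rates follow r_i = L_i + K_i / lambda for the current multiplier.
        load = sum(low + k / lam for low, k in zip(lows, weights))
        if abs(capacity - load) < tol:
            return n
        # Dual update; floor keeps lambda strictly positive (toy safeguard).
        lam = max(1e-12, lam - step * (capacity - load))
    return max_iters


for step in (1e-6, 4e-6, 1.6e-5):
    print(step, iterations_to_converge(step))
```

With these values the smallest step needs the most iterations and the largest the fewest. As with subgradient methods in general, pushing the step much higher would eventually make the multiplier overshoot instead of settling.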
Summary
• Falloc
• An application-layer bandwidth allocation protocol that uses cooperation among VMs, via a Bargaining Game, to allocate bandwidth in multiplexed IaaS datacenters
• Provides flexible fairness for VMs by balancing the tradeoff between bandwidth guarantee and proportional bandwidth share, while maintaining high network utilization
• Towards mutual benefits for both cloud providers and tenants
• Performance guarantee, fairness and high-utilization under multiplexed
Q&A
Your suggestions are appreciated!
Thank you!
Prof. Fangming Liu
More details:
http://grid.hust.edu.cn/fmliu/
Email: