Chatty Tenants and the Cloud Network Sharing
Problem
Hitesh Ballani†, Keon Jang†, Thomas Karagiannis† Changhoon Kim‡, Dinan Gunawardena†, Greg O’Shea†
This talk is about . . .
How to share the network in
multi-tenant datacenters?
Multi-tenant datacenters
Public cloud datacenters
Windows Azure, Amazon EC2, Rackspace, … Tenants: users renting virtual machines
A use-case of cloud datacenters
Cloud Provider Physical Host Physical Host Storage Services Map reduce Intra-tenant traffic Network is shared! Inter-tenant trafficRequirements for network sharing
Tenants want predictable performance / cost
Not all flows are equal: some tenants pay more
Req 1.
Minimum bandwidth guarantee
Req 2.
Proportionality
Utilize spare resources as much as possible
Req 3.
High utilization
Existing solutions for
network sharing
Today Seawall FairCloud (PS-L) Min-guarantee Oktopus FairCloud (PS-P) High utilization Proportionality With Inter-tenant traffic Harder Problematic
Chatty tenants in the cloud
File Storage SQL DB FrontendWeb
Cache OptimizerWan Firewall
Provider Services
A tenant’s own
Prevalence of inter-tenant traffic
Inter-tenant traffic accounts for
10-35
%
of traffic!Measurement from 8 datacenters of a public cloud service provider
0 10 20 30 40 1 2 3 4 5 6 7 8 In te r-te nan t traf fic (%) Datacenter
Min bandwidth guarantee is harder
Inter-tenant traffic leads to richer communication pattern
and makes minimum bandwidth guarantee harder! …… Intra-tenant traffic
Inter-tenant traffic
On what links bandwidth guarantee is required?
Physical Host
How to define proportionality?
P and Q are paying same amount
p r1 q r2 r4 1000Mbps r3 Allocation P (Mbps) Q (Mbps) Per flow 250
750
Seawall 250750
FairCloud 333666
Q: Whose payment should
dictate the flow bandwidth?
Hadrian Overview
What semantics should we provide to tenants?
Virtual network abstraction
How to allocate bandwidth?
Hose-compliant bandwidth allocation
How to place virtual machines?
Greedy heuristic that guarantees min bandwidth
Hadrian Overview
What semantics should we provide to tenants?
Virtual network abstraction
How to allocate bandwidth?
Hose-compliant bandwidth allocation
How to place virtual machines?
State of the art: Hose-model
VM 1 VM 2 VM p … Virtual Switch Tenant P VMsTenant Request: <N,
B
>
BPEach VM is guaranteed to send/receive at minimum of B bps
Minimum bandwidth guarantee
High-utilization
BP BP
State of the art: Hose-model
VM 1 VM 2 VM p … VM 1 VM 2 … VM q VM 1 VM 2 … VM r BQ BRVirtual Switch Virtual Switch
Tenant P VMs Tenant Q VMs Tenant R VMs
BP
Virtual Switch
Tenant Request: <N,
B
>
Each VM is guaranteed to send/receive at minimum of B bps
BP
Multi-hose model
VM 1 VM 2 VM p … VM 1 VM 2 … VM q VM 1 VM 2 … VM r BQ BRSwitch Switch Switch
Tenant P VMs Tenant Q VMs Tenant R VMs
BP
Multi Hose Switch
Tenant Request: <N, B>
VMs in different tenants communicates with each other at a rate of min(Bp, Bq)
BP
Hierarchical hose model
Virtual Inter-tenant Switch
VM 1 VM 2 VM p … VM 1 VM 2 … VM q VM 1 VM 2 … VM r BQ BR
Virtual Switch Virtual Switch Virtual Switch
Tenant P VMs Tenant Q VMs Tenant R VMs
BP
Tenant Request: <N, B,
B
inter>
BQinterBPinter B
Rinter
BP
Communication dependency
Tenant Request: <N, B, B
inter,
list of dependent tenants
>
Q:How about service tenants (e.g., storage )?
Reduces possible communication patterns Helps place dependent tenants closer
Tenant Request: <N, B, B
inter,
*
>
Hadrian Overview
What semantics should we provide to tenants?
Virtual network abstraction
How to allocate bandwidth?
Hose-compliant bandwidth allocation
How to place virtual machines?
Hose-compliant bandwidth allocation
5
flows
5
flows
VM p
100Mbps
VM q
200Mbps
Whose payment should dictate the bandwidth of the flow?
20
40 40
20
Our approach : take minimum from two sides
20 20 20 20 40 40 40
Min
Hose-compliant bandwidth allocation
𝑊
𝑝𝑞=
𝑚𝑖𝑛
(
𝐵
𝑝,
𝐵
𝑞)
Weighted fair-share at the bottleneck
N
pflows
N
qflows
VM p Bp VM q Bq𝑊
𝑝𝑞?
Upper bound proportionality
5
flows
VM p
100Mbps
20 20 20 20 20Max aggregate
weight
for p’s flow
= 100
Upper bound of total weight of VM’s flows
is proportional to the VM’s payment
Minimum bandwidth guarantee
Total weight for all flows of a given VM is bounded
Placement algorithm enforces 𝑻𝒐𝒕𝒂𝒍 𝑾𝒆𝒊𝒈𝒉𝒕 ≤ 𝑪𝒂𝒑𝒂𝒄𝒊𝒕𝒚 on any link in the network
Min bandwidth guarantee
p r1 q r2 rk
…
…
N flows 500Mbps 500Mbps 1000Mbps Total weight ≤ 1000Evaluation
Synthetic cloud workload benchmark
Tenants submit requests for VMs and execute jobs A job has
CPU Processing, Inter-tenant traffic, Intra-tenant traffic
Inter-tenant traffic ratio: 10 - 40%
Fraction of tenant w/ inter-tenant : 20%
Environments
Testbed: 16 end hosts
Evaluation criteria
Network sharing requirements
Minimum bandwidth guarantee Upper-bound proportionality High-utilization
Benefits of Hadrian
Metric: acceptance ratio
Comparison with
Baseline: per flow sharing
Job completion time
Utilize spare resources Always finishes in time
3.4x
at 95thpercentileRelative job completion time against estimate
Estimate = amount of data / bandwidth requirement
Minimum bandwidth guarantee
Bandwidth allocation
0 100 200 300 400 500 1 3 5 7 9 11 13 15 Thr ou ghput fo r VM p (Mbp s)Number of flows for VM q Per flow FairCloud(PS-L) Hose Compliant p r1 q r2 rk
…
1000Mbps N flows
35 72 37 72 0 20 40 60 80 100 Baseline Hadrian Simulation Testbed
Request acceptance ratio – testbed
37%
more jobs Accept ance R at io (%)Similar benefits in large scale simulations
Summary
We show that Inter-tenant traffic is prevalent
10~35% from a major public cloud provider
We propose
Hadrian
Virtual network abstraction: inter-tenant, dependency
Bandwidth allocation strategy: upper-bound proportionality Placement algorithm: greedy dependency aware packing
Our evaluation show that
Hadrian meets three network sharing requirements Hadrian delivers predictability and higher efficiency