Load Balancing for Distributed Stream
Processing Engines
Muhammad Anis Uddin Nasir
About me
•
Ex EMDC from Batch 2011 (the party batch)
•
Currently PhD Student at KTH Royal Institute of
Technology
–
Under supervision of Šarūnas Girdzijauskas
•
Internship
–
Ex-intern at Yahoo Labs Barcelona
Why PhD?
•
Freedom
–
Work timings
–
Set of problems
•
Flexibility
•
Conferences
•
Internships
Research Interest
•
Stream Processing
•
Social Network Analysis
•
Distributed Systems
Stream Processing Engines
•
Streaming Application
–
Online Machine Learning
–
Real Time Query Processing
–
Continuous Computation
•
Streaming Frameworks
Stream Processing Model
•
Streaming Applications are represented by
Directed Acyclic Graphs (DAGs)
[Figure: a DAG in which four sources feed a data stream to six workers; labels: Data Stream, Operators, Data Channels]
Stream Grouping
•
Key or Fields Grouping
–
Hash-based assignment
–
Stateful operations, e.g., PageRank, degree count
•
Shuffle Grouping
–
Round-robin assignment
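The two groupings above can be sketched as follows. This is a minimal illustration, not the routing code of any particular engine; the names `stable_hash`, `key_grouping`, and `ShuffleGrouping` are hypothetical, and MD5 is used only to get a hash that is stable across runs.

```python
import hashlib


def stable_hash(key: str, seed: int = 0) -> int:
    """Deterministic hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.md5(f"{seed}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def key_grouping(key: str, num_workers: int) -> int:
    """Hash-based assignment: every message with this key goes to one worker."""
    return stable_hash(key) % num_workers


class ShuffleGrouping:
    """Round-robin assignment: messages are spread evenly, ignoring the key."""

    def __init__(self, num_workers: int) -> None:
        self.num_workers = num_workers
        self._next = 0

    def route(self, key: str) -> int:
        worker = self._next
        self._next = (self._next + 1) % self.num_workers
        return worker


stream = ["a", "b", "a", "c", "a", "a"]
kg_routes = [key_grouping(k, 3) for k in stream]
sg = ShuffleGrouping(3)
sg_routes = [sg.route(k) for k in stream]
```

With key grouping, all four messages for key "a" land on the same worker, so a skewed key overloads it; with shuffle grouping, the same messages are spread over every worker, so per-key state must later be merged.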
Key Grouping
•
Key Grouping
•
Scalable
•
Low Memory
Shuffle Grouping
•
Shuffle Grouping
•
Load Balance
•
Memory O(W)
Problem Formulation
•
Input is an unbounded sequence of messages drawn from a key distribution
•
Each message is assigned to a worker for processing
•
Load balance properties
–
Memory Load Balance
–
Network Load Balance
–
Processing Load Balance
Partial Key Grouping (PKG)
•
Key Splitting
–
Split each key across two workers
–
Assign each instance using power of two choices
•
Local Load Estimation
–
Each source estimates the load on the workers
–
using the local routing history
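Key splitting and local load estimation can be combined in a few lines. The sketch below is an assumption-laden illustration: the class name `PartialKeyGrouping`, the two seeded hash functions, and the tie-breaking rule are hypothetical choices, and each source would hold its own instance.

```python
import hashlib


def stable_hash(key: str, seed: int) -> int:
    """Deterministic seeded hash, used to derive two candidate workers."""
    digest = hashlib.md5(f"{seed}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class PartialKeyGrouping:
    """One instance per source; no coordination with other sources or workers."""

    def __init__(self, num_workers: int) -> None:
        self.num_workers = num_workers
        # Local routing history: messages this source has sent to each worker.
        self.local_load = [0] * num_workers

    def candidates(self, key: str) -> tuple:
        """Key splitting: each key may go to one of two hashed workers."""
        n = self.num_workers
        return (stable_hash(key, 1) % n, stable_hash(key, 2) % n)

    def route(self, key: str) -> int:
        """Power of two choices: pick the less loaded candidate (local estimate)."""
        first, second = self.candidates(key)
        if self.local_load[first] <= self.local_load[second]:
            worker = first
        else:
            worker = second
        self.local_load[worker] += 1
        return worker


pkg = PartialKeyGrouping(num_workers=4)
routes = [pkg.route("hot") for _ in range(100)]
```

Because the two candidates are fixed by the hashes, a key's state lives on at most two workers, while the choice between them follows the load that the source has itself generated.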
Partial Key Grouping (PKG)
•
Key Splitting
•
Local Load Estimation
[Figure: two sources routing messages to three workers, annotated with load counts]
Partial Key Grouping (PKG)
•
Key Splitting
–
Distributed
–
Stateless
–
Handle Skew
•
Local load estimation
–
No coordination among sources
–
No communication with workers
Partial Key Grouping
•
PKG
•
Load Balance
•
Memory O(1)
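The memory figures quoted for the three groupings can be checked with a small experiment that counts how many workers end up holding partial state for a key. This is only a sketch: the stream, the worker count `W = 8`, and the helper names are made up for illustration.

```python
import hashlib


def stable_hash(key: str, seed: int = 0) -> int:
    """Deterministic seeded hash."""
    digest = hashlib.md5(f"{seed}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def workers_per_key(routes):
    """Map each key to the set of workers that received it (hold its state)."""
    seen = {}
    for key, worker in routes:
        seen.setdefault(key, set()).add(worker)
    return seen


W = 8
stream = ["hot"] * 64 + ["cold"] * 16

# Key grouping: one hashed worker per key.
kg = [(k, stable_hash(k) % W) for k in stream]

# Shuffle grouping: round-robin over all workers.
sg = [(k, i % W) for i, k in enumerate(stream)]

# Partial key grouping: less loaded of two hashed candidates.
load = [0] * W
pkg = []
for k in stream:
    a, b = stable_hash(k, 1) % W, stable_hash(k, 2) % W
    w = a if load[a] <= load[b] else b
    load[w] += 1
    pkg.append((k, w))

max_kg = max(len(s) for s in workers_per_key(kg).values())
max_sg = max(len(s) for s in workers_per_key(sg).values())
max_pkg = max(len(s) for s in workers_per_key(pkg).values())
```

Here `max_kg` is 1, `max_sg` grows with the number of workers, and `max_pkg` never exceeds 2, which is why aggregating a key's partial results costs O(W) under shuffle grouping but O(1) under PKG.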
Streaming Applications
•
Most algorithms that use Shuffle Grouping can
be expressed using Partial Key Grouping to
reduce:
–
Memory footprint
–
Aggregation overhead
•
Algorithms that use Key Grouping can be
Stream Grouping: Summary
Grouping | Pros | Cons
Key Grouping | Scalable, Memory | Load Imbalance
Shuffle Grouping | Load Balance | Memory O(W), Aggregation O(W)
Partial Key Grouping | Scalable, Load Balance, Memory O(1) | -
Experiments
•
What is the effect of key splitting on the power of two choices (POTC)?
•
How does local estimation compare to a global oracle?
•
How does PKG perform on a real deployment?
Experimental Setup
•
Metric
–
The difference between the maximum and the average load of the workers at time t
•
Datasets
–
Twitter, 1.2G tweets (crawled July 2012)
–
Wikipedia, 22M access logs
–
Twitter, 690K cashtags (crawled Nov 2013)
–
Social Networks, 69M edges
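The imbalance metric from the setup above can be written as I(t) = max_i L_i(t) - avg_i L_i(t), where L_i(t) is the load of worker i at time t. A minimal sketch follows; the function name `load_imbalance` and the sample loads are hypothetical.

```python
def load_imbalance(loads):
    """I(t): maximum worker load minus average worker load at time t."""
    if not loads:
        raise ValueError("need at least one worker")
    return max(loads) - sum(loads) / len(loads)


balanced = load_imbalance([10, 10, 10, 10])  # every worker has the same load
skewed = load_imbalance([37, 1, 1, 1])       # one worker receives the hot key
```

A perfectly balanced assignment gives 0, and the value grows as one worker pulls ahead of the average.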
Real deployment: Apache Storm
[Figure: (a) Throughput (keys/s) vs. CPU delay (ms) for PKG, SG, and KG; (b) Memory (keys) for PKG, SG, and KG, with points labeled 10s, 30s, 60s, 300s, and 600s]
Skewness
[Figure, shown over three slides: four sources sending messages to six workers; the last slide shows eight workers]
High level ideas
•
Split the high-frequency keys across more than two workers
•
Challenges
–
How to decide which are the high frequency keys?
–
How many workers to split the high frequency
Solution
•
Divide stream into two components
–
Heavy Hitters
–
Tail Distribution
•
Split Heavy Hitters across all the workers
•
Use Greedy W-Choices or Round Robin for heavy hitters
•
Use vanilla PKG for tail
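The solution above can be sketched as a router that treats heavy hitters and the tail differently. Several parts are assumptions made for illustration: the class name `HeavyHitterAwareGrouping`, the exact per-key counters (a real system would more likely use an approximate heavy-hitter summary), the `warmup` guard, and the threshold value 2/|W|. Only the Greedy W-Choices variant is shown, not Round Robin.

```python
import hashlib
from collections import Counter


def stable_hash(key: str, seed: int) -> int:
    """Deterministic seeded hash."""
    digest = hashlib.md5(f"{seed}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HeavyHitterAwareGrouping:
    def __init__(self, num_workers: int, threshold: float, warmup: int = 100):
        self.num_workers = num_workers
        self.threshold = threshold  # hypothetical frequency cutoff for heavy hitters
        self.warmup = warmup        # hypothetical: no key is heavy before this many messages
        self.local_load = [0] * num_workers
        self.counts = Counter()     # exact counts, an assumption made for simplicity
        self.total = 0

    def is_heavy(self, key: str) -> bool:
        if self.total < self.warmup:
            return False
        return self.counts[key] / self.total >= self.threshold

    def route(self, key: str) -> int:
        self.counts[key] += 1
        self.total += 1
        if self.is_heavy(key):
            # Greedy W-Choices: a heavy hitter may go to any worker.
            candidates = range(self.num_workers)
        else:
            # Tail: vanilla PKG with two hashed candidates.
            n = self.num_workers
            candidates = (stable_hash(key, 1) % n, stable_hash(key, 2) % n)
        worker = min(candidates, key=lambda w: self.local_load[w])
        self.local_load[worker] += 1
        return worker


# 90% of the messages carry the key "hot"; the rest are unique tail keys.
stream = ["hot" if i % 10 else f"k{i}" for i in range(1000)]
grouping = HeavyHitterAwareGrouping(num_workers=5, threshold=2 / 5)
routes = [(k, grouping.route(k)) for k in stream]
```

In this run the hot key is spread over all five workers once the warm-up ends, while each tail key stays on one of its two hashed candidates.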
Skewness
[Figure: four sources sending messages to six workers]
Experiments: Threshold
[Figure: load imbalance I(t) (messages) vs. skew for 5, 10, 50, and 100 workers; top row W-Choice, bottom row RR; thresholds 2/|W|, 1/|W|, 1/2|W|, 1/4|W|, 1/8|W|]
Experiments: Load Imbalance
[Figure: load imbalance I(t) (messages) vs. skew for 5, 10, 50, and 100 workers with 10k, 100k, and 1M unique items; series PKG, W-Choice, RR]
Conclusion
•
Partial Key Grouping (PKG) reduces the load imbalance
by up to seven orders of magnitude compared to Key
Grouping
•
PKG imposes constant memory and aggregation
overhead, i.e., O(1), whereas the overhead of Shuffle
Grouping is O(W)
•
W-Choices improves PKG to achieve nearly perfect load