Load Balancing for Distributed Stream Processing Engines. Muhammad Anis Uddin Nasir EMDC

(1)

Load Balancing for Distributed Stream

Processing Engines

Muhammad Anis Uddin Nasir

(2)

About me

•

Ex EMDC from Batch 2011 (the party batch)

•

Currently a PhD student at KTH Royal Institute of Technology

–

Under supervision of Šarūnas Girdzijauskas

•

Internship

–

Ex-intern at Yahoo Labs Barcelona

(3)

(4)

(5)

(6)

(7)

(8)

(9)

Why PhD?

•

Freedom

–

Work timings

–

Set of problems

•

Flexibility

•

Conferences

•

Internships

(10)

Research Interest

•

Stream Processing

•

Social Network Analysis

•

Distributed Systems

(11)

Stream Processing Engines

•

Streaming Application

–

Online Machine Learning

–

Real Time Query Processing

–

Continuous Computation

•

Streaming Frameworks

(12)

Stream Processing Model

•

Streaming applications are represented by Directed Acyclic Graphs (DAGs)

Worker

Source

Data Stream

Operators

Data Channels

(13)

Stream Grouping

•

Key or Fields Grouping

–

Hash-based assignment

–

Stateful operations, e.g., page rank, degree count

•

Shuffle Grouping

–

Round-robin assignment
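The two groupings above can be sketched in a few lines. This is a minimal illustration, not the code of any particular streaming framework: the function names are hypothetical, and CRC32 merely stands in for whatever hash function a real engine uses.

```python
import itertools
import zlib


def key_grouping(key: str, num_workers: int) -> int:
    """Hash-based assignment: the same key always maps to the same worker."""
    # CRC32 is an assumption for this sketch; any stable hash would do.
    return zlib.crc32(key.encode("utf-8")) % num_workers


def shuffle_grouping(num_workers: int):
    """Round-robin assignment: returns a router that ignores the key."""
    counter = itertools.cycle(range(num_workers))
    return lambda key: next(counter)


if __name__ == "__main__":
    stream = ["a", "b", "a", "c", "a", "a"]
    route_sg = shuffle_grouping(3)
    for key in stream:
        # Print the key, its key-grouping worker, and its shuffle-grouping worker.
        print(key, key_grouping(key, 3), route_sg(key))
```

With key grouping, all state for a key lives on one worker, which is what stateful operators need. With shuffle grouping, any worker may see any key, so per-key state has to be aggregated afterwards.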

(14)

Key Grouping

•

Key Grouping

•

Scalable



•

Low Memory



(15)

Shuffle Grouping

•

Shuffle Grouping

•

Load Balance



•

Memory O(W)



(16)

Problem Formulation

•

Input is an unbounded sequence of messages drawn from a key distribution

•

Each message is assigned to a worker for processing

•

Load balance properties

–

Memory Load Balance

–

Network Load Balance

–

Processing Load Balance

(17)

Partial Key Grouping (PKG)

•

Key Splitting

–

Split each key across two workers

–

Assign each message of a key to one of its two candidate workers using the power of two choices

•

Local Load Estimation

–

Each source estimates the load on the workers

–

Using only its local routing history

(18)

•

Key Splitting

•

Local Load Estimation

[Diagram: Source and Worker nodes with numeric labels]

(19)

•

Key Splitting

–

Distributed

–

Stateless

–

Handle Skew

•

Local load estimation

–

No coordination among sources

–

No communication with workers

(20)

Partial Key Grouping

•

PKG

•

Load Balance



•

Memory O(1)



(21)

Streaming Applications

•

Most algorithms that use Shuffle Grouping can be expressed using Partial Key Grouping to reduce:

–

Memory footprint

–

Aggregation overhead

•

Algorithms that use Key Grouping can be

(22)

Stream Grouping: Summary

Grouping

•

Pros

•

Cons

Key Grouping

•

Scalable

•

Memory

•

Load Imbalance

Shuffle Grouping

•

Load Balance

•

Memory O(W)

•

Aggregation O(W)

Partial Key Grouping

•

Scalable

•

Load Balance

•

Memory O(1)
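The trade-offs in this summary can be explored with a small single-source simulation. This is a sketch under stated assumptions: the Zipf-like generator, the MD5-based hashes, and the function names are illustrative choices, and the numbers it produces are not the results reported in the talk.

```python
import hashlib
import random
from collections import defaultdict


def h(key: str, seed: int, w: int) -> int:
    """Seeded hash onto w workers (MD5 is an arbitrary choice here)."""
    digest = hashlib.md5(f"{seed}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % w


def zipf_stream(n: int, num_keys: int, exponent: float, seed: int = 0):
    """n messages whose key of rank r has weight 1 / r**exponent."""
    rng = random.Random(seed)
    keys = [f"k{r}" for r in range(1, num_keys + 1)]
    weights = [1.0 / r ** exponent for r in range(1, num_keys + 1)]
    return rng.choices(keys, weights=weights, k=n)


def simulate(stream, w: int, scheme: str):
    """Return (load imbalance, max number of workers holding one key)."""
    load = [0] * w
    placement = defaultdict(set)  # key -> workers that hold state for it
    for i, key in enumerate(stream):
        if scheme == "KG":      # key grouping: single hash
            dest = h(key, 1, w)
        elif scheme == "SG":    # shuffle grouping: round robin
            dest = i % w
        elif scheme == "PKG":   # partial key grouping: two choices
            a, b = h(key, 1, w), h(key, 2, w)
            # With a single source, the local estimate equals the true load.
            dest = a if load[a] <= load[b] else b
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        load[dest] += 1
        placement[key].add(dest)
    imbalance = max(load) - sum(load) / w
    max_replicas = max(len(ws) for ws in placement.values())
    return imbalance, max_replicas


if __name__ == "__main__":
    stream = zipf_stream(n=20000, num_keys=1000, exponent=1.0)
    for scheme in ("KG", "SG", "PKG"):
        print(scheme, simulate(stream, 5, scheme))
```

The second value returned corresponds to the memory column: a key is held by one worker under Key Grouping, by at most two under Partial Key Grouping, and by up to W under Shuffle Grouping.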

(23)

Experiments

•

What is the effect of key splitting on POTC?

•

How does local estimation compare to a global oracle?

•

How does PKG perform on a real deployment

(24)

Experimental Setup

•

Metric

–

Load imbalance: the difference between the maximum and the average load of the workers at time t

•

Datasets

–

Twitter, 1.2G tweets (crawled July 2012)

–

Wikipedia, 22M access logs

–

Twitter, 690K cashtags (crawled Nov 2013)

–

Social Networks, 69M edges
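The metric on this slide can be written as a formula. The notation is an assumption made for illustration: L_i(t) denotes the load of worker i at time t and W the set of workers, with I(t) as the imbalance.

```latex
I(t) = \max_{i \in W} L_i(t) \;-\; \frac{1}{|W|} \sum_{i \in W} L_i(t)
% Example: three workers with loads (4, 2, 0) give
% I(t) = 4 - (4 + 2 + 0)/3 = 4 - 2 = 2
```

A perfectly balanced assignment gives I(t) = 0, since the maximum then equals the average.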

(25)

(26)

(27)

Real deployment: Apache Storm

[Figure: throughput (keys/s) of PKG, SG, and KG. (a) x-axis: CPU delay (ms). (b) x-axis: memory (keys); points labeled 10s, 30s, 60s, 300s, 600s.]

(28)


(29)

(30)

Skewness

Worker

Source

(31)

Skewness

Worker

Source

(32)

Skewness

Worker

Source

Worker

(33)

High level ideas

•

Split the high-frequency keys across more than two workers

•

Challenges

–

How to decide which are the high frequency keys?

–

How many workers to split the high frequency

(34)

Solution

•

Divide stream into two components

–

Heavy Hitters

–

Tail Distribution

•

Split Heavy Hitters on all the workers

•

Use Greedy W-Choices or Round Robin for heavy hitters

•

Use vanilla PKG for tail
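The solution on this slide can be sketched as one router with two paths. Several parts are assumptions made for illustration and are not confirmed by the slides: exact per-key counts stand in for the heavy-hitter detection, `threshold` is a frequency fraction chosen by the user, `warmup` delays detection until enough messages have been seen, and all names are hypothetical.

```python
import hashlib
from collections import Counter


def _hash(key: str, seed: int, num_workers: int) -> int:
    """Seeded hash onto the workers (MD5 is an arbitrary choice)."""
    digest = hashlib.md5(f"{seed}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_workers


class HeavyHitterAwareGrouper:
    def __init__(self, num_workers, threshold, use_round_robin=False, warmup=100):
        self.w = num_workers
        self.threshold = threshold          # assumed: fraction of the stream
        self.use_round_robin = use_round_robin
        self.warmup = warmup                # assumed: messages before detection
        self.load = [0] * num_workers       # local load estimate
        self.counts = Counter()             # exact counts (an assumption)
        self.seen = 0
        self.rr_next = 0

    def is_heavy(self, key: str) -> bool:
        """A key is heavy if its observed frequency reaches the threshold."""
        if self.seen < self.warmup:
            return False
        return self.counts[key] >= self.threshold * self.seen

    def route(self, key: str) -> int:
        self.counts[key] += 1
        self.seen += 1
        if self.is_heavy(key):
            if self.use_round_robin:
                # Round robin over all workers for heavy hitters.
                dest = self.rr_next
                self.rr_next = (self.rr_next + 1) % self.w
            else:
                # Greedy W-Choices: least loaded of ALL workers.
                dest = min(range(self.w), key=self.load.__getitem__)
        else:
            # Tail of the distribution: vanilla PKG with two choices.
            a, b = _hash(key, 1, self.w), _hash(key, 2, self.w)
            dest = a if self.load[a] <= self.load[b] else b
        self.load[dest] += 1
        return dest
```

Heavy hitters can land on any worker, so their state may be spread over all W workers, while tail keys stay on at most two.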

(35)

Skewness

Worker

Source

(36)

Experiments: Threshold

[Figure: load imbalance I(t) (messages) vs. skew for 5, 10, 50, and 100 workers; top row W-Choice, bottom row RR; one curve per threshold: 2/|W|, 1/|W|, 1/2|W|, 1/4|W|, 1/8|W|.]

(37)

Experiments: Load Imbalance

[Figure: load imbalance I(t) (messages) vs. skew for PKG, W-Choice, and RR; columns: 5, 10, 50, and 100 workers; rows: 10k, 100k, and 1M unique items.]

(38)

Conclusion

•

Partial Key Grouping (PKG) reduces the load imbalance by up to seven orders of magnitude compared to Key Grouping

•

PKG imposes constant memory and aggregation overhead, i.e., O(1), compared to Shuffle Grouping, which is O(W)

•

W-Choices improves PKG to achieve nearly perfect load

(39)
