• No results found

Load Balancing for Distributed Stream Processing Engines. Muhammad Anis Uddin Nasir EMDC

N/A
N/A
Protected

Academic year: 2021

Share "Load Balancing for Distributed Stream Processing Engines. Muhammad Anis Uddin Nasir EMDC"

Copied!
39
0
0

Loading.... (view fulltext now)

Full text

(1)

Load Balancing for Distributed Stream

Processing Engines

Muhammad Anis Uddin Nasir

(2)

About me

Ex EMDC from Batch 2011 (the party batch)

Currently PhD Student at KTH Royal Institute of

Technology

Under supervision of Šarūnas Girdzijauskas

Internship

Ex-intern at Yahoo Labs Barcelona

(3)
(4)
(5)
(6)
(7)
(8)
(9)

Why PhD?

Freedom

Work timings

Set of problems

Flexibility

Conferences

Internships

(10)

Research Interest

Stream Processing

Social Network Analysis

Distributed Systems

(11)

Stream Processing Engines

Streaming Application

Online Machine Learning

Real Time Query Processing

Continuous Computation

Streaming Frameworks

(12)

Stream Processing Model

Streaming Applications are represented by

Directed Acyclic Graphs (DAGs)

Worker

Worker

Worker

Worker

Worker

Worker

Source

Source

Source

Source

Data Stream

Operators

Data Channels

(13)

Stream Grouping

Key or Fields Grouping

Hash-based assignment

Stateful operations, e.g., page rank, degree count

Shuffle Grouping

Round-robin assignment

(14)

Key Grouping

Key Grouping

Scalable

Low Memory

(15)

Shuffle Grouping

Shuffle Grouping

Load Balance

Memory O(W)

(16)

Problem Formulation

Input is a unbounded sequence of

messages

from a

key

distribution

Each message is assigned to a

worker

for processing

Load balance properties

Memory Load Balance

Network Load Balance

Processing Load Balance

(17)

Partial Key Grouping (PKG)

Key Splitting

Split each key into two server

Assign each instance using power of two choices

Local Load Estimation

each source estimates load on workers

using the local routing history

(18)

Partial Key Grouping (PKG)

Key Splitting

Local Load Estimation

Source

Source

Worker

Worker

Worker

2

0

1

1

0

2

2

0

2

(19)

Partial Key Grouping (PKG)

Key Splitting

Distributed

Stateless

Handle Skew

Local load estimation

No coordination among sources

No communication with workers

(20)

Partial Key Grouping

PKG

Load Balance

Memory O(1)

(21)

Streaming Applications

Most algorithms that use Shuffle Grouping can

be expressed using Partial Key Grouping to

reduce:

Memory footprint

Aggregation overhead

Algorithms that use Key Grouping can be

(22)

Stream Grouping: Summary

Grouping

Pros

Cons

Key Grouping

Scalable

Memory

Load Imbalance

Shuffle Grouping

Load Balance

Memory O(W)

Aggregation O(W)

Partial Key Grouping

Scalable

Load Balance

Memory O(1)

(23)

Experiments

What is the effect of

key splitting

on POTC?

How does

local estimation

compare to a

global oracle?

How does PKG perform on a real deployment

(24)

Experimental Setup

Metric

the difference of maximum and the average load of the workers at time t

Datasets

Twitter, 1.2G tweets (crawled July 2012)

Wikipedia, 22M access logs

Twitter, 690K cashtags (crawled Nov 2013)

Social Networks, 69M edges

(25)
(26)
(27)

Real deployment: Apache Storm

0

200

400

600

800

1000

1200

1400

1600

0

0.2 0.4 0.6 0.8

1

T

h

ro

u

g

h

p

u

t

(k

e

y

s

/s

)

(a) CPU delay (ms)

PKG

SG

KG

1000

1100

1200

0.10

0

2.10

6

4.10

6

6.10

6

(b) Memory (keys)

10s

10s

30s

30s 60s

60s

300s

300s

600s

600s

PKG

SG

KG

(28)

Load Balancing for Distributed Stream

Processing Engines

Muhammad Anis Uddin Nasir

(29)
(30)

Skewness

Worker

Worker

Worker

Worker

Worker

Worker

Source

Source

Source

Source

(31)

Skewness

Worker

Worker

Worker

Worker

Worker

Worker

Source

Source

Source

Source

(32)

Skewness

Worker

Worker

Worker

Worker

Worker

Worker

Source

Source

Source

Source

Worker

Worker

(33)

High level ideas

Splitting the high frequency keys on more than

two workers

Challenges

How to decide which are the high frequency keys?

How many workers to split the high frequency

(34)

Solution

Divide stream into two components

Heavy Hitters

Tail Distribution

Split Heavy Hitters on all the workers

Use Greedy W-Choices or Round Robin for heavy hitters

Use vanilla PKG for tail

(35)

Skewness

Worker

Worker

Worker

Worker

Worker

Worker

Source

Source

Source

Source

(36)

Experiments- Threshold

10-7 10-6 10-5 10-4 10-3 10-2 10-1 100 0.4 0.8 1.2 1.6 2 I( t) ( m e ss a g e s) skew 5 workers W-Choice 2/|W| 1/|W| 1/2|W| 1/4|W| 1/8|W| 10-7 10-6 10-5 10-4 10-3 10-2 10-1 100 0.4 0.8 1.2 1.6 2 skew 10 workers W-Choice 10-7 10-6 10-5 10-4 10-3 10-2 10-1 100 0.4 0.8 1.2 1.6 2 skew 50 workers W-Choice 10-7 10-6 10-5 10-4 10-3 10-2 10-1 100 0.4 0.8 1.2 1.6 2 skew 100 workers W-Choice 10-7 10-6 10-5 10-4 10-3 10-2 10-1 100 0.4 0.8 1.2 1.6 2 I( t) ( m e ss ag e s) skew 5 workers RR 10-7 10-6 10-5 10-4 10-3 10-2 10-1 100 0.4 0.8 1.2 1.6 2 skew 10 workers RR 10-7 10-6 10-5 10-4 10-3 10-2 10-1 100 0.4 0.8 1.2 1.6 2 skew 50 workers RR 10-7 10-6 10-5 10-4 10-3 10-2 10-1 100 0.4 0.8 1.2 1.6 2 skew 100 workers RR
(37)

Experiments- Load Imbalance

10-7 10-6 10-5 10-4 10-3 10-2 10-1 100 0.4 0.8 1.2 1.6 2 I( t) ( m e ss a g e s) skew 5 workers 10k unique items PKG W-Choice RR 10-7 10-6 10-5 10-4 10-3 10-2 10-1 100 0.4 0.8 1.2 1.6 2 skew 10 workers 10k unique items 10-7 10-6 10-5 10-4 10-3 10-2 10-1 100 0.4 0.8 1.2 1.6 2 skew 50 workers 10k unique items 10-7 10-6 10-5 10-4 10-3 10-2 10-1 100 0.4 0.8 1.2 1.6 2 skew 100 workers 10k unique items 10-7 10-6 10-5 10-4 10-3 10-2 10-1 100 0.4 0.8 1.2 1.6 2 I( t) ( m e ss ag e s) skew 5 workers 100k unique items 10-7 10-6 10-5 10-4 10-3 10-2 10-1 100 0.4 0.8 1.2 1.6 2 skew 10 workers 100k unique items 10-7 10-6 10-5 10-4 10-3 10-2 10-1 100 0.4 0.8 1.2 1.6 2 skew 50 workers 100k unique items 10-7 10-6 10-5 10-4 10-3 10-2 10-1 100 0.4 0.8 1.2 1.6 2 skew 100 workers 100k unique items 10-6 10-5 10-4 10-3 10-2 10-1 100 I( t) ( m e ss ag e s) 5 workers 1M unique items 10-6 10-5 10-4 10-3 10-2 10-1 100 10 workers 1M unique items 10-6 10-5 10-4 10-3 10-2 10-1 100 50 workers 1M unique items 10-6 10-5 10-4 10-3 10-2 10-1 100 100 workers 1M unique items
(38)

Conclusion

Partial Key Grouping (PKG) reduces the load imbalance

by up to seven orders of magnitude compared to Key

Grouping

PKG imposes constant memory and aggregation

overhead, i.e., O(1), compared to Shuffle Grouping that

is O(W)

W-Choices improves PKG to achieve nearly perfect load

(39)

Load Balancing for Distributed Stream

Processing Engines

Muhammad Anis Uddin Nasir

References

Related documents