Pipelining and load-balancing in parallel joins on distributed machines

(1)

Pipelining and load-balancing in

parallel joins on distributed

machines

M. Bamha

(2)

Plan

Introduction

Shared Nothing Machines Join operations in RDBMS

Mono-processor join and its complexity Parallel Join and its complexity

Parallelization of join operation using hash functions Problem of load imbalance due to data skew

Attribute Value Skew (AVS) Join Product Skew (JPS)

Osfa-join : a skew-insensitive parallel join algorithm Parallel execution strategies for multi-join queries Sequential Parallel vs Pipelined Osfa_join execution Conclusion

(3)

Shared Nothing machine (SN)

Pn P2

P1

Interconnection Network

Memory Memory Memory

(4)

Join of two relations

The

join

of two relations

R

and

S

on

attribute

A

of

R

and attribute

B

of

S

is the

relation

T

, written

R

_o

_n

S

, obtained by

concatenating the pairs of tuples from

R

(5)

Example

-1-Relation R Relation S

Company

Item Product Item Company

Product Company R S 2 item3 2 2 2 2 prod1 prod2 prod3 prod4 prod5 prod6 1 6 tuples 7 tuples 2 2 3 3 3 item1 item2 item3 item4 item5 item6 item7 4 3 3 2 3 5 prod1 prod1 prod2 prod2 prod3 prod3 prod3 prod4 prod4 prod4 13 tuples prod5 prod5 prod5 item4 item5 item4 item5 item2 item6 item2 item3 item6 item2 item3 item6 3 3 3 3 3 3 3 3 3

(6)

Join’s complexity

The sequential join processing of two relations R and S

requires at least the time for input and output:

bound_inf₁ = Ω (|R| + |S| + |R _o_n S|) .

Parallel processing using p processors requires therefore

bound_infp = 1

(7)

Parallel join

Parallel join usually proceeds in two phases:

1. a redistribution phase where the relations to join are partitioned into distinct buckets. These buckets are generally generated using a hash function of the join attribute and sent to distinct processors

2. a join phase where each processor computes the join of its local buckets.

(8)

Example

-1.1-→ Number of processors = 3

→ Hashing function : (Company mod 3) +1

3 2 1 Produit Company Relation R3 2 prod4 prod2 3 2 tuples Relation S3 Item Company item4 item3 2 3 2 tuples (1) (3) (1) (3) Processor 3 Relation R2 prod3 3 Relation S2 Item Company item2 item5 3 2 2 tuples (1) (3) (1) Processor 2 Produit Company Relation R1 prod5 prod1 prod6 3 tuples Company item1 item7 item6 4 5 3 3 tuples Relation S1 (2) (1) (3) (2) (3) (1) Processor 1 1 tuples Item Produit Company

(9)

Example

-1.2-Product Company Relation R1 3 3 tuples Product Company Relation R2 Product Company Relation R3 2 tuples item6 3 3 tuples

Relation S1 Relation S2 Relation S3

Item Company item4 2 prod3 prod4 prod5 3 3 item2 item3 3 3 prod6 1 item1 4 prod1 prod2 2 2 2 5 item5 item7 1 tuples 3 tuples 1 tuples Processor 2 Processor 3 Processor 1 Company Company Item Item

(10)

Example

-1.3-0 tuples

4 tuples

Product Item Company

item6 3 item6 item2 item3 3 3 3 item6 item2 item3 3 3 3 item2 item3 3 3 prod3 prod3 prod3 prod4 prod4 prod4 prod5 prod5 prod5 S1 Product Company S2 R2 Company Item prod1 item4 item5 item4 item5 prod2 2 2 2 2 prod1 prod2 R3 S3 R1 9 tuples

Processor 1 Processor 2 Processor 3

Product Item

(11)

Problem of data skew

Attribute Value Skew (AVS):

It arises due to non-uniform distributions in the join attribute in the input relation. It may result a biased distribution of data among the processors whenever values are mapped to processors without taking their frequencies into account.

Join Product Skew (JPS): it arises due to differences in the

join selectivity of data processed by different nodes. JPS can occur even if the input relations have no AVS and is one of the subtler kinds of skew to detect.

(12)

Example - 2.1

Relation R Relation S R S Company Frequency .. .. .. .. .. .. .. 1 2 3 1000 1000 1000 Company Frequency .. 2 3 4 5 1 .. .. 2 3 .. 6 .. 10 1000 1000 1000 1000 3000 tuples 3011 tuples 1001000 tuples 1000000 .. Company Frequency

(13)

Example - 2.2

R1 S1 R2 S2 R3 S3 .. ₃ ₁₀₀₀ Company Frequency .. ₁₀₀₀ 1000 tuples .. ₂ 6 10 5 1000 .. .. 1010 tuples 1 1001 tuples .. ₃ _1000000 .. ₂ ₁₀₀₀ 4 Company Frequency .. .. ₁₀₀₀ 1000 tuples 3 Company Frequency .. .. ₁₀₀₀ 1000 tuples 1 Company Frequency .. .. ₂ ₁₀₀₀ 1000 tuples R1 R2 R3 S1 S2 S3 Company Frequency .. .. Company Frequency .. .. Company Frequency

(14)

Osfa_join algorithm

Osfa join a skew insensitive parallel join, it proceeds as

follow:

Computes Histograms of bases relations,

Creates communication templates according to the frequencies of join attribute

Exchange data

Computes local join

⇒ avoids the slowdown due to AVS and JPS

⇒ has an optimal complexity

(15)

Example - 3.1

Relation R Relation S R S Company .. Frequency .. .. .. .. .. .. ++++++++++++++++ ++++++++++++++++ 1 1 2 3 4 5 1000 2000 3 15 .. .. .. .. .. 1000000 tuples 1000000 tuples .. .. .. .. .. 1000 2000000 1 2 3 4 5 8000 1000 5 2 8 1000 1 2 3 4 5 .. 15 30 2009045 tuples .. Company Frequency .. Company Frequency

(16)

Example - 3.2

R1 R2 R3 2 333 8 1000 1 334 15 666 .. .. .. .. .. .. .. 1 2 3 1 2 3 5 Company .. Frequency 1 2 3 4 ₃ .. .. .. .. .. .. .. .. .. .. .. 1 2 3 8 1000 1 2 3 5 Company .. Frequency 1 2 3 4 .. .. .. .. 334 333 8 1000 S1 S2 S3 1 1 333 ₃₃₃ 5 1342 tuples 1346 tuples 1343 tuples 667 1016 tuples 1004 tuples

667 1001 tuples .. Company Frequency .. Company Frequency .. Company Frequency .. Company Frequency

(17)

Example - 3.3

.. .. .. .. .. .. .. 1 2 3 1 2 3 5 1 2 3 4 .. .. .. .. 334 333

667000 333 666000 15 30 2672 2664 2664 670012 tuples 669035 tuples S2 S3 R1 S1 R2 R3 667000 669998 tuples .. Frequency

(18)

Parallel strategies for multi-join queries

⇒ Optimization of the query execution,

(19)

Sequential parallel execution (SP)

1 1..10 1..10 1..10 1..10 2 4 3 10 9 8 7 6 5 4 3 2 1 11111111222233333333334444 11111111222233333333334444 11111111222233333333334444 11111111222233333333334444 11111111222233333333334444 11111111222233333333334444 11111111222233333333334444 11111111222233333333334444 11111111222233333333334444 11111111222233333333334444 Processors Time Joins

(20)

Parallel synchronous execution (PS)

1 1..10 1..10 2 4 3 10 9 8 7 6 5 4 3 2 1 1..4 5..10 222222222222223333334444444 222222222222223333334444444 222222222222223333334444444 222222222222223333334444444 222222222222223333334444444 222222222222223333334444444 11111111 3333334444444 11111111 3333334444444 11111111 3333334444444 11111111 3333334444444 Processors Time Joins

(21)

Segmented right-deep execution (SD)

1 1..10 2 4 3 10 9 8 7 6 5 4 3 2 1 2..6 7..10 1 111111222222222222222_{111111222222222222222} 111111222222222222222 111111222222222222222 111111444444444 444 4444 111111333333 33 33 33 111111333333 33 33 333 111111333333 33 33 333 111111333333 33 33 33 111111333333 33 33 33 Processors Time Joins

(22)

Full parallel execution (FP)

1 2 4 3 10 9 8 7 6 5 4 3 2 1 1 5..8 9..10 2..4 22222222222222222222222 111111111111111111 111111111111111111 111111111111111111 111111111111111111 2222222222222222222222 33 33 333333333333333 33 33 333333333333333 33 33 333333333333333 44444444 4444 44 444 444444 Processors Time

(23)

Sequential parallel vs Pipelined Osfa_join

a2 a1 a3 S1 R1 R3 R2

6. Create comm. templates, Exchange data,

Compute S4=R3 join S3 and save it to disk.

Exchange data,

2. Compute comm. templates, Exchange data,

Compute S3= R2 join S2 and save it to disk, 4. Compute comm. templates,

5. Create Hist(S3).

1. Create Hist(S1) on a1 3. Create Hist(R2) on a2 3. Create Hist(S2).

1. Create Hist(R1) on a1 5. Create Hist(R3) on a3 a2 a1 a3 S1 R1 R2 R3

Create Hist(S2) on a2, Exchange data,

2. Compute comm. templates, 4. Create comm. templates,

Exchange data,

Compute S4=R3 join S3 and save it to disk.

Create Hist(S3) on a3, 4. Compute S3= R2 join S2. 3. Compute comm. templates, Exchange data,

1. Create Hist(S1) on a1 1. Create Hist(R1) on a1

1. Create Hist(R2) on a2 1. Create Hist(R3) on a3

(24)

Pipelined Osfa_join

Reduces disk input/output for intermediate join results, Allows flexible resource allocation,

(25)

Conclusion

Pipelined parallelism has been induced successfully in

Osfa join algorithm:

Avoids the slowdown due to AVS and JPS,

Guarantee a perfect balancing properties during all the stages of join computation,

Reduces disk input/output for intermediate results, Allows flexible resource allocation,

(26)

Future work:

Improving the pipelined algorithm to generate only relevant tuples for intermediate join results,

Extend the algorithm to handle long chains of pipeline, Extend the algorithm for GRID computing.