• No results found

Load Balancing Techniques

N/A
N/A
Protected

Academic year: 2021

Share "Load Balancing Techniques"

Copied!
15
0
0

Loading.... (view fulltext now)

Full text

(1)

Parallel Computing(Load Balancing): Rajeev Wankar 1

Load Balancing Techniques

Lecture Outline

Following Topics will be discussed

 Static Load Balancing

 Dynamic Load Balancing

 Mapping for load balancing

 Minimizing Interaction

(2)

Parallel Computing(Load Balancing): Rajeev Wankar 3

Load Balancing Techniques

Objectives:

The amount of computation assigned to each processor is balanced so that some processors do not sit idlewhile others are executingtasks.

The interactionamong the different processors is

minimized, so that the processors spend most of the time in doing work.

Many times to balance the load among processors, it may be necessary to assign tasks, that interact heavily, to different processors.

General classes of problems

 The firstclass consists of problems in which all the tasks are available at the beginning of the computation but the amount of timerequired by each task is different

 The second class consists of problems in which tasks are available at the beginning but as the computation progresses, the amount of timerequired by each task changes

 The third class consists of problems in which tasksare not available at the beginning but are generated

dynamically

(3)

Parallel Computing(Load Balancing): Rajeev Wankar 5

Load Balancing Techniques

Static load-balancing

 Distribute the work among processors prior to the execution of the algorithm

 Example: Matrix-Matrix Computation

 Easy to design and implement

Dynamic load-balancing

 Distribute the work among processors during the execution of the algorithm

 Algorithms that require dynamic load-balancing are somewhat more complicated (Examples: Parallel Graph Partitioning and Adaptive Finite Element Computations)

Static load-balancing

Before the execution of any process. Some potential static load balancing techniques:

• Round robin algorithm — passes out tasks in sequential order of processes, coming back to the first when all processes have been given a task

• Randomized algorithms — selects processes at random to take tasks

• Recursive bisection — recursively divides the problem into sub problems of equal computational effort while minimizing message passing

• Simulated annealing — an optimization technique

• Genetic algorithm— another optimization technique

(4)

Parallel Computing(Load Balancing): Rajeev Wankar 7

Static load-balancing

Since load is balanced prior to the execution, several fundamental flaws with static load balancing even if a mathematical solution exists:

• Very difficult to estimate accurately the execution times of various parts of a program without actually executing the parts.

• Communication delays that vary under different circumstances

• Some problems have an indeterminate number of steps to reach their solution.

General Strategy for Load Balancing

(5)

Parallel Computing(Load Balancing): Rajeev Wankar 9

Static Load Balancing Ex.

Array distribution schemes:

 One dimensional (strip)

 Block distributions of matrix (checkerboard)

 Block cyclic distributions

 Randomized block distributions

Using Stripped Data Decomposition

When stripped data decomposition is used to derive

concurrency, a suitable decomposition of data can itself be used to balance the load and minimize interactions.

(6)

Parallel Computing(Load Balancing): Rajeev Wankar 11

P0

P1

P2

P3

P4

P5

P6

P7

A B C

P0

P1

P2

P3

P4

P5

P6

P7

P0

P1

P2

P3

P4

P5

P6

P7

Matrix-Matrix addition

Block distribution (Dense matrix-matrix multiplication)

(7)

Parallel Computing(Load Balancing): Rajeev Wankar 13

(a) will lead to load imbalances sparse matrix in Gaussian elimination.

Using the block-cyclicdistribution shown in (b) to distribute the computations to processors

P0 P1 P2 P3 P0 P1 P2 P3 P4 P5 P6 P7 P4 P5 P6 P7 P8 P9 P10 P11 P8 P9 P10 P11 P12 P13 P14 P15 P12 P13 P14 P15 P0 P1 P2 P3 P0 P1 P2 P3

P4 P5 P6 P7 P4 P5 P6 P7

P8 P9 P10 P11 P8 P9 P10 P11

P12 P13 P14 P15 P12 P13 P14 P15

Schemes for Dynamic Load Balancing

Dynamic partition

 There are problems in which we cannot statically partition the work among the processors

 In these problems, a static work partitioning is either

impossible (e.g. first class) or can potentially lead to serious load imbalance problems (e.g., second and third classes)

 The only way to develop efficient message passing programs for these classes of problem is if we allow dynamic load balancing

 Thus during the execution of the program, work is dynamically transferredamong the processors that have a lot of work to one that has little or no work

(8)

Parallel Computing(Load Balancing): Rajeev Wankar 15

Dynamic Load Balancing

Can be classified as:

• Centralized

• Decentralized

Centralized dynamic load balancing

Tasks handed out from a centralized location. Master-slave structure.

Decentralized dynamic load balancing Tasks are passed between arbitrary processes.

A collection of worker processes operate upon the problem and interact among themselves, finally reporting to a single process.

A worker process may receive tasks from other worker processes and may send tasks to other worker processes (to complete or pass on at their discretion).

(9)

Parallel Computing(Load Balancing): Rajeev Wankar 17

Centralized Dynamic Load Balancing

Master process(or) holds the collection of tasks to be performed.

Tasks are sent to the slave processes. When a slave process completes one task, it requests another task from the master process.

Terms used : work pool, replicated worker, processor farm.

Centralized work pool

(10)

Parallel Computing(Load Balancing): Rajeev Wankar 19

Termination

Computation terminates when:

• The task queue is empty and

• Every process has made a request for another task without any new tasks being generated Not sufficient to terminate when task queue empty

Reason:if one or more processes are still running if a running process may provide new tasks for task queue.

Decentralized Dynamic Load Balancing

Distributed Work Pool

(11)

Parallel Computing(Load Balancing): Rajeev Wankar 21

Process Selection

Algorithms for selecting a process:

Round robin algorithm – process Pi requests tasks from process Px, where x is given by a counter that is

incremented after each request, using modulo n arithmetic (n processes), excluding x = i.

Random polling algorithm – process Pi requests tasks from process Px,where x is a number that is selected randomly between 0 and n - 1 (excluding i).

Load Balancing Using a Line Structure

(12)

Parallel Computing(Load Balancing): Rajeev Wankar 23

Master process (P0) feeds queue with tasks at one end, and tasks are shifted down queue.

When a process, Pi (1 ≤ i < n), detects a task at its input from queue and process is idle, it takes task from queue.

Then tasks shuffle down to the right in queue so that space held by task is filled. A new task is inserted into the left side end of queue.

Eventually, all processes have a task and queue filled with new tasks. High- priority or larger tasks could be placed in queue first.

Load Balancing Using a Line Structure

Code Using Time Sharing Between Communication and Computation

Master process (P0)

for (i = 0; i < no_tasks; i++) {

recv(P1, request_tag); /*request for task*/

send(&task, P1, task_tag);/*send tasks into queue*/

}

recv(P1, request_tag); /*request for task*/

send(&empty, P1, task_tag); /*in case end of tasks*/

(13)

Parallel Computing(Load Balancing): Rajeev Wankar 25

Process Pi(1 < i < n) if (buffer == empty) {

send(Pi-1, request_tag); /* request new task */

recv(&buffer, Pi-1, task_tag); /* task from left proc */

}

if ((buffer == full) && (!busy)) { /* get next task */

task = buffer; /* get task*/

buffer = empty; /* set buffer empty */

busy = TRUE; /* set process busy */

}

nrecv(Pi+1, request_tag, request); /* check msg from right */

if (request && (buffer == full)) {

send(&buffer, Pi+1); /* shift task forward */

buffer = empty;

}

if (busy) { /* continue on current task */

Do some work on task.

If task finished, set busy to false.

}

Nonblocking nrecv() necessary to check for a request being received from right.

Load balancing using a tree

Tasks passed from node into one of the two nodes below it when node buffer empty.

(14)

Parallel Computing(Load Balancing): Rajeev Wankar 27

General Techniques for Choosing Right Parallel Algorithm

» Maximize data locality

» Minimize volume of data

» Minimize frequency of Interactions

» Overlapping computations with interactions.

» Decision on Data replication

» Minimize construction

» Use highly optimized collective interaction operations

» Collective data transfers and computations

» Maximize Concurrency

Decision tree to choose a mapping strategy

(15)

Parallel Computing(Load Balancing): Rajeev Wankar 29 Static number of

tasks

Structured communication

pattern

Unstructured communication

pattern

Roughly constant computation time

per task

Join tasks to minimize communication.

Create one task per processor

Cyclically map tasks to processors for computational load balancing

Computation time per task

varies

Use static load balancing techniques

Dynamic number of tasks

Many short-lived tasks. No inter task

communication Frequent

communications between tasks

Use dynamic load balancing techniques

Use run time task scheduling algorithms

References

Related documents

A policy in which all of the fixed assets of a firm are financed with long-term capital, but some of the firm’s permanent current assets are financed with short-term

Proprietary Schools are referred to as those classified nonpublic, which sell or offer for sale mostly post- secondary instruction which leads to an occupation..

4.1 The Select Committee is asked to consider the proposed development of the Customer Service Function, the recommended service delivery option and the investment required8. It

How Many Breeding Females are Needed to Produce 40 Male Homozygotes per Week Using a Heterozygous Female x Heterozygous Male Breeding Scheme With 15% Non-Productive Breeders.

1. The first class consists of problems in which all the tasks are available at the beginning of the computation but the amount of time required by each task is

Results suggest that the probability of under-educated employment is higher among low skilled recent migrants and that the over-education risk is higher among high skilled

In this PhD thesis new organic NIR materials (both π-conjugated polymers and small molecules) based on α,β-unsubstituted meso-positioning thienyl BODIPY have been

• Follow up with your employer each reporting period to ensure your hours are reported on a regular basis?. • Discuss your progress with