Load balancing; Termination detection

Academic year: 2021

Parallel and Distributed Computing

Department of Computer Science and Engineering (DEI), Instituto Superior Técnico

(2)

Outline

Load Balancing
    static
    dynamic
        centralized
        decentralized
Termination Detection
Debugging

(3)

Foster’s Design Methodology

Development of scalable parallel algorithms by delaying machine-dependent decisions to later stages.

Four steps:

partitioning

communication

agglomeration

mapping

(4)

Mapping

Examples discussed in previous classes:

circuit satisfiability

sieve of Eratosthenes

all pairs shortest paths

matrix-vector multiplication

So far, all primitive tasks have required the same amount of computation.

Agglomeration + mapping strategy:

create one task per processor and distribute primitive tasks evenly.

What if we cannot predict beforehand the amount of computation required per primitive task?

(6)

Load Balancing

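[Figure: execution timelines of processes P0 to P4. Perfect load balancing: all processes finish at time t. Imperfect load balancing: the last process finishes at time t’.]

Parallel execution time is defined by the last task to finish. Parallel time is minimized when the load is distributed evenly.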
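As a small illustration of this point, the sketch below computes the parallel time as the largest per-processor load. The two task distributions are made-up numbers (not from the slides); both contain the same total work of 40 units spread over 5 processors.

```python
def parallel_time(per_processor_tasks):
    """Parallel time is set by the processor that finishes last."""
    return max(sum(tasks) for tasks in per_processor_tasks)

# Hypothetical workloads: ten tasks of 4 time units each, 5 processors.
balanced = [[4, 4], [4, 4], [4, 4], [4, 4], [4, 4]]
unbalanced = [[4, 4, 4, 4], [4, 4], [4, 4], [4], [4]]

t_balanced = parallel_time(balanced)      # every processor works 8 units
t_unbalanced = parallel_time(unbalanced)  # one processor works 16 units
print(t_balanced, t_unbalanced)
```

Although the total work is identical, the uneven distribution doubles the parallel time in this example.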

(9)

Load Balancing

Static load balancing: assignment of tasks to processors decided during program development

⇒ mapping problem.

Dynamic load balancing: assignment of tasks to processors decided during runtime

(11)

Static Load Balancing

Substantial amount of research has been dedicated to static load balancing.

Problem Statement (Partition Problem)

Given a number of tasks, each with a given computational effort, and a number of partitions (processors), assign each task to a partition such that the maximum computational effort over all partitions is minimized.

NP-Hard combinatorial problem.
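Because the exact problem is NP-hard, heuristics are commonly used in practice. The sketch below shows one well-known greedy heuristic, Longest Processing Time first (LPT); it is not taken from the slides and does not guarantee an optimal assignment. The function name `lpt_assign` and the sample efforts are illustrative assumptions.

```python
import heapq

def lpt_assign(efforts, num_procs):
    """Greedy LPT heuristic: give each task, largest first,
    to the processor that currently has the smallest load."""
    heap = [(0, p) for p in range(num_procs)]      # (current load, processor)
    assignment = [[] for _ in range(num_procs)]
    # Visit tasks in order of decreasing effort (ties keep their original order).
    for task, effort in sorted(enumerate(efforts), key=lambda te: -te[1]):
        load, p = heapq.heappop(heap)              # least-loaded processor
        assignment[p].append(task)
        heapq.heappush(heap, (load + effort, p))
    max_load = max(sum(efforts[t] for t in tasks) for tasks in assignment)
    return assignment, max_load

# Hypothetical task efforts, two processors.
assignment, max_load = lpt_assign([7, 5, 4, 3, 3, 2], 2)
print(assignment, max_load)
```

For this input the heuristic happens to reach a perfectly balanced split (12 units on each processor); in general it only gives an approximation.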

(14)

Limitations of Static Load Balancing

Limitations of static load balancing:

computational effort of each task may not be known a priori

⇒ some problems have an indeterminate number of steps to complete

programs are subject to variable communication delays

performance of each processor may vary, and may not be known beforehand

Dynamic load balancing overcomes these issues by making the division of the load dependent on the actual runtimes.

(19)

Dynamic Load Balancing

Dynamic load balancing can be accomplished through:

centralized management

one process (master) is responsible for assigning tasks to slave processes

decentralized management

(20)

Centralized Dynamic Load Balancing

Work Pool or Processor Farm model:

master holds all tasks of the application

new tasks may be generated during execution

when idle, a slave process requests a task from the master

master selects a task among those ready to run and sends it to the slave

tasks of larger size or importance are executed first

if tasks are all the same, a FIFO queue may be used

specialized slaves can be considered

master can also hold global data
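The work pool model can be sketched on a single machine with threads standing in for the slave processes and a shared priority queue standing in for the master's pool. This is a minimal sketch under stated assumptions, not an MPI implementation: the rule that a task of size n spawns a new task of size n // 2 is purely hypothetical, as are the names `run_work_pool` and `STOP`.

```python
import math
import queue
import threading

STOP = math.inf  # sentinel; sorts after every real (negated) task size

def run_work_pool(initial_tasks, num_workers=3):
    """Workers pull the largest pending task from a shared pool.
    Returns the sizes of all tasks that were executed."""
    pool = queue.PriorityQueue()
    for size in initial_tasks:
        pool.put(-size)               # negate so larger tasks come out first
    processed = []
    lock = threading.Lock()

    def worker():
        while True:
            item = pool.get()         # an idle worker requests a task
            if item == STOP:
                pool.task_done()
                return
            size = -item
            if size > 1:              # hypothetical rule: spawn a smaller task
                pool.put(-(size // 2))  # submit it before marking this one done
            with lock:
                processed.append(size)
            pool.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    pool.join()                       # pool is empty and no task is in progress
    for _ in threads:
        pool.put(STOP)                # tell every worker to stop
    for t in threads:
        t.join()
    return processed

print(sorted(run_work_pool([8, 3])))
```

New tasks are submitted before the current task is marked done, so `pool.join()` returns only when the pool is empty and no worker is still executing a task.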

(28)

Centralized Dynamic Load Balancing

Work Pool or Processor Farm model:

[Figure: the master process holds the work pool of tasks T1 to Tn; each slave process P0 to Pp requests a task from the master (possibly submitting a new task), and the master sends it a task.]

(29)

Centralized Dynamic Load Balancing

Limitations of centralized dynamic load balancing:

Master can become a bottleneck as it can only issue one task at a time.

not a significant problem if few slaves and/or computationally intensive tasks

(31)

Decentralized Dynamic Load Balancing

An initial approach is simply to evolve the centralized system into a tree-like task management scheme:

main master distributes tasks to be managed by second-level masters

each second-level master behaves as in the centralized method

(33)

Decentralized Dynamic Load Balancing

Fully distributed work pool: each process starts with a given number of tasks, but may send or receive new tasks to handle.

Receiver-initiated method:

process requests new tasks when it has few or no tasks to do

⇒ better in high load systems

Sender-initiated method:

process under heavy load sends tasks to processes willing to accept them

A mixture of both approaches is possible.

Which process to send to or request from?

round-robin

random
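The receiver-initiated method with random partner selection can be illustrated with a small sequential simulation. It is a sketch under simplifying assumptions that are not in the slides: time advances in discrete steps, every task takes one step, a request is answered instantly, and the asked process hands over half of its pending tasks. The function name is hypothetical, and at least two processes are assumed.

```python
import random
from collections import deque

def simulate_receiver_initiated(initial_loads, seed=0):
    """Idle processes ask a randomly chosen process for work.
    Returns (tasks executed per process, number of time steps)."""
    rng = random.Random(seed)
    queues = [deque(range(n)) for n in initial_loads]
    executed = [0] * len(queues)
    steps = 0
    while any(queues):
        # Receiver-initiated: each idle process requests tasks from a random peer.
        for i, q in enumerate(queues):
            if not q:
                victim = rng.choice([j for j in range(len(queues)) if j != i])
                for _ in range(len(queues[victim]) // 2):
                    q.append(queues[victim].pop())
        # Every process that has work executes one task in this step.
        for i, q in enumerate(queues):
            if q:
                q.popleft()
                executed[i] += 1
        steps += 1
    return executed, steps

executed, steps = simulate_receiver_initiated([12, 0, 0, 0])
print(executed, steps)
```

Replacing `rng.choice` with a per-process counter that cycles through the other processes would give the round-robin variant.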

(35)

Mapping Strategy

Decision tree for mapping strategy:

Static number of tasks
    Structured communication pattern
        Roughly constant computation time per task: create one task per processor
        Computation per task varies: cyclically map tasks
    Unstructured communication pattern
Dynamic number of tasks
    Frequent communication between tasks
    Many tasks with little communication
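To see why a cyclic mapping can help when computation per task varies, the sketch below compares a block mapping with a cyclic mapping. The cost model (task i costs i + 1 time units) and the function names are assumptions chosen only for illustration.

```python
def block_map(num_tasks, num_procs):
    """Contiguous blocks: task i goes to processor floor(i * p / n)."""
    return [i * num_procs // num_tasks for i in range(num_tasks)]

def cyclic_map(num_tasks, num_procs):
    """Cyclic (interleaved): task i goes to processor i mod p."""
    return [i % num_procs for i in range(num_tasks)]

def loads(owner, costs, num_procs):
    """Total cost assigned to each processor under a given mapping."""
    total = [0] * num_procs
    for task, proc in enumerate(owner):
        total[proc] += costs[task]
    return total

# Hypothetical costs that grow with the task index: 1, 2, ..., 8.
costs = [i + 1 for i in range(8)]
block_loads = loads(block_map(8, 4), costs, 4)
cyclic_loads = loads(cyclic_map(8, 4), costs, 4)
print(block_loads, cyclic_loads)
```

With these costs the heaviest processor carries 15 units under the block mapping and 12 under the cyclic mapping, so the cyclic mapping finishes sooner.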

(36)

Termination Detection

When can we be sure that the computation has finished?

Static load balancing

Typically termination is easy to detect, as the layout of the application is fixed and well controlled.

Centralized dynamic load balancing

Also easy for the master to recognize termination:

task queue is empty

every slave is idle, having requested another task without generating any new task

Alternatively, if a slave indicates that a solution has been found, the master can terminate all slaves.

Decentralized dynamic load balancing

(44)

Termination Detection

In general, termination at time t requires the following conditions:

1 local termination conditions exist for all processes at time t

2 there are no messages in transit at time t

Second condition is what makes this problem difficult.

(46)

Termination Detection

Using Acknowledgment Messages

each process has two states: active and inactive

processes start inactive

they become active when they receive their first task

the sender of that first task becomes the parent of the process

a process may receive many other tasks while active (the computation itself need not be a tree)

every time a process receives a task it sends an acknowledgment, except if the sender is its parent

the acknowledgment to the parent is only sent when the process is ready to become inactive

a process is ready to become inactive when

local termination conditions exist (all tasks finished)

it has transmitted all acknowledgments for tasks it received

it has received all acknowledgments for tasks it has sent out
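The acknowledgment rules above can be exercised in a small message-passing simulation. This is a sketch under assumptions that are not in the slides: process 0 acts as the source of the computation and is treated as always active, messages are delivered one per step in FIFO order, and a task of depth d > 0 spawns two tasks of depth d - 1 on the next two processes (a made-up workload). At least three processes are assumed so that no process sends a task to itself.

```python
from collections import deque

def run_with_acks(num_procs=4, initial_depth=3):
    """Returns (tasks executed, whether the system was quiet at detection)."""
    ROOT = 0
    parent = [None] * num_procs   # parent[p] is None while p is inactive
    deficit = [0] * num_procs     # tasks sent by p that are not yet acknowledged
    local = [deque() for _ in range(num_procs)]
    network = deque()             # messages in transit
    executed = 0
    local[ROOT].append(initial_depth)

    while True:
        # 1. Each process with local work executes one task and may spawn more.
        for p in range(num_procs):
            if local[p]:
                depth = local[p].popleft()
                executed += 1
                if depth > 0:
                    for dst in ((p + 1) % num_procs, (p + 2) % num_procs):
                        network.append(("task", p, dst, depth - 1))
                        deficit[p] += 1
        # 2. No local work and no missing acks: ready to become inactive.
        for p in range(num_procs):
            if local[p] or deficit[p] > 0:
                continue
            if p == ROOT:             # the source detects global termination
                quiet = not network and all(q is None for q in parent)
                return executed, quiet
            if parent[p] is not None:
                network.append(("ack", p, parent[p], None))  # final ack
                parent[p] = None
        # 3. Deliver one message.
        if network:
            kind, src, dst, depth = network.popleft()
            if kind == "ack":
                deficit[dst] -= 1
            else:
                local[dst].append(depth)
                if dst != ROOT and parent[dst] is None:
                    parent[dst] = src                 # first task: remember parent
                else:
                    network.append(("ack", dst, src, None))  # immediate ack

print(run_with_acks(4, 3))
```

The second returned value checks that, at the moment the source decides the computation is over, no message is in transit and every other process is inactive.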

(56)

Termination Detection

Using Acknowledgment Messages

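[Figure: a process moves from inactive to active when it receives its first task from its parent; while active it exchanges tasks and acks with other processes; it sends the final ack to its parent when it becomes inactive.]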

(57)

Termination Detection

Ring Termination Algorithm

processes become white when they terminate

when P0 becomes white, it sends a white token to P1

the token is passed to the next process in the ring when the process finishes, and:

if the color of the process is black, the token becomes black

otherwise, the token keeps the same color

a process Pi becomes black if it sends a message to process Pj and j < i

a black process becomes white when it passes the token

when P0 receives a black token, it passes a white token

when P0 receives a white token, all processes have terminated
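The token rules can be traced with a short simulation. It is a sketch under assumptions not stated in the slides: messages are delivered instantly (no message is ever in transit), time advances in discrete steps, and a task with value v > 0 executed on Pi sends a task with value v - 1 to P((i + 1) mod n), which is a made-up workload. The function returns how many times P0 had to launch a white token before termination was confirmed.

```python
from collections import deque

def ring_termination(initial_tasks):
    """initial_tasks[i] is the list of task values initially held by Pi."""
    n = len(initial_tasks)
    queues = [deque(tasks) for tasks in initial_tasks]
    color = ["white"] * n
    holder, token = 0, "white"   # P0 holds the token before the first launch
    launched = False
    launches = 0
    while True:
        # Every busy process executes one task; a task may send a new one.
        for i in range(n):
            if queues[i]:
                v = queues[i].popleft()
                if v > 0:
                    j = (i + 1) % n          # hypothetical destination rule
                    queues[j].append(v - 1)  # instant delivery (assumption)
                    if j < i:
                        color[i] = "black"   # sent to a lower-numbered process
        # The token holder passes the token only when it has finished.
        if not queues[holder]:
            if holder == 0:
                if launched and token == "white":
                    assert not any(queues)   # every process really is idle
                    return launches
                token, launched = "white", True   # (re)launch a white token
                launches += 1
            elif color[holder] == "black":
                token = "black"              # taint the token
                color[holder] = "white"      # and turn white again
            holder = (holder + 1) % n

print(ring_termination([[1], [], []]), ring_termination([[], [], [2]]))
```

In the second call P2 sends work back to P0, turns black, and taints the first token, so P0 needs a second round before it can conclude that all processes have terminated.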

(64)

Debugging

Most common tool for debugging of MPI programs: printf!

Program runs slower.

Difficulties for parallel debugging?

redirection of stdio: MPI takes care of this

by delaying some tasks, the behavior of the program may change significantly

Don’t forget fflush!

Debuggers can be used to examine the execution of individual processes. However:

they change program execution times significantly

they cannot capture relevant aspects of parallel computations such as timing of events
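The slides refer to C's printf and fflush. As a rough Python analogue (an illustration, not the course's code), the helper below tags each debug line with the rank of the process and flushes immediately, so output is not left in a buffer if the process later hangs or crashes. The name `debug_print` is hypothetical, and the rank is passed in as a plain integer rather than obtained from an MPI library.

```python
import io
import sys

def debug_print(rank, message, stream=sys.stdout):
    """Write one debug line tagged with the process rank, then flush."""
    stream.write(f"[rank {rank}] {message}\n")
    stream.flush()   # same purpose as fflush(stdout) after printf in C

# Usage: capture the output in a buffer to show the resulting format.
buffer = io.StringIO()
debug_print(2, "received task 17", stream=buffer)
captured = buffer.getvalue()
debug_print(0, "work pool empty")
```

Tagging every line with the rank makes interleaved output from several processes easier to attribute.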

(70)

Review

Load Balancing
    static
    dynamic
        centralized
        decentralized
Termination Detection
Debugging

(71)

Next Class
