• No results found

Branch-and-Bound with Peer-to-Peer for Large-Scale Grids

N/A
N/A
Protected

Academic year: 2021

Share "Branch-and-Bound with Peer-to-Peer for Large-Scale Grids"

Copied!
198
0
0

Loading.... (view fulltext now)

Full text

(1)

Branch-and-Bound with Peer-to-Peer

for Large-Scale Grids

Alexandre di Costanzo

INRIA - CNRS - I3S - Université de Sophia Antipolis Ph.D. defense

(2)
(3)

The Big Picture

(4)

The Big Picture

Objective

Solving combinatorial optimization problems with Grids

(5)

The Big Picture

Objective

Solving combinatorial optimization problems with Grids

(6)

The Big Picture

Objective

Solving combinatorial optimization problems with Grids

Approach

(7)

The Big Picture

Objective

Solving combinatorial optimization problems with Grids

Approach

Parallel Branch-and-Bound and Peer-to-Peer

(8)

The Big Picture

Objective

Solving combinatorial optimization problems with Grids

Approach

Parallel Branch-and-Bound and Peer-to-Peer

Contributions

(9)

The Big Picture

Objective

Solving combinatorial optimization problems with Grids

Approach

Parallel Branch-and-Bound and Peer-to-Peer

Contributions

(10)

The Big Picture

Objective

Solving combinatorial optimization problems with Grids

Approach

Parallel Branch-and-Bound and Peer-to-Peer

Contributions

1. Branch-and-Bound framework for Grids 2. Peer-to-Peer Infrastructure for Grids

(11)

Agenda

Context, Problem, and Related Work Contributions

Branch-and-Bound Framework for Grids Desktop Grid with Peer-to-Peer

Mixing Desktops & Clusters Perspectives & Conclusion

(12)
(13)

Context

Combinatorial Optimization Problems (COPs)

(14)

Context

Combinatorial Optimization Problems (COPs)

costly to solve (finding the best solution)

Branch-and-Bound (BnB)

well adapted for solving COPs

(15)

Context

Combinatorial Optimization Problems (COPs)

costly to solve (finding the best solution)

Branch-and-Bound (BnB)

well adapted for solving COPs

relatively easy to provide parallel version

Grid Computing

(16)
(17)

Branch-and-Bound

Consists of a partial enumeration of all

feasible solutions and returns the guaranteed

optimal solution

(18)

Branch-and-Bound

Feasible solutions are organized as a tree: search-tree

Consists of a partial enumeration of all

feasible solutions and returns the guaranteed

optimal solution

(19)

Branch-and-Bound

Feasible solutions are organized as a tree: search-tree

3 operations:

Consists of a partial enumeration of all

feasible solutions and returns the guaranteed

optimal solution

(20)

Branch-and-Bound

Feasible solutions are organized as a tree: search-tree

3 operations:

Branching: split in sub-problems

Consists of a partial enumeration of all

feasible solutions and returns the guaranteed

optimal solution

(21)

Branch-and-Bound

Feasible solutions are organized as a tree: search-tree

3 operations:

Branching: split in sub-problems

Bounding: compute lower/upper bounds (objective function)

Consists of a partial enumeration of all

feasible solutions and returns the guaranteed

optimal solution

(22)

Branch-and-Bound

Feasible solutions are organized as a tree: search-tree

3 operations:

Branching: split in sub-problems

Bounding: compute lower/upper bounds (objective function)

Pruning: eliminate bad branches

Consists of a partial enumeration of all

feasible solutions and returns the guaranteed

optimal solution

(23)

Branch-and-Bound

Feasible solutions are organized as a tree: search-tree

3 operations:

Branching: split in sub-problems

Bounding: compute lower/upper bounds (objective function)

Pruning: eliminate bad branches

Consists of a partial enumeration of all

feasible solutions and returns the guaranteed

optimal solution

(24)

Branch-and-Bound

Feasible solutions are organized as a tree: search-tree

3 operations:

Branching: split in sub-problems

Bounding: compute lower/upper bounds (objective function)

Pruning: eliminate bad branches

Well adapted for solving COPs [Papadimitriou 98]

return the best combinaison out of all

Consists of a partial enumeration of all

feasible solutions and returns the guaranteed

optimal solution

(25)

Branch-and-Bound

Feasible solutions are organized as a tree: search-tree

3 operations:

Branching: split in sub-problems

Bounding: compute lower/upper bounds (objective function)

Pruning: eliminate bad branches

Consists of a partial enumeration of all

feasible solutions and returns the guaranteed

optimal solution

(26)
(27)

Parallel Branch-and-Bound

COPs are difficult to solve:

(28)

Parallel Branch-and-Bound

COPs are difficult to solve:

enumeration size & NP-hard class

Many studies on parallel approach [Gendron 94, Crainic 06]

node-based: parallel bounding on sub-problems tree-based: building the tree in parallel

(29)

Parallel Branch-and-Bound

COPs are difficult to solve:

enumeration size & NP-hard class

Many studies on parallel approach [Gendron 94, Crainic 06]

node-based: parallel bounding on sub-problems tree-based: building the tree in parallel

multi-search: several trees are generated in parallel

Tree-based is the most studied [Authié 95 , Crainic 06]

the solution tree is not known beforehand

no part of the tree may be estimated at compilation tasks are dynamically generated

(30)

Parallel Branch-and-Bound

COPs are difficult to solve:

enumeration size & NP-hard class

Many studies on parallel approach [Gendron 94, Crainic 06]

node-based: parallel bounding on sub-problems tree-based: building the tree in parallel

multi-search: several trees are generated in parallel

Tree-based is the most studied [Authié 95 , Crainic 06]

the solution tree is not known beforehand

no part of the tree may be estimated at compilation tasks are dynamically generated

task allocations to processors must be done dynamically

(31)

Parallel Branch-and-Bound

COPs are difficult to solve:

enumeration size & NP-hard class

Many studies on parallel approach [Gendron 94, Crainic 06]

node-based: parallel bounding on sub-problems tree-based: building the tree in parallel

multi-search: several trees are generated in parallel

Tree-based is the most studied [Authié 95 , Crainic 06]

the solution tree is not known beforehand

no part of the tree may be estimated at compilation tasks are dynamically generated

(32)

Parallel Branch-and-Bound

COPs are difficult to solve:

enumeration size & NP-hard class

Many studies on parallel approach [Gendron 94, Crainic 06]

node-based: parallel bounding on sub-problems tree-based: building the tree in parallel

multi-search: several trees are generated in parallel

Tree-based is the most studied [Authié 95 , Crainic 06]

the solution tree is not known beforehand

no part of the tree may be estimated at compilation tasks are dynamically generated

task allocations to processors must be done dynamically

distributing issues, such as load-balancing & information sharing

Proposition: Parallel BnB + Grid

Tree-based is the most known by users & Related difficulties are known

(33)

BnB Related Work

Frameworks Algorithms Parallelization Machines PUBB Low-level, Basic B&B SPMD Cluster/PVM

BOB++ Low-level, Basic B&B SPMD Cluster/MPI

PPBB-Lib Basic B&B SPMD Cluster/PVM

PICO Basic B&B, Mixed-integer LP hier. master-worker Cluster/MPI

MallBa Low-level, Basic B&B SPMD Cluster/MPI

ZRAM Low-level, Basic B&B SPMD Cluster/PVM

ALPS/BiCePS

•Low-level

•Basic B&B

•Mixed-integer LP

•Branch&Price&Cut

hier. master-worker Cluster/MPI

Metacomputing MW Basic B&B master-worker Grids/Condor

(34)

BnB Related Work

Frameworks Algorithms Parallelization Machines PUBB Low-level, Basic B&B SPMD Cluster/PVM

BOB++ Low-level, Basic B&B SPMD Cluster/MPI

PPBB-Lib Basic B&B SPMD Cluster/PVM

PICO Basic B&B, Mixed-integer LP hier. master-worker Cluster/MPI

MallBa Low-level, Basic B&B SPMD Cluster/MPI

ZRAM Low-level, Basic B&B SPMD Cluster/PVM

ALPS/BiCePS

•Low-level

•Basic B&B

•Mixed-integer LP

•Branch&Price&Cut

hier. master-worker Cluster/MPI

Metacomputing MW Basic B&B master-worker Grids/Condor

Symphony Mixed-integer LP, Branch&Price&Cut master-worker Cluster/PVM

Most Previous work are based on

SPMD

and target

clusters

(35)
(36)

Grid Computing

Distributed shared computing infrastructure multi-institutional virtual organization

(37)

Grid Computing

Distributed shared computing infrastructure multi-institutional virtual organization

(38)

Grid Computing

Distributed shared computing infrastructure multi-institutional virtual organization

Provide large pool of resources New challenges geographically distributed deployment scalability communication fault-tolerance

multiple administrative domains, heterogeneity, high performance, programming model, etc.

(39)

Grid Computing

Distributed shared computing infrastructure multi-institutional virtual organization

Provide large pool of resources New challenges geographically distributed deployment scalability communication fault-tolerance

multiple administrative domains, heterogeneity, high performance, programming model, etc.

(40)

Grid Computing

Grids involve new concepts for development &

execution of applications

(41)

Grid Computing

Schedulers, Networking, OS

Grids involve new concepts for development &

execution of applications

(42)

Grid Computing

Grid Fabric Schedulers, Networking, OS

Federated Hardware Resources

Grid Middleware Infrastructure

Super-Schedulers, Resource Trading, Information, Security, etc.

Grids involve new concepts for development &

execution of applications

(43)

Grid Computing

Schedulers, Networking, OS

Grid Middleware Infrastructure

Super-Schedulers, Resource Trading, Information, Security, etc.

Grid Programming Models, Tools,

High-Level Access to Middleware

Grids involve new concepts for development &

execution of applications

(44)

Grid Computing

Grid Applications & Portals

Grid Fabric Schedulers, Networking, OS

Federated Hardware Resources

Grid Middleware Infrastructure

Super-Schedulers, Resource Trading, Information, Security, etc.

Grid Programming Models, Tools,

High-Level Access to Middleware

Grids involve new concepts for development &

execution of applications

(45)

Grid Computing

Grid Applications & Portals

Schedulers, Networking, OS

Grid Middleware Infrastructure

Super-Schedulers, Resource Trading, Information, Security, etc.

Grid Programming Models, Tools,

High-Level Access to Middleware

Grids involve new concepts for development &

execution of applications

(46)

Challenges

Combinatorial Optimization Problems

(47)

Challenges

Branch-and-Bound

adapted for solving COPs easy to parallelize

Combinatorial Optimization Problems

(48)

Challenges

Branch-and-Bound

adapted for solving COPs easy to parallelize

Grid Computing

huge number of resources

Combinatorial Optimization Problems

(49)

Challenges

Branch-and-Bound

adapted for solving COPs easy to parallelize

Grid Computing

huge number of resources

Combinatorial Optimization Problems

costly to solve

Use

Branch-and-Bound

on

Grids

for solving

COPs

(50)

Challenges

Branch-and-Bound

adapted for solving COPs easy to parallelize

Grid Computing

huge number of resources

Combinatorial Optimization Problems

costly to solve

Use

Branch-and-Bound

on

Grids

for solving

COPs

+

Efficient communications with Grids is difficult Problem with sharing bounds

(51)

Agenda

Context, Problem, and Related Work Contributions

Branch-and-Bound Framework for Grids Desktop Grid with Peer-to-Peer

Mixing Desktops & Clusters Perspectives & Conclusion

(52)
(53)

BnB for Grids Related Work

Aida et al. focus on the design:

(54)

BnB for Grids Related Work

Aida et al. focus on the design:

hierarchical master-worker scales on Grids

Foster et al. fully decentralized approach:

(55)

BnB for Grids Related Work

Aida et al. focus on the design:

hierarchical master-worker scales on Grids

Foster et al. fully decentralized approach:

communication overhead

ParadisEO focus on meta-heuristics (not exact solution):

(56)

BnB for Grids Related Work

Aida et al. focus on the design:

hierarchical master-worker scales on Grids

Foster et al. fully decentralized approach:

communication overhead

ParadisEO focus on meta-heuristics (not exact solution):

master-worker

Skeletons with farm or divide-and-conquer

(57)
(58)
(59)

BnB on Grids: Problems and Solutions

Latency

communications

Asynchronous

Scalability

Hierarchical master-

worker

Solution tree size

Dynamically generated

by splitting tasks

Share the best

(60)

BnB on Grids: Problems and Solutions

Latency

communications

Asynchronous

Scalability

Hierarchical master-

worker

Solution tree size

Dynamically generated

by splitting tasks

Share the best

bounds

Efficient parallelism &

communication

(61)

BnB on Grids: Problems and Solutions

Latency

communications

Asynchronous

Scalability

Hierarchical master-

worker

Solution tree size

Dynamically generated

by splitting tasks

Share the best

(62)

BnB on Grids: Problems and Solutions

Latency

communications

Asynchronous

Scalability

Hierarchical master-

worker

Solution tree size

Dynamically generated

by splitting tasks

Share the best

bounds

Efficient parallelism &

communication

(63)

BnB on Grids: Problems and Solutions

Latency

communications

Asynchronous

Scalability

Hierarchical master-

worker

Solution tree size

Dynamically generated

by splitting tasks

Share the best

(64)

BnB on Grids: Problems and Solutions

Latency

communications

Asynchronous

Scalability

Hierarchical master-

worker

Solution tree size

Dynamically generated

by splitting tasks

Share the best

bounds

Efficient parallelism &

communication

(65)

BnB on Grids: Problems and Solutions

Latency

communications

Asynchronous

Scalability

Hierarchical master-

worker

Solution tree size

Dynamically generated

by splitting tasks

Share the best

(66)

BnB on Grids: Problems and Solutions

Latency

communications

Asynchronous

Scalability

Hierarchical master-

worker

Solution tree size

Dynamically generated

by splitting tasks

Share the best

bounds

Efficient parallelism &

communication

(67)

BnB on Grids: Problems and Solutions

Latency

communications

Asynchronous

Scalability

Hierarchical master-

worker

Solution tree size

Dynamically generated

by splitting tasks

Share the best

(68)

BnB on Grids: Problems and Solutions

Latency

communications

Asynchronous

Scalability

Hierarchical master-

worker

Solution tree size

Dynamically generated

by splitting tasks

Share the best

bounds

Efficient parallelism &

communication

(69)

BnB on Grids: Problems and Solutions

Latency

communications

Asynchronous

Scalability

Hierarchical master-

worker

Solution tree size

Dynamically generated

by splitting tasks

Share the best

bounds

Efficient parallelism &

communication

Objective: hide Grid difficulties to users Especially communication problems

(70)
(71)

BnB Framework - Entities and Roles

Root Task:

implemented by users

(72)

BnB Framework - Entities and Roles

Root Task:

implemented by users

objective-function and splitting/branching operation

Master: Entry Point

splits the problem in tasks

(73)

BnB Framework - Entities and Roles

Root Task:

implemented by users

objective-function and splitting/branching operation

Master: Entry Point

splits the problem in tasks

collects partial-results ➟ the best solution

Sub-Master:

(74)

BnB Framework - Entities and Roles

Root Task:

implemented by users

objective-function and splitting/branching operation

Master: Entry Point

splits the problem in tasks

collects partial-results ➟ the best solution

Sub-Master:

intermediary between master and workers

Worker:

(75)

Master Sub-master Cluster Worker Worker Cluster Worker Worker Worker Cluster Worker Cluster Worker Sub-master Worker Worker Worker

BnB Framework - Architecture

(76)

Master Sub-master Cluster Worker Worker Worker Cluster Worker Worker Worker Cluster Worker Worker Worker Cluster Worker Worker Worker Sub-master Problem Worker Worker Worker Worker

BnB Framework - Architecture

(77)

Master Sub-master Cluster Worker Worker Cluster Worker Worker Worker Cluster Worker Cluster Worker Sub-master Problem

First splitting

Worker Worker Worker

BnB Framework - Architecture

(78)

Master Sub-master Cluster Worker Worker Worker Cluster Worker Worker Worker Cluster Worker Worker Worker Cluster Worker Worker Worker Sub-master Pending tasks Problem

First splitting

Worker Worker Worker Worker

BnB Framework - Architecture

(79)

Master Sub-master Cluster Worker Worker Cluster Worker Worker Worker Cluster Worker Cluster Worker Sub-master Pending tasks Problem Worker Worker Worker

BnB Framework - Architecture

(80)

Master Sub-master Cluster Worker Worker Worker Cluster Worker Worker Worker Cluster Worker Worker Worker Cluster Worker Worker Worker Sub-master Pending tasks Problem

Ask task

Worker Worker Worker Worker

BnB Framework - Architecture

(81)

Master Sub-master Cluster Worker Worker Cluster Worker Worker Worker Cluster Worker Cluster Worker Sub-master Pending tasks Problem Worker Worker Worker

BnB Framework - Architecture

(82)

Master Sub-master Cluster Worker Worker Worker Cluster Worker Worker Worker Cluster Worker Worker Worker Cluster Worker Worker Worker Sub-master Pending tasks Problem

Ask task

Worker Worker Worker Worker

BnB Framework - Architecture

(83)

Master Sub-master Cluster Worker Worker Cluster Worker Worker Worker Cluster Worker Cluster Worker Sub-master Pending tasks Problem Worker Worker Worker

BnB Framework - Architecture

(84)

Master Sub-master Cluster Worker Worker Worker Cluster Worker Worker Worker Cluster Worker Worker Worker Cluster Worker Worker Worker Sub-master Pending tasks Problem

Got task

Worker Worker Worker Worker

BnB Framework - Architecture

(85)

Master Sub-master Cluster Worker Worker Cluster Worker Worker Worker Cluster Worker Cluster Worker Sub-master Pending tasks Problem Worker Worker Worker

BnB Framework - Architecture

(86)

Master Sub-master Cluster Worker Worker Worker Cluster Worker Worker Worker Cluster Worker Worker Worker Cluster Worker Worker Worker Sub-master Pending tasks Problem

Got task

Worker Worker Worker Worker

BnB Framework - Architecture

(87)

Master Sub-master Cluster Worker Worker Cluster Worker Worker Worker Cluster Worker Cluster Worker Sub-master Pending tasks Problem Worker Worker Worker

BnB Framework - Architecture

(88)

Master Sub-master Cluster Worker Worker Worker Cluster Worker Worker Worker Cluster Worker Worker Worker Cluster Worker Worker Worker Sub-master Pending tasks Problem

Computing

Worker Worker Worker Worker

BnB Framework - Architecture

(89)

Master Sub-master Cluster Worker Worker Cluster Worker Worker Worker Cluster Worker Cluster Worker Sub-master Pending tasks Problem

Sending result

Worker Worker Worker

BnB Framework - Architecture

(90)

Master Sub-master Cluster Worker Worker Worker Cluster Worker Worker Worker Cluster Worker Worker Worker Cluster Worker Worker Worker Sub-master Pending tasks Problem

Computing

Worker Worker Worker Worker

BnB Framework - Architecture

(91)

Master Sub-master Cluster Worker Worker Cluster Worker Worker Worker Cluster Worker Cluster Worker Sub-master Pending tasks Problem Worker Worker Worker

BnB Framework - Architecture

(92)

Master Sub-master Cluster Worker Worker Worker Cluster Worker Worker Worker Cluster Worker Worker Worker Cluster Worker Worker Worker Sub-master

Pending tasks Partial results Problem

Computing

Worker Worker Worker Worker

BnB Framework - Architecture

(93)
(94)

BnB Framework - Solutions

Context ProActive Java Grid middleware: latency ➟ asynchronous communication

(95)

BnB Framework - Solutions

Context ProActive Java Grid middleware: latency ➟ asynchronous communication

underlaying Grid infrastructure ➟ deployment framework (abstraction)

Implement the tree-based parallel

Master-worker architecture

Problem: workers need to share bounds

(96)

BnB Framework - Solutions

Context ProActive Java Grid middleware: latency ➟ asynchronous communication

underlaying Grid infrastructure ➟ deployment framework (abstraction)

Solution 1: Master keeps the bound

Implement the tree-based parallel

Master-worker architecture

Problem: workers need to share bounds

(97)

BnB Framework - Solutions

Context ProActive Java Grid middleware: latency ➟ asynchronous communication

underlaying Grid infrastructure ➟ deployment framework (abstraction)

Solution 1: Master keeps the bound

- previous work shows that not scale [Aida 2003]

Implement the tree-based parallel

Master-worker architecture

Problem: workers need to share bounds

(98)

BnB Framework - Solutions

Context ProActive Java Grid middleware: latency ➟ asynchronous communication

underlaying Grid infrastructure ➟ deployment framework (abstraction)

Solution 1: Master keeps the bound

- previous work shows that not scale [Aida 2003]

Solution 2: Message framework (Enterprise Service Bus)

Implement the tree-based parallel

Master-worker architecture

Problem: workers need to share bounds

(99)

BnB Framework - Solutions

Context ProActive Java Grid middleware: latency ➟ asynchronous communication

underlaying Grid infrastructure ➟ deployment framework (abstraction)

Solution 1: Master keeps the bound

- previous work shows that not scale [Aida 2003]

Solution 2: Message framework (Enterprise Service Bus)

- Grid middleware dependent / Good for SOA

Implement the tree-based parallel

Master-worker architecture

Problem: workers need to share bounds

(100)

BnB Framework - Solutions

Context ProActive Java Grid middleware: latency ➟ asynchronous communication

underlaying Grid infrastructure ➟ deployment framework (abstraction)

Solution 1: Master keeps the bound

- previous work shows that not scale [Aida 2003]

Solution 2: Message framework (Enterprise Service Bus)

- Grid middleware dependent / Good for SOA

Solution 3: Broadcasting

Implement the tree-based parallel

Master-worker architecture

Problem: workers need to share bounds

(101)

BnB Framework - Solutions

Context ProActive Java Grid middleware: latency ➟ asynchronous communication

underlaying Grid infrastructure ➟ deployment framework (abstraction)

Solution 1: Master keeps the bound

- previous work shows that not scale [Aida 2003]

Solution 2: Message framework (Enterprise Service Bus)

- Grid middleware dependent / Good for SOA

Implement the tree-based parallel

Master-worker architecture

Problem: workers need to share bounds

(102)

BnB Framework - Solutions

Context ProActive Java Grid middleware: latency ➟ asynchronous communication

underlaying Grid infrastructure ➟ deployment framework (abstraction)

Solution 1: Master keeps the bound

- previous work shows that not scale [Aida 2003]

Solution 2: Message framework (Enterprise Service Bus)

- Grid middleware dependent / Good for SOA

Solution 3: Broadcasting

- 1 to n communication cannot scale

Implement the tree-based parallel

Master-worker architecture

Problem: workers need to share bounds

(103)
(104)

Organizing Communications for Broadcasting

(105)

Organizing Communications for Broadcasting

Idea: Grids are composed of clusters ➟ organizing Workers in groups

(106)

Organizing Communications for Broadcasting

Idea: Grids are composed of clusters ➟ organizing Workers in groups

(107)

Organizing Communications for Broadcasting

Idea: Grids are composed of clusters ➟ organizing Workers in groups

clusters are high-performance communication environments Solution:

add a new entity for organizing communications: Leader

(108)

Organizing Communications for Broadcasting

Idea: Grids are composed of clusters ➟ organizing Workers in groups

clusters are high-performance communication environments Solution:

add a new entity for organizing communications: Leader

(109)

Organizing Communications for Broadcasting

Idea: Grids are composed of clusters ➟ organizing Workers in groups

clusters are high-performance communication environments Solution:

add a new entity for organizing communications: Leader

Leader is a Worker chose by the Master for each group Process to update Bounds:

(110)

Cluster Worker Worker Worker Leader Cluster Worker Leader Worker Worker Cluster Worker Leader Worker Worker Cluster Leader Worker Worker Worker Sub-master Sub-master

(111)

Cluster Worker Worker Cluster Worker Leader Worker Worker Cluster Worker Leader Cluster Leader Worker Sub-master Sub-master

(112)

Cluster Worker Worker Worker Leader Cluster Worker Leader Worker Worker Cluster Worker Leader Worker Worker Cluster Leader Worker Worker Worker Sub-master Sub-master

(113)

Cluster Worker Worker Cluster Worker Leader Worker Worker Cluster Worker Leader Cluster Leader Worker Sub-master Sub-master

(114)

Cluster Worker Worker Worker Leader Cluster Worker Leader Worker Worker Cluster Worker Leader Worker Worker Cluster Leader Worker Worker Worker Sub-master Sub-master

new best bound!

(115)

Cluster Worker Worker Cluster Worker Leader Worker Worker Cluster Worker Leader Cluster Leader Worker Sub-master Sub-master

(116)

Cluster Worker Worker Worker Leader Cluster Worker Leader Worker Worker Cluster Worker Leader Worker Worker Cluster Leader Worker Worker Worker Sub-master Sub-master

new best bound!

(117)

Cluster Worker Worker Cluster Worker Leader Worker Worker Cluster Worker Leader Cluster Leader Worker Sub-master Sub-master

(118)

Cluster Worker Worker Worker Leader Cluster Worker Leader Worker Worker Cluster Worker Leader Worker Worker Cluster Leader Worker Worker Worker Sub-master Sub-master

new best bound!

(119)

Cluster Worker Worker Cluster Worker Leader Worker Worker Cluster Worker Leader Cluster Leader Worker Sub-master Sub-master

BnB framework - Communications for Sharing Bound

Grid middleware provides communications

(120)

BnB Search Strategies

The improvement of the Bound

1. depends of how it is shared (communication) 2. depends of how the search-tree is generated:

Classical Depth-First Search Breadth-First Search Contribution First-In-First-Out (FIFO) Priority open API ...

(121)
(122)

BnB Framework: Fault-Tolerance

(123)

BnB Framework: Fault-Tolerance

Manage user exception ➟ computation stopped

(124)

BnB Framework: Fault-Tolerance

Manage user exception ➟ computation stopped

For us, a fault is a Failed Stop Worker fault ➟ handled by (sub-)master

(125)

BnB Framework: Fault-Tolerance

Manage user exception ➟ computation stopped

For us, a fault is a Failed Stop Worker fault ➟ handled by (sub-)master

(126)

BnB Framework: Fault-Tolerance

Manage user exception ➟ computation stopped

For us, a fault is a Failed Stop Worker fault ➟ handled by (sub-)master

Leader fault ➟ master choose new one

(127)

BnB Framework: Fault-Tolerance

Manage user exception ➟ computation stopped

For us, a fault is a Failed Stop Worker fault ➟ handled by (sub-)master

Leader fault ➟ master choose new one

Sub-master fault ➟ manager change a worker to sub-master

(128)

BnB Framework: Fault-Tolerance

Manage user exception ➟ computation stopped

For us, a fault is a Failed Stop Worker fault ➟ handled by (sub-)master

Leader fault ➟ master choose new one

Sub-master fault ➟ manager change a worker to sub-master

Master fault ➟ back-up file

(129)

BnB Framework: Fault-Tolerance

Manage user exception ➟ computation stopped

For us, a fault is a Failed Stop Worker fault ➟ handled by (sub-)master

Leader fault ➟ master choose new one

Sub-master fault ➟ manager change a worker to sub-master

Master fault ➟ back-up file

(130)
(131)

Grid’BnB

[HiPC07]

Features

(132)

Grid’BnB

[HiPC07]

Features

Asynchronous communications

Hierarchical master-worker with com. Dynamic task splitting

Efficient communications with groups Fault-tolerance

(133)

Grid’BnB

[HiPC07]

Features

Asynchronous communications

Hierarchical master-worker with com. Dynamic task splitting

Efficient communications with groups Fault-tolerance

Design

(134)

Grid’BnB

[HiPC07]

Features

Asynchronous communications

Hierarchical master-worker with com. Dynamic task splitting

Efficient communications with groups Fault-tolerance

Hidden parallelism and Grid difficulties API for COPs

Ease of deployment Principally tree-based

Implementing and testing search strategies Focus on objective function

Design

(135)

Grid’BnB

[HiPC07]

Features

Asynchronous communications

Hierarchical master-worker with com. Dynamic task splitting

Efficient communications with groups Fault-tolerance

Hidden parallelism and Grid difficulties API for COPs

Ease of deployment Principally tree-based

Design

Users

(136)

Flow-Shop Experiments

NP-hard permutation optimization problem

m1 m2 m3 m4 o1,1 o1,2 o1,3 o1,4 o2,1 o2,2 o2,3 o2,4 o3,1 o3,2 o3,3 o3,4 o4,1 o4,2 o4,3 o4,4

J = {j

1

, j

2

, . . . j

n

}

j

i

= { o

i1

, o

i2

, . . . o

im

}

M = {m

1

,m

2

, . . .m

m

}

(137)

Flow-Shop: 16 Jobs / 20 Machines 0 50 100 150 200 250 Time in minutes

(138)

Flow-Shop: 16 Jobs / 20 Machines 0 10 20 30 40 50 60 70 80 20 30 40 50 60 CPUs Time in minutes 1 2 3 Ratio Factor

With Group Communications Without Group Communications

(139)

Grid'5000

: the French Grid

Sophia Grenoble Lyon Nancy Orsay Lille Rennes Bordeaux Toulouse
(140)

Flow-Shop: 17 Jobs / 17 Machines 0 20 40 60 80 100 120 0 100 200 300 400 500 600 700 CPUs Time in minutes

Grid Experimentations

up to 621 CPUs on 5 sites
(141)

Flow-Shop: 17 Jobs / 17 Machines 20 40 60 80 100 120 Time in minutes

Grid Experimentations

Flow-Shop: 17 Jobs / 17 Machines

20 40 60 80 100 120 Time in minutes 0,05 0,1 0,15 0,2 0,25 0,3

% of explored search tree

(142)

Flow-Shop: 17 Jobs / 17 Machines 0 20 40 60 80 100 120 0 100 200 300 400 500 600 700 CPUs Time in minutes

Grid Experimentations

Flow-Shop: 17 Jobs / 17 Machines

0 20 40 60 80 100 120 0 100 200 300 400 500 600 700 Time in minutes 0 0,05 0,1 0,15 0,2 0,25 0,3

% of explored search tree

up to 621 CPUs on 5 sites

(143)

Speedup Anomalies

& Efficiency

Parallel tree-based speedup may sometimes quite

spectacular (> or < linear)[Mans 95]

Speedup Anomalies in BnB [Roucairol 87, Lai 84, Li 86]

speedup depends of how the tree is dynamically built

Efficiency: is a related measure computed as the speedup

(144)

Speedup Anomalies

& Efficiency

Parallel tree-based speedup may sometimes quite

spectacular (> or < linear)[Mans 95]

Speedup Anomalies in BnB [Roucairol 87, Lai 84, Li 86]

speedup depends of how the tree is dynamically built

Efficiency: is a related measure computed as the speedup

divided by the number of processors.

Flow-Shop: 17 Jobs / 17 Machines

0 0,2 0,4 0,6 0,8 1 1,2 Efficiency

(145)
(146)

Grid'BnB: Results

Experimentally validate our BnB framework for Grids

validity of organizing communications

(147)

Grid'BnB: Results

Experimentally validate our BnB framework for Grids

validity of organizing communications

(148)

Grid'BnB: Results

Experimentally validate our BnB framework for Grids

validity of organizing communications

scalability on Grid (up to 621 CPUs on 5 sites)

Problems:

deployment on Grids is difficult

dynamically acquiring new resources is difficult

popularity of Grid'5000

(149)

Grid'BnB: Results

Experimentally validate our BnB framework for Grids

validity of organizing communications

scalability on Grid (up to 621 CPUs on 5 sites)

Problems:

deployment on Grids is difficult

dynamically acquiring new resources is difficult

popularity of Grid'5000

Grid middleware needs a better supporting of dynamic infrastructure

(150)

Agenda

Context, Problem, and Related Work Contributions

Branch-and-Bound Framework for Grids Desktop Grid with Peer-to-Peer

Mixing Desktops & Clusters Perspectives & Conclusion

(151)

Peer-to-Peer as Grid Middleware

Grid Computing and Peer-to-Peer share a common goal:

sharing resources [Foster 03, Goux 00]

Grid related work: [Globus]

providing computational resources

-

installing/deploying Grid middleware is difficult

P2P related work: [Gnutella, Freenet, DHT]

-

focusing on sharing data & mono-application
(152)

Peer-to-Peer as Grid Middleware

Grid Computing and Peer-to-Peer share a common goal:

sharing resources [Foster 03, Goux 00]

Grid related work: [Globus]

providing computational resources

-

installing/deploying Grid middleware is difficult

P2P related work: [Gnutella, Freenet, DHT]

-

focusing on sharing data & mono-application

dynamic & easy to deploy

Objective: provide a P2P infrastructure for Grids and sharing computational resources

(153)

Which P2P Architecture?

Master-Worker (SETI@home)

• centralized

- targets only desktops

• good for embarrassingly parallel

Pure/Unstructured Peer-to-Peer (Gnutella)

- flooding problems

✓ supports high-churn (good for Desktop Grids)

✓ supports many kind of application (data, computational)

Hybrid Peer-to-Peer (JXTA)

• uses central servers

✓ limits the flooding

- has to manage churn

Worker

Master

Worker Worker Worker Worker ... ...

T'T'' T'''

pending tasks

R' R'' R'''

partial results asking for a task

sending a task sending a partial result

computing task computing task

B A Resource Y B A Resource Y B A Resource Y

Peers in acquaintance Message query Peer

Flooding

Unstructured P2P Peer A accesses directly to resource Y on B

B A Resource Y B A Resource Y B A Resource Y Flooding

Hybrid P2P Peer A accesses directly to resource Y on B

B -> Y ...

(154)

Which P2P Architecture?

Master-Worker (SETI@home)

• centralized

- targets only desktops

• good for embarrassingly parallel

Pure/Unstructured Peer-to-Peer (Gnutella)

- flooding problems

✓ supports high-churn (good for Desktop Grids)

✓ supports many kind of application (data, computational)

Hybrid Peer-to-Peer (JXTA)

• uses central servers

✓ limits the flooding

- has to manage churn

Structured Peer-to-Peer (Chord)

- high cost for managing churn

• efficient for data sharing (Distributed Hash Table)

Worker

Master

Worker Worker Worker Worker ... ...

T'T'' T'''

pending tasks

R' R'' R'''

partial results asking for a task

sending a task sending a partial result

computing task computing task

B A Resource Y B A Resource Y B A Resource Y

Peers in acquaintance Message query Peer

Flooding

Unstructured P2P Peer A accesses directly to resource Y on B

B A Resource Y B A Resource Y B A Resource Y Peers in acquaintance Message query Peer Flooding

Hybrid P2P Peer A accesses directly to resource Y on B

Server B -> Y ... N1 N8 N14 N21 N48 N51 N56 lookup(K54) 0 2 -1m K54 Finger table N42 N8+32 N32 N8+16 N21 N8+8 N14 N8+4 N8+2 N14 N14 N8+1 m=6

Pure Peer-to-Peer is the most adapted Needs to avoid the flooding problem

(155)

Contribution & Positioning

Grid Applications & Portals

Grid Programming Grid Middleware

Infrastructure

(156)

Contribution & Positioning

Grid Applications & Portals

Grid Programming Grid Middleware Infrastructure Grid Fabric Branch-and-Bound API P2P Infrastructure

(157)
(158)

The Peer-to-Peer Infrastructure

[CMST06]

Pure Peer-to-Peer overlay network

Using it as Grid middleware infrastructure

The proposed solution to avoid the flooding problem:

3 node-request protocols:

1 node: Random walk algorithm

n nodes: Breadth-First-Search (BFS) algorithm with acknowledgement

max nodes: BFS without acknowledgement

(159)

INRIA Sophia P2P Desktop Grid

Need to validate the infrastructure with desktops

260 desktops at INRIA Sophia lab

No disturbing normal users: running in low priority

working schedules:

24/24 ≈ 50 machines (INRIA-2424)

(160)

INRIA Sophia P2P Desktop Grid

Need to validate the infrastructure with desktops

260 desktops at INRIA Sophia lab

No disturbing normal users: running in low priority

working schedules:

24/24 ≈ 50 machines (INRIA-2424)

night/weekend ≈ 260 machines (INRIA-ALL) Deployed our P2P infrastructure as permanent

(161)

INRIA Sophia P2P Desktop Grid

Need to validate the infrastructure with desktops

260 desktops at INRIA Sophia lab

No disturbing normal users: running in low priority

working schedules:

24/24 ≈ 50 machines (INRIA-2424)

night/weekend ≈ 260 machines (INRIA-ALL)

INRIA Sophia Antipolis - Desktop Machines Network

INRIA Sub Network INRIA Sub Network

INRIA Sub Network INRIA Sub Network

(162)

Long Running Experiments with the P2P Desktop Grid

Context: ETSI Grid Plugtests contest ➟ n-queens n-queens:

embarrassingly parallel / CPU intensive / master-worker

(163)

Long Running Experiments with the P2P Desktop Grid

Context: ETSI Grid Plugtests contest ➟ n-queens n-queens:

embarrassingly parallel / CPU intensive / master-worker

(164)

n-queens Experiment Results

World Record [Sloane Integers Sequence A000170]

What we learn from this experiments:

validate the workability of the infrastructure validate the robustness of the infrastructure

hard to forecast machine's performances

Total # of Tasks 12,125,199

Task Computation ≈ 138''

Computation Time ≈ 185 days

Cumulated Time ≈ 53 years

# of Desktop Machines 260

(165)

Agenda

Context, Problem, and Related Work Contributions

Branch-and-Bound Framework for Grids Desktop Grid with Peer-to-Peer

Mixing Desktops & Clusters Perspectives & Conclusion

(166)

Mixing Desktops & Clusters

[PARCO07]

Grid'BnB

Parallel BnB Framework for Grid

P2P Infrastructure

(167)

Mixing Desktops & Clusters

[PARCO07]

Grid'BnB

Parallel BnB Framework for Grid

P2P Infrastructure

as Grid Midlleware

API for solving COPs & Dynamic Grid Infrastructure

(168)

Mixing Desktops & Clusters

[PARCO07]

Grid'BnB

Parallel BnB Framework for Grid

P2P Infrastructure

as Grid Midlleware

API for solving COPs & Dynamic Grid Infrastructure

+

(169)

Mixing Desktops & Clusters

[PARCO07]

Grid'BnB

Parallel BnB Framework for Grid

P2P Infrastructure

as Grid Midlleware

API for solving COPs & Dynamic Grid Infrastructure

+

Validate by experiments on a Grid of Desktops & Clusters

(170)

Sharing Cluster's Nodes with P2P

Host

shared node

Peer

(171)

Sharing Cluster's Nodes with P2P

Host

shared node

Peer

Peer sharing local node

Cluster Host shared node Host shared node Host shared node Host shared node

...

Host Peer
(172)

Testbed

G5K Platform INRIA Sophia - Desktops

Orsay GDX

Sophia Lyon

(173)

Testbed

G5K Platform INRIA Sophia - Desktops

Orsay GDX

Sophia Lyon

G5K Platform INRIA Sophia - Desktops

Orsay GDX Sophia Lyon Peer Peer Peer Peer Node Node Node Node

(174)

Testbed

G5K Platform INRIA Sophia - Desktops

Orsay GDX

Sophia Lyon

Toulouse

G5K Platform INRIA Sophia - Desktops

Orsay GDX Sophia Lyon Toulouse Peer Peer Peer Peer Node Node Node Node Peer Node G5K Platform INRIA Sophia - Desktops

Orsay GDX Sophia Lyon Toulouse Peer Peer Peer Peer Peer Peer Peer Peer Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node RMI/SSH

(175)

Large-Scale Experiments

Goal: validate the infrastructure by experiments With n-queens:

no communication between workers

test the workability of the infrastructure With flow-shop:

(176)

1007 859 572 380 250 0 60 120 180 240 300 238,68 190,63 63,68 39,01 24,49 NQueens with n=22 Time in minutes

Total Number of CPUs for Experimentations

1007 859 572 380 250 0 210 420 630 840 1!050 50 298 302 70 70 140 175 176 156 80 80 80 250 240 247 305 349 INRIA-ALL G5K Lyon G5K Sophia G5K Bordeaux G5K Orsay G5K Toulouse Total !of CPUs Lyon Sophia Orsay Bordeaux Toulouse Sophia Sophia Sophia Lyon

Lyon Bord. Orsay

(177)

628 346 321 313 220 201 80 0 21,667 65,000 108,333 125,55 61,31 86,19 56,03 83,73 59,14 61,15 Time in minutes

Total Number of CPUs for Experimentations

628 346 321 313 220 201 80 0 140 280 420 560 700 142 186 80 256 290 90 62 146 163 173 116 56 24 55 57 57 58 56 42 Total # of CPUs Lyon Sophia Sophia Sophia Sophia Orsay Orsay Bordeaux

Bord. Nancy Rennes Toulouse

(178)

628 346 321 313 220 201 80 0 21,667 65,000 108,333 125,55 61,31 86,19 56,03 83,73 59,14 61,15

Flow-Shop 17 jobs / 17 machines Time in minutes

Total Number of CPUs for Experimentations

628 346 321 313 220 201 80 0 140 280 420 560 700 142 186 80 256 290 90 62 146 163 173 116 56 24 55 57 57 58 56 42 INRIA-2424 G5K Lyon G5K Sophia G5K Bordeaux G5K Orsay G5K Nancy Total # of CPUs Lyon Sophia Sophia Sophia Sophia Orsay Orsay Bordeaux

Bord. Nancy Rennes Toulouse

Flow-Shop Results

With desktops & clusters speedup anomalies occur more

(179)

Mixing - Analysis

N-Queens problem scales well up to 1007 CPUs

349 Desktops + 5 Clusters

Flow-Shop up to 628 CPUs

42 Desktops + 5 Clusters

worse performances than using only clusters (anomalies

are more frequents)

(180)
(181)

P2P as a Meta-Grid infrastructure

Observation from Grid'5000:

300 nodes available for 2 minutes provide best-effort queue

(182)

P2P as a Meta-Grid infrastructure

Observation from Grid'5000:

300 nodes available for 2 minutes provide best-effort queue

Idea: take benefit from these nodes [Condor]

by hand: not easy

with the P2P infrastructure:

(183)

P2P as a Meta-Grid infrastructure

Observation from Grid'5000:

300 nodes available for 2 minutes provide best-effort queue

Idea: take benefit from these nodes [Condor]

by hand: not easy

with the P2P infrastructure:

(184)

P2P as a Meta-Grid infrastructure

Observation from Grid'5000:

300 nodes available for 2 minutes provide best-effort queue

Idea: take benefit from these nodes [Condor]

by hand: not easy

with the P2P infrastructure:

deploying peers in best-effort queue

Result: a permanent P2P infrastructure over Grid'5000

(185)

P2P with N-Queens on Grid'5000

400 600 800 1000 1200 1400 # of CPUs Experimentation: n22.log.40 400 600 800 1000 1200 1400 1600 1800 # of Tasks
(186)

P2P with N-Queens on Grid'5000

Experimentation time in minutes 200 400 600 800 1000 1200 1400 # of CPUs Experimentation: n22.log.40 0 5 10 15 20 25 30 35 40200 400 600 800 1000 1200 1400 1600 1800 # of Tasks

1384 CPUs - 9 sites

# of workers by minutes

Embarrassingly applications can take benefit from Meta-Grid Infrastructure

(187)

Agenda

Context, Problem, and Related Work Contributions

Branch-and-Bound Framework for Grids Desktop Grid with Peer-to-Peer

Mixing Desktops & Clusters Perspectives & Conclusion

(188)

Summary

Grid'BnB: a BnB framework for Grids

communication between workers organizing workers in groups

Grid infrastructure

based on P2P architecture mixing desktops and clusters

(189)

Perspectives

Peer-to-Peer Infrastructure:

Job Scheduler

Resource Localization (PhD)

Large-Scale Experiments:

International Grid: France, Japan, and Netherlands

Grid Pugtests Deployment:

Contracts in Grids (GCM Standard)

Industrialization:

(190)
(191)

Conclusion

Branch-and-Bound for Grids

solve COPs

hide Grid difficulties

(192)

Conclusion

Branch-and-Bound for Grids

solve COPs

hide Grid difficulties

(193)

Conclusion

Branch-and-Bound for Grids

solve COPs

hide Grid difficulties

communication between workers

Peer-to-Peer as Grid infrastructure

(194)

Conclusion

Branch-and-Bound for Grids

solve COPs

hide Grid difficulties

communication between workers

Peer-to-Peer as Grid infrastructure

(195)

Conclusion

Branch-and-Bound for Grids

solve COPs

hide Grid difficulties

communication between workers

Peer-to-Peer as Grid infrastructure

mixing desktops and clusters

(196)

Conclusion

Branch-and-Bound for Grids

solve COPs

hide Grid difficulties

communication between workers

Peer-to-Peer as Grid infrastructure

mixing desktops and clusters

Tested and experimented

(197)

Conclusion

Branch-and-Bound for Grids

solve COPs

hide Grid difficulties

communication between workers

Peer-to-Peer as Grid infrastructure

mixing desktops and clusters

Tested and experimented

(198)

[SCCC05] Balancing Active Objects on a Peer to Peer Infrastructure

Javier Bustos-Jimenez, Denis Caromel, Alexandre di Costanzo, Mario Leyton and Jose M. Piquer. Proceedings of the XXV International Conference of the Chilean Computer Science Society (SCCC 2005), Valdivia, Chile, November 2005.

[HIC06] Executing Hydrodynamic Simulation on Desktop Grid with ObjectWeb ProActive

Denis Caromel, Vincent Cavé, Alexandre di Costanzo, Céline Brignolles, Bruno Grawitz, and Yann Viala. HIC2006: Proceedings of the 7th International Conference on HydroInformatics, Nice, France, September 2006.

[HiPC07] A Parallel Branch & Bound Framework for Grids

Denis Caromel, Alexandre di Costanzo, Laurent Baduel, and Satoshi Matsuoka. Grid’BnB: HiPC’07, Goa, India, December 2007.

[CMST06] ProActive: an Integrated platform for programming and running applications on grids and P2P systems

Denis Caromel, Christian Delbé, Alexandre di Costanzo, and Mario Leyton. Journal on Computational Methods in Science and Technology, volume 12, 2006.

[PARCO07] Peer-to-Peer for Computational Grids: Mixing Clusters and Desktop Machines

Denis Caromel, Alexandre di Costanzo, and Clément Mathieu. Parallel Computing Journal on Large Scale Grid, 2007.

[FGCS07] Peer-to-Peer and Fault-tolerance: Towards Deployment-based Technical Services

References

Related documents

Xaivia will be graduating from Cumberland High School later this month with a GPA of 3.66.She plans to attend Old Dominion University to major in nursing on a pre-law advising

Consistent with Hendricks and Singhal (2003a), the abnormal return during the announcement period is negative and statistically significant. We also observe statistically

• Discuss and identify industrial needs in matter of education and training in the field of maintenance , dismantling , decommissioning , waste management in nuclear facilities ,. •

- Organizational communication of changes on reward structures 4. Labour market characteristics and external equity. Grading - Group assignment with 3 phases: the weight of

 We agreed the amounts reported by the University on its supplemental schedule of fines and assessments, as reported in the annual financial statement audit for the year ended

18 In the CTA model, extreme Type B undergraduates are substantially more likely (3:1 odds) than extreme Type As to score at lowest levels on the anticipation dimension

auxiliary feature: As far as the core functionality is concerned, half of the users did not want to see advertisements while driving when the navigation... support was on, either

knowledgeable and experienced foreclosure attorney like the attorneys at Alliance Legal Group will know all the nuances of the legal process, what your rights are, and how to work