• No results found

Toward performance prediction for Multi-BSP programs in ML

N/A
N/A
Protected

Academic year: 2021

Share "Toward performance prediction for Multi-BSP programs in ML"

Copied!
54
0
0

Loading.... (view fulltext now)

Full text

(1)

Toward performance prediction for Multi-BSP programs in ML

Victor ALLOMBERT1, Frédéric GAVA2, Julien TESSON2

1lifo - Université d’Orléans, France

2lacl - Université Paris Est, France

13 December 2018

(2)

Table of Contents

1 Introduction

2 Semantics with costs 3 Experiments

4 Conclusion

(3)

Table of Contents

1 Introduction

Structured parallel computing bsp and bsml

multi-bsp and multi-ml 2 Semantics with costs

3 Experiments 4 Conclusion

(4)

The world of parallel computing

Simulations:

Fluid simulation 3D Visualisation

Big-Data:

IoT

Social Networking Data science

Symbolic computation:

Model-Checking Formal computing

Super-computer

(5)

Distributed computing

Characterised by:

Interconnected units

Distributed memory

Communication network

mpi

Core Memory Core Memory Core Memory Core Memory

(6)

Bulk Synchronous Parallelism

The bsp computer Defined by:

p pairs CPU/memory

Communication network

Synchronisation unit

Super-steps execution Properties:

Confluent

Deadlock-free

Predictable performances

local

computations p0 p1 p2 p3

communication barrier

next super-step ... ... ... ...

(7)

Bulk Synchronous Parallelism

BSP cost model

The number of processors p;

The time L required for a barrier;

The time g for collectively delivering a 1-relation.

(expressed as multiples the local processing speed r)

Superstep’s cost formulae

Cost(s) = max

0≤i<pwsi + max

0≤i<phsi × g + L

(8)

Bulk Synchronous ml

What is bsml?

Explicit bsp programming with a functional approach

Based upon ml and implemented over ocaml

Formal semantics→ computer-assisted proofs (coq)

Main idea

Parallel data structure⇒ parallel vector:

f0 f1 ... fp−1 parallel vector Replicated part (bsp)

Sequential part

(9)

Bulk Synchronous ml

What is bsml?

Explicit bsp programming with a functional approach

Based upon ml and implemented over ocaml

Formal semantics→ computer-assisted proofs (coq)

Main idea

Parallel data structure⇒ parallel vector:

f0 f1 ... fp−1 parallel vector Replicated part (bsp)

Sequential part

(10)

Bulk Synchronous ml

What is bsml?

Explicit bsp programming with a functional approach

Based upon ml and implemented over ocaml

Formal semantics→ computer-assisted proofs (coq)

Main idea

Parallel data structure⇒ parallel vector:

f0 f1 ... fp−1 parallel vector Replicated part (bsp)

Sequential part

(11)

Bulk Synchronous ml

What is bsml?

Explicit bsp programming with a functional approach

Based upon ml and implemented over ocaml

Formal semantics→ computer-assisted proofs (coq)

Main idea

Parallel data structure⇒ parallel vector:

f0 f1 ... fp−1 parallel vector Replicated part (bsp)

Sequential part

(12)

Hierarchical architectures

Characterised by:

Interconnected units

Both shared and distributed memories

Hierarchical memories

Core Memory

Core Memory Core Memory

Core Memory

Memory

Core Memory

Core Memory Core Memory

Core Memory

Memory Core

Memory Core Memory Core Memory

Core Memory

Memory

Core Memory

Core Memory Core Memory

Core Memory

Memory

(13)

multi-bsp

1 A tree structure with nested components

2 Where nodes have a storage capacity

3 And leaves are processors

4 With sub-synchronisation capabilities  

Stage 3

Network

RAM

P3= 4

Stage 2

RAM

L3

L2/L1

Core

P1= 8 P2= 4

(14)

multi-bsp

Stage 3: 4 nodes with a network access

Stage 2: one node has 4 chips plus RAM

Stage 1: one chip has 8 cores plus L3 cache

Stage 0: one core with L1/L2 caches  

Stage 3

Network

RAM

P3= 4

Stage 2

RAM

L3

L2/L1

Core

P1= 8 P2= 4

(15)

The multi-bsp model

Execution model A level i superstep is:

Level i− 1 executes code independently

Exchanges information with the mi memory

Synchronises

Level i

Level i− 1 n

n.1 n.pi

gi

gi−1 mi

Li

(16)

The multi-bsp model

Execution model A level i superstep is:

Level i− 1 executes code independently

Exchanges information with the mi memory

Synchronises

Level i

Level i− 1 n

n.1 n.pi

gi

gi−1 mi

Li

(17)

The multi-bsp model

Execution model A level i superstep is:

Level i− 1 executes code independently

Exchanges information with the mi memory

Synchronises

Level i

Level i− 1 n

n.1 n.pi

gi

gi−1 mi

Li

(18)

The multi-bsp model

Execution model A level i superstep is:

Level i− 1 executes code independently

Exchanges information with the mi memory

Synchronises

Level i

Level i− 1 n

n.1 n.pi

gi

gi−1 mi

Li

(19)

The multi-bsp model

multi-bsp cost model

4 parameters for the d levels: (pi, gi, Li, mi) Cost formulae

T =

d−1

i=0

(

Ni−1 j=0

wij+ Cij)

(20)

The multi-ml language

Basic ideas

bsml-like code on every stage of the multi-bsp architecture

Specific syntax over ml: eases programming

Multi-functions that recursively go through the multi-bsp tree

tree Replicated part

(multi-bsp)

(21)

The multi-ml language

Basic ideas

bsml-like code on every stage of the multi-bsp architecture

Specific syntax over ml: eases programming

Multi-functions that recursively go through the multi-bsp tree

e e

let v= <<e>>

<< >>

f0 f1 ... fp−1 parallelvector

Replicated part (bsp)

Sequential part

tree Replicated part

(multi-bsp)

(22)

The multi-ml language

Basic ideas

bsml-like code on every stage of the multi-bsp architecture

Specific syntax over ml: eases programming

Multi-functions that recursively go through the multi-bsp tree

e e

let v= <<e>>

<< >>

f0 f1 ... fp−1 parallelvector

Replicated part (bsp)

Sequential part

tree Replicated part

(multi-bsp)

(23)

The multi-ml language

Basic ideas

bsml-like code on every stage of the multi-bsp architecture

Specific syntax over ml: eases programming

Multi-functions that recursively go through the multi-bsp tree

tree Replicated part

(multi-bsp)

(24)

multi-ml: Tree recursion

Recursion structure let multi f [args]=

where node = (* BSML code *) ...

<< f [args] >>

... in v where leaf =

(* OCaml code *) ... in v

f

f f

f f f f

v0.0.0 v0.0.1 v0.1.0 v0.1.1

v0.0 v0.1

Result v0

(25)

multi-ml: Tree recursion

Recursion structure let multi f [args]=

where node = (* BSML code *) ...

<< f [args] >>

... in v where leaf =

(* OCaml code *) ... in v

f

f f

f f f f

v0.0.0 v0.0.1 v0.1.0 v0.1.1

v0.0 v0.1

Result v0

(26)

multi-ml: Tree recursion

Recursion structure let multi f [args]=

where node = (* BSML code *) ...

<< f [args] >>

... in v where leaf =

(* OCaml code *) ... in v

f

f f

f f f f

v0.0.0 v0.0.1 v0.1.0 v0.1.1

v0.0 v0.1

Result v0

(27)

multi-ml: Tree recursion

Recursion structure let multi f [args]=

where node = (* BSML code *) ...

<< f [args] >>

... in v where leaf =

(* OCaml code *) ... in v

f

f f

f f f f

v0.0.0 v0.0.1 v0.1.0 v0.1.1

v0.0 v0.1

Result v0

(28)

multi-ml: Tree recursion

Recursion structure let multi f [args]=

where node = (* BSML code *) ...

<< f [args] >>

... in v where leaf =

(* OCaml code *) ... in v

f

f f

f f f f

v0.0.0 v0.0.1 v0.1.0 v0.1.1

v0.0 v0.1

Result v0

(29)

multi-ml: Tree recursion

Recursion structure let multi f [args]=

where node = (* BSML code *) ...

<< f [args] >>

... in v where leaf =

(* OCaml code *) ... in v

f

f f

f f f f

v0.0.0 v0.0.1 v0.1.0 v0.1.1

v0.0 v0.1

Result v0

(30)

multi-ml: Tree recursion

Recursion structure let multi f [args]=

where node = (* BSML code *) ...

<< f [args] >>

... in v where leaf =

(* OCaml code *) ... in v

f

f f

f f f f

v0.0.0 v0.0.1 v0.1.0 v0.1.1

v0.0 v0.1

Result v0

(31)

multi-ml: Tree recursion

Recursion structure let multi f [args]=

where node = (* BSML code *) ...

<< f [args] >>

... in v where leaf =

(* OCaml code *) ... in v

f

f f

f f f f

v0.0.0 v0.0.1 v0.1.0 v0.1.1

v0.0 v0.1

Result v0

(32)

Table of Contents

1 Introduction

2 Semantics with costs 3 Experiments

4 Conclusion

(33)

Sequential semantics

P E ` e ⇓ v

Environment

Expression

Value Premises

Definitions

v ::= op | cst | (fun x→ e )[E] | (rec f x→ e )[E] E ::= {x17→ v1, …xn7→ xn}

(34)

Sequential semantics

P E ` e ⇓ v

Environment

Expression

Value Premises

Definitions

v ::= op | cst | (fun x→ e )[E] | (rec f x→ e )[E] E ::= {x17→ v1, …xn7→ xn}

(35)

Sequential semantics

P E ` e ⇓ v

Environment

Expression

Value Premises

Definitions

v ::= op | cst | (fun x→ e )[E] | (rec f x→ e )[E] E ::= {x17→ v1, …xn7→ xn}

(36)

Sequential semantics

P E ` e ⇓ v

Environment

Expression

Value

Premises

Definitions

v ::= op | cst | (fun x→ e )[E] | (rec f x→ e )[E] E ::= {x17→ v1, …xn7→ xn}

(37)

Sequential semantics

P E ` e ⇓ v

Environment

Expression

Value Premises

Definitions

v ::= op | cst | (fun x→ e )[E] | (rec f x→ e )[E] E ::= {x17→ v1, …xn7→ xn}

(38)

Sequential semantics

P E ` e ⇓ v

Environment

Expression

Value Premises

Definitions

v ::= op | cst | (fun x→ e )[E] | (rec f x→ e )[E]

E ::= {x17→ v1, …xn7→ xn}

(39)

Sequential semantics with cost

P E ` e ⇓ v ⇝ C

Environment

Expression Value

Premises

Cost

Definitions Cost C is:

c∈Cnc×Tc

(40)

Sequential semantics with cost

P E ` e ⇓ v ⇝ C

Environment

Expression Value

Premises

Cost

Definitions Cost C is:

c∈Cnc×Tc

(41)

Sequential semantics with cost

P E ` e ⇓ v ⇝ C

Environment

Expression Value

Premises

Cost

Definitions Cost C is:

c∈Cnc×Tc

(42)

Sequential semantics with cost

Examples

Csts E `scst⇓ cst ⇝s 0

Let E `se1 ⇓ v1s1 c1 E ] {x 7→ v} `s1 e2 ⇓ v2s2 c2 E `s let x = e1 in e2 ⇓ v2s2 c1⊕ c2⊕ Tlet

(43)

BSML semantics with cost

Definitions

v ::=· · · | < v1, . . . , vp>

Cost: < c1, . . . , cp >s | n × g | L

Cost algebra:

< c1, . . . , cp>s⊕ < c01, . . . , c0p>s≡< c1⊕ c01, . . . , cp⊕ c0p>s

< Top⊕ c1, . . . , Top⊕ cp >s≡ Top⊕ < c1, . . . , cp>s

0≡< 0, . . . , 0 >s

(44)

BSML semantics with cost

Examples

Rpl ∀i ∈ {1, . . . , p} E `se⇓ visci

E `s< e1, . . . , ep>⇓< v1, . . . , vp>sTrpl⊕ < c1, . . . , cp>s

Proj E `se⇓< v1, . . . , vp>s c where∀i ∈ {1, . . . , p} E ` (f i) ≡ vi

E `s(proj e)⇓ f ⇝s+1Tproj⊕ c ⊕ HRelation(v1, . . . , vp)×g ⊕ L

(45)

Multi-ML semantics with cost

Definitions

v ::= . . .  | (multi f x → e † e)[E]

Cost:

max(n1×T1⊕ · · · ⊕ nt×Tm⊕ < c1, . . . , cpn >s) max(n1×T1⊕ · · · ⊕ nt×Tt, maxi=1..pn(ci))

(46)

Multi-ML semantics with cost

Example

MultiNode

E `se1ln(multi f x→ e1† e2)[E]s1 c1

E `s1e2lnvs2c2

E`0e1bn+1vs3c3

E `s(e1e2)lnvs3Tapp⊕ c1⊕ c2⊕ max(c3)⊕ Ln)

(47)

Table of Contents

1 Introduction

2 Semantics with costs 3 Experiments

4 Conclusion

(48)

Matrix vector product algorithm

Estimate the cost of programs using the semantics

S(0) × Tmapd

i=1(S(i − 1) × gi−1⊕ Li−1)⊕ S(i) × Tred)

Benchmark the communication parameters g0 = 1100, g1 = 1800, g2 = 1

L0 = 149000, L1 = 1100, L2 = 1800

Micro-Benchs the “real-time” of each construction

Tmap= 3× Tget⊕ Tset⊕ 2 × TFloatMult⊕ · · · ⊕ 10 × TVar. With: TSet = 1,778µs TFloatMult = 1,317µs

TGet = 1,324µs TVar = 0.619µs

(49)

Matrix vector product algorithm

Estimate the cost of programs using the semantics

S(0) × Tmapd

i=1(S(i − 1) × gi−1⊕ Li−1)⊕ S(i) × Tred)

Benchmark the communication parameters g0 = 1100, g1 = 1800, g2 = 1

L0= 149000, L1 = 1100, L2 = 1800

Micro-Benchs the “real-time” of each construction

Tmap= 3× Tget⊕ Tset⊕ 2 × TFloatMult⊕ · · · ⊕ 10 × TVar. With: TSet = 1,778µs TFloatMult = 1,317µs

TGet = 1,324µs TVar = 0.619µs

(50)

Matrix vector product algorithm

Estimate the cost of programs using the semantics

S(0) × Tmapd

i=1(S(i − 1) × gi−1⊕ Li−1)⊕ S(i) × Tred)

Benchmark the communication parameters g0 = 1100, g1 = 1800, g2 = 1

L0= 149000, L1 = 1100, L2 = 1800

Micro-Benchs the “real-time” of each construction

Tmap= 3× Tget⊕ Tset⊕ 2 × TFloatMult⊕ · · · ⊕ 10 × TVar. With:

TSet = 1,778µs TFloatMult = 1,317µs TGet = 1,324µs TVar = 0.619µs

(51)

Comparing predictions and benchmarks

0 20 40 60

0 1 2 3 4

Input matrix size (2 nodes)

Executiontime(seconds)

Prediction multi-ml Execution multi-ml Prediction bsml Execution bsml

0 20 40 60 80 90

0 1 2 3 4

Input matrix size (8 nodes)

Executiontime(seconds)

Prediction multi-ml Execution multi-ml Prediction bsml Execution bsml

(52)

Table of Contents

1 Introduction

2 Semantics with costs 3 Experiments

4 Conclusion

(53)

Conclusion

Main results

Formal cost semantics for ML, BSML and Multi-ML

Semantics design incrementally

Use of semantics for cost prediction

Application to a skeleton based numerical example

Ongoing and future work

Static and automatic analysis for cost prediction

Need the use of annotations?

Real world example

(54)

Thank you for your attention ⌣

Questions ?

References

Related documents

In this paper I would like primarily to discuss the idea of hybridity as it was explored in surrealist thought, not only however through art but also through a kind of

This problem only concerns North American Operations. Wind energy is currently growing in China and the US. By 2009, there will be capacity issues in the US. We first need to

ΕΛΛΗΝΙΚΆ (Αρχικές οδηγίες) Š Πριν συνδέσετε τη συσκευή στο ρεύμα, ελέγξτε ότι η τάση που αναφέρεται στην πινακίδα σήμανσης συμπίπτει με την τάση του ρεύματος.

The imidazole concentration must therefore be optimized to ensure the best balance of high purity (low binding of host cell proteins), and high yield (binding of histidine-tagged

HisTrap ™ HP is a prepacked, ready-to-use column for the preparative purification of His-tagged recombinant proteins by immobilized metal affinity chromatography (IMAC)

Note: The purity of recombinant His-tagged proteins can often be increased by washing with binding buffer containing as high concentration of imidazole as possible.. However, care

Heat Indicator Two layers design Stainless Steel Main Container Inner Removable Container. 220 V - 60 Hz / 40W

Queries to see the effect of structural changes Data Definition Data Manipulation Application Generation Data Administration D B M S E n g I n e Database Data