EXTERNAL MEMORY IN BULK-SYNCHRONOUS PARALLEL ML∗

FRÉDÉRIC GAVA†

Abstract. A functional data-parallel language called BSML was designed for programming Bulk-Synchronous Parallel algorithms, a model of computing which allows parallel programs to be ported to a wide range of architectures. BSML is based on an extension of the ML language with parallel operations on a parallel data structure called parallel vector. The execution time can be estimated. Deadlocks and indeterminism are avoided. For large scale applications where parallel processing is helpful and where the total amount of data often exceeds the total main memory available, parallel disk I/O becomes a necessity. In this paper, we present a library of I/O features for BSML and its formal semantics. A cost model is also given and some preliminary performance results are shown for a commodity cluster.

Keywords. Parallel Functional Programming, Parallel I/O, Semantics, BSP.

1. Introduction. Some problems require performance that can only be provided by massively parallel computers. Programming these kinds of computers is still difficult. Many important computational applications involve solving problems with very large data sets [44]. Such applications are also referred to as out-of-core applications. For example, astronomical simulation [47], crash test simulation [10], geographic information systems [32], weather prediction [52], computational biology [17], graphs [40], computational geometry [11] and many other scientific problems can involve data sets that are too large to fit in the main memory and therefore fall into this category. As another example, the Large Hadron Collider of the CERN laboratory, built for finding traces of exotic fundamental particles (web page at lhc-new-homepage.web.cern.ch), will produce about 10 Petabytes a month when it starts running. The Earth Simulator, the most powerful parallel machine in the top500 list, has 1 Petabyte of total main memory and 100 Petabytes of secondary memories. The main memory alone is not enough to store all the data of an experiment.

Using parallelism can reduce the computation time and increase the available memory size, but for challenging applications, the memory is always insufficient in size. For instance, in a mesh decomposition of a mechanical problem, a scientist might want to increase the mesh size. To increase the available memory size, a trivial solution is to use the virtual memory mechanism present in modern operating systems. This has been established as a standard method for managing external memory. Its main advantage is that it allows the application to access a large virtual memory without having to deal with the intricacies of blocked secondary memory accesses. Unfortunately, this solution is inefficient if a standard paging policy is employed [7]. To get the best performance, the algorithms must be restructured with explicit I/O calls on this secondary memory.

Such algorithms are generally called external memory (EM) algorithms and are designed for large computational problems in which the size of the internal memory of the computer is only a small fraction of the size of the problem (see [55, 53] for a survey). Parallel processing is an important issue for EM algorithms for the same reasons that parallel processing is of practical interest in non-EM algorithm design. Existing algorithms and data structures were often unsuitable for out-of-core applications. This is largely due to the need for locality of data references, which is not generally present when algorithms are designed for internal memory, due to the permissive nature of the PRAM model: parallel EM algorithms [54] are new and do not work optimally and correctly in classical parallel environments.

Declarative parallel languages are needed to simplify the programming of massively parallel architectures. Functional languages are often considered. The design of parallel programming languages is a tradeoff between the possibility to express the parallel features that are necessary for predictable efficiency (but with programs that are more difficult to write, prove and port) and the abstraction of such features that is necessary to make parallel programming easier (but which should not hinder efficiency and performance prediction). On the one hand, the programs should be efficient, but without the price of non-portability and unpredictability of performance. The portability of code is needed to allow code reuse on a wide variety of architectures. The predictability of performance is needed to guarantee that the efficiency will always be achieved, whatever architecture is used.

∗ This work is supported by the ACI Grid program from the French Ministry of Research, under the project Caraml (http://www.caraml.org).

† Laboratory of Algorithms, Complexity and Logic (LACL), University of Paris XII, Val-de-Marne, 61 avenue du Général de

Fig. 2.1. The BSP model of computation

Another important characteristic of parallel programs is the complexity of their semantics. Deadlocks and non-determinism often hinder the practical use of parallelism by a large number of users. To avoid these undesirable properties, there is a trade-off between the expressiveness of the language and its structure, which could decrease the expressiveness.

We are currently exploring the intermediate position of the paradigm of algorithmic skeletons [6, 42] in order to obtain universal parallel languages where the execution cost can easily be determined from the source code. In this context, cost means the estimate of parallel execution time. This last requirement forces the use of explicit processes corresponding to the processors of the parallel machine. Bulk-Synchronous Parallel ML, or BSML, is an extension of ML for programming Bulk-Synchronous Parallel algorithms as functional programs with a compositional cost model. Bulk-Synchronous Parallel (BSP) computing is a parallel programming model introduced by Valiant [46, 50] to offer a high degree of abstraction like PRAM models and yet to allow portable and predictable performance on a wide variety of architectures, with a realistic cost model based on a structured parallelism. Deadlocks and indeterminism are avoided. BSP programs are portable across many parallel architectures. Such algorithms offer predictable and scalable performance (see [38] for a survey) and BSML expresses them with a small set of primitives taken from the confluent BSλ calculus [37]. Such operations are implemented as a library for the functional programming language Objective Caml [33], which has a strict evaluation strategy. We refer to [27] for more details about the choice of this strategy for massively parallel computing.

Parallel disk I/O has been identified as a critical component of a suitable high performance computer. Research in EM algorithms has recently received considerable attention. Over the last few years, comprehensive computing and cost models that incorporate disks and multiple processors have been proposed [35, 55, 54], but not with all the above elements. [14, 16] showed how an EM machine can take full advantage of parallel disk I/O and multiple processors. This model is based on an extension of the BSP model for I/O accesses. Our research aims at combining the BSP model with functional programming. We naturally also need to extend BSML with I/O accesses for programming EM algorithms. This paper is the follow-up to our work on imperative features of our functional data-parallel language [22].

This paper describes a further step after [21] in this direction. The remainder of this paper is organized as follows. First we review the BSP model in Section 2 and then briefly present the BSML language. In Section 3 we introduce the EM-BSP model and the problems that appear in BSML. In Section 4, we then give new primitives for our language. In Section 5, we describe the formal semantics of our language with persistent features. Section 6 is devoted to the formal cost model associated with our language and Section 7 to some benchmarks of a parallel program. We discuss related work in Section 8 and we end with conclusions and future research (Section 9).

2. Functional Bulk-Synchronous Parallel ML.

executes collective requests of a synchronization barrier. For the sake of conciseness, we refer to [5, 46] for more details. In this model, a parallel computation is subdivided into supersteps (Figure 2.1), at the end of which a barrier synchronization and a routing are performed. After that, all requests for data posted during a preceding superstep are fulfilled. The performance of the machine is characterized by 3 parameters expressed as multiples of the local processing speed $r$:

(i) $p$ is the number of processor-memory pairs;

(ii) $l$ is the time required for a global synchronization and

(iii) $g$ is the time for collectively delivering a 1-relation, a communication phase where every processor receives/sends at most one word. The network can deliver an $h$-relation in time $g \times h$ for any arity $h$.

These parameters can easily be obtained using benchmarks [28]. The execution time of a superstep $s$ is thus the sum of the maximal local processing time, the maximal data delivery time and the global synchronization time, i.e.,

$$\mathrm{Time}(s) = \max_{i:\mathrm{processor}} w_i^s + \max_{i:\mathrm{processor}} h_i^s \times g + l$$

where $w_i^s$ is the local processing time on processor $i$ during superstep $s$ and $h_i^s = \max\{h_{i+}^s, h_{i-}^s\}$, where $h_{i+}^s$ (resp. $h_{i-}^s$) is the number of words transmitted (resp. received) by processor $i$ during superstep $s$. The execution time $\sum_s \mathrm{Time}(s)$ of a BSP program composed of $S$ supersteps is therefore the sum of 3 terms, $t_{comp} + t_{comm} + L$, where

$$t_{comp} = \sum_s \max_i w_i^s, \qquad t_{comm} = H \times g \ \text{ where } \ H = \sum_s \max_i h_i^s, \qquad L = S \times l.$$
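As a small illustration of the cost model above, the following OCaml sketch computes the time of one superstep and of a whole program from per-processor figures. The record type, the function names and the numbers in `example` are hypothetical and introduced here only for illustration; they are not part of BSMLlib.

```ocaml
(* Hypothetical description of one superstep; one array cell per processor. *)
type superstep = {
  work : float array;  (* w_i^s: local processing time on processor i *)
  sent : int array;    (* h_{i+}^s: words sent by processor i *)
  recv : int array;    (* h_{i-}^s: words received by processor i *)
}

(* Time(s) = max_i w_i^s + (max_i h_i^s) * g + l,
   with h_i^s = max (h_{i+}^s, h_{i-}^s). *)
let superstep_time ~g ~l s =
  let w_max = Array.fold_left max 0.0 s.work in
  let h = Array.mapi (fun i out -> max out s.recv.(i)) s.sent in
  let h_max = Array.fold_left max 0 h in
  w_max +. float_of_int h_max *. g +. l

(* Execution time of a program: the sum of Time(s) over its supersteps. *)
let program_time ~g ~l steps =
  List.fold_left (fun acc s -> acc +. superstep_time ~g ~l s) 0.0 steps

(* Made-up figures for two processors. *)
let example =
  { work = [| 3.0; 5.0 |]; sent = [| 2; 1 |]; recv = [| 1; 4 |] }
```

With $g = 2$ and $l = 10$, the superstep `example` has a largest local work of 5 and a largest $h$ of 4, so its time is $5 + 4 \times 2 + 10 = 23$.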

In general, $t_{comp}$, $H$ and $S$ are functions of $p$ and of the size of data $n$, or of more complex parameters like data skew and histogram sizes. To minimize execution time, the BSP algorithm design must jointly minimize the number $S$ of supersteps, the total volume $H$ and imbalance $h^s$ of communication, and the total volume $t_{comp}$ and imbalance of local computation. Bulk Synchronous Parallelism and the Coarse-Grained Multicomputer (CGM), which can be seen as a special case of the BSP model, are used for a large variety of applications. As stated in [13], "A comparison of the proceedings of the eminent conference in the field, the ACM Symposium on Parallel Algorithms and Architectures, between the late eighties and the time from the mid-nineties to today reveals a startling change in research focus. Today, the majority of research in parallel algorithms is within the coarse-grained, BSP style, domain."

bsp_p : unit → int
bsp_l : unit → float
bsp_g : unit → float
mkpar : (int → α) → α par
apply : (α → β) par → α par → β par
type α option = None | Some of α
put : (int → α option) par → (int → α option) par
at : α par → int → α

Fig. 2.2. The Core Bsmllib Library

2.2. Bulk-Synchronous Parallel ML. BSML does not rely on SPMD programming. Programs are usual sequential Objective Caml (OCaml) programs [33] but work on a parallel data structure. Some of the advantages are simpler semantics and better readability. The execution order follows the reading order in the source code (or, at least, the results are such that the execution seems to follow the reading order). There is currently no implementation of a full BSML language but rather a partial implementation as a library for OCaml (web page at http://bsmllib.free.fr/).

The so-called BSMLlib library is based on the elements given in Figure 2.2. They give access to the BSP parameters of the underlying architecture: bsp_p() is $p$, the static number of processes (this value does not change during execution), bsp_g() is $g$, the time for collectively delivering a 1-relation, and bsp_l() is $l$, the time required for a global synchronization barrier.

There is an abstract polymorphic type α par which represents the type of $p$-wide parallel vectors of objects of type α, one per processor. BSML parallel constructs operate on parallel vectors. Those parallel vectors are created by mkpar, so that (mkpar f) stores (f i) on process $i$ for $i$ between $0$ and $p-1$:

$$\mathrm{mkpar}\ f = \langle (f\ 0),\ (f\ 1),\ \ldots,\ (f\ i),\ \ldots,\ (f\ (p-1)) \rangle$$

We usually write f as fun pid → e to show that the expression e may be different on each processor. This

it is said to be global. A usual ML expression which is not within a parallel vector is called replicated, i.e., identical on each processor. A BSP algorithm is expressed as a combination of asynchronous local computations (first phase of a superstep) and phases of global communication (second phase of a superstep) with global synchronization (third phase of a superstep). Asynchronous phases are programmed with mkpar and apply, such that (apply (mkpar f) (mkpar e)) stores ((f i) (e i)) on process $i$:

$$\mathrm{apply}\ \langle f_0, f_1, \ldots, f_i, \ldots, f_{p-1} \rangle\ \langle v_0, v_1, \ldots, v_i, \ldots, v_{p-1} \rangle = \langle (f_0\ v_0),\ (f_1\ v_1),\ \ldots,\ (f_i\ v_i),\ \ldots,\ (f_{p-1}\ v_{p-1}) \rangle$$

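Let us consider the following expression:

let vf = mkpar (fun pid x -> x + pid)
and vv = mkpar (fun pid -> 2 * pid + 1)
in apply vf vv

The two parallel vectors are respectively equivalent to:

$$\langle \mathrm{fun}\ x \to x + 0,\ \ \mathrm{fun}\ x \to x + 1,\ \ \ldots,\ \ \mathrm{fun}\ x \to x + i,\ \ \ldots,\ \ \mathrm{fun}\ x \to x + (p-1) \rangle$$

and

$$\langle 1,\ 3,\ \ldots,\ 2 \times i + 1,\ \ldots,\ 2 \times (p-1) + 1 \rangle$$

The expression apply vf vv is then evaluated to:

$$\langle 1,\ 4,\ \ldots,\ 3 \times i + 1,\ \ldots,\ 3 \times (p-1) + 1 \rangle$$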
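The behaviour of mkpar and apply on this example can be checked with a purely sequential stand-in, where a parallel vector is modelled as an OCaml array with one cell per process. This is only a sketch under that assumption: the definitions below mimic the signatures of Figure 2.2 but are not the BSMLlib implementation, and the value `p = 4` is chosen arbitrarily.

```ocaml
(* Assumed number of processes for this sequential simulation. *)
let p = 4

(* A "parallel vector" is simulated by an array of length p. *)
let mkpar f = Array.init p f

(* Pointwise application: cell i holds (vf.(i) vv.(i)). *)
let apply vf vv = Array.init p (fun i -> vf.(i) vv.(i))

(* The expression discussed in the text. *)
let result =
  let vf = mkpar (fun pid x -> x + pid)
  and vv = mkpar (fun pid -> 2 * pid + 1) in
  apply vf vv
```

On process `pid` the function adds `pid` to the local value `2 * pid + 1`, so each cell of `result` holds `3 * pid + 1`.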

Readers familiar with BSPlib [28] will observe that we ignore the distinction between a communication request and its realization at the barrier. The communication and synchronization phases are expressed by put. Consider the expression:

put (mkpar (fun i -> $fs_i$))   (∗)

To send a value v from process $j$ to process $i$, the function $fs_j$ at process $j$ must be such that ($fs_j$ $i$) evaluates to Some v. To send no value from process $j$ to process $i$, ($fs_j$ $i$) must evaluate to None. The expression (∗) evaluates to a parallel vector containing a function $fd_i$ of delivered messages on every process $i$. At process $i$, ($fd_i$ $j$) evaluates to None if process $j$ sent no message to process $i$, or evaluates to Some v if process $j$ sent the value v to process $i$.
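The exchange performed by put can be mimicked sequentially, again with arrays standing in for parallel vectors. The sketch below is an assumption-laden simulation, not the BSMLlib code: it simply transposes the matrix of messages so that the function obtained at process i, applied to j, returns what process j addressed to i. The value `p = 3` and the example `shifted` are invented for the illustration.

```ocaml
(* Assumed number of processes for this sequential simulation. *)
let p = 3

let mkpar f = Array.init p f

(* Simulated put: delivered.(i).(j) is the message sent by j to i. *)
let put fs =
  let delivered =
    Array.init p (fun i -> Array.init p (fun j -> fs.(j) i)) in
  Array.map
    (fun row -> fun j -> if 0 <= j && j < p then row.(j) else None)
    delivered

(* Every process sends its own pid to its right neighbour only. *)
let shifted =
  put (mkpar (fun pid dst -> if dst = pid + 1 then Some pid else None))
```

Here `shifted.(1) 0` is `Some 0` because process 0 sent its pid to process 1, while `shifted.(0) j` is `None` for every `j` since nobody sends to process 0.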

The full language would also contain a synchronous projection operation at. (at vec n) returns the nth value of the parallel vector vec:

$$\mathrm{at}\ \langle v_0, \ldots, v_n, \ldots, v_{p-1} \rangle\ n = v_n$$

at expresses communication and synchronization phases. Without it, the global control cannot take into account data computed locally. A global conditional is necessary for expressing algorithms like: Repeat Parallel Iteration Until Max of local errors $< \epsilon$. The nesting of par types is prohibited and the projection should not be evaluated inside the scope of a mkpar. Our type system enforces these restrictions [23].

2.3. Examples.

2.3.1. Often Used Functions. Some useful functions can be defined by using only the primitives. For example, the function replicate creates a parallel vector which contains the same value everywhere. The primitive apply can be used only for a parallel vector of functions which take only one argument. To deal with functions which take two arguments, we need to define the apply2 function.

let replicate x = mkpar (fun pid -> x)
let apply2 vf v1 v2 = apply (apply vf v1) v2

It is also common to apply the same sequential function at each process. This can be done using the parfun functions. They only differ in the number of arguments to apply:

let parfun f v = apply (replicate f) v
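These derived functions can be tried out on the same kind of sequential stand-in for parallel vectors (an array with one cell per process). The definitions of `mkpar` and `apply` below are simulations, not BSMLlib code, and `parfun2` is an assumed definition following the remark that the parfun functions only differ in their number of arguments.

```ocaml
(* Assumed number of processes for this sequential simulation. *)
let p = 3

(* Array-based stand-ins for the BSML primitives. *)
let mkpar f = Array.init p f
let apply vf vv = Array.init p (fun i -> vf.(i) vv.(i))

(* Derived functions, as defined in the text. *)
let replicate x = mkpar (fun _pid -> x)
let apply2 vf v1 v2 = apply (apply vf v1) v2
let parfun f v = apply (replicate f) v

(* Assumed two-argument variant of parfun. *)
let parfun2 f v1 v2 = apply (parfun f v1) v2

let squares = parfun (fun x -> x * x) (mkpar (fun pid -> pid))
let shifted_pids = parfun2 (+) (mkpar (fun pid -> pid)) (replicate 10)
```

The vector `squares` holds the square of each pid, and `shifted_pids` adds the replicated value 10 to each pid.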

It is also common to apply a different function on a process. applyat $n$ $f_1$ $f_2$ $v$ applies function $f_1$ at process $n$ and function $f_2$ at other processes:

let applyat n f1 f2 v =
  apply (mkpar (fun i -> if i = n then f1 else f2)) v

2.3.2. Communication Function. Our example is the classical computation of the prefix of a list. Here we make the hypothesis that the elements of the list are distributed to all the processes as lists. Each processor performs a local reduction, then sends its partial result to the following processors and finally locally reduces its partial result with the sent values. Take for example the following expression:

scan_list_direct (+) e $\langle [1; 2],\ [3; 4],\ [5] \rangle$

It will be evaluated to:

$$\langle [e + 1;\ e + 1 + 2],\ \ [e + 1 + 2 + 3;\ e + 1 + 2 + 3 + 4],\ \ [e + 1 + 2 + 3 + 4 + 5] \rangle$$

for a prefix over three processors, where $e$ is the neutral element (here $0$). To do this, we first need the computation of the prefix of a parallel vector:

(* scan_direct : ('a -> 'a -> 'a) -> 'a -> 'a par -> 'a par *)
let scan_direct op e vv =
  let mkmsg pid v dst = if dst < pid then None else Some v in
  let procs_lists = mkpar (fun pid -> from_to 0 pid) in
  let receivedmsgs = put (apply (mkpar mkmsg) vv) in
  let values_lists = parfun2 List.map
      (parfun (compose noSome) receivedmsgs) procs_lists in
  applyat 0 (fun _ -> e) (List.fold_left op e) values_lists

where

List.map f [v_0; ...; v_n] = [(f v_0); ...; (f v_n)]
List.fold_left f e [v_0; ...; v_n] = f (... (f (f e v_0) v_1) ...) v_n
from_to n m = [n; n + 1; n + 2; ...; m]
noSome (Some v) = v
compose f g x = (f (g x)).
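The helper functions described by these equations are ordinary sequential OCaml and can be written as follows. This is a sketch: the behaviour of `noSome` on `None` and of `from_to` when `n > m` is not specified in the text and is chosen here arbitrarily, and `scan_lists_seq` is a hypothetical sequential reference (array cell = process) that only computes the result expected from the example above, without any communication.

```ocaml
(* from_to n m = [n; n+1; ...; m]; assumed to be empty when n > m. *)
let rec from_to n m = if n > m then [] else n :: from_to (n + 1) m

(* noSome (Some v) = v; the None case is left unspecified in the text,
   here it raises Invalid_argument. *)
let noSome = function
  | Some v -> v
  | None -> invalid_arg "noSome"

let compose f g x = f (g x)

(* Sequential reference for a prefix over lists held one per process. *)
let scan_lists_seq op e vl =
  let acc = ref e in
  let scan_one l =
    List.rev
      (List.fold_left (fun out v -> acc := op !acc v; !acc :: out) [] l) in
  let res = Array.make (Array.length vl) [] in
  Array.iteri (fun i l -> res.(i) <- scan_one l) vl;
  res
```

With `(+)` and neutral element `0`, `scan_lists_seq` applied to the lists `[1; 2]`, `[3; 4]` and `[5]` gives the running sums 1, 3 on the first process, 6, 10 on the second and 15 on the third.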

Then, we can directly have the prefix of lists using some generic scan:

let scan_wide scan seq_scan_last map op e vv =
  let local_scan = parfun (seq_scan_last op e) vv in
  let last_elements = parfun fst local_scan in
  let values_to_add = (scan op e last_elements) in
  let pop = applyat 0 (fun x y -> y) op in
  parfun2 map (pop values_to_add) (parfun snd local_scan)

let scan_wide_direct seq_scan_last map op e vv =
  scan_wide scan_direct seq_scan_last map op e vv

let scan_list scan op e vl =
  scan_wide scan seq_scan_last List.map op e vl

(* scan_list_direct : ('a -> 'a -> 'a) -> 'a -> 'a list par -> 'a list par *)
let scan_list_direct op e vl = scan_list scan_direct op e vl

where seq_scan_last f e [v_0; v_1; ...; v_n] = (last, [(f e v_0); f (f e v_0) v_1; ...; last]) and last = f (... (f (f e v_0) v_1) ...) v_n. The BSP cost formula of the above function (assuming op has a constant cost $c_{op}$) is thus $2 \times N \times c_{op} \times r + (p-1) \times s \times g + l$, where $s$ denotes the size in words of a value computed by the scan and $N$ the length of the biggest list held at a process. We have thus the time to compute the partial

[Figure: processor/memory pairs (P/M) connected by a network router; each pair holds a CPU, a memory and disks Disk 0 to Disk $D-1$ on an internal bus.]

Fig. 3.1. A BSP computer with external memories

2.4. Advantages of Functional BSP Programming. One important benefit of the BSP model is the ability to accurately predict the execution time requirements of parallel algorithms. Communications are clearly separated from synchronization, i.e., this avoids deadlocks, and they can be performed in any order, provided that the information is delivered at the beginning of the next superstep. The prediction is achieved by constructing analytical formulas that are parameterized by a few values which capture the computation, communication and synchronization performance of the parallel system.

The clarity, abstraction and formal semantics of functional languages make them desirable vehicles for complex software. The functional approach of this parallel model allows the re-use of suitable techniques from functional languages because only a small number of parallel primitives is needed. Primitives of the BSML language with a strict strategy are derived from a confluent calculus [37], so parallel algorithms are also confluent and keep the advantages of the BSP model: no deadlock, efficient implementation using optimized communication algorithms, static cost formulas and cost predictions. The lazy evaluation strategy of pure functional languages is not suited to the needs of the massively parallel programmer. Lazy evaluation has the unwanted property of hiding complexity from the programmer [27]. The strict strategy of OCaml makes the BSMLlib a better tool for high performance applications because programs are transparent in the sense of making complexity explicit in the syntax.

Also, as in functional languages, we could easily prove and certify functional implementations of such algorithms with a proof assistant [1, 4], as in [20]. Using the extraction facility of the proof assistant, we could generate a certified implementation to be used independently of the sequential or parallel implementation of the BSML primitives.

3. External Memory.

3.1. The EM-BSP model. Modern computers typically have several layers of memory which include the main memory and caches as well as disks. We restrict ourselves to the two-level model [54] because the speed difference between disks and the main memory is much more significant than between other layers of memory. [16] extended the BSP model to include secondary local memories. The basic idea is simple and it is illustrated in Figure 3.1. Each processor has, in addition to its local memory, an external memory (EM) in the form of a set of disks. This idea is applied to extend the BSP model to its EM version, called EM-BSP, by adding the following parameters to the standard BSP parameters:

(i) $M$ is the local memory size of each processor;

(ii) $D$ is the number of disk drives of each processor;

(iii) $B$ is the transfer block size of a disk drive, and

(iv) $G$ is the ratio of local computational capacity (number of local computation operations) divided by local I/O capacity (number of blocks of size $B$ that can be transferred between the local disks and memory) per unit time.

In many practical cases, all processors have the same number of disks and, thus, the model is restricted to that case (although the model forbids different memory sizes). The disk drives of each processor are denoted by $D_0, D_1, \ldots, D_{D-1}$. Each drive consists of a sequence of tracks which can be accessed by direct random access. A track stores exactly one block of $B$ words. Each processor can use all its $D$ disk drives concurrently and transfer $D \times B$ words from/to the local disks to/from its local memory in a single I/O operation at cost $G$. In

assumed to be able to store in its local main memory at least some blocks from each disk at the same time, i.e., $M \gg DB$.
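As a worked illustration of these parameters, consider assumed values $D = 3$ disks and $B = 4096$ words per block (figures chosen here only for the example). If the data are laid out so that every I/O operation moves a full $D \times B$ words, then transferring $n = 100000$ words needs

$$\left\lceil \frac{n}{D \times B} \right\rceil = \left\lceil \frac{100000}{3 \times 4096} \right\rceil = \left\lceil \frac{100000}{12288} \right\rceil = 9$$

I/O operations, for an I/O cost of $9 \times G$. This count is a lower bound under the stated layout assumption; a less favourable placement of the blocks on the disks would need more operations.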

Like computation on the BSP model, the computation of the EM-BSP model proceeds in a succession of supersteps. The communication costs are the same as for the BSP model. The EM-BSP model allows multiple I/O operations during the computation phase of the superstep. The total cost of the supersteps is thus defined as $t_{comp,io} + t_{comm} + L$, where $t_{comp,io}$ is the computational cost and additional I/O cost charged for the supersteps, i.e., $t_{comp,io} = \sum_s \max_i (w_i^s + m_i^s)$, where $m_i^s$ is the I/O cost incurred by processor $i$ during superstep $s$. We refer to [16] for the EM-BSP complexity of some classical BSP algorithms.

3.2. Examples of EM algorithms. Our first example is matrix inversion, which is used by many applications as a direct method to solve linear systems. The computation of the inverse of a matrix $A$ can be derived from its LU factorization. [8] presents the LU factorization by blocks. For this parallel out-of-core factorization, the matrix is divided into blocks of columns called superblocks. The width of the superblock is determined by the amount of physical available memory: only blocks of the current superblock are in the main memory, the others are on disks. The algorithm factorizes the matrix from left to right, superblock by superblock. Each time a new superblock of the matrix is fetched into the main memory (called the active superblock), all previous pivoting and updates of a history of the right-looking algorithm are applied to the active superblock. Once the last superblock is factorized, the matrix is re-read to apply the remaining row pivoting of the recursive phases. Note that the computation is done in place: the matrix has first been distributed over the processors and, for load balancing, a cyclic distribution of the data is used.

[9] presents PRAM algorithms using external memory for graph problems such as biconnected components of a graph or minimum spanning forest. One of them is the 3-coloring of a cycle, applied to finding large independent sets for the problem of list ranking (determine, for each node $v$ of a list, the rank of $v$, defined as the number of links from $v$ to the end of the list). The method for solving it is to update scattered successor and predecessor colors as needed after re-coloring a group of nodes of the list, without sorting or scanning the entire list. As before, the algorithm works group by group with only one group in the main memory.

The last example is the multi-string search problem, which consists of determining which of $k$ pattern strings occur in another string. Important applications on biological databases make use of very large text collections requiring specialized nontrivial search operations. [19] describes an algorithm for this problem with a constant number of supersteps, based on the distribution of a proper data structure among the processors and the disks to reduce and balance the communication cost. This data structure is based on a bind tree built on the suffixes of the strings, and the algorithm works on longest common prefixes on such trees and by lexicographic order. The algorithm takes advantage of disks by only keeping a part of a bind tree in the main memory and by collecting subparts of trees during the supersteps.

4. External Memory in BSML.

4.1. Problems by Adding I/O in BSML. The main problem in adding external memory, and so I/O operators, to BSML is to preserve the fact that, in the global context, the replicated values, i.e., usual OCaml values replicated on each processor, are the same. Such values are dedicated to the global control of the parallel algorithms. Take for example the following expression:

let chan = open_in "file.dat" in
if (at (mkpar (fun pid -> (pid mod 2) = 0)) (input_value chan))
then scan_direct (+) 0 (replicate 1)
else (replicate 1)

It is not true that the file on each processor contains the same value. In this case, each processor reads a different value from its secondary memory. We would obtain an incoherent result because each processor reads a different integer on the channel chan and some of them would execute scan_direct, which needs a synchronization. Others would execute replicate, which does not need a synchronization. This breaks the confluence result of the BSML language and the BSP model of computation with its global synchronizations. If this expression had been evaluated with the BSMLlib library, we would have a breakdown of the BSP computer because at is a global synchronous primitive. Note that we also have this kind of problem in the BSPlib [28] where the authors

[Figure: processor/memory pairs (P/M) connected by a network router, with shared disks Disk 0 to Disk $D_g - 1$.]

Fig. 4.1. A BSP computer with shared disks

let a = mkpar (fun i -> if i = 0 then (ignore (open_in "file.dat"); ()) else ())
in (open_out "file.dat")

where () is the empty value. If this expression had been evaluated with the BSMLlib library, only the first processor would have opened the file in read mode. Afterwards, each processor except the first one would have opened the file with the same name in write mode. On the first processor, this file has already been opened in read mode. We would also have an incoherent result because the first processor would raise an exception which is not caught at all by the other processes in the global context. This problem of side effects could also be combined with the first problem if there is no file at the beginning of the computation. Take for example the following expression:

let chan = open_out "file.dat" in
let x = mkpar (fun i -> if i = 0 then (output_value chan 0) else ()) in
output_value chan 1; close_out chan;
let chan = open_in "file.dat" in
if (at (mkpar (fun pid -> (pid mod 2) = 0)) (input_value chan))
then scan_direct (+) 0 (replicate 1)
else (replicate 1)

The first processor writes the integers 0 and 1 to its file and the other processors write only the integer 1 to their files. As in the first example, we would have a breakdown of the BSP computer because the integer read would not be the same on every processor and at is a global synchronous primitive.
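The divergence described in these examples can be mimicked sequentially. In the sketch below, each array cell stands for one process and holds the integer that this process would read from its own copy of the file; the values in `local_reads` are invented for the illustration and nothing here uses BSMLlib. The function `condition` computes what the projection of the vector of parities would give on a process that read `n`.

```ocaml
(* Made-up integers read by four simulated processes from their own files. *)
let local_reads = [| 0; 1; 2; 3 |]

(* Value of (at (mkpar (fun pid -> pid mod 2 = 0)) n) for a process that
   read n: the n-th component of the vector of parities. *)
let condition n = n mod 2 = 0

(* Branch chosen by each process: true means the synchronizing branch. *)
let branches = Array.map condition local_reads

(* A value used for global control must be identical on every process. *)
let replicated a = Array.for_all (fun v -> v = a.(0)) a
```

Since `branches` is not a replicated value, some simulated processes would enter the branch that synchronizes while the others would not, which is exactly the breakdown described above.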

4.2. The proposed solution. Our solution is to have two kinds of files: global and local ones. In this way, we have two kinds of I/O operators. Local I/O operators must not occur in the global context and global I/O ones must not occur locally. Local files are in local file systems which are present on each processor, as in the EM-BSP model. Global files are in a global file system. These files need to be the same from the point of view of each node. The global file system is thus on shared disks (as in Figure 4.1) or exists as a copy on each processor. They thus always give the same values for the global context. Note that if there are only shared disks and no local ones, the local file systems could be in different directories, one per processor, in the global file system.

An advantage of having shared disks arises for algorithms which do not have distributed data at the beginning of the computation. For those which sort, for example, the list of data to sort is in a global file at the beginning of the program and in another global file at the end. On the other hand, in the case of a distributed global file system, the global data are also distributed and programs are less sensitive to the problem of faults. Thus, we have two important cases for the global file system, which could be seen as a new parameter of the EM-BSP machine: do we have shared disks or not?

In the first case, the condition that the global files are the same from each processor's point of view requires some synchronizations for some global I/O operators, such as creating, opening or deleting a file. For example, it is impossible or non-deterministic for a processor to create a file in the global file system if at the same time another processor deletes it. On the other hand, reading (resp. writing) values from (resp. to) files does not need any synchronization. All the processors read the same values in the global file and only one of the processors needs to really write the value on the shared disks. In the case of a global output operator, only one of the processors writes the value, and in the case of a global input operator, the value is first read from the disks by one processor and then read by the other processors from the operating system buffers. In this way, for all global operators, there is no bottleneck on the shared disks.

In the second case, all the files, local and global ones, are distributed and no synchronization is needed at

Note that many modern parallel machines have concurrent shared disks. Such disks are always considered as user disks, i.e., disks where the users put the data needed for the computations, whereas local disks are generally only used for the parallel computations of programs. For example, the Earth Simulator has 1.5 Petabytes for users as mass storage disks and a special network to access them. If there are no shared disks, NFS or scalable low-level libraries as in [36] are able to simulate concurrent shared disks. Note also that if there are only shared disks, local disks could be simulated by using different directories for the local disks of the processors (one directory per processor).

[Figure: two plots titled "Write or read values". Horizontal axis: size in words of the values on each disk (0 to 14000 in the first plot, 3900 to 4300 in the second). Vertical axis: time in seconds (0 to 6e-05 in the first plot, 1.4e-05 to 1.85e-05 in the second).]

Fig. 4.2. Benchmarks of EM parameters

4.3. Our new model. After some experiments to determine the EM-BSP parameters of our parallel machine, we have found that operating systems do not read/write data in constant time but in linear time, depending on the size of the data. We also notice that there is an overhead depending on the size of the blocks,

to read/write this value from/to the $D$ concurrent disks. Figure 4.2 gives the results of this experiment on a PC with 3 disks, each disk with blocks of 4096 words (seconds are plotted on the vertical axis). This program was run 10000 times and the average was taken. Such results are not altered if we decrease the number of disks.

Our proposed solution gives the processors access to two kinds of files: global and local ones. In this way, our model, called EM²-BSP, extends the BSP model to its EM² version with two kinds of external memories, local and global ones. Each local file system will be on local concurrent disks as in the EM-BSP model. The global one will be on concurrent shared disks (as in Figure 4.1) if they exist, or replicated on the local disks. The EM²-BSP model is thus able to take into account the time to read the data and to distribute them to the processors. The following parameters are thus added to the standard BSP parameters:

(i) $M$ is the local memory size of each processor;

(ii) $D_l$ is the number of independent disks of each processor;

(iii) $B_l$ is the transfer block size of a local disk;

(iv) $G_l$ is the time to read or write in parallel one word on each local disk;

(v) $O_l$ is the overhead of the concurrent local disks;

(vi) $D_g$ is the number of independent shared disks (or global disks);

(vii) $B_g$ is the transfer block size of a global disk;

(viii) $G_g$ is the time to read or write in parallel one word on each global disk and

(ix) $O_g$ is the overhead of the concurrent global disks.

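Of course, if there are no shared disks or no local disks: $D_l = D_g$, $B_l = B_g$, $G_l = G_g$ and $O_l = O_g$. A processor is able to read/write $n$ words to its local disks in time

$$\left\lceil \frac{n}{D_l} \right\rceil \times G_l + \left\lceil \frac{n+1}{D_l B_l} \right\rceil \times O_l$$

and $n$ words to the global disks in time

$$\left\lceil \frac{n}{D_g} \right\rceil \times G_g + \left\lceil \frac{n+1}{D_g B_g} \right\rceil \times O_g.$$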
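The two read/write times have the same shape, so they can be evaluated by a single function taking the disk parameters as an argument. The sketch below follows the formula as it is printed in the text: a linear term in the number of words per disk, plus an overhead term in the number of blocks, with numerator $n + 1$. The record type, the field names and the parameter values in `local_disks` are hypothetical and only serve the illustration.

```ocaml
(* Hypothetical record for one family of disks (local or global). *)
type disks = {
  d : int;    (* number of independent disks *)
  b : int;    (* transfer block size, in words *)
  g : float;  (* time to read/write one word on each disk in parallel *)
  o : float;  (* overhead of the concurrent disks *)
}

(* Integer ceiling of a / b, for a >= 0 and b > 0. *)
let ceil_div a b = (a + b - 1) / b

(* ceil (n / D) * G + ceil ((n + 1) / (D * B)) * O *)
let io_time dk n =
  float_of_int (ceil_div n dk.d) *. dk.g
  +. float_of_int (ceil_div (n + 1) (dk.d * dk.b)) *. dk.o

(* Made-up parameter values: 3 disks with blocks of 4096 words. *)
let local_disks = { d = 3; b = 4096; g = 2.0; o = 8.0 }
```

With these made-up values, writing 10000 words costs $\lceil 10000/3 \rceil \times 2 + \lceil 10001/12288 \rceil \times 8 = 3334 \times 2 + 1 \times 8 = 6676$ time units.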

As in the EM-BSP model, the computation of the EM²-BSP model proceeds in a succession of supersteps. The communication costs are the same as for the EM-BSP model and multiple I/O operations are also allowed during the computation phase of a superstep.

Note that $G_g$ is not $g$, even if processors access the shared disks through the network (as on some parallel machines): $g$ is the time to perform a 1-relation and $G_g$ is the time to read/write $D_g$ words on the shared concurrent disks. It could depend on $g$ on some parallel machines such as clusters, but it could depend on many other hardware parameters if, for example, there is a special network to access the shared concurrent disks.

4.4. New Primitives. In this section we describe the core of our I/O library, i.e., the minimal set of primitives for programming EM²-BSP algorithms. This library will be incorporated in the next release of the BSMLlib. This I/O library is based on the elements given in Figure 4.3. As in the BSMLlib library, we have functions to access the EM²-BSP parameters of the underlying architecture. For example, embsp_loc_D() is $D_l$, the number of local disks, and glo_shared() tells whether the global file system is shared or not. Since we have two file systems, we need two kinds of names and two kinds of abstract types of output channels (resp. input channels): glo_out_channel (resp. glo_in_channel) and loc_out_channel (resp. loc_in_channel) to read/write values from/to global or local files.

We can open a named file for writing. The primitive returns a new output channel on that file. The file is truncated to zero length if it already exists. Otherwise it is created, and the primitive raises an exception if the file could not be opened. For this, we have two kinds of functions for global and local files: (glo_open_out F), which opens the global file F in write mode and returns a global channel positioned at the beginning of that file, and (loc_open_out f), which opens the local file f in write mode and returns a local channel positioned at the beginning of that file. In the same manner, we have two functions, glo_open_in and loc_open_in, for opening a named file in read mode. Such functions return new local or global input channels positioned at the beginning of the files. In the case of global shared disks, a synchronization occurs for each global open. With this global synchronization, each processor can signal to the other ones whether it managed to open the file without errors or not, and each processor raises an exception if one of them has failed to open the file.

Now, with our channels, we can read/write values from/to the files. This feature is generally called persistence. To write the representation of a structured value of any type to a channel (global or local), we use the following functions: (glo_output_value Cha v), which writes the replicated value v to the opened global file, and (loc_output_value cha v), which locally writes the local value v to the opened local file. The object can then be read back by the reading functions (glo_input_value Cha) (resp. (loc_input_value cha)).


EM$^2$-BSP parameters

    embsp_loc_D : unit → int        embsp_glo_D : unit → int
    embsp_loc_B : unit → int        embsp_glo_B : unit → int
    embsp_loc_G : unit → float      embsp_glo_G : unit → float
    embsp_loc_O : unit → float      embsp_glo_O : unit → float
    glo_shared  : unit → bool

Global I/O primitives

    glo_open_out     : glo_name → glo_out_channel
    glo_open_in      : glo_name → glo_in_channel
    glo_output_value : glo_out_channel → α → unit
    glo_input_value  : glo_in_channel → α option
    glo_close_out    : glo_out_channel → unit
    glo_close_in     : glo_in_channel → unit
    glo_delete       : glo_name → unit
    glo_seek         : glo_in_channel → int → unit

Local I/O primitives

    loc_open_out     : loc_name → loc_out_channel
    loc_open_in      : loc_name → loc_in_channel
    loc_output_value : loc_out_channel → α → unit
    loc_input_value  : loc_in_channel → α option
    loc_close_out    : loc_out_channel → unit
    loc_close_in     : loc_in_channel → unit
    loc_delete       : loc_name → unit
    loc_seek         : loc_in_channel → int → unit

From local to global

    glo_copy : int → loc_name → glo_name → unit

Fig. 4.3. The Core I/O BSMLlib Library

Such functions read the representation of a structured value, and we refer to [34] about having type safety in channels and reading them in a safe way. We also have (glo_seek Cha n) (resp. loc_seek), which positions the channel at the $n$-th value of a global file (resp. local file). The behavior is unspecified if any of the above functions is called with a closed channel.
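The local part of this interface can be approximated with the marshalling functions of the OCaml standard library. The sketch below is not the BSMLlib implementation: it only mirrors the names of Figure 4.3, represents file names as plain strings, and assumes that "no more value" corresponds to the `End_of_file` exception of `input_value`.

```ocaml
(* Local channels are modelled directly by the standard OCaml channels. *)
let loc_open_out (name : string) : out_channel = open_out_bin name
let loc_open_in (name : string) : in_channel = open_in_bin name

(* Writes the representation of a structured value of any type. *)
let loc_output_value (cha : out_channel) (v : 'a) : unit =
  output_value cha v

(* Reads the next value back; None when no more value can be read.
   As with OCaml's input_value, the result type is chosen by the
   caller and is not checked. *)
let loc_input_value (cha : in_channel) : 'a option =
  try Some (input_value cha) with End_of_file -> None

let () =
  let name = Filename.temp_file "bsml_sketch" ".dat" in
  let out = loc_open_out name in
  loc_output_value out (1, "one");
  loc_output_value out (2, "two");
  close_out out;
  let inp = loc_open_in name in
  let rec count n =
    match (loc_input_value inp : (int * string) option) with
    | Some _ -> count (n + 1)
    | None -> n
  in
  Printf.printf "values read back: %d\n" (count 0);
  close_in inp;
  Sys.remove name
```

The type annotation at the reading site is where type safety can be lost, which is the issue addressed by the reference cited above.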

Note that only local or replicated values can be written on local or global files. Nesting of parallel vectors is prohibited, and thus loc_output_value can only write local values. It is also impossible to write a parallel vector of values (global values) on a shared global file, because these values are different on each processor and glo_output_value is an asynchronous primitive: such values could be written in any order and could be mixed with other values. This is why only local and replicated values should be read/written from/to disks (see Section 6 for more details).

After reading/writing values from/to channels, we need to close them. As previously, we need four kinds of functions: two for the input channels (local and global ones) and two for the output channels. For example, (glo_close_out Cha) closes the global output channel Cha, which had been created by glo_open_out. The glo_delete and loc_delete primitives delete a global or a local file, provided it has first been closed.

The last primitive copies a local file from a processor to the global file system. It is thus a global primitive: (glo_copy n f F) copies the file f from the processor n to the global file system with the name F. This primitive can be used at the end of a BSML program to copy the local results from local files to the global (user) file system. It is not a communication primitive because, used as such, glo_copy is more expensive than any communication primitive (see Section 6). In the case of a distributed global file system, the file is duplicated on the global file system of each processor, and thus all the data of the file are put into the network. On the contrary, in the case of global shared disks, it is just a copy of the file, and access to the global shared disks is generally slower than putting values into the network and reading them back on another processor.

Using these primitives, the final result of any program is the same with shared disks or not (but naturally not with the same total time, i.e., not with the same costs). Now, to better understand how these new primitives work, we describe a formal semantics of our language with such persistent features.

5. High Order Formal Semantics.

5.1. Mini-BSML. Reasoning on the complete definition of a functional and parallel language such as BSML would have been complex and tedious. In order to simplify the presentation and to ease the formal reasoning, this section introduces a core language, mini-BSML. The expressions of mini-BSML, written $e$ possibly with a prime or subscript, have the following abstract syntax:

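$$
\begin{array}{rcll}
e & ::= & x & \text{variables} \\
  & |   & c & \text{constants} \\
  & |   & op & \text{operators} \\
  & |   & \mathbf{fun}\ x \to e & \text{abstraction} \\
  & |   & (e\ e) & \text{application} \\
  & |   & \mathbf{let}\ x = e\ \mathbf{in}\ e & \text{binding} \\
  & |   & (e, e) & \text{pairs} \\
  & |   & \mathbf{if}\ e\ \mathbf{then}\ e\ \mathbf{else}\ e & \text{conditional} \\
  & |   & (\mathbf{mkpar}\ e) & \text{parallel vector} \\
  & |   & (\mathbf{apply}\ e\ e) & \text{parallel application} \\
  & |   & (\mathbf{put}\ e) & \text{communication} \\
  & |   & (\mathbf{at}\ e\ e) & \text{projection} \\
  & |   & f & \text{file names or channels}
\end{array}
$$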

In this grammar, $x$ ranges over a countable set of identifiers. The form $(e\ e')$ stands for the application of a function or an operator $e$ to an argument $e'$. The form $\mathbf{fun}\ x \to e$ is the well-known lambda-abstraction that defines the first-class function whose parameter is $x$ and whose result is the value of $e$.
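For readers who prefer code, the abstract syntax can be transcribed as an OCaml algebraic data type. This is an illustrative encoding only: the constructor names, the use of strings for identifiers, operators and files, and the helper `size` are assumptions of the sketch, not definitions from the paper.

```ocaml
(* Constants: integers, booleans, the number of processes p, and (). *)
type const =
  | Int of int
  | Bool of bool
  | Nprocs
  | Unit

(* One constructor per production of the grammar. *)
type expr =
  | Var of string                 (* x *)
  | Const of const                (* c *)
  | Op of string                  (* op, named by a string here *)
  | Fun of string * expr          (* fun x -> e *)
  | App of expr * expr            (* (e e) *)
  | Let of string * expr * expr   (* let x = e in e *)
  | Pair of expr * expr           (* (e, e) *)
  | If of expr * expr * expr      (* if e then e else e *)
  | Mkpar of expr                 (* (mkpar e) *)
  | Apply of expr * expr          (* (apply e e) *)
  | Put of expr                   (* (put e) *)
  | At of expr * expr             (* (at e e) *)
  | File of string                (* file names or channels *)

(* The expression (mkpar (fun i -> (+ (i, 1)))). *)
let example =
  Mkpar (Fun ("i", App (Op "+", Pair (Var "i", Const (Int 1)))))

(* Number of syntax nodes of an expression. *)
let rec size e =
  match e with
  | Var _ | Const _ | Op _ | File _ -> 1
  | Fun (_, e1) | Mkpar e1 | Put e1 -> 1 + size e1
  | App (e1, e2) | Pair (e1, e2) | Apply (e1, e2) | At (e1, e2) ->
      1 + size e1 + size e2
  | Let (_, e1, e2) -> 1 + size e1 + size e2
  | If (e1, e2, e3) -> 1 + size e1 + size e2 + size e3

let () = Printf.printf "size of example: %d\n" (size example)
```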

Constants $c$ are the integers, the booleans and the number of processes $\mathbf{p}$, and we assume a unique value for the type unit: (). The set of primitive operations $op$ contains arithmetic operations, pair operators, the test function isnc of the nc constructor (which plays the role of the None constructor in OCaml), a fix-point operator to define natural iteration functions, and our I/O operators: $\mathbf{open}^{r}$ (resp. $\mathbf{open}^{w}$) to open a file as a channel in read mode (resp. write mode), $\mathbf{close}^{r}$ (resp. $\mathbf{close}^{w}$) to close a channel in read mode (resp. write mode), read and write to read from or write to a channel, delete to delete a file and seek to change the reading position. All those operators are distinguished with a subscript, which is $l$ for a local operator and $g$ for a global one. We also have our parallel operators: mkpar, apply, put and at. We also have two kinds of file systems, the local and the global ones, defined with (possibly with a prime):

• $f$ for a file name;

• $f^{w}$ for a write channel, $f^{r}$ for a read channel and $g^{\xi}_{k}$ for a channel pointing to the $k$-th value of a file, where $\xi$ is the name of the channel;

• ${}^{?}f^{\,v_n \ldots v_0}$ is a file, where $?$ is $c$, $r$ or $w$ for a closed file or a file open in read or write mode, and where $v_0, \ldots, v_n$ are the values held in the file.

When a file is opened in read mode, it contains the names $[g^{a}_{n}, \ldots, g^{z}_{m}]$ of the channels that point to it and the positions of these channels. Before presenting the dynamic semantics of the language, i.e., how the expressions of mini-BSML are computed to values, we present the values themselves and the simple ML types [39] of the values. There is one semantics per value of $p$, the number of processes of the parallel machine. In the following, the expressions are extended with the parallel vectors $\langle e, \ldots, e \rangle$ (nesting of parallel vectors is prohibited; our static analysis enforces this restriction [23]). The values of mini-BSML are defined by the following grammar:

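$$
\begin{array}{rcll}
v & ::= & \mathbf{fun}\ x \to e & \text{functional value} \\
  & |   & c & \text{constant} \\
  & |   & op & \text{primitive} \\
  & |   & (v, v) & \text{pair value} \\
  & |   & \langle v, \ldots, v \rangle & p\text{-wide parallel vector value} \\
  & |   & f & \text{file names or channels}
\end{array}
$$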

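and the simple ML types of values are defined by the following grammar:

$$
\begin{array}{rcll}
\tau & ::= & \kappa & \text{base type (bool, int, unit, file names or channels)} \\
     & |   & \alpha & \text{type variables} \\
     & |   & \tau_1 \to \tau_2 & \text{type of functional values from } \tau_1 \text{ to } \tau_2 \\
     & |   & \tau_1 * \tau_2 & \text{type for pair values}
\end{array}
$$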

We write $\vdash v : \tau$ to say that the value $v$ has the type $\tau$, and we refer to [39] for an introduction to the types of the ML language and to [23] for those of BSML.

5.2. High Order Semantics. The dynamic semantics is defined by an evaluation mechanism that relates expressions to values. To express this relation, we use a small-step semantics. It consists of a predicate between an expression and another expression, defined by a set of axioms and rules called steps. The small-step semantics describes all the steps of the language from an expression to a value. We suppose that we evaluate only expressions that have been type-checked [23] (nesting of parallel vectors is prohibited). Unlike on a sequential computer with a sequential programming language, a unique file system (a set of files) for persistent operators is not sufficient. We need to express the file systems of all our processors and our global file system. We assume a finite set $\mathcal{N} = \{0, \ldots, p-1\}$ which represents the set of processor names; we write $i$ for these names and $\bowtie$ for the whole parallel computer. Now, we can formalize the files for each processor and for the network. We write $\{f_i\}$ for the file system of processor $i$ with $i \in \mathcal{N}$. We assume that each processor has a file system as an infinite mapping of files, which are different at each processor. We write $\{\mathcal{F}\}$ for the global file system.

$$
\begin{array}{ll}
(\mathbf{bsp\_p}\ ()) \rightharpoonup_{\delta} p &
(\mathbf{fix}\ op) \rightharpoonup_{\delta} op \\
(\mathbf{fst}\ (v_1, v_2)) \rightharpoonup_{\delta} v_1 &
(\mathbf{snd}\ (v_1, v_2)) \rightharpoonup_{\delta} v_2 \\
\mathbf{if\ true\ then}\ e_1\ \mathbf{else}\ e_2 \rightharpoonup_{\delta} e_1 &
\mathbf{if\ false\ then}\ e_1\ \mathbf{else}\ e_2 \rightharpoonup_{\delta} e_2 \\
(\mathbf{isnc}\ \mathbf{nc}) \rightharpoonup_{\delta} \mathbf{true} &
(\mathbf{isnc}\ v) \rightharpoonup_{\delta} \mathbf{false} \quad \text{if } v \neq \mathbf{nc} \\
(+\ (n_1, n_2)) \rightharpoonup_{\delta} n \quad \text{with } n = n_1 + n_2 &
(\mathbf{fix}\ (\mathbf{fun}\ x \to e)) \rightharpoonup_{\delta} e[x \leftarrow (\mathbf{fix}\ (\mathbf{fun}\ x \to e))]
\end{array}
$$

Fig. 5.1. Functional $\delta$-rules

The persistent version of the small-step semantics has the following form:

$$\{\mathcal{F}\} / e / \{f\} \rightharpoonup \{\mathcal{F}'\} / e' / \{f'\}.$$

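We write $\rightharpoonup^{*}$ for the transitive and reflexive closure of $\rightharpoonup$; e.g., we write $\{\mathcal{F}_0\} / e_0 / \{f_0\} \rightharpoonup^{*} \{\mathcal{F}\} / v / \{f\}$ for

$$\{\mathcal{F}_0\} / e_0 / \{f_0\} \rightharpoonup \{\mathcal{F}_1\} / e_1 / \{f_1\} \rightharpoonup \{\mathcal{F}_2\} / e_2 / \{f_2\} \rightharpoonup \ldots \rightharpoonup \{\mathcal{F}\} / v / \{f\}.$$

To define the relation $\rightharpoonup$, we begin with some rules for two kinds of reductions: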

(i) $e / \{f_i\} \rightharpoonup^{i} e' / \{f'_i\}$, which could be read as "with the initial local file system $\{f_i\}$, at processor $i$, the expression $e$ is reduced to $e'$ with the file system $\{f'_i\}$";

(ii) $\{\mathcal{F}\} / e / \{f\} \rightharpoonup^{\bowtie} \{\mathcal{F}'\} / e' / \{f\}$, which could be read as "with the initial global file system $\{\mathcal{F}\}$ and with the initial set of local file systems, the expression $e$ is reduced to $e'$ with the global file system $\mathcal{F}'$ and with the same set of local file systems".

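To define these relations, we begin with some axioms for the functional head reduction $\rightharpoonup_{\varepsilon}$:

$$(\mathbf{fun}\ x \to e)\ v \rightharpoonup_{\varepsilon} e[x \leftarrow v] \qquad \text{and} \qquad \mathbf{let}\ x = v\ \mathbf{in}\ e \rightharpoonup_{\varepsilon} e[x \leftarrow v]$$

We write $e[x \leftarrow v]$ for the expression obtained by substituting all the free occurrences of $x$ in $e$ by $v$. Free occurrences of a variable are defined as a classical and trivial inductive function on our expressions. This functional head reduction has two versions: first, a local reduction, $\rightharpoonup_{\varepsilon}^{i}$, of just the processor $i$ and second, a global reduction, $\rightharpoonup_{\varepsilon}^{\bowtie}$, of the whole parallel machine: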

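$$\frac{e \rightharpoonup_{\varepsilon} e'}{e / \{f_i\} \rightharpoonup_{\varepsilon}^{i} e' / \{f_i\}} \quad (1) \qquad\qquad \frac{e \rightharpoonup_{\varepsilon} e'}{\{\mathcal{F}\} / e / \{f\} \rightharpoonup_{\varepsilon}^{\bowtie} \{\mathcal{F}\} / e' / \{f\}} \quad (2)$$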
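The substitution and the two head-reduction axioms can be sketched in OCaml on a small fragment of the language. The fragment (variables, integers, abstraction, application, binding), the constructor names and the function names `subst`, `is_value` and `head_step` are assumptions of this example. Since substituted values are assumed to be closed, no renaming is needed to avoid variable capture.

```ocaml
(* A fragment of mini-BSML, enough to illustrate the head reduction. *)
type expr =
  | Var of string
  | Int of int
  | Fun of string * expr
  | App of expr * expr
  | Let of string * expr * expr

(* subst x v e computes e[x <- v]: every free occurrence of x in e is
   replaced by v. A binder for the same name x stops the substitution. *)
let rec subst x v e =
  match e with
  | Var y -> if y = x then v else e
  | Int _ -> e
  | Fun (y, body) -> if y = x then e else Fun (y, subst x v body)
  | App (e1, e2) -> App (subst x v e1, subst x v e2)
  | Let (y, e1, e2) ->
      Let (y, subst x v e1, if y = x then e2 else subst x v e2)

(* Values of the fragment: constants and functions. *)
let is_value = function
  | Int _ | Fun _ -> true
  | Var _ | App _ | Let _ -> false

(* One step of functional head reduction, if an axiom applies:
   (fun x -> e) v  reduces to  e[x <- v]
   let x = v in e  reduces to  e[x <- v] *)
let head_step = function
  | App (Fun (x, body), v) when is_value v -> Some (subst x v body)
  | Let (x, v, body) when is_value v -> Some (subst x v body)
  | _ -> None
```

Rules (1) and (2) then only state that such a step leaves the local and global file systems unchanged.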

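For primitive operators we also have some axioms, the $\delta$-rules. The functional $\delta$-rules $\rightharpoonup_{\delta}$ are given in Figure 5.1. First, we have functional $\delta$-rules which can be used by one processor $i$, $\rightharpoonup_{\delta}^{i}$, or by the parallel machine, $\rightharpoonup_{\delta}^{\bowtie}$. As in the functional head reduction, we have two different cases for using functional $\delta$-rules:

$$\frac{e \rightharpoonup_{\delta} e'}{e / \{f_i\} \rightharpoonup_{\delta}^{i} e' / \{f_i\}} \quad (3) \qquad\qquad \frac{e \rightharpoonup_{\delta} e'}{\{\mathcal{F}\} / e / \{f\} \rightharpoonup_{\delta}^{\bowtie} \{\mathcal{F}\} / e' / \{f\}} \quad (4)$$

Such reductions, which are not persistent reductions, neither change nor need the files. Only persistent operators change and need them.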

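$$
\begin{array}{l}
\{\mathcal{F}\} / (\mathbf{mkpar}\ v) / \{f\} \rightharpoonup_{\delta}^{\bowtie} \{\mathcal{F}\} / \langle (v\ 0), \ldots, (v\ (p-1)) \rangle / \{f\} \\[6pt]
\{\mathcal{F}\} / (\mathbf{apply}\ \langle v_0, \ldots, v_{p-1} \rangle\ \langle v'_0, \ldots, v'_{p-1} \rangle) / \{f\} \rightharpoonup_{\delta}^{\bowtie} \{\mathcal{F}\} / \langle (v_0\ v'_0), \ldots, (v_{p-1}\ v'_{p-1}) \rangle / \{f\} \\[6pt]
\{\mathcal{F}\} / (\mathbf{at}\ \langle \ldots, v_n, \ldots \rangle\ n) / \{f\} \rightharpoonup_{\delta}^{\bowtie} \{\mathcal{F}\} / v_n / \{f\} \quad \text{if } \mathcal{A}_c(v_n) \neq \mathrm{True} \\[6pt]
\{\mathcal{F}\} / (\mathbf{put}\ \langle v_0, \ldots, v_{p-1} \rangle) / \{f\} \rightharpoonup_{\delta}^{\bowtie} \{\mathcal{F}\} / (\mathbf{mkfun}\ (\langle \mathbf{send}\ (\mathbf{init}\ v_0\ p), \ldots, \mathbf{send}\ (\mathbf{init}\ v_{p-1}\ p) \rangle)) / \{f\} \\[6pt]
\{\mathcal{F}\} / \langle \mathbf{send}\ [v^{0}_{0}, \ldots, v^{p-1}_{0}], \ldots, \mathbf{send}\ [v^{0}_{p-1}, \ldots, v^{p-1}_{p-1}] \rangle / \{f\} \\
\qquad \rightharpoonup_{\delta}^{\bowtie} \{\mathcal{F}\} / \langle [v^{0}_{0}, \ldots, v^{0}_{p-1}], \ldots, [v^{p-1}_{0}, \ldots, v^{p-1}_{p-1}] \rangle / \{f\} \quad \text{if } \forall n, m \in 0, \ldots, p-1:\ \mathcal{A}_c(v^{m}_{n}) \neq \mathrm{True} \\[6pt]
\text{where } \mathbf{mkfun} = \mathbf{apply}\ (\mathbf{mkpar}\ (\mathbf{fun}\ j\ t\ i \to \mathbf{if}\ (\mathbf{and}\ (\leq (0, i), < (i, p)))\ \mathbf{then}\ (\mathbf{access}\ t\ i)\ \mathbf{else}\ \mathbf{nc}))
\end{array}
$$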
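The rules for mkpar, apply and at can be mimicked sequentially by modelling a $p$-wide parallel vector as an OCaml array of size $p$. This is a simulation written for this illustration, with an arbitrary fixed `p`; it does not show how the BSMLlib implements these primitives, and the channel test $\mathcal{A}_c$ is omitted.

```ocaml
(* Number of simulated processes; an arbitrary choice for the sketch. *)
let p = 4

(* (mkpar v) gives the vector < (v 0), ..., (v (p-1)) >. *)
let mkpar f = Array.init p f

(* (apply <v_0, ...> <v'_0, ...>) applies component by component. *)
let apply fs vs = Array.init p (fun i -> fs.(i) vs.(i))

(* (at <..., v_n, ...> n) projects the n-th component. *)
let at vec n = vec.(n)

let () =
  let fs = mkpar (fun i x -> x + i) in      (* process i adds i *)
  let vs = mkpar (fun i -> 10 * i) in       (* process i holds 10 * i *)
  Printf.printf "component 2: %d\n" (at (apply fs vs) 2)
```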


Second, for the parallel primitives, we naturally have $\delta$-rules, but we do not have those $\delta$-rules on a single processor, only for the parallel machine (Figure 5.2). For simple reasons, it is impossible for a processor to send a channel to another processor: this second processor must not read from this channel, because it could be seen as a hidden communication. We therefore have to test whether the sent values contain channels or not. To do this, we use a trivial inductive function $\mathcal{A}_c$ which tells whether an expression contains a channel or not. Note that this work is done when OCaml serializes values: an exception is raised when an abstract datum like a channel is found. The evaluation of a put primitive proceeds in two steps. In a first step, each processor creates a pure functional array of values. Thus, we need a new kind of expression, arrays, written $[e, \ldots, e]$. The init and access operators are used to manipulate these functional arrays:

$$\mathbf{access}\ [v_0, \ldots, v_n, \ldots, v_m]\ n \rightharpoonup_{\delta} v_n \qquad \text{and} \qquad \mathbf{init}\ f\ m \rightharpoonup_{\delta} [(f\ 0), \ldots, (f\ (m-1))]$$

In the second step, the send operations exchange these arrays. For example, the value at the index $j$ of the array held at process $i$ is sent to process $j$ and is stored at index $i$ of the result. The function mkfun constructs a parallel vector of functions from the resulting vector of arrays.
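The two steps of put can be simulated sequentially, again with arrays standing for parallel vectors. The names `exchange` and `put` below, and the use of OCaml's `option` type (with `None` playing the role of nc for out-of-range indices), are simplifications made for this sketch; the test on channels is left out.

```ocaml
(* init f m builds the functional array [(f 0), ..., (f (m-1))]. *)
let init f m = Array.init m f

(* access t n returns the n-th element of the array t. *)
let access t n = t.(n)

(* Second step: sent.(i).(j) is the value that process i sends to
   process j; in the result, process j stores it at index i. *)
let exchange sent =
  let p = Array.length sent in
  Array.init p (fun j -> Array.init p (fun i -> sent.(i).(j)))

(* put: fs.(i) j is the value that process i sends to process j.
   The result at process j is a function giving, for a source index i
   in range, the value received from process i. *)
let put (fs : (int -> 'a) array) : (int -> 'a option) array =
  let p = Array.length fs in
  let received = exchange (Array.map (fun f -> init f p) fs) in
  Array.map
    (fun t i -> if 0 <= i && i < p then Some (access t i) else None)
    received

let () =
  (* Process i sends the integer 10 * i + j to process j. *)
  let fs = Array.init 3 (fun i j -> 10 * i + j) in
  let result = put fs in
  match result.(2) 1 with
  | Some v -> Printf.printf "process 2 received %d from process 1\n" v
  | None -> print_endline "nothing received"
```

The exchange is a transposition of the vector of arrays, which matches the index swap described in the text.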

$$
\begin{array}{l}
(\mathbf{open}^{r}\ f) / \{f', \ldots, {}^{c}f^{\,v_n \ldots v_0}, \ldots, f''\} \rightharpoonup_{\delta}^{io} f^{r}_{a} / \{f', \ldots, {}^{r}f^{\,v_n \ldots v_0}[g^{a}_{0}], \ldots, f''\} \\[6pt]
(\mathbf{open}^{r}\ f) / \{f', \ldots, {}^{r}f^{\,v_n \ldots v_0}[g^{a}, \ldots, g^{z}], \ldots, f''\} \rightharpoonup_{\delta}^{io} f^{r}_{\xi} / \{f', \ldots, {}^{r}f^{\,v_n \ldots v_0}[g^{a}, \ldots, g^{z}, g^{\xi}_{0}], \ldots, f''\} \\[6pt]
(\mathbf{open}^{w}\ f) / \{f', \ldots, {}^{c}f^{\,v_n \ldots v_0}, \ldots, f''\} \rightharpoonup_{\delta}^{io} f^{w}_{\xi} / \{f', \ldots, {}^{w}f^{\,\emptyset}, \ldots, f''\} \\[6pt]
(\mathbf{open}^{w}_{l}\ f) / \{f', \ldots, f''\} \rightharpoonup_{\delta}^{io} f^{w}_{\xi} / \{f', \ldots, {}^{w}f^{\,\emptyset}, \ldots, f''\} \quad \text{if } f \notin \{f', \ldots, f''\} \\[6pt]
(\mathbf{close}^{r}\ f^{r}_{\xi}) / \{f', \ldots, {}^{r}f^{\,v_n \ldots v_0}[g^{a}, \ldots, g^{\xi}_{k}, \ldots, g^{z}], \ldots, f''\} \rightharpoonup_{\delta}^{io} () / \{f', \ldots, {}^{r}f^{\,v_n \ldots v_0}[g^{a}, \ldots, g^{z}], \ldots, f''\} \\[6pt]
(\mathbf{close}^{r}\ f^{r}_{\xi}) / \{f', \ldots, {}^{r}f^{\,v_n \ldots v_0}[g^{a}, \ldots, g^{z}], \ldots, f''\} \rightharpoonup_{\delta}^{io} () / \{f', \ldots, {}^{r}f^{\,v_n \ldots v_0}[g^{a}, \ldots, g^{z}], \ldots, f''\} \\[6pt]
(\mathbf{close}^{r}\ f^{r}_{\xi}) / \{f', \ldots, {}^{r}f^{\,v_n \ldots v_0}[g^{\xi}_{k}], \ldots, f''\} \rightharpoonup_{\delta}^{io} () / \{f', \ldots, {}^{c}f^{\,v_n \ldots v_0}, \ldots, f''\} \\[6pt]
(\mathbf{close}^{r}\ f^{r}_{\xi}) / \{f', \ldots, {}^{c}f^{\,v_n \ldots v_0}, \ldots, f''\} \rightharpoonup_{\delta}^{io} () / \{f', \ldots, {}^{c}f^{\,v_n \ldots v_0}, \ldots, f''\} \\[6pt]
(\mathbf{close}^{w}\ f^{w}_{\xi}) / \{f', \ldots, {}^{?}f^{\,v_n \ldots v_0}, \ldots, f''\} \rightharpoonup_{\delta}^{io} () / \{f', \ldots, {}^{c}f^{\,v_n \ldots v_0}, \ldots, f''\} \quad \text{where } ? = w \text{ or } ? = c \\[6pt]
(\mathbf{read}^{\tau}\ f^{r}_{\xi}) / \{f', \ldots, {}^{r}f^{\,\ldots v_k \ldots}[g^{a}, \ldots, g^{\xi}_{k}, \ldots, g^{z}], \ldots, f''\} \rightharpoonup_{\delta}^{io} v_k / \{f', \ldots, {}^{r}f^{\,\ldots v_k \ldots}[g^{a}, \ldots, g^{\xi}_{m}, \ldots, g^{z}], \ldots, f''\} \\
\qquad \text{if } \vdash v_k : \tau \text{ and } m = k + 1, \text{ where } v_k \text{ is the } k\text{-th value of } f \\[6pt]
(\mathbf{read}^{\tau}\ f^{r}_{\xi}) / \{f', \ldots, {}^{r}f^{\,v_n \ldots v_0}[g^{a}, \ldots, g^{\xi}_{k}, \ldots, g^{z}], \ldots, f''\} \rightharpoonup_{\delta}^{io} \mathbf{nc} / \{f', \ldots, {}^{r}f^{\,v_n \ldots v_0}[g^{a}, \ldots, g^{\xi}_{k}, \ldots, g^{z}], \ldots, f''\} \quad \text{if } k > n \\[6pt]
(\mathbf{seek}\ f^{r}_{\xi}\ k) / \{f', \ldots, {}^{r}f^{\,v_n \ldots v_0}[g^{a}, \ldots, g^{\xi}_{m}, \ldots, g^{z}], \ldots, f''\} \rightharpoonup_{\delta}^{io} \mathbf{nc} / \{f', \ldots, {}^{r}f^{\,v_n \ldots v_0}[g^{a}, \ldots, g^{\xi}_{k}, \ldots, g^{z}], \ldots, f''\} \\[6pt]
(\mathbf{write}\ (v, f^{w}_{\xi})) / \{f', \ldots, {}^{w}f^{\,\ldots}, \ldots, f''\} \rightharpoonup_{\delta}^{io} () / \{f', \ldots, {}^{w}f^{\,v \ldots}, \ldots, f''\} \quad \text{if } \mathcal{A}_c(v) \neq \mathrm{True} \\[6pt]
(\mathbf{delete}\ f) / \{f', \ldots, {}^{c}f^{\,\ldots}, \ldots, f''\} \rightharpoonup_{\delta}^{io} () / \{f', \ldots, f''\}
\end{array}
$$

Fig. 5.3. $\delta$-rules of the persistent operators
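The read-mode rules of Figure 5.3 can be illustrated by a small in-memory model in which a file records, for each read channel opened on it, the channel name and its current position. Everything in this sketch is an assumption made for illustration: the record and constructor names, integer file contents, `None` standing for nc, and the choice of raising `Invalid_argument` where the paper leaves the behavior unspecified.

```ocaml
(* A file is closed, or open in read mode with a list of
   (channel name, position) pairs. *)
type mode = Closed | Read of (string * int) list

type file = { values : int array; mode : mode }

(* open^r: adds a new channel named xi, positioned on the first value. *)
let open_r xi file =
  match file.mode with
  | Closed -> { file with mode = Read [ (xi, 0) ] }
  | Read chans -> { file with mode = Read (chans @ [ (xi, 0) ]) }

(* read: returns the value under the channel and moves the channel to
   the next position, or returns None when no more value can be read. *)
let read xi file =
  match file.mode with
  | Read chans when List.mem_assoc xi chans ->
      let k = List.assoc xi chans in
      if k >= Array.length file.values then (None, file)
      else
        let move (name, pos) =
          if name = xi then (name, pos + 1) else (name, pos)
        in
        (Some file.values.(k), { file with mode = Read (List.map move chans) })
  | _ -> invalid_arg "read: channel is not open on this file"

(* close^r: removes the channel; closing the last one closes the file. *)
let close_r xi file =
  match file.mode with
  | Closed -> file
  | Read chans -> (
      match List.remove_assoc xi chans with
      | [] -> { file with mode = Closed }
      | rest -> { file with mode = Read rest })
```

Two channels opened on the same file keep independent positions, which is why the rules carry the whole list of channels rather than a single cursor.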

Third, we complete our semantics by giving the $\delta$-rules $\rightharpoonup_{\delta}^{io}$ of the I/O operators, given in Figure 5.3. The open operation opens a file (in read or write mode) and returns a new channel, pointing to this file, to access the values of the file or to write values in this file. Opening a file in write mode gives an empty file. If possible, $\mathbf{read}^{\tau}$ gives the value of type $\tau$ contained in the file from the channel. If no more value could be read then