EXTERNAL MEMORY IN BULK-SYNCHRONOUS PARALLEL ML
∗
FRÉDÉRIC GAVA
†
Abstrat. Afuntionaldata-parallellanguage alledBSMLwasdesignedforprogrammingBulk-SynhronousParallel
algo-rithms,amodelofomputingwhihallowsparallelprogramstobeportedtoawiderangeofarhitetures.BSMLisbasedonan
extensionoftheMLlanguagewithparalleloperationsonaparalleldatastruturealledparallelvetor. Theexeutiontimeanbe
estimated. Dead-loksandindeterminismareavoided. Forlargesaleappliationswhereparallelproessingishelpfulandwhere
thetotalamountofdataoftenexeedsthetotalmainmemoryavailable,paralleldiskI/Obeomesaneessity. Inthispaper,we
presentalibraryofI/OfeaturesforBSMLanditsformalsemantis.Aostmodelisalsogivenandsomepreliminaryperformane
resultsareshownforaommodityluster.
Keywords. ParallelFuntionalProgramming,ParallelI/O,Semantis,BSP.
1. Introdution. Some problems require performane that an only be provided by massivelyparallel
omputers. Programmingthesekindofomputersisstilldiult. Manyimportantomputationalappliations
involve solving problems with very large data sets [44℄. Suh appliations are also referred as out-of-ore
appliations. For example astronomial simulation [47℄, rash test simulation [10℄, geographi information
systems[32℄, weatherpredition [52℄, omputational biology[17℄, graphs[40℄ oromputationalgeometry [11℄
and many other sienti problems an involve data sets that are too large to t in the main memory and
therefore fall into this ategory. For another example, the Large Hadron Collider of the CERN laboratory
forndingtraesofexotifundamentalpartiles(webpageatlh-new-homepage.web.ern.h),whenstarts
running, thisinstrumentwill produes about10 Petabytesa month. The earth-simulator, the mostpowerful
parallel mahine in the top500 list, has 1 Petabyte of total main memory and 100 Petabytes of seondary
memories. Usingthemainmemoryisnotenoughtostoreallthedataofanexperiment.
Using parallelisman redue the omputation time andinrease theavailable memorysize, but for
hal-lenging appliations, the memory is always insuient in size. For instane, in a mesh deomposition of a
mehanialproblem, asientistmightwanttoinreasethemeshsize. Toinreasetheavailablememorysize,a
trivialsolutionis to usethevirtual memory mehanismpresentin modern operatingsystems. This hasbeen
established as a standard method for managing external memory. Its main advantage is that it allows the
appliationtoaesstoalargevirtualmemorywithouthavingtodealwiththeintriaiesofblokedseondary
memoryaesses. Unfortunately,thissolutionisineientifstandardpagingpoliy isemployed[7℄. Togetthe
best performanes, thealgorithmsmustberestruturedwithexpliitI/Oallsonthisseondarymemory.
Suhalgorithmsare generallyalledexternal memory (EM)algorithms andaredesignedforlarge
ompu-tational problemsin whih thesizeoftheinternalmemoryof theomputerisonlyasmallfrationofthesize
oftheproblem([55,53℄forasurvey). ParallelproessingisanimportantissueforEMalgorithmsforthesame
reasons that parallel proessing is of pratial interest in non-EM algorithm design. Existing algorithm and
datastrutureswereoftenunsuitableforout-of-oreappliations. Thisislargelyduetotheneedofloalityon
data referenes, whih is not generallypresent when algorithms are designedfor internal memory due to the
permissivenatureof thePRAMmodel: parallel EMalgorithms[54℄are new anddonotwork optimallyand
orretlyinlassial parallelenvironments.
Delarativeparallellanguagesareneededto simplifytheprogrammingofmassivelyparallel arhitetures.
Funtionallanguagesareoftenonsidered. Thedesignofparallelprogramminglanguagesisatradeobetween
thepossibilityto expresstheparallel featuresthat areneessaryforpreditableeieny (but withprograms
that are more diult to write, prove and port) and the abstration of suh features that are neessaryto
make parallel programmingeasier (but whih should not hinder eieny and performane predition). On
the onehand the programs should be eient but without the prieof non portability and unpreditability
of performanes. The portability of ode is needed to allow ode reuse on a wide variety of arhitetures.
Thepreditabilityofperformanesisneededtoguaranteethattheeieny willalwaysbeahieved,whatever
arhitetureisused.
∗
This work is supported by the ACI Grid program from the Frenh Ministry of Researh,under the projet Caraml
(http://www.araml.org )
†
Laboratory of Algorithms,Complexityand Logi(LACL),Universityof ParisXII, Val-de-Marne, 61avenuedu Généralde
Fig.2.1. TheBSPmodelofomputation
Another important harateristi of parallel programs is the omplexity of their semantis. Deadloks
and non-determinismoftenhinderthepratialuse ofparallelismby alargenumberofusers. Toavoid these
undesirableproperties,there isatrade-o betweentheexpressiveness ofthelanguageandits struturewhih
oulddereasetheexpressiveness.
We are urrently exploring the intermediate position of the paradigmof algorithmi skeletons [6, 42℄ in
ordertoobtainuniversalparallellanguageswheretheexeutionostaneasilybedeterminedfromthesoure
ode. Inthisontext, ostmeanstheestimate ofparallelexeutiontime. Thislastrequirementforestheuse
ofexpliitproessesorrespondingtotheproessorsoftheparallelmahine. Bulk-SynhronousParallelMLor
BSML is anextension of ML for programmingBulk-SynhronousParallel algorithmsas funtional programs
withaompositionalostmodel. Bulk-SynhronousParallel(BSP)omputingisaparallelprogrammingmodel
introduedbyValiant[46,50℄tooerahighdegreeofabstrationlikePRAMmodelsandyettoallowportable
andpreditableperformaneonawidevarietyofarhitetureswitharealistiostmodelbasedonastrutured
parallelism. Deadloksandindeterminismareavoided. BSPprogramsareportablearossmanyparallel
arhi-tetures. Suh algorithmsoerpreditable andsalableperformanes([38℄for asurvey) andBSMLexpresses
themwithasmallsetofprimitivestakenfromtheonuentBS
λ
alulus[37℄. Suhoperationsareimplementedasalibrary for thefuntional, witha strit evaluation strategy, programminglanguageObjetive Caml [33℄.
Wereferto[27℄formoredetailsaboutthehoieofthisstrategyformassivelyparallelomputing.
Parallel disk I/O has been identied as a ritial omponent of a suitable high performane omputer.
ResearhinEMalgorithmshasreentlyreeivedonsiderableattention. Overthelastfewyears,omprehensive
omputingandostmodelsthatinorporatedisksandmultipleproessorshavebeenproposed[35,55,54℄,but
notwithalltheaboveelements. [14,16℄showedhowanEMmahineantakefulladvantageofparalleldiskI/O
andmultipleproessors. ThismodelisbasedonanextensionoftheBSPmodelforI/Oaesses. Ourresearh
aimsatombiningtheBSPmodelwithfuntionalprogramming. WenaturallyneedtoalsoextendBSMLwith
I/OaessesforprogrammingEMalgorithms. Thispaperisthefollow-upto ourworkonimperativefeatures
ofourfuntionaldata-parallellanguage[22℄.
Thispaperdesribesafurtherstepafter[21℄towardsthisdiretion. Theremainderofthispaperisorganized
as follows. First we review the BSP model in Setion 2 and, then, briey present the BSML language. In
setion3weintroduetheEM-BSPmodelandtheproblemsthat appearin BSML.Insetion4,wethengive
newprimitivesforourlanguage. Insetion5,wedesribetheformalsemantisofourlanguagewithpersistent
features. Setion 6 is devoted to the formal ost model assoiated to our language and Setion 7 to some
benhmarksofaparallelprogram. Wedisussrelatedworkinsetion8andweendwithonlusionsandfuture
researh(setion9).
2. Funtional Bulk-Synhronous Parallel ML.
exeutesolletiverequestsofasynhronizationbarrier. Forthesakeofoniseness,wereferto[5,46℄formore
details. Inthismodel,aparallel omputationissubdividedinto supersteps(Figure2.1) attheend ofwhiha
barriersynhronizationandaroutingareperformed. Afterthat,allrequestsfordatapostedduringapreeding
supersteparefullled. Theperformaneofthemahineisharaterizedby3parametersexpressedasmultiples
oftheloalproessingspeed
r
:(i)
p
isthenumberofproessor-memorypairs;(ii)
l
isthetimerequiredforaglobalsynhronizationand(iii)
g
is the time for olletively deliveringa 1-relation, a ommuniation phase where every proessorreeives/sendsatmostoneword. Thenetworkandeliveran
h
-relationintimeg
×
h
foranyarityh
.Theseparametersaneasilybeobtainedusingbenhmarks[28℄. Theexeutiontimeofasuperstep
s
isthusthesumofthemaximalloal proessingtime, themaximaldatadeliverytime andtheglobalsynhronization
time,i.e, Time
(
s
) = max
i:processor
w
s
i
+ max
i:processor
h
s
i
∗
g
+
l
wherew
s
i
=loalproessing timeonproessori
during supersteps
andh
s
i
= max
{
h
s
i+
, h
s
i
−
}
whereh
s
i+
(resp.h
s
i
−
)isthenumberofwordstransmitted (resp.reeived)byproessor
i
duringsupersteps
. TheexeutiontimeP
s
Time(
s
)
ofaBSPprogramomposed ofS
superstepsisthereforethesumof3terms:t
comp
+
t
comm
+
L
where
t
comp
=
P
s
max
i
w
i
s
t
comm
=
H
×
g
whereH
=
P
s
max
i
h
s
i
L
=
S
×
l.
Ingeneral
t
comp
, H
andS
arefuntionsofp
andofthesizeofdatan
,orofmoreomplexparameterslikedataskew and histogram sizes. To minimizeexeution time, the BSP algorithm designmust jointly minimize the
number
S
ofsuperstepsandthetotalvolumeh
(resp.t
comp
)andimbalaneh
s
(resp.
t
comm
)ofommuniation(resp. loal omputation). Bulk Synhronous Parallelism and the Coarse-Grained Multiomputer (CGM),
whih an be seenasaspeial ase ofthe BSP model are used for alargevariety ofappliations. As stated
in [13℄ A omparison of the proeedings of the eminent onferene in the eld, the ACM Symposium on
Parallel Algorithms and Arhitetures between thelate eighties and thetime from the mid-nineties to today
revealsastartlinghangein researhfous. Today,themajorityofresearhinparallelalgorithmsiswithinthe
oarse-grained,BSPstyle,domain.
bsp_p:unit
→
int bsp_l:unit→
oat bsp_g:unit→
oatmkpar:( int
→
α
)→
α
parapply:(
α
→
β
)par→
α
par→
β
partype
α
option=None|Someofα
put:(int
→
α
option)par→
(int→
α
option) parat:
α
par→
int→
α
Fig.2.2. TheCoreBsmllibLibrary
2.2. Bulk-Synhronous Parallel ML. BSML does not rely on SPMD programming. Programs are
usualsequential ObjetiveCaml(OCaml)programs[33℄ but work onaparalleldatastruture. Someof the
advantages aresimplersemantisandbetterreadability. Theexeutionorderfollowsthereadingorder in the
soure ode (or, at least, the resultsare suh asseems to followthe exeution order). There is urrently no
implementationofafullBSMLlanguagebutratherapartialimplementationasalibraryforOCaml(webpage
athttp://bsmllib.free.fr/).
Theso-alledBSMLliblibraryisbasedon theelementsgiven inFigure 2.2. Theygiveaess tothe BSP
parameters of the underling arhiteture: bsp_p() is
p
the stati number of proesses (this value does nothange during exeution), bsp_g() is
g
thetime for olletively delivering a1-relationand bsp_l()isl
thetimerequiredforaglobalsynhronizationbarrier.
Thereisanabstratpolymorphitype
α
parwhihrepresentsthetypeofp
-wideparallelvetorsofobjetsoftype
α
oneper proessor. BSMLparallel onstrutsoperateon parallelvetors. Thoseparallel vetorsarereatedbymkparsothat( mkparf)stores(fi)onproess
i
fori
between0
andp
−
1
:mkpar f
= (
f0) (
f1)
· · ·
(
fi
)
· · ·
(
f(
p
−
1))
We usually write f as funpid
→
e to show that the expressione
may be dierent on eah proessor. Thisit is said to be global. A usual ML expression whih is not within a parallel vetor is alled repliate, i.e,
identialtoeahproessor. ABSPalgorithmisexpressedasaombinationofasynhronousloalomputations
(rst phase of a superstep) and phases of global ommuniation (seond phase of a superstep) with global
synhronization (third phaseofa superstep). Asynhronousphasesareprogrammedwith mkpar andapply
suhthat(apply( mkparf) ( mkpare))stores(( fi)(ei))onproess
i
:apply
f
0
f
1
· · ·
f
i
· · ·
f
p
−
1
v
0
v
1
· · ·
v
i
· · ·
v
p
−
1
= (
f
0
v
0
) (
f
1
v
1
)
· · ·
(
f
i
v
i
)
· · ·
(
f
p
−
1
v
p
−
1
)
Letusonsiderthefollowingexpression:
let vf=mkpar(funpidx
→
x+pid )andvv=mkpar(funpid
→
2∗
pid+1)inapply vfvv
Thetwoparallelvetorsarerespetivelyequivalentto:
funx
→
x+ 0
funx→
x+ 1
· · ·
funx→
x+
i
· · ·
funx→
x+ (
p
−
1)
and
0 3
· · ·
2
×
i
+ 1
· · ·
2
×
(
p
−
1) + 1
Theexpressionapplyvfvvisthenevaluatedto:
0 4
· · ·
2
×
i
+ 2
· · ·
2
×
(
p
−
1) + 2
ReadersfamiliarwithBSPlib[28℄willobservethatweignorethedistintionbetweenaommuniationrequest
and its realization at the barrier. The ommuniation and synhronization phases are expressed by put.
Consider the expression: put (mkpar(fun i
→
fsi
))(
∗
)
. To send a value v from proessj
to proessi
, the funtion fsj
at proessj
must besuh that(
fsj
i
)
evaluatesto Somev. Tosend novalue from proessj
to proessi
,(
fsj
i
)
mustevaluatetoNone. Theexpression(
∗
)
evaluatestoaparallelvetorontainingafuntion fdi
ofdeliveredmessagesoneveryproessi
. Atproessi
,(fdi
j
)
evaluatestoNoneifproessj
sentnomessage toproessi
orevaluatestoSomevifproessj
sentthevaluevto theproessi
.The full language would also ontain asynhronousprojetionoperationat. (atven) returns thenth
valueoftheparallelvetorve:
at
v
0
· · ·
v
n
· · ·
v
p
−
1
n
=
v
n
atexpressesommuniationandsynhronizationphases. Withoutit,theglobalontrolannottakeintoaount
dataomputedloally. Globalonditionalisneessaryforexpressingalgorithmslike:RepeatParallelIteration
UntilMaxofloalerrors
< ǫ
. Thenestingofpartypesisprohibitedandtheprojetionshouldnotbeevaluated insidethesopeofamkpar. Ourtypesystemenforestheserestritions[23℄.2.3. Examples.
2.3.1. Often Used Funtions. Some usefulfuntions anbedened byusing onlythe primitives. For
examplethefuntionrepliatereatesaparallelvetorwhihontainsthesamevalueeverywhere.Theprimitive
applyanbeusedonlyforaparallelvetoroffuntionswhihtakeonlyoneargument. Todealwithfuntions
whihtaketwoargumentsweneedtodenetheapply2funtion.
letrepliatex=mkpar(fun pid
→
x)letapply2vfv1v2 =apply (applyvfv1)v2
It is also ommonto apply thesame sequentialfuntion at eah proess. This anbedone using theparfun
funtions. Theyonlydierin thenumberofargumentstoapply:
letparfunfv=apply(repliatef)v
Itisalsoommontoapplyadierentfuntiononaproess. applyat
n f
1
f
2
v
appliesfuntionf
1
atproessn
andfuntion
f
2
at otherproesses:letapplyatnf1f2v=
apply(mkpar(funi
→
ifi=nthenf1elsef2))v2.3.2. CommuniationFuntion. Ourexampleisthelassialomputationoftheprex ofalist. Here
wemakethehypothesisthattheelementsofthelistaredistributedtoalltheproessesaslists. Eahproessor
performs aloal redution,then sendsitspartial resultto the followingproessorsand nallyloallyredues
itspartialresultwiththesentvalues. Takeforexamplethefollowingexpression:
san_list_direte(+)
[1; 2] [3; 4] [5]
Itwillbeevaluatedto:
[
e
+ 1;
e
+ 1 + 2] [
e
+ 1 + 2 + 3;
e
+ 1 + 2 + 3 + 4; ] [
e
+ 1 + 2 + 3 + 4 + 5]
for a prex of three proessors and where
e
is the neutral element (here0
). To do this, we need rst the omputationoftheprexofaparallelvetor:(
∗
san_diret:(α
→
α
→
α
)→
α
→
α
par→
α
par∗
) letsan_diretopevv=letmkmsgpidvdst=ifdst<pidthenNoneelseSomevin
letpros_lists =mkpar(funpid
→
from_to0pid)in letreeivedmsgs=put(apply(mkparmkmsg)vv) inletvalues_lists =parfun2List. map
(parfun(omposenoSome)reeivedmsgs)pros_listsin
applyat0(fun_
→
e )(List. fold_left ope)values_listswhere
List. map
f
[
v
0
;
. . .
;
v
n
] = [(
f v
0
);
. . .
; (
f v
n
)]
List.fold_left
f e
[
v
0
;
. . .
;
v
n
] =
f
(
· · ·
(
f
(
f e v
0
)
v
1
)
· · ·
)
v
n
from_ton m
= [
n
;
n
+ 1;
n
+ 2;
...;
m
]
noSome(Somev
)=
v
ompose
f g x
= (
f
(
g x
))
.
Then,weandiretly havetheprexoflistsusingsomegenerisan:
letsan_wide sanseq_san_lastmapopevv=
letloal_san=parfun(seq_san_lastope)vvin
letlast_elements=parfunfstloal_sanin
letvalues_to_add=(sanopelast_elements)in
letpop=applyat0(funxy
→
y)opinparfun2map(popvalues_to_add)(parfunsndloal_san)
letsan_wide_diretseq_san_lastmapopevv=
san_widesan_diret seq_san_lastmapopevv
letsan_listsanopevl=
san_widesanseq_san_lastList.mapopevl
(
∗
san_list_diret :(α
→
α
→
α
)→
α
→
α
listpar→
α
listpar∗
) letsan_list_diretopevl=san_listsan_diretopevlwhereseq_san_last
f e
[
v
0
;
v
1
;
. . .
;
v
n
] = (
last,
[(
f e v
0
);
f
(
f e v
0
)
v
1
;
. . .
;
last])
wherelast
=
f
(
· · ·
(
f
(
f e v
0
)
v
1
)
· · ·
)
v
n
. TheBSP ostformulaoftheabovefuntion (assumingophasaonstant ostc
op
)isthus2
×
N
×
c
op
×
r
+ (
p
−
1)
×
s
×
g
+
l
wheres
denotesthesizeinwordsofavalueomputeby thesanandN
thelengthofthebiggestlistheld ataproess. Wehavethusthetimeto omputethepartialP/M P/M P/M P/M P/M
Disk
D
−
1
Disk0
InternalBusCPU
Memory
@
Network
Router
Fig.3.1. ABSPomputerwithexternalmemories
2.4. Advantages of Funtional BSP Programming. OneimportantbenetoftheBSPmodelisthe
abilitytoauratelypredittheexeutiontimerequirementsofparallelalgorithms. Communiationsarelearly
separated from synhronization, i. e., this avoids deadloks and it an be performed in any order, provided
that the information is delivered at the beginning of the next superstep. This is ahieved by onstruting
analytial formulas that are parameterizedby afew valueswhih aptured the omputation, ommuniation
andsynhronizationperformaneoftheparallelsystem.
The larity, abstration and formal semantis of funtional language make them desirable vehiles for
omplexsoftware. Thefuntionalapproahofthisparallelmodelallowsthere-useof suitabletehniquesfrom
funtional languagesbeauseafewnumberofparallel primitivesisneeded. PrimitivesoftheBSML language
with astrit strategyare derivedfrom aonuentalulus [37℄ soparallel algorithms arealso onuent and
keeptheadvantagesoftheBSPmodels: nodeadlok,eientimplementationusingoptimizedommuniation
algorithms, statiostformulasand ostprevisions. Thelazy evaluation strategyof purefuntional language
isnotsuited fortheneedofthemassivelyparallelprogrammer. Lazyevaluationhastheunwantedpropertyof
hidingomplexity fromthe programmer[27℄. The stritstrategy ofOCamlmakesthe BSMLlib abettertool
forhighperformaneappliationsbeauseprogramsaretransparentinthesenseofmakingomplexityexpliit
inthesyntax.
Also,asin funtionallanguages,weouldeasily prove and ertify funtional implementationof suh
algo-rithms withaproofassistant[1,4℄asin [20℄. Usingtheextration possibilityoftheproof assistant,weould
generate aertied implementation to be used independently of the sequential orparallel implementation of
theBSMLprimitives.
3. ExternalMemory.
3.1. The EM-BSP model. Modern omputers typially haveseverallayersof memorieswhih inlude
the main memory and ahesas well as disks. We restritourselves to the two-level model [54℄ beause the
speeddierene between disks and the main memory is muh more signiant than between other layers of
memories. [16℄extendedtheBSPmodelto inludeseondaryloal memories. The basiideaissimpleand it
isillustratedin Figure 3.1. Eahproessorhas,in additionto itsloal memory,anexternalmemory (EM)in
theformofasetofdisks. ThisideaisappliedtoextendtheBSPmodeltoitsEMversionalledEM-BSPby
addingthefollowingparameterstothestandardBSP parameters:
(i)
M
istheloalmemorysizeofeahproessor;(ii)
D
is thenumberofdiskdrives ofeahproessor;(iii)
B
isthetransferbloksize ofadiskdrive,and(iv)
G
is theratio of loal omputationalapaity (number of loal omputationoperations)dividedbyloal I/Oapaity(numberof bloksof size
B
that anbetransferred betweentheloal disksand memory)perunit time.
Inmanypratialases,allproessorshavethesamenumber ofdisksand,thus,themodelisrestritedto
thatase(althoughthemodelforbidsdierentmemorysizes). Thediskdrivesofeahproessoraredenotedby
D
0
,
D
1
, . . . ,
D
D
−
1
. Eahdriveonsistsofasequeneoftraks whihanbeaessedbydiretrandomaess. Atrakstoresexatlyoneblokof
B
words. EahproessoranuseallitsD
diskdrivesonurrently andtransferD
×
B
wordsfrom/to theloal disks to/from itsloal memoryin a singleI/Ooperationbeingat ostG
. Inassumed to be ableto storein its loal main memory at least somebloks from eah disk at the sametime,
i.e.,
M >> DB
.Likeomputation onthe BSPmodel, theomputation ofthe EM-BSPmodelproeeds in asuession of
supersteps. TheommuniationostsarethesameasfortheBSPmodel. TheEM-BSPmodelallowsmultiple
I/Ooperationsduringtheomputationphaseofthesuperstep. Thetotalostofeahsuperstepisthusdenedas
t
comp,io
+
t
comm
+
L
wheret
comp,io
istheomputationalostandadditionalI/Oosthargedforthesupersteps,i.e,
t
comp,io
=
P
s
max
i
(
w
s
i
+
m
s
i
)
wherem
s
i
istheI/Oostinurredbyproessori
duringsupersteps
. Wereferto[16℄to havetheEM-BSPomplexityofsomelassialBSPalgorithms.
3.2. Examples of EM algorithms. Our rst example is the matrix inversion whih is used by many
appliations as a diret method to solve linear systems. The omputation of the inverse of a matrix
A
an be derivedfrom itsLUfatorization. [8℄presentsthe LUfatorization bybloks. Forthis parallelout-of-orefatorization, the matrix is divided in bloks of olumns alled superbloks. The width of the superblok is
determinedbytheamountofphysialavailablememory: onlybloksoftheurrentsuperblokareinthemain
memory,theothersareondisks. Thealgorithmfatorizethematrixfromlefttoright,superblokbysuperblok.
Eah time a newsuperblok of the matrix is fethed in the main memory (alled the ative superblok), all
previouspivoting andupdateofahistoryoftheright-lookingalgorithmare appliedto theativesuperbloks.
Onethelastsuperblokisfatorized,thematrixisre-readtoapplytheremainingrowpivotingofthereursive
phases. Note that the omputation isdone data in plae, thematrix hasbeenrst distributed onproessors
andthus,forloadbalaning,aylidistributionofthedataisused.
[9℄presentsPRAMalgorithmsusingexternal-memoryfor graphproblemsasbionneted omponentsofa
graphorminimumspanningforest.Oneofthemisthe3-oloringofayleappliedtondinglargeindependents
setsfortheproblemoflistranking(determine,foreah node
v
ofalist,therankofv
dene asthenumberoflinksfrom
v
totheendofthelist). Themethodsforsolvingitistoupdatesatteredsuessorandpredeessorolorsas neededafter re-oloring agroupof nodes of the list without sortingor sanningthe entire list. As
before, thealgorithmsworksgroupbygroupswithonlyonegroupinthemainmemory.
Thelastexampleisthemulti-stringsearhproblemwhihonsistsofdeterminingwhihof
k
patternstrings ourin anotherstring. Importantappliations onbiologialdatabasesmakeuseofverylargetextolletionsrequiringspeializednontrivialsearhoperations. [19℄desribesanalgorithmforthisproblemwithaonstant
numberofsupersteps andbased onthedistributionof aproperdata strutureamong theproessorsand the
disksto redue andbalanetheommuniationost. Thisdatastruture isbasedonabindtree builtonthe
suxes of thestrings and thealgorithm works onlongest ommon prex onsuh trees and bylexiographi
order. Thealgorithmtakesadvantageofdisks byonlykeepingapartofabind treein themain memoryand
byolletingsubpartoftreesduringthesupersteps.
4. ExternalMemoryin BSML.
4.1. Problems by Adding I/O in BSML. Themainproblem byaddingexternalmemoryand soI/O
operators to BSML is to keep safethe fat that in the global ontext, the repliatevalues, i.e, usual OCaml
valuesrepliateoneahproessor,arethesame. Suhvaluesaredediatedtotheglobalontroloftheparallel
algorithms. Takeforexamplethefollowingexpression:
lethan=open_in"le. dat"in
if( at(mkpar(funpid
→
( pidmod2)=0))(input_valuehan))thensan_diret(+)0(repliate1)
else( repliate1)
Itis nottruethat theleoneahproessorontainsthesamevalue. Inthis ase,eah proessorreads onits
seondarymemoryadierentvalue. Wewouldhaveobtainedaninoherentresultbeauseeahproessorreadsa
dierentintegeronthehannel hanandsomeofthemwouldexeutesan_diretwhihneedasynhronization.
Otherswould exeuterepliatewhihdoesnotneedasynhronization. This breakstheonuentresultofthe
BSML languageand the BSP model of omputation with its global synhronizations. If this expression had
been evaluated with theBSMLlib library, wewould havea breakdown of the BSP omputerbeauseat is a
globalsynhronousprimitive. NotethatwealsohavethiskindofproblemsintheBSPlib[28℄wheretheauthors
P/M P/M P/M P/M P/M
Disk
0
Disk
D
g
−
1
Network
Router
Fig.4.1.ABSPomputerwithshareddisks
leta=mkpar(fun i
→
ifi=0then(open_in"le.dat");() else())in( open_out"le.dat")
where () is an empty value. If this expression had been evaluated with the BSMLlib library, only the rst
proessorwouldhaveopenedthelein areadmode. After,eahproessoropenedthelewiththesamename
in awrite mode exepttherstone. This lehasalreadybeenopenedin readmode. Wewouldalsohavean
inoherentresultbeausetherst proessorraisedanexeptionwhih isnotaughtat allby otherproesses
in theglobalontext. Thisproblem ofside-eetsouldalso beombinedwiththerstproblem ifthere isno
leatthebeginningoftheomputation. Takeforexamplethefollowingexpression:
lethan=open_out "le .dat"in
letx=mkpar(funi
→
ifi=0 then(ouput_value0) else())inouput_value1;loseha;
lethan=open_in"le.dat"in
if(at(mkpar( fun pid
→
(pid mod2)=0))(input_valuehan))thensan_diret(+)0(repliate1)
else(repliate1)
Therstproessoraddstheintegers1and2onitsleandotherproessorsaddtheinteger2ontheirles. As
in therstexample,wewouldhaveabreakdownof theBSPomputerbeausetheintegerread would notbe
thesameandatisaglobalsynhronousprimitive.
4.2. The proposed solution. Oursolution isto havetwokindsof les: globaland loal ones. In this
way,wehavetwokindsofI/Ooperators. LoalI/Ooperators donothaveto ourin theglobalontext and
global I/O ones do not haveto our loally. Loal les are in loal le systems whih are presents in eah
proessorasintheEM-BSPmodel. Globallesareinagloballesystem. Theselesneedtobethesamefrom
thepointofviewof eahnode. Thegloballesystemisthusin shared disks (as inFigure4.1)orasaopyin
eahproessor. Theythusalwaysgivethesamevaluesfortheglobalontext. Notethatiftheyareonlyshared
disksandnotloal ones,theloalle systemsouldbeindierentdiretories,oneperproessorin theglobal
lesystem.
An advantageofhavingshareddisksistheaseofsomealgorithmswhih donothavedistributeddata at
thebeginningoftheomputation. Asthosewhihsort,thelistofdatatosortisinagloballeatthebeginning
oftheprogramandin anothergloballeattheend. Ontheotherhand,intheaseofadistributedgloballe
system,theglobaldataarealsodistributed andprogramsarelesssensitivetotheproblemoffaults. Thus, we
havetwoimportant asesfor the global lesystem whih ouldbe seenasanew parameterof the EM-BSP
mahine: haveweshareddisksornot?
Intherst ase,the onditionthatthe globalles arethesamefor eah proessorpointof viewrequires
some synhronizations forsome global I/Ooperators asreated, opened ordeleted ale. For example, it is
impossibleorun-deterministiforaproessortoreatealeinthegloballesystemifatthesametimeanother
proessordeleted it. On theother hand, reading(resp. writing) valuesfrom (resp. to)les do notneed any
synhronization. Alltheproessorsreadthesamevaluesin thegloballeandonlyoneoftheproessorsneeds
toreallywritethevalueontheshareddisks. Intheaseofaglobaloutputoperatoronlyoneoftheproessors
writesthevalueandintheaseofaglobal inputoperatorthevalueisrstread fromthedisksbyaproessor
and thenis read by other proessorsfrom theoperatingsystem buers. In this way, forall global operators,
thereisnotabottlenekoftheshareddisks.
Intheseondase,alltheles, loaland globalones,aredistributed andnosynhronizationis neededat
Notethatmanymodern parallelmahineshaveonurrentshareddisks. Suhdisksarealwaysonsidered
asuser disks, i.e,diskswhere theusersput thedataneededfortheomputationswhereasloaldisks areonly
generallyusedfor theparallelomputations ofprograms. Forexample,theearth simulatorhas1,5 Petabytes
for users as mass storage disks and aspeial network to aess them. If there are no shared disks, NFS or
salablelowlevellibrariesasin[36℄areabletosimulateonurrentshareddisks. Notealsothatiftheyareonly
shareddisks, loal disks ould be simulatedby usingdierentdiretoriesfor theloal disks of theproessors
(onediretoryforoneproessor).
0
1e-05
2e-05
3e-05
4e-05
5e-05
6e-05
0
2000
4000
6000
8000
10000
12000
14000
Size in words of the values on each disk
Write or read values
1.4e-05
1.45e-05
1.5e-05
1.55e-05
1.6e-05
1.65e-05
1.7e-05
1.75e-05
1.8e-05
1.85e-05
3900
3950
4000
4050
4100
4150
4200
4250
4300
Size in words of the values on each disk
Write or read values
Fig.4.2. BenhmarksofEMparameters
4.3. Our new model. After some experiments to determine the EM-BSP parameters of our parallel
mahine,wehavefoundthatoperatingsystemsdonotread/writedatainaonstanttimebutin alineartime
dependingonthesizeofthedata. Wealsonotiethatthereisanoverheaddependingonthesizeofthebloks,
to read/write thisvalue from/tothe
D
onurrentdisks. Figure 4.2givestheresultsof this experimentonaPCwith3disks,eahdiskwith bloksof4096words(seondsare plottedonthevertialaxis). Thisprogram
wasrun10000timesandtheaveragewastaken. Suhresultsarenotalteredifwedereasethenumberofdisks.
Ourproposedsolutiongivestheproessorsaesstotwokindsofles: globalandloalones. Bythis way,
ourmodelalled EM
2
-BSP extendstheBSPmodelto itsEM
2
versionwithtwokindsofexternalmemories,
loal and globalones. Eahloalle systemwill beonloal onurrentdisks asin the EM-BSPmodel. The
globalonewillbeononurrentshareddisks(asinFigure4.1)iftheyexistorrepliateontheloaldisks. The
EM
2
-BSP model isthus ableto takeintoaountthe timeto read thedata and todistributed theminto the
proessors. ThefollowingparametersarethusaddingtothestandardBSPparameters:
(i)
M
istheloalmemorysizeofeah proessor;(ii)
D
l
isthenumberofindependentdisksofeahproessor;
(iii)
B
l
isthetransferbloksizeofaloaldisk;
(iv)
G
l
isthetimetoreadorwriteinparallelonewordoneahloal disk;
(v)
O
l
istheoverheadoftheonurrentloaldisks;
(vi)
D
g
isthenumberofindependentshareddisks(orglobaldisks);
(vii)
B
g
isthetransferbloksize ofaglobaldisk;
(viii)
G
g
is thetimetoread orwritein parallelonewordoneah globaldiskand
(ix)
O
g
istheoverheadoftheonurrentglobaldisks.
Ofourse,iftherearenoshareddisksornoloaldisks:
D
l
=
D
g
,
B
l
=
B
g
,
G
l
=
G
g
and
O
l
=
O
g
. Aproessor
isableto read/write
n
wordsto itsloaldisksintime⌈
n
D
l
⌉ ×
G
l
+
⌈
n+1
D
l
B
l
⌉ ×
O
l
and
n
wordstotheglobaldisksintime
⌈
n
D
g
⌉ ×
G
g
+
⌈
D
n+1
g
B
g
⌉ ×
O
g
.Asin theEM-BSPmodel,theomputationoftheEM
2
-BSPmodel proeedsinasuessionofsupersteps.
TheommuniationostsarethesameasfortheEM-BSPmodelandmultipleI/Ooperationsarealsoallowed
duringtheomputationphaseofasuperstep.
Notethat
G
g
isnot
g
evenifproessorsaesstotheshareddisksbythenetwork(inaseofsomeparallelmahines):
g
is the time to perform a1
-relation andG
g
is the time to read/write
D
words on the sharedonurrentdisks. Itoulddependon
g
insomeparallelmahineaslustersbutitoulddependonmanyotherhardwareparametersif,forexample,thereisaspeialnetworktoaess tothesharedonurrentdisks.
4.4. New Primitives. In this setion wedesribethe ore of ourI/O library, i. e., the minimal set of
primitivesfor programmingEM
2
-BSPalgorithms. Thislibrarywill beinorporatedin the nextreleaseofthe
BSMLlib. ThisI/Olibrary isbasedon theelementsgivenin Figure4.3. Asin the BSMLliblibrary, wehave
funtionstoaesstotheEM
2
-BSPparametersoftheunderliningarhiteture. Forexample,embsp_lo_D()
is
D
l
the number of loal disks and glo_shared() givesif the global le systemis shared ornot. Sine we
havetwole systems, weneed twokindsof namesand twokinds ofabstrat types ofoutput hannels (resp.
input hannels): glo_out_hannel (resp. glo_in_hannel) and lo_out_hannel (resp. lo_in_hannel) to
read/writevaluesfrom/to globalorloalles.
Weanopenanamedleforwriting. Theprimitivereturnsanewoutputhannel onthat le. Theleis
trunatedtozerolengthifitalreadyexists. Eitheritisreatedortheprimitivewillraiseanexeptionifthele
ouldnotbeopened. For this, wehavetwokindsof funtions forglobal andloal les: (glo_open_outF)
whihopensthe globalle Fin write mode andreturns aglobalhannelpositioned at the beginning of that
le and(lo_open_outf ) whihopens theloal le fin writemodeand returnsaloal hannel positioned
at thebeginningof that le. Inthe samemanner, wehavetwofuntions,glo_open_inandlo_open_in
foropeninganamedlein readmode. Suhfuntions returnnewloal orglobalinputhannels positioned at
thebeginningoftheles. Intheaseofglobal shareddisks, asynhronizationours foreahglobalopen.
With this global synhronization, eah proessorouldsignalto the otherones if itmanaged to openthe le
withouterrorsornotandeahproessorwouldraiseanexeptionifoneofthem hasfailedto openthele.
Now, with our hannels, weanread/write values from/to theles. This feature is generallyalled
per-sistene. Towritetherepresentation ofastrutured valueofanytypeto ahannel(global orloal),weused
thefollowingfuntions: (glo_output_valueChav)whihwrites therepliatevaluev totheopenedglobal
leand(lo_output_valuehav)whihloallywritestheloalvaluevtotheopenedloalle. Theobjet
anbethen read bak,by thereading funtions: (glo_input_valueCha) (resp. (lo_input_valueha))
EM -BSPparameters
embsp_lo_D:unit
→
int embsp_lo_B:unit→
int embsp_lo_G :unit→
oat embsp_glo_D: unit→
int embsp_glo_B:unit→
int embsp_glo_G:unit→
oat embsp_lo_O: unit→
oat embsp_glo_O:unit→
oat glo_shared :unit→
boolGlobalI/Oprimitives LoalI/Oprimitives
glo_open_out:glo_name
→
glo_out_hannel glo_open_in:glo_name→
glo_in_hannel glo_output_value:glo_out_hannel→
α
→
unit glo_input_value:glo_in_hannel→
α
option glo_lose_out:glo_out_hannel→
unit glo_lose_in:glo_in_hannel→
unit glo_delete: glo_name→
unitglo_seek: glo_in_hannel
→
int→
unitlo_open_out:lo_name
→
lo_out_hannel lo_open_in:lo_name→
lo_out_hannel lo_output_value: lo_out_hannel→
α
→
unit lo_input_value: lo_in_hannel→
α
option lo_lose_out: lo_out_hannel→
unit lo_lose_in:lo_in_hannel→
unit lo_delete :lo_name→
unitlo_seek:lo_in_hannel
→
int→
unitFromloaltoglobal
glo_opy : int
→
lo_name→
glo_name→
unitFig.4.3. TheCoreI/OBsmllibLibrary
Suh funtions read therepresentation ofa strutured valueand we referto [34℄ abouthavingtype safetyin
hannels andreading them in asafeway. We alsohave(glo_seekChan) (resp.lo_seek)whih allowsto
positioned thehannelat the
n
th valueofagloballe (resp. loalle). Thebehaviorisunspeiedifany oftheabovefuntionsisalled withalosedhannel.
Notethatonlyloalorrepliatevaluesouldbewritten onloal orgloballes. Nesting ofparallelvetors
is prohibited and thus lo_output_value ouldonly write loal values. It is also impossibleto write ona
sharedgloballeaparallelvetorofvalues(globalvalues) beausethesevaluesaredierentoneahproessor
andglo_output_valueis anasynhronousprimitive. Suh valuesouldbewritten in anyorder and ould
bemixedwithothervalues. Thisiswhyonlyloalandrepliatevaluesshouldberead/writefrom/todisks(see
setion6formoredetails).
After, read/write valuesfrom/to hannels, we need to lose them. As previously, weneed four kinds of
funtions: twoforthe input hannels (loal and global ones) andtwofor the output hannels. Forexample,
(glo_lose_outCha),losestheglobaloutputhannelChawhihhadbeenreatedbyglo_open_out. The
glo_deleteandlo_deleteprimitivesdeleteaglobaloraloal leifitisrstlosed.
Thelastprimitiveopiesaloallefromaproessortothegloballesystem. Itisthusaglobalprimitive.
(glo_opyn fF)opiestheleffromtheproessorntothegloballesystemwiththenameF . Thisprimitive
ouldbeusedat theendof aBSML programto opytheloalresultsfrom loal lestotheglobal (user) le
system. Itisnotaommuniationprimitivebeauseusedasaommuniationprimitive,glo_opyhasamore
expensiveostthananyommuniationprimitive(seesetion6). Intheaseofadistributedgloballesystem,
the le is dupliated on all the global le systems of eah proessorand thus all the data of the le are all
put intothe network. Ontheontrary,in theaseof globalshareddisks, itis just aopyofthe lebeause,
aessto theglobalshareddisks isgenerallyslowerthan puttingvaluesinto thenetworkand readthem bak
byanotherproessor.
Usingtheseprimitives,thenalresultofanyprogramwouldbethesame(butnaturallywithoutthesame
totaltime, i. e.,without thesame osts)with shareddisk ornot. Now, to better understand how these new
primitiveswork,wedesribeaformalsemantisofourlanguagewithsuhpersistentfeatures.
5. HighOrder Formal Semantis.
5.1. Mini-BSML. Reasoning on the omplete denition of a funtional and parallel language suh as
BSML, would have been omplex and tedious. In order to simplify the presentation and to ease the formal
expressionsofmini-BSML,written
e
possiblywithaprime orsubsript,havethefollowingabstrat syntax:e
::=
x
variables|
c
onstants|
op operators|
funx
→
e
abstration|
(
e e
)
appliation|
letx
=
e
ine
binding|
(
e, e
)
pairs|
if
e
then
e
else
e
onditional|
(
mkpar
e
)
parallelvetor|
(
apply
e e
)
parallelappliation|
(
put
e
)
ommuniation|
(
at
e e
)
projetion|
f
lenamesorhannelsInthis grammar,
x
rangesoveraountable set ofidentiers. The form(
e e
′
)
stands forthe appliationof a
funtion or anoperator
e
, to anargumente
′
. Theform fun
x
→
e
is the so-alledand well-known lambda-abstration that denes the rst-lass funtion of whih the parameter isx
and the resultis the value ofe
.Constants
c
are theintegers, the booleans, thenumber of proessesp and we assumehavinga uniquevaluefor thetypeunit: (). The set ofprimitiveoperations
op
ontainsarithmeti operations, pairoperators,testfuntionisnofthenonstrutorwhihplaystheroleoftheNoneonstrutorin OCaml,xpointtodened
naturaliteration funtions and ourI/O operators: open r
(resp. open w
) to open ale as ahannel in read
mode (resp. write mode), lose r
(resp. lose w
) to losea hannel in read mode (resp. write mode), read ,
writetoread orwritein ahannel, deletetodeleteale andseekto hangethereadingposition. All those
operatorsaredistinguishedwithasubsriptwhihis
l
foraloaloperatorandg
foraglobalone. Wealsohaveourparalleloperators: mkpar,apply, putand at. Wealso havetwokindsofle systems,theloal andthe
globalones,dened with(possiblywithaprime):
•
f
foralename;•
f
wfora write hannel,
f
rfor aread hannel and
g
ξ
k
fora hannel pointed onthek
th valueof a lewhere
ξ
isthenameofthehannel;•
?
f
vn
.
.
.
v
0
is a le where
?
is , r or w for a lose le or an open le in read or write mode and wherev
0
, . . . , v
n
thevaluesholdinthele.Whenaleisopenedinreadmode,itontainsthename
[
g
a
n
, . . . , g
z
m
]
ofthehannelsthatpointedtoitandthe positionofthesehannels. Beforepresentingthedynamisemantisofthelanguage,i.e.,howtheexpressionsof mini-BSMLare omputedto values, wepresentthevalues themselvesand thesimpleML types[39℄ ofthe
values. Thereisonesemantispervalueof
p
,thenumberofproessesoftheparallelmahine. Inthefollowing,theexpressionsare extendedwith theparallel vetors:
h
e, . . . , e
i
(nesting ofparallel vetorsisprohibited; ourstatianalysisenforesthisrestrition[23℄). Thevaluesofmini-BSMLaredened bythefollowinggrammar:
v
::=
funx
→
e
funtionalvalue|
c
onstant|
op primitive|
(
v, v
)
pairvalue| h
v, . . . , v
i
p-wideparallelvetorvalue|
f
lenamesorhannelsandthesimpleMLtypesofvaluesaredened bythefollowinggrammar:
τ
::=
κ
basetype(bool,int,unit,lenamesorhannels)|
α
typevariables|
τ
1
→
τ
2
typeoffuntionalvaluesfromτ
1
toτ
2
|
τ
1
∗
τ
2
typeforpairvaluesWenote
⊢
v
:
τ
tosaythat thevaluev
hasthetypeτ
andwereferto [39℄ foranintrodution tothetypesof theMLlanguageandto[23℄ forthoseof BSML.5.2. HighOrder Semantis. Thedynamisemantisisdenedbyanevaluationmehanismthatrelates
expressions to values. To express this relation, we used a small-step semantis. It onsists of a prediate
between an expression and another expression dened by a set of axioms and rules alled steps. The
small-step semantis desribes all the steps of the language from an expression to a value. We suppose that we
evaluate only expressions that havebeen type-heked[23℄ (nesting of parallel vetorshas been prohibited).
Unlike in asequentialomputerwith asequentialprogramminglanguage,aunique lesystem (a setof les)
for persistent operators is not suient. We need to express the le system of all our proessors and our
global le system. We assume a nite set
N
=
{
0
, . . . , p
−
1
}
whih represents the set of proessor names and we writei
for these names and⋊
⋉
for the whole parallel omputer. Now, we an formalize the les foreahproessorand for thenetwork. Wewrite
{
f
i
}
forthe lesystem ofproessori
withi
∈ N
. Weassumethat eah proessorhasalesystemasaninnite mappingof les whih aredierent ateahproessor. We
(
bsp_p())
⇀
δ
p
(
fst(
v
1
, v
2
))
⇀
δ
v
1
iftruethen
e
1
elsee
2
⇀
δ
e
1
(
isnn)
⇀
δ
true
(
xop)
⇀
δ
op
(+ (
n
1
, n
2
))
⇀
δ
n
with
n
=
n
1
+
n
2
(
snd(
v
1
, v
2
))
⇀
δ
v
2
iffalsethen
e
1
elsee
2
⇀
δ
e
2
(
isnv
)
⇀
δ
false if
v
6
=
n(
x(
funx
→
e
))
⇀
δ
e
[
x
←
(
x
(
funx
→
e
))]
Fig.5.1. Funtional
δ
-rulessystem. Thepersistentversionofthesmall-stepssemantishasthefollowingform:
{F}
/e/
{
f
}
⇀
{F
′
}
/e
′
/
{
f
′
}
.
We note
∗
⇀
, for the transitive and reexive losure of⇀
, e. g., we note{F
0
}
/e
0
/
{
f
0
}
∗
⇀
{F}
/v/
{
f
}
for{F
0
}
/e
0
/
{
f
0
}
⇀
{F
1
}
/e
1
/
{
f
1
}
⇀
{F
2
}
/e
2
/
{
f
2
}
⇀ . . . ⇀
{F}
/v/
{
f
}
. To dene therelation⇀
, webeginwithsomerulesfortwokindsofredutions:
(i)
e/
{
f
i
}
i
⇀ e
′
/
{
f
′
i
}
whih ouldberead as withthe initial loal le system{
f
i
}
, at proessori
, theexpression
e
isreduedtoe
′
withthele system
{
f
′
i
}
";(ii)
{F}
/e/
{
f
}
⋊
⋉
⇀
{F
′
}
/e
′
/
{
f
}
whih ouldberead aswith theinitial globallesystem
{F}
and withtheinitialset ofloal lesystems,theexpression
e
isreduedtoe
′
withthegloballesystem
F
′
andwiththe
sameset ofloallesystems".
Todenetheserelations,webeginwithsomeaxiomsforthefuntionalheadredution
ε
⇀
:(
funx
→
e
)
v
ε
⇀
e
[
x
←
v
]
and letx
=
v
ine
ε
⇀
e
[
x
←
v
]
We write
e
[
x
←
v
]
for the expression obtained by substituting all the free ourrenes ofx
ine
byv
. Free ourrenes of a variable are dened as a lassial and trivial indutive funtion on our expressions. Thisfuntional headredutionhas twoversions. First, aloal redution,
ε
⇀
i
, of just theproessor
i
and seond,aglobalredution,
ε
⇀
⋊
⋉
,ofthewholeparallelmahine:
e
⇀ e
ε
′
e /
{
f
i
}
⇀
ε
i
e
′
/
{
f
i
}
(1)
e
ε
⇀ e
′
{F}
/ e /
{
f
}
⇀
ε
⋊
⋉
{F}
/ e
′
/
{
f
}
(2)
Forprimitiveoperatorswealsohavesomeaxioms,the
δ
-rules. Thefuntionalδ
-rules⇀
δ
aregiveninFigure5.1.
First,wehavefuntional
δ
-ruleswhih ouldbeusedbyoneproessori
,⇀
δ
i
orbytheparallelmahine,
⇀
δ
⋊
⋉
. As
inthefuntionalheadredution,wehavetwodierentasesforusingfuntional
δ
-rules:e⇀
δ
e
′
e /
{
f
i
}
⇀
δ
i
e
′
/
{
f
i
}
(3)
e⇀
δ
e
′
{F}
/ e /
{
f
}
⇀
δ
⋊
⋉
{F}
/ e
′
/
{
f
}
(4)
Suhredutions,whiharenotpersistentredutions,donothangeanddonotneedtheles. Onlypersistent
operatorshangeandneedthem.
{F}
/
(
mkparv
)
/
{
f
}
⇀
δ≎
{F}
/
h
(
v
0)
, . . . ,
(
v
(
p
−
1))
i
/
{
f
}
{F}
/
(
applyh
v
0
, . . . , v
p−1
i h
v
′
0
, . . . , v
′
p−
1
i
)
/
{
f
}
⇀
δ≎
{F}
/
h
(
v
0
v
′
0
)
, . . . ,
(
v
p
−
1
v
′
p
−
1
)
i
/
{
f
}
{F}
/
(
ath
. . . , v
n
, . . .
i
n
)
/
{
f
}
⇀
δ≎
{F}
/ v
n
/
{
f
}
if
A
c
(
v
n
)
6
=
True{F}
/
(
puth
v
0
, . . . , v
p−1
i
)
/
{
f
}
⇀
δ≎
{F}
/
(
mkfun
(
h
send(
initv
0
p)
, . . . ,
send(
initv
p
−
1
p)
i
))
/
{
f
}
{F}
/
h
send[
v
0
0
, .., v
p−1
0
]
, . . . ,
send[
v
0
p−1
, .., v
p−1
p−1
]
i
/
{
f
}
⇀
δ≎
{F}
/
h
[
v
0
0
, .., v
0
p−
1
]
, . . . ,
[
v
p−
1
0
, .., v
p−
1
p−1
]
i
/
{
f
}
if∀
n, m
∈
0
, . . . , p
−
1
A
c
(
v
m
n
)
6
=
Truewhere mkfun
=
apply(
mkpar(
funj t i
→
if(
and(
≤
(0
, i
)
, <
(
i,
p)))
then(
aesst i
)
elsen))
Seond,for theparallelprimitives,wenaturallyhave
δ
-rules butwedonothavethoseδ
-rulesonasingleproessorbutonlyfor theparallelmahine (Figure5.2). Forsimplereasonsit isimpossibleforaproessorto
send ahannelto anotherproessor. This seond proessor doesnot haveto read in this hannel beause it
ould beseen asahiddenommuniation. Inthis way, wehaveto test ifthesentvaluesontain hannels or
not. Todo this, weused atrivialindutivefuntion
A
c
whih tellswhether anexpressionontainsahannelornot. Note thatthis work isdonewhen OCamlserializesvalues. This raisesanexeptionwhen anabstrat
datumlikeahannelhasbeenfound. Theevaluation ofaputprimitiveproeedsin twosteps. Inarststep,
eahproessorreatesapurefuntionalarrayofvalues. Thus,weneedanewkindofexpression,arrayswritten
[
e, . . . , e
]
. initandaessoperatorsareusedtomanipulate thesefuntionalarrays:aess
[
v
0
, . . . , v
n
, . . . , v
m
]
n
⇀
δ
v
n
and init
f m
⇀
δ
[(
f
0)
, . . . ,
(
f
(
m
−
1))]
Inthe seond step, thesend operationsexhange these arrays. For example,the valueat the index
j
of thearrayheldatproess
i
issenttoproessj
andisstoredatindexi
oftheresult. Thefuntionmkfunonstrutsaparallel vetoroffuntions fromtheresultingvetorofarrays.
(
open rf
)
/
{
f
′
, . . . ,
f
vn
.
.
.
v
0
, . . . , f
′′
}
⇀
io
δ
f
a
r
/
{
f
′
, . . . ,
rf
vn
.
.
.
v
0
[
g
a
0
]
, . . . , f
′′
}
(
open rf
)
/
{
f
′
, . . . ,
rf
vn
.
.
.
v
0
[
g
a
, . . . , g
z
]
, . . . , f
′′
}
io
⇀
δ
f
ξ
r
/
{
f
′
, . . . ,
rf
vn
.
.
.
v
0
[
g
a
, . . . , g
z
, g
ξ
0
]
, . . . , f
′′
}
(
open wf
)
/
{
f
′
, . . . ,
f
vn
.
.
.
v
0
, . . . , f
′′
}
⇀
io
δ
f
ξ
w
/
{
f
′
, . . . ,
wf
∅
, . . . , f
′′
}
(
open wl
f
)
/
{
f
′
, . . . , f
′′
}
io
⇀
δ
f
ξ
w
/
{
f
′
, . . . ,
wf
∅
, . . . , f
′′
}
iff /
∈ {
f
′
, . . . , f
′′
}
(
lose rf
rξ
)
/
{
f
′
, . . . ,
rf
vn
.
.
.
v
0
[
g
a
, . . . , g
ξ
k
, . . . , g
z
]
, . . . , f
′′
}
io
⇀
δ
()
/
{
f
′
, . . . ,
rf
vn
.
.
.
v
0
[
g
a
, . . . , g
z
]
, . . . , f
′′
}
(
lose rf
rξ
)
/
{
f
′
, . . . ,
rf
vn
.
.
.
v
0
[
g
a
, . . . , g
z
]
, . . . , f
′′
}
⇀
io
δ
()
/
{
f
′
, . . . ,
rf
vn
.
.
.
v
0
[
g
a
, . . . , g
z
]
, . . . , f
′′
}
(
lose rf
rξ
)
/
{
f
′
, . . . ,
rf
vn
.
.
.
v
0
[
g
ξ
k
]
, . . . , f
′′
}
io
⇀
δ
()
/
{
f
′
, . . . ,
f
vn
.
.
.
v
0
, . . . , f
′′
}
(
lose rf
rξ
)
/
{
f
′
, . . . ,
f
vn
.
.
.
v
0
, . . . , f
′′
}
⇀
io
δ
()
/
{
f
′
, . . . ,
f
vn
.
.
.
v
0
, . . . , f
′′
}
(
lose wf
ξ
w
/
{
f
′
, . . . ,
?f
vn
.
.
.
v
0
, . . . , f
′′
}
io
⇀
δ
()
/
{
f
′
, . . . ,
f
vn
.
.
.
v
0
, . . . , f
′′
}
where?
=
wor?=
(
readτ
f
ξ
r
)
/
{
f
′
, . . . ,
rf
.
.
.
vk
.
.
.
[
g
a
, .., g
ξ
k
, . . . , g
z
]
, . . . , f
′′
}
io
⇀
δ
v
k
/
{
f
′
, . . . ,
rf
.
.
.
vk
.
.
.
[
g
a
, .., g
ξ
m
, . . . , g
z
]
, . . . , f
′′
}
if
⊢
v
k
:
τ
andm
=
k
+ 1
.v
k
isthek
thvalueoff
(
readτ
f
rξ
)
/
{
f
′
, . . . ,
rf
vn
.
.
.
v
0
[
g
a
, .., g
ξ
k
, . . . , g
z
]
, . . . , f
′′
}
⇀
io
δ
n
/
{
f
′
, . . . ,
rf
vn
.
.
.
v
0
[
g
a
, .., g
k
ξ
, . . . , g
z
]
, . . . , f
′′
}
if
k > n
(
seekf
ξ
r
k
)
/
{
f
′
, . . . ,
rf
vn
.
.
.
v
0
[
g
a
, .., g
ξ
m
, . . . , g
z
]
, . . . , f
′′
}
io
⇀
δ
n
/
{
f
′
, . . . ,
rf
vn
.
.
.
v
0
[
g
a
, .., g
ξ
k
, . . . , g
z
]
, . . . , f
′′
}
(
write(
v, f
ξ
w
))
/
{
f
′
, . . . ,
wf
.
.
.
, . . . , f
′′
}
⇀
io
δ
()
/
{
f
′
, . . . ,
wf
v
.
.
.
, . . . , f
′′
}
ifA
c
(
v
n
)
6
=
True(
deletef
)
/
{
f
′
, . . . ,
f
.
.
.
, . . . , f
′′
}
⇀
io
δ
()
/
{
f
′
, . . . , f
′′
}
Fig.5.3.
δ
-rulesof thepersistentoperatorsThird, weomplete oursemantisby givingthe
δ
-rulesio
⇀
δ
ofthe I/Ooperators given in Figure 5.3. The
openoperationopensale(in readorwrite mode)andreturnsanewhannel,pointingtothis le,to aess
tothevaluesoftheleorwritevaluesinthisle. Openingaleinwritemode,givesanemptyle. Ifpossible,
read
τ
givesthe valueof type