Volume6,Number4,pp.116. http://www.spe.org
2005SWPS
EVALUATING THEPERFORMANCEOFPIPELINE-STRUCTURED PARALLEL
PROGRAMS WITH SKELETONSANDPROCESSALGEBRA
∗
ANNE BENOIT
†
, MURRAY COLE , STEPHEN GILMORE , AND JANE HILLSTON
Abstrat. Weshowinthispaper how toevaluatethe performaneofpipeline-strutured parallelprograms withskeletons
andproessalgebra. Sinemanyappliationsfollowsomeommonlyusedalgorithmiskeletons,weidentifysuh skeletonsand
modelthem withproess algebrainorderto getrelevant informationaboutthe performaneofthe appliation,and tobeable
totakegoodshedulingdeisions. Thisonept isillustrated through the asestudyofthe pipeline skeleton,and atool whih
generatesautomatiallyasetofmodelsandsolvesthemispresented. Somenumerialresultsareprovided,provingtheeayof
thisapproah.
Key words. Algorithmiskeletons, pipeline,high-levelparallel programs,performaneevaluation,proess algebra,PEPA
Workbenh.
1. Introdution. One of the most promising tehnial innovations in present-day omputing is the
in-ventionof grid tehnologies whih harness theomputational power of widelydistributed olletionsof
om-puters [8℄. Designing an appliation for the Grid raises diult issues of resoure alloation and sheduling
(roughlyspeaking,howtodeidewhihomputerdoeswhat,and when,andhowtheyinterat). These issues
aremadeall themoreomplexbytheinherentunpreditabilityof resoureavailabilityand performane. For
example,asuperomputermayberequiredforamoreimportanttask,ortheInternetonnetionsrequiredby
theappliation maybepartiularlybusy.
In this ontext of grid programming, askeleton-basedapproah [5, 16, 7℄ reognizes that manyreal
ap-pliations drawfrom arange of well-known solution paradigmsand seeks to make it easy for anappliation
developerto tailorsuhaparadigmto aspei problem. Powerfulstruturingoneptsare presentedto the
appliation programmeras alibrary ofpre-dened `skeletons'. As with otherhigh-levelprogrammingmodels
theemphasisisonprovidinggeneripolymorphiroutineswhihstrutureprogramsinlearly-delineatedways.
Skeletal parallel programming supports reasoning about parallel programs in order to removeprogramming
errors. Itenhanesmodularityandongurabilityinordertoaidmodiation,portingandmaintenane
ativ-ities. InthepresentworkwefousontheEdinburghSkeletonLibrary(eSkel)[6℄. eSkelisanMPI-basedlibrary
whihhasbeendesignedforSMPandlusteromputingandisnowbeingonsideredforgridappliationsusing
grid-enabledversionsofMPIsuhasMPICH-G2 [14℄.
Theuse ofapartiular skeleton arrieswith itonsiderable information aboutimplied sheduling
depen-denies. Bymodellingthese withstohastiproessalgebrassuh asPerformaneEvaluationProessAlgebra
[13℄,andtherebybeingableto inludeaspetsofunertaintywhihareinherenttogridomputing,webelieve
thatwewillbeabletounderpinsystemswhihanmakebettershedulingdeisionsthanlesssophistiated
ap-proahes. Mostsigniantly,sinethismodellingproessanbeautomated,andsinegridtehnologyprovides
failitiesfor dynami monitoringofresoureperformane, ourapproahwill support adaptive reshedulingof
appliations.
Stohastiproessalgebraswereintroduedintheearly1990sasaompositionalformalismforperformane
modelling. Sinethentheyhavebeensuessfullyappliedtotheanalysisofawiderangeofsystems. Ingeneral
analysisisbasedonthegenerationofanunderlyingontinuoustimeMarkovhain(CTMC)andderivation of
itssteadystateprobabilitydistribution. Thisvetorreordsthelikelihoodofeahpotentialstateofthesystem,
and an in turn be used to derive performane measures suh as throughput, utilisationand response time.
Several stohasti proess algebrashaveappeared in theliterature; we useHillston's PerformaneEvaluation
ProessAlgebra(PEPA)[13℄.
Somerelatedprojetsobtainperformaneinformationfrom theGridusing benhmarkingandmonitoring
tehniques[4,17℄. IntheICENIprojet[9℄,performanemodelsareusedto improvetheshedulingdeisions,
but these arejust graphswhih approximate dataobtainedexperimentally. Moreover,there is noupper-level
layerbasedonskeletonsinanyofthese approahes.
∗
Thiswork ispart of the ENHANCEprojet, funded by the UnitedKingdom Engineeringand PhysialSienes Researh
ounilgrantnumberGR/S21717/01.
†
Shool of Informatis, TheUniversityof Edinburgh, JamesClerk MaxwellBuilding,The King's Buildings,MayeldRoad,
Otherreentwork onsiderstheuseofskeleton programswithin gridnodestoimprovethequalityofost
information[1℄. Eahserverprovidesasimplefuntionapturingtheostofitsimplementationofeahskeleton.
Inan appliation,eah skeleton therefore runs onlyon oneserver,and the goalof sheduling is to seletthe
most appropriate serverswithin the wider ontext of the appliation and supporting grid. In ontrast, our
approah onsiders singleskeletons whih span theGrid. Moreover,weuse modelling tehniques to estimate
performane.
Our main ontribution is based on the ideaof using performane models to enhane the performane of
grid appliations. We propose to model skeletons in ageneri way to obtain signiantperformane results
whihmaybeusedtoresheduletheappliationdynamially. Tothebestofourknowledge,this kindofwork
has notbeendone before. We show in this paper how wean obtainsigniantresults onarst ase study
basedonthepipelineskeleton. Anearlierversionofthispaperispublishedintheproeedingsoftheworkshop
on PratialAspets of High-level Parallel Programming (PAPP04), part of the InternationalConferene on
ComputationalSiene(June7-9,2004,Kraków,Poland)[2℄. InthisextendedversionapresentationofPEPA
isinluded; the model resolutionand thetoolAMoGeTaredesribedmorepreisely; and moreexperimental
resultsareexposed.
In thenext setion, wepresent thepipeline and amodel of theskeleton. Then we explain how to solve
themodelwiththePEPA Workbenhinordertogetrelevantinformation(Setion3). InSetion 4wepresent
a tool whih automatially determines the best mappingto usefor the appliation, by rst generating a set
of models, then solvingthem and omparing the results. Some numerial resultson the pipeline appliation
are provided in Setion 5,and thefeasibilityof this approah isdisussed in Setion 6. Finallywegivesome
onlusions.
2. Thepipelineskeleton. Manyparallelalgorithmsanbeharaterizedandlassiedbytheiradherene
tooneormoreofanumberofgenerialgorithmiskeletons[16,5,7℄. Wefousinthispaperontheoneptof
pipelineparallelism,whihisofwell-provenusefulnessinseveralappliations. Wereallbrieytheprinipleof
thepipelineskeleton. ThenweintroduetheproessalgebraPEPA[13℄andweexplainhowweanmodelthe
pipelinewithPEPA. Finally,weshowin Setion2.4thestatetransitiondiagramofathree stagepipeline.
2.1. The prinipleof pipeline. Inthesimplest formofpipelineparallelism[6℄,asequeneof
Ns
stagesproessasequeneofinputstoprodueasequeneofoutputs(Fig.2.1).
...
Stage
1
Stage2
StageN
s
inputs outputs
Fig.2.1. Thepipeline appliation
Eah inputpasses througheah stage in thesameorder, andthe dierent inputsare proessedone after
another(astageannotproessseveralinputsatthesametime). Notethattheinternalativityofastagemay
beparallel, but this is transparent to ourmodel. Inthe remainderof the paper weuse the termproessor
todenotethehardwareresponsibleforexeutingsuhativity,irrespetiveofitsinternaldesign(sequentialor
parallel).
We onsider this appliation lass in the ontext of omputational grids, and so we want to map it to
our omputing resoures, whih onsist of a set of potentially heterogeneous proessors interonneted by a
heterogeneousnetwork.
It is well known that aomputing pipeline performs mosteetively when theworkload iswell balaned
arossstages andthereare alargeenoughnumberof inputsto amortizetheostsof llingand draining. Our
work diretly addressesthe rst of these issues, by failitating explorationof the stage-to-proessormapping
spae. Theseondissueremainstheresponsibilityoftheprogrammer: ourapproahassumesthat runningthe
appliation will take longenough for the system to reah an equilibrium behaviour. The models help us to
studythissteadystatebehaviour.
Considering the pipeline appliation in the eSkel library [6℄, we fous here on a pipeline variant whih
requiresthat eahstageproduesexatlyoneoutputforeah input.
We now go on to present thePEPA languagewhih wewill use to model thepipeline appliation. The
2.2. Introdution to PEPA. The PEPA language provides a small set of ombinators. These allow
languagetermstobeonstruteddeningthebehaviourofomponents,viatheativitiestheyundertakeandthe
interationsbetweenthem. Timinginformationisassoiatedwitheahativity. Thus,whenenabled,anativity
a
= (
α, r
)
will delay foraperiod sampled from thenegativeexponentialdistribution whih hasparameterr
. If several ativities are enabled onurrently, either in ompetition orindependently, we assume that araeondition exists between them. The omponent ombinators, together with their namesand interpretations,
arepresentedinformallybelow.
Prex: Thebasi mehanism for desribingthe behaviour of asystem is to givea omponent a designated
rstation using the prexombinator,denoted by afull stop. For example,the omponent
(
α, r
)
.S
arries outativity(
α, r
)
,whihhasationtypeα
andanexponentiallydistributeddurationwithparameterr
,andit subsequentlybehavesasS
.Choie: The hoie ombinatoraptures the possibility of ompetition betweendierent possible ativities.
Theomponent
P
+
Q
representsasystemwhih maybehaveeither asP
orasQ
. The ativitiesof bothP
andQ
areenabled. Therstativitytoompletedistinguishesoneofthem: theotherisdisarded. Thesystemwillbehaveasthederivativeresultingfromtheevolutionofthehosenomponent.
Constant: It is onvenientto beableto assign namesto patternsof behaviourassoiatedwith omponents.
Constantsare omponentswhose meaningis givenbyadening equation. Forexample,
P
def= (
α, r
)
.P
denes aomponentwhihperformsativityα
atrater
,forever.Hiding: Thepossibilityto abstrat awaysomeaspets ofaomponent's behaviouris provided bythehiding
operator, denoted
P/L
. Here, the setL
of visible ation types identies those ativities whih are to beonsideredinternalorprivatetotheomponentandwhihwillappearastheunknowntype
τ
.Cooperation: InPEPAdiretinteration,orooperation,betweenomponentsisthebasisofompositionality.
The set whih is used asthe subsript to the ooperation symbol, the ooperation set
L
, determines thoseativities on whih the o-operands are fored to synhronise. For ation types not in
L
, the omponentsproeedindependentlyand onurrentlywith theirenabledativities. However,anativity whoseationtype
is in the ooperation set annot proeed until both omponents enable an ativity of that type. The two
omponents then proeed together to omplete the shared ativity. The rate of the sharedativity may be
alteredto reetthe work arriedoutby bothomponentsto ompletethe ativity (fordetails see[13℄). We
write
P
k
Q
asanabbreviationforP
⊲
⊳
L
Q
when
L
isempty.In some ases, when an ativity is known to be arried out in ooperation with another omponent, a
omponent may be passive with respet to that ativity. This means that the rate of the ativity is left
unspeied(denoted
⊤
)andisdetermineduponooperation,bytherateoftheativityintheotheromponent.Allpassiveationsmustbesynhronisedin thenalmodel.
ThedynamibehaviourofaPEPA modelisrepresentedbytheevolutionofitsomponents,either
individ-uallyorinooperation. Theformofthisevolutionisgovernedbyasetofformalruleswhihgiveanoperational
semantisofPEPAterms(see[13℄). Thus,asinlassialproessalgebra,thesemantisofeahterminPEPAis
givenviaalabelledmulti-transitionsystem(themultipliitiesofarsaresigniant). Inthetransitionsystema
stateorrespondstoeahsyntatitermofthelanguage,orderivative,andanarrepresentstheativitywhih
ausesonederivativeto evolveinto another. Theompleteset ofreahablestatesis termedthederivative set
of amodel and these form the nodesof the derivation graph whih is formed by applying thesemanti rules
exhaustively.
ThederivationgraphisthebasisoftheunderlyingContinuousTimeMarkovChain(CTMC)whihisused
toderiveperformanemeasuresfrom aPEPA model. Thegraphissystematiallyreduedto aform where it
anbetreatedasthestatetransitiondiagramoftheunderlyingCTMC. Eah derivativeisthenastatein the
CTMC. Thetransition rate betweentwoderivatives
P
andQ
inthederivationgraphis therateatwhih thesystemhangesfrom behavingasomponent
P
to behavingasQ
. It isthesumof theativityrateslabellingarsonnetingnode
P
tonodeQ
.The stages
Therstpartofthemodelistheappliation model,whihisspeiedindependentlyoftheresouresonwhih
theappliation will beomputed. We deneone PEPA omponent per stage. For
i
= 1
..Ns
, theomponent Stagei
workssequentially. Atrst,itgetsdata(ativitymovei
),thenproessesit(ativityproessi
),andnally movesthedatato thenextstage(ativitymovei
+1
).Stage
i
def
=
(
movei,
⊤
)
.
(
proessi
,
⊤
)
.
(
movei
+1
,
⊤
)
.
Stagei
Alltheratesareunspeied,denotedbythedistinguishedsymbol
⊤
,sinetheproessingandmovetimesdependontheresoureswheretheappliationis running. These rateswill bedened later,inanotherpartof
themodel.
Thepipeline appliation isthen dened asaooperationofthe dierentstages overthemove
i
ativities, fori
= 2
..Ns
.Theativities move
1
and moveN
s
+1
represent, respetively, thearrivalof aninput in theappliation andthetransferofthenal outputoutofthepipeline. Theydonotrepresentanydatatransferbetweenstages,so
theyarenotsynhronizingthepipelineappliation. Finally, wehave:
Pipeline def
=
Stage1
{
move⊲
⊳
2
}
Stage2
{
move⊲
⊳
3
}
. . .
⊲
⊳
{
moveNs
}
Stage
N
s
The proessors
Weonsiderthattheappliationmustbemappedonasetof
Np
proessors. Eahstageisproessedbyagiven(unique)proessor,but aproessormayproessseveralstages (intheasewhere
Np
< Ns
). Inorderto keepthemodelsimple, wedeideto put information abouttheproessor(suh asthe loadof theproessororthe
numberof stages being proessed)diretlyin therate
µi
of theativities proessi
,i
= 1
..Ns
(these ativities havebeendenedfortheomponentsStagei
).Eah proessor is then represented by a PEPA omponent whih has a yli behaviour, onsisting of
proessingsequentiallyinputsforastage. Someexamplesfollow.
•
IntheasewhenNp
=
Ns
,wemap onestageperproessor:Proessor
i
def=
(
proessi
, µi
)
.
Proessori
•
If several stages are proessed by a same proessor, we use a hoie omposition. In the followingexample (
Np
= 2
andNs
= 3
), the rst proessor proesses the two rst stages, and the seond proessorproessesthethirdstage.Proessor
1
def=
(
proess1
, µ
1
)
.
Proessor1
+ (
proess2
, µ
2
)
.
Proessor1
Proessor2
def
=
(
proess3
, µ
3
)
.
Proessor2
Sineallproessorsareindependent,thesetofproessorsisdenedasaparallelompositionoftheproessor
omponents:
Proessors def
=
Proessor1
||
Proessor2
||
. . .
||
ProessorN
p
The network
Thelastpartofthemodelisthenetwork.Wedonotneedtodiretlymodelthearhitetureandthetopology
of the network for what we aim to do, but we wantto getsome information about theeieny of thelink
onnetionbetweenpairsofproessors. Thisinformationisgivenbyaetingtherates
λi
ofthemovei
ativities (i
= 1
..Ns
+ 1
).λ
1
representstheonnetionbetweentheuser(providinginputstothepipeline)andtheproessorhostingtherststage.
For
i
= 2
..Ns
,λi
representstheonnetionbetweentheproessorhostingstagei
−
1
and theproessor hostingstagei
.λN
Notethat
λi
will enodeinformation bothabouttheloadonthe linksand thesize of thedataproessedbyproess
i
−
1
. Whenthedataistransferred onthesameomputer,therateisreallyhigh, meaningthat theonnetionis fast(omparedtoatransferbetweendierentsites).
Thenetworkisthen modelledbythefollowingomponent:
Network def
=
(
move1
, λ
1
)
.
Network+
· · ·
+ (
moveN
s
+1
, λN
s
+1
)
.
NetworkThe pipeline model
Onewehavedenedthedierentomponentsofourmodel,wejusthavetomapthestagesontotheproessors
andthenetworkbyusingtheooperationombinator. Forthis,wedenethefollowingsetsofationtypes:
Lp
=
{
proessi
}
i
=1
..N
s
tosynhronizethePipeline andtheProessorsLm
=
{
movei
}
i
=1
..N
s
+1
tosynhronizethePipeline andtheNetworkMapping def
=
Network⊲
⊳
Lm
Pipeline
⊲
⊳
Lp
Proessors
PEPA input le
AnexampleofaninputleforthePEPA WorkbenhanbefoundinAppendix B.
2.4. State transition diagram for the pipeline model. Figure 2.2 represents the state transition
diagram of a three stage, three proess pipeline. This piture shows all of the possible interleavings of the
omponents of the model with ars of various kinds showingthe dierent types of transitions from state to
state.
InTable2.1 we showthe orrespondene betweenthe statenumbersin Figure2.2 and the PEPA terms.
Sine the PEPA terms are long we have omitted the ooperation sets, showing only the loal state of eah
omponent. Moreoverto keep thetable ompat we havenamed thederivativesof the
Stage
omponentsasfollows:
Stage
i
0
def
= (
movei,
⊤
)
.
Stagei
1
Stage
i
1
def
= (
proessi,
⊤
)
.
Stagei
2
Stage
i
2
def
= (
movei
+1
,
⊤
)
.
Stagei
0
3. Solving the models. One reason to work with aformal modelling languagesuh as PEPA is that
modelsareunambiguousandanservetosupport reliableommuniationbetweenthosewhodesignsystems,
those who develop them and those who maintain them. Another reason to work with a formal modelling
languageis that formal models an be automatially proessed by tools in order to derive information from
themwhih otherwisewouldhavetobeproduedbymanualalulationorreasoning.
Thetoolwhih wehaveusedforproessingourPEPA modelsandomputingthesteady-stateprobability
distributionofoursystemisthePEPAWorkbenh. Afulldesriptionofthefuntioningofthissoftwareanbe
foundin[11℄;thereferenemanualforthelatestreleaseis[12℄. Weinludeabriefdesriptionofthefuntioning
oftheWorkbenhinAppendixC.1in ordertomakethepresentpaperself-ontained.
Notiehoweverthat thesteady-state probability distribution of thesystem is rarelythe desiredresult of
the performane analysis proess and so to progress we must identify a signiant performane result. The
performane result that is pertinent for the pipeline appliation is the throughput of the proess
i
ativities (i
= 1
..Ns
). Sinedatapassessequentiallythrougheahstage,thethroughputisidentialforalli
,andweneed toomputeonlythethroughputofproess1
toobtainsigniantresults. Thisisdonebyaddingthesteady-stateprobabilitiesof eahstateinwhih proess
1
anhappen,andmultiplyingthisbyµ
1
.We have made somehanges to the Java edition of the PEPA Workbenh in order to allowthe user to
25
26
27
24
21
move1
move2
move3
move4
process3
process2
process1
8
9
2
3
10
11
12
7
16
17
18
23
22
4
19
6
20
14
15
5
13
1
Fig.2.2. Statetransitiondiagramofathreestage,threeproesspipelinewithstatesnumberedaordingtoTable2.1
4. AMoGeT:The AutomatiModelGenerationTool. Weinvestigateinthispaperhowtoenhane
theperformaneofgridappliationswiththeuseofalgorithmiskeletonsandproessalgebras. Todothis,we
havereated atool whih automatially generates performane models for thepipeline asestudy, and then
solvesthemodels. These resultsouldbeusedto reshedule theappliation.
Wegiveatrstanoverviewofthetool. Thenwedesribetheinformationwhihisprovidedtothetoolvia
adesription le. Finally,weexplainthefuntioningofthetool.
desription
le
performane
information PEPA
models results
AMoGeT
Compare
results
models Generate
Workbenh PEPA
Fig.4.1.TheprinipleofAMoGeT
4.1. AMoGeT desription. Fig.4.1 illustrates thepriniple of the tool. Inits urrent form, thetool
is a generi, reusable software omponent. Its ultimate role will be as an integrated omponent of a
Table2.1
CorrespondenebetweenstatenumbersinFigure2.2andPEPAterms(ooperation setsareomittedbutremainonstant)
stateno. PEPAstate
1
(
Network,
(
Stage10
,
Stage20
,
Stage30
)
,
(
Proessor1
,
Proessor2
,
Proessor3))
2(
Network,
(
Stage11
,
Stage20
,
Stage30
)
,
(
Proessor1
,
Proessor2
,
Proessor3))
3(
Network,
(
Stage12
,
Stage20
,
Stage30
)
,
(
Proessor1
,
Proessor2
,
Proessor3))
4(
Network,
(
Stage10
,
Stage21
,
Stage30
)
,
(
Proessor1
,
Proessor2
,
Proessor3))
5(
Network,
(
Stage11
,
Stage21
,
Stage30
)
,
(
Proessor1
,
Proessor2
,
Proessor3))
6(
Network,
(
Stage12
,
Stage21
,
Stage30
)
,
(
Proessor1
,
Proessor2
,
Proessor3))
7(
Network,
(
Stage10
,
Stage22
,
Stage30
)
,
(
Proessor1
,
Proessor2
,
Proessor3))
8(
Network,
(
Stage11
,
Stage22
,
Stage30
)
,
(
Proessor1
,
Proessor2
,
Proessor3))
9(
Network,
(
Stage12
,
Stage22
,
Stage30
)
,
(
Proessor1
,
Proessor2
,
Proessor3))
10(
Network,
(
Stage10
,
Stage20
,
Stage31
)
,
(
Proessor1
,
Proessor2
,
Proessor3))
11(
Network,
(
Stage11
,
Stage20
,
Stage31
)
,
(
Proessor1
,
Proessor2
,
Proessor3))
12(
Network,
(
Stage12
,
Stage20
,
Stage31
)
,
(
Proessor1
,
Proessor2
,
Proessor3))
13(
Network,
(
Stage10
,
Stage21
,
Stage31
)
,
(
Proessor1
,
Proessor2
,
Proessor3))
14(
Network,
(
Stage11
,
Stage21
,
Stage31
)
,
(
Proessor1
,
Proessor2
,
Proessor3))
15(
Network,
(
Stage12
,
Stage21
,
Stage31
)
,
(
Proessor1
,
Proessor2
,
Proessor3))
16(
Network,
(
Stage10
,
Stage22
,
Stage31
)
,
(
Proessor1
,
Proessor2
,
Proessor3))
17(
Network,
(
Stage11
,
Stage22
,
Stage31
)
,
(
Proessor1
,
Proessor2
,
Proessor3))
18(
Network,
(
Stage12
,
Stage22
,
Stage31
)
,
(
Proessor1
,
Proessor2
,
Proessor3))
19(
Network,
(
Stage10
,
Stage20
,
Stage32
)
,
(
Proessor1
,
Proessor2
,
Proessor3))
20(
Network,
(
Stage11
,
Stage20
,
Stage32
)
,
(
Proessor1
,
Proessor2
,
Proessor3))
21(
Network,
(
Stage12
,
Stage20
,
Stage32
)
,
(
Proessor1
,
Proessor2
,
Proessor3))
22(
Network,
(
Stage10
,
Stage21
,
Stage32
)
,
(
Proessor1
,
Proessor2
,
Proessor3))
23(
Network,
(
Stage11
,
Stage21
,
Stage32
)
,
(
Proessor1
,
Proessor2
,
Proessor3))
24(
Network,
(
Stage12
,
Stage21
,
Stage32
)
,
(
Proessor1
,
Proessor2
,
Proessor3))
25(
Network,
(
Stage10
,
Stage22
,
Stage32
)
,
(
Proessor1
,
Proessor2
,
Proessor3))
26(
Network,
(
Stage11
,
Stage22
,
Stage32
)
,
(
Proessor1
,
Proessor2
,
Proessor3))
27(
Network,
(
Stage12
,
Stage22
,
Stage32
)
,
(
Proessor1
,
Proessor2
,
Proessor3))
Informationisprovidedtothetoolviaadesriptionle(f.Setion4.2). Thisinformationanbegathered
fromtheGridresouresandfrom theappliationdenition. Inthefollowingexperiments,itisprovidedbythe
user,butweanalsogetitautomatiallyfromgridservies,forexamplefromtheNetworkWeatherServie[17℄.
The tool allows everythingto be done in a single step through a simplePerl sript (f. Setion 4.3): it
generatesthemodels, solvesthemwith thePEPA Workbenh, and thenomparestheresults. This allowsus
tohavefeedbakontheappliationwhentheperformaneof theavailable resouresismodied.
4.2. Desription le for AMoGeT. The aimof this leis toprovideinformation abouttheavailable
gridresouresandthemodelledappliation,in ourasethepipeline.
Thisdesriptionle isnamed mymodel.des,wheremymodelisthenameoftheappliation.
•
Therstinformationprovidedisthetypeofthemodel. Sinewestudyherethepipelineskeleton,therstlineis
type
=
pipeline
;
•
WethenhavetheinformationabouttheGridresouresandNetworklinks,asalistofparameters. Thenumberofproessors
N
mustatrstbespeied:nbproc
=
N
;
Andthen,for
i
= 1
..N
andj
= 1
..N
,wespeifytheavailableomputingpoweroftheproessori
(pi
), andtheperformaneofthenetworklinkbetweenproessorsi
andj
(nli
-j
):p1=10; p2=5;
nl1-1=10000; nl1-2=8;
•
Conerning the appliation, we have some information about the stages of the pipeline.Ns
is thenumberofstages.
nbstage=
Ns
;Theamountofwork w
i
requiredtoomputeoneoutputforstagei
mustbespeiedfori
= 1
..Ns
: w1=2; w2=4; ...Finally,weneed tospeifythesize ofthedatatransferred toand fromeahstage. For
i
= 1
..Ns
+ 1
, dsi
isthesizeofthedatatransferredtostagei
,withtheboundaryasedsNs
+ 1
whihrepresentsthe sizeoftheoutputdata.ds1=100; ds2=5; ...
•
Next wedeneasetofandidatemappings ofstagesto proessors. Eahmappingspeies wheretheinitial datais loated, where theoutput data must beleft and (as atuple) theproessorwhere eah
stage isproessed. Forexample,thetuple
(1
,
1
,
2)
meansthat thetworststages areonproessor1
, withthethirdstageonproessor2
. Amappingisthenoftheform[
input, tuple, output
]
. Themapping denitionisasetofmappings,itanbeasfollows:mappings=[1,(1,2,3),3℄,[1,(1,1,2),2℄,[1,(1,1,1),1℄;
•
Thelastthingistheperformaneresultwewanttoompute. Forthepipelineappliation,weanaskforthethroughputwiththeline:
throughput;
4.3. The AMoGeTPerlsript. Thetoolallowseverythingtobedoneinasinglestepthroughasimple
Perlsript. The model generation is done by alling an auxiliaryfuntion. Models arethen solvedwith the
PEPA Workbenh asseenin Setion3. Finally, theresultsare ompared. Thisallowsusto havefeedbakon
theappliationwhentheperformaneoftheavailableresouresismodied.
Onemodelisgeneratedfromeahmappingofthedesriptionle. EahmodelisasdesribedinSetion2.3.
Thediultpointonsistsofgeneratingtheratesfromtheinformationgatheredbefore. Themodelgeneration
itselfisthenstraightforward.
Toomputetheratesoftheproess
i
ativitiesforagivenmodel (i
= 1
..Ns
), weneedto knowhowmany stagesarehosted oneahproessor,andweassumethattheworksharingbetweenthestagesisequitable. Therateassoiatedwiththeproess
i
ativityisthen:µi
=
w
i
×
cp
j
nbst
j
where
j
is thenumberoftheproessorhostingthestagei
,andnbstj
is thenumberofstagesbeingproessedonproessor
j
. Ineet,theavailableomputingpowerpj
isfurther dilutedbyourowninternaltimesharingfatornbst
j
,beforebeingapplied totheworkloadassoiatedwiththestage,wi
.Theratesofommuniationbetweenstagesdependonthemappingtoo,sinetherateofamove
i
ativity dependsontheonnetionlinkbetweentheproessorj
1
hostingstagei
−
1
andtheproessorj
2
hostingstagei
, whihisgivenbynlj
1
-j
2
. Sinethemappingspeieswhere theinputandoutput dataare, weanalsondthe onnetion link for the data arriving into the pipeline and thedata exiting the appliation. These rates
depend alsoonthe size ofthe datatransferred from onestage of thepipeline to thenext, givenby ds
i
. Theboundaryases areapplied to omputethe ratesof the move
1
and moveN
s
+1
ativities. The rate assoiatedwiththemove
i
ativityistherefore:λi
=
nl
j
1
−
j
2
ds
i
Onetheseratesarederived,generatingthemodelisstraightforward. Weadd intothelethedesription
ofthethroughputoftheproess
1
ativityasarequiredresulttoallowanautomatiomputationofthisresult.ThemodelsanthenbesolvedwiththePEPAWorkbenh,andthethroughputofthepipelineisautomatially
omputed(Setion3). Duringtheresolution,alltheresultsaresavedinasinglele,andthelaststepofresults
5. Numerialresults. Wepresentinthissetionsomenumerialresults. Weexplainthroughthemhow
theinformationobtainedwithAMoGeTanberelevantforoptimizingtheappliation.
In the present paper we do notapply this method to agivenreal-world example. We use anabstrat
pipeline forwhih wearbitrarily x thetime requiredto omplete eah stage. Thisis suient toshowthat
AMoGeTanhelptooptimizeanappliation.
5.1. Experiment1: Pipelinewith
3
stagesxed data size. Wegivehereafewnumerialresults onan examplewith3
pipeline stages (and upto3
proessors). The models that weneed to solve arereally small(inthisase,themodelhas27
statesand51
transitions,f. Figure2.2).Wesupposeinthisexperimentthatnl
i
-i
=10000fori
= 1
..
3
,andthatthereisnoneedtotransfertheinput or the output data. Moreover,wesuppose that the network is symmetrial(nli
-j
=nlj
-i
for alli, j
= 1
..
3
). Conerningthepipelineparameters,theamountofworkwi
requiredto omputeeahstageis1
,aswellasthe sizeofthedatadsi
whihistransferredfromonestagetoanother. Therelevantparametersarethereforenl1-2,nl2-3,nl1-3,andp
i
fori
= 1
..
3
. Weomparedierentmappings,andjustspeifythetupleindiatingwhih stageisonwhihproessor. Weomparethemappings(1,1,1),(1,1,2),(1,1,3),(1,2,1),(1,2,2),(1,2,3),(1,3,1),(1,3,2)and(1,3,3)(therststageisalwaysonproessor
1
). TheresultsaredisplayedinTable5.1,andweonly putthebest ofthemappingswhihwereinvestigatedintherelevantlineofthetable.Table5.1
ResulttableforExperiment1
Setofresults Parameters Mapping&
nl1-2 nl2-3 nl1-3 p1 p2 p3 Throughput
1
10000
10000
10000
10
10
10
(1,2,3):5
.
63467
10000
10000
10000
5
5
5
(1,2,3):2
.
81892
2
10000
10000
10000
10
10
1
(1,2,1):3
.
36671
10
10
10
10
10
1
(1,1,2):2
.
59914
1
1
1
10
10
1
(1,1,1):1
.
87963
3
10
1
1
10
10
10
(1,1,2):2
.
59914
10
1
1
1
1
100
(1,3,3):0
.
49988
Intherstsetofresults,alltheproessorsareidentialandthenetworklinksarereallyfast. Intheseases,
thebest mappingalwaysonsistsof putting onestageoneah proessor(the resultsforthemapping
(1
,
3
,
2)
areidentialtothebestmapping). Ifwedividethetimealloatedbytheproessortotheappliationby2
,the resultingthroughputisalsodividedby2
,sineonlytheproessingpowerhasanimpatonthethroughput.The seond set of results illustrates the ase when one proessor is beoming really busy, in this ase
proessor
3
. Weshouldnotuseitanymore,butdependingonthenetworklinks,thebestmappingmayhange. Ifthelinksarenoteient,weshouldindeedavoiddatatransferandtrytoputonseutivestagesonthesameproessor. Whennl1-2=nl2-3=nl1-3=
10
,themapping(1
,
2
,
2)
providesthesameresultsas(1
,
1
,
2)
. Finally, the third set of results shows what happens if thenetwork link to proessor3
is really slow. In this ase again, the use of the proessorshould be avoided, and the best mappings are(1
,
1
,
2)
and(1
,
2
,
2)
. However,ifproessor3
isareallyfastproessoromparedtotheotherones(lastline),weproessstage2
and stage3
onthethirdproessor(mapping(1
,
3
,
3)
).5.2. Experiment2: Pipeline with
3
stagesdata size hanging. Thethird experimentkeepsthe3
stagepipeline,butonsidershangesinthesizeofthedata. TheassumptionsarethesameasforExperiment1, butmoreparametershaveaxedvalue.In this experiment, the network onnetionbetweenproessors
1
and2
is slightlyless eetive than the others. So, wehave nl1
-2 = 100
, nl2
-3 =
nl1
-3 = 1000
. Moreover, theomputing powerof eah stage is pi
= 10
. The size of the data is nowxed to100
, exept from the data transiting from stage1
to stage2
(ds2
),whosesizeisvarying.Figure5.1presentsthethroughputobtainedwitheah mapping,asafuntionofthedatasize ds
2
. Notierstthatsomeofthemappingsarenotinuenedbythehangeofthedatasize,i.e.(1,1,1),(1,1,2)and(1,1,3). This isduetothefat thattheonnetionbetweenstages
1
and2
isgood beausethedatastays on the same proessor. The inuene of the size of the data transferred is muh moreimportant when theThroughput
ds2
mappings:
0
0.5
1
1.5
2
2.5
3
3.5
4
0
50
100
150
200
250
300
350
(1,1,1)
(1,1,2)
(1,1,3)
(1,2,1)
(1,2,2)
(1,2,3)
(1,3,1)
(1,3,3)
(1,3,2)
Fig.5.1.Experiment2: Throughputfuntionofds
2
The best mapping is (1,3,2) when ds
2
<
150
, and (1,1,3) for greater values. Both of them avoid the slow onnetion nl1
-2
, and they use several proessorsso the proessing power is better than for mappings like(1,1,1). Whenthesizeofthedatatransferredbetweenthersttwostagesbeomeshigh, thebottlenekistheonnetionlink betweenthem, soitisbetterto putthemonthesameproessor,evenifwemaylosesome
proessingpower.
5.3. Experiment3: Pipelinewith
8
stages. Thelastexperimentonsidersalargerpipeline,omposed of8
stages. We use up to8
proessors, and ompare four dierent mappings, depending on the number of proessorswewishto use:•
8proessors,themappingis[1
,
(1
,
2
,
3
,
4
,
5
,
6
,
7
,
8)
,
8]
•
4proessors,themappingis[1
,
(1
,
1
,
2
,
2
,
3
,
3
,
4
,
4)
,
4]
•
2proessors,themappingis[1
,
(1
,
1
,
1
,
1
,
2
,
2
,
2
,
2)
,
2]
•
1proessor,themappingis[1
,
(1
,
1
,
1
,
1
,
1
,
1
,
1
,
1)
,
1]
The parametersare the same as for Experiment 1, with p
i
= 10
, wi
= 1
, dsi
=1and nli
-i
= 10000
for alli
. We vary the parametersnli
-j
, fori
6
=
j
, assuming that all these links are equal, and we omputethe throughputforthedierentmappings. Figure5.2displaystheresults.The urves obtained onrm that we should avoid data transfer when the network onnetions are less
eient. When nl
i
-j >
7
, thenetwork performswellenough to allowthe useof the8
proessors. However, whentheperformanedereases,weshould useonly4
proessors,thentwo,andonlyonewhennli
-j <
0
.
8
.Whenweneedto transfertheoutputdatabaktotherstproessor(forexample,themapping
[1
,
(1
,
2
,
3
,
4
,
5
,
6
,
7
,
8)
,
1]
for theasewith
8
proessors),weobtainalmost thesameresults, with aslightly smallerthroughput due to thisadditionaltransfer.6. Feasibilityoftheapproah. Weenvisagetheuseofourapproahwithinashedulingandresheduling
platform for long-running grid appliations. In this ontext it is antiipated that after initial analysis and
sheduling,thesystemwouldbemonitoredandthatreshedulingwouldbeneededonlyrelativelyinfrequently,
forexample,oneanhour. Neverthelessitisimportantthattheuseofthetooldoesnotontributeanoverhead
whih eliminatesthe benetto be obtainedfrom its use. In this setion we present evidene whih suggests
that this is not likely to be the ase in pratie. The reader should note that herewe are reeting on the
Number of processors
Throughput
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2
2.2
0
1
2
3
4
5
6
7
8
9
10
2
4
8
1
nl
i
-j
Fig.5.2. Experiment3: Pipelinewith
8
stagesWerananexperimenttoassessthetimetakentogenerateandsolvemodelsusingAMoGeT,whihwill, of
ourse,bedependentonthesizeofthegeneratedmodel. Fig.6.1illustratesthenumberofstatesandtransitions
ofthemodelsasafuntion oftheparametersof theskeleton. Thesenumbersareindependent ofthenumber
ofproessorsin themodel;theydependonlyonthenumberofpipelinestages.
1e+6
8e+5
6e+5
4e+5
2e+5
0
2
4
6
8
10
12
Numberof Stages
Num
b
er
of
states/transitions
states transitions
Fig.6.1. StatesandTransitions
Thetimerequiredtogenerateandsolvethemodelsmustbearefullyonsidered. Thegenerationisalways
very quik: it takes less than 0.01 seonds to generate 20 models. The time required to solve the models
isusually moreimportant,espeially when themodels havealargestatespae. However,if weonsider only
relativelysmallmodels(upto
20
,
000
states),theresolutionwiththePEPAworkbenhtakesonlyafewseonds. Fig.6.1showsthatwhenthenumberofstagesislessthan9
,thesizeofthemodelissmallenoughtohaveafast resolution. However,themodelgrowsexponentiallywhenthenumberofstagesisinreased,makingAMoGeTTheoveralluseofAMoGeTtakesusuallylessthanoneminuteforomplexappliationsrunningonseveral
proessors,evenwhenweonsider severalmodelstosolve.
Asstated earlier,in asenarioof longomputinggridappliations, witheventuallydynami resheduling
oftheappliation,weonsiderthat thetoolmayberun oneperhour. Wethereforebelievethattheamount
oftime requiredmaybequitenegligibleand that thegainobtainedbyusing thebestof themappings whih
were investigatedanoutperformtheostoftheuseofthetool.
7. Conlusions. In the ontext of grid appliations, the availability and performane of the resoures
hange dynamially. We haveshown through this study that the use of skeletons, and performane models
of these, an produe some relevant information to improve the performane of the appliation. This has
been illustrated on the pipeline skeleton, whih is a ommonly used algorithmi skeleton. The models help
us to hoose themapping,of the stagesonto theproessors,whih will produethebest throughput. Atool
automatesallthestepsto obtaintheresulteasily.
Thepipelineskeletonisasimpleontrolskeleton. Thedealskeletonhasalreadybeenmodelledinasimilar
way[3℄,andexperimentsareongoingusingdealskeletonsnestedintoapipelineappliation. Thisapproahwill
alsobedevelopedonsomeotherskeletonssoitmaybeusefulforalargerlassofappliations.
Our reent work onsiders the generation of models whih take into aountinformation from the Grid
resoures, whih is gathered with the help of the Network Weather Servie [17℄. This will allow us to have
models tted to the real-time onditions of the resoures. This rst ase study has already shown that we
an use suh information produtively and that we have the potential to enhane the performane of grid
appliationswiththeuseofskeletonsandproessalgebras.
Havingproessalgebramodelsof ourskeletons alsopotentially oersotherbenetssuh asthe ability to
formallyverifytheorretfuntioning oftheskeleton. Weintendto explorethis aspetinfuture work.
AppendixA. Strutured OperationalSemantisforPEPA.
Thesemantirules, in thestruturedoperationalstyle,are presentedin FigureA.1; theinterestedreader
is referredto [13℄ formoredetails. Therules areread asfollows: ifthetransition(s) abovetheinferene line
anbeinferred,thenwean inferthetransitionbelowtheline. Thenotation
rα
(
E
)
whihisusedin thethird ooperation rule denotes the apparent rate ofα
inE
, i.e. the sum of the rates of all ativities of typeα
inA
ct
(
E
)
.AppendixB. Pipelineexample: input le forthe PEPAWorkbenh.
Theinput lefor thePEPA Workbenh is displayedin Fig. B.1, forasmall examplewith
Ns
=
Np
= 3
, andwhereeahproessorishostingoneofthestages.AppendixC. The PEPA Workbenh.
C.1. Funtioning ofthe Workbenh. ThePEPAWorkbenhbeginsbygeneratingthereahablestate
spaeofaPEPAmodelasfoundfromallpossibleinterleavingsofitstransitionsfromstatetostate. Foranite
statemodelwith
n
stateswean enumerate thisstate spaeasC
=
{
C
1
, . . . , Cn
}
. As theworkbenharries outthistaskitompilestheinnitesimalgeneratormatrixQ
oftheontinuous-timeMarkovproessunderlyingthePEPAmodel. Theworkbenhaddsatransitionrate
r
toQij
everytimethatitndsatransitionfromstateCi
toCj
at rater
. Additionally it subtratsr
fromQii
in order that the rowsum of thematrix remainsinbalane.
Theonditionswhih must be satisedin orderto guarantee theexisteneof anequilibrium distribution
for aMarkov proess, and for this to be thesame asthe limitingdistribution, are well-knowna stationary
or equilibrium probability distribution,
Π
,exists for every time-homogeneous irreduible Markov hain whosestatesareall positive-reurrent.
Theintuitionbehindthisdistributionistheobviousone,namelythatinthelongruntheprobabilitythat
thePEPAmodel isinstate
Ci
isgivenbyΠ
(
Ci
)
.For nite state PEPA models whose derivation graph is strongly onneted, and whih therefore have
generated an ergodi Markov proess, the equilibrium distribution of the model,
Π
, is found by solvingthematrixequation
Prex
(
α, r
)
.E
−
−−→
(
α,r
)
E
Cooperation
E
−
(
−−→
α,r
)
E
′
E
⊲
⊳
L
F
(
α,r
)
−
−−→
E
′
⊲
⊳
L
F
(
α /
∈
L
)
F
(
α,r
)
−
−−→
F
′
E
⊲
⊳
L
F
(
α,r
)
−
−−→
E
⊲
⊳
L
F
′
(
α /
∈
L
)
E
−
(
−−→
α,r1
)
E
′
F
−
(
−−→
α,r2
)
F
′
E
⊲
⊳
L
F
(
α,R
)
−
−−→
E
′
⊲
⊳
L
F
′
(
α
∈
L
)
whereR
=
r
1
r
α
(
E
)
r
2
r
α
(
F
)
min(
r
α
(
E
)
, r
α
(
F
))
Choie
E
−
−−→
(
α,r
)
E
′
E
+
F
−
−−→
(
α,r
)
E
′
F
−
−−→
(
α,r
)
F
′
E
+
F
−
(
−−→
α,r
)
F
′
Hiding
E
−
−−→
(
α,r
)
E
′
E/L
−
−−→
(
α,r
)
E
′
/L
(
α /
∈
L
)
E
(
α,r
)
−
−−→
E
′
E/L
−
−−→
(
τ,r
)
E
′
/L
(
α
∈
L
)
Constant
E
(
−→
α,r
)
E
′
A
(
−→
α,r
)
E
′
(
A
def=
E
)
Fig.A.1. TheoperationalsemantisofPEPA
subjettothenormalisationonditionwhihensuresthat
Π
isawell-formedprobabilitydistributionX
Π
(
Ci
) = 1
.
(C.2)TheequationsC.1and C.2 areombinedbyreplaingaolumn of
Q
by aolumn of ones andplainga1intheorrespondingrowof
0
.Beausetheonnetivitygraphofthestatetransitionsystemofthemodelwillingeneralhavelowdegree,
the transition matrix of the Markov proess is best stored as a sparse matrix. The PEPA Workbenh uses
aJavaimplementation of the preonditioned bionjugate gradientmethod. This is an iterative proedure as
desribedin[15℄storingtheinnitesimalgeneratormatrixinrow-indexedsparsestoragemode,aompatstorage
mode whih requires storageof only abouttwotimes thenumberof nonzeromatrix elements. An advantage
of onjugate gradient methods for large sparse systems is that they referene the matrix only through its
multipliationofavetor,orthemultipliationofitstransposeandavetor.
C.2. Computing performane results with the PEPA Workbenh. The newfuntionalityof the
workbenhisdesribedthroughatinyexample[10℄,whihweshallrstdesribe. Wethenexplainhowtoadd
thedesriptionoftheresultsin thePEPA inputleandhowtoomputethem.
A tiny example. We desribe the omponents of the PEPA input language for the Workbenh via a
simpleexample,desribedin theletiny.pepa:
// PIPELINE APPLICATION
// 3 stages, 3 proessors (1 stage per proessor)
// Variables delaration (all idential)
mu1=10; mu2=10; mu3=10;
la1=10; la2=10; la3=10; la4=10;
// Definition of the Stages
Stage1 = (move1, infty).(proess1, infty).(move2, infty).Stage1;
Stage2 = (move2, infty).(proess2, infty).(move3, infty).Stage2;
Stage3 = (move3, infty).(proess3, infty).(move4, infty).Stage3;
// Definition of the Proessors
Proessor1 = (proess1, mu1).Proessor1;
Proessor2 = (proess2, mu2).Proessor2;
Proessor3 = (proess3, mu3).Proessor3;
// Definition of the Network
Network = (move1,la1).Network + (move2,la2).Network
+ (move3,la3).Network + (move4,la4).Network;
// The pipeline model
Network <move1,move2,move3,move4>
(Stage1 <move2> Stage2 <move3> Stage3)
<proess1,proess2,proess3> (Proessor1||Proessor2||Proessor3)
Fig.B.1.TheinputleforthePEPAWorkbenh:pipeline.pepa
P2=(run,r2).P3;
P3=(stop,r3).P1;
P1 || P1
Thismodelisomposedoftwoopiesofaomponent,
P
1
,exeutinginapureparallelsynhronization.P
1
isasimplesequentialproesswhih undergoesastartativitywithrate
r
1
tobeomeP
2
whihrunswithrater
2
to beomeP
3
whihgoesbaktoP
1
viaastop ativitywithrater
3
.Therstlineoftheleisdeningtherates. Thenthesequentialproessisdened,andthenallineisthe
systemequation,whihdesribesthebehaviourofthemodelledsystem.
Adding resultstothe inputle. Inordertoautomatiallyomputesomeperformaneresults,theuser
just needstospeifythemin thePEPAinputle, forexampleintheletiny.pepapresentedbefore. This is
donebyinludingat theendoftheleonelineperresult,oftheform:
result_name = {result_desription};
result_name = rate * {result_desription};
Thenameoftheperformaneresultthat isdesribedisresult_name,andthedesriptionoftheresultforthe
PEPAStateFinderisresult_desription.
Thestates ofinterest aredesribedthroughthe useof asimplepattern language,with double stars (**)
forwildards,anddoublevertialbars(||)forseparatorsbetweenmodelomponents. Themodelomponents
aredesribedin theorderusedin thesystemequation.
Arateanbeadded;inthisasethenalresultobtainedbythePEPAStateFinderwillbemultipliedby
thisrate. Thisisquiteusefulto omputethroughput.
For ourexample, we anadd someresults onerningthe rstproess, independentlyof the stateof the
run1 = {P2 || **};
Trun1 = r2 * {P2 || **};
stop1 = {P3 || **};
For example, the performane result run1 mathes all the states in whih the rst proess is ready to
perform therun ativity. Thestateof theseond proess anbeanything. Trun1isthe same,multiplied by
therateoftherunativityr2. Itorrespondsthereforetothethroughputofrunfortherstproess.
Forthepipelineappliation, therequiredperformaneresultisspeiedinthePEPAinputle
pipeline.pepa(Fig.B.1). Thisisdonebyaddingthefollowinglineat theend ofthisle:
Throughput = mu1 *
{
** <move1,move2,move3,move4>((proess1, infty).(move2,infty).Stage1 <move2> ** <move3> **)
<proess1,proess2,proess3> (** || ** || **)
}
Computingtheresults. Theresultsanbeomputedbyusingtheommandlineinterfae. Thisisdone
byinvokingthefollowingommand:
java pepa.workbenh.Main -run lr ./tiny.pepa
The-run lr(or-run lnbg+results)optionmeansthat weusethelinearbionjugate gradientmethod
toomputethesteadystatesolutionofthemodeldesribedin thele./tiny.pepa,andthenweomputethe
performaneresultsspeiedinthis le.
Thisexeutionprintstheresultstothesreen,anditalsosavesoneleperperformaneresult
(./results/model_name.result_name).ThisleistheoutputofthePEPAStateFinderfortheresult
desrip-tionspeiedin theinputle. Itontainsthestatemathing thedesription,andthesumofthesteady-state
probabilitiesforthesestates. Itdoesnottakethemultipliativerateintoaount. Theresultsarealsoappended
to the le./model_root.res,where model_root is the beginning of the model_name, until a
−
ora.
isfound. This isusedtoautomatiallyompareresultsofsimilarmodels.
OnlyafewleshavebeenmodiedtoinludethenewfuntionalityintheJavaWorkbenh. Theinterested
readershould referto[12℄.
REFERENCES
[1℄ M.Alt,H.Bishof,andS.Gorlath,ProgramDevelopmentforComputationalGridsUsingSkeletonsandPerformane
Predition,ParallelProessingLetters,12(2002),pp.157174.
[2℄ A. Benoit, M. Cole, S. Gilmore, and J. Hillston, Evaluating theperformane of skeleton-based highlevel parallel
programs,inTheInternationalConfereneonComputationalSiene(ICCS2004),PartIII,M.Bubak,D.vanAlbada,
P.Sloot,andJ.Dongarra,eds.,LNCS,SpringerVerlag,2004,pp.299306.
[3℄ A.Benoit,M.Cole,S.Gilmore,andJ.Hillston,Shedulingskeleton-basedgridappliationsusingPEPAandNWS,
SubmittedtoaspeialissueofTheComputerJournalonGridPerformabilityModellingandMeasurement,(2004).
[4℄ R. Biswas, M. Frumkin, W. Smith, andR. V. derWijngaart,Tools andTehniquesforMeasuring andImproving
Grid Performane,inPro. ofIWDC 2002 onDistributed Computing: Mobileand Wireless Computing,vol.2571of
LNCS,Calutta,India,De.2002,Springer-Verlag,pp.4554.
[5℄ M. Cole, Algorithmi Skeletons: Strutured Management of Parallel Computation, MIT Press & Pitman, 1989.
http://homepages.inf.ed.a.uk/mi/Pubs/skeletonbook.ps.gz.
[6℄ M.Cole,eSkel: TheedinburghSkeletonlibrary.TutorialIntrodution,InternalPaper,ShoolofInformatis,Universityof
Edinburgh,(2002).
http://homepages.inf.ed.a.uk/mi/eSkel/.
[7℄ M.Cole,BringingSkeletons outof theCloset: APragmatiManifesto forSkeletalParallel Programming,Parallel
Com-puting,30(2004),pp.389406.
[8℄ I.FosterandC.Kesselman,TheGrid:BlueprintforaNewComputingInfrastruture,MorganKaufmann,1998.
[9℄ N. Furmento, A. Mayer, S. MGough, S. Newhouse, T. Field, and J. Darlington, ICENI: Optimisation of
ComponentAppliationswithinaGridEnvironment,ParallelComputing,28(2002),pp.17531772.
[10℄ S.Gilmore,ThePEPAWorkbenh: User'sManual,InternalPaper,ShoolofInformatis,UniversityofEdinburgh,(2001).
http://www.ds.ed.a.uk/pepa/pwb.pdf.
[11℄ S.GilmoreandJ.Hillston,ThePEPAWorkbenh:ATooltoSupportaProessAlgebra-basedApproahtoPerformane
Modelling, inPro.of the 7th Int. Conf. on Modelling Tehniques and Tools for ComputerPerformaneEvaluation,
no.794inLNCS,Vienna,May1994,Springer-Verlag,pp.353368.
http://www.ds.ed.a.uk/pepa/workbenh.ps.gz.
[12℄ N.Haenel,UserGuidefortheJavaEditionofthePEPAWorkbenh-Tabasorelease,InternalPaper,ShoolofInformatis,
[13℄ J.Hillston,ACompositionalApproahtoPerformaneModelling,CambridgeUniversityPress,1996.
[14℄ N.Karonis,B.Toonen,andI. Foster,MPICH-G2:AGrid-Enabled ImplementationoftheMessagePassingInterfae,
JournalofParallelandDistributedComputing(JPDC),63(2003),pp.551563.
[15℄ W.H.Press,S.A.Teukolsky,W.T.Vetterling,andB.P.Flannery,NumerialReipesinC:TheArtofSienti
Computing,CambridgeUniversityPress,1992.
[16℄ F.RabhiandS.Gorlath,PatternsandSkeletonsforParallel andDistributedComputing,SpringerVerlag,2002.
[17℄ R.Wolski,N.Spring,andJ.Hayes,Thenetworkweatherservie: adistributedresoureperformaneforeastingservie
formetaomputing,FutureGenerationComputerSystems,15(1999),pp.757768.
Editedby: FrédériLoulergue
Reeived: June3,2004