Evaluating the performance of pipeline-structured parallel programs with skeletons and process algebra

(1)

Volume6,Number4,pp.116. http://www.spe.org

2005SWPS

EVALUATING THEPERFORMANCEOFPIPELINE-STRUCTURED PARALLEL

PROGRAMS WITH SKELETONSANDPROCESSALGEBRA

∗

ANNE BENOIT

†

, MURRAY COLE , STEPHEN GILMORE , AND JANE HILLSTON

Abstrat. Weshowinthispaper how toevaluatethe performaneofpipeline-strutured parallelprograms withskeletons

andproessalgebra. Sinemanyappliationsfollowsomeommonlyusedalgorithmiskeletons,weidentifysuh skeletonsand

modelthem withproess algebrainorderto getrelevant informationaboutthe performaneofthe appliation,and tobeable

totakegoodshedulingdeisions. Thisonept isillustrated through the asestudyofthe pipeline skeleton,and atool whih

generatesautomatiallyasetofmodelsandsolvesthemispresented. Somenumerialresultsareprovided,provingtheeayof

thisapproah.

Key words. Algorithmiskeletons, pipeline,high-levelparallel programs,performaneevaluation,proess algebra,PEPA

Workbenh.

1. Introdution. One of the most promising tehnial innovations in present-day omputing is the

in-ventionof grid tehnologies whih harness theomputational power of widelydistributed olletionsof

om-puters [8℄. Designing an appliation for the Grid raises diult issues of resoure alloation and sheduling

(roughlyspeaking,howtodeidewhihomputerdoeswhat,and when,andhowtheyinterat). These issues

aremadeall themoreomplexbytheinherentunpreditabilityof resoureavailabilityand performane. For

example,asuperomputermayberequiredforamoreimportanttask,ortheInternetonnetionsrequiredby

theappliation maybepartiularlybusy.

In this ontext of grid programming, askeleton-basedapproah [5, 16, 7℄ reognizes that manyreal

ap-pliations drawfrom arange of well-known solution paradigmsand seeks to make it easy for anappliation

developerto tailorsuhaparadigmto aspei problem. Powerfulstruturingoneptsare presentedto the

appliation programmeras alibrary ofpre-dened `skeletons'. As with otherhigh-levelprogrammingmodels

theemphasisisonprovidinggeneripolymorphiroutineswhihstrutureprogramsinlearly-delineatedways.

Skeletal parallel programming supports reasoning about parallel programs in order to removeprogramming

errors. Itenhanesmodularityandongurabilityinordertoaidmodiation,portingandmaintenane

ativ-ities. InthepresentworkwefousontheEdinburghSkeletonLibrary(eSkel)[6℄. eSkelisanMPI-basedlibrary

whihhasbeendesignedforSMPandlusteromputingandisnowbeingonsideredforgridappliationsusing

grid-enabledversionsofMPIsuhasMPICH-G2 [14℄.

Theuse ofapartiular skeleton arrieswith itonsiderable information aboutimplied sheduling

depen-denies. Bymodellingthese withstohastiproessalgebrassuh asPerformaneEvaluationProessAlgebra

[13℄,andtherebybeingableto inludeaspetsofunertaintywhihareinherenttogridomputing,webelieve

thatwewillbeabletounderpinsystemswhihanmakebettershedulingdeisionsthanlesssophistiated

ap-proahes. Mostsigniantly,sinethismodellingproessanbeautomated,andsinegridtehnologyprovides

failitiesfor dynami monitoringofresoureperformane, ourapproahwill support adaptive reshedulingof

appliations.

Stohastiproessalgebraswereintroduedintheearly1990sasaompositionalformalismforperformane

modelling. Sinethentheyhavebeensuessfullyappliedtotheanalysisofawiderangeofsystems. Ingeneral

analysisisbasedonthegenerationofanunderlyingontinuoustimeMarkovhain(CTMC)andderivation of

itssteadystateprobabilitydistribution. Thisvetorreordsthelikelihoodofeahpotentialstateofthesystem,

and an in turn be used to derive performane measures suh as throughput, utilisationand response time.

Several stohasti proess algebrashaveappeared in theliterature; we useHillston's PerformaneEvaluation

ProessAlgebra(PEPA)[13℄.

Somerelatedprojetsobtainperformaneinformationfrom theGridusing benhmarkingandmonitoring

tehniques[4,17℄. IntheICENIprojet[9℄,performanemodelsareusedto improvetheshedulingdeisions,

but these arejust graphswhih approximate dataobtainedexperimentally. Moreover,there is noupper-level

layerbasedonskeletonsinanyofthese approahes.

∗

Thiswork ispart of the ENHANCEprojet, funded by the UnitedKingdom Engineeringand PhysialSienes Researh

ounilgrantnumberGR/S21717/01.

†

Shool of Informatis, TheUniversityof Edinburgh, JamesClerk MaxwellBuilding,The King's Buildings,MayeldRoad,

(2)

Otherreentwork onsiderstheuseofskeleton programswithin gridnodestoimprovethequalityofost

information[1℄. Eahserverprovidesasimplefuntionapturingtheostofitsimplementationofeahskeleton.

Inan appliation,eah skeleton therefore runs onlyon oneserver,and the goalof sheduling is to seletthe

most appropriate serverswithin the wider ontext of the appliation and supporting grid. In ontrast, our

approah onsiders singleskeletons whih span theGrid. Moreover,weuse modelling tehniques to estimate

performane.

Our main ontribution is based on the ideaof using performane models to enhane the performane of

grid appliations. We propose to model skeletons in ageneri way to obtain signiantperformane results

whihmaybeusedtoresheduletheappliationdynamially. Tothebestofourknowledge,this kindofwork

has notbeendone before. We show in this paper how wean obtainsigniantresults onarst ase study

basedonthepipelineskeleton. Anearlierversionofthispaperispublishedintheproeedingsoftheworkshop

on PratialAspets of High-level Parallel Programming (PAPP04), part of the InternationalConferene on

ComputationalSiene(June7-9,2004,Kraków,Poland)[2℄. InthisextendedversionapresentationofPEPA

isinluded; the model resolutionand thetoolAMoGeTaredesribedmorepreisely; and moreexperimental

resultsareexposed.

In thenext setion, wepresent thepipeline and amodel of theskeleton. Then we explain how to solve

themodelwiththePEPA Workbenhinordertogetrelevantinformation(Setion3). InSetion 4wepresent

a tool whih automatially determines the best mappingto usefor the appliation, by rst generating a set

of models, then solvingthem and omparing the results. Some numerial resultson the pipeline appliation

are provided in Setion 5,and thefeasibilityof this approah isdisussed in Setion 6. Finallywegivesome

onlusions.

2. Thepipelineskeleton. Manyparallelalgorithmsanbeharaterizedandlassiedbytheiradherene

tooneormoreofanumberofgenerialgorithmiskeletons[16,5,7℄. Wefousinthispaperontheoneptof

pipelineparallelism,whihisofwell-provenusefulnessinseveralappliations. Wereallbrieytheprinipleof

thepipelineskeleton. ThenweintroduetheproessalgebraPEPA[13℄andweexplainhowweanmodelthe

pipelinewithPEPA. Finally,weshowin Setion2.4thestatetransitiondiagramofathree stagepipeline.

2.1. The prinipleof pipeline. Inthesimplest formofpipelineparallelism[6℄,asequeneof

Ns

stages

proessasequeneofinputstoprodueasequeneofoutputs(Fig.2.1).

...

Stage

1

Stage

2

Stage

N

s

inputs outputs

Fig.2.1. Thepipeline appliation

Eah inputpasses througheah stage in thesameorder, andthe dierent inputsare proessedone after

another(astageannotproessseveralinputsatthesametime). Notethattheinternalativityofastagemay

beparallel, but this is transparent to ourmodel. Inthe remainderof the paper weuse the termproessor

todenotethehardwareresponsibleforexeutingsuhativity,irrespetiveofitsinternaldesign(sequentialor

parallel).

We onsider this appliation lass in the ontext of omputational grids, and so we want to map it to

our omputing resoures, whih onsist of a set of potentially heterogeneous proessors interonneted by a

heterogeneousnetwork.

It is well known that aomputing pipeline performs mosteetively when theworkload iswell balaned

arossstages andthereare alargeenoughnumberof inputsto amortizetheostsof llingand draining. Our

work diretly addressesthe rst of these issues, by failitating explorationof the stage-to-proessormapping

spae. Theseondissueremainstheresponsibilityoftheprogrammer: ourapproahassumesthat runningthe

appliation will take longenough for the system to reah an equilibrium behaviour. The models help us to

studythissteadystatebehaviour.

Considering the pipeline appliation in the eSkel library [6℄, we fous here on a pipeline variant whih

requiresthat eahstageproduesexatlyoneoutputforeah input.

We now go on to present thePEPA languagewhih wewill use to model thepipeline appliation. The

(3)

2.2. Introdution to PEPA. The PEPA language provides a small set of ombinators. These allow

languagetermstobeonstruteddeningthebehaviourofomponents,viatheativitiestheyundertakeandthe

interationsbetweenthem. Timinginformationisassoiatedwitheahativity. Thus,whenenabled,anativity

a

= (

α, r

)

will delay foraperiod sampled from thenegativeexponentialdistribution whih hasparameter

r

. If several ativities are enabled onurrently, either in ompetition orindependently, we assume that arae

ondition exists between them. The omponent ombinators, together with their namesand interpretations,

arepresentedinformallybelow.

Prex: Thebasi mehanism for desribingthe behaviour of asystem is to givea omponent a designated

rstation using the prexombinator,denoted by afull stop. For example,the omponent

(

α, r

)

.S

arries outativity

(

α, r

)

,whihhasationtype

α

andanexponentiallydistributeddurationwithparameter

r

,andit subsequentlybehavesas

S

.

Choie: The hoie ombinatoraptures the possibility of ompetition betweendierent possible ativities.

Theomponent

P

+

Q

representsasystemwhih maybehaveeither as

P

oras

Q

. The ativitiesof both

P

and

Q

areenabled. Therstativitytoompletedistinguishesoneofthem: theotherisdisarded. Thesystem

willbehaveasthederivativeresultingfromtheevolutionofthehosenomponent.

Constant: It is onvenientto beableto assign namesto patternsof behaviourassoiatedwith omponents.

Constantsare omponentswhose meaningis givenbyadening equation. Forexample,

P

def

= (

α, r

)

.P

denes aomponentwhihperformsativity

α

atrate

r

,forever.

Hiding: Thepossibilityto abstrat awaysomeaspets ofaomponent's behaviouris provided bythehiding

operator, denoted

P/L

. Here, the set

L

of visible ation types identies those ativities whih are to be

onsideredinternalorprivatetotheomponentandwhihwillappearastheunknowntype

τ

.

Cooperation: InPEPAdiretinteration,orooperation,betweenomponentsisthebasisofompositionality.

The set whih is used asthe subsript to the ooperation symbol, the ooperation set

L

, determines those

ativities on whih the o-operands are fored to synhronise. For ation types not in

L

, the omponents

proeedindependentlyand onurrentlywith theirenabledativities. However,anativity whoseationtype

is in the ooperation set annot proeed until both omponents enable an ativity of that type. The two

omponents then proeed together to omplete the shared ativity. The rate of the sharedativity may be

alteredto reetthe work arriedoutby bothomponentsto ompletethe ativity (fordetails see[13℄). We

write

P

k

Q

asanabbreviationfor

P

⊲

⊳

L

Q

when

L

isempty.

In some ases, when an ativity is known to be arried out in ooperation with another omponent, a

omponent may be passive with respet to that ativity. This means that the rate of the ativity is left

unspeied(denoted

⊤

)andisdetermineduponooperation,bytherateoftheativityintheotheromponent.

Allpassiveationsmustbesynhronisedin thenalmodel.

ThedynamibehaviourofaPEPA modelisrepresentedbytheevolutionofitsomponents,either

individ-uallyorinooperation. Theformofthisevolutionisgovernedbyasetofformalruleswhihgiveanoperational

semantisofPEPAterms(see[13℄). Thus,asinlassialproessalgebra,thesemantisofeahterminPEPAis

givenviaalabelledmulti-transitionsystem(themultipliitiesofarsaresigniant). Inthetransitionsystema

stateorrespondstoeahsyntatitermofthelanguage,orderivative,andanarrepresentstheativitywhih

ausesonederivativeto evolveinto another. Theompleteset ofreahablestatesis termedthederivative set

of amodel and these form the nodesof the derivation graph whih is formed by applying thesemanti rules

exhaustively.

ThederivationgraphisthebasisoftheunderlyingContinuousTimeMarkovChain(CTMC)whihisused

toderiveperformanemeasuresfrom aPEPA model. Thegraphissystematiallyreduedto aform where it

anbetreatedasthestatetransitiondiagramoftheunderlyingCTMC. Eah derivativeisthenastatein the

CTMC. Thetransition rate betweentwoderivatives

P

and

Q

inthederivationgraphis therateatwhih the

systemhangesfrom behavingasomponent

P

to behavingas

Q

. It isthesumof theativityrateslabelling

arsonnetingnode

P

tonode

Q

.

(4)

The stages

Therstpartofthemodelistheappliation model,whihisspeiedindependentlyoftheresouresonwhih

theappliation will beomputed. We deneone PEPA omponent per stage. For

i

= 1

..Ns

, theomponent Stage

i

workssequentially. Atrst,itgetsdata(ativitymove

i

),thenproessesit(ativityproess

i

),andnally movesthedatato thenextstage(ativitymove

i

+1

).

Stage

i

def

=

(

move

i,

⊤

)

.

(

proess

i

,

⊤

)

.

(

move

i

+1

,

⊤

)

.

Stage

i

Alltheratesareunspeied,denotedbythedistinguishedsymbol

⊤

,sinetheproessingandmovetimes

dependontheresoureswheretheappliationis running. These rateswill bedened later,inanotherpartof

themodel.

Thepipeline appliation isthen dened asaooperationofthe dierentstages overthemove

i

ativities, for

i

= 2

..Ns

.

Theativities move

1

and move

N

s

+1

represent, respetively, thearrivalof aninput in theappliation and

thetransferofthenal outputoutofthepipeline. Theydonotrepresentanydatatransferbetweenstages,so

theyarenotsynhronizingthepipelineappliation. Finally, wehave:

Pipeline def

=

Stage

1 {

move

⊲

⊳

₂

}

Stage

2 {

move

⊲

⊳

₃

}

. . .

⊲

⊳

{

move

Ns

}

Stage

N

s

The proessors

Weonsiderthattheappliationmustbemappedonasetof

Np

proessors. Eahstageisproessedbyagiven

(unique)proessor,but aproessormayproessseveralstages (intheasewhere

Np

< Ns

). Inorderto keep

themodelsimple, wedeideto put information abouttheproessor(suh asthe loadof theproessororthe

numberof stages being proessed)diretlyin therate

µi

of theativities proess

i

,

i

= 1

..Ns

(these ativities havebeendenedfortheomponentsStage

i

).

Eah proessor is then represented by a PEPA omponent whih has a yli behaviour, onsisting of

proessingsequentiallyinputsforastage. Someexamplesfollow.

•

Intheasewhen

Np

=

Ns

,wemap onestageperproessor:

Proessor

i

def

=

(

proess

i

, µi

)

.

Proessor

i

•

If several stages are proessed by a same proessor, we use a hoie omposition. In the following

example (

Np

= 2

and

Ns

= 3

), the rst proessor proesses the two rst stages, and the seond proessorproessesthethirdstage.

Proessor

1

def

=

(

proess

1 , µ

1 )

.

Proessor

1 + (

proess

2 , µ

2 )

.

Proessor

1

Proessor

2

def

=

(

proess

3 , µ

3 )

.

Proessor

2

Sineallproessorsareindependent,thesetofproessorsisdenedasaparallelompositionoftheproessor

omponents:

Proessors def

=

Proessor

1 ||

Proessor

2 ||

. . .

||

Proessor

N

p

The network

Thelastpartofthemodelisthenetwork.Wedonotneedtodiretlymodelthearhitetureandthetopology

of the network for what we aim to do, but we wantto getsome information about theeieny of thelink

onnetionbetweenpairsofproessors. Thisinformationisgivenbyaetingtherates

λi

ofthemove

i

ativities (

i

= 1

..Ns

+ 1

).

λ

1

representstheonnetionbetweentheuser(providinginputstothepipeline)andtheproessorhosting

therststage.

For

i

= 2

..Ns

,

λi

representstheonnetionbetweentheproessorhostingstage

i

−

1

and theproessor hostingstage

i

.

λN

(5)

Notethat

λi

will enodeinformation bothabouttheloadonthe linksand thesize of thedataproessed

byproess

i

−

1

. Whenthedataistransferred onthesameomputer,therateisreallyhigh, meaningthat the

onnetionis fast(omparedtoatransferbetweendierentsites).

Thenetworkisthen modelledbythefollowingomponent:

Network def

=

(

move

1 , λ

1 )

.

Network

+

· · ·

+ (

move

N

s

+1

, λN

s

+1

)

.

Network

The pipeline model

Onewehavedenedthedierentomponentsofourmodel,wejusthavetomapthestagesontotheproessors

andthenetworkbyusingtheooperationombinator. Forthis,wedenethefollowingsetsofationtypes:

Lp

=

{

proess

i

}

i

=1

..N

s

tosynhronizethePipeline andtheProessors

Lm

=

{

move

i

}

i

=1

..N

s

+1

tosynhronizethePipeline andtheNetwork

Mapping def

=

Network

⊲

⊳

Lm

Pipeline

⊲

⊳

Lp

Proessors

PEPA input le

AnexampleofaninputleforthePEPA WorkbenhanbefoundinAppendix B.

2.4. State transition diagram for the pipeline model. Figure 2.2 represents the state transition

diagram of a three stage, three proess pipeline. This piture shows all of the possible interleavings of the

omponents of the model with ars of various kinds showingthe dierent types of transitions from state to

state.

InTable2.1 we showthe orrespondene betweenthe statenumbersin Figure2.2 and the PEPA terms.

Sine the PEPA terms are long we have omitted the ooperation sets, showing only the loal state of eah

omponent. Moreoverto keep thetable ompat we havenamed thederivativesof the

Stage

omponentsas

follows:

Stage

i

0

def

= (

move

i,

⊤

)

.

Stage

i

1

Stage

i

1

def

= (

proess

i,

⊤

)

.

Stage

i

2

Stage

i

2

def

= (

move

i

+1

,

⊤

)

.

Stage

i

0

3. Solving the models. One reason to work with aformal modelling languagesuh as PEPA is that

modelsareunambiguousandanservetosupport reliableommuniationbetweenthosewhodesignsystems,

those who develop them and those who maintain them. Another reason to work with a formal modelling

languageis that formal models an be automatially proessed by tools in order to derive information from

themwhih otherwisewouldhavetobeproduedbymanualalulationorreasoning.

Thetoolwhih wehaveusedforproessingourPEPA modelsandomputingthesteady-stateprobability

distributionofoursystemisthePEPAWorkbenh. Afulldesriptionofthefuntioningofthissoftwareanbe

foundin[11℄;thereferenemanualforthelatestreleaseis[12℄. Weinludeabriefdesriptionofthefuntioning

oftheWorkbenhinAppendixC.1in ordertomakethepresentpaperself-ontained.

Notiehoweverthat thesteady-state probability distribution of thesystem is rarelythe desiredresult of

the performane analysis proess and so to progress we must identify a signiant performane result. The

performane result that is pertinent for the pipeline appliation is the throughput of the proess

i

ativities (

i

= 1

..Ns

). Sinedatapassessequentiallythrougheahstage,thethroughputisidentialforall

i

,andweneed toomputeonlythethroughputofproess

1

toobtainsigniantresults. Thisisdonebyaddingthesteady-state

probabilitiesof eahstateinwhih proess

1

anhappen,andmultiplyingthisby

µ

1

.

We have made somehanges to the Java edition of the PEPA Workbenh in order to allowthe user to

(6)

25

26

27

24

21 move1

move2

move3

move4

process3

process2

process1

8

9

2

3

10

11

12

7

16

17

18

23

22

4

19

6

20

14

15

5

13

1

Fig.2.2. Statetransitiondiagramofathreestage,threeproesspipelinewithstatesnumberedaordingtoTable2.1

4. AMoGeT:The AutomatiModelGenerationTool. Weinvestigateinthispaperhowtoenhane

theperformaneofgridappliationswiththeuseofalgorithmiskeletonsandproessalgebras. Todothis,we

havereated atool whih automatially generates performane models for thepipeline asestudy, and then

solvesthemodels. These resultsouldbeusedto reshedule theappliation.

Wegiveatrstanoverviewofthetool. Thenwedesribetheinformationwhihisprovidedtothetoolvia

adesription le. Finally,weexplainthefuntioningofthetool.

desription

le

performane

information PEPA

models results

AMoGeT

Compare

results

models Generate

Workbenh PEPA

Fig.4.1.TheprinipleofAMoGeT

4.1. AMoGeT desription. Fig.4.1 illustrates thepriniple of the tool. Inits urrent form, thetool

is a generi, reusable software omponent. Its ultimate role will be as an integrated omponent of a

(7)

Table2.1

CorrespondenebetweenstatenumbersinFigure2.2andPEPAterms(ooperation setsareomittedbutremainonstant)

stateno. PEPAstate

1

(

Network

,

(

Stage

10 ,

Stage

20 ,

Stage

30 )

,

(

Proessor

1 ,

Proessor

2 ,

Proessor

3))

2

(

Network

,

(

Stage

11 ,

Stage

20 ,

Stage

30 )

,

(

Proessor

1 ,

Proessor

2 ,

Proessor

3))

3

(

Network

,

(

Stage

12 ,

Stage

20 ,

Stage

30 )

,

(

Proessor

1 ,

Proessor

2 ,

Proessor

3))

4

(

Network

,

(

Stage

10 ,

Stage

21 ,

Stage

30 )

,

(

Proessor

1 ,

Proessor

2 ,

Proessor

3))

5

(

Network

,

(

Stage

11 ,

Stage

21 ,

Stage

30 )

,

(

Proessor

1 ,

Proessor

2 ,

Proessor

3))

6

(

Network

,

(

Stage

12 ,

Stage

21 ,

Stage

30 )

,

(

Proessor

1 ,

Proessor

2 ,

Proessor

3))

7

(

Network

,

(

Stage

10 ,

Stage

22 ,

Stage

30 )

,

(

Proessor

1 ,

Proessor

2 ,

Proessor

3))

8

(

Network

,

(

Stage

11 ,

Stage

22 ,

Stage

30 )

,

(

Proessor

1 ,

Proessor

2 ,

Proessor

3))

9

(

Network

,

(

Stage

12 ,

Stage

22 ,

Stage

30 )

,

(

Proessor

1 ,

Proessor

2 ,

Proessor

3))

10

(

Network

,

(

Stage

10 ,

Stage

20 ,

Stage

31 )

,

(

Proessor

1 ,

Proessor

2 ,

Proessor

3))

11

(

Network

,

(

Stage

11 ,

Stage

20 ,

Stage

31 )

,

(

Proessor

1 ,

Proessor

2 ,

Proessor

3))

12

(

Network

,

(

Stage

12 ,

Stage

20 ,

Stage

31 )

,

(

Proessor

1 ,

Proessor

2 ,

Proessor

3))

13

(

Network

,

(

Stage

10 ,

Stage

21 ,

Stage

31 )

,

(

Proessor

1 ,

Proessor

2 ,

Proessor

3))

14

(

Network

,

(

Stage

11 ,

Stage

21 ,

Stage

31 )

,

(

Proessor

1 ,

Proessor

2 ,

Proessor

3))

15

(

Network

,

(

Stage

12 ,

Stage

21 ,

Stage

31 )

,

(

Proessor

1 ,

Proessor

2 ,

Proessor

3))

16

(

Network

,

(

Stage

10 ,

Stage

22 ,

Stage

31 )

,

(

Proessor

1 ,

Proessor

2 ,

Proessor

3))

17

(

Network

,

(

Stage

11 ,

Stage

22 ,

Stage

31 )

,

(

Proessor

1 ,

Proessor

2 ,

Proessor

3))

18

(

Network

,

(

Stage

12 ,

Stage

22 ,

Stage

31 )

,

(

Proessor

1 ,

Proessor

2 ,

Proessor

3))

19

(

Network

,

(

Stage

10 ,

Stage

20 ,

Stage

32 )

,

(

Proessor

1 ,

Proessor

2 ,

Proessor

3))

20

(

Network

,

(

Stage

11 ,

Stage

20 ,

Stage

32 )

,

(

Proessor

1 ,

Proessor

2 ,

Proessor

3))

21

(

Network

,

(

Stage

12 ,

Stage

20 ,

Stage

32 )

,

(

Proessor

1 ,

Proessor

2 ,

Proessor

3))

22

(

Network

,

(

Stage

10 ,

Stage

21 ,

Stage

32 )

,

(

Proessor

1 ,

Proessor

2 ,

Proessor

3))

23

(

Network

,

(

Stage

11 ,

Stage

21 ,

Stage

32 )

,

(

Proessor

1 ,

Proessor

2 ,

Proessor

3))

24

(

Network

,

(

Stage

12 ,

Stage

21 ,

Stage

32 )

,

(

Proessor

1 ,

Proessor

2 ,

Proessor

3))

25

(

Network

,

(

Stage

10 ,

Stage

22 ,

Stage

32 )

,

(

Proessor

1 ,

Proessor

2 ,

Proessor

3))

26

(

Network

,

(

Stage

11 ,

Stage

22 ,

Stage

32 )

,

(

Proessor

1 ,

Proessor

2 ,

Proessor

3))

27

(

Network

,

(

Stage

12 ,

Stage

22 ,

Stage

32 )

,

(

Proessor

1 ,

Proessor

2 ,

Proessor

3))

Informationisprovidedtothetoolviaadesriptionle(f.Setion4.2). Thisinformationanbegathered

fromtheGridresouresandfrom theappliationdenition. Inthefollowingexperiments,itisprovidedbythe

user,butweanalsogetitautomatiallyfromgridservies,forexamplefromtheNetworkWeatherServie[17℄.

The tool allows everythingto be done in a single step through a simplePerl sript (f. Setion 4.3): it

generatesthemodels, solvesthemwith thePEPA Workbenh, and thenomparestheresults. This allowsus

tohavefeedbakontheappliationwhentheperformaneof theavailable resouresismodied.

4.2. Desription le for AMoGeT. The aimof this leis toprovideinformation abouttheavailable

gridresouresandthemodelledappliation,in ourasethepipeline.

Thisdesriptionle isnamed mymodel.des,wheremymodelisthenameoftheappliation.

•

Therstinformationprovidedisthetypeofthemodel. Sinewestudyherethepipelineskeleton,the

rstlineis

type

=

pipeline

;

•

WethenhavetheinformationabouttheGridresouresandNetworklinks,asalistofparameters. The

numberofproessors

N

mustatrstbespeied:

nbproc

=

N

;

Andthen,for

i

= 1

..N

and

j

= 1

..N

,wespeifytheavailableomputingpoweroftheproessor

i

(p

i

), andtheperformaneofthenetworklinkbetweenproessors

i

and

j

(nl

i

-

j

):

p1=10; p2=5;

nl1-1=10000; nl1-2=8;

(8)

•

Conerning the appliation, we have some information about the stages of the pipeline.

Ns

is the

numberofstages.

nbstage=

Ns

;

Theamountofwork w

i

requiredtoomputeoneoutputforstage

i

mustbespeiedfor

i

= 1

..Ns

: w1=2; w2=4; ...

Finally,weneed tospeifythesize ofthedatatransferred toand fromeahstage. For

i

= 1

..Ns

+ 1

, ds

i

isthesizeofthedatatransferredtostage

i

,withtheboundaryaseds

Ns

+ 1

whihrepresentsthe sizeoftheoutputdata.

ds1=100; ds2=5; ...

•

Next wedeneasetofandidatemappings ofstagesto proessors. Eahmappingspeies wherethe

initial datais loated, where theoutput data must beleft and (as atuple) theproessorwhere eah

stage isproessed. Forexample,thetuple

(1

,

1 ,

2)

meansthat thetworststages areonproessor

1

, withthethirdstageonproessor

2

. Amappingisthenoftheform

[

input, tuple, output

]

. Themapping denitionisasetofmappings,itanbeasfollows:

mappings=[1,(1,2,3),3℄,[1,(1,1,2),2℄,[1,(1,1,1),1℄;

•

Thelastthingistheperformaneresultwewanttoompute. Forthepipelineappliation,weanask

forthethroughputwiththeline:

throughput;

4.3. The AMoGeTPerlsript. Thetoolallowseverythingtobedoneinasinglestepthroughasimple

Perlsript. The model generation is done by alling an auxiliaryfuntion. Models arethen solvedwith the

PEPA Workbenh asseenin Setion3. Finally, theresultsare ompared. Thisallowsusto havefeedbakon

theappliationwhentheperformaneoftheavailableresouresismodied.

Onemodelisgeneratedfromeahmappingofthedesriptionle. EahmodelisasdesribedinSetion2.3.

Thediultpointonsistsofgeneratingtheratesfromtheinformationgatheredbefore. Themodelgeneration

itselfisthenstraightforward.

Toomputetheratesoftheproess

i

ativitiesforagivenmodel (

i

= 1

..Ns

), weneedto knowhowmany stagesarehosted oneahproessor,andweassumethattheworksharingbetweenthestagesisequitable. The

rateassoiatedwiththeproess

i

ativityisthen:

µi

=

w

i

×

cp

j

nbst

j

where

j

is thenumberoftheproessorhostingthestage

i

,andnbst

j

is thenumberofstagesbeingproessed

onproessor

j

. Ineet,theavailableomputingpowerp

j

isfurther dilutedbyourowninternaltimesharing

fatornbst

j

,beforebeingapplied totheworkloadassoiatedwiththestage,w

i

.

Theratesofommuniationbetweenstagesdependonthemappingtoo,sinetherateofamove

i

ativity dependsontheonnetionlinkbetweentheproessor

j

1

hostingstage

i

−

1

andtheproessor

j

2

hostingstage

i

, whihisgivenbynl

j

1

-

j

2

. Sinethemappingspeieswhere theinputandoutput dataare, weanalsond

the onnetion link for the data arriving into the pipeline and thedata exiting the appliation. These rates

depend alsoonthe size ofthe datatransferred from onestage of thepipeline to thenext, givenby ds

i

. The

boundaryases areapplied to omputethe ratesof the move

1

and move

N

s

+1

ativities. The rate assoiated

withthemove

i

ativityistherefore:

λi

=

nl

j

1 −

j

2 ds

i

Onetheseratesarederived,generatingthemodelisstraightforward. Weadd intothelethedesription

ofthethroughputoftheproess

1

ativityasarequiredresulttoallowanautomatiomputationofthisresult.

ThemodelsanthenbesolvedwiththePEPAWorkbenh,andthethroughputofthepipelineisautomatially

omputed(Setion3). Duringtheresolution,alltheresultsaresavedinasinglele,andthelaststepofresults

(9)

5. Numerialresults. Wepresentinthissetionsomenumerialresults. Weexplainthroughthemhow

theinformationobtainedwithAMoGeTanberelevantforoptimizingtheappliation.

In the present paper we do notapply this method to agivenreal-world example. We use anabstrat

pipeline forwhih wearbitrarily x thetime requiredto omplete eah stage. Thisis suient toshowthat

AMoGeTanhelptooptimizeanappliation.

5.1. Experiment1: Pipelinewith

3

stagesxed data size. Wegivehereafewnumerialresults onan examplewith

3

pipeline stages (and upto

3

proessors). The models that weneed to solve arereally small(inthisase,themodelhas

27

statesand

51

transitions,f. Figure2.2).

Wesupposeinthisexperimentthatnl

i

-

i

=10000for

i

= 1

..

3

,andthatthereisnoneedtotransfertheinput or the output data. Moreover,wesuppose that the network is symmetrial(nl

i

-

j

=nl

j

-

i

for all

i, j

= 1

..

3

). Conerningthepipelineparameters,theamountofworkw

i

requiredto omputeeahstageis

1

,aswellasthe sizeofthedatads

i

whihistransferredfromonestagetoanother. Therelevantparametersarethereforenl1-2,

nl2-3,nl1-3,andp

i

for

i

= 1

..

3

. Weomparedierentmappings,andjustspeifythetupleindiatingwhih stageisonwhihproessor. Weomparethemappings(1,1,1),(1,1,2),(1,1,3),(1,2,1),(1,2,2),(1,2,3),(1,3,1),

(1,3,2)and(1,3,3)(therststageisalwaysonproessor

1

). TheresultsaredisplayedinTable5.1,andweonly putthebest ofthemappingswhihwereinvestigatedintherelevantlineofthetable.

Table5.1

ResulttableforExperiment1

Setofresults Parameters Mapping&

nl1-2 nl2-3 nl1-3 p1 p2 p3 Throughput

1

10000

10

(1,2,3):

5 .

63467

10000

5

(1,2,3):

2 .

81892

2

10000

10

1

(1,2,1):

3 .

36671

10

1

(1,1,2):

2 .

59914

1

10

1

(1,1,1):

1 .

87963

3

10

1

10

(1,1,2):

2 .

59914

10

1

100

(1,3,3):

0 .

49988

Intherstsetofresults,alltheproessorsareidentialandthenetworklinksarereallyfast. Intheseases,

thebest mappingalwaysonsistsof putting onestageoneah proessor(the resultsforthemapping

(1

,

3 ,

2)

areidentialtothebestmapping). Ifwedividethetimealloatedbytheproessortotheappliationby

2

,the resultingthroughputisalsodividedby

2

,sineonlytheproessingpowerhasanimpatonthethroughput.

The seond set of results illustrates the ase when one proessor is beoming really busy, in this ase

proessor

3

. Weshouldnotuseitanymore,butdependingonthenetworklinks,thebestmappingmayhange. Ifthelinksarenoteient,weshouldindeedavoiddatatransferandtrytoputonseutivestagesonthesame

proessor. Whennl1-2=nl2-3=nl1-3=

10

,themapping

(1

,

2 ,

2)

providesthesameresultsas

(1

,

1 ,

2)

. Finally, the third set of results shows what happens if thenetwork link to proessor

3

is really slow. In this ase again, the use of the proessorshould be avoided, and the best mappings are

(1

,

1 ,

2)

and

(1

,

2 ,

2)

. However,ifproessor

3

isareallyfastproessoromparedtotheotherones(lastline),weproessstage

2

and stage

3

onthethirdproessor(mapping

(1

,

3 ,

3)

).

5.2. Experiment2: Pipeline with

3

stagesdata size hanging. Thethird experimentkeepsthe

3

stagepipeline,butonsidershangesinthesizeofthedata. TheassumptionsarethesameasforExperiment1, butmoreparametershaveaxedvalue.

In this experiment, the network onnetionbetweenproessors

1

and

2

is slightlyless eetive than the others. So, wehave nl

1

-

2 = 100

, nl

2

-

3 =

nl

1

-

3 = 1000

. Moreover, theomputing powerof eah stage is p

i

= 10

. The size of the data is nowxed to

100

, exept from the data transiting from stage

1

to stage

2

(ds

2

),whosesizeisvarying.

Figure5.1presentsthethroughputobtainedwitheah mapping,asafuntionofthedatasize ds

2

. Notierstthatsomeofthemappingsarenotinuenedbythehangeofthedatasize,i.e.(1,1,1),(1,1,2)

and(1,1,3). This isduetothefat thattheonnetionbetweenstages

1

and

2

isgood beausethedatastays on the same proessor. The inuene of the size of the data transferred is muh moreimportant when the

(10)

Throughput

ds2

mappings:

0

0.5

1

1.5

2

2.5

3

3.5

4

0

50

100

150

200

250

300

350 (1,1,1)

(1,1,2)

(1,1,3)

(1,2,1)

(1,2,2)

(1,2,3)

(1,3,1)

(1,3,3)

(1,3,2)

Fig.5.1.Experiment2: Throughputfuntionofds

2

The best mapping is (1,3,2) when ds

2 <

150

, and (1,1,3) for greater values. Both of them avoid the slow onnetion nl

1

-

2

, and they use several proessorsso the proessing power is better than for mappings like(1,1,1). Whenthesizeofthedatatransferredbetweenthersttwostagesbeomeshigh, thebottlenekis

theonnetionlink betweenthem, soitisbetterto putthemonthesameproessor,evenifwemaylosesome

proessingpower.

5.3. Experiment3: Pipelinewith

8

stages. Thelastexperimentonsidersalargerpipeline,omposed of

8

stages. We use up to

8

proessors, and ompare four dierent mappings, depending on the number of proessorswewishto use:

•

8proessors,themappingis

[1

,

(1

,

2 ,

3 ,

4 ,

5 ,

6 ,

7 ,

8)

,

8]

• [1

,

(1

,

1 ,

2 ,

3 ,

4 ,

4)

,

4]

• [1

,

(1

,

1 ,

2 ,

2)

,

2]

•

1proessor,themappingis

[1

,

(1

,

1 ,

1)

,

1]

The parametersare the same as for Experiment 1, with p

i

= 10

, w

i

= 1

, ds

i

=1and nl

i

-

i

= 10000

for all

i

. We vary the parametersnl

i

-

j

, for

i

6 =

j

, assuming that all these links are equal, and we omputethe throughputforthedierentmappings. Figure5.2displaystheresults.

The urves obtained onrm that we should avoid data transfer when the network onnetions are less

eient. When nl

i

-

j >

7

, thenetwork performswellenough to allowthe useof the

8

proessors. However, whentheperformanedereases,weshould useonly

4

proessors,thentwo,andonlyonewhennl

i

-

j <

0 .

8

.

Whenweneedto transfertheoutputdatabaktotherstproessor(forexample,themapping

[1

,

(1

,

2 ,

3 ,

4 ,

5 ,

6 ,

7 ,

8)

,

1]

for theasewith

8

proessors),weobtainalmost thesameresults, with aslightly smallerthroughput due to thisadditionaltransfer.

6. Feasibilityoftheapproah. Weenvisagetheuseofourapproahwithinashedulingandresheduling

platform for long-running grid appliations. In this ontext it is antiipated that after initial analysis and

sheduling,thesystemwouldbemonitoredandthatreshedulingwouldbeneededonlyrelativelyinfrequently,

forexample,oneanhour. Neverthelessitisimportantthattheuseofthetooldoesnotontributeanoverhead

whih eliminatesthe benetto be obtainedfrom its use. In this setion we present evidene whih suggests

that this is not likely to be the ase in pratie. The reader should note that herewe are reeting on the

(11)

Number of processors

Throughput

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

2.2

0

1

2

3

4

5

6

7

8

9

10

2

4

8

1

nl

i

-

j

Fig.5.2. Experiment3: Pipelinewith

8

stages

WerananexperimenttoassessthetimetakentogenerateandsolvemodelsusingAMoGeT,whihwill, of

ourse,bedependentonthesizeofthegeneratedmodel. Fig.6.1illustratesthenumberofstatesandtransitions

ofthemodelsasafuntion oftheparametersof theskeleton. Thesenumbersareindependent ofthenumber

ofproessorsin themodel;theydependonlyonthenumberofpipelinestages.

1e+6

8e+5

6e+5

4e+5

2e+5

0

2

4

6

8

10

12

Numberof Stages

Num

b

er

of

states/transitions

states transitions

Fig.6.1. StatesandTransitions

Thetimerequiredtogenerateandsolvethemodelsmustbearefullyonsidered. Thegenerationisalways

very quik: it takes less than 0.01 seonds to generate 20 models. The time required to solve the models

isusually moreimportant,espeially when themodels havealargestatespae. However,if weonsider only

relativelysmallmodels(upto

20 ,

000

states),theresolutionwiththePEPAworkbenhtakesonlyafewseonds. Fig.6.1showsthatwhenthenumberofstagesislessthan

9

,thesizeofthemodelissmallenoughtohaveafast resolution. However,themodelgrowsexponentiallywhenthenumberofstagesisinreased,makingAMoGeT

(12)

TheoveralluseofAMoGeTtakesusuallylessthanoneminuteforomplexappliationsrunningonseveral

proessors,evenwhenweonsider severalmodelstosolve.

Asstated earlier,in asenarioof longomputinggridappliations, witheventuallydynami resheduling

oftheappliation,weonsiderthat thetoolmayberun oneperhour. Wethereforebelievethattheamount

oftime requiredmaybequitenegligibleand that thegainobtainedbyusing thebestof themappings whih

were investigatedanoutperformtheostoftheuseofthetool.

7. Conlusions. In the ontext of grid appliations, the availability and performane of the resoures

hange dynamially. We haveshown through this study that the use of skeletons, and performane models

of these, an produe some relevant information to improve the performane of the appliation. This has

been illustrated on the pipeline skeleton, whih is a ommonly used algorithmi skeleton. The models help

us to hoose themapping,of the stagesonto theproessors,whih will produethebest throughput. Atool

automatesallthestepsto obtaintheresulteasily.

Thepipelineskeletonisasimpleontrolskeleton. Thedealskeletonhasalreadybeenmodelledinasimilar

way[3℄,andexperimentsareongoingusingdealskeletonsnestedintoapipelineappliation. Thisapproahwill

alsobedevelopedonsomeotherskeletonssoitmaybeusefulforalargerlassofappliations.

Our reent work onsiders the generation of models whih take into aountinformation from the Grid

resoures, whih is gathered with the help of the Network Weather Servie [17℄. This will allow us to have

models tted to the real-time onditions of the resoures. This rst ase study has already shown that we

an use suh information produtively and that we have the potential to enhane the performane of grid

appliationswiththeuseofskeletonsandproessalgebras.

Havingproessalgebramodelsof ourskeletons alsopotentially oersotherbenetssuh asthe ability to

formallyverifytheorretfuntioning oftheskeleton. Weintendto explorethis aspetinfuture work.

AppendixA. Strutured OperationalSemantisforPEPA.

Thesemantirules, in thestruturedoperationalstyle,are presentedin FigureA.1; theinterestedreader

is referredto [13℄ formoredetails. Therules areread asfollows: ifthetransition(s) abovetheinferene line

anbeinferred,thenwean inferthetransitionbelowtheline. Thenotation

rα

(

E

)

whihisusedin thethird ooperation rule denotes the apparent rate of

α

in

E

, i.e. the sum of the rates of all ativities of type

α

in

A

ct

₍

_E

₎

.

AppendixB. Pipelineexample: input le forthe PEPAWorkbenh.

Theinput lefor thePEPA Workbenh is displayedin Fig. B.1, forasmall examplewith

Ns

=

Np

= 3

, andwhereeahproessorishostingoneofthestages.

AppendixC. The PEPA Workbenh.

C.1. Funtioning ofthe Workbenh. ThePEPAWorkbenhbeginsbygeneratingthereahablestate

spaeofaPEPAmodelasfoundfromallpossibleinterleavingsofitstransitionsfromstatetostate. Foranite

statemodelwith

n

stateswean enumerate thisstate spaeas

C

=

{

C

1 , . . . , Cn

}

. As theworkbenharries outthistaskitompilestheinnitesimalgeneratormatrix

Q

oftheontinuous-timeMarkovproessunderlying

thePEPAmodel. Theworkbenhaddsatransitionrate

r

to

Qij

everytimethatitndsatransitionfromstate

Ci

to

Cj

at rate

r

. Additionally it subtrats

r

from

Qii

in order that the rowsum of thematrix remainsin

balane.

Theonditionswhih must be satisedin orderto guarantee theexisteneof anequilibrium distribution

for aMarkov proess, and for this to be thesame asthe limitingdistribution, are well-knowna stationary

or equilibrium probability distribution,

Π

,exists for every time-homogeneous irreduible Markov hain whose

statesareall positive-reurrent.

Theintuitionbehindthisdistributionistheobviousone,namelythatinthelongruntheprobabilitythat

thePEPAmodel isinstate

Ci

isgivenby

Π

(

Ci

)

.

For nite state PEPA models whose derivation graph is strongly onneted, and whih therefore have

generated an ergodi Markov proess, the equilibrium distribution of the model,

Π

, is found by solvingthe

matrixequation

(13)

Prex

(

α, r

)

.E

−

−−→

(

α,r

)

E

Cooperation

E

−

(

−−→

α,r

)

E

′

E

⊲

⊳

L

F

(

α,r

)

−

−−→

E

′

⊲

⊳

L

F

(

α /

∈

L

)

F

(

α,r

)

−

−−→

F

′

E

⊲

⊳

L

F

(

α,r

)

−

−−→

E

⊲

⊳

L

F

′

(

α /

∈

L

)

E

−

(

−−→

α,r1

)

E

′

_F

₋

(

_−−→

α,r2

)

_F

′

E

⊲

⊳

L

F

(

α,R

)

−

−−→

E

′

⊲

⊳

L

F

′

(

α

∈

L

)

where

R

=

r

1 r

α

(

E

)

r

2 r

α

(

F

)

min(

r

α

(

E

)

, r

α

(

F

))

Choie

E

−

−−→

(

α,r

)

E

′

E

+

F

−

−−→

(

α,r

)

E

′

F

−

−−→

(

α,r

)

F

′

E

+

F

−

(

−−→

α,r

)

F

′

Hiding

E

−

−−→

(

α,r

)

E

′

E/L

−

−−→

(

α,r

)

E

′

_/L

(

α /

∈

L

)

E

(

α,r

)

−

−−→

E

′

E/L

−

−−→

(

τ,r

)

E

′

_/L

(

α

∈

L

)

Constant

E

(

−→

α,r

)

E

′

A

(

−→

α,r

)

E

′

(

A

def

=

E

)

Fig.A.1. TheoperationalsemantisofPEPA

subjettothenormalisationonditionwhihensuresthat

Π

isawell-formedprobabilitydistribution

X

Π

(

Ci

) = 1

.

(C.2)

TheequationsC.1and C.2 areombinedbyreplaingaolumn of

Q

by aolumn of ones andplainga1in

theorrespondingrowof

0

.

Beausetheonnetivitygraphofthestatetransitionsystemofthemodelwillingeneralhavelowdegree,

the transition matrix of the Markov proess is best stored as a sparse matrix. The PEPA Workbenh uses

aJavaimplementation of the preonditioned bionjugate gradientmethod. This is an iterative proedure as

desribedin[15℄storingtheinnitesimalgeneratormatrixinrow-indexedsparsestoragemode,aompatstorage

mode whih requires storageof only abouttwotimes thenumberof nonzeromatrix elements. An advantage

of onjugate gradient methods for large sparse systems is that they referene the matrix only through its

multipliationofavetor,orthemultipliationofitstransposeandavetor.

C.2. Computing performane results with the PEPA Workbenh. The newfuntionalityof the

workbenhisdesribedthroughatinyexample[10℄,whihweshallrstdesribe. Wethenexplainhowtoadd

thedesriptionoftheresultsin thePEPA inputleandhowtoomputethem.

A tiny example. We desribe the omponents of the PEPA input language for the Workbenh via a

simpleexample,desribedin theletiny.pepa:

(14)

// PIPELINE APPLICATION

// 3 stages, 3 proessors (1 stage per proessor)

// Variables delaration (all idential)

mu1=10; mu2=10; mu3=10;

la1=10; la2=10; la3=10; la4=10;

// Definition of the Stages

Stage1 = (move1, infty).(proess1, infty).(move2, infty).Stage1;

// Definition of the Proessors

Proessor1 = (proess1, mu1).Proessor1;

// Definition of the Network

Network = (move1,la1).Network + (move2,la2).Network

+ (move3,la3).Network + (move4,la4).Network;

// The pipeline model

Network <move1,move2,move3,move4>

(Stage1 <move2> Stage2 <move3> Stage3)

<proess1,proess2,proess3> (Proessor1||Proessor2||Proessor3)

Fig.B.1.TheinputleforthePEPAWorkbenh:pipeline.pepa

P2=(run,r2).P3;

P3=(stop,r3).P1;

P1 || P1

Thismodelisomposedoftwoopiesofaomponent,

P

1

,exeutinginapureparallelsynhronization.

P

1

isasimplesequentialproesswhih undergoesastartativitywithrate

r

1

tobeome

P

2

whihrunswithrate

r

2

to beome

P

3

whihgoesbakto

P

1

viaastop ativitywithrate

r

3

.

Therstlineoftheleisdeningtherates. Thenthesequentialproessisdened,andthenallineisthe

systemequation,whihdesribesthebehaviourofthemodelledsystem.

Adding resultstothe inputle. Inordertoautomatiallyomputesomeperformaneresults,theuser

just needstospeifythemin thePEPAinputle, forexampleintheletiny.pepapresentedbefore. This is

donebyinludingat theendoftheleonelineperresult,oftheform:

result_name = {result_desription};

result_name = rate * {result_desription};

Thenameoftheperformaneresultthat isdesribedisresult_name,andthedesriptionoftheresultforthe

PEPAStateFinderisresult_desription.

Thestates ofinterest aredesribedthroughthe useof asimplepattern language,with double stars (**)

forwildards,anddoublevertialbars(||)forseparatorsbetweenmodelomponents. Themodelomponents

aredesribedin theorderusedin thesystemequation.

Arateanbeadded;inthisasethenalresultobtainedbythePEPAStateFinderwillbemultipliedby

thisrate. Thisisquiteusefulto omputethroughput.

For ourexample, we anadd someresults onerningthe rstproess, independentlyof the stateof the

(15)

run1 = {P2 || **};

Trun1 = r2 * {P2 || **};

stop1 = {P3 || **};

For example, the performane result run1 mathes all the states in whih the rst proess is ready to

perform therun ativity. Thestateof theseond proess anbeanything. Trun1isthe same,multiplied by

therateoftherunativityr2. Itorrespondsthereforetothethroughputofrunfortherstproess.

Forthepipelineappliation, therequiredperformaneresultisspeiedinthePEPAinputle

pipeline.pepa(Fig.B.1). Thisisdonebyaddingthefollowinglineat theend ofthisle:

Throughput = mu1 *

{

** <move1,move2,move3,move4>

((proess1, infty).(move2,infty).Stage1 <move2> ** <move3> **)

<proess1,proess2,proess3> (** || ** || **)

}

Computingtheresults. Theresultsanbeomputedbyusingtheommandlineinterfae. Thisisdone

byinvokingthefollowingommand:

java pepa.workbenh.Main -run lr ./tiny.pepa

The-run lr(or-run lnbg+results)optionmeansthat weusethelinearbionjugate gradientmethod

toomputethesteadystatesolutionofthemodeldesribedin thele./tiny.pepa,andthenweomputethe

performaneresultsspeiedinthis le.

Thisexeutionprintstheresultstothesreen,anditalsosavesoneleperperformaneresult

(./results/model_name.result_name).ThisleistheoutputofthePEPAStateFinderfortheresult

desrip-tionspeiedin theinputle. Itontainsthestatemathing thedesription,andthesumofthesteady-state

probabilitiesforthesestates. Itdoesnottakethemultipliativerateintoaount. Theresultsarealsoappended

to the le./model_root.res,where model_root is the beginning of the model_name, until a

−

ora

.

is

found. This isusedtoautomatiallyompareresultsofsimilarmodels.

OnlyafewleshavebeenmodiedtoinludethenewfuntionalityintheJavaWorkbenh. Theinterested

readershould referto[12℄.

REFERENCES

[1℄ M.Alt,H.Bishof,andS.Gorlath,ProgramDevelopmentforComputationalGridsUsingSkeletonsandPerformane

Predition,ParallelProessingLetters,12(2002),pp.157174.

[2℄ A. Benoit, M. Cole, S. Gilmore, and J. Hillston, Evaluating theperformane of skeleton-based highlevel parallel

programs,inTheInternationalConfereneonComputationalSiene(ICCS2004),PartIII,M.Bubak,D.vanAlbada,

P.Sloot,andJ.Dongarra,eds.,LNCS,SpringerVerlag,2004,pp.299306.

[3℄ A.Benoit,M.Cole,S.Gilmore,andJ.Hillston,Shedulingskeleton-basedgridappliationsusingPEPAandNWS,

SubmittedtoaspeialissueofTheComputerJournalonGridPerformabilityModellingandMeasurement,(2004).

[4℄ R. Biswas, M. Frumkin, W. Smith, andR. V. derWijngaart,Tools andTehniquesforMeasuring andImproving

Grid Performane,inPro. ofIWDC 2002 onDistributed Computing: Mobileand Wireless Computing,vol.2571of

LNCS,Calutta,India,De.2002,Springer-Verlag,pp.4554.

[5℄ M. Cole, Algorithmi Skeletons: Strutured Management of Parallel Computation, MIT Press & Pitman, 1989.

http://homepages.inf.ed.a.uk/mi/Pubs/skeletonbook.ps.gz.

[6℄ M.Cole,eSkel: TheedinburghSkeletonlibrary.TutorialIntrodution,InternalPaper,ShoolofInformatis,Universityof

Edinburgh,(2002).

http://homepages.inf.ed.a.uk/mi/eSkel/.

[7℄ M.Cole,BringingSkeletons outof theCloset: APragmatiManifesto forSkeletalParallel Programming,Parallel

Com-puting,30(2004),pp.389406.

[8℄ I.FosterandC.Kesselman,TheGrid:BlueprintforaNewComputingInfrastruture,MorganKaufmann,1998.

[9℄ N. Furmento, A. Mayer, S. MGough, S. Newhouse, T. Field, and J. Darlington, ICENI: Optimisation of

ComponentAppliationswithinaGridEnvironment,ParallelComputing,28(2002),pp.17531772.

[10℄ S.Gilmore,ThePEPAWorkbenh: User'sManual,InternalPaper,ShoolofInformatis,UniversityofEdinburgh,(2001).

http://www.ds.ed.a.uk/pepa/pwb.pdf.

[11℄ S.GilmoreandJ.Hillston,ThePEPAWorkbenh:ATooltoSupportaProessAlgebra-basedApproahtoPerformane

Modelling, inPro.of the 7th Int. Conf. on Modelling Tehniques and Tools for ComputerPerformaneEvaluation,

no.794inLNCS,Vienna,May1994,Springer-Verlag,pp.353368.

http://www.ds.ed.a.uk/pepa/workbenh.ps.gz.

[12℄ N.Haenel,UserGuidefortheJavaEditionofthePEPAWorkbenh-Tabasorelease,InternalPaper,ShoolofInformatis,

(16)

[13℄ J.Hillston,ACompositionalApproahtoPerformaneModelling,CambridgeUniversityPress,1996.

[14℄ N.Karonis,B.Toonen,andI. Foster,MPICH-G2:AGrid-Enabled ImplementationoftheMessagePassingInterfae,

JournalofParallelandDistributedComputing(JPDC),63(2003),pp.551563.

[15℄ W.H.Press,S.A.Teukolsky,W.T.Vetterling,andB.P.Flannery,NumerialReipesinC:TheArtofSienti

Computing,CambridgeUniversityPress,1992.

[16℄ F.RabhiandS.Gorlath,PatternsandSkeletonsforParallel andDistributedComputing,SpringerVerlag,2002.

[17℄ R.Wolski,N.Spring,andJ.Hayes,Thenetworkweatherservie: adistributedresoureperformaneforeastingservie

formetaomputing,FutureGenerationComputerSystems,15(1999),pp.757768.

Editedby: FrédériLoulergue

Reeived: June3,2004