MODELINGSTREAM COMMUNICATIONS INCOMPONENT-BASEDAPPLICATIONS
∗
M. DANELUTTO
†
, D. LAFORENZA
‡
, N. TONELLOTTO
‡
, M. VANNESCHI
†
, AND C. ZOCCOLO
†
Abstrat. ComponenttehnologyisapromisingapproahtodevelopGridappliations,allowingtodesignveryomplex
appli-ationsbyhierarhialompositionofbasiomponents.Nevertheless,omponentappliationsonGridshaveomplexdeployment
models. Performane-sensitive deisions shouldbetakenbyautomati tools, mathing developer knowledge about omponent
performanewithQoSrequirementsontheappliations,inordertonddeploymentplansthatsatisfyaServieLevelAgreement
(SLA).
Thispaperpresentsasteady-stateperformanemodelforomponent-basedappliationswithstreamommuniationsemantis.
The model stritly adheres to the hierarhial nature of omponent-based appliations, and isof pratial use inlaunh-time
deisions.
Keywords: gridomputing;heterogeneousenvironments;streamomputations;performanemodel;mapping.
1. Introdution. Grid omputing is an emerging tehnology that enablesthe aggregationof
heteroge-neous, distributed resoures to solve omputational problems of ever inreasing size and omplexity. The
appliations that best perform on Grid platforms are the ones requiring large omputational power, or the
treatmentoflargedatasets,i.e. asublassofHigh-PerformaneAppliations[17℄.
Suh appliations (e.g. data-mining [12℄, query proessing[3℄, image proessing and visualization[2℄and
multimediastreaming[38℄)anbeonvenientlyexpressedusingaformalismbasedontwofundamentalnotions:
streamsof dataowingbetweenomponents,andomponents(eithersequentialorparallel) proessingthem.
Several programming languages are built on these onepts. Skeleton-based languages (e.g. SkIE [4℄ and
SBASCO [14℄) and skeleton libraries (e.g. eSkel [11℄ and Kuhen's C++ skeleton library [21℄) exploit the
notionofstreamsfortask-parallelskeletons(e.g. pipeandfarm). MoregenerallanguageslikeASSIST[33℄and
Datautter[15℄introduemodulesandstreamsasprimitiveoneptsto strutureparallelappliations.
Grid programmingframeworks (e.g. GrADS [9℄, ASSIST [13℄) are in hargeof the omplete automation
of appliation exeutionmanagement, eientlyexploiting Grid resoures. Moreover,they should be ableto
exeutetheappliationwithuser-requiredQoS,adaptingtheexeutiontothedynamihangesofGridresoures.
Thetraditionalomponentmappingstrategy,inwhihomponentsarestatially deployedin adistributed
environmentby theirdevelopers,doesnott well in suh senario. A broader deploymentmodelis required,
featuring
(i) manualmapping,inwhihtheomponentsarealreadypairedwiththeirresoures(onwhihtheyare
deployed),
(ii) resouresdisoveryandseletionat launhtime,to guaranteetheinitialdesiredperformane,
(iii) adaptiveomponentsmanagement,that at run-timeadjusttheset ofomputing resouresexploited
[31,1℄,inordertoadapttodierentperformanerequirements(on-demandomputing)ortohangingresoures
availability.
Aordingtothismodel,thedeploymentframeworkmustautomatially managetheoperationsneededto
enforetheappliationdesiredQoS.Thisanbeobtainedwiththespeiationofaperformaneontrat [34℄.
OurapproahintendstoautomatisethetasksneededtostarttheexeutionofHPCappliations. Ournal
goalistoallowanaslargeaspossibleuserommunitytogainfullbenetsfromtheGrid,andatthesametime
togivethemaximumgenerality,appliabilityandeasyofuse.
Themainontributionsofthispaperareasfollows:
(i) Wepropose an analytialmodel of thedynami behaviorof sequential/parallelomponents,
hierar-hial omponents and omponent appliations, ommuniating throughtyped streams of data. It is suited
to be used in simulation environments, to synthetially generate omponents and appliations to test
map-ping/shedulingsolutionsinarepeatableandontrolled setting. Eventually,theproposeddynami modelan
beexploited intheimplementationofdynamireongurationpoliies[1℄.
∗
Thisworkhas beensupportedby: the ItalianMIURFIRB Grid.itprojet,No. RBNE01KNFP,on High-performaneGrid
platformsandtools,andtheEuropeanCoreGRIDNoE(EuropeanResearhNetworkonFoundations,SoftwareInfrastruturesand
AppliationsforLargeSale,Distributed,GRIDandPeer-to-PeerTehnologies,ontrat no.IST-2002-004265).
†
DepartmentofComputerSiene,UniversityofPisa,Pisa,Italy
‡
(ii) Starting from the dynami model we identify the set of variables that an be used to desribe the
performanebehaviorofanappliation,andwederivethesetofrelationsamongthemwhihholdatsteady-state
(performane model). In this way we abstratfrom partiularruntime platforms and weapture all possible
steady-statebehaviorsof anappliation. Moreover, theirformulationbymeans oflinear algebraallowsus to
hierarhiallyomposetheperformanemodelsofseveralomponentsto derivethesteady-statemodelofnew
omponentsorappliations.
(iii) Weintrodueadenitionofperformanemodel forstreamappliations,whihisexploitedin
launh-timemappingand runtimereongurationdeisions.
Afterasurveyofrelatedwork(Set.2),thispaperpresentsadynamimodelofstream-basedomputations
(Set. 3), andin Set.4suhmodelis exploited toderiveasteady-stateperformane model forstream-based
appliations. In Set.5, suh model is applied to aase study, to preditthe programbehaviorat run-time,
and todeviseaorretinitial mappingforspeiedQoSlevels. Setion6onludes thepaper, disussingthe
presentedapproahandfuture work.
2. Related Work. Performane speiation of omponents and their interations is a basi problem
thatmustbesolvedtoenablesoftwareengineerstoassembleeientappliations[27℄. Moreover,performane
modeling is one of the key aspets that needs to be addressed to fae sheduling/mapping problems in
het-erogeneousplatforms. It arises in automatiomponent plaementand reonguration. Severalreentworks
fousonperformanemodelingtehniquestoanalyzethebehaviorofomponent-basedparallelappliationson
distributed,heterogenous,dynamiplatforms.
Analyti performane models in software engineering makeextensive use of UML formalism to desribe
software omponent behavioral models [35℄ and to derive models based on Queuing Networks [19℄ or
Lay-ered Queueing Networks [36℄ to be exploited in design phase of the lifeyle of software. The same holds
for Stohasti Petri Nets[20℄ and Stohasti Proess Algebras [18℄. Suh models typially translatea
paral-lel appliation into an analyti representation of its exeution behavior and the target runtime system
(a-ording to the Software Performane Engineering methodology [28℄). A detailed survey of suh models is
in [5℄. Suh translation is usually not straightforward. It may requireapproximationsto obtain
mathemat-ial models [29℄ for whih a losed-form solution is known. Stohasti models usually require the solution
of the underlying Markov hain whih an easily lead to numerial problems due to the spae state
explo-sion [5℄. More omplex models an be solved by means of simulation, at the ost of a larger omputation
time.
Symboliperformanemodeling[32℄isamethodologythat enablesarapiddevelopmentoflowomplexity
and parametri performane models. Symboli performane models anbe derived from simulationmodels,
tradingo result aurayfor model evaluation ost. In [32℄ asymboli performane model for the Pamela
modelinglanguageisintrodued. Itderiveslowerboundsforsteady-stateperformanesofappliationsstarting
fromamodeloftheprogramandofthesharedresoures,ombiningdeterministiDiretAyliGraphs(DAGs)
modeling with mutual exlusion. Oneof thestrengths of thePamelaapproahis that it is fastand easy to
transformaregularlystruturedappliation intoaperformane model. Themainlimitationofsuhapproah
isthatitomputeslowerboundsoftheperformaneofaprogram. Symboliperformanemodelsshareseveral
properties withthemodelwepropose: bothanbeextratedfrom thestrutureof programs,areparametri,
and anbe eiently evaluated. The main dierene is that the presented model doesnot omputea lower
bound,buttheasymptotisteady-stateperformaneofanappliation,thatisingeneralabetterapproximation
oftherealperformane.
Theasymptotisteady-stateanalysishasbeenpioneeredbyBertsimasandGamarnik[10℄. Thisapproah
hasbeenreently appliedto mappingandshedulingproblems ofparallel appliationsonheterogeneous
plat-forms[23,7,6℄,inwhihtheanalysisisappliedtopartiularlassesofparallelappliations(divisibleload[23℄,
master/slave[6℄, pipelined andsatter operations [7℄), in thehypothesisthat the set ofresouresis knownin
advane. The existing steady-stateapproahesapply only to arestritedlass of strutured parallel
applia-tions,assumingtoknowtheruntimeenvironmentinsuhawaytoderiveoptimalshedulingoftheappliation
omponents. Inadynami environmentlike aGrid anoptimal initial plaement of theomponentsmay
be-omeuselessverysoon,beausetheonditionsoftheexeutionplatformmayvarydynamially. Thepresented
steady-stateanalysis an beapplied to abroader lass ofstrutured parallel appliationsand tries to solvea
Strutural performane models [25℄ are the rst eortto develop ompositional performane models for
omponentappliations. MostsientiandGridomponentmodelsrelyontheoneptofalgorithmiskeleton.
Skeletons are ommon, reusable and eient strutured parallelism exploitation patterns. One advantage of
theskeletalapproahisthat parametriostmodelsanbedevisedfortheevaluationofruntimeperformane
of skeleton ompositions. In [14, 8℄ dierent ost models are assoiated to eah skeleton of an appliation
to enhane its runtime performane through parallelism/repliationdegree adjustments and initial mapping
seletion, respetively. The authors of [14℄ propose parametri ostmodels forpipe, farm and multiblok
skeletons,thatanbearbitrarilyomposedandnested. In[8℄,analytiostmodelsforappliationsomposedby
pipesanddealsarederivedwithinastohastiproessalgebraformulation. Struturalperformanemodelsare
extendedbythepresentedmodelbyproposingamethodologywell-suitedforgeneriompositionofskeletons,
andbytakingintoaountthesynhronizationproblemsintroduedbyusingstreamedommuniations.
Trae-basedperformanemodels[34,26℄areurrentlyexploitedinparallel/Gridenvironmentstomodelthe
performaneofsetsofkernelappliations. Reordingandanalyzingexeutiontraesonreferenearhitetures
ofsuhappliation itispossible,withaertaindegreeofpreision,to foreasttheperformaneofthesameor
similarappliationsondierentresoures.Traeinformationisexploitedinthepresentedmodel,butindierent
waywithrespettotheexisting approahes. Insteadofprolingawholeappliationonaset ofrepresentative
resoures,theappliationmodeliskeptindependentfromresoures. Whentheappliationwill bemappedon
atualresoures,historialinformation willbeusedto model theruntime behaviorof singleomponents,and
then suh information will beoupled with the omponent interations informationto obtain apredition of
theperformaneofthewholeappliation.
The problem of deriving a performane model for omponents has been addressed also in the ontext
of omponent frameworks suh as EJB [37℄, COM+/.NET [16℄ and CCA [24℄. Suh works apply analytial
performanemodel(LQN)ortrae-basedperformanemodeltoderiveamodelforomponents. In[30℄,
trae-basedmodelsareexploitedtoseletthemostsuitableomponents,whenmultiplehoiesareavailable,tobuild
anoptimalappliation,fromthepointofviewofperformane.
3. DynamiBehavior. Anappliationanbestruturedasahypergraphwhosenodesrepresentprimitive
omponentsandwhose(hyper)edgesrepresentommuniationsorsynhronizationsbetweenomponents. Nodes
interat withinput (server) interfaesand output(lient)interfaes. Edgesare diretedand anonnettwo
ormorenodesthroughtheirinterfaes. Twonodesmaybelinkedbymorethanasingleedge.
3.1. Communiations. Communiations between omponents are implemented through input/output
interfaesbindings. Inthisworkdata-owstreamommuniationsarestudied. Everyomponentreeivesdata
throughoneormoreinputinterfaes,performssomeomputations,andgeneratesnewdatatobesentthrough
oneormoreoutputinterfaes.
Inthisontext, astream representsatyped, unidiretionalommuniationhannelbetweenanon-empty,
nitesetofomponents(produers)andanon-empty,nitesetofomponents(onsumers). Theatomipiee
of information transferred througha stream is alled item. A produeris onneted to astream through an
output interfae, while a onsumeris onneted to astream through an input interfae. Every node anbe
produeroronsumerofseveralstreams,anditispossibletospeifyylistrutures(i.e. theommuniation
strutureisnotrestritedtobeaDAG).
Componentsanbeonnetedbystreamsaordingtothreedierentpatterns:
(i) uniast: one-to-one onnetion. Everyitem sentontheoutput streaminterfaeisreeivedin order
bytheinputstreaminterfae.
(ii) merge: many-to-oneonnetion. Everyitem sentontheoutputstreaminterfaesisreeivedbythe
inputstreaminterfae. Thetemporalordering oftheitemsomingfrom eahinputinterfaeispreserved,but
theinterleavingbetweenthedierentsouresisnon-deterministi.
(iii) broadast: one-to-manyonnetion. Everyitem sent ontheoutput stream interfaeis reeivedin
orderbytheinputstreaminterfaes. Thereeptionshappeningondierentinputinterfaesarenotsynhronized.
3.2. Computations. Componentsimplement sequential aswell asparallel omputations. A sequential
omponent exeutes asingle funtion in a single ative thread, proessing items as theyare reeived. Fora
parallelomponent,twosenariosarepossible:
(i) dataparallel: asinglefuntion isexeutedinparallel ondierentportionsofthesamedata;
A primitive omponent, either sequentialorparallel, at runtime repeatedly reeivesitems from its input
streams,performssomeomputationsanddeliversresultitemstoitsoutputstreams.
Aomponentanhaveseveralinputstreams. Thesetofinputstreamsispartitionedbetweenthe
omputa-tionsassoiatedwiththeomponents. Eahinputstreamis assoiatedtoonlyoneomputation;nevertheless,
spontaneousomputationsmayexist,thatdonotneedinputitemstoativate,butfollowownativationpoliies
(e.g. periodially).
Aomputationan beativatedifthefollowingonditionshold:
(i) theomponentanexeuteanewfuntion(thismeansthatitisidle,oritisparallelandthreadsare
availableto exeuteit),
(ii) theassoiatedinputitemshavebeenreeived,ornoitemisneessary.
Asequentialomponentanativateanewfuntiononlywhenitisidle. Aparallelomponentanhaveat
mostoneativedata-parallelomputationatanygiventime(omposedbyaxednumberofthreads),orseveral
task-parallelomputationsrunninginparallel (uptothemaximumnumberofthreadsin theomponent).
Aomponentanhaveseveraloutputstreams. Oneormoreomputations oftheomponentandispath
dataoneahoutputstream.
3.3. Node Behavior. Inordertodesribethebehaviorofaomputation atruntime,onsiderFig.3.1.
Fig.3.1. Sequentialomponentatruntime
Withoutlossofgenerality,asequentialomponentisonsidered;thedisplayedquantitiesrepresent:
(i)
i
k(
t
)
: totalnumberofreeiveditemsat timet
fromthek
th
inputinterfae;
(ii)
e
(
t
)
: totalnumberofomputationsarriedoutattimet
; (iii)o
j(
t
)
: totalnumberofsentitemsat timet
throughthej
th
outputinterfae.
Continuousquantitiesareusedtomodelpartialevolution,e.g.
e
(
t
) = 3
.
5
meansthatthenodereahedthehalf waypointin thefourthomputation.Theativation of aomputationan happenonly whenthe numberofitemsompletely reeivedoneah
assoiatedstreamisgreaterthanthenumberofpartiallyomputeditems:
∀
k
= 1
, . . . , n
i
k(
t
)
−
e
(
t
)
>
0
(3.1)Thenodeimplementationwillexploitnitebuerstostorereeiveditemsforeahinputinterfae,thereforefor
eahinputinterfaeandassoiatedomputationthefollowingmusthold:
∀
k
= 1
, . . . , n
i
k(
t
)
−
e
(
t
)
≤
τ
1k
(3.2)where
τ
1k
representsthemaximumnumberofelements thatanbereeivedonthek
th
inputinterfaebefore
thestreambloks. Thenthemaximumadmissible valuefor
i
k
(
t
)
attimet
is:i
max
k
(
t
) =
τ
1k
+
e
(
t
)
(3.3)Assumingthatnosensibledelaysarepresentbetweentheendofomputationsandthebeginningofthe
transmis-sionoftheprodueditems,thetotalnumberoftransmitteditemsisrelatedtotheprogressoftheomputations
ofthenode. Inthegeneralaseofanodewith
s
funtions,thefollowingequationholdsforeahoutputinterfae:∀
j
= 1
, . . . , m
o
j(
t
) =
f
j
e
1(
t
)
, . . . , e
s(
t
)
(3.4)3.4. Edge Behavior. In order to desribe thebehaviorof a data transmissionon astream, onsider a
uniaststream. Theinvolvedvariablesare
o
(
t
)
,totalnumberofitemssentattimet
from soureinterfae,andi
(
t
)
,totalnumberofitemsreeivedattimet
bythedestinationinterfae. Anewtransmissionbeginsonlyafterafullitemisprodued:
i
(
t
)
≤ ⌊
o
(
t
)
⌋
(3.5)Theedge implementation will exploit nite ommuniationbuers and thenetwork layertransfershunks of
data. Let
q
−
1
betheminimumfration ofitemtransferredatomially. Then
o
(
t
)
−
⌊
q
·
i
(
t
)
⌋
q
≤
τ
2
(3.6)where
τ
2
representsthe maximumnumberof itemsthat anbebuered. Therefore themaximumadmissiblevaluefor
o
(
t
)
at timet
is:o
max
(
t
) =
τ
2
+
⌊
q
·
i
(
t
)
⌋
q
(3.7)Wheneveranedgebuerisfull, aproduerwill blokassoonasittriesand sendsanewitem. From(3.4)we
obtain:
o
max
(
t
)
−
f e
1(
t
)
, . . . , e
m(
t
)
≤
0
(3.8)For merge streams with
k
soure interfaes and broadast streams withk
destination interfaes, the generalonstraints(Eqs.(3.5)and(3.6)fortheuniaststream)beome:
merge:
i
(
t
)
≤
P
k
o
k
(
t
)
P
k
o
k(
t
)
−
i
(
t
)
≤
τ
2k
(3.9)
broadast:
∀
k
i
k(
t
)
≤
o
(
t
)
∀
k
o
(
t
)
−
i
k
(
t
)
≤
τ
2k
(3.10)
For simpliity, inthepreviousequationsthenetworkquantizationonstant
q
hasbeensuppressed.3.5. RuntimeBehavior. Atruntime,aomponentanbeseenasadynamisystem. Thesystemstate
at time
t
is desribedby aset ofstate variables:i
1,...,ni(
t
)
,e
1,...,ne(
t
)
,o
1,...,no
(
t
)
. Thus, the state spaeP
is an
=
n
i
+
n
e
+
n
o
dimension Eulidean spae. The dynami behavior ofaomponent anbe modeled bya trajetoryp
(
t
)
in suhstatespae.The runtime behavior of a omponent is fully speied when it is oupled with hosting resoures. A
omputingresoureis modeled by
w
(
t
)
, theavailableomputing powerat timet
(measuredin MFlop/s)and a ommuniation link is modeled byb
(
t
)
, the instantaneous bandwidth at timet
(measured in MByte/s). Moreover, aharaterizationof the items is required. It is assumed that an item proessed by aomponentrequires
l
unitsofomputingworktobeproessed(measuredinMFlop)ands
unitsofommuniationworktobetransmitted(measuredinbytes).
Introduingthestepfuntion
u
(
x
)
,thenumberofperformed(partial)omputationspertimeunitis:de
dt
=
u
min
i
1(
t
)
, . . . ,
i
n(
t
)
−
e
(
t
)
·
·
u
o
max
(
t
)
−
f e
1(
t
)
, . . . , e
m(
t
)
·
w
(
t
)
L
(3.11)
whiletheequations governingthenumberofpaketsowingin theuniast, mergeand broadaststreamsper
timeunitare,respetively:
di
dt
=
u
o
(
t
)
−
i
(
t
)
·
u
i
max
(
t
)
−
i
(
t
)
·
b
(
t
)
s
(3.12a)di
dt
=
u
X
k
o
k
(
t
)
−
i
(
t
)
·
u
i
max
(
t
)
−
i
(
t
)
·
b
(
t
)
s
(3.12b)
di
k
dt
=
u
o
(
t
)
−
i
k
(
t
)
·
u
i
max
(
t
)
−
i
k
(
t
)
·
b
(
t
)
s
Note that an important assumption has been made. The work required to perform a omputation is
supposed to be independent from the values of the inoming items; their values are used just to perform
omputations. Thisisaommonassumptioninparalleldata-owprogramming,butthereareappliations(e.g.
queryproessinganddatamining)that donotrespetthisassumption.
Thedynamiequationsprovidedbythemodelanbewrittenin thegeneralform:
˙
p
(
t
) =
U
(
p
(
t
))
α
(
t
)
(3.13)Wedenote with
U
:
P
→
M
n,n
thefuntion that, foreverypoint inthe statespae,providestheontrolpart ofthedierentialequations(the onesinvolvingthestepfuntions),andwithα
(
t
)
theresourespart(involvingw
(
t
)
andb
(
t
)
).Weobservethattheontrolmatrixispiee-wiseonstantovernon-innitesimaltimeintervals: itdesends
from quantization in the generalequations for thenodes(3.11), and in theequations for the streams (3.12).
Then, the Cauhy problem an be solved onstrutively. Starting with
t
0
= 0
, p
0(
t
0) = 0
, U
0
=
U
(0)
, we indutivelydenep
i(
t
) =
Z
t
ti
U
i
α
(
τ
)
dτ
t
i+1
= sup
{
t > t
i
|
U
(
p
i(
t
)) =
U
i
}
U
i+1
= lim
t
→
t
+
i
U
(
p
i(
t
))
Inthis way,
p
(
t
)
is dened asthe onatenationof the pieesp
i
|
[ti,ti+1)
: it is aontinuous funtion (p
i(
t
i) =
p
i+1(
t
i)
)andpiee-wisedierentiable.4. STEADYSTATEBEHAVIOR. Thesteady-statebehaviorofthesystemanbeanalysedby
study-ingmeanvalues
p
¯
fortherateofhangeof thestatevariables:¯
p
=
E
[ ˙
p
|
[t0,
∞
)
] =
Z
∞
t0
˙
p
(
t
)
dt
= lim
t
→∞
p
(
t
)
−
p
(
t
0)
t
−
t
0
(4.1)
Thehoieof
t
0
isarbitrary,infattheweightofthetransientphasefadesawayonsideringinniteexeutions.However,to ease thereasoning aboutthese quantities, wean interpret
t
0
asthe end of thetransientphase,e.g. whenthelaststageonsumestherstdataiteminapipeline.
The essential aspet to point out is that for the steady-statemodel the fous is on relations among the
steady-statevariables,rather thanintheir values. Inthiswayitispossibletoabstrat from partiulartarget
platforms,andapturethelassofallpossiblesteady-statebehaviorsofanappliation.
Thesteady-statebehaviorofanode anbe modelledassoiatingto eah omputation
e
k
(
t
)
itsativation rate¯
e
k
= lim
t
→∞
e
k
(
t
)
−
e
k(
t
0)
t
−
t
0
(4.2)
Spontaneous omputations are free variables in thesteady-state model. Computations that are ativated by
datareeption,instead,aresubjettothefollowingondition.
Proposition4.1. Thesteady-stateexeutionrate ofaomputationisboundtobeequal tothe inputrates
onthe inputinterfaes thatativate the omputation.
Proof. Let
k
∈
A
i
,wewillprovethate
¯
i
−
¯
ı
k
= 0
¯
e
i
−
¯
ı
k
= lim
t
→∞
e
i(
t
)
−
e
i(
t
0)
t
−
t
0
−
lim
t
→∞
i
k(
t
)
−
i
k(
t
0)
t
−
t
0
= lim
t
→∞
e
i(
t
)
−
e
i(
t
0)
−
i
k(
t
) +
i
k
(
t
0)
t
−
t
0
= lim
t
→∞
e
i(
t
)
−
i
k(
t
)
t
−
t
0
−
e
i(
t
0)
−
i
k(
t
0)
Thenumeratoroftherstaddendislimitedbyonstants: (3.1)gives
e
i(
t
)
−
i
k(
t
)
≤
0
and(3.2)(notingthat
e
(
t
)
≥ ⌊
e
(
t
)
⌋
)givese
i(
t
)
−
i
k(
t
)
≥ −
τ
1k
whilethenumeratorof theseondaddend isonstant,so thelimittendsto zerowhenthedenominator tends
toinnity.
The data transmission rate
¯
o
k
of an output stream will depend on the ativation rates of one or more omputationsofthenode. Intheprevioussetion,thenumberofdataoutputshasbeenrelatedtothenumberofperformedomputationsbymeansofatransferfuntion
f
k
(Eqn. (3.4)). Proposition4.2. If the transferfuntion is(asymptotially) linearo
k
=
f
k(
e
1
, . . . , e
m) =
α
1
k
e
1
+
. . . α
m
k
e
m
+
c
k(
e
1
, . . . , e
m)
with
lim
k
e
k→∞
k
c
k(
e
)
k
k
e
k
= 0
then asteady-state is eventually reahed, in whih the output rate isa linear ombination of the omputation
rates:
¯
o
k
=
m
X
i=1
α
ki
e
¯
i
(4.3)Proof.
¯
o
k
= lim
t
→∞
f
k(
e
(
t
))
−
f
k
(
e
(
t
0))
t
−
t
0
= lim
t
→∞
α
k
·
(
e
(
t
)
−
e
(
t
0)) +
c
(
e
(
t
))
−
c
(
e
(
t
0))
t
−
t
0
=
α
k
·
lim
t
→∞
e
(
t
)
−
e
(
t
0)
t
−
t
0
+ lim
t
→∞
c
(
e
(
t
))
−
c
(
e
(
t
0))
t
−
t
0
=
α
k
·
¯
e
+ 0 =
m
X
i=1
α
m
k
e
¯
i
Thesteady-statebehaviorofstreamsanbemodelledbyassoiatingtoeahendpointitsdatatransmission
rate. Balane equationsrelatinginputandoutputendpointsarederived.
Proposition4.3. Thesteady-statetransmissionrateatthe endpointsofastreamareharaterisedbythe
following balaneequations:
uniast:
o
¯
A
= ¯
ı
B
(4.4a)merge:
o
¯
A
+ ¯
o
B
= ¯
ı
C
(4.4b)broadast:
o
¯
A
= ¯
ı
B
= ¯
ı
C
(4.4)Theseequations areeasily extendedinthe aseof moreendpoints.
Proof. Theproofissimilarto theoneofProp.4.1,exploiting:
(i) (3.5)and(3.6)foruniast,
(ii) (3.9)formerge,
(iii) (3.10)forbroadast.
Theexeutionrate for eah omputation,and thedata transfer rate foreahinput/output interfae
om-pletelyspeify theappliationstatefrom thepointof viewofitsperformane,therefore wewillallthem the
performanefeatures ofourappliation.
bymeansofsomeperformane annotations,in ordertobuildaperformanemodel. Proposition4.1allows
ustoeliminateexeutionratesassoiatedtodata-dependentomputations. Proposition4.3allowsustorelate
outputratestoinputratesoflinkedmodules.
Theperformanemodelisthereforedenedasanhomogeneoussystemofsimultaneouslinearequations,
thatdesribetherelationsthatholdinthesteady-stateamongtheperformanefeatures. Thesetofsolutions
ofthe systemisavetorsubspaeof
R
n
(where
n
isthe totalnumberofvariables, either inputrates, outputratesorexeution rates);weall the dimensionofthe solutionspae thenumberof degrees of freedomof
theappliation. If thisdimensionis 1,thenthesystemis ompletelydetermined assoonasasinglevaluefor
anyvariableisimposed. Thedegenerateaseofaspaewithdimension0impliesthattheonlysolutiontothe
system isthe null vetor(i. e. everyvariable must bezero): this meansthat thepredited steady-stateis a
deadlokstate,inwhihnoomputationorommuniationanproeed. Thenumberofdegreesoffreedomof
thesystem will impaton how many onstraintsmustbeprovided in order to derivethe expeted valuesfor
everyvariable.
Clearly,onlypositivevaluesoftheratesaremeaningful,soweanonludethateveryassignmentofpositive
valuesforthevetor
[
i e o
]
T
∈
R
n
thatisasolutionofthesystemisapossibleoperationpointforthemodeled appliation.Theoutlinedapproah iseient,in fatthesimpliationofthesimultaneous equationsanbeahieved
usingwellknowntehniques.
5. Appliation of the Model. We showhowthepresentedmodelan beapplied to areal appliation
(see Fig. 5.1), a rendering pipeline. The rst stage requests the renderingof asequene of senes while the
seond renderseah sene (exploiting the PovRay rendering engine), interpreting a sript desribing the 3D
modelof objets,theirpositionsandmotion. Thethirdstage olletsimagesrenderedbytheseondone,and
builds GroupsOf Pitures (GOP), that are sentto the fourth stage,performing DivXompression. Thelast
stageolletsDivXompressedpiees andstoresthemin anAVIoutputle.
Fig.5.1.Graphoftherender-enodeappliation
ForGOPsof12pitures,theperformanemodelforourtestappliationis(weeliminatedexeutionrates
fordata-dependentomputations):
C
1e
=
C
1o
=
C
2i
=
C
2o
=
C
3i
= 12
·
C
3o
=
= 12
·
C
4i
= 12
·
C
4o
= 12
·
C
5i
andhasonedegreeoffreedom.
5.1. Convergene to Steady State. Westartshowingthat theappliationbehavioratuallytends to
steady-state.
Figure5.2showsperformanefeaturestakenfromarealexeutionofthetestappliationonaBladeluster
onsisting of 32 omputing elements, eah equipped with an IntelPentium III Mobile CPU at 800MHz and
1GBofRAM,interonnetedby aswithedFastEthernet dediatednetwork. Theappliation wasongured
toexploit 20mahinesintherenderomputation,andonemahineforeahremainingnode.
Performanefeatures are measuredasin (4.2),i. e. averagingthe number ofperformedomputations on
thedurationoftheexeution. ThetopdiagramshowstheperformaneoftheRenderandtheGOPAssembler
nodes, whih operate on frames, while the bottom diagram shows the Enoder and Colletor nodes, whih
operate on GOPs. The similarity of the urves in the left and the right diagrams shows empirially that
Prop.4.2issatisednotonlyatthesteady-state,butalsoduringtheniteomputation,assoonasbuersare
0
1
2
3
4
5
0
100
200
300
400
500
600
700
800
bandwidth (activation/s)
frame
Rendering engine
GOP Assembler
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0
10
20
30
40
50
60
bandwidth (activation/s)
GOP
Encoder
Collector
Fig.5.2. Convergenetosteady-stateofaveragedperformanefeatures
Moreover,Fig. 5.2showsthat theaveragedomputation ratesstabilize during theomputation, allowing
ustoadoptasteady-statemodeltoapproximate theatualappliationrun.
5.2. FromDesired Performaneto ResoureRequirements. Typially,ifsomeoneisfainga
prob-lem by means of HPC tools, he has learin mind some sort of performane requirement for his appliation.
This anbeexpressed in dierentforms, e.g. ompletion time, omputation rate, response time, et. In our
framework we express requirementsas bounds on omputation rates. That is the most naturalway dealing
withstream parallelism. Thismeansthat, iftheproblemis expressedin dierentterms,somesort of
prelim-inary transformation should be applied (e.g. study the initial transient length to relate ompletion time to
omputationrate,orusetheLittle'sLawtotranslateresponsetimerequirementsinomputationrateones).
Suppose thatwerequire1frame/s(the onstraintis expressedby
C
5i
≥
1
12
, beauseeahinput forC
5
is omposed by12frames). Applyingtheperformane model wederiverequiredomputation andtransferratesforeahomputationand ommuniation.
These values,pairedwith programannotations(seeTab.5.1) ontheweightof omputationor
ommuni-ation (e.g. MFLOPper task/MB transferred to/from memory and message size, respetively) an be used
to derive requirements that the resouresmust fulll in order to meet the performane requirements on the
appliation.
For instane, we an show the requirement for stream
S
2
=
C
2o
. Sine it is required to arry 1.19MB messages with at least rate 1/s, a link of 9.5 Mbit/s is suient. Likewise, the test appliation will neversale above10 frames/s with a100 Mbit/s network, and needs to be redesigned,if wewant to reah higher
performanes.
Deploymentannotationsfortheexampleappliation.
Component
C
1
C
2
C
3
C
4
C
5
Proessor i686 i686 i686 i686 i686
Memory(MB) 64 256 64
CPUWork 3307 52
Mem. Work 302 104
Connetor
S
1
S
2
S
3
S
4
datatype param pi GOP zip
datasize 54B 1.19MB 14.24MB 2MB
eahatomiomputation that,giventherequiredexeutionrate, produestheresourerequirements. This is
essentialinanexeutionenvironmentin whih resouresarenotknowninadvane.
Themodelpresentedin[22℄suitsourneeds. Weanassoiatetoeahomputationaweight,representedby
apairofvalues
w
= (
w
MF LOP
, w
MB
)
,speifyingthenumberofoatingpointoperations(expressedinMFLOP) andthedatatransferredto/frommainmemory(expressedinMB)perativation. Resourepowerisdesribedbythepair
p
= (
p
MF LOP/s
, p
MB/s)
,andexeutiontimeisthereforeestimatedast
(
p, w
) =
w
MF LOP
p
MF LOP/s
+
w
MB
p
MB/s
.
This model an be employed also to nd appropriate parallelism degree for parallel omputation nodes.
We,infat,anrelate
t
(
p, w
)
foranaggregateresourep
= [
p
1
, . . . , p
k]
totheperformaneoftheodeonsingle resourest
(
p
i
, w
)
.Assumingperfetspeedup,weobtain:
t
(
p, w
) =
X
i
t
(
p
i
, w
)
−
1
−
1
Inthiswayweanderive,foreahomputationnode,mathingresourerequirements. Thesewillonern
singleresouresforsequentialnodes,andaggregateonesforparallelnodes.
Resultsommented. InFig.5.3,twomappings(toponanhomogeneousluster,bottomwithheterogeneous
resoures)forthesameonstraintare displayed. Therstthing tonote isthat,eveniftheheterogeneousrun
has morevariane in ahieved bandwidth, the average bandwidth is omparablewith the homogeneousone.
Thisprovidesevidenethattheemployedperformanemodelorretlyhandlesheterogeneoussetsofresoures,
determiningtheorretparallelismdegree. Thegoodperformaneinheterogeneousrun(itsompletiontimeis
evenshorterthantheoneforhomogeneousrun)isexplainedbythefatthatthemodelanmathomputation
requirementswith suitableresoures,i. e. shedule memoryboundomputations (e.g. enoding)onmahines
withfastermemory,andFPUboundones(e.g. rendering) onmahineswithfasterFPU.
The obtained results are as expeted: the mapping omputed using the performane model fullls the
onstraint,atthebeginningandmostofthetimeoftheappliationrun. Thisoursbeause,inordertobuild
ourmodel,wesampledtheahievedperformaneontherstframesofthemovie,buttheappliationworkload
slightlyhangeswiththeevolutionofthemovie. Thisisevidenedbythesmoothedbandwidthurve,thathas
the same oursein the twoexperimental settings: theworkload is heavieraround 100s and 300s, while it is
lighterinthemiddleandattheend.
6. Conlusions and Future Work. Inthisworkwedesribedananalytialapproahtomapalassof
appliationsonaGrid. Theseappliationsinterat throughstreamsofdata, proessedbyseveralautonomous
softwareomponents, either sequential orparallel. We presented asteady stateperformane model for these
appliationsandweappliedittoaasestudy,arenderingpipelineofsequentialandparallelomponents. The
model wasexploited to predit aprogram behaviorat run-time. Then we showed ageneral methodology to
deviseaorretinitialmappingfortheappliation,drivenbyspeiedQoSlevels. Atlast,weshowedtheresults
ofourmappingmethodologywiththepresentedappliation, andwedisussedtheresultsofthemappingand
the exeutionon homogeneousand heterogeneous sets of resoures. Weobtained good results in bothases.
Theappliationwas orretlymappedandtheQoSrequirementrespetedwithasmallerror.
Analytial [35, 19, 36, 20, 18, 29℄ and strutural performane models [25, 14, 8℄ disussed in Set. 2
0
1
2
3
4
5
6
7
0
100
200
300
400
500
600
Bandwidth (frame/s)
Time (s)
Test results for cluster Fuji
instantaneous bandwidth
averaged bandwidth (100s)
constraint
0
1
2
3
4
5
6
7
0
100
200
300
400
500
600
Bandwidth (frame/s)
Time (s)
Test results for heterogeneous configuration
instantaneous bandwidth
averaged bandwidth (100s)
constraint
Fig.5.3. Twoexeutionsofthetestappliation: top)homogeneouslustersofAthlonsXP2600+, down)setof
heteroge-neousresoures(9P42GHz,1AthlonXP2800+,1P42.8GHz).
of the appliation performane from the target platform, allowing us to evaluate the model one to
de-rive enough information to drive the mapping proess. Trae-based approahes [34, 26℄ are used to
over-ome the limitations of previously disussed approahes, but they are not ompositional. Therefore they
must be applied from srath to every new appliation, even if it is built from the same set of
ompo-nents.
All thosemodelsandthepresentedoneshareanassumption onthebehaviorof theappliations:
ompu-tation exeutionsmust be independentfrom theatual valuesof theinput set. Otherwise, twoexeutions of
thesameappliationwouldbenotomparable(thisisalledergodiityforstohastimodels). Forappliations
thatdonotmeetthisrequirements,thebestsolutionistoresortto runtimeadaptation.
Thepresentedapproah is notperfet. The initialmapping anbe onsidered agood hint to start the
exeution of an appliation on a Grid. The dynami hanges in resoures during the exeution an not be
easily inludedin launh-time strategies. Our approah must be oupledwith reshedulingstrategiesat
run-time to solve suh problems. Our future work is going in this diretion. The presented steady state model
an be exploited at run-time to adapt the behavior of omponents to hanges in resoure performanes. In
this way, it should be possible to fulll the QoS requirements during the whole exeution of the
applia-tion.
REFERENCES
[1℄ M.Aldinui,A.Petroelli,E.Pistoletti,M.Torquati,M.Vanneshi,L.Veraldi,andC.Zoolo,Dynami
reongurationofgrid-awareappliationsinASSIST,inPro.11thEuro-ParConferene,Lisboa,Portugal,Aug.2005.
[3℄ B. Babok, S. Babu, M. Datar, R. Motwani, and J. Widom,Models and issues indata stream systems, inPro.
21stACM-SIGMOD-SIGACT-SIGARTSymposiumonPriniplesofdatabasesystems(PODS'02),Madison,USA,2002,
pp.116.
[4℄ B.Bai,M.Danelutto,S.Pelagatti,andM.Vanneshi,SkIE:aheterogeneousenvironmentforHPCappliations,
Par.Comp.,25(1999),pp.18271852.
[5℄ S.Balsamo,A.D.Maro,P.Inverardi,andM.Simeoni,Model-Based PerformanePreditioninSoftware
Develop-ment:ASurvey,IEEETrans.onSoftwareEngineering,30(2004),pp.295310.
[6℄ C.Banino,O.Beaumont, L.Carter,J.Ferrante,A.Legrand,andY.Robert,ShedulingStrategiesfor
Master-Slave Tasking on Heterogeneous Proessors platforms, IEEE Trans. on Parallel and Distributed Systems,15 (2004),
pp.319330.
[7℄ O. Beaumont, A. Legrand,L. Marhal,and Y.Robert,Steady-State ShedulingonHeterogeneousClusters: Why
andHow?,inPro.of
18
th
InternationalParallelandDistributedProessingSymposium(IPDPS04)(IPDPS'04),April
2004.
[8℄ A.Benoit,M.Cole,S.Gilmore,andJ.Hillston,ShedulingSkeleton-BasedGridAppliationsUsingPEPAandNWS,
TheComputerJournal,48(2005),pp.369378.
[9℄ F.Berman,A.Chien,K.Cooper,J.Dongarra,I.Foster,D.Gannon,L.Johnsson,K.Kennedy,C.Kesselman,
J.Mellor-Crummey,D.Reed,L.Torzon,andR.Wolski,TheGrADSProjet: SoftwareSupportforHigh-Level
GridAppliationDevelopment,Int.J.ofHighPerformaneComputingAppliations,15(2001),pp.327344.
[10℄ D.BertsimasandD.Gamarnik,Asymptotiallyoptimalalgorithmforjobshopshedulingandpaketrouting,Journalof
Algorithms,33(1999),pp.296318.
[11℄ M. Cole,Bringingskeletonsout of theloset: a pragmatimanifesto forskeletal parallel programming, Par.Comp.,30
(2004),pp.389406.
[12℄ M.Coppola and M.Vanneshi, High-Performane DataMining withSkeleton-based Strutured Parallel Programming,
Par.Comp.,Sp.Iss.onParallelDataIntensiveComputing,28(2002),pp.793813.
[13℄ M.Danelutto,M.Vanneshi,C. Zoolo,N.Tonellotto,R. Baraglia,T.Fagni,D.Laforenza,andA.
Pa-osi,HPCAppliationexeutiononGrids,inFGG:FutureGenerationGrid,CoreGRID,Springer,2006.
[14℄ M. Dìaz, B. Rubio, E. Soler, and J. M. Troya, SBASCO: Skeleton-based Sienti Components, inPro. of
12
th
EuromiroConfereneon Parallel, Distributed, and Network-Based Proessing (PDP'04),ACoruña, Spain,February
2004.
[15℄ W.DuandG.Agrawal,Languageandompilersupportforadaptiveappliations,inPro.2004ACM/IEEEConferene
onSuperomputing(SC'04),Pittsburgh,USA,Nov.2004.
[16℄ N.Dumitrasu,S.Murphy,andL.Murphy,AMethodologyforPreditingthePerformaneofComponent-Based
Appli-ations,inPro.of
8
th
InternationalWorkshoponComponent-OrientedProgramming(WCOP03),Darmstadt,Germany,
July2003.
[17℄ I. FosterandC. Kesselman,eds.,TheGrid: Blueprint fora NewComputingInfrastruture,Morgan KaufmannPub.,
July1998.
[18℄ S.Gilmore,J.Hillston,L.Kloul,andM.Ribaudo,SoftwareperformanemodellingusingPEPAnets,inPro.of
4
th
InternationalWorkshoponSoftwareandPerformane(WOSP04),NewYork,NY,USA,2004,ACMPress,pp.1323.
[19℄ K.Kant,IntrodutiontoComputerSystemPerformaneEvaluation,MGraw-Hill,1992.
[20℄ P.J.B.KingandR.Pooley,DerivationofPetriNetPerformaneModelsfromUMLSpeiationsofCommuniations
Software,inPro. of
11
th
International Confereneon ComputerPerformane Evaluation: ModellingTehniquesand
Tools(TOOLS00),London,UK,2000,Springer-Verlag,pp.262276.
[21℄ H.Kuhen,ASkeletonLibrary,inPro.8thEuro-ParConferene,London,UK,Aug.2002.
[22℄ A.Litke,A.Panagakis,A.D.Doulamis,N.D.Doulamis,T.A.Varvarigou,andE.A.Varvarigos,Anadvaned
arhitetureforaommerialgridinfrastruture.,inEuropeanArossGridsConferene,M.D.Dikaiakos,ed.,vol.3165
ofLetureNotesinComputerSiene,Springer,2004,pp.3241.
[23℄ L. Marhal, Y. Yang, H. Casanova, and Y. Robert, Arealisti network/appliation model for shedulingdivisible
loadsonlarge-saleplatforms,inPro.of
19
th
InternationalParallelandDistributedProessingSymposium(IPDPS05)
(IPDPS'05),April2005.
[24℄ J.Ray, N.Trebon,R. C. Armstrong, S. Shende,and A.D.Malony,PerformaneMeasurementand Modelingof
ComponentAppliationsinaHighPerformaneComputingEnvironment: ACaseStudy,inPro.of
18
th
International
ParallelandDistributedProessingSymposium(IPDPS04),SantaFé,USA,April2004.
[25℄ J.Shopf, Strutural predition models forhigh-performane distributed appliations, inPro. ofthe Cluster Computing
Conferene(CCC'97),Atlanta,USA,Marh1997.
[26℄ L.J.Senger,M.J.Santana,andR.H.C.Santana,UsingRuntimeMeasurementsandHistorialTraesforAquiring
KnowledgeinParallelAppliations,inPro.ofthe2004InternationalConfereneonComputationalSiene(ICCS04),
M.Bubak,G.D.van Albada,P.M. Sloot,andJ.J.Dongarra, eds.,vol.3036ofLeture Notes inComputerSiene,
Kraków,Poland,June2004,SpringerVerlag,pp.661665.
[27℄ M.Sitaraman,G.Kulzyki, J.Krone, W. F.Ogden,andA.L. N.Reddy,Performane speiation ofsoftware
omponents,inPro.ofthe2001SymposiumonSoftwareReusability(SSR01),Toronto,Ontario,Canada,2001,ACM
Press,pp.310.
[28℄ C.U.Smith,PerformaneEngineeringofSoftwareSystems,Addison-Wesley,1990.
[29℄ B. Spitznagel andD. Garlan,Arhiteture-Based PerformaneAnalysis,inPro.of
10
th
International Confereneon
SoftwareEngineeringandKnowledgeEngineering(SEKE98),Y.DengandM.Gerken,eds.,1998,pp.146151.
[30℄ N.Trebon, A. Morris, J.Ray, S. Shende, and A. Malony,Performane Modelingof Component Assemblieswith
TAU,inPro.ofCompFrame2005,Atlanta,USA,June2005.
[31℄ S. Vadhiyar and J. Dongarra, Self Adaptability in Grid Computing, Conurrreny and Computation: Pratie and
[32℄ A. J. C. van Gemund, Symboli Performane Modeling of Parallel Systems, IEEE Trans. on Parallel and Distributed
Systems,14(2003),pp.154165.
[33℄ M.Vanneshi,Theprogramming modelofASSIST,anenvironmentforparalleland distributed portableappliations, Par.
Comp.,28(2002),pp.17091732.
[34℄ F.Vraalsen, R. A.Aydt, C. L.Mendes,and D.A.Reed,PerformaneContrats: PreditingandMonitoring Grid
Appliation Behavior, in Pro. of
2
nd
International Workshop on Grid Computing (GRID 01), London, UK, 2001,
Springer-Verlag,pp.154165.
[35℄ L. G.WilliamsandC. U.Smith,PASA(SM):An ArhiteturalApproahtoFixingSoftwarePerformaneProblems,in
Pro.of
28
th
InternationalComputerMeasurementGroupConferene,Reno,Nevada,USA,2002,pp.307320.
[36℄ C. M. Woodside, J.E. Neilson, D. C. Petriu, andS. Majumdar,TheStohasti Rendezvous Network Model for
PerformaneofSynhronousClient-Server-likeDistributedSoftware,IEEETrans.onComputer,44(1995),pp.2034.
[37℄ J.Xu,A.Oufimtsev,M.Woodside,andL.Murphy,Performanemodelingandpreditionofenterprisejavabeanswith
layeredqueuingnetworktemplates,inPro.ofthe2005ConfereneonSpeiationandVeriationofComponent-based
Systems(SAVCBS05),NewYork,NY,USA,2005,ACMPress.
[38℄ A.Zhang,Y.Song,andM.Mielke,NetMedia:StreamingMultimediaPresentationsinDistributedEnvironments,IEEE
MultiMedia,9(2002),pp.5673.
Editedby: PasquaD'Ambra,Danieladi Serano,MarioRosarioGuarraino,FranesaPerla
Reeived: June2007