NETWORK SCHEDULINGFOR COMPUTATIONAL GRID ENVIRONMENTS
MARTIN SWANY
∗
AND RICH WOLSKI
†
Abstrat.
TheproblemofdatamovementisentraltodistributedomputingparadigmsliketheGrid.Whileoftenoverlooked,thetime
tostagedataandbinariesanbeasigniantontributor tothewall-lokprogramexeutiontimeinurrentGridenvironments.
This paper desribes a simple sheduler for network data movementin Grid systems that an adaptively determine data
distributionshedulesatruntimeonthebasisofNetworkWeatherServie(NWS)performanepreditions. Theseshedulestake
theformofspanningtrees. ThedistributionmehanismisanenhanementtotheLogistialSessionLayer(LSL),asystemfor
optimizingdatatransfersusinglogistis.
Keywords.
Gridomputing,datalogistis,datastaging
1. Introdution. AsComputationalGridenvironmentsproliferate,theommunitymustonstantlyevolve
theway in whih omputingsystems are used. Distributed omputingon the Grid hasenabled newways of
harnessingomputingresouresandyet,hasexposeditsownsetofhallenges. Onesuhproblemisthatofdata
movement.AppliationsthataredrawntotheGridbeauseoflargeresourerequirementsfrequentlyonsumeor
generatelargeamountsofdata. Theproblemsofdataloalityanddatamovementarebeomingmoreprominent
andritialtotheperformaneanddeployabilityofGridsystems. Further,duetothedynamisminherentinGrid
environments,itislearthat mehanismsfordatastagingmustbeadaptiveliketheomputationsthemselves.
AppLeS[8℄demonstratedthebeginningofanewwayofthinkingaboutprogrammingtheGridsheduling
from the perspetive of the appliation. In this spirit, we propose to approah the problem of adaptively
sheduling buers in the network with proativesupport from the appliation. This paper examines simple
optimizationsthatweanfailitatebythinkingofGridresouresintermsofooperatingelementsinastorage
andomputingoverlaynetwork. Byenablingthistypeoffuntionality,usingtehniquessuhastheLogistial
SessionLayer(LSL)[34℄ ortheInternetBakplaneProtool(IBP) [28℄,the breadthoftheserviesoeredby
aGridisimproved.
Thegoalofthisworkistoinvestigateshedulingandroutingtehniquesfousedonoptimizingdata
move-mentinGridenvironments. Inordertoinvestigatesuhshedulingwewilldrawonpreviousworkasfollows. The
LogistialSessionLayer(LSL)[34℄providesthebasiplatformforooperativedataforwardingthatrespondsto
requestsfromthesheduler. TheNetworkWeatherServie(NWS) [43℄providesuswithnetworkperformane
monitoringand foreastingapabilities. Finally, theNWSlapd[37℄,theahing anddeliverysubsystem ofthe
NWS,ahesnetworkperformaneforeastsand aggregatestheminto aform suitableforonsumptionbythe
sheduler.
Therehasbeenatremendousamountofworkinthisommunitytooptimizeolletiveoperationsforparallel
omputing[4,27, 24, 5,18,39, 20, 40℄. Certainly, theseapproahesareall relatedat somefundamental level
(anddisussedsomewhatin Setion6). However,ourapproahisfousedonpre-runtimedatadistribution (or
staging)ratherthanolletiveoperationsassuh. Initialdatadistributionisanimportantomponentofatual
Grid deployment. This fat is oftenobsured bypre-stagedbinaries orloally-generatedrandom inputdata,
butforGridsystemsto realizetheirpotential,these issuesmustbeaddressed.
Ourapproahtothisproblemis uniqueinanumberofkeyways:
•
IttreatsGridresouresasagraphwithedgevaluesderivedfromurrentnetworkperformaneforeasts•
It adaptively builds distribution trees for arbitrary topologies by reating a shedule based on theMinimumSpanningTree(MST)overthat graph
•
Cooperativeforwardingamong peers is aomplishedwith the Logistial Session Layer (LSL), whihusesasadedTCPonnetions.
Gridenvironmentsareextremelydynami. Networkperformanedependsonambientload. Tobestadapt
ourexeutionatruntime,foreastsbasedonurrentperformaneinformationareneessary. Distributiontrees
based on this information will often vary wildly in shape. We need an extremely general tree onstrution
mehanismtoaommodatethediversityofGridsystems. Finally,asweuseLSLforourdistributionplatform,
∗
DepartmentofComputerandInformationSienes,UniversityofDelaware,Newark,DE,19716(swanyis.udel.edu).
†
2. Problem. Thegeneralproblem thatthisworkaddressesisthatofthelogistis ofdatamovementin
ComputationalGridenvironments. Infat,thelogistisofdatamovementarethemain reasonwhyomputing
power isnota fungibleresourelikeeletrial power. Usersneedomputations to beperformedon spei
bits of data, whereas eletriity an beonsumed regardlessof the loation ormeans of its generation. The
problemsofdataloalityandmovementareuniversalandarearitialonsiderationin Gridsystems.
Therehasbeenmuhreentworkonsideringooperativedatasharingbetweennetworkedpeers[30,33,6,
28,22℄. Theseooperativeapproaheshavehadimpatinboththeparallelproessingandnetworkomputing
domains. Inthisspirit,weonsideranenvironmentin whihGridresouresareenabledto utilizeand provide
ooperationofthissort. Ourgoalisto onsidershedulingthese resouresandexaminepotentialperformane
optimizationsthat mightemerge. ThisworkbuildsontheideasofLogistial [34, 6,28℄, overlay[3,38, 17℄
andpeer-to-peer[30,33,22,44℄networkingtotreattheproblemsofommuniationinGridsystemsinanovel
manner.
The GrADS [7℄ projet is a large, multi-institution projet whose goal is to investigate omprehensive
softwareenvironmentsfordevelopingGridappliations. Assuh,theGrADSenvironmentisfousedonprogram
developmentandompilationaswellasruntimeGridsupport. Beforeexeution,aCongurableObjetProgram
ispreparedbytheompilationsystems. Whentheprogramistobelaunhed,theSheduler/ServieNegotiator
(S/SN)interatswithavarietyofruntimeserviesprovidedbytheGridfabrianddisoversthestate ofthe
Grid at thattime. The S/SNuses this stateinformationto makedeisionsaboutprogram ongurationand
sheduling. In partiular, the system requires urrent short-term foreastsof resoureperformane levels so
that it anmakeproative shedulingdeisions. TheNWS generatessuh foreastsautomatially, but to be
useful,theyhavetobedeliveredtotheS/SN(throughtheGlobus[13℄infrastruture)quiklyandreliably.
Considering the problem of initial data distribution, our assumptions an be aptured by the following
senario. Let us imagine that a user is launhing a program in a Grid environment suh as the GrADS [7℄
projet'stestbed. IntheGrADSarhiteture,theCongurableObjetProgram,orCOP,isdistributedbythe
AppliationManager inthe rstphasesofexeution. This isnot,ofourse, uniqueto GrADS.InmanyGrid
paradigmsauserhasasetofprogramexeutablesthatneedtobedistributedtotheresouresbeforeexeution
anbegin.
InotherGridusagemodels,end-usersutilizeresouresthroughpreviouslyexistingsoftwareinfrastruture.
Thissoftwareexportsserviesthroughappliationinterfaesusingremoteproedurealls,orRPC.NetSolve[11℄
is an exampleof suh a system. The problem that these systemsfae is similar to the programdistribution
probleminthatsomeamountofdatamustoftenbesentfromtheusertoGridresourespriortothebeginning
ofanymeaningfulexeution. This problemisstrongly relatedinthat itonernsinitial datadistribution and
thus,itanbemodeledsimilarly.
These problemsareequivalentto somedegreein thateither priortoruntime orduring aninitialphaseof
runtime,somedatahastobesenttotheeahomputationalnodebefore anyrealappliation progressanbe
made. Often,wehoosetoabstratthisproblemawaywithle-sharingtehniques. Infat,networklesystems
(e.g. NFS) anbeusedwithin asinglesitesothat weonlyneedtotransferoneto nodesthatshare lesthis
way, but there aremanyaseswhere systemsdonotshare les in thisfashion. Further, NFS ansuerfrom
poorperformane andsine data(programs oruser data)is to bemovedoverthenetwork,wepreferto deal
withtheassoiatedoverheadexpliitly. Certainly,therearemanysituationsandsenariosthat dierinsimple
waysfromthisbasimodel,butthisapturesourassumptionsand,infat,modelsrealGridsystemsquitewell.
2.1. Problem Modeling. Considerthe simpledepitionof these data transfersin Figure 2.1. In these
graphs,thevaluealongtheedgedenotessomeost. Inthisaseitisthetimetotransfersomeamountofdata.
Figure2.2obviouslydemonstratesadistributionpattern(ortree)withaloweroverallost.
Further,inGridenvironments,resouresareoftenloatedingroupsorlusters,sothepotentialperformane
Fig.2.1. Costtreefordefaultdistributionstrategy
Fig.2.2. Lessostlydistributiontree
This modeling approahallows us to think aboutthe problem of data distribution asa graphand oers
obvioushanesforoptimization.
3. Sheduling Algorithms. The ruxof this work is the observation that by treatingthe resouresof
theGridasanetwork,wean sheduletheooperationoftheseresouresintheformationofasingle-soure,
datadistributiontree. Thissheduleanbeomputeddynamially,basedonurrentperformaneinformation.
Adistributiontreemustbeabletodiret thedatatoeah node,orspan thetree.
Consideradiretedgraph
G
withvertiesandedges:G
= (V, E)
. Eahedge hasaweightorostc
ij
foreah
(i, j)
∈
E
. Aspanningtree (T
)isagraphwithT
⊆
G
suh that∀
V
thereisa(u, v)
∈
T
that isinidentonit(i.e.,
T
spansthesetV
).TheMinimumSpanningTree
M ST
(G) =
T
whereP
(u,v)
∈
T
c(u, v)
hastheminimumostofallspanningtrees.
Atraditional, and provablyoptimal, approahto thesolutionof MSTis known asPrim's algorithm[29℄.
Thisalgorithmusesagreedyapproahintheonstrutionofthesolutiontree. Briey,thealgorithmproeeds
asfollows.
TondtheMST(
T
),wereateanemptytreeT
andmovethestartingnodeofthetree(v
start
)fromV
toT
:v
start
∈
T
|
T
∩
G
=
∅
(3.1)Then,weiteratewhile
|
V
|
>
0
. Ateahstepweexamineedgesintheut(edgesthatbegininT
andendin
V
)andselettheminimumostedge:min(e)
∈
E
′
|
e(u, v)
u
∈
T and v
∈
V
(3.2)Node
v
isthenmovedtoT
andweexaminethenewlyaddednodeandedgetoseeifitsadditionhasoeredabetterpathtonodesalreadyin
T
.Whilethespanningtreeproblemisattheheartofthisapproahtosheduling,thereareadditionalfators
thatmustbeonsideredinourmodel. Intheprevioussetion,weonsideredextremelysimplegraphs.Obviously
forInternethosts,thetimetotransmitdatatoanumberofhostsisnotlinearwiththenumberofhosts.Multiple
outgoingedgesinterferewithoneanothertheyarenotindependent. Intermsofthenetwork,themorestreams
there aresharingtheresoureof outgoingnetwork apaity,the lesseah stream gets. Thisould ompliate
minimum-Fig.2.3. Adistributiontreeforlusters
spae [12℄ shows that the problem remains NP-hard even when realisti bounds are plaed on transmission
levels(reduingthem toasmallxedset), butgiveshopeforpolynomial-timesolutionsifasolutionexists.
Anotherpotentialombinatorialproblemarisesin oursituationaswell. TheSteinerNetwork isdierent
from the MST problem in that only a subset
S
ofG
must be spanned. This problem has been shown to beNP-hard[19℄. Thisproblemistheheartof theproblem ofminimumspanners [10℄againdemonstratedtobe
NP-omplete. However,wenote that sinetheset with whih we areonerned isnotasubset of
G
that weavoidthediulties assoiatedwiththeseproblems.
These previous results treat their realm of disourse to be in metri spae, meaning that the triangle
inequalityholds. Internetsarenot,ingeneral,inmetrispae. Thismakestheproblemmoretratableinitially,
but ultimatelyompliatesthe model. Inpartiular, rather thanpowerlevels,ourspanning-tree problemhas
theabovedesribedonstraintthat wean refertoaslateralinhibition. Themoreedges(streams)that are
inident on a node, the less well any of them perform. In the extreme, the interferene between streams is
uniquefor everystream onguration. This ombinatorial spae impliesthat theoptimal solutionfor suh a
problemisNP-hard. However,wenotethatthisapproahisnotneessarilyonernedwithanoptimalsolution,
ratherwewishtoempiriallydeterminetheeayofthisgenerallassofsolution.
TheMSTproblemisknowntoberelatedtomanyproblemsindistributeddatamovement. Whilewedonot
dealwithitdiretlyinthiswork,theminimumostpathandall-pairsminimaxproblems[2℄provideabasisfor
multi-hopforwardingofthesortproposed byLSL[34℄andIBP[28℄. Parallelstreamswithdiverse pathsallow
ustoouhroutingintermsofmaximumowalgorithms. However,utilizingparallelstreamsbetweenidential
loations,withdefaultpaths,onlyservestoinreasethevalueofasinglear. Thiswouldertainlyinreasethe
observedbandwidth,but ourtreatmentofthesingle-streamasestillholdswithoutlossofgenerality.
4. SystemArhiteture. TodeployandtestthissheduleronaGridsystem,werelyonvarious
ompo-nentsofGridsoftware. Speially,thissoftwaredependsontheNetworkWeatherServie,theNWS'sahing
LDAP deliverysystemandtheLogistialSessionLayer.
4.1. NetworkWeatherServie. TheNetworkWeatherServie[43,42℄isasystemdevelopedtoprovide
performanemonitoringandonlineperformanepreditiontoGridshedulerssuhasours. Gridenvironments
areextremely dynamiand in orderto managethisdynamism, asheduler musthavenear-termperformane
preditionsuponwhih tobase runtime deisions. TheNWSmeasures,among otherthings, TCPbandwidth
andlatenybetweenhostsinasalableandunintrusivemanner. Byapplyingvariousnon-parametristatistial
tehniques onthe timeseriesprodued bythese ongoing measurements, theNWSis ableto produe foreasts
thatgreatlyimprovepreditionovernaivetehniques. Further,thesemeasurementsanbeombinedwithpast
instrumentationdatato produeaurateestimatesofbandwidth[36℄ortransfertime.
An additional omponent of the NWS, alled the NWSlapd [37, 35℄, provides neessary funtionality as
well. First,this systemahesperformane preditionsnear queryingentities making it possibleto sale the
performaneinformationinfrastrutureandprovideubiquitousforeaststonetwork-awareshedulers. Thispart
of the system also assembles measurement information into anetwork view that anbe easily and quikly
queried. Note,however,that theNWSdoesnotatuallyinitiatemeasurementsbetweeneverypairofhosts(
n
tests.) Rather, theNWSlapd interpretsthe hierarhyof measurementsthat theNWS doestakeandllsin a
ompletematrixofforeasts(asdesribedin[35℄.)
Theomplete matrix offoreastsprovidesus with the node-node adjaenymatrix representationof our
network. Theadjaenymatrixispopulatedbytheobservedbandwidth(and/orlateny)betweenhost
i
andhost
j
inthe(i, j)
thelement. Note thatthegraphthatthismatrixrepresentsisfully-onnetedaseveryhostontheInternetanreaheveryotherhostwithsomebandwidth. 1
Thisprovidestheinitialgraph
G
uponwhihoursheduleroperates.
4.2. Sheduler Implementation. Ourinitialshedulingapproahissimplytodesribeaspanning tree
for the nodes in our resoure pool. To do this, we simply use Prim's algorithm as desribed in Setion 3.
In order to produe a minimum spanning tree, we need a metri where a smaller value is better. Sine
we areoperatingwith bandwidth foreasts,we onvert the bandwidthestimates transfertime estimates by
onsidering
1/bandwidth
asthevalue ofanedge.Fig.4.1.SimpleIllustration ofTreeDepth
Onesimpletehniquethatwehaveimplementedallowsustominimizethedepthofthespanningtree. Our
goalis to minimize the numberof hops that a stream must pass throughas eah hop adds someamount of
overhead. Considerthegraphin Figure4.1. Stritlyspeaking,the minimumspanningshould inlude the ar
A
→
B
,and thatfromB
→
C
. However,it reduesthe depthbyalevelandinreasestheoverallostof thetreetospanviathearfrom
A
→
C
.This has an eet in pratie. Due to small variations in measurements through time, mahines with
funtionallysimilaronnetivityhaveslightlydierentforeasts. Tokeepthetreesmoresimple,wewouldlike
to onsider measurements within some
ǫ
of one another as the same. A perfet hoie for this value is thehistorialforeastingerrorfrom theNWS.
Thesheduler performsas expeted. Whenpresentedwith theresultsofaperformanequeryfrom NWS
ontaininginformationabouttheGrADStestbed[14℄,thesystemwaslearlyabletodisernseparatelustersat
theUniversityofTennesseeandUniversityof Illinoisand suggestadistribution treetakingthat intoaount.
Figure 4.2 depits spanning tree produed by the sheduler, and this graph is generated from that output
usingGraphViz[15℄, agraphplotter. Theinitialset ofresults(inSetion 5)utilize thishostpoolandsimilar
distributionshedules.
torc0
msc01
torc1
torc2
torc3
torc4
torc5
torc6
torc7
torc8
opus0
msc02
msc03
msc04
msc05
msc06
msc07
msc08
opus1
opus2
opus3
opus4
opus5
opus6
opus7
opus8
Fig.4.2. SpanningTree
Notethat Figure 4.2 isreatedautomatially. Otherthan guessingbased onthenamesof thehosts (not
on the domain name), there is no way to disern these lusters at the network level. In some ases, only
empirialperformanemeasurementsshowtheserelationships,asshownpreviouslybyEetiveNetworkViews
1
Fig.4.3.Distributionreordsinatree
4.3. Logistial Session Layer Data Distribution. The sheduler produes adistribution tree whih
isgivento theLogistialSessionLayer[34℄ (LSL)to ontrol thedata distribution. LSL isasystemfor
oop-erativeforwardingandbueringofnetworktrathathasbeenshownto greatlyinreaseend-to-endnetwork
performane. LSLutilizesTCP,soquestionsoffriendlinessarenotanissueanddataintegrityguaranteesare
those of TCP. 2
However,LSL endeavorsto allowTCPto perform betterby keeping theround-trip time on
any sublinkto aminimum. This useof TCPalso failitatesinrementaldeployability,yet takesadvantageof
improvingtransport-layerperformane.
Forthispartiularexperiment,wehaveimplementedanewmessageoptionin theLSLstak. Eahoption
denesadistributiontreeinludinginformationaboutthehildrenofthatnode. Thehierarhyofdistribution
headers is reursivelyenoded and deoded so that onlythe relevantportions of thesubtreeare transmitted
todownstreamneighborsuntilultimately, theleafnodesgetadistributiontreewith asingleentry. Figure 4.3
illustratesthis.
The aknowledgmentof data reeipt at the ultimate destination is impliit with the losing of the TCP
soket. AteahLSLnode,neessarydataissentoutalloutgoingsoketsandthesendingsideofeahofthose
soketsislosed. Eah daemonthenwaits foreahdownstream neighbortoloseitssoket,signalingthat all
destinationshavereeivedthedata. Attheleafnodes,thesoketsarelosednormallyone alldatais written
to thelesystem. Wenote that diret notiationfrom destination tosoure may bemoredesirablein many
asesand suhamodiationisstraightforward.
Internally, the implementation is not aggressively optimized, and further performane improvements are
ertainly possible. There is also no seurity model at this time. Our tehniqueould easily work over
SSH-enryptedandauthentiatedtunnelsandthisisoneimplementationpossibilitythat weareinvestigating.
5. Results. Totesttheeayofoursystem,wehavedeployeditarosstheGrADStestbed[14℄. Thisset
ofGridresouresrangesfrom50to100nodesarosstheU.S.loatedprimarilyattheUniversityofCalifornia,
SanDiego,theUniversityofIllinois,Champaign-Urbana,andtheUniversityofTennessee,Knoxville. Thesites
areonnetedbyInternet2'sAbilene [1℄bakboneandenjoyrelativelyhigh-speedonnetivity.
Toevaluatethedierene betweendiretdistribution (the diret approah)and ourshedulerin asfaira
manner aspossible, wehave modeled the diret distribution within oursoftware infrastruture. That is, the
diretdistributionversionissimplyaattree. Thisallowsforoverlappingommuniationamongthestreams
andis notterriblyineient. Atanyrate,thedatamovementisnotserializedamong thenodesasitoftenis
indailyuse. 3
Twosets of testswere run. Therst set ontains18 nodesloated at twosites. Theseond set ontains
52nodesin6lustersat3sites. Inallasesthesoureofthedatawasloatedat theUniversityofCalifornia,
SantaBarbara. Again,this modelssituationsthat aredemonstrablyrealisti.
Figure 5.1showsthedistributiontime, inseonds, forlesof varioussizes. This testutilizedthe18 node
pooldesribedabove. Weanseethat thisaseillustratesremarkablywellhowhierarhial,ooperativedata
distributionanimproveperformaneandreduedistributiontime. Figure5.2showsledistributiontimesfor
thelarger(52node)hostpool. Again,theperformaneimprovementfrommakingsimpleshedulingdeisions
2
Whetherthisissuientornotisanothermatter,aswehavedonenoharm.
Fig.5.1. DistributionTimesfor18Hosts
Fig.5.2. DistributionTimesfor52Hosts
isquite signiant. We notethat lusters representthe best asefor distribution tehniques suh asthisand
lustersarefrequentlyomponentsinaGrid.
Figure 5.3 depits the delivered bandwidth that we observe in data transfersto the 18 node host pool.
Figure 5.4 shows this same metri for the larger host pool. We have initiated a data transfer that has an
averageperformanemorethanthephysiallink towhihthemahineisattahed(12.5MB/se).
6. Related Work. There aremany aspets ofresearh that are similarand related. LSLis partof the
moregeneralinquiryofLogistialNetworking[28,6℄. Thisworkinvestigatesamorerihviewofstorageinthe
networkandourshedulingapproahisappliabletoeither infrastruture.
Globus GASS [9℄ and GridFTP [16℄ are data movement and stagingservie for Grid systemsthat ould
be sheduled using the tehniques that wehavedesribed. The MagPIe [20, 40, 21℄ projethas investigated
performaneoptimizationsforolletiveoperations. Improvingtheperformaneofolletiveoperationhasbeen
investigated inmanydierentontexts[4,27,24,5,18,39℄,although primarilythefoushasbeenMPI.
Fig.5.3. DeliveredBandwidthofDistributionTree(18Hosts)
Fig.5.4. DeliveredBandwidthofDistributionTree(52Hosts)
Our approah is quite similar to reentwork by Malouh, et. al [25℄, whih treats multiast proxies as
nodesinanetworkoptimizationproblem. Wenotethat theirarinideneonstraintsaredierentthanthose
that wepropose. Further, their simulations were aimed at evaluating various heuristis, while our goal is to
understandtheperformaneimprovementsfromsimpleshedulinginrealnetworks.
Overast[17℄is anetwork overlaybasedmultiast system. Overastusesnodeto nodeprotoolsto build
and evaluate thedistributiontrees. Ourapproahreates distribution treesat runtime and assumesnostate
inthenetwork. Rather, weassumetheavailabilityofnetworkperformaneforeaststo determinedistribution
trees. OuronernsaboutnodefailurearealsoquitedierentgivenourutilizationofTCPasatransportlayer.
Reentworkin appliation-levelmultiast exploresthe appliability ofpeer-to-peer networks[31℄ for this
purpose. Theynoteabenetoftheirworkisthelakofaonstantly-runningroutingprotool,abenetthatwe
7. Conlusion. We havefoused onthe problem of initial data distribution in Grid environments. By
buildingonprevioussystemomponents,suhastheNWSandLSL,wehavedevelopedanovelsystemfordata
distribution. WehavedevelopedashedulerthatisabletoinstantiateooperativedataforwardingbasedonLSL
and performane data from NWS.This shedulingtehniqueand infrastrutureallowus to form distribution
treesthat greatlyinreaseperformane andredue time to distribute data. Tehniquessuh asthis will only
beomemoreimportantas Gridsproliferate.
REFERENCES
[1℄ Abilene.http://www.uaid.edu/ab ile ne/.
[2℄ R. Ahuja, T. Magnanti,and J. Orlin, Network Flows: Theory, Algorithms, and Appliations, Prentie Hall,Upper
SaddleRiver,NewJersey,1993.
[3℄ D.Andersen,H.Balakrishnan, M.Kaashoek,andR. Morris,Resilientoverlaynetworks. Pro.18th ACMSOSP,
Ban,Canada,Otober2001.
[4℄ V.Bala, J. Bruk,R. Cypher, P. Elustondo, A.Ho, C.-T. Ho, S. Kipnis, andM. Snir,CCL: Aportableand
tunableolletiveommuniationlibraryforsalableparallelomputers,IEEETransationsonParallelandDistributed
Systems,6(1995),pp.154164.
[5℄ M.Banikazemi,V.Moorthy, andD.Panda,Eientolletiveommuniation onheterogeneousnetworksof
worksta-tions,inInternationalConfereneonParallelProessing,1998,pp.460467.
[6℄ M.Bek, T. Moore, J.Plank, and M.Swany, Logistial networking: Sharing more than the wires,inPro. of2nd
AnnualWorkshoponAtiveMiddlewareServies,August2000.
[7℄ F. Berman, A. Chien, K. Cooper, J. Dongarra, I. Foster, L. J. Dennis Gannon, K. Kennedy, C.
Kessel-man, D.Reed,L.Torzon,,andR.Wolski,TheGrADSprojet: Softwaresupportforhigh-levelgridappliation
development,Teh.ReportRieCOMPTR00-355,RieUniversity,February2000.
[8℄ F.Berman,R.Wolski,S.Figueira,J.Shopf,andG.Shao,Appliationlevelshedulingondistributedheterogeneous
networks,inProeedingsofSuperomputing1996,1996.
[9℄ J.Bester,I.Foster,C.Kesselman,J.Tedeso,andS.Tueke,GASS:Adatamovementandaessservieforwide
areaomputingsystems,SixthWorkshoponI/OinParallelandDistributedSystems,(1999).
[10℄ L.Cai,Np-ompleteness ofminimumspannerproblem,DisreteAppliedMathematis,48(1994),pp.187194.
[11℄ H.CasanovaandJ.Dongarra,NetSolve: ANetworkServerforSolvingComputationalSieneProblems,The
Interna-tionalJournalofSuperomputerAppliationsandHighPerformaneComputing,(1997).
[12℄ O.Egeioglu andT. Gonzalez,Minimum-energybroadast insimple graphswithlimitednodepower,inPro.IASTED
InternationalConfereneonParallelandDistributedComputingandSystems(PDCS2001),August2001,pp.334338.
[13℄ I. Fosterand C. Kesselman,Globus: Ametaomputing infrastruturetoolkit, International JournalofSuperomputer
Appliations,(1997).
[14℄ GrADS,http://hipersoft.s.rie .edu /gr ads.
[15℄ Graphviz, http://www.researh.att .o m/sw /to ols/ grap hvi z.
[16℄ GridFTP, http://www.globus.org/da tag rid/ grid ftp .htm l.
[17℄ J.Jannotti, D.K.Gifford, K.L.Johnson, M.F. Kaashoek,andJ.W. O'Toole,Jr.,Overast: Reliable
multi-astingwithanoverlaynetwork,inProeedingsofFourthSymposiumonOperatingSystemDesignandImplementation
(OSDI),Otober2000,pp.197212.
[18℄ N.T.Karonis,B.R.deSupinski,I.Foster,W.Gropp,E.Lusk,andJ.Bresnahan,Exploitinghierarhyinparallel
omputernetworkstooptimizeolletiveoperationperformane,in14thInternationalParallelandDistributedProessing
Symposium,2000,pp.377386.
[19℄ R. M. Karp, Reduibility among ombinatorial problems, in Complexity of Computer Computations, R. Miller and
J.Thather,eds.,PlenumPress,1972,pp.85104.
[20℄ T. Kielmann, H. E. Bal, S. Gorlath, K. Verstoep, and R. F. Hofman, Network performane-aware olletive
ommuniation forlustered wideareasystems,ParallelComputing,27(2001),pp.14311456.
[21℄ T.Kielmann, R. Hofman, H.Bal, A.Plaat, andR. Bhoedjang,Mpi's redutionoperationsin lustered wide area
systems,1999.
[22℄ J.Kubiatowiz,D.Bindel,Y.Chen,P.Eaton,D.Geels,R.Gummadi,S.Rhea,H.Weatherspoon,W.Weimer,
C. Wells, and B. Zhao, Oeanstore: An arhiteture for global-sale persistent storage, in Proeedings of ACM
ASPLOS,ACM,November2000.
[23℄ B.Loweamp,N.Miller,D.Sutherland,T.Gross,P.Steenkiste,andJ.Subhlok,Aresourequeryinterfaefor
network-awareappliations,inPro.7thIEEESymp.onHighPerformaneDistributedComputing,August1998.
[24℄ B.Lowekamp andA. Beguelin,Eo: Eient olletiveoperations forommuniation onheterogeneousnetworks. In
InternationalParallelProessingSymposium,pages399405,Honolulu,HI,1996.,1996.
[25℄ N.Malouh,Z. Liu,D.Rubenstein,andS. Sahu,Agraph theoretiapproah toboundingdelay inproxy-assisted. In
12th InternationalWorkshop onNetworkandOperatingSystemSupportforDigitalAudio andVideo(NOSSDAV'02),
May2002.143,2002.
[26℄ K. Obrazka andG. Gheorghiu, Theperformane of a serviefor network-aware appliations, inProeedings of2nd
SIGMETRICSConfereneonParallelandDistributedTools,August1998.
[27℄ J.-Y.L.Park,H.-A.Choi,N.Nupairoj,andL.M.Ni,Construtionofoptimalmultiasttreesbasedontheparameterized
ommuniationmodel,inProeedingsoftheInternationalConfereneonParallelProessing(ICPP),1996,pp.180187.
inSIGCOMM,August2001.
[34℄ M.SwanyandR.Wolski,Datalogistisinnetworkomputing:TheLogistialSessionLayer,inIEEENetworkComputing
andAppliations,Otober2001.
[35℄ , Buildingperformane topologies for omputational grids. Proeedings of Los AlamosComputerSiene Institute
(LACSI)Symposium,Otober2002.
[36℄ ,Multivariateresoureperformaneforeastinginthenetworkweatherservie,inProeedingsofSC2002,November
2002.
[37℄ ,Representingdynamiperformaneinformation ingridenvironmentswiththenetworkweatherservie. 2ndIEEE
InternationalSymposiumonClusterComputingandtheGrid,May2002.
[38℄ J.Touh,TheXBone.WorkshoponResearhDiretionsfortheNextGenerationInternet,May1997.
[39℄ S.Vadhiyar, G. Fagg,andJ. Dongarra,Performanemodeling forself adaptingolletive ommuniationsforMPI.
ProeedingsofLosAlamosComputerSieneInstitute(LACSI)Symposium,Otober2001.
[40℄ R.V.vanNieuwpoort,T. Kielmann,andH.E.Bal,Eientloadbalaning forwide-area divide-and-onquer
appli-ations, inPPoPP'01: ACMSIGPLAN Symposiumon Priniplesand Pratieof ParallelProgramming, June 2001,
pp.3443.
[41℄ P.-J.Wan,G.Calinesu,X.Li,andO.Frieder,Minimum-energybroadastroutinginstatiadhowirelessnetworks,
inINFOCOM,2001,pp.11621171.
[42℄ R. Wolski,Dynamiallyforeasting networkperformane using thenetworkweatherservie,ClusterComputing,(1998).
alsoavailablefromhttp://www.s.utk.edu/rih/publiations/nws-tr.ps.gz.
[43℄ R.Wolski,N.Spring,andJ.Hayes,Thenetworkweatherservie: Adistributedresoureperformaneforeastingservie
formetaomputing,FutureGenerationComputerSystems,(1999).
[44℄ B.Y.Zhao,J.D.Kubiatowiz,andA.D.Joseph,Tapestry: Aninfrastrutureforfault-tolerantwide-arealoationand
routing,Teh.ReportUCB/CSD-01-1141,UCBerkeley,Apr.2001.
Editedby: WilsonRivera,JaimeSeguel.
Reeived: July5,2003.