A FEEDBACK CONTROL MECHANISMFOR BALANCINGI/O- AND
MEMORY-INTENSIVEAPPLICATIONS ON CLUSTERS
XIAO QIN
∗
, HONG JIANG
†
, YIFENG ZHU
†
, AND DAVID R. SWANSON
†
Abstrat. OneommonassumptionofexistingmodelsofloadbalaningisthattheweightsofresouresandI/Obuersize
arestatiallyonguredandannotbeadjustedbasedonadynamiworkload.Thoughthestationgurationoftheseparameters
performswellinalusterwheretheworkloadanbemodeledandpredited,itsperformaneispoorindynamisystemsinwhih
theworkloadisunknown. Inthispaper,anewfeedbakontrolmehanismisproposedtoimproveoverallperformaneofaluster
withageneral andpratialworkload inludingI/O-intensiveand memory-intensiveload. Thismehanismisalso shownto be
eetiveinomplementingandenhaning theperformaneofa numberofexistingdynamiload-balaningshemes. Toapture
theurrentand pastworkloadharateristis,the primaryobjetivesof thefeedbakmehanismare: (1)dynamiallyadjusting
theresoureweights,whihindiatethesignianeofthe resoures,and(2)minimizingthe numberofpagefaultsfor
memory-intensivejobswhileinreasingtheutilizationoftheI/ObuersforI/O-intensivejobsbymanipulatingtheI/Obuersize. Results
fromextensivetrae-drivensimulationexperimentsshowthatomparedwithanumberofshemeswithxedresoureweightsand
buersizes,thefeedbakontrolmehanismdeliversaperformaneimprovementintermsofthemeanslowdownbyupto282%
(withanaverageof125%).
Keywords. Feedbakontrol,I/O-intensiveappliations,luster,loadbalaning
1. Introdution. Sheduling [16, 19℄ and load balaning [1, 10℄ tehniques in parallel and distributed
systemshavebeen investigated to improvesystemperformanewith respet to throughputand/orindividual
response time. Sheduling shemes assign work to mahines to ahieve better resoure utilization, whereas
load-balaningpoliiesanmigrate anewlyarrived job ora runningjob preemptively toanother mahines if
needed.
Sinelusters-atypeoflooselyoupledparallelsystem-havebeomewidelyusedforsientiand
ommer-ial appliations, several distributed load-balaning shemesin lusters havebeenpresentedin the literature,
primarily onsideringCPU [9,10℄, memory[1, 23℄, oraombinationof CPUand memory[26, 27℄. Although
these load-balaning poliies have beenveryeetivein inreasing the utilization of resouresin distributed
systems(andthus improvingsystemperformane),they haveignoredone type ofresoure,namely disk (and
disk I/O). The impat of disk I/O on overall system performane is beoming signiant asmore and more
jobs with high I/Odemand are runningon lusters. This makesstoragedevies alikely performane
bottle-nek. Therefore,webelievethatforanydynamiloadbalaningshemetobeeetivein thisnewappliation
environment,itmustbemadeI/O-aware.
Typial examples of I/O-intensiveappliations inlude long running simulations of time-dependent
phe-nomenathatperiodiallygeneratesnapshotsoftheirstate[22℄,arhivingofrawandproessedremotesensing
data [4℄, multimedia and web-based appliations. These appliations share a ommon feature in that their
storageandomputationalrequirementsareextremelyhigh. Therefore,thehighperformaneofI/O-intensive
appliationsheavilydepends ontheeetiveusageof storage,in additiontothat ofCPUand memory.
Com-poundingtheperformaneimpatofI/Oingeneral,anddiskI/Oinpartiular,thesteadywideninggapbetween
CPUandI/Ospeedmakesloadimbalanein I/Oinreasinglymoreruial tooverallsystemperformane. To
bridgethisgap,I/ObuersalloatedinthemainmemoryhavebeensuessfullyusedtoreduediskI/Oosts,
thus improvingthethroughputofI/Osystems.
This paper proposes a feedbak ontrol mehanism to dynamially ongure resoure weights and I/O
buers in suh away that the weights are apable of reeting thesigniane of systemresoures, and the
memoryutilizationisimprovedforI/O-andmemory-intensiveworkload.
Therestofthepaperisorganizedasfollows. RelatedworkintheliteratureisreviewedinSetion2.Setion3
desribes system model, and Setion 4 proposes the feedbak ontrol mehanism. Setion 5 evaluates the
performaneof themehanism. Finally,Setion6onludes thepaperbysummarizingthemain ontributions
andommentingonfuturediretions ofthis work.
∗
Department of Computer Siene, New Mexio Institute of Mining and Tehnology, Soorro, New Mexio 87801.
http://www.s.nmt.edu/
∼
xqin (xqins.nmt.edu). Questions, omments, or orretions to this doument may be direted tothatemailaddress.
†
2. Related Work. There exists a large base of exellentresearh related to distributed load balaning
models, and to name just afew: sender orreeiver-initiated diusion [5, 24℄, the gradientmodel [6, 13, 14℄,
and the hierarhial balaning model Pollak [24℄. Eager et al. studied both reeiver and sender initiated
diusion, and the results of their study showed that reeiver-initiated poliies are preferable at high system
loads if the overheads of task transferunder thetwo poliies are omparable [5℄. The gradient model makes
useofagradientproximitymap ofunderloadedproessorstoguidethemigration oftasksfrom overloadedto
underloadedproessors[6,13,14℄. Underloadednodesdynamiallyupdatethegradientproximitymap,whereas
overloadednodesinitiate taskmigrations. Pollark proposedasalableapproahfordynamiloadbalaningin
largeparallel anddistributed systemsonamulti-levelontrol hierarhy[15℄. Thehierarhialshemeahieve
signiant performane gain due to the parallelism in the low level of the hierarhy and the possibility to
aggregateinformationin thehigherleveloftheontroltree[15℄.
Theissue of distributed loadbalaning forCPU and memoryresoureshas been extensivelystudied and
reportedin theliterature. Forexample,Harhol-Balteret al.[9℄proposed aCPU-basedpreemptivemigration
poliythatwasmoreeetivethannon-preemptivemigrationpoliies. Zhangetal.[27℄fousedonloadsharing
poliiesthatonsiderbothCPUandmemoryserviesamongthenodesofaluster. Throughoutthispaper,the
CPU-memory-basedloadbalaning poliy presentedin [27℄will bereferredto asCM.The simulationresults
showthattheCMpoliynotonlyimprovesperformaneofmemory-intensivejobs,butalsomaintainsthesame
loadsharingqualityoftheCPU-basedpoliiesforCPU-intensivejobs[27℄.
A largebody ofworkanbefoundin theliteraturethat addressestheissueof balaningtheloadof disk
systems[11,18℄. Sheuermann etal.[18℄ studiedtwoissuesinparallel disksystems,namelystripingand load
balaning, and showed their relationship to response time and throughput. Leeet al. [11℄ proposed two le
assignment algorithms that minimize the varianeof the servie time at eah disk, in addition to balaning
theload arossalldisks. Sine theproblem of balaning the utilizations arossalldisks is isomorphito the
multiproessorsheduling problem [7℄, a greedy multiproessor-shedulingalgorithm, alled LPT [8℄, anbe
appliedtodiskloadbalaning[11℄. Thus,LPTgreedilyassignsaproesstotheproessorwiththelightestI/O
load[11℄. Throughoutthispaper,werefertotheapproahesthatdiretlyapplyLPTtoI/Oloadbalaning as
theIO poliy. TheI/Oloadbalaning poliies in these studies havebeen shown to be eetivein improving
overallsystemperformanebyfullyutilizingtheavailableharddrives.
Veryreently,threeloadbalaningmodels,whihonsiderI/O,CPUandmemoryresouressimultaneously,
were presented[21,26℄. In[21℄, adynami load-balaningsheme,tailoredforthespei requirementsofthe
Question/Answerappliation, was proposed along with a performane analysis of the approah. Xiao et al.
proposed eetiveloadsharingstrategiesbyminimizing bothCPU idletimeandthenumberofpagefaultsin
lusters[26℄.
However, the load-balaning models presented in [21, 26℄ are similar in the sense that the weights of
system resoures and buer size are statially ongured with a dynamial workload. In ontrast, the new
feedbakontrolmehanismproposedinthisstudy judiiouslyonguresthese parametersinaordanewith
theworkloadof theluster. Trae-drivensimulationsshowthat, ompared with theCM and IOpoliies, the
proposedshemewithafeedbakontrolmehanismsigniantlyenhanestheoverallperformaneofaluster
systemunderbothmemory-intensiveandI/O-intensiveworkload.
Someworkhasbeendonetomakeuseoffeedbakontrolmehanismsinoperatingsystemsanddistributed
environments[12,20℄. Forexample,Steereetal. proposedashedulingshemethat dynamiallyadjustsCPU
alloation and period of threads using the feedbak of an appliation's rates of progress with respet to its
inputsand/oroutputs[20℄. LiandNahrstedt studiedafeedbakontrolalgorithmtosupportend-to-endQoS
in a distributed environment [12℄. However, the feedbak ontrols of resoureweights and buer sizes have
notbeenaddressedin these works. In ontrast,this paperhaspresentedtheexperimental resultsthat verify
thebenetsoftheproposedfeedbakontrolmehanismforbothresoureweightsandbuersizes inahighly
dynamienvironment.
3. SystemModel. Weonsidertheissueoffeedbakontrolmethodtoimprovetheperformaneofload
balaning shemes in a luster onneted by a high-speed network, where eah node not only maintains its
individualjobqueuethatholdsjobsuntiltheynishexeution,butalsopereivesreasonablyup-to-dateglobal
loadinformationbyperiodiallyexhangingloadstatuswithothernodes. Jobsarriveateahnodedynamially
eah node ismodeled asasingleM/G/1queue [11℄. Sine jobs may be delayed beause ofwaiting in queues
(to share resoureswithother jobs) or beingmigrated to remote nodes, the slowdown imposed on ajob
u
is denedasbelow,slowdown
(
u
) =
t
f
(
u
)
−
t
a
(
u
)
t
CP U
(
u
) +
t
IO
(
u
)
(3.1)
where
t
f
(
u
)
andt
a
(
u
)
are the nish and arrival times of the job, andt
CP U
(
u
)
and timeIO
(
u
)
are the times spentbyjobu
onCPUandI/O,respetively,withoutanyresouresharing.In expression 3.1, the numerator orresponds to the total time the job spends running, aessing I/O,
waiting,ormigrating,andthedenominatororrespondstotheexeutiontimeforjobuin adediatedsetting.
Thedenitionofslowdownisanextensionoftheoneusedin[9,26,27℄,whereI/Oaesstimeisnotonsidered.
Forsimpliity,weassumethatallnodesarehomogeneous,havingidential omputingpower,memory
a-paity,anddiskI/Operformaneharateristis. Thissimplifyingassumptionshouldnotrestritthegenerality
oftheproposedmodel,beauseifalusterisheterogeneous,therelativeloadofagivenjobimposedonanode
withhighproessingapabilityislessthanthatimposedonanodewithlowperformane. Theproposedsheme
maybeextendedto handleheterogeneoussystembyinorporatingasimpleonversionmehanismforrelative
load[16℄.
Wealsoassumethenetworkinourmodelisfullyonnetedandhomogenousinthesensethatommuniation
delay between any pair of nodes is the same. This simpliation of the network is ommonly used in many
load-balaningmodels[9,26, 27℄. Additionally,weassumethat theinputdataof eah jobhasbeenstoredon
theloal disk ofthe nodeto whih thejob is submitted. This assumptionis onservativein nature,sine we
ondutedanexperimenttoshowthat,underI/O-intensiveworkload,theperformaneoftheproposedshemes
withsuh assumptionisapproximately10%lesseetivethanthat oftheshemeswithoutit.
Foranewlyarrivedjob
u
atanodei
,loadbalaningshemesattempttoshipittoaremotenodewiththe lightest loadifnodei
isheavilyloaded,otherwisejobu
isadmittedintonodei
andexeutedloally. Toavoid uselessmigration that maypotentiallydegradethe systemperformane, theloadbalaning shemesonsidertransferringajobonlyiftheloaddisrepanybetweenthesourenodeandthedestinationnodeisgreaterthan
theloadofthenewlyarrivedjobplusthemigrationost,thereforeguaranteeingthateahmigrationimproves
theexpetedslowdownofthejob. Ifanappropriateandidateremotenodeisnotavailableorthemigrationis
evaluated tobeuseless,theloadbalaningshemeswillnotinitiatethejobmigration.
4. Adaptive Load BalaningSheme.
4.1. WeightedAverageLoad-balaningSheme. Inthissetion,wepresentWAL,aweightedaverage
load-balaningsheme. EahjobisdesribedbyitsrequirementsforCPU,memory,andI/O,whiharemeasured
bythenumberofjobs runningin thenodes, Mbytes, and numberofdisk aessesperms, respetively. Fora
newlyarrivedjob
u
atanodei
,theWAL-FCshemebalanesthesystemloadinthefollowingvesteps. 1. First,theloadofnodei
isupdatedbyaddingjobu
'sload,assigningthenewbornjobtotheloalnode. 2. Seond,amigrationistobeinitiatedifnodei
'sloadisoverloaded. Nodei
isoverloaded,if: (1)itsload isthehighest;and(2)theratiobetweenitsloadandtheaverageloadarossthesystemisgreaterthanathreshold,whihis setto 1.25in ourexperiments. Thisoptimal value,whih isonsistentwith the
resultreportedin[25℄,isobtainedfromanexperimentwherethethresholdisvariedfrom1.0to2.0.
3. Third,aandidatenode
j
with thelowest loadis hosen. Intheasewhere there aremorethantwo nodeswith thelowest load,we randomlyseletonenode to breakthe tie. Ifaandidate node isnotavailable,WAL-FCwillbeterminated andnomigrationwill bearriedout.
4. Fourth, WAL-FC determines if job
u
is eligible for migration. A job is eligible for migration if its migrationisableto potentiallyreduethejob'sslowdown.5. Finally,job
u
ismigratedtotheremotenodej
,andtheloadofnodesi
andj
isupdatedinaordane withjobu
'sload.WAL-FC alulates the weighted average loadindex in the rst step. The load index of eah node
i
is denedastheweightedaverageofCPUandI/Oload,thus:load
(
i
) =
W
CP U
×
loadCP U
(
i
) +
W
IO
×
loadIO
(
i
)
,
(4.1)de-node
i
.W
CP U
andW
IO
are resoure weights used to indiate the signiane of the orresponding re-soure.ItisnotedthatthememoryloadisexpressedbytheimpliitI/Oloadimposedbypagefaults. Let
l
page
(
i, u
)
andl
IO
(
i, u
)
denote theimpliitand expliitI/Oloadofjobu
assigned to nodei
, respetively. loadIO
(
i
)
an bedened byequation4.2,whereM
i
isasetofjobsrunningonnodei
:load
IO
(
i
) =
X
u
∈
M
i
l
page(
i, u
) +
X
u
∈
M
i
l
IO
(
i, u
)
.
(4.2)Let
r
M EM
(
u
)
denote thememoryspaerequestedbyjobu
,andn
M EM
(
i
)
representthememoryspaein bytes that isavailable to alljobs runningonnodei
. Itis to be notedthatthe memoryspae,n
M EM
(
i
)
,an beongured in aordane with thebuer size that is adaptively tuned bythe feedbak ontrol mehanismproposedinSetion4.2. Whenthenode'savailablememoryspaeislargerthanorequaltothememorydemand,
there isnoimpliitI/Oloadimposedon thedisk. Conversely, whenthememoryspaeof anode isunable to
meetthememoryrequirementsofthejobs,thenodeenountersalargenumberofpagefaults,leadingtoahigh
impliit I/Oload. Impliit I/Oloaddepends on three fators, namely, theavailable usermemory spae, the
pagefaultrate,andthememoryspaerequestedbythejobsassigned tonode
i
. Morepreisely,l
page
(
i, u
)
anbedened as follows, where
µ
i
denotesthe page fault rateof the node, and loadM EM
(
i
)
isthe memory load denoted asthesumofthememoryrequirementsofthejobsrunningonnodei
.l
page(
i, u
) =
(
0
ifloadM EM
(
i
)
≤
n
M EM
(
i
)
,µ
i
×
P
v∈
Mi
r
M EM
(
v
)
n
M EM
(
i
)
otherwise.
(4.3)
l
IO
(
i, u
)
in Equation4.2 isafuntionof I/Oaess rate,denotedλ
u
)
,and I/Obuerhit rateh
(
i, u
)
thatwillbedisussedinSetion4.1. Thus,
l
IO
(
i, u
)
isapproximatedbythefollowingexpression:l
IO
(
i, u
) =
λ
u
×
(1
−
h
(
i, u
))
.
(4.4)In what follows, we quantitatively determine whether a job is eligible for migration. When a job
u
is assignedtonodei
,itsexpetedresponsetimer
(
i, u
)
anbeomputedinEquation4.5.r
(
i, u
) =
t
u
×
E
(
L
i
) +
t
u
×
λ
u
×
E
(
s
i
disk
+
Λ
i
disk
×
E
((
s
i
disk
)
2
)
2(1
−
ρ
i
disk
)
)
,
(4.5)where
t
u
andλ
u
aretheomputationtimeand I/Oaessrateofjobu, respetively.E
(
s
i
disk
)
andE
((
s
i
disk
)
2
)
are the mean and mean-square I/Oservie time in node
i
, andρ
i
disk
is the utilization of the disk in nodei
.E
(
L
i
)
representsthe meanCPU queuelengthL
i
, andΛ
i
disk
denotesthe aggregateI/Oaessrate in nodei
. Sinethe expetedresponsetime ofaneligiblemigrantonthesourenode hastobegreaterthan thesumofitsexpetedresponsetimeonthedestinationnodeandthemigrationost,job
u
iseligibleformigrationif:r
(
i, u
)
> r
(
j, u
) +
c
u
,
(4.6)where
j
representsadestinationnode, andc
u
isthemigrationost(time)modeledasfollows,c
u
=
e
+
d
u
×
(
1
b
ij
net
+
1
b
i
disk
+
1
b
j
disk
)
,
(4.7)
where
e
isthexedostofmigratingthejobandloadingitintothememoryonanothernode,b
ij
net
denotestheavailable bandwidthof thenetwork linkbetweennode
i
andj
,b
i
disk
istheavailabledisk bandwidthinnodei
.Inpratie,
b
ij
net
andb
j
disk
anbemeasuredbyaperformanemonitor[3℄. Aordingly,thesimulatordisussedin Setion5estimates
b
ij
net
andb
j
4.2. Problem Desription and Examples. Thefeedbak ontrolmehanismthat aims atminimizing
themeanslowdownfousesonadjustingtheresoureweightsandthebuersizes. Tohelpdesribetheproblem
ofxedresoureweightsandI/Obuersizes,werstpresentthefollowingexamplesthatmotivatetheproposed
solutionto improvethesystemperformane.
Assumealuster withsixidentialnodes[9,17,26,27℄, to whih theIO load-balaningpoliy isapplied.
The average page-fault rate and I/O aess rate are hosento be2.0 No./ms (Number/Milliseond)and 2.8
No./ms,respetively. Thetotalmemorysize foreahnode is640Mbyte, andother parametersoftheluster
aregiveninSetion5.1. Wemodiedthetraesusedin[9,27℄,addingarandomlygeneratedI/Oaessrateto
eahjob. Thetraesusedin [9℄havebeenolletedfromoneworkstationonsixdierenttimeintervals. Inthe
traesusedin our experiments,theCPU and memorydemands remain unhanged, and thememorydemand
ofeahjob ishosenbasedonaParetodistributionwiththemeansize of4Mbytes[27℄.
Toevaluatetheimpatofresoureweights(seeEquation4.1)onthesystemperformane,weonduteda
simulationexperimentwheretheresoureweightswerestatiallyset. Figure4.1plotstherelationshipbetween
theresoureweightofI/Oandthemeanslowdownexperienedbyallthejobsinthetrae. Theresultindiates
thatthemeanslowdownonsistentlydereasesastheI/Oresoureweightinreasesfrom0to1withinrements
of 0.2. We attributed this observation to the fat that, under I/O-intensive workload onditions, the I/O
resoureweightwithahigh valueisableto auratelyreetthesigniane ofthedisk I/Oresouresin the
system.
0
10
20
30
40
50
60
70
0
0.2
0.4
0.6
0.8
1
W
IO
, resource weight of disk I/O
Mean Slowdown
Fig.4.1. Meanslowdowns as a funtionof theI/O
re-soure weight. Averagepage-faultrate=
2
.
0
No./ms,averageI/Oaessrate=
2
.
8
No./ms.0
10
20
30
40
50
60
110
160
210
260
310
360
Buffer Size (MByte)
Mean Slowdown
Fig.4.2.Meanslowdownsasafuntionofthebuersize.
Averagepage-faultrateis
5
.
0
No./ms,averageI/Oaessrateis
2
.
3
No./msThememoryofeahnodeisdividedinto twoportions, withoneservingasI/Obuerandtheotherbeing
used to storeworking sets ofrunning jobs. Without loss ofgenerality, weassume that thebuer sizes of six
nodes are idential. Weonduted aseond experiment, in whih thebuer sizes were statiallyongured.
Figure4.2showsthebuersizehosenintheexperimentandtheorrespondingmeanslowdownsobtainedfrom
thesimulator.
TheurveinFigure4.2revealsthatthebuersizehasalargeeetonthemeanslowdownsoftheIO-aware
poliy. When buersize is smallerthan 210MByte,the slowdown dereaseswith theinreasing valueof the
buersize. Inontrast,theslowdowninreasesasthebuersizeinreasesifthebuersizeisgreaterthan210
MByte. Optimally, themean slowdownof thisgiven workloadreahestheminimumvaluewhen buersize is
210MByte. AlargebuersizeresultsinahighbuerhitrateandreduesI/Oproessingtime,therebyausing
apositiveeetontheperformane. Ontheotherhand,givenaxedvalueofthetotalavailablemainmemory
size, a larger buer size implies asmaller the amount of memory used to store the working sets of running
jobs, whih inturn leadsto alargernumberof pagefaults. Ingeneral,alargebuersize mayintrodueboth
positiveand negativeeet onthemeanslowdownat thesametime,and theoverallperformanedepends on
theresultanteet.
Although the stati onguration of resoure weights and buer sizes is an approah to tuning the
per-formaneoflusterswhere workloadonditionsanbemodeledandpredited, thisapproahperforms poorly
4.3. A Feedbak ControlMehanism. Thehighlevelviewofthearhitetureforthefeedbakontrol
mehanismis presentedin Figure 4.3, wherethe arhitetureomprisesaload-balaningsheme, a
resoures-sharingontroller, and afeedbakontroller. The resoure-sharing ontroller onsists of aCPU sheduler, a
memoryalloator andan I/Oontroller. The slowdown ofanewly ompleted job andthe historyslowdowns
arefed baktothefeedbakontroller,whihthendeterminestherequiredontrolation
∆
W
IO
and∆
bufsize .∆
W
IO
>
0
meanstheIO-weightneedstobeinreased,andotherwisetheIO-weightshouldbedereased. Sinethesumof
W
CP U
andW
IO
is1,theontrolation∆
W
CP U
anbeobtainedaordingly. Similarly,∆
bufsize>
0
meansthebuersize needsto beinreased,andotherwisethebuersizeisto bedereased.Newly
arrived jobs
Running
jobs
CPU
MEM
I/O
Completed job u
Resource
Sharing
controller
Load-balancing
Feedback
Controller
Slowdown history
slowdown(u)
W
IO
, W
CPU
bufsize
Fig.4.3.Arhitetureofthefeedbakontrolmehanism
Therstgoalofthefeedbakontrolleristomanipulatetheresoureweightsinawaythatmakesitpossible
to minimizethemean slowdownof jobs. The systemmodel foranopen loopbalaneris approximatelygiven
bythefollowingequation,
slowdown
(
z
) =
−
wg
(
L
)
W
IO
(
z
) +
wd
(
L
)
,
(4.8)where
wg
(
L
)
andwd
(
L
)
arethegainfatoranddisturbanefatoroftheI/OresoureweightunderworkloadL
, respetively. Thevaluesofwg
andwd
largelydepend onworkloadonditionsand theapplied load-balaning poliy. Thus,wgandwdanbeobtainedbasedonsimulationmodelsforopen-looploadbalaners. Theontrolrulefortheresoureweightisformallymodeled below,
∆
W
IO,u
=
G
w
(1
−
S
u
S
u
−
1
)
∆
W
IO,u
−
1
|∆
W
IO,u
−
1
|
,
(4.9)W
IO,u
=
W
IO,u
−
1
+ ∆
W
IO
,
(4.10)where
∆
W
IO,u
istheontrolation,S
u
denotestheaverageslowdown,∆
W
IO,u−
1
|
∆
W
IO,u
−
1
|
indiateswhethertheprevious
ontrol ationhasinreasedordereasedtheresoureweight,and
G
w
denotes theontrollergainfortheI/O resoure weight. In the experiments presented shortlyin the next setion,G
w
is tuned to be 0.5 for better performane. LetW
IO,u
betheresoureweightuponthearrivalof jobatthesystem,theresoureweightwill be updatedtoW
IO,u
−
1
+ ∆
W
IO
. Without lossof generality, we make use of a linear model to apture the harateristisofvaryingworkloadonditions. Themodelisgivenbythefollowingequation,slowdown
(
z
) =
−
wg0
(
L
)
W
IO
(
z
) +
wd0
+ ∆
wd,
(4.11)The feedbak ontroller attempts to manipulate theresoure weights in the followingthree steps. First,
when ajobu is aomplished,theontroller alulatestheslowdown
s
u
ofthis newlyompleted job, Seond,s
u
is stored in theslowdownhistorytable,and theaverageslowdownS
u
isomputed aordingly. Note thatS
u
reets a spei pattern of the reent slowdowns in the dynami workload. The table size is a tunableontrolations
∆
W
IO,u
and∆
W
CP U,u
,whiharebasedonthepreviousontrolationalongwiththeomparison betweenS
u
andS
u
−
1
. Morepreisely,theperformaneisregardedtobeimprovedbythepreviousontrolation ifS
u
−
1
> S
u
,thereforetheontrollerontinuesinreasingW
IO
ifithasbeeninreasedbythepreviousontrol ation,otherwiseW
IO
isdereased. Similarly,S
u
−
1
< S
u
meansthattheperformanehasbeenworsenedsine thelatest ontrol ation, suggesting thatW
IO
has to be inreased if thepreviousontrol ationhas reduedW
IO
,andvie versa.Besides onguringthe weights,the seondgoal ofthe feedbak ontrolmehanismis to dynamiallyset
thebuersizeofeahnodebasedontheunpreditableworkload. Themehanismisaimingatimprovingbuer
utilizations and reduing the number of page faults by maintaining an eetive usage of memory spae for
runningjobsandtheirdata.
Weanderivetheslowdownbasedonamodelthatapturestheorrelationbetweenthebuersizeandthe
slowdown. Forsimpliity,themodelanbeonstrutedasfollows,
slowdown
(
z
) =
−
bg
(
L
)
bufsize(
z
) +
bd
(
L
)
,
(4.12)where
bg
(
L
)
andbd
(
L
)
arethe buer size gain fatorand disturbane fator under workload L,respetively. Theontrolruleforbuersizesis formulatedas,∆
bufsizeu
=
G
b
(
S
u
−
1
−
S
u
)
∆
bufsizeu
−
1
|∆
bufsizeu
−
1
|
,
(4.13)where
∆
bufsizeu
istheontrolation,∆
bufsizeu
−
1
|
∆
bufsizeu−
1
|
indiateswhether thepreviousontrol ationhasinreased
or dereased the resoure weight, and
C
b
denotes the ontroller gain.G
w
is tuned to be 0.5 in order to deliverbetterperformane. Letbufsizeu
−
1
betheurrentbuersize,thebuersizeisalulatedasbufsizeu
=
bufsize
u
−
1
+ ∆
bufsizeu
.Asanbeseenfrom4.3,thefeedbakontrolgeneratesontrolation
∆
bufsizein additionto∆
W
CP U
and∆
W
IO
. Theadaptivebuer size makesnotieableimpats onboththememoryalloator andI/O ontroller,whihinturnaettheoverallperformane(SeeFigure4.2). Thefeedbakontrollergeneratesaontrolation
∆
bufsize based onthe previousontrol ationalong with theomparison betweenS
u
andS
u
−
1
. Speially,S
u
−
1
> S
u
,meanstheperformaneisimprovedbythepreviousontrolation,therebyinreasingthebuersizeifithasbeeninreasedbythepreviousontrolation,otherwisethebuersizeisredued. Likewise,
S
u
−
1
< S
u
, indiates that thelatest buerontrol ation leadsto aworse performane,implying that thebuer size hastobeinreasedifthepreviousontrolationhasreduedthebuersize,otherwisetheontrollerdereasesthe
buersize.
Theextratimespentinperformingfeedbakontrolisnegligibleand,therefore,theoverheadintroduedby
thefeedbakontrolmehanismisignoredinoursimulationexperiments. Thereasonisbeausetheomplexity
ofthemehanismislow,and ittakesaonstanttimeto makeafeedbakontrol deision.
5. ExperimentsandResults. Toevaluatetheperformaneoftheproposedload-balaningshemewith
afeedbakontrolmehanism,wehaveonduted atrae-drivensimulation,in whihtheperformanemetri
used is slowdown that is dened earlier in setion 3. We have evaluated the performane of the following
load-balaningpoliies:
1. CM:theCPU-memory-basedload-balaningpoliy[27℄withoutusingbuerfeedbakontroller. Ifthe
memoryis imbalaned,CM assignsthenewly arrivedjob tothe nodethat hastheleast aumulated
memoryload. WhenCPUloadisimbalaneandmemoryloadiswellbalaned,CMattemptstobalane
CPUload.
2. IO:theIO-basedpoliy[11℄withoutusingthefeedbakontrolmehanism. TheIOpoliyusesaload
index that representsonlytheI/Oload. Forajob arriving in nodei, the IOsheme greedily assigns
thejob tothenodethathastheleastaumulatedI/Oload.
3. WAL:theWeightedAverageLoad-balaningshemewithoutthefeedbakontroller[21℄.
4. WAL-FC:theWeightedAverageLoad-balaningshemewiththefeedbakontrolmehanism.
5. NLB:Thenon-load-balaningpoliywithoutusingthefeedbakontroller.
Tostudy dynami load balaning, Harhol-Balterand Downey [9℄ implemented atrae-driven simulator
foradistributed systemwith6nodesinwhihround-robinshedulingisemployed. Theloadbalaning poliy
studied in that simulator is CPU-based. Zhang et. al [27℄ extended the simulator, inorporating memory
reourses into the simulation system. Based on the simulator presented in [27℄, our simulator inorporates
the following newfeatures: (1) The abovepolies are implemented in the simulator. (2) The interonnet is
assumedtobeafullyonnetednetwork. (3)Asimplediskmodelisaddedintothesimulator. (4)AnI/Obuer
model,whihwillbepresentedshortlyinthissetion,isimplementedontopofthediskmodel. Thetraesused
in the simulationare modiedfrom [9℄[27℄, andit is assumedthat the I/Oaess rate israndomly hosenin
aordanewithauniform distribution. Weassumethat theI/Oaess rateof eahjob isindependentofthe
job's memoryspaerequirementandCPU servietime. Although thissimpliation deatesanyorrelations
between I/O requirement and other job harateristis, we an examine the impat of I/O requirement on
systemperformanebyonguringthemeanI/Oaessrateasaworkloadparameter.
ThesimulatedsystemisonguredwithparameterslistedinTable5.1. TheparametersforCPU,memory,
disks,andnetworkarehoseninsuhawaythattheyresembleatypialluster oftheurrentday.
Table5.1
DataCharateristis
Parameters Value Parameters Value
CPUSpeed 800MIPS PageFaultServieTime 8.1ms
RAMSize 640MByte SeekandRotationtime 8.0ms
InitialBuerSize 160MByte DiskTransferRate 40MB/Se.
Contextswithtime 0.1ms NetworkBandwidth 1Gbps
Disk aessesof eah job are modeled asaPoisson proess. Data sizes
d
RW
u
ofthe I/Orequestsin jobu
are randomly generated based on aGamma distribution with the mean size of 250KByte and thestandarddeviation of 50Kbyte. The sizes hosen in this way reettypialdata harateristisfor MPEG-1 data [2℄,
whihisretrievedbymanymultimedia appliations.
Sine buer an be used to redue the disk I/O aess frequeny (See Equation4.4), we approximately
modelthebuerhitprobabilityofI/Oaessforjob
u
runningonnodei
bythefollowingformula:h
(
i, u
) =
(
r
u
r
u
+1
if
d
buf
(
i, u
)
≥
d
data
(
u
)
,r
u
r
u
+1
×
d
buf
(
i,u
)
d
data
(
u
)
otherwise,
(5.1)
where
r
u
isthe datare-aessrate,d
buf
(
i, u
)
is thebuersize alloatedto jobu
, andd
data
(
u
)
istheamount of data job u retrievesfrom orstored to the disk, given abuer with innitesize. I/Obuer in a nodeis aresouresharedbymultiplejobsin thenode,andthebuersize ajobanobtainin node
i
atruntimeheavily dependson thejobs' aesspatterns, haraterizedby I/Oaess rateand averagedata size of I/Oaesses.d
data
(
u
)
linearly depends on aess rate, omputation time and average data size of I/O aessesd
RW
u
, andd
data
(
u
)
isinverselyproportionaltoI/Ore-aessrate.d
buf
(
i, u
)
andd
data
(
u
)
areestimatedusingthefollowingtwoequations:
d
buf
(
i, u
) =
λ
u
d
RW
u
d
buf
(
i
)
P
k
∈
M
i
λ
k
d
RW
u
,
(5.2)d
data
(
u
) =
λ
u
t
u
d
RW
u
r
u
+ 1
.
(5.3)FromEquations5.1,5.2and5.3,hitrate
h
(
i, u
)
beomes:h
(
i, u
) =
(
r
u
r
u
+1
if
d
buf
(
i, u
)
≥
d
data
(
u
)
,r
u
d
buf
(
i
)
t
u
P
j∈
Mi
λ
j
d
RW
j
otherwise.
(5.4)
size. The inreasing rate of the buerhit probabilitydrops when the buer size is greater than 150 Mbyte,
suggestingthat further inreasingthebuersize annotsigniantlyimprovethebuerhit probabilitywhen
the buer size approahes to a level at whih a large portion of the I/Odata an be aommodatedin the
buer.
0
10
20
30
40
50
60
70
80
10
50 100 150 200 250 300 350 400 450
NL, R=5
CM, R=5
IO, R=5
NL, R=3
CM, R=3
IO, R=3
Buffer Size (Mbyte)
Buffer Hit Probability
Fig. 5.1. Buer Hit Probability as a funtion of the
BuerSize, page-fault rateis 4.0No./ms, I/Oaess rateis
2.2No./ms.
0
10
20
30
40
50
60
70
80
90
7.2
7.4
7.6
7.8
8
8.2
8.4
8.6
8.8
NLB
IO
WAL
WAL-FC
Number of Page Fault Per Millisecond
Mean Slowdown
Fig.5.2.Meanslowdownsasafuntionofthepage-fault
rate,I/Oaess rateof0.1No./ms.
5.2. Memory Intensive Workload. To simulate a memory intensive workload, the I/O aess rate
is xed to a omparatively low level of 0.1 No./ms. The page-fault rate is set from 7.2 No./ms to 8.8
No./ms with inrements of 0.2 No./ms. The performane of CM is omitted, sine it is very lose to that
ofWAL.
Figure 5.2 reveals that the mean slowdowns of all the poliies inreasewith the page-fault rate. This is
beauseasI/Odemandsarexed,highpage-faultrateleadstoahighutilizationofdisks,ausinglongerwaiting
timeonI/Oproessing.
When the page-fault rate is high, WAL outperforms IO and NLB, and the WAL-FC has better
perfor-mane than both WAL and IO. For example, the WAL poliy redues slowdowns over the IO poliy by up
to 37.2% (with an average of 31.5%), and the WAL-FC poliy improvesthe performane in terms of mean
slowdown over IO by up to a fator of 4 (400%). The reason is that the IO poliy only attempts to
bal-ane expliitI/O load,ignoring the impliit I/Oloadthat resulted from page faults. When theexpliit I/O
loadis low,balaning expliit I/Oloaddoesnotmakeasigniant ontributionto balaning the overall
sys-tem load. In addition, NLB is onsistently the worst among the six poliies, sine NLB leaves three shared
resouresextremely imbalaned and does notimprovethe buer utilization by the adaptive onguration of
buersizes.
Moreinterestingly, the poliies that use the feedbakontrol mehanism algorithm onsiderably improve
the performane overthose withoutemploying the feedbak ontroller. For example,WAL-FC improvesthe
systemperformaneoverWALbyupto274%(withanaverageof220%). Consequently,theslowdownsofNLB,
WAL,andIOaremoresensitiveto thepage-fault ratethanWAL-FC.
5.3. I/O-IntensiveWorkload. TostresstheI/O-intensiveworkloadin thisexperiment,theI/Oaess
rateis xed at ahigh value of 2.8No./ms, and the page-fault rateis hosen from 1.6 No./msto 2.1 No./ms
withinrementsof0.1No./ms. Thelowpage-faultratesimplythat, evenwhentherequestedmemoryspaeis
largerthanthealloatedmemoryspae,pagefaultsdonotourfrequently. This workloadreetsasenario
wherememory-intensivejobsexhibithightemporalandspatialloalityofaess. Figure5.3plotsslowdownas
afuntion ofthepage-faultrate. TheresultsofIOareomittedfromFigure5.3,sinetheyarenearlyidential
tothoseofWAL.
First,theresultsshowthattheWALshemesigniantlyoutperformstheNLBandCMpoliies,suggesting
that NLB and CM are notsuitable for I/O intensive workload. Forexample, asshown in Figure 5.3, WAL
improves theperformane of CM in terms of themean slowdown by upto a fator of 9(with an average of
0
20
40
60
80
100
1.6
1.7
1.8
1.9
2
2.1
NLB
CM
WAL
WAL-FC
Number of Page Fault Per Millisecond
Mean Slowdown
Fig.5.3. Meanslowdownasafuntionofthepage-faultrate,I/Oaess rateis2.8No./ms.
Seond,Figure 5.3showsthat WAL-FCsigniantlyoutperformsWAL.Forexample,WAL-FCdeliversa
performaneimprovementoverWALbyupto282%(withanaverageof125%). Again,thisisbeausetheW
AL-FCshemeappliesthefeedbakontrollertomeet thehighI/Odemands byhangingtheweightsandtheI/O
buersizestoahieveahighbuerhitprobability. ThisresultsuggeststhatimprovingtheI/Obuerutilization
byusing thefeedbak ontrolmehanismanpotentiallyalleviatethe performane degradationresultedfrom
theimbalanedI/Oload.
Third, theresults further show the slowdowns of NLB and CM are verysensitive to the page-fault rate.
In other words, the mean slowdowns of NLB and CM all inrease notieably with the inreasing value of
I/O load. One reason is, as I/O load are xed, a high page-fault rate leads to high disk utilization,
aus-ing longer waiting time on I/O proessing. A seond reason is, when the I/O load is imbalaned, the
ex-pliit I/O load imposed on some node will be very high, leading to a longer paging fault proessing time.
Conversely, the page-fault rate makes insigniantimpat on the performaneof WAL, and WAL-FC. Sine
the high I/O load imposed on the disks is diminished either by balaning the I/O load or by improving
the buer utilization. This observation suggeststhat thefeedbak ontrol mehanism is apable of boosting
theperformane of lusters under I/O-intensiveworkloadevenin the absene ofanydynami load-balaning
shemes.
5.4. Memory and I/O intensive Workload. The twoprevious setionspresentedthe best ases for
theproposedshemesinetheworkloadwaseither highlymemory-intensiveorI/O-intensivebutnotboth. In
theseextremesenarios,thefeedbakontrolmehanismprovidesmorebenetstolustersthanload-balaning
poliies do. This setion attempts to show anotherinteresting asein whih theluster has aworkloadwith
bothhigh memory and I/Ointensivejobs. The I/Oaess rate is set to 1.5 No./ms. The page fault rate is
from7.2No./msto8.4No./mswithinrementsof0.2No./ms.
Figure 5.4 showsthat theperformanes of CM, IO, and WAL are lose to one another. This is beause
the trae, used in this experiment, omprises a good mixture of memory-intensive and I/O-intensive jobs.
Hene, while CM takes advantage of balaning CPU-memory load, IO an enjoy benets of balaning I/O
load. Interestingly, under this spei memory and I/O intensive workload, the resultant eet of balaning
CPU-memoryloadisalmostidential tothatofbalaning I/Oload.
Aseondobservationisthat,underthememoryandI/Ointensiveworkload,load-balaningshemesahieve
higherlevelofimprovementsoverNLB.ThereasonisthatwhenbothmemoryandI/Odemandsarehigh,the
buersizes in aluster areunlikely tobehanged, asthereis amemoryontentionamong memory-intensive
andI/O-intensivejobs. Thus,insteadofutuatingwidelytooptimizetheperformane,thebuersizesnally
onvergetoavaluethat minimizesthemeanslowdown.
Third, inorporating the feedbak ontrol mehanism in the existing load-balaning shemes is able to
further boost theperformane. Forexample, ompared with WAL,WAL-FCfurther dereases theslowdown
by up to 54.5% (with an average of 30.3%). This result suggests that, to sustain a high performane in
0
10
20
30
40
50
60
70
80
90
100
7.2
7.4
7.6
7.8
8
8.2
8.4
NLB
CM
IO
WAL
WAL-FC
Number of Page Fault Per Millisecond
Mean Slowdown
Fig.5.4.Meanslowdownsasafuntionofthepage-fault
rate,I/Oaessrateof1.5No./ms.
0
20
40
60
80
100
120
100
150
200
250
300
350
400
IOCM,IO rate=2.6
WAL-PM,IO rate=2.6
WAL-RE,IO rate=2.6
Average Data Size (KByte)
Mean Slowdown
Fig. 5.5. Mean slowdown as a funtion of the size of
averagedatasize. Pagefaultrateis0.5No./ms,andI/Orate
is2.6No./ms.
5.5. Average Data Size. Inthe previousexperiments, thedata sizes are hosenbased ontypial
mul-timedia appliations. It is notedthat I/Oloaddepends onI/O aess rateand the averagedata size of I/O
requests, whih in turn relyon theI/Oaess patterns of appliations. Thepurpose ofthis experiment isto
study theperformaneimprovementsahieved bythe feedbakontrol mehanismforother typesof
applia-tionsiftheyexhibitdierentharateristis. Speially, Figure5.5 showstheimpatof averagedatasize on
the performane of the feedbakontrol mehanism. The page fault rate and the I/O aess rate are set to
0.5No./msand2.6No./ms.,respetively. Theaveragedatasizeis hosenfrom 100KByteto400KBytewith
inrementsof50KByte.
Figure5.5 indiates that,forthree examinedload-balaningpoliies, themeanslowdowninreasesasthe
average data size inreases. This is beause, under irumstane that both page fault rate and I/O aess
ratearexed, alargeaveragedata sizeyields ahighutilization of disks,ausing longerwaiting timesonI/O
proessing. More importantly, Figure 5.5 shows that the performane improvement gained by the feedbak
ontrolmehanismbeomesmorenotieablewhentheaveragedatasize islarge. Thisresultsuggeststhat the
proposedapproahisnotonlybeneialformultimediaappliations,butalsoturnsouttobeusefulforavariety
ofappliationsthataredataintensivein nature.
6. Conlusions. In this paper, we have proposed a feedbak ontrol mehanism to dynamially adjust
theweightsof reoursesand the buer sizes in aluster with a generaland pratialworkloadthat inludes
memoryandI/Ointensiveworkloadonditions. Theprimaryobjetiveoftheproposedapproahistominimize
the number of page faults for memory-intensive jobs while improving the buer utilization of I/O-intensive
jobs. Thefeedbakontrollerjudiiouslyongurestheweightstoahieveanoptimalperformane. Meanwhile,
under aworkloadwhere thememory demand ishigh, the buersizes aredereasedto alloatemorememory
formemory-intensivejobs, therebyleadingtoalowpage-faultrate.
To evaluate the performane of the mehanism, we ompared the proposed WAL-FCshemewith WAL,
CM, and IO. For omparison purposes, the NLB poliy that does not onsider load balaning is also
simu-lated. A trae-drivensimulation providesextensiveempirial resultsdemonstratingthat WAL-FCis eetive
inenhaningperformaneofexisting dynamiload-balaningpoliiesundermemory-intensiveorI/O-intensive
workload. Inpartiular, when the workload is memory-intensive, WAL-FC redues themean slowdown over
theCM andIOpoliiesbyuptoafatorof9. Further,wehavemadethefollowingobservations:
1. When the page-fault rate is higher and the I/O rate is verylow, WAL and CM outperform IO and
NLB,andWAL-FChasbetterperformanethanWAL;
2. WhenI/Odemandsare high,WALandIO aresigniantly superior toCM and NLB.AndWAL-FC
hasnotieablybetterperformanethanthatof IO;
3. Under an I/Ointensiveworkload, the mean slowdowns of NLB and CM allinrease notieably with
I/Oload. Conversely,thepage-fault ratemakesinsigniantimpatontheperformane ofIO,WAL,
andWAL-FC.
5. Theperformane improvementgained bythefeedbakontrol mehanism beomes pronounedwhen
the average data size is relatively large. Future studies in this area may be performed in several
diretions. First,thefeedbakontrolmehanismwillbeimplementedinalustersystem. Seond,we
willstudythestabilityoftheproposedfeedbakontroller. Finally,itwillbeinterestingto studyhow
quiklythefeedbakontrolleronvergestotheoptimalvaluein lusters.
7. Aknowledgements. Thiswork waspartiallysupported by anNSFgrant(EPS-0091900),astart-up
researhfund(103295)fromtheresearhandeonomidevelopmentoeoftheNewMexioInstituteofMining
andTehnology,aNebraskaUniversityFoundationgrant(26-0511-0019),aUNLAademiProgramPriorities
Grant, and aChineseNSF 973 projetgrant(2004b318201). Weare grateful to the anonymous refereesfor
theirinsightfulsuggestionsandomments.
REFERENCES
[1℄ A.AharyaandS.Setia,AvailabilityandUtilityofIdleMemoryinWorkstationClusters,Pro.ACMSIGMETRICSConf.
MeasuringandModelingofComputerSystems,May1999.
[2℄ E.Balafoutis,G. Nerjes, P.Muth, M.Paterakis,P. Triantafillou,andG. Weikum,ClusteredSheduling
Algo-rithmsforMixed-MediaDiskWorkloads,Pro.Int'lConf.onClusterComputing,2002.
[3℄ J. Basneyand M. Livny,ManagingNetwork Resoures inCondor, Proeedingsof theNinth IEEESymposiumonHigh
PerformaneDistributedComputing,Pittsburgh,Pennsylvania,August2000,pp298-299.
[4℄ C.Chang,B.Moon,A.Aharya,C.Shok,A.Sussman,J.Saltz,Titan:AHigh-PerformaneRemote-sensingDatabase,
Pro.ofInternational ConfereneonDataEngineering,1997.
[5℄ D.Eager,E.Lazowaska,andJ.Zahorjan,Aomparisonofreeiver-initiatedandsender-initiatedadaptiveloadsharing,
Proeedings of the 1985 ACMSIGMETRICS onferene on Measurement and modeling of omputer systems,Austin,
Texas,1985.
[6℄ D. J. Evans andWunbutt,Loadbalaning with networkpartitioning usinghost groups, Parallelomputing, 20:325-345,
Marh1994.
[7℄ M.GareyandD.Johnson,ComputersandIntratability:AGuidetothetheoryofNP-Completeness,W.H.Freeman,1979.
[8℄ R. L.Graham,BoundsonMultiproessing TimingAnomalies,SIAMJ.AppliedMath.,Vol.17,No.2,pp.416-429,1969.
[9℄ M. Harhol-Balter andA.Downey,ExploitingProessLifetimeDistributionsforLoadBalaing,ACMtransationon
ComputerSystems,vol.3,no.31,1997.
[10℄ C.HuiandS.Chanson,Improved StrategiesforDynamiLoadSharing,IEEEConurreny,vol.7,no.3,1999.
[11℄ L.Lee,P.Sheauermann,andR.Vingralek,FileAssignmentinParallelI/OSystemswithMinimalVarianeofServie
time,IEEETrans.onComputers,Vol.49,No.2,pp.127-140,2000.
[12℄ B.LiandK.Nahrstedt,AControlTheoretialModelforQualityofServieAdaptations,inIEEEInternationalWorkshop
onQualityofServie,May1998.
[13℄ F.C.H. Lin andR.M. Keller,TheGradient ModelLoad Balaning Method,IEEETrans.SoftwareEngineering,vol.13,
no.1,pp.32-38,Jan.1987.
[14℄ F. MunizandE.J. Zaluska,Parallel LoadBalaning: AnExtensiontotheGradient Model,ParallelComputing,vol.21,
pp.287-301,1995.
[15℄ R. Pollak,AHierarhialLoadBalaning EnvironmentforParallel andDistributed Superomputer, Pro.ofthe
Interna-tionalSymposiumonParallelandDistributedSuperomputing,Fukuoka,Japan,September1995.
[16℄ X.Qin,H.Jiang,andD.R.Swanson,AnEientFault-tolerantShedulingAlgorithmforReal-timeTaskswithPreedene
ConstraintsinHeterogeneousSystems,Pro.the31stInt'lConf.onParallelProessing(ICPP2002),Vanouver,Canada,
Aug2002,pp.360-368.
[17℄ X.Qin,H.Jiang,Y.Zhu,andD.Swanson,DynamiLoadBalaningforI/O-IntensiveTasksonHeterogeneousClusters,
Pro. ofthe10th International ConfereneonHighPerformaneComputing(HiPC2003),De.17-20,2003,Hyderabad,
India.
[18℄ P.Sheuermann,G.Weikum,P.Zabbak,DataPartitioningandLoadBalaninginParallel DiskSystems,TheVLDB
Journal,pp.48-66,July,1998.
[19℄ H.Shen, S.Lor, andP.Maheshwari,Anarhiteture-independent graphialtoolforautomationtention-free
proess-to-proessormapping,JournalofSuperomputing,Vol.18,No.2,2001,p.115-139.
[20℄ D.C.Steere,A.Goel,J.Gruenberg,et.al.,AFeedbak-drivenProportionAlloatorforReal-RateSheduling,
Oper-atingSystemsDesignandImplementation,NewOrleans,Louisiana,Feb1999.
[21℄ M. Surdeanu, D. Modovan, andS. Harabagiu,PerformaneAnalysis of a Distributed Question/ Answering System,
IEEETrans.onParallelandDistributedSystems,Vol.13,No.6,pp.579-596,2002.
[22℄ T.Tanaka,CongurationsoftheSolarWindFlowandMagnetiFieldaroundthePlanetswithnoMagnetield:Calulation
byanewMHD,JournalofGeophysialResearh,pp.17251-17262,Ot.1993.
[23℄ G.Voelker,ManagingServerLoadinGlobalMemorySystems,Pro.ACMSIGMETRICSConf.MeasuringandModeling
ofComputerSystems,May1997.
[24℄ M.Willebek-LeMairandA.Reeves,StrategiesforDynamiLoadBalaningonHighlyParallelComputers,IEEETrans.
ParallelandDistributedSystems,vol.4,no.9,pp.979-993,Sept.1993.
[25℄ X.Wu,V.Taylor,andR.Stevens,DesignandImplementationofProphesyAutomatiInstrumentationandDataEntry
[26℄ L. Xiao, S. Chen, and X. Zhang,DynamiCluster Resoure Alloations for Jobs with Known and Unknown Memory
Demands,IEEETransationsonParallelandDistributedSystems,vol.13,no.3,2002.
[27℄ X.Zhang,Y.Qu,andL.Xiao,ImprovingDistributedWrokloadPerformanebySharingbothCPUandMemoryResoures,
Pro.20thInt'lConf.DistributedComputingSystems(ICDCS2000),Apr.2000.
Editedby: HongShen
Reeived: February27,2004