A Feedback Control Mechanism for Balancing I/O- and Memory-Intensive Applications on Clusters


XIAO QIN∗, HONG JIANG†, YIFENG ZHU†, AND DAVID R. SWANSON†

Abstract. One common assumption of existing models of load balancing is that the weights of resources and I/O buffer size are statically configured and cannot be adjusted based on a dynamic workload. Though the static configuration of these parameters performs well in a cluster where the workload can be modeled and predicted, its performance is poor in dynamic systems in which the workload is unknown. In this paper, a new feedback control mechanism is proposed to improve overall performance of a cluster with a general and practical workload including I/O-intensive and memory-intensive load. This mechanism is also shown to be effective in complementing and enhancing the performance of a number of existing dynamic load-balancing schemes. To capture the current and past workload characteristics, the primary objectives of the feedback mechanism are: (1) dynamically adjusting the resource weights, which indicate the significance of the resources, and (2) minimizing the number of page faults for memory-intensive jobs while increasing the utilization of the I/O buffers for I/O-intensive jobs by manipulating the I/O buffer size. Results from extensive trace-driven simulation experiments show that, compared with a number of schemes with fixed resource weights and buffer sizes, the feedback control mechanism delivers a performance improvement in terms of the mean slowdown by up to 282% (with an average of 125%).

Keywords. Feedback control, I/O-intensive applications, cluster, load balancing

1. Introduction. Scheduling [16, 19] and load balancing [1, 10] techniques in parallel and distributed systems have been investigated to improve system performance with respect to throughput and/or individual response time. Scheduling schemes assign work to machines to achieve better resource utilization, whereas load-balancing policies can migrate a newly arrived job or a running job preemptively to another machine if needed.

Since clusters, a type of loosely coupled parallel system, have become widely used for scientific and commercial applications, several distributed load-balancing schemes in clusters have been presented in the literature, primarily considering CPU [9, 10], memory [1, 23], or a combination of CPU and memory [26, 27]. Although these load-balancing policies have been very effective in increasing the utilization of resources in distributed systems (and thus improving system performance), they have ignored one type of resource, namely disk (and disk I/O). The impact of disk I/O on overall system performance is becoming significant as more and more jobs with high I/O demand are running on clusters. This makes storage devices a likely performance bottleneck. Therefore, we believe that for any dynamic load balancing scheme to be effective in this new application environment, it must be made I/O-aware.

Typical examples of I/O-intensive applications include long running simulations of time-dependent phenomena that periodically generate snapshots of their state [22], archiving of raw and processed remote sensing data [4], and multimedia and web-based applications. These applications share a common feature in that their storage and computational requirements are extremely high. Therefore, the high performance of I/O-intensive applications heavily depends on the effective usage of storage, in addition to that of CPU and memory. Compounding the performance impact of I/O in general, and disk I/O in particular, the steadily widening gap between CPU and I/O speed makes load imbalance in I/O increasingly crucial to overall system performance. To bridge this gap, I/O buffers allocated in the main memory have been successfully used to reduce disk I/O costs, thus improving the throughput of I/O systems.

This paper proposes a feedback control mechanism to dynamically configure resource weights and I/O buffers in such a way that the weights are capable of reflecting the significance of system resources, and the memory utilization is improved for I/O- and memory-intensive workload.

The rest of the paper is organized as follows. Related work in the literature is reviewed in Section 2. Section 3 describes the system model, and Section 4 proposes the feedback control mechanism. Section 5 evaluates the performance of the mechanism. Finally, Section 6 concludes the paper by summarizing the main contributions and commenting on future directions of this work.

∗ Department of Computer Science, New Mexico Institute of Mining and Technology, Socorro, New Mexico 87801. http://www.cs.nmt.edu/~xqin (xqin@cs.nmt.edu). Questions, comments, or corrections to this document may be directed to that email address.

†


2. Related Work. There exists a large base of excellent research related to distributed load balancing models, to name just a few: sender- or receiver-initiated diffusion [5, 24], the gradient model [6, 13, 14], and the hierarchical balancing model of Pollak [24]. Eager et al. studied both receiver- and sender-initiated diffusion, and the results of their study showed that receiver-initiated policies are preferable at high system loads if the overheads of task transfer under the two policies are comparable [5]. The gradient model makes use of a gradient proximity map of underloaded processors to guide the migration of tasks from overloaded to underloaded processors [6, 13, 14]. Underloaded nodes dynamically update the gradient proximity map, whereas overloaded nodes initiate task migrations. Pollak proposed a scalable approach for dynamic load balancing in large parallel and distributed systems on a multi-level control hierarchy [15]. The hierarchical scheme achieves a significant performance gain due to the parallelism in the low level of the hierarchy and the possibility to aggregate information in the higher level of the control tree [15].

The issue of distributed load balancing for CPU and memory resources has been extensively studied and reported in the literature. For example, Harchol-Balter et al. [9] proposed a CPU-based preemptive migration policy that was more effective than non-preemptive migration policies. Zhang et al. [27] focused on load sharing policies that consider both CPU and memory services among the nodes of a cluster. Throughout this paper, the CPU-memory-based load balancing policy presented in [27] will be referred to as CM. The simulation results show that the CM policy not only improves the performance of memory-intensive jobs, but also maintains the same load sharing quality of the CPU-based policies for CPU-intensive jobs [27].

A large body of work can be found in the literature that addresses the issue of balancing the load of disk systems [11, 18]. Scheuermann et al. [18] studied two issues in parallel disk systems, namely striping and load balancing, and showed their relationship to response time and throughput. Lee et al. [11] proposed two file assignment algorithms that minimize the variance of the service time at each disk, in addition to balancing the load across all disks. Since the problem of balancing the utilizations across all disks is isomorphic to the multiprocessor scheduling problem [7], a greedy multiprocessor-scheduling algorithm, called LPT [8], can be applied to disk load balancing [11]. Thus, LPT greedily assigns a process to the processor with the lightest I/O load [11]. Throughout this paper, we refer to the approaches that directly apply LPT to I/O load balancing as the IO policy. The I/O load balancing policies in these studies have been shown to be effective in improving overall system performance by fully utilizing the available hard drives.

Very recently, three load balancing models, which consider I/O, CPU and memory resources simultaneously, were presented [21, 26]. In [21], a dynamic load-balancing scheme, tailored for the specific requirements of the Question/Answer application, was proposed along with a performance analysis of the approach. Xiao et al. proposed effective load sharing strategies by minimizing both CPU idle time and the number of page faults in clusters [26].

However, the load-balancing models presented in [21, 26] are similar in the sense that the weights of system resources and the buffer size are statically configured even when the workload is dynamic. In contrast, the new feedback control mechanism proposed in this study judiciously configures these parameters in accordance with the workload of the cluster. Trace-driven simulations show that, compared with the CM and IO policies, the proposed scheme with a feedback control mechanism significantly enhances the overall performance of a cluster system under both memory-intensive and I/O-intensive workload.

Some work has been done to make use of feedback control mechanisms in operating systems and distributed environments [12, 20]. For example, Steere et al. proposed a scheduling scheme that dynamically adjusts CPU allocation and period of threads using the feedback of an application's rates of progress with respect to its inputs and/or outputs [20]. Li and Nahrstedt studied a feedback control algorithm to support end-to-end QoS in a distributed environment [12]. However, the feedback controls of resource weights and buffer sizes have not been addressed in these works. In contrast, this paper presents experimental results that verify the benefits of the proposed feedback control mechanism for both resource weights and buffer sizes in a highly dynamic environment.

3. System Model. We consider the issue of a feedback control method to improve the performance of load balancing schemes in a cluster connected by a high-speed network, where each node not only maintains its individual job queue that holds jobs until they finish execution, but also perceives reasonably up-to-date global load information by periodically exchanging load status with other nodes. Jobs arrive at each node dynamically

[...] each node is modeled as a single M/G/1 queue [11]. Since jobs may be delayed because of waiting in queues (to share resources with other jobs) or being migrated to remote nodes, the slowdown imposed on a job $u$ is defined as below,

$$\mathrm{slowdown}(u) = \frac{t_f(u) - t_a(u)}{t_{CPU}(u) + t_{IO}(u)}, \qquad (3.1)$$

where [...] $t_{CPU}(u)$ and $t_{IO}(u)$ are the times job $u$ spends on CPU and I/O, respectively, without any resource sharing.

In expression 3.1, the numerator corresponds to the total time the job spends running, accessing I/O, waiting, or migrating, and the denominator corresponds to the execution time for job $u$ in a dedicated setting. The definition of slowdown is an extension of the one used in [9, 26, 27], where I/O access time is not considered.
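As a small illustration of expression 3.1, the slowdown of a single job can be computed as sketched below. The function and parameter names are hypothetical and chosen only for this example; the finish and arrival times are assumed to be the quantities whose difference forms the numerator of expression 3.1.

```python
def slowdown(t_arrival, t_finish, t_cpu, t_io):
    """Slowdown of one job in the sense of expression 3.1.

    t_arrival, t_finish: arrival and finish times of the job (assumed names).
    t_cpu, t_io: time the job needs on CPU and on I/O in a dedicated
    setting, i.e. without any resource sharing.
    """
    dedicated = t_cpu + t_io
    if dedicated <= 0:
        raise ValueError("dedicated execution time must be positive")
    return (t_finish - t_arrival) / dedicated


# A job that needs 6 s of CPU and 4 s of I/O on an idle node,
# but takes 30 s from arrival to completion on the shared cluster.
example = slowdown(t_arrival=0.0, t_finish=30.0, t_cpu=6.0, t_io=4.0)
```

A slowdown of 1 would mean the job ran as fast as on a dedicated node; queueing and migration delays push the value above 1.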

For simplicity, we assume that all nodes are homogeneous, having identical computing power, memory capacity, and disk I/O performance characteristics. This simplifying assumption should not restrict the generality of the proposed model, because if a cluster is heterogeneous, the relative load of a given job imposed on a node with high processing capability is less than that imposed on a node with low performance. The proposed scheme may be extended to handle heterogeneous systems by incorporating a simple conversion mechanism for relative load [16].

We also assume the network in our model is fully connected and homogeneous in the sense that the communication delay between any pair of nodes is the same. This simplification of the network is commonly used in many load-balancing models [9, 26, 27]. Additionally, we assume that the input data of each job has been stored on the local disk of the node to which the job is submitted. This assumption is conservative in nature, since we conducted an experiment to show that, under I/O-intensive workload, the performance of the proposed schemes with such an assumption is approximately 10% less effective than that of the schemes without it.

[...] the load of the newly arrived job plus the migration cost, therefore guaranteeing that each migration improves the expected slowdown of the job. If an appropriate candidate remote node is not available or the migration is evaluated to be useless, the load balancing schemes will not initiate the job migration.

4. Adaptive Load Balancing Scheme.

4.1. Weighted Average Load-balancing Scheme. In this section, we present WAL, a weighted average load-balancing scheme. Each job is described by its requirements for CPU, memory, and I/O, which are measured by the number of jobs running in the nodes, Mbytes, and number of disk accesses per ms, respectively. For a

[...] is overloaded, if: (1) its load is the highest; and (2) the ratio between its load and the average load across the system is greater than a threshold, which is set to 1.25 in our experiments. This optimal value, which is consistent with the result reported in [25], is obtained from an experiment where the threshold is varied from 1.0 to 2.0.

3. Third, a candidate node $j$ with the lowest load is chosen. In the case where more than one node has the lowest load, we randomly select one node to break the tie. If a candidate node is not available, WAL-FC will be terminated and no migration will be carried out.

4. Fourth, WAL-FC determines if job $u$ is eligible for migration. A job is eligible for migration if its migration is able to potentially reduce the job's slowdown.

5. Finally, job $u$ is migrated to the remote node $j$, and the load of nodes $i$ and $j$ is updated in accordance with job $u$'s load.
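The node-selection part of these steps can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function names are hypothetical, the per-node load indices are assumed to be already available as a list, and the handling of an all-idle cluster is our own choice rather than something taken from the paper.

```python
import random

OVERLOAD_THRESHOLD = 1.25  # threshold value reported in the text


def find_overloaded(loads, threshold=OVERLOAD_THRESHOLD):
    """Return the index of the overloaded node, or None.

    A node is overloaded if its load is the highest and the ratio of its
    load to the average load across the system exceeds the threshold.
    """
    average = sum(loads) / len(loads)
    if average == 0:
        return None  # assumption: an idle cluster has no overloaded node
    i = max(range(len(loads)), key=loads.__getitem__)
    return i if loads[i] / average > threshold else None


def pick_candidate(loads, source, rng=random):
    """Return a node other than `source` with the lowest load.

    Ties are broken randomly; None means no candidate is available.
    """
    others = [k for k in range(len(loads)) if k != source]
    if not others:
        return None
    lowest = min(loads[k] for k in others)
    ties = [k for k in others if loads[k] == lowest]
    return rng.choice(ties)


loads = [4.0, 1.0, 1.0, 2.0]              # hypothetical load indices
source = find_overloaded(loads)           # node 0: 4.0 / 2.0 > 1.25
target = pick_candidate(loads, source)    # node 1 or node 2, chosen at random
```

Whether the job is actually migrated then depends on the eligibility test of the fourth step.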

WAL-FC calculates the weighted average load index in the first step. The load index of each node $i$ is defined as the weighted average of CPU and I/O load, thus:

$$\mathrm{load}(i) = W_{CPU} \times \mathrm{load}_{CPU}(i) + W_{IO} \times \mathrm{load}_{IO}(i), \qquad (4.1)$$

[...] node $i$. $W_{CPU}$ and $W_{IO}$ are resource weights used to indicate the significance of the corresponding resource.
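To make Equation 4.1 concrete, the following sketch evaluates the load index of one node. The function name and the sample numbers are hypothetical; the CPU load is taken to be the number of running jobs and the I/O load the number of disk accesses per millisecond, following the units given at the start of this section.

```python
def load_index(load_cpu, load_io, w_cpu, w_io):
    """Weighted average load index of one node (Equation 4.1)."""
    return w_cpu * load_cpu + w_io * load_io


# A node running 3 jobs with an I/O load of 2.5 accesses per ms.
# With a weight of 0.8 on I/O, the disk load dominates the index.
io_heavy_view = load_index(load_cpu=3, load_io=2.5, w_cpu=0.2, w_io=0.8)
cpu_heavy_view = load_index(load_cpu=3, load_io=2.5, w_cpu=0.8, w_io=0.2)
```

The same node therefore looks more or less loaded depending on the weights, which is why fixing them statically can misrepresent the bottleneck resource.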

It is noted that the memory load is expressed by the implicit I/O load imposed by page faults. Let $l_{page}(i,u)$ and $l_{IO}(i,u)$ denote the implicit and explicit I/O load of job [...]

$$\mathrm{load}_{IO}(i) = \sum_{u \in M_i} l_{page}(i,u) + \sum_{u \in M_i} l_{IO}(i,u). \qquad (4.2)$$

[...] proposed in Section 4.2. When the node's available memory space is larger than or equal to the memory demand, there is no implicit I/O load imposed on the disk. Conversely, when the memory space of a node is unable to meet the memory requirements of the jobs, the node encounters a large number of page faults, leading to a high implicit I/O load. Implicit I/O load depends on three factors, namely, the available user memory space, the page fault rate, and the memory space requested by the jobs assigned to node $i$. More precisely, $l_{page}(i,u)$ can be defined as follows, where $\mu_i$ denotes the page fault rate of the node, and $\mathrm{load}_{MEM}(i)$ is the memory load, defined as the sum of the memory requirements of the jobs running on node $i$.

$$l_{page}(i,u) = \begin{cases} 0 & \text{if } \mathrm{load}_{MEM}(i) \le n_{MEM}(i), \\[4pt] \dfrac{\mu_i \times \sum_{v \in M_i} r_{MEM}(v)}{n_{MEM}(i)} & \text{otherwise.} \end{cases} \qquad (4.3)$$
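Equation 4.3 can be read as a simple piecewise function, sketched below. The function and argument names are hypothetical, and $n_{MEM}(i)$ is interpreted here as the memory space available to user jobs on node $i$, which is our reading of the surrounding text.

```python
def implicit_io_load(page_fault_rate, mem_requirements, available_mem):
    """Implicit I/O load caused by page faults on one node (Equation 4.3).

    page_fault_rate: mu_i, page fault rate of the node (No./ms).
    mem_requirements: memory requirements r_MEM(v) of the jobs on the node.
    available_mem: n_MEM(i), assumed to be the memory available to jobs.
    """
    load_mem = sum(mem_requirements)
    if load_mem <= available_mem:
        return 0.0  # memory demand fits, so no paging traffic reaches the disk
    return page_fault_rate * load_mem / available_mem


# 480 MByte left for working sets (for example 640 MByte of RAM minus a
# 160 MByte buffer); two jobs requesting 400 MByte each overcommit it.
fits = implicit_io_load(2.0, [100.0, 200.0], 480.0)
overcommitted = implicit_io_load(2.0, [400.0, 400.0], 480.0)
```

In this reading, enlarging the I/O buffer shrinks `available_mem` and so raises the implicit load once memory is overcommitted.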

$l_{IO}(i,u)$ in Equation 4.2 is a function of the I/O access rate, denoted $\lambda_u$, [...]

$$l_{IO}(i,u) = \lambda_u \times (1 - h(i,u)). \qquad (4.4)$$

In what follows, we quantitatively determine whether a job is eligible for migration. When a job $u$ is assigned to node $i$, its expected response time $r(i,u)$ can be computed using Equation 4.5.

$$r(i,u) = t_u \times E(L_i) + t_u \times \lambda_u \times \left[ E(s^i_{disk}) + \frac{\Lambda^i_{disk} \times E\big((s^i_{disk})^2\big)}{2\,(1 - \rho^i_{disk})} \right], \qquad (4.5)$$

where $t_u$ and $\lambda_u$ are the computation time and I/O access rate of job $u$, respectively. $E(s^i_{disk})$ and $E((s^i_{disk})^2)$ [...] $L_i$, and $\Lambda^i_{disk}$ denotes the aggregate I/O access rate in node $i$. Since the expected response time of an eligible migrant on the source node has to be greater than the sum of its expected response time on the destination node and the migration cost, job $u$ is eligible for migration if:

$$r(i,u) > r(j,u) + c_u, \qquad (4.6)$$

where $j$ represents a destination node, and $c_u$ is the migration cost (time) modeled as follows,

$$c_u = e + d_u \times \left( \frac{1}{b^{ij}_{net}} + \frac{1}{b^{i}_{disk}} + \frac{1}{b^{j}_{disk}} \right), \qquad (4.7)$$

where $e$ is the fixed cost of migrating the job and loading it into the memory on another node, $b^{ij}_{net}$ denotes the [...] $b^{ij}_{net}$ and $b^{j}_{disk}$ can be measured by a performance monitor [3]. Accordingly, the simulator discussed in Section 5 estimates $b^{ij}_{net}$ and $b^{j}$ [...]
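The eligibility test of Equations 4.5 to 4.7 can be sketched as three small functions. All names are hypothetical. The disk utilization $\rho^i_{disk}$ is passed in as an argument rather than derived, and $d_u$ is assumed to be the amount of data that has to be moved when job $u$ migrates; both are assumptions made for this illustration.

```python
def expected_response_time(t_u, lam_u, mean_queue_len,
                           mean_service, mean_service_sq,
                           aggregate_rate, disk_utilization):
    """Expected response time r(i, u) of job u on node i (Equation 4.5)."""
    if not 0.0 <= disk_utilization < 1.0:
        raise ValueError("disk utilization must lie in [0, 1)")
    # Mean waiting time in the M/G/1 disk queue.
    waiting = aggregate_rate * mean_service_sq / (2.0 * (1.0 - disk_utilization))
    return t_u * mean_queue_len + t_u * lam_u * (mean_service + waiting)


def migration_cost(fixed_cost, data_size, b_net, b_disk_src, b_disk_dst):
    """Migration cost c_u (Equation 4.7); bandwidths in data units per time unit."""
    return fixed_cost + data_size * (1.0 / b_net + 1.0 / b_disk_src + 1.0 / b_disk_dst)


def is_eligible(r_source, r_destination, cost):
    """Migration test of Equation 4.6."""
    return r_source > r_destination + cost


r_src = expected_response_time(10.0, 2.0, 3.0, 0.01, 0.0002, 50.0, 0.5)
r_dst = expected_response_time(10.0, 2.0, 1.0, 0.01, 0.0002, 10.0, 0.1)
cost = migration_cost(0.1, 8.0, 100.0, 40.0, 40.0)
migrate = is_eligible(r_src, r_dst, cost)
```

The test only permits a migration when the saving in expected response time outweighs the one-off cost of moving the job.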

4.2. Problem Description and Examples. The feedback control mechanism, which aims at minimizing the mean slowdown, focuses on adjusting the resource weights and the buffer sizes. To help describe the problem of fixed resource weights and I/O buffer sizes, we first present the following examples that motivate the proposed solution to improve the system performance.

Assume a cluster with six identical nodes [9, 17, 26, 27], to which the IO load-balancing policy is applied. The average page-fault rate and I/O access rate are chosen to be 2.0 No./ms (Number/Millisecond) and 2.8 No./ms, respectively. The total memory size for each node is 640 Mbyte, and other parameters of the cluster are given in Section 5.1. We modified the traces used in [9, 27], adding a randomly generated I/O access rate to each job. The traces used in [9] have been collected from one workstation on six different time intervals. In the traces used in our experiments, the CPU and memory demands remain unchanged, and the memory demand of each job is chosen based on a Pareto distribution with the mean size of 4 Mbytes [27].

To evaluate the impact of resource weights (see Equation 4.1) on the system performance, we conducted a simulation experiment where the resource weights were statically set. Figure 4.1 plots the relationship between the resource weight of I/O and the mean slowdown experienced by all the jobs in the trace. The result indicates that the mean slowdown consistently decreases as the I/O resource weight increases from 0 to 1 with increments of 0.2. We attribute this observation to the fact that, under I/O-intensive workload conditions, an I/O resource weight with a high value is able to accurately reflect the significance of the disk I/O resources in the system.

Fig. 4.1. Mean slowdowns as a function of the I/O resource weight $W_{IO}$. Average page-fault rate = 2.0 No./ms, average I/O access rate = 2.8 No./ms.

Fig. 4.2. Mean slowdowns as a function of the buffer size (MByte). Average page-fault rate is 5.0 No./ms, average I/O access rate is 2.3 No./ms.

The memory of each node is divided into two portions, with one serving as I/O buffer and the other being used to store working sets of running jobs. Without loss of generality, we assume that the buffer sizes of the six nodes are identical. We conducted a second experiment, in which the buffer sizes were statically configured. Figure 4.2 shows the buffer size chosen in the experiment and the corresponding mean slowdowns obtained from the simulator.

The curve in Figure 4.2 reveals that the buffer size has a large effect on the mean slowdowns of the IO-aware policy. When the buffer size is smaller than 210 MByte, the slowdown decreases as the buffer size increases. In contrast, the slowdown increases with the buffer size if the buffer size is greater than 210 MByte. The mean slowdown of this given workload reaches its minimum value when the buffer size is 210 MByte. A large buffer size results in a high buffer hit rate and reduces I/O processing time, thereby causing a positive effect on the performance. On the other hand, given a fixed value of the total available main memory size, a larger buffer size implies a smaller amount of memory used to store the working sets of running jobs, which in turn leads to a larger number of page faults. In general, a large buffer size may introduce both positive and negative effects on the mean slowdown at the same time, and the overall performance depends on the resultant effect.

Although the static configuration of resource weights and buffer sizes is an approach to tuning the performance of clusters where workload conditions can be modeled and predicted, this approach performs poorly

4.3. A Feedback Control Mechanism. The high level view of the architecture for the feedback control mechanism is presented in Figure 4.3, where the architecture comprises a load-balancing scheme, a resource-sharing controller, and a feedback controller. The resource-sharing controller consists of a CPU scheduler, a memory allocator and an I/O controller. The slowdown of a newly completed job and the history slowdowns

[...] $W_{IO}$ is 1, the control action $\Delta W_{CPU}$ can be obtained accordingly. Similarly, $\Delta\mathit{bufsize} > 0$ means the buffer size needs to be increased, and otherwise the buffer size is to be decreased.

Fig. 4.3. Architecture of the feedback control mechanism (components: load-balancing scheme; resource-sharing controller for CPU, memory, and I/O; slowdown history; and feedback controller with outputs $W_{IO}$, $W_{CPU}$, and bufsize).

The first goal of the feedback controller is to manipulate the resource weights in a way that makes it possible to minimize the mean slowdown of jobs. The system model for an open loop balancer is approximately given by the following equation,

$$\mathrm{slowdown}(z) = -\mathit{wg}(L)\, W_{IO}(z) + \mathit{wd}(L), \qquad (4.8)$$

[...], respectively. The values of $\mathit{wg}$ and $\mathit{wd}$ largely depend on workload conditions and the applied load-balancing policy. Thus, $\mathit{wg}$ and $\mathit{wd}$ can be obtained based on simulation models for open-loop load balancers. The control rule for the resource weight is formally modeled below,

$$\Delta W_{IO,u} = G_w \left( 1 - \frac{S_u}{S_{u-1}} \right) \frac{\Delta W_{IO,u-1}}{|\Delta W_{IO,u-1}|}, \qquad (4.9)$$

$$W_{IO,u} = W_{IO,u-1} + \Delta W_{IO,u}, \qquad (4.10)$$

where $\Delta W_{IO,u}$ is the control action, $S_u$ denotes the average slowdown, $\Delta W_{IO,u-1} / |\Delta W_{IO,u-1}|$ indicates whether the previous control action has increased or decreased the resource weight, and $G_w$ denotes the controller gain for the I/O resource weight. In the experiments presented shortly in the next section, $G_w$ is tuned to be 0.5 for better performance. Let $W_{IO,u}$

The feedback controller attempts to manipulate the resource weights in the following three steps. First, when a job $u$ is accomplished, the controller calculates the slowdown $s_u$ of this newly completed job. Second, $s_u$ is stored in the slowdown history table, and the average slowdown $S_u$ is computed accordingly. Note that $S_u$ reflects a specific pattern of the recent slowdowns in the dynamic workload. The table size is a tunable

[...] control actions $\Delta W_{IO,u}$ and $\Delta W_{CPU,u}$, which are based on the previous control action along with the comparison between $S_u$ and $S_{u-1}$. More precisely, the performance is regarded to be improved by the previous control action if $S_{u-1} > S_u$; therefore the controller continues increasing $W_{IO}$ if it has been increased by the previous control action, otherwise $W_{IO}$ is decreased. Similarly, $S_{u-1} < S_u$ means that the performance has been worsened since the latest control action, suggesting that $W_{IO}$ has to be increased if the previous control action has reduced $W_{IO}$, and vice versa.
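The weight-control rule of Equations 4.9 and 4.10 can be sketched as a small controller object. This is a minimal sketch under several assumptions that are not taken from the paper: the class and attribute names are hypothetical, the initial control direction is assumed to be an increase, the weight is clamped to the interval [0, 1], the CPU weight is taken as $1 - W_{IO}$, and the history table size of 10 is an arbitrary placeholder for the tunable parameter.

```python
from collections import deque


class WeightController:
    """Feedback rule of Equations 4.9 and 4.10 for the I/O resource weight."""

    def __init__(self, w_io=0.5, gain=0.5, history_size=10):
        self.w_io = w_io
        self.gain = gain                            # G_w
        self.history = deque(maxlen=history_size)   # slowdown history table
        self.prev_avg = None                        # S_{u-1}
        self.direction = 1.0                        # sign of the previous action (assumed +1 at start)

    def job_completed(self, slowdown):
        """Record the slowdown s_u of a completed job and update the weights."""
        self.history.append(slowdown)
        avg = sum(self.history) / len(self.history)           # S_u
        if self.prev_avg is not None and self.prev_avg > 0:
            # Equation 4.9: keep the direction if the average slowdown
            # improved, reverse it if the average slowdown got worse.
            delta = self.gain * (1.0 - avg / self.prev_avg) * self.direction
            if delta != 0.0:
                self.direction = 1.0 if delta > 0 else -1.0
            # Equation 4.10, clamped to [0, 1] (assumption).
            self.w_io = min(1.0, max(0.0, self.w_io + delta))
        self.prev_avg = avg
        return self.w_io, 1.0 - self.w_io                     # (W_IO, W_CPU)


controller = WeightController(w_io=0.5, gain=0.5, history_size=1)
for s in (10.0, 8.0, 10.0, 12.0):
    w_io, w_cpu = controller.job_completed(s)
```

With a history size of 1 the average equals the latest slowdown, so the example shows the rule directly: an improvement after an increase raises $W_{IO}$ further, a deterioration reverses the direction, and a deterioration after a decrease raises $W_{IO}$ again.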

Besides configuring the weights, the second goal of the feedback control mechanism is to dynamically set the buffer size of each node based on the unpredictable workload. The mechanism aims at improving buffer utilization and reducing the number of page faults by maintaining an effective usage of memory space for running jobs and their data.

We can derive the slowdown based on a model that captures the correlation between the buffer size and the slowdown. For simplicity, the model can be constructed as follows,

$$\mathrm{slowdown}(z) = -\mathit{bg}(L)\, \mathit{bufsize}(z) + \mathit{bd}(L)$$

[...]

$$\Delta\mathit{bufsize}_u = G_b\, (S_{u-1} - S_u)\, \frac{\Delta\mathit{bufsize}_{u-1}}{|\Delta\mathit{bufsize}_{u-1}|}, \qquad (4.13)$$

where $\Delta\mathit{bufsize}_{u-1} / |\Delta\mathit{bufsize}_{u-1}|$ indicates whether the previous control action has increased or decreased the buffer size, and $G_b$ denotes the controller gain. $G_b$ is tuned to be 0.5 in order to deliver better performance. Let $\mathit{bufsize}_{u-1}$ [...] $\Delta W_{CPU}$ and $\Delta W_{IO}$. The adaptive buffer size makes noticeable impacts on both the memory allocator and the I/O controller, [...] $S_{u-1} > S_u$ means the performance is improved by the previous control action, so the buffer size is increased if it has been increased by the previous control action, otherwise the buffer size is reduced. Likewise, $S_{u-1} < S_u$ indicates that the latest buffer control action leads to worse performance, implying that the buffer size has to be increased if the previous control action has reduced the buffer size, otherwise the controller decreases the buffer size.
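A single step of the buffer-size rule in Equation 4.13 can be sketched as below. The function name is hypothetical, a previous action of zero is treated as an increase, and the result is clamped between 0 and the RAM size of 640 MByte; these choices are assumptions made for the illustration, not part of the paper's description.

```python
def next_buffer_size(bufsize, prev_delta, s_prev, s_curr,
                     gain=0.5, ram_size=640.0):
    """One step of the buffer-size control rule (Equation 4.13).

    bufsize: current buffer size in MByte.
    prev_delta: previous control action (its sign gives the direction).
    s_prev, s_curr: average slowdowns S_{u-1} and S_u.
    Returns the new buffer size and the control action that was computed.
    """
    direction = 1.0 if prev_delta >= 0 else -1.0   # zero treated as increase (assumption)
    delta = gain * (s_prev - s_curr) * direction
    new_size = min(ram_size, max(0.0, bufsize + delta))
    return new_size, delta


# The slowdown improved (40 -> 30) after the buffer was enlarged: keep enlarging.
grown, _ = next_buffer_size(160.0, +10.0, s_prev=40.0, s_curr=30.0)
# The slowdown got worse (30 -> 40) after the buffer was reduced: enlarge it.
recovered, _ = next_buffer_size(160.0, -10.0, s_prev=30.0, s_curr=40.0)
# The slowdown got worse (30 -> 40) after the buffer was enlarged: shrink it.
shrunk, _ = next_buffer_size(160.0, +10.0, s_prev=30.0, s_curr=40.0)
```

Unlike Equation 4.9, this rule uses the absolute difference of the average slowdowns, so the step size scales with how much the slowdown changed.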

The extra time spent in performing feedback control is negligible and, therefore, the overhead introduced by the feedback control mechanism is ignored in our simulation experiments. The reason is that the complexity of the mechanism is low, and it takes constant time to make a feedback control decision.

5. Experiments and Results. To evaluate the performance of the proposed load-balancing scheme with a feedback control mechanism, we have conducted a trace-driven simulation, in which the performance metric used is the slowdown defined earlier in Section 3. We have evaluated the performance of the following load-balancing policies:

1. CM: the CPU-memory-based load-balancing policy [27] without using the buffer feedback controller. If the memory is imbalanced, CM assigns the newly arrived job to the node that has the least accumulated memory load. When CPU load is imbalanced and memory load is well balanced, CM attempts to balance CPU load.

2. IO: the IO-based policy [11] without using the feedback control mechanism. The IO policy uses a load index that represents only the I/O load. For a job arriving in node $i$, the IO scheme greedily assigns the job to the node that has the least accumulated I/O load.

3. WAL: the Weighted Average Load-balancing scheme without the feedback controller [21].

4. WAL-FC: the Weighted Average Load-balancing scheme with the feedback control mechanism.

5. NLB: the non-load-balancing policy without using the feedback controller.

To study dynamic load balancing, Harchol-Balter and Downey [9] implemented a trace-driven simulator for a distributed system with 6 nodes in which round-robin scheduling is employed. The load balancing policy studied in that simulator is CPU-based. Zhang et al. [27] extended the simulator, incorporating memory resources into the simulation system. Based on the simulator presented in [27], our simulator incorporates the following new features: (1) The above policies are implemented in the simulator. (2) The interconnect is assumed to be a fully connected network. (3) A simple disk model is added into the simulator. (4) An I/O buffer model, which will be presented shortly in this section, is implemented on top of the disk model. The traces used in the simulation are modified from [9, 27], and it is assumed that the I/O access rate is randomly chosen in accordance with a uniform distribution. We assume that the I/O access rate of each job is independent of the job's memory space requirement and CPU service time. Although this simplification deflates any correlations between I/O requirement and other job characteristics, we can examine the impact of I/O requirement on system performance by configuring the mean I/O access rate as a workload parameter.

The simulated system is configured with the parameters listed in Table 5.1. The parameters for CPU, memory, disks, and network are chosen in such a way that they resemble a typical cluster of the current day.

Table 5.1
Data Characteristics

Parameters             Value        Parameters                 Value
CPU Speed              800 MIPS     Page Fault Service Time    8.1 ms
RAM Size               640 MByte    Seek and Rotation Time     8.0 ms
Initial Buffer Size    160 MByte    Disk Transfer Rate         40 MB/Sec.
Context Switch Time    0.1 ms       Network Bandwidth          1 Gbps

Disk accesses of each job are modeled as a Poisson process. Data sizes $d^{RW}_u$ of the I/O requests in job $u$ are randomly generated based on a Gamma distribution with the mean size of 250 KByte and the standard deviation of 50 KByte. The sizes chosen in this way reflect typical data characteristics for MPEG-1 data [2], which is retrieved by many multimedia applications.
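The I/O request model described above can be sketched with the Python standard library. This is an illustrative generator, not the simulator used in the paper: the function name is hypothetical, and the Gamma shape and scale are derived from the stated mean and standard deviation using shape $= (\text{mean}/\text{std})^2$ and scale $= \text{std}^2/\text{mean}$.

```python
import random

MEAN_KB = 250.0   # mean request size stated in the text
STD_KB = 50.0     # standard deviation stated in the text
SHAPE = (MEAN_KB / STD_KB) ** 2    # Gamma shape parameter
SCALE = STD_KB ** 2 / MEAN_KB      # Gamma scale parameter, in KByte


def generate_requests(access_rate_per_ms, duration_ms, rng):
    """Generate (time_ms, size_kb) pairs for one job.

    Arrivals form a Poisson process, i.e. exponentially distributed gaps
    with the given rate; sizes follow the Gamma distribution above.
    """
    requests = []
    t = 0.0
    while True:
        t += rng.expovariate(access_rate_per_ms)
        if t >= duration_ms:
            break
        requests.append((t, rng.gammavariate(SHAPE, SCALE)))
    return requests


rng = random.Random(42)   # fixed seed so that the run is repeatable
requests = generate_requests(access_rate_per_ms=2.8, duration_ms=1000.0, rng=rng)
mean_size = sum(size for _, size in requests) / len(requests)
```

Over a long enough interval the number of generated requests approaches the access rate times the duration, and the sample mean of the sizes approaches 250 KByte.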

Since the buffer can be used to reduce the disk I/O access frequency (see Equation 4.4), we approximately model the buffer hit probability of I/O access for job $u$ running on node $i$ by the following formula:

$$h(i,u) = \begin{cases} \dfrac{r_u}{r_u + 1} & \text{if } d_{buf}(i,u) \ge d_{data}(u), \\[6pt] \dfrac{r_u}{r_u + 1} \times \dfrac{d_{buf}(i,u)}{d_{data}(u)} & \text{otherwise,} \end{cases} \qquad (5.1)$$

where $r_u$ is the data re-access rate, $d_{buf}(i,u)$ is the buffer size allocated to job $u$, and $d_{data}(u)$ is the amount of data job $u$ retrieves from or stores to the disk, given a buffer with infinite size. I/O buffer in a node is a [...] are estimated using the following two equations:

$$d_{buf}(i,u) = \frac{\lambda_u\, d^{RW}_u\, d_{buf}(i)}{\sum_{k \in M_i} \lambda_k\, d^{RW}_k}, \qquad (5.2)$$

$$d_{data}(u) = \frac{\lambda_u\, t_u\, d^{RW}_u}{r_u + 1}. \qquad (5.3)$$

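From Equations 5.1, 5.2 and 5.3, the hit rate $h(i,u)$ becomes:

$$h(i,u) = \begin{cases} \dfrac{r_u}{r_u + 1} & \text{if } d_{buf}(i,u) \ge d_{data}(u), \\[6pt] \dfrac{r_u\, d_{buf}(i)}{t_u \sum_{j \in M_i} \lambda_j\, d^{RW}_j} & \text{otherwise.} \end{cases} \qquad (5.4)$$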
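The step from Equations 5.1 to 5.3 to the closed form in Equation 5.4 can be checked numerically. The sketch below computes the hit probability both ways for a hypothetical node with two jobs; the dictionary keys, the function names, and all numeric values are invented for the illustration, with request sizes and buffer sizes both expressed in MByte.

```python
def buffer_share(u, jobs, node_buffer):
    """d_buf(i, u): part of the node buffer allocated to job u (Equation 5.2)."""
    total = sum(j["rate"] * j["req_size"] for j in jobs)
    return jobs[u]["rate"] * jobs[u]["req_size"] * node_buffer / total


def data_amount(job):
    """d_data(u): data moved to or from disk with an infinite buffer (Equation 5.3)."""
    return job["rate"] * job["time"] * job["req_size"] / (job["reaccess"] + 1)


def hit_probability(u, jobs, node_buffer):
    """h(i, u) following Equation 5.1."""
    job = jobs[u]
    base = job["reaccess"] / (job["reaccess"] + 1)
    d_buf, d_data = buffer_share(u, jobs, node_buffer), data_amount(job)
    return base if d_buf >= d_data else base * d_buf / d_data


def hit_probability_closed_form(u, jobs, node_buffer):
    """h(i, u) following the closed form of Equation 5.4."""
    job = jobs[u]
    if buffer_share(u, jobs, node_buffer) >= data_amount(job):
        return job["reaccess"] / (job["reaccess"] + 1)
    total = sum(j["rate"] * j["req_size"] for j in jobs)
    return job["reaccess"] * node_buffer / (job["time"] * total)


# rate: I/O accesses per ms, time: computation time in ms,
# req_size: request size in MByte, reaccess: data re-access rate r_u.
jobs = [
    {"rate": 2.0, "time": 1000.0, "req_size": 0.25, "reaccess": 3.0},
    {"rate": 1.0, "time": 2000.0, "req_size": 0.25, "reaccess": 5.0},
]
h_51 = hit_probability(0, jobs, node_buffer=160.0)
h_54 = hit_probability_closed_form(0, jobs, node_buffer=160.0)
```

Both routes give the same value for the first job, and enlarging the node buffer until the job's share covers its data raises the hit probability to the ceiling $r_u / (r_u + 1)$.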

[...] size. The increasing rate of the buffer hit probability drops when the buffer size is greater than 150 Mbyte, suggesting that further increasing the buffer size cannot significantly improve the buffer hit probability when the buffer size approaches a level at which a large portion of the I/O data can be accommodated in the buffer.

Fig. 5.1. Buffer hit probability as a function of the buffer size (Mbyte), with curves for NL, CM, and IO at R=5 and R=3. Page-fault rate is 4.0 No./ms, I/O access rate is 2.2 No./ms.

Fig. 5.2. Mean slowdowns as a function of the page-fault rate (number of page faults per millisecond) for NLB, IO, WAL, and WAL-FC. I/O access rate of 0.1 No./ms.

5.2. Memory Intensive Workload. To simulate a memory intensive workload, the I/O access rate is fixed to a comparatively low level of 0.1 No./ms. The page-fault rate is set from 7.2 No./ms to 8.8 No./ms with increments of 0.2 No./ms. The performance of CM is omitted, since it is very close to that of WAL.

Figure 5.2 reveals that the mean slowdowns of all the policies increase with the page-fault rate. This is because, as I/O demands are fixed, a high page-fault rate leads to a high utilization of disks, causing longer waiting time on I/O processing.

When the page-fault rate is high, WAL outperforms IO and NLB, and WAL-FC has better performance than both WAL and IO. For example, the WAL policy reduces slowdowns over the IO policy by up to 37.2% (with an average of 31.5%), and the WAL-FC policy improves the performance in terms of mean slowdown over IO by up to a factor of 4 (400%). The reason is that the IO policy only attempts to balance explicit I/O load, ignoring the implicit I/O load that results from page faults. When the explicit I/O load is low, balancing explicit I/O load does not make a significant contribution to balancing the overall system load. In addition, NLB is consistently the worst among the six policies, since NLB leaves three shared resources extremely imbalanced and does not improve the buffer utilization by the adaptive configuration of buffer sizes.

More interestingly, the policies that use the feedback control mechanism considerably improve the performance over those that do not employ the feedback controller. For example, WAL-FC improves the system performance over WAL by up to 274% (with an average of 220%). Consequently, the slowdowns of NLB, WAL, and IO are more sensitive to the page-fault rate than that of WAL-FC.

5.3. I/O-Intensive Workload. To stress the I/O-intensive workload in this experiment, the I/O access rate is fixed at a high value of 2.8 No./ms, and the page-fault rate is chosen from 1.6 No./ms to 2.1 No./ms with increments of 0.1 No./ms. The low page-fault rates imply that, even when the requested memory space is larger than the allocated memory space, page faults do not occur frequently. This workload reflects a scenario where memory-intensive jobs exhibit high temporal and spatial locality of access. Figure 5.3 plots slowdown as a function of the page-fault rate. The results of IO are omitted from Figure 5.3, since they are nearly identical to those of WAL.

First, the results show that the WAL scheme significantly outperforms the NLB and CM policies, suggesting that NLB and CM are not suitable for I/O intensive workload. For example, as shown in Figure 5.3, WAL improves the performance of CM in terms of the mean slowdown by up to a factor of 9 (with an average of

Fig. 5.3. Mean slowdown as a function of the page-fault rate (number of page faults per millisecond) for NLB, CM, WAL, and WAL-FC. I/O access rate is 2.8 No./ms.

Second, Figure 5.3 shows that WAL-FC significantly outperforms WAL. For example, WAL-FC delivers a performance improvement over WAL by up to 282% (with an average of 125%). Again, this is because the WAL-FC scheme applies the feedback controller to meet the high I/O demands by changing the weights and the I/O buffer sizes to achieve a high buffer hit probability. This result suggests that improving the I/O buffer utilization by using the feedback control mechanism can potentially alleviate the performance degradation resulting from the imbalanced I/O load.

Third, the results further show that the slowdowns of NLB and CM are very sensitive to the page-fault rate. In other words, the mean slowdowns of NLB and CM all increase noticeably with the increasing value of I/O load. One reason is that, as the I/O load is fixed, a high page-fault rate leads to high disk utilization, causing longer waiting time on I/O processing. A second reason is that, when the I/O load is imbalanced, the explicit I/O load imposed on some node will be very high, leading to a longer page fault processing time. Conversely, the page-fault rate has an insignificant impact on the performance of WAL and WAL-FC, since the high I/O load imposed on the disks is diminished either by balancing the I/O load or by improving the buffer utilization. This observation suggests that the feedback control mechanism is capable of boosting the performance of clusters under I/O-intensive workload even in the absence of any dynamic load-balancing schemes.

5.4. Memory and I/O intensive Workload. The two previous sections presented the best cases for the proposed scheme since the workload was either highly memory-intensive or I/O-intensive but not both. In these extreme scenarios, the feedback control mechanism provides more benefits to clusters than load-balancing policies do. This section attempts to show another interesting case in which the cluster has a workload with both highly memory-intensive and I/O-intensive jobs. The I/O access rate is set to 1.5 No./ms. The page fault rate is from 7.2 No./ms to 8.4 No./ms with increments of 0.2 No./ms.

Figure 5.4 shows that the performances of CM, IO, and WAL are close to one another. This is because the trace used in this experiment comprises a good mixture of memory-intensive and I/O-intensive jobs. Hence, while CM takes advantage of balancing CPU-memory load, IO can enjoy the benefits of balancing I/O load. Interestingly, under this specific memory and I/O intensive workload, the resultant effect of balancing CPU-memory load is almost identical to that of balancing I/O load.

A second observation is that, under the memory and I/O intensive workload, load-balancing schemes achieve a higher level of improvement over NLB. The reason is that when both memory and I/O demands are high, the buffer sizes in a cluster are unlikely to be changed, as there is memory contention among memory-intensive and I/O-intensive jobs. Thus, instead of fluctuating widely to optimize the performance, the buffer sizes finally converge to a value that minimizes the mean slowdown.
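One way to picture the convergence described above is a simple feedback loop that nudges the buffer size, observes the resulting slowdown, and backs off when the change does not help. This is only a sketch under stated assumptions: the hill-climbing rule, the step halving, the function names, and the toy cost function are all illustrative and are not taken from the paper's controller.

```python
def tune_buffer_size(measure_slowdown, size, step, min_size, max_size,
                     rounds=50, min_step=1.0):
    """Hill-climb the I/O buffer size using observed slowdown as feedback.

    Keep moving in the current direction while the slowdown improves;
    otherwise stay put, reverse direction, and halve the step so that
    the size settles instead of oscillating.
    """
    best = measure_slowdown(size)
    direction = 1.0
    for _ in range(rounds):
        if step < min_step:
            break
        candidate = min(max_size, max(min_size, size + direction * step))
        observed = measure_slowdown(candidate)
        if observed < best:
            size, best = candidate, observed
        else:
            direction = -direction
            step /= 2.0
    return size


def toy_slowdown(buffer_mb):
    """Hypothetical convex cost: too small hurts I/O, too large hurts paging."""
    return 10.0 + (buffer_mb - 96.0) ** 2 / 100.0


print(tune_buffer_size(toy_slowdown, size=32.0, step=32.0,
                       min_size=8.0, max_size=256.0))
```

Halving the step after every unsuccessful move is the design choice that makes the size settle rather than fluctuate; a fixed step would keep bouncing around the minimum.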

Third, incorporating the feedback control mechanism into the existing load-balancing schemes is able to further boost the performance. For example, compared with WAL, WAL-FC further decreases the slowdown by up to 54.5% (with an average of 30.3%). This result suggests that, to sustain a high performance in

Fig. 5.4. Mean slowdown as a function of the page-fault rate; I/O access rate is 1.5 No./ms. (x-axis: number of page faults per millisecond, 7.2 to 8.4; y-axis: mean slowdown; curves: NLB, CM, IO, WAL, WAL-FC.)

Fig. 5.5. Mean slowdown as a function of the average data size. Page fault rate is 0.5 No./ms, and I/O rate is 2.6 No./ms. (x-axis: average data size in KByte, 100 to 400; y-axis: mean slowdown; curves: IOCM, WAL-PM, WAL-RE, each at I/O rate 2.6.)

5.5. Average Data Size. In the previous experiments, the data sizes are chosen based on typical multimedia applications. It is noted that I/O load depends on the I/O access rate and the average data size of I/O requests, which in turn rely on the I/O access patterns of applications. The purpose of this experiment is to study the performance improvements achieved by the feedback control mechanism for other types of applications if they exhibit different characteristics. Specifically, Figure 5.5 shows the impact of average data size on the performance of the feedback control mechanism. The page fault rate and the I/O access rate are set to 0.5 No./ms and 2.6 No./ms, respectively. The average data size is chosen from 100 KByte to 400 KByte with increments of 50 KByte.

Figure 5.5 indicates that, for the three examined load-balancing policies, the mean slowdown increases as the average data size increases. This is because, when both the page fault rate and the I/O access rate are fixed, a large average data size yields a high utilization of disks, causing longer waiting times on I/O processing. More importantly, Figure 5.5 shows that the performance improvement gained by the feedback control mechanism becomes more noticeable when the average data size is large. This result suggests that the proposed approach is not only beneficial for multimedia applications, but also turns out to be useful for a variety of applications that are data-intensive in nature.
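The dependence of I/O load on access rate and average data size can be made concrete with a little arithmetic. The sketch below sweeps the data sizes used in this experiment and reports the aggregate demand and a per-disk utilization. The number of disks and the per-disk bandwidth are hypothetical values chosen only so that the numbers stay below saturation; they are not parameters reported in this section.

```python
def io_demand_kbyte_per_ms(access_rate_per_ms, avg_data_size_kbyte):
    """Aggregate I/O demand: requests per ms times KByte per request."""
    return access_rate_per_ms * avg_data_size_kbyte


def per_disk_utilization(demand_kbyte_per_ms, num_disks, disk_kbyte_per_ms):
    """Fraction of each disk's bandwidth used if demand is spread evenly."""
    return demand_kbyte_per_ms / (num_disks * disk_kbyte_per_ms)


IO_RATE = 2.6        # No./ms, as in this experiment
NUM_DISKS = 32       # hypothetical cluster size
DISK_BANDWIDTH = 40  # hypothetical, KByte/ms per disk

for size in range(100, 401, 50):  # 100 KByte to 400 KByte, step 50
    demand = io_demand_kbyte_per_ms(IO_RATE, size)
    rho = per_disk_utilization(demand, NUM_DISKS, DISK_BANDWIDTH)
    print(f"{size:3d} KByte -> demand {demand:6.1f} KByte/ms, "
          f"utilization {rho:.3f}")
```

With the access rate held fixed, utilization grows linearly in the average data size, which is why larger requests translate into longer waiting times on I/O processing.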

6. Conclusions. In this paper, we have proposed a feedback control mechanism to dynamically adjust the weights of resources and the buffer sizes in a cluster with a general and practical workload that includes memory and I/O intensive workload conditions. The primary objective of the proposed approach is to minimize the number of page faults for memory-intensive jobs while improving the buffer utilization of I/O-intensive jobs. The feedback controller judiciously configures the weights to achieve an optimal performance. Meanwhile, under a workload where the memory demand is high, the buffer sizes are decreased to allocate more memory for memory-intensive jobs, thereby leading to a low page-fault rate.
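To make the role of the resource weights concrete, the sketch below assumes that a node's load index is a weighted sum of its CPU, memory, and I/O loads, each normalized to [0, 1], and that a job is sent to the node with the smallest index. The function names, node names, weights, and load values are hypothetical and serve only to show how shifting the weights changes which node looks least loaded.

```python
def normalize(weights):
    """Scale resource weights so that they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {name: value / total for name, value in weights.items()}


def load_index(loads, weights):
    """Weighted sum of per-resource loads (each load assumed in [0, 1])."""
    w = normalize(weights)
    return sum(w[name] * loads[name] for name in w)


def least_loaded(nodes, weights):
    """Name of the node with the smallest weighted load index."""
    return min(nodes, key=lambda name: load_index(nodes[name], weights))


# Hypothetical per-node loads, not taken from the paper's traces.
nodes = {
    "node-a": {"cpu": 0.2, "memory": 0.9, "io": 0.1},
    "node-b": {"cpu": 0.3, "memory": 0.2, "io": 0.8},
}
memory_heavy = {"cpu": 1.0, "memory": 4.0, "io": 1.0}
io_heavy = {"cpu": 1.0, "memory": 1.0, "io": 4.0}
print(least_loaded(nodes, memory_heavy))
print(least_loaded(nodes, io_heavy))
```

When memory carries the largest weight, the memory-starved node is avoided; when I/O carries the largest weight, the choice flips. A feedback controller that adjusts these weights is therefore steering placement toward whichever resource currently dominates the workload.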

To evaluate the performance of the mechanism, we compared the proposed WAL-FC scheme with WAL, CM, and IO. For comparison purposes, the NLB policy that does not consider load balancing is also simulated. A trace-driven simulation provides extensive empirical results demonstrating that WAL-FC is effective in enhancing the performance of existing dynamic load-balancing policies under memory-intensive or I/O-intensive workload. In particular, when the workload is memory-intensive, WAL-FC reduces the mean slowdown over the CM and IO policies by up to a factor of 9. Further, we have made the following observations:

1. When the page-fault rate is high and the I/O rate is very low, WAL and CM outperform IO and NLB, and WAL-FC has better performance than WAL;

2. When I/O demands are high, WAL and IO are significantly superior to CM and NLB, and WAL-FC has noticeably better performance than IO;

3. Under an I/O-intensive workload, the mean slowdowns of NLB and CM both increase noticeably with I/O load. Conversely, the page-fault rate has an insignificant impact on the performance of IO, WAL, and WAL-FC.


5. The performance improvement gained by the feedback control mechanism becomes pronounced when the average data size is relatively large.

Future studies in this area may be performed in several directions. First, the feedback control mechanism will be implemented in a cluster system. Second, we will study the stability of the proposed feedback controller. Finally, it will be interesting to study how quickly the feedback controller converges to the optimal value in clusters.

7. Acknowledgements. This work was partially supported by an NSF grant (EPS-0091900), a start-up research fund (103295) from the research and economic development office of the New Mexico Institute of Mining and Technology, a Nebraska University Foundation grant (26-0511-0019), a UNL Academic Program Priorities Grant, and a Chinese NSF 973 project grant (2004cb318201). We are grateful to the anonymous referees for their insightful suggestions and comments.

REFERENCES

[1] A. Acharya and S. Setia, Availability and Utility of Idle Memory in Workstation Clusters, Proc. ACM SIGMETRICS Conf. Measuring and Modeling of Computer Systems, May 1999.

[2] E. Balafoutis, G. Nerjes, P. Muth, M. Paterakis, P. Triantafillou, and G. Weikum, Clustered Scheduling Algorithms for Mixed-Media Disk Workloads, Proc. Int'l Conf. on Cluster Computing, 2002.

[3] J. Basney and M. Livny, Managing Network Resources in Condor, Proceedings of the Ninth IEEE Symposium on High Performance Distributed Computing, Pittsburgh, Pennsylvania, August 2000, pp. 298-299.

[4] C. Chang, B. Moon, A. Acharya, C. Shock, A. Sussman, and J. Saltz, Titan: A High-Performance Remote-sensing Database, Proc. of International Conference on Data Engineering, 1997.

[5] D. Eager, E. Lazowska, and J. Zahorjan, A comparison of receiver-initiated and sender-initiated adaptive load sharing, Proceedings of the 1985 ACM SIGMETRICS conference on Measurement and modeling of computer systems, Austin, Texas, 1985.

[6] D. J. Evans and W. U. N. Butt, Load balancing with network partitioning using host groups, Parallel Computing, 20:325-345, March 1994.

[7] M. Garey and D. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman, 1979.

[8] R. L. Graham, Bounds on Multiprocessing Timing Anomalies, SIAM J. Applied Math., Vol. 17, No. 2, pp. 416-429, 1969.

[9] M. Harchol-Balter and A. Downey, Exploiting Process Lifetime Distributions for Load Balancing, ACM Transactions on Computer Systems, vol. 3, no. 31, 1997.

[10] C. Hui and S. Chanson, Improved Strategies for Dynamic Load Sharing, IEEE Concurrency, vol. 7, no. 3, 1999.

[11] L. Lee, P. Scheuermann, and R. Vingralek, File Assignment in Parallel I/O Systems with Minimal Variance of Service time, IEEE Trans. on Computers, Vol. 49, No. 2, pp. 127-140, 2000.

[12] B. Li and K. Nahrstedt, A Control Theoretical Model for Quality of Service Adaptations, in IEEE International Workshop on Quality of Service, May 1998.

[13] F. C. H. Lin and R. M. Keller, The Gradient Model Load Balancing Method, IEEE Trans. Software Engineering, vol. 13, no. 1, pp. 32-38, Jan. 1987.

[14] F. Muniz and E. J. Zaluska, Parallel Load Balancing: An Extension to the Gradient Model, Parallel Computing, vol. 21, pp. 287-301, 1995.

[15] R. Pollak, A Hierarchical Load Balancing Environment for Parallel and Distributed Supercomputer, Proc. of the International Symposium on Parallel and Distributed Supercomputing, Fukuoka, Japan, September 1995.

[16] X. Qin, H. Jiang, and D. R. Swanson, An Efficient Fault-tolerant Scheduling Algorithm for Real-time Tasks with Precedence Constraints in Heterogeneous Systems, Proc. the 31st Int'l Conf. on Parallel Processing (ICPP 2002), Vancouver, Canada, Aug 2002, pp. 360-368.

[17] X. Qin, H. Jiang, Y. Zhu, and D. Swanson, Dynamic Load Balancing for I/O-Intensive Tasks on Heterogeneous Clusters, Proc. of the 10th International Conference on High Performance Computing (HiPC 2003), Dec. 17-20, 2003, Hyderabad, India.

[18] P. Scheuermann, G. Weikum, and P. Zabback, Data Partitioning and Load Balancing in Parallel Disk Systems, The VLDB Journal, pp. 48-66, July 1998.

[19] H. Shen, S. Lor, and P. Maheshwari, An architecture-independent graphical tool for automatic contention-free process-to-processor mapping, Journal of Supercomputing, Vol. 18, No. 2, 2001, pp. 115-139.

[20] D. C. Steere, A. Goel, J. Gruenberg, et al., A Feedback-driven Proportion Allocator for Real-Rate Scheduling, Operating Systems Design and Implementation, New Orleans, Louisiana, Feb 1999.

[21] M. Surdeanu, D. Moldovan, and S. Harabagiu, Performance Analysis of a Distributed Question/Answering System, IEEE Trans. on Parallel and Distributed Systems, Vol. 13, No. 6, pp. 579-596, 2002.

[22] T. Tanaka, Configurations of the Solar Wind Flow and Magnetic Field around the Planets with no Magnetic field: Calculation by a new MHD, Journal of Geophysical Research, pp. 17251-17262, Oct. 1993.

[23] G. Voelker, Managing Server Load in Global Memory Systems, Proc. ACM SIGMETRICS Conf. Measuring and Modeling of Computer Systems, May 1997.

[24] M. Willebeek-LeMair and A. Reeves, Strategies for Dynamic Load Balancing on Highly Parallel Computers, IEEE Trans. Parallel and Distributed Systems, vol. 4, no. 9, pp. 979-993, Sept. 1993.

[25] X. Wu, V. Taylor, and R. Stevens, Design and Implementation of Prophesy Automatic Instrumentation and Data Entry


[26] L. Xiao, S. Chen, and X. Zhang, Dynamic Cluster Resource Allocations for Jobs with Known and Unknown Memory Demands, IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 3, 2002.

[27] X. Zhang, Y. Qu, and L. Xiao, Improving Distributed Workload Performance by Sharing both CPU and Memory Resources, Proc. 20th Int'l Conf. Distributed Computing Systems (ICDCS 2000), Apr. 2000.

Edited by: Hong Shen

Received: February 27, 2004