

and Scheduling

Shimon Whiteson and Peter Stone

Department of Computer Sciences

The University of Texas at Austin

1 University Station, C0500
Austin, TX 78712-0233
{shimon,pstone}@cs.utexas.edu
http://www.cs.utexas.edu/~{shimon,pstone}
May 7, 2004

Abstract

Computer systems are rapidly becoming so complex that maintaining them with human support staffs will be prohibitively expensive and inefficient. In response, visionaries have begun proposing that computer systems be imbued with the ability to configure themselves, diagnose failures, and ultimately repair themselves in response to these failures. However, despite convincing arguments that such a shift would be desirable, as of yet there has been little concrete progress made towards this goal. We view these problems as fundamentally machine learning challenges. Hence, this article presents a new network simulator designed to study the application of machine learning methods from a system-wide perspective. We also introduce learning-based methods for addressing the problems of job routing and CPU scheduling in the networks we simulate. Our experimental results verify that methods using machine learning outperform reasonable heuristic and hand-coded approaches on example networks designed to capture many of the complexities that exist in real systems.

1 Introduction

Computer systems are rapidly becoming (indeed, some would say have already become) so complex that maintaining them with human support staffs will be prohibitively expensive and inefficient. Large enterprise systems, such as those found in medium to large companies, are prime examples of this phenomenon. Nonetheless, most computer systems today are still built to rely on static configurations and can be installed, configured, and re-configured only


the ability to configure themselves, diagnose failures, and ultimately repair themselves in response to these failures. The resulting shift in computational paradigm has been called by different names, including cognitive systems (Brachman, 2002) and autonomic computing (Kephart & Chess, 2003), but the underlying motivation and goal are remarkably similar.

Our long-term goal is to enable large-scale integrated computer systems, consisting of tens to hundreds of machines with varying functionality, to be delivered in a default configuration and then incrementally tune themselves to the needs of a particular enterprise based on observed usage patterns. In addition, the systems should be able to adapt to changes in connectivity due to system failures and/or component upgrades.

We view these goals as fundamentally machine learning challenges. For computer systems to optimize their own performance without human assistance, they will need to learn from experience. To continually adjust in response to a dynamic environment, they will need the adaptability that only machine learning can offer.

Fully automating the maintenance and optimization of a large computer system via machine learning methods is a staggering challenge. If it is not achievable today, what short-term goals should we set to maximize the likelihood that it will be achievable tomorrow? One approach is bottom-up: instead of optimizing an entire system, we can optimize its individual components. Though the study of autonomic computing is still in its early stages, there has already been a lot of preliminary progress with the bottom-up approach (see Section 7 for a thorough survey of related work). For example, many researchers have used machine learning to optimize network routing (Boyan & Littman, 1994; Di Caro & Dorigo, 1998; Clark et al., 2003; Itao et al., 2001). In addition, Gomez et al. (2001) used neuroevolution to optimize dynamic resource allocation on a chip multiprocessor. Chen et al. (2004) used decision trees to diagnose system failures and Mesnier et al. (2004) used decision trees to classify different file types and therefore improve disk performance. Furthermore, Brauer and Weiss (1998) used reinforcement learning methods to optimize the scheduling of tasks on multiple machines.

The bottom-up approach is appealing because optimizing individual components is much more feasible than optimizing entire systems. Eventually, these components can be assembled into an autonomic system that should perform better than manually configured ones. However, such an approach fails to address the effects of interactions between various components and does not capitalize on opportunities to optimize at a system-wide level. Therefore, we propose a top-down approach to developing autonomic systems. Our emphasis is on optimizing the entire system by developing autonomic components that work well, not just independently, but in concert with other such components.

Since doing so in a real, full-fledged enterprise system is not currently feasible, we introduce in this article a high-level simulator designed to facilitate the study of machine learning in enterprise systems. Our simulator captures some of the key complexities that make system-wide autonomic computing a challenge, while abstracting away the low-level details that currently make it impractical to create fully autonomic systems on real hardware.

In addition to introducing this new tool, we present machine learning approaches to


validate the notion of a top-down approach to autonomic computing.

The remainder of this article is organized as follows. Section 2 provides background on our network simulator and Section 3 details our methods for optimizing routing and scheduling. Section 4 explains our experimental framework and Section 5 presents the results of those experiments. Section 6 discusses the implications of these results, Section 7 reviews related work, and Section 8 highlights some opportunities for future work.

2 Simulation

To pursue our research goals, we need a high-level simulator that is capable of modeling the relevant types of interactions among the many different components of a computer system. While detailed simulators exist for individual system components, such as networks, databases, etc., to our knowledge there is no simulator that models system-wide interactions.

Therefore, we have designed and implemented a system that simulates the way a computer network processes user requests from a high-level perspective. The simulator is very general purpose and can be used to represent many different kinds of networks. For example, Figure 1 depicts a commercial enterprise system in which a set of users use a web interface to check their mail or query a database.

Figure 1: An example of a network implemented in our simulator; ovals represent users and rectangles represent machines; the lines between them represent links that allow communication of jobs or other packets. (Node labels in the figure: User, Load Balancer, Web Server, Mail Server, Database.)

The simulator represents a computer network as a graph: nodes represent machines or users and links represent the communication channels between them. Users create jobs that travel from machine to machine along links until all of their steps are completed. In Figure 1, users forward their requests to a Load Balancer which selects a Web Server to handle the job. The Web Server forwards the job to a Mail Server or Database as appropriate. When completed, the job is sent back to the user who created it.


There are two primary types of nodes: users and machines.

Users: Users are special nodes that create jobs and send them to machines for processing. Once a job is completed, it is returned to the user, who computes its score.

Machines: A machine is a node that can complete portions of a job. Each type of machine is defined by the set of steps it knows how to complete. Completing steps uses the machine's CPU time, so each machine must have an algorithm for allocating CPU time among the jobs currently in its possession. If a machine cannot complete the next step of a given job, it must look for a neighboring node that can. If several neighbors qualify, the machine must make a routing decision, i.e., it must try to determine which of the contending neighbors to forward the job to so as to optimize performance of the whole system. An intelligent agent can be used to control a machine and make decisions about how to allocate CPU time and route jobs.

Links: A link connects two nodes in the network. It is used to transfer jobs and other packets between nodes.

Packets: A packet is a unit of information that travels between nodes along links. The most common type of packet is a job, described below, but nodes can create other types of packets in order to communicate with other nodes about their status.

Jobs: A job is a series of steps that need to be completed in a specified order. For example, a user who wishes to buy something off a web site might create a "purchase job." This job might include steps such as accessing the customer database, confirming credit card information, and generating an order confirmation. Completing these steps could require the job to travel among several machines. A system usually has several types of jobs which differ in the list of steps they require for completion.

Steps: A step is one component of a job. Each step can only be carried out by a subset of machines in the network. For example, the retrieval of information in response to a database query must happen at a database server.
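The entities defined above map naturally onto a few small data structures. The following is a minimal sketch, not the simulator's actual implementation; the class and field names (`Step`, `Job`, `Machine`, `contenders`) and the example work values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    kind: str   # e.g. "web", "mail", "database"
    work: int   # CPU cycles needed to finish the step


@dataclass
class Job:
    job_type: str
    creator: str
    steps: list              # remaining steps, in the required order
    created_at: int = 0

    def next_step(self):
        return self.steps[0] if self.steps else None


@dataclass
class Machine:
    name: str
    step_kinds: set          # kinds of step this machine can complete
    neighbors: list = field(default_factory=list)

    def can_complete(self, job):
        step = job.next_step()
        return step is not None and step.kind in self.step_kinds

    def contenders(self, job):
        """Neighbors able to complete the job's next step."""
        return [n for n in self.neighbors if n.can_complete(job)]


# A web server linked to a mail server and a database (hypothetical topology).
web = Machine("web1", {"web"})
mail = Machine("mail1", {"mail"})
db = Machine("db1", {"database"})
web.neighbors = [mail, db]

job = Job("mail_job", "intern", [Step("mail", 100), Step("web", 50)])
print([m.name for m in web.contenders(job)])  # ['mail1']
```

Keeping the remaining steps as an ordered list makes the "next step" lookup trivial, which is the only job information a machine needs when deciding whether to process or forward.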

In the example shown in Figure 1, Mail Jobs must travel to a Web Server, a Mail Server, and back to a Web Server before returning to the user. Database Jobs must travel to a Web Server, a Database, and back to a Web Server before returning to the user. The goal of the agents controlling the system is to process these user requests as efficiently as possible.

A simulation proceeds for a specified number of discrete timesteps. At each timestep, machines can allocate their CPU cycles towards the completion of steps on the jobs in their possession, packets can be sent along links, and packets can arrive at new nodes.

We believe that this simulator provides a valuable testbed for new approaches to autonomic computing. Because its design is very general, it can be used to represent a wide


ior. Most importantly, the simulator captures many of the real world problems associated with complex computer systems while retaining the simplicity that makes experimental research feasible.

2.1 Load Updates

Each machine periodically (in our case, every five timesteps) sends a special packet called a Load Update to each of its neighbors. A Load Update indicates how many jobs that machine already has in its queue. The contents of such an update can help an intelligent router make better decisions. However, the very presence of the update is also important information: if a machine does not receive any updates from a given neighbor for a certain period of time (ten timesteps), it concludes that that neighbor has gone down and will no longer route any jobs to it until it receives another update. The system has a strong incentive to quickly detect when machines go down, since any jobs routed to a down machine receive a stiff penalty. For the purposes of scoring, such jobs are given a completion time of 500 timesteps, though in fact they are never actually completed. Since Load Updates are communicated via packets, they incur real network traffic overhead in the simulator. As long as they are not too frequent, including them as a routine occurrence is not unrealistic.
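The bookkeeping implied by Load Updates can be sketched as follows. This is an illustrative sketch only: the class `NeighborTable`, its method names, and the exact boundary condition for the timeout are assumptions; the five- and ten-timestep constants come from the text.

```python
UPDATE_PERIOD = 5    # timesteps between Load Updates (from the text)
DOWN_TIMEOUT = 10    # silence after which a neighbor is presumed down


def should_send_update(now):
    """Assumed schedule: send on every multiple of the update period."""
    return now % UPDATE_PERIOD == 0


class NeighborTable:
    """Tracks the last reported load and liveness of each neighbor."""

    def __init__(self, neighbors, now=0):
        self.load = {n: 0 for n in neighbors}
        self.last_heard = {n: now for n in neighbors}

    def on_load_update(self, neighbor, queue_length, now):
        self.load[neighbor] = queue_length
        self.last_heard[neighbor] = now

    def is_up(self, neighbor, now):
        # Assumption: exactly DOWN_TIMEOUT timesteps of silence counts as down.
        return now - self.last_heard[neighbor] < DOWN_TIMEOUT

    def routable(self, now):
        return [n for n in self.load if self.is_up(n, now)]


table = NeighborTable(["web1", "web2"])
table.on_load_update("web1", queue_length=3, now=5)
print(table.routable(now=12))  # ['web1']; web2 has been silent for 12 timesteps
```

A neighbor becomes routable again as soon as `on_load_update` is called for it, matching the rule that routing resumes when another update arrives.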

2.2 Utility Functions

The ultimate goal of our efforts is to improve the network's utility to its users. In the real world, that utility is not necessarily straightforward. While it is safe to assume that users always want their jobs completed as quickly as possible, the value of reducing a job's completion time is not always the same. Furthermore, each user may have a different importance to the system.

In order to capture these complexities, the simulator allows different utility functions for each user or job type (Walsh et al., 2004). Each utility function can be any monotonically decreasing function that maps a job's completion time to its utility. Hence, the goal of the agents controlling the network is not to minimize average completion time, but to maximize the cumulative utility over all the jobs the network is asked to process.
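The difference between minimizing average completion time and maximizing cumulative utility can be made concrete with a small calculation. The sketch below is illustrative: the two linear utility functions and the completion times are assumed values chosen for the example, and the function names are hypothetical.

```python
def cumulative_utility(completed, utility_for):
    """Sum the utility of each (user, completion_time) pair."""
    return sum(utility_for[user](t) for user, t in completed)


def mean_time(completed):
    return sum(t for _, t in completed) / len(completed)


# Assumed monotonically decreasing utility functions, one per user.
utility_for = {
    "intern": lambda t: -t,
    "ceo": lambda t: -100 * t,
}

# Two hypothetical outcomes for the same pair of jobs.
schedule_a = [("intern", 10), ("ceo", 40)]   # favors the intern
schedule_b = [("intern", 45), ("ceo", 10)]   # favors the CEO

print(mean_time(schedule_a), cumulative_utility(schedule_a, utility_for))  # 25.0 -4010
print(mean_time(schedule_b), cumulative_utility(schedule_b, utility_for))  # 27.5 -1045
```

Outcome B has the worse average completion time but a far higher cumulative utility, which is why the two objectives cannot be treated as interchangeable.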

3 Method

In this section, we present our approach to developing intelligent routers and schedulers for networks like the ones shown in Figure 1. Achieving good performance in such a network using fixed algorithms and hand-coded heuristics is very difficult and prone to inflexibility. Instead, we use reinforcement learning to develop routers and schedulers that are efficient, robust, and adaptable. The rest of this section explains the details of our approach to these

3.1 Routing

As traditionally posed, the packet routing problem requires a node in a network to decide to which neighboring node to forward a given packet such that it will reach its destination most quickly. In the network simulation described above, each machine faces a similar but not identical problem each time it finishes processing a job. When it is unable to complete the next step required by the job, it must search among its neighbors for machines that can complete that step (or that can forward the job to machines that can complete it). If more than one neighbor qualifies, the machine should make the choice that allows the job to be completed as quickly as possible.

In both our task and the traditional routing problem, the router tries to minimize the travel time of a packet given only local information about the network. However, in the traditional problem the goal is only to get the packet from its source to a specified destination. In our domain, this goal is not relevant. In fact, since a job returns to its creator when it is completed, the source and destination are the same. Instead, we want the job to travel along a path that allows the appropriate machines to complete its various steps in sequence and return to its creator in minimal time.

In this section, we present four ways of addressing this modified routing problem: a random method, two reasonable heuristic methods, and Q-routing, a method based on reinforcement learning.

3.1.1 Random Router

As its name implies, the random router forwards jobs to a machine selected randomly from the set of contenders C. A neighboring machine is a contender if it is capable of completing the job's next step. If no such machines exist, then C is the set of all neighbors who can forward the job to a machine that can complete its next step. In the random router, the probability that a given job will be forwarded to a specific contender $c \in C$ is:

$$P_c = \frac{1}{|C|}$$

where |C| is the size of C. Despite its simplicity, the random router is not without merit. For example, if all the neighbors have the same speed and do not receive load from anywhere else, the random router will keep the load on those neighbors evenly balanced. Of course, it does not address any of the complications that make routing a non-trivial problem and hence we expect it to perform poorly in real world scenarios.
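The contender set and the uniform choice can be sketched in a few lines. This is a hedged illustration: the function names and the two lookup tables (`can_do`, `can_forward`) are hypothetical stand-ins for whatever capability information a machine holds about its neighbors.

```python
import random


def contender_set(neighbors, step_kind, can_do, can_forward):
    """Neighbors that can complete the step; failing that, those that can forward it."""
    direct = [n for n in neighbors if step_kind in can_do[n]]
    if direct:
        return direct
    return [n for n in neighbors if step_kind in can_forward[n]]


def random_route(contenders, rng=random):
    """Each contender is chosen with probability 1 / |C|."""
    return rng.choice(contenders)


# Hypothetical capability tables.
can_do = {"lb": set(), "web1": {"web"}, "web2": {"web"}}
can_forward = {"lb": {"web"}, "web1": {"mail", "database"}, "web2": {"mail", "database"}}

from_load_balancer = contender_set(["web1", "web2"], "web", can_do, can_forward)
from_user = contender_set(["lb"], "web", can_do, can_forward)
print(from_load_balancer, from_user)  # ['web1', 'web2'] ['lb']
```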

3.1.2 Speed-Based Router

Without the aid of learning techniques or global information about the network, a router cannot be expected to perform optimally. However, it can do much better than the random router by exploiting the available local information, like the speed of its neighbors, to make


$$P_c = \frac{\mathrm{speed}(c)}{\sum_{c' \in C} \mathrm{speed}(c')} \quad \text{for each } c \in C$$

Hence, if there are two qualifying neighbors and one is twice as fast as the other, a given packet will have a 2/3 probability of going to the fast machine and a 1/3 probability of going to the slower one. This algorithm ignores both the load these neighbors might be receiving from other machines and the status of any machines the packet might be sent to later. Hence, it acts as a myopic load balancer.
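The 2/3 versus 1/3 split can be checked directly. The sketch below assumes neighbor speeds are available as a dictionary; the function names are hypothetical.

```python
import random


def speed_probabilities(speeds):
    """Probability of routing to each contender, proportional to its speed."""
    total = sum(speeds.values())
    return {c: s / total for c, s in speeds.items()}


def speed_based_route(speeds, rng=random):
    """Sample one contender with speed-proportional probability."""
    contenders = list(speeds)
    weights = [speeds[c] for c in contenders]
    return rng.choices(contenders, weights=weights, k=1)[0]


probs = speed_probabilities({"fast": 2.0, "slow": 1.0})
print(probs)  # fast is about 0.667, slow is about 0.333
```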

3.1.3 Load-Based Router

Another heuristic approach to routing is to utilize information about the load on neighboring nodes, received in Load Updates. In this case, the router always routes to the qualifying neighbor with the lowest currently estimated load. Hence, for each $c \in C$,

$$P_c = \begin{cases} 1, & \text{if } \mathrm{load}(c) \le \mathrm{load}(c') \text{ for all } c' \in C \\ 0, & \text{otherwise} \end{cases}$$

If the capacity of the neighboring machines is a critical factor in the system's performance, then load information is likely to be highly useful and the Load-Based Router will perform very well. However, if the system's bottleneck is not adjacent to the machine making a routing decision, then its neighbors will often have no load and this heuristic will perform identically to the Random Router.
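A load-based choice can be sketched as below. The random tie-break is an assumption, made so that the router degenerates to uniform random choice when every neighbor reports the same load, as the text describes; the function name is hypothetical.

```python
import random


def load_based_route(estimated_load, rng=random):
    """Route to the contender with the lowest estimated load.

    Ties are broken uniformly at random (an assumption), so when all
    neighbors report zero load the behavior matches the random router.
    """
    lowest = min(estimated_load.values())
    tied = [c for c, load in estimated_load.items() if load == lowest]
    return rng.choice(tied)


print(load_based_route({"web1": 3, "web2": 1, "web3": 2}))  # web2
```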

There are many other feasible routing heuristics besides those presented here (e.g. considering both load and speed when routing). However, all such heuristics must make their decisions based only on information about immediate neighbors, which may or may not be a useful guide to efficient routing. By contrast, a machine using Q-routing, presented below, can learn to route well even when the critical parts of the networks are not adjacent to it.

3.1.4 Q-Router

Despite the distinctive features of our version of the routing problem, techniques developed to solve the traditional version can, with modification, be applied to the task faced by machines in our simulation. In this article, we adapt one such technique, called Q-routing (Boyan & Littman, 1994), to improve the performance of our network. Q-routing is an on-line learning technique in which reinforcement learning modules are inserted into each node of the network. Reinforcement learning (Sutton & Barto, 1998) agents attempt to learn effective control policies by observing the positive and negative rewards they receive from behaving in different ways in different situations. From this feedback, reinforcement learning agents learn a value function, which estimates the long-term value of taking a certain action in a certain state. Once the value function is known, deriving an effective policy is trivial: in each state the agent simply takes the action that the value function estimates will reap the


to minimize a time-to-go function, which estimates how long a given packet will take to complete if it is routed to a particular neighbor. Each node x maintains a table of estimates about the time-to-go of different types of packets. Each entry $Q_x(d, y)$ is an estimate of how much additional time a packet will take to travel from x to its ultimate destination d if it is forwarded to y, a neighbor of x. If x sends a packet to y, it will immediately get back an estimate t for x's time-to-go, which is based on the values in y's Q-table:

$$t = \min_{z \in Z} Q_y(d, z)$$

where Z is the set of y's neighbors. With this information, x can update its estimate of the time-to-go for packets bound for d that are sent to y. If q is the time the packet spent in x's queue and s is the time the packet spent traveling between x and y, then the following update rule applies:

$$Q_x(d, y) = (1 - \alpha)\, Q_x(d, y) + \alpha\, (q + s + t)$$

where $\alpha$ is a learning rate parameter (0.7 in our experiments). In the standard terms of reinforcement learning (Sutton & Barto, 1998), q + s represents the instantaneous reward (cost) and t is the estimated value of the next state, y.

By bootstrapping off the values in its neighbors' Q-tables, this update rule allows each node to improve its estimate of a packet's time-to-go without waiting for that packet to reach its final destination. This approach is based directly on the Q-learning method (Watkins, 1989). Once reasonable Q-values have been learned, packets can be routed efficiently by simply consulting the appropriate entries in the Q-table and routing to the neighbor with the lowest estimated time-to-go for packets with the given destination.
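The update rule can be sketched as a small table-based learner. This is a minimal illustration rather than the implementation used in the experiments: the class name `QRouter`, the zero initial estimate, and the example numbers are assumptions; the learning rate of 0.7 comes from the text.

```python
from collections import defaultdict

ALPHA = 0.7  # learning rate reported in the text


class QRouter:
    """Per-node table of time-to-go estimates, keyed by (destination, neighbor)."""

    def __init__(self, neighbors, initial=0.0):
        self.neighbors = list(neighbors)
        # Assumption: unseen entries start at `initial`.
        self.q = defaultdict(lambda: initial)

    def estimate(self, d):
        """t = min over this node's neighbors z of Q(d, z), sent back to the sender."""
        return min(self.q[(d, z)] for z in self.neighbors)

    def update(self, d, y, queue_time, travel_time, t):
        """Blend the old estimate with the new sample q + s + t."""
        old = self.q[(d, y)]
        self.q[(d, y)] = (1 - ALPHA) * old + ALPHA * (queue_time + travel_time + t)

    def best_neighbor(self, d):
        return min(self.neighbors, key=lambda y: self.q[(d, y)])


x = QRouter(["y1", "y2"])
y1 = QRouter(["z1"])
y1.q[("d", "z1")] = 10.0

t = y1.estimate("d")                                   # 10.0
x.update("d", "y1", queue_time=2, travel_time=1, t=t)
print(x.q[("d", "y1")])                                # roughly 9.1 = 0.3 * 0 + 0.7 * 13
```

With a zero initial value, untried neighbors look attractive at first, so `x.best_neighbor("d")` returns `"y2"` here; that optimistic start is a side effect of the assumed initialization, not something the text specifies.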

State Representation. To make Q-routing more suitable for our unique version of the routing problem, we must change the state features on which learning is based. Instead of containing simply the job's destination, the Q-tables contain three features that indicate in what general direction the job is headed (and therefore what machine resources it will likely tax if routed in a particular way):

• the type of the job,

• the type of the next step the job needs completed, and

• the user who created the job.

In addition, we want a fourth state feature that allows the router to consider how much load is already on the neighbors to which it is considering forwarding a job. We could add a state feature for every neighbor that represents the current load on that machine. However, this would dramatically increase the size of the resulting Q-table, especially for large, highly-connected networks, and could make table-based learning infeasible. Fortunately, almost all of those state features are irrelevant and can be discarded. Since we are trying to estimate


& Veloso, 1999). As the name implies, action-dependent features cause an agent's state to change as different actions are considered. In this case, our action-dependent feature always contains the current load on whatever neighbor we are considering routing to. The load on all other neighbors is not included and hence the Q-table remains very small.

Update Frequency. The original formulation of Q-routing specifies that each time a node receives a packet it should reply with a time-to-go estimate for that packet. However, it is not necessarily optimal to do so every time. In fact, the frequency at which such updates are sent represents an important trade-off. The more often a reply is sent, the more reliable the router's feedback will be and the more rapidly it will train. However, if replies are sent less often, then more network bandwidth is reserved for actual packets, instead of being clogged with administrative updates. In our implementation, replies are sent with a 0.5 probability, which we determined to be reasonable through informal experimentation.

Action Selection. Like other techniques based on reinforcement learning, Q-routing needs an exploration mechanism to ensure that optimal policies are discovered. If the router always selects the neighbor with the lowest time-to-go, it may end up with a sub-optimal policy because only the best neighbor's estimate will ever get updated. An exploration mechanism ensures that the router will occasionally select neighbors other than the current best and hence eventually correct sub-optimalities in its policy. In our implementation, we use $\epsilon$-greedy exploration (Sutton & Barto, 1998), with $\epsilon$ set to 0.05. In $\epsilon$-greedy exploration, the router will, with probability $\epsilon$, select a neighbor randomly; with probability $1 - \epsilon$ it will select the currently estimated best neighbor.
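The selection rule can be written in a few lines. In this sketch the function name and the dictionary of time-to-go estimates are hypothetical; the exploration rate of 0.05 is the value given in the text.

```python
import random

EPSILON = 0.05  # exploration rate reported in the text


def epsilon_greedy(time_to_go, epsilon=EPSILON, rng=random):
    """Pick a random neighbor with probability epsilon, else the lowest estimate."""
    if rng.random() < epsilon:
        return rng.choice(list(time_to_go))
    return min(time_to_go, key=time_to_go.get)


estimates = {"web1": 12.0, "web2": 7.5, "web3": 9.0}
print(epsilon_greedy(estimates, epsilon=0.0))  # web2 (pure exploitation)
```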

3.2 Scheduling

The routing techniques discussed above all attempt to distribute load on the system evenly so as to minimize the time that passes between the creation and completion of a job. Doing so correctly plays an important role in overall system performance, but it is not the only factor. Our goal in this article, and the point of introducing a high-level vertical simulator, is to investigate the possibility of employing autonomic elements at more than one level of the system. It is with this goal in mind that we attempt to couple the routing mechanisms already described with schedulers, which must determine how to most efficiently allocate a given machine's CPU cycles.

Because our goal is to maximize overall utility, according to the utility functions given for each user, optimizing routing alone would not be sufficient. The routing methods presented in this paper attempt to minimize the completion time of a given packet. However, completion time is only indirectly related to the score, which it is our goal to maximize. The score assigned to any job is determined by a utility function, which can be different for different types of jobs or users. The only requirement is that the function decrease monotonically


on its own: if we are minimizing the completion time, we must be maximizing the score. However, this is true only in the very limited case where all jobs have the same importance. There are two important ways that jobs can vary in importance.

Firstly, the jobs may be governed by different utility functions. Suppose jobs created by the intern were scored according to the function $U(t) = -t$ while jobs created by the CEO were scored according to the function $U(t) = -100t$. In this case, the CEO's jobs are vastly more important. Clearly, a network that devotes as much of its capacity towards the intern's jobs as the CEO's jobs will be very sub-optimal.

Secondly, utility functions may be non-linear. Even if all jobs are controlled by the same function, if that function is non-linear then some jobs will matter more than others. Imagine a utility function that slopes down sharply while $t < 50$ and then completely flattens out. Now consider two jobs working their way through the network, one that was created 25 timesteps ago and one that was created 100 timesteps ago. In this scenario, the former job is much more important than the latter. The job that has been running for 100 timesteps is a "lost cause": it is already past the region in which there is hope of improving its score, so spending network resources to speed up its completion would be fruitless. By contrast, the job that has only run for 25 timesteps is very important: if it is possible to complete the job in less than 50 timesteps, then every step that can be shaved off its completion time will result in an improved score.
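The "lost cause" effect is easy to see numerically. The sketch below assumes one particular shape for such a utility function (linear decline until timestep 50, flat afterwards); the function names and the projected finish times are illustrative assumptions, not values from the experiments.

```python
def utility(t, knee=50):
    """Assumed shape: declines one unit per timestep until `knee`, then flat."""
    return -min(t, knee)


def gain_from_speedup(finish_time, saved):
    """Utility gained by finishing `saved` timesteps earlier."""
    return utility(finish_time - saved) - utility(finish_time)


# A job created 25 timesteps ago that would otherwise finish at age 45,
# and a job created 100 timesteps ago that would otherwise finish at age 120.
print(gain_from_speedup(45, 10))   # 10
print(gain_from_speedup(120, 10))  # 0
```

Saving ten timesteps on the young job improves its score by ten, while the same effort spent on the old job changes nothing.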

Hence, when jobs do not all have equal importance, minimizing the completion time of less important jobs can be dramatically suboptimal because it uses network resources that would be better reserved for more important jobs. In this sense, the Q-routing technique explained above has a greedy approach: it attempts to maximize the score of a given job (by minimizing its completion time) but does not consider how doing so may affect the score of other jobs.

In principle, this shortcoming could be addressed by revising the values that the Q-router learns and bases its decisions on. For example, if the Q-values represented global utility instead of time-to-go, the router would have no incentive to favor the current job and could eventually learn to route in a way that maximizes global utility, even at the expense of a particular job's time-to-go. However, such a system would have the serious disadvantage of requiring each node to have system-wide information about the consequences of its actions, whereas the current system is able to learn given only feedback from immediate neighbors.

Another alternative would be to change the router's action space. Currently, an action consists of routing a particular job to some neighbor. Instead, each action could represent a decision about how to route all the jobs currently in the machine's queue. While such a system would reduce the router's myopia, it would create a prohibitively large action space. Given a queue of length n and a set of m neighbors, there would be $m^n$ possible actions. Since current reinforcement learning methods scale poorly to large action spaces, such a representation would render our approach intractable.
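To give a sense of scale (the numbers here are illustrative assumptions, not taken from the experiments), a machine with m = 3 neighbors and n = 20 queued jobs would face

$$m^n = 3^{20} = 3{,}486{,}784{,}401$$

joint routing actions, compared with only m = 3 actions when each job is routed individually.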

Given these difficulties, we believe the challenges posed by complicated utility functions


be processed. They decide how the machine's CPU time will be scheduled. By determining which jobs are in most pressing need of completion and processing them first, intelligent schedulers can maximize the network's score even when the utility functions are asymmetric and non-linear. In the following subsections, we present two simple scheduling heuristics and introduce a new technique called the insertion scheduler, which utilizes the time-to-go estimates contained in the router's Q-table to assess a job's priority.

3.2.1 FIFO Scheduler

The default scheduling algorithm used in our simulator is the first-in first-out (FIFO) technique. In this approach, jobs that have been waiting in the machine's queue the longest are always processed first. More precisely, the scheduler chooses the next job to process by selecting randomly from the set $J_l$ of jobs that have been waiting the longest. If time(j) is the time that job j arrived at the machine and J is the set of waiting jobs, $J_l$ is determined as follows:

$$J_l = \{\, j \in J \mid \mathrm{time}(j) \le \mathrm{time}(j'),\ \forall j' \in J \,\}$$

Clearly, the FIFO algorithm does nothing to address the complications that arise when jobs have different importance.
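The FIFO selection rule can be sketched as follows, assuming arrival times are kept in a dictionary keyed by job identifier (a hypothetical representation).

```python
import random


def fifo_pick(arrival_time, rng=random):
    """Choose uniformly among the jobs with the earliest arrival time."""
    earliest = min(arrival_time.values())
    longest_waiting = [j for j, t in arrival_time.items() if t == earliest]
    return rng.choice(longest_waiting)


print(fifo_pick({"job_a": 3, "job_b": 1, "job_c": 2}))  # job_b
```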

3.2.2 Priority Scheduler

An alternative heuristic that does address these concerns is a priority scheduler, which is similar to multilevel feedback queues (Tanenbaum, 2001). This algorithm works just like the FIFO approach except that each job is assigned a priority. When allocating CPU time, the priority scheduler examines only those jobs with the highest priority and selects randomly from among the ones that have been waiting the longest. In other words, the priority scheduler selects jobs randomly from the following set:

$$J_l = \{\, j \in J \mid \mathrm{time}(j) \le \mathrm{time}(j') \wedge \mathrm{priority}(j) \ge \mathrm{priority}(j'),\ \forall j' \in J \,\}$$

If all the utility functions are simply multiples of each other, the priority scheduler can achieve optimal performance by assigning jobs priorities that correspond to the slope of their utility functions. However, when the utility functions are truly different or non-linear, the problem of deciding which jobs deserve higher priority becomes much more complicated and the simplistic approach of the priority scheduler breaks down.
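A sketch of the priority scheduler follows. It implements the prose description as a two-stage filter (first keep the highest-priority jobs, then the longest-waiting among them); the function name and the `(arrival_time, priority)` tuple layout are assumptions made for the example.

```python
import random


def priority_pick(jobs, rng=random):
    """jobs maps job id -> (arrival_time, priority); larger priority wins."""
    top = max(priority for _, priority in jobs.values())
    high = {j: arrived for j, (arrived, priority) in jobs.items() if priority == top}
    earliest = min(high.values())
    return rng.choice([j for j, arrived in high.items() if arrived == earliest])


queue = {
    "intern_job": (0, 1),   # oldest, but low priority
    "ceo_job_1": (5, 2),
    "ceo_job_2": (7, 2),
}
print(priority_pick(queue))  # ceo_job_1
```

Filtering by priority before comparing arrival times means an old low-priority job never blocks a newer high-priority one.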

3.2.3 Insertion Scheduler

To develop a more sophisticated approach, we need to formulate the problem more carefully. Every time a new job arrives at a machine, the scheduler must choose an ordering of all the n jobs in the queue and select for processing the job that appears at the head of that ordering. Of the n! possible orderings, we want the scheduler to select the ordering with the highest


ordering and 2) how to efficiently select the best ordering from among the n! contenders.

The utility of an ordering is the sum of the constituent jobs' scores and a given job's score is a known function of completion time. Thus, the problem of estimating an ordering's utility reduces to estimating the completion time of all the jobs in that ordering. A job's completion time depends on three factors:

1. How old the job was when it arrived at the current machine,

2. How long the job will wait in this machine's queue given the considered ordering, and

3. How much additional time the job will take to complete after it leaves this machine.

The first factor is known and the second factor is easily computed given the speed of the machine and a list of the jobs preceding this one in the ordering. The third factor is not known but can be estimated using machine learning. In fact, the values we want to know are exactly the same as those Q-routing learns. Hence, if the scheduler we place in each machine is coupled with a Q-router, no additional learning is necessary. We can look up the entry in the Q-table that corresponds to a job of the given type. Note that this estimate improves over time as the Q-router learns.

Once we can estimate the completion time of any job, we can compute the utility of any ordering. The only challenge that remains is how to efficiently select a good ordering from among the n! possibilities. Clearly, enumerating each possibility is not computationally feasible. If we treat this task as a search problem, we could use any of a number of optimization techniques (e.g. hill climbing, simulated annealing, or genetic algorithms). However, these techniques also require significant computational resources and the performance gains offered by the orderings they discover are unlikely to justify the CPU time they consume, since the search needs to be performed each time a new job arrives. Given these constraints, we propose a simple, fast heuristic called the insertion scheduler. When a new job arrives, the insertion scheduler does not consider any orderings that are radically different from the current ordering. Instead, it decides at what position to insert the new job into the current ordering such that utility is maximized. Hence, it needs to consider only n orderings. While this restriction may prevent the insertion scheduler from discovering the optimal ordering, it nonetheless allows for intelligent scheduling of jobs, with only linear computational complexity, that exploits learned estimates of completion time.
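The insertion step can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: jobs are plain dictionaries with hypothetical keys, the machine is assumed to process queued jobs one at a time in order (so waiting time is cumulative work divided by speed), and `time_to_go` stands in for the value that would be read from the router's Q-table.

```python
def ordering_utility(ordering, speed):
    """Estimated total utility of processing the jobs in the given order."""
    total, elapsed = 0.0, 0.0
    for job in ordering:
        elapsed += job["work"] / speed   # time until this job leaves the machine
        finish = job["age"] + elapsed + job["time_to_go"]
        total += job["utility"](finish)
    return total


def insert_job(queue, new_job, speed):
    """Try each insertion point; keep the ordering with the highest estimate."""
    candidates = [queue[:i] + [new_job] + queue[i:] for i in range(len(queue) + 1)]
    return max(candidates, key=lambda o: ordering_utility(o, speed))


intern_job = {"name": "intern", "age": 10, "work": 100, "time_to_go": 20,
              "utility": lambda t: -t}
ceo_job = {"name": "ceo", "age": 5, "work": 100, "time_to_go": 20,
           "utility": lambda t: -100 * t}

best = insert_job([intern_job], ceo_job, speed=10)
print([j["name"] for j in best])  # ['ceo', 'intern']
```

In this example, placing the CEO's job first yields an estimated utility of -3550 versus -4540 for appending it, so the scheduler moves it to the head of the queue. With a queue that holds n jobs after the arrival, exactly n candidate orderings are evaluated.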

3.2.4 Sample Scheduler

The insertion scheduler uses a heuristic to select which n queue orderings to examine. In order to test the value of this heuristic, we developed another, similar scheduler that randomly selects which orderings to test. The sample scheduler estimates the utility of each ordering it examines in exactly the same manner as the insertion scheduler. It also selects the ordering that produces the highest estimated utility, just like the insertion scheduler. The only


expect it to outperform the sample scheduler.

4 Experimental Framework

Our experiments test all of the above methods on three different networks, each of which simulates a commercial enterprise system serving two users, a CEO and an Intern. In this section, we detail the features that all three networks have in common: the job types, the mechanism for job creation, and the utility functions that are used by each user.

4.1 Job Types

In all three networks, there are two types of jobs that the users create: Mail Jobs and Database Jobs. Each Mail Job consists of the following three steps:

1. Web Step, work = 50

2. Mail Step, work = 100

The work associated with each step is simply the number of CPU cycles required to complete the step. As one might expect, only Web Servers can complete Web Steps and only Mail Servers can complete Mail Steps. Hence, in order to be completed, each Mail Job must travel along a path that includes the following stops: 1) visit a Web Server, 2) visit a Mail Server, 3) return to a Web Server, 4) return to the user who created it.

Each Database Job consists of the following three steps:

2. Database Step, work = 200

Since only a Database can complete a Database Step, the path of a Database Job must include: 1) visit a Web Server, 2) visit a Database, 3) return to a Web Server, 4) return to the user who created it.
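To make the relationship between work and machine speed concrete, here is a small worked example in Python. It assumes, purely for illustration, that a machine devotes all of its cycles in a turn to a single step and that unused cycles do not carry over; the helper name `turns_to_complete` is hypothetical.

```python
def turns_to_complete(work: int, speed: int) -> int:
    """Whole turns needed to finish a step of `work` CPU cycles on a
    machine that executes `speed` cycles per turn (ceiling division)."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    if work <= 0:
        return 0
    return -(-work // speed)


# A Mail Step (work = 100) on a machine with speed 50 needs two turns,
# while a Database Step (work = 200) on a machine with speed 300 needs one.
mail_turns = turns_to_complete(100, 50)
database_turns = turns_to_complete(200, 300)
```

Under these assumptions, halving a machine's speed can double the number of turns a step occupies, which is why the simulated catastrophes described later shift the system's bottleneck.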

4.2 Job Creation

All three networks use the following mechanism for determining when users create new jobs. At each timestep, each user chooses randomly between creating one or two new jobs. For each job, it chooses randomly between a Mail Job and a Database Job. The creation of new jobs by each user is subject to an important restriction: each user must remain below


method of generating jobs models features of real user behavior: users tend to reduce their use of networks that are overloaded and the creation of new jobs depends on the completion of older ones. For example, a user typing a document on a slow terminal is likely to stop typing momentarily when the number of keystrokes not reflected on the screen becomes too great. In addition, this demand model allows us to easily test our methods on a network that is busy but not overloaded. Any demand model that is not tied to the system's capacity is likely to either under- or over-utilize network resources. In the former case, weak methods may still get good performance since there is spare capacity (i.e. a ceiling effect). In the latter case, even good methods will perform badly because the available resources, regardless of how they are allocated, are insufficient to meet demand. Our demand model, by striking a balance between these alternatives, allows us to more effectively compare methods of optimizing the network's performance.
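The demand model can be sketched in a few lines of Python. This is an illustration under stated assumptions: the choices are taken to be uniformly random, the cap is treated as a maximum number of incomplete jobs per user (100 in Networks #1 and #2, 500 in Network #3, as stated in Section 5.3), and the names `create_jobs` and `JOB_TYPES` are hypothetical.

```python
import random
from typing import List

JOB_TYPES = ("Mail Job", "Database Job")


def create_jobs(incomplete: int, max_incomplete: int, rng: random.Random) -> List[str]:
    """Jobs one user creates in one timestep.

    The user wants one or two new jobs, but never exceeds the cap on
    incomplete jobs (assumed here to be an inclusive upper bound).
    """
    wanted = rng.choice((1, 2))
    allowed = max(0, max_incomplete - incomplete)
    return [rng.choice(JOB_TYPES) for _ in range(min(wanted, allowed))]
```

Because new jobs can only appear as old ones complete, offered load tracks the network's capacity: a slow network throttles its own users, which keeps the system busy without overloading it.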

4.3 Utility Functions

In order to capture the complexities raised by users of differing importance, we assign different utility functions, shown in Figure 2, to our two users. The utility functions are used in all three networks. Jobs created by the Intern are scored according to the following function:

\[
U(t) =
\begin{cases}
-t/10 & \text{if } t < 50 \\
-10t + 495 & \text{otherwise}
\end{cases}
\]

where t is the job's completion time. By contrast, jobs created by the CEO are scored by the function

\[
U(t) =
\begin{cases}
-10t & \text{if } t < 50 \\
-t/10 - 495 & \text{otherwise}
\end{cases}
\]

The crucial feature of these metrics is that they do not have constant slope. Hence, the change in utility that the system reaps for reducing the response time of a job is not constant. As explained above, this feature gives rise to the complications that make intelligent scheduling non-trivial. The point at which each function changes slope was chosen so as to lie in the region of the x-axis that corresponds to typical completion times for jobs in our networks. If the threshold were nowhere near this region, then the utility functions would de facto have constant slope, yielding a much easier scheduling problem (i.e. one that the priority scheduler could handle optimally). The utility functions were not tuned to this problem in any other way. Though our experiments study only this particular pair of metrics, our algorithms are designed to work with arbitrary functions of completion time, so long as they are monotonically decreasing.
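The two piecewise-linear utility functions can be written directly in Python. The sketch assumes the negative signs implied by the requirement that the functions be monotonically decreasing and by the utility range (roughly -600 to 0) shown in Figure 2; the function names are illustrative.

```python
def intern_utility(t: float) -> float:
    """Intern: shallow slope before t = 50, steep slope afterwards."""
    return -t / 10 if t < 50 else -10 * t + 495


def ceo_utility(t: float) -> float:
    """CEO: steep slope before t = 50, shallow slope afterwards."""
    return -10 * t if t < 50 else -t / 10 - 495


# Gain from finishing a job 10 time units sooner, on each side of the threshold.
ceo_gain_early = ceo_utility(30) - ceo_utility(40)          # 100.0
intern_gain_early = intern_utility(30) - intern_utility(40)  # 1.0
ceo_gain_late = ceo_utility(90) - ceo_utility(100)          # 1.0
intern_gain_late = intern_utility(90) - intern_utility(100)  # 100.0
```

The gains illustrate why a fixed priority between the two users is not enough: speeding up a CEO job is worth 100 times more than speeding up an Intern job before the threshold, and the relationship reverses after it.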

5 Results

In this section we describe experiments conducted on three different networks comparing

Figure 2: Utility functions for the CEO and Intern (utility plotted against completion time).

are designed to establish proof-of-concept for our methods. Hence, they are the simplest networks we could construct that exhibit the complications our methods attempt to address. The third network is larger and designed to provide some confidence that these methods will scale up.

Each experiment runs for 20,000 timesteps. Each simulation is preceded by an additional 5,000 "warmup" steps before tallying scores. In the case of Q-routing, a random router is used during the warmup steps; Q-routing is turned on and begins training only at timestep #0. The purpose of the warmup is to ensure that the network is at full capacity before learning begins. Doing so helps distinguish changes in performance due to discovering superior policies from those due to load building up in an initially empty network. At any point in the simulation, the score for each method represents a uniform moving average over the scores received for the last 100 completed jobs. The scores are averaged over 20 runs.
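The scoring rule, a uniform moving average over the last 100 completed jobs, can be sketched as follows. The class name `MovingScore` and its methods are hypothetical and are not taken from the simulator.

```python
from collections import deque


class MovingScore:
    """Uniform moving average over the most recent `window` job scores."""

    def __init__(self, window: int = 100) -> None:
        # A bounded deque silently discards the oldest score once full.
        self._scores = deque(maxlen=window)

    def record(self, score: float) -> None:
        self._scores.append(score)

    def value(self) -> float:
        if not self._scores:
            raise ValueError("no completed jobs recorded yet")
        return sum(self._scores) / len(self._scores)
```

A curve for one method would then be obtained by sampling `value()` at each timestep and averaging the resulting series across the 20 runs.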

For each network, we present the results of pairing each routing method with a FIFO scheduler and pairing each scheduling method with a Q-router. For the sake of clarity, we do not present the other possible pairs, though our experiments confirm that those combinations perform worse than the best methods.

At timestep #10,000 in each experiment, a system catastrophe is simulated in which the speed of a few critical machines is cut in half. Since our learning methods are designed to work on-line, we expect them to adapt rapidly to changes in their environment, a feature tested by these simulated catastrophes. The details of which machines are affected in each experiment are explained below.

In all of the results presented below, assertions of statistical significance are based on a Student's t-test (at 95% confidence) comparing the scores of the two given methods


5.1 Network #1

Figure 3 depicts the network used in our first experiment. In this network, the Web Servers are relatively slow. Hence, they act as a bottleneck to system performance. Note that the machines that must make important routing decisions (the Load Balancers) are neighbors of the machines that are critical to system performance. Hence, the speed-based and load-based routers, which rely on information about their neighbors, can perform very well.

Figure 3: Network #1, in which the Web Servers act as a bottleneck to system performance. Ovals represent users and rectangles represent machines; the lines between them represent links that allow communication of jobs or other packets. The speed associated with each machine represents the number of CPU cycles it can execute in one turn. The gray box indicates a machine whose speed is reduced by half during a system catastrophe.

Figure 4a compares the performance of all four routing methods when paired with a FIFO scheduler and applied to Network #1.

The graph clearly demonstrates that routing randomly is dramatically suboptimal. The gap in performance between the random router and its competitors is statistically significant. The Q-routing method initially performs as poorly as the random router but improves rapidly. It performs erratically while exploring different policies but quickly plateaus at the same level as the speed-based and load-based routers. In this network, it is not surprising that Q-routing does not outperform these heuristics. Since the system's bottleneck occurs in machines that neighbor the load balancers, the heuristic methods are able to route efficiently based on the speed and load information they receive.

At timestep #10,000, a system catastrophe is simulated in which the speed of the fastest Web Server (indicated by a gray box in Figure 3) is cut in half. Since the Web Servers act as bottlenecks in this network, the catastrophe causes a drop in performance for all methods except the random router, which already routes so inefficiently that the catastrophe does not further narrow its bottleneck.

Figure 4b compares the performance of all four schedulers when paired with the Q-router. In this case, using the insertion scheduler yields a statistically significant performance boost over all the other schedulers. This result suggests that even when Q-routing does not itself improve performance, it is worth doing because the values it learns can be successfully

[Plots of score versus timesteps: (a) A Comparison of Four Routers; (b) A Comparison of Four Schedulers.]

Figure 4: Results from Network #1. In (a), all four routing methods are paired with a FIFO scheduler. In (b), all four scheduling methods are paired with Q-routing.

insertion scheduler, performs significantly worse, which supports our claim that the insertion scheduler is a useful heuristic.

5.2 Network #2

Figure 5 depicts the network used in our second experiment. It is identical to Network #1 except that the speed of the Web Servers has been significantly increased. As a consequence, the system's bottleneck moves from the Web Servers to the Mail Servers and Databases. Because of this change, the local information that the heuristic routers rely on is no longer useful. The Load Balancers can see only the Web Servers, and their speeds and loads are no longer critical to system performance.

Figure 5: Network #2, in which the Mail Servers and Databases act as a bottleneck to system performance. Ovals represent users and rectangles represent machines; the lines between them represent links that allow communication of jobs or other packets. The speed associated with each machine represents the number of CPU cycles it can execute in one turn. Gray boxes indicate machines whose speed is reduced by half during a system catastrophe.

depends on machines that are not directly visible to the Load Balancers, it is not surprising that Q-routing outperforms the other methods in this network. Due to their increased speed, the Web Servers never have load accumulated in their queues, which causes the load-based router to perform just like a random router. The speed-based router actually performs worse than random because it is misled by the irrelevant speeds of the Web Servers and attempts a counterproductive load balancing.

[Figure 6 (Network #2): plots of score versus timesteps. (a) A Comparison of Four Routers; (b) A Comparison of Four Schedulers.]

In this network, the catastrophe at timestep #10,000 involves cutting in half the speed of the fastest Mail Server and Database (indicated by gray boxes in Figure 5). Since the heuristic routers were underloading the faster machines before the catastrophe, the reduction in speed does not affect them. With fewer resources available, the performance of Q-routing inevitably degrades, though it is able, through on-line adaptation of its policy, to retain a small advantage over the other methods.

Figure 6b pairs all four schedulers with the Q-router to evaluate their performance on Network #2. As above, the insertion scheduler provides a statistically significant boost in performance over the other methods. The relatively weak scores of the sample scheduler further confirm the usefulness of the insertion scheduler's heuristic for selecting orderings to evaluate.

5.3 Network #3

Networks #1 and #2 are intended to provide proof-of-concept for the advantages of learning-based routing and scheduling. To demonstrate that these advantages scale up, we also tested


its own Mail Server and Database. As in Network #2, the Web Servers have enough CPU cycles that the bottleneck to system performance lies in the Mail Servers and Databases. In order to keep this larger network busy, the users are allowed to have 500 incomplete jobs at any time (as opposed to the 100 allowed in Networks #1 and #2).

Figure 7: Network #3, in which the Load Balancers must choose between nine Web Servers instead of three. Ovals represent users and rectangles represent machines; the lines between them represent links that allow communication of jobs or other packets. The speed associated with each machine represents the number of CPU cycles it can execute in one turn. Gray boxes indicate machines whose speed is reduced by half during a system catastrophe.

Figure 8a compares the performance on this larger network of all four routing methods when paired with a FIFO scheduler. Since the Web Servers all have the same speed in this network, the speed-based router performs similarly to the random and load-based routers. As in Network #2, Q-routing achieves by far the best performance, obtaining a statistically significant improvement over the other methods.

The catastrophe that occurs at timestep #10,000 consists of cutting in half the speed of the four fastest Mail Servers and Databases (indicated by gray boxes in Figure 7). Though its performance inevitably degrades, Q-routing recovers gracefully by adjusting its policy on-line in response to environmental changes.

Figure 8b compares all four schedulers on Network #3 by pairing them with Q-routing. As before, the insertion scheduler scores the highest, yielding a statistically significant

[Figure 8 (Network #3): plots of score versus timesteps. (a) A Comparison of Four Routers; (b) A Comparison of Four Schedulers.]

evaluations of the sample scheduler.

6 Discussion

Our experimental results indicate clearly that machine learning methods offer a substantial advantage in optimizing the performance of computer networks. Both the router and scheduler placed in each machine benefit substantially from the time-to-go estimates discovered through reinforcement learning. Furthermore, the best performance is achieved only by placing intelligent, adaptive agents at more than one level of the system: the Q-router and the insertion scheduler perform better together than either could apart. Hence, they benefit from a sensible division of optimization tasks; the router focuses on routing jobs efficiently and balancing load throughout the network while the scheduler focuses on prioritizing those jobs whose effect on the score will be most decisive.

This advantage is especially compelling in systems like Networks #2 and #3 in which the machines that make important routing decisions (the Load Balancers) are not direct neighbors of the machines that are most critical to system performance (the Mail Servers and Databases). In these scenarios, the information necessary to route efficiently is not directly available and a good policy can be discovered only by learning from experience. However, even when this is not the case, as in Network #1, and heuristic routing methods perform well, learning-based systems can still reap an advantage by exploiting Q-routing's time-to-go estimates to improve CPU scheduling. Furthermore, the success of the Q-router and the insertion scheduler on a larger system (Network #3) is cause for optimism that these


generate (in the form of Load Updates and Q-Updates), it does not model the computational cost of running them. This simplification is probably not significant for routing, since Q-routing's update rule requires only a handful of arithmetic operations for each job. However, scheduling algorithms can be significantly more expensive, which is why we are committed to finding fast heuristics for addressing this problem. For example, the insertion scheduler examines only n of the n! possible queue orderings each time it must rearrange its jobs.
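To illustrate how little arithmetic a Q-routing update involves, the sketch below implements the canonical update rule of Boyan and Littman (1994): the estimate for sending a packet bound for `dest` via `neighbor` moves toward the observed queue time plus transit time plus the neighbor's own best remaining-time estimate. It shows the original rule only; the expanded state representation used in this article is not reproduced, and the function name, table layout, and default learning rate are assumptions.

```python
from typing import Dict, Hashable, Tuple

QTable = Dict[Tuple[Hashable, Hashable], float]


def q_routing_update(
    q_table: QTable,
    dest: Hashable,
    neighbor: Hashable,
    queue_time: float,
    transit_time: float,
    neighbor_estimate: float,
    learning_rate: float = 0.5,
) -> float:
    """Move Q(dest, neighbor) toward the newly observed time-to-go.

    `neighbor_estimate` is the neighbor's minimum Q-value for `dest`,
    reported back to the sender after the packet is forwarded.
    """
    key = (dest, neighbor)
    old = q_table.get(key, 0.0)          # unseen entries start at 0 (assumption)
    target = queue_time + transit_time + neighbor_estimate
    new = old + learning_rate * (target - old)
    q_table[key] = new
    return new
```

Each update costs two additions, one subtraction, and one multiplication plus a table lookup, which is consistent with the claim that the routing overhead per job is small.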

Since all of the results we present were obtained in simulation, it remains to be seen how these methods will perform in real systems. In particular, while our simulator strives to capture many of the intricacies that make adaptive routing and scheduling a challenge, it also glosses over many complicating aspects of real systems. For example, in our simulator the time required to complete a step is a deterministic function of the machine's speed, whereas in real systems it depends on the state of the machine's memory, disks, caches, etc. In addition, our simulator assumes that jobs never need resources on more than one machine simultaneously, thus avoiding the need to reason about locks, pin data in caches, etc. These, and many other, issues will need to be addressed before our methods will be deployable in real systems. However, we believe that our top-down, system-wide perspective will play an essential role in the emerging field of autonomic computing.

7 Related Work

The study of autonomic computing is still in its early stages. Nonetheless, there already exists a broad body of work that relates, in terms of both methods and goals, to the efforts described here.

The method with the closest relationship to our approach is of course the original Q-routing technique (Boyan & Littman, 1994), of which our method is a close adaptation. A chief limitation of canonical Q-routing is that it distinguishes between packets based only on their ultimate destination, which is insufficient in enterprise systems where jobs always return to their creator upon completion. In this article, we show that by expanding the learner's state representation, the principle behind Q-routing can be successfully applied to resource management tasks that do not exactly match the routing problem as traditionally posed.

In addition, there are many other approaches to network routing with the aid of machine learning (Di Caro & Dorigo, 1998; Clark et al., 2003; Itao et al., 2001). In particular, AntNet (Di Caro & Dorigo, 1998) uses an ant colony metaphor to create agents that concurrently explore the network and exchange information about it. Though it could be easily adapted to manage workloads on enterprise systems, it does not explicitly learn time-to-go estimates for each type of packet and hence its routing tables could not easily be exploited by CPU schedulers, as we have done with Q-routing.

The operating systems literature discusses many techniques for CPU scheduling, including multilevel feedback queues (Tanenbaum, 2001), which are similar to the priority


Beyond the scope of routing and scheduling, there is also considerable research devoted to using machine learning to optimize resource management in complex systems. For example, Brauer and Weiss (1998) use reinforcement learning methods to optimize scheduling of tasks on multiple machines. Gomez et al. (2001) use neuroevolution to optimize dynamic resource allocation on a chip multiprocessor. Abdelzaher et al. (2002) present a technique for managing web servers that considers, as we do, jobs of differing utility, though their method uses only classical feedback control. Also, Yellin (2003) presents an algorithm for dynamically selecting among component implementations, though each individual implementation must still be manually configured. These methods are important contributions to the goal of optimizing individual parts of complex systems but they do not address our specific focus: the problems and possibilities of simultaneously optimizing multiple components from a system-wide perspective.

Whereas our research employs reinforcement learning techniques, other researchers have used supervised learning methods to address classification problems of interest to autonomic computing. Chen et al. (2004) use decision trees to diagnose system failures, an important problem. However, their solution does not address the question of how to deal with a failure once it is diagnosed, which would require either manual intervention or a reinforcement learning agent. Mesnier et al. (2004) use decision trees to classify different file types and therefore improve disk performance. Again, such a system is useful only if we know (or can get the computer to learn) what kind of disk behavior is best suited to each file type.

Finally, there is also research in autonomic computing that does not address machine learning at all, but instead redesigns system architectures to create new possibilities for adaptation. For example, Jann et al. (2003) present an architecture that allows dynamic movement of hardware resources across logical partitions without rebooting. They do not address how an intelligent agent might determine when resource reallocation should occur. Hence their work, while striving for different goals, fits cohesively with our approach: as they try to create opportunities for adaptability, we try to create agents that can exploit those opportunities.

8 Future Work

In ongoing research, we plan to investigate new ways of applying machine learning methods to further automate and optimize networks like the one studied in this article. In particular, we hope to automate the decision of how frequently machines should send updates to their neighbors. Both Load Updates and Q-Updates are more useful if they are sent more often; however, both kinds of updates also tax precious network bandwidth. Rather than finding the balance between these two factors through manual experimentation, we would like to devise a network intelligent enough to determine optimal update frequencies without human assistance.

In addition, we would like to use machine learning to determine what network topology


We hope to develop a system in which machine learning helps determine the most efficient structure of the network when it is initially designed, when it needs to be upgraded, or when it is repaired.

Our on-going research goal is to discover, implement, and test machine learning methods in support of autonomic computing at all levels of a computer system. Though this initial work is all in simulation, the true measure of our methods is whether they can impact performance on real systems. Whenever possible, our design decisions are made with this fact in mind. Ultimately we plan to implement and test our autonomic computing methods, such as the Q-router and insertion scheduler, on real computer systems.

9 Conclusion

The three main contributions of this article are:

1. A concrete formulation of the autonomic computing problem in terms of the representative task of enterprise system optimization.

2. A new vertical simulator designed to abstractly represent all aspects of a computer system. This simulator is fully implemented and tested. It is used for all of the experiments presented in this paper.

3. Adaptive approaches to the network routing and scheduling problems in this simulator that out-perform reasonable benchmark policies.

The simulator that we introduce facilitates the study of autonomic computing methods from a top-down perspective. Rather than simply optimizing individual components, we focus on optimizing the interactions between components at a system-wide level. The results presented here, in addition to demonstrating that machine learning methods can offer a significant advantage for job routing and scheduling, also validate this top-down approach. They provide evidence of the value of combining intelligent, adaptive agents at more than one level of the system. Together these results offer hope that machine learning methods, when applied repeatedly and in concert, can produce the robust, self-configuring, and self-repairing systems necessary to meet tomorrow's computing needs.

Acknowledgments

We would like to thank IBM for a generous faculty award to help jump-start this research. In particular, thanks to Russ Blaisdell for valuable technical discussions and to Ann Marie Maynard for serving as a liaison. This research was supported in part by NSF CAREER award IIS-0237699. Finally, we would like to thank Gerry Tesauro for his insightful suggestions about implementing Q-routing and Emmett Witchel for his constructive comments on


Abdelzaher, T., Shin, K. G., & Bhatti, N. (2002). Performance guarantees for web server end-systems: A control-theoretical approach. IEEE Transactions on Parallel and Distributed Systems, 13.

Boyan, J. A., & Littman, M. L. (1994). Packet routing in dynamically changing networks: A reinforcement learning approach. Advances in Neural Information Processing Systems (pp. 671-678). Morgan Kaufmann Publishers, Inc.

Brachman, R. J. (2002). Systems that know what they're doing. IEEE Intelligent Systems, 17, 67-71.

Brauer, W., & Weiss, G. (1998). Multi-machine scheduling - a multi-agent learning approach. Proceedings of the International Conference on Multi-Agent Systems (pp. 42-48).

Chen, M., Zheng, A., Lloyd, J., Jordan, M., & Brewer, E. (2004). Failure diagnosis using decision trees. Proceedings of The International Conference on Autonomic Computing (ICAC-04). To appear.

Clark, D. D., Partridge, C., Ramming, J. C., & Wroclawski, J. (2003). A knowledge plane for the internet. Proceedings of ACM SIGCOMM.

Di Caro, G., & Dorigo, M. (1998). AntNet: Distributed stigmergetic control for communications networks. Journal of Artificial Intelligence Research, 9, 317-365.

Gomez, F., Burger, D., & Miikkulainen, R. (2001). A neuroevolution method for dynamic resource allocation on a chip multiprocessor. Proceedings of the INNS-IEEE International Joint Conference on Neural Networks (pp. 2355-2361). IEEE.

Itao, T., Suda, T., & Aoyama, T. (2001). Jack-in-the-net: Adaptive networking architecture for service emergence. Proceedings of the Asian-Pacific Conference on Communications.

Jann, J., Browning, L. M., & Burugula, R. S. (2003). Dynamic reconfiguration: Basic building blocks for autonomic computing on IBM pSeries servers. IBM Systems Journal, 42, 29-37.

Kephart, J. O., & Chess, D. M. (2003). The vision of autonomic computing. Computer, 36, 41-50.

Mesnier, M., Thereska, E., Ellard, D., Ganger, G. R., & Seltzer, M. (2004). File classification in self-* storage systems. Proceedings of The International Conference on Autonomic Computing (ICAC-04). To appear.

Stone, P., & Veloso, M. (1999). Team-partitioned, opaque-transition reinforcement learning. In M. Asada and H. Kitano (Eds.), RoboCup-98: Robot soccer world cup II. Berlin: Springer Verlag. Also in Proceedings of the Third International Conference on Autonomous


MA: MIT Press.

Tanenbaum, A. (2001). Modern operating systems. Englewood Cliffs, NJ: Prentice Hall.

Walsh, W. E., Tesauro, G., Kephart, J. O., & Das, R. (2004). Utility functions in autonomic systems. Proceedings of the International Conference on Autonomic Computing. To appear.

Watkins, C. J. C. H. (1989). Learning from delayed rewards. Doctoral dissertation, King's College, Cambridge, UK.

Yellin, D. M. (2003). Competitive algorithms for the dynamic selection of component