and Scheduling
Shimon Whiteson and Peter Stone
Department of Computer Sciences
The University of Texas at Austin
1 University Station, C0500
Austin, TX 78712-0233
{shimon,pstone}@cs.utexas.edu
http://www.cs.utexas.edu/~{shimon,pstone}
May 7, 2004
Abstract
Computer systems are rapidly becoming so complex that maintaining them with
human support staffs will be prohibitively expensive and inefficient. In response,
visionaries have begun proposing that computer systems be imbued with the ability to
configure themselves, diagnose failures, and ultimately repair themselves in response
to these failures. However, despite convincing arguments that such a shift would be
desirable, as of yet there has been little concrete progress made towards this goal. We
view these problems as fundamentally machine learning challenges. Hence, this article
presents a new network simulator designed to study the application of machine learning
methods from a system-wide perspective. We also introduce learning-based methods
for addressing the problems of job routing and CPU scheduling in the networks we
simulate. Our experimental results verify that methods using machine learning
outperform reasonable heuristic and hand-coded approaches on example networks designed
to capture many of the complexities that exist in real systems.
1 Introduction
Computer systems are rapidly becoming (indeed, some would say have already become) so
complex that maintaining them with human support staffs will be prohibitively expensive
and inefficient. Large enterprise systems, such as those found in medium to large companies,
are prime examples of this phenomenon. Nonetheless, most computer systems today are still
built to rely on static configurations and can be installed, configured, and re-configured only
by human experts. In response, visionaries have begun proposing that computer systems be imbued with
the ability to configure themselves, diagnose failures, and ultimately repair themselves in
response to these failures. The resulting shift in computational paradigm has been called
by different names, including cognitive systems (Brachman, 2002) and autonomic
computing (Kephart & Chess, 2003), but the underlying motivation and goal are remarkably similar.
Our long-term goal is to enable large-scale integrated computer systems, consisting of tens
to hundreds of machines with varying functionality, to be delivered in a default configuration
and then incrementally tune themselves to the needs of a particular enterprise based on
observed usage patterns. In addition, the systems should be able to adapt to changes in
connectivity due to system failures and/or component upgrades.
We view these goals as fundamentally machine learning challenges. For computer systems
to optimize their own performance without human assistance, they will need to learn from
experience. To continually adjust in response to a dynamic environment, they will need the
adaptability that only machine learning can offer.
Fully automating the maintenance and optimization of a large computer system via
machine learning methods is a staggering challenge. If it is not achievable today, what
short-term goals should we set to maximize the likelihood that it will be achievable tomorrow? One
approach is bottom-up: instead of optimizing an entire system, we can optimize its individual
components. Though the study of autonomic computing is still in its early stages, there has
already been a lot of preliminary progress with the bottom-up approach (see Section 7 for a
thorough survey of related work). For example, many researchers have used machine learning
to optimize network routing (Boyan & Littman, 1994; Di Caro & Dorigo, 1998; Clark et al.,
2003; Itao et al., 2001). In addition, Gomez et al. (2001) used neuroevolution to optimize
dynamic resource allocation on a chip multiprocessor. Chen et al. (2004) used decision trees
to diagnose system failures and Mesnier et al. (2004) used decision trees to classify different
file types and therefore improve disk performance. Furthermore, Brauer and Weiss (1998)
used reinforcement learning methods to optimize the scheduling of tasks on multiple machines.
The bottom-up approach is appealing because optimizing individual components is much
more feasible than optimizing entire systems. Eventually, these components can be
assembled into an autonomic system that should perform better than manually configured ones.
However, such an approach fails to address the effects of interactions between various
components and does not capitalize on opportunities to optimize at a system-wide level. Therefore,
we propose a top-down approach to developing autonomic systems. Our emphasis is on
optimizing the entire system by developing autonomic components that work well, not just
independently, but in concert with other such components.
Since doing so in a real, full-fledged enterprise system is not currently feasible, we
introduce in this article a high-level simulator designed to facilitate the study of machine
learning in enterprise systems. Our simulator captures some of the key complexities that
make system-wide autonomic computing a challenge, while abstracting away the low-level
details that currently make it impractical to create fully autonomic systems on real hardware.
In addition to introducing this new tool, we present machine learning approaches to
validate the notion of a top-down approach to autonomic computing.
The remainder of this article is organized as follows. Section 2 provides background on our
network simulator and Section 3 details our methods for optimizing routing and scheduling.
Section 4 explains our experimental framework and Section 5 presents the results of those
experiments. Section 6 discusses the implications of these results, Section 7 reviews related
work, and Section 8 highlights some opportunities for future work.
2 Simulation
To pursue our research goals, we need a high-level simulator that is capable of modeling
the relevant types of interactions among the many different components of a computer
system. While detailed simulators exist for individual system components, such as networks,
databases, etc., to our knowledge there is no simulator that models system-wide interactions.
Therefore, we have designed and implemented a system that simulates the way a computer
network processes user requests from a high-level perspective. The simulator is very general
purpose and can be used to represent many different kinds of networks. For example, Figure 1
depicts a commercial enterprise system in which a set of users use a web interface to check
their mail or query a database.
Figure 1: An example of a network implemented in our simulator; ovals represent users and
rectangles represent machines; the lines between them represent links that allow communication of
jobs or other packets. (Node labels in the figure: User, Load Balancer, Web Server, Mail Server, Database.)
The simulator represents a computer network as a graph: nodes represent machines or
users and links represent the communication channels between them. Users create jobs that
travel from machine to machine along links until all of their steps are completed. In Figure 1,
users forward their requests to a Load Balancer, which selects a Web Server to handle the
job. The Web Server forwards the job to a Mail Server or Database as appropriate. When
completed, the job is sent back to the user who created it.
There are two primary types of nodes: users and machines.
Users: Users are special nodes that create jobs and send them to machines for processing.
Once a job is completed, it is returned to the user, who computes its score.
Machines: A machine is a node that can complete portions of a job. Each type of machine
is defined by the set of steps it knows how to complete. Completing steps uses the
machine's CPU time, so each machine must have an algorithm for allocating CPU time
among the jobs currently in its possession. If a machine cannot complete the next
step of a given job, it must look for a neighboring node that can. If several neighbors
qualify, the machine must make a routing decision, i.e., it must try to determine which
of the contending neighbors to forward the job to so as to optimize performance of
the whole system. An intelligent agent can be used to control a machine and make
decisions about how to allocate CPU time and route jobs.
Links: A link connects two nodes in the network. It is used to transfer jobs and other
packets between nodes.
Packets: A packet is a unit of information that travels between nodes along links. The most
common type of packet is a job, described below, but nodes can create other types of
packets in order to communicate with other nodes about their status.
Jobs: A job is a series of steps that need to be completed in a specified order. For example,
a user who wishes to buy something off a web site might create a "purchase job." This
job might include steps such as accessing the customer database, confirming credit
card information, and generating an order confirmation. Completing these steps could
require the job to travel among several machines. A system usually has several types
of jobs, which differ in the list of steps they require for completion.
Steps: A step is one component of a job. Each step can only be carried out by a subset
of machines in the network. For example, the retrieval of information in response to a
database query must happen at a database server.
In the example shown in Figure 1, Mail Jobs must travel to a Web Server, a Mail Server,
and back to a Web Server before returning to the user. Database Jobs must travel to a Web
Server, a Database, and back to a Web Server before returning to the user. The goal of the
agents controlling the system is to process these user requests as efficiently as possible.
A simulation proceeds for a specified number of discrete timesteps. At each timestep,
machines can allocate their CPU cycles towards the completion of steps on the jobs in their
possession, packets can be sent along links, and packets can arrive at new nodes.
We believe that this simulator provides a valuable testbed for new approaches to
autonomic computing. Because its design is very general, it can be used to represent a wide
[...] behavior. Most importantly, the simulator captures many of the real world problems associated
with complex computer systems while retaining the simplicity that makes experimental
research feasible.
2.1 Load Updates
Each machine periodically (in our case, every five timesteps) sends a special packet called a
Load Update to each of its neighbors. A Load Update indicates how many jobs that machine
already has in its queue. The contents of such an update can help an intelligent router make
better decisions. However, the very presence of the update is also important information: if
a machine does not receive any updates from a given neighbor for a certain period of time
(ten timesteps), it concludes that that neighbor has gone down and will no longer route any
jobs to it until it receives another update. The system has a strong incentive to quickly
detect when machines go down, since any jobs routed to a down machine receive a stiff
penalty. For the purposes of scoring, such jobs are given a completion time of 500 timesteps,
though in fact they are never actually completed. Since Load Updates are communicated
via packets, they incur real network traffic overhead in the simulator. As long as they are
not too frequent, including them as a routine occurrence is not unrealistic.
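
The liveness rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `NeighborStatus`, its methods, and the choice to treat every neighbor as freshly heard from at start-up are hypothetical and are not taken from the simulator itself; only the ten-timestep timeout comes from the text.

```python
UPDATE_TIMEOUT = 10  # timesteps of silence before a neighbor is presumed down (from the text)


class NeighborStatus:
    """Hypothetical helper that tracks the latest Load Update from each neighbor."""

    def __init__(self, neighbors, start_time=0):
        # Assumption: at start-up every neighbor counts as just heard from, with an empty queue.
        self.last_seen = {n: start_time for n in neighbors}
        self.load = {n: 0 for n in neighbors}

    def receive_update(self, neighbor, queue_length, now):
        """Record a Load Update carrying the neighbor's current queue length."""
        self.last_seen[neighbor] = now
        self.load[neighbor] = queue_length

    def is_up(self, neighbor, now):
        """A neighbor is presumed up unless it has been silent for the full timeout."""
        return now - self.last_seen[neighbor] < UPDATE_TIMEOUT

    def routable(self, now):
        """Neighbors that may still receive jobs at timestep `now`."""
        return [n for n in self.last_seen if self.is_up(n, now)]
```

A neighbor that resumes sending updates becomes routable again as soon as `receive_update` is called for it, which mirrors the "until it receives another update" clause.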
2.2 Utility Functions
The ultimate goal of our efforts is to improve the network's utility to its users. In the real
world, that utility is not necessarily straightforward. While it is safe to assume that users
always want their jobs completed as quickly as possible, the value of reducing a job's
completion time is not always the same. Furthermore, each user may have a different importance
to the system.
In order to capture these complexities, the simulator allows different utility functions for
each user or job type (Walsh et al., 2004). Each utility function can be any monotonically
decreasing function that maps a job's completion time to its utility. Hence, the goal of the
agents controlling the network is not to minimize average completion time, but to maximize
the cumulative utility over all the jobs it is asked to process.
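
To make the difference between the two objectives concrete, the following sketch compares two outcomes. The user names, utility functions, and completion times are hypothetical values chosen for illustration only; they are not results or settings from the experiments.

```python
def cumulative_utility(completed_jobs, utility_for_user):
    """Sum each job's utility using the utility function of the user who created it.

    completed_jobs: iterable of (user, completion_time) pairs.
    utility_for_user: dict mapping a user name to a function of completion time.
    """
    return sum(utility_for_user[user](t) for user, t in completed_jobs)


# Hypothetical monotonically decreasing utility functions (illustrative only).
utilities = {
    "user_a": lambda t: -1.0 * t,
    "user_b": lambda t: -10.0 * t,
}

outcome_1 = [("user_a", 20), ("user_b", 40)]  # mean completion time 30
outcome_2 = [("user_a", 45), ("user_b", 25)]  # mean completion time 35
```

Under these assumed functions, `outcome_2` has the worse average completion time but the higher cumulative utility (-295 versus -420), which is the situation the paragraph above describes.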
3 Method
In this section, we present our approach to developing intelligent routers and schedulers for
networks like the ones shown in Figure 1. Achieving good performance in such a network
using fixed algorithms and hand-coded heuristics is very difficult and prone to inflexibility.
Instead, we use reinforcement learning to develop routers and schedulers that are efficient,
robust, and adaptable. The rest of this section explains the details of our approach to these
problems.
3.1 Routing
As traditionally posed, the packet routing problem requires a node in a network to decide
to which neighboring node to forward a given packet such that it will reach its destination
most quickly. In the network simulation described above, each machine faces a similar but
not identical problem each time it finishes processing a job. When it is unable to complete
the next step required by the job, it must search among its neighbors for machines that can
complete that step (or that can forward the job to machines that can complete it). If more
than one neighbor qualifies, the machine should make the choice that allows the job to be
completed as quickly as possible.
In both our task and the traditional routing problem, the router tries to minimize the
travel time of a packet given only local information about the network. However, in the
traditional problem the goal is only to get the packet from its source to a specified destination.
In our domain, this goal is not relevant. In fact, since a job returns to its creator when it
is completed, the source and destination are the same. Instead, we want the job to travel
along a path that allows the appropriate machines to complete its various steps in sequence
and return to its creator in minimal time.
In this section, we present four ways of addressing this modified routing problem: a
random method, two reasonable heuristic methods, and Q-routing, a method based on
reinforcement learning.
3.1.1 Random Router
As its name implies, the random router forwards jobs to a machine selected randomly from
the set of contenders C. A neighboring machine is a contender if it is capable of completing
the job's next step. If no such machines exist, then C is the set of all neighbors that can
forward the job to a machine that can complete its next step. In the random router, the
probability that a given job will be forwarded to a specific contender $c \in C$ is:
$$P_c = \frac{1}{|C|}$$
where $|C|$ is the size of C. Despite its simplicity, the random router is not without merit.
For example, if all the neighbors have the same speed and do not receive load from anywhere
else, the random router will keep the load on those neighbors evenly balanced. Of course,
it does not address any of the complications that make routing a non-trivial problem and
hence we expect it to perform poorly in real world scenarios.
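
As a minimal sketch of the contender set and the uniform choice over it, consider the Python below. The function names and the two predicates `can_complete` and `can_forward` are hypothetical stand-ins for whatever capability information a machine has about its neighbors.

```python
import random


def contenders(neighbors, next_step, can_complete, can_forward):
    """Build the contender set C for a job whose next step is `next_step`.

    Neighbors able to complete the step are preferred; if there are none,
    C falls back to neighbors able to forward the job toward such a machine.
    """
    direct = [n for n in neighbors if can_complete(n, next_step)]
    if direct:
        return direct
    return [n for n in neighbors if can_forward(n, next_step)]


def random_route(C, rng=random):
    """Pick each contender with probability 1 / |C|."""
    return rng.choice(C)
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which is convenient when comparing routers.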
3.1.2 Speed-Based Router
Without the aid of learning techniques or global information about the network, a router
cannot be expected to perform optimally. However, it can do much better than the random
router by exploiting the available local information, like the speed of its neighbors, to make
[...] for each $c \in C$,
$$P_c = \frac{\mathrm{speed}(c)}{\sum_{c' \in C} \mathrm{speed}(c')}$$
Hence, if there are two qualifying neighbors and one is twice as fast as the other, a given
packet will have a 2/3 probability of going to the fast machine and a 1/3 probability of
going to the slower one. This algorithm ignores both the load these neighbors might be
receiving from other machines and the status of any machines the packet might be sent to
later. Hence, it acts as a myopic load balancer.
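
The speed-proportional rule can be sketched as follows. The function names and the dictionary representation of neighbor speeds are assumptions made for this example; `random.choices` from the Python standard library performs the weighted draw.

```python
import random


def speed_probabilities(speeds):
    """Return P_c = speed(c) / sum of speeds over all contenders."""
    total = sum(speeds.values())
    return {c: s / total for c, s in speeds.items()}


def speed_route(speeds, rng=random):
    """Draw one contender with probability proportional to its speed."""
    names = list(speeds)
    weights = [speeds[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]
```

With `speeds = {"fast": 2.0, "slow": 1.0}`, `speed_probabilities` reproduces the 2/3 and 1/3 split from the example in the text.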
3.1.3 Load-Based Router
Another heuristic approach to routing is to utilize information about the load on neighboring
nodes, received in Load Updates. In this case, the router always routes to the qualifying
neighbor with the lowest currently estimated load. Hence, for each $c \in C$,
$$P_c = \begin{cases} 1, & \text{if } \mathrm{load}(c) \le \mathrm{load}(c') \text{ for all } c' \in C \\ 0, & \text{otherwise} \end{cases}$$
If the capacity of the neighboring machines is a critical factor in the system's performance,
then load information is likely to be highly useful and the Load-Based Router will perform
very well. However, if the system's bottleneck is not adjacent to the machine making a
routing decision, then its neighbors will often have no load and this heuristic will perform
identically to the Random Router.
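
A minimal sketch of the load-based rule follows. The text does not say how ties are broken; this example assumes a uniform random choice among tied neighbors, which is what makes the router behave like the Random Router when every neighbor reports zero load. The function name and dictionary input are hypothetical.

```python
import random


def load_route(loads, rng=random):
    """Route to the contender with the lowest estimated load.

    loads: dict mapping each contender to the queue length from its last Load Update.
    Assumption: ties are broken uniformly at random.
    """
    lowest = min(loads.values())
    tied = [c for c, load in loads.items() if load == lowest]
    return rng.choice(tied)
```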
There are many other feasible routing heuristics besides those presented here (e.g.,
considering both load and speed when routing). However, all such heuristics must make their
decisions based only on information about immediate neighbors, which may or may not be
a useful guide to efficient routing. By contrast, a machine using Q-routing, presented below,
can learn to route well even when the critical parts of the networks are not adjacent to it.
3.1.4 Q-Router
Despite the distinctive features of our version of the routing problem, techniques developed to
solve the traditional version can, with modification, be applied to the task faced by machines
in our simulation. In this article, we adapt one such technique, called Q-routing (Boyan
& Littman, 1994), to improve the performance of our network. Q-routing is an on-line
learning technique in which reinforcement learning modules are inserted into each node of
the network. Reinforcement learning (Sutton & Barto, 1998) agents attempt to learn effective
control policies by observing the positive and negative rewards they receive from behaving
in different ways in different situations. From this feedback, reinforcement learning agents
learn a value function, which estimates the long-term value of taking a certain action in a
certain state. Once the value function is known, deriving an effective policy is trivial: in
each state the agent simply takes the action that the value function estimates will reap the
[...]
to minimize a time-to-go function, which estimates how long a given packet will take to
complete if it is routed to a particular neighbor. Each node x maintains a table of estimates
about the time-to-go of different types of packets. Each entry $Q_x(d, y)$ is an estimate of how
much additional time a packet will take to travel from x to its ultimate destination d if it is
forwarded to y, a neighbor of x. If x sends a packet to y, it will immediately get back an
estimate t for x's time-to-go, which is based on the values in y's Q-table:
$$t = \min_{z \in Z} Q_y(d, z)$$
where Z is the set of y's neighbors. With this information, x can update its estimate of the
time-to-go for packets bound for d that are sent to y. If q is the time the packet spent in
x's queue and s is the time the packet spent traveling between x and y, then the following
update rule applies:
$$Q_x(d, y) = (1 - \alpha)\, Q_x(d, y) + \alpha\, (q + s + t)$$
where $\alpha$ is a learning rate parameter (0.7 in our experiments). In the standard terms of
reinforcement learning (Sutton & Barto, 1998), $q + s$ represents the instantaneous reward
(cost) and t is the estimated value of the next state, y.
By bootstrapping off the values in its neighbors' Q-tables, this update rule allows each
node to improve its estimate of a packet's time-to-go without waiting for that packet to reach
its final destination. This approach is based directly on the Q-learning method (Watkins,
1989). Once reasonable Q-values have been learned, packets can be routed efficiently by
simply consulting the appropriate entries in the Q-table and routing to the neighbor with
the lowest estimated time-to-go for packets with the given destination.
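
The update rule and the greedy lookup can be sketched as below. The class `QTable`, its method names, and the zero initial estimate are hypothetical choices for this illustration; the learning rate of 0.7 and the form of the update follow the text.

```python
from collections import defaultdict

ALPHA = 0.7  # learning rate reported in the text


class QTable:
    """Hypothetical per-node table of time-to-go estimates keyed by (destination, neighbor)."""

    def __init__(self, initial=0.0):
        # Assumption: unseen entries start at `initial`.
        self.q = defaultdict(lambda: initial)

    def reply_estimate(self, d, neighbors):
        """What node y sends back to x: t = min over z in Z of Q_y(d, z)."""
        return min(self.q[(d, z)] for z in neighbors)

    def update(self, d, y, q_time, s_time, t):
        """Q_x(d, y) <- (1 - alpha) * Q_x(d, y) + alpha * (q + s + t)."""
        old = self.q[(d, y)]
        self.q[(d, y)] = (1 - ALPHA) * old + ALPHA * (q_time + s_time + t)
        return self.q[(d, y)]

    def best_neighbor(self, d, neighbors):
        """Neighbor with the lowest estimated time-to-go for destination d."""
        return min(neighbors, key=lambda y: self.q[(d, y)])
```

For instance, if y's best estimate is 4 timesteps, the packet waited 2 timesteps in x's queue, and transit took 1 timestep, then an entry that started at 0 moves to 0.7 * (2 + 1 + 4) = 4.9.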
State Representation. To make Q-routing more suitable for our unique version of the
routing problem, we must change the state features on which learning is based. Instead of
containing simply the job's destination, the Q-tables contain three features that indicate in
what general direction the job is headed (and therefore what machine resources it will likely
tax if routed in a particular way):
• the type of the job,
• the type of the next step the job needs completed, and
• the user who created the job.
In addition, we want a fourth state feature that allows the router to consider how much load
is already on the neighbors to which it is considering forwarding a job. We could add a state
feature for every neighbor that represents the current load on that machine. However, this
would dramatically increase the size of the resulting Q-table, especially for large,
highly-connected networks, and could make table-based learning infeasible. Fortunately, almost all
of those state features are irrelevant and can be discarded. Since we are trying to estimate
[...] & Veloso, 1999). As the name implies, action-dependent features cause an agent's state to
change as different actions are considered. In this case, our action-dependent feature always
contains the current load on whatever neighbor we are considering routing to. The load on
all other neighbors is not included and hence the Q-table remains very small.
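
One way to picture the resulting table is as a mapping from a small tuple to a time-to-go estimate, where the load entry is filled in with the load of whichever neighbor is currently being considered. The sketch below is an assumption-laden illustration: the key layout, the function names, and the cap `MAX_LOAD` (used to keep the number of distinct keys bounded) are hypothetical and are not described in the text.

```python
from collections import defaultdict

MAX_LOAD = 10  # hypothetical cap on the load feature, not taken from the text


def q_key(job_type, next_step, user, neighbor, neighbor_load):
    """Build a table key; only the load of the candidate neighbor is included."""
    return (job_type, next_step, user, neighbor, min(neighbor_load, MAX_LOAD))


def best_neighbor(q_table, job_type, next_step, user, loads):
    """Evaluate each candidate with its own load as the action-dependent feature."""
    def estimate(neighbor):
        return q_table[q_key(job_type, next_step, user, neighbor, loads[neighbor])]
    return min(loads, key=estimate)
```

Because the key never mentions the other neighbors' loads, the table grows with the number of load levels rather than with every combination of loads across neighbors.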
Update Frequency. The original formulation of Q-routing specifies that each time a node
receives a packet it should reply with a time-to-go estimate for that packet. However, it is
not necessarily optimal to do so every time. In fact, the frequency at which such updates are
sent represents an important trade-off. The more often a reply is sent, the more reliable the
router's feedback will be and the more rapidly it will train. However, if replies are sent less
often, then more network bandwidth is reserved for actual packets, instead of being clogged
with administrative updates. In our implementation, replies are sent with a 0.5 probability,
which we determined to be reasonable through informal experimentation.
Action Selection. Like other techniques based on reinforcement learning, Q-routing needs
an exploration mechanism to ensure that optimal policies are discovered. If the router always
selects the neighbor with the lowest time-to-go, it may end up with a sub-optimal policy
because only the best neighbor's estimate will ever get updated. An exploration mechanism
ensures that the router will occasionally select neighbors other than the current best and
hence eventually correct sub-optimalities in its policy. In our implementation, we use
$\epsilon$-greedy exploration (Sutton & Barto, 1998), with $\epsilon$ set to 0.05. In $\epsilon$-greedy exploration, the
router will, with probability $\epsilon$, select a neighbor randomly; with probability $1 - \epsilon$ it will select
the currently estimated best neighbor.
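
The selection rule can be sketched as follows, with the function name and the dictionary of per-neighbor estimates being assumptions of this example. "Best" here means the lowest estimated time-to-go, consistent with the Q-routing description above.

```python
import random

EPSILON = 0.05  # exploration rate reported in the text


def epsilon_greedy(estimates, rng=random):
    """Choose a neighbor given a dict of time-to-go estimates.

    With probability EPSILON pick uniformly at random (exploration);
    otherwise pick the neighbor with the lowest estimate (exploitation).
    """
    if rng.random() < EPSILON:
        return rng.choice(list(estimates))
    return min(estimates, key=estimates.get)
```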
3.2 Scheduling
The routing techniques discussed above all attempt to distribute load on the system evenly
so as to minimize the time that passes between the creation and completion of a job. Doing
so correctly plays an important role in overall system performance, but it is not the only
factor. Our goal in this article, and the point of introducing a high-level vertical simulator,
is to investigate the possibility of employing autonomic elements at more than one level of
the system. It is with this goal in mind that we attempt to couple the routing mechanisms
already described with schedulers, which must determine how to most efficiently allocate a
given machine's CPU cycles.
Because our goal is to maximize overall utility, according to the utility functions given for
each user, optimizing routing alone would not be optimal. The routing methods presented in
this paper attempt to minimize the completion time of a given packet. However, completion
time is only indirectly related to the score, which it is our goal to maximize. The score
assigned to any job is determined by a utility function, which can be different for different
types of jobs or users. The only requirement is that the function decrease monotonically
[...] on its own: if we are minimizing the completion time, we must be maximizing the score.
However, this is true only in the very limited case where all jobs have the same importance.
There are two important ways that jobs can vary in importance.
Firstly, the jobs may be governed by different utility functions. Suppose jobs created by
the intern were scored according to the function $U(t) = -t$ while jobs created by the CEO
were scored according to the function $U(t) = -100t$. In this case, the CEO's jobs are vastly
more important. Clearly, a network that devotes as much of its capacity towards the intern's
jobs as the CEO's jobs will be very sub-optimal.
Secondly, utility functions may be non-linear. Even if all jobs are controlled by the same
function, if that function is non-linear then some jobs will matter more than others. Imagine
a utility function that slopes down sharply while $t < 50$ and then completely flattens out.
Now consider two jobs working their way through the network, one that was created 25
timesteps ago and one that was created 100 timesteps ago. In this scenario, the former job
is much more important than the latter. The job that has been running for 100 timesteps
is a "lost cause": it is already past the region in which there is hope of improving its score,
so spending network resources to speed up its completion would be fruitless. By contrast,
the job that has only run for 25 timesteps is very important: if it is possible to complete the
job in less than 50 timesteps, then every step that can be shaved off its completion time will
result in an improved score.
Hence, when jobs do not all have equal importance, minimizing the completion time of
less important jobs can be dramatically suboptimal because it uses network resources that
would be better reserved for more important jobs. In this sense, the Q-routing technique
explained above takes a greedy approach: it attempts to maximize the score of a given job (by
minimizing its completion time) but does not consider how doing so may affect the score of
other jobs.
In principle, this shortcoming could be addressed by revising the values that the
Q-router learns and bases its decisions on. For example, if the Q-values represented global
utility instead of time-to-go, the router would have no incentive to favor the current job and
could eventually learn to route in a way that maximizes global utility, even at the expense of
a particular job's time-to-go. However, such a system would have the serious disadvantage of
requiring each node to have system-wide information about the consequences of its actions,
whereas the current system is able to learn given only feedback from immediate neighbors.
Another alternative would be to change the router's action space. Currently, an action
consists of routing a particular job to some neighbor. Instead, each action could represent
a decision about how to route all the jobs currently in the machine's queue. While such a
system would reduce the router's myopia, it would create a prohibitively large action space.
Given a queue of length n and a set of m neighbors, there would be $m^n$ possible actions.
Since current reinforcement learning methods scale poorly to large action spaces, such a
representation would render our approach intractable.
Given these difficulties, we believe the challenges posed by complicated utility functions
[...] be processed. They decide how the machine's CPU time will be scheduled. By determining
which jobs are in most pressing need of completion and processing them first, intelligent
schedulers can maximize the network's score even when the utility functions are asymmetric
and non-linear. In the following subsections, we present two simple scheduling heuristics
and introduce a new technique called the insertion scheduler, which utilizes the time-to-go
estimates contained in the router's Q-table to assess a job's priority.
3.2.1 FIFO Scheduler
The default scheduling algorithm used in our simulator is the first-in first-out (FIFO)
technique. In this approach, jobs that have been waiting in the machine's queue the longest
are always processed first. More precisely, the scheduler chooses the next job to process by
selecting randomly from the set $J_l$ of jobs that have been waiting the longest. If $\mathrm{time}(j)$ is
the time that job j arrived at the machine and J is the set of waiting jobs, $J_l$ is determined
as follows:
$$J_l = \{\, j \in J \mid \mathrm{time}(j) \le \mathrm{time}(j'),\ \forall j' \in J \,\}$$
Clearly, the FIFO algorithm does nothing to address the complications that arise when jobs
have different importance.
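
A minimal sketch of the FIFO selection rule follows. The function names and the dictionary mapping each waiting job to its arrival time are assumptions made for this example.

```python
import random


def fifo_candidates(arrival_time):
    """Return J_l: the waiting jobs whose arrival time is the earliest."""
    earliest = min(arrival_time.values())
    return [j for j, t in arrival_time.items() if t == earliest]


def fifo_next(arrival_time, rng=random):
    """Pick the next job to process, choosing randomly among the longest-waiting jobs."""
    return rng.choice(fifo_candidates(arrival_time))
```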
3.2.2 Priority Scheduler
An alternative heuristic that does address these concerns is a priority scheduler, which is
similar to multilevel feedback queues (Tanenbaum, 2001). This algorithm works just like the
FIFO approach except that each job is assigned a priority. When allocating CPU time, the
priority scheduler examines only those jobs with the highest priority and selects randomly
from among the ones that have been waiting the longest. In other words, the priority
scheduler selects jobs randomly from the following set:
$$J_l = \{\, j \in J \mid \mathrm{time}(j) \le \mathrm{time}(j') \wedge \mathrm{priority}(j) \ge \mathrm{priority}(j'),\ \forall j' \in J \,\}$$
If all the utility functions are simply multiples of each other, the priority scheduler can
achieve optimal performance by assigning jobs priorities that correspond to the slope of
their utility functions. However, when the utility functions are truly different or non-linear,
the problem of deciding which jobs deserve higher priority becomes much more complicated
and the simplistic approach of the priority scheduler breaks down.
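
The sketch below follows the prose description of the priority scheduler: it first restricts attention to the jobs with the highest priority and then, among those, to the ones that have waited longest. The function name and the `(arrival_time, priority)` representation are assumptions for illustration, as are the example priority values.

```python
import random


def priority_candidates(jobs):
    """jobs maps a job id to an (arrival_time, priority) pair.

    Returns the highest-priority jobs that have been waiting the longest.
    """
    top = max(priority for _, priority in jobs.values())
    urgent = {j: arrival for j, (arrival, priority) in jobs.items() if priority == top}
    earliest = min(urgent.values())
    return [j for j, arrival in urgent.items() if arrival == earliest]


def priority_next(jobs, rng=random):
    """Pick the next job, choosing randomly among the candidates."""
    return rng.choice(priority_candidates(jobs))
```

Note that an older low-priority job never blocks a newer high-priority one in this reading: the arrival-time comparison is applied only within the top priority level.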
3.2.3 Insertion Scheduler
To develop a more sophisticated approach, we need to formulate the problem more carefully.
Every time a new job arrives at a machine, the scheduler must choose an ordering of all the n
jobs in the queue and select for processing the job that appears at the head of that ordering.
Of the n! possible orderings, we want the scheduler to select the ordering with the highest
[...] ordering and 2) how to efficiently select the best ordering from among the n! contenders.
The utility of an ordering is the sum of the constituent jobs' scores, and a given job's
score is a known function of completion time. Thus, the problem of estimating an ordering's
utility reduces to estimating the completion time of all the jobs in that ordering. A job's
completion time depends on three factors:
1. How old the job was when it arrived at the current machine,
2. How long the job will wait in this machine's queue given the considered ordering, and
3. How much additional time the job will take to complete after it leaves this machine.
The first factor is known and the second factor is easily computed given the speed of the
machine and a list of the jobs preceding this one in the ordering. The third factor is not
known but can be estimated using machine learning. In fact, the values we want to know are
exactly the same as those Q-routing learns. Hence, if the scheduler we place in each machine
is coupled with a Q-router, no additional learning is necessary. We can look up the entry in
the Q-table that corresponds to a job of the given type. Note that this estimate improves
over time as the Q-router learns.
Once we can estimate the completion time of any job, we can compute the utility of
any ordering. The only challenge that remains is how to efficiently select a good ordering
from among the n! possibilities. Clearly, enumerating each possibility is not computationally
feasible. If we treat this task as a search problem, we could use any of a number of
optimization techniques (e.g., hill climbing, simulated annealing, or genetic algorithms). However,
these techniques also require significant computational resources and the performance gains
offered by the orderings they discover are unlikely to justify the CPU time they consume,
since the search needs to be performed each time a new job arrives. Given these constraints,
we propose a simple, fast heuristic called the insertion scheduler. When a new job arrives,
the insertion scheduler does not consider any orderings that are radically different from the
current ordering. Instead, it decides at what position to insert the new job into the
current ordering such that utility is maximized. Hence, it needs to consider only n orderings.
While this restriction may prevent the insertion scheduler from discovering the optimal
ordering, it nonetheless allows for intelligent scheduling of jobs, with only linear computational
complexity, that exploits learned estimates of completion time.
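
The insertion idea can be sketched as follows. This is an illustration under several assumptions rather than the simulator's implementation: the `QueuedJob` fields, the function names, and the simplification that a job's `age` already includes any time it has spent in this queue are all hypothetical. The `time_to_go` field stands in for the estimate that would be looked up in the Q-table.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QueuedJob:
    name: str
    age: float          # timesteps since the job was created, as of now (assumption)
    work: float         # CPU cycles still needed at this machine
    time_to_go: float   # learned estimate of time remaining after leaving this machine
    utility: Callable[[float], float]  # maps completion time to score


def ordering_utility(ordering: List[QueuedJob], speed: float) -> float:
    """Estimated total utility if jobs are processed in the given order."""
    total, elapsed = 0.0, 0.0
    for job in ordering:
        elapsed += job.work / speed  # this job finishes here after all of its predecessors
        total += job.utility(job.age + elapsed + job.time_to_go)
    return total


def insert_job(queue: List[QueuedJob], new_job: QueuedJob, speed: float) -> List[QueuedJob]:
    """Try the new job at every position of the current ordering and keep the best result."""
    candidates = [queue[:i] + [new_job] + queue[i:] for i in range(len(queue) + 1)]
    return max(candidates, key=lambda o: ordering_utility(o, speed))
```

For clarity this sketch re-evaluates each candidate ordering from scratch, so it does quadratic work per arrival; it considers the same n candidate orderings as the heuristic described above, but it is not tuned to match the linear cost stated in the text.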
3.2.4 Sample Scheduler
The insertion scheduler uses a heuristic to select which n queue orderings to examine. In order
to test the value of this heuristic, we developed another, similar scheduler that randomly
selects which orderings to test. The sample scheduler estimates the utility of each ordering it
examines in exactly the same manner as the insertion scheduler. It also selects the ordering
that produces the highest estimated utility, just like the insertion scheduler. The only
[...] expect it to outperform the sample scheduler.
4 Experimental Framework
Our experiments test all of the above methods on three different networks, each of which
simulates a commercial enterprise system serving two users, a CEO and an Intern. In this
section, we detail the features that all three networks have in common: the job types, the
mechanism for job creation, and the utility functions that are used by each user.
4.1 Job Types
In all three networks, there are two types of jobs that the users create: Mail Jobs and
Database Jobs. Each Mail Job consists of the following three steps:
1. Web Step, work = 50
2. Mail Step, work = 100
3. Web Step, work = 50
The work associated with each step is simply the number of CPU cycles required to
complete the step. As one might expect, only Web Servers can complete Web Steps and
only Mail Servers can complete Mail Steps. Hence, in order to be completed, each Mail Job
must travel along a path that includes the following stops: 1) visit a Web Server, 2) visit a
Mail Server, 3) return to a Web Server, 4) return to the user who created it.
Each Database Job consists of the following three steps:
1. Web Step, work = 50
2. Database Step, work = 200
3. Web Step, work = 50
Since only a Database can complete a Database Step, the path of a Database Job must
include: 1) visit a Web Server, 2) visit a Database, 3) return to a Web Server, 4) return to
the user who created it.
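The two job types can be written down as a small data structure. The following is an illustrative sketch only; the class and function names (`Step`, `JOB_TYPES`, `SERVED_BY`, `required_path`, `total_work`) are hypothetical and not taken from the simulator.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    kind: str  # which kind of machine can execute the step
    work: int  # CPU cycles needed to complete the step


# Step lists copied from the two job definitions above.
JOB_TYPES = {
    "Mail Job": (Step("Web", 50), Step("Mail", 100), Step("Web", 50)),
    "Database Job": (Step("Web", 50), Step("Database", 200), Step("Web", 50)),
}

# Assumed lookup from step kind to the machine type that serves it.
SERVED_BY = {"Web": "Web Server", "Mail": "Mail Server", "Database": "Database"}


def required_path(job_type: str) -> List[str]:
    """Machine types a job must visit, ending back at the user who created it."""
    return [SERVED_BY[step.kind] for step in JOB_TYPES[job_type]] + ["User"]


def total_work(job_type: str) -> int:
    """Total CPU cycles a job consumes across all of its steps."""
    return sum(step.work for step in JOB_TYPES[job_type])


for name in JOB_TYPES:
    print(name, total_work(name), required_path(name))
```

Under these definitions a Mail Job needs 200 cycles in total and a Database Job 300, with half of the Mail Job's work and one third of the Database Job's work falling on Web Servers.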
4.2 Job Creation
All three networks use the following mechanism for determining when users create new jobs.
At each timestep, each user chooses randomly between creating one or two new jobs. For
each job, it chooses randomly between a Mail Job and a Database Job. The creation of
new jobs by each user is subject to an important restriction: each user must remain below
method of generating jobs models features of real user behavior: users tend to reduce their
use of networks that are overloaded and the creation of new jobs depends on the completion
of older ones. For example, a user typing a document on a slow terminal is likely to stop
typing momentarily when the number of keystrokes not reflected on the screen becomes too
great. In addition, this demand model allows us to easily test our methods on a network
that is busy but not overloaded. Any demand model that is not tied to the system's capacity
is likely to either under- or over-utilize network resources. In the former case, weak methods
may still get good performance since there is spare capacity (i.e., a ceiling effect). In the
latter case, even good methods will perform badly because the available resources, regardless
of how they are allocated, are insufficient to meet demand. Our demand model, by striking
a balance between these alternatives, allows us to more effectively compare methods of
optimizing the network's performance.
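A demand model of this kind can be sketched as below. The sketch makes two assumptions that should be read as such: the restriction is treated as a cap on the number of incomplete jobs a user may have outstanding, and both random choices are taken to be uniform. The function name `create_jobs` and its parameters are hypothetical.

```python
import random
from typing import List

JOB_TYPES = ("Mail Job", "Database Job")


def create_jobs(incomplete: int, cap: int, rng: random.Random) -> List[str]:
    """Return the job types one user creates in a single timestep.

    incomplete: jobs this user has created that have not yet returned
    cap:        assumed maximum number of incomplete jobs per user
    """
    wanted = rng.choice((1, 2))      # one or two new jobs, chosen at random
    room = max(0, cap - incomplete)  # how many more jobs the cap still permits
    return [rng.choice(JOB_TYPES) for _ in range(min(wanted, room))]


rng = random.Random(0)
print(create_jobs(incomplete=0, cap=100, rng=rng))    # one or two jobs
print(create_jobs(incomplete=100, cap=100, rng=rng))  # no jobs: the user is at its cap
```

Because new jobs are created only as older ones complete, the offered load tracks the capacity of the network instead of growing without bound.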
4.3 Utility Functions
In order to capture the complexities raised by users of differing importance, we assign
different utility functions, shown in Figure 2, to our two users. The utility functions are used in
all three networks. Jobs created by the Intern are scored according to the following function:

$$U(t) = \begin{cases} -t/10, & \text{if } t < 50 \\ -10t + 495, & \text{otherwise} \end{cases}$$

where t is the job's completion time. By contrast, jobs created by the CEO are scored
by the function

$$U(t) = \begin{cases} -10t, & \text{if } t < 50 \\ -t/10 - 495, & \text{otherwise} \end{cases}$$
The crucial feature of these metrics is that they do not have constant slope. Hence, the
change in utility that the system reaps for reducing the response time of a job is not
constant. As explained above, this feature gives rise to the complications that make intelligent
scheduling non-trivial. The point at which each function changes slope was chosen so as to
lie in the region of the x-axis that corresponds to typical completion times for jobs in our
networks. If the threshold were nowhere near this region, then the utility functions would de
facto have constant slope, yielding a much easier scheduling problem (i.e., one that the
priority scheduler could handle optimally). The utility functions were not tuned to this problem
in any other way. Though our experiments study only this particular pair of metrics, our
algorithms are designed to work with arbitrary functions of completion time, so long as they
are monotonically decreasing.
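The effect of the changing slope can be checked numerically. The sketch below encodes the two piecewise utility functions with negative slopes, which is how we read the definitions given that utility must decrease monotonically with completion time; the function names are hypothetical.

```python
def intern_utility(t: float) -> float:
    """Intern: shallow penalty before t = 50, steep penalty afterwards."""
    return -t / 10 if t < 50 else -10 * t + 495


def ceo_utility(t: float) -> float:
    """CEO: steep penalty before t = 50, shallow penalty afterwards."""
    return -10 * t if t < 50 else -t / 10 - 495


def gain(utility, slow: float, fast: float) -> float:
    """Utility gained by cutting a job's completion time from slow to fast."""
    return utility(fast) - utility(slow)


# Speeding up a job that would finish early anyway mostly helps the CEO ...
print(gain(ceo_utility, 40, 30), gain(intern_utility, 40, 30))
# ... while speeding up a job that is already late mostly helps the Intern.
print(gain(ceo_utility, 70, 60), gain(intern_utility, 70, 60))
```

Both functions are continuous at t = 50, and the comparison shows why a fixed priority ordering is not enough: which user's job is worth accelerating depends on where each job currently sits relative to the threshold.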
5 Results
In this section we describe experiments conducted on three different networks comparing
Figure 2: Utility functions for the CEO and Intern (x-axis: Completion Time; y-axis: Utility).
are designed to establish proof-of-concept for our methods. Hence, they are the simplest
networks we could construct that exhibit the complications our methods attempt to address.
The third network is larger and designed to provide some confidence that these methods will
scale up.
Each experiment runs for 20,000 timesteps. Each simulation is preceded by an additional
5,000 "warmup" steps before tallying scores. In the case of Q-routing, a random router is used
during the warmup steps; Q-routing is turned on and begins training only at timestep #0.
The purpose of the warmup is to ensure that the network is at full capacity before learning
begins. Doing so helps distinguish changes in performance due to discovering superior policies
from those due to load building up in an initially empty network. At any point in the
simulation, the score for each method represents a uniform moving average over the scores
received for the last 100 completed jobs. The scores are averaged over 20 runs.
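A uniform moving average over the most recent completed jobs is straightforward to maintain with a bounded queue. The following is a minimal sketch; the class name `MovingScore` and its methods are hypothetical, and only the window of 100 completed jobs comes from the text.

```python
from collections import deque
from typing import Deque, Optional


class MovingScore:
    """Uniform moving average of the utilities of the last `window` completed jobs."""

    def __init__(self, window: int = 100) -> None:
        # A deque with maxlen silently discards the oldest entry when full.
        self.recent: Deque[float] = deque(maxlen=window)

    def record(self, utility: float) -> None:
        """Call once each time a job returns to the user who created it."""
        self.recent.append(utility)

    def value(self) -> Optional[float]:
        """Current score, or None before any job has completed."""
        if not self.recent:
            return None
        return sum(self.recent) / len(self.recent)


score = MovingScore(window=3)
for utility in (1.0, 2.0, 3.0, 4.0):
    score.record(utility)
print(score.value())  # averages only the three most recent utilities
```

The per-run curves obtained this way would then be averaged pointwise across the 20 runs.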
For each network, we present the results of pairing each routing method with a FIFO
scheduler and pairing each scheduling method with a Q-router. For the sake of clarity, we do
not present the other possible pairs, though our experiments confirm that those combinations
perform worse than the best methods.
At timestep #10,000 in each experiment, a system catastrophe is simulated in which the
speed of a few critical machines is cut in half. Since our learning methods are designed to
work on-line, we expect them to adapt rapidly to changes in their environment, a feature
tested by these simulated catastrophes. The details of which machines are affected in each
experiment are explained below.
In all of the results presented below, assertions of statistical significance are based on a
Student's t-test (at 95% confidence) comparing the scores of the two given methods.
5.1 Network #1
Figure 3 depicts the network used in our first experiment. In this network, the Web Servers
are relatively slow. Hence, they act as a bottleneck to system performance. Note that the
machines that must make important routing decisions (the Load Balancers) are neighbors of
the machines that are critical to system performance. Hence, the speed-based and load-based
routers, which rely on information about their neighbors, can perform very well.
Figure 3: Network #1, in which the Web Servers act as a bottleneck to system performance. Ovals
represent users and rectangles represent machines; the lines between them represent links that allow
communication of jobs or other packets. The speed associated with each machine represents the
number of CPU cycles it can execute in one turn. The gray box indicates a machine whose speed
is reduced by half during a system catastrophe.
[Diagram labels: users Intern and CEO; two Load Balancers (speed 200 each); three Web Servers
(speeds 150, 100, 50); three Mail Servers (speeds 50, 100, 150); three Databases (speeds 300, 200, 100).]
Figure 4a compares the performance of all four routing methods when paired with a
FIFO scheduler and applied to Network #1.
The graph clearly demonstrates that routing randomly is dramatically suboptimal. The
gap in performance between the random router and its competitors is statistically
significant. The Q-routing method initially performs as poorly as the random router but improves
rapidly. It performs erratically while exploring different policies but quickly plateaus at the
same level as the speed-based and load-based routers. In this network, it is not surprising
that Q-routing does not outperform these heuristics. Since the system's bottleneck occurs in
machines that neighbor the Load Balancers, the heuristic methods are able to route efficiently
based on the speed and load information they receive.
At timestep #10,000, a system catastrophe is simulated in which the speed of the fastest
Web Server (indicated by a gray box in Figure 3) is cut in half. Since the Web Servers act
as bottlenecks in this network, the catastrophe causes a drop in performance for all methods
except the random router, which already routes so inefficiently that the catastrophe does
not further narrow its bottleneck.
Figure 4b compares the performance of all four schedulers when paired with the Q-router.
In this case, using the insertion scheduler yields a statistically significant performance boost
over all the other schedulers. This result suggests that even when Q-routing does not itself
improve performance, it is worth doing because the values it learns can be successfully
Figure 4: Results from Network #1 (x-axis: Timesteps; y-axis: Score). In (a), "A Comparison of
Four Routers", all four routing methods are paired with a FIFO scheduler. In (b), "A Comparison
of Four Schedulers", all four scheduling methods are paired with Q-routing.
insertion scheduler, performs significantly worse, which supports our claim that the insertion
scheduler is a useful heuristic.
5.2 Network #2
Figure 5 depicts the network used in our second experiment. It is identical to Network #1
except that the speed of the Web Servers has been significantly increased. As a consequence,
the system's bottleneck moves from the Web Servers to the Mail Servers and Databases.
Because of this change, the local information that the heuristic routers rely on is no longer
useful. The Load Balancers can see only the Web Servers, whose speeds and loads are no
longer critical to system performance.
Figure 5: Network #2, in which the Mail Servers and Databases act as a bottleneck to system
performance. Ovals represent users and rectangles represent machines; the lines between them
represent links that allow communication of jobs or other packets. The speed associated with each
machine represents the number of CPU cycles it can execute in one turn. Gray boxes indicate
machines whose speed is reduced by half during a system catastrophe.
[Diagram labels: users Intern and CEO; two Load Balancers (speed 200 each); three Web Servers
(speeds 600, 800, 400); three Mail Servers (speeds 150, 50, 100); three Databases (speeds 300, 200, 100).]
depends on machines that are not directly visible to the Load Balancers, it is not surprising
that Q-routing outperforms the other methods in this network. Due to their increased speed,
the Web Servers never have load accumulated in their queues, which causes the load-based
router to perform just like a random router. The speed-based router actually performs worse
than random because it is misled by the irrelevant speeds of the Web Servers and attempts
a counterproductive load balancing.
Figure 6: Results from Network #2 (x-axis: Timesteps; y-axis: Score). In (a), "A Comparison of
Four Routers", all four routing methods are paired with a FIFO scheduler. In (b), "A Comparison
of Four Schedulers", all four scheduling methods are paired with Q-routing.
In this network, the catastrophe at timestep #10,000 involves cutting in half the speed
of the fastest Mail Server and Database (indicated by gray boxes in Figure 5). Since the
heuristic routers were underloading the faster machines before the catastrophe, the reduction
in speed does not affect them. With fewer resources available, the performance of Q-routing
inevitably degrades, though it is able, through on-line adaptation of its policy, to retain a
small advantage over the other methods.
Figure 6b pairs all four schedulers with the Q-router to evaluate their performance on
Network #2. As above, the insertion scheduler provides a statistically significant boost in
performance over the other methods. The relatively weak scores of the sample scheduler
further confirm the usefulness of the insertion scheduler's heuristic for selecting orderings to
evaluate.
5.3 Network #3
Networks #1 and #2 are intended to provide proof-of-concept for the advantages of
learning-based routing and scheduling. To demonstrate that these advantages scale up, we also tested
its own Mail Server and Database. As in Network #2, the Web Servers have enough CPU
cycles that the bottleneck to system performance lies in the Mail Servers and Databases. In
order to keep this larger network busy, the users are allowed to have 500 incomplete jobs at
any time (as opposed to the 100 allowed in Networks #1 and #2).
Figure 7: Network #3, in which the Load Balancers must choose between nine Web Servers instead
of three. Ovals represent users and rectangles represent machines; the lines between them represent
links that allow communication of jobs or other packets. The speed associated with each machine
represents the number of CPU cycles it can execute in one turn. Gray boxes indicate machines
whose speed is reduced by half during a system catastrophe.
[Diagram labels: users Intern and CEO; two Load Balancers (speed 200 each); nine Web Servers
(speed 400 each); nine Mail Servers (speeds 20 to 180 in steps of 20); nine Databases (speeds 40 to
360 in steps of 40).]
Figure 8a compares the performance on this larger network of all four routing methods
when paired with a FIFO scheduler. Since the Web Servers all have the same speed in this
network, the speed-based router performs similarly to the random and load-based routers.
As in Network #2, Q-routing achieves by far the best performance, obtaining a statistically
significant improvement over the other methods.
The catastrophe that occurs at timestep #10,000 consists of cutting in half the speed of
the four fastest Mail Servers and Databases (indicated by gray boxes in Figure 7). Though
its performance inevitably degrades, Q-routing recovers gracefully by adjusting its policy
on-line in response to environmental changes.
Figure 8b compares all four schedulers on Network #3 by pairing them with Q-routing.
As before, the insertion scheduler scores the highest, yielding a statistically significant
Figure 8: Results from Network #3 (x-axis: Timesteps; y-axis: Score). In (a), "A Comparison of
Four Routers", all four routing methods are paired with a FIFO scheduler. In (b), "A Comparison
of Four Schedulers", all four scheduling methods are paired with Q-routing.
evaluations of the sample scheduler.
6 Discussion
Our experimental results indicate clearly that machine learning methods offer a
substantial advantage in optimizing the performance of computer networks. Both the router and
scheduler placed in each machine benefit substantially from the time-to-go estimates
discovered through reinforcement learning. Furthermore, the best performance is achieved only by
placing intelligent, adaptive agents at more than one level of the system: the Q-router and
the insertion scheduler perform better together than either could apart. Hence, they benefit
from a sensible division of optimization tasks; the router focuses on routing jobs efficiently
and balancing load throughout the network while the scheduler focuses on prioritizing those
jobs whose effect on the score will be most decisive.
This advantage is especially compelling in systems like Networks #2 and #3 in which
the machines that make important routing decisions (the Load Balancers) are not direct
neighbors of the machines that are most critical to system performance (the Mail Servers
and Databases). In these scenarios, the information necessary to route efficiently is not
directly available and a good policy can be discovered only by learning from experience.
However, even when this is not the case, as in Network #1, and heuristic routing methods
perform well, learning-based systems can still reap an advantage by exploiting Q-routing's
time-to-go estimates to improve CPU scheduling. Furthermore, the success of the Q-router
and the insertion scheduler on a larger system (Network #3) is cause for optimism that these
generate (in the form of Load Updates and Q-Updates), it does not model the computational
cost of running them. This simplification is probably not significant for routing, since
Q-routing's update rule requires only a handful of arithmetic operations for each job. However,
scheduling algorithms can be significantly more expensive, which is why we are committed
to finding fast heuristics for addressing this problem. For example, the insertion scheduler
examines only n of the n! possible queue orderings each time it must rearrange its jobs.
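To give a sense of scale for the gap between n and n!, the short sketch below tabulates both counts for a few queue lengths. The queue lengths are arbitrary illustrative values, and the function names are hypothetical.

```python
import math


def orderings_examined(n: int) -> int:
    """Orderings scored by the insertion heuristic for a queue of n jobs."""
    return n


def orderings_possible(n: int) -> int:
    """All permutations of a queue of n jobs."""
    return math.factorial(n)


for n in (5, 10, 20):
    print(f"n={n:>2}  examined={orderings_examined(n):>2}  possible={orderings_possible(n)}")
```

Even at ten queued jobs an exhaustive search would have to score millions of orderings on every arrival, whereas the insertion heuristic scores ten.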
Since all of the results we present were obtained in simulation, it remains to be seen how
these methods will perform in real systems. In particular, while our simulator strives to
capture many of the intricacies that make adaptive routing and scheduling a challenge, it
also glosses over many complicating aspects of real systems. For example, in our simulator
the time required to complete a step is a deterministic function of the machine's speed,
whereas in real systems it depends on the state of the machine's memory, disks, caches, etc.
In addition, our simulator assumes that jobs never need resources on more than one machine
simultaneously, thus avoiding the need to reason about locks, pin data in caches, etc. These,
and many other, issues will need to be addressed before our methods will be deployable in
real systems. However, we believe that our top-down, system-wide perspective will play an
essential role in the emerging field of autonomic computing.
7 Related Work
The study of autonomic computing is still in its early stages. Nonetheless, there already
exists a broad body of work that relates, in terms of both methods and goals, to the efforts
described here.
The method with the closest relationship to our approach is of course the original
Q-routing technique (Boyan & Littman, 1994), of which our method is a close adaptation.
A chief limitation of canonical Q-routing is that it distinguishes between packets based only
on their ultimate destination, which is insufficient in enterprise systems where jobs always
return to their creator upon completion. In this article, we show that by expanding the
learner's state representation, the principle behind Q-routing can be successfully applied to
resource management tasks that do not exactly match the routing problem as traditionally
posed.
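The idea of keying Q-routing's estimates on a richer state can be sketched as follows. The update is the canonical Q-routing rule of Boyan and Littman (1994), in which an estimate is moved toward the locally observed delay plus the chosen neighbor's own best estimate. Everything else is an assumption made for illustration: the class name `QRouter`, the learning rate, the zero initialization, and in particular the example state tuple, which is not the representation used in this article.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple


class QRouter:
    """Tabular time-to-go estimates indexed by (state, neighbor)."""

    def __init__(self, learning_rate: float = 0.5) -> None:
        self.lr = learning_rate
        self.q: Dict[Tuple[Hashable, str], float] = defaultdict(float)

    def best_neighbor(self, state: Hashable, neighbors: Sequence[str]) -> str:
        """Greedy choice: the neighbor with the lowest estimated time-to-go."""
        return min(neighbors, key=lambda y: self.q[(state, y)])

    def update(self, state: Hashable, neighbor: str,
               local_delay: float, neighbor_estimate: float) -> None:
        """Move Q(state, neighbor) toward local delay plus the neighbor's estimate."""
        key = (state, neighbor)
        target = local_delay + neighbor_estimate
        self.q[key] += self.lr * (target - self.q[key])


# Hypothetical expanded state: job type, next step index, and creator,
# in place of the single "destination" used in canonical Q-routing.
state = ("Mail Job", 1, "CEO")
router = QRouter(learning_rate=0.5)
router.update(state, "Mail Server A", local_delay=4.0, neighbor_estimate=6.0)
print(router.q[(state, "Mail Server A")])
print(router.best_neighbor(state, ["Mail Server A", "Mail Server B"]))
```

Because the table stores estimated time-to-go for each kind of job, the same values can in principle be read by a scheduler on the same machine, which is the property the comparison with other routing methods turns on.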
In addition, there are many other approaches to network routing with the aid of
machine learning (Di Caro & Dorigo, 1998; Clark et al., 2003; Itao et al., 2001). In particular,
AntNet (Di Caro & Dorigo, 1998) uses an ant colony metaphor to create agents that
concurrently explore the network and exchange information about it. Though it could be easily
adapted to manage workloads on enterprise systems, it does not explicitly learn time-to-go
estimates for each type of packet and hence its routing tables could not easily be exploited
by CPU schedulers, as we have done with Q-routing.
The operating systems literature discusses many techniques for CPU scheduling,
including multilevel feedback queues (Tanenbaum, 2001), which are similar to the priority
Beyond the scope of routing and scheduling, there is also considerable research devoted to
using machine learning to optimize resource management in complex systems. For example,
Brauer and Weiss (1998) use reinforcement learning methods to optimize scheduling of tasks
on multiple machines. Gomez et al. (2001) use neuroevolution to optimize dynamic resource
allocation on a chip multiprocessor. Abdelzaher et al. (2002) present a technique for managing
web servers that considers, as we do, jobs of differing utility, though their method uses
only classical feedback control. Also, Yellin (2003) presents an algorithm for dynamically
selecting among component implementations, though each individual implementation must
still be manually configured. These methods are important contributions to the goal of
optimizing individual parts of complex systems but they do not address our specific focus:
the problems and possibilities of simultaneously optimizing multiple components from a
system-wide perspective.
Whereas our research employs reinforcement learning techniques, other researchers have
used supervised learning methods to address classification problems of interest to autonomic
computing. Chen et al. (2004) use decision trees to diagnose system failures, an important
problem. However, their solution does not address the question of how to deal with a failure
once it is diagnosed, which would require either manual intervention or a reinforcement
learning agent. Mesnier et al. (2004) use decision trees to classify different file types and
therefore improve disk performance. Again, such a system is useful only if we know (or can
get the computer to learn) what kind of disk behavior is best suited to each file type.
Finally, there is also research in autonomic computing that does not address machine
learning at all, but instead redesigns system architectures to create new possibilities for
adaptation. For example, Jann et al. (2003) present an architecture that allows dynamic
movement of hardware resources across logical partitions without rebooting. They do not
address how an intelligent agent might determine when resource reallocation should occur.
Hence their work, while striving for different goals, fits cohesively with our approach: as
they try to create opportunities for adaptability, we try to create agents that can exploit
those opportunities.
8 Future Work
In ongoing research, we plan to investigate new ways of applying machine learning methods
to further automate and optimize networks like the one studied in this article. In particular,
we hope to automate the decision of how frequently machines should send updates to their
neighbors. Both Load Updates and Q-Updates are more useful if they are sent more often;
however, both kinds of updates also tax precious network bandwidth. Rather than finding
the balance between these two factors through manual experimentation, we would like to
devise a network intelligent enough to determine optimal update frequencies without human
assistance.
In addition, we would like to use machine learning to determine what network topology
We hope to develop a system in which machine learning helps determine the most efficient
structure of the network when it is initially designed, when it needs to be upgraded, or when
it is repaired.
Our on-going research goal is to discover, implement, and test machine learning methods
in support of autonomic computing at all levels of a computer system. Though this initial
work is all in simulation, the true measure of our methods is whether they can impact
performance on real systems. Whenever possible, our design decisions are made with this
fact in mind. Ultimately we plan to implement and test our autonomic computing methods,
such as the Q-router and insertion scheduler, on real computer systems.
9 Conclusion
The three main contributions of this article are:
1. A concrete formulation of the autonomic computing problem in terms of the
representative task of enterprise system optimization.
2. A new vertical simulator designed to abstractly represent all aspects of a computer
system. This simulator is fully implemented and tested. It is used for all of the
experiments presented in this paper.
3. Adaptive approaches to the network routing and scheduling problems in this simulator
that out-perform reasonable benchmark policies.
The simulator that we introduce facilitates the study of autonomic computing methods
from a top-down perspective. Rather than simply optimizing individual components, we
focus on optimizing the interactions between components at a system-wide level. The results
presented here, in addition to demonstrating that machine learning methods can offer a
significant advantage for job routing and scheduling, also validate this top-down approach.
They provide evidence of the value of combining intelligent, adaptive agents at more than one
level of the system. Together these results offer hope that machine learning methods, when
applied repeatedly and in concert, can produce the robust, self-configuring, and self-repairing
systems necessary to meet tomorrow's computing needs.
Acknowledgments
We would like to thank IBM for a generous faculty award to help jump-start this research.
In particular, thanks to Russ Blaisdell for valuable technical discussions and to Ann Marie
Maynard for serving as a liaison. This research was supported in part by NSF CAREER
award IIS-0237699. Finally, we would like to thank Gerry Tesauro for his insightful
suggestions about implementing Q-routing and Emmett Witchel for his constructive comments on
Abdelzaher, T., Shin, K. G., & Bhatti, N. (2002). Performance guarantees for web server
end-systems: A control-theoretical approach. IEEE Transactions on Parallel and Distributed
Systems, 13.
Boyan, J. A., & Littman, M. L. (1994). Packet routing in dynamically changing networks:
A reinforcement learning approach. Advances in Neural Information Processing Systems
(pp. 671–678). Morgan Kaufmann Publishers, Inc.
Brachman, R. J. (2002). Systems that know what they're doing. IEEE Intelligent Systems,
17, 67–71.
Brauer, W., & Weiss, G. (1998). Multi-machine scheduling - a multi-agent learning approach.
Proceedings of the International Conference on Multi-Agent Systems (pp. 42–48).
Chen, M., Zheng, A., Lloyd, J., Jordan, M., & Brewer, E. (2004). Failure diagnosis using
decision trees. Proceedings of The International Conference on Autonomic Computing
(ICAC-04). To appear.
Clark, D. D., Partridge, C., Ramming, J. C., & Wroclawski, J. (2003). A knowledge plane
for the internet. Proceedings of ACM SIGCOMM.
Di Caro, G., & Dorigo, M. (1998). AntNet: Distributed stigmergetic control for
communications networks. Journal of Artificial Intelligence Research, 9, 317–365.
Gomez, F., Burger, D., & Miikkulainen, R. (2001). A neuroevolution method for dynamic
resource allocation on a chip multiprocessor. Proceedings of the INNS-IEEE International
Joint Conference on Neural Networks (pp. 2355–2361). IEEE.
Itao, T., Suda, T., & Aoyama, T. (2001). Jack-in-the-net: Adaptive networking architecture
for service emergence. Proceedings of the Asian-Pacific Conference on Communications.
Jann, J., Browning, L. M., & Burugula, R. S. (2003). Dynamic reconfiguration: Basic
building blocks for autonomic computing on IBM pSeries servers. IBM Systems Journal,
42, 29–37.
Kephart, J. O., & Chess, D. M. (2003). The vision of autonomic computing. Computer, 36,
41–50.
Mesnier, M., Thereska, E., Ellard, D., Ganger, G. R., & Seltzer, M. (2004). File classification
in self-* storage systems. Proceedings of The International Conference on Autonomic
Computing (ICAC-04). To appear.
Stone, P., & Veloso, M. (1999). Team-partitioned, opaque-transition reinforcement
learning. In M. Asada and H. Kitano (Eds.), RoboCup-98: Robot soccer world cup II. Berlin:
Springer Verlag. Also in Proceedings of the Third International Conference on Autonomous
MA: MIT Press.
Tanenbaum, A. (2001). Modern operating systems. Englewood Cliffs, NJ: Prentice Hall.
Walsh, W. E., Tesauro, G., Kephart, J. O., & Das, R. (2004). Utility functions in autonomic
systems. Proceedings of the International Conference on Autonomic Computing. To appear.
Watkins, C. J. C. H. (1989). Learning from delayed rewards. Doctoral dissertation, King's
College, Cambridge, UK.
Yellin, D. M. (2003). Competitive algorithms for the dynamic selection of component