via Linear Programming
Daniela P. de Farias
Department of Management Science and Engineering
Stanford University
Stanford, CA 94305
pucci@stanford.edu
Benjamin Van Roy
Department of Management Science and Engineering
Stanford University
Stanford, CA 94305
bvr@stanford.edu
Abstract
The curse of dimensionality gives rise to prohibitive computational requirements that render infeasible the exact solution of large-scale stochastic control problems. We study an efficient method based on linear programming for approximating solutions to such problems. The approach "fits" a linear combination of pre-selected basis functions to the dynamic programming cost-to-go function. We develop bounds on the approximation error and present experimental results in the domain of queueing network control, providing empirical support for the methodology.
1 Introduction
Dynamic programming offers a unified approach to solving problems of stochastic control. Central to the methodology is the cost-to-go function, which can be obtained by solving Bellman's equation. The domain of the cost-to-go function is the state space of the system to be controlled, and dynamic programming algorithms compute and store a table consisting of one cost-to-go value per state. Unfortunately, the size of a state space typically grows exponentially in the number of state variables. Known as the curse of dimensionality, this phenomenon renders dynamic programming intractable in the face of problems of practical scale.
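To make the exponential growth concrete, here is a minimal sketch that counts the entries of a cost-to-go table when every state variable takes the same number of values. The function name and the choice of 10 values per variable are illustrative assumptions, not taken from the paper.

```python
def table_size(values_per_variable: int, num_variables: int) -> int:
    """Number of cost-to-go values stored when there is one value per state."""
    return values_per_variable ** num_variables


# Hypothetical setting: each state variable takes 10 values.
for n in (1, 2, 4, 8, 16):
    print(f"{n:2d} state variables -> {table_size(10, n)} table entries")
```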
One approach to dealing with this difficulty is to generate an approximation within a parameterized class of functions, in a spirit similar to that of statistical regression. The focus of this paper is on linearly parameterized functions: one tries to approximate the cost-to-go function $J^*$ by a linear combination of prespecified basis functions. Note that this scheme depends on two important preconditions. First, a suitable choice of basis functions requires some practical experience or theoretical analysis that provides rough information on the shape of the function to be approximated. "Regularities" associated with the function, for example, can guide the choice of representation. Second, we need an efficient algorithm that computes an appropriate linear combination.
The algorithm we study is based on a linear programming formulation, originally proposed by Schweitzer and Seidmann [5], that generalizes the linear programming approach to exact dynamic programming, originally introduced by Manne [4]. We present an error bound that characterizes the quality of approximations produced by the linear programming approach. The error is characterized in relative terms, compared against the "best possible" approximation of the optimal cost-to-go function given the selection of basis functions. This is the first such error bound for any algorithm that approximates cost-to-go functions of general stochastic control problems by computing weights for arbitrary collections of basis functions.
2 Stochastic control and linear programming
We consider discrete-time stochastic control problems involving a finite state space $S$ of cardinality $|S| = N$. For each state $x \in S$, there is a finite set of available actions $A_x$. Taking action $a \in A_x$ when the current state is $x$ incurs cost $g_a(x)$. State transition probabilities $p_a(x,y)$ represent, for each pair $(x,y)$ of states and each action $a \in A_x$, the probability that the next state will be $y$ given that the current state is $x$ and the current action is $a \in A_x$.
A policy $u$ is a mapping from states to actions. Given a policy $u$, the dynamics of the system follow a Markov chain with transition probabilities $p_{u(x)}(x,y)$. For each policy $u$, we define a transition matrix $P_u$ whose $(x,y)$th entry is $p_{u(x)}(x,y)$.
The problem of stochastic control amounts to selection of a policy that optimizes a given criterion. In this paper, we will employ as an optimality criterion infinite-horizon discounted cost of the form
\[ J_u(x) = E\left[ \sum_{t=0}^{\infty} \alpha^t g_u(x_t) \,\Big|\, x_0 = x \right], \]
where $g_u(x)$ is used as shorthand for $g_{u(x)}(x)$ and the discount factor $\alpha \in (0,1)$ reflects inter-temporal preferences. Optimality is attained by any policy that is greedy with respect to the optimal cost-to-go function $J^*(x) = \min_u J_u(x)$ (a policy $u$ is called greedy with respect to $J$ if $T_u J = TJ$).
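The discounted cost of a fixed policy satisfies the recursion $J_u = g_u + \alpha P_u J_u$, so it can be approximated by repeated substitution. The sketch below does this for a small, hypothetical two-state chain; the function name, tolerance, and example data are assumptions made for illustration.

```python
from typing import List


def evaluate_policy(g_u: List[float], P_u: List[List[float]], alpha: float,
                    tol: float = 1e-12, max_iter: int = 100_000) -> List[float]:
    """Approximate J_u by iterating J <- g_u + alpha * P_u J from J = 0."""
    n = len(g_u)
    J = [0.0] * n
    for _ in range(max_iter):
        J_new = [g_u[x] + alpha * sum(P_u[x][y] * J[y] for y in range(n))
                 for x in range(n)]
        if max(abs(a - b) for a, b in zip(J_new, J)) < tol:
            return J_new
        J = J_new
    return J


# Hypothetical chain: state 0 costs 1 per step; state 1 is absorbing and free.
g_u = [1.0, 0.0]
P_u = [[0.5, 0.5],
       [0.0, 1.0]]
J_u = evaluate_policy(g_u, P_u, alpha=0.9)
print(J_u)
```

For this chain the recursion can be solved by hand: $J_u(1) = 0$ and $J_u(0) = 1/(1 - 0.45)$, which the iteration reproduces to within the tolerance.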
Let us define operators $T_u$ and $T$ by $T_u J = g_u + \alpha P_u J$ and $TJ = \min_u (g_u + \alpha P_u J)$. The optimal cost-to-go function solves uniquely Bellman's equation $J = TJ$. Dynamic programming offers a number of approaches to solving this equation; one of particular relevance to our paper makes use of linear programming, as we will now discuss. Consider the problem
\[ \max\ c'J \quad \text{s.t.} \quad TJ \ge J, \tag{1} \]
where $c$ is a vector with positive components, which we will refer to as state-relevance weights. It can be shown that any feasible $J$ satisfies $J \le J^*$. It follows that, for any set of positive weights $c$, $J^*$ is the unique solution to (1).
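The feasibility claims can be checked numerically on a toy problem. The following sketch computes $J^*$ by value iteration (a standard technique, used here instead of solving the linear program) and then tests the constraint $TJ \ge J$ for three candidates. The problem data are hypothetical, and the sketch assumes, for simplicity, that every action is available in every state.

```python
from typing import List


def bellman(J: List[float], g, P, alpha: float) -> List[float]:
    """(TJ)(x) = min_a [ g[a][x] + alpha * sum_y P[a][x][y] * J[y] ]."""
    n = len(J)
    return [min(g[a][x] + alpha * sum(P[a][x][y] * J[y] for y in range(n))
                for a in range(len(g)))
            for x in range(n)]


def value_iteration(g, P, alpha: float, iters: int = 2000) -> List[float]:
    """Apply T repeatedly starting from zero; converges since T is a contraction."""
    J = [0.0] * len(g[0])
    for _ in range(iters):
        J = bellman(J, g, P, alpha)
    return J


def is_feasible(J: List[float], g, P, alpha: float, eps: float = 1e-9) -> bool:
    """Check the constraint TJ >= J componentwise, up to rounding."""
    return all(t >= j - eps for t, j in zip(bellman(J, g, P, alpha), J))


# Hypothetical 2-state, 2-action problem: g[a][x] is a cost, P[a][x][y] a probability.
g = [[1.0, 2.0], [1.5, 0.5]]
P = [[[0.9, 0.1], [0.1, 0.9]],
     [[0.2, 0.8], [0.7, 0.3]]]
alpha = 0.9
J_star = value_iteration(g, P, alpha)

print(is_feasible(J_star, g, P, alpha))                      # candidate J*
print(is_feasible([0.0, 0.0], g, P, alpha))                  # candidate 0
print(is_feasible([j + 1.0 for j in J_star], g, P, alpha))   # candidate J* + 1
```

With nonnegative costs the zero vector is feasible and lies below $J^*$, while $J^* + 1$ violates the constraint because $T(J^* + 1) = J^* + \alpha$.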
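Note that each constraint $(TJ)(x) \ge J(x)$ is equivalent to a set of constraints $g_a(x) + \alpha \sum_{y \in S} p_a(x,y) J(y) \ge J(x)$, $\forall a \in A_x$, so that the optimization problem (1) can be recast as a linear program, which we will refer to as the exact LP. For problems of practical scale, however, the state space is enormous due to the curse of dimensionality. Consequently, the linear program of interest involves prohibitively large numbers of variables and constraints. The approximation algorithm we study reduces dramatically the number of variables.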
Let us now introduce the linear programming approach to approximate dynamic programming. Given pre-selected basis functions $\phi_1, \ldots, \phi_K$, define a matrix $\Phi = [\phi_1 \cdots \phi_K]$. With an aim of computing a weight vector $\tilde r \in \Re^K$ such that $\Phi \tilde r$ is a close approximation to $J^*$, one might pose the following optimization problem:
\[ \max\ c'\Phi r \quad \text{s.t.} \quad T\Phi r \ge \Phi r. \tag{2} \]
Given a solution $\tilde r$, one might then hope to generate near-optimal decisions by using a policy that is greedy with respect to $\Phi \tilde r$.
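Extracting a greedy policy from an approximation $\Phi r$ only requires a one-step lookahead in each state. The sketch below illustrates this; the basis matrix, weights, and two-state problem are hypothetical, and every action is assumed to be available in every state.

```python
from typing import List


def greedy_policy(J: List[float], g, P, alpha: float) -> List[int]:
    """Return u with u[x] minimizing g[a][x] + alpha * sum_y P[a][x][y] * J[y]."""
    n = len(J)

    def q(a: int, x: int) -> float:
        return g[a][x] + alpha * sum(P[a][x][y] * J[y] for y in range(n))

    return [min(range(len(g)), key=lambda a: q(a, x)) for x in range(n)]


# Hypothetical basis matrix: row x holds (phi_1(x), phi_2(x)).
Phi = [[1.0, 0.0],
       [1.0, 1.0]]
r = [0.0, 10.0]
J = [sum(p * w for p, w in zip(row, r)) for row in Phi]   # J = Phi r

# Action 0 is cheap but moves to the expensive state 1; action 1 moves to state 0.
g = [[1.0, 1.0], [1.2, 1.2]]
P = [[[0.0, 1.0], [0.0, 1.0]],
     [[1.0, 0.0], [1.0, 0.0]]]
u = greedy_policy(J, g, P, alpha=0.9)
print(u)
```

Here the greedy policy picks action 1 in both states, since paying 1.2 now avoids the discounted cost of 9 associated with landing in state 1.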
As with the case of exact dynamic programming, the optimization problem (2) can be recast as a linear program. We will refer to this problem as the approximate LP. Note that, though the number of variables is reduced to $K$, the number of constraints remains as large as in the exact LP. Fortunately, we expect that most of the constraints will become irrelevant, and solutions to the linear program can be approximated efficiently, as demonstrated in [3].
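In the special case $K = 1$ (an assumption made here purely for illustration), every constraint of the approximate LP is a bound on a single scalar $r$, so the problem can be solved without an LP solver. The sketch below handles that case only; for $K > 1$ a general-purpose LP solver would be needed. All names and data are hypothetical.

```python
import math
from typing import List


def solve_single_basis_alp(phi: List[float], c: List[float], g, P, alpha: float) -> float:
    """Solve max (c'phi) r  s.t.  g[a][x] + alpha*(P_a phi)(x)*r >= phi[x]*r for all (x, a).

    Each constraint reads r * coef <= g[a][x] with
    coef = phi[x] - alpha * (P_a phi)(x). Assumes c'phi > 0.
    """
    n = len(phi)
    upper, lower = math.inf, -math.inf
    for a in range(len(g)):
        for x in range(n):
            coef = phi[x] - alpha * sum(P[a][x][y] * phi[y] for y in range(n))
            if coef > 0:
                upper = min(upper, g[a][x] / coef)
            elif coef < 0:
                lower = max(lower, g[a][x] / coef)
            elif g[a][x] < 0:
                raise ValueError("a constraint with zero coefficient is violated")
    if sum(ci * pi for ci, pi in zip(c, phi)) <= 0:
        raise ValueError("this sketch assumes c'phi > 0")
    if upper == math.inf or lower > upper:
        raise ValueError("problem is unbounded or infeasible")
    return upper   # positive objective coefficient: push r to its upper bound


# Hypothetical problem with a constant basis function phi(x) = 1.
g = [[1.0, 2.0], [1.5, 0.5]]
P = [[[0.9, 0.1], [0.1, 0.9]],
     [[0.2, 0.8], [0.7, 0.3]]]
r_tilde = solve_single_basis_alp(phi=[1.0, 1.0], c=[0.5, 0.5], g=g, P=P, alpha=0.9)
print(r_tilde)
```

With a constant basis function every coefficient equals $1 - \alpha$, so the solution is the smallest one-step cost divided by $1 - \alpha$.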
3 Error Bounds for the Approximate LP
When the optimal cost-to-go function lies within the span of the basis functions, solution of the approximate LP yields the exact optimal cost-to-go function. Unfortunately, it is difficult in practice to select a set of basis functions that contains the optimal cost-to-go function within its span. Instead, basis functions must be based on heuristics and simplified analyses. One can only hope that the span comes close to the desired cost-to-go function.
For the approximate LP to be useful, it should deliver good approximations when the cost-to-go function is near the span of selected basis functions. In this section, we present a bound that ensures desirable results of this kind.
To set the stage for development of an error bound, let us establish some notation. First, we introduce the weighted norms, defined by
\[ \|J\|_{1,\gamma} = \sum_{x \in S} \gamma(x) |J(x)|, \qquad \|J\|_{\infty,\gamma} = \max_{x \in S} \gamma(x) |J(x)|, \]
for any $\gamma : S \mapsto \Re_+$. Note that both norms allow for uneven weighting of errors across the state space.
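Both norms are straightforward to compute when $J$ and the weights are stored as lists indexed by state. A minimal sketch, with hypothetical function names and data:

```python
from typing import List


def norm_1(J: List[float], gamma: List[float]) -> float:
    """Weighted 1-norm: sum over states of gamma(x) * |J(x)|."""
    return sum(w * abs(v) for v, w in zip(J, gamma))


def norm_inf(J: List[float], gamma: List[float]) -> float:
    """Weighted max-norm: max over states of gamma(x) * |J(x)|."""
    return max(w * abs(v) for v, w in zip(J, gamma))


J = [1.0, -2.0, 4.0]
gamma = [0.5, 0.25, 0.25]
print(norm_1(J, gamma), norm_inf(J, gamma))
```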
We also introduce an operator $H$, defined by
\[ (HV)(x) = \max_{a \in A_x} \sum_{y} p_a(x,y) V(y), \]
for all $V : S \mapsto \Re$. For any $V$, $(HV)(x)$ represents the maximum expected value of $V(y)$ if the current state is $x$ and $y$ is a random variable representing the next state. Based on this operator, we define a scalar
\[ k_V = \max_x \frac{V(x)}{V(x) - \alpha (HV)(x)}. \tag{3} \]
We refer to $k_V$ as a "Lyapunov stability factor," in a sense that we will now explain. In the upcoming theorem, we will only be concerned with functions $V$ that are positive and that make $k_V$ nonnegative. Also, our error bound for the approximate LP will grow proportionately with $k_V$, and we therefore want $k_V$ to be small. At a minimum, $k_V$ should be finite, which translates to a condition
\[ \alpha (HV)(x) < V(x), \quad \forall x \in S. \tag{4} \]
If $\alpha$ were equal to 1, this would look like a Lyapunov stability condition: the maximum expected value $(HV)(x)$ at the next time step must be less than the current value $V(x)$. In general, $\alpha$ is less than 1, and this introduces some slack in the condition. Note also that $k_V$ becomes smaller as the $(HV)(x)$'s become small relative to the $V(x)$'s. Hence, $k_V$ conveys a degree of "stability," with smaller values representing stronger stability.
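The operator $H$ and the factor $k_V$ in (3) can be evaluated directly on a small problem. In the sketch below the transition data are hypothetical, every action is assumed available in every state, and the function returns infinity when condition (4) fails.

```python
from typing import List


def H(V: List[float], P) -> List[float]:
    """(HV)(x) = max_a sum_y P[a][x][y] * V[y]."""
    n = len(V)
    return [max(sum(P[a][x][y] * V[y] for y in range(n)) for a in range(len(P)))
            for x in range(n)]


def lyapunov_factor(V: List[float], P, alpha: float) -> float:
    """k_V = max_x V(x) / (V(x) - alpha * (HV)(x)), or inf if condition (4) fails."""
    HV = H(V, P)
    if any(alpha * h >= v for h, v in zip(HV, V)):
        return float("inf")
    return max(v / (v - alpha * h) for v, h in zip(V, HV))


P = [[[0.9, 0.1], [0.1, 0.9]],
     [[0.2, 0.8], [0.7, 0.3]]]
k_const = lyapunov_factor([1.0, 1.0], P, alpha=0.9)     # constant V
k_steep = lyapunov_factor([1.0, 100.0], P, alpha=0.9)   # V grows too fast
print(k_const, k_steep)
```

A constant $V$ always gives $k_V = 1/(1 - \alpha)$ because $HV = V$, whereas the second choice violates (4) in state 0 and so has no finite factor.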
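We are now ready to state our main result. For any given function $V$ mapping $S$ to positive reals, we use $1/V$ as shorthand for the function $x \mapsto 1/V(x)$.
Theorem 3.1 [2] Let $\tilde r$ be a solution of the approximate LP. Then, for any $v \in \Re^K$ such that $(\Phi v)(x) > 0$ for all $x \in S$ and $\alpha H \Phi v < \Phi v$,
\[ \|J^* - \Phi \tilde r\|_{1,c} \le 2 k_{\Phi v} (c'\Phi v) \min_r \|J^* - \Phi r\|_{\infty,1/\Phi v}. \tag{5} \]
A proof of Theorem 3.1 can be found in the long version of this paper [2].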
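One way to read the bound is to specialize it under an extra assumption that the theorem itself does not make: suppose the constant function $e$, with $e(x) = 1$ for all $x$, lies in the span of the basis functions, say $\Phi v = e$. Each row of transition probabilities sums to one, so $He = e$, and the quantities in (5) simplify as follows.

```latex
% Assumption for this illustration: \Phi v = e, the all-ones function.
\begin{align*}
  \alpha H e &= \alpha e < e
    && \text{(the condition of Theorem 3.1 holds since } \alpha < 1\text{)} \\
  k_e &= \max_x \frac{1}{1 - \alpha} = \frac{1}{1 - \alpha} \\
  \|J\|_{\infty, 1/e} &= \max_x |J(x)| = \|J\|_\infty \\
  \|J^* - \Phi \tilde r\|_{1,c}
    &\le \frac{2\, c'e}{1 - \alpha} \, \min_r \|J^* - \Phi r\|_\infty
\end{align*}
```

If in addition $c$ is a probability distribution, then $c'e = 1$ and the constant in the bound is $2/(1 - \alpha)$.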
We highlight some implications of Theorem 3.1. First, the error bound (5) tells us that the approximation error yielded by the approximate LP is proportional to the error associated with the best possible approximation relative to a certain norm $\|\cdot\|_{\infty,1/\Phi v}$. Hence we expect that the approximate LP will have reasonable behavior: if the choice of basis functions is appropriate, the approximate LP should yield a relatively good approximation to the cost-to-go function, as long as the constants $k_{\Phi v}$ and $c'\Phi v$ remain small.
Note that on the left-hand side of (5), we measure the approximation error with the weighted norm $\|\cdot\|_{1,c}$. Recall that the weight vector $c$ appears in the objective function of the approximate LP (2) and must be chosen. In approximating the solution to a given stochastic control problem, it seems sensible to weight more heavily portions of the state space that are visited frequently, so that accuracy will be emphasized in such regions. As discussed in [2], it seems reasonable that the weight vector $c$ should be chosen to reflect the relative importance of each state.
Finally, note that the Lyapunov function $\Phi v$ plays a central role in the bound of Theorem 3.1. Its choice influences three terms on the right-hand side of the bound:
1. the error $\min_r \|J^* - \Phi r\|_{\infty,1/\Phi v}$;
2. the Lyapunov stability factor $k_{\Phi v}$;
3. the inner product $c'\Phi v$ with the state-relevance weights.
An appropriately chosen Lyapunov function should make all three of these terms relatively small. Furthermore, for the bound to be useful in practical contexts, these terms should not grow much with problem size. We now illustrate with an application in queueing problems how a suitable Lyapunov function could be found.
Consider a single reentrant line with $d$ queues and finite buffers of size $B$. We assume that exogenous arrivals occur at queue 1 with probability $p < 1/2$. The state $x \in \Re^d$ indicates the number of jobs in each queue. The cost per stage incurred at state $x$ is given by
\[ g(x) = \frac{|x|}{d} = \frac{1}{d} \sum_{i=1}^{d} x_i, \]
the average number of jobs per queue.
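As discussed in [2], under certain stability assumptions we expect that the optimal cost-to-go function should satisfy
\[ 0 \le J^*(x) \le \frac{\rho_2}{d} x'x + \frac{\rho_1}{d} e'x + \rho_0, \]
for some positive scalars $\rho_0$, $\rho_1$ and $\rho_2$ independent of $d$. We consider a Lyapunov function $V(x) = \frac{1}{d} x'x + C$ for some constant $C > 0$, which implies
\[ \min_r \|J^* - \Phi r\|_{\infty,1/V} \le \|J^*\|_{\infty,1/V} \le \max_{x \ge 0} \frac{\rho_2 x'x + \rho_1 e'x + d\rho_0}{x'x + dC} \le \rho_2 + \rho_1 + \frac{\rho_0}{C}, \]
and the above bound is independent of the number of queues in the system.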
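Now let us study $k_V$. We have
\[ \alpha (HV)(x) \le \alpha \left[ p \left( \frac{1}{d} x'x + \frac{2x_1 + 1}{d} + C \right) + (1 - p) \left( \frac{1}{d} x'x + C \right) \right] \le V(x) \left( \alpha + p\, \frac{2x_1 + 1}{x_1^2 + dC} \right), \]
and it is clear that, for $C$ sufficiently large and independent of $d$, there is a $\beta < 1$ independent of $d$ such that $\alpha HV \le \beta V$, and therefore $k_V \le \frac{1}{1 - \beta}$.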
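The existence of such a $\beta$ can be checked numerically by maximizing the factor $\alpha + p\,(2x_1 + 1)/(x_1^2 + dC)$ over the possible values of $x_1$. The sketch below does so for hypothetical parameter values ($\alpha = 0.9$, $p = 0.4$, $C = 100$, $B = 1000$); none of these numbers come from the paper.

```python
def beta_bound(alpha: float, p: float, d: int, C: float, B: int) -> float:
    """alpha + p * max over 0 <= x1 <= B of (2*x1 + 1) / (x1**2 + d*C)."""
    return alpha + p * max((2 * x1 + 1) / (x1 * x1 + d * C) for x1 in range(B + 1))


# Hypothetical parameters; beta should stay below 1 for every number of queues d.
betas = {d: beta_bound(alpha=0.9, p=0.4, d=d, C=100.0, B=1000) for d in (1, 10, 100)}
for d, beta in betas.items():
    print(d, beta, 1.0 / (1.0 - beta))   # beta and the resulting bound on k_V
```

Because $d$ only enters through the denominator $x_1^2 + dC$, the value computed for $d = 1$ also bounds the factor for every larger $d$, which is the sense in which $\beta$ is independent of $d$.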
Finally, let us consider $c'V$. Discussion presented in [2] suggests that one might want to choose $c$ so as to reflect the stationary state distribution. We expect that under some stability assumptions, the tail of the stationary state distribution will have an upper bound with geometric decay [1]. Therefore we let $c(x) = \left( \frac{1 - \xi}{1 - \xi^{B+1}} \right)^d \xi^{|x|}$, for some $0 < \xi < 1$. In this case, $c$ is equivalent to the conditional joint distribution of $d$ independent and identically distributed geometric random variables conditioned on the event that they are less than $B + 1$, and we have
\[ c'V = E\left[ \frac{1}{d} \sum_{i=1}^{d} X_i^2 + C \,\Big|\, X_i < B + 1,\ i = 1, \ldots, d \right] < \frac{2\xi^2}{(1 - \xi)^2} + \frac{\xi}{1 - \xi} + C, \]
where $X_i$, $i = 1, \ldots, d$, are identically distributed geometric random variables with parameter $1 - \xi$. It follows that $c'V$ is uniformly bounded over the number of queues.
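Since the $X_i$ are independent and identically distributed, $c'V$ reduces to the second moment of a single truncated geometric variable plus $C$, whatever the value of $d$. The sketch below computes that moment exactly by summation and compares it with the bound; the function names and the values $\xi = 0.9$, $C = 1$ are illustrative assumptions.

```python
def truncated_second_moment(xi: float, B: int) -> float:
    """E[X^2 | X < B + 1] when P(X = k) is proportional to xi**k for k = 0, ..., B."""
    weights = [xi ** k for k in range(B + 1)]
    return sum(k * k * w for k, w in enumerate(weights)) / sum(weights)


def cV_bound(xi: float, C: float) -> float:
    """Right-hand side of the bound on c'V."""
    return 2 * xi ** 2 / (1 - xi) ** 2 + xi / (1 - xi) + C


xi, C = 0.9, 1.0
for B in (5, 20, 100):
    print(B, truncated_second_moment(xi, B) + C, cV_bound(xi, C))
```

The bound equals the second moment of the untruncated geometric variable plus $C$, so truncation at $B$ can only make the left-hand side smaller.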
This example shows that the terms involved in the error bound (5) are uniformly bounded both in the number of states in the system and in the number of state variables; hence the behavior of the approximate LP does not deteriorate as the problem size increases.
We finally present a numerical experiment to further illustrate the performance of the approximate LP.
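[Figure 1: System for Example 3.2. Arrival probabilities: $\lambda_1 = \lambda_2 = 1/11.5$. Departure probabilities: $\mu_1 = 4/11.5$, $\mu_2 = 2/11.5$, $\mu_3 = 3/11.5$, $\mu_4 = 3.1/11.5$, $\mu_5 = 3/11.5$, $\mu_6 = 2.2/11.5$, $\mu_7 = 3/11.5$, $\mu_8 = 2.5/11.5$.]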
Policy         ALP ($\xi = 0.9$)   LBFS    FIFO    LONG
Average Cost   136.7               153.3   163.3   168.3
Table 1: Average number of jobs after 50,000,000 simulation steps.
3.2 An Eight-Dimensional Queueing Network
We consider a queueing network with eight queues. The system is depicted in Figure 1, with arrival ($\lambda_i$, $i = 1, 2$) and departure ($\mu_i$, $i = 1, \ldots, 8$) probabilities indicated. The state $x \in \Re^8$ represents the number of jobs in each queue. The cost per stage is $g(x) = |x|$, and the discount factor $\alpha$ is 0.995. Actions $a \in \{0,1\}^8$ indicate which queues are being served; $a_i = 1$ iff a job from queue $i$ is being processed. We consider only non-idling policies and, at each time step, a server processes jobs from one of its queues exclusively.
We choose $c$ of the form $c(x) = (1 - \xi)^8 \xi^{|x|}$. The basis functions are chosen to span all polynomials in $x$ of degree 2; therefore, the approximate LP has 47 variables. Constraints $(T\Phi r)(x) \ge (\Phi r)(x)$ for the approximate LP are generated by sampling 5000 states according to the distribution associated with $c$. Experiments were performed for $\xi = 0.85$, $0.9$ and $0.95$, and $\xi = 0.9$ yielded the policy with smallest average cost.
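Sampling states from $c(x) = (1 - \xi)^8 \xi^{|x|}$ amounts to drawing eight independent geometric variables, one per queue. The sketch below uses inverse-CDF sampling with Python's standard `random` module; it is an illustration of the sampling step only, not the authors' implementation, and it ignores any buffer limit. The function name and seed are assumptions.

```python
import math
import random
from typing import List


def sample_state(xi: float, dim: int, rng: random.Random) -> List[int]:
    """Draw x with independent coordinates, P(x_i = k) = (1 - xi) * xi**k for k = 0, 1, ..."""
    # For U uniform on (0, 1], floor(log(U) / log(xi)) >= k exactly when U <= xi**k.
    return [int(math.log(1.0 - rng.random()) / math.log(xi)) for _ in range(dim)]


rng = random.Random(0)   # fixed seed so the sketch is repeatable
states = [sample_state(0.9, 8, rng) for _ in range(5000)]
mean_jobs_per_queue = sum(sum(s) for s in states) / (8 * len(states))
print(states[0], mean_jobs_per_queue)
```

Each sampled state would then contribute its constraints $(T\Phi r)(x) \ge (\Phi r)(x)$ to the reduced linear program. For $\xi = 0.9$ the mean of each coordinate is $\xi/(1 - \xi) = 9$, which the sample average should approximate.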
We compared the performance of the policy yielded by the approximate LP (ALP) with that of first-in-first-out (FIFO), last-buffer-first-serve (LBFS)¹ and a policy that serves the longest queue in each server (LONG). The average number of jobs in the system for each policy was estimated by simulation. Results are shown in Table 1. The policy generated by the approximate LP performs significantly better than each of the heuristics, yielding more than 10% improvement over LBFS, the second best policy. We expect that even better results could be obtained by refining the choice of basis functions and state-relevance weights.
¹ LBFS serves the job that is closest to leaving the system; for example, if there are jobs in queue 2 and in queue 6, a job from queue 2 is processed since it will leave the system after going through only one more queue, whereas the job from queue 6 will still have to go through two more queues. We also choose to assign higher priority to queue 8 than to [...]

4 Closing Remarks and Open Issues
In this paper we studied the linear programming approach to approximate dynamic programming for stochastic control problems as a means of alleviating the curse of dimensionality. [...] basis functions. The bounds were shown to be uniformly bounded in the number of states and state variables in certain queueing problems.
Several questions remain open and are the object of future investigation: Can the state-relevance weights in the objective function be chosen in some adaptive way? Can we add robustness to the approximate LP algorithm to account for errors in the estimation of costs and transition probabilities, i.e., design an alternative LP with meaningful performance bounds when problem parameters are just known to be in a certain range? How do our results extend to the average cost case? How do our results extend to the infinite-state case? How does the quality of the approximate value function, measured by the weighted $L_1$ norm, translate into actual performance of the associated greedy policy?
Acknowledgements
This research was supported by NSF CAREER Grant ECS-9985229, by the ONR under Grant MURI N00014-00-1-0637, and by an IBM Research Fellowship.
References
[1] Bertsimas, D., Gamarnik, D. & Tsitsiklis, J., "Performance of Multiclass Markovian Queueing Networks via Piecewise Linear Lyapunov Functions," submitted to Annals of Applied Probability, 2000.
[2] de Farias, D.P. & Van Roy, B., "The Linear Programming Approach to Approximate Dynamic Programming," submitted for publication, 2001.
[3] de Farias, D.P. & Van Roy, B., "On Constraint Sampling for Approximate Linear Programming," submitted for publication, 2001.
[4] Manne, A.S., "Linear Programming and Sequential Decisions," Management Science 6, No. 3, pp. 259-267, 1960.
[5] Schweitzer, P. & Seidmann, A., "Generalized Polynomial Approximations in Markovian Decision Processes," Journal of Mathematical Analysis and Applications 110, pp. 568-582, 1985.