REIHE COMPUTATIONAL INTELLIGENCE
COLLABORATIVE RESEARCH CENTER 531
Design and Management of Complex Technical Processes
andSystems by meansofComputational IntelligenceMethods
Approximations by OBDDs and the Variable
Ordering Problem
Matthias Krause Petr Savicky Ingo Wegener
No. CI-58/99
Technical Report ISSN 1433-3325 January 1999
Secretary of the SFB531 University ofDortmund Dept. ofComputerScience/XI
44221DortmundGermany
This work is a pro duct of the Collab orative Research Center 531, \Computational
Intelligence", atthe UniversityofDortmundand wasprintedwithnancialsupp ort of
VARIABLE ORDERING PROBLEM Matthias Krause 1 PetrSavick y 2 Ingo Wegener 3 1
TheoretischeInformatik,Univ.Mannheim,68131Mannheim,Germany
e-mail:[email protected]
2
InstituteofComputerScience,AcademyofSciencesofCzechRepublic,
Po dvo darenskouvez2,18207Praha8,CzechRepublic
e-mail:[email protected]
3
FBInformatik,LS2,Univ.Dortmund,44221Dortmund,Germany
e-mail:[email protected]
Abstract
Ordered binary decision diagrams (OBDDs)andtheir variantsare
moti-vated by the need to represent Bo olean functions in applications. Research
concerning these application s leads also to problems and results interesting
fromtheoretical p oint ofview. Inthispap er, metho dsfromcommunication
complexityandinformationtheoryarecombinedtoprovethatthedirect
stor-ageaccessfunctionandtheinnerpro ductfunctionhavethefollowingprop erty.
Theyhavelinear -OBDDsizeforsomevariableorderingand,formost
vari-ableorderings 0
,allfunctionswhichapproximatethemonconsiderablymore
than halfof theinputs, need exp onential 0
-OBDD size. Theseresults have
implicationsfortheuseofOBDDsingeneticprogramming.
1 INTRODUCTION
Branchingprograms (BPs) orbinary decision diagrams(BDDs), which is just
an-other name, are representations of Bo olean functions f 2 B
n
, i.e., f:f0;1g n
!
f0;1g. They are compact but not useful for manipulationsof Bo olean functions,
sinceop erationslikesatisabilitytest,equivalencetestorminimizationleadtohard
problems. Bryant[6] hasintro duced -OBDDs(ordered BDDs),since they canb e
manipulatedeÆciently(see [7]and[19]forsurveys ontheareas ofapplication).
Denition1.1 A p ermutation on f1;:::;ng describ es the variable ordering
x
(1) ;:::;x
(n)
. A -OBDDisadirectedacyclicgraphG=(V;E)withonesource.
Eachsinkislab elledbyaBo oleanconstantandeachinnerno debyaBo olean
vari-able. Innerno deshavetwooutgoingedgesonelab elledby0andtheotherby1. If
an edge leads fromanx
i
-no deto an x
j
-no de, then 1
(i)hasto b e smaller than
1
(j), i.e.,theedges have to resp ect thevariable ordering. The -OBDD
repre-sentsthefunctionf 2B
n
dened inthefollowingway. Theinputaactivates,for
x
i
-no des,theoutgoinga
i
-edge. Thenf(a) isequaltothelab elofthesinkreached
bythe uniqueactivated pathstartingatthesource. Thesizeof Gis measuredby
thenumb erofitsno des. An OBDDisa -OBDDforanarbitrary .
One-waycommunicationcomplexity(seee.g. [11],[14])leadstolowerb oundsfor
OBDDs. Thismetho disalmostthesameascountingthenumb erofsubfunctionsof
f iftherstvariablesaccording tothevariableorderingarereplacedbyconstants.
Therearefunctions,forwhichtheOBDDsizeisverysensitivetothechosenvariable
1
Supp ortedbyDFGgrantKr1521/3-1.
2
Theresearchwaspartiallysupp ortedbyGAoftheCzechRepublic,GrantNo. 201/98/0717.
3
Supp ortedbyDFGgrantWe1066/8-1andbytheDFGaspartoftheCollab orativeResearch
optimalvariableorderingforf,see[5],oreventoapproximatetheoptimalvariable
ordering, see[16].
We willneedthefollowingtwofunctions.
Denition1.2 i) For n =2 k
, thedirect storage access function(ormultiplexer)
on k+n variables is thefunction DSA
n (a 0 ;:::;a k 1 ;x 0 ;:::;x n 1 )= x jaj , where
jajisthenumb erwhosebinaryrepresentation is(a
0 ;:::;a
k 1 ).
ii) For any even n, the inner pro duct function on n variables is the function
IP n (x 1 ;:::;x n )=x 1 x 2 x 3 x 4 :::x n 1 x n .
Clearly,these twofunctions have -OBDD ofsize O (n) fortheorderingof the
varibles used in the denition of the functions. On the other hand, they need
exp onential -OBDDsize formostofthevariableorderings(a fractionof 1 n "
forDSA
n
andevenafractionof1 2 "n
forIP
n
, see[20]). Another exampleofa
functionwithasimilarprop erty,so-calleddisjointquadraticform,mayb eobtained
byreplacingbydisjunctionintheexpressiondescribingIP.
These results are statedfor error-free representations of f and, up to now, to
theb estofourknowledge,nob o dyhaslo okedatrepresentationsofapproximations
of f froma theoreticalp oint of view. In this pap er, we investigatethe inuence
ofthevariableorderingforapproximaterepresentations offunctions. Ifnotstated
otherwise, by a random input x~ we mean an input x~ chosen from the uniform
distributiononf0;1g n . Denition1.3 A function g2 B n is a c-approximation of f 2B n if Pr (f(~x) =
g (x))~ c forarandom input x.~
Oneofthetwoconstantfunctions0and1alwaysisa1=2-approximation.Hence,
weconsider c-approximationsforc>1=2.
Weprovethefollowingstrengtheningsofthepreviouslymentionedlowerb ounds
onthe -OBDDcomplexityofDSA
n andIP
n
forarandomordering. Formost of
theorderings , everyfunctionthat isa(1=2+")-approximationofDSA
n or IP
n ,
where ", 0<"<1=2isanyconstant,requires a -OBDDof exp onential size. In
thecase ofIP
n
,theresultremainstrue evenif"tendstozero inacontrolledway.
Forexact formulationsseeTheorems2.1and 3.1.
The result for the inner pro duct function is stronger, since it can b e proved
for b etter parameters. On the other hand, the result for DSA
n
is of particular
interestforgeneticprogramming,since,recently,DSA
n
isfrequentlyusedinexp
er-iments. Thepro ofcombinesmetho dsfromone-waycommunicationcomplexityand
informationtheory.
Theproblemsaremotivatedbyexp erimentsingeneticprogrammingusing
OB-DDs, where one searches forago o dapproximationof an unknownfunctiongiven
by examples. Our results have consequences for the situation that the unknown
functionhasasmallOBDDforsomeordering,butthisorderingisnotknown. For
moredetailsseeSection 4.
For completeness, we present also an example of a function that is hard to
approximateforanyordering.
2 THEDIRECTSTORAGEACCESSFUNCTION
First, we state the result informally. There are only a few variable orderings
which allow an approximationg
n
of DSA
n
which is essentially b etter than the
trivial approximations by the constants 0 and 1 (which are 1=2-approximations)
holds forafractionof atleast1 n 2"
2
=ln2
ofthe variableorderings forDSA
n .
Eachfunctionwhichisa( 1 2 +"+n (" Æ )=2 )-approximationofDSA n hasa -OBDD
size whichis boundedbelowbye n
Æ
.
Thepro of ofthis theoremis splitted into Lemmas2.2 and 2.3. Recall that an
orderingforDSA
n
isap ermutationofn+kvariables,wherek=logn. Inorder
to simplifytheterminology,we assumethat therstn 0
=
def
b(1 2")nc variables
according to are given to Alice and the other ones to Bob. First, we derive a
prop erty ofrandomvariableorderings .
Lemma 2.2 Withprobabilityatleast1 n 2"
2
=ln2
,Aliceobtainsatmost(1 ")k
addressvariables,i.e.,a-variables.
Pro of. Therandom variableordering can b e pro duced asfollows. We take thek
address variablesand randomlycho ose for them one after another afree p osition
amongthen+kp ossiblep ositions. Thenwe continueinthesamewaywiththen
data variables,i.e.,the x-variables. Duringthe rstk steps ofthis pro cess, there
arealwaysatmost(1 2")nfreep ositionsamongtherstn 0
=b(1 2")ncp ositions
andatleastnop enp ositionsatall.Hence, theprobabilityofeachaddressvariable
to b e given toAlice, is atmost1 2". We can upp er b ound theprobabilitythat
Alicegetsmorethan(1 ")kaddressvariablesbytheprobabilityofatleast(1 ")k
successes inkindep endentBernoullitrialswithsuccess probability1 2".
Theexp ected numb erofsuccesses E[Z]equals(1 2")k . ByCherno'sb ound,
weobtain Pr(Z(1 ")k )=Pr(ZE[Z]+"k )e 2" 2 k =n 2" 2 =ln2 : 2
Inthefollowing,we xavariableorderingwhere Alicegetsat most(1 ")k
address variables. She also gets at least (1 2")n k data variables. If Alice's
address variables are xed, there are at least n "
data variables left which may
describ e theoutput. On theaverage, at least(1 2")n "
o(1) ofthese variables
are given to Alice. Inorder to enable Bob to computethe output exactly, Alice
hasto send himthe valueof heraddress variables andthose datavariableswhich
can describ e theoutput. IftheinformationgivenfromAliceismuch smallerthan
this, Bob can computethe valueofDSA
n
onlywith probabilitycloseto1=2. The
informationgivenfromAliceto Bobis measuredbythelogarithmof thesize ofa
-OBDDcomputingthefunctionDSA
n .
For arigorous argument, let b e an ordering and let A (resp. B) b e the set
of addressvariablesgivenin toAlice(resp. Bob) andletX (resp. Y) b etheset
of data variablesgivenin to Alice(resp. Bob). Clearly,jA[Xj=n 0
and every
computationinany -OBDDreadsrst(someof)thevariablesinA[X andthen
(someof)thevariablesinB[Y. Letg b eafunctionrepresented bya -OBDDG
ofsizes. Because ofthedenitionofc-approximations,weconsiderrandominputs
( ~a; ~
b;x;~ y )~ wherea~is arandomsettingof variablesinA,etc. Inthis situation,the
followingholds. Lemma 2.3 Pr (DSA n ( ~a; ~ b;x;~ ~y )=g ( ~a ; ~ b;x;~ y))~ 1 jXj 2n + 1 2n 22 jAj jXjlns 1=2 .
Before provingLemma2.3,letusdemonstrateitsapplicationbyproving
Theo-rem2.1.
Pro of ofTheorem 2.1. Recall that k=lognandlet s<e n
Æ
. Forevery ordering
probabilityatleast1 n 2"
,wehavejAj(1 ")k . Bysubstitutingtheseestimates
intotheb oundfromLemma2.3,we obtainthatthe probabilitythat DSA
n and g
have thesame valueis at most 1 2 +"+ 1 p 2 n (" Æ )=2 + logn 2n < 1 2 +"+n (" Æ )=2 .
This impliesthetheorem. 2
ForprovingLemma2.3,weneedsomemorenotation. LetH(U)b etheentropy
ofarandomvariableU andH(UjE)resp.H(UjV)theentropyofUgivenanevent
E orarandomvariableV resp. Moreover,letH
(x)= xlogx (1 x)log(1 x)
forx2(0;1).
For each(a;b;y ),let
q (a;b;y )=Pr (DSA
n
(a;b;~x;y )=g (a;b;~x;y ))
forrandomassignmentsx~tothevariablesinX. Theprobability,weareinterested
in,istheaverageofallq (a;b;y ). Letq (a;b)denote theaverageofq (a;b;y )overall
p ossible y and,similarly,letq (a)denotetheaverageofq (a;b;y ) overallp ossibleb
and y . Moreover,foreachpartialinputa,letI
a
b ethesetofpartialinputsb such
that thevariablex
j(a;b)j orx
a;b
, forsimplicity,is givento Alice. Note that H
(x)
is maximalforx=1=2andthemaximumisequalto1. Hence, ifjI
a
jlogs, the
next lemmaimpliesthatformostofb2I
a
,q (a;b)iscloseto1=2.
Lemma 2.4 For everya,wehave
X b2Ia H (q (a;b))jI a j logs:
Pro of. Consider the -OBDDcomputing thefunctiong describ ed b eforeLemma
2.3. For any settingsa;x,leth(a;x) b e therstno de, where thecomputationfor
a;xreachesano detestingavariableinB[Y orasink. Notethat acomputation
for a;b;x;y dep ends on a;xonlyviah(a;x). This means,there is afunction
b;y
suchthatg (a;b;x;y )=
b;y
(h(a;x)). Notethatthesizeoftherangeofhisatmost
s.
Besideswell-knowninformationtheoreticalinequalitiesweusethefollowingone
whosepro of isp ostp onedtotheendofthesection.
Claim 1 LetU andV berandomvariablestakingvaluesinf0;1g. ThenH
(Pr(U =
V))H(UjV).
If(a;b;y )isxedandb2I
a , DSA
n
outputs x
a;b
. Usingtheclaimandthefact
that H(Ujf(V))H(UjV)foreachfunctionf, we conclude
H (q (a;b;y ))=H (Pr (~x a;b = b;y (h(a;~x))) H(~x a;b j b;y (h(a;~x)))H(~x a;b jh(a;~x)):
Now we use the fact H(U
1 jV)+:::+H(U r jV) H((U 1 ;:::;U r )jV) for x~ a;b , b2I a
,andthevector x~
a
ofthese randomvariables. This implies
X b2I a H(~x a;b jh(a;x))~ H(~x a jh(a;~x)):
InthenextstepweapplytheequalitiesH(UjV)=H(U;V) H(V)andH(U;f(U))=
H(U)toobtain H(~x a jh(a;x))~ =H(~x a ;h(a;x))~ H(h(a;x))~ =H(~x a ) H(h(a;x)):~
The randomvariables x~
a;b , b 2I
a
, are indep endent and take values inf0;1g,i.e.,
~ x
a
isuniformlydistributed overf0;1g jIaj andH(~x a )=jI a j. Thisimplies H(~x a jh(a;~x))jI a j logs:
Puttingallourconsiderationstogether,weobtain
X b2I a H (q (a;b;y ))jI a j logs: ThefunctionH
isconcave. Hence, thisinequalityimpliesLemma2.4. 2
Pro of of Lemma 2.3. Let (a;b) = q (a;b) 1
2
. Then we apply the inequality
H ( 1 2 +t)1 (2=ln2)t 2
(estimateTaylor'sexpansionusingthesecondderivative)
to obtain X b2Ia H (q (a;b))= X b2Ia H 1 2 +(a;b) jI a j (2=ln2) X b2Ia (a;b) 2 :
TogetherwithLemma2.4,we get
1 2 lns X b2Ia (a;b) 2 :
Using Cauchy'sinequality,weobtain
X b2Ia j(a;b)j jI a j X b2Ia (a;b) 2 ! 1=2 1 2 jI a jlns 1=2 :
Recallthat q (a)is theaverageofallq (a;b). Sinceb maytake 2 jB j values,we get q (a) = 1 2 jB j 0 @ X b62I a q (a;b)+ X b2I a q (a;b) 1 A 1 2 jB j 2 jB j 1 2 jI a j+ X b2Ia (a;b) ! 1 2 jB j 1 (jI a j (2jI a jlns) 1=2 )= (jI a j); where (t) = def 1 2 jB j 1 (t (2tlns) 1=2
). The function is concave. Let
a 1 ;:::;a m , m=2 jAj
,b ethep ossiblevalues ofa. Then
1 m X 1im q (a i ) 1 m X 1im (jI ai j) 0 @ 1 m X 1im jI ai j 1 A = (jXj=2 jAj ):
The last equalityfollows,since, bydenition,the sumofall jI
ai
j equals jXj. The
left-handsideoftheab oveinequalityistheaverageofallq (a)andthisistheaverage
ofallPr (DSA(a;b;x;~ y )=g (a;b;x;~ y ))and,therefore,equaltoPr(DSA( ~a; ~
b;x;~ y )~ =
g ( ~a; ~
b;x;~ y )).~ We haveprovedthatthisprobabilityisb oundedab oveby
(jXj=2 jAj )=1 jXj 22 jAj+jB j + 1 22 jAj+jB j (22 jAj jXjlns) 1=2 :
Since Aand B areapartitionofthelognaddress variables,we have2 jAj+jB j
=n
Pr(U =V)= X 2f0;1g Pr (U =jV =)Pr(V =): TheconcavityofH implies H (Pr (U =V)) X 2f0;1g H (Pr (U=jV =))Pr(V =): SinceH (x)=H (1 x),we obtain H (Pr(U =0jV =))=H (Pr (U =1jV =))=H(UjV =) and H (Pr(U =V)) X 2f0;1g H(UjV =)Pr(V =)=H(UjV): 2
Letusaddsomecommentstothisresult. ArandomvariableorderingforDSA
n
iswithaprobabilityofatleastn logn
optimal,i.e.,alladdressvariablesaretested
b eforeeachdatavariable. Ifweconsidercutsasinourpro of,Alicegetsalladdress
variableswithaprobabilitywhichisapproximatelyn
log(1 2")
. Therefore,we need
another approach to improve the result with resp ect to the fraction of variable
orderings but one cannotobtain aresult foran exp onentially smallfraction. Bob
gets at least 2"n data variables. He can output the correct value, if Alice tells
himheraddressvariablesand thedecisive data variableisamongBob's variables.
Otherwise, he can guess the right output with probability 1=2(actually, he may
cho osealways0asoutput). Thenhissuccessprobabilityequals2"+ 1 2 (1 2")= 1 2 +".
Hence, ourapproachcannotleadto substantiallyb etter results.
3 THE INNER PRODUCT FUNCTION
In this section, we prove results on the inner pro duct function which are of the
same avor as theresults onthe direct storage access function inSection 2. The
dierence isthatwecanproveb etterb oundsonthequalityofapproximationeven
foralargerfractionof variableorderingsand largerOBDDs.
Theorem3.1 Let 0< Æ. The following property holds for a fraction of at least
1 e 4Æ
2
n
of the variable orderings for IP
n
. Each function which is at least a
( 1 2 +2 1 16 (1 9Æ )n 1 2 )-approximation of IP n
has a -OBDD size which is bounded
belowby2 Æ n
.
Pro of. First,wederiveaprop ertyofrandomvariableorderings . Itisconvenient
to renamethevariablessuch that IP
n (x;y ) =x 1 y 1 :::x n=2 y n=2 . We give the
rstn=2variablesaccordingtotoAliceandtheotheronestoBob. Anindexiis
called asingletonifx
i
isgiventoAliceandy
i
toBobor viceversa.
Lemma 3.2 Withprobabilityatleast1 e 4Æ
2
n
,arandom variableorderingleads
toatleast(1 Æ)n=8singletons.
Pro of. Therandomvariable orderingcan b e pro ducedas follows. First,we draw
n=2 ballsout ofanurn with n=2white balls(x-variables)andn=2 blackballs (y
variablesx
1 ;:::;x
w
. Thenwedrawn=2 w ballsoutofanurnwithwwhiteballs
(the y -variables y
1 ;:::;y
w
) and n=2 w black balls (the remainingy -variables).
The numb er of drawn black balls is alower b ound for the numb er of singletons.
Ourchance ofgettingmanysingletonsisminimalforthemaximalvaluew=n=4.
Then we haveahyp ergeometric distributionwithmeann=8. Itiswell-knownthat
the deviationfromthemean is largerfor thebinomialdistributionwith thesame
success probabilitywhichis 1=2inourcase. Hence, we canb ound theprobability
of gettingat most(1 Æ)n=8singletonsbytheprobabilityof(1 Æ)n=8successes
inn=4 Bernoullitrials with success probability1=2. Nowtheresult followsby an
applicationof Cherno'sb ound. 2
We onlyremark that itis evenp ossible to obtainalowerb ound of(1 Æ)n=4
singletons ifweincrease theerrorprobabilityalittlebit.
Inthe following,we x avariable ordering where Alicegets amongher n=2
variablesexactly tsingletons. Letg b e afunctionrepresented bya -OBDDG of
size s. Wetry toestimatetheprobabilitythatIP (~x)=g (~x) onarandominputx.~
Lemma 3.3 Pr (IP(~x)=g (x))~ 1 2 +s 1=2 2 t=2 1=2 :
First we show how this claimimplies the theorem. We set s = 2 Æ n and t = 1 8 (1 Æ)n. Then s 1=2 2 t=2 =2 Æ n=2 2 (1 Æ )n=16 =2 1 16 (1 9Æ )n :
and thetheoremfollowsfromLemmas3.2and3.3. 2
Pro ofofLemma3.3. WeconsiderthecommunicationmatrixforIP
n
withresp ect
to the partitionof the variables b etween Alice and Bob, i.e., we have 2 n=2
rows
corresp onding tothedierentinputvectorsforthevariablesofAliceandsimilarly
2 n=2
columns. Each matrixentry is thevalueof IP ontheinput oftherowinput
andthecolumninput. Ifwexallvariablesx
j andy
j
wherejisnotasingleton,we
obtain IP
2t
or itsnegationas subfunction. Hence, thecommunicationmatrixcan
b epartitionedto2 t
2 t
-submatriceswhicharecommunicationmatricesforIP
2t or
its negation. For eachof thesesubmatrices M =(M
ij
),w.l.o.g.Aliceis theowner
of allx-variables andBobtheowner ofally -variables. Itis knownfromLindsey's
Lemma(see,e.g.,Babai,Frank,andSimon(1986))thatforeachsubsetAofarows
ofM andeachsubset B ofb columnsofM itholdsthat
X i2A;j2B ( 1) M ij 2 t=2 a 1=2 b 1=2
i.e.,eachnotto osmallsubmatrixisnotconstant. Moreprecisely,wehavetonegate
atleast 1 2 (ab 2 t=2 a 1=2 b 1=2
)entriesofanabsubmatrixofM toobtainaconstant
submatrix. It follows by an averaging argument that there is an assignment to
thevariablesx
j andy
j
where j isnotasingleton suchthat Pr(IP
(~x)=g
(~x))
Pr (IP(x)~ =g (~x))fortheresultingsubfunctionsIP
andg
ofIPandgresp. Bythe
ab ove arguments,we canassumew.l.o.g.that IP =IP 2t . ItissuÆcientto prove Pr (IP (~x)=g (~x)) 1 2 +s 1=2 2 t=2 1=2 : Let M
b e thecommunicationmatrixofg
,D =MM
,and jjD jjthenumb er
of1-entries ofD . Then Pr (IP (~x)=g (~x))=1 jjD jj=2 2t :
jjD jj 1 2 2 2t s 1=2 2 3t=2 1=2
whichimpliestheclaim.
Sinceg
canb erepresentedbya -OBDDwhosesizeisb oundedbys,itfollows
fromthewell-knownrelationsb etween one-waycommunicationcomplexityand
-OBDDsize thatthecommunicationmatrixofg
hasat mostsdierentrows. Let
r
1 ;:::;r
s
containthedierentrowsofM
. We partitionM
toasmallnumb erof
constantsubmatriceswhichintuitivelyimplies thatM andM
arequitedierent.
We p ermutate the rows (i.e., renumb er the input vectors of Alice) such that we
have at rst a 1 rows equalto r 1 , then a 2 rowsequalto r 2
and so on. The blo ck
of a
k
equalrowscanb e partitionedforsomeb
k to ana k b k -matrixconsisting of
zerosonlyandana
k (2
t
b
k
)-matrixconsistingofonesonly. Altogetherwehave
partitioned M
to at most 2s constant submatrices whose sizes are a
k b k and a k (2 t b k ),1ks.
Nowweconsider thecorresp ondingsubmatrices ofM. By theconclusion from
Lindsey's Lemma,itfollowsthatwe haveto negateatleast
d= X 1k s 1 2 (a k b k 2 t=2 a 1=2 k b 1=2 k +a k (2 t b k ) 2 t=2 a 1=2 k (2 t b k ) 1=2 )
entriesinordertoconverteach submatrixintoaconstantone. Hence,
jjD jjd= X 1k s 1 2 (a k 2 t 2 t=2 a 1=2 k (b 1=2 k +(2 t b k ) 1=2 )):
Thesumofalla
k isequalto2 t . Hence, d + := X 1k s 1 2 a k 2 t = 1 2 2 2t : Thetermb 1=2 k +(2 t b k ) 1=2 is maximizedfor b k =2 t 1 and d := X 1k s 1 2 2 t=2 a 1=2 k (b 1=2 k +(2 t b k ) 1=2 )) X 1k s 2 t 1=2 a 1=2 k : Sincex 1=2
isconcave,weobtainthemaximalvaluefora
k =2 t =sand d = X 1k s 2 t 1=2 2 t=2 s 1=2 =s 1=2 2 3t=2 1=2 : SincejjD jjd=d +
d ,wehave provedtheprop osed b oundonjjD jj. 2
Itisnoweasyto obtainafunctionwhich ishardto approximateby -OBDDs
and arbitrary variable orderings. We dene the functionp ermuted inner pro duct
PIP
n
ondlog(n!)e+nvariables. Therstdlog(n!)e=(nlogn)variablesdescrib e
a p ermutation on f1;:::;ng, more precisely, foreach p ermutation,the numb er
of co de words is one or two. The functionPIP
n
realizes IP
n
on the remainingn
variableswhicharep ermutedaccordingto. Foreachvariableordering ,wehave
torepresent(forthedierentassignmentstothep ermutationvariables)IP
n forall
variableorderings once or twice. Hence, the result of Theorem 5which holds for
manyvariableorderingsandIP
n
implies asimilarresult forPIP
n
andallvariable
orderings.
Corollary 3.4 The function PIP
n
cannot be approximatedwellby -OBDDs and
SION FOR GENETIC PROGRAMMING
Geneticprogrammingwasintro ducedbyKoza[12]as aheuristic approach to
con-struct aprogram (in the formof anS-expression) computing afunction given by
examples. LetusconsidergeneticprogrammingrestrictedtoBo oleanconceptslike
Bo olean formulas, binary decision diagrams,circuits etc. This restricted form of
genetic programmingiscloselyrelatedtothefollowingtyp e ofminimization
prob-lems.
Assume,amo delforrepresentingBo oleanfunctionsisgiven. Itmayb eamo del
fromthelistab ove,butalsosomeweakermo dellikeOBDDs,DNFformulas,
deci-siontreesetc. Moreover,acomplexitymeasureforthegivenmo delissp ecied. An
instanceoftheminimizationproblemisdescrib edbyasetSf0;1g n
ofinputsofa
functionf ofnvariablesandthevaluesf(x)forallinputsx2S. Theproblemisto
ndafunctiong togetherwithitsrepresentation inthegivenmo delofcomplexity
as smallasp ossible andsuch that g (x)=f(x)forallxinS.
Geneticprogrammingisaverygeneraltyp eofheuristicsapplicabletothiskind
ofproblemswhichisexp ectedtoallowfurtherprogressinthisarea.Letusp ointout
thatingeneticprogramming,theusualformulationoftheminimizationrequirement
isthatwelo okforarepresentationofgofcomplexityb elowab oundsp eciedamong
theparametersof therun.
In the present pap er, we are mostly interested in the situation, where f is a
total function, which is unknown, and we only have the values of f on a set of
inputs S, which is notcomplete,i.e. S 6=f0;1g n
. Inthis case, thegoalisto nd
atotalfunctiongwhichagreeswithf onexamplesfromS and,moreover,yieldsa
justiablepredictionofthevalues oftheunknownfunctionf ontheinputsnotin
S. Thepairshx;f(x)i forall x2S are calledtraining examples andthe required
functiong iscalled ageneralizationofthetrainingexamples.
In order to get provable justication, we assume that the examples are
cho-sen atrandom,indep endentlyand fromthesamedistribution. This allowsto use
knownresultsconcerning thePAC-learningmo del,whichimplythatundercertain
assumptions,thegeneralizationproblemcan b ereduced totheminimization
prob-lem. More exactly,itisprovedin[3](see also[4])that undernaturalassumptions,
it isp ossible to sp ecify acomplexityb ound sand anumb erm, which istypically
larger thans,so thatanyfunctiong ofcomplexityatmosts,whichagreeswithf
onm randomlychosen indep endentexamples,is likelyto b eago o dapproximaion
off onallinputs.
In exp eriments with S-expressions, for a long time, only tree representations
are used. AlreadyKoza[13]hasrecognized thevalueofgraph representationsand
has intro duced ADFs (automaticallydened functions), i.e., subprograms which
can b e usedat severalplaces. Droste[8],[9] suggestedto useOBDDs andgenetic
programminginordertogeneralizeagivensetoftrainingexamples. Inthecase of
-OBDDsforaxedordering,subprogramsusedatseveralplacesareautomatically
identied and mergedbythe reduction algorithm. Ifthe unknownfunctionhas a
smallOBDDrepresentation,thenthissignicantlyhelpstondtherepresentation.
Successful exp eriments together with a theoretical background may b e found in
[8],[9],[10],[17]. Ap ossibilityto adopt anexisting learningalgorithmforOBDD
using memb ershipand equivalence queries to a heuristic minimization pro cedure
forincompletelysp eciedBo oleanfunctionsisdescrib ed in[2].
Exp eriments withminimizationfortotal functions using -OBDDsfor axed
orderingwerealsop erformed,see[21],[15].
domain. Assume, ~
S is a collection of m examples chosen independently from a
distribution D on the domain. Then, the probability that there exists a function
g 2H,which agrees withf on all examples in ~
S,but Pr (g (~x)=f(~x ))<1=2+",
wherex~ischosen fromD ,is atmostjHj 1 2 +" m .
Itis p ossibleto strengthen this theoremusing theVC dimensionof H, see [4].
However,since we havenonontrivialupp er b ound ontheVCdimensionofclasses
corresp onding to OBDDs, we usethe weaker form. This is almostthesameas if
weusetheformulationusingtheVCdimensionandestimatetheVCdimensionby
logjHj,whichisageneralupp er b oundontheVCdimensionofH.
In a typical application of this theorem following[3], the set H is the set of
all functions of complexityat most s in somemo del. In order to apply Occam's
razortheoremtoOBDDs,weneedanupp erb oundonthenumb erofnonequivalent
OBDDsofagivensizes. Droste[9]usedago o dupp erb oundonthisnumb erbased
on asystem of recurrence relations. Inorder to achieve a closed formula forthis
upp erb ound,weuseaslightlydierentmo del,namelycompleteOBDDs,whichtest
every variableinevery computation. For completeOBDDs,thefollowingb oundis
easyto obtainusingthemetho dofcountingcircuits, seee.g. [18].
Lemma 4.2 Thenumberofnonequivalent complete -OBDDsofsizesforagiven
isatmost(s+2) 2s
=s!(es) s
.
CombiningLemma4.2withTheorem4.1,itis p ossibletojustifythequalityof
thepredictionof theunknownfunctionf,obtainedbyanyheuristic minimization
pro cedure forOBDDs. Weleavetheexactformulationtothefullpap er.
Thegeneralsituationisthatinordertogetago o dapproximation,itisnecessary
that the heuristic minimization pro cedure succeeds to nd a small OBDD that
agreeswithallthetrainingexamples. Letuscallthesituationthatwendsuchan
OBDDacompressionofthesetoftrainingexamples,since thesizeb oundrequired
toachieve ago o dpredictionisalmostexactlytheb oundwhichguarantees thatthe
numb erofbitsneeded torepresent gisless thanm.
TheresultsoftheprevioussectionsimplythatDSA
n andIP
n
arehardto
approx-imatebyany -OBDDforarandom . Inthenext theorem,we prove,moreover,
thatanyfunctionf thatishardtoapproximateinthissensehasalsothefollowing
prop erty. If we have aset oftrainingexamplesfor functionf
, whichis obtained
fromthefunctionf byanunknownp ermutationof thevariables, thena
reason-ablecompressionoftheexamplesrequiresalsotooptimizetheorderingofvariables
usedtorepresentg . Moreexactly,ifwecho oseanorderingofthevariablesfor
solv-ing theminimizationproblemat randomb eforewe starttheminimizationpro cess
and the ordering is not mo diedduring the pro cess, then, with high probability,
almostnocompressionis p ossible.
Theorem4.3 Letf,,"andadistributionDonf0;1g n
besuchthatthefollowing
is true: if an ordering is chosen from the uniform distributionon all orderings,
then with probability at least 1 , every -OBDD h of size at most s satises
Pr (f(~x) = h(x))~ < 1
2
+", where x~ is chosen from the distribution D . Let ~
S be
a set of m independent random examples chosen from D . Then, with probability
1 (es) s 1 2 +" m
,thereisno -OBDDg of sizeatmosts thatagreeswithf
on alltrainingexamples in ~
S.
Pro of. Letuscallanorderingbad,ifithastheprop ertymentionedinthetheorem.
A randomordering isbad withprobabilityat least 1 . Sincetheexamples are
chosen indep endently on the ordering, the distribution of the examples do es not
size at most s under the condition that the ordering is bad. Every -OBDD
of size at most s matches all the examples with probability at most 1 2 +" m .
Multiplyingthisbythenumb erof -OBDDsofsizeatmostsyieldsanupp erb ound
ontherequiredconditionalprobability. Hence,theconditionalprobabilitythatthe
examplesmaynotb eexpressedincomplexityatmostsisatleast1 (es) s 1 2 +" m .
Itfollowsthattheprobabilitythattheorderingisbadand,moreover,theexamples
maynotb eexpressedinsizeatmostsisatleast(1 ) 1 (es) s 1 2 +" m . This
impliesthetheorem. 2
This general result mayb e combined with the results of Sections 2and 3 to
obtainthefollowing.
Corollary 4.4 Foreverylargeenoughn,ifwetakem=n (1)
examplesforDSA
n
fromtheuniformdistributionandchoosearandomorderingofthevariables,then
withprobabilityatleast1 n 1=2
,thereisno -OBDDofsize 1
10
m=logmmatching
the given mtraining examples.
Pro of. Let";" 0 b esuchthat p ln2=2<"<" 0 <1=2 1=10 1=2. Moreover,letÆ <"
b eanysmallp ositivenumb erandlets= 1
10
m=logm. UsingTheorem2.1,weobtain
for every large enough n that a random ordering satises the following. With
probabilityatleast1 n 2" 2 =ln2 ,thereisno( 1 2 + " 0
)-approximationamongfunctions
of -OBDDcomplexityatmostse n
Æ
. Notethatinoursituation,(es) s
2 m=10
.
UsingalsoTheorem4.3,withprobabilityatleast1 n 2" 2 =ln2 2 m=10 1 2 +" 0 m 1 n 1=2
, there is no -OBDD of size at most s matching the givenm training
examplesforDSA
n . 2
Corollary 4.5 Let0<<1be aconstant. For everylargeenough n,if wetake
m =n O (1)
, m n,examples forIP
n
from the uniform distribution and choose a
randomorderingofthevariables,thenwithprobabilityatleast1 e (n)
,thereis
no -OBDDofsizeatmost(1 )m=logmmatchingthegivenmtrainingexamples.
Pro of. Let Æ and " b e p ositive numb erssuch that Æ < 1 9 and 1 2 +" < 1 2 1 .
Moreover, let s = (1 )m=logm. By Theorem 3.1, for every large enough n,
arandom ordering satises the following. Withprobabilityat least 1 e 4Æ 2 n , there isno ( 1 2 +")-approximationofIP n
amongfunctionsof -OBDDcomplexity
atmosts2 Æ n
. Notethat(es) s
2 (1 )m
. TogetherwithTheorem4.3,weobtain
that withprobabilityatleast1 e 4Æ 2 n 2 (1 )m 1 2 +" m =1 e (n) ,thereis
no -OBDDof sizeatmostsmatchingthegivenmrandomtrainingexamplesfor
IP
n . 2
On the other hand, for the functions DSA
n
and IP
n
, there are orderings, for
whichago o dcompressionisp ossible. Thissuggeststhatincludingtheoptimization
ofthevariableorderingintotheminimizationpro cedureoftenisnecessary togeta
go o dqualityofthecomputedgeneralization.
References
[1] L.Babai, P.Frank and J.Simon.Complexity classes in communication
com-plexitytheory.27.FOCS,pp.337{347,1986.
[2] A.Birkendorf and H.U.Simon.Using computationallearning strategies as a
to olforcombinatorialoptimization.Ann. Math.andArt. Intelligence 22, pp.
formatoionPro cessing Letters,No.24,pp.377{380,1987.
[4] A.Blumer,A.Ehrenfeucht,D.HausslerandM.Warmuth.Learnabilityandthe
Vapnik-Chervonenkis Dimension.Journal of the ACM, Vol. 36, pp. 929{965,
1989.
[5] B.Bollig and I.Wegener. Improvingthe variable ordering of OBDDs is
NP-complete.IEEETrans.onComputers45,pp.993{1002,1996.
[6] R.E.Bryant. Graph-based algorithms for Bo olean function manipulation.
IEEETrans.onComputers35,pp.677{691,1986.
[7] R.E.Bryant. Symb olic manipulation with ordered binary decision diagrams.
ACMComputingSurveys 24,pp.293{318,1992.
[8] S.Droste.EÆcientgeneticprogrammingforndinggo o dgeneralizingBo olean
functions.GeneticProgramming'97,pp.82{87,1997.
[9] S.Droste. Genetic programming with guaranteed quality. Genetic
Program-ming'98,pp.54{59,1998.
[10] S.Droste and D.Wiesmann. On representation and genetic op erators in
evo-lutionary algorithms.Submitted to IEEE Trans. on Evolution Computation,
1998.
[11] J.Hromkovic.Communication Complexity and Parallel Computing.Springer,
1997.
[12] J.Koza. Genetic Programming: On the Programmingof Computersby Means
of NaturalSelection.Cambridge,MA:TheMITPress, 1992.
[13] J.Koza. GeneticProgrammingII. AutomaticDiscoveryof ReusablePrograms.
MITPress,1994.
[14] E.Kushilevitz and N.Nisan. Communication Complexity. Cambridge
Univer-sityPress, 1997.
[15] H.Sakanashi,T.Higuchi,H.Iba andK.Kakazu.Anapproachforgenetic
syn-thesizer ofbinarydecisiondiagram.IEEEInt.Conf.onEvolutionary
Compu-tation,ICEC'96,pp.559{564,1996.
[16] D.Sieling, On the existence of p olynomial time approximation schemes for
OBDDminimization.STACS'98,LNCS1373,pp.205{215,1998.
[17] T.R.Shiple, R.Hojati, A.L.Sangiovanni-Vincentelli,R.K.Brayton. Heuristic
MinimizationofBDDsUsing Don't Cares,31stACM/IEEEDesign
Automa-tionConference,pp.225{232,1994.
[18] I.Wegener.TheComplexityofBooleanfunctions.Wiley-Teubnerseriesin
com-puterscience,1987.
[19] I.Wegener.EÆcientdatastructuresforBo oleanfunctions.Discrete
Mathemat-ics136,pp.347{372,1994.
[20] I.Wegener.Branching Programsand BinaryDecision Diagrams{Theory and
Applications. To app ear: SIAM Monographs on Discrete Mathematics and
Applications,1999.
[21] M.Yanagiya.EÆcientgeneticprogrammingbasedonbinarydecisiondiagrams.