Approximations by OBDDs and the Variable Ordering Problem


REIHE COMPUTATIONAL INTELLIGENCE

COLLABORATIVE RESEARCH CENTER 531

Design and Management of Complex Technical Processes and Systems by means of Computational Intelligence Methods


Matthias Krause Petr Savicky Ingo Wegener

No. CI-58/99

Technical Report ISSN 1433-3325 January 1999

Secretary of the SFB 531, University of Dortmund, Dept. of Computer Science/XI, 44221 Dortmund, Germany

This work is a product of the Collaborative Research Center 531, "Computational Intelligence", at the University of Dortmund and was printed with financial support of [...]

Matthias Krause¹, Petr Savický², Ingo Wegener³

¹ Theoretische Informatik, Univ. Mannheim, 68131 Mannheim, Germany, e-mail: [email protected]

² Institute of Computer Science, Academy of Sciences of Czech Republic, Pod vodárenskou věží 2, 182 07 Praha 8, Czech Republic

³ FB Informatik, LS 2, Univ. Dortmund, 44221 Dortmund, Germany

Abstract

Ordered binary decision diagrams (OBDDs) and their variants are motivated by the need to represent Boolean functions in applications. Research concerning these applications leads also to problems and results interesting from a theoretical point of view. In this paper, methods from communication complexity and information theory are combined to prove that the direct storage access function and the inner product function have the following property. They have linear $\pi$-OBDD size for some variable ordering $\pi$ and, for most variable orderings $\pi'$, all functions which approximate them on considerably more than half of the inputs need exponential $\pi'$-OBDD size. These results have implications for the use of OBDDs in genetic programming.

1 INTRODUCTION

Branching programs (BPs) or binary decision diagrams (BDDs), which is just another name, are representations of Boolean functions $f \in B_n$, i.e., $f: \{0,1\}^n \to \{0,1\}$. They are compact but not useful for manipulations of Boolean functions, since operations like satisfiability test, equivalence test or minimization lead to hard problems. Bryant [6] has introduced $\pi$-OBDDs (ordered BDDs), since they can be manipulated efficiently (see [7] and [19] for surveys on the areas of application).

Definition 1.1 A permutation $\pi$ on $\{1,\dots,n\}$ describes the variable ordering $x_{\pi(1)},\dots,x_{\pi(n)}$. A $\pi$-OBDD is a directed acyclic graph $G=(V,E)$ with one source. Each sink is labelled by a Boolean constant and each inner node by a Boolean variable. Inner nodes have two outgoing edges, one labelled by 0 and the other by 1. If an edge leads from an $x_i$-node to an $x_j$-node, then $\pi^{-1}(i)$ has to be smaller than $\pi^{-1}(j)$, i.e., the edges have to respect the variable ordering. The $\pi$-OBDD represents the function $f \in B_n$ defined in the following way. The input $a$ activates, for $x_i$-nodes, the outgoing $a_i$-edge. Then $f(a)$ is equal to the label of the sink reached by the unique activated path starting at the source. The size of $G$ is measured by the number of its nodes. An OBDD is a $\pi$-OBDD for an arbitrary $\pi$.
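To make Definition 1.1 concrete, here is a minimal Python sketch of a $\pi$-OBDD stored as a node table, with an evaluation routine and a check of the ordering condition. The dictionary encoding, the node names and the helper names `evaluate` and `respects_ordering` are illustrative assumptions and not notation from the paper; variables are indexed from 0.

```python
def respects_ordering(nodes, pi):
    """Check that every edge from an x_i-node to an x_j-node has pi^{-1}(i) < pi^{-1}(j)."""
    position = {var: pos for pos, var in enumerate(pi)}  # this is pi^{-1}
    for node in nodes.values():
        if isinstance(node, bool):  # sinks have no outgoing edges
            continue
        var, low, high = node
        for successor in (low, high):
            target = nodes[successor]
            if not isinstance(target, bool) and position[var] >= position[target[0]]:
                return False
    return True


def evaluate(nodes, source, a):
    """Follow the path activated by input a and return the label of the sink reached."""
    node = nodes[source]
    while not isinstance(node, bool):
        var, low, high = node
        node = nodes[high if a[var] else low]
    return node


# Hypothetical example: an OBDD for x_0 XOR x_1.
# Inner nodes are (variable, 0-successor, 1-successor); sinks are Boolean constants.
XOR_OBDD = {
    "s": (0, "u", "v"),
    "u": (1, "F", "T"),
    "v": (1, "T", "F"),
    "F": False,
    "T": True,
}
```

In this encoding the size of the diagram in the sense of the definition is simply `len(XOR_OBDD)`.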

One-way communication complexity (see e.g. [11], [14]) leads to lower bounds for OBDDs. This method is almost the same as counting the number of subfunctions of $f$ if the first variables according to the variable ordering are replaced by constants. There are functions for which the OBDD size is very sensitive to the chosen variable [...] optimal variable ordering for $f$, see [5], or even to approximate the optimal variable ordering, see [16].

¹ Supported by DFG grant Kr 1521/3-1.

² The research was partially supported by GA of the Czech Republic, Grant No. 201/98/0717.

³ Supported by DFG grant We 1066/8-1 and by the DFG as part of the Collaborative Research [...]

We will need the following two functions.

Definition 1.2 i) For $n = 2^k$, the direct storage access function (or multiplexer) on $k+n$ variables is the function $\mathrm{DSA}_n(a_0,\dots,a_{k-1},x_0,\dots,x_{n-1}) = x_{|a|}$, where $|a|$ is the number whose binary representation is $(a_0,\dots,a_{k-1})$.

ii) For any even $n$, the inner product function on $n$ variables is the function $\mathrm{IP}_n(x_1,\dots,x_n) = x_1x_2 \oplus x_3x_4 \oplus \dots \oplus x_{n-1}x_n$.
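The two functions of Definition 1.2 can be written down directly. The following Python sketch is only an illustration; the function names are assumptions, and reading $a_0$ as the most significant address bit is one possible convention that the definition does not fix.

```python
def dsa(a, x):
    """Direct storage access: return x_{|a|} for address bits a and data bits x."""
    assert len(x) == 2 ** len(a), "DSA_n needs n = 2^k data bits for k address bits"
    index = 0
    for bit in a:  # a_0 is treated as the most significant bit (assumption)
        index = 2 * index + bit
    return x[index]


def ip(x):
    """Inner product: x_1 x_2 XOR x_3 x_4 XOR ... XOR x_{n-1} x_n."""
    assert len(x) % 2 == 0, "IP_n is defined for even n"
    result = 0
    for i in range(0, len(x), 2):
        result ^= x[i] & x[i + 1]
    return result
```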

Clearly, these two functions have $\pi$-OBDDs of size $O(n)$ for the ordering $\pi$ of the variables used in the definition of the functions. On the other hand, they need exponential $\pi$-OBDD size for most of the variable orderings (a fraction of $1 - n^{-\varepsilon}$ for $\mathrm{DSA}_n$ and even a fraction of $1 - 2^{-\varepsilon n}$ for $\mathrm{IP}_n$, see [20]). Another example of a function with a similar property, the so-called disjoint quadratic form, may be obtained by replacing $\oplus$ by disjunction in the expression describing IP.
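The sensitivity to the variable ordering can be observed by brute force on a tiny instance. The sketch below counts, level by level, the distinct subfunctions that essentially depend on the next variable; by the standard characterization of reduced OBDDs this is the number of inner nodes of the minimal $\pi$-OBDD. The function names, the 0-based variable indices and the choice of $a_0$ as high address bit are assumptions, and the exhaustive enumeration is feasible only for very small $n$.

```python
from itertools import product


def dsa4(v):
    """DSA_4 on v = (a0, a1, x0, x1, x2, x3); a0 is taken as the high address bit."""
    return v[2 + 2 * v[0] + v[1]]


def reduced_obdd_size(f, num_vars, pi):
    """Number of inner nodes of the minimal pi-OBDD of f, by subfunction counting."""
    size = 0
    for level in range(num_vars):
        fixed, free = pi[:level], pi[level:]
        subfunctions = set()
        for prefix in product((0, 1), repeat=level):
            table = []
            # The first free variable, pi[level], varies slowest in this enumeration.
            for suffix in product((0, 1), repeat=num_vars - level):
                v = [0] * num_vars
                for var, val in zip(fixed, prefix):
                    v[var] = val
                for var, val in zip(free, suffix):
                    v[var] = val
                table.append(f(v))
            half = len(table) // 2
            # Keep the subfunction only if it really depends on pi[level].
            if table[:half] != table[half:]:
                subfunctions.add(tuple(table))
        size += len(subfunctions)
    return size


ADDRESS_FIRST = [0, 1, 2, 3, 4, 5]  # a0, a1, then the data variables
DATA_FIRST = [2, 3, 4, 5, 0, 1]     # data variables before the address variables
```

Calling `reduced_obdd_size(dsa4, 6, ADDRESS_FIRST)` and `reduced_obdd_size(dsa4, 6, DATA_FIRST)` shows the gap already for $n = 4$; the second ordering forces the diagram to remember all data bits read so far.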

These results are stated for error-free representations of $f$ and, up to now, to the best of our knowledge, nobody has looked at representations of approximations of $f$ from a theoretical point of view. In this paper, we investigate the influence of the variable ordering for approximate representations of functions. If not stated otherwise, by a random input $\tilde x$ we mean an input $\tilde x$ chosen from the uniform distribution on $\{0,1\}^n$.

Definition 1.3 A function $g \in B_n$ is a $c$-approximation of $f \in B_n$ if $\Pr(f(\tilde x) = g(\tilde x)) \ge c$ for a random input $\tilde x$.

One of the two constant functions 0 and 1 always is a $1/2$-approximation. Hence, we consider $c$-approximations for $c > 1/2$.
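For small $n$, Definition 1.3 can be checked exhaustively. The following Python sketch computes the fraction of inputs on which two functions agree; the helper names and the example function `ip4` are illustrative assumptions.

```python
from itertools import product


def agreement(f, g, n):
    """Fraction of the 2^n inputs on which f and g take the same value."""
    hits = sum(f(x) == g(x) for x in product((0, 1), repeat=n))
    return hits / 2 ** n


def is_c_approximation(g, f, n, c):
    """True if g is a c-approximation of f under the uniform distribution."""
    return agreement(f, g, n) >= c


def ip4(x):
    """IP_4(x_1, ..., x_4) = x_1 x_2 XOR x_3 x_4, written with 0-based indices."""
    return (x[0] & x[1]) ^ (x[2] & x[3])


def zero(x):
    return 0


def one(x):
    return 1
```

Since the agreement values of the two constants with any $f$ sum to 1, the larger of them is at least $1/2$, which is the observation made above.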

We prove the following strengthenings of the previously mentioned lower bounds on the $\pi$-OBDD complexity of $\mathrm{DSA}_n$ and $\mathrm{IP}_n$ for a random ordering $\pi$. For most of the orderings $\pi$, every function that is a $(1/2+\varepsilon)$-approximation of $\mathrm{DSA}_n$ or $\mathrm{IP}_n$, where $\varepsilon$, $0 < \varepsilon < 1/2$, is any constant, requires a $\pi$-OBDD of exponential size. In the case of $\mathrm{IP}_n$, the result remains true even if $\varepsilon$ tends to zero in a controlled way. For exact formulations see Theorems 2.1 and 3.1.

The result for the inner product function is stronger, since it can be proved for better parameters. On the other hand, the result for $\mathrm{DSA}_n$ is of particular interest for genetic programming, since, recently, $\mathrm{DSA}_n$ is frequently used in experiments. The proof combines methods from one-way communication complexity and information theory.

The problems are motivated by experiments in genetic programming using OBDDs, where one searches for a good approximation of an unknown function given by examples. Our results have consequences for the situation that the unknown function has a small OBDD for some ordering, but this ordering is not known. For more details see Section 4.

For completeness, we present also an example of a function that is hard to approximate for any ordering.

2 THE DIRECT STORAGE ACCESS FUNCTION

First, we state the result informally. There are only a few variable orderings $\pi$ which allow an approximation $g_n$ of $\mathrm{DSA}_n$ which is essentially better than the trivial approximations by the constants 0 and 1 (which are $1/2$-approximations) [...]

Theorem 2.1 [...] holds for a fraction of at least $1 - n^{-2\varepsilon^2/\ln 2}$ of the variable orderings $\pi$ for $\mathrm{DSA}_n$. Each function which is a $\left(\frac12 + \varepsilon + n^{-(\varepsilon-\delta)/2}\right)$-approximation of $\mathrm{DSA}_n$ has a $\pi$-OBDD size which is bounded below by $e^{n^{\delta}}$.

The proof of this theorem is split into Lemmas 2.2 and 2.3. Recall that an ordering $\pi$ for $\mathrm{DSA}_n$ is a permutation of $n+k$ variables, where $k = \log n$. In order to simplify the terminology, we assume that the first $n' =_{\mathrm{def}} \lfloor (1-2\varepsilon)n \rfloor$ variables according to $\pi$ are given to Alice and the other ones to Bob. First, we derive a property of random variable orderings $\pi$.

Lemma 2.2 With probability at least $1 - n^{-2\varepsilon^2/\ln 2}$, Alice obtains at most $(1-\varepsilon)k$ address variables, i.e., $a$-variables.

Proof. The random variable ordering can be produced as follows. We take the $k$ address variables and randomly choose for them one after another a free position among the $n+k$ possible positions. Then we continue in the same way with the $n$ data variables, i.e., the $x$-variables. During the first $k$ steps of this process, there are always at most $(1-2\varepsilon)n$ free positions among the first $n' = \lfloor (1-2\varepsilon)n \rfloor$ positions and at least $n$ open positions at all. Hence, the probability of each address variable to be given to Alice is at most $1-2\varepsilon$. We can upper bound the probability that Alice gets more than $(1-\varepsilon)k$ address variables by the probability of at least $(1-\varepsilon)k$ successes in $k$ independent Bernoulli trials with success probability $1-2\varepsilon$.

The expected number of successes $E[Z]$ equals $(1-2\varepsilon)k$. By Chernoff's bound, we obtain

$$\Pr(Z \ge (1-\varepsilon)k) = \Pr(Z \ge E[Z] + \varepsilon k) \le e^{-2\varepsilon^2 k} = n^{-2\varepsilon^2/\ln 2}. \qquad \Box$$
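The bound of Lemma 2.2 can be compared with the exact probability for concrete parameters. In a uniformly random ordering, the number of address variables among the first $n'$ positions follows a hypergeometric distribution, so its tail can be summed exactly. The Python sketch below does this; the function names and the sample parameters are assumptions chosen for illustration.

```python
from math import comb, floor, log


def prob_alice_gets_many_addresses(k, eps):
    """Exact probability that more than (1 - eps) * k of the k address variables
    fall among the first n' = floor((1 - 2 eps) n) positions, n = 2^k."""
    n = 2 ** k
    n_prime = floor((1 - 2 * eps) * n)
    threshold = (1 - eps) * k
    tail = 0
    for j in range(k + 1):
        if j > threshold and n_prime - j >= 0:
            # j address variables and n' - j data variables on Alice's side
            tail += comb(k, j) * comb(n, n_prime - j)
    return tail / comb(n + k, n_prime)


def chernoff_bound(k, eps):
    """The bound n^(-2 eps^2 / ln 2) = exp(-2 eps^2 k) from Lemma 2.2."""
    n = 2 ** k
    return n ** (-2 * eps ** 2 / log(2))
```

For instance, `prob_alice_gets_many_addresses(10, 0.25)` can be compared with `chernoff_bound(10, 0.25)`; the exact value is expected to lie below the bound, which is not tight for such small $k$.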

In the following, we fix a variable ordering $\pi$ where Alice gets at most $(1-\varepsilon)k$ address variables. She also gets at least $(1-2\varepsilon)n - k$ data variables. If Alice's address variables are fixed, there are at least $n^{\varepsilon}$ data variables left which may describe the output. On the average, at least $(1-2\varepsilon)n^{\varepsilon} - o(1)$ of these variables are given to Alice. In order to enable Bob to compute the output exactly, Alice has to send him the value of her address variables and those data variables which can describe the output. If the information given from Alice is much smaller than this, Bob can compute the value of $\mathrm{DSA}_n$ only with probability close to $1/2$. The information given from Alice to Bob is measured by the logarithm of the size of a $\pi$-OBDD computing the function $\mathrm{DSA}_n$.

For a rigorous argument, let $\pi$ be an ordering and let $A$ (resp. $B$) be the set of address variables given to Alice (resp. Bob) and let $X$ (resp. $Y$) be the set of data variables given to Alice (resp. Bob). Clearly, $|A \cup X| = n'$ and every computation in any $\pi$-OBDD reads first (some of) the variables in $A \cup X$ and then (some of) the variables in $B \cup Y$. Let $g$ be a function represented by a $\pi$-OBDD $G$ of size $s$. Because of the definition of $c$-approximations, we consider random inputs $(\tilde a, \tilde b, \tilde x, \tilde y)$ where $\tilde a$ is a random setting of variables in $A$, etc. In this situation, the following holds.

Lemma 2.3

$$\Pr\big(\mathrm{DSA}_n(\tilde a,\tilde b,\tilde x,\tilde y) = g(\tilde a,\tilde b,\tilde x,\tilde y)\big) \le 1 - \frac{|X|}{2n} + \frac{1}{2n}\left(2 \cdot 2^{|A|}\,|X| \ln s\right)^{1/2}.$$

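Before proving Lemma 2.3, let us demonstrate its application by proving Theorem 2.1.

Proof of Theorem 2.1. Recall that $k = \log n$ and let $s < e^{n^{\delta}}$. For every ordering [...] probability at least $1 - n^{-2\varepsilon^2/\ln 2}$, we have $|A| \le (1-\varepsilon)k$. By substituting these estimates into the bound from Lemma 2.3, we obtain that the probability that $\mathrm{DSA}_n$ and $g$ have the same value is at most

$$\frac12 + \varepsilon + \frac{1}{\sqrt 2}\, n^{-(\varepsilon-\delta)/2} + \frac{\log n}{2n} < \frac12 + \varepsilon + n^{-(\varepsilon-\delta)/2}.$$

This implies the theorem. $\Box$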

For proving Lemma 2.3, we need some more notation. Let $H(U)$ be the entropy of a random variable $U$ and $H(U|E)$ resp. $H(U|V)$ the entropy of $U$ given an event $E$ or a random variable $V$ resp. Moreover, let $H^*(x) = -x\log x - (1-x)\log(1-x)$ for $x \in (0,1)$.

For each $(a,b,y)$, let

$$q(a,b,y) = \Pr\big(\mathrm{DSA}_n(a,b,\tilde x,y) = g(a,b,\tilde x,y)\big)$$

for random assignments $\tilde x$ to the variables in $X$. The probability we are interested in is the average of all $q(a,b,y)$. Let $q(a,b)$ denote the average of $q(a,b,y)$ over all possible $y$ and, similarly, let $q(a)$ denote the average of $q(a,b,y)$ over all possible $b$ and $y$. Moreover, for each partial input $a$, let $I_a$ be the set of partial inputs $b$ such that the variable $x_{|(a,b)|}$, or $x_{a,b}$ for simplicity, is given to Alice. Note that $H^*(x)$ is maximal for $x = 1/2$ and the maximum is equal to 1. Hence, if $|I_a| \gg \log s$, the next lemma implies that for most of $b \in I_a$, $q(a,b)$ is close to $1/2$.

Lemma 2.4 For every $a$, we have

$$\sum_{b \in I_a} H^*(q(a,b)) \ge |I_a| - \log s.$$

Proof. Consider the $\pi$-OBDD computing the function $g$ described before Lemma 2.3. For any settings $a, x$, let $h(a,x)$ be the first node where the computation for $a, x$ reaches a node testing a variable in $B \cup Y$ or a sink. Note that a computation for $a,b,x,y$ depends on $a,x$ only via $h(a,x)$. This means there is a function $\varphi_{b,y}$ such that $g(a,b,x,y) = \varphi_{b,y}(h(a,x))$. Note that the size of the range of $h$ is at most $s$.

Besides well-known information theoretical inequalities we use the following one whose proof is postponed to the end of the section.

Claim 1 Let $U$ and $V$ be random variables taking values in $\{0,1\}$. Then $H^*(\Pr(U = V)) \ge H(U|V)$.

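If $(a,b,y)$ is fixed and $b \in I_a$, $\mathrm{DSA}_n$ outputs $x_{a,b}$. Using the claim and the fact that $H(U|f(V)) \ge H(U|V)$ for each function $f$, we conclude

$$H^*(q(a,b,y)) = H^*\big(\Pr(\tilde x_{a,b} = \varphi_{b,y}(h(a,\tilde x)))\big) \ge H\big(\tilde x_{a,b} \mid \varphi_{b,y}(h(a,\tilde x))\big) \ge H\big(\tilde x_{a,b} \mid h(a,\tilde x)\big).$$

Now we use the fact $H(U_1|V) + \dots + H(U_r|V) \ge H((U_1,\dots,U_r)|V)$ for $\tilde x_{a,b}$, $b \in I_a$, and the vector $\tilde x_a$ of these random variables. This implies

$$\sum_{b \in I_a} H(\tilde x_{a,b} \mid h(a,\tilde x)) \ge H(\tilde x_a \mid h(a,\tilde x)).$$

In the next step we apply the equalities $H(U|V) = H(U,V) - H(V)$ and $H(U,f(U)) = H(U)$ to obtain

$$H(\tilde x_a \mid h(a,\tilde x)) = H(\tilde x_a, h(a,\tilde x)) - H(h(a,\tilde x)) = H(\tilde x_a) - H(h(a,\tilde x)).$$

Since $h$ takes at most $s$ different values, $H(h(a,\tilde x)) \le \log s$. The random variables $\tilde x_{a,b}$, $b \in I_a$, are independent and take values in $\{0,1\}$, i.e., $\tilde x_a$ is uniformly distributed over $\{0,1\}^{|I_a|}$ and $H(\tilde x_a) = |I_a|$. This implies

$$H(\tilde x_a \mid h(a,\tilde x)) \ge |I_a| - \log s.$$

Putting all our considerations together, we obtain

$$\sum_{b \in I_a} H^*(q(a,b,y)) \ge |I_a| - \log s.$$

The function $H^*$ is concave. Hence, this inequality implies Lemma 2.4. $\Box$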

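Proof of Lemma 2.3. Let $\Delta(a,b) = q(a,b) - \frac12$. Then we apply the inequality $H^*(\frac12 + t) \le 1 - (2/\ln 2)\,t^2$ (estimate Taylor's expansion using the second derivative) to obtain

$$\sum_{b \in I_a} H^*(q(a,b)) = \sum_{b \in I_a} H^*\!\left(\tfrac12 + \Delta(a,b)\right) \le |I_a| - (2/\ln 2) \sum_{b \in I_a} \Delta(a,b)^2.$$

Together with Lemma 2.4, we get

$$\frac12 \ln s \ge \sum_{b \in I_a} \Delta(a,b)^2.$$

Using Cauchy's inequality, we obtain

$$\sum_{b \in I_a} |\Delta(a,b)| \le \left(|I_a| \sum_{b \in I_a} \Delta(a,b)^2\right)^{1/2} \le \left(\tfrac12 |I_a| \ln s\right)^{1/2}.$$

Recall that $q(a)$ is the average of all $q(a,b)$. Since $b$ may take $2^{|B|}$ values, we get

$$q(a) = \frac{1}{2^{|B|}}\left(\sum_{b \notin I_a} q(a,b) + \sum_{b \in I_a} q(a,b)\right) \le \frac{1}{2^{|B|}}\left(2^{|B|} - \tfrac12 |I_a| + \sum_{b \in I_a} \Delta(a,b)\right) \le 1 - 2^{-|B|-1}\left(|I_a| - (2|I_a| \ln s)^{1/2}\right) = \psi(|I_a|),$$

where $\psi(t) =_{\mathrm{def}} 1 - 2^{-|B|-1}\left(t - (2t \ln s)^{1/2}\right)$. The function $\psi$ is concave. Let $a_1,\dots,a_m$, $m = 2^{|A|}$, be the possible values of $a$. Then

$$\frac1m \sum_{1 \le i \le m} q(a_i) \le \frac1m \sum_{1 \le i \le m} \psi(|I_{a_i}|) \le \psi\!\left(\frac1m \sum_{1 \le i \le m} |I_{a_i}|\right) = \psi\!\left(|X|/2^{|A|}\right).$$

The last equality follows, since, by definition, the sum of all $|I_{a_i}|$ equals $|X|$. The left-hand side of the above inequality is the average of all $q(a)$ and this is the average of all $\Pr(\mathrm{DSA}(a,b,\tilde x,y) = g(a,b,\tilde x,y))$ and, therefore, equal to $\Pr(\mathrm{DSA}(\tilde a,\tilde b,\tilde x,\tilde y) = g(\tilde a,\tilde b,\tilde x,\tilde y))$. We have proved that this probability is bounded above by

$$\psi\!\left(|X|/2^{|A|}\right) = 1 - \frac{|X|}{2 \cdot 2^{|A|+|B|}} + \frac{1}{2 \cdot 2^{|A|+|B|}}\left(2 \cdot 2^{|A|}\,|X| \ln s\right)^{1/2}.$$

Since $A$ and $B$ are a partition of the $\log n$ address variables, we have $2^{|A|+|B|} = n$ [...]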


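Proof of Claim 1. [...]

$$\Pr(U = V) = \sum_{\alpha \in \{0,1\}} \Pr(U = \alpha \mid V = \alpha)\,\Pr(V = \alpha).$$

The concavity of $H^*$ implies

$$H^*(\Pr(U = V)) \ge \sum_{\alpha \in \{0,1\}} H^*(\Pr(U = \alpha \mid V = \alpha))\,\Pr(V = \alpha).$$

Since $H^*(x) = H^*(1-x)$, we obtain

$$H^*(\Pr(U = 0 \mid V = \alpha)) = H^*(\Pr(U = 1 \mid V = \alpha)) = H(U \mid V = \alpha)$$

and

$$H^*(\Pr(U = V)) \ge \sum_{\alpha \in \{0,1\}} H(U \mid V = \alpha)\,\Pr(V = \alpha) = H(U|V). \qquad \Box$$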
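Two inequalities used in this section, Claim 1 and the quadratic bound $H^*(\frac12 + t) \le 1 - (2/\ln 2)\,t^2$, are easy to test numerically. The Python sketch below evaluates both sides for a given joint distribution of two binary random variables; the function names and the matrix layout `joint[u][v]` are assumptions made for this illustration, and such a check is of course no substitute for the proofs.

```python
from math import log, log2


def h_star(p):
    """Binary entropy function H*(p), extended by 0 at the endpoints."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def conditional_entropy(joint):
    """H(U|V) for binary U, V with joint[u][v] = Pr(U = u, V = v)."""
    total = 0.0
    for v in (0, 1):
        p_v = joint[0][v] + joint[1][v]
        if p_v > 0:
            total += p_v * h_star(joint[0][v] / p_v)
    return total


def claim_gap(joint):
    """H*(Pr(U = V)) - H(U|V); Claim 1 states that this is nonnegative."""
    return h_star(joint[0][0] + joint[1][1]) - conditional_entropy(joint)


def quadratic_gap(t):
    """1 - (2 / ln 2) t^2 - H*(1/2 + t); nonnegative for |t| < 1/2."""
    return 1 - (2 / log(2)) * t * t - h_star(0.5 + t)
```

Both gaps vanish when $U$ and $V$ are independent and uniform, so neither inequality can be improved by a constant.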

Let us add some comments to this result. A random variable ordering for $\mathrm{DSA}_n$ is with a probability of at least $n^{-\log n}$ optimal, i.e., all address variables are tested before each data variable. If we consider cuts as in our proof, Alice gets all address variables with a probability which is approximately $n^{\log(1-2\varepsilon)}$. Therefore, we need another approach to improve the result with respect to the fraction of variable orderings but one cannot obtain a result for an exponentially small fraction. Bob gets at least $2\varepsilon n$ data variables. He can output the correct value, if Alice tells him her address variables and the decisive data variable is among Bob's variables. Otherwise, he can guess the right output with probability $1/2$ (actually, he may choose always 0 as output). Then his success probability equals $2\varepsilon + \frac12(1-2\varepsilon) = \frac12 + \varepsilon$. Hence, our approach cannot lead to substantially better results.
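The guessing strategy just described can be replayed exhaustively for a tiny instance. The Python sketch below assumes that Bob knows the full address and answers with the addressed data bit if he owns it and with 0 otherwise; the function names and the example split of the data variables are illustrative assumptions.

```python
from itertools import product


def address(a):
    """Number |a| with binary representation a (a_0 taken as high bit, an assumption)."""
    index = 0
    for bit in a:
        index = 2 * index + bit
    return index


def bob_guess(a, x, bob_data):
    """Bob outputs x_{|a|} if that data variable is his, otherwise the constant 0."""
    index = address(a)
    return x[index] if index in bob_data else 0


def success_probability(k, bob_data):
    """Fraction of all inputs of DSA_n, n = 2^k, on which Bob's guess is correct."""
    n = 2 ** k
    hits = 0
    for a in product((0, 1), repeat=k):
        for x in product((0, 1), repeat=n):
            hits += x[address(a)] == bob_guess(a, x, bob_data)
    return hits / 2 ** (k + n)
```

With $n = 4$ and Bob owning half of the data variables, that is $2\varepsilon = 1/2$, the call `success_probability(2, {2, 3})` reproduces the value $\frac12 + \varepsilon$ from the text.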

3 THE INNER PRODUCT FUNCTION

In this section, we prove results on the inner product function which are of the same flavor as the results on the direct storage access function in Section 2. The difference is that we can prove better bounds on the quality of approximation even for a larger fraction of variable orderings and larger OBDDs.

Theorem 3.1 Let $0 < \delta$. The following property holds for a fraction of at least $1 - e^{-4\delta^2 n}$ of the variable orderings $\pi$ for $\mathrm{IP}_n$. Each function which is at least a $\left(\frac12 + 2^{-\frac{1}{16}(1-9\delta)n - \frac12}\right)$-approximation of $\mathrm{IP}_n$ has a $\pi$-OBDD size which is bounded below by $2^{\delta n}$.

Proof. First, we derive a property of random variable orderings $\pi$. It is convenient to rename the variables such that $\mathrm{IP}_n(x,y) = x_1y_1 \oplus \dots \oplus x_{n/2}y_{n/2}$. We give the first $n/2$ variables according to $\pi$ to Alice and the other ones to Bob. An index $i$ is called a singleton if $x_i$ is given to Alice and $y_i$ to Bob or vice versa.

Lemma 3.2 With probability at least $1 - e^{-4\delta^2 n}$, a random variable ordering leads to at least $(1-\delta)n/8$ singletons.

Proof. The random variable ordering can be produced as follows. First, we draw $n/2$ balls out of an urn with $n/2$ white balls ($x$-variables) and $n/2$ black balls ($y$-variables) [...] variables $x_1,\dots,x_w$. Then we draw $n/2 - w$ balls out of an urn with $w$ white balls (the $y$-variables $y_1,\dots,y_w$) and $n/2 - w$ black balls (the remaining $y$-variables). The number of drawn black balls is a lower bound for the number of singletons. Our chance of getting many singletons is minimal for the maximal value $w = n/4$. Then we have a hypergeometric distribution with mean $n/8$. It is well-known that the deviation from the mean is larger for the binomial distribution with the same success probability which is $1/2$ in our case. Hence, we can bound the probability of getting at most $(1-\delta)n/8$ singletons by the probability of $(1-\delta)n/8$ successes in $n/4$ Bernoulli trials with success probability $1/2$. Now the result follows by an application of Chernoff's bound. $\Box$
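The notion of a singleton is easy to experiment with. The following Python sketch counts singletons for a given ordering and estimates, by sampling, how often a random ordering reaches the threshold $(1-\delta)n/8$ of Lemma 3.2. The encoding of variables as pairs such as `("x", i)`, the function names and the fixed random seed are assumptions of this illustration.

```python
import random


def count_singletons(order, n):
    """Number of indices i with exactly one of x_i, y_i among the first n/2 variables."""
    alice = set(order[: n // 2])
    return sum((("x", i) in alice) != (("y", i) in alice) for i in range(n // 2))


def fraction_with_many_singletons(n, delta, trials, seed=0):
    """Empirical fraction of random orderings with at least (1 - delta) n / 8 singletons."""
    rng = random.Random(seed)
    variables = [(kind, i) for kind in ("x", "y") for i in range(n // 2)]
    good = 0
    for _ in range(trials):
        order = variables[:]
        rng.shuffle(order)
        if count_singletons(order, n) >= (1 - delta) * n / 8:
            good += 1
    return good / trials
```

In such experiments the typical number of singletons is closer to $n/4$ than to $n/8$, in line with the remark that follows the proof.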

We only remark that it is even possible to obtain a lower bound of $(1-\delta)n/4$ singletons if we increase the error probability a little bit.

In the following, we fix a variable ordering $\pi$ where Alice gets among her $n/2$ variables exactly $t$ singletons. Let $g$ be a function represented by a $\pi$-OBDD $G$ of size $s$. We try to estimate the probability that $\mathrm{IP}(\tilde x) = g(\tilde x)$ on a random input $\tilde x$.

Lemma 3.3

$$\Pr(\mathrm{IP}(\tilde x) = g(\tilde x)) \le \frac12 + s^{1/2}\, 2^{-t/2-1/2}.$$

First we show how this claim implies the theorem. We set $s = 2^{\delta n}$ and $t = \frac18(1-\delta)n$. Then

$$s^{1/2}\, 2^{-t/2} = 2^{\delta n/2}\, 2^{-(1-\delta)n/16} = 2^{-\frac{1}{16}(1-9\delta)n}$$

and the theorem follows from Lemmas 3.2 and 3.3. $\Box$

Proof of Lemma 3.3. We consider the communication matrix for $\mathrm{IP}_n$ with respect to the partition of the variables between Alice and Bob, i.e., we have $2^{n/2}$ rows corresponding to the different input vectors for the variables of Alice and similarly $2^{n/2}$ columns. Each matrix entry is the value of IP on the input consisting of the row input and the column input. If we fix all variables $x_j$ and $y_j$ where $j$ is not a singleton, we obtain $\mathrm{IP}_{2t}$ or its negation as subfunction. Hence, the communication matrix can be partitioned into $2^t \times 2^t$-submatrices which are communication matrices for $\mathrm{IP}_{2t}$ or its negation. For each of these submatrices $M = (M_{ij})$, w.l.o.g. Alice is the owner of all $x$-variables and Bob the owner of all $y$-variables. It is known from Lindsey's Lemma (see, e.g., Babai, Frankl, and Simon (1986)) that for each subset $A$ of $a$ rows of $M$ and each subset $B$ of $b$ columns of $M$ it holds that

$$\left|\sum_{i \in A,\, j \in B} (-1)^{M_{ij}}\right| \le 2^{t/2}\, a^{1/2}\, b^{1/2},$$

i.e., each not too small submatrix is not constant. More precisely, we have to negate at least $\frac12\left(ab - 2^{t/2} a^{1/2} b^{1/2}\right)$ entries of an $a \times b$ submatrix of $M$ to obtain a constant submatrix. It follows by an averaging argument that there is an assignment to the variables $x_j$ and $y_j$ where $j$ is not a singleton such that $\Pr(\mathrm{IP}^*(\tilde x) = g^*(\tilde x)) \ge \Pr(\mathrm{IP}(\tilde x) = g(\tilde x))$ for the resulting subfunctions $\mathrm{IP}^*$ and $g^*$ of IP and $g$ resp. By the above arguments, we can assume w.l.o.g. that $\mathrm{IP}^* = \mathrm{IP}_{2t}$. It is sufficient to prove

$$\Pr(\mathrm{IP}^*(\tilde x) = g^*(\tilde x)) \le \frac12 + s^{1/2}\, 2^{-t/2-1/2}.$$

Let $M^*$ be the communication matrix of $g^*$, $D^* = M \oplus M^*$, and $\|D^*\|$ the number of 1-entries of $D^*$. Then

$$\Pr(\mathrm{IP}^*(\tilde x) = g^*(\tilde x)) = 1 - \|D^*\| / 2^{2t}.$$

[...]

$$\|D^*\| \ge \frac12\, 2^{2t} - s^{1/2}\, 2^{3t/2-1/2}$$

which implies the claim.

Since $g^*$ can be represented by a $\pi$-OBDD whose size is bounded by $s$, it follows from the well-known relations between one-way communication complexity and $\pi$-OBDD size that the communication matrix of $g^*$ has at most $s$ different rows. Let $r_1,\dots,r_s$ contain the different rows of $M^*$. We partition $M^*$ into a small number of constant submatrices which intuitively implies that $M$ and $M^*$ are quite different. We permute the rows (i.e., renumber the input vectors of Alice) such that we have at first $a_1$ rows equal to $r_1$, then $a_2$ rows equal to $r_2$ and so on. The block of $a_k$ equal rows can be partitioned for some $b_k$ into an $a_k \times b_k$-matrix consisting of zeros only and an $a_k \times (2^t - b_k)$-matrix consisting of ones only. Altogether we have partitioned $M^*$ into at most $2s$ constant submatrices whose sizes are $a_k \times b_k$ and $a_k \times (2^t - b_k)$, $1 \le k \le s$.

Now we consider the corresponding submatrices of $M$. By the conclusion from Lindsey's Lemma, it follows that we have to negate at least

$$d = \sum_{1 \le k \le s} \frac12\left(a_k b_k - 2^{t/2} a_k^{1/2} b_k^{1/2} + a_k(2^t - b_k) - 2^{t/2} a_k^{1/2} (2^t - b_k)^{1/2}\right)$$

entries in order to convert each submatrix into a constant one. Hence,

$$\|D^*\| \ge d = \sum_{1 \le k \le s} \frac12\left(a_k 2^t - 2^{t/2} a_k^{1/2}\left(b_k^{1/2} + (2^t - b_k)^{1/2}\right)\right).$$

The sum of all $a_k$ is equal to $2^t$. Hence,

$$d^+ := \sum_{1 \le k \le s} \frac12\, a_k 2^t = \frac12\, 2^{2t}.$$

The term $b_k^{1/2} + (2^t - b_k)^{1/2}$ is maximized for $b_k = 2^{t-1}$ and

$$d^- := \sum_{1 \le k \le s} \frac12\, 2^{t/2} a_k^{1/2}\left(b_k^{1/2} + (2^t - b_k)^{1/2}\right) \le \sum_{1 \le k \le s} 2^{t-1/2}\, a_k^{1/2}.$$

Since $x^{1/2}$ is concave, we obtain the maximal value for $a_k = 2^t/s$ and

$$d^- \le \sum_{1 \le k \le s} 2^{t-1/2}\, 2^{t/2}\, s^{-1/2} = s^{1/2}\, 2^{3t/2-1/2}.$$

Since $\|D^*\| \ge d = d^+ - d^-$, we have proved the proposed bound on $\|D^*\|$. $\Box$
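The form of Lindsey's Lemma used in this proof can be verified by brute force for very small $t$. The Python sketch below builds the communication matrix of the inner product of two $t$-bit vectors and checks the inequality for every choice of row and column subsets. The function names are assumptions, and the enumeration is exponential in $2^t$, so it is meant for $t \le 2$ only.

```python
from itertools import combinations
from math import sqrt


def ip_matrix(t):
    """M[i][j] = inner product mod 2 of the t-bit vectors encoded by i and j."""
    size = 2 ** t
    return [[bin(i & j).count("1") % 2 for j in range(size)] for i in range(size)]


def lindsey_holds(t):
    """Check |sum over A x B of (-1)^M[i][j]| <= 2^(t/2) sqrt(|A| |B|) for all A, B."""
    m = ip_matrix(t)
    size = 2 ** t
    for a in range(1, size + 1):
        for rows in combinations(range(size), a):
            # Signed column sums restricted to the chosen rows.
            col_sums = [sum((-1) ** m[i][j] for i in rows) for j in range(size)]
            for b in range(1, size + 1):
                for cols in combinations(range(size), b):
                    total = abs(sum(col_sums[j] for j in cols))
                    if total > 2 ** (t / 2) * sqrt(a * b) + 1e-9:
                        return False
    return True
```

A call such as `lindsey_holds(2)` examines all $15 \times 15$ pairs of nonempty subsets of the $4 \times 4$ matrix.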

It is now easy to obtain a function which is hard to approximate by $\pi$-OBDDs and arbitrary variable orderings $\pi$. We define the function permuted inner product $\mathrm{PIP}_n$ on $\lceil \log(n!) \rceil + n$ variables. The first $\lceil \log(n!) \rceil = \Theta(n \log n)$ variables describe a permutation $\sigma$ on $\{1,\dots,n\}$, more precisely, for each permutation, the number of code words is one or two. The function $\mathrm{PIP}_n$ realizes $\mathrm{IP}_n$ on the remaining $n$ variables which are permuted according to $\sigma$. For each variable ordering $\pi$, we have to represent (for the different assignments to the permutation variables) $\mathrm{IP}_n$ for all variable orderings once or twice. Hence, the result of Theorem 3.1 which holds for many variable orderings and $\mathrm{IP}_n$ implies a similar result for $\mathrm{PIP}_n$ and all variable orderings.

Corollary 3.4 The function $\mathrm{PIP}_n$ cannot be approximated well by $\pi$-OBDDs and [...]

4 [...]SION FOR GENETIC PROGRAMMING

Genetic programming was introduced by Koza [12] as a heuristic approach to construct a program (in the form of an S-expression) computing a function given by examples. Let us consider genetic programming restricted to Boolean concepts like Boolean formulas, binary decision diagrams, circuits etc. This restricted form of genetic programming is closely related to the following type of minimization problems.

Assume a model for representing Boolean functions is given. It may be a model from the list above, but also some weaker model like OBDDs, DNF formulas, decision trees etc. Moreover, a complexity measure for the given model is specified. An instance of the minimization problem is described by a set $S \subseteq \{0,1\}^n$ of inputs of a function $f$ of $n$ variables and the values $f(x)$ for all inputs $x \in S$. The problem is to find a function $g$ together with its representation in the given model of complexity as small as possible and such that $g(x) = f(x)$ for all $x$ in $S$.

Genetic programming is a very general type of heuristics applicable to this kind of problems which is expected to allow further progress in this area. Let us point out that in genetic programming, the usual formulation of the minimization requirement is that we look for a representation of $g$ of complexity below a bound specified among the parameters of the run.

In the present paper, we are mostly interested in the situation where $f$ is a total function, which is unknown, and we only have the values of $f$ on a set of inputs $S$, which is not complete, i.e. $S \ne \{0,1\}^n$. In this case, the goal is to find a total function $g$ which agrees with $f$ on examples from $S$ and, moreover, yields a justifiable prediction of the values of the unknown function $f$ on the inputs not in $S$. The pairs $\langle x, f(x) \rangle$ for all $x \in S$ are called training examples and the required function $g$ is called a generalization of the training examples.

In order to get provable justification, we assume that the examples are chosen at random, independently and from the same distribution. This allows us to use known results concerning the PAC-learning model, which imply that under certain assumptions, the generalization problem can be reduced to the minimization problem. More exactly, it is proved in [3] (see also [4]) that under natural assumptions, it is possible to specify a complexity bound $s$ and a number $m$, which is typically larger than $s$, so that any function $g$ of complexity at most $s$, which agrees with $f$ on $m$ randomly chosen independent examples, is likely to be a good approximation of $f$ on all inputs.

In experiments with S-expressions, for a long time, only tree representations were used. Already Koza [13] has recognized the value of graph representations and has introduced ADFs (automatically defined functions), i.e., subprograms which can be used at several places. Droste [8], [9] suggested to use OBDDs and genetic programming in order to generalize a given set of training examples. In the case of $\pi$-OBDDs for a fixed ordering, subprograms used at several places are automatically identified and merged by the reduction algorithm. If the unknown function has a small OBDD representation, then this significantly helps to find the representation. Successful experiments together with a theoretical background may be found in [8], [9], [10], [17]. A possibility to adapt an existing learning algorithm for OBDDs using membership and equivalence queries to a heuristic minimization procedure for incompletely specified Boolean functions is described in [2].

Experiments with minimization for total functions using $\pi$-OBDDs for a fixed ordering were also performed, see [21], [15].

Theorem 4.1 [...] domain. Assume $\tilde S$ is a collection of $m$ examples chosen independently from a distribution $D$ on the domain. Then, the probability that there exists a function $g \in H$, which agrees with $f$ on all examples in $\tilde S$, but $\Pr(g(\tilde x) = f(\tilde x)) < 1/2 + \varepsilon$, where $\tilde x$ is chosen from $D$, is at most $|H| \left(\frac12 + \varepsilon\right)^m$.

It is possible to strengthen this theorem using the VC dimension of $H$, see [4]. However, since we have no nontrivial upper bound on the VC dimension of classes corresponding to OBDDs, we use the weaker form. This is almost the same as if we use the formulation using the VC dimension and estimate the VC dimension by $\log |H|$, which is a general upper bound on the VC dimension of $H$.

In a typical application of this theorem following [3], the set $H$ is the set of all functions of complexity at most $s$ in some model. In order to apply Occam's razor theorem to OBDDs, we need an upper bound on the number of nonequivalent OBDDs of a given size $s$. Droste [9] used a good upper bound on this number based on a system of recurrence relations. In order to achieve a closed formula for this upper bound, we use a slightly different model, namely complete OBDDs, which test every variable in every computation. For complete OBDDs, the following bound is easy to obtain using the method of counting circuits, see e.g. [18].

Lemma 4.2 The number of nonequivalent complete $\pi$-OBDDs of size $s$ for a given $\pi$ is at most $(s+2)^{2s}/s! \approx (es)^s$.
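The two expressions in Lemma 4.2, $(s+2)^{2s}/s!$ and $(es)^s$, are both astronomically large, so it is convenient to compare their logarithms. The Python sketch below does this with `math.lgamma`, which returns $\ln \Gamma(x)$ and hence $\ln s!$ for $x = s + 1$; the function names are assumptions made for this illustration.

```python
from math import lgamma, log


def log_count_bound(s):
    """Natural logarithm of (s + 2)^(2s) / s!."""
    return 2 * s * log(s + 2) - lgamma(s + 1)


def log_es_power(s):
    """Natural logarithm of (e s)^s, i.e. s (1 + ln s)."""
    return s * (1 + log(s))


def log_ratio(s):
    """ln of the quotient of the two expressions; values near 0 mean they are close."""
    return log_count_bound(s) - log_es_power(s)
```

Evaluating `log_ratio` for a few values of $s$ shows how close the two expressions are on a logarithmic scale, which is the scale that matters when the count is multiplied by $(\frac12 + \varepsilon)^m$.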

Combining Lemma 4.2 with Theorem 4.1, it is possible to justify the quality of the prediction of the unknown function $f$, obtained by any heuristic minimization procedure for OBDDs. We leave the exact formulation to the full paper.

The general situation is that in order to get a good approximation, it is necessary that the heuristic minimization procedure succeeds to find a small OBDD that agrees with all the training examples. Let us call the situation that we find such an OBDD a compression of the set of training examples, since the size bound required to achieve a good prediction is almost exactly the bound which guarantees that the number of bits needed to represent $g$ is less than $m$.

The results of the previous sections imply that $\mathrm{DSA}_n$ and $\mathrm{IP}_n$ are hard to approximate by any $\pi$-OBDD for a random $\pi$. In the next theorem, we prove, moreover, that any function $f$ that is hard to approximate in this sense has also the following property. If we have a set of training examples for a function $f^*$, which is obtained from the function $f$ by an unknown permutation of the variables, then a reasonable compression of the examples requires also to optimize the ordering of variables used to represent $g$. More exactly, if we choose an ordering of the variables for solving the minimization problem at random before we start the minimization process and the ordering is not modified during the process, then, with high probability, almost no compression is possible.

Theorem 4.3 Let $f$, $\eta$, $\varepsilon$ and a distribution $D$ on $\{0,1\}^n$ be such that the following is true: if an ordering $\pi$ is chosen from the uniform distribution on all orderings, then with probability at least $1 - \eta$, every $\pi$-OBDD $h$ of size at most $s$ satisfies $\Pr(f(\tilde x) = h(\tilde x)) < \frac12 + \varepsilon$, where $\tilde x$ is chosen from the distribution $D$. Let $\tilde S$ be a set of $m$ independent random examples chosen from $D$. Then, with probability at least $1 - \eta - (es)^s \left(\frac12 + \varepsilon\right)^m$, there is no $\pi$-OBDD $g$ of size at most $s$ that agrees with $f$ on all training examples in $\tilde S$.

Proof. Let us call an ordering bad, if it has the property mentioned in the theorem. A random ordering is bad with probability at least $1 - \eta$. Since the examples are chosen independently of the ordering, the distribution of the examples does not [...] size at most $s$ under the condition that the ordering is bad. Every $\pi$-OBDD of size at most $s$ matches all the examples with probability at most $\left(\frac12 + \varepsilon\right)^m$. Multiplying this by the number of $\pi$-OBDDs of size at most $s$ yields an upper bound on the required conditional probability. Hence, the conditional probability that the examples may not be expressed in complexity at most $s$ is at least $1 - (es)^s \left(\frac12 + \varepsilon\right)^m$. It follows that the probability that the ordering is bad and, moreover, the examples may not be expressed in size at most $s$ is at least $(1 - \eta)\left(1 - (es)^s \left(\frac12 + \varepsilon\right)^m\right)$. This implies the theorem. $\Box$

This general result may be combined with the results of Sections 2 and 3 to obtain the following.

Corollary 4.4 For every large enough $n$, if we take $m = n^{\Theta(1)}$ examples for $\mathrm{DSA}_n$ from the uniform distribution and choose a random ordering $\pi$ of the variables, then with probability at least $1 - n^{-1/2}$, there is no $\pi$-OBDD of size $\frac{1}{10}\, m/\log m$ matching the given $m$ training examples.

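Proof. Let $\varepsilon, \varepsilon'$ be such that $\sqrt{\ln 2}/2 < \varepsilon < \varepsilon' < \left(\frac12\right)^{1/10} - \frac12$. Moreover, let $\delta < \varepsilon$ be any small positive number and let $s = \frac{1}{10}\, m/\log m$. Using Theorem 2.1, we obtain for every large enough $n$ that a random ordering $\pi$ satisfies the following. With probability at least $1 - n^{-2\varepsilon^2/\ln 2}$, there is no $(\frac12 + \varepsilon')$-approximation among functions of $\pi$-OBDD complexity at most $s \le e^{n^{\delta}}$. Note that in our situation, $(es)^s \le 2^{m/10}$. Using also Theorem 4.3, with probability at least $1 - n^{-2\varepsilon^2/\ln 2} - 2^{m/10}\left(\frac12 + \varepsilon'\right)^m \ge 1 - n^{-1/2}$, there is no $\pi$-OBDD of size at most $s$ matching the given $m$ training examples for $\mathrm{DSA}_n$. $\Box$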
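The constants in this proof can be sanity-checked numerically. The Python sketch below evaluates the admissible interval for $\varepsilon$ and $\varepsilon'$, the inequality $(es)^s \le 2^{m/10}$ for $s = \frac{1}{10}\, m/\log m$ with logarithms to base 2, and the sign of the exponent of $2^{m/10}(\frac12 + \varepsilon')^m$. All names and sample values are assumptions chosen for illustration.

```python
from math import e, log, log2, sqrt

# Endpoints of the interval sqrt(ln 2)/2 < eps < eps' < (1/2)^(1/10) - 1/2.
EPS_LOWER = sqrt(log(2)) / 2
EPS_UPPER = 0.5 ** 0.1 - 0.5


def log2_es_power(m):
    """log2 of (e s)^s for s = m / (10 log2 m)."""
    s = m / (10 * log2(m))
    return s * log2(e * s)


def log2_match_term(m, eps_prime):
    """log2 of 2^(m/10) * (1/2 + eps')^m; negative values mean the term is below 1."""
    return m / 10 + m * log2(0.5 + eps_prime)


def exponent_of_n(eps):
    """The exponent 2 eps^2 / ln 2 in the bound n^(-2 eps^2 / ln 2)."""
    return 2 * eps ** 2 / log(2)
```

The interval between `EPS_LOWER` and `EPS_UPPER` is narrow but nonempty, and any $\varepsilon$ inside it makes `exponent_of_n(eps)` exceed $1/2$, which is what the final estimate $1 - n^{-1/2}$ needs.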

Corollary 4.5 Let $0 < \gamma < 1$ be a constant. For every large enough $n$, if we take $m = n^{O(1)}$, $m \ge n$, examples for $\mathrm{IP}_n$ from the uniform distribution and choose a random ordering $\pi$ of the variables, then with probability at least $1 - e^{-\Omega(n)}$, there is no $\pi$-OBDD of size at most $(1-\gamma)m/\log m$ matching the given $m$ training examples.

Proof. Let $\delta$ and $\varepsilon$ be positive numbers such that $\delta < \frac19$ and $\frac12 + \varepsilon < \left(\frac12\right)^{1-\gamma}$. Moreover, let $s = (1-\gamma)m/\log m$. By Theorem 3.1, for every large enough $n$, a random ordering $\pi$ satisfies the following. With probability at least $1 - e^{-4\delta^2 n}$, there is no $(\frac12 + \varepsilon)$-approximation of $\mathrm{IP}_n$ among functions of $\pi$-OBDD complexity at most $s \le 2^{\delta n}$. Note that $(es)^s \le 2^{(1-\gamma)m}$. Together with Theorem 4.3, we obtain that with probability at least $1 - e^{-4\delta^2 n} - 2^{(1-\gamma)m}\left(\frac12 + \varepsilon\right)^m = 1 - e^{-\Omega(n)}$, there is no $\pi$-OBDD of size at most $s$ matching the given $m$ random training examples for $\mathrm{IP}_n$. $\Box$

On the other hand, for the functions $\mathrm{DSA}_n$ and $\mathrm{IP}_n$, there are orderings for which a good compression is possible. This suggests that including the optimization of the variable ordering into the minimization procedure often is necessary to get a good quality of the computed generalization.

References

[1] L. Babai, P. Frankl and J. Simon. Complexity classes in communication complexity theory. 27. FOCS, pp. 337–347, 1986.

[2] A. Birkendorf and H. U. Simon. Using computational learning strategies as a tool for combinatorial optimization. Ann. Math. and Art. Intelligence 22, pp. [...]

[3] [...] Information Processing Letters, No. 24, pp. 377–380, 1987.

[4] A. Blumer, A. Ehrenfeucht, D. Haussler and M. Warmuth. Learnability and the Vapnik-Chervonenkis Dimension. Journal of the ACM, Vol. 36, pp. 929–965, 1989.

[5] B. Bollig and I. Wegener. Improving the variable ordering of OBDDs is NP-complete. IEEE Trans. on Computers 45, pp. 993–1002, 1996.

[6] R. E. Bryant. Graph-based algorithms for Boolean function manipulation. IEEE Trans. on Computers 35, pp. 677–691, 1986.

[7] R. E. Bryant. Symbolic manipulation with ordered binary decision diagrams. ACM Computing Surveys 24, pp. 293–318, 1992.

[8] S. Droste. Efficient genetic programming for finding good generalizing Boolean functions. Genetic Programming '97, pp. 82–87, 1997.

[9] S. Droste. Genetic programming with guaranteed quality. Genetic Programming '98, pp. 54–59, 1998.

[10] S. Droste and D. Wiesmann. On representation and genetic operators in evolutionary algorithms. Submitted to IEEE Trans. on Evolutionary Computation, 1998.

[11] J. Hromkovič. Communication Complexity and Parallel Computing. Springer, 1997.

[12] J. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge, MA: The MIT Press, 1992.

[13] J. Koza. Genetic Programming II. Automatic Discovery of Reusable Programs. MIT Press, 1994.

[14] E. Kushilevitz and N. Nisan. Communication Complexity. Cambridge University Press, 1997.

[15] H. Sakanashi, T. Higuchi, H. Iba and K. Kakazu. An approach for genetic synthesizer of binary decision diagram. IEEE Int. Conf. on Evolutionary Computation, ICEC '96, pp. 559–564, 1996.

[16] D. Sieling. On the existence of polynomial time approximation schemes for OBDD minimization. STACS '98, LNCS 1373, pp. 205–215, 1998.

[17] T. R. Shiple, R. Hojati, A. L. Sangiovanni-Vincentelli, R. K. Brayton. Heuristic Minimization of BDDs Using Don't Cares. 31st ACM/IEEE Design Automation Conference, pp. 225–232, 1994.

[18] I. Wegener. The Complexity of Boolean Functions. Wiley-Teubner Series in Computer Science, 1987.

[19] I. Wegener. Efficient data structures for Boolean functions. Discrete Mathematics 136, pp. 347–372, 1994.

[20] I. Wegener. Branching Programs and Binary Decision Diagrams - Theory and Applications. To appear: SIAM Monographs on Discrete Mathematics and Applications, 1999.

[21] M. Yanagiya. Efficient genetic programming based on binary decision diagrams.