A Regularity Lemma and Low-Weight Approximators for Low-Degree Polynomial Threshold Functions

(1)

www.theoryofcomputing.org

A Regularity Lemma and Low-Weight

Approximators for Low-Degree

Polynomial Threshold Functions

Ilias Diakonikolas

∗

Rocco A. Servedio

†

Li-Yang Tan

‡

Andrew Wan

§

Received September 8, 2012; Revised September 27, 2013; Published March 25, 2014

Abstract: We give a “regularity lemma” for degree-d polynomial threshold functions (PTFs) over the Boolean cube{−1,1}n_{. Roughly speaking, this result shows that every} degree-dPTF can be decomposed into a constant number of subfunctions such that almost all of the subfunctions are close to being regular PTFs. Here a “regular” PTF is a PTF sign(p(x))where the influence of each variable on the polynomialp(x)is a small fraction of the total influence of p.

A conference version of this paper appeared in theProc. 25th Ann. IEEE Conf. on Computational Complexity, CCC 2010[11].

∗_{[email protected]}_{. Supported in part by a Simons Postdoctoral Fellowship. Most of this work was done at Columbia} University supported by NSF grant CCF-0728736, and by an Alexander S. Onassis Foundation Fellowship. Part of this research was done while visiting IBM Almaden.

†_{[email protected]}_{. Supported by NSF grants CNS-0716245, CCF-0915929, and CCF-1115703.}

‡_{[email protected]}_{. Supported by DARPA award no. HR0011-08-1-0069 and NSF CyberTrust grant no.}

CNS-0716245.

§_{[email protected]}_{. This work was done at Columbia University supported by NSF CyberTrust award}

CNS-0716245.

ACM Classification:F.1.3, G.2.0

AMS Classification:68Q15, 68R01

(2)

As an application of this regularity lemma, we prove that for any constantsd≥1,ε>0,

every degree-dPTF overnvariables can be approximated to accuracyεby a constant-degree

PTF that has integer weights of total magnitudeOε,d(nd). This weight bound is shown to be optimal up to logarithmic factors.

1 Introduction

A polynomial threshold function (henceforth PTF) is a Boolean function f :{−1,1}n_{→ {−}₁_,₁_}_,

f(x) =sign(p(x)),

where p:{−1,1}n_→

Ris a polynomial with real coefficients. If phas degreed, we say that f is a degree-dPTF. Low-degree PTFs are a natural generalization of linear threshold functions (the cased=1) and hence are of significant interest in complexity theory, see e. g., [1,4,29,28,9,13,16,25,32,34] and many other works.

Theinfluenceof coordinateion a functiong:{−1,1}n_→

Rmeasures the extent to whichxiaffects the output ofg. More precisely, we have Infi(g) =∑S3igb(S)

2_{, where}

∑S⊆[n]gb(S)χS(x) is the Fourier

expansion ofg. Thetotal influenceofgis the sum of allncoordinate influences, Inf(g) =∑ni=1Infi(g).

See [27,18] for background on influences. We say that a polynomialp:{−1,1}n_→

Ris “τ-regular” if the influence of each coordinate onpis

at most aτ fraction ofp’s total influence (seeSection 3for a more detailed definition). A PTF f is said to

beτ-regular if f =sign(p), wherepisτ-regular. Roughly speaking, regular polynomials and PTFs are useful because they inherit some nice properties of PTFs and polynomials over Gaussian (rather than Boolean) inputs; this intuition can be made precise using the “invariance principle” of Mossel et al. [26]. This point of view has been useful in thed =1 case for constructing pseudorandom generators [6], low-weight approximators [33,10], and other results for LTFs [30,24].

1.1 Our results

A regularity lemma for degree-dPTFs A number of useful results in different areas, loosely referred to as “regularity lemmas,” show that for various types of combinatorial objects an arbitrary object can be approximately decomposed into a constant number of “pseudorandom” sub-objects. The best-known example of such a result is Szemerédi’s classical regularity lemma for graphs [35], which (roughly) says that any graph can be decomposed into a constant number of subsets such that almost every pair of subsets induces a “pseudorandom” bipartite graph. Another example is Green’s recent regularity lemma for Boolean functions [14]. Results of this sort are useful because different properties of interest are sometimes easier to establish for pseudorandom objects, and via regularity lemmas it may be possible to prove the corresponding theorems for general objects. We note also that results of this sort play an important part in the “structure versus randomness” paradigm that has been prominent in recent work in combinatorics and number theory, see e. g., [36].

(3)

Theorem 1.1. Let f(x) =sign(p(x))be any degree-d PTF. Fix anyτ >0. Then f is equivalent to a decision treeT, of depth

depth(d,τ):= 1 τ ·

dlog1

τ O(d)

with variables at the internal nodes and a degree-d PTF f_ρ=sign(p_ρ)at each leafρ, with the following property: with probability at least1−τ, a random path1from the root reaches a leafρ such that either (i) pρ isτ-regular, or (ii) fρisτ-close to a constant function.

Regularity is a natural way to capture the notion of pseudorandomness for PTFs, and results of interest can be easier to establish for regular PTFs than for arbitrary PTFs (this is the case for our main application, constructing low-weight approximators, as we describe below). Our regularity lemma provides a general tool to reduce questions about arbitrary PTFs to regular PTFs; it has been used in this way as an essential ingredient in the recent proof that bounded independence fools all degree-2 PTFs [8] and (subsequently to the conference publication of this work [11]) degree-dPTFs [19] (see Section 9). We note that the recent construction of pseudorandom generators for degree-dPTFs of [25] also crucially uses a decomposition result which is very similar to our regularity lemma; we discuss the relation between our work and [25] in more detail below and inSection 1.2. Finally, Kane’s recent work establishing the correct exponent in the Gotsman-Linial conjecture also uses a decomposition result which is very similar to our regularity lemma [21].

Application: Every low-degree PTF has a low-weight approximator [33] showed that every linear threshold function (LTF) over{−1,1}n_{can be}

ε-approximated by an LTF with integer weightsw1, . . . ,wn such that∑iw2i =n·2

˜

O(1/ε2)_{. (Here and throughout the paper we say that}_g_{is an}

ε-approximator for f

if f andgdiffer on at mostε2ninputs from{−1,1}n.) This result and the tools used in its proof found

several subsequent applications in complexity theory and learning theory, see e. g., [6,30].

We apply our regularity lemma for degree-dPTFs to prove an analogue of the [33] result for low-degree polynomial threshold functions. Our result implies that for any constantsd,ε, any degree-dPTF

has anε-approximating PTF of constant degree and (integer) weightO(nd).

When we refer to theweightof a PTF f=sign(p(x)), we assume that all the coefficients of pare integers; by “weight” we mean the sum of the squares ofp’s coefficients. We prove

Theorem 1.2. Let f(x) =sign(p(x))be any degree-d PTF. Fix anyε>0. Then there is a polynomial q(x)of degree D= (d/ε)O(d)and weight2(d/ε)O(d)·nd such thatsign(q(x))isε-close to f .

A result on the existence of low-weightε-approximators for PTFs is implicit in the recent work [9]. They

show that any degree-dPTF f has Fourier concentration∑|S|>1/εO(d) bf(S)

2_≤

ε, and this easily implies

that f can beε-approximated by a PTF with integer weights. (Indeed, recall that the learning algorithm

of [22] works by constructing such a PTF as its hypothesis.) The above Fourier concentration bound implies that there is a PTF of degree 1/εO(d)and weightn1/εO(d) whichε-approximates f. In contrast, our

Theorem 1.2can give a weaker degree bound (ifd=1/εω(1)), but always gives a much stronger weight

bound in terms of the dependence onn. We mention here that Podolskii [31] has shown that for every constantd≥2, there is a degree-dPTF for which anyexactrepresentation requires weightnΩ(nd₎

.

(4)

We also prove lower bounds showing that weightΩe(nd)is required toε-approximate degree-dPTFs

for sufficiently small constantε; seeSection 4.3.

Techniques An important ingredient in our proof ofTheorem 1.1is a case analysis based on the “critical index” of a degree-dpolynomial (seeSection 3for a formal definition). The critical index measures the first point (going through the variables from most to least influential) at which the influences “become small;” it is a natural generalization of the definition of the “critical index” of a linear form [33] that has been useful in several subsequent works [30,6,10]. Roughly speaking we show that:

• If the critical index ofpis large, then a random restriction fixing few variables (the variables with largest influence in p) causes sign(p)to become a close-to-constant function with non-negligible probability; seeSection 3.1.

• If the critical index ofpis positive but small, then a random restriction as described above causesp

to become regular with non-negligible probability; seeSection 3.2.

• If the critical index ofpis zero, thenpis already a regular polynomial as desired.

Structure of the paper Section 2contains notation and background. InSection 3we prove our main result (Theorem 1.1). InSection 4we prove our upper bound regarding integer-weight approximations (Theorem 1.2) and also give a nearly matching lower bound for any constant accuracyε>0.

1.2 Related work

Theorem 1.1and the results in Sections3.1and3.2strengthen earlier results with a similar flavor that appeared in the work of Diakonikolas et al. [7]. Furthermore, similar structural results were proven simultaneously and independently by Ben-Eliezer et al. [23] and by Harsha et al. and Meka et al. [16,25]. We describe each of these works below and their structural results for PTFs as they compare to our Theorem 1.1.

The results in [7] were obtained simultaneously and independently by Diakonikolas et al. [9] and Harsha et al. [16], and [7] is the resulting merge (which followed the approach of [9]). A regularity lemma as inTheorem 1.1is not explicitly derived in [7]; using their Lemmas 5.10 and 5.9 and the ideas present here, one may obtain the conclusion ofTheorem 1.1, but with depth(d,τ):= (1/τ)(dlognlog(1/τ))O(d).

Note the dependence onn; eliminating this dependence is essential for our low-weight approximator application and for the applications in [8,19]. We eliminate the dependence on the dimension partly by developing dimension-independent versions of Lemmas 5.10 and 5.9, which we do in Lemmas3.5 and3.9, respectively.

Harsha et al. [16] give a result which is very similar toLemma 3.14, the main component in our proof ofTheorem 1.1. By applying the result from [16], Meka and Zuckerman [25] give a dimension-independent regularity lemma which is quite similar to ourTheorem 1.1. They obtain a small-depth decision tree such that most leaves areε-close to beingε-regular under astrongerdefinition of regularity,

(5)

Letp:{−1,1}n_→

Rbe a polynomial andε>0. We say that the polynomial pis “ε-regular in`2” if

s

n

∑

i=1

Infi(p)2_≤_ε_·

n

∑

i=1

Infi(p).

Recall that in our definition of regularity, instead of upper bounding the`2-norm of the influence

vectorI = (Inf1(p), . . . ,Infn(p))byε times the total influence of p(i. e., the`1norm ofI), we upper

bound the`∞norm (i. e., the maximum influence). We may thus call our notion “ε-regularity in`∞.”

Note that if a polynomial isε-regular in`2, then it is alsoε-regular in`∞. (And this implication is

easily seen to be essentially tight, e. g., if we have many variables with tiny influence and one variable with anε-fraction of the total influence.) For the other direction, if a polynomial isε-regular in`∞, then it

is√ε-regular in`2. (This is also tight if we have 1/ε many variables with influenceε.)

Meka and Zuckerman prove the following.

Theorem 1.3 ([25]). Every degree-d PTF f =sign(p) can be expressed as a decision tree of depth

2O(d)·(1/ε2)log2(1/ε)with variables at the internal nodes and a degree-d PTF fρ=sign(pρ)at each leafρ, such that with probability1−ε, a random root-to-leaf path reaches a leafρsuch that fρisε-close to beingε-regular in`2. (In particular, for a “good” leafρ, either pρwill beε-regular in`2or fρ will beε-close to a constant.)

Theorem 1.1shows exactly the same statement as the one above if we replace “`2” by “`∞” and the

depth of the tree by(1/ε)·(dlog(1/ε))O(d).

Sinceε-regularity in`2impliesε-regularity in`∞, the result of [25] implies a version ofTheorem 1.1

which has depth of 2O(d)·(1/ε2)log2(1/ε). Hence the latter result and our result are quantitatively

incomparable to each other. Roughly, ifd is a constant (independent of ε), then ourTheorem 1.1is

asymptotically better whenε becomes small. This range of parameters is quite natural in the context of

pseudo-random generators. In particular, in the proof that poly(1/ε)-wise independenceε-fools degree-2

PTFs [8], using [25] instead ofTheorem 1.1, would give a worse bound on the degree of independence (namely,Oe(ε−18)as opposed toOe(ε−9)). On the other hand, ifd=Ωe(log(1/ε)), then the result of [25]

is better.

Finally, Ben-Eliezer et al. [23] establish the existence of a decision tree such that most leaves are

τ-regular (as opposed toτ-close to beingτ-regular):

Theorem 1.4 ([23]). Every degree-d PTF f =sign(p) can be expressed as a decision tree of depth

2(d/τ)O(d)_·_log₍₁_/

τ)with variables at the internal nodes and a degree-d PTF fρ=sign(pρ)at each leaf ρ, such that with probability1−τ, a random root-to-leaf path reaches a leafρ such that fρ isτ-regular.

Note that the depth of their tree is exponential in 1/τ.

2 Preliminaries

We start by establishing some basic notation. We write[n]to denote {1,2, . . . ,n}and[k, `]to denote

(6)

where the underlying distribution will be clear from the context. Forx∈ {−1,1}n_and_A_⊆_[_n_]_{we write}

xAto denote(xi)i∈A.

For a function f :{−1,1}n_→

Randq≥1, we denote bykfkqits`qnorm, i. e.,

kfkq

def

=Ex|f(x)|q1/q ,

where the intended distribution over x will always be uniform over {−1,1}n_{. For Boolean-valued} functions f,g:{−1,1}n_{→ {−}₁_,₁_}_{the distance between} _f_and_g_{, denoted dist}₍_f_,_g₎_{, is}_Prx_[_f₍_x₎₆₌_g₍_x_)] where the probability is over uniformx∈ {−1,1}n_.

We assume familiarity with the basic elements of Fourier analysis over{−1,1}n_{; see}_{Section 2.1}_for a concise review. Our proofs will use various analytic and probabilistic bounds, which we collect for easy reference inSection 2.1below. We call the reader’s attention in particular toTheorem 2.3; throughout the paper,C(which will be seen to play an important role in our proofs) denotesC2₀, whereC₀is the universal constant from that theorem.

2.1 Useful background results

Fourier Analysis over{−1,1}n _{and influences} _{The survey by de Wolf [37] is an excellent source} of background material on Fourier analysis over{−1,1}n_{, including material on the Hypercontractive} Inequality; here we recall only the very basics that we will require. We consider functions f:{−1,1}n_→ R(though we often focus on Boolean-valued functions which map to{−1,1}), and we think of the inputs xto f as being distributed according to the uniform probability distribution. The set of such functions forms a 2n-dimensional inner product space with inner product given byhf,gi=E[f(x)g(x)]. The set of functions(χS)S⊆[n]defined byχS(x) =∏i∈Sxiforms a complete orthonormal basis for this space. Given a function f :{−1,1}n_→

Rwe define itsFourier coefficientsby

b

f(S)def=E[f(x)χS(x)],

and we have that f(x) =∑Sbf(S)χS(x). We refer to the maximum|S|over all nonzero bf(S)as theFourier degreeof f.

As an easy consequence of orthonormality we have Plancherel’s identity hf,gi=∑Sfb(S)g_b(S),

which has as a special caseParseval’s identity,E[f(x)2_{] =}

∑Sfb(S)2. From this it follows that for every f :{−1,1}n_{→ {−}₁_,₁_} _{we have}

∑Sbf(S)2=1. We recall the well-known fact (see e. g., [17]) that

the total influence Inf(f) of any Boolean function equals ∑Sfb(S)2|S|. Note that, in this setting, the

expectation and the variance can be expressed in terms of the Fourier coefficients of f byE[f] = bf(/0)

andVar[f] =∑/06=S⊆[n]fb(S)2.

Let f :{−1,1}n_→

Rand f(x) =∑Sbf(S)χS(x)be its Fourier expansion. Theinfluenceof variablei on f is Infi(f)def=∑S3ibf(S)2, and thetotal influenceof f is Inf(f) =∑n_i=₁Infi(f).

Useful probability bounds We first recall the following moment bound for low-degree polynomials, which is equivalent to the well-known hypercontractive inequality of [3,15]:

Theorem 2.1. Let p:{−1,1}n_→

Rbe a degree-d polynomial and q>2. Then

(7)

The following concentration bound for low-degree polynomials, a simple corollary of hypercontrac-tivity, is well known (see e. g., [12,2]):

Theorem 2.2. Let p:{−1,1}n_→

Rbe a degree-d polynomial. For any t>ed, we have

Prx[|p(x)| ≥tkpk2]≤exp(−Ω(t2/d)).

We will also need the following weak anti-concentration bound for low-degree polynomials over the cube:

Theorem 2.3 ([12, 2]). There is a universal constant C₀>1 such that for any non-zero degree-d polynomial p:{−1,1}n_→

RwithE[p] =0, we have

Prx[p(x)>C₀−d· kpk₂]>C₀−d.

Throughout this paper, we letC=C₀2, whereC₀is the universal constant fromTheorem 2.3. Note that sinceC>C0,Theorem 2.3holds forCas well.

We denote byNnthe standardn-dimensional Gaussian distributionN(0,1)n. The following two facts will be useful in the proof ofTheorem 1.2, in particular in the analysis of the regular case. The first fact is a powerful anti-concentration bound for low-degree polynomials over independent Gaussian random variables:E_G∼_Nn[p(G)2]1/2):

Theorem 2.4([5]). Let p:Rn→Rbe a nonzero degree-d multilinear polynomial. For allε>0and t∈Rwe have

Pr_G∼_Nn h

|p(G)−t| ≤ε·pVar[p(G)]i≤O(dε1/d).

We note that the above bound is essentially tight.

The second fact is a version of the invariance principle of Mossel, O’Donnell and Oleszkiewicz, specifically Theorem 3.19 under hypothesisH4in [26]:

Theorem 2.5([26]). Let

p(x) = ∑ S⊆[n],|S|≤db

p(S)χS(x)

be a degree-d multilinear polynomial withVar[p] =1. Suppose each coordinate i∈[n]hasInfi(p)≤τ. Then,

sup t∈R

|Prx[p(x)≤t]−Pr_G∼_Nn[p(G)≤t]| ≤O(dτ1/(8d)).

3 Main result: a regularity lemma for low-degree PTFs

Let f :{−1,1}n _{→ {−}₁_,₁_} _{be a degree-}_d _{PTF. Fix a representation} _f₍_x_{) =}_sign₍_p₍_x₎₎_{, where} _p_:

{−1,1}n_→

Ris a degree-dpolynomial which (w. l. o. g.) we may take to haveVar[p] =1. We assume w. l. o. g. that the variables are ordered in such a way that Infi(p)≥Infi+1(p)for alli∈[n−1].

(8)

Definition 3.1. Letp:{−1,1}n_→

Randτ>0. Assume the variables are ordered such that Infj(p)≥ Infj+1(p)for all j∈[n−1]. Theτ-critical indexofpis the leastisuch that:

Infi+1(p)≤τ·

n ∑ j=i+1

Infj(p). (3.1)

If (3.1) does not hold for anyiwe say that theτ-critical index ofpis+∞. Ifphasτ-critical index 0, we

say thatpisτ-regular.

Note that ifpis aτ-regular polynomial then maxiInfi(p)≤dτ since the total influence ofpis at most

d. Iff(x) =sign(p(x)), we say f isτ-regular whenpisτ-regular, and we take theτ-critical index of f to

be that ofp.2The following lemma says that the total influence∑ni=j+1Infi(p)goes down geometrically

as a function of jprior to the critical index:

Lemma 3.2. Let p:{−1,1}n_→

Randτ>0. Let k be theτ-critical index of p. For j∈[0,k]we have

n ∑ i=j+1

Infi(p)≤(1−τ)j·Inf(p).

Proof. The lemma trivially holds for j=0. In general, since jis at mostk, we have that Infj(p)≥ τ·∑ni=jInfi(p), or equivalently∑ni=j+1Infi(p)≤(1−τ)·∑ni=jInfi(p)which yields the claimed bound.

We will use the fact that in expectation, the influence of an unrestricted variable in a polynomial does not change under random restrictions:

Lemma 3.3. Let p:{−1,1}n_→

R. Consider a random assignmentρ to the variables x1, . . . ,xkand fix

`∈[k+1,n]. ThenEρ[Inf`(pρ)] =Inf`(p).

Proof. To proveLemma 3.3, we first recall an observation about the expected value of Fourier coefficients under random restrictions (see e. g., [22]):

Fact 3.4. Let p:{−1,1}n_→

R. Consider a random assignmentρ to the variables x1, . . . ,xk. Fix any

S⊆[k+1,n]. Then we have_cp_ρ(S) =∑T⊆[k]pb(S∪T)ρT and thereforeEρ[cpρ(S)

2_{] =}

∑T⊆[k]pb(S∪T)

2_.

In words, the above fact says that all the Fourier weight on sets of the formS∪ {any subset of restricted variables}“collapses” down ontoSin expectation. Consequently, the influence of an unrestricted variable does not change in expectation under random restrictions:

We thus have

Eρ

Inf`(pρ)

=Eρ "

∑

`∈S⊆[k+1,n]c

pρ(S)2 #

= ∑ T⊆[k]

∑

`∈S⊆[k+1,n]b

p(S∪T)2

= ∑

`∈U⊆[n]b

p(U)2=Inf`(p).

2_{Strictly speaking,}

(9)

Notation: ForS⊆[n], we write “ρfixesS” to indicate thatρ∈ {−1,1}|S|is a restriction mappingx_S,

i. e., each coordinate inS, to either−1 or 1 and leaving coordinates not inSunrestricted.

3.1 The large critical index case

The main result of this section isLemma 3.5, which says that if the critical index of f is large, then a noticeable fraction of restrictionsρof the high-influence variables cause fρto become close to a constant function.

Lemma 3.5. Let f:{−1,1}n_{→ {−}₁_,₁_}_{be a degree-d PTF f} ₌_sign₍_p₎_{. Fix}

β >0and suppose that

f hasτ-critical index at least Kdef=α/τ, whereα =Ω(dlog log(1/β) +dlogd). Then, for at least a

1/(2Cd)fraction of restrictionsρfixing[K], the function fρisβ-close to a constant function.

proof Partition the coordinates into a “head” partHdef= [K](the high-influence coordinates) and a “tail” partT = [n]\H. We can write p(x) =p(xH,xT) =p0(xH) +q(xH,xT), wherep0(xH)is the truncation of

pcomprising only the monomials all of whose variables are inH, i. e.,p0(xH) =∑S⊆Hbp(S)χS(xH).

Now consider a restrictionρofHand the corresponding polynomial pρ(xT) =p(ρ,xT). It is clear that the constant term of this polynomial is exactlyp0(ρ). To prove the lemma, we will show that for at least a 1/(2Cd)fraction of allρ ∈ {−1,1}K, the (restricted) degree-d PTF fρ(xT) =sign(pρ(xT)) satisfiesPrx_T[f_ρ(xT)6=sign(p0(ρ))]≤β. Let us define the notion of agood restriction:

Definition 3.6. A restrictionρ∈ {−1,1}K _{that fixes}_H_{is called}_good_{iff the following two conditions} are simultaneously satisfied:

(i) |p0(ρ)| ≥t∗def=1/(2Cd), and (ii) kq(ρ,xT)k2≤t∗· Θ(log(1/β)−d/2.

Intuitively condition (i) says that the constant termp0(ρ)ofpρhas “large” magnitude, while condition (ii) says that the polynomialq(ρ,xT)has “small”`2-norm. We claim that ifρis a good restriction then

the degree-d PTF fρ satisfiesPrxT[fρ(xT)6=sign(p

0₍

ρ))]≤β. To see this claim, note that for any fixed ρwe have fρ(xT)6=sign(p0(ρ))only if|q(ρ,xT)| ≥ |p0(ρ)|, so to show this claim it suffices to show that ifρis a good restriction thenPrx_T[|q(ρ,xT)| ≥ |p0(ρ)|]≤β. But forρa good restriction, by conditions (i) and (ii) we have

PrxT

|q(ρ,xT)| ≥ |p0(ρ)|≤PrxT h

|q(ρ,xT)| ≥ kq(ρ,xT)k2· Θ(log(1/β)d/2 i

,

which is at mostβ by the concentration bound (Theorem 2.2), as desired. So the claim holds: ifρ is a

good restriction then fρ(xT)isβ-close to sign(p0(ρ)). Thus to proveLemma 3.5it remains to show that at least a 1/(2Cd)fraction of all restrictionsρtoHare good.

We prove this in two steps. First we show (Lemma 3.7) that the polynomialp0is not too concentrated: with probability at least 1/Cdoverρ, condition (i) ofDefinition 3.6is satisfied. We then show (Lemma 3.8)

that the polynomialqis highly concentrated: the probability (overρ) that condition (ii) isnotsatisfied is

at most 1/(2Cd). Lemma 3.5then follows by a union bound.

Lemma 3.7. We have thatPrρ

(10)

Proof. Using the fact that the critical index of pis large, we will show that the polynomialp0has large variance (close to 1), and hence we can apply the anti-concentration boundTheorem 2.3.

We start by establishing that Var[p0]lies in the range [1/2,1]. To see this, first recall that for

g:{−1,1}n_→

R we have Var[g] =∑/06=S⊆[n]gb

2₍_S₎_{. It is thus clear that} _Var_[_p0_]_≤_Var_[_p_{] =}_{1. To}

establish the lower bound we use the property that the “tail”T has “very small” influence inp, which is a consequence of the critical index ofpbeing large. More precisely,Lemma 3.2yields

∑ i∈T

Infi(p)≤(1−τ)K·Inf(p) = (1−τ)α/τ·Inf(p)≤d·e−α (3.2)

where the last inequality uses the fact that Inf(p)≤d. Therefore, we have:

Var[p0] =Var[p]− _∑

T∩S6=/0,S⊆[n]b

p(S)2≥1− _∑

i∈T

Infi(p)≥1−de−α_≥₁_/₂

where the first inequality uses the fact that Infi(p) =∑i∈S⊆[n]pb(S)

2_{, the second follows from (3.2) and}

the third from our choice ofα. We have thus established that indeedVar[p0]∈[1/2,1].

At this point, we would like to applyTheorem 2.3for p0. Note however thatE[p0] =E[p] = _bp(/0) which is not necessarily zero. To address this minor technical point we applyTheorem 2.3twice: once for the polynomialp00=p0−p_b(/0)and once for−p00. (Clearly,E[p00] =0 andVar[p00] =Var[p0]∈[1/2,1].) We thus get that, independent of the value of p_b(/0), we have Prρ[|p0(ρ)|>2−1/2·C−d]≥C−d, as desired.

Lemma 3.8. We have thatPrρ

kq(ρ,xT)k2>t∗· Θ(log(1/β)−d/2≤1/(2Cd).

Proof. To obtain the desired concentration bound we must show that the degree-2dpolynomialQ(ρ) = kq(ρ,xT)k2₂has “small” variance. The desired bound then follows by an application ofTheorem 2.2.

We thus begin by showing thatkQk₂≤3dde−α_{. To see this, we first note that}_Q₍

ρ) =∑/06=S⊆Tcpρ(S)

2_.

Hence an application of the triangle inequality for norms and hypercontractivity (Theorem 2.1) yields:

kQk2≤ ∑

/06=S⊆T

k_cpρ(S)k24≤3

d ∑

/06=S⊆T

k_cpρ(S)k2₂.

We now proceed to bound from above the RHS term by term:

∑

/06=S⊆T

k_cpρ(S)k2₂= ∑

/06=S⊆T

Eρ[cpρ(S)

2_{] =}_E

ρ h

∑

/06=S⊆Tc

pρ(S)2 i

≤Eρ

Inf(pρ)=Eρ h

∑ i∈T

Infi(pρ) i

= ∑ i∈T

Eρ

Infi(p_ρ)= ∑ i∈T

Infi(p)≤de−α _(3.3)

where the first inequality uses the fact Inf(pρ)≥∑/06=S⊆Tcpρ(S)

2_{, the equality in (3.3) follows from}

Lemma 3.3, and the last inequality is equation (3.2). We have thus shown thatkQk₂≤3dde−α_. We now upper bound

Prρ

Q(ρ)>(t∗)2·Θ(log(1/β))−d.

Since kQk₂≤3dde−α_, _{Theorem 2.2} _{implies that for all}_t _>_ed _{we have} _Pr

ρ[Q(ρ)>t·3dde−α]≤ exp(−Ω(t1/d)). Takingt to be Θ(ddlndC) this upper bound is at most 1/(2Cd). Our choice of the parameterα givest·d3d·e−α≤(t∗)2·Θ(log(1/β))−d. This completes the proof ofLemma 3.8, and

(11)

3.2 The small critical index case

In this section we show that if the critical index of pis “small,” then a random restriction of “few” variables causes pto become regular with non-negligible probability. We do this by showing that no matter what the critical index is, a random restriction of all variables up to theτ-critical index causesp

to becomeτ0-regular, for someτ0not too much larger thanτ, with probability at least 1/(2Cd). More formally, we prove:

Lemma 3.9. Let p:{−1,1}n_→

Rbe a degree-d polynomial withτ-critical index k∈[n]. Letρ be a random restriction that fixes[k], and letτ0= (C0·dlnd·ln1_τ)d·τfor some suitably large absolute constant C0. With probability at least1/(2Cd)over the choice ofρ, the restricted polynomial pρisτ0-regular.

Proof. We must show that with probability at least 1/(2Cd)overρthe restricted polynomial pρsatisfies

Inf`(pρ)/ n ∑ j=k+1

Infj(pρ)≤τ

0

(3.4)

for all`∈[k+1,n]. Note that before the restriction, we have Inf`(p)≤τ·∑nj=k+1Infj(p)for all `∈ [k+1,n]because theτ-critical index ofpisk.

Let us give an intuitive explanation of the proof. We first show (Lemma 3.10) that with probability at leastC−d the denominator in (3.4) does not decrease under a random restriction. This is an anti-concentration statement that follows easily fromTheorem 2.3. We then show (Lemma 3.11) that with probability at least 1−C−d/2 the numerator in (3.4) does not increase by much under a random restriction, i. e.,novariable influence Inf`(pρ),`∈[k+1,n], becomes too large. Thus both events occur (andpρ is τ0-regular) with probability at leastC−d/2.

We note that while each individual influence Inf`(pρ)is indeed concentrated around its expectation (seeClaim 3.12), we need a concentration statement forn−ksuch influences. This might seem difficult to achieve since we require bounds that are independent ofn. We get around this difficulty by a “bucketing” argument that exploits the fact (at many different scales) that all but a few influences Inf`(p)must be

“very small.”

It remains to state and prove Lemmas3.10and3.11. Consider the event

Edef

=

ρ∈ {−1,1}k|

n ∑

`=k+1

Inf`(pρ)≥ n ∑

`=k+1

Inf`(p)

.

We first show:

Lemma 3.10. Prρ[E]≥C−d.

Proof. It follows fromFact 3.4and the Fourier expression of Inf`(pρ)thatA(ρ)

def

=∑n`=k+1Inf`(pρ)is a degree-2d polynomial. By Lemma 3.3we get thatEρ[A] =∑

n

`=k+1Inf`(p)>0. Also observe that A(ρ)≥0 for allρ∈ {−1,1}k_{. We may now apply}_{Theorem 2.3}_{to the polynomial}_A0₌_A₋_E

ρ[A], to obtain:

(12)

We now turn toLemma 3.11. Consider the event

Jdef

=nρ∈ {−1,1}k max

`∈[k+1,n]Inf`(pρ)>τ

0 n

∑ j=k+1

Infj(p)

o .

We show:

Lemma 3.11. Prρ[J]≤(1/2)·C−d.

The rest of this subsection consists of the proof ofLemma 3.11. A useful intermediate claim is that the influences of individual variables do not increase by a lot under a random restriction (note that this claim does not depend on the value of the critical index):

Claim 3.12. Let p:{−1,1}n_→

Rbe a degree-d polynomial. Letρ be a random restriction fixing[j]. Fix any t>e2d and any`∈[j+1,n]. With probability at least1−exp(−Ω(t1/d))over ρ, we have Inf`(pρ)≤3

d_t_Inf

`(p).

Proof. The identity Inf`(pρ) =∑`∈S⊆[j+1,n]cpρ(S)

2 _and _{Fact 3.4} _{imply that Inf}

`(pρ) is a degree-2d

polynomial inρ. Hence the claim follows from the concentration bound,Theorem 2.2, assuming we can

appropriately upper bound the`2-norm of the polynomial Inf`(pρ). So, to proveClaim 3.12it suffices to show that

kInf`(pρ)k2≤3dInf`(p). (3.5)

The proof of equation (3.5) is similar to the argument establishing thatkQk₂≤3dde−α_in_{Section 3.1.} The triangle inequality tells us that we may bound the`2-norm of each squared-coefficient separately:

kInf`(pρ)k2≤ ∑

`∈S⊆[j+1,n]

kp_bρ(S)2k2.

Sincep_bρ(S)is a degree-dpolynomial,Theorem 2.1yields that

kp_bρ(S)2k2=kpbρ(S)k

2 4≤3

d_k

b pρ(S)k22,

hence

kInf`(pρ)k2≤3d ∑

`∈S⊆[j+1,n]

kp_bρ(S)k22=3

d

Inf`(p),

where the last equality is a consequence of Fact 3.4. Thus equation (3.5) holds, andClaim 3.12is proved.

Claim 3.12 says that for any given coordinate, the probability that its influence after a random restriction increases by atfactor decreases exponentially int. Note thatClaim 3.12and a naive union bound over all coordinates in[k+1,n]does not suffice to proveLemma 3.11. Instead, we proceed as follows: We partition the coordinates in[k+1,n]into “buckets” according to their influence in the tail of

p. In particular, thei-th bucket (i≥0) contains all variables`∈[k+1,n]such that

Inf`(p)

∑n_j=k+₁Infj(p)

(13)

We analyze the effect of a random restrictionρon the variables of each bucketiseparately and then

conclude by a union bound over all the buckets.

So fix a bucketi. Note that, by definition, the number of variables in the i-th bucket is at most 2i+1/τ. We bound from above the probability of the eventB(i)that there exists a variable`in bucketi

that violates the regularity constraint, i. e., such that Inf`(pρ)>τ

0

∑n`=k+1Inf`(p). We will do this by a

combination ofClaim 3.12and a union bound over the variables in the bucket. We will show:

Claim 3.13. We have thatPrρ[B(i)]≤2−(i+2)·C−d.

The above claim completes the proof of Lemma 3.11by a union bound across buckets. Indeed, assuming the claim, the probability thatany variable `∈[k+1,n]violates the condition Inf`(pρ)≤ τ0∑n`=k+1Inf`(p)is at most

∞

∑ i=0

Prρ[B(i)]≤C−d2−2

∞

∑ i=0

(1/2)i= (1/2)·C−d.

It thus remains to proveClaim 3.13. Fix a variable`in thei-th bucket. We applyClaim 3.12selecting a value of

t=et

def

=

lnC d₄i+2

τ d

.

It is clear thatet≤c

0d₍_d₊_i₊_ln₍₁_/

τ))d for some absolute constantc0. As a consequence, there is an

absolute constantC0such that for everyi,

et≤3

−d

C0d2i

dlndln1

τ d

. (3.6)

(To see this, note that fori≤10dlndwe haved+i+ln(1/τ)<11dlndln(1/τ), from which the claimed

bound is easily seen to hold. Fori>10dlnd, we used+i+ln(1/τ)<diln(1/τ)and the fact thatid<2i

fori>10dlnd.)

Inequality (3.6) can be rewritten as 3d·et·τ/2

i_≤

τ0. Hence, our assumption on the range of Inf`(p)

gives

3d·et·Inf`(p)≤τ

0_· n

∑ j=k+1

Infj(p).

Therefore, byClaim 3.12, the probability that coordinate`violates the condition

Inf`(pρ)≤τ

0 n

∑ j=k+1

Infj(p)

is at mostτ/(Cd4i+2)by our choice ofet. Since bucketicontains at most 2

i+1_/

τ coordinates,Claim 3.13

(14)

3.3 Putting everything together: proof ofTheorem 1.1

The following lemma combines the results of the previous two subsections:

Lemma 3.14. Let p:{−1,1}n_→

Rbe a degree-d polynomial and0<eτ,β<1/2. Fix

α=Θ(dlog log(1/β) +dlogd) and τe

0₌

e

τ·(C0dlndln(1/τe))

d

,

where C0 is a universal constant. (We assume w. l. o. g. that the variables are ordered s.t. Infi(p)≥

Infi+1(p), i∈[n−1].) One of the following statements holds true:

1. The polynomial p isτ_e-regular.

2. With probability at least1/(2Cd)over a random restrictionρfixing the firstα/eτ(most influential) variables of p, the functionsign(pρ)isβ-close to a constant function.

3. There exists a value k≤α/τe, such that with probability at least1/(2C

d₎_{over a random restriction}

ρ fixing the first k (most influential) variables of p, the polynomial pρ iseτ

0_-regular.

Proof. The proof is by case analysis based on the value`of theeτ-critical index of the polynomialp. If `=0, then by definitionpisτe-regular, hence the first statement of the lemma holds. If` >α/τe, then we

randomly restrict the firstα/τemany variables. Lemma 3.5says that for a random restrictionρ fixing

these variables, with probability at least 1/(2Cd)the (restricted) degree-d PTF sign(pρ)isβ-close to

a constant. Hence, in this case, the second statement holds. To handle the case`∈[1,α/τe], we apply

Lemma 3.9. This lemma says that with probability at least 1/(2Cd)over a random restrictionρ fixing

variables[`], the polynomialpρisτe

0_{-regular, so the third statement of}_{Lemma 3.14}_holds.

Proof ofTheorem 1.1. We begin by observing that any function fon{−1,1}n_{is equivalent to a decision} tree where each internal node of the tree is labeled by a variable, every root-to-leaf path corresponds to a restrictionρ that fixes the variables as they are set on the path, and every leaf is labeled with the

restricted subfunction f_ρ. Given an arbitrary degree-dPTF f =sign(p), we will construct a decision tree Tof the form described inTheorem 1.1. It is clear that in any such tree every leaf function fρ will be a degree-d PTF; we must show thatThas depth depth(d,τ)and that with probability 1−τ over the choice

of a random root-to-leaf pathρ, the restrictionρis such that either fρ=sign(pρ)isτ-close to a constant function, orpρ isτ-regular.

For a treeT computing f=sign(p), we denote byN(T)its set of internal nodes and byL(T)its set of leaves. We call a leafρ∈L(T)“good” if either fρ isτ-close to a constant function or pρisτ-regular. We call a leaf “bad” otherwise. LetGL(T) andBL(T) be the sets of “good” and “bad” leaves inT

respectively.

The basic approach for the proof is to invoke Lemma 3.14repeatedly in a sequence of at most 2Cdln(1/τ)stages. In the first stage we applyLemma 3.14to f itself; this gives us an initial decision tree. In the second stage we applyLemma 3.14to the restricted subfunctions fρ corresponding to leaves of the initial decision tree that are bad; this “grows” our initial decision tree. Subsequent stages continue similarly; we will argue that after at most 2Cdln(1/τ)stages, the resulting tree satisfies the required

properties forT. In every application ofLemma 3.14the parametersβ andeτ

0_{are both taken to be}

τ; note

that takingτe

0_{to be}

(15)

We now provide the details. In the first stage, the initial application ofLemma 3.14results in a tree

T1. This treeT1may consist of a single leaf node that isτe-regular (if f isτe-regular to begin with—in this

case, sinceτe<τ, we are done), or a complete decision tree of depthα/τe(if f had large critical index),

or a complete decision tree of depthk<α/eτ(if f had small critical index). Note that in each case the

depth ofT1is at mostα/τe.Lemma 3.14guarantees that:

Prρ∈T1[ρ∈BL(T1)]≤1−1/(2C d₎_,

where the probability is over a random root-to-leaf pathρinT1.

In the second stage, the “good” leavesρ∈GL(T1)are left untouched; they will be leaves in the final

treeT. For each “bad” leafρ∈BL(T1), we order the unrestricted variables in decreasing order of their

influence in the polynomialpρ, and we applyLemma 3.14to fρ. This “grows”T1at each bad leaf by

replacing each such leaf with a new decision tree; we call the resulting overall decision treeT2.

A key observation is that the probability that a random path from the root reaches a “bad” leaf is significantly smaller inT2than inT1; in particular

Prρ∈T2[ρ∈BL(T2)]≤(1−1/(2C d₎₎2_.

We argue this as follows: Letρ be any fixed “bad” leaf inT1, i. e.,ρ∈BL(T1). The function fρ is not e

τ0-regular and consequently notτe-regular. Thus, either statement (2) or (3) ofLemma 3.14must hold

when the Lemma is applied to f_ρ. The tree that replacesρinT0has depth at mostα/eτ, and a random

root-to-leaf pathρ1in this treereaches a “bad” leaf with probability at most 1−1/(2Cd). So the overall

probability that a random root-to-leaf path inT2reaches a “bad” leaf is at most(1−1/(2Cd))2.

Continuing in this fashion, in thei-th stage we replace all the bad leaves ofTi−1by decision trees

according toLemma 3.14and we obtain the treeTi. An inductive argument gives that

Prρ∈Ti[ρ∈BL(Ti)]≤(1−1/(2C

d₎₎i_,

which is at mostτfori∗def=2Cdln(1/τ).

The depth of the overall tree will be the maximum number of stages (2Cdln(1/τ)) times the maximum depth added in each stage (at mostα/eτ, since we always restrict at most this many variables), which is at

most(α/eτ)·i

∗_{. Since}

β =τ, we getα=Θ(dlog log(1/τ) +dlogd). Recalling thatτe

0_in_{Lemma 3.14}_is

set toτ, we see that

e

τ= τ

(C0dlndln(1/τ))O(d) .

By substitution we get that the depth of the tree is upper bounded bydO(d)·(1/τ)·log(1/τ)O(d)which

concludes the proof ofTheorem 1.1.

4 Every degree-

d

PTF has a low-weight approximator

In this section we applyTheorem 1.1to proveTheorem 1.2, which we restate below:

Theorem 1.2. Let f(x) =sign(p(x))be any degree-d PTF. Fix anyε >0and let τ= (Θ(1)·ε/d)8d_.

(16)

To proveTheorem 1.2, we first show that any sufficiently regular degree-dPTF overnvariables has a low-weight approximator, of weight roughlynd.Theorem 1.1asserts that almost every leafρ ofTis a

regular PTF or close to a constant function; at each regular leafρ we use the low-weight approximator of

the previous sentence to approximate f_ρ. Finally, we combine all of these low-weight polynomials to get an overall PTF of low weight which is a good approximator for f. We give details below.

4.1 Low-weight approximators for regular PTFs

In this subsection we prove that every sufficiently regular PTF has a low-weight approximator of degree

d:

Lemma 4.1. Given ε >0, let τ = (Θ(1)·ε/d)8d. Let p:{−1,1}n →R be a τ-regular degree-d polynomial withVar[p] =1. There exists a degree-d polynomial q:{−1,1}n_→

Rof weight nd·(d/ε)O(d) such thatsign(q(x))is anε-approximator forsign(p(x)).

Proof. The polynomialqis obtained by rounding the weights ofpto an appropriate granularity, similar to the regular case in [33] for thed=1 case. To show that this works, we use the fact that regular PTFs have very good anti-concentration. In particular we will use the following claim, which follows easily by combining the invariance principle [26] and Gaussian anti-concentration [5]:

Claim 4.2. Let p:{−1,1}n_→

Rbe aτ-regular degree-d polynomial withVar[p] =1. Then

Prx[|p(x)| ≤τ]≤O(dτ1/8d).

Proof. We recall that, since Var[p] =1 and pis of degree d, it holds Inf(p)≤d. Thus, since pis

τ-regular, we have that maxi∈[n]Infi(p)≤dτ. An application of the invariance principle (Theorem 2.5) in tandem with anti-concentration in gaussian space (Theorem 2.4) yields

Prx[|p(x)| ≤τ]≤O(d·(dτ)1/8d) +Pr_G∼_Nn[|p(G)| ≤τ] ≤O(dτ1/8d) +O(dτ1/d) =O(dτ1/8d),

and the claim follows.

We turn to the detailed proof ofLemma 4.1. We first note that if the constant coefficientpb(/0)ofP

has magnitude greater than(O(log(1/ε)))d/2, thenTheorem 2.2(applied to p(x)−pb(/0)) implies that

sign(p(x))agrees with sign(p_b(/0))for at least a 1−ε fraction of inputsx. So in this case sign(p(x))is ε-close to a constant function, and the conclusion of the Lemma certainly holds. Thus we henceforth assume that|p_b(/0)|is at most(O(log(1/ε)))d/2.

Let

α= τ

(Kn·ln(4/ε))d/2

whereK>0 is an absolute constant (specified later). For eachS6=/0 letq_b(S)be the value obtained by rounding pb(S)to the nearest integer multiple of α, and let qb(/0)equal pb(/0). This defines a degree-d

(17)

ofq(x)/α are integers. Since each coefficientbq(S)has magnitude at most twice that of bp(S), we may

bound the sum of squares of coefficients ofq(x)/α by

b p(/0)2

α2 +

∑S6=/04bp(S)

2

α2

≤(O(log(1/ε)))

d

α2

≤nd·(d/ε)O(d).

We now observe that the constant coefficientpb(/0)ofq(x)can be rounded to an integer multiple ofα

without changing the value of sign(q(x))for any inputx. Doing this, we obtain a polynomialq0(x)/α

with all integer coefficients, weightnd·(d/ε)O(d), and which has sign(q0(x)) =sign(q(x))for allx.

In the rest of our analysis we shall consider the polynomialq(x)(recall that the constant coefficient of

q(x)is precisely bp(/0)). It remains to show that sign(q)is anε-approximator for sign(p). For eachS6=/0

leteb(S)equal bp(S)−qb(S). This defines a polynomial (with constant term 0)e(x) =∑Sbe(S)xS, and we

haveq(x) +e(x) =p(x). (The coefficientse_b(S)are the “errors” induced by approximating_bp(S)byq_b(S).) Recall thatτ = (Θ(1)·ε/d)8d. For any inputx, we have that sign(q(x))6=sign(p(x))only if either

(i) |e(x)| ≥τ,or

(ii) |p(x)| ≤τ.

Since each coefficient ofe(x)satisfies

|e_b(S)| ≤α

2 ≤

τ

2(Kn·ln(4/ε))d/2,

the sum of squares of all (at mostnd) coefficients ofeis at most

∑ Sb

e(S)2≤ τ

2

4(Kln(4/ε))d, and thus kek2≤

τ

2(Kln(4/ε))d/2.

ApplyingTheorem 2.2, we get thatPrx[|e(x)| ≥τ]≤ε/2 (for a suitable absolute constant choice ofK), so we have upper bounded the probability of (i).

For (ii), we use the anti-concentration bound for regular polynomials,Claim 4.2. This directly gives us thatPrx[|p(x)| ≤τ]≤O(dτ1/8d)≤ε/2.

Thus the probability, over a randomx, that either (i) or (ii) holds is at mostε. Consequently sign(q)

is anε-approximator for sign(p), andLemma 4.1is proved.

4.2 Proof ofTheorem 1.2

Let f =sign(p)be an arbitrary degree-d PTF overn Boolean variables, and letε>0 be the desired

approximation parameter. We invokeTheorem 1.1with its “τ” parameter set toτ= (Θ(1)·(ε/2)/d)8d (i. e., our choice ofτis obtained by plugging in “ε/2” forεin the first sentence ofLemma 4.1). Each leaf ρof the treeT as described inTheorem 1.1(we call these “good” leaves) is either aτ-regular degree-d

PTF orτ-approximated by a constant function. ByLemma 4.1, for each regular leafρ there is a degree-d

(18)

these “bad” leaves), for which f_ρ is notτ-close to anyτ-regular degree-dPTF, letq(ρ)be the constant-1

function.

For each leafρ of depthr inT, letPρ(x)be the unique multilinear polynomial of degreer which outputs 2riffxreachesρ and outputs 0 otherwise. (As an example, ifρ is a leaf which is reached by the

path “x₃=−1,x₆=1,x₂=1” from the root inT, thenP_ρ(x)would be(1−x₃)(1+x₆)(1+x₂).) Our final PTF is

g(x) =sign(Q(x)), where Q(x) =∑ ρ

Pρ(x)q(ρ)(x).

It is easy to see that on any inputx, the valueQ(x)equals 2|ρx|_·_q(ρx)₍_x₎_{, where we write}_ρ

x to denote the leaf ofT thatxreaches and|ρx|to denote the depth of that leaf. Thus sign(Q(x))equals sign(q(ρx)(x)) for eachx, and from this it follows thatPrx[g(x)6= f(x)]is at mostτ+τ+ε/2<ε. Here the firstτis

because a random inputxmay reach a bad leaf with probability up toτ, and theτ+ε/2 is because for

each good leafρ, the function is eitherτ-approximated by a constant function orε/2-approximated by

sign(q(ρ)₎_.

Since T has depth depth(d,τ), it is easy to see thatQ has degree at most depth(d,τ) +d. It is

clear that the coefficients ofQare all integers, so it remains only to bound the sum of squares of these coefficients. Each polynomial addendPρ(x)q(ρ)(x) in the sum is easily seen to have sum of squared coefficients

∑ S

\

Pρq(ρ)(S)2=E[(Pρ·q

(ρ)₎2_]_≤_max

x Pρ(x)

2_·_E_[_q(ρ)₍_x₎2_]_≤₂2depth(d,τ)_·_nd_·₍

d/ε)O(d). (4.1)

SinceT has depth depth(d,τ), the number of leaves ρ is at most 2depth(d,τ)_{, and hence for each}_S_by Cauchy-Schwarz we have

b Q(S)2=

∑ ρ

\ P_ρq(ρ)₍_S₎

2

≤2depth(d,τ)_· ∑

ρ \

P_ρq(ρ)₍_S₎2_. _(4.2)

This implies that the total weight ofQis

∑ S

b

Q(S)2≤2depth(d,τ)_· ∑ ρ,S

\

Pρq(ρ)(S)2 (using (4.2))

≤22depth(d,τ)_max ρ

∑ S

\ Pρq(ρ)(S)2

≤24depth(d,τ)_·_nd_·₍_d_/

ε)O(d), (using (4.1))

andTheorem 1.2is proved.

4.3 Degree-d PTFs requireΩe(nd)-weight approximators

In this section we give two lower bounds on the weight required toε-approximate certain degree-dPTFs.

(19)

Theorem 4.3. For all sufficiently large n, there is a degree-d n-variable PTF f(x)with the following property: Let K(d)be any positive-valued function depending only on d. Suppose that g(x) =sign(q(x))

is a degree-K(d)PTF with integer coefficientsq_b(S)such thatdist(f,g)≤ε?whereε?def=C−d/2. Then the weight of q isΩd(nd/logn).

Theorem 4.4. For all sufficiently large n, there is a degree-d n-variable PTF f(x)with the following property: Suppose that g(x) =sign(q(x))is any PTF (of any degree) with integer coefficientsq_b(S)such

thatdist(f,g)≤ε?whereε?def=C−d/2. Then the weight of q isΩd(nd−1).

Viewingdandεas constants,Theorem 4.3implies that theO(nd)weight bound of ourε-approximator

fromTheorem 1.2(of constant degree) is essentially optimal for any constant-degreeε-approximator. Theorem 4.4says that there is only small room for improving our weight bound even if arbitrary-degree PTFs are allowed as approximators.

Theorems4.3and4.4are both consequences of the following theorem:

Theorem 4.5. There exists a setC={f1, . . . ,fM}of M=2Ωd(n

d₎

degree-d PTFs fi such that for any 1≤i< j≤M, we havedist(fi,fj)≥C−d.

Proof of Theorems4.3and4.4assumingTheorem 4.5. First we proveTheorem 4.3. We begin by claim-ing that there are at most

3

n ≤K(d)

A

many integer-weight PTFs of degreeK(d)and weight at mostA. This is because any such PTF can be obtained by making a sequence ofAsteps, where at each step either−1, 0, or 1 is added to one of the

n

≤K(d)

many monomials of degree at mostK(d). Each step can be carried out in 3 _≤_K(d)n ways, giving the claimed bound.

ByTheorem 4.5, there areMdistinct degree-d PTFs f1, . . . ,fMany two of which areC−d-far from each other. Consequently any Boolean function (in particular, any weight-Adegree-K(d)PTFg) can have dist(g,fi)≤C−d/2 for at most one fi. Since there are only

3

n ≤K(d)

A

many weight-Adegree-K(d)PTFs, which is less thanMfor someA=Ωd(nd/logn), it follows that some

fimust have distance at leastC−d/2 from every weight-A, degree-K(d)PTF. This givesTheorem 4.3. The proof ofTheorem 4.4is nearly identical. We now use the fact that there are at most(3·2n₎A many integer-weight PTFs of weight at mostA(and any degree), and use the fact that(3·2n)A_{is less than}

Mfor someA=Ωd(nd−1).

(20)

4.3.1 Proof ofTheorem 4.5

The proof is by the probabilistic method. We define the following distributionDovern-variable degree-d

polynomials. A draw ofp(x) =∑S⊂[n],|S|=dbp(S)χS(x)fromDis obtained in the following way: each of

the n_dcoefficients bp(S)is independently and uniformly selected from{−1,1}.

We will proveTheorem 4.5usingLemma 4.6, which says that it is extremely likely the polynomial

c—the product of two independent drawsaandbfromD—will have both small bias and large variance.

Lemma 4.6. Let a(x) and b(x) be two degree-d polynomials drawn independently fromD, and let c(x) =a(x)b(x). Then with probability at least1−2−Ωd(nd)_{we have:}

1. |_bc(/0)| ≤1 4C

−d n/2

d

, and

2. Var[c] =∑|S|>0bc(S)

2_≥ 1 12

n/2

d

2 .

Suppose thatLemma 4.6holds. Let a(x) andb(x) be independent draws fromDand letc(x) = a(x)b(x)which satisfies the conclusions of the lemma. Then the constant term_bc(/0)is small compared with the variance ofc(x). Let us rescale so the variance is 1; i. e., define the polynomial

e(x)def= c(x) Var[c]1/2

soVar[e] =1 and |_be(/0)|<C−d. We now apply Theorem 2.3 to the degree-2d polynomial q(x) =

−e(x) +be(/0), and we see that with probability at leastC

−d _{(over a random uniform draw of}_x_{) we have}

−e(x) +_be(/0)>C−d, and hencePrx[sign(e(x))<0]>C−d.

We now observe that sign(e(x))<0 if and only if sign(a(x))6=sign(b(x)), and consequently Prx[sign(e(x))<0]is precisely dist(sign(a),sign(b)). We thus have that for a(x),b(x) drawn from Das described above, the probability that dist(sign(a),sign(b))is less thanC−d is at most 2−αdnd _for

some absolute constantαd>0 (depending only ond). Now let us consider

M=2(αd/2)nd

many independent draws of polynomialsa1,a2, . . . ,aMfromD. A union bound over all the

M

2

<2αdnd

pairs(i,j)with 1≤i< j≤Mgives that with nonzero probability, everyai,aj pair satisfies

dist(sign(ai),sign(aj))≥C−d.

Thus there must be some outcome for the polynomialsa1,a2, . . . ,aMsuch that dist(sign(ai),sign(aj))≥C−d

for all 1≤i< j≤M. Setting fi=sign(ai)for this outcome,Theorem 4.5is proved.

(21)

4.3.2 Proof ofLemma 4.6

Let us considera(x) =∑|S|=dab(S)χS(x)andb(x) =∑|S|=dbb(S)χS(x)drawn independently fromD. We

will show that the bias of the polynomial c(x) =a(x)b(x) fails to satisfy the bound in item 1 with probability 2−Ωd(nd)_{. Then we show the variance of}_c_{fails to satisfy item 2 with probability 2}−Ωd(nd)_{, and}

the lemma follows from a union bound.

To bound the bias ofc, we begin by noting that:

b

c(/0) =

_∑

S⊆[n]

b

a(S)bb(S).

Each termab(S)bb(S) in the summand is uniform, i.i.d in{−1,1}. Define the random variable XS =

1/2−(1/2)a_b(S)bb(S). Then_∑S_⊆_[n]X_Sis binomially distributed and setting

t=1 4C

−d n 2d

d ,

we may apply the Chernoff bound to obtain:

Pr

b

c(/0)<−1

4C

−d n

2d d

=Pr[X>E[X] +t]≤exp −2 t

2

n d

!

=2−Ωd(nd)_.

The same analysis gives a bound on the magnitude in the negative direction. Since

n/2

d

≥ n

2d d

,

this concludes the analysis for the first item of the lemma.

Now we show that item 2 of the lemma also fails with very small probability. The following terminology will be useful. LetT ⊂[n]be a subset of size exactly|T|=2d(we think ofT as the set of variables defining some monomial of degree 2d). For such aT we let

first(T)def=T∩[n/2] and second(T)def=T∩[n/2+1,n].

We say that such aT isbalancedif|first(T)|=|second(T)|=d. Note that there are exactly n/_d22many balanced subsetsT.

We say that a subsetU⊂[n],|U|=dispureifUis contained entirely in[n/2+1,n]. Let us consider

a(x) =

_∑

|S|=d

b

a(S)χS(x) and b(x) =

_∑

|S|=d

b

b(S)χS(x)

drawn independently fromD. Fix any outcome fora(i. e., for the values of all n_dcoefficientsba(S)), and

(22)

value (drawn uniformly from{−1,1}}) for each of the n_d/2coefficientsbb(U)for pureU. We will show

that with probability at least 1−2−Ωd(nd)_{over the remaining randomness, at least 1}_/_{6 of the}

n/2

d 2

many balanced subsetsT havebc(T)6=0. Since each valuebc(T)which is nonzero is at least 1 in magnitude,

this suffices to prove the lemma.

Consider any fixed pure subsetU⊂[n],|U|=d (for exampleU={n−d+1, . . . ,n}). LetT be a balanced subset ofn(so|T|=2d) such that second(T)equalsU. (There are precisely n/_d2balanced subsetsT with this property; letTUdenote the collection of all n_d/2of them.) Consider the valuebc(T):

this is

b

c(T) =

_∑

S⊆T,|S|=d

b

a(S)bb(T−S).

The only “not-yet-fixed” part of the above expression is the single coefficientbb(U); everything else has

been fixed. Since the coefficientab(T−U)ofbb(U)is a nonzero integer, there are two possible outcomes

for the value ofbc(T), depending on whetherbb(U)is set to +1 or -1. These two possible values differ by

2; consequently, there is at most one possible outcome ofbb(U)that will causec_b(T)to be zero. (Note that

it may well be the case that no outcome forbb(U)would cause_bc(T)to become zero.)

Let us say that an outcome ofbb(U)isperniciousif it has the following property: at most 1/3 of the

n/2

d

elementsT ∈TU havebc(T)take a nonzero value under that outcome ofbb(U). (Equivalently, at

least 2/3 of the n/_d2elementsT ∈TU havebc(T)become zero under that outcome ofbb(U).) It may be

the case that neither outcome in{−1,1}forbb(U)is pernicious (e. g., if each outcome makes at least

95% of the_bc(T)values come out nonzero). It cannot be the case that both outcomes{−1,1}forbb(U)

are pernicious (for if there were two pernicious outcomes, this would mean that at least 1/3 of thebc(T)

values evaluate to 0 under both outcomes forbb(U), but it is impossible for anyc_b(T)to evaluate to 0 under

two outcomes forbb(U)). Consequently we have

Pr

the outcome ofbb(U)is pernicious

≤1/2.

This is true independently for each of the n/_d2many pure subsetsU. As a result, a simple analysis gives

Pr

h

at least 3/4 of the

n/2

d

pure subsetsUhave a pernicious outcome

i

≤2−Ωd(nd)_.

Thus we may assume that fewer than 3/4 of the n/_d2pure subsetsU have a pernicious outcome. So at least 1/4 of the pure subsetsU are non-pernicious. For each such non-perniciousU, more than 1/3 of the n/_d2

elements inTUhave_bc(T)take a nonzero value. Consequently, at least

1 12

n/2

d 2

(23)

5 Conclusions and open problems

In this paper, we gave a regularity lemma for low-degree PTFs. Our lemma roughly says that any low-degree PTF can be decomposed as a shallow binary decision treeT, such that “most” leaves ofT correspond to “regular” low-degree PTFs (or constant functions). As an application, we showed that any constant-degree PTF is well-approximated by a constant-degree PTF with low integer weights. Our result has also been used as an essential ingredient in the recent line of work [8,19] establishing that bounded independence fools low-degree PTFs.

We conclude this paper by mentioning a few relevant open problems. At the quantitative level, an obvious question is whether the depth of the treeTmust depend exponentially on the degree parameterd. It is plausible to conjecture that such a dependence ond is inherent. It would also be very interesting to devise qualitatively improved “regularity lemmas” for PTFs (e. g., using a more refined notion of regularity) that may result in improved bounds for applications; see [20] for a first important step in this direction.

We believe that our regularity lemma may find other applications. For example, it may be useful in the problem of algorithmically reconstructing a degree-dPTF from its Fourier coefficients of degree at mostd(see [30] for a solution to thed=1 version of this problem). Finally, regarding integer-weight approximations, say that an integer weight approximator to a degree-d PTF is proper if its degree is (at most)d. Clearly, the approximators that we obtain are not proper; constructing proper such approximations is left as an interesting open problem.

References

[1] JAMESASPNES, RICHARDBEIGEL, MERRICKL. FURST,AND STEVENRUDICH: The expressive power of voting polynomials. Combinatorica, 14(2):135–148, 1994. Preliminary version in STOC’91. [doi:10.1007/BF01215346] 28

[2] PERAUSTRIN ANDJOHANHÅSTAD: Randomly supported independence and resistance. SIAM J. Comput., 40(1):1–27, 2011. Preliminary version inSTOC’09. [doi:10.1137/100783534] 33 [3] ALINE BONAMI: Étude des coefficients Fourier des fonctions deLp(G). Annales de l’Institute

Fourier, 20(2):335–402, 1970. EuDML. 32

[4] JEHOSHUABRUCK: Harmonic analysis of polynomial threshold functions. SIAM J. Discrete Math., 3(2):168–177, 1990. [doi:10.1137/0403015] 28

[5] ANTHONY CARBERY AND JAMES WRIGHT: Distributional and Lq norm inequalities for polynomials over convex bodies in Rn. Mathematical Research Letters, 8(3):233–248, 2001. [doi:10.4310/MRL.2001.v8.n3.a1] 33,42

(24)

[7] ILIAS DIAKONIKOLAS, PRAHLADH HARSHA, ADAM KLIVANS, RAGHU MEKA, PRASAD RAGHAVENDRA, ROCCO A. SERVEDIO,ANDLI-YANGTAN: Bounding the average sensitivity and noise sensitivity of polynomial threshold functions. InProc. 42nd STOC, pp. 533–542. ACM Press, 2010. [doi:10.1145/1806689.1806763] 30

[8] ILIASDIAKONIKOLAS, DANIELM. KANE,ANDJELANINELSON: Bounded independence fools degree-2 threshold functions. InProc. 51st FOCS, pp. 11–20. IEEE Comp. Soc. Press, 2010. See also atECCCandarXiv. [doi:10.1109/FOCS.2010.8] 29,30,31,49

[9] ILIAS DIAKONIKOLAS, PRASAD RAGHAVENDRA, ROCCO SERVEDIO, AND LI-YANG TAN: Average sensitivity and noise sensitivity of polynomial threshold functions. Technical report, 2009. To appear in SIAM J. Comput. [arXiv:0909.5011] 28,29,30,33

[10] ILIAS DIAKONIKOLAS ANDROCCOA. SERVEDIO: Improved approximation of linear thresh-old functions. Comput. Complexity, 22(3):623–677, 2013. Preliminary version in CCC’09. [doi:10.1007/s00037-012-0045-5] 28,30

[11] ILIASDIAKONIKOLAS, ROCCOA. SERVEDIO, LI-YANGTAN,ANDANDREWWAN: A regularity lemma, and low-weight approximators, for low-degree polynomial threshold functions. InProc. 25th IEEE Conf. on Computational Complexity (CCC’10), pp. 211–222. IEEE Comp. Soc. Press, 2010. [doi:10.1109/CCC.2010.28] 27,29

[12] IRITDINUR, EHUDFRIEDGUT, GUYKINDLER,AND RYANO’DONNELL: On the Fourier tails of bounded functions over the discrete cube. Israel J. Math., 160(1):389–412, 2007. Preliminary version inSTOC’06. [doi:10.1007/s11856-007-0068-9] 33

[13] CRAIGGOTSMAN AND NATHANLINIAL: Spectral properties of threshold functions. Combinator-ica, 14(1):35–50, 1994. [doi:10.1007/BF01305949] 28

[14] BENGREEN: A Szemerédi-type regularity lemma in abelian groups, with applications. Geometric and Functional Analysis (GAFA), 15(2):340–376, 2005. See also atarXiv. [doi:10.1007/s00039-005-0509-8] 28

[15] LEONARD GROSS: Logarithmic Sobolev inequalities. Amer. J. Math., 97(4):1061–1083, 1975. JSTOR. 32

[16] PRAHLADH HARSHA, ADAM KLIVANS, AND RAGHU MEKA: Bounding the sensitivity of polynomial threshold functions. Theory of Computing, 10(1):1–26, 2014. See also at arXiv. [doi:10.4086/toc.2014.v010a001] 28,30

[17] JEFFKAHN, GILKALAI,ANDNATHANLINIAL: The influence of variables on Boolean functions. InProc. 29th FOCS, pp. 68–80. IEEE Comp. Soc. Press, 1988. [doi:10.1109/SFCS.1988.21923] 32

(25)

[19] DANIELM. KANE:k-independent Gaussians fool polynomial threshold functions. InProc. 26th IEEE Conf. on Computational Complexity (CCC’11), pp. 252–261. IEEE Comp. Soc. Press, 2011. See also atarXiv. [doi:10.1109/CCC.2011.13] 29,30,49

[20] DANIELM. KANE: A structure theorem for poorly anticoncentrated Gaussian chaoses and applica-tions to the study of polynomial threshold funcapplica-tions. InProc. 53rd FOCS, pp. 91–100. IEEE Comp. Soc. Press, 2012. [doi:10.1109/FOCS.2012.52] 49

[21] DANIEL M. KANE: The correct exponent for the Gotsman-Linial conjecture. In Proc. 28th IEEE Conf. on Computational Complexity (CCC’13), pp. 56–64, 2013. See also at arXiv. [doi:10.1109/CCC.2013.15] 29

[22] NATHAN LINIAL, YISHAY MANSOUR, AND NOAM NISAN: Constant depth circuits, Fourier transform, and learnability. J. ACM, 40(3):607–620, 1993. Preliminary version in FOCS’89. [doi:10.1145/174130.174138] 29,34

[23] SHACHAR LOVETT, IDO BEN-ELIEZER, ANDARIEL YADIN: Polynomial threshold functions: Structure, approximation and pseudorandomness.Electron. Colloq. on Comput. Complexity (ECCC), 16:118, 2009. ECCC. See also atarXiv. 30,31

[24] KEVINMATULEF, RYANO’DONNELL, RONITTRUBINFELD,ANDROCCOA. SERVEDIO: Testing halfspaces.SIAM J. Comput., 39(5):2004–2047, 2010. Preliminary version inSODA’09. See also at ECCC, and related paper inProperty Testing. [doi:10.1137/070707890] 28

[25] RAGHUMEKA ANDDAVIDZUCKERMAN: Pseudorandom generators for polynomial threshold functions. SIAM J. Comput., 42(3):1275–1301, 2013. Preliminary version inSTOC’10. See also at arXiv. [doi:10.1137/100811623] 28,29,30,31

[26] ELCHANANMOSSEL, RYANO’DONNELL,ANDKRZYSZTOFOLESZKIEWICZ: Noise stability of functions with low influences: invariance and optimality.Ann. of Math., 171(1):295–341, 2010. Preliminary version inFOCS’05. [doi:10.4007/annals.2010.171.295] 28,33,42

[27] RYAN O’DONNELL:Analysis of Boolean Functions. Cambridge Univ. Press, 2014. See online version here. 28

[28] RYAN O’DONNELL AND ROCCO A. SERVEDIO: Extremal properties of polynomial thresh-old functions. J. Comput. System Sci., 74(3):298–312, 2008. Preliminary version in CCC’03. [doi:10.1016/j.jcss.2007.06.021] 28

[29] RYANO’DONNELL ANDROCCOA. SERVEDIO: New degree bounds for polynomial threshold func-tions.Combinatorica, 30(3):327–358, 2010. Preliminary version inSTOC’03. [doi:10.1007/s00493-010-2173-3] 28

(26)

[31] VLADIMIRV. PODOLSKII: Perceptrons of large weight. Problems of Information Transmission, 45(1):46–53, 2009. Preliminary version inCSR’07. [doi:10.1134/S0032946009010062] 29 [32] MICHAELSAKS: Slicing the hypercube. In KEITHWALKER, editor,Surveys in Combinatorics

1993, pp. 211–256. Cambridge Univ. Press, 1993. [doi:10.1017/CBO9780511662089.009] 28

[33] ROCCOA. SERVEDIO: Every linear threshold function has a low-weight approximator.Comput. Complexity, 16(2):180–209, 2007. Preliminary version inCCC’06. [doi:10.1007/s00037-007-0228-7] 28,29,30,42

[34] ALEXANDER A. SHERSTOV: The intersection of two halfspaces has high threshold degree. In Proc. 50th FOCS, pp. 343–362. IEEE Comp. Soc. Press, 2009. See also at ECCC. [doi:10.1109/FOCS.2009.18] 28

[35] ENDRESZEMERÉDI: Regular partitions of graphs.Colloq. Internat. CNRS: Problèmes combina-toires et théorie des graphes, 260:399–401, 1978. 28

[36] TERENCETAO: Structure and randomness in combinatorics. InProc. 48th FOCS, pp. 3–15. IEEE Comp. Soc. Press, 2007. [doi:10.1109/FOCS.2007.17] 28

[37] RONALD DEWOLF:A Brief Introduction to Fourier Analysis on the Boolean Cube. Number 1 in Graduate Surveys. Theory of Computing Library, 2008. [doi:10.4086/toc.gs.2008.001] 32

AUTHORS

Ilias Diakonikolas School of Informatics University of Edinburgh ilias.d ed.ac.uk

http://www.iliasdiakonikolas.org

Rocco A. Servedio

Department of Computer Science Columbia University

rocco cs.columbia.edu

http://www.cs.columbia.edu/~rocco

Li-Yang Tan

Department of Computer Science Columbia University

liyang cs.columbia.edu

(27)

Andrew Wan

Simons Institute for the Theory of Computing University of California, Berkeley

atw12 eecs.berkeley.edu

http://people.seas.harvard.edu/~atw12

ABOUT THE AUTHORS

ILIASDIAKONIKOLASis an assistant professor in theSchool of Informaticsat theUniversity of Edinburgh. He obtained his Ph. D. fromColumbia Universityadvised byMihalis Yannakakis. He is interested in algorithms, learning, statistics, and game theory.

ROCCO SERVEDIO is an associate professor in the Department of Computer Science at Columbia University. He graduated fromHarvard University, where his Ph. D. was su-pervised byLes Valiant. He is interested in computational learning theory, computational complexity, and property testing, with the study of Boolean functions as an underlying theme tying these topics together. He enjoys spending time with his family and hopes to share a bottle of Cutty Sark with H.M. in the afterlife.

LI-YANG TAN is a Ph. D. student at Columbia University where he is fortunate to be advised byRocco Servedio. His research interests lie in complexity theory, with a focus on discrete Fourier analysis, the complexity of Boolean functions, and computational learning theory.