• No results found

“Resistant” Polynomials and Stronger Lower Bounds for Depth-Three Arithmetical Formulas

N/A
N/A
Protected

Academic year: 2021

Share "“Resistant” Polynomials and Stronger Lower Bounds for Depth-Three Arithmetical Formulas"

Copied!
15
0
0

Loading.... (view fulltext now)

Full text

(1)

“Resistant” Polynomials and Stronger Lower Bounds for Depth-Three Arithmetical Formulas

Maurice J. Jansen University at Buffalo

Kenneth W.Regan University at Buffalo

Abstract

We derive quadratic lower bounds on the ∗-complexity of sum-of-products-of-sums (ΣΠΣ) formulas for classes of polynomials f that have too few partial derivatives for the (near-)quadratic lower bound techniques of Shpilka and Wigderson [SW99, Shp01] (after [NW96]). Our techniques introduce a notion of “resistance” which connotes full-degree behavior of f under any projection to an affine space of sufficiently high dimension. They also show stronger lower bounds over the reals than the complex numbers or over arbitrary fields. Separately, by applying a special form of the Baur-Strassen Derivative Lemma tailored to ΣΠΣ formulas, we obtain sharper bounds on +,∗-complexity than those shown for ∗-complexity in [SW99], most notably for the lowest-degree cases of the polynomials they consider.

1 Introduction

In contrast to the exponential size lower bounds on constant-depth Boolean circuits for majority and related functions [FSS81, Yao85, H˚as86], Shpilka and Wigderson [SW99] observed that in arithmetic complexity, over fields of characteristic zero, super-quadratic lower bounds are not even known for constant-depth formulas. Indeed they are unknown for unbounded fan-in, depth 3 formulas that are sums of products of affine linear functions, which they call ΣΠΣ formulas. These formulas have notable upper-bound power because they can carry out forms of Lagrange interpolation. As they ascribed to M. Ben-Or, ΣΠΣ formulas can compute the elementary symmetric polynomials Snk (defined as the sum of all degree-k monomials in n variables, and analogous to majority and threshold- k Boolean functions) in size O(n2) independent of k. Thus ΣΠΣ formulas present a substantial challenge for lower bounds, as well as being a nice small-scale model to study.

Shpilka and Wigderson defined the multiplicative size of an arithmetical (circuit or) formula φ to be the total fan-in to multiplication gates. We denote this by `(φ), and write `(φ) for the total fan-in to all gates, i.e. + gates as well. The best known lower bound for general arithmetical circuits has remained for thirty years the Ω(n log n) lower bound on ` by the “Degree Method” of Strassen [Str73] (see also [BS82, BCS97]). However, this comes nowhere near the exponential lower bounds conjectured by [Val79] for the permanent and expected by many for other NP-hard arithmetical functions. For polynomials f of total degree nO(1), the method is not even capable of Ω(n1+) circuit lower bounds, not for any  > 0. Hence it is notable that [SW99] achieved lower bounds on `3(f ), where the subscript-3 refers to ΣΠΣ formulas, of Ω(n2) for f = Snk when k = Θ(n), n2−k for Snk with small values of k, and Ω(N2/ polylog(N )) for the determinant, with N = n2. However,

Part of this work by both authors was supported by NSF Grant CCR-9821040. Corresponding author: Dept. of CSE at UB, 201 Bell Hall, Buffalo, NY 14260-2000; (716) 645-3180 x114, fax 645-3464; regan@cse.buffalo.edu.

(2)

their methods have a similar limitation of Ω(n2) for ΣΠΣ formulas. Shpilka [Shp01] got past this only in some further-restricted cases, and also considers a depth-2 model consisting of an arbitrary symmetric function of sums. This barrier provides another reason to study the ΣΠΣ model, in order to understand the obstacles and what might be needed to surpass them.

The techniques in [NW96, SW99, Shp01] all depend on the set of dth-order partial derivatives of f being large. This condition fails for functions such as f (x1, . . . , xn) = xn1 + . . . + xnn, which has only n dth-order partials for any d. We refine the analysis to show the sufficiency of f behaving like a degree-r polynomial on any affine subspace A of sufficiently high dimension (for this f , r = n and any affine line suffices). Our technical condition is that for every polynomial g of total degree at most r− 1 and every such A, there exists a d-th order partial of f − g that is non-constant on A. This enables us to prove an absolutely sharp n2 bound on `3(f ) for this f computed over the real or rational numbers, and a lower bound of n2/2 over any field of characteristic zero. Note the absence of “O, Ω” notation. We prove similar tight bounds for sums of powered monomial blocks, powers of inner-products, and functions depending on `p-norm distance from the origin, and also replicate the bounds of [SW99, Shp01] for symmetric polynomials. Even in the last case, we give an example where our simple existential condition may work deeper than the main question highlighted in [Shp01] on the maximum dimension of subspaces A on which Snk vanishes.

In the second half, we prove lower bounds on +,∗ complexity `3(f ) that are significantly higher (but still sub-quadratic) than those given for `3(f ) in [SW99] when the degree r of f is small. This is done intuitively by exploiting a closed-form application of the Baur-Strassen “Derivative Lemma”

[BS82] to ΣΠΣ formulas, showing that f and all of its n first partial derivatives can be computed with only a constant-factor increase in ` and ` over ΣΠΣ formulas for f .

2 Preliminaries

A ΣΠΣ-formula is an arithmetic formula consisting of four consecutive layers: a layer of inputs, next a layer of addition gates, then a layer of multiplication gates, and finally the output sum gate. The gates have unbounded fan-in from the previous layer (only), and individual wires may carry arbitrary constants from the underlying field. Given a ΣΠΣ-formula for a polynomial p, we can write

p =

s

X

i=1

Mi, where Mi = Πdj=1i li,j and li,j = ci,j,1x1+ ci,j,2x2+ . . . + ci,j,nxn+ ci,j,0.

Here di is the in-degree of the ith multiplication gate (fix any order on the multiplication gates), and ci,j,k is nonzero iff there is a wire from xk to the addition gate computing li,j. Note that li,j is homogeneous of degree 1, i.e. strictly linear , if ci,j,0= 0, and is affine linear otherwise.

2.1 Affine Linear Subspaces and Derivatives

An affine linear subspace A of Fn is a set of the form A = V + w ={ v + w : v ∈ V }, where V is a linear subspace of Fn, and w is a vector in Fn. The dimension of A is defined to be the vector space dimension of V .

Let X = (x1, . . . , xn) be an n-tuple of variables. For any affine subspace A, we can always find a set of variables B⊂ X, and affine linear forms lb in the variables X\ B, for each b ∈ B, such that A is the set of solutions of{xb = lb : b∈ B}. This representation is not unique. The set B is called a base of A. The size |B| always equals the co-dimension of A.

(3)

In the following, whenever we consider an affine linear subspace A, we assume we have fixed some base B of A. Any of our numerical “progress measures” used to prove lower bounds will not depend on the choice of a base. The following notion does depend on the choice of a base:

Definition 2.1 ([SW99]). Let A be an affine linear subspace of Fn, and let f ∈ F [x1, . . . , xn].

Then the restriction of f to A is the polynomial obtained from f by substituting lb for the variable xb for each b∈ B, is denoted by f|A. If W is a set of polynomials, define W|A ={f|A | f ∈ W }.

For linear polynomials l = c1x1 + . . . + cnxn+ c0, we denote lh = c1x1+ . . . + cnxn. For a set S of linear polynomials, Sh ={lh: l∈ S}. Observe that if Sh is an independent set, then the set of common zeroes of S is affine linear of dimension n− |S|.

3 Resistance of polynomials

We state our new definition in the weakest and simplest form that suffices for the lower bounds, although the functions in our applications all meet the stronger condition of Lemma 3.3 below.

Definition 3.1. A polynomial f in variables x1, x2, . . . , xnis (d, r, k)-resistant if for any polynomial g(x1, x2, . . . , xn) of degree at most r− 1, for any affine linear subspace A of co-dimension k, there exists a dth order partial derivative of f− g that is non-constant on A.

For a multiset X of size d with elements taken from {x1, x2, . . . , xn}, we will use the notation

df

∂X to indicate the dth-order derivative with respect to the variables in X. As our applications all have r = deg(f ), we call f simply (d, k)-resistant in this case. Then the case d = 0 says that f itself has full degree on any affine A of co-dimension k, and in most cases corresponds to the nonvanishing condition in [SW99]. We separate our notion from [SW99] in applications and notably in the important case of the elementary symmetric polynomials in Section 3.2 below.

The conclusion of Definition 3.1 is not equivalent to saying that some (d + 1)st-order partial of f − g is non-vanishing on A, because the restriction of this partial on A need not be the same as a first-partial of the restriction of the dth-order partial to A. Moreover, (d, k)-resistance need not imply (d− 1, k)-resistance, even for d, k = 1: consider f(x, y) = xy and A defined by x = 0. We have the following theorem:

Theorem 3.1 Suppose f (x1, x2, . . . , xn) is (d, r, k)-resistant, then

`3(f )≥ rk + 1 d + 1.

Proof. Consider a ΣΠΣ-formula that computes f . Remove all multiplication gates that have degree at most r− 1. Doing so we obtain a ΣΠΣ formula F computing f − g, where g is some polynomial of degree at most r− 1. Say F has s multiplication gates. Write:

f− g =

s

X

i=1

Mi, where Mi = Πdj=1i li,j and li,j = ci,j,1x1+ ci,j,2x2+ . . . + ci,j,nxn+ ci,j,0.

The degree of each multiplication gate in F is at least r, i.e. di ≥ r, for each 1 ≤ i ≤ s. Now select a set S of input linear forms using the following algorithm:

(4)

S =∅

for i = 1 to s do repeat d + 1 times:

if(∃j ∈ {1, 2, . . . , di}) such that Sh∪ {lhi,j} is a set of independent vectors then S = S∪ {li,j}

Let A be the set of common zeroes of the linear forms in S. Since Sh is an independent set, A is affine linear of co-dimension|S| ≤ (d + 1)s.

Claim 3.2 If at a multiplication gate Mi we picked strictly fewer than d + 1 linear forms, then any linear form that was not picked is constant on A.

Proof. Each linear form l that was not picked had lh already in the span of Sh, for the set S built up so far. Hence we can write l = c + lh = c +Pg∈Scggh, for certain scalars cg. Since each gh is constant on A, we conclude l is constant on A.

We conclude that for each multiplication gate at least one of the following holds:

1. (d + 1) input linear forms vanish on A, or

2. fewer than (d + 1) linear form vanishes on A, and all others are constant on A.

For each multiset X of size d with elements from{ x1, x2, . . . , xn}, the dth order partial derivative

d(f − g)

∂X (1)

is in the linear span of the set {

di

Y

j=1

j /∈J

lij : 1≤ i ≤ s, J ⊆ {1, 2, . . . , di}, |J| = d }

This follows from the sum and product rules for derivatives and the fact that a first order derivative of an individual linear form lij is a constant. Consider 1≤ i ≤ s and J ⊆ {1, 2, . . . , di} with |J| = d.

If item 1. holds for the multiplication gate Mi, then

di

Y

j=1

j /∈J

lij (2)

vanishes on A, since there must be one lij that vanishes on A that was not selected, given that

|J| = d. If item 2 holds for Mi, then (2) is constant on A.

Hence, we conclude that (1) is constant on A. Since f is (d, r, k)-resistant, we must have that the co-dimension of A is at least k + 1. Hence (d + 1)s≥ k + 1. Since each gate in F is of degree at least r, we obtain

`3(F) ≥ rk + 1 d + 1.

SinceF was obtained by removing zero or more multiplication gates from a ΣΠΣ-formula computing f , we have proven the statement of the theorem.

To prove lower bounds on resistance, we supply the following lemma that uses the syntactic notion of affine restriction. In certain cases this will be convenient.

(5)

Lemma 3.3 Over fields of characteristic zero, for any d ≤ r, k > 0, and any polynomial f (x1, x2, . . . , xn), if for every affine linear subspace A of co-dimension k, there exists some dth order partial derivative of f such that

deg( ∂df

∂X

!

|A

)≥ r − d + 1

then f is (d, r + 1, k)-resistant.

Proof. Assume for every affine linear subspace A of co-dimension k, there exists some dth order partial derivative derivative of f such that

deg( ∂df

∂X

!

|A

)≥ r − d + 1.

Let g be an arbitrary polynomial of degree r. Then

df − g

∂X

!

|A

= ∂df

∂X −∂dg

∂X

!

|A

= ∂df

∂X

!

|A

− ∂dg

∂X

!

|A

.

The term ∂Xdf

|A

has degree at least r− d + 1, whereas the term∂Xdg

|A

can have degree at most r− d. Hence deg(d∂Xf−g

|A

)≥ r − d + 1 ≥ 1. Since over fields of characteristic zero, syntactically different polynomials define different mappings, we conclude d∂Xf−g must be non-constant on A.

The main difference between Lemma 3.3 and the original Definition 3.1 appears to be the order of quantifying the polynomial “g” of degree r− 1 out front in the former, whereas analogous consider- ations in the lemma universally quantify it later (making a stronger condition). We have not found a neat way to exploit this difference in any prominent application, however.

3.1 Applications

We will now prove lower bounds on the ΣΠΣ-formula size of several explicit polynomials.

3.1.1 Sum of Nth Powers Polynomial

Consider f = Pni=1xni. For this polynomial we have ΣΠ-circuits of size O(n log n). This can be shown to be optimal using Strassen’s degree method. By that method we know any circuit for f has size Ω(n log n). The obvious ΣΠΣ-formula has additive size n2 wires in the top linear layer, and has n multiplication gates of degree n. We prove that this is essentially optimal.

Theorem 3.4 Over fields of characteristic zero, any ΣΠΣ-formula for f =Pni=1xni has multiplica- tive size at least n2/2.

Proof. We will show that f is (1, n− 1)-resistant. The result then follows from Theorem 3.1. Let g be an arbitrary polynomial of degree deg(f )− 1 = n − 1. Letting g1, . . . , gn denote the first order partial derivatives of g, we get that the ith partial derivative of f − g equals

nxni−1− gi(x1, . . . , xn).

(6)

Note that the gi’s are of total degree at most n− 2.

We claim there is no affine linear subspace of dimension greater than zero on which all ∂f /∂xi are constant. Consider an arbitrary affine line, parameterized by a variable t:

xi= ci+ dit,

where ci and di are constants for all i∈ [n], and with at least one di nonzero. Then ∂(f−g)∂x

i restricted to the line is given by

n(ci+ dit)n−1− hi(t),

for some univariate polynomials hi(t) of degree≤ n − 2. Since there must exist some i such that di is nonzero, we know some partial derivative restricted to the affine line is parameterized by a univariate polynomial of degree n− 1, and thus, given that the field is of characteristic zero, is not constant for all t.

In case the underlying field is the real numbers R and n is even, we can improve the above result to prove an absolutely tight n2 lower bound. We start with the following lemma:

Lemma 3.5 Let f = Pni=1xni. Over the real numbers, if n is even, we have that for any affine linear subspace A of dimension k≥ 1, deg(f|A) = n.

Proof. Since f is symmetric we can assume without loss of generality that the following is a base representation of A:

xk+1 = l1(x1, . . . , xk) xk+2 = l2(x1, . . . , xk)

...

xn = ln−k(x1, . . . , xk).

Then

f|A = xn1 + . . . xnk+ ln1 + . . . + lnn−k.

We conclude that f|A must include the term xn1, since each lnj has a non-negative coefficient for the term xn1, since n is even.

Theorem 3.6 Over the real numbers, for even n, any ΣΠΣ-formula for f = Pni=1xni has multi- plicative size at least n2.

Proof. Using Lemma’s 3.3 and 3.5 we conclude that over the real numbers f is (0, n− 1)-resistant.

Hence, by Theorem 3.1 we get that `3(f )≥ deg(f)n1 = n2.

Let us note that f =Pni=1xni is an example of a polynomial that, even for large d, has relatively few, namely only n, partial derivatives. This makes application of the partial derivatives technique of [SW99], which we will describe and extend in the next section, problematic.

(7)

3.1.2 Blocks of Powers

Let the underlying field have characteristic zero, and suppose n = m2 for some m. Consider the “m blocks of m powers” polynomial

f =

m

X

i=1 im

Y

j=(i−1)m+1

xmj .

The straightforward ΣΠΣ-formula for f , that computes each term/block using a multiplication gate of degree n, is of multiplicative size n3/2. We will show this is tight.

Proposition 3.7 The blocks of powers polynomial f defined above is (0, m− 1)-resistant.

Proof. Consider an affine linear space of co-dimension m− 1. For any base B of A, restriction to A consists of substitution of the m− 1 variables in B by linear forms in the remaining variables X/B.

This means there is at least one term/block Bi :=Qimj=(i−1)m+1xmj of f whose variables are disjoint from B. This block Bi remains the same under restriction to A. Also, for every other term/block there is at least one variable that is not assigned to. As a consequence, Bicannot be canceled against terms resulting from restriction to A of other blocks. Hence deg(f|A) = deg(f ). Hence by Lemma 3.3 we have that f is (0, m− 1)-resistant.

Corollary 3.8 For the blocks of powers polynomial f defined above, `3(f )≥ nm = n3/2. Proof. Follows immediately from Theorem 3.1 and Proposition 3.7.

Alternatively, one can observe that by substitution of a variable yi for each variable appearing in the ith block one obtains from a ΣΠΣ-formula F for f a formula for f0 = Pmi=1yni of the same size asF. Theorem 3.4 generalizes to show that `3(f0)≥ 12n3/2, which implies `3(f )≥ 12n3/2. 3.1.3 Polynomials depending on distance to the origin

Over the real numbers, d2(x) = x21+ x22+· · · + x2nis the square of the Euclidean distance of the point x to the origin. Polynomials f of the form q(d2(x)) where q is a single-variable polynomial can be readily seen to have high resistance. Only the leading term of q matters.

For example, consider f = (x21+ x22+· · · + x2n)m. On any affine line L in Rn, deg(f|L) = 2m.

Therefore, by Lemma 3.3, over the reals, f is (0, n− 1)-resistant. Hence by Theorem 3.1 we get that Proposition 3.9 Over the real numbers, `3((x21+ x22+· · · + x2n)m)≥ 2mn.

Observe that by reduction this means that the “mth-power of an inner product polynomial”, defined by g = (x1y1+ x2y2+· · · + xnyn)m, must also have ΣΠΣ-size at least 2mn over the reals numbers.

Results for lp norms, p6= 2, are similar.

3.2 Symmetric Polynomials

The special case of (0, k)-resistance implicitly appears in [Shp01], at least insofar as the sufficient condition of Lemma 3.3 is used for the special case d = 0 in which no derivatives are taken. For the elementary symmetric polynomial Snr of degree r≥ 2 in n variables, Theorem 4.3 of [Shp01] implies (via Lemma 3.3) that Snr is (0, n−n+r2 )-resistant. Shpilka proves for r≥ 2, `3(Snr) = Ω(r(n− r)),

(8)

which can be verified using Theorem 3.1: `3(Snr) ≥ (r + 1)(n − n+r2 ) = Ω(r(n− r)). For r = Ω(n) this yields a tight Ω(n2) bound as observed in [Shp01].

The symmetric polynomials Snkcollectively have the “telescoping” property that every dth-order partial is (zero or) the symmetric polynomial Sn−dk−d on an (n− d)-subset of the variables. Shpilka [Shp01] devolves the analysis into the question, “What is the maximum dimension of a linear subspace of Cn on which Snr vanishes?” In Shpilka’s answer, divisibility properties of r come into play as is witnessed by Theorem 5.9 of [Shp01]. To give an example case of this theorem, one can check that S92 vanishes on the 3-dimensional linear space given by

{(x1, ωx1, ω2x1, x2, ωx2, ω2x2, x3, ωx3, ω2x3) : x1, x2, x3 ∈ C}, where ω can be selected to be any primitive 3rd root of unity. Let

ρ0(f ) = max{k : for any linear space A of codimension k, f|A 6= 0}

Shpilka proved for r > n/2, that ρ0(Srn) = n− r, and for r ≥ 2, that n−r2 < ρ0(Snr) ≤ n − r. For S92 we see via divisibility properties of d that the value for ρ0 can get less than the optimum value, although the n−r2 lower bound suffices for obtaining the above mentioned `3(Snr) = Ω(r(n− r)) lower bound. We have some indication from computer runs using the polynomial algebra package Singular [GPS05] that the “unruly” behaviour seen for ρ0 because of divisibility properties for r ≤ n/2 can be made to go away by considering the following notion:

ρ1(f ) = max{k : for any linear space A of codimension k, there exists i,

∂f

∂xi



|A

6= 0}

One can still see from the fact that Snr is homogeneous and using Lemma 3.3 and Theorem 3.1 that

`3(Snr)≥ r·(ρ1(S2rn)+1). Establishing the exact value of ρ1(Snr), which we conjecture to be n + 1− r at least over the rationals, seems at least to simplify obtaining the `3(Snr) = Ω(d(n− d)) lower bound.

In any case we can prove the following relation between the two notions:

Lemma 3.10 For r≥ 2, ρ1(Sn+1r+1)≥ ρ0(Snr−1).

Proof. Suppose A is a linear space of codimension k = ρ1(Sn+1r+1) + 1 over {x1, x2, . . . , xn+1} such that all partials of Sn+1r+1 vanish on A, i.e. for every i,

Snr(x1, . . . , xi, . . . , xn+1)|A = 0. (3) Here we denote xi to indicate that the variable xi is excluded. If k = n + 1, then ρ1(Sn+1r+1) = n, so the statement of the lemma is trivial. Otherwise, some variable is not being substituted for by the restriction “|A”. By symmetry we can assume wlog. that this is xn+1. Hence we can write Equation (3) as:

xn+1Sn−1r−1(x1, . . . , xi, . . . , xn)|A + Snr−1(x1, . . . , xi, . . . , xn)|A = 0, for all i, and

Snr(x1, . . . , xn)|A = 0.

Adding the first n equations and subtracting the last n− r times one obtains 0 = xn+1

n

X

i=1

Sn−1r−1(x1, . . . , xi, . . . , xn)|A

= (n− r)xn+1Snr−1(x1, . . . , xn)|A

(9)

In other words Snr−1(x1, . . . , xn)|A = 0. Simplify the substitution corresponding to A by taking xn+1 = 0 in the defining k equations. We obtain a set of k substitutions on k variables from {x1, x2, . . . , xn}. This defines a linear space of codimension k on which Snr−1vanishes. So ρ0(Snr−1) <

k = ρ1(Sn+1r+1) + 1.

For another example, S63 is made to vanish at dimension 3 not by any subspace that zeroes out 3 co-ordinates but rather by A = { (u, −u, w, −w, y, −y) : u, w, y ∈ C }. Now add a new variable t in defining f = S74. The notable fact is that f 1-resists the dimension-3 subspace A0 obtained by adjoining t = 0 to the equations for A, upon existentially choosing to derive by a variable other than t, such as u. All terms of ∂f /∂u that include t vanish, leaving 10 terms in the variables v, w, x, y, z.

Of these, 4 pairs cancel under the equations x = −w, z = −y, but the leftover vwx + vyz part equates to uw2+ uy2, which not only doesn’t cancel but also dominates any contribution from the lower-degree g. Gr¨obner basis runs using Singular imply that S74 is (1, 4)-resistant over C as well as the rationals and reals, though we have not yet made this a consequence of a general resistance theorem for all Snr.

Hence our (1, k)-resistance analysis for S74 is not impacted by the achieved upper bound of 3 represented by A. Admittedly the symmetric polynomials f have O(n2) upper bounds on `3(f ), so our distinction in this case does not directly help surmount the quadratic barrier. But it does show promise of making progress in our algebraic understanding of polynomials in general.

4 Bounds for +,*-Complexity

The partial derivatives technique of [SW99] ignores the wires of the formula present in the first layer.

In the following we show how to account for them. As a result we get a sharpening of several lower bounds, though not on `3 but on total formula size. We refine the [SW99] result for *-complexity:

Theorem 4.1 ([SW99]) Let f ∈ F [x1, . . . , xn]. Suppose for integers d, D, κ it holds that for every affine subspace A of co-dimension κ, dim(∂d(f )|A) > D. Then

`3(f )≥ min(κ2 d, D

κ+d d

);

—to our result for +,*-complexity:

Theorem 4.2 (new) Let f ∈ F [x1, . . . , xn]. Suppose for integers d, D, κ it holds that for every affine subspace A of co-dimension κ, Pni=1dim[∂d(∂x∂f

i)|A] > D. Then

`3(f )≥ min( κ2 d + 2, D

κ+d d

).

Comparing the two theorems, we see that the result by Shpilka and Wigderson provides a lower bound on multiplicative complexity, while our result gives a lower bound on the total number of wires. We do get an extra “factor n” of additions with the Pni=1dim[∂d(∂x∂f

i)|A] > D condition compared to just dim(∂d(f )|A) > D. Potentially this can lead to improved lower bounds on the total size of the formula, better than one would be able to infer from the lower bound on multiplicative complexity of Theorem 4.1 alone. We shall see that we can indeed get such kinds of improvements in the applications section below.

(10)

We employ the following suite of concepts and lemmas from [SW99] directly.

Definition 4.1 ([SW99]). For f ∈ F [x1, . . . , xn], let ∂d(f ) be the set of all dth order formal partial derivatives of f w.r.t. variables from {x1, . . . , xn}.

For a set of polynomials A ={f1, . . . , ft}, let span(A) = {Pti=1cifi | ci∈ F }, i.e., span(A) is the linear span of A. We write dim[A] as shorthand for dim[span(A)]. We have the following elementary sub-additivity property for the measure dim[∂d(f )].

Proposition 4.3 ([SW99]) For f1, f2 ∈ F [x1, . . . , xn] and constants c1, c2∈ F , dim[∂d(c1f1+ c2f2)]≤ dim[∂d(f1)] + dim[∂d(f2)].

One also needs to bound the growth of dim[∂d(f )] in case of multiplication. For multiplication of affine linear forms, we have the following two bounds.

Proposition 4.4 ([SW99]) Let M = Πmi=1li, where each li is affine linear. Then dim[∂d(M )]≤ m

d

! .

Proposition 4.5 ([SW99]) Let M be a product gate with dim[Mh] = m, then for any d, dim[∂d(M )]≤ m + d

d

! .

Note that for polynomials f1, . . . , fs, span(f1, . . . , fs)|A = span(f1|A, . . . , fs|A), and that dim[W|A] ≤ dim[W ]. Now we modify Proposition 4.3 a little to get a result implicitly used by Shpilka and Wigderson in their arguments.

Proposition 4.6 (cf. [SW99]) For f1, f2 ∈ F [x1, . . . , xn] and constants c1, c2 ∈ F , and affine linear subspace A, we have that dim[∂d(c1f1+ c2f2)|A]≤ dim[∂d(f1)|A] + dim[∂d(f2)|A].

Finally, we require:

Lemma 4.7 ([SW99]) For every n, κ, d, and every affine subspace A of co-dimension κ, we have that

dim[∂d(Sn2d)|A]≥ n− κ d

! .

Now we can prove our sideways improvement of Shpilka and Wigderson’s main Theorem 3.1 [SW99].

Proof of Theorem 4.2. Consider a minimum-size ΣΠΣ-formula for f with multiplication gates M1, . . . , Ms. We have that

f =

s

X

i=1

Mi, where for 1≤ i ≤ s, Mi = Πdj=1i li,j and li,j = ci,j,1x1+ ci,j,2x2+ . . . + ci,j,nxn+ ci,j,0,

(11)

for certain constants ci,j,k ∈ F . Computing the partial derivative of f w.r.t. variable xk we get

∂f

∂xk =

s

X

i=1 di

X

j=1

ci,j,kMi

li,j. (4)

Let

S ={i : dim[Mih]≥ κ}.

If |S| ≥ d+2κ , then `3(f ) ≥ d+2κ2 . Suppose |S| < d+2κ . If S = ∅, then let A be an arbitrary affine subspace of co-dimension κ. Otherwise, construct an affine space A as follows. Since|S|(d + 2) < κ, and since for each j ∈ S, dim[Mih]≥ κ, it is possible to pick d + 2 input linear forms lj,1, . . . , lj,d+2

of each multiplication gate Mj with j∈ S, such that {lj,1h , . . . , lj,d+2h |j ∈ S} is a set of |S|(d + 2) < κ independent homogeneous linear forms. Define

A ={x : li,j(x) = 0, for any i∈ S, j ∈ [d + 2]}.

We have that the co-dimension of A is at most κ. W.l.o.g. assume the co-dimension of A equals κ.

For each i∈ S, d+2 linear forms of Mi vanish on A. This implies that dim[∂d(Mi

li,j)|A] = 0.

for any i∈ S. For any i /∈ S, by Proposition 4.5, dim[∂d(Mi

li,j

)|A] < κ + d d

! .

Let Dk= dim[∂d(∂x∂f

k)|A]. By Proposition 4.6 and equation (4), DkX

i /∈S

X

j

ci,j,k6=0

dim[∂d(Mi

li,j)|A].

Hence there must be at least Dk

(κ+dd ) terms on the r.h.s., i.e. there are at least that many wires from xk to gates in the first layer. Hence in total the number of wires to the first layer is at least Pn

i=1 Di

(κ+dd ) > D (κ+dd ).

We can apply a similar idea to adapt the other main theorem from [SW99]:

Theorem 4.8 ([SW99]) Let f ∈ F [x1, . . . , xn]. Suppose for integers d, D, κ it holds that for every affine subspace A of co-dimension κ, dim(∂d(f|A)) > D. Then for every m≥ 2,

`3(f )≥ min(κm, D

m d

).

Theorem 4.9 (new) Let f ∈ F [x1, . . . , xn]. Suppose for integers d, D, κ with d ≥ 1, it holds that for every affine subspace A of co-dimension κ, Pni=1dim[∂d(∂x∂f

i|A)] > D. Then for every m≥ 2,

`3(f )≥ min(1

2κm, D

m−1 d

).

(12)

Proof. Consider a minimum size ΣΠΣ-formula for f with multiplication gates M1, . . . , Ms. We have that

f =

s

X

i=1

Mi, where for 1≤ i ≤ s, Mi= Πdj=1i li,j, with li,j = ci,j,1x1+ ci,j,2x2+ . . . + ci,j,nxn+ ci,j,0.

If there are κ2 multiplication gates Mi of degree greater than m then already `3(f ) > 12κm. So suppose the number t of multiplication gates of degree greater than m is less than κ2. Wlog. assume these gates are given by

M1, M2, . . . , Mt.

For i = 1, 2, . . . , t, pick two input linear forms li,1, li,2 of Mi, such that for the total collection l1,1, l1,2, . . . , li,1, li,2 we have that the strictly linear parts l1,1h , lh1,2, . . . , lhi,1, lhi,2 are independent. It might be that at some i ≤ t, we cannot find any li,1 or li,2 with li,1h or li,2h independent from the previously collected linear forms. In this case, we just pick li,1 if that one is still independent, and skip to the next index i. If we can’t even find li,1 for which li,1 is independent, we pick no linear form and proceed to the next i.

Let A be the zero set of all the collected input linear forms. Then A has co-dimension at most κ. Wlog. we may assume that the co-dimension of A equals κ. Observe that

∂f

∂xk |A =

s

X

i=1 di

X

j=1

ci,j,k(Mi li,j

)|A. (5)

Now for a multiplication gate Mi of degree ≥ m, there are three cases: either we picked two input linear forms of Mi, or we picked just one, or none at all. In the first case,

(Mi li,j

)|A = 0

in the r.h.s. of (5), for all i, j. In the second and third case, we know that for every input l of Mi

that was not picked, lh is a linear combination of lhi’s for li’s that were picked. Hence lh|

A =

r

X

i=1

ci(lhi |

A) = constant.

As a consequence, (Ml i

i,j)|A = constant in the r.h.s. of (5), for all i, j. Since d≥ 1, in either three cases, we obtain that ∂d(Ml i

i,j|A

) = 0. For multiplication gates Miof degree at most m, Proposition 4.4 gives us that dim[∂d((Ml i

i,j)|A)]≤ m−1d . Let Dk = dim[∂d(∂x∂f

k|A

)]. By Proposition 4.3, we see there are at least Dk/ m−1d terms in (5). This implies that there are at least that many wires fanning out of xk. Adding up for all variables, we conclude that `3(f )≥ D/ m−1d .

4.1 Some Applications

In [SW99] it was proved that for d ≤ log n, `3(Sn2d) = Ω(n

2d d+2

d ). Note for d = 2, this lower bound is only Ω(n). We can apply Theorem 4.2 to prove the following stronger lower bound on the total formula size of Sn2d. In particular for d = 2, we get an Ω(n43) bound.

(13)

Theorem 4.10 For 1≤ d ≤ log n, `3(Sn2d) = Ω(n

2d d+1

d ).

Proof. For any affine subspace A of co-dimension κ and d≥ 2 we have that

n

X

i=1

dim[∂d−1(∂Sn2d

∂xi )|A]≥ dim[∂d(Sn2d)|A]≥ n− κ d

! . The latter inequality follows from Lemma 4.7. Applying Theorem 4.2 we get that

`3(Sn2d)≥ min( κ2 d + 1,

n−κ d



κ+d−1 d−1

) = min( κ2 d + 1,

n−κ d



κ+d d

 κ + d

d ). (6)

Set κ = 19nd+1d . Then we have that

n−κ d



κ+d d

 κ + d

d ≥ (n− κ

κ + d)dκ + d

d ≥ ( 8/9n 2/9nd+1d

)dκ + d

d = 4dnd+1d κ + d d ≥ 4d

9dnd+12d ≥ nd+12d .

Hence (2) is at least min( n

2d d+1

81(d+1), nd+12d ) = Ω(n

2d d+1

d ).

Corollary 4.11 `3(Sn4) = Ω(n4/3).

Shpilka and Wigderson defined the “product-of-inner-products” polynomial over 2d variable sets of size n (superscript indicate different variables, each variable has degree one) by

P IPnd=

d

Y

i=1 n

X

j=1

xijyji.

Theorem 4.12 For any constant d > 0, `3(P IPnd) = Ω(nd+12d ).

Proof. Let f = P IPnd. Essentially we have that

∂f

∂xij = yijP IPnd−1,

where the P IPnd−1 must be chosen on the appropriate variable set. Let A be an arbitrary affine linear subspace of co-dimension κ. Then

d

X

i=1 n

X

j=1

dim[∂d−1( ∂f

∂xij|A)] =

d

X

i=1 n

X

j=1

dim[∂d−1(yijP IPnd−1|A)]

≥ (dn − κ) dim[∂d−1(P IPnd−1|A)]

The last inequality follows because at least dn− κ of the y-variables are not assigned to with the restriction to A. From Lemma 4.9 in [SW99] one gets

dim[∂d−1(P IPnd−1|A)≥ nd−1− 22d−1κnd−2.

(14)

Using Theorem 4.9 we get

`3(f )≥ min(κ2

2 ,(dn− κ)(nd−1− 22d−1κnd−2)

κ−1 d−1

 ).

Taking κ = nd+1d , one gets for constant d that

`3(P IPnd) = Ω(nd+12d ).

For comparison, in [SW99] one gets `3(P IPnd) = Ω(nd+22d ).

5 Conclusion—Possible Further Tools

We have taken some further steps after [SW99], obtaining absolutely tight (rather than asymp- totically so) multiplicative size lower bounds for some natural functions, and obtaining somewhat improved bounds on +,∗-size for low-degree symmetric and product-of-inner-product polynomials.

However, these may if anything enhance the feeling from [SW99, Shp01] that the concepts being employed may go no further than quadratic for lower bounds. One cannot after all say that a func- tion f (x1, . . . , xn) is non-vanishing on an affine-linear space of co-dimension more than n. The quest then is for a mathematical invariant that scales beyond linear with the number of degree-d-or-higher multiplication gates in the formula.

Notably absent in current lower bound techniques for ΣΠΣ-formulas are random restriction type arguments, whereas many of the results for Boolean constant depth circuits of [Ajt83, FSS81, Yao85, H˚as89] proceed using random restrictions. Note that Raz managed to use random restrictions in conjunction with a partial derivatives based technique in his work on multilinear arithmetical formulas [Raz04a, Raz04b]. We speculate that the weaker existential requirement in our resistance notion may help it adapt to random-restriction scenarios, although plenitude of kth-partials may still be needed to ensure the function does not collapse too far. In any event, the search for stronger mathematical techniques to prove exponential lower bounds, even in the self-contained ΣΠΣ formula case, continues.

Acknowledgments We thank Avi Wigderson for comments on a very early version of this work, and referees of a later version for very helpful criticism.

References

[Ajt83] M. Ajtai. Σ11 formulae on finite structures. Annals of Pure and Applied Logic, 24:1–48, 1983.

[BCS97] P. B¨urgisser, M. Clausen, and M.A. Shokrollahi. Algebraic Complexity Theory. Springer Verlag, 1997.

[BS82] W. Baur and V. Strassen. The complexity of partial derivatives. Theor. Comp. Sci., 22:317–330, 1982.

[FSS81] M. Furst, J. Saxe, and M. Sipser. Parity, circuits, and the polynomial-time hierarchy. In Proc. 22nd Annual IEEE Symposium on Foundations of Computer Science, pages 260–270, 1981.

(15)

[GPS05] G.-M. Greuel, G. Pfister, and H. Sch¨onemann. Singular 3.0. A Computer Algebra System for Polynomial Computations, Centre for Computer Algebra, University of Kaiserslautern, 2005. http://www.singular.uni-kl.de.

[H˚as86] J. H˚astad. Almost optimal lower bounds for small-depth circuits. In Proc. 18th Annual ACM Symposium on the Theory of Computing, pages 6–20, 1986.

[H˚as89] J. H˚astad. Almost optimal lower bounds for small-depth circuits. In S. Micali, editor, Randomness and Computation, volume 5 of Advances in Computing Research, pages 143–

170. JAI Press, Greenwich, CT, USA, 1989.

[NW96] N. Nisan and A. Wigderson. Lower bounds on arithmetic circuits via partial derivatives.

Computational Complexity, 6:217–234, 1996.

[Raz04a] R. Raz. Multilinear formulas for permanent and determinant are of super-polynomial size.

In Proc. 36th Annual ACM Symposium on the Theory of Computing, 2004. to appear; also ECCC TR03-067.

[Raz04b] R. Raz. Multilinear NC1 6= multilinear NC2. In Proc. 45th Annual IEEE Symposium on Foundations of Computer Science, pages 344–351, 2004.

[Shp01] A. Shpilka. Affine projections of symmetric polynomials. In Proc. 16th Annual IEEE Conference on Computational Complexity, pages 160–171, 2001.

[Str73] V. Strassen. Berechnung und Programm II. Acta Informatica, 2:64–79, 1973.

[SW99] A. Shpilka and A. Wigderson. Depth-3 arithmetic formulae over fields of characteristic zero. Technical Report 23, ECCC, 1999.

[Val79] L. Valiant. The complexity of computing the permanent. Theor. Comp. Sci., 8:189–201, 1979.

[Yao85] A. Yao. Separating the polynomial-time hierarchy by oracles. In Proc. 26th Annual IEEE Symposium on Foundations of Computer Science, pages 1–10, 1985.

References

Related documents

Compared to the initial investment of a SMEs, enterprises with rich investment experience are more likely to seek the existing large market, much cheap labor and rich

This level is for specifying the input type, selecting the control method, control period, setting direct/reverse action and alarm type.. You can move to the advanced function

To delay cooking up to 99 minutes and 99 seconds, touch either MICRO COOK, TEMP COOK/HOLD or AUTO ROAST and enter cook time, temperature or code.. Touch KITCHEN TIMER and enter

It is a consortium led by IIT-Kanpur and other members includes, the commonwealth of learning (col), the Indian institute of management-Calcutta (IIMC) and the

Keywords: Hepatic angiomyolipoma; Diffusion-weighted MRI; Apparent diffusion coefficient (ADC); Chemical shift imaging; Early venous

wenn ich hier sehe, dass junge Menschen ausge- bildet werden oder Felder auf eine Art und Weise bewirtschaftet werden, dass es mehr Erträge gibt.. Ich empfinde das als sehr

111 In 1976 the Codex Alimentarius Commission issued a Statement on Infant Feeding which said, “it is necessary to encourage breastfeeding by all possible means in order to

Indeed, bio- logical control scientists may wish to become even more proactive about cooperating in classical biological con- trol of citrus pests and begin sharing information