“Resistant” Polynomials and Stronger Lower Bounds for Depth-Three Arithmetical Formulas

(1)

“Resistant” Polynomials and Stronger Lower Bounds for Depth-Three Arithmetical Formulas

Maurice J. Jansen University at Buffalo

Kenneth W.Regan^∗ University at Buffalo

Abstract

We derive quadratic lower bounds on the ∗-complexity of sum-of-products-of-sums (ΣΠΣ) formulas for classes of polynomials f that have too few partial derivatives for the (near-)quadratic lower bound techniques of Shpilka and Wigderson [SW99, Shp01] (after [NW96]). Our techniques introduce a notion of “resistance” which connotes full-degree behavior of f under any projection to an affine space of sufficiently high dimension. They also show stronger lower bounds over the reals than the complex numbers or over arbitrary fields. Separately, by applying a special form of the Baur-Strassen Derivative Lemma tailored to ΣΠΣ formulas, we obtain sharper bounds on +,∗-complexity than those shown for ∗-complexity in [SW99], most notably for the lowest-degree cases of the polynomials they consider.

1 Introduction

In contrast to the exponential size lower bounds on constant-depth Boolean circuits for majority and related functions [FSS81, Yao85, H˚as86], Shpilka and Wigderson [SW99] observed that in arithmetic complexity, over fields of characteristic zero, super-quadratic lower bounds are not even known for constant-depth formulas. Indeed they are unknown for unbounded fan-in, depth 3 formulas that are sums of products of affine linear functions, which they call ΣΠΣ formulas. These formulas have notable upper-bound power because they can carry out forms of Lagrange interpolation. As they ascribed to M. Ben-Or, ΣΠΣ formulas can compute the elementary symmetric polynomials S_n^k (defined as the sum of all degree-k monomials in n variables, and analogous to majority and threshold- k Boolean functions) in size O(n²) independent of k. Thus ΣΠΣ formulas present a substantial challenge for lower bounds, as well as being a nice small-scale model to study.

Shpilka and Wigderson defined the multiplicative size of an arithmetical (circuit or) formula φ to be the total fan-in to multiplication gates. We denote this by `^∗(φ), and write `(φ) for the total fan-in to all gates, i.e. + gates as well. The best known lower bound for general arithmetical circuits has remained for thirty years the Ω(n log n) lower bound on `^∗ by the “Degree Method” of Strassen [Str73] (see also [BS82, BCS97]). However, this comes nowhere near the exponential lower bounds conjectured by [Val79] for the permanent and expected by many for other NP-hard arithmetical functions. For polynomials f of total degree n^O(1), the method is not even capable of Ω(n¹⁺) circuit lower bounds, not for any > 0. Hence it is notable that [SW99] achieved lower bounds on `^∗₃(f ), where the subscript-3 refers to ΣΠΣ formulas, of Ω(n²) for f = S_n^k when k = Θ(n), n²⁻^k for S_n^k with small values of k, and Ω(N²/ polylog(N )) for the determinant, with N = n². However,

∗Part of this work by both authors was supported by NSF Grant CCR-9821040. Corresponding author: Dept. of CSE at UB, 201 Bell Hall, Buffalo, NY 14260-2000; (716) 645-3180 x114, fax 645-3464; regan@cse.buffalo.edu.

(2)

their methods have a similar limitation of Ω(n²) for ΣΠΣ formulas. Shpilka [Shp01] got past this only in some further-restricted cases, and also considers a depth-2 model consisting of an arbitrary symmetric function of sums. This barrier provides another reason to study the ΣΠΣ model, in order to understand the obstacles and what might be needed to surpass them.

The techniques in [NW96, SW99, Shp01] all depend on the set of dth-order partial derivatives of f being large. This condition fails for functions such as f (x₁, . . . , x_n) = xⁿ₁ + . . . + xⁿ_n, which has only n dth-order partials for any d. We refine the analysis to show the sufficiency of f behaving like a degree-r polynomial on any affine subspace A of sufficiently high dimension (for this f , r = n and any affine line suffices). Our technical condition is that for every polynomial g of total degree at most r− 1 and every such A, there exists a d-th order partial of f − g that is non-constant on A. This enables us to prove an absolutely sharp n² bound on `^∗₃(f ) for this f computed over the real or rational numbers, and a lower bound of n²/2 over any field of characteristic zero. Note the absence of “O, Ω” notation. We prove similar tight bounds for sums of powered monomial blocks, powers of inner-products, and functions depending on `p-norm distance from the origin, and also replicate the bounds of [SW99, Shp01] for symmetric polynomials. Even in the last case, we give an example where our simple existential condition may work deeper than the main question highlighted in [Shp01] on the maximum dimension of subspaces A on which S_n^k vanishes.

In the second half, we prove lower bounds on +,∗ complexity `3(f ) that are significantly higher (but still sub-quadratic) than those given for `^∗₃(f ) in [SW99] when the degree r of f is small. This is done intuitively by exploiting a closed-form application of the Baur-Strassen “Derivative Lemma”

[BS82] to ΣΠΣ formulas, showing that f and all of its n first partial derivatives can be computed with only a constant-factor increase in ` and `^∗ over ΣΠΣ formulas for f .

2 Preliminaries

A ΣΠΣ-formula is an arithmetic formula consisting of four consecutive layers: a layer of inputs, next a layer of addition gates, then a layer of multiplication gates, and finally the output sum gate. The gates have unbounded fan-in from the previous layer (only), and individual wires may carry arbitrary constants from the underlying field. Given a ΣΠΣ-formula for a polynomial p, we can write

p =

s

X

i=1

M_i, where M_i = Π^d_j=1ⁱ l_i,j and l_i,j = c_i,j,1x₁+ c_i,j,2x₂+ . . . + c_i,j,nx_n+ c_i,j,0.

Here d_i is the in-degree of the ith multiplication gate (fix any order on the multiplication gates), and ci,j,k is nonzero iff there is a wire from xk to the addition gate computing li,j. Note that li,j is homogeneous of degree 1, i.e. strictly linear , if ci,j,0= 0, and is affine linear otherwise.

2.1 Affine Linear Subspaces and Derivatives

An affine linear subspace A of Fⁿ is a set of the form A = V + w ={ v + w : v ∈ V }, where V is a linear subspace of Fⁿ, and w is a vector in Fⁿ. The dimension of A is defined to be the vector space dimension of V .

Let X = (x1, . . . , xn) be an n-tuple of variables. For any affine subspace A, we can always find a set of variables B⊂ X, and affine linear forms lb in the variables X\ B, for each b ∈ B, such that A is the set of solutions of{xb = l_b : b∈ B}. This representation is not unique. The set B is called a base of A. The size |B| always equals the co-dimension of A.

(3)

In the following, whenever we consider an affine linear subspace A, we assume we have fixed some base B of A. Any of our numerical “progress measures” used to prove lower bounds will not depend on the choice of a base. The following notion does depend on the choice of a base:

Definition 2.1 ([SW99]). Let A be an affine linear subspace of Fⁿ, and let f ∈ F [x1, . . . , xn].

Then the restriction of f to A is the polynomial obtained from f by substituting l_b for the variable x_b for each b∈ B, is denoted by f|A. If W is a set of polynomials, define W_|_A ={f|A | f ∈ W }.

For linear polynomials l = c1x1 + . . . + cnxn+ c0, we denote l^h = c1x1+ . . . + cnxn. For a set S of linear polynomials, S^h ={l^h: l∈ S}. Observe that if S^h is an independent set, then the set of common zeroes of S is affine linear of dimension n− |S|.

3 Resistance of polynomials

We state our new definition in the weakest and simplest form that suffices for the lower bounds, although the functions in our applications all meet the stronger condition of Lemma 3.3 below.

Definition 3.1. A polynomial f in variables x1, x2, . . . , xnis (d, r, k)-resistant if for any polynomial g(x₁, x₂, . . . , x_n) of degree at most r− 1, for any affine linear subspace A of co-dimension k, there exists a dth order partial derivative of f− g that is non-constant on A.

For a multiset X of size d with elements taken from {x1, x₂, . . . , x_n}, we will use the notation

∂^df

∂X to indicate the dth-order derivative with respect to the variables in X. As our applications all have r = deg(f ), we call f simply (d, k)-resistant in this case. Then the case d = 0 says that f itself has full degree on any affine A of co-dimension k, and in most cases corresponds to the nonvanishing condition in [SW99]. We separate our notion from [SW99] in applications and notably in the important case of the elementary symmetric polynomials in Section 3.2 below.

The conclusion of Definition 3.1 is not equivalent to saying that some (d + 1)st-order partial of f − g is non-vanishing on A, because the restriction of this partial on A need not be the same as a first-partial of the restriction of the dth-order partial to A. Moreover, (d, k)-resistance need not imply (d− 1, k)-resistance, even for d, k = 1: consider f(x, y) = xy and A defined by x = 0. We have the following theorem:

Theorem 3.1 Suppose f (x1, x2, . . . , xn) is (d, r, k)-resistant, then

`^∗₃(f )≥ rk + 1 d + 1.

Proof. Consider a ΣΠΣ-formula that computes f . Remove all multiplication gates that have degree at most r− 1. Doing so we obtain a ΣΠΣ formula F computing f − g, where g is some polynomial of degree at most r− 1. Say F has s multiplication gates. Write:

f− g =

s

X

i=1

M_i, where M_i = Π^d_j=1ⁱ l_i,j and l_i,j = c_i,j,1x₁+ c_i,j,2x₂+ . . . + c_i,j,nx_n+ c_i,j,0.

The degree of each multiplication gate in F is at least r, i.e. di ≥ r, for each 1 ≤ i ≤ s. Now select a set S of input linear forms using the following algorithm:

(4)

S =∅

for i = 1 to s do repeat d + 1 times:

if(∃j ∈ {1, 2, . . . , di}) such that S^h∪ {l^hi,j} is a set of independent vectors then S = S∪ {li,j}

Let A be the set of common zeroes of the linear forms in S. Since S^h is an independent set, A is affine linear of co-dimension|S| ≤ (d + 1)s.

Claim 3.2 If at a multiplication gate M_i we picked strictly fewer than d + 1 linear forms, then any linear form that was not picked is constant on A.

Proof. Each linear form l that was not picked had l^h already in the span of S^h, for the set S built up so far. Hence we can write l = c + l^h = c +^P_g∈Sc_gg^h, for certain scalars c_g. Since each g^h is constant on A, we conclude l is constant on A.

We conclude that for each multiplication gate at least one of the following holds:

1. (d + 1) input linear forms vanish on A, or

2. fewer than (d + 1) linear form vanishes on A, and all others are constant on A.

For each multiset X of size d with elements from{ x1, x₂, . . . , x_n}, the dth order partial derivative

∂^d(f − g)

∂X (1)

is in the linear span of the set {

di

Y

j=1

j /∈J

lij : 1≤ i ≤ s, J ⊆ {1, 2, . . . , di}, |J| = d }

This follows from the sum and product rules for derivatives and the fact that a first order derivative of an individual linear form l_ij is a constant. Consider 1≤ i ≤ s and J ⊆ {1, 2, . . . , di} with |J| = d.

If item 1. holds for the multiplication gate Mi, then

di

Y

j=1

j /∈J

lij (2)

vanishes on A, since there must be one lij that vanishes on A that was not selected, given that

|J| = d. If item 2 holds for Mi, then (2) is constant on A.

Hence, we conclude that (1) is constant on A. Since f is (d, r, k)-resistant, we must have that the co-dimension of A is at least k + 1. Hence (d + 1)s≥ k + 1. Since each gate in F is of degree at least r, we obtain

`^∗₃(F) ≥ rk + 1 d + 1.

SinceF was obtained by removing zero or more multiplication gates from a ΣΠΣ-formula computing f , we have proven the statement of the theorem.

To prove lower bounds on resistance, we supply the following lemma that uses the syntactic notion of affine restriction. In certain cases this will be convenient.

(5)

Lemma 3.3 Over fields of characteristic zero, for any d ≤ r, k > 0, and any polynomial f (x₁, x₂, . . . , x_n), if for every affine linear subspace A of co-dimension k, there exists some dth order partial derivative of f such that

deg( ∂^df

∂X

!

|A

)≥ r − d + 1

then f is (d, r + 1, k)-resistant.

Proof. Assume for every affine linear subspace A of co-dimension k, there exists some dth order partial derivative derivative of f such that

deg( ∂^df

∂X

!

|A

)≥ r − d + 1.

Let g be an arbitrary polynomial of degree r. Then

∂^df − g

∂X

!

|A

= ∂^df

∂X −∂^dg

∂X

!

|A

= ∂^df

∂X

!

|A

− ∂^dg

∂X

!

|A

.

The term ^∂_∂X^d^f

|A

has degree at least r− d + 1, whereas the term^∂_∂X^d^g

|A

can have degree at most r− d. Hence deg(^∂^d_∂X^f−g

|A

)≥ r − d + 1 ≥ 1. Since over fields of characteristic zero, syntactically different polynomials define different mappings, we conclude ^∂^d_∂X^f^−g must be non-constant on A.

The main difference between Lemma 3.3 and the original Definition 3.1 appears to be the order of quantifying the polynomial “g” of degree r− 1 out front in the former, whereas analogous consider- ations in the lemma universally quantify it later (making a stronger condition). We have not found a neat way to exploit this difference in any prominent application, however.

3.1 Applications

We will now prove lower bounds on the ΣΠΣ-formula size of several explicit polynomials.

3.1.1 Sum of Nth Powers Polynomial

Consider f = ^Pⁿ_i=1xⁿ_i. For this polynomial we have ΣΠ-circuits of size O(n log n). This can be shown to be optimal using Strassen’s degree method. By that method we know any circuit for f has size Ω(n log n). The obvious ΣΠΣ-formula has additive size n² wires in the top linear layer, and has n multiplication gates of degree n. We prove that this is essentially optimal.

Theorem 3.4 Over fields of characteristic zero, any ΣΠΣ-formula for f =^Pⁿ_i=1xⁿ_i has multiplicative size at least n²/2.

Proof. We will show that f is (1, n− 1)-resistant. The result then follows from Theorem 3.1. Let g be an arbitrary polynomial of degree deg(f )− 1 = n − 1. Letting g1, . . . , g_n denote the first order partial derivatives of g, we get that the ith partial derivative of f − g equals

nxⁿ_i⁻¹− gi(x₁, . . . , x_n).

(6)

Note that the gi’s are of total degree at most n− 2.

We claim there is no affine linear subspace of dimension greater than zero on which all ∂f /∂x_i are constant. Consider an arbitrary affine line, parameterized by a variable t:

x_i= c_i+ d_it,

where ci and di are constants for all i∈ [n], and with at least one di nonzero. Then ^∂(f−g)_∂x

i restricted to the line is given by

n(ci+ dit)ⁿ⁻¹− hi(t),

for some univariate polynomials hi(t) of degree≤ n − 2. Since there must exist some i such that di is nonzero, we know some partial derivative restricted to the affine line is parameterized by a univariate polynomial of degree n− 1, and thus, given that the field is of characteristic zero, is not constant for all t.

In case the underlying field is the real numbers R and n is even, we can improve the above result to prove an absolutely tight n² lower bound. We start with the following lemma:

Lemma 3.5 Let f = ^Pⁿ_i=1xⁿ_i. Over the real numbers, if n is even, we have that for any affine linear subspace A of dimension k≥ 1, deg(f_|_A) = n.

Proof. Since f is symmetric we can assume without loss of generality that the following is a base representation of A:

xk+1 = l1(x1, . . . , xk) xk+2 = l2(x1, . . . , xk)

...

x_n = l_n−k(x₁, . . . , x_k).

Then

f_|_A = xⁿ₁ + . . . xⁿ_k+ lⁿ₁ + . . . + lⁿ_n_−k.

We conclude that f_|_A must include the term xⁿ₁, since each lⁿ_j has a non-negative coefficient for the term xⁿ₁, since n is even.

Theorem 3.6 Over the real numbers, for even n, any ΣΠΣ-formula for f = ^Pⁿ_i=1xⁿ_i has multiplicative size at least n².

Proof. Using Lemma’s 3.3 and 3.5 we conclude that over the real numbers f is (0, n− 1)-resistant.

Hence, by Theorem 3.1 we get that `^∗₃(f )≥ deg(f)ⁿ₁ = n².

Let us note that f =^Pⁿ_i=1xⁿ_i is an example of a polynomial that, even for large d, has relatively few, namely only n, partial derivatives. This makes application of the partial derivatives technique of [SW99], which we will describe and extend in the next section, problematic.

(7)

3.1.2 Blocks of Powers

Let the underlying field have characteristic zero, and suppose n = m² for some m. Consider the “m blocks of m powers” polynomial

f =

m

X

i=1 im

Y

j=(i−1)m+1

x^m_j .

The straightforward ΣΠΣ-formula for f , that computes each term/block using a multiplication gate of degree n, is of multiplicative size n^3/2. We will show this is tight.

Proposition 3.7 The blocks of powers polynomial f defined above is (0, m− 1)-resistant.

Proof. Consider an affine linear space of co-dimension m− 1. For any base B of A, restriction to A consists of substitution of the m− 1 variables in B by linear forms in the remaining variables X/B.

This means there is at least one term/block B_i :=^Q^im_j=(i−1)m+1x^m_j of f whose variables are disjoint from B. This block Bi remains the same under restriction to A. Also, for every other term/block there is at least one variable that is not assigned to. As a consequence, Bicannot be canceled against terms resulting from restriction to A of other blocks. Hence deg(f_|_A) = deg(f ). Hence by Lemma 3.3 we have that f is (0, m− 1)-resistant.

Corollary 3.8 For the blocks of powers polynomial f defined above, `^∗₃(f )≥ nm = n^3/2. Proof. Follows immediately from Theorem 3.1 and Proposition 3.7.

Alternatively, one can observe that by substitution of a variable yi for each variable appearing in the ith block one obtains from a ΣΠΣ-formula F for f a formula for f⁰ = ^P^m_i=1yⁿ_i of the same size asF. Theorem 3.4 generalizes to show that `^∗3(f⁰)≥ ¹₂n^3/2, which implies `^∗₃(f )≥ ¹₂n^3/2. 3.1.3 Polynomials depending on distance to the origin

Over the real numbers, d₂(x) = x²₁+ x²₂+· · · + x²nis the square of the Euclidean distance of the point x to the origin. Polynomials f of the form q(d2(x)) where q is a single-variable polynomial can be readily seen to have high resistance. Only the leading term of q matters.

For example, consider f = (x²₁+ x²₂+· · · + x²n)^m. On any affine line L in Rⁿ, deg(f_|_L) = 2m.

Therefore, by Lemma 3.3, over the reals, f is (0, n− 1)-resistant. Hence by Theorem 3.1 we get that Proposition 3.9 Over the real numbers, `^∗₃((x²₁+ x²₂+· · · + x²n)^m)≥ 2mn.

Observe that by reduction this means that the “mth-power of an inner product polynomial”, defined by g = (x1y1+ x2y2+· · · + xnyn)^m, must also have ΣΠΣ-size at least 2mn over the reals numbers.

Results for l_p norms, p6= 2, are similar.

3.2 Symmetric Polynomials

The special case of (0, k)-resistance implicitly appears in [Shp01], at least insofar as the sufficient condition of Lemma 3.3 is used for the special case d = 0 in which no derivatives are taken. For the elementary symmetric polynomial S_n^r of degree r≥ 2 in n variables, Theorem 4.3 of [Shp01] implies (via Lemma 3.3) that S_n^r is (0, n−^n+r₂ )-resistant. Shpilka proves for r≥ 2, `3(S_n^r) = Ω(r(n− r)),

(8)

which can be verified using Theorem 3.1: `3(S_n^r) ≥ (r + 1)(n − ^n+r₂ ) = Ω(r(n− r)). For r = Ω(n) this yields a tight Ω(n²) bound as observed in [Shp01].

The symmetric polynomials S_n^kcollectively have the “telescoping” property that every dth-order partial is (zero or) the symmetric polynomial S_n−d^k−d on an (n− d)-subset of the variables. Shpilka [Shp01] devolves the analysis into the question, “What is the maximum dimension of a linear subspace of Cⁿ on which S_n^r vanishes?” In Shpilka’s answer, divisibility properties of r come into play as is witnessed by Theorem 5.9 of [Shp01]. To give an example case of this theorem, one can check that S₉² vanishes on the 3-dimensional linear space given by

{(x1, ωx1, ω²x1, x2, ωx2, ω²x2, x3, ωx3, ω²x3) : x1, x2, x3 ∈ C}, where ω can be selected to be any primitive 3rd root of unity. Let

ρ0(f ) = max{k : for any linear space A of codimension k, f|A 6= 0}

Shpilka proved for r > n/2, that ρ0(S^r_n) = n− r, and for r ≥ 2, that ⁿ^−r₂ < ρ0(S_n^r) ≤ n − r. For S₉² we see via divisibility properties of d that the value for ρ₀ can get less than the optimum value, although the ^n−r₂ lower bound suffices for obtaining the above mentioned `₃(S_n^r) = Ω(r(n− r)) lower bound. We have some indication from computer runs using the polynomial algebra package Singular [GPS05] that the “unruly” behaviour seen for ρ0 because of divisibility properties for r ≤ n/2 can be made to go away by considering the following notion:

ρ₁(f ) = max{k : for any linear space A of codimension k, there exists i,

∂f

∂x_i

|A

6= 0}

One can still see from the fact that S_n^r is homogeneous and using Lemma 3.3 and Theorem 3.1 that

`^∗₃(S_n^r)≥ ^r·(ρ¹^(S₂^rⁿ⁾⁺¹⁾. Establishing the exact value of ρ1(S_n^r), which we conjecture to be n + 1− r at least over the rationals, seems at least to simplify obtaining the `3(S_n^r) = Ω(d(n− d)) lower bound.

In any case we can prove the following relation between the two notions:

Lemma 3.10 For r≥ 2, ρ1(S_n+1^r+1)≥ ρ0(S_n^r−1).

Proof. Suppose A is a linear space of codimension k = ρ₁(S_n+1^r+1) + 1 over {x1, x₂, . . . , x_n+1} such that all partials of S_n+1^r+1 vanish on A, i.e. for every i,

S_n^r(x₁, . . . , x_i, . . . , x_n+1)_|_A = 0. (3) Here we denote x_i to indicate that the variable x_i is excluded. If k = n + 1, then ρ₁(S_n+1^r+1) = n, so the statement of the lemma is trivial. Otherwise, some variable is not being substituted for by the restriction “|A”. By symmetry we can assume wlog. that this is x_n+1. Hence we can write Equation (3) as:

x_n+1S_n−1^r⁻¹(x₁, . . . , x_i, . . . , x_n)_|_A + S_n^r₋₁(x₁, . . . , x_i, . . . , x_n)_|_A = 0, for all i, and

S_n^r(x₁, . . . , x_n)_|_A = 0.

Adding the first n equations and subtracting the last n− r times one obtains 0 = x_n+1

n

X

i=1

S_n−1^r⁻¹(x₁, . . . , x_i, . . . , x_n)_|_A

= (n− r)xn+1S_n^r−1(x₁, . . . , x_n)_|_A

(9)

In other words S_n^r−1(x1, . . . , xn)_|_A = 0. Simplify the substitution corresponding to A by taking x_n+1 = 0 in the defining k equations. We obtain a set of k substitutions on k variables from {x1, x₂, . . . , x_n}. This defines a linear space of codimension k on which Sn^r⁻¹vanishes. So ρ₀(S_n^r⁻¹) <

k = ρ1(S_n+1^r+1) + 1.

For another example, S₆³ is made to vanish at dimension 3 not by any subspace that zeroes out 3 co-ordinates but rather by A = { (u, −u, w, −w, y, −y) : u, w, y ∈ C }. Now add a new variable t in defining f = S₇⁴. The notable fact is that f 1-resists the dimension-3 subspace A⁰ obtained by adjoining t = 0 to the equations for A, upon existentially choosing to derive by a variable other than t, such as u. All terms of ∂f /∂u that include t vanish, leaving 10 terms in the variables v, w, x, y, z.

Of these, 4 pairs cancel under the equations x = −w, z = −y, but the leftover vwx + vyz part equates to uw²+ uy², which not only doesn’t cancel but also dominates any contribution from the lower-degree g. Gr¨obner basis runs using Singular imply that S₇⁴ is (1, 4)-resistant over C as well as the rationals and reals, though we have not yet made this a consequence of a general resistance theorem for all S_n^r.

Hence our (1, k)-resistance analysis for S₇⁴ is not impacted by the achieved upper bound of 3 represented by A. Admittedly the symmetric polynomials f have O(n²) upper bounds on `3(f ), so our distinction in this case does not directly help surmount the quadratic barrier. But it does show promise of making progress in our algebraic understanding of polynomials in general.

4 Bounds for +,*-Complexity

The partial derivatives technique of [SW99] ignores the wires of the formula present in the first layer.

In the following we show how to account for them. As a result we get a sharpening of several lower bounds, though not on `^∗₃ but on total formula size. We refine the [SW99] result for *-complexity:

Theorem 4.1 ([SW99]) Let f ∈ F [x1, . . . , x_n]. Suppose for integers d, D, κ it holds that for every affine subspace A of co-dimension κ, dim(∂_d(f )_|_A) > D. Then

`^∗₃(f )≥ min(κ² d, D

κ+d d

);

—to our result for +,*-complexity:

Theorem 4.2 (new) Let f ∈ F [x1, . . . , xn]. Suppose for integers d, D, κ it holds that for every affine subspace A of co-dimension κ, ^Pⁿ_i=1dim[∂_d(_∂x^∂f

i)_|_A] > D. Then

`₃(f )≥ min( κ² d + 2, D

κ+d d

).

Comparing the two theorems, we see that the result by Shpilka and Wigderson provides a lower bound on multiplicative complexity, while our result gives a lower bound on the total number of wires. We do get an extra “factor n” of additions with the ^Pⁿ_i=1dim[∂_d(_∂x^∂f

i)_|_A] > D condition compared to just dim(∂_d(f )_|_A) > D. Potentially this can lead to improved lower bounds on the total size of the formula, better than one would be able to infer from the lower bound on multiplicative complexity of Theorem 4.1 alone. We shall see that we can indeed get such kinds of improvements in the applications section below.

(10)

We employ the following suite of concepts and lemmas from [SW99] directly.

Definition 4.1 ([SW99]). For f ∈ F [x1, . . . , x_n], let ∂_d(f ) be the set of all dth order formal partial derivatives of f w.r.t. variables from {x1, . . . , x_n}.

For a set of polynomials A ={f1, . . . , ft}, let span(A) = {^P^ti=1cifi | ci∈ F }, i.e., span(A) is the linear span of A. We write dim[A] as shorthand for dim[span(A)]. We have the following elementary sub-additivity property for the measure dim[∂d(f )].

Proposition 4.3 ([SW99]) For f₁, f₂ ∈ F [x1, . . . , x_n] and constants c₁, c₂∈ F , dim[∂_d(c1f1+ c2f2)]≤ dim[∂d(f1)] + dim[∂_d(f2)].

One also needs to bound the growth of dim[∂_d(f )] in case of multiplication. For multiplication of affine linear forms, we have the following two bounds.

Proposition 4.4 ([SW99]) Let M = Π^m_i=1l_i, where each l_i is affine linear. Then dim[∂_d(M )]≤ m

d

! .

Proposition 4.5 ([SW99]) Let M be a product gate with dim[M^h] = m, then for any d, dim[∂d(M )]≤ m + d

d

! .

Note that for polynomials f1, . . . , fs, span(f1, . . . , fs)_|_A = span(f_1|_A, . . . , f_s|_A), and that dim[W_|_A] ≤ dim[W ]. Now we modify Proposition 4.3 a little to get a result implicitly used by Shpilka and Wigderson in their arguments.

Proposition 4.6 (cf. [SW99]) For f1, f2 ∈ F [x1, . . . , xn] and constants c1, c2 ∈ F , and affine linear subspace A, we have that dim[∂_d(c₁f₁+ c₂f₂)_|_A]≤ dim[∂d(f₁)_|_A] + dim[∂_d(f₂)_|_A].

Finally, we require:

Lemma 4.7 ([SW99]) For every n, κ, d, and every affine subspace A of co-dimension κ, we have that

dim[∂_d(S_n^2d)_|_A]≥ n− κ d

! .

Now we can prove our sideways improvement of Shpilka and Wigderson’s main Theorem 3.1 [SW99].

Proof of Theorem 4.2. Consider a minimum-size ΣΠΣ-formula for f with multiplication gates M₁, . . . , M_s. We have that

f =

s

X

i=1

Mi, where for 1≤ i ≤ s, Mi = Π^d_j=1ⁱ li,j and l_i,j = c_i,j,1x₁+ c_i,j,2x₂+ . . . + c_i,j,nx_n+ c_i,j,0,

(11)

for certain constants ci,j,k ∈ F . Computing the partial derivative of f w.r.t. variable xk we get

∂f

∂x_k =

s

X

i=1 di

X

j=1

c_i,j,kM_i

l_i,j. (4)

Let

S ={i : dim[Mi^h]≥ κ}.

If |S| ≥ _d+2^κ , then `3(f ) ≥ _d+2^κ² . Suppose |S| < _d+2^κ . If S = ∅, then let A be an arbitrary affine subspace of co-dimension κ. Otherwise, construct an affine space A as follows. Since|S|(d + 2) < κ, and since for each j ∈ S, dim[Mi^h]≥ κ, it is possible to pick d + 2 input linear forms lj,1, . . . , lj,d+2

of each multiplication gate Mj with j∈ S, such that {lj,1^h , . . . , l_j,d+2^h |j ∈ S} is a set of |S|(d + 2) < κ independent homogeneous linear forms. Define

A ={x : li,j(x) = 0, for any i∈ S, j ∈ [d + 2]}.

We have that the co-dimension of A is at most κ. W.l.o.g. assume the co-dimension of A equals κ.

For each i∈ S, d+2 linear forms of Mi vanish on A. This implies that dim[∂_d(M_i

l_i,j)_|_A] = 0.

for any i∈ S. For any i /∈ S, by Proposition 4.5, dim[∂d(M_i

li,j

)_|_A] < κ + d d

! .

Let Dk= dim[∂d(_∂x^∂f

k)_|_A]. By Proposition 4.6 and equation (4), D_k ≤^X

i /∈S

X

j

c_i,j,k6=0

dim[∂_d(Mi

l_i,j)_|_A].

Hence there must be at least ^D^k

(^κ+d_d ) terms on the r.h.s., i.e. there are at least that many wires from xk to gates in the first layer. Hence in total the number of wires to the first layer is at least Pn

i=1 Di

(^κ+dd ) > ^D (^κ+dd ).

We can apply a similar idea to adapt the other main theorem from [SW99]:

Theorem 4.8 ([SW99]) Let f ∈ F [x1, . . . , xn]. Suppose for integers d, D, κ it holds that for every affine subspace A of co-dimension κ, dim(∂_d(f_|_A)) > D. Then for every m≥ 2,

`^∗₃(f )≥ min(κm, D

m d

).

Theorem 4.9 (new) Let f ∈ F [x1, . . . , xn]. Suppose for integers d, D, κ with d ≥ 1, it holds that for every affine subspace A of co-dimension κ, ^Pⁿ_i=1dim[∂_d(_∂x^∂f

i|A)] > D. Then for every m≥ 2,

`3(f )≥ min(1

2κm, D

m−1 d

).

(12)

Proof. Consider a minimum size ΣΠΣ-formula for f with multiplication gates M1, . . . , Ms. We have that

f =

s

X

i=1

Mi, where for 1≤ i ≤ s, Mi= Π^d_j=1ⁱ li,j, with li,j = ci,j,1x1+ ci,j,2x2+ . . . + ci,j,nxn+ ci,j,0.

If there are ^κ₂ multiplication gates M_i of degree greater than m then already `₃(f ) > ¹₂κm. So suppose the number t of multiplication gates of degree greater than m is less than ^κ₂. Wlog. assume these gates are given by

M₁, M₂, . . . , M_t.

For i = 1, 2, . . . , t, pick two input linear forms l_i,1, l_i,2 of M_i, such that for the total collection l1,1, l1,2, . . . , li,1, li,2 we have that the strictly linear parts l_1,1^h , l^h_1,2, . . . , l^h_i,1, l^h_i,2 are independent. It might be that at some i ≤ t, we cannot find any li,1 or l_i,2 with l_i,1^h or l_i,2^h independent from the previously collected linear forms. In this case, we just pick li,1 if that one is still independent, and skip to the next index i. If we can’t even find li,1 for which li,1 is independent, we pick no linear form and proceed to the next i.

Let A be the zero set of all the collected input linear forms. Then A has co-dimension at most κ. Wlog. we may assume that the co-dimension of A equals κ. Observe that

∂f

∂x_{k |}_A =

s

X

i=1 di

X

j=1

c_i,j,k(M_i li,j

)_|_A. (5)

Now for a multiplication gate Mi of degree ≥ m, there are three cases: either we picked two input linear forms of Mi, or we picked just one, or none at all. In the first case,

(M_i li,j

)_|_A = 0

in the r.h.s. of (5), for all i, j. In the second and third case, we know that for every input l of Mi

that was not picked, l^h is a linear combination of l^h_i’s for li’s that were picked. Hence l^h_|

A =

r

X

i=1

ci(l^h_{i |}

A) = constant.

As a consequence, (^M_l ⁱ

i,j)_|_A = constant in the r.h.s. of (5), for all i, j. Since d≥ 1, in either three cases, we obtain that ∂_d(^M_l ⁱ

i,j|A

) = 0. For multiplication gates M_iof degree at most m, Proposition 4.4 gives us that dim[∂_d((^M_l ⁱ

i,j)_|_A)]≤ ^m−1_d . Let D_k = dim[∂_d(_∂x^∂f

k|A

)]. By Proposition 4.3, we see there are at least D_k/ ^m−1_d terms in (5). This implies that there are at least that many wires fanning out of x_k. Adding up for all variables, we conclude that `₃(f )≥ D/ ^m−1_d .

4.1 Some Applications

In [SW99] it was proved that for d ≤ log n, `^∗3(S_n^2d) = Ω(ⁿ

2d d+2

d ). Note for d = 2, this lower bound is only Ω(n). We can apply Theorem 4.2 to prove the following stronger lower bound on the total formula size of S_n^2d. In particular for d = 2, we get an Ω(n⁴³) bound.

(13)

Theorem 4.10 For 1≤ d ≤ log n, `3(S_n^2d) = Ω(ⁿ

2d d+1

d ).

Proof. For any affine subspace A of co-dimension κ and d≥ 2 we have that

n

X

i=1

dim[∂_d−1(∂S_n^2d

∂x_i )_|_A]≥ dim[∂d(S_n^2d)_|_A]≥ n− κ d

! . The latter inequality follows from Lemma 4.7. Applying Theorem 4.2 we get that

`₃(S_n^2d)≥ min( κ² d + 1,

n−κ d

κ+d−1 d−1

) = min( κ² d + 1,

n−κ d

κ+d d

κ + d

d ). (6)

Set κ = ¹₉n^d+1^d . Then we have that

n−κ d

κ+d d

κ + d

d ≥ (n− κ

κ + d)^dκ + d

d ≥ ( 8/9n 2/9n^d+1^d

)^dκ + d

d = 4^dn^d+1^d κ + d d ≥ 4^d

9dn^d+1^2d ≥ n^d+1^2d .

Hence (2) is at least min( ⁿ

2d d+1

81(d+1), n^d+1^2d ) = Ω(ⁿ

2d d+1

d ).

Corollary 4.11 `₃(S_n⁴) = Ω(n^4/3).

Shpilka and Wigderson defined the “product-of-inner-products” polynomial over 2d variable sets of size n (superscript indicate different variables, each variable has degree one) by

P IP_n^d=

d

Y

i=1 n

X

j=1

xⁱ_jy_jⁱ.

Theorem 4.12 For any constant d > 0, `₃(P IP_n^d) = Ω(n^d+1^2d ).

Proof. Let f = P IP_n^d. Essentially we have that

∂f

∂xⁱ_j = yⁱ_jP IP_n^d⁻¹,

where the P IP_n^d−1 must be chosen on the appropriate variable set. Let A be an arbitrary affine linear subspace of co-dimension κ. Then

d

X

i=1 n

X

j=1

dim[∂_d−1( ∂f

∂xⁱ_j|A)] =

d

X

i=1 n

X

j=1

dim[∂_d−1(yⁱ_jP IP_n^d⁻¹|A)]

≥ (dn − κ) dim[∂d−1(P IP_n^d−1|A)]

The last inequality follows because at least dn− κ of the y-variables are not assigned to with the restriction to A. From Lemma 4.9 in [SW99] one gets

dim[∂_d−1(P IP_n^d⁻¹|A)≥ n^d⁻¹− 2^2d⁻¹κn^d⁻².

(14)

Using Theorem 4.9 we get

`3(f )≥ min(κ²

2 ,(dn− κ)(n^d−1− 2^2d−1κn^d−2)

κ−1 d−1

).

Taking κ = n^d+1^d , one gets for constant d that

`₃(P IP_n^d) = Ω(n^d+1^2d ).

For comparison, in [SW99] one gets `^∗₃(P IP_n^d) = Ω(n^d+2^2d ).

5 Conclusion—Possible Further Tools

We have taken some further steps after [SW99], obtaining absolutely tight (rather than asymp- totically so) multiplicative size lower bounds for some natural functions, and obtaining somewhat improved bounds on +,∗-size for low-degree symmetric and product-of-inner-product polynomials.

However, these may if anything enhance the feeling from [SW99, Shp01] that the concepts being employed may go no further than quadratic for lower bounds. One cannot after all say that a function f (x₁, . . . , x_n) is non-vanishing on an affine-linear space of co-dimension more than n. The quest then is for a mathematical invariant that scales beyond linear with the number of degree-d-or-higher multiplication gates in the formula.

Notably absent in current lower bound techniques for ΣΠΣ-formulas are random restriction type arguments, whereas many of the results for Boolean constant depth circuits of [Ajt83, FSS81, Yao85, H˚as89] proceed using random restrictions. Note that Raz managed to use random restrictions in conjunction with a partial derivatives based technique in his work on multilinear arithmetical formulas [Raz04a, Raz04b]. We speculate that the weaker existential requirement in our resistance notion may help it adapt to random-restriction scenarios, although plenitude of kth-partials may still be needed to ensure the function does not collapse too far. In any event, the search for stronger mathematical techniques to prove exponential lower bounds, even in the self-contained ΣΠΣ formula case, continues.

Acknowledgments We thank Avi Wigderson for comments on a very early version of this work, and referees of a later version for very helpful criticism.

References

[Ajt83] M. Ajtai. Σ¹₁ formulae on finite structures. Annals of Pure and Applied Logic, 24:1–48, 1983.

[BCS97] P. B¨urgisser, M. Clausen, and M.A. Shokrollahi. Algebraic Complexity Theory. Springer Verlag, 1997.

[BS82] W. Baur and V. Strassen. The complexity of partial derivatives. Theor. Comp. Sci., 22:317–330, 1982.

[FSS81] M. Furst, J. Saxe, and M. Sipser. Parity, circuits, and the polynomial-time hierarchy. In Proc. 22nd Annual IEEE Symposium on Foundations of Computer Science, pages 260–270, 1981.

(15)

[GPS05] G.-M. Greuel, G. Pfister, and H. Sch¨onemann. Singular 3.0. A Computer Algebra System for Polynomial Computations, Centre for Computer Algebra, University of Kaiserslautern, 2005. http://www.singular.uni-kl.de.

[H˚as86] J. H˚astad. Almost optimal lower bounds for small-depth circuits. In Proc. 18th Annual ACM Symposium on the Theory of Computing, pages 6–20, 1986.

[H˚as89] J. H˚astad. Almost optimal lower bounds for small-depth circuits. In S. Micali, editor, Randomness and Computation, volume 5 of Advances in Computing Research, pages 143–

170. JAI Press, Greenwich, CT, USA, 1989.

[NW96] N. Nisan and A. Wigderson. Lower bounds on arithmetic circuits via partial derivatives.

Computational Complexity, 6:217–234, 1996.

[Raz04a] R. Raz. Multilinear formulas for permanent and determinant are of super-polynomial size.

In Proc. 36th Annual ACM Symposium on the Theory of Computing, 2004. to appear; also ECCC TR03-067.

[Raz04b] R. Raz. Multilinear NC¹ 6= multilinear NC². In Proc. 45th Annual IEEE Symposium on Foundations of Computer Science, pages 344–351, 2004.

[Shp01] A. Shpilka. Affine projections of symmetric polynomials. In Proc. 16th Annual IEEE Conference on Computational Complexity, pages 160–171, 2001.

[Str73] V. Strassen. Berechnung und Programm II. Acta Informatica, 2:64–79, 1973.

[SW99] A. Shpilka and A. Wigderson. Depth-3 arithmetic formulae over fields of characteristic zero. Technical Report 23, ECCC, 1999.

[Val79] L. Valiant. The complexity of computing the permanent. Theor. Comp. Sci., 8:189–201, 1979.

[Yao85] A. Yao. Separating the polynomial-time hierarchy by oracles. In Proc. 26th Annual IEEE Symposium on Foundations of Computer Science, pages 1–10, 1985.