Noisy Interpolating Sets for Low-Degree Polynomials

(1)

www.theoryofcomputing.org

Noisy Interpolating Sets

for Low-Degree Polynomials

Zeev Dvir

∗

Amir Shpilka

†

Received: September 4, 2009; published: February 12, 2011.

Abstract: A Noisy Interpolating Set (NIS) for degree-dpolynomials is a setS⊆_Fn_{, where}

Fis a finite field, such that any degree-d polynomialq∈F[x1, . . . ,xn]can be efficiently

interpolated from its values onS, even if an adversary corrupts a constant fraction of the values. In this paper we construct an explicit NIS for every fieldFpof prime orderpand

any degreed. Our sets are of sizeO(nd₎_{and have efficient interpolation algorithms that can}

recoverqin the presence of a fraction exp(−Ω(d))of errors.

Our construction is based on a theorem which roughly states that if S is a NIS for degree-1 polynomials thend·S={a1+· · ·+ad|ai∈S}is a NIS for degree-dpolynomials.

Furthermore, given an efficient interpolation algorithm forS, we show how to use it in a black-box manner to build an efficient interpolation algorithm ford·S.

As a corollary we obtain an explicit family of punctured Reed-Muller codes (codes that are restrictions of a Reed-Muller code to a subset of the coordinates) which are good codes and have an efficient decoding algorithm against a constant fraction of errors. To the best of our knowledge, even the existence of punctured Reed-Muller codes that are good codes was not previously known.

ACM Classification:F.2.2

AMS Classification:68P30, 68Q25

Key words and phrases:polynomial interpolation, error-correcting codes

∗_{Research supported by Binational Science Foundation (BSF) grant, by Israel Science Foundation (ISF) grant and by}

Minerva Foundation grant.

(2)

1 Introduction

An interpolating set for degree-d polynomials is a set of pointsS⊆_Fn _{such that if} _q(x

1, . . . ,xn) is a

degree-dpolynomial overFand we are given the set of values(q(s))s∈Sthen we can reconstructq. That

is, we can find all the coefficients ofq. A setSis anoisy interpolating setfor degree-d polynomials if we can reconstructqeven if an adversary corrupts anε fraction of the values in(q(s))s∈S. It is not

difficult to prove that a random set of sizeO(nd)is a noisy interpolating set. In this paper we study the problem of giving an explicit construction of a small noisy interpolating set for degree-dpolynomials that has an efficient interpolation algorithm against a constant fraction of errors. Besides being a very natural problem, this question is related to two other interesting problems: giving explicit constructions of punctured Reed-Muller codes that are good codes and the problem of constructing pseudorandom generators against low-degree polynomials. Before stating our results we describe the connection to these two problems.

Reed-Muller codes are an important family of error-correcting codes. They are obtained by evaluating low-degree polynomials on all possible inputs taken from a finite fieldF. More precisely, in the code RM(n,d)the messages correspond to polynomialsq(x1, . . . ,xn)overFof total degreed, and the encoding is the vector of evaluations(q(α¯))α¯∈Fn. The well-known Hadamard code, obtained from evaluating linear functions, is the code RM(n,1). (For more information on Reed-Muller codes see [10].) Although Reed-Muller codes are notgoodcodes (they do not have constant rate) they play an important role both in practice and in theory. For example Reed-Muller codes were used in the early proofs of the PCP theorem [2,1], in constructions of pseudorandom generators [3,7,12] and extractors [13,11], and more. It is thus an important goal to better understand these codes and their properties.

Apuncturingof a code is the operation of throwing away some of the coordinates of every codeword.

For example, every binary linear code (without repeated coordinates) is a punctured Hadamard code. In particular, a punctured Reed-Muller code is a mapping that sends a degree-dpolynomialqto its vector of values on some subsetS⊆_Fn_{. As Reed-Muller codes are not good codes themselves, it is an interesting}

question to find a puncturing that yields good codes. As before, it is not difficult to see that a random subset ofFnof sizeO(nd)gives a good punctured Reed-Muller code. An even more interesting question is to find a puncturing that yields good codes that have efficient encoding and decoding algorithms. (For a randomSwe don’t have an efficient decoding algorithm.) It is not hard to see thatSis a noisy interpolating set (that has an efficient interpolation algorithm) if and only if the mappingq→(q(α¯))α¯∈S

is a code of linear minimal distance (that has an efficient decoding algorithm). Our noisy interpolating sets are of sizeO(nd₎_{and therefore they give a construction of good punctured Reed-Muller codes that}

can be decoded efficiently. To the best of our knowledge no such construction was known prior to our result.

Another problem that is related to our results is the problem of constructingpseudorandom generators

for low-degree polynomials. A mappingG:Fs→Fnis anε-Pseudorandom Generator (PRG)against degree-dpolynomials if it fools every degreedpolynomial innvariables. That is, the two distributions

(3)

For the case of degree-1 polynomials, it is not hard to see that any noisy interpolating set implies a PRG and vice versa. That is, ifG:Fs→Fnis a PRG for polynomials of degree 1 then the image ofGis a noisy interpolating set. Similarly ifS⊆Fnis a noisy interpolating set for degree-1 polynomials then any one to one mapping whose image isSis a PRG for such polynomials. For larger values ofdthere is no such strong correspondence between PRGs and noisy interpolating sets. As before, given a PRG we get a noisy interpolating set (without an obvious efficient interpolation algorithm), however it is not clear how to get a PRG from a noisy interpolating set. Constructions of PRGs against low degree polynomials over very large fields were given in [5] using tools from algebraic geometry. Independently from our work, Viola [14], building on the works of [6,9], showed (over any field) thatd·Sis PRG for degree-d

polynomials wheneverSis PRG for degree-1 polynomials. In particular this result implies thatd·Sis a noisy interpolating set. However, [14] did not consider the problem of giving a noisy interpolating set and in particular did not give an efficient interpolation algorithm. One can view our work as saying that the construction analyzed in [6,9,14] with respect to minimal distance also has an efficient decoding algorithm.

In the next section we consider two types of noisy interpolating sets. The first one allows for repetition of points. That is, we allowSto be a multiset. This corresponds to repeating coordinates in an error-correcting code. In the second variant (properNIS) we do not allow such repetitions, and requireS

to be a simple set (and not a multiset). This is what allows us to get punctured Reed-Muller codes. For each of these types we prove a composition theorem, saying that the sumset of a NIS for degree-1 polynomials is a NIS for higher-degree polynomials. (The non-multiset version of this theorem requires an additional condition on the initial NIS.) We then combine these theorems with known constructions of error-correcting codes to obtain our final results.

1.1 Our results

In the followingFwill always denote a finite field of prime order.1 We denote byF[x1, . . . ,xn]the space

of all polynomials in the variablesx1, . . . ,xnwith coefficients inF. We also denote byFd[x1, . . . ,xn]the

set of all polynomials of degree≤d inF[x1, . . . ,xn]. Since we are interested in polynomials only as

functionsoverFnwe will assume in the following that the individual degree of each variable is smaller

thanp=|F|. This is without loss of generality since for everya∈Fwe haveap=a. For example, over F2, the field of two elements, we will only consider multilinear polynomials. We denote the Hamming

distance between two stringsxandyas dist(x,y).

Definition 1.1(Noisy Interpolating Set (NIS)). Let S= (a1, . . . ,am)∈(Fn)m be a list of points (not necessarily distinct) inFn. We say thatSis an(n,d,ε)-Noisy Interpolating Set(NIS) if there exists an algorithmASsuch that for everyq∈Fd[x1, . . . ,xn]and for every vectore= (e1, . . . ,em)∈Fmsuch that

|{i∈[m]|ei6=0}| ≤ε·m,

the algorithmAS, when given as input the list of values(q(a1) +e1, . . . ,q(am) +em), outputs the

polyno-mialq(as a list of its coefficients). We say thatSis aproperNIS if the pointsa₁, . . . ,amare distinct. IfS

is a proper NIS we can treat it as a subsetS⊆_Fn_.

1_{It is possible to extend our results to other finite fields (see Section}₅_{); however, it requires more notation and can easily be}

(4)

LetS= S(n)_n_∈

Nbe a sequence such that for everyn∈Nwe have thatS

(n)_{is an}_(n,_d,

ε)-NIS (dand εare fixed for alln). We say thatShas an efficient interpolation algorithm if there exists a polynomial time algorithmM(n,L)that takes as input an integernand a listLof values inFsuch that the restriction

M(n,·)has the same behavior as the algorithmA_S(n) described above (in places where there is no risk of

confusion we will omit the superscript(n)and simply say thatShas an efficient interpolation algorithm).

1.1.1 Multiset NIS

Our first theorem shows how to use a NIS for degree-1 polynomials in order to create a NIS for higher-degree polynomials. In order to state the theorem we need the following notation. For two lists2of points

S= (a1, . . . ,am)∈(Fn)mandT= (b1, . . . ,b`)∈(Fn)`we define the sumST to be the list

(ai+bj)i∈[m],j∈[`]∈(Fn)m·`.

The reason for the notation is to make a clear distinction between sets and multi-sets. In particular we shall use the notationS+T to denote set addition. That is, for two setsSandT we define

S+T ={s+t|s∈S,t∈T}.

Theorem 1.2. Let0<ε1<1/2be a real number. Let S1be an (n,1,ε1)-NIS and for each d >1let

Sd=Sd−1S1. Then for every d>1the set Sd is an(n,d,εd)- NIS withεd= (ε1/2)d.Moreover, if S1

has an efficient interpolation algorithm, then so does Sd.

CombiningTheorem 1.2with known constructions of error-correcting codes (in order to defineS1)

we get the following corollary:

Corollary 1.3. For every field _F of prime order and for every d>0, there exists an ε >0 and a familyS= S(n)_n_∈

Nsuch that for every n∈N, S

(n)_{is an}_(n,d_,

ε)-NIS,and such thatShas an efficient

interpolation algorithm. Moreover, for each n∈Nwe haveS(n)=O(nd)and it is possible to generate

the set S(n)in timepoly(n).

1.1.2 Proper NIS

We are able to prove an analog ofTheorem 1.2for proper NIS. We would like to show that ifS₁is a proper(n,1,ε1)-NIS then the setsSd =Sd−1+S1 (this time we use the operation of set addition) are

proper(n,d,εd)-NISs. To show this we will require the initial setS1to satisfy a certain natural condition

on the intersections of different ‘shifts’ of the setsSd−1. Roughly speaking, this condition will guarantee

that there is no small set of errors inSd=Sd−1+S1that has a large intersection with many shifts ofSd−1

(one can see from the proof ofTheorem 1.2why this is useful). The operation of set addition is defined as follows: LetAandBbe two subsets ofFn. We defineA+B={a+b|a∈A,b∈B}. For an element

c∈_Fn_{and for a subset}_A_⊆

Fnwe denoteA+c={a+c|a∈A}.

2_{Instead of a list one can have a multi-set in mind, however it is slightly easier to describe and prove our results using the}

(5)

Definition 1.4(Condition3Fk). LetS⊆Fn. Let S0={0}and for eachd≥1 letSd=Sd−1+S. Let

k>0 be an integer. Forx∈Sdwe letNd(x) =|{b∈S|x∈Sd−1+b}|. We say thatSsatisfies condition

Fkif for every 0<d≤kwe have

|{x∈Sd|Nd(x)>d}| ≤ |Sd−2|.

We are now ready to state the analog ofTheorem 1.2for proper NISs.

Theorem 1.5. Let0<ε1<1/2be a real number and let k>0an integer. There exists a constant C0,

depending only onεand k, such that for all n>C0the following holds: Let S1be a proper(n,1,ε1)-NIS

and for each d>1let Sd=Sd−1+S1. Suppose S1satisfies conditionFk. Then, for every1<d≤k the

set Sdis a (proper)(n,d,εd)-NIS with

εd=

1

d!·

_ε₁

3

d .

Moreover, if S1has an efficient interpolation algorithm, then so does Sd.

As before, combiningTheorem 1.5with known constructions of error-correcting codes gives the following corollary:

Corollary 1.6. For every field_Fof prime order and for every d>0, there exists anε>0and a family S= S(n)_n_∈

Nsuch that for every n∈N, S

(n)_{is a proper}_(n,_d,

ε)-NIS with an efficient interpolation

algorithm. Moreover, for each n∈Nwe haveS(n)=O(nd)and it is possible to generate the set S(n)in

timepoly(n).

It is easy to see that any(n,d,ε)-proper NIS induces a puncturing of the Reed-Muller code RM(n,d) that has an efficient decoding from a fractionε of errors. (We keep the coordinates of the RM-code corresponding to the points in the NIS.) The following corollary is therefore immediate. (SeeSection 2.2

for the definition of a good family of codes.)

Corollary 1.7. For every field_Fof prime order and for every d>0, there exists anε>0and a family

{Cn}of good codes such that Cnis a puncturedRM(n,d)-code. Furthermore, the family{Cn}has an

efficient encoding and decoding algorithms from a constant fraction of errors.

1.2 Organization

InSection 2we give preliminary results that we will use in the rest of the paper. InSection 3we prove

Theorem 1.2andCorollary 1.3. InSection 4, we proveTheorem 1.5andCorollary 1.6.Section 5contains a sketch of an argument showing how to extend our results from fields of prime order to any finite field.

Section 6describes a construction of a family of error-correcting codes with certain special properties that is used inSection 4.

(6)

2 Preliminaries

Letq∈F[x1, . . . ,xn]be a polynomial. We denote by deg(q)the total degree ofq. Thei-th homogeneous

part ofqis the sum of all monomials inqof degreei, and is denoted withqi. We denoteq≤i=∑ij=0qj.

We now give some useful facts and definitions regarding partial derivatives of polynomials and error-correcting codes.

2.1 Partial and directional derivatives

In this section we define partial and directional derivatives of polynomials over finite fields and discuss some of their properties.

We start by defining the partial derivatives of a monomial. LetM(x) =xc1

1 ·. . .·x

cn

n be a monomial in nvariables so thatci≥0 for alli. The partial derivative ofM(x)with respect toxiis zero ifci=0 and

∂M

∂xi(x) =ci·x c1

1 ·. . .·x

ci−1

i ·. . .·x cn n

ifci>0. The definition extends to general polynomials by linearity. It is easy to verify that if deg(q) =d

then all of the partial derivatives ofqare of degree at mostd−1. Also, since we assume that the individual degrees are smaller than the characteristic of the field we have that_∂∂_xq

i(x)≡0 iffxi does not appear in q(x). We denote by

∆q(x) =

∂q

∂x1

(x), . . . , ∂q ∂xn

(x)

the partial derivative vector ofq. The directional derivative of a polynomialq∈_F[x1, . . . ,xn]in direction

a∈_Fn_{is denoted by}

∂q(x,a)and is defined as

∂q(x,a) = n

∑

i=1

ai·∂q ∂xi(x).

That is,∂q(x,a)is the inner product ofaand∆q(x).

The following lemma gives a more “geometric” view of the partial derivatives ofqby connecting them to difference ofqalong certain directions.

Lemma 2.1. Let q∈Fd[x1, . . . ,xn]and let qd be its homogeneous part of degree d. Let a,b∈Fn. Then

q(x+a)−q(x+b) =∂qd(x,a−b) +E(x),

wheredeg(E)≤d−2(we use the convention thatdeg(0) =−∞). That is, the directional derivative of qd

in direction a−b is given by the homogeneous part of degree d−1in the difference q(x+a)−q(x+b).

Proof. It is enough to prove the lemma for the case thatqis a monomial of degreed. The result will then

(7)

dare of degree at mostd−2. LetM(x) =xc1

1 ·. . .·x

cn

n be a monomial of degreed. We observe that the

expressionM(x+a)can be written as

M(x+a) =

n

∏

i=1

(xi+ai)ci

= M(x) +

n

∑

i=1

ai·∂M ∂xi

(x) +E1(x),

where deg(E1)≤d−2. Now, subtractingM(x+b)fromM(x+a), the termM(x)cancels out and we are

left with the desired expression.

The next easy lemma shows that a homogeneous polynomialqcan be reconstructed from its partial derivatives.

Lemma 2.2. Let q∈Fd[x1, . . . ,xn]be a homogeneous polynomial of degree at least 1. Given the vector

of partial derivatives∆q(x), it is possible to reconstruct q in polynomial time.

Proof. We go over all monomials of degree≤dand find out the coefficient they have inqusing the

following method. For a monomialMletibe the first index such thatxiappears in the monomialMwith

positive degree. Consider ∂q

∂xi(x)and check whether the coefficient of the derivative of that monomial is

zero or not. For example, if we want to “decode” the monomialx5₁·x3₂we will look at the coefficient of

x4₁·x3₂in_∂∂_xq

1(x). To get the coefficient inqwe have to divide it by the degree ofxiin the monomial, that

is by 5 (remember that the individual degrees are smaller thanp=|F|=char(F)).

2.2 Error-correcting codes

We start with the basic definitions of error-correcting codes. LetFbe a field and letE:Fn→Fm. Denote byC=E(Fn)the image ofE. ThenCis called an[m,n,k]-codeif for any two codewordsE(v),E(u)∈C, whereu6=v, we have that dist(E(u),E(v))≥k(“dist” denotes the Hamming distance). We refer tomas

theblock lengthofCand tokas theminimal distanceofC. We denote byR=n/mtherelative rateofC

and byδ =k/mtherelative minimal distanceofC, and say thatCis an[R,δ]-code. WhenEis a linear mapping we say thatCis a linear code and call them×nmatrix computingEthegenerator matrixofC. A mapD:Fm→Fncan correctterrors (with respect toE) if for anyv∈Fnand anyw∈Fmsuch that dist(E(v),w)≤twe have thatD(w) =v. Such aDis called adecoding algorithmforC.

A family of codes{Ci}, whereCi is an[Ri,δi]-code of block lengthmi, hasconstant rateif there exists a constantR>0 such that for all codes in the family it holds thatRi≥R. The family has alinear

distanceif there exists a constantδ >0 such that for all codes in the family we haveδi≥δ. In such a

case we say that the family is a family of[R,δ]-codes. If a family of codes as above has limi→∞mi=∞, a constant rate and a linear minimal distance then we say that the family is a family ofgood codesand that the codes in the family are good. Similarly, we say that the family of codes has a decoding algorithm for a fractionτ of errors if for eachCithere is a decoding algorithmDithat can correct aτ fraction of errors. WhenCis a linear code we define the dual ofCin the following way:

(8)

Lemma 2.3. Let C be an[m,n,k]-code overFsuch that C has an efficient decoding algorithm that can

correct anα fraction of errors. For i∈[m]let ai∈_Fn_{be the i-th row of the generator matrix of C. Then,}

1. Let S0= (a1, . . . ,am,¯0, . . . ,¯0)∈(Fn)2m. Then S0is an(n,1,α/2)-NIS with an efficient interpolation algorithm.

2. Let S= (a1, . . . ,am)∈(Fn)mand suppose that the maximal Hamming weight of a codeword in C is

smaller than(1−2α)·m. Then S is an(n,1,α)-NIS with an efficient interpolation algorithm.

Proof.

1. Letqbe a degree-1 polynomial. The interpolation algorithm forS0will work as follows: we first look at the values ofq(x)on the lastmpoints (the zeros). The majority of these values will beq(0)

which will give us the constant term inq. Letq1(x) =q(x)−q(0)be the linear part ofq. Subtracting

q(0)from the values in the firstmcoordinates we reduce ourselves to the problem of recovering the homogeneous linear functionq₁from its values on(a1, . . . ,am)with at mostα·merrors. This

task is achieved using the decoding algorithm forC, since the vector(q1(a1), . . . ,q1(am))is just

the encoding of the vector of coefficients ofq1with the codeC.

2. Here we cannot decipher the constant term ofq(x)so easily. Let (v1, . . . ,vm)∈Fbe the list of input values given to us. (vi=q(ai)for a 1−α fraction of the valuesi.) What we will do is we will go over all p=|_F|possible choices for q(0) and for each “guess”c∈_Fdo the following: Subtractcfrom the values(v1, . . . ,vm)and then use the decoding algorithm ofCto decode the

vectorVc= (v1−c, . . . ,vm−c). It is clear that forc=q(0), this procedure will give us the list of

coefficients ofq(x)(without the constant term). Our task is then to figure out which invocation of the decoding algorithm is the correct one.

Say that the decoding algorithm, on inputVcreturned a linear polynomialqc(x)(there is no constant term). We can then check to see whetherqc(ai) +cis indeed equal tovifor a 1−α fraction of the valuesi. If we could show that this test will succeed only for a singlec∈_Fthen we are done and the lemma will follow. Suppose for contradiction that there are two linear polynomialsqc(x)and

qc0(x)such that both agree with a fraction 1−α of the input values. This means that there exist

two codewordsWc,Wc0∈_Fm(the encodings ofq_candq_c0) inCsuch that dist(Vc,Wc)≤_α·mand

dist(Vc0,W_c0)≤_α·m. Therefore

dist(Vc−Vc0,Wc−W_c0)≤2_α·m.

The vectorVc−Vc0 has the valuec0−c6=0 in all of its coordinates and so we get that the Hamming

weight of the codewordWc−Wc0 is at least(1−2_α)·m, contradicting the properties ofC.

Lemma 2.4. Let C be an[m,n,k]-code overFsuch that the dual C⊥ has minimal distance>r. Let

S⊆Fn be the set of points corresponding to the rows of the generator matrix of C. Then S satisfies

(9)

Proof. Let 0<d≤r/2 be an integer and letSd=S+. . .+S ,dtimes. We will prove the lemma by

showing that

{x∈Sd|Nd(x)>d} ⊆Sd−2,

(the reader is encouraged to reviewDefinition 1.4for the definition ofNd(x)). Letx∈Sd. We can thus write

x=a₁+· · ·+ad,

witha₁, . . . ,ad∈S. Suppose now thatNd(x)>d. This means that there exists a way to writexas a sum

of elements inSas follows

x=b1+· · ·+bd−1+c,

withc6∈ {a₁, . . . ,ad}. Combining the two ways to writexwe get

a1+· · ·+ad−b1− · · · −bd−1−c=0.

Since the distance of the dual is larger than 2dthe above equality must be a trivial one. That is, every vector must appear with a coefficient that is a multiple ofp(the characteristic). Sincec∈ {a/ ₁, . . . ,ad}, we infer thatcmust appear as one of the valuesbi(otherwise it would have coefficient−1). This means

thatxis inSd−2.

3 Constructing a NIS

In this section we proveTheorem 1.2andCorollary 1.3. We start with the proof ofCorollary 1.3, which is a simple application of known results in coding theory.

Proof ofCorollary 1.3. To prove the corollary, all we need to do is to construct, for alln∈N, an(n,1,ε)

-NISS(₁n)with an efficient interpolation algorithm andεwhich does not depend onn. The corollary will then follow by applyingTheorem 1.2.

To constructS1we take a good family of linear codes{Cn}, whereCnis an[mn,n,kn]-code overF (seeSection 2.2) that has an efficient decoding algorithm that can decode a constant fraction of errors and such that the generator matrix ofCncan be found in polynomial time (such families of codes are known to exists, see [10, Chap. 10]). Leta1, . . . ,amn ∈Fnbe the rows of its generator matrix. We defineS

(n)

1 to

be the list of points(a1, . . . ,amn,b1, . . . ,bmn), where for each j∈[mn]we setbj=0. That is,S

(n)

1 contains

the rows of the generator matrix of a good code, together with the points 0, taken with multiplicitymn.

Section 2.3now shows thatS(₁n)satisfies all the required conditions.

3.1 Proof ofTheorem 1.2

LetS1= (a1, . . . ,am)∈(Fn)mbe an(n,1,ε1)-NIS of size|S1|=m. LetSd−1= (b1, . . . ,br)∈(Fn)r be an(n,d−1,εd−1)-NIS of size|Sd−1|=r. LetAd−1andA1be interpolation algorithms forSd−1andS1

respectively, as described inDefinition 1.1. LetSd=Sd−1S1. We will prove the theorem by showing

thatSdhas an interpolation algorithm that makes at most a polynomial number of calls toAd−1and toA1

and can “correct” a fraction

εd=

ε1·εd−1

(10)

of errors. We will describe the algorithmAdand prove its correctness simultaneously (the number of calls

toAd−1andA1will clearly be polynomial and so we will not count them explicitly).

Fixq∈_Fd[x1, . . . ,xn]to be some degree-dpolynomial and letqdbe its homogeneous part of degree d. Let us denoteSd= (c1, . . . ,cmr), where eachciis inFn. We also denote bye= (e1, . . . ,emr)∈Fmr the list of “errors,” so that|{i∈[mr]|ei6=0}| ≤εd·mr. The listSdcan be partitioned in a natural way into

all the “shifts” of the listSd−1. Let us define for eachi∈[m]the listT(i)= (b1+ai, . . . ,br+ai)∈(Fn)r. We thus have thatSdis the concatenation ofT(1), . . . ,T(m)(the order does not matter here). We can also partition the list of errors in a similar way intomlistse(1), . . . ,e(m), each of lengthr, such thate(i)is the list of errors corresponding to the points inT(i).

We say that an indexi∈[m]isgoodif the fraction of errors inT(i)is at mostεd−1/2, otherwise we

say thatiisbad. That is,iis good if

|{j∈[r]|e(_ji)6=0}| ≤(εd−1/2)· |T(i)|= (εd−1/2)·r.

LetE={i∈[m]|iis bad}. From the bound on the total number of errors we get that

|E| ≤ε1·m. (3.1)

Overview The algorithm is divided into three steps. In the first step we look at all pairs(T(i),T(j))and from each one attempt to reconstruct, usingAd−1, the directional derivative∂qd(x,ai−aj). We will claim

that this step gives the correct output for most pairs(T(i)_,T(j)₎_{(namely, for all pairs in which both}_i_and _j

are good). In the second step we take all the directional derivatives obtained in the previous step (some of them erroneous) and from them reconstruct, usingA1, the vector of derivatives∆qd(x)and so also recover

qd(x). In the last step of the algorithm we recover the polynomialq≤d−1(x) =q(x)−qd(x), again using

Ad−1. This will give usq(x) =q≤d−1(x) +qd(x).

Step I: Let i6= j∈[m] be two good indices as defined above. We will show how to reconstruct ∂qd(x,ai−aj)from the values inT

(i)_,_T(j)_{. Recall that we have at our disposal two lists of values}

Li=q(b1+ai) +e(

i)

1 , . . . ,q(br+ai) +e

(i)

r

and

Lj=

q(b1+aj) +e

(j)

1 , . . . ,q(br+aj) +e

(j)

r

.

Taking the difference of the two lists we obtain the list

Li j=q(b1+ai)−q(b1+aj) +e(

i)

1 −e

(j)

1 , . . . ,q(br+ai)−q(br+aj) +e

(i)

r −e( j)

r

.

Observe that since i and j are both good we have that the term e_`(i)−e(_`j) is non zero for at most εd−1·rvalues of`∈[r]. Therefore, we can use algorithmAd−1to recover the degree-(d−1)polynomial

(11)

We carry out this first step for allpairs (T(i),T(j)) and obtain m₂ homogeneous polynomials of degreed−1, let us denote them byRi j(x). We know that ifiand jare both good then

Ri j(x) =∂qd(x,ai−aj).

If eitherior jare bad, we do not know anything aboutRi j(x)(we can assume without loss of generality

that it is homogeneous of degreed−1 since if it is not then clearly the pair is bad and we can modify it to any polynomial we wish, without increasing the total number of errors).

Step II: In this step we take the polynomials Ri j obtained in the first step and recover from them

the polynomial∆qd(x)(then, usingSection 2.2, we will also haveqd(x)). We start by setting up some

notations: The set of degree-(d−1)monomials is indexed by the set

Id−1={(α1, . . . ,αn)|0≤αi≤p−1,α1+. . .+αn=d−1}.

We denote by xα ₌

∏ixαii and also denote by coef(xα,h) the coefficient of the monomial xα in a

polynomialh(x). Letα ∈Id−1and define the degree-1 polynomial

Uα(y1, . . . ,yn) =

n

∑

`=1

coef

xα_,∂qd

∂x`

·y`.

Observe that

∂qd(x,ai−aj) = n

∑

`=1

(ai−aj)`·

∂qd

∂x`

(x)

=

_∑

α∈Id−1

xα_·U

α(ai−aj).

Therefore, for each pairi,jsuch that bothiand jare good we can get the (correct) valuesUα(ai−aj)for

allα∈Id−1by observing the coefficients ofRi j.

Fix someα∈Id−1. Using the procedure implied above for all pairsi6=j∈[m], we get m₂

values

ui j∈Fsuch that ifiand jare good then

ui j=Uα(ai−aj).

We now show how to recoverUα from the valuesui j. Repeating this procedure for allα ∈Id−1will give

us∆q_d(x).

Since we fixedα let us denote byU(y) =Uα(y). We have at out disposal a list of values(ui j)i,j∈[m]

such that there exists a setE ={i∈[m]|iis bad} of size |E| ≤ε1·m such that if i and j are not

in E then ui j =U(ai−aj). Let us partition the list (ui j) according to the index j into m disjoint lists: Bj = (u1j, . . . ,um j). If j6∈E than the list Bj contains the values of the degree-1 polynomial

Uj(y) =U(y−aj)on the setS1with at mostε1·merrors (the errors will correspond to indicesi∈E).

Therefore, we can useA1in order to reconstructUj, and from itU. The “problem” is that we do not

know which values jare good. This problem can be easily solved by applying the above procedure for all

j∈[m]and then taking the majority vote. Since all the good values of jwill return the correctU(y)we will have a clear majority of at least a 1−ε1fraction. Combining all of the above gives us the polynomial

(12)

Step III: Having recoveredqdin the last step we now subtract the valueqd(ci)from the input list of

values (these are the values ofq(x)onSd, with≤εd fraction of errors). This reduces us to the problem of

recovering the degree-(d−1)polynomialq≤d−1(x) =q(x)−qd(x)from its values inSd with a fraction

εdof errors. Solving this problem is easy to do by using the algorithmAd−1on the values in each listT(j)

(defined in the beginning of this section) and then taking the majority. Since for a good value j, the list

T(j)contains at mostεd−1·rerrors, and since more than half of the values jare good, we will get a clear

majority and so be able to recoverq≤d−1.

4 Constructing a proper NIS

In this section we proveTheorem 1.5andCorollary 1.6. As before we start with the proof ofCorollary 1.6

and then go on to prove the theorem.

Proof ofCorollary 1.6. This will be similar to the proof ofCorollary 1.3. All we need to do is find, for

eachn, a proper(n,1,ε)-NIS that satisfies conditionFd. In order to do so we will need a good family of

codes{Cn}, such that eachCnis an[mn,n,kn]-code, which satisfies the following three conditions:

1. {Cn}has an efficient decoding algorithm that can decode a constant fractionα=Ω(1)of errors.

2. The minimal distance of the dualC_n⊥is larger than 2·d.

3. Each codeword inCnhas Hamming weight at most(1−2α)·mn.

Such a family of codes can be constructed explicitly using standard techniques (seeSection 6).

The points inS₁are given by the rows of the generator matrix of the codeCn. The rows are all distinct

since otherwise we would have a codeword of weight 2 in the dual. FromSection 2.4we get thatS1

satisfies conditionFdand from item 2. ofSection 2.3we get thatS1is an(n,1,α)-proper NIS with an efficient interpolation algorithm.

4.1 Proof ofTheorem 1.5

The proof ofTheorem 1.5uses the same arguments as the proof ofTheorem 1.2. The only difference is that the “shifts” ofSd−1(denoted byTi,i=1, . . . ,m) might intersect. However, using the fact thatS1

satisfies conditionFk fork≥d, we will be able control the effect of these intersections. To be more

precise, the only place where the proof will differ from the proof ofTheorem 1.2is where we claim that there are not too many “bad” indicesi∈[m](see below for details). The rest of the proof will be identical and so we will not repeat the parts that we do not have to.

We start by setting the notations for the proof. Let S1⊆Fn be a proper (n,1,ε1)-NIS such that

|S1|=mand letSd−1⊆Fn be a proper(n,d−1,εd−1)-NIS obtained by summingS1 with itselfd−1

times. LetAd−1 andA1be interpolation algorithms forSd−1andS1 respectively. LetSd =Sd−1+S1.

As before, the theorem will follow by giving an interpolation algorithm for Sd that makes at most a polynomial number of calls toAd−1and toA1and can “correct” a fraction

εd=

εd−1·ε1

(13)

of errors (the expression forεd appearing in the theorem can be deduced from this equation by a simple

induction).

Fix a degree-dpolynomialq∈Fd[x1, . . . ,xn]and letqdbe its homogeneous part of degreed. Write

S1={a1, . . . ,am} (theai are distinct). The setSd can be divided intom (possibly overlapping) sets T1, . . . ,Tmsuch that

Ti=Sd−1+ai and Sd=

[

i∈[m]

Ti.

Our algorithm is given a list of values ofq(x)on the setSd with at mostεd· |Sd|errors. LetB⊆Sd be the

set of points inSdsuch that the value ofq(x)on these points contains an error. We therefore have

|B| ≤εd· |Sd|. (4.2)

We say that an indexi∈[m]isbadif

|Ti∩B|>(εd−1/2)· |Ti|= (εd−1/2)· |Sd−1|.

The next claim bounds the number of bad valuesi.

Claim 4.1. Let E={i∈[m]|i is bad}. Then

|E| ≤ε1·m.

If we can proveClaim 4.1then the rest of the proof ofTheorem 1.5will be identical to the proof of

Theorem 1.2(starting from the part titled “Step I”), as the reader can easily verify. IndeedClaim 4.1is the analog of Equation (3.1) for the case of proper sets. We therefore end this section with a proof of

Claim 4.1.

4.1.1 Proof ofClaim 4.1

In order to proveClaim 4.1we will require the following auxiliary claim on the growth rate of the sumsets ofS.

Claim 4.2. Let S⊆Fnbe of size m such that S satisfies conditionFkand let the sets S0,S1, . . . ,Sk be

as inDefinition 1.4(the sumsets of S). Then there exists a constant C₁depending only on k such that if

|S|>C1then for every d≤k we have

|Sd| ≥ 1

2d· |Sd−1| · |S|. (4.3)

Proof. The proof is by induction ond. Ford=1 the bound in (4.3) is trivial. Suppose the bound holds

ford−1 and lets prove it ford. Let us denoteS={a₁, . . . ,am}andTi=Sd−1+ai. ClearlySd=

Sm

i=1Ti.

We will lower bound the number of points inSdin the following way: For eachiwe will take all points

x∈Ti such thatNd(x)≤d and than divide the number of points we got byd. Since there are at most |Sd−2|elementsx∈Sd withNd(x)>d, we get the bound

|Sd| ≥ 1 d·

m

∑

i=1

(|Ti| − |Sd−2|)

= 1

d· |S| · |Sd−1| −

1

(14)

Using the inductive hypothesis on the second term yields

|Sd| ≥

1

d· |S| · |Sd−1| −

2d−2

d · |Sd−1|

≥ 1

2d· |S| · |Sd−1|,

where the last inequality holds for sufficiently large|S|(as a function ofd).

Proof ofClaim 4.1. Lett=|E|and suppose in contradiction thatt>ε1·m=ε1· |S|. This means that

there are at leastt setsTi₁, . . . ,Tit such that|B∩Tij| ≥(εd−1/2)· |Sd−1|. From conditionFd we know

that, apart from a set of size at most|Sd−2|, each “bad” element appears in at mostdsetsTij. Therefore

εd· |Sd| ≥ |B| ≥

1

d·ε1· |S| · _ε_d₋₁

2 · |Sd−1| − |Sd−2|

= εd−1·ε1

2d · |Sd−1| · |S| − ε1

d · |S| · |Sd−2| ≥ εd−1·ε1

2d · |Sd| − ε1

d · |S| · |Sd−2|.

UsingClaim 4.2twice we get that

|S| · |Sd−2| ≤2(d−1)· |Sd−1| ≤4d(d−1)·

|Sd| |S| .

Plugging this inequality into the previous one we get that

εd· |Sd| ≥

εd−1·ε1

2d · |Sd| −ε1·4·(d−1)· |Sd|

|S| ,

which after division by|Sd|gives

εd ≥

εd−1·ε1

2d −

ε1·4·(d−1)

|S|

> εd−1·ε1

3d ,

(whenm=|S|is sufficiently large, as a function ofε1andd) contradicting Equation (4.2).

5 Other finite fields

We briefly sketch how one can extend Theorems1.2and1.5to non-prime finite fields.

The main difference is that over extension fields, one has to considerthe weight-degreeinstead of the degree. Specifically, letFbe a field of order prfor a primep. Given an integerd<prletd=∑ri=−01aipi,

0≤ai<p, be its base-pexpansion. The weight ofd, denoted wt(d), is defined as wt(d) =∑ri=−01ai. For

example, the weight of 14 overF9is 4 as 14=1·9+1·3+2 and 1+1+2=4. The weight-degree of a

monomialxdis simply the weight ofd. Similarly the weight-degree of a multivariate monomial∏ni=1x

(15)

is∑ni=1wt(di). The weight-degree of a polynomialq(x1, . . . ,xn)is the maximum of the weight-degrees of

its monomials. Intuitively, the weight-degree is the correct notion of degree as the polynomialxpi is in fact a linear map overF, and soxaip

i

“behaves” like a polynomial of degreeai.

It is not hard to see that polynomials of weight-degreed in Fpr[x₁, . . . ,xn]that map (_F_pr)n to_F

correspond to degree-dpolynomials innrvariables overFpand vice versa (see, e. g., [4]).

Now, given a polynomialqof weight-degreedoverFpr and anyα∈Fpr, letq_α(x₁, . . . ,xn)be defined

asq_α=Tr(α·q(x1, . . . ,xn)), where Tr is the trace operator; Tr(x) =∑ri=−01x

pi

. It is easy to see thatq_α is of weight-degree at mostdand thatq_α:(Fpr)n→_F. It is also not hard to see that if one can learn all the

valuesqαthen one can easily reconstructq(by a simple linear transformation).

The argument above shows that in order to construct NIS for polynomials of weight-degreedover Fpr it is sufficient to construct NIS for polynomials of degree d in nrvariables. This explains how

Theorems1.2and1.5can be extended to non-prime finite fields. We thus obtain the following Theorem that enables an immediate translation of the results stated in Theorems1.2and1.5and Corollaries1.3,

1.6and1.7to the context of fields of prime power order.

Theorem 5.1. Let p be a prime number, n,r and d integers, andε>0a real number. Fix a basis forFpr overFp. If Sdis an(nr,d,ε)-NIS overFpthen, when viewed as a collection of n-tuples overFpr according

to the fixed basis, Sd is an(n,d,ε)-NIS overFpr (i. e., it is a NIS for polynomials of weight-degree d

overFpr). Moreover, if Sd has an interpolation algorithm that runs in time T over_F_pthen it has such

an algorithm also when viewed as a NIS overFpr. The algorithm can handle the same amount of errors

and its running time is O(rT) +poly(n). In addition, if Sdis a proper NIS overFpthen it is a proper NIS

overFpr.

We leave the exact details of the proof to the reader.

6 A family of codes with special properties

In the proof ofCorollary 1.6we used a family of good linear error-correcting codes{Cn}, over a fieldF of prime orderp, such thatCnhas message lengthnand such that the following three properties

1. The family{Cn}has an efficient encoding/decoding algorithms from a constant fraction of errors.

2. The dual distance ofCnisω(1).

3. If the block length ofCnismnthen every codeword inCnhas Hamming weight at most(1−δ)·mn for some constantδ >0.

Efficient codes with good dual distance are known to exists over any finite field. Results of this type can be found, e. g., in [10]. It is possible that some of these codes already posses property (3) above, however, we could not find a proof for that anywhere. Since we only require the dual distance to be super-constant we can give a simple self-contained construction satisfying all three conditions above. We sketch the argument below without giving a complete proof.

Say we wish to construct a code as above for messages of lengthn. We pick an integer`such that

(16)

that` <log(n). We construct a fieldEof orderp`(this can be done deterministically in time poly(n)). We now take a Reed-Solomon codeRoverE, of block length p`, whose messages are univariate polynomials of degreeα·p`, for some constantα. Notice that this code has minimal distance(1−α)·p` (over

E), relative rateα, and its dual distance is at leastα·p`. Letϕ:E→F`be the natural representation ofEas`-tuples overF. We can extendϕ in the natural way to be a mapping fromEp

`

to(F`)p

`

. In particular we can think ofϕ(R)as a code overFof the same relative rate and of at least the same minimal distance. It is not hard to see that there exists an invertible linear transformationA:F`→F`, such that

v= (v1, . . . ,vp`)∈R⊥if and only if4(Aϕ(v₁), . . . ,Aϕ(v_p`))∈ϕ(R)⊥(there is a slight abuse of notation

here, but the meaning is clear). In particular we get that the minimal distance ofϕ(R)⊥is the same as the minimal distance ofR⊥.

Now, for every 1≤i≤p`we pick a code ˜Ci:F`→Fmi, such thatmi=O(`)and such that∑p

`

i=1mi=n.

Moreover, we would like the codes ˜Cito satisfy the three properties stated above. It is not hard to play a little with the parameters to get that such ˜Ci andmiexist. We also note that the ˜Cican be constructed recursively. We now “concatenate” the code5ϕ(R)with the ˜Cisuch thatCiis applied on thei-th block, of

length`, ofϕ(R). Denote the resulting code withCn. It is now easy to verify that indeed the block length ofCnisn, that it has constant rate and linear minimal distance, and thatC⊥_n has minimal distanceω(1).

Acknowledgement

We would like to thank Parikshit Gopalan for bringing the problem to our attention, Emanuele Viola for sharing the manuscript of [14] with us, and Shachar Lovett and Chris Umans for helpful discussions. We also thank the anonymous reviewers for comments that improved the presentations of the results. We are thankful to Laci Babai for helpful comments.

References

[1] SANJEEV ARORA, CARSTEN LUND, RAJEEV MOTWANI, MADHU SUDAN, AND MARIO SZEGEDY: Proof verification and the hardness of approximation problems. J. ACM, 45(3):501–555, 1998. [doi:10.1145/278298.278306] 2

[2] SANJEEVARORA ANDSHMUELSAFRA: Probabilistic checking of proofs: A new characterization of NP. J. ACM, 45(1):70–122, 1998. [doi:10.1145/273865.273901] 2

[3] L ´ASZLO´ BABAI, LANCEFORTNOW, NOAMNISAN,ANDAVIWIGDERSON: BPP has subexponen-tial time simulations unless EXPTIME has publishable proofs. Comput. Complexity, 3(4):307–318, 1993. [doi:10.1007/BF01275486] 2

[4] ELI BEN-SASSON AND MADHUSUDAN: Limits on the rate of locally testable affine-invariant codes. Electron. Colloq. on Comput. Complexity (ECCC), 17:108, 2010. [ECCC:TR10-108] 15 4_{We would like to have}_ϕ(_R₎⊥₌_ϕ(_R⊥₎_{, however this is not true, so we have}_A_{that “fixes” this.}

(17)

[5] ANDREJBOGDANOV: Pseudorandom generators for low degree polynomials. InProc. 37th STOC, pp. 21–30. ACM Press, 2005. [doi:10.1145/1060590.1060594] 3

[6] ANDREJ BOGDANOV AND EMANUELE VIOLA: Pseudorandom bits for polynomials. SIAM J.

Comput., 39(6):2464–2486, 2010. [doi:10.1137/070712109] 3

[7] RUSSELL IMPAGLIAZZO AND AVI WIGDERSON: P=BPP unless E has subexponential cir-cuits: Derandomizing the XOR lemma. InProc. 29th STOC, pp. 220–229. ACM Press, 1997. [doi:10.1145/258533.258590] 2

[8] J. JUSTESEN: A class of constructive asymptotically good algebraic codes. IEEE Trans. Inform.

Theory, 18(5):652–656, 1972. [doi:10.1109/TIT.1972.1054893] 16

[9] SHACHARLOVETT: Unconditional pseudorandom generators for low degree polynomials. Theory

of Computing, 5(1):69–82, 2009. [doi:10.4086/toc.2009.v005a003] 3

[10] F. J. MACWILLIAMS AND N. J. A. SLOANE: The Theory of Error-Correcting Codes, Part II. North-Holland, 1977. 2,9,15

[11] RONENSHALTIEL ANDCHRISTOPHERUMANS: Simple extractors for all min-entropies and a new pseudorandom generator. J. ACM, 52(2):172–216, 2005. [doi:10.1145/1059513.1059516] 2

[12] MADHUSUDAN, LUCATREVISAN,ANDSALILVADHAN: Pseudorandom generators without the XOR lemma. J. Comput. System Sci., 62(2):236–266, 2001. [doi:10.1006/jcss.2000.1730] 2

[13] AMNONTA-SHMA, DAVIDZUCKERMAN,ANDSHMUELSAFRA: Extractors from Reed-Muller codes. J. Comput. System Sci., 72(5):786–812, 2006. [doi:10.1016/j.jcss.2005.05.010] 2

[14] EMANUELEVIOLA: The sum ofdsmall-bias generators fools polynomials of degreed. Comput.

Complexity, 18(2):209–217, 2009. [doi:10.1007/s00037-009-0273-5] 3,16

AUTHORS

Zeev Dvir

post-doctoral researcher

Princeton University, Princeton, NJ zeev dvir gmail com

http://www.cs.princeton.edu/~zdvir/

Amir Shpilka associate professor

Technion – Israel Institute of Technology, Haifa, Israel shpilka cs technion ac il

(18)

ABOUT THE AUTHORS

ZEEVDVIRis currently a postdoc in the Computer Science department atPrinceton Univer-sity. He received his Ph. D. from theWeizmann Institutein Israel in 2008. His advisors wereRan RazandAmir Shpilka. He is interested mainly in questions in computational complexity that have a connection with classical mathematics.