We now give a scheme achieving essentially optimal tradeoffs between annotation length and space usage for multiplying a $b \times c$ integer matrix $A$ by a $c$-dimensional vector $x$.
Theorem 3.9.1. Consider a data stream containing the entries of a $b \times c$ matrix $A$ and a $c$-dimensional vector $x$, in some arbitrary order, possibly interleaved. We assume that all entries of $A$ and $x$ are integers of absolute value polynomial in $b$ and $c$. For any positive integers $c_a, c_v$ such that $c_a c_v \ge c$, there is an online $(b c_a \log(b + c),\, c_v \log(b + c))$-scheme for computing the product $Ax$. Moreover, any $(c_a, c_v)$ protocol requires $c_a \cdot c_v = \Omega(\min(c, b)^2)$ bits for matrices with $\Omega(b \cdot c)$ non-zero entries.
Proof. We begin with the upper bound. The protocol for verifying inner products, which follows from Theorem 3.5.2, treats a $c$-dimensional vector (such as a row of $A$) as a $c_a \times c_v$ array $H$, where $c_a \cdot c_v \ge c$. This defines a bivariate polynomial $h$ over a suitably large field $\mathbb{F}_q$, such that $h$ has degree $c_a$ in its first variable and $c_v$ in its second variable, and such that $h(x, y) = H_{x,y}$ for all $(x, y) \in [c_a] \times [c_v]$. For an inner product between two vectors (such as a row of $A$ and the vector $x$, treated as arrays $H$ and $G$ respectively), we wish to compute
$$\sum_{x \in [c_a],\, y \in [c_v]} G_{x,y} H_{x,y} = \sum_{x \in [c_a],\, y \in [c_v]} g(x, y)\, h(x, y)$$
for the corresponding arrays $G, H$ and polynomials $g, h$. These polynomials can be evaluated at locations outside $[c_a] \times [c_v]$, so in the protocol V picks a random position $r$ and evaluates $h(r, y)$ and $g(r, y)$ for $1 \le y \le c_v$. P then presents a degree $c_a$ polynomial $p(X)$ which is claimed to be $\sum_{y=1}^{c_v} g(X, y)\, h(X, y)$. V checks that $p(r) = \sum_{y=1}^{c_v} g(r, y)\, h(r, y)$, and if so accepts $\sum_{x=1}^{c_a} p(x)$ as the correct answer. Theorem 3.5.2 shows how V can compute $h(r, y)$ efficiently as $H$ is defined incrementally in the stream: each addition of $\delta$ to a particular index is mapped to $(x, y) \in [c_a] \times [c_v]$, which causes $h(r, y) \leftarrow h(r, y) + \delta \cdot \chi_{x,y}(r)$, where $\chi_{x,y}$ is a Lagrange polynomial. Equivalently, the final value of $h(r, y)$ over updates in the stream, where the $j$th update is $t_j = (\delta_j, x_j, y_j)$, is $h(r, y) = \sum_{t_j : y_j = y} \delta_j \cdot \chi_{x_j,y}(r)$.
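The single inner-product protocol described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the construction of Theorem 3.5.2 itself: the modulus `Q`, the layout `to_xy`, the helper names, and the choice to send $p$ as its values at the points $1, \ldots, 2c_a - 1$ are all hypothetical. With interpolation nodes $1, \ldots, c_a$, the product $g(X, y)\, h(X, y)$ has degree at most $2(c_a - 1)$ in $X$, which is why $2c_a - 1$ values suffice here.

```python
Q = 2**61 - 1  # prime modulus standing in for the field F_q (assumed choice)

def basis(nodes, k, r):
    """Value at r of the Lagrange basis polynomial that is 1 at nodes[k], 0 at other nodes."""
    num = den = 1
    for m, node in enumerate(nodes):
        if m != k:
            num = num * (r - node) % Q
            den = den * (nodes[k] - node) % Q
    return num * pow(den, -1, Q) % Q

def to_xy(t, ca):
    """Hypothetical layout: vector index t (0-based) -> array cell (x, y), both 0-based."""
    return t % ca, t // ca

def stream_eval(vec, ca, cv, r):
    """V's pass over one vector: maintain h(r, y) for every y, one update per item."""
    nodes = list(range(1, ca + 1))
    h_r = [0] * cv
    for t, delta in enumerate(vec):            # each item adds delta at index t
        x, y = to_xy(t, ca)
        h_r[y] = (h_r[y] + delta * basis(nodes, x, r)) % Q
    return h_r

def prover_poly(u, w, ca, cv):
    """Honest P: values of p(X) = sum_y g(X, y) h(X, y) at X = 1 .. 2*ca - 1."""
    return [sum(a * b for a, b in zip(stream_eval(u, ca, cv, X),
                                      stream_eval(w, ca, cv, X))) % Q
            for X in range(1, 2 * ca)]

def verify(p_vals, g_r, h_r, ca, r):
    """Check p(r) against sum_y g(r, y) h(r, y); on success return sum_{x=1..ca} p(x)."""
    pts = list(range(1, len(p_vals) + 1))
    p_r = sum(v * basis(pts, k, r) for k, v in enumerate(p_vals)) % Q
    if p_r != sum(a * b for a, b in zip(g_r, h_r)) % Q:
        return None                            # reject the annotation
    return sum(p_vals[:ca]) % Q                # inner product, reduced mod Q
```

In use, V would draw `r` uniformly from the field (for example with `secrets.randbelow(Q)`) before the stream starts and keep it hidden from P. The returned answer is a residue modulo `Q`, so a negative inner product appears as `Q` minus its absolute value.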
To run this protocol over multiple vectors in parallel naively would require keeping the h(r, y) values implied by each different vector separately, which would be costly, as it would increase both the annotation and the space usage by a factor of b relative to a single inner product query. Our observation is that rather than keep these values explicitly, it is sufficient to keep only a fingerprint of these values, using the linearity of fingerprint functions to finally test whether the polynomials provided by P for each vector together agree with the stored values.
In our setting, the $b \times c$ matrix $A$ implies $b$ bivariate polynomials $h_1, \ldots, h_b$ of degree $c_a$ in the first variable and $c_v$ in the second. We evaluate each polynomial at $(r, y)$ for $1 \le y \le c_v$, using the same value of $r$ for all of them: since each test is fooled by P with small probability, the chance that none of them is fooled can be kept high by choosing the field over which the polynomials are evaluated to have size polynomial in $b + c$. Thus, conceptually, the parallel invocation of $b$ instances of this protocol requires us to store $h_i(r, y)$ for $1 \le y \le c_v$ and $1 \le i \le b$ (for the $b$ rows of $A$), as well as $g(r, y)$ for $1 \le y \le c_v$ (where $g$ is the polynomial derived from $x$). Rather than store this set of $b \cdot c_v$ values explicitly, V instead stores only $c_v$ fingerprints, one for each value of $y$, where each fingerprint captures the vector of $b$ values $h_i(r, y)$.
From the definition of our fingerprinting function in Lemma 3.2.1, this means that over stream updates $t_j = (\delta_j, i_j, x_j, y_j)$ of weight $\delta_j$ to row $i_j$ and the column indexed by $x_j$ and $y_j$, we compute one fingerprint $z_y$ for each value $y \in [c_v]$:
$$z_y = \sum_{i=1}^{b} h_i(r, y)\, \alpha^i = \sum_{i=1}^{b} \sum_{t_j : y_j = y,\, i_j = i} \delta_j \cdot \chi_{x_j,y}(r)\, \alpha^i,$$
where $\alpha$ is chosen uniformly at random from $\mathbb{F}_q$ as in Lemma 3.2.1. Observe that for each $y$ this can be computed incrementally in the stream by storing only $r$, $\alpha$, and the current value of $z_y$.
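The incremental computation of the fingerprints $z_y$ can be sketched as follows. This is only an illustration: the modulus `Q`, the function names, and the convention that updates carry a 1-based row index `i` and 0-based cell coordinates `x`, `y` are assumptions of the sketch, not part of Lemma 3.2.1.

```python
Q = 2**61 - 1  # prime modulus standing in for the field F_q (assumed choice)

def basis(nodes, k, r):
    """Value at r of the Lagrange basis polynomial that is 1 at nodes[k], 0 at other nodes."""
    num = den = 1
    for m, node in enumerate(nodes):
        if m != k:
            num = num * (r - node) % Q
            den = den * (nodes[k] - node) % Q
    return num * pow(den, -1, Q) % Q

def fingerprints(updates, ca, cv, r, alpha):
    """Maintain z_y = sum_i h_i(r, y) * alpha^i over updates (delta, i, x, y).

    Only r, alpha and the cv running values z[y] are kept; the b * cv
    individual values h_i(r, y) are never stored.
    """
    nodes = list(range(1, ca + 1))
    z = [0] * cv
    for delta, i, x, y in updates:             # any order, possibly interleaved
        z[y] = (z[y] + delta * basis(nodes, x, r) * pow(alpha, i, Q)) % Q
    return z
```

Because each update contributes an additive term to a single $z_y$, the result does not depend on the order in which the entries of $A$ arrive, which is what allows the rows to be interleaved in the stream.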
To verify correctness, V receives the $b$ polynomials $p_i$, one for each row, and incrementally builds a fingerprint $z^*$ of the $b$-dimensional vector whose $i$th entry is $p_i(r)$. V then tests whether
$$\sum_{y=1}^{c_v} z_y\, g(r, y) = z^*.$$
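To see the correctness of this, we expand the left-hand side as
$$\sum_{y=1}^{c_v} z_y\, g(r, y) = \sum_{y=1}^{c_v} \sum_{i=1}^{b} h_i(r, y)\, \alpha^i\, g(r, y) = \sum_{y=1}^{c_v} \sum_{i=1}^{b} g(r, y)\, h_i(r, y)\, \alpha^i = \sum_{i=1}^{b} \sum_{y=1}^{c_v} g(r, y)\, h_i(r, y)\, \alpha^i.$$
Likewise, if all the $p_i$'s are as claimed, then
$$z^* = \sum_{i=1}^{b} p_i(r)\, \alpha^i = \sum_{i=1}^{b} \Bigl( \sum_{y=1}^{c_v} g(r, y)\, h_i(r, y) \Bigr) \alpha^i.$$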
Thus, if the $p_i$'s are as claimed, then these two fingerprints will match. Moreover, by the Schwartz-Zippel lemma (Lemma 2.2.1), and the fact that $\alpha$ and $r$ are picked uniformly at random from $\mathbb{F}_q$ by V and are not known to P, if the $p_i$'s are not as claimed then with high probability the fingerprints will fail to match, provided the polynomials are evaluated over a field of size polynomial in $b + c$.
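The complete check, with $c_v$ fingerprints on V's side and $b$ polynomials from P, might look like the sketch below. It is a hedged illustration rather than the scheme's definitive form: the modulus `Q`, the layout `t -> (t % ca, t // ca)`, the function names, and the representation of each $p_i$ by its values at $1, \ldots, 2c_a - 1$ are assumptions made here. For brevity the matrix is passed in whole, whereas in the streaming setting its entries would arrive one at a time.

```python
Q = 2**61 - 1  # prime modulus standing in for the field F_q (assumed choice)

def basis(nodes, k, r):
    """Value at r of the Lagrange basis polynomial that is 1 at nodes[k], 0 at other nodes."""
    num = den = 1
    for m, node in enumerate(nodes):
        if m != k:
            num = num * (r - node) % Q
            den = den * (nodes[k] - node) % Q
    return num * pow(den, -1, Q) % Q

def evals(vec, ca, cv, r):
    """h(r, y) for y = 0 .. cv-1, with vector index t stored in cell (t % ca, t // ca)."""
    nodes = list(range(1, ca + 1))
    out = [0] * cv
    for t, delta in enumerate(vec):
        out[t // ca] = (out[t // ca] + delta * basis(nodes, t % ca, r)) % Q
    return out

def honest_proofs(A, x, ca, cv):
    """Honest P: for each row i, the values of p_i(X) at X = 1 .. 2*ca - 1."""
    return [[sum(a * b for a, b in zip(evals(row, ca, cv, X), evals(x, ca, cv, X))) % Q
             for X in range(1, 2 * ca)] for row in A]

def verify_product(A, x, ca, cv, r, alpha, proofs):
    """Return the entries of Ax (mod Q) if sum_y z_y g(r, y) == z*, else None."""
    g_r = evals(x, ca, cv, r)
    z = [0] * cv                               # one fingerprint per y
    for i, row in enumerate(A, start=1):
        for y, val in enumerate(evals(row, ca, cv, r)):
            z[y] = (z[y] + val * pow(alpha, i, Q)) % Q
    pts = list(range(1, 2 * ca))
    z_star, answer = 0, []
    for i, p_vals in enumerate(proofs, start=1):
        p_r = sum(v * basis(pts, k, r) for k, v in enumerate(p_vals)) % Q
        z_star = (z_star + p_r * pow(alpha, i, Q)) % Q   # fingerprint of (p_1(r), ..., p_b(r))
        answer.append(sum(p_vals[:ca]) % Q)    # kept in a list only for this demo
    lhs = sum(zy * gy for zy, gy in zip(z, g_r)) % Q
    return answer if lhs == z_star else None
```

The single comparison `lhs == z_star` replaces the $b$ separate checks $p_i(r) = \sum_y g(r, y)\, h_i(r, y)$, which is exactly the linearity argument made in the expansion above.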
To analyze the vcost, we observe that V can compute all fingerprints in $O(c_v)$ space. As P provides each polynomial $p_i(x)$ in turn, V can incrementally compute $z^*$ and check that this matches $\sum_{y=1}^{c_v} z_y\, g(r, y)$. At the same time, V also computes $\sum_{x=1}^{c_a} p_i(x)$ for each $1 \le i \le b$, which gives the $i$th entry of $Ax$. Note that if the $p_i$ are sent one after another, V can forget each previous $p_i$ after the required fingerprints and evaluations have been made; and if $c_a$ is larger than $c_v$, V does not even need to keep $p_i$ in memory, but can instead evaluate it term by term in parallel for each value of $x$. Thus the total space needed by V is dominated by the $c_v$ fingerprints and check values.
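The total size of the information sent by P is dominated by the $b$ polynomials of degree $c_a$, each described by $O(c_a)$ elements of $\mathbb{F}_q$ of $O(\log(b + c))$ bits, which gives the $b c_a \log(b + c)$ annotation length.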
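To get a feel for the tradeoff, the two costs can be tabulated for a given split of $c$ into $c_a$ and $c_v$. The helper below is a rough sketch that ignores constant factors; its name and the use of $\lceil \log_2(b + c) \rceil$ as the bit length of a field element are assumptions made for illustration.

```python
import math

def scheme_costs(b, c, ca):
    """Return (c_v, annotation_bits, space_bits), ignoring constant factors."""
    cv = math.ceil(c / ca)                 # smallest c_v with c_a * c_v >= c
    word = math.ceil(math.log2(b + c))     # assumed bits per field element
    return cv, b * ca * word, cv * word
```

For instance, `scheme_costs(1000, 10000, 100)` balances the split at $c_a = c_v = 100$, while a larger `ca` shifts cost from V's space to P's annotation.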
To prove the lower bound, we give a simple reduction of index to matrix-vector multiplication. Suppose we have an instance $(x, k)$ of index where $x \in \{0, 1\}^{n^2}$, $k \in [n^2]$. Alice constructs an $n \times n$ matrix $A$ from $x$ alone, in which $A_{i,j} = 1$ if $x_{f(i,j)} = 1$, where $f$ is a 1-1 correspondence $[n] \times [n] \to [n^2]$, and $A_{i,j} = 0$ otherwise. Assume $f(i, j) = k$. Bob then constructs a vector $v \in \mathbb{R}^n$ such that $v_j = 1$ and all other entries of $v$ are 0. Then the $i$th entry of $Av$ is $A_{i,j}$, which is 1 if and only if $x_{f(i,j)} = 1$, and therefore the value of $x_{f(i,j)}$ can be extracted from the vector $Av$. Therefore, if we had a $(c_a, c_v)$ protocol for verifying matrix-vector multiplication given an $n \times n$ matrix $A$ (even for a stream in which all entries of $A$ come before all entries of the vector), we would obtain a $(\sqrt{c_a}, \sqrt{c_v})$ protocol for index. The lower bound for matrix-vector multiplication thus holds by a lower bound for index given in Theorem 3.3.2.
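The reduction can be checked on a small instance. The sketch below assumes the particular bijection $f(i, j) = i \cdot n + j$ with 0-based indices, and the function names are hypothetical. Bob's vector has a single 1 in the column position $j$ determined by $k$, and the queried bit is read from entry $i$ of the product.

```python
def alice_matrix(bits, n):
    """Alice: A[i][j] = bits[f(i, j)] with f(i, j) = i * n + j (one possible bijection)."""
    return [[bits[i * n + j] for j in range(n)] for i in range(n)]

def bob_query(k, n):
    """Bob: unit vector selecting column j, plus the row i to read, where f(i, j) = k."""
    i, j = divmod(k, n)
    v = [0] * n
    v[j] = 1
    return v, i

def read_bit(A, v, i):
    """Entry i of the matrix-vector product A v, which equals A[i][j] = bits[k]."""
    return sum(a * c for a, c in zip(A[i], v))
```

Running `read_bit(alice_matrix(bits, n), *bob_query(k, n))` for every `k` recovers each bit of the string, so any verified matrix-vector product answers the index query.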