In this chapter, we introduced the model of streaming interactive proofs and demonstrated that streaming interactive proofs require exponentially less space and communication than annotated data streaming protocols for a large class of problems. We also observed that the GKR protocol can be made to work with a streaming verifier and presented improved protocols for two specific problems of fundamental importance in streaming and database processing: $F_2$ and heavy hitters. These protocols are meant to be illustrative, and although they were tailored to specific problems, they introduced important techniques that we will utilize repeatedly as we develop general-purpose protocols in Chapters 7 and 8.
Chapter 7
Practical Verified Computation with Streaming Interactive Proofs
In this chapter, we revisit the GKR protocol and show how to reduce the runtime of the prover from $\Omega(S^3)$ in a naive implementation down to $O(S \log S)$, where $S$ is the size of an arithmetic circuit computing the function of interest. We also describe a full implementation of the protocol, demonstrating much greater scalability than one might have expected. Finally, we describe a parallel implementation of the protocol that leverages Graphics Processing Units (GPUs) and experimentally demonstrate the GKR protocol's substantial amenability to parallelization.
7.1 Overview and Statement of Results
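Recall from Chapter 5 that in the GKR protocol, P and V first agree on an arithmetic circuit $C$ of size $S$ and fan-in 2 over a finite field $\mathbb{F}$ computing the function of interest. The protocol proceeds in iterations, with one iteration per layer of $C$. In the $i$th iteration, the sum-check protocol is applied to the polynomial $f_z^{(i)} : \mathbb{F}^{s_i + 2s_{i+1}} \to \mathbb{F}$ defined via:
\[
f_z^{(i)}(p, \omega_1, \omega_2) = \beta_{s_i}(z, p) \cdot \Big( \widetilde{\mathrm{add}}_i(p, \omega_1, \omega_2) \big( \tilde{V}_{i+1}(\omega_1) + \tilde{V}_{i+1}(\omega_2) \big) + \widetilde{\mathrm{mult}}_i(p, \omega_1, \omega_2)\, \tilde{V}_{i+1}(\omega_1) \cdot \tilde{V}_{i+1}(\omega_2) \Big), \tag{7.1}
\]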
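where $\beta_{s_i}$, $\widetilde{\mathrm{add}}_i$, $\widetilde{\mathrm{mult}}_i$, and $\tilde{V}_{i+1}$ are as defined in Chapter 5. In the $j$th round of this sum-check protocol, P is required to send the univariate polynomial
\[
g_j(X_j) = \sum_{(x_{j+1}, \ldots, x_{s_i + 2s_{i+1}}) \in \{0,1\}^{s_i + 2s_{i+1} - j}} f_z^{(i)}\big(r_1^{(i)}, \ldots, r_{j-1}^{(i)}, X_j, x_{j+1}, \ldots, x_{s_i + 2s_{i+1}}\big).
\]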
The sum defining $g_j$ involves as many as $S^3$ terms, and thus a naive implementation of P would require $\Omega(S^3)$ time per iteration of the protocol. However, we show that by exploiting the multilinearity of the low-degree extensions $\widetilde{\mathrm{add}}_i$ and $\widetilde{\mathrm{mult}}_i$ that we use in the definition of $f_z^{(i)}$, each gate at layer $i$ contributes to exactly one term in the sum defining $g_j$, as does each gate at layer $i+1$.¹ Thus, the polynomial $g_j$ can be computed with a single pass over the gates at layer $i$, and a single pass over the gates at layer $i+1$. As the sum-check protocol requires $O(s_i + s_{i+1}) = O(\log S(n))$ messages for each layer of the circuit, P requires logarithmically many passes over each layer of the circuit in total.
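The "one term per gate" observation can be checked numerically on a toy layer. The following is a minimal sketch under stated assumptions, not the implementation described in this chapter: the modulus `P`, the gate tuple format `(kind, g, in1, in2)`, and all function names are hypothetical. For brevity it re-evaluates $\tilde{V}_{i+1}$ from scratch for every gate, so it illustrates only the structure of the sum, not the $O(S \log S)$ running time.

```python
from itertools import product

P = 2**61 - 1  # illustrative prime modulus (an assumption, not fixed by the text)

def bits(v, n):
    """n-bit binary expansion of v, most significant bit first."""
    return [(v >> (n - 1 - k)) & 1 for k in range(n)]

def eq(a, b):
    """prod_k (a_k*b_k + (1-a_k)*(1-b_k)) mod P; this is beta and chi in one helper."""
    out = 1
    for x, y in zip(a, b):
        out = out * (x * y + (1 - x) * (1 - y)) % P
    return out

def mle(table, point):
    """Multilinear extension of `table` at `point` (direct evaluation, not optimized)."""
    n = len(point)
    return sum(v * eq(bits(i, n), point) for i, v in enumerate(table)) % P

def gate_term(gate, y, z, V, s_i, s_n):
    """Contribution of a single gate to f_z^(i)(y), following Equation (7.1)."""
    kind, g, a, b = gate
    label = bits(g, s_i) + bits(a, s_n) + bits(b, s_n)
    p, w1, w2 = y[:s_i], y[s_i:s_i + s_n], y[s_i + s_n:]
    v1, v2 = mle(V, w1), mle(V, w2)
    op = (v1 + v2) if kind == "add" else v1 * v2
    return eq(z, p) * eq(label, y) % P * op % P

def f(y, z, gates, V, s_i, s_n):
    """f_z^(i)(y), with the wiring extensions expanded as sums over gates."""
    return sum(gate_term(gt, y, z, V, s_i, s_n) for gt in gates) % P

def g_naive(r, X, z, gates, V, s_i, s_n):
    """g_j(X) by summing f over every Boolean suffix (exponentially many terms)."""
    rest = s_i + 2 * s_n - len(r) - 1
    return sum(f(list(r) + [X] + list(x), z, gates, V, s_i, s_n)
               for x in product((0, 1), repeat=rest)) % P

def g_one_pass(r, X, z, gates, V, s_i, s_n):
    """g_j(X) with one term per gate: only the suffix matching the gate's label survives."""
    j = len(r) + 1
    total = 0
    for gate in gates:
        _, g, a, b = gate
        label = bits(g, s_i) + bits(a, s_n) + bits(b, s_n)
        total += gate_term(gate, list(r) + [X] + label[j:], z, V, s_i, s_n)
    return total % P
```

For example, with `gates = [("add", 0, 0, 1), ("mult", 1, 2, 3)]`, `V = [3, 5, 7, 11]`, and `z = [12345]`, the two functions agree for every round $j$ and every evaluation point tried, and `g_one_pass([], 0, ...) + g_one_pass([], 1, ...)` equals $\tilde{V}_i(z)$, as the sum-check protocol requires.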
A complication in applying the above observation is that V must process the circuit in order to pull out information about its structure necessary to check the validity of P's messages. Specifically, each application of the sum-check protocol requires V to evaluate $\widetilde{\mathrm{add}}_i$ and $\widetilde{\mathrm{mult}}_i$ at a random point. Theorem 7.1.1 below follows from the fact that for any log-space uniform circuit, V can evaluate the multilinear extension of the wiring predicates at any point using $O(\log S(n) \log |\mathbb{F}|)$ bits of space. We present detailed proofs and discussions of the following theorems in Section 7.2.
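To make the space bound concrete, the sketch below evaluates the multilinear extension of a wiring predicate at a point by enumerating gates one at a time and keeping a single running field element. It is an illustration under assumptions: the modulus `P`, the generator `tree_gates` (one layer of a binary summation tree, standing in for a log-space gate enumerator), and the function names are all hypothetical.

```python
P = 2**61 - 1  # illustrative prime modulus (an assumption)

def chi(label, nbits, point):
    """chi_label(point) = prod_k (b_k*x_k + (1-b_k)*(1-x_k)) mod P, bits MSB first."""
    out = 1
    for k in range(nbits):
        b = (label >> (nbits - 1 - k)) & 1
        x = point[k]
        out = out * (x if b else (1 - x)) % P
    return out

def tree_gates(s):
    """Hypothetical gate enumerator: gate g at this layer reads gates 2g and 2g+1 below."""
    for g in range(2**s):
        yield g, 2 * g, 2 * g + 1

def eval_wiring_mle(gate_stream, s_i, s_next, p, w1, w2):
    """Evaluate the multilinear extension of the wiring predicate at (p, w1, w2).

    Working state: one accumulator plus the point itself, i.e. O(s_i + s_next)
    field elements, independent of the number of gates streamed.
    """
    acc = 0
    for g, a, b in gate_stream:
        acc = (acc + chi(g, s_i, p) * chi(a, s_next, w1) * chi(b, s_next, w2)) % P
    return acc
```

At a Boolean point the result is the wiring predicate itself: `eval_wiring_mle(tree_gates(2), 2, 3, [1, 0], [1, 0, 0], [1, 0, 1])` checks whether gate 2 reads gates 4 and 5. The pass takes time proportional to the number of gates, which matches the offline cost allowed for V rather than the online cost.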
¹ In order to obtain an interactive proof protocol for log-space uniform NC in which the prover runs in polynomial time and the verifier runs in quasi-linear time, Goldwasser, Kalai, and Rothblum used polylogarithmic degree extensions of $\mathrm{add}_i$ and $\mathrm{mult}_i$. In contrast, we use the multilinear extensions $\widetilde{\mathrm{add}}_i$ and $\widetilde{\mathrm{mult}}_i$.
Theorem 7.1.1. For any log-space uniform circuit $C$ of size $S(n)$ over finite field $\mathbb{F}$, P can run in $O(S(n) \log S(n))$ time over the entire execution of the GKR protocol applied to $C$, and V can make a single streaming pass over the input, using $O(\log S(n) \log |\mathbb{F}|)$ bits of space over the entire execution of the protocol.
Moreover, we can strengthen Theorem 7.1.1 as follows. Because the circuit's wiring predicate is independent of the input, we can separate V's computation into an offline non-interactive preprocessing phase, which occurs before the data stream is seen, and an online interactive phase, which occurs after both P and V have seen the input. This is similar to [57, Theorem 4] and ensures that V is space-efficient (but may require time $O(S(n))$) during the offline phase, and that V is both time- and space-efficient in the online interactive phase. In order to determine which circuit to use, V does need to know (an upper bound on) the length of the input during the preprocessing phase.
Theorem 7.1.2. For any log-space uniform circuit $C$ of size $S(n)$ and depth $d(n)$ over finite field $\mathbb{F}$, P can run in $O(S(n) \log S(n))$ total time over the entire execution of the GKR protocol applied to $C$. V can make a single streaming pass over the input, using $O(d(n) \log S(n) \log |\mathbb{F}|)$ bits of space over the entire execution of the protocol. V can run in time $O(S(n))$ using space $O(d(n) \log S(n) \log |\mathbb{F}|)$ in a non-interactive, data-independent preprocessing phase, and run in time $O(n \log n + d(n) \log S(n))$ using $O(d(n) \log S(n) \log |\mathbb{F}|)$ bits of space in an online interactive phase, where the $O(n \log n)$ term is due to the time required to evaluate the low-degree extension of the input at a point.
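The $O(n \log n)$ term can be illustrated with a short sketch of how a streaming verifier might maintain the value of the input's multilinear extension at a fixed point while the stream goes by. This assumes the low-degree extension in question is the multilinear one and that the stream consists of `(index, delta)` updates; the class name `StreamingMLE`, the modulus `P`, and the update format are hypothetical.

```python
P = 2**61 - 1  # illustrative prime modulus (an assumption)

class StreamingMLE:
    """Maintains a~(r) = sum_i a_i * chi_i(r) mod P over a stream of (index, delta) updates."""

    def __init__(self, r):
        # r is the evaluation point, chosen before the stream is seen;
        # len(r) = log2(n) field elements.
        self.r = [x % P for x in r]
        self.acc = 0

    def update(self, index, delta):
        """Process one stream update a[index] += delta using O(log n) field operations."""
        nbits = len(self.r)
        w = 1
        for k in range(nbits):
            bit = (index >> (nbits - 1 - k)) & 1  # bits of index, MSB first
            w = w * (self.r[k] if bit else (1 - self.r[k])) % P
        self.acc = (self.acc + delta * w) % P

    def value(self):
        return self.acc
```

Each update costs one pass over the $\log n$ coordinates of the point, so $n$ updates cost $O(n \log n)$ field operations in total, while the state is the point plus one accumulator. For instance, feeding the updates `(0, 1), (1, 2), (2, 3), (3, 4)` to `StreamingMLE([2, 3])` yields the value of $1 + x_1 + 2x_0$ at $(x_0, x_1) = (2, 3)$.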
Finally, Theorem 7.1.3 follows by assuming V can evaluate the multilinear extension of the wiring predicate quickly. We believe that the hypothesis of Theorem 7.1.3 is extremely mild, and we discuss this point at length in Section 7.3, identifying a diverse array of circuits to which Theorem 7.1.3 applies. Moreover, the solutions we adopt in our circuit-checking experiments of Section 7.4 correspond to Theorem 7.1.3, and are both space- and time-efficient for the verifier.
Theorem 7.1.3. Let $C$ be any log-space uniform circuit of size $S(n)$ and depth $d(n)$ over finite field $\mathbb{F}$, and assume that for all $i \in \{1, \ldots, d(n)\}$, there exists an $O(\log S(n) \log |\mathbb{F}|)$-space, $\mathrm{poly}(\log S(n))$-time algorithm for evaluating $\widetilde{\mathrm{add}}_i$ and $\widetilde{\mathrm{mult}}_i$ at a point. Then in order to implement the GKR protocol applied to $C$, P requires $O(S(n) \log S(n))$ time, and V requires $O(\log S(n) \log |\mathbb{F}|)$ bits of space and time $O(n \log n + d(n)\, \mathrm{poly}(\log S(n)))$, where the $O(n \log n)$ term is due to the time required to evaluate the low-degree extension of the input at a point.
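As a hedged illustration of the kind of circuit that satisfies this hypothesis, consider one layer of a binary summation tree, where gate $g$ adds gates $2g$ and $2g+1$ of the layer below. This wiring pattern is an assumption chosen for the example, not one taken from the theorem. Because the sum over gates factorizes coordinate by coordinate, $\widetilde{\mathrm{add}}_i$ has a closed form that needs only $O(\log S)$ field operations. The sketch below compares that closed form with a brute-force pass over all gates; `P` and the function names are hypothetical.

```python
P = 2**61 - 1  # illustrative prime modulus (an assumption)

def chi(label, point):
    """chi_label(point) mod P, with the bits of label taken MSB first."""
    n = len(point)
    out = 1
    for k, x in enumerate(point):
        bit = (label >> (n - 1 - k)) & 1
        out = out * (x if bit else (1 - x)) % P
    return out

def add_mle_by_pass(s, p, w1, w2):
    """Reference evaluation: sum over all 2^s gates g with in-neighbors 2g and 2g+1."""
    return sum(chi(g, p) * chi(2 * g, w1) * chi(2 * g + 1, w2)
               for g in range(2**s)) % P

def add_mle_closed_form(p, w1, w2):
    """Same value using O(s) field operations.

    The last coordinates of w1 and w2 must encode the low bits 0 and 1,
    and the leading s coordinates of p, w1, w2 must all encode the same g.
    """
    out = (1 - w1[-1]) * w2[-1] % P
    for pk, ak, bk in zip(p, w1[:-1], w2[:-1]):
        out = out * (pk * ak * bk + (1 - pk) * (1 - ak) * (1 - bk)) % P
    return out
```

The two functions agree at arbitrary field points, for example `p = [5, 7, 11]`, `w1 = [2, 3, 4, 6]`, `w2 = [8, 9, 10, 12]` with `s = 3`. Only the closed form fits within a $\mathrm{poly}(\log S(n))$ time budget.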