In this section, we present an online scheme for the selection problem. Our definition of the selection problem assumes that all frequencies $f_i := \sum_{(j_k,\delta_k) : j_k = i} \delta_k$ are non-negative, so the definition is valid only for the strict turnstile update model.
Definition 3.3.1. The selection problem is defined in terms of the quantity $N_0 = \sum_{i \in [n]} f_i$, the sum of all the frequencies. Given a desired rank $\rho \in [N_0]$, output an item $j$ from the stream $x = \langle (j_1,\delta_1), \ldots, (j_N,\delta_N) \rangle$ such that $\sum_{(j_k,\delta_k) : j_k < j} \delta_k < \rho$ and $\sum_{(j_k,\delta_k) : j_k > j} \delta_k \le N_0 - \rho$.
An easy prescient (log n, log n)-scheme is for the helper to give a claimed answer s as annotation at the start of the stream. The verifier need only count how many items in the stream are (a) smaller than s and (b) greater than s. The verifier then outputs s if the rank of s satisfies the necessary conditions, and outputs ⊥ otherwise.
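The verifier's side of this prescient scheme can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name and the use of `None` in place of ⊥ are assumptions, and the acceptance test is the usual rank condition (fewer than ρ units of frequency below s, and at most N0 − ρ above it).

```python
def verify_claimed_selection(stream, claimed, rho):
    """Return `claimed` if it has rank `rho` in the stream, else None (for ⊥).

    `stream` is an iterable of (item, delta) tuples in the strict turnstile
    model. Only three counters are kept, regardless of the stream length.
    """
    smaller = 0  # total frequency of items strictly below the claimed answer
    greater = 0  # total frequency of items strictly above the claimed answer
    total = 0    # N0, the sum of all frequencies
    for item, delta in stream:
        total += delta
        if item < claimed:
            smaller += delta
        elif item > claimed:
            greater += delta
    if smaller < rho and greater <= total - rho:
        return claimed
    return None
```

Because the counters do not depend on ρ, the same pass works when ρ is only revealed after the stream has ended.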
However, our goal is to present (almost) matching upper and lower bounds when only online annotation is allowed. To do this, we first consider the online MA complexity of the communication problem index: Alice holds a string $x \in \{0,1\}^N$, Bob holds an integer $i \in [N]$, and the goal is for Bob to output $\mathrm{index}(x, i) := x_i$. The lower bound for selection will follow from the lower bound for index, and a key idea for the selection upper bound is taken from the communication protocol for index given in the proof of the following theorem.
Theorem 3.3.2 (Online MA complexity of index). Let $c_a > 1$ and $c_v$ be integers such that $c_a \cdot c_v \ge N$. There is an online MA protocol $Q$ for index with $\mathrm{hcost}(Q) \le c_a$ and $\mathrm{vcost}(Q) = O(c_v \log c_a)$. Furthermore, any online MA protocol $Q$ for index must have $\mathrm{hcost}(Q)\,\mathrm{vcost}(Q) = \Omega(N)$. Thus, in particular, $\mathrm{MA}^{\rightarrow}(\mathrm{index}) = \tilde{\Theta}(\sqrt{N})$.
Proof. For the lower bound, we use an online MA protocol $Q$ to build a (Merlin-less) randomized one-way index protocol $Q'$. Here, a one-way protocol is one in which Alice sends a message to Bob, with no communication from Bob to Alice.
We first consider the case where Merlin does not send any message to Alice at all, and then explain how to modify the proof to cover the case where Merlin sends Alice a message (possibly based on Merlin's internal randomness $r_M$) that does not depend on Bob's input. Let $c_a = \mathrm{hcost}(Q)$. Let $B(n, p)$ denote the binomial distribution with parameters $n$ and $p$, and let $k$ be the smallest integer such that $X \sim B(k, \tfrac{1}{3}) \Rightarrow \Pr[X > k/2] \le 2^{-c_a}/3$. A standard Chernoff bound gives $k = \Theta(c_a)$. Let $a(x, R_A)$ denote the message that Alice sends in $Q$ when her random string is $R_A$ (notice that $a(x, R_A)$ does not depend on any help message $h_1(x, r_M)$ from Merlin, since we have assumed no such help message is sent), and let $b(a, i, h_2)$ be the bit Bob outputs in $Q$ on input $i$ upon receiving message $a$ from Alice and $h_2$ from Merlin. In the protocol $Q'$, Alice chooses $k$ independent random strings $R_1, \ldots, R_k$ and sends Bob $a(x, R_1), \ldots, a(x, R_k)$. Bob then outputs 1 iff there exists a $c_a$-bit string $h_2$ such that $\mathrm{majority}(b(a(x, R_1), i, h_2), \ldots, b(a(x, R_k), i, h_2)) = 1$. Let $C$ be the number of bits communicated in this protocol. Clearly, $C \le k \cdot \mathrm{vcost}(Q) = O(\mathrm{hcost}(Q)\,\mathrm{vcost}(Q))$. We claim that $Q'$ is a $\tfrac{1}{3}$-error protocol for index, whence, by a standard lower bound (see, e.g., Ablayev [4]), $C = \Omega(N)$.
To prove the claim, consider the case when $x_i = 1$. By the correctness of $Q$, there exists a suitable help message $h_2$ from Merlin that causes $\Pr[b(a(x, R_A), i, h_2) = 0] \le \tfrac{1}{3}$. Thus, by construction and our choice of $k$, the probability that Bob outputs 0 in $Q'$ is at most $2^{-c_a}/3$. Now suppose $x_i = 0$. Then every possible message $h_2$ from Merlin satisfies $\Pr[b(a(x, R_A), i, h_2) = 1] \le \tfrac{1}{3}$. Arguing as before, and using a union bound over all $2^{c_a}$ possible messages $h_2$, we see that Bob outputs 1 with probability at most $2^{c_a} \cdot 2^{-c_a}/3 = \tfrac{1}{3}$.
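To make the choice of the repetition count k concrete, the following sketch computes the binomial tail exactly and searches for the smallest k meeting the bound $\Pr[X > k/2] \le 2^{-c_a}/3$. The function names are illustrative assumptions; exact rational arithmetic is used only to avoid floating-point doubt in the comparison.

```python
from fractions import Fraction
from math import comb


def majority_failure_probability(k):
    """Exact Pr[X > k/2] for X ~ B(k, 1/3).

    This is the chance that a majority of k independent runs err when each
    run errs with probability exactly 1/3.
    """
    total = Fraction(0)
    for t in range(k // 2 + 1, k + 1):  # every t with t > k/2
        total += comb(k, t) * Fraction(1, 3) ** t * Fraction(2, 3) ** (k - t)
    return total


def smallest_k(c_a):
    """Smallest k with Pr[X > k/2] <= 2^(-c_a) / 3."""
    target = Fraction(1, 3 * 2 ** c_a)
    k = 1
    while majority_failure_probability(k) > target:
        k += 1
    return k
```

Evaluating `smallest_k` for a few small values of `c_a` gives a feel for the constant hidden in the Chernoff bound; the asymptotic statement in the text is what the proof relies on.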
Now consider the case in which Merlin sends Alice a message (possibly based on Merlin's internal randomness $r_M$) that does not depend on Bob's input. Assume that the protocol is $1/13$-complete, that is, its completeness error is at most $1/13$ (this can be achieved by repeating the whole protocol $O(1)$ times and taking the majority vote, which increases the costs by only constant factors). In this case, we construct a one-way randomized (Merlin-less) communication protocol for index as follows. Alice chooses a random string $r_M$ herself. Since Merlin's message to Alice, $h_1(x, r_M)$, does not depend on Bob's input $i$, Alice can compute $h_1(x, r_M)$ herself. Alice sends Bob the messages $a(x, R_1, h_1(x, r_M)), \ldots, a(x, R_k, h_1(x, r_M))$ that she would have sent in the online MA protocol given Merlin's message $h_1(x, r_M)$, and Bob outputs 1 if and only if there exists a $c_a$-bit string $h_2$ that would have caused him to accept on a majority of Alice's messages.
Consider the case when $x_i = 1$. By the correctness of $Q$, with probability at least $3/4$ over the choice of $r_M$, there exists a suitable help message $h_2$ from Merlin that causes $\Pr[b(a(x, R_A, h_1(x, r_M)), i, h_2) = 0] \le \tfrac{1}{3}$ (otherwise, with probability at least $1/4 \cdot 1/3 = 1/12$ over the choice of both $r_M$ and $R_A$, Merlin will fail to convince Bob to output 1, contradicting the fact that the protocol is $1/13$-complete). Call such a choice of $r_M$ "good". By construction and our choice of $k$, if $r_M$ is good then the probability that Bob outputs 0 in $Q'$ is at most $2^{-c_a}/3$. Thus, in the case $x_i = 1$, our one-way randomized communication protocol outputs 1 with probability at least $3/4 - 2^{-c_a}/3 > 2/3$.
In the case $x_i = 0$, the argument that our one-way randomized communication protocol outputs 0 with probability at least $2/3$ proceeds exactly as in the case where Merlin did not send any message to Alice, since for every message $h_1$ to Alice and every possible message $h_2$ to Bob, the protocol satisfies $\Pr[b(a(x, R_A, h_1), i, h_2) = 1] \le \tfrac{1}{3}$.
The upper bound follows as a special case of the two-party set-disjointness protocol in [3, Theorem 7.4], since the protocol there is actually online. We give a more direct protocol, which establishes intuition for our selection result. Write Alice's input string $x$ as $x = y^{(1)} \cdots y^{(v)}$, where each $y^{(j)}$ is a string of at most $c_a$ bits and $v = \lceil N/c_a \rceil \le c_v$, and fix a prime $q$ with $3c_a < q < 6c_a$. Let $y^{(k)}$ be the substring that contains the desired bit $x_i$. Merlin sends Bob a string $z$ of length at most $c_a$, claiming that it equals $y^{(k)}$. Alice picks a random $\alpha \in \mathbb{F}_q$ and sends Bob $\alpha$ and the strings $g_\alpha(y^{(1)}), \ldots, g_\alpha(y^{(v)})$, where $g_\alpha$ is defined as in Lemma 3.2.1. This requires communicating $O(v \log q) = O(v \log c_a)$ bits. Bob checks whether $g_\alpha(z) = g_\alpha(y^{(k)})$, outputting ⊥ if not. If the check passes, Bob assumes that $z = y^{(k)}$ and outputs $x_i$ from $z$ under this assumption. By Lemma 3.2.1, the error probability is at most $c_a/q < 1/3$.
It is worth making the following two remarks on the above proof.
1. The above lower bound argument in fact shows that an online MA protocol $Q$ for an arbitrary two-party communication problem $F$ satisfies $\mathrm{hcost}(Q)\,\mathrm{vcost}(Q) = \Omega(R^{\rightarrow}(F))$, where $R^{\rightarrow}(F)$ is the one-way randomized communication complexity of $F$. Thus, $\mathrm{MA}^{\rightarrow}(F) = \Omega(\sqrt{R^{\rightarrow}(F)})$. A similar result was proved by Aaronson [2].
2. The upper bound for index presented above works more or less unchanged when Alice's string is in $\Sigma^N$, for an arbitrary finite alphabet $\Sigma$. Write $h$ for the block length and $v$ for the number of blocks, so that $hv \ge N$. In view of Lemma 3.2.1, one simply needs to choose the prime $q$ such that $3|\Sigma|h < q < 6|\Sigma|h$ to bound the error probability below $1/3$. This leads to a protocol $P$ with $\mathrm{hcost}(P) \le h \log |\Sigma|$ and $\mathrm{vcost}(P) = O(v(\log |\Sigma| + \log h))$. Henceforth, we shall refer to this generalized protocol simply as "the index protocol"; the alphabet $\Sigma$ will usually be clear from the context.
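The binary index protocol from the proof of Theorem 3.3.2 can be simulated in a few lines. The sketch below rests on assumptions: Lemma 3.2.1 is not reproduced in this section, so `fingerprint` uses a standard polynomial fingerprint over $\mathbb{F}_q$ as a stand-in for $g_\alpha$; positions are 0-based; Bob also checks the length of Merlin's string; and all function names are hypothetical.

```python
import random


def is_prime(m):
    """Trial-division primality test (adequate for the small q used here)."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def pick_prime(c_a):
    """First prime above 3*c_a; Bertrand's postulate places it below 6*c_a."""
    q = 3 * c_a + 1
    while not is_prime(q):
        q += 1
    return q


def fingerprint(bits, alpha, q):
    """Assumed form of g_alpha: sum of bits[j] * alpha**(j + 1) modulo q."""
    acc = 0
    for b in reversed(bits):  # Horner's rule
        acc = (acc + b) * alpha % q
    return acc


def alice_message(x, c_a, q, rng):
    """Alice's one-way message: a random alpha and one fingerprint per block."""
    blocks = [x[s:s + c_a] for s in range(0, len(x), c_a)]
    alpha = rng.randrange(q)
    return alpha, [fingerprint(block, alpha, q) for block in blocks]


def bob_output(i, n, c_a, q, z, alpha, prints):
    """Bob's decision for index i given Merlin's claimed block z.

    Returns the bit read from z, or None (for ⊥) if the check fails.
    """
    k = i // c_a                         # block that contains position i
    expected_len = min(c_a, n - k * c_a)
    if len(z) != expected_len or fingerprint(z, alpha, q) != prints[k]:
        return None
    return z[i % c_a]
```

With an honest Merlin, Bob always recovers the correct bit. Under the assumed fingerprint, two distinct blocks of equal length collide for at most `c_a` of the `q` choices of `alpha`, which matches the $c_a/q$ error bound quoted in the proof.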
Theorem 3.3.3. For all $c_a, c_v$ such that $c_a \cdot c_v \ge n$, there is an online $(c_a \log n, c_v \log n)$-scheme for selection. Furthermore, any online $(c_a, c_v)$-scheme for selection must have $c_a \cdot c_v = \Omega(n)$.
Proof. Conceptually, the verifier builds a vector $r = (r_1, \ldots, r_n) \in \mathbb{Z}_+^n$, where $r_k = \sum_{j<k} f_j$. This is done by inducing a new stream $x'$ from the input stream $x$: each tuple $(j_k, \delta_k)$ in $x$ causes virtual tokens $(j_k+1, \delta_k), (j_k+2, \delta_k), \ldots, (n, \delta_k)$ to be inserted into $x'$. Then $r = f(x')$, the frequency vector of $x'$; note that $\|r\|_1 = O(nN)$. We apply the index protocol to this vector, with $q = \Theta(m^2)$, to retrieve the ranks of the elements surrounding the claimed answer $s$. This information is sufficient to check that $s$ has the claimed rank.
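The derived stream and the final rank check can be illustrated as follows. This sketch materializes the vector r explicitly, which the actual verifier does not do (it only maintains fingerprints of blocks of r and retrieves the needed entries through the index protocol). The function names, the 1-based list layout, and the use of N0 in place of the missing entry $r_{n+1}$ are assumptions made for illustration.

```python
def rank_vector(stream, n):
    """Build r with r[k] = sum of f_j over j < k, for k in 1..n.

    Index 0 of the returned list is unused so that r[k] matches r_k.
    """
    r = [0] * (n + 1)
    for item, delta in stream:
        # Virtual tokens (item+1, delta), ..., (n, delta) of the derived stream.
        for k in range(item + 1, n + 1):
            r[k] += delta
    return r


def has_rank(r, total, s, rho):
    """Check that item s has rank rho, using only r[s], r[s+1] and N0 = total."""
    n = len(r) - 1
    below = r[s]                            # total frequency of items < s
    at_most = r[s + 1] if s < n else total  # total frequency of items <= s
    above = total - at_most                 # total frequency of items > s
    return below < rho and above <= total - rho
```

Only two entries of r, those adjacent to the claimed answer s, are read in the check, which is why retrieving them through the index protocol suffices.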
For the lower bound, we use a standard reduction from the index problem. Given the string $x \in \{0,1\}^N$, Alice transforms it into the stream over universe $[2N]$ whose $j$th tuple is $(2j - x_j, 1)$, for each $j$. Given the index $i \in [N]$, Bob transforms it into a stream consisting of $i$ copies of $(2N, 1)$ and $N - i$ copies of $(1, 1)$. Consequently, the median of the combined length-$2N$ stream is $2i - x_i$, from which the value of $x_i$ can be recovered. To complete the proof, observe that any online scheme to compute this median would imply an online MA protocol for index with the same cost, and that all players can perform this reduction online without extra space or annotation.
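The reduction from index to median finding can be checked directly on small inputs. In this sketch the median is taken to be the element of rank N in the combined length-2N stream, and it is computed by sorting rather than by any streaming scheme; the function names are illustrative assumptions.

```python
def alice_stream(x):
    """Alice's stream: the j-th tuple is (2j - x_j, 1), with j counted from 1."""
    return [(2 * j - x[j - 1], 1) for j in range(1, len(x) + 1)]


def bob_stream(i, n):
    """Bob's stream: i copies of (2N, 1) followed by N - i copies of (1, 1)."""
    return [(2 * n, 1)] * i + [(1, 1)] * (n - i)


def median_of(stream):
    """Element of rank N in a stream of total frequency 2N (computed offline)."""
    items = sorted(item for item, delta in stream for _ in range(delta))
    return items[len(items) // 2 - 1]


def recover_bit(x, i):
    """Recover x_i (i counted from 1) from the median of the combined stream."""
    n = len(x)
    median = median_of(alice_stream(x) + bob_stream(i, n))
    return 2 * i - median
```

Bob's N − i copies of 1 push exactly i of Alice's items into the lower half, so the rank-N element is Alice's i-th item, $2i - x_i$.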
Notice that in the above scheme the information computed by the verifier is independent of ρ, the rank of the desired element. Therefore these algorithms work even when ρ is revealed at the end of the stream.