Our algorithm MainDJunta(f, D, k, ϵ) is described in Figure 3.6.
It maintains two collections of blocks V = {B1, . . . , Bv} and U = {C1, . . . , Cu} for some nonnegative integers v and u. Both are set to ∅ at the beginning, and we will prove that the following conditions on V and U are always satisfied as the algorithm runs:
(A). B1, . . . , Bv, C1, . . . , Cu ⊆ [n] are pairwise disjoint (nonempty) blocks of variables;
(B). A distinguishing pair has been found for each of these blocks. For notational convenience we use (xj, yj) to denote the distinguishing pair for each Bj and (xC, yC) to denote the distinguishing pair for each block C ∈ U. We also use the following notation:
$$w_j := (x_j)_{\overline{B_j}} = (y_j)_{\overline{B_j}} \in \{0,1\}^{\overline{B_j}} \quad\text{and}\quad w_C := (x_C)_{\overline{C}} = (y_C)_{\overline{C}} \in \{0,1\}^{\overline{C}},$$
and we let gj := f↾wj and gC := f↾wC be Boolean functions over $\{0,1\}^{B_j}$ and $\{0,1\}^{C}$, respectively.
Throughout the algorithm and its analysis, we set a key parameter γ := 1/(8k). Blocks in V are intended to be those that have been “verified” to satisfy the condition that gj is γ-close to a literal ($x_{i_j}$ or $\overline{x_{i_j}}$ for some unknown variable ij ∈ Bj) under the uniform distribution, while blocks in U have not been verified yet, so they may not satisfy this condition. More formally, at any point in the execution of the algorithm we say that the algorithm is in good condition if its current collections V and U satisfy conditions (A), (B) and the following condition:
(C). Every gj, j ∈ [v], is γ-close to a literal under the uniform distribution over $\{0,1\}^{B_j}$.
Obviously our algorithm starts in good condition, and we will show that it remains in good condition with high probability as the algorithm runs.
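Conditions (A) and (B) can be checked mechanically on small examples. The following is a minimal sketch, assuming that a distinguishing pair for a block B is a pair (x, y) that agrees outside B and satisfies f(x) ≠ f(y), and representing strings as tuples of bits indexed from 0. The helper names `is_distinguishing_pair`, `restrict` and `satisfies_A_and_B` are hypothetical and are not part of the algorithm's description.

```python
from itertools import combinations

def is_distinguishing_pair(f, x, y, block):
    """True if x and y agree outside `block` and f(x) != f(y) (assumed definition)."""
    agree_outside = all(x[i] == y[i] for i in range(len(x)) if i not in block)
    return agree_outside and f(x) != f(y)

def restrict(f, x, block):
    """Return g = f restricted by w, where w is x on the coordinates outside `block`.

    g takes a dict {i: bit} assigning the coordinates i in `block`.
    """
    def g(z):
        full = list(x)
        for i in block:
            full[i] = z[i]
        return f(tuple(full))
    return g

def satisfies_A_and_B(f, blocks_with_pairs):
    """blocks_with_pairs: list of (block, x, y) with block a frozenset of indices."""
    blocks = [b for b, _, _ in blocks_with_pairs]
    nonempty = all(len(b) > 0 for b in blocks)                           # condition (A)
    disjoint = all(a.isdisjoint(b) for a, b in combinations(blocks, 2))  # condition (A)
    pairs_ok = all(is_distinguishing_pair(f, x, y, b)                    # condition (B)
                   for b, x, y in blocks_with_pairs)
    return nonempty and disjoint and pairs_ok
```

Condition (C) is not checked here, since it concerns closeness of gj to a literal and is what the subroutine Literal is used for in type-2 rounds.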
The algorithm MainDJunta(f, D, k, ϵ) starts with V = U = ∅ and proceeds round by round. In each round, one of two situations arises: we call it a type-1 round if u = |U| = 0 (corresponding to step 2.1 of MainDJunta), and a type-2 round if u > 0 (corresponding to step 2.2). For a type-1 round (with u = 0), as described in Figure 3.6, we draw a partition of each Bj ∈ V into Pj and Qj and then run WhereIsTheLiteral on gj (the function defined for Bj in condition (B) above), Pj and Qj to determine whether there is a relevant variable in Pj or in Qj. Then, depending on the results, we follow the idea sketched in Section 3.2.3 and run the blockwise binary search on carefully selected blocks, hoping to increase the number of relevant blocks we can find (note that the whole process and argument still work even when V = ∅). We will prove the following lemma for this case in Section 3.2.5:
Lemma 3.2.4. Assume that f is ϵ-far from k-juntas with respect to D and MainDJunta(f, D, k, ϵ) is in good condition at the beginning of a type-1 round with u = 0 and v ≤ k. Then it always remains in good condition at the end of this round. Moreover, letting V′ and U′ be the two corresponding collections of blocks at the end of this round, with probability at least ϵ/4 we have either |V′| = v and |U′| = 1, or |V′| = v − 1 and |U′| = 2.
For the case when the algorithm is in a type-2 round (with u ≥ 1), we pick an arbitrary block C from U and check whether gC is close to a literal under the uniform distribution by using the subroutine Literal. We will prove the following lemma for this case in Section 3.2.6:
Lemma 3.2.5. Assume that f is ϵ-far from k-juntas with respect to D and MainDJunta(f, D, k, ϵ) is in good condition at the beginning of a type-2 round with u > 0 and v + u ≤ k. Then with probability at least 1 − 1/(64k), one of the following two events happens at the end of this round (letting V′ and U′ be the two corresponding collections of blocks at the end of this round):
1. The algorithm remains in good condition with |V′| = v + 1 and |U′| = u − 1;
2. The algorithm remains in good condition with |V′| = v and |U′| = u + 1.
Algorithm MainDJunta(f, D, k, ϵ) with the same input / output as SimpleDJunta in Figure 3.3.
1. Initialization: Set V = U = ∅, r1 = 64k/ϵ and r2 = 3(k + 1).
2. Repeat the following until r1 = 0 or r2 = 0:
Let V = {B1, . . . , Bv} and U = {C1, . . . , Cu}. Let xj, yj, wj, gj, xC, yC, wC, gC be the corresponding strings and functions for each Bj ∈ V and C ∈ U as described above in condition (B).
* Otherwise “fail” is returned. Then skip this round and go back to the beginning of step 2.
2.1.3 Draw x ∼ D and a subset T of $\overline{B_1 \cup \cdots \cup B_v}$ uniformly at random. Let $y = x^{(R)}$ with R = T ∪ T1 ∪ · · · ∪ Tv. Skip this round and go back to the beginning of step 2 if f(x) = f(y); otherwise run the blockwise binary search on x and y with blocks T, T1, . . . , Tv:
* If a distinguishing pair of f for T is returned, then add T to U.
* Otherwise a distinguishing pair (x∗, y∗) of f for Tj∗ is returned for some j∗ ∈ [v]. Then remove Bj∗ from V and add both Sj∗ and Tj∗ to U.
2.2 Otherwise u > 0, then:
2.2.1 Set r2 to be r2 − 1.
2.2.2 Pick a block C ∈ U arbitrarily; let (x, y) be its distinguishing pair, $w = x_{\overline{C}}$ and g = f↾w. Run Literal(g):
* If Literal(g) returns “true,” remove C from U and add it to V.
* Otherwise Literal(g) returns disjoint subsets C′, C∗ of C, each with a distinguishing pair of f. Then remove C from U and add both C′ and C∗ to U.
2.3 If |V| + |U| ≥ k + 1, then halt and output “reject.”
3. Halt and output “accept.”
Figure 3.6: Description of the distribution-free testing algorithm MainDJunta for k-juntas.
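The round structure of MainDJunta in Figure 3.6 (two budgets r1 and r2, a type-1 or type-2 round depending on whether U is empty, and rejection once k + 1 blocks are held) can be sketched as a control-flow skeleton. This is only an illustration under stated assumptions: the callbacks `type1_round` and `type2_round` are hypothetical stand-ins for steps 2.1 and 2.2, the budget r1 is rounded up to an integer, and decrementing r1 at the start of each type-1 round is assumed by analogy with step 2.2.1.

```python
import math

def main_loop(k, eps, type1_round, type2_round):
    """Control-flow skeleton only; the callbacks mutate the lists V and U in place."""
    V, U = [], []
    r1 = math.ceil(64 * k / eps)   # assumption: round the type-1 budget up
    r2 = 3 * (k + 1)
    while r1 > 0 and r2 > 0:
        if not U:                  # type-1 round (step 2.1), u = 0
            r1 -= 1                # assumption: mirrors step 2.2.1
            type1_round(V, U)
        else:                      # type-2 round (step 2.2), u > 0
            r2 -= 1
            type2_round(V, U)
        if len(V) + len(U) >= k + 1:   # step 2.3
            return "reject"
    return "accept"
```

With callbacks that never find a new block, the loop exhausts r1 and accepts; with callbacks that always succeed, the block count reaches k + 1 and the loop rejects.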
Assuming Lemma 3.2.4 and Lemma 3.2.5 for now, we are ready to prove the correctness of MainDJunta.
Theorem 3.2.6. (i) The algorithm MainDJunta makes Õ(k²/ϵ) queries and always accepts f when it is a k-junta. (ii) It rejects with probability at least 2/3 when f is ϵ-far from k-juntas with respect to D.
Proof of Theorem 3.2.6 assuming Lemma 3.2.4 and Lemma 3.2.5. MainDJunta has one-sided error since it rejects f only when it has found k + 1 pairwise disjoint relevant blocks (in either U or V) of f.
The number of queries it makes in each type-1 round (corresponding to step 2.1 of MainDJunta) is O(k) + O(log k) = O(k), and in each type-2 round it is O(k log k + log k) = O(k log k). Since the number of type-1 rounds is at most r1 = 64k/ϵ and the number of type-2 rounds is at most r2 = O(k), the query complexity of MainDJunta is O(k²/ϵ + k² log k) = Õ(k²/ϵ). This finishes the proof of (i).
In the rest of the proof we assume f is ϵ-far from k-juntas with respect to D, and it is enough to show that our algorithm rejects f with probability at least 2/3.
For this purpose we introduce a simple potential function F to measure the progress:
F(V, U) := 3|V| + 2|U|.
Clearly each round of the algorithm is either of type-1 (when |U| = 0) or of type-2 (when |U| > 0). By Lemma 3.2.4, if the algorithm is in good condition at the beginning of a type-1 round, then the algorithm always ends this round in good condition, and with probability at least ϵ/4 the potential function F goes up by at least one (in which case we say that the algorithm succeeds in this type-1 round). By Lemma 3.2.5, if the algorithm is in good condition at the beginning of a type-2 round, then with probability at least 1 − 1/(64k) the algorithm ends this round in good condition and F goes up by at least one (in which case we say it succeeds in this type-2 round).
Note that F is 0 at the beginning (V = U = ∅) and that we must have |U| + |V| ≥ k + 1 (and thus the algorithm rejects) when the potential function F reaches 3(k + 1) or above.
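The two facts about F used here can be verified by direct enumeration: each successful outcome in Lemma 3.2.4 and Lemma 3.2.5 raises F = 3|V| + 2|U| by at least one, and F cannot exceed 3k while |V| + |U| ≤ k. The following is a small sketch; the function and constant names are illustrative only.

```python
def potential(v, u):
    """F(V, U) = 3|V| + 2|U|, as a function of the two cardinalities."""
    return 3 * v + 2 * u

# Changes (dv, du) in (|V|, |U|) for a successful round.
TYPE1_SUCCESS = [(0, 1), (-1, 2)]   # outcomes in Lemma 3.2.4 (starting from u = 0)
TYPE2_SUCCESS = [(1, -1), (0, 1)]   # outcomes in Lemma 3.2.5

def min_gain():
    """Smallest increase of F over all successful outcomes (F is linear)."""
    return min(potential(dv, du) for dv, du in TYPE1_SUCCESS + TYPE2_SUCCESS)

def max_potential_without_reject(k):
    """Largest value of F over all (v, u) with v + u <= k."""
    return max(potential(v, u) for v in range(k + 1) for u in range(k + 1 - v))
```

Since the maximum without rejection is 3k, any run in which F reaches 3(k + 1) must already hold at least k + 1 blocks.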
As a result, a necessary condition for the algorithm to accept is that one of the following two events happens:
E1: At least one of the type-2 rounds fails.
E2: E1 does not happen (so the algorithm succeeds in every type-2 round and remains in good condition all the time). In order to keep F below 3(k + 1) at the end, there are at most 3k + 2 type-2 rounds and exactly 64k/ϵ type-1 rounds (so that the algorithm can finish), while the algorithm succeeds in at most 3k + 2 of these type-1 rounds.
By a union bound, the probability that E1 happens is at most
3(k + 1) · 1/(64k) ≤ 6k · 1/(64k) < 1/8.
With a coupling argument we can also show that the probability that E2 happens is at most the probability that
$$\sum_{i=1}^{64k/\epsilon} Z_i \le 3k + 2,$$
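where the Zi are i.i.d. {0, 1}-valued random variables that take value 1 with probability ϵ/4. The expectation of the sum on the left-hand side is 16k, so (using 3k + 2 ≤ 5k) the event requires the sum to be at most a 5/16 fraction of its mean, and it follows from the Chernoff bound that this probability is exponentially small in k.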
Finally, it follows from a union bound that the algorithm rejects with probability at least 2/3 when f is ϵ-far from k-juntas with respect to D. This finishes the proof.
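The constants in the proof above can be sanity-checked numerically. The sketch below assumes the standard multiplicative Chernoff lower-tail bound Pr[X ≤ (1 − δ)μ] ≤ exp(−δ²μ/2) for a sum X of independent {0, 1}-valued variables with mean μ; the exact form of the Chernoff bound intended in the text is not shown, so this particular choice is an assumption. With μ = 16k and threshold 5k it gives δ = 11/16 and the bound exp(−121k/32).

```python
import math

def e1_bound(k):
    """Union bound on E1: at most 3(k + 1) type-2 rounds, each failing w.p. <= 1/(64k)."""
    return 3 * (k + 1) / (64 * k)

def chernoff_lower_tail(mu, threshold):
    """exp(-delta^2 * mu / 2), where threshold = (1 - delta) * mu and 0 < delta < 1."""
    delta = 1 - threshold / mu
    return math.exp(-delta * delta * mu / 2)

def exact_binomial_tail(n, p, t):
    """Pr[Bin(n, p) <= t], computed directly for small n."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(t + 1))
```

For every k ≥ 1 both bounds are below 1/8, so their sum is below 1/3, which is what the final union bound needs.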
3.2.5 Proof of Lemma 3.2.4
We prove Lemma 3.2.4 in this section. We start with a lemma for the subroutine WhereIsTheLiteral (Figure 3.4):
Lemma 3.2.7. Assume that g : $\{0,1\}^B$ → {0, 1} is γ-close (with respect to the uniform distribution) to a literal $x_i$ or $\overline{x_i}$ for some i ∈ B. If i ∈ P, then WhereIsTheLiteral(g, P, Q) returns a distinguishing pair of g for P with probability at least 1 − 4γ; if i ∈ Q, then it returns a distinguishing pair of g for Q with probability at least 1 − 4γ.
Proof. Let K be the set of strings x ∈ $\{0,1\}^B$ such that g(x) disagrees with the literal to which g is γ-close (so |K| ≤ γ · $2^{|B|}$). We work on the case when i ∈ Q; the proof when i ∈ P is similar.
Following the description of WhereIsTheLiteral, it returns a distinguishing pair for Q if
$$g(x) = g(x^{(P)}) \quad\text{and}\quad g(y) \ne g(y^{(Q)}).$$
Note that this holds if all four strings x, $x^{(P)}$, y, $y^{(Q)}$ fall outside of K (in which case g agrees with the literal $x_i$ or $\overline{x_i}$ with i ∈ Q), and thus the probability that it does not hold is at most the probability that at least one of these four strings falls inside K. By a union bound, the latter is at most 4γ, since each of these four strings is uniform over $\{0,1\}^B$ when x and y are drawn uniformly at random from $\{0,1\}^B$. This finishes the proof of the lemma.
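The test analysed in this proof can be sketched in code. This is a minimal sketch under several assumptions, since Figure 3.4 is not reproduced here: $x^{(S)}$ is taken to mean x with the coordinates in S flipped, the block B is identified with {0, . . . , m − 1}, and the case in which P is reported is assumed to be symmetric to the condition displayed above for Q. The names `flip` and `where_is_the_literal` are illustrative.

```python
import random

def flip(x, S):
    """Return x with every coordinate in S flipped (assumed meaning of x^(S))."""
    return tuple(b ^ 1 if i in S else b for i, b in enumerate(x))

def where_is_the_literal(g, m, P, Q, rng=random):
    """Sketch of the test: draw uniform x, y and compare g before and after flipping."""
    x = tuple(rng.randint(0, 1) for _ in range(m))
    y = tuple(rng.randint(0, 1) for _ in range(m))
    xP, yQ = flip(x, P), flip(y, Q)
    if g(x) != g(xP) and g(y) == g(yQ):
        return "P", (x, xP)    # distinguishing pair of g for P (assumed symmetric case)
    if g(x) == g(xP) and g(y) != g(yQ):
        return "Q", (y, yQ)    # distinguishing pair of g for Q, as in the proof
    return "fail", None
```

When g is exactly a literal on a variable in Q, the set K is empty and the sketch always reports Q; for a g that is only γ-close to a literal, the proof above bounds the failure probability by 4γ.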
We are now ready to prove Lemma 3.2.4.
Proof of Lemma 3.2.4. First, it is easy to verify that if the algorithm MainDJunta starts a type-1 round in good condition, then it ends the round in good condition. This is because we never add blocks to V in type-1 rounds, and whenever a block is added to U, it is disjoint from the other blocks and we have found a distinguishing pair for it (note that for the last case in step 2.1.3 of MainDJunta we have found a distinguishing pair for Sj∗ in step 2.1.2).
By the definition of good condition, each block Bj ∈ V is verified, i.e., gj is γ-close to some literal $x_{i_j}$ or $\overline{x_{i_j}}$. It then follows from Lemma 3.2.7 and a union bound that, for any sequence of partitions Pj and Qj of Bj picked at the beginning of step 2.1.2 of MainDJunta, the probability that for every j ∈ [v] the set Sj chosen according to the output of WhereIsTheLiteral contains the variable ij is at least (recall that γ = 1/(8k))
1 − 4γ · v ≥ 1 − 4γ · k = 1/2.
Conditioning on this event, and with T drawn uniformly from the subsets of $\overline{B_1 \cup \cdots \cup B_v}$ in step 2.1.3 and random partitions Pj, Qj drawn for each Bj in step 2.1.2 of MainDJunta, the random set R = T ∪ T1 ∪ · · · ∪ Tv is uniform over all subsets of $\overline{I}$, where I = {ij : j ∈ [v]}.
Following Lemma 3.2.3 and the fact that f is ϵ-far from k-juntas with respect to D and v ≤ k, we know that with x drawn according to D in step 2.1.3, the probability that f(x) ≠ $f(x^{(R)})$ is at least ϵ/2. Overall, the algorithm has f(x) ≠ f(y) in step 2.1.3 with probability at least ϵ/4 (combining the bounds 1/2 and ϵ/2 above). Given this, the lemma is immediate by inspection of the rest of our algorithm.
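The uniformity of the set R used in this proof can be illustrated on a single block by enumeration. The sketch below assumes that a random partition of Bj assigns each variable independently and uniformly to Pj or Qj, and it models the conditioning event by taking Sj to be the side that contains ij, so that Tj is the other side. The function name `other_side_distribution` is hypothetical.

```python
from collections import Counter
from itertools import product

def other_side_distribution(block, i_star):
    """Enumerate all partitions (P, Q) of `block` and count each value of T,
    where T is the side that does not contain i_star."""
    elems = sorted(block)
    counts = Counter()
    for sides in product("PQ", repeat=len(elems)):
        P = frozenset(b for b, s in zip(elems, sides) if s == "P")
        Q = frozenset(b for b, s in zip(elems, sides) if s == "Q")
        T = Q if i_star in P else P
        counts[T] += 1
    return counts
```

Every subset of the block that avoids i_star appears equally often, so Tj is uniform over the subsets of Bj without ij. Taking the union with T, a uniform subset of the variables outside all blocks, gives a uniform subset of the variables outside I.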