arXiv:1705.02973v1 [cs.DS] 8 May 2017
Community Detection in Hypergraphs, Spiked Tensor Models, and Sum-of-Squares
Chiheon Kim
∗[email protected]
Afonso S. Bandeira
†[email protected]
Michel X. Goemans
‡[email protected]
May 9, 2017
Abstract

We study the problem of community detection in hypergraphs under a stochastic block model. Similarly to how the stochastic block model in graphs suggests studying spiked random matrices, our model motivates investigating statistical and computational limits of exact recovery in a certain spiked tensor model. In contrast with the matrix case, the spiked model naturally arising from community detection in hypergraphs is different from the one arising in the so-called tensor Principal Component Analysis model. We investigate the effectiveness of algorithms in the Sum-of-Squares hierarchy on these models. Interestingly, our results suggest that these two apparently similar models exhibit significantly different computational to statistical gaps.
1 Introduction
Community detection is a central problem in many fields of science and engineering. It has received much attention for its various applications to sociological behaviours [1, 2, 3], protein-to-protein interactions [4, 5], DNA 3D conformation [6], recommendation systems [7, 8, 9], and more. In many networks with community structure one may expect that the groups of nodes within the same community are more densely connected. The stochastic block model (SBM) [10] is arguably the simplest model that attempts to capture such community structure.

∗MIT, Department of Mathematics, 77 Massachusetts Ave., Cambridge, MA 02139
†Department of Mathematics and Center for Data Science, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012. Part of this work was done while A. S. Bandeira was with the Mathematics Department at MIT and supported by NSF Grant DMS-1317308.
‡MIT, Department of Mathematics, Room 2-474, 77 Massachusetts Ave., Cambridge, MA
Under the SBM, each pair of nodes is connected randomly and independently with a probability decided by the community membership of the nodes. The SBM has received much attention for its sharp phase transition behaviours [11], [12], [13], computational versus information-theoretic gaps [14], [15], and as a test bed for many algorithmic approaches including semidefinite programming [13], [16], spectral methods [17], [18] and belief-propagation [19]. See [20] for a good survey on the subject.
Let us illustrate a version of the SBM with two equal-sized communities. Let $y \in \{\pm 1\}^n$ be a vector indicating community membership of an even number $n$ of nodes and assume that the sizes of the two communities are equal, i.e., $\mathbf{1}^T y = 0$. Let $p$ and $q$ be in $[0,1]$, indicating the density of edges within and across communities, respectively. Under the model, a random graph $G$ is generated by connecting nodes $i$ and $j$ independently with probability $p$ if $y_i = y_j$ or with probability $q$ if $y_i \neq y_j$. Let $\hat{y}$ be any estimator of $y$ given a sample $G$. We say that $\hat{y}$ exactly recovers $y$ if $\hat{y}$ is equal to $y$ or $-y$ with probability $1 - o_n(1)$. In the asymptotic regime of $p = a \log n / n$ and $q = b \log n / n$ where $a > b$, it has been shown that the maximum likelihood estimator (MLE) recovers $y$ when $\sqrt{a} - \sqrt{b} > \sqrt{2}$ and the MLE fails when $\sqrt{a} - \sqrt{b} < \sqrt{2}$, showing a sharp phase transition behaviour [13]. Moreover, it was subsequently shown in [16] and [21] that the standard semidefinite programming (SDP) relaxation of the MLE achieves the optimal recovery threshold.
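As an illustration of the model just described, the following is a minimal sketch that samples a two-community SBM graph in the logarithmic-degree regime and evaluates the threshold condition $\sqrt{a} - \sqrt{b} > \sqrt{2}$. The function names, the choice $n = 200$, and the values of $a$ and $b$ are assumptions made for this example only; it does not implement the MLE or the SDP.

```python
import math
import random


def sample_sbm(y, p, q, rng):
    """Sample a symmetric 0/1 adjacency matrix from the two-community SBM."""
    n = len(y)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Within-community pairs use p, across-community pairs use q.
            prob = p if y[i] == y[j] else q
            if rng.random() < prob:
                adj[i][j] = adj[j][i] = 1
    return adj


def above_exact_recovery_threshold(a, b):
    """Condition quoted in the text: sqrt(a) - sqrt(b) > sqrt(2)."""
    return math.sqrt(a) - math.sqrt(b) > math.sqrt(2)


rng = random.Random(0)              # fixed seed, illustration only
n = 200                             # assumed problem size
y = [1] * (n // 2) + [-1] * (n // 2)  # balanced labels: sum(y) == 0
a, b = 9.0, 1.0                     # assumed parameters
p = a * math.log(n) / n
q = b * math.log(n) / n
A = sample_sbm(y, p, q, rng)
```

With these assumed values, $\sqrt{9} - \sqrt{1} = 2 > \sqrt{2}$, so the parameters lie on the recoverable side of the threshold.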
A fruitful way of studying phase transitions and effectiveness of different algorithms for the stochastic block model is to consider a Gaussian analogue of the model [22]. Let $G$ be a graph generated by the SBM and let $A_G$ be the adjacency matrix of $G$. We have

$$\mathbb{E} A_G = \frac{p+q}{2} \mathbf{1}\mathbf{1}^T + \frac{p-q}{2} y y^T - pI.$$

It is then useful to think of $A_G$ as a perturbation of the signal $\mathbb{E} A_G$ under centered noise.
This motivates a model with Gaussian noise. Given a vector $y \in \{\pm 1\}^n$ with $\mathbf{1}^T y = 0$, a random matrix $T$ is generated as

$$T_{ij} = y_i y_j + \sigma W_{ij}$$

where $W_{ij} = W_{ji} \sim \mathcal{N}(0,1)$ for all $i, j \in [n]$. This model is often referred to as $\mathbb{Z}_2$-synchronization [21, 22]. It is very closely related to the spiked Wigner model (or spiked random matrix models in general), which has a rich mathematical theory [23, 24, 25].
In many applications, however, nodes exhibit complex interactions that may not be well captured by pairwise interactions [26, 27]. One way to increase the descriptive ability of these models is to consider $k$-wise interactions, giving rise to generative models on random hypergraphs and tensors. Specifically, a hypergraphic version of the SBM was considered in [28, 29], and a version of the censored block model in [30].
For the sake of exposition we restrict our attention to $k = 4$ in the sequel. Most of our results, however, easily generalize to any $k$ and will be presented in a subsequent publication. In the remainder of this section, we introduce a hypergraphic version of the SBM and its Gaussian analogue.
1.1 Hypergraphic SBM and its Gaussian analogue
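Here we describe a hypergraphic version of the stochastic block model. Let $y$ be a vector in $\{\pm 1\}^n$ such that $\mathbf{1}^T y = 0$. Let $p$ and $q$ be in $[0,1]$. Under the model, a random 4-uniform hypergraph $H$ on $n$ nodes is generated so that each $\{i, j, k, l\} \subseteq [n]$ is included in $E(H)$ independently with probability

$$\begin{cases} p & \text{if } y_i = y_j = y_k = y_l \text{ (in the same community)} \\ q & \text{otherwise (across communities).} \end{cases}$$

Let $\mathbf{A}_H$, the adjacency tensor of $H$, be the 4-tensor given by

$$(\mathbf{A}_H)_{ijkl} = \begin{cases} 1 & \text{if } \{i, j, k, l\} \in E(H) \\ 0 & \text{otherwise.} \end{cases}$$

Let $y^{\circledequal 4}$ be the 4-tensor defined as

$$y^{\circledequal 4}_{ijkl} = \begin{cases} 1 & \text{if } y_i = y_j = y_k = y_l \\ 0 & \text{otherwise.} \end{cases}$$

Since $y \in \{\pm 1\}^n$, we have

$$y^{\circledequal 4} = \left(\frac{\mathbf{1} + y}{2}\right)^{\otimes 4} + \left(\frac{\mathbf{1} - y}{2}\right)^{\otimes 4}.$$

Note that for any quadruple of distinct nodes $(i, j, k, l)$ we have

$$(\mathbb{E} \mathbf{A}_H)_{ijkl} = \left(q \mathbf{1}^{\otimes 4} + (p - q) y^{\circledequal 4}\right)_{ijkl}.$$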
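The identity between the entrywise definition of $y^{\circledequal 4}$ and its two-term tensor-power form can be checked numerically on a small instance. The sketch below is illustrative only: the function names and the test vector are assumptions, and tensors are stored as dictionaries keyed by index tuples to avoid any external dependency.

```python
from itertools import product


def same_community_tensor(y, k=4):
    """Entrywise definition: 1 when all labels y[i], i in alpha, agree, else 0."""
    n = len(y)
    return {
        alpha: int(len({y[i] for i in alpha}) == 1)
        for alpha in product(range(n), repeat=k)
    }


def two_term_formula(y, k=4):
    """Entries of ((1 + y)/2)^{(x)k} + ((1 - y)/2)^{(x)k}."""
    n = len(y)
    plus = [(1 + v) / 2 for v in y]    # indicator of the +1 community
    minus = [(1 - v) / 2 for v in y]   # indicator of the -1 community
    out = {}
    for alpha in product(range(n), repeat=k):
        term_plus, term_minus = 1.0, 1.0
        for i in alpha:
            term_plus *= plus[i]
            term_minus *= minus[i]
        out[alpha] = term_plus + term_minus
    return out


y = [1, 1, -1, -1, 1, -1]            # assumed example labels
direct = same_community_tensor(y)
formula = two_term_formula(y)
agree = all(direct[alpha] == formula[alpha] for alpha in direct)
```

The two constructions agree because $(1 + y_i)/2$ and $(1 - y_i)/2$ are the indicator vectors of the two communities.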
In the asymptotic regime of $p = a \log n / \binom{n-1}{3}$ and $q = b \log n / \binom{n-1}{3}$, there is a sharp information-theoretic threshold for exact recovery. Results regarding this hypergraphic stochastic block model will appear in a subsequent publication. Here we will focus on the Gaussian counterpart.
Analogously to the relationship between the SBM and the spiked Wigner model, the hypergraphic version of the SBM suggests the following spiked tensor model:

$$\mathbf{T} = y^{\circledequal 4} + \sigma \mathbf{W},$$

where $\mathbf{W}$ is a random 4-tensor with i.i.d. standard Gaussian entries. We note that here the noise tensor $\mathbf{W}$ is not symmetric, unlike in the spiked Wigner model. This is not crucial: assuming $\mathbf{W}$ to be a symmetric tensor will only scale $\sigma$ by $\sqrt{4!}$.
2 The Gaussian planted bisection model
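Given a sample $\mathbf{T}$, our goal is to recover the hidden spike $y$ up to a global sign flip. Let $\hat{y}$ be an estimator of $y$ computed from $\mathbf{T}$. Let $p(\hat{y}; \sigma)$ be the probability that $\hat{y}$ successfully recovers $y$. Since $\mathbf{W}$ is Gaussian, $p(\hat{y}; \sigma)$ is maximized when

$$\hat{y} = \operatorname*{argmin}_{x \in \{\pm 1\}^n : \mathbf{1}^T x = 0} \left\| \mathbf{T} - x^{\circledequal 4} \right\|_F^2,$$

or equivalently $\hat{y} = \operatorname{argmax}_x \langle x^{\circledequal 4}, \mathbf{T} \rangle$, the maximum-likelihood estimator (MLE) of $y$.

Let $f(x) = \langle x^{\circledequal 4}, \mathbf{T} \rangle$ and $\hat{y}_{ML}$ be the MLE of $y$. By definition,

$$p(\hat{y}_{ML}; \sigma) = \Pr\left( f(y) > \max_{\substack{x \in \{\pm 1\}^n \setminus \{y, -y\} \\ \mathbf{1}^T x = 0}} f(x) \right).$$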
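For very small $n$ the MLE can be computed by exhaustive search over balanced sign vectors, which makes the definition concrete. The following is a sketch under stated assumptions: the helper names, the size $n = 6$, the noise level, and the seed are all chosen for illustration, and the search is exponential in $n$, so it is not a practical algorithm.

```python
import random
from itertools import combinations, product


def sample_tensor(y, sigma, rng):
    """T = y^{=4} + sigma * W, stored as a dict keyed by index 4-tuples."""
    n = len(y)
    T = {}
    for alpha in product(range(n), repeat=4):
        signal = 1.0 if len({y[i] for i in alpha}) == 1 else 0.0
        T[alpha] = signal + sigma * rng.gauss(0.0, 1.0)
    return T


def f(x, T):
    """f(x) = <x^{=4}, T>: sum of T over tuples on which x is constant."""
    return sum(v for alpha, v in T.items() if len({x[i] for i in alpha}) == 1)


def balanced_vectors(n):
    """All x in {+1, -1}^n with sum(x) == 0."""
    for positives in combinations(range(n), n // 2):
        chosen = set(positives)
        yield tuple(1 if i in chosen else -1 for i in range(n))


def brute_force_mle(T, n):
    """Maximize f over balanced sign vectors by enumeration."""
    return max(balanced_vectors(n), key=lambda x: f(x, T))


rng = random.Random(1)               # fixed seed, illustration only
y = (1, 1, 1, -1, -1, -1)            # assumed planted bisection, n = 6
T = sample_tensor(y, sigma=0.5, rng=rng)
y_hat = brute_force_mle(T, len(y))
```

Since $x^{\circledequal 4} = (-x)^{\circledequal 4}$, the values $f(y)$ and $f(-y)$ coincide, so the search can only identify $y$ up to a global sign.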
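Theorem 1. Let $\epsilon > 0$ be a constant not depending on $n$. Then, as $n$ grows, $p(\hat{y}_{ML}; \sigma)$ converges to 1 if $\sigma < (1 - \epsilon)\sigma^*$ and $p(\hat{y}_{ML}; \sigma)$ converges to 0 if $\sigma > (1 + \epsilon)\sigma^*$, where

$$\sigma^* = \sqrt{\frac{1}{8}} \cdot \frac{n^{3/2}}{\sqrt{\log n}}.$$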
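Here we present a sketch of the proof while deferring the details to the appendix. Observe that $\hat{y}_{ML}$ is not equal to $y$ if there exists $x \in \{\pm 1\}^n$ distinct from $y$ such that $\mathbf{1}^T x = 0$ and $f(x) \geq f(y)$. For each fixed $x$, the difference $f(x) - f(y)$ is equal to

$$\langle y^{\circledequal 4}, x^{\circledequal 4} - y^{\circledequal 4} \rangle + \sigma \langle \mathbf{W}, x^{\circledequal 4} - y^{\circledequal 4} \rangle,$$

which is a Gaussian random variable with mean $\langle y^{\circledequal 4}, x^{\circledequal 4} - y^{\circledequal 4} \rangle$ and variance $\sigma^2 \| x^{\circledequal 4} - y^{\circledequal 4} \|_F^2$. By definition we have

$$\langle x^{\circledequal 4}, y^{\circledequal 4} \rangle = \frac{1}{128} \left( (\mathbf{1}^T \mathbf{1} + x^T y)^4 + (\mathbf{1}^T \mathbf{1} - x^T y)^4 \right).$$

Let $\phi(t) = \frac{1}{128}\left((1 + t)^4 + (1 - t)^4\right)$. Then $\langle x^{\circledequal 4}, y^{\circledequal 4} \rangle = n^4 \phi(x^T y / n)$, so

$$\langle y^{\circledequal 4}, x^{\circledequal 4} - y^{\circledequal 4} \rangle = -n^4 \left( \phi(1) - \phi(x^T y / n) \right), \qquad \| x^{\circledequal 4} - y^{\circledequal 4} \|_F = n^2 \sqrt{2\phi(1) - 2\phi(x^T y / n)}.$$

Hence, $\Pr(f(x) - f(y) \geq 0)$ is equal to

$$\Pr_{G \sim \mathcal{N}(0,1)} \left( G \geq \frac{n^2}{\sigma\sqrt{2}} \cdot \sqrt{\phi(1) - \phi(x^T y / n)} \right).$$

This probability is maximized when $x$ and $y$ differ in only two indices, that is, $x^T y = n - 4$. Indeed, one can formally prove that the probability that $y$ maximizes $f(x)$ is dominated by the probability that $f(y) > f(x)$ for all $x$ with $x^T y = n - 4$. By the union bound and standard Gaussian tail bounds, the probability that the latter event fails is at most

$$\left(\frac{n}{2}\right)^2 \exp\left( -\frac{n^4}{4\sigma^2} \cdot \left( \phi(1) - \phi(1 - 4/n) \right) \right),$$

which is approximately

$$\exp\left( 2\log n - \frac{n^3 \cdot \phi'(1)}{\sigma^2} \right) = \exp\left( 2\log n - \frac{n^3}{4\sigma^2} \right),$$

and it is $o_n(1)$ if $\sigma^2 < \frac{n^3}{8 \log n}$, as in Theorem 1.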
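To see how the union bound in the sketch behaves on either side of $\sigma^*$, one can evaluate its logarithm numerically. The snippet below is a rough illustration: the function names, the value $n = 10^4$, and the multipliers $0.5$ and $2$ are assumptions made for this example.

```python
import math


def phi(t):
    """phi(t) = ((1 + t)^4 + (1 - t)^4) / 128, as in the proof sketch."""
    return ((1 + t) ** 4 + (1 - t) ** 4) / 128


def sigma_star(n):
    """Threshold of Theorem 1: sqrt(1/8) * n^{3/2} / sqrt(log n)."""
    return math.sqrt(1 / 8) * n ** 1.5 / math.sqrt(math.log(n))


def log_union_bound(n, sigma):
    """log of (n/2)^2 * exp(-(n^4 / (4 sigma^2)) * (phi(1) - phi(1 - 4/n)))."""
    gap = phi(1.0) - phi(1.0 - 4.0 / n)
    return 2 * math.log(n / 2) - n ** 4 / (4 * sigma ** 2) * gap


n = 10_000                                          # assumed problem size
below = log_union_bound(n, 0.5 * sigma_star(n))     # sigma well below threshold
above = log_union_bound(n, 2.0 * sigma_star(n))     # sigma well above threshold
```

A negative value of `below` means the bound on the failure probability is small. A positive value of `above` only means the bound is vacuous there; the failure of the MLE above $\sigma^*$ requires the separate lower-bound argument in the appendix.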
3 Efficient recovery
Before we address the Gaussian model, let us describe an algorithm for hypergraph partitioning. Let $H$ be a 4-uniform hypergraph on the vertex set $V = [n]$. Let $\mathbf{A}_H$ be the adjacency tensor of $H$. Then the problem can be formulated as

$$\max \ \langle \mathbf{A}_H, x^{\circledequal 4} \rangle \quad \text{subject to } x \in \{\pm 1\}^n, \ \mathbf{1}^T x = 0.$$

One approach for finding a partition is to consider the multigraph realization of $H$, which appears in [28, 29] in different terminology. Let $G$ be the multigraph on $V = [n]$ such that the multiplicity of edge $\{i, j\}$ is the number of hyperedges $e \in E(H)$ containing $\{i, j\}$. One may visualize it as substituting each hyperedge by a 4-clique. Now, one may attempt to solve the reduced problem

$$\max \ \langle A_G, x x^T \rangle \quad \text{subject to } x \in \{\pm 1\}^n, \ \mathbf{1}^T x = 0,$$

which is unfortunately NP-hard in general. Instead, we consider the semidefinite programming (SDP) relaxation

$$\max \ \langle A_G, X \rangle \quad \text{subject to } X_{ii} = 1 \text{ for all } i \in [n], \ \langle X, \mathbf{1}\mathbf{1}^T \rangle = 0, \ X \succeq 0.$$

When $A_G$ is generated under the graph stochastic block model, this algorithm recovers the hidden partition down to optimum parameters. On the other hand, it achieves recovery to nearly-optimum parameters when $G$ is the multigraph corresponding to the hypergraph generated by a hypergraphic stochastic block model: there is a constant multiplicative gap between the guarantee and the information-theoretic limit. This will be treated in a future publication.
Here we had two stages in the algorithm: (1) “truncating” the hypergraph down to a multigraph, and (2) relaxing the optimization problem on the truncated objective function.
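Now let us return to the Gaussian model $\mathbf{T} = y^{\circledequal 4} + \sigma \mathbf{W}$. Let $f(x) = \langle x^{\circledequal 4}, \mathbf{T} \rangle$. Our goal is to find the maximizer of $f(x)$. Note that

$$f(x) = \sum_{i_1, \dots, i_4 \in [n]} \mathbf{T}_{i_1 i_2 i_3 i_4} \cdot \frac{1}{16} \left( \prod_{s=1}^{4} (1 + x_{i_s}) + \prod_{s=1}^{4} (1 - x_{i_s}) \right),$$

so $f(x)$ is a polynomial of degree 4 in variables $x_1, \dots, x_n$. Let $f_{(2)}(x)$ be the degree 2 truncation of $f(x)$, i.e.,

$$f_{(2)}(x) = \frac{1}{8} \sum_{i_1, \dots, i_4 \in [n]} \mathbf{T}_{i_1 i_2 i_3 i_4} \left( \sum_{1 \leq s < t \leq 4} x_{i_s} x_{i_t} \right).$$

Here we have ignored the constant term of $f(x)$ since it does not affect the maximizer. For each $\{s < t\} \subseteq \{1, 2, 3, 4\}$, let $Q^{st}$ be the $n$ by $n$ matrix where

$$Q^{st}_{ij} = \sum_{\substack{(i_1, \dots, i_4) \in [n]^4 \\ i_s = i, \ i_t = j}} \mathbf{T}_{i_1 i_2 i_3 i_4}.$$

Then,

$$f_{(2)}(x) = \frac{1}{8} \langle Q, x x^T \rangle$$

where $Q = \sum_{1 \leq s < t \leq 4} Q^{st}$. This $Q$ is analogous to the adjacency matrix of the multigraph constructed above. It is now natural to consider the following SDP:

$$\max \ \langle Q, X \rangle \quad \text{subject to } X_{ii} = 1 \text{ for all } i \in [n], \ \langle X, \mathbf{1}\mathbf{1}^T \rangle = 0, \ X \succeq 0. \qquad (1)$$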
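The truncation step can be made concrete by building $Q$ from a tensor and checking the identity $f_{(2)}(x) = \frac{1}{8}\langle Q, xx^T\rangle$, along with the full decomposition of $f$ into constant, quadratic, and quartic parts on the hypercube. This is a sketch with assumed names and an arbitrary Gaussian test tensor; it does not solve the SDP (1), which would need a semidefinite programming solver.

```python
import random
from itertools import combinations, product

PAIRS = list(combinations(range(4), 2))   # the six pairs s < t


def pair_marginal_matrix(T, n):
    """Q = sum_{s<t} Q^{st}, where Q^{st}[i][j] sums T over tuples with i_s=i, i_t=j."""
    Q = [[0.0] * n for _ in range(n)]
    for alpha, value in T.items():
        for s, t in PAIRS:
            Q[alpha[s]][alpha[t]] += value
    return Q


def f_full(T, x):
    """f(x) = <x^{=4}, T>: sum of T over tuples on which x is constant."""
    return sum(v for alpha, v in T.items() if len({x[i] for i in alpha}) == 1)


def f2_direct(T, x):
    """Degree-2 truncation: (1/8) sum_alpha T_alpha sum_{s<t} x_{alpha_s} x_{alpha_t}."""
    return sum(
        v * sum(x[alpha[s]] * x[alpha[t]] for s, t in PAIRS)
        for alpha, v in T.items()
    ) / 8


def quartic_part(T, x):
    """Degree-4 part dropped by the truncation: (1/8) sum_alpha T_alpha prod_s x_{alpha_s}."""
    return sum(
        v * x[alpha[0]] * x[alpha[1]] * x[alpha[2]] * x[alpha[3]]
        for alpha, v in T.items()
    ) / 8


rng = random.Random(0)                   # fixed seed, illustration only
n = 5                                    # assumed small size
T = {alpha: rng.gauss(0.0, 1.0) for alpha in product(range(n), repeat=4)}
x = [1, -1, 1, 1, -1]
Q = pair_marginal_matrix(T, n)
via_Q = sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n)) / 8
constant = sum(T.values()) / 8
```

On $\{\pm 1\}^n$ the odd-degree terms of the two products cancel, which is why only the constant, the quadratic part, and the quartic part remain.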
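Theorem 2. Let $\epsilon > 0$ be a constant not depending on $n$. Let $\hat{Y}$ be a solution of (1) and $p(\hat{Y}; \sigma)$ be the probability that $\hat{Y}$ coincides with $y y^T$. If $\sigma < (1 - \epsilon)\sigma^*_{(2)}$ where

$$\sigma^*_{(2)} = \sqrt{\frac{3}{32}} \cdot \frac{n^{3/2}}{\sqrt{\log n}} = \sqrt{\frac{3}{4}} \cdot \sigma^*,$$

then $p(\hat{Y}; \sigma) = 1 - o_n(1)$.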
We present a sketch of the proof where details are deferred to the appendix. We note that a similar idea was used in [16, 21] for the graph stochastic block model.
We construct a dual solution of (1) which is feasible with high probability, and certifies that $y y^T$ is the unique optimum solution for the primal (1). By complementary slackness, such a dual solution must be of the form $S := D_{Q'} - Q'$ where $Q' = \mathrm{diag}(y)\, Q\, \mathrm{diag}(y)$ and $D_{Q'}$ is $\mathrm{diag}(Q' \mathbf{1})$. It remains to show that $S$ is positive semidefinite with high probability.
To show that $S$ is positive semidefinite, we claim that the second smallest eigenvalue of $\mathbb{E} S$ is $\Theta(n^3)$ and the operator norm $\| S - \mathbb{E} S \|$ is $O(\sigma n^{3/2} \sqrt{\log n})$ with high probability. The first part is just an easy calculation, and the second part is an application of a nonasymptotic bound on Laplacian random matrices [21]. Hence, $S$ is positive semidefinite with high probability if $\sigma n^{3/2} \sqrt{\log n} \lesssim n^3$, matching the order of $\sigma^*_{(2)}$.
4 Standard Spiked Tensor Model
Montanari and Richard proposed a statistical model for tensor Principal Component Analysis [31]. In the model we observe a random tensor $\mathbf{T} = v^{\otimes 4} + \sigma \mathbf{W}$ where $v \in \mathbb{R}^n$ is a vector with $\|v\| = \sqrt{n}$ (the spike), $\mathbf{W}$ is a random 4-tensor with i.i.d. standard Gaussian entries, and $\sigma \geq 0$ is the noise parameter. They showed a nearly-tight information-theoretic threshold for approximate recovery: when $\sigma \gg n^{3/2}$ the recovery is information-theoretically impossible, while if $\sigma \ll n^{3/2}$ then the MLE gives a vector $v' \in \mathbb{R}^n$ with $|v^T v'| = (1 - o_n(1)) n$ with high probability. Subsequently, sharp phase transitions for weak recovery, and strong and weak detection were shown in [32].
Those information-theoretic thresholds are achieved by the MLE, for which no efficient algorithm is known. Montanari and Richard considered a simple spectral algorithm based on tensor unfolding, which is efficient in both theory and practice. They show that the algorithm finds a solution $v'$ with $|v^T v'| = (1 - o_n(1)) n$ with high probability as long as $\sigma = O(n)$ [31]. This is somewhat believed to be unimprovable using semidefinite programming [33, 34], or using approximate message passing algorithms [35].
For clear comparison to the Gaussian planted bisection model, let us consider the case when the spike is in the hypercube $\{\pm 1\}^n$. Let $y \in \{\pm 1\}^n$ and $\sigma \geq 0$. Given a tensor

$$\mathbf{T} = y^{\otimes 4} + \sigma \mathbf{W}$$

where $\mathbf{W}$ is a random 4-tensor with independent, standard Gaussian entries, we would like to recover $y$ exactly. The MLE is given by the maximizer of $g(x) := \langle x^{\otimes 4}, \mathbf{T} \rangle$ over all vectors $x \in \{\pm 1\}^n$.
Theorem 3. Let $\epsilon > 0$ be any constant which does not depend on $n$. Let $\lambda^* = \sqrt{2} \cdot \frac{n^{3/2}}{\sqrt{\log n}}$. When $\sigma > (1 + \epsilon)\lambda^*$, exact recovery is information-theoretically impossible (i.e. the MLE fails with $1 - o_n(1)$ probability), while if $\sigma < (1 - \epsilon)\lambda^*$ then the MLE recovers $y$ with $1 - o_n(1)$ probability.
The proof is just a slight modification of the proof of Theorem 1 which appears in the appendix. Note that both $\lambda^*$ and $\sigma^*$ are of order $n^{3/2} / \sqrt{\log n}$. The standard spiked tensor model and the Gaussian planted bisection model exhibit similar behaviour when unbounded computational resources are given.
4.1 Sum-of-Squares based algorithms
Here we briefly discuss Sum-of-Squares based relaxation algorithms. Given a polynomial $p \in \mathbb{R}[x_1, \dots, x_n]$, consider the problem of finding the maximum of $p(x)$ over $x \in \mathbb{R}^n$ satisfying polynomial equalities $q_1(x) = 0, \dots, q_m(x) = 0$. Most hard combinatorial optimization problems can be reduced into this form, including max-cut, $k$-colorability, and general constraint satisfaction problems. The Sum-of-Squares hierarchy (SoS) is a systematic way to relax a polynomial optimization problem to a sequence of increasingly strong convex programs, each leading to a larger semidefinite program. See [36] for a good exposition of the topic.
There are many different ways to formulate the SoS hierarchy [37, 38, 39, 40]. Here we choose to follow the description based on pseudo-expectation functionals [36].
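For illustration, let us recall the definition of $g(x)$: given a tensor $\mathbf{T} = y^{\otimes 4} + \sigma \mathbf{W}$, we define $g(x) = \langle x^{\otimes 4}, \mathbf{T} \rangle$. Here $g(x)$ is a polynomial of degree 4, and the corresponding maximum-likelihood estimator is the maximizer of

$$\max \ g(x) \quad \text{subject to } x_i^2 = 1 \text{ for all } i \in [n], \ \sum_{i=1}^{n} x_i = 0, \ x \in \mathbb{R}^n. \qquad (2)$$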
Let $\mu$ be a probability distribution over the set $\{x \in \{\pm 1\}^n : \mathbf{1}^T x = 0\}$. We can rewrite (2) as $\max_\mu \mathbb{E}_{x \sim \mu}\, g(x)$ over such distributions. A linear functional $\tilde{\mathbb{E}}$ on $\mathbb{R}[x]$ is called a pseudoexpectation of degree $2\ell$ if it satisfies $\tilde{\mathbb{E}}\, 1 = 1$ and $\tilde{\mathbb{E}}\, q(x)^2 \geq 0$ for any $q \in \mathbb{R}[x]$ of degree at most $\ell$. We note that any expectation $\mathbb{E}_\mu$ is a pseudoexpectation, but the converse is not true. So (2) can be relaxed to

$$\max \ \tilde{\mathbb{E}}\, g(x) \quad \text{subject to } \tilde{\mathbb{E}} \text{ is a pseudoexpectation of degree } 2\ell, \ \tilde{\mathbb{E}} \text{ is zero on } I \qquad (3)$$

where $I \subseteq \mathbb{R}[x]$ is the ideal generated by $\sum_{i=1}^{n} x_i$ and $\{x_i^2 - 1\}_{i \in [n]}$.
The space of pseudoexpectations is a convex set which can be described as an affine section of the semidefinite cone. As $\ell$ increases, this space gets smaller and in this particular case, it coincides with the set of true expectations when $\ell = n$.
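A degree-2 pseudoexpectation is determined by its values on monomials of degree at most 2, which can be arranged in a moment matrix indexed by $1, x_1, \dots, x_n$; the condition $\tilde{\mathbb{E}}\, q(x)^2 \geq 0$ for affine $q$ is then a quadratic form in the coefficients of $q$. The sketch below builds this matrix for the true expectation over the uniform distribution on $\{y, -y\}$ and spot-checks the defining conditions on random $q$. The names and the test vector are assumptions, and random spot checks illustrate the conditions without certifying positive semidefiniteness.

```python
import random


def moment_matrix_pm_y(y):
    """Degree-2 moment matrix of the uniform distribution on {y, -y},
    indexed by the monomials 1, x_1, ..., x_n."""
    n = len(y)
    M = [[0.0] * (n + 1) for _ in range(n + 1)]
    M[0][0] = 1.0                                   # E[1] = 1
    for i in range(n):
        for j in range(n):
            M[i + 1][j + 1] = float(y[i] * y[j])    # E[x_i x_j] = y_i y_j
    # E[x_i] = 0 because y and -y are equally likely, so row/column 0 stays 0.
    return M


def expectation_of_square(M, coeffs):
    """E[q(x)^2] for q(x) = coeffs[0] + sum_i coeffs[i+1] * x_i."""
    m = len(coeffs)
    return sum(coeffs[a] * M[a][b] * coeffs[b] for a in range(m) for b in range(m))


rng = random.Random(0)                  # fixed seed, illustration only
y = [1, -1, 1, -1, -1, 1]               # assumed balanced vector
M = moment_matrix_pm_y(y)
n = len(y)

# Ideal constraints at degree 2: E[x_i^2] = 1 and E[(sum_i x_i) * x_j] = 0.
diagonal_ok = all(M[i + 1][i + 1] == 1.0 for i in range(n))
balance_ok = all(sum(M[i + 1][j + 1] for i in range(n)) == 0.0 for j in range(n))

# Spot-check nonnegativity of E[q^2] on random affine polynomials q.
squares = [
    expectation_of_square(M, [rng.uniform(-1, 1) for _ in range(n + 1)])
    for _ in range(100)
]
```

For this particular matrix, $\mathbb{E}\, q(x)^2 = c_0^2 + (\sum_i c_i y_i)^2$, so nonnegativity holds exactly; a general pseudoexpectation need not come from any distribution.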
4.2 SoS on spiked tensor models
We would like to apply the SoS algorithm to the spiked tensor model. In particular, let us consider the degree 4 SoS relaxation of the maximum-likelihood problem. We note that for the spiked tensor model with the spherical prior, it is known that neither algorithms using tensor unfolding [31], the degree 4 SoS relaxation of the MLE [33], nor approximate message passing [35] can achieve the statistical threshold; therefore a statistical-computational gap is believed to be present. Moreover, higher degree SoS relaxations were considered in [34]: for any small $\epsilon \geq 0$, degree at least $n^{\Omega(\epsilon)}$ is required in order to achieve recovery when $\sigma \approx n^{1 + \epsilon}$.
We show an analogous gap of the degree 4 SoS relaxation for the prior $x \in \{\pm 1\}^n$ with $\mathbf{1}^T x = 0$.
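Theorem 4. Let $\mathbf{T} = y^{\otimes 4} + \sigma \mathbf{W}$ be as defined above. Let $g(x) = \langle x^{\otimes 4}, \mathbf{T} \rangle$. If $\sigma \lesssim \frac{n}{\log^{\Theta(1)} n}$, then the degree 4 SoS relaxation of $\max_x g(x)$ gives the solution $y$, i.e., $\tilde{\mathbb{E}}\, g(x)$ is maximized when $\tilde{\mathbb{E}}$ is the expectation operator of the uniform distribution on $\{y, -y\}$.
On the other hand, if $\sigma \gtrsim n \log^{\Theta(1)} n$ then there exists a pseudoexpectation $\tilde{\mathbb{E}}$ of degree 4 on the hypercube $\{\pm 1\}^n$ satisfying $\mathbf{1}^T x = 0$ such that

$$g(y) < \max_{\tilde{\mathbb{E}}} \tilde{\mathbb{E}}\, g(x),$$

so $y$ is not recovered via the degree 4 SoS relaxation.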
The proof of Theorem 4 is very similar to one that appears in [33, 34]. In the proof it is crucial to observe that

$$\frac{n^3}{\log^{\Theta(1)} n} \lesssim \max_{\tilde{\mathbb{E}} : \text{ degree 4 pseudo-exp.}} \tilde{\mathbb{E}} \langle \mathbf{W}, x^{\otimes 4} \rangle \lesssim n^3 \log^{\Theta(1)} n.$$

The upper bound can be shown via the Cauchy-Schwarz inequality for pseudoexpectations, and the lower bound, considerably more involved, can be shown by constructing a pseudoexpectation $\tilde{\mathbb{E}}$ which is highly correlated with the entries of $\mathbf{W}$. We refer the readers to the appendix for details.
4.3 Comparison with the planted bisection model
Here we summarize the statistical-computational thresholds of two models, the planted bisection model and the spiked tensor model.
• For the planted bisection model, there is a constant $c > 0$ not depending on $n$ such that for any $\epsilon > 0$ if $\sigma < (c - \epsilon) \frac{n^{3/2}}{\sqrt{\log n}}$ then recovery is information-theoretically possible, and if $\sigma > (c + \epsilon) \frac{n^{3/2}}{\sqrt{\log n}}$ then the recovery is impossible via any algorithm. Moreover, there is an efficient algorithm and a constant $c' < c$ such that the algorithm recovers $y$ with high probability when $\sigma < c' \frac{n^{3/2}}{\sqrt{\log n}}$.
• For the spiked tensor model, there is a constant $C > 0$ not depending on $n$ such that for any $\epsilon > 0$ if $\sigma < (C - \epsilon) \frac{n^{3/2}}{\sqrt{\log n}}$ then the recovery is information-theoretically possible, and if $\sigma > (C + \epsilon) \frac{n^{3/2}}{\sqrt{\log n}}$ then the recovery is impossible. In contrast to the planted bisection model, all efficient algorithms known so far only successfully recover $y$ in the asymptotic regime of $\sigma \lesssim n$.
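The constants in the two bullets can be read off from Theorems 1, 2, and 3 (taking $c = \sqrt{1/8}$, $c' = \sqrt{3/32}$, and $C = \sqrt{2}$ as stated there for $k = 4$). The short sketch below compares the resulting thresholds numerically; the function names and the choice of $n$ are assumptions for illustration.

```python
import math


def sigma_star(n):
    """Theorem 1: information-theoretic threshold, planted bisection model."""
    return math.sqrt(1 / 8) * n ** 1.5 / math.sqrt(math.log(n))


def sigma_2_star(n):
    """Theorem 2: guarantee for the SDP on the degree-2 truncation."""
    return math.sqrt(3 / 32) * n ** 1.5 / math.sqrt(math.log(n))


def lambda_star(n):
    """Theorem 3: information-theoretic threshold, spiked tensor model."""
    return math.sqrt(2) * n ** 1.5 / math.sqrt(math.log(n))


n = 10_000                                        # assumed problem size
ratio_sdp = sigma_2_star(n) / sigma_star(n)       # constant gap, sqrt(3/4)
ratio_models = lambda_star(n) / sigma_star(n)     # constant factor between models
# Efficient algorithms for the spiked model are only known for sigma up to
# order n (ignoring constants and logarithmic factors), so the gap to
# lambda_star grows like sqrt(n / log n).
gap_spiked = lambda_star(n) / n
```

The first ratio is independent of $n$, reflecting a constant-factor loss for the planted bisection model, while `gap_spiked` grows with $n$, reflecting the polynomial gap for the spiked tensor model.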
We note that efficient, nearly-optimal recovery is achieved for the planted bisection model by “forgetting” higher moments of the data. On the other hand, such an approach is unsuitable for the spiked tensor model since the signal $y^{\otimes 4}$ does not seem to be approximable by a non-trivial low-degree polynomial. This might shed light on an interesting phenomenon in average-case complexity, as in this case it seems to be crucial that second moments (or pairwise relations) carry information.
All information-theoretic phase transitions exhibited in the paper readily generalize to $k$-tensor models, and this will be discussed in a future publication. In future work we will also investigate the performance of higher degree SoS algorithms for the planted bisection model.
References
[1] A. Goldenberg, A. X. Zheng, S. E. Fienberg, E. M. Airoldi et al., “A survey of statistical network models,” Foundations and Trends® in Machine Learning, vol. 2, no. 2, pp. 129–233, 2010.
[2] S. Fortunato, “Community detection in graphs,” Physics reports, vol. 486, no. 3, pp. 75–174, 2010.
[3] M. E. Newman, D. J. Watts, and S. H. Strogatz, “Random graph models of social networks,” Proceedings of the National Academy of Sciences, vol. 99, no. suppl 1, pp. 2566–2572, 2002.
[4] E. M. Marcotte, M. Pellegrini, H.-L. Ng, D. W. Rice, T. O. Yeates, and D. Eisenberg, “Detecting protein function and protein-protein interactions from genome sequences,” Science, vol. 285, no. 5428, pp. 751–753, 1999.
[5] J. Chen and B. Yuan, “Detecting functional modules in the yeast protein–protein interaction network,” Bioinformatics, vol. 22, no. 18, pp. 2283–2290, 2006.
[6] I. Cabreros, E. Abbe, and A. Tsirigos, “Detecting community structures in hi-c genomic data,” in Information Science and Systems (CISS), 2016 Annual Conference on. IEEE, 2016, pp. 584–589.
[7] G. Linden, B. Smith, and J. York, “Amazon.com recommendations: Item-to-item collaborative filtering,” IEEE Internet computing, vol. 7, no. 1, pp. 76–80, 2003.
[8] S. Sahebi and W. W. Cohen, “Community-based recommendations: a solution to the cold start problem,” in Workshop on recommender systems and the social web, RSWEB, 2011.
[9] R. Wu, J. Xu, R. Srikant, L. Massoulié, M. Lelarge, and B. Hajek, “Clustering and inference from pairwise comparisons,” in ACM SIGMETRICS Performance Evaluation Review, vol. 43, no. 1. ACM, 2015, pp. 449–450.
[10] P. W. Holland, K. B. Laskey, and S. Leinhardt, “Stochastic blockmodels: First steps,” Social networks, vol. 5, no. 2, pp. 109–137, 1983.
[11] E. Mossel, J. Neeman, and A. Sly, “A proof of the block model threshold conjecture,” arXiv preprint arXiv:1311.4115, 2013.
[12] E. Abbe and C. Sandon, “Community detection in general stochastic block models: Fundamental limits and efficient algorithms for recovery,” in Foundations of Computer Science (FOCS), 2015 IEEE 56th Annual Symposium on. IEEE, 2015, pp. 670–688.
[13] E. Abbe, A. S. Bandeira, and G. Hall, “Exact recovery in the stochastic block model,” IEEE Transactions on Information Theory, vol. 62, no. 1, pp. 471–487, 2016.
[14] Y. Chen and J. Xu, “Statistical-computational tradeoffs in planted problems and submatrix localization with a growing number of clusters and submatrices,” Journal of Machine Learning Research, vol. 17, no. 27, pp. 1–57, 2016.
[15] E. Abbe and C. Sandon, “Recovering communities in the general stochastic block model without knowing the parameters,” in Advances in neural information processing systems, 2015, pp. 676–684.
[16] B. Hajek, Y. Wu, and J. Xu, “Achieving exact cluster recovery threshold via semidefinite programming,” IEEE Transactions on Information Theory, vol. 62, no. 5, pp. 2788–2797, 2016.
[17] L. Massoulié, “Community detection thresholds and the weak ramanujan property,” in Proceedings of the 46th Annual ACM Symposium on Theory of Computing. ACM, 2014, pp. 694–703.
[18] V. A. N. Vu, “A simple SVD algorithm for finding hidden partitions,” pp. 1–12, 2014. [Online]. Available: http://arxiv.org/abs/1404.3918v1
[19] E. Abbe and C. Sandon, “Detection in the stochastic block model with multiple clusters: proof of the achievability conjectures, acyclic bp, and the information-computation gap,” arXiv preprint arXiv:1512.09080, 2015.
[20] E. Abbe, “Community detection and the stochastic block model: recent developments,” 2016.
[21] A. S. Bandeira, “Random Laplacian Matrices and Convex Relaxations,” pp. 1–35, 2016.
[22] A. Javanmard, A. Montanari, and F. Ricci-Tersenghi, “Phase transitions in semidefinite relaxations,” Proceedings of the National Academy of Sciences, vol. 113, no. 16, pp. E2218–E2223, 2016.
[23] D. Féral and S. Péché, “The largest eigenvalue of rank one deformation of large wigner matrices,” Communications in mathematical physics, vol. 272, no. 1, pp. 185–228, 2007.
[24] A. Montanari and S. Sen, “Semidefinite programs on sparse random graphs and their application to community detection,” arXiv preprint arXiv:1504.05910, 2015.
[25] A. Perry, A. S. Wein, A. S. Bandeira, and A. Moitra, “Optimality and sub-optimality of pca for spiked random matrices and synchronization,” arXiv preprint arXiv:1609.05573, 2016.
[26] S. Agarwal, J. Lim, L. Zelnik-Manor, P. Perona, D. Kriegman, and S. Belongie, “Beyond pairwise clustering,” in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 2. IEEE, 2005, pp. 838–845.
[27] D. Zhou, J. Huang, and B. Schölkopf, “Learning with hypergraphs: Clustering, classification, and embedding,” in NIPS, vol. 19, 2006, pp. 1633–1640.
[28] D. Ghoshdastidar and A. Dukkipati, “Consistency of spectral partitioning of uniform hypergraphs under planted partition model,” in Advances in Neural Information Processing Systems, 2014, pp. 397–405.
[29] L. Florescu and W. Perkins, “Spectral thresholds in the bipartite stochastic block model,” arXiv preprint arXiv:1506.06737, 2015.
[30] K. Ahn, K. Lee, and C. Suh, “Community recovery in hypergraphs,” in Allerton Conference on Communication, Control and Computing. UIUC, 2016.
[31] A. Montanari and E. Richard, “A statistical model for tensor PCA,” p. 30, nov 2014. [Online]. Available: http://arxiv.org/abs/1411.1076
[32] A. Perry, A. S. Wein, and A. S. Bandeira, “Statistical limits of spiked tensor models,” p. 39, dec 2016. [Online]. Available: http://arxiv.org/abs/1612.07728
[33] S. Hopkins, “Tensor principal component analysis via sum-of-squares proofs,” arXiv:1507.03269, vol. 40, pp. 1–51, 2015.
[34] V. Bhattiprolu, V. Guruswami, and E. Lee, “Certifying Random Polynomials over the Unit Sphere via Sum of Squares Hierarchy,” 2016. [Online]. Available: http://arxiv.org/abs/1605.00903
[35] T. Lesieur, L. Miolane, M. Lelarge, F. Krzakala, and L. Zdeborová, “Statistical and computational phase transitions in spiked tensor estimation,” arXiv preprint arXiv:1701.08010, 2017.
[36] B. Barak and D. Steurer, “Sum-of-squares proofs and the quest toward optimal algorithms,” in Proceedings of the International Congress of Mathematicians, 2014.
[37] N. Z. Shor, “An approach to obtaining global extremums in polynomial mathematical programming problems,” Cybernetics, vol. 23, no. 5, pp. 695–700, 1987.
[38] P. A. Parrilo, “Structured semidefinite programs and semialgebraic geometry methods in robustness and optimization,” Ph.D. dissertation, California Institute of Technology, 2000.
[39] Y. Nesterov, “Squared functional systems and optimization problems,” in High performance optimization. Springer, 2000, pp. 405–440.
[40] J. B. Lasserre, “Global optimization with polynomials and the problem of moments,” SIAM Journal on Optimization, vol. 11, no. 3, pp. 796–817, 2001.
[41] P. Terwilliger, “The subconstituent algebra of an association scheme, (part i),” Journal of Algebraic Combinatorics, vol. 1, no. 4, pp. 363–388, 1992.
[42] A. Schrijver, “New code upper bounds from the terwilliger algebra and semidefinite programming,” IEEE Transactions on Information Theory, vol. 51, no. 8, pp. 2859–2866, 2005.
[43] J. A. Tropp, “User-friendly tail bounds for sums of random matrices,” Foundations of computational mathematics, vol. 12, no. 4, pp. 389–434, 2012.
A Proof of Theorem 1
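In this section, we prove Theorem 1 for general $k$.
Theorem 5 (Theorem 1, general $k$). Let $k$ be a positive integer with $k \geq 2$. Let $y \in \{\pm 1\}^n$ with $\mathbf{1}^T y = 0$ and $\mathbf{T}$ be a $k$-tensor defined as $\mathbf{T} = y^{\circledequal k} + \sigma \mathbf{W}$ where $\mathbf{W}$ is a random $k$-tensor with i.i.d. standard Gaussian entries. Let $\hat{y}_{ML}$ be the maximum-likelihood estimator of $y$, i.e.,

$$\hat{y}_{ML} = \operatorname*{argmax}_{x \in \{\pm 1\}^n : \mathbf{1}^T x = 0} \langle \mathbf{T}, x^{\circledequal k} \rangle.$$

For any positive $\epsilon$,
(i) $\hat{y}_{ML}$ is equal to $y$ with probability $1 - o_n(1)$ if $\sigma < (1 - \epsilon)\sigma^*$, and
(ii) $\hat{y}_{ML}$ is not equal to $y$ with probability $1 - o_n(1)$ if $\sigma > (1 + \epsilon)\sigma^*$,
where

$$\sigma^* = \sqrt{\frac{k}{2^k}} \cdot \frac{n^{\frac{k-1}{2}}}{\sqrt{2 \log n}}.$$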
First we prove the following lemma.
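Lemma 6. Let $\phi$ be a function defined as $\phi(t) = \frac{1}{2^{2k-1}} \left( (1 - t)^k + (1 + t)^k \right)$. Then,

$$\langle x^{\circledequal k}, y^{\circledequal k} \rangle = n^k \phi\left( \frac{x^T y}{n} \right)$$

for any $x, y \in \{\pm 1\}^n$ such that $\mathbf{1}^T x = \mathbf{1}^T y = 0$.
Proof. Note that $x^{\circledequal k} = \frac{1}{2^k} \left( (\mathbf{1} + x)^{\otimes k} + (\mathbf{1} - x)^{\otimes k} \right)$. Hence,

$$\langle x^{\circledequal k}, y^{\circledequal k} \rangle = \frac{1}{2^{2k}} \sum_{s, t \in \{\pm 1\}} \langle \mathbf{1} + s x, \mathbf{1} + t y \rangle^k.$$

Since $\mathbf{1}^T x = \mathbf{1}^T y = 0$, we have

$$\langle x^{\circledequal k}, y^{\circledequal k} \rangle = \frac{1}{2^{2k}} \left( 2 (\mathbf{1}^T \mathbf{1} + x^T y)^k + 2 (\mathbf{1}^T \mathbf{1} - x^T y)^k \right) = \frac{n^k}{2^{2k-1}} \left( \left( 1 + \frac{x^T y}{n} \right)^k + \left( 1 - \frac{x^T y}{n} \right)^k \right) = n^k \phi\left( \frac{x^T y}{n} \right)$$

as desired.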
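Lemma 6 can be sanity-checked by brute force for small $n$ and $k$: the inner product $\langle x^{\circledequal k}, y^{\circledequal k} \rangle$ simply counts index tuples on which both $x$ and $y$ are constant. The sketch below compares this count with $n^k \phi(x^T y / n)$; the function names and test vectors are assumptions chosen for the example.

```python
from itertools import product


def phi(t, k):
    """phi(t) = ((1 - t)^k + (1 + t)^k) / 2^(2k - 1), as in Lemma 6."""
    return ((1 - t) ** k + (1 + t) ** k) / 2 ** (2 * k - 1)


def inner_product_bruteforce(x, y, k):
    """Count tuples alpha in [n]^k on which both x and y are constant."""
    n = len(x)
    count = 0
    for alpha in product(range(n), repeat=k):
        x_constant = len({x[i] for i in alpha}) == 1
        y_constant = len({y[i] for i in alpha}) == 1
        if x_constant and y_constant:
            count += 1
    return count


def lemma6_rhs(x, y, k):
    """n^k * phi(x^T y / n)."""
    n = len(x)
    overlap = sum(a * b for a, b in zip(x, y))
    return n ** k * phi(overlap / n, k)


# Assumed balanced test vectors with n = 6.
x = (1, 1, 1, -1, -1, -1)
y = (1, 1, -1, 1, -1, -1)
checks = [
    (k, inner_product_bruteforce(x, y, k), lemma6_rhs(x, y, k))
    for k in (2, 3, 4)
]
```

For the pair above and $k = 3$, both sides equal 18, matching the count $2^3 + 1 + 1 + 2^3$ obtained from the four agreement classes of sizes 2, 1, 1, 2.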
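Proof of Theorem 5. Let $f(x) = \langle \mathbf{T}, x^{\circledequal k} \rangle$.
By definition, $\hat{y}_{ML}$ is not equal to $y$ if there exists $x \in \{\pm 1\}^n$ distinct from $y$ or $-y$ such that $\mathbf{1}^T x = 0$ and $f(x)$ is greater than or equal to $f(y)$. For each fixed $x \in \{\pm 1\}^n$ with $\mathbf{1}^T x = 0$, note that

$$f(x) - f(y) = \langle y^{\circledequal k}, x^{\circledequal k} - y^{\circledequal k} \rangle + \sigma \langle \mathbf{W}, x^{\circledequal k} - y^{\circledequal k} \rangle$$

is a Gaussian random variable with mean $-n^k \left( \phi(1) - \phi(x^T y / n) \right)$ and variance

$$\sigma^2 \| x^{\circledequal k} - y^{\circledequal k} \|_F^2 = 2 \sigma^2 n^k \left( \phi(1) - \phi(x^T y / n) \right).$$

Hence, $\Pr(f(x) - f(y) \geq 0)$ is equal to

$$\Pr_{G \sim \mathcal{N}(0,1)} \left( G \geq \frac{n^{k/2}}{\sigma \sqrt{2}} \cdot \sqrt{\phi(1) - \phi\left( \frac{x^T y}{n} \right)} \right).$$

Let $\sigma^*$ be

$$\sigma^* = \sqrt{\phi'(1)} \cdot \frac{n^{\frac{k-1}{2}}}{\sqrt{2 \log n}}.$$

Since $\phi'(1) = \frac{k}{2^k}$, it matches the definition in the statement of the theorem.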
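Upper bound. Let us prove that $p(\hat{y}_{ML}; \sigma) = 1 - o_n(1)$ if $\sigma < (1 - \epsilon)\sigma^*$. By the union bound, we have

$$1 - p(\hat{y}_{ML}; \sigma) \leq \sum_{\substack{x \in \{\pm 1\}^n \setminus \{y, -y\} \\ \mathbf{1}^T x = 0}} \Pr(f(x) - f(y) \geq 0) = \sum_{\substack{x \in \{\pm 1\}^n \setminus \{y, -y\} \\ \mathbf{1}^T x = 0}} \Pr_{G \sim \mathcal{N}(0,1)} \left( G \geq \frac{n^{k/2}}{\sigma \sqrt{2}} \cdot \sqrt{\phi(1) - \phi\left( \frac{x^T y}{n} \right)} \right) \leq \sum_{\substack{x \in \{\pm 1\}^n \setminus \{y, -y\} \\ \mathbf{1}^T x = 0}} \exp\left( -\frac{n^k}{4 \sigma^2} \left( \phi(1) - \phi\left( \frac{x^T y}{n} \right) \right) \right).$$

The last inequality follows from the standard Gaussian tail bound $\Pr_G(G > t) \leq \exp(-t^2 / 2)$.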
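A simple counting argument shows that the number of $x \in \{\pm 1\}^n$ with $\mathbf{1}^T x = 0$ and $x^T y = n - 4r$ is exactly $\binom{n/2}{r}^2$ for $r \in \{0, 1, \dots, n/2\}$. Moreover, for any $t \geq 0$ we have $\phi(t) = \phi(-t)$. Hence,

$$1 - p(\hat{y}_{ML}; \sigma) \leq 2 \sum_{r=1}^{\lceil n/4 \rceil} \binom{n/2}{r}^2 \exp\left( -\frac{n^k}{4 \sigma^2} \left( \phi(1) - \phi\left( 1 - \frac{4r}{n} \right) \right) \right).$$

Note that $\phi(1) - \phi(1 - x) \geq \phi'(1) x - O(x^2)$. Hence, there exists an absolute constant $C > 0$ such that $\phi(1) - \phi(1 - 4r/n)$ is at least $(1 - \epsilon) \phi'(1) \cdot 4r/n$ if $r < Cn$ and is at least $\Omega(1)$ otherwise. Since $\sigma < (1 - \epsilon)\sigma^*$ we have

$$-\frac{n^k}{4 \sigma^2} \left( \phi(1) - \phi\left( 1 - \frac{4r}{n} \right) \right) \leq -\frac{2 r \log n}{1 - \epsilon} \leq -2 (1 + \epsilon) r \log n$$

if $r < Cn$, and $-\frac{n^k}{4 \sigma^2} \left( \phi(1) - \phi\left( 1 - \frac{4r}{n} \right) \right) = -\Omega(n \log n)$ otherwise. It implies that

$$1 - p(\hat{y}_{ML}; \sigma) \leq 2 \left( \sum_{r=1}^{Cn} \exp(-2 \epsilon r \log n) + n \exp(-\Omega(n \log n)) \right) \lesssim n^{-2\epsilon} + n \exp(-\Omega(n \log n)) = o_n(1).$$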
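The counting step used in the upper bound, that exactly $\binom{n/2}{r}^2$ balanced vectors $x$ satisfy $x^T y = n - 4r$, can be verified by enumeration for a small $n$. The following sketch uses assumed names and $n = 8$.

```python
from collections import Counter
from itertools import combinations
from math import comb


def overlap_histogram(y):
    """Histogram of x^T y over all balanced x in {+1, -1}^n."""
    n = len(y)
    hist = Counter()
    for positives in combinations(range(n), n // 2):
        chosen = set(positives)
        x = [1 if i in chosen else -1 for i in range(n)]
        hist[sum(a * b for a, b in zip(x, y))] += 1
    return hist


n = 8                                   # assumed small size
y = [1] * (n // 2) + [-1] * (n // 2)    # balanced reference vector
hist = overlap_histogram(y)
predicted = {n - 4 * r: comb(n // 2, r) ** 2 for r in range(n // 2 + 1)}
```

The count arises because a balanced $x$ at distance $r$ from $y$ must flip $r$ of the $n/2$ positive coordinates and $r$ of the $n/2$ negative coordinates, and these choices are independent.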
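Lower bound. Now we prove that $p(\hat{y}_{ML}; \sigma) = o_n(1)$ if $\sigma > (1 + \epsilon)\sigma^*$. Let $A = \{i \in [n] : y_i = +1\}$ and $B = [n] \setminus A$. For each $a \in A$ and $b \in B$, let $E_{ab}$ be the event that $f(y^{(ab)})$ is greater than $f(y)$, where $y^{(ab)}$ is the $\pm 1$-vector obtained by flipping the signs of $y_a$ and $y_b$. For any $H_A \subseteq A$ and $H_B \subseteq B$, note that

$$1 - p(\hat{y}_{ML}; \sigma) \geq \Pr\left( \bigcup_{a \in H_A, b \in H_B} E_{ab} \right) = \Pr\left( \max_{a \in H_A, b \in H_B} \left( f(y^{(ab)}) - f(y) \right) > 0 \right)$$

since any of the events $E_{ab}$ implies that $\hat{y}_{ML} \neq y$. Recall that

$$f(y^{(ab)}) - f(y) = -n^k \left( \phi(1) - \phi\left( 1 - \frac{4}{n} \right) \right) + \sigma \left\langle \mathbf{W}, (y^{(ab)})^{\circledequal k} - y^{\circledequal k} \right\rangle.$$

So, $1 - p(\hat{y}_{ML}; \sigma)$ is at least

$$\Pr\left( \max_{\substack{a \in H_A \\ b \in H_B}} \left\langle \mathbf{W}, (y^{(ab)})^{\circledequal k} - y^{\circledequal k} \right\rangle > \frac{n^k}{\sigma} \left( \phi(1) - \phi\left( 1 - \frac{4}{n} \right) \right) \right).$$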
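Fix $H_A \subseteq A$ and $H_B \subseteq B$ with $|H_A| = |H_B| = h$ where $h = \frac{n}{\log^2 n}$. Let $(\mathcal{X}, \mathcal{Y}, \mathcal{Z})$ be the partition of $[n]^k$ defined as

$$\mathcal{X} = \{\alpha \in [n]^k : \alpha^{-1}(H_A \cup H_B) = \emptyset\},$$
$$\mathcal{Y} = \{\alpha \in [n]^k : |\alpha^{-1}(H_A \cup H_B)| = 1\},$$
$$\mathcal{Z} = \{\alpha \in [n]^k : |\alpha^{-1}(H_A \cup H_B)| \geq 2\}.$$

Let $\mathbf{W}_{\mathcal{X}}$, $\mathbf{W}_{\mathcal{Y}}$ and $\mathbf{W}_{\mathcal{Z}}$ be the $k$-tensors supported on $\mathcal{X}$, $\mathcal{Y}$, $\mathcal{Z}$ respectively. For each $a \in H_A$ and $b \in H_B$, let

$$X_{ab} = \left\langle \mathbf{W}_{\mathcal{X}}, (y^{(ab)})^{\circledequal k} - y^{\circledequal k} \right\rangle, \quad Y_{ab} = \left\langle \mathbf{W}_{\mathcal{Y}}, (y^{(ab)})^{\circledequal k} - y^{\circledequal k} \right\rangle, \quad Z_{ab} = \left\langle \mathbf{W}_{\mathcal{Z}}, (y^{(ab)})^{\circledequal k} - y^{\circledequal k} \right\rangle.$$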
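Claim.
(i) $X_{ab} = 0$ for any $a \in H_A$ and $b \in H_B$.
(ii) For fixed $a \in H_A$ and $b \in H_B$, the variables $Y_{ab}$ and $Z_{ab}$ are independent.
(iii) Each $Y_{ab}$ can be decomposed into $Y_a + Y_b$ where $\{Y_a\}_{a \in H_A} \cup \{Y_b\}_{b \in H_B}$ is a collection of i.i.d. Gaussian random variables.
Proof of Claim. First note that $\left( (y^{(ab)})^{\circledequal k} - y^{\circledequal k} \right)_\alpha$ is non-zero if and only if $|\alpha^{-1}(a)|$ and $|\alpha^{-1}(b)|$ have the same parity. This implies (i) since when $\alpha \in \mathcal{X}$ we have $|\alpha^{-1}(a)| = |\alpha^{-1}(b)| = 0$. (ii) holds because $\mathcal{Y} \cap \mathcal{Z} = \emptyset$.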
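For $s \in H_A \cup H_B$, let $\mathcal{Y}_s$ be the subset of $\mathcal{Y}$ such that

$$\mathcal{Y}_s = \{\alpha \in \mathcal{Y} : |\alpha^{-1}(s)| = 1\}.$$

By definition, the sets $\mathcal{Y}_s$ are disjoint and $\mathcal{Y} = \bigcup_{s \in H_A \cup H_B} \mathcal{Y}_s$. Hence,

$$Y_{ab} = \sum_{s \in H_A \cup H_B} \left\langle \mathbf{W}_{\mathcal{Y}_s}, (y^{(ab)})^{\circledequal k} - y^{\circledequal k} \right\rangle.$$

Moreover, $\left\langle \mathbf{W}_{\mathcal{Y}_s}, (y^{(ab)})^{\circledequal k} - y^{\circledequal k} \right\rangle$ is zero when $s \notin \{a, b\}$. So,

$$Y_{ab} = \sum_{\alpha \in \mathcal{Y}_a \cup \mathcal{Y}_b} \mathbf{W}_\alpha \left( (y^{(ab)})^{\circledequal k} - y^{\circledequal k} \right)_\alpha.$$

Note that for $\alpha \in \mathcal{Y}_a$,

$$\left( (y^{(ab)})^{\circledequal k} - y^{\circledequal k} \right)_\alpha = \begin{cases} +1 & \text{if } |\alpha^{-1}(A \setminus H_A)| = 0 \\ -1 & \text{if } |\alpha^{-1}(B \setminus H_B)| = 0 \\ 0 & \text{otherwise.} \end{cases}$$

So,

$$\sum_{\alpha \in \mathcal{Y}_a} \mathbf{W}_\alpha \left( (y^{(ab)})^{\circledequal k} - y^{\circledequal k} \right)_\alpha = \sum_{\substack{\alpha \in \mathcal{Y}_a \\ |\alpha^{-1}(A \setminus H_A)| = 0}} \mathbf{W}_\alpha - \sum_{\substack{\alpha \in \mathcal{Y}_a \\ |\alpha^{-1}(B \setminus H_B)| = 0}} \mathbf{W}_\alpha,$$

which does not depend on the choice of $b$. Let

$$Y_a = \sum_{\substack{\alpha \in \mathcal{Y}_a \\ |\alpha^{-1}(A \setminus H_A)| = 0}} \mathbf{W}_\alpha - \sum_{\substack{\alpha \in \mathcal{Y}_a \\ |\alpha^{-1}(B \setminus H_B)| = 0}} \mathbf{W}_\alpha$$

and define $Y_b$ analogously. Then $Y_{ab} = Y_a + Y_b$ and $\{Y_s\}_{s \in H_A \cup H_B}$ is a collection of independent Gaussian random variables. Moreover, the variance of $Y_s$ is equal to $2k \left( \frac{n}{2} - h \right)^{k-1}$, independent of the choice of $s$.
By the claim, we have $\left\langle \mathbf{W}, (y^{(ab)})^{\circledequal k} - y^{\circledequal k} \right\rangle = Y_a + Y_b + Z_{ab}$. Moreover,

$$\max_{\substack{a \in H_A \\ b \in H_B}} (Y_a + Y_b + Z_{ab}) \geq \max_{\substack{a \in H_A \\ b \in H_B}} (Y_a + Y_b) - \max_{\substack{a \in H_A \\ b \in H_B}} (-Z_{ab}) = \max_{a \in H_A} Y_a + \max_{b \in H_B} Y_b - \max_{\substack{a \in H_A \\ b \in H_B}} (-Z_{ab}).$$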
We need the following tail bound on the maximum of Gaussian random variables.
Lemma 7. Let $G_1, \dots, G_N$ be (not necessarily independent) Gaussian random variables with variance 1. Let $\epsilon > 0$ be a constant which does not depend on $N$. Then,

$$\Pr\left( \max_{i = 1, \dots, N} G_i > (1 + \epsilon) \sqrt{2 \log N} \right) \leq N^{-\epsilon}.$$

On the other hand,

$$\Pr\left( \max_{i = 1, \dots, N} G_i < (1 - \epsilon) \sqrt{2 \log N} \right) \leq \exp(-N^{\Omega(\epsilon)})$$

if the $G_i$'s are independent.
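The concentration of the maximum of independent standard Gaussians around $\sqrt{2 \log N}$ can be observed in a quick simulation. This is only an empirical illustration under assumed parameters ($N = 10^4$, 20 trials, a fixed seed); at moderate $N$ the maximum typically sits somewhat below $\sqrt{2 \log N}$ because of lower-order corrections, which is consistent with the $(1 \pm \epsilon)$ slack in the lemma.

```python
import math
import random


def max_of_gaussians(N, rng):
    """Maximum of N independent standard Gaussian samples."""
    return max(rng.gauss(0.0, 1.0) for _ in range(N))


rng = random.Random(0)                  # fixed seed, illustration only
N = 10_000                              # assumed number of variables
scale = math.sqrt(2 * math.log(N))      # the sqrt(2 log N) benchmark
ratios = [max_of_gaussians(N, rng) / scale for _ in range(20)]
mean_ratio = sum(ratios) / len(ratios)
```

Increasing `N` should move `mean_ratio` slowly toward 1, although convergence is only logarithmic in $N$.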
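By the Lemma, we have

$$\max_{a \in H_A} Y_a \geq (1 - 0.01\epsilon) \sqrt{2 \log h \cdot 2k \left( \frac{n}{2} - h \right)^{k-1}},$$
$$\max_{b \in H_B} Y_b \geq (1 - 0.01\epsilon) \sqrt{2 \log h \cdot 2k \left( \frac{n}{2} - h \right)^{k-1}},$$
$$\max_{\substack{a \in H_A \\ b \in H_B}} Z_{ab} \lesssim \sqrt{\log h \cdot \max \mathrm{Var}(Z_{ab})}$$

with probability $1 - o_n(1)$. Note that

$$\mathrm{Var}(Z_{ab}) = \left\| (y^{(ab)})^{\circledequal k} - y^{\circledequal k} \right\|_F^2 - \left( \mathrm{Var}(Y_a) + \mathrm{Var}(Y_b) \right) = 2 n^k \left( \phi(1) - \phi\left( 1 - \frac{4}{n} \right) \right) - 4k \left( \frac{n}{2} - h \right)^{k-1} \leq 8 n^{k-1} \left( \phi'(1) - (1 - o(1)) \frac{k}{2^k} \right),$$

which is $o(n^{k-1})$. Hence,

$$\max_{\substack{a \in H_A \\ b \in H_B}} \left\langle \mathbf{W}, (y^{(ab)})^{\circledequal k} - y^{\circledequal k} \right\rangle \geq 2 (1 - 0.01\epsilon - o(1)) \sqrt{2 \log n \cdot 2k \left( \frac{n}{2} - h \right)^{k-1}} \geq (1 - 0.01\epsilon - o(1)) \sqrt{\frac{k n^{k-1} \log n}{2^{k-5}}}.$$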
On the other hand, sinceσą p1`ǫqσ˚ we have
nk σ ˆ φp1q ´φ ˆ 1´n4 ˙˙ ă n k 1`ǫ¨ 4φ1p1q n ¨ d 2 logn nk´1φ1p1q ă 11 `ǫ c nk´1logn 2k´5
which is less than max aPHA bPHB A W,pypabqq“lk´y“lkE
with probability 1´onp1q. Thus, 1´ppypM Lq ě1´onp1q.
B Proof of Theorem 2
In this section, we prove Theorem 2 for general $k$.
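Let $k$ be a positive integer with $k > 2$. Let $y \in \{\pm 1\}^n$ with $\mathbf{1}^T y = 0$ and let $\mathbf{T}$ be a $k$-tensor defined as $\mathbf{T} = y^{\boxed{=}k} + \sigma \mathbf{W}$, where $\mathbf{W}$ is a random $k$-tensor with i.i.d. standard Gaussian entries. Let $f(x) = \left\langle x^{\boxed{=}k}, \mathbf{T} \right\rangle$ and let $f_{(2)}(x)$ be the degree 2 truncation of $f(x)$, i.e.,
$$f_{(2)}(x) = \sum_{\alpha \in [n]^k} \mathbf{T}_\alpha \left(\frac{1}{2^{k-1}} \sum_{1 \le s < t \le k} x_{\alpha(s)} x_{\alpha(t)}\right).$$
For each $\{s < t\} \subseteq [k]$, let $Q^{st}$ be the $n$ by $n$ matrix where
$$Q^{st}_{ij} = \frac{1}{2}\left(\sum_{\substack{\alpha \in [n]^k \\ \alpha(s) = i,\, \alpha(t) = j}} \mathbf{T}_\alpha + \sum_{\substack{\alpha \in [n]^k \\ \alpha(s) = j,\, \alpha(t) = i}} \mathbf{T}_\alpha\right).$$
Then,
$$f_{(2)}(x) = \frac{1}{2^{k-1}} \left\langle Q, xx^T \right\rangle$$
where $Q = \sum_{1 \le s < t \le k} Q^{st}$. We consider the following SDP relaxation for $\max_x f_{(2)}(x)$: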
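The identity $f_{(2)}(x) = 2^{-(k-1)} \langle Q, xx^T \rangle$ can be checked numerically on a small instance. The sketch below is only an illustration: the function names, the choice $n = 4$, $k = 3$, and the use of an arbitrary Gaussian tensor in place of $\mathbf{T}$ are assumptions made here, not part of the paper.

```python
import itertools
import numpy as np

def pairwise_matrix(T, s, t):
    """Q^{st} for s < t: symmetrized sum of T over all axes other than s and t."""
    other_axes = tuple(a for a in range(T.ndim) if a not in (s, t))
    M = T.sum(axis=other_axes)   # M[i, j] = sum of T_alpha with alpha(s) = i, alpha(t) = j
    return 0.5 * (M + M.T)

def quadratic_form_matrix(T):
    """Q = sum over pairs s < t of Q^{st}."""
    return sum(pairwise_matrix(T, s, t)
               for s, t in itertools.combinations(range(T.ndim), 2))

def truncation_bruteforce(T, x):
    """Evaluate f_(2)(x) directly from its definition as a sum over all alpha."""
    k, n = T.ndim, T.shape[0]
    total = 0.0
    for alpha in itertools.product(range(n), repeat=k):
        pair_sum = sum(x[alpha[s]] * x[alpha[t]]
                       for s, t in itertools.combinations(range(k), 2))
        total += T[alpha] * pair_sum
    return total / 2 ** (k - 1)

rng = np.random.default_rng(0)
n, k = 4, 3                                   # small illustrative sizes
T = rng.standard_normal((n,) * k)             # arbitrary test tensor
x = np.array([1.0, -1.0, 1.0, -1.0])

Q = quadratic_form_matrix(T)
via_matrix = float(x @ Q @ x) / 2 ** (k - 1)  # <Q, xx^T> / 2^{k-1}
via_sum = truncation_bruteforce(T, x)
print("difference:", abs(via_matrix - via_sum))
```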
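$$\begin{array}{ll} \max & \langle Q, X \rangle \\ \text{subject to} & X_{ii} = 1 \text{ for all } i \in [n], \\ & \left\langle X, \mathbf{1}\mathbf{1}^T \right\rangle = 0, \\ & X \succeq 0. \end{array} \qquad (4)$$
Theorem 8 (Theorem 2, general $k$). Let $\epsilon > 0$ be a constant not depending on $n$. Let $\hat{Y}$ be a solution of (4) and let $p(\hat{Y}; \sigma)$ be the probability that $\hat{Y}$ coincides with $yy^T$. Let $\sigma_{(2)}$ be
$$\sigma_{(2)} = \sqrt{\frac{k(k-1)}{2^{2k-1}}} \cdot \frac{n^{\frac{k-1}{2}}}{\sqrt{2 \log n}}.$$
If $\sigma < (1 - \epsilon)\sigma_{(2)}$, then $p(\hat{Y}; \sigma) = 1 - o_n(1)$.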
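Proof. First note that $Q^{st} = \frac{n^{k-2}}{2^{k-1}} yy^T + \sigma W^{st}$ where
$$W^{st}_{ij} = \sum_{\substack{\alpha \in [n]^k \\ \alpha(s) = i,\, \alpha(t) = j}} \mathbf{W}_\alpha.$$
So we have $Q = n^{k-2} \cdot \frac{k(k-1)}{2^k} yy^T + \sigma \overline{W}$ where $\overline{W} = \sum_{1 \le s < t \le k} W^{st}$.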
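The dual program of (4) is
$$\begin{array}{ll} \min & \mathrm{tr}(D) \\ \text{subject to} & D + \lambda \mathbf{1}\mathbf{1}^T - Q \succeq 0, \\ & D \text{ is diagonal.} \end{array} \qquad (5)$$
By complementary slackness, $yy^T$ is the unique optimum solution of (4) if there exists a dual feasible solution $(D, \lambda)$ such that
$$\left\langle D + \lambda \mathbf{1}\mathbf{1}^T - Q, yy^T \right\rangle = 0$$
and the second smallest eigenvalue of $D + \lambda \mathbf{1}\mathbf{1}^T - Q$ is greater than zero. For brevity, let $S = D + \lambda \mathbf{1}\mathbf{1}^T - Q$. Since $S$ is positive semidefinite, we must have $Sy = 0$, that is,
$$D_{ii} = \sum_{j=1}^n Q_{ij} y_i y_j$$
for any $i \in [n]$.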
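For a symmetric matrix $M$, we define the Laplacian $L(M)$ of $M$ as $L(M) = \mathrm{diag}(M\mathbf{1}) - M$ (see [21]). Using this language, we can express $S$ as
$$S = \mathrm{diag}(y)\left(L(\mathrm{diag}(y) Q\, \mathrm{diag}(y)) + \lambda yy^T\right)\mathrm{diag}(y).$$
Note that
$$\mathrm{diag}(y) Q\, \mathrm{diag}(y) = n^{k-2} \cdot \frac{k(k-1)}{2^k} \mathbf{1}\mathbf{1}^T + \sigma\, \mathrm{diag}(y) \overline{W} \mathrm{diag}(y).$$
Hence, the Laplacian of $\mathrm{diag}(y) Q\, \mathrm{diag}(y)$ is equal to
$$\underbrace{n^{k-1} \cdot \frac{k(k-1)}{2^k}\left(I_{n \times n} - \frac{1}{n}\mathbf{1}\mathbf{1}^T\right)}_{\text{deterministic part}} + \underbrace{\sigma L\left(\mathrm{diag}(y) \overline{W} \mathrm{diag}(y)\right)}_{\text{noisy part}}.$$
This matrix is positive semidefinite if
$$\sigma \left\|L(\mathrm{diag}(y) \overline{W} \mathrm{diag}(y))\right\| \le n^{k-1} \cdot \frac{k(k-1)}{2^k}.$$
Moreover, if the inequality is strict then the second smallest eigenvalue of $S$ is greater than zero.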
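The Laplacian form of $S$ can be verified numerically. The following is a minimal sketch under assumptions made here for illustration: $Q$ is replaced by an arbitrary random symmetric matrix, and $n$, $\lambda$ and the seed are arbitrary. It checks that $D + \lambda \mathbf{1}\mathbf{1}^T - Q$ with $D_{ii} = \sum_j Q_{ij} y_i y_j$ agrees with the Laplacian expression, and that $Sy = 0$ when $\mathbf{1}^T y = 0$.

```python
import numpy as np

def laplacian(M):
    """L(M) = diag(M 1) - M for a symmetric matrix M."""
    return np.diag(M @ np.ones(M.shape[0])) - M

rng = np.random.default_rng(1)
n = 6
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])   # balanced labels, 1^T y = 0
G = rng.standard_normal((n, n))
Q = 0.5 * (G + G.T)                               # arbitrary symmetric stand-in for Q
lam = 0.7                                         # arbitrary dual variable lambda

# Dual certificate built directly from D_ii = sum_j Q_ij y_i y_j.
D = np.diag(y * (Q @ y))
S_direct = D + lam * np.ones((n, n)) - Q

# The same matrix written through the Laplacian.
Dy = np.diag(y)
S_laplacian = Dy @ (laplacian(Dy @ Q @ Dy) + lam * np.outer(y, y)) @ Dy

print("max deviation:", np.abs(S_direct - S_laplacian).max())
print("norm of S y:", np.linalg.norm(S_direct @ y))
```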
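By the triangle inequality, we have
$$\left\|L(\mathrm{diag}(y) \overline{W} \mathrm{diag}(y))\right\| \le \max_{i \in [n]} \sum_{j=1}^n \overline{W}_{ij} y_i y_j + \sum_{1 \le s < t \le k} \left\|\mathrm{diag}(y) W^{st} \mathrm{diag}(y)\right\| \le \max_{i \in [n]} \sum_{j=1}^n \overline{W}_{ij} y_i y_j + \binom{k}{2}\sqrt{2n^{k-1}}.$$
The second inequality holds with high probability since $W^{st}$ has independent Gaussian entries. Since $\sum_{j=1}^n \overline{W}_{ij} y_i y_j$ is Gaussian and centered, with high probability we have that
$$\max_{i \in [n]} \sum_{j=1}^n \overline{W}_{ij} y_i y_j \le (1 + 0.1\epsilon)\sqrt{2 \log n} \cdot \left(\max_{i \in [n]} \mathrm{Var}\left(\sum_{j=1}^n \overline{W}_{ij} y_i y_j\right)\right)^{1/2} \le (1 + 0.1\epsilon)\sqrt{2 \log n} \cdot \sqrt{\binom{k}{2} n^{k-1}}.$$
Hence,
$$\left\|L(\mathrm{diag}(y) \overline{W} \mathrm{diag}(y))\right\| \le (1 + 0.1\epsilon + o(1))\sqrt{2\binom{k}{2} n^{k-1} \log n}$$
with high probability. So, $S \succeq 0$ as long as
$$\sigma (1 + 0.1\epsilon + o(1))\sqrt{2\binom{k}{2} n^{k-1} \log n} < n^{k-1} \cdot \frac{k(k-1)}{2^k},$$
or simply
$$(1 + 0.1\epsilon + o(1))\sigma < \sqrt{\frac{k(k-1)}{2^{2k-1}}} \cdot \frac{n^{\frac{k-1}{2}}}{\sqrt{2 \log n}} = \sigma_{(2)}.$$
So, when $\sigma < (1 - \epsilon)\sigma_{(2)}$, with high probability $\hat{Y} = yy^T$.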
C Pseudo-expectation and its moment matrix
C.1 Notation and preliminaries
Let $V$ be the space of real-valued functions on the $n$-dimensional hypercube $\{-1, +1\}^n$. For each $S \subseteq [n]$, let $x_S = \prod_{i \in S} x_i$. Note that $\{x_S : S \subseteq [n]\}$ is a basis of $V$. Hence, any function $f : \{\pm 1\}^n \to \mathbb{R}$ can be written as a unique linear combination of multilinear monomials, say
$$f(x) = \sum_{S \subseteq [n]} f_S x_S.$$
The degree of $f$ is defined as the maximum size of $S \subseteq [n]$ such that $f_S$ is nonzero.
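To make the expansion concrete, the sketch below computes the coefficients $f_S$ of a small function by averaging $f(x) x_S$ over the hypercube. This averaging formula is the standard orthogonality relation for the monomials $x_S$ and is not stated in the text above; the example function and helper names are assumptions chosen for illustration.

```python
import itertools
import numpy as np

def monomial(x, S):
    """Evaluate x_S = prod_{i in S} x_i (equal to 1 for the empty set)."""
    value = 1.0
    for i in S:
        value *= x[i]
    return value

def multilinear_coefficients(f, n):
    """Coefficients f_S, computed as the average of f(x) * x_S over {-1, +1}^n."""
    points = list(itertools.product([-1, 1], repeat=n))
    coeffs = {}
    for size in range(n + 1):
        for S in itertools.combinations(range(n), size):
            coeffs[S] = sum(f(x) * monomial(x, S) for x in points) / len(points)
    return coeffs

def degree(coeffs, tol=1e-12):
    """Largest |S| with a nonzero coefficient."""
    return max((len(S) for S, c in coeffs.items() if abs(c) > tol), default=0)

# On the hypercube, (x0 + x1 + x2)^2 = 3 + 2 (x0 x1 + x0 x2 + x1 x2).
f = lambda x: (x[0] + x[1] + x[2]) ** 2
coeffs = multilinear_coefficients(f, 3)
print({S: c for S, c in coeffs.items() if abs(c) > 1e-12})
print("degree:", degree(coeffs))
```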
Let $\ell$ be a positive integer. Let us denote the collection of subsets of $[n]$ of size at most $\ell$ by $\binom{[n]}{\le \ell}$, and its size $\left|\binom{[n]}{\le \ell}\right|$ by $\binom{n}{\le \ell}$.
Let $M$ be a square symmetric matrix of size $\binom{n}{\le \ell}$. The rows and the columns of $M$ are indexed by the elements of $\binom{[n]}{\le \ell}$. To avoid confusion, we use $M[S, T]$ to denote the entry of $M$ in row $S$ and column $T$.
We say $M$ is SoS-symmetric if $M[S, T] = M[S', T']$ whenever $x_S x_T = x_{S'} x_{T'}$ on the hypercube. Since $x \in \{\pm 1\}^n$, this means that $S \oplus T = S' \oplus T'$, where $S \oplus T$ denotes the symmetric difference of $S$ and $T$. Given $f \in V$ with degree at most $2\ell$, we say $M$ represents $f$ if
$$f_U = \sum_{\substack{S, T \in \binom{[n]}{\le \ell}: \\ S \oplus T = U}} M[S, T].$$
We use $M_f$ to denote the unique SoS-symmetric matrix representing $f$.
Let $L$ be a linear functional on $V$. By linearity, $L$ is determined by $(L[x_S] : S \subseteq [n])$. Let $X_L$ be the SoS-symmetric matrix of size $\binom{n}{\le \ell}$ with entries $X_L[S, T] = L[x_{S \oplus T}]$. We call $X_L$ the moment matrix of $L$ of degree $2\ell$. By definition, we have
$$L[f] = \sum_{U \in \binom{[n]}{\le 2\ell}} f_U L[x_U] = \sum_{S, T \in \binom{[n]}{\le \ell}} M_f[S, T] X_L[S, T] = \langle X_L, M_f \rangle$$
for any $f$ of degree at most $2\ell$.
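The identity $L[f] = \langle X_L, M_f \rangle$ can be illustrated on a small instance. In the sketch below, all concrete choices are assumptions made for illustration: $n = 4$, $\ell = 2$, $L$ is taken to be the expectation under the uniform distribution on a pair $\{y, -y\}$, and $f$ has arbitrary random coefficients. The SoS-symmetric representative $M_f$ is built by splitting each $f_U$ evenly over the pairs $(S, T)$ with $S \oplus T = U$.

```python
import itertools
import numpy as np

n, ell = 4, 2
subsets = [frozenset(S) for size in range(ell + 1)
           for S in itertools.combinations(range(n), size)]

def monomial(x, S):
    value = 1.0
    for i in S:
        value *= x[i]
    return value

# Illustrative linear functional: expectation under the uniform law on {y, -y}.
y = (1, -1, 1, -1)
support = [y, tuple(-v for v in y)]
def L(S):
    return sum(monomial(x, S) for x in support) / len(support)

# Moment matrix X_L[S, T] = L[x_{S xor T}]; `^` is symmetric difference on frozensets.
X = np.array([[L(S ^ T) for T in subsets] for S in subsets])

# Number of index pairs (S, T) that produce each symmetric difference U.
pair_count = {}
for S in subsets:
    for T in subsets:
        pair_count[S ^ T] = pair_count.get(S ^ T, 0) + 1

# Arbitrary f of degree at most 2*ell and its SoS-symmetric representative M_f.
rng = np.random.default_rng(2)
f_coeff = {U: rng.standard_normal() for U in pair_count}
M_f = np.array([[f_coeff[S ^ T] / pair_count[S ^ T] for T in subsets]
                for S in subsets])

lhs = sum(f_coeff[U] * L(U) for U in f_coeff)   # L[f] from the coefficients of f
rhs = float(np.sum(M_f * X))                    # <X_L, M_f>
min_eig = float(np.linalg.eigvalsh(X).min())    # nonnegative for a true distribution
print("L[f]:", lhs, " <X_L, M_f>:", rhs, " min eigenvalue:", min_eig)
```

Because this $L$ comes from a genuine probability distribution, its moment matrix is positive semidefinite; a general pseudo-expectation only needs to satisfy that condition, not to come from a distribution.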
C.2 A quick introduction to pseudo-expectations
For our purpose, we only work with pseudo-expectations defined on the hypercube $\{\pm 1\}^n$. See [36] for the general definition.
Let $\ell$ be a positive integer and $d = 2\ell$. A pseudo-expectation of degree $d$ on $\{\pm 1\}^n$ is a linear functional $\tilde{\mathbb{E}}$ on the space $V$ of functions on the hypercube such that (i) $\tilde{\mathbb{E}}[1] = 1$, and (ii) $\tilde{\mathbb{E}}[q^2] \ge 0$ for any $q \in V$ of degree at most $\ell = d/2$. We say $\tilde{\mathbb{E}}$ satisfies the system of equalities $\{p_i(x) = 0\}_{i=1}^m$ if $\tilde{\mathbb{E}}[f] = 0$ for any $f \in V$ of degree at most $d$ which can be written as
$$f = p_1 q_1 + p_2 q_2 + \cdots + p_m q_m$$
for some $q_1, q_2, \ldots, q_m \in V$.
We note the following facts:
• If $\tilde{\mathbb{E}}$ is a pseudo-expectation of degree $d$, then it is also a pseudo-expectation of any degree smaller than $d$.
• If $\tilde{\mathbb{E}}$ is a pseudo-expectation of degree $2n$ satisfying $\{p_i(x) = 0\}_{i=1}^m$, then $\tilde{\mathbb{E}}$ defines a valid probability distribution supported on $P := \{x \in \{\pm 1\}^n : p_i(x) = 0 \text{ for all } i \in [m]\}$.
The second fact implies that maximizing $f(x)$ on $P$ is equivalent to maximizing $\tilde{\mathbb{E}}[f]$ over all pseudo-expectations of degree $2n$ satisfying $\{p_i(x) = 0\}_{i=1}^m$.
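Now, let $d$ be an even integer such that
$$d > \max\{\deg(f), \deg(p_1), \cdots, \deg(p_m)\}.$$
We relax the original problem to
$$\begin{array}{ll} \max & \tilde{\mathbb{E}}[f] \\ \text{subject to} & \tilde{\mathbb{E}} \text{ is a degree-}d\text{ pseudo-expectation on } \{\pm 1\}^n \\ & \text{satisfying } \{p_i(x) = 0\}_{i=1}^m. \end{array} \qquad (\mathrm{SoS}_d)$$
We note that the value of ($\mathrm{SoS}_d$) is non-increasing in $d$, and it reaches the optimum value $\max_{x \in P} f(x)$ of the original problem at $d = 2n$.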
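C.3 Matrix point of view
Let $\tilde{\mathbb{E}}$ be a pseudo-expectation of degree $d = 2\ell$ for some positive integer $\ell$. Suppose that $\tilde{\mathbb{E}}$ satisfies the system $\{p_i(x) = 0\}_{i=1}^m$. Let $X_{\tilde{\mathbb{E}}}$ be the moment matrix of $\tilde{\mathbb{E}}$ of degree $2\ell$, hence the size of $X_{\tilde{\mathbb{E}}}$ is $\binom{n}{\le \ell}$.
The conditions for $\tilde{\mathbb{E}}$ being a pseudo-expectation translate to the following conditions for $X = X_{\tilde{\mathbb{E}}}$:
(i) $X_{\emptyset, \emptyset} = 1$.
(ii) $X$ is positive semidefinite.
Moreover, $\tilde{\mathbb{E}}$ satisfies $\{p_i(x) = 0\}_{i=1}^m$ if and only if
(iii) Let $U$ be the space of functions in $V$ of degree at most $d$ which can be written as $\sum_{i=1}^m p_i q_i$ for some $q_i \in V$. Then, $\left\langle M_f, X_{\tilde{\mathbb{E}}} \right\rangle = 0$ for any $f \in U$.
Hence, ($\mathrm{SoS}_d$) can be written as the following semidefinite program
$$\begin{array}{ll} \max & \langle M_f, X \rangle \\ \text{subject to} & X_{\emptyset, \emptyset} = 1, \\ & \langle M_q, X \rangle = 0 \text{ for all } q \in B, \\ & X \succeq 0, \end{array} \qquad (\mathrm{SDP}_d)$$
where $B$ is any finite subset of $U$ which spans $U$, for example, $B = \{x_S p_i(x) : i \in [m], |S| \le d - \deg(p_i)\}$.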
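D Proof of Theorem 4
Let $y \in \{\pm 1\}^n$ be such that $\mathbf{1}^T y = 0$, let $\sigma > 0$, and let $\mathbf{W} \in (\mathbb{R}^n)^{\otimes 4}$ be a 4-tensor with independent, standard Gaussian entries. Given a tensor $\mathbf{T} = y^{\otimes 4} + \sigma \mathbf{W}$, we would like to recover the planted solution $y$ from $\mathbf{T}$. Let $f(x) = \left\langle \mathbf{T}, x^{\otimes 4} \right\rangle$. The maximum-likelihood estimator is given by the optimum solution of
$$\max_{x \in \{\pm 1\}^n :\, \mathbf{1}^T x = 0} f(x).$$
Consider the SoS relaxation of degree 4
$$\begin{array}{ll} \max & \tilde{\mathbb{E}}[f] \\ \text{subject to} & \tilde{\mathbb{E}} \text{ is a pseudo-expectation of degree 4 on } \{\pm 1\}^n \\ & \text{satisfying } \sum_{i=1}^n x_i = 0. \end{array}$$
Let $\mathbb{E}_{U(\{y, -y\})}$ be the expectation operator of the uniform distribution on $\{y, -y\}$, i.e., $\mathbb{E}_{U(\{y, -y\})}[x_S] = y_S$ if $|S|$ is even, and $\mathbb{E}_{U(\{y, -y\})}[x_S] = 0$ if $|S|$ is odd.
If $\mathbb{E}_{U(\{y, -y\})}$ is the optimal solution of the relaxation, then we can recover $y$ from it up to a global sign flip. First we give an upper bound on $\sigma$ under which this happens with high probability.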
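Theorem 9 (Part one of Theorem 4). Let $\mathbf{T} = y^{\otimes 4} + \sigma \mathbf{W}$ and $f(x) = \left\langle \mathbf{T}, x^{\otimes 4} \right\rangle$ be as defined above. If $\sigma \lesssim \frac{n}{\sqrt{\log n}}$, then the relaxation $\max_{\tilde{\mathbb{E}}} \tilde{\mathbb{E}}[f]$ over pseudo-expectations $\tilde{\mathbb{E}}$ of degree 4 satisfying $\sum_{i=1}^n x_i = 0$ is maximized when $\tilde{\mathbb{E}} = \mathbb{E}_{U(\{y, -y\})}$ with probability $1 - o_n(1)$.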
We can reduce Theorem 9 to the matrix version of the problem via flattening [31]. Given a 4-tensor $\mathbf{T}$, the canonical flattening of $\mathbf{T}$ is defined as the $n^2 \times n^2$ matrix $T$ with entries $T_{(i,j),(k,\ell)} = \mathbf{T}_{ijk\ell}$. Then, $T = \tilde{y}\tilde{y}^T + \sigma W$ where $\tilde{y}$ is the vectorization of $yy^T$, and $W$ is the flattening of $\mathbf{W}$. Note that this is an instance of the $\mathbb{Z}_2$-synchronization model with Gaussian noise. It follows that with high probability exact recovery is possible when $\sigma \lesssim \frac{n}{\sqrt{\log n}}$ (see Proposition 2.3 in [21]).
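The flattening step is a plain reshape, as the following sketch illustrates. The sizes, the seed and the variable names are assumptions made for the example; the row index $(i, j)$ is mapped to $i \cdot n + j$ and the column index $(k, \ell)$ to $k \cdot n + \ell$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 4, 0.5                                  # illustrative sizes
y = np.array([1.0, -1.0, 1.0, -1.0])               # balanced planted labels

W = rng.standard_normal((n, n, n, n))              # Gaussian noise tensor
T = np.einsum('i,j,k,l->ijkl', y, y, y, y) + sigma * W   # y^{(x)4} + sigma W

# Canonical flattening: T_flat[(i, j), (k, l)] = T[i, j, k, l].
T_flat = T.reshape(n * n, n * n)
W_flat = W.reshape(n * n, n * n)
y_tilde = np.outer(y, y).reshape(n * n)            # vectorization of y y^T

# The flattened tensor is a rank-one spike plus Gaussian noise.
spiked = np.outer(y_tilde, y_tilde) + sigma * W_flat
print("max deviation:", np.abs(T_flat - spiked).max())
```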
We complement the result by providing a lower bound on $\sigma$ which is off by a polylogarithmic factor.
Theorem 10 (Part two of Theorem 4). Let $c > 0$ be a small constant. If $\sigma \ge n(\log n)^{1/2+c}$, then there exists a pseudo-expectation $\tilde{\mathbb{E}}$ of degree 4 on the hypercube $\{\pm 1\}^n$ satisfying $\sum_{i=1}^n x_i = 0$ such that $\tilde{\mathbb{E}}[f] > f(y)$ with probability $1 - o_n(1)$.
D.1 Proof of Theorem 10
Let $g(x)$ be the noise part of $f(x)$, i.e., $g(x) = \left\langle \mathbf{W}, x^{\otimes 4} \right\rangle$. Let $\tilde{\mathbb{E}}$ be a pseudo-expectation of degree 4 on the hypercube which satisfies the equality $\sum_{i=1}^n x_i = 0$.
Lemma 11. Let $g(x)$ be the polynomial as defined above. Then, there exists a pseudo-expectation of degree 4 on the hypercube satisfying $\mathbf{1}^T x = 0$ such that
$$\tilde{\mathbb{E}}[g] \gtrsim \frac{n^3}{(\log n)^{1/2+o(1)}}.$$
We prove Theorem 10 using the lemma.
Proof of Theorem 10. Note that $g(y) = \left\langle \mathbf{W}, y^{\otimes 4} \right\rangle$ is a Gaussian random variable with variance $n^4$. So, $g(y) \le n^2 \log n$ with probability $1 - o(1)$. Let $\tilde{\mathbb{E}}$ be the pseudo-expectation satisfying the conditions in the lemma. Then, with probability $1 - o(1)$ we have
$$\tilde{\mathbb{E}}[f] - f(y) = -\left(n^4 - \tilde{\mathbb{E}}[(y^T x)^4]\right) + \sigma\left(\tilde{\mathbb{E}}[g] - g(y)\right) \ge -n^4 + \sigma\left(\frac{n^3}{(\log n)^{1/2+o(1)}} - n^2 \log n\right) \ge -n^4 + (1 - o(1))\frac{\sigma n^3}{(\log n)^{1/2+o(1)}}.$$
Since $\sigma > n(\log n)^{1/2+c}$ for some fixed constant $c > 0$, we have $\tilde{\mathbb{E}}[f] - f(y) > 0$ as desired.
In the remainder of the section, we prove Lemma 11.
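D.1.1 Outline
We note that our method shares a similar idea with [33] and [34]. We are given a random polynomial $g(x) = \left\langle \mathbf{W}, x^{\otimes 4} \right\rangle$ where $\mathbf{W}$ has independent standard Gaussian entries. We would like to construct $\tilde{\mathbb{E}} = \tilde{\mathbb{E}}_{\mathbf{W}}$ which has large correlation with $\mathbf{W}$. If we simply let
$$\tilde{\mathbb{E}}[x_{i_1} x_{i_2} x_{i_3} x_{i_4}] = \frac{1}{24} \sum_{\pi \in S_4} \mathbf{W}_{i_{\pi(1)}, i_{\pi(2)}, i_{\pi(3)}, i_{\pi(4)}}$$
for $\{i_1 < i_2 < i_3 < i_4\} \subseteq [n]$ and let $\tilde{\mathbb{E}}[x_T]$ be zero if $|T| \le 3$, then
$$\tilde{\mathbb{E}}[g] = \frac{1}{24} \sum_{1 \le i_1 < i_2 < i_3 < i_4 \le n} \left(\sum_{\pi \in S_4} \mathbf{W}_{i_{\pi(1)}, i_{\pi(2)}, i_{\pi(3)}, i_{\pi(4)}}\right)^2,$$
so the expectation of $\tilde{\mathbb{E}}[g]$ over $\mathbf{W}$ would be equal to $\binom{n}{4} \approx \frac{n^4}{24}$. However, in this case $\tilde{\mathbb{E}}$ satisfies neither the equality $\mathbf{1}^T x = 0$ nor the conditions for pseudo-expectations.
To overcome this, we first project the $\tilde{\mathbb{E}}$ constructed above to the space of linear functionals which satisfy the equality constraints ($x_i^2 = 1$ and $\mathbf{1}^T x = 0$). Then, we take a convex combination of the projection and a pseudo-expectation to control the spectrum of the functional. The details are as follows: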
(1) (Removing degeneracy) We establish a one-to-one correspondence between the collection of linear functionals on $n$-variate, even multilinear polynomials of degree at most 4 and the collection of linear functionals on $(n-1)$-variate multilinear polynomials of degree at most 4 by setting $x_n = 1$. This correspondence preserves positivity.
(2) (Description of equality constraints) Let $\psi$ be a linear functional on $(n-1)$-variate multilinear polynomials of degree at most 4. We may think of $\psi$ as a vector in $\mathbb{R}^{\binom{n-1}{\le 4}}$. Then, the condition that $\psi$ satisfies $\sum_{i=1}^{n-1} x_i + 1 = 0$ can be written as $A\psi = 0$ for some matrix $A$.
(3) (Projection) Let $w \in \mathbb{R}^{\binom{n-1}{\le 4}}$ be the coefficient vector of $g(x)$. Let $\Pi$ be the projection matrix onto the space $\{x : Ax = 0\}$. In other words,
$$\Pi = \mathrm{Id} - A^T (AA^T)^\dagger A,$$
where $\mathrm{Id}$ is the identity matrix of size $\binom{n-1}{\le 4}$ and $(\cdot)^\dagger$ denotes the pseudo-inverse. Let $e$ be the first column of $\Pi$ and $\psi_1 = \frac{\Pi w}{e^T w}$. Then $(\psi_1)_\emptyset = 1$ and $A\psi_1 = 0$ by definition.
(4) (Convex combination) Let $\psi_0 = \frac{e}{e^T e}$. We note that $\psi_0$ corresponds to the expectation operator of the uniform distribution on $\{x \in \{\pm 1\}^n : \mathbf{1}^T x = 0\}$. We will construct $\psi$ by
$$\psi = (1 - \epsilon)\psi_0 + \epsilon \psi_1$$
with an appropriate constant $\epsilon$. Equivalently,
$$\psi = \psi_0 + \frac{\epsilon}{e^T w} \cdot \left(\Pi - \frac{ee^T}{e^T e}\right) w.$$
(5) (Spectrum analysis) We bound the spectrum of the functional $\left(\Pi - \frac{ee^T}{e^T e}\right) w$ to decide how large $\epsilon$ can be while $\psi$ remains positive semidefinite.
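D.1.2 Removing degeneracy
Recall that
$$g(x) = \sum_{i,j,k,\ell \in [n]} \mathbf{W}_{ijk\ell}\, x_i x_j x_k x_\ell.$$
Observe that $g$ is even, i.e., $g(x) = g(-x)$ for any $x \in \{\pm 1\}^n$. To maximize such an even function, we claim that we may only consider the pseudo-expectations whose odd moments are zero.
Proposition 12. Let $\tilde{\mathbb{E}}$ be a pseudo-expectation of degree 4 on the hypercube satisfying $\sum_{i=1}^n x_i = 0$. Let $p$ be a degree 4 multilinear polynomial which is even. Then, there exists a pseudo-expectation $\tilde{\mathbb{E}}'$ of degree 4 such that $\tilde{\mathbb{E}}[p] = \tilde{\mathbb{E}}'[p]$ and $\tilde{\mathbb{E}}'[x_S] = 0$ for any $S$ of odd size.
Proof. Let $\tilde{\mathbb{E}}$ be a pseudo-expectation of degree 4 on the hypercube satisfying $\sum_{i=1}^n x_i = 0$. Let us define a linear functional $\tilde{\mathbb{E}}_0$ on the space of multilinear polynomials of degree at most 4 so that $\tilde{\mathbb{E}}_0[x_S] = (-1)^{|S|}\, \tilde{\mathbb{E}}[x_S]$ for any $S \in \binom{[n]}{\le 4}$. Then, for any multilinear polynomial $q$ of degree at most 2, we have
$$\tilde{\mathbb{E}}_0[q(x)^2] = \tilde{\mathbb{E}}[q(-x)^2] \ge 0.$$
Also, $\tilde{\mathbb{E}}_0$ satisfies $\tilde{\mathbb{E}}_0[1] = 1$ and
$$\tilde{\mathbb{E}}_0\left[\left(\sum_{i=1}^n x_i\right) q(x)\right] = -\tilde{\mathbb{E}}\left[\left(\sum_{i=1}^n x_i\right) q(-x)\right] = 0$$
for any $q$ of degree 3. Thus, $\tilde{\mathbb{E}}_0$ is a valid pseudo-expectation of degree 4 satisfying $\sum_{i=1}^n x_i = 0$.
Let $\tilde{\mathbb{E}}_1 = \frac{1}{2}(\tilde{\mathbb{E}} + \tilde{\mathbb{E}}_0)$. This is again a valid pseudo-expectation, since the space of pseudo-expectations is convex. We have $\tilde{\mathbb{E}}_1[p(x)] = \tilde{\mathbb{E}}[p(x)] = \tilde{\mathbb{E}}_0[p(x)]$ since $p$ is even, and $\tilde{\mathbb{E}}_1[x_S] = \frac{1}{2}\left(1 + (-1)^{|S|}\right) \tilde{\mathbb{E}}[x_S] = 0$ for any $S$ of odd size.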
Let $E$ be the space of all pseudo-expectations of degree 4 on the $n$-dimensional hypercube with zero odd moments. Let $E'$ be the space of all pseudo-expectations of degree 4 on the $(n-1)$-dimensional hypercube. We claim that there is a bijection between the two spaces.
Proposition 13. Let $\psi \in E$. Let us define a linear functional $\psi'$ on the space of $(n-1)$-variate multilinear polynomials of degree at most 4 so that for any $T \subseteq [n-1]$ with $|T| \le 4$
$$\psi'[x_T] = \begin{cases} \psi[x_{T \cup \{n\}}] & \text{if } |T| \text{ is odd} \\ \psi[x_T] & \text{otherwise.} \end{cases}$$
Then, $\psi \mapsto \psi'$ is a bijective mapping from $E$ to $E'$.
Proof. We say a linear functional $\psi$ on the space of polynomials of degree at most $2\ell$ is positive semidefinite if $\psi[q^2] \ge 0$ for any $q$ of degree $\ell$.
Note that the mapping $\psi' \mapsto \psi$ where $\psi[x_S] = \psi'[x_{S \setminus \{n\}}]$ for any $S \subseteq [n]$ of even size is the inverse of $\psi \mapsto \psi'$. Hence, it is sufficient to prove that $\psi$ is positive semidefinite if and only if $\psi'$ is positive semidefinite.
($\Rightarrow$) Let $q$ be an $n$-variate polynomial of degree at most 2. Let $q_0$ and $q_1$ be polynomials in $x_1, \cdots, x_{n-1}$ such that
$$q(x_1, \cdots, x_n) = q_0(x_1, \cdots, x_{n-1}) + x_n q_1(x_1, \cdots, x_{n-1}).$$
We get $\psi'[q^2] = \psi'[(q_0 + x_n q_1)^2] = \psi'[(q_0^2 + q_1^2) + 2x_n q_0 q_1]$. For $i = 0, 1$, let $q_{i0}$ and $q_{i1}$ be the even part and the odd part of $q_i$, respectively. Then we have
$$\begin{aligned} \psi'[q^2] &= \psi'[(q_{00}^2 + q_{01}^2 + q_{10}^2 + q_{11}^2) + 2x_n(q_{00} q_{11} + q_{01} q_{10})] \\ &= \psi[(q_{00}^2 + q_{01}^2 + q_{10}^2 + q_{11}^2) + 2(q_{00} q_{11} + q_{01} q_{10})] \\ &= \psi[(q_{00} + q_{11})^2 + (q_{10} + q_{01})^2] \ge 0. \end{aligned}$$
The first equality follows from the fact that $\psi'[q] = 0$ for odd $q$. Hence, $\psi'$ is positive semidefinite.
($\Leftarrow$) Let $q$ be an $(n-1)$-variate polynomial of degree at most 2. Let $q_0$ and $q_1$ be the even part and the odd part of $q$, respectively. Then,
$$\psi[q^2] = \psi[(q_0^2 + q_1^2) + 2q_0 q_1].$$
Note that $q_0^2 + q_1^2$ is even and $q_0 q_1$ is odd. So,
$$\psi[q^2] = \psi'[(q_0^2 + q_1^2) + 2x_n q_0 q_1] = \psi'[(q_0 + x_n q_1)^2] \ge 0.$$
Hence $\psi$ is positive semidefinite.
In addition to the proposition, we note that $\psi$ satisfies $\sum_{i=1}^n x_i = 0$ if and only if $\psi'$ satisfies $1 + \sum_{i=1}^{n-1} x_i = 0$. Hence, maximizing $\tilde{\mathbb{E}}[g]$ over $\tilde{\mathbb{E}} \in E$ satisfying $\sum_{i=1}^n x_i = 0$ is equivalent to
$$\max_{\psi' \in E'} \psi'[g'] \quad \text{subject to } \psi' \text{ satisfies } 1 + \sum_{i=1}^{n-1} x_i = 0,$$
where $g'(x_1, \cdots, x_{n-1}) = g(x_1, \cdots, x_{n-1}, 1)$.
D.1.3 Matrix expression of linear constraints
Let $F$ be the set of linear functionals on the space of $(n-1)$-variate multilinear polynomials of degree at most 4. We often regard a functional $\psi \in F$ as a $\binom{n-1}{\le 4}$-dimensional vector with entries $\psi_S = \psi[x_S]$, where $S$ is a subset of $[n-1]$ of size at most 4. The space $E'$ of pseudo-expectations of degree 4 (on $(n-1)$-variate multilinear polynomials) is a convex subset of $F$.
Observe that $\psi \in F$ satisfies $1 + \sum_{i=1}^{n-1} x_i = 0$ if and only if
$$\psi\left[\left(1 + \sum_{i=1}^{n-1} x_i\right) x_S\right] = 0$$
for any $S \subseteq [n-1]$ with $|S| \le 3$.
Let $s$, $t$ and $u$ be integers such that $0 \le s, t \le 4$ and $0 \le u \le \min(s, t)$. Let $M^u_{s,t}$ be the matrix of size $\binom{n-1}{\le 4}$ such that
$$(M^u_{s,t})_{S,T} = \begin{cases} 1 & \text{if } |S| = s,\ |T| = t, \text{ and } |S \cap T| = u \\ 0 & \text{otherwise} \end{cases}$$
for $S, T \in \binom{[n-1]}{\le 4}$. Then, the condition that $\psi \in F$ satisfies $1 + \sum_{i=1}^{n-1} x_i = 0$ can be written as $A\psi = 0$ where
$$A = M^0_{0,0} + M^0_{0,1} + \sum_{s=1}^{3} \left(M^{s-1}_{s,s-1} + M^s_{s,s} + M^s_{s,s+1}\right).$$
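The matrix $A$ can be built explicitly for a small instance. The sketch below assumes $n - 1 = 5$ variables (an illustrative choice), constructs $A$ from the matrices $M^u_{s,t}$, and checks that the degree-4 moment vector of the uniform distribution on $\{x \in \{\pm 1\}^{n-1} : 1 + \sum_i x_i = 0\}$ lies in the nullspace of $A$. Function names are hypothetical helpers.

```python
import itertools
import numpy as np

def index_sets(m, max_size=4):
    """All subsets of {0, ..., m-1} of size at most max_size, ordered by size."""
    return [frozenset(S) for size in range(max_size + 1)
            for S in itertools.combinations(range(m), size)]

def M_matrix(sets, s, t, u):
    """M^u_{s,t}: indicator of |S| = s, |T| = t, |S intersect T| = u."""
    return np.array([[1.0 if (len(S) == s and len(T) == t and len(S & T) == u) else 0.0
                      for T in sets] for S in sets])

def constraint_matrix(sets):
    """A = M^0_{0,0} + M^0_{0,1} + sum_{s=1}^{3} (M^{s-1}_{s,s-1} + M^s_{s,s} + M^s_{s,s+1})."""
    A = M_matrix(sets, 0, 0, 0) + M_matrix(sets, 0, 1, 0)
    for s in range(1, 4):
        A += (M_matrix(sets, s, s - 1, s - 1) + M_matrix(sets, s, s, s)
              + M_matrix(sets, s, s + 1, s))
    return A

def uniform_moments(sets, m):
    """Moments psi_S = E[x_S] under the uniform law on {x : 1 + sum_i x_i = 0}."""
    support = [x for x in itertools.product([-1, 1], repeat=m) if sum(x) == -1]
    return np.array([np.mean([np.prod([x[i] for i in S]) for x in support])
                     for S in sets])

m = 5                                   # m = n - 1, so n = 6 here
sets = index_sets(m)
A = constraint_matrix(sets)
psi = uniform_moments(sets, m)
print("matrix size:", A.shape)
print("norm of A psi:", np.linalg.norm(A @ psi))
```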
D.1.4 Algebra generated by $M^u_{s,t}$
Let $m$ be a positive integer greater than 8. For nonnegative integers $s, t, u$, let $M^u_{s,t}$ be the $\binom{m}{\le 4} \times \binom{m}{\le 4}$ matrix with
$$(M^u_{s,t})_{S,T} = \begin{cases} 1 & \text{if } |S| = s,\ |T| = t, \text{ and } |S \cap T| = u \\ 0 & \text{otherwise,} \end{cases}$$
for $S, T \subseteq [m]$ with $|S|, |T| \le 4$. Let $\mathcal{A}$ be the algebra of matrices
$$\sum_{0 \le s, t \le 4} \sum_{u=0}^{s \wedge t} x^u_{s,t} M^u_{s,t}$$
with complex numbers $x^u_{s,t}$. This algebra $\mathcal{A}$ is a $C^*$-algebra: it is a complex algebra which is closed under taking complex conjugate. $\mathcal{A}$ is a subalgebra of the Terwilliger algebra of the Hamming cube $H(m, 2)$ [41], [42].
Note that $\mathcal{A}$ has dimension 55, which is the number of triples $(s, t, u)$ with $0 \le s, t \le 4$ and $0 \le u \le s \wedge t$. Define
$$\beta^u_{s,t,r} := \sum_{p=0}^{s \wedge t} (-1)^{p-u} \binom{p}{u} \binom{m-2r}{p-r} \binom{m-r-p}{s-p} \binom{m-r-p}{t-p}$$
for $0 \le s, t \le 4$ and $0 \le r, u \le s \wedge t$. The following theorem says that matrices in the algebra $\mathcal{A}$ can be written in a block-diagonal form with small blocks.
Theorem 14 ([42]). There exists an orthogonal $\binom{m}{\le 4} \times \binom{m}{\le 4}$ matrix $U$ such that for $M \in \mathcal{A}$ with
$$M = \sum_{s,t=0}^{4} \sum_{u=0}^{s \wedge t} x^u_{s,t} M^u_{s,t},$$
the matrix $U^T M U$ is equal to the matrix
$$\begin{pmatrix} C_0 & 0 & 0 & 0 & 0 \\ 0 & C_1 & 0 & 0 & 0 \\ 0 & 0 & C_2 & 0 & 0 \\ 0 & 0 & 0 & C_3 & 0 \\ 0 & 0 & 0 & 0 & C_4 \end{pmatrix}$$
where each $C_r$ is a block diagonal matrix with $\binom{m}{r} - \binom{m}{r-1}$ repeated, identical blocks of order $5 - r$:
$$C_r = \begin{pmatrix} B_r & 0 & \cdots & 0 \\ 0 & B_r & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & B_r \end{pmatrix},$$
and
$$B_r = \left(\sum_u \binom{m-2r}{s-r}^{-1/2} \binom{m-2r}{t-r}^{-1/2} \beta^u_{s,t,r}\, x^u_{s,t}\right)_{s,t=r}^{4}.$$
For brevity, let us denote this block-diagonalization of $M$ by the tuple of matrices $(B_0, B_1, B_2, B_3, B_4)$, where each $B_r$ is a $(5-r) \times (5-r)$ matrix.
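D.1.5 Projection
Recall that $g'(x_1, \cdots, x_{n-1})$ is equal to
$$\sum_{i,j,k,\ell \in [n]} \mathbf{W}_{ijk\ell}\, x_{\{i,j,k,\ell\} \setminus \{n\}}.$$
Let $c \in \mathbb{R}^{\binom{n-1}{\le 4}}$ be the coefficient vector of $g'$. Since the entries of $\mathbf{W}$ are independent standard Gaussians, $c$ is a Gaussian vector with a diagonal covariance matrix $\Sigma = \mathbb{E}[cc^T]$ where
$$\Sigma_{S,S} = \begin{cases} n & \text{if } |S| = 0 \\ 12n - 16 & \text{if } |S| = 1 \text{ or } |S| = 2 \\ 24 & \text{if } |S| = 3 \text{ or } |S| = 4. \end{cases}$$
Let $w = \Sigma^{-1/2} c$. By definition, $w$ is a random vector with i.i.d. standard normal entries.
Let $\Pi = \mathrm{Id} - A^T (AA^T)^+ A$ where $(AA^T)^+$ is the Moore-Penrose pseudoinverse of $AA^T$ and $\mathrm{Id}$ is the identity matrix of order $\binom{n-1}{\le 4}$. Then, $\Pi$ is the orthogonal projection matrix onto the nullspace of $A$. Since $A$, $A^T$ and $\mathrm{Id}$ are all in the algebra $\mathcal{A}$, the projection matrix $\Pi$ is also in $\mathcal{A}$. Let $e$ be the first column of $\Pi$ and
$$\psi_0 := \frac{e}{e^T e} \quad \text{and} \quad \psi_1 := \frac{\Pi w}{e^T w}.$$
We have $A\psi_0 = A\psi_1 = 0$ by definition of $\Pi$, and $(\psi_0)_\emptyset = (\psi_1)_\emptyset = 1$ since $(\Pi w)_\emptyset = e^T w$.
Let $\epsilon$ be a real number with $0 < \epsilon < 1$ and $\psi = (1 - \epsilon)\psi_0 + \epsilon \psi_1$. This functional still satisfies $A\psi = 0$ and $\psi_\emptyset = 1$, regardless of the choice of $\epsilon$. We would like to choose $\epsilon$ such that $\psi$ is positive semidefinite with high probability.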
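The projection step can be reproduced numerically on a small instance. The following sketch makes several illustrative assumptions: $n - 1 = 5$ variables, a standard Gaussian $w$ drawn with a fixed seed, $\epsilon = 0.1$, and an explicit `rcond` cutoff in the pseudoinverse to discard numerically zero singular values. It checks that $A\psi = 0$ and $\psi_\emptyset = 1$, and that $\psi_0$ has entries $1$, $-\frac{1}{n-1}$ and $\frac{3}{(n-1)(n-3)}$ according to the size of $S$.

```python
import itertools
import numpy as np

def index_sets(m, max_size=4):
    return [frozenset(S) for size in range(max_size + 1)
            for S in itertools.combinations(range(m), size)]

def constraint_matrix(sets):
    """Row S (|S| <= 3) encodes psi[(1 + sum_i x_i) x_S] = 0; rows with |S| = 4 are zero."""
    position = {S: idx for idx, S in enumerate(sets)}
    m = max(len(S) for S in sets) + 1          # ground set size (here 5, since max |S| = 4)
    A = np.zeros((len(sets), len(sets)))
    for S in sets:
        if len(S) > 3:
            continue
        A[position[S], position[S]] += 1.0     # the term 1 * x_S
        for i in range(m):
            A[position[S], position[S ^ {i}]] += 1.0   # x_i * x_S = x_{S xor {i}}
    return A

n = 6
m = n - 1
sets = index_sets(m)
A = constraint_matrix(sets)

# Orthogonal projection onto the nullspace of A.
Pi = np.eye(len(sets)) - A.T @ np.linalg.pinv(A @ A.T, rcond=1e-10) @ A
e = Pi[:, 0]                                   # column indexed by the empty set

rng = np.random.default_rng(4)
w = rng.standard_normal(len(sets))             # stand-in for the whitened coefficients
psi0 = e / (e @ e)
psi1 = Pi @ w / (e @ w)
eps = 0.1
psi = (1 - eps) * psi0 + eps * psi1

sizes = np.array([len(S) for S in sets])
expected_psi0 = np.where(sizes == 0, 1.0,
                         np.where(sizes <= 2, -1.0 / (n - 1), 3.0 / ((n - 1) * (n - 3))))
print("norm of A psi:", np.linalg.norm(A @ psi), " psi_empty:", psi[0])
print("max deviation of psi0:", np.abs(psi0 - expected_psi0).max())
```

Here the rows of $A$ are filled in directly from the constraint $\psi[(1 + \sum_i x_i) x_S] = 0$, which gives the same matrix as the sum of the $M^u_{s,t}$ terms.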
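D.1.6 Spectrum of $\psi$
Consider the functional $\psi_0 = \frac{e}{e^T e}$. It has entries
$$(\psi_0)_S = \begin{cases} 1 & \text{if } S = \emptyset \\ -\frac{1}{n-1} & \text{if } |S| = 1 \text{ or } 2 \\ \frac{3}{(n-1)(n-3)} & \text{if } |S| = 3 \text{ or } 4, \end{cases}$$
for $S \subseteq [n-1]$ of size at most 4. We note that this functional corresponds to the moments of degree at most 4 of the uniform distribution on the set of vectors $x \in \{\pm 1\}^{n-1}$ satisfying $\sum_{i=1}^{n-1} x_i + 1 = 0$.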
Proposition 15. Let $\psi$ be a vector in $\mathbb{R}^{\binom{n-1}{\le 4}}$ such that $A\psi = 0$ and let $p$ be an $(n-1)$-variate multilinear polynomial of degree at most 2. Suppose that $\psi_0[p^2] = 0$. Then, $\psi[p^2] = 0$.
Proof. Let $U = \{x \in \{\pm 1\}^{n-1} : \sum_{i=1}^{n-1} x_i + 1 = 0\}$. Note that $\psi_0$ is the expectation functional of the uniform distribution on $U$, as we have seen above. Hence, $\psi_0[p^2] = 0$ if and only if $p(x)^2 = 0$ for any $x \in U$.
On the other hand, the functional $\psi$ is a linear combination of the functionals $\{p \mapsto p(x) : x \in U\}$ since $A\psi = 0$. Hence, if $\psi_0[p^2] = 0$ then $\psi[p^2] = 0$, as $p(x)^2 = 0$ for any $x \in U$.
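Recall that $\psi = (1 - \epsilon)\psi_0 + \epsilon \psi_1$ where $\psi_0 = \frac{e}{e^T e}$ and $\psi_1 = \frac{\Pi w}{e^T w}$. Let $\psi_1' = e^T w \cdot (\psi_1 - \psi_0)$. Then,
$$\psi_1' = \Pi w - \frac{e^T w}{e^T e}\, e = \left(\Pi - \frac{ee^T}{e^T e}\right) w$$
and $\psi = \psi_0 + \frac{\epsilon}{e^T w} \psi_1'$. We note that $A\psi_1' = 0$ since $\psi_1'$ is a linear combination of $\psi_0$ and $\psi_1$.
Let $X_{\psi_0}$ and $X_{\psi_1'}$ be the moment matrices of $\psi_0$ and $\psi_1'$, respectively. Let $X_\psi$ be the moment matrix of $\psi$. Clearly,
$$X_\psi = X_{\psi_0} + \frac{\epsilon}{e^T w} X_{\psi_1'}.$$
Moreover, for any $p \in \mathbb{R}^{\binom{n-1}{\le 2}}$ satisfying $X_{\psi_0} p = 0$, we have $X_{\psi_1'} p = 0$ by the proposition. Hence, $X_\psi \succeq 0$ if
$$\frac{\epsilon}{|e^T w|} \left\|X_{\psi_1'}\right\| \le \lambda_{\min, \ne 0}(X_{\psi_0}),$$
where $\lambda_{\min, \ne 0}$ denotes the minimum nonzero eigenvalue.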
We note that $e^T w$ and $\|X_{\psi_1'}\|$ are independent random variables. This follows from the fact that $w$ is a Gaussian vector with i.i.d. standard entries, and that $e$ and $\left(\Pi - \frac{ee^T}{e^T e}\right)$ are orthogonal. Hence, we can safely bound $e^T w$ and $\|X_{\psi_1'}\|$ separately.
To bound $\|X_{\psi_1'}\|$ we need the following theorem.
Theorem 16 (Matrix Gaussian ([43])). Let $\{A_k\}$ be a finite sequence of fixed, symmetric matrices with dimension $d$, and let $\{\xi_k\}$ be a finite sequence of independent standard normal random variables. Then, for any $t \ge 0$,
$$\Pr\left[\left\|\sum_k \xi_k A_k\right\| \ge t\right] \le d \cdot e^{-t^2 / 2\sigma^2} \quad \text{where} \quad \sigma^2 := \left\|\sum_k A_k^2\right\|.$$