Community detection in hypergraphs, spiked tensor models, and Sum-of-Squares

(1)

arXiv:1705.02973v1 [cs.DS] 8 May 2017

Community Detection in Hypergraphs, Spiked

Tensor Models, and Sum-of-Squares

Chiheon Kim

∗

Afonso S. Bandeira

†

Michel X. Goemans

‡

May 9, 2017

Abstract

We study the problem of community detection in hypergraphs under a stochastic block model. Similarly to how the stochastic block model in graphs suggests studying spiked random matrices, our model motivates investigating statistical and computational limits of exact recovery in a certain spiked tensor model. In contrast with the matrix case, the spiked model naturally arising from community detection in hypergraphs is dif-ferent from the one arising in the so-called tensor Principal Component Analysis model. We investigate the effectiveness of algorithms in the Sum-of-Squares hierarchy on these models. Interestingly, our results suggest that these two apparently similar models exhibit significantly different computational to statistical gaps.

1 Introduction

Community detection is a central problem in many fields of science and engineer-ing. It has received much attention for its various applications to sociological behaviours [1, 2, 3], protein-to-protein interactions [4, 5], DNA 3D conforma-tion [6], recommendaconforma-tion systems [7, 8, 9], and more. In many networks with community structure one may expect that the groups of nodes within the same community are more densely connected. The stochastic block model (SBM)

∗_{MIT, Department of Mathematics, 77 Massachusetts Ave., Cambridge, MA 02139} †_{Department of Mathematics and Center for Data Science, Courant Institute of}

Mathe-matical Sciences, New York University, New York, NY 10012. Part of this work was done while A. S. Bandeira was with the Mathematics Department at MIT and supported by NSF Grant DMS-1317308.

‡_{MIT, Department of Mathematics, Room 2-474, 77 Massachusetts Ave., Cambridge, MA}

(2)

[10] is arguably the simplest model that attempts to capture such community structure.

Under the SBM, each pair of nodes is connected randomly and independently with a probability decided by the community membership of the nodes. The SBM has received much attention for its sharp phase transition behaviours [11], [12], [13], computational versus information-theoretic gaps [14], [15], and as a test bed for many algorithmic approaches including semidefinite programming [13], [16], spectral methods [17], [18] and belief-propagation [19]. See [20] for a good survey on the subject.

Let us illustrate a version of the SBM with two equal-sized communities. Let

yP t˘1un_{be a vector indicating community membership of an even number}_n_of

nodes and assume that the size of two communities are equal, i.e.,1T_y_“_{0. Let}_p

andqbe inr0,1sindicating the density of edges within and across communities, respectively. Under the model, a random graphGis generated by connecting nodesiandj independently with probabilitypifyi“yj or with probabilityq

ifyi‰yj. Letypbe any estimator ofygiven a sampleG. We say thatypexactly

recoversy ifypis equal toy or´y with probability 1´o_np1q. In the asymptotic regime ofp_“alogn_{nand q_“blogn_{nwhere a_ąb, it has been shown that the maximum likelihood estimator (MLE) recoversy when?a_´?b_ą?2 and the MLE fails when?a_´?b_ă?2, showing a sharp phase transition behaviour [13]. Moreover, it was subsequently shown in [16] and [21] that the standard semidefinite programming (SDP) relaxation of the MLE achieves the optimal recovery threshold.

A fruitful way of studying phase transitions and effectiveness of different algorithms for the stochastic block model is to consider a Gaussian analogue of the model [22]. Let G be a graph generated by the SBM and let AG be the

adjacency matrix ofG. We have

EAG“ p`q 2 11 T _`p´q 2 yy T _´_pI.

It is then useful to thinkAG as a perturbation of the signalEAGunder centered

noise.

This motivates a model with Gaussian noise. Given a vectory_{P t˘}1un_with

1T_y_“_{0, a random matrix}_T _{is generated as}

Tij“yiyj`σWij

whereWij“Wji„Np0,1qfor alli, jP rns. This model is often referred toZ2

-synchronization [21, 22]. It is very closely related to the spiked Wigner model

(or spiked random matrix models in general) which has a rich mathematical theory [23, 24, 25].

In many applications, however, nodes exhibit complex interactions that may not be well captured by pairwise interactions [26, 27]. One way to increase the descriptive ability of these models is to considerk-wise interactions, giving rise to generative models on random hypergraphs and tensors. Specifically, a hypergraphic version of the SBM was considered in [28, 29], and a version of the censored block model in [30].

(3)

For the sake of exposition we restrict our attention to k_“4 in the sequel. Most of our results however easily generalize to anyk and will be presented in a subsequent publication. In the remaining of this section, we will introduce a hypergraphic version of the SBM and its Gaussian analogue.

1.1 Hypergraphic SBM and its Gaussian analogue

Here we describe a hypergraphic version of the stochastic block model. Lety

be a vector in t˘1un _{such that} ₁T_y _“_{0. Let} _p_and _q _{be in} _r₀_,₁_s_{. Under the}

model, a random 4-uniform hypergraphH onnnodes is generated so that each

ti, j, k, l_{u Ď r}n_sis included in E_pH_qindependently with probability

#

p ifyi“yj “yk“yl(in the same community)

q otherwise (across communities).

LetAH, the adjacency tensor ofH, be the 4-tensor given by

pA_Hqijkl_“ #

1 ifti, j, k, lu PEpHq

0 otherwise. Lety“l4 _{be the 4-tensor defined as}

y_ijkl“l4 “ # 1 ifyi“yj “yk“yl 0 otherwise, SinceyP t˘1un_{, we have} y“l4_“ ˆ₁ `y 2 ˙b4 ` ˆ₁ ´y 2 ˙b4 .

Note that for any quadruple of distinct nodespi, j, k, lqwe have

pEAHqijkl “ pq1b4` pp´qqy“l4qijkl.

In the asymptotic regime of p_“alogn_{`n´₃1˘ andq _“blogn_{`n´₃1˘, there is a sharp information-theoretic threshold for exact recovery. Results regarding this hypergraphic stochastic block model will appear in subsequent publication. Here we will focus on the Gaussian counterpart.

Analogously to the relationship between the SBM and the spiked Wigner model, the hypergraphic version of the SBM suggests the following spiked tensor model:

T_“y“l4_`σW,

whereWis a random 4-tensor with i.i.d. standard Gaussian entries. We note that here the noise tensor W is not symmetric, unlike in the spiked Wigner model. This is not crucial: assuming W to be a symmetric tensor will only scaleσby?4!.

(4)

2 The Gaussian planted bisection model

Given a sampleT, our goal is to recover the hidden spikeyup to a global sign flip. Letpybe an estimator ofycomputed fromT. Letp_py_p;σ_qbe the probability thatpysuccessfully recoversy. SinceWis Gaussian,ppyp;σqis maximized when

p y _“ argmin xPt˘1_un_:₁T x“0} T_´x“l4 }2 F,

or equivalentlypy“argmax_x@x“l4_,_TD_{, the maximum-likelihood estimator (MLE)}

ofy.

Letf_px_{q “}@x“l4_,_TD_and

p

yM L be the MLE ofy. By definition,

p_py_pM L;σq “Pr ¨ ˝f_py_{q ą} max xPt˘1_uzty,´yu 1T_x_“0 f_px_q ˛ ‚.

Theorem 1. Let ǫ_ą0 be a constant not depending on n. Then, as n grows,

p_py_pM L;σq converges to 1 if σ ă p1 ´ǫqσ˚ and ppypM L;σq converges to 0 if

σ_{ą p}1`ǫ_qσ˚_{, where} σ˚“ c 1 8¨ n3{2 ? logn.

Here we present a sketch of the proof while deferring the details to the appendix. Observe thatypM Lis not equal toy if there existsxP t˘1un distinct

from y such that 1T_x _“ _{0 and}_f_p_x_{q ě}_f_p_y_q_{. For each fixed} _x_{, the difference}

fpxq ´fpyqis equal to

@

y“l4, x“l4_´y“l4D_`σ@W, x“l4_´y“l4D

which is a Gaussian random variable with mean@y“l4_{, x}l“4_´_y“l4D_{and variance}

σ2_}_x“l4_´_y“l4_}2 F. By definition we have @ x“l4, y“l4D“ ₁₂₈1 `p1T1_`xTyq4 ` p1T1_´xTyq4˘ . Letφptq “ 1 128pp1`tq 4 ` p1´tq4 q. Then,@x“l4, y“l4D“n4φpxTy{nqso @ y“l4, x“l4ý“l4D “ ń4`φp1q ´φpxTy{nq˘, }x“l4 ý“l4 }F “ n2b₂_φ p1q ´2φ_pxT_y_{_n_q_. Hence, Prpfpxq ´fpyq ě0qis equal to Pr G„Np0,1_q ˆ Gě n 2 σ?2 ¨ b φp1q ´φpxT_y_{_n_q ˙ .

This probability is maximized when xand y differs by only two indices, that is, xT_y _“ _n_´_{4. Indeed one can formally prove that the probability that} _y

(5)

maximizesf_px_qis dominated by the probability thatf_py_{q ą}f_px_qfor allxwith

xT_y _“_n_´_{4. By union bound and standard Gaussian tail bounds, the latter}

probability is at most ´_n 2 ¯2 exp ˆ ´n 4 4σ2 ¨ pφp1q ´φp1´4{nqq ˙ which is approximately exp ˆ 2 logn´n 3_¨_φ1_p₁_q σ2 ˙ “exp ˆ 2 logn´ n 3 4σ2 ˙ , and it isonp1qifσ2_ă n3 8 logn as in Theorem 1.

3 Efficient recovery

Before we address the Gaussian model, let us describe an algorithm for hyper-graph partitioning. LetH be a 4-uniform hypergraph on the vertex setV “ rns. LetAH be the adjacency tensor ofH. Then the problem can be formulated as

max@A_H, x“l4D subject tox_{P t˘}1un,1Tx_“0.

One approach for finding a partition is to consider the multigraph realization of

H, which appears in [28, 29] in different terminology. Let Gbe the multigraph onV _{“ r}n_ssuch that the multiplicity of edgeti, j_uis the number of hyperedges

e_PE_pH_qcontainingti, j_u. One may visualize it as substituting each hyperedge by a 4-clique. Now, one may attempt to solve the reduced problem

max@AG, xxT

D

subject to x_{P t˘}1un_,₁T_x_“₀

which is unfortunately NP-hard in general. Instead, we consider the semidefinite programming (SDP) relaxation

max xAG, Xy

subject to Xii“1 for alliP rns,

@

X,11TD_“0, Xľ0.

WhenAG is generated under the graph stochastic block model, this algorithm

recovers the hidden partition down to optimum parameters. On the other hand, it achieves recovery to nearly-optimum parameters when Gis the multigraph corresponding to the hypergraph generated by a hypergraphic stochastic block model: there is a constant multiplicative gap between the guarantee and the information-theoretic limit. This will be treated in a future publication.

Here we had two stages in the algorithm: (1) “truncating” the hypergraph down to a multigraph, and (2) relaxing the optimization problem on the trun-cated objective function.

(6)

Now let us return to the Gaussian model T _“ y“l4

`σW. Let f_px_{q “}

@

x“l4_,_TD_{. Our goal is to find the maximizer of}_f

px_q. Note that fpxq “ ÿ i1,¨¨¨,i4Prns Ti1i2i3i4¨ 1 16 ˜ ₄ ź s“1 p1`xisq ` 4 ź s“1 p1´xisq ¸

so fpxqis a polynomial of degree 4 in variables x1, . . . , xn. Letfp2qpxqbe the

degree 2 truncation offpxq, i.e.,

f_p2_qpxq “ 1 8 ÿ i1,¨¨¨,i4Prns Ti1i2i3i4 ˜ ÿ 1_ďsătď4 xisxit ¸ .

Here we have ignored the constant term of f_px_q since it does not affect the maximizer. For eachts_ăt_{u Ď t}1,2,3,4u, letQst _be_n_by_n_{matrix where}

Qst_ij “ ÿ pi1,¨¨¨,i4qPrns 4 is“i,it“j Ti1i2i3i4. Then, f_p2_qpxq “ 1 8 @ Q, xxTD

whereQ“ř1ďsătď4Qst. This Qis analogous to the adjacency matrix of the

multigraph constructed above. It is now natural to consider the following SDP: max xQ, X_y

@

X,11TD_“0, X ľ0_.

(1)

Theorem 2. Let ǫą0 be a constant not depending on n. Let Yp be a solution of (1) andppYp;σqbe the probability thatYp coincide withyyT. Ifσă p1´ǫqσ_p˚₂_q

where σ_p˚₂_q“ c 3 32¨ n3{2 ? logn “ c 3 4¨σ ˚_, thenppYp;σq “1´onp1q.

We present a sketch of the proof where details are deferred to the appendix. We note that a similar idea was used in [16, 21] for the graph stochastic block model.

We construct a dual solution of (1) which is feasible with high probability, and certifies that yyT is the unique optimum solution for the primal (1). By complementary slackness, such dual solution must be of the formS:“DQ1´Q1

whereQ1 “diagpyqQdiagpyq andDQ1 is diagpQ11q. It remains to show thatS

(7)

To show that S is positive semidefinite, we claim that the second smallest eigenvalue ofES is Θpn3

qand the operator norm }S_´ES_} isO_pσn3_{2?_log_n

q

with high probability. The first part is just an easy calculation, and the second part is application of a nonasymptotic bound on Laplacian random matrices [21]. Hence,Sis positive semidefinite with high probability ifσn3_{2?_log_n

Àn3_,

matching with the order ofσ˚ p2q.

4 Standard Spiked Tensor Model

Montanari and Richard proposed a statistical model for tensor Principal Com-ponent Analysis [31]. In the model we observe a random tensorT_“vb4

`σW

where v _P Rn is a vector with }v_{} “} ?n (spike), W is a random 4-tensor with i.i.d. standard Gaussian entries, and σě0 is the noise parameter. They showed a nearly-tight information-theoretic threshold for approximate recovery: whenσ"n3{2 _{then the recovery is information-theoretically impossible, while}

ifσ!n3{2 _{then the MLE gives a vector}_v1_P_Rn _with_|_vT_v1_{| “ p}₁_´_o_np₁_qq_n_with

high probability. Subsequently, sharp phase transitions for weak recovery, and strong and weak detection were shown in [32].

Those information-theoretic thresholds are achieved by the MLE for which no efficient algorithm is known. Montanari and Richard considered a simple spectral algorithm based on tensor unfolding, which is efficient in both theory and practice. They show that the algorithm finds a solutionv1 _with _|_vT_v1_{| “}

p1´o_np1qqnwith high probability as long asσ_“O_pn_q[31]. This is somewhat believed to be unimprovable using semidefinite programming [33, 34], or using approximate message passing algorithms [35].

For clear comparison to the Gaussian planted bisection model, let us consider when spike is in the hypercubet˘1un_{. Let}_y_{P t˘}₁_un_and_σ_ě_{0. Given a tensor}

T_“yb4`σW

where W is a random 4-tensor with independent, standard Gaussian entries, we would like to recover y exactly. The MLE is given by the maximizer of

gpxq:“@xb4_,_TD_{over all vectors}_x_{P t˘}₁_un_.

Theorem 3. Let ǫ ą 0 be any constant which does not depend on n. Let

λ˚ “ ?2_¨n3_{2

?_log

n . Whenσą p1`ǫqλ˚, exact recovery is information-theoretically

impossible (i.e. the MLE fails with1´onp1qprobability), while if σă p1´ǫqλ˚

then the MLE recoversy with 1´onp1qprobability.

The proof is just a slight modification of the proof of Theorem 1 which ap-pears in the appendix. Note that bothλ˚andσ˚are in the order ofn3{2_{?_log_n_.

The standard spiked tensor model and the Gaussian planted bisection model ex-hibit similar behaviour when unbounded computational resources are given.

(8)

4.1 Sum-of-Squares based algorithms

Here we briefly discuss Sum-of-Squares based relaxation algorithms. Given a polynomial p P Rrx1,¨ ¨ ¨, xns, consider the problem of finding the maximum

ofppxqover xPRn _{satisfying polynomial equalities}_q

1pxq “0,¨ ¨ ¨, qmpxq “0.

Most hard combinatorial optimization problems can be reduced into this form, including max-cut,k-colorability, and general constraint satisfaction problems. TheSum-of-Squares hierarchy (SoS) is a systematic way to relax a polynomial optimization problem to a sequence of increasingly strong convex programs, each leading to a larger semidefinite program. See [36] for a good exposition of the topic.

There are many different ways to formulate the SoS hierarchy [37, 38, 39, 40]. Here we choose to follow the description based onpseudo-expectation functionals

[36].

For illustration, let us recall the definition of g_px_q: given a tensor T _“

yb4_`_σ_W_{, we define} _g_p_x_{q “}@_x_b4_,_TD_{. Here}_g_p_x_q_{is a polynomial of degree 4,}

and the corresponding maximum-likelihood estimator is the maximizer max gpxq

subject to x2_i _“1 for alli_{P r}n_s,

n

ÿ

i“1

xi“0, xPRn.

(2)

Let µ be a probability distribution over the set txP t˘1un _: ₁T_x _“ ₀_u_{. We}

can rewrite (2) as maxµEx„µgpxq over such distributions. A linear functional

r

E on Rrx_s is called pseudoexpectation of degree 2ℓ if it satisfies Er1 “ 1 and

r

Eq_px_q2

ě0 for anyq_P_Rrx_sof degree at mostℓ. We note that any expectation

Eµ is a pseudoexpectation, but the converse is not true. So (2) can be relaxed to

max Erg_px_q

subject to Er is a pseudoexpectation of degree 2ℓ,

r

Eis zero on I

(3)

whereI_Ď_Rrx_sis the ideal generated byřn_i_“1xi andtx2i ´1uiPrns.

The space of pseudoexpectations is a convex set which can be described as an affine section of the semidefinite cone. Asℓincreases, this space gets smaller and in this particular case, it coincides with the set of true expectations when

ℓ“n.

4.2 SoS on spiked tensor models

We would like to apply the SoS algorithm to the spiked tensor model. In par-ticular, let us consider the degree 4 SoS relaxation of the maximum-likelihood problem. We note that for the spike tensor model with the spherical prior, it is known that neither algorithms using tensor unfolding [31], the degree 4 SoS

(9)

relaxation of the MLE [33], nor approximate message passing [35] can achieve the statistical threshold, therefore the statistical-computational gap is believed to be present. Moreover, higher degree SoS relaxations were considered in [34]: for any smallǫ_ě0, degree at leastnΩ_pǫq_{is required in order to achieve recovery}

whenσ_«n1_`ǫ_.

We show an analogous gap of the degree 4 SoS relaxation fort˘1un_and₁T_x

prior.

Theorem 4. Let T_“yb4_`_σ_W_{be as defined above. Let} _g_p_x_{q “}@_xb4_,_TD_{. If}

σÀ n

logΘ_p1_q

n, then the degree 4 SoS relaxation of maxxgpxq gives the solution

y, i.e., Erg_px_q is maximized when Er is the expectation operator of the uniform distribution onty,´yu.

On the other hand, if σ ÁnlogΘp1q_n _{then there exists a pseudoexpectation}

r

Eof degree 4 on the hypercube t˘1un _satisfying₁T_x_“₀ _{such that}

gpyq ămax

r

E

r

Egpxq,

soy is not recovered via the degree 4 SoS relaxation.

The proof of Theorem 4 is very similar to one that appears in [33, 34]. In the proof it is crucial to observe that

n3

logΘp1qnÀ Er:degree 4max pseudo-exp.

r

E@W, xb4DÀn3logΘp1qn.

The upper bound can be shown via Cauchy-Schwarz inequality for pseudoex-pectations and the lower bound, considerably more involved, can be shown by constructing a pseudoexpectationrEwhich is highly correlated to the entries of

W. We refer the readers to the appendix for details.

4.3 Comparison with the planted bisection model

Here we summarize the statistical-computational thresholds of two models, the planted bisection model and the spiked tensor model.

• For the planted bisection model, there is a constant c _ą 0 not depend-ing on n such that for any ǫ ą 0 if σ ă pc´ǫq n3{2

?

logn then recovery is

information-theoretically possible, and if σ ą pc`ǫq n3_{2

?_log

n then the

re-covery is impossible via any algorithm. Moreover, there is an efficient algorithm and a constantc1 ăc such that the algorithm recoversy with high probability whenσ_ăc1_?n3{2

logn.

• For the spiked tensor model, there is a constant C ą 0 not depending on n such that for any ǫ _ą 0 if σ _{ă p}C _´ǫ_q_?n3{2

logn then the recovery

is information-theoretically possible, and if σ ą pC`ǫq n3{2

?_log

(10)

recovery is impossible. In contrast to the planted bisection model, all effi-cient algorithms known so far only successfully recoveryin the asymptotic regime ofσ_Àn.

We note that the efficient, nearly-optimal recovery is achieved for the planted bisection model by “forgetting” higher moments of the data. On the other hand, such approach is unsuitable for the spiked tensor model since the signal

yb4 _{does not seem to be approximable by a non-trivial low-degree polynomial.}

This might shed light into an interesting phenomenon in average complexity, as in this case it seems to be crucial that second moments (or pairwise relations) carry information.

All information-theoretic phase transition exhibited in the paper readily gen-eralize tok-tensor models and this will be discussed in a future publication. In future work we will also investigate the performance of higher degree SoS algo-rithms for the planted bisection model.

References

[1] A. Goldenberg, A. X. Zheng, S. E. Fienberg, E. M. Airoldiet al., “A sur-vey of statistical network models,”Foundations and TrendsR in Machine

Learning, vol. 2, no. 2, pp. 129–233, 2010.

[2] S. Fortunato, “Community detection in graphs,”Physics reports, vol. 486, no. 3, pp. 75–174, 2010.

[3] M. E. Newman, D. J. Watts, and S. H. Strogatz, “Random graph models of social networks,”Proceedings of the National Academy of Sciences, vol. 99, no. suppl 1, pp. 2566–2572, 2002.

[4] E. M. Marcotte, M. Pellegrini, H.-L. Ng, D. W. Rice, T. O. Yeates, and D. Eisenberg, “Detecting protein function and protein-protein interactions from genome sequences,”Science, vol. 285, no. 5428, pp. 751–753, 1999. [5] J. Chen and B. Yuan, “Detecting functional modules in the yeast protein–

protein interaction network,” Bioinformatics, vol. 22, no. 18, pp. 2283– 2290, 2006.

[6] I. Cabreros, E. Abbe, and A. Tsirigos, “Detecting community structures in hi-c genomic data,” in Information Science and Systems (CISS), 2016 Annual Conference on. IEEE, 2016, pp. 584–589.

[7] G. Linden, B. Smith, and J. York, “Amazon. com recommendations: Item-to-item collaborative filtering,”IEEE Internet computing, vol. 7, no. 1, pp. 76–80, 2003.

[8] S. Sahebi and W. W. Cohen, “Community-based recommendations: a so-lution to the cold start problem,” in Workshop on recommender systems and the social web, RSWEB, 2011.

(11)

[9] R. Wu, J. Xu, R. Srikant, L. Massouli´e, M. Lelarge, and B. Hajek, “Clus-tering and inference from pairwise comparisons,” in ACM SIGMETRICS Performance Evaluation Review, vol. 43, no. 1. ACM, 2015, pp. 449–450. [10] P. W. Holland, K. B. Laskey, and S. Leinhardt, “Stochastic blockmodels:

First steps,”Social networks, vol. 5, no. 2, pp. 109–137, 1983.

[11] E. Mossel, J. Neeman, and A. Sly, “A proof of the block model threshold conjecture,”arXiv preprint arXiv:1311.4115, 2013.

[12] E. Abbe and C. Sandon, “Community detection in general stochastic block models: Fundamental limits and efficient algorithms for recovery,” in Foun-dations of Computer Science (FOCS), 2015 IEEE 56th Annual Symposium on. IEEE, 2015, pp. 670–688.

[13] E. Abbe, A. S. Bandeira, and G. Hall, “Exact recovery in the stochastic block model,” IEEE Transactions on Information Theory, vol. 62, no. 1, pp. 471–487, 2016.

[14] Y. Chen and J. Xu, “Statistical-computational tradeoffs in planted prob-lems and submatrix localization with a growing number of clusters and submatrices,” Journal of Machine Learning Research, vol. 17, no. 27, pp. 1–57, 2016.

[15] E. Abbe and C. Sandon, “Recovering communities in the general stochas-tic block model without knowing the parameters,” in Advances in neural information processing systems, 2015, pp. 676–684.

[16] B. Hajek, Y. Wu, and J. Xu, “Achieving exact cluster recovery threshold via semidefinite programming,”IEEE Transactions on Information Theory, vol. 62, no. 5, pp. 2788–2797, 2016.

[17] L. Massouli´e, “Community detection thresholds and the weak ramanujan property,” in Proceedings of the 46th Annual ACM Symposium on Theory of Computing. ACM, 2014, pp. 694–703.

[18] V. A. N. Vu, “A simple SVD algorithm for finding hidden partitions,” pp. 1–12, 2014. [Online]. Available: http://arxiv.org/abs/1404.3918v1

[19] E. Abbe and C. Sandon, “Detection in the stochastic block model with multiple clusters: proof of the achievability conjectures, acyclic bp, and the information-computation gap,”arXiv preprint arXiv:1512.09080, 2015. [20] E. Abbe, “Community detection and the stochastic block model: recent

developments,” 2016.

[21] A. S. Bandeira, “Random Laplacian Matrices and Convex Relaxations,” pp. 1–35, 2016.

(12)

[22] A. Javanmard, A. Montanari, and F. Ricci-Tersenghi, “Phase transitions in semidefinite relaxations,”Proceedings of the National Academy of Sciences, vol. 113, no. 16, pp. E2218–E2223, 2016.

[23] D. Féral and S. Péché, “The largest eigenvalue of rank one deformation of large wigner matrices,”Communications in mathematical physics, vol. 272, no. 1, pp. 185–228, 2007.

[24] A. Montanari and S. Sen, “Semidefinite programs on sparse random graphs and their application to community detection,” arXiv preprint arXiv:1504.05910, 2015.

[25] A. Perry, A. S. Wein, A. S. Bandeira, and A. Moitra, “Optimality and sub-optimality of pca for spiked random matrices and synchronization,”arXiv preprint arXiv:1609.05573, 2016.

[26] S. Agarwal, J. Lim, L. Zelnik-Manor, P. Perona, D. Kriegman, and S. Be-longie, “Beyond pairwise clustering,” in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 2. IEEE, 2005, pp. 838–845.

[27] D. Zhou, J. Huang, and B. Sch¨olkopf, “Learning with hypergraphs: Cluster-ing, classification, and embeddCluster-ing,” inNIPS, vol. 19, 2006, pp. 1633–1640. [28] D. Ghoshdastidar and A. Dukkipati, “Consistency of spectral partitioning of uniform hypergraphs under planted partition model,” in Advances in Neural Information Processing Systems, 2014, pp. 397–405.

[29] L. Florescu and W. Perkins, “Spectral thresholds in the bipartite stochastic block model,” arXiv preprint arXiv:1506.06737, 2015.

[30] K. Ahn, K. Lee, and C. Suh, “Community recovery in hypergraphs,” in

Allerton Conference on Communication, Control and Computing. UIUC, 2016.

[31] A. Montanari and E. Richard, “A statistical model for tensor PCA,” p. 30, nov 2014. [Online]. Available: http://arxiv.org/abs/1411.1076

[32] A. Perry, A. S. Wein, and A. S. Bandeira, “Statistical limits of spiked tensor models,” p. 39, dec 2016. [Online]. Available: http://arxiv.org/abs/1612.07728

[33] S. Hopkins, “Tensor principal component analysis via sum-of-squares proofs,”arXiv:1507.03269, vol. 40, pp. 1–51, 2015.

[34] V. Bhattiprolu, V. Guruswami, and E. Lee, “Certifying Random Polynomials over the Unit Sphere via Sum of Squares Hierarchy,” 2016. [Online]. Available: http://arxiv.org/abs/1605.00903

(13)

[35] T. Lesieur, L. Miolane, M. Lelarge, F. Krzakala, and L. Zdeborov´a, “Sta-tistical and computational phase transitions in spiked tensor estimation,”

arXiv preprint arXiv:1701.08010, 2017.

[36] B. Barak and D. Steurer, “Sum-of-squares proofs and the quest toward optimal algorithms,” inProceedings of the International Congress of Math-ematicians, 2014.

[37] N. Z. Shor, “An approach to obtaining global extremums in polynomial mathematical programming problems,”Cybernetics, vol. 23, no. 5, pp. 695– 700, 1987.

[38] P. A. Parrilo, “Structured semidefinite programs and semialgebraic geome-try methods in robustness and optimization,” Ph.D. dissertation, California Institute of Technology, 2000.

[39] Y. Nesterov, “Squared functional systems and optimization problems,” in

High performance optimization. Springer, 2000, pp. 405–440.

[40] J. B. Lasserre, “Global optimization with polynomials and the problem of moments,” SIAM Journal on Optimization, vol. 11, no. 3, pp. 796–817, 2001.

[41] P. Terwilliger, “The subconstituent algebra of an association scheme,(part i),”Journal of Algebraic Combinatorics, vol. 1, no. 4, pp. 363–388, 1992. [42] A. Schrijver, “New code upper bounds from the terwilliger algebra and

semidefinite programming,” IEEE Transactions on Information Theory, vol. 51, no. 8, pp. 2859–2866, 2005.

[43] J. A. Tropp, “User-friendly tail bounds for sums of random matrices,”

Foundations of computational mathematics, vol. 12, no. 4, pp. 389–434, 2012.

(14)

A

Proof of Theorem 1

In this section, we prove Theorem 1 for generalk.

Theorem 5(Theorem 1, generalk). Letkbe a positive integer withkě2. Let

yP t˘1un _with ₁T_y_“₀ _and_T_{be a} _k_{-tensor defined as}_T_“_y“lk_`_σ_W_where

Wis a randomk-tensor with i.i.d. standard Gaussian entries. LetypM Lbe the

maximum-likelihood estimator ofy, i.e.,

p

yM L“ argmax xPt˘1_un_:₁T_x

@

T, x“lkD.

For any positiveǫ,

(i) ypM L is equal to y with probability 1´onp1qif σă p1´ǫqσ˚, and

(ii) ypM L is not equal toy with probability1´onp1q ifσą p1`ǫqσ˚

where σ_˚“ c k 2k ¨ nk´21 ? 2 logn.

First we prove the following lemma.

Lemma 6. Let φ be a function defined as φ_pt_{q “} 1 22k_´1 ` p1´t_qk_{` p}₁_`_t_qk˘_. Then, @ x“lk, y“lkD_“nkφ ˆ xT_y n ˙

for anyx, yP t˘1un _{such that} ₁T_x_“₁T_y_“₀_.

Proof. Note thatx“lk_“ 1 2k ` p1_`x_qbk_{` p}₁_´_x_qbk˘_{. Hence,} @ x“lk, y“lkD_“ 1 22k ÿ s,tPt˘1u x1_`sx,1_`ty_yk. Since1Tx“1Ty“0, we have @ x“lk, y“lkD _“ 1 22k ` 2p1T1_`xTy_qk_`2p1T1_´xTy_qk˘ “ n k 22k´1 ˜ˆ 1`x T_y n ˙k ` ˆ 1´x T_y n ˙k¸ “ nkφ ˆ xT_y n ˙ as desired.

Proof of Theorem 5. Let

(15)

By definition,ypM Lis not equal toy if there existsxP t˘1undistinct fromy or ´y such that1Tx_“0 andf_px_qis greater than or equal tof_py_q. For each fixed

x_{P t˘}1un _with₁T_x_“_{0, note that}

fpxq ´fpyq “@y“lk, x“lk´y“lkD`σ@W, x“lk´y“lkD

is a Gaussian random variable with mean´nk_p_φ_p₁_{q ´}_φ_p_xT_y_{_n_q_{and variance}

σ2 }x“lk_´y“lk_}2 F “2σ2nkpφp1q ´φpxTy{nqq. Hence, Prpfpxq ´fpyq ě0qis equal to Pr G„Np0,1q ˜ Gě n k{2 σ?2¨ d φp1q ´φ ˆ xT_y n ˙¸ . Letσ_˚ be σ_˚ “aφ1_p₁_{q ¨} n k_´1 2 ? 2 logn. Sinceφ1_p₁_{q “} k

2k, it matches with the definition in the statement of the Theorem.

Upper bound. Let us prove that ppypM L;σq “ 1´onp1qif σ ă p1´ǫqσ˚. By

union bound, we have 1´ppypM L;σq ď ÿ xPt˘1un zty,ýu 1T_x_“0 Prpfpxq ´fpyq ě0q “ ÿ xPt˘1un zty,ýu 1T_x_“0 Pr G„Np0,1_q ˜ Gě n k{2 σ?2 ¨ d φp1q ´φ ˆ xT_y n ˙¸ ď ÿ xPt˘1un zty,ýu 1Tx“0 exp ˆ ń k 4σ2 ˆ φp1q ´φ ˆ xT_y n ˙˙˙ .

The last inequality follows from a standard Gaussian tail bound PrGpG_ąt_{q ď}

expp´t2

{2q.

A simple counting argument shows that the number of x P t˘1un _with

1T_x_“_{0 and}_xT_y _“_n_´₄_r_{is exactly} `n{2

r

˘2

forrP t0,1,¨ ¨ ¨, n{2u. Moreover, for anytě0 we haveφptq “φp´tq. Hence,

1´p_py_pM L;σq ď2 rn_ÿ{4s r“1 ˆ n_{2 r ˙2 exp ˆ ´n k 4σ2 ˆ φ_p1q ´φ ˆ 1´4r n ˙˙˙ . Note thatφ_p1q ´φ_p1´x_{q ě}φ1_p₁_q_x_´_O_p_x2

q. Hence, there exists an absolute constant C ą0 such that φp1q ´φp1´4r{nq is at leastp1´ǫqφ1p1q ¨4r{n if

răCnand is at least Ωp1qotherwise. Sinceσă p1´ǫqσ˚ we have

´n k 4σ2 ˆ φp1q ´φ ˆ 1´4r n ˙˙ ď ´2r₁logn ´ǫ ď ´2p1`ǫqrlogn

(16)

ifr _ăCn, and ´4nσk2 `

φ_p1q ´φ`1´4r n

˘˘

“ ´Ωpnlogn_q otherwise. It implies that

1´p_py_pM L;σq ď 2

˜_Cn ÿ

r“1

expp´2ǫrlogn_{q `}nexpp´Ωpnlogn_qq

¸

À n´2ǫ`nexpp´Ωpnlognqq “onp1q.

Lower bound. Now we prove that p_py_pM L;σq “ onp1q if σ ą p1´ǫqσ˚. Let

A_{“ t}i_{P r}n_s:yi“ `1uand B“ rnszA. For eachaPA andb PB, let Eab be

the event thatf_pypabq_q_{is greater than}_f_p_y_q_where_ypabq_{is the}_˘_{1-vector obtained}

by flipping the signs ofya andyb. For anyHAĎAandHB ĎB, note that

1´p_py_pM L;σq ě Pr ˜ ď aPHA,bPHB Eab ¸ “ Pr ˆ max aPHA,bPHB ´ fpypabqq ´fpyq¯ą0 ˙

since any of the eventEabimplies thatypM L‰y. Recall that

fpypabqq ´fpyq “ ´nk ˆ φp1q ´φ ˆ 1´4 n ˙˙ `σAW,pypabqq“lk_´_y“lkE_. So, 1´p_p_pyM L;σqis at least Pr ¨ ˝max aPHA bPHB A W,pypabqq“lk´y“lk E ą n k σ ˆ φp1q ´φ ˆ 1´4 n ˙˙˛ ‚.

Fix HA Ď A and HB Ď B with |HA| “ |HB| “ h where h “ _logn2

n. Let pX,Y,Z_qbe the partition ofrnsk _{defined as}

X “ tαP rnsk_:_α´1

pHAYHBq “ Hu,

Y _{“ t}α_{P r}n_sk:|α´1_pHAYHBq| “1u,

Z “ tαP rnsk_:_|_α´1

pHAYHBq| ě2u.

LetWX,WYandWZ be thek-tensor supported onX ,Y,Z respectively. For eachaPHAand bPHB, let

Xab “ A WX,pypabqq“lký“lk E , Yab “ A WY,pypabqq“lký“lk E , Zab “ A WZ,pypabqq“lký“lk E .

(17)

(i) Xab“0 for anyaPHA andbPHB.

(ii) For fixedaPHA andbPHB, the variables Yab andZab are independent.

(iii) EachYab can be decomposed intoYa`Yb wheretYauaPHAY tYbubPHB is a

collection of i.i.d. Gaussian random variables. Proof of Claim. First note that `pypabq_q“lk_´_y“lk˘

α is non-zero if and only if |α´1

pa_q|and |α´1

pb_q| have the same parity. This implies (i) since when α_PX we have|α´1_p_a_{q| “ |}_α´1_p_b_{q| “}_{0. (ii) holds because}_Y_X_Z_“_0.

ForsPHAYHB, letYsbe the subset ofY such that

Ys“ tαPY:|α´1psq| “1u.

By definition,Ysare disjoint andY “ŤsPHAYHBYs. Hence,

Yab“ ÿ sPHAYHB A WYs,pyp abq_q“lk_´_y“lkE_. Moreover,@WYs,pyp

abq_q“lk_´_yl“kD_{is zero when}_s_{R t}_{a, b}_u_{. So,}

Yab“

ÿ

αPYaYYb

W_αppypabq_ql“k_´y“lk_qα.

Note that forαPYa

ppypabqq“lk_´_y“lk_qα_“ $ ’ & ’ % `1 if|α´1 pA_zH_{Aq| “}0 ´1 if|α´1_p_B_z_H_{Bq| “}₀ 0 otherwise. So, ÿ αPYa W_αppypabqq“lk_´_y“lk_qα_“ ÿ αPYa |α´1 pAzHAq|“0 Wα´ ÿ αPYa |α´1 pBzHBq|“0 Wα

which does not depend on the choice ofb. Let

Ya “ ÿ αPYa |α´1pAzHAq|“0 Wα´ ÿ αPYa |α´1pBzHBq|“0 Wα

and Yb respectively. Then Yab “ Ya `Yb and tYsusPHAYHB is a collection of

independent Gaussian random variables. Moreover, the variance ofYs is equal

to 2k`n

2 ´h

˘k´1

, independent of the choice ofs. By the claim, we have@W,_pypabq_q“lk_´_y“lkD_“_Y

a`Yb`Zab. Moreover, max aPHA bPHB pYa`Yb`Zabq ě max aPHA bPHB pYa`Ybq ´max aPHA bPHB p´Zabq “ max aPHA Ya`max bPHB Yb´max aPHA bPHB p´Zabq.

(18)

We need the following tail bound on the maximum of Gaussian random variables.

Lemma 7. Let G1, . . . , GN be (not necessarily independent) Gaussian random

variables with variance1. Letǫ_ą0 be a constant which does not depend onN. Then, Pr ˆ max i“1,¨¨¨,NGią p1`ǫq a 2 logN ˙ ďN´ǫ.

On the other hand,

Pr ˆ max i“1,¨¨¨,NGi ă p1´ǫq a 2 logN ˙ ďexpp´NΩpǫqq ifGi’s are independent.

By the Lemma, we have max aPHA Ya ě p1´0.01ǫq c 2 logh¨2k´n 2 ´h ¯k´1 max bPHA Yb ě p1´0.01ǫq c 2 logh¨2k´n 2 ´h ¯k´1 max aPHA bPHB Zab À a

logh¨max VarpZabq

with probability 1ó_np1q. Note that VarpZabq “ }pypabqq“lk_´_y“lk_}2 F ´ pVarpYAq `VarpYBqq “ 2nk ˆ φp1q ´φ ˆ 1´ 4 n ˙˙ ´4kń 2 ´h ¯k´1 ď 8nk´1 ˆ φ1p1q ´ p1óp1qqk 2k ˙ which isopnk´1_q_{. Hence,} max aPHA bPHB A W,_pypabq_q“lk_´_y“lkE _ě ₂_p₁_´₀_.₀₁_ǫ_´_o_p₁_qq c 2 logn_¨2kń 2 ´h ¯k´1 ě p1´0.01ǫóp1qq c knk´1_log_n 2k´5 .

On the other hand, sinceσą p1`ǫqσ˚ we have

nk σ ˆ φ_p1q ´φ ˆ 1´_n4 ˙˙ ă n k 1`ǫ¨ 4φ1p1q n ¨ d 2 logn nk´1_φ₁_p₁_q ă ₁1 `ǫ c nk´1_log_n 2k´5

(19)

which is less than max aPHA bPHB A W,_pypabq_q“lk_´y“lkE

with probability 1´o_np1q. Thus, 1´p_py_p_{M Lq ě}1´o_np1q.

B

Proof of Theorem 2

In this section, we prove Theorem 2 for generalk.

Letk be a positive integer withką2. LetyP t˘1un _with₁T_y_“_{0 and}_T

be ak-tensor defined as T _“y“lk_`_σ_W _where _W _{is a random}_k_{-tensor with}

i.i.d. standard Gaussian entries. Letf_px_{q “} @x“lk_,_TD _{and let}_f

p2qpxq be the

degree 2 truncation off_px_q, i.e.,

f_p2qpxq “ ÿ αPrnsk Tα ˜ 1 2k´1 ÿ 1_ďsătďk x_α_p_s_qx_α_p_t_q ¸ .

For eachtsătu Ď rks, let Qst _be_n_by_n_{matrix where}

Qst_ij “1₂ ¨ ˚ ˚ ˝ ÿ αPrnsk αpsq“i,αptq“j Tα` ÿ αPrnsk αpsq“j,αptq“i Tα ˛ ‹ ‹ ‚ Then, f_p2_qpxq “ 1 2k´1 @ Q, xxTD

whereQ“ř1ďsătďkQst. We consider the following SDP relaxation for maxxfp2_qpxq:

max xQ, X_y

@

X,11TD_“0, X ľ0_.

(4)

Theorem 8(Theorem 2, generalk). Let ǫą0be a constant not depending on

n. LetYp be a solution of (4) andp_pYp;σ_qbe the probability thatYp coincide with

yyT_{. Let}_σ p2q be σ_p2q“ c kpk´1q 22k´1 ¨ nk´21 ? 2 logn Ifσă p1´ǫqσ_p2q, then ppYp;σq “1´onp1q.

Proof. First note that Qst_“ nk´2

2k_´1yyT `σWst where

W_ijst_“ ÿ

αPrnsk

αpsq“i,αptq“j

(20)

So we have Q_“nk´2_¨kpk´1q 2k yy T `σWĎ whereĎW _“ř₁_ď_s_ă_t_ď_kWst_.

The dual program of (4) is

max trpDq

subject to D_`λ11T _´Qľ0,

D is diagonal.

(5)

By complementary slackness, yyT _{is the unique optimum solution of (4) if}

there exists a dual feasible solutionpD, λqsuch that

@

D_`λ11T_´Q, yyTD_“₀

and the second smallest eigenvalue ofD_`λ11T _´_Q_{is greater than zero. For}

brevity, letS“D`λ11T _´_Q_{. Since} _S _{is positive semidefinite, we must have}

Sy“0, that is, Dii “ n ÿ j“1 Qijyiyj for anyi_{P r}n_s.

For a symmetric matrixM, we define theLaplacian LpMqofM asLpMq “

diagpM1_{q ´}M (See [21]). Using this language, we can expressS as

S“diagpyq`LpdiagpyqQdiagpyqq `λyyT˘diagpyq.

Note that

diagpyqQdiagpyq “nk´2¨kpk₂´_k 1q11T _`σdiagpyqWĎdiagpyq.

Hence, the Laplacian of diagpy_qQdiagpy_qis equal to

nk´1¨kpk´1q 2k ˆ Inˆn´ 1 n11 T ˙ looooooooooooooooooooomooooooooooooooooooooon deterministic part

`σ_{looooooooooooomooooooooooooon}L`diagpyqĎWdiagpyq˘

noisy part

.

This matrix is positive semidefinite if

σ››L_pdiagpyqWĎdiagpyq››ďnk´1¨kpk₂´_k 1q.

Moreover, if the inequality is strict then the second smallest eigenvalue ofS is greater than zero.

By triangle inequality, we have

›

›L_pdiagpy_qWĎdiagpy_qq›› _ď max

iPrns n ÿ j“1 Ď Wijyiyj` ÿ 1ďsătďk }diagpy_qWstdiagpy_q} ď max iPrns n ÿ j“1 Ď Wijyiyj` ˆ k 2 ˙_? 2nk´1_.

(21)

The second inequality holds with high probability since Wst _{has independent}

Gaussian entries. Sinceřn_j_“1ĎWijyiyjis Gaussian and centered, with high

prob-ability we have that

max iPrns n ÿ j“1 Ď Wijyiyj ď p1`0.1ǫq a 2 logn_¨ ˜ max iPrnsVar ˜ _n ÿ j“1 Ď Wijyiyj ¸¸1{2 ď p1`0.1ǫ_qa2 logn_¨ dˆ k 2 ˙ nk´1_. Hence, › ›L_pdiagpyqWĎdiagpyqq››ď p1`0.1ǫ`op1qq d 2 ˆ k 2 ˙ nk´1_log_n

with high probability. So,Sľ0 as long as

σp1`0.1ǫ`op1qq d 2 ˆ k 2 ˙ nk´1_log_n_ă_nk´1 ¨kpk´1q 2k or simply p1`0.1ǫ`op1qqσă c kpk´1q 22k´1 ¨ nk´21 ? 2 logn “σp2q.

So, whenσă p1´ǫqσ_p2qwith high probabilityYp “yyT.

C

Pseudo-expectation and its moment matrix

C.1 Notation and preliminaries

Let V be the space of real-valued functions on the n-dimensional hypercube

t´1,`1un_{. For each} _S _{Ď r}_n_s_{, let} _x

S “śiPSxi. Note that txS : S Ď rnsu is

a basis ofV. Hence, any function f : t˘1un _Ñ_R _{can be written as a unique}

linear combination of multilinear monomials, say

f_px_{q “} ÿ

SĎrns

fSxS.

The degree of f is defined as the maximum size of S Ă rns such that fS is

nonzero.

Letℓ be a positive integer. Let us denote the collection of subsets ofrnsof size at mostℓby`r_ďn_ℓs˘, and the sizeˇˇˇ`r_ďnℓs

˘ˇˇ_ˇ_{of it by}`_n

ďℓ

˘

.

LetM be a square symmetric matrix of size`_ďn_ℓ˘. The rows and the columns ofM are indexed by the elements in`r_ďn_ℓs˘. To avoid confusion, we useM_rS, T_s

(22)

We say M is SoS-symmetric if M_rS, T_{s “} M_rS1_{, T}1_s _whenever _x

SxT “

xS1x_T1on the hypercube. SincexP t˘1un, it means thatS‘T “S1‘T1where

S_‘T denotes the symmetric difference ofS andT. Givenf _PV with degree at most 2ℓ, we sayM represents f if

fU “

ÿ

S,TP_prďnℓsq:

S‘T“U

MrS, Ts.

We useMf to denote the unique SoS-symmetric matrix representingf.

LetLbe a linear functional onV. By linearity,Lis determined bypLrxSs:

S _{Ď r}n_sq. Let XL be the SoS-symmetric matrix of size

`n

ďℓ

˘

with entries

X_rS, T_{s “}L_rxS‘Ts. We callXL themoment matrix ofL of degree2ℓ. By definition, we have Lrfs “ ÿ UP_prns ď2ℓq fULrxUs “ ÿ S,TPprďnℓsq MfrS, TsXLrS, Ts “ xXL, Mfy

for anyf of degree at most 2ℓ.

C.2 A quick introduction to pseudo-expectations

For our purpose, we only work on pseudo-expectations defined on the hypercube

t˘1un_{. See [36] for general definition.}

Letℓbe a positive integer and d_“2ℓ. Apseudo-expectation of degree don

t˘1un _{is a linear functional} _E_r _{on the space} _V _{of functions on the hypercube}

such that (i) Err 1s “1, (ii) Err q2

s ě0 for anyq_PV of degree at mostℓ_“d_{2. We say Er satisfies the system of equalities tpipxq “0um

i“1 if rErfs “0 for any

f PV of degree at most dwhich can be written as

f “p1q1`p2q2` ¨ ¨ ¨ `pmqm

for someq1, q2, . . . , qmPV.

We note the following facts:

• IfEris a pseudo-expectation of degreed, then it is also a pseudo-expectation of any degree smaller thand.

• IfEris a pseudo-expectation of degree 2n, thenErdefines a valid probability distribution supported onP:“ tx_{P t˘}1un_:_p_ip_x_{q “}_{0 for all}_i_{P r}_m_su_.

(23)

The second fact implies that maximizing f_px_q onP is equivalent to maxi-mizingErr f_sover all pseudo-expectations of degree 2nsatisfyingtp_ipx_{q “}0um

i“1.

Now, letdbe an even integer such that

d_ąmaxtdegpf_q,degpp1q,¨ ¨ ¨,degppmqu.

We relax the original problem to max Err f_s

subject to Er is degree-dpseudo-expectation ont˘1un

satisfyingtpipxq “0um i“1.

(SoS_d₎

We note that the value of (SoS_d_{) decreases as} _d _{grows, and it reaches the} optimum value maxxPPp0pxqof the original problem at d“2n.

C.3 Matrix point of view

Let Er be a pseudo-expectation of degree d “ 2ℓ for some positive integer ℓ. Suppose that rE satisfies the system tpipxq “ 0um

i“1. Let XEr be the moment

matrix ofEr of degree 2ℓ, hence the size ofX_r_Eis`_ďn_ℓ˘.

Conditions forEr being pseudo-expectation translate as the following condi-tions forX_r_E:

(i) X_H,H“1.

(ii) X is positive semidefinite. Moreover,rEsatisfiestp_ipx_{q “}0um

i“1 if and only if

(iii) Let U be the space of functions in V of degree at most dwhich can be written asřm_i_“1piqi for someqiPV. Then,

@

Mf, XEr D

“0 for anyf _PU. Hence, (SoS_d_{) can be written as the following semidefinite program}

max xMf, Xy

subject to X_H,H“1

xMq, Xy “0 for all qPB

X ľ0_,

(SDP_d₎

whereBis any finite subset ofU which spansU, for example, B_{“ t}xSpipxq:iP rms,|S| ďd´degppiqu.

D

Proof of Theorem 4

Lety P t˘1un _{such that} ₁T_y _“_0, _σ _ą _{0, and}_W _{P pR}n_qb4 _{be 4-tensor with}

independent, standard Gaussian entries. Given a tensor T _“ yb4

(24)

would like to recover the planted solutiony fromT. Letf_px_{q “}@T, xb4D_{. The}

maximum-likelihood estimator is given by the optimum solution of max

xPt˘1_un_:₁T

x“0fpxq.

Consider the SoS relaxation of degree 4 max Err fs

subject to Er is a pseudo-expectation of degree 4 ont˘1un

satisfying

n

ÿ

i“1

xi “0.

LetEUpty,´yuqbe the expectation operator of the uniform distribution onty,´yu,

i.e.,EUpty,´yuqrxSs “yS if|S|is even, andEU_pty,´yuqrxSs “0 if|S|is odd.

IfEU_pty,´yuqis the optimal solution of the relaxation, then we can recovery

from it up to a global sign flip. First we give an upper bound onσ to achieve it with high probability.

Theorem 9 (Part one of Theorem 4). Let T _“ yb4 _`_σ_W _and _f_p_x_{q “}

@

T, xb4D _{be as defined above. If} _σ _À _?n

logn, then the relaxation maxrEErr fs

over pseudo-expectationEr of degree 4 satisfyingřn_i_“₁xi“0is maximized when

r

E“EUpty,´yuq with probability1´onp1q.

We can reduce Theorem 9 to the matrix version of the problem via flattening [31]. Given a 4-tensor T, the canonical flattening of T is defined as n2

ˆn2

matrix T with entries T_pi,jq,pk,ℓq “ Tijkℓ. Then, T “ ryyrT `σW where ry is

the vectorization ofyyT, and W is the flattening of W. Note that this is an instance ofZ2-synchronization model with Gaussian noises. It follows that with

high probability the exact recovery is possible whenσÀ _?n

logn (see Proposition

2.3 in [21]).

We complement the result by providing a lower bound onσwhich is off by polylog factor.

Theorem 10 (Part two of Theorem 4). Let c _ą 0 be a small constant. If

σěnplognq1{2`c_{, then there exists a pseudo-expectation} _E_r _{of degree 4 on the}

hypercube t˘1un _satisfying řn

i“1xi “0 such that rErfs ąfpyq with probability

1´onp1q.

D.1 Proof of Theorem 10

Letg_px_qbe the noise part off_px_q, i.e., g_px_{q “}@W, xb4D_{. Let} _r

E be a pseudo-expectation of degree 4 on the hypercube which satisfies the equalityřn_i_“1xi “

(25)

Lemma 11. Let g_px_q be the polynomial as defined above. Then, there exists a pseudo-expectation of degree 4 on the hypercube satisfying1Tx_“0 such that

r

Ergs Á n

3

plognq1{2`op1q.

We prove Theorem 10 using the lemma.

Proof of Theorem 10. Note that gpyq “@W, yb4D_{is a Gaussian random}

vari-able with variance n4_{. So,} _g_p_y_{q ď} _n2_log_n _{with probability 1}_´_o_p₁_q_{. Let} _E_r

be the pseudo-expectation satisfying the conditions in the lemma. Then, with probability 1óp1qwe have r Erfs ´fpyq “ ´pn4´rErpyTxq4 sq `σpErr gs ´gpyqq ě ń4`σ ˆ n3 logn`n 2_log n ˙ ě ń4` p1óp1qq σn 3 plognq1{2òp1q.

Sinceσ_ąn_plogn_q1_{2_`c _{for some fixed constant}_c_ą_{0, we have}_Er_r _f_{s ´}_f_p_y_{q ą}₀

as desired.

In the remainder of the section, we prove Lemma 11.

D.1.1 Outline

We note that our method shares a similar idea which appears in [33] and [34]. We are given a random polynomialgpxq “@W, xb4D_where_W_has

indepen-dent standard Gaussian entries. We would like to constructEr“ErWwhich has large correlation withW. If we simply let

r Erxi1xi2xi3xi4s “ 1 24 ÿ πPS4 Wiπ_p1_q,iπ_p2_q,iπ_p3_q,iπ_p4_q

forti1ăi2ăi3ăi4u Ď rnsandErr xTsbe zero if|T| ď3, then

r Ergs “ ₂₄1 ÿ 1ďi1ăi2ăi3ăi4ďn ˜ ÿ πPS4 Wiπ_p1_q,iπ_p2_q,iπ_p3_q,iπ_p4_q ¸2

so the expectation of Err g_s over W would be equal to `n₄˘ « n4

24. However,

in this case Er does not satisfies the equality 1T_x _“ _{0 nor the conditions for}

pseudo-expectations.

To overcome this, we first project the Er constructed above to the space of linear functionals which satisfy the equality constraints (x2

i “1 and1Tx“0).

Then, we take a convex combination of the projection and a pseudo-expectation to control the spectrum of the functional. The details are following:

(26)

(1) (Removing degeneracy) We establish the one-to-one correspondence be-tween the collection of linear functionals on n-variate, even multilinear polynomials of degree at most 4 and the collection of linear functionals onpn_´1q-variate multilinear polynomials of degree at most 4 by posing

xn “1. This correspondence preserves positivity.

(2) (Description of equality constraints) Letψbe a linear functional onpn_´1q -variate multilinear polynomials of degree at most 4. We may thinkψ as a vector inRp

n_´1

ď4q. Then, the condition thatψ satisfies řn´1

i“1 xi`1“0

can be written asAψ_“0 for some matrixA. (3) (Projection) LetwPRp

n´1

ď4qbe the coefficient vector of_gp_xq. Let Π be the

projection matrix to the spacetx:Ax“0u. In other words, Π“Id´ATpAATq:_A

whereIdis the identity matrix of size`n_ď´₄1˘andp¨q: _{denotes the}

pseudo-inverse. Lete be the first column of Π andψ1 “ _eΠTw

w. Then pψ1qH “1

andAψ1“0 by definition.

(4) (Convex combination) Letψ0“ _eTe

e. We note thatψ0 corresponds to the

expectation operator of uniform distribution ontx_{P t˘}1un_:₁T_x_“₀_u_.

We will constructψby

ψ_{“ p}1´ǫ_qψ0`ǫψ1

with an appropriate constantǫ. Equivalently,

ψ_“ψ0` ǫ eT_w¨ ˆ Π´ee T eT_e ˙ w.

(5) (Spectrum analysis) We bound the spectrum of the functional´Π´eeT

eT_e

¯

w

to decide the size ofǫforψbeing positive semidefinite.

D.1.2 Removing degeneracy

Recall that

gpxq “ ÿ i,j,k,lPrns

Wijklxixjxkxℓ.

Observe thatg is even, i.e.,g_px_{q “}g_p´x_qfor anyx_{P t˘}1un_{. To maximize such}

an even function, we claim that we may only consider the pseudo-expectations such that whose odd moments are zero.

Proposition 12. Let Er be a pseudo-expectation of degree 4 on hypercube sat-isfying řn_i_“₁xi “0. Let p be a degree 4 multilinear polynomial which is even.

Then, there exists a pseudo-expectation Er1 of degree 4 such that Err ps “Er1rps

(27)

Proof. LetrEbe a pseudo-expectation of degree 4 on hypercube satisfyingřn_i_“1xi“

0. Let us define a linear functionalEr0 on the space of multilinear polynomials

of degree at most 4 so thatEr0rxSs “ p´1q|S|Err xSsfor any SP

`_r_n_s

ď4

˘

. Then, for any multilinear polynomialqof degree at most 2, we have

r

E0rqpxq2s “Err qp´xq2s ě0.

Also,rE0 satisfiesEr0r1s “1 and

r E0 «˜ _n ÿ i“1 xi ¸ qpxq ff “ Ér «˜ _n ÿ i“1 xi ¸ qp´xq ff “0

for anyqof degree 3. Thus,Er0 is a valid pseudo-expectation of degree 4

satis-fyingřn_i_“₁xi “0.

LetEr1“ 1

2pE`r Er0q. This is again a valid pseudo-expectation, since the space

of pseudo-expectations is convex. We haverE1rppxqs “Err ppxqs “Er0rppxqssince

pis even, andEr1rxSs “ p1` p´1q|S|qErr xSs “0 for anyS of odd size.

LetE be the space of all pseudo-expectations of degree 4 onn-dimensional hypercube with zero odd moments. LetE1be the space of all pseudo-expectations of degree 4 onpn´1q-dimensional hypercube. We claim that there is a bijection between two spaces.

Proposition 13. Let ψ_PE. Let us define a linear functional ψ1 _{on the space}

of pn_´1q-variate multilinear polynomials of degree at most 4 so that for any

T Ď rn´1s with|T| ď4

ψ1_rxTs “

#

ψrxTYtnus if |T| is odd

ψrxTs otherwise.

Then,ψÞÑψ1 is a bijective mapping from E toE1.

Proof. We say linear functionalψon the space of polynomials of degree at most 2ℓispositive semidefinite ifψrq2_{s ě}_{0 for any}_q _{of degree}_ℓ_.

Note that the mapping ψ1 ÞÑ ψ where ψrxSs “ ψ1rxSztnus for any S Ď rns

of even size is the inverse of ψÞÑψ1. Hence, it is sufficient to prove thatψ is positive semidefinite if and only ifψ1 is positive semidefinite.

(ñ) Let q be an n-variate polynomial of degree at most 2. Let q0 and q1 be

polynomials inx1,¨ ¨ ¨, xn´1 such that

q_px1,¨ ¨ ¨, xnq “q0px1,¨ ¨ ¨ , xn´1q `xnq1px1,¨ ¨ ¨ , xn´1q.

We getψ1rq2_{s “}_ψ1_rp_q₀_`_x

nq1q2s “ψ1rpq02`q12q `2xnq0q1s. For i“1,2, letqi0

andqi1 be the even part and the odd part of qi, respectively. Then we have

ψ1rq2s “ ψ1rpq002 `q 2 01`q 2 10`q 2 11q `2xnpq00q11`q01q10qs “ ψrpq002 `q 2 01`q 2 10`q 2 11q `2pq00q11`q01q10qs “ ψ_rpq00`q11q2` pq10`q01q2s ě0.

(28)

The first equality follows from that ψ1_r_q_{s “}_{0 for odd} _q_{. Hence,}_ψ1 _{is positive}

semidefinite.

(ð) Letqbe an pn_´1q-variate polynomial of degree at most 2. Letq0 andq1

be the even part and the odd part ofq, respectively. Then,

ψ_rq2_{s “}ψ_rpq02`q 2

1q `2q0q1s.

Note thatq2

0`q12is even andq0q1 is odd. So,

ψrq2s “ψ1rpq20`q 2

1q `2xnq0q1s “ψ1rpq0`xnq1q2s ě0.

Henceψis positive semidefinite.

In addition to the proposition, we note that ψ satisfies řn_i_“1xi “ 0 if and

only ifψ1_{satisfies 1}_`řn´1

i“1 xi“0. Hence, maximizingErr gsoverrEPEsatisfying

řn i“1xi“0 is equivalent to max ψ1_PE1ψ 1_r_g1_s _{subject to} _ψ1 _{satisfies 1}_` n_ÿ´1 i“1 xi “0, whereg1_p_x₁_,_{¨ ¨ ¨} _{, x} n´1q “gpx1,¨ ¨ ¨ , xn´1,1q.

D.1.3 Matrix expression of linear constraints

LetF be the set of linear functional on the space ofpn´1q-variate multilinear polynomials of degree at most 4. We often regard a functionalψPFas a`n´1

ď4

˘

dimensional vector with entriesψS “ψrxSswhereSis a subset ofrn´1sof size

at most 4. The spaceE1 of pseudo-expectations of degree 4 (onpn´1q-variate multilinear polynomials) is a convex subset ofF.

Observe thatψ_PF satisfies 1`řin“´11xi“0 if and only if

ψ «˜ 1` n_ÿ´1 i“1 xi ¸ xS ﬀ “0 for anySĎ rn´1swith|S| ď3.

Lets,t andube integers such that 0ďs, t_ď4 and 0ďu_ďminps, t_q. Let

Mu

s,t be the matrix of size

`n´1 ď4 ˘ such that pM_s,tqS,Tu _“ # 1 if|S_{| “}s,|T_{| “}t, and|S_XT_{| “}u 0 otherwise forS, T P`rn´1s ď4 ˘

. Then, the condition that ψPF satisfying 1`řin“´11xi “0

can be written asAψ“0 where

A_“M0 0,0`M00,1` 3 ÿ s“1 pM_s,ss´_´1₁_`Ms s,s`Ms,ss `1q.

(29)

D.1.4 Algebra generated by Mu s,t

Letmbe a positive integer greater than 8. For nonnegative integerss, t, u, let

Mu s,t be the `_m ď4 ˘ ˆ`m ď4 ˘ matrix with pMs,tqS,Tu “ # 1 if|S| “s,|T| “t, and|SXT| “u 0 otherwise,

forS, T_{Ď r}m_swith|S_|,_|T_{| ď}4. LetAbe the algebra of matrices

ÿ

0_ďs,tď4

s_ÿ^t u“0

xu_s,tM_s,tu

with complex numbers xu

s,t. This algebra A is a C˚-algebra: it is a complex

algebra which is closed under taking complex conjugate. A is a subalgebra of the Terwilliger algebra of the Hamming cubeHpm,2q[41], [42].

Note that A has dimension 55 which is the number of triples ps, t, uqwith 0ďs, tď4 and 0ďuďs^t. Define βu_s,t,r:“ s_ÿ^t p“0 p´1qp´t ˆ p u ˙ˆ m´2r p_´r ˙ˆ m´r´p s_´p ˙ˆ m´r´p t_´p ˙

for 0ďs, tď4 and 0ďr, uďs^t. The following theorem says that matrices in the algebraAcan be written in a block-diagonal form with small sized blocks.

Theorem 14([42]). There exists an orthogonal`_ďm₄˘ˆ`_ďm4

˘

matrixU such that forM PAwith M _“ 4 ÿ s,t“0 s_ÿ^t u“0 xu_s,tM_s,tu,

the matrixUTM U is equal to the matrix ¨ ˚ ˚ ˚ ˚ ˝ C0 0 0 0 0 0 C1 0 0 0 0 0 C2 0 0 0 0 0 C3 0 0 0 0 0 C4 ˛ ‹ ‹ ‹ ‹ ‚ where each Cr is a block diagonal matrix with

`m r ˘ ´`rm´1 ˘ repeated, identical blocks of order5´r: Cr“ ¨ ˚ ˚ ˚ ˝ Br 0 ¨ ¨ ¨ 0 0 Br ¨ ¨ ¨ 0 .. . ... . .. ... 0 0 ¨ ¨ ¨ Br ˛ ‹ ‹ ‹ ‚,

(30)

and Br“ ˜ ÿ u ˆ m´2r s_´r ˙´1{2ˆ m´2r t_´r ˙´1{2 β_s,t,ru xu_s,t ¸4 s,t“r .

For brevity, let us denote this block-diagonalization of M by the tuple of matricespB0, B1, B2, B3, B4qwithp5´rq ˆ p5´rqmatrixBr’s.

D.1.5 Projection

Recall thatg1px1,¨ ¨ ¨, xn´1qis equal to

ÿ

i,j,k,ℓPrns

Wijkℓxti,j,k,ℓuztnu.

LetcPRp

n_´1

ď4qbe the coefficient vector ofg1. Since entries ofWare independent

Standard Gaussians,c is a Gaussian vector with a diagonal covariance matrix Σ“ErccT_s_where ΣS,S“ $ ’ & ’ % n if|S| “0 12n´16 if|S| “1 or|S| “2 24 if|S_{| “}3 or|S_{| “}4.

Letw_“Σ´1_{2_c_{. By definition,}_w_{is a random vector with i.i.d. standard normal}

entries.

Let Π“Id_´AT_p_AAT_q`_A _where_p_AAT_q` _{is the Moore-Penrose}

pseudoin-verse of AAT _and _Id _{is the identity matrix of order} `n´1

ď4

˘

. Then, Π is the orthogonal projection matrix onto the nullspace ofA. SinceA,AT _and _Id_are

all in the algebraA, the projection matrix Π is also inA. Letebe the first column of Π and

ψ0:“

e

eT_e and ψ1:“

Πw eT_w.

We have Aψ0 “ Aψ1 “ 0 by definition of Π, and pψ0q_H “ pψ1q_H “ 1 since

pΠwqH“eTw.

Let ǫ be a real number with 0 ă ǫ ă 1 and ψ “ p1´ǫqψ0`ǫψ1. This

functional still satisfiesAψ “0 and ψ_H “1, regardless of the choice of ǫ. We would like to chooseǫsuch thatψis positive semidefinite with high probability.

D.1.6 Spectrum of ψ

Consider the functionalψ0“ _eTe

e. It has entries pψ0qS “ $ ’ & ’ % 1 ifS“ H ´ 1 n´1 if|S| “1 or 2 3 pn´1qpn´3q if|S| “3 or 4,

(31)

forS _{Ď r}n_´1sof size at most 4. We note that this functional corresponds to the degree 4 or less moments of the uniform distribution on the set of vectors

x_{P t˘}1un´1_satisfyingřn´1

i“1 xi`1“0.

Proposition 15. Let ψ be a vector in Rp

n_´1

ď4q such that Aψ “ 0 and p be an pn_´1q-variate multilinear polynomial of degree at most 2. Suppose that

ψ0rp2s “0. Then, ψrp2s “0.

Proof. Let U _{“ t}x_{P t˘}1un´1 _:řn´1

i“1 xi`1 “0u. Note thatψ0 is the

expec-tation functional of the uniform distribution on U as we seen above. Hence,

ψ0rp2s “0 if and only ifppxq2“0 for anyxPU.

On the other hand, the functionalψ is a linear combination of functionals

tp _ÞÑ p_px_q : x _P U_u since Aψ _“ 0. Hence, if ψ0rp2s “ 0 then ψrp2s “ 0 as

p_px_q2

“0 for anyx_PU.

Recall that ψ _{“ p}1 ´ǫ_qψ0`ǫψ1 where ψ0 “ _eTe

e and ψ1 “ Πw eT w. Let ψ11 “eTw¨ pψ1´ψ0q. Then, ψ11 “ Πw´ eTw eT_ee “ ˆ Π´ee T eT_e ˙ w andψ_“ψ0`_eTǫ

wψ11. We note that Aψ11“0 sinceψ11 is a linear combination

ofψ0 andψ1.

LetXψ0 andXψ11 be the moment matrix ofψ0 andψ

1

1respectively. LetXψ

be the moment matrix ofψ. Clearly,

Xψ“Xψ0`

ǫ eT_wXψ11.

Moreover, for any pP Rp

n_´1 ď2q satisfying_X ψ0p“ 0, we haveXψ11p “0 by the proposition. Hence,Xψľ0 if ǫ |eT_w_|}Xψ11} ďλmin,‰0pXψ0q

whereλmin,‰0 denotes the minimum nonzero eigenvalue.

We note that eTw and }Xψ1

1} are independent random variables. It

fol-lows from that w is a gaussian vector with i.i.d. standard entries, and thate

and ´Π´eeT

eT_e

¯

are orthogonal. Hence, we can safely bound eT_w _and _}_X ψ1

1}

separately.

To bound}Xψ1

1} we need the following theorem.

Theorem 16 (Matrix Gaussian ([43])). Let tAku be a finite sequence of fixed, symmetric matrices with dimensiond, and lettξkube a finite sequence of

inde-pendent standard normal random variables. Then, for anytě0,

Pr«››››_›ÿ k ξkAk › › › › ›ět ﬀ ďd_¨e´t2{2σ2 where σ2:“ › › › › › ÿ k A2_k › › › › ›.

Community detection in hypergraphs, spiked tensor models, and Sum-of-Squares

arXiv:1705.02973v1 [cs.DS] 8 May 2017