• No results found

Community detection in hypergraphs, spiked tensor models, and Sum-of-Squares

N/A
N/A
Protected

Academic year: 2021

Share "Community detection in hypergraphs, spiked tensor models, and Sum-of-Squares"

Copied!
34
0
0

Loading.... (view fulltext now)

Full text

(1)

arXiv:1705.02973v1 [cs.DS] 8 May 2017

Community Detection in Hypergraphs, Spiked

Tensor Models, and Sum-of-Squares

Chiheon Kim

[email protected]

Afonso S. Bandeira

[email protected]

Michel X. Goemans

[email protected]

May 9, 2017

Abstract

We study the problem of community detection in hypergraphs under a stochastic block model. Similarly to how the stochastic block model in graphs suggests studying spiked random matrices, our model motivates investigating statistical and computational limits of exact recovery in a certain spiked tensor model. In contrast with the matrix case, the spiked model naturally arising from community detection in hypergraphs is dif-ferent from the one arising in the so-called tensor Principal Component Analysis model. We investigate the effectiveness of algorithms in the Sum-of-Squares hierarchy on these models. Interestingly, our results suggest that these two apparently similar models exhibit significantly different computational to statistical gaps.

1

Introduction

Community detection is a central problem in many fields of science and engineer-ing. It has received much attention for its various applications to sociological behaviours [1, 2, 3], protein-to-protein interactions [4, 5], DNA 3D conforma-tion [6], recommendaconforma-tion systems [7, 8, 9], and more. In many networks with community structure one may expect that the groups of nodes within the same community are more densely connected. The stochastic block model (SBM)

MIT, Department of Mathematics, 77 Massachusetts Ave., Cambridge, MA 02139Department of Mathematics and Center for Data Science, Courant Institute of

Mathe-matical Sciences, New York University, New York, NY 10012. Part of this work was done while A. S. Bandeira was with the Mathematics Department at MIT and supported by NSF Grant DMS-1317308.

MIT, Department of Mathematics, Room 2-474, 77 Massachusetts Ave., Cambridge, MA

(2)

[10] is arguably the simplest model that attempts to capture such community structure.

Under the SBM, each pair of nodes is connected randomly and independently with a probability decided by the community membership of the nodes. The SBM has received much attention for its sharp phase transition behaviours [11], [12], [13], computational versus information-theoretic gaps [14], [15], and as a test bed for many algorithmic approaches including semidefinite programming [13], [16], spectral methods [17], [18] and belief-propagation [19]. See [20] for a good survey on the subject.

Let us illustrate a version of the SBM with two equal-sized communities. Let

yP t˘1unbe a vector indicating community membership of an even numbernof

nodes and assume that the size of two communities are equal, i.e.,1Ty0. Letp

andqbe inr0,1sindicating the density of edges within and across communities, respectively. Under the model, a random graphGis generated by connecting nodesiandj independently with probabilitypifyi“yj or with probabilityq

ifyi‰yj. Letypbe any estimator ofygiven a sampleG. We say thatypexactly

recoversy ifypis equal toy or´y with probability 1´onp1q. In the asymptotic regime ofpalogn{nand qblogn{nwhere aąb, it has been shown that the maximum likelihood estimator (MLE) recoversy when?a´?bą?2 and the MLE fails when?a´?bă?2, showing a sharp phase transition behaviour [13]. Moreover, it was subsequently shown in [16] and [21] that the standard semidefinite programming (SDP) relaxation of the MLE achieves the optimal recovery threshold.

A fruitful way of studying phase transitions and effectiveness of different algorithms for the stochastic block model is to consider a Gaussian analogue of the model [22]. Let G be a graph generated by the SBM and let AG be the

adjacency matrix ofG. We have

EAG“ p`q 2 11 T `p´q 2 yy T ´pI.

It is then useful to thinkAG as a perturbation of the signalEAGunder centered

noise.

This motivates a model with Gaussian noise. Given a vectoryP t˘1unwith

1Ty0, a random matrixT is generated as

Tij“yiyj`σWij

whereWij“Wji„Np0,1qfor alli, jP rns. This model is often referred toZ2

-synchronization [21, 22]. It is very closely related to the spiked Wigner model

(or spiked random matrix models in general) which has a rich mathematical theory [23, 24, 25].

In many applications, however, nodes exhibit complex interactions that may not be well captured by pairwise interactions [26, 27]. One way to increase the descriptive ability of these models is to considerk-wise interactions, giving rise to generative models on random hypergraphs and tensors. Specifically, a hypergraphic version of the SBM was considered in [28, 29], and a version of the censored block model in [30].

(3)

For the sake of exposition we restrict our attention to k4 in the sequel. Most of our results however easily generalize to anyk and will be presented in a subsequent publication. In the remaining of this section, we will introduce a hypergraphic version of the SBM and its Gaussian analogue.

1.1

Hypergraphic SBM and its Gaussian analogue

Here we describe a hypergraphic version of the stochastic block model. Lety

be a vector in t˘1un such that 1Ty 0. Let pand q be in r0,1s. Under the

model, a random 4-uniform hypergraphH onnnodes is generated so that each

ti, j, k, lu Ď rnsis included in EpHqindependently with probability

#

p ifyi“yj “yk“yl(in the same community)

q otherwise (across communities).

LetAH, the adjacency tensor ofH, be the 4-tensor given by

pAHqijkl #

1 ifti, j, k, lu PEpHq

0 otherwise. Lety“l4 be the 4-tensor defined as

yijkl“l4 “ # 1 ifyi“yj “yk“yl 0 otherwise, SinceyP t˘1un, we have y“l4 ˆ1 `y 2 ˙b4 ` ˆ1 ´y 2 ˙b4 .

Note that for any quadruple of distinct nodespi, j, k, lqwe have

pEAHqijkl “ pq1b4` pp´qqy“l4qijkl.

In the asymptotic regime of palogn{`n´31˘ andq blogn{`n´31˘, there is a sharp information-theoretic threshold for exact recovery. Results regarding this hypergraphic stochastic block model will appear in subsequent publication. Here we will focus on the Gaussian counterpart.

Analogously to the relationship between the SBM and the spiked Wigner model, the hypergraphic version of the SBM suggests the following spiked tensor model:

Ty“l4`σW,

whereWis a random 4-tensor with i.i.d. standard Gaussian entries. We note that here the noise tensor W is not symmetric, unlike in the spiked Wigner model. This is not crucial: assuming W to be a symmetric tensor will only scaleσby?4!.

(4)

2

The Gaussian planted bisection model

Given a sampleT, our goal is to recover the hidden spikeyup to a global sign flip. Letpybe an estimator ofycomputed fromT. Letppypqbe the probability thatpysuccessfully recoversy. SinceWis Gaussian,ppyp;σqis maximized when

p y argmin xPt˘1un:1T x“0} T´x“l4 }2 F,

or equivalentlypy“argmaxx@x“l4,TD, the maximum-likelihood estimator (MLE)

ofy.

Letfpxq “@x“l4,TDand

p

yM L be the MLE ofy. By definition,

ppypM L;σq “Pr ¨ ˝fpyq ą max xPt˘1uzty,´yu 1Tx0 fpxq ˛ ‚.

Theorem 1. Let ǫą0 be a constant not depending on n. Then, as n grows,

ppypM L;σq converges to 1 if σ ă p1 ´ǫqσ˚ and ppypM L;σq converges to 0 if

σą p1`ǫqσ˚, where σ˚“ c 1 8¨ n3{2 ? logn.

Here we present a sketch of the proof while deferring the details to the appendix. Observe thatypM Lis not equal toy if there existsxP t˘1un distinct

from y such that 1Tx 0 andfpxq ěfpyq. For each fixed x, the difference

fpxq ´fpyqis equal to

@

y“l4, x“l4´y“l4D`σ@W, x“l4´y“l4D

which is a Gaussian random variable with mean@y“l4, xl“4´y“l4Dand variance

σ2}x“l4´y“l4}2 F. By definition we have @ x“l4, y“l4D“ 1281 `p1T1`xTyq4 ` p1T1´xTyq4˘ . Letφptq “ 1 128pp1`tq 4 ` p1´tq4 q. Then,@x“l4, y“l4D“n4φpxTy{nqso @ y“l4, x“l4´y“l4D “ ´n4`φp1q ´φpxTy{nq˘, }x“l4 ´y“l4 }F “ n2b2φ p1q ´2φpxTy{nq. Hence, Prpfpxq ´fpyq ě0qis equal to Pr G„Np0,1q ˆ Gě n 2 σ?2 ¨ b φp1q ´φpxTy{nq ˙ .

This probability is maximized when xand y differs by only two indices, that is, xTy n´4. Indeed one can formally prove that the probability that y

(5)

maximizesfpxqis dominated by the probability thatfpyq ąfpxqfor allxwith

xTy n´4. By union bound and standard Gaussian tail bounds, the latter

probability is at most ´n 2 ¯2 exp ˆ ´n 4 4σ2 ¨ pφp1q ´φp1´4{nqq ˙ which is approximately exp ˆ 2 logn´n 3¨φ1p1q σ2 ˙ “exp ˆ 2 logn´ n 3 4σ2 ˙ , and it isonp1qifσ2ă n3 8 logn as in Theorem 1.

3

Efficient recovery

Before we address the Gaussian model, let us describe an algorithm for hyper-graph partitioning. LetH be a 4-uniform hypergraph on the vertex setV “ rns. LetAH be the adjacency tensor ofH. Then the problem can be formulated as

max@AH, x“l4D subject toxP t˘1un,1Tx0.

One approach for finding a partition is to consider the multigraph realization of

H, which appears in [28, 29] in different terminology. Let Gbe the multigraph onV “ rnssuch that the multiplicity of edgeti, juis the number of hyperedges

ePEpHqcontainingti, ju. One may visualize it as substituting each hyperedge by a 4-clique. Now, one may attempt to solve the reduced problem

max@AG, xxT

D

subject to xP t˘1un,1Tx0

which is unfortunately NP-hard in general. Instead, we consider the semidefinite programming (SDP) relaxation

max xAG, Xy

subject to Xii“1 for alliP rns,

@

X,11TD0, Xľ0.

WhenAG is generated under the graph stochastic block model, this algorithm

recovers the hidden partition down to optimum parameters. On the other hand, it achieves recovery to nearly-optimum parameters when Gis the multigraph corresponding to the hypergraph generated by a hypergraphic stochastic block model: there is a constant multiplicative gap between the guarantee and the information-theoretic limit. This will be treated in a future publication.

Here we had two stages in the algorithm: (1) “truncating” the hypergraph down to a multigraph, and (2) relaxing the optimization problem on the trun-cated objective function.

(6)

Now let us return to the Gaussian model T y“l4

`σW. Let fpxq “

@

x“l4,TD. Our goal is to find the maximizer off

pxq. Note that fpxq “ ÿ i1,¨¨¨,i4Prns Ti1i2i3i4¨ 1 16 ˜ 4 ź s“1 p1`xisq ` 4 ź s“1 p1´xisq ¸

so fpxqis a polynomial of degree 4 in variables x1, . . . , xn. Letfp2qpxqbe the

degree 2 truncation offpxq, i.e.,

fp2qpxq “ 1 8 ÿ i1,¨¨¨,i4Prns Ti1i2i3i4 ˜ ÿ 1ďsătď4 xisxit ¸ .

Here we have ignored the constant term of fpxq since it does not affect the maximizer. For eachtsătu Ď t1,2,3,4u, letQst benbynmatrix where

Qstij “ ÿ pi1,¨¨¨,i4qPrns 4 is“i,it“j Ti1i2i3i4. Then, fp2qpxq “ 1 8 @ Q, xxTD

whereQ“ř1ďsătď4Qst. This Qis analogous to the adjacency matrix of the

multigraph constructed above. It is now natural to consider the following SDP: max xQ, Xy

subject to Xii“1 for alliP rns,

@

X,11TD0, X ľ0.

(1)

Theorem 2. Let ǫą0 be a constant not depending on n. Let Yp be a solution of (1) andppYp;σqbe the probability thatYp coincide withyyT. Ifσă p1´ǫqσp˚2q

where σp˚2q“ c 3 32¨ n3{2 ? logn “ c 3 4¨σ ˚, thenppYp;σq “1´onp1q.

We present a sketch of the proof where details are deferred to the appendix. We note that a similar idea was used in [16, 21] for the graph stochastic block model.

We construct a dual solution of (1) which is feasible with high probability, and certifies that yyT is the unique optimum solution for the primal (1). By complementary slackness, such dual solution must be of the formS:“DQ1´Q1

whereQ1 “diagpyqQdiagpyq andDQ1 is diagpQ11q. It remains to show thatS

(7)

To show that S is positive semidefinite, we claim that the second smallest eigenvalue ofES is Θpn3

qand the operator norm }S´ES} isOpσn3{2?logn

q

with high probability. The first part is just an easy calculation, and the second part is application of a nonasymptotic bound on Laplacian random matrices [21]. Hence,Sis positive semidefinite with high probability ifσn3{2?logn

Àn3,

matching with the order ofσ˚ p2q.

4

Standard Spiked Tensor Model

Montanari and Richard proposed a statistical model for tensor Principal Com-ponent Analysis [31]. In the model we observe a random tensorTvb4

`σW

where v P Rn is a vector with }v} “ ?n (spike), W is a random 4-tensor with i.i.d. standard Gaussian entries, and σě0 is the noise parameter. They showed a nearly-tight information-theoretic threshold for approximate recovery: whenσ"n3{2 then the recovery is information-theoretically impossible, while

ifσ!n3{2 then the MLE gives a vectorv1PRn with|vTv1| “ p1´onp1qqnwith

high probability. Subsequently, sharp phase transitions for weak recovery, and strong and weak detection were shown in [32].

Those information-theoretic thresholds are achieved by the MLE for which no efficient algorithm is known. Montanari and Richard considered a simple spectral algorithm based on tensor unfolding, which is efficient in both theory and practice. They show that the algorithm finds a solutionv1 with |vTv1| “

p1´onp1qqnwith high probability as long asσOpnq[31]. This is somewhat believed to be unimprovable using semidefinite programming [33, 34], or using approximate message passing algorithms [35].

For clear comparison to the Gaussian planted bisection model, let us consider when spike is in the hypercubet˘1un. LetyP t˘1unandσě0. Given a tensor

Tyb4`σW

where W is a random 4-tensor with independent, standard Gaussian entries, we would like to recover y exactly. The MLE is given by the maximizer of

gpxq:“@xb4,TDover all vectorsxP t˘1un.

Theorem 3. Let ǫ ą 0 be any constant which does not depend on n. Let

λ˚ “ ?2¨n3{2

?log

n . Whenσą p1`ǫqλ˚, exact recovery is information-theoretically

impossible (i.e. the MLE fails with1´onp1qprobability), while if σă p1´ǫqλ˚

then the MLE recoversy with 1´onp1qprobability.

The proof is just a slight modification of the proof of Theorem 1 which ap-pears in the appendix. Note that bothλ˚andσ˚are in the order ofn3{2{?logn.

The standard spiked tensor model and the Gaussian planted bisection model ex-hibit similar behaviour when unbounded computational resources are given.

(8)

4.1

Sum-of-Squares based algorithms

Here we briefly discuss Sum-of-Squares based relaxation algorithms. Given a polynomial p P Rrx1,¨ ¨ ¨, xns, consider the problem of finding the maximum

ofppxqover xPRn satisfying polynomial equalitiesq

1pxq “0,¨ ¨ ¨, qmpxq “0.

Most hard combinatorial optimization problems can be reduced into this form, including max-cut,k-colorability, and general constraint satisfaction problems. TheSum-of-Squares hierarchy (SoS) is a systematic way to relax a polynomial optimization problem to a sequence of increasingly strong convex programs, each leading to a larger semidefinite program. See [36] for a good exposition of the topic.

There are many different ways to formulate the SoS hierarchy [37, 38, 39, 40]. Here we choose to follow the description based onpseudo-expectation functionals

[36].

For illustration, let us recall the definition of gpxq: given a tensor T

yb4`σW, we define gpxq “@xb4,TD. Heregpxqis a polynomial of degree 4,

and the corresponding maximum-likelihood estimator is the maximizer max gpxq

subject to x2i 1 for alliP rns,

n

ÿ

i“1

xi“0, xPRn.

(2)

Let µ be a probability distribution over the set txP t˘1un : 1Tx 0u. We

can rewrite (2) as maxµEx„µgpxq over such distributions. A linear functional

r

E on Rrxs is called pseudoexpectation of degree 2ℓ if it satisfies Er1 “ 1 and

r

Eqpxq2

ě0 for anyqPRrxsof degree at mostℓ. We note that any expectation

Eµ is a pseudoexpectation, but the converse is not true. So (2) can be relaxed to

max Ergpxq

subject to Er is a pseudoexpectation of degree 2ℓ,

r

Eis zero on I

(3)

whereIĎRrxsis the ideal generated byřni1xi andtx2i ´1uiPrns.

The space of pseudoexpectations is a convex set which can be described as an affine section of the semidefinite cone. Asℓincreases, this space gets smaller and in this particular case, it coincides with the set of true expectations when

ℓ“n.

4.2

SoS on spiked tensor models

We would like to apply the SoS algorithm to the spiked tensor model. In par-ticular, let us consider the degree 4 SoS relaxation of the maximum-likelihood problem. We note that for the spike tensor model with the spherical prior, it is known that neither algorithms using tensor unfolding [31], the degree 4 SoS

(9)

relaxation of the MLE [33], nor approximate message passing [35] can achieve the statistical threshold, therefore the statistical-computational gap is believed to be present. Moreover, higher degree SoS relaxations were considered in [34]: for any smallǫě0, degree at leastnΩpǫqis required in order to achieve recovery

whenσ«n1`ǫ.

We show an analogous gap of the degree 4 SoS relaxation fort˘1unand1Tx

prior.

Theorem 4. Let Tyb4`σWbe as defined above. Let gpxq “@xb4,TD. If

σÀ n

logΘp1q

n, then the degree 4 SoS relaxation of maxxgpxq gives the solution

y, i.e., Ergpxq is maximized when Er is the expectation operator of the uniform distribution onty,´yu.

On the other hand, if σ ÁnlogΘp1qn then there exists a pseudoexpectation

r

Eof degree 4 on the hypercube t˘1un satisfying1Tx0 such that

gpyq ămax

r

E

r

Egpxq,

soy is not recovered via the degree 4 SoS relaxation.

The proof of Theorem 4 is very similar to one that appears in [33, 34]. In the proof it is crucial to observe that

n3

logΘp1qnÀ Er:degree 4max pseudo-exp.

r

E@W, xb4DÀn3logΘp1qn.

The upper bound can be shown via Cauchy-Schwarz inequality for pseudoex-pectations and the lower bound, considerably more involved, can be shown by constructing a pseudoexpectationrEwhich is highly correlated to the entries of

W. We refer the readers to the appendix for details.

4.3

Comparison with the planted bisection model

Here we summarize the statistical-computational thresholds of two models, the planted bisection model and the spiked tensor model.

• For the planted bisection model, there is a constant c ą 0 not depend-ing on n such that for any ǫ ą 0 if σ ă pc´ǫq n3{2

?

logn then recovery is

information-theoretically possible, and if σ ą pc`ǫq n3{2

?log

n then the

re-covery is impossible via any algorithm. Moreover, there is an efficient algorithm and a constantc1 ăc such that the algorithm recoversy with high probability whenσăc1?n3{2

logn.

• For the spiked tensor model, there is a constant C ą 0 not depending on n such that for any ǫ ą 0 if σ ă pC ´ǫq?n3{2

logn then the recovery

is information-theoretically possible, and if σ ą pC`ǫq n3{2

?log

(10)

recovery is impossible. In contrast to the planted bisection model, all effi-cient algorithms known so far only successfully recoveryin the asymptotic regime ofσÀn.

We note that the efficient, nearly-optimal recovery is achieved for the planted bisection model by “forgetting” higher moments of the data. On the other hand, such approach is unsuitable for the spiked tensor model since the signal

yb4 does not seem to be approximable by a non-trivial low-degree polynomial.

This might shed light into an interesting phenomenon in average complexity, as in this case it seems to be crucial that second moments (or pairwise relations) carry information.

All information-theoretic phase transition exhibited in the paper readily gen-eralize tok-tensor models and this will be discussed in a future publication. In future work we will also investigate the performance of higher degree SoS algo-rithms for the planted bisection model.

References

[1] A. Goldenberg, A. X. Zheng, S. E. Fienberg, E. M. Airoldiet al., “A sur-vey of statistical network models,”Foundations and TrendsR in Machine

Learning, vol. 2, no. 2, pp. 129–233, 2010.

[2] S. Fortunato, “Community detection in graphs,”Physics reports, vol. 486, no. 3, pp. 75–174, 2010.

[3] M. E. Newman, D. J. Watts, and S. H. Strogatz, “Random graph models of social networks,”Proceedings of the National Academy of Sciences, vol. 99, no. suppl 1, pp. 2566–2572, 2002.

[4] E. M. Marcotte, M. Pellegrini, H.-L. Ng, D. W. Rice, T. O. Yeates, and D. Eisenberg, “Detecting protein function and protein-protein interactions from genome sequences,”Science, vol. 285, no. 5428, pp. 751–753, 1999. [5] J. Chen and B. Yuan, “Detecting functional modules in the yeast protein–

protein interaction network,” Bioinformatics, vol. 22, no. 18, pp. 2283– 2290, 2006.

[6] I. Cabreros, E. Abbe, and A. Tsirigos, “Detecting community structures in hi-c genomic data,” in Information Science and Systems (CISS), 2016 Annual Conference on. IEEE, 2016, pp. 584–589.

[7] G. Linden, B. Smith, and J. York, “Amazon. com recommendations: Item-to-item collaborative filtering,”IEEE Internet computing, vol. 7, no. 1, pp. 76–80, 2003.

[8] S. Sahebi and W. W. Cohen, “Community-based recommendations: a so-lution to the cold start problem,” in Workshop on recommender systems and the social web, RSWEB, 2011.

(11)

[9] R. Wu, J. Xu, R. Srikant, L. Massouli´e, M. Lelarge, and B. Hajek, “Clus-tering and inference from pairwise comparisons,” in ACM SIGMETRICS Performance Evaluation Review, vol. 43, no. 1. ACM, 2015, pp. 449–450. [10] P. W. Holland, K. B. Laskey, and S. Leinhardt, “Stochastic blockmodels:

First steps,”Social networks, vol. 5, no. 2, pp. 109–137, 1983.

[11] E. Mossel, J. Neeman, and A. Sly, “A proof of the block model threshold conjecture,”arXiv preprint arXiv:1311.4115, 2013.

[12] E. Abbe and C. Sandon, “Community detection in general stochastic block models: Fundamental limits and efficient algorithms for recovery,” in Foun-dations of Computer Science (FOCS), 2015 IEEE 56th Annual Symposium on. IEEE, 2015, pp. 670–688.

[13] E. Abbe, A. S. Bandeira, and G. Hall, “Exact recovery in the stochastic block model,” IEEE Transactions on Information Theory, vol. 62, no. 1, pp. 471–487, 2016.

[14] Y. Chen and J. Xu, “Statistical-computational tradeoffs in planted prob-lems and submatrix localization with a growing number of clusters and submatrices,” Journal of Machine Learning Research, vol. 17, no. 27, pp. 1–57, 2016.

[15] E. Abbe and C. Sandon, “Recovering communities in the general stochas-tic block model without knowing the parameters,” in Advances in neural information processing systems, 2015, pp. 676–684.

[16] B. Hajek, Y. Wu, and J. Xu, “Achieving exact cluster recovery threshold via semidefinite programming,”IEEE Transactions on Information Theory, vol. 62, no. 5, pp. 2788–2797, 2016.

[17] L. Massouli´e, “Community detection thresholds and the weak ramanujan property,” in Proceedings of the 46th Annual ACM Symposium on Theory of Computing. ACM, 2014, pp. 694–703.

[18] V. A. N. Vu, “A simple SVD algorithm for finding hidden partitions,” pp. 1–12, 2014. [Online]. Available: http://arxiv.org/abs/1404.3918v1

[19] E. Abbe and C. Sandon, “Detection in the stochastic block model with multiple clusters: proof of the achievability conjectures, acyclic bp, and the information-computation gap,”arXiv preprint arXiv:1512.09080, 2015. [20] E. Abbe, “Community detection and the stochastic block model: recent

developments,” 2016.

[21] A. S. Bandeira, “Random Laplacian Matrices and Convex Relaxations,” pp. 1–35, 2016.

(12)

[22] A. Javanmard, A. Montanari, and F. Ricci-Tersenghi, “Phase transitions in semidefinite relaxations,”Proceedings of the National Academy of Sciences, vol. 113, no. 16, pp. E2218–E2223, 2016.

[23] D. F´eral and S. P´ech´e, “The largest eigenvalue of rank one deformation of large wigner matrices,”Communications in mathematical physics, vol. 272, no. 1, pp. 185–228, 2007.

[24] A. Montanari and S. Sen, “Semidefinite programs on sparse random graphs and their application to community detection,” arXiv preprint arXiv:1504.05910, 2015.

[25] A. Perry, A. S. Wein, A. S. Bandeira, and A. Moitra, “Optimality and sub-optimality of pca for spiked random matrices and synchronization,”arXiv preprint arXiv:1609.05573, 2016.

[26] S. Agarwal, J. Lim, L. Zelnik-Manor, P. Perona, D. Kriegman, and S. Be-longie, “Beyond pairwise clustering,” in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 2. IEEE, 2005, pp. 838–845.

[27] D. Zhou, J. Huang, and B. Sch¨olkopf, “Learning with hypergraphs: Cluster-ing, classification, and embeddCluster-ing,” inNIPS, vol. 19, 2006, pp. 1633–1640. [28] D. Ghoshdastidar and A. Dukkipati, “Consistency of spectral partitioning of uniform hypergraphs under planted partition model,” in Advances in Neural Information Processing Systems, 2014, pp. 397–405.

[29] L. Florescu and W. Perkins, “Spectral thresholds in the bipartite stochastic block model,” arXiv preprint arXiv:1506.06737, 2015.

[30] K. Ahn, K. Lee, and C. Suh, “Community recovery in hypergraphs,” in

Allerton Conference on Communication, Control and Computing. UIUC, 2016.

[31] A. Montanari and E. Richard, “A statistical model for tensor PCA,” p. 30, nov 2014. [Online]. Available: http://arxiv.org/abs/1411.1076

[32] A. Perry, A. S. Wein, and A. S. Bandeira, “Statistical limits of spiked tensor models,” p. 39, dec 2016. [Online]. Available: http://arxiv.org/abs/1612.07728

[33] S. Hopkins, “Tensor principal component analysis via sum-of-squares proofs,”arXiv:1507.03269, vol. 40, pp. 1–51, 2015.

[34] V. Bhattiprolu, V. Guruswami, and E. Lee, “Certifying Random Polynomials over the Unit Sphere via Sum of Squares Hierarchy,” 2016. [Online]. Available: http://arxiv.org/abs/1605.00903

(13)

[35] T. Lesieur, L. Miolane, M. Lelarge, F. Krzakala, and L. Zdeborov´a, “Sta-tistical and computational phase transitions in spiked tensor estimation,”

arXiv preprint arXiv:1701.08010, 2017.

[36] B. Barak and D. Steurer, “Sum-of-squares proofs and the quest toward optimal algorithms,” inProceedings of the International Congress of Math-ematicians, 2014.

[37] N. Z. Shor, “An approach to obtaining global extremums in polynomial mathematical programming problems,”Cybernetics, vol. 23, no. 5, pp. 695– 700, 1987.

[38] P. A. Parrilo, “Structured semidefinite programs and semialgebraic geome-try methods in robustness and optimization,” Ph.D. dissertation, California Institute of Technology, 2000.

[39] Y. Nesterov, “Squared functional systems and optimization problems,” in

High performance optimization. Springer, 2000, pp. 405–440.

[40] J. B. Lasserre, “Global optimization with polynomials and the problem of moments,” SIAM Journal on Optimization, vol. 11, no. 3, pp. 796–817, 2001.

[41] P. Terwilliger, “The subconstituent algebra of an association scheme,(part i),”Journal of Algebraic Combinatorics, vol. 1, no. 4, pp. 363–388, 1992. [42] A. Schrijver, “New code upper bounds from the terwilliger algebra and

semidefinite programming,” IEEE Transactions on Information Theory, vol. 51, no. 8, pp. 2859–2866, 2005.

[43] J. A. Tropp, “User-friendly tail bounds for sums of random matrices,”

Foundations of computational mathematics, vol. 12, no. 4, pp. 389–434, 2012.

(14)

A

Proof of Theorem 1

In this section, we prove Theorem 1 for generalk.

Theorem 5(Theorem 1, generalk). Letkbe a positive integer withkě2. Let

yP t˘1un with 1Ty0 andTbe a k-tensor defined asTy“lk`σWwhere

Wis a randomk-tensor with i.i.d. standard Gaussian entries. LetypM Lbe the

maximum-likelihood estimator ofy, i.e.,

p

yM L“ argmax xPt˘1un:1Tx

@

T, x“lkD.

For any positiveǫ,

(i) ypM L is equal to y with probability 1´onp1qif σă p1´ǫqσ˚, and

(ii) ypM L is not equal toy with probability1´onp1q ifσą p1`ǫqσ˚

where σ˚“ c k 2k ¨ nk´21 ? 2 logn.

First we prove the following lemma.

Lemma 6. Let φ be a function defined as φptq “ 1 22k´1 ` p1´tqk` p1`tq. Then, @ x“lk, y“lkDnkφ ˆ xTy n ˙

for anyx, yP t˘1un such that 1Tx1Ty0.

Proof. Note thatx“lk 1 2k ` p1`xqbk` p1´xqbk˘. Hence, @ x“lk, y“lkD 1 22k ÿ s,tPt˘1u x1`sx,1`tyyk. Since1Tx“1Ty“0, we have @ x“lk, y“lkD 1 22k ` 2p1T1`xTyqk`2p1T1´xTyqk˘ “ n k 22k´1 ˜ˆ 1`x Ty n ˙k ` ˆ 1´x Ty n ˙k¸ “ nkφ ˆ xTy n ˙ as desired.

Proof of Theorem 5. Let

(15)

By definition,ypM Lis not equal toy if there existsxP t˘1undistinct fromy or ´y such that1Tx0 andfpxqis greater than or equal tofpyq. For each fixed

xP t˘1un with1Tx0, note that

fpxq ´fpyq “@y“lk, x“lk´y“lkD`σ@W, x“lk´y“lkD

is a Gaussian random variable with mean´nkpφp1q ´φpxTy{nqand variance

σ2 }x“lk´y“lk}2 F “2σ2nkpφp1q ´φpxTy{nqq. Hence, Prpfpxq ´fpyq ě0qis equal to Pr G„Np0,1q ˜ Gě n k{2 σ?2¨ d φp1q ´φ ˆ xTy n ˙¸ . Letσ˚ be σ˚ “aφ1p1q ¨ n k´1 2 ? 2 logn. Sinceφ1p1q “ k

2k, it matches with the definition in the statement of the Theorem.

Upper bound. Let us prove that ppypM L;σq “ 1´onp1qif σ ă p1´ǫqσ˚. By

union bound, we have 1´ppypM L;σq ď ÿ xPt˘1un zty,´yu 1Tx0 Prpfpxq ´fpyq ě0q “ ÿ xPt˘1un zty,´yu 1Tx0 Pr G„Np0,1q ˜ Gě n k{2 σ?2 ¨ d φp1q ´φ ˆ xTy n ˙¸ ď ÿ xPt˘1un zty,´yu 1Tx“0 exp ˆ ´n k 4σ2 ˆ φp1q ´φ ˆ xTy n ˙˙˙ .

The last inequality follows from a standard Gaussian tail bound PrGpGątq ď

expp´t2

{2q.

A simple counting argument shows that the number of x P t˘1un with

1Tx0 andxTy n´4ris exactly `n{2

r

˘2

forrP t0,1,¨ ¨ ¨, n{2u. Moreover, for anytě0 we haveφptq “φp´tq. Hence,

1´ppypM L;σq ď2 rnÿ{4s r“1 ˆ n{2 r ˙2 exp ˆ ´n k 4σ2 ˆ φp1q ´φ ˆ 1´4r n ˙˙˙ . Note thatφp1q ´φp1´xq ěφ1p1qx´Opx2

q. Hence, there exists an absolute constant C ą0 such that φp1q ´φp1´4r{nq is at leastp1´ǫqφ1p1q ¨4r{n if

răCnand is at least Ωp1qotherwise. Sinceσă p1´ǫqσ˚ we have

´n k 4σ2 ˆ φp1q ´φ ˆ 1´4r n ˙˙ ď ´2r1logn ´ǫ ď ´2p1`ǫqrlogn

(16)

ifr ăCn, and ´4nσk2 `

φp1q ´φ`1´4r n

˘˘

“ ´Ωpnlognq otherwise. It implies that

1´ppypM L;σq ď 2

˜Cn ÿ

r“1

expp´2ǫrlognq `nexpp´Ωpnlognqq

¸

À n´2ǫ`nexpp´Ωpnlognqq “onp1q.

Lower bound. Now we prove that ppypM L;σq “ onp1q if σ ą p1´ǫqσ˚. Let

A“ tiP rns:yi“ `1uand B“ rnszA. For eachaPA andb PB, let Eab be

the event thatfpypabqqis greater thanfpyqwhereypabqis the˘1-vector obtained

by flipping the signs ofya andyb. For anyHAĎAandHB ĎB, note that

1´ppypM L;σq ě Pr ˜ ď aPHA,bPHB Eab ¸ “ Pr ˆ max aPHA,bPHB ´ fpypabqq ´fpyq¯ą0 ˙

since any of the eventEabimplies thatypM L‰y. Recall that

fpypabqq ´fpyq “ ´nk ˆ φp1q ´φ ˆ 1´4 n ˙˙ `σAW,pypabqq“lk´y“lkE. So, 1´pppyM L;σqis at least Pr ¨ ˝max aPHA bPHB A W,pypabqq“lk´y“lk E ą n k σ ˆ φp1q ´φ ˆ 1´4 n ˙˙˛ ‚.

Fix HA Ď A and HB Ď B with |HA| “ |HB| “ h where h “ logn2

n. Let pX,Y,Zqbe the partition ofrnsk defined as

X “ tαP rnsk:α´1

pHAYHBq “ Hu,

Y “ tαP rnsk:|α´1pHAYHBq| “1u,

Z “ tαP rnsk:|α´1

pHAYHBq| ě2u.

LetWX,WYandWZ be thek-tensor supported onX ,Y,Z respectively. For eachaPHAand bPHB, let

Xab “ A WX,pypabqq“lk´y“lk E , Yab “ A WY,pypabqq“lk´y“lk E , Zab “ A WZ,pypabqq“lk´y“lk E .

(17)

(i) Xab“0 for anyaPHA andbPHB.

(ii) For fixedaPHA andbPHB, the variables Yab andZab are independent.

(iii) EachYab can be decomposed intoYa`Yb wheretYauaPHAY tYbubPHB is a

collection of i.i.d. Gaussian random variables. Proof of Claim. First note that `pypabqq“lk´y“lk˘

α is non-zero if and only if |α´1

paq|and |α´1

pbq| have the same parity. This implies (i) since when αPX we have|α´1paq| “ |α´1pbq| “0. (ii) holds becauseYXZ0.

ForsPHAYHB, letYsbe the subset ofY such that

Ys“ tαPY:|α´1psq| “1u.

By definition,Ysare disjoint andY “ŤsPHAYHBYs. Hence,

Yab“ ÿ sPHAYHB A WYs,pyp abqq“lk´y“lkE. Moreover,@WYs,pyp

abqq“lk´yl“kDis zero whensR ta, bu. So,

Yab“

ÿ

αPYaYYb

Wαppypabqql“k´y“lk.

Note that forαPYa

ppypabqq“lk´y“lk $ ’ & ’ % `1 if|α´1 pAzHAq| “0 ´1 if|α´1pBzHBq| “0 0 otherwise. So, ÿ αPYa Wαppypabqq“lk´y“lk ÿ αPYa |α´1 pAzHAq|“0 Wα´ ÿ αPYa |α´1 pBzHBq|“0 Wα

which does not depend on the choice ofb. Let

Ya “ ÿ αPYa |α´1pAzHAq|“0 Wα´ ÿ αPYa |α´1pBzHBq|“0 Wα

and Yb respectively. Then Yab “ Ya `Yb and tYsusPHAYHB is a collection of

independent Gaussian random variables. Moreover, the variance ofYs is equal

to 2k`n

2 ´h

˘k´1

, independent of the choice ofs. By the claim, we have@W,pypabqq“lk´y“lkDY

a`Yb`Zab. Moreover, max aPHA bPHB pYa`Yb`Zabq ě max aPHA bPHB pYa`Ybq ´max aPHA bPHB p´Zabq “ max aPHA Ya`max bPHB Yb´max aPHA bPHB p´Zabq.

(18)

We need the following tail bound on the maximum of Gaussian random variables.

Lemma 7. Let G1, . . . , GN be (not necessarily independent) Gaussian random

variables with variance1. Letǫą0 be a constant which does not depend onN. Then, Pr ˆ max i“1,¨¨¨,NGią p1`ǫq a 2 logN ˙ ďN´ǫ.

On the other hand,

Pr ˆ max i“1,¨¨¨,NGi ă p1´ǫq a 2 logN ˙ ďexpp´NΩpǫqq ifGi’s are independent.

By the Lemma, we have max aPHA Ya ě p1´0.01ǫq c 2 logh¨2k´n 2 ´h ¯k´1 max bPHA Yb ě p1´0.01ǫq c 2 logh¨2k´n 2 ´h ¯k´1 max aPHA bPHB Zab À a

logh¨max VarpZabq

with probability 1´onp1q. Note that VarpZabq “ }pypabqq“lk´y“lk}2 F ´ pVarpYAq `VarpYBqq “ 2nk ˆ φp1q ´φ ˆ 1´ 4 n ˙˙ ´4k´n 2 ´h ¯k´1 ď 8nk´1 ˆ φ1p1q ´ p1´op1qqk 2k ˙ which isopnk´1q. Hence, max aPHA bPHB A W,pypabqq“lk´y“lkE ě 2p1´0.01ǫ´op1qq c 2 logn¨2k´n 2 ´h ¯k´1 ě p1´0.01ǫ´op1qq c knk´1logn 2k´5 .

On the other hand, sinceσą p1`ǫqσ˚ we have

nk σ ˆ φp1q ´φ ˆ 1´n4 ˙˙ ă n k 1`ǫ¨ 4φ1p1q n ¨ d 2 logn nk´1φ1p1q ă 11 `ǫ c nk´1logn 2k´5

(19)

which is less than max aPHA bPHB A W,pypabqq“lk´y“lkE

with probability 1´onp1q. Thus, 1´ppypM Lq ě1´onp1q.

B

Proof of Theorem 2

In this section, we prove Theorem 2 for generalk.

Letk be a positive integer withką2. LetyP t˘1un with1Ty0 andT

be ak-tensor defined as T y“lk`σW where W is a randomk-tensor with

i.i.d. standard Gaussian entries. Letfpxq “ @x“lk,TD and letf

p2qpxq be the

degree 2 truncation offpxq, i.e.,

fp2qpxq “ ÿ αPrnsk Tα ˜ 1 2k´1 ÿ 1ďsătďk xαpsqxαptq ¸ .

For eachtsătu Ď rks, let Qst benbynmatrix where

Qstij “12 ¨ ˚ ˚ ˝ ÿ αPrnsk αpsq“i,αptq“j Tα` ÿ αPrnsk αpsq“j,αptq“i Tα ˛ ‹ ‹ ‚ Then, fp2qpxq “ 1 2k´1 @ Q, xxTD

whereQ“ř1ďsătďkQst. We consider the following SDP relaxation for maxxfp2qpxq:

max xQ, Xy

subject to Xii“1 for alliP rns,

@

X,11TD0, X ľ0.

(4)

Theorem 8(Theorem 2, generalk). Let ǫą0be a constant not depending on

n. LetYp be a solution of (4) andppYp;σqbe the probability thatYp coincide with

yyT. Letσ p2q be σp2q“ c kpk´1q 22k´1 ¨ nk´21 ? 2 logn Ifσă p1´ǫqσp2q, then ppYp;σq “1´onp1q.

Proof. First note that Qst nk´2

2k´1yyT `σWst where

Wijst ÿ

αPrnsk

αpsq“i,αptq“j

(20)

So we have Qnk´2¨kpk´1q 2k yy T `σWĎ whereĎW ř1ďsătďkWst.

The dual program of (4) is

max trpDq

subject to D`λ11T ´Qľ0,

D is diagonal.

(5)

By complementary slackness, yyT is the unique optimum solution of (4) if

there exists a dual feasible solutionpD, λqsuch that

@

D`λ11T´Q, yyTD0

and the second smallest eigenvalue ofD`λ11T ´Qis greater than zero. For

brevity, letS“D`λ11T ´Q. Since S is positive semidefinite, we must have

Sy“0, that is, Dii “ n ÿ j“1 Qijyiyj for anyiP rns.

For a symmetric matrixM, we define theLaplacian LpMqofM asLpMq “

diagpM1q ´M (See [21]). Using this language, we can expressS as

S“diagpyq`LpdiagpyqQdiagpyqq `λyyT˘diagpyq.

Note that

diagpyqQdiagpyq “nk´2¨kpk2´k 1q11T `σdiagpyqWĎdiagpyq.

Hence, the Laplacian of diagpyqQdiagpyqis equal to

nk´1¨kpk´1q 2k ˆ Inˆn´ 1 n11 T ˙ looooooooooooooooooooomooooooooooooooooooooon deterministic part

looooooooooooomooooooooooooonL`diagpyqĎWdiagpyq˘

noisy part

.

This matrix is positive semidefinite if

σ››LpdiagpyqWĎdiagpyq››ďnk´1¨kpk2´k 1q.

Moreover, if the inequality is strict then the second smallest eigenvalue ofS is greater than zero.

By triangle inequality, we have

›LpdiagpyqWĎdiagpyqq›› ď max

iPrns n ÿ j“1 Ď Wijyiyj` ÿ 1ďsătďk }diagpyqWstdiagpyq} ď max iPrns n ÿ j“1 Ď Wijyiyj` ˆ k 2 ˙? 2nk´1.

(21)

The second inequality holds with high probability since Wst has independent

Gaussian entries. Sinceřnj1ĎWijyiyjis Gaussian and centered, with high

prob-ability we have that

max iPrns n ÿ j“1 Ď Wijyiyj ď p1`0.1ǫq a 2 logn¨ ˜ max iPrnsVar ˜ n ÿ j“1 Ď Wijyiyj ¸¸1{2 ď p1`0.1ǫqa2 logn¨ dˆ k 2 ˙ nk´1. Hence, › ›LpdiagpyqWĎdiagpyqq››ď p1`0.1ǫ`op1qq d 2 ˆ k 2 ˙ nk´1logn

with high probability. So,Sľ0 as long as

σp1`0.1ǫ`op1qq d 2 ˆ k 2 ˙ nk´1lognănk´1 ¨kpk´1q 2k or simply p1`0.1ǫ`op1qqσă c kpk´1q 22k´1 ¨ nk´21 ? 2 logn “σp2q.

So, whenσă p1´ǫqσp2qwith high probabilityYp “yyT.

C

Pseudo-expectation and its moment matrix

C.1

Notation and preliminaries

Let V be the space of real-valued functions on the n-dimensional hypercube

t´1,`1un. For each S Ď rns, let x

S “śiPSxi. Note that txS : S Ď rnsu is

a basis ofV. Hence, any function f : t˘1un ÑR can be written as a unique

linear combination of multilinear monomials, say

fpxq “ ÿ

SĎrns

fSxS.

The degree of f is defined as the maximum size of S Ă rns such that fS is

nonzero.

Letℓ be a positive integer. Let us denote the collection of subsets ofrnsof size at mostℓby`rďns˘, and the sizeˇˇˇ`rďnℓs

˘ˇˇˇof it by`n

ďℓ

˘

.

LetM be a square symmetric matrix of size`ďn˘. The rows and the columns ofM are indexed by the elements in`rďns˘. To avoid confusion, we useMrS, Ts

(22)

We say M is SoS-symmetric if MrS, Ts “ MrS1, T1s whenever x

SxT “

xS1xT1on the hypercube. SincexP t˘1un, it means thatS‘T “S1‘T1where

ST denotes the symmetric difference ofS andT. Givenf PV with degree at most 2ℓ, we sayM represents f if

fU “

ÿ

S,TPprďnℓsq:

S‘T“U

MrS, Ts.

We useMf to denote the unique SoS-symmetric matrix representingf.

LetLbe a linear functional onV. By linearity,Lis determined bypLrxSs:

S Ď rnsq. Let XL be the SoS-symmetric matrix of size

`n

ďℓ

˘

with entries

XrS, Ts “LrxS‘Ts. We callXL themoment matrix ofL of degree2ℓ. By definition, we have Lrfs “ ÿ UPprns ď2ℓq fULrxUs “ ÿ S,TPprďnℓsq MfrS, TsXLrS, Ts “ xXL, Mfy

for anyf of degree at most 2ℓ.

C.2

A quick introduction to pseudo-expectations

For our purpose, we only work on pseudo-expectations defined on the hypercube

t˘1un. See [36] for general definition.

Letℓbe a positive integer and d2ℓ. Apseudo-expectation of degree don

t˘1un is a linear functional Er on the space V of functions on the hypercube

such that (i) Err 1s “1, (ii) Err q2

s ě0 for anyqPV of degree at mostℓd{2. We say Er satisfies the system of equalities tpipxq “0um

i“1 if rErfs “0 for any

f PV of degree at most dwhich can be written as

f “p1q1`p2q2` ¨ ¨ ¨ `pmqm

for someq1, q2, . . . , qmPV.

We note the following facts:

• IfEris a pseudo-expectation of degreed, then it is also a pseudo-expectation of any degree smaller thand.

• IfEris a pseudo-expectation of degree 2n, thenErdefines a valid probability distribution supported onP:“ txP t˘1un:pipxq “0 for alliP rmsu.

(23)

The second fact implies that maximizing fpxq onP is equivalent to maxi-mizingErr fsover all pseudo-expectations of degree 2nsatisfyingtpipxq “0um

i“1.

Now, letdbe an even integer such that

dąmaxtdegpfq,degpp1q,¨ ¨ ¨,degppmqu.

We relax the original problem to max Err fs

subject to Er is degree-dpseudo-expectation ont˘1un

satisfyingtpipxq “0um i“1.

(SoSd)

We note that the value of (SoSd) decreases as d grows, and it reaches the optimum value maxxPPp0pxqof the original problem at d“2n.

C.3

Matrix point of view

Let Er be a pseudo-expectation of degree d “ 2ℓ for some positive integer ℓ. Suppose that rE satisfies the system tpipxq “ 0um

i“1. Let XEr be the moment

matrix ofEr of degree 2ℓ, hence the size ofXrEis`ďn˘.

Conditions forEr being pseudo-expectation translate as the following condi-tions forXrE:

(i) XH,H“1.

(ii) X is positive semidefinite. Moreover,rEsatisfiestpipxq “0um

i“1 if and only if

(iii) Let U be the space of functions in V of degree at most dwhich can be written asřmi1piqi for someqiPV. Then,

@

Mf, XEr D

“0 for anyf PU. Hence, (SoSd) can be written as the following semidefinite program

max xMf, Xy

subject to XH,H“1

xMq, Xy “0 for all qPB

X ľ0,

(SDPd)

whereBis any finite subset ofU which spansU, for example, B“ txSpipxq:iP rms,|S| ďd´degppiqu.

D

Proof of Theorem 4

Lety P t˘1un such that 1Ty 0, σ ą 0, andW P pRnqb4 be 4-tensor with

independent, standard Gaussian entries. Given a tensor T yb4

(24)

would like to recover the planted solutiony fromT. Letfpxq “@T, xb4D. The

maximum-likelihood estimator is given by the optimum solution of max

xPt˘1un:1T

x“0fpxq.

Consider the SoS relaxation of degree 4 max Err fs

subject to Er is a pseudo-expectation of degree 4 ont˘1un

satisfying

n

ÿ

i“1

xi “0.

LetEUpty,´yuqbe the expectation operator of the uniform distribution onty,´yu,

i.e.,EUpty,´yuqrxSs “yS if|S|is even, andEUpty,´yuqrxSs “0 if|S|is odd.

IfEUpty,´yuqis the optimal solution of the relaxation, then we can recovery

from it up to a global sign flip. First we give an upper bound onσ to achieve it with high probability.

Theorem 9 (Part one of Theorem 4). Let T yb4 `σW and fpxq “

@

T, xb4D be as defined above. If σ À ?n

logn, then the relaxation maxrEErr fs

over pseudo-expectationEr of degree 4 satisfyingřni1xi“0is maximized when

r

E“EUpty,´yuq with probability1´onp1q.

We can reduce Theorem 9 to the matrix version of the problem via flattening [31]. Given a 4-tensor T, the canonical flattening of T is defined as n2

ˆn2

matrix T with entries Tpi,jq,pk,ℓq “ Tijkℓ. Then, T “ ryyrT `σW where ry is

the vectorization ofyyT, and W is the flattening of W. Note that this is an instance ofZ2-synchronization model with Gaussian noises. It follows that with

high probability the exact recovery is possible whenσÀ ?n

logn (see Proposition

2.3 in [21]).

We complement the result by providing a lower bound onσwhich is off by polylog factor.

Theorem 10 (Part two of Theorem 4). Let c ą 0 be a small constant. If

σěnplognq1{2`c, then there exists a pseudo-expectation Er of degree 4 on the

hypercube t˘1un satisfying řn

i“1xi “0 such that rErfs ąfpyq with probability

1´onp1q.

D.1

Proof of Theorem 10

Letgpxqbe the noise part offpxq, i.e., gpxq “@W, xb4D. Let r

E be a pseudo-expectation of degree 4 on the hypercube which satisfies the equalityřni1xi “

(25)

Lemma 11. Let gpxq be the polynomial as defined above. Then, there exists a pseudo-expectation of degree 4 on the hypercube satisfying1Tx0 such that

r

Ergs Á n

3

plognq1{2`op1q.

We prove Theorem 10 using the lemma.

Proof of Theorem 10. Note that gpyq “@W, yb4Dis a Gaussian random

vari-able with variance n4. So, gpyq ď n2logn with probability 1´op1q. Let Er

be the pseudo-expectation satisfying the conditions in the lemma. Then, with probability 1´op1qwe have r Erfs ´fpyq “ ´pn4´rErpyTxq4 sq `σpErr gs ´gpyqq ě ´n4`σ ˆ n3 logn`n 2log n ˙ ě ´n4` p1´op1qq σn 3 plognq1{2`op1q.

Sinceσąnplognq1{2`c for some fixed constantcą0, we haveErr fs ´fpyq ą0

as desired.

In the remainder of the section, we prove Lemma 11.

D.1.1 Outline

We note that our method shares a similar idea which appears in [33] and [34]. We are given a random polynomialgpxq “@W, xb4DwhereWhas

indepen-dent standard Gaussian entries. We would like to constructEr“ErWwhich has large correlation withW. If we simply let

r Erxi1xi2xi3xi4s “ 1 24 ÿ πPS4 Wiπp1q,iπp2q,iπp3q,iπp4q

forti1ăi2ăi3ăi4u Ď rnsandErr xTsbe zero if|T| ď3, then

r Ergs “ 241 ÿ 1ďi1ăi2ăi3ăi4ďn ˜ ÿ πPS4 Wiπp1q,iπp2q,iπp3q,iπp4q ¸2

so the expectation of Err gs over W would be equal to `n4˘ « n4

24. However,

in this case Er does not satisfies the equality 1Tx 0 nor the conditions for

pseudo-expectations.

To overcome this, we first project the Er constructed above to the space of linear functionals which satisfy the equality constraints (x2

i “1 and1Tx“0).

Then, we take a convex combination of the projection and a pseudo-expectation to control the spectrum of the functional. The details are following:

(26)

(1) (Removing degeneracy) We establish the one-to-one correspondence be-tween the collection of linear functionals on n-variate, even multilinear polynomials of degree at most 4 and the collection of linear functionals onpn´1q-variate multilinear polynomials of degree at most 4 by posing

xn “1. This correspondence preserves positivity.

(2) (Description of equality constraints) Letψbe a linear functional onpn´1q -variate multilinear polynomials of degree at most 4. We may thinkψ as a vector inRp

n´1

ď4q. Then, the condition thatψ satisfies řn´1

i“1 xi`1“0

can be written asAψ0 for some matrixA. (3) (Projection) LetwPRp

n´1

ď4qbe the coefficient vector ofgpxq. Let Π be the

projection matrix to the spacetx:Ax“0u. In other words, Π“Id´ATpAATq:A

whereIdis the identity matrix of size`nď´41˘andp¨q: denotes the

pseudo-inverse. Lete be the first column of Π andψ1 “ eΠTw

w. Then pψ1qH “1

andAψ1“0 by definition.

(4) (Convex combination) Letψ0“ eTe

e. We note thatψ0 corresponds to the

expectation operator of uniform distribution ontxP t˘1un:1Tx0u.

We will constructψby

ψ“ p1´ǫqψ0`ǫψ1

with an appropriate constantǫ. Equivalently,

ψψ0` ǫ eTw¨ ˆ Π´ee T eTe ˙ w.

(5) (Spectrum analysis) We bound the spectrum of the functional´Π´eeT

eTe

¯

w

to decide the size ofǫforψbeing positive semidefinite.

D.1.2 Removing degeneracy

Recall that

gpxq “ ÿ i,j,k,lPrns

Wijklxixjxkxℓ.

Observe thatg is even, i.e.,gpxq “gxqfor anyxP t˘1un. To maximize such

an even function, we claim that we may only consider the pseudo-expectations such that whose odd moments are zero.

Proposition 12. Let Er be a pseudo-expectation of degree 4 on hypercube sat-isfying řni1xi “0. Let p be a degree 4 multilinear polynomial which is even.

Then, there exists a pseudo-expectation Er1 of degree 4 such that Err ps “Er1rps

(27)

Proof. LetrEbe a pseudo-expectation of degree 4 on hypercube satisfyingřni1xi“

0. Let us define a linear functionalEr0 on the space of multilinear polynomials

of degree at most 4 so thatEr0rxSs “ p´1q|S|Err xSsfor any SP

`rns

ď4

˘

. Then, for any multilinear polynomialqof degree at most 2, we have

r

E0rqpxq2s “Err qp´xq2s ě0.

Also,rE0 satisfiesEr0r1s “1 and

r E0 «˜ n ÿ i“1 xi ¸ qpxq ff “ ´Er «˜ n ÿ i“1 xi ¸ qp´xq ff “0

for anyqof degree 3. Thus,Er0 is a valid pseudo-expectation of degree 4

satis-fyingřni1xi “0.

LetEr1“ 1

2pE`r Er0q. This is again a valid pseudo-expectation, since the space

of pseudo-expectations is convex. We haverE1rppxqs “Err ppxqs “Er0rppxqssince

pis even, andEr1rxSs “ p1` p´1q|S|qErr xSs “0 for anyS of odd size.

LetE be the space of all pseudo-expectations of degree 4 onn-dimensional hypercube with zero odd moments. LetE1be the space of all pseudo-expectations of degree 4 onpn´1q-dimensional hypercube. We claim that there is a bijection between two spaces.

Proposition 13. Let ψPE. Let us define a linear functional ψ1 on the space

of pn´1q-variate multilinear polynomials of degree at most 4 so that for any

T Ď rn´1s with|T| ď4

ψ1rxTs “

#

ψrxTYtnus if |T| is odd

ψrxTs otherwise.

Then,ψÞÑψ1 is a bijective mapping from E toE1.

Proof. We say linear functionalψon the space of polynomials of degree at most 2ℓispositive semidefinite ifψrq2s ě0 for anyq of degree.

Note that the mapping ψ1 ÞÑ ψ where ψrxSs “ ψ1rxSztnus for any S Ď rns

of even size is the inverse of ψÞÑψ1. Hence, it is sufficient to prove thatψ is positive semidefinite if and only ifψ1 is positive semidefinite.

(ñ) Let q be an n-variate polynomial of degree at most 2. Let q0 and q1 be

polynomials inx1,¨ ¨ ¨, xn´1 such that

qpx1,¨ ¨ ¨, xnq “q0px1,¨ ¨ ¨ , xn´1q `xnq1px1,¨ ¨ ¨ , xn´1q.

We getψ1rq2s “ψ1rpq0`x

nq1q2s “ψ1rpq02`q12q `2xnq0q1s. For i“1,2, letqi0

andqi1 be the even part and the odd part of qi, respectively. Then we have

ψ1rq2s “ ψ1rpq002 `q 2 01`q 2 10`q 2 11q `2xnpq00q11`q01q10qs “ ψrpq002 `q 2 01`q 2 10`q 2 11q `2pq00q11`q01q10qs “ ψrpq00`q11q2` pq10`q01q2s ě0.

(28)

The first equality follows from that ψ1rqs “0 for odd q. Hence,ψ1 is positive

semidefinite.

(ð) Letqbe an pn´1q-variate polynomial of degree at most 2. Letq0 andq1

be the even part and the odd part ofq, respectively. Then,

ψrq2s “ψrpq02`q 2

1q `2q0q1s.

Note thatq2

0`q12is even andq0q1 is odd. So,

ψrq2s “ψ1rpq20`q 2

1q `2xnq0q1s “ψ1rpq0`xnq1q2s ě0.

Henceψis positive semidefinite.

In addition to the proposition, we note that ψ satisfies řni1xi “ 0 if and

only ifψ1satisfies 1`řn´1

i“1 xi“0. Hence, maximizingErr gsoverrEPEsatisfying

řn i“1xi“0 is equivalent to max ψ1PE1ψ 1rg1s subject to ψ1 satisfies 1` nÿ´1 i“1 xi “0, whereg1px1,¨ ¨ ¨ , x n´1q “gpx1,¨ ¨ ¨ , xn´1,1q.

D.1.3 Matrix expression of linear constraints

LetF be the set of linear functional on the space ofpn´1q-variate multilinear polynomials of degree at most 4. We often regard a functionalψPFas a`n´1

ď4

˘

dimensional vector with entriesψS “ψrxSswhereSis a subset ofrn´1sof size

at most 4. The spaceE1 of pseudo-expectations of degree 4 (onpn´1q-variate multilinear polynomials) is a convex subset ofF.

Observe thatψPF satisfies 1`řin“´11xi“0 if and only if

ψ «˜ 1` nÿ´1 i“1 xi ¸ xS ff “0 for anySĎ rn´1swith|S| ď3.

Lets,t andube integers such that 0ďs, tď4 and 0ďuďminps, tq. Let

Mu

s,t be the matrix of size

`n´1 ď4 ˘ such that pMs,tqS,Tu # 1 if|S| “s,|T| “t, and|SXT| “u 0 otherwise forS, T P`rn´1s ď4 ˘

. Then, the condition that ψPF satisfying 1`řin“´11xi “0

can be written asAψ“0 where

AM0 0,0`M00,1` 3 ÿ s“1 pMs,s´11`Ms s,s`Ms,ss `1q.

(29)

D.1.4 Algebra generated by Mu s,t

Letmbe a positive integer greater than 8. For nonnegative integerss, t, u, let

Mu s,t be the `m ď4 ˘ ˆ`m ď4 ˘ matrix with pMs,tqS,Tu “ # 1 if|S| “s,|T| “t, and|SXT| “u 0 otherwise,

forS, TĎ rmswith|S|,|T| ď4. LetAbe the algebra of matrices

ÿ

0ďs,tď4

sÿ^t u“0

xus,tMs,tu

with complex numbers xu

s,t. This algebra A is a C˚-algebra: it is a complex

algebra which is closed under taking complex conjugate. A is a subalgebra of the Terwilliger algebra of the Hamming cubeHpm,2q[41], [42].

Note that A has dimension 55 which is the number of triples ps, t, uqwith 0ďs, tď4 and 0ďuďs^t. Define βus,t,r:“ sÿ^t p“0 p´1qp´t ˆ p u ˙ˆ m´2r p´r ˙ˆ m´r´p s´p ˙ˆ m´r´p t´p ˙

for 0ďs, tď4 and 0ďr, uďs^t. The following theorem says that matrices in the algebraAcan be written in a block-diagonal form with small sized blocks.

Theorem 14([42]). There exists an orthogonal`ďm4˘ˆ`ďm4

˘

matrixU such that forM PAwith M 4 ÿ s,t“0 sÿ^t u“0 xus,tMs,tu,

the matrixUTM U is equal to the matrix ¨ ˚ ˚ ˚ ˚ ˝ C0 0 0 0 0 0 C1 0 0 0 0 0 C2 0 0 0 0 0 C3 0 0 0 0 0 C4 ˛ ‹ ‹ ‹ ‹ ‚ where each Cr is a block diagonal matrix with

`m r ˘ ´`rm´1 ˘ repeated, identical blocks of order5´r: Cr“ ¨ ˚ ˚ ˚ ˝ Br 0 ¨ ¨ ¨ 0 0 Br ¨ ¨ ¨ 0 .. . ... . .. ... 0 0 ¨ ¨ ¨ Br ˛ ‹ ‹ ‹ ‚,

(30)

and Br“ ˜ ÿ u ˆ m´2r s´r ˙´1{2ˆ m´2r t´r ˙´1{2 βs,t,ru xus,t ¸4 s,t“r .

For brevity, let us denote this block-diagonalization of M by the tuple of matricespB0, B1, B2, B3, B4qwithp5´rq ˆ p5´rqmatrixBr’s.

D.1.5 Projection

Recall thatg1px1,¨ ¨ ¨, xn´1qis equal to

ÿ

i,j,k,ℓPrns

Wijkℓxti,j,k,ℓuztnu.

LetcPRp

n´1

ď4qbe the coefficient vector ofg1. Since entries ofWare independent

Standard Gaussians,c is a Gaussian vector with a diagonal covariance matrix Σ“ErccTswhere ΣS,S“ $ ’ & ’ % n if|S| “0 12n´16 if|S| “1 or|S| “2 24 if|S| “3 or|S| “4.

LetwΣ´1{2c. By definition,wis a random vector with i.i.d. standard normal

entries.

Let Π“Id´ATpAATq`A wherepAATq` is the Moore-Penrose

pseudoin-verse of AAT and Id is the identity matrix of order `n´1

ď4

˘

. Then, Π is the orthogonal projection matrix onto the nullspace ofA. SinceA,AT and Idare

all in the algebraA, the projection matrix Π is also inA. Letebe the first column of Π and

ψ0:“

e

eTe and ψ1:“

Πw eTw.

We have Aψ0 “ Aψ1 “ 0 by definition of Π, and pψ0qH “ pψ1qH “ 1 since

pΠwqH“eTw.

Let ǫ be a real number with 0 ă ǫ ă 1 and ψ “ p1´ǫqψ0`ǫψ1. This

functional still satisfiesAψ “0 and ψH “1, regardless of the choice of ǫ. We would like to chooseǫsuch thatψis positive semidefinite with high probability.

D.1.6 Spectrum of ψ

Consider the functionalψ0“ eTe

e. It has entries pψ0qS “ $ ’ & ’ % 1 ifS“ H ´ 1 n´1 if|S| “1 or 2 3 pn´1qpn´3q if|S| “3 or 4,

(31)

forS Ď rn´1sof size at most 4. We note that this functional corresponds to the degree 4 or less moments of the uniform distribution on the set of vectors

xP t˘1un´1satisfyingřn´1

i“1 xi`1“0.

Proposition 15. Let ψ be a vector in Rp

n´1

ď4q such that Aψ “ 0 and p be an pn´1q-variate multilinear polynomial of degree at most 2. Suppose that

ψ0rp2s “0. Then, ψrp2s “0.

Proof. Let U “ txP t˘1un´1 :řn´1

i“1 xi`1 “0u. Note thatψ0 is the

expec-tation functional of the uniform distribution on U as we seen above. Hence,

ψ0rp2s “0 if and only ifppxq2“0 for anyxPU.

On the other hand, the functionalψ is a linear combination of functionals

tp ÞÑ ppxq : x P Uu since Aψ 0. Hence, if ψ0rp2s “ 0 then ψrp2s “ 0 as

ppxq2

“0 for anyxPU.

Recall that ψ “ p1 ´ǫqψ0`ǫψ1 where ψ0 “ eTe

e and ψ1 “ Πw eT w. Let ψ11 “eTw¨ pψ1´ψ0q. Then, ψ11 “ Πw´ eTw eTee “ ˆ Π´ee T eTe ˙ w andψψ0`e

wψ11. We note that Aψ11“0 sinceψ11 is a linear combination

ofψ0 andψ1.

LetXψ0 andXψ11 be the moment matrix ofψ0 andψ

1

1respectively. LetXψ

be the moment matrix ofψ. Clearly,

Xψ“Xψ0`

ǫ eTwXψ11.

Moreover, for any pP Rp

n´1 ď2q satisfyingX ψ0p“ 0, we haveXψ11p “0 by the proposition. Hence,Xψľ0 if ǫ |eTw|}Xψ11} ďλmin,‰0pXψ0q

whereλmin,‰0 denotes the minimum nonzero eigenvalue.

We note that eTw and }Xψ1

1} are independent random variables. It

fol-lows from that w is a gaussian vector with i.i.d. standard entries, and thate

and ´Π´eeT

eTe

¯

are orthogonal. Hence, we can safely bound eTw and }X ψ1

1}

separately.

To bound}Xψ1

1} we need the following theorem.

Theorem 16 (Matrix Gaussian ([43])). Let tAku be a finite sequence of fixed, symmetric matrices with dimensiond, and lettξkube a finite sequence of

inde-pendent standard normal random variables. Then, for anytě0,

Pr«››››ÿ k ξkAk › › › › ›ět ff ďd¨e´t2{2σ2 where σ2:“ › › › › › ÿ k A2k › › › › ›.

References

Related documents

Performance evaluation of monitoring tools developed with DiSL showed less overhead than AspectJ implementation of such tool, while providing more functionality (e.g. basic block

V Ashy Bulbul Hemixos flavala Datanla Falls, Di Nong Trai, Lang Biam Mountain. V Mountain Bulbul Hypsipetes mcclellandii Di Nong Trai , Lang

F IGURE 1: Test situation for comparing 3 different learning methods (elec- tronic training collar, pinch collar and quitting signal) in Belgian Ma- linois police dogs.... at

Since the analyses began with the visual experience along the Green Line, the proposed design scenarios will give riders a new narrative of the South Side of Chicago. The proposed

In Qatar, the OH system is in a transitional development stage, and systematic data collection is needed to evaluate and plan OH care for the public. Unfortunately, to

Di ff erences, con fl icts and power relations are crucial for understanding the political nature of urban design and thus the challenges faced by e ff orts to achieve

and sitting upright, seat forward, bent legs, bent arms..  Most

We do not explore larger (i.e. more negative) cutoffs because they produce very insufficient number of reformers. Table 3 shows similar trends across the different business