Boaz-SOS-TCSplus.ppsx

(1)

Boaz Barak – Microsoft Research

Joint work with Jonathan Kelner (MIT) and David Steurer (Cornell)

Fun and Games with Sums of Squares

(2)

This talk is about

• Hilbert’s 17th problem / Positivstellensatz

• Proof complexity

• Semidefinite programming

• The Unique Games Conjecture

• Machine Learning

(3)

Exercise: Prove that for every

Hard way to solve: check all critical points of P (where the gradient vanishes)

… but there are exponentially many of them

(4)

Minkowski (1885): Is every non-negative polynomial a sum of squares?

Hilbert (1888): No!

Indeed, the question of whether a 3SAT formula is satisfiable can be encoded as whether a degree-6 polynomial is non-negative, and thus non-negativity can’t always have a short proof unless NP = coNP.
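The following Python sketch shows one standard way to realize this encoding for a small formula. The formula, the literal convention and the helper names are illustrative assumptions and do not come from the talk. Each clause becomes the square of a degree-3 product that equals 1 exactly when the clause is violated, and a degree-4 term forces Boolean values. The resulting Q is a sum of squares, so Q ≥ 0 everywhere, and Q has a real zero iff the formula is satisfiable. Because Q grows at infinity, the formula is unsatisfiable iff Q − ε ≥ 0 for some ε > 0.

```python
from itertools import product

# Hypothetical 3-CNF over x1..x3: literal +i means x_i, -i means NOT x_i.
FORMULA = [(1, 2, -3), (-1, 2, 3), (1, -2, 3)]


def literal_false(lit, x):
    """Degree-1 polynomial equal to 1 if the literal is false and 0 if true (on 0/1 inputs)."""
    v = x[abs(lit) - 1]
    return 1 - v if lit > 0 else v


def encode(formula, x):
    """Q(x) = sum_clauses (product of 'literal is false' terms)^2 + sum_i (x_i^2 - x_i)^2.

    Clause terms have degree 6, Boolean terms have degree 4, and Q is a sum of squares.
    """
    clause_part = 0
    for clause in formula:
        violated = 1
        for lit in clause:
            violated *= literal_false(lit, x)
        clause_part += violated ** 2
    boolean_part = sum((v * v - v) ** 2 for v in x)
    return clause_part + boolean_part


def satisfies(formula, x):
    """Direct check that a 0/1 assignment satisfies every clause."""
    return all(any((x[abs(lit) - 1] == 1) == (lit > 0) for lit in clause) for clause in formula)


# On Boolean points, Q vanishes exactly at the satisfying assignments.
for assignment in product([0, 1], repeat=3):
    assert (encode(FORMULA, assignment) == 0) == satisfies(FORMULA, assignment)
```

This only demonstrates the correspondence on Boolean points. It does not say anything about proof length.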

Non-constructive existence proof of a non-negative degree-6 bivariate polynomial that is not an SOS.

(…first constructive example by Motzkin in 1965)
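Motzkin’s example is the polynomial x⁴y² + x²y⁴ − 3x²y² + 1. The illustrative sketch below (not from the talk) only confirms non-negativity, using the AM-GM argument in the comments and an exact evaluation on a grid. Showing that it is not an SOS needs a separate argument about which monomials a square could contain, and the sketch does not attempt that.

```python
from fractions import Fraction


def motzkin(x, y):
    """Motzkin polynomial x^4 y^2 + x^2 y^4 - 3 x^2 y^2 + 1 (degree 6 in two variables)."""
    return x**4 * y**2 + x**2 * y**4 - 3 * x**2 * y**2 + 1


# Non-negativity follows from AM-GM applied to x^4 y^2, x^2 y^4 and 1:
#   (x^4 y^2 + x^2 y^4 + 1) / 3 >= (x^6 y^6)^(1/3) = x^2 y^2.
# Exact rational arithmetic on a grid of [-2, 2]^2 with spacing 1/20 confirms it there.
grid = [Fraction(i, 20) for i in range(-40, 41)]
values = [motzkin(x, y) for x in grid for y in grid]
smallest = min(values)  # attained at (+-1, +-1)
```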

Yay! We proved

Hilbert’s 17th problem (1900): Is every non-negative polynomial a sum of squares of rational functions?

Artin (1927), Krivine (1964), Stengle (1974): Yes!

Not just over ℝⁿ but also over zero sets of arbitrary polynomials.

Grigoriev-Vorobjov (1999), Grigoriev (2001):

Whoa! Degree-d proofs take n^{O(d)} bits to write down… Some 3SAT formulas require degree Ω(n).

(5)

SOS Algorithm:

For low-degree P, P₁, …, P_k we consider the program:

$$\max_{x \in \mathbb{R}^n} P(x) \quad \text{s.t.} \quad P_1(x) = \cdots = P_k(x) = 0$$

SOS proof that P ≤ ν: SOS polynomials S, S′ s.t.

$$(\nu - P)\,S = S' + 1 \pmod{P_1, \ldots, P_k}$$

Degree of proof: max degree of the polynomials appearing in this identity [Grigoriev-Vorobjov ’99]
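Here is a concrete certificate for a toy instance chosen for illustration (it is not from the talk). The program is: maximize P(x) = x subject to P₁(x) = x² − 1 = 0, so x = ±1. With ν = 2, S = 1 and S′ = ½(1 − x)², the identity (ν − P)S = S′ + 1 holds modulo x² − 1, which gives a degree-2 proof that P < 2 on the feasible set. This form of certificate proves a strict bound. For example, ν = 1 cannot be certified this way, because at x = 1 the left side would vanish while the right side is at least 1. The helper names below are hypothetical.

```python
from fractions import Fraction

# Polynomials in one variable as coefficient lists, lowest degree first.


def poly_add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]


def poly_scale(c, p):
    return [c * a for a in p]


def poly_mul(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def reduce_mod_x2_minus_1(p):
    """Reduce modulo the constraint x^2 - 1, i.e. replace x^k by x^(k mod 2)."""
    r = [Fraction(0), Fraction(0)]
    for k, a in enumerate(p):
        r[k % 2] += a
    return r


nu = Fraction(2)
P = [Fraction(0), Fraction(1)]                      # objective P(x) = x
S = [Fraction(1)]                                   # SOS multiplier S = 1
one_minus_x = [Fraction(1), Fraction(-1)]
S_prime = poly_scale(Fraction(1, 2), poly_mul(one_minus_x, one_minus_x))  # (1/2)(1 - x)^2

lhs = poly_mul(poly_add([nu], poly_scale(-1, P)), S)  # (nu - P) * S
rhs = poly_add(S_prime, [Fraction(1)])                # S' + 1
residue = reduce_mod_x2_minus_1(poly_add(lhs, poly_scale(-1, rhs)))
assert residue == [0, 0]  # the identity holds modulo x^2 - 1
```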

Theorem: [Shor ’87, Parrilo ’00, Nesterov ’00, Lasserre ’01]

1) A proof of degree d can be found in time n^{O(d)}.

2) Can find in time n^{O(d)} the min ν s.t. there is a degree-d proof that P ≤ ν.
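Finding SOS proofs is a semidefinite program. A polynomial p of degree 2t is SOS iff p(x) = m(x)ᵀ Q m(x) for some p.s.d. matrix Q, where m(x) lists the monomials of degree ≤ t, and matching coefficients imposes linear constraints on Q. The sketch below illustrates this for p(x) = x⁴ + x² + 1, an illustrative choice that is not from the talk. The Gram matrices of p form a one-parameter family, and p is SOS because some member of that family is p.s.d. An SDP solver would search over the free parameter a. Here we only test a few values, using an exact principal-minor test that is exponential in the matrix size and therefore only suitable for tiny matrices.

```python
from fractions import Fraction
from itertools import combinations


def det(m):
    """Determinant by cofactor expansion (only sensible for tiny matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = Fraction(0)
    for j in range(len(m)):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def is_psd(q):
    """Exact test for a symmetric matrix: p.s.d. iff every principal minor is >= 0."""
    n = len(q)
    for size in range(1, n + 1):
        for idx in combinations(range(n), size):
            if det([[q[i][j] for j in idx] for i in idx]) < 0:
                return False
    return True


def gram_to_coeffs(q):
    """Coefficients (lowest degree first) of m(x)^T Q m(x) with m(x) = (1, x, x^2)."""
    coeffs = [Fraction(0)] * 5
    for i in range(3):
        for j in range(3):
            coeffs[i + j] += q[i][j]
    return coeffs


def gram_family(a):
    """Every symmetric Gram matrix of p(x) = x^4 + x^2 + 1; the entry a is free."""
    a = Fraction(a)
    return [[Fraction(1), Fraction(0), a],
            [Fraction(0), 1 - 2 * a, Fraction(0)],
            [a, Fraction(0), Fraction(1)]]


P_COEFFS = [1, 0, 1, 0, 1]  # x^4 + x^2 + 1, lowest degree first
results = {}
for a in [Fraction(-2), Fraction(-1), Fraction(0), Fraction(1, 2), Fraction(1)]:
    q = gram_family(a)
    assert gram_to_coeffs(q) == P_COEFFS   # every member represents p
    results[a] = is_psd(q)                 # p is SOS iff some member is p.s.d.
```

The members with a = −1, 0 and 1/2 come out p.s.d., while a = −2 and a = 1 do not. This matches the condition −1 ≤ a ≤ 1/2, which you can read off from the diagonal entry 1 − 2a and the minor 1 − a².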

(6)


Positivstellensatz: All true bounds have SOS proofs. [Artin ’27, Krivine ’64, Stengle ’74]

(7)

Program:

$$\max_{x \in \mathbb{R}^n} P(x) \quad \text{s.t.} \quad P_1(x) = \cdots = P_k(x) = 0$$

SOS proof that P ≤ ν:

$$(\nu - P)\,S = S' + 1 \pmod{P_1, \ldots, P_k}$$

Can optimize in time n^{O(d)} over programs with degree-d proofs.

Can’t hope for low degree always: the framework captures SAT, CLIQUE, 3COL, MAX-CUT, etc.

But maybe often? Essentially only one (robust) lower bound showing that high degree is required [Grigoriev ’01].

Applications:

• Optimizing polynomials w/ non-negative coefficients over the sphere.

• Algorithms for the quantum separability problem [Brandao-Harrow ’13]

• Sparse coding: learning dictionaries beyond the √n barrier.

• Finding sparse vectors in subspaces.

• Approach to refute the Unique Games Conjecture.

This talk: General method to analyze the SOS algorithm [B-Kelner-Steurer ’13].

Rest of this talk:

• Super high-level description of approach.

• More concrete: reduction to the task of “finding a needle in a needle-stack”.

• Implementing the reduction via pseudoexpectations.

• Example: Sparse Coding (aka dictionary learning).


(9)

Our Approach: High-Level Description

Traditional relaxation-based approach for solving/approximating the program:

1) Define a relaxation optimizing over a larger set of x’s.

(e.g., if the constraints define a set of feasible x’s, optimize over a larger set containing it instead)

2) Find a rounding algorithm mapping the larger set into valid x’s.

Our approach: Invert the steps:

1) Find a combining algorithm mapping some set into valid x’s.

2) Use the relaxation to supply inputs to the combining algorithm.

Crucial ingredient: view of the relaxation as a proof system.

Program:

$$\max_{x \in \mathbb{R}^n} P(x) \quad \text{s.t.} \quad P_1(x) = \cdots = P_k(x) = 0$$

(10)

Program:

$$\max_{x \in \mathbb{R}^n} P(x) \quad \text{s.t.} \quad P_1(x) = \cdots = P_k(x) = 0$$

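Finding a maximizer is hard. We consider an easier problem:

“Finding a needle in a needle-stack”

Given many x’s maximizing P, find a single x with value close to the maximum.

Combiner:

• Input: a (multi)set S of x’s, each maximizing P subject to P₁(x) = ⋯ = P_k(x) = 0.

• Output: a single x with value close to the maximum.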

Non-trivial combiner: only depends on the low-degree marginals of S:

$$\left\{\mathbb{E}_{x \sim S}\, x_{i_1} \cdots x_{i_k}\right\}_{i_1, \ldots, i_k \in [n]}$$
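The following toy sketch shows why low-degree marginals can succeed where plain averaging fails. Its assumptions are made up for illustration. The objective is even, as 𝔼⟨y, x⟩⁴ is later in the talk, so the needle-stack S = {a, −a} consists of two maximizers. Their mean (the degree-1 marginals) is 0, which is useless. The degree-2 marginals, however, give 𝔼 x xᵀ = a aᵀ, which determines a up to sign, and a combiner can read a off as the top eigenvector. The vector a and the helper names are hypothetical.

```python
import math


def mean(vectors):
    """Degree-1 marginals E_{x~S} x_i of a multiset S."""
    n = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(n)]


def second_moments(vectors):
    """Degree-2 marginals: the matrix of E_{x~S} x_i x_j."""
    n = len(vectors[0])
    return [[sum(v[i] * v[j] for v in vectors) / len(vectors) for j in range(n)] for i in range(n)]


def top_eigenvector(m, iters=200):
    """Power iteration from the all-ones vector; fine for this rank-one p.s.d. example."""
    v = [1.0] * len(m)
    for _ in range(iters):
        w = [sum(row[j] * v[j] for j in range(len(v))) for row in m]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    return v


needle = [0.6, 0.8, 0.0]                 # hypothetical unit-norm maximizer a
stack = [needle, [-t for t in needle]]   # for an even objective, -a is a maximizer too
avg = mean(stack)                        # degree-1 marginals cancel out
v = top_eigenvector(second_moments(stack))
alignment = abs(sum(s * t for s, t in zip(v, needle)))
```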

[B-Kelner-Steurer ’13]: Transform “simple” non-trivial combiners into an algorithm for the original problem.

Idea in a nutshell: Simple combiners will output a solution even when fed “fake marginals”.

(11)

Pseudoexpectations (aka “Fake Marginals”)

Def: [Lasserre ’01] A degree-d pseudoexpectation Ẽ is an operator mapping any polynomial of degree ≤ d into a number, satisfying:

• Normalization: Ẽ 1 = 1

• Linearity: Ẽ(aP + bQ) = a Ẽ P + b Ẽ Q for P, Q of deg ≤ d

• Positivity: Ẽ P² ≥ 0 for P of deg ≤ d/2

Fundamental Fact: ∃ deg-d SOS proof for P ≥ 0 ⇔ Ẽ P ≥ 0 for any deg-d pseudoexpectation operator Ẽ

Take home message:

• Pseudoexpectation “looks like” real expectation to low degree polynomials.

• Can efficiently find pseudoexpectation matching any polynomial constraints.

• Proofs about real random vars can often be “lifted” to pseudoexpectation.
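Here is a minimal sketch of these definitions for univariate polynomials of degree ≤ 4. It assumes the illustrative moment vector (1, 0, 1, 0, 3), which does not come from the talk. These happen to be the moments of a standard Gaussian, so this Ẽ is in fact a real expectation, but the code only uses the axioms. The sketch shows four things:

• Ẽ is determined by its moments.
• Linearity is built in.
• Positivity on squares of degree-2 polynomials is the statement that the moment matrix H is p.s.d.
• The fundamental fact, in miniature: if P = m(x)ᵀ Q m(x) with Q p.s.d., then Ẽ P = tr(QH) ≥ 0.

```python
from fractions import Fraction

# Illustrative degree-4 pseudoexpectation on univariate polynomials, given by E~[x^k], k = 0..4.
MOMENTS = [Fraction(1), Fraction(0), Fraction(1), Fraction(0), Fraction(3)]
assert MOMENTS[0] == 1  # normalization: E~[1] = 1


def pexp(coeffs):
    """E~ of a polynomial (coefficients lowest degree first); linear by construction."""
    if len(coeffs) > len(MOMENTS):
        raise ValueError("degree exceeds that of the pseudoexpectation")
    return sum(c * MOMENTS[k] for k, c in enumerate(coeffs))


def square(coeffs):
    """Coefficients of q(x)^2."""
    out = [Fraction(0)] * (2 * len(coeffs) - 1)
    for i, a in enumerate(coeffs):
        for j, b in enumerate(coeffs):
            out[i + j] += a * b
    return out


# Moment matrix indexed by the monomials 1, x, x^2: H[i][j] = E~[x^(i+j)].
H = [[MOMENTS[i + j] for j in range(3)] for i in range(3)]

# For q = c0 + c1 x + c2 x^2, E~[q^2] = c^T H c; positivity for all such q means H is p.s.d.
# Spot-check a few q's:
for c in ([1, -1, 0], [0, 1, -1], [2, 0, -1]):
    quad = sum(c[i] * H[i][j] * c[j] for i in range(3) for j in range(3))
    assert quad == pexp(square(c)) >= 0

# Fundamental fact in miniature: an SOS proof P = m(x)^T Q m(x) with Q p.s.d. gives
# E~[P] = trace(Q H), which is >= 0 because Q and H are both p.s.d.
Q = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # Gram matrix of P(x) = 1 + x^2 + x^4 = 1^2 + x^2 + (x^2)^2
trace_QH = sum(Q[i][j] * H[j][i] for i in range(3) for j in range(3))
```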

(12)


A degree-d pseudoexpectation operator can be represented by a p.s.d. matrix.


(13)

Problem: Given low-degree P, P₁, …, P_k, maximize P(x) s.t. P₁(x) = ⋯ = P_k(x) = 0.

Non-trivial combiner: Alg C with

Input: the low-degree moments of a r.v. X over ℝⁿ s.t.

$$\mathbb{E}\,(P(X) - v)^2 = 0, \qquad \forall i:\ \mathbb{E}\,P_i(X)^2 = 0$$

Output: a single x with value close to v.

Corollary: In this case, we can find such an x efficiently:

• Use the SOS SDP to find a pseudoexpectation matching the input conditions.

• Use C to map it into an actual solution.

Crucial Observation: If the proof that C outputs a good solution is in the SOS framework, then it holds even if C is fed a pseudoexpectation.

Combining ⇒ Rounding

Dist of x’s s.t. P(x) = v, P₁(x) = ⋯ = P_k(x) = 0 ⟶ Combiner ⟶ single x with value close to v

(14)

Example Application: Dictionary Learning / Sparse Coding [Olshausen-Field ’96]

Let a₁, …, a_m be a set of vectors. Goal: Given examples of the form y = ∑_i W_i a_i, where the W_i’s are sparse, recover a₁, …, a_m.

Find the “right” representation of observed data.

LOTS of work: important primitive in Machine Learning, Vision, Neuroscience…

Previous best (rigorous) results:

[Spielman-Wang-Wright ’12, Arora-Ge-Moitra ’13, Agarwal-Anandkumar-Jain-Netrapalli-Tandon ’13]

We show: … is sufficient* (even in the non-independent, overcomplete case).


(16)

Goal: Let a₁, …, a_m be a set of vectors. Given examples of the form y = ∑_i W_i a_i, recover a₁, …, a_m.

Achieve in 3 steps:

(1) Find a program s.t. every x maximizing it is close to one of the a_i’s.

(2) Give a combining algorithm taking moments of a distribution over maximizers into a vector close to one of the a_i’s.

(3) Show that the arguments in (1) and (2) fall under the SOS framework.

Result generalizes to the overcomplete, non-independent case.

For simplicity, assume the a_i’s form an orthonormal basis and the W_i’s are i.i.d. random variables s.t. …

Consider the polynomial

$$P(x) = \mathbb{E}\,\langle y, x\rangle^4 = \mathbb{E}\Big(\sum_i W_i \langle a_i, x\rangle\Big)^4$$

(can approximate arbitrarily well from examples)

Opening the parentheses we get

$$P(x) \le \mu \sum_i \langle a_i, x\rangle^4 + 3\mu^2 \Big(\sum_i \langle a_i, x\rangle^2\Big)^2 = \mu \sum_i \langle a_i, x\rangle^4 + o(\mu)\,\|x\|^4$$

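Corollary: For unit x, if P(x) ≥ μ − o(μ), then ⟨a_i, x⟩² ≥ 1 − o(1) for some i (since ∑_i ⟨a_i, x⟩⁴ ≤ max_i ⟨a_i, x⟩²).

Establishes (1)!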
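The expansion can be checked exactly for a small case. The slide’s conditions on the W_i’s are not recoverable here, so this sketch assumes a hypothetical symmetric distribution: W_i = 0 with probability 1 − μ, and W_i = ±1 with probability μ/2 each. Under this assumption 𝔼W_i² = 𝔼W_i⁴ = μ and all odd moments vanish. The sketch takes n = 3 and a_i = e_i. It then compares the brute-force expectation with the closed form μ∑⟨a_i, x⟩⁴ + 3μ²∑_{i≠j}⟨a_i, x⟩²⟨a_j, x⟩², whose second term is at most 3μ²∥x∥⁴.

```python
from fractions import Fraction
from itertools import product

MU = Fraction(1, 10)  # hypothetical sparsity parameter: Pr[W_i != 0]

# Hypothetical coefficient distribution: W_i = 0 w.p. 1 - MU, +1 or -1 w.p. MU/2 each.
W_DIST = {0: 1 - MU, 1: MU / 2, -1: MU / 2}


def P_exact(x):
    """P(x) = E <y, x>^4 with y = sum_i W_i a_i, taking a_i = e_i (so <a_i, x> = x_i)."""
    total = Fraction(0)
    for w in product(W_DIST, repeat=len(x)):   # enumerate all outcomes of (W_1, ..., W_n)
        prob = Fraction(1)
        for wi in w:
            prob *= W_DIST[wi]
        total += prob * sum(wi * xi for wi, xi in zip(w, x)) ** 4
    return total


def P_closed_form(x):
    """MU * sum x_i^4 + 3 MU^2 * sum_{i != j} x_i^2 x_j^2 (only paired index terms survive)."""
    fourth = sum(xi ** 4 for xi in x)
    cross = sum(x[i] ** 2 * x[j] ** 2 for i in range(len(x)) for j in range(len(x)) if i != j)
    return MU * fourth + 3 * MU ** 2 * cross


x = [Fraction(1, 2), Fraction(1, 2), Fraction(-1, 2)]  # an arbitrary test point
assert P_exact(x) == P_closed_form(x)
```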


(18)

Step 2. Let D be a distribution over unit vectors s.t. every x in its support satisfies … for some i.

Pick a set of random (standard Gaussian) vectors g₁, …, g_ℓ.

Let M be the matrix s.t. …

Our combining algorithm outputs the top eigenvector of M.

Suppose that … and for every …, … (note that …).

Then if …, M ≈ a_i a_iᵀ (up to scaling) and we’ll succeed.


Slightly tedious but mostly* straightforward computations.
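The slide’s exact definition of M is only partly recoverable, so the following Python sketch assumes one natural reading. Each x is reweighted by W(x) = ∏_j ⟨g_j, x⟩² for ℓ random Gaussian vectors g_j. The sketch then sets M = 𝔼_{x∼D} W(x) x xᵀ and outputs M’s top eigenvector. It treats only the idealized case where D is uniform over ±a_i for an orthonormal basis. In that case M is a weighted sum of the a_i a_iᵀ, and with high probability the random weights single out one a_i. The real analysis must handle vectors that are only close to the a_i’s, and must also work with pseudoexpectations. The basis, ℓ = 8, the seeds and the helper names are all hypothetical.

```python
import math
import random

# Hypothetical orthonormal basis a_1..a_4 of R^4 (a rotation in the first two coordinates).
BASIS = [[0.6, 0.8, 0.0, 0.0], [-0.8, 0.6, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]


def dot(u, v):
    return sum(s * t for s, t in zip(u, v))


def top_eigenvector(m, iters=5000, seed=1):
    """Power iteration from a random Gaussian start vector (m assumed symmetric p.s.d.)."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(m))]
    for _ in range(iters):
        w = [dot(row, v) for row in m]
        norm = math.sqrt(dot(w, w))
        v = [t / norm for t in w]
    return v


def combine(support, num_gaussians=8, seed=0):
    """Hypothetical combiner: weight each x by prod_j <g_j, x>^2 for random Gaussian g_j,
    form M = E_x[weight(x) x x^T] for x uniform over `support`, return M's top eigenvector."""
    rng = random.Random(seed)
    n = len(support[0])
    gs = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(num_gaussians)]
    m = [[0.0] * n for _ in range(n)]
    for x in support:
        weight = math.prod(dot(g, x) ** 2 for g in gs)
        for i in range(n):
            for j in range(n):
                m[i][j] += weight * x[i] * x[j] / len(support)
    return top_eigenvector(m)


# Idealized input: D uniform over +a_i and -a_i (every x is exactly one of the a_i up to sign).
support = BASIS + [[-t for t in a] for a in BASIS]
v = combine(support)
alignment = max(abs(dot(v, a)) for a in BASIS)   # close to 1 when one weight dominates
```

Squaring the inner products makes the weights sign-invariant, so +a_i and −a_i get the same weight. Multiplying several independent factors spreads the weights out, which makes it likely that a single a_i dominates M.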

(19)

A personal overview of the Unique Games Conjecture

Unique Games Conjecture: UG/SSE problem is NP-hard. [Khot ’02, Raghavendra-Steurer ’08]

Reasons to believe:

• “Standard crypto heuristic”: Tried to solve it and couldn’t.

• Very clean picture of complexity landscape: simple algorithms are optimal [Khot ’02 … Raghavendra ’08 …]

• Simple poly algorithms can’t refute it [Khot-Vishnoi ’04]

• Simple subexp' algorithms can’t refute it [B-Gopalan-Håstad-Meka-Raghavendra-Steurer ’12]

Reasons to suspect:

• Random instances are easy via simple algorithm [Arora-Khot-Kolla-Steurer-Tulsiani-Vishnoi ’05]

• Subexponential algorithm [Arora-B-Steurer ’10]

• Quasipoly algo on KV instance [Kolla ’10]

• SOS proof system solves all candidate hard instances [B-Brandao-Harrow-Kelner-Steurer-Zhou ’12]

• SOS useful for sparse vector problem

• Candidate algorithm for search problem [B-Kelner-Steurer ’13]

Skeletal program to prove UGC

(20)

Conclusions

• Sum of Squares is a powerful algorithmic framework that can yield strong results for the right problems.

(contrast with previous results on SDP/LP hierarchies, which showed lower bounds when using either the wrong hierarchy or the wrong problem.)

• The “combiner” view allows us to focus on the features of the problem rather than the details of the relaxation.

• SOS seems particularly useful for problems with some geometric structure, including several problems related to unique games and machine learning.

• We still have only a rudimentary understanding of when SOS works and when it does not.

(21)

(22)

Other Results

Sparse vector problem: Recover a sparse vector in a low-dimensional subspace given an arbitrary basis of the subspace.

(motivation: machine learning, optimization [Demanet-Hand ’13])

Random case: Recovery for any … (improving on [Demanet-Hand ’13])

Worst case: Recovery* for …

[Brandao-Harrow ’12]: Using our techniques, find a separable quantum state maximizing a “local operations classical communication” (LOCC) measurement.