• No results found

Algorithmic Challenges in High-Dimensional Inference Models. Insights from Statistical Physics

N/A
N/A
Protected

Academic year: 2021

Share "Algorithmic Challenges in High-Dimensional Inference Models. Insights from Statistical Physics"

Copied!
105
0
0

Loading.... (view fulltext now)

Full text

(1)

Algorithmic Challenges in High-Dimensional

Inference Models. Insights from Statistical

Physics

David Gamarnik

MIT

Operations Research Center and IDSS MADDD Seminar

Joint work withQuan Li (MIT), Ilias Zadik (NYU), Subhabrata Sen (Harvard), Aukosh Jagannath (U of

(2)

Computational Challenges in High-Dimensional Infrerence

Modern day inference challenges are characterized by Model size (BIG Data)

Structure of parameters (sparsity, discreteness, low-dimensionality, etc.)

(3)

Computational Challenges in High-Dimensional Infrerence

Modern day inference challenges are characterized by

Model size (BIG Data)

Structure of parameters (sparsity, discreteness, low-dimensionality, etc.)

(4)

Computational Challenges in High-Dimensional Infrerence

Modern day inference challenges are characterized by Model size (BIG Data)

Structure of parameters (sparsity, discreteness, low-dimensionality, etc.)

(5)

Computational Challenges in High-Dimensional Infrerence

Modern day inference challenges are characterized by Model size (BIG Data)

Structure of parameters (sparsity, discreteness, low-dimensionality, etc.)

(6)

Computational Challenges in High-Dimensional Infrerence

We need a new theory addressing computational hardness and tractability

High-dim problems are computationally challenging, giving a rise to the so-called Information Theoretic vs

Computation gap. What’s the reason?

(7)

Computational Challenges in High-Dimensional Infrerence

We need a new theory addressing computational hardness and tractability

High-dim problems are computationally challenging, giving a rise to the so-called Information Theoretic vs

Computation gap. What’s the reason?

(8)

Computational Challenges in High-Dimensional Infrerence

We need a new theory addressing computational hardness and tractability

High-dim problems are computationally challenging, giving a rise to the so-called Information Theoretic vs

Computation gap.

What’s the reason?

(9)

Computational Challenges in High-Dimensional Infrerence

We need a new theory addressing computational hardness and tractability

High-dim problems are computationally challenging, giving a rise to the so-called Information Theoretic vs

Computation gap. What’s the reason?

(10)

Computational Challenges in High-Dimensional Infrerence

We need a new theory addressing computational hardness and tractability

High-dim problems are computationally challenging, giving a rise to the so-called Information Theoretic vs

Computation gap. What’s the reason?

(11)

Example I: Largest Submatrix Problem Given m × n matrix J J =    J11 . . . J1n ... ... ... Jm1 . . . Jmn   , Find a k × ` submatrix        J11 . . . J1n ...    ∗ . . . ∗ ... J∗ k,` ... ∗ . . . ∗    ... Jm1 . . . Jmn        ,

with the largest average entry Ave(J∗

(12)

Who cares?..

Bi-clustering: applications in genetics, drug design, energy and social networks.

Genomics: J is gene vs expression data.

m = 2, 500 − 15, 000, n = 10 − 150Madeira & Oliveira survey [2004], Fortunato et al [2010], Shabalin et al [2009].

(13)

Who cares?..

Bi-clustering: applications in genetics, drug design, energy and social networks.

Genomics: J is gene vs expression data.

m = 2, 500 − 15, 000, n = 10 − 150Madeira & Oliveira survey [2004], Fortunato et al [2010], Shabalin et al [2009].

(14)

Who cares?..

Bi-clustering: applications in genetics, drug design, energy and social networks.

Genomics: J is gene vs expression data.

m = 2, 500 − 15, 000, n = 10 − 150Madeira & Oliveira survey [2004], Fortunato et al [2010], Shabalin et al [2009].

(15)

Who cares?..

(16)

Who cares?..

(17)

Theory Background

[Sun, Nobel [2013],[Bhamidi, Dey & Nobel 2013] J = N (0, In)

Global optimum when m = n, k = ` = o(log n) Ave(J∗

(18)

Theory Background

[Sun, Nobel [2013],[Bhamidi, Dey & Nobel 2013] J = N (0, In)

Global optimum when m = n, k = ` = o(log n) Ave(J∗

(19)

Theory Background

[Sun, Nobel [2013],[Bhamidi, Dey & Nobel 2013] J = N (0, In)

Global optimum when m = n, k = ` = o(log n) Ave(J∗

(20)

Algorithms

Several algorithms analyzed inG & Li [2016]

The best of the three algorithms produces a matrix with average entry Ave(CALG n,k )≈ 4 √ 2 3 r log n k <2r log n k ≈ OPT. as 4√2/3 = 1.885... < 2.

Nothing better is known.

(21)

Algorithms

Several algorithms analyzed inG & Li [2016]

The best of the three algorithms produces a matrix with average entry Ave(CALG n,k )≈ 4 √ 2 3 r log n k <2r log n k ≈ OPT. as 4√2/3 = 1.885... < 2.

Nothing better is known.

(22)

Algorithms

Several algorithms analyzed inG & Li [2016]

The best of the three algorithms produces a matrix with average entry Ave(CALG n,k )≈ 4 √ 2 3 r log n k <2r log n k ≈ OPT. as 4√2/3 = 1.885... < 2. Nothing better is known.

(23)

Algorithms

Several algorithms analyzed inG & Li [2016]

The best of the three algorithms produces a matrix with average entry Ave(CALG n,k )≈ 4 √ 2 3 r log n k <2r log n k ≈ OPT. as 4√2/3 = 1.885... < 2.

Nothing better is known.

(24)

Algorithms

Several algorithms analyzed inG & Li [2016]

The best of the three algorithms produces a matrix with average entry Ave(CALG n,k )≈ 4 √ 2 3 r log n k <2r log n k ≈ OPT. as 4√2/3 = 1.885... < 2.

Nothing better is known.

(25)

Algorithms

Several algorithms analyzed inG & Li [2016]

The best of the three algorithms produces a matrix with average entry Ave(CALG n,k )≈ 4 √ 2 3 r log n k <2r log n k ≈ OPT. as 4√2/3 = 1.885... < 2.

Nothing better is known.

(26)

Example II: Spiked Matrix Detection

Random matrix with sparse rank-1 signal: J = λθ∗ · (θ∗)T +W , θ∗ ∈ {0, 1}n, s − sparse, W =d N (0, In) J = λ           1 . . . 1 0 . . . 0 ... ... ... ... ... ... 1 . . . 1 0 . . . 0 0 . . . 0 0 . . . 0 ... ... ... ... ... ... 0 . . . 0 0 . . . 0           +    J11 . . . J1n ... ... ... Jn1 . . . Jnn   ,

(27)

Example II: Spiked Matrix Detection

Random matrix with sparse rank-1 signal: J = λθ∗ · (θ∗)T +W , θ∗ ∈ {0, 1}n, s − sparse, W =d N (0, In) J = λ           1 . . . 1 0 . . . 0 ... ... ... ... ... ... 1 . . . 1 0 . . . 0 0 . . . 0 0 . . . 0 ... ... ... ... ... ... 0 . . . 0 0 . . . 0           +    J11 . . . J1n ... ... ... Jn1 . . . Jnn   ,

(28)

Example II: Spiked Matrix Detection

Random matrix with sparse rank-1 signal: J = λθ∗ · (θ∗)T +W , θ∗ ∈ {0, 1}n, s − sparse, W =d N (0, In) J = λ           1 . . . 1 0 . . . 0 ... ... ... ... ... ... 1 . . . 1 0 . . . 0 0 . . . 0 0 . . . 0 ... ... ... ... ... ... 0 . . . 0 0 . . . 0           +    J11 . . . J1n ... ... ... Jn1 . . . Jnn   ,

(29)

Example II: Spiked Matrix Detection

Random matrix with sparse rank-1 signal: J = λθ∗ · (θ∗)T +W , θ∗ ∈ {0, 1}n, s − sparse, W =d N (0, In) J = λ           1 . . . 1 0 . . . 0 ... ... ... ... ... ... 1 . . . 1 0 . . . 0 0 . . . 0 0 . . . 0 ... ... ... ... ... ... 0 . . . 0 0 . . . 0           +    J11 . . . J1n ... ... ... Jn1 . . . Jnn   ,

(30)

Background

Problem studied widely in recent years: Lelarge, Miolane [2009]

Butucea & Ingster [2013],

Deshphande & Montanari [2014]

Lesieur, Krzakala, Zdeborova [2015,2017] Dia, Macris, Krzakala, Lesieur, Zdeborova [2016] Montanari, Reichman, Zeitouni [2015]

Jagannath, Lopato, Milane [2018] G, Jagannath, Sen [2019]

State of the art understanding

MLE (computed by brute force) recovers θreliably if and

only if λ ≥ λINFO,G, Jagannath, Sen [2019]

Fast (poly-time) algorithm recovers θreliably when

λ > λCOMP Deshphande & Montanari [2014]

The regime λINFO < λ < λCOMPis not understood. Does a

(31)

Background

Problem studied widely in recent years: Lelarge, Miolane [2009]

Butucea & Ingster [2013],

Deshphande & Montanari [2014]

Lesieur, Krzakala, Zdeborova [2015,2017] Dia, Macris, Krzakala, Lesieur, Zdeborova [2016] Montanari, Reichman, Zeitouni [2015]

Jagannath, Lopato, Milane [2018] G, Jagannath, Sen [2019]

State of the art understanding

MLE (computed by brute force) recovers θreliably if and

only if λ ≥ λINFO,G, Jagannath, Sen [2019]

Fast (poly-time) algorithm recovers θreliably when

λ > λCOMP Deshphande & Montanari [2014]

The regime λINFO < λ < λCOMPis not understood. Does a

(32)

Background

Problem studied widely in recent years: Lelarge, Miolane [2009]

Butucea & Ingster [2013],

Deshphande & Montanari [2014]

Lesieur, Krzakala, Zdeborova [2015,2017] Dia, Macris, Krzakala, Lesieur, Zdeborova [2016] Montanari, Reichman, Zeitouni [2015]

Jagannath, Lopato, Milane [2018] G, Jagannath, Sen [2019]

State of the art understanding

MLE (computed by brute force) recovers θreliably if and

only if λ ≥ λINFO,G, Jagannath, Sen [2019]

Fast (poly-time) algorithm recovers θreliably when

λ > λCOMP Deshphande & Montanari [2014]

The regime λINFO < λ < λCOMPis not understood. Does a

(33)

Background

Problem studied widely in recent years: Lelarge, Miolane [2009]

Butucea & Ingster [2013],

Deshphande & Montanari [2014]

Lesieur, Krzakala, Zdeborova [2015,2017] Dia, Macris, Krzakala, Lesieur, Zdeborova [2016] Montanari, Reichman, Zeitouni [2015]

Jagannath, Lopato, Milane [2018] G, Jagannath, Sen [2019]

State of the art understanding

MLE (computed by brute force) recovers θreliably if and

only if λ ≥ λINFO,G, Jagannath, Sen [2019]

Fast (poly-time) algorithm recovers θreliably when

λ > λCOMP Deshphande & Montanari [2014]

The regime λINFO < λ < λCOMPis not understood. Does a

(34)

Background

Problem studied widely in recent years: Lelarge, Miolane [2009]

Butucea & Ingster [2013],

Deshphande & Montanari [2014]

Lesieur, Krzakala, Zdeborova [2015,2017] Dia, Macris, Krzakala, Lesieur, Zdeborova [2016] Montanari, Reichman, Zeitouni [2015]

Jagannath, Lopato, Milane [2018] G, Jagannath, Sen [2019]

State of the art understanding

MLE (computed by brute force) recovers θreliably if and

only if λ ≥ λINFO,G, Jagannath, Sen [2019]

Fast (poly-time) algorithm recovers θreliably when

λ > λCOMP Deshphande & Montanari [2014]

The regime λINFO < λ < λCOMPis not understood. Does a

(35)

Background

Problem studied widely in recent years: Lelarge, Miolane [2009]

Butucea & Ingster [2013],

Deshphande & Montanari [2014]

Lesieur, Krzakala, Zdeborova [2015,2017] Dia, Macris, Krzakala, Lesieur, Zdeborova [2016] Montanari, Reichman, Zeitouni [2015]

Jagannath, Lopato, Milane [2018] G, Jagannath, Sen [2019]

State of the art understanding

MLE (computed by brute force) recovers θreliably if and

only if λ ≥ λINFO,G, Jagannath, Sen [2019]

Fast (poly-time) algorithm recovers θreliably when

λ > λCOMP Deshphande & Montanari [2014]

The regime λINFO < λ < λCOMPis not understood. Does a

(36)

Example III: Sparse High-Dimensional Linear Regression Model Y = Xβ∗+W    Y1 ... Yn   =    X11 X12 . . . X1p ... ... ... ... Xn1 Xn2 . . . Xnp       β∗ 1 ... β∗ p   +    W1 ... Wn   

Goal: Recover β∗from observed X and Y . Sparsity: β∗is sparse:

(37)

Example III: Sparse High-Dimensional Linear Regression Model Y = Xβ∗+W    Y1 ... Yn   =    X11 X12 . . . X1p ... ... ... ... Xn1 Xn2 . . . Xnp       β∗ 1 ... β∗ p   +    W1 ... Wn   

Goal: Recover β∗from observed X and Y . Sparsity: β∗is sparse:

(38)

Example III: Sparse High-Dimensional Linear Regression Model Y = Xβ∗+W    Y1 ... Yn   =    X11 X12 . . . X1p ... ... ... ... Xn1 Xn2 . . . Xnp       β∗ 1 ... β∗ p   +    W1 ... Wn   

Goal: Recover β∗from observed X and Y .

Sparsity: β∗is sparse:

(39)

Example III: Sparse High-Dimensional Linear Regression Model Y = Xβ∗+W    Y1 ... Yn   =    X11 X12 . . . X1p ... ... ... ... Xn1 Xn2 . . . Xnp       β∗ 1 ... β∗ p   +    W1 ... Wn   

Goal: Recover β∗from observed X and Y . Sparsity: β∗is sparse:

(40)

Background

MLE: solve (hard) quadratic minimization problem min

β kY − X βk2

Subject to: kβk0=k.

Questions:

(a) Is the optimal solution βOPTa good approximation of the

ground truth β∗?

(41)

Background

MLE: solve (hard) quadratic minimization problem min

β kY − X βk2

Subject to: kβk0=k.

Questions:

(a) Is the optimal solution βOPTa good approximation of the

ground truth β∗?

(42)

Background

MLE: solve (hard) quadratic minimization problem min

β kY − X βk2

Subject to: kβk0=k.

Questions:

(a) Is the optimal solution βOPTa good approximation of the

ground truth β∗?

(43)

Background

Spectacular success of convex optimization based techniques,Donoho, Tanner, Wainwright, Hastie, Tibshirani, Candes, Tao:

Replace kβk0=k with a convex constraint kβk1=k and

solve the convex minimization problem.

Suppose entries of X and Y are i.i.d. Gaussian. The method recovers the ground truth: βOPT= β∗.

Works when

n > 2k log p , nConvex,

(assuming k  n  p).

(44)

Background

Spectacular success of convex optimization based techniques,Donoho, Tanner, Wainwright, Hastie, Tibshirani, Candes, Tao:

Replace kβk0=k with a convex constraint kβk1=k and

solve the convex minimization problem.

Suppose entries of X and Y are i.i.d. Gaussian. The method recovers the ground truth: βOPT= β∗.

Works when

n > 2k log p , nConvex,

(assuming k  n  p).

(45)

Background

Spectacular success of convex optimization based techniques,Donoho, Tanner, Wainwright, Hastie, Tibshirani, Candes, Tao:

Replace kβk0=k with a convex constraint kβk1=k and

solve the convex minimization problem.

Suppose entries of X and Y are i.i.d. Gaussian. The method recovers the ground truth: βOPT= β∗.

Works when

n > 2k log p , nConvex,

(assuming k  n  p).

(46)

Background

Spectacular success of convex optimization based techniques,Donoho, Tanner, Wainwright, Hastie, Tibshirani, Candes, Tao:

Replace kβk0=k with a convex constraint kβk1=k and

solve the convex minimization problem.

Suppose entries of X and Y are i.i.d. Gaussian.

The method recovers the ground truth: βOPT= β∗.

Works when

n > 2k log p , nConvex,

(assuming k  n  p).

(47)

Background

Spectacular success of convex optimization based techniques,Donoho, Tanner, Wainwright, Hastie, Tibshirani, Candes, Tao:

Replace kβk0=k with a convex constraint kβk1=k and

solve the convex minimization problem.

Suppose entries of X and Y are i.i.d. Gaussian. The method recovers the ground truth: βOPT= β∗.

Works when

n > 2k log p , nConvex,

(assuming k  n  p).

(48)

Background

Spectacular success of convex optimization based techniques,Donoho, Tanner, Wainwright, Hastie, Tibshirani, Candes, Tao:

Replace kβk0=k with a convex constraint kβk1=k and

solve the convex minimization problem.

Suppose entries of X and Y are i.i.d. Gaussian. The method recovers the ground truth: βOPT= β∗.

Works when

n > 2k log p , nConvex,

(assuming k  n  p).

(49)

Background

Spectacular success of convex optimization based techniques,Donoho, Tanner, Wainwright, Hastie, Tibshirani, Candes, Tao:

Replace kβk0=k with a convex constraint kβk1=k and

solve the convex minimization problem.

Suppose entries of X and Y are i.i.d. Gaussian. The method recovers the ground truth: βOPT= β∗.

Works when

n > 2k log p , nConvex,

(assuming k  n  p).

(50)

Challenging Regime

No algorithmsare known in the regime (except noiseless case)

nINFO ,

nConvex

log1 + σk2

 < n < nConvex

(51)

Challenging Regime

No algorithmsare known in the regime (except noiseless case)

nINFO,

nConvex

log1 + σk2

 < n < nConvex

(52)

Challenging Regime

No algorithmsare known in the regime (except noiseless case)

nINFO,

nConvex

log1 + σk2

 < n < nConvex

(53)

Who is he culprit? Topological obstruction to algorithms

Many other examples: Random K-Sat, MaxCut on random graphs, Stochastic Block Model, Planted Clique, Spiked Tensor problem, Sparse Covariance Estimation problem, Locally Computable One Way Functions, etc, etc.

Many of these problems appear to exhibit a complex solution space topologyin the hard regime and do not exhibit it in the tractable regime

Topological obstruction come in the form of Overlap Gap Property (OGP)

(54)

Who is he culprit? Topological obstruction to algorithms

Many other examples: Random K-Sat, MaxCut on random graphs, Stochastic Block Model, Planted Clique, Spiked Tensor problem, Sparse Covariance Estimation problem, Locally Computable One Way Functions, etc, etc.

Many of these problems appear to exhibit a complex solution space topologyin the hard regime and do not exhibit it in the tractable regime

Topological obstruction come in the form of Overlap Gap Property (OGP)

(55)

Who is he culprit? Topological obstruction to algorithms

Many other examples: Random K-Sat, MaxCut on random graphs, Stochastic Block Model, Planted Clique, Spiked Tensor problem, Sparse Covariance Estimation problem, Locally Computable One Way Functions, etc, etc.

Many of these problems appear to exhibit a complex solution space topologyin the hard regime and do not exhibit it in the tractable regime

Topological obstruction come in the form of Overlap Gap Property (OGP)

(56)

Who is he culprit? Topological obstruction to algorithms

Many other examples: Random K-Sat, MaxCut on random graphs, Stochastic Block Model, Planted Clique, Spiked Tensor problem, Sparse Covariance Estimation problem, Locally Computable One Way Functions, etc, etc.

Many of these problems appear to exhibit a complex solution space topologyin the hard regime and do not exhibit it in the tractable regime

Topological obstruction come in the form of Overlap Gap Property (OGP)

(57)

Who is he culprit? Topological obstruction to algorithms

Many other examples: Random K-Sat, MaxCut on random graphs, Stochastic Block Model, Planted Clique, Spiked Tensor problem, Sparse Covariance Estimation problem, Locally Computable One Way Functions, etc, etc.

Many of these problems appear to exhibit a complex solution space topologyin the hard regime and do not exhibit it in the tractable regime

Topological obstruction come in the form of Overlap Gap Property (OGP)

(58)

Overlap Gap Property

Generic minimization problem with random input X min

θ∈ΘL(θ, X ).

OGP holds if there exists R > 0, such that the set {θ : L(θ, X ) ≤ min

θ∈ΘL(θ, X ) + R}

is disconnected.

(59)

Overlap Gap Property

Generic minimization problem with random input X min

θ∈ΘL(θ, X ).

OGP holds if there exists R > 0, such that the set {θ : L(θ, X ) ≤ min

θ∈ΘL(θ, X ) + R}

is disconnected.

(60)

Overlap Gap Property

Generic minimization problem with random input X min

θ∈ΘL(θ, X ).

OGP holds if there exists R > 0, such that the set {θ : L(θ, X ) ≤ min

θ∈ΘL(θ, X ) + R}

is disconnected.

(61)

Overlap Gap Property

Generic minimization problem with random input X min

θ∈ΘL(θ, X ).

OGP holds if there exists R > 0, such that the set {θ : L(θ, X ) ≤ min

θ∈ΘL(θ, X ) + R}

is disconnected.

(62)

Overlap Gap Property 0 1 ⇣ (⇣) ⌧1 ⌧2 R min ✓ L(✓, X )

Formulas for Hypothesis Testing Module January 21, 2018 2 / 2 0 1 ⇣ (⇣) ⌧1 ⌧2 R min ✓ L(✓, X ) ✓

Formulas for Hypothesis Testing Module January 21, 2018 2 / 2 0 1 ⇣ (⇣) ⌧1 ⌧2 R min ✓ L(✓, X ) ✓

(63)

Overlap Gap Property 0 1 ⇣ (⇣) ⌧1 ⌧2 R min ✓ L(✓, X )

Formulas for Hypothesis Testing Module January 21, 2018 2 / 2

0 1 ⇣

(⇣)

⌧1 ⌧2 R

min

✓ L(✓, X )

Formulas for Hypothesis Testing Module January 21, 2018 2 / 2 0 1 ⇣ (⇣) ⌧1 ⌧2 R min ✓ L(✓, X ) ✓

Formulas for Hypothesis Testing Module January 21, 2018 2 / 2 0 1 ⇣ (⇣) ⌧1 ⌧2 R min ✓ L(✓, X ) ✓

(64)

Overlap Gap Property 0 1 ⇣ (⇣) ⌧1 ⌧2 R min ✓ L(✓, X )

Formulas for Hypothesis Testing Module January 21, 2018 2 / 2

0 1 ⇣

(⇣)

⌧1 ⌧2 R

min

✓ L(✓, X )

Formulas for Hypothesis Testing Module January 21, 2018 2 / 2 0 1 ⇣ (⇣) ⌧1 ⌧2 R min ✓ L(✓, X ) ✓

Formulas for Hypothesis Testing Module January 21, 2018 2 / 2 0 1 ⇣ (⇣) ⌧1 ⌧2 R min ✓ L(✓, X ) ✓

(65)

OGP in models with planted signal

Ground truth signal θ∗is observed via a noisy observation

X =d X |θ∗.

Inferring signal θ∗ involves solving a minimization problem

ˆ

θ =arg min

θ∈ΘL(θ, θ ∗,

X ). with the hope that ˆθ is close to θ∗.

When is it ”hard” to find signal θ∗?

For every ζ ∈ [0, 1] consider Γ(ζ) : min

θ:kθ−θ∗k=ζL(θ, θ

(66)

OGP in models with planted signal

Ground truth signal θ∗is observed via a noisy observation

X =d X |θ∗.

Inferring signal θ∗ involves solving a minimization problem

ˆ

θ =arg min

θ∈ΘL(θ, θ ∗,

X ). with the hope that ˆθ is close to θ∗.

When is it ”hard” to find signal θ∗?

For every ζ ∈ [0, 1] consider Γ(ζ) : min

θ:kθ−θ∗k=ζL(θ, θ

(67)

OGP in models with planted signal

Ground truth signal θ∗is observed via a noisy observation

X =d X |θ∗.

Inferring signal θ∗ involves solving a minimization problem

ˆ

θ =arg min

θ∈ΘL(θ, θ ∗,

X ). with the hope that ˆθ is close to θ∗.

When is it ”hard” to find signal θ∗?

For every ζ ∈ [0, 1] consider Γ(ζ) : min

θ:kθ−θ∗k=ζL(θ, θ

(68)

OGP in models with planted signal

Ground truth signal θ∗is observed via a noisy observation

X =d X |θ∗.

Inferring signal θ∗ involves solving a minimization problem

ˆ

θ =arg min

θ∈ΘL(θ, θ ∗,

X ). with the hope that ˆθ is close to θ∗.

When is it ”hard” to find signal θ∗?

For every ζ ∈ [0, 1] consider Γ(ζ) : min

θ:kθ−θ∗k=ζL(θ, θ

(69)

OGP in models with planted signal

Ground truth signal θ∗is observed via a noisy observation

X =d X |θ∗.

Inferring signal θ∗ involves solving a minimization problem

ˆ

θ =arg min

θ∈ΘL(θ, θ ∗,

X ). with the hope that ˆθ is close to θ∗.

When is it ”hard” to find signal θ∗?

For every ζ ∈ [0, 1] consider Γ(ζ) : min

θ:kθ−θ∗k=ζL(θ, θ

,X ).

(70)

OGP in models with planted signal

Ground truth signal θ∗is observed via a noisy observation

X =d X |θ∗.

Inferring signal θ∗ involves solving a minimization problem

ˆ

θ =arg min

θ∈ΘL(θ, θ ∗,

X ). with the hope that ˆθ is close to θ∗.

When is it ”hard” to find signal θ∗?

For every ζ ∈ [0, 1] consider Γ(ζ) : min

θ:kθ−θ∗k=ζL(θ, θ

(71)

OGP in models with planted signal

Suppose Γ is not monotone.

(⇣) ⇣

(⇣)

Formulas for Hypothesis Testing Module January 20, 2018 2 / 2 0 1 ⇣

(⇣)

Formulas for Hypothesis Testing Module January 20, 2018 2 / 2

0 1 ⇣

(⇣)

Formulas for Hypothesis Testing Module January 20, 2018 2 / 2

(72)

OGP in models with planted signal

Non-monotonicity leads to OGP: every R-optimal solution is either τ1-close or τ2-far from the ground signal θ∗.

(⇣) ⇣

(⇣)

Formulas for Hypothesis Testing Module January 20, 2018 2 / 2 0 1 ⇣ (⇣) 0 1 ⇣ (⇣) 0 1 ⇣ (⇣) ⌧1 ⌧2

Formulas for Hypothesis Testing Module January 20, 2018 2 / 2

0 1 ⇣

(⇣)

⌧1 ⌧2

Formulas for Hypothesis Testing Module January 20, 2018 2 / 2 0 1 ⇣

(⇣)

⌧1 ⌧2 R

Formulas for Hypothesis Testing Module January 20, 2018 2 / 2

(73)

Overlap Gap Property

OGP is a provable obstacle to

local algorithms in sparse random graphsG & Sudan [2014,2017], Rahman & Virag [2014], Coja-Oghlan, Haqshenas & Hetterich [2016],

Markov Chain Monte Carlo methodsG, Jagannath & Sen [2019], G & Zadik [2019]

(74)

Overlap Gap Property

OGP is a provable obstacle to

local algorithms in sparse random graphsG & Sudan [2014,2017], Rahman & Virag [2014], Coja-Oghlan, Haqshenas & Hetterich [2016],

Markov Chain Monte Carlo methodsG, Jagannath & Sen [2019], G & Zadik [2019]

(75)

Overlap Gap Property

OGP is a provable obstacle to

local algorithms in sparse random graphsG & Sudan [2014,2017], Rahman & Virag [2014], Coja-Oghlan, Haqshenas & Hetterich [2016],

Markov Chain Monte Carlo methodsG, Jagannath & Sen [2019], G & Zadik [2019]

(76)

Overlap Gap Property

OGP is a provable obstacle to

local algorithms in sparse random graphsG & Sudan [2014,2017], Rahman & Virag [2014], Coja-Oghlan, Haqshenas & Hetterich [2016],

Markov Chain Monte Carlo methodsG, Jagannath & Sen [2019], G & Zadik [2019]

(77)

Overlap Gap Property

OGP is a provable obstacle to

local algorithms in sparse random graphsG & Sudan [2014,2017], Rahman & Virag [2014], Coja-Oghlan, Haqshenas & Hetterich [2016],

Markov Chain Monte Carlo methodsG, Jagannath & Sen [2019], G & Zadik [2019]

(78)

OGP for the Largest Submatrix Problem

Fix α ∈ (0, 1) and two k × k submatrices C1,C2with

Ave(C1)≈ Ave(C2)≈ αOPT. Theorem (G & Li [2016])

For each 0 < y1,y2<1, the expected number of such pairs

C1,C2with y1k common rows and y2k common columns is

exp (f (α, y1,y2)k log n) ,

where

f (α, y1,y2) =4 − y1− y2−1 + y1 1y2α

2.

(79)

OGP for the Largest Submatrix Problem

Fix α ∈ (0, 1) and two k × k submatrices C1,C2with

Ave(C1)≈ Ave(C2)≈ αOPT.

Theorem (G & Li [2016])

For each 0 < y1,y2<1, the expected number of such pairs

C1,C2with y1k common rows and y2k common columns is

exp (f (α, y1,y2)k log n) ,

where

f (α, y1,y2) =4 − y1− y2−1 + y1 1y2α

2.

(80)

OGP for the Largest Submatrix Problem

Fix α ∈ (0, 1) and two k × k submatrices C1,C2with

Ave(C1)≈ Ave(C2)≈ αOPT. Theorem (G & Li [2016])

For each 0 < y1,y2<1, the expected number of such pairs

C1,C2with y1k common rows and y2k common columns is

exp (f (α, y1,y2)k log n) ,

where

f (α, y1,y2) =4 − y1− y2−1 + y1 1y2α

2.

(81)

OGP for the Largest Submatrix Problem

Fix α ∈ (0, 1) and two k × k submatrices C1,C2with

Ave(C1)≈ Ave(C2)≈ αOPT. Theorem (G & Li [2016])

For each 0 < y1,y2<1, the expected number of such pairs

C1,C2with y1k common rows and y2k common columns is

exp (f (α, y1,y2)k log n) ,

where

f (α, y1,y2) =4 − y1− y2−1 + y1 1y2α

2.

(82)

No OGP when0 < α < 0.9625...

Color points correspond to positive value of f .

1

y

2

y

No OGP when 0 < α < 0.9625... .

(83)

No OGP when0 < α < 0.9625...

Color points correspond to positive value of f .

1

y

2

y

No OGP when 0 < α < 0.9625... .

(84)

No OGP when0 < α < 0.9625...

Color points correspond to positive value of f .

1

y

2

y

No OGP when 0 < α < 0.9625... .

(85)

No OGP when0 < α < 0.9625...

Color points correspond to positive value of f .

1

y

2

y

No OGP when 0 < α < 0.9625... .

(86)

OGP when0.9625... < α < 1

The set of overlaps exhibits a gap.

1

y

2

(87)

OGP when0.9625... < α < 1

The set of overlaps exhibits a gap.

1

y

2

(88)

OGP in High-Dimensional Sparse Regression Problem

Consider the optimization problem parametrized by ζ ∈ (0, 1)

Γ(ζ) , min

β kY − X βk2

Subject to: kβk0=k,

(89)

OGP in High-Dimensional Sparse Regression Problem

Consider the optimization problem parametrized by ζ ∈ (0, 1)

Γ(ζ) , min

β kY − X βk2

Subject to: kβk0=k,

(90)

Plot ofΓ. Monotonicity

(91)

Plot ofΓ. Non-monotonicity and OGP

(92)

Plot ofΓ. Non-monotonicity and OGP

(93)

OGP Theorem

Theorem (G & Zadik [2017])

The OGP provably takes place when n ≤ cnConvex for

sufficiently small constant c > 0.

As a consequence, (a variant of) Gradient Descent

algorithm fails to find the ground truth regression vector β∗.

Conjecturally MCMC fails as well.

On the other hand, when n ≥ cnConvex and c is sufficiently

large,

No local minimum exist except β∗.

The algorithm based on local improvement finds the ground truth vector fast.

(94)

OGP Theorem

Theorem (G & Zadik [2017])

The OGP provably takes place when n ≤ cnConvex for

sufficiently small constant c > 0.

As a consequence, (a variant of) Gradient Descent

algorithm fails to find the ground truth regression vector β∗.

Conjecturally MCMC fails as well.

On the other hand, when n ≥ cnConvex and c is sufficiently

large,

No local minimum exist except β∗.

The algorithm based on local improvement finds the ground truth vector fast.

(95)

OGP Theorem

Theorem (G & Zadik [2017])

The OGP provably takes place when n ≤ cnConvex for

sufficiently small constant c > 0.

As a consequence, (a variant of) Gradient Descent

algorithm fails to find the ground truth regression vector β∗.

Conjecturally MCMC fails as well.

On the other hand, when n ≥ cnConvex and c is sufficiently

large,

No local minimum exist except β∗.

The algorithm based on local improvement finds the ground truth vector fast.

(96)

OGP Theorem

Theorem (G & Zadik [2017])

The OGP provably takes place when n ≤ cnConvex for

sufficiently small constant c > 0.

As a consequence, (a variant of) Gradient Descent

algorithm fails to find the ground truth regression vector β∗.

Conjecturally MCMC fails as well.

On the other hand, when n ≥ cnConvex and c is sufficiently

large,

No local minimum exist except β∗.

The algorithm based on local improvement finds the ground truth vector fast.

(97)

OGP Theorem

Theorem (G & Zadik [2017])

The OGP provably takes place when n ≤ cnConvex for

sufficiently small constant c > 0.

As a consequence, (a variant of) Gradient Descent

algorithm fails to find the ground truth regression vector β∗.

Conjecturally MCMC fails as well.

On the other hand, when n ≥ cnConvex and c is sufficiently

large,

No local minimum exist except β∗.

The algorithm based on local improvement finds the ground truth vector fast.

(98)

The OGP in Spiked Matrix Detection

For a certain range of values λINFO< λ < λCOMPthe model

exhibits the OGP

(99)

The OGP in Spiked Matrix Detection

For a certain range of values λINFO < λ < λCOMPthe model

exhibits the OGP

(100)

Conclusions

Lessons learned:

Statistical physics offers a ”new theory” of complexity of inference problems arising in high-dimensional statistical models

Challenge: the link between this new complexity theory and the classical complexity paradigms such as

NP-hardness needs to be understood.

(101)

Conclusions

Lessons learned:

Statistical physics offers a ”new theory” of complexity of inference problems arising in high-dimensional statistical models

Challenge: the link between this new complexity theory and the classical complexity paradigms such as

NP-hardness needs to be understood.

(102)

Conclusions

Lessons learned:

Statistical physics offers a ”new theory” of complexity of inference problems arising in high-dimensional statistical models

Challenge: the link between this new complexity theory and the classical complexity paradigms such as

NP-hardness needs to be understood.

(103)

Conclusions

Lessons learned:

Statistical physics offers a ”new theory” of complexity of inference problems arising in high-dimensional statistical models

Challenge: the link between this new complexity theory and the classical complexity paradigms such as

NP-hardness needs to be understood.

(104)

Conclusions

Lessons learned:

Statistical physics offers a ”new theory” of complexity of inference problems arising in high-dimensional statistical models

Challenge: the link between this new complexity theory and the classical complexity paradigms such as

NP-hardness needs to be understood.

(105)

References

Related documents

Moreover, in successful post-conflict transitions, the increase in government size involves an increase in the incidence of capital expenditure relative to government

maritime cabotage law and the Jones Act, considering (1) the current economic effects of the Jones Act in detail, and (2) how the energy market currently deals with the Act. In

The case study aims to apply process mining analysis to an existing asset management process in order to explore whether the process deviates from the planned process.. It

When research is carried out using patients or staff from this Trust, the same procedure must take place in relation to the processing of the personal information. Information

Merchant warrants and represents to Global Direct and Member: (a) that each sales transaction delivered hereunder will represent a bona fide sale to a card holder by Merchant for

The largest systematic uncertainties on the results of the angular analysis often arise from the understanding of the detector acceptance, labelled “Acceptance” in the table, which

Our method consists of (1) a novel input representation of an entity and it’s surrounding world context, (2) a neu- ral network mapping such past and present world represen- tation

When a mobile node sends position and time information to an external service, the anonymity server decrypts the message, removes any identifiers such as network ad- dresses,