Primal-Dual Methods for p-modulus on Graphs


Dominique Zosso

Montana State University | Department of Mathematical Sciences | http://www.math.montana.edu/zosso

SIAM Central States Section Meeting | University of Arkansas at Little Rock | 2016-10-02


[p-Modulus]


Definitions I

Definition (p-Energy)

Consider a weighted undirected graph G(V , E, σ), where V is the set of vertices, E the set of edges, and σ : E → R⁺ non-negative edge-weights.

Let ρ: E → R and 1 ≤ p < ∞.

Then the p-energy is the quantity

\[
\mathcal{E}_{p,\sigma}(\rho) := \sum_{e \in E} \sigma(e)\,|\rho(e)|^p .
\]

Definition

Γ: family of objects γ (walks, trees, ...) on G

Definitions II

Definition (object cost)

Given G(V , E, σ) and ρ we define the cost of an object γ:

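\[
\ell_\rho(\gamma) := \sum_{e \in E} N_\Gamma(\gamma, e)\,\rho(e) = N_\gamma \rho = (N_\Gamma \rho)(\gamma),
\]

where N_Γ(γ, e) counts how often the object γ uses the edge e, and N_γ is the row of N_Γ belonging to γ.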

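Definition (ρ admissibility)

The function ρ is admissible if

\[
\forall \gamma \in \Gamma:\ \ell_\rho(\gamma) \ge 1,
\qquad\text{equivalently}\qquad
\inf_{\gamma \in \Gamma} \ell_\rho(\gamma) \ge 1 .
\]

Set of admissible ρ:

\[
A(\Gamma) := \{\rho \mid \forall \gamma \in \Gamma:\ \ell_\rho(\gamma) \ge 1\}
\]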

Definitions III

Definition (p-Modulus)

Given G(V , E, σ), p, and Γ, we define the p-Modulus of the family Γ:

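\[
\operatorname{Mod}_{p,\sigma}(\Gamma) := \inf_{\rho \in A(\Gamma)} \mathcal{E}_{p,\sigma}(\rho)
\]

Computational goal:

\[
\min_{\rho}\ \mathcal{E}_{p,\sigma}(\rho) \quad\text{s.t.}\quad N_\Gamma \rho \ge \mathbf{1} \ \text{(elementwise)}. \tag{1}
\]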
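The three definitions (p-energy, object cost, admissibility) translate directly into a few lines of NumPy. The following is a minimal sketch, not part of the talk: the helper names and the toy graph (4 unit-weight edges, two paths of two edges each) are assumptions made for illustration.

```python
import numpy as np

def p_energy(rho, sigma, p):
    """p-energy: sum over edges of sigma(e) * |rho(e)|^p."""
    return float(np.sum(sigma * np.abs(rho) ** p))

def object_costs(N, rho):
    """Costs of all objects at once: the vector N @ rho."""
    return N @ rho

def is_admissible(N, rho, tol=1e-12):
    """rho is admissible if every object has cost at least 1."""
    return bool(np.all(object_costs(N, rho) >= 1.0 - tol))

# Hypothetical toy data: 4 edges with unit weights and two objects (paths);
# the first path uses edges 0 and 1, the second uses edges 2 and 3.
sigma = np.ones(4)
N = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
rho = np.full(4, 0.5)

costs = object_costs(N, rho)       # each path costs 0.5 + 0.5 = 1
energy = p_energy(rho, sigma, 2)   # 4 * 0.25 = 1
admissible = is_admissible(N, rho)
```

Scaling this ρ down (say to 0.4 on every edge) lowers the energy but makes both costs drop below 1, so the constraint in (1) is what keeps the minimization from collapsing to ρ = 0.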

[Primal-Dual Hybrid Gradients]

Primal-Dual formulation

The Legendre-Fenchel transform is used in the following primal-dual equivalence:

Theorem (Ekeland and Témam, 1976)

Let F : W → R be a closed and convex functional on the set W , G : X → R a closed and convex functional, and K : X → W be a continuous linear operator. Then we have the following equivalence:

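\[
\underbrace{\min_{x \in X} \big\{ F(Kx) + G(x) \big\}}_{\text{Primal}}
=
\underbrace{\min_{x \in X} \max_{\varphi \in W^*} \big\{ \langle Kx, \varphi \rangle - F^*(\varphi) + G(x) \big\}}_{\text{Primal-Dual}}
\tag{2}
\]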

where x and φ are the primal and dual variables, respectively, and F^∗ is the convex conjugate (Legendre-Fenchel transform) of F.


Primal-Dual Hybrid Gradients

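Theorem (Chambolle and Pock, 2011)

The iterative scheme

\begin{align}
\varphi^{n+1} &= \arg\min_{\varphi \in W^*} \Big\{ -\langle K\bar{x}^{n}, \varphi \rangle + F^*(\varphi) + \tfrac{1}{2 r_1} \|\varphi^{n} - \varphi\|_2^2 \Big\} \tag{3}\\
x^{n+1} &= \arg\min_{x \in X} \Big\{ \langle x, K^*\varphi^{n+1} \rangle + G(x) + \tfrac{1}{2 r_2} \|x^{n} - x\|_2^2 \Big\} \tag{4}\\
\bar{x}^{n+1} &= x^{n+1} + \theta\,(x^{n+1} - x^{n}) \tag{5}
\end{align}

for θ ∈ [0, 1] converges to the saddle point for r1 r2 ≤ 1/L², where L = ‖K‖ is the operator (induced) norm of K, i.e., L² is the spectral radius of K^∗K. θ = 0 corresponds to the Arrow-Hurwicz algorithm.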
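The step-size condition r1 r2 ≤ 1/L² needs an estimate of L = ‖K‖. A minimal sketch, assuming K is available as a dense NumPy matrix: power iteration on KᵀK, followed by one possible (assumed, not prescribed by the talk) choice r1 = r2 = 0.99/L. The function name and the example matrix are hypothetical.

```python
import numpy as np

def operator_norm(K, iters=500, seed=0):
    """Estimate L = ||K||_2 by power iteration on K^T K."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(K.shape[1])
    x /= np.linalg.norm(x)
    for _ in range(iters):
        y = K.T @ (K @ x)
        norm_y = np.linalg.norm(y)
        if norm_y == 0.0:          # K maps x to zero; K may be the zero operator
            return 0.0
        x = y / norm_y
    # Rayleigh quotient of K^T K at x approximates L^2
    return float(np.sqrt(x @ (K.T @ (K @ x))))

# Hypothetical operator: two rows with disjoint support, so K K^T = 2 I.
K = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
L = operator_norm(K)
r1 = r2 = 0.99 / L                 # r1 * r2 = 0.9801 / L^2 <= 1 / L^2
```

For small dense matrices `np.linalg.norm(K, 2)` returns the same value directly; power iteration only pays off when K is large or available only through matrix-vector products.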


Primal-Dual formulation I

We rewrite

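\[
\min_{\rho}\ \mathcal{E}_{p,\sigma}(\rho) \quad\text{s.t.}\quad N_\Gamma \rho \ge \mathbf{1} \ \text{(elementwise)}
\]

as

\[
\min_{\rho}\ \mathcal{E}_{p,\sigma}(\rho) + \chi(N\rho)
\]

with the barrier function

\[
\chi(\mu) :=
\begin{cases}
0 & \text{if } \mu \ge \mathbf{1} \\
\infty & \text{otherwise}
\end{cases}
\]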

Primal-Dual formulation II

We compute the dual (convex conjugate) of the barrier function:

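\begin{align}
\chi^*(\lambda) &:= \sup_{\mu}\ \big\{ \langle \lambda, \mu \rangle - \chi(\mu) \big\} \tag{6a}\\
&= \begin{cases}
\langle \lambda, \mathbf{1} \rangle & \text{if } \lambda \le \mathbf{0} \\
\infty & \text{otherwise}
\end{cases} \tag{6b}
\end{align}

Thus the original problem becomes equivalent to:

\begin{align}
& \min_{\rho} \max_{\lambda \le \mathbf{0}}\ \mathcal{E}_{p,\sigma}(\rho) + \langle \lambda, N\rho \rangle - \chi^*(\lambda) \notag\\
={} & \min_{\rho} \max_{\lambda \le \mathbf{0}}\ \mathcal{E}_{p,\sigma}(\rho) + \langle \lambda, N\rho - \mathbf{1} \rangle \tag{7}
\end{align}

“Obvious”: (7) is the Lagrangian of the constrained problem (1), with multipliers λ ≤ 0.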
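The step from (6a) to (6b) can be spelled out componentwise. The short derivation below is a sketch added for completeness; the index notation λ_γ, μ_γ for the components belonging to the object γ is an assumption of this note.

```latex
\begin{align*}
\chi^*(\lambda)
  &= \sup_{\mu}\ \big\{ \langle \lambda, \mu \rangle - \chi(\mu) \big\}
   = \sup_{\mu \ge \mathbf{1}} \langle \lambda, \mu \rangle
   = \sum_{\gamma} \sup_{\mu_\gamma \ge 1} \lambda_\gamma \mu_\gamma, \\
\sup_{\mu_\gamma \ge 1} \lambda_\gamma \mu_\gamma
  &= \begin{cases}
       \lambda_\gamma & \text{if } \lambda_\gamma \le 0
         \quad (\text{attained at } \mu_\gamma = 1), \\
       \infty & \text{if } \lambda_\gamma > 0
         \quad (\text{let } \mu_\gamma \to \infty).
     \end{cases}
\end{align*}
```

Summing over γ gives ⟨λ, 1⟩ when every component of λ is non-positive and ∞ as soon as one component is positive, which is exactly (6b).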


Primal-Dual core algorithm I

Now, PDHG (Chambolle-Pock) suggests the following iterative scheme:

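\begin{align}
\rho^{n+1} &= \arg\min_{\rho}\ \mathcal{E}_{p,\sigma}(\rho) + \langle N^T \bar{\lambda}^{n}, \rho \rangle + \tfrac{1}{2 r_1} \|\rho - \rho^{n}\|_2^2 \tag{8a}\\
\lambda^{n+1} &= \arg\min_{\lambda \le \mathbf{0}}\ \langle \lambda, \mathbf{1} - N\rho^{n+1} \rangle + \tfrac{1}{2 r_2} \|\lambda - \lambda^{n}\|_2^2 \tag{8b}\\
\bar{\lambda}^{n+1} &= 2\lambda^{n+1} - \lambda^{n} \tag{8c}
\end{align}

Here λ̄ is the over-relaxed dual variable (θ = 1). Relative to (3)-(5) the roles are swapped: the primal variable ρ is updated first, and the extrapolation acts on the dual variable λ.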


Primal-Dual core algorithm II

Primal Update:

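\[
\rho^{n+1} = \arg\min_{\rho}\ \mathcal{E}_{p,\sigma}(\rho) + \tfrac{1}{2 r_1} \big\| \rho - (\rho^{n} - r_1 N^T \bar{\lambda}^{n}) \big\|_2^2
\]

with λ̄ⁿ the over-relaxed dual iterate of (8c). The solution depends on p, e.g.,

p = 1:

\[
\rho^{n+1} = \operatorname{shrink}(\rho^{n} - r_1 N^T \bar{\lambda}^{n},\ r_1 \sigma),
\quad\text{where}\quad
\operatorname{shrink}(z, \tau)(e) :=
\begin{cases}
z(e) - \tau(e) & \text{if } z(e) \ge \tau(e) \\
0 & \text{if } |z(e)| < \tau(e) \\
z(e) + \tau(e) & \text{if } z(e) \le -\tau(e)
\end{cases}
\]

p = 2:

\[
\rho^{n+1} = (I + 2 r_1 S)^{-1} (\rho^{n} - r_1 N^T \bar{\lambda}^{n}),
\quad\text{where } S := \operatorname{diag}(\sigma)
\]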
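Both closed-form primal updates are elementwise operations. A minimal NumPy sketch, with hypothetical function names; `lam_bar` stands for the over-relaxed dual iterate, and the test vector is made up for illustration.

```python
import numpy as np

def shrink(z, tau):
    """Soft-thresholding: the three-case shrink formula in one expression."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def primal_update_p1(rho, lam_bar, N, sigma, r1):
    """p = 1: shrink the gradient step by the edge-wise thresholds r1 * sigma."""
    return shrink(rho - r1 * (N.T @ lam_bar), r1 * sigma)

def primal_update_p2(rho, lam_bar, N, sigma, r1):
    """p = 2: (I + 2 r1 S) is diagonal, so its inverse is an elementwise division."""
    return (rho - r1 * (N.T @ lam_bar)) / (1.0 + 2.0 * r1 * sigma)

z = np.array([1.5, 0.2, -0.7])
tau = np.array([0.5, 0.5, 0.5])
shrunk = shrink(z, tau)            # [1.0, 0.0, -0.2]
```

Because S is diagonal, the p = 2 update never requires forming or factorizing a matrix; its cost is dominated by the product with Nᵀ.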


Primal-Dual core algorithm III

Dual update:

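\[
\lambda^{n+1} = \arg\min_{\lambda \le \mathbf{0}}\ \tfrac{1}{2 r_2} \big\| \lambda - (\lambda^{n} + r_2 (N\rho^{n+1} - \mathbf{1})) \big\|_2^2
\]

Projection onto the (closed convex) set of non-positive λ:

\begin{align}
\lambda^{*} &= \lambda^{n} + r_2 (N\rho^{n+1} - \mathbf{1}) \tag{9a}\\
\lambda^{n+1} &= \min(\lambda^{*}, \mathbf{0}) \quad\text{(elementwise)} \tag{9b}
\end{align}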


Primal-Dual core algorithm IV

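Complete instruction set (for p = 2):

\begin{align}
\rho^{n+1} &= (I + 2 r_1 S)^{-1} (\rho^{n} - r_1 N^T \bar{\lambda}^{n}) \tag{10a}\\
\lambda^{*} &= \lambda^{n} + r_2 (N\rho^{n+1} - \mathbf{1}) \tag{10b}\\
\lambda^{n+1} &= \min(\lambda^{*}, \mathbf{0}) \quad\text{(elementwise)} \tag{10c}\\
\bar{\lambda}^{n+1} &= 2\lambda^{n+1} - \lambda^{n} \tag{10d}
\end{align}

Note: in the general case 1 < p ≠ 2, (10a) involves an inner optimization.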
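The four instructions for p = 2 fit into a short loop. The following is a sketch under stated assumptions, not the implementation used for the results in this talk: the function name is hypothetical, the step sizes r1 = r2 = 0.99/‖N‖ are one admissible choice, and a fixed iteration count replaces a proper stopping criterion. The toy constraint matrix (two disjoint two-edge paths, unit weights) is made up for illustration.

```python
import numpy as np

def modulus_pdhg_p2(N, sigma, iters=2000):
    """Iterate (10a)-(10d) for p = 2; returns (rho, lam, energy)."""
    n_obj, n_edges = N.shape
    L = np.linalg.norm(N, 2)           # operator norm of N
    r1 = r2 = 0.99 / L                 # assumed choice with r1 * r2 <= 1 / L^2
    rho = np.zeros(n_edges)
    lam = np.zeros(n_obj)
    lam_bar = lam.copy()               # over-relaxed dual variable
    for _ in range(iters):
        rho = (rho - r1 * (N.T @ lam_bar)) / (1.0 + 2.0 * r1 * sigma)  # (10a)
        lam_star = lam + r2 * (N @ rho - 1.0)                          # (10b)
        lam_new = np.minimum(lam_star, 0.0)                            # (10c)
        lam_bar = 2.0 * lam_new - lam                                  # (10d)
        lam = lam_new
    return rho, lam, float(np.sum(sigma * rho ** 2))

# Hypothetical constraint matrix: two paths with two unit-weight edges each.
N = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
rho, lam, energy = modulus_pdhg_p2(N, np.ones(4))
```

At the saddle point of this toy problem ρ = 0.5 on every edge, λ = −1 for both paths, and the energy is 1; both multipliers are strictly negative, so both constraints are active.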


[Essential family of objects]


How to get N_Γ? I

Given a graph and objects of interest (walks, trees, ...), the family Γ can become extremely big, thus NΓ extremely tall.

There exists an essential subfamily of objects that “spans” the set A(Γ) of admissible ρ.

Equivalently:

- inessential objects = rows of N_Γ corresponding to inactive constraints: λ(γ) = 0;
- essential family = rows of N_Γ for which λ(γ) < 0.

How to get N_Γ? II

Idea: Construct the essential N_Γ greedily.

1. Start with Γ⁽⁰⁾ := {}, i.e., N⁽⁰⁾ empty (0 × |E|), and ρ⁽⁰⁾ = 0.
2. Find a γ⁽ⁿ⁺¹⁾ ∈ Γ such that ℓ_ρ⁽ⁿ⁾(γ⁽ⁿ⁺¹⁾) < 1 (− ε). If none is found: done.
3. Add γ⁽ⁿ⁺¹⁾ to the constraints: Γ⁽ⁿ⁺¹⁾ := Γ⁽ⁿ⁾ ∪ {γ⁽ⁿ⁺¹⁾}, i.e., append the row N_γ⁽ⁿ⁺¹⁾ to N⁽ⁿ⁾ to obtain N⁽ⁿ⁺¹⁾.
4. Compute Mod_{p,σ}(Γ⁽ⁿ⁺¹⁾) and let ρ⁽ⁿ⁺¹⁾ achieve the minimum.
5. Optional housekeeping = remove inactive constraints: Γ⁽ⁿ⁺¹⁾ := Γ⁽ⁿ⁺¹⁾ \ {γ} for each γ with λ⁽ⁿ⁺¹⁾(γ) = 0.
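Steps 1 to 4 of the greedy construction can be sketched for walks between two vertices and p = 2. Everything below is an illustrative assumption rather than the talk's code: the function names, the PDHG inner solver with a fixed iteration count, Dijkstra on edge lengths ρ as the violation oracle, and the 4-cycle test graph. Housekeeping (step 5) is omitted.

```python
import heapq
import numpy as np

def solve_mod2(N, sigma, iters=5000):
    """Inner solver (step 4): PDHG for p = 2 on the current constraints."""
    r1 = r2 = 0.99 / np.linalg.norm(N, 2)
    rho, lam = np.zeros(N.shape[1]), np.zeros(N.shape[0])
    lam_bar = lam.copy()
    for _ in range(iters):
        rho = (rho - r1 * (N.T @ lam_bar)) / (1.0 + 2.0 * r1 * sigma)
        lam_new = np.minimum(lam + r2 * (N @ rho - 1.0), 0.0)
        lam_bar, lam = 2.0 * lam_new - lam, lam_new
    return rho, lam

def shortest_path_row(edges, rho, n_vertices, s, t):
    """Dijkstra with edge lengths rho; returns (cost, usage row of the path)."""
    adj = [[] for _ in range(n_vertices)]
    for k, (u, v) in enumerate(edges):
        adj[u].append((v, k))
        adj[v].append((u, k))
    dist = [float("inf")] * n_vertices
    prev = [None] * n_vertices             # (previous vertex, edge index)
    dist[s] = 0.0
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                       # stale heap entry
        for v, k in adj[u]:
            nd = d + max(rho[k], 0.0)      # clamp tiny negative round-off
            if nd < dist[v]:
                dist[v], prev[v] = nd, (u, k)
                heapq.heappush(heap, (nd, v))
    row = np.zeros(len(edges))
    v = t
    while v != s:                          # walk back from t to s
        u, k = prev[v]
        row[k] += 1.0
        v = u
    return float(row @ rho), row

def greedy_modulus(edges, sigma, n_vertices, s, t, eps=1e-6, max_steps=50):
    rho = np.zeros(len(edges))             # step 1: no constraints, rho = 0
    rows = []
    for _ in range(max_steps):
        cost, row = shortest_path_row(edges, rho, n_vertices, s, t)
        if cost >= 1.0 - eps:              # step 2: no violated walk left
            break
        rows.append(row)                   # step 3: append the new row
        rho, _ = solve_mod2(np.array(rows), sigma)   # step 4
    return rho, np.array(rows), float(np.sum(sigma * rho ** 2))

# Hypothetical 4-cycle 0-1-3-2-0 with unit weights; walks from vertex 0 to 3.
edges = [(0, 1), (1, 3), (0, 2), (2, 3)]
rho, N, mod = greedy_modulus(edges, np.ones(4), 4, s=0, t=3)
```

On this toy graph the loop stops after two added paths with ρ = 0.5 on every edge and energy 1; the intermediate state after the first path has energy 0.5.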

How to get N_Γ? III

Reasonable questions:

How to find γ⁽ⁿ⁺¹⁾ ∈ Γ?

- Pick the one that violates “most” (not unique).
- Object = walk on G from v to w (v, w ∈ V): shortest path (Dijkstra).
- Object = spanning tree of G: minimal spanning tree (Kruskal).

Housekeeping:

- Necessary?
- Do we get the essential family?
- Cycles?

Numerical stability?
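For spanning trees, the most violated object is a minimal spanning tree with respect to the current ρ. A minimal sketch of Kruskal's algorithm with a union-find structure; the function name, the 5-edge example graph, and the ρ values are assumptions made for illustration.

```python
import numpy as np

def min_spanning_tree_row(edges, rho, n_vertices):
    """Kruskal with union-find; returns (cost, 0/1 usage row of the tree)."""
    parent = list(range(n_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    row = np.zeros(len(edges))
    for k in np.argsort(rho, kind="stable"):   # cheapest edges first
        u, v = edges[k]
        ru, rv = find(u), find(v)
        if ru != rv:                       # edge joins two components
            parent[ru] = rv
            row[k] = 1.0
    return float(row @ rho), row

# Hypothetical example: 4-cycle plus one diagonal, with some current rho.
edges = [(0, 1), (1, 3), (3, 2), (2, 0), (0, 3)]
rho = np.array([0.5, 0.25, 0.0, 0.0, 0.25])
cost, row = min_spanning_tree_row(edges, rho, 4)
```

Here the cheapest tree costs 0.25 < 1, so its row would be appended to the constraint matrix. Ties between equally cheap edges are broken by edge index, which is one reason the selected object is not unique.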

Bounds on Mod_{p,σ}(Γ) I

As we search for the essential family, based on current iterates, what can we say about the value of Modp,σ(Γ)?

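Monotonicity: for Γ⁽ⁿ⁾ ⊂ Γ⁽ⁿ⁺¹⁾ ⊆ Γ,

\[
\operatorname{Mod}_{p,\sigma}(\Gamma^{(n)}) \le \operatorname{Mod}_{p,\sigma}(\Gamma^{(n+1)}) \le \operatorname{Mod}_{p,\sigma}(\Gamma)
\]

Bounds on Mod_{p,σ}(Γ) II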

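Also: given the current ρ⁽ⁿ⁾, let γ^∗ = arg min_{γ∈Γ} ℓ_ρ⁽ⁿ⁾(γ), with ℓ_ρ⁽ⁿ⁾(γ^∗) < 1.

Then the rescaled density is admissible:

\[
\rho^* := \frac{\rho^{(n)}}{\ell_{\rho^{(n)}}(\gamma^*)} \in A(\Gamma).
\]

Now:

\begin{align*}
\operatorname{Mod}_{p,\sigma}(\Gamma)
  &= \inf_{\rho \in A(\Gamma)} \mathcal{E}_{p,\sigma}(\rho) \\
  &\le \mathcal{E}_{p,\sigma}(\rho^*) \\
  &= \frac{1}{\ell_{\rho^{(n)}}(\gamma^*)^p}\, \mathcal{E}_{p,\sigma}(\rho^{(n)}) \\
  &= \frac{1}{\ell_{\rho^{(n)}}(\gamma^*)^p}\, \operatorname{Mod}_{p,\sigma}(\Gamma^{(n)})
\end{align*}

Therefore:

\[
\operatorname{Mod}_{p,\sigma}(\Gamma^{(n)}) \le \operatorname{Mod}_{p,\sigma}(\Gamma) \le \frac{1}{\ell_{\rho^{(n)}}(\gamma^*)^p}\, \operatorname{Mod}_{p,\sigma}(\Gamma^{(n)}).
\]

If ℓ_ρ⁽ⁿ⁾(γ^∗) = 0, the rescaling is undefined and the upper bound is ∞.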
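The two-sided bound is cheap to evaluate once the cost of the cheapest object is known. A minimal sketch with a hypothetical function name; it assumes that ρ is the minimizer for the current subfamily, so that its energy equals the lower bound. The example values are made up for a toy graph with four unit-weight edges.

```python
import math
import numpy as np

def modulus_bounds(rho, sigma, p, min_cost):
    """Return (lower, upper) bounds on the modulus of the full family.

    rho      : minimizer for the current subfamily (assumed optimal)
    min_cost : cost of the cheapest object of the full family under rho
    """
    lower = float(np.sum(sigma * np.abs(rho) ** p))
    if min_cost >= 1.0:                # rho is already admissible for the full family
        return lower, lower
    if min_cost <= 0.0:                # rescaling impossible: no finite upper bound
        return lower, math.inf
    return lower, lower / min_cost ** p

sigma = np.ones(4)
b1 = modulus_bounds(np.array([0.5, 0.5, 0.0, 0.0]), sigma, 2, min_cost=0.0)
b2 = modulus_bounds(np.full(4, 0.5), sigma, 2, min_cost=1.0)
b3 = modulus_bounds(np.array([1/3, 1/3, 1/3, 0.0]), sigma, 2, min_cost=1/3)
```

The first call gives (0.5, ∞) and the second gives (1, 1), the pattern of a gap that closes once no violated object remains. The third shows the rescaling factor at work: a cheapest cost of 1/3 inflates the lower bound by a factor of 9.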


[Results]


Walks on simple graph

[Figure: successive states of the greedy algorithm on a 4-vertex graph (vertices 1 to 4); each edge is labeled with σ(e), ρ(e).]

Bounds after each step:

0 ≤ Mod_{2,σ}(Γ) ≤ ∞
0.5 ≤ Mod_{2,σ}(Γ) ≤ ∞
1 ≤ Mod_{2,σ}(Γ) ≤ 1

Walks on less simple graph I

[Figure: 16-vertex graph (vertices 1 to 16 in a 4 × 4 layout) with 24 weighted edges, each labeled with σ(e), ρ(e). Initial state: ρ(e) = 0 on every edge.]

0 ≤ Mod_{2,σ}(Γ) ≤ ∞

Walks on less simple graph II

[Figure: same 16-vertex graph, edges labeled with σ(e), ρ(e). One edge (σ(e) = 0.614) carries ρ(e) = 1; ρ(e) = 0 on all other edges.]

0.6144 ≤ Mod_{2,σ}(Γ) ≤ ∞

Walks on less simple graph III

[Figure: same 16-vertex graph, edges labeled with σ(e), ρ(e); later state of the greedy algorithm.]

0.7676 ≤ Mod_{2,σ}(Γ) ≤ ∞

Walks on less simple graph IV

[Figure: same 16-vertex graph, edges labeled with σ(e), ρ(e); later state of the greedy algorithm.]

0.9192 ≤ Mod_{2,σ}(Γ) ≤ ∞

Walks on less simple graph V

[Figure: same 16-vertex graph, edges labeled with σ(e), ρ(e); later state of the greedy algorithm.]

1.015 ≤ Mod_{2,σ}(Γ) ≤ 2.751

Walks on less simple graph VI

[Figure: same 16-vertex graph, edges labeled with σ(e), ρ(e); later state of the greedy algorithm.]

1.06 ≤ Mod_{2,σ}(Γ) ≤ 2.112

Walks on less simple graph VII

[Figure: same 16-vertex graph, edges labeled with σ(e), ρ(e); final state, ρ(e) > 0 on every edge.]

1.096 ≤ Mod_{2,σ}(Γ) ≤ 1.096

Spanning tree on simple graph I

[Figure: successive states of the greedy algorithm on a 4-vertex graph (vertices 1 to 4) with unit edge weights; each edge is labeled with σ(e), ρ(e).]

Bounds after each step:

0 ≤ Mod_{2,σ}(Γ) ≤ ∞
0.3333 ≤ Mod_{2,σ}(Γ) ≤ 3
0.5 ≤ Mod_{2,σ}(Γ) ≤ 0.8889

Spanning tree on simple graph II

[Figure: further states of the greedy algorithm on the same 4-vertex graph; each edge is labeled with σ(e), ρ(e). Of the last panel only the edge label 1, 0.333 survives.]

Bounds after each step:

0.5385 ≤ Mod_{2,σ}(Γ) ≤ 0.6319
0.5455 ≤ Mod_{2,σ}(Γ) ≤ 0.66

Spanning tree on large graph I

Random geometric graph:

- 6,000 vertices drawn from [0, 1]²
- ε-neighbors graph: ε = 0.05
- 134,826 edges

Family of spanning trees; each tree uses 5,999 edges

|Γ| ≈ 10⁹⁷⁸⁷ (!)

p = 2
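The size of a spanning-tree family can be obtained without enumeration from Kirchhoff's matrix-tree theorem: the number of spanning trees equals any cofactor of the graph Laplacian. The sketch below is an illustration with a hypothetical function name and a dense Laplacian; how the count on this slide was actually obtained is not stated in the talk.

```python
import numpy as np

def log10_spanning_tree_count(edges, n_vertices):
    """log10 of the number of spanning trees (matrix-tree theorem)."""
    lap = np.zeros((n_vertices, n_vertices))
    for u, v in edges:
        lap[u, u] += 1.0
        lap[v, v] += 1.0
        lap[u, v] -= 1.0
        lap[v, u] -= 1.0
    # Delete one row and column; the determinant of the rest is the count.
    # slogdet works in log space, so huge counts do not overflow.
    sign, logdet = np.linalg.slogdet(lap[1:, 1:])
    if sign <= 0:
        return -np.inf                 # disconnected graph: no spanning tree
    return float(logdet / np.log(10.0))

# Small checks: the complete graph K4 has 16 spanning trees, a 4-cycle has 4.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
c4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
count_k4 = 10 ** log10_spanning_tree_count(k4, 4)
count_c4 = 10 ** log10_spanning_tree_count(c4, 4)
```

Working with the log-determinant is what makes counts of the order reported here representable at all, since they exceed the range of double-precision floats by thousands of orders of magnitude.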


Spanning tree on large graph II

Some hours of number crunching:

Bounds lower ≤ Mod_{2,σ}(Γ) ≤ upper after n steps:

  n     lower        upper
  1     0            ∞
  2     0.0001985    3.099
  3     0.0003876    1.203
  4     0.0005676    0.5809
  5     0.0007387    0.3366
  10    0.00138      0.0941
  20    0.002533     0.03805
  50    0.003681     0.005329
  100   0.003728     0.004021
  200   0.003735     0.003821
  500   0.003736     0.00375


Spanning tree on large graph III

Numerical breakdown as |Γ⁽ⁿ⁾| increases:

- core computation gets obviously larger (size of λ, size of N)
- slower: ‖N‖ grows from ∼80 to ∼690
- worse conditioned: κ(N Nᵀ) grows from 1 to ∼10⁷
- convergence of core computation worsens: from 10 iterations to 1,000
- convergence error on ρ adversely affects path selection, bounds
- roundoff error on constraints, λ becomes relevant
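Both diagnostics quoted above follow from the singular values of the constraint matrix. A minimal sketch, assuming N is small enough for a dense SVD; the function name and the two toy matrices are hypothetical.

```python
import numpy as np

def constraint_diagnostics(N):
    """Return (||N||_2, kappa(N N^T)) from the singular values of N."""
    s = np.linalg.svd(N, compute_uv=False)     # singular values, descending
    norm_N = float(s[0])
    # N N^T is singular if N has more rows than columns or a zero singular value
    if N.shape[0] > N.shape[1] or s[-1] == 0.0:
        return norm_N, float("inf")
    return norm_N, float((s[0] / s[-1]) ** 2)  # kappa(N N^T) = (s_max / s_min)^2

# Two objects with disjoint edges: N N^T = 2 I, perfectly conditioned.
N_disjoint = np.array([[1.0, 1.0, 0.0, 0.0],
                       [0.0, 0.0, 1.0, 1.0]])
# Two objects sharing an edge: N N^T = [[2, 1], [1, 2]], kappa = 3.
N_overlap = np.array([[1.0, 1.0, 0.0],
                      [0.0, 1.0, 1.0]])
d1 = constraint_diagnostics(N_disjoint)
d2 = constraint_diagnostics(N_overlap)
```

A single spanning-tree row with 5,999 unit entries has norm √5999 ≈ 77.5, in line with the starting value of about 80. Overlap between the selected objects is what drives both the norm and the condition number up as rows are added.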


Last example I

Unit resistor network:

[Figure: unit resistor networks (σ(e) = 1 on every edge) on n × n vertex grids for n = 3, 5, 7, ...; each edge is labeled with σ(e), ρ(e).]

Last example II

[Plot: Mod_{p,σ}(Γ) (vertical axis, ticks from 1.7 to 2) versus n (horizontal axis, ticks from 0 to 20).]

At n = 23, ε = 0.0001, inner convergence tolerance 10⁻⁹: 316 paths in the constraint, and 1.99585 ≤ Mod_{2,σ}(Γ) ≤ 1.99624

At n = 23, ε = 0.00001, inner convergence tolerance 10⁻¹⁰: 371 paths in the constraint, and 1.99585 ≤ Mod_{2,σ}(Γ) ≤ 1.99589


[?]


Acknowledgements

Thank you!

Definitions and greedy algorithm:

Pietro’s talk at MSU in April 2016

Support from:

NSF DMS-1461138 (co-PI with Braxton Osting, U Utah)