• No results found

orecchia.pdf

N/A
N/A
Protected

Academic year: 2020

Share "orecchia.pdf"

Copied!
149
0
0

Loading.... (view fulltext now)

Full text

(1)

A Simple, Combinatorial Algorithm

for Solving SDD Systems

in Nearly-Linear Time

Lorenzo Orecchia,

MIT Math

(2)

Problem Definition: SDD Systems

PROBLEM INPUT:

GOAL:

solve for

x

A ∈ Rn×n, b ∈ Rn.

Ax

=

b

SQUARE SYSTEM OF LINEAR EQUATIONS

2

GOAL:

solve for

x

(3)

Problem Definition: SDD Systems

PROBLEM INPUT:

a

11

a

12

. . .

a

1n

a

21

a

22

. . .

a

2n

.

.

.

.

.

.

. .

.

.

.

.

a

n

1

a

n2

. . .

a

nn

x

1

x

2

.

.

.

x

n

=

b

1

b

2

.

.

.

b

n

A ∈ Rn×n, b ∈ Rn.

GOAL: solve for x

Restriction: A is Symmetric Diagonally-Dominant

a

n

1

a

n2

. . .

a

nn
(4)

Problem Definition: SDD Systems

PROBLEM INPUT:

a

11

a

12

. . .

a

1n

a

12

a

22

. . .

a

2n

.

.

.

.

.

.

. .

.

.

.

.

a

1n

a

2n

. . .

a

nn

x

1

x

2

.

.

.

x

n

=

b

1

b

2

.

.

.

b

n

A ∈ Rn×n, b ∈ Rn.

4

GOAL: solve for x

Restriction: A is Symmetric Diagonally-Dominant • Symmetry: AT = A

a

(5)

Problem Definition: SDD Systems

PROBLEM INPUT: A ∈ Rn×n, b ∈ Rn.

a

11

a

12

. . .

a

1n

a

12

a

22

. . .

a

2n

.

.

.

.

.

.

. .

.

.

.

.

a

1n

a

2n

. . .

a

nn

x

1

x

2

.

.

.

x

n

=

b

1

b

2

.

.

.

b

n

GOAL: solve for x

Restriction: A is Symmetric Diagonally-Dominant • Symmetry: AT = A

• Diagonal Dominance:

for all rows (and columns) i, diagonal entry dominates

a

ii

j i

|

a

ij

|

a

(6)

Problem Definition: SDD Systems

PROBLEM INPUT: A ∈ Rn×n, b ∈ Rn.

a

11

a

12

. . .

a

1n

a

12

a

22

. . .

a

2n

.

.

.

.

.

.

. .

.

.

.

.

a

1n

a

2n

. . .

a

nn

x

1

x

2

.

.

.

x

n

=

b

1

b

2

.

.

.

b

n

6

GOAL: solve for x

Restriction: A is Symmetric Diagonally-Dominant • Symmetry: AT = A

• Diagonal Dominance: for all i,

a

ii

j=i

|

a

ij

|

a

1n

a

2n

. . .

a

nn

x

n

b

n

WHY SDD ? Practically important subcase of positive semidefinite matrices

A IS SDD A 0

(7)

Problem Definition: SDD Systems

PROBLEM INPUT: A ∈ Rn×n, b ∈ Rn.

a

11

a

12

. . .

a

1n

a

12

a

22

. . .

a

2n

.

.

.

.

.

.

. .

.

.

.

.

a

1n

a

2n

. . .

a

nn

x

1

x

2

.

.

.

x

n

=

b

1

b

2

.

.

.

b

n

GOAL: solve for x

Restriction: A is Symmetric Diagonally-Dominant • Symmetry: AT = A

• Diagonal Dominance: for all i,

a

ii

j=i

|

a

ij

|

a

1n

a

2n

. . .

a

nn

x

n

b

n

WHY SDD ? Practically important subcase of positive semidefinite matrices

(8)

Problem Definition: SDD Systems

PROBLEM INPUT: A ∈ Rn×n, b ∈ Rn, b ∈ Im(A)

a

11

0

. . .

a

1n

0

a

22

. . .

0

.

.

.

.

.

.

. .

.

.

.

.

0

a

2n

. . .

a

nn

x

1

x

2

.

.

.

x

n

=

b

1

b

2

.

.

.

b

n

nnz(A)

8

GOAL: solve for x

• approximately:

•in nearly-linear time in the sparsity nnz(A) and log(1/ǫ) , i.e. RunTime =

x − x∗A ≤ ǫx∗A

0

a

2n

. . .

a

nn

x

n

b

n

(9)

Reduction to Laplacian Systems

SDD SYSTEM:

Ax

=

b

LAPLACIAN SYSTEM:

Lv

=

χ

[Gremban’96]

(10)

Graph Laplacian

G =(V,E,w) weighted undirected graph with n vertices and m edges

1

1

2

1

3

GRAPH G

10

1

L

(

G

) =

(11)

Graph Laplacian

G =(V,E,w) weighted undirected graph with n vertices and m edges

1

1

2

1

3

GRAPH G

1

L

(

G

) =

GRAPH LAPLACIAN L(G)

(12)

Graph Laplacian

G =(V,E,w) weighted undirected graph with n vertices and m edges

1

1

2

1

3

GRAPH G

12

1

L

(

G

) =

GRAPH LAPLACIAN L(G)

(13)

Laplacian as Sum of Edges

0

. . .

0

. . .

0

. . .

0

..

.

..

.

..

.

..

.

..

.

..

.

..

.

0

. . .

1

. . .

1

. . .

0

.

.

.

.

.

.

.

i j i

L

(

G

) =

e={i,j}∈E

w

e

..

.

..

.

..

.

..

.

..

.

..

.

..

.

0

. . .

1

. . .

1

. . .

0

..

.

..

.

..

.

..

.

..

.

..

.

..

.

0

. . .

0

. . .

0

. . .

0

(14)

Laplacian as Sum of Edges

0

. . .

0

. . .

0

. . .

0

..

.

..

.

..

.

..

.

..

.

..

.

..

.

0

. . .

1

. . .

1

. . .

0

.

.

.

.

.

.

.

i j i 14

L

(

G

) =

e={i,j}∈E

w

e

..

.

..

.

..

.

..

.

..

.

..

.

..

.

0

. . .

1

. . .

1

. . .

0

..

.

..

.

..

.

..

.

..

.

..

.

..

.

0

. . .

0

. . .

0

. . .

0

j
(15)

Laplacian as Sum of Edges

0

. . .

0

. . .

0

. . .

0

..

.

..

.

..

.

..

.

..

.

..

.

..

.

0

. . .

1

. . .

1

. . .

0

.

.

.

.

.

.

.

i j i

L

(

G

) =

e={i,j}∈E

w

e

..

.

..

.

..

.

..

.

..

.

..

.

..

.

0

. . .

1

. . .

1

. . .

0

..

.

..

.

..

.

..

.

..

.

..

.

..

.

0

. . .

0

. . .

0

. . .

0

j
(16)

Laplacian as Sum of Edges

L

(

G

) =

e={i,j}∈E

w

e

0

..

.

1

..

.

1

.

0

· · ·

1

· · ·

1

· · ·

0

T

i j i j 16

..

.

0

χ

e

χ

T

e

i

j

ARBITRARY ORIENTATION OF EDGES: WILL BE USEFUL WHEN DISCUSSING FLOWS

(17)

More Graph Matrices: Incidence Matrix

G =(V,E,w) weighted undirected graph with n vertices and m edges

χ

e 1

χ

e 2

..

n

B

T =

..

.

χ

ei

..

.

χ

em
(18)

More Graph Matrices: Incidence Matrix

G =(V,E,w) weighted undirected graph with n vertices and m edges

B

T =

χ

e1

χ

e 2

..

.

m

n

18

B

T =

.

χ

ei

..

.

χ

em

m

L

=

e

=

{

i,j

}∈

E

w

e

χ

e

χ

T

e

=

B

T

W B

(19)

More Graph Matrices: Incidence Matrix

G =(V,E,w) weighted undirected graph with n vertices and m edges

B

T =

χ

e1

χ

e 2

..

.

χ

m

n

B

=

χ

ei

..

.

χ

em

m

(20)

Action of Incidence Matrix

G =(V,E,w) weighted undirected graph with n vertices and m edges

+1 -1

+1 -1

20

Action of BTon a vector

f ∈ Rm:

Action of B on a vector v ∈ Rn:

(

B

T

f

)

i

= flow out of i

flow into of i = net flow into graph at i

+1 -1

(21)

Graphs as Electrical Circuits

1

1

2

1

(22)

Graphs as Electrical Circuits

1

1/2

1/2

1

1/3 Edge conductances we

Edge resistances re

re= 1/we

(23)

Graphs as Electrical Circuits

1

1/2

1/2

1

1/3 Edge conductances we

Edge resistances re

re= 1/we

(24)

Graphs as Electrical Circuits

1

1/2

1/2

1

1/3 Edge conductances we

Edge resistances re

re= 1/we

24

one unit of electrical flow Into vertex 1 and out of 3

(25)

Graphs as Electrical Circuits

1

1/2

1/2

1

1/3 Edge conductances we

Edge resistances re

re= 1/we

one unit of electrical flow Into vertex 1 and out of 3

1. Ohm’s Law: For every edge e = (i, j) ∈ E:

f(i,j) = vi − vj

rij

(26)

Graphs as Electrical Circuits

1

1/2

1/2

1

1/3 Edge conductances we

Edge resistances re

re= 1/we

26

one unit of electrical flow Into vertex 1 and out of 3

1. Ohm’s Law: For every edge e = (i, j) ∈ E:

f(i,j) = vi − vj

rij

2. Kirchoff’s Conservation Law: For every vertex i ∈ V :

flow out of i − flow into i = net flow into network at i

f

=

R

−1

Bv

(27)

Graphs as Electrical Circuits

1

1/2

1/2

1

1/3 Edge conductances we

Edge resistances re

re= 1/we

one unit of electrical flow Into vertex 1 and out of 3

1. Ohm’s Law:

f = R−1Bv = W Bv

2. Kirchoff ’s Conservation Law:

(28)

Graphs as Electrical Circuits

1

1/2

1/2

1

1/3 Edge conductances we

Edge resistances re

re= 1/we

28

one unit of electrical flow Into vertex 1 and out of 3

1. Ohm’s Law:

f = R−1Bv = W Bv

2. Kirchoff ’s Conservation Law:

BTf = χ

Example above: Current source s is vertex 1. Current sink t is vertex 3

General net flow vector χ allowed:

(29)

1

1/2

1/2

1

1/3 Edge conductances we

Edge resistances re

re= 1/we

General net flow vector χ :

χT1 = 0

Laplacian Systems as Electrical Problems

one unit of electrical flow Into vertex 1 and out of 3

Q: What is resulting electrical flow on the graph?

f

=

R

−1

Bv

=

W Bv,

B

T

f

=

χ

(30)

1

1/2

1/2

1

1/3 Edge conductances we

Edge resistances re

re= 1/we

General net flow vector χ :

χT1 = 0

Laplacian Systems as Electrical Problems

30

one unit of electrical flow Into vertex 1 and out of 3

Q: What is resulting electrical flow on the graph?

Given input currents χ, finding voltage is equivalent to solving Laplacian system of linear equations

(31)

1

1/2

1/2

1

1/3 Edge conductances we

Edge resistances re

re= 1/we

General net flow vector χ :

χT1 = 0

Laplacian Systems as Electrical Problems

one unit of electrical flow Into vertex 1 and out of 3

Q: What is resulting electrical flow on the graph?

Lv

=

χ

v

=

L

+

χ

(32)

Energy Intepretation

Lv

=

χ

min

1

·

x

T

Lx

x

T

χ

Optimality condition

32

min

x1

1

2

·

x

T

Lx

x

T

χ

Equivalent up to scaling

min

x1

1

2

·

x

T

Lx

(33)

Energy Intepretation

Lv

=

χ

min

1

·

x

T

Lx

x

T

χ

Optimality condition

min

x1

1

2

·

x

T

Lx

x

T

χ

Equivalent up to scaling

min

1

2

·

x

T

Lx

=

(i,j)∈E

(xixj)2

(34)

Why it matters

Direct Applications

Modeling electrical

networks

Simulating random walks

PageRank

34

PageRank

(35)

Why it matters

Direct Applications

Modeling electrical

networks

Simulating random walks

PageRank

PageRank

Numerical Applications

Finite element

[BHV08]
(36)

Why it matters: Faster Graph Algorithms

“The Laplacian Paradigm”

Maximum flow

[CKM+11,LRS13,KOLS12]

Multicommodity flow

[KMP12, KOLS12]

Random spanning trees

[KM09]

Graph sparsification

[SS11]

36

Graph sparsification

[SS11]

Lossy flow, min-cost flow

[DS08]

Balanced partitioning

[OSV12]

Oblivious routing

[KOLS12]
(37)
(38)

Our Result

Solve

Lx

=

b

in time ˜

O

m

log

2

n

log

1

ǫ

• Very different approach

• Simple and intuitive algorithm

38

• Simple and intuitive algorithm

• Proof fits on a single blackboard

(39)

Our Result

• Very different approach

• Simple and intuitive algorithm

Solve

Lx

=

b

in time ˜

O

m

log

2

n

log

1

ǫ

• Simple and intuitive algorithm

• Proof fits on a single blackboard

(40)

Which computational model?

Q:

How do we account for running time of arithmetic operations?

A:

Constant time for operations on word-size sequence of bits.

1

0

0

0

0

1

1

1

1

1

1

O

(log

c

n

)

Size of Word: Polynomial in size of input

(41)

Which computational model?

Q:

How do we account for running time of arithmetic operations?

A:

Constant time for operations on word-size sequence of bits.

1

0

0

0

0

1

1

1

1

1

1

O

(log

c

n

)

Size of Word: Polynomial in size of input

Previous works

:
(42)

The Philosophy of This Talk

PROBLEM

REPRESENTATION

42

(43)

The Philosophy of This Talk

PROBLEM

REPRESENTATION

Not all representations created equally: e.g. eigenvalue decompostion

Good representation often requires

combinatorial insight

(44)

The Philosophy of This Talk

PROBLEM

REPRESENTATION

Not all representations created equally: e.g. eigenvalue decompostion

Good representation often requires

combinatorial insight

44

Leverage large body of techniques in continuous optimization

Regularization Gradient Descent

Accelerated Gradient Descent Randomized Coordinate Descent

(45)

The Philosophy of This Talk

PROBLEM

REPRESENTATION

Not all representations created equally: e.g. eigenvalue decompostion

Good representation often requires

combinatorial insight

Leverage large body of techniques in continuous optimization

Regularization Gradient Descent

Accelerated Gradient Descent Randomized Coordinate Descent

(46)

Techniques and Challenges in the

Solution of Laplacian Systems

Solution of Laplacian Systems

(47)

Changing Representation:

Gaussian Elimination

Simple Graphic Interpretation:

L =

   

4 1 1 1 1

−1 1 0 0 0 −1 0 1 0 0

(48)

Changing Representation:

Gaussian Elimination

Simple Graphic Interpretation:

Eliminate (pivot) this node

48

    

4 1 1 1 1

−1 1 0 0 0 −1 0 1 0 0 −1 0 0 1 0 −1 0 0 0 1

(49)

Changing Representation:

Gaussian Elimination

Simple Graphic Interpretation:

Eliminate (pivot) this node

   

4 1 1 1 1

−1 1 0 0 0 −1 0 1 0 0

(50)

Changing Representation:

Gaussian Elimination

Simple Graphic Interpretation:

     

1 0 0 0 0 0 3 4 − 1 4 − 1 4 − 1 4

0 1

4

3

4 − − 1

4 − 1 4

0 1

4 − 1 4 3 4 − 1 4

0 1

4 − 1

4 −1

(51)

Cholesky Decomposition

Simple Graphic Interpretation:

  

1 0 0 0 0

0 3 4 − 1 4 − 1 4 − 1 4 0 − 1 4 3

4 − − 1 4 − 1 4     L =    

4 0 0 0 0

−1 1 0 0 0 0 0 1 0 0

       

4 1 0 0 0 0 1 0 0 0 0 0 1 0 0

   

(52)

Changing Representation:

Gaussian Elimination

Advantages:

Gives exact algorithm

Computes inverse matrix (can then be used on any

b

)

Can choose elimination order

(53)

Changing Representation:

Gaussian Elimination

Advantages:

Gives exact algorithm

Computes implicit representation of inverse matrix

(can then be used on any

b

)

Can choose elimination order

Can choose elimination order

Disadvantages:

(54)

Gaussian Elimination

and Nested Dissection

Combinatorial arguments can help preserve sparsity by selecting good order:

G

(55)

Gaussian Elimination

and Nested Dissection

Combinatorial arguments can help preserve sparsity by selecting good order:

G

(56)

Gaussian Elimination

and Nested Dissection

Combinatorial arguments can help preserve sparsity by selecting good order:

G

56

Small Vertex Separator

(57)

Gaussian Elimination

and Nested Dissection

Combinatorial arguments can help preserve sparsity by selecting good order:

G

Size of existing separators bounds sparsity

(58)

Gaussian Elimination

and Nested Dissection

Combinatorial arguments can help preserve sparsity by selecting good order:

G

58

Size of existing separators bounds sparsity

(59)

Gaussian Elimination: the bad case

G sparse expander

(60)

Changing Representation:

Gaussian Elimination

Advantages:

Gives exact algorithm

Computes implicit representation of inverse matrix

(can then be used on any

b

)

Can choose elimination order to minimize running time

60

Can choose elimination order to minimize running time

Disadvantages:

Intermediate Laplacians can be very dense

Very slow

in worst case:

O

(

n

3

)

O

(

n

w

)

(61)

Gaussian Elimination: the good case

Easily linear time for

some graphs:

(62)

Gaussian Elimination: the good case

Easily linear time for

some graphs:

PATHS

Eliminate in time O(1)

62

PATHS

(63)

Gaussian Elimination: the good case

Easily linear time for

some graphs:

(64)

Gaussian Elimination: the good case

Easily linear time for

some graphs:

PATHS

O(n) time

64

This works more in general for trees

by

recursively eliminating a leaf.

(65)

Iterative Methods: Gradient Descent

Consider energy interpretation:

This is a convex optimization problem on which we can apply gradient

min

x1

1

2

·

x

T

Lx

x

T

χ

(66)

Consider energy interpretation:

This is a convex optimization problem on which we can apply gradient

Iterative Methods: Gradient Descent

f

(

x

)

f

(

x

) =

Lx

χ

2

f

(

x

) =

L

min

x1

1

2

·

x

T

Lx

x

T

χ

This is a convex optimization problem on which we can apply gradient descent techniques.

(67)

Consider energy interpretation:

This is a convex optimization problem on which we can apply gradient

Iterative Methods: Gradient Descent

f

(

x

)

f

(

x

) =

Lx

χ

λ

2

D

2

f

(

x

)

2

D

min

x1

1

2

·

x

T

Lx

x

T

χ

Use degree norm · D

(68)

Iterative Methods: Gradient Descent

Consider energy interpretation:

This is a convex optimization problem on which we can apply gradient

f

(

x

)

f

(

x

) =

Lx

χ

min

x1

1

2

·

x

T

Lx

x

T

χ

λ

2

D

2

f

(

x

)

2

D

68

This is a convex optimization problem on which we can apply gradient descent techniques.

Construct iterative solutions:

x

(t+1)

=

x

(t)

hD

−1

f

(

x

(t)

)

Step length

(69)

Iterative Methods: Gradient Descent

Consider energy interpretation:

This is a convex optimization problem on which we can apply gradient

f

(

x

)

f

(

x

) =

Lx

χ

min

x1

1

2

·

x

T

Lx

x

T

χ

λ

2

D

2

f

(

x

)

2

D

This is a convex optimization problem on which we can apply gradient descent techniques.

Construct iterative solutions:

By standard gradient descent analysis h = 1/2

For quadratic function, it can be optimized at every step.

x

(t+1)

=

x

(t)

1

2

D

1

f

(

x

(t)

)

(70)

Iterative Methods: Gradient Descent

Consider energy interpretation:

This is a convex optimization problem on which we can apply gradient

f

(

x

)

f

(

x

) =

Lx

χ

min

x1

1

2

·

x

T

Lx

x

T

χ

λ

2

D

2

f

(

x

)

2

D

70

This is a convex optimization problem on which we can apply gradient descent techniques.

Construct iterative solutions:

.

x

(t+1)

=

tj=0

I+2W

j

χ

UNRAVEL RECURSION TO OBTAIN TRUNCATED SERIES: t steps of random walk

(71)

Iterative Methods: Gradient Descent

Consider energy interpretation:

This is a convex optimization problem on which we can apply gradient

f

(

x

)

f

(

x

) =

Lx

χ

λ

2

D

2

f

(

x

)

2

D

min

x1

1

2

·

x

T

Lx

x

T

χ

This is a convex optimization problem on which we can apply gradient descent techniques.

Construct iterative solutions:

Iterations necessary to converge to ǫ-approximate solution:

x

(0)

, x

(1)

, x

(2)

, . . . , x

(t)

, . . .

x

(t+1)

=

tj=0

I+2W

j
(72)

Bad Example for Gradient Descent

PATH:

λ

2

=

n

1

2

72

Iterations necessary to converge to ǫ-approximate solution:

T

=

O

2

λ2

log

n

ǫ

=

O

n

2

log

n

ǫ

(73)

Bad Example for Gradient Descent

PATH:

λ

2

=

n

1

2

1/2 1/2

S t

GAMBLER’S RUIN

Iterations necessary to converge to ǫ-approximate solution:

Each Iteration is a matrix-vector multiplication by , requiring time O(m)

T

=

O

2

λ2

log

n

ǫ

=

O

n

2

log

n ǫ

2

n

I+W

2

(74)

Bad Example for Gradient Descent

PATH:

λ

2

=

n

1

2

74

Iterations necessary to converge to ǫ-approximate solution:

Each Iteration is a matrix-vector multiplication by , requiring time O(m)

T

=

O

2

λ2

log

n

ǫ

=

O

n

2

log

n

ǫ

2

n

I+W

2

RunTime =

O

mn

2

log

n ǫ
(75)

Consider energy interpretation:

This is a convex optimization problem on which we can apply gradient descent techniques.

min

x1

1

2

·

x

T

Lx

x

T

χ

Accelerated Gradient Descent

f

(

x

) =

Lx

χ

λ

2

D

2

f

(

x

)

2

D

descent techniques.

Accelerated gradient techniques achieve better convergence:

Improved iteration count:

CHEBYSHEV’S ITERATION CONJUGATE GRADIENT

T

=

O

2

(76)

Still no luck, but …

PATH:

λ

=

1

T

=

O

n

log

n

76

PATH:

λ

2

=

n

1

2

T

=

O

n

log

n

ǫ

RunTime =

O

mn

log

n ǫ

(77)

Still no luck, but …

PATH:

λ

=

1

T

=

O

n

log

n

IT TAKES n STEPS FOR CHARGE TO TRAVEL ACROSS

PATH:

λ

2

=

n

1

2

T

=

O

n

log

n

ǫ

RunTime =

O

mn

log

n ǫ

(78)

Combining Representation and Iteration

Gaussian elimination and gradient methods seem complementary

Gaussian elimination is nearly-linear on

paths

(and trees),

but slow on

expanders

.

78

Gradient methods are nearly-linear time on

expanders

,

but slow on

paths

.

MAIN APPROACH TO FAST SOLVERS:

(79)

Combining Representation and Iteration:

Combinatorial Preconditioning

Lv

=

χ

Gradient methods fail when condition number is large.

IDEA: modify system to improve condition number

where H is a preconditioner graph.

DESIRED PROPERTIES OF H:

1. New matrix is well conditioned:

L

+

H

Lv

=

L

+

H

χ

(80)

Combining Representation and Iteration:

Combinatorial Preconditioning

Lv

=

χ

Gradient methods fail when condition number is large.

IDEA: modify system to improve condition number

80

where H is a preconditioner graph.

DESIRED PROPERTIES OF H:

1. New matrix is well conditioned:

2. Linear systems in L

H can be solved quickly by Gaussian elimination

L

+

H

Lv

=

L

+

H

χ

L

+H

L

IMPROVES ITERATION COUNT

(81)

Preconditioning: Low-Stretch Trees

(82)

Preconditioning: Low-Stretch Trees

G

spanning tree T

(83)

Preconditioning: Low-Stretch Trees

G

spanning tree T

(84)

Preconditioning: Low-Stretch Trees

G

spanning tree T

st(

e

) =

1

r

e

e′∈Pe

r

e′

Stretch of e

84

e

(85)

Preconditioning: Low-Stretch Trees

G

spanning tree T

st(

e

) =

1

r

e

e′∈Pe

r

e′

Stretch of e

e

P

e
(86)

Preconditioning: Low-Stretch Trees

G

spanning tree T

st(

e

) =

1

r

e

e′∈Pe

r

e′

Stretch of e

86

e

Pe

(87)

Preconditioning: Low-Stretch Trees

G

spanning tree T

st(

e

) =

1

r

e

e′∈Pe

r

e′

Stretch of e

e

P

e

Total Stretch of e

st(

T

) =

e∈E

(88)

Preconditioning: Low-Stretch Trees

G

spanning tree T

st(

e

) =

1

r

e

e′∈Pe

r

e′

Stretch of e

88

e

P

e

Total Stretch of e

st(

T

) =

e∈E

st(

e

)

(89)

Preconditioning: Low-Stretch Trees

G

spanning tree T

st(

e

) =

1

r

e

e′∈Pe

r

e′

Stretch of e

e

P

e

Total Stretch of e

st(

T

) =

e∈E

(90)

Preconditioning: Low-Stretch Trees

90

(91)

Preconditioning: Low-Stretch Trees

Q: How does this help?

A: Can help to bound condition number of system preconditioned by T

(92)

Preconditioning: Low-Stretch Trees

92

Q: How does this help?

A: Can help to bound condition number of system preconditioned by T

1

·

I

L

+T

L

G

st(

T

)

·

I

Spanning tree property

Low-stretch-tree property

(93)

Preconditioning: Low-Stretch Trees

Q: How does this help?

A: Can help to bound condition number of system preconditioned by T

EASY PROOF:

1

·

I

L

+T

L

G

st(

T

)

·

I

+

Tr

+

=

T

+

= st(

)

(94)

Preconditioning: Low-Stretch Trees

94

Q: How does this help?

A: Can help to bound condition number of system preconditioned by T

EASY PROOF:

1

·

I

L

+T

L

G

st(

T

)

·

I

λ

max

L

+T

L

G

Tr

L

+T

L

G

=

e∈E

χ

T

e

L

+

T

χ

e

= st(

T

)

Trace bounds eigenvalue

(95)

Preconditioning: Low-Stretch Trees

Q: How does this help?

A: Can help to bound condition number of system preconditioned by T

EASY PROOF:

1

·

I

L

+T

L

G

st(

T

)

·

I

+

Tr

+

=

T

+
(96)

Preconditioning: Partial Results

1. Preconditioning by low-stretch spanning trees[BH01]

Condition number

RunTime = ˜

O

(

m

log

n

log

n ǫ

)

·

[

O

(

m

) +

O

(

n

)]

(97)

Preconditioning: Partial Results

1. Preconditioning by low-stretch spanning trees[BH01]

RunTime = ˜

O

(

m

log

n

log

n ǫ

)

·

[

O

(

m

) +

O

(

n

)]

Condition number L+

T L

(98)

Preconditioning: Partial Results

1. Preconditioning by low-stretch spanning trees[BH01]

RunTime = ˜

O

(

m

log

n

log

n ǫ

)

·

[

O

(

m

) +

O

(

n

)]

Condition number L+

T L

Multiplication by

98

2. Improved analysis using Conjugate Gradient and Trace bound [SW09]

RunTime = ˜

O

(

m

4/3

polylog

n

)

(99)

Recursive Preconditioning:

Spielman-Teng and Koutis-Miller-Peng

Nearly-linear-time algorithms at last: [ST04], [KMP10],[KMP11]

(100)

Recursive Preconditioning:

Spielman-Teng and Koutis-Miller-Peng

Nearly-linear-time algorithms at last: [ST04], [KMP10],[KMP11]

-IDEA: Low-stretch trees do not provide good enough condition number. Use better preconditioner graphs

L

+

Lv

=

L

+

χ

100

PROBLEM: Hard to solve system for G

1.

SOLUTION: Solve recursively.

L

+

G

(101)

Recursive Preconditioning:

Spielman-Teng and Koutis-Miller-Peng

Nearly-linear-time algorithms at last: [ST04], [KMP10],[KMP11]

-IDEA: Low-stretch trees do not provide good enough condition number. Use better preconditioner graphs

L

+

Lv

=

L

+

χ

PROBLEM: Hard to solve system for G

1.

SOLUTION: Solve recursively.

MAIN IDEA: At every recursive level, G becomes smaller as some

low-L

+

G

1

Lv

=

L

+

G

1

χ

L

+

G

2

L

G

1

x

=

L

+

(102)

Our Algorithm

(103)

Our Algorithm

(at last)

(104)

Choosing the Right Representation

Solve for the Flow

All algorithms discussed so far aim to solve voltage problem .

This is particularly problematic for gradient-based methods:

min

x1

1

2

·

v

T

Lv

v

T

χ

104

This is particularly problematic for gradient-based methods:

(105)

Choosing the Right Representation

Solve for the Flow

All algorithms discussed so far aim to solve voltage problem .

Our algorithm targets the minimum-energy flow problem:

min

x1

1

2

·

v

T

Lv

v

T

χ

Our algorithm targets the minimum-energy flow problem:

min

f

T

Rf

(106)

Choosing the Right Representation

Solve for Electrical Flow

All algorithms discussed so far aim to solve voltage problem .

Our algorithm targets the minimum-energy flow problem:

min

x1

1

2

·

v

T

Lv

v

T

χ

Our algorithm targets the minimum-energy flow problem:

min

f

T

Rf

s.t

B

T

f

=

χ

Minimize energy

(107)

Choosing the Right Representation

Solve for Electrical Flow

All algorithms discussed so far aim to solve voltage problem .

Our algorithm targets the minimum-energy flow problem:

min

x1

1

2

·

v

T

Lv

v

T

χ

Our algorithm targets the minimum-energy flow problem:

This choice opens up different representational questions:

min

f

T

Rf

s.t

B

T

f

=

χ

1

1

1

1

1

1

Minimize energy

(108)

Optimality Conditions for Flow Problem

1. Ohm’s Law:

v

:

f

=

R

−1

Bv

2. Kirchoff ’s Conservation Law:

B

T

f

=

χ

(109)

Optimality Conditions for Flow Problem

1. Ohm’s Law:

v

:

f

=

R

−1

Bv

2. Kirchoff ’s Conservation Law:

B

T

f

=

χ

We eliminate dependence on voltages in Ohm’s Law:

Kirchoff ’s Cycle Law: For any cycle C in G, the optimal electrical flow has

e∈C

refe = 0

(110)

Optimality Conditions for Flow Problem

1. Ohm’s Law:

v

:

f

=

R

−1

Bv

2. Kirchoff ’s Conservation Law:

B

T

f

=

χ

110

We eliminate dependence on voltages in Ohm’s Law:

Kirchoff ’s Cycle Law: For any cycle C in G, the optimal electrical flow has

e∈C

refe = 0

(111)

Optimality Conditions for Flow Problem

We eliminate dependence on voltages in Ohm’s Law:

Kirchoff ’s Cycle Law: For any cycle C in G, the optimal electrical flow has

e∈C

refe = 0

Fact:

Ohm’s Law

Kirchoff’s Cycle Law

Fact:

Ohm’s Law

Kirchoff’s Cycle Law

Flow values

Unit resistances 1

2 1

1

(112)

Optimality Conditions for Flow Problem

We eliminate dependence on voltages in Ohm’s Law:

Kirchoff ’s Cycle Law: For any cycle C in G, the optimal electrical flow has

e∈C

refe = 0

Fact:

Ohm’s Law

Kirchoff’s Cycle Law

112

Fact:

Ohm’s Law

Kirchoff’s Cycle Law

Flow values

Unit resistances

Obeys KCL

1

2 1

1

1

1

1

(113)

Optimality Conditions for Flow Problem

We eliminate dependence on voltages in Ohm’s Law:

Kirchoff ’s Cycle Law: For any cycle C in G, the optimal electrical flow has

e∈C

refe = 0

Fact:

Ohm’s Law

Kirchoff’s Cycle Law

Fact:

Ohm’s Law

Kirchoff’s Cycle Law

Flow values

Unit resistances

Obeys KCL

Set voltages

1

2 1

1

1 1

0 1

2

4

(114)

Optimality Conditions for Flow Problem

We eliminate dependence on voltages in Ohm’s Law:

Kirchoff ’s Cycle Law: For any cycle C in G, the optimal electrical flow has

e∈C

refe = 0

Fact:

Ohm’s Law

Kirchoff’s Cycle Law

114

Fact:

Ohm’s Law

Kirchoff’s Cycle Law

Flow values

Unit resistances

Obeys KCL

Set voltages

using spanning tree

KCL ensures that off-tree edges respect Ohm’s Law

1

2 1

1

1

1

1

1

0 1

1 2

4

2

(115)

1. Kirchoff’s Cycle Law: For any cycle

C

in

G,

the optimal electrical flow

has

e∈C

r

e

f

e

= 0

Algorithmic Approach

ITERATIVELY FIX BY ADDING/REMOVING FLOW ALONG CYCLES

2. Kirchoff’s Conservation Law:

(116)

1

1

1

1

1

0

1

s t

Initialize:

Pick a cycle

Pick a cycle

Improve:

Algorithmic Approach

0

0

0

0

0

0

Pick a cycle

Pick a cycle

(117)

1

1

1

1

1

0

1

s t

Initialize

:

Pick a cycle

Pick a cycle

Pick a cycle

Pick a cycle

Algorithmic Approach

0

0

0

0

0

0

Pick a cycle

Pick a cycle

Improve:

Pick a cycle

Fix it

(118)

1/4

1/4

Initialize:

Improve:

Pick a cycle

Pick a cycle

Algorithmic Approach

1/4 1/4

Improve:

Pick a cycle

Fix it

Pick a cycle

Fix it

(119)

1 1

1

1 3/4

1/4

1

Initialize:

s t

Pick a cycle

Pick a cycle

Improve:

Pick a cycle

Pick a cycle

Algorithmic Approach

1/4

0

0

1/4

0

Pick a cycle

Fix it

Pick a cycle

Fix it

Improve:

Pick a cycle

Fix it

Repeat

Pick a cycle

Fix it

(120)

Initialize:

Pick a cycle

Pick a cycle

Improve: 1 1 1 5/8 9/16 1/16 1 s t 51/16 35/16 1 25/16 0 Which cycle? Which cycle?

Algorithmic Approach

Pick a cycle

Fix it

Repeat

Pick a cycle

Fix it

(121)

Initialize:

Pick a cycle

Pick a cycle

Improve: 1 1 1 5/8 9/16 1/16 1 s t 51/16 35/16 1 25/16 0 Which cycle? Which cycle?

Algorithmic Approach

Pick a cycle

Fix it

Repeat

Pick a cycle

Fix it

(122)

Choosing the Right Representation:

Cycle Space

BTf = χ

f

f1

f = R−1Bv

122

f0

0

(123)

Choosing the Right Representation:

Cycle Space

BTf = χ

f

f1

f = R−1Bv

Cycle updates

An affine translation of span of cycles (cycle space CG) CG = {c : BTc = 0}

f0

(124)

Choosing the Right Representation:

Cycle Space

BTf = χ

f

f1

f = R−1Bv

Cycle updates

An affine translation of span of cycles (cycle space CG) CG = {c : BTc = 0}

124

f0

0

f* f2

(125)

Choosing the Right Representation:

Basis of Cycle Space

Fact:

Fundamental cycles of spanning tree give a basis of cycle space

(126)

Choosing the Right Representation:

Basis of Cycle Space

Fact:

Fundamental cycles of spanning tree give a basis of cycle space

126

Fix spanning tree

T

e

c

(127)

Our New Representation

ORIGINAl PROBLEM

c

C

G

:

c

T

R

(

f

0

+

f

) = 0

B

T

f

= 0

Use spanning tree T

basis for CG

(128)

Our New Representation

ORIGINAl PROBLEM

c

C

G

:

c

T

R

(

f

0

+

f

) = 0

B

T

f

= 0

128

Use spanning tree T

basis for CG

m – n + 1 constraints

e

E

\

T

:

c

Te

R

(

f

0

+

f

) = 0

(129)

Our New Representation

ORIGINAl PROBLEM

c

C

G

:

c

T

R

(

f

0

+

f

) = 0

B

T

f

= 0

Use spanning tree T

basis for CG

m – n + 1 constraints

e

E

\

T

:

c

Te

R

(

f

0

+

f

) = 0

(130)

Our New Representation

ORIGINAl PROBLEM

c

C

G

:

c

T

R

(

f

0

+

f

) = 0

B

T

f

= 0

130

Use spanning tree T

basis for CG

m – n + 1 constraints

Iteratively fix fundamental cycles of T WHAT ITERATIVE METHOD IS THIS?

m – n + 1 variables

(131)

Initialize:

Pick a cycle

Pick a cycle

Improve: 1 1 1 5/8 9/16 1/16 1 s t 51/16 35/16 1 25/16 0

Algorithmic Approach

Which cycle? Which cycle?

Pick a cycle

Fix it

Repeat

Pick a cycle

Fix it

(132)

Choosing the Right Iteration

Gradient Descent?

BTf = χ

f

f = R−1Bv

An affine translation of span of cycles (cycle space CG) CG = {c : BTc = 0}

Basis of fundamental cycles

132 132

ft

0

(133)

Choosing the Right Iteration

Gradient Descent?

BTf = χ

f

f = R−1Bv

An affine translation of span of cycles (cycle space CG) CG = {c : BTc = 0}

Basis of fundamental cycles

f

ft

f*

Gradient descent would update all cycles at once.

(134)

Choosing the Right Iteration

Randomized Cordinate Descent!

BTf = χ

f

f = R−1Bv

An affine translation of span of cycles (cycle space CG) CG = {c : BTc = 0}

Basis of fundamental cycles

134 134

ft

0

f*

Move along one coordinate direction at a time. Choose basis vector c

e w.p. proportional to

Convergence analysis?

cTe Rce

re

= st(

e

) + 1

(135)

Coordinate Descent: Convergence

min

x1

1

2

·

x

T

Ax

b

T

x

Iteration count for gradient descent:

T

=

O

λ

max

(

A

)

λ

(

A

)

log

n

ǫ

∀e ∈ E \ T : cTe R(f0 + f) = 0

f = eE\T fece

Iteration count for randomized coordinate descent:

T

=

O

Tr

(

A

)

λ

min

(

A

)

log

n

ǫ

T

=

O

λ

min

(

A

)

log

ǫ

(136)

Coordinate Descent: Convergence

min

x1

1

2

·

x

T

Ax

b

T

x

Iteration count for gradient descent:

T

=

O

λ

max

(

A

)

λ

(

A

)

log

n

ǫ

∀e ∈ E \ T : cTe R(f0 + f) = 0

f = eE\T fece

136

Iteration count for randomized coordinate descent:

T

=

O

Tr

(

A

)

λ

min

(

A

)

log

n

ǫ

T

=

O

λ

min

(

A

)

log

ǫ

(137)

Convergence

min

x1

1

2

·

x

T

Ax

b

T

x

e ∈ E \ T : cTe R(f0 + f) = 0

f = eE\T fece

λ

min

(

A

) = 1

λ

max

(

A

)

Tr(

A

) = st(

T

) +

O

(

n

)

Same calculation as for voltage preconditioning.

Iteration count for randomized coordinate descent:

T

=

O

Tr

(A)

λmin(A)

log

n ǫ

(138)

New Condition Numbers

min

x1

1

2

·

x

T

Ax

b

T

x

e ∈ E \ T : cTe R(f0 + f) = 0

f = eE\T fece

λ

min

(

A

) = 1

λ

max

(

A

)

Tr(

A

) = st(

T

) +

O

(

n

)

138

Same calculation as for voltage preconditioning.

Iteration count for randomized coordinate descent:

T

=

O

Tr

(A)

λmin(A)

log

n ǫ

Tree stretch

Worse by a factor of m n + 1 in our case We start with a trace guarantee,

(139)

Final Convergence

min

x1

1

2

·

x

T

Ax

b

T

x

e ∈ E \ T : cTe R(f0 + f) = 0

f = eE

References

Related documents

Several therapeutic approaches to the treatment of HOCM can be used: a medical therapy (negative inotropic substances as beta-adrenergic block- ing agents or calcium

168 THE INTELLIGENT GUIDE TO TEXAS HOLD'EM POKER When I played in the now defunct Prince George's County Mary- land cardrooms, there was bonus money given each day for the first

We estimate [6] using an advisor fixed effect

In the final multivariable logistic regression model packed cell volume (PCV) was the only important predictor for medically treated cases, and heart rate and presence of hyperaemic

Objective: The aim was to explore the experiences of fourth- year medical students of diagnostic consultations in a simu- lated primary care setting, in order to gain an insight

We considered using the periodogram, the spectrogram and the Morlet wavelet scalogram to cover a diversity of time- frequency samplings (see Fig 1); the Wigner-Ville

[r]

If the stock market opening is the main driving force behind the in- crease in the foreign participation rate, the foreign investors should be in- di ff erent among stocks and