
(1)

A Simple, Combinatorial Algorithm for Solving SDD Systems in Nearly-Linear Time

Lorenzo Orecchia,

MIT Math

(2)

Problem Definition: SDD Systems

PROBLEM INPUT: A ∈ R^{n×n}, b ∈ R^n.

SQUARE SYSTEM OF LINEAR EQUATIONS: Ax = b

GOAL: solve for x

(3)

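PROBLEM INPUT:

\[ \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} \]

GOAL: solve for x

Restriction: A is Symmetric Diagonally-Dominant (SDD)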

(4)

Problem Definition: SDD Systems

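PROBLEM INPUT:

\[ \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{12} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} \]

Restriction: A is Symmetric Diagonally-Dominant (SDD)
• Symmetry: A^T = A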

(5)

Problem Definition: SDD Systems

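PROBLEM INPUT: A ∈ R^{n×n}, b ∈ R^n.

\[ \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{12} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} \]

• Diagonal Dominance: for all rows (and columns) i, the diagonal entry dominates:

\[ a_{ii} \ge \sum_{j \ne i} |a_{ij}| \]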
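The two restrictions above (symmetry and diagonal dominance) are easy to test numerically. The following is a minimal sketch, not part of the talk: the function name `is_sdd`, the tolerance `tol`, and the two example matrices are assumptions chosen for illustration.

```python
import numpy as np

def is_sdd(A, tol=1e-12):
    """Return True if A is symmetric and every diagonal entry dominates its row."""
    A = np.asarray(A, dtype=float)
    if A.ndim != 2 or A.shape[0] != A.shape[1]:
        return False
    symmetric = np.allclose(A, A.T, atol=tol)
    # Sum of |a_ij| over j != i, for each row i.
    off_diag = np.abs(A).sum(axis=1) - np.abs(np.diag(A))
    dominant = np.all(np.diag(A) >= off_diag - tol)
    return bool(symmetric and dominant)

# Hypothetical examples: the first matrix is SDD, the second is symmetric but not dominant.
A_good = np.array([[3.0, -1.0, 2.0],
                   [-1.0, 2.0, 1.0],
                   [2.0, 1.0, 4.0]])
A_bad = np.array([[1.0, 2.0],
                  [2.0, 1.0]])
```

Calling `is_sdd(A_good)` and `is_sdd(A_bad)` separates the two cases; the smallest eigenvalue of `A_good` (for example via `np.linalg.eigvalsh`) is nonnegative, as expected for an SDD matrix.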

(6)







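\[ \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{12} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} \]

• Diagonal Dominance: for all i,

\[ a_{ii} \ge \sum_{j \ne i} |a_{ij}| \]

WHY SDD? Practically important subcase of positive semidefinite matrices:

\[ A \text{ IS SDD} \;\Longrightarrow\; A \succeq 0 \]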

(7)

Problem Definition: SDD Systems







\[ \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{12} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} \]

• Diagonal Dominance: for all i,

\[ a_{ii} \ge \sum_{j \ne i} |a_{ij}| \]

WHY SDD? Practically important subcase of positive semidefinite matrices

(8)

PROBLEM INPUT: A ∈ R^{n×n}, b ∈ R^n, b ∈ Im(A)

Ax = b, where A is sparse: nnz(A) denotes the number of nonzero entries of A.

• approximately:

\[ \|x - x^*\|_A \le \epsilon \, \|x^*\|_A \]

• in nearly-linear time in the sparsity nnz(A) and log(1/ε), i.e.

\[ \text{RunTime} = \tilde{O}\left(\mathrm{nnz}(A) \cdot \log\tfrac{1}{\epsilon}\right) \]





(9)

Reduction to Laplacian Systems

SDD SYSTEM: Ax = b

reduces to

LAPLACIAN SYSTEM: Lv = χ

[Gremban'96]

(10)

Graph Laplacian

G =(V,E,w) weighted undirected graph with n vertices and m edges

[Figure: example GRAPH G with weighted edges, next to its matrix L(G)]

(11)

Graph Laplacian

[Figure: example GRAPH G with weighted edges, next to its GRAPH LAPLACIAN L(G)]


(13)

Laplacian as Sum of Edges





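\[ L(G) = \sum_{e=\{i,j\}\in E} w_e \begin{pmatrix} 0 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\ \vdots & & \vdots & & \vdots & & \vdots \\ 0 & \cdots & 1 & \cdots & -1 & \cdots & 0 \\ \vdots & & \vdots & & \vdots & & \vdots \\ 0 & \cdots & -1 & \cdots & 1 & \cdots & 0 \\ \vdots & & \vdots & & \vdots & & \vdots \\ 0 & \cdots & 0 & \cdots & 0 & \cdots & 0 \end{pmatrix} \]

(the nonzero entries sit in rows i, j and columns i, j)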

(14)

Laplacian as Sum of Edges





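\[ L(G) = \sum_{e=\{i,j\}\in E} w_e \begin{pmatrix} 0 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\ \vdots & & \vdots & & \vdots & & \vdots \\ 0 & \cdots & 1 & \cdots & -1 & \cdots & 0 \\ \vdots & & \vdots & & \vdots & & \vdots \\ 0 & \cdots & -1 & \cdots & 1 & \cdots & 0 \\ \vdots & & \vdots & & \vdots & & \vdots \\ 0 & \cdots & 0 & \cdots & 0 & \cdots & 0 \end{pmatrix} \]

(the nonzero entries sit in rows i, j and columns i, j)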


(16)

\[ L(G) = \sum_{e=\{i,j\}\in E} w_e \, \chi_e \chi_e^T, \qquad \chi_e = (0, \ldots, \underbrace{1}_{i}, \ldots, \underbrace{-1}_{j}, \ldots, 0)^T \]

ARBITRARY ORIENTATION OF EDGES: WILL BE USEFUL WHEN DISCUSSING FLOWS

(17)

More Graph Matrices: Incidence Matrix





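\[ B^T = \begin{pmatrix} \chi_{e_1} & \chi_{e_2} & \cdots & \chi_{e_i} & \cdots & \chi_{e_m} \end{pmatrix} \in R^{n \times m} \]

(n rows, one per vertex; one column χ_e per edge)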

(18)

More Graph Matrices: Incidence Matrix

\[ B^T = \begin{pmatrix} \chi_{e_1} & \chi_{e_2} & \cdots & \chi_{e_i} & \cdots & \chi_{e_m} \end{pmatrix} \in R^{n \times m} \]

(n rows, m columns)

\[ L = \sum_{e=\{i,j\}\in E} w_e \, \chi_e \chi_e^T = B^T W B \]
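The identity L = B^T W B can be checked on a small example. This is an illustrative sketch only: the helper names `incidence_matrix` and `laplacian`, and the 4-vertex weighted graph, are assumptions and do not come from the slides.

```python
import numpy as np

def incidence_matrix(n, edges):
    """m x n matrix B whose row for edge (i, j) is chi_e^T: +1 at i, -1 at j."""
    B = np.zeros((len(edges), n))
    for row, (i, j) in enumerate(edges):
        B[row, i] = 1.0
        B[row, j] = -1.0
    return B

def laplacian(n, edges, weights):
    """Graph Laplacian assembled as B^T W B, with W the diagonal matrix of edge weights."""
    B = incidence_matrix(n, edges)
    W = np.diag(weights)
    return B.T @ W @ B

# Hypothetical weighted graph on vertices 0..3 (edge orientations are arbitrary).
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
weights = [1.0, 2.0, 1.0, 3.0]
L = laplacian(4, edges, weights)
```

The diagonal of `L` holds the weighted degrees, the off-diagonal entries are the negated edge weights, and every row sums to zero, so the all-ones vector lies in the kernel.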

(19)

More Graph Matrices: Incidence Matrix

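\[ B^T = \begin{pmatrix} \chi_{e_1} & \chi_{e_2} & \cdots & \chi_{e_m} \end{pmatrix} \in R^{n \times m}, \qquad B = \begin{pmatrix} \chi_{e_1}^T \\ \vdots \\ \chi_{e_i}^T \\ \vdots \\ \chi_{e_m}^T \end{pmatrix} \in R^{m \times n} \]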

(20)

Action of Incidence Matrix

Action of B^T on a vector f ∈ R^m:

(B^T f)_i = flow out of i − flow into i = net flow into graph at i

Action of B on a vector v ∈ R^n:

(Bv)_{(i,j)} = v_i − v_j

[Figure: an oriented edge whose endpoints are labeled +1 and −1]

(21)

Graphs as Electrical Circuits

[Figure: example graph G with edge weights]

(22)

[Figure: example graph G with its edges labeled by resistances]

Edge conductances w_e

Edge resistances r_e = 1/w_e


(24)

Graphs as Electrical Circuits

[Figure: example graph G with its edges labeled by resistances]

Edge resistances r_e = 1/w_e

One unit of electrical flow into vertex 1 and out of vertex 3

(25)

[Figure: example graph G with its edges labeled by resistances]

Edge resistances r_e = 1/w_e

1. Ohm's Law: For every edge e = (i, j) ∈ E:

\[ f_{(i,j)} = \frac{v_i - v_j}{r_{ij}} \]

(26)

Graphs as Electrical Circuits

[Figure: example graph G with its edges labeled by resistances]

Edge resistances r_e = 1/w_e

1. Ohm's Law: For every edge e = (i, j) ∈ E:

\[ f_{(i,j)} = \frac{v_i - v_j}{r_{ij}}, \qquad \text{i.e.} \quad f = R^{-1} B v \]

2. Kirchhoff's Conservation Law: For every vertex i ∈ V:

flow out of i − flow into i = net flow into network at i

(27)

[Figure: example graph G with its edges labeled by resistances]

Edge resistances r_e = 1/w_e

1. Ohm's Law:

\[ f = R^{-1} B v = W B v \]

2. Kirchhoff's Conservation Law:

(28)

Graphs as Electrical Circuits

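[Figure: example graph G with its edges labeled by resistances]

Edge resistances r_e = 1/w_e

1. Ohm's Law:

\[ f = R^{-1} B v = W B v \]

2. Kirchhoff's Conservation Law:

\[ B^T f = \chi \]

Example above: Current source s is vertex 1. Current sink t is vertex 3.

General net flow vector χ allowed: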

(29)

Laplacian Systems as Electrical Problems

[Figure: example graph G with its edges labeled by resistances]

Edge resistances r_e = 1/w_e

General net flow vector χ:

\[ \chi^T 1 = 0 \]

Q: What is the resulting electrical flow on the graph?

\[ f = R^{-1} B v = W B v, \qquad B^T f = \chi \]

(30)

Laplacian Systems as Electrical Problems

[Figure: example graph G with its edges labeled by resistances]

Edge resistances r_e = 1/w_e

\[ \chi^T 1 = 0 \]

Given input currents χ, finding the voltages is equivalent to solving a Laplacian system of linear equations.

(31)

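Laplacian Systems as Electrical Problems

[Figure: example graph G with its edges labeled by resistances]

Edge resistances r_e = 1/w_e

\[ \chi^T 1 = 0 \]

Substituting Ohm's Law f = W B v into the conservation law B^T f = χ gives B^T W B v = χ, that is:

\[ L v = \chi, \qquad v = L^{+} \chi \]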
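As a small numerical illustration of v = L^+ χ, the sketch below computes voltages and the electrical flow on a hypothetical three-vertex circuit. The graph, the resistances and the variable names are assumptions chosen for illustration; the dense pseudoinverse is used only because the example is tiny, not because it is how a fast solver works.

```python
import numpy as np

# Hypothetical circuit: a path 0 -> 1 -> 2 of unit resistors in parallel with
# a direct edge 0 -> 2 of resistance 2.
edges = [(0, 1), (1, 2), (0, 2)]
r = np.array([1.0, 1.0, 2.0])          # edge resistances r_e
n = 3

B = np.zeros((len(edges), n))          # incidence matrix, one row chi_e^T per edge
for row, (i, j) in enumerate(edges):
    B[row, i], B[row, j] = 1.0, -1.0

W = np.diag(1.0 / r)                   # conductances w_e = 1 / r_e
L = B.T @ W @ B                        # Laplacian

chi = np.array([1.0, 0.0, -1.0])       # one unit of current into vertex 0, out of vertex 2
v = np.linalg.pinv(L) @ chi            # voltages v = L^+ chi
f = W @ B @ v                          # Ohm's Law: f = W B v
```

Both routes from vertex 0 to vertex 2 have total resistance 2, so the unit of current splits evenly and each edge carries 1/2; `B.T @ f` reproduces `chi`, which is the conservation law.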

(32)

Energy Interpretation

\[ L v = \chi \]

is the optimality condition of

\[ \min_{x \perp 1} \; \tfrac{1}{2} \cdot x^T L x - x^T \chi \]

Equivalent up to scaling:

\[ \min_{x \perp 1} \; \tfrac{1}{2} \cdot x^T L x \]

(33)

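Energy Interpretation

\[ L v = \chi \]

is the optimality condition of

\[ \min_{x \perp 1} \; \tfrac{1}{2} \cdot x^T L x - x^T \chi \]

Equivalent up to scaling:

\[ \min \; \tfrac{1}{2} \cdot x^T L x, \qquad \text{where} \quad x^T L x = \sum_{(i,j) \in E} w_{ij} (x_i - x_j)^2 \]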

(34)

Why it matters

Direct Applications
• Modeling electrical networks
• Simulating random walks
• PageRank

(35)

Why it matters

Direct Applications
• Modeling electrical networks
• Simulating random walks
• PageRank

Numerical Applications
• Finite element [BHV08]

(36)

Why it matters: Faster Graph Algorithms

“The Laplacian Paradigm”

• Maximum flow [CKM+11, LRS13, KOLS12]
• Multicommodity flow [KMP12, KOLS12]
• Random spanning trees [KM09]
• Graph sparsification [SS11]
• Lossy flow, min-cost flow [DS08]
• Balanced partitioning [OSV12]
• Oblivious routing [KOLS12]

(37)

(38)

Our Result

Solve Lx = b in time

\[ \tilde{O}\left(m \log^2 n \, \log\tfrac{1}{\epsilon}\right) \]

• Very different approach
• Simple and intuitive algorithm
• Proof fits on a single blackboard

(39)

Our Result

Solve Lx = b in time

\[ \tilde{O}\left(m \log^2 n \, \log\tfrac{1}{\epsilon}\right) \]

• Very different approach
• Proof fits on a single blackboard

(40)

Which computational model?

Q: How do we account for running time of arithmetic operations?

A: Constant time for operations on word-size sequence of bits.

[Figure: a word of bits 1 0 1 ..., of length O(log^c n)]

Size of Word: Polynomial in size of input

(41)

Which computational model?

Q: How do we account for running time of arithmetic operations?

A: Constant time for operations on word-size sequence of bits.

[Figure: a word of bits 1 0 1 ..., of length O(log^c n)]

Size of Word: Polynomial in size of input

Previous works:

(42)

The Philosophy of This Talk

PROBLEM

REPRESENTATION


(43)

PROBLEM

REPRESENTATION

Not all representations are created equal: e.g. eigenvalue decomposition

Good representation often requires combinatorial insight

(44)

PROBLEM

REPRESENTATION

Leverage large body of techniques in continuous optimization:
• Regularization
• Gradient Descent
• Accelerated Gradient Descent
• Randomized Coordinate Descent

(45)

The Philosophy of This Talk

PROBLEM

REPRESENTATION

Leverage large body of techniques in continuous optimization:
• Regularization
• Gradient Descent
• Accelerated Gradient Descent
• Randomized Coordinate Descent

(46)

Techniques and Challenges in the Solution of Laplacian Systems

(47)

Changing Representation:

Gaussian Elimination

Simple Graphic Interpretation:

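\[ L = \begin{pmatrix} 4 & -1 & -1 & -1 & -1 \\ -1 & 1 & 0 & 0 & 0 \\ -1 & 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 1 & 0 \\ -1 & 0 & 0 & 0 & 1 \end{pmatrix} \]

[Figure: the corresponding star graph, one center joined to four leaves]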



(48)

Eliminate (pivot) this node

\[ L = \begin{pmatrix} 4 & -1 & -1 & -1 & -1 \\ -1 & 1 & 0 & 0 & 0 \\ -1 & 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 1 & 0 \\ -1 & 0 & 0 & 0 & 1 \end{pmatrix} \]






(50)

Gaussian Elimination

     

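\[ \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & \tfrac{3}{4} & -\tfrac{1}{4} & -\tfrac{1}{4} & -\tfrac{1}{4} \\ 0 & -\tfrac{1}{4} & \tfrac{3}{4} & -\tfrac{1}{4} & -\tfrac{1}{4} \\ 0 & -\tfrac{1}{4} & -\tfrac{1}{4} & \tfrac{3}{4} & -\tfrac{1}{4} \\ 0 & -\tfrac{1}{4} & -\tfrac{1}{4} & -\tfrac{1}{4} & \tfrac{3}{4} \end{pmatrix} \]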
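The lower-right block of the matrix above is the Schur complement obtained by pivoting on the center of the star. A minimal sketch of that single elimination step follows; the helper name `eliminate_first` is an assumption, and the code handles only the first row and column.

```python
import numpy as np

# Laplacian of the star: vertex 0 is the center, vertices 1..4 are leaves.
L = np.array([[4.0, -1.0, -1.0, -1.0, -1.0],
              [-1.0, 1.0, 0.0, 0.0, 0.0],
              [-1.0, 0.0, 1.0, 0.0, 0.0],
              [-1.0, 0.0, 0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0, 0.0, 1.0]])

def eliminate_first(M):
    """Schur complement of M after pivoting on entry (0, 0)."""
    pivot = M[0, 0]
    col = M[1:, 0]
    return M[1:, 1:] - np.outer(col, col) / pivot

S = eliminate_first(L)   # 4 x 4 block: 3/4 on the diagonal, -1/4 elsewhere
```

`S` is again a Laplacian, this time of a complete graph on the four leaves with edge weight 1/4: eliminating one vertex of degree 4 turned a sparse star into a dense clique, which is the fill-in that makes the order of elimination matter.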

(51)

Cholesky Decomposition



  

[Equation: L written as a product of three matrices: a lower-triangular factor, the partially eliminated matrix with diagonal block entries 3/4 and off-diagonal entries −1/4, and the transpose of the first factor]

   

(52)

Advantages:
• Gives exact algorithm
• Computes inverse matrix (can then be used on any b)
• Can choose elimination order

(53)

Advantages:
• Computes implicit representation of inverse matrix (can then be used on any b)

Disadvantages:

(54)

and Nested Dissection

Combinatorial arguments can help preserve sparsity by selecting good order:

G

(55)

Gaussian Elimination

G

(56)

[Figure: graph G with a Small Vertex Separator]

(57)

and Nested Dissection

G

Size of existing separators bounds sparsity


(59)

Gaussian Elimination: the bad case

G sparse expander

(60)

Advantages:
• Computes implicit representation of inverse matrix (can then be used on any b)
• Can choose elimination order to minimize running time

Disadvantages:
• Intermediate Laplacians can be very dense
• Very slow in worst case: O(n^3), O(n^ω)

(61)

Gaussian Elimination: the good case

Easily linear time for some graphs:

(62)

Easily linear time for some graphs:

PATHS

Eliminate in time O(1)

(63)

some graphs:

(64)

Easily linear time for some graphs:

PATHS: O(n) time

This works more generally for trees, by recursively eliminating a leaf.
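Leaf elimination on a tree can be sketched in a few lines. The code below is an illustration under stated assumptions, not the talk's implementation: the function name `solve_tree_laplacian`, the parent-array encoding, and the requirement that vertices are labeled so that `parent[u] < u` (root 0) are all choices made here for brevity.

```python
import numpy as np

def solve_tree_laplacian(n, parent, resistance, chi):
    """Solve L_T v = chi on a tree in O(n) time by leaf elimination.

    Assumed encoding: vertex 0 is the root, parent[u] < u for u >= 1, and
    resistance[u] is the resistance of the edge {u, parent[u]}.
    chi must sum to zero.
    """
    demand = np.array(chi, dtype=float)
    flow_up = np.zeros(n)              # flow on edge {u, parent[u]}, oriented u -> parent
    # Eliminate leaves first: process vertices in decreasing label order.
    for u in range(n - 1, 0, -1):
        flow_up[u] = demand[u]         # everything injected below u must leave through u
        demand[parent[u]] += demand[u]
    # Back-substitute: Ohm's Law gives v_u - v_parent = r * flow_up[u].
    v = np.zeros(n)
    for u in range(1, n):
        v[u] = v[parent[u]] + resistance[u] * flow_up[u]
    return v - v.mean()                # pick the solution orthogonal to the all-ones vector
```

For a path 0, 1, 2, 3 one would pass `parent = [-1, 0, 1, 2]`; each vertex is touched twice, which is where the O(n) bound comes from.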

(65)

Iterative Methods: Gradient Descent

Consider energy interpretation:

\[ \min_{x \perp 1} \; \tfrac{1}{2} \cdot x^T L x - x^T \chi \]

This is a convex optimization problem on which we can apply gradient descent techniques.

(66)

\[ \min_{x \perp 1} \; f(x) = \tfrac{1}{2} \cdot x^T L x - x^T \chi \]

\[ \nabla f(x) = L x - \chi, \qquad \nabla^2 f(x) = L \]

This is a convex optimization problem on which we can apply gradient descent techniques.

(67)

Iterative Methods: Gradient Descent

\[ \min_{x \perp 1} \; f(x) = \tfrac{1}{2} \cdot x^T L x - x^T \chi \]

\[ \nabla f(x) = L x - \chi, \qquad \lambda_2 D \preceq \nabla^2 f(x) \preceq 2 D \]

Use degree norm ‖·‖_D

(68)

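\[ \min_{x \perp 1} \; f(x) = \tfrac{1}{2} \cdot x^T L x - x^T \chi \]

\[ \nabla f(x) = L x - \chi, \qquad \lambda_2 D \preceq \nabla^2 f(x) \preceq 2 D \]

Construct iterative solutions:

\[ x^{(t+1)} = x^{(t)} - h \, D^{-1} \nabla f(x^{(t)}) \]

where h is the step length.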

(69)

Iterative Methods: Gradient Descent

\[ \min_{x \perp 1} \; f(x) = \tfrac{1}{2} \cdot x^T L x - x^T \chi \]

\[ \nabla f(x) = L x - \chi, \qquad \lambda_2 D \preceq \nabla^2 f(x) \preceq 2 D \]

By standard gradient descent analysis, h = 1/2. For a quadratic function, it can be optimized at every step.

\[ x^{(t+1)} = x^{(t)} - \tfrac{1}{2} D^{-1} \nabla f(x^{(t)}) \]
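The update with step length h = 1/2 in the degree norm can be tried directly on a small Laplacian. The sketch below is illustrative only: the function name `degree_gradient_descent`, the example graph, and the number of steps are assumptions, and dense matrices are used for simplicity.

```python
import numpy as np

def degree_gradient_descent(L, chi, steps, h=0.5):
    """Iterate x <- x - h * D^{-1} (L x - chi), starting from x = 0."""
    d_inv = 1.0 / np.diag(L)           # D is the diagonal (degree) part of L
    x = np.zeros(len(chi))
    for _ in range(steps):
        grad = L @ x - chi             # gradient of the energy function
        x = x - h * d_inv * grad
    return x - x.mean()                # report the representative orthogonal to all-ones

# Hypothetical example: Laplacian of a small weighted graph on 4 vertices.
L = np.array([[4.0, -1.0, -3.0, 0.0],
              [-1.0, 3.0, -2.0, 0.0],
              [-3.0, -2.0, 6.0, -1.0],
              [0.0, 0.0, -1.0, 1.0]])
chi = np.array([1.0, 0.0, 0.0, -1.0])
x = degree_gradient_descent(L, chi, steps=500)
```

On this well-connected example a few hundred steps already match the pseudoinverse solution to high accuracy; each step costs one multiplication by `L`.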

(70)

Iterative Methods: Gradient Descent

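\[ \min_{x \perp 1} \; f(x) = \tfrac{1}{2} \cdot x^T L x - x^T \chi \]

\[ \nabla f(x) = L x - \chi, \qquad \lambda_2 D \preceq \nabla^2 f(x) \preceq 2 D \]

UNRAVEL RECURSION TO OBTAIN TRUNCATED SERIES:

\[ x^{(t+1)} = \sum_{j=0}^{t} \left( \frac{I + W}{2} \right)^{j} \chi \]

(t steps of random walk; here x^{(0)} = 0, W = D^{-1}A denotes the random-walk matrix of the graph with adjacency matrix A, and χ stands for the rescaled vector (1/2) D^{-1} χ)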

(71)

Iterative Methods: Gradient Descent

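\[ \min_{x \perp 1} \; f(x) = \tfrac{1}{2} \cdot x^T L x - x^T \chi \]

\[ \nabla f(x) = L x - \chi, \qquad \lambda_2 D \preceq \nabla^2 f(x) \preceq 2 D \]

\[ x^{(0)}, x^{(1)}, x^{(2)}, \ldots, x^{(t)}, \ldots \qquad x^{(t+1)} = \sum_{j=0}^{t} \left( \frac{I + W}{2} \right)^{j} \chi \]

Iterations necessary to converge to ε-approximate solution:

\[ T = O\left( \frac{2}{\lambda_2} \log \frac{n}{\epsilon} \right) \]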

(72)

Bad Example for Gradient Descent

PATH:

\[ \lambda_2 = \frac{1}{n^2} \]

\[ T = O\left( \frac{2}{\lambda_2} \log \frac{n}{\epsilon} \right) = O\left( n^2 \log \frac{n}{\epsilon} \right) \]

(73)

Bad Example for Gradient Descent

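PATH:

\[ \lambda_2 = \frac{1}{n^2} \]

[Figure: a path from s to t on which the walk steps left or right with probability 1/2 each: GAMBLER'S RUIN]

Each iteration is a matrix-vector multiplication by (I + W)/2, requiring time O(m).

\[ T = O\left( \frac{2}{\lambda_2} \log \frac{n}{\epsilon} \right) = O\left( n^2 \log \frac{n}{\epsilon} \right) \]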

(74)

Bad Example for Gradient Descent

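PATH:

\[ \lambda_2 = \frac{1}{n^2} \]

Each iteration is a matrix-vector multiplication by (I + W)/2, requiring time O(m).

\[ T = O\left( \frac{2}{\lambda_2} \log \frac{n}{\epsilon} \right) = O\left( n^2 \log \frac{n}{\epsilon} \right) \]

\[ \text{RunTime} = O\left( m n^2 \log \frac{n}{\epsilon} \right) \]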

(75)

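Accelerated Gradient Descent

\[ \min_{x \perp 1} \; \tfrac{1}{2} \cdot x^T L x - x^T \chi \]

\[ \nabla f(x) = L x - \chi, \qquad \lambda_2 D \preceq \nabla^2 f(x) \preceq 2 D \]

Accelerated gradient techniques achieve better convergence:
• CHEBYSHEV'S ITERATION
• CONJUGATE GRADIENT

Improved iteration count:

\[ T = O\left( \sqrt{\frac{2}{\lambda_2}} \, \log \frac{n}{\epsilon} \right) \]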

(76)

Still no luck, but …

PATH:

\[ \lambda_2 = \frac{1}{n^2}, \qquad T = O\left( n \log \frac{n}{\epsilon} \right) \]

\[ \text{RunTime} = O\left( m n \log \frac{n}{\epsilon} \right) \]

(77)

Still no luck, but …

PATH:

\[ \lambda_2 = \frac{1}{n^2}, \qquad T = O\left( n \log \frac{n}{\epsilon} \right) \]

IT TAKES n STEPS FOR CHARGE TO TRAVEL ACROSS

\[ \text{RunTime} = O\left( m n \log \frac{n}{\epsilon} \right) \]

(78)

Combining Representation and Iteration

Gaussian elimination and gradient methods seem complementary:
• Gaussian elimination is nearly-linear on paths (and trees), but slow on expanders.
• Gradient methods are nearly-linear time on expanders, but slow on paths.

MAIN APPROACH TO FAST SOLVERS:

(79)

Combining Representation and Iteration:

Combinatorial Preconditioning

\[ L v = \chi \]

Gradient methods fail when condition number is large.

IDEA: modify system to improve condition number

\[ L_H^{+} L v = L_H^{+} \chi \]

where H is a preconditioner graph.

DESIRED PROPERTIES OF H:

1. New matrix L_H^+ L is well conditioned

(80)

Combining Representation and Iteration:

Combinatorial Preconditioning

\[ L v = \chi \]

Gradient methods fail when condition number is large.

IDEA: modify system to improve condition number

\[ L_H^{+} L v = L_H^{+} \chi \]

where H is a preconditioner graph.

DESIRED PROPERTIES OF H:

1. New matrix L_H^+ L is well conditioned: IMPROVES ITERATION COUNT

2. Linear systems in L_H can be solved quickly by Gaussian elimination

(81)

Preconditioning: Low-Stretch Trees

(82)

Preconditioning: Low-Stretch Trees

G

spanning tree T

(83)

G

spanning tree T

(84)

[Figure: graph G with a spanning tree T and an edge e]

Stretch of e:

\[ \mathrm{st}(e) = \frac{1}{r_e} \sum_{e' \in P_e} r_{e'} \]


(87)

Preconditioning: Low-Stretch Trees

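[Figure: graph G with a spanning tree T, an edge e, and the tree path P_e between the endpoints of e]

Stretch of e:

\[ \mathrm{st}(e) = \frac{1}{r_e} \sum_{e' \in P_e} r_{e'} \]

Total Stretch of T:

\[ \mathrm{st}(T) = \sum_{e \in E} \mathrm{st}(e) \]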

(88)

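[Figure: graph G with a spanning tree T, an edge e, and the tree path P_e between the endpoints of e]

Stretch of e:

\[ \mathrm{st}(e) = \frac{1}{r_e} \sum_{e' \in P_e} r_{e'} \]

Total Stretch of T:

\[ \mathrm{st}(T) = \sum_{e \in E} \mathrm{st}(e) \]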
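The stretch definitions can be evaluated directly from the tree paths. The sketch below is a plain illustration, not an efficient implementation: the function names and the `(u, v, r)` edge encoding are assumptions, and each query runs a fresh breadth-first search over the tree.

```python
from collections import deque

def tree_path_resistance(n, tree_edges, u, v):
    """Sum of resistances along the unique tree path P_e from u to v."""
    adj = {k: [] for k in range(n)}
    for a, b, r in tree_edges:
        adj[a].append((b, r))
        adj[b].append((a, r))
    dist = {u: 0.0}
    queue = deque([u])
    while queue:
        a = queue.popleft()
        for b, r in adj[a]:
            if b not in dist:          # in a tree, the first visit is along the unique path
                dist[b] = dist[a] + r
                queue.append(b)
    return dist[v]

def stretch(n, tree_edges, edge):
    """st(e) = (1 / r_e) * sum of r_e' over the tree path between the endpoints of e."""
    u, v, r_e = edge
    return tree_path_resistance(n, tree_edges, u, v) / r_e

def total_stretch(n, tree_edges, all_edges):
    """st(T) = sum of st(e) over all edges e of the graph (tree edges contribute 1 each)."""
    return sum(stretch(n, tree_edges, e) for e in all_edges)
```

For a cycle on 4 vertices with unit resistances and the path 0, 1, 2, 3 as the tree, the single off-tree edge has stretch 3 and each tree edge has stretch 1, so the total stretch is 6.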


(90)


(91)

Q: How does this help?

A: Can help to bound condition number of system preconditioned by T

(92)

Preconditioning: Low-Stretch Trees

\[ 1 \cdot I \preceq L_T^{+} L_G \preceq \mathrm{st}(T) \cdot I \]

Lower bound: Spanning tree property

Upper bound: Low-stretch-tree property


(94)

Preconditioning: Low-Stretch Trees

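EASY PROOF:

\[ 1 \cdot I \preceq L_T^{+} L_G \preceq \mathrm{st}(T) \cdot I \]

\[ \lambda_{\max}\left( L_T^{+} L_G \right) \le \mathrm{Tr}\left( L_T^{+} L_G \right) = \sum_{e \in E} w_e \, \chi_e^T L_T^{+} \chi_e = \mathrm{st}(T) \]

Trace bounds eigenvalue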
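The trace identity in the proof can be checked numerically on a toy instance. Everything in this sketch is an assumption made for illustration: the helper `laplacian`, the 4-vertex graph, and the chosen spanning tree; dense pseudoinverses are used only because the example is tiny.

```python
import numpy as np

def laplacian(n, edges):
    """Laplacian of a graph given as (i, j, r) triples, with weights w = 1 / r."""
    L = np.zeros((n, n))
    for i, j, r in edges:
        w = 1.0 / r
        L[i, i] += w
        L[j, j] += w
        L[i, j] -= w
        L[j, i] -= w
    return L

n = 4
tree = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]       # spanning tree T: a path
graph = tree + [(3, 0, 1.0), (0, 2, 2.0)]            # G = T plus two off-tree edges

M = np.linalg.pinv(laplacian(n, tree)) @ laplacian(n, graph)   # L_T^+ L_G
trace = np.trace(M)
eigs = np.sort(np.linalg.eigvals(M).real)
```

Here the tree edges have stretch 1 each, edge (3, 0) has stretch 3 and edge (0, 2) has stretch 2/2 = 1, so st(T) = 7 and `trace` should agree with it; apart from the zero eigenvalue of the all-ones direction, the entries of `eigs` lie between 1 and st(T).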


(96)

Preconditioning: Partial Results

1. Preconditioning by low-stretch spanning trees [BH01]

\[ \text{RunTime} = \tilde{O}\left( \sqrt{m \log n} \, \log \frac{n}{\epsilon} \right) \cdot \left[ O(m) + O(n) \right] \]

The first factor comes from the condition number.

(97)

Preconditioning: Partial Results

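\[ \text{RunTime} = \tilde{O}\left( \sqrt{m \log n} \, \log \frac{n}{\epsilon} \right) \cdot \left[ O(m) + O(n) \right] \]

The first factor comes from the condition number of L_T^+ L.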

(98)

Preconditioning: Partial Results

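\[ \text{RunTime} = \tilde{O}\left( \sqrt{m \log n} \, \log \frac{n}{\epsilon} \right) \cdot \left[ O(m) + O(n) \right] \]

The first factor comes from the condition number of L_T^+ L; the O(m) term is the multiplication by L.

2. Improved analysis using Conjugate Gradient and Trace bound [SW09]

\[ \text{RunTime} = \tilde{O}\left( m^{4/3} \, \mathrm{polylog}\, n \right) \]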

(99)

Recursive Preconditioning:

Spielman-Teng and Koutis-Miller-Peng

Nearly-linear-time algorithms at last: [ST04], [KMP10],[KMP11]

(100)

Recursive Preconditioning:

IDEA: Low-stretch trees do not provide good enough condition number. Use better preconditioner graphs G_1:

\[ L_{G_1}^{+} L v = L_{G_1}^{+} \chi \]

PROBLEM: Hard to solve system for G_1.

SOLUTION: Solve recursively.

(101)

Recursive Preconditioning:

IDEA: Low-stretch trees do not provide good enough condition number. Use better preconditioner graphs G_1:

\[ L_{G_1}^{+} L v = L_{G_1}^{+} \chi \]

PROBLEM: Hard to solve system for G_1.

SOLUTION: Solve recursively, preconditioning the system in L_{G_1} by L_{G_2}^{+}.

MAIN IDEA: At every recursive level, G becomes smaller as some low-

(102)

Our Algorithm

(103)

Our Algorithm

(at last)

(104)

Choosing the Right Representation

Solve for the Flow

All algorithms discussed so far aim to solve the voltage problem:

\[ \min_{v \perp 1} \; \tfrac{1}{2} \cdot v^T L v - v^T \chi \]

This is particularly problematic for gradient-based methods:

(105)

Choosing the Right Representation

Solve for the Flow

Voltage problem:

\[ \min_{v \perp 1} \; \tfrac{1}{2} \cdot v^T L v - v^T \chi \]

Our algorithm targets the minimum-energy flow problem:

\[ \min \; f^T R f \]

(106)

Solve for Electrical Flow

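Voltage problem:

\[ \min_{v \perp 1} \; \tfrac{1}{2} \cdot v^T L v - v^T \chi \]

Flow problem (minimize energy):

\[ \min \; f^T R f \quad \text{s.t.} \quad B^T f = \chi \]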

(107)

Solve for Electrical Flow

Voltage problem:

\[ \min_{v \perp 1} \; \tfrac{1}{2} \cdot v^T L v - v^T \chi \]

Flow problem (minimize energy):

\[ \min \; f^T R f \quad \text{s.t.} \quad B^T f = \chi \]

This choice opens up different representational questions:

(108)

Optimality Conditions for Flow Problem

1. Ohm's Law:

\[ \exists v : \; f = R^{-1} B v \]

2. Kirchhoff's Conservation Law:

\[ B^T f = \chi \]

(109)

Optimality Conditions for Flow Problem

1. Ohm's Law:

\[ \exists v : \; f = R^{-1} B v \]

2. Kirchhoff's Conservation Law:

\[ B^T f = \chi \]

We eliminate dependence on voltages in Ohm's Law:

Kirchhoff's Cycle Law: For any cycle C in G, the optimal electrical flow has

\[ \sum_{e \in C} r_e f_e = 0 \]

(110)

Optimality Conditions for Flow Problem

1. Ohm's Law:

\[ \exists v : \; f = R^{-1} B v \]

2. Kirchhoff's Conservation Law:

\[ B^T f = \chi \]

Kirchhoff's Cycle Law:

\[ \sum_{e \in C} r_e f_e = 0 \]

(111)

\[ \sum_{e \in C} r_e f_e = 0 \]

Fact: Ohm's Law ↔ Kirchhoff's Cycle Law

[Figure: example graph with unit resistances and flow values on its edges]


(114)

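\[ \sum_{e \in C} r_e f_e = 0 \]

Fact: Ohm's Law ↔ Kirchhoff's Cycle Law

[Figure: example graph with unit resistances; the flow values on its edges obey KCL, and vertex voltages are assigned along a spanning tree]

• Obeys KCL
• Set voltages using spanning tree
• KCL ensures that off-tree edges respect Ohm's Law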
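One direction of the equivalence above follows from a telescoping sum; the short derivation below spells it out. The labeling of the cycle as i_0, i_1, ..., i_k = i_0, with each edge oriented along the direction of traversal, is notation assumed here for illustration.

```latex
% Ohm's Law implies the Cycle Law: for a cycle C = (i_0, i_1, \ldots, i_k = i_0),
% Ohm's Law on the edge (i_t, i_{t+1}) reads r_e f_e = v_{i_t} - v_{i_{t+1}}, hence
\[
  \sum_{e \in C} r_e f_e
  \;=\; \sum_{t=0}^{k-1} \left( v_{i_t} - v_{i_{t+1}} \right)
  \;=\; v_{i_0} - v_{i_k}
  \;=\; 0 .
\]
% Conversely, fix a spanning tree T, set the voltage of a root to 0, and define
% voltages along T so that r_e f_e = v_i - v_j holds on every tree edge e = (i, j).
% For an off-tree edge e = (i, j), the Cycle Law on its fundamental cycle gives
\[
  r_e f_e \;=\; \sum_{e' \in P_e} r_{e'} f_{e'} \;=\; v_i - v_j ,
\]
% with the tree path P_e oriented from i to j, so Ohm's Law holds on e as well.
```

The converse direction is exactly the "set voltages using spanning tree" step on the slide: the tree determines the voltages, and the cycle condition certifies them on the remaining edges.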

(115)

Algorithmic Approach

1. Kirchhoff's Cycle Law: For any cycle C in G, the optimal electrical flow has

\[ \sum_{e \in C} r_e f_e = 0 \]

ITERATIVELY FIX BY ADDING/REMOVING FLOW ALONG CYCLES

2. Kirchhoff's Conservation Law:

\[ B^T f = \chi \]

(116)

[Figure: example network with source s and sink t; edges labeled with the current flow values 1, 0, 1, 0]

Initialize:

Improve:
• Pick a cycle

(117)

[Figure: example network with source s and sink t; edges labeled with the current flow values 1, 0, 1, 0]

Initialize:

Improve:
• Pick a cycle
• Fix it

(118)

[Figure: example network after fixing one cycle; some edges now carry flow 1/4]

Initialize:

Improve:
• Pick a cycle
• Fix it

(119)

[Figure: example network with source s and sink t after a cycle update; edge flow values include 1, 3/4, 1/4 and 0]

Initialize:

Improve:
• Pick a cycle
• Fix it
• Repeat
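A single "pick a cycle, fix it" step can be written as one line of linear algebra: add the multiple of the cycle that makes the sum of r_e f_e around it vanish. The sketch below is an illustration on a hypothetical triangle; the function name `fix_cycle`, the edge ordering and the starting flow are assumptions.

```python
import numpy as np

def fix_cycle(f, c, r):
    """Return f minus the multiple of cycle c that zeroes sum_e r_e f_e c_e."""
    alpha = (c @ (r * f)) / (c @ (r * c))
    return f - alpha * c

def energy(f, r):
    """Energy f^T R f of a flow."""
    return float(f @ (r * f))

# Hypothetical triangle with edges (0,1), (1,2), (0,2) and resistances 1, 1, 2.
r = np.array([1.0, 1.0, 2.0])
f0 = np.array([1.0, 1.0, 0.0])     # one unit sent along 0 -> 1 -> 2 only
c = np.array([1.0, 1.0, -1.0])     # the cycle 0 -> 1 -> 2 -> 0 (edge (0,2) traversed backwards)
f1 = fix_cycle(f0, c, r)
```

Because a cycle has zero net flow at every vertex, `f1` routes the same currents as `f0`; in this example the update halves the energy and lands on the even split between the two routes.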

(120)

[Figure: example network with source s and sink t after several cycle updates, with fractional flow values on the edges]

Initialize:

Improve:
• Pick a cycle
• Fix it
• Repeat

Which cycle?


(122)

Choosing the Right Representation:

Cycle Space

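[Figure: the affine space of flows satisfying B^T f = χ, containing the iterates f_0, f_1 and the electrical flow f = R^{-1} B v; the origin 0 lies outside it]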

(123)

Cycle Space

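[Figure: the affine space of flows satisfying B^T f = χ, containing the iterates f_0, f_1 and the electrical flow f = R^{-1} B v; cycle updates move within this space]

An affine translation of the span of cycles (cycle space C_G):

\[ C_G = \{ c : B^T c = 0 \} \]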

(124)

Cycle Space

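[Figure: the affine space of flows satisfying B^T f = χ, with cycle updates moving the iterates f_0, f_1, f_2 toward the electrical flow f* = R^{-1} B v]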

(125)

Choosing the Right Representation:

Basis of Cycle Space

Fact:

Fundamental cycles of spanning tree give a basis of cycle space

(126)

Basis of Cycle Space

Fact:

Fundamental cycles of spanning tree give a basis of cycle space

Fix spanning tree T

[Figure: an off-tree edge e and the fundamental cycle c_e it closes with the tree]

(127)

Our New Representation

ORIGINAL PROBLEM

\[ \forall c \in C_G : \; c^T R (f_0 + f) = 0, \qquad B^T f = 0 \]

Use spanning tree T basis for C_G

(128)

Our New Representation

ORIGINAL PROBLEM

\[ \forall c \in C_G : \; c^T R (f_0 + f) = 0, \qquad B^T f = 0 \]

Use spanning tree T basis for C_G (m − n + 1 constraints):

\[ \forall e \in E \setminus T : \; c_e^T R (f_0 + f) = 0 \]

(129)

ORIGINAL PROBLEM

\[ \forall c \in C_G : \; c^T R (f_0 + f) = 0, \qquad B^T f = 0 \]

Use spanning tree T basis for C_G:

\[ \forall e \in E \setminus T : \; c_e^T R (f_0 + f) = 0 \]

(130)

ORIGINAL PROBLEM

\[ \forall c \in C_G : \; c^T R (f_0 + f) = 0, \qquad B^T f = 0 \]

Use spanning tree T basis for C_G (m − n + 1 variables)

Iteratively fix fundamental cycles of T

WHAT ITERATIVE METHOD IS THIS?

(131)

[Figure: example network with source s and sink t after several cycle updates, with fractional flow values on the edges]

Initialize:

Improve:
• Pick a cycle
• Fix it
• Repeat

Which cycle?

(132)

Choosing the Right Iteration

Gradient Descent?

[Figure: the affine space of flows satisfying B^T f = χ, with the current iterate f_t and the electrical flow f = R^{-1} B v]

Basis of fundamental cycles

(133)

Gradient Descent?

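[Figure: the affine space of flows satisfying B^T f = χ, with the gradient ∇f at the current iterate f_t pointing toward the electrical flow f* = R^{-1} B v]

Gradient descent would update all cycles at once.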

(134)

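Randomized Coordinate Descent!

[Figure: the affine space of flows satisfying B^T f = χ, with the iterate f_t moving toward f* = R^{-1} B v one coordinate direction at a time]

Move along one coordinate direction at a time. Choose basis vector c_e with probability proportional to

\[ \frac{c_e^T R \, c_e}{r_e} = \mathrm{st}(e) + 1 \]

Convergence analysis?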
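Putting the pieces together, the iteration can be sketched as follows: start from the flow that routes χ along the tree, then repeatedly sample a fundamental cycle with probability proportional to st(e) + 1 and fix it. This is a dense, illustrative sketch under stated assumptions, not the nearly-linear-time implementation: the function name, its parameters, and the use of `np.linalg.lstsq` to route flow on the tree (instead of a combinatorial tree data structure) are choices made here for brevity, and at least one off-tree edge is assumed.

```python
import numpy as np

def cycle_update_solver(n, edges, r, tree_idx, chi, iters=2000, seed=0):
    """Approximate the electrical flow by randomized updates on fundamental cycles."""
    rng = np.random.default_rng(seed)
    r = np.asarray(r, dtype=float)
    chi = np.asarray(chi, dtype=float)
    m = len(edges)
    B = np.zeros((m, n))                       # row k is chi_e^T for edge k
    for k, (i, j) in enumerate(edges):
        B[k, i], B[k, j] = 1.0, -1.0
    tree_idx = list(tree_idx)
    off_idx = [k for k in range(m) if k not in tree_idx]
    BT_tree = B[tree_idx].T                    # n x (n-1), full column rank for a spanning tree

    def route_on_tree(demand):
        # The unique flow g on tree edges with B_T^T g = demand (demand sums to zero).
        g, *_ = np.linalg.lstsq(BT_tree, demand, rcond=None)
        return g

    f = np.zeros(m)                            # initial flow f_0: route chi along the tree
    f[tree_idx] = route_on_tree(chi)

    cycles = []                                # fundamental cycle c_e of each off-tree edge
    for k in off_idx:
        c = np.zeros(m)
        c[k] = 1.0                             # one unit on e ...
        c[tree_idx] = route_on_tree(-B[k])     # ... returned through the tree, so B^T c = 0
        cycles.append(c)
    cycles = np.array(cycles)

    cycle_res = (cycles ** 2) @ r              # c_e^T R c_e
    prob = cycle_res / r[off_idx]              # equals st(e) + 1
    prob = prob / prob.sum()

    for _ in range(iters):
        k = rng.choice(len(off_idx), p=prob)
        c = cycles[k]
        f -= (c @ (r * f)) / cycle_res[k] * c  # fix the sampled cycle
    return f
```

For instance, on a 4-cycle with a chord one might call `cycle_update_solver(4, [(0,1),(1,2),(2,3),(3,0),(0,2)], [1,1,1,1,2], [0,1,2], [1,0,0,-1])`; the returned flow keeps B^T f = χ at every step and approaches the flow W B L^+ χ.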

(135)

Coordinate Descent: Convergence

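\[ \min_{x \perp 1} \; \tfrac{1}{2} \cdot x^T A x - b^T x \]

\[ \forall e \in E \setminus T : \; c_e^T R (f_0 + f) = 0, \qquad f = \sum_{e \in E \setminus T} f_e c_e \]

Iteration count for gradient descent:

\[ T = O\left( \frac{\lambda_{\max}(A)}{\lambda_{\min}(A)} \log \frac{n}{\epsilon} \right) \]

Iteration count for randomized coordinate descent:

\[ T = O\left( \frac{\mathrm{Tr}(A)}{\lambda_{\min}(A)} \log \frac{n}{\epsilon} \right) \]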


(137)

Convergence

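\[ \min_{x \perp 1} \; \tfrac{1}{2} \cdot x^T A x - b^T x \]

\[ \forall e \in E \setminus T : \; c_e^T R (f_0 + f) = 0, \qquad f = \sum_{e \in E \setminus T} f_e c_e \]

\[ \lambda_{\min}(A) = 1, \qquad \lambda_{\max}(A) \le \mathrm{Tr}(A) = \mathrm{st}(T) + O(n) \]

Same calculation as for voltage preconditioning.

\[ T = O\left( \frac{\mathrm{Tr}(A)}{\lambda_{\min}(A)} \log \frac{n}{\epsilon} \right) \]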

(138)

New Condition Numbers

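\[ \min_{x \perp 1} \; \tfrac{1}{2} \cdot x^T A x - b^T x \]

\[ \forall e \in E \setminus T : \; c_e^T R (f_0 + f) = 0, \qquad f = \sum_{e \in E \setminus T} f_e c_e \]

\[ \lambda_{\min}(A) = 1, \qquad \lambda_{\max}(A) \le \mathrm{Tr}(A) = \mathrm{st}(T) + O(n) \]

Same calculation as for voltage preconditioning.

\[ T = O\left( \frac{\mathrm{Tr}(A)}{\lambda_{\min}(A)} \log \frac{n}{\epsilon} \right) \]

Here Tr(A) is governed by the tree stretch.

Worse by a factor of m − n + 1 in our case. We start with a trace guarantee,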

(139)

Final Convergence

\[ \min_{x \perp 1} \; \tfrac{1}{2} \cdot x^T A x - b^T x \]

\[ \forall e \in E \setminus T : \; c_e^T R (f_0 + f) = 0 \]

f = e_∈E