A Simple, Combinatorial Algorithm
for Solving SDD Systems
in Nearly-Linear Time
Lorenzo Orecchia,
MIT Math
Problem Definition: SDD Systems
PROBLEM INPUT: $A \in \mathbb{R}^{n \times n}$, $b \in \mathbb{R}^n$, $b \in \mathrm{Im}(A)$.
SQUARE SYSTEM OF LINEAR EQUATIONS $Ax = b$:
\[
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{12} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{1n} & a_{2n} & \cdots & a_{nn}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}
\]
Restriction: A is Symmetric Diagonally-Dominant (SDD)
• Symmetry: $A^T = A$
• Diagonal Dominance: for all rows (and columns) $i$, the diagonal entry dominates: $a_{ii} \ge \sum_{j \ne i} |a_{ij}|$
WHY SDD? Practically important subcase of positive semidefinite matrices: $A$ is SDD $\Rightarrow A \succeq 0$.
Sparsity: $\mathrm{nnz}(A)$, the number of nonzero entries of $A$.
GOAL: solve for $x$
• approximately: $\|x - x^*\|_A \le \epsilon \|x^*\|_A$
• in time nearly linear in the sparsity $\mathrm{nnz}(A)$ and in $\log(1/\epsilon)$
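The two SDD conditions can be checked mechanically. The following is a minimal sketch in Python with NumPy (an assumption; the talk names no language or library). The helper `is_sdd` and both example matrices are hypothetical and serve only as an illustration.

```python
import numpy as np

def is_sdd(A, tol=1e-12):
    """Return True if A is symmetric and every diagonal entry dominates its row."""
    A = np.asarray(A, dtype=float)
    if A.ndim != 2 or A.shape[0] != A.shape[1]:
        return False
    symmetric = np.allclose(A, A.T, atol=tol)
    # Sum of absolute off-diagonal entries in each row.
    off_diag = np.abs(A).sum(axis=1) - np.abs(np.diag(A))
    dominant = np.all(np.diag(A) >= off_diag - tol)
    return bool(symmetric and dominant)

# Hypothetical examples: the first satisfies both conditions, the second
# is symmetric but its diagonal is too small.
A_good = np.array([[3.0, -1.0, 2.0],
                   [-1.0, 2.0, 1.0],
                   [2.0, 1.0, 4.0]])
A_bad = np.array([[1.0, -2.0],
                  [-2.0, 1.0]])

smallest_eigenvalue = np.linalg.eigvalsh(A_good).min()
```

On the SDD example the smallest eigenvalue is nonnegative, in line with the claim that SDD matrices are positive semidefinite.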
Reduction to Laplacian Systems
SDD SYSTEM: $Ax = b$
LAPLACIAN SYSTEM: $Lv = \chi$
[Gremban’96]
Graph Laplacian
G = (V, E, w) weighted undirected graph with n vertices and m edges
[Figure: example weighted graph G (GRAPH G) shown next to its matrix $L(G)$ (GRAPH LAPLACIAN L(G))]
Laplacian as Sum of Edges
\[
L(G) = \sum_{e=\{i,j\} \in E} w_e
\begin{pmatrix}
 & \vdots & & \vdots & \\
\cdots & 1 & \cdots & -1 & \cdots \\
 & \vdots & & \vdots & \\
\cdots & -1 & \cdots & 1 & \cdots \\
 & \vdots & & \vdots &
\end{pmatrix}
= \sum_{e=\{i,j\} \in E} w_e \, \chi_e \chi_e^T
\]
The displayed rows and columns are $i$ and $j$; all other entries are 0. The vector $\chi_e \in \mathbb{R}^n$ has $+1$ in position $i$, $-1$ in position $j$, and 0 elsewhere.
ARBITRARY ORIENTATION OF EDGES: WILL BE USEFUL WHEN DISCUSSING FLOWS
More Graph Matrices: Incidence Matrix
G = (V, E, w) weighted undirected graph with n vertices and m edges
The incidence matrix $B \in \mathbb{R}^{m \times n}$ has one row $\chi_e^T$ per edge, so $B^T \in \mathbb{R}^{n \times m}$ has the edge vectors as columns:
\[
B^T = \begin{pmatrix} \chi_{e_1} & \chi_{e_2} & \cdots & \chi_{e_m} \end{pmatrix},
\qquad
B = \begin{pmatrix} \chi_{e_1}^T \\ \chi_{e_2}^T \\ \vdots \\ \chi_{e_m}^T \end{pmatrix}
\]
\[
L = \sum_{e=\{i,j\} \in E} w_e \, \chi_e \chi_e^T = B^T W B
\]
where $W$ is the $m \times m$ diagonal matrix of edge weights.
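The identity $L = B^T W B$ can be verified numerically on a small graph. This is a sketch in Python with NumPy; the helper `incidence_and_weights` and the 4-vertex edge list are hypothetical, and vertices are 0-based here.

```python
import numpy as np

def incidence_and_weights(n, edges):
    """Build B (m x n) and W (m x m) from a list of (i, j, w) edges.

    The orientation i -> j of each edge is arbitrary: row e of B has
    +1 in column i and -1 in column j.
    """
    m = len(edges)
    B = np.zeros((m, n))
    W = np.zeros((m, m))
    for k, (i, j, w) in enumerate(edges):
        B[k, i] = 1.0
        B[k, j] = -1.0
        W[k, k] = w
    return B, W

# Hypothetical example graph on 4 vertices.
edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 3.0), (2, 3, 1.0)]
B, W = incidence_and_weights(4, edges)
L = B.T @ W @ B

# The same Laplacian assembled as a sum of rank-one edge matrices.
L_sum = np.zeros((4, 4))
for i, j, w in edges:
    chi_e = np.zeros(4)
    chi_e[i], chi_e[j] = 1.0, -1.0
    L_sum += w * np.outer(chi_e, chi_e)
```

Both constructions give the same matrix, whose rows sum to zero and whose diagonal holds the weighted degrees.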
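Action of Incidence Matrix
G = (V, E, w) weighted undirected graph with n vertices and m edges
Action of $B^T$ on a vector $f \in \mathbb{R}^m$:
\[
(B^T f)_i = \text{flow out of } i - \text{flow into } i = \text{net flow into graph at } i
\]
Action of $B$ on a vector $v \in \mathbb{R}^n$: each edge $e = (i, j)$ has $+1$ at $i$ and $-1$ at $j$, so
\[
(Bv)_{(i,j)} = v_i - v_j
\]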
Graphs as Electrical Circuits
Edge conductances $w_e$
Edge resistances $r_e = 1/w_e$
[Figure: example graph labeled with edge resistances; one unit of electrical flow goes into vertex 1 and out of vertex 3]
1. Ohm’s Law: For every edge $e = (i, j) \in E$: $f_{(i,j)} = \frac{v_i - v_j}{r_{ij}}$. In matrix form: $f = R^{-1} B v = W B v$.
2. Kirchhoff’s Conservation Law: For every vertex $i \in V$: flow out of i − flow into i = net flow into network at i. In matrix form: $B^T f = \chi$.
Example above: Current source s is vertex 1. Current sink t is vertex 3.
General net flow vector $\chi$ allowed:
$\chi^T \mathbf{1} = 0$
Laplacian Systems as Electrical Problems
Q: What is the resulting electrical flow on the graph?
\[
f = R^{-1} B v = W B v, \qquad B^T f = \chi
\]
Substituting the first equation into the second gives $B^T W B v = L v = \chi$.
Given input currents $\chi$, finding the voltages is equivalent to solving a Laplacian system of linear equations:
\[
L v = \chi, \qquad v = L^{+} \chi
\]
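To make the voltage and flow relations concrete, here is a small sketch in Python with NumPy. The graph is a hypothetical example, vertices are 0-based, and the dense pseudoinverse `np.linalg.pinv` stands in for a real solver purely for illustration; it is not the fast method this talk is about.

```python
import numpy as np

# Hypothetical example: edges given as (i, j, conductance w_e).
edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 3.0), (2, 3, 1.0)]
n, m = 4, len(edges)

B = np.zeros((m, n))
for k, (i, j, _) in enumerate(edges):
    B[k, i], B[k, j] = 1.0, -1.0
W = np.diag([w for _, _, w in edges])
L = B.T @ W @ B

# One unit of current enters at vertex 0 and leaves at vertex 2,
# so chi sums to zero.
chi = np.array([1.0, 0.0, -1.0, 0.0])

v = np.linalg.pinv(L) @ chi   # voltages, v = L^+ chi
f = W @ B @ v                 # Ohm's law, f = W B v
```

The resulting flow satisfies the conservation law $B^T f = \chi$, and the pendant edge (2, 3) carries no current because vertex 3 neither injects nor extracts flow.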
Energy Interpretation
$Lv = \chi$ is the optimality condition of
\[
\min_{x \perp \mathbf{1}} \; \tfrac{1}{2} \cdot x^T L x - x^T \chi
\]
Equivalent up to scaling: minimize the energy $\tfrac{1}{2} \cdot x^T L x$, where
\[
x^T L x = \sum_{(i,j) \in E} w_{ij} (x_i - x_j)^2
\]
Why it matters
• Direct Applications
  • Modeling electrical networks
  • Simulating random walks
  • PageRank
• Numerical Applications
  • Finite element [BHV08]
Why it matters: Faster Graph Algorithms
“The Laplacian Paradigm”
• Maximum flow [CKM+11, LRS13, KOLS12]
• Multicommodity flow [KMP12, KOLS12]
• Random spanning trees [KM09]
• Graph sparsification [SS11]
• Lossy flow, min-cost flow [DS08]
• Balanced partitioning [OSV12]
• Oblivious routing [KOLS12]
Our Result
Solve $Lx = b$ in time $\tilde{O}\left(m \log^2 n \, \log \frac{1}{\epsilon}\right)$
• Very different approach
• Simple and intuitive algorithm
• Proof fits on a single blackboard
Which computational model?
Q: How do we account for running time of arithmetic operations?
A: Constant time for operations on a word-size sequence of bits.
[Figure: a machine word drawn as a row of bits, of length $O(\log^c n)$]
Size of Word: Polynomial in size of input
The Philosophy of This Talk
PROBLEM → REPRESENTATION
Not all representations are created equal: e.g. eigenvalue decomposition
Good representation often requires combinatorial insight
Leverage the large body of techniques in continuous optimization:
• Regularization
• Gradient Descent
• Accelerated Gradient Descent
• Randomized Coordinate Descent
Techniques and Challenges in the
Solution of Laplacian Systems
Changing Representation:
Gaussian Elimination
Simple Graphic Interpretation:
\[
L = \begin{pmatrix}
4 & -1 & -1 & -1 & -1 \\
-1 & 1 & 0 & 0 & 0 \\
-1 & 0 & 1 & 0 & 0 \\
-1 & 0 & 0 & 1 & 0 \\
-1 & 0 & 0 & 0 & 1
\end{pmatrix}
\]
Eliminate (pivot) this node: the vertex of degree 4 (first row and column). After elimination:
\[
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & \tfrac{3}{4} & -\tfrac{1}{4} & -\tfrac{1}{4} & -\tfrac{1}{4} \\
0 & -\tfrac{1}{4} & \tfrac{3}{4} & -\tfrac{1}{4} & -\tfrac{1}{4} \\
0 & -\tfrac{1}{4} & -\tfrac{1}{4} & \tfrac{3}{4} & -\tfrac{1}{4} \\
0 & -\tfrac{1}{4} & -\tfrac{1}{4} & -\tfrac{1}{4} & \tfrac{3}{4}
\end{pmatrix}
\]
Cholesky Decomposition
[Figure: $L$ written as a product of a lower-triangular matrix, the eliminated matrix above, and an upper-triangular matrix]
Changing Representation:
Gaussian Elimination
Advantages:
• Gives exact algorithm
• Computes implicit representation of inverse matrix (can then be used on any $b$)
• Can choose elimination order
Gaussian Elimination
and Nested Dissection
Combinatorial arguments can help preserve sparsity by selecting good order:
[Figure: graph G split by a Small Vertex Separator]
Size of existing separators bounds sparsity
Gaussian Elimination: the bad case
G sparse expander
Changing Representation:
Gaussian Elimination
Advantages:
• Gives exact algorithm
• Computes implicit representation of inverse matrix (can then be used on any $b$)
• Can choose elimination order to minimize running time
Disadvantages:
• Intermediate Laplacians can be very dense
• Very slow in worst case: $O(n^3)$, or $O(n^{\omega})$ with fast matrix multiplication
Gaussian Elimination: the good case
Easily linear time for some graphs:
PATHS
Eliminate each endpoint in time O(1), for O(n) time in total.
This works more generally for trees, by recursively eliminating a leaf.
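Leaf elimination on a tree can be sketched as follows, in Python with NumPy. The function name `solve_tree_laplacian`, the edge-list format, and the example tree are assumptions made for this illustration. Each vertex is removed while it is a leaf of the remaining tree, which pushes its right-hand side onto its parent; back-substitution then recovers the voltages.

```python
import numpy as np
from collections import deque

def solve_tree_laplacian(n, tree_edges, chi):
    """Solve L_T v = chi on a tree in O(n) time.

    tree_edges: list of (i, j, w) forming a spanning tree on 0..n-1.
    chi must sum to zero; the result is normalized so that v[0] = 0.
    """
    adj = [[] for _ in range(n)]
    for i, j, w in tree_edges:
        adj[i].append((j, w))
        adj[j].append((i, w))

    # Breadth-first search from vertex 0 gives parents and an ordering in
    # which every vertex appears after its parent.
    parent = [-1] * n
    parent_w = [0.0] * n
    seen = [False] * n
    seen[0] = True
    order = []
    queue = deque([0])
    while queue:
        u = queue.popleft()
        order.append(u)
        for nb, w in adj[u]:
            if not seen[nb]:
                seen[nb] = True
                parent[nb] = u
                parent_w[nb] = w
                queue.append(nb)

    # Forward pass: in reverse BFS order each vertex is a leaf of what is
    # left, and eliminating it adds its right-hand side to its parent's.
    rhs = np.array(chi, dtype=float)
    for u in reversed(order[1:]):
        rhs[parent[u]] += rhs[u]

    # Back-substitution using the leaf equation w * (v_u - v_parent) = rhs_u.
    v = np.zeros(n)
    for u in order[1:]:
        v[u] = v[parent[u]] + rhs[u] / parent_w[u]
    return v

# Hypothetical example tree with (i, j, weight) edges.
tree = [(0, 1, 1.0), (1, 2, 2.0), (1, 3, 0.5), (3, 4, 1.0)]
chi = [1.0, 0.0, -2.0, 0.0, 1.0]
v = solve_tree_laplacian(5, tree, chi)

# Check against the Laplacian assembled edge by edge.
L = np.zeros((5, 5))
for i, j, w in tree:
    L[i, i] += w
    L[j, j] += w
    L[i, j] -= w
    L[j, i] -= w
residual = float(np.abs(L @ v - np.array(chi)).max())
```

Every vertex is touched a constant number of times, which is where the O(n) bound comes from.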
Iterative Methods: Gradient Descent
Consider energy interpretation:
\[
\min_{x \perp \mathbf{1}} f(x), \qquad f(x) = \tfrac{1}{2} \cdot x^T L x - x^T \chi
\]
This is a convex optimization problem on which we can apply gradient descent techniques.
\[
\nabla f(x) = Lx - \chi, \qquad \nabla^2 f(x) = L
\]
Use degree norm $\|\cdot\|_D$:
\[
\lambda_2 D \preceq \nabla^2 f(x) \preceq 2D
\]
Construct iterative solutions $x^{(0)}, x^{(1)}, x^{(2)}, \ldots, x^{(t)}, \ldots$:
\[
x^{(t+1)} = x^{(t)} - h D^{-1} \nabla f(x^{(t)})
\]
where $h$ is the step length.
By standard gradient descent analysis, $h = 1/2$. For a quadratic function, the step length can also be optimized at every step.
\[
x^{(t+1)} = x^{(t)} - \tfrac{1}{2} D^{-1} \nabla f(x^{(t)})
\]
UNRAVEL RECURSION TO OBTAIN TRUNCATED SERIES: starting from $x^{(0)} = 0$, and writing $W = I - D^{-1} L$ for the random walk matrix (not the edge-weight matrix used earlier),
\[
x^{(t+1)} = \sum_{j=0}^{t} \left( \frac{I + W}{2} \right)^{j} \tfrac{1}{2} D^{-1} \chi
\]
which corresponds to t steps of random walk.
Bad Example for Gradient Descent
PATH: $\lambda_2 = \frac{1}{n^2}$
[Figure: random walk on a path between S and t, stepping each way with probability 1/2: GAMBLER’S RUIN]
Iterations necessary to converge to $\epsilon$-approximate solution:
\[
T = O\left( \frac{2}{\lambda_2} \log \frac{n}{\epsilon} \right) = O\left( n^2 \log \frac{n}{\epsilon} \right)
\]
Each iteration is a matrix-vector multiplication by $\frac{I + W}{2}$, requiring time $O(m)$.
\[
\mathrm{RunTime} = O\left( m n^2 \log \frac{n}{\epsilon} \right)
\]
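The slow convergence on paths can be observed directly. The sketch below, in Python with NumPy, runs the update $x \leftarrow x - \tfrac{1}{2} D^{-1}(Lx - \chi)$ on unweighted paths and counts iterations. The function names, the tolerance, and the choice to measure error in the $L$-norm against a pseudoinverse reference are assumptions made for illustration.

```python
import numpy as np

def path_laplacian(n):
    """Laplacian of the unweighted path on n vertices."""
    L = np.zeros((n, n))
    for i in range(n - 1):
        L[i, i] += 1.0
        L[i + 1, i + 1] += 1.0
        L[i, i + 1] -= 1.0
        L[i + 1, i] -= 1.0
    return L

def source_sink(n):
    """One unit of current in at the first vertex and out at the last."""
    chi = np.zeros(n)
    chi[0], chi[-1] = 1.0, -1.0
    return chi

def gradient_descent_iterations(L, chi, eps=1e-3, max_iters=200_000):
    """Count iterations of x <- x - (1/2) D^{-1} (L x - chi) until the
    relative error in the L-norm drops below eps."""
    d = np.diag(L)
    x_star = np.linalg.pinv(L) @ chi       # reference, used only to measure error
    norm_star = np.sqrt(x_star @ L @ x_star)
    x = np.zeros(len(chi))
    for t in range(1, max_iters + 1):
        x = x - 0.5 * (L @ x - chi) / d
        err = x - x_star
        # The L-norm ignores shifts along the all-ones vector.
        if np.sqrt(max(err @ L @ err, 0.0)) <= eps * norm_star:
            return t
    return max_iters

iters_10 = gradient_descent_iterations(path_laplacian(10), source_sink(10))
iters_20 = gradient_descent_iterations(path_laplacian(20), source_sink(20))
```

Doubling the length of the path should increase the iteration count by well over a factor of two, consistent with the $n^2$ dependence in the bound.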
Accelerated Gradient Descent
Consider energy interpretation:
\[
\min_{x \perp \mathbf{1}} \; \tfrac{1}{2} \cdot x^T L x - x^T \chi, \qquad \nabla f(x) = Lx - \chi, \qquad \lambda_2 D \preceq \nabla^2 f(x) \preceq 2D
\]
This is a convex optimization problem on which we can apply gradient descent techniques.
Accelerated gradient techniques achieve better convergence: CHEBYSHEV’S ITERATION, CONJUGATE GRADIENT
Improved iteration count:
\[
T = O\left( \sqrt{\frac{2}{\lambda_2}} \log \frac{n}{\epsilon} \right)
\]
Still no luck, but …
PATH: $\lambda_2 = \frac{1}{n^2}$
\[
T = O\left( n \log \frac{n}{\epsilon} \right), \qquad \mathrm{RunTime} = O\left( m n \log \frac{n}{\epsilon} \right)
\]
IT TAKES n STEPS FOR CHARGE TO TRAVEL ACROSS
Combining Representation and Iteration
Gaussian elimination and gradient methods seem complementary:
• Gaussian elimination is nearly-linear on paths (and trees), but slow on expanders.
• Gradient methods are nearly-linear time on expanders, but slow on paths.
MAIN APPROACH TO FAST SOLVERS:
Combining Representation and Iteration:
Combinatorial Preconditioning
Original system: $Lv = \chi$
Gradient methods fail when condition number is large.
IDEA: modify system to improve condition number:
\[
L_H^{+} L v = L_H^{+} \chi
\]
where H is a preconditioner graph.
DESIRED PROPERTIES OF H:
1. New matrix $L_H^{+} L$ is well conditioned (IMPROVES ITERATION COUNT)
2. Linear systems in $L_H$ can be solved quickly by Gaussian elimination
Preconditioning: Low-Stretch Trees
[Figure: graph G with a spanning tree T; an edge e and the tree path $P_e$ between its endpoints]
Stretch of e:
\[
\mathrm{st}(e) = \frac{1}{r_e} \sum_{e' \in P_e} r_{e'}
\]
Total stretch of T:
\[
\mathrm{st}(T) = \sum_{e \in E} \mathrm{st}(e)
\]
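The stretch definitions translate directly into code. Below is a deliberately simple sketch in plain Python (quadratic time, not tuned for large graphs); the helper names and the 4-cycle example with unit resistances are hypothetical.

```python
def tree_path_resistance(n, tree_edges, u, v):
    """Sum of resistances along the unique tree path from u to v."""
    adj = [[] for _ in range(n)]
    for i, j, r in tree_edges:
        adj[i].append((j, r))
        adj[j].append((i, r))
    # Depth-first search from u, recording the resistance of the tree path
    # to every vertex reached.
    dist = {u: 0.0}
    stack = [u]
    while stack:
        a = stack.pop()
        for b, r in adj[a]:
            if b not in dist:
                dist[b] = dist[a] + r
                stack.append(b)
    return dist[v]

def total_stretch(n, edges, tree_edges):
    """st(T): sum over all edges e of (tree path resistance of e) / r_e."""
    return sum(tree_path_resistance(n, tree_edges, i, j) / r
               for i, j, r in edges)

# Hypothetical example: a 4-cycle with unit resistances, given as (i, j, r_e).
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0)]
tree = edges[:3]            # the path 0-1-2-3 as spanning tree
stretch = total_stretch(4, edges, tree)
```

Each of the three tree edges has stretch 1, and the single off-tree edge (3, 0) has stretch 3 because its tree path uses all three tree edges.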
Preconditioning: Low-Stretch Trees
Q: How does this help?
A: Can help to bound condition number of system preconditioned by T
\[
1 \cdot I \preceq L_T^{+} L_G \preceq \mathrm{st}(T) \cdot I
\]
The lower bound is the spanning tree property; the upper bound is the low-stretch-tree property.
EASY PROOF:
\[
\lambda_{\max}(L_T^{+} L_G) \le \mathrm{Tr}(L_T^{+} L_G) = \sum_{e \in E} w_e \, \chi_e^T L_T^{+} \chi_e = \mathrm{st}(T)
\]
Trace bounds eigenvalue
Preconditioning: Partial Results
1. Preconditioning by low-stretch spanning trees [BH01]
\[
\mathrm{RunTime} = \tilde{O}\left( \sqrt{m \log n} \, \log \frac{n}{\epsilon} \right) \cdot \left[ O(m) + O(n) \right]
\]
The first factor comes from the condition number of $L_T^{+} L$; the bracket is the cost per iteration of multiplication by $L_T^{+} L$.
2. Improved analysis using Conjugate Gradient and Trace bound [SW09]
\[
\mathrm{RunTime} = \tilde{O}\left( m^{4/3} \, \mathrm{polylog}\, n \right)
\]
Recursive Preconditioning:
Spielman-Teng and Koutis-Miller-Peng
Nearly-linear-time algorithms at last: [ST04], [KMP10], [KMP11]
IDEA: Low-stretch trees do not provide good enough condition number. Use better preconditioner graphs $G_1$:
\[
L_{G_1}^{+} L v = L_{G_1}^{+} \chi
\]
PROBLEM: Hard to solve system for $G_1$.
SOLUTION: Solve recursively.
MAIN IDEA: At every recursive level, G becomes smaller as some low-…
\[
L_{G_1}^{+} L v = L_{G_1}^{+} \chi, \qquad L_{G_2}^{+} L_{G_1} x = L_{G_2}^{+} \cdots
\]
Our Algorithm
(at last)
Choosing the Right Representation
Solve for Electrical Flow
All algorithms discussed so far aim to solve the voltage problem
\[
\min_{v \perp \mathbf{1}} \; \tfrac{1}{2} \cdot v^T L v - v^T \chi
\]
This is particularly problematic for gradient-based methods.
Our algorithm targets the minimum-energy flow problem:
\[
\min \; f^T R f \quad \text{(minimize energy)} \qquad \text{s.t.} \quad B^T f = \chi
\]
This choice opens up different representational questions:
[Figure: example graph with every edge labeled 1]
Optimality Conditions for Flow Problem
1. Ohm’s Law: $\exists v : f = R^{-1} B v$
2. Kirchhoff’s Conservation Law: $B^T f = \chi$
We eliminate dependence on voltages in Ohm’s Law:
Kirchhoff’s Cycle Law (KCL): For any cycle C in G, the optimal electrical flow has
\[
\sum_{e \in C} r_e f_e = 0
\]
Fact: Ohm’s Law ↔ Kirchhoff’s Cycle Law
[Figure: example with unit resistances and flow values that obey KCL; voltages are set using a spanning tree]
KCL ensures that off-tree edges respect Ohm’s Law
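The direction "cycle law implies Ohm’s law" can be illustrated by setting voltages along a spanning tree and then checking an off-tree edge. This is a sketch in plain Python; the function `voltages_from_tree`, the triangle graph, and the flow values are hypothetical, with the flow chosen to satisfy the cycle law for one unit sent from vertex 0 to vertex 2.

```python
def voltages_from_tree(n, edges, flow, tree_idx, root=0):
    """Set voltages along a spanning tree using v_i - v_j = r_e * f_e.

    edges: list of (i, j, r); flow[k] is the flow on edge k in direction i -> j.
    tree_idx: indices of the edges that form the spanning tree.
    """
    adj = [[] for _ in range(n)]
    for k in tree_idx:
        i, j, r = edges[k]
        drop = r * flow[k]            # voltage drop from i to j
        adj[i].append((j, -drop))     # walking i -> j lowers the voltage
        adj[j].append((i, +drop))     # walking j -> i raises it
    v = {root: 0.0}
    stack = [root]
    while stack:
        a = stack.pop()
        for b, delta in adj[a]:
            if b not in v:
                v[b] = v[a] + delta
                stack.append(b)
    return [v[i] for i in range(n)]

# Triangle with unit resistances; this flow has zero signed r_e * f_e sum
# around the cycle 0 -> 1 -> 2 -> 0.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
flow = [1 / 3, 1 / 3, 2 / 3]
v = voltages_from_tree(3, edges, flow, tree_idx=[0, 1])

# Ohm's law on the off-tree edge (0, 2), computed from the tree voltages.
i, j, r = edges[2]
off_tree_flow = (v[i] - v[j]) / r
```

The flow predicted by Ohm’s law on the off-tree edge agrees with the given flow on that edge, which is exactly what the cycle law guarantees.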
Algorithmic Approach
1. Kirchhoff’s Cycle Law: For any cycle C in G, the optimal electrical flow has $\sum_{e \in C} r_e f_e = 0$
2. Kirchhoff’s Conservation Law: $B^T f = \chi$
ITERATIVELY FIX BY ADDING/REMOVING FLOW ALONG CYCLES
Initialize:
[Figure: an initial flow from s to t on the example graph]
Improve:
• Pick a cycle
• Fix it
• Repeat
[Figure: flow values on the example graph after several cycle updates]
Which cycle?
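One "fix it" step can be sketched as below, in plain Python. The update pushes the amount of flow around the cycle that makes the signed sum of $r_e f_e$ vanish; this closed form is the standard minimizer for a quadratic energy and is stated here as an assumption about how the slides' pictures are produced. The names `fix_cycle` and `energy` and the triangle example are hypothetical.

```python
def fix_cycle(flow, resistances, cycle):
    """Add flow around one cycle so that it satisfies the cycle law.

    cycle: list of (edge_index, sign), where sign is +1 if the cycle
    traverses the edge along its orientation and -1 otherwise.
    """
    imbalance = sum(s * resistances[k] * flow[k] for k, s in cycle)
    cycle_resistance = sum(resistances[k] for k, _ in cycle)
    delta = -imbalance / cycle_resistance
    new_flow = list(flow)
    for k, s in cycle:
        # Circulating delta around a cycle keeps every vertex balanced.
        new_flow[k] += s * delta
    return new_flow

def energy(flow, resistances):
    return sum(r * f * f for f, r in zip(flow, resistances))

# Triangle with edges e0 = (0, 1), e1 = (1, 2), e2 = (0, 2), unit resistances.
resistances = [1.0, 1.0, 1.0]
flow0 = [0.0, 0.0, 1.0]                  # all flow on the direct edge 0 -> 2
cycle = [(0, +1), (1, +1), (2, -1)]      # the cycle 0 -> 1 -> 2 -> 0
flow1 = fix_cycle(flow0, resistances, cycle)
```

After the update the cycle satisfies Kirchhoff’s Cycle Law and the energy has decreased, while the net flow at every vertex is unchanged.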
Choosing the Right Representation:
Cycle Space
[Figure: the affine set $\{f : B^T f = \chi\}$ containing the iterates $f_0, f_1, f_2$ and the optimum $f^*$, which satisfies $f^* = R^{-1} B v$; the steps between iterates are cycle updates]
An affine translation of span of cycles (cycle space $C_G$): $C_G = \{c : B^T c = 0\}$
Choosing the Right Representation:
Basis of Cycle Space
Fact: Fundamental cycles of spanning tree give a basis of cycle space
Fix spanning tree T
[Figure: an off-tree edge e and its fundamental cycle c]
Our New Representation
ORIGINAL PROBLEM
\[
\forall c \in C_G : \; c^T R (f_0 + f) = 0, \qquad B^T f = 0
\]
Use spanning tree T basis for $C_G$
\[
\forall e \in E \setminus T : \; c_e^T R (f_0 + f) = 0 \qquad (m - n + 1 \text{ constraints})
\]
\[
f = \sum_{e \in E \setminus T} f_e c_e \qquad (m - n + 1 \text{ variables})
\]
Iteratively fix fundamental cycles of T
WHAT ITERATIVE METHOD IS THIS?
Choosing the Right Iteration
Gradient Descent?
[Figure: the affine set $\{f : B^T f = \chi\}$ with the basis of fundamental cycles, the current iterate $f_t$, the gradient $\nabla f$, and the optimum $f^* = R^{-1} B v$]
Gradient descent would update all cycles at once.
Choosing the Right Iteration
Randomized Coordinate Descent!
Move along one coordinate direction at a time. Choose basis vector $c_e$ w.p. proportional to
\[
\frac{c_e^T R c_e}{r_e} = \mathrm{st}(e) + 1
\]
Convergence analysis?
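Putting the pieces together, the loop below samples fundamental cycles with probability proportional to $\mathrm{st}(e) + 1$ and fixes each sampled cycle. It is a simplified sketch in Python, not the nearly-linear-time implementation: each update walks the whole cycle, and the function names, example graph, iteration count, and seed are all assumptions made for illustration. NumPy is used only to compute a reference flow for comparison.

```python
import random
import numpy as np

def root_tree(n, edges, tree_idx, root=0):
    """parent[v] = (parent vertex, edge index, sign of the step v -> parent)."""
    adj = [[] for _ in range(n)]
    for k in tree_idx:
        i, j, _ = edges[k]
        adj[i].append((j, k, +1))    # stepping i -> j follows the orientation
        adj[j].append((i, k, -1))
    parent, depth, stack = {root: None}, {root: 0}, [root]
    while stack:
        a = stack.pop()
        for b, k, s in adj[a]:
            if b not in parent:
                parent[b] = (a, k, -s)
                depth[b] = depth[a] + 1
                stack.append(b)
    return parent, depth

def tree_path(u, v, parent, depth):
    """Tree path from u to v as a list of (edge_index, sign)."""
    up, down = [], []
    while u != v:
        if depth[u] >= depth[v]:
            p, k, s = parent[u]
            up.append((k, s))
            u = p
        else:
            p, k, s = parent[v]
            down.append((k, -s))     # this step is walked parent -> v
            v = p
    return up + down[::-1]

def cycle_solver(n, edges, tree_idx, chi, iters=2000, seed=0):
    rng = random.Random(seed)
    r = [e[2] for e in edges]
    parent, depth = root_tree(n, edges, tree_idx)
    # Initial flow f_0: route every demand to the root along the tree.
    f = [0.0] * len(edges)
    for u in range(n):
        for k, s in tree_path(u, 0, parent, depth):
            f[k] += s * chi[u]
    # Fundamental cycles and sampling weights c_e^T R c_e / r_e = st(e) + 1.
    tree_set = set(tree_idx)
    cycles, weights = [], []
    for k, (i, j, _) in enumerate(edges):
        if k not in tree_set:
            cyc = [(k, +1)] + tree_path(j, i, parent, depth)
            cycles.append(cyc)
            weights.append(sum(r[a] for a, _ in cyc) / r[k])
    if not cycles:
        return f
    for _ in range(iters):
        cyc = rng.choices(cycles, weights=weights)[0]
        delta = -sum(s * r[a] * f[a] for a, s in cyc) / sum(r[a] for a, _ in cyc)
        for a, s in cyc:
            f[a] += s * delta
    return f

# Hypothetical example: edges as (i, j, r_e), spanning tree = first three edges.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0), (0, 2, 2.0)]
chi = [1.0, 0.0, 0.0, -1.0]
f = cycle_solver(4, edges, [0, 1, 2], chi)

# Reference electrical flow from the dense pseudoinverse.
B = np.zeros((5, 4))
for k, (i, j, _) in enumerate(edges):
    B[k, i], B[k, j] = 1.0, -1.0
W = np.diag([1.0 / r_e for _, _, r_e in edges])
f_ref = W @ B @ np.linalg.pinv(B.T @ W @ B) @ np.array(chi)
max_error = float(np.abs(np.array(f) - f_ref).max())
```

On this small example the cycle updates converge to the same flow as the pseudoinverse computation. A fast implementation would additionally need a data structure that applies each tree-path update in polylogarithmic time, which this sketch does not attempt.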
Coordinate Descent: Convergence
\[
\min_{x \perp \mathbf{1}} \; \tfrac{1}{2} \cdot x^T A x - b^T x
\]
In our case:
\[
\forall e \in E \setminus T : \; c_e^T R (f_0 + f) = 0, \qquad f = \sum_{e \in E \setminus T} f_e c_e
\]
Iteration count for gradient descent:
\[
T = O\left( \frac{\lambda_{\max}(A)}{\lambda_{\min}(A)} \log \frac{n}{\epsilon} \right)
\]
Iteration count for randomized coordinate descent:
\[
T = O\left( \frac{\mathrm{Tr}(A)}{\lambda_{\min}(A)} \log \frac{n}{\epsilon} \right)
\]
Convergence
\[
\lambda_{\min}(A) = 1, \qquad \lambda_{\max}(A) \le \mathrm{Tr}(A) = \mathrm{st}(T) + O(n)
\]
Same calculation as for voltage preconditioning.
New Condition Numbers
Tree stretch: $\mathrm{Tr}(A) = \mathrm{st}(T) + O(n)$
Worse by a factor of $m - n + 1$ in our case. We start with a trace guarantee,
Final Convergence
\[
\min_{x \perp \mathbf{1}} \; \tfrac{1}{2} \cdot x^T A x - b^T x
\]
\[
\forall e \in E \setminus T : \; c_e^T R (f_0 + f) = 0
\]
f = e∈E