Sketching as a Tool for Numerical Linear Algebra
(Graph Sparsification)
David P. Woodruff, presented by Sepehr Assadi
o(n) Big Data Reading Group, University of Pennsylvania
April 2015
Goal
New survey by David Woodruff:
- Sketching as a Tool for Numerical Linear Algebra
Topics:
- Subspace Embeddings
- Least Squares Regression
- Least Absolute Deviation Regression
- Low Rank Approximation
- Graph Sparsification
- Sketching Lower Bounds
Matrix Compression
Previously:
- Compress a matrix A ∈ R^{n×d} using linear sketches
- Example: subspace embedding

Definition (ℓ2-subspace embedding)
A (1 ± ε) ℓ2-subspace embedding for a matrix A ∈ R^{n×d} is a matrix S for which, for all x ∈ R^d:
‖SAx‖₂² = (1 ± ε) · ‖Ax‖₂²

- Typically SA is an Õ(d²)-size matrix
- Techniques:
  - Using random matrices S (Gaussian, sign matrices, etc.)
  - Using leverage score sampling
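The Gaussian construction is easy to sanity-check numerically. Below is a minimal numpy sketch (not from the survey; the sketch size s, the constant 8, and the test directions are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, eps = 2000, 10, 0.25
A = rng.standard_normal((n, d))

# Gaussian sketch: s = O(d / eps^2) rows, entries N(0, 1/s).
s = int(np.ceil(8 * d / eps**2))           # illustrative constant
S = rng.standard_normal((s, n)) / np.sqrt(s)
SA = S @ A                                  # the compressed s x d matrix

# Check the subspace-embedding guarantee on a few random directions x in R^d.
for _ in range(5):
    x = rng.standard_normal(d)
    ratio = np.linalg.norm(SA @ x) ** 2 / np.linalg.norm(A @ x) ** 2
    assert 1 - eps <= ratio <= 1 + eps, ratio
```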
Graph Compression
Today:
- Compress a graph G(V, E) using linear sketches
- Example: sparsification

Definition (cut sparsifier)
A (1 ± ε) cut sparsifier of a graph G(V, E) is a weighted subgraph H of G such that for any S ⊆ V:
W_H(S, S̄) = (1 ± ε) · W_G(S, S̄)
*W_G(S, S̄) is the weight of the cut between S and S̄ in G

- Typically H is an Õ(n)-size graph
Graph Compression (cont.)
Laplacian matrix of a graph G(V, E): L ∈ R^{n×n}
- L = D − A, with degree matrix D ∈ R^{n×n} and adjacency matrix A
- L = Σ_{e∈E} L_e for edge-Laplacian matrices L_e ∈ R^{n×n}
- L = BᵀB for the edge-vertex incidence matrix B ∈ R^{(n choose 2)×n}

For a set of vertices S ⊆ V with characteristic vector x ∈ {0, 1}^n:
xᵀLx = Σ_{e=(u,v)∈E} (x_u − x_v)² = δ_G(S, S̄)

Any cut sparsifier H of G has a Laplacian L̃ such that:
∀x ∈ {0, 1}^n: xᵀL̃x = (1 ± ε) · xᵀLx
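The identity xᵀLx = δ_G(S, S̄) is easy to verify numerically; here is a minimal sketch assuming numpy (the graph, edge orientation, and subset S are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# A small random graph on n vertices, given as a list of edges.
n = 8
edges = [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < 0.5]

# Edge-vertex incidence matrix B: the row for edge (u, v) has +1 at u and
# -1 at v (any fixed orientation works, since L = B^T B squares the rows).
B = np.zeros((len(edges), n))
for i, (u, v) in enumerate(edges):
    B[i, u], B[i, v] = 1.0, -1.0

L = B.T @ B   # the graph Laplacian: D - A

# Characteristic vector of a vertex subset S.
S = {0, 2, 5}
x = np.array([1.0 if u in S else 0.0 for u in range(n)])

cut = sum(1 for (u, v) in edges if (u in S) != (v in S))
assert np.isclose(x @ L @ x, cut)   # x^T L x = weight of the cut (S, S-bar)
```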
Spectral Sparsifier
Definition (spectral sparsifier)
A (1 ± ε) spectral sparsifier of a graph G(V, E) is a weighted subgraph H of G such that for any x ∈ R^n:
xᵀL̃x = (1 ± ε) · xᵀLx
*L (resp. L̃) is the Laplacian of G (resp. H)
Originally proposed by Spielman and Teng [ST11]:
- Õ(m) construction time and Õ(n) size.
Spectral vs Cut Sparsifiers
Difference between spectral and cut sparsifiers:
(Figure from [ST11])
Graph vs Matrix Compression
Matrix compression A ∈ R^{n×d}
- A is a tall matrix, i.e., n ≫ d
- Compression guarantee of the form Õ(d²)

Graph compression L ∈ R^{n×n}
- L is a square matrix

But ...
- L = BᵀB and B is tall
- xᵀLx = xᵀBᵀBx = ‖Bx‖₂²
- Spectral sparsification is a subspace embedding for B!
Spectral Sparsification and Subspace Embedding
A sampling-based subspace embedding: leverage score sampling

Leverage score of the i-th row of A = UΣVᵀ:
ℓ_i = ‖U_(i)‖₂²

Leverage score sampling for A ∈ R^{m×d}:
- S_{s×m} = D_{s×m} · Ω_{m×m}
- D_{s×m}: rescaling matrix (according to the sampling probabilities)
- Ω_{m×m}: sampling matrix (based on leverage scores)

Theorem (LS-sampling theorem)
For s = Θ(d log d / (βε²)), with probability 0.99, S_{s×m} is a (1 ± ε) ℓ2-subspace embedding for A (here β ≤ 1 is the factor by which the sampling probabilities may underestimate the leverage scores).
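A minimal numpy sketch of this sampling scheme (the constant in s and the choice of exact probabilities p_i = ℓ_i/d, i.e., β = 1, are illustrative assumptions, not the theorem's full generality):

```python
import numpy as np

rng = np.random.default_rng(2)

m, d = 5000, 8
A = rng.standard_normal((m, d)) * np.linspace(1, 20, m)[:, None]  # non-uniform rows

# Leverage score of row i: squared 2-norm of the i-th row of U, where A = U Sigma V^T.
U, _, _ = np.linalg.svd(A, full_matrices=False)
lev = np.sum(U**2, axis=1)            # l_i = ||U_(i)||_2^2, sums to d

# Sample s rows with probabilities p_i = l_i / d and rescale by 1/sqrt(s * p_i),
# so that E[(SA)^T (SA)] = A^T A.
eps = 0.3
s = int(np.ceil(8 * d * np.log(d) / eps**2))   # illustrative constant
p = lev / d
idx = rng.choice(m, size=s, p=p)
SA = A[idx] / np.sqrt(s * p[idx])[:, None]     # = D * Omega applied to A

# Subspace-embedding check on random directions.
for _ in range(5):
    x = rng.standard_normal(d)
    ratio = np.linalg.norm(SA @ x) ** 2 / np.linalg.norm(A @ x) ** 2
    assert 1 - eps <= ratio <= 1 + eps, ratio
```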
Spectral Sparsification and Subspace Embedding (cont.)
Theorem
Sampling and weighting Õ(ε⁻²n) edges from G(V, E) according to leverage scores of B ∈ R^{(n choose 2)×n} results in a (1 ± ε) spectral sparsifier of G.
Proof.
For any x ∈ R^n, xᵀLx = ‖Bx‖₂², so LS-sampling yields a subspace embedding of B; note that the leverage scores of the rows of B are exactly the effective resistances of the edges.
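An end-to-end illustration of this theorem, assuming numpy (the constant in s is arbitrary, and at this toy size the sample count exceeds the edge count; the point is the reweighting and the guarantee):

```python
import numpy as np

rng = np.random.default_rng(3)

# Random graph and its incidence matrix B (as on the earlier slide).
n = 30
edges = [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < 0.4]
m = len(edges)
B = np.zeros((m, n))
for i, (u, v) in enumerate(edges):
    B[i, u], B[i, v] = 1.0, -1.0
L = B.T @ B

# Leverage score of edge e's row of B = effective resistance: l_e = b_e^T L^+ b_e.
Lpinv = np.linalg.pinv(L)
lev = np.array([B[i] @ Lpinv @ B[i] for i in range(m)])  # sums to n - 1 if connected

# Sample edges proportional to leverage scores; reweight to preserve expectations.
# (On large dense graphs s = O(eps^-2 n log n) << m; here s > m, which is fine.)
eps = 0.5
s = int(np.ceil(8 * n * np.log(n) / eps**2))   # illustrative constant
p = lev / lev.sum()
idx = rng.choice(m, size=s, p=p)
weights = np.zeros(m)
for i in idx:
    weights[i] += 1.0 / (s * p[i])             # accumulated edge weights in H

L_H = B.T @ np.diag(weights) @ B               # Laplacian of the sparsifier H

# Spectral-sparsifier check: x^T L_H x = (1 +- eps) x^T L x for random x.
for _ in range(5):
    x = rng.standard_normal(n)
    ratio = (x @ L_H @ x) / (x @ L @ x)
    assert 1 - eps <= ratio <= 1 + eps, ratio
```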
Linear Sketching for Spectral Sparsification
Theorem ([KLM+14])
There exists a distribution on ε⁻² polylog(n) × (n choose 2) dimensional matrices S such that, with high probability, a (1 ± ε) spectral sparsifier of G can be recovered from S · B.
Key feature: linear sketch
- First single-pass spectral sparsifier for dynamic graph streams [KLM+14]
Introduction and Removal of Artificial Bases
Theorem ([LMP13])
Let K be any PSD matrix with maximum eigenvalue λ_u and minimum (non-zero) eigenvalue λ_l, and let d = ⌈log(λ_u/λ_l)⌉. For ℓ ∈ {0, . . . , d}, define:
γ(ℓ) = λ_u / 2^ℓ
Consider the sequence of PSD matrices K(0), . . . , K(d), where:
K(ℓ) = K + γ(ℓ) · I
Then:
1. K ⪯_R K(d) ⪯_R 2K
2. K(ℓ) ⪯ K(ℓ−1) ⪯ 2K(ℓ) for ℓ ≥ 1
*⪯_R denotes the Loewner order restricted to the range of K
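Both claims are easy to verify numerically for a concrete PSD matrix. A minimal sketch assuming numpy (K is taken full rank, so the restricted order coincides with the usual Loewner order):

```python
import numpy as np

rng = np.random.default_rng(4)

# A random full-rank PSD matrix K.
n = 12
M = rng.standard_normal((n, n))
K = M @ M.T

eigs = np.linalg.eigvalsh(K)
lam_l, lam_u = eigs.min(), eigs.max()
d = int(np.ceil(np.log2(lam_u / lam_l)))

def Kl(l):
    return K + (lam_u / 2**l) * np.eye(n)    # K(l) = K + gamma(l) * I

def loewner_leq(P, Q, tol=1e-9):
    # P <= Q in the Loewner order iff Q - P is PSD.
    return np.linalg.eigvalsh(Q - P).min() >= -tol

# Claim 1: K <= K(d) <= 2K.
assert loewner_leq(K, Kl(d)) and loewner_leq(Kl(d), 2 * K)

# Claim 2: K(l) <= K(l-1) <= 2K(l) for l >= 1.
for l in range(1, d + 1):
    assert loewner_leq(Kl(l), Kl(l - 1)) and loewner_leq(Kl(l - 1), 2 * Kl(l))
```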
Constructing a Spectral Sparsifier
Use the previous theorem!
- d = O(log n) for Laplacian matrices, since λ_u/λ_l is polynomially bounded
- Leverage scores of K(ℓ) ≈ leverage scores of K(ℓ + 1)
- K(0) = K + λ_u · I is well-conditioned, so its leverage scores are easy to approximate; walking down the chain yields leverage scores for K(d) ≈ K
Proof.
On the board.
Sparse Recovery Algorithm
Theorem ([GLPS12])
There exists an algorithm D and a distribution on matrices Φ of dimension ε⁻² polylog(n) × n, such that for any x ∈ R^n, with high probability, D(Φx, i) can detect whether |x_i| = Ω(‖x‖) or |x_i| = o(‖x‖).
Heavy hitter detection!
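The guarantee above is in the spirit of classical heavy-hitter sketches. Here is a minimal CountSketch illustration of that primitive assuming numpy (this is not the [GLPS12] algorithm itself; the parameters r, b and the planted coordinate are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)

n = 10_000
x = rng.standard_normal(n) * 0.1
x[42] = 50.0                       # a single heavy coordinate

# CountSketch: r independent hash rows of b buckets with random signs.
# Each row-block of Phi has one nonzero (+-1) per column, so the table
# below is exactly Phi @ x, a linear sketch of x.
r, b = 7, 50
bucket = rng.integers(0, b, size=(r, n))       # bucket of each coordinate, per row
sign = rng.choice([-1.0, 1.0], size=(r, n))

sketch = np.zeros((r, b))                      # Phi @ x, stored compactly
for j in range(r):
    np.add.at(sketch[j], bucket[j], sign[j] * x)

def estimate(i):
    # Median of the r signed bucket values coordinate i hashed into.
    return np.median(sign[:, i] * sketch[range(r), bucket[:, i]])

# The heavy coordinate stands out; typical coordinates estimate near 0.
assert abs(estimate(42) - x[42]) < 0.5 * abs(x[42])
print(estimate(42), estimate(7))
```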
Constructing a Spectral Sparsifier via Linear Sketches
1. For i = 1, . . . , O(log n):
   (a) Maintain Φ · D_i · B (Φ is the sparse recovery matrix, D_i ∈ R^{(n choose 2)×(n choose 2)} is diagonal)
2. Repeat O(log n) times.
We are done!
Proof Sketch.
Enough information to traverse the hierarchy of K(0) to K(d). At each level ℓ, compute Φ · D_i · B · K(ℓ)† · b_e for every edge e.
*K(ℓ)† denotes the Moore–Penrose pseudoinverse of K(ℓ)
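Why heavy-hitter detection suffices here: for x = B · L† · b_e, the e-th coordinate and ‖x‖₂² both equal the effective resistance R_e, so |x_e|/‖x‖₂ = √R_e and high-resistance edges are exactly the heavy hitters (the algorithm applies the same idea with K(ℓ)† along the chain instead of L†). A minimal numerical check of this identity, assuming numpy (graph chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(6)

# Graph, incidence matrix, Laplacian (as before).
n = 20
edges = [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < 0.3]
m = len(edges)
B = np.zeros((m, n))
for i, (u, v) in enumerate(edges):
    B[i, u], B[i, v] = 1.0, -1.0
L = B.T @ B
Lpinv = np.linalg.pinv(L)

# Key identity: for x = B L^+ b_e,
#   x_e       = b_e^T L^+ b_e         = R_e  (effective resistance of e), and
#   ||x||_2^2 = b_e^T L^+ L L^+ b_e   = R_e  as well (since L^+ L L^+ = L^+),
# so the heavy coordinates of x are the high-resistance edges, which a
# sparse-recovery sketch of B can detect.
for e in range(m):
    x = B @ (Lpinv @ B[e])
    R_e = B[e] @ Lpinv @ B[e]
    assert np.isclose(x[e], R_e)
    assert np.isclose(x @ x, R_e)
```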
Questions?
Anna C. Gilbert, Yi Li, Ely Porat, and Martin J. Strauss.
Approximate sparse recovery: Optimizing time and measurements.
SIAM J. Comput., 41(2):436–453, 2012.
Michael Kapralov, Yin Tat Lee, Cameron Musco, Christopher Musco, and Aaron Sidford.
Single pass spectral sparsification in dynamic streams.
In 55th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2014, pages 561–570, 2014.
Mu Li, Gary L. Miller, and Richard Peng.
Iterative row sampling.
In 54th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2013, pages 127–136, 2013.
Daniel A. Spielman and Shang-Hua Teng.
Spectral sparsification of graphs.
SIAM J. Comput., 40(4):981–1025, 2011.