Sketching as a Tool for Numerical Linear Algebra
(Graph Sparsification)
David P. Woodruff, presented by Sepehr Assadi
o(n) Big Data Reading Group, University of Pennsylvania
April 2015
Goal
New survey by David Woodruff:
- Sketching as a Tool for Numerical Linear Algebra
Topics:
- Subspace Embeddings
- Least Squares Regression
- Least Absolute Deviation Regression
- Low Rank Approximation
- Graph Sparsification
- Sketching Lower Bounds
Matrix Compression
Previously:
- Compress a matrix A ∈ R^{n×d} using linear sketches
- Example: subspace embedding

Definition (ℓ2-subspace embedding)
A (1 ± ε) ℓ2-subspace embedding for a matrix A ∈ R^{n×d} is a matrix S for which, for all x ∈ R^d:
‖SAx‖₂² = (1 ± ε) · ‖Ax‖₂²

- Typically SA is an Õ(d²)-size matrix
- Techniques:
  - Using random matrices S (Gaussian, sign matrices, etc.)
  - Using leverage score sampling
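The Gaussian construction is easy to sanity-check numerically. Below is a minimal numpy sketch (not from the survey; the sketch size s, the constant 8, and the test directions are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, eps = 2000, 10, 0.25
A = rng.standard_normal((n, d))

# Gaussian sketch: s = O(d / eps^2) rows, entries N(0, 1/s).
s = int(np.ceil(8 * d / eps**2))           # illustrative constant
S = rng.standard_normal((s, n)) / np.sqrt(s)
SA = S @ A                                  # the compressed s x d matrix

# Check the subspace-embedding guarantee on a few random directions x in R^d.
for _ in range(5):
    x = rng.standard_normal(d)
    ratio = np.linalg.norm(SA @ x) ** 2 / np.linalg.norm(A @ x) ** 2
    assert 1 - eps <= ratio <= 1 + eps, ratio
```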
Graph Compression
Today:
- Compress a graph G(V, E) using linear sketches
- Example: sparsification

Definition (cut sparsifier)
A (1 ± ε) cut sparsifier of a graph G(V, E) is a weighted subgraph H of G such that for any S ⊆ V:
W_H(S, S̄) = (1 ± ε) · W_G(S, S̄)
*W_G(S, S̄) is the weight of the cut between S and S̄ in G

- Typically H is an Õ(n)-size graph
Graph Compression (cont.)
Laplacian matrix of a graph G(V, E): L ∈ R^{n×n}
- L = D − A, with degree matrix D ∈ R^{n×n} and adjacency matrix A
- L = Σ_{e∈E} L_e for edge-Laplacian matrices L_e ∈ R^{n×n}
- L = BᵀB for the edge-vertex incidence matrix B ∈ R^{(n choose 2)×n}

For a set of vertices S ⊆ V with characteristic vector x ∈ {0, 1}^n:
xᵀLx = Σ_{e=(u,v)∈E} (x_u − x_v)² = δ_G(S, S̄)

Any cut sparsifier H of G has a Laplacian L̃ such that:
∀x ∈ {0, 1}^n: xᵀL̃x = (1 ± ε) · xᵀLx
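The identity xᵀLx = δ_G(S, S̄) is easy to verify numerically; here is a minimal sketch assuming numpy (the graph, edge orientation, and subset S are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# A small random graph on n vertices, given as a list of edges.
n = 8
edges = [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < 0.5]

# Edge-vertex incidence matrix B: the row for edge (u, v) has +1 at u and
# -1 at v (any fixed orientation works, since L = B^T B squares the rows).
B = np.zeros((len(edges), n))
for i, (u, v) in enumerate(edges):
    B[i, u], B[i, v] = 1.0, -1.0

L = B.T @ B   # the graph Laplacian: D - A

# Characteristic vector of a vertex subset S.
S = {0, 2, 5}
x = np.array([1.0 if u in S else 0.0 for u in range(n)])

cut = sum(1 for (u, v) in edges if (u in S) != (v in S))
assert np.isclose(x @ L @ x, cut)   # x^T L x = weight of the cut (S, S-bar)
```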
Spectral Sparsifier
Definition (spectral sparsifier)
A (1 ± ε) spectral sparsifier of a graph G(V, E) is a weighted subgraph H of G such that for any x ∈ R^n:
xᵀL̃x = (1 ± ε) · xᵀLx
*L (resp. L̃) is the Laplacian of G (resp. H)
Originally proposed by Spielman and Teng [ST11]:
- Õ(m) construction time and Õ(n) size.
Spectral vs Cut Sparsifiers
Difference between spectral and cut sparsifiers:
(Figure from [ST11])
Graph vs Matrix Compression
Matrix compression A ∈ R^{n×d}
- A is a tall matrix, i.e., n ≫ d
- Compression guarantee of the form Õ(d²)

Graph compression L ∈ R^{n×n}
- L is a square matrix

But ...
- L = BᵀB and B is tall
- xᵀLx = xᵀBᵀBx = ‖Bx‖₂²
- Spectral sparsification is a subspace embedding for B!
Spectral Sparsification and Subspace Embedding
A sampling-based subspace embedding: leverage score sampling

Leverage score of the i-th row of A = UΣVᵀ:
ℓ_i = ‖U_(i)‖₂²

Leverage score sampling for A ∈ R^{m×d}:
- S_{s×m} = D_{s×m} · Ω_{m×m}
- D_{s×m}: rescaling matrix (according to the sampling probabilities)
- Ω_{m×m}: sampling matrix (based on leverage scores)

Theorem (LS-sampling theorem)
For s = Θ(d log d / (βε²)), with probability 0.99, S_{s×m} is a (1 ± ε) ℓ2-subspace embedding for A (here β ≤ 1 is the factor by which the sampling probabilities may underestimate the leverage scores).
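A minimal numpy sketch of this sampling scheme (the constant in s and the choice of exact probabilities p_i = ℓ_i/d, i.e., β = 1, are illustrative assumptions, not the theorem's full generality):

```python
import numpy as np

rng = np.random.default_rng(2)

m, d = 5000, 8
A = rng.standard_normal((m, d)) * np.linspace(1, 20, m)[:, None]  # non-uniform rows

# Leverage score of row i: squared 2-norm of the i-th row of U, where A = U Sigma V^T.
U, _, _ = np.linalg.svd(A, full_matrices=False)
lev = np.sum(U**2, axis=1)            # l_i = ||U_(i)||_2^2, sums to d

# Sample s rows with probabilities p_i = l_i / d and rescale by 1/sqrt(s * p_i),
# so that E[(SA)^T (SA)] = A^T A.
eps = 0.3
s = int(np.ceil(8 * d * np.log(d) / eps**2))   # illustrative constant
p = lev / d
idx = rng.choice(m, size=s, p=p)
SA = A[idx] / np.sqrt(s * p[idx])[:, None]     # = D * Omega applied to A

# Subspace-embedding check on random directions.
for _ in range(5):
    x = rng.standard_normal(d)
    ratio = np.linalg.norm(SA @ x) ** 2 / np.linalg.norm(A @ x) ** 2
    assert 1 - eps <= ratio <= 1 + eps, ratio
```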
Spectral Sparsification and Subspace Embedding (cont.)
Theorem
Sampling and weighting Õ(ε⁻²n) edges from G(V, E) according to leverage scores of B ∈ R^{(n choose 2)×n} results in a (1 ± ε) spectral sparsifier of G.
Proof.
For any x ∈ R^n, xᵀLx = ‖Bx‖₂², so LS-sampling yields a subspace embedding of B; note that the leverage scores of the rows of B are exactly the effective resistances of the edges.
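An end-to-end illustration of this theorem, assuming numpy (the constant in s is arbitrary, and at this toy size the sample count exceeds the edge count; the point is the reweighting and the guarantee):

```python
import numpy as np

rng = np.random.default_rng(3)

# Random graph and its incidence matrix B (as on the earlier slide).
n = 30
edges = [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < 0.4]
m = len(edges)
B = np.zeros((m, n))
for i, (u, v) in enumerate(edges):
    B[i, u], B[i, v] = 1.0, -1.0
L = B.T @ B

# Leverage score of edge e's row of B = effective resistance: l_e = b_e^T L^+ b_e.
Lpinv = np.linalg.pinv(L)
lev = np.array([B[i] @ Lpinv @ B[i] for i in range(m)])  # sums to n - 1 if connected

# Sample edges proportional to leverage scores; reweight to preserve expectations.
# (On large dense graphs s = O(eps^-2 n log n) << m; here s > m, which is fine.)
eps = 0.5
s = int(np.ceil(8 * n * np.log(n) / eps**2))   # illustrative constant
p = lev / lev.sum()
idx = rng.choice(m, size=s, p=p)
weights = np.zeros(m)
for i in idx:
    weights[i] += 1.0 / (s * p[i])             # accumulated edge weights in H

L_H = B.T @ np.diag(weights) @ B               # Laplacian of the sparsifier H

# Spectral-sparsifier check: x^T L_H x = (1 +- eps) x^T L x for random x.
for _ in range(5):
    x = rng.standard_normal(n)
    ratio = (x @ L_H @ x) / (x @ L @ x)
    assert 1 - eps <= ratio <= 1 + eps, ratio
```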
Linear Sketching for Spectral Sparsification
Theorem ([KLM+14])
There exists a distribution on ε⁻² polylog(n) × (n choose 2) dimensional matrices S such that, with high probability, a (1 ± ε) spectral sparsifier of G can be recovered from S · B.
Key feature: linear sketch
- First single-pass spectral sparsifier for dynamic graph streams [KLM+14]
Introduction and Removal of Artificial Bases
Theorem ([LMP13])
Let K be any PSD matrix with maximum eigenvalue λ_u and minimum (non-zero) eigenvalue λ_l, and let d = ⌈log(λ_u/λ_l)⌉. For ℓ ∈ {0, . . . , d}, define:
γ(ℓ) = λ_u / 2^ℓ
Consider the sequence of PSD matrices K(0), . . . , K(d), where:
K(ℓ) = K + γ(ℓ) · I
Then:
1. K ⪯_R K(d) ⪯_R 2K
2. K(ℓ) ⪯ K(ℓ−1) ⪯ 2K(ℓ) for ℓ ≥ 1
*⪯_R denotes the Loewner order restricted to the range of K
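Both claims are easy to verify numerically for a concrete PSD matrix. A minimal sketch assuming numpy (K is taken full rank, so the restricted order coincides with the usual Loewner order):

```python
import numpy as np

rng = np.random.default_rng(4)

# A random full-rank PSD matrix K.
n = 12
M = rng.standard_normal((n, n))
K = M @ M.T

eigs = np.linalg.eigvalsh(K)
lam_l, lam_u = eigs.min(), eigs.max()
d = int(np.ceil(np.log2(lam_u / lam_l)))

def Kl(l):
    return K + (lam_u / 2**l) * np.eye(n)    # K(l) = K + gamma(l) * I

def loewner_leq(P, Q, tol=1e-9):
    # P <= Q in the Loewner order iff Q - P is PSD.
    return np.linalg.eigvalsh(Q - P).min() >= -tol

# Claim 1: K <= K(d) <= 2K.
assert loewner_leq(K, Kl(d)) and loewner_leq(Kl(d), 2 * K)

# Claim 2: K(l) <= K(l-1) <= 2K(l) for l >= 1.
for l in range(1, d + 1):
    assert loewner_leq(Kl(l), Kl(l - 1)) and loewner_leq(Kl(l - 1), 2 * Kl(l))
```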
Constructing a Spectral Sparsifier
Use the previous theorem!
- d = O(log n) for Laplacian matrices, since λ_u/λ_l is polynomially bounded
- Leverage scores of K(ℓ) ≈ leverage scores of K(ℓ + 1)
- K(0) = K + λ_u · I is well-conditioned, so its leverage scores are easy to approximate; walking down the chain yields leverage scores for K(d) ≈ K
Proof.
On the board.
Sparse Recovery Algorithm
Theorem ([GLPS12])
There exists an algorithm D and a distribution on matrices Φ of dimension ε⁻² polylog(n) × n, such that for any x ∈ R^n, with high probability, D(Φx, i) can detect whether |x_i| = Ω(‖x‖) or |x_i| = o(‖x‖).
Heavy hitter detection!
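The guarantee above is in the spirit of classical heavy-hitter sketches. Here is a minimal CountSketch illustration of that primitive assuming numpy (this is not the [GLPS12] algorithm itself; the parameters r, b and the planted coordinate are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)

n = 10_000
x = rng.standard_normal(n) * 0.1
x[42] = 50.0                       # a single heavy coordinate

# CountSketch: r independent hash rows of b buckets with random signs.
# Each row-block of Phi has one nonzero (+-1) per column, so the table
# below is exactly Phi @ x, a linear sketch of x.
r, b = 7, 50
bucket = rng.integers(0, b, size=(r, n))       # bucket of each coordinate, per row
sign = rng.choice([-1.0, 1.0], size=(r, n))

sketch = np.zeros((r, b))                      # Phi @ x, stored compactly
for j in range(r):
    np.add.at(sketch[j], bucket[j], sign[j] * x)

def estimate(i):
    # Median of the r signed bucket values coordinate i hashed into.
    return np.median(sign[:, i] * sketch[range(r), bucket[:, i]])

# The heavy coordinate stands out; typical coordinates estimate near 0.
assert abs(estimate(42) - x[42]) < 0.5 * abs(x[42])
print(estimate(42), estimate(7))
```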
Constructing a Spectral Sparsifier via Linear Sketches
1. For i = 1, . . . , O(log n):
   (a) Maintain Φ · D_i · B (Φ is the sparse recovery matrix, D_i ∈ R^{(n choose 2)×(n choose 2)} is diagonal)
2. Repeat O(log n) times.
We are done!
Proof Sketch.
Enough information to traverse the hierarchy of K(0) to K(d). At each level ℓ, compute Φ · D_i · B · K(ℓ)† · b_e for every edge e.
*K(ℓ)† denotes the Moore–Penrose pseudoinverse of K(ℓ)
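Why heavy-hitter detection suffices here: for x = B · L† · b_e, the e-th coordinate and ‖x‖₂² both equal the effective resistance R_e, so |x_e|/‖x‖₂ = √R_e and high-resistance edges are exactly the heavy hitters (the algorithm applies the same idea with K(ℓ)† along the chain instead of L†). A minimal numerical check of this identity, assuming numpy (graph chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(6)

# Graph, incidence matrix, Laplacian (as before).
n = 20
edges = [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < 0.3]
m = len(edges)
B = np.zeros((m, n))
for i, (u, v) in enumerate(edges):
    B[i, u], B[i, v] = 1.0, -1.0
L = B.T @ B
Lpinv = np.linalg.pinv(L)

# Key identity: for x = B L^+ b_e,
#   x_e       = b_e^T L^+ b_e         = R_e  (effective resistance of e), and
#   ||x||_2^2 = b_e^T L^+ L L^+ b_e   = R_e  as well (since L^+ L L^+ = L^+),
# so the heavy coordinates of x are the high-resistance edges, which a
# sparse-recovery sketch of B can detect.
for e in range(m):
    x = B @ (Lpinv @ B[e])
    R_e = B[e] @ Lpinv @ B[e]
    assert np.isclose(x[e], R_e)
    assert np.isclose(x @ x, R_e)
```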
Questions?
Anna C. Gilbert, Yi Li, Ely Porat, and Martin J. Strauss.
Approximate sparse recovery: Optimizing time and measurements.
SIAM J. Comput., 41(2):436–453, 2012.
Michael Kapralov, Yin Tat Lee, Cameron Musco, Christopher Musco, and Aaron Sidford.
Single pass spectral sparsification in dynamic streams.
In 55th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2014, pages 561–570, 2014.
Mu Li, Gary L. Miller, and Richard Peng.
Iterative row sampling.
In 54th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2013, pages 127–136, 2013.
Daniel A. Spielman and Shang-Hua Teng.
Spectral sparsification of graphs.
SIAM J. Comput., 40(4):981–1025, 2011.