• No results found

Matrix-Vector Multiplication in Sub-Quadratic Time (Some Preprocessing Required)

N/A
N/A
Protected

Academic year: 2022

Share "Matrix-Vector Multiplication in Sub-Quadratic Time (Some Preprocessing Required)"

Copied!
55
0
0

Loading.... (view fulltext now)

Full text

(1)

Matrix-Vector Multiplication in Sub-Quadratic Time (Some Preprocessing Required)

Ryan Williams

January 17, 2007

0-0

(2)

Introduction

Matrix-Vector Multiplication: Fundamental Operation in Scientific Computing

1

(3)

Introduction

Matrix-Vector Multiplication: Fundamental Operation in Scientific Computing How fast can

n × n

matrix-vector multiplication be?

1-a

(4)

Introduction

Matrix-Vector Multiplication: Fundamental Operation in Scientific Computing How fast can

n × n

matrix-vector multiplication be?

Θ(n

2

)

steps just to read the matrix!

1-b

(5)

Introduction

Matrix-Vector Multiplication: Fundamental Operation in Scientific Computing How fast can

n × n

matrix-vector multiplication be?

Θ(n

2

)

steps just to read the matrix!

Main Result: If we allow

O(n

2+ε

)

preprocessing, then matrix-vector multiplication over any finite semiring can be done in

O(n

2

/(ε log n)

2

)

.

1-c

(6)

Better Algorithms for Matrix Multiplication

Three of the major developments:

2

(7)

Better Algorithms for Matrix Multiplication

Three of the major developments:

Arlazarov et al., a.k.a. “Four Russians” (1960’s):

O(n

3

/ log n)

operations

Uses table lookups

Good for hardware with short vector operations as primitives

2-a

(8)

Better Algorithms for Matrix Multiplication

Three of the major developments:

Arlazarov et al., a.k.a. “Four Russians” (1960’s):

O(n

3

/ log n)

operations

Uses table lookups

Good for hardware with short vector operations as primitives

Strassen (1969):

n

log 7log 2

= O(n

2.81

)

operations

Asymptotically fast, but overhead in the big-O

Experiments in practice are inconclusive about Strassen vs. Four Russians for Boolean matrix multiplication (Bard, 2006)

2-b

(9)

Better Algorithms for Matrix Multiplication

Three of the major developments:

Arlazarov et al., a.k.a. “Four Russians” (1960’s):

O(n

3

/ log n)

operations

Uses table lookups

Good for hardware with short vector operations as primitives

Strassen (1969):

n

log 7log 2

= O(n

2.81

)

operations

Asymptotically fast, but overhead in the big-O

Experiments in practice are inconclusive about Strassen vs. Four Russians for Boolean matrix multiplication (Bard, 2006)

Coppersmith and Winograd (1990):

O(n

2.376

)

operations

Not yet practical

2-c

(10)

Focus: Combinatorial Matrix Multiplication Algorithms

3

(11)

Focus: Combinatorial Matrix Multiplication Algorithms

Also called non-algebraic; let’s call them non-subtractive E.g. Four-Russians is combinatorial, Strassen isn’t

3-a

(12)

Focus: Combinatorial Matrix Multiplication Algorithms

Also called non-algebraic; let’s call them non-subtractive E.g. Four-Russians is combinatorial, Strassen isn’t

More Non-Subtractive Boolean Matrix Mult. Algorithms:

Atkinson and Santoro:

O(n

3

/ log

3/2

n)

on a

(log n)

-word RAM

Rytter and Basch-Khanna-Motwani:

O(n

3

/ log

2

n)

on a RAM

Chan: Four Russians can be implemented on

O(n

3

/ log

2

n)

on a pointer machine

3-b

(13)

Main Result

The

O(n

3

/ log

2

n)

matrix multiplication algorithm can be “de-amortized”

4

(14)

Main Result

The

O(n

3

/ log

2

n)

matrix multiplication algorithm can be “de-amortized”

More precisely, we can:

Preprocess an

n × n

matrix

A

over a finite semiring in

O(n

2+ε

)

Such that vector multiplications with

A

can be done in

O(n

2

/(ε log n)

2

)

4-a

(15)

Main Result

The

O(n

3

/ log

2

n)

matrix multiplication algorithm can be “de-amortized”

More precisely, we can:

Preprocess an

n × n

matrix

A

over a finite semiring in

O(n

2+ε

)

Such that vector multiplications with

A

can be done in

O(n

2

/(ε log n)

2

)

Allows for “non-subtractive” matrix multiplication to be done on-line

4-b

(16)

Main Result

The

O(n

3

/ log

2

n)

matrix multiplication algorithm can be “de-amortized”

More precisely, we can:

Preprocess an

n × n

matrix

A

over a finite semiring in

O(n

2+ε

)

Such that vector multiplications with

A

can be done in

O(n

2

/(ε log n)

2

)

Allows for “non-subtractive” matrix multiplication to be done on-line Can be implemented on a pointer machine

4-c

(17)

Main Result

The

O(n

3

/ log

2

n)

matrix multiplication algorithm can be “de-amortized”

More precisely, we can:

Preprocess an

n × n

matrix

A

over a finite semiring in

O(n

2+ε

)

Such that vector multiplications with

A

can be done in

O(n

2

/(ε log n)

2

)

Allows for “non-subtractive” matrix multiplication to be done on-line Can be implemented on a pointer machine

This Talk: The Boolean case

4-d

(18)

Preprocessing Phase: The Boolean Case

Partition the input matrix

A

into blocks of

⌈ε log n⌉ × ⌈ε log n⌉

size:

A1,1

A2,1

A n

εlog n,1

A1,2 A1, n

εlog n

A n

εlog n,εlog nn

· · ·

...

· · · · · ·

...

...

Ai,j

ε log n

ε log n

A =

5

(19)

Preprocessing Phase: The Boolean Case

Build a graph

G

with parts

P

1

, . . . , P

n/(ε log n),

Q

1

, . . . , Q

n/(ε log n) 2ε log n

P

2

... ... ...

...

P

n

ε log n

P

1 2ε log n

2ε log n 2ε log n

2ε log n 2ε log n

Q

1

Q

2

Q

n

ε log n

Each part has 2ε log n vertices, one for each possible ε log n vector

6

(20)

Preprocessing Phase: The Boolean Case

Edges of

G

: Each vertex

v

in each

P

i has exactly one edge into each

Q

j

2

ε log n

P i 2

ε log n

Q j

v

A

j,i

v

7

(21)

Preprocessing Phase: The Boolean Case

Edges of

G

: Each vertex

v

in each

P

i has exactly one edge into each

Q

j

2

ε log n

P i 2

ε log n

Q j

v

A

j,i

v

Time to build the graph:

n

ε log n ·

n

ε log n · 2

ε log n

· ( ε log n) 2 = O(n 2+ε )

number of Qj

number of Pi

number of nodes in Pi

matrix-vector mult of Aj,i and v

7-a

(22)

How to Do Fast Vector Multiplications

Let

v

be a column vector. Want:

A · v

.

8

(23)

How to Do Fast Vector Multiplications

Let

v

be a column vector. Want:

A · v

.

(1)

Break up

v

into

ε log n

sized chunks:

v =

v

1

v

2

...

v

n

εlog n

8-a

(24)

How to Do Fast Vector Multiplications

(2)

For each

i = 1, . . . , n/(ε log n)

, look up

v

i in

P

i.

9

(25)

How to Do Fast Vector Multiplications

(2)

For each

i = 1, . . . , n/(ε log n)

, look up

v

i in

P

i.

2ε log n

P

2

... ... ...

...

P

n

ε log n

P

1 2ε log n

2ε log n 2ε log n

2ε log n 2ε log n

Q

1

Q

2

Q

n

ε log n

v1

v2

vn/(ε log n)

Takes

O(n) ˜

time.

9-a

(26)

How to Do Fast Vector Multiplications

(2)

For each

i = 1, . . . , n/(ε log n)

, look up

v

i in

P

i.

2ε log n

P

2

... ... ...

...

P

n

ε log n

P

1 2ε log n

2ε log n 2ε log n

2ε log n 2ε log n

Q

1

Q

2

Q

n

ε log n

v1

v2

vn/(ε log n)

Takes

O(n) ˜

time.

10

(27)

How to Do Fast Vector Multiplications

(3)

Look up the neighbors of

v

i, mark each neighbor found.

11

(28)

How to Do Fast Vector Multiplications

(3)

Look up the neighbors of

v

i, mark each neighbor found.

2ε log n

P

2

... ... ...

...

P

n

ε log n

P

1 2ε log n

2ε log n 2ε log n

2ε log n 2ε log n

Q

1

Q

2

Q

n

ε log n

v1

v2

vn/(ε log n)

A1,1 · v1

A2,1 · v1

A n

ε log n,1 · v1

11-a

(29)

How to Do Fast Vector Multiplications

(3)

Look up the neighbors of

v

i, mark each neighbor found.

2ε log n

P

2

... ... ...

...

P

n

ε log n

P

1 2ε log n

2ε log n 2ε log n

2ε log n 2ε log n

Q

1

Q

2

Q

n

ε log n

v1

v2

vn/(ε log n)

A1,2 · v2

A2,2 · v2

A n

ε log n,2 · v2

12

(30)

How to Do Fast Vector Multiplications

(3)

Look up the neighbors of

v

i, mark each neighbor found.

2ε log n

P

2

... ... ...

...

P

n

ε log n

P

1 2ε log n

2ε log n 2ε log n

2ε log n 2ε log n

Q

1

Q

2

Q

n

ε log n

v1

v2

vn/(ε log n)

A1, n

ε log n · vn/(ε log n)

A2, n

ε log n · vn/(ε log n)

A n

ε log n,ε log nn · vn/(ε log n)

Takes

O

 

n ε log n



2



13

(31)

How to Do Fast Vector Multiplications

(4)

For each

Q

j, define

v

j as the OR of all marked vectors in

Q

j

2ε log n

P

2

... ... ...

...

P

n

ε log n

P

1 2ε log n

2ε log n 2ε log n

2ε log n 2ε log n

Q

1

Q

2

Q

n

ε log n

v1

v2

vn/(ε log n)

v

1

v

2

v

n/(ε log n)

Takes

O(n ˜

1+ε

)

time

14

(32)

How to Do Fast Vector Multiplications

(4)

For each

Q

j, define

v

j as the OR of all marked vectors in

Q

j

2ε log n

P

2

... ... ...

...

P

n

ε log n

P

1 2ε log n

2ε log n 2ε log n

2ε log n 2ε log n

Q

1

Q

2

Q

n

ε log n

v1

v2

vn/(ε log n)

v

1

v

2

v

n/(ε log n)

Takes

O(n ˜

1+ε

)

time

15

(33)

How to Do Fast Vector Multiplications

(5)

Output

v

:=

v

1

v

2

...

v

n

εlog n

. Claim:

v

= A · v

.

16

(34)

How to Do Fast Vector Multiplications

(5)

Output

v

:=

v

1

v

2

...

v

n

εlog n

. Claim:

v

= A · v

.

Proof: By definition,

v

j

= W

n/(ε log n)

i=1

A

j,i

· v

i.

16-a

(35)

How to Do Fast Vector Multiplications

(5)

Output

v

:=

v

1

v

2

...

v

n

εlog n

. Claim:

v

= A · v

.

Proof: By definition,

v

j

= W

n/(ε log n)

i=1

A

j,i

· v

i.

Av =

A

1,1

· · · A

1,n/(ε log n)

... . . . ...

A

n/(ε log n),1

· · · A

n/(ε log n),n/(ε log n)

v

1

...

v

n

εlog n

16-b

(36)

How to Do Fast Vector Multiplications

(5)

Output

v

:=

v

1

v

2

...

v

n

εlog n

. Claim:

v

= A · v

.

Proof: By definition,

v

j

= W

n/(ε log n)

i=1

A

j,i

· v

i.

Av =

A

1,1

· · · A

1,n/(ε log n)

... . . . ...

A

n/(ε log n),1

· · · A

n/(ε log n),n/(ε log n)

v

1

...

v

n

εlog n

= ( W

n/(ε log n)

i=1

A

1,i

· v

i

, . . . , W

n/(ε log n)

i=1

A

1,n/(ε log n)

· v

i

) = v

.

16-c

(37)

Some Applications

Can quickly compute the neighbors of arbitrary vertex subsets

Let

A

be the adjacency matrix of

G = (V, E)

.

Let

v

S be the indicator vector for a

S ⊆ V

.

17

(38)

Some Applications

Can quickly compute the neighbors of arbitrary vertex subsets

Let

A

be the adjacency matrix of

G = (V, E)

.

Let

v

S be the indicator vector for a

S ⊆ V

.

Proposition:

A · v

S is the indicator vector for

N (S)

, the neighborhood of

S

.

17-a

(39)

Some Applications

Can quickly compute the neighbors of arbitrary vertex subsets

Let

A

be the adjacency matrix of

G = (V, E)

.

Let

v

S be the indicator vector for a

S ⊆ V

.

Proposition:

A · v

S is the indicator vector for

N (S)

, the neighborhood of

S

.

Corollary: After

O(n

2+ε

)

preprocessing, can determine the neighborhood of any vertex subset in

O(n

2

/(ε log n)

2

)

time.

(One level of BFS in

o(n

2

)

time)

17-b

(40)

Graph Queries

Corollary: After

O(n

2+ε

)

preprocessing, can determine if a given vertex subset is an independent set, a vertex cover, or a dominating set, all in

O(n

2

/(ε log n)

2

)

time.

18

(41)

Graph Queries

Corollary: After

O(n

2+ε

)

preprocessing, can determine if a given vertex subset is an independent set, a vertex cover, or a dominating set, all in

O(n

2

/(ε log n)

2

)

time.

Proof: Let

S ⊆ V

.

18-a

(42)

Graph Queries

Corollary: After

O(n

2+ε

)

preprocessing, can determine if a given vertex subset is an independent set, a vertex cover, or a dominating set, all in

O(n

2

/(ε log n)

2

)

time.

Proof: Let

S ⊆ V

.

S

is dominating

⇐⇒ S ∪ N (S) = V

.

18-b

(43)

Graph Queries

Corollary: After

O(n

2+ε

)

preprocessing, can determine if a given vertex subset is an independent set, a vertex cover, or a dominating set, all in

O(n

2

/(ε log n)

2

)

time.

Proof: Let

S ⊆ V

.

S

is dominating

⇐⇒ S ∪ N (S) = V

.

S

is independent

⇐⇒ S ∩ N (S) = ∅

.

18-c

(44)

Graph Queries

Corollary: After

O(n

2+ε

)

preprocessing, can determine if a given vertex subset is an independent set, a vertex cover, or a dominating set, all in

O(n

2

/(ε log n)

2

)

time.

Proof: Let

S ⊆ V

.

S

is dominating

⇐⇒ S ∪ N (S) = V

.

S

is independent

⇐⇒ S ∩ N (S) = ∅

.

S

is a vertex cover

⇐⇒ V − S

is independent.

18-d

(45)

Graph Queries

Corollary: After

O(n

2+ε

)

preprocessing, can determine if a given vertex subset is an independent set, a vertex cover, or a dominating set, all in

O(n

2

/(ε log n)

2

)

time.

Proof: Let

S ⊆ V

.

S

is dominating

⇐⇒ S ∪ N (S) = V

.

S

is independent

⇐⇒ S ∩ N (S) = ∅

.

S

is a vertex cover

⇐⇒ V − S

is independent.

Each can be quickly determined from knowing

S

and

N (S)

.

18-e

(46)

Triangle Detection

19

(47)

Triangle Detection

Problem: Triangle Detection

Given: Graph

G

and vertex

i

.

Question: Does

i

participate in a 3-cycle, a.k.a. triangle?

19-a

(48)

Triangle Detection

Problem: Triangle Detection

Given: Graph

G

and vertex

i

.

Question: Does

i

participate in a 3-cycle, a.k.a. triangle?

Worst Case: Can take

Θ(n

2

)

time to check all pairs of neighbors of

i

19-b

(49)

Triangle Detection

Problem: Triangle Detection

Given: Graph

G

and vertex

i

.

Question: Does

i

participate in a 3-cycle, a.k.a. triangle?

Worst Case: Can take

Θ(n

2

)

time to check all pairs of neighbors of

i

Corollary: After

O(n

2+ε

)

preprocessing on

G

, can solve triangle detection for arbitrary vertices in

O(n

2

/(ε log n)

2

)

time.

19-c

(50)

Triangle Detection

Problem: Triangle Detection

Given: Graph

G

and vertex

i

.

Question: Does

i

participate in a 3-cycle, a.k.a. triangle?

Worst Case: Can take

Θ(n

2

)

time to check all pairs of neighbors of

i

Corollary: After

O(n

2+ε

)

preprocessing on

G

, can solve triangle detection for arbitrary vertices in

O(n

2

/(ε log n)

2

)

time.

Proof: Given vertex

i

, let

S

be its set of neighbors (gotten in

O(n)

time).

S

is not independent

⇐⇒ i

participates in a triangle.

19-d

(51)

Conclusion

A preprocessing/multiplication algorithm for matrix-vector multiplication that builds on lookup table techniques

20

(52)

Conclusion

A preprocessing/multiplication algorithm for matrix-vector multiplication that builds on lookup table techniques

Is there a preprocessing/multiplication algorithm for sparse matrices? Can we do multiplication in e.g.

O(m/poly(log n) + n)

,

where

m

= number of nonzeroes?

20-a

(53)

Conclusion

A preprocessing/multiplication algorithm for matrix-vector multiplication that builds on lookup table techniques

Is there a preprocessing/multiplication algorithm for sparse matrices? Can we do multiplication in e.g.

O(m/poly(log n) + n)

,

where

m

= number of nonzeroes?

Can the algebraic matrix multiplication algorithms (Strassen, etc.) be applied to this problem?

20-b

(54)

Conclusion

A preprocessing/multiplication algorithm for matrix-vector multiplication that builds on lookup table techniques

Is there a preprocessing/multiplication algorithm for sparse matrices? Can we do multiplication in e.g.

O(m/poly(log n) + n)

,

where

m

= number of nonzeroes?

Can the algebraic matrix multiplication algorithms (Strassen, etc.) be applied to this problem?

Can our ideas be extended to achieve non-subtractive Boolean matrix multiplication in

o(n

3

/ log

2

n)

?

20-c

(55)

Thank you!

21

References

Related documents