Matrix-Vector Multiplication in Sub-Quadratic Time (Some Preprocessing Required)

Ryan Williams

January 17, 2007

Introduction

Matrix-Vector Multiplication: Fundamental Operation in Scientific Computing

How fast can n × n matrix-vector multiplication be?

Θ(n^2) steps just to read the matrix!

Main Result: If we allow O(n^{2+ε}) preprocessing, then matrix-vector multiplication over any finite semiring can be done in O(n^2 / (ε log n)^2) time.

Better Algorithms for Matrix Multiplication

Three of the major developments:

Arlazarov et al., a.k.a. "Four Russians" (1960s): O(n^3 / log n) operations. Uses table lookups. Good for hardware with short vector operations as primitives.

Strassen (1969): n^{log 7 / log 2} = O(n^{2.81}) operations. Asymptotically fast, but overhead in the big-O. Experiments in practice are inconclusive about Strassen vs. Four Russians for Boolean matrix multiplication (Bard, 2006).

Coppersmith and Winograd (1990): O(n^{2.376}) operations. Not yet practical.

Focus: Combinatorial Matrix Multiplication Algorithms

Also called non-algebraic; let's call them non-subtractive. E.g., Four Russians is combinatorial; Strassen isn't.

More Non-Subtractive Boolean Matrix Mult. Algorithms:

Atkinson and Santoro: O(n^3 / log^{3/2} n) on a (log n)-word RAM

Rytter and Basch-Khanna-Motwani: O(n^3 / log^2 n) on a RAM

Chan: Four Russians can be implemented in O(n^3 / log^2 n) on a pointer machine

Main Result

The O(n^3 / log^2 n) matrix multiplication algorithm can be "de-amortized". More precisely, we can:

Preprocess an n × n matrix A over a finite semiring in O(n^{2+ε}) time,

such that vector multiplications with A can then be done in O(n^2 / (ε log n)^2) time.

Allows "non-subtractive" matrix multiplication to be done on-line. Can be implemented on a pointer machine.

This Talk: The Boolean case.

Preprocessing Phase: The Boolean Case

Partition the input matrix A into blocks A_{i,j} of size ⌈ε log n⌉ × ⌈ε log n⌉, for i, j = 1, ..., n/(ε log n).

Build a graph G with parts P_1, ..., P_{n/(ε log n)} and Q_1, ..., Q_{n/(ε log n)}. Each part has 2^{ε log n} vertices, one for each possible (ε log n)-bit vector.

Edges of G: Each vertex v in each P_i has exactly one edge into each Q_j, namely the edge from v to the vertex A_{j,i} · v in Q_j.

Time to build the graph:

(n/(ε log n)) · (n/(ε log n)) · 2^{ε log n} · (ε log n)^2 = O(n^{2+ε})

(number of Q_j) · (number of P_i) · (number of nodes in P_i) · (cost of the matrix-vector product of A_{j,i} and v)
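To see why this product collapses to O(n^{2+ε}), note that 2^{ε log n} = n^ε (with logarithms base 2), so the chunk count contributes only a polynomial factor:

```latex
\frac{n}{\varepsilon \log n} \cdot \frac{n}{\varepsilon \log n} \cdot 2^{\varepsilon \log n} \cdot (\varepsilon \log n)^2
\;=\; n^2 \cdot 2^{\varepsilon \log n}
\;=\; n^2 \cdot n^{\varepsilon}
\;=\; O(n^{2+\varepsilon}).
```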

How to Do Fast Vector Multiplications

Let v be a column vector. Want: A · v.

(1) Break up v into (ε log n)-sized chunks: v = (v_1, v_2, ..., v_{n/(ε log n)}).

(2) For each i = 1, ..., n/(ε log n), look up v_i in P_i. Takes Õ(n) time.

(3) Look up the neighbors of each v_i, marking each neighbor found. The neighbors of v_i are the vertices A_{1,i} · v_i, A_{2,i} · v_i, ..., A_{n/(ε log n),i} · v_i. Takes O((n/(ε log n))^2) time.

(4) For each Q_j, define v̄_j as the OR of all marked vectors in Q_j. Takes Õ(n^{1+ε}) time.

(5) Output v̄ := (v̄_1, v̄_2, ..., v̄_{n/(ε log n)}). Claim: v̄ = A · v.

Proof: By definition, v̄_j = ⋁_{i=1}^{n/(ε log n)} A_{j,i} · v_i. Writing A and v in block form,

A · v = (⋁_{i=1}^{n/(ε log n)} A_{1,i} · v_i, ..., ⋁_{i=1}^{n/(ε log n)} A_{n/(ε log n),i} · v_i) = v̄.
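The five steps above can be sketched end to end in Python. This is an illustrative toy version, not the talk's pointer-machine implementation: the graph G becomes an array of lookup tables, b stands in for ⌈ε log n⌉, chunks are encoded as integers (MSB first), and all function names here are ours.

```python
def preprocess(A, b):
    """Tabulate, for every b x b block A_{j,i} of A and every possible
    b-bit chunk v (encoded as an int, MSB first), the Boolean product
    A_{j,i} * v.  The tables play the role of the edges of the graph G."""
    n = len(A)
    k = n // b                      # number of blocks per side
    tables = [[[0] * (1 << b) for _ in range(k)] for _ in range(k)]
    for j in range(k):              # index of Q_j (block row)
        for i in range(k):          # index of P_i (block column)
            for v in range(1 << b):             # every vertex of P_i
                out = 0
                for r in range(b):              # row r of the block
                    for c in range(b):          # column c of the block
                        if A[j * b + r][i * b + c] and (v >> (b - 1 - c)) & 1:
                            out |= 1 << (b - 1 - r)
                            break
                tables[j][i][v] = out
    return tables

def multiply(tables, b, v_bits):
    """Compute A * v over the Boolean semiring with (n/b)^2 table lookups."""
    k = len(tables)
    # steps (1)-(2): split v into b-bit chunks and encode each as an int
    chunks = [int("".join(map(str, v_bits[i * b:(i + 1) * b])), 2)
              for i in range(k)]
    out_bits = []
    for j in range(k):
        # steps (3)-(4): look up each chunk's neighbor in Q_j and OR them
        acc = 0
        for i in range(k):
            acc |= tables[j][i][chunks[i]]
        # step (5): decode the j-th output chunk back into bits
        out_bits.extend((acc >> (b - 1 - r)) & 1 for r in range(b))
    return out_bits
```

With b = ε log n, the preprocessing loop does (n/b)^2 · 2^b · b^2 = O(n^{2+ε}) work, and each multiplication does only (n/b)^2 lookups and ORs, matching the stated bounds.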

Some Applications

Can quickly compute the neighbors of arbitrary vertex subsets.

Let A be the adjacency matrix of G = (V, E), and let v_S be the indicator vector for a subset S ⊆ V.

Proposition: A · v_S is the indicator vector for N(S), the neighborhood of S.

Corollary: After O(n^{2+ε}) preprocessing, can determine the neighborhood of any vertex subset in O(n^2 / (ε log n)^2) time. (One level of BFS in o(n^2) time)
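The proposition is just the Boolean product unrolled: entry u of A · v_S is ⋁_w A[u][w] ∧ (v_S)_w, which is 1 exactly when u has a neighbor in S. A naive check of this (without the preprocessing speedup; the function name is ours):

```python
def neighborhood_indicator(A, S):
    """Return A * 1_S over the Boolean semiring: entry u is 1 iff
    vertex u has a neighbor in S, i.e. u is in N(S)."""
    n = len(A)
    v = [1 if w in S else 0 for w in range(n)]  # indicator vector 1_S
    return [int(any(A[u][w] and v[w] for w in range(n))) for u in range(n)]

# Path graph 0 - 1 - 2: N({0}) = {1}, so the result is [0, 1, 0]
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```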

Graph Queries

Corollary: After O(n^{2+ε}) preprocessing, can determine if a given vertex subset is an independent set, a vertex cover, or a dominating set, all in O(n^2 / (ε log n)^2) time.

Proof: Let S ⊆ V.

S is dominating ⇐⇒ S ∪ N(S) = V.

S is independent ⇐⇒ S ∩ N(S) = ∅.

S is a vertex cover ⇐⇒ V − S is independent.

Each can be quickly determined from knowing S and N(S).
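Each test reduces to one neighborhood query plus O(n) set operations. A set-based sketch, computing N(S) naively where the preprocessed structure would be consulted (function names are ours):

```python
def neighbors(A, S):
    """N(S): vertices adjacent to at least one vertex of S."""
    return {u for u in range(len(A)) if any(A[u][w] for w in S)}

def is_dominating(A, S):
    # S is dominating iff S union N(S) covers all vertices
    return S | neighbors(A, S) == set(range(len(A)))

def is_independent(A, S):
    # S is independent iff no vertex of S is adjacent to another vertex of S
    return not (S & neighbors(A, S))

def is_vertex_cover(A, S):
    # S covers every edge iff its complement spans no edge
    return is_independent(A, set(range(len(A))) - S)
```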

Triangle Detection

Problem: Triangle Detection
Given: Graph G and vertex i.
Question: Does i participate in a 3-cycle, a.k.a. triangle?

Worst Case: Can take Θ(n^2) time to check all pairs of neighbors of i.

Corollary: After O(n^{2+ε}) preprocessing on G, can solve triangle detection for arbitrary vertices in O(n^2 / (ε log n)^2) time.

Proof: Given vertex i, let S be its set of neighbors (gotten in O(n) time). S is not independent ⇐⇒ i participates in a triangle.
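The reduction in the proof, sketched naively (the preprocessed structure would replace the inner neighborhood scan; the function name is ours):

```python
def in_triangle(A, i):
    """Does vertex i lie on a triangle?  Equivalently: is the
    neighborhood S of i NOT an independent set?"""
    n = len(A)
    S = {u for u in range(n) if A[i][u]}                    # neighbors of i, O(n)
    N_S = {u for u in range(n) if any(A[u][w] for w in S)}  # one neighborhood query
    return bool(S & N_S)  # two adjacent neighbors of i close a triangle
```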

Conclusion

A preprocessing/multiplication algorithm for matrix-vector multiplication that builds on lookup table techniques.

Is there a preprocessing/multiplication algorithm for sparse matrices? Can we do multiplication in, e.g., O(m/poly(log n) + n) time, where m is the number of nonzeroes?

Can the algebraic matrix multiplication algorithms (Strassen, etc.) be applied to this problem?

Can our ideas be extended to achieve non-subtractive Boolean matrix multiplication in o(n^3 / log^2 n) time?

Thank you!