woodruff_2014.ppt

(1)

Turnstile Streaming Algorithms

Might as Well Be Linear Sketches

(2)

Turnstile Streaming Model

• Underlying n-dimensional vector x initialized to 0^n

• Long stream of updates x ← x + e_i or x ← x - e_i for standard unit vector e_i

• At end of the stream, x ∈ {-m, -m+1, …, m-1, m}^n for some bound m ≤ poly(n)

• Output an approximation to f(x) whp

• Goal: use as little space (in bits) as possible

Only consider 1 pass over data
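The update rule above can be illustrated with a minimal Python sketch. It assumes updates arrive as (i, delta) pairs with 0-based coordinate i and delta in {+1, -1}; the helper name `apply_stream` is hypothetical and not from the talk. It stores x explicitly, which is exactly the n-counter cost a streaming algorithm tries to avoid, so it only serves as a reference for the model.

```python
def apply_stream(n, updates):
    """Apply turnstile updates (i, delta), delta in {+1, -1}, to x = 0^n."""
    x = [0] * n
    for i, delta in updates:
        if delta not in (1, -1):
            raise ValueError("each update must be +e_i or -e_i")
        x[i] += delta  # x <- x + e_i  or  x <- x - e_i
    return x

# Hypothetical stream over n = 3 coordinates.
stream = [(0, +1), (2, +1), (0, +1), (2, -1), (1, -1)]
x = apply_stream(3, stream)
```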

(3)

Example: Norms

• Suppose you want |x|_p^p = Σ_{i=1}^n |x_i|^p

• Want Z for which (1-ε) |x|_p^p ≤ Z ≤ (1+ε) |x|_p^p

• Many applications

• p = 2

– Geometry, linear algebra

• p = 1

(4)

Algorithm for 2-Norm

• Let r = 1/ε^2

• Choose an r x n matrix A of i.i.d. N(0,1/r) normal random variables (with precision 1/poly(n))

• Maintain Ax in the stream

• Output |Ax|_2^2
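A minimal Python sketch of the four steps above, using only the standard library. The function names, the seed, and the test stream are illustrative assumptions; floating-point Gaussians stand in for the 1/poly(n)-precision entries, and A is stored explicitly rather than generated from a short seed.

```python
import random

def make_gaussian_sketch(r, n, seed=0):
    """r x n matrix of i.i.d. N(0, 1/r) entries (standard deviation 1/sqrt(r))."""
    rng = random.Random(seed)
    sigma = (1.0 / r) ** 0.5
    return [[rng.gauss(0.0, sigma) for _ in range(n)] for _ in range(r)]

def update(sketch, A, i, delta):
    """Process x <- x + delta * e_i by adding delta * (column i of A) to Ax."""
    for row in range(len(A)):
        sketch[row] += delta * A[row][i]

def estimate_l2_squared(sketch):
    """Output |Ax|_2^2."""
    return sum(v * v for v in sketch)

n, r = 20, 400                      # r plays the role of 1/eps^2 (eps = 0.05)
A = make_gaussian_sketch(r, n)
sketch = [0.0] * r
x = [0] * n                         # kept only to compare against the estimate

# Hypothetical stream: three insertions per coordinate, then one deletion
# on every even coordinate.
stream = [(i, +1) for i in range(n) for _ in range(3)]
stream += [(i, -1) for i in range(0, n, 2)]
for i, delta in stream:
    x[i] += delta
    update(sketch, A, i, delta)

true_value = sum(v * v for v in x)
estimate = estimate_l2_squared(sketch)
```

The estimate concentrates around |x|_2^2 with relative standard deviation about sqrt(2/r), which is why r is taken on the order of 1/ε^2.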

(5)

Algorithm for 1-Norm [Indyk]

• Let r = 1/ε^2

• Choose an r x n matrix A of i.i.d. Cauchy random variables (with precision 1/poly(n))

• Maintain Ax in the stream

• Output median(|(Ax)_1|, …, |(Ax)_r|)

• Proof: 1-stability of Cauchy distribution
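The same pattern can be sketched for the 1-norm. This is an illustrative Python version, assuming standard Cauchy samples drawn by the inverse-CDF formula tan(π(U - 1/2)) and floating-point arithmetic in place of 1/poly(n) precision; the names and the test stream are hypothetical. By 1-stability, each coordinate of Ax is distributed as |x|_1 times a standard Cauchy, whose absolute value has median 1.

```python
import math
import random
import statistics

def make_cauchy_sketch(r, n, seed=0):
    """r x n matrix of i.i.d. standard Cauchy entries."""
    rng = random.Random(seed)
    return [[math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]
            for _ in range(r)]

def update(sketch, A, i, delta):
    """Process x <- x + delta * e_i by adding delta * (column i of A) to Ax."""
    for row in range(len(A)):
        sketch[row] += delta * A[row][i]

def estimate_l1(sketch):
    """Output median(|(Ax)_1|, ..., |(Ax)_r|)."""
    return statistics.median(abs(v) for v in sketch)

n, r = 20, 400
A = make_cauchy_sketch(r, n)
sketch = [0.0] * r
x = [0] * n                         # kept only to compare against the estimate

stream = [(i, +1) for i in range(n) for _ in range(3)]
stream += [(i, -1) for i in range(0, n, 2)]
for i, delta in stream:
    x[i] += delta
    update(sketch, A, i, delta)

true_value = sum(abs(v) for v in x)
estimate = estimate_l1(sketch)
```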

(6)

Common Features

Algorithms for 2-norm and 1-norm have the following form:

1. Choose a random matrix A independent of x

2. Maintain Ax in the stream

3. Output a function of Ax

Question (?!): does the optimal algorithm for approximating any function in the streaming model have this form?

Some functions f(x) may be weird:

What is

x

xx₁

The state of these algorithms only depends on the underlying vector x, not on the specific stream of updates
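To see why a linear sketch has this property, here is a small Python check. It assumes a random ±1 integer matrix (chosen only so that arithmetic is exact; it is not the Gaussian or Cauchy matrix from the earlier slides) and two hypothetical streams with the same net update vector.

```python
import random

def sketch_of_stream(A, updates):
    """Maintain Ax under updates (i, delta); return the final state."""
    state = [0] * len(A)
    for i, delta in updates:
        for row in range(len(A)):
            state[row] += delta * A[row][i]
    return state

rng = random.Random(0)
n, r = 5, 3
A = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(r)]

# Two different streams whose net effect is x = (1, -1, 0, 2, 0).
stream_1 = [(0, +1), (3, +1), (3, +1), (1, -1)]
stream_2 = [(1, -1), (4, +1), (3, +1), (0, +1), (4, -1), (3, +1)]

state_1 = sketch_of_stream(A, stream_1)
state_2 = sketch_of_stream(A, stream_2)
```

Both runs end in the same state because Ax is a linear function of the final x alone.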

(7)

Our Results

• Yes, up to a factor of log n

• Theorem: for any relation f(x) for x ∈ {-m, -m+1, …, m}^n, there is a family F of O(n log m) matrices A with polynomially bounded integer entries, and a correct (whp) algorithm in the streaming model which:

1. uniformly samples an A ∈ F

2. maintains Ax in the stream

3. outputs a function of Ax

Logarithm of number of possibilities (“states”) of Ax, for x ∈ {-m, -m+1, …, m}^n, is optimal up to a log n factor

- For earlier examples, the matrices F are n log m samples from matrices of i.i.d. normals or Cauchys

- Can show n log m samples suffice by Newman’s Theorem

(8)

Consequences

a ∈ {0,1}^n

Create stream s(a)

b ∈ {0,1}^n

Create stream s(b)

Lower Bound Technique

1. Run Alg on s(a), transmit state of Alg(s(a)) to Bob

2. Bob computes Alg(s(a), s(b))

(9)

Consequences

a ∈ {0,1}^n, b ∈ {0,1}^n

Create stream s(a) with underlying vector x(a)

Our main theorem implies: it suffices to look at the simultaneous communication complexity of g, a weaker public-coin model in which Alice and Bob simultaneously send a message to a referee

If referee can solve g(a,b), then space of Alg at least the simultaneous communication complexity of g

• Use public coin to sample A

• Alice sends A*x(a) and Bob sends A*x(b) to referee

• Referee uses linearity to compute A*(x(a) + x(b))
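The three protocol steps can be sketched in Python as follows. This is an illustration under stated assumptions: the public coin is modeled as a shared seed, the entries of A are small random integers, and the map from inputs to vectors is taken to be the identity (x(a) = a, x(b) = b), none of which comes from the talk.

```python
import random

def shared_matrix(seed, r, n, bound=10):
    """Public coin: both players derive the same integer matrix A from the seed."""
    rng = random.Random(seed)
    return [[rng.randint(-bound, bound) for _ in range(n)] for _ in range(r)]

def message(A, x):
    """A player's message to the referee: the sketch A*x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def referee(msg_alice, msg_bob):
    """Linearity: A*x(a) + A*x(b) = A*(x(a) + x(b))."""
    return [u + v for u, v in zip(msg_alice, msg_bob)]

seed, r = 42, 4
a = [1, 0, 1, 1, 0, 0]              # Alice's input, used directly as x(a)
b = [0, 1, 1, 0, 0, 1]              # Bob's input, used directly as x(b)
n = len(a)

A_alice = shared_matrix(seed, r, n)
A_bob = shared_matrix(seed, r, n)
combined = referee(message(A_alice, a), message(A_bob, b))
direct = message(A_alice, [u + v for u, v in zip(a, b)])
```

The referee never sees a or b, only the two sketches, yet ends up holding the sketch of the combined vector.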

(10)

The log n Factor Loss

• Main Theorem: The logarithm of the number of states of Ax, as x ranges over {-m, -m+1, …, m}^n, plus the amount of randomness to store A, is optimal up to a log n factor

• The log n loss is necessary

(11)

Non-Uniformity Restriction

• Careful wording: “exists a family F of O(n log m) matrices A with polynomially bounded integer entries…”

• Algorithm is non-uniform

– Each A is hardwired

– Output of each state for each A also hardwired

• Alternatively, allow algorithm to use more space to process a stream update, provided it only retains Ax and its randomness

(12)

Comment on the Model

• For each random seed, algorithm is a deterministic automaton with a finite number of states

• Main theorem only requires correctness for x ∈ {-m, -m+1, …, m}^n

It counts the number of states as x varies in this range

• While processing the stream, may have |x|_∞ > m

(13)

Related Work

• Ganguly

– Deterministic algorithms

– Specific to heavy hitters problem

– Shows algorithm might as well be a linear sketch over the reals

(14)

Talk Outline

• Proof of Main Theorem

1. Reduce optimal automaton to path-independent automata

2. From path-independent automata to linear sketches

(15)

Stream Automaton for Fixed Randomness

[Figure: automaton with a Start state and transitions labeled by updates such as +e₁, -e₁, +e₂, …, +e_n, -e_n]

Want each state of the automaton to only depend on x, not how it got there

0^n in two different states

(16)

Path-Independent Automaton

• Each x ∈ Z^n in a unique state

• For each randomness, can we modify the automaton to make it path-independent?

(17)

Path-Reversible Automaton

• Path-reversible: ∀ states s, if σ is a stream (+e_{i_1}, -e_{i_2}, -e_{i_3}, …, +e_{i_r}) of updates, resulting in a state t, then from t the stream σ^{-1} = (-e_{i_r}, …, +e_{i_3}, +e_{i_2}, -e_{i_1}) returns us to s

[Figure: chain of states s₁, s₂, s₃, s₄ with forward transitions +e₂, -e₁, +e₅ and reverse transitions -e₅, +e₁, -e₂]
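The inverse stream σ^{-1} is easy to state in code. The sketch below assumes updates are (i, delta) pairs with 0-based i, and uses a small hypothetical integer linear sketch as the automaton; a linear sketch is only one example of a path-reversible automaton, chosen here because its transitions are simple to write down.

```python
def inverse_stream(sigma):
    """sigma^{-1}: reverse the order and flip the sign of every update."""
    return [(i, -delta) for i, delta in reversed(sigma)]

def run(state, A, updates):
    """Transition function of the automaton whose state is Ax."""
    state = list(state)
    for i, delta in updates:
        for row in range(len(A)):
            state[row] += delta * A[row][i]
    return state

A = [[1, 2, 0],
     [0, 1, 3]]                     # hypothetical 2 x 3 integer sketch
s = [5, -7]                         # some starting state
sigma = [(0, +1), (1, -1), (2, -1), (1, +1)]

t = run(s, A, sigma)                # state reached after sigma
back = run(t, A, inverse_stream(sigma))
```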

(18)

Strategy

Arbitrary Automaton

Path-Independent Automaton

For stream σ, freq(σ) ∈ Z^n is “net update” to each coordinate

Idea:

1. if in a state s, and update by a stream σ with freq(σ) = 0, answers ought to be similar

2. collapse all states s, s’ for which s+σ = s’ and freq(σ) = 0 for some stream σ

(19)

Zero-Frequency Graph

• Directed graph G = (V,E)

• V = states of old automaton A_old (for fixed randomness)

• (s,t) ∈ E if there is a stream σ of length at most L with s+σ=t and freq(σ) = 0

– Finite bound on L

• Terminal equivalence class: strongly connected component with no outgoing edge

– Path in G lands in a terminal equivalence class
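Terminal equivalence classes can be computed on a small example as follows. The graph below is hypothetical (its edges stand for zero-frequency streams between states), and the quadratic reachability approach is chosen for brevity rather than efficiency.

```python
def reachable(graph, start):
    """All nodes reachable from start (including start) by directed edges."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in graph[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def terminal_classes(graph):
    """Strongly connected components with no outgoing edge."""
    reach = {u: reachable(graph, u) for u in graph}
    classes = set()
    for u in graph:
        # Component of u: nodes v with a path u -> v and a path v -> u.
        scc = frozenset(v for v in reach[u] if u in reach[v])
        # Terminal iff nothing outside the component is reachable from it.
        if reach[u] == scc:
            classes.add(scc)
    return classes

# Hypothetical zero-frequency graph on five states.
G = {
    "s1": ["s2"],
    "s2": ["s3"],
    "s3": ["s4"],
    "s4": ["s3"],
    "s5": ["s5"],
}
terminals = terminal_classes(G)
```

Here any walk starting at s1 or s2 ends up trapped in {s3, s4}, which matches the remark that a path in G lands in a terminal equivalence class.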

(20)

New Transition Function

• Suppose in terminal equivalence class C

• Given an update e_i

• Let v ∈ C be an arbitrary node

• Compute v+e_i using transition function of A_old

• Walk from v+e_i until reaching terminal equivalence class C’

– C’ is unique

• Does not depend on choice of v

• Only one terminal equivalence class reachable

(21)

[Figure: two terminal equivalence classes; nodes u and v in the first each have a +e_i transition toward the second]

(22)

Output Function of A_new

• In each terminal equivalence class C, sample node u from stationary distribution from random walk in C (add self-loops)

– Output of A_new on C = Output of A_old on u

• If v is starting vertex of A_old,

– take a random walk in G from v

– let starting vertex of A_new be terminal equivalence class C reached

(23)

Correctness

• Let Π be an arbitrary distribution on streams σ

• Choose fixed randomness so A_old correct on Π’:

– Long sequence of zero streams,

– Followed by σ sampled from Π,

– Followed by long sequence of zero streams

• Output of A_new on Π statistically close to output of A_old on Π’

(24)

Arbitrary Automaton

Path-Independent Automaton

(25)

Talk Outline

• Proof of Main Theorem

1. Reduce optimal automaton to path-independent automata

2. From path-independent automata to linear sketches

(26)

Path Independent Automata and Submodules

• Let o be the initial state

• M = {x ∈ Z^n such that x in o}

• 0^n ∈ M

• If x ∈ M, then -x ∈ M

• If x, y ∈ M, then x+y ∈ M

• M is a free submodule of Z^n (a lattice)

• M has a basis

(27)

• States of automaton are elements (cosets) of the quotient module Z^n/M

• Space of automaton is log of the number of cosets containing an x ∈ {-m, …, m}^n

• Z^n/M examples:

– Z^n/e_1 is free. It remembers all but first coordinate

– Z^n/(2e_1, 2e_2, …, 2e_n) not free. It remembers coordinate parities
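The two quotient examples can be checked by brute force for small parameters. In this Python sketch the values n = 3 and m = 2 and the function names are illustrative choices; each `state_of` function maps x to a canonical representative of its coset.

```python
import math
from itertools import product

def count_states(state_of, n, m):
    """Number of distinct cosets containing some x in {-m, ..., m}^n."""
    return len({state_of(x) for x in product(range(-m, m + 1), repeat=n)})

def drop_first(x):
    """Z^n / e_1: forgets the first coordinate, remembers the rest."""
    return x[1:]

def parities(x):
    """Z^n / (2e_1, ..., 2e_n): remembers each coordinate mod 2."""
    return tuple(v % 2 for v in x)

n, m = 3, 2
states_free = count_states(drop_first, n, m)     # (2m+1)^(n-1) cosets
states_parity = count_states(parities, n, m)     # 2^n cosets
bits_free = math.log2(states_free)
bits_parity = math.log2(states_parity)
```

The space is the logarithm of these counts, so the parity automaton needs n bits regardless of m, while the first example needs (n-1) log(2m+1) bits.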

(28)

Smith Normal Form

• Smith Normal Form: ∃ a basis y_1, …, y_n of Z^n for which the generators of M are q_i · y_i for i = 1, …, r, where q_i | q_{i+1} are positive integers, and r = rank(M)

• If q_1 = … = q_s = 1 but q_{s+1} > 1, the generators of Z^n/M are y_{s+1} + M, …, y_n + M
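A small worked instance may help. The Python check below uses a hypothetical lattice M ⊂ Z^2 generated by (2,0) and (1,3), for which one valid choice is y_1 = (1,3), y_2 = (0,1), q_1 = 1, q_2 = 6. Writing x = c_1 y_1 + c_2 y_2 gives c_2 = x_2 - 3x_1, so the coset of x is determined by (x_2 - 3x_1) mod 6. The code verifies this by brute force over a small box; it does not compute a Smith normal form in general.

```python
from itertools import product

g1, g2 = (2, 0), (1, 3)             # generators of M (hypothetical example)
y1, y2 = (1, 3), (0, 1)             # basis of Z^2 (determinant 1)
q1, q2 = 1, 6                       # q1 divides q2; M is generated by q1*y1, q2*y2

def in_M(x, bound=20):
    """Brute-force test: is x = a*g1 + b*g2 for integers |a|, |b| <= bound?"""
    return any((a * g1[0] + b * g2[0], a * g1[1] + b * g2[1]) == x
               for a, b in product(range(-bound, bound + 1), repeat=2))

def state(x):
    """Coset of x in Z^2/M: write x = c1*y1 + c2*y2 and keep c2 mod q2."""
    c2 = x[1] - 3 * x[0]
    return c2 % q2

box = list(product(range(-4, 5), repeat=2))
consistent = all(in_M(x) == (state(x) == 0) for x in box)
num_states = len({state(x) for x in box})
```

Since q_1 = 1, only y_2 + M is needed to generate the quotient, and the automaton has 6 states.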

(29)

Remaining Issues

• Counting argument: if we replace Bx mod q with Bx, we get a linear sketch with a similar space complexity

• Issue: entries of B may be exponentially large

Compression: reduce coefficients of random linear

(30)

Applications and Open Questions

• Simpler proof of Ω̃(n^{1-2/p}) bit lower bound for estimating F_p, p > 2

– No communication complexity

• Many dimension lower bounds known for sketching norms over the reals

– F_p, matrix norms, adaptive sketching