Turnstile Streaming Algorithms
Might as Well Be Linear Sketches
Turnstile Streaming Model
• Underlying n-dimensional vector x initialized to 0^n
• Long stream of updates x ← x + e_i or x ← x - e_i for standard unit vector e_i
• At end of the stream, x ∈ {-m, -m+1, …, m-1, m}^n for some bound m ≤ poly(n)
• Output an approximation to f(x) whp
• Goal: use as little space (in bits) as possible
Only consider 1 pass over data
Example: Norms
• Suppose you want |x|_p^p = Σ_{i=1}^n |x_i|^p
• Want Z for which (1-ε) |x|_p^p ≤ Z ≤ (1+ε) |x|_p^p
• Many applications
• p = 2
– Geometry, linear algebra
• p = 1
Algorithm for 2-Norm
• Let r = 1/ε^2
• Choose an r x n matrix A of i.i.d. N(0,1/r) normal random variables (with precision 1/poly(n))
• Maintain Ax in the stream
• Output |Ax|_2^2
Algorithm for 1-Norm [Indyk]
• Let r = 1/ε^2
• Choose an r x n matrix A of i.i.d. Cauchy random variables (with precision 1/poly(n))
• Maintain Ax in the stream
• Output median(|(Ax)_1|, …, |(Ax)_r|)
• Proof: 1-stability of Cauchy distribution
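The two algorithms above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name LinearSketch, the helper names, the values of n and r, and the random test stream are all made up here, and entries are ordinary floats rather than values rounded to precision 1/poly(n).

```python
import math
import random
from statistics import median


class LinearSketch:
    """Maintains y = A x under turnstile updates x <- x +/- e_i."""

    def __init__(self, matrix):
        self.matrix = matrix                # r rows, each a list of n entries
        self.y = [0.0] * len(matrix)        # current value of A x

    def update(self, i, sign):
        # Adding sign * e_i to x adds sign * (column i of A) to A x.
        for row_index, row in enumerate(self.matrix):
            self.y[row_index] += sign * row[i]


def gaussian_matrix(r, n, rng):
    # i.i.d. N(0, 1/r) entries, i.e. standard deviation 1/sqrt(r).
    return [[rng.gauss(0.0, 1.0 / math.sqrt(r)) for _ in range(n)] for _ in range(r)]


def cauchy_matrix(r, n, rng):
    # Standard Cauchy via the inverse CDF: tan(pi * (U - 1/2)), U uniform on [0, 1).
    return [[math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)] for _ in range(r)]


def estimate_l2_squared(sketch):
    # |Ax|_2^2 for the Gaussian sketch.
    return sum(v * v for v in sketch.y)


def estimate_l1(sketch):
    # median(|(Ax)_1|, ..., |(Ax)_r|) for the Cauchy sketch.
    return median(abs(v) for v in sketch.y)


rng = random.Random(0)
n, r = 20, 400                              # r plays the role of 1/eps^2
stream = [(rng.randrange(n), rng.choice((1, -1))) for _ in range(2000)]

x = [0] * n                                 # kept only to compare against the truth
l2 = LinearSketch(gaussian_matrix(r, n, rng))
l1 = LinearSketch(cauchy_matrix(r, n, rng))
for i, sign in stream:
    x[i] += sign
    l2.update(i, sign)
    l1.update(i, sign)

true_l2_squared = sum(v * v for v in x)
true_l1 = sum(abs(v) for v in x)
print(true_l2_squared, estimate_l2_squared(l2))
print(true_l1, estimate_l1(l1))
```

The only state carried between updates is the vector y = Ax (plus the randomness that defines A); the vector x is stored here only to compare the estimates against the true norms.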
Common Features
Algorithms for 2-norm and 1-norm have the following
form:
1. Choose a random matrix A independent of x
2. Maintain Ax in the stream
3. Output a function of Ax
Question (?!):
does the optimal algorithm for
approximating any function in the streaming
model have this form?
Some functions f(x) may be weird:
What is
x
xx1
The state of these algorithms only depends on the underlying vector x, not on the specific stream of updates
Our Results
• Yes, up to a factor of log n
• Theorem: for any relation f(x) for x ∈ {-m, -m+1, …, m}^n, there is a family F of O(n log m) matrices A with polynomially bounded integer entries, and a correct (whp) algorithm in the streaming model which:
1. uniformly samples an A ∈ F
2. maintains Ax in the stream
3. outputs a function of Ax
Logarithm of number of possibilities (“states”) of Ax, for x ∈ {-m, -m+1, …, m}^n, is optimal up to a log n factor
- For earlier examples, the matrices F are n log m samples from matrices of
i.i.d. normals or Cauchys
- Can show n log m samples suffice by Newman’s Theorem
Consequences
Alice holds a ∈ {0,1}^n and creates stream s(a)
Bob holds b ∈ {0,1}^n and creates stream s(b)
Lower Bound Technique
1. Run Alg on s(a), transmit state of Alg(s(a)) to Bob
2. Bob computes Alg(s(a), s(b))
Consequences
a ∈ {0,1}^n, b ∈ {0,1}^n
Create streams s(a), s(b) with underlying vectors x(a), x(b)
Our main theorem implies:
It suffices to look at the simultaneous communication complexity of g: a weaker public-coin model in which Alice and Bob simultaneously send a message to a referee
• Use public coin to sample A
• Alice sends A*x(a) and Bob sends A*x(b) to referee
• Referee uses linearity to compute A*(x(a) + x(b))
If referee can solve g(a,b), then space of Alg is at least the simultaneous communication complexity of g
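The simultaneous protocol above relies only on linearity of the sketch. A minimal Python sketch of the message flow, assuming for illustration that x(a) = a and x(b) = b and that A is a small random sign matrix (the names sketch, sample_matrix and public_coin are hypothetical, not from the talk):

```python
import random


def sketch(matrix, x):
    """Return A x for an integer matrix A (list of rows) and integer vector x."""
    return [sum(a * v for a, v in zip(row, x)) for row in matrix]


def sample_matrix(seed, r, n):
    # Stand-in for sampling A from the family F using the shared public coin.
    # Random signs are used here purely for illustration.
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(r)]


n, r, public_coin = 8, 3, 12345
a = [1, 0, 1, 1, 0, 0, 1, 0]                # Alice's input; here x(a) = a
b = [0, 1, 1, 0, 0, 1, 1, 1]                # Bob's input; here x(b) = b

# Alice and Bob each derive the same A from the public coin and send one message.
alice_message = sketch(sample_matrix(public_coin, r, n), a)
bob_message = sketch(sample_matrix(public_coin, r, n), b)

# The referee adds the two messages; by linearity this equals A (x(a) + x(b)).
referee_view = [u + v for u, v in zip(alice_message, bob_message)]
combined = [u + v for u, v in zip(a, b)]
print(referee_view == sketch(sample_matrix(public_coin, r, n), combined))
```

Neither player sees the other's input, yet the referee ends up holding exactly the state the streaming algorithm would hold after processing s(a) followed by s(b).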
The log n Factor Loss
• Main Theorem: The logarithm of the number of states of Ax, as x ranges over {-m, -m+1, …, m}^n, plus the amount of randomness to store A, is optimal up to a log n factor
• The log n loss is necessary
Non-Uniformity Restriction
• Careful wording: “exists a family F of O(n log m)
matrices A with polynomially bounded integer
entries…”
• Algorithm is
non-uniform
– Each A is hardwired
– Output of each state for each A also hardwired
• Alternatively, allow algorithm to use more space
to process a stream update,
provided it only
retains Ax and its randomness
Comment on the Model
• For each random seed, algorithm is a deterministic automaton with a finite number of states
• Main theorem only requires correctness for x ∈ {-m, -m+1, …, m}^n
It counts the number of states as x varies in this range
• While processing the stream, may have |x|_∞ > m
Related Work
• Ganguly
– Deterministic algorithms
– Specific to heavy hitters problem
– Shows algorithm might as well be a linear
sketch over the reals
Talk Outline
• Proof of Main Theorem
1. Reduce optimal automaton to
path-independent automata
2. From path-independent automata to linear
sketches
Stream Automaton for Fixed Randomness
[Figure: automaton with a Start state and transitions labeled by updates such as +e1, -e1, +e2, …, +en, -en; the vector 0^n appears in two different states]
Want each state of the automaton to only depend on x, not how it got there
Path-Independent Automaton
• Each x ∈ Z^n in a unique state
• For each randomness, can we modify the automaton to make it path-independent?
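To make the definition concrete, here is a small brute-force check in Python. It is a sketch under assumptions: the two toy automata (parity_step, last_index_step) are invented for illustration, and the check only covers streams up to a fixed length, so a True answer is evidence rather than a proof.

```python
from itertools import product


def is_path_independent(step, start, n, max_len):
    """Check streams of length <= max_len: does each net vector x reach one state?

    step(state, i, sign) is the transition function for the update sign * e_i.
    """
    updates = [(i, sign) for i in range(n) for sign in (1, -1)]
    state_of = {}                           # net vector x -> state reached
    for length in range(max_len + 1):
        for stream in product(updates, repeat=length):
            state, x = start, [0] * n
            for i, sign in stream:
                state = step(state, i, sign)
                x[i] += sign
            if state_of.setdefault(tuple(x), state) != state:
                return False                # same x reached in two different states
    return True


def parity_step(state, i, sign):
    # State is x mod 2 coordinate-wise: a function of x alone.
    return tuple((s + (j == i)) % 2 for j, s in enumerate(state))


def last_index_step(state, i, sign):
    # State is the index of the most recent update: depends on the order.
    return i


print(is_path_independent(parity_step, (0, 0), n=2, max_len=4))
print(is_path_independent(last_index_step, None, n=2, max_len=4))
```

The first automaton stores a function of x, so every stream with the same net vector lands in the same state. The second fails already on the stream (+e_1, -e_1), which returns x to 0^n but leaves the automaton outside its start state.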
Path-Reversible Automaton
• Path-reversible: ∀ states s, if σ is a stream (+e_{i1}, -e_{i2}, -e_{i3}, …, +e_{ir}) of updates, resulting in a state t, then from t the stream σ^{-1} = (-e_{ir}, …, +e_{i3}, +e_{i2}, -e_{i1}) returns us to s
[Figure: states s1, s2, s3, s4; forward edges labeled +e2, -e1, +e5 and reverse edges labeled -e5, +e1, -e2]
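Path-reversibility can be tested one update at a time. The following Python sketch assumes a finite state set closed under the transition function; the two toy automata are hypothetical examples, not taken from the talk.

```python
def is_path_reversible(step, states, n):
    """Check that every single update can be undone from every state.

    step(state, i, sign) is the transition function for the update sign * e_i.
    If each one-update stream is reversible, longer streams are reversible too,
    because the reversed stream undoes the updates one at a time, last first.
    """
    for s in states:
        for i in range(n):
            for sign in (1, -1):
                t = step(s, i, sign)
                if step(t, i, -sign) != s:
                    return False
    return True


def counter_mod3_step(state, i, sign):
    # Tracks (x_1 + ... + x_n) mod 3; the index i is ignored.
    return (state + sign) % 3


def saturating_step(state, i, sign):
    # Counter clamped to {0, 1, 2}: a decrement at 0 is lost, so it cannot be undone.
    return min(2, max(0, state + sign))


print(is_path_reversible(counter_mod3_step, range(3), n=2))
print(is_path_reversible(saturating_step, range(3), n=2))
```

The clamped counter shows how an automaton can forget an update: from state 0 the update -e_i stays at 0, and the reverse update +e_i then moves to 1 instead of returning to 0.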
Strategy
Arbitrary Automaton → Path-Reversible Automaton → Path-Independent Automaton
For stream σ, freq(σ) ∈ Z^n is “net update” to each coordinate
Idea:
1. if in a state s, and update by a stream σ with freq(σ) = 0, answers ought to be similar
2. collapse all states s, s’ for which s+σ = s’ and freq(σ) = 0 for some stream σ
Zero-Frequency Graph
• Directed graph G = (V,E)
• V = states of old automaton A_old (for fixed randomness)
• (s,t) ∈ E if there is a stream σ of length at most L with s+σ=t and freq(σ) = 0
– Finite bound on L
• Terminal equivalence class: strongly connected
component with no outgoing edge
– Path in G lands in a terminal equivalence class
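Terminal equivalence classes are the sink strongly connected components of G. A minimal Python sketch on a made-up five-state graph follows; it uses plain reachability, which is simple to verify but quadratic, so a linear-time SCC algorithm such as Tarjan's would be the usual choice for large graphs.

```python
def reachable(graph, source):
    """Return the set of nodes reachable from source (including source)."""
    seen, stack = {source}, [source]
    while stack:
        u = stack.pop()
        for v in graph.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def terminal_classes(graph):
    """Return the strongly connected components with no outgoing edge."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    reach = {u: reachable(graph, u) for u in nodes}
    classes = set()
    for u in nodes:
        component = frozenset(v for v in reach[u] if u in reach[v])
        # Terminal: everything reachable from u already lies in u's component.
        if reach[u] == component:
            classes.add(component)
    return classes


# Hypothetical zero-frequency graph on states s1..s5 (edges chosen for illustration).
G = {
    "s1": ["s2"],
    "s2": ["s3"],
    "s3": ["s2", "s4"],
    "s4": ["s5"],
    "s5": ["s4"],
}
print(terminal_classes(G))
```

In this example {s2, s3} is strongly connected but has an edge out to s4, so the only terminal equivalence class is {s4, s5}, and every path in G eventually lands there.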
New Transition Function
• Suppose in terminal equivalence class C
• Given an update e_i
• Let v ∈ C be an arbitrary node
• Compute v+e_i using transition function of A_old
• Walk from v+e_i until reach terminal equivalence class C’
– C’ is unique
• Does not depend on choice of v
• Only one terminal equivalence class reachable
[Figure: two terminal equivalence classes; nodes u and v, each with an outgoing edge labeled +e_i]
Output Function of A_new
• In each terminal equivalence class C, sample node u from stationary distribution of a random walk in C (add self-loops)
– Output of A_new on C = Output of A_old on u
• If v is starting vertex of A_old,
– take a random walk in G from v
– let starting vertex of A_new be terminal equivalence class C reached
Correctness
• Let Π be an arbitrary distribution on streams σ
• Choose fixed randomness so A_old correct on Π’:
– Long sequence of zero streams,
– Followed by σ sampled from Π,
– Followed by long sequence of zero streams
• Output of A_new on Π statistically close to output of A_old on Π’
Talk Outline
• Proof of Main Theorem
1. Reduce optimal automaton to
path-independent automata
2. From path-independent automata to linear
sketches
Path Independent Automata and
Submodules
• Let o be the initial state
• M = {x ∈ Z^n such that x in o}
• 0^n ∈ M
• If x ∈ M, then -x ∈ M
• If x, y ∈ M, then x+y ∈ M
• M is a free submodule of Z^n (a lattice)
• M has a basis
• States of automaton are elements (cosets) of the quotient module Z^n/M
• Space of automaton is log of the number of cosets containing an x ∈ {-m, …, m}^n
• Z^n/M examples:
– Z^n/(e_1) is free. It remembers all but first coordinate
– Z^n/(2e_1, 2e_2, …, 2e_n) not free. It remembers coordinate parities
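The two quotient examples above can be checked by brute force for small parameters. In this Python sketch the function names and the choice n = 3, m = 2 are illustrative assumptions; each state function returns a canonical representative of the coset x + M.

```python
import math
from itertools import product


def state_drop_first(x):
    # Z^n / (e_1): the coset of x is determined by every coordinate except the first.
    return tuple(x[1:])


def state_parities(x):
    # Z^n / (2e_1, ..., 2e_n): the coset of x is determined by the coordinate parities.
    return tuple(v % 2 for v in x)


def space_in_bits(state, n, m):
    """log2 of the number of cosets containing some x in {-m, ..., m}^n."""
    cosets = {state(x) for x in product(range(-m, m + 1), repeat=n)}
    return math.log2(len(cosets))


n, m = 3, 2
print(space_in_bits(state_drop_first, n, m))   # (2m+1)^(n-1) cosets
print(space_in_bits(state_parities, n, m))     # 2^n cosets
```

For the first quotient the count grows with m, since each retained coordinate takes 2m+1 values; for the second it stays at 2^n regardless of m, which is why the parity automaton needs only n bits.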
Smith Normal Form
• Smith Normal Form: ∃ a basis y_1, …, y_n of Z^n for which the generators of M are q_i · y_i for i = 1, …, r, where q_i | q_{i+1} are positive integers, and r = rank(M)
• If q_1 = … = q_s = 1 but q_{s+1} > 1, the generators of Z^n/M are y_{s+1} + M, …, y_n + M
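A small worked instance may help fix the notation. The choice n = 3 with y_1, y_2, y_3 the standard basis is an assumption made purely for illustration:

```latex
% Illustrative choice: n = 3, y_1, y_2, y_3 the standard basis of Z^3.
\[
M = \langle\, 1 \cdot y_1,\; 2 \cdot y_2 \,\rangle \subseteq \mathbb{Z}^3,
\qquad q_1 = 1,\quad q_2 = 2,\quad r = \operatorname{rank}(M) = 2,\quad s = 1 .
\]
\[
\mathbb{Z}^3 / M \;\cong\; \mathbb{Z}/2\mathbb{Z} \,\oplus\, \mathbb{Z},
\qquad
x + M \;\longmapsto\; \bigl(x_2 \bmod 2,\; x_3\bigr),
\]
% generated by y_2 + M (order 2) and y_3 + M (infinite order).
```

Here the automaton forgets x_1 entirely, keeps only the parity of x_2, and keeps x_3 exactly, so its state is a linear function of x with one coordinate reduced modulo q_2 = 2.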
Remaining Issues
• Counting argument:
if we replace Bx mod q with Bx,
we get a linear sketch with a similar space
complexity
• Issue:
entries of B may be exponentially large
Compression:
reduce coefficients of random linear
Applications and Open Questions
• Simpler proof of Ω̃(n^{1-2/p}) bit lower bound for estimating F_p, p > 2
– No communication complexity
• Many dimension lower bounds known for
sketching norms
over the reals
– F_p, matrix norms, adaptive sketching