Ran Raz (Weizmann & IAS)
joint work with
Anat Ganor (Weizmann) Gillat Kol (IAS)
Exponential Separation of Information and Communication
Alice has a string $x$, chosen according to a publicly known distribution $\mu$.
She wants to send a message to Bob, so that
Bob can retrieve $x$ with high probability. How many bits does Alice need to send?
Answer [Shannon ‘48, Huffman ‘52]: about $H(x)$ bits!
Message Compression
[Figure: Alice (A) sends a message to Bob (B); label: [S48, H52]]
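A minimal sketch of the message-compression statement, using a Huffman code on a toy distribution. The distribution and the function name `huffman_lengths` are assumptions made for illustration; the point is only that the expected codeword length sits within one bit of the entropy.

```python
import heapq
from math import log2

def huffman_lengths(probs):
    """Codeword length of each symbol for a pmf given as {symbol: probability}."""
    # Heap entries: (probability, unique tie-breaker, symbols in this subtree).
    heap = [(p, idx, [s]) for idx, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in probs}
    counter = len(heap)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:          # every merge adds one bit to these symbols
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

# Toy distribution (an assumption, not from the slides).
mu = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = huffman_lengths(mu)
expected_length = sum(mu[s] * lengths[s] for s in mu)
entropy = -sum(p * log2(p) for p in mu.values())
```

For this dyadic toy distribution the expected length and the entropy coincide; in general a Huffman code only guarantees an expected length between $H$ and $H + 1$.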
Interactive-Compression Problem [BBCR ‘09]:
What if Alice and Bob engage in an
interactive communication protocol?
Can the protocol’s transcript be compressed to its "information content"?
Message-Compression Theorem [S48,H52]:
Alice gets $x$, Bob gets $y$, where $(x,y)$ are chosen according to a joint distribution $\mu$.
They want to compute $f(x,y)$. ($f, \mu$ are publicly known)
How many bits do they need to exchange?
Communication Complexity
[Yao]
[Figure: Alice (A) holds $x$, Bob (B) holds $y$; they communicate to compute $f(x,y)$]
Players can use private and public random strings. They have to compute $f(x,y)$ with the required success
probability (over $(x,y)$ and the random strings).
Communication complexity of a protocol $\pi$: maximal communication over $(x,y)$ and the random strings.
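A toy sketch of "communication complexity of a protocol" as a worst case over inputs. The equality function and the trivial protocol (Alice sends all of $x$, Bob replies with the answer) are assumptions chosen for illustration; the protocol is deterministic, so there are no random strings to maximize over.

```python
from itertools import product

def trivial_equality_protocol(x, y):
    """Alice sends every bit of x, then Bob sends the answer bit."""
    transcript = list(x)                 # Alice's message: n bits
    answer = int(tuple(x) == tuple(y))   # Bob now knows both inputs
    transcript.append(answer)            # Bob's message: 1 bit
    return answer, transcript

def worst_case_cost(n):
    """Maximal transcript length over all pairs of n-bit inputs."""
    inputs = list(product((0, 1), repeat=n))
    return max(len(trivial_equality_protocol(x, y)[1])
               for x in inputs for y in inputs)
```

Here every input pair costs the same $n + 1$ bits; for protocols whose length depends on the inputs or on the randomness, the maximum is what the definition charges.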
Can every protocol be compressed to its
information content?
But how do we measure information content
of a protocol…???
Answer: Information Complexity! […,CSWY ‘01, BYJKS ‘04, BBCR ‘09,…]
Conditional Mutual Information
$X, Y, Z$: random variables
$I(X;Y \mid Z)$ = the information that a player who knows $Z$ learns about $X$ by seeing $Y$
(on average)
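A small sketch that evaluates conditional mutual information directly from a joint distribution. The function name and the dictionary representation of the joint pmf are assumptions for illustration.

```python
from collections import defaultdict
from math import log2

def conditional_mutual_information(joint):
    """I(X;Y|Z) in bits for a joint pmf given as {(x, y, z): probability}."""
    p_z, p_xz, p_yz = defaultdict(float), defaultdict(float), defaultdict(float)
    for (x, y, z), p in joint.items():
        p_z[z] += p
        p_xz[x, z] += p
        p_yz[y, z] += p
    # I(X;Y|Z) = sum p(x,y,z) * log[ p(x,y,z) p(z) / (p(x,z) p(y,z)) ]
    return sum(
        p * log2(p * p_z[z] / (p_xz[x, z] * p_yz[y, z]))
        for (x, y, z), p in joint.items() if p > 0
    )

# X, Y independent uniform bits, Z = X xor Y: knowing Z, seeing Y reveals X.
xor_joint = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}
# X, Y independent uniform bits, Z constant: seeing Y reveals nothing about X.
independent_joint = {(x, y, 0): 0.25 for x in (0, 1) for y in (0, 1)}
```

The two toy distributions give the extreme cases for one bit: a full bit of information in the XOR example and none in the independent example.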
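Information Complexity
$IC(\pi,\mu) = I(\Pi; Y \mid X) + I(\Pi; X \mid Y)$
$(X,Y)$ are distributed according to $\mu$
$\Pi$ denotes the transcript of $\pi$
$I(\Pi; Y \mid X)$: what Alice learns about $Y$ from $\Pi$
$I(\Pi; X \mid Y)$: what Bob learns about $X$ from $\Pi$
$IC(\pi,\mu)$: the amount of information that the players learn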
$IC(f,\mu) = \inf_{\pi} \{ IC(\pi,\mu) \}$, where the infimum is over protocols $\pi$ that compute $f$
$\forall \pi,\mu$: $CC(\pi,\mu) \ge IC(\pi,\mu)$. Hence,
$\forall f,\mu$: $CC(f,\mu) \ge IC(f,\mu)$.
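A sketch of the standard chain-rule argument behind this inequality, assuming a private-coin protocol whose transcript $\Pi = \Pi_1 \Pi_2 \cdots$ has at most $CC(\pi,\mu)$ bits. The indexing $\Pi_j$, $\Pi_{<j}$ is notation introduced here, not taken from the slides.

```latex
\begin{align*}
IC(\pi,\mu) &= I(\Pi; Y \mid X) + I(\Pi; X \mid Y) \\
  &= \sum_{j} \Big[ I(\Pi_j; Y \mid X, \Pi_{<j}) + I(\Pi_j; X \mid Y, \Pi_{<j}) \Big] \\
  &\le \sum_{j} 1 \;\le\; CC(\pi,\mu).
\end{align*}
```

When Alice sends bit $j$, the first term in the bracket is 0 (given $X$ and the past, her bit depends only on her private randomness) and the second is at most $H(\Pi_j) \le 1$; the case where Bob sends bit $j$ is symmetric.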
Other Direction:
$CC(\pi,\mu)$ can be much larger than $IC(\pi,\mu)$.
The Compression Problem:
Given a protocol $\pi$, can $\pi$ be simulated by $\pi'$, s.t. $CC(\pi',\mu) \approx IC(\pi,\mu)$?
Information vs. Communication
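A protocol $\pi$ with $CC(\pi,\mu) = C$ and $IC(\pi,\mu) = I$ ($I \le C$) can be simulated by $\pi'$ with
1) $CC(\pi',\mu) = \tilde{O}(\sqrt{C \cdot I})$ [BBCR ‘09]
(at most quadratic compression)
2) over product distributions:
$CC(\pi',\mu) = O(I \cdot \mathrm{polylog}(C))$ [BBCR ‘09]
3) for protocols with a constant number of rounds:
$CC(\pi',\mu) = O(I)$ [BR ‘10]
4) $CC(\pi',\mu) = 2^{O(I)}$ [Bra ‘12]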
Hence, $\forall f,\mu$: $CC(f,\mu) \le 2^{O(IC(f,\mu))}$
$\forall f,\mu$: $IC(f,\mu) \le CC(f,\mu) \le 2^{O(IC(f,\mu))}$
No gap between IC and CC was known
[KLLRX ‘12]: Almost all known techniques for lower bounding CC give the same bound for IC
New functions and techniques may be needed
Information vs. Communication
First gap between IC and CC: $\forall k$, $\exists$ explicit $f, \mu$ such that
$IC(f,\mu) = O(k)$
$CC(f,\mu) \ge 2^{\Omega(k)}$
Hence: Interactive protocols cannot always be compressed to their information content!
By [Bra ‘12]: this is the largest possible gap
Input size: triple exponential in $k$
The protocol with IC $O(k)$ has double-exponential CC
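Alice gets $x_1,\dots,x_n$. Bob gets $y_1,\dots,y_n$ (the pairs $(x_j,y_j)$ are chosen independently according to $\mu$)
Goal: compute $f(x_1,y_1),\dots,f(x_n,y_n)$.
$CC(f^n,\mu^n)$ = CC of the best protocol that answers correctly with the required probability on each coordinate.
Does $CC(f^n,\mu^n) \approx n \cdot CC(f,\mu)$?
Equivalently, let $AC(f,\mu) = \lim_{n \to \infty} CC(f^n,\mu^n)/n$
Does $AC(f,\mu) \approx CC(f,\mu)$?
Connections known for a long time:
[R ‘94], [CSWY ‘01], [BBCR ‘09]
[BR ‘10]: $AC(f,\mu) = IC(f,\mu)$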
Our result hence gives the first gap between
AC and CC: $\forall k$, $\exists$ explicit $f, \mu$ such that
$AC(f,\mu) = O(k)$
$CC(f,\mu) \ge 2^{\Omega(k)}$
Hence: a strong direct-sum theorem for CC does not hold!
The Distribution $\mu$: First $i$ Multilayers
Complete binary tree
Multilayer = 100k layers
Depth: $c$ multilayers, with $c$ double exponential in $k$
Alice gets x, Bob gets y.
Each input contains a bit for every vertex v in the tree.
Non-noisy vertices: choose $x_v = y_v$ at random
Noisy vertices: choose $x_v, y_v$ independently at random
Randomly select a multilayer i
Set all vertices in multilayers < i to be non-noisy
Set all vertices in multilayer i to be noisy
Select $x_v, y_v$ for multilayers 1,..,i
[Figure: the tree down to multilayer c, with noisy multilayer i marked; example vertex with $x_v = 0$, $y_v = 1$]
Typical Vertices
Alice owns odd layers
Bob owns even layers
The player who owns v dictates the correct child of v: if Alice owns v and $x_v = 0$, left is correct, otherwise right
v in a multilayer > i is typical if the sub-path in multilayer i leading to v has ≥ 80% correct children
[Figure: non-noisy vertices ($x_v = y_v$), noisy multilayer i ($x_v, y_v$ iid), typical leaves]
The Distribution
Non-noisy vertices: choose $x_v = y_v$ at random
Noisy vertices: choose $x_v, y_v$ independently at random
i is randomly chosen
Multilayers < i are non-noisy
Multilayer i is noisy
Multilayers > i: Bursting noise: set all non-typical vertices to be noisy (typical vertices stay non-noisy)
[Figure: noisy multilayer i, typical leaves]
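The sampling steps above can be sketched along a single root-to-leaf path, which avoids materializing the (huge) tree. This is a toy sketch under assumptions: the function name, the encoding of a path as a list of child choices (0 = left, 1 = right), and the small parameters in the usage line are all hypothetical, and leaves are not modeled separately.

```python
import random

def sample_inputs_along_path(path, i, w, rng):
    """Sample (x_v, y_v) for the vertices visited by `path`.

    path: child choices, one per layer; w: layers per multilayer;
    i: index (from 1) of the noisy multilayer.
    Returns the sampled pairs and whether the path is typical
    (None if the path never leaves multilayer i).
    """
    inputs = []
    correct_steps = 0   # correct children taken inside multilayer i
    typical = None      # decided once the path leaves multilayer i
    for depth, step in enumerate(path):
        layer = depth + 1             # layers are numbered from 1 (the root)
        multilayer = depth // w + 1
        if multilayer < i:
            noisy = False
        elif multilayer == i:
            noisy = True
        else:
            if typical is None:
                typical = 5 * correct_steps >= 4 * w   # >= 80% correct children
            noisy = not typical       # bursting noise on non-typical vertices
        if noisy:
            x_v, y_v = rng.randint(0, 1), rng.randint(0, 1)
        else:
            x_v = y_v = rng.randint(0, 1)
        inputs.append((x_v, y_v))
        if multilayer == i:
            owner_bit = x_v if layer % 2 == 1 else y_v   # Alice owns odd layers
            correct_steps += int(step == owner_bit)
    return inputs, typical

# Toy usage: 3 multilayers of 10 layers each, noisy multilayer i = 2.
rng = random.Random(0)
toy_path = [rng.randint(0, 1) for _ in range(30)]
toy_inputs, toy_typical = sample_inputs_along_path(toy_path, 2, 10, rng)
```

Typicality depends only on the sub-path inside multilayer i, so along one path it is decided once and then applies to every later vertex.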
The Bursting Noise Game
v is typical if the sub-path in multilayer i leading to v has ≥ 80% correct children
Players’ Goal: Find and output the same typical leaf
Remarks:
$c$ is double exponential in $k$, multilayer = 100k layers
Typical leaves are rare (prob $2^{-\Omega(k)}$)
If the players know i, they can solve the game by exchanging O(k) bits
A binary search can find i by exchanging O(log c) bits.
That’s why $c$ is set to be double exponential in $k$
[Figure: non-noisy vertices ($x_v = y_v$), noisy multilayer i of 100k layers ($x_v, y_v$ iid), typical leaves]
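A quick numeric check of why typical leaves are rare, under the assumption that a uniformly random path takes the correct child with probability 1/2 at each of the w = 100k layers of multilayer i, independently. The function names are hypothetical; the exact binomial tail is compared with the Hoeffding (Chernoff-type) bound $e^{-2w(0.3)^2}$.

```python
from math import comb, exp

def prob_random_leaf_typical(w):
    """Exact P[Binomial(w, 1/2) >= 0.8 * w]."""
    need = -(-4 * w // 5)   # ceil(0.8 * w) in integer arithmetic
    return sum(comb(w, j) for j in range(need, w + 1)) / 2 ** w

def hoeffding_bound(w):
    """Hoeffding bound on the same tail: exp(-2 * w * (0.8 - 0.5)^2)."""
    return exp(-2 * w * 0.3 ** 2)

p_typical = prob_random_leaf_typical(100)   # k = 1, i.e. w = 100 layers
bound = hoeffding_bound(100)
```

The bound decays like $e^{-18k}$, so guessing a typical leaf without knowing the correct children in multilayer i succeeds only with probability exponentially small in k.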
The Bursting Noise Game
Players’ Goal: Find and output the same typical leaf
Why the CC seems high ($2^{\Omega(k)}$):
Hard to guess a typical leaf (typical leaves are rare)
Hard to find i (the binary search costs about log c bits)
Full lower bound proof is hard… it makes use of the bursting noise
[Figure: non-noisy vertices ($x_v = y_v$), noisy multilayer i ($x_v, y_v$ iid), typical leaves]
Converting to a Function
Update $\mu$: After selecting (x,y):
- Randomly select a bit b.
- For every leaf v, add b to $y_v$
Define: f(x,y) = b
Remark:
- For any typical leaf, $x_v \oplus y_v = b$
(as we started with $x_v = y_v$)
- For any non-typical leaf, $x_v \oplus y_v$ is random
Hence, to determine b it suffices to find a typical leaf
[Figure: non-noisy vertices ($x_v = y_v$), noisy multilayer i, typical leaves]
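A tiny sketch of the conversion step: adding b (mod 2) to $y_v$ on every leaf plants b at the typical leaves, where $x_v = y_v$ held before the update. The function name and the dictionary of leaf inputs are assumptions for illustration.

```python
def plant_bit(leaf_inputs, b):
    """Add b (mod 2) to y_v for every leaf; leaf_inputs is {leaf: (x_v, y_v)}."""
    return {v: (x_v, y_v ^ b) for v, (x_v, y_v) in leaf_inputs.items()}

# Two typical leaves (x_v = y_v before the update), hypothetical labels.
typical_leaves = {"u": (0, 0), "v": (1, 1)}
recovered = {
    b: [x_v ^ y_v for x_v, y_v in plant_bit(typical_leaves, b).values()]
    for b in (0, 1)
}
```

At a non-typical leaf the original $x_v, y_v$ are independent, so $x_v \oplus y_v$ stays uniformly random after the update and carries no information about b.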
First Attempt
Starting from the root, on every vertex v,
the player who owns v
sends her bit w.p. 90% and sends the negation w.p. 10%
Both players move to the child indicated by the sent bit.
When reaching a leaf v, players output $x_v \oplus y_v$
By Chernoff, they reach a typical leaf w.h.p.
[Figure: at a vertex with $x_v = 0$, the bit 0 is sent w.p. 90%; at a vertex with $y_v = 1$, the bit 0 is sent w.p. 10%]
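A simulation sketch of the walk through the noisy multilayer, to illustrate the Chernoff claim. Assumptions: w = 100 layers (k = 1), a step counts as correct when the sent bit equals the owner's true bit, and the function name and seed are hypothetical.

```python
import random

def walk_noisy_multilayer(w, rng, p_true=0.9):
    """Fraction of correct children taken while crossing one noisy multilayer."""
    correct = 0
    for layer in range(1, w + 1):
        x_v, y_v = rng.randint(0, 1), rng.randint(0, 1)   # noisy: independent bits
        owner_bit = x_v if layer % 2 == 1 else y_v        # Alice owns odd layers
        sent = owner_bit if rng.random() < p_true else 1 - owner_bit
        correct += int(sent == owner_bit)   # players follow the sent bit
    return correct / w

rng = random.Random(1)
fractions = [walk_noisy_multilayer(100, rng) for _ in range(200)]
share_typical = sum(f >= 0.8 for f in fractions) / len(fractions)
```

Each step is correct with probability 0.9, so the fraction of correct children concentrates around 90%, comfortably above the 80% threshold for typicality.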
Why 90% and not 100%?
If the players always send their true bit, i is revealed, thus $IC \ge H(i) = \log c$, which is exponential in $k$
Why the IC Seems Low
Intuitively, a player learns very little information at non-noisy vertices, since both inputs are the same. W.h.p. the only noisy vertices the players reach are those of multilayer i
Problem: with some probability, the players
reach a non-typical vertex at the end of multilayer i, and then reach additional noisy vertices.
Solution: we add machinery to abort if the players reach a non-typical vertex.
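For every round $t$ in which Alice speaks, let $P_t$ be the distribution of the bit sent by Alice. $P_t$ is either (0.9,0.1) or (0.1,0.9). Let $Q_t$ be Bob’s best
estimation for $P_t$, given what Bob knows at time $t$.
It is known that the information Bob learns is
$\sum_t \mathbb{E}[ D(P_t \| Q_t) ]$.
We prove that this is at most
$\sum_t \mathbb{E}[ D(P_t \| Q'_t) ]$,
where $Q'_t$ is any distribution known to Bob at time $t$.
Let $Q'_t$ be (0.9,0.1) or (0.1,0.9), based on Bob’s bit. Then $D(P_t \| Q'_t) = 0$ on every non-noisy vertex.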
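A small numeric sketch of the divergence terms in this argument. The function name `kl_bits` is an assumption, and the uniform estimate (0.5, 0.5) for a noisy vertex is an illustrative choice rather than something stated on the slide.

```python
from math import log2

def kl_bits(p, q):
    """Kullback-Leibler divergence D(p || q) in bits for finite distributions."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

alice = (0.9, 0.1)   # distribution of Alice's sent bit when her true bit is 0

# Non-noisy vertex: Bob's bit equals Alice's, so his guess matches exactly.
cost_non_noisy = kl_bits(alice, (0.9, 0.1))
# Noisy vertex, Bob guesses from his own (independent) bit and is wrong.
cost_wrong_guess = kl_bits(alice, (0.1, 0.9))
# Noisy vertex, Bob uses the uniform estimate instead.
cost_uniform = kl_bits(alice, (0.5, 0.5))
```

The non-noisy term vanishes, while each noisy vertex costs only a constant number of bits, which is consistent with the total over the 100k layers of multilayer i being O(k).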
Rectangle based methods:
Lower bound CC using properties of large rectangles
[KLLRX ‘12]: Almost all known rectangle methods give the same bound for IC
Our contribution: New rectangle method powerful enough to separate IC and CC
Idea: Measure the size of a rectangle relative
to a new (arbitrary) distribution
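Definition: $(f,\mu)$ has the $(\epsilon,\delta)$-Relative
Discrepancy Property (RDP) if $\exists$ a distribution $\rho$ on
(x,y) s.t. $\forall$ rectangle $R$ with $\rho(R) > \delta$:
$\mu(R \cap f^{-1}(0)) \ge (\tfrac{1}{2} - \epsilon)\,\rho(R)$
Theorem: If $(f,\mu)$ has the $(\epsilon,\delta)$-RDP then
$CC(f,\mu) \ge \log(1/\delta)$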
We prove the $(\epsilon,\delta)$-RDP with …
The Distribution $\rho$
RD: $\mu(R \cap f^{-1}(0)) \ge (\tfrac{1}{2} - \epsilon)\,\rho(R)$
Randomly select multilayer i, set its vertices to noisy
Set all vertices before multilayer i to non-noisy
Set all vertices after multilayer i to noisy
$\rho$ is "close" to $\mu$ but "simpler":
Before multilayer i: same as $\mu$
After multilayer i: differs from $\mu$ only on typical vertices
Fix randomly and an assignment for vertices in
multilayers . Let be the restriction of the rectangle to inputs with .
Let
Show that is almost uniformly distributed.
Fix . Then the typical vertices are almost uniformly distributed.
If the measures of the rectangle according to $\mu$ and $\rho$ are significantly different, a non-negligible amount of information is known about the inputs on typical vertices.
Use Shearer’s inequality to show that the
information known about is very large, and hence is small.
First gap between IC and CC: $\forall k$, $\exists$ explicit $f, \mu$ such that
$IC(f,\mu) = O(k)$
$CC(f,\mu) \ge 2^{\Omega(k)}$
Hence: Interactive protocols cannot always be compressed to their information content
By [Bra ‘12]: this is the largest possible gap
By [BR ‘10]: this implies that a strong direct-sum theorem for CC does not hold