4.4 Examples
4.4.2 Gaussian Case
A similar test channel tension arises in the Gaussian case. This can be seen most clearly by considering the optimization problem over $\rho_{xz}$ for fixed $\sigma_X^2$. In Fig. 4.3 we plot
$$G_3(\rho_{xz}) = \inf_{\sigma_Y^2} \sup_{\lambda \in \Lambda} \inf_{\rho_{xy}, \rho_{yz}} G[K, \Sigma, \lambda, \Delta, R],$$
where we hold $\sigma_X^2 = 1$, and $K = K(1, \sigma_Y, 1, \rho_{xy}, \rho_{yz}, \rho_{xz})$ is the covariance matrix.
[Plot: exponent versus test channel correlation $\rho_{xz}$]
Figure 4.3: Test channel optimization for Theorem 19. The plot shows the exponent against $\rho_{xz}$, holding $\sigma_X^2 = 1$ fixed, for $R = 0.4$, $\zeta_{xy} = 0.7$ and $\Delta = 0.4$.
Intuitively, $\rho_{xz}$ controls the number of different codewords we use to cover the source sequences. At rate $R$ the scheme allows us to identify at most $\exp(nR)$ codewords uniquely, and binning is required to go beyond this. A large codebook has the advantage that each source sequence can be mapped to a better (i.e. closer) codeword. However, as we increase the size of the codebook beyond this point, the gains from having a “cleaner” codebook are outweighed by the penalty we pay for binning. From the plot we can see that there is an optimal choice around $\rho_{xz} = 0.76$ for these parameters.
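As a rough numerical companion to this intuition, the sketch below makes an assumption that is ours and not a statement of the scheme in Theorem 19: that the codebook rate equals the test channel mutual information $I(X;Z) = -\frac{1}{2}\log(1-\rho_{xz}^2)$ nats for jointly Gaussian $X, Z$ with correlation $\rho_{xz}$, so binning is needed once this exceeds $R$. Under that assumption, with $R = 0.4$ the no-binning threshold is $\rho_{xz} \approx 0.742$, and at $\rho_{xz} = 0.76$ the codebook rate is about 0.431 nats, so the optimum in Fig. 4.3 would involve a small amount of binning. The function names are hypothetical.

```python
import math

def codebook_rate(rho):
    """I(X;Z) in nats for jointly Gaussian X, Z with correlation rho (|rho| < 1).

    ASSUMPTION (illustrative only): the codebook rate equals this mutual information.
    """
    return -0.5 * math.log(1.0 - rho ** 2)

def binning_threshold(rate):
    """Correlation at which I(X;Z) equals `rate` nats; beyond it the codebook needs binning."""
    return math.sqrt(1.0 - math.exp(-2.0 * rate))

R = 0.4  # rate used in Fig. 4.3
for rho in (0.5, binning_threshold(R), 0.76, 0.9):
    excess = codebook_rate(rho) - R  # positive: more than exp(nR) codewords, so bin
    print(f"rho_xz = {rho:.3f}: I(X;Z) = {codebook_rate(rho):.3f} nats, excess = {excess:+.3f}")
```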
[Plot: achievable error exponent versus rate $R$ (nats per sample), with curves labeled Our Exponent, “Informed” Encoder and No Side Information]
Figure 4.4: A plot of the achievable exponent of Theorem 19. Here $\zeta_{xy} = 0.7$ (the correlation coefficient between the source and side information) and $\Delta = 0.4$. $R(\Delta) = 0.121$ nats for these parameters.

Figure 4.4 shows the exponent plotted against the rate, obtained by numerically solving the optimization problem. For comparison, the upper bound of Theorem 20 is included, as is the exponent for the case without side information, which corresponds to the continuous version of Marton’s point-to-point exponent [58]. This result was first proved by Ihara and Kubo [59], who showed that the exponent is
$$\inf_{\sigma_X:\, \frac{1}{2}\log\left(\frac{\sigma_X^2}{\Delta}\right) > R} D(f_{\sigma_X^2}\|f_1) = \frac{1}{2}\left(\Delta e^{2R} - \log\left(\Delta e^{2R}\right) - 1\right). \qquad (4.22)$$
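The closed form in (4.22) can be checked numerically. The sketch below works in nats, uses the standard identity $D(\mathcal{N}(0,s^2)\|\mathcal{N}(0,1)) = \frac{1}{2}(s^2 - \log s^2 - 1)$, and compares (4.22) with a brute-force grid search over $\sigma_X^2$; the helper names and grid sizes are our own choices. One caveat the code makes explicit: the right-hand side of (4.22) equals the infimum only when $\Delta e^{2R} \ge 1$, i.e. for $R \ge \frac{1}{2}\log(1/\Delta)$; below that rate $\sigma_X^2 = 1$ is feasible and the exponent is zero.

```python
import math

def gaussian_divergence(var, ref_var=1.0):
    """D(N(0, var) || N(0, ref_var)) in nats."""
    x = var / ref_var
    return 0.5 * (x - math.log(x) - 1.0)

def ihara_kubo_exponent(rate, delta):
    """Right-hand side of (4.22); zero when Delta*exp(2R) <= 1 (sigma_X^2 = 1 is then feasible)."""
    x = delta * math.exp(2.0 * rate)
    return 0.5 * (x - math.log(x) - 1.0) if x > 1.0 else 0.0

def brute_force_exponent(rate, delta, grid=100000, max_var=25.0):
    """Grid search of D(f_{sigma_X^2} || f_1) over sigma_X^2 with (1/2) log(sigma_X^2 / Delta) > R."""
    best = math.inf
    for k in range(1, grid + 1):
        var = max_var * k / grid
        if 0.5 * math.log(var / delta) > rate:
            best = min(best, gaussian_divergence(var))
    return best

for R in (0.4, 1.0):
    print(R, ihara_kubo_exponent(R, 0.4), brute_force_exponent(R, 0.4))
```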
We can show that our achievable exponent recovers (4.22) by taking the side information to be statistically independent of the source, i.e. $\zeta = 0$. In this case, one can show that $\rho_{xy} = \rho_{yz} = 0$ solves the inner optimization problem of (4.12). Further, since $X \perp\!\!\!\perp Y$, $Y$ cannot help achieve the distortion constraint, so choosing $\sigma_Y = 1$ is nature’s best play. With these choices we see that $D(K\|\bar{K}) = D(f_{\sigma_X^2}\|f_1)$, and we are left with the following equivalent optimization (where we have written $\hat{X} = \alpha Z$):
$$\inf_{\sigma_X} \sup_{\rho_{x\hat{x}},\, \sigma_{\hat{X}}} \begin{cases} D(f_{\sigma_X^2}\|f_1) & \text{if } E[(X - \hat{X})^2] \ge \Delta \text{ or } I(X;\hat{X}) \ge R, \\ \infty & \text{otherwise.} \end{cases}$$
As nature will always pick $\sigma_X$ such that the supremum is finite, we are left with
$$\inf_{\sigma_X:\, R(\sigma_X^2, \Delta) \ge R} D(f_{\sigma_X^2}\|f_1).$$
Expanding the divergence and appealing to the monotonicity of $x - \log x$ for $x \ge 1$ gives (4.22).²
Using (4.22) and Theorem 20, we can determine the error exponent exactly when the side information is available at both the encoder and the decoder. In this case, Wyner [79, Section 3] provides a simple scheme that achieves the rate-distortion function: the encoder simply subtracts the conditional mean $E[X|Y = y]$ from the source. An achievable exponent then follows by computing the point-to-point exponent for the random variable $X|Y = y$, which is again Gaussian, with mean $\zeta y$ and variance $1 - \zeta^2$. Our achievable exponent
in this case is
$$\inf_{\sigma_X:\, R(\sigma_X^2, \Delta) > R} D(f_{\sigma_X^2}\|f_{1-\zeta^2}) = \frac{1}{2}\left(\frac{\Delta e^{2R}}{1 - \zeta^2} - \log\frac{\Delta e^{2R}}{1 - \zeta^2} - 1\right). \qquad (4.23)$$
We now show that this is in fact the best we can do, by showing that (4.23) coincides with the upper bound of Theorem 20. The optimization problem of Theorem 20 can be solved as follows. We first note that if $X, Y$ are zero mean with covariance matrix $K$, then $\mathrm{Var}(X|Y) = \det(K)/\mathrm{Var}(Y)$. Hence we may write the problem as
$$\inf_{K \succeq 0:\, g(K, \Delta, R) \le 0} D(K\|\Sigma),$$
where $g(K, \Delta, R) = -\log\det(K) + \log(\Delta) + \log(e_2^T K e_2) + 2R$ and $e_2 = (0, 1)^T$. The KKT conditions tell us that the optimum $K^*$ must satisfy
²Using a virtually identical argument, one can show that the exponent of Theorem 17 reduces to Marton’s exponent for the discrete memoryless case when the side information is independent of the source.
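1. $-\frac{1}{2}(K^*)^{-1} + \frac{1}{2}\Sigma^{-1} + \lambda\left(-(K^*)^{-1} + \begin{pmatrix} 0 & 0 \\ 0 & \frac{1}{e_2^T K^* e_2} \end{pmatrix}\right) = 0$,
2. $\lambda\, g(K^*) = 0$.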
Solving this system, we find
$$K^* = \begin{pmatrix} \zeta^2 + \Delta e^{2R} & \zeta \\ \zeta & 1 \end{pmatrix}.$$
Evaluating $D(K^*\|\Sigma)$ yields (4.23). Therefore, when the side information is available at both the encoder and the decoder, the exponent is exactly (4.23).
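The KKT solution can be checked numerically at the parameters of Fig. 4.4. The sketch below assumes $\Sigma = \begin{pmatrix} 1 & \zeta \\ \zeta & 1 \end{pmatrix}$ (unit-variance source and side information with correlation $\zeta$, the choice under which $D(K^*\|\Sigma)$ reproduces (4.23)), and uses the multiplier $\lambda = \frac{\Delta e^{2R}}{2(1-\zeta^2)} - \frac{1}{2}$, which we obtained by solving the (1,1) entry of condition 1; this value is our derivation and is not stated in the text. Note that $\lambda \ge 0$ exactly when $R \ge \frac{1}{2}\log\frac{1-\zeta^2}{\Delta}$, which evaluates to about 0.121 nats here, matching the $R(\Delta)$ quoted for Fig. 4.4. Helper names are hypothetical.

```python
import numpy as np

def gaussian_kl(K, S):
    """D(N(0, K) || N(0, S)) in nats."""
    n = K.shape[0]
    return 0.5 * (np.trace(np.linalg.solve(S, K)) - n
                  - np.log(np.linalg.det(K) / np.linalg.det(S)))

def exponent_423(rate, delta, zeta):
    """Right-hand side of (4.23)."""
    x = delta * np.exp(2.0 * rate) / (1.0 - zeta ** 2)
    return 0.5 * (x - np.log(x) - 1.0)

def conditional_rate_distortion(delta, zeta):
    """(1/2) log((1 - zeta^2) / Delta) in nats: the rate above which lambda >= 0."""
    return 0.5 * np.log((1.0 - zeta ** 2) / delta)

def check_kkt(rate, delta, zeta):
    """Stationarity residual, constraint value, multiplier and D(K*||Sigma) at the claimed K*."""
    a = delta * np.exp(2.0 * rate)
    K = np.array([[zeta ** 2 + a, zeta], [zeta, 1.0]])  # claimed optimum K*
    S = np.array([[1.0, zeta], [zeta, 1.0]])            # ASSUMED Sigma
    e2 = np.array([0.0, 1.0])
    K_inv, S_inv = np.linalg.inv(K), np.linalg.inv(S)
    g = -np.log(np.linalg.det(K)) + np.log(delta) + np.log(e2 @ K @ e2) + 2.0 * rate
    grad_g = -K_inv + np.outer(e2, e2) / (e2 @ K @ e2)
    lam = a / (2.0 * (1.0 - zeta ** 2)) - 0.5          # our derivation, see lead-in
    residual = -0.5 * K_inv + 0.5 * S_inv + lam * grad_g  # KKT condition 1
    return np.abs(residual).max(), g, lam, gaussian_kl(K, S)

res, g, lam, kl = check_kkt(rate=0.4, delta=0.4, zeta=0.7)  # parameters of Fig. 4.4
print(res, g, lam, kl, exponent_423(0.4, 0.4, 0.7), conditional_rate_distortion(0.4, 0.7))
```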
CHAPTER 5
IMPROVED SOURCE CODING EXPONENTS VIA WITSENHAUSEN’S RATE
In this chapter we improve the results of the previous chapter for the special case in which the side information is available in full (i.e. without being encoded) at the decoder; see Fig. 5.1.
© 2011 IEEE. Portions reprinted, with permission, from [Kelly and Wagner, “Improved Source Coding Exponents via Witsenhausen’s Rate”, to appear in IEEE Transactions on Information Theory].
5.1 Notation and Preliminaries
For sets, types, etc., we use the same notation as in the previous chapter. Unless otherwise specified, exponents and logarithms are taken in base 2. We use $\|x\|_\infty$ to denote the supremum norm, i.e. $\|x\|_\infty = \max_i |x_i|$. The notation $T^n_{Q,\epsilon}$ denotes the $(Q, n, \epsilon)$-typical set, i.e. the set of $x \in \mathcal{X}^n$ satisfying $\|Q_x - Q\|_\infty \le \epsilon$, where $Q_x$ is the type of $x$.
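To make the typicality notation concrete, here is a minimal Python sketch of membership in $T^n_{Q,\epsilon}$: compute the type $Q_x$ of the sequence and compare it with $Q$ in the supremum norm. The helper names are ours, and the sketch assumes every symbol of $x$ lies in the alphabet listed in `q`.

```python
from collections import Counter

def empirical_type(x, alphabet):
    """Type Q_x: relative frequency of each symbol of `alphabet` in the sequence x."""
    counts = Counter(x)
    return {a: counts[a] / len(x) for a in alphabet}

def sup_norm_distance(p, q):
    """||p - q||_inf for distributions given as dicts over the same alphabet."""
    return max(abs(p[a] - q[a]) for a in q)

def is_typical(x, q, eps):
    """Membership test for T^n_{Q,eps}: ||Q_x - Q||_inf <= eps."""
    return sup_norm_distance(empirical_type(x, q.keys()), q) <= eps

Q = {0: 0.5, 1: 0.5}
print(is_typical([0, 1, 1, 0], Q, 0.1), is_typical([0, 0, 0, 1], Q, 0.1))
```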
[Figure 5.1: block diagram. The encoder observes $X$ and communicates at rate $R$ to the decoder, which also observes $Y$ and outputs $\hat{X}$.]

A graph $G = (V, E)$ is a pair of sets, where $V$ is the set of vertices and $E \subset V \times V$ is the set of edges. Two vertices $x, y \in V$ are connected iff $(x, y) \in E$. In this chapter we need only consider simple graphs, i.e. undirected graphs without self-loops. The degree of a vertex $v$, $\Delta(v)$, is the number of other vertices to which $v$ is connected. The degree of a graph $G$, denoted $\Delta(G)$, is defined as $\max_{v \in V} \Delta(v)$. A coloring of a graph is an assignment of colors to vertices such that no pair of adjacent vertices shares the same color. The chromatic number of $G$, $\gamma(G)$, is the smallest number of colors needed to color $G$. For $U \subset V$, $G(U)$ is the (vertex-)induced subgraph, i.e. the graph with vertex set $U$ and edge set $E \cap (U \times U)$. An independent set of $G$ is a set of vertices $U \subset V$ such that the induced subgraph $G(U)$ contains no edges. The graph $\bar{G}$ is the complement of $G$: it has the same vertex set as $G$, and two distinct vertices are connected in $\bar{G}$ if and only if they are not connected in $G$. A clique of $G$ is a subset of the vertices of $G$ such that every two distinct vertices are connected. A graph $G$ is called perfect if the chromatic number of every induced subgraph $G(V')$ is equal to the size of the largest clique of $G(V')$.
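The definitions of chromatic number, clique number and perfection can be checked by brute force on small graphs. The sketch below is exponential-time and meant only as an illustration; it represents a graph as a dict mapping each vertex to its set of neighbors, which is our choice of representation. The 4-cycle is perfect, whereas the 5-cycle is not: it needs three colors, but its largest clique has only two vertices.

```python
import itertools

def cycle(n):
    """The n-cycle C_n (n >= 3) as a dict mapping each vertex to its set of neighbors."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def chromatic_number(vertices, adj):
    """gamma(G(U)) for U = vertices, by trying every assignment of k colors."""
    vertices = list(vertices)
    edges = [(u, v) for u, v in itertools.combinations(vertices, 2) if v in adj[u]]
    for k in range(1, len(vertices) + 1):
        for colors in itertools.product(range(k), repeat=len(vertices)):
            color = dict(zip(vertices, colors))
            if all(color[u] != color[v] for u, v in edges):
                return k
    return 0  # empty vertex set

def clique_number(vertices, adj):
    """Size of the largest clique of G(U) for U = vertices."""
    vertices = list(vertices)
    for k in range(len(vertices), 0, -1):
        for subset in itertools.combinations(vertices, k):
            if all(v in adj[u] for u, v in itertools.combinations(subset, 2)):
                return k
    return 0

def is_perfect(adj):
    """True if every induced subgraph G(V') has chromatic number equal to its clique number."""
    vertices = list(adj)
    return all(chromatic_number(sub, adj) == clique_number(sub, adj)
               for k in range(1, len(vertices) + 1)
               for sub in itertools.combinations(vertices, k))

print(is_perfect(cycle(4)), is_perfect(cycle(5)))
```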
Let $G = (V, E)$ and $H = (V', E')$ be two graphs. The strong product (also called the and product or normal product) $G \wedge H$ is the graph whose vertex set is $V \times V'$ and in which two vertices $(v, v')$, $(u, u')$ are connected iff
1. $v = u$ and $(v', u') \in E'$, or
2. $v' = u'$ and $(v, u) \in E$, or
3. $(v, u) \in E$ and $(v', u') \in E'$.
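The three connection rules translate directly into code. The sketch below builds $G \wedge H$ for simple graphs stored as dicts from each vertex to its set of neighbors (our representation); because there are no self-loops, no vertex ends up adjacent to itself. In the strong square of the path $0 - 1 - 2$, the center $(1,1)$ is adjacent to all eight other vertices.

```python
import itertools

def strong_product(adj_g, adj_h):
    """G ∧ H for simple graphs given as dicts mapping each vertex to its set of neighbors."""
    def connected(v, vp, u, up):
        return ((v == u and up in adj_h[vp])             # rule 1
                or (vp == up and u in adj_g[v])          # rule 2
                or (u in adj_g[v] and up in adj_h[vp]))  # rule 3
    vertices = list(itertools.product(adj_g, adj_h))
    return {(v, vp): {(u, up) for (u, up) in vertices if connected(v, vp, u, up)}
            for (v, vp) in vertices}

P3 = {0: {1}, 1: {0, 2}, 2: {1}}  # path 0 - 1 - 2
G2 = strong_product(P3, P3)
print(sorted(G2[(0, 0)]), len(G2[(1, 1)]))
```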
We will be interested in $G^n = G \wedge G \wedge \cdots \wedge G$ ($n$ factors), the $n$-fold strong product of $G$. One may think of the vertices of $G^n$ as length-$n$ vectors $(v_1, \ldots, v_n)$, with two distinct vertices connected in $G^n$ if each pair of corresponding components is either equal or connected in $G$. The characteristic graph $G_X$ of a source $P_{XY}$ is the graph whose vertex set is $\mathcal{X}$, in which two distinct vertices $x, x'$ are connected if there is a $y \in \mathcal{Y}$ such that $P(y|x')P(y|x) > 0$. For a given $y$, the set $Z(y) = \{x : P(x|y) > 0\}$ is the set of ‘confusable’ symbols, i.e. the set of $x$ that can occur with a given $y$.
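The characteristic graph, the confusable sets and adjacency in $G^n$ can be computed directly from a joint pmf. In the sketch below the source is a hypothetical example in which $y = a$ occurs with $x \in \{0, 1\}$ and $y = b$ with $x \in \{1, 2\}$, so $G_X$ is the path $0 - 1 - 2$. The sketch assumes every listed $x$ has $P(x) > 0$, in which case $P(y|x) > 0$ exactly when $P(x, y) > 0$. It also checks that each $Z(y)$ is a clique of $G_X$, which follows from the definition.

```python
import itertools

def characteristic_graph(p_xy):
    """G_X for a source given as a dict (x, y) -> P(x, y), assuming P(x) > 0 for each listed x."""
    support = {(x, y) for (x, y), p in p_xy.items() if p > 0}
    xs = sorted({x for x, _ in p_xy})
    ys = {y for _, y in support}
    adj = {x: set() for x in xs}
    for x, xp in itertools.combinations(xs, 2):
        # distinct x, x' are joined iff some y has positive probability with both
        if any((x, y) in support and (xp, y) in support for y in ys):
            adj[x].add(xp)
            adj[xp].add(x)
    return adj

def confusable_set(p_xy, y):
    """Z(y) = {x : P(x|y) > 0}."""
    return {x for (x, yy), p in p_xy.items() if yy == y and p > 0}

def adjacent_in_power(adj, xs, xps):
    """Distinct sequences are adjacent in G^n iff each coordinate pair is equal or adjacent in G."""
    return tuple(xs) != tuple(xps) and all(a == b or b in adj[a] for a, b in zip(xs, xps))

P = {(0, 'a'): 0.3, (1, 'a'): 0.2, (1, 'b'): 0.2, (2, 'b'): 0.3}  # hypothetical source
G_X = characteristic_graph(P)
print(G_X, confusable_set(P, 'a'), adjacent_in_power(G_X, (0, 1), (1, 2)))
```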
For a graph $G$ and a distribution $Q$ on the vertices of $G$, we define the following functional.
Definition 9.
$$\kappa(G, Q) = \max_{W:\, W \ll G,\; QW = Q} H(W|Q). \qquad (5.1)$$
Note: whenever we write the graph G where a matrix is expected, we abuse notation and refer to the matrix G = A + I where A is the adjacency matrix of graph G and I is the identity matrix.
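A second, equivalent definition of $\kappa$ is
$$\kappa(G, Q) = \max_{X, \tilde{X}:\, Q_X = Q_{\tilde{X}} = Q} H(\tilde{X}|X), \qquad (5.2)$$
where $X$ and $\tilde{X}$ have a common alphabet and $P(\tilde{x}|x) > 0$ only if $(\tilde{x}, x) \in E(G)$ or $x = \tilde{x}$.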
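Because $H(X) = H(Q)$ is fixed in (5.2), maximizing $H(\tilde{X}|X)$ is the same as maximizing the joint entropy $H(X, \tilde{X})$ over couplings whose two marginals equal $Q$ and whose support lies in $A + I$; setting $W(\tilde{x}|x) = P(x, \tilde{x})/Q(x)$ then gives a feasible $W$ in (5.1). One way to compute this maximum-entropy coupling, which is our choice and not a method from the text, is iterative proportional fitting (Sinkhorn scaling): starting from the uniform distribution on the allowed pairs and alternately rescaling rows and columns to match $Q$. The sketch assumes $Q$ has full support and reports $\kappa$ in bits.

```python
import numpy as np

def kappa(adj, q, iters=5000, tol=1e-13):
    """Evaluate (5.2) via a maximum-entropy coupling found by iterative proportional fitting.

    adj: symmetric 0/1 adjacency matrix A of G (no self-loops); q: distribution Q, full support.
    Returns (kappa in bits, joint P(x, x_tilde)).
    """
    q = np.asarray(q, dtype=float)
    allowed = (np.asarray(adj, dtype=float) + np.eye(len(q))) > 0  # support of G = A + I
    p = allowed / allowed.sum()                                     # uniform on allowed pairs
    for _ in range(iters):
        p = p * (q / p.sum(axis=1))[:, None]  # make the X-marginal equal to Q
        p = p * (q / p.sum(axis=0))[None, :]  # make the X_tilde-marginal equal to Q
        if np.abs(p.sum(axis=1) - q).max() < tol:
            break
    nz = p > 0
    h_joint = -np.sum(p[nz] * np.log2(p[nz]))
    h_q = -np.sum(q * np.log2(q))
    return h_joint - h_q, p  # H(X_tilde | X) = H(X, X_tilde) - H(X)

path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])  # path 0 - 1 - 2
print(kappa(path, np.full(3, 1 / 3))[0])
```

For the complete graph the independent coupling is allowed and $\kappa = H(Q)$, while for a graph with no edges only $\tilde{X} = X$ is allowed and $\kappa = 0$.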
We remark that similar optimizations arise in the determination of maximum entropy Markov chains subject to moment constraints [80].
We will also make use of the following graph functionals from graph/zero-error information theory.
Definition 10. The graph entropy [64], $H(G, Q)$, of a graph $G$ and a distribution $Q$ on the vertices of $G$ is defined as
$$H(G, Q) = \min_{X \in Z \in \Gamma(G)} I(X; Z), \qquad (5.3)$$
where $X$ is a random node in the graph with distribution $Q$, $\Gamma(G)$ denotes the set of all maximal independent sets of $G$, and the notation $X \in Z$ means $P_{Z|X}(z|x) = 0$ for $x \notin z$.
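Definition 10 is a convex minimization, and one standard way to evaluate it numerically is a Blahut-Arimoto-style alternating minimization; this is our choice of algorithm, not one taken from the text. For a fixed distribution $r$ on $\Gamma(G)$, the best test channel puts $P_{Z|X}(z|x) \propto r(z)$ on the maximal independent sets containing $x$; for a fixed channel, the best $r$ is the induced distribution of $Z$. The sketch enumerates $\Gamma(G)$ by brute force, so it only suits small graphs, assumes $Q$ has full support, and works in bits. For the 5-cycle with uniform $Q$, the uniform distribution on its five maximal independent sets $\{i, i+2\}$ is a fixed point of the iteration, giving $\log_2 5 - 1 = \log_2(5/2)$.

```python
import itertools
import numpy as np

def maximal_independent_sets(adj):
    """Brute-force list of Gamma(G) for a 0/1 adjacency matrix (small graphs only)."""
    n = len(adj)
    result = []
    for k in range(1, n + 1):
        for s in itertools.combinations(range(n), k):
            independent = all(not adj[i][j] for i, j in itertools.combinations(s, 2))
            # maximal: every vertex outside s has a neighbor inside s
            maximal = all(any(adj[v][u] for u in s) for v in range(n) if v not in s)
            if independent and maximal:
                result.append(set(s))
    return result

def graph_entropy(adj, q, iters=2000):
    """Approximate H(G, Q) in (5.3), in bits, by alternating minimization over P_{Z|X} and P_Z."""
    q = np.asarray(q, dtype=float)
    sets = maximal_independent_sets(adj)
    member = np.array([[1.0 if x in z else 0.0 for z in sets] for x in range(len(q))])
    r = np.full(len(sets), 1.0 / len(sets))  # current distribution on Gamma(G)
    for _ in range(iters):
        w = member * r[None, :]
        w /= w.sum(axis=1, keepdims=True)  # W(z|x) proportional to r(z) on sets containing x
        r = q @ w                          # induced distribution of Z
    joint = q[:, None] * w
    nz = joint > 0
    ratio = w[nz] / np.broadcast_to(r, w.shape)[nz]
    return float(np.sum(joint[nz] * np.log2(ratio)))  # I(X; Z)

c5 = np.zeros((5, 5))
for i in range(5):
    c5[i, (i + 1) % 5] = c5[(i + 1) % 5, i] = 1
print(graph_entropy(c5, np.full(5, 0.2)))
```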
Definition 11. The complementary graph entropy (or co-entropy or π-entropy) [62, 63], $\bar{H}(G, Q)$, of a graph $G$ with a distribution $Q$ on the vertices of $G$ is defined as
$$\bar{H}(G, Q) = \lim_{\epsilon \to 0} \limsup_{n \to \infty} \frac{\log \gamma\big(G^n(T^n_{Q,\epsilon})\big)}{n}. \qquad (5.4)$$
Graph entropy and complementary graph entropy are related as follows (see, for example, [81, Theorem 4]):
$$\bar{H}(G, Q) \le H(G, Q), \qquad (5.5)$$
and equality holds in (5.5) if $G$ is perfect [82, Corollary 12].