1 be a sequence of i.i.d. random variables
in L with d(X1) > 0 and with a common probability distribution pX over R. Then, for
any > 0, there is a family of -REP(pX) partial Hadamard matrices of dimension
mN × N, for N = 2n with ρ = d(X1).
We also have the general result in Theorem 4.6, which implies that one can construct a family of truncated Hadamard matrices which is -REP for a class of distributions.
Theorem 4.6 (Achievability result). Let Π be a family of probability distributions
with strictly positive RID. Then, for any > 0, there is a family of -REP(Π) partial Hadamard matrices of dimension mN × N, for N = 2n, with ρ = supπ∈Πd(π).
Theorem 4.6 implies that there is a fixed ensemble of measurement matrices capable of capturing the information of all the distributions in the family Π. This is very useful in applications because usually taking the measurements is costly and most of the time, we do not have the exact distribution of the signal. If each distribution needs its own specific measurement matrix, we might need to do several rounds of measurements each time taking the measurements compatible with one specific distribution and do the recovery process for that specific distribution. The benefit of Theorem 4.6 is that one measurement ensemble works for all the distributions. It is also good to notice that although the measurement ensemble is fixed, the recovery (decoding) process might need to know the exact distribution of the signal in order to have successful recovery.
4.4 Proof Techniques
In this section, we will give a brief overview of the techniques used to prove the results. We will divide this section into two subsections. In Subsection 4.4.1, we will overview the proof techniques for the RID. Subsection 4.4.3 will be devoted to proof ideas and intuitions about the A2A compression problem.
4.4.1 Rényi Information Dimension
In this part, we will prove Theorem 4.1 and 4.2 which together give a full characteri- zation of the RID over the space L.
Proof of Theorem 4.1. To prove the first part of the theorem, notice that
H([X1n]q) .= H([X1n]q,Θk1) .= H([X1n]q|Θk1),
because H(Θk
1) ≤ k .= 0. As Θk1 ∈ {0, 1}k and takes finitely many values, it is
sufficient to show that for any realization θk
1, lim q→∞ H([Xn 1]q|θ1k) log2(q) = rank(ACθ). (4.3)
Taking the expectation over Θk
1, we will get the result. To prove (4.3), notice that
H([X1n]q|θk1) = H([ACθUCθ+ AC¯θVC¯θ]q) .= H([ACθUCθ + AC¯θVC¯θ]q|VC¯θ) (4.4)
.
where we used H(VC¯θ) ≤ NH(V1) .= 0. We also used the fact that knowing VC¯θ,
[ACθUCθ]q and [ACθUCθ+ ACθ¯ VCθ¯ ]q are equal up to a finite uncertainty. Specifically,
suppose L is the minimum number of lattices of size 1
q required to cover ACθ¯ ×[0,2q]| ¯Cθ|,
which is a finite number. Then
H([ACθUCθ]q|VCθ¯ ,[ACθUCθ + ACθ¯ VCθ¯ ]q) ≤ log2(L),
which implies (4.4) and (4.5).
Generally ACθ is not full rank. Assume that the rank of ACθ is equal to m and let
Am be a subset of linearly independent rows. It is not difficult to see that knowing
[AmUCθ]q there is only finite uncertainty in the remaining components of [ACθUCθ]q,
which is negligible compared with log2(q) as q tends to infinity. Therefore, we obtain
H([X1n]q|θ1k) .= H([ACθUCθ]q) .= H([AmUCθ]q) .= m log2(q).
Thus, taking the limit as q tends to infinity, we obtain lim
q→∞
H([X1n]|θ1k)
log2(q) = rank(ACθ).
Also, taking the expectation with respect to Θk
1, we obtain d(X1n) = E{rank(ACΘ)},
which is the desired result.
To prove the second part of the theorem, notice that H([Xn
1]q|Y1m) .= H([X1n]q|Y1m,Θk1).
For a specific realization θk
1, we have H([X1n]q|Y1m, θk1) = H([ACθUCθ + ACθ¯ VCθ¯ ]q|BCθUCθ + BCθ¯ VCθ¯ ) . = H([ACθUCθ + ACθ¯ VCθ¯ ]q|BCθUCθ + BCθ¯ VCθ¯ , VCθ¯ ) . = H([ACθUCθ]q|BCθUCθ).
Generally, ACθ is not full-rank. Let Am be the set of all linearly independent rows of
ACθ of size m. Then H([ACθUCθ]q|BCθUCθ) .= H([AmUCθ]q|BCθUCθ).
It may happen that some of the rows of Amcan be written as a linear combination
of rows of BCθ. Let Ar be the remaining matrix after dropping m − r predictable
rows of Am. Given, BCθUCθ, ArUCθ has a continuous distribution thus
H([ArUCθ]q|BCθUCθ) .= r log2(q).
It is easy to check that r is exactly R(A; B)[Cθ]. Therefore, taking the expectation
with respect to Θk
1, we get d(X1n|Y1m) = E{R(A; B)[CΘ]}.
Using the results of Theorem 4.1, we can prove Theorem 4.2.
Proof of Theorem 4.2. For part 1, the proof is simple by considering the rank
characterization. We know that Xn
1 = AZ1k and d(X1n) = E{rank(ACΘ)}. Moreover,
M X1n= MAZ1k, thus d(X1n) = E{rank(MACΘ)}. As M is invertible rank(ACΘ) =
rank(MACΘ), and we get the result.
For part 2, notice that for any realization θk
1 and the corresponding set Cθ,
4.4. Proof Techniques 77
Taking the expectation over Θk
1, we get the desired result
d(X1n, Y1m) = d(X1n) + d(Y1m|X1n) = d(Y1m) + d(X1n|Y1m).
For part 3, using the chain rule result from part 2 and applying the definition of
IR(X1n; Y1m), we get
IR(X1n; Y1m) = d(X1n) + d(Y1m) − d(X1n, Y1m),
which shows the symmetry of IR with respect to X1n and Y1m.
For part 4, notice that for a specific realization θk
1, a simple rank check shows
that R(A; B)[Cθ] ≤ rank(ACθ). Taking the expectation over Θk1, we get d(X1n|Y1m) ≤
d(X1n).
If Xn
1 and Y1m are independent, the equality follows from the definition. For
the converse part, notice that if Xm
1 is fully discrete then d(X1n|Y1m) ≤ d(X1n) = 0.
Similarly, if Ym
1 is fully discrete then d(Y1m|X1n) ≤ d(Y1m) = 0 and using the identity
d(X1n) − d(X1n|Y1m) = d(Y1m) − d(Y1m|X1n), we get the equality. This case is fine
because after removing the discrete Zi, i∈ [k], either X1nor Y1m is equal to 0, namely,
a deterministic value, and the independence holds. Assume that none of Xn
1 or Y1m is fully discrete. Without loss of generality, let
Zr
1, r ≤ k, be the non-discrete random variables among Z1kand let ˜X1nand ˜Y1mbe the
resulting random vectors after dropping the discrete constituents, namely, we have ˜
X1n = ArZ1r and ˜Y1m = BrZ1r, where Ar and Br are the matrices consisting of the
first r columns of A and B respectively. It is easy to check that d(Xn
1) = d( ˜X1n) and
d( ˜X1n| ˜Y1m) = d(X1n|Ym
1 ). Thus, it remains to show that ˜X1n and ˜Y1m are independent.
As we have dropped all the discrete components, the resulting Θi, i ∈ [r], are 1
with strictly positive probability. This implies that for any realization of θn
1 and
the corresponding Cθ, R(Ar; Br)[Cθ] = rank(Ar,Cθ). In particular, this holds for any
Cθ of size 1, namely, for any column of Ar and Br, which implies that if Ar has
a non-zero column the corresponding column in Br must be zero and if Br has a
non-zero column then the corresponding column in Ar must be zero. This implies
that ˜X1n and ˜Y1m depend on disjoint subsets of the random variables Z1r. Therefore,
they must be independent.
4.4.2 Polarization of the RID
In this part, we will prove the polarization of the RID in the single and multi-terminal case as stated in Theorem 4.3. The main idea is to use the recursive structure of the Hadamard matrices and the rank characterization of the RID in the space L.
Proof of Theorem 4.3. For the initial value, we have I0(1) = d(X1). Let n ∈ N
and N = 2n. To simplify the proof, instead of the Hadamard matrices, H, we will
use shuffled Hadamard matrices, ˜H, constructed as follows: ˜H1 = H1 and ˜H2N is
constructed from ˜HN as follows
˜h1 ... ˜hN → ˜h1 , ˜h1 ˜h1 , −˜h1 ... , ... ˜hi , ˜hi ˜hi , −˜hi ... , ... ,
where ˜hi, i ∈ [N] denotes the i-th row of the ˜HN. Let X1N be as in Theorem 4.3 and
let ˜Z1N = ˜HNX1N, where HN has been replaced by ˜HN. Also, let ˜In(i) = d( ˜Zi| ˜Z1i−1),
i∈ [N]. We first prove that ˜I is also an erasure process with initial value d(X1) and
evolves as follows
˜In(i)+ = ˜In+1(2i − 1) = 2˜In(i) − ˜In(i)2
˜In(i)− = ˜In+1(2i) = ˜In(i)2,
where i ∈ [N] with the corresponding {+, −}-labeling bn
1. Let ˜Hi−1 and ˜Hi denote
the first i − 1 and the first i rows of ˜HN. Also, let ˜hi denote the i-th row of ˜HN.
We have ˜Z1i = ˜HiX1N and ˜Z1i−1 = ˜Hi−1X1N. As X1N are i.i.d. nonsingular random
variables, it results that ˜Z1i belong to the space L generated by X1N. Notice that
using the rank characterization for the RID over L, we have
d( ˜Zi| ˜Z1i−1) = E{I( ˜Hi−1; ˜hi)[CΘ]},
where I( ˜Hi−1; ˜h
i)[CΘ] ∈ {0, 1} is the amount of increase of rank of ˜HCi−1Θ by adding
˜hi. Consider the stage n + 1, where we have the shuffled Hadamard matrix ˜H2N.
Consider the row i+, which corresponds to the row 2i − 1 of ˜H
2N. If we look at the
first block of the new matrix, we simply notice that adding ˜hi has the same effect in
increasing the rank of this block as it had in ˜HN. A similar argument holds for the
second block. Moreover, adding ˜hi increases the rank of the matrix if it increases
the rank of either the first or the second block or both. Let 1i(ΘN1 ) ∈ {0, 1} denote
the random rank increase in ˜Hi−1 by adding ˜h
i, then we have
12i−1(Θ2N1 ) = 1i(ΘN1 ) + 1i(ΘN2N+1) − 1i(ΘN1 )1i(Θ2NN+1).
ΘN
1 and Θ2NN+1 are i.i.d. random variables and a simple check shows that 1i(ΘN1 )
and 1i(Θ2NN+1) are also i.i.d. Taking the expectation value, we obtain
˜In(i)+= ˜In+1(2i − 1) = 2˜In(i) − ˜In(i)2. (4.6)
Moreover, if we denote ˜WN
1 = ˜HNXN2N+1, then by the structure of ˜HN, it is easy
to see that ˜In(i)+ and ˜In(i)− can be written as follows:
˜In(i)+= ˜In+1(2i − 1) = d( ˜Zi+ ˜Wi| ˜Z1i−1, ˜W1i−1),
˜In(i)−= ˜In+1(2i) = d( ˜Zi− ˜Wi| ˜Zi+ ˜Wi, ˜Z1i−1, ˜W1i−1).
Using the chain rule for the RID, we have ˜In(i)++ ˜In(i)−
2 = 12d( ˜Zi− ˜Wi, ˜Zi+ ˜Wi| ˜Z1i−1, ˜W1i−1)
= 12d( ˜Zi, ˜Wi| ˜Z1i−1, ˜W1i−1) = d( ˜Zi,| ˜Z1i−1) = ˜In(i),
which along with (4.6), implies that ˜In(i)− = ˜In(i)2. Therefore, ˜I evolves like an
erasure process with the initial value d(X).
Notice that the only difference between HN and ˜HN is the permutation of the
rows, namely, there is a row shuffling matrix BN such that ˜HN = BNHN. It was
4.4. Proof Techniques 79
However, notice that XN
1 is an i.i.d. sequence and BNX1N is again an i.i.d. sequence
with the same distribution as XN
1 . In particular, adding or removing BN does not
change the RID values, which implies that for ZN
1 = HNX1N and In(i) = d(Zi|Z1i−1),
In(i) = ˜In(i). Therefore, I is also an erasure process with initial value d(X), which
polarizes to {0, 1}.
4.4.3 A2A Compression
In this part, we prove the achievability part for the single-terminal case.
Proof of Theorem 4.5. We will give an explicit construction of the the measure-
ment ensemble. Let n ∈ N and N = 2n. Assume that XN
1 is a sequence of i.i.d.
nonsingular random variables with RID equal to d(X). Let ZN
1 = HNX1N, where HN
is the Hadamard matrix of order N. Also, assume that In(i) = d(Zi|Z1i−1), i ∈ [N].
As we proved in Theorem 4.3, I is an erasure process with initial value d(X). We will construct the measurement matrix ΦN by selecting all the rows of HN with the
corresponding In value greater than or equal to d(X). Therefore, we can construct
the measurement ensemble {ΦN} labelled with all N that are a power of 2. Assume
that the dimension of ΦN is mN × N. It remains to prove that the ensemble {ΦN}
is -REP with measurement rate d(X). This will complete the proof of Theorem 4.5. We first show that the family {ΦN} has measurement rate d(X). Notice that the
process Inconverges almost surely. Thus, it also converges in probability. Specifically,
considering the uniform probability assumption, we have lim sup N→∞ mN N = lim supN→∞ #{i ∈ [N] : In(i) ≥ d(X)} N = lim sup n→∞ P(In≥ d(X)) = P(I∞≥ d(X)) = d(X).
It remains to prove that {ΦN} is -REP. Let S = {i ∈ [N] : In(i) ≥ d(X)} denote
the selected rows to construct ΦN and let Z1N = HNX1N be the full measurements.
It is easy to check that ΦNX1N = ZS. Also, let Bi = Sc∩ [i − 1] denote all the indices
in Scbefore i. We have d(X1N|ZS) = d(Z1N|ZS) = d(ZSc|ZS) = X i∈Sc d(Zi|ZBi, ZS) ≤ X i∈Sc d(Zi|Z1i−1) = X i∈Sc In(i) ≤ N d(X) = d(X1N),
which shows the -REP property for {ΦN}.
The last step is to prove Theorem 4.6, i.e., to show that for a family of mixture distributions Π with strictly positive RID, there is a fixed measurement family {ΦN},
which is -REP for all the distributions in Π with a measurement rate vector lying in the Rényi information region of of the family.
Proof of Theorem 4.6. The proof is simple considering the fact that the construc-
tion of the family {ΦN} in the proof of Theorem 4.5 depends only on the erasure
pattern. Also, the erasure pattern is independent of the shape of the distribution and only depends on its RID. Moreover, it can be shown that the erasure patterns for
different values of δ are ordered, namely, for δ > δ0, Iδ
n(i) ≥ Iδ
0
n(i), i ∈ [N]. Consider-
ing the method we use to construct the family {ΦN}, this implies that an -REP
measurement family designed for a specific RID δ is -REP for any distribution with RID less than δ. Thus, if we design {ΦN} for supπ∈Πd(π), it will be -REP for any
distribution in the family.