Proof Techniques - Compressed Sensing of Memoryless Sources:A Deterministic Hadamard Constructi

1 be a sequence of i.i.d. random variables

in L with d(X1) > 0 and with a common probability distribution pX over R. Then, for

any > 0, there is a family of -REP(pX) partial Hadamard matrices of dimension

mN × N, for N = 2n with ρ = d(X1).

We also have the general result in Theorem 4.6, which implies that one can construct a family of truncated Hadamard matrices which is -REP for a class of distributions.

Theorem 4.6 (Achievability result). Let Π be a family of probability distributions

with strictly positive RID. Then, for any > 0, there is a family of -REP(Π) partial Hadamard matrices of dimension mN × N, for N = 2n, with ρ = supπ∈Πd(π).

Theorem 4.6 implies that there is a fixed ensemble of measurement matrices capable of capturing the information of all the distributions in the family Π. This is very useful in applications because usually taking the measurements is costly and most of the time, we do not have the exact distribution of the signal. If each distribution needs its own specific measurement matrix, we might need to do several rounds of measurements each time taking the measurements compatible with one specific distribution and do the recovery process for that specific distribution. The benefit of Theorem 4.6 is that one measurement ensemble works for all the distributions. It is also good to notice that although the measurement ensemble is fixed, the recovery (decoding) process might need to know the exact distribution of the signal in order to have successful recovery.

4.4 Proof Techniques

In this section, we will give a brief overview of the techniques used to prove the results. We will divide this section into two subsections. In Subsection 4.4.1, we will overview the proof techniques for the RID. Subsection 4.4.3 will be devoted to proof ideas and intuitions about the A2A compression problem.

4.4.1 Rényi Information Dimension

In this part, we will prove Theorem 4.1 and 4.2 which together give a full characterization of the RID over the space L.

Proof of Theorem 4.1. To prove the first part of the theorem, notice that

H([X₁n]q) .= H([X1n]q,Θk1) .= H([X1n]q|Θk1),

because H(Θk

1) ≤ k .= 0. As Θk1 ∈ {0, 1}k and takes finitely many values, it is

sufficient to show that for any realization θk

1, lim q→∞ H([Xn 1]q|θ1k) log2(q) = rank(ACθ). (4.3)

Taking the expectation over Θk

1, we will get the result. To prove (4.3), notice that

H([X₁n]q|θk1) = H([ACθUCθ+ AC¯θVC¯θ]q) .= H([ACθUCθ + AC¯θVC¯θ]q|VC¯θ) (4.4)

where we used H(VC¯θ) ≤ NH(V1) .= 0. We also used the fact that knowing VC¯θ,

[ACθUCθ]q and [ACθUCθ+ ACθ¯ VCθ¯ ]q are equal up to a finite uncertainty. Specifically,

suppose L is the minimum number of lattices of size 1

q required to cover ACθ¯ ×[0,2q]| ¯Cθ|,

which is a finite number. Then

H([ACθUCθ]q|VCθ¯ ,[ACθUCθ + ACθ¯ VCθ¯ ]q) ≤ log2(L),

which implies (4.4) and (4.5).

Generally ACθ is not full rank. Assume that the rank of ACθ is equal to m and let

Am be a subset of linearly independent rows. It is not difficult to see that knowing

[AmUCθ]q there is only finite uncertainty in the remaining components of [ACθUCθ]q,

which is negligible compared with log₂(q) as q tends to infinity. Therefore, we obtain

H([X₁n]q|θ1k) .= H([ACθUCθ]q) .= H([AmUCθ]q) .= m log2(q).

Thus, taking the limit as q tends to infinity, we obtain lim

q→∞

H([X₁n]|θ₁k)

log₂(q) = rank(ACθ).

Also, taking the expectation with respect to Θk

1, we obtain d(X1n) = E{rank(ACΘ)},

which is the desired result.

To prove the second part of the theorem, notice that H([Xn

1]q|Y1m) .= H([X1n]q|Y1m,Θk1).

For a specific realization θk

1, we have H([X₁n]q|Y1m, θk1) = H([ACθUCθ + ACθ¯ VCθ¯ ]q|BCθUCθ + BCθ¯ VCθ¯ ) . = H([ACθUCθ + ACθ¯ VCθ¯ ]q|BCθUCθ + BCθ¯ VCθ¯ , VCθ¯ ) . = H([ACθUCθ]q|BCθUCθ).

Generally, ACθ is not full-rank. Let Am be the set of all linearly independent rows of

ACθ of size m. Then H([ACθUCθ]q|BCθUCθ) .= H([AmUCθ]q|BCθUCθ).

It may happen that some of the rows of Amcan be written as a linear combination

of rows of BCθ. Let Ar be the remaining matrix after dropping m − r predictable

rows of Am. Given, BCθUCθ, ArUCθ has a continuous distribution thus

H([ArUCθ]q|BCθUCθ) .= r log2(q).

It is easy to check that r is exactly R(A; B)[Cθ]. Therefore, taking the expectation

with respect to Θk

1, we get d(X1n|Y1m) = E{R(A; B)[CΘ]}.

Using the results of Theorem 4.1, we can prove Theorem 4.2.

Proof of Theorem 4.2. For part 1, the proof is simple by considering the rank

characterization. We know that Xn

1 = AZ1k and d(X1n) = E{rank(ACΘ)}. Moreover,

M X₁n= MAZ₁k, thus d(X₁n) = E{rank(MACΘ)}. As M is invertible rank(ACΘ) =

rank(MACΘ), and we get the result.

For part 2, notice that for any realization θk

1 and the corresponding set Cθ,

4.4. Proof Techniques 77

Taking the expectation over Θk

1, we get the desired result

d(X₁n, Y₁m) = d(X₁n) + d(Y₁m_|X₁n) = d(Y₁m) + d(X₁n_|Y₁m).

For part 3, using the chain rule result from part 2 and applying the definition of

IR(X1n; Y1m), we get

IR(X1n; Y1m) = d(X1n) + d(Y1m) − d(X1n, Y1m),

which shows the symmetry of IR with respect to X₁n and Y₁m.

For part 4, notice that for a specific realization θk

1, a simple rank check shows

that R(A; B)[Cθ] ≤ rank(ACθ). Taking the expectation over Θk1, we get d(X1n|Y1m) ≤

d(X₁n).

If Xn

1 and Y1m are independent, the equality follows from the definition. For

the converse part, notice that if Xm

1 is fully discrete then d(X1n|Y1m) ≤ d(X1n) = 0.

Similarly, if Ym

1 is fully discrete then d(Y1m|X1n) ≤ d(Y1m) = 0 and using the identity

d(X₁n) − d(X₁n_|Y₁m) = d(Y₁m) − d(Y₁m_|X₁n), we get the equality. This case is fine

because after removing the discrete Zi, i∈ [k], either X1nor Y1m is equal to 0, namely,

a deterministic value, and the independence holds. Assume that none of Xn

1 or Y1m is fully discrete. Without loss of generality, let

1, r ≤ k, be the non-discrete random variables among Z1kand let ˜X1nand ˜Y1mbe the

resulting random vectors after dropping the discrete constituents, namely, we have ˜

X₁n = ArZ1r and ˜Y1m = BrZ1r, where Ar and Br are the matrices consisting of the

first r columns of A and B respectively. It is easy to check that d(Xn

1) = d( ˜X1n) and

d( ˜X₁n| ˜Y₁m) = d(X₁n|Ym

1 ). Thus, it remains to show that ˜X1n and ˜Y1m are independent.

As we have dropped all the discrete components, the resulting Θi, i ∈ [r], are 1

with strictly positive probability. This implies that for any realization of θn

1 and

the corresponding Cθ, R(Ar; Br)[Cθ] = rank(Ar,Cθ). In particular, this holds for any

Cθ of size 1, namely, for any column of Ar and Br, which implies that if Ar has

a non-zero column the corresponding column in Br must be zero and if Br has a

non-zero column then the corresponding column in Ar must be zero. This implies

that ˜X₁n and ˜Y₁m depend on disjoint subsets of the random variables Z₁r. Therefore,

they must be independent.

4.4.2 Polarization of the RID

In this part, we will prove the polarization of the RID in the single and multi-terminal case as stated in Theorem 4.3. The main idea is to use the recursive structure of the Hadamard matrices and the rank characterization of the RID in the space L.

Proof of Theorem 4.3. For the initial value, we have I0(1) = d(X1). Let n ∈ N

and N = 2n_{. To simplify the proof, instead of the Hadamard matrices, H, we will}

use shuffled Hadamard matrices, ˜H, constructed as follows: ˜H1 = H1 and ˜H2N is

constructed from ˜HN as follows

   ˜h1 ... ˜hN    →          ˜h1 , ˜h1 ˜h1 , _−˜h1 ... , ... ˜hi , ˜hi ˜hi , −˜hi ... , ...          ,

where ˜hi, i ∈ [N] denotes the i-th row of the ˜HN. Let X1N be as in Theorem 4.3 and

let ˜Z₁N = ˜HNX1N, where HN has been replaced by ˜HN. Also, let ˜In(i) = d( ˜Zi| ˜Z₁i−1),

i∈ [N]. We first prove that ˜I is also an erasure process with initial value d(X1) and

evolves as follows

˜In(i)+ = ˜In+1(2i − 1) = 2˜In(i) − ˜In(i)2

˜In(i)− = ˜In+1(2i) = ˜In(i)2,

where i ∈ [N] with the corresponding {+, −}-labeling bn

1. Let ˜Hi−1 and ˜Hi denote

the first i − 1 and the first i rows of ˜HN. Also, let ˜hi denote the i-th row of ˜HN.

We have ˜Z₁i = ˜HiX₁N and ˜Z₁i−1 = ˜Hi−1X₁N. As X₁N are i.i.d. nonsingular random

variables, it results that ˜Z₁i belong to the space L generated by X₁N. Notice that

using the rank characterization for the RID over L, we have

d( ˜Zi| ˜Z1i−1) = E{I( ˜Hi−1; ˜hi)[CΘ]},

where I( ˜Hi−1_{; ˜h}

i)[CΘ] ∈ {0, 1} is the amount of increase of rank of ˜HCi−1Θ by adding

˜hi. Consider the stage n + 1, where we have the shuffled Hadamard matrix ˜H2N.

Consider the row i+_{, which corresponds to the row 2i − 1 of ˜}_H

2N. If we look at the

first block of the new matrix, we simply notice that adding ˜hi has the same effect in

increasing the rank of this block as it had in ˜HN. A similar argument holds for the

second block. Moreover, adding ˜hi increases the rank of the matrix if it increases

the rank of either the first or the second block or both. Let 1i(ΘN1 ) ∈ {0, 1} denote

the random rank increase in ˜Hi−1 _{by adding ˜h}

i, then we have

12i−1(Θ2N1 ) = 1i(ΘN1 ) + 1i(ΘN2N+1) − 1i(ΘN1 )1i(Θ2NN+1).

ΘN

1 and Θ2NN+1 are i.i.d. random variables and a simple check shows that 1i(ΘN1 )

and 1i(Θ2NN+1) are also i.i.d. Taking the expectation value, we obtain

˜In(i)+= ˜In+1(2i − 1) = 2˜In(i) − ˜In(i)2. (4.6)

Moreover, if we denote ˜WN

1 = ˜HNXN2N+1, then by the structure of ˜HN, it is easy

to see that ˜In(i)+ and ˜In(i)− can be written as follows:

˜In(i)+= ˜In+1(2i − 1) = d( ˜Zi+ ˜Wi| ˜Z₁i−1, ˜W₁i−1),

˜In(i)−= ˜In+1(2i) = d( ˜Zi− ˜Wi| ˜Zi+ ˜Wi, ˜Z1i−1, ˜W1i−1).

Using the chain rule for the RID, we have ˜In(i)++ ˜In(i)−

2 = 12d( ˜Zi− ˜Wi, ˜Zi+ ˜Wi| ˜Z1i−1, ˜W1i−1)

= 1₂d( ˜Zi, ˜Wi| ˜Z₁i−1, ˜W₁i−1) = d( ˜Zi,| ˜Z₁i−1) = ˜In(i),

which along with (4.6), implies that ˜In(i)− = ˜In(i)2. Therefore, ˜I evolves like an

erasure process with the initial value d(X).

Notice that the only difference between HN and ˜HN is the permutation of the

rows, namely, there is a row shuffling matrix BN such that ˜HN = BNHN. It was

4.4. Proof Techniques 79

However, notice that XN

1 is an i.i.d. sequence and BNX1N is again an i.i.d. sequence

with the same distribution as XN

1 . In particular, adding or removing BN does not

change the RID values, which implies that for ZN

1 = HNX1N and In(i) = d(Zi|Z1i−1),

In(i) = ˜In(i). Therefore, I is also an erasure process with initial value d(X), which

polarizes to {0, 1}.

4.4.3 A2A Compression

In this part, we prove the achievability part for the single-terminal case.

Proof of Theorem 4.5. We will give an explicit construction of the the measure-

ment ensemble. Let n ∈ N and N = 2n_{. Assume that X}N

1 is a sequence of i.i.d.

nonsingular random variables with RID equal to d(X). Let ZN

1 = HNX1N, where HN

is the Hadamard matrix of order N. Also, assume that In(i) = d(Zi|Z1i−1), i ∈ [N].

As we proved in Theorem 4.3, I is an erasure process with initial value d(X). We will construct the measurement matrix ΦN by selecting all the rows of HN with the

corresponding In value greater than or equal to d(X). Therefore, we can construct

the measurement ensemble {ΦN} labelled with all N that are a power of 2. Assume

that the dimension of ΦN is mN × N. It remains to prove that the ensemble {ΦN}

is -REP with measurement rate d(X). This will complete the proof of Theorem 4.5. We first show that the family {ΦN} has measurement rate d(X). Notice that the

process Inconverges almost surely. Thus, it also converges in probability. Specifically,

considering the uniform probability assumption, we have lim sup N→∞ mN N = lim supN→∞ #{i ∈ [N] : In(i) ≥ d(X)} N = lim sup n→∞ P(In≥ d(X)) = P(I∞≥ d(X)) = d(X).

It remains to prove that {ΦN} is -REP. Let S = {i ∈ [N] : In(i) ≥ d(X)} denote

the selected rows to construct ΦN and let Z1N = HNX1N be the full measurements.

It is easy to check that ΦNX1N = ZS. Also, let Bi = Sc∩ [i − 1] denote all the indices

in Sc_{before i. We have} d(X₁N_|ZS) = d(Z1N|ZS) = d(ZSc|Z_S) = X i∈Sc d(Zi|ZBi, ZS) ≤ X i∈Sc d(Zi|Z1i−1) = X i∈Sc In(i) ≤ N d(X) = d(X1N),

which shows the -REP property for {ΦN}.

The last step is to prove Theorem 4.6, i.e., to show that for a family of mixture distributions Π with strictly positive RID, there is a fixed measurement family {ΦN},

which is -REP for all the distributions in Π with a measurement rate vector lying in the Rényi information region of of the family.

Proof of Theorem 4.6. The proof is simple considering the fact that the construc-

tion of the family {ΦN} in the proof of Theorem 4.5 depends only on the erasure

pattern. Also, the erasure pattern is independent of the shape of the distribution and only depends on its RID. Moreover, it can be shown that the erasure patterns for

different values of δ are ordered, namely, for δ > δ0_{, I}δ

n(i) ≥ Iδ

n(i), i ∈ [N]. Consider-

ing the method we use to construct the family {ΦN}, this implies that an -REP

measurement family designed for a specific RID δ is -REP for any distribution with RID less than δ. Thus, if we design {ΦN} for supπ∈Πd(π), it will be -REP for any

distribution in the family.

In document Compressed Sensing of Memoryless Sources:A Deterministic Hadamard Construction (Page 95-100)