5.2 Compressive Sampling as Distributed Source Coding

FIGURE 5.2

Left: Depiction of Gaussian random vectors as an ellipsoid. Classical rate-distortion theory and transform coding results are for this sort of source, which serves as a good model for discrete cosine transform (DCT) coefficients of an image or modified discrete cosine transform (MDCT) coefficients of audio. Right: Depiction of 2-sparse signals in $\mathbb{R}^3$, which form a union of three subspaces. This serves as a good conceptual model for wavelet coefficients of images.

Table 5.1 Performance Summary: Distortions for Several Scenarios When $N$ Is Large with $\alpha = K/N$ Held Constant. Rate $R$ and Distortion $D$ Are Both Normalized by $K$. $J$ Represents the Sparsity Pattern of $u = V^T x$. The Boxed Entry Is a Heuristic Analysis of the Compressive Sampling Case. $H(\cdot)$ Represents the Binary Entropy Function, and the Rotational Loss $R^*$ Satisfies $R^* = O(\log R)$.

| Decoder | Centralized encoder (code $J$ and nonzeros of $u$) | Distributed encoder (scalar coding of $\Phi x$) |
|---|---|---|
| Knows $J$ a priori | $c\,2^{-2R}$ | $c\,2^{-2(R-R^*)}$ |
| Is told $J$ | $c\,2^{-2(R-H(\alpha)/\alpha)}$ | $c\,2^{-2(R-H(\alpha)/\alpha-R^*)}$ |
| Infers $J$ | N/A | $\boxed{c\,\delta\,(\log N)\,2^{-2\delta R}}$ |

index, denoted $J$, which will be detailed later. With all the proper accounting, when $K \ll N$, the savings are more dramatic than just a constant number of bits.

Following the compressive sampling framework, one obtains a rather different way to compress $x$: quantize the measurements $y = \Phi x$, with $\Phi$ and $V$ known to the decoder. Since $\Phi$ spreads the energy of the signal uniformly across the measurements, each measurement should be allocated the same number of bits. The decoder should estimate $x$ as well as it can; we will not limit the computational capability of the decoder.

How well will compressive sampling work? It depends both on how much it matters to use the best basis ($V$) rather than a set of random vectors ($\Phi$) and on how much the quantization of $y$ affects the decoder's ability to infer the correct subspace. We separate these issues, and our results are previewed and summarized in Table 5.1. We will derive the first three entries and then the boxed result, which requires much more explanation. But first we will establish the setting more concretely.

5.2.1 Modeling Assumptions

To reflect the concept that the orthonormal basis $V$ is not used in the sensor/encoder, we model $V$ as random and available only at the estimator/decoder. It is chosen uniformly at random from the set of orthogonal matrices. The source vector $x$ is also random; to model it as $K$-sparse with respect to $V$, we let $x = Vu$, where $u \in \mathbb{R}^N$ has $K$ nonzero entries in positions chosen uniformly at random. As depicted in Figure 5.3, we denote the nonzero entries of $u$ by $u_K \in \mathbb{R}^K$ and let the discrete random variable $J$ represent the sparsity pattern. Note that both $V$ and $\Phi$ can be considered side information available at the decoder but not at the encoder.

Let the components of $u_K$ be independent and Gaussian $\mathcal{N}(0,1)$. Observe that $E\|u\|^2 = K$, and since $V$ is orthogonal we also have $E\|x\|^2 = K$. For the measurement matrix $\Phi$, let the entries be independent $\mathcal{N}(0,1/K)$ and independent of $V$ and $u$. This normalization makes each entry of $y$ have unit variance.
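The modeling assumptions above can be sketched numerically. The following is a minimal illustration, not part of the original analysis: the function names, the dimensions $N = 64$, $K = 8$, $M = 20$, and the use of a sign-corrected QR factorization to draw a uniformly distributed orthogonal matrix are all assumptions made for the example. It draws $V$, $J$, $u$, $x$, $\Phi$, and $y$, then estimates $E\|x\|^2$ and the per-entry variance of $y$ by Monte Carlo.

```python
import numpy as np

def random_orthogonal(n, rng):
    """Draw an n-by-n orthogonal matrix uniformly (Haar measure) via QR."""
    g = rng.standard_normal((n, n))
    q, r = np.linalg.qr(g)
    # Remove the sign ambiguity of QR so the result is uniformly distributed.
    return q * np.sign(np.diag(r))

def draw_source(n, k, m, rng):
    """Return (V, Phi, J, u, x, y) following the modeling assumptions."""
    V = random_orthogonal(n, rng)
    J = np.sort(rng.choice(n, size=k, replace=False))  # sparsity pattern
    u = np.zeros(n)
    u[J] = rng.standard_normal(k)                      # nonzeros u_K ~ N(0, 1)
    x = V @ u                                          # K-sparse with respect to V
    Phi = rng.standard_normal((m, n)) / np.sqrt(k)     # entries ~ N(0, 1/K)
    y = Phi @ x
    return V, Phi, J, u, x, y

rng = np.random.default_rng(0)
N, K, M = 64, 8, 20                                    # illustrative sizes only
V, Phi, J, u, x, y = draw_source(N, K, M, rng)

# Monte Carlo estimates of E||x||^2 (expected: K) and E|y_i|^2 (expected: 1).
trials = 1000
ex, ey = 0.0, 0.0
for _ in range(trials):
    *_, x_t, y_t = draw_source(N, K, M, rng)
    ex += np.sum(x_t ** 2) / trials
    ey += np.mean(y_t ** 2) / trials
print(f"E||x||^2 ~ {ex:.3f} (K = {K}),  E|y_i|^2 ~ {ey:.3f}")
```

The $1/K$ variance of the entries of $\Phi$ is what makes the second estimate land near one: each $y_i$ is a sum of $N$ terms whose total variance is $E\|x\|^2 / K$.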

Let us now establish some notation to describe scalar quantization. When scalar $y_i$ is quantized to yield $\hat{y}_i$, it is convenient to define the relative quantization error $\beta = E|y_i - \hat{y}_i|^2 / E|y_i|^2$ and then further define $\rho = 1 - \beta$ and $v_i = \hat{y}_i - \rho y_i$. These definitions yield a gain-plus-noise notation $\hat{y}_i = \rho y_i + v_i$, where

$$\sigma_v^2 = E|v_i|^2 = \beta(1-\beta)\,E|y_i|^2, \tag{5.1}$$

to describe the effect of quantization. Quantizers with optimal (centroid) decoders result in $v$ being uncorrelated with $y$ [14, Lemma 5.1]; other precise justifications are also possible [15].
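The gain-plus-noise description can be checked on samples. The sketch below is an illustration under stated assumptions rather than the construction used in [14]: it uses a uniform-threshold quantizer with a hypothetical step size of 0.5, and the centroid decoder is the empirical mean of the samples in each cell. With that decoder, the sample versions of $E[v_i y_i] = 0$ and of (5.1) hold up to floating-point error.

```python
import numpy as np

def quantize_centroid(y, step):
    """Uniform-threshold scalar quantizer with an empirical centroid decoder."""
    cells = np.floor(y / step).astype(int)      # encoder: index of the cell
    cells -= cells.min()                        # shift indices to start at 0
    sums = np.bincount(cells, weights=y)        # sum of samples in each cell
    counts = np.bincount(cells)                 # number of samples in each cell
    centroids = sums / np.maximum(counts, 1)    # cell means (empty cells unused)
    return centroids[cells]                     # decoder: output the cell mean

rng = np.random.default_rng(1)
y = rng.standard_normal(200_000)                # unit-variance measurements
y_hat = quantize_centroid(y, step=0.5)          # step size is an arbitrary choice

power = np.mean(y ** 2)
beta = np.mean((y - y_hat) ** 2) / power        # relative quantization error
rho = 1.0 - beta                                # gain
v = y_hat - rho * y                             # additive noise term

corr_vy = np.mean(v * y)                        # should vanish for centroid decoding
sigma_v2 = np.mean(v ** 2)                      # left side of (5.1)
predicted = beta * (1.0 - beta) * power         # right side of (5.1)
print(f"beta = {beta:.4f}, E[v y] = {corr_vy:.2e}")
print(f"sigma_v^2 = {sigma_v2:.6f}, beta(1-beta)E|y|^2 = {predicted:.6f}")
```

The identities hold because a centroid decoder makes the error $y_i - \hat{y}_i$ orthogonal to $\hat{y}_i$; a decoder that outputs cell midpoints instead would satisfy them only approximately.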

FIGURE 5.3

[Block diagram: a source of randomness produces $V$, $J$, and $u_K$; the encoder forms $u$, $x$, and $y$, applies $Q(\cdot)$ to obtain $\hat{y}$, and entropy codes the result into bits; the decoder maps the bits to $\hat{x}$.]

Block diagram representation of the compressive sampling scenario analyzed information-theoretically. $V$ is a random orthogonal matrix; $u$ is a $K$-sparse vector with $\mathcal{N}(0,1)$ nonzero entries, and $\Phi$ is a Gaussian measurement matrix. More specifically, the sparsity pattern of $u$ is represented by $J$, and the nonzero entries are denoted $u_K$. In the initial analysis, the encoding of


In subsequent analyses, we will want to relate $\beta$ to the rate (number of bits) of the quantizer. The exact value of $\beta$ depends not only on the rate $R$, but also on the distribution of $y_i$ and the particular quantization method. However, the scaling of $\beta$ with $R$ is as $2^{-2R}$ under many different scenarios (see the Appendix to this chapter). We will write

$$\beta = c\,2^{-2R} \tag{5.2}$$

without repeatedly specifying the constant $c \geq 1$.

With the established notation, the overall quantizer output vector can be written as

$$\hat{y} = \rho\Phi V u + v = Au + v, \tag{5.3}$$

where $A = \rho\Phi V$. The overall source coding and decoding process, with the gain-plus-noise representation for quantization, is depicted in Figure 5.4. Our use of (5.3) is to enable easy analysis of linear estimation of $x$ from $\hat{y}$.

5.2.2 Analyses

Since the sparsity level $K$ is the inherent number of degrees of freedom in the signal, we will let there be $KR$ bits available for the encoding of $x$ and also normalize the distortion by $K$: $D = \frac{1}{K} E\|x - \hat{x}\|^2$. Where applicable, the number of measurements $M$ is a design parameter that can be optimized to give the best distortion-rate trade-off. In particular, increasing $M$ gives better conditioning of certain matrices, but it reduces the number of quantization bits per measurement.

Before analyzing the compressive sampling scenario (Figure 5.3), we consider some simpler alternatives, yielding the first three entries in Table 5.1.

5.2.2.1 Signal in a Known Subspace

If the sparsifying basis $V$ and subspace $J$ are fixed and known to both (centralized) encoder and decoder, the communication of $x$ can be accomplished by sending quantized versions of the nonzero entries of $V^{-1}x$. Each of the $K$ nonzero entries has unit variance and is allotted $R$ bits, so $D(R) = c\,2^{-2R}$ performance is obtained, as given by the first entry in Table 5.1.

5.2.2.2 Adaptive Encoding with Communication of J

FIGURE 5.4

[Signal chain: $u$ → $V$ → $x = Vu$ → measurement $\Phi$ → $y = \Phi x$ → quantization → $\hat{y} = \rho y + v$ → decoding → $\hat{x}$.]

Now suppose that $V$ is known to both encoder and decoder, but the subspace index $J$ is random, uniformly selected from the $\binom{N}{K}$ possibilities. A natural adaptive (and centralized) approach is to spend $\log_2\binom{N}{K}$ bits to communicate $J$ and the remaining available bits to quantize the nonzero entries of $V^{-1}x$. Defining $R_0 = \frac{1}{K}\log_2\binom{N}{K}$, the encoder has $KR - KR_0$ bits for the $K$ nonzero entries of $V^{-1}x$ and thus attains performance

$$D_{\text{adaptive}}(R) = c\,2^{-2(R-R_0)}, \qquad R \geq R_0. \tag{5.4}$$

When $K$ and $N$ are large with the ratio $\alpha = K/N$ held constant, $\log_2\binom{N}{K} \approx N H(\alpha)$, where $H(p) = -p\log_2 p - (1-p)\log_2(1-p)$ is the binary entropy function [16, p. 530]. Thus $R_0 \approx H(\alpha)/\alpha$, giving a second entry in Table 5.1.
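The quality of the approximation $R_0 \approx H(\alpha)/\alpha$ is easy to check numerically. The sketch below is illustrative only; the function names and the choice $\alpha = 0.1$ are assumptions made for the example. It compares the exact value $\frac{1}{K}\log_2\binom{N}{K}$ with the entropy approximation as $N$ grows.

```python
import math

def binary_entropy(p):
    """H(p) in bits, with H(0) = H(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def pattern_rate_exact(n, k):
    """R0 = (1/K) log2 C(N, K): bits per nonzero spent on describing J."""
    return math.log2(math.comb(n, k)) / k

def pattern_rate_approx(alpha):
    """Large-N approximation R0 ~ H(alpha) / alpha."""
    return binary_entropy(alpha) / alpha

alpha = 0.1                       # illustrative sparsity ratio
rows = []
for n in (100, 1000, 10000):
    k = round(alpha * n)
    rows.append((n, k, pattern_rate_exact(n, k), pattern_rate_approx(alpha)))

for n, k, exact, approx in rows:
    print(f"N={n:6d} K={k:5d}  exact R0={exact:.4f}  H(alpha)/alpha={approx:.4f}")
```

The exact value always sits slightly below the approximation, since $\binom{N}{K} \leq 2^{N H(K/N)}$, and the gap shrinks as $N$ grows.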

If $R$ does not exceed $R_0$, then the derivation above does not make sense, and even if $R$ exceeds $R_0$ by a small amount, it may not pay to communicate $J$. A direct approach is to simply quantize each component of $x$ with $KR/N$ bits. Since the components of $x$ have variance $K/N$, performance of $E(x_i - \hat{x}_i)^2 \leq c\,(K/N)\,2^{-2KR/N}$ can be obtained, yielding overall performance

$$D_{\text{direct}}(R) = c\,2^{-2KR/N}. \tag{5.5}$$

By choosing the better between (5.4) and (5.5) for a given rate, one obtains a simple baseline for the performance using $V$ at the encoder. A convexification by time sharing could also be applied, and more sophisticated techniques are presented in [17].
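The baseline formed from (5.4) and (5.5) can be sketched as follows. This is a minimal illustration, not the chapter's own code: the function names are hypothetical, the same constant $c = 1$ is assumed for both schemes, and $R_0$ is replaced by its large-$N$ approximation $H(\alpha)/\alpha$. Time sharing between the two schemes is not modeled.

```python
import math

def binary_entropy(p):
    """H(p) in bits for 0 < p < 1."""
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def d_adaptive(rate, alpha, c=1.0):
    """Equation (5.4); only defined for rate >= R0, so return inf below that."""
    r0 = binary_entropy(alpha) / alpha          # large-N approximation of R0
    if rate < r0:
        return math.inf
    return c * 2.0 ** (-2.0 * (rate - r0))

def d_direct(rate, alpha, c=1.0):
    """Equation (5.5): each of the N components gets K*R/N = alpha*R bits."""
    return c * 2.0 ** (-2.0 * alpha * rate)

def d_baseline(rate, alpha, c=1.0):
    """Pick the better of the two schemes at the given rate."""
    return min(d_adaptive(rate, alpha, c), d_direct(rate, alpha, c))

alpha = 0.1                                     # illustrative sparsity ratio
for rate in (2.0, 5.0, 8.0):
    print(f"R={rate:4.1f}  adaptive={d_adaptive(rate, alpha):.4g}  "
          f"direct={d_direct(rate, alpha):.4g}  baseline={d_baseline(rate, alpha):.4g}")
```

Under these assumptions the two curves cross where $R - R_0 = \alpha R$, that is, at $R = R_0/(1-\alpha)$; below that rate the direct scheme is the better choice even when $R > R_0$.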

5.2.2.3 Loss from Random Measurements

Now let us try to understand in isolation the effect of observing $x$ only through $\Phi x$. The encoder sends a quantized version of $y = \Phi x$, and the decoder knows $V$ and the sparsity pattern $J$.

From Equation (5.3), the decoder has $\hat{y} = \rho\Phi V u + v$ and knows which $K$ elements of $u$ are nonzero. The performance of a linear estimate of the form $\hat{x} = F(J)\hat{y}$ will depend on the singular values of the $M$-by-$K$ matrix formed by the $K$ relevant columns of $\Phi V$.¹ Using elementary results from random matrix theory, one can find how the distortion varies with $M$ and $R$. (The distortion does not depend on $N$ because the zero components of $u$ are known.) The analysis given in [21] shows that for moderate to high $R$, the distortion is minimized when $K/M \approx 1 - ((2\ln 2)R)^{-1}$. Choosing the number of measurements accordingly gives performance

$$D_J(R) \approx 2(\ln 2)eR \cdot c\,2^{-2R} = c\,2^{-2(R-R^*)}, \tag{5.6}$$

where $R^* = \frac{1}{2}\log_2(2(\ln 2)eR)$, giving the third entry in Table 5.1. Comparing to $c\,2^{-2R}$, we see that having access only to separately quantized random measurements induces a significant performance loss.
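The quantities in (5.6) are simple to tabulate. The sketch below is illustrative; the function names are hypothetical and $c = 1$ is an assumption. It evaluates the measurement ratio $K/M$ quoted from [21], the rotational loss $R^*$, and the resulting distortion for a few rates.

```python
import math

def optimal_k_over_m(rate):
    """Distortion-minimizing ratio K/M ~ 1 - 1/((2 ln 2) R), moderate to high R."""
    return 1.0 - 1.0 / (2.0 * math.log(2.0) * rate)

def rotational_loss(rate):
    """R* = (1/2) log2(2 (ln 2) e R), in bits per degree of freedom."""
    return 0.5 * math.log2(2.0 * math.log(2.0) * math.e * rate)

def d_known_pattern(rate, c=1.0):
    """Equation (5.6): D_J(R) = c 2^{-2 (R - R*)}."""
    return c * 2.0 ** (-2.0 * (rate - rotational_loss(rate)))

for rate in (2.0, 4.0, 8.0):
    print(f"R={rate:4.1f}  K/M={optimal_k_over_m(rate):.3f}  "
          f"R*={rotational_loss(rate):.3f}  D_J={d_known_pattern(rate):.3e}  "
          f"c*2^(-2R)={2.0 ** (-2.0 * rate):.3e}")
```

Because $R^*$ depends on $R$ only through a logarithm, doubling the rate raises the loss by exactly half a bit, which is the sense in which $R^* = O(\log R)$.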

One interpretation of this analysis is that the coding rate has effectively been reduced by $R^*$ bits per degree of freedom. Since $R^*$ grows sublinearly with $R$, the situation is not too bad; at least the performance does not degrade with increasing $K$ or $N$. The analysis when $J$ is not known at the decoder (that is, when it must be inferred from $\hat{y}$) reveals a much worse situation.

¹ One should expect a small improvement (roughly a multiplication of the distortion by $K/M$) from the use of a nonlinear estimate that exploits the boundedness of quantization noise [18, 19]. The dependence on $\Phi V$ is roughly unchanged [20].

5.2.2.4 Loss from Sparsity Recovery

As we have mentioned before, compressive sampling is motivated by the idea that the sparsity pattern $J$ can be detected, through a computationally tractable convex optimization, with a “small” number of measurements $M$. However, the number of measurements required depends on the noise level. We saw that $M \sim 2K\log(N-K) + K$ scaling is required by lasso reconstruction; if the noise is from quantization and we are trying to code with $KR$ total bits, this scaling leads to a vanishing number of bits per measurement.
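A few lines of arithmetic make the vanishing bit budget concrete. The sketch below is an illustration under assumptions: the logarithm in the lasso scaling is taken to be natural (the text writes only "log"), the scaling is treated as an exact measurement count, and the values $\alpha = 0.1$ and $R = 8$ are arbitrary.

```python
import math

def lasso_measurements(n, k):
    """Measurement count M = 2 K log(N - K) + K (natural log assumed)."""
    return 2.0 * k * math.log(n - k) + k

def bits_per_measurement(n, alpha, rate):
    """Spread K*R total bits evenly over the M measurements."""
    k = round(alpha * n)
    return k * rate / lasso_measurements(n, k)

alpha, rate = 0.1, 8.0            # illustrative sparsity ratio and rate
budget = [(n, bits_per_measurement(n, alpha, rate))
          for n in (10**2, 10**3, 10**4, 10**5, 10**6)]
for n, bits in budget:
    print(f"N={n:8d}  bits per measurement = {bits:.3f}")
```

Since $KR/M = R/(2\log(N-K) + 1)$, the per-measurement budget falls like $1/\log N$ for any fixed $R$, however large.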

Unfortunately, the problem is more fundamental than the suboptimality of lasso decoding. We will show that trying to code with $KR$ total bits makes reliable recovery of the sparsity pattern impossible as the signal dimension $N$ increases. In this analysis, we assume that the sparsity ratio $\alpha = K/N$ is held constant as the problems scale, and we see that no number of measurements $M$ can give good performance.

To see why the sparsity pattern cannot be recovered, consider the problem of estimating the sparsity pattern of $u$ from the noisy measurement $\hat{y}$ in (5.3). Let $E_{\text{signal}} = E\|Au\|^2$ and $E_{\text{noise}} = E\|v\|^2$ be the signal and noise energies, respectively, and define the SNR as $\mathrm{SNR} = E_{\text{signal}}/E_{\text{noise}}$. The number of measurements $M$ required to recover the sparsity pattern of $u$ from $\hat{y}$ can be bounded below with the following theorem.
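The SNR defined here can be related to the quantizer parameters already introduced. The following is a sketch derived from the section's own definitions rather than quoted from the source: with unit-variance measurements, $E\|Au\|^2 = \rho^2 M$ and, by (5.1), $E\|v\|^2 = M\beta(1-\beta)$, so $\mathrm{SNR} = (1-\beta)/\beta$. The function names, the choice $c = 1$ in (5.2), and the even split of $KR$ bits over $M$ measurements are assumptions made for the example.

```python
def quantization_snr(beta):
    """SNR = E||Au||^2 / E||v||^2 = (1 - beta) / beta in the gain-plus-noise model."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return (1.0 - beta) / beta

def snr_for_budget(k, m, rate, c=1.0):
    """SNR when K*R total bits are split evenly over M measurements."""
    bits = k * rate / m                       # bits per measurement
    beta = c * 2.0 ** (-2.0 * bits)           # equation (5.2) at that rate
    return quantization_snr(beta)

k, rate = 10, 4.0                             # illustrative values only
sweep = [(m, snr_for_budget(k, m, rate)) for m in (10, 20, 40, 80)]
for m, snr in sweep:
    print(f"M={m:3d}  bits/measurement={k * rate / m:.2f}  SNR={snr:.1f}")
```

Under these assumptions, adding measurements with a fixed total bit budget lowers the SNR of each one, which is the tension the theorem that follows makes precise.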

Theorem 5.1. Consider any estimator for recovering the sparsity pattern of a K-sparse
