5.2 Compressive Sampling as Distributed Source Coding
FIGURE 5.2
Left: Depiction of Gaussian random vectors as an ellipsoid. Classical rate-distortion theory and transform coding results are for this sort of source, which serves as a good model for discrete cosine transform (DCT) coefficients of an image or modified discrete cosine transform (MDCT) coefficients of audio. Right: Depiction of 2-sparse signals in $\mathbb{R}^3$, which form a union of three subspaces. This serves as a good conceptual model for wavelet coefficients of images.
Table 5.1 Performance Summary: Distortions for Several Scenarios When $N$ Is Large with $\alpha = K/N$ Held Constant. Rate $R$ and Distortion $D$ Are Both Normalized by $K$. $J$ Represents the Sparsity Pattern of $u = V^T x$. The Boxed Entry Is a Heuristic Analysis of the Compressive Sampling Case. $H(\cdot)$ Represents the Binary Entropy Function, and the Rotational Loss $R^*$ Satisfies $R^* = O(\log R)$.

Decoder                 Encoder: Centralized                 Encoder: Distributed
                        (code $J$ and nonzeros of $u$)       (scalar coding of $\Phi x$)
Knows $J$ a priori      $c\,2^{-2R}$                         $c\,2^{-2(R-R^*)}$
Is told $J$             $c\,2^{-2(R-H(\alpha)/\alpha)}$      $c\,2^{-2(R-H(\alpha)/\alpha-R^*)}$
Infers $J$              N/A                                  $\boxed{c\delta(\log N)2^{-2\delta R}}$
index, denoted $J$, which will be detailed later. With all the proper accounting, when $K \ll N$, the savings are more dramatic than just a constant number of bits.
Following the compressive sampling framework, one obtains a rather different way to compress $x$: quantize the measurements $y = \Phi x$, with $\Phi$ and $V$ known to the decoder. Since $\Phi$ spreads the energy of the signal uniformly across the measurements, each measurement should be allocated the same number of bits. The decoder should estimate $x$ as well as it can; we will not limit the computational capability of the decoder.
How well will compressive sampling work? It depends both on how much it matters to use the best basis ($V$) rather than a set of random vectors ($\Phi$) and on how much the quantization of $y$ affects the decoder's ability to infer the correct subspace. We separate these issues, and our results are previewed and summarized in Table 5.1. We will derive the first three entries and then the boxed result, which requires much more explanation. But first we will establish the setting more concretely.
5.2.1 Modeling Assumptions
To reflect the concept that the orthonormal basis $V$ is not used in the sensor/encoder, we model $V$ as random and available only at the estimator/decoder. It is chosen uniformly at random from the set of orthogonal matrices. The source vector $x$ is also random; to model it as $K$-sparse with respect to $V$, we let $x = Vu$ where $u \in \mathbb{R}^N$ has $K$ nonzero entries in positions chosen uniformly at random. As depicted in Figure 5.3, we denote the nonzero entries of $u$ by $u_K \in \mathbb{R}^K$ and let the discrete random variable $J$ represent the sparsity pattern. Note that both $V$ and $\Phi$ can be considered side information available at the decoder but not at the encoder.
Let the components of $u_K$ be independent and Gaussian $\mathcal{N}(0,1)$. Observe that $E\|u\|^2 = K$, and since $V$ is orthogonal we also have $E\|x\|^2 = K$. For the measurement matrix $\Phi$, let the entries be independent $\mathcal{N}(0,1/K)$ and independent of $V$ and $u$. This normalization makes each entry of $y$ have unit variance, since $E|y_i|^2 = \sum_j E[\Phi_{ij}^2]\,E[x_j^2] = E\|x\|^2/K = 1$.
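The modeling assumptions above can be simulated directly. The following is a minimal sketch, assuming NumPy; the function name `draw_model`, the problem sizes, and the QR-based construction of a uniformly distributed orthogonal matrix are illustrative choices, not something specified in the text.

```python
import numpy as np

def draw_model(N, K, M, rng):
    """Draw one realization (V, u, x, Phi, y) of the model described above."""
    # Random orthogonal V: QR of a Gaussian matrix, with a sign correction
    # on the columns so that the result is uniformly (Haar) distributed.
    G = rng.standard_normal((N, N))
    Q, Rm = np.linalg.qr(G)
    V = Q * np.sign(np.diag(Rm))

    # K-sparse u: support J chosen uniformly at random, N(0,1) nonzeros.
    J = rng.choice(N, size=K, replace=False)
    u = np.zeros(N)
    u[J] = rng.standard_normal(K)

    x = V @ u
    # Measurement matrix with independent N(0, 1/K) entries.
    Phi = rng.standard_normal((M, N)) / np.sqrt(K)
    y = Phi @ x
    return V, u, x, Phi, y

rng = np.random.default_rng(0)
N, K, M = 64, 8, 32            # hypothetical sizes, chosen only for illustration
V, u, x, Phi, y = draw_model(N, K, M, rng)

# Monte Carlo check that the entries of y have roughly unit variance.
samples = np.concatenate([draw_model(N, K, M, rng)[4] for _ in range(200)])
var_y = samples.var()
print("empirical variance of y entries:", var_y)
```

Because $V$ is orthogonal, $\|x\| = \|u\|$ holds for every realization, while the unit variance of the entries of $y$ holds only on average over draws.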
Let us now establish some notation to describe scalar quantization. When scalar $y_i$ is quantized to yield $\hat{y}_i$, it is convenient to define the relative quantization error $\beta = E|y_i - \hat{y}_i|^2 / E|y_i|^2$ and then further define $\rho = 1 - \beta$ and $v_i = \hat{y}_i - \rho y_i$. These definitions yield a gain-plus-noise notation $\hat{y}_i = \rho y_i + v_i$, where
$$\sigma_v^2 = E|v_i|^2 = \beta(1-\beta)\,E|y_i|^2, \tag{5.1}$$
to describe the effect of quantization. Quantizers with optimal (centroid) decoders result in $v$ being uncorrelated with $y$ [14, Lemma 5.1]; other precise justifications are also possible [15].
FIGURE 5.3
[Block diagram: source of randomness ($V$, $J$, $u_K$) → encoder ($u \to x \to y$, $Q(\cdot)$, entropy coding) → bits → decoder → $\hat{x}$.]
Block diagram representation of the compressive sampling scenario analyzed information-theoretically. $V$ is a random orthogonal matrix; $u$ is a $K$-sparse vector with $\mathcal{N}(0,1)$ nonzero entries, and $\Phi$ is a Gaussian measurement matrix. More specifically, the sparsity pattern of $u$ is represented by $J$, and the nonzero entries are denoted $u_K$. In the initial analysis, the encoding of
In subsequent analyses, we will want to relate the relative quantization error $\beta$ to the rate (number of bits) of the quantizer. The exact value of $\beta$ depends not only on the rate $R$, but also on the distribution of $y_i$ and the particular quantization method. However, $\beta$ scales with $R$ as $2^{-2R}$ under many different scenarios (see the Appendix to this chapter). We will write
$$\beta = c\,2^{-2R} \tag{5.2}$$
without repeatedly specifying the constant $c \ge 1$.
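The gain-plus-noise description of quantization can be checked numerically. The sketch below writes $\beta$ for the relative quantization error and $\rho = 1 - \beta$ for the gain (symbol names assumed here), and uses uniformly spaced thresholds with an empirical centroid decoder; the threshold range $[-4, 4]$, the sample size, and the function name `centroid_quantize` are illustrative assumptions. With centroid decoding, the sample correlation between $v$ and $y$ should vanish and the noise variance should match $\beta(1-\beta)E|y_i|^2$.

```python
import numpy as np

def centroid_quantize(y, bits):
    """Quantize y with 2**bits cells and decode each cell to its sample mean."""
    levels = 2 ** bits
    # Interior thresholds, uniform on [-4, 4]; the two outer cells are unbounded.
    edges = np.linspace(-4.0, 4.0, levels + 1)[1:-1]
    cell = np.digitize(y, edges)            # cell index in 0 .. levels-1
    y_hat = np.empty_like(y)
    for k in range(levels):
        mask = cell == k
        if mask.any():
            y_hat[mask] = y[mask].mean()    # empirical centroid decoder
    return y_hat

rng = np.random.default_rng(1)
y = rng.standard_normal(200_000)            # unit-variance measurements

results = []
for bits in (1, 2, 3, 4):
    y_hat = centroid_quantize(y, bits)
    beta = np.mean((y - y_hat) ** 2) / np.mean(y ** 2)   # relative error
    rho = 1.0 - beta                                      # gain
    v = y_hat - rho * y                                   # additive noise term
    results.append({
        "bits": bits,
        "beta": beta,
        "corr_vy": np.mean(v * y),                        # expected to be ~0
        "sigma_v2": np.mean(v ** 2),
        "predicted": beta * (1.0 - beta) * np.mean(y ** 2),
    })

for r in results:
    print(r)
```

The uniform thresholds are not optimized for the Gaussian source, so the printed values of $\beta$ illustrate the trend with the number of bits rather than any particular constant $c$.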
With the established notation, the overall quantizer output vector can be written as
$$\hat{y} = \rho\Phi V u + v = \rho A u + v, \tag{5.3}$$
where $A = \Phi V$ and $\rho$ is the gain of the gain-plus-noise representation. The overall source coding and decoding process, with the gain-plus-noise representation for quantization, is depicted in Figure 5.4. We use (5.3) to enable easy analysis of linear estimation of $x$ from $\hat{y}$.
5.2.2 Analyses
Since the sparsity level $K$ is the inherent number of degrees of freedom in the signal, we will let there be $KR$ bits available for the encoding of $x$ and also normalize the distortion by $K$: $D = \frac{1}{K} E\|x - \hat{x}\|^2$. Where applicable, the number of measurements $M$ is a design parameter that can be optimized to give the best distortion-rate trade-off. In particular, increasing $M$ gives better conditioning of certain matrices, but it reduces the number of quantization bits per measurement.
Before analyzing the compressive sampling scenario (Figure 5.3), we consider some simpler alternatives, yielding the first three entries in Table 5.1.
5.2.2.1 Signal in a Known Subspace
If the sparsifying basis $V$ and subspace $J$ are fixed and known to both (centralized) encoder and decoder, the communication of $x$ can be accomplished by sending quantized versions of the nonzero entries of $V^{-1}x$. Each of the $K$ nonzero entries has unit variance and is allotted $R$ bits, so $D(R) = c\,2^{-2R}$ performance is obtained, as given by the first entry in Table 5.1.
5.2.2.2 Adaptive Encoding with Communication of $J$
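Now suppose that $V$ is known to both encoder and decoder, but the subspace index $J$ is random, uniformly selected from the $\binom{N}{K}$ possibilities. A natural adaptive (and centralized) approach is to spend $\log_2\binom{N}{K}$ bits to communicate $J$ and the remaining available bits to quantize the nonzero entries of $V^{-1}x$. Defining $R_0 = \frac{1}{K}\log_2\binom{N}{K}$, the encoder has $KR - KR_0$ bits for the $K$ nonzero entries of $V^{-1}x$ and thus attains performance
$$D_{\text{adaptive}}(R) = c\,2^{-2(R-R_0)}, \quad R \ge R_0. \tag{5.4}$$
When $K$ and $N$ are large with the ratio $\alpha = K/N$ held constant, $\log_2\binom{N}{K} \approx N H(\alpha)$, where $H(p) = -p\log_2 p - (1-p)\log_2(1-p)$ is the binary entropy function [16, p. 530]. Thus $R_0 \approx H(\alpha)/\alpha$, giving a second entry in Table 5.1.
FIGURE 5.4
[Block diagram of the source coding and decoding process with the gain-plus-noise quantizer model: $u \to V \to x = Vu \to$ measurement $\Phi \to y = \Phi x \to$ quantization $\to \hat{y} = \rho y + v \to$ decoding $\to \hat{x}$.]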
If $R$ does not exceed $R_0$, then the derivation above does not make sense, and even if $R$ exceeds $R_0$ by a small amount, it may not pay to communicate $J$. A direct approach is to simply quantize each component of $x$ with $KR/N$ bits. Since the components of $x$ have variance $K/N$, performance of $E(x_i - \hat{x}_i)^2 \le c\,(K/N)\,2^{-2KR/N}$ can be obtained, yielding overall performance
$$D_{\text{direct}}(R) = c\,2^{-2KR/N}. \tag{5.5}$$
By choosing the better between (5.4) and (5.5) for a given rate, one obtains a simple baseline for the performance using $V$ at the encoder. A convexification by time sharing could also be applied, and more sophisticated techniques are presented in [17].
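To make the baseline concrete, the following sketch evaluates $R_0$ exactly and through the entropy approximation $H(\alpha)/\alpha$, and then picks the better of (5.4) and (5.5) at a given rate. The function names, the choice $c = 1$ for both schemes, and the values of $N$, $K$, and $R$ are assumptions made only for illustration.

```python
import math

def binary_entropy(p):
    """Binary entropy H(p) in bits, for 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def r0_exact(N, K):
    """R0 = (1/K) log2 C(N, K), computed via lgamma to avoid huge integers."""
    log_binom = math.lgamma(N + 1) - math.lgamma(K + 1) - math.lgamma(N - K + 1)
    return log_binom / (K * math.log(2))

def d_adaptive(R, R0, c=1.0):
    """Equation (5.4); only defined for R >= R0, so return infinity otherwise."""
    return c * 2.0 ** (-2.0 * (R - R0)) if R >= R0 else math.inf

def d_direct(R, alpha, c=1.0):
    """Equation (5.5) with K/N = alpha."""
    return c * 2.0 ** (-2.0 * alpha * R)

def d_baseline(R, N, K, c=1.0):
    """Better of the adaptive and direct schemes at rate R."""
    return min(d_adaptive(R, r0_exact(N, K), c), d_direct(R, K / N, c))

N, K = 10_000, 1_000          # hypothetical sizes, alpha = 0.1
alpha = K / N
R0 = r0_exact(N, K)
approx = binary_entropy(alpha) / alpha
print(f"R0 exact = {R0:.4f}, H(alpha)/alpha = {approx:.4f}")
for R in (3.0, 8.0):
    print(f"R = {R}: baseline distortion = {d_baseline(R, N, K):.4g}")
```

Under these assumptions the direct scheme wins at low rates and the adaptive scheme wins once $R$ exceeds $R_0/(1-\alpha)$, which follows from equating the exponents of (5.4) and (5.5).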
5.2.2.3 Loss from Random Measurements
Now let us try to understand in isolation the effect of observing $x$ only through $\Phi x$. The encoder sends a quantized version of $y = \Phi x$, and the decoder knows $V$ and the sparsity pattern $J$.
From Equation (5.3), the decoder has $\hat{y} = \rho\Phi V u + v$ and knows which $K$ elements of $u$ are nonzero. The performance of a linear estimate of the form $\hat{x} = F(J)\hat{y}$ will depend on the singular values of the $M$-by-$K$ matrix formed by the $K$ relevant columns of $\Phi V$.$^1$ Using elementary results from random matrix theory, one can find how the distortion varies with $M$ and $R$. (The distortion does not depend on $N$ because the zero components of $u$ are known.) The analysis given in [21] shows that for moderate to high $R$, the distortion is minimized when $K/M \approx 1 - ((2\ln 2)R)^{-1}$. Choosing the number of measurements accordingly gives performance
$$D_J(R) \approx 2(\ln 2)\,eR \cdot c\,2^{-2R} = c\,2^{-2(R-R^*)} \tag{5.6}$$
where $R^* = \frac{1}{2}\log_2(2(\ln 2)\,eR)$, giving the third entry in Table 5.1. Comparing to $c\,2^{-2R}$, we see that having access only to separately quantized random measurements induces a significant performance loss.
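The quantities in (5.6) are easy to tabulate. The sketch below evaluates the rotational loss $R^*$, the distortion-minimizing ratio $K/M$, and the resulting distortion for a few rates; the function names and the choice $c = 1$ are illustrative assumptions, and the formulas are only meaningful for moderate to high $R$, as stated above.

```python
import math

def rotational_loss(R):
    """R* = (1/2) log2(2 (ln 2) e R)."""
    return 0.5 * math.log2(2.0 * math.log(2.0) * math.e * R)

def optimal_ratio(R):
    """Distortion-minimizing K/M, approximately 1 - 1/((2 ln 2) R)."""
    return 1.0 - 1.0 / (2.0 * math.log(2.0) * R)

def d_known_pattern(R, c=1.0):
    """Right-hand side of (5.6): c * 2^(-2 (R - R*))."""
    return c * 2.0 ** (-2.0 * (R - rotational_loss(R)))

for R in (2.0, 4.0, 8.0):
    print(f"R = {R}: R* = {rotational_loss(R):.3f} bits, "
          f"K/M = {optimal_ratio(R):.3f}, "
          f"D_J = {d_known_pattern(R):.3e}, "
          f"ideal = {2.0 ** (-2.0 * R):.3e}")
```

Comparing the last two columns shows the multiplicative penalty $2(\ln 2)eR$ relative to $c\,2^{-2R}$, which grows only linearly in $R$.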
One interpretation of this analysis is that the coding rate has effectively been reduced by $R^*$ bits per degree of freedom. Since $R^*$ grows sublinearly with $R$, the situation is not too bad: at least the performance does not degrade with increasing $K$ or $N$. The analysis when $J$ is not known at the decoder (that is, when it must be inferred from $\hat{y}$) reveals a much worse situation.
$^1$ One should expect a small improvement (roughly a multiplication of the distortion by $K/M$) from the use of a nonlinear estimate that exploits the boundedness of quantization noise [18, 19]. The dependence on $\Phi V$ is roughly unchanged [20].
5.2.2.4 Loss from Sparsity Recovery
As we have mentioned before, compressive sampling is motivated by the idea that the sparsity pattern $J$ can be detected, through a computationally tractable convex optimization, with a "small" number of measurements $M$. However, the number of measurements required depends on the noise level. We saw that $M \sim 2K\log(N-K) + K$ scaling is required by lasso reconstruction; if the noise is from quantization and we are trying to code with $KR$ total bits, this scaling leads to a vanishing number of bits per measurement.
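The vanishing bit budget can be seen with a short calculation. The sketch below splits $KR$ bits evenly over $M = 2K\log(N-K) + K$ measurements while $N$ grows with $\alpha = K/N$ fixed; treating the scaling as an equality, reading the logarithm as natural, and the particular values of $\alpha$ and $R$ are all assumptions made here for illustration.

```python
import math

def bits_per_measurement(N, alpha, R):
    """Bits per measurement when K*R bits are split over M = 2K log(N-K) + K."""
    K = round(alpha * N)
    M = 2 * K * math.log(N - K) + K     # natural logarithm assumed
    return K * R / M

alpha, R = 0.1, 8.0                      # hypothetical sparsity ratio and rate
sizes = [10 ** p for p in range(2, 8)]
rates = [bits_per_measurement(N, alpha, R) for N in sizes]
for N, b in zip(sizes, rates):
    print(f"N = {N:>8d}: {b:.3f} bits per measurement")
```

Since $KR/M = R/(2\log(N-K) + 1)$, the per-measurement budget decays like $1/\log N$ regardless of how large $R$ is.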
Unfortunately, the problem is more fundamental than the suboptimality of lasso decoding. We will show that trying to code with $KR$ total bits makes reliable recovery of the sparsity pattern impossible as the signal dimension $N$ increases. In this analysis, we assume that the sparsity ratio $\alpha = K/N$ is held constant as the problems scale, and we see that no number of measurements $M$ can give good performance.
To see why the sparsity pattern cannot be recovered, consider the problem of estimating the sparsity pattern of $u$ from the noisy measurement $\hat{y}$ in (5.3). Let $E_{\text{signal}} = E\|Au\|^2$ and $E_{\text{noise}} = E\|v\|^2$ be the signal and noise energies, respectively, and define the SNR as $\text{SNR} = E_{\text{signal}}/E_{\text{noise}}$. The number of measurements $M$ required to recover the sparsity pattern of $u$ from $\hat{y}$ can be bounded below with the following theorem.
Theorem 5.1. Consider any estimator for recovering the sparsity pattern of a K-sparse