Asymptotic Performance of Sparse Signal Detection Using Convex Programming Method

(1)

Contents lists available at ScienceDirect

Chinese Journal of Aeronautics

journal homepage: www.elsevier.com/locate/cja

Asymptotic Performance of Sparse Signal Detection

Using Convex Programming Method

LEI Chuan

a,b

, ZHANG Jun

a,b,

*

a_{School of Electronics and Information Engineering, Beihang University, Beijing 100191, China} b_{National Key Laboratory of CNS/ATM, Beijing 100191, China}

Received 1 July 2011; revised 9 October 2011; accepted 17 April 2012

Abstract

The detection of sparse signals against background noise is considered. Detecting signals of such kind is difficult since only a small portion of the signal carries information. Prior knowledge is usually assumed to ease detection. In this paper, we consider the general unknown and arbitrary sparse signal detection problem when no prior knowledge is available. Under a Ney-man-Pearson hypothesis-testing framework, a new detection scheme is proposed by combining a generalized likelihood ratio test (GLRT)-like test statistic and convex programming methods which directly exploit sparsity in an underdetermined system of linear equations. We characterize large sample behavior of the proposed method by analyzing its asymptotic performance. Spe-cifically, we give the condition for the Chernoff-consistent detection which shows that the proposed method is very sensitive to theA2norm energy of the sparse signals. Both the false alarm rate and the miss rate tend to zero at vanishing signal-to-noise ratio

(SNR), as long as the signal energy grows at least logarithmically with the problem dimension. Next we give a large deviation analysis to characterize the error exponent for the Neyman-Pearson detection. We derive the oracle error exponent assuming signal knowledge. Then we explicitly derive the error exponent of the proposed scheme and compare it with the oracle exponent. We complement our study with numerical experiments, showing that the proposed method performs in the vicinity of the likeli-hood ratio test (LRT) method in the finite sample scenario and the error probability degrades exponentially with the number of observations.

Keywords: signal detection; convex programming; asymptotic analysis; signal reconstruction; sparse signals

1. Introduction1

In this paper we consider the sparse signal detection problem against background noise. In the application where a network of sensors [1] or meters [2] is deployed to monitor certain parameter in a geometrical area, we want to detect if there is an abnormal event from

*Corresponding author. Tel.: +86-10-82339684.

E-mail address: [email protected]

Foundation items: National Basic Research Program of China (2011CB707000); Innovative Research Group Na-tional Natural Science Foundation of China (60921001)

measurements of the sensors or meters when the ma-jority of sensors/meters are not affected. The detection of such sparse pattern of abnormal events against joint no-alarm state is crucial for the early warning of ab-normality.

In other applications such as acoustic signal proc-essing, a sparse signal recording with contiguous large entries may suggest a rapid change in a short duration of time. The detection of such “transient” events is of particular interest [3-6]. In addition, this general problem also appears in the applications of genomics [7] and image processing [8].

These sparse signals (either subtle or rapid) are dif-ficult to be captured because the useful information is contained in the unknown small fraction of the signal

(2)

entries. Various prior assumptions therefore are made to ease detection in the existing approaches. However, in most situations the detectors need to identify targets buried in their surroundings using no other a priori information beyond the fact that the targets are rare (sparse). Thus, in this paper we are interested in de-tecting sparse signals with unknown support sets and arbitrary non-zero entries. It is our goal to perform sparse signal detection in an unsupervised setting by preserving the sparse targets and reducing the effect of background noise.

It is well known that with the exact knowledge of the signal, the likelihood ratio test (LRT) gives optimal test performance in a Neyman-Pearson framework by maximizing the detection rate. However, the LRT be-comes inapplicable without a precise specification of the sparse signal.

Another classical approach, which is most com-monly used in the composite hypothesis testing, is the generalized likelihood ratio test (GLRT) (see Trees [9] and Kay [10]). In the GLRT approach, the idea is to im-plement an LRT with the unknown parameters being replaced by their maximum likelihood (ML) estimates. However, the cost of computing the GLRT could be enormous unless the sets of possible values of the un-known parameters are moderate. Although in some situations GLRT is asymptotically optimal (e.g. the work of Zeitouni, et al. [11] for a Neyman-Pearson-like setting), the assumption that the parameter sets under the two hypotheses form a partition of the parameter space is crucial to the optimality of the GLRT [11-12]. Obviously this assumption does not hold here. In addi-tion, Bickel and Chernoff [13] and Hartigan [14] have shown that the GLRT has nonstandard behavior in the setting of sparse signal detection, where the maximized generalized likelihood ratio tends to infinity under the null hypothesis.

Recently the sparse detection problem is also studied in the settings of sparse normal means testing and non-Gaussian signatures detection. A characteristic of this line of approach is that it tests for a global null against a sparse alternative and does not identify which entries are non-zero when the alternative is true. In these studies, researchers have turned to performance metrics in an asymptotic sense. In particular, by as-suming a Gaussian mixture model, Ingster [15] studied the asymptotic detection problem where the sparse mean vector has equal non-zero entries. The research showed that there was a detection boundary on the plane of signal sparsity and non-zero entry magnitude. Inside this boundary, LRT is asymptotically powerful, viz., it can diminish error probabilities as signal di-mension grows. Outside of the boundary all tests are asymptotically powerless. Donoho and Jin [16] proposed a type of second-level significance testing statistic and showed that it is asymptotically powerful within the detection region established by Ingster under the two-point Gaussian mixture model. Although these techniques are asymptotically optimal under certain

model, they are incapable in the identification of the non-zero entries. Moreover these techniques do not have natural generalizations to the detection of arbi-trary sparse means and in general are not effective in finite size problems.

The researches closest to this paper were presented in Ref. [17] and Ref. [18]. These works examine the performance of compressive sensing techniques in de-tectionand general inferenceproblems. Results in these works have shown the performance of standard statis-tical techniques in a compressive measurement frame-work using random projections. In this paper we show that the compressive-sensing-type techniques can still be effective in the detection of a sparse signal.

The primary interest of this paper is to investigate the performance of sparse signal detection on the basis of convex optimization techniques. The motivation is the combinatorial nature of the search for a sparse sig-nal in noisy observations [19] and the resulting computa-tional ineffectiveness of maximum likelihood (ML)- based search techniques. To the best of our knowledge, few have studied the performance of convex program-ming techniques for the detection of an unknown and arbitrary sparse signal in a composite hypothesis- test-ing setup.

In this paper, a new approach which we refer to as likelihood ratio test with sparse estimation (LRT-SE) is proposed to detect unknown and arbitrary sparse sig-nals. The method employs anA1-regularized

optimiza-tion method (also known as LASSO [20] in the context of regression problems in machine learning) to esti-mate the unknown and arbitrary sparse signal. Com-pared with the classical ML method, A1-regularized

optimization results in significant differences in com-putational complexity and theoretical behavior. More-over, unlike the property that the ML estimate does not have any specific accuracy property for a finite sample size, there are well-established results in the literature on the sparse signal estimation accuracy us-ingA1-regularized optimizations. This has allowed us to derive some kind of performance guarantee for the finite-dimensional signals which is useful in a re-al-world implementation.

We analyze the performance of LRT-SE in Gaussian noise. The techniques used in this paper are from as-ymptotic statistical theory and the results apply to cases where a large number of samples are observed, for example, using large-scale sensor networks. It is wor-thy to note that although the analysis in this paper is asymptotic, the proposed scheme gives satisfactory performance in finite-size detection problems.

The rest of this paper is organized as follows. A formulation of the sparse signal detection problem is presented in Section 2. The scheme of LRT-SE and the performance of the sparse estimation are introduced in Section 3. The asymptotic performance of LRT-SE is analyzed in Section 4. The simulation results based on synthetic signals are presented in Section 5. Finally

(3)

conclusions and comments are made in Section 6.

2. Problem Statement

Let x, s, n denote the observations, the sparse signal and the noise, respectively. S is the given sparsity index. We formulate the problem in the binary hypothesis testing framework as follows:

0 1 0 : : , H H dS ® ¯ x n x s n ý ýs (1) where_{x s n}, , _Rn

. and ý ýs ₀ denotes the number of non-zeroentries in s.

We consider our problem in the Neyman-Pearson setting. We assume the noise is white Gaussian noise with 2

~N( ,V n)

n 0 I .

Our problem is different from what is considered in the sparse normal means testing problem. See Ref. [16] for example. Letxidenote the thi entry ofx. The prob-lem in Ref. [16] is to test between

i.i.d 0 i.i.d 1 : ~ (0,1), 1 : ~ (1 ) (0,1) ( ,1), 0, 1 i i x i n x N N H N H i n H H P H d d ! d d °° ® ° °¯ (2)

Translating the above problem into our terms, it is equivalent to assumes_i P, i supp(s) in Eq. (2), where supp(s) is the support set of a vector s. Our problem is more general by making no prior assump-tions about the signalsapart from its sparsity, i.e.,

i

s can take any real number ifisupp(s). Furthermore, we do not assume stationarity of the signal, as assumed in Eq. (2). This nonstationarity of the signal precludes many conventional techniques which require a station-ary source.

TheApnorm (p!0) of a vectorxis denoted as

1/ 1 ( ) n p p p i i x

¦

ý ýx (3)

TheA0norm is defined as

0 #{xi|xi z0}

ý ýx (4)

where # means the number of elements in the set {xi | xiz0}.

The support set of a vectorxis denoted as

supp( )x { |i xi z0} (5) 3. LRT with Sparse Estimation(LRT-SE)

3.1. Detection statistic

The problem in Eq. (2) is a composite hypothesis test and the uniformly most powerful test does not exist in general.

A general strategy for testing composite hypothesis

is the so-called joint estimation and detection technique [21], where the unknown signal is estimated assum-ing H1 is true and the estimate is used in the likelihood

ratio test as if it were the real signal. To this end, we consider multiple i.i.d observations.

Letx x0, 1,Ă,xt (tt1)be a sequence of i.i.d obser-vations. i.i.d 2 0 i.i.d 2 1 : ~ ( , ), 0 : ~ ( , ), 0 i n i n i t i t H N H N V V _{d d} ° ® ° _{d d} ¯ 0 x I x s I (6)

Based on this argument, given observations

0, 1, , t

x x Ă x , LRT-SE works as follows:

1) Estimatesfromx0assuming H1 is true. Denote the

estimates䩘(s䩘is a function ofx₀

).

2) For T ₂T T 1 ] [ t t Ă

x x x x , calculate the test statistic and compare this value with a thresholdKt:

1 0 1 0 ( ; ) ( ) ( ; ) t H H t t t f T f K x s x x 䩘 0 M (7) where 1( ; ) t

f x s䩘 is the likelihood of the observa-tionsxtwhen the mean vector iss䩘.

In the case of additive white Gaussian noise, a sim-ple calculation shows that the test on ( t)

T x is equiva-lent to testing T 1 t i i

¦

x s䩘

. In the next section we discuss

the estimation method to calculates䩘. Distinct from the ML method used in classical GLRT, we are going to use a completely different estimation technique, which requires solving an A₁-regularized optimization prob-lem.

3.2. Sparse estimation algorithm

The main mathematical architecture of LRT-SE is the sparse estimation. To computes䩘, we consider the

1

A -regularized convex program:

1 0 2

arg min s.t. dH

䩘 _{ý ý} _ý _ý

s s As Ax (8)

where ARm nu (mdn) is the measurement matrix with each entry drawn from i.i.d Gaussian distribution. Each row of the matrix is made orthonormal. Eq. (8) is also known as the A1-error version of basis pursuit

denoising (BPDN)[22].

The idea is to utilize the fact that a sparse signal can be uniquely embedded in a certain subspace (specifi-cally a hyperplane). This subspace, specified by a pre-determined matrixAand referred to as the measure-ment matrix, is adaptive to the signals with certain sparsity index. This implies that shrinking the observa-tion into a lower dimension will not lose the informa-tion about a underlying sparse signal and by further recovering it, we are expected to get almost the same

(4)

(if not exactly the same because of the presence of noise) signal, i.e. large correlation between the recon-struction and the rest of the observations. In contrast, if what we observe is purely noise, the reconstruction from the low-dimensional projections will still be sparse and the correlation will be zero in expectation. In a nutshell, it is as if there were a matched filter that always tunes to the only sparse signal which could possibly lie in the projections.

The structure of the algorithm is shown in Fig. 1.

Fig. 1 Structure of LRT-SE algorithm.

3.3. Sparse estimation accuracy

The following lemma, which is based on the work of Candès, et al.[19, Theorem 1.1], describes the relationship between the size of the noisee Anleaked into the measurements and the fidelity of the sparse estimates䩘.

Lemma 1[19] LetS be such thatG₃SG₄S 2 , whereGSis the smallest quantity such that

2 2 2

(1G_Sý ý ý) c d A c_T ýd (1 G_S)ý ýc (9) for all subsetsT{1, 2,", }n with|T|dSand coeffi-cient sequences c [ ]cj j T .ATis made by choosing the columns of A, whose indices are in the setT.

Then for any signalswithý ýs 0dSand any

pertur-bation with

2 dH e A

ý ý , the solutions䩘_{to Eq. (8) obeys}

2 CsH

d

s䩘 s

ý ý (10)

whereCsis a constant that depends only onG4SandG3S. As illustrated in Lemma 1, the convex program in Eq. (8) can recover a sparse signal that has been con-taminated by an arbitrary type of noise vector of boundedA2norm. However for mathematical simplicity

Gaussian model is assumed in our analysis afterwards. Lemma 1 allows us to establish another fidelity measure of sparse estimation in the form of the bound on the correlation betweensands䩘_{as stated in the} following corollary. Corollary 1 If 2₍ ₂ ₎1/ 2_, s m m C Výýs O _then T s s䩘

is bounded above some positive constant] . Proof It is easy to seee AnRmis still white Gaussian noise withe~N( ,0V2Im).

From Eq. (10) we get

T 2 2 2 2 2 2 1 ( ) 2 CsH t s s䩘 _{ý ý ý ý}s s䩘 (11)

And from triangular inequality we can get

2 2d 2 2 d 2dCsH s s䩘 s䩘 s s s䩘 ý ý ý ý ý ý ý ý ý ý (12) Therefore 2 t 2CsH 䩘 ý ý ý ýs s (13) Then T 2 2 2 2 2 2 2 2 2 ] 1 ( ) 2 s s s C C C H H H t 䩘 _{>ý ý ý ý} ý ý ý ý s s s s s s (14) Therefore if 2 !CsH s ý ý (15) Denote 2 2 2Cs ] ý ý ý ýs s Hand we have T 0 ] t ! s s䩘 (16)

For Gaussian noise

i.i.d 2 ~ N( ,V m) e 0 I , we usually have 2 2 (m 2 )m

H V O for large probability. There-fore we need 2 2 2 1 2 ( 2 ) s m m C V ý ýs O ₍₁₇₎ and selectO 2 or 3while other choices are also

pos-sible.

4. Performance Analysis

For a detection strategy, it is desirable that the method is sensitive when the signal of interest is weak and the noise is substantial. Also the detection error probability is expected to decay fast as more observa-tions are available. We characterize these two aspects of LRT-SE via analyzing its consistency and the error exponent.

4.1. Chernoff consistency

In the following text, we establish the sensitivity of LRT-SE by showing its consistency.

The detector of LRT-SE using BPDN is defined as

1 0 ( ; ) 1, ( ; ) ( ; ) 0, Otherwise t t t t t f f K G K ! ° ® ° ¯ 0 䩘 x s x x (18)

Let PF Pr(H0 oH1) denote the probability

that H₁ is decided although H0 is true.

Nota-tionPM Pr(H1oH0)is defined analogously. Further

define P_D 1 P_M . The P_F, P_Mand P_D are generally called false alarm rate, miss rate and detection rate, respectively. The performance measure used here is

(5)

Chernoff consistency. We borrow the notion in Ref. [23] to define it in our setting.

Definition 1 The detector (G xt;Kt) is called Chernoff-consistent if there exists

K

tsuch that samples

drawn from hypotheses Eq. (6) can be separated com-pletely by comparing 1 0 ( ; ) ( ; ) t t f f x s x 䩘 0 with

K

twhento f. That is 1)lim D( ( ; )) 1 t t tofP G x K , for any t x under H1 2)lim F( ( ; )) 0 t t tofP G x K , for any t x under H0

The next theorem describes the maximum noise level under which reliable detection, viz. zero error probabilities, is possible using LRT-SE.

Theorem 1 Given m n ( )

m n

u

d

A R . Letsbe an unknown sparse vector, and 2

T T T 1 [ ] t t x x x Ăx , n i

x R be a set of ti.i.d samples drawn from one of the two hypotheses in Eq. (6). Let ( t; )

t G x K be a test defined in Eq. (18). If 2₍ _{2 )} 1/ 2 s m m C V ý ýs O _{, then} ₍ t_; ₎ t G x K is

Chernoff-consistent, withCsa constant coming from

the estimation algorithm.

Typicallym2Sln .nTherefore considerable amount of noise is suppressed by the step of random projection. Note that the noise energy grows proportionally ton. This means that the Chernoff-consistent detection is achievable in vanishingly small SNR.

Proof The test statistic of LRT-SE is

1 0 T 1 2 2 ( ; ) exp( ) ( ; ) 2 t t i i t t f f V

¦

x s x s x 䩘䩘 0 (19)

Therefore LRT-SE is equivalent to the following test: 1 0 T 1 1 t i t H i H t

¦

x s W 䩘_M (20) where 2 1 ln 2 t t t V W K (21)

The distributions of the statistic in Eq. (20) are

2 T 2 0 1 2 T T 2 1 1 1 ~ (0, ) 1 ~ ( , ) : : t i i t i i N t t t H H N t V V ° ° ® ° °¯

¦

s x s s x s s s 䩘䩘䩘䩘䩘 ý ý ý ý (22)

Let ₎ be the cumulative distribution function (cdf) of N(0,1) Then if we would like to use anDas the level,

W

thas to be chosen such that F( ( ; )) t t P G x K D (23) It yields that 1 2 (1 ) / t t W V_{ý ý}_s䩘 ) D ₍₂₄₎ It follows that M D T 1 2 ( ( ; )) 1 ( ( ; ))= (1 ) t t t t P P t G K G K ) ) D _V ª º « » ¬ ¼ 䩘䩘 ý ý x x s s s (25) Given 2₍ _{2 )}1/ 2 s m m C V ýýs O _{and applying} Lemma 1, it yields T 0 ] t ! s s䩘 ₍₂₆₎

for allx_iunder H1. ] !0 is a fixed constant.

It follows that M( ( ; )) 0 t t P G x K o (27) asto f.

Note the size of ( t; )

t

G x K isD for allt. Further let {D(t)}(0,1) be a sequence satisfying

D(t)o0 and tD(t)o f . Then ( t; ) t

G x K has sizeD(t)for eacht, and therefore

F( ( ; )) ( ) 0

t t

P G x K D t o (28) On the other hand, Eq. (27) still holds with

D

(t)in

place ofD . Hence ( t; ) t

G x K is Chernoff-consistent.

4.2. Error exponent

In practice, a large number of data samples are col-lected by, e.g. distributed sensors or multiple observa-tions. It is desirable to see how the error probabilities of the detector decrease as the number of observations increases. We now provide a large deviation analysis of the asymptotic behavior of LRT-SE by deriving a bound on the error exponent of the detector.

Define the miss rate PM as

M ( 1 0) 1 D

P P H oH P (29) The error exponent for a Neyman-Pearson LRT-SE detector is defined as NP LRT-SE M 1 lim ln t E P t of (30)

The error exponent describes the rate at which the error decays exponentially whento f.

To characterize the error exponent, first note that the decision threshold in Eq. (24) is a function of s䩘 thus is also a function ofs. To make the threshold invariant over the set{ :s sý ý₀S}and without loss of perform-ance, we divide both sides of Eq. (24) by s 2

䩘

ý ý. Also note that the test statistic in Eq. (20) now is the sum of i.i.d random variables:

(6)

T 2 ( ) r x x s s 䩘䩘 ý ý (31)

whose distributions are

2 0 T 2 2 1 : ( ) : ( ) ~ (0, ) ~ ( , ) N H r N r H V V ° ® ° ¯ x s s x s 䩘䩘 ý ý (32)

The following lemma describes the best error expo-nent in the miss rate over all tests between two hy-potheses in Eq. (32).

Lemma2 (Chernoff-Stein Lemma [24]) Letx x₀, ₁,",xtbe i.i.d drawn from one of the hy-potheses in Eq. (6). Let the sparse estimation fromx₀bes䩘. Consider the test on Eq. (31) whose dis-tributions areh0andh1in Eq. (32) under H0 and H1

re-spectively.

Let At T t be an acceptance region for hypothesis

H0, where the set ( 1) ( 2) ( )

t

r

T r x ur x u}u x is the Cartesian product of r(x₁)through (r x_t). Let the false alarm rate and the miss rate corresponding toA_tbe

c

0( ), 1( )

t t

t h At t h At

D E (33)

whereAtcdenotes the complement ofAt . For 0D1 2, define , min t t n t t A T D D D E E (34)

where the minimization is over all possible acceptance regions in T t having a false alarm rate less thanD .

Then 0 1 1 lim ln t ( ) t _t D h h D E of (35)

whereD(h₀||h₁) fis the Kullback-Leibler distance betweenh0andh1.

Proof See Ref. [24]. Theorem 2 2 NP 2 2 2 LRT-SE 2 1 1 cos cos 2 2 E T T V V d S s s ýý ýý (36)

whereTis the angle betweensands䩘.

Proof LRT-SE is specified by the test in Eq. (18) and a threshold given for a given levelD . Note that Eq. (18), which is a test in the space T t with the acceptance region specified by Eq. (24), is equivalent to Eq. (20) in the proof of Theorem 2. Theorem 3 readily follows by noting that Eq. (24) is not the optimal acceptance region for not knowingsand by further applying

Lemma 2, withh₀andh₁specified in Eq. (32). The upper bound on error exponent of LRT-SE

given in Theorem 3 is tight from the simulation results shown later. Theorem 3 shows that LRT-SE is intrinsi-cally different from the energy detectors, since the

es-timate energy does not appear in the exponent. It is the angle between the estimate and the unknown sparse signal that dictates the error decay rate of LRT-SE. Moreover later simulation results show that energy estimation is also insignificant in finite size settings, where normalization of estimate energy does not change the achievable performance.

Next we derive the exact error exponent using large deviation theory. Theorem 3 2 NP 2 2 LRT-SE 2 1 cos 2 = E T V s ýý (37)

Proof Denote X_i ~N( ,P V2) where P !0under

H1 . LetSt Xi i 1

t

¦

. Define the test statistic

Tt t

1

St (38)

We are interested in the error probability Pr(TtdWt), whereWt c/ t, c V 2 䩘 ý ýs 1 (1 ) ) D . This is a large deviation event. To see this clearly we introduce

Yi=XiP so that Yi~N(0,V2). Pr(TtdWt) Pr(t 1 Yi i 1 t

¦

dWtP) (39) Let H!0 be arbitrary, then for t large enough

t W Hand therefore 1 1 Pr( ) Pr( ) t t t i i T dW d t

¦

Y d H P (40) For t sufficiently large. Now since E(Yi)=0 , clearly

HP for H sufficiently small, so that this is a large deviation event about the empirical mean. For N(0,V) the rate function is

I(s) s2_{/ 2}

V

2

(41) wheresis the parameter of the rate and is determined

by the threshold on the right hand side of the inequality. From Cramerÿs theorem, for all a<E(Yi)

1 1 { : } 1 ln Pr( ) in lim s pu f ( ) t i t s x x a i t t Y a I s of

¦

d d d (42)

In our problem a=HP, therefore

2 1 1 2 1 ( ) lim sup ln Pr( ) 2 t i t t t _i Y H P V of d

¦

(43)

Now sinceHis arbitrary we may take Hp0 to give

1 1 1 0 1 2 2 2 2 0 lim sup ln Pr( ) lim lim sup ln Pr( )

( ) lim 2 2 t t t t i t i t T t t Y H H W H P H P P V V of o of o d d d

¦

(44)

(7)

Note T 2 P s s s 䩘䩘

ý ý, which implies that

2 T 1 2 2 2 M 2 2 1

lim inf ln ( ) cos

2 ( ) t t P V T of t 䩘䩘 ýý ý ý s s s s (45)

To compare the exponent given here with the upper bound from Theorem 3, the difference between the upper and lower bound is

T 2 2 2 1 cos 2 2SV S V T s s s s 䩘䩘 ýý ý ý (46)

This is the penalty incurred for not knowingswhen selecting our threshold.

5. Simulation Results

This section presents the simulation results of LRT-SE using synthetic sparse signal model, and benchmarks the performance of LRT-SE against that of LRT, which serves as the theoretical upper bound due to the Neyman-Pearson lemma [25], as well as GLRT. As we have noted, despite the asymptotic approach to analyze the performance in this paper, LRT-SE is non-asymptotic in nature.

5.1. Synthetic sparse signal model

We model sparse signal_s_Rn

with support ran-domly chosen from all subsets of{1, 2,Ă, }n with car-dinalityS and with non-zero entries i.i.d drawn from standard normal distributions, which has the most en-tropy among all distributions with a constraint on the second moment. Denotes [s s₁ ₂Ăs_n] i.i.d ~ (0,1), supp( ) i s N i s (47) and the noise n~N( ,0V2In)

.

Forx [x x₁ ₂Ăx_n], i.i.d 2 0 i.i.d 2 1 i.i.d 2 c ~ (0, ), , ~ (0,1 ), supp( ) : 1, 2, ~ (0, ), supp( ) : i i i H x N i H x N x N i n i V V V ° °° ® ° ° °¯ Ă s s (48)

5.2. Likelihood ratio test for sparse signal model

Given the sparse model Eq. (48), we employ the LRT: 1 0 1 0 1 0 1 0 ( , , , ; ) ( , , , ; ) H H t t f f K x x x s x x x " " 0 M (49)

wheref0and f1are likelihood functions under H0 and

H1 respectively. Note that the use of LRT assumes full

knowledge of the signal support supp(s). Simple cal-culation ultimately leads to a detector in the following form: 1 0 2 sup p(t+1) H i H i x W

¦

s M (50) where T T 1 1 [ ] t t Ă s s s (51) and 2 ( 1) / 2 2 2 2 ( 1) / 2 (1 ) 2 (1 ) ln ( ) S t S t K V W V V V (52)

Intuitively, the optimal detector computes the “en-ergy” coming from the “non-zero” group of the re-ceived signal, whose distribution is

0 2 1 2 ( 1) 1 : , 2 2 ( 1) 1 : , 2 2(1 ) S t H S t H * V * V § · ¨ ¸ ° _© _¹ ° ® _§ _· ° _¨ _¸ ° _© _¹ ¯ (53) Thus F 0 2 D 1 2 1 ( 1) Pr( | ) 1 , 2 2 1 ( 1) Pr( | ) 1 , 2 2(1 ) S t P t H S t P t H W J V W J V _! § · ¨ ¸ ° _© _¹ ° ® _§ _· ° _! _¨ _¸ ° _© _¹ ¯ (54)

where

J

( , )s x is the regularized gamma function:

1 t 0 1 ( , ) e d ( ) x s s x t t s J *

³

(55) 5.3. Numerical simulations

We make four plots to show performance of the de-tectors. The first is the standard receiver operating characteristics (ROC) that trade off between the sparse detection rates and the false alarm rates under a fixed and relatively low SNR. The second and third plots are the plots of the detection rates (PD) against various

SNRs around the transitions between a poor and a nearly perfect detection. Finally the miss rate is plotted as a function of the number of observations where both SNR and the false alarm rate are fixed. This indicates how fast nearly perfect detection can be achieved by collecting more observations. The comparison is per-formed on 10 000 test signals drawn from the sparse model as in Eq. (48) and from noise, respectively. The SNR in dB is defined as10 lg S₂

nV .

Figure 2 shows the snapshots of the observations in Fig.2(a) through Fig.2(c),Fig. 2(d) shows the ROC curves under a rather low SNR that sparse signal plus

(8)

Fig. 2 Detection performance of LRT-SE witht 1,sparsity indexS 10, and SNR = 5.051 5 dB, with snap-shots of the observations.

noise and noise only observations are not visually separable from the snapshots in Fig. 2(a) through Fig. 2(c). The performance of the optimal LRT is used as the theoretical upper bound for the other detectors. The curve of the LRT largely coincides with the lines

ofP_D 1andP_F 0, which means it has nearly perfect detection performance. LRT-SE, which does not have prior knowledge of the signals, has performance in the vicinity of the optimal LRT method throughout various choices of the false alarm rates and the detection rates (see Fig. 2(d)). The difference of the detection rates between the two detectors when P_F 0.1 is less than 0.1. This is the price we pay for not knowing the signal in advance and being able to detect an infinite class of sparse signals, which nonetheless is low from the re-sults. It is also shown in Fig.2(d) that normalization of the energy of the sparse estimate essentially does not change the detection performance.

Figure 3 shows SNR dependence of the detectors. By limiting the false alarm rate to not greater than 0.002, LRT-SE is sensitive and approaches nearly per-fect detection when SNR is about 0 dB. In addition, it shows again that the price we pay for having the ability to detect an unknown and arbitrary sparse signal is low, compared with the oracle LRT method.

Fig. 3 Detection rates of LRT-SE vs SNR witht 1, spar-sity indexS 10, and false alarm rate is 0.002. Another widely used approach to detecting signals with unknown parameters is the GLRT which uses the ML method for estimation. Although the ML estimator at high SNR has similar performance to the “oracle” estimator[26]

, which knows the locations of the non-zero elements of

s

, it is computationally intracta-ble (i.e. NP-hard) in the settings with sparsity con-straints. In Fig. 4 we implement the GLRT using ex-haustive search over all possible sparsity patterns (we decrease the sparsity level to 2, since at the previous sparsity level of 10, it is impossible to perform support enumeration for the GLRT). Observe first that the LRT again serves as the upper bound on the performance of detectors which are not endowed with oracular knowl-edge. Observe then at high SNR, the LRT, the GLRT and LRT-SE all achieve nearly perfect performance which is close to detection rate 1.0. However, the pro-posed LRT-SE detector has significantly low computa-tional complexity (polynomial time) and performs al-most as well as the GLRT detector.

(9)

Fig. 4 Comparison of the LRT-SE and GLRT witht 1, sparsity indexS 2, and false alarm rate is 0.002. Finally we plot the miss rates of LRT-SE as well as its large deviation (LD) estimate when the number of observations in creases in Fig. 5. The LD estimate re-fers to_enELRT-SENP _{. First we observe the exponential decay}

of the miss rate PM of LRT-SE which is highly

desir-able in real implementations (e.g. when using multiple distributed sensors). The second curve is the LD esti-mate with the rate of decayELRT-SENP provided by

Theo-rem 3. It can be seen that the slope of the LD estimate is almost the same as that of the miss rates simulated with LRT-SE. The characterization Eq. (37) of

NP LRT-SE

E by Theorem 3 is reasonably accurate. Nonethe-less, there is a non-vanishing gap between _enELRT-SENP _and

PM . This is because the LD estimate does not consider

subexponential terms and in particular the constants. The curve marked by the u-mark corresponds to the optimal exponent given in Eq. (36) achieved when the detector has direct access tos, which serves as an up-per bound. Such phenomenon is indeed predicted by Theorem 2. The difference between the curves of LRT-SE and the oracle exponent is the penalty which incurs for not knowingswhen selecting a detection threshold.

Fig. 5 Miss rate as a function of the number of observations for the sparse signal when sparsity indexS 1, false alarm rate is 0.02 and SNR=15.051 5 dB.

6. Conclusions

In this paper, we consider the sparse signal detection using convex optimization techniques. We derive the maximum noise level such that the false alarm rate and the miss rate diminish. We also provide a theoretical characterization of the exponential decay rate of the miss rate as the number of observations increases. Such properties allow the possibility of using a large number of sensors to improve the detection performance.

Weleave several interesting problems open as future work. Non-asymptotic performance results are also helpful in implementation of the detectors. Observation model with correlated data also deserves careful study for both theoretical and practical reasons.

References

[1] He T, Ben-David S, Tong L. Nonparametric change detection in 2D random sensor field. IEEE Proceedings International Conference on Acoustic Speech, Signal Processing. 2005; 821-824.

[2] Liu Y, Ning P, Reiter M K. False data injection attacks against state estimation in electric power grids. ACM Conference on Computer and Communications Security. 2009; 21-32.

[3] Hatchue G, Lei C, Tong L. Distributed detection and classification of transient signals. Cornell University, ACSP-TR-06-08-01, 2008.

[4] Learned R, Willsky A. A wavelet packet approach to transient signal classication. Applied and Computa-tional Harmonic Analysis 1995; 2(3): 265-278. [5] Germida A, Yan Z, Plusquellic J, et al. Defect detection

using power supply transient signal analysis. Interna-tional Test Conference. 1999; 536-540.

[6] Anderson H, Gupta M. Joint deconvolution and classication with applications to passive acoustic un-derwater multipath. Journal of the Acoustical Society of America 2008; 124(5): 2973-2983.

[7] Heger A, Lappe M, Holm L. Accurate detection of very sparse sequence motifs. Journal of Computation Biol-ogy 2004; 11(5): 843-857.

[8] Bruckstein A M, Donoho D L, Elad M. From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Review 2009; 51(1): 34-81. [9] Trees H V. Detection, estimation, and modulation

the-ory, Part I. 2nd ed. New York: John Wiley & Sons, 2001.

[10] Kay S M. Fundamentals of statistical signal processing, Volume II: Detection theory. Upper Saddle River, NJ: Prentice-Hall PTR, 1998.

[11] Zeitouni O, Ziv J, Merhav N. When is the generalized likelihood ratio test optimal?. IEEE Transactions on Information Theory 1992; 38: 1597-1602.

[12] Levy B C. Principles of signal detection and parameter estimation. New York: Springer, 2008.

[13] Bickel P J, Chernoff H. Asymptotic distribution of the likelihood ratio statistic in a proto-typical nonregular problem. In: Ghosh J K, Mitra S K, Parthasarathy K R, et al., editors. Statistics and probability. New Dechi: Wiley Eastern, 1993; 83-96.

[14] Hartigan J A. A failure of likelihood asymptotics for normal mixtures. Proceedings Berkeley Conference in Honor of Jerzy Neyman and Jack Keifer. 1985;

(10)

807-810.

[15] Ingster Y. Minimax detection of a signal for p-balls. Mathmatical Methods of Statistics 1999; 7: 401-428. [16] Donoho D, Jin J. Higher criticism for detecting sparse

heterogeneous mixtures. The Annals of Statistics 2004; 32(3): 962-994.

[17] Haupt J, Nowak R. Compressive sampling for signal detection. IEEE Proceedings International Confer-ence on Acoustic Speech, Signal Processing. 2007; 536-540.

[18] Davenport M, Boufounous P, Wakin M, et al. Signal processing with compressive measurements. IEEE Transactions on Signal Processing 2010; 4(2): 445-460. [19] Candès E, Romberg J, Tao T. Near optimal signal re-covery from random projections: universal encoding strategies. IEEE Transactions on Inforamation Theory 2006; 52(12): 5406-5425.

[20] Tibshirani R. Regression shrinkage and select via the LASSO. Journal of the Royal Statistical Society, Series B (Methodological) 1996; 58(1): 267-288.

[21] Levy B C. Principles of signal detection and parameter estimation. New York: Springer, 2008.

[22] Ben-Haim Z, Eldar Y, Elad M. Near-oracle perform-ance of basis pursuit under random noise. arXiv:0903.4579v2, 2009.

[23] Shao J. Mathematical statistics. New York: Spring-er-Verlag, 1999.

[24] Cover T, Thomas J. Elements of information theory. 2nd ed. Hoboken: John Wiley & Sons, 2006.

[25] Neyman J, Pearson E. On the use and interpretation of certain test criteria for purposes of statistical inference. Biometrika 1928; 20(1):175-240.

[26] Ben-Haim Z, Eldar Y. The Cramér-Rao bound for esti-mating a sparse parameter vector. IEEE Transactions on Signal Processing 2010; 58(6): 3384-3389.

Biographies:

LEI Chuan received the B.S. degree in Electronics and In-formation Engineering from Beihang University in 2006. He is currently a Ph.D. student in the School of Electronics and Information Engineering, Beihang University since 2006. He was a visiting graduate student in the School of Electrical and Computer Engineering, Cornell University from 2007 to 2009 under the scholarship from Chinese Scholarship Coun-cil. His main research interests are statistical signal process-ing, information theory and air transportation system. E-mail: [email protected]

ZHANG Jun received the B.S., M.S., and Ph.D. degrees from Beihang University, in 1987, 1991, 2001, respectively. He is currently a professor with the School of Electronics and Information Engineering, Beihang University and the Direc-tor of National Key LaboraDirec-tory of CNS/ATM, Beijing. His area of research includes signal processing, communication theory and air transportation system.