
Efficiency of orthogonal super greedy algorithm under the restricted isometry property


RESEARCH (Open Access)

Xiujie Wei¹* and Peixin Ye²

*Correspondence: [email protected]
¹School of Sciences, Tianjin Chengjian University, Tianjin, China. Full list of author information is available at the end of the article.

Abstract

We investigate the efficiency of the orthogonal super greedy algorithm (OSGA) for sparse recovery and approximation under the restricted isometry property (RIP). We first show that, under RIP conditions on the measurement matrix $\Phi$ and a condition on the minimum magnitude of the nonzero coordinates of the signal, for an $\ell_2$ bounded or $\ell_\infty$ bounded noise vector $e$ and with explicit stopping rules, OSGA can recover the support of an arbitrary $K$-sparse signal $x$ from $y=\Phi x+e$ in at most $K$ steps. Then we investigate the error performance of OSGA in $m$-term approximation with regard to dictionaries satisfying the RIP in a separable Hilbert space. We establish a Lebesgue-type inequality for OSGA. Based on this inequality, we obtain the optimal rate of convergence for the sparse class induced by such dictionaries.

Keywords: Orthogonal super greedy algorithm; Orthogonal multi matching pursuit; Restricted isometry property; Compressed sensing; $m$-term approximation; Lebesgue-type inequality

1 Introduction

Recovery and approximation by sparse linear combinations of elements from a fixed redundant family are frequently used in many application areas, such as image or signal processing, PDE solvers, and statistical learning; see [1]. In general, these problems are NP-hard. It is well known that greedy-type algorithms are efficient approaches to solving them; see [2–5]. Among others, the orthogonal greedy algorithm (OGA) has been widely used in practice. OGA is a simple yet powerful algorithm for highly nonlinear sparse approximation that has been studied extensively; see [3,6,7] and the references therein.

In this paper, we consider the orthogonal super greedy algorithm (OSGA), which is more efficient than OGA from the viewpoint of computational complexity. The performance of OSGA in greedy approximation and signal recovery was analyzed in [8–12]. We study the efficiency of the OSGA further from two aspects.

In Sect. 2, we study the efficiency of the OSGA in recovering an $N$-dimensional sparse signal from linear measurements. This topic is also known in the literature as compressed sensing (CS) [13–15]. We show that OSGA can recover a $K$-sparse signal from noisy measurements in at most $K$ steps. We remark that in the field of signal processing the OSGA is also known as orthogonal multi matching pursuit (OMMP) [4]. So, for the reader's convenience, we will use the term OMMP instead of OSGA in Sect. 2.

In Sect. 3, we study the error performance of the OSGA in the general context. We investigate the efficiency of OSGA for $m$-term approximation with regard to dictionaries in a real separable Hilbert space $H$. Assuming that the dictionary satisfies the RIP condition, we establish a Lebesgue-type inequality for OSGA which bounds the error of OSGA by the best $m$-term approximation error. Based on this inequality, we derive the sharp convergence rate of OSGA on the sparse class induced by a dictionary satisfying the RIP condition.

2 The efficiency of the OMMP in CS

In this section, we analyze the efficiency of the OMMP in compressed sensing.

We consider the following model. Suppose that $x\in\mathbb{R}^N$ is an unknown $N$-dimensional signal and we wish to recover it from $M$ noisy measurements $y$ given by inner products with fixed vectors, that is,

$$y=\Phi x+e, \tag{2.1}$$

where $\Phi$ is a known $M\times N$ measurement matrix with $M\ll N$ and $e\in\mathbb{R}^M$ is a vector of measurement errors. The error vector can be zero, bounded, or Gaussian noise. For $x=(x_j)_{j=1}^N\in\mathbb{R}^N$, define

$$\|x\|_p:=\Bigl(\sum_{i=1}^N|x_i|^p\Bigr)^{1/p},\qquad 0<p<\infty,$$

and

$$\|x\|_\infty:=\sup_{1\le i\le N}|x_i|.$$

The signal $x\in\mathbb{R}^N$ is said to be $K$-sparse if $\|x\|_0:=|\operatorname{supp}(x)|=|\{i:x_i\ne0\}|\le K<N$. We study the performance of OMMP for the recovery of the support of the $K$-sparse signal $x$ under model (2.1).

Greedy algorithms recover a $K$-sparse signal by iteratively constructing its support according to a greedy principle.

The OMMP is a stepwise forward selection procedure that is easy to implement and has been used for signal recovery. Let us recall the definition of OMMP($s$) in algorithmic form (see Algorithm 1).

OMMP($s$) generalizes the orthogonal greedy algorithm (OGA) in the sense that it selects $s$ indices in each iteration, so it can recover the sparse signal in fewer steps and thereby reduce the complexity. To investigate the efficiency of the OMMP in CS, we use the restricted isometry property (RIP) of $\Phi$, which ensures the stable recovery of $x$ from noisy measurements. This property was introduced by Candes and Tao [16,17] as follows.

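A matrix $\Phi$ is said to satisfy the RIP of order $K$ if there exists a constant $\delta\in(0,1)$ such that, for all $K$-sparse vectors $x$,

$$(1-\delta)\|x\|_2^2\le\|\Phi x\|_2^2\le(1+\delta)\|x\|_2^2. \tag{2.2}$$

The minimum of all $\delta$ satisfying (2.2) is referred to as the isometry constant $\delta_K$.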

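Algorithm 1 Orthogonal Multi Matching Pursuit (OMMP($s$))

Input: Measurement matrix $\Phi$, vector $y$, the parameter $s$, and the stopping criterion.

Step 1: Set the residual $r^0:=y$, the trivial initial approximation $x^0:=0$, the index set $\Lambda^0:=\emptyset$, and the iteration counter $l=0$.

Step 2: Define $\Lambda^{l+1}:=\Lambda^l\cup\{i_1,\dots,i_s\}$ such that

$$|\langle r^l,\varphi_{i_1}\rangle|\ge\dots\ge|\langle r^l,\varphi_{i_s}\rangle|\ge\sup_{\varphi\in\Phi,\ \varphi\ne\varphi_{i_k},\ k=1,\dots,s}|\langle r^l,\varphi\rangle|.$$

Then set

$$x^{l+1}:=\arg\min_{z:\ \operatorname{supp}(z)\subseteq\Lambda^{l+1}}\|y-\Phi z\|_2$$

and update the residual

$$r^{l+1}:=y-\Phi x^{l+1}.$$

End if the stopping condition is achieved. Otherwise, set $l:=l+1$ and return to Step 2.

Output: If the algorithm stops at the $k$th iteration, output $\Lambda^k$ and $x_{\Lambda^k}=x^k$.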
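The steps of Algorithm 1 can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function name `ommp`, the callable `stop` (which receives the current residual), and the iteration cap `max_iter` are assumptions introduced here, and the least-squares step uses `numpy.linalg.lstsq`.

```python
import numpy as np

def ommp(Phi, y, s, stop, max_iter=None):
    """Sketch of OMMP(s): select s new columns per iteration, then least squares.

    `stop` is a hypothetical callable r -> bool implementing the stopping rule.
    """
    M, N = Phi.shape
    selected = np.zeros(N, dtype=bool)     # indicator of the index set Lambda^l
    x = np.zeros(N)                        # x^0 := 0
    r = y.astype(float).copy()             # r^0 := y
    if max_iter is None:
        max_iter = max(1, min(M, N) // s)  # assumed cap so the loop always ends
    for _ in range(max_iter):
        if stop(r):
            break
        corr = np.abs(Phi.T @ r)           # |<r^l, phi_i>| for every column
        corr[selected] = -np.inf           # never reselect an index
        new = np.argsort(corr)[::-1][:s]   # the s largest correlations
        selected[new] = True
        support = np.flatnonzero(selected)
        # Least squares over vectors supported on Lambda^{l+1}.
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        x = np.zeros(N)
        x[support] = coef
        r = y - Phi @ x                    # r^{l+1} := y - Phi x^{l+1}
    return np.flatnonzero(selected).tolist(), x
```

Passing `stop=lambda r: np.linalg.norm(r) <= b2` corresponds to an $\ell_2$ residual threshold, where `b2` is whatever noise level the caller assumes.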

We recall some results on the efficiency of OMMP based on the RIP. These results concern upper bounds on the RIP constant $\delta_{sK}$ of order $sK$. In the noiseless case, Liu and Temlyakov [8] proved that $\delta_{sK}<\frac{\sqrt{s}}{(2+\sqrt{2})\sqrt{K}}$ is a sufficient condition for OMMP($s$) to recover every $K$-sparse signal in at most $K$ iterations. Then Wang, Kwon, and Shim [10] relaxed the condition to $\delta_{sK}<\frac{\sqrt{s}}{3\sqrt{s}+\sqrt{K}}$. The condition was further relaxed to $\delta_{sK}<\frac{\sqrt{s}}{2\sqrt{s}+\sqrt{K}}$ by Satpathi, Das, and Chakraborty [18]. In [18] the authors also pointed out the possibility of extending the results to the noisy case. On the other hand, Dan Wei [9] defined the restricted orthogonality constant $\theta_{K,K'}$ and proved that $\delta_{sK-s+1}+\sqrt{\frac{K}{s}}\,\theta_{sK-s+1,s}<1$ is a sufficient condition guaranteeing the perfect recovery of $K$-sparse signals by OMMP. She showed that this result implies the result of [18]. Moreover, the performance of OMMP for support recovery from noisy measurements was also discussed in [9].

Motivated by the above works, we investigate the OMMP in the general setting where noise is present, under RIP-based conditions. We prove that if the sampling matrix $\Phi$ satisfies the RIP of order $sK$ with the isometry constant

$$\delta_{sK}<1+\frac12\sqrt{\frac{K}{s}}-\frac12\sqrt{\frac{K}{s}+4\sqrt{\frac{K}{s}}},$$

then OMMP can recover every $K$-sparse signal in at most $K$ steps in the presence of $\ell_2$ bounded or $\ell_\infty$ bounded noise. It is easy to check that

$$\frac{\sqrt{s}}{2\sqrt{s}+\sqrt{K}}<1+\frac12\sqrt{\frac{K}{s}}-\frac12\sqrt{\frac{K}{s}+4\sqrt{\frac{K}{s}}}.$$
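The comparison between the two bounds can be checked numerically. The sketch below assumes the reading $t=\sqrt{K/s}$, under which the new bound is the smaller root of $(1-\delta)^2=t\,\delta$; the function names `new_bound` and `satpathi_bound` are introduced here for illustration only.

```python
import math

def new_bound(K, s):
    """1 + t/2 - sqrt(t^2 + 4t)/2 with t = sqrt(K/s)."""
    t = math.sqrt(K / s)
    return 1 + 0.5 * t - 0.5 * math.sqrt(t * t + 4 * t)

def satpathi_bound(K, s):
    """sqrt(s) / (2 sqrt(s) + sqrt(K)), the earlier sufficient condition."""
    return math.sqrt(s) / (2 * math.sqrt(s) + math.sqrt(K))

def gap(K, s):
    """Positive when the new bound is weaker (larger) than the earlier one."""
    return new_bound(K, s) - satpathi_bound(K, s)
```

For instance, `gap(K, s)` can be evaluated over a grid of sparsity levels `K` and selection sizes `s` to see how much room the relaxed condition leaves.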


When dealing with noisy measurements, one of the key components of OMMP is the stopping rule, which depends on the structure of the noise. In general, there are several natural stopping rules for OMMP.

(1) Stop after a fixed number of iterations $K$.

(2) For the $\ell_2$ bounded noise case: the error vector $e$ is bounded in $\ell_2$ norm with $\|e\|_2\le b_2$, and we set the stopping rule as $\|r^k\|_2\le b_2$.

(3) For the $\ell_\infty$ bounded noise case: the error vector $e$ is bounded in $\ell_\infty$ norm with $\|\Phi^*e\|_\infty\le b_\infty$, and we set the stopping rule as

$$\|\Phi^*r^k\|_\infty\le\Bigl(1+\frac{\sqrt{sK}\,\delta_{sK}}{1-\delta_{sK}}\Bigr)b_\infty,$$

where $\Phi^*$ is the transpose of $\Phi$.
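Stopping rules (2) and (3) translate directly into residual tests. The following is a minimal sketch under the assumption that the caller knows (or has estimated) the noise levels `b2` and `b_inf` and the isometry constant `delta`; the function names are hypothetical.

```python
import numpy as np

def stop_l2(r, b2):
    """Rule (2): stop once the l2 norm of the residual is at most b2."""
    return bool(np.linalg.norm(r) <= b2)

def stop_linf(Phi, r, b_inf, s, K, delta):
    """Rule (3): stop once ||Phi^T r||_inf <= (1 + sqrt(sK) delta/(1-delta)) b_inf."""
    threshold = (1.0 + np.sqrt(s * K) * delta / (1.0 - delta)) * b_inf
    return bool(np.max(np.abs(Phi.T @ r)) <= threshold)
```

Rule (1) needs no residual test at all: it simply caps the iteration count at $K$.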

Since we mainly consider $\ell_2$ bounded noise and $\ell_\infty$ bounded noise, we use stopping rules (2) and (3), respectively. In Sect. 2.1, we introduce some notation and present some consequences of the restricted isometry property. In Sect. 2.2, we give RIP-based sufficient conditions for OMMP($s$) in the $\ell_2$ bounded noise case using stopping rule (2). In Sect. 2.3, we establish the corresponding results in the $\ell_\infty$ bounded noise case using stopping rule (3). In Sect. 2.4, we give RIP-based conditions under which OMMP($s$) succeeds with high probability in the Gaussian noise case.

2.1 Preliminary

The following notation will be used in this section. For a signal $x\in\mathbb{R}^N$, let $\Omega=\{1,\dots,N\}$ denote the index set of $x$ and $T=\operatorname{supp}(x)=\{i:x_i\ne0\}$ its support.

Suppose that $\varphi_1,\varphi_2,\dots,\varphi_N$ are the columns of the matrix $\Phi$. We always assume $\|\varphi_i\|_2=1$, $1\le i\le N$. For a subset $\Lambda$ of $\Omega$, let $\Phi_\Lambda$ denote the matrix $\Phi$ restricted to the columns indexed by $\Lambda$. We define $x_\Lambda$ for the vector $x\in\mathbb{R}^N$ in the same way. Thus

$$\Phi_\Lambda x=\Phi x_\Lambda=\Phi_\Lambda x_\Lambda. \tag{2.3}$$

We recall some useful consequences of the RIP.

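Lemma 2.1 ([5]) Suppose that $\Gamma$ and $\Lambda$ are two disjoint sets of indices. If the matrix $\Phi$ satisfies the RIP of order $|\Gamma\cup\Lambda|$ with isometry constant $\delta_{|\Gamma\cup\Lambda|}$, then for any $x\in\mathbb{R}^{|\Lambda|}$ we have

$$\|\Phi_\Gamma^*\Phi_\Lambda x\|_2\le\delta_{|\Gamma\cup\Lambda|}\|x\|_2 \tag{2.4}$$

and

$$\|(\Phi_\Lambda^*\Phi_\Lambda)^{-1}x\|_2\le\frac{1}{1-\delta_{|\Lambda|}}\|x\|_2. \tag{2.5}$$

Lemma 2.2 ([6]) Let $x\in\mathbb{R}^N$ be a $K$-sparse vector. Suppose that the matrix $\Phi$ satisfies the RIP of order $K$ with isometry constant $\delta_K$. Then, for $\Lambda\subset\Omega$,

$$(1-\delta_K)\|x_{T\setminus\Lambda}\|_2\le\|\Phi_{T\setminus\Lambda}^*(I-\Phi_\Lambda\Phi_\Lambda^\dagger)\Phi_{T\setminus\Lambda}x_{T\setminus\Lambda}\|_2\le(1+\delta_K)\|x_{T\setminus\Lambda}\|_2, \tag{2.6}$$

where $\Phi_\Lambda^\dagger:=(\Phi_\Lambda^*\Phi_\Lambda)^{-1}\Phi_\Lambda^*$ denotes the pseudo-inverse of $\Phi_\Lambda$.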

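The residual vector after $k$ steps can be written as

$$\begin{aligned}
r^k&=y-\Phi_{\Lambda^k}\Phi_{\Lambda^k}^\dagger y=(I-P_k)y\\
&=(I-P_k)(\Phi x+e)\\
&=(I-P_k)\Phi x+(I-P_k)e\\
&=r_1^k+r_2^k,
\end{aligned}$$

where $P_k=\Phi_{\Lambda^k}\Phi_{\Lambda^k}^\dagger$ denotes the projection onto the linear space spanned by $\{\varphi_i,\ i\in\Lambda^k\}$, $r_1^k=(I-P_k)\Phi x$ is the signal part of the residual, and $r_2^k=(I-P_k)e$ is the noise part of the residual. Denote

$$T^k=\Lambda^k\cap T,\qquad A^k=T\setminus\Lambda^k,\qquad B^k=T\cup\Lambda^k.$$

Suppose that $S^k\subset\Omega\setminus B^k$, $|S^k|=s$, is such that

$$\min_{i\in S^k}|\langle r^k,\varphi_i\rangle|\ge\max_{i\in(\Omega\setminus B^k)\setminus S^k}|\langle r^k,\varphi_i\rangle|.$$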
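The splitting of the residual into a signal part and a noise part can be reproduced numerically. The sketch below is illustrative only: it assumes the projection is formed with `numpy.linalg.pinv`, and the function name `residual_parts` is introduced here.

```python
import numpy as np

def residual_parts(Phi, x, e, support):
    """Return (r, r1, r2): full residual, signal part, and noise part."""
    Phi_L = Phi[:, support]
    P = Phi_L @ np.linalg.pinv(Phi_L)   # projection onto span{phi_i : i in support}
    I = np.eye(Phi.shape[0])
    y = Phi @ x + e
    r = (I - P) @ y                     # residual of the least-squares fit
    r1 = (I - P) @ (Phi @ x)            # signal part, zero once supp(x) is covered
    r2 = (I - P) @ e                    # noise part, never longer than e itself
    return r, r1, r2
```

When `support` contains the whole support of `x`, the signal part vanishes and only the projected noise remains, which is the observation behind the stopping rules.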

The difference between OMMP($s$) and the standard orthogonal matching pursuit (OMP) is that at each step of OMMP($s$), $s$ elements are selected from the dictionary simultaneously instead of one. To obtain a sufficient condition guaranteeing that OMMP($s$) selects at least one correct index at each iteration, we need a lower bound on $M_{A^k}-m_{S^k}$. We therefore provide the following lemma, which plays an important role in our analysis.

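Lemma 2.3 Let $m_{S^k}:=\min_{i\in S^k}\{|\langle r_1^k,\varphi_i\rangle|\}$ and $M_{A^k}:=\max_{i\in A^k}\{|\langle r_1^k,\varphi_i\rangle|\}$. Assume $A^k\ne\emptyset$. Then

$$m_{S^k}\le\frac{\delta_{sK}}{\sqrt{s}\,(1-\delta_{sK})}\|x_{A^k}\|_2 \tag{2.7}$$

and

$$M_{A^k}\ge\frac{1-\delta_{sK}}{\sqrt{|A^k|}}\|x_{A^k}\|_2. \tag{2.8}$$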

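Proof Using the fact that $(I-P_k)\Phi_{\Lambda^k}=\Phi_{\Lambda^k}-\Phi_{\Lambda^k}(\Phi_{\Lambda^k}^*\Phi_{\Lambda^k})^{-1}\Phi_{\Lambda^k}^*\Phi_{\Lambda^k}=0$, we express $r_1^k=(I-P_k)\Phi x$ as

$$\begin{aligned}
r_1^k&=(I-P_k)\Phi_Tx_T=(I-P_k)(\Phi_{T^k}x_{T^k}+\Phi_{A^k}x_{A^k})\\
&=(I-P_k)\Phi_{A^k}x_{A^k}=(I-\Phi_{\Lambda^k}\Phi_{\Lambda^k}^\dagger)\Phi_{A^k}x_{A^k}\\
&=\Phi_{A^k}x_{A^k}-\Phi_{\Lambda^k}\Phi_{\Lambda^k}^\dagger\Phi_{A^k}x_{A^k}.
\end{aligned}\tag{2.9}$$

Since $\Lambda^k\cap A^k=\emptyset$, applying (2.5) and (2.4) of Lemma 2.1 together with the fact that, for any two integers $K\le K'$,

$$\delta_K\le\delta_{K'}, \tag{2.10}$$

we obtain

$$\begin{aligned}
\|\Phi_{\Lambda^k}^\dagger\Phi_{A^k}x_{A^k}\|_2&=\|(\Phi_{\Lambda^k}^*\Phi_{\Lambda^k})^{-1}\Phi_{\Lambda^k}^*\Phi_{A^k}x_{A^k}\|_2\\
&\le\frac{1}{1-\delta_{sk}}\|\Phi_{\Lambda^k}^*\Phi_{A^k}x_{A^k}\|_2\\
&\le\frac{\delta_{sk+|A^k|}}{1-\delta_{sk}}\|x_{A^k}\|_2\le\frac{\delta_{sK}}{1-\delta_{sK}}\|x_{A^k}\|_2.
\end{aligned}\tag{2.11}$$

By (2.9), (2.11), and Lemma 2.1, we have

$$\begin{aligned}
\|\Phi_{S^k}^*r_1^k\|_2&=\|\Phi_{S^k}^*(\Phi_{A^k}x_{A^k}-\Phi_{\Lambda^k}\Phi_{\Lambda^k}^\dagger\Phi_{A^k}x_{A^k})\|_2\\
&\le\|\Phi_{S^k}^*\Phi_{A^k}x_{A^k}\|_2+\|\Phi_{S^k}^*\Phi_{\Lambda^k}\Phi_{\Lambda^k}^\dagger\Phi_{A^k}x_{A^k}\|_2\\
&\le\delta_{s+|A^k|}\|x_{A^k}\|_2+\delta_{s(k+1)}\|\Phi_{\Lambda^k}^\dagger\Phi_{A^k}x_{A^k}\|_2\\
&\le\delta_{sK}\|x_{A^k}\|_2+\delta_{sK}\frac{\delta_{sK}}{1-\delta_{sK}}\|x_{A^k}\|_2\\
&=\frac{\delta_{sK}}{1-\delta_{sK}}\|x_{A^k}\|_2.
\end{aligned}\tag{2.12}$$

Thus

$$m_{S^k}\le\frac{1}{\sqrt{s}}\|\Phi_{S^k}^*r_1^k\|_2\le\frac{\delta_{sK}}{\sqrt{s}\,(1-\delta_{sK})}\|x_{A^k}\|_2.$$

Using (2.9) and Lemma 2.2, we obtain

$$\begin{aligned}
\|\Phi_{A^k}^*r_1^k\|_2&=\|\Phi_{A^k}^*(I-\Phi_{\Lambda^k}\Phi_{\Lambda^k}^\dagger)\Phi_{A^k}x_{A^k}\|_2\\
&=\|\Phi_{T\setminus\Lambda^k}^*(I-\Phi_{\Lambda^k}\Phi_{\Lambda^k}^\dagger)\Phi_{T\setminus\Lambda^k}x_{T\setminus\Lambda^k}\|_2\\
&\ge(1-\delta_{sK})\|x_{T\setminus\Lambda^k}\|_2=(1-\delta_{sK})\|x_{A^k}\|_2.
\end{aligned}\tag{2.13}$$

Hence

$$M_{A^k}\ge\frac{1}{\sqrt{|A^k|}}\|\Phi_{A^k}^*r_1^k\|_2\ge\frac{1-\delta_{sK}}{\sqrt{|A^k|}}\|x_{A^k}\|_2.$$

Therefore Lemma 2.3 is proved.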

2.2 $\ell_2$ bounded noise

We now discuss the performance of OMMP($s$) in the $\ell_2$ bounded noise case, that is, the error vector $e$ is bounded in $\ell_2$ norm with $\|e\|_2\le b_2$ for some constant $b_2>0$. With a suitable stopping rule and a reasonable condition on the minimum magnitude of the nonzero coordinates of the $K$-sparse signal $x$, OMMP($s$) can recover the support of $x$.

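Theorem 2.1 Suppose that $\|e\|_2\le b_2$ and $\Phi$ satisfies the RIP of order $sK$ with the isometry constant $\delta:=\delta_{sK}<1+\frac12\sqrt{\frac{K}{s}}-\frac12\sqrt{\frac{K}{s}+4\sqrt{\frac{K}{s}}}$. Then OMMP($s$) with the stopping rule $\|r^k\|_2\le b_2$ recovers the support of any $K$-sparse signal $x\in\mathbb{R}^N$ from $y=\Phi x+e$ provided that all the nonzero components $x_i$ satisfy

$$|x_i|>\frac{2b_2}{(1-\delta)-\sqrt{\frac{K}{s}}\,\frac{\delta}{1-\delta}}. \tag{2.14}$$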
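To get a feel for condition (2.14), the right-hand side can be evaluated for concrete parameters. The helper below is a sketch under the assumed reading of (2.14) as $2b_2/\bigl((1-\delta)-\sqrt{K/s}\,\delta/(1-\delta)\bigr)$; the name `min_magnitude_l2` is hypothetical, and the function raises an error when the denominator is not positive, that is, when the isometry condition of the theorem fails.

```python
import math

def min_magnitude_l2(b2, delta, K, s):
    """Right-hand side of (2.14): lower bound required on every nonzero |x_i|."""
    t = math.sqrt(K / s)
    denom = (1.0 - delta) - t * delta / (1.0 - delta)
    if denom <= 0.0:
        raise ValueError("delta violates the isometry condition for this K and s")
    return 2.0 * b2 / denom
```

The threshold grows as `delta` approaches the critical value and shrinks toward `2 * b2` as `delta` tends to zero, so a better-conditioned matrix tolerates smaller signal entries.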

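Proof Denote $E_k=\max_{i\in\Omega}\{|\langle r_2^k,\varphi_i\rangle|\}$. Recalling the definitions of $m_{S^k}$ and $M_{A^k}$ in Lemma 2.3, we have

$$\min_{i\in S^k}|\langle r^k,\varphi_i\rangle|\le\min_{i\in S^k}|\langle r_1^k,\varphi_i\rangle|+\max_{i\in\Omega}|\langle r_2^k,\varphi_i\rangle|=m_{S^k}+E_k$$

and

$$\max_{i\in A^k}|\langle r^k,\varphi_i\rangle|\ge\max_{i\in A^k}|\langle r_1^k,\varphi_i\rangle|-\max_{i\in\Omega}|\langle r_2^k,\varphi_i\rangle|=M_{A^k}-E_k.$$

It is clear that in order for OMMP($s$) to select at least one correct index at this step, it is necessary to have

$$\max_{i\in A^k}|\langle r^k,\varphi_i\rangle|>\min_{i\in S^k}|\langle r^k,\varphi_i\rangle|.$$

A sufficient condition guaranteeing that OMMP($s$) selects at least one correct index at each iteration until all correct indices are selected is

$$M_{A^k}-E_k>m_{S^k}+E_k.$$

That is,

$$2E_k<M_{A^k}-m_{S^k} \tag{2.15}$$

holds for $k=0,1,\dots,K-1$. We first provide an upper bound for $E_k$. The assumption $\|e\|_2\le b_2$ yields

$$E_k=\max_{i\in\Omega}|\langle r_2^k,\varphi_i\rangle|\le\max_{i\in\Omega}\|\varphi_i\|_2\|r_2^k\|_2=\|(I-P_k)e\|_2\le\|e\|_2\le b_2. \tag{2.16}$$

Next we give a lower bound for $M_{A^k}-m_{S^k}$. Using Lemma 2.3 and assumption (2.14), we have

$$\begin{aligned}
M_{A^k}-m_{S^k}&\ge\Bigl(\frac{1-\delta}{\sqrt{|A^k|}}-\frac{\delta}{\sqrt{s}\,(1-\delta)}\Bigr)\|x_{A^k}\|_2\\
&\ge\Bigl(\frac{1-\delta}{\sqrt{|A^k|}}-\frac{\delta}{\sqrt{s}\,(1-\delta)}\Bigr)\sqrt{|A^k|}\,\frac{2b_2}{(1-\delta)-\sqrt{\frac{K}{s}}\,\frac{\delta}{1-\delta}}\\
&>2b_2.
\end{aligned}$$

This means that OMMP($s$) will select at least one correct index at the $(k+1)$-th iteration, and OMMP($s$) can recover the support of $x$ in at most $K$ iterations. It remains to show that under the stopping rule $\|r^k\|_2\le b_2$, OMMP($s$) stops exactly when all the correct indices are selected.

First, assume that $A^k=\emptyset$. Then $T\subset\Lambda^k$ and $(I-P_k)\Phi x=0$. Thus

$$\|r^k\|_2=\|(I-P_k)\Phi x+(I-P_k)e\|_2=\|(I-P_k)e\|_2\le\|e\|_2\le b_2.$$

So when all the $K$ correct indices are selected, the $\ell_2$ norm of the residual is at most $b_2$ and hence OMMP($s$) stops. Now we show that OMMP($s$) does not stop early. Suppose that OMMP($s$) has run $k$ steps for some $k<K$. We will verify that $\|r^k\|_2>b_2$, so OMMP($s$) does not stop at the current step.

Secondly, assume that $A^k\ne\emptyset$. Then

$$\begin{aligned}
\|r^k\|_2&=\|(I-P_k)\Phi x+(I-P_k)e\|_2\\
&\ge\|(I-P_k)\Phi x\|_2-\|(I-P_k)e\|_2\\
&=\|\Phi_{A^k}x_{A^k}-\Phi_{\Lambda^k}\Phi_{\Lambda^k}^\dagger\Phi_{A^k}x_{A^k}\|_2-\|(I-P_k)e\|_2\\
&=\|\Phi_{B^k}x^{(k)}\|_2-\|(I-P_k)e\|_2\\
&\ge\|\Phi_{B^k}x^{(k)}\|_2-\|e\|_2\\
&\ge\sqrt{1-\delta_{|B^k|}}\,\|x^{(k)}\|_2-b_2\\
&\ge\sqrt{1-\delta}\,\|x_{A^k}\|_2-b_2\\
&\ge\sqrt{1-\delta}\,\sqrt{|A^k|}\,\frac{2b_2}{(1-\delta)-\sqrt{\frac{K}{s}}\,\frac{\delta}{1-\delta}}-b_2\\
&>b_2,
\end{aligned}$$

where $x^{(k)}=\begin{pmatrix}x_{A^k}\\-\Phi_{\Lambda^k}^\dagger\Phi_{A^k}x_{A^k}\end{pmatrix}$, and the third inequality follows from the RIP condition. Hence OMMP($s$) does not stop early, and the theorem is proved.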

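2.3 $\ell_\infty$ bounded noise

We now give RIP-based sufficient conditions for OMMP($s$) with $\ell_\infty$ bounded noise. Our result is the following theorem.

Theorem 2.2 Suppose that $\|\Phi^*e\|_\infty\le b_\infty$ and $\Phi$ satisfies the RIP of order $sK$ with the same isometry constant as in Theorem 2.1. Then OMMP($s$) with the stopping rule

$$\|\Phi^*r^k\|_\infty\le\Bigl(1+\frac{\sqrt{sK}\,\delta}{1-\delta}\Bigr)b_\infty$$

finds the support of any $K$-sparse signal $x$ if all the nonzero coefficients $x_i$ satisfy

$$|x_i|>\frac{2\bigl(1+\frac{\sqrt{sK}\,\delta}{1-\delta}\bigr)b_\infty}{(1-\delta)-\sqrt{\frac{K}{s}}\,\frac{\delta}{1-\delta}}. \tag{2.17}$$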

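Proof Denote $E_k=\|\Phi^*(I-P_k)e\|_\infty=\max_{i\in\Omega}|\langle r_2^k,\varphi_i\rangle|$. We first prove that

$$2E_k<M_{A^k}-m_{S^k} \tag{2.18}$$

holds for $k=0,1,\dots,K-1$, where $m_{S^k}$ and $M_{A^k}$ are defined as in Lemma 2.3. It follows from the proof of Theorem 2.1 that inequality (2.18) ensures that OMMP recovers the true support of $x$ in at most $K$ steps.

We first provide an upper bound for $E_k$. The assumption $\|\Phi^*e\|_\infty\le b_\infty$ and the fact $\Phi_{\Lambda^k}^*(I-P_k)e=0$ yield

$$\begin{aligned}
E_k&=\max_{i\in\Omega}|\langle r_2^k,\varphi_i\rangle|=\max_{i\in\Omega\setminus\Lambda^k}|\varphi_i^*(I-P_k)e|\\
&\le\|\Phi_{\Omega\setminus\Lambda^k}^*e\|_\infty+\|\Phi_{\Omega\setminus\Lambda^k}^*P_ke\|_\infty\\
&\le b_\infty+\max_{i\in\Omega\setminus\Lambda^k}\|\varphi_i^*\Phi_{\Lambda^k}(\Phi_{\Lambda^k}^*\Phi_{\Lambda^k})^{-1}\Phi_{\Lambda^k}^*e\|_2\\
&\le b_\infty+\delta\,\|(\Phi_{\Lambda^k}^*\Phi_{\Lambda^k})^{-1}\Phi_{\Lambda^k}^*e\|_2\\
&\le b_\infty+\frac{\delta}{1-\delta}\|\Phi_{\Lambda^k}^*e\|_2\\
&\le b_\infty+\frac{\delta}{1-\delta}\sqrt{sk}\,b_\infty\\
&\le\Bigl(1+\frac{\delta\sqrt{sK}}{1-\delta}\Bigr)b_\infty.
\end{aligned}\tag{2.19}$$

Here, to derive the third and fourth inequalities, we have used (2.4) and (2.5) of Lemma 2.1, respectively.

On the other hand, we can obtain a lower bound for $M_{A^k}-m_{S^k}$. By Lemma 2.3 and assumption (2.17), we have

$$\begin{aligned}
M_{A^k}-m_{S^k}&\ge\Bigl(\frac{1-\delta}{\sqrt{|A^k|}}-\frac{\delta}{\sqrt{s}\,(1-\delta)}\Bigr)\|x_{A^k}\|_2\\
&\ge\Bigl(\frac{1-\delta}{\sqrt{|A^k|}}-\frac{\delta}{\sqrt{s}\,(1-\delta)}\Bigr)\sqrt{|A^k|}\,\frac{2\bigl(1+\frac{\sqrt{sK}\,\delta}{1-\delta}\bigr)b_\infty}{(1-\delta)-\sqrt{\frac{K}{s}}\,\frac{\delta}{1-\delta}}\\
&>2\Bigl(1+\frac{\sqrt{sK}\,\delta}{1-\delta}\Bigr)b_\infty.
\end{aligned}\tag{2.20}$$

The bounds (2.19) and (2.20) imply that (2.18) holds for $k=0,1,\dots,K-1$. It remains to show that OMMP($s$) stops exactly when all the correct indices are selected under the stopping rule $\|\Phi^*r^k\|_\infty\le\bigl(1+\frac{\sqrt{sK}\,\delta_{sK}}{1-\delta_{sK}}\bigr)b_\infty$.

First, assume that $A^k=\emptyset$. Then $T\subset\Lambda^k$ and $(I-P_k)\Phi x=0$. Thus

$$\|\Phi^*r^k\|_\infty=\|\Phi^*(I-P_k)(\Phi x+e)\|_\infty=\|\Phi^*(I-P_k)e\|_\infty\le\Bigl(1+\frac{\delta\sqrt{sK}}{1-\delta}\Bigr)b_\infty,$$

so the stopping rule is met and OMMP($s$) stops.

Secondly, assume that $A^k\ne\emptyset$. Then

$$\begin{aligned}
\|\Phi^*r^k\|_\infty&=\|\Phi^*(I-P_k)(\Phi x+e)\|_\infty\\
&\ge\|\Phi^*(I-P_k)\Phi x\|_\infty-\|\Phi^*(I-P_k)e\|_\infty\\
&\ge\max_{i\in A^k}|\langle(I-P_k)\Phi x,\varphi_i\rangle|-\|\Phi^*(I-P_k)e\|_\infty\\
&\ge\frac{1-\delta}{\sqrt{|A^k|}}\|x_{A^k}\|_2-\Bigl(1+\frac{\delta\sqrt{sK}}{1-\delta}\Bigr)b_\infty\\
&\ge\frac{1-\delta}{\sqrt{|A^k|}}\sqrt{|A^k|}\,\frac{2\bigl(1+\frac{\sqrt{sK}\,\delta}{1-\delta}\bigr)b_\infty}{(1-\delta)-\sqrt{\frac{K}{s}}\,\frac{\delta}{1-\delta}}-\Bigl(1+\frac{\delta\sqrt{sK}}{1-\delta}\Bigr)b_\infty\\
&=\frac{(1-\delta)\,2\bigl(1+\frac{\sqrt{sK}\,\delta}{1-\delta}\bigr)b_\infty}{(1-\delta)-\sqrt{\frac{K}{s}}\,\frac{\delta}{1-\delta}}-\Bigl(1+\frac{\delta\sqrt{sK}}{1-\delta}\Bigr)b_\infty\\
&>\Bigl(1+\frac{\delta\sqrt{sK}}{1-\delta}\Bigr)b_\infty.
\end{aligned}$$

Here, to derive the third and fourth inequalities, we have used (2.8) of Lemma 2.3 and assumption (2.17), respectively. Hence OMMP($s$) does not stop early, and the theorem is proved.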

2.4 Gaussian noise

As an application of the above results for the $\ell_2$ and $\ell_\infty$ bounded noise cases, we discuss the performance of OMMP($s$) in recovering $K$-sparse signals under Gaussian noise. Gaussian noise is of particular interest in statistics since it is arguably the best model of real noise when the noise source is particularly complex; cf. [19,20].

The motivation for applying the results on the $\ell_2$ bounded and $\ell_\infty$ bounded noise cases to the Gaussian noise case comes from the following results of Cai, Xu, and Zhang [21], which show that the Gaussian noise $e$ is essentially bounded in both the $\ell_2$ and $\ell_\infty$ norms. They showed that if $e$ is zero-mean white Gaussian noise with covariance $\sigma^2I_{M\times M}$, that is, $e\sim N(0,\sigma^2I_{M\times M})$, then

$$P\Bigl(e\in\bigl\{e:\|e\|_2\le\sigma\sqrt{M+2\sqrt{M\log M}}\bigr\}\Bigr)\ge1-\frac{1}{M}$$

and

$$P\Bigl(e\in\bigl\{e:\|\Phi^*e\|_\infty\le\sigma\sqrt{2\log N}\bigr\}\Bigr)\ge1-\frac{1}{2\sqrt{\pi\log N}}.$$
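The first of these probability bounds is easy to probe by simulation. The sketch below is a Monte Carlo illustration, not part of the cited result: the function name `l2_bound_coverage`, the number of trials, and the seed are assumptions made here.

```python
import numpy as np

def l2_bound_coverage(M, sigma, trials=2000, seed=0):
    """Fraction of draws e ~ N(0, sigma^2 I_M) with ||e||_2 below the stated bound."""
    rng = np.random.default_rng(seed)
    e = sigma * rng.standard_normal((trials, M))
    bound = sigma * np.sqrt(M + 2.0 * np.sqrt(M * np.log(M)))
    return float(np.mean(np.linalg.norm(e, axis=1) <= bound))
```

The empirical fraction can be compared with the guaranteed level $1-1/M$; being a finite-sample estimate, it fluctuates slightly from seed to seed.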

Combining the above results with Theorem 2.1 and Theorem 2.2, we immediately obtain the following theorems.

Theorem 2.3 Suppose that $e\sim N(0,\sigma^2I_{M\times M})$, that $\Phi$ satisfies the RIP of order $sK$ with the same isometry constant as in Theorem 2.1, and that the nonzero components $x_i$ satisfy

$$|x_i|>\frac{2\sigma\sqrt{M+2\sqrt{M\log M}}}{(1-\delta)-\sqrt{\frac{K}{s}}\,\frac{\delta}{1-\delta}}.$$

Then OMMP($s$) with the stopping rule $\|r^k\|_2\le\sigma\sqrt{M+2\sqrt{M\log M}}$ finds the support of any $K$-sparse signal $x\in\mathbb{R}^N$ from $y=\Phi x+e$ with probability at least $1-\frac{1}{M}$.

Theorem 2.4 Suppose that $e\sim N(0,\sigma^2I_{M\times M})$, that $\Phi$ satisfies the RIP of order $sK$ with the same isometry constant as in Theorem 2.1, and that the nonzero components $x_i$ satisfy

$$|x_i|>\frac{2\bigl(1+\frac{\sqrt{sK}\,\delta}{1-\delta}\bigr)\sigma\sqrt{2\log N}}{(1-\delta)-\sqrt{\frac{K}{s}}\,\frac{\delta}{1-\delta}}.$$

Then OMMP($s$) with the stopping rule $\|\Phi^*r^k\|_\infty\le\bigl(1+\frac{\sqrt{sK}\,\delta}{1-\delta}\bigr)\sigma\sqrt{2\log N}$ finds the support of any $K$-sparse signal $x\in\mathbb{R}^N$ from $y=\Phi x+e$ with probability at least $1-\frac{1}{2\sqrt{\pi\log N}}$.

Theorem 2.3 and Theorem 2.4 show that, with a suitable stopping rule and a reasonable condition on the minimum magnitude of the nonzero coordinates of the $K$-sparse signal $x$, OMMP($s$) can recover the support of $x$ in at most $K$ iterations with high probability from measurements with Gaussian noise.

3 Error performance of the OSGA

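In this section, we study the error performance of the OSGA in the general context. Let $H$ be a real separable Hilbert space with an inner product $\langle\cdot,\cdot\rangle$ and the norm $\|x\|:=\langle x,x\rangle^{1/2}$ for all $x\in H$. Recall that a set $\mathcal{D}$ from $H$ is called a dictionary if

$$\mathcal{D}=\{\varphi_i,\ i\in I\}\subset H\quad\text{and}\quad\overline{\operatorname{span}}\,\mathcal{D}=H.$$

Without loss of generality we shall assume that the dictionary $\mathcal{D}$ is normalized, i.e.,

$$\|\varphi_i\|=1,\qquad i\in I.$$

The standard measure of the approximation power of a dictionary is the error of the best $m$-term approximation. Given a dictionary $\mathcal{D}$, we define the $m$-sparse class as

$$\Sigma_m:=\Sigma_m(\mathcal{D}):=\Bigl\{\sum_{j\in\Lambda}c_j\varphi_j:\ c_j\in\mathbb{R},\ \varphi_j\in\mathcal{D},\ \#(\Lambda)=m\Bigr\}.$$

The error of the best $m$-term approximation to a function $f\in H$ from the dictionary $\mathcal{D}$ is defined as

$$\sigma_m(f):=\sigma_m(f,\mathcal{D}):=\inf_{g\in\Sigma_m}\|f-g\|. \tag{3.1}$$

Now we use OSGA($s$) to generate an $m$-term approximant $f_m$ and estimate the error $\|f-f_m\|$ in terms of $\sigma_m(f)$.

So let us first recall the definition of the OSGA($s$) (see Algorithm 2).

To investigate the error performance of OSGA, one needs to make some assumptions on the dictionary $\mathcal{D}$. A simple and useful assumption is the coherence $\mu$ of a dictionary $\mathcal{D}$, defined by

$$\mu:=\mu(\mathcal{D}):=\sup_{\varphi,\psi\in\mathcal{D},\ \varphi\ne\psi}|\langle\varphi,\psi\rangle|.$$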

Algorithm 2 Orthogonal Super Greedy Algorithm (OSGA($s$))

Suppose we are given an integer $s$ and a target element $r_0:=f\in H$. For each $m\ge1$, we inductively define:

(1) $\varphi_{i_{(m-1)s+1}},\dots,\varphi_{i_{ms}}\in\mathcal{D}$ are elements of the dictionary $\mathcal{D}$ satisfying the following inequality. Denote $I_m:=\{i_{(m-1)s+1},\dots,i_{ms}\}$ and assume that

$$\min_{i\in I_m}|\langle r_{m-1},\varphi_i\rangle|\ge\sup_{\varphi\in\mathcal{D},\ \varphi\ne\varphi_i,\ i\in I_m}|\langle r_{m-1},\varphi\rangle|.$$

(2) Let $H_m:=H_m(f):=\operatorname{span}(\varphi_{i_1},\dots,\varphi_{i_{ms}})$, and let $P_{H_m}$ denote the operator of orthogonal projection onto $H_m$. Define

$$f_m:=G_m(f):=\mathrm{OSGA}_m(f,\mathcal{D}):=P_{H_m}(f).$$

(3) Define the residual after the $m$th iteration of the algorithm:

$$r_m:=r_m^s(f):=f-f_m.$$

Dictionaries with coherence $\mu$ are called $\mu$-coherent. It was proved in [8] that for $\mu$-coherent dictionaries the OSGA provides the same (in the sense of order) upper bound for the error on the sparse class as the OGA. In this paper, we improve this result under a weaker assumption on the dictionary. This assumption is the RIP, which was formulated in a finite-dimensional context in Sect. 2. Here we require the definition of the RIP in terms of the dictionary instead of the measurement matrix, in an infinite-dimensional context:

A dictionary $\mathcal{D}$ is said to satisfy the RIP of order $k$ if there exists a constant $\delta\in(0,1)$ such that, for any subset $J\subset I$ with $\#(J)\le k$ and any scalars $a_i$, $i\in J$, the following inequalities hold:

$$(1-\delta)\sum_{i\in J}|a_i|^2\le\Bigl\|\sum_{i\in J}a_i\varphi_i\Bigr\|^2\le(1+\delta)\sum_{i\in J}|a_i|^2. \tag{3.2}$$

As before, the minimum of all $\delta$ satisfying (3.2) is referred to as the isometry constant $\delta_k$.

It is known from [22] that if the dictionary $\mathcal{D}$ has coherence $\mu$, then it satisfies the RIP of order $k$ for $k<\mu^{-1}+1$ with isometry constant $\delta_k\le\mu(k-1)$, but not vice versa.
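For a small finite dictionary, both the coherence and the isometry constant can be computed by brute force, which makes the relation $\delta_k\le\mu(k-1)$ concrete. The sketch below assumes the dictionary is stored as the unit-norm columns of a matrix `D`; the function names are introduced here, and the exhaustive search over subsets is only feasible for tiny examples.

```python
import itertools
import numpy as np

def coherence(D):
    """Largest absolute inner product between two distinct unit-norm columns."""
    G = np.abs(D.T @ D)
    np.fill_diagonal(G, 0.0)
    return float(G.max())

def rip_constant(D, k):
    """Brute-force delta_k: worst eigenvalue deviation over all k-column Gram matrices.

    Subsets of exactly k columns suffice, because the eigenvalues of a smaller
    Gram matrix interlace those of any larger one containing it.
    """
    n = D.shape[1]
    delta = 0.0
    for J in itertools.combinations(range(n), k):
        cols = D[:, list(J)]
        eig = np.linalg.eigvalsh(cols.T @ cols)   # ascending eigenvalues
        delta = max(delta, 1.0 - eig[0], eig[-1] - 1.0)
    return float(delta)
```

For $k=2$ the two quantities coincide, since a $2\times2$ Gram matrix with off-diagonal entry $c$ has eigenvalues $1\pm c$.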

Recently, under the RIP assumption, the authors in [7] proved the almost optimality of OGA by establishing the corresponding Lebesgue-type inequality. This inequality quan-tifies the efficiency of OGA for individual elements. It is natural to ask if we can establish Lebesgue-type inequality for OSGA(s). We give a positive answer to this question. In fact, we prove the following inequality for the OSGA(s) when the dictionaryDsatisfies RIP. This is the first Lebesgue-type inequality for the super greedy type algorithms obtained so far as we know.

Theorem 3.1 Given a parameter $s\in\mathbb{N}$, there exist fixed constants $A:=88$, $\delta^*:=1/10$, $C:=8$ such that, for every $n\ge0$, every dictionary $\mathcal{D}$ satisfying the RIP of order $(As+1)n$ with $\delta_{(As+1)n}\le\delta^*$, and every target element $f\in H$, we have

$$\|r_{An}\|=\|f-f_{An}\|\le C\sigma_n(f,\mathcal{D}).$$

In Theorem 3.1, we remark that the values of $A$, $\delta^*$, $C$ for which the above result holds are coupled. For example, it is possible to obtain a smaller value of $C$ at the price of a larger value of $A$ or of a smaller value of $\delta^*$.

Based on this inequality, we will derive the convergence rate of OSGA on the sparse class induced by dictionaries satisfying RIP conditions.

The rest of this section is organized as follows. Sections 3.1 and 3.2 are devoted to the proof of Theorem 3.1. In Sect. 3.3, as an application of Theorem 3.1, we study the rate of convergence of OSGA on the sparse class induced by RIP dictionaries.

3.1 Reduction of the residual

To prove Theorem 3.1, we establish the following lemma (Lemma 3.1), which quantifies the reduction of the residuals generated by the OSGA($s$) under the RIP condition. When $s=1$, this lemma was proved in [7]. However, the case $s>1$ requires some new techniques. In what follows, we denote by

$$\Lambda_k:=\bigcup_{i=1}^kI_i=\{i_1,\dots,i_{sk}\}$$

the set of indices selected after $k$ iterations of the OSGA($s$) applied to the target element $f\in H$.

Lemma 3.1 Let $(f_k)_{k\ge0}$ be the sequence of approximations generated by the OSGA($s$) applied to $f$, and let $g=\sum_{i\in T}z_i\varphi_i$, $\#(T)\le n$. Then, if $T$ is not contained in $\Lambda_k$, we have

$$\|r_{k+1}\|^2\le\|r_k\|^2-\frac{1-\delta_{\#(T\cup\Lambda_k)}}{(1+\delta_s)\,\#(T\setminus\Lambda_k)}\bigl(\|r_k\|^2-\|f-g\|^2\bigr). \tag{3.3}$$

Lemma 3.1 quantifies the reduction of $\|r_k\|$ at each iteration provided that $T$ is not contained in $\Lambda_k$ and that $\|r_k\|\ge\|f-g\|$. Note that in the case when $T\subset\Lambda_k$, we have $\|r_k\|\le\|f-g\|$.

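Proof We may assume that $\|r_k\|\ge\|f-g\|$; otherwise inequality (3.3) is trivially satisfied. Denote

$$F_{k+1}:=\operatorname{span}(\varphi_i,\ i\in I_{k+1}).$$

Then $H_{k+1}$ is a direct sum of $H_k$ and $F_{k+1}$. Therefore,

$$\begin{aligned}
r_{k+1}&=f-f_{k+1}=r_k+f_k-f_{k+1}\\
&=r_k+f_k-P_{H_{k+1}}(f)\\
&=r_k+f_k-P_{H_{k+1}}(r_k+f_k)\\
&=r_k-P_{H_{k+1}}(r_k).
\end{aligned}$$

It is clear that $F_{k+1}\subset H_{k+1}$ implies

$$\|r_{k+1}\|^2=\|r_k-P_{H_{k+1}}(r_k)\|^2\le\|r_k-P_{F_{k+1}}(r_k)\|^2=\|r_k\|^2-\|P_{F_{k+1}}(r_k)\|^2. \tag{3.4}$$

Now we estimate $\|P_{F_{k+1}}(r_k)\|$. Since the dictionary $\mathcal{D}$ satisfies the RIP of order $s$ with isometry constant $\delta_s$, we have

$$\Bigl\|\sum_{i\in I_{k+1}}a_i\varphi_i\Bigr\|^2\le(1+\delta_s)\sum_{i\in I_{k+1}}|a_i|^2. \tag{3.5}$$

Thus

$$\begin{aligned}
\|P_{F_{k+1}}(r_k)\|&=\sup_{\psi\in F_{k+1},\ \|\psi\|\le1}|\langle P_{F_{k+1}}(r_k),\psi\rangle|\\
&=\sup_{(c_i)_{i\in I_{k+1}},\ \|\sum_{i\in I_{k+1}}c_i\varphi_i\|\le1}\Bigl|\sum_{i\in I_{k+1}}\langle r_k,\varphi_i\rangle c_i\Bigr|\\
&\ge\sup_{(c_i)_{i\in I_{k+1}},\ \sum_{i\in I_{k+1}}|c_i|^2\le(1+\delta_s)^{-1}}\Bigl|\sum_{i\in I_{k+1}}\langle r_k,\varphi_i\rangle c_i\Bigr|\\
&=(1+\delta_s)^{-1/2}\Bigl(\sum_{i\in I_{k+1}}|\langle r_k,\varphi_i\rangle|^2\Bigr)^{1/2}.
\end{aligned}\tag{3.6}$$

Using inequality (3.6), we can continue to estimate (3.4) as

$$\|r_{k+1}\|^2\le\|r_k\|^2-(1+\delta_s)^{-1}\sum_{i\in I_{k+1}}|\langle r_k,\varphi_i\rangle|^2\le\|r_k\|^2-(1+\delta_s)^{-1}\max_{i\in I_{k+1}}|\langle r_k,\varphi_i\rangle|^2.$$

Therefore, to prove inequality (3.3), it suffices to prove that

$$(1+\delta_s)^{-1}\max_{i\in I_{k+1}}|\langle r_k,\varphi_i\rangle|^2\ge\frac{1-\delta_{\#(T\cup\Lambda_k)}}{(1+\delta_s)\,\#(T\setminus\Lambda_k)}\bigl(\|r_k\|^2-\|f-g\|^2\bigr). \tag{3.7}$$

To prove this, we first note that

$$\begin{aligned}
2\bigl(\|r_k\|^2-\|f-g\|^2\bigr)^{1/2}\|g-f_k\|&\le\|r_k\|^2-\|f-g\|^2+\|g-f_k\|^2\\
&=\|r_k\|^2-\|g-f_k-r_k\|^2+\|g-f_k\|^2\\
&\le2|\langle g-f_k,r_k\rangle|=2|\langle g,r_k\rangle|.
\end{aligned}$$

This inequality can be written as

$$\|r_k\|^2-\|f-g\|^2\le\frac{|\langle g,r_k\rangle|^2}{\|g-f_k\|^2}. \tag{3.8}$$

If we write $f_k=\sum_{i\in\Lambda_k}c_i^k\varphi_i$ and $\delta:=\delta_{\#(T\cup\Lambda_k)}$, then by the RIP the denominator of the right-hand side of inequality (3.8) satisfies

$$\begin{aligned}
\|g-f_k\|^2&=\Bigl\|\sum_{i\in T}z_i\varphi_i-\sum_{i\in\Lambda_k}c_i^k\varphi_i\Bigr\|^2\\
&=\Bigl\|\sum_{i\in T\setminus\Lambda_k}z_i\varphi_i+\sum_{i\in T\cap\Lambda_k}(z_i-c_i^k)\varphi_i-\sum_{i\in\Lambda_k\setminus T}c_i^k\varphi_i\Bigr\|^2\\
&\ge(1-\delta)\Bigl(\sum_{i\in T\setminus\Lambda_k}|z_i|^2+\sum_{i\in T\cap\Lambda_k}|z_i-c_i^k|^2+\sum_{i\in\Lambda_k\setminus T}|c_i^k|^2\Bigr)\\
&\ge(1-\delta)\sum_{i\in T\setminus\Lambda_k}|z_i|^2.
\end{aligned}$$

On the other hand, the numerator of the right-hand side of inequality (3.8) satisfies

$$\begin{aligned}
|\langle g,r_k\rangle|^2&=\Bigl|\Bigl\langle\sum_{i\in T}z_i\varphi_i,r_k\Bigr\rangle\Bigr|^2=\Bigl|\Bigl\langle\sum_{i\in T\setminus\Lambda_k}z_i\varphi_i,r_k\Bigr\rangle\Bigr|^2\\
&\le\Bigl(\sum_{i\in T\setminus\Lambda_k}|z_i|\,|\langle\varphi_i,r_k\rangle|\Bigr)^2\\
&\le\max_{i\in I_{k+1}}|\langle r_k,\varphi_i\rangle|^2\Bigl(\sum_{i\in T\setminus\Lambda_k}|z_i|\Bigr)^2\\
&\le\max_{i\in I_{k+1}}|\langle r_k,\varphi_i\rangle|^2\,\#(T\setminus\Lambda_k)\sum_{i\in T\setminus\Lambda_k}|z_i|^2.
\end{aligned}$$

Therefore we obtain

$$\|r_k\|^2-\|f-g\|^2\le\frac{\#(T\setminus\Lambda_k)\,\max_{i\in I_{k+1}}|\langle r_k,\varphi_i\rangle|^2}{1-\delta_{\#(T\cup\Lambda_k)}},$$

which implies (3.7). This completes the proof of Lemma 3.1.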

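The following proposition is an immediate consequence of Lemma 3.1.

Proposition 3.1 Assume that, for a given integer $A>0$ and $\delta^*<1$, a dictionary $\mathcal{D}$ satisfies the RIP of order $(As+1)n$ with $\delta_{(As+1)n}\le\delta^*$. If $g=\sum_{i\in T}z_i\varphi_i$, $\#(T)\le n$, then for any triple of non-negative integers $(j,m,L)$ such that $\#(T\setminus\Lambda_j)\le m$ and $j+mL\le(As+1)n$, we have

$$\|r_{j+mL}\|^2\le e^{-\frac{1-\delta^*}{1+\delta^*}L}\|r_j\|^2+\|f-g\|^2. \tag{3.9}$$

Proof By Lemma 3.1, if $g=\sum_{i\in T}z_i\varphi_i$, $\#(T)\le n$, then for any triple of non-negative integers $(j,m,L)$ such that $\#(T\setminus\Lambda_j)\le m$ and $j+mL\le(As+1)n$, we have

$$\begin{aligned}
\|r_{j+mL}\|^2-\|f-g\|^2&\le\Bigl(1-\frac{1-\delta^*}{(1+\delta^*)m}\Bigr)^{mL}\max\bigl\{0,\|r_j\|^2-\|f-g\|^2\bigr\}\\
&\le e^{-\frac{1-\delta^*}{1+\delta^*}L}\max\bigl\{0,\|r_j\|^2-\|f-g\|^2\bigr\},
\end{aligned}$$

where we have used the fact that $\#(T\setminus\Lambda_l)\le m$ for all $l\ge j$. This implies inequality (3.9).

This completes the proof of Proposition 3.1.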
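The passage from the geometric factor to the exponential factor in the proof rests on the elementary inequality $1-t\le e^{-t}$. A short worked version, written with the shorthand $c$ (a symbol introduced here, not in the original), reads:

```latex
% Elementary step: 1 - t <= e^{-t} for every real t, applied with t = c/m.
% Here c := (1-\delta^*)/(1+\delta^*) lies in (0,1) and m >= 1, so 0 <= 1 - c/m <= 1.
\[
  \Bigl(1-\frac{c}{m}\Bigr)^{mL}
  \le \bigl(e^{-c/m}\bigr)^{mL}
  = e^{-cL},
  \qquad c:=\frac{1-\delta^*}{1+\delta^*}.
\]
% With \delta^* = 1/10 one gets c = 9/11, hence e^{-36/11} for L = 4
% and e^{-72/11} for L = 8.
```

These two exponents are exactly the ones that appear when the proposition is applied with $L=4$ and $L=8$.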

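3.2 Lebesgue-type inequality for OSGA

We are now in a position to prove Theorem 3.1. Fix $f\in H$. We first observe that the assertion of the theorem can be derived from the following lemma.

Lemma 3.2 If $0\le k<n$ satisfies

$$\|r_{Ak}\|\le2\sigma_k(f),\qquad\sigma_k(f)>4\sigma_n(f), \tag{3.10}$$

then there exists $k<\tilde{k}\le n$ such that

$$\|r_{A\tilde{k}}\|\le2\sigma_{\tilde{k}}(f),\qquad\sigma_{\tilde{k}}(f)\le4\sigma_n(f). \tag{3.11}$$

Indeed, assuming that Lemma 3.2 holds, we complete the proof of Theorem 3.1 as follows. Let $\tilde{k}$ be the largest integer in $\{0,1,\dots,n\}$ for which $\|r_{A\tilde{k}}\|\le2\sigma_{\tilde{k}}(f)$. Since

$$\|r_0\|=\sigma_0(f)=\|f\|,$$

such $\tilde{k}$ exists. If $\tilde{k}<n$, then we must have $\sigma_{\tilde{k}}(f)\le4\sigma_n(f)$, and therefore

$$\|r_{An}\|\le\|r_{A\tilde{k}}\|\le2\sigma_{\tilde{k}}(f)\le8\sigma_n(f). \tag{3.12}$$

We are therefore left with proving Lemma 3.2. For this, we fix

$$\delta^*=\frac{1}{10} \tag{3.13}$$

and $0\le k<n$ such that (3.10) holds. Let $k<K\le n$ be the first integer such that $\sigma_k(f)>4\sigma_K(f)$.

By the definition of $\sigma_K(f)$, for any $B>1$ there exists $g=\sum_{j\in T}z_j\psi_j$, $\#(T)=K$, such that

$$\|f-g\|\le B\sigma_K(f).$$

The significance of $K$ is that on the one hand

$$\|f-g\|\le B\sigma_K(f)<\frac{B}{4}\sigma_k(f), \tag{3.14}$$

while on the other hand

$$\sigma_k(f)\le4\sigma_{K-1}(f). \tag{3.15}$$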

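To apply Proposition 3.1 for the above $g$ and $j=Ak$, we need to bound $\#(T\setminus\Lambda_{Ak})$ with $A$ yet to be specified. To this end, we write $K=k+M$ with $M>0$ and notice that if $S\subset T$ is any set with $\#(S)=M$ and $g_S:=\sum_{j\in S}z_j\psi_j$, then

$$\|g_S\|\ge\|f-(g-g_S)\|-\|f-g\|\ge\sigma_k(f)-B\sigma_K(f)\ge\Bigl(1-\frac{B}{4}\Bigr)\sigma_k(f), \tag{3.16}$$

where we have used the fact that $g-g_S\in\Sigma_k$. Using the RIP, we obtain the following lower bound for the coefficients of $g$: for any set $S\subset T$ with $\#(S)=M$,

$$\Bigl(1-\frac{B}{4}\Bigr)^2\sigma_k^2(f)\le\|g_S\|^2\le(1+\delta^*)\sum_{j\in S}|z_j|^2=\frac{11}{10}\sum_{j\in S}|z_j|^2. \tag{3.17}$$

Take for $S$ the set $S_g$ of the $M$ smallest coefficients of $g$ and note that then, for any more general $S\subset T$ with $\#(S)\ge M$, one has

$$\sum_{j\in S}|z_j|^2\ge\frac{\#(S)}{M}\sum_{j\in S_g}|z_j|^2,$$

and hence

$$\frac{10}{11}\Bigl(1-\frac{B}{4}\Bigr)^2\frac{\#(S)}{M}\sigma_k^2(f)\le\sum_{j\in S}|z_j|^2. \tag{3.18}$$

Now we consider the particular set $S:=T\setminus\Lambda_{Ak}$ satisfying $\#(S)\ge M$. Combining the above bound with the RIP, we obtain

$$\begin{aligned}
\frac{10}{11}\Bigl(1-\frac{B}{4}\Bigr)^2\frac{\#(S)}{M}\sigma_k^2(f)\,(1-\delta^*)&\le\|g_S\|^2\le\|g-f_{Ak}\|^2\\
&\le\bigl(\|g-f\|+\|r_{Ak}\|\bigr)^2\\
&\le\bigl(B\sigma_K(f)+2\sigma_k(f)\bigr)^2\\
&\le\Bigl(\frac{B}{4}+2\Bigr)^2\sigma_k^2(f).
\end{aligned}$$

Since $\delta^*=\frac{1}{10}$, dividing by $\frac{10}{11}\cdot\frac{9}{10}=\frac{9}{11}$ gives the bound

$$\#(T\setminus\Lambda_{Ak})\le\frac{11}{9}\,\frac{(\frac{B}{4}+2)^2}{(1-\frac{B}{4})^2}\,M\le11M, \tag{3.19}$$

where the second inequality is obtained by taking $B$ sufficiently close to 1 and using that the left-hand side is an integer.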

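bound $\|r_{Ak}\|$ directly in terms of $\sigma_l(f)$ for some $l>k$. Accordingly, we use Proposition 3.1 in different ways for the two cases.

In the case where $K-1>k$, i.e., $M\ge2$, we apply (3.9) with $j=Ak$, $m=11M$, and $L=4$. Indeed, $Ak+Lm=Ak+44M\le An$ holds for $k+M\le n$ whenever $A\ge44$. Moreover, notice that for such $A$

$$A(K-1)=Ak+A(M-1)\ge Ak+\frac12AM=Ak+\frac{Am}{22}\ge Ak+Lm$$

whenever

$$A\ge88. \tag{3.20}$$

This gives

$$\begin{aligned}
\|r_{A(K-1)}\|^2&\le\|r_{Ak+mL}\|^2\\
&\le e^{-\frac{36}{11}}\|r_{Ak}\|^2+\|f-g\|^2\\
&\le e^{-\frac{36}{11}}4\sigma_k^2(f)+B^2\sigma_K^2(f)\\
&\le e^{-\frac{36}{11}}64\sigma_{K-1}^2(f)+B^2\sigma_{K-1}^2(f)\\
&\le4\sigma_{K-1}^2(f),
\end{aligned}$$

where we have used (3.15) in the fourth inequality, and the last inequality follows by taking $B$ sufficiently close to 1. We thus obtain (3.11) for the value $\tilde{k}=K-1>k$.

In the case where $K-1=k$, i.e., $M=1$, we apply (3.9) with $j=Ak$, $m=11$, and $L=8$. From (3.19), we know that $\#(T\setminus\Lambda_{Ak})\le11$ and $An\ge A(k+1)\ge Ak+mL$ for $A$ satisfying (3.20). This yields

$$\begin{aligned}
\|r_{A(k+1)}\|^2&\le\|r_{Ak+mL}\|^2\\
&\le e^{-\frac{72}{11}}\|r_{Ak}\|^2+\|f-g\|^2\\
&\le e^{-\frac{72}{11}}4\sigma_k^2(f)+B^2\sigma_{k+1}^2(f)\\
&\le\Bigl(4e^{-\frac{72}{11}}+\frac{B^2}{16}\Bigr)\sigma_k^2(f).
\end{aligned}$$

This implies that $\Lambda_{A(k+1)}$ contains $T$. In fact, if it missed one of the indices $i\in T$, then we would derive from the RIP condition

$$\begin{aligned}
(1-\delta^*)|z_i|^2&\le\|g-f_{A(k+1)}\|^2\\
&\le\bigl(\|f-g\|+\|r_{A(k+1)}\|\bigr)^2\\
&\le\Bigl(B\sigma_K(f)+\sqrt{4e^{-\frac{72}{11}}+\frac{B^2}{16}}\,\sigma_k(f)\Bigr)^2\\
&\le\Bigl(\frac{B}{4}+\sqrt{4e^{-\frac{72}{11}}+\frac{B^2}{16}}\Bigr)^2\sigma_k^2(f).
\end{aligned}$$

On the other hand, we know from (3.18) that

$$\frac{10}{11}\Bigl(1-\frac{B}{4}\Bigr)^2\sigma_k^2(f)\le|z_i|^2, \tag{3.21}$$

which for $B$ sufficiently close to 1 is a contradiction since

$$\frac{10}{11}\Bigl(1-\frac{B}{4}\Bigr)^2>\frac{10}{9}\Bigl(\frac{B}{4}+\sqrt{4e^{-\frac{72}{11}}+\frac{B^2}{16}}\Bigr)^2.$$

This implies that $\|r_{A(k+1)}\|\le\sigma_{k+1}(f)$, and therefore inequality (3.11) holds for the value $\tilde{k}=k+1$. This verifies Lemma 3.2 and hence completes the proof of Theorem 3.1.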
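The numerical facts used in the proof of Lemma 3.2 can be checked directly in the limiting case $B=1$. The sketch below is an arithmetic check under that assumption (the proof takes $B$ close to, but larger than, 1); the variable names are introduced here, and the factor $11/9$ is the one obtained by dividing by $\frac{10}{11}(1-\delta^*)$ with $\delta^*=1/10$.

```python
import math

delta_star = 1 / 10
rate = (1 - delta_star) / (1 + delta_star)   # (1 - delta*)/(1 + delta*) = 9/11
B = 1.0                                      # limiting value of the parameter B

# Case M >= 2 (L = 4): 64 e^{-36/11} + B^2 must not exceed 4.
case_large_M = 64 * math.exp(-rate * 4) + B**2

# Case M = 1 (L = 8): the two sides of the contradiction inequality.
tail = 4 * math.exp(-rate * 8) + B**2 / 16
lhs = (10 / 11) * (1 - B / 4) ** 2
rhs = (10 / 9) * (B / 4 + math.sqrt(tail)) ** 2

# Cardinality bound: (11/9) (B/4 + 2)^2 / (1 - B/4)^2 equals 11 at B = 1.
card_factor = (11 / 9) * (B / 4 + 2) ** 2 / (1 - B / 4) ** 2

print(case_large_M, lhs, rhs, card_factor)
```

Since every expression is continuous in $B$, the strict inequalities that hold at $B=1$ persist for $B$ slightly larger than 1.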

3.3 Convergence rate on sparse class

As an application of Theorem 3.1, we derive the convergence rate of OSGA on the sparse class induced by a dictionary. First, we recall the definitions of these classes. For a general dictionary $\mathcal{D}$ in $H$, we define the class of functions

$$\mathcal{A}_1^0(\mathcal{D}):=\Bigl\{f\in H:\ f=\sum_{i\in\Lambda}c_i(f)\varphi_i,\ \varphi_i\in\mathcal{D},\ \#\Lambda<\infty,\ \sum_{i\in\Lambda}|c_i(f)|\le1\Bigr\}$$

and we define $\mathcal{A}_1(\mathcal{D})$ to be the closure of $\mathcal{A}_1^0(\mathcal{D})$ in $H$. It is well known that the class $\mathcal{A}_1(\mathcal{D})$ plays an important role in the study of greedy approximation with respect to dictionaries. In [23], DeVore and Temlyakov proved that, for an arbitrary dictionary $\mathcal{D}$ in $H$, the OGA provides, after $m$ iterations, an approximation of $f\in\mathcal{A}_1(\mathcal{D})$ with the following upper bound on the convergence rate:

$$\|r_m^{\mathrm{OGA}}(f)\|\le m^{-1/2}.$$

Note that the rate $m^{-1/2}$ is sharp, since when $\mathcal{D}$ is an orthonormal basis of $H$, it is easy to find $f_0\in\mathcal{A}_1(\mathcal{D})$ such that

$$\|r_m^{\mathrm{OGA}}(f_0)\|=c\cdot m^{-1/2}.$$

Unlike for OGA, to get the same convergence rate on $\mathcal{A}_1(\mathcal{D})$ for OSGA, one must make some extra assumptions on the dictionary $\mathcal{D}$. A specific feature of a dictionary, $\mu$-coherence, allows us to build OSGA with the same rate of convergence. Assuming that the dictionary $\mathcal{D}$ is $\mu$-coherent, Liu and Temlyakov [8] proved the following theorem.

Theorem 3.2 Let $\mathcal{D}$ be a dictionary in a Hilbert space $H$ with coherence parameter $\mu$. Then, for $s\le(2\mu)^{-1}$, OSGA($s$) provides, after $m$ iterations, an approximation of $f\in\mathcal{A}_1(\mathcal{D})$ with the following upper bound on the error:

Now we present our results. As an immediate consequence of Theorem 3.1, we get the following theorem.

Theorem 3.3 Given a parameter $s\in\mathbb{N}$, for all $n\ge0$, if $\mathcal{D}$ is a dictionary in a Hilbert space $H$ satisfying the RIP of order $(88s+1)n$ with isometry constant $\delta_{(88s+1)n}\le1/10$, then OSGA($s$) provides, after $88n$ iterations, an approximation of $f\in\mathcal{A}_1(\mathcal{D})$ with the following error bound:

$$\|r_{88n}\|=\|f-f_{88n}\|\le Cn^{-\frac12}.$$

Proof As discussed above, the results of DeVore and Temlyakov [23] imply the following estimate for the best $n$-term approximation error of functions $f\in\mathcal{A}_1(\mathcal{D})$:

$$\sigma_n(f,\mathcal{D})\le n^{-\frac12},\qquad n=1,2,\dots. \tag{3.22}$$

Combining (3.22) with Theorem 3.1, we complete the proof of Theorem 3.3.

Acknowledgements

Thanks for all the referenced authors.

Funding

This work is supported by the National Nature Science Foundation of China [Grant Nos. 11701411, 11271199, 11671213, 11401247].

Availability of data and materials Not applicable.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

The authors contributed equally and significantly in writing this paper. All authors read and approved the final manuscript.

Author details

¹School of Sciences, Tianjin Chengjian University, Tianjin, China. ²School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Received: 3 January 2019 Accepted: 16 April 2019

References

1. DeVore, R.A.: Nonlinear approximation. Acta Numer. 7, 51–150 (1998)

2. Pati, Y.C., Rezaiifar, R., Krishnaprasad, P.S.: Orthogonal Matching Pursuit: Recursive Function Approximation with Applications to Wavelet Decomposition (1993)

3. Lin, J.H., Li, S.: Nonuniform support recovery from noisy measurements by orthogonal matching pursuit. J. Approx. Theory 165, 20–40 (2013)

4. Liu, E., Temlyakov, V.N.: Super greedy type algorithm. Adv. Comput. Math. 37, 493–504 (2012)

5. Needell, D., Tropp, J.A.: CoSaMP: iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harmon. Anal. 26, 301–321 (2009)

6. Cai, T., Wang, L.: Orthogonal matching pursuit for sparse signal recovery with noise. IEEE Trans. Inf. Theory 57, 4680–4688 (2011)

7. Cohen, A., Dahmen, W., DeVore, R.A.: Orthogonal matching pursuit under the restricted isometry property. Constr. Approx. 45, 113–127 (2017)

8. Liu, E., Temlyakov, V.N.: The orthogonal super greedy algorithm and applications in compressed sensing. IEEE Trans. Inf. Theory 58, 2040–2047 (2012)

9. Wei, D.: Analysis of orthogonal multi-matching pursuit under restricted isometry property. Sci. China Math. 57, 2179–2188 (2014)

11. Wei, X.J., Ye, P.X.: Estimates of restricted isometry constant in super greedy algorithms. Int. J. Future Gener. Commun. Netw. 8, 137–144 (2015)

12. Yi, S., Li, B., Pan, W.L., Li, J.: Analysis of generalised orthogonal matching pursuit using restricted isometry constant. Electron. Lett. 14, 1020–1022 (2014)

13. Baraniuk, R.G.: Compressive sensing. IEEE Signal Process. Mag. 24, 118–121 (2007)

14. Candes, E.J.: Compressive sampling. In: Int. Congress of Mathematics, vol. 3, pp. 1433–1452 (2006)

15. Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theory 52, 1289–1306 (2006)

16. Candes, E.J., Tao, T.: Decoding by linear programming. IEEE Trans. Inf. Theory 51, 4203–4215 (2005)

17. Candes, E.J., Tao, T.: Near-optimal signal recovery from random projections: universal encoding strategies? IEEE Trans. Inf. Theory 52, 5406–5425 (2005)

18. Satpathi, S., Das, R., Chakraborty, M.: Improving the bound on the RIP constant in generalized orthogonal matching pursuit. IEEE Signal Process. Lett. 20, 1074–1077 (2013)

19. Candes, E.J., Tao, T.: The Dantzig selector: statistical estimation when p is much larger than n. Ann. Stat. 35, 2313–2351 (2007)

20. Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32, 407–451 (2004)
