Efficiency of orthogonal super greedy algorithm under the restricted isometry property

(1)

R E S E A R C H

Open Access

Efﬁciency of orthogonal super greedy

algorithm under the restricted

isometry property

Xiujie Wei

1*

_{and Peixin Ye}

2

*_{Correspondence:}

[email protected] 1_{School of Sciences, Tianjin} Chengjian University, Tianjin, China Full list of author information is available at the end of the article

Abstract

We investigate the eﬃciency of orthogonal super greedy algorithm (OSGA) for sparse recovery and approximation under the restricted isometry property (RIP). We ﬁrst show that under the RIP conditions of the measurement matrix

Φ

and the minimum magnitude of the nonzero coordinates of the signal, forl2bounded orl∞bounded noise vectore, with explicit stopping rules, OSGA can recover the support of an arbitraryK-sparse signalxfromy=

Φ

x+ein at mostKsteps. Then, we investigate the error performance of OSGA inmterm approximation with regards to dictionaries satisfying the RIP in a separable Hilbert space. We establish a Lebesgue-type inequality for OSGA. Based on this inequality, we obtain the optimal rate of convergence for the sparse class induced by such dictionaries.

Keywords: Orthogonal super greedy algorithm; Orthogonal multi matching pursuit; Restricted isometry property; Compressed sensing;m-term approximation;

Lebesgue-type inequality

1 Introduction

Recovery and approximation by sparse linear combination of elements from a ﬁxed re-dundant family is frequently utilized in many application areas, such as image or signal processing, PDE solvers and statistical learning, see [1]. In general, these problems are

N-Phard. It is well known that greedy type algorithms are eﬃcient approaches to solve them, see [2–5]. Among others, the orthogonal greedy algorithm (OGA) has been widely used in practice. OGA is a simple yet powerful algorithm for highly nonlinear sparse ap-proximation that has seen a large amount of research over its history, see [3,6,7] and the reference therein.

In this paper, we consider the orthogonal super greedy algorithm (OSGA), which is more eﬃcient than OGA from the viewpoint of computational complexity. The performances of (OSGA) in greedy approximation and signal recovering were analyzed in [8–12]. We will make further study on the eﬃciency of the OSGA from two aspects.

In Sect.2, we will study the eﬃciency of the OSGA in recoveringN-dimensional sparse signal from linear measurements. This topic is also known in the literature as compressed sensing (CS), [13–15]. We show that OSGA can recover aK-sparse signal from noisy mea-surements in at mostKsteps. We remark that in the ﬁeld of signal processing, orthogo-nal super greedy algorithm OSGA is also known as orthogoorthogo-nal multi matching pursuit

(2)

(OMMP) [4]. So for the reader’s convenience, we will use the term OMMP instead of OSGA in Sect.2.

In Sect.3, we study the error performance of the OSGA in the general context. We in-vestigate the eﬃciency of OSGA form-term approximation with regard to dictionaries in a real separable Hilbert spaceH. Assuming that the dictionary satisﬁes the RIP condition, we establish a Lebesgue inequality for OSGA which bounds the error of OSGA by the best

m-term approximation error. Based on this inequality, we derive the sharp convergence rate of OSGA on the sparse class induced by dictionary satisfying the RIP condition.

2 The efﬁciency of the OMMP in CS

In this section, we analyze the eﬃciency of the OMMP in compressed sensing.

We consider the following model. Suppose thatx∈RN _{is an unknown}_N_-dimensional

signal and we wish to recover it fromMnoisy measurementsygiven by inner products with ﬁxed vectors, that is,

y=Φx+e, (2.1)

whereΦis a knownM×Nmeasurement matrix withMNande∈RM _{is a vector of}

measurement errors. The error vector can either be zero, bounded, or Gaussian noise. Forx= (xj)Nj=1∈RN, deﬁne

xp:=

_N

i=1

xp_i

1/p

, 0 <p<∞,

and

x∞:= sup 1≤i≤N

|xi|

.

The signalx∈RN _{is said to be}_K_{-sparse if}_x

0:=|supp(x)|=|{i:xi= 0}| ≤K<N. We

study the performance of OMMP for the recovery of the support of theK-sparse signalx

under model (2.1).

These algorithms recover theK-sparse signal by iteratively constructing the support of it by some greedy principles.

The OMMP is a stepwise forward selection and is easy to implement, it has been used for signal recovery. Now let us recall the deﬁnition of OMMP(s) in an algorithmic way (see Algorithm1).

OSGA(s) generalizes the orthogonal greedy algorithm (OGA) in the sense that it selects multiplesindices in each iteration, it can recover the sparse signal using fewer steps and further reduce the complexity. To investigate the eﬃciency of the OMMP in CS, we use the restricted isometry property (RIP) condition ofΦwhich ensures the stable recovery ofxfrom noisy measurements. This property is introduced by Candes and Tao [16,17] as follows.

A matrixΦis said to satisfy RIP of orderKif there exists a constantδ∈(0, 1) such that, for allK-sparse vectorsx,

(3)

Algorithm 1Orthogonal Matching Multi Pursuit (OMMP(s))

Input:Measurement matrixΦ, vectory, ands, the stopping criterion.

Step 1:Set the residualr0_:=_y_{, a trivial initial approximation}_x0_{:= 0, the index set}_Λ0_:=_∅_, and the iteration counterl= 0.

Step 2:DeﬁneΛl+1:=Λl∪ {i1, . . . ,is}such that

rl,φi1 ≥ · · · ≥r

l_,_φ

is ≥ sup

φ∈Φ,φ=φ_ik,k=1,...,s

rl,φ .

Then

xl+1:= arg min z:supp(z)∈Λl+1

y–Φz2

and update the residual

rl+1:=y–Φxl+1.

End if the stopping condition is achieved. Otherwise, we setl:=l+ 1 and turn to step 2.

Output:If the algorithm stops at thekth iteration, outputΛk_and_x

Λk=xk.

We recall some results on the eﬃciency of OMMP based on RIP. These results concern the estimate of the upper bound ofsKorder RIP constantδsK. In the noiseless case, Liu

and Temlyakov [8] proved thatδsK <

√

s

(2+√2)√K is a suﬃcient condition for OMMP(s) to

recover everyK-sparse signal in at mostK iterations successfully. Then, Wang, Kwon, and Shim [10] relaxed the condition toδsK<

√

s

3√s+√K. The constant was further relaxed as

δsK<

√

s

2√s+√K by Satpathi, Das, and Chakraborty [18]. In [18] the authors also pointed out

the possibility of extending the results to the noisy case. On the other hand, Dan Wei [9] deﬁned the restricted orthogonality constantθK,Kand proved thatδsK–s+1+

K

sθsK–s+1,s< 1

is a suﬃcient condition guaranteeing the perfect recovery ofK-sparse signals by OMMP. She shows that this result implies the result of [18]. Moreover in [9] the performance of OMMP for support recovery from noisy measurements was also discussed.

Motivated by the above works, we investigate the OMMP in the general setting where noise is presented under the RIP based conditions. We prove if the sampling matrix Φ

satisﬁes the RIP of ordersKwith the isometry constant

δsK< 1 +

1 2

K s –

1 2

K s + 4

K s,

then OMMP can recover everyK-sparse signal in at mostK steps with thel2bounded andl∞bounded noise. It is easy to check that

√

s

2√s+√K < 1 +

1 2

K s –

1 2

K s + 4

K s.

(4)

When dealing with the noisy measurements, one of the key components in OMMP is the stopping rule which depends on the structure of the noise. In general, there are several natural stopping rules for OMMP.

(1) Stop after a ﬁxed number of iterationsK.

(2) Forl2bounded noise case: The error vectoreis bounded inl2norm withe2≤b2,

we set the stopping rule asrk

2≤b2.

(3) Forl∞bounded noise case: The error vectoreis bounded inl∞norm with

Φ∗e∞≤b∞, we set the stopping rule as Φ∗rk_∞≤

1 +

√

sKδsK

1 –δsK

b∞,

whereΦ∗is the transpose ofΦ.

Since we mainly consider l2 bounded noise and l∞ bounded noise, we use stopping rules (2) and (3) respectively. In Sect.2.1, some notations are introduced and some conse-quences of restricted isometry properties are presented. In Sect.2.2, we give the RIP based suﬃcient conditions for OMMP(s) withl2bounded noise case by using the stopping rule (2). In Sect.2.3, we establish the results in thel∞bounded noise case by using the stopping rule (3). In Sect.2.4, we give the RIP based suﬃcient conditions for OMMP(s) with high probability in the Gaussian noise case.

2.1 Preliminary

The following notations will be used in this section. For a signalx∈RN, letΩ={1, . . . ,N}

denote the index set ofxandT= supp(x) ={i:xi= 0}the support of it.

Suppose thatφ1,φ2, . . . ,φNare the columns of the matrixΦ. We always assumeφi2= 1, 1≤i≤N. Suppose thatΛis a subset ofΩ, letΦΛdenote the matrixΦ restricted to the columns indexed byΛ. We use the same way to deﬁnexΛfor the vectorx∈RN. Thus

ΦΛx=ΦxΛ=ΦΛxΛ. (2.3)

We recall some useful consequences of the RIP.

Lemma 2.1([5]) Suppose thatΓ andΛare two disjoint sets of indices.If the matrixΦ

satisﬁes the RIP of order|Γ ∪Λ|with isometry constantδ|Γ∪Λ|,then for any x∈R|Λ|we

have

Φ_Γ∗ΦΛx₂≤δ|Γ∪Λ|x2 (2.4)

and

Φ_Λ∗ΦΛ –1

x₂≤ 1

(1 –δ|Λ|)

x2. (2.5)

Lemma 2.2([6]) Let x∈RN_{be a K -sparse vector}_._{Suppose that the matrix}_Φ_{satisﬁes the}

RIP of order K with isometry constantδK.Then,forΛ⊂Ω,

(1 –δK)xT\Λ2≤ΦT∗\Λ

I–ΦΛΦΛ†

ΦT\ΛxT\Λ₂≤(1 +δK)xT\Λ2, (2.6)

(5)

The residual vector afterksteps can be written as

rk=y–Φ_ΛkΦ†

Λky= (I–Pk)y

= (I–Pk)(Φx+e)

= (I–Pk)Φx+ (I–Pk)e

=rk₁+rk₂,

wherePk=ΦΛkΦ

†

Λk denotes the projection onto the linear space spanned by{φi,i∈Λk},

rk

1= (I–Pk)Φxis the signal part of the residual, andrk2= (I–Pk)eis the noise part of the

residual. Denote:

Tk=Λk∩T, Ak=T\Λk, Bk=T∪Λk.

Suppose thatSk⊆Ω\Bk,|Sk|=s, such that

min i∈Sk

rk,φi ≥ max i∈(Ω\Bk)\Sk

rk,φi .

The diﬀerence between OMMP(s) and the standard orthogonal matching pursuit (OMP) is that at each step of the OMMP(s),selements are simultaneously selected from a dictio-nary instead of one. For a suﬃcient condition to guarantee OMMP(s) to select at least one correct index at each iteration, we need to give a lower bound ofMSk–mSk. So we provide the following lemma which plays an important role in our analysis.

Lemma 2.3 Let mSk :=mini∈Sk{|r

k

1,φi|}and MAk :=maxi∈Ak{|r

k

1,φi|}.Assume Ak=∅.

Then there hold

mSk≤

δsK

√

s(1 –δsK)

xAk2 (2.7)

and

MAk≥ 1 –_√ δsK

|Ak|

xAk2. (2.8)

Proof Using the fact (I–Pk)ΦΛk=Φ_Λk–Φ_Λk(Φ∗

ΛkΦΛk)–1Φ∗

ΛkΦΛk= 0, we express

rk₁= (I–Pk)Φx

as

rk₁= (I–Pk)ΦTxT= (I–Pk)(ΦTkxTk+ΦAkxAk)

= (I–Pk)ΦAkxAk=

I–ΦΛkΦ† Λk

ΦAkxAk

=ΦAkxAk–ΦΛkΦ†

(6)

SinceΛk_∩_A

k=∅, applying (2.5) and (2.4) of Lemma2.1, and the fact that for any two

integersK≤Kit holds

δK≤δK, (2.10)

it follows that

Φ_Λ†kΦAkxAk2 =Φ ∗ ΛkΦΛk

–1

Φ_Λ∗kΦAkxAk2

≤ 1

(1 –δsk)

Φ_Λ∗kΦAkxAk2

≤ δsk+|Ak| (1 –δsk)

xAk2≤

δsK

(1 –δsK)

xAk2. (2.11)

By (2.9), (2.11), and Lemma2.1, we have

Φ_S∗

kr

k

12 =Φ ∗

Sk

ΦAkxAk–ΦΛkΦ

†

ΛkΦAkxAk2

≤Φ_S∗

kΦAkxAk2+Φ ∗

SkΦΛkΦ

†

ΛkΦAkxAk)2

≤δs+|Ak|xAk2+δs(k+1)Φ

†

ΛkΦAkxAk2

≤δsKxAk2+δsK

δsK

(1 –δsK)

xAk2

= δsK (1 –δsK)

xAk2. (2.12)

Thus

mSk≤ 1

√

sΦ

∗

Skr

k

12≤

δsK

√

s(1 –δsK)

xAk2.

Using (2.9) and Lemma2.2, we obtain

Φ_A∗

kr

k

12 =Φ ∗

Ak

I–Φ_ΛkΦ† Λk

ΦAkxAk)2

=Φ∗

T\Λk

I–Φ_ΛkΦ† Λk

Φ_T_\_Λkx_T_\_Λk) 2

≥(1 –δsK)xT\Λk2= (1 –δsK)xAk2. (2.13)

Hence

MAk≥ 1

√ |Ak|

Φ_A∗

kr

k

12≥

(1 –_√ δsK)

|Ak|

xAk2.

Therefore Lemma2.3is proved.

2.2 l2bounded noise

We shall discuss the performance of OMMP(s) for thel2 bounded noise case. That is, the error vectoreis bounded inl2 norm withe2≤b2for some constantb2> 0. With a suitable stopping rule and a reasonable condition on the minimum magnitude of the nonzero coordinates of theK-sparse signal ofx, OMMP(s) can recover the support ofx

(7)

Theorem 2.1 Suppose thate2≤b2andΦsatisﬁes the RIP of order sK with the isometry

constantδ:=δsK < 1 +1₂

K s–

1 2

K s + 4

K

s.Then OMMP(s)with the stopping ruler k

2≤

b2recovers the support of any K -sparse signal x∈RNfrom y=Φx+e provided that all the

nonzero components xisatisfy

|xi|>

2b2

(1 –δ) –

K s

δ 1–δ

. (2.14)

Proof DenoteEk=maxi∈Ω{|rk₂,φi|}. Recalling the deﬁnition ofmSk andMAk in Lemma

2.3, there hold

min i∈Sk

_rk_,_φ

i ≤min

i∈Sk _rk

1,φi +max i∈Ω

_rk

2,φi =mSk+Ek

and

max i∈Ak

rk,φi ≥max i∈Ak

rk₁,φi –max i∈Ω

rk₂,φi =MAk–Ek.

It is clear that in order for OMMP(s) to select at least one correct index at this step, it is necessary to have

max i∈Ak

_rk_,_φ

i >min i∈Sk

_rk_,_φ i .

A suﬃcient condition to guarantee OMMP(s) to select at least one correct index at each iteration until all correct indices are selected is

MAk–Ek>mSk+Ek.

That is,

2Ek<MAk–mSk (2.15)

holds fork= 0, 1, . . . ,K– 1. We ﬁrst provide an upper bound forEk. The assumptione ≤

b2yields that

Ek=max i∈Ω

rk₂,φi ≤max i∈Ω φi2

rk₂₂=(I–Pk)e₂≤ e2≤b2. (2.16)

Next we give a lower bound ofMAk–mSk. Using Lemma2.3and assumption (2.14), we have

MAk–mSk ≥

1 –δ

√ |Ak|

–√ δ

s(1 –δ)

xAk2

≥

1 –δ

√ |Ak|

–√ δ

s(1 –δ)

|Ak|

2b2

(1 –δ) –

K s

δ 1–δ

(8)

This means OMMP(s) will select at least one correct index at thek+ 1-th iteration, and OMMP(s) can recover the support ofxin at mostK iterations. It remains to show that under the stopping rulerk ≤b2, OMMP exactly stops when all the correct indices are selected.

First, assume thatAk=∅, thenT⊆Λkand (I–Pk)Φx= 0. Thus

rk₂=(I–Pk)Φx+ (I–Pk)e₂=(I–Pk)e₂≤ e2≤b2.

So when all theKcorrect indices are selected, thel2norm of the residual will be less than

b2and hence OMMP(s) stops. Now we show that OMMP(s) does not stop early. Suppose that the OMMP(s) has runksteps for somek<K. We will verify thatrk

2>b2 and so OMMP(s) does not stop at the current step.

Secondly, assume thatAk=∅, then

rk₂ =(I–Pk)Φx+ (I–Pk)e₂

≥(I–Pk)Φx₂–(I–Pk)e₂

=ΦAkxAk–ΦΛkΦ

†

ΛkΦAkxAk2–(I–Pk)e2 =ΦBkx

(k)

2–(I–Pk)e2

≥ΦBkx (k)

2–e2

≥1 –δ|Bk|x (k)

2–b2

≥√1 –δxAk2–b2

≥√1 –δ|Ak|

2b2

(1 –δ) –

K s

δ 1–δ

–b2

>b2,

wherex(k)= (xAk –Φ

†

ΛkΦAkxAk)

∗_{, and to derive the third inequality, we use the RIP}

con-dition. Hence OMMP(s) does not stop early, and the theorem is proved.

2.3 l_∞bounded noise

We now give the RIP based suﬃcient conditions for OMMP(s) withl∞bounded noise. Our result is the following theorem.

Theorem 2.2 Suppose thatΦ∗e∞ ≤b∞ andΦ satisﬁes the RIP of order sK with the same isometry constant as in Theorem2.1.Then OMMP(s)with the stopping rule

Φ∗rk_∞≤

1 +

√

sKδ

1 –δ

b∞

ﬁnds the support of any K -sparse signal x if all the nonzero coeﬃcients xisatisfy

|xi|>

2(1 + √

sKδ 1–δ )b∞

(1 –δ) –

K s

δ 1–δ

(9)

Proof DenoteEk=Φ∗(I–Pk)e∞=maxi∈Ω|r₂k,φi|. We ﬁrst prove that

2Ek<MAk–mSk (2.18)

holds fork= 0, 1, . . . ,K– 1, wheremSk andMAk are deﬁned as in Lemma2.3. It follows from the proof of Theorem2.1that inequality (2.18) can ensure OMMP recovers the true support ofxin at mostKsteps.

We ﬁrst provide an upper bound ofEk. The assumptionΦ∗e∞≤b∞ and the fact

Φ∗

Λk(I–Pk)e= 0 yield that

Ek=max i∈Ω

r₂k,φi = max i∈Ω\Λk

φ_i∗(I–Pk)e

≤Φ_Ω∗_\_Λke_∞+Φ_Ω∗_\_ΛkPke_∞

≤b∞+ max

i∈Ω\Λk φ_i∗ΦΛk

Φ_Λ∗kΦΛk –1

Φ_Λ∗ke₂

≤b∞+δΦ_Λ∗kΦΛk –1

Φ_Λ∗ke₂

≤b∞+ δ 1 –δΦ

∗ Λke2

≤b∞+ δ 1 –δ

√

skb∞

≤

1 +δ

√

sK

1 –δ

b∞. (2.19)

Here, to derive the third and fourth inequalities, we have used (2.4) and (2.5) of Lemma

2.1respectively.

On the other hand, we can obtain a lower bound ofMAk–mSk. By using Lemma2.3and assumption (2.17), we have

MAk–mSk ≥

1 –δ

√ |Ak|

–√ δ

s(1 –δ)

xAk2

≥

1 –δ

√ |Ak|

–√ δ

s(1 –δ)

|Ak|

2(1 + √

sKδ 1–δ )b∞

(1 –δ) –

K s

δ 1–δ

> 2

1 +

√

sKδ

1 –δ

b∞. (2.20)

The bounds (2.19) and (2.20) imply that (2.18) holds fork= 0, 1, . . . ,K– 1. It remains to show that OMMP (s) exactly stops when all the correct indices are selected under the stopping ruleΦ∗rk∞≤(1 +

√

sKδsK 1–δsK )b∞.

First, assume thatAk=∅, thenT⊆Λkand (I–Pk)Φx= 0. Thus

Φ∗rk_∞=Φ∗(I–Pk)(Φx+e)_∞=Φ∗(I–Pk)e_∞≤

1 +δ

√

sK

1 –δ

b∞,

(10)

Secondly, assume thatAk=∅, then

Φ∗rk_∞ =Φ∗(I–Pk)(Φx+e)_∞

≥Φ∗(I–Pk)Φx_∞–Φ∗(I–Pk)e_∞

≥max i∈Ak

(I–Pk)Φx,φi –Φ∗(I–Pk)e_∞

≥√1 –δ |Ak|

xAk–

1 +δ

√

sK

1 –δ

b∞

≥√1 –δ |Ak|

|Ak|

2(1 + √

sKδ 1–δ )b∞

(1 –δ) –

K s

δ 1–δ

–

1 +δ

√

sK

1 –δ

b∞

= (1 –δ)2(1 + √

sKδ 1–δ )b∞

(1 –δ) –

K s

δ 1–δ

–

1 +δ

√

sK

1 –δ

b∞

>

1 +δ

√

sK

1 –δ

b∞.

Here, to derive the third and fourth inequalities, we have used (2.8) of Lemma2.3and assumption (2.17), respectively. Hence OMMP(s) does not stop early, and the theorem is

proved.

2.4 Gaussian noise

As an application of the above results onl2andl∞bounded noise cases, we shall discuss the performance of OMMP(s) on recoveringK-sparse signals with the Gaussian noise. Gaussian noise is of particular interest in statistics since it is probably the best simulation of real noise when the noise source is particularly complex, cf. [19,20].

The motivation of applying the results on thel2bounded andl∞ bounded noise cases to the Gaussian noise case comes from the following results of Cai, Xu, and Zhang [21], which shows that the Gaussian noiseeis essentially bounded in bothl2andl∞ norms. They have shown that ifeis zero-mean white Gaussian noise with covarianceσ2IM×M,

that is,e∼N(0,σ2IM×M), then there hold

Pe∈e:e2≤σ

M+ 2MlogM≥1 – 1

M

and

Pe∈e:Φ∗e_∞≤σ2logN≥1 – 1 2√πlogN.

Combining the above results with those of Theorem2.1and Theorem2.2, we immedi-ately obtain the following theorems.

Theorem 2.3 Suppose that e∼N(0,σ2IM×M)andΦsatisﬁes the RIP of order sK with the

same isometry constant as in Theorem2.1and nonzero components xisatisfy

|xi|>

2σM+ 2√MlogM

(1 –δ) –

K s

δ 1–δ

(11)

Then OMMP(s)with the stopping rulerk

2≤σ

M+ 2√MlogM ﬁnds the support of any K -sparse signal x∈RN _{from y}₌_Φ_x₊_{e with probability at least}_{1 –} 1

M.

Theorem 2.4 Suppose that e∼N(0,σ2_I

M×M)andΦsatisﬁes the RIP of order sK with the

same isometry constant as in Theorem2.1and nonzero components xisatisfy

|xi|>

2(1 +√sKδ 1–δ )σ

√

2logN

(1 –δ) –

K s

δ 1–δ

.

Then OMMP(s)with the stopping ruleΦ∗rk∞≤(1 + √

sKδ 1–δ )σ

√

2logN ﬁnds the support of any K -sparse signal x∈RN from y=Φx+e with probability at least1 –₂√ 1

πlogN.

Theorem2.3and Theorem2.4show that, with a suitable stopping rule and a reasonable condition on the minimum magnitude of the nonzero coordinates of theK-sparse signal ofx, OMMP(s) can recover the support ofxin at mostKiterations with high probability from measurements with Gaussian noise.

3 Error performance of the OSGA

In this section, we study the error performance of the OSGA in the general context. LetH

be a real separable Hilbert space with an inner product·,·and the normx:=x,x1/2 for allx∈H. Recall that a setDfromHis called a dictionary if

D={φi,i∈I} ⊂H, and spanD=H.

Without loss of generality we shall assume that the dictionaryDis normalized, i.e.,

φi= 1, i∈I.

The standard measure of approximation power of a dictionary is the error of the best

m-term approximation. Given a dictionaryD, we deﬁne them-sparse class as

Σm:=Σm(D) :=

j∈Λ

cjφj:cj∈R,φj∈D,(Λ) =m

.

The error of bestm-term approximation to a functionf ∈Hfrom the dictionaryDis deﬁned as

σm(f) :=σm(f,D) := inf g∈Σm

f–g. (3.1)

Now we will use OSGA(s) to generate amterm approximantfm and estimate the error

f–fmin terms ofσm(f).

So let us ﬁrst recall the deﬁnition of the OSGA(s) (see Algorithm2).

To investigate the error performance of SOGA, one needs to make some assumptions on the dictionaryD. A simple and useful assumption is the coherenceμof a dictionaryD deﬁned by

μ:=μ(D) := sup

(12)

Algorithm 2Orthogonal Super Greedy Algorithm (OSGA(s))

Suppose we are given an integersand a target elementr0:=f ∈H. For eachm≥1, we inductively deﬁne

(1) φi(m–1)s+1, . . . ,φims∈Dare the elements of the dictionaryDsatisfying the following

inequality. DenoteIm:={i(m–1)s+1, . . . ,ims}and assume that min

i∈Im

rm–1,φi≥ sup

φ∈D,φ=φi,i∈Im

rm–1,φ.

(2) LetHm:=Hm(f) :=span(φi1, . . . ,φims), and letPHm denote an operator of orthogonal

projection ontoHm. Deﬁne

fm:=Gm(f) := OSGAm(f,D) :=PHm(f).

(3) Deﬁne the residual after themth iteration of the algorithm

rm:=rsm(f) :=f –fm.

Dictionaries with coherence μ are called μ-coherent. It was proved in [8] that for

μ-coherent dictionaries the OSGA provides the same (in the sense of order) upper bound on the sparse class for the error as the OGA. In this paper, we will improve this results under a weaker assumption on a dictionary. This assumption is RIP which is formulated in a finite dimensional context in Sect.2. Here we require the definition of RIP in terms of the dictionary instead of the measurement matrix in a infinite dimensional context:

A dictionaryDis said to satisfy RIP of orderkif there exists a constantδ∈(0, 1) such that, for any subsetJ⊂Iwith(J)≤kand any scalarsai,i∈J, the following inequalities

hold:

(1 –δ)

i∈J

|ai|2

≤

i∈J

aiφi

2≤(1 +δ)

i∈J

|ai|2

. (3.2)

As before, the minimum of allδsatisfying (3.2) is referred to as an isometry constantδk.

It is known from [22] that if the dictionaryDhas coherenceμ, then it satisﬁes RIP of orderkfork≤μ–1_{+ 1 with isometry constant}_δ

k≤μ(k– 1), but not vice versa.

Recently, under the RIP assumption, the authors in [7] proved the almost optimality of OGA by establishing the corresponding Lebesgue-type inequality. This inequality quan-tifies the efficiency of OGA for individual elements. It is natural to ask if we can establish Lebesgue-type inequality for OSGA(s). We give a positive answer to this question. In fact, we prove the following inequality for the OSGA(s) when the dictionaryDsatisfies RIP. This is the first Lebesgue-type inequality for the super greedy type algorithms obtained so far as we know.

Theorem 3.1 Given parameter s∈N,there exist ﬁxed constants A:= 88,δ∗:= 1/10,C:= 8

(13)

f ∈H,we have

rAn=f–fAn ≤Cσn(f,D).

In Theorem3.1, we remark that the values ofA,δ∗,Cfor which the above result holds are coupled. For example, it is possible to obtain a smaller value ofCat the price of a larger value ofAor of a smaller value ofδ∗.

Based on this inequality, we will derive the convergence rate of OSGA on the sparse class induced by dictionaries satisfying RIP conditions.

The rest part of this section is organized as follows. The Sects.3.1and3.2are devoted to the proof of Theorem3.1. In Sect.3.3, as an application of Theorem3.1, we study the rate of convergence of OSGA on the sparse class induced by RIP dictionaries.

3.1 Reduction of the residual

To prove Theorem3.1, we establish the following lemma which quantiﬁes the reduction of the residuals generated by the OSGA(s) under the RIP condition. Whens= 1, this lemma was proved in [7]. However, to deal with the case ofs> 1, we need some new techniques. We need to establish a new lemma (Lemma3.1). In what follows, we denote by

Λk:= k

i=1

Ii:={i1, . . . ,isk}

the set of indices selected afterkiterations of the OSGA(s) applied to the target element

f ∈H.

Lemma 3.1 Let(fk)k≥0be the sequence of approximations generated by the OSGA(s)

ap-plied to f,and let g=_i_∈_Tziφi,(T)≤n.Then,if T is not contained inΛk,we have

rk+12≤ rk2–

1 –δ(T∪Λk) (1 +δs)(T\Λk)

rk2–f –g2

. (3.3)

Lemma 3.1quantiﬁes the reduction ofrk at each iteration provided thatT is not

contained in Λk and thatrk ≥ f –g. Note that in the case whenT ⊆Λk, we have

rk ≤ f–g.

Proof We may assume thatrk ≥ f –g, otherwise inequality (3.3) is trivially satisﬁed.

Denote

Fk+1:=span(φi,i∈Ik+1).

ThenHk+1is a direct sum ofHkandFk+1. Therefore,

rk+1=f–fk+1=rk+fk–fk+1

=rk+fk–PHk+1(f)

=rk+fk–PHk+1(rk+fk)

(14)

It is clear thatFk+1⊂Hk+1implies

rk+12=rk–PHk+1(rk) 2

≤rk–PFk+1(rk) 2

=rk2–PFk+1(rk) 2

. (3.4)

Now we estimatePFk+1(rk). Since the dictionaryDsatisﬁes RIP of orderswith isom-etry constantδs, we have

i∈Ik+1

aiφi

2≤(1 +δs) i∈Ik+1

|ai|2

. (3.5)

Thus

PFk+1(rk)= sup ψ∈Fk+1,ψ≤1

PFk+1(rk),ψ

= sup

(ci)i∈Ik+1,i∈Ik+1ciφi≤1

i∈Ik+1

rk,φici

≥ sup

(ci)i∈Ik+1,i∈Ik+1|ci|2≤(1+δs)–1

i∈Ik+1

rk,φici

= (1 +δs)–1/2 i∈Ik+1

_r_k_,φi

21/2

. (3.6)

Using inequality (3.6), we can continue to estimate (3.4) as

rk+12≤ rk2– (1 +δs)–1 i∈Ik+1

rk,φi

2

≤ rk2– (1 +δs)–1

max i∈Ik+1

rk,φi2

.

Therefore to prove inequality (3.3), it suﬃces to prove that

(1 +δs)–1

max i∈Ik+1

rk,φi2

≥ 1 –δ(T∪Λk) (1 +δs)(T\Λk)

rk2–f –g2

. (3.7)

To prove this, we ﬁrst note that

2rk2–f–g2g–fk ≤ rk2–f –g2+g–fk2

=rk2–g–fk–rk2+g–fk2

≤2g–fk,rk= 2g,rk.

This inequality can be written as

rk2–f –g2≤

|g,rk|

g–fk2

(15)

If we writefk=

i∈Λkc

k

iφi,δ:=δ(T∪Λ_k), then the denominator of the right-hand side of inequality (3.8) satisﬁes the RIP

g–fk2 =

i∈T

ziφi–

i∈Λk

ck_iφi

2

=

i∈T\Λk

ziφi+

i∈T∩Λk

zi–cki

φi+

i∈Λk\T

–ck_iφi

2

≥(1 –δ)

i∈T\Λk

|zi|2+

i∈T∩Λk zi–cki

2

+

i∈Λk\T ck_i2

≥(1 –δ)

i∈T\Λ_k

|zi|2

.

On the other hand, the numerator of the right-hand side of inequality (3.8) satisﬁes

g,rk

2

=

i∈T

ziφi,rk

2

=

i∈T\Λk

ziφi,rk

2

≤

i∈T\Λk

|zi|φi,rk

2

≤max i∈Ik+1

rk,φi2 i∈T\Λk

|zi|

2

≤max i∈Ik+1

rk,φi

2

(T\Λk)

i∈T\Λ_k

|zi|2.

Therefore we obtain

rk2–f –g2≤

(T\Λk)(maxi∈Ik+1|rk,φi| 2₎

1 –δ(T∪Λk)

,

which implies (3.7). Thus we complete the proof of Lemma3.1.

The following proposition is an immediate consequence of Lemma3.1.

Proposition 3.1 Assume that,for a given integer A> 0andδ∗< 1,a dictionaryDsatisﬁes RIP of order(As+ 1)n withδ(As+1)n≤δ∗.If g=

i∈Tziφi,(T)≤n,then for any triple of

non-negative integers(j,m,L)such that(T∪Λj)≤m and j+mL≤(As+ 1)n,we have

rj+mL2≤e–

1–δ∗

(16)

Proof By Lemma3.1, ifg=_i_∈_Tziφi,(T)≤n, then for any triple of non-negative integers

(j,m,L) such that(T∪Λj)≤mandj+mL≤(As+ 1)n, we have

rj+mL2–f–g2≤

1 – 1 –δ ∗

(1 +δ∗)m

mL

max0,rj2–f –g2

≤e–1–δ ∗

1+δ∗Lmax0,r_j2–f –g2,

where we have used the fact that(T∪Λl)≤mfor alll≥j. This implies inequality (3.9).

Thus we complete the proof of Proposition3.1.

3.2 Lebesgue-type inequality for SOGA

We are in a position to prove Theorem3.1. Fixf ∈H, we ﬁrst observe that the assertion of the theorem can be derived from the following lemma.

Lemma 3.2 If0≤k<n satisﬁes

rAk ≤2σk(f), σk(f) > 4σn(f), (3.10)

then there exists k<k˜≤n such that

r_Ak˜ ≤2σk˜(f), σk˜(f)≤4σn(f). (3.11)

Indeed, assuming that Lemma3.2holds, we complete the proof of Theorem3.1as fol-lows. We letk˜be the largest integer in{0, 1, . . . ,n}for whichr_A˜k ≤2σk˜(f). Since

r0=σ0=f,

suchk˜exists. Ifk˜<n, then we must haveσ_k_˜(f)≤4σn(f), and therefore

rAn ≤ rA˜k ≤2σk˜(f)≤8σn(f). (3.12)

We are therefore left with proving Lemma3.2. For this, we ﬁx

δ∗= 1

10, (3.13)

and 0≤k<nsuch that (3.10) holds. Letk<K≤nbe the ﬁrst integer such thatσk> 4σK.

By the deﬁnition ofσK(f), for anyB> 1, there existsg=

j∈Tzjψj,(T) =Ksuch that

f–g ≤BσK(f).

The signiﬁcance ofKis that on the one hand

f–g ≤BσK(f) <

B

4σk(f), (3.14)

while on the other hand

(17)

To apply Proposition3.1for the abovegandj=Ak, we need to bound(T∪ΛAk) with

Ayet to be speciﬁed. To this end, we writeK=k+M, withM> 0, and notice that ifS⊂T

is any set with(S) =MandgS:=

j∈Szjψj, then

gS ≥f – (g–gS)–f –g

≥σk(f) –BσK(f)

≥

1 –B

4

σk(f), (3.16)

where we have used the fact thatg–gS∈Σk. Using the RIP, we obtain the following lower

bound for the coeﬃcients ofg: for any setS⊂Twith(S) =M,

1 –B 4

2

σ_k2(f)≤ gS2≤

1 +δ∗

j∈S

|zj|2=

11 10

j∈S

|zj|2. (3.17)

Take forSthe setSg of theMsmallest coeﬃcients ofgand note that then, for any more

generalS⊂Twith(S)≥M, one has

j∈S

|zj|2

j∈Sg

|zj|2

≥(S)/M,

and hence

10 11

1 –B 4

2

(S)

M σ

2

k(f)≤

j∈S

|zj|2. (3.18)

Now we consider the particular setS:=T\ΛAksatisfying(S)≥M. Combining the above

bound with the RIP, we obtain

10 11

1 –B 4

2 (S)

M σ

2

k(f)

1 –δ∗≤ gS2≤ g–fAk2

≤g–f+rAk

2

≤BσK(f) + 2σk(f)

2

≤

B

4 + 2 2

σ_k2(f).

Sinceδ∗=₁₀1 this gives the bound

(T\ΛAk)≤

9 11

(B₄+ 2)2 (1 –B

4)2

M≤11M, (3.19)

where the second inequality is obtained by takingBsuﬃciently close to 1.

(18)

boundrAkdirectly in terms ofσl(f) for somel>k. Accordingly, we use Proposition3.1

in diﬀerent ways for the two cases.

In the case whereK– 1 >k, i.e.,M≥2, we apply (3.9) withj=Ak,m= 11M, andL= 4. IndeedAk+Lm=Ak+ 44M≤Anholds fork+M≤nwheneverA≥44. Moreover, notice that for suchA

A(K– 1) =Ak+A(M– 1)

≥Ak+1 2AM

=Ak+Am 22 =Ak+Lm,

whenever

A≥88. (3.20)

This gives

rA(K–1)2≤ rAk+mL2

≤e–3611_r

Ak2+f –g2

≤e–3611₄_σ2

k(f) +B2σK2(f)

≤e–3611₆₄_σ2

K–1(f) +B2σK2–1(f)

≤4σ_K2_–1(f),

where we have used (3.15) in the fourth inequality, and the last inequality follows by taking

Bsuﬃciently close to 1. We thus obtain (3.11) for the valuek˜=K– 1 >k.

In the case whereK– 1 =k, i.e.,M= 1, we apply (3.9) withj=Ak,m= 11, andL= 8. From (3.19), we know that(T∪ΛAk)≤11 andAn≥A(k+ 1)≥Ak+mLforAsatisfying

(3.20). This yields

rA(k+1)2≤ rAk+mL2

≤e–7211_r_Ak2₊_f _–_g2

≤e–7211₄_σ2

k(f) +B2σk2+1(f)

≤

4e–7211₊B 2

16

σ_k2(f).

This implies thatΛA(k+1)containsT. In fact, if it missed one of the indicesi∈T, then we derive from the RIP condition

1 –δ∗|zi|2≤ g–fA(k+1)2

(19)

≤

BσK(f) +

4e–7211₊B 2

16σk(f) 2

≤

B

4 +

4e–72₁₁₊B2 16

σ_k2(f).

On the other hand, we know from (3.18) that

10 11

1 –B 4

2

σ_k2(f)≤ |zi|2, (3.21)

which forBsuﬃciently close to 1 is a contradiction since

10 11

1 –B 4

2 >10

9

B

4 +

4e–72₁₁₊B2 16

.

This implies that rA(k+1) ≤σk+1(f), and therefore inequality (3.11) holds for the value

˜

k=k+ 1. This veriﬁes Lemma3.2and hence completes the proof of Theorem3.1.

3.3 Convergence rate on sparse class

As an application of Theorem3.1, we will derive the convergence rate of SOGA on the sparse class induced by a dictionary. Firstly, we recall the deﬁnitions of these classes. For a general dictionaryDinH, we deﬁne the class of functions

A0₁(D) :=

f∈H:f =

i∈Λ

ci(f)φi,φi∈D,Λ<∞,

i∈Λ

ci(f)≤1

and we deﬁneA1(D) to be the closure ofA01(D) inH. It is well known that the classA1(D) plays an important role in the study of greedy approximation with respect to dictionar-ies. In [23], DeVore and Temlyakov proved that, for an arbitrary dictionaryDinH, the OGA provides, aftermiterations, an approximation off ∈A1(D) with the following up-per bound of the convergence rate:

rOGA_m (f)≤m–1/2.

Note that the ratem–1/2_{is sharp since when}_D_{is an ortho-normal basis of}_H_{, it is easy to} ﬁndf0∈A1(D) such that

rOGA_m (f0) =c·m–1/2.

Unlike OGA, to get the same convergence rate onA1(D) for SOGA, one must make some extra assumptions on the dictionary D. A speciﬁc feature of a dictionary, μ -coherence, allows us to build OSGA with the same rate of convergence. Assuming that the dictionaryDisμ-coherent, Liu and Temlyakov in [8] proved the following theorem.

Theorem 3.2 IfDis a dictionary in a Hilbert space H with coherence parameterμ.Then,

for s≤(2μ)–1_,_OSGA₍_s₎_provides_,_{after m iterations}_,_{an approximation of f} _∈_A

1(D)with

the following upper bound on the error:

(20)

Now we present our results. By an immediate consequence of Theorem3.1, we get the following theorem.

Theorem 3.3 Given parameter s∈N,for all n≥0,ifDis a dictionary in a Hilbert space H satisfying RIP of order(88s+ 1)n with isometry constantδ(88s+1)n≤1/10,then OSGA(s)

provides,after m iterations,an approximation of f ∈A1(D),with the following error bound:

r88n=f –f88n ≤Cn–

1 2_.

Proof As the discussion above, the results of DeVore and Temlyakov [8] imply the follow-ing estimate for the bestmterm approximation error of functionsf ∈A1(D):

σn(f,D)≤n–

1

2_, _n_{= 1, 2, . . . .} _(3.22)

Combining (3.22) with Theorem3.1, we complete the proof of Theorem3.3.

Acknowledgements

Thanks for all the referenced authors.

Funding

This work is supported by the National Nature Science Foundation of China [Grant Nos. 11701411, 11271199, 11671213, 11401247].

Availability of data and materials Not applicable.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

The authors contributed equally and signiﬁcantly in writing this paper. All authors read and approved the ﬁnal manuscript.

Author details

1_{School of Sciences, Tianjin Chengjian University, Tianjin, China.}2_{School of Mathematical Sciences and LPMC, Nankai}

University, Tianjin, China.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional aﬃliations.

Received: 3 January 2019 Accepted: 16 April 2019 References

1. DeVore, R.A.: Nonlinear approximation. Acta Numer.7, 51–150 (1998)

2. Pati, Y.C., Rezaiifar, R., Krishnaprasad, P.S.: Orthogonal Matching Pursuit: Recursive Function Approximation with Applications to Wavelet Decomposition (1993)

3. Lin, J.H., Li, S.: Nonuniform support recovery from noisy measurements by orthogonal matching pursuit. J. Approx. Theory165, 20–40 (2013)

4. Liu, E., Temlyakov, V.N.: Super greedy type algorithm. Adv. Comput. Math.37, 493–504 (2012)

5. Needell, D., Tropp, J.A.: Cosamp: iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harmon. Anal.26, 301–321 (2009)

6. Cai, T., Wang, L.: Orthogonal matching pursuit for sparse signal recovery with noise. IEEE Trans. Inf. Theory57, 4680–4688 (2011)

7. Cohen, A., Dahmen, W., DeVore, R.A.: Orthogonal matching pursuit under the restricted isometry property. Constr. Approx.45, 113–127 (2017)

8. Liu, E., Temlyakov, V.N.: The orthogonal super greedy algorithm and applications in compressed sensing. IEEE Trans. Inf. Theory58, 2040–2047 (2012)

9. Wei, D.: Analysis of orthogonal multi-matching pursuit under restricted isometry property. Sci. China Math.57, 2179–2188 (2014)

(21)

11. Wei, X.J., Ye, P.X.: Estimates of restricted isometry constant in super greedy algorithms. Int. J. Future Gener. Commun. Netw.8, 137–144 (2015)

12. Yi, S., Li, B., Pan, W.L., Li, J.: Analysis of generalised orthogonal matching pursuit using restricted isometry constant. Electron. Lett.14, 1020–1022 (2014)

13. Baraniuk, R.G.: Compressive sensing. IEEE Signal Process. Mag.24, 118–121 (2007)

14. Candes, E.J.: Compressive sampling. In: Int. Congress of Mathematics, vol. 3, pp. 1433–1452 (2006) 15. Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theory52, 1289–1306 (2006)

16. Candes, E.J., Tao, T.: Decoding by linear programming. IEEE Trans. Inf. Theory51, 4203–4215 (2005)

17. Candes, E.J., Tao, T.: Near-optimal signal recovery from random projections: universal encoding strategies? IEEE Trans. Inf. Theory52, 5406–5425 (2005)

18. Satpathi, S., Das, R., Chakraborty, M.: Improving the bound on the rip constant in generalized orthogonal matching pursuit. IEEE Signal Process. Lett.20, 1074–1077 (2013)

19. Candes, E.J., Tao, T.: The Dantzig selector: statistical estimation whenpis much larger thann. Ann. Stat.35, 2313–2351 (2007)

20. Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat.32, 407–451 (2004)