Efficiency of orthogonal super greedy algorithm under the restricted isometry property
Xiujie Wei$^{1*}$ and Peixin Ye$^{2}$
*Correspondence: [email protected]
$^{1}$School of Sciences, Tianjin Chengjian University, Tianjin, China. Full list of author information is available at the end of the article.
Abstract
We investigate the efficiency of the orthogonal super greedy algorithm (OSGA) for sparse recovery and approximation under the restricted isometry property (RIP). We first show that, under RIP conditions on the measurement matrix $\Phi$ and a lower bound on the minimum magnitude of the nonzero coordinates of the signal, OSGA with explicit stopping rules recovers the support of an arbitrary $K$-sparse signal $x$ from $y = \Phi x + e$ in at most $K$ steps, where the noise vector $e$ is $\ell_2$ bounded or $\ell_\infty$ bounded. We then investigate the error performance of OSGA in $m$-term approximation with respect to dictionaries satisfying the RIP in a separable Hilbert space. We establish a Lebesgue-type inequality for OSGA and, based on this inequality, obtain the optimal rate of convergence for the sparse class induced by such dictionaries.
Keywords: Orthogonal super greedy algorithm; Orthogonal multi matching pursuit; Restricted isometry property; Compressed sensing; $m$-term approximation; Lebesgue-type inequality
1 Introduction
Recovery and approximation by sparse linear combinations of elements from a fixed redundant family are frequently used in many application areas, such as image and signal processing, PDE solvers, and statistical learning; see [1]. In general, these problems are NP-hard. It is well known that greedy-type algorithms are efficient approaches to solving them; see [2–5]. Among these, the orthogonal greedy algorithm (OGA) has been widely used in practice. OGA is a simple yet powerful algorithm for highly nonlinear sparse approximation that has been studied extensively; see [3, 6, 7] and the references therein.
In this paper, we consider the orthogonal super greedy algorithm (OSGA), which is more efficient than OGA from the viewpoint of computational complexity. The performance of OSGA in greedy approximation and signal recovery was analyzed in [8–12]. We study the efficiency of OSGA further from two aspects.
In Sect. 2, we study the efficiency of OSGA in recovering an $N$-dimensional sparse signal from linear measurements. This topic is also known in the literature as compressed sensing (CS) [13–15]. We show that OSGA can recover a $K$-sparse signal from noisy measurements in at most $K$ steps. We remark that in the field of signal processing, OSGA is also known as orthogonal multi matching pursuit (OMMP) [4]. So, for the reader's convenience, we use the term OMMP instead of OSGA in Sect. 2.
In Sect. 3, we study the error performance of OSGA in a general context. We investigate the efficiency of OSGA for $m$-term approximation with respect to dictionaries in a real separable Hilbert space $H$. Assuming that the dictionary satisfies the RIP condition, we establish a Lebesgue-type inequality for OSGA, which bounds the error of OSGA by the best $m$-term approximation error. Based on this inequality, we derive the sharp convergence rate of OSGA on the sparse class induced by a dictionary satisfying the RIP condition.
2 The efficiency of the OMMP in CS
In this section, we analyze the efficiency of the OMMP in compressed sensing.
We consider the following model. Suppose that $x \in \mathbb{R}^N$ is an unknown $N$-dimensional signal and we wish to recover it from $M$ noisy measurements $y$ given by inner products with fixed vectors, that is,
\[ y = \Phi x + e, \tag{2.1} \]
where $\Phi$ is a known $M \times N$ measurement matrix with $M \ll N$ and $e \in \mathbb{R}^M$ is a vector of measurement errors. The error vector can be zero, bounded, or Gaussian noise. For $x = (x_j)_{j=1}^N \in \mathbb{R}^N$, define
\[ \|x\|_p := \Bigl( \sum_{i=1}^{N} |x_i|^p \Bigr)^{1/p}, \quad 0 < p < \infty, \]
and
\[ \|x\|_\infty := \sup_{1 \le i \le N} |x_i|. \]
The signal $x \in \mathbb{R}^N$ is said to be $K$-sparse if $\|x\|_0 := |\operatorname{supp}(x)| = |\{i : x_i \ne 0\}| \le K < N$. We study the performance of OMMP for the recovery of the support of the $K$-sparse signal $x$ under model (2.1).
Greedy algorithms recover a $K$-sparse signal by iteratively constructing its support according to a greedy selection principle.
OMMP is a stepwise forward selection method that is easy to implement and has been used for signal recovery. We recall the definition of OMMP($s$) in algorithmic form (see Algorithm 1).
OMMP($s$) generalizes the orthogonal greedy algorithm (OGA) in the sense that it selects $s$ indices in each iteration; it can therefore recover a sparse signal in fewer steps and further reduce the complexity. To investigate the efficiency of OMMP in CS, we use the restricted isometry property (RIP) of $\Phi$, which ensures the stable recovery of $x$ from noisy measurements. This property was introduced by Candes and Tao [16, 17] as follows.
A matrix $\Phi$ is said to satisfy the RIP of order $K$ if there exists a constant $\delta \in (0, 1)$ such that, for all $K$-sparse vectors $x$,
\[ (1 - \delta)\|x\|_2^2 \le \|\Phi x\|_2^2 \le (1 + \delta)\|x\|_2^2. \tag{2.2} \]
The minimum of all $\delta$ satisfying (2.2) is referred to as the isometry constant $\delta_K$.
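Algorithm 1 Orthogonal Multi Matching Pursuit (OMMP($s$))
Input: Measurement matrix $\Phi$, vector $y$, the number $s$ of indices selected per iteration, and the stopping criterion.
Step 1: Set the residual $r^0 := y$, a trivial initial approximation $x^0 := 0$, the index set $\Lambda^0 := \emptyset$, and the iteration counter $l = 0$.
Step 2: Define $\Lambda^{l+1} := \Lambda^l \cup \{i_1, \dots, i_s\}$ such that
\[ |\langle r^l, \varphi_{i_1}\rangle| \ge \cdots \ge |\langle r^l, \varphi_{i_s}\rangle| \ge \sup_{\varphi \in \Phi,\ \varphi \ne \varphi_{i_k},\ k = 1, \dots, s} |\langle r^l, \varphi\rangle|. \]
Then
\[ x^{l+1} := \arg\min_{z :\, \operatorname{supp}(z) \subseteq \Lambda^{l+1}} \|y - \Phi z\|_2 \]
and update the residual
\[ r^{l+1} := y - \Phi x^{l+1}. \]
End if the stopping condition is achieved. Otherwise, set $l := l + 1$ and return to Step 2.
Output: If the algorithm stops at the $k$th iteration, output $\Lambda^k$ and $x^k$.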
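The steps of Algorithm 1 can be sketched in a few lines of code. The following is a minimal illustration in plain Python, not the authors' implementation: the function names `ommp` and `least_squares`, the parameters `tol` and `max_iter`, and the tiny test dictionary are all assumptions made for this example. The least-squares step solves the normal equations by Gaussian elimination, which assumes the selected columns are linearly independent, and the selection is restricted to columns not yet chosen (their correlation with the residual is zero anyway).

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def least_squares(cols, y):
    """Solve min_c ||y - sum_j c_j cols[j]||_2 via the normal equations."""
    n = len(cols)
    # Augmented Gram system [G | b], G_ij = <col_i, col_j>, b_i = <col_i, y>.
    A = [[dot(cols[i], cols[j]) for j in range(n)] + [dot(cols[i], y)]
         for i in range(n)]
    for c in range(n):  # Gaussian elimination with partial pivoting
        p = max(range(c, n), key=lambda row: abs(A[row][c]))
        A[c], A[p] = A[p], A[c]
        for row in range(c + 1, n):
            f = A[row][c] / A[c][c]
            for k in range(c, n + 1):
                A[row][k] -= f * A[c][k]
    coef = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(A[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (A[i][n] - tail) / A[i][i]
    return coef

def ommp(columns, y, s, tol, max_iter):
    """Sketch of OMMP(s); `columns` lists the unit-norm columns of Phi."""
    support, coef, residual = [], [], list(y)
    for _ in range(max_iter):
        if math.sqrt(dot(residual, residual)) <= tol:  # l2 stopping rule
            break
        # Pick the s unselected columns most correlated with the residual.
        rest = [i for i in range(len(columns)) if i not in support]
        rest.sort(key=lambda i: -abs(dot(residual, columns[i])))
        support += rest[:s]
        # Orthogonal projection of y onto the span of the selected columns.
        chosen = [columns[i] for i in support]
        coef = least_squares(chosen, y)
        fit = [sum(c * col[t] for c, col in zip(coef, chosen))
               for t in range(len(y))]
        residual = [a - b for a, b in zip(y, fit)]
    return support, coef

# Hypothetical example: three coordinate vectors plus one diagonal atom.
a = 1.0 / math.sqrt(3.0)
columns = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [a, a, a]]
y = [2.0, 0.0, 0.0]  # y = Phi x with x = 2 e_1 and no noise
support1, coef1 = ommp(columns, y, s=1, tol=1e-9, max_iter=1)
support2, coef2 = ommp(columns, y, s=2, tol=1e-9, max_iter=1)
```

With $s = 2$ the sketch selects one wrong index together with the correct one, and the projection step assigns it a zero coefficient, which is the behavior the support-recovery analysis relies on.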
We recall some results on the efficiency of OMMP based on the RIP. These results concern upper bounds on the RIP constant $\delta_{sK}$ of order $sK$. In the noiseless case, Liu and Temlyakov [8] proved that $\delta_{sK} < \frac{\sqrt{s}}{(2+\sqrt{2})\sqrt{K}}$ is a sufficient condition for OMMP($s$) to recover every $K$-sparse signal in at most $K$ iterations. Then Wang, Kwon, and Shim [10] relaxed the condition to $\delta_{sK} < \frac{\sqrt{s}}{3\sqrt{s}+\sqrt{K}}$. The condition was further relaxed to $\delta_{sK} < \frac{\sqrt{s}}{2\sqrt{s}+\sqrt{K}}$ by Satpathi, Das, and Chakraborty [18]. In [18] the authors also pointed out the possibility of extending the results to the noisy case. On the other hand, Dan Wei [9] defined the restricted orthogonality constant $\theta_{K,K'}$ and proved that $\delta_{sK-s+1} + \sqrt{\frac{K}{s}}\,\theta_{sK-s+1,s} < 1$ is a sufficient condition guaranteeing the perfect recovery of $K$-sparse signals by OMMP. She showed that this result implies the result of [18]. Moreover, the performance of OMMP for support recovery from noisy measurements was also discussed in [9].
Motivated by the above works, we investigate OMMP in the general setting where noise is present, under RIP-based conditions. We prove that if the sampling matrix $\Phi$ satisfies the RIP of order $sK$ with the isometry constant
\[ \delta_{sK} < 1 + \frac{1}{2}\sqrt{\frac{K}{s}} - \frac{1}{2}\sqrt{\frac{K}{s} + 4\sqrt{\frac{K}{s}}}, \]
then OMMP can recover every $K$-sparse signal in at most $K$ steps with $\ell_2$ bounded and $\ell_\infty$ bounded noise. It is easy to check that
\[ \frac{\sqrt{s}}{2\sqrt{s}+\sqrt{K}} < 1 + \frac{1}{2}\sqrt{\frac{K}{s}} - \frac{1}{2}\sqrt{\frac{K}{s} + 4\sqrt{\frac{K}{s}}}. \]
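The comparison of the two bounds can also be checked numerically. The short sketch below evaluates both sides for a few hypothetical pairs $(s, K)$; the function names are chosen for this illustration only. Writing $t = \sqrt{K/s}$, the new bound equals $1 + t/2 - \tfrac{1}{2}\sqrt{t^2 + 4t}$, which is the smaller root of $\delta^2 - (2 + t)\delta + 1 = 0$, while the bound of [18] equals $1/(2 + t)$.

```python
import math

def satpathi_bound(s: int, K: int) -> float:
    """Bound sqrt(s) / (2 sqrt(s) + sqrt(K)) from [18]."""
    return math.sqrt(s) / (2.0 * math.sqrt(s) + math.sqrt(K))

def relaxed_bound(s: int, K: int) -> float:
    """Bound 1 + t/2 - sqrt(t^2 + 4t)/2 with t = sqrt(K/s)."""
    t = math.sqrt(K / s)
    return 1.0 + 0.5 * t - 0.5 * math.sqrt(t * t + 4.0 * t)

# Hypothetical (s, K) pairs used only to illustrate the inequality.
pairs = [(1, 1), (1, 10), (2, 10), (5, 100), (10, 1000)]
gaps = [relaxed_bound(s, K) - satpathi_bound(s, K) for s, K in pairs]
for (s, K), gap in zip(pairs, gaps):
    print(f"s={s:3d} K={K:5d} gap={gap:.3e}")
```

The gap is positive in every case but shrinks as $K/s$ grows, so the improvement over [18] is most visible when $s$ is comparable to $K$.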
When dealing with noisy measurements, one of the key components of OMMP is the stopping rule, which depends on the structure of the noise. In general, there are several natural stopping rules for OMMP.
(1) Stop after a fixed number of iterations $K$.
(2) $\ell_2$ bounded noise: the error vector $e$ is bounded in the $\ell_2$ norm with $\|e\|_2 \le b_2$, and we set the stopping rule as $\|r^k\|_2 \le b_2$.
(3) $\ell_\infty$ bounded noise: the error vector $e$ satisfies $\|\Phi^* e\|_\infty \le b_\infty$, and we set the stopping rule as
\[ \|\Phi^* r^k\|_\infty \le \Bigl(1 + \frac{\sqrt{sK}\,\delta_{sK}}{1 - \delta_{sK}}\Bigr) b_\infty, \]
where $\Phi^*$ is the transpose of $\Phi$.
Since we mainly consider $\ell_2$ bounded and $\ell_\infty$ bounded noise, we use stopping rules (2) and (3), respectively. In Sect. 2.1, we introduce some notation and present some consequences of the restricted isometry property. In Sect. 2.2, we give RIP-based sufficient conditions for OMMP($s$) in the $\ell_2$ bounded noise case using stopping rule (2). In Sect. 2.3, we establish the corresponding results in the $\ell_\infty$ bounded noise case using stopping rule (3). In Sect. 2.4, we give RIP-based sufficient conditions under which OMMP($s$) succeeds with high probability in the Gaussian noise case.
2.1 Preliminary
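The following notation will be used in this section. For a signal $x \in \mathbb{R}^N$, let $\Omega = \{1, \dots, N\}$ denote the index set of $x$ and $T = \operatorname{supp}(x) = \{i : x_i \ne 0\}$ its support.
Suppose that $\varphi_1, \varphi_2, \dots, \varphi_N$ are the columns of the matrix $\Phi$. We always assume that $\|\varphi_i\|_2 = 1$, $1 \le i \le N$. For a subset $\Lambda$ of $\Omega$, let $\Phi_\Lambda$ denote the matrix $\Phi$ restricted to the columns indexed by $\Lambda$. We define $x_\Lambda$ for the vector $x \in \mathbb{R}^N$ in the same way. Thus
\[ \Phi_\Lambda x = \Phi x_\Lambda = \Phi_\Lambda x_\Lambda. \tag{2.3} \]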
We recall some useful consequences of the RIP.
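Lemma 2.1 ([5]) Suppose that $\Gamma$ and $\Lambda$ are two disjoint sets of indices. If the matrix $\Phi$ satisfies the RIP of order $|\Gamma \cup \Lambda|$ with isometry constant $\delta_{|\Gamma \cup \Lambda|}$, then for any $x \in \mathbb{R}^{|\Lambda|}$ we have
\[ \|\Phi_\Gamma^* \Phi_\Lambda x\|_2 \le \delta_{|\Gamma \cup \Lambda|} \|x\|_2 \tag{2.4} \]
and
\[ \|(\Phi_\Lambda^* \Phi_\Lambda)^{-1} x\|_2 \le \frac{1}{1 - \delta_{|\Lambda|}} \|x\|_2. \tag{2.5} \]
Lemma 2.2 ([6]) Let $x \in \mathbb{R}^N$ be a $K$-sparse vector. Suppose that the matrix $\Phi$ satisfies the RIP of order $K$ with isometry constant $\delta_K$. Then, for $\Lambda \subset \Omega$,
\[ (1 - \delta_K)\|x_{T \setminus \Lambda}\|_2 \le \|\Phi_{T \setminus \Lambda}^* (I - \Phi_\Lambda \Phi_\Lambda^\dagger) \Phi_{T \setminus \Lambda} x_{T \setminus \Lambda}\|_2 \le (1 + \delta_K)\|x_{T \setminus \Lambda}\|_2, \tag{2.6} \]
where $\Phi_\Lambda^\dagger := (\Phi_\Lambda^* \Phi_\Lambda)^{-1}\Phi_\Lambda^*$ denotes the pseudo-inverse of $\Phi_\Lambda$.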
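The residual vector after $k$ steps can be written as
\[
\begin{aligned}
r^k &= y - \Phi_{\Lambda^k}\Phi_{\Lambda^k}^\dagger y = (I - P_k) y \\
&= (I - P_k)(\Phi x + e) \\
&= (I - P_k)\Phi x + (I - P_k) e \\
&= r_1^k + r_2^k,
\end{aligned}
\]
where $P_k = \Phi_{\Lambda^k}\Phi_{\Lambda^k}^\dagger$ denotes the projection onto the linear space spanned by $\{\varphi_i, i \in \Lambda^k\}$, $r_1^k = (I - P_k)\Phi x$ is the signal part of the residual, and $r_2^k = (I - P_k)e$ is the noise part of the residual. Denote
\[ T^k = \Lambda^k \cap T, \quad A^k = T \setminus \Lambda^k, \quad B^k = T \cup \Lambda^k. \]
Suppose that $S^k \subseteq \Omega \setminus B^k$, $|S^k| = s$, is such that
\[ \min_{i \in S^k} |\langle r^k, \varphi_i\rangle| \ge \max_{i \in (\Omega \setminus B^k) \setminus S^k} |\langle r^k, \varphi_i\rangle|. \]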
The difference between OMMP($s$) and the standard orthogonal matching pursuit (OMP) is that at each step of OMMP($s$), $s$ elements are selected simultaneously from the dictionary instead of one. To obtain a sufficient condition guaranteeing that OMMP($s$) selects at least one correct index at each iteration, we need a lower bound on $M_{A^k} - m_{S^k}$. We therefore provide the following lemma, which plays an important role in our analysis.
Lemma 2.3 Let $m_{S^k} := \min_{i \in S^k} |\langle r_1^k, \varphi_i\rangle|$ and $M_{A^k} := \max_{i \in A^k} |\langle r_1^k, \varphi_i\rangle|$. Assume that $A^k \ne \emptyset$. Then
\[ m_{S^k} \le \frac{\delta_{sK}}{\sqrt{s}\,(1 - \delta_{sK})} \|x_{A^k}\|_2 \tag{2.7} \]
and
\[ M_{A^k} \ge \frac{1 - \delta_{sK}}{\sqrt{|A^k|}} \|x_{A^k}\|_2. \tag{2.8} \]
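Proof. Using the fact that $(I - P_k)\Phi_{\Lambda^k} = \Phi_{\Lambda^k} - \Phi_{\Lambda^k}(\Phi_{\Lambda^k}^*\Phi_{\Lambda^k})^{-1}\Phi_{\Lambda^k}^*\Phi_{\Lambda^k} = 0$, we express $r_1^k = (I - P_k)\Phi x$ as
\[
\begin{aligned}
r_1^k &= (I - P_k)\Phi_T x_T = (I - P_k)(\Phi_{T^k} x_{T^k} + \Phi_{A^k} x_{A^k}) \\
&= (I - P_k)\Phi_{A^k} x_{A^k} = (I - \Phi_{\Lambda^k}\Phi_{\Lambda^k}^\dagger)\Phi_{A^k} x_{A^k} \\
&= \Phi_{A^k} x_{A^k} - \Phi_{\Lambda^k}\Phi_{\Lambda^k}^\dagger \Phi_{A^k} x_{A^k}.
\end{aligned} \tag{2.9}
\]
Since $\Lambda^k \cap A^k = \emptyset$, applying (2.5) and (2.4) of Lemma 2.1, and the fact that for any two integers $K \le K'$ it holds that
\[ \delta_K \le \delta_{K'}, \tag{2.10} \]
it follows that
\[
\begin{aligned}
\|\Phi_{\Lambda^k}^\dagger \Phi_{A^k} x_{A^k}\|_2 &= \|(\Phi_{\Lambda^k}^*\Phi_{\Lambda^k})^{-1}\Phi_{\Lambda^k}^*\Phi_{A^k} x_{A^k}\|_2 \\
&\le \frac{1}{1 - \delta_{sk}} \|\Phi_{\Lambda^k}^*\Phi_{A^k} x_{A^k}\|_2 \\
&\le \frac{\delta_{sk + |A^k|}}{1 - \delta_{sk}} \|x_{A^k}\|_2 \le \frac{\delta_{sK}}{1 - \delta_{sK}} \|x_{A^k}\|_2.
\end{aligned} \tag{2.11}
\]
By (2.9), (2.11), and Lemma 2.1, we have
\[
\begin{aligned}
\|\Phi_{S^k}^* r_1^k\|_2 &= \|\Phi_{S^k}^*(\Phi_{A^k} x_{A^k} - \Phi_{\Lambda^k}\Phi_{\Lambda^k}^\dagger \Phi_{A^k} x_{A^k})\|_2 \\
&\le \|\Phi_{S^k}^*\Phi_{A^k} x_{A^k}\|_2 + \|\Phi_{S^k}^*\Phi_{\Lambda^k}\Phi_{\Lambda^k}^\dagger \Phi_{A^k} x_{A^k}\|_2 \\
&\le \delta_{s + |A^k|}\|x_{A^k}\|_2 + \delta_{s(k+1)}\|\Phi_{\Lambda^k}^\dagger \Phi_{A^k} x_{A^k}\|_2 \\
&\le \delta_{sK}\|x_{A^k}\|_2 + \delta_{sK}\frac{\delta_{sK}}{1 - \delta_{sK}}\|x_{A^k}\|_2 \\
&= \frac{\delta_{sK}}{1 - \delta_{sK}}\|x_{A^k}\|_2.
\end{aligned} \tag{2.12}
\]
Thus
\[ m_{S^k} \le \frac{1}{\sqrt{s}}\|\Phi_{S^k}^* r_1^k\|_2 \le \frac{\delta_{sK}}{\sqrt{s}\,(1 - \delta_{sK})}\|x_{A^k}\|_2. \]
Using (2.9) and Lemma 2.2, we obtain
\[
\begin{aligned}
\|\Phi_{A^k}^* r_1^k\|_2 &= \|\Phi_{A^k}^*(I - \Phi_{\Lambda^k}\Phi_{\Lambda^k}^\dagger)\Phi_{A^k} x_{A^k}\|_2 \\
&= \|\Phi_{T \setminus \Lambda^k}^*(I - \Phi_{\Lambda^k}\Phi_{\Lambda^k}^\dagger)\Phi_{T \setminus \Lambda^k} x_{T \setminus \Lambda^k}\|_2 \\
&\ge (1 - \delta_{sK})\|x_{T \setminus \Lambda^k}\|_2 = (1 - \delta_{sK})\|x_{A^k}\|_2.
\end{aligned} \tag{2.13}
\]
Hence
\[ M_{A^k} \ge \frac{1}{\sqrt{|A^k|}}\|\Phi_{A^k}^* r_1^k\|_2 \ge \frac{1 - \delta_{sK}}{\sqrt{|A^k|}}\|x_{A^k}\|_2. \]
Therefore Lemma 2.3 is proved.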
2.2 l2bounded noise
We now discuss the performance of OMMP($s$) in the $\ell_2$ bounded noise case, that is, the error vector $e$ is bounded in the $\ell_2$ norm with $\|e\|_2 \le b_2$ for some constant $b_2 > 0$. With a suitable stopping rule and a reasonable condition on the minimum magnitude of the nonzero coordinates of the $K$-sparse signal $x$, OMMP($s$) can recover the support of $x$.
Theorem 2.1 Suppose that $\|e\|_2 \le b_2$ and $\Phi$ satisfies the RIP of order $sK$ with the isometry constant $\delta := \delta_{sK} < 1 + \frac{1}{2}\sqrt{\frac{K}{s}} - \frac{1}{2}\sqrt{\frac{K}{s} + 4\sqrt{\frac{K}{s}}}$. Then OMMP($s$) with the stopping rule $\|r^k\|_2 \le b_2$ recovers the support of any $K$-sparse signal $x \in \mathbb{R}^N$ from $y = \Phi x + e$ provided that all the nonzero components $x_i$ satisfy
\[ |x_i| > \frac{2 b_2}{(1 - \delta) - \sqrt{\frac{K}{s}}\,\frac{\delta}{1 - \delta}}. \tag{2.14} \]
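Proof. Denote $E_k = \max_{i \in \Omega} |\langle r_2^k, \varphi_i\rangle|$. Recalling the definitions of $m_{S^k}$ and $M_{A^k}$ in Lemma 2.3, we have
\[ \min_{i \in S^k} |\langle r^k, \varphi_i\rangle| \le \min_{i \in S^k} |\langle r_1^k, \varphi_i\rangle| + \max_{i \in \Omega} |\langle r_2^k, \varphi_i\rangle| = m_{S^k} + E_k \]
and
\[ \max_{i \in A^k} |\langle r^k, \varphi_i\rangle| \ge \max_{i \in A^k} |\langle r_1^k, \varphi_i\rangle| - \max_{i \in \Omega} |\langle r_2^k, \varphi_i\rangle| = M_{A^k} - E_k. \]
It is clear that in order for OMMP($s$) to select at least one correct index at this step, it is necessary to have
\[ \max_{i \in A^k} |\langle r^k, \varphi_i\rangle| > \min_{i \in S^k} |\langle r^k, \varphi_i\rangle|. \]
A sufficient condition guaranteeing that OMMP($s$) selects at least one correct index at each iteration until all correct indices are selected is
\[ M_{A^k} - E_k > m_{S^k} + E_k. \]
That is,
\[ 2E_k < M_{A^k} - m_{S^k} \tag{2.15} \]
holds for $k = 0, 1, \dots, K - 1$. We first provide an upper bound for $E_k$. The assumption $\|e\|_2 \le b_2$ yields that
\[ E_k = \max_{i \in \Omega} |\langle r_2^k, \varphi_i\rangle| \le \max_{i \in \Omega} \|\varphi_i\|_2 \|r_2^k\|_2 = \|(I - P_k)e\|_2 \le \|e\|_2 \le b_2. \tag{2.16} \]
Next we give a lower bound for $M_{A^k} - m_{S^k}$. Using Lemma 2.3 and assumption (2.14), we have
\[
\begin{aligned}
M_{A^k} - m_{S^k} &\ge \Bigl(\frac{1 - \delta}{\sqrt{|A^k|}} - \frac{\delta}{\sqrt{s}\,(1 - \delta)}\Bigr)\|x_{A^k}\|_2 \\
&\ge \Bigl(\frac{1 - \delta}{\sqrt{|A^k|}} - \frac{\delta}{\sqrt{s}\,(1 - \delta)}\Bigr)\sqrt{|A^k|}\,\frac{2 b_2}{(1 - \delta) - \sqrt{\frac{K}{s}}\,\frac{\delta}{1 - \delta}} \\
&> 2 b_2,
\end{aligned}
\]
where the last step uses $|A^k| \le K$. Together with (2.16), this shows that (2.15) holds.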
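This means that OMMP($s$) selects at least one correct index at the $(k+1)$th iteration, so OMMP($s$) recovers the support of $x$ in at most $K$ iterations. It remains to show that, under the stopping rule $\|r^k\|_2 \le b_2$, OMMP($s$) stops exactly when all the correct indices have been selected.
First, assume that $A^k = \emptyset$. Then $T \subseteq \Lambda^k$ and $(I - P_k)\Phi x = 0$. Thus
\[ \|r^k\|_2 = \|(I - P_k)\Phi x + (I - P_k)e\|_2 = \|(I - P_k)e\|_2 \le \|e\|_2 \le b_2. \]
So when all the $K$ correct indices are selected, the $\ell_2$ norm of the residual is at most $b_2$, and hence OMMP($s$) stops. Now we show that OMMP($s$) does not stop early. Suppose that OMMP($s$) has run $k$ steps for some $k < K$. We verify that $\|r^k\|_2 > b_2$, so OMMP($s$) does not stop at the current step.
Secondly, assume that $A^k \ne \emptyset$. Then
\[
\begin{aligned}
\|r^k\|_2 &= \|(I - P_k)\Phi x + (I - P_k)e\|_2 \\
&\ge \|(I - P_k)\Phi x\|_2 - \|(I - P_k)e\|_2 \\
&= \|\Phi_{A^k} x_{A^k} - \Phi_{\Lambda^k}\Phi_{\Lambda^k}^\dagger \Phi_{A^k} x_{A^k}\|_2 - \|(I - P_k)e\|_2 \\
&= \|\Phi_{B^k} x^{(k)}\|_2 - \|(I - P_k)e\|_2 \\
&\ge \|\Phi_{B^k} x^{(k)}\|_2 - \|e\|_2 \\
&\ge \sqrt{1 - \delta_{|B^k|}}\,\|x^{(k)}\|_2 - b_2 \\
&\ge \sqrt{1 - \delta}\,\|x_{A^k}\|_2 - b_2 \\
&\ge \sqrt{1 - \delta}\,\sqrt{|A^k|}\,\frac{2 b_2}{(1 - \delta) - \sqrt{\frac{K}{s}}\,\frac{\delta}{1 - \delta}} - b_2 \\
&> b_2,
\end{aligned}
\]
where $x^{(k)} = (x_{A^k}, -\Phi_{\Lambda^k}^\dagger \Phi_{A^k} x_{A^k})^*$, and the third inequality follows from the RIP condition. Hence OMMP($s$) does not stop early, and the theorem is proved.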
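To get a feel for the size of the signal threshold in (2.14), the following small sketch evaluates its right-hand side. The function name `min_magnitude_l2` and the sample values $\delta = 0.1$, $s = 2$, $K = 8$, $b_2 = 1$ are assumptions made for illustration; they do not come from the paper.

```python
import math

def min_magnitude_l2(delta: float, s: int, K: int, b2: float) -> float:
    """Right-hand side of (2.14): 2 b2 / ((1 - delta) - sqrt(K/s) delta / (1 - delta))."""
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    t = math.sqrt(K / s)
    denom = (1.0 - delta) - t * delta / (1.0 - delta)
    if denom <= 0.0:
        # Equivalent to delta violating the RIP bound of Theorem 2.1.
        raise ValueError("delta is too large for the given s and K")
    return 2.0 * b2 / denom

# Hypothetical values: every nonzero |x_i| must exceed this multiple of b2.
threshold = min_magnitude_l2(delta=0.1, s=2, K=8, b2=1.0)
print(f"minimum magnitude: {threshold:.4f}")
```

As $\delta$ approaches the RIP bound of Theorem 2.1 the denominator tends to zero, so the required minimum magnitude grows without bound.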
2.3 l∞bounded noise
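We now give RIP-based sufficient conditions for OMMP($s$) with $\ell_\infty$ bounded noise. Our result is the following theorem.
Theorem 2.2 Suppose that $\|\Phi^* e\|_\infty \le b_\infty$ and $\Phi$ satisfies the RIP of order $sK$ with the same isometry constant as in Theorem 2.1. Then OMMP($s$) with the stopping rule
\[ \|\Phi^* r^k\|_\infty \le \Bigl(1 + \frac{\sqrt{sK}\,\delta}{1 - \delta}\Bigr) b_\infty \]
finds the support of any $K$-sparse signal $x$ if all the nonzero coefficients $x_i$ satisfy
\[ |x_i| > \frac{2\bigl(1 + \frac{\sqrt{sK}\,\delta}{1 - \delta}\bigr) b_\infty}{(1 - \delta) - \sqrt{\frac{K}{s}}\,\frac{\delta}{1 - \delta}}. \tag{2.17} \]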
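Proof. Denote $E_k = \|\Phi^*(I - P_k)e\|_\infty = \max_{i \in \Omega} |\langle r_2^k, \varphi_i\rangle|$. We first prove that
\[ 2E_k < M_{A^k} - m_{S^k} \tag{2.18} \]
holds for $k = 0, 1, \dots, K - 1$, where $m_{S^k}$ and $M_{A^k}$ are defined as in Lemma 2.3. It follows from the proof of Theorem 2.1 that inequality (2.18) ensures that OMMP recovers the true support of $x$ in at most $K$ steps.
We first provide an upper bound for $E_k$. The assumption $\|\Phi^* e\|_\infty \le b_\infty$ and the fact that $\Phi_{\Lambda^k}^*(I - P_k)e = 0$ yield that
\[
\begin{aligned}
E_k &= \max_{i \in \Omega} |\langle r_2^k, \varphi_i\rangle| = \max_{i \in \Omega \setminus \Lambda^k} |\varphi_i^*(I - P_k)e| \\
&\le \|\Phi_{\Omega \setminus \Lambda^k}^* e\|_\infty + \|\Phi_{\Omega \setminus \Lambda^k}^* P_k e\|_\infty \\
&\le b_\infty + \max_{i \in \Omega \setminus \Lambda^k} \|\varphi_i^* \Phi_{\Lambda^k}(\Phi_{\Lambda^k}^*\Phi_{\Lambda^k})^{-1}\Phi_{\Lambda^k}^* e\|_2 \\
&\le b_\infty + \delta\,\|(\Phi_{\Lambda^k}^*\Phi_{\Lambda^k})^{-1}\Phi_{\Lambda^k}^* e\|_2 \\
&\le b_\infty + \frac{\delta}{1 - \delta}\|\Phi_{\Lambda^k}^* e\|_2 \\
&\le b_\infty + \frac{\delta}{1 - \delta}\sqrt{sk}\, b_\infty \\
&\le \Bigl(1 + \frac{\delta\sqrt{sK}}{1 - \delta}\Bigr) b_\infty.
\end{aligned} \tag{2.19}
\]
Here, to derive the third and fourth inequalities, we have used (2.4) and (2.5) of Lemma 2.1, respectively.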
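On the other hand, we can obtain a lower bound for $M_{A^k} - m_{S^k}$. By using Lemma 2.3 and assumption (2.17), we have
\[
\begin{aligned}
M_{A^k} - m_{S^k} &\ge \Bigl(\frac{1 - \delta}{\sqrt{|A^k|}} - \frac{\delta}{\sqrt{s}\,(1 - \delta)}\Bigr)\|x_{A^k}\|_2 \\
&\ge \Bigl(\frac{1 - \delta}{\sqrt{|A^k|}} - \frac{\delta}{\sqrt{s}\,(1 - \delta)}\Bigr)\sqrt{|A^k|}\,\frac{2\bigl(1 + \frac{\sqrt{sK}\,\delta}{1 - \delta}\bigr) b_\infty}{(1 - \delta) - \sqrt{\frac{K}{s}}\,\frac{\delta}{1 - \delta}} \\
&> 2\Bigl(1 + \frac{\sqrt{sK}\,\delta}{1 - \delta}\Bigr) b_\infty.
\end{aligned} \tag{2.20}
\]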
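The bounds (2.19) and (2.20) imply that (2.18) holds for $k = 0, 1, \dots, K - 1$. It remains to show that, under the stopping rule $\|\Phi^* r^k\|_\infty \le \bigl(1 + \frac{\sqrt{sK}\,\delta_{sK}}{1 - \delta_{sK}}\bigr) b_\infty$, OMMP($s$) stops exactly when all the correct indices have been selected.
First, assume that $A^k = \emptyset$. Then $T \subseteq \Lambda^k$ and $(I - P_k)\Phi x = 0$. Thus
\[ \|\Phi^* r^k\|_\infty = \|\Phi^*(I - P_k)(\Phi x + e)\|_\infty = \|\Phi^*(I - P_k)e\|_\infty \le \Bigl(1 + \frac{\delta\sqrt{sK}}{1 - \delta}\Bigr) b_\infty, \]
so OMMP($s$) stops once all the correct indices have been selected.
Secondly, assume that $A^k \ne \emptyset$. Then
\[
\begin{aligned}
\|\Phi^* r^k\|_\infty &= \|\Phi^*(I - P_k)(\Phi x + e)\|_\infty \\
&\ge \|\Phi^*(I - P_k)\Phi x\|_\infty - \|\Phi^*(I - P_k)e\|_\infty \\
&\ge \max_{i \in A^k} |\langle (I - P_k)\Phi x, \varphi_i\rangle| - \|\Phi^*(I - P_k)e\|_\infty \\
&\ge \frac{1 - \delta}{\sqrt{|A^k|}}\|x_{A^k}\|_2 - \Bigl(1 + \frac{\delta\sqrt{sK}}{1 - \delta}\Bigr) b_\infty \\
&\ge \frac{1 - \delta}{\sqrt{|A^k|}}\sqrt{|A^k|}\,\frac{2\bigl(1 + \frac{\sqrt{sK}\,\delta}{1 - \delta}\bigr) b_\infty}{(1 - \delta) - \sqrt{\frac{K}{s}}\,\frac{\delta}{1 - \delta}} - \Bigl(1 + \frac{\delta\sqrt{sK}}{1 - \delta}\Bigr) b_\infty \\
&= (1 - \delta)\,\frac{2\bigl(1 + \frac{\sqrt{sK}\,\delta}{1 - \delta}\bigr) b_\infty}{(1 - \delta) - \sqrt{\frac{K}{s}}\,\frac{\delta}{1 - \delta}} - \Bigl(1 + \frac{\delta\sqrt{sK}}{1 - \delta}\Bigr) b_\infty \\
&> \Bigl(1 + \frac{\delta\sqrt{sK}}{1 - \delta}\Bigr) b_\infty.
\end{aligned}
\]
Here, to derive the third and fourth inequalities, we have used (2.8) of Lemma 2.3 and assumption (2.17), respectively. Hence OMMP($s$) does not stop early, and the theorem is proved.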
2.4 Gaussian noise
As an application of the above results on the $\ell_2$ and $\ell_\infty$ bounded noise cases, we discuss the performance of OMMP($s$) in recovering $K$-sparse signals with Gaussian noise. Gaussian noise is of particular interest in statistics since it is probably the best model of real noise when the noise source is particularly complex; cf. [19, 20].
The motivation for applying the results on the $\ell_2$ bounded and $\ell_\infty$ bounded noise cases to the Gaussian noise case comes from the following results of Cai, Xu, and Zhang [21], which show that Gaussian noise $e$ is essentially bounded in both the $\ell_2$ and $\ell_\infty$ norms. They have shown that if $e$ is zero-mean white Gaussian noise with covariance $\sigma^2 I_{M \times M}$, that is, $e \sim N(0, \sigma^2 I_{M \times M})$, then
\[ P\Bigl(e \in \Bigl\{e : \|e\|_2 \le \sigma\sqrt{M + 2\sqrt{M \log M}}\Bigr\}\Bigr) \ge 1 - \frac{1}{M} \]
and
\[ P\Bigl(e \in \Bigl\{e : \|\Phi^* e\|_\infty \le \sigma\sqrt{2 \log N}\Bigr\}\Bigr) \ge 1 - \frac{1}{2\sqrt{\pi \log N}}. \]
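The first of these bounds is easy to probe by simulation. The sketch below draws Gaussian noise vectors with Python's standard `random` module and counts how often $\|e\|_2$ falls within the radius $\sigma\sqrt{M + 2\sqrt{M \log M}}$. The function names, the sample sizes, and the values $\sigma = 0.5$, $M = 64$ are assumptions chosen for illustration only; a simulation of this kind illustrates the bound but does not prove it.

```python
import math
import random

def l2_noise_radius(sigma: float, M: int) -> float:
    """Radius sigma * sqrt(M + 2 sqrt(M log M)) of the l2 ball."""
    return sigma * math.sqrt(M + 2.0 * math.sqrt(M * math.log(M)))

def empirical_coverage(sigma: float, M: int, trials: int, seed: int = 0) -> float:
    """Fraction of simulated noise vectors e ~ N(0, sigma^2 I) inside the ball."""
    rng = random.Random(seed)
    radius = l2_noise_radius(sigma, M)
    hits = 0
    for _ in range(trials):
        e = [rng.gauss(0.0, sigma) for _ in range(M)]
        if math.sqrt(sum(v * v for v in e)) <= radius:
            hits += 1
    return hits / trials

# Hypothetical experiment: compare the observed frequency with 1 - 1/M.
M = 64
coverage = empirical_coverage(sigma=0.5, M=M, trials=2000)
print(f"observed {coverage:.4f}, guaranteed at least {1 - 1 / M:.4f}")
```

In such experiments the observed frequency typically sits well above $1 - 1/M$, which suggests that the stopping radius used in the Gaussian case is conservative.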
Combining the above results with Theorems 2.1 and 2.2, we immediately obtain the following theorems.
Theorem 2.3 Suppose that $e \sim N(0, \sigma^2 I_{M \times M})$, that $\Phi$ satisfies the RIP of order $sK$ with the same isometry constant as in Theorem 2.1, and that the nonzero components $x_i$ satisfy
\[ |x_i| > \frac{2\sigma\sqrt{M + 2\sqrt{M \log M}}}{(1 - \delta) - \sqrt{\frac{K}{s}}\,\frac{\delta}{1 - \delta}}. \]
Then OMMP($s$) with the stopping rule $\|r^k\|_2 \le \sigma\sqrt{M + 2\sqrt{M \log M}}$ finds the support of any $K$-sparse signal $x \in \mathbb{R}^N$ from $y = \Phi x + e$ with probability at least $1 - \frac{1}{M}$.
Theorem 2.4 Suppose that $e \sim N(0, \sigma^2 I_{M \times M})$, that $\Phi$ satisfies the RIP of order $sK$ with the same isometry constant as in Theorem 2.1, and that the nonzero components $x_i$ satisfy
\[ |x_i| > \frac{2\bigl(1 + \frac{\sqrt{sK}\,\delta}{1 - \delta}\bigr)\sigma\sqrt{2 \log N}}{(1 - \delta) - \sqrt{\frac{K}{s}}\,\frac{\delta}{1 - \delta}}. \]
Then OMMP($s$) with the stopping rule $\|\Phi^* r^k\|_\infty \le \bigl(1 + \frac{\sqrt{sK}\,\delta}{1 - \delta}\bigr)\sigma\sqrt{2 \log N}$ finds the support of any $K$-sparse signal $x \in \mathbb{R}^N$ from $y = \Phi x + e$ with probability at least $1 - \frac{1}{2\sqrt{\pi \log N}}$.
Theorems 2.3 and 2.4 show that, with a suitable stopping rule and a reasonable condition on the minimum magnitude of the nonzero coordinates of the $K$-sparse signal $x$, OMMP($s$) can recover the support of $x$ in at most $K$ iterations with high probability from measurements with Gaussian noise.
3 Error performance of the OSGA
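In this section, we study the error performance of OSGA in a general context. Let $H$ be a real separable Hilbert space with an inner product $\langle \cdot, \cdot\rangle$ and the norm $\|x\| := \langle x, x\rangle^{1/2}$ for all $x \in H$. Recall that a set $\mathcal{D}$ from $H$ is called a dictionary if
\[ \mathcal{D} = \{\varphi_i, i \in I\} \subset H, \quad \text{and} \quad \overline{\operatorname{span}}\,\mathcal{D} = H. \]
Without loss of generality we assume that the dictionary $\mathcal{D}$ is normalized, i.e.,
\[ \|\varphi_i\| = 1, \quad i \in I. \]
The standard measure of the approximation power of a dictionary is the error of the best $m$-term approximation. Given a dictionary $\mathcal{D}$, we define the $m$-sparse class as
\[ \Sigma_m := \Sigma_m(\mathcal{D}) := \Bigl\{ \sum_{j \in \Lambda} c_j \varphi_j : c_j \in \mathbb{R},\ \varphi_j \in \mathcal{D},\ \#(\Lambda) = m \Bigr\}. \]
The error of the best $m$-term approximation to a function $f \in H$ from the dictionary $\mathcal{D}$ is defined as
\[ \sigma_m(f) := \sigma_m(f, \mathcal{D}) := \inf_{g \in \Sigma_m} \|f - g\|. \tag{3.1} \]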
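For intuition about $\sigma_m(f)$, consider the special case, assumed here only for illustration, in which the dictionary is an orthonormal basis. Then the best $m$-term approximant keeps the $m$ largest coefficients in absolute value, and $\sigma_m(f)$ is the $\ell_2$ norm of the remaining ones. The function name `best_m_term_error` and the sample coefficients below are hypothetical.

```python
import math

def best_m_term_error(coeffs, m):
    """sigma_m(f) when the dictionary is an orthonormal basis (assumed case)."""
    # Discard the m largest coefficients in magnitude; the rest is the error.
    tail = sorted((abs(c) for c in coeffs), reverse=True)[m:]
    return math.sqrt(sum(c * c for c in tail))

# Hypothetical coefficients of f in an orthonormal basis.
coeffs = [3.0, -4.0, 1.0, 0.5]
errors = [best_m_term_error(coeffs, m) for m in range(len(coeffs) + 1)]
for m, err in enumerate(errors):
    print(f"m={m} sigma_m={err:.4f}")
```

For a redundant dictionary no such closed form is available, which is why the error of a greedy algorithm is compared with $\sigma_m(f)$ through a Lebesgue-type inequality instead.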
Now we use OSGA($s$) to generate an $m$-term approximant $f_m$ and estimate the error $\|f - f_m\|$ in terms of $\sigma_m(f)$.
So let us first recall the definition of OSGA($s$) (see Algorithm 2).
To investigate the error performance of OSGA, one needs to make some assumptions on the dictionary $\mathcal{D}$. A simple and useful assumption concerns the coherence $\mu$ of a dictionary $\mathcal{D}$, defined by
\[ \mu := \mu(\mathcal{D}) := \sup_{\varphi, \psi \in \mathcal{D},\ \varphi \ne \psi} |\langle \varphi, \psi\rangle|. \]
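Algorithm 2 Orthogonal Super Greedy Algorithm (OSGA($s$))
Suppose we are given an integer $s$ and a target element $r_0 := f \in H$. For each $m \ge 1$, we inductively define:
(1) $\varphi_{i_{(m-1)s+1}}, \dots, \varphi_{i_{ms}} \in \mathcal{D}$ are the elements of the dictionary $\mathcal{D}$ satisfying the following inequality. Denote $I_m := \{i_{(m-1)s+1}, \dots, i_{ms}\}$ and assume that
\[ \min_{i \in I_m} |\langle r_{m-1}, \varphi_i\rangle| \ge \sup_{\varphi \in \mathcal{D},\ \varphi \ne \varphi_i,\ i \in I_m} |\langle r_{m-1}, \varphi\rangle|. \]
(2) Let $H_m := H_m(f) := \operatorname{span}(\varphi_{i_1}, \dots, \varphi_{i_{ms}})$, and let $P_{H_m}$ denote the operator of orthogonal projection onto $H_m$. Define
\[ f_m := G_m(f) := \mathrm{OSGA}_m(f, \mathcal{D}) := P_{H_m}(f). \]
(3) Define the residual after the $m$th iteration of the algorithm
\[ r_m := r_m^s(f) := f - f_m. \]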
Dictionaries with coherence $\mu$ are called $\mu$-coherent. It was proved in [8] that for $\mu$-coherent dictionaries the OSGA provides the same (in the sense of order) upper bound on the error for the sparse class as the OGA. In this paper, we improve this result under a weaker assumption on the dictionary. This assumption is the RIP, which was formulated in a finite-dimensional context in Sect. 2. Here we require the definition of the RIP in terms of the dictionary instead of the measurement matrix, in an infinite-dimensional context:
A dictionary $\mathcal{D}$ is said to satisfy the RIP of order $k$ if there exists a constant $\delta \in (0, 1)$ such that, for any subset $J \subset I$ with $\#(J) \le k$ and any scalars $a_i$, $i \in J$, the following inequalities hold:
\[ (1 - \delta)\sum_{i \in J} |a_i|^2 \le \Bigl\|\sum_{i \in J} a_i \varphi_i\Bigr\|^2 \le (1 + \delta)\sum_{i \in J} |a_i|^2. \tag{3.2} \]
As before, the minimum of all $\delta$ satisfying (3.2) is referred to as the isometry constant $\delta_k$.
It is known from [22] that if the dictionary $\mathcal{D}$ has coherence $\mu$, then it satisfies the RIP of order $k$ for $k \le \mu^{-1} + 1$ with isometry constant $\delta_k \le \mu(k - 1)$, but not vice versa.
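The relation $\delta_k \le \mu(k - 1)$ can be illustrated on a tiny example. The sketch below computes the coherence of a finite dictionary and checks the two-sided inequality (3.2) with $\delta = \mu(k - 1)$ on a small grid of coefficients. The three unit vectors in the plane, the coefficient grid, and the function names are all assumptions made for this illustration; a grid check of this kind only samples the inequality and does not prove it.

```python
import itertools
import math

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def coherence(dictionary):
    """Largest absolute inner product between two distinct atoms."""
    return max(abs(inner(u, v))
               for u, v in itertools.combinations(dictionary, 2))

def gram_ratio_range(dictionary, k, values=(-1.0, 0.5, 1.0)):
    """Min and max of ||sum a_i phi_i||^2 / sum |a_i|^2 over sampled inputs."""
    lo, hi = math.inf, -math.inf
    dim = len(dictionary[0])
    for subset in itertools.combinations(dictionary, k):
        for coeffs in itertools.product(values, repeat=k):
            combo = [sum(a * phi[t] for a, phi in zip(coeffs, subset))
                     for t in range(dim)]
            ratio = inner(combo, combo) / sum(a * a for a in coeffs)
            lo, hi = min(lo, ratio), max(hi, ratio)
    return lo, hi

# Hypothetical dictionary: three unit vectors at angles 0, 60, 120 degrees.
angles = [0.0, math.pi / 3.0, 2.0 * math.pi / 3.0]
dictionary = [[math.cos(a), math.sin(a)] for a in angles]
k = 2
mu = coherence(dictionary)
lo, hi = gram_ratio_range(dictionary, k)
print(f"mu={mu:.3f}, sampled ratios in [{lo:.3f}, {hi:.3f}]")
```

Here $\mu = 1/2$ and the sampled ratios stay within $[1 - \mu(k-1),\, 1 + \mu(k-1)]$, with both ends attained when the two coefficients have equal magnitude.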
Recently, under the RIP assumption, the authors of [7] proved the almost optimality of OGA by establishing the corresponding Lebesgue-type inequality. This inequality quantifies the efficiency of OGA for individual elements. It is natural to ask whether we can establish a Lebesgue-type inequality for OSGA($s$). We give a positive answer to this question. In fact, we prove the following inequality for OSGA($s$) when the dictionary $\mathcal{D}$ satisfies the RIP. As far as we know, this is the first Lebesgue-type inequality obtained for super greedy type algorithms.
Theorem 3.1 Given a parameter $s \in \mathbb{N}$, there exist fixed constants $A := 88$, $\delta^* := 1/10$, $C := 8$ such that, for all $n \ge 0$, if the dictionary $\mathcal{D}$ satisfies the RIP of order $(As + 1)n$ with isometry constant $\delta_{(As+1)n} \le \delta^*$, then for any $f \in H$ we have
\[ \|r_{An}\| = \|f - f_{An}\| \le C\sigma_n(f, \mathcal{D}). \]
In Theorem 3.1, we remark that the values of $A$, $\delta^*$, $C$ for which the above result holds are coupled. For example, it is possible to obtain a smaller value of $C$ at the price of a larger value of $A$ or a smaller value of $\delta^*$.
Based on this inequality, we derive the convergence rate of OSGA on the sparse class induced by dictionaries satisfying RIP conditions.
The rest of this section is organized as follows. Sections 3.1 and 3.2 are devoted to the proof of Theorem 3.1. In Sect. 3.3, as an application of Theorem 3.1, we study the rate of convergence of OSGA on the sparse class induced by RIP dictionaries.
3.1 Reduction of the residual
To prove Theorem 3.1, we establish the following lemma (Lemma 3.1), which quantifies the reduction of the residuals generated by OSGA($s$) under the RIP condition. When $s = 1$, this lemma was proved in [7]. To deal with the case $s > 1$, however, we need some new techniques. In what follows, we denote by
\[ \Lambda_k := \bigcup_{i=1}^{k} I_i := \{i_1, \dots, i_{sk}\} \]
the set of indices selected after $k$ iterations of OSGA($s$) applied to the target element $f \in H$.
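Lemma 3.1 Let $(f_k)_{k \ge 0}$ be the sequence of approximations generated by OSGA($s$) applied to $f$, and let $g = \sum_{i \in T} z_i \varphi_i$, $\#(T) \le n$. Then, if $T$ is not contained in $\Lambda_k$, we have
\[ \|r_{k+1}\|^2 \le \|r_k\|^2 - \frac{1 - \delta_{\#(T \cup \Lambda_k)}}{(1 + \delta_s)\,\#(T \setminus \Lambda_k)}\bigl(\|r_k\|^2 - \|f - g\|^2\bigr). \tag{3.3} \]
Lemma 3.1 quantifies the reduction of $\|r_k\|$ at each iteration provided that $T$ is not contained in $\Lambda_k$ and that $\|r_k\| \ge \|f - g\|$. Note that in the case when $T \subseteq \Lambda_k$, we have $\|r_k\| \le \|f - g\|$.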
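Proof. We may assume that $\|r_k\| \ge \|f - g\|$; otherwise inequality (3.3) is trivially satisfied. Denote
\[ F_{k+1} := \operatorname{span}(\varphi_i, i \in I_{k+1}). \]
Then $H_{k+1}$ is a direct sum of $H_k$ and $F_{k+1}$. Therefore, since $f_k \in H_k \subset H_{k+1}$,
\[
\begin{aligned}
r_{k+1} &= f - f_{k+1} = r_k + f_k - f_{k+1} \\
&= r_k + f_k - P_{H_{k+1}}(f) \\
&= r_k + f_k - P_{H_{k+1}}(r_k + f_k) \\
&= r_k - P_{H_{k+1}}(r_k).
\end{aligned}
\]
It is clear that $F_{k+1} \subset H_{k+1}$ implies
\[
\begin{aligned}
\|r_{k+1}\|^2 &= \|r_k - P_{H_{k+1}}(r_k)\|^2 \\
&\le \|r_k - P_{F_{k+1}}(r_k)\|^2 \\
&= \|r_k\|^2 - \|P_{F_{k+1}}(r_k)\|^2.
\end{aligned} \tag{3.4}
\]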
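Now we estimate $\|P_{F_{k+1}}(r_k)\|$. Since the dictionary $\mathcal{D}$ satisfies the RIP of order $s$ with isometry constant $\delta_s$, we have
\[ \Bigl\|\sum_{i \in I_{k+1}} a_i \varphi_i\Bigr\|^2 \le (1 + \delta_s)\sum_{i \in I_{k+1}} |a_i|^2. \tag{3.5} \]
Thus
\[
\begin{aligned}
\|P_{F_{k+1}}(r_k)\| &= \sup_{\psi \in F_{k+1},\ \|\psi\| \le 1} |\langle P_{F_{k+1}}(r_k), \psi\rangle| \\
&= \sup_{(c_i)_{i \in I_{k+1}},\ \|\sum_{i \in I_{k+1}} c_i \varphi_i\| \le 1} \Bigl|\sum_{i \in I_{k+1}} \langle r_k, \varphi_i\rangle c_i\Bigr| \\
&\ge \sup_{(c_i)_{i \in I_{k+1}},\ \sum_{i \in I_{k+1}} |c_i|^2 \le (1 + \delta_s)^{-1}} \Bigl|\sum_{i \in I_{k+1}} \langle r_k, \varphi_i\rangle c_i\Bigr| \\
&= (1 + \delta_s)^{-1/2}\Bigl(\sum_{i \in I_{k+1}} |\langle r_k, \varphi_i\rangle|^2\Bigr)^{1/2}.
\end{aligned} \tag{3.6}
\]
Using inequality (3.6), we can continue to estimate (3.4) as
\[
\begin{aligned}
\|r_{k+1}\|^2 &\le \|r_k\|^2 - (1 + \delta_s)^{-1}\sum_{i \in I_{k+1}} |\langle r_k, \varphi_i\rangle|^2 \\
&\le \|r_k\|^2 - (1 + \delta_s)^{-1}\max_{i \in I_{k+1}} |\langle r_k, \varphi_i\rangle|^2.
\end{aligned}
\]
Therefore, to prove inequality (3.3), it suffices to prove that
\[ (1 + \delta_s)^{-1}\max_{i \in I_{k+1}} |\langle r_k, \varphi_i\rangle|^2 \ge \frac{1 - \delta_{\#(T \cup \Lambda_k)}}{(1 + \delta_s)\,\#(T \setminus \Lambda_k)}\bigl(\|r_k\|^2 - \|f - g\|^2\bigr). \tag{3.7} \]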
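To prove this, we first note that
\[
\begin{aligned}
2\sqrt{\|r_k\|^2 - \|f - g\|^2}\,\|g - f_k\| &\le \|r_k\|^2 - \|f - g\|^2 + \|g - f_k\|^2 \\
&= \|r_k\|^2 - \|g - f_k - r_k\|^2 + \|g - f_k\|^2 \\
&\le 2\langle g - f_k, r_k\rangle = 2\langle g, r_k\rangle.
\end{aligned}
\]
This inequality can be written as
\[ \|r_k\|^2 - \|f - g\|^2 \le \frac{|\langle g, r_k\rangle|^2}{\|g - f_k\|^2}. \tag{3.8} \]
If we write $f_k = \sum_{i \in \Lambda_k} c_i^k \varphi_i$ and $\delta := \delta_{\#(T \cup \Lambda_k)}$, then by the RIP the denominator of the right-hand side of inequality (3.8) satisfies
\[
\begin{aligned}
\|g - f_k\|^2 &= \Bigl\|\sum_{i \in T} z_i \varphi_i - \sum_{i \in \Lambda_k} c_i^k \varphi_i\Bigr\|^2 \\
&= \Bigl\|\sum_{i \in T \setminus \Lambda_k} z_i \varphi_i + \sum_{i \in T \cap \Lambda_k} (z_i - c_i^k)\varphi_i + \sum_{i \in \Lambda_k \setminus T} (-c_i^k)\varphi_i\Bigr\|^2 \\
&\ge (1 - \delta)\Bigl(\sum_{i \in T \setminus \Lambda_k} |z_i|^2 + \sum_{i \in T \cap \Lambda_k} |z_i - c_i^k|^2 + \sum_{i \in \Lambda_k \setminus T} |c_i^k|^2\Bigr) \\
&\ge (1 - \delta)\sum_{i \in T \setminus \Lambda_k} |z_i|^2.
\end{aligned}
\]
On the other hand, the numerator of the right-hand side of inequality (3.8) satisfies
\[
\begin{aligned}
|\langle g, r_k\rangle|^2 &= \Bigl|\Bigl\langle \sum_{i \in T} z_i \varphi_i, r_k\Bigr\rangle\Bigr|^2 = \Bigl|\Bigl\langle \sum_{i \in T \setminus \Lambda_k} z_i \varphi_i, r_k\Bigr\rangle\Bigr|^2 \\
&\le \Bigl(\sum_{i \in T \setminus \Lambda_k} |z_i|\,|\langle \varphi_i, r_k\rangle|\Bigr)^2 \\
&\le \max_{i \in I_{k+1}} |\langle r_k, \varphi_i\rangle|^2 \Bigl(\sum_{i \in T \setminus \Lambda_k} |z_i|\Bigr)^2 \\
&\le \max_{i \in I_{k+1}} |\langle r_k, \varphi_i\rangle|^2\ \#(T \setminus \Lambda_k)\sum_{i \in T \setminus \Lambda_k} |z_i|^2.
\end{aligned}
\]
Therefore we obtain
\[ \|r_k\|^2 - \|f - g\|^2 \le \frac{\#(T \setminus \Lambda_k)\,\max_{i \in I_{k+1}} |\langle r_k, \varphi_i\rangle|^2}{1 - \delta_{\#(T \cup \Lambda_k)}}, \]
which implies (3.7). This completes the proof of Lemma 3.1.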
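The following proposition is an immediate consequence of Lemma 3.1.
Proposition 3.1 Assume that, for a given integer $A > 0$ and $\delta^* < 1$, a dictionary $\mathcal{D}$ satisfies the RIP of order $(As + 1)n$ with $\delta_{(As+1)n} \le \delta^*$. If $g = \sum_{i \in T} z_i \varphi_i$, $\#(T) \le n$, then for any triple of non-negative integers $(j, m, L)$ such that $\#(T \setminus \Lambda_j) \le m$ and $j + mL \le (As + 1)n$, we have
\[ \|r_{j+mL}\|^2 \le e^{-\frac{1 - \delta^*}{1 + \delta^*}L}\|r_j\|^2 + \|f - g\|^2. \tag{3.9} \]
Proof. By Lemma 3.1, for $g$ and $(j, m, L)$ as in the statement, we have
\[
\begin{aligned}
\|r_{j+mL}\|^2 - \|f - g\|^2 &\le \Bigl(1 - \frac{1 - \delta^*}{(1 + \delta^*)m}\Bigr)^{mL}\max\bigl\{0, \|r_j\|^2 - \|f - g\|^2\bigr\} \\
&\le e^{-\frac{1 - \delta^*}{1 + \delta^*}L}\max\bigl\{0, \|r_j\|^2 - \|f - g\|^2\bigr\},
\end{aligned}
\]
where we have used the fact that $\#(T \setminus \Lambda_l) \le m$ for all $l \ge j$ and the elementary inequality $1 - x \le e^{-x}$. This implies inequality (3.9), which completes the proof of Proposition 3.1.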
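3.2 Lebesgue-type inequality for OSGA
We are now in a position to prove Theorem 3.1. Fix $f \in H$. We first observe that the assertion of the theorem can be derived from the following lemma.
Lemma 3.2 If $0 \le k < n$ satisfies
\[ \|r_{Ak}\| \le 2\sigma_k(f), \quad \sigma_k(f) > 4\sigma_n(f), \tag{3.10} \]
then there exists $k < \tilde{k} \le n$ such that
\[ \|r_{A\tilde{k}}\| \le 2\sigma_{\tilde{k}}(f), \quad \sigma_{\tilde{k}}(f) \le 4\sigma_n(f). \tag{3.11} \]
Indeed, assuming that Lemma 3.2 holds, we complete the proof of Theorem 3.1 as follows. We let $\tilde{k}$ be the largest integer in $\{0, 1, \dots, n\}$ for which $\|r_{A\tilde{k}}\| \le 2\sigma_{\tilde{k}}(f)$. Since
\[ \|r_0\| = \sigma_0 = \|f\|, \]
such a $\tilde{k}$ exists. If $\tilde{k} = n$, the claim is immediate. If $\tilde{k} < n$, then we must have $\sigma_{\tilde{k}}(f) \le 4\sigma_n(f)$, and therefore
\[ \|r_{An}\| \le \|r_{A\tilde{k}}\| \le 2\sigma_{\tilde{k}}(f) \le 8\sigma_n(f). \tag{3.12} \]
We are therefore left with proving Lemma 3.2. For this, we fix
\[ \delta^* = \frac{1}{10}, \tag{3.13} \]
and $0 \le k < n$ such that (3.10) holds. Let $k < K \le n$ be the first integer such that $\sigma_k > 4\sigma_K$. By the definition of $\sigma_K(f)$, for any $B > 1$, there exists $g = \sum_{j \in T} z_j \psi_j$, $\#(T) = K$, such that
\[ \|f - g\| \le B\sigma_K(f). \]
The significance of $K$ is that on the one hand
\[ \|f - g\| \le B\sigma_K(f) < \frac{B}{4}\sigma_k(f), \tag{3.14} \]
while on the other hand
\[ \sigma_k(f) \le 4\sigma_{K-1}(f), \quad \text{if } K - 1 > k. \tag{3.15} \]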
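To apply Proposition 3.1 for the above $g$ and $j = Ak$, we need to bound $\#(T \setminus \Lambda_{Ak})$ with $A$ yet to be specified. To this end, we write $K = k + M$, with $M > 0$, and notice that if $S \subset T$ is any set with $\#(S) = M$ and $g_S := \sum_{j \in S} z_j \psi_j$, then
\[
\begin{aligned}
\|g_S\| &\ge \|f - (g - g_S)\| - \|f - g\| \\
&\ge \sigma_k(f) - B\sigma_K(f) \\
&\ge \Bigl(1 - \frac{B}{4}\Bigr)\sigma_k(f),
\end{aligned} \tag{3.16}
\]
where we have used the fact that $g - g_S \in \Sigma_k$. Using the RIP, we obtain the following lower bound for the coefficients of $g$: for any set $S \subset T$ with $\#(S) = M$,
\[ \Bigl(1 - \frac{B}{4}\Bigr)^2 \sigma_k^2(f) \le \|g_S\|^2 \le (1 + \delta^*)\sum_{j \in S} |z_j|^2 = \frac{11}{10}\sum_{j \in S} |z_j|^2. \tag{3.17} \]
Take for $S$ the set $S_g$ of the $M$ smallest coefficients of $g$ and note that then, for any more general $S \subset T$ with $\#(S) \ge M$, one has
\[ \frac{\sum_{j \in S} |z_j|^2}{\sum_{j \in S_g} |z_j|^2} \ge \frac{\#(S)}{M}, \]
and hence
\[ \frac{10}{11}\Bigl(1 - \frac{B}{4}\Bigr)^2 \frac{\#(S)}{M}\sigma_k^2(f) \le \sum_{j \in S} |z_j|^2. \tag{3.18} \]
Now we consider the particular set $S := T \setminus \Lambda_{Ak}$ satisfying $\#(S) \ge M$. Combining the above bound with the RIP, we obtain
\[
\begin{aligned}
\frac{10}{11}\Bigl(1 - \frac{B}{4}\Bigr)^2 \frac{\#(S)}{M}\sigma_k^2(f)(1 - \delta^*) &\le \|g_S\|^2 \le \|g - f_{Ak}\|^2 \\
&\le \bigl(\|g - f\| + \|r_{Ak}\|\bigr)^2 \\
&\le \bigl(B\sigma_K(f) + 2\sigma_k(f)\bigr)^2 \\
&\le \Bigl(\frac{B}{4} + 2\Bigr)^2 \sigma_k^2(f).
\end{aligned}
\]
Since $\delta^* = \frac{1}{10}$, this gives the bound
\[ \#(T \setminus \Lambda_{Ak}) \le \frac{11}{9}\,\frac{(\frac{B}{4} + 2)^2}{(1 - \frac{B}{4})^2}\,M \le 11M, \tag{3.19} \]
where the second inequality is obtained by taking $B$ sufficiently close to 1 and using the fact that the left-hand side is an integer.
bound $\|r_{Ak}\|$ directly in terms of $\sigma_l(f)$ for some $l > k$. Accordingly, we use Proposition 3.1 in different ways for the two cases.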
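In the case where $K - 1 > k$, i.e., $M \ge 2$, we apply (3.9) with $j = Ak$, $m = 11M$, and $L = 4$. Indeed, $Ak + Lm = Ak + 44M \le An$ holds for $k + M \le n$ whenever $A \ge 44$. Moreover, notice that for such $A$
\[
\begin{aligned}
A(K - 1) &= Ak + A(M - 1) \\
&\ge Ak + \frac{1}{2}AM \\
&= Ak + \frac{Am}{22} \ge Ak + Lm,
\end{aligned}
\]
whenever
\[ A \ge 88. \tag{3.20} \]
This gives
\[
\begin{aligned}
\|r_{A(K-1)}\|^2 &\le \|r_{Ak+mL}\|^2 \\
&\le e^{-\frac{36}{11}}\|r_{Ak}\|^2 + \|f - g\|^2 \\
&\le 4e^{-\frac{36}{11}}\sigma_k^2(f) + B^2\sigma_K^2(f) \\
&\le 64e^{-\frac{36}{11}}\sigma_{K-1}^2(f) + B^2\sigma_{K-1}^2(f) \\
&\le 4\sigma_{K-1}^2(f),
\end{aligned}
\]
where we have used (3.15) in the fourth inequality, and the last inequality follows by taking $B$ sufficiently close to 1. We thus obtain (3.11) for the value $\tilde{k} = K - 1 > k$.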
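In the case where $K - 1 = k$, i.e., $M = 1$, we apply (3.9) with $j = Ak$, $m = 11$, and $L = 8$. From (3.19), we know that $\#(T \setminus \Lambda_{Ak}) \le 11$, and $An \ge A(k + 1) \ge Ak + mL$ for $A$ satisfying (3.20). This yields
\[
\begin{aligned}
\|r_{A(k+1)}\|^2 &\le \|r_{Ak+mL}\|^2 \\
&\le e^{-\frac{72}{11}}\|r_{Ak}\|^2 + \|f - g\|^2 \\
&\le 4e^{-\frac{72}{11}}\sigma_k^2(f) + B^2\sigma_{k+1}^2(f) \\
&\le \Bigl(4e^{-\frac{72}{11}} + \frac{B^2}{16}\Bigr)\sigma_k^2(f).
\end{aligned}
\]
This implies that $\Lambda_{A(k+1)}$ contains $T$. In fact, if it missed one of the indices $i \in T$, then we would derive from the RIP condition that
\[
\begin{aligned}
(1 - \delta^*)|z_i|^2 &\le \|g - f_{A(k+1)}\|^2 \\
&\le \Bigl(B\sigma_K(f) + \sqrt{4e^{-\frac{72}{11}} + \frac{B^2}{16}}\,\sigma_k(f)\Bigr)^2 \\
&\le \Bigl(\frac{B}{4} + \sqrt{4e^{-\frac{72}{11}} + \frac{B^2}{16}}\Bigr)^2 \sigma_k^2(f).
\end{aligned}
\]
On the other hand, we know from (3.18) that
\[ \frac{10}{11}\Bigl(1 - \frac{B}{4}\Bigr)^2 \sigma_k^2(f) \le |z_i|^2, \tag{3.21} \]
which for $B$ sufficiently close to 1 is a contradiction since
\[ \frac{10}{11}\Bigl(1 - \frac{B}{4}\Bigr)^2 > \frac{10}{9}\Bigl(\frac{B}{4} + \sqrt{4e^{-\frac{72}{11}} + \frac{B^2}{16}}\Bigr)^2. \]
This implies that $\|r_{A(k+1)}\| \le \sigma_{k+1}(f)$, and therefore inequality (3.11) holds for the value $\tilde{k} = k + 1$. This verifies Lemma 3.2 and hence completes the proof of Theorem 3.1.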
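The numerical constants in the two cases of the proof can be sanity-checked directly. The sketch below evaluates them for a value of $B$ slightly above 1; the value $B = 1.001$ and the function name `check_constants` are assumptions for illustration, and the formulas follow our reading of the displayed inequalities (in particular the factor $11/9$ implied by $\delta^* = 1/10$ in the cardinality bound and the square root in the final contradiction).

```python
import math

def check_constants(B: float) -> dict:
    """Evaluate the numerical constants used in the proof of Lemma 3.2."""
    # Cardinality bound: (11/9) (B/4 + 2)^2 / (1 - B/4)^2, an integer bound of 11.
    card = (11.0 / 9.0) * ((B / 4 + 2) / (1 - B / 4)) ** 2
    # Case M >= 2: 64 e^{-36/11} + B^2 should not exceed 4.
    case1 = 64.0 * math.exp(-36.0 / 11.0) + B * B
    # Case M = 1: the two sides of the final contradiction.
    root = math.sqrt(4.0 * math.exp(-72.0 / 11.0) + B * B / 16.0)
    lhs = (10.0 / 11.0) * (1 - B / 4) ** 2
    rhs = (10.0 / 9.0) * (B / 4 + root) ** 2
    return {"card": card, "case1": case1, "lhs": lhs, "rhs": rhs}

c = check_constants(B=1.001)
for name, value in c.items():
    print(f"{name}: {value:.4f}")
```

The margins are comfortable in both cases, which is consistent with the remark after Theorem 3.1 that the constants $A$, $\delta^*$, $C$ are coupled and can be traded against each other.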
3.3 Convergence rate on sparse class
As an application of Theorem 3.1, we derive the convergence rate of OSGA on the sparse class induced by a dictionary. First, we recall the definitions of these classes. For a general dictionary $\mathcal{D}$ in $H$, we define the class of functions
\[ A_1^0(\mathcal{D}) := \Bigl\{ f \in H : f = \sum_{i \in \Lambda} c_i(f)\varphi_i,\ \varphi_i \in \mathcal{D},\ \#(\Lambda) < \infty,\ \sum_{i \in \Lambda} |c_i(f)| \le 1 \Bigr\} \]
and we define $A_1(\mathcal{D})$ to be the closure of $A_1^0(\mathcal{D})$ in $H$. It is well known that the class $A_1(\mathcal{D})$ plays an important role in the study of greedy approximation with respect to dictionaries. In [23], DeVore and Temlyakov proved that, for an arbitrary dictionary $\mathcal{D}$ in $H$, the OGA provides, after $m$ iterations, an approximation of $f \in A_1(\mathcal{D})$ with the following upper bound on the convergence rate:
\[ \|r_m^{OGA}(f)\| \le m^{-1/2}. \]
Note that the rate $m^{-1/2}$ is sharp, since when $\mathcal{D}$ is an orthonormal basis of $H$, it is easy to find $f_0 \in A_1(\mathcal{D})$ such that
\[ \|r_m^{OGA}(f_0)\| = c \cdot m^{-1/2}. \]
Unlike OGA, to get the same convergence rate on $A_1(\mathcal{D})$ for OSGA, one must make some extra assumptions on the dictionary $\mathcal{D}$. A specific feature of a dictionary, $\mu$-coherence, allows us to build OSGA with the same rate of convergence. Assuming that the dictionary $\mathcal{D}$ is $\mu$-coherent, Liu and Temlyakov [8] proved the following theorem.
Theorem 3.2 Let $\mathcal{D}$ be a dictionary in a Hilbert space $H$ with coherence parameter $\mu$. Then, for $s \le (2\mu)^{-1}$, OSGA($s$) provides, after $m$ iterations, an approximation of $f \in A_1(\mathcal{D})$ with the following upper bound on the error:
Now we present our results. As an immediate consequence of Theorem 3.1, we obtain the following theorem.
Theorem 3.3 Given a parameter $s \in \mathbb{N}$, for all $n \ge 1$, if $\mathcal{D}$ is a dictionary in a Hilbert space $H$ satisfying the RIP of order $(88s + 1)n$ with isometry constant $\delta_{(88s+1)n} \le 1/10$, then OSGA($s$) provides, after $88n$ iterations, an approximation of $f \in A_1(\mathcal{D})$ with the following error bound:
\[ \|r_{88n}\| = \|f - f_{88n}\| \le Cn^{-\frac{1}{2}}. \]
Proof. As discussed above, the results of DeVore and Temlyakov [23] imply the following estimate for the best $n$-term approximation error of functions $f \in A_1(\mathcal{D})$:
\[ \sigma_n(f, \mathcal{D}) \le n^{-\frac{1}{2}}, \quad n = 1, 2, \dots. \tag{3.22} \]
Combining (3.22) with Theorem 3.1, we complete the proof of Theorem 3.3.
Acknowledgements
We thank all the referenced authors.
Funding
This work is supported by the National Nature Science Foundation of China [Grant Nos. 11701411, 11271199, 11671213, 11401247].
Availability of data and materials
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
The authors contributed equally and significantly in writing this paper. All authors read and approved the final manuscript.
Author details
$^{1}$School of Sciences, Tianjin Chengjian University, Tianjin, China. $^{2}$School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China.
Received: 3 January 2019 Accepted: 16 April 2019
References
1. DeVore, R.A.: Nonlinear approximation. Acta Numer. 7, 51–150 (1998)
2. Pati, Y.C., Rezaiifar, R., Krishnaprasad, P.S.: Orthogonal Matching Pursuit: Recursive Function Approximation with Applications to Wavelet Decomposition (1993)
3. Lin, J.H., Li, S.: Nonuniform support recovery from noisy measurements by orthogonal matching pursuit. J. Approx. Theory 165, 20–40 (2013)
4. Liu, E., Temlyakov, V.N.: Super greedy type algorithm. Adv. Comput. Math. 37, 493–504 (2012)
5. Needell, D., Tropp, J.A.: CoSaMP: iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harmon. Anal. 26, 301–321 (2009)
6. Cai, T., Wang, L.: Orthogonal matching pursuit for sparse signal recovery with noise. IEEE Trans. Inf. Theory 57, 4680–4688 (2011)
7. Cohen, A., Dahmen, W., DeVore, R.A.: Orthogonal matching pursuit under the restricted isometry property. Constr. Approx. 45, 113–127 (2017)
8. Liu, E., Temlyakov, V.N.: The orthogonal super greedy algorithm and applications in compressed sensing. IEEE Trans. Inf. Theory 58, 2040–2047 (2012)
9. Wei, D.: Analysis of orthogonal multi-matching pursuit under restricted isometry property. Sci. China Math. 57, 2179–2188 (2014)
11. Wei, X.J., Ye, P.X.: Estimates of restricted isometry constant in super greedy algorithms. Int. J. Future Gener. Commun. Netw. 8, 137–144 (2015)
12. Yi, S., Li, B., Pan, W.L., Li, J.: Analysis of generalised orthogonal matching pursuit using restricted isometry constant. Electron. Lett. 14, 1020–1022 (2014)
13. Baraniuk, R.G.: Compressive sensing. IEEE Signal Process. Mag. 24, 118–121 (2007)
14. Candes, E.J.: Compressive sampling. In: Int. Congress of Mathematics, vol. 3, pp. 1433–1452 (2006)
15. Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theory 52, 1289–1306 (2006)
16. Candes, E.J., Tao, T.: Decoding by linear programming. IEEE Trans. Inf. Theory 51, 4203–4215 (2005)
17. Candes, E.J., Tao, T.: Near-optimal signal recovery from random projections: universal encoding strategies? IEEE Trans. Inf. Theory 52, 5406–5425 (2005)
18. Satpathi, S., Das, R., Chakraborty, M.: Improving the bound on the RIP constant in generalized orthogonal matching pursuit. IEEE Signal Process. Lett. 20, 1074–1077 (2013)
19. Candes, E.J., Tao, T.: The Dantzig selector: statistical estimation when $p$ is much larger than $n$. Ann. Stat. 35, 2313–2351 (2007)
20. Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32, 407–451 (2004)