Journal of Inequalities and Applications Volume 2010, Article ID 249507,18pages doi:10.1155/2010/249507
Research Article
The Convergence Rate for a
K
-Functional in
Learning Theory
Bao-Huai Sheng
1and Dao-Hong Xiang
21Department of Mathematics, Shaoxing University, Shaoxing, Zhejiang 312000, China 2Department of Mathematics, Zhejiang Normal University, Jinhua, Zhejiang 321004, China
Correspondence should be addressed to Bao-Huai Sheng,[email protected]
Received 11 November 2009; Accepted 21 February 2010
Academic Editor: Yuming Xing
Copyrightq2010 B.-H. Sheng and D.-H. Xiang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
It is known that in the field of learning theory based on reproducing kernel Hilbert spaces the upper bounds estimate for aK-functional is needed. In the present paper, the upper bounds for theK-functional on the unit sphere are estimated with spherical harmonics approximation. The results show that convergence rate of theK-functional depends upon the smoothness of both the approximated function and the reproducing kernels.
1. Introduction
It is known that the goal of learning theory is to approximate a functionor some function featuresfrom data samples.
Let X be a compact subset of n-dimensional Euclidean spaces Rn, Y ⊂ R. Then,
learning theory is to find a functionf related the input x ∈ X to the output y ∈ Y see 1–3. The functionfxis determined by a probability distributionρx, y ρy|xρXx
onZ : X×Y,whereρXxis the marginal distribution onX andρy |xis the condition
probability ofyfor a givenx.
Generally, the distributionρx, yis known only through a set of samplez:{zi}mi1
{xi, yi}mi1 independently drawn according to ρx, y. Given a sample z, the regression
problem based on Support Vector MachineSVMlearning is to find a functionfz:X → Y
such that fzx is a good estimate of y when a new input x is provided. The binary
classification problem based on SVM learning is to find a functionψz :X → {−1,1}which
divides X into two parts. Hereψz is often induced by a real-valued function fz with the
form ofψzx sgnfzxwhere sgnfzx 1 iffzx ≥0, otherwise,−1. The functions
fz are often generated from the following Tikhonov regularization scheme see, e.g., 4–
samplez:
fz:fz,λ,HK arg min
f∈HK
1 m
m
i1
Vp
yifxi
λf2H K
, 1.1
whereλ > 0 is a positive constant called the regularization parameter andVpt 1−tp max{1−t,0}pp≥1calledp-norm SVM loss.
In addition, the Tikhonov regularization scheme involving offsetbsee, e.g.,4,10,11
can be presented below with a similar way to1.1
fz:fz, λ,HK arg min
ff∗b, f∗∈HK, b∈R
1 m
m
i1
Vp
yifxi
λf∗2H K
. 1.2
We are in a position to define reproducing kernel Hilbert space. A functionK:X×X → Ris called a Mercer kernel if it is continuous, symmetric, and positive semidefinite, that is, for any finite set of distinct points{x1, x2, . . . , xl} ⊂ X, the matrixKxi, xjli,j1is positive
semidefinite.
The reproducing kernel Hilbert space RKHS HK see 12 associated with the
Mercer kernel K is defined to be the closure of the linear span of the set of functions {Kx:Kx,·:x∈X}with the inner product·,·HK ·,·KsatisfyingKx, KyKKx, y and the reproducing property
Kx, gKgx, ∀x∈X, g∈ HK. 1.3
Ifgx iciKx, xi, theng2K
i,jcicjKxi, xj. DenoteCXas the space of
continuous function on X with the norm · ∞. Let κ : K∞. Then the reproducing property tells that
g
∞≤κgK, ∀g ∈ HK. 1.4
It is easy to see thatHKis a subset ofCX.We say thatKx, yis a universal kernel if for
any compact subsetX HKis dense inCX see13, Page 2652.
Let ∧ ⊂ X be a given discrete set of finite points. Then, we may define an RKHS H∧
K, · H∧Kby the linear span of the set of functions{Kx : Kx,· : x ∈ ∧}. Then, it is easy to see thatH∧K⊂ HKand for anyf ∈ H∧Kthere holdsfH∧
K fK. DefineEf : ZVpyfxdρx, yand f
Vp
ρ : arg minEf,where the minimum is
taken over all measurable functions. Then, to estimate the explicit learning rate, one needs to estimate the regularization errorssee, e.g.,4,7,9,14
Dλ: inf
f∈HK
Ef− EfVp
ρ
λf2H K
, 1.5
Dλ: inf
ff∗b, f∗∈HK, b∈R
Ef− EfVp
ρ
λf∗2H K
The convergence rate of1.5is controlled by theK-functionalsee, e.g.,9
Kf, λH K g∈Hinf
K
f−gp,dρ
Xλg
2
K
, λ >0, f ∈LpdρX
, 1.7
and1.6is controlled by anotherK-functionalsee, e.g.,4
Kf, λH
K g∈H inf
K, gg∗b, g∗∈HK, b∈R
f−gp,dρ Xλg
∗2
K
, 1.8
whereλ >0, f ∈Lpdρ
X {fx:fp,dρX <∞}with
f
p, dρX ⎧ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩
X
fxp
dρXx
1/p
, 1≤p <∞,
ess sup
x∈X
fx, p ∞.
1.9
We notice that, on one hand, theK-functionals1.7and1.8are the modifications of the K-functional of interpolation theory see15since the interpolation relation1.4.
On the other hand, they are different from the usualK-functionalssee e.g.,16–30since the termg2K.However, they have some similar point. For example, ifKx, yis a universal kernel,HKis dense inLpdρX see e.g.,31. Moreover, some classical function spaces such
as the polynomial spacessee2, 32and even some Sobolev spaces may be regarded as RKHSsee e.g.,33.
In learning theory we often require Kf, tHK Otβ and Kf, t
HK Ot
β for
someβ > 0 see e.g., 1,7,14. Many results on this topic have been achieved. With the
weighted Durrmeyer operators8,9showed the decay by takingKx, yto be the algebraic polynomials kernels on0,1×0,1or on the simplex inR2.
However, in general case, the convergence of K-functional 1.8 should also be considered since the offset often has influences on the solution of the learning algorithmssee e.g.,6,11. Hence, the purpose of this paper is twofold. One is to provide the convergence
rates of1.7and1.8whenKx, yis a general Mercer kernel on the unit sphereSq and
1≤p≤∞.The other is how to construct functions of the type of
fx β0g∗x β0
m
i1
βiKx, xi, x∈X, 1.10
to obtain the convergence rate of1.8. The translation networks constructed in34–37have the form of1.10and the zonal networks constructed in38,39have the form of1.10with β00. So the methods used by these references may be used here to estimate the convergence
rates of1.7and1.8if one can bound the termg∗2
K.
In the present paper, we shall give the convergence rate of1.7and1.8for a general kernel defined on the unit sphereSn {x∈Rn1 :x
Rn1 1}andρx, y ρy|xμnx
andμnx,the convergence rate of1.7-1.8in the general case may be obtained according
to the way used by1,8.
The rest of this paper is organized as follows. In Section 2, we shall restate some notations on spherical harmonics and present the main results. Some useful lemmas dealing with the approximation order for the de la Vall´ee means of the spherical harmonics, the Gauss integral formula, the Marcinkiewicz-Zygmund with respect to the scattered data obtained by G. Brown and F. Dai and a result on the zonal networks approximation provided by H. N. Mhaskar will be given inSection 3. A kind of weighted norm estimate for the Mercer kernel matrices on the unit sphere will be given inLemma 3.8. Our main results are proved in the last section.
Throughout the paper, we shall writeA OBif there exists a constantC >0 such thatA≤CB. We writeA∼BifAOBandBOA.
2. Notations and Results
To state the results of this paper, we need some notations and results on spherical harmonics.
2.1. Notations
For integers l ≥ 0, q ≥ 1, the class of all one variable algebraic polynomials of degree≤ l defined on −1,1is denoted byPl, the class of all spherical harmonics of degreel will be
denoted byHlq, and the class of all spherical harmonics of degreel ≤ nwill be denoted by q
n. The dimension ofHlqis given bysee40, Page 65
dql dimHlq ⎧ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩
2lq−1 lq−1
⎛ ⎝lq−1
q−1 ⎞
⎠, l≥1,
1, l0,
2.1
and that ofΠqnis
n l0d
q
l.One has the following well-known addition formulasee41, Page
10, Theorem 2:
dql
k1
Yl,kxYl,k
y d
q l
ωq
pql1xy, l0,1, . . . , x, y∈Sq, 2.2
wherepql1xis the degree-lgeneralized Legendre polynomial. The Legendre polynomials are normalized so thatpql11 1 and satisfy the orthogonality relations
1
−1
pql1xpqk1xWqxdx
ωq
ωq−1dlq
δl,k, Wqx
DefineLpSqand Lp
Wα,β by takingρX to be the usual volume element ofS
q and the
Jacobi weights functionsWα,βx 1−xα1xβ,α > −1, β > −1, respectively. For any
φ∈L1
Wqwe have the following relationsee42, Page 312:
Sq
φxydμqx ωq−1
1
−1
φxWqxdx. 2.4
The orthogonal projectionsYkf, xof a functionf ∈ L1SqonHkqare defined bysee e.g.,
43
Yk
f, x d
q k
ωq
Sq
pqk1xyfydμq
y, x∈Sq, 2.5
wherexydenotes the inner product ofxandy.
2.2. Main Results
Letφ∈L1
Wqsatisfyalφ ωq−1 1
−1φxp
q1
l xWqxdx >0 and
∞
l0alφdql <∞. Define
Kφ, x, yKφ, xy ∞
l0
al
φd
q l
ωq
pql1xy, al
φ>0, x, y∈Sq. 2.6
Then, by44, Chapter 17we know thatKφ, x, yis positive semidefinite onSqand the right
of2.6is convergence absolutely and uniformly since|pql1x| ≤ 1. Therefore,Kφ, x, yis a Mercer kernel onSq.By13, Theorem 10we know thatKφ, x, yis a universal kernel on
Sq. We suppose that there is a constantC
p>0 depending only onpsuch for anyy∈Sq
Kφ,·, y
p,Sq≤Cpφp,Wq, 1≤p≤∞, φ∈LpWq. 2.7
Given a finite set∧, we denote by∧the cardinality of∧. Forr >0 anda≥ 1 we say that a finite subset∧ ⊂Sqis anr, a-covering ofSqif
Sq⊂
ω∈∧
Bω, r, max
ω∈∧∧ ∩Bω, r≤a, 2.8
where Bω, r {x ∈ Sq : dω, x ≤ r}with dω, x : arc cosω, x being the geodesic
Let S ≥ 1 be an integer, a {an}n∞0 a sequence of real numbers. Define forward
difference operators byΔan Δ1an:an1−an,Δr : ΔΔr−1an,r2,3, . . . ,Δ0an:an,
|a|S: sup
0≤r≤S, v≥0
v1r|Δra v|,
|a|∗S
S
r1
∞
v0
v1r−1|Δra v|.
2.9
We say a finite subset∧ ⊂Sq is a subset of interpolatory type if for any real numbers
{yω}ω∈∧there is ap∈Πnq such thatpω yω,ω∈ ∧.This kind of subsets may be found from
45,46.
LetBSbe the set of all sequenceafor which|a|S<∞andB∗Sthe set of all sequencea
for which|a|S|a|∗S<∞.
Letβ >0 be a real number,f ∈LpSq.Then, we say∂
βf ∈LpSqif there is a function
ϕ∈LpSqsuch that
ϕx∼∞
k0
kβYk
f, x, x∈Sq. 2.10
We now give the results of this paper.
Theorem 2.1. If there is a constantγ0>0depending only onqsuch that∧is a subset of interpolatory
type and aδ/62m, a-covering ofSq satisfying0 < δ < γ
0/awitha ≥ 1and mbeing a given
positive integer.S > qis an integer.β > 0is a real number such that there isγ ≥ γ2q−1and
β−γ > q,φ ∈L∞W
q satisfies{l1
βa
lφ}l∞0 ∈ BSand{l1−βal−1φ}l∞0 ∈ B∗S.H∧Kφis the
reproducing kernel space reproduced by∧and the kernel2.6.∂γf∈LpSq. Then there is a constant
Cp,q >0depending only onpandqand a functiongm,φx C0gm,φ∗ xwithgm,φ∗ x∈ H∧Kφ
andC0a constant such that
f−gm,φp,SqO
1 2mγ
, 2.11
g∗
m,φ
2
H∧
Kφ
O22m2q−1. 2.12
The functionsφsatisfying the conditions of Theorem 2.1may be found in39, Page 357.
Corollary 2.2. Under the conditions ofTheorem 2.1. If∂γf∈LpSq, then
K
f, 1 22mγ
H∧
Kφ
O
1 2mγ
. 2.13
Corollary 2.2shows that the convergence rate of theK-functional1.8is controlled by
Theorem 2.3. If there is a constantγ >0depending only onqsuch that∧is a subset of interpolatory type and aδ/62m, a-covering ofSq satisfying0 < δ < γ/awith a ≥ 1and mbeing a given positive integer.H∧Kφis the reproducing kernel space reproducing by∧and the kernel2.6withφ∈ L∞W
qsatisfying{l1
βa
lφ} ∈ BSand{1/l1βa−l1φ} ∈ B∗S.Then, forα≥q1/p−1/2β
and∂βf∈LpSqthere holds
K
f, 1 2αm
H∧
Kφ
O
1 2mβ
, 2.14
whereamaxa,0.
3. Some Lemmas
To prove Theorems 2.1 and 2.3, we need some lemmas. The first one is about the Gauss integral formula and Marcinkiewicz inequalities.
Lemma 3.1see47–50. There exist constantsγ >0depending only onqsuch that for any positive integernand anyδ/n, a-covering∧ofSqsatisfying0< δ < γ/a, there exists a set of real numbers
λω∼n−q,ω∈ ∧such that
Sq
fydμq
y
ω∈∧
λωfω 3.1
for anyf ∈Πq3n,and forf ∈Πqn
f
p,Sq ∼ ⎧ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩
ω∈∧
λωfωp
1/p
, 0< p <∞,
max
ω∈∧fω, p ∞,
3.2
where∧ ∼ nq ∼ dimΠq
n,the constants of equivalence depending only onq,a,δ, andpwhen pis small. Here one employs the slight abuse of notation that001.
The second lemma we shall use is the Nikolskii inequality for the spherical harmonics.
Lemma 3.2see38,45,49,51,52. If 0 < p < r ≤ ∞,p ∈ Πqn, then one has the following Nikolskii inequality:
pp,Sq≤C
qpr,Sq ≤C
qnq1/p−1/rpp,Sq, 3.3
where the constantCqdepends only onq.
Lemma 3.3. Letμbe a positive measure onX.{Hk}k∞0 is a sequence of finite-dimensional spaces
satisfying the following:
IHk⊂Lpdμ.
IIHkis orthogonal toHl(inLpdμ) whenl /k.
IIIspan∞
k0Hkis dense inLpdμfor all1≤p≤∞.
IVH0is the collection of the constants.
The Ces`aro meansCδoffxis given by
σNδf, x 1 AδN
N
k0
AδN−kPk
f, x, AδN Γkδ1
Γk1Γδ1 3.4
forN1,2, . . ., where
Pk
f, x
X
fy d
k
l1
ϕk,ixϕk,i
y
dμy, x∈X, 3.5
and {ϕk,ix}dik1 is an orthogonal base of Hk in L2dμ. One sets,for a given α > 0, ∂αfx ∼
∞
k1kαPkf, xand∂αf ∈Lpdμif there existsϕ∈Lpdμsuch thatPkϕ, x kαPkf, x. Letηbe defined asηu ∈ C∞, ηu 1foru ∈0,1/2andηu 0foru >1and is a nonegative and nonincrease function.ρNf, xare the de la Vall´ee Poussin means defined as
ρN
f, x
∞
k0
η
k 2N
Pk
f, x, x∈X, f ∈L1dμ. 3.6
Then,ρNf, x fx, f ∈k≤NHk.If for someδ >0,σδNfp, dμ ≤Cfp, dμ,1 ≤ p≤ ∞, then,ρNfp, dμ≤Cfp, dμand
ρNf−fp, dμO
∂αfp, dμ
Nα
, ∂αf∈Lp
dμ. 3.7
Lemma 3.3makes the followingLemma 3.4.
Lemma 3.4. Letηbe the function defined as inLemma 3.3. Define two kinds of operators, respectively, by
VN
f, x
∞
k0
η
k 2N
Yk
f, x, x∈Sq, f ∈L1Sq,
VN∗f, x ∞
k0
η
k 2N
al
fd
q l
ωq
pql1x, x∈−1,1, f ∈L1
Wq.
Then,VNf, x fxfor anyf∈ΠqNandVN∗f, x fxfor anyf∈PN. Moreover,
VNf−f
p,Sq O
∂αf
p,Sq Nα
, ∂αf ∈LpSq, 3.9
VN∗f−fp,W qO
⎛
⎝∂αfp,Wq Nα
⎞
⎠, ∂αf∈Lp
Wq, 3.10
where forf ∈LpW
qone defines
∂αfx∼ ∞
k1
kαak
fd
q l
ωq
plq1x x∈−1,1. 3.11
Proof. By54, Lemma 2.2we knowVNfp,Sq ≤Cfp,Sqfor someC >0. Hence,3.9holds by3.7. By19, Theorem Awe knowσNδfp,Wq ≤Cfp,Wqforδ > q/2−1/2.Hence,3.10 holds by3.7.
Let∧ ⊂Sqbe a finite set. Then we call∧an M-Z quadrature measure of ordernif3.1
and3.2hold forf ∈ Πqn.By this definition one knows the finite set∧inLemma 3.1is an
M-Z quadrature measure of ordern. Define an operator as
σy
∧, η, f, x ∞
l0
η
l y
dq l
ωq
ω∈∧
λωfωpql1ωx, x∈Sq. 3.12
Then, we have the following results.
Lemma 3.5see39. For a given integern≥1,let∧be an M-Z quadrature measure of order62n,
φ∈L1
Wq,S > qan integer,1≤p≤∞,β > q/p
, wherepsatisfies1/p1/p1which satisfies
p1ifp ∞andp ∞ifp1.ηtdefined inLemma 3.3is a nonnegative and non-increasing function. Letφ∈L1
Wq satisfy{l1
−βa−1
l φ}
∞
l0 ∈ B∗S. Then, forf ∈Hγ,p,0< γ ≤β, whereHγ,p consists off∈LpSqfor which the derivative of orderγ; that is,∂γf, belongs toLpSq. Then, there is an operatorDφ :Hγ,p → LPSqsuch that
i(see [39, Proposition3.1, (b)]).Dφfp,Sq≤c∂γfp,Sqforf∈Hβ,p
fx
Sq
φxyDφf
ydμq
y, x∈Sq, 3.13
fl, k al
φ !Dφfl, k, 3.14
ii(see [39, Theorem3.1]). Moreover, if one adds an assumption that{l1βalφ}l∞0 ∈ BS, then, there are constantsC >0andC1>0such that
f·−
ω∈∧
λωDφσ2n∧, η, f, ωφ·ω
p,Sq
≤C2−nγ∂γfp,Sq 3.15
and for0≤γ≤β
∂γf−∂γσn∧, η, f
p,Sq ≤C1nγ−β∂βfp,Sq. 3.16
Lemma 3.6see e.g.,29, Page 230. Letf ∈L2Sq. Then,
f2,Sq ∞
k0
Ykf22,Sq 1/2
. 3.17
Following Lemma 3.7 deals with the orthogonality of the Legendre polynomials plxy.
Lemma 3.7. For the generalized Legendre polynomialspql1one has
dql ωq
Sq
pkq1uyplq1xudμqu pqk1
xy, x, y∈Sq. 3.18
Proof. It may be obtained by2.2.
Lemma 3.8. Letφ ∈L∞W
q satisfy2.7forp ∞and∂qφ∈L
∞
Wq.∧ ⊂ S
qis a finite set satisfying the conditions ofTheorem 2.1. Then, there is a constantCφ >0depending only onφsuch that
ω,ω∈∧
λωλωKφ, ω, ω2
1/2
≤Cφ<∞. 3.19
Proof. Define a matrix byK
√ λ
∧ λωK∧ω, ω λωω,ω∈∧, where K∧ω, ω
∧
l0
λl ×
dq l/ωqp
q1
l ωωwithλl≥0 andl∧2 {a aωω∈∧ :
ω∈∧a
2
ω1/2<∞}.Then,
ω,ω∈∧
λωλωK∧ω, ω2
1/2
K√λ
∧ l2 ∧
sup
a /0, a∈l2 ∧
aK
√ λ ∧ a
By the Parseval equality we have
aK
√ λ
∧ a
ω,ω∈∧
aω λωK∧
ω, ωaω λω
ω,ω∈∧
aω λω
⎛ ⎝∧
l0
λl×
dql ωq
plq1ωω ⎞
⎠aω λω
∧
l0
λl dql
α0
ω∈∧
aω λωYl,αω
2
≤max
0≤l≤∧λl
Sq ∧
l0
dql
α0
ω∈∧
aω λωYl,αω
Yl,αx
2
dμqx
max
0≤l≤∧λl
Sq ω∈∧
aω λω ∧
l0
dlq ωq
pql1xω
2
dμqx.
3.21
LetL∧x∈ΠqnsatisfyL∧ω aω/ λω,ω∈ ∧. Then, by3.1
aK
√ λ
∧ a
ω,ω∈∧
aω λωK∧ω, ωaω λω
≤ max
0≤l≤∧λl
Sq ∧
l0
dql ωq
Sq
L∧ωpql1xωdμqω
2
dμqx
max
0≤l≤∧λl
Sq|
L∧x|2dμqx
max
0≤l≤∧λl
ω∈∧
λω|L∧x|2 max
0≤l≤∧λla a.
3.22
Hence, aK
√ λ
∧ a ≤ max0≤l≤∧λlaa. On the other hand, since Kφ, x,·∞,Sq ≤ Cφ∞,W q, x∈Sq, we have for anyx∈Sqthat
Kφ, x, y−KV∧∗φ, x, yKφ−V∧∗φ, x, y≤Cφ−V∧∗φ
∞,Wq
. 3.23
It follows forx, y∈Sqthat
KV∧∗φ, x, y−Cφ−V∧∗φ
∞,Wq
≤Kφ, x, y≤KV∧∗φ, x, yCφ−V∧∗φ
∞,Wq .
DefineK
√ λ
∧ φ λωKφ, ω, ω λωω,ω∈∧. Then,3.24,3.10, the Cauchy inequality,
and the fact∧ ∼nqmake
aK
√ λ
∧ φa−aK √
λ
∧
V∧∗φaO
∧φ−V∧∗φ
∞,Wq
×aaO1aa. 3.25
It follows that
aK
√ λ
∧ φa≤aK √
λ
∧
V∧∗φaaK
√ λ
∧ φa−aK √
λ
∧
V∧∗φa
≤ max
0≤l≤∧η
l 2∧
aaO1aa.
3.26
Equation3.2thus holds.
4. Proof of the Main Results
We now show Theorems2.1and2.3, respectively.
Proof ofTheorem 2.1. Lemma 4.3 in39gave the following results.
Let1 ≤p ≤ ∞,γ > q/p,S > q be an integer, and a sequence of real numbers such aγ :{l1γal} ∈ BS.Then, there existsφ∈LpWqsuch thatalφ∈al,l0,1,2, . . . .
Since1lβ 1lγ1lβ−γandβ−γ> q,we have aφ∗∈L∞W
qsuch thatalφ
∗
1lγa
lφ.Hence,∂γφ∈L∞W
qand
KV2∗m1
φ, x, y ∞
l0
al
V2∗m1
φd
q l
ωq
pql1xy, x, y∈Sq, 4.1
and for 0≤k≤2m1there holds forx, y∈Sqthat
dkq ωq
Sq
KV2∗m1
φ, x, uplq1uydμqu ak
V2∗m1
φd
q k
ωq
pqk1xyak
φd
q k
ωq
pkq1xy. 4.2
It follows forx, y∈Sqthat
pqk1xy
SqK
V2∗m1
φ, x, uplq1uydμqu
ak
φ 4.3
On the other hand, since
V2mf, xY0V2mf, x
2m1
k1
Yk
where for 1≤k≤2m1, we have by4.3
Yk
V2m
f, x d
q k
ωqak
φ
Sq
KV2∗m1
φ, x, upql1uydμqu
×V2m
f, ydμq
y. 4.5
Hence, above equation and3.1-3.2makes
V2m
f, xY0
V2m
f, x
2 m1
k1
dqk ωqak
φ
Sq
Sq
KV2∗m1
φ, x, upqk1uydμqu
×V2m
f, ydμq
y
C0
ω∈∧
λωCω,φK
V2∗m1
φ, x, ω,
4.6
whereC0Y0V2mf, x,Cω,φ
2m1
k1
dq
k/ωqYkV2mf, ω/akφ.Define
gm,φx C0gm,φ∗ x C0
ω∈∧
λωCω,φK
φ, x, ω, x∈Sq. 4.7
Then, we knowgm,φ∗ x∈RH∧Kφand by3.9
gm,φ−f
p,Sq ≤gm,φ−V2mfp,SqV2mf−fp,Sq
O
1 2mγ
gm,φf−V2mf
p,Sq,
4.8
where
gm,φx−V2mf, x≤
ω∈∧
λωCω,φK
V2∗m1
φ−φ, x, ω. 4.9
It follows by3.9that gm,φ−V2mf
p,Sq ≤
ω∈∧
λωCω,φKV2∗m1φ−φ,·, ωp,Sq
O
1 2mγ
ω∈∧
λωCω,φ.
4.10
On the other hand, by the definition ofV2mfand3.14we have for 1≤k≤2m1that
Yk
V2mf, ωη
k 2m1
ak
φYk
Dφf, ω
whereDφ denotes the operatorDφofLemma 3.5forφxy Kφ, xy.Hence,
Cω,φ
2m1
k1
dkq ωq
η
k 2m1
Yk
Dφf, ω
, ω∈Sq. 4.12
Equation3.2and the definition ofηtmake
ω∈∧
λωCω,φ≤
ω∈∧
λω
2m1
k1
dkq ωq
η
k 2m1
Yk
Dφf, ω
≤Cq
2m1
k1
dqk
Sq Yk
Dφf, ωdμqω.
4.13
The H ¨older inequality, theiofLemma 3.5, and the fact that|pql1x| ≤1 make|YkDφf, ω| ≤
CqdqkDφfp,Sq ≤Cqdq
k∂γfp,Sq. Therefore,
ω∈∧
λωCω,φ≤Cq2m1
k1
dkq2∂γfp,Sq. 4.14
Takegm,φ∗ x
ω∈∧λωCω,φKφ, x, ω,then
gm,φ∗ 2
H∧
Kφ
ω,ω∈∧
λωλωCω,φCω,φKφ, ω, ω
≤
ω∈∧
λωCω,φ2
ω,ω∈∧
λωλωKφ, ω, ω2
1/2
.
4.15
Equations3.2,3.17,3.16, and the Cauchy inequality make
ω∈∧
λωCω,φ2≤C
2m1
k1
dqk ωq
Yk
Vm
f, ω ak φ 2
2,Sq
≤C 2m1
k1
dqk2∂γfp,Sq 2
.
4.16
LetΓxbe the Gamma function. Then, it is well known thatΓx/Γxa∼x−a, x → ∞.Therefore,
dlq 2lq−1 lq−1
lq−1 q−1
2lq−1Γlq−1 ΓqΓl1 ∼
2lq−1 Γq l
Hence,
2m1
k1
dqk2O 2m1
l1
l2q−1
O2m12q−1, m−→∞. 4.18
Equations4.14and4.4make
gm,φ−V2mf
p,Sq O
1 2mγ−2q1
, 4.19
and hence
f−gm,φp,SqO
1 2mγ
1 2mγ−2q1
. 4.20
Sinceγ ≥ 2q−1γ, we have2.11by4.20. Equation2.12follows by 4.3,4.4, and 3.19.
Proof ofCorollary 2.2.By2.11-2.12one has
K
f, 1 22mγ
HKφ
≤f−gm,φp,Sq 1 22mγ
g∗
m,φ
2
HKφ
O
1 2mγ
1
22mγ ×O
22m2q−1O
1 2mγ
.
4.21
Proof ofTheorem 2.3. Take the place ofφxωinLemma 3.5withKφ, xω,denote still byDφ
the operatorDφ inLemma 3.5withφxy Kφ, xyand
gm,φx
ω∈∧
λωDφσ2m∧, η, f, ωKφ, xω, x∈Sq, 4.22
then,gm,φ∈HK∧φand by3.15f−gm,φp,SqO1/2mβ.In this case,
gm,φ2H∧
Kφ
ω,ω∈∧
λωλωDφσ2m∧, η, f, ωDφσ2m∧, η, f, ωKφ, ωω
≤
ω∈∧
λωDφσ2m∧, η, f, ω2
ω,ω∈∧
λωλωKφ, ωω2
1/2
.
Sinceσ2m∧, η, xis a spherical harmonics of order≤ 2m, we know byiofLemma 3.5that
Dφσ2m∧, η, f, xare also spherical harmonics of order≤ 2m.Then, 3.2,iofLemma 3.5, 3.3, and3.16make
ω∈∧
λωDφσ2m∧, η, f, ω2≤C
Sq
Dφσ2m∧, η, f, ω2dμqω
≤C∂βσ2m∧, η, f2
2,Sq
≤C22mq1/p−1/2∂
βσ2m∧, η, f2
p,Sq
≤C22mq1/p−1/2∂βf−∂βσ2m∧, η, f2
p,SqC∂βf2p,Sq
. 4.24
Hence,3.19and above equation makegm,φ2H∧
Kφ ≤C2
mq1/p−1/2∂βf2
p,Sq. Equation2.14 follows by3.15.
Acknowledgments
This work is supported by the National NSF10871226 of China. The authors thank the reviewers for giving very valuable suggestions.
References
1 F. Cucker and S. Smale, “On the mathematical foundations of learning,”Bulletin of the American Mathematical Society, vol. 39, no. 1, pp. 1–49, 2002.
2 F. Cucker and D.-X. Zhou, Learning Theory: An Approximation Theory Viewpoint, Cambridge Monographs on Applied and Computational Mathematics, Cambridge University Press, Cambridge, Mass, USA, 2007.
3 V. N. Vapnik, Statistical Learning Theory, Adaptive and Learning Systems for Signal Processing, Communications, and Control, John Wiley & Sons, New York, NY, USA, 1998.
4 D. R. Chen, Q. Wu, Y. M. Ying, and D. X. Zhou, “Support vector machine soft margin classifier: error analysis,”Journal of Machine Learning and Research, vol. 5, pp. 1143–1175, 2004.
5 T. Evgeniou, M. Pontil, and T. Poggio, “Regularization networks and support vector machines,” Advances in Computational Mathematics, vol. 13, no. 1, pp. 1–50, 2000.
6 Y. Li, Y. Liu, and J. Zhu, “Quantile regression in reproducing kernel Hilbert spaces,”Journal of the American Statistical Association, vol. 102, no. 477, pp. 255–268, 2007.
7 H. Tong, D.-R. Chen, and L. Peng, “Analysis of support vector machines regression,”Foundations of Computational Mathematics, vol. 9, no. 2, pp. 243–257, 2009.
8 H. Tong, D.-R. Chen, and L. Peng, “Learning rates for regularized classifiers using multivariate polynomial kernels,”Journal of Complexity, vol. 24, no. 5-6, pp. 619–631, 2008.
9 D.-X. Zhou and K. Jetter, “Approximation with polynomial kernels and SVM classifiers,”Advances in Computational Mathematics, vol. 25, no. 1–3, pp. 323–344, 2006.
10 D. Chen and D.-H. Xiang, “The consistency of multicategory support vector machines,”Advances in Computational Mathematics, vol. 24, no. 1–4, pp. 155–169, 2006.
11 E. De Vito, L. Rosasco, A. Caponnetto, M. Piana, and A. Verri, “Some properties of regularized kernel methods,”Journal of Machine Learning Research, vol. 5, pp. 1363–1390, 2004.
12 N. Aronszajn, “Theory of reproducing kernels,”Transactions of the American Mathematical Society, vol. 68, pp. 337–404, 1950.
14 Q. Wu, Y. Ying, and D.-X. Zhou, “Multi-kernel regularized classifiers,”Journal of Complexity, vol. 23, no. 1, pp. 108–134, 2007.
15 J. Bergh and J. L ¨ofstr ¨om,Interpolation Spaces, Springer, New York, NY, USA, 1976.
16 H. Berens and G. G. Lorentz, “Inverse theorems for Bernstein polynomials,” Indiana University Mathematics Journal, vol. 21, no. 8, pp. 693–708, 1972.
17 H. Berens and L. Q. Li, “The Peetre K-moduli and best approximation on the sphere,” Acta Mathematica Sinica, vol. 38, no. 5, pp. 589–599, 1995.
18 H. Berens and Y. Xu, “K-moduli, moduli of smoothness, and Bernstein polynomials on a simplex,” Indagationes Mathematicae, vol. 2, no. 4, pp. 411–421, 1991.
19 W. Chen and Z. Ditzian, “Best approximation andK-functionals,”Acta Mathematica Hungarica, vol. 75, no. 3, pp. 165–208, 1997.
20 W. Chen and Z. Ditzian, “Best polynomial and Durrmeyer approximation inLpS,”Indagationes Mathematicae, vol. 2, no. 4, pp. 437–452, 1991.
21 F. Dai and Z. Ditzian, “Jackson inequality for Banach spaces on the sphere,” Acta Mathematica Hungarica, vol. 118, no. 1-2, pp. 171–195, 2008.
22 Z. Ditzian and X. Zhou, “Optimal approximation class for multivariate Bernstein operators,”Pacific Journal of Mathematics, vol. 158, no. 1, pp. 93–120, 1993.
23 Z. Ditzian and K. Runovskii, “Averages and K-functionals related to the Laplacian,” Journal of Approximation Theory, vol. 97, no. 1, pp. 113–139, 1999.
24 Z. Ditzian, “A measure of smoothness related to the Laplacian,” Transactions of the American Mathematical Society, vol. 326, no. 1, pp. 407–422, 1991.
25 Z. Ditzian and V. Totik,Moduli of Smoothness, vol. 9 ofSpringer Series in Computational Mathematics, Springer, New York, NY, USA, 1987.
26 Z. Ditzian, “Approximation on Banach spaces of functions on the sphere,”Journal of Approximation Theory, vol. 140, no. 1, pp. 31–45, 2006.
27 Z. Ditzian, “Fractional derivatives and best approximation,”Acta Mathematica Hungarica, vol. 81, no. 4, pp. 323–348, 1998.
28 L. L. Schumaker,Spline Functions: Basic Theory, John Wiley & Sons, New York, NY, USA, 1981, Pure and Applied Mathematics.
29 K. Y. Wang and L. Q. Li,Harmonic Analysis and Approximation on the Unit Sphere, Science Press, Beijing, China, 2000.
30 Y. Xu, “Approximation by means of h-harmonic polynomials on the unit sphere,” Advances in Computational Mathematics, vol. 21, no. 1-2, pp. 37–58, 2004.
31 S. Smale and D.-X. Zhou, “Estimating the approximation error in learning theory,”Analysis and Applications, vol. 1, no. 1, pp. 17–41, 2003.
32 B. Sheng, “Estimates of the norm of the Mercer kernel matrices with discrete orthogonal transforms,” Acta Mathematica Hungarica, vol. 122, no. 4, pp. 339–355, 2009.
33 S. Loustau, “Aggregation of SVM classifiers using Sobolev spaces,” Journal of Machine Learning Research, vol. 9, pp. 1559–1582, 2008.
34 H. N. Mhaskar and C. A. Micchelli, “Degree of approximation by neural and translation networks with a single hidden layer,”Advances in Applied Mathematics, vol. 16, no. 2, pp. 151–183, 1995.
35 B. H. Sheng, “Approximation of periodic functions by spherical translation networks,” Acta Mathematica Sinica. Chinese Series, vol. 50, no. 1, pp. 55–62, 2007.
36 B. Sheng, “On the degree of approximation by spherical translations,”Acta Mathematicae Applicatae Sinica. English Series, vol. 22, no. 4, pp. 671–680, 2006.
37 B. Sheng, J. Wang, and S. Zhou, “A way of constructing spherical zonal translation network operators with linear bounded operators,”Taiwanese Journal of Mathematics, vol. 12, no. 1, pp. 77–92, 2008.
38 H. N. Mhaskar, F. J. Narcowich, and J. D. Ward, “Approximation properties of zonal function networks using scattered data on the sphere,”Advances in Computational Mathematics, vol. 11, no. 2-3, pp. 121–137, 1999.
39 H. N. Mhaskar, “Weighted quadrature formulas and approximation by zonal function networks on the sphere,”Journal of Complexity, vol. 22, no. 3, pp. 348–370, 2006.
40 H. Groemer,Geometric Applications of Fourier Series and Spherical Harmonics, vol. 61 ofEncyclopedia of Mathematics and Its Applications, Cambridge University Press, Cambridge, Mass, USA, 1996.
41 C. M ¨uller,Spherical Harmonics, vol. 17 ofLecture Notes in Mathematics, Springer, Berlin, Germany, 1966.
42 S. Z. Lu and K. Y. Wang,Bochner-Riesz Means, Beijing Normal University Press, Beijing, China, 1988.
44 H. Wendland, Scattered Data Approximation, vol. 17 of Cambridge Monographs on Applied and Computational Mathematics, Cambridge University Press, Cambridge, Mass, USA, 2005.
45 H. N. Mhaskar, F. J. Narcowich, N. Sivakumar, and J. D. Ward, “Approximation with interpolatory constraints,”Proceedings of the American Mathematical Society, vol. 130, no. 5, pp. 1355–1364, 2002.
46 F. J. Narcowich and J. D. Ward, “Scattered data interpolation on spheres: error estimates and locally supported basis functions,”SIAM Journal on Mathematical Analysis, vol. 33, no. 6, pp. 1393–1410, 2002.
47 G. Brown and F. Dai, “Approximation of smooth functions on compact two-point homogeneous spaces,”Journal of Functional Analysis, vol. 220, no. 2, pp. 401–423, 2005.
48 G. Brown, D. Feng, and S. Y. Sheng, “Kolmogorov width of classes of smooth functions on the sphere
Sd−1,”Journal of Complexity, vol. 18, no. 4, pp. 1001–1023, 2002.
49 F. Dai, “Multivariate polynomial inequalities with respect to doubling weights andA∞weights,” Journal of Functional Analysis, vol. 235, no. 1, pp. 137–170, 2006.
50 H. N. Mhaskar, F. J. Narcowich, and J. D. Ward, “Spherical Marcinkiewicz-Zygmund inequalities and positive quadrature,”Mathematics of Computation, vol. 70, no. 235, pp. 1113–1130, 2001.
51 E. Belinsky, F. Dai, and Z. Ditzian, “Multivariate approximating averages,”Journal of Approximation Theory, vol. 125, no. 1, pp. 85–105, 2003.
52 A. I. Kamzolov, “Approximation of functions on the sphereSn,”Serdica, vol. 10, no. 1, pp. 3–10, 1984.
53 F. Dai and Z. Ditzian, “Ces`aro summability and Marchaud inequality,”Constructive Approximation, vol. 25, no. 1, pp. 73–88, 2007.