The Convergence Rate for a Functional in Learning Theory

(1)

Journal of Inequalities and Applications Volume 2010, Article ID 249507,18pages doi:10.1155/2010/249507

Research Article

The Convergence Rate for a

K

-Functional in

Learning Theory

Bao-Huai Sheng

1

_{and Dao-Hong Xiang}

2

1_{Department of Mathematics, Shaoxing University, Shaoxing, Zhejiang 312000, China} 2_{Department of Mathematics, Zhejiang Normal University, Jinhua, Zhejiang 321004, China}

Correspondence should be addressed to Bao-Huai Sheng,[email protected]

Received 11 November 2009; Accepted 21 February 2010

Academic Editor: Yuming Xing

Copyrightq2010 B.-H. Sheng and D.-H. Xiang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

It is known that in the field of learning theory based on reproducing kernel Hilbert spaces the upper bounds estimate for aK-functional is needed. In the present paper, the upper bounds for theK-functional on the unit sphere are estimated with spherical harmonics approximation. The results show that convergence rate of theK-functional depends upon the smoothness of both the approximated function and the reproducing kernels.

1. Introduction

It is known that the goal of learning theory is to approximate a functionor some function featuresfrom data samples.

Let X be a compact subset of n-dimensional Euclidean spaces Rn_, _Y _⊂ _{R. Then,}

learning theory is to find a functionf related the input x ∈ X to the output y ∈ Y see 1–3. The functionfxis determined by a probability distributionρx, y ρy|xρXx

onZ : X×Y,whereρXxis the marginal distribution onX andρy |xis the condition

probability ofyfor a givenx.

Generally, the distributionρx, yis known only through a set of samplez:{zi}mi1

{xi, yi}mi1 independently drawn according to ρx, y. Given a sample z, the regression

problem based on Support Vector MachineSVMlearning is to find a functionfz:X → Y

such that fzx is a good estimate of y when a new input x is provided. The binary

classification problem based on SVM learning is to find a functionψz :X → {−1,1}which

divides X into two parts. Hereψz is often induced by a real-valued function fz with the

form ofψzx sgnfzxwhere sgnfzx 1 iffzx ≥0, otherwise,−1. The functions

fz are often generated from the following Tikhonov regularization scheme see, e.g., 4–

(2)

samplez:

fz:fz,λ,HK arg min

f∈HK

1 m

m

i1

Vp

yifxi

λf2_H K

, 1.1

whereλ > 0 is a positive constant called the regularization parameter andVpt 1−tp max{1−t,0}pp≥1calledp-norm SVM loss.

In addition, the Tikhonov regularization scheme involving oﬀsetbsee, e.g.,4,10,11

can be presented below with a similar way to1.1

fz:fz, λ,HK arg min

ff∗_{b, f}∗_∈H_K_{, b∈}_R

1 m

m

i1

Vp

yifxi

λf∗2_H K

. 1.2

We are in a position to define reproducing kernel Hilbert space. A functionK:X×X → Ris called a Mercer kernel if it is continuous, symmetric, and positive semidefinite, that is, for any finite set of distinct points{x1, x2, . . . , xl} ⊂ X, the matrixKxi, xjli,j1is positive

semidefinite.

The reproducing kernel Hilbert space RKHS HK see 12 associated with the

Mercer kernel K is defined to be the closure of the linear span of the set of functions {Kx:Kx,·:x∈X}with the inner product·,·HK ·,·KsatisfyingKx, KyKKx, y and the reproducing property

Kx, gKgx, ∀x∈X, g∈ HK. 1.3

Ifgx _iciKx, xi, theng2K

i,jcicjKxi, xj. DenoteCXas the space of

continuous function on X with the norm · _∞. Let κ : K_∞. Then the reproducing property tells that

_g

∞≤κgK, ∀g ∈ HK. 1.4

It is easy to see thatHKis a subset ofCX.We say thatKx, yis a universal kernel if for

any compact subsetX HKis dense inCX see13, Page 2652.

Let ∧ ⊂ X be a given discrete set of finite points. Then, we may define an RKHS H∧

K, · H∧Kby the linear span of the set of functions{Kx : Kx,· : x ∈ ∧}. Then, it is easy to see thatH∧_K⊂ HKand for anyf ∈ H∧_Kthere holdsfH∧

K fK. DefineEf : _ZVpyfxdρx, yand f

Vp

ρ : arg minEf,where the minimum is

taken over all measurable functions. Then, to estimate the explicit learning rate, one needs to estimate the regularization errorssee, e.g.,4,7,9,14

Dλ: inf

f∈HK

Ef− EfVp

ρ

λf2_H K

, _1.5

Dλ: inf

ff∗_{b, f}∗_∈H_K_{, b∈}_R

Ef− EfVp

ρ

λf∗2_H K

(3)

The convergence rate of1.5is controlled by theK-functionalsee, e.g.,9

Kf, λ_H K _g∈Hinf

K

f−g_p,dρ

Xλg

2

K

, λ >0, f ∈LpdρX

, _1.7

and1.6is controlled by anotherK-functionalsee, e.g.,4

Kf, λ_H

K _g∈H inf

K, gg∗b, g∗∈HK, b∈R

f−g_p,dρ Xλg

∗2

K

, _1.8

whereλ >0, f ∈Lp_dρ

X {fx:fp,dρX <∞}with

_f

p, dρX ⎧ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩

X

_fxp

dρXx

1/p

, 1≤p <∞,

ess sup

x∈X

_fx, p ∞.

1.9

We notice that, on one hand, theK-functionals1.7and1.8are the modifications of the K-functional of interpolation theory see15since the interpolation relation1.4.

On the other hand, they are diﬀerent from the usualK-functionalssee e.g.,16–30since the termg2_K.However, they have some similar point. For example, ifKx, yis a universal kernel,HKis dense inLpdρX see e.g.,31. Moreover, some classical function spaces such

as the polynomial spacessee2, 32and even some Sobolev spaces may be regarded as RKHSsee e.g.,33.

In learning theory we often require Kf, t_H_K Otβ _and _{Kf, t}

HK Ot

β _for

someβ > 0 see e.g., 1,7,14. Many results on this topic have been achieved. With the

weighted Durrmeyer operators8,9showed the decay by takingKx, yto be the algebraic polynomials kernels on0,1×0,1or on the simplex inR2_.

However, in general case, the convergence of K-functional 1.8 should also be considered since the oﬀset often has influences on the solution of the learning algorithmssee e.g.,6,11. Hence, the purpose of this paper is twofold. One is to provide the convergence

rates of1.7and1.8whenKx, yis a general Mercer kernel on the unit sphereSq _and

1≤p≤∞.The other is how to construct functions of the type of

fx β0g∗x β0

m

i1

βiKx, xi, x∈X, 1.10

to obtain the convergence rate of1.8. The translation networks constructed in34–37have the form of1.10and the zonal networks constructed in38,39have the form of1.10with β00. So the methods used by these references may be used here to estimate the convergence

rates of1.7and1.8if one can bound the termg∗2

K.

In the present paper, we shall give the convergence rate of1.7and1.8for a general kernel defined on the unit sphereSn _{_x_∈_Rn1 _:_x

Rn1 1}andρx, y ρy|xμ_nx

(4)

andμnx,the convergence rate of1.7-1.8in the general case may be obtained according

to the way used by1,8.

The rest of this paper is organized as follows. In Section 2, we shall restate some notations on spherical harmonics and present the main results. Some useful lemmas dealing with the approximation order for the de la Vall´ee means of the spherical harmonics, the Gauss integral formula, the Marcinkiewicz-Zygmund with respect to the scattered data obtained by G. Brown and F. Dai and a result on the zonal networks approximation provided by H. N. Mhaskar will be given inSection 3. A kind of weighted norm estimate for the Mercer kernel matrices on the unit sphere will be given inLemma 3.8. Our main results are proved in the last section.

Throughout the paper, we shall writeA OBif there exists a constantC >0 such thatA≤CB. We writeA∼BifAOBandBOA.

2. Notations and Results

To state the results of this paper, we need some notations and results on spherical harmonics.

2.1. Notations

For integers l ≥ 0, q ≥ 1, the class of all one variable algebraic polynomials of degree≤ l defined on −1,1is denoted byPl, the class of all spherical harmonics of degreel will be

denoted byH_lq, and the class of all spherical harmonics of degreel ≤ nwill be denoted by q

n. The dimension ofH_lqis given bysee40, Page 65

dq_l dimH_lq ⎧ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩

2lq−1 lq−1

⎛ ⎝lq−1

q−1 ⎞

⎠_{, l}_≥_1,

1, l0,

2.1

and that ofΠqnis

n l0d

q

l.One has the following well-known addition formulasee41, Page

10, Theorem 2:

dq_l

k1

Yl,kxYl,k

y d

q l

ωq

pq_l1xy, l0,1, . . . , x, y∈Sq, 2.2

wherepq_l1xis the degree-lgeneralized Legendre polynomial. The Legendre polynomials are normalized so thatpq_l11 1 and satisfy the orthogonality relations

₁

−1

pq_l1xpq_k1xWqxdx

ωq

ωq−1d_lq

δl,k, Wqx

(5)

DefineLp_Sq_and _Lp

Wα,β by takingρX to be the usual volume element ofS

q _{and the}

Jacobi weights functionsWα,βx 1−xα1xβ,α > −1, β > −1, respectively. For any

φ∈L1

Wqwe have the following relationsee42, Page 312:

Sq

φxydμqx ωq−1

₁

−1

φxWqxdx. 2.4

The orthogonal projectionsYkf, xof a functionf ∈ L1SqonH_kqare defined bysee e.g.,

43

Yk

f, x d

q k

ωq

Sq

pq_k1xyfydμq

y, x∈Sq, 2.5

wherexydenotes the inner product ofxandy.

2.2. Main Results

Letφ∈L1

Wqsatisfyalφ ωq−1 1

−1φxp

q1

l xWqxdx >0 and

_∞

l0alφdq_l <∞. Define

Kφ, x, yKφ, xy ∞

l0

al

φd

q l

ωq

pq_l1xy, al

φ>0, x, y∈Sq. 2.6

Then, by44, Chapter 17we know thatKφ, x, yis positive semidefinite onSq_{and the right}

of2.6is convergence absolutely and uniformly since|pq_l1x| ≤ 1. Therefore,Kφ, x, yis a Mercer kernel onSq_._By₁₃_{, Theorem 10}_{we know that}_{Kφ, x, y}_{is a universal kernel on}

Sq_{. We suppose that there is a constant}_C

p>0 depending only onpsuch for anyy∈Sq

_Kφ,_·_{, y}

p,Sq≤Cpφ_p,W_q, 1≤p≤∞, φ∈Lp_W_q. 2.7

Given a finite set∧, we denote by∧the cardinality of∧. Forr >0 anda≥ 1 we say that a finite subset∧ ⊂Sq_{is an}_{r, a-covering of}_Sq_if

Sq⊂

ω∈∧

Bω, r, max

ω∈∧∧ ∩Bω, r≤a, 2.8

where Bω, r {x ∈ Sq _: _{dω, x} _≤ _r_}_with _{dω, x} _: _{arc cosω, x} _{being the geodesic}

(6)

Let S ≥ 1 be an integer, a {an}n∞0 a sequence of real numbers. Define forward

diﬀerence operators byΔan Δ1an:an1−an,Δr : ΔΔr−1an,r2,3, . . . ,Δ0an:an,

|a|_S: sup

0≤r≤S, v≥0

v1r_|_Δr_a v|,

|a|∗_S

S

r1

∞

v0

v1r−1_|_Δr_a v|.

2.9

We say a finite subset∧ ⊂Sq _{is a subset of interpolatory type if for any real numbers}

{yω}ω∈∧there is ap∈Πnq such thatpω yω,ω∈ ∧.This kind of subsets may be found from

45,46.

LetBSbe the set of all sequenceafor which|a|S<∞andB∗Sthe set of all sequencea

for which|a|_S|a|∗_S<∞.

Letβ >0 be a real number,f ∈Lp_Sq_._{Then, we say}_∂

βf ∈LpSqif there is a function

ϕ∈Lp_Sq_{such that}

ϕx∼∞

k0

kβYk

f, x, x∈Sq. 2.10

We now give the results of this paper.

Theorem 2.1. If there is a constantγ0>0depending only onqsuch that∧is a subset of interpolatory

type and aδ/62m_{, a}_{-covering of}_Sq _satisfying₀ _{< δ < γ}

0/awitha ≥ 1and mbeing a given

positive integer.S > qis an integer.β > 0is a real number such that there isγ ≥ γ2q−1and

β−γ > q,φ ∈L∞_W

q satisfies{l1

β_a

lφ}l∞0 ∈ BSand{l1−βa_l−1φ}l∞0 ∈ B∗S.H∧Kφis the

reproducing kernel space reproduced by∧and the kernel2.6.∂γf∈LpSq. Then there is a constant

Cp,q >0depending only onpandqand a functiongm,φx C0g_m,φ∗ xwithg_m,φ∗ x∈ H∧_K_φ

andC0a constant such that

f−gm,φ_p,SqO

1 2mγ

, 2.11

g∗

m,φ

2

H∧

Kφ

O22m2q−1. 2.12

The functionsφsatisfying the conditions of Theorem 2.1may be found in39, Page 357.

Corollary 2.2. Under the conditions ofTheorem 2.1. If∂γf∈LpSq, then

K

f, 1 22mγ

H∧

Kφ

O

1 2mγ

. _2.13

Corollary 2.2shows that the convergence rate of theK-functional1.8is controlled by

(7)

Theorem 2.3. If there is a constantγ >0depending only onqsuch that∧is a subset of interpolatory type and aδ/62m_{, a}_{-covering of}_Sq _satisfying₀ _{< δ < γ/a}_with _a _≥ ₁_and _m_{being a given} positive integer.H∧_K_φis the reproducing kernel space reproducing by∧and the kernel2.6withφ∈ L∞_W

qsatisfying{l1

β_a

lφ} ∈ BSand{1/l1βa−_l1φ} ∈ B∗_S.Then, forα≥q1/p−1/2β

and∂βf∈LpSqthere holds

K

f, 1 2αm

H∧

Kφ

O

1 2mβ

, _2.14

whereamaxa,0.

3. Some Lemmas

To prove Theorems 2.1 and 2.3, we need some lemmas. The first one is about the Gauss integral formula and Marcinkiewicz inequalities.

Lemma 3.1see47–50. There exist constantsγ >0depending only onqsuch that for any positive integernand anyδ/n, a-covering∧ofSq_satisfying₀_{< δ < γ/a}_{, there exists a set of real numbers}

λω∼n−q,ω∈ ∧such that

Sq

fydμq

y

ω∈∧

λωfω 3.1

for anyf ∈Πq₃_n,and forf ∈Πqn

_f

p,Sq ∼ ⎧ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩

ω∈∧

λωfωp

1/p

, 0< p <∞,

max

ω∈∧fω, p ∞,

3.2

where∧ ∼ nq _∼ _dim_Πq

n,the constants of equivalence depending only onq,a,δ, andpwhen pis small. Here one employs the slight abuse of notation that001.

The second lemma we shall use is the Nikolskii inequality for the spherical harmonics.

Lemma 3.2see38,45,49,51,52. If 0 < p < r ≤ ∞,p ∈ Πqn, then one has the following Nikolskii inequality:

p_p,Sq≤C

qp_r,Sq ≤C

qnq1/p−1/rp_p,Sq, 3.3

where the constantCqdepends only onq.

(8)

Lemma 3.3. Letμbe a positive measure onX.{Hk}k∞0 is a sequence of finite-dimensional spaces

satisfying the following:

IHk⊂Lpdμ.

IIHkis orthogonal toHl(inLpdμ) whenl /k.

IIIspan∞

k0Hkis dense inLpdμfor all1≤p≤∞.

IVH0is the collection of the constants.

The Ces`aro meansCδoffxis given by

σ_Nδf, x 1 Aδ_N

N

k0

Aδ_N−kPk

f, x, Aδ_N Γkδ1

Γk1Γδ1 3.4

forN1,2, . . ., where

Pk

f, x

X

fy _d

k

l1

ϕk,ixϕk,i

y

dμy, x∈X, 3.5

and {ϕk,ix}d_ik₁ is an orthogonal base of Hk in L2dμ. One sets,for a given α > 0, ∂αfx ∼

_∞

k1kαPkf, xand∂αf ∈Lpdμif there existsϕ∈Lpdμsuch thatPkϕ, x kαPkf, x. Letηbe defined asηu ∈ C∞, ηu 1foru ∈0,1/2andηu 0foru >1and is a nonegative and nonincrease function.ρNf, xare the de la Vall´ee Poussin means defined as

ρN

f, x

∞

k0

η

k 2N

Pk

f, x, x∈X, f ∈L1dμ. 3.6

Then,ρNf, x fx, f ∈_k≤NHk.If for someδ >0,σδ_Nfp, dμ ≤Cfp, dμ,1 ≤ p≤ ∞, then,ρNfp, dμ≤Cfp, dμand

ρNf−f_{p, dμ}O

∂αf_{p, dμ}

Nα

, ∂αf∈Lp

dμ. 3.7

Lemma 3.3makes the followingLemma 3.4.

Lemma 3.4. Letηbe the function defined as inLemma 3.3. Define two kinds of operators, respectively, by

VN

f, x

∞

k0

η

k 2N

Yk

f, x, x∈Sq_{, f} _∈_L1_Sq_,

V_N∗f, x ∞

k0

η

k 2N

al

fd

q l

ωq

pq_l1x, x∈−1,1, f ∈L1

Wq.

(9)

Then,VNf, x fxfor anyf∈Πq_NandV_N∗f, x fxfor anyf∈PN. Moreover,

_V_N_f₋_f

p,Sq O

_∂_α_f

p,Sq Nα

, ∂αf ∈LpSq, 3.9

V_N∗f−f_p,W qO

⎛

⎝∂αfp,Wq Nα

⎞

⎠_, _∂_α_f_∈_Lp

Wq, 3.10

where forf ∈Lp_W

qone defines

∂αfx∼ ∞

k1

kαak

fd

q l

ωq

p_lq1x x∈−1,1. 3.11

Proof. By54, Lemma 2.2we knowVNfp,Sq ≤Cf_p,Sqfor someC >0. Hence,3.9holds by3.7. By19, Theorem Awe knowσ_Nδfp,Wq ≤Cfp,Wqforδ > q/2−1/2.Hence,3.10 holds by3.7.

Let∧ ⊂Sq_{be a finite set. Then we call}_∧_{an M-Z quadrature measure of order}_n_if_3.1

and3.2hold forf ∈ Πqn.By this definition one knows the finite set∧inLemma 3.1is an

M-Z quadrature measure of ordern. Define an operator as

σy

∧, η, f, x ∞

l0

η

l y

_dq l

ωq

ω∈∧

λωfωpq_l1ωx, x∈Sq. 3.12

Then, we have the following results.

Lemma 3.5see39. For a given integern≥1,let∧be an M-Z quadrature measure of order62n_,

φ∈L1

Wq,S > qan integer,1≤p≤∞,β > q/p

_{, where}_p_satisfies_1/p_1/p₁_{which satisfies}

p1ifp ∞andp ∞ifp1.ηtdefined inLemma 3.3is a nonnegative and non-increasing function. Letφ∈L1

Wq satisfy{l1

−β_a−1

l φ}

∞

l0 ∈ B∗S. Then, forf ∈Hγ,p,0< γ ≤β, whereHγ,p consists off∈Lp_Sq_{for which the derivative of order}_γ_{; that is,}_∂γf_{, belongs to}_Lp_Sq_{. Then, there} is an operatorDφ :Hγ,p → LPSqsuch that

i(see [39, Proposition3.1, (b)]).Dφfp,Sq≤c∂γf_p,Sqforf∈H_β,p

fx

Sq

φxyDφf

ydμq

y, x∈Sq, 3.13

fl, k al

φ !Dφfl, k, 3.14

(10)

ii(see [39, Theorem3.1]). Moreover, if one adds an assumption that{l1βalφ}l∞0 ∈ BS, then, there are constantsC >0andC1>0such that

f·−

ω∈∧

λωDφσ2n∧, η, f, ωφ·ω

p,Sq

≤C2−nγ∂γf_p,Sq 3.15

and for0≤γ≤β

_∂_γ_f₋_∂_γ_σ_n_∧_{, η, f}

p,Sq ≤C1nγ−β∂βf_p,Sq. 3.16

Lemma 3.6see e.g.,29, Page 230. Letf ∈L2_Sq_{. Then,}

f₂_,Sq _∞

k0

Ykf2₂_,Sq 1/2

. 3.17

Following Lemma 3.7 deals with the orthogonality of the Legendre polynomials plxy.

Lemma 3.7. For the generalized Legendre polynomialspq_l1one has

dq_l ωq

Sq

p_kq1uyp_lq1xudμqu pq_k1

xy, x, y∈Sq_. _3.18

Proof. It may be obtained by2.2.

Lemma 3.8. Letφ ∈L∞_W

q satisfy2.7forp ∞and∂qφ∈L

∞

Wq.∧ ⊂ S

q_{is a finite set satisfying} the conditions ofTheorem 2.1. Then, there is a constantCφ >0depending only onφsuch that

ω,ω∈∧

λωλωKφ, ω, ω2

1/2

≤Cφ<∞. 3.19

Proof. Define a matrix byK

√ λ

∧ λωK∧ω, ω λωω,ω_∈∧, where K∧ω, ω

∧

l0

λl ×

dq l/ωqp

q1

l ωωwithλl≥0 andl∧2 {a aωω∈∧ :

ω∈∧a

2

ω1/2<∞}.Then,

ω,ω∈∧

λωλωK_∧ω, ω2

1/2

K√λ

∧ _l2 ∧

sup

a /0, a∈l2 ∧

aK

√ λ ∧ a

(11)

By the Parseval equality we have

aK

√ λ

∧ a

ω,ω_∈∧

aω λωK∧

ω, ωaω λω

ω,ω_∈∧

aω λω

⎛ ⎝∧

l0

λl×

dq_l ωq

p_lq1ωω ⎞

⎠_a_ω λ_ω

∧

l0

λl dq_l

α0

ω∈∧

aω λωYl,αω

2

≤max

0≤l≤∧λl

Sq ∧

l0

dq_l

α0

ω∈∧

aω λωYl,αω

Yl,αx

2

dμqx

max

0≤l≤∧λl

Sq ω∈∧

aω λω ∧

l0

d_lq ωq

pq_l1xω

2

dμqx.

3.21

LetL_∧x∈ΠqnsatisfyL∧ω aω/ λω,ω∈ ∧. Then, by3.1

aK

√ λ

∧ a

ω,ω∈∧

aω λωK∧ω, ωaω λ_ω

≤ max

0≤l≤∧λl

Sq ∧

l0

dq_l ωq

Sq

L_∧ωpq_l1xωdμqω

2

dμqx

max

0≤l≤∧λl

Sq|

L_∧x|2dμqx

max

0≤l≤∧λl

ω∈∧

λω|L∧x|2 max

0≤l≤∧λla _a.

3.22

Hence, aK

√ λ

∧ a ≤ max0≤l≤∧λlaa. On the other hand, since Kφ, x,·∞,Sq ≤ Cφ_∞,W q, x∈Sq, we have for anyx∈Sqthat

Kφ, x, y−KV_∧∗φ, x, yKφ−V_∧∗φ, x, y≤Cφ−V_∧∗φ

∞,Wq

. _3.23

It follows forx, y∈Sq_that

KV_∧∗φ, x, y−Cφ−V_∧∗φ

∞,Wq

≤Kφ, x, y≤KV_∧∗φ, x, yCφ−V_∧∗φ

∞,Wq .

(12)

DefineK

√ λ

∧ φ λωKφ, ω, ω λω_ω,ω_∈∧. Then,3.24,3.10, the Cauchy inequality,

and the fact∧ ∼nq_make

aK

√ λ

∧ φa−aK √

λ

∧

V_∧∗φaO

∧φ−V_∧∗φ

∞,Wq

×aaO1aa. 3.25

It follows that

aK

√ λ

∧ φa≤aK √

λ

∧

V_∧∗φaaK

√ λ

∧ φa−aK √

λ

∧

V_∧∗φa

≤ max

0≤l≤∧η

l 2∧

aaO1aa.

3.26

Equation3.2thus holds.

4. Proof of the Main Results

We now show Theorems2.1and2.3, respectively.

Proof ofTheorem 2.1. Lemma 4.3 in39gave the following results.

Let1 ≤p ≤ ∞,γ > q/p,S > q be an integer, and a sequence of real numbers such aγ :{l1γal} ∈ BS.Then, there existsφ∈Lp_W_qsuch thatalφ∈al,l0,1,2, . . . .

Since1lβ 1lγ1lβ−γandβ−γ> q,we have aφ∗∈L∞_W

qsuch thatalφ

∗

1lγ_a

lφ.Hence,∂γφ∈L∞_W

qand

KV₂∗m1

φ, x, y ∞

l0

al

V₂∗m1

φd

q l

ωq

pq_l1xy, x, y∈Sq, 4.1

and for 0≤k≤2m1_{there holds for}_{x, y}_∈_Sq_that

d_kq ωq

Sq

KV₂∗m1

φ, x, up_lq1uydμqu ak

V₂∗m1

φd

q k

ωq

pq_k1xyak

φd

q k

ωq

p_kq1xy. 4.2

It follows forx, y∈Sq_that

pq_k1xy

SqK

V₂∗m1

φ, x, up_lq1uydμqu

ak

φ 4.3

On the other hand, since

V2mf, xY₀V₂mf, x

2m1

k1

Yk

(13)

where for 1≤k≤2m1_{, we have by}_4.3

Yk

V2m

f, x d

q k

ωqak

φ

Sq

KV₂∗m1

φ, x, upq_l1uydμqu

×V2m

f, ydμq

y. 4.5

Hence, above equation and3.1-3.2makes

V2m

f, xY0

V2m

f, x

2 m1

k1

dq_k ωqak

φ

Sq

KV₂∗m1

φ, x, upq_k1uydμqu

×V2m

f, ydμq

y

C0

ω∈∧

λωCω,φK

V₂∗m1

φ, x, ω,

4.6

whereC0Y0V2mf, x,C_ω,φ

2m1

k1

dq

k/ωqYkV2mf, ω/akφ.Define

gm,φx C0g_m,φ∗ x C0

ω∈∧

λωCω,φK

φ, x, ω, x∈Sq. _4.7

Then, we knowg_m,φ∗ x∈RH∧_K_φand by3.9

_g_m,φ₋_f

p,Sq ≤gm,φ−V2mf_p,SqV2mf−f_p,Sq

O

1 2mγ

gm,φf−V2mf

p,Sq,

4.8

where

gm,φx−V2mf, x≤

ω∈∧

λωCω,φK

V₂∗m1

φ−φ, x, ω. _4.9

It follows by3.9that gm,φ−V2mf

p,Sq ≤

ω∈∧

λωCω,φKV₂∗m1φ−φ,·, ω_p,Sq

O

1 2mγ

ω∈∧

λωCω,φ.

4.10

On the other hand, by the definition ofV2mfand3.14we have for 1≤k≤2m1that

Yk

V2mf, ωη

k 2m1

ak

φYk

Dφf, ω

(14)

whereDφ denotes the operatorDφofLemma 3.5forφxy Kφ, xy.Hence,

Cω,φ

2m1

k1

d_kq ωq

η

k 2m1

Yk

Dφf, ω

, ω∈Sq. 4.12

Equation3.2and the definition ofηtmake

ω∈∧

_λ_ω_C_ω,φ_≤

ω∈∧

λω

2m1

k1

d_kq ωq

η

k 2m1

Yk

Dφf, ω

≤Cq

2m1

k1

dq_k

Sq Yk

Dφf, ωdμqω.

4.13

The H ¨older inequality, theiofLemma 3.5, and the fact that|pq_l1x| ≤1 make|YkDφf, ω| ≤

Cqdq_kDφfp,Sq ≤C_qdq

k∂γfp,Sq. Therefore,

ω∈∧

_λ_ω_C_ω,φ_≤_C_q2m1

k1

d_kq2∂γf_p,Sq. 4.14

Takeg_m,φ∗ x

ω∈∧λωCω,φKφ, x, ω,then

g_m,φ∗ 2

H∧

Kφ

ω,ω∈∧

λωλωC_ω,φC_ω_,φKφ, ω, ω

≤

ω∈∧

λωCω,φ2

ω,ω∈∧

λωλωKφ, ω, ω2

1/2

.

4.15

Equations3.2,3.17,3.16, and the Cauchy inequality make

ω∈∧

λωCω,φ2≤C

2m1

k1

dq_k ωq

Yk

Vm

f, ω ak φ 2

2,Sq

≤C ₂m1

k1

dq_k2∂γf_p,Sq 2

.

4.16

LetΓxbe the Gamma function. Then, it is well known thatΓx/Γxa∼x−a, x → ∞.Therefore,

d_lq 2lq−1 lq−1

lq−1 q−1

2lq−1Γlq−1 ΓqΓl1 ∼

2lq−1 Γq l

(15)

Hence,

2m1

k1

dq_k2O ₂m1

l1

l2q−1

O2m12q−1, m−→∞. 4.18

Equations4.14and4.4make

_g_m,φ₋_V₂mf

p,Sq O

1 2mγ−2q1

, 4.19

and hence

f−gm,φ_p,SqO

1 2mγ

1 2mγ−2q1

. 4.20

Sinceγ ≥ 2q−1γ, we have2.11by4.20. Equation2.12follows by 4.3,4.4, and 3.19.

Proof ofCorollary 2.2.By2.11-2.12one has

K

f, 1 22mγ

HKφ

≤f−gm,φ_p,Sq 1 22mγ

g∗

m,φ

2

HKφ

O

1 2mγ

1

22mγ ×O

22m2q−1O

1 2mγ

.

4.21

Proof ofTheorem 2.3. Take the place ofφxωinLemma 3.5withKφ, xω,denote still byDφ

the operatorDφ inLemma 3.5withφxy Kφ, xyand

gm,φx

ω∈∧

λωDφσ2m∧, η, f, ωKφ, xω, x∈Sq, _4.22

then,gm,φ∈H_K∧_φand by3.15f−gm,φp,SqO1/2mβ.In this case,

gm,φ2_H∧

Kφ

ω,ω∈∧

λωλωD_φσ₂m∧, η, f, ωD_φσ₂m∧, η, f, ωKφ, ωω

≤

ω∈∧

λωDφσ2m∧, η, f, ω2

ω,ω∈∧

λωλωKφ, ωω2

1/2

.

(16)

Sinceσ2m∧, η, xis a spherical harmonics of order≤ 2m, we know byiofLemma 3.5that

Dφσ2m∧, η, f, xare also spherical harmonics of order≤ 2m.Then, 3.2,iofLemma 3.5, 3.3, and3.16make

ω∈∧

λωDφσ2m∧, η, f, ω2≤C

Sq

_D_φ_σ₂m∧, η, f, ω2dμ_qω

≤C∂βσ2m∧, η, f2

2,Sq

≤C22mq1/p−1/2_∂

βσ2m∧, η, f2

p,Sq

≤C22mq1/p−1/2_∂_β_f−_∂_β_σ₂m∧, η, f2

p,SqC∂βf2_p,Sq

. 4.24

Hence,3.19and above equation makegm,φ2_H∧

Kφ ≤C2

mq1/p−1/2_∂_β_f2

p,Sq. Equation2.14 follows by3.15.

Acknowledgments

This work is supported by the National NSF10871226 of China. The authors thank the reviewers for giving very valuable suggestions.

References

1 F. Cucker and S. Smale, “On the mathematical foundations of learning,”Bulletin of the American Mathematical Society, vol. 39, no. 1, pp. 1–49, 2002.

2 F. Cucker and D.-X. Zhou, Learning Theory: An Approximation Theory Viewpoint, Cambridge Monographs on Applied and Computational Mathematics, Cambridge University Press, Cambridge, Mass, USA, 2007.

3 V. N. Vapnik, Statistical Learning Theory, Adaptive and Learning Systems for Signal Processing, Communications, and Control, John Wiley & Sons, New York, NY, USA, 1998.

4 D. R. Chen, Q. Wu, Y. M. Ying, and D. X. Zhou, “Support vector machine soft margin classifier: error analysis,”Journal of Machine Learning and Research, vol. 5, pp. 1143–1175, 2004.

5 T. Evgeniou, M. Pontil, and T. Poggio, “Regularization networks and support vector machines,” Advances in Computational Mathematics, vol. 13, no. 1, pp. 1–50, 2000.

6 Y. Li, Y. Liu, and J. Zhu, “Quantile regression in reproducing kernel Hilbert spaces,”Journal of the American Statistical Association, vol. 102, no. 477, pp. 255–268, 2007.

7 H. Tong, D.-R. Chen, and L. Peng, “Analysis of support vector machines regression,”Foundations of Computational Mathematics, vol. 9, no. 2, pp. 243–257, 2009.

8 H. Tong, D.-R. Chen, and L. Peng, “Learning rates for regularized classifiers using multivariate polynomial kernels,”Journal of Complexity, vol. 24, no. 5-6, pp. 619–631, 2008.

9 D.-X. Zhou and K. Jetter, “Approximation with polynomial kernels and SVM classifiers,”Advances in Computational Mathematics, vol. 25, no. 1–3, pp. 323–344, 2006.

10 D. Chen and D.-H. Xiang, “The consistency of multicategory support vector machines,”Advances in Computational Mathematics, vol. 24, no. 1–4, pp. 155–169, 2006.

11 E. De Vito, L. Rosasco, A. Caponnetto, M. Piana, and A. Verri, “Some properties of regularized kernel methods,”Journal of Machine Learning Research, vol. 5, pp. 1363–1390, 2004.

12 N. Aronszajn, “Theory of reproducing kernels,”Transactions of the American Mathematical Society, vol. 68, pp. 337–404, 1950.

(17)

14 Q. Wu, Y. Ying, and D.-X. Zhou, “Multi-kernel regularized classifiers,”Journal of Complexity, vol. 23, no. 1, pp. 108–134, 2007.

15 J. Bergh and J. L ¨ofstr ¨om,Interpolation Spaces, Springer, New York, NY, USA, 1976.

16 H. Berens and G. G. Lorentz, “Inverse theorems for Bernstein polynomials,” Indiana University Mathematics Journal, vol. 21, no. 8, pp. 693–708, 1972.

17 H. Berens and L. Q. Li, “The Peetre K-moduli and best approximation on the sphere,” Acta Mathematica Sinica, vol. 38, no. 5, pp. 589–599, 1995.

18 H. Berens and Y. Xu, “K-moduli, moduli of smoothness, and Bernstein polynomials on a simplex,” Indagationes Mathematicae, vol. 2, no. 4, pp. 411–421, 1991.

19 W. Chen and Z. Ditzian, “Best approximation andK-functionals,”Acta Mathematica Hungarica, vol. 75, no. 3, pp. 165–208, 1997.

20 W. Chen and Z. Ditzian, “Best polynomial and Durrmeyer approximation inLpS,”Indagationes Mathematicae, vol. 2, no. 4, pp. 437–452, 1991.

21 F. Dai and Z. Ditzian, “Jackson inequality for Banach spaces on the sphere,” Acta Mathematica Hungarica, vol. 118, no. 1-2, pp. 171–195, 2008.

22 Z. Ditzian and X. Zhou, “Optimal approximation class for multivariate Bernstein operators,”Pacific Journal of Mathematics, vol. 158, no. 1, pp. 93–120, 1993.

23 Z. Ditzian and K. Runovskii, “Averages and K-functionals related to the Laplacian,” Journal of Approximation Theory, vol. 97, no. 1, pp. 113–139, 1999.

24 Z. Ditzian, “A measure of smoothness related to the Laplacian,” Transactions of the American Mathematical Society, vol. 326, no. 1, pp. 407–422, 1991.

25 Z. Ditzian and V. Totik,Moduli of Smoothness, vol. 9 ofSpringer Series in Computational Mathematics, Springer, New York, NY, USA, 1987.

26 Z. Ditzian, “Approximation on Banach spaces of functions on the sphere,”Journal of Approximation Theory, vol. 140, no. 1, pp. 31–45, 2006.

27 Z. Ditzian, “Fractional derivatives and best approximation,”Acta Mathematica Hungarica, vol. 81, no. 4, pp. 323–348, 1998.

28 L. L. Schumaker,Spline Functions: Basic Theory, John Wiley & Sons, New York, NY, USA, 1981, Pure and Applied Mathematics.

29 K. Y. Wang and L. Q. Li,Harmonic Analysis and Approximation on the Unit Sphere, Science Press, Beijing, China, 2000.

30 Y. Xu, “Approximation by means of h-harmonic polynomials on the unit sphere,” Advances in Computational Mathematics, vol. 21, no. 1-2, pp. 37–58, 2004.

31 S. Smale and D.-X. Zhou, “Estimating the approximation error in learning theory,”Analysis and Applications, vol. 1, no. 1, pp. 17–41, 2003.

32 B. Sheng, “Estimates of the norm of the Mercer kernel matrices with discrete orthogonal transforms,” Acta Mathematica Hungarica, vol. 122, no. 4, pp. 339–355, 2009.

33 S. Loustau, “Aggregation of SVM classifiers using Sobolev spaces,” Journal of Machine Learning Research, vol. 9, pp. 1559–1582, 2008.

34 H. N. Mhaskar and C. A. Micchelli, “Degree of approximation by neural and translation networks with a single hidden layer,”Advances in Applied Mathematics, vol. 16, no. 2, pp. 151–183, 1995.

35 B. H. Sheng, “Approximation of periodic functions by spherical translation networks,” Acta Mathematica Sinica. Chinese Series, vol. 50, no. 1, pp. 55–62, 2007.

36 B. Sheng, “On the degree of approximation by spherical translations,”Acta Mathematicae Applicatae Sinica. English Series, vol. 22, no. 4, pp. 671–680, 2006.

37 B. Sheng, J. Wang, and S. Zhou, “A way of constructing spherical zonal translation network operators with linear bounded operators,”Taiwanese Journal of Mathematics, vol. 12, no. 1, pp. 77–92, 2008.

38 H. N. Mhaskar, F. J. Narcowich, and J. D. Ward, “Approximation properties of zonal function networks using scattered data on the sphere,”Advances in Computational Mathematics, vol. 11, no. 2-3, pp. 121–137, 1999.

39 H. N. Mhaskar, “Weighted quadrature formulas and approximation by zonal function networks on the sphere,”Journal of Complexity, vol. 22, no. 3, pp. 348–370, 2006.

40 H. Groemer,Geometric Applications of Fourier Series and Spherical Harmonics, vol. 61 ofEncyclopedia of Mathematics and Its Applications, Cambridge University Press, Cambridge, Mass, USA, 1996.

41 C. M ¨uller,Spherical Harmonics, vol. 17 ofLecture Notes in Mathematics, Springer, Berlin, Germany, 1966.

42 S. Z. Lu and K. Y. Wang,Bochner-Riesz Means, Beijing Normal University Press, Beijing, China, 1988.

(18)

44 H. Wendland, Scattered Data Approximation, vol. 17 of Cambridge Monographs on Applied and Computational Mathematics, Cambridge University Press, Cambridge, Mass, USA, 2005.

45 H. N. Mhaskar, F. J. Narcowich, N. Sivakumar, and J. D. Ward, “Approximation with interpolatory constraints,”Proceedings of the American Mathematical Society, vol. 130, no. 5, pp. 1355–1364, 2002.

46 F. J. Narcowich and J. D. Ward, “Scattered data interpolation on spheres: error estimates and locally supported basis functions,”SIAM Journal on Mathematical Analysis, vol. 33, no. 6, pp. 1393–1410, 2002.

47 G. Brown and F. Dai, “Approximation of smooth functions on compact two-point homogeneous spaces,”Journal of Functional Analysis, vol. 220, no. 2, pp. 401–423, 2005.

48 G. Brown, D. Feng, and S. Y. Sheng, “Kolmogorov width of classes of smooth functions on the sphere

Sd−1_,”_{Journal of Complexity, vol. 18, no. 4, pp. 1001–1023, 2002.}

49 F. Dai, “Multivariate polynomial inequalities with respect to doubling weights andA∞weights,” Journal of Functional Analysis, vol. 235, no. 1, pp. 137–170, 2006.

50 H. N. Mhaskar, F. J. Narcowich, and J. D. Ward, “Spherical Marcinkiewicz-Zygmund inequalities and positive quadrature,”Mathematics of Computation, vol. 70, no. 235, pp. 1113–1130, 2001.

51 E. Belinsky, F. Dai, and Z. Ditzian, “Multivariate approximating averages,”Journal of Approximation Theory, vol. 125, no. 1, pp. 85–105, 2003.

52 A. I. Kamzolov, “Approximation of functions on the sphereSn_,”_{Serdica, vol. 10, no. 1, pp. 3–10, 1984.}

53 F. Dai and Z. Ditzian, “Ces`aro summability and Marchaud inequality,”Constructive Approximation, vol. 25, no. 1, pp. 73–88, 2007.