• No results found

The Convergence Rate for a Functional in Learning Theory

N/A
N/A
Protected

Academic year: 2020

Share "The Convergence Rate for a Functional in Learning Theory"

Copied!
18
0
0

Loading.... (view fulltext now)

Full text

(1)

Journal of Inequalities and Applications Volume 2010, Article ID 249507,18pages doi:10.1155/2010/249507

Research Article

The Convergence Rate for a

K

-Functional in

Learning Theory

Bao-Huai Sheng

1

and Dao-Hong Xiang

2

1Department of Mathematics, Shaoxing University, Shaoxing, Zhejiang 312000, China 2Department of Mathematics, Zhejiang Normal University, Jinhua, Zhejiang 321004, China

Correspondence should be addressed to Bao-Huai Sheng,[email protected]

Received 11 November 2009; Accepted 21 February 2010

Academic Editor: Yuming Xing

Copyrightq2010 B.-H. Sheng and D.-H. Xiang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

It is known that in the field of learning theory based on reproducing kernel Hilbert spaces the upper bounds estimate for aK-functional is needed. In the present paper, the upper bounds for theK-functional on the unit sphere are estimated with spherical harmonics approximation. The results show that convergence rate of theK-functional depends upon the smoothness of both the approximated function and the reproducing kernels.

1. Introduction

It is known that the goal of learning theory is to approximate a functionor some function featuresfrom data samples.

Let X be a compact subset of n-dimensional Euclidean spaces Rn, Y R. Then,

learning theory is to find a functionf related the input xX to the output yY see 1–3. The functionfxis determined by a probability distributionρx, y ρy|xρXx

onZ : X×Y,whereρXxis the marginal distribution onX andρy |xis the condition

probability ofyfor a givenx.

Generally, the distributionρx, yis known only through a set of samplez:{zi}mi1

{xi, yi}mi1 independently drawn according to ρx, y. Given a sample z, the regression

problem based on Support Vector MachineSVMlearning is to find a functionfz:XY

such that fzx is a good estimate of y when a new input x is provided. The binary

classification problem based on SVM learning is to find a functionψz :X → {−1,1}which

divides X into two parts. Hereψz is often induced by a real-valued function fz with the

form ofψzx sgnfzxwhere sgnfzx 1 iffzx ≥0, otherwise,−1. The functions

fz are often generated from the following Tikhonov regularization scheme see, e.g., 4–

(2)

samplez:

fz:fz,λ,HK arg min

f∈HK

1 m

m

i1

Vp

yifxi

λf2H K

, 1.1

whereλ > 0 is a positive constant called the regularization parameter andVpt 1−tp max{1−t,0}pp≥1calledp-norm SVM loss.

In addition, the Tikhonov regularization scheme involving offsetbsee, e.g.,4,10,11

can be presented below with a similar way to1.1

fz:fz, λ,HK arg min

ffb, f∈HK, b∈R

1 m

m

i1

Vp

yifxi

λf∗2H K

. 1.2

We are in a position to define reproducing kernel Hilbert space. A functionK:X×X → Ris called a Mercer kernel if it is continuous, symmetric, and positive semidefinite, that is, for any finite set of distinct points{x1, x2, . . . , xl} ⊂ X, the matrixKxi, xjli,j1is positive

semidefinite.

The reproducing kernel Hilbert space RKHS HK see 12 associated with the

Mercer kernel K is defined to be the closure of the linear span of the set of functions {Kx:Kx,·:xX}with the inner product·,·HK ·,·KsatisfyingKx, KyKKx, y and the reproducing property

Kx, gKgx,xX, g∈ HK. 1.3

Ifgx iciKx, xi, theng2K

i,jcicjKxi, xj. DenoteCXas the space of

continuous function on X with the norm · . Let κ : K. Then the reproducing property tells that

g

∞≤κgK,g ∈ HK. 1.4

It is easy to see thatHKis a subset ofCX.We say thatKx, yis a universal kernel if for

any compact subsetX HKis dense inCX see13, Page 2652.

Let ∧ ⊂ X be a given discrete set of finite points. Then, we may define an RKHS H∧

K, · H∧Kby the linear span of the set of functions{Kx : Kx,· : x ∈ ∧}. Then, it is easy to see thatH∧K⊂ HKand for anyf ∈ H∧Kthere holdsfH∧

K fK. DefineEf : ZVpyfxdρx, yand f

Vp

ρ : arg minEf,where the minimum is

taken over all measurable functions. Then, to estimate the explicit learning rate, one needs to estimate the regularization errorssee, e.g.,4,7,9,14

Dλ: inf

f∈HK

Ef− EfVp

ρ

λf2H K

, 1.5

Dλ: inf

ffb, f∈HK, b∈R

Ef− EfVp

ρ

λf∗2H K

(3)

The convergence rate of1.5is controlled by theK-functionalsee, e.g.,9

Kf, λH K g∈Hinf

K

fgp,dρ

Xλg

2

K

, λ >0, f ∈LpdρX

, 1.7

and1.6is controlled by anotherK-functionalsee, e.g.,4

Kf, λH

K g∈H inf

K, ggb, g∗∈HK, b∈R

fgp,dρ Xλg

∗2

K

, 1.8

whereλ >0, f ∈Lp

X {fx:fp,dρX <∞}with

f

p, dρX ⎧ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩

X

fxp

dρXx

1/p

, 1≤p <,

ess sup

x∈X

fx, p.

1.9

We notice that, on one hand, theK-functionals1.7and1.8are the modifications of the K-functional of interpolation theory see15since the interpolation relation1.4.

On the other hand, they are different from the usualK-functionalssee e.g.,16–30since the termg2K.However, they have some similar point. For example, ifKx, yis a universal kernel,HKis dense inLpX see e.g.,31. Moreover, some classical function spaces such

as the polynomial spacessee2, 32and even some Sobolev spaces may be regarded as RKHSsee e.g.,33.

In learning theory we often require Kf, tHK Otβ and Kf, t

HK Ot

β for

someβ > 0 see e.g., 1,7,14. Many results on this topic have been achieved. With the

weighted Durrmeyer operators8,9showed the decay by takingKx, yto be the algebraic polynomials kernels on0,1×0,1or on the simplex inR2.

However, in general case, the convergence of K-functional 1.8 should also be considered since the offset often has influences on the solution of the learning algorithmssee e.g.,6,11. Hence, the purpose of this paper is twofold. One is to provide the convergence

rates of1.7and1.8whenKx, yis a general Mercer kernel on the unit sphereSq and

1≤p≤∞.The other is how to construct functions of the type of

fx β0g∗x β0

m

i1

βiKx, xi, xX, 1.10

to obtain the convergence rate of1.8. The translation networks constructed in34–37have the form of1.10and the zonal networks constructed in38,39have the form of1.10with β00. So the methods used by these references may be used here to estimate the convergence

rates of1.7and1.8if one can bound the termg∗2

K.

In the present paper, we shall give the convergence rate of1.7and1.8for a general kernel defined on the unit sphereSn {xRn1 :x

Rn1 1}andρx, y ρy|nx

(4)

andμnx,the convergence rate of1.7-1.8in the general case may be obtained according

to the way used by1,8.

The rest of this paper is organized as follows. In Section 2, we shall restate some notations on spherical harmonics and present the main results. Some useful lemmas dealing with the approximation order for the de la Vall´ee means of the spherical harmonics, the Gauss integral formula, the Marcinkiewicz-Zygmund with respect to the scattered data obtained by G. Brown and F. Dai and a result on the zonal networks approximation provided by H. N. Mhaskar will be given inSection 3. A kind of weighted norm estimate for the Mercer kernel matrices on the unit sphere will be given inLemma 3.8. Our main results are proved in the last section.

Throughout the paper, we shall writeA OBif there exists a constantC >0 such thatACB. We writeABifAOBandBOA.

2. Notations and Results

To state the results of this paper, we need some notations and results on spherical harmonics.

2.1. Notations

For integers l ≥ 0, q ≥ 1, the class of all one variable algebraic polynomials of degree≤ l defined on −1,1is denoted byPl, the class of all spherical harmonics of degreel will be

denoted byHlq, and the class of all spherical harmonics of degreelnwill be denoted by q

n. The dimension ofHlqis given bysee40, Page 65

dql dimHlq ⎧ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩

2lq−1 lq−1

⎛ ⎝lq−1

q−1 ⎞

, l1,

1, l0,

2.1

and that ofΠqnis

n l0d

q

l.One has the following well-known addition formulasee41, Page

10, Theorem 2:

dql

k1

Yl,kxYl,k

y d

q l

ωq

pql1xy, l0,1, . . . , x, y∈Sq, 2.2

wherepql1xis the degree-lgeneralized Legendre polynomial. The Legendre polynomials are normalized so thatpql11 1 and satisfy the orthogonality relations

1

−1

pql1xpqk1xWqxdx

ωq

ωq−1dlq

δl,k, Wqx

(5)

DefineLpSqand Lp

Wα,β by takingρX to be the usual volume element ofS

q and the

Jacobi weights functionsWα,βx 1−1,α > −1, β > −1, respectively. For any

φL1

Wqwe have the following relationsee42, Page 312:

Sq

φxydμqx ωq−1

1

−1

φxWqxdx. 2.4

The orthogonal projectionsYkf, xof a functionfL1SqonHkqare defined bysee e.g.,

43

Yk

f, x d

q k

ωq

Sq

pqk1xyfydμq

y, xSq, 2.5

wherexydenotes the inner product ofxandy.

2.2. Main Results

LetφL1

Wqsatisfyalφ ωq−1 1

−1φxp

q1

l xWqxdx >0 and

l0alφdql <∞. Define

Kφ, x, yKφ, xy

l0

al

φd

q l

ωq

pql1xy, al

φ>0, x, y∈Sq. 2.6

Then, by44, Chapter 17we know thatKφ, x, yis positive semidefinite onSqand the right

of2.6is convergence absolutely and uniformly since|pql1x| ≤ 1. Therefore,Kφ, x, yis a Mercer kernel onSq.By13, Theorem 10we know thatKφ, x, yis a universal kernel on

Sq. We suppose that there is a constantC

p>0 depending only onpsuch for anyySq

Kφ,·, y

p,SqCpφp,Wq, 1≤p≤∞, φLpWq. 2.7

Given a finite set∧, we denote by∧the cardinality of∧. Forr >0 anda≥ 1 we say that a finite subset∧ ⊂Sqis anr, a-covering ofSqif

Sq

ω∈∧

Bω, r, max

ω∈∧∧ ∩Bω, ra, 2.8

where Bω, r {xSq : dω, x r}with dω, x : arc cosω, x being the geodesic

(6)

Let S ≥ 1 be an integer, a {an}n∞0 a sequence of real numbers. Define forward

difference operators byΔan Δ1an:an1−anr : ΔΔr−1an,r2,3, . . . ,Δ0an:an,

|a|S: sup

0≤r≤S, v≥0

v1r|Δra v|,

|a|∗S

S

r1

v0

v1r−1|Δra v|.

2.9

We say a finite subset∧ ⊂Sq is a subset of interpolatory type if for any real numbers

{}ω∈∧there is ap∈Πnq such thatpω yω,ω∈ ∧.This kind of subsets may be found from

45,46.

LetBSbe the set of all sequenceafor which|a|S<∞andB∗Sthe set of all sequencea

for which|a|S|a|∗S<.

Letβ >0 be a real number,fLpSq.Then, we say

βfLpSqif there is a function

ϕLpSqsuch that

ϕx∼∞

k0

kβYk

f, x, xSq. 2.10

We now give the results of this paper.

Theorem 2.1. If there is a constantγ0>0depending only onqsuch thatis a subset of interpolatory

type and aδ/62m, a-covering ofSq satisfying0 < δ < γ

0/awitha ≥ 1and mbeing a given

positive integer.S > qis an integer.β > 0is a real number such that there isγγ2q−1and

βγ > q,φLW

q satisfies{l1

βa

lφ}l∞0 ∈ BSand{l1−βal−1φ}l∞0 ∈ B∗S.H∧Kφis the

reproducing kernel space reproduced byand the kernel2.6.∂γfLpSq. Then there is a constant

Cp,q >0depending only onpandqand a functiongm,φx C0gm,φ∗ xwithgm,φ∗ x∈ H∧Kφ

andC0a constant such that

fgm,φp,SqO

1 2

, 2.11

g∗

m,φ

2

H∧

O22m2q−1. 2.12

The functionsφsatisfying the conditions of Theorem 2.1may be found in39, Page 357.

Corollary 2.2. Under the conditions ofTheorem 2.1. If∂γfLpSq, then

K

f, 1 22

H∧

O

1 2

. 2.13

Corollary 2.2shows that the convergence rate of theK-functional1.8is controlled by

(7)

Theorem 2.3. If there is a constantγ >0depending only onqsuch thatis a subset of interpolatory type and aδ/62m, a-covering ofSq satisfying0 < δ < γ/awith a 1and mbeing a given positive integer.H∧Kφis the reproducing kernel space reproducing byand the kernel2.6withφLW

qsatisfying{l1

βa

lφ} ∈ BSand{1/l1βa−l1φ} ∈ B∗S.Then, forαq1/p−1/2β

and∂βfLpSqthere holds

K

f, 1 2αm

H∧

O

1 2

, 2.14

whereamaxa,0.

3. Some Lemmas

To prove Theorems 2.1 and 2.3, we need some lemmas. The first one is about the Gauss integral formula and Marcinkiewicz inequalities.

Lemma 3.1see47–50. There exist constantsγ >0depending only onqsuch that for any positive integernand anyδ/n, a-coveringofSqsatisfying0< δ < γ/a, there exists a set of real numbers

λωn−q,ω∈ ∧such that

Sq

fydμq

y

ω∈∧

λωfω 3.1

for anyf ∈Πq3n,and forf ∈Πqn

f

p,Sq ∼ ⎧ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩

ω∈∧

λωfωp

1/p

, 0< p <,

max

ω∈∧fω, p,

3.2

where∧ ∼ nq dimΠq

n,the constants of equivalence depending only onq,a,δ, andpwhen pis small. Here one employs the slight abuse of notation that001.

The second lemma we shall use is the Nikolskii inequality for the spherical harmonics.

Lemma 3.2see38,45,49,51,52. If 0 < p < r ≤ ∞,p ∈ Πqn, then one has the following Nikolskii inequality:

pp,SqC

qpr,SqC

qnq1/p−1/rpp,Sq, 3.3

where the constantCqdepends only onq.

(8)

Lemma 3.3. Letμbe a positive measure onX.{Hk}k∞0 is a sequence of finite-dimensional spaces

satisfying the following:

IHkLp.

IIHkis orthogonal toHl(inLp) whenl /k.

IIIspan∞

k0Hkis dense inLpfor all1≤p≤∞.

IVH0is the collection of the constants.

The Ces`aro meansCδoffxis given by

σNδf, x 1 N

N

k0

N−kPk

f, x, N Γ1

Γk1Γδ1 3.4

forN1,2, . . ., where

Pk

f, x

X

fy d

k

l1

ϕk,ik,i

y

dμy, xX, 3.5

and {ϕk,ix}dik1 is an orthogonal base of Hk in L2dμ. One sets,for a given α > 0, ∂αfx

k1kαPkf, xand∂αfLpif there existsϕLpsuch thatPkϕ, x kαPkf, x. Letηbe defined asηuC, ηu 1foru ∈0,1/2andηu 0foru >1and is a nonegative and nonincrease function.ρNf, xare the de la Vall´ee Poussin means defined as

ρN

f, x

k0

η

k 2N

Pk

f, x, xX, fL1dμ. 3.6

Then,ρNf, x fx, fk≤NHk.If for someδ >0,σδNfp, dμCfp, dμ,1 ≤ p≤ ∞, then,ρNfp, dμCfp, dμand

ρNf−fp, dμO

∂αfp, dμ

, ∂αfLp

dμ. 3.7

Lemma 3.3makes the followingLemma 3.4.

Lemma 3.4. Letηbe the function defined as inLemma 3.3. Define two kinds of operators, respectively, by

VN

f, x

k0

η

k 2N

Yk

f, x, xSq, f L1Sq,

VNf, x

k0

η

k 2N

al

fd

q l

ωq

pql1x, x∈−1,1, f ∈L1

Wq.

(9)

Then,VNf, x fxfor anyf∈ΠqNandVN∗f, x fxfor anyfPN. Moreover,

VNff

p,Sq O

αf

p,Sq

, ∂αfLpSq, 3.9

VN∗f−fp,W qO

∂αfp,Wq

, αfLp

Wq, 3.10

where forfLpW

qone defines

∂αfx∼ ∞

k1

kαak

fd

q l

ωq

plq1x x∈−1,1. 3.11

Proof. By54, Lemma 2.2we knowVNfp,SqCfp,Sqfor someC >0. Hence,3.9holds by3.7. By19, Theorem Awe knowσNδfp,WqCfp,Wqforδ > q/2−1/2.Hence,3.10 holds by3.7.

Let∧ ⊂Sqbe a finite set. Then we callan M-Z quadrature measure of ordernif3.1

and3.2hold forf ∈ Πqn.By this definition one knows the finite set∧inLemma 3.1is an

M-Z quadrature measure of ordern. Define an operator as

σy

, η, f, x

l0

η

l y

dq l

ωq

ω∈∧

λωfωpql1ωx, xSq. 3.12

Then, we have the following results.

Lemma 3.5see39. For a given integern≥1,letbe an M-Z quadrature measure of order62n,

φL1

Wq,S > qan integer,1≤p≤∞,β > q/p

, wherepsatisfies1/p1/p1which satisfies

p1ifpandpifp1.ηtdefined inLemma 3.3is a nonnegative and non-increasing function. LetφL1

Wq satisfy{l1

−βa−1

l φ}

l0 ∈ B∗S. Then, forfHγ,p,0< γβ, whereHγ,p consists offLpSqfor which the derivative of orderγ; that is,∂γf, belongs toLpSq. Then, there is an operatorDφ :Hγ,pLPSqsuch that

i(see [39, Proposition3.1, (b)]).Dφfp,Sqc∂γfp,SqforfHβ,p

fx

Sq

φxyDφf

ydμq

y, xSq, 3.13

fl, k al

φ !Dφfl, k, 3.14

(10)

ii(see [39, Theorem3.1]). Moreover, if one adds an assumption that{l1βalφ}l∞0 ∈ BS, then, there are constantsC >0andC1>0such that

f·−

ω∈∧

λωDφσ2n, η, f, ωφ·ω

p,Sq

C2−nγ∂γfp,Sq 3.15

and for0≤γβ

γfγσn, η, f

p,SqC1nγ−β∂βfp,Sq. 3.16

Lemma 3.6see e.g.,29, Page 230. LetfL2Sq. Then,

f2,Sq

k0

Ykf22,Sq 1/2

. 3.17

Following Lemma 3.7 deals with the orthogonality of the Legendre polynomials plxy.

Lemma 3.7. For the generalized Legendre polynomialspql1one has

dql ωq

Sq

pkq1uyplq1xudμqu pqk1

xy, x, ySq. 3.18

Proof. It may be obtained by2.2.

Lemma 3.8. LetφLW

q satisfy2.7forpand∂qφL

Wq.∧ ⊂ S

qis a finite set satisfying the conditions ofTheorem 2.1. Then, there is a constantCφ >0depending only onφsuch that

ω,ω∈∧

λωλωKφ, ω, ω2

1/2

Cφ<. 3.19

Proof. Define a matrix byK

λ

λωK∧ω, ω λωω,ω∈∧, where K∧ω, ω

l0

λl ×

dq l/ωqp

q1

l ωωwithλl≥0 andl∧2 {a aωω∈∧ :

ω∈∧a

2

ω1/2<∞}.Then,

ω,ω∈∧

λωλωKω, ω2

1/2

K√λ

l2 ∧

sup

a /0, a∈l2 ∧

aK

λa

(11)

By the Parseval equality we have

aK

λ

a

ω,ω∈∧

λωK

ω, ωaω λω

ω,ω∈∧

λω

⎛ ⎝

l0

λl×

dql ωq

plq1ωω

aω λω

l0

λl dql

α0

ω∈∧

λωYl,αω

2

≤max

0≤l≤∧λl

Sq

l0

dql

α0

ω∈∧

λωYl,αω

Yl,αx

2

dμqx

max

0≤l≤∧λl

Sq ω∈∧

λω

l0

dlq ωq

pql1xω

2

dμqx.

3.21

LetLx∈ΠqnsatisfyL∧ω aω/ λω,ω∈ ∧. Then, by3.1

aK

λ

a

ω,ω∈∧

λωKω, ωaω λω

≤ max

0≤l≤∧λl

Sq

l0

dql ωq

Sq

Lωpql1xωdμqω

2

dμqx

max

0≤l≤∧λl

Sq|

Lx|2dμqx

max

0≤l≤∧λl

ω∈∧

λω|L∧x|2 max

0≤l≤∧λla a.

3.22

Hence, aK

λ

a ≤ max0≤l≤∧λlaa. On the other hand, since Kφ, x,·∞,Sq∞,W q, xSq, we have for anyxSqthat

Kφ, x, yKVφ, x, yKφVφ, x, yCφ−V∗φ

∞,Wq

. 3.23

It follows forx, ySqthat

KVφ, x, yCφ−V∗φ

∞,Wq

Kφ, x, yKVφ, x, yCφ−V∗φ

∞,Wq .

(12)

DefineK

λ

∧ φ λωKφ, ω, ω λωω,ω∈∧. Then,3.24,3.10, the Cauchy inequality,

and the fact∧ ∼nqmake

aK

λ

φaaK

λ

VφaO

φVφ

∞,Wq

×aaO1aa. 3.25

It follows that

aK

λ

φaaK

λ

VφaaK

λ

φaaK

λ

Vφa

≤ max

0≤l≤∧η

l 2∧

aaO1aa.

3.26

Equation3.2thus holds.

4. Proof of the Main Results

We now show Theorems2.1and2.3, respectively.

Proof ofTheorem 2.1. Lemma 4.3 in39gave the following results.

Let1 ≤p ≤ ∞ > q/p,S > q be an integer, and a sequence of real numbers such :{l1γal} ∈ BS.Then, there existsφLpWqsuch thatalφ∈al,l0,1,2, . . . .

Since1 11lβ−γandβγ> q,we have aφ∗∈LW

qsuch thatalφ

1a

lφ.Hence,∂γφLW

qand

KV2m1

φ, x, y

l0

al

V2m1

φd

q l

ωq

pql1xy, x, ySq, 4.1

and for 0≤k≤2m1there holds forx, ySqthat

dkq ωq

Sq

KV2m1

φ, x, uplq1uydμqu ak

V2m1

φd

q k

ωq

pqk1xyak

φd

q k

ωq

pkq1xy. 4.2

It follows forx, ySqthat

pqk1xy

SqK

V2m1

φ, x, uplq1uydμqu

ak

φ 4.3

On the other hand, since

V2mf, xY0V2mf, x

2m1

k1

Yk

(13)

where for 1≤k≤2m1, we have by4.3

Yk

V2m

f, x d

q k

ωqak

φ

Sq

KV2m1

φ, x, upql1uydμqu

×V2m

f, ydμq

y. 4.5

Hence, above equation and3.1-3.2makes

V2m

f, xY0

V2m

f, x

2 m1

k1

dqk ωqak

φ

Sq

Sq

KV2m1

φ, x, upqk1uydμqu

×V2m

f, ydμq

y

C0

ω∈∧

λωCω,φK

V2m1

φ, x, ω,

4.6

whereC0Y0V2mf, x,Cω,φ

2m1

k1

dq

k/ωqYkV2mf, ω/akφ.Define

gm,φx C0gm,φ∗ x C0

ω∈∧

λωCω,φK

φ, x, ω, xSq. 4.7

Then, we knowgm,φ∗ x∈RH∧Kφand by3.9

gm,φf

p,Sqgm,φV2mfp,SqV2mf−fp,Sq

O

1 2

gm,φf−V2mf

p,Sq,

4.8

where

gm,φx−V2mf, x

ω∈∧

λωCω,φK

V2m1

φφ, x, ω. 4.9

It follows by3.9that gm,φV2mf

p,Sq

ω∈∧

λωCω,φKV2m1φ−φ,·, ωp,Sq

O

1 2

ω∈∧

λωCω,φ.

4.10

On the other hand, by the definition ofV2mfand3.14we have for 1≤k≤2m1that

Yk

V2mf, ωη

k 2m1

ak

φYk

Dφf, ω

(14)

whereDφ denotes the operatorDφofLemma 3.5forφxy Kφ, xy.Hence,

Cω,φ

2m1

k1

dkq ωq

η

k 2m1

Yk

Dφf, ω

, ωSq. 4.12

Equation3.2and the definition ofηtmake

ω∈∧

λωCω,φ

ω∈∧

λω

2m1

k1

dkq ωq

η

k 2m1

Yk

Dφf, ω

Cq

2m1

k1

dqk

Sq Yk

Dφf, ωdμqω.

4.13

The H ¨older inequality, theiofLemma 3.5, and the fact that|pql1x| ≤1 make|YkDφf, ω| ≤

CqdqkDφfp,SqCqdq

k∂γfp,Sq. Therefore,

ω∈∧

λωCω,φCq2m1

k1

dkq2∂γfp,Sq. 4.14

Takegm,φ∗ x

ω∈∧λωCω,φKφ, x, ω,then

gm,φ∗ 2

H∧

ω,ω∈∧

λωλωCω,φCωKφ, ω, ω

ω∈∧

λωCω,φ2

ω,ω∈∧

λωλωKφ, ω, ω2

1/2

.

4.15

Equations3.2,3.17,3.16, and the Cauchy inequality make

ω∈∧

λωCω,φ2≤C

2m1

k1

dqk ωq

Yk

Vm

f, ω ak φ 2

2,Sq

C 2m1

k1

dqk2∂γfp,Sq 2

.

4.16

LetΓxbe the Gamma function. Then, it is well known thatΓx/Γxax−a, x → ∞.Therefore,

dlq 2lq−1 lq−1

lq−1 q−1

2lq−1Γlq−1 ΓqΓl1 ∼

2lq−1 Γq l

(15)

Hence,

2m1

k1

dqk2O 2m1

l1

l2q−1

O2m12q−1, m−→∞. 4.18

Equations4.14and4.4make

gm,φV2mf

p,Sq O

1 2−2q1

, 4.19

and hence

fgm,φp,SqO

1 2

1 2−2q1

. 4.20

Sinceγ ≥ 2q−1γ, we have2.11by4.20. Equation2.12follows by 4.3,4.4, and 3.19.

Proof ofCorollary 2.2.By2.11-2.12one has

K

f, 1 22

H

fgm,φp,Sq 1 22

g∗

m,φ

2

H

O

1 2

1

22 ×O

22m2q−1O

1 2

.

4.21

Proof ofTheorem 2.3. Take the place ofφxωinLemma 3.5withKφ, xω,denote still byDφ

the operatorDφ inLemma 3.5withφxy Kφ, xyand

gm,φx

ω∈∧

λωDφσ2m, η, f, ωKφ, xω, xSq, 4.22

then,gm,φHKφand by3.15fgm,φp,SqO1/2mβ.In this case,

gm,φ2H

ω,ω∈∧

λωλωDφσ2m, η, f, ωDφσ2m, η, f, ωKφ, ωω

ω∈∧

λωDφσ2m, η, f, ω2

ω,ω∈∧

λωλωKφ, ωω2

1/2

.

(16)

Sinceσ2m, η, xis a spherical harmonics of order≤ 2m, we know byiofLemma 3.5that

Dφσ2m, η, f, xare also spherical harmonics of order≤ 2m.Then, 3.2,iofLemma 3.5, 3.3, and3.16make

ω∈∧

λωDφσ2m, η, f, ω2≤C

Sq

Dφσ2m, η, f, ω2qω

C∂βσ2m, η, f2

2,Sq

C22mq1/p−1/2

βσ2m, η, f2

p,Sq

C22mq1/p−1/2βfβσ2m, η, f2

p,SqC∂βf2p,Sq

. 4.24

Hence,3.19and above equation makegm,φ2H

C2

mq1/p−1/2βf2

p,Sq. Equation2.14 follows by3.15.

Acknowledgments

This work is supported by the National NSF10871226 of China. The authors thank the reviewers for giving very valuable suggestions.

References

1 F. Cucker and S. Smale, “On the mathematical foundations of learning,”Bulletin of the American Mathematical Society, vol. 39, no. 1, pp. 1–49, 2002.

2 F. Cucker and D.-X. Zhou, Learning Theory: An Approximation Theory Viewpoint, Cambridge Monographs on Applied and Computational Mathematics, Cambridge University Press, Cambridge, Mass, USA, 2007.

3 V. N. Vapnik, Statistical Learning Theory, Adaptive and Learning Systems for Signal Processing, Communications, and Control, John Wiley & Sons, New York, NY, USA, 1998.

4 D. R. Chen, Q. Wu, Y. M. Ying, and D. X. Zhou, “Support vector machine soft margin classifier: error analysis,”Journal of Machine Learning and Research, vol. 5, pp. 1143–1175, 2004.

5 T. Evgeniou, M. Pontil, and T. Poggio, “Regularization networks and support vector machines,” Advances in Computational Mathematics, vol. 13, no. 1, pp. 1–50, 2000.

6 Y. Li, Y. Liu, and J. Zhu, “Quantile regression in reproducing kernel Hilbert spaces,”Journal of the American Statistical Association, vol. 102, no. 477, pp. 255–268, 2007.

7 H. Tong, D.-R. Chen, and L. Peng, “Analysis of support vector machines regression,”Foundations of Computational Mathematics, vol. 9, no. 2, pp. 243–257, 2009.

8 H. Tong, D.-R. Chen, and L. Peng, “Learning rates for regularized classifiers using multivariate polynomial kernels,”Journal of Complexity, vol. 24, no. 5-6, pp. 619–631, 2008.

9 D.-X. Zhou and K. Jetter, “Approximation with polynomial kernels and SVM classifiers,”Advances in Computational Mathematics, vol. 25, no. 1–3, pp. 323–344, 2006.

10 D. Chen and D.-H. Xiang, “The consistency of multicategory support vector machines,”Advances in Computational Mathematics, vol. 24, no. 1–4, pp. 155–169, 2006.

11 E. De Vito, L. Rosasco, A. Caponnetto, M. Piana, and A. Verri, “Some properties of regularized kernel methods,”Journal of Machine Learning Research, vol. 5, pp. 1363–1390, 2004.

12 N. Aronszajn, “Theory of reproducing kernels,”Transactions of the American Mathematical Society, vol. 68, pp. 337–404, 1950.

(17)

14 Q. Wu, Y. Ying, and D.-X. Zhou, “Multi-kernel regularized classifiers,”Journal of Complexity, vol. 23, no. 1, pp. 108–134, 2007.

15 J. Bergh and J. L ¨ofstr ¨om,Interpolation Spaces, Springer, New York, NY, USA, 1976.

16 H. Berens and G. G. Lorentz, “Inverse theorems for Bernstein polynomials,” Indiana University Mathematics Journal, vol. 21, no. 8, pp. 693–708, 1972.

17 H. Berens and L. Q. Li, “The Peetre K-moduli and best approximation on the sphere,” Acta Mathematica Sinica, vol. 38, no. 5, pp. 589–599, 1995.

18 H. Berens and Y. Xu, “K-moduli, moduli of smoothness, and Bernstein polynomials on a simplex,” Indagationes Mathematicae, vol. 2, no. 4, pp. 411–421, 1991.

19 W. Chen and Z. Ditzian, “Best approximation andK-functionals,”Acta Mathematica Hungarica, vol. 75, no. 3, pp. 165–208, 1997.

20 W. Chen and Z. Ditzian, “Best polynomial and Durrmeyer approximation inLpS,”Indagationes Mathematicae, vol. 2, no. 4, pp. 437–452, 1991.

21 F. Dai and Z. Ditzian, “Jackson inequality for Banach spaces on the sphere,” Acta Mathematica Hungarica, vol. 118, no. 1-2, pp. 171–195, 2008.

22 Z. Ditzian and X. Zhou, “Optimal approximation class for multivariate Bernstein operators,”Pacific Journal of Mathematics, vol. 158, no. 1, pp. 93–120, 1993.

23 Z. Ditzian and K. Runovskii, “Averages and K-functionals related to the Laplacian,” Journal of Approximation Theory, vol. 97, no. 1, pp. 113–139, 1999.

24 Z. Ditzian, “A measure of smoothness related to the Laplacian,” Transactions of the American Mathematical Society, vol. 326, no. 1, pp. 407–422, 1991.

25 Z. Ditzian and V. Totik,Moduli of Smoothness, vol. 9 ofSpringer Series in Computational Mathematics, Springer, New York, NY, USA, 1987.

26 Z. Ditzian, “Approximation on Banach spaces of functions on the sphere,”Journal of Approximation Theory, vol. 140, no. 1, pp. 31–45, 2006.

27 Z. Ditzian, “Fractional derivatives and best approximation,”Acta Mathematica Hungarica, vol. 81, no. 4, pp. 323–348, 1998.

28 L. L. Schumaker,Spline Functions: Basic Theory, John Wiley & Sons, New York, NY, USA, 1981, Pure and Applied Mathematics.

29 K. Y. Wang and L. Q. Li,Harmonic Analysis and Approximation on the Unit Sphere, Science Press, Beijing, China, 2000.

30 Y. Xu, “Approximation by means of h-harmonic polynomials on the unit sphere,” Advances in Computational Mathematics, vol. 21, no. 1-2, pp. 37–58, 2004.

31 S. Smale and D.-X. Zhou, “Estimating the approximation error in learning theory,”Analysis and Applications, vol. 1, no. 1, pp. 17–41, 2003.

32 B. Sheng, “Estimates of the norm of the Mercer kernel matrices with discrete orthogonal transforms,” Acta Mathematica Hungarica, vol. 122, no. 4, pp. 339–355, 2009.

33 S. Loustau, “Aggregation of SVM classifiers using Sobolev spaces,” Journal of Machine Learning Research, vol. 9, pp. 1559–1582, 2008.

34 H. N. Mhaskar and C. A. Micchelli, “Degree of approximation by neural and translation networks with a single hidden layer,”Advances in Applied Mathematics, vol. 16, no. 2, pp. 151–183, 1995.

35 B. H. Sheng, “Approximation of periodic functions by spherical translation networks,” Acta Mathematica Sinica. Chinese Series, vol. 50, no. 1, pp. 55–62, 2007.

36 B. Sheng, “On the degree of approximation by spherical translations,”Acta Mathematicae Applicatae Sinica. English Series, vol. 22, no. 4, pp. 671–680, 2006.

37 B. Sheng, J. Wang, and S. Zhou, “A way of constructing spherical zonal translation network operators with linear bounded operators,”Taiwanese Journal of Mathematics, vol. 12, no. 1, pp. 77–92, 2008.

38 H. N. Mhaskar, F. J. Narcowich, and J. D. Ward, “Approximation properties of zonal function networks using scattered data on the sphere,”Advances in Computational Mathematics, vol. 11, no. 2-3, pp. 121–137, 1999.

39 H. N. Mhaskar, “Weighted quadrature formulas and approximation by zonal function networks on the sphere,”Journal of Complexity, vol. 22, no. 3, pp. 348–370, 2006.

40 H. Groemer,Geometric Applications of Fourier Series and Spherical Harmonics, vol. 61 ofEncyclopedia of Mathematics and Its Applications, Cambridge University Press, Cambridge, Mass, USA, 1996.

41 C. M ¨uller,Spherical Harmonics, vol. 17 ofLecture Notes in Mathematics, Springer, Berlin, Germany, 1966.

42 S. Z. Lu and K. Y. Wang,Bochner-Riesz Means, Beijing Normal University Press, Beijing, China, 1988.

(18)

44 H. Wendland, Scattered Data Approximation, vol. 17 of Cambridge Monographs on Applied and Computational Mathematics, Cambridge University Press, Cambridge, Mass, USA, 2005.

45 H. N. Mhaskar, F. J. Narcowich, N. Sivakumar, and J. D. Ward, “Approximation with interpolatory constraints,”Proceedings of the American Mathematical Society, vol. 130, no. 5, pp. 1355–1364, 2002.

46 F. J. Narcowich and J. D. Ward, “Scattered data interpolation on spheres: error estimates and locally supported basis functions,”SIAM Journal on Mathematical Analysis, vol. 33, no. 6, pp. 1393–1410, 2002.

47 G. Brown and F. Dai, “Approximation of smooth functions on compact two-point homogeneous spaces,”Journal of Functional Analysis, vol. 220, no. 2, pp. 401–423, 2005.

48 G. Brown, D. Feng, and S. Y. Sheng, “Kolmogorov width of classes of smooth functions on the sphere

Sd−1,”Journal of Complexity, vol. 18, no. 4, pp. 1001–1023, 2002.

49 F. Dai, “Multivariate polynomial inequalities with respect to doubling weights andA∞weights,” Journal of Functional Analysis, vol. 235, no. 1, pp. 137–170, 2006.

50 H. N. Mhaskar, F. J. Narcowich, and J. D. Ward, “Spherical Marcinkiewicz-Zygmund inequalities and positive quadrature,”Mathematics of Computation, vol. 70, no. 235, pp. 1113–1130, 2001.

51 E. Belinsky, F. Dai, and Z. Ditzian, “Multivariate approximating averages,”Journal of Approximation Theory, vol. 125, no. 1, pp. 85–105, 2003.

52 A. I. Kamzolov, “Approximation of functions on the sphereSn,”Serdica, vol. 10, no. 1, pp. 3–10, 1984.

53 F. Dai and Z. Ditzian, “Ces`aro summability and Marchaud inequality,”Constructive Approximation, vol. 25, no. 1, pp. 73–88, 2007.

References

Related documents