• No results found

Approximation Analysis of Learning Algorithms for Support Vector Regression and Quantile Regression

N/A
N/A
Protected

Academic year: 2021

Share "Approximation Analysis of Learning Algorithms for Support Vector Regression and Quantile Regression"

Copied!
18
0
0

Loading.... (view fulltext now)

Full text

(1)

Volume 2012, Article ID 902139,17pages doi:10.1155/2012/902139

Research Article

Approximation Analysis of

Learning Algorithms for Support Vector

Regression and Quantile Regression

Dao-Hong Xiang,

1

Ting Hu,

2

and Ding-Xuan Zhou

3

1Department of Mathematics, Zhejiang Normal University, Zhejiang 321004, China 2School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China 3Department of Mathematics, City University of Hong Kong, Kowloon, Hong Kong

Correspondence should be addressed to Ding-Xuan Zhou,[email protected]

Received 14 July 2011; Accepted 14 November 2011 Academic Editor: Yuesheng Xu

Copyrightq2012 Dao-Hong Xiang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

We study learning algorithms generated by regularization schemes in reproducing kernel Hilbert spaces associated with an-insensitive pinball loss. This loss function is motivated by the -insen-sitive loss for support vector regression and the pinball loss for quantile regression. Approximation analysis is conducted for these algorithms by means of a variance-expectation bound when a noise condition is satisfied for the underlying probability measure. The rates are explicitly derived under a priori conditions on approximation and capacity of the reproducing kernel Hilbert space. As an application, we get approximation orders for the support vector regression and the quantile regu-larized regression.

1. Introduction and Motivation

In this paper, we study a family of learning algorithms serving both purposes of support vector

regression and quantile regression. Approximation analysis and learning rates will be provided,

which also helps better understanding of some classical learning methods.

Support vector regression is a classical kernel-based algorithm in learning theory introduced in1. It is a regularization scheme in a reproducing kernel Hilbert spaceRKHS HKassociated with an-insensitive lossψ:R → Rdefined for≥0 by

ψu max{|u| −,0}

|u| −, if |u| ≥,

(2)

Here, for learning functions on a compact metric spaceX,K :X×X → Ris a continuous, symmetric, and positive semidefinite function called a Mercer kernel. The associated RKHS HKis defined2as the completion of the linear span of the set of function{Kx Kx,· :

xX}with the inner product·,·K satisfyingKx, KyK Kx, y. LetY Randρbe a

Borel probability measure onZ:X×Y. With a sample z{xi, yi}mi1∈Zmindependently

drawn according toρ, the support vector regression is defined as

fzSVRarg min f∈HK 1 m m i1 ψfxiyi λf2K , 1.2

whereλλm>0 is a regularization parameter.

When >0 is fixed, convergence of1.2was analyzed in3. Notice from the original motivation1for the insensitive parameterfor balancing the approximation and sparsity of the algorithm thatshould change with the sample size and usuallym → 0 as the sample sizemincreases. Mathematical analysis for this original algorithm is still open. We will solve this problem in a special case of our approximation analysis for general learning algorithms. In particular, we show howfSVR

z approximates the median functionfρ,1/2, which

is one of the purposes of this paper. Here, forxX, the median function valuefρ,1/2xis a

median of the conditional distributionρ· |xofρatx.

Quantile regression, compared with the least squares regression, provides richer infor-mation about response variables such as stretching or compressing tails4. It aims at esti-mating quantile regression functions. With a quantile parameter 0< τ <1, a quantile regression

functionfρ,τ is defined by its valuefρ,τxto be aτ-quantile ofρ· |x, that is, a valueuY

satisfying

ρyY :yu |xτ, ρyY :yu |x≥1−τ. 1.3 Quantile regression has been studied by kernel-based regularization schemes in a learning theory literaturee.g.,5–8. These regularization schemes take the form

fzQRarg min f∈HK 1 m m i1 ψτ fxiyi λf2K , 1.4

whereψτ :R → Ris the pinball loss shown inFigure 1defined by

ψτu

1−τu, if u >0,

τu, if u≤0. 1.5

Motivated by the-insensitive lossψand the pinball lossψ

τ, we propose the

-insen-sitive pinball lossψ

τ :R → Rwith an insensitive parameter≥0 shown inFigure 1defined

as ψτu ⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩ 1−τu, ifu > ,τu, ifu≤ −, 0, otherwise. 1.6

(3)

0 uτu (1−τ)u ψτ(u) 0 ψɛτ(u)τ(u+ɛ) (1−τ)(u uɛ)ɛ ɛ Figure 1

This loss function has been applied to online learning for quantile regression in our previous work8. It is applied here to a regularization scheme in the RKHS as

fzfz,λ,τ arg min f∈HK 1 m m i1 ψτfxiyi λf2K . 1.7

The main goal of this paper is to study how the output functionfzgiven by1.7converges

to the quantile regression functionfρ,τ and how explicit learning rates can be obtained with

suitable choices of the parameters λ mα, mβ based on a priori conditions on the

probability measureρ.

2. Main Results on Approximation

Throughout the paper, we assume that the conditional distributionρ· | xis supported on

−1,1for everyxX. Then, we see from1.3that we can take values offρ,τto be on−1,1.

So to see howfzapproximatesfρ,τ, it is natural to project values of the output functionfz

onto the same interval by the projection operator introduced in9.

Definition 2.1. The projection operatorπon the space of function onXis defined by

πfx ⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩ 1, iffx>1, −1, iffx<−1, fx, if −1≤fx≤1. 2.1

Our approximation analysis aims at establishing bounds for the errorπfzfρ,τLpρX

in the spaceLpρXwith somep>0 whereρXis the marginal distribution ofρonX.

2.1. Support Vector Regression and Quantile Regression

Our error bounds and learning rates are presented in terms of a noise condition and approx-imation condition onρ.

(4)

Definition 2.2. Letp∈0,∞andq∈1,∞. We say thatρhas aτ-quantile ofp-average type

qif for everyxX, there exist aτ-quantilet∗∈Rand constantsax ∈0,2, bx>0 such that

for eachu∈0, ax,

ρyt∗−u, t∗ |xbxuq−1, ρ

yt, tu |xbxuq−1, 2.2

and that the function onXtaking valuebxaqx−1−1atxXlies inLpρX.

Note that condition2.2tells us thatfρ,τx t∗is uniquely defined at everyxX.

The approximation condition on ρ is stated in terms of the integral operator LK :

L2

ρXL

2

ρX defined byLKfx

XKx, ufudρXu. SinceK is positive semidefinite,

LK is a compact positive operator and itsr-th powerLrKis well-defined for anyr > 0. Our

approximation condition is given as

fρ,τ LrK gρ,τ for some 0< r ≤ 1 2, gρ,τL 2 ρX. 2.3

Let us illustrate our approximation analysis by the following special case which will be proved inSection 5.

Theorem 2.3. LetX ⊂ Rn andK CX×X. Assumef

ρ,τ ∈ HKandρhas aτ-quantile ofp

-average type 2 for somep∈0,∞. Takeλmp1/p2andmβwithp1/p2β≤ ∞.

Let 0< η <p1/2p2. Then, withp∗ 2p/p1 >0, for any 0< δ <1, with confidence 1−δ, one has πfzfρ,τ LpρX∗ ≤ Clog3 δm ηp1/2p2, 2.4

whereCis a constant independent ofmorδ.

Ifp≥1/2η−2 for 0< η <1/4, we see that the power exponent for the learning rate

2.4is at least1/2−2η. This exponent can be arbitrarily close to 1/2 whenηis small enough. In particular, if we takeτ 1/2,Theorem 2.3provides rates for output functionfzSVR

of the support vector regression1.2to approximate the median functionfρ,1/2.

If we takeβ∞leading to0,Theorem 2.3provides rates for output functionfzQR

of the quantile regression algorithm1.4to approximate the quantile regression functionfρ,τ.

2.2. General Approximation Analysis

To state our approximation analysis in the general case, we need the capacity of the hypo-thesis space measured by covering numbers.

Definition 2.4. For a subsetSofCXandu > 0, the covering numberNS, uis the minimal integerl∈Nsuch that there existldisks with radiusucoveringS.

(5)

The covering numbers of ballsBR {f ∈ HK :fKR}with radiusR > 0 of the

RKHS have been well studied in the learning theory literature10,11. In this paper, we as-sume for somes >0 andCs>0 that

logNB1, uCs 1 u s ,u >0. 2.5

Now we can state our main result which will be proved inSection 5. Forp ∈ 0,

andq∈1,∞, we denote θmin 2 q, p p1 ∈0,1. 2.6

Theorem 2.5. Assume2.3with 0 < r ≤1/2 and2.5withs > 0. Suppose thatρhas aτ

-quan-tile ofp-average typeqfor somep ∈0,andq∈1,∞. Takeλ mαwith 0 < α1 andα <

2s/s2sθ. Setmβwithαr/1rβ≤ ∞. Let

0< η < 1s22s2sθ

s2sθ2s . 2.7

Then, withppq/p1>0, for any 0< δ <1, with confidence 1δ, one has

πfzfρ,τ LpρX∗ ≤ C log3 η 2 log3 δmϑ, 2.8

whereCis a constant independent ofmorδand the power indexϑis given in terms ofr, s, p, q, α, andηby ϑ 1 qmin αr 1−r, 1 2sθ2sθ−1 2sθ2s 1s, 1 2sθ1−2r 1s2−2r, 1 2sθs 1s α 2 − 1 22−θ . 2.9

The indexϑcan be viewed as a function of variablesr, s, p, q, α, η. The restriction 0 < α <2s/s2sθonαand2.7onηensure thatϑis positive, which verifies the valid learning rate inTheorem 2.5.

Assumption2.5is a measurement of regularity of the kernelKwhenXis a subset of Rn. In particular,scan be arbitrarily small whenKis smooth enough. In this case, the power

indexϑin2.8can be arbitrarily close to1/qmin{αr/1−r,1/2−θ}. Again, whenβ∞,

0, algorithm 1.7corresponds to algorithm1.4 for quantile regression. In this case,

(6)

Error analysis has been done for quantile regression algorithm1.4in5,6. Under the assumptions thatρsatisfies2.2with somep∈0,∞andq >1 and the2empirical covering

numberNzB1, η, 2 see5for more detailsofB1is bounded as

sup z∈Zm logNz B1, η, 2 ≤a 1 η s withs∈0,2, a≥1, 2.10

it was proved in5that with confidence 1−δ,

πfzQRfρ,τ LpρX∗ ≤ ⎧ ⎨ ⎩Dτλ Dτλ λ log3 m Ks,Cpa λs/2m p1/p2−s/2 Ks,Cpa λs/2m 5 32Cplog3 m p1/p2 145 log3 m ⎫ ⎪ ⎪ ⎬ ⎪ ⎪ ⎭ 1/q , 2.11 whereCpandKs,Cpare constants independent ofmorλ. Here,Dτλis the regularization error

defined as Dτλ inf f∈HK Eτ f− Eτ fρ,τ λf2K, 2.12

andEτfis the generalization error associated with the pinball lossψτdefined by

Eτ f Z ψτ fxydρ X Y ψτ fxydρy|xdρXx. 2.13

Note thatEτfis minimized by the quantile regression functionfρ,τ. Thus, when the

regular-ization errorDτλdecays polynomially asDτλOλr/1−r which is ensured byLemma 2.6

below when2.3is satisfiedandλmα, thenπfzQRfρ,τLp

ρX Olog3/δmϑwith ϑ 1 qmin αr 1−r, p1 p2−s/2 1−αs 2 ,p1 p2 . 2.14

Sincep1/p2 1/2−θ, we see that this learning rate is comparable to our result in

(7)

2.3. Comparison with Least Squares Regression

There has been a large literature in learning theorydescribed in12for the least squares algorithms: fzLSarg min f∈HK 1 m m i1 fxiyi 2 λf2K . 2.15

It aims at learning the regression functionfρx

Yydρy | x. A crucial property for its

error analysis is the identityElsf− Elsfρ f2L2

ρX for the least squares generalization

errorElsf

Zyfx

2. It yields a variance-expectation bound Eξ2 4Eξfor the

random variableξ yfx2−yfρx2onZ, ρwheref :XYis an arbitrary

mea-surable function. Such a variance-expectation bound with Eξpossibly replaced by its posi-tive powerEξθplays an essential role for analyzing regularization schemes and the power exponentθdepends on strong convexity of the loss. See13and references therein. However, the pinball loss in the quantile regression setting has no strong convexity6and we would not expect a variance-expectation bound for a general distributionρ. Whenρhas aτ-quantile ofp-average typeq, the following variance-expectation bound withθgiven by2.6can be found in5,7 derived by means ofLemma 3.1below.

Lemma 2.6. Ifρhas aτ-quantile ofp-average typeqfor somep∈0,andq∈1,∞, then

Eψτ fxyψτ fρ,τxy 2 Eτ f− Eτ fρ,τ θ ,f:X−→Y, 2.16

where the power indexθis given by2.6and the constantCθisCθ22−θqθγ−1θLp

ρX.

Lemma 2.6overcomes the difficulty of quantile regression caused by lack of strong convexity of the pinball loss. It enables us to derive satisfactory learning rates, as in

Theorem 2.5.

3. Insensitive Relation and Error Decomposition

An important relation for quantile regression observed in5assets that the errorπfz

fρ,τ taken in a suitable Lp

ρX space can be bounded by the excess generalization error

Eτπfz− Eτfρ,τwhen the noise condition is satisfied.

Lemma 3.1. Letp ∈0,andq∈ 1,∞. Denoteppq/p1 >0. Ifρhas aτ-quantile of

p-average typeq, then for any measurable functionfonX, one has

ffρ,τLpρXCq,ρ Eτ f− Eτ fρ,τ 1/q , 3.1 whereCq,ρ21−1/qq1/q{bxaqx−1−1}xX 1/q LpρX.

(8)

ByLemma 3.1, to estimateπfzfρ,τLp

ρX, we only need to bound the excess

gener-alization errorEτπfz− Eτfρ,τ. This will be done by conducting an error decomposition

which has been developed in the literature for regularization schemes9,13–15. Technical difficulty arises for our problem here because the insensitive parameterchanges withm. This can be overcome16by the following insensitive relation

ψτuψτuψτu, u∈R. 3.2

Now, we can conduct an error decomposition. Define the empirical errorEz,τffor

f:X → Ras Ez f 1 m m i1 ψτ fxiyi . 3.3

Lemma 3.2. Letλ >0,fzbe defined by1.7and

fλ0arg min f∈HK Eτ f− Eτ fρ,τ λf2K. 3.4 Then, Eτ πfz − Eτ fρ,τ λfz 2 K≤ S1S2Dτλ, 3.5 where S1 Eτ πfz − Eτ fρ,τ − Ez πfz − Ez fρ,τ , S2 Ez fλ0− Ez fρ,τ − Eτ fλ0− Eτ fρ,τ , 3.6

Proof. The regularized excess generalization errorEτπfz− Eτfρ,τ λfz2Kcan be

ex-pressed as Eτ πfz − Ez πfz ! Ez πfz λfz 2 K " − ! Ez fλ0λfλ02 K " Ez fλ0− Eτ fλ0 Eτ fλ0− Eτ fρ,τ λfλ02 K . 3.7

(9)

The fact|y| ≤ 1 implies thatEz,τπfz ≤ Ez,τfz. The insensitive relation3.2and the

definition offztell us that

Ez fz λfz 2 K≤ 1 m m i1 ψτfzxiyi λfz 2 K ≤ 1 m m i1 ψτfλ0xiyi λfλ02 K≤ Ez fλ0λfλ02 K. 3.8

Then, by subtracting and addingEτfρ,τandEz,τfρ,τand notingDτλ Eτfλ0−Eτfρ,τ

λfλ02

K, we see that the desired inequality inLemma 3.2holds true.

In the error decomposition3.5, the first two terms are called sample error. The last term is the regularization error defined in2.12. It can be estimated as follows.

Proposition 3.3. Assume2.3. Definefλ0by3.4. Then, one has

DτλC0λr/1−r, fλ0 K≤ # C0λ2r−1/2−2r, 3.9

whereC0is the constantC0 gρ,τL2

ρX gρ,τ

2

L2

ρX.

Proof. Letμλ1/1−r>0 and

LKμI

−1

LKfρ,τ. 3.10

It can be found in17,18that when2.3holds, we have

fρ,τ2L2 ρXμfμ 2 Kμ 2rg ρ,τ2L2 ρX. 3.11 Hence,fρ,τL2 ρXμ rg ρ,τL2 ρX and 2 Kμ2r−1gρ,τ2L2

ρX. Sinceψτis Lipschitz, we know

by takingffμin2.12that DτλZ ψτ fμxyψτ fρ,τxy λfμ2Kfρ,τL1 ρXλ fμ2 Kfρ,τL2 ρXλ fμ2 Kμrgρ,τL2 ρXλμ 2r−1g ρ,τ2L2 ρX gρ,τL2 ρX gρ,τ2 L2 ρX λr/1−r. 3.12

This verifies the desired bound forDτλ. By takingf 0 in2.12, we have

λfλ02

K≤ DτλC0λ

r/1−r. 3.13

(10)

4. Estimating Sample Error

This section is devoted to estimating the sample error. This is conducted by using the vari-ance-expectation bound inLemma 2.6.

DenoteκsupxX#Kx, x. ForR≥1, denote WR zZm:fz

KR

. 4.1

Proposition 4.1. Assume2.3and2.5. LetR1 and 0< δ <1. Ifρhas aτ-quantile ofp-average

typeqfor somep∈0,andq∈1,∞, then there exists a subsetVR ofZmwith measure at most

δsuch that for any z∈ WR\VR,

Eτ πfz − Eτ fρ,τ λfz 2 K≤2C 1log 3 δλ r/1−rmax 1 −1/2−2r m C2log3 δm −1/2−θC 3m−1/2sθRs/1s, 4.2

whereθis given by2.6andC1, C2, C3are constants given by

C14C02κ # C0, C2245324C 1/2−θ θ , C3 406Cs1/1s408CθCs1/2sθ. 4.3

Proof. Let us first estimate the second partS2of the sample error. It can be decomposed into

two partsS2S2,1S2,2where

S2,1 Ez fλ0− Ez πfλ0 −Eτ fλ0− Eτ πfλ0 , S2,2 Ez πfλ0− Ez fρ,τ − Eτ πfλ0− Eτ fρ,τ . 4.4

For boundingS2,1, we take the random variableξzψτfλ0xyψτπfλ0xy

onZ, ρ. It satisfies 0≤ξ≤ |πfλ0xfλ0x| ≤1fλ0. Hence,|ξEξ| ≤1fλ0

and EξEξ2 ≤Eξ2 1f0

λEξ. Applying the one-side Bernstein inequality12,

we know that there exists a subsetZ1ofZmwith measure at least 1−δ/3such that

S2,1≤ 71fλ0 ∞ log3 6m Eτ fλ0− Eτ πfλ0,zZ1,δ. 4.5

(11)

For S2,2, we apply the one-side Bernstein inequality again to the random variable

ξz ψτπfλ0xyψτfρ,τxy, bound the variance byLemma 2.6withfπfλ0,

and find that there exists another subsetZ2ofZmwith measure at least 1−δ/3such that

S2,2≤ 4 log3 3m θ 2 θ/4−2θ 1−θ 2 2C θlog3 m 1/2−θ Eτ πfλ0− Eτ fρ,τ ,zZ2,δ. 4.6

Next, we estimate the first partS1of the sample error. Consider the function set

Gψτ πfxyψτ fρ,τxy :fKR . 4.7

A function from this setgz ψτπfxyψτfρ,τxysatisfies Eg≥0,|gz| ≤2,

and Eg2 C

θE by 2.16. Also, the Lipschitz property of the pinball loss yields

NG, u≤ NB1, u/R. Then, we apply a standard covering number argument with a ratio

inequality12,13,19,20toGand find from the covering number condition2.5that

Probz∈Zm ⎧ ⎪ ⎨ ⎪ ⎩fsupKR $ Eτ πf− Eτ fρ,τ % −$Ez πf− Ez fρ,τ % & Eτ πf− Eτ fρ,τ θ ≤4u1−θ/2 ⎫ ⎪ ⎬ ⎪ ⎭ ≥1− NB1, u R exp − mu2−θ 2 4/3u1−θ ≥1−exp Cs R u smu2−θ 2 4/3u1−θ . 4.8

Setting the confidence to be 1−δ/3, we takeuR, m, δ/3to be the positive solution to the equation Cs R u smu2−θ 2 4/3u1−θ logδ 3. 4.9

Then, there exists a third subsetZ3 ofZmwith measure at least 1−δ/3such that

sup fKR $ Eτ πf−Eτ fρ,τ % −$Ez πf−Ez fρ,τ % & Eτ πf−Eτ fρ,τ θ uR, m, δ/3θ ≤4 uR, m,δ 3 1−θ/2 ,zZ3,δ. 4.10

(12)

Thus, forz∈ WRZ3, we have S1 Eτ πfz − Eτ fρ,τ − Ez πfz − Ez fρ,τ ≤4 ! uR, m,δ 3 "1−θ/2 Eτ πfz − Eτ fρ,τ θ ! uR, m,δ 3 "θ ≤ 1−θ 2 42/2−θuR, m,δ 3 θ 2 Eτ πfz − Eτ fρ,τ 4uR, m,δ 3 . 4.11

Here, we have used the elementary inequality√ab ≤ √aband Young’s inequality. Putting this bound and4.5,4.6into3.5, we know that for z∈ WRZ3Z1Z2,

there holds Eτ πfz− Eτ fρ,τ λfz 2 K≤ 1 2 Eτ πfz − Eτ fρ,τ 2Dτλ 20uR, m,δ 3 7 1fλ0 ∞ log3 6m 4 log3 3m 2C θlog3 m 1/2−θ , 4.12 which together withProposition 3.3implies

Eτ πfz− Eτ fρ,τ λfz 2 K≤24C0λ r/1−r522C θ1/2−θ log3 δm −1/2−θ 40uR, m,δ 3 2κ#C0log 3 δ λ2r−1/2−2r m . 4.13 Here, we have used the reproducing property inHKwhich yields12

fκfK,f∈ HK. 4.14

Equation4.9can be expressed as

u2sθ− 4 3mlog 3 δu 1sθ2 m log 3 δu s 4CsRs 3m u 1−θ2CθCsRs m 0. 4.15

By Lemma 7.2 in12, the positive solutionuR, m, δ/3to this equation can be bounded as

uR, m, δ/3≤max 6 mlog 3 δ, 8 m log 3 δ 1/2−θ , 6CsRs m 1/1s , 8CθCsRs m 1/2sθ ≤6 81/2−θ log3 δm −1/2−θ6C s1/1s 8CθCs1/2sθ ×m−1/2sθRs/1s. 4.16

(13)

Thus, for z∈ WRZ3Z1Z2, the desired bound4.2holds true. Since the measure

of the setZ3Z1Z2is at least 1−δ, our conclusion is proved.

5. Deriving Convergence Rates by Iteration

To applyProposition 4.1 for error analysis, we need someR1 for z ∈ WR. One may choose−1/2according to

fz

Kλ

−1/2, zZm 5.1

which is seen by takingf 0 in1.7. This choice is too rough. Recall fromProposition 3.3

thatfλ0K

#

C0λ2r−1/2−2rwhich is a bound for the noise-free limitfλ0offz. It is much

better thanλ−1/2. This motivates us to try similar tight bounds forf

z . This target will be

achieved in this section by applyingProposition 4.1iteratively. The iteration technique has been used in13,21to improve learning rates.

Lemma 5.1. Assume2.3with 0< r ≤1/2 and2.5withs >0. Takeλmαwith 0< α1 and

mβwith 0< β≤ ∞. Let 0< η <1. Ifρhas aτ-quantile ofp-average typeqfor somep0,

andq∈1,∞, then for any 0< δ <1, with confidence 1δ, there holds

fz K≤4C 3 1√2 & C1 & C2 log3 η 2 log3 δm θη, 5.2 whereθηis given by θηmax αβ 2 , α1−2r 2−2r , α 2 − 1 22−θ, α2sθ−11s 2sθ2s η ≥0. 5.3

Proof. Puttingλ mαwith 0 < α 1 and mβwith 0< β ≤ ∞intoProposition 4.1, we

know that for anyR≥1 there exists a subsetVRofZmwith measure at mostδsuch that

fz

KamR

s/22sb

m,z∈ WR\VR, 5.4

where withζ: max{αβ/2, α1−2r/2−2r, α/2−1/22−θ} ≥ 0, the constants are given by am & C3mα/2−1/22sθ, bm ⎛ ⎝√2 C1log3 δ C2log3 δ ⎞ ⎠mζ. 5.5 It follows that WR⊆ WamRs/22sbmVR. 5.6

(14)

Let us apply5.6iteratively to a sequence{Rj}J

j0defined byR0λ−1/2andRj

amRj−1s/22sbmwhereJ ∈Nwill be determined later. Then,WRj−1⊆ WRjVRj−1. By5.1,WR0 Zm. So we have ZmWR0⊆ WR1∪VR0 ⊆ · · · ⊆ W RJ∪∪jJ0−1VRj . 5.7 As the measure ofVRj is at most δ, we know that the measure of∪Jj0−1VRj is at most . Hence,WRJhas measure at least 1.

DenoteΔ s/22s<1/2. The definition of the sequence{Rj}J

j0tells us that RJam1ΔΔ2···ΔJ−1RJ J−1 j1 am1ΔΔ2···Δj−1bmΔjbm. 5.8

Let us bound the two terms on the right-hand side. The first term equals

C31−ΔJ/21−Δ2sθ−1/42s−2θ1−ΔJ/1−Δmα/J, 5.9

which is bounded by

C32sθ−1/42s−2θ1−Δmα/2−α2sθ−1/42s−2θ1−ΔΔJ

C32sθ−11s/2sθ2sm1/2sθ2−J. 5.10

TakeJto be the smallest integer greater than or equal to log1/η/log 2. The above expres-sion can be bounded byC32sθ−11s/2sθ2.

The second terms equals

J−1 j1 a1ΔΔm 2···Δj−1bmΔjbmJ−1 j1 C3mα/2−1/22sθ1−Δj/1−ΔbΔ1jmζΔjb1mζ, 5.11 whereb1 : √

2&C1log3 &C2log3. It is bounded by

2sθ−11s/2sθ2sC3b1

J−1

j0

α2sθ−11s/2sθ2ssj/22sj. 5.12

When ζα2sθ−11s/2sθ2s, the above expression is bounded by

C3b1Jmα2sθ−11s/2sθ2s. When ζα2sθ−11s/2sθ2s, it is

(15)

Based on the above discussion, we obtain

RJC3C3b1J

mθη, 5.13

whereθηmax{α2sθ−11s/2sθ2s η, ζ}. So with confidence 1−,

there holds fz KR JC 3 1√2 & C1 & C2 J log3 δm θη. 5.14

Then, our conclusion follows by replacingδbyδ/Jand notingJ≤2 log3. Now, we can prove our main result,Theorem 2.5.

Proof ofTheorem 2.5. TakeRto be the right side of5.2. ByLemma 5.1, there exists a subset

VR ofZmwith measure at mostδsuch thatZm\V

R⊆ WR. ApplyingProposition 4.1to this

R, we know that there exists another subsetVR ofZmwith measure at mostδsuch that for

any z∈ WR\VR, Eτ πfz − Eτ fρ,τ ≤2mβC1log3 δmαr/1−rC 2log 3 δm −1/2−θ C4 log3 η 2 log3 δm s/1sθη−1/2sθ, 5.15 where C4 C34C3s/1s 1√2 & C1 & C2 . 5.16

Since the setVRVR has measure at most 2δ, after scaling 2δtoδand setting the constantC

by

C2Cq,ρ

2C1C2 C4, 5.17

we see that the above estimate together withLemma 3.1gives the error bound

πfzfρ,τ LpρX∗ ≤C log3 η 2 log3 δmϑ 5.18

with confidence 1−δand the power indexϑgive by

ϑ 1 qmin β, αr 1−r, 1 2sθs 1sθη , 5.19

(16)

provided that

θη<

1s

s2sθ. 5.20

Sinceβαr/1−r, we know thatαβ/2≤α1−2r/2−2r. By the restriction 0< α <

2s/s2sθonα, we findα1−2r/2−2r<1s/s2sθandα/2−1/22−θ<

1s/s2sθ. Moreover, restriction2.7onηtell us thatα2sθ−11s/2

sθ2s η <1s/s2sθ. Therefore, condition5.20is satisfied. The restriction

βαr/1−rand the above expression forϑtells us that the power index for the error bound can be exactly expressed by formula2.9. The proof ofTheorem 2.5is complete.

Finally, we proveTheorem 2.3.

Proof ofTheorem 2.3. Sincefρ,τ ∈ HK, we know that2.3holds withr1/2. The noise

condi-tion onρis satisfied withq2 andp∈0,∞. Then,θp/p1∈0,1. SinceX ⊂Rnand

KCX×X, we know from11that2.5holds true for anys > 0. With 0< η <p

1/2p2, let us choosesto be a positive number satisfying the following four inequalities:

p1 p2 < 2s s2sθ, 1 3 < 1s$22ss2sθp1/p2% s2sθ2s , p1 p2 −2η≤ 1 2sθs$2sθp1/p2−1% 2sθ2ss 31s, p1 p2 −2η≤ 1 2sθ. 5.21

The first inequality above tells us that the restrictions onα, βare satisfied by choosingα

p1/p2 1/2−θandp1/p2≤ β ≤ ∞. The second inequality shows that condition2.7for the parameterηrenamed now asη∗is also satisfied by takingη∗ 1/3. Thus, we applyTheorem 2.5and know that with confidence 1−δ,2.8holds with the power indexϑgiven by2.9; butr 1/2, α1/2−θ, andη∗1/3 imply that

ϑ 1 2min α, 1 2sθ, 1 2sθ2sθ−1 2sθ2ss 31s . 5.22 The last two inequalities satisfied bysyieldϑα/2−η. So2.8verifies2.4. This com-pletes the proof of the theorem.

Acknowledgment

The work described in this paper is supported by National Science Foundation of China un-der Grant 11001247 and by a grant from the Research Grants Council of Hong KongProject no. CityU 103709.

(17)

References

1 V. N. Vapnik, Statistical Learning Theory, Adaptive and Learning Systems for Signal Processing, Com-munications, and Control, John Wiley & Sons, New York, NY, USA, 1998.

2 N. Aronszajn, “Theory of reproducing kernels,” Transactions of the American Mathematical Society, vol. 68, pp. 337–404, 1950.

3 H. Tong, D.-R. Chen, and L. Peng, “Analysis of support vector machines regression,” Foundations of

Computational Mathematics, vol. 9, no. 2, pp. 243–257, 2009.

4 R. Koenker, Quantile Regression, vol. 38 of Econometric Society Monographs, Cambridge University Press, Cambridge, UK, 2005.

5 I. Steinwart and A. Christman, “How SVMs can estimate quantile and the median,” in Advances in

Neural Information Processing Systems, vol. 20, pp. 305–312, MIT Press, Cambridge, Mass, USA, 2008.

6 I. Steinwart and A. Christmann, “Estimating conditional quantiles with the help of the pinball loss,”

Bernoulli, vol. 17, no. 1, pp. 211–225, 2011.

7 D. H. Xiang, “Conditional quantiles with varying Gaussians,” online first in Advances in Computational

Mathematics.

8 T. Hu, D. H. Xiang, and D. X. Zhou, “Online learning for quantile regression and support vector regression,” Submitted.

9 D.-R. Chen, Q. Wu, Y. Ying, and D.-X. Zhou, “Support vector machine soft margin classifiers: error analysis,” Journal of Machine Learning Research, vol. 5, pp. 1143–1175, 2004.

10 D.-X. Zhou, “The covering number in learning theory,” Journal of Complexity, vol. 18, no. 3, pp. 739– 767, 2002.

11 D.-X. Zhou, “Capacity of reproducing kernel spaces in learning theory,” IEEE Transactions on

Infor-mation Theory, vol. 49, no. 7, pp. 1743–1752, 2003.

12 F. Cucker and D.-X. Zhou, Learning Theory: An Approximation Theory Viewpoint, Cambridge Mono-graphs on Applied and Computational Mathematics, Cambridge University Press, Cambridge, UK, 2007.

13 Q. Wu, Y. Ying, and D.-X. Zhou, “Learning rates of least-square regularized regression,” Foundations

of Computational Mathematics, vol. 6, no. 2, pp. 171–192, 2006.

14 Q. Wu and D.-X. Zhou, “Analysis of support vector machine classification,” Journal of Computational

Analysis and Applications, vol. 8, no. 2, pp. 99–119, 2006.

15 T. Hu, “Online regression with varying Gaussians and non-identical distributions,” Analysis and

Ap-plications, vol. 9, no. 4, pp. 395–408, 2011.

16 D.-H. Xiang, T. Hu, and D.-X. Zhou, “Learning with varying insensitive loss,” Applied Mathematics

Letters, vol. 24, no. 12, pp. 2107–2109, 2011.

17 S. Smale and D.-X. Zhou, “Learning theory estimates via integral operators and their approxima-tions,” Constructive Approximation, vol. 26, no. 2, pp. 153–172, 2007.

18 S. Smale and D.-X. Zhou, “Online learning with Markov sampling,” Analysis and Applications, vol. 7, no. 1, pp. 87–113, 2009.

19 Y. Yao, “On complexity issues of online learning algorithms,” IEEE Transactions on Information Theory, vol. 56, no. 12, pp. 6470–6481, 2010.

20 Y. Ying, “Convergence analysis of online algorithms,” Advances in Computational Mathematics, vol. 27, no. 3, pp. 273–291, 2007.

21 I. Steinwart and C. Scovel, “Fast rates for support vector machines using Gaussian kernels,” The

(18)

Submit your manuscripts at

http://www.hindawi.com

Hindawi Publishing Corporation

http://www.hindawi.com Volume 2014

Mathematics

Journal of

Hindawi Publishing Corporation

http://www.hindawi.com Volume 2014 Mathematical Problems in Engineering

Hindawi Publishing Corporation http://www.hindawi.com

Differential Equations

International Journal of

Volume 2014

Applied Mathematics Hindawi Publishing Corporation

http://www.hindawi.com Volume 2014

Probability and Statistics Hindawi Publishing Corporation

http://www.hindawi.com Volume 2014

Hindawi Publishing Corporation

http://www.hindawi.com Volume 2014

Mathematical PhysicsAdvances in

Complex Analysis

Journal of

Hindawi Publishing Corporation

http://www.hindawi.com Volume 2014

Optimization

Journal of Hindawi Publishing Corporation

http://www.hindawi.com Volume 2014

Combinatorics

Hindawi Publishing Corporation

http://www.hindawi.com Volume 2014

International Journal of Hindawi Publishing Corporation

http://www.hindawi.com Volume 2014

Operations Research

Journal of

Hindawi Publishing Corporation

http://www.hindawi.com Volume 2014

Function Spaces

Abstract and Applied Analysis Hindawi Publishing Corporation

http://www.hindawi.com Volume 2014 International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporation http://www.hindawi.com Volume 2014

The Scientific

World Journal

Hindawi Publishing Corporation

http://www.hindawi.com Volume 2014

Hindawi Publishing Corporation

http://www.hindawi.com Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporation

http://www.hindawi.com Volume 2014 Hindawi Publishing Corporation

http://www.hindawi.com Volume 2014

Decision Sciences

Discrete Mathematics

Journal of

Hindawi Publishing Corporation

http://www.hindawi.com Volume 2014 Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Stochastic Analysis

References

Related documents

Quantification of CB2 (A) and MOR (F) immufluorescence-staining in the dorsal horn of the spinal cord (DHSC), ipsilateral to sham surgery or chronic constriction injury (CCI),

In order to reduce the overall complexity of an articulated robotic hand, preserving its dexterity, a traditional mechatronic approach has been assumed, and, besides the mechanics

In the market scenarios (1–4), the water users can reap the increase in rent due to the restriction on the resources; in the non-market scenario (5), the water users cannot use

In this study, a simulation optimization spreadsheet model was proposed for the decision of expanding operations in the fast-food industry.. The model was presented via a

They are presented in the form of FITS files with two tables, one (called SOURCES) containing the source mag- nitudes and fluxes for 6 filters, as well as the magnitude and flux

The following six knowledge and skills competencies are both important and used frequently by managers for on-the-job coaching: active listening, effective questioning,

So, we cannot obtain the Muskat problem of the liquid motion in the pore space of an absolutely rigid body as a homogenization of the corresponding initial boundary value problem for

• As the number of double bonds in the fatty acid.. increases , the melting point