Volume 2012, Article ID 902139,17pages doi:10.1155/2012/902139
Research Article
Approximation Analysis of
Learning Algorithms for Support Vector
Regression and Quantile Regression
Dao-Hong Xiang,
1Ting Hu,
2and Ding-Xuan Zhou
31Department of Mathematics, Zhejiang Normal University, Zhejiang 321004, China 2School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China 3Department of Mathematics, City University of Hong Kong, Kowloon, Hong Kong
Correspondence should be addressed to Ding-Xuan Zhou,[email protected]
Received 14 July 2011; Accepted 14 November 2011 Academic Editor: Yuesheng Xu
Copyrightq2012 Dao-Hong Xiang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
We study learning algorithms generated by regularization schemes in reproducing kernel Hilbert spaces associated with an-insensitive pinball loss. This loss function is motivated by the -insen-sitive loss for support vector regression and the pinball loss for quantile regression. Approximation analysis is conducted for these algorithms by means of a variance-expectation bound when a noise condition is satisfied for the underlying probability measure. The rates are explicitly derived under a priori conditions on approximation and capacity of the reproducing kernel Hilbert space. As an application, we get approximation orders for the support vector regression and the quantile regu-larized regression.
1. Introduction and Motivation
In this paper, we study a family of learning algorithms serving both purposes of support vector
regression and quantile regression. Approximation analysis and learning rates will be provided,
which also helps better understanding of some classical learning methods.
Support vector regression is a classical kernel-based algorithm in learning theory introduced in1. It is a regularization scheme in a reproducing kernel Hilbert spaceRKHS HKassociated with an-insensitive lossψ:R → Rdefined for≥0 by
ψu max{|u| −,0}
|u| −, if |u| ≥,
Here, for learning functions on a compact metric spaceX,K :X×X → Ris a continuous, symmetric, and positive semidefinite function called a Mercer kernel. The associated RKHS HKis defined2as the completion of the linear span of the set of function{Kx Kx,· :
x ∈X}with the inner product·,·K satisfyingKx, KyK Kx, y. LetY Randρbe a
Borel probability measure onZ:X×Y. With a sample z{xi, yi}mi1∈Zmindependently
drawn according toρ, the support vector regression is defined as
fzSVRarg min f∈HK 1 m m i1 ψfxi−yi λf2K , 1.2
whereλλm>0 is a regularization parameter.
When >0 is fixed, convergence of1.2was analyzed in3. Notice from the original motivation1for the insensitive parameterfor balancing the approximation and sparsity of the algorithm thatshould change with the sample size and usuallym → 0 as the sample sizemincreases. Mathematical analysis for this original algorithm is still open. We will solve this problem in a special case of our approximation analysis for general learning algorithms. In particular, we show howfSVR
z approximates the median functionfρ,1/2, which
is one of the purposes of this paper. Here, forx∈X, the median function valuefρ,1/2xis a
median of the conditional distributionρ· |xofρatx.
Quantile regression, compared with the least squares regression, provides richer infor-mation about response variables such as stretching or compressing tails4. It aims at esti-mating quantile regression functions. With a quantile parameter 0< τ <1, a quantile regression
functionfρ,τ is defined by its valuefρ,τxto be aτ-quantile ofρ· |x, that is, a valueu∈Y
satisfying
ρy∈Y :y≤u |x≥τ, ρy∈Y :y≥u |x≥1−τ. 1.3 Quantile regression has been studied by kernel-based regularization schemes in a learning theory literaturee.g.,5–8. These regularization schemes take the form
fzQRarg min f∈HK 1 m m i1 ψτ fxi−yi λf2K , 1.4
whereψτ :R → Ris the pinball loss shown inFigure 1defined by
ψτu
1−τu, if u >0,
−τu, if u≤0. 1.5
Motivated by the-insensitive lossψand the pinball lossψ
τ, we propose the
-insen-sitive pinball lossψ
τ :R → Rwith an insensitive parameter≥0 shown inFigure 1defined
as ψτu ⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩ 1−τu−, ifu > , −τu, ifu≤ −, 0, otherwise. 1.6
0 u −τu (1−τ)u ψτ(u) 0 ψɛτ(u) −τ(u+ɛ) (1−τ)(u u −ɛ) −ɛ ɛ Figure 1
This loss function has been applied to online learning for quantile regression in our previous work8. It is applied here to a regularization scheme in the RKHS as
fzfz,λ,τ arg min f∈HK 1 m m i1 ψτfxi−yi λf2K . 1.7
The main goal of this paper is to study how the output functionfzgiven by1.7converges
to the quantile regression functionfρ,τ and how explicit learning rates can be obtained with
suitable choices of the parameters λ m−α, m−β based on a priori conditions on the
probability measureρ.
2. Main Results on Approximation
Throughout the paper, we assume that the conditional distributionρ· | xis supported on
−1,1for everyx∈X. Then, we see from1.3that we can take values offρ,τto be on−1,1.
So to see howfzapproximatesfρ,τ, it is natural to project values of the output functionfz
onto the same interval by the projection operator introduced in9.
Definition 2.1. The projection operatorπon the space of function onXis defined by
πfx ⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩ 1, iffx>1, −1, iffx<−1, fx, if −1≤fx≤1. 2.1
Our approximation analysis aims at establishing bounds for the errorπfz−fρ,τLp∗ ρX
in the spaceLpρ∗Xwith somep∗>0 whereρXis the marginal distribution ofρonX.
2.1. Support Vector Regression and Quantile Regression
Our error bounds and learning rates are presented in terms of a noise condition and approx-imation condition onρ.
Definition 2.2. Letp∈0,∞andq∈1,∞. We say thatρhas aτ-quantile ofp-average type
qif for everyx∈X, there exist aτ-quantilet∗∈Rand constantsax ∈0,2, bx>0 such that
for eachu∈0, ax,
ρy∈t∗−u, t∗ |x≥bxuq−1, ρ
y∈t∗, t∗u |x≥bxuq−1, 2.2
and that the function onXtaking valuebxaqx−1−1atx∈Xlies inLpρX.
Note that condition2.2tells us thatfρ,τx t∗is uniquely defined at everyx∈X.
The approximation condition on ρ is stated in terms of the integral operator LK :
L2
ρX → L
2
ρX defined byLKfx
XKx, ufudρXu. SinceK is positive semidefinite,
LK is a compact positive operator and itsr-th powerLrKis well-defined for anyr > 0. Our
approximation condition is given as
fρ,τ LrK gρ,τ for some 0< r ≤ 1 2, gρ,τ ∈L 2 ρX. 2.3
Let us illustrate our approximation analysis by the following special case which will be proved inSection 5.
Theorem 2.3. LetX ⊂ Rn andK ∈C∞X×X. Assumef
ρ,τ ∈ HKandρhas aτ-quantile ofp
-average type 2 for somep∈0,∞. Takeλm−p1/p2andm−βwithp1/p2≤β≤ ∞.
Let 0< η <p1/2p2. Then, withp∗ 2p/p1 >0, for any 0< δ <1, with confidence 1−δ, one has πfz −fρ,τ LpρX∗ ≤ Clog3 δm η−p1/2p2, 2.4
whereCis a constant independent ofmorδ.
Ifp≥1/2η−2 for 0< η <1/4, we see that the power exponent for the learning rate
2.4is at least1/2−2η. This exponent can be arbitrarily close to 1/2 whenηis small enough. In particular, if we takeτ 1/2,Theorem 2.3provides rates for output functionfzSVR
of the support vector regression1.2to approximate the median functionfρ,1/2.
If we takeβ∞leading to0,Theorem 2.3provides rates for output functionfzQR
of the quantile regression algorithm1.4to approximate the quantile regression functionfρ,τ.
2.2. General Approximation Analysis
To state our approximation analysis in the general case, we need the capacity of the hypo-thesis space measured by covering numbers.
Definition 2.4. For a subsetSofCXandu > 0, the covering numberNS, uis the minimal integerl∈Nsuch that there existldisks with radiusucoveringS.
The covering numbers of ballsBR {f ∈ HK :fK ≤ R}with radiusR > 0 of the
RKHS have been well studied in the learning theory literature10,11. In this paper, we as-sume for somes >0 andCs>0 that
logNB1, u≤Cs 1 u s , ∀u >0. 2.5
Now we can state our main result which will be proved inSection 5. Forp ∈ 0,∞
andq∈1,∞, we denote θmin 2 q, p p1 ∈0,1. 2.6
Theorem 2.5. Assume2.3with 0 < r ≤1/2 and2.5withs > 0. Suppose thatρhas aτ
-quan-tile ofp-average typeqfor somep ∈0,∞andq∈1,∞. Takeλ m−αwith 0 < α≤1 andα <
2s/s2s−θ. Setm−βwithαr/1−r≤β≤ ∞. Let
0< η < 1s22s−sα2s−θ
s2s−θ2s . 2.7
Then, withp∗pq/p1>0, for any 0< δ <1, with confidence 1−δ, one has
πfz −fρ,τ LpρX∗ ≤ C log3 η 2 log3 δm −ϑ, 2.8
whereCis a constant independent ofmorδand the power indexϑis given in terms ofr, s, p, q, α, andηby ϑ 1 qmin αr 1−r, 1 2s−θ− sα2s−θ−1 2s−θ2s − sη 1s, 1 2s−θ − sα1−2r 1s2−2r, 1 2s−θ− s 1s α 2 − 1 22−θ . 2.9
The indexϑcan be viewed as a function of variablesr, s, p, q, α, η. The restriction 0 < α <2s/s2s−θonαand2.7onηensure thatϑis positive, which verifies the valid learning rate inTheorem 2.5.
Assumption2.5is a measurement of regularity of the kernelKwhenXis a subset of Rn. In particular,scan be arbitrarily small whenKis smooth enough. In this case, the power
indexϑin2.8can be arbitrarily close to1/qmin{αr/1−r,1/2−θ}. Again, whenβ∞,
0, algorithm 1.7corresponds to algorithm1.4 for quantile regression. In this case,
Error analysis has been done for quantile regression algorithm1.4in5,6. Under the assumptions thatρsatisfies2.2with somep∈0,∞andq >1 and the2empirical covering
numberNzB1, η, 2 see5for more detailsofB1is bounded as
sup z∈Zm logNz B1, η, 2 ≤a 1 η s withs∈0,2, a≥1, 2.10
it was proved in5that with confidence 1−δ,
πfzQR −fρ,τ LpρX∗ ≤ ⎧ ⎨ ⎩Dτλ Dτλ λ log3/δ m Ks,Cpa λs/2m p1/p2−s/2 Ks,Cpa λs/2m 5 32Cplog3/δ m p1/p2 145 log3/δ m ⎫ ⎪ ⎪ ⎬ ⎪ ⎪ ⎭ 1/q , 2.11 whereCpandKs,Cpare constants independent ofmorλ. Here,Dτλis the regularization error
defined as Dτλ inf f∈HK Eτ f− Eτ fρ,τ λf2K, 2.12
andEτfis the generalization error associated with the pinball lossψτdefined by
Eτ f Z ψτ fx−ydρ X Y ψτ fx−ydρy|xdρXx. 2.13
Note thatEτfis minimized by the quantile regression functionfρ,τ. Thus, when the
regular-ization errorDτλdecays polynomially asDτλOλr/1−r which is ensured byLemma 2.6
below when2.3is satisfiedandλm−α, thenπfzQR−fρ,τLp∗
ρX Olog3/δm −ϑwith ϑ 1 qmin αr 1−r, p1 p2−s/2 1−αs 2 ,p1 p2 . 2.14
Sincep1/p2 1/2−θ, we see that this learning rate is comparable to our result in
2.3. Comparison with Least Squares Regression
There has been a large literature in learning theorydescribed in12for the least squares algorithms: fzLSarg min f∈HK 1 m m i1 fxi−yi 2 λf2K . 2.15
It aims at learning the regression functionfρx
Yydρy | x. A crucial property for its
error analysis is the identityElsf− Elsfρ f−fρ2L2
ρX for the least squares generalization
errorElsf
Zy−fx
2dρ. It yields a variance-expectation bound Eξ2 ≤ 4Eξfor the
random variableξ y−fx2−y−fρx2onZ, ρwheref :X → Yis an arbitrary
mea-surable function. Such a variance-expectation bound with Eξpossibly replaced by its posi-tive powerEξθplays an essential role for analyzing regularization schemes and the power exponentθdepends on strong convexity of the loss. See13and references therein. However, the pinball loss in the quantile regression setting has no strong convexity6and we would not expect a variance-expectation bound for a general distributionρ. Whenρhas aτ-quantile ofp-average typeq, the following variance-expectation bound withθgiven by2.6can be found in5,7 derived by means ofLemma 3.1below.
Lemma 2.6. Ifρhas aτ-quantile ofp-average typeqfor somep∈0,∞andq∈1,∞, then
Eψτ fx−y−ψτ fρ,τx−y 2≤ Cθ Eτ f− Eτ fρ,τ θ , ∀f:X−→Y, 2.16
where the power indexθis given by2.6and the constantCθisCθ22−θqθγ−1θLp
ρX.
Lemma 2.6overcomes the difficulty of quantile regression caused by lack of strong convexity of the pinball loss. It enables us to derive satisfactory learning rates, as in
Theorem 2.5.
3. Insensitive Relation and Error Decomposition
An important relation for quantile regression observed in5assets that the errorπfz−
fρ,τ taken in a suitable Lp
∗
ρX space can be bounded by the excess generalization error
Eτπfz− Eτfρ,τwhen the noise condition is satisfied.
Lemma 3.1. Letp ∈0,∞andq∈ 1,∞. Denotep∗ pq/p1 >0. Ifρhas aτ-quantile of
p-average typeq, then for any measurable functionfonX, one has
f−fρ,τLp∗ ρX ≤Cq,ρ Eτ f− Eτ fρ,τ 1/q , 3.1 whereCq,ρ21−1/qq1/q{bxaqx−1−1}x∈X 1/q LpρX.
ByLemma 3.1, to estimateπfz−fρ,τLp∗
ρX, we only need to bound the excess
gener-alization errorEτπfz− Eτfρ,τ. This will be done by conducting an error decomposition
which has been developed in the literature for regularization schemes9,13–15. Technical difficulty arises for our problem here because the insensitive parameterchanges withm. This can be overcome16by the following insensitive relation
ψτu−≤ψτu≤ψτu, u∈R. 3.2
Now, we can conduct an error decomposition. Define the empirical errorEz,τffor
f:X → Ras Ez,τ f 1 m m i1 ψτ fxi−yi . 3.3
Lemma 3.2. Letλ >0,fzbe defined by1.7and
fλ0arg min f∈HK Eτ f− Eτ fρ,τ λf2K. 3.4 Then, Eτ πfz − Eτ fρ,τ λfz 2 K≤ S1S2Dτλ, 3.5 where S1 Eτ πfz − Eτ fρ,τ − Ez,τ πfz − Ez,τ fρ,τ , S2 Ez,τ fλ0− Ez,τ fρ,τ − Eτ fλ0− Eτ fρ,τ , 3.6
Proof. The regularized excess generalization errorEτπfz− Eτfρ,τ λfz2Kcan be
ex-pressed as Eτ πfz − Ez,τ πfz ! Ez,τ πfz λfz 2 K " − ! Ez,τ fλ0λfλ02 K " Ez,τ fλ0− Eτ fλ0 Eτ fλ0− Eτ fρ,τ λfλ02 K . 3.7
The fact|y| ≤ 1 implies thatEz,τπfz ≤ Ez,τfz. The insensitive relation3.2and the
definition offztell us that
Ez,τ fz λfz 2 K≤ 1 m m i1 ψτfzxi−yi λfz 2 K ≤ 1 m m i1 ψτfλ0xi−yi λfλ02 K≤ Ez,τ fλ0λfλ02 K. 3.8
Then, by subtracting and addingEτfρ,τandEz,τfρ,τand notingDτλ Eτfλ0−Eτfρ,τ
λfλ02
K, we see that the desired inequality inLemma 3.2holds true.
In the error decomposition3.5, the first two terms are called sample error. The last term is the regularization error defined in2.12. It can be estimated as follows.
Proposition 3.3. Assume2.3. Definefλ0by3.4. Then, one has
Dτλ≤C0λr/1−r, fλ0 K≤ # C0λ2r−1/2−2r, 3.9
whereC0is the constantC0 gρ,τL2
ρX gρ,τ
2
L2
ρX.
Proof. Letμλ1/1−r>0 and
fμ
LKμI
−1
LKfρ,τ. 3.10
It can be found in17,18that when2.3holds, we have
fμ−fρ,τ2L2 ρXμfμ 2 K≤μ 2rg ρ,τ2L2 ρX. 3.11 Hence,fμ−fρ,τL2 ρX ≤μ rg ρ,τL2 ρX andfμ 2 K≤μ2r−1gρ,τ2L2
ρX. Sinceψτis Lipschitz, we know
by takingffμin2.12that Dτλ≤ Z ψτ fμx−y −ψτ fρ,τx−y λfμ2K ≤fμ−fρ,τL1 ρXλ fμ2 K≤fμ−fρ,τL2 ρXλ fμ2 K ≤μrgρ,τL2 ρXλμ 2r−1g ρ,τ2L2 ρX gρ,τL2 ρX gρ,τ2 L2 ρX λr/1−r. 3.12
This verifies the desired bound forDτλ. By takingf 0 in2.12, we have
λfλ02
K≤ Dτλ≤C0λ
r/1−r. 3.13
4. Estimating Sample Error
This section is devoted to estimating the sample error. This is conducted by using the vari-ance-expectation bound inLemma 2.6.
Denoteκsupx∈X#Kx, x. ForR≥1, denote WR z∈Zm:fz
K≤R
. 4.1
Proposition 4.1. Assume2.3and2.5. LetR≥1 and 0< δ <1. Ifρhas aτ-quantile ofp-average
typeqfor somep∈0,∞andq∈1,∞, then there exists a subsetVR ofZmwith measure at most
δsuch that for any z∈ WR\VR,
Eτ πfz − Eτ fρ,τ λfz 2 K≤2C 1log 3 δλ r/1−rmax 1,λ −1/2−2r m C2log3 δm −1/2−θC 3m−1/2s−θRs/1s, 4.2
whereθis given by2.6andC1, C2, C3are constants given by
C14C02κ # C0, C2245324C 1/2−θ θ , C3 406Cs1/1s408CθCs1/2s−θ. 4.3
Proof. Let us first estimate the second partS2of the sample error. It can be decomposed into
two partsS2S2,1S2,2where
S2,1 Ez,τ fλ0− Ez,τ πfλ0 −Eτ fλ0− Eτ πfλ0 , S2,2 Ez,τ πfλ0− Ez,τ fρ,τ − Eτ πfλ0− Eτ fρ,τ . 4.4
For boundingS2,1, we take the random variableξzψτfλ0x−y−ψτπfλ0x−y
onZ, ρ. It satisfies 0≤ξ≤ |πfλ0x−fλ0x| ≤1fλ0∞. Hence,|ξ−Eξ| ≤1fλ0∞
and Eξ−Eξ2 ≤Eξ2 ≤1f0
λ ∞Eξ. Applying the one-side Bernstein inequality12,
we know that there exists a subsetZ1,δofZmwith measure at least 1−δ/3such that
S2,1≤ 71fλ0 ∞ log3/δ 6m Eτ fλ0− Eτ πfλ0, ∀z∈Z1,δ. 4.5
For S2,2, we apply the one-side Bernstein inequality again to the random variable
ξz ψτπfλ0x−y−ψτfρ,τx−y, bound the variance byLemma 2.6withfπfλ0,
and find that there exists another subsetZ2,δofZmwith measure at least 1−δ/3such that
S2,2≤ 4 log3/δ 3m θ 2 θ/4−2θ 1−θ 2 2C θlog3/δ m 1/2−θ Eτ πfλ0− Eτ fρ,τ , ∀z∈Z2,δ. 4.6
Next, we estimate the first partS1of the sample error. Consider the function set
Gψτ πfx−y−ψτ fρ,τx−y :fK≤R . 4.7
A function from this setgz ψτπfx−y−ψτfρ,τx−ysatisfies Eg≥0,|gz| ≤2,
and Eg2 ≤ C
θEgθ by 2.16. Also, the Lipschitz property of the pinball loss yields
NG, u≤ NB1, u/R. Then, we apply a standard covering number argument with a ratio
inequality12,13,19,20toGand find from the covering number condition2.5that
Probz∈Zm ⎧ ⎪ ⎨ ⎪ ⎩fsupK≤R $ Eτ πf− Eτ fρ,τ % −$Ez,τ πf− Ez,τ fρ,τ % & Eτ πf− Eτ fρ,τ θ uθ ≤4u1−θ/2 ⎫ ⎪ ⎬ ⎪ ⎭ ≥1− NB1, u R exp − mu2−θ 2Cθ 4/3u1−θ ≥1−exp Cs R u s − mu2−θ 2Cθ 4/3u1−θ . 4.8
Setting the confidence to be 1−δ/3, we takeu∗R, m, δ/3to be the positive solution to the equation Cs R u s − mu2−θ 2Cθ 4/3u1−θ logδ 3. 4.9
Then, there exists a third subsetZ3,δ ofZmwith measure at least 1−δ/3such that
sup fK≤R $ Eτ πf−Eτ fρ,τ % −$Ez,τ πf−Ez,τ fρ,τ % & Eτ πf−Eτ fρ,τ θ u∗R, m, δ/3θ ≤4 u∗ R, m,δ 3 1−θ/2 , ∀z∈Z3,δ. 4.10
Thus, forz∈ WR∩Z3,δ, we have S1 Eτ πfz − Eτ fρ,τ − Ez,τ πfz − Ez,τ fρ,τ ≤4 ! u∗ R, m,δ 3 "1−θ/2 Eτ πfz − Eτ fρ,τ θ ! u∗ R, m,δ 3 "θ ≤ 1−θ 2 42/2−θu∗ R, m,δ 3 θ 2 Eτ πfz − Eτ fρ,τ 4u∗ R, m,δ 3 . 4.11
Here, we have used the elementary inequality√ab ≤ √a√band Young’s inequality. Putting this bound and4.5,4.6into3.5, we know that for z∈ WR∩Z3,δ∩Z1,δ∩Z2,δ,
there holds Eτ πfz− Eτ fρ,τ λfz 2 K≤ 1 2 Eτ πfz − Eτ fρ,τ 2Dτλ 20u∗ R, m,δ 3 7 1fλ0 ∞ log3/δ 6m 4 log3/δ 3m 2C θlog3/δ m 1/2−θ , 4.12 which together withProposition 3.3implies
Eτ πfz− Eτ fρ,τ λfz 2 K≤24C0λ r/1−r522C θ1/2−θ log3 δm −1/2−θ 40u∗ R, m,δ 3 2κ#C0log 3 δ λ2r−1/2−2r m . 4.13 Here, we have used the reproducing property inHKwhich yields12
f∞≤κfK, ∀f∈ HK. 4.14
Equation4.9can be expressed as
u2s−θ− 4 3mlog 3 δu 1s−θ−2Cθ m log 3 δu s− 4CsRs 3m u 1−θ−2CθCsRs m 0. 4.15
By Lemma 7.2 in12, the positive solutionu∗R, m, δ/3to this equation can be bounded as
u∗R, m, δ/3≤max 6 mlog 3 δ, 8Cθ m log 3 δ 1/2−θ , 6CsRs m 1/1s , 8CθCsRs m 1/2s−θ ≤6 8Cθ1/2−θ log3 δm −1/2−θ6C s1/1s 8CθCs1/2s−θ ×m−1/2s−θRs/1s. 4.16
Thus, for z∈ WR∩Z3,δ∩Z1,δ∩Z2,δ, the desired bound4.2holds true. Since the measure
of the setZ3,δ∩Z1,δ∩Z2,δis at least 1−δ, our conclusion is proved.
5. Deriving Convergence Rates by Iteration
To applyProposition 4.1 for error analysis, we need someR ≥ 1 for z ∈ WR. One may chooseRλ−1/2according to
fz
K≤λ
−1/2, ∀z∈Zm 5.1
which is seen by takingf 0 in1.7. This choice is too rough. Recall fromProposition 3.3
thatfλ0K≤
#
C0λ2r−1/2−2rwhich is a bound for the noise-free limitfλ0offz. It is much
better thanλ−1/2. This motivates us to try similar tight bounds forf
z . This target will be
achieved in this section by applyingProposition 4.1iteratively. The iteration technique has been used in13,21to improve learning rates.
Lemma 5.1. Assume2.3with 0< r ≤1/2 and2.5withs >0. Takeλm−αwith 0< α≤1 and
m−βwith 0< β≤ ∞. Let 0< η <1. Ifρhas aτ-quantile ofp-average typeqfor somep∈0,∞
andq∈1,∞, then for any 0< δ <1, with confidence 1−δ, there holds
fz K≤4C 3 1√2 & C1 & C2 log3 η 2 log3 δm θη, 5.2 whereθηis given by θηmax α−β 2 , α1−2r 2−2r , α 2 − 1 22−θ, α2s−θ−11s 2s−θ2s η ≥0. 5.3
Proof. Puttingλ m−αwith 0 < α≤ 1 and m−βwith 0< β ≤ ∞intoProposition 4.1, we
know that for anyR≥1 there exists a subsetVRofZmwith measure at mostδsuch that
fz
K≤amR
s/22sb
m, ∀z∈ WR\VR, 5.4
where withζ: max{α−β/2, α1−2r/2−2r, α/2−1/22−θ} ≥ 0, the constants are given by am & C3mα/2−1/22s−θ, bm ⎛ ⎝√2 C1log3 δ C2log3 δ ⎞ ⎠mζ. 5.5 It follows that WR⊆ WamRs/22sbm ∪VR. 5.6
Let us apply5.6iteratively to a sequence{Rj}J
j0defined byR0λ−1/2andRj
amRj−1s/22sbmwhereJ ∈Nwill be determined later. Then,WRj−1⊆ WRj∪VRj−1. By5.1,WR0 Zm. So we have ZmWR0⊆ WR1∪VR0 ⊆ · · · ⊆ W RJ∪∪jJ0−1VRj . 5.7 As the measure ofVRj is at most δ, we know that the measure of∪Jj0−1VRj is at most Jδ. Hence,WRJhas measure at least 1−Jδ.
DenoteΔ s/22s<1/2. The definition of the sequence{Rj}J
j0tells us that RJam1ΔΔ2···ΔJ−1R0Δ J J−1 j1 am1ΔΔ2···Δj−1bmΔjbm. 5.8
Let us bound the two terms on the right-hand side. The first term equals
C31−ΔJ/21−Δmα2s−θ−1/42s−2θ1−ΔJ/1−Δmα/2ΔJ, 5.9
which is bounded by
C3mα2s−θ−1/42s−2θ1−Δmα/2−α2s−θ−1/42s−2θ1−ΔΔJ
≤C3mα2s−θ−11s/2s−θ2sm1/2s−θ2−J. 5.10
TakeJto be the smallest integer greater than or equal to log1/η/log 2. The above expres-sion can be bounded byC3mα2s−θ−11s/2s−θ2sη.
The second terms equals
J−1 j1 a1ΔΔm 2···Δj−1bmΔjbm≤ J−1 j1 C3mα/2−1/22s−θ1−Δj/1−ΔbΔ1jmζΔjb1mζ, 5.11 whereb1 : √
2&C1log3/δ &C2log3/δ. It is bounded by
mα2s−θ−11s/2s−θ2sC3b1
J−1
j0
mζ−α2s−θ−11s/2s−θ2ssj/22sj. 5.12
When ζ ≤ α2s−θ−11s/2s−θ2s, the above expression is bounded by
C3b1Jmα2s−θ−11s/2s−θ2s. When ζ ≥ α2s−θ−11s/2s−θ2s, it is
Based on the above discussion, we obtain
RJ≤C3C3b1J
mθη, 5.13
whereθηmax{α2s−θ−11s/2s−θ2s η, ζ}. So with confidence 1−Jδ,
there holds fz K≤R J≤C 3 1√2 & C1 & C2 J log3 δm θη. 5.14
Then, our conclusion follows by replacingδbyδ/Jand notingJ≤2 log3/η. Now, we can prove our main result,Theorem 2.5.
Proof ofTheorem 2.5. TakeRto be the right side of5.2. ByLemma 5.1, there exists a subset
VR ofZmwith measure at mostδsuch thatZm\V
R⊆ WR. ApplyingProposition 4.1to this
R, we know that there exists another subsetVR ofZmwith measure at mostδsuch that for
any z∈ WR\VR, Eτ πfz − Eτ fρ,τ ≤2m−βC1log3 δm −αr/1−rC 2log 3 δm −1/2−θ C4 log3 η 2 log3 δm s/1sθη−1/2s−θ, 5.15 where C4 C34C3s/1s 1√2 & C1 & C2 . 5.16
Since the setVR∪VR has measure at most 2δ, after scaling 2δtoδand setting the constantC
by
C2Cq,ρ
2C1C2 C4, 5.17
we see that the above estimate together withLemma 3.1gives the error bound
πfz −fρ,τ LpρX∗ ≤C log3 η 2 log3 δm −ϑ 5.18
with confidence 1−δand the power indexϑgive by
ϑ 1 qmin β, αr 1−r, 1 2s−θ − s 1sθη , 5.19
provided that
θη<
1s
s2s−θ. 5.20
Sinceβ≥αr/1−r, we know thatα−β/2≤α1−2r/2−2r. By the restriction 0< α <
2s/s2s−θonα, we findα1−2r/2−2r<1s/s2s−θandα/2−1/22−θ<
1s/s2s−θ. Moreover, restriction2.7onηtell us thatα2s−θ−11s/2
s−θ2s η <1s/s2s−θ. Therefore, condition5.20is satisfied. The restriction
β≥αr/1−rand the above expression forϑtells us that the power index for the error bound can be exactly expressed by formula2.9. The proof ofTheorem 2.5is complete.
Finally, we proveTheorem 2.3.
Proof ofTheorem 2.3. Sincefρ,τ ∈ HK, we know that2.3holds withr1/2. The noise
condi-tion onρis satisfied withq2 andp∈0,∞. Then,θp/p1∈0,1. SinceX ⊂Rnand
K ∈ C∞X×X, we know from11that2.5holds true for anys > 0. With 0< η <p
1/2p2, let us choosesto be a positive number satisfying the following four inequalities:
p1 p2 < 2s s2s−θ, 1 3 < 1s$22s−s2s−θp1/p2% s2s−θ2s , p1 p2 −2η≤ 1 2s−θ − s$2s−θp1/p2−1% 2s−θ2s − s 31s, p1 p2 −2η≤ 1 2s−θ. 5.21
The first inequality above tells us that the restrictions onα, βare satisfied by choosingα
p1/p2 1/2−θandp1/p2≤ β ≤ ∞. The second inequality shows that condition2.7for the parameterηrenamed now asη∗is also satisfied by takingη∗ 1/3. Thus, we applyTheorem 2.5and know that with confidence 1−δ,2.8holds with the power indexϑgiven by2.9; butr 1/2, α1/2−θ, andη∗1/3 imply that
ϑ 1 2min α, 1 2s−θ, 1 2s−θ − sα2s−θ−1 2s−θ2s − s 31s . 5.22 The last two inequalities satisfied bysyieldϑ ≥α/2−η. So2.8verifies2.4. This com-pletes the proof of the theorem.
Acknowledgment
The work described in this paper is supported by National Science Foundation of China un-der Grant 11001247 and by a grant from the Research Grants Council of Hong KongProject no. CityU 103709.
References
1 V. N. Vapnik, Statistical Learning Theory, Adaptive and Learning Systems for Signal Processing, Com-munications, and Control, John Wiley & Sons, New York, NY, USA, 1998.
2 N. Aronszajn, “Theory of reproducing kernels,” Transactions of the American Mathematical Society, vol. 68, pp. 337–404, 1950.
3 H. Tong, D.-R. Chen, and L. Peng, “Analysis of support vector machines regression,” Foundations of
Computational Mathematics, vol. 9, no. 2, pp. 243–257, 2009.
4 R. Koenker, Quantile Regression, vol. 38 of Econometric Society Monographs, Cambridge University Press, Cambridge, UK, 2005.
5 I. Steinwart and A. Christman, “How SVMs can estimate quantile and the median,” in Advances in
Neural Information Processing Systems, vol. 20, pp. 305–312, MIT Press, Cambridge, Mass, USA, 2008.
6 I. Steinwart and A. Christmann, “Estimating conditional quantiles with the help of the pinball loss,”
Bernoulli, vol. 17, no. 1, pp. 211–225, 2011.
7 D. H. Xiang, “Conditional quantiles with varying Gaussians,” online first in Advances in Computational
Mathematics.
8 T. Hu, D. H. Xiang, and D. X. Zhou, “Online learning for quantile regression and support vector regression,” Submitted.
9 D.-R. Chen, Q. Wu, Y. Ying, and D.-X. Zhou, “Support vector machine soft margin classifiers: error analysis,” Journal of Machine Learning Research, vol. 5, pp. 1143–1175, 2004.
10 D.-X. Zhou, “The covering number in learning theory,” Journal of Complexity, vol. 18, no. 3, pp. 739– 767, 2002.
11 D.-X. Zhou, “Capacity of reproducing kernel spaces in learning theory,” IEEE Transactions on
Infor-mation Theory, vol. 49, no. 7, pp. 1743–1752, 2003.
12 F. Cucker and D.-X. Zhou, Learning Theory: An Approximation Theory Viewpoint, Cambridge Mono-graphs on Applied and Computational Mathematics, Cambridge University Press, Cambridge, UK, 2007.
13 Q. Wu, Y. Ying, and D.-X. Zhou, “Learning rates of least-square regularized regression,” Foundations
of Computational Mathematics, vol. 6, no. 2, pp. 171–192, 2006.
14 Q. Wu and D.-X. Zhou, “Analysis of support vector machine classification,” Journal of Computational
Analysis and Applications, vol. 8, no. 2, pp. 99–119, 2006.
15 T. Hu, “Online regression with varying Gaussians and non-identical distributions,” Analysis and
Ap-plications, vol. 9, no. 4, pp. 395–408, 2011.
16 D.-H. Xiang, T. Hu, and D.-X. Zhou, “Learning with varying insensitive loss,” Applied Mathematics
Letters, vol. 24, no. 12, pp. 2107–2109, 2011.
17 S. Smale and D.-X. Zhou, “Learning theory estimates via integral operators and their approxima-tions,” Constructive Approximation, vol. 26, no. 2, pp. 153–172, 2007.
18 S. Smale and D.-X. Zhou, “Online learning with Markov sampling,” Analysis and Applications, vol. 7, no. 1, pp. 87–113, 2009.
19 Y. Yao, “On complexity issues of online learning algorithms,” IEEE Transactions on Information Theory, vol. 56, no. 12, pp. 6470–6481, 2010.
20 Y. Ying, “Convergence analysis of online algorithms,” Advances in Computational Mathematics, vol. 27, no. 3, pp. 273–291, 2007.
21 I. Steinwart and C. Scovel, “Fast rates for support vector machines using Gaussian kernels,” The
Submit your manuscripts at
http://www.hindawi.com
Hindawi Publishing Corporation
http://www.hindawi.com Volume 2014
Mathematics
Journal ofHindawi Publishing Corporation
http://www.hindawi.com Volume 2014 Mathematical Problems in Engineering
Hindawi Publishing Corporation http://www.hindawi.com
Differential Equations
International Journal of
Volume 2014
Applied Mathematics Hindawi Publishing Corporation
http://www.hindawi.com Volume 2014
Probability and Statistics Hindawi Publishing Corporation
http://www.hindawi.com Volume 2014
Hindawi Publishing Corporation
http://www.hindawi.com Volume 2014
Mathematical PhysicsAdvances in
Complex Analysis
Journal ofHindawi Publishing Corporation
http://www.hindawi.com Volume 2014
Optimization
Journal of Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014
Combinatorics
Hindawi Publishing Corporation
http://www.hindawi.com Volume 2014
International Journal of Hindawi Publishing Corporation
http://www.hindawi.com Volume 2014
Operations Research
Journal of
Hindawi Publishing Corporation
http://www.hindawi.com Volume 2014
Function Spaces
Abstract and Applied Analysis Hindawi Publishing Corporation
http://www.hindawi.com Volume 2014 International Journal of Mathematics and Mathematical Sciences
Hindawi Publishing Corporation http://www.hindawi.com Volume 2014
The Scientific
World Journal
Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014
Hindawi Publishing Corporation
http://www.hindawi.com Volume 2014
Algebra
Discrete Dynamics in Nature and Society
Hindawi Publishing Corporation
http://www.hindawi.com Volume 2014 Hindawi Publishing Corporation
http://www.hindawi.com Volume 2014
Decision Sciences
Discrete Mathematics
Journal ofHindawi Publishing Corporation
http://www.hindawi.com Volume 2014 Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014