• No results found

Convergence rate for the moving least squares learning with dependent sampling

N/A
N/A
Protected

Academic year: 2020

Share "Convergence rate for the moving least squares learning with dependent sampling"

Copied!
13
0
0

Loading.... (view fulltext now)

Full text

(1)

R E S E A R C H

Open Access

Convergence rate for the moving

least-squares learning with dependent

sampling

Qin Guo

1*

and Peixin Ye

1

*Correspondence:

[email protected];

[email protected]

1School of Mathematical Sciences

and LPMC, Nankai University, Tianjin, China

Abstract

We consider the moving least-squares (MLS) method by the regression learning framework under the assumption that the sampling process satisfies the

α

-mixing condition. We conduct the rigorous error analysis by using the probability inequalities for the dependent samples in the error estimates. When the dependent samples satisfy an exponential

α

-mixing, we derive the satisfactory learning rate and error bound of the algorithm.

MSC: 68T05; 68Q32; 60E15

Keywords: Moving least-squares; Regression function; Mixing sequence; Probability inequality; Error bound

1 Introduction

The least-squares (LS) method is an important global approximate method based on the regular or concentrated data sample points. However, there are still some irregular or scat-tered samples which are obtained in many practical applications such as engineering and machine learning [1–4]. They also need to be analyzed to achieve their special useful-ness. The moving least-squares (MLS) method was introduced by McLain in [4] to draw a set of contours based on a cluster of scattered data sample points. It turns out that the MLS method is a useful local approximation tool in various fields of mathematics such as approximation theory, data smoothing [5], statistics [6], and numerical analysis [7]. Re-cently, research effort has been made to study the regression learning algorithm by the MLS method, see [8–12]. The main advantage of the MLS regression learning algorithm is that we can learn the regression function in the simple function space, usually generated by polynomials.

We recall the regression learning problem by the MLS method briefly. Functions for learning are defined on a compact metric spaceX(input space) and take values inY= R(output space). The sampling process is controlled by an unknown Borel probability measureρonZ=X×Y. We define the regression function as follows:

(x) =

Y

y dρ(y|x),

(2)

whereρ(·|x) is the conditional probability measure induced byρonY givenxX. The goal of regression learning is to find a good approximation of the regression function

based on a set of random samples z ={zi}mi=1={(xi,yi)}mi=1∈Zm drawn according to the

measureρ.

We define the approximationfzofpointwisely:

fz(x) :=fz,σ,x(x) =arg min

f∈HEz,x(f), xX, (1.1)

the local moving empirical error is defined by

Ez,x(f) =

1

m m

i=1

x σ,

xi σ

f(xi) –yi 2

, (1.2)

where the hypothesis spaceHC(X) is ad˜-dimensional Lipschitz function space,σ=

σ(m) > 0 is a window width, and:Rn×RnR+ is called an MLS weight function

which satisfies the conditions as follows, see [9,10]:

(1)

Rn

(x,t)dt= 1, ∀x,t∈Rn, (1.3)

(2) (x,t)≥cq, ∀|xt| ≤1, (1.4)

(3) (x,t)≤ c˜q

(1 +|xt|)q, ∀x,t∈R

n, (1.5)

where the constantsq>n+ 1,cq,c˜q> 0.

The task of the paper is to derive the error bound offzfρρXwith the normf(·)ρX:=

( X|f(·)|2dρ X)

1

2 to evaluate the approximation ability offz, see [13–22]. The error analysis

of algorithm (1.1) for the independent and identical (i.i.d.) samples has been carried out in [8–10]. However, the samples are not independent but are not far from being indepen-dent in some real data analysis such as market prediction, system diagnosis, and speech recognition. The mixing conditions can quantify how close to independence a sequence of random samples is. In [14,16,23–25], the authors carried out the regression estimation of the least squares algorithm with theα-mixing samples. Up to now there has been no result of algorithm (1.1) obtained in the case of dependent samples. Hence we extend the anal-ysis of algorithm (1.1) to theα-mixing sampling setting which is quite easy to establish, see [26].

Definition 1.1 LetMbadenote theσ-algebras of events generated by the random samples

{zi= (xi,yi)}bi=a.{zi}i≥1is said to satisfy a strongly mixing condition (orα-mixing condition)

if

αi=sup k≥1

sup

A∈Mk1,B∈M∞k+i

P(AB) –P(A)P(B)−→0, asi→ ∞. (1.6)

Specifically, if there exist some positive constantsα> 0,β> 0, andc> 0 such that

αiαexp–ciβ, ∀i≥1, (1.7)

(3)

Our goal is to obtain the convergence rate asm→ ∞of algorithm (1.1) under hypothesis (1.7). The rest of the paper is organized as follows. In Sect.2, we review some concepts and state our main results and the error decomposition. In Sect.3, we present the estimate of the sample error. In Sect.4, we provide the proofs of the main results.

2 Main results and error decomposition

Before giving the main results, we firstly need to provide some concepts that will be re-ferred to throughout this paper, see [8–10].

Definition 2.1 The probability measureρX onXis said to satisfy the condition with

exponentτ> 0 if

ρX

B(x,r)≥cτrτ, ∀0 <rr0,xX, (2.1)

where the constantsr0> 0,> 0, andB(x,r) ={uX:|ux| ≤r, forr> 0}.

Definition 2.2 We say that the hypothesis spaceHsatisfies the norming condition with exponentζ > 0 andd∈Nif we can find points{ui}id=1⊂B(x,σ) for everyxXand 0 < σσ0satisfying|uiuj| ≥2cHσfori=jand

d

i=1 f(ui)

2 1

2

cHσζfC(X), ∀fH, (2.2)

where the constantsσ0> 0,cH> 0 anddis chosen as at least the dimensiond˜ofH.

Here we assume|y| ≤Malmost surely, and all the constants such asC,CH,ζ,,ζ,CH,ρX,

CH ,ρ

X, and so on are independent of the key parametersδ,m, orσ in this paper. Now we

give our main results of algorithm (1.1).

Theorem 2.1 Assume that(1.7), (2.1),and(2.2)hold.Suppose0 <p< 2,σ= (m(α))γ with

m(α)=m{8m c }1/(1+

β)–1,γ > 0,and0 <σmin{σ

0, 1, (r0/CH,ζ)1/max{ζ,1}}.If m satisfies

m(α)1–γ τmax{ζ,1}

≥256c– 1 p

p /3 +,ζ

log2 + 8e–2α/δ1+ 1 p

+,ζγlogm(α), (2.3)

then for any0 <δ< 1,with confidence1 –δ,we have

fz2ρXC

m(α)4γ(ζ+τ2max{ζ,1})–p1+1. (2.4)

Then we can obtain the explicit learning rate of algorithm (1.1) with selecting the suit-able parameterσ=σ(m).

Theorem 2.2 Under the assumptions of Theorem2.1,if we chooseσ= (m(α))–(4ς+2maxε {τ,τ ς}),

0 <ε< 1/4,and

m(α)≥C1log2 + 8e–2α/δ1+ 1

p+logm(α)2+σ–(4ς+2max{τ,τ ς})/ε

(4)

then with confidence1 –δ,we have

fzfρρXC2

m(α)ε

1

2, (2.6)

where

C1=

,ζ +

256 3 c

–1p p

2

1 + 1

4ς+ 2max{τ,τ ζ} 2

. (2.7)

Remark2.1 The result of the above theorem shows that the learning rate tends tom–12

whenσ→1. For the i.i.d. case, the same rate has been obtained in [9,10].

To estimate the quantity of the total errorfzfρρX, we use the proposition from [8]

below.

Proposition 2.1 Assume(2.1)and(2.2)hold.Then we have

fzfH2ρXCHσ

–2ζτmax{ζ,1}

X

Ex(fz,σ,x) –Ex(fH,σ,x)

dρX(x), (2.8)

where

Ex(f) =

Z

x σ,

u σ

f(u) –y2(u,y), ∀f:X→R (2.9)

is called the local moving expected risk and

fH(x) :=fH,σ,x=arg min

f∈HEx(f), xX, (2.10)

is called the target function.

Remark2.2 Here we assumeH. It follows from

Ex(f) =

X

x σ,

u σ

f(u) –(u)

2

dρX(u) +Ex(), ∀f :X→R, (2.11)

thatfH=. ThusfzfρρX=fzfHρX.

Next we only need to provide the upper bound of the integral in (2.8). So to do this, we give its decomposition as follows:

X

Ex(fz,σ,x) –Ex(fH,σ,x)

dρX(x)≤

X

Ex(fz,σ,x) –Ex(fH,σ,x)

Ez,x(fz,σ,x) –Ez,x(fH,σ,x)

dρX(x)

:=S(z,σ). (2.12)

(5)

3 Estimates for the sample error

In order to obtain the probability estimate ofS(z,σ), we shall use the upper bound for

fz,σ,xandfH,σ,x. We firstly derive the confidence-based estimate offz,σ,xas follows. Proposition 3.1 Under the assumptions of Theorem2.1,if

m(α)≥–,ζlog(δσ)στmax{ζ,1}, (3.1)

then with confidence at least1 –δ,we have

fz,σ,xC(X)≤

23+τ+ζM

c

τcqCHτ/2,ζcH

σζ–max{τ2, τ ζ

2}:=CH,ρ

ζ–max{τ2,τ ζ2}, xX, (3.2)

where

CH,ζ=min

cH 2ζ+1√dC

H,0

,cH 2 ,

1 2

,

,ζ = 2τ+1

cτCHτ ,ζ– 2 τ+1–1

7 6+

7 6log

1 + 4e–2α+7n 6 log

1 + 4BX

CH,ζ

.

The proof is analogous to that of Theorem 3 in [8] except that we need to use the fol-lowing Lemma3.1for the dependent sampling setting to replace Lemma 2 in [8].

Lemma 3.1 Let0 <rr0and0 <δ< 1.If(1.7)and(2.1)hold,then with confidence1 –δ,

we have

(xB(x,r))

m

r

2

τ

+7logδ– 7log(1 + 4e

–2α) – 7nlog(4BX r + 1)

6m(α)

– 1, ∀xX. (3.3)

Specifically,if

m(α)>[7log(

1

δ) + 7log(1 + 4e

–2α) + 7nlog(4BX r + 1)]

6(2cτrτ+1τ – 1)

, (3.4)

then with confidence at least1 –δ,we have

(xB(x,r))

m

2τ+1r

τ

, ∀xX, (3.5)

where(xmB(x,r))is the proportion of those sampling points lying in B(x,r).

Proof It is shown in Theorem 5.3 of [27] that one can find {vj}Nj=1 ⊆X satisfying XBR(Rn)⊆

N

j=1B(vj,2r) andN ≤(4rR + 1)n. Letξ(j):X→Rbe the characteristic function

of the setB(vj,2r). Its meanμ(j)= Xξ(j)(x)dρX=ρX(B(vj,2r)) satisfies|ξ(j)–μ(j)| ≤1 and σ2(ξ(j))≤1. Now we use the Bernstein inequality for the dependent samples in [28].

(6)

with meanμ= Zξ(z)dρand varianceσ2.Assume that|ξ

iμ| ≤D almost surely.Then, for everyε> 0,

P 1 m m i=1

[ξiμ] >ε

≤1 + 4e–2αexp

m

(α)ε2

2(σ2+1 3)

. (3.6)

Then it follows from the above proposition that

P 1 m m i=1

ξi(j)–μ(j)≤–ε

≤1 + 4e–2αexp

m

(α)ε2

2 +2 3ε

, ∀ε> 0, (3.7)

hence,

P

min

1≤j≤N 1 m m i=1

ξi(j)–μ(j)

≤–ε

N1 + 4e–2αexp

m

(α)ε2

2 +23ε

. (3.8)

For 0 <δ< 1, let

N1 + 4e–2αexp

m

(α)ε2

2 +23ε

=δ. (3.9)

Then we get

ε=

2 3logN

(1+4e–2α)

δ +

(23logN(1+4δe–2α))2+ 8m(α)logN(1+4e–2α)

δ

2m(α)

≤2logN

(1+4e–2α)

δ

3m(α) +

2logN(1+4δe–2α)

m(α)

≤7logN

(1+4e–2α)

δ

6m(α) + 1. (3.10)

It follows that, with confidence at least 1 –δ,

min

1≤j≤N 1 m m i=1

ξi(j)–μ(j)

> –7log

N(1+4e–2α)

δ

6m(α) – 1. (3.11)

Hence, we have

1 m m i=1

ξ(j)–μ(j)> –7log

N(1+4e–2α)

δ

6m(α) – 1, ∀j= 1, . . . ,N. (3.12)

Condition (2.1) yieldsμ(j)≥(r2)τ. Alsoξ(j)(xi) = 1 ifxiB(vj,r2) and 0 otherwise. So that 1

m

m

i=1ξ(j)(xi) =(xB(vj,r2))/m. Hence,

xB

vj, r

2

m>

r

2

τ

–7log

N(1+4e–2α)

δ

(7)

Observe fromXNj=1B(vj,2r) that for eachxX, there exists somej∈1, . . . ,N such

thatxB(vj,r2), i.e.,|vjx| ≤2r. SincexiB(vj,2r) implies|xix| ≤ |xivj|+|vjx| ≤r,

we see that

xB(x,r)/m

xB

vj, r

2

m

r

2

τ

–7log

N(1+4e–2α)

δ

6m(α) – 1. (3.14)

This proves Lemma3.1.

Now we are in a position to prove Proposition3.1.

Proof of Proposition3.1 By (3.1) and settingr=CH,ζσmax{ζ,1}≤r0, it is easy to see that

(3.4) holds. Then (3.5) is valid.

It follows from (3.5) and Definition2.2withσreplaced byσ2 that

mi/m=

xB(ui,r)

/m>cτrτ/2τ+1, (3.15)

and

|xi,lui| ≤r, (3.16)

where{xi,l}ml=1i are the points of the set(xB(ui,r)), which implies

|xi,lx| ≤ |xi,lui|+|uix| ≤r+ σ

2 ≤σ, (3.17)

wherexX,l= 1, . . . ,m˜, andm˜ =min1id{mi}.

Then, by (1.4), we have

x σ,

xi,l σ

cq. (3.18)

Hence

1

m m

i=1

x σ,

xj σ

fz,σ,x(xj)

2

≥ 1

m d

i=1 m

l=1

x σ,

xi,l σ

fz,σ,x(xi,l) 2

≥ 1

m d

i=1 m

l=1 cq

fz,σ,x(xi,l) 2

2τ+1r

τ

cq

cHσζ

2ζ+1 2

fz,σ,x2C(X). (3.19)

The last inequality has been proved in Theorem 3 in [8]. Finally, combining (3.19) with the following inequality

1

m m

i=1

x σ,

xi σ

fz,σ,x(xi)

2

≤ 2

m m

i=1

x σ,

xi σ

(0 –yi)2+y2i

≤4M2, (3.20)

(8)

We also need to invoke Lemma 4 in [8] which provides the result about the upper bound offH,σ,x.

Proposition 3.3 Assume that(2.1)and(2.2)hold.Then,for some constant CH ,ρ

X

inde-pendent ofσ,we have

fH,σ,xC(X)≤CH ,ρXσ

ζ–max{τ

2,τ ζ2},xX, 0 <σmin{σ0, 1}. (3.21)

Next we will bound the sample error. The estimation forS(z,σ) relies on the ratio prob-ability inequality below that can be found in [27].

Proposition 3.4 Suppose that(1.7)holds.LetGbe a set of functions on Z and c> 0such that,for each gG,μ(g) = Zg(z)≥0,μ(g2)≤(g),and|g(z) –μ(g)| ≤D almost surely.Then,for everyε> 0and0 <α≤1,we have

P

sup

g∈G

μ(g) –m1mi=1g(zi)

ε+μ(g) ≥4α

ε

≤1 + 4e–2αN(G,αε)

×exp

α

2m(α)ε

2c+23D

. (3.22)

We obtain the upper bound estimate forS(z,σ) by using Proposition3.4.

Proposition 3.5 If the assumptions of Proposition3.1hold,

R=maxCH,ρX,C

H,ρX,M

σζτ2max{1,ζ}), (3.23) and

m(α)≥256

3 c

–1p p

log2 + 8e

–2α

δ

1+1 p

, (3.24)

then with confidence1 –δ,there holds

X

Ex(fz,σ,x) –Ex(fH,σ,x)

Ez,x(fz,σ,x) –Ez,x(fH,σ,x)

dρX(x)

≤16R2Dσn

256cp

3m(α) 1/(1+p)

+1 2

X

Ex(fz,σ,x) –Ex(fH,σ,x)

dρX(x). (3.25)

Proof Let the functiong(u,y) be defined on the function set

GR=

X

x σ,

u σ

f(u) –y2–fH,σ,x(u) –y 2

dρX(x) :

fBR:={fH:fC(X)≤R}

. (3.26)

With condition (1.5) and the boundof the density function ofρX, we have

X

x σ,

u σ

dρX(x)≤cρcq

Rn σn

(1 +|u|)qdu

2πn/2c

ρc˜q

(qn)(n2)σ

(9)

which implies

g(u,y)≤2(R+M)2Dσn≤8R2Dσn:=cR. (3.28)

Hence|g(u,y) –μ(g)| ≤2cR.

It follows from the Schwarz inequality that

g(u,y)2=

X x σ, u σ

f(u) –fH,σ,x(u)

f(u) +fH,σ,x(u) – 2y

dρX(x) 2 ≤ X x σ, u σ

f(u) –fH,σ,x(u) 2

(2R+ 2M)2dρX(x)

× X x σ, u σ

dρX(x). (3.29)

By (3.27),

μg2≤16R2Dσn X X x σ, u σ

f(u) –fH,σ,x(u) 2

dρX(u)

dρX(x). (3.30)

It has been proved in [9] that

X x σ, u σ

f(u) –fH,σ,x(u) 2

dρX(u)

= Z x σ, u σ

f(u) –y2–fH,σ,x(u) –y 2

(u,y). (3.31)

Substituting (3.31) into (3.30),

μg2≤16R2Dσn X Z x σ, u σ

f(u) –y2

fH,σ,x(u) –y 2

(u,y)

dρX(x)

= 16R2Dσn Z X x σ, u σ

f(u) –y2

fH,σ,x(u) –y 2

dρX(x)

(u,y)

= 16R2Dσnμ(g). (3.32)

Using Proposition3.4withα=14 andG=GR, we know that

P

sup

fBR

X[(Ex(f) –Ex(fH,σ,x)) – (Ez,x(f) –Ez,x(fH,σ,x))]dρX(x) ε+ X(Ex(f) –Ex(fH,σ,x))dρX(x)

≥√ε

≤1 + 4e–2αN

GR, ε

4

exp

– 3m

(α)ε

2048R2Dσn

(10)

Since for anyg1,g1GR,

g1(u,y) –g2(u,y)=

X

x σ,

u σ

f1(u) –y2–f2(u) –y2dρX(x)

X

x σ,

u σ

f1(u) –f2(u)

×f1(u) +f2(u) – 2ydρX(x)

≤4RDσnf1(u) –f2(u), (3.34)

then we have

N

GR, ε

4

N

BR,

ε

16RDσn

=N

B1, ε

16R2Dσn

. (3.35)

It follows from (3.33) that

P

sup

fBR

X[(Ex(f) –Ex(fH,σ,x)) – (Ez,x(f) –Ez,x(fH,σ,x))]dρX(x)

ε+ X(Ex(f) –Ex(fH,σ,x)dρX(x)

≤√ε

≥1 –1 + 4e–2αN

B1, ε

16R2Dσn

exp

– 3m

(α)ε

2048R2Dσn

. (3.36)

We set the term (1 + 4e–2α)N(B1, ε

16R2Dσn)exp{– 3m (α)ε

2048R2Dσn}of the above inequality toδ/2.

We need to invoke the lemma proved by the same method of Proposition 4.3 in [21].

Lemma 3.2 Let η∗(m(α),δ)be the smallest positive solution of the following inequality inη:

1 + 4e–2αN(B

1,η))exp

–3m

(α)η

128

δ.

IflogN(B1,η)≤cp(η)–p,for some p∈(0, 2),cp> 0and allη> 0,then with confidence at least1 –δ,we have

ηm(α),δ≤max

256 3m(α)log

1 + 4e–2α δ ,

256cp

3m(α)

1/(1+p)

. (3.37)

Then we return to the proof of Proposition3.5. It follows from Theorem 5.3 in [27] that

N

B1, ε

16R2Dσn

≤(32R2Dσn ε + 1)

˜

d, for 0 <ε< 16R2Dσn;

(11)

When 0 <ε< 16R2Dσn,

logN

B1, ε

16R2Dσn

≤ ˜dlog

32R2Dσn ε + 1

d˜

p

32R2Dσn ε

p

=d˜

p

32R2Dσnpεp, p> 0. (3.39)

Whenε≥16R2Dσn, we have

logN

GR, ε

4

≤0. (3.40)

Hence we conclude that

logN

B1, ε

16R2Dσn

≤2pd˜

p

ε

16R2Dσnp

:=cpηp. (3.41)

This, together with (3.37), implies that, for

m(α)≥256

3 c

–1p p

log2 + 8e

–2α

δ

1+1 p

, (3.42)

we obtain

ε≤16R2Dσn

256cp

3m(α) 1/(1+p)

. (3.43)

Combining (3.43) with (3.36), with confidence 1 –δ/2, we have

X

Ex(f) –Ex(fH,σ,x)

Ez,x(f) –Ez,x(fH,σ,x)

dρX(x)

≤√ε

ε+

X

Ex(f) –Ex(fH,σ,x)

dρX(x)

ε+1 2

X

Ex(f) –Ex(fH,σ,x)

dρX(x)

≤16R2Dσn

256cp

3m(α) 1/(1+p)

+1 2

X

Ex(f) –Ex(fH,σ,x)

dρX(x). (3.44)

Finally, settingf=fz,σ,xin the above inequality, we derive the desired result.

4 Proofs of the main results

In this subsection, we provide the proofs of Theorem 2.1and Theorem2.2. We firstly prove Theorem2.1.

Proof If we takeσ= (m(α))γ,γ> 0, then we have

R=maxCH,ρX,C

H,ρX,M

(12)

It is readily seen that (2.3) implies (3.1). Then Proposition3.5holds true. We thus obtain, with confidence 1 –δ,

X

Ex(fz,σ,x) –Ex(fH,σ,x)

dρX(x)≤32R2Dσn

256cp

3m(α) 1/(1+p)

. (4.2)

Therefore from (2.8) we obtain

fzfH2ρXCHσ

–2ζτmax{ζ,1}×32R2Dσn

256cp

3m(α) 1/(1+p)

Cm(α)4γ(ζ+τ2max{ζ,1})–p1+1

. (4.3)

This proves Theorem2.1.

Next, we prove Theorem2.2.

Proof Letγ =ε/[4ς+ 2max{τ,τ ς}] > 0 andp= ε

1–ε. Therefore we have

γ τmax{1,ς}<ε< 1/4 (4.4)

and

m(α)1–γ τmax{1,ς}

m(α)1–ε

m12. (4.5)

It follows from (2.5) that

m(α)12 ≥

,ζ +

256 3 c

–1p p

1 + 1

4ς+ 2max{τ,τ ζ}

×!log2 + 8e–2α/δ1+ 1

p+logm(α)"

,ζ +

256 3 c

–1p p

!

log2 + 8e–2α/δ1+ 1

p+γlogm(α)", (4.6)

which implies that condition (2.3) of Theorem2.1holds true, we thus obtain, with confi-dence 1 –δ,

fzfHρXC1

m(α)ε

1

2. (4.7)

This proves Theorem2.2.

Funding

This work was supported by the National Natural Science Foundation of China (grant no. 11671213) and the Fundamental Research Funds for the Central Universities.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

(13)

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Received: 9 March 2018 Accepted: 21 July 2018

References

1. Cerny, M., Antoch, J., Hladik, M.: On the possibilistic approach to linear regression models involving uncertain, indeterminate or interval data. Inf. Sci.244(7), 26–47 (2013)

2. Fasshauer, G.E.: Toward approximate moving least squares approximation with irregularly spaced centers. Comput. Methods Appl. Mech. Eng.193(12–14), 1231–1243 (2004)

3. Komargodski, Z., Levin, D.: Hermite type moving-least-squares approximations. Comput. Math. Appl.51(8), 1223–1232 (2006)

4. Mclain, D.H.: Drawing contours from arbitrary data points. Comput. J.17(17), 318–324 (1974)

5. Savitzky, A., Golay, M.J.E.: Smoothing and differentiation of data by simplified least squares procedures. Anal. Chem.

36(8), 1627–1639 (1964)

6. Wand, M.P., Jones, M.C.: Kernel Smoothing. Chapman & Hall, London (1995)

7. Shepard, D.: A two-dimensional interpolation function for irregularly-spaced data. In: ACM National Conference, pp. 517–524 (1968)

8. Wang, H.Y., Xiang, D.H., Zhou, D.X.: Moving least-square method in learning theory. J. Approx. Theory162(3), 599–614 (2010)

9. Wang, H.Y.: Concentration estimates for the moving least-square method in learning theory. J. Approx. Theory163(9), 1125–1133 (2011)

10. He, F.C., Chen, H., Li, L.Q.: Statistical analysis of the moving least-squares method with unbounded sampling. Inf. Sci.

268(1), 370–380 (2014)

11. Tong, H.Z., Wu, Q.: Learning performance of regularized moving least square regression. J. Comput. Appl. Math.325, 42–55 (2017)

12. Guo, Q., Ye, P.X.: Error analysis of the moving least-squares method with non-identical sampling. Int. J. Comput. Math., 1–15 (2018).https://doi.org/10.1080/00207160.2018.1469748

13. Smale, S., Zhou, D.X.: Online learning with Markov sampling. Anal. Appl.7(1), 87–113 (2009)

14. Pan, Z.W., Xiao, Q.W.: Least-square regularized regression with non-iid sampling. J. Stat. Plan. Inference139(10), 3579–3587 (2009)

15. Guo, Z.C., Shi, L.: Classification with non-i.i.d. sampling. Math. Comput. Model.54(5), 1347–1364 (2011)

16. Guo, Q., Ye, P.X.: Coefficient-based regularized regression with dependent and unbounded sampling. Int. J. Wavelets Multiresolut. Inf. Process.14(5), 1–14 (2016)

17. Sun, H.W., Guo, Q.: Coefficient regularized regression with non-iid sampling. Int. J. Comput. Math.88(15), 3113–3124 (2011)

18. Chu, X.R., Sun, H.W.: Half supervised coefficient regularization for regression learning with unbounded sampling. Int. J. Comput. Math.90(7), 1321–1333 (2013)

19. Steinwart, I., Hush, D., Scovel, C.: Learning from dependent observations. J. Multivar. Anal.100(1), 175–194 (2009) 20. Billingsley, P.: Convergence of probability measures. Appl. Stat.159(1–2), 1–59 (1968)

21. Wu, Q., Ying, Y.M., Zhou, D.X.: Learning rates of least-square regularized regression. Found. Comput. Math.6(2), 171–192 (2006)

22. Lv, S.G., Feng, Y.L.: Semi-supervised learning with the help of Parzen windows. J. Math. Anal. Appl.386(1), 205–212 (2012)

23. Sun, H.W., Wu, Q.: Regularized least square regression with dependent samples. Adv. Comput. Math.32(2), 175–189 (2010)

24. Chu, X.R., Sun, H.W.: Regularized least square regression with unbounded and dependent sampling. Abstr. Appl. Anal.

2013, Article ID 139318 (2013)

25. Guo, Q., Ye, P.X., Cai, B.L.: Convergence rate forlq-coefficient regularized regression with non-i.i.d. sampling. IEEE Access6, 18804–18813 (2018)

26. Parthasarathy, K.R.: Convergence of probability measures. Technometrics12(1), 171–172 (1968)

27. Cucker, F., Zhou, D.X.: Learning Theory: An Approximation Theory Viewpoint. Cambridge University Press, Cambridge (2007)

References

Related documents

2 ] 2+ unit reveal no unusual features. Virtually the same distances are seen in the acetato-bridged complex 4 [67]. A large number of metal complexes containing

In the present study, we analysed the clinical and lab- oratory features of an cohort of Italian patients with AOSD and describe in detail two difficult cases, both complicated

The results regarding the relationship between paranoid, schizoid, and schizotypal personality disorders and the parenting styles showed that schizoid

According to the obtained information, the highest frequency between the basic themes in the analyzed sources is related to the content of the code (P10) called

In patient with a recent acute coronary syndrome, the addition of a new oral anticoagulant to antiplatelet therapy results in a modest reduction in cardiovascular events but

MOVED by Stephen Manchester and SUPPORTED by Kay Pray that the Program and Planning Committee recommends that the Board of Directors of the Community Mental Health

Minnesota Artist Exhibition Program, Artist Panel, Minneapolis Institute of Arts, 2006 - 2008 WARM, Board of Directors, Women’s Art Registry of Minnesota, 2006 -

An interesting feature of the Chung-ADR contractual scheme, is the irrelevance result concerning the allocation of ex-post decision rights as a condition to induce efficient