The errors of simultaneous approximation of multivariate functions by neural networks

(1)

Contents lists available atScienceDirect

Computers and Mathematics with Applications

journal homepage:www.elsevier.com/locate/camwa

The errors of simultaneous approximation of multivariate functions by

neural networks

✩

Tingfan Xie, Feilong Cao

∗

Institute of Metrology and Computational Science, China Jiliang University, Hangzhou 310018, Zhejiang Province, PR China

a r t i c l e i n f o Article history:

Received 14 April 2010

Received in revised form 27 March 2011 Accepted 28 March 2011 Keywords: Neural networks Simultaneous approximation Error Complexity

a b s t r a c t

There have been many studies on the simultaneous approximation capability of feed-forward neural networks (FNNs). Most of these, however, are only concerned with the density or feasibility of performing simultaneous approximations. This paper considers the simultaneous approximation of algebraic polynomials, employing Taylor expansion and an algebraic constructive approach, to construct a class of FNNs which realize the simultaneous approximation of any smooth multivariate function and all of its derivatives. We also present an upper bound on the approximation accuracy of the FNNs, expressed in terms of the modulus of continuity of the functions to be approximated.

1. Introduction

To aid our description, we introduce some notation. Let_Rd_denote_d_{-dimensional Euclidean space, and let}

NandN0

denote the set of all positive integers and the set of all non-negative integers, respectively. LetD

⊂

_Rd_{be a domain, and with}_f

:

_D

→

R. Iffis continuous onD, then we writef

∈

C

(

D

)

. For a givend-index

α

=

(α

₁

, α

₂

, . . . , α

_d

),

α

_i

∈

_N₀

,

i

=

1

,

2

, . . . ,

d

we write

|

α

| =

α

₁

+

α

₂

+ · · · +

α

_d. ForX

=

(

x1

,

x2

, . . . ,

xd

)

∈

Rd, we also write

Xα

=

xα1 1 x α2 2

· · ·

x αd d

,

‖

X

‖ =

(

x 2 1

+

x 2 2

+ · · · +

x 2 d

)

1/2

.

If functionfhas partial derivatives with

α

iorder forxi

,

i

=

1

,

2

, . . . ,

d, then we write

Dαf

=

∂

|α|_f

∂

Xα

=

∂

|α|

∂

xα1 1

∂

x α2 2

· · ·

∂

x αd d f

(

X

).

Form

∈

_N, suppose that

Cm

(

D

)

= {

f

∈

C

(

D

)

:

Dαf

∈

C

(

D

),

|

α

| ≤

m

}

,

and_C∞

(

_D

)

=



∞

m=1Cm

(

D

)

.

The modulus of continuity of the functionf

∈

_C

(

D

)

is defined by

ω(

f

, δ)

=

sup

‖X−X′_‖≤_δ_;_X_,_X′_∈_D

|

f

(

X

)

−

f

(

X′

)

|

.

✩_{The research was supported by the National Natural Science Foundation of China (Nos. 60873206, 6100588) and the Natural Science Foundation of} Zhejiang Province of China (No. Y61003207).

∗_{Corresponding author.}

E-mail address:[email protected](F. Cao).

(2)

This modulus is usually used as a tool for measuring approximation error. It is also used to measure the smoothness of the functionfin approximation theory and Fourier analysis (see [1–3]). If there exists a constantM

>

0 such that

ω(

f

, δ)

p

≤

M

δ

α

,

0

< α

≤

1

,

then we write asf

∈

Lip

(

M

, α)

.

Let

ϕ

be a real function defined onR, and suppose thatn

∈

N. Feed-forward neural networks (FNNs) with one hidden

layer, the only type that we are concerned with in this paper, are mathematically expressed as N_ϕ,n

(

X

)

=

n

−

j=1

cj

ϕ(

⟨

Wj

·

X

⟩ +

bj

),

X

∈

Rd (1)

where for 0

≤

j

≤

n, thebj

∈

Rare the thresholds, theWj

=

(w

j1

, w

j2

, . . . , w

jd

)

∈

Rdare the connection weights, the

cj

∈

Rare the coefficients,

⟨

Wj

·

X

⟩ =

w

j1x1

+

w

j2x2

+ · · · +

w

jdxdis the inner product ofWjandX, and

ϕ

is the activation

function of the network. We write the collection of the neural networks asNϕ,n.

In many fundamental network models, the activation function must satisfy

ϕ(

x

)

→



1

,

asx

→ +∞

,

0

,

asx

→ −∞

.

This is called the sigmoidal function.

It is well-known that FNNs are universal approximators. Theoretically, any continuous function defined on a compact set can be approximated by an FNN to any desired degree of accuracy by increasing the number of hidden neurons. It was proved by Cybenko [4] and Funahashi [5] that any continuous function can be approximated on a compact set with uniform topology by a network of the form given in Eq.(1), using any continuous, sigmoidal activation function. Hornik et al. in [6] have shown that any measurable function can be approached with such a network. Furthermore, various density results on FNN approximations of multivariate functions were later established by many authors using various methods, for more or less general situations: in [7] by Leshno et al., [8] by Mhaskar and Micchelli, [9] by Chui and Li, [10] by Chen and Chen etc.

Yet a related and important problem is that of complexity: determining the number of neurons required to guarantee that all functions (belonging to a certain class) can be approximated to the prescribed degree of accuracy

ϵ

. For example, a classical result of Barron [11] shows that if the function is assumed to satisfy certain conditions expressed in terms of its Fourier transform, and if each of the neurons evaluates a sigmoidal activation function, then at mostO

(ϵ

−2

)

_neurons

are needed to achieve the order of approximation

ϵ

. Previously, many authors have published similar results on the complexity of FNN approximations: Mhaskar and Micchelli [12], Suzuki [13], Maiorov and Meir [14], Makovoz [15], Ferrari and Stengel [16], Xu and Cao [17], Cao et al. [18], etc.

Simultaneous approximations of functions and their partial derivatives have also been studied in [19–23], etc. Hornik et al. [19], by assuming that

ϕ

∈

_Cm

(

R

)

satisfies

ϕ

(ν)

(

t

) (ν

=

0

,

1

, . . . ,

m

)

are Lebesgue integrable onRand

ϕ

̸=

0, proved that N_ϕ

=

∞



n=0 N_ϕ,n

is dense uniformly on a compact subsetKof the function setC∞↓

(

R d

)

_{. Here}

C∞↓

(

R

d

)

_{means that}_f

∈

C∞

(

Rd

)

, and for any

multi-integral index

α

and

β

, it holds that lim

‖X‖→∞

XβDαf

(

X

)

=

0

.

This result shows that forf

∈

_C∞

↓

(

Rd

)

, the compact setK

⊂

Rd, and any

ε >

0, there existsNϕ,n

(

X

)

, such that

max

|_α|_<mmaxX∈K

|

Dαf

(

X

)

−

DαN_ϕ,n

(

X

)

|

< ε.

The proofs given in [19] are non-constructive, in the sense that their arguments are based on Arzela–Ascoli-type theorems, the Fourier transform, and some other theoretical results. The results of Hornik et al. were applied by Gallant and White [21] to train a network with one hidden layer to learn an unknown function and its derivatives by the least squares methods. Cardaliaguet and Euvrard [20] also studied simultaneous approximations, but their results are restricted to the first-order derivatives of one-dimensional or two-dimensional functions. For the networks constructed in [20], Anastassiou [24,25] gave estimates of the simultaneous approximation rate. In [22], Li derived density results for the simultaneous approximation of multivariate functions and their partial derivatives by FNNs(1), just by assuming that

ϕ

∈

_Cn

(

R

)

is not polynomial. Similar results for radial basis function networks were given in [26] by Li.

The result given by Hornik et al. [19] shows that under certain conditions any smooth function and its derivatives can be simultaneously approximated by FNNs. So it is natural to raise the question of how much the error of the simultaneous approximation is. For the question, Attali and Gilles [27] proved that iff

∈

_Cp

(

K

)

,Kis a compact set inRd, andDαf

∈

Lip

(

M

,

1

)

for

|

α

| ≤

p, then there existsN_ϕ,n

∈

Nϕ,nsuch that

max

|_α|≤pmaxX∈K

|

Dαf

(

X

)

−

DαN_ϕ,n

(

X

)

| =

O

(

n−

1

2d

).

₍₂₎

(3)

We found that there is space to improve the estimate(2). In our recent paper [28], the authors of this paper have proved a profound result for the univariate case by using a new constructive approach: forf

∈

_Cp

[

_a

,

_b

]

_{, and any}

ε >

_{0, there exists}

an FNNN_ϕ,n

(

x

)

form such as(1)such that

|

f(k)

(

x

)

−

N_ϕ,(k)_n

(

x

)

| ≤

C

(

a

,

b

,

p

)



1 n

+

1



p−k

ω



f(p)

,

1 n

+

1



+

ε,

for alla

≤

x

≤

bandk

=

0

,

1

, . . . ,

p, whereC

(

a

,

b

,

p

)

is a positive constant depending only ona

,

bandp. The aim of this paper concerns the general dimensiond. That is, we will prove that the right side of the inequality(2)can be replaced by O

(

n−1d

)

_{by means of a constructive approach. The main result to be established is as follows.}

Suppose that

ϕ

∈

_C∞

(

R

)

,

ϕ

(ν)

(

0

)

̸=

0 for

ν

=

0

,

1

. . .

, andf

∈

Cp

(

K

)

where K is a compact set inRd, and

Dβf

∈

Lip

(

M

,

1

)

,

|

β

| =

p. Then for

|

α

| ≤

p, an FNNN_ϕ,n

∈

Nϕ,ncan be constructed such that

max

X∈K

|

Dαf

(

X

)

−

DαN_ϕ,n

(

X

)

| =

O

(

n

−p+1_d−|α|

).

The paper is organized as follows. In the next section, we will consider how to construct FNNs in the form of(1)to approximate simultaneously an algebraic polynomial and its derivatives. Section3answers the above question and gives the estimates of the error of the simultaneous approximation of a function and its derivatives by FNNs.

2. Simultaneous approximation of a polynomial by FNNs Let

Pn

(

X

)

=

−

|α|≤n

C_αXα

,

X

∈

_Rd

be an algebraic polynomial indvariables with the total degree at mostn. We denote byΠn

d the set of all of these algebraic

polynomials. ThenΠn

dis a linear space with



n+d d



dimension (see [29]).

Now we establish a theorem on the simultaneous approximation of a multivariate polynomial by FNNs. Theorem 2.1. Suppose that

ϕ

∈

_Cn

(

[−

1

,

1

]

)

,

ϕ

(ν)

(

0

)

̸=

0for

ν

=

0

,

1

, . . . ,

n, and Pn

∈

Πdn, defined on

[−

1

,

1

]

d_{. Then for any}

arbitrary

ε >

0, there exists N

∈

N_ϕ,_n₊_d d

such that

max

|α|≤n

‖

DαPn

(

X

)

−

DαN

(

X

)

‖

C([−1,1]d)

< ε,

(3)

where

‖ · ‖

_C₍_K₎

=

sup_X∈K

| · |

,

|

α

| ≤

n, and

[

a

,

b

]

d

= [

a

,

b

] × [

a

,

b

] × · · · × [

a

,

b

]

.

Proof. Since

Pn

(

X

)

=

Pn∗−1

(

X

)

+

−

|_α|=n

C_αXα

,

(4)

there existd-dimensional vectors

Bnj

=

(

bnj,1

,

bnj,2

, . . . ,

bnj,d

),

j

=

1

,

2

, . . . ,



n

+

d

−

1

d

−

1



such that (see [9])

−

|α|=n C_αXα

=

_n₊_d₋₁ d−1 

−

j=1 en,j

(

⟨

Bnj

·

X

⟩

)

n

,

X

∈

Rd

,

(5) whereP∗ n−1

∈

Π n−1

d , and the coefficientsen,jare independent ofX. Take a real number

η >

0 such that whenX

∈ [−

1

,

1

]

d,

we have

|

η

⟨

Bnj

·

X

⟩|

<

1

,

j

=

1

,

2

, . . . ,



n

+

d

−

1 d

−

1



,

where

|

a

|

is the absolute value ofa.

The function

ϕ(

t

)

can be expanded att

=

0 by using Taylor’s formula as follows:

ϕ(

t

)

=

n

−

ν=0

ϕ

(ν)

(

₀

)

ν

!

t ν

+

1

(

n

−

1

)

!

∫

t 0

(ϕ

(n)

(µ)

−

ϕ

(n)

(

₀

))(

_t

−

µ)

n−1_d

µ.

(4)

The assumption that

ϕ

(n)

(

0

)

̸=

0 implies that

(

⟨

Bnj

·

X

⟩

)

n

=

n

!

η

n

ϕ

(n)

(

₀

)

ϕ(η

⟨

Bnj

·

X

⟩

)

+

Qn−1

(η,

⟨

Bnj

·

X

⟩

)

+

Rn,j

(η,

X

),

(6) whereQn−1

(η,

⟨

Bnj

·

X

⟩

)

∈

Πn −1 d and Rn,j

(η,

X

)

=

n

η

n

ϕ

(n)

(

₀

)

∫

η⟨Bnj·X⟩ 0

(ϕ

(n)

(µ)

−

ϕ

(n)

(

₀

))(η

⟨

_B nj

·

X

⟩ −

µ)

n−1d

µ.

It is easy to find by computation that when

|

α

| ≤

n, there exists a constantMjindependent of

η

such that for allX

∈ [−

1

,

1

]

d,

|

DαRn,j

(η,

X

)

|

<

Mj

ω(ϕ

(n)

, η),

where

ω(ϕ

(n)

, η)

_{is the modulus of continuity for the function}

ϕ

(n)

(

_t

)

_{on the interval}

[−

₁

,

₁

]

_{. It is obvious that}

ω(ϕ

(n)

, η)

→

0 as

η

→

0. So when

η

n

>

0,

‖

DαRn,j

(η

n

,

X

)

‖

C([−1,1]d₎

<

ε

(

n

+

1

)



n+d−1 d−1



max 1≤j≤ _n₊_d₋₁ d−1 (

|

enj

| +

1

)

.

(7)

Now collecting(4)–(6)gives

Pn

(

X

)

=

_n₊_d₋₁ d−1 

−

j=1 enj n

!

η

n n

ϕ

(n)

(

0

)

ϕ(η

n

⟨

Bnj

·

X

⟩

)

+

Pn−1

(

X

)

+

Rn

(η

n

,

X

),

(8) where Pn−1

(

X

)

=

Pn∗−1

(

X

)

−

_n₊_d₋₁ d−1 

−

j=1 enj n

!

η

n

ϕ

(n)

(

₀

)

n−1

−

ν=0

ϕ

(ν)

(

₀

)

ν

!

(η

n

⟨

Bnj

·

X

⟩

)

ν

,

and Rn

(η

n

,

X

)

=

_n₊_d₋₁ d−1 

−

j=1 enjRnj

(η

n

,

X

).

And from(7)we see that

|

DαRn

(η

n

,

X

)

|

<

ε

n

+

1

,

X

∈ [−

1

,

1

]

d

.

₍₉₎

Similarly, there exists

η

n−1

>

0 such that

Pn−1

(

X

)

=

_n₊_d₋₂ d−1 

−

j=1 en−1,j

(

n

−

1

)

!

η

n−1 n−1

ϕ

(n−1)

(

0

)

ϕ(η

n−1

⟨

Bn−1,j

·

X

⟩

)

+

Pn−2

(

X

)

+

Rn−1

(η

n−1

,

X

),

(10)

wherePn−2

(

X

)

∈

Πdn−2and when

|

α

| ≤

n,

|

DαRn−1

(η

n−1

,

X

)

|

<

ε

n

+

1

,

X

∈ [−

1

,

1

]

d

.

We go on withnof the same steps, and define

N

(

X

)

=

n

−

ℓ=0  n+d−l d−1 

−

j=1 e_ℓj

ℓ

!

η

ℓ ℓ

ϕ

(ℓ)

(

0

)

ϕ(η

ℓ

⟨

B_ℓj

·

X

⟩

).

Then collecting(8)and(10)gives Pn

(

X

)

=

N

(

X

)

+

Rn

(

X

),

where Rn

(

X

)

=

n

−

ℓ=0 R_ℓ

(η

_ℓ

,

X

).

(5)

Therefore, from(9)and the above representation ofRn

(

X

)

, it follows that

‖

DαRn

(

X

)

‖

C([−1,1]d₎

< ε,

This finishes the proof ofTheorem 2.1.

By means of a linear transform of variables,Theorem 2.1can be extended to the case of

[

a

,

b

]

d. Theorem 2.2. Suppose that

ϕ

∈

_Cn

(

[

_a

,

_b

]

)

_{; there exists a point t}

0

∈

(

a

,

b

)

such that

ϕ

(ν)

(

t0

)

̸=

0for

ν

=

0

,

1

,

2

, . . . ,

n, and

P

∈

Πn

d, defined on

[

a

,

b

]

d. Then for arbitrary

ε >

0, and multi-index

α

with

|

α

| ≤

n, there exists N

∈

N_ϕ,n+d d such that max |α|≤n

‖

DαP

(

X

)

−

DαN

(

X

)

‖

C([a,b]d)

< ε.

3. Errors of the simultaneous approximation of a function by FNNs

We first give the error estimation for the simultaneous approximation of algebraic polynomials for a function and its derivatives.

Theorem 3.1. Suppose that f

∈

_Ck

(

[

_a

,

_b

]

d

)

_{, h}

∈

(

₀

, (

_b

−

_a

)/

₂

)

_{and j}

=

₀

,

₁

, . . . ,

_{k. Then there exists a d-variable algebraic}

polynomial P with total degree at most n, for multi-indexes

α, β

with

|

α

| =

j

,

|

β

| =

k

≤

n, such that it holds that

‖

Dαf

(

X

)

−

DαP

(

X

)

‖

C([a+h,b−h]d)

=

O



1 n(k−j)/d

ω



Dβf

,

1 n1/d



.

Proof. It is well-known that for anyf

∈

_Ck

(

[−

₁

,

₁

]

)

_{there are polynomials}_p

n

∈

Π1nand a positive constantCsuch that

(see [30,31])

|

f(j)

(

x

)

−

p(_nj)

(

x

)

| ≤

C



√

1

−

x2 n

+

1 n2



k−j

ω



f(k)

,

√

1

−

x2 n

+

1 n2



,

x

∈ [−

1

,

1

]

where 0

≤

j

≤

k. Therefore, it is easy to obtain that for anya′

,

b′

∈

(

−

1

,

1

)

,

‖

f(j)

(

x

)

−

p(_nj)

(

x

)

‖

_C₍[a′_,_b′_]₎

=

O



1 nk−j

ω



f(k)

,

1 n



.

By a transformation, the above estimates hold for any givenaandb, i.e.,

‖

f(j)

(

x

)

−

p(_nj)

(

x

)

‖

_C₍[a+h,b−h])

=

O



1 nk−j

ω



f(k)

,

1 n



,

which shows thatTheorem 3.1is valid ford

=

1. Thus, such estimates can be extended to the case of

[

a

+

h

,

b

−

h

]

d_{for any}

d

>

1. This completes the proof ofTheorem 3.1.

Now we establish the main result in this paper as follows.

Theorem 3.2. Suppose that

ϕ

∈

_Cn

(

[

a

,

b

]

)

and there exists a point t0

∈

(

a

,

b

)

such that

ϕ

(ν)

(

t0

)

̸=

0for

ν

=

0

,

1

,

2

, . . . ,

n,

If f

∈

_Ck

(

[

_a

,

_b

]

d

) (

_k

≤

_n

)

_{, then there exists N}

∈

_N

ϕ,nsuch that for all multi-indexes

α

and

β

with

|

α

| ≤

k

,

|

β

| =

k, and

h

∈

(

0

, (

b

−

a

)/

2

)

,

‖

Dαf

(

X

)

−

DαN

(

X

)

‖

C([a+h,b−h]d₎

=

O



1 n(k−|α|)/d

ω



Dβf

,

1 n1/d



holds.

Proof. For givenn

∈

_N₀, takem

∈

_N₀such that

md

≤

n

< (

m

+

1

)

d

.

(11)

FromTheorem 3.1it follows that there existsP

(

X

)

∈

Π_dmd, for all multi-indexes

α

and

β

with

|

α

| ≤

k

,

|

β

| =

kand h

∈

(

0

, (

b

−

a

)/

2

)

, such that

‖

Dαf

(

X

)

−

DαP

(

X

)

‖

_C₍_[_a₊_h_,_b₋_h_]d₎

=

O



1 mk−|α|

ω



Dβf

,

1 m



.

(12)

(6)

ApplyingTheorem 2.2, for the polynomialsP

(

X

)

and any

ε >

0

,

there exists an FNNN

∈

N_ϕ,_m₊_d d

such that

‖

DαP

(

X

)

−

DαN

(

X

)

‖

C([a,b]d)

< ε.

Combining the inequality with(12), we obtain

‖

Dαf

(

X

)

−

DαN

(

X

)

‖

_C₍_[_a₊_h_,_b₋_h_]d₎

=

O



1 mk−|_α|

ω



Dβf

,

1 m



+

ε.

(13)

From(11), we have ford

>

1 1 m

=

O



1 n1/d



,



m

+

d d



<

n

.

HenceN

∈

N_ϕ,n, and from(13)it follows that

‖

Dαf

(

X

)

−

DαN

(

X

)

‖

_C₍_[_a₊_h_,_b₋_h_]d₎

=

O



1 n(k−|α|)/d

ω



Dβf

,

1 n1/d



+

ε.

Finally, recalling the arbitrariness of

ε

, we have

‖

Dαf

(

X

)

−

DαN

(

X

)

‖

C([a+h,b−h]d)

=

O



1 n(k−|α|)/d

ω



Dβf

,

1 n1/d



.

This finishes the proof ofTheorem 3.2.

Theorem 3.3. Suppose that

ϕ

∈

_C∞

(

R

)

,

ϕ

(ν)

(

0

)

̸=

0for

ν

=

0

,

1

, . . . ,

and K is any compact set in K

⊂

Rd, If f

∈

Ck

(

K

)

,

then an FNN N

∈

Nϕ,ncan be constructed such that

‖

Dαf

(

X

)

−

DαN

(

X

)

‖

_C₍_K₎

=

O



1 n(k−|α|)/d

ω



Dβf

,

1 n1/d



holds for all multi-indexes

α, β

with

|

α

| ≤

k

,

|

β

=

k

|

.

Proof. SinceK

⊂

_Rd_{is a compact set, we can take a cube}

[

_a

,

_b

]

d_{such that}_K

⊂ [

_a

+

₁

,

_b

−

₁

]

d_{. Applying}_{Theorem 3.2}_{to the}

cube

[

a

,

b

]

d_and_h

=

_{1 implies}_{Theorem 3.3}_.

Remark.InTheorem 3.3, ifDβf

∈

Lip

(

M

,

1

)

, with

|

β

| =

k, then it holds that

‖

Dαf

(

X

)

−

DαN

(

X

)

‖

_C₍_K₎

=

O



1 n(k+1−|α|)/d



for all

|

α

| ≤

k, which is just the conclusion mentioned in Section1. Acknowledgments

The authors would like to thank the reviewer for many useful suggestions. References

[1] G.G. Lorentz, Approximation of Functions, Rinehart and Winston, New York, 1966.

[2] T.F. Xie, S.P. Zhou, Approximation Theory of Real Functions, Hangzhou University Press, Hangzhou, 1998. [3] A. Zygmund, Trigonometric Series, Cambridge University Press, Cambridge, 1959.

[4] G. Cybenko, Approximation by superpositions of sigmoidal function, Math. Control Signals System 2 (1989) 303–314. [5] K.I. Funahashi, On the approximate realization of continuous mappings by neural networks, Neural Netw. 2 (1989) 183–192. [6] K. Hornik, M. Stinchcombe, H. White, Multilayer feedforward networks are universal approximation, Neural Netw. 2 (1989) 359–366.

[7] M. Leshno, V.Y. Lin, A. Pinks, S. Schocken, Multilayer feedforward networks with a nonpolynomial activation function can approximate any function, Neural Netw. 6 (1993) 861–867.

[8] H.N. Mhaskar, C.A. Micchelli, Approximation by superposition of a sigmoidal function, Adv. Appl. Math. 13 (1992) 350–373. [9] C.K. Chui, X. Li, Approximation by ridge functions and neural networks with one hidden layer, J. Approx. Theory 70 (1992) 131–141.

[10] T.P. Chen, H. Chen, Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to a dynamic system, IEEE Trans. Neural Netw. 6 (1995) 911–917.

[11] A.R. Barron, Universal approximation bounds for superpositions of a sigmoidal function, IEEE Trans. Inform. Theory 39 (1993) 930–945. [12] H.N. Mhaskar, C.A. Micchelli, Degree of approximation by neural networks with a single hidden layer, Adv. Appl. Math. 16 (1995) 151–183. [13] S. Suzuki, Constructive function approximation by three-layer artificial neural networks, Neural Netw. 11 (1998) 1049–1058.

[14] V. Maiorov, R.S. Meir, Approximation bounds for smooth functions inC(Rd₎_{by neural and mixture networks, IEEE Trans. Neural Netw. 9 (1998)} 969–978.

[15] Y. Makovoz, Uniform approximation by neural networks, J. Approx. Theory 95 (1998) 215–228.

[16] S. Ferrari, R.F. Stengel, Smooth function approximation using neural networks, IEEE Trans. Neural Netw. 16 (2005) 24–38. [17] Z.B. Xu, F.L. Cao, The essential order of approximation for neural networks, Sci. China Ser. F 47 (2004) 97–112.

(7)

[18] F.L. Cao, T.F. Xie, Z.B. Xu, The estimate for approximation error of neural networks: a constructive approach, Neurocomputing 71 (2008) 626–630. [19] K. Hornik, M. Stinchcombe, H. White, Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks,

Neural Netw. 3 (1990) 551–560.

[20] P. Cardaliaguet, G. Euvrard, Approximation of a function and its derivative with a neural network, Neural Netw. 5 (1992) 207–220.

[21] H.H. Gallant, H. White, On learning the derivatives of an unknown mapping with multilayer feedforward networks, Neural Netw. 5 (1992) 129–138. [22] X. Li, Simultaneous approximations of multivariate functions and their derivatives by neural networks with one hidden layer, Neurocomputing 12

(1996) 327–343.

[23] Z.B. Xu, F.L. Cao, SimultaneousLp_{-approximation order for neural networks, Neural Netw. 18 (2005) 914–923.}

[24] G.A. Anastassiou, Rate of convergence of some neural network operators to the unit-univariate case, J. Math. Anal. Appl. 212 (1997) 237–262. [25] G.A. Anastassiou, Rate of convergence of some multivariate neural network operators to the unit, Comput. Math. Appl. 40 (2000) 1–19. [26] X. Li, On simultaneous approximations by radial basis function neural networks, Appl. Math. Comput. 95 (1998) 75–89.

[27] J.G. Attali, P. Gilles, Approximation of functions by a multilayer perception a new approach, Neural Netw. 6 (1997) 1069–1081. [28] T.F. Xie, F.L. Cao, The errors in simultaneous approximation by feed-forward neural networks, Neurocomputing 73 (2010) 903–907. [29] W. Cheney, W. Light, A Course in Approximation Theory, Thomson Learning, 2000.

[30] R. DeVore, G.G. Lorentz, Constructive Approximation, Springer-Verlag, Berlin, Heidelberg, New York, 1993.

Computers and Mathematics with Applications 61 (2011) 3146–3152

ScienceDirect