R-splines for Response Surface Modeling


July 12, 2000

Sarah W. Hardy, North Carolina State University

Douglas W. Nychka, National Center for Atmospheric Research


R-splines are introduced as splines fit with a polynomial null space plus the sum of radial basis functions. Thin plate splines are a special case of R-splines. This broader definition of an R-spline, however, also includes splines in which $2m - d \le 0$, where the traditional roughness penalty is not guaranteed to be non-negative definite. This paper discusses a modification of the roughness penalty that allows for the fitting of reduced polynomial null spaces. A series of examples is used to demonstrate the behavior of this modified roughness penalty.

Keywords: Thin plate spline, nonparametric, response surfaces, roughness penalty, Demmler-Reinsch basis functions.

1 Introduction


plate spline modeling.

R-splines arose in the context of attempting to modify the thin plate spline to allow for a greater number of explanatory variables in the model. R-splines are splines fit with a polynomial null space plus the sum of radial basis functions. Clearly, thin plate splines fall into this category. This broader definition of an R-spline, however, also includes other splines. Specifically, it includes splines in which $2m - d \le 0$. In other words, the polynomial null space is reduced, or of a lower order than required by the thin plate spline constraint. When $2m - d \le 0$, the seamless way in which the polynomial function and roughness penalty fit together (i.e., the space spanned by the polynomial terms is the null space of the roughness penalty) no longer holds, or the relationship is "broken." The term broken spline will be used to indicate this kind of spline, where the penalty matrix is chosen to be similar to that of the thin plate spline. The term R-splines refers to the broader class of splines including both broken and thin plate splines.

This paper begins by discussing the thin plate spline roughness penalty. It then presents the broken spline modification to the roughness penalty. The Demmler-Reinsch basis function representation of splines is used to show the role the eigenvalues of the roughness penalty matrix have in the spline solution. A series of four two- and three-dimensional examples from data sets available in S-PLUS and FUNFITS (Nychka et al., 1996) is used to compare the roughness penalties for the thin plate and broken spline. Another five-dimensional example using data collected at Becton Dickinson is also presented.

2 Thin plate spline roughness penalty

The roughness functional below, $J_m(f)$, will increase in magnitude as a function departs

$$J_m(f) = \int_{\Re^d} \sum \frac{m!}{\alpha_1! \cdots \alpha_d!} \left( \frac{\partial^m f}{\partial u_1^{\alpha_1} \cdots \partial u_d^{\alpha_d}} \right)^2 du.$$

The sum in the integrand is taken over all non-negative integer vectors, $\alpha$, such that $\alpha_1 + \cdots + \alpha_d = m$. Clearly, for polynomials of order $m-1$, $J_m(f) = 0$, because all $m$th derivatives are 0.
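As a small worked case of the functional, take $d = 2$ and $m = 2$ (values chosen here only for illustration). The multi-indices with $\alpha_1 + \alpha_2 = 2$ are $(2,0)$, $(1,1)$ and $(0,2)$, with weights $2!/(2!\,0!) = 1$, $2!/(1!\,1!) = 2$ and $2!/(0!\,2!) = 1$, so the penalty reduces to the familiar bending-energy form:

```latex
J_2(f) = \int_{\Re^2} \left[
  \left(\frac{\partial^2 f}{\partial u_1^2}\right)^2
  + 2\left(\frac{\partial^2 f}{\partial u_1\,\partial u_2}\right)^2
  + \left(\frac{\partial^2 f}{\partial u_2^2}\right)^2
\right] du_1\, du_2 .
```

For a linear function $f(u) = a + b\,u_1 + c\,u_2$, every second derivative vanishes, so $J_2(f) = 0$, which is the null-space property stated above.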

The thin plate spline estimator of $f$ (Wahba, 1990) is the minimizer of the sum of the mean-squared error and the roughness penalty, which is weighted by a smoothing parameter. Details on the form of the thin plate spline can be found in the appendix. In matrix form, the thin plate spline estimate is

$$f(x_i) = T\beta + M\delta \quad \text{where } T^T\delta = 0 \text{ and } 2m - d > 0,$$

where $T$ is the design matrix of a polynomial model of order $m-1$ and $M_{k,i} = E(\|x_i - x_k\|; m, d)$.

For linear combinations of basis functions that satisfy the constraints, $J_m(f) = \delta^T M \delta > 0$. In other words, $M$ is guaranteed to be positive definite on the subspace of coefficient vectors satisfying $T^T\delta = 0$. Defining the matrix $W_{n \times n}$, a diagonal matrix proportional to the reciprocal variances of the errors, allows for the following matrix representation of the penalized sum of squares:

$$S = \frac{1}{n}(Y - T\beta - M\delta)^T W (Y - T\beta - M\delta) + \lambda\,\delta^T M \delta. \tag{1}$$

After taking partial derivatives of (1) with respect to $\beta$ and $\delta$, a QR decomposition of $T$,

$$F^T T = \begin{bmatrix} R \\ 0 \end{bmatrix},$$

is used to enforce the constraint $T^T\delta = 0$. $F_{n \times n}$ is an orthogonal matrix that can be partitioned as $F = [F_1 \mid F_2]$, where $F_1$ has columns that span the column space of $T$, i.e., $T = F_1 R$, and $F_2$ is orthogonal to the column space of $T$. Reparameterizing by letting $\delta = F_2\omega_2$, $\beta = \omega_1$, and $\omega^T = (\omega_1, \omega_2)$ gives the ridge regression form:

$$X^T W X \omega + \lambda H \omega = X^T W Y,$$

where

$$X = [\,T \;\; M F_2\,] \quad \text{and} \quad H = \begin{bmatrix} 0 & 0 \\ 0 & F_2^T M F_2 \end{bmatrix}.$$

Note that in the ridge regression formulation the roughness penalty is represented by the matrix H.
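The ridge regression formulation can be sketched numerically. The following is a minimal illustration, not the FUNFITS implementation: it assumes $d = 2$, $m = 2$, the radial basis $r^2 \log r$ with the constant $a_{md}$ set to 1, and simulated data. The function names `tps_kernel`, `ridge_regression_form` and `fit` are hypothetical.

```python
import numpy as np

def tps_kernel(r):
    # d = 2, m = 2 radial basis r^2 log r, with a_md taken as 1 (assumption).
    safe_r = np.where(r > 0, r, 1.0)      # log(1) = 0, so the value at r = 0 is 0
    return r**2 * np.log(safe_r)

def ridge_regression_form(x):
    """Build T, M, F2, X = [T, M F2] and H for two-dimensional sites x (n x 2)."""
    n = x.shape[0]
    T = np.column_stack([np.ones(n), x])  # polynomial terms 1, x1, x2
    r = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    M = tps_kernel(r)
    F, _ = np.linalg.qr(T, mode="complete")   # F = [F1 | F2]
    t = T.shape[1]
    F2 = F[:, t:]                         # orthogonal to the column space of T
    X = np.hstack([T, M @ F2])
    H = np.zeros((n, n))
    H[t:, t:] = F2.T @ M @ F2             # zero block for the polynomial terms
    return T, M, F2, X, H

def fit(x, y, lam, w=None):
    """Solve (X'WX + lam H) omega = X'WY and map back to beta and delta."""
    n = len(y)
    W = np.eye(n) if w is None else np.diag(w)
    T, M, F2, X, H = ridge_regression_form(x)
    omega = np.linalg.solve(X.T @ W @ X + lam * H, X.T @ W @ y)
    t = T.shape[1]
    beta = omega[:t]
    delta = F2 @ omega[t:]                # satisfies T' delta = 0 by construction
    return T @ beta + M @ delta, beta, delta, T

# Simulated data, for illustration only.
rng = np.random.default_rng(0)
x = rng.uniform(size=(25, 2))
y = np.sin(2 * np.pi * x[:, 0]) + x[:, 1] + 0.1 * rng.standard_normal(25)
fitted, beta, delta, T = fit(x, y, lam=0.01)
```

Because $\delta$ is formed as $F_2\omega_2$, the constraint $T^T\delta = 0$ holds automatically and never has to be imposed separately, which is the point of the QR step.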

3 Broken spline roughness penalty

The degree of the polynomial component implies a specific roughness penalty in the thin plate spline. Thus, if $T$, the design matrix for the polynomial component, does not span $P_{m-1}$, the roughness penalty minimized in the ridge regression formulation will not be the same roughness penalty as the one used to derive the thin plate spline. We will use the term broken spline to describe splines where $T$ does not span $P_{m-1}$.

For purposes of this comparative discussion, the $T$ matrix for a thin plate spline will be denoted $T_P$ and the $T$ matrix for a broken spline will be denoted $T_R$. Note that the column space of $T_R$ is a subset of that of $T_P$. Partition $F = [F_a \; F_b \; F_c]$, where $F_a$ spans the space of $T_R$, $F_b$ spans the columns of $T_P$ that are not in $T_R$, and $F_c$ is orthogonal to the column space of $T_P$.

For the broken spline, $H$, the roughness penalty matrix, becomes

$$H_R = \begin{bmatrix} 0_{r \times r} & 0 & 0 \\ 0 & F_b^T M F_b & F_b^T M F_c \\ 0 & F_c^T M F_b & F_c^T M F_c \end{bmatrix},$$

as compared to the $H$ for the thin plate spline, which is

$$H_t = \begin{bmatrix} 0_{r \times r} & 0 & 0 \\ 0 & 0_{(t-r) \times (t-r)} & 0 \\ 0 & 0 & F_c^T M F_c \end{bmatrix}.$$

Hence, $H_R$ for the broken spline is the roughness penalty for the thin plate spline, plus other terms,

$$H_R = H_t + \begin{bmatrix} 0 & 0 & 0 \\ 0 & F_b^T M F_b & F_b^T M F_c \\ 0 & F_c^T M F_b & 0 \end{bmatrix}.$$

With thin plate splines, $H_t$ is guaranteed to be non-negative definite. Recall that $\delta^T M \delta > 0$ where $T^T\delta = 0$ and $2m - d > 0$, which guarantees that $M$ is positive definite on the subspace satisfying the constraint. $F_2^T M F_2$ is a quadratic form of $M$ restricted to that subspace and as such is also positive definite. $H_t$, having $F_2^T M F_2$ in the lower right block and zeros elsewhere, is non-negative definite. With broken splines, however, $H_R$ is no longer guaranteed to be a non-negative definite matrix. In the computational solution to the thin plate spline, $UDU^T$ is the singular value decomposition of $BHB$, so $H_t = B^{-1} U D U^T B^{-1}$. Now, $BHB$ is no longer non-negative definite and thus its singular value decomposition is $UDV^T$, where $U \ne V$, and $H_R = B$

columns corresponding to the negative eigenvalues. Hence, the penalty matrix actually used is the matrix that results from forcing the negative eigenvalues of $H_R$ to be positive. How "close" the two matrices are depends on the magnitude of these eigenvalues. The negative eigenvalues that occur correspond to the "missing" polynomial terms in the null space. Thus, the roughness penalty is forced into penalizing roughness in the surface that might be modeled by these missing polynomial terms, fitting them instead with the flexible radial basis functions.
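The contrast between the two penalty matrices can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' code: it uses $d = 2$, the $m = 2$ radial basis $r^2 \log r$ with $a_{md} = 1$, a thin plate null space spanned by $1, x_1, x_2$, a broken-spline null space reduced to the constant, and a synthetic 5 x 5 grid of sites with coordinates on a 0 to 100 scale. The sign and size of the extra eigenvalues depend on how the coordinates are scaled, so other data may behave differently.

```python
import numpy as np

def tps_kernel(r):
    # r^2 log r with a_md taken as 1 (assumption); defined as 0 at r = 0.
    safe_r = np.where(r > 0, r, 1.0)
    return r**2 * np.log(safe_r)

# Synthetic 5 x 5 grid of sites, coordinates on a 0-100 scale (illustrative only).
g = np.linspace(0.0, 100.0, 5)
x = np.array([(a, b) for a in g for b in g])
n = x.shape[0]

r = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
M = tps_kernel(r)

# Thin plate null space T_P = [1, x1, x2]; broken spline keeps only the constant.
T_P = np.column_stack([np.ones(n), x])
F, _ = np.linalg.qr(T_P, mode="complete")
F_a = F[:, :1]      # spans T_R (the constant column)
F_b = F[:, 1:3]     # spans the part of T_P not in T_R
F_c = F[:, 3:]      # orthogonal to the column space of T_P

# Nonzero block of H_t and of H_R.
block_t = F_c.T @ M @ F_c
F_bc = np.hstack([F_b, F_c])
block_R = F_bc.T @ M @ F_bc

eig_t = np.linalg.eigvalsh(block_t)
eig_R = np.linalg.eigvalsh(block_R)

print("smallest eigenvalue, thin plate block:", eig_t.min())
print("smallest eigenvalue, broken block:   ", eig_R.min())
print("negative eigenvalues in broken block:", int(np.sum(eig_R < 0)))
```

In this configuration the thin plate block has only positive eigenvalues, while the broken-spline block picks up negative ones, in line with the discussion of the "missing" polynomial terms.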

3.1 Demmler-Reinsch basis functions

3.1.1 Definition

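Constructing a Demmler-Reinsch basis for the smoothing problem aids in understanding the roles the eigenvalues and eigenvectors of $H$ have in the spline solution. Before defining the Demmler-Reinsch basis, two inner products are defined:

$$\langle h_\mu, h_\nu \rangle_1 = \sum_k h_\mu(x_k)\, W_k\, h_\nu(x_k)$$

and

$$\langle h_\mu, h_\nu \rangle_2 = \int_{\Re^d} \sum \frac{m!}{\alpha_1! \cdots \alpha_d!} \left( \frac{\partial^m h_\mu}{\partial x_1^{\alpha_1} \cdots \partial x_d^{\alpha_d}} \right) \left( \frac{\partial^m h_\nu}{\partial x_1^{\alpha_1} \cdots \partial x_d^{\alpha_d}} \right) dx.$$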

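The Demmler-Reinsch basis is denoted by $\{g_\nu\}$ and is defined by the three following properties:

1. $\{g_\nu\}$ for $1 \le \nu \le N$ spans the same subspace as $\{\phi_j\}$ and the radial basis functions.
2. $\langle g_\mu, g_\nu \rangle_1 = 1$ for $\mu = \nu$ and 0 for $\mu \ne \nu$.
3. $\langle g_\mu, g_\nu \rangle_2 = D_\nu$ for $\mu = \nu$ and 0 for $\mu \ne \nu$.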


residual sums of squares and the roughness penalty can be expressed. By convention the $D_\nu$ are in ascending order.
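One way to obtain coefficients with properties 2 and 3 is to diagonalize two matrices simultaneously. The sketch below is a minimal illustration, assuming $A = X^T W X$ is symmetric positive definite and taking $B = A^{-1/2}$ as the scaling matrix; that choice of $B$, the function name `demmler_reinsch`, and the random test matrices are assumptions made for this example.

```python
import numpy as np

def demmler_reinsch(A, H):
    """Return (G, D) with G' A G = I and G' H G = diag(D), D in ascending order.

    A plays the role of X' W X (symmetric positive definite) and H the role of
    the penalty matrix (symmetric). B = A^{-1/2} is an assumed scaling matrix.
    """
    evals, evecs = np.linalg.eigh(A)
    B = evecs @ np.diag(evals ** -0.5) @ evecs.T   # symmetric inverse square root
    BHB = B @ H @ B
    D, U = np.linalg.eigh((BHB + BHB.T) / 2)       # eigh returns ascending eigenvalues
    return B @ U, D

# Small synthetic check (W = I), for illustration only.
rng = np.random.default_rng(1)
X = rng.standard_normal((12, 6))
A = X.T @ X
L = rng.standard_normal((4, 4))
H = np.zeros((6, 6))
H[2:, 2:] = L @ L.T          # zero upper block: a 2-dimensional null space
G, D = demmler_reinsch(A, H)
```

With $G$ in hand, the values of the basis functions at the data sites are the columns of $XG$. In this synthetic check the first two entries of `D` are numerically zero, mirroring the zero block of $H$ that corresponds to the polynomial null space.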

3.1.2 Role in spline solution

Because this basis spans the same space as the components of $f(x)$, we can now express both $f(x)$ and $y$ as linear combinations of these basis functions:

$$f(x_k) = \sum_{\nu=1}^{N} \gamma_\nu g_\nu(x_k) \quad \text{and} \quad y_k = \sum_{\nu=1}^{N} u_\nu g_\nu(x_k). \tag{2}$$

Using these expressions it can be easily shown that

$$u_\nu = \sum_{k=1}^{n} g_\nu(x_k)\, W_k\, Y_k.$$

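Moreover, under the assumption that the random errors are independent and distributed $N(0, \sigma^2 I)$, the $u_\nu$'s are also independent and are distributed $N(\gamma, \sigma^2 I)$.

With the Demmler-Reinsch basis functions the residual sum of squares and the roughness penalty can be expressed as

$$\sum_{k=1}^{n} (Y_k - f(x_k))^2 w_k = \sum_{\nu=1}^{n} (u_\nu - \gamma_\nu)^2 \quad \text{and} \quad J_m(f) = \sum_{\nu=1}^{n} D_\nu \gamma_\nu^2.$$

Thus, it is apparent that minimizing $S(f)$ is equivalent to finding

$$\min_{\gamma \in \Re^n} \sum_{\nu=1}^{n} (u_\nu - \gamma_\nu)^2 + \lambda \sum_{\nu=1}^{n} D_\nu \gamma_\nu^2. \tag{3}$$

The minimizing expression is

$$c_\nu = \frac{u_\nu}{1 + \lambda D_\nu},$$

and so,

$$\hat{f}(x) = \sum_{\nu=1}^{N} c_\nu g_\nu(x) = \sum_{\nu=1}^{N} \frac{u_\nu}{1 + \lambda D_\nu}\, g_\nu(x).$$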
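As a worked step, the minimizer can be derived one coefficient at a time, because the criterion in (3) is a sum of terms that each involve a single coefficient. The coefficient symbol $\gamma_\nu$ used below is a notational assumption of this note.

```latex
\frac{\partial}{\partial \gamma_\nu}
  \Big[ (u_\nu - \gamma_\nu)^2 + \lambda D_\nu \gamma_\nu^2 \Big]
  = -2\,(u_\nu - \gamma_\nu) + 2\,\lambda D_\nu \gamma_\nu = 0
\quad\Longrightarrow\quad
c_\nu = \frac{u_\nu}{1 + \lambda D_\nu}.
```

The second derivative is $2(1 + \lambda D_\nu)$, so the stationary point is a minimum whenever $1 + \lambda D_\nu > 0$, which always holds when $D_\nu \ge 0$. Coefficients with $D_\nu = 0$ (the null space) are left unshrunk, $c_\nu = u_\nu$, while those with large $D_\nu$ are shrunk toward zero. A negative $D_\nu$ with $1 + \lambda D_\nu < 0$ would not give a minimum, which is one way to see why a non-negative definite penalty matters.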

In fact, the $G = \{g_\nu\}_{\nu = 1, \ldots, N}$ we are seeking is (cleverly) the same $G$ we found in the ridge regression formulation, and property (2) is easily verified by noting that $G^T (X^T W X) G = I$. To determine if property (3) holds, observe that $G^T H G = D$. As an important point of clarification, recall that the upper $r \times r$ block of $H$ is 0, where $r$ is the dimension of the null space. This implies $D_\nu = 0$ for $\nu = 1, \ldots, r$, where the $D_\nu$ are the eigenvalues of $G^T H G$. More intuitively, by virtue of the way in which the basis functions are defined, the roughness of the basis functions is $J_m(g_\nu) = D_\nu$, and $r$ of these basis functions will be polynomials whose roughness is 0.

With this formulation, it is apparent that if the $D_\nu$'s, which are the eigenvalues of $H$, and the $g_\nu$'s, which are the Demmler-Reinsch basis functions, are correlated for the thin plate and broken spline, then the two roughness penalties are similar and are measuring the same features of the data.

4 Examples

4.1 Example datasets


engine was run (a measure of the richness of the air/ethanol mix). The response variable is the concentration of nitric oxide and nitrogen dioxide in engine exhaust, normalized by the work done by the engine. The mini-triathlon data contain swimming, biking, and running times from 110 entrants in a Cary, NC event. In the example we use swimming and biking times to predict running times. The BD2 dataset comes from the results of a sequence of RSM designs in a DNA amplification experiment performed at Becton Dickinson Technologies. The explanatory variables are potassium phosphate (a buffer), magnesium acetate (a salt), and dimethyl sulfoxide (a solvent). The response variable is yield. In summary, the ethanol and mini-triathlon datasets each have 2 explanatory variables and the stack and BD2 datasets have 3.

[Figure: H eigenvalues for m=1 and m=2. Panels: minitri, stack.loss, ethanol, BD2. Axes: m=1 and m=2.]


Figure 2 shows image plots of the correlations of the thin plate spline and broken spline Demmler-Reinsch basis functions for these four data sets. Lines on the graphs indicate the division between the basis functions associated with the null space and those associated with the roughness penalty. The areas in the regions greater than the lines on the graph indicate the regions where the basis functions are those associated with the null space. The area enclosed within the lines indicates the basis functions associated with the roughness penalty, which these plots show are highly correlated. As expected, the area outside the lines is not as highly correlated, as the polynomials in the null space are not the same for the thin plate and broken splines. There is, however, some correlation between the basis functions representing the null space in the thin plate spline and those representing the roughness penalty in the broken spline. This indicates that the polynomial terms missing from the broken spline are being accounted for in some way by the roughness penalty. There is some difference between the two sets of basis functions, but overall, when ordered in ascending order by eigenvalue, the two sets of basis functions match closely beyond the first few functions related to the null space. Clearly, more theoretical work is needed to explicitly quantify the relationship given these promising empirical results.

[Figure 2: Correlation of Demmler-Reinsch basis functions for m=1 and m=2. Panels: minitri, stack.loss, ethanol, BD2. Axes: m=1 and m=2; color scale from -1.0 to 1.0.]

[Figure: Diagnostic plots. Panels for each fit: observed values versus predicted values, residuals versus predicted values, and GCV (average prediction error versus effective number of parameters).]

m=1, Broken spline: R^2 = 69.56%, Q^2 = 64.88%, RMSE = 3.571, Eff. df. = 8, Res. df. = 102, GCV min = 13.753

m=2, Thin plate spline: R^2 = 70.23%, Q^2 = 64.66%, RMSE = 3.552, Eff. df. = 9.6, Res. df. = 100.4, GCV min = 13.828


4.2 A larger example dataset

Figure 4: Comparison of H eigenvalues for the example dataset fit with a broken and thin plate spline. [Axes: m=2 and m=3.]


the broken spline than for the thin plate spline.

Figure 5: Correlation of basis functions for the example dataset fit with a broken and thin plate spline. [Axes: m=2 and m=3; color scale from -1.0 to 1.0.]


case the broken spline is doing a slightly better job of fitting the data.

m=2, Broken spline: R^2 = 45.12%, Q^2 = 10.18%, RMSE = 0.6713, Pure Error = 0.6929

m=3, Thin plate spline: R^2 = 37.5%, Q^2 = 6.3%, RMSE = 0.6761, Pure Error = 0.6929

Figure 6: Diagnostic plots for the example dataset fit with a broken and thin plate spline. [Panels for each fit: observed values versus predicted values, residuals versus predicted values, and GCV.]

5 Conclusions


It can be shown that the thin plate spline solution is the unique and optimal solution to the posed minimization problem. The broken spline is also a unique solution to the minimization problem given a fixed roughness penalty matrix as described.

In the thin plate spline, the roughness penalty is explicitly defined and makes intuitive sense from a physical perspective as the bending energy of a thin plate. In fact, positive definite matrices arise in many applications involving energy, and positive definiteness is clearly what makes sense for a roughness penalty that is to be minimized. With the broken spline, although the effective roughness penalty cannot be explicitly defined, it is closely related to the "bending energy" roughness penalty and empirically gives good results. The new insight is that good results can be achieved using the sum of polynomial terms and radial basis functions to estimate smooth functions without the restrictions imposed for thin plate splines. This opens the door for experimentation using this modeling technique for many more types of datasets.

A Definition of a thin plate spline

The thin plate spline estimator of $f$ (Wahba, 1990) is the minimizer of the following penalized sum of squares for a $d$-dimensional explanatory variable $x$:

$$S(f) = \frac{1}{n} \sum_{i=1}^{n} w_i (y_i - f(x_i))^2 + \lambda J_m(f) \quad \text{for } \lambda > 0. \tag{5}$$

Thus, an $\hat{f}$ minimizing (5) will result in $f$ with some level of smoothness dictated by $\lambda$. The function that minimizes this expression has the form

$$f(x_i) = \sum_{j=1}^{t} \phi_j(x_i)\, \beta_j + \sum_{k=1}^{N} E(\|x_i - x_k\|; m, d)\, \delta_k \tag{6}$$

where $\sum_{j=1}^{t}$

In this formulation, the $\phi_j(x_i)$ are a set of $t$ polynomial functions (of order $m-1$) and the $E(\|x_i - x_k\|; m, d)$ are a set of $N$ radial basis functions. It is assumed that $\beta_j$ is estimable.

The radial basis functions are explicitly defined as below:

$$E(r; m, d) = \begin{cases} a_{md}\, \|r\|^{(2m-d)} \log(\|r\|) & d \text{ even} \\ a_{md}\, \|r\|^{(2m-d)} & d \text{ odd} \end{cases}$$

where $a_{md}$ depends only on $m$ and $d$. One standard way of determining $\lambda$ is by generalized cross-validation (GCV) (Bates et al., 1987).
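The piecewise definition of the radial basis function translates directly into code. The sketch below is illustrative only: the function name `radial_basis` is hypothetical, the constant $a_{md}$ is left to the caller with a placeholder default of 1, and the function is restricted to the thin plate case $2m - d > 0$.

```python
import numpy as np

def radial_basis(r, m, d, a_md=1.0):
    """Evaluate E(r; m, d) for distances r >= 0.

    a_md is a caller-supplied constant; the default of 1.0 is a placeholder,
    not the value implied by m and d.
    """
    p = 2 * m - d
    if p <= 0:
        raise ValueError("this sketch covers only the thin plate case 2m - d > 0")
    r = np.asarray(r, dtype=float)
    if d % 2 == 0:
        safe_r = np.where(r > 0, r, 1.0)   # log(1) = 0 gives E = 0 at r = 0
        return a_md * r**p * np.log(safe_r)
    return a_md * r**p

# d = 2, m = 2: r^2 log r; d = 3, m = 2: r
values_even = radial_basis(np.array([0.0, 1.0, np.e]), m=2, d=2)
values_odd = radial_basis(np.array([0.0, 2.0]), m=2, d=3)
```

Evaluating this function on the matrix of pairwise distances between sites gives the matrix $M$ used in the main text, up to the constant $a_{md}$.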

B References

Bates, D.M., Lindstrom, M.J., Wahba, G. and Yandell, B.S. (1987). GCVPACK - Routines for generalized cross validation. Comm. Stat. Sim. Comp. 16, 263-297.

Berlin, pp. 85-100.

Nychka, D., Bailey, B., Ellner, S., Haaland, P. and O'Connell, M. (1996). FUNFITS: data analysis and statistical tools for estimating functions. Software and paper available from statlib.

StatSci (1993). S-PLUS Reference Manual, Vol. 2, Version 3.2. MathSoft, Inc., Seattle, WA.

Wahba, G. (1990). Spline Models for Observational Data. Society for Industrial Applied