
Nonlinear Regression Models

3.2 Modeling by Basis Functions

Figure 3.2 shows a plot of the measured acceleration y (in units of g, the acceleration due to gravity) of a crash dummy's head at time x (in milliseconds, ms) after the moment of impact in repeated motorcycle crash trials (Härdle, 1990). For data with such a complex nonlinear structure, it is difficult to capture the underlying phenomenon effectively with a polynomial model or a specific nonlinear function.

In this section, we describe more flexible models for explicating such nonlinear relationships. The models are based on splines, B-splines, radial basis functions, and other basis functions. In Section 3.3, we show that they can be treated in a unified way by the method known as basis expansions. We are chiefly concerned here with models that explicate the relationship between a single predictor variable x and a response variable y, and consider these nonlinear methods in the following framework.

Figure 3.2 Motorcycle crash trial data (n = 133). Horizontal axis: time x (milliseconds); vertical axis: acceleration y (gravity).

Suppose that we have n sets of observations $\{(x_i, y_i);\ i = 1, 2, \dots, n\}$ for a predictor variable x and a response variable y, and that the n values $x_i$ of the predictor variable are arranged in increasing order in the interval [a, b], so that $a < x_1 < x_2 < \cdots < x_n < b$. We assume that $y_i$ at each point $x_i$ is observed as

$$y_i = u(x_i) + \varepsilon_i, \qquad i = 1, 2, \dots, n, \tag{3.9}$$

with noise $\varepsilon_i$. Several types of flexible models that reflect the structure u(x) of the phenomenon have been proposed for separating the noise from the data and explicating this structure.

3.2.1 Splines

In basic concept, a spline is a special function that smoothly connects several low-degree polynomials to fit a model to the observed data, rather than fitting a single polynomial model to the data. The interval containing all n observed values $\{x_1, x_2, \dots, x_n\}$ is divided into subintervals, or segments, and polynomial models are fitted piecewise to the segments. Explicating a complex structure with a single polynomial model would typically require fitting a high-order polynomial to all of the observed data, which makes it difficult to obtain a stable model suitable for prediction. In contrast, a spline, as illustrated in Figure 3.3, fits low-order polynomial models piecewise to the data in the subintervals and forms smooth connections between the models of adjacent intervals, in the manner described next.

Figure 3.3 Fitting third-degree polynomials to the data in the subintervals $[a, t_1], [t_1, t_2], \dots, [t_m, b]$ and smoothly connecting adjacent polynomials at each knot. The horizontal axis marks a, the knots $t_1, \dots, t_8$, and b.

Let us divide the interval containing all values $\{x_1, x_2, \dots, x_n\}$ of the predictor variable into subintervals at $t_1 < t_2 < \cdots < t_m$. In a spline, these m (≤ n) points are known as knots. Suppose the spline is used to fit third-degree polynomials to the data in the subintervals $[a, t_1], [t_1, t_2], \dots, [t_m, b]$ and to connect adjacent polynomials smoothly at each knot (Figure 3.3). This means that the model is fitted under the constraint that, at each knot, the two adjacent third-degree polynomials take the same value and have the same first and second derivatives.

In practice, cubic splines are the most commonly used. The cubic spline function with knots $t_1 < t_2 < \cdots < t_m$ is given by

$$u(x; \theta) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \sum_{i=1}^{m} \theta_i (x - t_i)_+^3, \tag{3.10}$$

where $\theta = (\beta_0, \beta_1, \beta_2, \beta_3, \theta_1, \theta_2, \dots, \theta_m)^T$ and $(x - t_i)_+ = \max\{0, x - t_i\}$ (Figure 3.4).

Figure 3.4 Functions $(x - t_i)_+ = \max\{0, x - t_i\}$ and $(x - t_i)_+^3$ included in the cubic spline given by (3.10).

One further condition is applied at the two ends of the overall interval. It is known that third-degree polynomial fitting is unsuitable near such boundaries, as it tends to induce large variation in the estimated curve. For cubic splines, the condition is therefore added that a linear function be used in the two end subintervals $[a, t_1]$ and $[t_m, b]$. Such a cubic spline is known as a natural cubic spline and is given by the equation

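$$u(x; \theta) = \beta_0 + \beta_1 x + \sum_{i=1}^{m-2} \theta_i \{ d_i(x) - d_{m-1}(x) \}, \tag{3.11}$$

where $\theta = (\beta_0, \beta_1, \theta_1, \theta_2, \dots, \theta_{m-2})^T$ and

$$d_i(x) = \frac{(x - t_i)_+^3 - (x - t_m)_+^3}{t_m - t_i}. \tag{3.12}$$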
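As a minimal sketch of (3.11) and (3.12), the following Python code builds the natural cubic spline basis and checks numerically that every basis function is linear to the right of the last knot. The knot values, the helper names (`truncated_cube`, `d`, `natural_cubic_basis`), and the use of NumPy are assumptions made for illustration; they do not come from the text.

```python
import numpy as np

def truncated_cube(x, t):
    """(x - t)_+^3 = max(0, x - t)^3."""
    return np.maximum(0.0, x - t) ** 3

def d(x, knots, i):
    """d_i(x) of (3.12); i is a 0-based index into knots in this sketch."""
    t_i, t_m = knots[i], knots[-1]
    return (truncated_cube(x, t_i) - truncated_cube(x, t_m)) / (t_m - t_i)

def natural_cubic_basis(x, knots):
    """Columns 1, x, d_i(x) - d_{m-1}(x) for i = 1, ..., m-2, as in (3.11)."""
    x = np.asarray(x, dtype=float)
    m = len(knots)
    d_last = d(x, knots, m - 2)              # d_{m-1} in the 1-based notation
    cols = [np.ones_like(x), x]
    for i in range(m - 2):
        cols.append(d(x, knots, i) - d_last)
    return np.column_stack(cols)

knots = np.array([1.0, 2.0, 3.5, 5.0])       # hypothetical knots t_1 < ... < t_m
x_right = np.linspace(5.0, 8.0, 7)           # equally spaced points beyond t_m
B_right = natural_cubic_basis(x_right, knots)

# On an equally spaced grid, second differences of a linear function vanish.
second_diff = np.diff(B_right, n=2, axis=0)
print(np.allclose(second_diff, 0.0, atol=1e-8))
```

To the left of the first knot all truncated terms are zero, so only the columns 1 and x remain there and the fitted function is linear in that end interval as well.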

A key characteristic of (3.10) and (3.11) is that, although the spline function itself is nonlinear in x, it is a linear model in terms of its parameters $\beta_0, \beta_1, \dots, \theta_1, \theta_2, \dots$. Cubic splines are thus models represented by a linear combination of

$$1,\ x,\ x^2,\ x^3,\ (x - t_1)_+^3,\ (x - t_2)_+^3,\ \dots,\ (x - t_m)_+^3, \tag{3.13}$$

and natural cubic splines are models represented by a linear combination of

$$1,\ x,\ d_1(x) - d_{m-1}(x),\ d_2(x) - d_{m-1}(x),\ \dots,\ d_{m-2}(x) - d_{m-1}(x). \tag{3.14}$$

The functions given in these equations are referred to as basis functions. Thus a simple linear regression model is a linear combination of the basis functions $\{1, x\}$, and a polynomial regression model can be represented by a linear combination of the basis functions $\{1, x, x^2, \dots, x^p\}$.

Figure 3.5 Basis functions: (a) linear regression, $\{1, x\}$; (b) polynomial regression, $\{1, x, x^2, x^3\}$; (c) cubic splines; (d) natural cubic splines.
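Because the cubic spline is linear in its parameters, fitting it reduces to a linear regression on the basis functions in (3.13). The sketch below illustrates this under several assumptions that are not from the text: the data are simulated from a sine curve, the four knots are placed at equal spacing, and the parameters are estimated by ordinary least squares with `numpy.linalg.lstsq`.

```python
import numpy as np

def cubic_spline_basis(x, knots):
    """Columns 1, x, x^2, x^3, (x - t_1)_+^3, ..., (x - t_m)_+^3, as in (3.13)."""
    x = np.asarray(x, dtype=float)
    poly = [x ** k for k in range(4)]
    trunc = [np.maximum(0.0, x - t) ** 3 for t in knots]
    return np.column_stack(poly + trunc)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 100))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=x.size)  # illustrative data

knots = np.linspace(0.2, 0.8, 4)             # hypothetical equally spaced knots
B = cubic_spline_basis(x, knots)             # n x (m + 4) design matrix
theta, *_ = np.linalg.lstsq(B, y, rcond=None)
fitted = B @ theta
rms_residual = np.sqrt(np.mean((y - fitted) ** 2))
```

The same code fits a simple linear or polynomial regression if the design matrix is restricted to the columns $\{1, x\}$ or $\{1, x, \dots, x^p\}$; only the set of basis functions changes.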

Figure 3.5 (a) to (d) shows the basis functions of a linear regression model, a polynomial regression model, the cubic spline given in (3.13), and the natural cubic spline given in (3.14). In contrast to the polynomial model, the spline models are characterized by the allocation of basis functions to specific regions of the observed data. The influence of any single weight estimate on the overall model is therefore small, which can be seen to facilitate estimation of a model that captures the structure of the phenomenon throughout the entire region. Model estimation based on splines is described in Section 3.3.

Figure 3.6 A cubic B-spline basis function, formed by smoothly connecting four different third-order polynomials at the knots 2, 3, and 4.

3.2.2 B-splines

A spline, as described above, is constructed by fitting polynomials piecewise and connecting the polynomials of adjacent subintervals smoothly at the knots. A B-spline basis function, in contrast, is itself composed of several polynomials connected smoothly. Figure 3.6 shows a cubic B-spline basis function in which four different third-order polynomials are connected smoothly at the knots 2, 3, and 4.

In order to construct m basis functions $\{b_1(x), b_2(x), \dots, b_m(x)\}$, we set the knots $t_i$ as follows:

$$t_1 < t_2 < t_3 < t_4 = x_1 < \cdots < t_{m+1} = x_n < \cdots < t_{m+4}. \tag{3.15}$$

Then the n data points are partitioned into the (m − 3) subintervals $[t_4, t_5], [t_5, t_6], \dots, [t_m, t_{m+1}]$. Given these knots, we use the algorithm of de Boor (2001) to construct the B-spline basis functions.

In general, let us denote the j-th B-spline function of order r by $b_j(x; r)$. We first define the B-spline function of order 0 as follows:

$$b_j(x; 0) = \begin{cases} 1, & t_j \le x < t_{j+1}, \\ 0, & \text{otherwise}. \end{cases} \tag{3.16}$$

Starting from this B-spline function, we can then obtain the j-th B-spline function of order r by the following recursion formula:

$$b_j(x; r) = \frac{x - t_j}{t_{j+r} - t_j}\, b_j(x; r-1) + \frac{t_{j+r+1} - x}{t_{j+r+1} - t_{j+1}}\, b_{j+1}(x; r-1). \tag{3.17}$$

Figure 3.7 shows the first-, second-, and third-order B-spline functions for uniformly spaced knots. A B-spline function of order r is composed of r + 1 polynomial pieces of degree r (two straight lines, three quadratic polynomials, or four cubic polynomials), and each subinterval is likewise covered (piecewise) by r + 1 basis functions. For example, the third-order B-spline function, as shown in Figure 3.6, is composed of four cubic polynomials. As may be seen in the subintervals bounded by dotted lines in Figure 3.7, each subinterval $[t_i, t_{i+1}]$ (i = 4, ..., m) is covered by the four cubic B-spline basis functions $b_{i-3}(x; 3), b_{i-2}(x; 3), b_{i-1}(x; 3), b_i(x; 3)$, since by (3.16) and (3.17) the function $b_j(x; 3)$ is nonzero only on $[t_j, t_{j+4}]$.
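The recursion in (3.16) and (3.17) can be sketched directly in Python. The code below is an illustrative implementation under stated assumptions, not de Boor's own code: the knots are equally spaced integers, indices are 0-based, and the function name `bspline` is hypothetical. It evaluates the m cubic basis functions on the data range $[t_4, t_{m+1})$ and lets us check two properties of the construction.

```python
import numpy as np

def bspline(x, knots, j, r):
    """b_j(x; r) from (3.16)-(3.17); j is 0-based, knots strictly increasing."""
    x = np.asarray(x, dtype=float)
    if r == 0:
        return np.where((knots[j] <= x) & (x < knots[j + 1]), 1.0, 0.0)
    left = (x - knots[j]) / (knots[j + r] - knots[j])
    right = (knots[j + r + 1] - x) / (knots[j + r + 1] - knots[j + 1])
    return left * bspline(x, knots, j, r - 1) + right * bspline(x, knots, j + 1, r - 1)

m = 6
knots = np.arange(m + 4, dtype=float)        # t_1, ..., t_{m+4}: 0, 1, ..., 9 (assumed)
x = np.linspace(knots[3], knots[m], 50, endpoint=False)   # points in [t_4, t_{m+1})
B = np.column_stack([bspline(x, knots, j, 3) for j in range(m)])

row_sums = B.sum(axis=1)                     # the basis functions sum to one
active = np.flatnonzero(B[1] > 0)            # nonzero functions at a point in (t_4, t_5)
peak = bspline(np.array([2.0]), knots, 0, 3)[0]   # height of b_1 at its central knot
```

In this sketch `row_sums` equals one everywhere on the data range, `active` lists exactly four basis functions for a point strictly inside a subinterval, and `peak` is 2/3 for equally spaced knots, which agrees with the height of the curve in Figure 3.6.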

A third-order B-spline regression model approximates the structure of a phenomenon by a linear combination of cubic B-spline basis functions and is given by

$$y_i = \sum_{j=1}^{m} w_j b_j(x_i; 3) + \varepsilon_i, \qquad i = 1, 2, \dots, n. \tag{3.18}$$

Figure 3.8 shows a curve fit in which a third-order B-spline regression model is fitted to a set of simulated data. With $u(x) = \exp\{-x \sin(2\pi x)\} + 0.5$ as the true structure, represented in the figure by the dotted line, we generated data (x, y) using $y = u(x) + \varepsilon$ with Gaussian noise $\varepsilon$. The solid line represents the fitted curve. With good estimates of the model parameters, it is then possible to capture the nonlinear structure of the data.
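A simulation of this kind can be sketched as follows. The true structure $u(x) = \exp\{-x \sin(2\pi x)\} + 0.5$ is taken from the text; everything else is an assumption for illustration: the sample size, the range [0, 1] of x, the noise standard deviation 0.1, the number of basis functions m = 10, equally spaced knots, and estimation of the weights by ordinary least squares (the text defers estimation to Section 3.3).

```python
import numpy as np

def bspline(x, t, j, r):
    """Recursion (3.16)-(3.17) for b_j(x; r); j is 0-based in this sketch."""
    if r == 0:
        return np.where((t[j] <= x) & (x < t[j + 1]), 1.0, 0.0)
    return ((x - t[j]) / (t[j + r] - t[j]) * bspline(x, t, j, r - 1)
            + (t[j + r + 1] - x) / (t[j + r + 1] - t[j + 1]) * bspline(x, t, j + 1, r - 1))

def u(x):
    """True structure used for the simulated data."""
    return np.exp(-x * np.sin(2 * np.pi * x)) + 0.5

rng = np.random.default_rng(1)
n, m = 100, 10                               # assumed sample size and basis count
x = np.sort(rng.uniform(0.0, 1.0, n))        # assumed range of x
y = u(x) + rng.normal(scale=0.1, size=n)     # assumed noise standard deviation

# Equally spaced knots t_1 < ... < t_{m+4} with t_4 = x_1 and t_{m+1} = x_n, as in (3.15).
h = (x[-1] - x[0]) / (m - 3)
t = x[0] + h * (np.arange(m + 4) - 3)

B = np.column_stack([bspline(x, t, j, 3) for j in range(m)])   # n x m design matrix
w, *_ = np.linalg.lstsq(B, y, rcond=None)    # least squares weights w_1, ..., w_m
fitted = B @ w
rmse_to_truth = np.sqrt(np.mean((fitted - u(x)) ** 2))
```

Changing `m` changes the number of knots and hence the smoothness of the fitted curve, which is the tuning choice discussed in the paragraph that follows.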

When applying splines in practical situations, we still need to determine the appropriate number and positions of the knots. Treating the knot positions as parameters to be estimated can lead to extreme computational difficulty. One approach to this problem is to position the knots at equal intervals in the observed range of the data and to adjust the smoothness of the fitted curve by changing the number of knots. In the following section, we consider this problem within the framework of model selection.

Figure 3.7 Plots of the first-, second-, and third-order B-spline functions. As may be seen in the subintervals bounded by dotted lines, each subinterval is covered (piecewise) by the polynomial order plus one basis functions.

3.2.3 Radial Basis Functions

Figure 3.8 A third-order B-spline regression model fitted to a set of data generated from $y = u(x) + \varepsilon$ with $u(x) = \exp\{-x \sin(2\pi x)\} + 0.5$ and Gaussian noise $\varepsilon$. The fitted curve and the true structure are represented by the solid line and the dotted line, respectively, together with the cubic B-spline bases.

Here let us consider a model based on a set of n observations $\{(\boldsymbol{x}_i, y_i);\ i = 1, 2, \dots, n\}$ for a p-dimensional vector of predictor variables $\boldsymbol{x} = (x_1, x_2, \dots, x_p)^T$ and a response variable Y, where $\boldsymbol{x}_i = (x_{i1}, x_{i2}, \dots, x_{ip})^T$. In general, a nonlinear function $\phi(z)$ depending on the Euclidean distance $z = \|\boldsymbol{x} - \boldsymbol{\mu}\|$ between the p-dimensional vectors $\boldsymbol{x}$ and $\boldsymbol{\mu}$ is known as a radial basis function, and a regression model based on radial basis functions is given by

$$y_i = w_0 + \sum_{j=1}^{m} w_j \phi_j\left( \|\boldsymbol{x}_i - \boldsymbol{\mu}_j\| \right) + \varepsilon_i, \qquad i = 1, 2, \dots, n, \tag{3.19}$$

(Bishop, 1995, Chapter 5; Ripley, 1996, Section 4.2), where $\boldsymbol{\mu}_j$ is a p-dimensional center vector that determines the position of the basis function. The function often employed in practice is the Gaussian basis function given by

$$\phi_j(\boldsymbol{x}) \equiv \exp\left( -\frac{\|\boldsymbol{x} - \boldsymbol{\mu}_j\|^2}{2 h_j^2} \right), \qquad j = 1, 2, \dots, m, \tag{3.20}$$

where the quantity $h_j^2$ represents the spread of the function and, together with the number of basis functions, plays the role of a parameter that controls the smoothness of the fitted model. Other commonly used nonlinear functions include the thin plate spline function $\phi(z) = z^2 \log z$ and the inverse polynomial function $\phi(z) = (z^2 + h^2)^{-\gamma}$ ($\gamma > 0$).
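The Gaussian basis function (3.20) is straightforward to evaluate for all observations and all centers at once. The following sketch assumes NumPy; the function name `gaussian_basis` and the particular centers, spreads, and points are hypothetical values chosen for illustration.

```python
import numpy as np

def gaussian_basis(X, centers, h2):
    """n x m matrix with entries exp(-||x_i - mu_j||^2 / (2 h_j^2)), as in (3.20)."""
    X = np.atleast_2d(X)
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * h2[None, :]))

centers = np.array([[0.0, 0.0], [1.0, 1.0]])   # hypothetical centers mu_1, mu_2 (p = 2)
h2 = np.array([0.5, 2.0])                      # hypothetical spreads h_1^2, h_2^2
X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]])
Phi = gaussian_basis(X, centers, h2)
```

Each basis function equals one at its own center and decays with distance; the larger spread of the second function makes it decay more slowly, so at the distant point (3, 3) the second column of `Phi` is much larger than the first.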

The unknown parameters included in the nonlinear regression model with Gaussian basis functions are $\{\boldsymbol{\mu}_1, \dots, \boldsymbol{\mu}_m, h_1^2, \dots, h_m^2\}$ in addition to the coefficients $\{w_0, w_1, \dots, w_m\}$. All of these parameters could in principle be estimated simultaneously, but this raises questions concerning the uniqueness of the estimates and convergence to local solutions in numerical optimization, and, together with the selection of the number of basis functions, would require an extremely long computation time. In practice, a two-step estimation method is used as an effective way of avoiding these problems: first the basis functions are determined from the data on the predictor variables, and then a model with these known basis functions is fitted to the data (Moody and Darken, 1989; Kawano and Konishi, 2007; Ando et al., 2008).

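One such method employs clustering to determine the basis functions: a technique such as k-means clustering or the self-organizing map described in Chapter 10 is used to partition the data $\{\boldsymbol{x}_1, \boldsymbol{x}_2, \dots, \boldsymbol{x}_n\}$ on the p predictor variables into m clusters $C_1, C_2, \dots, C_m$, where m is the number of basis functions. The centers $\boldsymbol{\mu}_j$ and width parameters $h_j^2$ are then determined by

$$\hat{\boldsymbol{\mu}}_j = \frac{1}{n_j} \sum_{\boldsymbol{x}_i \in C_j} \boldsymbol{x}_i, \qquad \hat{h}_j^2 = \frac{1}{n_j} \sum_{\boldsymbol{x}_i \in C_j} \|\boldsymbol{x}_i - \hat{\boldsymbol{\mu}}_j\|^2, \tag{3.21}$$

where $n_j$ is the number of observations that belong to the j-th cluster $C_j$. Substituting these estimates into the Gaussian basis function (3.20) gives us a set of m basis functions

$$\hat{\phi}_j(\boldsymbol{x}) = \exp\left( -\frac{\|\boldsymbol{x} - \hat{\boldsymbol{\mu}}_j\|^2}{2 \hat{h}_j^2} \right), \qquad j = 1, 2, \dots, m. \tag{3.22}$$

The nonlinear regression model based on the Gaussian basis functions is then given by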

$$y_i = w_0 + \sum_{j=1}^{m} w_j \hat{\phi}_j(\boldsymbol{x}_i) + \varepsilon_i, \qquad i = 1, 2, \dots, n. \tag{3.23}$$

As described in the next section, the key advantage of determining the basis functions in advance from the data on the predictor variables is that it facilitates estimation of the nonlinear regression model.
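The two-step procedure can be sketched as follows. This is an illustration under stated assumptions rather than the method of the cited papers: the clustering is a plain Lloyd-type k-means written in NumPy, each center is taken to be the cluster mean, each width is taken to be the within-cluster mean squared distance to that center, the weights are estimated by ordinary least squares, and the data are simulated. All function and variable names are hypothetical.

```python
import numpy as np

def kmeans(X, m, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns a cluster label for each row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=m, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(m):
            if np.any(labels == j):          # keep the old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels

rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(150, 1))     # illustrative predictors (p = 1)
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(scale=0.1, size=150)

# Step 1: determine the basis functions from the predictors only.
labels = kmeans(X, m=6)
used = np.unique(labels)                     # clusters that actually contain data
centers = np.array([X[labels == j].mean(axis=0) for j in used])
h2 = np.array([((X[labels == j] - c) ** 2).sum(axis=1).mean()   # assumed width rule
               for j, c in zip(used, centers)])

# Step 2: with the basis functions fixed, the model is linear in w_0, w_1, ..., w_m.
sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
Phi = np.column_stack([np.ones(len(X)), np.exp(-sq_dist / (2.0 * h2))])
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
fitted = Phi @ w
```

Once step 1 is complete, `Phi` is an ordinary design matrix, so step 2 involves no nonlinear optimization over the centers or widths. This is the sense in which fixing the basis functions in advance simplifies estimation.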