5.1.3 Exploring Non-linear Relationships through Splines

In the aggregate dataset of local authorities (stratified by age), the socio-economic covariates are continuous, expressed as the percentage of mothers in the age/LA stratum who have a given individual-level binary characteristic, e.g. white/non-white ethnicity or married/unmarried status. We wish to explore whether the relationship between stillbirth and these covariates is non-linear. For example, is there a threshold value beyond which a covariate's relationship to stillbirth changes? Spline regression provides a flexible way of modelling this by dividing the data's range into smaller segments and fitting a simple model to each. This method is attractive because its basis functions (described below) can be treated as ordinary covariates in regression (Kagerer, 2013).

A further advantage is that spline regression fits within the class of linear regressions and so is generally less computationally intensive than other semi-parametric methods. de Boor (2001) provides a detailed overview of the method.

The Truncated Power Basis

Piecewise linear splines. The simplest spline (of degree 1) is formed by linking linear segments. The covariate data is split into sub-intervals and a linear function is fitted over each sub-interval. Each line may differ in gradient but must connect to the adjacent lines at the interval boundaries, known as knots. A spline with K interior knots divides the data into K + 1 sub-intervals. A linear spline regression function for a single covariate is

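\[
y = \beta_0 + \beta_1 x + \sum_{k=1}^{K} \alpha_k (x - \xi_k)_+ \tag{5.12}
\]

where

\[
(x - \xi_k)_+ =
\begin{cases}
0 & x < \xi_k \\
x - \xi_k & x \geq \xi_k.
\end{cases} \tag{5.13}
\]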

The truncation function affects only the fit of the spline to the right of knot ξk. For values of x in the first segment, the regression function is simply y = β0 + β1x. In the next segment, the regression function is

y = β0 + β1x + α1(x − ξ1) = (β0 − α1ξ1) + (β1 + α1)x, which continues from the end point of the previous segment with the new gradient β1 + α1. More generally, the gradient of the line between knots ξk and ξk+1 is β1 + α1 + ... + αk. The parameters β0, β1, α1, ..., αK can be estimated by maximum likelihood, just like the coefficients of ordinary covariates.
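The construction in Equations 5.12 and 5.13 can be sketched in a few lines. The following is a minimal illustration in Python (not the R code used for the analysis); the function names, knot positions and coefficient values are hypothetical and chosen only to show how each knot adds one column to the design matrix and how the segment gradients accumulate.

```python
def linear_spline_basis(x, knots):
    """Return the design row [1, x, (x - k1)_+, ..., (x - kK)_+] for one value x."""
    return [1.0, x] + [max(0.0, x - k) for k in knots]


def segment_gradients(beta1, alphas):
    """Gradient on each of the K + 1 segments: beta1, beta1 + a1, beta1 + a1 + a2, ..."""
    gradients = [beta1]
    for a in alphas:
        gradients.append(gradients[-1] + a)
    return gradients


def predict(x, beta0, beta1, alphas, knots):
    """Evaluate the linear spline regression function at x."""
    coefs = [beta0, beta1] + list(alphas)
    return sum(c * b for c, b in zip(coefs, linear_spline_basis(x, knots)))


# Hypothetical knots and coefficients, for illustration only.
knots = [20.0, 50.0]
beta0, beta1, alphas = 1.0, 0.5, [-0.3, 0.4]

print(linear_spline_basis(35.0, knots))   # only the first knot term is active
print(segment_gradients(beta1, alphas))   # one gradient per segment
print(predict(35.0, beta0, beta1, alphas, knots))
```

Because the truncated terms enter the model as ordinary columns, any standard regression routine can estimate the coefficients without modification.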

Higher order splines. Linear splines can fit data as well as, or even better than, higher degree polynomials, but sometimes a smoother line is desired. We can achieve this by constructing the spline from segments of higher order. Using quadratic (or, more commonly, cubic) polynomial segments keeps the curve relatively easy to interpret whilst achieving the smoother line of a higher order polynomial. A spline of degree D is

\[
y = \beta_0 + \beta_1 x + \dots + \beta_D x^D + \sum_{k=1}^{K} \alpha_k (x - \xi_k)_+^D \tag{5.14}
\]

with K knots and the same truncation function (x − ξk)+ from Equation 5.13 raised to the power of D.

The number of parameters is D + K + 1. Such a spline function is continuous, along with its first D − 1 derivatives. The Dth derivative is, in general, discontinuous at the knot points.
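As a rough check on the parameter count, the degree D truncated power basis can be written out directly. This is a minimal Python sketch, not the code used for the analysis; the function name and the example knots are hypothetical.

```python
def truncated_power_basis(x, knots, degree):
    """Return [1, x, ..., x^D, (x - k1)_+^D, ..., (x - kK)_+^D] for one value x."""
    if degree < 1:
        # Assumption of this sketch: degree 0 is excluded, because Python
        # evaluates 0.0 ** 0 as 1.0, which would break the truncation.
        raise ValueError("degree must be at least 1")
    powers = [x ** d for d in range(degree + 1)]
    truncated = [max(0.0, x - k) ** degree for k in knots]
    return powers + truncated


# Hypothetical example: a cubic spline (D = 3) with K = 3 knots.
row = truncated_power_basis(3.0, [1.0, 2.0, 4.0], 3)
print(row)
print(len(row))  # D + K + 1 columns, one per parameter
```

The knot at 4.0 lies to the right of x = 3.0, so its truncated term contributes zero to the row.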

Unfortunately, a common issue with the truncated power basis is that the fitting algorithm can become numerically unstable when the basis functions are highly correlated (Ruppert et al., 2003). An alternative approach is to use a B-spline basis, described below.

The B-spline Basis

B-spline basis functions can be used in the same circumstances as truncated polynomials. A B-spline consists of connected polynomial pieces, and every spline can be written as a linear combination of B-splines. Unlike truncated power functions, however, B-splines are not highly correlated and always take values between 0 and 1, so the algorithms used to fit them are much more stable (Birk, 1994).

B-splines of degree 1 form the basis for piecewise linear fits. As a simple example, consider degree 1 B-splines on a domain x with four knots (x0, x1, x2, x3). The domain splits into three linear pieces: x0 to x1, x1 to x2 and x2 to x3. Figure 5.1 shows the three basis functions. Within each interval, the corresponding B-spline function goes from 0 to 1 with a proportional slope, then declines to 0 at the following knot. On either side of the boundary knots x0 and x3, the B-splines are equal to 0. The B-spline values each form a column in a new predictor matrix for regression.

Figure 5.1: Degree 1 B-splines for linear piecewise fit with equidistant knots

More generally, a B-spline of degree D has the following properties:

• it consists of D + 1 polynomial pieces, each of degree D;

• at the knots, derivatives up to order D − 1 are continuous;

• it is positive on a domain spanned by D + 2 knots and zero elsewhere;

• at a given x, D + 1 B-splines are non-zero.

(Eilers and Marx, 1996)

Given a domain divided into K − 1 intervals by K knots (including the boundary knots at either side), the number of B-splines of degree D in the regression is n = D + K − 1.

Let Bj(x; D) denote the value at x of the jth B-spline of degree D. Then the regression function y(x) is

\[
y(x) = \sum_{j=1}^{n} \alpha_j B_j(x; D)
\]

with estimated parameters αj, j = 1, ..., n.

In R, the B-spline basis was constructed with the package splines.
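To make the properties of the B-spline basis concrete, the basis values can be computed with the standard Cox-de Boor recursion. The sketch below is in Python and is not the routine from the R splines package. It assumes that each boundary knot is repeated D extra times so that the basis spans the whole domain; that is a common convention, but an assumption here. The function name and the example knots are hypothetical.

```python
def bspline_basis(x, knots, degree):
    """Values at x of all B-splines of the given degree (Cox-de Boor recursion).

    `knots` holds the K knots in increasing order, boundary knots included.
    """
    # Assumption: repeat each boundary knot `degree` extra times.
    t = [knots[0]] * degree + list(knots) + [knots[-1]] * degree
    # Degree 0: indicator of [t_j, t_{j+1}); the last non-empty interval is
    # closed on the right so that the upper boundary knot is covered.
    last = max(j for j in range(len(t) - 1) if t[j] < t[j + 1])
    values = [
        1.0 if (t[j] <= x < t[j + 1]) or (j == last and x == t[j + 1]) else 0.0
        for j in range(len(t) - 1)
    ]
    # Raise the degree one step at a time, skipping zero-width spans.
    for d in range(1, degree + 1):
        new_values = []
        for j in range(len(t) - d - 1):
            left = right = 0.0
            if t[j + d] > t[j]:
                left = (x - t[j]) / (t[j + d] - t[j]) * values[j]
            if t[j + d + 1] > t[j + 1]:
                right = (t[j + d + 1] - x) / (t[j + d + 1] - t[j + 1]) * values[j + 1]
            new_values.append(left + right)
        values = new_values
    return values


# Hypothetical example: cubic B-splines (D = 3) on K = 5 equidistant knots.
row = bspline_basis(1.5, [0.0, 1.0, 2.0, 3.0, 4.0], 3)
print(len(row))                       # number of basis functions, D + K - 1
print(sum(1 for v in row if v > 0))   # B-splines that are non-zero at x
print(sum(row))                       # the basis values sum to 1 (up to rounding)
```

Each returned list is one row of the predictor matrix; evaluating the function at every observed x gives the columns that enter the regression.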

Knot selection. Decisions concerning the number and placement of knots are very important. On the question of which knot sequence would produce the best fitting regression model, Birk (1994, pg. 69) laments that “there is little or no formal theory to justify” the choice. A simple starting point is to use equidistant knots or to place an equal number of observations between knots. In practice, an examination of the data combined with knowledge of the subject matter may help determine appropriate knot selection.

The goal for this analysis is not necessarily to produce the smoothest fit but to determine where “breakpoints” in the continuous covariates might lie, that is, threshold values beyond which a covariate's association with stillbirth changes. We employ a manual yet systematic approach. Firstly, each of the continuous covariates is investigated one at a time within the model, with the other covariates remaining linear. Beginning with splines of degree 1 (piecewise linear models), spline functions are fitted with evenly distributed segments, i.e. 1 knot at the median, 2 knots at the 33% and 66% centiles, 3 knots (quartiles), 4 knots (quintiles) and 9 knots (deciles). From the patterns observed in these results, we systematically experiment with knot placement until an optimal selection is found, with the relative quality of each selection compared via the models' AIC values.
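The initial knot schemes described above can be generated mechanically. The following Python sketch uses the standard library's statistics.quantiles to place knots so that each segment holds roughly the same number of observations, together with the usual AIC formula for comparing fitted models. The function names and the placeholder covariate values are hypothetical, and the quantile interpolation rule ("inclusive") is an assumption that may differ slightly from the rule used in the analysis.

```python
import statistics


def candidate_knots(values, n_knots):
    """Interior knots splitting `values` into n_knots + 1 equally populated groups."""
    return statistics.quantiles(values, n=n_knots + 1, method="inclusive")


def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


# Placeholder covariate values (percentages 0..100), for illustration only.
percentages = list(range(101))

# Median, tertiles, quartiles, quintiles and deciles, as in the text.
for n_knots in (1, 2, 3, 4, 9):
    print(n_knots, candidate_knots(percentages, n_knots))
```

Each candidate knot set would then be used to build a degree 1 spline basis for the covariate, the model refitted, and the resulting AIC values compared across schemes.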