
Ph.D. course: “Regression models”

Non-linear effect of a quantitative covariate

PKA & LTS Sect. 4.2.1, 4.2.2

6 May 2013

www.biostat.ku.dk/~pka/regrmodels13

Per Kragh Andersen


Linear effects

We have studied models with the linear predictor:

$LP_i = a + b x_i$ for a quantitative covariate x, both for

• quantitative outcomes (b is a mean value difference),

• binary outcomes (b is a log(odds ratio)),

• survival times (“$a = \log(h_0(t))$”, b is a log(hazard ratio)).

The slope b has a simple interpretation: the change in the linear predictor per 1-unit change in x.
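A minimal sketch of the slope interpretation, using hypothetical values a = -1 and b = 0.4 (not estimates from the course data; the function name is our own). Whatever the starting value of x, a 1-unit increase changes the linear predictor by exactly b. For a binary outcome exp(b) is then the odds ratio per unit, and for survival times the hazard ratio per unit. For a quantitative outcome, b itself is the mean difference.

```python
import math


def linear_predictor(a: float, b: float, x: float) -> float:
    """LP = a + b*x for one quantitative covariate x."""
    return a + b * x


# Hypothetical values, not estimates from the course data
a, b = -1.0, 0.4

for x in (0.0, 2.5, 10.0):
    diff = linear_predictor(a, b, x + 1) - linear_predictor(a, b, x)
    # diff equals b whatever x is; exp(diff) is the odds ratio / hazard ratio per unit
    print(f"x = {x:4.1f}: LP change = {diff:.3f}, exp(change) = {math.exp(diff):.3f}")
```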

Linearity is simple but restrictive, and we need

• ways of checking the assumption of linearity

• alternative models to use when linearity fits poorly


Approaches

Different possibilities are:

• transformation of x, i.e. $LP_i = a + b f(x_i)$, where the function f must be known,

• scatterplot smoother: fine for description, not optimal for inference (Figure 1),

• methods based on choosing cut-points for x (Sect. 4.2.1),

• polynomials (Sect. 4.2.2).

The last three approaches may suggest suitable transformations f.

As an illustration, we will model the effect of bilirubin on survival in the PBC-3 trial, but the ideas carry over to quantitative and binary outcomes.

Figure 1 (plot: probability of fetal death against drinks per week): Scatterplot smoother for the binary outcome y (fetal death) plotted against the covariate x (alcohol consumption). The distribution of x is indicated along the horizontal axis.


Using cut-points for the covariate

• Piecewise constant effect

• Linear regression splines

• Quadratic/cubic (restricted) regression splines


Figure 2 (four panels, each plotting the linear predictor against x): Illustration of models for the linear predictor that are alternatives to the simple linear model. The dotted curve represents the true relationship.


Bilirubin in quintiles

Cox regression model with bilirubin categorized in quintiles

$$
h_i(t) = \begin{cases}
h_0(t) & \text{if } x_i \le 10.3,\\
h_0(t)\exp(b_1) & \text{if } x_i \in (10.3, 16],\\
h_0(t)\exp(b_2) & \text{if } x_i \in (16, 26.7],\\
h_0(t)\exp(b_3) & \text{if } x_i \in (26.7, 51.4],\\
h_0(t)\exp(b_4) & \text{if } x_i > 51.4.
\end{cases}
$$

With dummy variables $I(x_i \le 10.3), \dots, I(x_i > 51.4)$, the linear predictor for individual i is the piecewise constant function

$$LP_i(t) = \log(h_0(t)) + b_1 I(10.3 < x_i \le 16) + \dots + b_4 I(x_i > 51.4).$$

The estimates in this model are (standard errors in parentheses): $\hat b_1 = -0.537$ (0.708), $\hat b_2 = 1.120$ (0.494), $\hat b_3 = 1.698$ (0.460), $\hat b_4 = 2.670$ (0.437).
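A minimal sketch of how the dummy covariates and the fitted piecewise constant linear predictor fit together, using the cut-points and estimates quoted above. The names `CUTS`, `B_HAT`, `quintile_dummies` and `piecewise_lp` are our own, and standard errors are ignored. The lowest group ($x_i \le 10.3$) is the reference, so exponentiating the linear predictor (minus $\log h_0(t)$) gives the estimated hazard ratio relative to that group.

```python
import math

# Quintile cut-points for bilirubin as given in the slides
CUTS = (10.3, 16.0, 26.7, 51.4)
# Estimated log hazard ratios b1..b4 (lowest quintile group is the reference)
B_HAT = (-0.537, 1.120, 1.698, 2.670)


def quintile_dummies(x: float) -> list:
    """Indicators I(10.3 < x <= 16), I(16 < x <= 26.7), I(26.7 < x <= 51.4), I(x > 51.4)."""
    upper = CUTS[1:] + (math.inf,)
    return [1 if lo < x <= hi else 0 for lo, hi in zip(CUTS, upper)]


def piecewise_lp(x: float) -> float:
    """Estimated linear predictor minus log(h0(t)): the log hazard ratio vs. the lowest group."""
    return sum(b * d for b, d in zip(B_HAT, quintile_dummies(x)))


for x in (5.0, 12.0, 20.0, 40.0, 100.0):
    lp = piecewise_lp(x)
    print(f"bilirubin = {x:6.1f}: dummies = {quintile_dummies(x)}, LP = {lp:+.3f}, HR = {math.exp(lp):.2f}")
```

Note how the function jumps at each cut-point and is flat in between, which is exactly the behaviour criticized in the comments on this model.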

Figure 3 (plot: linear predictor against bilirubin, 0 to 400): Estimated linear predictor (solid curve) for the PBC study assuming an effect of serum bilirubin that is piecewise constant in quintile groups. The dashed curve joins values of the linear predictor for the scores attached to each interval of bilirubin. The distribution of bilirubin is shown on the horizontal axis.


Using interval scores

$$
s(x_i) = \begin{cases}
7.66 & \text{if } x_i \le 10.3,\\
13.26 & \text{if } x_i \in (10.3, 16],\\
20.23 & \text{if } x_i \in (16, 26.7],\\
37.32 & \text{if } x_i \in (26.7, 51.4],\\
148.83 & \text{if } x_i > 51.4.
\end{cases}
$$

The model with linear predictor $\log(h_0(t)) + b\,s(x_i)$ is nested in the model with bilirubin categorized, and the likelihood ratio test for linearity is $19.0 \sim \chi^2_3$, P = 0.0003.

However, the model with a linear effect of x is not nested in the categorized model.
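The quoted P-value can be checked directly. The categorized model has four parameters ($b_1, \dots, b_4$) and the score model one ($b$), hence 3 degrees of freedom. Below is a minimal standard-library sketch of the $\chi^2$ upper tail for integer degrees of freedom, using the closed forms: a finite Poisson-type sum for even df, and an erfc term plus a half-integer sum for odd df. The name `chi2_sf` is our own; `scipy.stats.chi2.sf` computes the same quantity.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper tail probability P(X > x) for X ~ chi-square with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{j=1}^{(df-1)/2} (x/2)^(j-1/2) / Gamma(j+1/2)
    total = math.erfc(math.sqrt(half))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (j - 0.5) / math.gamma(j + 0.5)
    return total


# LR test of the score model within the categorized model: 19.0 on 3 df
print(f"P = {chi2_sf(19.0, 3):.4f}")
```

The same function applies to the other likelihood ratio tests in these slides, e.g. `chi2_sf(39.13, 4)` for the linear spline test.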

The fit of such models may instead be evaluated graphically, using plots of pseudo-observations (Figures 4 and 5).
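Pseudo-observations are not defined on these slides. The sketch below assumes the usual jackknife definition. For a fixed time t, the pseudo-observation for individual i is n·Ŝ(t) - (n-1)·Ŝ₋ᵢ(t), where Ŝ is the Kaplan-Meier estimate and Ŝ₋ᵢ is the estimate with individual i left out. In Figures 4 and 5 such values are smoothed and shown on the scale of the linear predictor. The transformation used there is not given in the text, so this sketch stops at the raw pseudo-values. The data and function names are hypothetical.

```python
def km_survival(times, events, t):
    """Kaplan-Meier estimate of S(t); events[i] is 1 for an observed failure, 0 if censored."""
    surv = 1.0
    failure_times = sorted({ti for ti, e in zip(times, events) if e == 1 and ti <= t})
    for u in failure_times:
        at_risk = sum(1 for ti in times if ti >= u)
        deaths = sum(1 for ti, e in zip(times, events) if ti == u and e == 1)
        surv *= 1.0 - deaths / at_risk
    return surv


def pseudo_observations(times, events, t):
    """Jackknife pseudo-values n*S(t) - (n-1)*S_{-i}(t), one per individual."""
    n = len(times)
    full = km_survival(times, events, t)
    pseudo = []
    for i in range(n):
        loo_times = times[:i] + times[i + 1:]
        loo_events = events[:i] + events[i + 1:]
        pseudo.append(n * full - (n - 1) * km_survival(loo_times, loo_events, t))
    return pseudo


# Hypothetical follow-up times (years) and failure indicators
times = [0.5, 1.2, 1.8, 2.4, 3.0, 3.6]
events = [1, 0, 1, 1, 0, 1]
print([round(p, 3) for p in pseudo_observations(times, events, t=2.0)])
```

With no censoring, the pseudo-values reduce to the indicators I(T_i > t), which is why they can be treated as individual-level "outcomes" and plotted against a covariate.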


Figure 4 (four panels, at time = 0.71, 1.18, 2.16 and 3.19, each plotting against bilirubin from 0 to 400): The estimated linear predictor for the PBC3 study (assuming an effect of serum bilirubin which is piecewise constant in quintile groups) plotted against bilirubin together with smoothed pseudo-observations. The four panels correspond to quintiles of observed event times.


Figure 5 (four panels, at time = 0.71, 1.18, 2.16 and 3.19, each plotting against log(bilirubin) from 1 to 6): The estimated linear predictor for the PBC3 study (assuming an effect of log(serum bilirubin) which is piecewise constant in quintile groups) plotted against log(bilirubin) together with smoothed pseudo-observations. The four panels correspond to quintiles of observed event times.

Comments

The model with a piecewise constant effect of x:

• is easy to fit and easy to report,

• does not contain the model with a linear effect of x as a sub-model,

• does not provide a smooth (in fact, not even continuous) relationship,

• is sensitive to the choice of cut-points.


Regression splines

A regression “spline” is a function of the form

$$x_i^+ = (x_i - r)\, I(x_i > r)$$

for some threshold r. Thus, $x_i^+ = 0$ for $x_i \le r$, and it increases linearly with $x_i$ from $x_i = r$ upwards.

If we compose the linear predictor of several such spline terms, one for each of the cut-points $r_1, \dots, r_4$:

$$LP_i = a + b x_i + b_1 x_{i1}^+ + \dots + b_4 x_{i4}^+$$

we get a broken linear function (Figure 2). The parameter $b_j$ is the change of slope at cut-point j: the slope before the first cut-point is b, the slope between the first and second cut-points is $b + b_1$, etc.

Linearity: $b_1 = b_2 = \dots = b_4 = 0$.
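A minimal sketch of the linear spline covariates and of the "change of slope" interpretation. The knots and coefficients are hypothetical, chosen only for illustration, and the function names are our own. The slope measured within each interval between knots equals b plus the sum of the $b_j$ for the knots already passed. Setting all $b_j$ to zero gives back a straight line, which is the linearity hypothesis.

```python
def linear_spline_basis(x: float, knots) -> list:
    """Covariates [x, (x - r_1)+, ..., (x - r_k)+] for a linear regression spline."""
    return [x] + [max(x - r, 0.0) for r in knots]


def spline_lp(x: float, a: float, b: float, bs, knots) -> float:
    """LP = a + b*x + sum_j b_j * (x - r_j)+."""
    coefs = [b] + list(bs)
    return a + sum(c * z for c, z in zip(coefs, linear_spline_basis(x, knots)))


# Hypothetical knots and coefficients, chosen only for illustration
knots = (2.0, 4.0, 6.0, 8.0)
a, b, bs = 0.0, 0.1, (0.5, -0.3, 0.2, -0.1)
for lo, hi in [(0.0, 1.0), (2.5, 3.5), (4.5, 5.5), (6.5, 7.5), (8.5, 9.5)]:
    # Both ends lie in the same interval, so this difference is that interval's slope
    slope = spline_lp(hi, a, b, bs, knots) - spline_lp(lo, a, b, bs, knots)
    print(f"slope on [{lo}, {hi}]: {slope:.2f}")
```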


Results for PBC-3

For the PBC3 example we get the estimates:

$\hat b = -0.245$ (0.182), $\hat b_1 = 0.460$ (0.309), $\hat b_2 = -0.122$ (0.185), $\hat b_3 = -0.0592$ (0.0654), $\hat b_4 = -0.0278$ (0.0174).

The likelihood ratio test for linearity is $39.13 \sim \chi^2_4$, P < 0.001.

The linear spline function is now continuous but still not smooth.
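Turning the PBC-3 estimates into slopes per interval is simple arithmetic, since each $\hat b_j$ is a change of slope. This slide does not state where the knots are. The sketch assumes they are the quintile cut-points used earlier, and that assumption affects only the interval labels, not the slopes. The variable names are our own.

```python
from itertools import accumulate

# Estimates from the slides: b (slope below the first cut-point) and b1..b4 (slope changes)
B_HAT = -0.245
B_CHANGES = (0.460, -0.122, -0.0592, -0.0278)
# ASSUMPTION: knots at the quintile cut-points; this slide does not state them
KNOTS = (10.3, 16.0, 26.7, 51.4)

# Slope in each interval = b + cumulative sum of the slope changes passed so far
slopes = list(accumulate((B_HAT,) + B_CHANGES))
labels = (
    [f"x <= {KNOTS[0]}"]
    + [f"{lo} < x <= {hi}" for lo, hi in zip(KNOTS, KNOTS[1:])]
    + [f"x > {KNOTS[-1]}"]
)
for label, slope in zip(labels, slopes):
    print(f"{label:>18}: slope {slope:+.4f} per unit of bilirubin")
```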

Linear predictor with quadratic splines:

$$LP_i = a + b_1 x_i + b_2 x_i^2 + b_{1,1} (x_{i1}^+)^2 + \dots + b_{1,4} (x_{i4}^+)^2.$$

There is no simple interpretation of the coefficients, but a smooth curve is obtained. The LR test for linearity, $b_2 = b_{1,1} = \dots = b_{1,4} = 0$, gives $40.97 \sim \chi^2_5$.


Figure 6 (plot: linear predictor against bilirubin, 0 to 400): Estimated linear predictor for the PBC3 study assuming an effect of serum bilirubin modeled as a linear spline (dashed), an unrestricted quadratic spline (solid), or a quadratic spline restricted to be linear for bilirubin values above 51.4 (dotted). The distribution of bilirubin is shown on the horizontal axis.

Restricted splines

The quadratic effect $b_2 x_i^2$ may be quite dramatic for large (both positive and negative) values of $x_i$.

This may be avoided using restricted splines, see p. 220. The idea is that for large (positive or negative) x’s, the curve is linear instead of quadratic.

Cubic splines may also be defined, using terms $(x_{ij}^+)^3$.
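The slides refer to p. 220 for restricted splines. As a swapped-in illustration of the same idea (not necessarily the book's parameterization), here is a minimal sketch of the widely used restricted cubic spline basis in Harrell's form, without his scaling constant. With knots $t_1 < \dots < t_k$, each non-linear term combines three truncated cubes so that the $x^3$ and $x^2$ parts cancel beyond $t_k$. The curve is therefore linear above the last knot and, trivially, below the first. Placing the knots at the bilirubin quintile cut-points is purely for illustration, and the function name is our own.

```python
def restricted_cubic_basis(x: float, knots) -> list:
    """Non-linear terms X_1..X_{k-2} of an (unscaled) restricted cubic spline.

    The full linear predictor would be a + b*x + sum_j c_j * X_j(x); every X_j is
    zero below the first knot and linear in x beyond the last knot.
    """
    k = len(knots)
    if k < 3:
        raise ValueError("need at least 3 knots")
    t_prev, t_last = knots[-2], knots[-1]
    width = t_last - t_prev

    def cube(u: float) -> float:
        return max(u, 0.0) ** 3

    return [
        cube(x - t_j)
        - cube(x - t_prev) * (t_last - t_j) / width
        + cube(x - t_last) * (t_prev - t_j) / width
        for t_j in knots[: k - 2]
    ]


# Knots placed at the bilirubin quintile cut-points purely for illustration
knots = (10.3, 16.0, 26.7, 51.4)
for x in (5.0, 20.0, 100.0, 101.0, 102.0):
    print(x, [round(v, 2) for v in restricted_cubic_basis(x, knots)])
```

Beyond the last knot, successive values of each term differ by a constant amount per unit of x, which is the linear tail that protects against the "dramatic" behaviour of an unrestricted polynomial.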


Polynomials

The simplest alternative to a linear function is a quadratic function, and a standard test for linearity is to include $x^2$ and test whether the corresponding coefficient is zero, $b_2 = 0$:

$$LP_i = a + b_1 x_i + b_2 x_i^2.$$

The resulting parabola has its extremum at $x = -b_1/(2 b_2)$. It is “happy” (convex, with a minimum) if $b_2 > 0$ and “bad-tempered” (concave, with a maximum) if $b_2 < 0$.

For the PBC-3 data:

$\hat b_1 = 0.0227$ (0.0031), $\hat b_2 = -0.0000369$ (0.00000871).
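A short worked calculation with these estimates. Since $\hat b_2 < 0$, the fitted parabola is concave, and its maximum lies at $-\hat b_1/(2\hat b_2)$, roughly 308. The helper `lp_change` (our own name) gives the estimated log hazard ratio between two bilirubin values. Whether the implied downturn above the maximum is real cannot be judged from the coefficients alone; it is a consequence of the quadratic form.

```python
# Estimates from the slides for LP = a + b1*x + b2*x**2 (x = bilirubin)
b1_hat, b2_hat = 0.0227, -0.0000369

vertex = -b1_hat / (2 * b2_hat)  # turning point of the parabola
shape = "concave, maximum" if b2_hat < 0 else "convex, minimum"


def lp_change(x_from: float, x_to: float) -> float:
    """Change in the estimated linear predictor (log hazard ratio) from x_from to x_to."""
    return b1_hat * (x_to - x_from) + b2_hat * (x_to ** 2 - x_from ** 2)


print(f"{shape} at bilirubin = {vertex:.1f}")
print(f"log hazard ratio, bilirubin 100 vs 20: {lp_change(20, 100):.3f}")
```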

Higher-order polynomials (cubic etc.) may also be used.

• Simple approach, including simple tests for linearity

• No simple interpretation of the coefficients

• Sensitive to influential points (Figure 7)


Figure 7 (two panels: Cook’s distance against bilirubin, and against time in years): Cook’s distance for the model with a quadratic effect of bilirubin plotted against bilirubin and time: +: observed failure times, o: censored observations.


Fractional polynomials

Instead of using just $x^2$ and perhaps $x^3$, use several powers $x^q$, e.g. $\sqrt{x} = x^{0.5}$, $1/\sqrt{x} = x^{-0.5}$, $x$, $1/x = x^{-1}$, $x^2$, $1/x^2 = x^{-2}$, $x^3$, $1/x^3 = x^{-3}$ (the power q = 0 is taken to mean $\log(x)$).

This provides a lot of flexibility but no interpretable coefficients.

Since such models are purely descriptive, one often aims at finding best-fitting models with two or three terms.
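A minimal sketch of the fractional polynomial transformation, with q = 0 meaning log(x) as stated above. The candidate set mirrors the powers listed on this slide apart from 1, because the linear term x is always kept, as in Table 1. Function names are our own. All powers other than positive integers need x > 0, so the sketch refuses non-positive values.

```python
import math

# Candidate powers besides x itself (the linear term is always kept)
FP_POWERS = (-3, -2, -1, -0.5, 0, 0.5, 2, 3)


def fp_term(x: float, q: float) -> float:
    """Fractional polynomial transformation x**q, where q = 0 means log(x); needs x > 0."""
    if x <= 0:
        raise ValueError("fractional polynomials require a positive covariate")
    return math.log(x) if q == 0 else x ** q


def fp_covariates(x: float, powers) -> list:
    """Covariate vector [x, x**q1, x**q2, ...] for a model that keeps the linear term."""
    return [x] + [fp_term(x, q) for q in powers]


# Covariates for the model with powers 1, -0.5 and -2 at a bilirubin value of 25
print(fp_covariates(25.0, (-0.5, -2)))
```

Each candidate model is then an ordinary model with a linear predictor in these extra covariates, which is what makes the search over pairs of powers cheap.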

We did that for the PBC-3 study:

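Table 1: Likelihood ratio tests comparing fractional polynomial models for the effect of bilirubin in the PBC3 study to a model with a linear effect. First column: one additional term ($x^{q_1}$) in the model; next columns: two additional terms ($x^{q_1}$ and $x^{q_2}$) in the model.

| q1 | one term | q2 = –3 | q2 = –2 | q2 = –1 | q2 = –0.5 | q2 = 0 | q2 = 0.5 | q2 = 2 |
|---|---|---|---|---|---|---|---|---|
| –3 | 4.29 | | | | | | | |
| –2 | 12.10 | 19.10 | | | | | | |
| –1 | 25.51 | 29.32 | 32.38 | | | | | |
| –0.5 | 30.69 | 32.75 | 34.12 | 34.09 | | | | |
| 0 | 32.32 | 33.10 | 33.24 | 32.57 | 32.35 | | | |
| 0.5 | 30.56 | 30.68 | 30.57 | 31.08 | 31.77 | 32.42 | | |
| 2 | 20.82 | 21.47 | 23.65 | 28.69 | 31.17 | 32.50 | 32.44 | |
| 3 | 16.59 | 17.88 | 21.19 | 28.00 | 31.06 | 32.46 | 32.00 | 26.59 |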
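Reading Table 1 mechanically: a minimal sketch that stores the table and picks the largest likelihood ratio statistic in each part. The dictionary layout and names are our own. Pairs are keyed as (q1, q2), and q = 0 stands for log(x). Because every model contains the linear term, the largest statistic identifies the best-fitting model of each size.

```python
# Likelihood ratio statistics from Table 1 (each model vs. the linear model)
ONE_TERM = {-3: 4.29, -2: 12.10, -1: 25.51, -0.5: 30.69, 0: 32.32, 0.5: 30.56, 2: 20.82, 3: 16.59}
TWO_TERMS = {
    (-2, -3): 19.10,
    (-1, -3): 29.32, (-1, -2): 32.38,
    (-0.5, -3): 32.75, (-0.5, -2): 34.12, (-0.5, -1): 34.09,
    (0, -3): 33.10, (0, -2): 33.24, (0, -1): 32.57, (0, -0.5): 32.35,
    (0.5, -3): 30.68, (0.5, -2): 30.57, (0.5, -1): 31.08, (0.5, -0.5): 31.77, (0.5, 0): 32.42,
    (2, -3): 21.47, (2, -2): 23.65, (2, -1): 28.69, (2, -0.5): 31.17, (2, 0): 32.50, (2, 0.5): 32.44,
    (3, -3): 17.88, (3, -2): 21.19, (3, -1): 28.00, (3, -0.5): 31.06, (3, 0): 32.46,
    (3, 0.5): 32.00, (3, 2): 26.59,
}

best_q = max(ONE_TERM, key=ONE_TERM.get)
best_pair = max(TWO_TERMS, key=TWO_TERMS.get)
print(f"best single extra power: q = {best_q} (LR = {ONE_TERM[best_q]})")
print(f"best pair of extra powers: {best_pair} (LR = {TWO_TERMS[best_pair]})")
```

This picks q = 0 (log(x)) and the pair (-0.5, -2), the two models reported under Results.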


Results

The best fitting model with 1 extra term (in addition to just x) is

$$LP_i = \log(h_0(t)) + b_1 x_i + b_0 \log(x_i)$$

with estimates $\hat b_1 = -0.000723$ (0.00222), $\hat b_0 = 1.0661$ (0.202), i.e. the linear term is insignificant and the term for log(x) is highly significant.

The best fitting model with 2 extra terms is

$$LP_i = \log(h_0(t)) + b_1 x_i + b_{-0.5} x_i^{-0.5} + b_{-2} x_i^{-2}$$

with estimates $\hat b_1 = -0.00242$ (0.00165), $\hat b_{-0.5} = 40.301$ (13.889), $\hat b_{-2} = -12.575$ (2.372), the last two terms being highly significant and the linear term insignificant.


Figure 8 (plot: linear predictor against bilirubin, 0 to 400): Estimated linear predictor for the PBC study assuming an effect of serum bilirubin which is modeled either as a fractional polynomial with powers 1 and 0 (dashed) or with powers 1, –0.5, and –2 (solid). The distribution of bilirubin is shown on the horizontal axis.


Comments

• It is easy to fit models with non-linear effects using a linear predictor: just define appropriate extra covariates.

• Most such models provide estimates without a simple interpretation (linear splines are an exception: their coefficients are changes of slope).

• Note the distinction from “truly non-linear models”, e.g. the Gompertz growth curve model with $E(y_i) = a + b\exp(c x_i)$, for which special software is needed for the fitting.
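Why special software is needed: in $E(y_i) = a + b\exp(c x_i)$ the parameter c enters non-linearly, so no choice of extra covariates turns the model into a linear predictor. The sketch below shows one simple (and crude) way to fit it, which is not taken from the slides. For fixed c the model is linear in a and b, so ordinary least squares can be profiled over a grid of c values. The data are synthetic, and the function names are hypothetical. General-purpose routines such as `scipy.optimize.curve_fit` would normally be used instead.

```python
import math


def fit_ab_given_c(xs, ys, c):
    """Least squares for y = a + b*z with z = exp(c*x); returns (a, b, rss)."""
    zs = [math.exp(c * x) for x in xs]
    n = len(xs)
    z_bar, y_bar = sum(zs) / n, sum(ys) / n
    s_zz = sum((z - z_bar) ** 2 for z in zs)
    s_zy = sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys))
    b = s_zy / s_zz
    a = y_bar - b * z_bar
    rss = sum((y - a - b * z) ** 2 for z, y in zip(zs, ys))
    return a, b, rss


def profile_fit(xs, ys, c_grid):
    """Grid search over the non-linear parameter c, profiling out a and b."""
    best = None
    for c in c_grid:
        a, b, rss = fit_ab_given_c(xs, ys, c)
        if best is None or rss < best[3]:
            best = (a, b, c, rss)
    return best


# Synthetic, noise-free data from a = 1, b = 2, c = 0.3 (hypothetical values)
xs = [float(x) for x in range(11)]
ys = [1.0 + 2.0 * math.exp(0.3 * x) for x in xs]
a_hat, b_hat, c_hat, rss = profile_fit(xs, ys, [i / 100 for i in range(1, 101)])
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}, c = {c_hat:.2f}, RSS = {rss:.2e}")
```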
