RIDGE TRACE ANALYSIS

Table 4.1 Cases considered and the corresponding eigenvalues of X

4.3. RIDGE TRACE ANALYSIS

In this section, a GLM version of Hoerl & Kennard's (1970b) ridge trace plot is proposed and applied to an actual example. In the GLM setting, the ridge trace is a (two-dimensional) plot of $\hat\beta^*_j(k)$ versus $k$ for all $j = 1, \ldots, p$. This trace portrays the sensitivity of the regression estimates as a function of the ridge parameter $k$. For plotting purposes, a reasonable range of $k$ ($> 0$) values may be chosen such that the resulting asymptotic confidence ellipsoid displacement from $\hat\beta$,

$$\phi(k) = [\hat\beta - \hat\beta^*(k)]^T X^T W X [\hat\beta - \hat\beta^*(k)],$$

is less than the upper $\alpha$% point of $\chi^2_p$. Alternatively, the adaptive values of $k$ can be used to suggest an upper bound for $k$ in the ridge trace.
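The ridge trace and the displacement measure described above can be sketched numerically as follows. This is a minimal illustration, not the thesis's own code: it assumes the ridge estimator takes the form $\hat\beta^*(k) = (X^T W X + kI)^{-1} X^T W X \hat\beta$, which is not stated in this section, and the function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_logistic_mle(X, y, n_iter=50, tol=1e-10):
    """Newton-Raphson (IRLS) fit of a logistic model; returns (beta_hat, weights)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = mu * (1.0 - mu)                      # diagonal of W
        step = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return beta, mu * (1.0 - mu)

def ridge_trace(X, y, k_grid):
    """Ridge estimates and displacement phi(k) on a grid of k values.

    ASSUMPTION: beta*(k) = (X'WX + kI)^{-1} X'WX beta_hat, with W evaluated at the MLE.
    """
    beta_hat, w = fit_logistic_mle(X, y)
    A = X.T @ (w[:, None] * X)                   # X'WX
    eye = np.eye(X.shape[1])
    trace = np.array([np.linalg.solve(A + k * eye, A @ beta_hat) for k in k_grid])
    diff = beta_hat - trace                      # one row per value of k
    phi = np.einsum("ij,jk,ik->i", diff, A, diff)
    return beta_hat, trace, phi

# Hypothetical synthetic data with two nearly collinear covariates.
rng = np.random.default_rng(0)
n = 200
Z = rng.normal(size=(n, 3))
Z[:, 2] = Z[:, 1] + 0.3 * rng.normal(size=n)
X = np.column_stack([np.ones(n), Z])
eta = 0.3 + 0.8 * Z[:, 0] - 0.5 * Z[:, 1] + 0.5 * Z[:, 2]
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

k_grid = np.linspace(0.0, 5.0, 26)
beta_hat, trace, phi = ridge_trace(X, y, k_grid)
```

Plotting each column of `trace` against `k_grid` gives one ridge trace per coefficient, and `phi` can be compared with a chosen chi-square point to decide where to truncate the horizontal axis.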

A set of data for patients who underwent open-heart surgery is given in Pregibon (1979, p.152). The main aim of the study was to investigate the relationship between the occurrence (y = 1) or non-occurrence (y = 0) of a post-operative myocardial infarction and various explanatory variables. We use this benchmark data set to illustrate the potential use of ridge regression in GLMs; no attempt is made to provide a complete analysis of the data. A description of the explanatory variables is given in Table 4.5. There are n = 142 observations and p = 19 variables. Pregibon suggested a logistic regression model for the analysis of this data set.

Consider the following tentative model in terms of the original Z-variables:

$$\log[\mu_i/(1-\mu_i)] = z_{i1}\theta_1 + \cdots + z_{i,19}\theta_{19}, \quad i = 1, \ldots, n,$$

where $\mu_i = E[y_i] = P\{y_i = 1\}$. In order to apply the ridge regression technique, first let $x_{i1} = n^{-1/2}$ and

$$x_{ij} = (z_{ij} - \bar z_j) \Big/ \Big[\sum_{i}(z_{ij} - \bar z_j)^2\Big]^{1/2}, \quad j = 2, \ldots, 19, \tag{4.2}$$

so that $[x_{i1}, \ldots, x_{i,19}]$ represents the standardized version of $[z_{i1}, \ldots, z_{i,19}]$. The model corresponding to these standardized X-variables is:

$$\operatorname{logit}[\mu_i] = x_{i1}\beta_1 + \cdots + x_{i,19}\beta_{19}, \quad i = 1, \ldots, n. \tag{4.3}$$

If model (4.3) is fitted, the deviance has the value 75.7 whereas the Pearson statistic is 101.9, with 123 df. Pregibon proceeded to perform a subset selection procedure based on the deviance measure. However, it should be cautioned that for binary response data, the residual deviance contains no information concerning lack of fit of the linear logistic formulation (Williams, 1983).
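The standardization in (4.2) can be sketched in NumPy as follows. This is an illustrative sketch only: the function name `standardize` and the toy design matrix are hypothetical, and the constant column is assumed to be scaled to $n^{-1/2}$ so that every column of X has unit length.

```python
import numpy as np

def standardize(Z):
    """Scale an n x p design matrix whose first column is the unit vector.

    Column 1 is set to n**-0.5 (ASSUMED scaling of the constant); every other
    column is centred and divided by the root of its centred sum of squares.
    """
    Z = np.asarray(Z, dtype=float)
    n = Z.shape[0]
    X = np.empty_like(Z)
    X[:, 0] = n ** -0.5
    centred = Z[:, 1:] - Z[:, 1:].mean(axis=0)
    X[:, 1:] = centred / np.sqrt((centred ** 2).sum(axis=0))
    return X

# Hypothetical toy design: unit vector plus three covariates, n = 142 as in the text.
rng = np.random.default_rng(1)
Z = np.column_stack([np.ones(142), rng.normal(size=(142, 3))])
X = standardize(Z)

# Eigenvalues of X'X; they sum to the number of columns because diag(X'X) = 1.
eigenvalues = np.linalg.eigvalsh(X.T @ X)
print(eigenvalues)
```

With this scaling $X^T X$ has unit diagonal, so its eigenvalues sum to the number of columns and a small minimum eigenvalue signals collinearity among the standardized variables.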

Table 4.5 Description and Coding of the Explanatory Variables.

| Variable Name | Variable Code |
|---|---|
| CONSTANT - Unit Vector. | $z_1$ |
| ACI - Acute Coronary Incident; 5 levels. | $z_2$=ACI(2), ..., $z_5$=ACI(5) |
| LMS - Left Main Stenosis; 3 levels. | $z_6$=LMS(2), $z_7$=LMS(3) |
| OT - Operation Timing; 4 levels. | $z_8$=OT(2), $z_9$=OT(3), $z_{10}$=OT(4) |
| BT - Balloon Timing; 4 levels. | $z_{11}$=BT(2), $z_{12}$=BT(3), $z_{13}$=BT(4) |
| NY - N.Y. Heart Assoc. Rating; 3 levels. | $z_{14}$=NY(2), $z_{15}$=NY(3) |
| LV - Left Ventricular Grade; quantitative. | $z_{16}$ |
| CAD - Coronary Artery Disease Score; quantitative. | $z_{17}$ |
| ANOX - Anoxic Time; quantitative. | $z_{18}$ |
| PUMP - Pump Time; quantitative. | $z_{19}$ |

The MLE $\hat\beta$ from fitting model (4.3) is given in Table 4.6 (a). It can be seen that the asymptotic standard errors of $\hat\beta_5$ and $\hat\beta_{14}$ are very large relative to those of the other estimates. Due to the number of variables involved in the model, the ridge traces are sketched in three separate diagrams (Figures 4.1(a) to 4.3(a)). For model (4.3) the two adaptive choices of $k$ are 0.0044 and 0.048. A reasonable range for $k$ in each ridge trace plot is (0, 0.1), since the resulting ridge estimates will only be displaced within the bounds of a 50% asymptotic confidence ellipsoid relative to $\hat\beta$, as

$$\phi(0.1) = [\hat\beta - \hat\beta^*(0.1)]^T X^T W X [\hat\beta - \hat\beta^*(0.1)] = 18.2 < \chi^2_{19,\,0.5} = 18.3 .$$
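The chi-square bound used in this comparison can be checked with a short calculation. The sketch below uses the Wilson-Hilferty approximation to the chi-square quantile, which is a substitute chosen here so that only the Python standard library is needed, not a method taken from the text; the function name is hypothetical.

```python
from statistics import NormalDist

def chi2_upper_point(df, upper_prob):
    """Approximate chi-square value exceeded with probability upper_prob.

    Uses the Wilson-Hilferty cube-root normal approximation (an approximation
    swapped in for exact tables), which is accurate to about two decimals here.
    """
    z = NormalDist().inv_cdf(1.0 - upper_prob)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * c ** 0.5) ** 3

bound = chi2_upper_point(19, 0.5)   # upper 50% point with 19 degrees of freedom
phi_at_0_1 = 18.2                   # displacement at k = 0.1, as reported in the text
within_ellipsoid = phi_at_0_1 < bound
print(round(bound, 1), within_ellipsoid)
```

The approximate bound rounds to the 18.3 quoted in the text, so the displacement of 18.2 at k = 0.1 lies just inside the 50% ellipsoid.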

Table 4.6 Logistic Regression Summaries for (a) model (4.3) and (b) model (4.4).

| j | Variable | (a) $\hat\beta_j$ | (a) s.e. | (b) $\hat\beta_j$ | (b) s.e. | (b) $\hat\beta^*_j(k_a)$ | (b) $\hat\beta^*_j(k_c)$ |
|---|---|---|---|---|---|---|---|
| 1 | Constant | -35.93 | (21.71) | -31.67 | (5.80) | -21.29 | -16.46 |
| 2 | ACI(2) | -9.14 | (6.84) | -14.78 | (6.21) | -8.09 | -5.25 |
| 3 | ACI(3) | 7.26 | (6.83) | -1.42 | (4.86) | -1.45 | -1.35 |
| 4 | ACI(4) | 1.40 | (4.98) | -4.12 | (4.02) | -2.79 | -2.17 |
| 5 | ACI(5) | 32.39 | (61.82) | * | | * | * |
| 6 | LMS(2) | 13.28 | (6.30) | 16.03 | (6.09) | 9.17 | 6.28 |
| 7 | LMS(3) | 6.37 | (5.39) | 7.52 | (5.25) | 3.97 | 2.56 |
| 8 | OT(2) | -9.69 | (5.08) | -8.27 | (4.89) | -4.06 | -2.45 |
| 9 | OT(3) | -4.24 | (4.22) | -1.59 | (4.00) | 0.11 | 0.56 |
| 10 | OT(4) | -3.57 | (4.89) | -1.42 | (4.54) | 0.26 | 0.63 |
| 11 | BT(2) | -0.14 | (4.14) | -1.07 | (3.79) | -0.42 | -0.24 |
| 12 | BT(3) | -2.69 | (5.17) | -1.02 | (4.59) | 1.11 | 1.73 |
| 13 | BT(4) | -12.44 | (6.68) | -14.36 | (6.08) | -8.70 | -6.50 |
| 14 | NY(2) | -32.46 | (67.57) | -6.54 | (8.37) | -3.44 | -2.15 |
| 15 | NY(3) | 1.82 | (7.95) | 3.34 | (7.31) | 1.92 | 1.28 |
| 16 | LV | -10.46 | (4.69) | -8.38 | (4.20) | -5.08 | -3.66 |
| 17 | CAD | 11.10 | (5.22) | 11.20 | (5.03) | 6.98 | 5.07 |
| 18 | ANOX | 5.24 | (4.29) | 2.90 | (4.09) | 1.43 | 0.88 |
| 19 | PUMP | 7.10 | (4.48) | 7.39 | (4.29) | 5.64 | 4.65 |
| | Deviance | 75.7 | | 83.1 | | 88.4 | 96.4 |
| | Pearson stat. | 101.9 (123 df) | | 119.2 (124 df) | | 79.3 | 75.9 |

Since the ridge traces for $\hat\beta_5$ (corresponding to ACI(5)) and $\hat\beta_{14}$ (corresponding to NY(2)) are extremely unstable, we examined the data for these two variables rather closely, and found that there are exactly 5 patients with ACI = 5 and all of them have y = 1. Therefore the likelihood is maximized at '$\infty$' (see also Section 4.2.2). In other words, it is not possible to estimate the individual effect of ACI(5) by maximum likelihood. Furthermore, out of the 18 patients with NY at level 2, only one has y = 1, and for this patient ACI is at level 5. Without this particular observation, it would also be impossible to estimate the individual effect of NY(2) by maximum likelihood. The instability in the ridge trace of ACI(5) (and possibly that of NY(2)) reflects these peculiarities in the data and the sensitivity of the estimates.

[Figures 4.1 to 4.3: ridge traces, plotted against k, for (a) model (4.3) and (b) model (4.4).]

Figure 4.4 Pearson statistic versus k for model (4.4).
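The kind of data check that exposes the ACI(5) problem can be sketched as a scan over factor levels for groups in which every response is identical, since the coefficient of the corresponding indicator then has no finite maximum likelihood estimate. The function name and the toy responses below are hypothetical; only the pattern of 5 patients at level 5, all with y = 1, mirrors the text.

```python
import numpy as np

def one_sided_levels(levels, y):
    """Return {level: (group size, number of events)} for levels whose
    responses are all 0 or all 1."""
    levels = np.asarray(levels)
    y = np.asarray(y)
    flagged = {}
    for lev in np.unique(levels):
        resp = y[levels == lev]
        events = int(resp.sum())
        if events == 0 or events == resp.size:
            flagged[int(lev)] = (int(resp.size), events)
    return flagged

# Hypothetical toy data: levels 1 to 4 have mixed outcomes, level 5 has only events.
aci = np.array([1] * 6 + [2] * 6 + [3] * 6 + [4] * 6 + [5] * 5)
y = np.array([0, 1, 0, 0, 1, 0] * 4 + [1] * 5)

flagged = one_sided_levels(aci, y)
print(flagged)
```

A non-empty result suggests that the indicator for the flagged level should be inspected before either the MLE or a ridge estimator built on it is trusted.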

If the MLE does not exist then the ridge estimator $\hat\beta^*(k)$ will not be well-defined. Modification to model (4.3) is thus required to ensure the existence of the MLE. Based on medical advice, the appropriate method of modification is to amalgamate ACI(5) ('evolving MI') with the ACI(1) ('crescendo') category. This approach is equivalent to dropping ACI(5), as the coefficient of ACI(1) has been conveniently set to zero. The resulting model is thus:

$$\operatorname{logit}[\mu_i] = x_{i1}\beta_1 + \cdots + x_{i4}\beta_4 + x_{i6}\beta_6 + \cdots + x_{i,19}\beta_{19}. \tag{4.4}$$

Summary statistics from fitting (4.4) are shown in Table 4.6 (b).
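The equivalence between amalgamating ACI(5) with the baseline level and dropping the ACI(5) indicator can be verified on a small example. The helper `dummy_code` and the level vector are hypothetical, and the sketch works with raw indicator columns; because the standardization is applied column by column, the same equivalence carries over to the standardized design.

```python
import numpy as np

def dummy_code(levels, all_levels):
    """Indicator columns for every level except the first (baseline) level."""
    levels = np.asarray(levels)
    return np.column_stack([(levels == lev).astype(float) for lev in all_levels[1:]])

# Hypothetical ACI levels for eight patients.
aci = np.array([1, 2, 3, 4, 5, 5, 1, 3])

# Original coding: indicator columns ACI(2), ACI(3), ACI(4), ACI(5).
full = dummy_code(aci, [1, 2, 3, 4, 5])

# Amalgamate level 5 with level 1, then recode: columns ACI(2), ACI(3), ACI(4).
merged = np.where(aci == 5, 1, aci)
reduced = dummy_code(merged, [1, 2, 3, 4])

# Dropping the ACI(5) column from the original coding gives the same design.
same_design = np.array_equal(full[:, :3], reduced)
print(same_design)
```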

The smallest and largest eigenvalues of $X^T X$ are respectively 0.074 and 2.95, indicating only a moderate degree of population-inherent/model-specification collinearity among the explanatory variables. The ridge traces for the revised model are displayed in Figures 4.1(b) to 4.3(b). The corresponding Pearson statistic and asymptotic confidence ellipsoid displacement measure $\phi$ are plotted in Figures 4.4 and 4.5. We observe that there is no longer any evidence of extreme instability in the ridge trace plots. In particular, a noticeable change occurs in the behaviour of the trace for NY(2). The ridge estimate at k = 0.008 (see Table 4.6 (b)) suggests that the effects of most variables are less than those depicted by the MLE. Moreover, the Pearson statistic plot reveals that further shrinkage will improve the goodness-of-fit of model (4.4), with the best fit occurring at $k \approx 0.016$ ($= k_c$, say). This also signifies that the observed data are perhaps over-classified according to the recording procedure used. Incidentally, Pregibon (1979, p.26) commented that "all levels of the qualitative variables may not be needed, and some collapsing may be indicated".
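The search for the value of k that minimizes the Pearson statistic can be sketched as a grid evaluation. As before, this is an illustration under stated assumptions rather than the thesis's procedure: the ridge estimator is assumed to be $(X^T W X + kI)^{-1} X^T W X \hat\beta$, and the function names, grid and synthetic data are hypothetical.

```python
import numpy as np

def logistic_mle(X, y, n_iter=50):
    """Newton-Raphson fit of a logistic regression model."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = mu * (1.0 - mu)
        beta = beta + np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - mu))
    return beta

def fit_stats(X, y, beta):
    """Pearson statistic and deviance of a binary-response fit at beta."""
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    pearson = np.sum((y - mu) ** 2 / (mu * (1.0 - mu)))
    deviance = -2.0 * np.sum(y * np.log(mu) + (1.0 - y) * np.log(1.0 - mu))
    return pearson, deviance

# Hypothetical synthetic data with two nearly collinear covariates.
rng = np.random.default_rng(0)
n = 200
Z = rng.normal(size=(n, 3))
Z[:, 2] = Z[:, 1] + 0.3 * rng.normal(size=n)
X = np.column_stack([np.ones(n), Z])
eta = 0.3 + 0.8 * Z[:, 0] - 0.5 * Z[:, 1] + 0.5 * Z[:, 2]
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

beta_hat = logistic_mle(X, y)
mu_hat = 1.0 / (1.0 + np.exp(-(X @ beta_hat)))
A = X.T @ ((mu_hat * (1.0 - mu_hat))[:, None] * X)   # X'WX at the MLE

k_grid = np.linspace(0.0, 2.0, 21)
stats = np.array([
    fit_stats(X, y, np.linalg.solve(A + k * np.eye(X.shape[1]), A @ beta_hat))
    for k in k_grid
])
pearson, deviance = stats[:, 0], stats[:, 1]

# Grid value of k with the smallest Pearson statistic (the analogue of k_c).
k_c = k_grid[np.argmin(pearson)]
print(k_c)
```

The deviance is smallest at k = 0 by construction, so any gain from shrinkage shows up in the Pearson statistic only, which matches the pattern of the deviance and Pearson rows in Table 4.6.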

Consultation with medical experts suggests the following regrouping for the categorical variables:

ACI: (1) crescendo/new ischemic changes/subendocardial MI/evolving MI; (2) normal resting ECG
LMS: (1) yes/>51% stenosis; (2) no
OT: (1) elective; (2) semi-elective/urgent/emergent
BT: (1) pre-op, pre-anaes./pre-op, post-anaes./inter-op; (2) post-op
NY: (1) minor pain; (2) slight pain at rest/pain at rest

An inspection of the MLE and the associated standard errors shows that the collapsing of the redundant levels as above is also sensible based on statistical criteria. This leads to a simplified model:

$$\operatorname{logit}[\mu_i] = x_{i1}\pi_1 + \cdots + x_{i,10}\pi_{10}, \tag{4.5}$$

where the x's correspond to the variables listed in Table 4.7 and are assumed to be standardized similarly to the x's in (4.2). The summary statistics obtained from fitting model (4.5) (see Table 4.7 (a)) indicate that the variables OT, NY and ANOX have relatively little effect on the occurrence of a heart attack. The apparent insignificance of these variables is further reflected in their ridge traces (Figure 4.6) and the ridge estimates $\hat\pi^*(k_a)$ and $\hat\pi^*(k_c)$, where $k_a = 0.0078$, and $k_c = 0.026$ is the value of $k$ that minimizes the Pearson statistic. If we apply the subset selection procedure of Hoerl & Kennard (1970b) using the ridge trace, the following seven-carrier model is obtained:

$$\operatorname{logit}[\mu_i] = x_{i1}\pi_1 + x_{i2}\pi_2 + x_{i3}\pi_3 + x_{i5}\pi_5 + x_{i7}\pi_7 + x_{i8}\pi_8 + x_{i,10}\pi_{10}. \tag{4.6}$$

It is noted that a stepwise elimination procedure starting with the full model (4.5) also arrives at this same model. Parameter estimates for model (4.6) can be found in Table 4.7 (b).

In this example, the ridge analysis draws our attention to a simplified and parsimonious model. Estimates from the ridge trace also provide meaningful supplements to the MLE, especially in terms of the Pearson goodness-of-fit criterion. Other aspects of model checking, such as those proposed in the next two chapters, could then be performed at the conclusion of this preliminary analysis.

[Figure 4.6: ridge traces, plotted against k, for model (4.5).]

Table 4.7 Logistic Regression Summaries for (a) model (4.5) and (b) model (4.6).

| j | Variable | (a) $\hat\pi_j$ | (a) s.e. | (a) $\hat\pi^*_j(k_a)$ | (a) $\hat\pi^*_j(k_c)$ | (b) $\hat\pi_j$ | (b) s.e. |
|---|---|---|---|---|---|---|---|
| 1 | Constant | -26.64 | (4.50) | -21.10 | -14.72 | -25.98 | (4.38) |
| 2 | ACI(2) | -10.89 | (5.29) | -7.05 | -3.43 | -9.74 | (5.18) |
| 3 | LMS(2) | 8.56 | (3.56) | 6.28 | 3.78 | 8.18 | (3.48) |
| 4 | OT(2) | -3.96 | (3.58) | -2.55 | -1.21 | * | |
| 5 | BT(2) | -12.42 | (3.61) | -9.85 | -6.85 | -10.84 | (3.21) |
| 6 | NY(2) | 1.08 | (2.67) | 0.72 | 0.37 | * | |
| 7 | LV | -6.71 | (3.49) | -5.11 | -3.36 | -6.62 | (3.49) |
| 8 | CAD | 10.78 | (4.33) | 7.74 | 4.61 | 10.09 | (4.17) |
| 9 | ANOX | 1.20 | (3.47) | 0.84 | 0.45 | * | |
| 10 | PUMP | 6.32 | (3.73) | 5.72 | 4.61 | 7.62 | (3.42) |
| | Deviance | 94.1 | | 96.2 | 106.5 | 95.7 | |
| | Pearson stat. | 111.3 (132 df) | | 89.6 | 81.6 | 110.9 (135 df) | |

4.4. CONCLUSION

This section concludes our investigation on collinearity and ridge estimation in GLMs. Based on the results of our empirical study and other findings in the ridge regression literature, we recommend the ridge procedure for GLMs as a supplementary procedure to maximum likelihood analysis, especially when the explanatory variables are collinear.

The Monte Carlo experiments in Section 4.2 verified that (i) collinearity can seriously affect the MLE and (ii) the adaptive ridge estimator with k = k performs generally better than the MLE in terms

Although total reliance on a particular ridge estimator is not to be encouraged, the actual example in Section 4.3 indicates that the proposed ridge trace plot can provide additional insight by highlighting the sensitivity of the regression estimates as a function of the ridge parameter k. The associated ridge analysis assists in the selection of the relevant explanatory variables and can be applied regardless of the degree of the collinearity.

CHAPTER FIVE

5.1. INTRODUCTION

We now turn our attention to other aspects of examining the fit of a specified GLM. The practice of examining and plotting residuals to detect some inadequacies in GLMs has been established; see, e.g., Pregibon (1979), Williams (1984), and Landwehr, Pregibon & Shoemaker (1984). However, to assess the leverage information that is contained in the projection matrix, only a rough guideline has been provided by Hoaglin & Welsch (1978). Similarly, judgement of the influence of an observation, as measured for example by the generalized Cook's statistic for GLMs, has to rely on comparison with asymptotic values. The purpose of this chapter is to present a general technique for assessing leverage and influential observations. The procedure takes the form of Half-Normal plots with envelopes derived by simulation so as to enhance overall assessment of the model. This procedure of assessment is more informative and provides additional insight compared with procedures based on the largest sample leverage and influence statistics. It is recommended that these plots should accompany any thorough diagnostic analysis.

Section 5.2 begins with a review of residuals, and leverage and

In document Ridge regression and diagnostics in generalized linear models (Page 76-89)
