6.3. ASSESSING PARTIAL INFLUENCE

In many practical situations, it is important to judge whether the evidence for a new explanatory variable z is spread evenly throughout the data or rests with only a few cases. However, as shown in Cook & Wang (1983), both the added variable and partial residual plots can fail if important information about z arises from observations at leverage points. This is because leverage points usually produce relatively small residuals, so such points may make only a small contribution to the plot even if they are influential for that variable. In the following, we derive a reliable graphical display which highlights the degree of influence of individual observations, including leverage points, on the variable of interest. To facilitate discussion, we shall refer to the notation set out in Table 6.1, where a subscripted i in brackets means "with the ith observation deleted". Note that, for generality, $\tau$ now represents a $q \times 1$ vector of additional parameters.

Table 6.1 Models of Interest.

Model #   Description                                             Fitted model                                                        Deviance
I         $\eta = X\beta$                                         $\tilde\eta = X\tilde\beta$                                         $D_1$
II        $\eta = X\beta + Z\tau$                                 $\hat\eta = X\hat\beta + Z\hat\tau$                                 $D_2$
III       $\eta_{(i)} = X_{(i)}\beta_{(i)}$                       $\tilde\eta_{(i)} = X_{(i)}\tilde\beta_{(i)}$                       $D_3$
IV        $\eta_{(i)} = X_{(i)}\beta_{(i)} + Z_{(i)}\tau_{(i)}$   $\hat\eta_{(i)} = X_{(i)}\hat\beta_{(i)} + Z_{(i)}\hat\tau_{(i)}$   $D_4$

When q = 1, an influence measure for the impact of the ith observation on $\hat\tau$ can be based on $\hat\tau - \hat\tau_{(i)}$. Computation of $\hat\tau_{(i)}$ can be simplified using one-step estimates. The resulting index plot of the standardized change in the $\tau$ coefficient is advocated in Pregibon’s (1981) logistic regression diagnostics. However, when q > 1, a more relevant measure is $C_i(\tau) = (D_1 - D_2) - (D_3 - D_4)$, which represents the change in the deviance due to Z when the ith case is deleted. A large positive $C_i$ means the ith observation is contributing appreciably to including Z in the model, whereas a large negative change implies that deletion of the observation actually strengthens the evidence for Z.
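As a rough illustration of the deletion measure just described, the following Python sketch computes the change in the deviance due to Z, (D1 - D2) - (D3 - D4), by refitting models I to IV of Table 6.1 with each case removed in turn. It assumes a logistic regression fitted by a plain Newton/IRLS loop on simulated data; the helper names `fit_logistic` and `partial_influence_exact`, and the data themselves, are hypothetical and do not come from the original text.

```python
import numpy as np

def fit_logistic(X, y, n_iter=100, tol=1e-10):
    """Fit a logistic regression by Newton/IRLS; return (coefficients, deviance)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = mu * (1.0 - mu)
        # Newton step: solve (X'WX) step = X'(y - mu)
        step = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.clip(1.0 / (1.0 + np.exp(-(X @ beta))), 1e-12, 1 - 1e-12)
    deviance = -2.0 * np.sum(y * np.log(mu) + (1 - y) * np.log(1 - mu))
    return beta, deviance

def partial_influence_exact(X, Z, y):
    """Return (C, D1 - D2), where C[i] = (D1 - D2) - (D3 - D4) for case i."""
    XZ = np.column_stack([X, Z])
    D1 = fit_logistic(X, y)[1]     # model I: X only, all cases
    D2 = fit_logistic(XZ, y)[1]    # model II: X and Z, all cases
    n = len(y)
    C = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        D3 = fit_logistic(X[keep], y[keep])[1]     # model III: X only, case i deleted
        D4 = fit_logistic(XZ[keep], y[keep])[1]    # model IV: X and Z, case i deleted
        C[i] = (D1 - D2) - (D3 - D4)
    return C, D1 - D2

# Simulated data (an assumption, for illustration only).
rng = np.random.default_rng(0)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.normal(size=(n, 1))
eta = 0.3 + 0.8 * X[:, 1] + 1.0 * Z[:, 0]
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

C, dev_drop = partial_influence_exact(X, Z, y)
print("deviance due to Z:", round(float(dev_drop), 3))
print("cases with the largest C_i:", np.argsort(C)[-3:][::-1])
```

Brute-force refitting needs 2n extra fits, which is the cost that one-step estimates are meant to avoid; the sketch is only intended to make the definition concrete.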

Computation of $C_i$ can be simplified by using the one-step approximation to $C_i$:

$$C_i^1(\tau) = r_{1i}^2 - r_{2i}^2,$$

where $r_{1i}^2 = d_{1i}^2 + s_{1i}^2 h_{1ii}(1 - h_{1ii})^{-1}$; $h_{1ii}$ is the ith diagonal element of $H_1 = \tilde W^{1/2} X (X^T \tilde W X)^{-1} X^T \tilde W^{1/2}$; $d_{1i}^2$ and $s_{1i}^2$ are respectively the components of the deviance $D_1 = \sum_{i=1}^{n} d_{1i}^2$ and of the Pearson statistic $S_1 = \sum_{i=1}^{n} s_{1i}^2 = \sum_{i=1}^{n} (y_i - \tilde\mu_i)^2 / \tilde v_i$; $r_{2i}^2$ is defined similarly. The quantities $r_{1i}^2$ and $r_{2i}^2$ can be regarded as squared residuals (Williams, 1987) for models I and II respectively. Consequently, we suggest an index plot of $C_i^1(\tau)$ to identify observations that may be influencing the apparent significance of Z. We call the plot of $C_i^1$ a partial influence plot. Finally, to assess whether the evidence for Z is spread throughout the data or depends solely on a particular observation, one merely compares $D_1 - D_2 - C_i^1$ with a critical value.
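The one-step quantities above can be sketched in Python as follows. This is a minimal sketch that assumes a logistic regression, for which the IRLS weight and the variance function coincide, and simulated data; the function names `fit_logistic`, `squared_residuals` and `partial_influence_one_step` are hypothetical. Each squared residual is formed as the deviance component plus the Pearson component multiplied by h/(1 - h), and the partial influence value is the difference between the model I and model II residuals.

```python
import numpy as np

def fit_logistic(X, y, n_iter=100, tol=1e-10):
    """Newton/IRLS fit of a logistic regression; returns the fitted means."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = mu * (1.0 - mu)
        step = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return np.clip(1.0 / (1.0 + np.exp(-(X @ beta))), 1e-12, 1 - 1e-12)

def squared_residuals(X, y):
    """Return (r2, h) with r2_i = d_i^2 + s_i^2 * h_i / (1 - h_i)."""
    mu = fit_logistic(X, y)
    w = mu * (1.0 - mu)                  # variance function, also the IRLS weight
    Xw = np.sqrt(w)[:, None] * X         # W^{1/2} X
    # Diagonal of H = W^{1/2} X (X'WX)^{-1} X' W^{1/2}
    h = np.einsum("ij,ij->i", Xw @ np.linalg.inv(Xw.T @ Xw), Xw)
    d2 = -2.0 * (y * np.log(mu) + (1 - y) * np.log(1 - mu))   # deviance components
    s2 = (y - mu) ** 2 / w                                     # Pearson components
    return d2 + s2 * h / (1.0 - h), h

def partial_influence_one_step(X, Z, y):
    """One-step partial influence: r_{1i}^2 (model I) minus r_{2i}^2 (model II)."""
    r1, _ = squared_residuals(X, y)
    r2, _ = squared_residuals(np.column_stack([X, Z]), y)
    return r1 - r2

# Simulated data (an assumption, for illustration only).
rng = np.random.default_rng(0)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.normal(size=(n, 1))
eta = 0.3 + 0.8 * X[:, 1] + 1.0 * Z[:, 0]
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

C1 = partial_influence_one_step(X, Z, y)
print("cases with the largest one-step values:", np.argsort(C1)[-3:][::-1])
```

Plotting `C1` against the case index would give the index plot suggested in the text. Only two fits are required, one for each of models I and II.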

Some related work in a more specialized context has appeared in the literature. Atkinson (1982) provided a method for identifying cases that influence the transformation of the response in the linear regression model. His approach is based on inspecting the added variable plot and a half-normal plot of his modified Cook’s distance in the hope of identifying any influential case. In our GLM situation (Table 6.1), let $\theta$ denote the full set of parameters (i.e. $\beta$ and $\tau$); then the generalized Cook’s statistic becomes:

$$\mathrm{CD}_i(\theta) = (\hat\theta_{(i)} - \hat\theta)^T M (\hat\theta_{(i)} - \hat\theta), \tag{6.10}$$

where $M$ is conveniently taken to be $\{\mathrm{Cov}[\hat\theta]\}^{-1}/(p+q)$, so that $\mathrm{CD}_i(\theta)$ is basically comparing model II with model IV and can be interpreted as the asymptotic confidence ellipsoid displacement due to deleting the ith case.
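A minimal Python sketch of the generalized Cook’s statistic (6.10) is given below, under the assumption of a logistic regression with unit dispersion, so that the inverse asymptotic covariance of the full parameter estimate is the Fisher information U'WU with U = [X, Z]. The deleted-case estimates are obtained here by full refitting rather than by a one-step approximation, and the names `fit_logistic` and `generalized_cooks` and the simulated data are hypothetical.

```python
import numpy as np

def fit_logistic(U, y, n_iter=100, tol=1e-10):
    """Newton/IRLS fit of a logistic regression; returns the coefficient vector."""
    theta = np.zeros(U.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(U @ theta)))
        w = mu * (1.0 - mu)
        step = np.linalg.solve(U.T @ (w[:, None] * U), U.T @ (y - mu))
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta

def generalized_cooks(U, y):
    """CD_i = (theta_(i) - theta)' M (theta_(i) - theta), M = Cov^{-1} / (p + q)."""
    theta = fit_logistic(U, y)                      # model II, all cases
    mu = 1.0 / (1.0 + np.exp(-(U @ theta)))
    w = mu * (1.0 - mu)
    M = (U.T @ (w[:, None] * U)) / U.shape[1]       # Fisher information / (p + q)
    n = len(y)
    cd = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        diff = fit_logistic(U[keep], y[keep]) - theta   # model IV minus model II
        cd[i] = diff @ M @ diff
    return cd

# Simulated data (an assumption, for illustration only).
rng = np.random.default_rng(0)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.normal(size=(n, 1))
eta = 0.3 + 0.8 * X[:, 1] + 1.0 * Z[:, 0]
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

cd = generalized_cooks(np.column_stack([X, Z]), y)
print("cases with the largest CD_i:", np.argsort(cd)[-3:][::-1])
```

Because `M` is built from all p + q columns, a case can score highly here through its effect on any coefficient, not only on those attached to Z.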

The generalized Cook’s statistic (6.10) measures the influence of an observation on all the parameter estimates. This may be uninformative and misleading if our interest is in $\tau$ only, since an observation can be influential on one parameter or on a subset of parameters. An alternative formulation to our partial influence measure $C_i(\tau)$ is to adopt an argument analogous to the linear regression case as given in Cook & Weisberg (1982, p. 125) and Atkinson (1985, §10.2.1). Suppose that interest is in t contrasts of the form $A\theta$, where $A$ is a $t \times (p+q)$ matrix of known constants, with rank $t < p+q$. An appropriate measure of influence for this set of contrasts is again given by (6.10) but with $M = A^T (A\,\mathrm{Cov}[\hat\theta]\,A^T)^{-1} A / t$. The special case where interest centers on $\tau$ corresponds to defining $A = [0 \mid I_q]$. The modified Cook’s statistic then becomes:

$$\mathrm{CD}_i(\tau) = (\hat\tau_{(i)} - \hat\tau)^T \{\mathrm{Cov}[\hat\tau]\}^{-1} (\hat\tau_{(i)} - \hat\tau)/q,$$

and

$$\{\mathrm{Cov}[\hat\tau]\}^{-1} = Z^T \hat W Z - Z^T \hat W X (X^T \hat W X)^{-1} X^T \hat W Z.$$

Some one-step approximation to $\hat\tau_{(i)}$ may be substituted to ease the computation of $\mathrm{CD}_i(\tau)$. In general, the deviance-based partial influence measure $C_i$ is preferred since there is no ambiguity concerning the choice of $M$ as occurs in the case of $\mathrm{CD}_i$ (see also the discussion in Cook & Weisberg (1982, pp. 122-124)).
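The reduction from the contrast form of M to the inverse covariance of the extra coefficients alone can be checked numerically. The Python sketch below builds M with A = [0 | I_q] and compares its lower-right block with Z'WZ - Z'WX(X'WX)^{-1}X'WZ. The weights `w` are random stand-ins for fitted IRLS weights, and the dimensions, data and the helper `cooks_tau` are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 30, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Z = rng.normal(size=(n, q))
w = rng.uniform(0.1, 0.25, size=n)     # stand-in for fitted weights (assumption)

U = np.column_stack([X, Z])
info = U.T @ (w[:, None] * U)          # inverse covariance of the full estimate
cov = np.linalg.inv(info)

A = np.hstack([np.zeros((q, p)), np.eye(q)])       # A = [0 | I_q], so t = q
M = A.T @ np.linalg.inv(A @ cov @ A.T) @ A / q     # M for the contrasts A theta

# Inverse covariance of the extra coefficients via the partitioned form.
XtWX = X.T @ (w[:, None] * X)
XtWZ = X.T @ (w[:, None] * Z)
ZtWZ = Z.T @ (w[:, None] * Z)
cov_tau_inv = ZtWZ - XtWZ.T @ np.linalg.solve(XtWX, XtWZ)

def cooks_tau(tau_deleted, tau_full):
    """Modified Cook's statistic for the extra coefficients only."""
    diff = tau_deleted - tau_full
    return diff @ cov_tau_inv @ diff / q

print("blocks agree:", np.allclose(M[p:, p:] * q, cov_tau_inv))
```

With this choice of A, the rows and columns of `M` belonging to the first p coefficients are zero, so the quadratic form in (6.10) involves only the change in the last q coefficients.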

In document Ridge regression and diagnostics in generalized linear models (Page 108-111)