Measurement errors in generalized linear model explanatory variables


by

Leonard Stefanski Department of Statistics North Carolina State University

and

Department of Biostatistics Harvard School of Public Health

Paper Presented at the


Under the assumption that response and explanatory variables

follow a generalized linear model, estimating equations are derived

for the case in which the explanatory variables are measured with

error.

Although the estimating equations are shown to have multiple

solutions, a procedure is suggested for uniquely identifying the

appropriate root.

A by-product of the proposed computational methods

is an informative plot, called the measurement error trace, which

graphically illustrates the effect of measurement error on estimated

parameters.


This paper studies the problem of fitting generalized linear

models to data when explanatory variables are measured with error.

Assuming that measurement error is normally distributed and

independent of both the true explanatory and response variable,

unbiased estimating equations for the generalized linear model

parameters are obtained by conditioning on certain sufficient

statistics.

The estimating equations are suitable for both the

functional and structural versions of the measurement error model.

For the structural semi-parametric version of the generalized linear

measurement-error model, efficient estimating equations are

identified.

Definitions and statement of the modelling assumptions are given

in Section 1.

Estimating equations for functional models are derived

in Section 2, and Section 3 contains material relevant to structural

measurement-error models.

To a great extent Sections 1-3 constitute a

review of the recent paper by Stefanski and Carroll (1987).

A major obstacle to the application of the theory in Sections 1-3

is the nonuniqueness of solutions to the proposed estimating

equations.

This problem was mentioned in Stefanski and Carroll


suggests a graphical technique, called the measurement-error trace,

for displaying the effect of measurement error on parameter estimates.

1. GENERALIZED LINEAR MEASUREMENT ERROR MODELS

1.1 Generalized linear models in canonical form

Throughout this paper attention will be restricted to generalized linear models in canonical form (McCullagh & Nelder, 1983, Ch. 2). That is, given a p-vector explanatory variable $U = u$, it is assumed that the response variable $Y$ has the density

$$h_Y(y;\theta,u) = \exp\left\{\frac{y(\alpha+\beta^T u) - b(\alpha+\beta^T u)}{a(\phi)} + c(y,\phi)\right\} \qquad (1.1)$$

with respect to a sigma-finite measure $m(\cdot)$. In (1.1), $\theta^T = (\alpha, \beta^T, \phi)$; $a(\cdot)$, $b(\cdot)$ and $c(\cdot)$ are known functions; and the dominating measure $m(\cdot)$ does not depend on $\theta$ or $u$.

Table 1 gives choices of $a(\cdot)$, $b(\cdot)$ and $m(\cdot)$ for some common nonlinear models.

Table 1. Choices of $a(\cdot)$, $b(\cdot)$ and $m(\cdot)$ for some common generalized linear models in canonical form.

Model              a(φ)   b(η)                      m(·)
Poisson            1      exp(η)                    Counting measure on {0, 1, ...}
Logistic           1      log(1 + exp(η))           Counting measure on {0, 1}
Gamma              φ      log(-1/η)      (η < 0)    Lebesgue measure on (0, ∞)
Inverse Gaussian   φ      -(-2η)^{1/2}   (η < 0)    Lebesgue measure on (0, ∞)
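As a quick numerical companion to Table 1, the following minimal sketch evaluates the mean response $b'(\eta)$ for each model by a central difference. The dictionary `CUMULANT` and the function `mean_response` are illustrative names introduced here, and the Poisson and logistic forms of $b(\eta)$ are the standard textbook ones, taken as assumptions.

```python
import math

# Cumulant functions b(eta) for four canonical-form models.
# The Poisson and logistic entries are the standard textbook forms (assumed here).
CUMULANT = {
    "poisson": lambda eta: math.exp(eta),
    "logistic": lambda eta: math.log1p(math.exp(eta)),
    "gamma": lambda eta: math.log(-1.0 / eta),               # requires eta < 0
    "inverse_gaussian": lambda eta: -math.sqrt(-2.0 * eta),  # requires eta < 0
}


def mean_response(model, eta, step=1e-6):
    """Approximate b'(eta), the mean of Y at canonical parameter eta,
    by a central difference."""
    b = CUMULANT[model]
    return (b(eta + step) - b(eta - step)) / (2.0 * step)


for name, eta in [("poisson", 0.0), ("logistic", 0.0),
                  ("gamma", -2.0), ("inverse_gaussian", -2.0)]:
    print(name, round(mean_response(name, eta), 6))
```

The restriction $\eta < 0$ for the Gamma and Inverse Gaussian rows shows up directly: calling either function with a positive argument raises a `ValueError`.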


parameters are assumed known.

This is crucial to the theory presented

later and thus the necessity of the restriction to canonical models.

Unfortunately, some canonical models entail restrictions on $\alpha+\beta^T u$, e.g. Gamma and Inverse Gaussian, and thus they are less desirable, from a modelling viewpoint, than certain noncanonical models.

1.2 The measurement error model

In a generalized linear measurement error model a proxy, X, is

observed in place of U.

It is assumed that, conditioned on $U = u$, the observable random variable $X$ has the normal density

$$h_X(x;\theta,u) = (2\pi)^{-p/2}|\Sigma|^{-1/2}\exp\{-(1/2)(x-u)^T\Sigma^{-1}(x-u)\} \qquad (1.2)$$

where $\Sigma$ is the covariance matrix of the measurement error vector, $x-u$. Note that like (1.1), (1.2) possesses a natural sufficient statistic for $u$ when the other parameters are assumed known. A generalized linear measurement error model is obtained by combining (1.1) and (1.2) under the assumption that $Y$ and $X$ are conditionally independent given $U$. The resulting density of $(Y,X)$ conditioned on $U = u$ is then

$$h_{Y,X}(y,x;\theta,u) = h_Y(y;\theta,u)\,h_X(x;\theta,u). \qquad (1.3)$$

For an independent sequence of random variables $(Y_i, X_i)$ $(i = 1,\ldots,n)$, the unobserved $u_i$ can be regarded

either as a sequence of constants or as a sequence of independent and identically distributed random variables. In the former case a

functional model is obtained while the latter case is termed a

structural model. Structural models can be further characterized as parametric or semiparametric depending on whether the distribution of U is specified parametrically or nonparametrically. Functional models and nonparametric structural models are studied in this paper.

Not all of the parameters for all versions of model (1.3) are identifiable and thus some additional information is required. It will be assumed that

$$\Sigma/a(\phi) = \Omega, \qquad (1.4)$$

where $\Sigma$ is the measurement-error covariance matrix in (1.2) and $\Omega$ is known. In simple linear regression (1.4) reduces to the common identifiability assumption that the ratio of measurement-error variance to the equation error variance is known. In models for which $a(\phi) = 1$, for example, logistic and Poisson regression, (1.4) requires that the measurement-error covariance matrix is known.
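To make the model concrete, here is a minimal simulation sketch of a scalar structural logistic measurement-error model with $a(\phi) = 1$, so that the error variance is $\Omega$ itself. The function name `simulate_logistic_me`, the choice $U_i \sim N(0,1)$, and the parameter values are assumptions made only for illustration.

```python
import math
import random


def simulate_logistic_me(n, alpha, beta, omega, seed=0):
    """Draw (Y_i, X_i, U_i) from a scalar structural logistic measurement-error model.

    Assumed design: U_i ~ N(0, 1), X_i = U_i + sqrt(omega) * Z_i with Z_i ~ N(0, 1),
    and pr(Y_i = 1 | U_i) = 1 / (1 + exp(-(alpha + beta * U_i))).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)                          # true covariate (unobserved)
        x = u + math.sqrt(omega) * rng.gauss(0.0, 1.0)   # observed proxy
        p = 1.0 / (1.0 + math.exp(-(alpha + beta * u)))
        y = 1 if rng.random() < p else 0
        data.append((y, x, u))
    return data


sample = simulate_logistic_me(n=5000, alpha=0.0, beta=1.0, omega=0.5, seed=1)
# Empirical variance of the measurement error X - U; should be close to omega.
error_var = sum((x - u) ** 2 for _, x, u in sample) / len(sample)
print(round(error_var, 3))
```

In a functional version of the same sketch the `u` values would be supplied as fixed constants rather than drawn at random.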

2. FUNCTIONAL MODELS

2.1 The functional likelihood

Under the assumptions of the functional model there are n+p+2 unknown parameters, $\alpha$, $\beta^T$, $\phi$ and $u_1,\ldots,u_n$. Given data $(Y_i,X_i)$ $(i=1,\ldots,n)$ the functional log likelihood is

$$L(\theta,u_1,\ldots,u_n) = \sum_{i=1}^n \log\{h_{Y,X}(Y_i,X_i;\theta,u_i)\}. \qquad (2.1)$$


However, for nonlinear models maximization of (2.1) is neither computationally attractive nor is it guaranteed to yield consistent estimators. In logistic regression it is known that the functional maximum likelihood estimator of $(\alpha,\beta^T)$ is not consistent (Stefanski and Carroll, 1985). This is a classic example of the failure of the method of maximum likelihood in the presence of an increasing number of nuisance parameters (Neyman and Scott, 1948).

2.2 Unbiased estimating equations

Note that (1.3) can be written

$$h_{Y,X}(y,x;\theta,u) = q(\delta,\theta,u)\,v(y,x,\theta) \qquad (2.2)$$

where

$$q(\delta,\theta,u) = \exp\left\{\frac{u^T\Omega^{-1}\delta}{a(\phi)} - \frac{u^T\Omega^{-1}u + 2b(\alpha+\beta^T u)}{2a(\phi)}\right\},$$

$$v(y,x,\theta) = \exp\left\{\frac{2\alpha y - x^T\Omega^{-1}x}{2a(\phi)} + c^*(y,\phi)\right\},$$

$$\delta = \delta(y,x,\theta) = x + y\Omega\beta,$$

$$c^*(y,\phi) = c(y,\phi) - (1/2)\log[\{2\pi a(\phi)\}^p|\Omega|].$$

Thus viewing $u$ as a parameter and $\alpha$, $\beta$ and $\phi$ as fixed, the statistic

$$\Delta = \Delta(Y,X,\theta) = X + Y\Omega\beta \qquad (2.3)$$

is sufficient for $u$. As a consequence, the distribution of $Y|\Delta$ does not depend on $u$, and this fact can be exploited to derive unbiased estimating equations for $\theta$ which are independent of $u$.
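The sufficiency claim can be checked numerically in a special case. The sketch below assumes a scalar covariate and the logistic model with $a(\phi) = 1$; it computes pr$(Y=1 \mid \Delta=\delta)$ directly from the joint density of $(Y,X)$ given $U=u$ for several values of $u$ and compares the results with the closed form $F(\alpha+\beta\delta-\beta^2\Omega/2)$ implied by the conditional density of $Y$ given $\Delta$. All function names and numerical values are illustrative assumptions.

```python
import math


def expit(t):
    """Numerically stable logistic function F(t) = 1 / (1 + exp(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def prob_y1_given_delta(delta, u, alpha, beta, omega):
    """pr(Y = 1 | Delta = delta), computed from the joint density of (Y, X) given U = u."""
    # Delta = X + Y*omega*beta, so X = delta - omega*beta when Y = 1 and X = delta when Y = 0.
    joint1 = expit(alpha + beta * u) * normal_pdf(delta - omega * beta, u, omega)
    joint0 = (1.0 - expit(alpha + beta * u)) * normal_pdf(delta, u, omega)
    return joint1 / (joint1 + joint0)


alpha, beta, omega, delta = 0.3, 1.2, 0.5, 0.8
values = [prob_y1_given_delta(delta, u, alpha, beta, omega) for u in (-2.0, 0.0, 1.5)]
closed_form = expit(alpha + beta * delta - 0.5 * beta * omega * beta)
print(values, closed_form)
```

The three entries of `values` agree with `closed_form` up to rounding error, which is what sufficiency of $\Delta$ for $u$ predicts.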

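Write $h_{Y|\Delta}(y|\delta;\theta)$ for the conditional density of $Y$ given $\Delta=\delta$. A routine derivation establishes that

$$h_{Y|\Delta}(y|\delta;\theta) = \exp[y\eta - (1/2)y^2\beta^T\Omega\beta/a(\phi) + c(y,\phi) - \log\{S(\eta,\beta,\phi)\}], \qquad (2.4)$$

where $\eta = (\alpha+\beta^T\delta)/a(\phi)$ and $S(\cdot,\cdot,\cdot)$ is determined by the requirement that

$$\int h_{Y|\Delta}(y|\delta;\theta)\,dm(y) = 1. \qquad (2.5)$$

Since $m(\cdot)$ does not depend on $\theta$ it follows from (2.5) that

$$\int \dot h_{Y|\Delta}(y|\delta;\theta)\,dm(y) = 0, \qquad (2.6)$$

where $\dot h_{Y|\Delta}(y|\delta;\theta) = (\partial/\partial\theta)h_{Y|\Delta}(y|\delta;\theta)$.

Define $\psi_S(y,x,\theta) = \dot h_{Y|\Delta}(y|\delta;\theta)/h_{Y|\Delta}(y|\delta;\theta)$ evaluated at $\delta = x+y\Omega\beta$; then it follows from (2.6) that

$$E_\theta\{\psi_S(Y,X,\theta)\} = E_\theta[E_\theta\{\psi_S(Y,X,\theta)|\Delta\}] = 0.$$

Any estimator $\hat\theta_S$ solving

$$\sum_{i=1}^n \psi_S(Y_i,X_i,\hat\theta_S) = 0 \qquad (2.7)$$

will be called a sufficiency estimator. The score, $\psi_S$, has components

$$\psi_S(y,x,\theta) = \begin{pmatrix} \{y - E(Y|\Delta=\delta)\}/a(\phi) \\ [\{y - E(Y|\Delta=\delta)\}\delta - \{y^2 - E(Y^2|\Delta=\delta)\}\Omega\beta]/a(\phi) \\ r(y,x,\theta) - E\{r(Y,X,\theta)|\Delta=\delta\} \end{pmatrix}. \qquad (2.8)$$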

A second unbiased estimating equation is found by adopting an approach due to Lindsay (1980, 1982, 1983). The conditional score, $\psi_C$, is defined via

$$\psi_C(y,x,\theta) = \begin{pmatrix} \{y-E(Y|\Delta=\delta)\}/a(\phi) \\ \{y-E(Y|\Delta=\delta)\}t(\delta)/a(\phi) \\ r(y,x,\theta)-E\{r(Y,X,\theta)|\Delta=\delta\} \end{pmatrix} \qquad (2.9)$$

where $t(\cdot)$ is a p-vector-valued function not depending on $(Y,X)$. With this restriction on $t(\cdot)$ note that

$$E[\{Y-E(Y|\Delta)\}t(\Delta)] = E(t(\Delta)E[\{Y-E(Y|\Delta)\}|\Delta]) = 0$$

and thus $\psi_C$ is unbiased. An estimator $\hat\theta_C$ satisfying

$$\sum_{i=1}^n \psi_C(Y_i,X_i,\hat\theta_C) = 0 \qquad (2.10)$$

will be called a conditional estimator.

The conditional score depends on $t(\cdot)$ which must be specified. Ideally $t(\cdot)$ would be chosen to minimize the asymptotic variance of $\hat\theta_C$

The fact that $t(\Delta_i) = u_i$ is optimal, and thus so too is any one-to-one linear function of $u_i$, suggests choosing $t(\cdot)$ so that $E\{t(\Delta_i)\}$ is a one-to-one linear function of $u_i$. Thus simply taking $t(\Delta) = \Delta$ is suggested ($E(\Delta) = u + E(Y)\Omega\beta$). Another possibility is suggested by the facts that $X$ is unbiased for $u$ and $\Delta$ is sufficient for $u$ (assuming $\theta$ fixed) and thus $t(\Delta) = E(X|\Delta)$ is a uniformly minimum variance unbiased estimator of $u$ (again assuming $\theta$ fixed). Note that

$$E(X|\Delta) = \Delta - E(Y|\Delta)\Omega\beta. \qquad (2.11)$$

3. STRUCTURAL MODELS

3.1 The structural likelihood

Consider the nonparametric structural model defined in Section 1.2. The joint density of $(Y,X)$ is given by

$$f_{Y,X}(y,x;\theta,g) = \int h_{Y,X}(y,x;\theta,u)\,g(u)\,d\nu(u) \qquad (3.1)$$

where $h_{Y,X}$ is defined in (1.3). This constitutes a semiparametric model with parametric component $\theta$ and nonparametric component $g$. The density $g$ is assumed to be an element of $G$, a family of densities with respect to Lebesgue measure, denoted $\nu(\cdot)$.

Let $\ell(y,x,\theta,g) = \log f_{Y,X}(y,x;\theta,g)$ and $\dot\ell(y,x,\theta,g) = (\partial/\partial\theta)\ell(y,x,\theta,g)$. If $g(\cdot)$ were known then $\dot\ell$ would be the efficient score for $\theta$. Assuming that differentiation and integration can be interchanged,

$$\dot\ell(y,x,\theta,g) = \frac{\int (\partial/\partial\theta)\log(h)\,h\,g\,d\nu}{\int h\,g\,d\nu} = E\{(\partial/\partial\theta)\log h(y,x;\theta,U)\,|\,Y=y,X=x\}.$$

Thus if $g(\cdot)$ is viewed as a prior for $u$, then $\dot\ell$ has the interpretation as the posterior expectation of the functional maximum likelihood $\theta$-score. Furthermore, since $\Delta = X+Y\Omega\beta$ is sufficient for $u$ in the conditional model, (1.3), the conditional distribution of $U|Y,X$ is the same as that of $U|\Delta$. Therefore,

$$\dot\ell(y,x,\theta,g) = E\{(\partial/\partial\theta)\log h(y,x;\theta,U)\,|\,\Delta = x+y\Omega\beta\}. \qquad (3.2)$$

3.2 Efficient estimating equations

In model (3.1) interest lies primarily in estimation of $\theta$, i.e., $g(\cdot)$ is a nuisance function. The conditional and sufficiency scores of Section 2 are appropriate for the structural model of this section in the sense of being unbiased but they are generally not efficient.

In this section the efficient $\theta$-score is derived and is shown to be equal to a conditional score (2.9) with $t(\cdot) = E(U|\Delta=\cdot)$. Efficiency is defined in the sense of Pfanzagl (1982, Ch. 14), Begun et al. (1983) and Lindsay (1983, 1985).

Assume that the family of densities $\{h_Y(y;\eta)\}$, obtained by setting $\eta = \alpha+\beta^Tu$ and fixing $\phi$ in the right hand side of (1.1), is a regular exponential family for $\eta \in H$ where $H$ is one of the three open intervals $(-\infty,0)$, $(0,\infty)$, $(-\infty,\infty)$. Let $\theta = (\alpha,\beta^T,\phi)$ be an element in $\Theta = R\times R^p\times R^+$ and $g$ an element of $G$. Write $\tau = (\theta,g)$ and, with supp$(g)$ denoting the support of $g$, define

$$T = \{\tau: \alpha+\beta^Tu \in H \text{ for } u \in \mathrm{supp}(g)\}.$$

Let $S$ be the class of estimating equations, $\psi$, satisfying for all $\tau$ in $T$:

(i) $E_\tau\{\psi(Y,X,\theta)\} = 0$,

(ii) $E_\tau\{(\partial/\partial\theta)\psi(Y,X,\theta)\} = -E_\tau\{\psi(Y,X,\theta)\,\dot\ell^T(Y,X,\theta,g)\}$,

(iii) $E_\tau\{\|\psi(Y,X,\theta)\|^2\} < \infty$.

If $G$ is complete in the sense defined below, then it transpires that every score in $S$ must be conditionally unbiased with respect to $\Delta$ (Theorem 3.1), i.e., if $\psi$ is in $S$ then

$$E_\tau\{\psi(Y,X,\theta)|\Delta\} = 0, \text{ for all } \tau \text{ in } T. \qquad (3.3)$$

This allows an easy derivation (Corollary 3.1) of the efficient estimating equation for $\theta$.

Definition. A collection of functions, $\mathcal{H}$, is said to be complete with respect to a measure $\rho$ if a necessary condition for

$$\int t(s)h(s)\,d\rho(s) = 0, \text{ for all } h \in \mathcal{H},$$

is $t(\cdot) = 0$ $\rho$-almost surely.

For a fixed $\theta \in \Theta$ let $G_\theta = \{g \in G: (\theta,g) \in T\}$ and let $\nu_\theta$ be Lebesgue measure on $\{u \in R^p: \alpha+\beta^Tu \in H\}$.

Theorem 3.1. Assume that for each fixed $\theta \in \Theta$, $G$

The positive-definite matrix $V_\psi = \{E(\psi\dot\ell^T)\}^{-1}E(\psi\psi^T)\{E(\dot\ell\psi^T)\}^{-1}$ measures the efficiency of $\psi$ as an estimating equation. Under regularity conditions $V_\psi$ is the asymptotic covariance matrix of $n^{1/2}(\hat\theta-\theta)$ when $\hat\theta$ is a consistent estimator of $\theta$ satisfying $\sum\psi(Y_i,X_i,\hat\theta) = 0$. Let

$$\psi^* = \dot\ell(y,x,\theta,g) - E\{\dot\ell(Y,X,\theta,g)\,|\,\Delta = x+y\Omega\beta\} \qquad (3.4)$$

and let $V_{\psi^*}$ be the associated covariance matrix. The following result states that $\psi^*$ is the efficient $\theta$-score.

Corollary 3.1. $V_\psi \ge V_{\psi^*}$ for all $\psi \in S$.

Proofs of Theorem 3.1 and its corollary can be found in Stefanski and Carroll (1987) and will not be given here.

To find $\psi^*$ note that $Y$ and $U$ are conditionally uncorrelated given $\Delta$. This follows from the facts that the $\sigma$-field generated by $\Delta$ is contained in the $\sigma$-field generated by $(Y,X)$, and the conditional distributions of $U|(Y=y,X=x)$ and $U|\Delta=x+y\Omega\beta$ are identical. Thus

$$E(YU|\Delta) = E\{E(YU|Y,X)|\Delta\} = E\{YE(U|\Delta)|\Delta\} = E(Y|\Delta)E(U|\Delta).$$

This fact and (3.2) imply that

$$\psi^* = \dot\ell - E(\dot\ell|\Delta) = E(\dot h/h\,|\,Y,X) - E\{E(\dot h/h\,|\,Y,X)|\Delta\} = E(\dot h/h\,|\,Y,X) - E(\dot h/h\,|\,\Delta),$$

so that

$$\psi^*(y,x,\theta) = \begin{pmatrix} \{y - E(Y|\Delta=\delta)\}/a(\phi) \\ \{y - E(Y|\Delta=\delta)\}E(U|\Delta=\delta)/a(\phi) \\ r(y,x,\theta) - E\{r(Y,X,\theta)|\Delta=\delta\} \end{pmatrix}. \qquad (3.5)$$

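Comparison with (2.9) shows that $\psi^*$ is a conditional score with $t(\delta) = E(U|\Delta=\delta)$. Note that $\Delta = X+Y\Omega\beta = U+Z+Y\Omega\beta$ where $Z$ has a $N\{0,a(\phi)\Omega\}$ distribution. This implies that

$$E(U|\Delta) = \Delta - E(Y|\Delta)\Omega\beta - E(Z|\Delta).$$

But by (2.11), $\Delta - E(Y|\Delta)\Omega\beta = E(X|\Delta)$; also it can be shown that

$$E(Z|\Delta=\delta) = -a(\phi)\Omega\,\dot f_\Delta(\delta)/f_\Delta(\delta)$$

where $f_\Delta(\delta)$ is the density of $\Delta$ and $\dot f_\Delta(\delta) = (\partial/\partial\delta)f_\Delta(\delta)$. Thus

$$E(U|\Delta=\delta) = E(X|\Delta=\delta) + a(\phi)\Omega\,\dot f_\Delta(\delta)/f_\Delta(\delta). \qquad (3.6)$$

Fully efficient estimators of $(\alpha,\beta^T)$ in the linear model have been given by Bickel and Ritov (1987).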
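The identity for $E(Z|\Delta=\delta)$ quoted above as "it can be shown" can be sketched as follows. The argument assumes that $Z$ is independent of $W = U + Y\Omega\beta$ and that differentiation under the integral sign is permitted; $W$, its distribution $P_W$, the normal density $\varphi_\Sigma$ and the abbreviation $\Sigma = a(\phi)\Omega$ are notation introduced here for the sketch only.

```latex
\begin{aligned}
f_\Delta(\delta) &= \int \varphi_\Sigma(\delta - w)\,dP_W(w),
  \qquad \Sigma = a(\phi)\Omega, \\
\dot f_\Delta(\delta)
  &= -\Sigma^{-1}\int (\delta - w)\,\varphi_\Sigma(\delta - w)\,dP_W(w)
   = -\Sigma^{-1}\,E(Z \mid \Delta = \delta)\,f_\Delta(\delta), \\
E(Z \mid \Delta = \delta)
  &= -a(\phi)\,\Omega\,\dot f_\Delta(\delta)/f_\Delta(\delta).
\end{aligned}
```

The second line uses $\Delta = W + Z$, so that $\delta - w$ is the value of $Z$ on the event $\{W = w, \Delta = \delta\}$.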

4. LOGISTIC REGRESSION

In this section the logistic model is used as a means of illustrating and comparing the various estimating equations. The logistic model assumes that

$$\mathrm{pr}_\theta(Y=1|U=u) = F(\alpha+\beta^Tu)$$

where $F(t) = 1/(1+e^{-t})$. For this model $a(\phi) \equiv 1$ and $m(\cdot)$ is counting measure on $\{0,1\}$. The conditional distribution of $Y|\Delta=\delta$ is given by

$$\mathrm{pr}_\theta(Y=1|\Delta=\delta) = F\{\alpha+\beta^T\delta-(1/2)\beta^T\Omega\beta\}. \qquad (4.1)$$

Note that if $\Delta^*$ is defined as $\Delta^* = \Delta-(1/2)\Omega\beta$, with a similar notation for $\delta^* = \delta-(1/2)\Omega\beta$, then

$$\mathrm{pr}_\theta(Y=1|\Delta^*=\delta^*) = F(\alpha+\beta^T\delta^*), \qquad (4.3)$$

i.e., conditioned on $\Delta^*=\delta^*$, $Y$ follows a logistic model. This closure property (equality of the conditional distributions of $Y|U$ and $Y|\Delta^*$ where $\Delta^*$ is some function of $\Delta$) seems only to hold for the normal and logistic models.

The sufficiency score is a conditional score for the particular choice $t(\delta) = \delta-\Omega\beta$. In Section 2 it was suggested that taking $t(\delta) = E(X|\Delta=\delta)$ should lead to promising estimating equations. For the logistic model

$$E(X|\Delta=\delta) = \delta - F\{\alpha+\beta^T\delta-(1/2)\beta^T\Omega\beta\}\,\Omega\beta. \qquad (4.4)$$

Fully efficient estimation requires that $t(\delta) = E(X|\Delta=\delta) + \Omega\,\dot f_\Delta(\delta)/f_\Delta(\delta)$, see equation (3.6).

All of the scores for the logistic model share a common problem; the associated estimating equations may possess multiple roots.

Note that if $Y_i = 1$,

$$\psi_C(Y_i,X_i,\theta) = \{1-F(\alpha+\beta^TX_i+(1/2)\beta^T\Omega\beta)\}\begin{pmatrix}1\\ X_i+\Omega\beta/2\end{pmatrix}$$

and if $\beta^T\Omega\beta \to \infty$, $\psi_C(Y_i,X_i,\theta) \to 0$; also if $Y_i = 0$,

$$\psi_C(Y_i,X_i,\theta) = -F(\alpha+\beta^TX_i-(1/2)\beta^T\Omega\beta)\begin{pmatrix}1\\ X_i-\Omega\beta/2\end{pmatrix}$$

and again $\psi_C(Y_i,X_i,\theta) \to 0$ as $\beta^T\Omega\beta \to \infty$. Thus if $\|\beta\| \to \infty$ in such a way that $\beta^T\Omega\beta \to \infty$, $G_n(\alpha,\beta) \to 0$. The manner in which $\psi_C$ depends on $\beta$ through $\beta^T\Omega\beta$ makes the score behave similarly to "redescending" scores which find applications in robust statistics. A consequence of this behavior is the fact that $G_n(\cdot,\cdot)$ may have multiple roots not all of which lead to consistent sequences of estimators.
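The redescending behavior can be seen numerically. The sketch below assumes a scalar covariate and codes the two expressions given above for $Y_i = 1$ and $Y_i = 0$ in a single formula; `psi_c` and `expit` are names chosen here for illustration.

```python
import math


def expit(t):
    """Numerically stable logistic function F(t) = 1 / (1 + exp(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def psi_c(y, x, alpha, beta, omega):
    """(intercept, slope) components of the logistic conditional score for one
    observation, scalar covariate, using x + (y - 1/2)*omega*beta as multiplier."""
    delta_star = x + (y - 0.5) * omega * beta      # X + omega*beta/2 or X - omega*beta/2
    resid = y - expit(alpha + beta * delta_star)   # 1 - F(.) if y = 1, -F(.) if y = 0
    return (resid, resid * delta_star)


for beta in (1.0, 5.0, 20.0, 50.0):
    print(beta, psi_c(1, 0.3, 0.0, beta, 1.0), psi_c(0, 0.3, 0.0, beta, 1.0))
```

Both components shrink toward zero as `beta` grows because the argument of $F$ is dominated by $\pm\beta^2\Omega/2$, which is the mechanism behind the extra roots.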

Figure 1 displays a graph of the second component of $G_n(0,\beta)$ vs $\beta$ for $\beta \in [0.25,10.25]$. Sample size, $n$, was set at 100; the $(U_i)$ are distributed as standard normal random variates; the $(X_i)$ were generated according to the model

$$X_i = U_i + \Omega^{1/2}Z_i$$

where $(Z_i)$ are standard normal random variables independent of the $(U_i)$; $\Omega = 1$; and $Y_i$ were generated according to model (1.1) with $\alpha = 0$ and $\beta = 1$. It is evident that $G_n(0,\beta)$ contains multiple roots in the interval [0.25,10.25].
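An experiment of the kind described above can be sketched as follows. The code simulates data with the stated settings ($n = 100$, $\alpha = 0$, $\beta = 1$, $\Omega = 1$) and counts sign changes of a slope estimating function on a grid over [0.25, 10.25]. The exact form of $G_n$ behind Figure 1 is not reproduced here; the version below, with multiplier $X_i + (Y_i - 1/2)\Omega\beta$, is an assumption, as are the seed, the grid spacing and all function names, so the output need not match the figure.

```python
import math
import random


def expit(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def simulate(n, beta, omega, seed):
    """No-intercept logistic model with U ~ N(0,1) and X = U + sqrt(omega) * Z."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        x = u + math.sqrt(omega) * rng.gauss(0.0, 1.0)
        y = 1 if rng.random() < expit(beta * u) else 0
        data.append((y, x))
    return data


def g_n_slope(beta, data, omega):
    """Slope component of the averaged conditional score with alpha fixed at 0."""
    total = 0.0
    for y, x in data:
        delta_star = x + (y - 0.5) * omega * beta
        total += (y - expit(beta * delta_star)) * delta_star
    return total / len(data)


data = simulate(n=100, beta=1.0, omega=1.0, seed=0)
grid = [0.25 + 0.05 * k for k in range(201)]          # 0.25, 0.30, ..., 10.25
values = [g_n_slope(b, data, 1.0) for b in grid]
sign_changes = sum(1 for a, b in zip(values, values[1:]) if a * b < 0)
print("sign changes on the grid:", sign_changes)
```

Each sign change brackets at least one root, so the count is a lower bound on the number of roots in the interval for that particular sample.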

The problem of multiple roots is not specific to logistic

quadratically through the term $\beta^T\Omega\beta$.

The next section discusses the problem of multiple roots and suggests a strategy for locating the appropriate root.

5. THE PROBLEM OF MULTIPLE ROOTS

It is often the case that (2.7) and (2.10) have multiple roots. This is not a finite-sample problem; it persists asymptotically. Thus a strategy is required for selecting the correct solution to equations (2.7) and (2.10).

As a means of motivation, the problem of multiple roots will be discussed first in the context of the simple linear errors-in-variables version of model (1.3). The insights gained from this investigation are then generalized to nonlinear models and illustrated in the context of logistic regression.

Let $\beta_0$ be a scalar and suppose that given $U = u$, $Y$ has a normal distribution with mean $\alpha+\beta_0u$ and variance $\sigma^2$. The conditional distribution of $Y|\Delta=\delta$ is normal with variance $\sigma^2/(1+\beta_0^T\Omega\beta_0)$ and mean $(\alpha+\beta_0^T\delta)/(1+\beta_0^T\Omega\beta_0)$. It can be shown that for this model the estimating equations (2.7) imply that $\hat\alpha_S = \bar Y-\hat\beta_S\bar X$ and $\hat\beta_S$ solves

$$-\hat\beta_S^2\,\Omega S_{YX} + (S_{YY}\Omega-S_{XX})\hat\beta_S + S_{YX} = 0, \qquad (5.1)$$

where

$$S_{YX} = \sum_{i=1}^n (Y_i-\bar Y)(X_i-\bar X); \quad S_{YY} = \sum_{i=1}^n (Y_i-\bar Y)^2; \quad S_{XX} = \sum_{i=1}^n (X_i-\bar X)^2.$$

This quadratic equation has two real roots, $\hat\beta_{S,1}$ and $\hat\beta_{S,2}$ (Kendall and Stuart 1979, Ch. 29), converging to $\beta_0$ and $-1/(\Omega\beta_0)$, $(\beta_0 \ne 0)$. A number of root-selection criteria can be formulated for this model but unfortunately they do not generalize to nonlinear multiple-regression models. For example, the correct root, $\hat\beta_{S,1}$, has, at least asymptotically, the same sign as the so-called naive estimator, $\hat\beta_N = S_{YX}/S_{XX}$.
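For the scalar linear case the two roots and the sign-based selection rule can be written out directly. The sketch below is a minimal illustration; the function names and the numerical inputs are made up for the example and are not taken from any data set in the paper.

```python
import math


def sufficiency_roots(s_yx, s_xx, s_yy, omega):
    """Both roots of -b^2*omega*s_yx + (s_yy*omega - s_xx)*b + s_yx = 0."""
    if s_yx == 0 or omega <= 0:
        raise ValueError("requires s_yx != 0 and omega > 0")
    a = -omega * s_yx
    b = s_yy * omega - s_xx
    c = s_yx
    # Discriminant equals b^2 + 4*omega*s_yx^2 > 0, so both roots are real.
    disc = math.sqrt(b * b - 4.0 * a * c)
    return ((-b + disc) / (2.0 * a), (-b - disc) / (2.0 * a))


def select_root(s_yx, s_xx, s_yy, omega):
    """Pick the root that shares the sign of the naive slope s_yx / s_xx."""
    naive = s_yx / s_xx
    return next(r for r in sufficiency_roots(s_yx, s_xx, s_yy, omega) if r * naive > 0)


roots = sufficiency_roots(2.0, 4.0, 3.0, 1.0)
chosen = select_root(2.0, 4.0, 3.0, 1.0)
print(roots, chosen)
```

The product of the two roots is always $-1/\Omega$, so they have opposite signs and exactly one of them matches the sign of the naive slope.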

Consider the function

$$f(\beta,\lambda) = -\beta^2\lambda\Omega S_{YX} + (S_{YY}\lambda\Omega-S_{XX})\beta + S_{YX} \qquad (5.2)$$

obtained by replacing $\Omega$ with $\lambda\Omega$ in (5.1). Note that when $\lambda = 0$ the equation

$$f(\beta,\lambda) = 0 \qquad (5.3)$$

has the unique solution $\hat\beta_N = S_{YX}/S_{XX}$; while for $\lambda = 1$, (5.3) and (5.1) are identical. Since (5.3) has a unique solution at $\lambda = 0$, the implicit function theorem guarantees the existence of a unique function, $\hat\beta(\lambda)$, solving (5.3) for all $\lambda$ in some neighborhood of $\lambda = 0$. It transpires that $\hat\beta(\lambda)$ exists uniquely for all $\lambda \in [0,1]$ and that $\hat\beta(1) = \hat\beta_{S,1}$. That is, the "correct" root of (5.1) is determined by the fact that it lies on the locus of solutions, $\{\hat\beta(\lambda): 0 \le \lambda \le 1\}$, to (5.3), which in turn is uniquely determined by the condition that $\hat\beta(0) = S_{YX}/S_{XX}$. These ideas are illustrated in Figure 2.
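The continuation idea for the linear model can be sketched in a few lines: start at the naive slope, increase the scale factor on $\Omega$ in small steps from 0 to 1, and at each step run Newton's method on the rescaled quadratic from the previous solution. The names `lam`, `trace_linear` and the summary statistics used below are illustrative assumptions.

```python
def f(beta, lam, s_yx, s_xx, s_yy, omega):
    """Quadratic estimating function with omega replaced by lam * omega."""
    return -beta ** 2 * lam * omega * s_yx + (s_yy * lam * omega - s_xx) * beta + s_yx


def f_beta(beta, lam, s_yx, s_xx, s_yy, omega):
    """Derivative of f with respect to beta."""
    return -2.0 * beta * lam * omega * s_yx + (s_yy * lam * omega - s_xx)


def trace_linear(s_yx, s_xx, s_yy, omega, steps=10, tol=1e-12):
    """Follow the root from the naive slope (lam = 0) to lam = 1 by Newton steps."""
    beta = s_yx / s_xx                 # unique solution at lam = 0
    path = [beta]
    for k in range(1, steps + 1):
        lam = k / steps
        for _ in range(50):            # Newton iteration warm-started at the previous root
            step = f(beta, lam, s_yx, s_xx, s_yy, omega) / f_beta(beta, lam, s_yx, s_xx, s_yy, omega)
            beta -= step
            if abs(step) < tol:
                break
        path.append(beta)
    return path


path = trace_linear(2.0, 4.0, 3.0, 1.0)
print([round(b, 4) for b in path])
```

With these inputs the path starts at the naive value 0.5 and ends at the positive root of the full quadratic, never jumping to the negative root.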

Note that when $\lambda = 0$, $G_n(\alpha,\beta,0)$ is the gradient of an ordinary logistic log likelihood. Consequently, except in cases of quasi- or complete separation (Santner and Duffy, 1986), the equation $G_n(\alpha,\beta,0) = 0$ possesses a unique finite solution, $\hat\theta_N = (\hat\alpha_N,\hat\beta_N^T)^T$, which is easily found using a Newton-Raphson iteration; $\hat\theta_N$ is the so-called naive estimator. As in the linear model the implicit function theorem guarantees the existence of a unique family of solutions, $\hat\theta(\lambda)$, to the equation

$$G_n(\alpha,\beta,\lambda) = 0,$$

in some neighborhood of $\lambda = 0$ such that $\hat\theta(0) = \hat\theta_N$. If $\hat\theta(\lambda)$ exists uniquely for all $\lambda \in [0,1]$ then the asymptotic arguments in Appendix I suggest that $\hat\theta(1)$ is the consistent estimator we seek.

A simple example illustrates the preceding argument. Consider the problem of fitting a no-intercept ($\alpha = 0$) logistic model to the data described at the end of Section 4. Figure 1 indicates that the estimating equation (4.5) has at least three roots. In Figure 3 is plotted the second component of $G_n(\alpha,\beta,\lambda)$ for $\lambda = 0.00$, 0.25, 0.50, 0.75 and 1.00. The figure clearly shows which root is connected continuously to the naive estimator. Figure 4 contains plots of the same functions depicted in Figure 3 but for a data set of size n=1000. This figure

For models other than logistic or linear regression the root-finding argument proceeds similarly. Let $\psi(Y,X,\theta,\lambda)$ denote either (2.8) or (2.9) with $\lambda\Omega$ in place of $\Omega$. Then $\hat\theta(\lambda)$ solves

$$0 = G_n(\theta,\lambda) = n^{-1}\sum_{i=1}^n \psi(Y_i,X_i,\theta,\lambda). \qquad (5.5)$$

6. THE MEASUREMENT ERROR TRACE

Let $\hat\theta(\lambda)$, $0 \le \lambda \le 1$, be the solution locus discussed in the previous section. Generally $\hat\theta(0)$ is easily obtained using standard computational methods; this is the naive estimator. For $\lambda > 0$, $\hat\theta(\lambda)$ can be found by employing a Newton-Raphson iteration of (5.5) starting from $\hat\theta(0)$. This iteration will converge to the desired solution provided $\lambda$ is sufficiently close to zero. The solution locus, $\hat\theta(\lambda)$, $(0 \le \lambda \le 1)$, can be generated on a grid of $\lambda$ values $\{0 = \lambda_0 < \lambda_1 < \cdots < \lambda_k = 1\}$ successively by using a Newton-Raphson iteration starting at $\hat\theta(\lambda_i)$ to compute $\hat\theta(\lambda_{i+1})$. The iteration scheme converges to the desired solution provided the grid mesh is sufficiently small.
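The grid-continuation scheme just described might look as follows for a scalar, no-intercept logistic model. This is a minimal sketch under several assumptions that are not taken from the paper: the slope estimating function uses the multiplier $X_i + (Y_i - 1/2)\lambda\Omega\beta$, the derivative needed by Newton's method is approximated by a central difference rather than computed analytically, and the simulated data, seed and function names are illustrative only.

```python
import math
import random


def expit(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def simulate(n, beta, omega, seed):
    """No-intercept logistic model with U ~ N(0,1) and X = U + sqrt(omega) * Z."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        x = u + math.sqrt(omega) * rng.gauss(0.0, 1.0)
        y = 1 if rng.random() < expit(beta * u) else 0
        data.append((y, x))
    return data


def g_n(beta, lam, data, omega):
    """Averaged slope score with omega replaced by lam * omega."""
    total = 0.0
    for y, x in data:
        delta_star = x + (y - 0.5) * lam * omega * beta
        total += (y - expit(beta * delta_star)) * delta_star
    return total / len(data)


def newton(beta, lam, data, omega, tol=1e-10, h=1e-5):
    """Newton-Raphson on g_n(., lam) using a central-difference derivative."""
    for _ in range(100):
        slope = (g_n(beta + h, lam, data, omega) - g_n(beta - h, lam, data, omega)) / (2.0 * h)
        step = g_n(beta, lam, data, omega) / slope
        beta -= step
        if abs(step) < tol:
            break
    return beta


def measurement_error_trace(data, omega, steps=20):
    """Return [(lam, beta_hat(lam))] on an even grid, warm-starting each solve."""
    beta = newton(0.0, 0.0, data, omega)       # naive estimate at lam = 0
    trace = [(0.0, beta)]
    for k in range(1, steps + 1):
        lam = k / steps
        beta = newton(beta, lam, data, omega)  # start from the previous grid point
        trace.append((lam, beta))
    return trace


data = simulate(n=500, beta=1.0, omega=0.25, seed=2)
trace = measurement_error_trace(data, omega=0.25)
for lam, b in trace[::5]:
    print(round(lam, 2), round(b, 4))
```

Plotting the second coordinate of `trace` against the first gives a curve of the kind the section goes on to describe; a finer grid (larger `steps`) is the natural remedy if a Newton solve wanders to a different root.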

Note that $\hat\theta(\lambda)$ is the estimator one would obtain if the measurement error covariance had been assumed to be proportional to $\lambda\Omega$ instead of $\Omega$.

It is often the case that a measurement error model is fit to data primarily for examining the effects of measurement error on estimated parameters. In these cases $\Omega$ may not be known, but represents a best guess or crude estimate of the measurement error covariance matrix. A plot of $\hat\theta(\lambda)$ versus $\lambda$ illustrates the nature of the dependence of the estimated parameters on the magnitude of the (assumed) error covariance. Since it is similar in design and intent to a ridge trace, the plot of

$(0 \le \lambda \le 1)$, has been computed by the procedure suggested in the preceding paragraph, the measurement error trace is easily constructed.

Figures 5-8 display examples of the measurement error trace for some different logistic regression measurement-error models. In each example one thousand observations were generated according to the logistic model with $\alpha = 1$, $\beta^T = (0.00, 0.25, 0.50)$, $U \sim N(0,I_3)$, $Z \sim N(0,I_3)$ and $X = U+\Omega^{1/2}Z$.

Figures 5-8 differ only with respect to the choice of $\Omega$. For Figure 5, $\Omega = \Omega_5 = I_3$; for Figure 6, $\Omega = \Omega_6$; for Figure 7, $\Omega = \Omega_7$; and for Figure 8, $\Omega = \Omega_8$ [$\Omega_6$, $\Omega_7$ and $\Omega_8$ are 3 × 3 matrices; entries illegible].

Only the estimated slope coefficients, $\hat\beta(\lambda)$, are plotted in Figures 5-8. Recall that $\hat\beta(0)$ is the naive estimate, i.e., the estimate obtained by fitting a logistic model to the observed data ignoring measurement error; $\hat\beta(1)$ is the errors-in-variables estimate. Figures 5-8 clearly illustrate those parameters which are affected by the covariable measurement error.

ACKNOWLEDGEMENTS


REFERENCES

Begun, J.M., Hall, W.J., Huang, W.M. & Wellner, J.A. (1983). Information and asymptotic efficiency in parametric-nonparametric models. Ann. Statist. 11, 432-52.

Bickel, P.J. & Ritov, Y. (1987). Efficient estimation in the errors-in-variables model. Ann. Statist. 15, 513-40.

Cox, D.R. & Hinkley, D.V. (1974). Theoretical Statistics. London: Chapman and Hall.

Gleser, L.J. (1981). Estimation in a multivariate 'errors-in-variables' regression model: large sample results. Ann. Statist. 9, 24-44.

Kendall, M.G. & Stuart, A. (1979). The Advanced Theory of Statistics, 2. London: Griffin.

Lindsay, B.G. (1980). Nuisance parameters, mixture models, and the efficiency of partial likelihood estimators. Philos. Trans. Roy. Soc. London Ser. A 296, 639-65.

Lindsay, B.G. (1982). Conditional score functions: some optimality results. Biometrika 69, 503-12.

Lindsay, B.G. (1983). Efficiency of the conditional score in a mixture setting. Ann. Statist. 11, 486-97.

Lindsay, B.G. (1985). Using empirical partially Bayes inference for increased efficiency. Ann. Statist. 13, 914-32.

McCullagh, P. & Nelder, J.A. (1983). Generalized Linear Models. London: Chapman and Hall.

Neyman, J. & Scott, E.L. (1948). Consistent estimates based on partially consistent observations. Econometrica 16, 1-32.

Pfanzagl, J. (1982). Contributions to a General Asymptotic Statistical Theory. New York: Springer-Verlag.

Santner, T.J. & Duffy, D.E. (1986). A note on A. Albert and J.A. Anderson's conditions for the existence of maximum likelihood estimates in logistic regression models. Biometrika 73, 755-58.

Stefanski, L.A. & Carroll, R.J. (1985). Covariate measurement error in logistic regression. Ann. Statist. 13, 1335-51.

APPENDIX 1

An argument is given which suggests that the root selection procedure presented in Section 5 works asymptotically.

Let $\Lambda$ denote an element in $R^s$ and $\tau$ an element in [0,1]. Let $\{G(\cdot,\cdot), G_1(\cdot,\cdot), G_2(\cdot,\cdot), \ldots\}$ be a sequence of functions defined on $R^s\times[0,1]$, taking values in $R^s$ such that $G_n(\cdot,\cdot)$ converges to $G(\cdot,\cdot)$ uniformly on compact sets in $R^s\times[0,1]$.

Assume that $G(\Lambda,0)$ has a unique root, i.e., the equation

$$G(\Lambda,0) = 0 \qquad (A.1)$$

has a unique solution, $\Lambda_0$. Also assume that there exists a unique element of $C^s[0,1]$, $\theta(\cdot)$, such that

$$G\{\theta(\tau),\tau\} = 0 \text{ for all } \tau \in [0,1]. \qquad (A.2)$$

Note that $\theta(0) = \Lambda_0$.

Finally, make the assumption that for each fixed $\tau$, $\theta(\tau)$ is an isolated root of $G(\Lambda,\tau) = 0$ uniformly in $\tau$. More specifically it is assumed:

There exists an $\eta > 0$, not depending on $\tau$, such that $G(\Lambda,\tau) = 0$ and $\|\Lambda-\theta(\tau)\| \le \eta$ imply $\Lambda = \theta(\tau)$, for all $\tau$. $\qquad$ (A.3)

Under these conditions it is possible to prove the following result.

Proposition A.1: If for each $n$, there exists $\theta_n(\cdot) \in C^s[0,1]$ such that

$$G_n\{\theta_n(\tau),\tau\} = 0 \text{ for all } \tau \in [0,1]$$

[...]

$$\lim_n \sup_\tau \|\theta_n(\tau)-\theta(\tau)\| = 0.$$

Proof: It is sufficient to show that given any $\epsilon > 0$,

$$\limsup_n \sup_\tau \|\theta_n(\tau)-\theta(\tau)\| \le \epsilon. \qquad (A.4)$$

Note that it is also sufficient to consider only those $\epsilon$ for which $0 < \epsilon < \eta$, where $\eta$ is defined in (A.3). Suppose there exists some $\epsilon$, $0 < \epsilon < \eta$, for which (A.4) does not hold. It is shown that this leads to a contradiction.

Define $D_n(\cdot)$, $\alpha_n$, and $\tau_n$ by the equations

$$D_n(\tau) = \|\theta_n(\tau)-\theta(\tau)\|; \qquad \alpha_n = \sup_\tau D_n(\tau) = D_n(\tau_n).$$

If (A.4) does not hold then it is possible to find a subsequence $\{n_k\}$, such that $\alpha_{n_k} > \epsilon$ for all $n_k$. Since $D_{n_k}(0) \to 0$, $D_{n_k}(\tau_{n_k}) = \alpha_{n_k} > \epsilon$, and $D_n(\cdot)$ is a continuous function, it follows that for $n_k$ large enough, $D_{n_k}(\tau) = \epsilon$ for some $\tau$, call it $\tau^*_{n_k}$. Note that the sequences $\{\tau^*_{n_k}\}$ and $\{\theta_{n_k}(\tau^*_{n_k})\}$ are both contained in compact sets. Thus a further subsequence $\{n_j\}$ can be found along which

$$\tau^*_{n_j} \to \tau^*, \qquad \theta_{n_j}(\tau^*_{n_j}) \to \Lambda^*, \qquad \text{and} \qquad \epsilon = D_{n_j}(\tau^*_{n_j}) \to \|\Lambda^*-\theta(\tau^*)\|.$$

Let $\Lambda_{n_j} = \theta_{n_j}(\tau^*_{n_j})$; since $G_n \to G$ uniformly on compact sets,

$$G_{n_j}(\Lambda_{n_j},\tau^*_{n_j}) - G(\Lambda_{n_j},\tau^*_{n_j}) \to 0.$$

By continuity of $G$,

$$G(\Lambda_{n_j},\tau^*_{n_j}) - G(\Lambda^*,\tau^*) \to 0$$

and thus

$$G_{n_j}(\Lambda_{n_j},\tau^*_{n_j}) - G(\Lambda^*,\tau^*) \to 0.$$

But $G_{n_j}(\Lambda_{n_j},\tau^*_{n_j}) = 0$ and this means that $G(\Lambda^*,\tau^*) = 0$. So it has been shown that $G(\Lambda^*,\tau^*) = 0$, and $\|\Lambda^*-\theta(\tau^*)\| = \epsilon \le \eta$ and thus (A.3) implies that $\Lambda^* = \theta(\tau^*)$. Since then $\|\Lambda^*-\theta(\tau^*)\| = 0$, this contradicts the assumption that $\epsilon > 0$.

In the application of the proposition to the root selection problem, say for example in structural logistic regression, $G_n$ is given by (5.4) and

$$G(\alpha,\beta,\tau) = E_{\theta_0}\{G_n(\alpha,\beta,\tau)\}.$$

Pointwise convergence of $G_n$ to $G$, either almost surely or in probability, follows from the corresponding law of large numbers. Smoothness conditions on $\psi_C$ and regularity conditions on the joint density of $(Y,X)$ will generally guarantee that the convergence is uniform on compact sets.

The existence of a unique solution to the equation $G(\alpha,\beta,0) = 0$ follows from the fact, again under regularity conditions, that $G(\alpha,\beta,0)$ is the expected gradient of a convex likelihood function; $\Lambda_0$ is just the limit of the so-called naive estimator. Note also that by design, $G(\alpha_0,\beta_0,1) = 0$.

Now provided that $\partial G(\alpha,\beta,\tau)/\partial(\alpha,\beta)$ is nonsingular, $0 \le \tau \le 1$ (at

rule out exceptional behavior of $G(\alpha,\beta,\tau)$ for $0 \le \tau \le 1$ and thus will generally hold under sufficient regularity conditions.

Finally concavity of the usual logistic likelihood ensures that $G_n(\alpha,\beta,0) = 0$ has, for $n$ large enough, a unique convergent solution. Thus provided $[\alpha_n(\tau),\beta_n^T(\tau)]$ solves $G_n\{\alpha_n(\tau),\beta_n(\tau),\tau\} = 0$, for all $\tau \in$

Figure 1. Plot of $G_n(0,\beta)$ vs. $\beta$, $(0.25 < \beta < 10.25)$; model, logistic


Figure 3. Plot of $G_n(0,\beta,\lambda)$, $\lambda = 0(0.25)1$, vs. $\beta$, $(0.25 < \beta < 10.25)$; model,


Figure 5. Plot of $\hat\beta(\lambda)$ vs. $\lambda$, $0 < \lambda < 1$; model, logistic regression, $(\alpha,\beta_1,\beta_2,\beta_3) = (1, 0.00, 0.25, 0.50)$, $U \sim N(0,I_3)$, $Z \sim N(0,I_3)$,
