## Asymptotic results with generalized estimating equations for longitudinal data II

### R.M. Balan*

### University of Ottawa

### I. Schiopu-Kratina†

### Statistics Canada

### March 9, 2004

*Abstract.* We consider the marginal models of Liang and Zeger (1986) for the analysis of longitudinal data and we develop a theory of statistical inference for such models. We prove the existence, weak consistency and asymptotic normality of a sequence of estimators defined as roots of pseudo-likelihood equations.

*Keywords:* generalized estimating equations, generalized linear model, consistency, asymptotic normality.

*AMS Classification: Primary 62F12; Secondary 62J12.*

### 1 Introduction

Longitudinal data sets arise in biostatistics and life-time testing problems when the responses of the individuals are recorded repeatedly over a period of time. By controlling for individual differences, longitudinal studies are well-suited to measure change over time. On the other hand, they require the use of special statistical techniques because the responses on the same individual tend to be strongly correlated. In a seminal paper Liang and Zeger (1986) proposed the use of generalized linear models (GLM) for the analysis of longitudinal data.

In a cross-sectional study, a GLM is used when there are reasons to believe that each response $y_i$ depends on an observable vector $\mathbf{x}_i$ of covariates (see the monograph of McCullagh and Nelder, 1989). Typically this dependence is specified by an unknown parameter $\beta$ and a link function $\mu$ via the relationship

*Research supported by a research grant from the Natural Sciences and Engineering Research Council of Canada. *E-mail address*: rbala348@science.uottawa.ca

†Corresponding author. *Postal address*: Statistics Canada, HSMD, 16RHC, Ottawa, ON,

$\mu_i(\beta) = \mu(\mathbf{x}_i^T\beta)$, where $\mu_i(\beta)$ is the mean of $y_i$. For one-dimensional observations, the maximum quasi-likelihood estimator $\hat\beta_n$ is defined as the solution of the equation

$$\sum_{i=1}^{n}\dot\mu_i(\beta)\,v_i(\beta)^{-1}(y_i-\mu_i(\beta)) = 0, \qquad (1)$$

where $\dot\mu_i$ is the derivative of $\mu_i$ and $v_i(\beta)$ is the variance of $y_i$. Note that this equation simplifies considerably if we assume that $v_i(\beta) = \phi_i\dot\mu(\mathbf{x}_i^T\beta)$, with a nuisance scale parameter $\phi_i$. In fact (1) is a genuine likelihood equation if the $y_i$'s are independent with densities $c(y_i,\phi_i)\exp\{\phi_i^{-1}[(\mathbf{x}_i^T\beta)y_i - b(\mathbf{x}_i^T\beta)]\}$.

In a longitudinal study, the components of an observation $\mathbf{y}_i = (y_{i1},\ldots,y_{im})^T$ represent repeated measurements at times $1,\ldots,m$ for subject $i$. The approach proposed by Liang and Zeger is to impose the usual assumptions of a GLM only for the marginal scalar observations $y_{ij}$ and the $p$-dimensional design vectors $\mathbf{x}_{ij}$. If the correlation matrices within individuals are known (but the entire likelihood is not specified), then the $m$-dimensional version of (1) becomes a *generalized estimating equation* (GEE).

In this article we prove the existence, weak consistency and asymptotic normality of a sequence of estimators, defined as solutions (roots) of *pseudo-likelihood equations* (see Shao, 1999, p. 315). We work within a nonparametric set-up similar to that of Liang and Zeger and build upon the impressive work of Xie and Yang (2003).

Our approach differs from that of Liang and Zeger (1986), Xie and Yang (2003), and Schiopu-Kratina (2002) in the treatment of the correlation structure of the data recorded for the same individual across time. As in Rao (1998), we first obtain a sequence of preliminary consistent estimators $(\tilde\beta_n)_n$ of the main parameter $\beta_0$ (under the "working independence assumption"), which we use to consistently estimate the average of the true individual correlations. We then create the pseudo-likelihood equations whose solutions provide our final sequence of consistent estimators of the main parameter. In practice, the analyst would first use numerical approximation methods (such as the Newton-Raphson method) to solve a simple estimating equation, where each individual correlation matrix is the identity matrix. The next step would be to solve for $\beta$ in the pseudo-likelihood equation, in which all the quantities can be calculated from the data. This approach eliminates the need to introduce nuisance parameters or to guess at the correlation structures, and thus avoids some of the problems associated with these methods (see p. 112-113 of Fahrmeir and Tutz, 1991).
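The two-step procedure just described can be sketched numerically. The following is a minimal illustration only, not the authors' implementation: it assumes a log link ($\mu(y) = e^y$, so $\sigma_{ij}^2 = \mu_{ij}$), Fisher-scoring updates, and NumPy; all function names and data shapes are ours.

```python
import numpy as np

def gee_step(y, X, beta, R_inv):
    """One Fisher-scoring step for the GEE with fixed inverse working
    correlation R_inv; R_inv = I gives the working-independence GEE.
    Shapes: y is (n, m), X is (n, m, p)."""
    p = X.shape[2]
    g, H = np.zeros(p), np.zeros((p, p))
    for Xi, yi in zip(X, y):
        mu = np.exp(Xi @ beta)               # mu_i(beta), log link
        Ah = np.diag(np.sqrt(mu))            # A_i(beta)^{1/2}
        Ahi = np.diag(1.0 / np.sqrt(mu))     # A_i(beta)^{-1/2}
        g += Xi.T @ Ah @ R_inv @ Ahi @ (yi - mu)
        H += Xi.T @ Ah @ R_inv @ Ah @ Xi     # scoring matrix H_n
    return beta + np.linalg.solve(H, g)

def two_step(y, X, iters=20):
    n, m, p = X.shape
    # Step 1: working independence (R_i = I) -> preliminary beta_tilde
    beta = np.zeros(p)
    for _ in range(iters):
        beta = gee_step(y, X, beta, np.eye(m))
    # Average-correlation estimate:
    # R_tilde = (1/n) sum_i A^{-1/2} eps eps^T A^{-1/2} at beta_tilde
    R = np.zeros((m, m))
    for Xi, yi in zip(X, y):
        mu = np.exp(Xi @ beta)
        r = (yi - mu) / np.sqrt(mu)          # normalized residuals
        R += np.outer(r, r) / n
    # Step 2: solve the pseudo-likelihood equation with R_tilde plugged in
    R_inv = np.linalg.inv(R)
    for _ in range(iters):
        beta = gee_step(y, X, beta, R_inv)
    return beta
```

On simulated marginal-Poisson data both steps recover the true $\beta_0$ up to sampling error; step 2 reuses the same scoring routine, only with the estimated inverse correlation in place of the identity.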
We note that the assumptions that we require for this two-step procedure (our conditions $(\widetilde{AH})$, $(\widetilde{I}_w)$, $(\widetilde{C}_w)$) are only slightly more stringent than those of Xie and Yang (2003). They reduce to conditions related to the "working independence assumption" when the average of the true correlation matrices is asymptotically nonsingular (our hypothesis (H)).

As in Lai, Robbins and Wei (1979), where the linear model is treated, we relax the assumption of independence between subjects and consider residuals

which form a martingale difference sequence. Thus our results are more general than results published so far, e.g. Xie and Yang (2003) for GEE, and Shao (1992) for GLM.

Since a GEE is not a derivative, most of the technical difficulties surface when proving the existence of roots of such general estimating equations. Two distinct methods have been developed to deal with this problem. One gives a local solution of the GEE and relies on the classical proof of the inverse function theorem (Yuan and Jennrich, 1998; Schiopu-Kratina, 2003). The other method, which uses a result from topology, was first brought into this context by Chen, Hu and Ying (1999) and was extensively used by Xie and Yang (2003) in their proof of consistency. We adopt this second method, which facilitates a comparison of our results to those of Xie and Yang (2003) and incorporates the inference results for GLM contained in the seminal work of Fahrmeir and Kaufmann (1987).

This article is organized as follows. Section 2 is dedicated to the existence and weak consistency of a sequence of estimators of the main parameter. To accommodate the estimation of the average of the correlation matrices in the martingale set-up, we require two conditions: $(C1)$ is a boundedness condition on the $(2+\delta)$-moments of the normalized residuals, whereas $(C2)$ is a consistency condition on the normalized conditional covariance matrix. In this context we use the martingale strong law of large numbers of Kaufmann (1987). Section 3 presents the asymptotic normality of our estimators. This is obtained under slightly stronger conditions than those of Xie and Yang (2003), by applying the classical martingale central limit theorem (see Hall and Heyde, 1980). For ease of exposition, we placed the more technical proofs in Appendices A and B.

We first introduce some matrix notation (see Schott, 1997). If $A$ is a $p\times p$ matrix, we denote by $\|A\|$ its spectral norm, by $\det(A)$ its determinant and by $\mathrm{tr}(A)$ its trace. If $A$ is a symmetric matrix, we denote by $\lambda_{\min}(A)$ (respectively $\lambda_{\max}(A)$) its minimum (respectively maximum) eigenvalue. For any matrix $A$, $\|A\| = \{\lambda_{\max}(A^TA)\}^{1/2}$. For a $p$-dimensional vector $\mathbf{x}$, we use the Euclidean norm $\|\mathbf{x}\| = (\mathbf{x}^T\mathbf{x})^{1/2} = \mathrm{tr}(\mathbf{x}\mathbf{x}^T)^{1/2}$. We let $A^{1/2}$ be the symmetric square root of a positive definite matrix $A$ and $A^{-1/2} = (A^{1/2})^{-1}$. Finally, we use the matrix notation $A\le B$ if $\lambda^T A\lambda \le \lambda^T B\lambda$ for any $p$-dimensional vector $\lambda$.
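These notational conventions can be spot-checked numerically; a small sketch (assuming NumPy, with arbitrary matrices not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))

# Spectral norm: ||A|| = {lambda_max(A^T A)}^{1/2}
spec = np.sqrt(np.linalg.eigvalsh(A.T @ A).max())
assert np.isclose(spec, np.linalg.norm(A, 2))

# Symmetric square root of a positive definite matrix S
S = A @ A.T + np.eye(3)          # positive definite by construction
w, V = np.linalg.eigh(S)
S_half = V @ np.diag(np.sqrt(w)) @ V.T
assert np.allclose(S_half @ S_half, S)

# The (Loewner) order A <= B holds iff B - A is nonnegative definite
B = S + np.eye(3)
assert np.linalg.eigvalsh(B - S).min() >= 0
```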

Throughout this article we will assume that the number of longitudinal observations on each individual is fixed and equal to $m$. More precisely, we will denote by $\mathbf{y}_i := (y_{i1},\ldots,y_{im})^T$, $i\le n$, a longitudinal data set consisting of $n$ respondents, where the components of $\mathbf{y}_i$ represent measurements at different times on subject $i$. The observations $y_{ij}$ are recorded along with a corresponding $p$-dimensional vector $\mathbf{x}_{ij}$ of covariates, and the marginal expectations and variances are specified in terms of the regression parameter $\beta$ through $\theta_{ij} = \mathbf{x}_{ij}^T\beta$ as follows:

$$\mu_{ij}(\beta) := E_\beta(y_{ij}) = \mu(\theta_{ij}), \qquad \sigma_{ij}^2(\beta) := \mathrm{Var}_\beta(y_{ij}) = \dot\mu(\theta_{ij}), \qquad (2)$$

where $\mu$ is a continuously differentiable link function with $\dot\mu > 0$, i.e. we consider only canonical link functions.

Here are the most commonly used such link functions:

1. in the linear regression, $\mu(y) = y$;

2. in the log regression for count data, $\mu(y) = \exp(y)$;

3. in the logistic regression for binary data, $\mu(y) = \exp(y)/[1+\exp(y)]$;

4. in the probit regression for binary data, $\mu(y) = \Phi(y)$, where $\Phi$ is the standard normal distribution function; we have $\dot\Phi(y) = (2\pi)^{-1/2}\exp(-y^2/2)$.
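The four canonical links above and their (positive) derivatives can be written down directly; a minimal sketch using only the standard library, for illustration:

```python
import math

# Canonical link functions mu and their derivatives mu_dot (all positive),
# matching cases 1-4 in the text.
links = {
    "linear":   (lambda y: y,                        lambda y: 1.0),
    "log":      (lambda y: math.exp(y),              lambda y: math.exp(y)),
    "logistic": (lambda y: math.exp(y) / (1 + math.exp(y)),
                 lambda y: math.exp(y) / (1 + math.exp(y)) ** 2),
    # probit: mu = Phi (standard normal cdf), mu_dot = standard normal density
    "probit":   (lambda y: 0.5 * (1 + math.erf(y / math.sqrt(2))),
                 lambda y: math.exp(-y * y / 2) / math.sqrt(2 * math.pi)),
}

for name, (mu, mu_dot) in links.items():
    for y in (-1.0, 0.0, 2.0):
        h = 1e-6
        num = (mu(y + h) - mu(y - h)) / (2 * h)   # central difference
        assert mu_dot(y) > 0                      # mu_dot > 0 (canonical)
        assert abs(num - mu_dot(y)) < 1e-5, name  # matches the derivative
```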

In the sequel the unknown parameter $\beta$ lies in an open set $\mathcal{B}\subseteq\mathbb{R}^p$ and $\beta_0$ is the true value of this parameter. We normally drop the parameter $\beta_0$ to avoid cumbersome notation.

Let $\mu_i(\beta) = (\mu_{i1}(\beta),\ldots,\mu_{im}(\beta))^T$, $A_i(\beta) = \mathrm{diag}(\sigma_{i1}^2(\beta),\ldots,\sigma_{im}^2(\beta))$ and $\Sigma_i(\beta) := \mathrm{Cov}_\beta(\mathbf{y}_i)$. Note that $\Sigma_i = A_i^{1/2}\bar R_i A_i^{1/2}$, where $\bar R_i$ is the true correlation matrix of $\mathbf{y}_i$ at $\beta_0$. Let $X_i = (\mathbf{x}_{i1},\ldots,\mathbf{x}_{im})^T$.

We consider the sequence $\epsilon_i(\beta) = (\epsilon_{i1}(\beta),\ldots,\epsilon_{im}(\beta))^T$ with $\epsilon_{ij}(\beta) = y_{ij} - \mu_{ij}(\beta)$, and we assume that the residuals $(\epsilon_i)_{i\ge 1}$ form a martingale difference sequence, i.e.

$$E(\epsilon_i\,|\,\mathcal{F}_{i-1}) = 0 \quad \text{for all } i\ge 1,$$

where $\mathcal{F}_i$ is the minimal $\sigma$-field with respect to which $\epsilon_1,\ldots,\epsilon_i$ are measurable.

This is a natural generalization of the case of independent observations. Finally, to avoid keeping track of various constants, we agree to denote by $C$ a generic constant which does not depend on $n$, but may differ from case to case.

### 2 Asymptotic Existence and Consistency

We consider the generalized estimating equations (GEE) of Xie and Yang (2003) in the case when the "working" correlation matrices are $R_i^{\mathrm{indep}} = I$ for all $i$. This is also known as the "working independence" case, the word "independence" referring to the observations on the same individual. Let $(\tilde\beta_n)_n$ be a sequence of estimators such that

$$P(g_n^{\mathrm{indep}}(\tilde\beta_n) = 0)\to 1 \quad \text{and} \quad \tilde\beta_n \xrightarrow{P} \beta_0, \qquad (3)$$

where $g_n^{\mathrm{indep}}(\beta) = \sum_{i=1}^n X_i^T\epsilon_i(\beta)$ is the "working independence" GEE.

The following quantities have been used extensively in the work of Xie and Yang (2003) and play an important role in the conditions for the existence and consistency of $\tilde\beta_n$:

$$H_n^{\mathrm{indep}} = \sum_{i=1}^n X_i^T A_i X_i, \qquad \pi_n^{\mathrm{indep}} := \frac{\max_{i\le n}\lambda_{\max}((R_i^{\mathrm{indep}})^{-1})}{\min_{i\le n}\lambda_{\min}((R_i^{\mathrm{indep}})^{-1})},$$

$$\tilde\tau_n^{\mathrm{indep}} := m\max_{i\le n}\lambda_{\max}((R_i^{\mathrm{indep}})^{-1}) = m, \qquad (\gamma_n^{(0)})^{\mathrm{indep}} := \max_{i\le n,\,j\le m}\mathbf{x}_{ij}^T(H_n^{\mathrm{indep}})^{-1}\mathbf{x}_{ij}.$$

We will also use the following maxima:

$$k_n^{[2]}(\beta) = \max_{i\le n}\max_{j\le m}\left|\frac{\ddot\mu(\theta_{ij})}{\dot\mu(\theta_{ij})}\right|, \qquad k_n^{[3]}(\beta) = \max_{i\le n}\max_{j\le m}\left|\frac{\mu^{(3)}(\theta_{ij})}{\dot\mu(\theta_{ij})}\right|.$$

The fact that the residuals $(\epsilon_i)_{i\ge 1}$ form a martingale difference sequence does not change the proofs of Theorem 2 and Theorem A.1.(ii) of Xie and Yang (2003). Following their work, we conclude that the sufficient conditions for the existence of a sequence $(\tilde\beta_n)_n$ with the desired property (3) are:

$(AH)^{\mathrm{indep}}$ For any $r > 0$, $k_n^{[l],\mathrm{indep}} = \sup_{\beta\in B_n^{\mathrm{indep}}(r)} k_n^{[l]}(\beta)$, $l = 2,3$, are bounded;

$(I_w^*)^{\mathrm{indep}}$ $\lambda_{\min}(H_n^{\mathrm{indep}})\to\infty$;

$(C_w^*)^{\mathrm{indep}}$ $n^{1/2}(\gamma_n^{(0)})^{\mathrm{indep}}\to 0$;

where $B_n^{\mathrm{indep}}(r) := \{\beta;\ \|(H_n^{\mathrm{indep}})^{1/2}(\beta-\beta_0)\|\le m^{1/2}r\}$. We denote by $(C)^{\mathrm{indep}}$ the set of conditions $(AH)^{\mathrm{indep}}$, $(I_w^*)^{\mathrm{indep}}$, $(C_w^*)^{\mathrm{indep}}$.

It turns out that, in practice, the analyst will have to verify conditions similar to $(C)^{\mathrm{indep}}$ in order to produce the estimators that we propose (see Remark 5).

All the classical examples corresponding to our link functions 1-4 are within the scope of our theory. We present below two new examples.

Example 1. Suppose that $p = 2$. Let $\mathbf{x}_{ij} = (a_{ij}, b_{ij})^T$, $u_n = \sum_{i\le n,j\le m}\sigma_{ij}^2 a_{ij}^2$, $v_n = \sum_{i\le n,j\le m}\sigma_{ij}^2 b_{ij}^2$ and $w_n = \sum_{i\le n,j\le m}\sigma_{ij}^2 a_{ij}b_{ij}$. In this case

$$H_n^{\mathrm{indep}} = \begin{bmatrix} u_n & w_n \\ w_n & v_n \end{bmatrix},$$

$\lambda_{\max}(H_n^{\mathrm{indep}}) = (u_n+v_n+d_n)/2$ and $\lambda_{\min}(H_n^{\mathrm{indep}}) = (u_n+v_n-d_n)/2$, with $d_n := \sqrt{(u_n-v_n)^2 + 4w_n^2}$. Note that $w_n = \sqrt{u_n v_n}\cos\theta_n$ for $\theta_n\in[0,\pi]$ and $\det(H_n^{\mathrm{indep}}) = u_n v_n\sin^2\theta_n$ (see also p. 79 of McCullagh and Nelder, 1989). Suppose that

$$\alpha := \liminf_{n\to\infty}\sin^2\theta_n > 0.$$

Since

$$\frac{1}{\lambda_{\min}(H_n^{\mathrm{indep}})} = \frac{\lambda_{\max}(H_n^{\mathrm{indep}})}{\det(H_n^{\mathrm{indep}})} = \frac{u_n+v_n+d_n}{u_n v_n\sin^2\theta_n},$$

one can show that condition $(I_w^*)^{\mathrm{indep}}$ is equivalent to $\min(u_n,v_n)\to\infty$. On the other hand,

$$\mathbf{x}_{ij}^T(H_n^{\mathrm{indep}})^{-1}\mathbf{x}_{ij} = \frac{a_{ij}^2}{u_n} - \frac{2w_n a_{ij}b_{ij}}{u_n v_n} + \frac{b_{ij}^2}{v_n} \le \left(\frac{a_{ij}}{u_n^{1/2}} + \frac{b_{ij}}{v_n^{1/2}}\right)^2.$$

Condition $(C_w^*)^{\mathrm{indep}}$ holds if $n^{1/2}\max_{i\le n,j\le m}(u_n^{-1/2}a_{ij} + v_n^{-1/2}b_{ij})^2\to 0$.
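The eigenvalue and determinant formulas of Example 1 can be verified numerically; a small sketch (NumPy assumed; the numbers below are arbitrary illustrations, not data from the paper):

```python
import numpy as np

# H = [[u, w], [w, v]] as in Example 1, with p = 2
u, v, w = 5.0, 3.0, 1.5
H = np.array([[u, w], [w, v]])

# lambda_max = (u + v + d)/2, lambda_min = (u + v - d)/2
d = np.sqrt((u - v) ** 2 + 4 * w ** 2)
lam_max, lam_min = (u + v + d) / 2, (u + v - d) / 2
assert np.allclose(sorted([lam_min, lam_max]), np.linalg.eigvalsh(H))

# w = sqrt(u v) cos(theta)  =>  det(H) = u v sin^2(theta)
theta = np.arccos(w / np.sqrt(u * v))
assert np.isclose(np.linalg.det(H), u * v * np.sin(theta) ** 2)

# 1 / lambda_min = lambda_max / det(H)
assert np.isclose(1 / lam_min, lam_max / np.linalg.det(H))
```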

Example 2. The case of a single covariate with $p$ different levels (one-way ANOVA; see also Example 3.13 of Shao, 1999) is usually treated by identifying each of these levels with one of the $p$-dimensional vectors $\mathbf{e}_1,\ldots,\mathbf{e}_p$, where $\mathbf{e}_k$ has the $k$-th component 1 and all the other components 0. We can say that $\mathbf{x}_{ij}\in\{\mathbf{e}_1,\ldots,\mathbf{e}_p\}$ for all $i\le n$, $j\le m$. In this case, $H_n^{\mathrm{indep}}$ is a diagonal matrix. More precisely,

$$H_n^{\mathrm{indep}} = \sum_{k=1}^p \nu_n^{(k)}\mathbf{e}_k\mathbf{e}_k^T,$$

where $\nu_n^{(k)} = \sum_{i\le n,j\le m;\,\mathbf{x}_{ij}=\mathbf{e}_k}\sigma_{ij}^2$. Let $\nu_n = \min_{k\le p}\nu_n^{(k)}$. Condition $(I_w^*)^{\mathrm{indep}}$ is equivalent to $\nu_n\to\infty$ and $(C_w^*)^{\mathrm{indep}}$ is equivalent to $n^{1/2}\nu_n^{-1}\to 0$.
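The diagonal structure in Example 2 is easy to reproduce on a simulated design; a hedged sketch (NumPy assumed; sizes and variances are arbitrary illustrations):

```python
import numpy as np

# One-way ANOVA design: each x_ij is a standard basis vector e_k, so
# H_n^indep is diagonal with entries nu_n^(k) = sum of sigma_ij^2 over
# the cells with x_ij = e_k.
n, m, p = 50, 4, 3
rng = np.random.default_rng(2)
levels = rng.integers(0, p, size=(n, m))     # which e_k each x_ij equals
sigma2 = rng.uniform(0.5, 2.0, size=(n, m))  # marginal variances

H = np.zeros((p, p))
for k in range(p):
    H[k, k] = sigma2[levels == k].sum()      # nu_n^(k)

# H built cell by cell agrees with sum_i X_i^T A_i X_i
H2 = np.zeros((p, p))
for i in range(n):
    Xi = np.eye(p)[levels[i]]                # m x p design, rows are e_k^T
    H2 += Xi.T @ np.diag(sigma2[i]) @ Xi
assert np.allclose(H, H2)
```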

The method introduced by Liang and Zeger (1986) and developed recently in Xie and Yang (2003) relies heavily on the "working" correlation matrices $R_i(\alpha)$, which are chosen arbitrarily by the statistician (possibly containing a nuisance parameter $\alpha$) and are expected to be good approximations of the unknown true correlation matrices $\bar R_i$.

In the present paper, we consider an alternative approach in which at each step $n$, the "working" correlation matrices $R_i(\alpha)$, $i\le n$, are replaced by the random matrix

$$\tilde R_n := \frac{1}{n}\sum_{i=1}^n A_i(\tilde\beta_n)^{-1/2}\epsilon_i(\tilde\beta_n)\epsilon_i(\tilde\beta_n)^T A_i(\tilde\beta_n)^{-1/2},$$

which depends only on the data set and is shown to be a (possibly biased) consistent estimator of the average of the true correlation matrices

$$\bar{\bar R}_n := \frac{1}{n}\sum_{i=1}^n \bar R_i.$$

The consistency of $\tilde R_n$ is obtained under the following two conditions imposed on the (normalized) residuals $\mathbf{y}_i^* = A_i^{-1/2}\epsilon_i$, with $E(\mathbf{y}_i^*\mathbf{y}_i^{*T}) = \bar R_i$:

$(C1)$ there exists a $\delta\in(0,2]$ such that $\sup_{i\ge 1} E(\|\mathbf{y}_i^*\|^{2+\delta}) < \infty$;

$(C2)$ $\dfrac{1}{n}\displaystyle\sum_{i=1}^n V_i \xrightarrow{P} 0$, where $V_i = E(\mathbf{y}_i^*\mathbf{y}_i^{*T}\,|\,\mathcal{F}_{i-1}) - \bar R_i$.

Remark 1. Condition $(C1)$ is a bounded moment requirement which is usually needed for verifying the conditions of a martingale limit theorem, while $(C2)$ is satisfied if the observations are independent. Condition $(C2)$ is in fact a requirement on the (normalized) conditional covariance matrix $V_n = \sum_{i=1}^n E(\mathbf{y}_i^*\mathbf{y}_i^{*T}\,|\,\mathcal{F}_{i-1})$. More precisely, if the following hypothesis holds true:

$(H)$ there exists a constant $C > 0$ such that $\lambda_{\min}(\bar{\bar R}_n)\ge C$ for all $n$,

then $(C2)$ is equivalent to $\bar{\bar R}_n^{-1/2}(V_n/n)\bar{\bar R}_n^{-1/2} - I \xrightarrow{P} 0$ (which is similar to (3.1) of Hall and Heyde, 1980, or (4.2) of Shao, 1992). Note that (H) is implied by the following stronger hypothesis, which is needed in Section 3:

$(H')$ there exists a constant $C > 0$ such that $\lambda_{\min}(\bar R_i)\ge C$ for all $i$.

$(H')$ is satisfied if $\bar R_i = \bar R$ for all $i$.

The following result is essential for all our developments.

Theorem 1 *Let* $R_n = E(\tilde R_n)$. *Under* $(C)^{\mathrm{indep}}$, $(C1)$ *and* $(C2)$, *we have*

$$\tilde R_n - R_n \xrightarrow{L^1} 0 \quad \text{(elementwise)}.$$

*If the convergence in* $(C2)$ *is almost sure, then* $\tilde R_n - R_n \xrightarrow{a.s.} 0$ *(elementwise). The same conclusion holds if* $R_n$ *is replaced by* $\bar{\bar R}_n$.

Proof: Let $\hat R_n = n^{-1}\sum_{i=1}^n A_i^{-1/2}\epsilon_i\epsilon_i^T A_i^{-1/2}$ and note that $E(\hat R_n) = \bar{\bar R}_n$. Our result will be a consequence of the following two propositions, whose proofs are given in Appendix A. $\Box$

Proposition 1 *Under* $(C1)$ *and* $(C2)$, *we have*

$$\hat R_n - \bar{\bar R}_n \xrightarrow{L^1} 0 \quad \text{(elementwise)}.$$

Proposition 2 *Under* $(C)^{\mathrm{indep}}$, $(C1)$ *and* $(C2)$, *we have*

$$\tilde R_n - \hat R_n \xrightarrow{L^1} 0 \quad \text{(elementwise)}.$$

In what follows we will assume that the inverse of the (nonnegative definite) random matrix $\tilde R_n$ exists with probability 1, for every $n$. We consider the following *pseudo-likelihood* equation

$$\sum_{i=1}^n D_i(\beta)^T\tilde V_{i,n}(\beta)^{-1}\epsilon_i(\beta) = 0, \qquad (4)$$

where $D_i(\beta) = A_i(\beta)X_i$ and $\tilde V_{i,n}(\beta) := A_i(\beta)^{1/2}\tilde R_n A_i(\beta)^{1/2}$. Note that equation (4) can be written as

$$\tilde g_n(\beta) := \sum_{i=1}^n X_i^T A_i(\beta)^{1/2}\tilde R_n^{-1}A_i(\beta)^{-1/2}\epsilon_i(\beta) = 0.$$

We consider also the estimating function

$$g_n(\beta) = \sum_{i=1}^n X_i^T A_i(\beta)^{1/2}R_n^{-1}A_i(\beta)^{-1/2}\epsilon_i(\beta).$$

Note that $M_n := \mathrm{Cov}(g_n) = \sum_{i=1}^n X_i^T A_i^{1/2}R_n^{-1}\bar R_i R_n^{-1}A_i^{1/2}X_i$.

As in Xie and Yang (2003), we introduce the following quantities:

$$H_n := \sum_{i=1}^n X_i^T A_i^{1/2}R_n^{-1}A_i^{1/2}X_i, \qquad \pi_n := \frac{\lambda_{\max}(R_n^{-1})}{\lambda_{\min}(R_n^{-1})},$$

$$\tilde\tau_n := m\lambda_{\max}(R_n^{-1}), \qquad \gamma_n^{(0)} := \max_{i=1,\ldots,n}\ \max_{j=1,\ldots,m}(\mathbf{x}_{ij}^T H_n^{-1}\mathbf{x}_{ij}), \qquad \tilde\gamma_n = \tilde\tau_n\gamma_n^{(0)}.$$

Remark 2. A few comments about $\tilde\tau_n$ are worth mentioning. First, $M_n\le\tau_n H_n$, where $\tau_n := \max_{i\le n}\lambda_{\max}(R_n^{-1}\bar R_i)\le\tilde\tau_n$. Also, since $r_{jk}^{(n)} - \bar{\bar r}_{jk}^{(n)}\to 0$ and $|\bar{\bar r}_{jk}^{(n)}|\le 1$, we can assume that $|r_{jk}^{(n)}|\le 2$ for $n$ large enough (here $r_{jk}^{(n)}$, $\bar{\bar r}_{jk}^{(n)}$ are the elements of the matrices $R_n$, respectively $\bar{\bar R}_n$). Therefore $\tilde\tau_n\ge 1/2$. The reason why we prefer to work with $\tilde\tau_n$ instead of $\tau_n$ will become transparent in the proof of Proposition 3 (given in Appendix B). Another reason is of course the fact that $\tilde\tau_n$ does not depend on the unknown matrices $\bar R_i$.

Our approach requires a slight modification of the conditions introduced by Xie and Yang (2003) to accommodate the use of $\tilde\tau_n$ instead of $\tau_n$. Let $\tilde B_n(r) := \{\beta;\ \|H_n^{1/2}(\beta-\beta_0)\|\le(\tilde\tau_n)^{1/2}r\}$. Our conditions are:

$(\widetilde{AH})$ For any $r > 0$, $\tilde k_n^{[l]} = \sup_{\beta\in\tilde B_n(r)} k_n^{[l]}(\beta)$, $l = 2,3$, are bounded;

$(\widetilde{I}_w)$ $(\tilde\tau_n)^{-1}\lambda_{\min}(H_n)\to\infty$;

$(\widetilde{C}_w)$ $(\pi_n)^2\tilde\gamma_n\to 0$ and $n^{1/2}\pi_n\tilde\gamma_n\to 0$.

Remark 3. Note that $(\widetilde{I}_w)$ implies $(I_w^*)^{\mathrm{indep}}$, which implies $\lambda_{\min}(H_n)\to\infty$. This follows from the following inequalities:

$$\frac{1}{2m}H_n^{\mathrm{indep}} \le \lambda_{\min}(R_n^{-1})\cdot H_n^{\mathrm{indep}} \le H_n \le \lambda_{\max}(R_n^{-1})\cdot H_n^{\mathrm{indep}} = \frac{\tilde\tau_n}{m}H_n^{\mathrm{indep}}.$$
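The sandwich inequality between $H_n$ and $H_n^{\mathrm{indep}}$ can be checked on random inputs; a hedged sketch (NumPy assumed; the design, variances, and correlation matrix below are arbitrary illustrations):

```python
import numpy as np

# Check lam_min(R^-1) * H_ind <= H <= lam_max(R^-1) * H_ind (Loewner order)
rng = np.random.default_rng(3)
n, m, p = 20, 4, 2

# a valid correlation matrix standing in for R_n
Z = rng.standard_normal((100, m))
R = np.corrcoef(Z, rowvar=False)
R_inv = np.linalg.inv(R)
lo, hi = np.linalg.eigvalsh(R_inv)[[0, -1]]   # lam_min, lam_max of R^-1

H_ind = np.zeros((p, p))
H = np.zeros((p, p))
for _ in range(n):
    Xi = rng.standard_normal((m, p))
    Ah = np.diag(rng.uniform(0.5, 2.0, m))    # A_i^{1/2}
    H_ind += Xi.T @ Ah @ Ah @ Xi              # X^T A X
    H += Xi.T @ Ah @ R_inv @ Ah @ Xi          # X^T A^{1/2} R^-1 A^{1/2} X

# A <= B in the Loewner order iff B - A is nonnegative definite
assert np.linalg.eigvalsh(H - lo * H_ind).min() >= -1e-8
assert np.linalg.eigvalsh(hi * H_ind - H).min() >= -1e-8
```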

Remark 4. Our conditions depend on the matrix $R_n$, which cannot be written in a closed form. Since $\tilde R_n - R_n \xrightarrow{P} 0$, it is desirable to express our conditions in terms of the matrix $\tilde R_n$. In practice, if the sample size is large enough, one may choose to verify conditions $(\widetilde{AH})$, $(\widetilde{I}_w)$, $(\widetilde{C}_w)$ by using $\tilde R_n$.
Remark 5. If we suppose that hypothesis (H) holds, then for $n$ large

$$\frac{C}{2} \le \lambda_{\min}(R_n) \le \lambda_{\max}(R_n) \le 2m.$$

In this case $(\tilde\tau_n)_n$ and $(\pi_n)_n$ are bounded, $C(\gamma_n^{(0)})^{\mathrm{indep}} \le \gamma_n^{(0)} \le C(\gamma_n^{(0)})^{\mathrm{indep}}$, and for every $r > 0$ there exists $r' > 0$ such that $\tilde B_n(r)\subseteq B_n^{\mathrm{indep}}(r')$. Therefore, $(\widetilde{AH})$, $(\widetilde{I}_w)$, $(\widetilde{C}_w)$ are equivalent to $(AH)^{\mathrm{indep}}$, $(I_w^*)^{\mathrm{indep}}$, $(C_w^*)^{\mathrm{indep}}$, respectively. In order to verify $(H)$, it is sufficient to check that there exists a constant $C > 0$ such that

$$\det(\tilde R_n)\ge C, \quad \text{for all } n, \text{ a.s.},$$

under the hypothesis of Theorem 1.

We need to consider the following derivatives:

$$\tilde{\mathcal{D}}_n(\beta) := -\frac{\partial\tilde g_n(\beta)}{\partial\beta^T}, \qquad \mathcal{D}_n(\beta) := -\frac{\partial g_n(\beta)}{\partial\beta^T}.$$

The next theorem is a modified version of Theorem A.2, respectively Theorem A.1.(ii), of Xie and Yang (2003).

Theorem 2 *Under* $(\widetilde{AH})$ *and* $(\widetilde{C}_w)$,

*(i) for every* $r > 0$,

$$\sup_{\beta\in\tilde B_n(r)}\|H_n^{-1/2}\mathcal{D}_n(\beta)H_n^{-1/2} - I\| \xrightarrow{P} 0, \quad \text{and}$$

*(ii) there exists* $c_0 > 0$ *such that for every* $r > 0$,

$$P(\mathcal{D}_n(\beta)\ge c_0 H_n,\ \text{for all }\beta\in\tilde B_n(r))\to 1.$$

Proof: (i) The first two terms produced by the decomposition $\mathcal{D}_n(\beta) = \mathcal{H}_n(\beta) + \mathcal{B}_n(\beta) + \mathcal{E}_n(\beta)$ are shown to be bounded by $\pi_n^2\tilde\gamma_n$, whereas the third term is bounded in $L^2$ by $\sqrt{n}\,\pi_n\tilde\gamma_n$. (Here $\mathcal{H}_n(\beta)$, $\mathcal{B}_n(\beta)$, $\mathcal{E}_n(\beta)$ have the same expressions as those given in Xie and Yang (2003) with $R_i(\alpha)$, $i\le n$, replaced by $R_n$.) The arguments are essentially the same as those used in Lemmas A1.(ii), A2.(ii), A3.(ii) of Xie and Yang (2003). The fact that we are replacing the "working" correlation matrices $R_i(\alpha)$, $i = 1,\ldots,n$, with the matrix $R_n$ and we assume that $(\epsilon_i)_{i\ge 1}$ is a martingale difference sequence does not influence the proof. Finally we note that (ii) is a consequence of (i). $\Box$

The next two results are intermediate steps that are used in the proof of our main result. Their proofs are given in Appendix B.

Proposition 3 *Suppose that the conditions of Theorem 1 hold. Then*

$$(\tilde\tau_n)^{-1/2}H_n^{-1/2}(\tilde g_n - g_n) \xrightarrow{P} 0.$$

Proposition 4 *Suppose that the conditions of Theorem 1 hold. Under* $(\widetilde{AH})$ *and* $(\widetilde{C}_w)$,

$$\sup_{\beta\in\tilde B_n(r)}\|H_n^{-1/2}[\tilde{\mathcal{D}}_n(\beta) - \mathcal{D}_n(\beta)]H_n^{-1/2}\| \xrightarrow{P} 0.$$

The next theorem is our main result. It shows that under our slightly modified conditions $(\widetilde{AH})$, $(\widetilde{I}_w)$, $(\widetilde{C}_w)$ and the additional conditions of Theorem 1, one can obtain a solution $\hat\beta_n$ of the pseudo-likelihood equation $\tilde g_n(\beta) = 0$ which is also a consistent estimator of $\beta_0$.

Theorem 3 *Suppose that the conditions of Theorem 1 hold. Under* $(\widetilde{AH})$, $(\widetilde{I}_w)$ *and* $(\widetilde{C}_w)$, *there exists a sequence* $(\hat\beta_n)_n$ *of random variables such that*

$$P(\tilde g_n(\hat\beta_n) = 0)\to 1 \quad \text{and} \quad \hat\beta_n \xrightarrow{P} \beta_0.$$

Proof: Let $\epsilon > 0$ be arbitrary and $r = r(\epsilon) = \sqrt{(24p)/(c_1^2\epsilon)}$, where $c_1$ is a constant to be specified later. We consider the events

$$\tilde E_n := \Big\{\|H_n^{-1/2}\tilde g_n\| \le \inf_{\beta\in\partial\tilde B_n(r)}\|H_n^{-1/2}(\tilde g_n(\beta) - \tilde g_n)\|\Big\},$$

$$\tilde\Omega_n := \{\tilde{\mathcal{D}}_n(\bar\beta)\ \text{nonsingular, for all }\bar\beta\in\tilde B_n(r)\}.$$

By Lemma A of Chen, Hu and Ying (1999), it follows that on the event $\tilde E_n\cap\tilde\Omega_n$ there exists $\hat\beta_n\in\tilde B_n(r)$ such that $\tilde g_n(\hat\beta_n) = 0$. Therefore, it remains to prove that $P(\tilde E_n\cap\tilde\Omega_n) > 1-\epsilon$ for $n$ large.

By Taylor's formula and Lemma 1 of Xie and Yang (2003) we obtain that for any $\beta\in\partial\tilde B_n(r)$ there exist $\bar\beta\in\tilde B_n(r)$ and a $p\times 1$ vector $\lambda$, $\|\lambda\| = 1$, such that

$$\|H_n^{-1/2}(\tilde g_n(\beta) - \tilde g_n)\| \ge |\lambda^T H_n^{-1/2}\tilde{\mathcal{D}}_n(\bar\beta)H_n^{-1/2}\lambda|\cdot r(\tilde\tau_n)^{1/2}$$

$$\ge \{|\lambda^T H_n^{-1/2}\mathcal{D}_n(\bar\beta)H_n^{-1/2}\lambda| - |\lambda^T H_n^{-1/2}[\tilde{\mathcal{D}}_n(\bar\beta) - \mathcal{D}_n(\bar\beta)]H_n^{-1/2}\lambda|\}\cdot r(\tilde\tau_n)^{1/2}.$$

By Theorem 2.(ii) there exists $c_0 > 0$ such that

$$P(\lambda^T H_n^{-1/2}\mathcal{D}_n(\beta)H_n^{-1/2}\lambda \ge c_0,\ \text{for all }\beta\in\tilde B_n(r),\ \text{for all }\lambda,\ \|\lambda\| = 1) > 1-\epsilon/6 \qquad (5)$$

when $n$ is large. Let $c_0'\in(0,c_0)$ be arbitrary. By Proposition 4,

$$P(|\lambda^T H_n^{-1/2}[\tilde{\mathcal{D}}_n(\beta) - \mathcal{D}_n(\beta)]H_n^{-1/2}\lambda| \le c_0',\ \text{for all }\beta\in\tilde B_n(r),\ \text{for all }\lambda) > 1-\epsilon/6 \qquad (6)$$

when $n$ is large. Therefore, if we put $c_1 := c_0 - c_0'$, we have

$$P\Big(\inf_{\beta\in\partial\tilde B_n(r)}\|H_n^{-1/2}(\tilde g_n(\beta) - \tilde g_n)\| \ge c_1 r(\tilde\tau_n)^{1/2}\Big) > 1-\epsilon/3. \qquad (7)$$

On the other hand, by Chebyshev's inequality and our choice of $r$, we have $P(\|H_n^{-1/2}g_n\| \le c_1 r(\tilde\tau_n)^{1/2}/2) > 1-\epsilon/6$ for all $n$. By Proposition 3, $P(\|H_n^{-1/2}(\tilde g_n - g_n)\| \le c_1 r(\tilde\tau_n)^{1/2}/2) > 1-\epsilon/6$, for $n$ large. Hence

$$P(\|H_n^{-1/2}\tilde g_n\| \le c_1 r(\tilde\tau_n)^{1/2}) > 1-\epsilon/3. \qquad (8)$$

From (7) and (8) we get that $P(\tilde E_n) > 1-(2\epsilon)/3$, for $n$ large. This concludes the proof of the asymptotic existence.

We proceed now with the proof of weak consistency. Let $\delta > 0$ be arbitrary. By $(\widetilde{I}_w)$, we have $\tilde\tau_n/\lambda_{\min}(H_n) < (\delta/r)^2$, for $n$ large. We know that on the event $\tilde E_n\cap\tilde\Omega_n$ there exists $\hat\beta_n\in\tilde B_n(r)$ such that $\tilde g_n(\hat\beta_n) = 0$. Therefore, on this event,

$$\|\hat\beta_n - \beta_0\| \le \|H_n^{-1/2}\|\cdot\|H_n^{1/2}(\hat\beta_n - \beta_0)\| \le [\lambda_{\min}(H_n)]^{-1/2}\cdot(\tilde\tau_n)^{1/2}r < \delta$$

for $n$ large. This proves that $P(\|\hat\beta_n - \beta_0\| \le \delta) > 1-\epsilon$, for $n$ large. $\Box$

### 3 Asymptotic Normality

Let $c_n = \lambda_{\max}(M_n^{-1}H_n)$. In this section we will suppose that $(c_n\tilde\tau_n)_n$ is bounded.

Theorem 4 *Under the conditions of Theorem 3,*

$$M_n^{-1/2}\tilde g_n = M_n^{-1/2}H_n(\hat\beta_n - \beta_0) + o_P(1).$$

Proof: On the set $\{\tilde g_n(\hat\beta_n) = 0,\ \hat\beta_n\in\tilde B_n(r)\}$, we have $\tilde g_n = \tilde{\mathcal{D}}_n(\bar\beta_n)(\hat\beta_n - \beta_0)$ for some $\bar\beta_n\in\tilde B_n(r)$, by Taylor's formula. Multiplication by $M_n^{-1/2}$ yields

$$M_n^{-1/2}\tilde g_n = M_n^{-1/2}H_n^{1/2}\mathcal{A}_n H_n^{1/2}(\hat\beta_n - \beta_0) + M_n^{-1/2}H_n(\hat\beta_n - \beta_0),$$

where $\mathcal{A}_n := H_n^{-1/2}\tilde{\mathcal{D}}_n(\bar\beta_n)H_n^{-1/2} - I = o_P(1)$, by Theorem 2.(i) and Proposition 4. The result follows since $\|M_n^{-1/2}H_n^{1/2}\| \le c_n^{1/2}$ and $\|H_n^{1/2}(\hat\beta_n - \beta_0)\| \le (\tilde\tau_n)^{1/2}r$. $\Box$

Let $\gamma_n^{(D)} := \max_{1\le i\le n}\lambda_{\max}(H_n^{-1/2}X_i^T A_i^{1/2}R_n^{-1}A_i^{1/2}X_i H_n^{-1/2})$. Note that $\gamma_n^{(D)} \le C d_n\tilde\gamma_n$, where $d_n = \max_{i\le n,j\le m}\sigma_{ij}^2$. We consider the following conditions:

$(\widetilde{N}_\delta)$ There exists a $\delta > 0$ such that: (i) $Y := \sup_{i\ge 1} E(\|\mathbf{y}_i^*\|^{2+\delta}\,|\,\mathcal{F}_{i-1}) < \infty$ a.s.; (ii) $(c_n\tilde\tau_n)^{1+2/\delta}\gamma_n^{(D)}\to 0$;

$(C2)'$ $\max_{i\le n}\lambda_{\max}(V_i) \xrightarrow{P} 0$.

Remark 6. Note that condition $(\widetilde{N}_\delta)$(i), with $Y$ integrable, implies $(C1)$, whereas $(C2)'$ is a stronger form of $(C2)$. Part (ii) of condition $(\widetilde{N}_\delta)$ was introduced by Xie and Yang (2003).

The following result gives the asymptotic distribution of $\tilde g_n$.

Lemma 1 *Suppose that the conditions of Theorem 1 hold. Under* $(\widetilde{N}_\delta)$, $(C2)'$ *and* $(H')$,

$$M_n^{-1/2}\tilde g_n \xrightarrow{d} N(0,I).$$

Proof: We note that

$$M_n^{-1/2}\tilde g_n = M_n^{-1/2}g_n + M_n^{-1/2}(\tilde g_n - g_n)$$

and $\|M_n^{-1/2}(\tilde g_n - g_n)\| \le (c_n\tilde\tau_n)^{1/2}\|(\tilde\tau_n)^{-1/2}H_n^{-1/2}(\tilde g_n - g_n)\| \xrightarrow{P} 0$, by Proposition 3. Therefore it is enough to prove that $M_n^{-1/2}g_n \xrightarrow{d} N(0,I)$. By the Cramér-Wold theorem, this is equivalent to showing that for all $\lambda$, $\|\lambda\| = 1$,

$$\lambda^T M_n^{-1/2}g_n = \sum_{i=1}^n Z_{n,i} \xrightarrow{d} N(0,1), \qquad (9)$$

where $Z_{n,i} = \lambda^T M_n^{-1/2}X_i^T A_i^{1/2}R_n^{-1}A_i^{-1/2}\epsilon_i$. Note that $E(Z_{n,i}\,|\,\mathcal{F}_{i-1}) = 0$ for all $i\le n$, i.e. $\{Z_{n,i};\ i\le n,\ n\ge 1\}$ is a martingale difference array.

Relationship (9) follows by the martingale central limit theorem with Lindeberg condition (see Corollary 3.1 of Hall and Heyde, 1980) if

$$\sum_{i=1}^n E[Z_{n,i}^2 I(|Z_{n,i}| > \epsilon)\,|\,\mathcal{F}_{i-1}] \to 0 \quad \text{a.s.} \qquad (10)$$

and

$$\sum_{i=1}^n E(Z_{n,i}^2\,|\,\mathcal{F}_{i-1}) \xrightarrow{P} 1. \qquad (11)$$

Relationship (10) follows from $(\widetilde{N}_\delta)$ exactly as in Lemma 2 of Xie and Yang (2003) with $\psi(t) = t^{\delta/2}$. Relationship (11) follows from $(C2)'$ and $(H')$:

$$\sum_{i=1}^n E(Z_{n,i}^2\,|\,\mathcal{F}_{i-1}) - 1 = \sum_{i=1}^n[E(Z_{n,i}^2\,|\,\mathcal{F}_{i-1}) - E(Z_{n,i}^2)] = \sum_{i=1}^n\lambda^T M_n^{-1/2}X_i^T A_i^{1/2}R_n^{-1}V_i R_n^{-1}A_i^{1/2}X_i M_n^{-1/2}\lambda$$

$$\le \max_{1\le i\le n}\lambda_{\max}(V_i)\cdot\max_{1\le i\le n}\lambda_{\max}(\bar R_i^{-1})\cdot\lambda^T M_n^{-1/2}M_n M_n^{-1/2}\lambda \le C^{-1}\max_{i\le n}\lambda_{\max}(V_i) \xrightarrow{P} 0.$$

$\Box$

Putting together the results in Theorem 4 and Lemma 1, we obtain the asymptotic normality of the estimator $\hat\beta_n$.

Theorem 5 *Under the conditions of Theorem 3,* $(\widetilde{N}_\delta)$, $(C2)'$ *and* $(H')$,

$$M_n^{-1/2}H_n(\hat\beta_n - \beta_0) \xrightarrow{d} N(0,I).$$

Remark 7. In applications we would need a version of Theorem 5 where $M_n$ is replaced by a consistent estimator. We suggest the estimator proposed by Liang and Zeger (1986) (see also Remark 8 of Xie and Yang, 2003). The details of the proof are omitted.

### A Appendix A

The following lemma is a consequence of Kaufmann’s (1987) martingale strong law of large numbers and can be viewed as a stronger version of Theorem 2.19 of Hall and Heyde (1980).

Lemma 2 *Let* $(x_i)_{i\ge 1}$ *be a sequence of random variables and* $(\mathcal{F}_i)_{i\ge 1}$ *a sequence of increasing* $\sigma$*-fields such that* $x_i$ *is* $\mathcal{F}_i$*-measurable for every* $i\ge 1$. *Suppose that* $\sup_i E|x_i|^\alpha < \infty$ *for some* $\alpha\in(1,2]$. *Then*

$$\frac{1}{n}\sum_{i=1}^n(x_i - E(x_i\,|\,\mathcal{F}_{i-1})) \to 0 \quad \text{a.s. and in } L^\alpha.$$

Proof: Note that $y_i = x_i - E(x_i\,|\,\mathcal{F}_{i-1})$, $i\ge 1$, is a martingale difference sequence. By the conditional Jensen inequality,

$$|y_i|^\alpha \le 2^{\alpha-1}\{|x_i|^\alpha + |E(x_i\,|\,\mathcal{F}_{i-1})|^\alpha\} \le 2^{\alpha-1}\{|x_i|^\alpha + E(|x_i|^\alpha\,|\,\mathcal{F}_{i-1})\}$$

and $\sup_{i\ge 1}E|y_i|^\alpha \le 2^\alpha\sup_{i\ge 1}E|x_i|^\alpha < \infty$. Hence

$$\sum_{i\ge 1}\frac{E|y_i|^\alpha}{i^\alpha} \le \sup_{i\ge 1}E|y_i|^\alpha\cdot\sum_{i\ge 1}\frac{1}{i^\alpha} < \infty.$$

The lemma follows by Theorem 2 of Kaufmann (1987) with $p = 1$, $B_i = i$. $\Box$

Proof of Proposition 1: We denote by $\hat r_{jk}^{(n)}$, $\bar{\bar r}_{jk}^{(n)}$, $v_{jk}^{(n)}$ ($j,k = 1,\ldots,m$) the elements of the matrices $\hat R_n$, $\bar{\bar R}_n$, respectively $V_n$. We write

$$\hat r_{jk}^{(n)} - \bar{\bar r}_{jk}^{(n)} = \frac{1}{n}\sum_{i=1}^n\big(y_{ij}^*y_{ik}^* - E(y_{ij}^*y_{ik}^*\,|\,\mathcal{F}_{i-1})\big) + \frac{1}{n}\sum_{i=1}^n v_{jk}^{(i)}. \qquad (12)$$

The first term converges to 0 almost surely and in $L^{1+\delta/2}$ by applying Lemma 2 with $x_i = y_{ij}^*y_{ik}^*$ and using $(C1)$. The second term converges to 0 in probability by condition $(C2)$. This convergence is also in $L^{1+\delta/2}$ because the sequence $\{n^{-1}\sum_{i=1}^n v_{jk}^{(i)}\}_n$ has uniformly bounded moments of order $1+\delta/2$ and hence is uniformly integrable. $\Box$

Proof of Proposition 2: We denote by $\tilde r_{jk}^{(n)}$ ($j,k = 1,\ldots,m$) the elements of the matrix $\tilde R_n$. Let $\tilde\delta_{i,jk} := [\sigma_{ij}\sigma_{ik}]/[\sigma_{ij}(\tilde\beta_n)\sigma_{ik}(\tilde\beta_n)] - 1$, $\tilde\Delta\mu_{ij} := \mu_{ij}(\tilde\beta_n) - \mu_{ij}(\beta_0)$ and

$$\tilde\Delta(\epsilon_{ij}\epsilon_{ik}) := \epsilon_{ij}(\tilde\beta_n)\epsilon_{ik}(\tilde\beta_n) - \epsilon_{ij}\epsilon_{ik} = (\tilde\Delta\mu_{ij})(\tilde\Delta\mu_{ik}) - (\tilde\Delta\mu_{ij})\epsilon_{ik} - (\tilde\Delta\mu_{ik})\epsilon_{ij}.$$

With this notation, we have

$$\tilde r_{jk}^{(n)} - \hat r_{jk}^{(n)} = \frac{1}{n}\sum_{i=1}^n\frac{\epsilon_{ij}(\tilde\beta_n)\epsilon_{ik}(\tilde\beta_n)}{\sigma_{ij}(\tilde\beta_n)\sigma_{ik}(\tilde\beta_n)} - \frac{1}{n}\sum_{i=1}^n\frac{\epsilon_{ij}\epsilon_{ik}}{\sigma_{ij}\sigma_{ik}} = \frac{1}{n}\sum_{i=1}^n\frac{\tilde\Delta(\epsilon_{ij}\epsilon_{ik})}{\sigma_{ij}\sigma_{ik}} + \frac{1}{n}\sum_{i=1}^n\frac{\tilde\Delta(\epsilon_{ij}\epsilon_{ik})}{\sigma_{ij}\sigma_{ik}}\tilde\delta_{i,jk} + \frac{1}{n}\sum_{i=1}^n\frac{\epsilon_{ij}\epsilon_{ik}}{\sigma_{ij}\sigma_{ik}}\tilde\delta_{i,jk}.$$

From here, we conclude that

$$|\tilde r_{jk}^{(n)} - \hat r_{jk}^{(n)}| \le U_{n,jk} + \max_{i\le n}|\tilde\delta_{i,jk}|\cdot\Big(U_{n,jk} + \frac{1}{n}\sum_{i=1}^n|y_{ij}^*y_{ik}^*|\Big), \quad \text{where}$$

$$U_{n,jk} := \frac{1}{n}\sum_{i=1}^n\frac{|\tilde\Delta\mu_{ij}|\cdot|\tilde\Delta\mu_{ik}|}{\sigma_{ij}\sigma_{ik}} + \frac{1}{n}\sum_{i=1}^n\frac{|\tilde\Delta\mu_{ij}|}{\sigma_{ij}}\cdot|y_{ik}^*| + \frac{1}{n}\sum_{i=1}^n\frac{|\tilde\Delta\mu_{ik}|}{\sigma_{ik}}\cdot|y_{ij}^*| = U_{n,jk}^{[1]} + U_{n,jk}^{[2]} + U_{n,jk}^{[3]}.$$

_{n,jk}Recall that our estimator ˜*βn* was obtained in the proof of Theorem 2 of Xie

and Yang (2003) as a solution of the GEE in the case when all the “working”
correlation matrices are Rindep* _{i}* =I. One of the consequences of the result of
Xie and Yang is that for every fixed

*² >*0, there exist

*r*=

*r²*and

*N*=

*N²*such

that, if we denote Ω*n,²*=*{β*˜*n* lies in*B*indep*n* (*r*)*}*, then
*P*(Ω*n,²*)*≥*1*−²* for all*n≥N.*

We define ˜*βn* to be equal to*β*0 on the event Ω*cn,²*. Therefore,

on Ω*c*

Using Taylor’s formula and assumption (*AH*)indep_{, we can conclude that on}

the event Ω*n,²*, there exist a constant*C*=*C²*such that

*|*˜*δi,jk|*=
¯
¯
¯
¯
¯
˙
*µ*(x*T*
*ijβ*0)
˙
*µ*(x*T*
*ijβ*˜*n*)
*−*1
¯
¯
¯
¯
¯*≤C·*(*γ*
0

*n*)indep*·*(*m*1*/*2*r*) for all*i≤n*

1
*n*
*n*
X
*i*=1
( ˜∆*µij*)2
*σ*2
*ij*
= *n−*1_{( ˜}_{β}*n−β*0)*T*
*n*
X
*i*=1
Ã
*σ*2
*ij*( ¯*βn*)
*σ*2
*ij*
!2
*σ*2
*ij*x*ij*x*Tij*
( ˜*βn−β*0)
*≤* *n−*1( ˜*βn−β*0)*T*
( * _{n}*
X

*i*=1 X

*Ti*A1

*i/*2[A

*i*( ¯

*βn*)A

*−i*1]2A 1

*/*2

*i*X

*i*) ( ˜

*βn−β*0)

*≤*

*n−*1max

*i≤n*

*λ*2 max[A

*i*( ¯

*βn*)A

*−i*1]

*· k*(H

*n*indep)1

*/*2( ˜

*βn−β*0)

*k*2

*≤*

*Cn−*1(

*m*2

*r*)

Note also that*E*[*n−*1P*n*

*i*=1(*y∗ij*)2] =*E*[ˆ*r*
(*n*)
*jj* ] =*O*(1) since ˆ*r*
(*n*)
*jj* *−*¯¯*r*
(*n*)
*jj*
*L*1
*−→*0 and
¯¯

*r*(* _{jj}n*) = 1. Applying the Cauchy-Scwartz inequality to each of the three sums
that form

*Un,jk*we can conclude that

*E*[*U _{n,jk}*[1] ]

*−→*0 and

*E*[(

*U*[

_{n,jk}*l*] )2]

*−→*0

*, l*= 2

*,*3 On the other hand

$$E\big[\max_{i\le n}|\tilde\delta_{i,jk}|\cdot U_{n,jk}\big] = \int_{\Omega_{n,\epsilon}}\max_{i\le n}|\tilde\delta_{i,jk}|\cdot U_{n,jk}\,dP \le C(\gamma_n^{(0)})^{\mathrm{indep}}\int_\Omega U_{n,jk}\,dP \to 0,$$

$$E\Big[\max_{i\le n}|\tilde\delta_{i,jk}|\cdot\frac{1}{n}\sum_{i=1}^n|y_{ij}^*y_{ik}^*|\Big] \le C(\gamma_n^{(0)})^{\mathrm{indep}}[E(\hat r_{jj}^{(n)})]^{1/2}[E(\hat r_{kk}^{(n)})]^{1/2} \to 0.$$

$\Box$

### B Appendix B

Proof of Proposition 3: Let $h^{(0)}_{ijk}=[\sigma^{2}_{ij}/\sigma^{2}_{ik}]^{1/2}$, $\tilde{\mathbf{R}}_n^{-1}:=\tilde{\mathbf{Q}}_n=(\tilde{q}^{(n)}_{jk})_{j,k=1,\dots,m}$ and $\mathbf{R}_n^{-1}:=\mathbf{Q}_n=(q^{(n)}_{jk})_{j,k=1,\dots,m}$. With this notation, we write

$$(\tilde{\tau}_n)^{-1/2}\mathbf{H}_n^{-1/2}(\tilde{\mathbf{g}}_n-\mathbf{g}_n)=\sum_{j,k=1}^{m}(\tilde{q}^{(n)}_{jk}-q^{(n)}_{jk})\cdot\Big\{(\tilde{\tau}_n)^{-1/2}\mathbf{H}_n^{-1/2}\sum_{i=1}^{n}h^{(0)}_{ijk}\mathbf{x}_{ij}\epsilon_{ik}\Big\}$$

By Theorem 1, $\tilde{q}^{(n)}_{jk}-q^{(n)}_{jk}\stackrel{P}{\to}0$ for every $j,k$. The result will follow once we prove that $\{(\tilde{\tau}_n)^{-1/2}\mathbf{H}_n^{-1/2}\sum_{i=1}^{n}h^{(0)}_{ijk}\mathbf{x}_{ij}\epsilon_{ik}\}_n$ is bounded in $L^{2}$, for every $j,k$.
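The reduction used in this step is the standard fact that a sequence converging to zero in probability, multiplied by a sequence whose norms are bounded in $L^{2}$, still converges to zero in probability. For $\delta>0$ and any $K>0$,
$$P(|\xi_n\eta_n|>\delta)\le P(|\xi_n|>\delta/K)+P(|\eta_n|>K)\le P(|\xi_n|>\delta/K)+\frac{\sup_n E(\eta_n^{2})}{K^{2}},$$
so $\limsup_n P(|\xi_n\eta_n|>\delta)\le \sup_n E(\eta_n^{2})/K^{2}$ for every $K$, and letting $K\to\infty$ gives $\xi_n\eta_n\stackrel{P}{\to}0$. This is applied with $\xi_n=\tilde{q}^{(n)}_{jk}-q^{(n)}_{jk}$ and $\eta_n$ the norm of the vector displayed above.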

Since $(\epsilon_{ik})_{i\ge 1}$ is a martingale difference sequence, we have
$$E\Big(\big\|(\tilde{\tau}_n)^{-1/2}\mathbf{H}_n^{-1/2}\sum_{i=1}^{n}h^{(0)}_{ijk}\mathbf{x}_{ij}\epsilon_{ik}\big\|^{2}\Big)=(\tilde{\tau}_n)^{-1}\,\mathrm{tr}\Big\{\mathbf{H}_n^{-1/2}\Big(\sum_{i=1}^{n}(h^{(0)}_{ijk})^{2}\sigma^{2}_{ik}\mathbf{x}_{ij}\mathbf{x}_{ij}^{T}\Big)\mathbf{H}_n^{-1/2}\Big\}$$
$$=(\tilde{\tau}_n)^{-1}\,\mathrm{tr}\Big\{\mathbf{H}_n^{-1/2}\Big(\sum_{i=1}^{n}\sigma^{2}_{ij}\mathbf{x}_{ij}\mathbf{x}_{ij}^{T}\Big)\mathbf{H}_n^{-1/2}\Big\}\le(\tilde{\tau}_n)^{-1}(4m\tilde{\tau}_n)\,\mathrm{tr}(\mathbf{I})=4mp,$$
because $\sum_{i=1}^{n}\sigma^{2}_{ij}\mathbf{x}_{ij}\mathbf{x}_{ij}^{T}\le\sum_{i=1}^{n}\mathbf{X}_i^{T}\mathbf{A}_i\mathbf{X}_i\le\lambda_{\max}(\mathbf{R}_n)\mathbf{H}_n\le 4m\tilde{\tau}_n\mathbf{H}_n$. $\Box$
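The second equality in the computation above uses only the definition $h^{(0)}_{ijk}=[\sigma^{2}_{ij}/\sigma^{2}_{ik}]^{1/2}$; spelled out,
$$(h^{(0)}_{ijk})^{2}\,\sigma^{2}_{ik}=\frac{\sigma^{2}_{ij}}{\sigma^{2}_{ik}}\,\sigma^{2}_{ik}=\sigma^{2}_{ij}.$$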

Proof of Proposition 4: We write
$$D_n(\beta)=\mathbf{H}_n(\beta)+\mathbf{B}_n(\beta)+\mathcal{E}_n(\beta),\qquad \tilde{D}_n(\beta)=\tilde{\mathbf{H}}_n(\beta)+\tilde{\mathbf{B}}_n(\beta)+\tilde{\mathcal{E}}_n(\beta)$$
where $\tilde{\mathbf{H}}_n(\beta),\tilde{\mathbf{B}}_n(\beta),\tilde{\mathcal{E}}_n(\beta)$ have the same expressions as $\mathbf{H}_n(\beta),\mathbf{B}_n(\beta),\mathcal{E}_n(\beta)$, with $\mathbf{R}_n$ replaced by $\tilde{\mathbf{R}}_n$. Our result will follow from the following three lemmas. $\Box$

Lemma 3 *Suppose that assumption $(\widetilde{AH})$ holds. If $(\pi_n\tilde{\gamma}_n)_n$ is bounded, then for any $r>0$ and for any $p\times 1$ vector $\lambda$ with $\|\lambda\|=1$,*
$$\sup_{\beta\in\tilde{B}_n(r)}|\lambda^{T}\mathbf{H}_n^{-1/2}[\tilde{\mathbf{H}}_n(\beta)-\mathbf{H}_n(\beta)]\mathbf{H}_n^{-1/2}\lambda|\stackrel{P}{\longrightarrow}0.$$

Lemma 4 *Suppose that assumption $(\widetilde{AH})$ holds. If $(\pi_n^{2}\tilde{\gamma}_n)_n$ is bounded, then for any $r>0$ and for any $p\times 1$ vector $\lambda$ with $\|\lambda\|=1$,*
$$\sup_{\beta\in\tilde{B}_n(r)}|\lambda^{T}\mathbf{H}_n^{-1/2}[\tilde{\mathbf{B}}_n(\beta)-\mathbf{B}_n(\beta)]\mathbf{H}_n^{-1/2}\lambda|\stackrel{P}{\longrightarrow}0.$$

Lemma 5 *Suppose that assumption $(\widetilde{AH})$ holds. If $(n^{1/2}\tilde{\gamma}_n)_n$ is bounded, then for any $r>0$ and for any $p\times 1$ vector $\lambda$ with $\|\lambda\|=1$,*
$$\sup_{\beta\in\tilde{B}_n(r)}|\lambda^{T}\mathbf{H}_n^{-1/2}[\tilde{\mathcal{E}}_n(\beta)-\mathcal{E}_n(\beta)]\mathbf{H}_n^{-1/2}\lambda|\stackrel{P}{\longrightarrow}0.$$

Proof of Lemma 3: Using Theorem 1 and the fact that $|r^{(n)}_{jk}|\le 2$ for $n$ large, we have
$$\mathbf{A}_n=\mathbf{R}_n^{1/2}\tilde{\mathbf{R}}_n^{-1}\mathbf{R}_n^{1/2}-\mathbf{I}=\mathbf{R}_n^{1/2}(\tilde{\mathbf{R}}_n^{-1}-\mathbf{R}_n^{-1})\mathbf{R}_n^{1/2}\stackrel{P}{\longrightarrow}0\ \text{(elementwise)}.$$

For every $\beta$,
$$|\lambda^{T}\mathbf{H}_n^{-1/2}[\tilde{\mathbf{H}}_n(\beta)-\mathbf{H}_n(\beta)]\mathbf{H}_n^{-1/2}\lambda|=\Big|\sum_{i=1}^{n}\lambda^{T}\mathbf{H}_n^{-1/2}\mathbf{X}_i^{T}\mathbf{A}_i(\beta)^{1/2}\mathbf{R}_n^{-1/2}\mathbf{A}_n\mathbf{R}_n^{-1/2}\mathbf{A}_i(\beta)^{1/2}\mathbf{X}_i\mathbf{H}_n^{-1/2}\lambda\Big|$$
$$\le\max\{|\lambda_{\max}(\mathbf{A}_n)|,|\lambda_{\min}(\mathbf{A}_n)|\}\cdot\{\lambda^{T}\mathbf{H}_n^{-1/2}\mathbf{H}_n(\beta)\mathbf{H}_n^{-1/2}\lambda\}$$
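The last inequality is the usual spectral bound for a symmetric matrix: writing $\mathbf{u}_i=\mathbf{R}_n^{-1/2}\mathbf{A}_i(\beta)^{1/2}\mathbf{X}_i\mathbf{H}_n^{-1/2}\lambda$,
$$|\mathbf{u}_i^{T}\mathbf{A}_n\mathbf{u}_i|\le\max\{|\lambda_{\max}(\mathbf{A}_n)|,|\lambda_{\min}(\mathbf{A}_n)|\}\,\mathbf{u}_i^{T}\mathbf{u}_i,$$
and summing over $i$ uses $\sum_{i=1}^{n}\mathbf{u}_i^{T}\mathbf{u}_i=\lambda^{T}\mathbf{H}_n^{-1/2}\mathbf{H}_n(\beta)\mathbf{H}_n^{-1/2}\lambda$.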

The result follows, since one can show that for every $\beta\in\tilde{B}_n(r)$
$$|\lambda^{T}\mathbf{H}_n^{-1/2}\mathbf{H}_n(\beta)\mathbf{H}_n^{-1/2}\lambda-1|\le\lambda^{T}\mathbf{H}_n^{-1/2}\mathbf{H}_n^{[1]}(\beta)\mathbf{H}_n^{-1/2}\lambda+2|\lambda^{T}\mathbf{H}_n^{-1/2}\mathbf{H}_n^{[2]}(\beta)\mathbf{H}_n^{-1/2}\lambda|$$
$$\le C\pi_n\tilde{\gamma}_n+2C(\pi_n\tilde{\gamma}_n)^{1/2}\le C \qquad (13)$$
where
$$\mathbf{H}_n^{[1]}(\beta)=\sum_{i=1}^{n}\mathbf{X}_i^{T}(\mathbf{A}_i^{1/2}(\beta)-\mathbf{A}_i^{1/2})\mathbf{R}_n^{-1}(\mathbf{A}_i^{1/2}(\beta)-\mathbf{A}_i^{1/2})\mathbf{X}_i$$
$$\mathbf{H}_n^{[2]}(\beta)=\sum_{i=1}^{n}\mathbf{X}_i^{T}(\mathbf{A}_i^{1/2}(\beta)-\mathbf{A}_i^{1/2})\mathbf{R}_n^{-1}\mathbf{A}_i^{1/2}\mathbf{X}_i$$
We used the fact that $\sup_{\beta\in\tilde{B}_n(r)}\max_{i\le n}\lambda_{\max}\{(\mathbf{A}_i^{1/2}(\beta)\mathbf{A}_i^{-1/2}-\mathbf{I})^{2}\}\le C\tilde{\gamma}_n$, which follows from $(\widetilde{AH})$ as in Lemma B1.(ii) of Xie and Yang (2003). $\Box$

Proof of Lemma 4: Let $\mathbf{w}_{i,n}(\beta)^{T}=\lambda^{T}\mathbf{H}_n^{-1/2}\mathbf{X}_i^{T}\mathbf{G}_i^{[1]}(\beta)\,\mathrm{diag}\{\mathbf{X}_i\mathbf{H}_n^{-1/2}\lambda\}\mathbf{R}_n^{-1/2}$ and $\mathbf{z}_{i,n}(\beta)=\mathbf{R}_n^{-1/2}\mathbf{A}_i(\beta)^{-1/2}(\mu_i-\mu_i(\beta))$. We have
$$|\lambda^{T}\mathbf{H}_n^{-1/2}[\tilde{\mathbf{B}}_n^{[1]}(\beta)-\mathbf{B}_n^{[1]}(\beta)]\mathbf{H}_n^{-1/2}\lambda|=\Big|\sum_{i=1}^{n}\mathbf{w}_{i,n}(\beta)^{T}\mathbf{A}_n\mathbf{z}_{i,n}(\beta)\Big|\le\|\mathbf{A}_n\|\Big\{\sum_{i=1}^{n}\|\mathbf{w}_{i,n}(\beta)\|^{2}\Big\}^{1/2}\Big\{\sum_{i=1}^{n}\|\mathbf{z}_{i,n}(\beta)\|^{2}\Big\}^{1/2}$$
using the Cauchy-Schwarz inequality. Methods similar to those developed in the proof of Lemma A.2.(ii) of Xie and Yang (2003) show that for any $\beta\in\tilde{B}_n(r)$, $\sum_{i=1}^{n}\|\mathbf{w}_{i,n}(\beta)\|^{2}\le C\pi_n\gamma_n^{(0)}$ and $\sum_{i=1}^{n}\|\mathbf{z}_{i,n}(\beta)\|^{2}\le C\pi_n\tilde{\tau}_n\lambda_{\max}(\mathbf{H}_n^{-1/2}\mathbf{H}_n(\bar{\beta})\mathbf{H}_n^{-1/2})\le C\pi_n\tilde{\tau}_n$ (using (13) for the last inequality). Hence
$$\sup_{\beta\in\tilde{B}_n(r)}|\lambda^{T}\mathbf{H}_n^{-1/2}[\tilde{\mathbf{B}}_n^{[1]}(\beta)-\mathbf{B}_n^{[1]}(\beta)]\mathbf{H}_n^{-1/2}\lambda|\le C\|\mathbf{A}_n\|\pi_n(\tilde{\gamma}_n)^{1/2}\stackrel{P}{\longrightarrow}0$$

Let $\mathbf{v}_{i,n}(\beta)^{T}=\lambda^{T}\mathbf{H}_n^{-1/2}\mathbf{X}_i^{T}\mathbf{A}_i(\beta)^{1/2}\mathbf{R}_n^{-1/2}\mathbf{A}_n\mathbf{R}_n^{-1/2}\,\mathrm{diag}\{\mathbf{X}_i\mathbf{H}_n^{-1/2}\lambda\}\mathbf{G}_i^{[2]}(\beta)\mathbf{A}_i(\beta)^{1/2}\mathbf{R}_n^{1/2}$. We have
$$|\lambda^{T}\mathbf{H}_n^{-1/2}[\tilde{\mathbf{B}}_n^{[2]}(\beta)-\mathbf{B}_n^{[2]}(\beta)]\mathbf{H}_n^{-1/2}\lambda|=\Big|\sum_{i=1}^{n}\mathbf{v}_{i,n}(\beta)^{T}\mathbf{z}_{i,n}(\beta)\Big|\le\Big\{\sum_{i=1}^{n}\|\mathbf{v}_{i,n}(\beta)\|^{2}\Big\}^{1/2}\Big\{\sum_{i=1}^{n}\|\mathbf{z}_{i,n}(\beta)\|^{2}\Big\}^{1/2}.$$
One can prove that for any $\beta\in\tilde{B}_n(r)$, $\sum_{i=1}^{n}\|\mathbf{v}_{i,n}(\beta)\|^{2}\le C\pi_n\gamma_n^{(0)}\lambda_{\max}(\mathbf{A}_n^{2})\cdot\{\lambda^{T}\mathbf{H}_n^{-1/2}\mathbf{H}_n(\beta)\mathbf{H}_n^{-1/2}\lambda\}\le C\pi_n\gamma_n^{(0)}\|\mathbf{A}_n\|^{2}$ (using (13) for the last inequality). Hence
$$\sup_{\beta\in\tilde{B}_n(r)}|\lambda^{T}\mathbf{H}_n^{-1/2}[\tilde{\mathbf{B}}_n^{[2]}(\beta)-\mathbf{B}_n^{[2]}(\beta)]\mathbf{H}_n^{-1/2}\lambda|\le C\|\mathbf{A}_n\|\pi_n(\tilde{\gamma}_n)^{1/2}\stackrel{P}{\longrightarrow}0\qquad\Box$$

Proof of Lemma 5: We write $\tilde{\mathcal{E}}_n(\beta)-\mathcal{E}_n(\beta)=[\tilde{\mathcal{E}}_n^{[1]}(\beta)-\mathcal{E}_n^{[1]}(\beta)]+[\tilde{\mathcal{E}}_n^{[2]}(\beta)-\mathcal{E}_n^{[2]}(\beta)]$ and we use a decomposition similar to that given in the proof of Lemma A.3.(ii) of Xie and Yang (2003). More precisely, we write
$$\lambda^{T}\mathbf{H}_n^{-1/2}[\tilde{\mathcal{E}}_n^{[1]}(\beta)-\mathcal{E}_n^{[1]}(\beta)]\mathbf{H}_n^{-1/2}\lambda=T_n^{[1]}+T_n^{[3]}(\beta)+T_n^{[5]}(\beta)$$
$$\lambda^{T}\mathbf{H}_n^{-1/2}[\tilde{\mathcal{E}}_n^{[2]}(\beta)-\mathcal{E}_n^{[2]}(\beta)]\mathbf{H}_n^{-1/2}\lambda=T_n^{[2]}+T_n^{[4]}(\beta)+T_n^{[6]}(\beta)$$
where $T_n^{[l]}(\beta)=\sum_{j,k=1}^{m}(\tilde{q}^{(n)}_{jk}-q^{(n)}_{jk})\cdot S^{[l]}_{n,jk}(\beta)$ for $l=1,\dots,6$ and
$$S^{[1]}_{n,jk}=\lambda^{T}\mathbf{H}_n^{-1/2}\sum_{i=1}^{n}[\mathbf{A}_i^{-1/2}\mathbf{G}_i^{[1]}]_j[\mathrm{diag}\{\mathbf{X}_i\mathbf{H}_n^{-1/2}\lambda\}]_j\,h^{(0)}_{ijk}\mathbf{x}_{ij}\epsilon_{ik}$$
$$S^{[3]}_{n,jk}(\beta)=\lambda^{T}\mathbf{H}_n^{-1/2}\sum_{i=1}^{n}[\mathbf{A}_i^{-1/2}\mathbf{G}_i^{[1]}(\beta)]_j[\mathrm{diag}\{\mathbf{X}_i\mathbf{H}_n^{-1/2}\lambda\}]_j[\mathbf{A}_i(\beta)^{-1/2}\mathbf{A}_i^{1/2}-\mathbf{I}]_k\,h^{(0)}_{ijk}\mathbf{x}_{ij}\epsilon_{ik}$$
$$S^{[5]}_{n,jk}(\beta)=\lambda^{T}\mathbf{H}_n^{-1/2}\sum_{i=1}^{n}[\mathbf{A}_i^{-1/2}(\mathbf{G}_i^{[1]}(\beta)-\mathbf{G}_i^{[1]})]_j[\mathrm{diag}\{\mathbf{X}_i\mathbf{H}_n^{-1/2}\lambda\}]_j\,h^{(0)}_{ijk}\mathbf{x}_{ij}\epsilon_{ik}$$
$$S^{[2]}_{n,jk}=\lambda^{T}\mathbf{H}_n^{-1/2}\sum_{i=1}^{n}[\mathrm{diag}\{\mathbf{X}_i\mathbf{H}_n^{-1/2}\lambda\}]_k[\mathbf{G}_i^{[2]}\mathbf{A}_i^{1/2}]_k\,h^{(0)}_{ijk}\mathbf{x}_{ij}\epsilon_{ik}$$
$$S^{[4]}_{n,jk}(\beta)=\lambda^{T}\mathbf{H}_n^{-1/2}\sum_{i=1}^{n}[\mathbf{A}_i^{-1/2}\mathbf{A}_i(\beta)^{1/2}-\mathbf{I}]_j[\mathrm{diag}\{\mathbf{X}_i\mathbf{H}_n^{-1/2}\lambda\}]_k[\mathbf{G}_i^{[2]}(\beta)\mathbf{A}_i^{1/2}]_k\,h^{(0)}_{ijk}\mathbf{x}_{ij}\epsilon_{ik}$$
$$S^{[6]}_{n,jk}(\beta)=\lambda^{T}\mathbf{H}_n^{-1/2}\sum_{i=1}^{n}[\mathrm{diag}\{\mathbf{X}_i\mathbf{H}_n^{-1/2}\lambda\}]_k[(\mathbf{G}_i^{[2]}(\beta)-\mathbf{G}_i^{[2]})\mathbf{A}_i^{1/2}]_k\,h^{(0)}_{ijk}\mathbf{x}_{ij}\epsilon_{ik}$$

Since $\tilde{q}^{(n)}_{jk}-q^{(n)}_{jk}\stackrel{P}{\to}0$, it is enough to prove that $\{S^{[1]}_{n,jk}\}_n$, $\{S^{[2]}_{n,jk}\}_n$ and $\{\sup_{\beta\in\tilde{B}_n(r)}|S^{[l]}_{n,jk}(\beta)|\}_n$, $l=3,4,5,6$, are bounded in $L^{2}$ for every $j,k=1,\dots,m$.

We have
$$E(|S^{[1]}_{n,jk}|^{2})\le\mathrm{tr}\Big\{\mathbf{H}_n^{-1/2}\Big(\sum_{i=1}^{n}[\mathbf{A}_i^{-1/2}\mathbf{G}_i^{[1]}]^{2}_j[\mathrm{diag}\{\mathbf{X}_i\mathbf{H}_n^{-1/2}\lambda\}]^{2}_j(h^{(0)}_{ijk})^{2}\sigma^{2}_{ik}\mathbf{x}_{ij}\mathbf{x}_{ij}^{T}\Big)\mathbf{H}_n^{-1/2}\Big\}$$
$$\le C\gamma_n^{(0)}\,\mathrm{tr}\Big\{\mathbf{H}_n^{-1/2}\Big(\sum_{i=1}^{n}\sigma^{2}_{ij}\mathbf{x}_{ij}\mathbf{x}_{ij}^{T}\Big)\mathbf{H}_n^{-1/2}\Big\}\le C\gamma_n^{(0)}(4mp\tilde{\tau}_n)=C\tilde{\gamma}_n\le C.$$

By the Cauchy-Schwarz inequality, for every $\beta\in\tilde{B}_n(r)$
$$|S^{[3]}_{n,jk}(\beta)|^{2}\le\Big\{\sum_{i=1}^{n}[\mathbf{A}_i^{-1/2}\mathbf{G}_i^{[1]}(\beta)]^{2}_j[\mathrm{diag}\{\mathbf{X}_i\mathbf{H}_n^{-1/2}\lambda\}]^{2}_j[\mathbf{A}_i(\beta)^{-1/2}\mathbf{A}_i^{1/2}-\mathbf{I}]^{2}_k\Big\}\Big\{\sum_{i=1}^{n}(h^{(0)}_{ijk})^{2}\epsilon^{2}_{ik}(\lambda^{T}\mathbf{H}_n^{-1/2}\mathbf{x}_{ij})^{2}\Big\}$$
$$\le Cn\gamma_n^{(0)}\tilde{\gamma}_n\cdot\Big\{\lambda^{T}\mathbf{H}_n^{-1/2}\Big(\sum_{i=1}^{n}(h^{(0)}_{ijk})^{2}\epsilon^{2}_{ik}\mathbf{x}_{ij}\mathbf{x}_{ij}^{T}\Big)\mathbf{H}_n^{-1/2}\lambda\Big\}$$
Hence
$$E\big(\sup_{\beta\in\tilde{B}_n(r)}|S^{[3]}_{n,jk}(\beta)|^{2}\big)\le Cn\gamma_n^{(0)}\tilde{\gamma}_n\cdot\Big\{\lambda^{T}\mathbf{H}_n^{-1/2}\Big(\sum_{i=1}^{n}\sigma^{2}_{ij}\mathbf{x}_{ij}\mathbf{x}_{ij}^{T}\Big)\mathbf{H}_n^{-1/2}\lambda\Big\}\le Cn\gamma_n^{(0)}\tilde{\gamma}_n\cdot(4m\tilde{\tau}_n)\le Cn(\tilde{\gamma}_n)^{2}\le C.$$

Similarly, by the Cauchy-Schwarz inequality, for every $\beta\in\tilde{B}_n(r)$
$$|S^{[5]}_{n,jk}(\beta)|^{2}\le\Big\{\sum_{i=1}^{n}[\mathbf{A}_i^{-1/2}(\mathbf{G}_i^{[1]}(\beta)-\mathbf{G}_i^{[1]})]^{2}_j[\mathrm{diag}\{\mathbf{X}_i\mathbf{H}_n^{-1/2}\lambda\}]^{2}_j\Big\}\Big\{\sum_{i=1}^{n}(h^{(0)}_{ijk})^{2}\epsilon^{2}_{ik}(\lambda^{T}\mathbf{H}_n^{-1/2}\mathbf{x}_{ij})^{2}\Big\}$$
$$\le Cn\gamma_n^{(0)}\tilde{\gamma}_n\cdot\Big\{\lambda^{T}\mathbf{H}_n^{-1/2}\Big(\sum_{i=1}^{n}(h^{(0)}_{ijk})^{2}\epsilon^{2}_{ik}\mathbf{x}_{ij}\mathbf{x}_{ij}^{T}\Big)\mathbf{H}_n^{-1/2}\lambda\Big\}$$
and $E\big(\sup_{\beta\in\tilde{B}_n(r)}|S^{[5]}_{n,jk}(\beta)|^{2}\big)\le Cn(\tilde{\gamma}_n)^{2}\le C$.

The terms $S^{[l]}_{n,jk}(\beta)$, $l=2,4,6$, can be treated by similar methods. $\Box$

### References

[1] Chen, K., Hu, I. and Ying, Z. (1999). Strong consistency of maximum quasi-likelihood estimators in generalized linear models with fixed and adaptive designs. *Ann. Stat.* 27, 1155-1163.

[2] Fahrmeir, L. and Kaufmann, H. (1985). Consistency and asymptotic normality of the maximum likelihood estimator in generalized linear models. *Ann. Stat.* 13, 342-368.

[3] Fahrmeir, L. and Tutz, G. (1991). *Multivariate Statistical Modelling Based on Generalized Linear Models.* Springer, New York.

[4] Kaufmann, H. (1987). On the strong law of large numbers for multivariate martingales. *Stochastic Processes and Their Applications* 26, 73-85.

[5] Hall, P. and Heyde, C.C. (1980). *Martingale Limit Theory and Its Applications.* Academic Press.

[6] Lai, T.L., Robbins, H., and Wei, C.Z. (1979). Strong consistency of least squares estimates in multiple regression II. *J. Multiv. Anal.* 9, 343-361.

[7] Liang, K.-Y., and Zeger, S.L. (1986). Longitudinal data analysis using generalized linear models. *Biometrika* 73, 13-22.

[8] McCullagh, P., and Nelder, J.A. (1989). *Generalized Linear Models,* Second Edition. Chapman & Hall.

[9] Rao, J.N.K. (1998). Marginal models for repeated observations: inference with survey data. *Proceedings of the ASA meeting.*

[10] Schiopu-Kratina, I. (2003). Asymptotic results for generalized estimating equations with data from complex surveys. *Revue roumaine de math. pures et appl.* 48, 327-342.

[11] Schott, J.R. (1997). *Matrix Analysis for Statistics.* John Wiley, New York.

[12] Shao, J. (1992). Asymptotic theory in generalized linear models with nuisance scale parameters. *Probab. Theory Relat. Fields* 91, 25-41.

[13] Shao, J. (1999). *Mathematical Statistics.* Springer, New York.

[14] Xie, M. and Yang, Y. (2003). Asymptotics for generalized estimating equations with large cluster sizes. *Ann. Stat.* 31, 310-347.

[15] Yuan, K.-H., and Jennrich, R.J. (1998). Asymptotics of estimating equations under natural conditions. *J. Multiv. Anal.* 65, 245-260.