Fitting general linear model for longitudinal survey data under informative sampling

(1)

FITTING GENERAL LINEAR MODEL FOR LONGITUDINAL

SURVEY DATA UNDER INFORMATIVE SAMPLING

ABDULHAKEEM A.H. EIDEH1

Sampling designs for surveys are often complex and informative, in the sense that the selection probabilities are correlated with the variables of interest, even when conditioned on explanatory variables. In this case conventional analysis that disregards the informativeness can be seriously biased, since the sample distribution differs from that of the population. Most of the studies in social surveys are based on data collected from complex sampling designs. Standard analysis of survey data often fails to account for the complex nature of the sampling design such as the use of unequal selection probabilities, clustering, and post-stratification. The effect of the sample design on the analysis is due to the fact that the models in use typically do not incorporate all the design variables determining the sample selection, either because there may be too many of them or because they are not of substantive interest.

AL-QUDS UNIVERSITY, PALESTINE

Statistics in Transition-New Series, Vol 11, No 3, 2010

ABSTRACT

The purpose of this article is to account for informative sampling in fitting superpopulation model for multivariate observations, and in particular multivariate normal distribution, for longitudinal survey data. The idea behind the proposed approach is to extract the model holding for the sample data as a function of the model in the population and the first order inclusion probabilities, and then fit the sample model using maximum likelihood, pseudo maximum likelihood and estimating equations methods. As an application of the results, we fit the general linear model for longitudinal survey data under informative sampling using different covariance structures: the exponential correlation model, the uniform correlation model, and the random effect model, and using different conditional expectations of first order inclusion probabilities given the study variable. The main feature of the present estimators is their behaviours in terms of the informativeness parameters.

Key words: General Linear Model, Informative sampling, Longitudinal Survey Data, Maximum Likelihood, and Sample distribution.

1. Introduction

1_{Address for correspondence}_{: Dr. ABDULHAKEEM A.H. EIDEH, Department of Mathematics,} College of Science and Technology, Al-Quds University, Abu-Dies campus, Palestine, P.O. Box 20002, Jerusalem. E-mail: [email protected].

(2)

However, if the sampling design is informative in the sense that the outcome variable (variable of interest) is correlated with the design variables not included in the model, even after conditioning on the model covariates, standard estimates of the model parameters can be severely biased, leading possibly to false inference. Pfeffermann (1993, 1996) reviews many examples reported in the literature that illustrate the effects of ignoring the sampling process when fitting models to survey data and discusses methods that have been proposed to deal with this problem, see also Skinner, Holt, and Smith (1989), Kasprzyk, Duncan, Kalton and Singh (1989), Hoem, (1989), and Chambers and Skinner (2003). It should be emphasized that standard inference may be biased even when the original sample is a simple random sample, due to non-response, attrition and imperfect frames that results in de facto a posterior differential inclusion probabilities.

To overcome the difficulties associated with the use of classical inference procedures for cross sectional survey data, Pfeffermann, Krieger and Rinott (1998) proposed the use of the sample distribution induced by the assumed population models, under informative sampling, and developed expressions for its calculation. Similarly, Eideh and Nathan (2006) fitted time series models for longitudinal survey data under informative sampling. Furthermore, Eideh (2008) fitted random effects or subject-specific effects models for analyzing normal data, which are assumed to be correlated, under the concept of informative sampling.

The plan of this paper is as follows. In Section 2 we define sample distribution and sample likelihood. In Section 3 we extract the sampled distribution of the multivariate normal distribution under informative sampling. In Section 4 we fit the general linear model for longitudinal survey data. Section 5 provides a discussion of the results. 2. Sample distribution and sample likelihood

Let U =

{

1,...,N

}

denote a finite population consisting of Nunits. Let y be the target or study variable of interest and let y_i be the value of y for the ith population unit. Let x_i, i∈U be the value of an auxiliary variable(s), x, and z=

{

z₁,...,z_N

}

be the values of a known design variable, used for the sample selection process but not included in the working model under consideration. In what follows we consider a sampling design with selection probabilities πi =Pr(i∈s)>0, and sampling weight

i i

w =1π ; i =1,...,N. In practice, the π_i’s may depend on the population values

(

x,y,z

)

. We express this by writing: πi =Pr(i∈s|x,y,z). The sample s consists of the subset of U selected at random by the sampling scheme with inclusion probabilities π1,...,πN. Denote by

(

)

′ = I1,...,IN

I the N by 1 sample indicator (vector) variable such that I_i =1 if unit i∈U is selected to the sample and I_i =0 if otherwise. The sample s is defined accordingly as s=

{

i|i∈U,I_i =1

}

and its complement by c=s =

{

i|i∈U,I_i =0

}

. We assume probability sampling, so that

0 ) Pr( ∈ >

= i s

i

(3)

We now consider the population values y1,...,yN as random variables, which are independent realizations from a distribution with probability density function

(

_i | _i,θ

)

p y x

f , indexed by a vector parameter θ.

According to Krieger and Pfeffermann (1997), the (marginal) sample probability density function of y_i is defined as:

(

)

(

)

(

) (

)

(

)

(

) (

)

(

π θ γ

)

θ γ π γ θ θ γ γ θ γ θ , , | , | , , | , , | s Pr , | , , | s Pr s , , , | , , | i i p i i p i i i p i i i p i i i i p i i s x E x y f y x E x i y f y x i i x y f x y f = ∈ ∈ = ∈ = x (1)

where θ is the parameter of the population distribution, γ is the parameter indexing

(

| , ,γ

)

Pr i∈s x_i y_i and

(

i i

)

p

(

i i i

) (

p i i

)

i p x E x y f y x dy E π | ,θ,γ =

∫

π | , ,γ | ,θ

Having derived the sample distribution, Pfeffermann, Krieger and Rinott (1998) proved that if the population measurements y_i are independent, then as N →∞(with

nfixed, where n is the sample size), the sample measurements are asymptotically independent, so we can apply standard inference procedures to complex survey data by using the marginal sample distribution for each unit. Based on the sample data

{

y_i,x_i,w_i; i∈s

}

, we can estimate the parameters of the population model in two steps:

Step-one: According to Pfeffermann and Sverchkov (1999), estimate the informativeness parameters γ using the following relationship:

(

_i | _i, _i,γ

)

1 _p

(

π_i | _i, _i,γ

)

s w x y E x y

E = (2) Thus the informativeness parameters can be estimated using regression analysis. Denoting the resulting estimate of γ by γ~.

Step-two: Substitute γ~ in the sample log-likelihood function, and then maximize the resulting sample log-likelihood function with respect to the population parameters, θ:

( )

(

)

( )

θ

(

θ γ

)

γ θ π θ γ θ ~ , , | log ~ , , | log ~ , 1 1 i i s n i srs i i p n i srs rs w E l E l l x x

∑

= = + = − = (3) where l_rs

( )

θ,γ~ is the sample log-likelihood after substituting γ~ in the sample

(4)

( )

=

∑

_∈

{

(

)

}

s

i p i i

srs f y x

l θ log | ,θ

is the classical log-likelihood obtained by ignoring the sample design. 3. Multivariate normal distribution under informative sampling

Most classical methods of multivariate analysis of continuous data are based on an examination of the structure of population mean vectors and covariance matrices. We consider the problem of estimating the mean vector μ and the covariance matrix Vof a vector of study variables y, from survey data obtained under informative sampling. The original theoretical basis for this is the multivariate normal distribution. The existing work in this area deals with the estimation problem when the sampling scheme is noninformative; see for example Smith and Holmes (1989).

The following theorem focuses on multivariate normal distribution under different modeling of the population conditional expectation of first order inclusion probabilities.

Theorem 1. Assume that the population distribution is q-dimensional multivariate normal with mean vector μ and covariance matrix V, that is:

(

y y

)

Nq

(

)

i N p iq i i 1,..., ~ , , =1,..., ′ = μ V y

Let E_p

( )

y_ij =µ_j and Cov_p

(

y_ij,y_ik

)

=v_jk, i=1,...,N; j,k =1,2,...,q.

1. Under the exponential inclusion probability model – exponential sampling:

(

)

(

)

( )

∏

(

)

= = ′ + = | q j ij j i i i p y a a a E 1 0 0 exp exp exp ay y π (4)

The sample probability density function of y_i is:

( ) ( )

{

(

)

}

{

(

)

}

_  ₋ ₋ ₊ ′ ₋ ₊ = − − − Va μ y V Va μ y V y_i q _i _i s f 2π 0.5 0.5exp 0.5 1 (5) That is, _i

(

y_i ,...,y_iq

)

~N_q

(

,

)

,i 1,...,n s 1 + = ′ = μ Va V y . So that,

( )

(

)

( )

y Va Va μ y y + = + = ∈ | = i p i p i s E s i E E and

(5)

( )

(

)

( )

i p i p i s Var s i Var Var y V y y = = ∈ | = 2. Under the linear inclusion probability model – linear sampling:

(

)

∑

= + = ′ + = | q t ij j i i i p y b b b E 1 0 0 y b y π (6)

The sample probability density function of yi is:

( ) (

)( )

{

( )

(

)

(

)

}

μ b μ y V μ y V y b y ′ + − ′ − − ′ + = − − − 0 1 5 . 0 5 . 0 0 2 exp 0.5 b b f i i q i i s π (7) Furthermore,

( )

μ b Vb y μ b Vb μ y ′ + + = ′ + + = 0 0 b E b E i p i s (8a) and

( )

( )( )

(

)

( ) ( )( )

(

)

2 0 2 0 μ b Vb Vb y μ b Vb Vb V y y ′ + ′ − = ′ + ′ − = = b Var b Cov Var i p i s i s (8b) Proofs:

1. As an extension of equation (1), the sample probability density function of yi is given by:

( )

(

)

(

( )

) ( )

i p i p i i p i p i s E f E i f f π π y y y y = | ∈s = | (9) So that,

( )

(

)( )

(

{

(

)

(

)

}

( )

(

) (

Q

)

f q i i q i i s 5 . 0 exp 5 . 0 exp 2 5 . 0 exp 5 . 0 exp 2 exp 5 . 0 2 1 5 . 0 5 . 0 − ′ − = ′ + ′ − ′ − − ′ = − − − − − Va a V Va a μ a μ y V μ y V y a y π π (10) where Q=

(

y_i −μ

)

′V−1

(

y_i −μ

)

−2a′

(

y_i −μ

)

.

Setting Ci =yi −μ , then Q can be written as: Ci V Ci − a′Ci

′

= −

2

Q 1 .Using theorems

in multivariate statistical analysis, see Johnson and Wichern (1998), page 68, we have:

(6)

i i i V V C aV V C C 0.5 0.5 2 0.5 0.5 Q= ′ − − − ′ − then , let Now Di V 0.5Ci − = :

(

D V a

) (

D V a

)

aVa a V V a a V V a D V a D D D V a D D ′ − − ′ − = ′ − ′ + ′ − ′ = ′ − ′ = 1 5 . 0 5 . 0 5 . 0 5 . 0 5 . 0 5 . 0 5 . 0 5 .. 0 2 2 Q i i i i i i i i

But C_i =y_i −μ and D_i =V−0.5C_i , so that Q can be expressed equivalently as:

(

)

(

)

(

)

(

)

(

)

(

)

(

)

Q 1 5 . 0 5 . 0 5 . 0 5 . 0 5 . 0 5 . 0 Va a Va μ y V Va μ y Va a a V V μ y V a V V μ y V ′ − + − ′ + − = ′ − − − ′ − − = − − − i i i i

Thus after substituting this expression of Q in (10), we get (5).

Hence the multivariate normal distribution in the sample is the same as in the population, except that the mean is shifted by the constant Va. Notice that the sample probability density function does not depend on a₀. Note also:

( )

ij j j j jj q jq s y av a v a v E = + +...+ +...+ µ ₁ ₁ (11a) and

( )

y v Cov

(

y y

)

v j k q Vars ij = jj, p ij, ik = jk, ≠ =1,..., (11b) 2. Substitute (6) in (9) we get (7).

Let us now compute the first two moments of this sample pdf. In order to do this we will use the moment generating function technique. The moment generating function of the sample probability density function is given by:

( )

(

)

(

)

( )

μ b u u b u μ b y y μ b y b y u y u u ′ + ′ + ′ + = ′ + ′ + ′ = ′ =

∫

0 0 0 0 0 exp exp b d dM M b b d f b b E M i i p i p i i p i i i i i i s where

( )

i exp

(

i .5 i i

)

p M u = u′μ+ u′Vu and

(7)

( )

_p

( )(

_i _i

)

i i p M d dM Vu μ u u u + = Thus we have:

( )

(

i

)

i p i s b b M M b μ V u μ b u u 0 0 0 + ′ + ′ + = (12)

This equation gives explicitly the relationship between the population and the sample moment generating functions. Notice that the sample and population moment generating functions are different unless b=0, in which case the sampling mechanism is noninformative.

Let R_s

( )

u_i =logM_s

( )

u_i . Differentiating this expression twice and setting u_i =0, we get (8a) and (8b).

From (8a) and (8b), we can see that:

( )

, ... ... ... ... 1 1 0 1 1 iq q j j jq q jj j j j ij s b b b b v b v b v b y E µ µ µ µ + + + + + + + + + + = (13a)

( )

(

)

2 1 1 0 2 1 1 ... ... ... ... q q j j jq q jj j j jj ij s b b b b v b v b v b v y Var µ µ µ + + + + + + + + − = (13b)

(

)

(

)(

)

2

)

1 1 0 1 1 1 1 ... ... ... ... ... ... , q q j j kq q kj j k jq q jj j j jk ik ij p b b b b v b v b v b v b v b v b v y y Cov µ µ µ + + + + + + + + + + + + − = , (13c) for i=1,...,n, j≠k =1,2,...,q.

Also Vars

( )

yij ≤Varp

( )

yij and the equality holds if and only if b=0, that is when the sampling mechanism is noninformative. On the other hand, if b≠0, then the means the variances and the covariances change, in contradiction to what happens for the sample probability density function (5), where only the means change.

To illustrate the results, from now on, we only consider the particular cases of the exponential inclusion probability model – exponential sampling, see equation (4). The following are special cases of Theorem 1.

Corollary 1. (Univariate normal distribution, q=1)

Let E_p

( )

y_i₁ =µ₁ and Var_p

( )

y_i₁ =v₁₁. Under the exponential sampling:

(

i i1

)

exp

(

0 1 i1

)

p y a a y

E π | = + (14) We get the following result:

y₁~N

(

₁ a₁v₁₁,v₁₁

)

s

(8)

Corollary 2. (Bivariate normal distribution,q=2).

Let E_p

( )

y_i₁ =µ₁,E_p

( )

y_i₂ =µ₂, Var_p

( )

y_i₁ =v₁₁, Var_p

( )

y_i₂ =v₂₂, Cov_p

(

y_i₁,y_i₂

)

=v₁₂

and Corp

(

yi₁,yi₂

)

=ρ₁₂. Under the exponential sampling:

(

i i1, i2

)

exp

(

0 1 i1 2 i2

)

p y y a a y a y

E π | = + + (16) and using the properties of multivariate normal distribution, see Johnson and Wichern (1998), page 171, we obtain the following results:

1. The joint sample probability density function of

(

y_i₁,y_i₂

)

is:

(

)

                  + + + + ′ 22 21 12 11 22 2 21 1 2 12 2 11 1 1 2 2 1, ~ , v v v v v a v a v a v a N y y s i i µ µ (17a)

2. The sample marginal probability density functions of y_i₁ and y_i₂ are respectively given by:

(

1 1 11 2 12 11

)

1~N av a v ,v y s i µ + + (17b) and

(

2 1 21 2 22 22

)

2~N av a v ,v y s i µ + + (17c) 3. The conditional sample probability density function of y_i₁ given y_i₂ is univariate normal distribution with conditional mean:

(

)

( )

(

( )

)

(

)

(

)

(

)

(

)

(

2

)

12 11 1 2 2 22 11 12 1 22 2 12 11 1 2 2 22 11 12 1 22 2 21 1 2 2 22 11 12 12 2 11 1 1 2 2 11 22 12 1 2 1 1 | ρ µ ρ µ µ ρ µ µ ρ µ ρ − + − + =       − + − + = + + − + + + = − + = v a y v v v v v a y v v v a v a y v v v a v a y E y v v y E y y E i i i i s i i s i i s (17d)

and conditional variance:

(

)

(

)

(

1 2

)

2 12 11 2 1| i 1 p i | i i s y y v V y y V = −ρ = (17e) That is,

(

) (

)

(

) (

)

{

2

}

12 11 2 12 11 1 2 2 5 . 0 22 11 12 1 2 1 |y ~N µ +ρ v v y −µ +av 1−ρ ,v 1−ρ y _i s i i (17f)

(9)

Notice that:

(

1| 2

)

(

1| 2

)

1

(

1| 2

)

E_s y_i y_i =E_p y_i y_i +aV_p y_i y_i (17g) 4. Similarly, the conditional sample probability density function of y_i₂ given y_i₁ is:

(

) (

)

(

) (

)

{

2

}

12 22 2 12 22 2 1 1 5 . 0 11 22 12 2 1 2 |y ~N µ +ρ v v y −µ +a v 1−ρ ,v 1−ρ y _i s i i (17h)

Thus the sample and population probability density functions are different, but belong to the same family of distribution, which is normal. Also the change occurs only for the means and conditional means, whereas the variances, conditional variances, and covariance do not change.

In particular if a₂ =0, that is the inclusion probabilities depend only on y_i₁ (which is the case in panel surveys), then we have:

(

yi1,yi2

)

is:

(

)

                  + + ′ 22 21 12 11 12 1 2 11 1 1 2 2 1, ~ , v v v v v a v a N y y s i i _µ µ (18a)

2. The marginal sample probability density functions of y_i₁ and y_i₂ are respectively given by:

(

1 1 11 11

)

1~N a v ,v y s i µ + (18b) and

(

2 1 12 22

)

2~N av ,v y s i µ + (18c) 3. The conditional sample probability density function of yi1 given yi2 is:

(

)

(

) (

)

(

)

{

2

}

12 11 2 2 5 . 0 22 11 12 2 12 11 1 1 2 1 |y ~N µ +av 1−ρ +ρ v v y −µ ,v 1−ρ y i s i i (18d)

4. The conditional sample probability density function of y_i₂ given y_i₁ is:

(

) (

)

(

)

{

2

}

12 22 1 1 5 . 0 11 22 12 2 1 2 |y ~N µ +ρ v v y −µ ,v 1−ρ y _i s i i (18e)

Notice that the sample and population probability density functions of yi2 |yi1 are same, while the other sample and population distributions are different.

Birnbaum et al. (1950) studied the effect of selection performed on some coordinates of a multi-dimensional population, but in different point of view.

(10)

Notice that the sample and population pdf’s of yi2|yi1 are same, while the other sample and population distributions are different.

Corollary 3. (Bivariate normal distribution,q=2).

Let E_p

( )

y_i₁ =µ₁,E_p

( )

y_i₂ =µ₂, Var_p

( )

y_i₁ =v₁₁, Var_p

( )

y_i₂ =v₂₂, Cov_p

(

y_i₁,y_i₂

)

=v₁₂

and Cor_p

(

y_i₁,y_i₂

)

=ρ₁₂. Under the linear model:

(

i i1, i2

)

0 1 i1 2 i2 p y y b b y b y

E π | = + +

and using the properties of multivariate normal distribution, we have the following results:

(

y_i₁,y_i₂

)

is:

(

)

(

₁ ₂

)

2 2 1 1 0 2 2 1 1 0 2 1, p i , i i i i i s f y y b b b y b y b b y y f µ µ + + + + = (19a) 2. Integrating (19a), with respect to y_i₂, we have:

( )

(

)

(

)

(

)

(

)

(

)

(

)

(

)

(

)

(

)

( )

(

)

( )

(

)

(

)

( )

(

)

(

)

( )

1 1 1 , , 1 , 1 , , 1 , , 1 2 2 1 1 0 1 1 5 . 0 11 22 12 2 1 1 0 2 1 2 1 2 2 2 2 1 1 0 1 2 1 2 1 1 1 1 0 2 2 1 1 0 2 1 2 1 2 2 2 2 1 1 0 2 2 1 1 1 2 2 1 0 2 2 1 1 0 2 2 1 2 2 2 2 1 1 0 2 2 1 1 1 2 2 1 0 2 2 1 1 0 2 2 1 2 2 1 1 0 2 2 1 1 0 2 2 1 1 i p i i i i i p i p i i i p i p i p i i p i i i p i p i i i i p i i i i p i i i p i i i i p i i i i p i i i p i i i i i s i s y f b b b y v v y b b dy y y f y f y b b b b y y E y f b y f y b y f b b b b dy y y f y f y b b b b dy y y f y b dy y y f b b b b dy y y f y b b b b dy y y f y b dy y y f b b b b dy y y f b b b y b y b b dy y y f y f                 + +         −       + + + = + + + + + + + = + + + + + + = + + + + + + = + + + + = =

∫

µ µ µ ρ µ µ µ µ µ µ µ µ µ µ µ µ µ µ µ (19b)

3. Similarly, integrating (19a), with respect to yi1, we obtain the following marginal sample probability density function of y_i₂:

(11)

( )

(

)

( )

2 2 2 1 1 0 2 2 5 . 0 22 11 12 1 1 2 2 0 2 p i i i i s f y b b b y v v b y b b y f                 + +         −       + + + = µ µ µ ρ µ (19c)

4. Using (19a), (19b), and (19c), we get the following conditional sample probability density function of y_i₁ given y_i₂:

(

)

(

( )

)

(

)

(

)

( )

(

)

(

( )

)

(

)

(

1 2

)

2 2 2 1 1 0 2 2 1 1 0 2 2 1 2 2 2 1 1 0 2 2 1 1 0 2 2 2 1 1 0 1 2 1 2 2 0 2 1 2 2 1 1 0 2 2 1 1 0 2 2 1 2 1 | | , | , , | i i p i i i p i i i p i i p i i i p i i i p i i p i i i p i i i s i i s i i s y y f y b y y E b b y b y b b y f y y f y b y y E b b y b y b b y f b b b y y E b y b b y y f b b b y b y b b y f y y f y y f         + + + + = + + + + = + + + + ÷ + + + + = = µ µ µ µ (19d) where

(

)

(

2 2

)

5 . 0 22 11 12 1 2 1 | µ ρ  −µ      + = _i i i p y v v y y E

5. Similarly, the conditional sample probability density function of yi2 given yi1 is:

(

)

(

)

(

2 1

)

0 1 1 1 2 2 0 2 2 1 1 1 2 | | | p i i i i i p i i i i s f y y b y b y y E b b y b y b y y f         + + + + = (19e) where

(

)

(

1 1

)

5 . 0 11 22 12 2 1 2 | µ ρ  −µ      + = _i i i p y v v y y E

Thus in this case we see that the sample probability density functions and population probability density functions are very different. Also notice that formulas (19b, c) are true, not only for the bivariate normal distribution, but for any joint probability density function of

(

y_i₁,y_i₂

)

, provided that the marginal and conditional distributions and the corresponding moments are exist.

The following theorem provides the maximum likelihood estimators of the mean vector μ and covariance matrixV.

(12)

Theorem 2. Assume

(

y y

)

N_q

(

)

i N p iq i i 1,..., ~ , , =1,..., ′ = μ V

y are independent. Let

n y

y1,..., be a sample of size n selected by informative sampling.

1. Under the exponential sampling – equation (4): the maximum likelihood estimators of μ and Vare given by:

a V y μ_ˆ ₌ ₋ _ˆ~_(20a) and

(

)(

i

)

{ }

ij

{ }

ij n i i s n − − ′ = = =

∑

= − V y y y y Vˆ ˆ 1 1 (20b) where ~a is the least square estimator under the model:E_s

(

w_i |y_i

)

=exp

(

−a₀ −a′y_i

)

,

(

)

′ = y1,...,yq y

∑

= − = n j ij i n y y 1 1 , and

∑

(

)

(

)

= − ₋ ₋ = = n k j jk i ik ij ij s n y y y y Vˆ 1 .

2. Under the linear sampling – equation (6): the maximum likelihood estimators of μ and Vare defined by the equations:

(

y−μˆ

)

(

b~₀ +~b′μˆ

)

=Vˆb~ (21a) and

(

−

)(

−

)

′ =

∑

= μ y μ y Vˆ ˆ ˆ 1 n i n (21b) where b~₀ and b~ are the least squares estimators under the model:E_s

(

w_i |y_i

)

=1

(

−b₀ −b′y_i

)

. Solve (21a) and (21b) iteratively. Start with

classical maximum likelihood estimators. Proofs:

1. Exponential sampling. Using the two-step method of estimation.

Step-one. Estimation of informativeness parameters a0 and a via the relationship (2).

Step-two. After substituting ~ain (5), the resulting sample log-likelihood is given by:

(

)

∑

(

) (

)

= ∗ − ∗ ′ ₋ − − − − = n i i i rs nq n l 1 1 5 . 0 log 5 . 0 2 log 5 . 0 ,V V y μ V y μ μ π (22) where μ* =μ+Va~.

According to Johnson and Wichern (1998), page 182, the maximum likelihood estimators of μ*and Vare given by:

(13)

y μ* = ˆ and

(

−

)(

−

)

′ =

∑

= − y y y y V _i n i i n 1 1 ˆ _.

Hence the result – equation (20).

2. Linear model. Using the two-step method of estimation.

Step-one. Estimation of informativeness parameters, b₀ and b via the relationship (2).

Step-two. After substituting b~₀ and b~ in (7), the resulting sample log-likelihood is given by:

(

μ V

)

=− − V −

∑

(

y −μ

)

′V

(

y −μ

)

−

(

+b′μ

)

= − ~ ~ log 2 1 log 2 1 2 log 2 1 , 0 1 1 b n n nq l n i i i rs π (23)

Now differentiating l_rs

(

μ,V

)

with respect toμ andV, we can show that the maximum likelihood estimators of μ and Vare defined by equation (21).

The following corollary provides the maximum likelihood estimators of the mean µ and the variance σ2 when the population model is univariate normal and the sampling process is exponential and linear, which is a particular case of Theorem 2 with q=1. Corollary 4. Assume y N

(

)

i N p i ~ , , 1,..., 2 = σ

µ are independent. Let y₁,...,y_n be a sample of size n selected under the following sampling schemes.

1.Exponential sampling. The maximum likelihood estimators of µ and σ2 are given by: 2 1ˆ ~ ˆ σ µ = y−a (24a) and

(

)

∑

= − ₋ = = n i i y y n s 1 2 1 2 2 ˆ σ (24b) where a~₁ is the least square estimator of the informativeness parameter a1.

2. Linear sampling. The maximum likelihood estimators of µ and σ2 are given by:

(

ˆ

)

~

(

~ ~ ˆ

)

0 ˆ 1 1 1 1 0 1 2

∑

− − + = = − n i i nb b b y µ µ σ (25a) and

(14)

(

)

∑

= − ₋ = n i i y n 1 2 1 2 ˆ ˆ µ σ (25b)

where b~₀ and b~₁ are the estimators of the informativeness parameters b₀ and b₁. Corollary 5. Assume

(

y y

)

N

(

)

i N p i i i 1, 2 ~ 2 , , =1,..., ′ = μ V

y . Under the exponential

sampling – equation (4): the maximum likelihood estimators of µ₁,µ₂

,

v₁₁,v₂₂,v₁₁and are given by:

12 2 11 1 1 1 ~ ~ ˆ = y −a s −a s µ (26a) 12 1 22 2 2 2 ~ ~ ˆ = y −a s −as µ (26b)

{ }

, , 1,2 ˆ 22 21 12 11 ₌ ₌       = s i j s s s s ij V (27)

4. Application – fitting general linear model for longitudinal survey data under informative sampling

4.1 Population model:

Let yit,i=1,...,N;t =1,...,T be the measurement on the i-th subject at time t=1,...,T. Associated with each yit are the (known) values,xitk,k =1,...p, of p explanatory variables. We assume that the yit follow the regression model:

y_it =β₁x_it₁ +...+β_px_itp +ε_it (28) where εit are random sequence of length T associated with each of the N subjects. In our context, the longitudinal structure of the data means that we expect the εit to be correlated within subjects.

Let yi =

(

yi1,...,yiT

)

′,

(

)

′

= it itp

it x 1,...,x

x and let β=

(

β1,...,βp

)

′ be the vector of unknown regression coefficients. The general linear model for longitudinal survey data treats the random vectors y_i,i=1,...,N as independent multivariate normal variables, that is

(

xβ V

)

y ~ _q _i ,

p

i N (29) where xi is the matrix of size Tby p of explanatory variables for subject i, and V has

( )

jk −th element, vjk =covp

(

yij,yik

)

,j,k =1,...,T; see Diggle, Liang and Zeger

(15)

4.2 Covariance structure of y : i

It is useful at this stage to consider what form the matrix V might take. We consider three cases: the exponential correlation model, the uniform correlation model; see Diggle, Liang and Zeger (1994), and the random effect model; see Skinner and Holmes (2003).

Case1: The exponential correlation model:

In this model, the

( )

t,s −th element of V has the form:

(

y y

)

t s T

v_ts =cov_p _it, _is =σ2ρt−s , , =1,..., (30) Note that the correlation,vts σ2, between a pair of measurements on the same unit decays toward zero as the time separation between the measurements increases. Case2: The uniform correlation model:

In this model, we assume that there is a positive correlation,ρ, between any two measurements on the same subject. So that the

( )

t,s −th element of V has the form:

(

y y

)

t s T vts covp it, is , 1,..., 2 ≠ = = = σ ρ ; vtt ,t 1,...,T 2 = =σ (31) Case3: Random effects models:

Under this model the multivariate outcomes yi =

(

yi1,...,yiT

)

′,i=1,...,N are independent with mean vector and covariance matrix given respectively by:

( ) (

y_i = _T

)

=μ p E β₁,...,β (32a)

( )

i = u T + T =∑ p y J V 2 2 cov σ σ (32b) where J_T denotes the Tby Tmatrix all of whose elements are one, and the

( )

t,t′ −th

element of VT is t t T t t .,..., , ; ′= ′ − ρ . 4.3 Sampling design:

We assume a single-stage informative sampling design, where the sample is a panel sample selected at time t =1 and all units remain in the sample till time t=T. Examples of longitudinal surveys, some of which are based on complex sample designs, and of the issues involved in their design and analysis can be found in Herriot and Kasprzyk (1984), and Nathan (1999). In many of the cases described in these papers, a sample is selected for the first round and continues to serve for several rounds. Then it is intuitively reasonable to assume that the first order inclusion probabilities, π_i, depend on the population values of the response variable at the first

(16)

occasion only, the values y_i₁, and on x_i₁=

(

x_i₁₁,,x_i₁_p

)

′, and the values of known design variable, z=

{

z₁,...,z_N

}

, used for the sample selection, but not included in the working model under consideration.

4. 4 Sample distribution:

Under exponential inclusion probability model:

(

i i i

)

(

i i i p i p

)

p y a a y a x a x a x E | 1, 1 =exp 0 + 0 1 + 1 11+ 2 12 +...+ 1 ∗ x π (33)

Using (29), (33) and Theorem 1, we have:

(

μ V

)

θ x y , , *, 0

~

q s i i a N (34) where,

[

_′ ₊ _′ ₊ _′ ₊

]

′ = _i₁ a₀v₁₁ _i₂ a₀v₁₂ _iT a₀v₁_T * ,..., ,x β x β β x μ

Note that, let y_i_,_T₋₁ =

(

y_i₂,y_i₃,,y_iT

)

′. Alternatively, the sample probability density function of yi is can be written as:

(

i i

)

s

(

i i

) (

p iT i i

)

s f y f y f y x = ₁x₁ y_, ₋₁ | ₁,x (35) where

(

)

(

)

_      − ′ − − = 2 11 0 1 1 11 11 1 1 2 1 exp 2 1 , , y a v v v y fs i xi θ γ i xiβ π (36)

(

)

( )

(

)

(

)

(

)

   ₋ ₋ ′ ₋ = − − − ∗ − − − ∗ − − − , 1 1 1 1 , 0 1 1 , 2 1 1 , 0 1 1 1 , 2 1 exp 2 1 , | _i_T _T _T _i_T _T T T i i T i p V V y f y x y μ y μ π (37)

[

]

(

)

(

)

_′      ′ − + ′ ′ − + ′ = = ∗ ∗ ∗ ∗ − − β x β x β x β x x y μ 1 1 11 1 1 1 11 21 2 1 1 , 1 ,..., , | i i T iT i i i i i T i p T y v v y v v y E

with general term:

( )

1,1 1, 1 1 11 1 , 1 + ′+ − + ′ + ′ = t t − t t t t v v v v v ; t,t′=1,...,T −1. So that we have the following sample model:

. , , 2 , 1 ; ... 1 1 0 n i x x y it i it itp p it it  = + = + + + + = ∗ ∗ _ε ε β β β β x , (38)

(17)

where β₀ =a₀v₁₁, β∗ =

(

β₀,β₁,...,β_p

)

′, x∗_i =

(

1,x_i

)

and the ε_it are a random sequence correlated within subjects .

Note that if a₀ =0, that is the sampling design is noninformative, then the population and sample models are the same.

4.5 Estimation:

We consider three method of estimation, namely: unweighted maximum likelihood, pseudo maximum likelihood, and two-step estimation based on the sample distribution.

4.5.1 Unweighted maximum likelihood:

Maximum likelihood for the case where the sampling design is ignorable: the value of

(

β V

)

θ= , that satisfy:

(

)

(

)

(

)

      ₊ ₊ ₋ ′ ₋ ∂ ∂ − = ∂ ∂

∑

= − n i i i srs nq n l 1 1 log 2 log 2 1 , V y μ V y μ θ V μ θ π (39)

4.5.2 Pseudo maximum likelihood:

The pseudo maximum likelihood estimator of θ=

(

β,V

)

is defined as the solution of:

( )

ln2 ln

{

(

)

}

{

(

)

}

0 2 1 ˆ 1 =     ₊ ₊ ₋ ₊ ′ ₋ ₊ ∂ ∂ − = − ∈

∑

V y μ Va V y μ Va θ θ i i s i i w w q U π (40)

( )

ln

{

(

)

}

ln

{

(

| ,

)

}

0 ˆ 1 1 , 1 1 _∂ = ∂ + ∂ ∂ = ₋ ∈ ∈

∑

p iT i i s i i i s s i i ws f y n N y f w U y x θ x θ θ (41)

For more discussion, see Eideh and Nathan (2006). 4.5.3 Two-step method:

Step one. Estimate a₀ via the model: E_s

(

w_i |y_i₁,x_i₁

)

=exp

(

−a∗₀ −a₀y_i₁ −a′x_i₁

)

. Step two. Using Theorem 2, the maximum likelihood estimators of β∗ and V are defined as the solution of:

(

)

log2 log

(

) (

)

0 2 1 , 1 1 =       − ′ − + + ∂ ∂ − = ∂ ∂

∑

= ∗ − ∗ ∗ n i i i rs nq n l V y μ V y μ θ V β θ π (42) where β∗ =

(

β₀,β₁,...,β_p

)

′ and μ* =

[

x′_i₁β+a₀v₁₁,x_i′₂β+a₀v₁₂,...,x′_iTβ+a₀v₁_T

]

′.

(18)

4.6 Variance estimation:

For variance estimation we use the inverse of Fisher information matrix and the bootstrap approach for variance, see Pfeffermann and Sverchkov (1999, 2003) and Eideh and Nathan (2006).

(a) Fisher information matrix approach:

The inverse of the observed Fisher information matrix evaluated at θˆ =

( )

βˆ,Vˆ is given by:

( ) ( )

[ ]

( )

1 ˆ 2 1 1 ˆ ˆ ˆ − = −               ∂ ′ ∂ ∂ − = = θ θ θ θ θ θ θ l n I Vs s (43) (b) Bootstrap approach:

Let θˆ =

( )

βˆ,Vˆ be the sample maximum likelihood estimator of θ=

(

β,V

)

based on any of the equations (39-42) and θˆb =

(

βˆb,Vˆb

)

be the ML estimator computed from the bootstrap sample b=1,...,B, with the same sample size, drawn by simple random sampling with replacement from the original sample – the sample drawn under informative sampling design. The bootstrap variance estimator of θˆ =

( )

βˆ,Vˆ is defined as:

( )

′      ₋       ₋ =

∑

= b boot B b boot b boot B Vˆ θˆ 1 θˆ θˆ θˆ θˆ 1 (44) where

∑

= B b b boot B θ θˆ 1 ˆ . 5. Conclusions

In this paper, we extend the definition of univariate sample distribution into multivariate random variables. Also, we consider a new method of estimating the parameters of the superpopulation model for analyzing multivariate normal observations from finite population when the sampling design is informative. Furthermore, the general linear model for longitudinal survey data under informative sampling using different covariance structures: the exponential correlation model, the uniform correlation model, and the random effect model, was fitted under informative sampling.

The main feature of the present estimators is their behaviours in terms of the informativeness parameters.

The paper is purely mathematical. The role of informativeness of sampling mechanism in adjusting various estimators for bias reduction, based on simulation

(19)

study, under different population models and different modeling of conditional expectations of first order inclusion probabilities given response variable and covariates, can be found in Pfeffermann and Sverchkov (1999, 2003), Nathan and Eideh (2004), Eideh (2008), and Eideh and Nathan (2006, 2009).

I hope that the new mathematical results obtained will encourage further theoretical, empirical and practical research in these directions.

ACKNOWLEDGEMENTS

The author is grateful to the referees for their valuable comments.

REFERENCES

Birnbaum Z.W., Paulson E., and Andrews F.C. (1950). On the Effect of Selection Performed on Some Coordinates of a Multi-Dimensional Population. Psychometrika, 15, pp 191-204.

Chambers, R. and Skinner, C. (2003). Analysis of Survey Data. New York: John Wiley.

Diggle, P. J., Liang, K. Y, and Zeger, S. L. (1994). Analysis of Longitudinal Data. Oxford: Science Publication.

Eideh A.H. (2008). Estimation and Prediction of Random Effects Models for Longitudinal Survey Data under Informative Sampling. Statistics in Transition – New Series .Volume 9, Number 3, December 2008, pp 485 – 502.

Eideh, A. H. and Nathan, G. (2009). Two-Stage Informative Cluster Sampling with application in Small Area Estimation. Journal of Statistical Planning and Inference.139, pp 3088-3101.

Eideh, A. H. and Nathan, G. (2006)

Journal of Statistical Planning and

Inference,136, 3052-3069.

Fahrmeir, L. and Tutz, G. (2001). Multivariate statistical modeling based on generalized linear models,2nd edn. New York: Springer.

Herriot, R.A., and Kasprzyk, D. (1984). The survey of income and program participation. American Statistical Association, Proceedings of the Social Statistics Section,pp.107-116.

Hoem, J.M. (1989). The issue of weights in panel surveys of individual behaviors. In Panel Surveys, (Eds.), Kasprzyk, D., Duncan, G.J., Kalton, G., and Singh, M.P., New York: Wiley, pp. 539-565.

Lohnson, R.A., and Wichern, D.W. (1998). Applied Multivariate Statistical Analysis, 4th edn. New Jersey: Prentice Hall.

Kasprzyk, D., Duncan, G.J, Kalton, G., and Singh, M.P. (Eds.) (1989). Panel Surveys. New York: Wiley.

Krieger, A.M, and Pfeffermann, D. (1997) Testing of distribution functions from complex sample surveys. Journal of Official Statistics. 13: 123-142.

Mardia, K.V., Kent, T.J. and Bibby, J.M. (1979). Multivariate analysis, New York: Academic Press.

Nathan, G. (1999). A review of sample attrition and representativeness in three longitudinal surveys. GSS methodology Series No. 13. London: Office of National Statistics.

(20)

Nathan, G. and Eideh, A. H. (2004) : Échantillonage et Méthodes d’Enquêtes. (P. Ardilly – Ed.) Paris: Dunod, pp 227-240.

Pfeffermann, D. (1993). The role of sampling weight when modeling survey data. International Statistical Review 61: 317-337.

Pfeffermann, D. (1996). The use of sampling weights for survey data analysis. Statistical Methods in Medical Research, V.5: 239-261.

Pfeffermann, D., Krieger, A. M, and Rinott, Y. (1998). Parametric Distributions of Complex Survey Data under Informative Probability Sampling. Statistica Sinica, 8, 1087 1114.

Pfeffermann, D. and Sverchkov, M. (1999). Parametric and Semi-Parametric Estimation of Regression Models Fitted to Survey Data. Sankhya, 61, Ser. B, 66-186. Pfeffermann, D. and Sverchkov, M. (2003). Fitting Generalized Linear Models under Informative Probability Sampling. Analysis of Survey Data. (eds. R. Chambers and C. J. Skinner), pp. 175-195. New York: Wiley.

Skinner, C.J., Holt, D., and Smith, T.M.F (Eds.) (1989). Analysis of Complex Surveys, New York: Wiley.

Skinner, C.J., and Holmes, D. (2003). Random Effects Models for Longitudinal Data. Analysis of Survey Data. (eds. R. Chambers and C. Skinner), pp. 175-195. New York: Wiley.

Smith, T.M.F. and Holmes, D.J. (1989). Multivariate analysis.In Analysis of Complex Surveys, eds. C.J. Skinner, D. Holt and T.M.F. Smith, New York: Wiley, pp. 165-190.