Parametric fractional imputation for missing data analysis


Statistics Publications

Statistics

2011


Jae Kwang Kim

Iowa State University, [email protected]


The complete bibliographic information for this item can be found at http://lib.dr.iastate.edu/stat_las_pubs/102. For information on how to cite this item, please visit http://lib.dr.iastate.edu/howtocite.html.

This Article is brought to you for free and open access by the Statistics at Iowa State University Digital Repository. It has been accepted for inclusion in Statistics Publications by an authorized administrator of Iowa State University Digital Repository. For more information, please contact

© ???? Biometrika Trust

Printed in Great Britain


BY JAE KWANG KIM

Department of Statistics, Iowa State University, Ames, Iowa 50011, U.S.A. [email protected]

SUMMARY

Parametric fractional imputation is proposed as a general tool for missing data analysis. Using fractional weights, the observed likelihood can be approximated by the weighted mean of the imputed data likelihood. Computational efficiency can be achieved using the idea of importance sampling and calibration weighting. The proposed imputation method provides efficient parameter estimates for the model parameters specified in the imputation model and also provides reasonable estimates for parameters that are not part of the imputation model. Variance estimation is discussed and results from a limited simulation study are presented.

Some key words: EM algorithm, Importance sampling, Item nonresponse, Monte Carlo EM, Multiple imputation.

1. INTRODUCTION

Suppose that $y_1, \ldots, y_n$ are the observations for a probability sample selected from a finite population, where the finite population values are independent realisations of a random variable $Y$ with a $p$-dimensional distribution $F_0(y) \in \{F_\theta(y) ; \theta \in \Omega\}$. Suppose that, under complete response, a parameter $\eta_g = E\{g(Y)\}$ is unbiasedly estimated by

$$\hat\eta_g = \sum_{i=1}^{n} w_i g(y_i) \tag{1}$$

for some function $g(y_i)$ with sampling weights $w_i$. Under simple random sampling, the sampling weight is $1/n$ and the sample can be regarded as a random sample from an infinite population with distribution $F_0(y)$.

Under nonresponse, one can replace (1) with

$$\hat\eta_{gR} \equiv \sum_{i=1}^{n} w_i E\{g(y_i) \mid y_{i,\mathrm{obs}}\}, \tag{2}$$

where $y_{i,\mathrm{obs}}$ and $y_{i,\mathrm{mis}}$ denote the observed part and missing part of $y_i$, respectively. To simplify the presentation, we assume the sampling mechanism and the response mechanism are ignorable in the sense of Rubin (1976). To compute the conditional expectation in (2), we need a correct specification of the conditional distribution of $y_{i,\mathrm{mis}}$ given $y_{i,\mathrm{obs}}$. The conditional expectation in (2) depends on $\theta_0$, where $\theta_0$ is the true parameter value corresponding to $F_0$. That is,

$$E\{g(y_i) \mid y_{i,\mathrm{obs}}\} = E\{g(y_i) \mid y_{i,\mathrm{obs}}, \theta_0\}.$$

To compute the conditional expectation in (2), a Monte Carlo approximation based on the imputed data can be used. Thus, one can interpret imputation as a Monte Carlo approximation of the conditional expectation given the observed data. Imputation is very attractive in practice because, once the imputed data are created, the data analyst does not need to know the conditional


distribution in (2). Monte Carlo methods for approximating the conditional expectation in (2) can be placed in two classes. One is the Bayesian approach, where the imputed values are generated from the posterior predictive distribution of $y_{i,\mathrm{mis}}$ given $y_{\mathrm{obs}} = (y_{i,\mathrm{obs}} ; i = 1, \ldots, n)$:

$$f(y_{i,\mathrm{mis}} \mid y_{\mathrm{obs}}) = \int f(y_{i,\mathrm{mis}} \mid \theta, y_{\mathrm{obs}}) f(\theta \mid y_{\mathrm{obs}}) \, d\theta. \tag{3}$$

This is essentially the approach used in multiple imputation as proposed by Rubin (1987). The other is the frequentist approach, where the imputed values are generated from the conditional distribution $f(y_{i,\mathrm{mis}} \mid y_{\mathrm{obs}}, \hat\theta)$ and $\hat\theta$ is an estimated value for $\theta$.

In the Bayesian approach to imputation, the convergence to a stable posterior predictive distribution (3) is difficult to check (Gelman et al., 1996). Also, the variance estimator used in multiple imputation is not consistent for some estimated parameters. For examples, see Wang & Robins (1998) and Kim et al. (2006).

The frequentist approach for imputation has received less attention than the Bayesian imputation. One notable exception is Wang & Robins (1998) who studied the asymptotic properties of multiple imputation and a parametric frequentist imputation procedure. They considered the estimated parameter $\hat\theta$ to be given, and did not discuss parameter estimation.

We consider frequentist imputation given a parametric model for the original distribution. Using the idea of importance sampling, we propose a frequentist imputation method that can be implemented with fractional imputation, discussed in Fay (1996) and Kim & Fuller (2004), where fractional imputation was presented as a nonparametric imputation method in the context of survey sampling and the parameters of interest are of descriptive nature. The proposed fractional imputation, called parametric fractional imputation, is also applicable in an analytic setting where interest lies in the model parameters of the superpopulation model. The parametric fractional imputation method can be modified to reduce Monte Carlo error and can be used to simplify the Monte Carlo implementation of the EM algorithm.

2. FRACTIONAL IMPUTATION

As discussed in §1, we consider an approximation for the conditional expectation in (2) using fractional imputation. In fractional imputation, $M > 1$ imputed values for $y_{i,\mathrm{mis}}$, say $y_{i,\mathrm{mis}}^{*(1)}, \ldots, y_{i,\mathrm{mis}}^{*(M)}$, are generated and assigned fractional weights, $w_{i1}^*, \ldots, w_{iM}^*$, so that

$$\sum_{j=1}^{M} w_{ij}^* g(y_{ij}^*) = E\{g(y_i) \mid y_{i,\mathrm{obs}}, \hat\theta\}, \tag{4}$$

where $y_{ij}^* = (y_{i,\mathrm{obs}}, y_{i,\mathrm{mis}}^{*(j)})$, holds at least approximately for large $M$, where $\hat\theta$ is a consistent estimator of $\theta_0$. A popular choice for $\hat\theta$ is the pseudo maximum likelihood estimator, where $\hat\theta$ is the $\theta$ that maximizes the pseudo log-likelihood function. That is,

$$\hat\theta = \arg\max_{\theta \in \Omega} \sum_{i=1}^{n} w_i \log\{f_{\mathrm{obs}(i)}(y_{i,\mathrm{obs}} ; \theta)\}, \tag{5}$$

where $f_{\mathrm{obs}(i)}(y_{i,\mathrm{obs}} ; \theta) = \int f(y_i ; \theta) \, dy_{i,\mathrm{mis}}$ is the marginal density of $y_{i,\mathrm{obs}}$. A computationally


Condition (4) applied to $g(y_i) = c$ implies that

$$\sum_{j=1}^{M} w_{ij}^* = 1 \tag{6}$$

for all $i$. Given fractionally imputed data satisfying (4) and (6), the parameter $\eta_g$ can be estimated by

$$\hat\eta_{FI,g} = \sum_{i=1}^{n} \sum_{j=1}^{M} w_i w_{ij}^* g(y_{ij}^*). \tag{7}$$

The imputed estimator (7) is obtained by applying the formula (1) using $y_{ij}^*$ as the observations with weights $w_i w_{ij}^*$. For a single parameter $\eta_g = E\{g(Y)\}$, any fractional imputation satisfying (4) provides a consistent estimator of $\eta_g$. For general purpose estimation, the $g$-function defining $\eta_g$ is unknown at the time of imputation (Fay, 1992). To create fractional imputation for categorical data with a finite number of possible values for $y_{i,\mathrm{mis}}$, we take the possible values as the imputed values and compute the conditional probability of $y_{i,\mathrm{mis}}$ as

$$p(y_{i,\mathrm{mis}}^{*(j)} \mid y_{i,\mathrm{obs}}, \hat\theta) = \frac{f(y_{ij}^* ; \hat\theta)}{\sum_{k=1}^{M_i} f(y_{ik}^* ; \hat\theta)},$$

where $f(y_i ; \theta)$ is the joint density of $y_i$ evaluated at $\theta$ and $M_i$ is the number of possible values of $y_{i,\mathrm{mis}}$. The choice of $w_{ij}^* = p(y_{i,\mathrm{mis}}^{*(j)} \mid y_{i,\mathrm{obs}}, \hat\theta)$ satisfies (4) and (6). Fractional imputation for categorical data using $w_{ij}^* = p(y_{i,\mathrm{mis}}^{*(j)} \mid y_{i,\mathrm{obs}}, \hat\theta)$, which is close in spirit to the expectation-maximisation by weighting method of Ibrahim (1990), is discussed in Kim & Rao (2009).
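As an illustration of the categorical case, the following minimal sketch computes the fractional weights by normalising the joint density over the possible values of the missing item. The function names and the toy model (a binary $y$ with $\mathrm{pr}(y = 1 \mid x)$ indexed by an observed binary $x$, and a constant marginal for $x$) are assumptions made here for illustration and do not come from the paper.

```python
def categorical_fractional_weights(joint_density, y_obs, candidates, theta):
    """Return w*_ij = f(y*_ij; theta) / sum_k f(y*_ik; theta) over all candidates."""
    dens = [joint_density(y_obs, y_mis, theta) for y_mis in candidates]
    total = sum(dens)
    return [d / total for d in dens]


def toy_joint(x, y, theta):
    """Hypothetical joint density of (x, y): pr(x) = 0.5 and pr(y = 1 | x) = theta[x]."""
    p = theta[x]
    return 0.5 * (p if y == 1 else 1.0 - p)


# One unit with x = 1 observed and y missing; the candidates are y = 0 and y = 1.
weights = categorical_fractional_weights(toy_joint, 1, [0, 1], {0: 0.2, 1: 0.7})
```

Because the marginal density of the observed part cancels in the ratio, only the joint density is needed, and the weights sum to one by construction, which is condition (6).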

For a continuous random variable $y_i$, condition (4) can be approximately satisfied using importance sampling, where $y_{i,\mathrm{mis}}^{*(1)}, \ldots, y_{i,\mathrm{mis}}^{*(M)}$ are independently generated from a distribution with density $h(y_{i,\mathrm{mis}})$ which has the same support as $f(y_{i,\mathrm{mis}} \mid y_{i,\mathrm{obs}}, \theta)$ for all $\theta \in \Omega$. The corresponding fractional weights are

$$w_{ij0}^* = w_{ij0}^*(\hat\theta) = C_i \frac{f(y_{i,\mathrm{mis}}^{*(j)} \mid y_{i,\mathrm{obs}} ; \hat\theta)}{h(y_{i,\mathrm{mis}}^{*(j)})}, \tag{8}$$

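where $C_i$ is chosen to satisfy (6). If $h(y_{i,\mathrm{mis}}) = f(y_{i,\mathrm{mis}} \mid y_{i,\mathrm{obs}}, \hat\theta)$ is used, $w_{ij0}^* = M^{-1}$.

REMARK 1. Under mild conditions, $\bar g_i^* = \sum_{j=1}^{M} w_{ij0}^* g(y_{ij}^*)$ with $w_{ij0}^*$ in (8) converges to $\bar g_i(\hat\theta) \equiv E\{g(y_i) \mid y_{i,\mathrm{obs}}, \hat\theta\}$ with probability 1, as $M \to \infty$. The approximate variance is $\sigma_i^2 / M$, where

$$\sigma_i^2 = E\left[ \{g(y_i) - \bar g_i(\hat\theta)\}^2 \frac{f(y_{i,\mathrm{mis}} \mid y_{i,\mathrm{obs}}, \hat\theta)}{h(y_{i,\mathrm{mis}})} \,\Big|\, y_{i,\mathrm{obs}}, \hat\theta \right].$$

The $h(y_{i,\mathrm{mis}})$ that minimizes $\sigma_i^2$ is

$$h^*(y_{i,\mathrm{mis}}) = f(y_{i,\mathrm{mis}} \mid y_{i,\mathrm{obs}}, \hat\theta) \times \frac{|g(y_i) - \bar g_i(\hat\theta)|}{E\{|g(y_i) - \bar g_i(\hat\theta)| \mid y_{i,\mathrm{obs}}, \hat\theta\}}.$$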


When the $g$-function is unknown, $h(y_{i,\mathrm{mis}}) = f(y_{i,\mathrm{mis}} \mid y_{i,\mathrm{obs}}, \hat\theta)$ is a reasonable choice in terms of statistical efficiency. Other choices of $h(y_{i,\mathrm{mis}})$ can have better computational efficiency in some situations.
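The importance sampling weights in (8) can be sketched as follows. This is a minimal illustration under assumed densities: the target conditional distribution is taken to be $N(1, 1)$ and the proposal $h$ to be $N(0, 2^2)$; both are hypothetical choices, and the helper names are not from the paper. The constant $C_i$ is obtained by normalising the density ratios so that the weights sum to one.

```python
import math
import random


def normal_pdf(y, mean, sd):
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fractional_weights(draws, target_pdf, proposal_pdf):
    """Weights proportional to f(y*) / h(y*), normalised to sum to one."""
    raw = [target_pdf(y) / proposal_pdf(y) for y in draws]
    total = sum(raw)
    return [r / total for r in raw]


rng = random.Random(0)
M = 5000
draws = [rng.gauss(0.0, 2.0) for _ in range(M)]  # imputed values drawn from h
w = fractional_weights(
    draws,
    lambda y: normal_pdf(y, 1.0, 1.0),  # assumed target conditional density
    lambda y: normal_pdf(y, 0.0, 2.0),  # assumed proposal density h
)
# Weighted mean of the draws approximates the conditional mean under the target.
approx_mean = sum(wj * yj for wj, yj in zip(w, draws))

# When the proposal equals the target, every weight reduces to 1 / M.
w_equal = fractional_weights(
    draws[:10],
    lambda y: normal_pdf(y, 0.0, 2.0),
    lambda y: normal_pdf(y, 0.0, 2.0),
)
```

The second call illustrates the special case noted after (8): drawing from the target itself gives equal weights $M^{-1}$.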

For public access data, a large number of imputed values is not desirable. We propose an approximation with a small imputation size, say $M = 10$. To describe the procedure, let $y_{i,\mathrm{mis}}^{*(1)}, \ldots, y_{i,\mathrm{mis}}^{*(M)}$ be independently generated from a distribution with density $h(y_{i,\mathrm{mis}})$. Given the imputed values, it remains to compute the fractional weights that satisfy (4) and (6) as closely as possible. The proposed fractional weights are computed in two steps. In the first step, the initial fractional weights are computed by (8). In the second step, the initial fractional weights are adjusted to satisfy (6) and

$$\sum_{i=1}^{n} \sum_{j=1}^{M} w_i w_{ij}^* s(\hat\theta ; y_{ij}^*) = 0, \tag{9}$$

where $s(\theta ; y) = \partial \log f(y ; \theta) / \partial\theta$ is the score function of $\theta$. Adjusting the initial weights to satisfy a constraint is often called calibration. As can be seen in §3, constraint (9) makes the resulting imputed estimator $\hat\eta_{FI,g}$ in (7) fully efficient for a linear function of $\theta$.

To construct the fractional weights satisfying (6) and (9), regression weighting or empirical likelihood weighting can be used. For example, in the regression weighting, the fractional weights are

$$w_{ij}^* = w_{ij0}^* - \left( \sum_{i=1}^{n} w_i \bar s_i^* \right)^{T} \left\{ \sum_{i=1}^{n} \sum_{j=1}^{M} w_i w_{ij0}^* (s_{ij}^* - \bar s_i^*)^{\otimes 2} \right\}^{-1} w_{ij0}^* (s_{ij}^* - \bar s_i^*), \tag{10}$$

where $w_{ij0}^*$ is the initial fractional weight (8) using importance sampling, $\bar s_i^* = \sum_{j=1}^{M} w_{ij0}^* s_{ij}^*$, $B^{\otimes 2} = B B^{T}$, and $s_{ij}^* = s(\hat\theta ; y_{ij}^*)$. Here, $M$ need not be large. If the distribution belongs to the exponential family of the form

$$f(y ; \theta) = \exp\{t(y)^{T} \theta + \phi(\theta) + A(y)\},$$

then (9) can be obtained from $\sum_{i=1}^{n} \sum_{j=1}^{M} w_i w_{ij}^* \{t(y_{ij}^*) + \dot\phi(\hat\theta)\} = 0$, where $\dot\phi(\theta) = \partial\phi(\theta)/\partial\theta$. In this case, calibration can be used only for complete sufficient statistics.
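The regression weighting in (10) can be sketched for a scalar parameter as follows. The setting is an assumption made for illustration: $y \sim N(\theta, 1)$ so that the score is $y - \theta$, respondents are treated as units with a single value of weight one, and $\hat\theta$ is the respondent mean. The function and variable names are hypothetical.

```python
import random


def calibrate(w_samp, w0, scores):
    """Scalar-score version of (10): adjust initial weights so that (6) and (9) hold."""
    sbar = [sum(a * b for a, b in zip(w0_i, s_i)) for w0_i, s_i in zip(w0, scores)]
    total_score = sum(wi * sb for wi, sb in zip(w_samp, sbar))
    spread = sum(
        wi * w0_ij * (s_ij - sb) ** 2
        for wi, w0_i, s_i, sb in zip(w_samp, w0, scores, sbar)
        for w0_ij, s_ij in zip(w0_i, s_i)
    )
    factor = total_score / spread
    return [
        [w0_ij * (1.0 - factor * (s_ij - sb)) for w0_ij, s_ij in zip(w0_i, s_i)]
        for w0_i, s_i, sb in zip(w0, scores, sbar)
    ]


rng = random.Random(1)
y_resp = [rng.gauss(0.0, 1.0) for _ in range(6)]
theta_hat = sum(y_resp) / len(y_resp)  # MLE from respondents under N(theta, 1)
M = 10
imputed = [[rng.gauss(theta_hat, 1.0) for _ in range(M)] for _ in range(4)]
n = len(y_resp) + len(imputed)
w_samp = [1.0 / n] * n
# Respondents carry one value with weight 1; nonrespondents start from weights 1/M.
w0 = [[1.0] for _ in y_resp] + [[1.0 / M] * M for _ in imputed]
scores = [[y - theta_hat] for y in y_resp] + [[y - theta_hat for y in row] for row in imputed]
w_cal = calibrate(w_samp, w0, scores)
residual = sum(
    wi * wij * sij
    for wi, w_i, s_i in zip(w_samp, w_cal, scores)
    for wij, sij in zip(w_i, s_i)
)
```

After calibration the weighted score sum `residual` is zero up to rounding, and each unit's weights still sum to one. Regression weights of this form are not guaranteed to be nonnegative, which is one reason empirical likelihood weighting may be preferred in some applications.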

3. ASYMPTOTIC RESULTS

In this section, we discuss some asymptotic properties of the fractionally imputed estimator (7). We consider two types of fractionally imputed estimators. One is obtained by using the initial fractional weights in (8) and the other is obtained by using the calibrated fractional weights of (10). The imputed estimator $\hat\eta_{FI,g}$ in (7) is a function of $n$ and $M$, where $n$ is the sample size and $M$ is the number of imputed values for each missing value. Thus, we use $\hat\eta_{g0,n,M}$ and $\hat\eta_{g1,n,M}$ to denote the imputed estimator (7) using the initial fractional weights in (8) and the imputed estimator using the calibration fractional weights in (10), respectively. The following theorem presents some asymptotic properties of the fractionally imputed estimators. The proof is presented in Appendix A.

THEOREM 1. Under some regularity conditions stated in Appendix A,

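and

$$(\hat\eta_{g1,n,M} - \eta_g) / \sigma_{g1,n,M} \to N(0, 1) \tag{12}$$

in distribution, as $n \to \infty$, for each $M > 1$, where

$$\sigma_{g0,n,M}^2 = \mathrm{var}\left[ \sum_{i=1}^{n} w_i \{\bar g_i^*(\theta_0) + K_1^{T} \bar s_i(\theta_0)\} \right],$$

$$\sigma_{g1,n,M}^2 = \mathrm{var}\left[ \sum_{i=1}^{n} w_i \{\bar g_i^*(\theta_0) + K_1^{T} \bar s_i(\theta_0) + B^{T} (\bar s_i(\theta_0) - \bar s_i^*(\theta_0))\} \right],$$

$\bar g_i^*(\theta) = \sum_{j=1}^{M} w_{ij0}^*(\theta) g(y_{ij}^*)$, $\bar s_i(\theta) = E_\theta\{s(\theta ; y_i) \mid y_{i,\mathrm{obs}}\}$, $\bar s_i^*(\theta) = \sum_{j=1}^{M} w_{ij0}^*(\theta) s(\theta ; y_{ij}^*)$, $B = \{I_{\mathrm{mis}}(\theta_0)\}^{-1} I_{g,\mathrm{mis}}(\theta_0)$, and $K_1 = \{I_{\mathrm{obs}}(\theta_0)\}^{-1} I_{g,\mathrm{mis}}(\theta_0)$. Here, $I_{\mathrm{obs}}(\theta) = E\{-\sum_{i=1}^{n} w_i \partial \bar s_i(\theta)/\partial\theta\}$, $I_{g,\mathrm{mis}}(\theta) = E[\sum_{i=1}^{n} w_i \{s(\theta ; y_i) - \bar s_i(\theta)\} g(y_i)]$, and $I_{\mathrm{mis}}(\theta) = E[\sum_{i=1}^{n} w_i \{s(\theta ; y_i) - \bar s_i(\theta)\}^{\otimes 2}]$.

In Theorem 1,

$$\sigma_{g0,n,M}^2 = \sigma_{g1,n,M}^2 + B^{T} \mathrm{var}\left\{ \sum_{i=1}^{n} w_i (\bar s_i^* - \bar s_i) \right\} B$$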

and the last term represents the reduction in the variance of the fractionally imputed estimator of $\eta_g$ due to the calibration in (9). Thus, $\sigma_{g0,n,M}^2 \geq \sigma_{g1,n,M}^2$ with equality for $M = \infty$. Clayton et al. (1998) and Robins & Wang (2000) proved results similar to (11) for the special case of $M = \infty$.

To consider variance estimation, let

$$\hat V(\hat\eta_g) = \sum_{i=1}^{n} \sum_{j=1}^{n} \Omega_{ij} g(y_i) g(y_j)$$

be a consistent estimator for the variance of $\hat\eta_g = \sum_{i=1}^{n} w_i g(y_i)$ under complete response, where $\Omega_{ij}$ are coefficients. Under simple random sampling, $\Omega_{ij} = -1/\{n^2 (n-1)\}$ for $i \neq j$ and $\Omega_{ii} = 1/n^2$.
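As a quick check of the simple random sampling coefficients, the quadratic form with $\Omega_{ii} = 1/n^2$ and $\Omega_{ij} = -1/\{n^2(n-1)\}$ reduces algebraically to the familiar $s^2/n$ for the sample mean. The short sketch below verifies this numerically on a small made-up data vector; the function name is illustrative only.

```python
import statistics


def srs_variance_quadratic_form(g):
    """Evaluate sum_i sum_j Omega_ij g_i g_j with the simple random sampling coefficients."""
    n = len(g)
    total = 0.0
    for i in range(n):
        for j in range(n):
            omega = 1.0 / n**2 if i == j else -1.0 / (n**2 * (n - 1))
            total += omega * g[i] * g[j]
    return total


g_values = [1.0, 2.0, 4.0, 7.0]  # made-up values of g(y_i)
v_quadratic = srs_variance_quadratic_form(g_values)
v_textbook = statistics.variance(g_values) / len(g_values)  # s^2 / n
```

For these four values the sample variance is 7, so both expressions give 1.75.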

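For large $M$, using the results in Theorem 1, a consistent estimator for the variance of $\hat\eta_{FI,g}$ in (7) is

$$\hat V(\hat\eta_{FI,g}) = \sum_{i=1}^{n} \sum_{j=1}^{n} \Omega_{ij} \bar e_i^* \bar e_j^*, \tag{13}$$

where $\bar e_i^* = \bar g_i^*(\hat\theta) + \hat K_1^{T} \bar s_i^*(\hat\theta) = \sum_{j=1}^{M} w_{ij0}^* \hat e_{ij}^*$, $\hat e_{ij}^* = g(y_{ij}^*) + \hat K_1^{T} s(\hat\theta ; y_{ij}^*)$ and

$$\hat K_1 = \left\{ \sum_{i=1}^{n} w_i \bar s_i^*(\hat\theta) \bar s_i^*(\hat\theta)^{T} \right\}^{-1} \sum_{i=1}^{n} \sum_{j=1}^{M} w_i w_{ij}^* \{s(\hat\theta ; y_{ij}^*) - \bar s_i^*\} g(y_{ij}^*).$$

For moderate size $M$, the expected value of variance estimator (13) can be written

$$E\{\hat V(\hat\eta_{FI,g})\} = E\left\{ \sum_{i=1}^{n} \sum_{j=1}^{n} \Omega_{ij} \bar e_i \bar e_j \right\} + E\left\{ \sum_{i=1}^{n} \sum_{j=1}^{n} \Omega_{ij} \mathrm{cov}_I(\bar e_i^*, \bar e_j^*) \right\},$$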


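where $\bar e_i = E_I(\bar e_i^*)$ and the subscript $I$ is used to denote the expectation with respect to the imputation mechanism generating $y_{i,\mathrm{mis}}^{*(j)}$ from $h(y_{i,\mathrm{mis}})$. If the imputed values are generated independently, $\mathrm{cov}_I(\bar e_i^*, \bar e_j^*) = 0$ for $i \neq j$ and, using the argument in Remark 1, $\mathrm{var}_I(\bar e_i^*)$ can be estimated by $\hat V_{Ii,e} \equiv \sum_{j=1}^{M} (w_{ij0}^*)^2 (\hat e_{ij}^* - \bar e_i^*)^2$. Thus, an unbiased estimator for $\sigma_{g0,n,M}^2$ is

$$\hat\sigma_{g0,n,M}^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} \Omega_{ij} \bar e_i^* \bar e_j^* - \sum_{i=1}^{n} \Omega_{ii} \hat V_{Ii,e} + \sum_{i=1}^{n} w_i^2 \hat V_{Ii,g},$$

where $\hat V_{Ii,g} = \sum_{j=1}^{M} (w_{ij0}^*)^2 (g_{ij}^* - \bar g_i^*)^2$. The estimator of $\sigma_{g1,n,M}^2$ in (12) can be derived in a similar manner.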

Variance estimation with fractionally imputed data can also be performed using the replication method described in Appendix 2.

4. MAXIMUM LIKELIHOOD ESTIMATION

In this section, we propose a computational method for obtaining the pseudo maximum likelihood estimator in (5). The pseudo maximum likelihood estimator reduces to the usual maximum likelihood estimator if the sampling design is simple random sampling with $w_i = 1/n$. With missing data, the pseudo maximum likelihood estimator of $\theta_0$ can be obtained by

$$\hat\theta = \arg\max_{\theta \in \Omega} \sum_{i=1}^{n} w_i E\{\log f(y_i ; \theta) \mid y_{i,\mathrm{obs}}\}. \tag{14}$$

For $w_i = 1/n$, Dempster et al. (1977) proved that the maximum likelihood estimator in (14) is equal to (5). They proposed using the EM algorithm, computing the solution iteratively by defining $\hat\theta_{(t+1)}$ to be the solution to

$$\hat\theta_{(t+1)} = \arg\max_{\theta \in \Omega} \sum_{i=1}^{n} w_i E\{\log f(y_i ; \theta) \mid y_{i,\mathrm{obs}}, \hat\theta_{(t)}\}, \tag{15}$$

where $\hat\theta_{(t)}$ is the estimate of $\theta$ obtained at the $t$-th iteration. To compute the conditional expectation in (15), Monte Carlo implementation of the EM algorithm of Wei & Tanner (1990) can be used.

In the Monte Carlo EM method, independent draws of $y_{i,\mathrm{mis}}$ are generated from the conditional distribution $f(y_{i,\mathrm{mis}} \mid y_{i,\mathrm{obs}}, \hat\theta_{(t)})$ for each $t$ to approximate the conditional expectation in (15). The Monte Carlo EM method requires heavy computation because the imputed values are re-generated for each iteration $t$. Also, generating imputed values from $f(y_{i,\mathrm{mis}} \mid y_{i,\mathrm{obs}}, \hat\theta_{(t)})$ can be computationally challenging since it often requires an iterative algorithm such as the Metropolis-Hastings algorithm for each EM iteration. To avoid re-generating values from the conditional distribution at each step, we propose the following algorithm for parametric fractional imputation:

[Step 0] Obtain an initial estimator $\hat\theta_{(0)}$ of $\theta$ and set $h(y_{i,\mathrm{mis}}) = f(y_{i,\mathrm{mis}} \mid y_{i,\mathrm{obs}}, \hat\theta_{(0)})$.

[Step 1] Generate $M$ imputed values, $y_{i,\mathrm{mis}}^{*(1)}, \ldots, y_{i,\mathrm{mis}}^{*(M)}$, from $h(y_{i,\mathrm{mis}})$.

[Step 2] With the current estimate of $\theta$, denoted by $\hat\theta_{(t)}$, compute the fractional weights by $w_{ij(t)}^* = w_{ij0}^*(\hat\theta_{(t)})$, where $w_{ij0}^*(\hat\theta)$ is defined in (8).


[Step 3] (Optional) If $w_{ij(t)}^* > C/M$ for some $i = 1, \ldots, n$ and $j = 1, \ldots, M$, then set $h(y_{i,\mathrm{mis}}) = f(y_{i,\mathrm{mis}} \mid y_{i,\mathrm{obs}}, \hat\theta_{(t)})$ and go to Step 1. Increase $M$ if necessary.

[Step 4] Find $\hat\theta_{(t+1)}$ that maximises over $\theta \in \Omega$ the quantity

$$Q^*(\theta \mid \hat\theta_{(t)}) = \sum_{i=1}^{n} \sum_{j=1}^{M} w_i w_{ij(t)}^* \log f(y_{ij}^* ; \theta). \tag{16}$$

[Step 5] Set $t = t + 1$ and go to Step 2. Stop if $\hat\theta_{(t)}$ meets the convergence criterion.

In Step 0, the initial estimator $\hat\theta_{(0)}$ can be the maximum likelihood estimator obtained by using only the respondents. Step 1 and Step 2 correspond to the E-step of the EM algorithm. Step 3 can be used to control the variation of the fractional weights and to avoid extremely large fractional weights. The threshold $C/M$ in Step 3 guarantees that no individual fractional weight exceeds $C$ times the average of the fractional weights. In Step 4, the value of $\theta$ that maximizes $Q^*(\theta \mid \hat\theta_{(t)})$ in (16) can be obtained by solving

$$\sum_{i=1}^{n} \sum_{j=1}^{M} w_i w_{ij(t)}^* s(\theta ; y_{ij}^*) = 0, \tag{17}$$

where $s(\theta ; y)$ is the score function of $\theta$. Thus, the solution can be obtained by applying the complete sample score equation to the fractionally imputed data. Equation (17) can be called the imputed score equation using fractional imputation. Unlike the Monte Carlo EM method, the imputed values are not changed for each iteration, only the fractional weights are changed.
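Steps 0 to 5 can be sketched for a simple regression model. The following is a minimal illustration under assumptions made here, not the author's program: $y \mid x \sim N(\beta_0 + \beta_1 x, \sigma^2)$ with $x$ always observed, equal sampling weights, the optional Step 3 omitted, and all function names hypothetical. The imputed values are generated once, and only the fractional weights are updated in each iteration.

```python
import math
import random


def log_normal_pdf(y, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def weighted_fit(xs, ys, ws):
    """Weighted ML fit of y | x ~ N(b0 + b1 x, s2), i.e. the solution of (17)."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    s2 = sum(w * (y - b0 - b1 * x) ** 2 for w, x, y in zip(ws, xs, ys)) / sw
    return (b0, b1, s2)


def pfi_em(x_resp, y_resp, x_miss, M=50, tol=1e-9, max_iter=500, seed=0):
    rng = random.Random(seed)
    # Step 0: initial estimate from the respondents only.
    theta0 = weighted_fit(x_resp, y_resp, [1.0] * len(x_resp))
    # Step 1: generate M imputed values per nonrespondent from h, once.
    sd0 = math.sqrt(theta0[2])
    draws = [[rng.gauss(theta0[0] + theta0[1] * x, sd0) for _ in range(M)] for x in x_miss]
    log_h = [
        [log_normal_pdf(y, theta0[0] + theta0[1] * x, theta0[2]) for y in row]
        for x, row in zip(x_miss, draws)
    ]
    theta = theta0
    weights = []
    for _ in range(max_iter):
        # Step 2: fractional weights proportional to f(y*; theta_t) / h(y*).
        weights = []
        for x, row, lh in zip(x_miss, draws, log_h):
            logr = [log_normal_pdf(y, theta[0] + theta[1] * x, theta[2]) - l
                    for y, l in zip(row, lh)]
            top = max(logr)
            ratio = [math.exp(v - top) for v in logr]
            total = sum(ratio)
            weights.append([v / total for v in ratio])
        # Step 4: maximise Q* by a weighted fit on the fractionally imputed data.
        xs, ys, ws = list(x_resp), list(y_resp), [1.0] * len(x_resp)
        for x, row, wrow in zip(x_miss, draws, weights):
            xs.extend([x] * M)
            ys.extend(row)
            ws.extend(wrow)
        new_theta = weighted_fit(xs, ys, ws)
        # Step 5: stop when the parameter change is below the tolerance.
        change = max(abs(a - b) for a, b in zip(new_theta, theta))
        theta = new_theta
        if change < tol:
            break
    return theta0, theta, weights


data_rng = random.Random(1)
x_all = [data_rng.gauss(2.0, 1.0) for _ in range(200)]
y_all = [1.0 + 0.7 * x + data_rng.gauss(0.0, 1.0) for x in x_all]
observed = [data_rng.random() < 0.7 for _ in x_all]
x_resp = [x for x, d in zip(x_all, observed) if d]
y_resp = [y for y, d in zip(y_all, observed) if d]
x_miss = [x for x, d in zip(x_all, observed) if not d]
theta0, theta_hat, weights = pfi_em(x_resp, y_resp, x_miss)
```

In this toy setting the respondent-only fit is already the maximum likelihood estimator, so the final estimate should stay close to the initial one; the sketch is meant only to show the flow of the algorithm. The weights are normalised in the log scale to avoid numerical underflow, which is a design choice of this sketch.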

REMARK 2. In Step 2, fractional weights can be computed by using the joint density with the current parameter estimate $\hat\theta_{(t)}$. Note that $w_{ij0}^*(\theta)$ in (8) can be written

$$w_{ij0}^*(\theta) = \frac{f(y_{i,\mathrm{mis}}^{*(j)} \mid y_{i,\mathrm{obs}}, \theta) / h(y_{i,\mathrm{mis}}^{*(j)})}{\sum_{k=1}^{M} f(y_{i,\mathrm{mis}}^{*(k)} \mid y_{i,\mathrm{obs}}, \theta) / h(y_{i,\mathrm{mis}}^{*(k)})} = \frac{f(y_{ij}^* ; \theta) / h(y_{i,\mathrm{mis}}^{*(j)})}{\sum_{k=1}^{M} f(y_{ik}^* ; \theta) / h(y_{i,\mathrm{mis}}^{*(k)})}, \tag{18}$$

which does not require the marginal density in computing the conditional distribution. Only the joint density is needed.

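Given the $M$ imputed values, $y_{i,\mathrm{mis}}^{*(1)}, \ldots, y_{i,\mathrm{mis}}^{*(M)}$, generated from $h(y_{i,\mathrm{mis}})$, the sequence of estimators $\{\hat\theta_{(0)}, \hat\theta_{(1)}, \ldots\}$ can be constructed using importance sampling. The following theorem presents some convergence properties of the sequence of the estimators.

THEOREM 2. Let $Q^*(\theta \mid \hat\theta_{(t)})$ be the weighted log likelihood function (16) based on fractional imputation. If

$$Q^*(\hat\theta_{(t+1)} \mid \hat\theta_{(t)}) \geq Q^*(\hat\theta_{(t)} \mid \hat\theta_{(t)}) \tag{19}$$

then

$$l_{\mathrm{obs}}^*(\hat\theta_{(t+1)}) \geq l_{\mathrm{obs}}^*(\hat\theta_{(t)}), \tag{20}$$

where $l_{\mathrm{obs}}^*(\theta) = \sum_{i=1}^{n} w_i \log\{f_{\mathrm{obs}(i)}^*(y_{i,\mathrm{obs}} ; \theta)\}$ and

$$f_{\mathrm{obs}(i)}^*(y_{i,\mathrm{obs}} ; \theta) = \frac{\sum_{j=1}^{M} f(y_{ij}^* ; \theta) / h(y_{i,\mathrm{mis}}^{*(j)})}{\sum_{j=1}^{M} 1 / h(y_{i,\mathrm{mis}}^{*(j)})}.$$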


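Proof. By (18) and using Jensen's inequality,

$$l_{\mathrm{obs}}^*(\hat\theta_{(t+1)}) - l_{\mathrm{obs}}^*(\hat\theta_{(t)}) = \sum_{i=1}^{n} w_i \log\left\{ \sum_{j=1}^{M} w_{ij(t)}^* \frac{f(y_{ij}^* ; \hat\theta_{(t+1)})}{f(y_{ij}^* ; \hat\theta_{(t)})} \right\} \geq \sum_{i=1}^{n} \sum_{j=1}^{M} w_i w_{ij(t)}^* \log\left\{ \frac{f(y_{ij}^* ; \hat\theta_{(t+1)})}{f(y_{ij}^* ; \hat\theta_{(t)})} \right\} = Q^*(\hat\theta_{(t+1)} \mid \hat\theta_{(t)}) - Q^*(\hat\theta_{(t)} \mid \hat\theta_{(t)}).$$

Therefore, (19) implies (20).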

Note that $l_{\mathrm{obs}}^*(\theta)$ is an imputed version of the observed pseudo log-likelihood based on the $M$ imputed values, $y_{i,\mathrm{mis}}^{*(1)}, \ldots, y_{i,\mathrm{mis}}^{*(M)}$. Thus, by Theorem 2, the sequence $l_{\mathrm{obs}}^*(\hat\theta_{(t)})$ is monotonically increasing and, under the conditions stated in Wu (1983), the convergence of $\hat\theta_{(t)}$ to a stationary point follows for fixed $M$. Theorem 2 does not hold for the sequence obtained from the Monte Carlo EM method for fixed $M$, because the imputed values are re-generated for each E-step of the Monte Carlo EM method, and convergence is very hard to check for the Monte Carlo EM (Booth & Hobert, 1999).
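The monotonicity in Theorem 2 can be checked numerically on a toy model. The sketch below assumes $y \sim N(\theta, 1)$ with some units missing entirely, equal sampling weights, and a deliberately poor starting value so that the iteration has room to move; these choices and the function names are illustrative assumptions. It records the imputed log-likelihood $l_{\mathrm{obs}}^*$ along the iterations, with the imputed values held fixed.

```python
import math
import random


def normal_pdf(y, mean):
    return math.exp(-0.5 * (y - mean) ** 2) / math.sqrt(2.0 * math.pi)


def imputed_loglik(theta, y_resp, draws, h_vals):
    """l*_obs(theta): exact terms for respondents, importance-sampling terms otherwise."""
    total = sum(math.log(normal_pdf(y, theta)) for y in y_resp)
    for row, hrow in zip(draws, h_vals):
        num = sum(normal_pdf(y, theta) / h for y, h in zip(row, hrow))
        den = sum(1.0 / h for h in hrow)
        total += math.log(num / den)
    return total


def em_step(theta, y_resp, draws, h_vals):
    """One pass of Step 2 and Step 4: reweight the fixed draws, then take the weighted mean."""
    total = sum(y_resp)
    for row, hrow in zip(draws, h_vals):
        ratio = [normal_pdf(y, theta) / h for y, h in zip(row, hrow)]
        s = sum(ratio)
        total += sum(r / s * y for r, y in zip(ratio, row))
    return total / (len(y_resp) + len(draws))


rng = random.Random(3)
y_resp = [rng.gauss(0.0, 1.0) for _ in range(30)]
theta_start = 1.5  # deliberately poor initial value (assumption for illustration)
draws = [[rng.gauss(theta_start, 1.0) for _ in range(20)] for _ in range(15)]
h_vals = [[normal_pdf(y, theta_start) for y in row] for row in draws]

theta = theta_start
path = [imputed_loglik(theta, y_resp, draws, h_vals)]
for _ in range(25):
    theta = em_step(theta, y_resp, draws, h_vals)
    path.append(imputed_loglik(theta, y_resp, draws, h_vals))
```

Because each step maximises $Q^*$ exactly, the recorded values in `path` should be nondecreasing up to floating point rounding. Re-generating `draws` inside the loop, as in Monte Carlo EM, would remove this guarantee.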

REMARK 3. Sung & Geyer (2007) considered a Monte Carlo maximum likelihood method that directly maximizes $l_{\mathrm{obs}}^*(\theta)$. Computing the value of $\theta$ that maximizes $Q^*(\theta \mid \hat\theta_{(t)})$ is easier than computing the value of $\theta$ that maximizes $l_{\mathrm{obs}}^*(\theta)$.

5. SIMULATION STUDY

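In a simulation study, $B = 2{,}000$ Monte Carlo samples of size $n = 200$ were independently generated from an infinite population with $x_i \sim N(2, 1)$, $y_{1i} \mid x_i \sim N(\beta_0 + \beta_1 x_i, \sigma_{ee})$, where $(\beta_0, \beta_1, \sigma_{ee}) = (1, 0{\cdot}7, 1)$, $y_{2i} \mid (x_i, y_{1i}) \sim \mathrm{Ber}(p_i)$, $\log\{p_i/(1-p_i)\} = \phi_0 + \phi_1 x_i + \phi_2 y_{1i}$, $(\phi_0, \phi_1, \phi_2) = (-3, 0{\cdot}5, 0{\cdot}7)$, $\delta_{i1} \mid (x_i, y_i, z_i) \sim \mathrm{Ber}(\pi_i)$, $\log\{\pi_i/(1-\pi_i)\} = 0{\cdot}5 x_i$, and $\delta_{i2} \mid (x_i, y_i, z_i, \delta_{i1}) \sim \mathrm{Ber}(0{\cdot}7)$. The variables $x_i$, $\delta_{i1}$, and $\delta_{i2}$ are always observed. Variable $y_{1i}$ is observed if $\delta_{i1} = 1$ and is not observed if $\delta_{i1} = 0$. Variable $y_{2i}$ is observed if $\delta_{i2} = 1$ and is not observed if $\delta_{i2} = 0$. The overall response rate for $y_1$ is about 72%.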
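The data-generating mechanism described above can be sketched as follows. This is an illustrative generator written for this text, not the author's simulation program; the function names, the dictionary layout, and the use of `None` for missing items are assumptions. A large sample is drawn only to check the stated response rate for $y_1$.

```python
import math
import random


def expit(v):
    return 1.0 / (1.0 + math.exp(-v))


def simulate(n, rng):
    """Generate one sample from the population described in the simulation study."""
    rows = []
    for _ in range(n):
        x = rng.gauss(2.0, 1.0)
        y1 = 1.0 + 0.7 * x + rng.gauss(0.0, 1.0)  # sigma_ee = 1 is the error variance
        y2 = 1 if rng.random() < expit(-3.0 + 0.5 * x + 0.7 * y1) else 0
        d1 = 1 if rng.random() < expit(0.5 * x) else 0  # response indicator for y1
        d2 = 1 if rng.random() < 0.7 else 0  # response indicator for y2
        rows.append({
            "x": x,
            "y1": y1 if d1 else None,
            "y2": y2 if d2 else None,
            "d1": d1,
            "d2": d2,
        })
    return rows


sample = simulate(20000, random.Random(2011))
rate_y1 = sum(r["d1"] for r in sample) / len(sample)
rate_y2 = sum(r["d2"] for r in sample) / len(sample)
```

With this many draws, `rate_y1` should fall close to the 72% quoted in the text and `rate_y2` close to 70%.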

We are interested in estimating four parameters: the marginal mean of $y_1$, $\eta_1 = E(y_1)$; the marginal mean of $y_2$, $\eta_2 = E(y_2)$; the slope for the regression of $y_1$ on $x$, $\eta_3 = \beta_1$; and the proportion of $y_1$ less than 3, $\eta_4 = \mathrm{pr}(y_1 < 3)$. Under complete response, $\eta_1$, $\eta_2$, and $\eta_3$ are computed by the maximum likelihood method and the proportion $\eta_4$ is estimated by

$$\hat\eta_{4,n} = \frac{1}{n} \sum_{i=1}^{n} I(y_{1i} < 3). \tag{21}$$

Under nonresponse, four imputed estimators were computed: the parametric fractional imputation estimator using $w_{ij0}^*$ in (8) with $M = 100$; the calibration fractional imputation estimator using the regression weighting method in (10) with $M = 10$; and two multiple imputation estimators with $M = 100$ and $M = 10$, respectively. In fractional imputation, $M$ imputed values of $y_{1i}$ were independently generated by $y_{1ij}^* \sim N(\hat\beta_{0(0)} + \hat\beta_{1(0)} x_i, \hat\sigma_{ee(0)})$, where $(\hat\beta_{0(0)}, \hat\beta_{1(0)}, \hat\sigma_{ee(0)})$ is the initial regression parameter estimator computed from the respondents of $y_1$. Also, $M$

Table 1. Monte Carlo standardised variances of the imputed estimators

Imputation method    η1     η2     η3     η4
FI (M = 100)         129    137    150    110
MI (M = 100)         129    136    150    110
CFI (M = 10)         129    137    150    110
MI (M = 10)          132    138    156    111

FI, fractional imputation; CFI, calibration fractional imputation; MI, multiple imputation.

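$\log\{\hat p_{ij(0)}/(1 - \hat p_{ij(0)})\} = \hat\phi_{0(0)} + \hat\phi_{1(0)} x_i + \hat\phi_{2(0)} y_{1ij}^*$ and $(\hat\phi_{0(0)}, \hat\phi_{1(0)}, \hat\phi_{2(0)})$ is the initial coefficient for the logistic regression of $y_{2i}$ on $(1, x_i, y_{1ij}^*)$ obtained by solving the imputed score equation for $(\phi_0, \phi_1, \phi_2)$ using the respondents for $y_2$ only. For each imputed value, we assign the fractional weight

$$w_{ij(t)}^* \propto \frac{f_1(y_{1i}^{*(j)} \mid x_i, \hat\theta_{1(t)}) \, f_2(y_{2i}^{*(j)} \mid x_i, y_{1i}^{*(j)}, \hat\theta_{2(t)})}{f_1(y_{1i}^{*(j)} \mid x_i, \hat\theta_{1(0)}) \, f_2(y_{2i}^{*(j)} \mid x_i, y_{1i}^{*(j)}, \hat\theta_{2(0)})}, \tag{22}$$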

where $f_1(y_1 \mid x, \theta_1)$ denotes the conditional distribution of $y_1$ given $x$ evaluated at $\theta_1 = (\beta_0, \beta_1, \sigma_{ee})$ and

$$f_2(y_2 \mid x, y_1, \theta_2) = \begin{cases} \mathrm{pr}(y_2 = 1 \mid x, y_1, \theta_2) & \text{if } y_2 = 1 \\ \mathrm{pr}(y_2 = 0 \mid x, y_1, \theta_2) & \text{if } y_2 = 0, \end{cases}$$

with $\theta_2 = (\phi_0, \phi_1, \phi_2)$. In (22), the parameter estimates $\hat\theta_{1(t)}$ and $\hat\theta_{2(t)}$ were obtained by the maximum likelihood method using the fractionally imputed data with fractional weight $w_{ij(t-1)}^*$. In Step 3 of the fractional imputation for maximum likelihood in §4, $C = 5$ was used. In the calibration fractional imputation method, $M = 10$ values were randomly selected from $M_1 = 100$ initial fractionally imputed values by systematic sampling with selection probability proportional to $w_{ij0}^*$ in (8). The regression fractional weights were then computed by (10). In Step 5, the convergence criterion was $\|\hat\theta_{(t+1)} - \hat\theta_{(t)}\| < 10^{-9}$. In multiple imputation, the imputed values are generated from the posterior predictive distribution iteratively using Gibbs sampling with 100 iterations.

All the point estimators are nearly unbiased and are not listed here. The standardised variances of the four imputed estimators are presented in Table 1. The standardised variance in Table 1 was computed by dividing the variance of each estimator by that of the complete sample estimator. The simulation results in Table 1 show that the fractional imputed estimator and the multiple imputation estimator have similar properties for $M = 100$. The calibration fractional imputation estimator is more efficient than the multiple imputation estimator for $M = 10$ because it uses extra information in the imputed score functions.

In addition to point estimators, variance estimators were also computed for each Monte Carlo sample. We used the linearised variance estimator (13) for fractional imputation. For multiple imputation, we used the variance formula of Rubin (1987). Table 2 presents the Monte Carlo relative biases for the variance estimators. The simulation error for the relative bias of the variance estimators reported in Table 2 is less than 1%. Table 2 shows that the proposed linearisation method provides good estimates for the variance of the fractional imputation estimators. The multiple imputation variance estimators are essentially unbiased for $\eta_1$, $\eta_2$, and $\eta_3$ which appear in the imputation model. For variance estimation of the proportion, the multiple imputation variance estimator shows significant bias (12·7% for $M = 10$ and 12·2% for $M = 100$). The multiple imputation method in this simulation is congenial for the estimators of $\eta_1$, $\eta_2$ and $\eta_3$, but it is not congenial for the estimator (21) of $\eta_4$. See Meng (1994) and Appendix 3.

Table 2. Relative biases of the variance estimators (%)

Imputation method    var(η̂1)   var(η̂2)   var(η̂3)   var(η̂4)
FI (M = 100)           1·0       −1·1       −2·6       −3·2
MI (M = 100)           1·3       −0·6       −1·4       12·2
CFI (M = 10)           0·9       −2·1       −2·9       −1·6
MI (M = 10)            0·4        0·1       −2·3       12·7

FI, fractional imputation; CFI, calibration fractional imputation; MI, multiple imputation.

6. CONCLUDING REMARKS

Parametric fractional imputation is proposed as a method of creating a complete data set with fractionally imputed data. Parameter estimation with fractionally imputed data can be implemented using existing software treating the imputed values as observed. The data provider, who has good information for model development, can use an imputation model to construct the fractionally imputed data with replicated fractional weights for variance estimation. No information beyond the data set is required for analysis.

If parametric fractional imputation is used to construct the score function, the solution to the imputed score equation is very close to the maximum likelihood estimator for the parameters in the model. Parametric fractional imputation yields consistent estimates for parameters that are not part of the imputation model. For example, in the simulation study, parametric fractional imputation computed from a normal model provides direct estimates for the cumulative distribution function. Thus, the proposed imputation method is useful when the parameters of interest are unknown at the time of imputation. Variance estimation can be performed using a linearisation method or a replication method. Variance estimation for parametric fractional imputation, unlike multiple imputation, does not require the congeniality condition of Meng (1994).

The proposed fractional imputation is applicable when the response mechanism is nonignorable and the response mechanism is specified. Also, parametric fractional imputation can be used with data from a large scale survey sample obtained by a complex sampling design. These topics are beyond the scope of this paper and will be presented elsewhere. Some computational issues such as the convergence criteria for the EM algorithm using fractional imputation are also topics for future research.

ACKNOWLEDGEMENT

The research was partially supported by a Cooperative Agreement between the US Department of Agriculture Natural Resources Conservation Service and Iowa State University. The author wishes to thank Professor Wayne Fuller, three anonymous referees, and the associate editor for their very helpful comments.

SUPPLEMENTARY MATERIAL

More details of the simulation setup, including the program codes, are available at http://jkim.public.iastate.edu/fi.html.

APPENDIX 1

Assumptions and proof for Theorem 1

We consider a regular parametric family $\{f(y ; \theta) ; \theta \in \Omega\}$, where $\Omega$ is in a finite dimensional Euclidean space. Assume that the true parameter $\theta_0$ lies in the interior of $\Omega$. Define $\bar S(\theta) = \sum_{i=1}^{n} w_i E\{s(\theta ; y_i) \mid y_{i,\mathrm{obs}}, \theta\}$ and $\bar\eta_g(\theta) = \sum_{i=1}^{n} w_i E\{g(y_i) \mid y_{i,\mathrm{obs}}, \theta\}$. We assume the following conditions:

(C1) The solution $\hat\theta$ in (5) is unique and satisfies $n^{1/2}(\hat\theta - \theta_0) = O_p(1)$.

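(C2) The partial derivatives of $\bar S(\theta)$ and $\bar\eta_g(\theta)$ exist and are continuous around $\theta_0$ almost everywhere.

(C3) The partial derivative of $\bar S(\theta)$ satisfies

$$\|\partial \bar S(\theta)/\partial\theta - E\{\partial \bar S(\theta)/\partial\theta\}\| \to 0$$

in probability, uniformly in $\theta$, and $E\{\partial \bar S(\theta)/\partial\theta\}$ is continuous and nonsingular at $\theta_0$. Also, the partial derivative of $\bar\eta_g(\theta)$ satisfies

$$\|\partial \bar\eta_g(\theta)/\partial\theta - E\{\partial \bar\eta_g(\theta)/\partial\theta\}\| \to 0$$

in probability, uniformly in $\theta$, and $E\{\partial \bar\eta_g(\theta)/\partial\theta\}$ is continuous at $\theta_0$.

(C4) There exists a positive $d$ such that $E\{g(Y)^{2+d}\} < \infty$ and $E\{S_j(\theta_0)^{2+d}\} < \infty$, where $S_j(\theta) = \partial \log f(y ; \theta)/\partial\theta_j$ for $j = 1, \ldots, p$ and $\theta_j$ is the $j$-th element of $\theta$.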

Condition (C1) is a standard condition and will be satisfied in most cases. Conditions (C2) and (C3) provide some conditions about the partial derivatives of the estimator computed from the conditional expectation. Note that $E\{\partial \bar S(\theta)/\partial\theta\} = -I_{\mathrm{obs}}(\theta)$ and $E\{\partial \bar\eta_g(\theta)/\partial\theta\} = I_{g,\mathrm{mis}}(\theta)$, which are defined in Theorem 1. Condition (C4) gives the moment conditions for the central limit theorem.

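Proof of Theorem 1. Define a class of estimators

$$\tilde\eta_{g0,n,K}(\theta) = \sum_{i=1}^{n} \sum_{j=1}^{M} w_i w_{ij0}^*(\theta) g(y_{ij}^*) + K^{T} \sum_{i=1}^{n} w_i E\{s(\theta ; y_i) \mid y_{i,\mathrm{obs}}, \theta\}$$

indexed by $K$. Note that, by (5), we have $\sum_{i=1}^{n} w_i E\{s(\hat\theta ; y_i) \mid y_{i,\mathrm{obs}}, \hat\theta\} = 0$ and $\tilde\eta_{g0,n,K}(\hat\theta) = \hat\eta_{g0,n,M}$ for any $K$. According to Theorem 2.13 of Randles (1982), we have

$$\tilde\eta_{g0,n,K}(\hat\theta) - \tilde\eta_{g0,n,K}(\theta_0) = o_p(n^{-1/2}),$$

if

$$E\left\{ \frac{\partial}{\partial\theta} \tilde\eta_{g0,n,K}(\theta_0) \right\} = 0 \tag{A.1}$$

is satisfied. Using

$$\sum_{i=1}^{n} \sum_{j=1}^{M} w_i \left\{ \frac{\partial}{\partial\theta} w_{ij0}^*(\theta) \right\} g(y_{ij}^*) = \sum_{i=1}^{n} \sum_{j=1}^{M} w_i w_{ij0}^*(\theta) \{s(\theta ; y_{ij}^*) - \bar s_i^*(\theta)\} g(y_{ij}^*),$$

the choice of $K = \{I_{\mathrm{obs}}(\theta_0)\}^{-1}$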

To show (12), consider

$$\tilde\eta_{g1,n,K}(\theta) = \sum_{i=1}^{n} \sum_{j=1}^{M} w_i w_{ij}^*(\theta) g(y_{ij}^*) + K^{T} \sum_{i=1}^{n} w_i E\{s(\theta ; y_i) \mid y_{i,\mathrm{obs}}, \theta\}.$$

Using (10),

$$\sum_{i=1}^{n} \sum_{j=1}^{M} w_i w_{ij}^*(\theta) g(y_{ij}^*) = \sum_{i=1}^{n} \sum_{j=1}^{M} w_i w_{ij0}^*(\theta) g(y_{ij}^*) + \left[ \sum_{i=1}^{n} w_i E\{s(\theta ; y_i) \mid y_{i,\mathrm{obs}}, \theta\} - \sum_{i=1}^{n} \sum_{j=1}^{M} w_i w_{ij0}^*(\theta) s(\theta ; y_{ij}^*) \right]^{T} \hat B_g(\theta),$$

where

$$\hat B_g(\theta) = \left[ \sum_{i=1}^{n} \sum_{j=1}^{M} w_i w_{ij0}^*(\theta) \{s_{ij}^*(\theta) - \bar s_i^*(\theta)\}^{\otimes 2} \right]^{-1} \sum_{i=1}^{n} \sum_{j=1}^{M} w_i w_{ij0}^*(\theta) \{s_{ij}^*(\theta) - \bar s_i^*(\theta)\} g(y_{ij}^*).$$

After some algebra, it can be shown that the choice of $K = K_1$ in Theorem 1 also satisfies $E\{\partial \tilde\eta_{g1,n,K}(\theta_0)/\partial\theta\} = 0$ and, by Randles (1982) again,

$$\tilde\eta_{g1,n,K}(\hat\theta) - \tilde\eta_{g1,n,K}(\theta_0) = o_p(n^{-1/2}).$$

APPENDIX 2

Replication variance estimation

Under complete response, letw_i[k]be thek-th replication weight for uniti. Assume that the replication variance estimator ˆ Vn = L ∑ k=1 ck ( ˆ η_g[k]−ηˆg )2 ,

whereckis the factor associate with replicationk,Lis the number of replication,ηˆg=

∑n i=1wig(yi)and ˆ η[gk] = ∑n i=1w [k]

i g(yi), is consistent for the variance ofηˆg. For replication with the calibration method of

(9), we consider the following steps for creating replicated fractional weights.

[Step 1] Compute $\hat\theta^{[k]}$, the $k$-th replicate of $\hat\theta$, using fractional weights.

[Step 2] Using the $\hat\theta^{[k]}$ computed from Step 1, compute the replicated fractional weights by

$$\sum_{i=1}^{n}\sum_{j=1}^{M} w_i^{[k]} w_{ij}^{*[k]}\, s(\hat\theta^{[k]}; y^*_{ij}) = 0, \tag{A.2}$$

using the regression weighting technique.
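As a rough illustration of Step 2, the following sketch regression-adjusts a set of initial fractional weights so that the calibration sum in (A.2) is zero. It is a minimal sketch for a scalar score only. The form of the adjustment is inferred from the expression following "Using (10)" in Appendix 1, since (10) itself is not reproduced here, and the function name and toy numbers are assumptions made for illustration.

```python
def calibrate_fractional_weights(w, w0, s):
    """Regression-adjust initial fractional weights (scalar score case).

    w  : unit weights w_i (the replicate weights w_i^[k] in Step 2)
    w0 : w0[i][j], initial fractional weights, each row summing to 1
    s  : s[i][j], score values s(theta; y*_ij) at the current theta
    Returns w_star[i][j] with sum_i sum_j w_i * w_star[i][j] * s[i][j] = 0.
    """
    # Within-unit weighted mean score, s_bar_i.
    s_bar = [sum(a * b for a, b in zip(w0_i, s_i)) for w0_i, s_i in zip(w0, s)]
    # Value of the calibration sum under the initial weights.
    total = sum(wi * a * b
                for wi, w0_i, s_i in zip(w, w0, s)
                for a, b in zip(w0_i, s_i))
    # Weighted within-unit spread of the scores.
    spread = sum(wi * a * (b - sb) ** 2
                 for wi, w0_i, s_i, sb in zip(w, w0, s, s_bar)
                 for a, b in zip(w0_i, s_i))
    if spread == 0.0:
        raise ValueError("scores do not vary within any unit; cannot calibrate")
    slope = -total / spread
    # w*_ij = w0_ij * {1 + slope * (s_ij - s_bar_i)}; rows still sum to 1.
    return [[a * (1.0 + slope * (b - sb)) for a, b in zip(w0_i, s_i)]
            for w0_i, s_i, sb in zip(w0, s, s_bar)]


# Toy data (hypothetical): unit 0 is observed, units 1 and 2 have two imputed values.
w = [0.5, 0.3, 0.2]
w0 = [[1.0], [0.5, 0.5], [0.25, 0.75]]
s = [[0.4], [-1.0, 1.0], [0.5, -0.3]]
w_star = calibrate_fractional_weights(w, w0, s)
calibration_sum = sum(wi * a * b
                      for wi, ws_i, s_i in zip(w, w_star, s)
                      for a, b in zip(ws_i, s_i))
```

Weights adjusted this way are not guaranteed to stay non-negative; the sketch does not guard against that.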

Equation (A.2) is the calibration equation for the replicated fractional weights. For any estimator of the form (7), the replication variance estimator is constructed as

$$\hat V(\hat\eta_{FI,g}) = \sum_{k=1}^{L} c_k \left(\hat\eta_{FI,g}^{[k]} - \hat\eta_{FI,g}\right)^2,$$

where $\hat\eta_{FI,g}^{[k]} = \sum_{i=1}^{n}\sum_{j=1}^{M} w_i^{[k]} w_{ij}^{*[k]} g(y^*_{ij})$ and $w_{ij}^{*[k]}$ is computed from (A.2).
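The replication formula has the same shape with or without imputation, so the complete-response case is the easiest one to check numerically. The sketch below assumes equal weights $w_i = 1/n$ and delete-one jackknife replicates with $c_k = (n-1)/n$; these choices and the toy data are assumptions for illustration, not the replication scheme of any particular survey.

```python
import statistics


def replication_variance(y, w, rep_weights, c, g=lambda v: v):
    """V_hat = sum_k c_k * (eta_hat^[k] - eta_hat)^2, eta_hat = sum_i w_i g(y_i)."""
    eta_hat = sum(wi * g(yi) for wi, yi in zip(w, y))
    v_hat = 0.0
    for ck, wk in zip(c, rep_weights):
        eta_k = sum(wik * g(yi) for wik, yi in zip(wk, y))
        v_hat += ck * (eta_k - eta_hat) ** 2
    return v_hat


def jackknife_weights(n):
    """Delete-one jackknife for equal weights 1/n: L = n and c_k = (n - 1) / n."""
    reps = [[0.0 if i == k else 1.0 / (n - 1) for i in range(n)]
            for k in range(n)]
    return reps, [(n - 1) / n] * n


y = [2.1, 3.4, 1.9, 4.2, 2.8, 3.0]  # hypothetical complete-response data
n = len(y)
rep_weights, c = jackknife_weights(n)
v_hat = replication_variance(y, [1.0 / n] * n, rep_weights, c)
textbook = statistics.variance(y) / n  # s^2 / n for the sample mean
```

For the sample mean, the jackknife value agrees with $s^2/n$ up to rounding, which gives a quick sanity check before the fractional weights are brought in.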

In general, Step 1 can be computationally problematic since $\hat\theta^{[k]}$ is often computed from the iterative


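of $0 = \bar S^{[k]}(\hat\theta^{[k]})$ around $\hat\theta$. The one-step approximation is

$$\hat\theta^{[k]} \cong \hat\theta + \left\{\hat I_{\mathrm{obs}}^{[k]}(\hat\theta)\right\}^{-1} \bar S^{[k]}(\hat\theta),$$

where

$$\hat I_{\mathrm{obs}}^{[k]}(\theta) = \sum_{i=1}^{n}\sum_{j=1}^{M} w_i^{[k]} w^*_{ij}\left\{-\frac{\partial}{\partial\theta} s(\theta; y^*_{ij})\right\} - \sum_{i=1}^{n}\sum_{j=1}^{M} w_i^{[k]} w^*_{ij}\{s(\theta; y^*_{ij}) - \bar s^*_i(\theta)\}^{\otimes 2}$$

and $\bar S^{[k]}(\theta) = \sum_{i=1}^{n} w_i^{[k]} \bar s^*_i(\theta)$. The $\hat I_{\mathrm{obs}}^{[k]}(\theta)$ is a replicated version of the observed information matrix proposed by Louis (1982).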
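The one-step update can be sketched for the simplest possible model. The example below assumes a scalar parameter with $y \sim N(\theta, 1)$, so that $s(\theta; y) = y - \theta$ and $-\partial s/\partial\theta = 1$, and it holds the fractional weights fixed. The model, the function name and the data are all assumptions chosen to keep the arithmetic transparent.

```python
def one_step_replicate(theta_hat, rep_w, w_star, y_star):
    """One-step approximation to theta_hat^[k] for a N(theta, 1) mean.

    rep_w[i] is the replicate weight w_i^[k]; w_star[i][j] and y_star[i][j]
    are the fractional weights and imputed values for unit i. An observed
    unit is represented by a single value with fractional weight 1.
    """
    s_bar_k = 0.0  # S_bar^[k](theta_hat)
    info_k = 0.0   # I_hat_obs^[k](theta_hat)
    for wk, ws_i, ys_i in zip(rep_w, w_star, y_star):
        scores = [yv - theta_hat for yv in ys_i]
        s_bar_i = sum(a * sc for a, sc in zip(ws_i, scores))
        s_bar_k += wk * s_bar_i
        # Complete-data information (equal to 1 here) minus the
        # within-unit spread of the scores, as in the Louis (1982) form.
        info_k += wk * sum(a * (1.0 - (sc - s_bar_i) ** 2)
                           for a, sc in zip(ws_i, scores))
    return theta_hat + s_bar_k / info_k


# Fully observed toy data: the score is linear in theta, so one step is exact.
y_star = [[1.0], [2.0], [4.0]]
w_star = [[1.0], [1.0], [1.0]]
theta_hat = (1.0 + 2.0 + 4.0) / 3.0
rep_w = [0.0, 0.5, 0.5]  # delete-one replicate that drops unit 0
theta_k = one_step_replicate(theta_hat, rep_w, w_star, y_star)
```

With complete data the update returns the mean of the retained units. With imputed values the second term shrinks the information, and the result is only an approximation to the fully iterated replicate.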

APPENDIX 3

A note on multiple imputation for a proportion

Assume that we have a random sample of size $n$ with observations $(x_i, y_i)$ obtained from a bivariate normal distribution. The parameter of interest is a proportion, for example, $\eta = \mathrm{pr}(y \le 3)$. An unbiased estimator of $\eta$ is

$$\hat\eta_n = \frac{1}{n}\sum_{i=1}^{n} I(y_i \le 3). \tag{A.3}$$

Note that $\hat\eta_n$ is unbiased but has larger variance than the maximum likelihood estimator

$$\int_{-\infty}^{3} \phi\left(\frac{y - \hat\mu_y}{\hat\sigma_{yy}}\right) dy, \tag{A.4}$$

where $\phi(y)$ is the density of the standard normal distribution and $(\hat\mu_y, \hat\sigma_{yy})$ is the maximum likelihood estimator of $(\mu_y, \sigma_{yy})$.
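The variance comparison between (A.3) and (A.4) can be checked with a small Monte Carlo experiment. The sketch below reads (A.4) as the plug-in normal probability $\Phi\{(3 - \hat\mu_y)/\hat\sigma_y\}$ with $\hat\sigma_y$ the maximum likelihood standard deviation, which is an interpretation on our part. The true values $\mu_y = 2$, $\sigma_y = 1$, the sample size and the seed are arbitrary assumptions.

```python
import random
from statistics import NormalDist, fmean, pvariance


def simulate(n=50, reps=2000, mu=2.0, sigma=1.0, cut=3.0, seed=1):
    """Monte Carlo variances of the empirical and plug-in estimators of pr(y <= cut)."""
    rng = random.Random(seed)
    emp, mle = [], []
    for _ in range(reps):
        y = [rng.gauss(mu, sigma) for _ in range(n)]
        # Empirical proportion, as in (A.3).
        emp.append(sum(v <= cut for v in y) / n)
        # Normal-model plug-in; the ML variance estimate divides by n.
        mu_hat = fmean(y)
        sd_hat = pvariance(y, mu_hat) ** 0.5
        mle.append(NormalDist(mu_hat, sd_hat).cdf(cut))
    return pvariance(emp), pvariance(mle)


var_emp, var_mle = simulate()
```

Under these settings the plug-in estimator shows the smaller Monte Carlo variance, in line with the efficiency of the maximum likelihood estimator when the normal model holds.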

For simplicity, assume that the first $r$ $(< n)$ elements have both $x_i$ and $y_i$ responding, but the last $n - r$ elements have $x_i$ observed and $y_i$ missing. In this situation, an efficient imputation method such as

$$y_i^* \sim N\left(\hat\beta_0 + x_i\hat\beta_1, \hat\sigma_e^2\right) \tag{A.5}$$

can be used, where $\hat\beta_0$, $\hat\beta_1$ and $\hat\sigma_e^2$ can be computed from the respondents. In multiple imputation, the parameter estimates are generated from a posterior distribution given the observations. Under the imputation mechanism (A.5), the imputed estimator of $\mu_2$ of the form $\hat\mu_{2,I} = n^{-1}\left(\sum_{i=1}^{r} y_i + \sum_{i=r+1}^{n} y_i^*\right)$ satisfies

$$\mathrm{var}(\hat\mu_{2,FE}) = \mathrm{var}(\bar y_n) + \mathrm{var}(\hat\mu_{2,FE} - \bar y_n), \tag{A.6}$$

where $\hat\mu_{2,FE} = E_I(\hat\mu_{2,I})$. Condition (A.6) is the congeniality condition of Meng (1994).

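Now, for $\eta = \mathrm{pr}(y \le 3)$, the imputed estimator of $\eta$ based on $\hat\eta_n$ in (A.3) is

$$\hat\eta_I = \frac{1}{n}\left\{\sum_{i=1}^{r} I(y_i \le 3) + \sum_{i=r+1}^{n} I(y_i^* \le 3)\right\}. \tag{A.7}$$

The expected value of $\hat\eta_I$ over the imputation mechanism is

$$E_I(\hat\eta_I) = \frac{1}{n}\left\{\sum_{i=1}^{r} I(y_i \le 3) + \sum_{i=r+1}^{n} \mathrm{pr}(y_i \le 3 \mid x_i, \hat\theta)\right\} = \hat\eta_{FE} + \frac{1}{n}\sum_{i=1}^{r}\left\{I(y_i \le 3) - \mathrm{pr}(y_i \le 3 \mid x_i, \hat\theta)\right\},$$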

where $\hat\eta_{FE} = n^{-1}\sum_{i=1}^{n} \mathrm{pr}(y_i \le 3 \mid x_i, \hat\theta)$. For the proportion, $\hat\eta_{FE} \ne E_I(\hat\eta_I)$ and so the congeniality condition does not hold. In fact,

$$\mathrm{var}\{E_I(\hat\eta_I)\} < \mathrm{var}(\hat\eta_n) + \mathrm{var}\{E_I(\hat\eta_I) - \hat\eta_n\}$$

and the multiple imputation variance estimator overestimates the variance of $\hat\eta_I$ in (A.7). If the maximum likelihood estimator (A.4) is used, then

$$\mathrm{var}(\hat\eta_{FE}) = \mathrm{var}(\hat\eta_n) + \mathrm{var}(\hat\eta_{FE} - \hat\eta_n)$$

and the multiple imputation variance estimator will be approximately unbiased.
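The decomposition of $E_I(\hat\eta_I)$ into $\hat\eta_{FE}$ plus a respondent correction can be verified on simulated data. In the sketch below the data-generating model, the seed, the sample sizes and the function name are assumptions. The parameter $\hat\theta$ is held at the respondent maximum likelihood fit rather than drawn from a posterior, so it illustrates the algebra only, not a full multiple imputation procedure.

```python
import random
from statistics import NormalDist, fmean


def imputation_expectations(x, y_obs, cut=3.0):
    """Return (E_I(eta_I), eta_FE, respondent correction) under (A.5).

    x has length n; y_obs holds y for the first r = len(y_obs) units only.
    """
    n, r = len(x), len(y_obs)
    xr = x[:r]
    # Least-squares fit of y on x using the respondents.
    x_bar, y_bar = fmean(xr), fmean(y_obs)
    sxx = sum((a - x_bar) ** 2 for a in xr)
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(xr, y_obs)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [b - b0 - b1 * a for a, b in zip(xr, y_obs)]
    sd_e = (sum(e * e for e in resid) / r) ** 0.5  # ML estimate divides by r
    # pr(y_i <= cut | x_i, theta_hat) for every unit.
    p = [NormalDist(b0 + b1 * a, sd_e).cdf(cut) for a in x]
    ind = [1.0 if b <= cut else 0.0 for b in y_obs]
    e_imp = (sum(ind) + sum(p[r:])) / n
    eta_fe = sum(p) / n
    correction = sum(i - q for i, q in zip(ind, p[:r])) / n
    return e_imp, eta_fe, correction


rng = random.Random(7)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y_obs = [2.0 + a + rng.gauss(0.0, 1.0) for a in x[:120]]
e_imp, eta_fe, correction = imputation_expectations(x, y_obs)
```

The correction term is generally non-zero in a given sample, which is the numerical face of $\hat\eta_{FE} \ne E_I(\hat\eta_I)$.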

REFERENCES

BOOTH, J. G. & HOBERT, J. P. (1999). Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. J. R. Statist. Soc. B 61, 265–285.

CLAYTON, D., SPIEGELHALTER, D., DUNN, G. & PICKLES, A. (1998). Analysis of longitudinal binary data from multiphase sampling. J. R. Statist. Soc. B 60, 71–87.

DEMPSTER, A. P., LAIRD, N. M. & RUBIN, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. R. Statist. Soc. B 39, 1–37.

FAY, R. E. (1992). When are inferences from multiple imputation valid? In Proc. Survey Res. Meth. Sect. Alexandria, VA: American Statistical Association.

FAY, R. E. (1996). Alternative paradigms for the analysis of imputed survey data. J. Am. Statist. Assoc. 91, 490–498.

GELMAN, A., MENG, X. L. & STERN, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies (with discussion). Statist. Sinica 6, 733–807.

IBRAHIM, J. G. (1990). Incomplete data in generalized linear models. J. Am. Statist. Assoc. 85, 765–769.

KIM, J. K., BRICK, M. J., FULLER, W. A. & KALTON, G. (2006). On the bias of the multiple imputation variance estimator in survey sampling. J. R. Statist. Soc. B 68, 509–521.

KIM, J. K. & FULLER, W. (2004). Fractional hot deck imputation. Biometrika 91, 559–578.

KIM, J. K. & RAO, J. N. K. (2009). Unified approach to linearization variance estimation from survey data after imputation for item nonresponse. Biometrika 96, 917–932.

LOUIS, T. A. (1982). Finding the observed information matrix when using the EM algorithm. J. R. Statist. Soc. B 44, 226–233.

MENG, X. L. (1994). Multiple-imputation inferences with uncongenial sources of input (with discussion). Statist. Sci. 9, 538–573.

RANDLES, R. H. (1982). On the asymptotic normality of statistics with estimated parameters. Ann. Statist. 10, 462–474.

ROBINS, J. M. & WANG, N. (2000). Inference for imputation estimators. Biometrika 87, 113–124.

RUBIN, D. B. (1976). Inference and missing data. Biometrika 63, 581–590.

RUBIN, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. Wiley.

SUNG, Y. J. & GEYER, C. J. (2007). Monte Carlo likelihood inference for missing data models. Ann. Statist. 35, 990–1011.

WANG, N. & ROBINS, J. M. (1998). Large-sample theory for parametric multiple imputation procedures. Biometrika 85, 935–948.

WEI, G. C. & TANNER, M. A. (1990). A Monte Carlo implementation of the EM algorithm and the poor man’s data augmentation algorithms. J. Am. Statist. Assoc. 85, 699–704.

WU, C. F. J. (1983). On the convergence properties of the EM algorithm. Ann. Statist. 11, 95–103.