
Variance estimation after imputation


Retrospective Theses and Dissertations

Iowa State University Capstones, Theses and

Dissertations

2000

Variance estimation after imputation

Jae-Kwang Kim

Iowa State University

Follow this and additional works at:

https://lib.dr.iastate.edu/rtd

Part of the

Statistics and Probability Commons

This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Retrospective Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please [email protected].

Recommended Citation

Kim, Jae-Kwang, "Variance estimation after imputation " (2000). Retrospective Theses and Dissertations. 12693. https://lib.dr.iastate.edu/rtd/12693


Variance estimation after imputation

by

Jae-Kwang Kim

A dissertation submitted to the graduate faculty in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Major: Statistics

Major Professor: Wayne A. Fuller

Iowa State University

Ames, Iowa

2000

UMI Number: 9977332

Copyright 2000 by Bell & Howell Information and Learning Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code.

Bell & Howell Information and Learning Company, 300 North Zeeb Road, P.O. Box 1346, Ann Arbor, MI 48106-1346

Graduate College

Iowa State University

This is to certify that the Doctoral dissertation of

Jae-Kwang Kim

has met the dissertation requirements of Iowa State University

Major Professor

For the Major Program

For the Graduate College

Signature was redacted for privacy.

Signature was redacted for privacy.

TABLE OF CONTENTS

GENERAL INTRODUCTION

1 Basic Problem
2 Estimation in the presence of nonresponse
3 Dissertation Organization
References

LITERATURE REVIEW

1 Preliminaries
2 Replication Variance Estimation Without Nonresponse
2.1 The jackknife method
2.2 Balanced repeated replication
3 Hot Deck Imputation Methods
4 The Multiple Imputation Approach
4.1 Bayesian Justification
4.2 Randomization Validity
4.3 Fay's Example
5 Quasi-Randomization Approach
6 Other approaches
6.1 Kalton and Kish's approach
6.2 Sarndal's approach
6.3 Tollefson and Fuller's approach
6.4 Fay's approach
References

VARIANCE ESTIMATION AFTER IMPUTATION

Abstract
1 Introduction
2 A Variance Estimation Method
3 Extensions To Random Imputation
4 Jackknife Method
5 Complex Survey Designs
5.1 Deterministic imputation
5.2 Random imputation
6 Concluding Remarks
Acknowledgments
References
Appendix A
Appendix B

INFERENCE PROCEDURES FOR HOT DECK IMPUTATION

Abstract
1 Introduction
2 Models for Imputation
2.1 Notation
2.2 Population Model Approach
2.3 Response Model Approach
2.4 Hot deck imputation
3 Estimation After Hot Deck Imputation: Population Model Approach
5 Variance Estimation
6 Simulation Studies
6.1 Experiment One
6.2 Experiment two
Appendix
References

REPLICATION VARIANCE ESTIMATION FOR MULTI-PHASE STRATIFIED SAMPLING

Abstract
1 Introduction
2 Asymptotic Properties
3 A Replication Method for the Reweighted Expansion Estimator
4 Replication Variance Estimation for the Double Expansion Estimator
5 Replication Variance Estimator for Multi-Phase Sampling
6 Application to the 2000 US Census
6.1 Introduction
6.2 Point Estimation
6.3 Variance Estimation
Appendix
A. Notation
B. Properties of the proposed variance estimator
C. Derivation of two phase variance when the second phase is stratified Poisson sampling
References

GENERAL CONCLUSIONS

References

LIST OF TABLES

Table 1 Illustration of the pseudo data set for Rao and Shao method
Table 2 Illustration of the proposed pseudo data set
Table 3 Mean, variance, and standardized variance of the point estimators under the four different imputation schemes in experiment one (10,000 samples)
Table 4 Mean, relative bias, variance, standardized variance of the variance estimator in experiment one (10,000 samples)
Table 5 Standardized mean C.I. width, standardized variance of C.I. width, and coverage in experiment one (10,000 samples)
Table 6 Population parameters for simulation in experiment two
Table 7 Mean, variance, and standardized variance of the point estimators under different imputation schemes in experiment two (10,000 samples of size 100)
Table 8 Mean, relative bias, variance, standardized variance of the variance estimator in experiment two (10,000 samples)
Table 9 Standardized mean C.I. width, standardized variance of C.I. width, and coverage in experiment two (10,000 samples)
Table 10 Mean and variance of the point estimator of $\theta_x$ under the three different imputation schemes in experiment two when Z is always observed
Table 11 Mean and variance of the variance estimator, the mean length of 95% CI (CI width), the coverage of 95% CI, and the relative bias of the variance estimator in experiment two when Z always [...]

GENERAL INTRODUCTION

1 Basic Problem

In many sample surveys, some of the units contacted do not respond. Other units may respond to some but not all questions being asked. The problem of missing data in survey sampling is called the problem of nonresponse.

The problems created by nonresponse are well explained by Rubin (1987):

These missing values intended by survey design to be observed not only mean less efficient estimates because of the reduced size of data base but also that standard complete-data methods cannot be immediately used to analyze the data. Moreover, possible biases exist because the respondents are often systematically different from the nonrespondents; of particular concern, these biases are difficult to eliminate since the precise reasons for nonresponse are usually not known. (p. 1)

It is common practice to distinguish between unit nonresponse, when some of the units contacted do not respond because of not-at-homes, refusals, inability to participate, and untraced units, and item nonresponse, when some but not all of the responses are available. Item nonresponse arises because of item refusals, "don't knows", omissions and answers deleted in editing. The problem of variance estimation in the presence of item nonresponse will be addressed in this work.

2 Estimation in the presence of nonresponse

The literature on the estimation problem in the presence of nonresponse is comparatively recent; review papers include Oh and Scheuren (1983), Kott (1994), and Brick and Kalton (1996). Methods proposed in this literature can be roughly grouped into the following categories (not mutually exclusive):

(i) Procedures Based on Completely Recorded Units

When some variables are not recorded for some of the units, a simple expedient is to discard incompletely recorded units and to analyze only the units with complete data. Deletion of units is easy to carry out and may be satisfactory with small amounts of missing data. Bias will occur in the point estimate as well as in the variance-covariance estimate, unless the population mean for respondents is equal to that of nonrespondents (see, e.g., Kalton 1983, p. 7). Also, the fraction of discarded units will be non-negligible when the number of items in the questionnaire is large.

(ii) Weighting Adjustment

In weighting adjustment for the nonresponse problem, the weights of specified respondents are increased so that they represent the nonrespondents. Weighting adjustment is primarily used to compensate for unit nonresponse. The main objective of the weighting adjustment is to reduce bias in survey estimates by making each respondent represent the correct fraction of the target population.

(iii) Imputation Procedures

Imputation means inserting values for missing items. Imputation is useful in dealing [...]

(b) Uses the same survey weights for all items, unlike separate weighting adjustment for each item.

(c) Retains all the reported data for use in multivariate analysis, unlike the complete case approach.

There are several imputation methods used in practice. Hot deck imputation is the imputation procedure in which the value assigned for a missing item is taken from respondents in the current sample. Many of the hot deck imputation procedures start with a division of the sample into cells based on auxiliary variables known for both the respondents and nonrespondents. The cell is called the imputation cell. One of the commonly used hot deck imputation methods is simple random hot deck imputation, where nonrespondents are assigned values from respondents in the same imputation cell with equal probabilities of selection. In spite of its convenience, treating the imputed values as if they are true values and making inference using standard formulas should be used with caution. The standard variance estimators, in particular, lead to underestimation because the additional variability due to missing values and imputation is not taken into account.

We will review existing methods of variance estimation for imputed data and suggest alternative methods.
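The underestimation described above can be illustrated with a small simulation. The following is a minimal Python sketch, not a procedure taken from the dissertation: the function names `hot_deck_impute` and `simulate`, the single imputation cell, the standard normal population, and the 60% response rate are all assumptions chosen for illustration. It applies simple random hot deck imputation and compares the Monte Carlo variance of the imputed sample mean with the average of the naive estimate $s^2/n$ computed as if the imputed values were observed.

```python
import random
import statistics

def hot_deck_impute(values, cells, rng):
    """Simple random hot deck: fill each missing value (None) with a value
    drawn with equal probability from respondents in the same imputation cell."""
    donors = {}
    for y, g in zip(values, cells):
        if y is not None:
            donors.setdefault(g, []).append(y)
    return [y if y is not None else rng.choice(donors[g])
            for y, g in zip(values, cells)]

def simulate(n=100, response_rate=0.6, reps=2000, seed=1):
    """Return (Monte Carlo variance of the imputed mean,
    average naive variance estimate s^2 / n)."""
    rng = random.Random(seed)
    means, naive = [], []
    for _ in range(reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Uniform response probability within one cell (an assumed mechanism).
        obs = [v if rng.random() < response_rate else None for v in y]
        if all(v is None for v in obs):
            continue  # no donors available; skip this replicate
        filled = hot_deck_impute(obs, [0] * n, rng)
        means.append(statistics.fmean(filled))
        naive.append(statistics.variance(filled) / n)
    return statistics.variance(means), statistics.fmean(naive)

true_var, naive_var = simulate()
print(f"Monte Carlo variance of imputed mean: {true_var:.5f}")
print(f"Average naive variance estimate:      {naive_var:.5f}")
```

Under these assumptions the naive estimate should come out well below the Monte Carlo variance, because it ignores both the reduced number of respondents and the randomness of the donor selection.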

3 Dissertation Organization

The dissertation consists of three research papers. The dissertation is organized as follows:

• Chapter 2

Existing methods of variance estimation after imputation, including the approach of Rubin (1987) and the approach of Rao (1996), are reviewed. The pros and cons of the existing methods are discussed.

• Chapter 3

A variance estimation method based on single imputation is proposed in this paper. This is basically an extension of the approach of Rao (1996). The proposed method is useful for estimating the variance of a smooth function of linear estimators.

• Chapter 4

In this paper, a variance estimation method based on two or more imputed values for each missing item is proposed. In particular, a procedure called fully efficient fractional imputation is proposed and variance estimation for the procedure is presented.

• Chapter 5

In this paper, a variance estimation method for multi-phase sampling is presented and applied to the 2000 US Census of Population.

• Chapter 6

Conclusions are made.

References

Brick, J. M. and Kalton, G. (1996). "Handling missing data in survey research." Statistical Methods in Medical Research, 215-238.

Kalton, G. (1983). Compensating for Missing Survey Data. Institute of Social Research.

Kott, P. S. (1994). "A note on handling nonresponse in sample surveys." Journal of the [...]

Oh, H. L. and Scheuren, F. J. (1983). "Weighting adjustments for unit non-response." In Incomplete Data in Sample Surveys, Volume 2, Theory and Bibliographies, W. G. Madow, I. Olkin, and D. B. Rubin (eds.). New York: Academic Press, 143-184.

Rao, J. N. K. (1996). "On variance estimation with imputed survey data." Journal of the American Statistical Association, 91, 499-506.

LITERATURE REVIEW

1 Preliminaries

A population of $N$ identifiable elements is denoted by $U = \{1, 2, \ldots, N\}$. A subset of the population is selected and called a sample. The selection of samples uses a set of probability rules called the sampling mechanism. Let $A$ denote the set of indices in the sample. Define the sample selection indicator function

$$I_j = \begin{cases} 1 & \text{if } j \in A \\ 0 & \text{if } j \notin A \end{cases} \tag{1.1}$$

and

$$\mathbf{B} = (I_1, I_2, \ldots, I_N). \tag{1.2}$$

Let $p(\mathbf{B})$ denote a sampling mechanism that assigns probabilities, summing to one, to the $2^N$ possible $\mathbf{B}$ vectors.

Associated with the $j$-th element of the population is a vector of characteristics denoted by $\mathbf{y}_j$, and the population of vectors is denoted by

$$\mathbf{Y} = (\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_N). \tag{1.3}$$

Let the population quantity of interest be $\theta_N = \theta(\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_N)$ and let $\hat{\theta}$ be an estimator of $\theta_N$ based on the sample. The traditional survey sampling approach treats $\mathbf{Y}$ as fixed [...]

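An estimator $\hat{\theta}$ is design unbiased for $\theta_N$ if

$$E(\hat{\theta} \mid \mathcal{F}) = \theta_N, \tag{1.4}$$

where $\mathcal{F} = \{\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_N\}$, $E(\hat{\theta} \mid \mathcal{F}) = \sum_{\mathbf{B}} \hat{\theta}\, p(\mathbf{B})$, and $\sum_{\mathbf{B}}$ denotes the summation over all possible $\mathbf{B}$.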
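Design unbiasedness can be checked directly for a small population by enumerating every possible sample. The sketch below is illustrative only: the five-element population, the use of simple random sampling without replacement, and the function names `design_expectation` and `expansion_total` are assumptions, not part of the original text. It computes the sum of the estimator times $p(\mathbf{B})$ over all samples, using exact rational arithmetic, and compares it with the population total.

```python
from itertools import combinations
from fractions import Fraction

def design_expectation(y, n, estimator):
    """Sum over all samples of estimator(sample) * p(B), where every sample
    of size n has equal probability 1 / C(N, n) (simple random sampling
    without replacement)."""
    N = len(y)
    samples = list(combinations(range(N), n))
    p = Fraction(1, len(samples))
    return sum(estimator(A, y, N) * p for A in samples)

def expansion_total(A, y, N):
    """Expansion estimator of the population total: (N / n) * sample sum."""
    return Fraction(N, len(A)) * sum(y[i] for i in A)

y = [3, 7, 1, 12, 5]   # hypothetical finite population, treated as fixed
theta_N = sum(y)       # population quantity of interest: the total

# The design expectation equals the population total, so the estimator
# is design unbiased for this population and design.
print(design_expectation(y, 2, expansion_total) == theta_N)  # prints True
```

Because the expectation is taken over the sampling mechanism alone, the check treats the population values as fixed constants, in line with the traditional survey sampling approach.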

Another mode of inference assumes that the population vector $\mathbf{Y}$ is a random sample from an infinite superpopulation. The model-based approach in survey sampling makes inferences based on the conditional distribution of $\mathbf{Y}$ given the sample outcome $A$. Note that this conditional distribution is determined by the sampling mechanism as well as by the distribution of the variable $\mathbf{Y}$. The dependence on the sampling mechanism can be avoided if the sampling mechanism is ignorable. We formalize the concept in Definition 1.1.

Definition 1.1

Let the distribution of $\mathbf{Y}$ be denoted by $\mathcal{L}(\mathbf{Y})$ and called the superpopulation model. Let $p(\mathbf{B})$ be the sampling mechanism. Then, $p(\mathbf{B})$ is ignorable under the superpopulation model if and only if

$$\mathcal{L}(\mathbf{Y} \mid A) = \mathcal{L}(\mathbf{Y}), \tag{1.5}$$

where $\mathcal{L}(\mathbf{Y} \mid A)$ is the conditional distribution of $\mathbf{Y}$ given the sample outcome $A$.

Let $\mathbf{X} = (\mathbf{x}_1, \ldots, \mathbf{x}_N)$ be a vector of values for a second variable, where the true vector $\mathbf{X}$ is known for the population. A sufficient condition for the ignorability of the sampling mechanism is that it can be described by the conditional independence of $\mathbf{Y}$ and $\mathbf{B}$ given $\mathbf{X}$. In Dawid's (1979) notation,

$$\mathbf{Y} \perp \mathbf{B} \mid \mathbf{X} \tag{1.6}$$

means that the variable $\mathbf{Y}$ is independent of the sample selection indicator variable [...] stratified random sampling and the auxiliary variable $\mathbf{X}$ is the indicator vector for strata, then the condition (1.6) holds because, in the same stratum, the probability of sample selection is the same for all elements. Hence, the sampling mechanism is independent of the value of $Y$ in the stratum. Rubin (1976), Scott and Smith (1973), and Sugden and Smith (1984) discuss ignorability.

Let us assume that the finite population $U$ is made up of $G$ imputation cells. Within each cell $g$, $g = 1, \ldots, G$, the elements are identically and independently distributed with mean $\mu_g$ and variance $\sigma_g^2$, i.e.

$$Y_i \mid i \in U_g \overset{iid}{\sim} (\mu_g, \sigma_g^2), \tag{1.7}$$

where $U_g$ denotes the set of indices for the imputation cell $g$. We call the model (1.7) the imputation cell model.

Lemma 1.1

Assume condition (1.6) with the auxiliary variable $\mathbf{X}$ being the indicator vector for imputation cells and $\mathbf{B}$ such that

$$0 < \Pr(I_i = 1) < 1, \quad i = 1, \ldots, N. \tag{1.8}$$

Then the sampling mechanism is ignorable under superpopulation model (1.7).

Proof.

Let $S$ be any measurable set in the sigma-field $\sigma(Y)$ generated by the random variable $Y$. Then, by the definition of conditional independence, for $i \in U_g$,

$$\Pr(Y_i \in S, I_i = 1 \mid i \in U_g) = \Pr(Y_i \in S \mid i \in U_g)\, \Pr(I_i = 1 \mid i \in U_g).$$

Also,

$$\Pr(Y_i \in S \mid i \in U_g, I_i = 1) = \frac{\Pr(Y_i \in S, I_i = 1 \mid i \in U_g)}{\Pr(I_i = 1 \mid i \in U_g)} = \Pr(Y_i \in S \mid i \in U_g)$$

under (1.8). Hence,

$$\mathcal{L}(Y_i \mid i \in U_g, I_i = 1) = \mathcal{L}(Y_i \mid i \in U_g).$$

Similarly,

$$\mathcal{L}(Y_i \mid i \in U_g, I_i = 0) = \mathcal{L}(Y_i \mid i \in U_g).$$

So, the result follows. ∎

Lemma 1.1 implies that the distribution of the sampled part is the same as that of the non-sampled part. That is,

$$Y_i \mid A \overset{iid}{\sim} (\mu_g, \sigma_g^2), \quad i \in U_g, \tag{1.9}$$

for each cell $g$, $g = 1, \ldots, G$.

Given nonresponse, the original sample $A$ is decomposed into the set of respondents, $A_R$, and the set of nonrespondents, $A_M$. Define the response indicator function

$$R_i = \begin{cases} 1 & \text{if } Y_i \text{ responds if sampled} \\ 0 & \text{if } Y_i \text{ does not respond if sampled} \end{cases} \quad i = 1, \ldots, N \tag{1.10}$$

and the associated vector

$$\mathbf{R} = (R_1, \ldots, R_N). \tag{1.11}$$

The distribution of $\mathbf{R}$ is called the response mechanism. Note that the response mechanism is usually unknown and is specified by the model. Conditional inference for $\mathbf{Y}$ given $\mathbf{R}$ requires the specification of the response mechanism. Ignorability of the response mechanism is defined in Definition 1.2.

Definition 1.2

Let $\mathcal{L}(\mathbf{Y} \mid A, A_R)$ be the conditional distribution of $\mathbf{Y}$ given the realized sample $A$ and the realized respondents $A_R$. Then, the response mechanism is ignorable under the model if

$$\mathcal{L}(\mathbf{Y} \mid A, A_R) = \mathcal{L}(\mathbf{Y} \mid A). \tag{1.12}$$

Rubin (1976, p. 582) defined the response mechanism to be a missing at random (MAR) mechanism if

$$\mathbf{R} \perp \mathbf{Y} \mid (\mathbf{X}, \mathbf{B}), \tag{1.13}$$

where the notation (1.13) means that the response indicator variable $\mathbf{R}$ is independent of the study variable $\mathbf{Y}$, conditional on the auxiliary variable $\mathbf{X}$ and the sample $\mathbf{B}$.

Lemma 1.2

Let the auxiliary variable $\mathbf{X}$ be the imputation cell variable defined in Lemma 1.1. Assume that

$$0 < \Pr(R_i = 1) < 1, \quad i \in A, \tag{1.14}$$

and that the MAR condition (1.13) holds. Then the response mechanism is ignorable under the model (1.7).

Proof.

The proof is quite similar to that of Lemma 1.1. Let $S$ be any measurable set in the sigma-field $\sigma(Y)$ generated by the random variable $Y$. Then, by the definition of conditional independence, for $i \in U_g$,

$$\Pr(Y_i \in S, R_i = 1 \mid i \in U_g, I_i = 1) = \Pr(Y_i \in S \mid i \in U_g, I_i = 1) \times \Pr(R_i = 1 \mid i \in U_g, I_i = 1)$$

and, by the definition of $R_i$,

$$\Pr(R_i = 1 \mid i \in U_g, I_i = 1) = \Pr(R_i = 1 \mid i \in U_g).$$

Therefore,

$$\Pr(Y_i \in S \mid i \in U_g, I_i = 1, R_i = 1) = \frac{\Pr(Y_i \in S, R_i = 1 \mid i \in U_g, I_i = 1)}{\Pr(R_i = 1 \mid i \in U_g, I_i = 1)} = \Pr(Y_i \in S \mid i \in U_g, I_i = 1)$$

under (1.14). Hence, the result follows.

Given Lemma 1.1 and Lemma 1.2, we are able to say that model (1.7), together with an ignorable sampling mechanism and an ignorable response mechanism, produces observations in an imputation cell that are distributed identically and independently. That is,

$$Y_i \mid (A, A_R) \overset{iid}{\sim} (\mu_g, \sigma_g^2). \tag{1.15}$$

2 Replication Variance Estimation Without Nonresponse

In this section, we consider the case when there is no nonresponse in the sample. In many surveys, data are collected from individuals or units sampled using complex sample designs that include varying probabilities and non-independent selections. One approach to estimating the standard error of the estimator is to linearize the estimator using a Taylor series expansion and then use a standard sample survey variance estimation method to estimate the precision of the linearized statistic. An advantage of the linearization method is that it is applicable to general sampling designs, but a disadvantage is that it involves the derivation of a separate variance estimation formula for each statistic. An alternative approach is to use a replication method. Two popular methods in survey sampling are the jackknife and balanced repeated replication (BRR). Wolter (1985) and Rust and Rao (1996) provide good reviews of the replication literature as applied to complex sample surveys.

Let $\theta$ be the population parameter of interest and let $\hat{\theta}$ be the estimator of $\theta$ based on the full sample. To estimate sampling errors, subsamples from the sample are drawn and $\hat{\theta}$ is computed from each subsample. Different ways of subsampling from the full sample correspond to different replication methods. The subsamples are called replicate samples and the statistics calculated from these replicates are called replicate estimates.

The variance of the sample estimator $\hat{\theta}$ is estimated from the replicate estimates by

$$\hat{V}(\hat{\theta}) = \sum_{k=1}^{L} c_k \left( \hat{\theta}^{(k)} - \hat{\theta} \right)^2, \tag{2.16}$$

where $\hat{\theta}^{(k)}$ is the $k$-th estimate of $\theta$ based on the observations included in the $k$-th replicate, $L$ is the number of replicates, and $c_k$ is a factor associated with replicate $k$ and determined by the replication method. When the original estimator $\hat{\theta}$ is a linear estimator of the form

$$\hat{\theta} = \sum_{i \in A} w_i y_i, \tag{2.17}$$

where $w_i = w_i(A)$, the $k$-th replicate of $\hat{\theta}$ can be written as

$$\hat{\theta}^{(k)} = \sum_{i \in A} w_i^{(k)} y_i, \tag{2.18}$$

where $w_i^{(k)}$ denotes the replicate weight for the $i$-th unit of the $k$-th replicate.
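For a linear estimator, the replication variance estimator needs only the full-sample weights, the replicate weights, and the replicate factors. The following minimal Python sketch shows that computation; the function names `linear_estimate` and `replication_variance` and the toy numbers are illustrative assumptions, not notation from the original text.

```python
def linear_estimate(weights, y):
    """Linear estimator: the sum of w_i * y_i over the sample."""
    return sum(w * v for w, v in zip(weights, y))

def replication_variance(weights, replicate_weights, factors, y):
    """Replication variance estimate: the sum over k of
    c_k * (theta_hat^(k) - theta_hat)^2, where theta_hat^(k) is the linear
    estimator recomputed with the k-th set of replicate weights."""
    theta_hat = linear_estimate(weights, y)
    return sum(
        c_k * (linear_estimate(w_k, y) - theta_hat) ** 2
        for c_k, w_k in zip(factors, replicate_weights)
    )

# Toy example with two sampled units and two replicates (hypothetical values).
weights = [2, 2]
replicate_weights = [[4, 0], [0, 4]]   # each replicate drops one unit
factors = [0.5, 0.5]
y = [1, 3]
print(replication_variance(weights, replicate_weights, factors, y))  # prints 16.0
```

Different replication methods, such as the jackknife or balanced repeated replication, differ only in how the replicate weights and the factors passed to such a routine are constructed.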

The following lemma provides a necessary condition for a variance estimator to be unbiased.

Lemma 2.1

Let the original estimator $\hat{\theta}$ be a linear estimator of the form in (2.17). If a replication variance estimator $\hat{V}(\hat{\theta})$ in (2.16) is unbiased for the (design) variance of $\hat{\theta}$ and always takes nonnegative values, then we have

$$\sum_{i \in A} w_i^{(k)} z_i = \sum_{i \in A} w_i z_i, \quad k = 1, 2, \ldots, L, \tag{2.19}$$

for all $z_i$ satisfying

$$\mathrm{Var}\Big( \sum_{i \in A} w_i z_i \,\Big|\, \mathcal{F} \Big) = 0. \tag{2.20}$$

Proof. Let $\hat{\theta}_z = \sum_{i \in A} w_i z_i$ and

$$\hat{V}(\hat{\theta}_z) = \sum_{k=1}^{L} c_k \Big( \sum_{i \in A} w_i^{(k)} z_i - \sum_{i \in A} w_i z_i \Big)^2.$$

By (2.20) and the unbiasedness of $\hat{V}(\hat{\theta}_z)$, we have

$$E\big( \hat{V}(\hat{\theta}_z) \mid \mathcal{F} \big) = 0.$$

Since $\Pr\big( \hat{V}(\hat{\theta}_z) < 0 \mid \mathcal{F} \big) = 0$, we have

$$\hat{V}(\hat{\theta}_z) = 0$$

for all samples. Since the $c_k$ are all positive by the nonnegativeness of $\hat{V}(\hat{\theta}_z)$, (2.19) follows. ∎

The replicate factor $c_k$ is chosen so that $\sum_{k=1}^{L} c_k \big( w_i^{(k)} - w_i \big)^2 y_i^2$ estimates $\mathrm{Var}(w_i y_i I_i)$. Under strict unbiasedness of $\hat{V}(\hat{\theta})$, we have

$$\sum_{k=1}^{L} c_k \big( w_i^{(k)} - w_i \big)^2 \Pr(i \in A) = w_i^2 \Pr(i \in A) \left[ 1 - \Pr(i \in A) \right]. \tag{2.21}$$

Equality (2.21) holds because the left side of (2.21) is the expected value of $\hat{V}$ for the particular population $\mathcal{F}$ whose $y$ values are zeros for all units except for the $i$-th element. The right side of (2.21) is the design variance of $\hat{\theta}$ for the population $\mathcal{F}$.

2.1 The jackknife method

The jackknife method, which originally was designed to estimate the bias of an estimator by deleting one datum from the original data set and recalculating the estimator based on the rest of the data, has become a valuable tool for variance estimation since the work of Tukey (1958). In an infinite population context, Tukey (1958) suggested that each replicate estimate might be regarded as an independent and identically distributed random variable, which in turn suggests a very simple variance estimator. In the finite population sampling context, each jackknife replicate deletes one unit and modifies the weights of the others.

Example 2.1

1. Under simple random sampling of size $n$ from a finite population of size $N$, the jackknife variance estimator is defined by equation (2.16) with $c_k = n^{-1}(n-1)(1 - N^{-1}n)$, $w_i^{(k)} = (n-1)^{-1}$ for $i \neq k$, and $w_k^{(k)} = 0$. These values satisfy (2.19) with $z_i = 1$ and (2.21).

2. For stratified random sampling, let $y_{hi}$ be the value of the $i$-th element in stratum $h$. Let $c_k = n_h^{-1}(n_h - 1)$ when the unit $(hi)$ is deleted for the $k$-th replicate and let

$$w_{hi}^{(k)} = \begin{cases} w_{hi} & \text{if a unit in stratum } g \text{ is deleted } (g \neq h), \\ (n_h - 1)^{-1} n_h\, w_{hi} & \text{if unit } (hj) \text{ is deleted, } i \neq j, \\ 0 & \text{if unit } (hi) \text{ is deleted.} \end{cases}$$

Then (2.19) holds when $z = (z_{11}, \ldots, z_{H n_H})$ with $z_{gi} = 1$ if $h = g$, and $z_{gi} = 0$ otherwise.
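The first part of Example 2.1 can be checked numerically. The following is a minimal sketch, assuming the estimator is the sample mean under simple random sampling without replacement; the function name `jackknife_variance_srs` and the data are hypothetical and are not taken from the thesis. With $c_k = n^{-1}(n-1)(1 - N^{-1}n)$, the delete-one jackknife should reproduce the usual estimator $(1 - n/N)\,s^2/n$.

```python
from statistics import mean, variance


def jackknife_variance_srs(y, N):
    """Delete-one jackknife variance of the sample mean under SRS
    without replacement, using c_k = (n-1)/n * (1 - n/N)."""
    n = len(y)
    theta_hat = mean(y)
    c = (n - 1) / n * (1 - n / N)  # same replicate factor for every k
    total = 0.0
    for k in range(n):
        # Replicate k: weight 1/(n-1) on every unit except k, weight 0 on k.
        theta_k = (sum(y) - y[k]) / (n - 1)
        total += c * (theta_k - theta_hat) ** 2
    return total


# Hypothetical data: n = 5 sampled values from a population of N = 50.
y = [3.0, 7.0, 8.0, 12.0, 15.0]
N = 50
v_jk = jackknife_variance_srs(y, N)
v_textbook = (1 - len(y) / N) * variance(y) / len(y)
```

The agreement is exact for the mean because $\hat\theta^{(k)} - \hat\theta = (\bar y - y_k)/(n-1)$, so the sum of squared deviations equals $s^2/(n-1)$.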

2.2 Balanced repeated replication

Balanced repeated replication (BRR) was first proposed by McCarthy (1969) for the case where two clusters per stratum are sampled with replacement in the first stage of sampling. In the two-cluster-per-stratum design with $H$ strata, a minimal set of $L$ balanced half-samples may be constructed from an $L \times L$ Hadamard matrix (see, e.g., Wolter, 1985) by choosing any $H$ columns excluding the column of all $+1$'s, where $H < L \le H + 4$. Let $\delta_h^{(k)}$ be the element of the Hadamard matrix satisfying

$$\sum_{k=1}^{L} \delta_h^{(k)} = 0 \qquad (2.22)$$

for all $h$ and

$$\sum_{k=1}^{L} \delta_h^{(k)} \delta_{h'}^{(k)} = 0 \quad \text{for } h \neq h'. \qquad (2.23)$$

The $k$-th replicate of the linear estimator of the form

$$\hat\theta = \sum_{h=1}^{H} \sum_{i=1}^{2} w_{hi} y_{hi}$$

can be written as

(2.24)

Since the BRR uses only half of the original sample, it may produce very unstable estimates for some nonlinear statistics in relatively small samples. To avoid anomalies, Fay (1984) suggested using

where $0 < \epsilon < 1$.
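The construction of balanced half-samples from a Hadamard matrix can be sketched as follows. This is only an illustration under stated assumptions: it uses the Sylvester construction, which yields Hadamard matrices whose order is a power of two, so the number of replicates it returns need not be the minimal $L$ described above. The function names are hypothetical.

```python
def sylvester_hadamard(L):
    """Return an L x L Hadamard matrix with entries +1/-1.
    Sylvester construction: L must be a power of two."""
    if L < 1 or L & (L - 1):
        raise ValueError("L must be a power of two")
    H = [[1]]
    while len(H) < L:
        # Doubling step: [[H, H], [H, -H]]
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H


def balanced_half_samples(n_strata):
    """Return an L x n_strata array of +1/-1 half-sample indicators,
    one row per replicate, one column per stratum."""
    L = 1
    while L <= n_strata:  # smallest power of two strictly greater than H
        L *= 2
    H = sylvester_hadamard(L)
    # Column 0 is the column of all +1's, so it is excluded.
    return [[row[h + 1] for h in range(n_strata)] for row in H]


delta = balanced_half_samples(3)  # 4 replicates for 3 strata
col_sums = [sum(row[h] for row in delta) for h in range(3)]
cross = [sum(row[h] * row[g] for row in delta)
         for h in range(3) for g in range(3) if h < g]
```

In this sketch `col_sums` and `cross` hold the column sums and the pairwise column cross-products; both should vanish, which is the balance property the half-samples are required to have.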

3 Hot Deck Imputation Methods

There are a variety of imputation methods used in practice, as noted by Kalton and Kasprzyk (1986). The hot deck imputation method starts with the division of the sample into several imputation cells. Many hot deck imputation methods assign the value from a record with a response to a record with a missing value on that item. These records will be called the donor and recipient, respectively. Often, the values for a set of related missing items are taken from the same donor, to preserve some of the multivariate relationships.

Hot deck imputation methods can be roughly classified into the following categories:

(i) Sequential Hot Deck Imputation

Some hot deck imputation procedures impute the value from the record in the same cell that was last read by the computer. This is partly based on a belief that, if the data are arranged in some geographic order, adjacent units in the cell will tend to be more similar than randomly chosen units in the cell. One problem with the sequential hot deck imputation is that it may easily make multiple uses of donors, a feature that leads to a loss of precision in survey estimates.

(ii) Random Hot Deck Imputation

A respondent is chosen at random within an imputation cell, and the selected respondent's value is assigned to the nonrespondent. To preserve multivariate relationships, values from the same donor are used for all missing items of a record. The selection of donors can be performed either with replacement or without replacement. Furthermore, one may have more than one imputed value for each missing item.

(iii) Nearest-Neighbor Hot Deck Imputation

This hot deck method assigns a nonrespondent the value of the "nearest" respondent, where "nearest" is defined in terms of a distance function of the auxiliary variables.

Random hot deck imputation involves random selection of donors. This random selection mechanism introduces what is termed imputation variance, and this imputation variance reduces the precision of the survey estimates.
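Random hot deck imputation within cells can be sketched in a few lines. This is an illustrative sketch only, assuming a single item with missing values coded as `None` and a cell label for every record; the function name and data are hypothetical.

```python
import random


def random_hot_deck(values, cells, rng, with_replacement=True):
    """Fill each missing value (None) with the value of a donor drawn at
    random from the respondents in the same imputation cell."""
    imputed = list(values)
    for cell in sorted(set(cells)):
        members = [i for i, c in enumerate(cells) if c == cell]
        donors = [i for i in members if values[i] is not None]
        recipients = [i for i in members if values[i] is None]
        if not recipients:
            continue
        if not donors:
            raise ValueError(f"cell {cell!r} has no respondents")
        if with_replacement:
            chosen = [rng.choice(donors) for _ in recipients]
        else:
            if len(recipients) > len(donors):
                raise ValueError("not enough donors to draw without replacement")
            chosen = rng.sample(donors, len(recipients))
        for j, d in zip(recipients, chosen):
            imputed[j] = values[d]
    return imputed


# Hypothetical data: two imputation cells, "A" and "B".
values = [5.0, None, 7.0, None, 1.0, 2.0, None]
cells = ["A", "A", "A", "A", "B", "B", "B"]
completed_wr = random_hot_deck(values, cells, random.Random(2000))
completed_wor = random_hot_deck(values, cells, random.Random(2000),
                                with_replacement=False)
```

Drawing without replacement prevents any donor from being used more than once within a cell.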

As reviewed by Brick and Kalton (1996), there are two main methods for reducing imputation variance. One is through a sample design for selecting donors within each imputation cell. For instance, selecting donors by simple random sampling without replacement is preferable to simple random sampling of donors with replacement. By minimizing the multiple use of donors, the without-replacement design leads to a lower imputation variance.

A second approach is to use fractional imputation, which involves dividing nonrespondents' records into parts and imputing separately to each part. For example, each nonrespondent might be divided into three parts, each of which is allocated a weight of one-third of the nonrespondent's original weight. Then separate donors are chosen for each part. If we have only one imputed value for each nonrespondent, then we will call the procedure single (hot deck) imputation.

Example 3.1

Suppose, in a simple random sample of size $n$, $r$ units respond and $m$ do not respond to item $y$. The imputed value for missing unit $j$ is denoted by $y_j^*$. The imputed estimator of the population mean is

$$\bar y_I = n^{-1} \Big( \sum_{i \in A_R} y_i + \sum_{j \in A_M} y_j^* \Big).$$

If the with-replacement hot deck imputation is used, then the variance of $\bar y_I$ is, conditional on $r$,

$$\mathrm{Var}(\bar y_I) = \mathrm{Var}(\bar y_r) + \frac{m}{n^2} \big( 1 - r^{-1} \big) E(s_r^2),$$

where $s_r^2 = (r-1)^{-1} \sum_{i \in A_R} (y_i - \bar y_r)^2$ and $\bar y_r = r^{-1} \sum_{i \in A_R} y_i$. If the without-replacement hot deck imputation is used with $r > m$, then the variance of $\bar y_I$, conditional on $r$, is

$$\mathrm{Var}(\bar y_I) = \mathrm{Var}(\bar y_r) + \frac{m}{n^2} \Big( 1 - \frac{m}{r} \Big) E(s_r^2). \qquad (3.27)$$

For fractional imputation with the number of imputations equal to $c$, where the with-replacement hot deck imputation is used independently $c$ times, the variance of $\bar y_I$ is, conditional on $r$,

$$\mathrm{Var}(\bar y_I) = \mathrm{Var}(\bar y_r) + \frac{1}{c} \frac{m}{n^2} \big( 1 - r^{-1} \big) E(s_r^2). \qquad (3.28)$$

Hence, without-replacement hot deck imputation and fractional imputation reduce the imputation variance relative to with-replacement imputation.
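The imputation-variance terms in Example 3.1 can be verified by brute force for a small case. The sketch below holds the respondent values fixed, so $s_r^2$ appears in place of $E(s_r^2)$, and enumerates every equally likely set of donor draws. The function name, scheme labels and data are hypothetical.

```python
from itertools import permutations, product
from statistics import pvariance, variance


def imputation_variance(resp, m, scheme, c=1):
    """Exact variance, over the random donor draws only, of the imputed
    mean when m missing values are filled from the respondent values."""
    n = len(resp) + m
    if scheme == "with":            # with-replacement hot deck
        draws = product(resp, repeat=m)
    elif scheme == "without":       # without-replacement hot deck
        draws = permutations(resp, m)
    elif scheme == "fractional":    # c independent with-replacement draws
        draws = product(resp, repeat=m * c)
    else:
        raise ValueError(scheme)
    # Each fractional draw carries weight 1/c; c = 1 for the other schemes.
    estimates = [(sum(resp) + sum(d) / c) / n for d in draws]
    return pvariance(estimates)     # all draws are equally likely


resp = [1.0, 4.0, 6.0, 9.0]         # hypothetical respondent values, r = 4
r, m, c = len(resp), 2, 2
n = r + m
s2 = variance(resp)                  # s_r^2 with divisor r - 1

v_with = imputation_variance(resp, m, "with")
v_without = imputation_variance(resp, m, "without")
v_frac = imputation_variance(resp, m, "fractional", c=c)
```

For these values the enumeration matches the three closed forms $(m/n^2)(1 - r^{-1})s_r^2$, $(m/n^2)(1 - m/r)s_r^2$ and $c^{-1}(m/n^2)(1 - r^{-1})s_r^2$, and the with-replacement scheme has the largest imputation variance.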

Fractional imputation is discussed by Kalton and Kish (1984) and Fay (1996). A different, but related, approach is multiple imputation, which is discussed in the next section.

4 The Multiple Imputation Approach

Multiple imputation, proposed by Rubin (1978), is a procedure for handling missing data that allows the data analyst to use standard techniques of analysis designed for complete data, while at the same time providing a method to estimate the uncertainty due to the missing data.

A comprehensive description of multiple imputation is given in Rubin (1987). Rubin (1987) devotes a good deal of Chapter 3 to specifying requirements for the validity of multiple imputation inference under the model-based approach. His arguments in that chapter are for the Bayesian approach, where inferences are made using the posterior mean and the posterior variance. Rubin (1987) devotes Chapter 4 to conditions for the validity of multiple imputation in the randomization framework.

Multiple imputation can be characterized by the method of generating the imputed values and by the variance formula. The variance formula directly uses the complete-sample variance estimator so that it can be implemented easily using the existing software.

Let $\hat\theta_n$ be the complete sample estimator of the parameter $\theta$ and $\hat V_n = \hat V(Y_{com})$ be the complete sample variance estimator of $\hat\theta_n$. The full sample $Y_{com}$ is decomposed as $Y_{com} = (Y_{obs}, Y_{mis})$, where $Y_{obs}$ is the part of $Y_{com}$ with $R_i = 1$ and $Y_{mis}$ is the part of $Y_{com}$ with $R_i = 0$.

Multiple imputation involves repeating the imputation process independently $M$ times. The imputed values are generated from the posterior distribution of $Y_{mis}$ given $Y_{obs}$. After multiple imputation, we have $M$ data sets. Thus we can construct $M$ separate statistics and $M$ estimators of variance based on the augmented sample. Let the statistics be $\hat\theta_{(1),n}, \ldots, \hat\theta_{(M),n}$ and $\hat V_{(1),n}, \ldots, \hat V_{(M),n}$ for the estimator and estimator of variance, respectively. Then, the multiple imputation estimator of $\theta$ is

$$\hat\theta_{M,n} = M^{-1} \sum_{t=1}^{M} \hat\theta_{(t),n} \qquad (4.29)$$

and the associated variance estimator is

$$T_{M,n} = W_{M,n} + \frac{M+1}{M} B_{M,n}, \qquad (4.30)$$

where

$$W_{M,n} = M^{-1} \sum_{t=1}^{M} \hat V_{(t),n} \qquad (4.31)$$

and

$$B_{M,n} = (M-1)^{-1} \sum_{t=1}^{M} \big( \hat\theta_{(t),n} - \hat\theta_{M,n} \big)^2. \qquad (4.32)$$
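The combining step of multiple imputation is simple enough to write down directly. The following sketch assumes the $M$ completed-data point estimates and variance estimates have already been computed; the function name and the numbers are hypothetical. It returns the average of the point estimates, the average within-imputation variance $W$, the between-imputation variance $B$ (divisor $M - 1$), and the total $T = W + (1 + M^{-1})B$.

```python
from statistics import mean, variance


def combine_multiple_imputations(estimates, variances):
    """Combine M completed-data estimates and their variance estimates
    into the multiple imputation estimate and its variance estimate."""
    M = len(estimates)
    if M < 2 or len(variances) != M:
        raise ValueError("need M >= 2 estimates and M variance estimates")
    theta_M = mean(estimates)        # multiple imputation point estimate
    W = mean(variances)              # average within-imputation variance
    B = variance(estimates)          # between-imputation variance, divisor M - 1
    T = W + (1 + 1 / M) * B          # total variance estimate
    return theta_M, W, B, T


# Hypothetical results from M = 3 imputed data sets.
theta_M, W, B, T = combine_multiple_imputations(
    [10.0, 12.0, 11.0], [4.0, 5.0, 3.0]
)
```

Because only completed-data estimates and variances enter the calculation, the same function applies whatever complete-sample software produced them.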

The typical assumptions associated with multiple imputation are

$$\lim_{n \to \infty} E\big( \hat\theta_{\infty,n} - \theta \big) = 0 \qquad (4.33)$$

and

$$\lim_{n \to \infty} n \big[ E( T_{\infty,n} ) - \mathrm{Var}( \hat\theta_{\infty,n} ) \big] = 0, \qquad (4.34)$$

where

$$\hat\theta_{\infty,n} = \lim_{M \to \infty} \hat\theta_{M,n} \quad \text{and} \quad T_{\infty,n} = \lim_{M \to \infty} T_{M,n}.$$

In the Bayesian approach, the distribution used in (4.33) and (4.34) is the conditional distribution of $\theta$ given $Y_{obs}$ under the assumed model. In the classical model-based approach, the distribution is the distribution of $Y_{obs}$ under the assumed model. In the randomization approach, the distribution used in (4.33) and (4.34) is the joint distribution of the sampling mechanism and the response mechanism.


4.1 Bayesian Justification

Chapter 3 of Rubin (1987) deals with the validity of multiple imputation in the Bayesian framework. To review that approach, we assume that the estimator $\hat\theta_n$ based on the complete sample is the posterior mean of $\theta$ under the assumed Bayesian model $f$ (including both the likelihood and the prior density). That is,

$$\hat\theta_n = E_f( \theta \mid Y_{com} ). \qquad (4.35)$$

Also, let $\hat V_n$ be the posterior variance of $\theta$ under the model $f$. That is,

$$\hat V_n = V_f( \theta \mid Y_{com} ). \qquad (4.36)$$

According to Meng (1994, p. 543), a Bayesian model $f$ satisfying (4.35) and (4.36) is said to be congenial to the analysis using $(\hat\theta_n, \hat V_n)$. Using the terminology of congeniality, we summarize the main results in Chapter 3 of Rubin (1987).

Result 4.1 Assume that

(i) for the complete sample, the Bayesian model $f$ is congenial to the analysis using $(\hat\theta_n, \hat V_n)$, and

(ii) the imputed values are drawn from the conditional distribution $\mathcal{L}(Y_{mis} \mid Y_{obs})$ of $Y_{mis}$ given $Y_{obs}$ under the Bayesian model $f$.

Then, under nonresponse, the Bayesian model $f$ is congenial to the analysis using the imputed pair $(\hat\theta_{M,n}, T_{M,n})$ calculated from (4.29) and (4.30), as $M \to \infty$. That is,

$$\hat\theta_{\infty,n} = E_f( \theta \mid Y_{obs} ) \qquad (4.37)$$

and

$$T_{\infty,n} = V_f( \theta \mid Y_{obs} ). \qquad (4.38)$$

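Proof. To show the equalities (4.37) and (4.38), note that

$$P( \theta \mid Y_{obs} ) = \int P( \theta \mid Y_{obs}, Y_{mis} )\, P( Y_{mis} \mid Y_{obs} )\, dY_{mis}$$

and that the imputed values satisfy $Y_{mis}^{(t)} \sim \mathcal{L}( Y_{mis} \mid Y_{obs} )$ for $t = 1, 2, \ldots, M$. So, by the law of large numbers,

$$E_f( \theta \mid Y_{obs} ) = \lim_{M \to \infty} M^{-1} \sum_{t=1}^{M} E_f\big( \theta \mid Y_{obs}, Y_{mis}^{(t)} \big)$$

and

$$V_f( \theta \mid Y_{obs} ) = \lim_{M \to \infty} M^{-1} \sum_{t=1}^{M} V_f\big( \theta \mid Y_{obs}, Y_{mis}^{(t)} \big) + \lim_{M \to \infty} (M-1)^{-1} \sum_{t=1}^{M} \Big\{ E_f\big( \theta \mid Y_{obs}, Y_{mis}^{(t)} \big) - M^{-1} \sum_{s=1}^{M} E_f\big( \theta \mid Y_{obs}, Y_{mis}^{(s)} \big) \Big\}^2$$

almost surely. □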

By Result 4.1, we have the desired relations (4.33) and (4.34) under the posterior distribution of $\theta$ given $Y_{obs}$. In Result 4.1, there are two models involved. The first model is called the analyst's model, which is used in (4.35) and (4.36). The second model is called the imputer's model, which is used in calculating $\mathcal{L}( Y_{mis} \mid Y_{obs} )$. Result 4.1 requires that the two models be the same. As is observed in Fay (1991, 1992), Meng (1994), Rubin (1996), and Schafer (1997), if the analyst's model is different from that of the imputer, then the multiple imputation estimator may be biased, not only for variance estimation but also for point estimation.

4.2 Randomization Validity

Multiple imputation, which is based on the Bayesian paradigm, can be evaluated under the frequentist paradigm, where the population values are treated as fixed and inferences are based on the sampling distribution generated by repetitions of the sample selection procedure and a model for response probabilities.

Rubin (1987, p. 118) gave the definition of proper imputation, which is a key concept for the randomization validity of multiple imputation. The definition of a proper imputation procedure treats the complete sample $Y_{com}$ as fixed, and the response indication vector $R$ as the random variable. For complete sample statistics $\hat\theta_n$ and $\hat V_n$, an imputation method is called proper under the assumed response mechanism if

$$E_R\{ \hat\theta_{\infty,n} \mid Y_{com} \} = \hat\theta_n, \qquad (C1)$$

$$E_R\{ W_{\infty,n} \mid Y_{com} \} = \hat V_n, \qquad (C2)$$

and

$$E_R\{ B_{\infty,n} \mid Y_{com} \} = V_R\{ \hat\theta_{\infty,n} \mid Y_{com} \}. \qquad (C3)$$

The subscript $R$ is used here to emphasize that the reference distribution is with respect to the assumed response mechanism on $R$.

The main conclusion regarding randomization validity with proper imputation is well summarized in Rubin (1987):

Result 4.1: If the complete-data inference is randomization-valid and the multiple-imputation procedure is proper, then the infinite-$m$ repeated imputation inference is randomization-valid under the posited response mechanism. (p. 119)

The conditions of proper imputation are difficult to verify. One imputation procedure is to generate the missing part $Y_{mis}$ from the conditional distribution $\mathcal{L}( Y_{mis} \mid Y_{obs} )$ of $Y_{mis}$ given $Y_{obs}$ under a Bayesian model, which is called Bayesianly proper imputation. Bayesianly proper imputation is not sufficient for proper imputation. The following theorem attempts to clarify the relationships.

Theorem 4.1

If a multiple imputation generated from $\mathcal{L}( Y_{mis} \mid Y_{obs} )$ using the Bayesian model $f$ satisfies (C1), then it also satisfies (C3).

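Proof. Let $\hat\theta_n$ be the given full sample estimator of $\theta$. By the law of large numbers,

$$\hat\theta_{\infty,n} = \lim_{M \to \infty} M^{-1} \sum_{t=1}^{M} \hat\theta_{(t),n} = E_f( \hat\theta_n \mid Y_{obs} )$$

and

$$B_{\infty,n} = \lim_{M \to \infty} (M-1)^{-1} \sum_{t=1}^{M} \big( \hat\theta_{(t),n} - \hat\theta_{M,n} \big)^2 = V_f( \hat\theta_n \mid Y_{obs} ).$$

Now,

$$E_R( B_{\infty,n} \mid Y_{com} ) = E_R\big[ V_f( \hat\theta_n \mid Y_{obs} ) \mid Y_{com} \big] = E_R\big[ V_f\{ \hat\theta_n - E_f( \hat\theta_n \mid Y_{obs} ) \mid Y_{obs} \} \mid Y_{com} \big].$$

Furthermore,

$$E_R( B_{\infty,n} \mid Y_{com} ) = V\big[ \hat\theta_n - E_f( \hat\theta_n \mid Y_{obs} ) \mid Y_{com} \big] \qquad (4.39)$$

$$= V_R\big[ E_f( \hat\theta_n \mid Y_{obs} ) \mid Y_{com} \big], \qquad (4.40)$$

where the equality (4.39) follows from the decomposition

$$V[ Q \mid Y_{com} ] = V_R\big[ E_f( Q \mid Y_{obs} ) \mid Y_{com} \big] + E_R\big[ V_f( Q \mid Y_{obs} ) \mid Y_{com} \big]$$

with $Q = \hat\theta_n - E_f( \hat\theta_n \mid Y_{obs} )$ and so $E_f[ Q \mid Y_{obs} ] = 0$. The equality (4.40) holds because $E_R\big[ E_f( \hat\theta_n \mid Y_{obs} ) \mid Y_{com} \big] = \hat\theta_n$ by assumption (C1). □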

Some authors, for example Fay (1992), have questioned the validity of the multiple imputation under the frequentist response probability model. We will study this in the next subsection.

4.3 Fay's Example

Fay (1991, 1992) used a Bernoulli model to illustrate the difficulty of creating proper imputations as a general purpose methodology. We suppress the subscript $n$ in the estimators to simplify the notation. Suppose we have a simple random sample of size $n$ for variable $Y$ taking only 0 or 1. Assume that, for simplicity, the first $r$ elements are observed and the parameter of interest is $\theta_N = N^{-1} \sum_{i=1}^{N} y_i$. If we assume the uniform response mechanism and use the Bernoulli model to create the multiple imputation, then we use

$$\hat\theta_M = M^{-1} \sum_{t=1}^{M} \hat\theta_{(t)} \qquad (4.41)$$

as the estimator of $\theta_N$ and use $T_M = W_M + (1 + M^{-1}) B_M$ with

$$W_M = M^{-1} \sum_{t=1}^{M} \hat V_{(t)} \qquad (4.42)$$

and

$$B_M = (M-1)^{-1} \sum_{t=1}^{M} \big( \hat\theta_{(t)} - \hat\theta_M \big)^2$$

to estimate the variance of $\hat\theta_M$. Then, letting $n^{-1} r \to p \in (0, 1)$ and $N^{-1} n \to 0$, we have

$$E( \hat\theta_\infty \mid Y ) \doteq \theta_N, \qquad \mathrm{Var}( \hat\theta_\infty \mid Y ) \doteq r^{-1} \theta_N ( 1 - \theta_N ), \qquad E( T_\infty \mid Y ) \doteq r^{-1} \theta_N ( 1 - \theta_N ),$$

because the respondents can be regarded as a simple random sample from the population. Hence,

$$E( T_\infty \mid Y ) \doteq \mathrm{Var}( \hat\theta_\infty \mid Y ).$$

Now, assume that for each unit $i$, we have $x_i$ taking either $a$ or $b$ as possible values. We want to estimate $\theta_a = \Pr( Y = 1, X = a )$ and $\theta_b = \Pr( Y = 1, X = b )$. Let $n = n_a + n_b$ and $r = r_a + r_b$. If we have complete response, then we will use

$$\hat\theta_a = n^{-1} \sum_{i=1}^{n} y_i I( x_i = a ), \qquad \hat\theta_b = n^{-1} \sum_{i=1}^{n} y_i I( x_i = b ),$$

and the variance-covariance estimator for $( \hat\theta_a, \hat\theta_b )$ is

\ =

( i - y , . ) 0

0 ( i - t f . )

w i t h d„ = tfu + Oh.

If we have missing data and impute using the approximate Bayesian bootstrap (ABB) method proposed by Rubin and Schenker (1986), then we use

M { ' ) 0..M = . 1 / : i (=1

4

, u = (=1

(38)

2(i

WIKTC

jc) _ { Y . y j i \ , = " ) + T .

v;"7(.v, = </

1=1 i = r+ 1 ijU) "h.i =

=

E

> • ; " ' / ( . V , = b ) 1=1 l = r+l a n d t l u ' varinii(»'-c-u\ariaiu-(' e s t i m a t o r for is

/

l \ , = ; r V\/ + (1 + i (1 + .\/ ' I Ihi.ih + M ) li\i :rr\/ + (1 +

where $W_M$ is defined in (4.42) and

m I h l : . = I h l . i h = Ihi.bh •*' * f = | (=1

Then, since the respondents are a simple random sample of size $r$,

/

K { T . . ) = 0 s ( 1 - 0 S

_L -L ^ y i (" i i - ' . i ) () i

"u 'T, V "u / '" V ".i / \ / >•

C "II-'•u ^ f ^ i ±. J . J . f '•l.-' l.y' i \ V / V " b / ' • " f . " i l V " ( . / >• ljUt I '«;•

/

\

V / = t f v ( l 0 , s -£u.+ i ^ "f, r V ri„ y r V M„ m j i ( l _ + i [ i _ ( i t ) -r \ lia " h / "f, r \ " 6 /

Hence, $T_\infty$ overestimates the variances of $\hat\theta_{a,\infty}$ and $\hat\theta_{b,\infty}$, and underestimates the covariance of the two estimators.

Rubin (1996) explained that if the auxiliary variable $X$ is not used by the imputer to create the multiple imputation, but is used by the ultimate analyst to define estimands, then the imputation may not be proper. In the above example, the ABB imputation is proper for $\theta = \theta_a + \theta_b$, but is improper for $\theta = \theta_a - \theta_b$. Rubin (1996) argues that the multiple imputation is still confidence-proper in the sense that it produces a variance estimate that is too large.

5 Quasi-Randomization Approach

The randomization approach, which treats the population vector $Y$ as fixed, has played a dominant role in the design and analysis of sample surveys. Randomization inference requires that units be selected by probability sampling, which is characterized by the following two properties:

1. The sampling distribution is determined by the sampler before any $y$ values are known.

2. Every unit has a positive (known) probability of selection.

The key ingredient of the randomization approach, a known probability of selection, is lost when some of the data are missing.

Given the existence of nonrespondents, one approach is to regard the respondents as the second phase sample in a two-phase sample design. This is made possible by treating the $R_i$'s as random variables. It is necessary to specify a probability model for $R$. The randomization version of inference for imputation is called the quasi-randomization approach, a term suggested by Oh and Scheuren (1983).

There are two main differences between the sampling distribution and the response distribution in the quasi-randomization approach. First, the sampling distribution is determined by the sampler before any observations are taken. On the other hand, we may have different response mechanisms for different items. Second, the sampling distribution is known, under the control of the survey statistician. On the other hand, the response distribution is unknown and needs some form of modelling.

If the finite population is partitioned into $G$ imputation cells, the usual quasi-randomization approach assumes the following response mechanism.

(R.1) For each cell $g = 1, \ldots, G$, all units in cell $g$ are assumed to have the same response probability, $p_g = \Pr( R_i = 1 \mid i \in U_g )$, where $U_g$ denotes the set of indices for the $g$-th imputation cell.

(R.2) For every $i = 1, \ldots, N$, $\Pr( R_i = 1 ) > 0$.

We will call this the uniform response mechanism in the imputation cell.

Rao and Shao (1992) used weighted hot deck imputation, which selects donors with replacement with the probability of selection being proportional to the sampling weights. This produces an unbiased estimator under assumptions (R.1) and (R.2). While the procedure is often said to be design unbiased, unbiasedness requires the response probability model assumptions (R.1) and (R.2).
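Weighted hot deck donor selection can be illustrated with a short sketch. It assumes a single imputation cell, one item, and known sampling weights; the function names and data are hypothetical, and the code is not the authors' implementation. Donors are drawn with replacement with probability proportional to their weights, so each imputed value has conditional expectation equal to the weighted respondent mean.

```python
import random


def weighted_hot_deck(y, w, responded, rng):
    """Impute each nonrespondent with the value of a donor drawn with
    replacement, with probability proportional to the donor's weight."""
    donors = [i for i, r in enumerate(responded) if r]
    if not donors:
        raise ValueError("no respondents available as donors")
    donor_weights = [w[i] for i in donors]
    imputed = list(y)
    for j, r in enumerate(responded):
        if not r:
            d = rng.choices(donors, weights=donor_weights, k=1)[0]
            imputed[j] = y[d]
    return imputed


def weighted_respondent_mean(y, w, responded):
    """Expected value of any single imputed value, given the respondents."""
    num = sum(w[i] * y[i] for i, r in enumerate(responded) if r)
    den = sum(w[i] for i, r in enumerate(responded) if r)
    return num / den


# Hypothetical cell with five sampled units, two of them nonrespondents.
y = [10.0, 20.0, None, 40.0, None]
w = [1.0, 3.0, 2.0, 1.0, 2.0]
responded = [True, True, False, True, False]
completed = weighted_hot_deck(y, w, responded, random.Random(1992))
target = weighted_respondent_mean(y, w, responded)
```

Here `target` is $(1 \cdot 10 + 3 \cdot 20 + 1 \cdot 40)/5 = 22$, the value around which the imputed values are centered over repeated donor draws.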

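The adjusted jackknife variance estimator for weighted hot deck imputation, proposed by Rao and Shao (1992), is constructed by changing every imputed value for the jackknife replicate when a respondent is deleted. The variance estimator can be written as

$$\hat V_J = \sum_{k=1}^{L} c_k \big( \hat\theta_I^{(k)} - \hat\theta_I \big)^2, \qquad (5.43)$$

where

$$\hat\theta_I^{(k)} = \sum_{g=1}^{G} \sum_{j \in A_g} w_j^{(k)} \Big[ R_j y_j + ( 1 - R_j ) \big( y_j^* + \bar y_{Rg}^{(k)} - \bar y_{Rg} \big) \Big], \qquad (5.44)$$

with

$$\bar y_{Rg}^{(k)} = \Big( \sum_{j \in A_g} w_j^{(k)} R_j \Big)^{-1} \sum_{j \in A_g} w_j^{(k)} R_j y_j. \qquad (5.45)$$

Here $A_g$ denotes the set of sample units in cell $g$, and $\bar y_{Rg}$ is the quantity in (5.45) computed with the original weights $w_j$.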
