Modelling Missing Data Complete-data model:

(1)

Modelling Missing Data Complete-data model:

◦ independent and identically distributed (iid) draws Y ¹ , . . . , Y n

◦ from multivariate distribution P θ , Y i = (Y i1 , . . . , Y ip ) ^T ∼ P ^θ .

◦ Data matrix:

Y = (Y 1 , . . . , Y n ) ^T = (Y ij ) i=1,...,n;j=1,...,p

1 2 ...

n

Y i1 Y i2 · · ·

· · · Y ip

Units

Inference: estimate θ

◦ likelihood approach Ã maximize likelihood function

◦ Bayesian approach Ã consider posterior distribution Likelihood function of the complete data:

f Y (y|θ) = Q ⁿ

i=1 f Y i (y i |θ) where f Y i ( ·|θ) is the density of P ^θ

Maximum Likelihood with Missing Data, Mar 30, 2004 - 1 -

Modelling Missing Data

Problem: Some data Y ij may be missing

Example: Cholesterol levels of heart-attack patients Data:

◦ Serum-cholesterol levels for n = 28 patients treated for heart attacks.

◦ Cholesterol levels were measured for all patients 2 and 4 days after the attack.

◦ For 19 of the 28 patients, an additional measurement was taken 14 days after the attack.

◦ See also Schafer, sections 5.3.6 and 5.4.3.

Id Y 1 Y 2 Y 3 Id Y 1 Y 2 Y 3

1 270 218 156 15 294 240 264 2 236 234 — 16 282 294 — 3 210 214 242 17 234 220 264 4 142 116 — 18 224 200 — 5 280 200 — 19 276 220 188 6 272 276 256 20 282 186 182 7 160 146 142 21 360 352 294 8 220 182 216 22 310 202 214 9 226 238 248 23 280 218 — 10 242 288 — 24 278 248 198 11 186 190 168 25 288 278 — 12 266 236 236 26 288 248 256 13 206 244 — 27 244 270 280 14 318 258 200 28 236 242 204

Maximum Likelihood with Missing Data, Mar 30, 2004 - 2 -

Modelling Missing Data Idea: describe missingness of Y ij by indicator variable R ij

R ij = ½ 1 Y ij has been observed 0 Y ij is missing Multivariate dataset with missing values:

Y i1 Y i2 · · · Y ip R i1 R i2 · · · R ip

Units 1, . . . , n 1 1 1 1

. . . , n 2 0 1 1

. . . , n 3 1 0 1

. . . , n 4 1 1 0

. . . , n 5 0 0 1

. . . , n 6 0 1 0

. . . , n 7 0 1 0

. . . , n 8 0 0 0

Observed-data model:

◦ The observed information consists of

¦ observed values y ^ij

¦ the values r ^ij indicating which values are missing

◦ A statistical model for the observations should make use of all available information

Notation:

Y = (Y obs , Y mis ) where

◦ Y ^obs are the observed variables

◦ Y ^mis are the missing variables

Missing Data Mechanism

Statistical model for missing data:

(R = r |Y = y) = f R|Y (r |y, ξ) with parameters ξ ∈ Ξ.

Definition Missing completely at random (MCAR) The observations are missing completely at random if

(R = r |Y = y) = (R = r) or equivalently

f _R|Y (r |y, ξ) = f R (r, ξ), that is, R and Y are independent.

Definition Missing at random (MAR) The observations are missing at random if

(R = r |Y = y) = (R = r|Y ^obs ) or equivalently

f R |Y (r|y, ξ) = f R |Y ^obs (r|y obs , ξ),

that is, knowledge about Y mis does not provide any additional infor- mation about R if Y obs is already known (R and Y mis are conditionally independent given Y obs ).

Definition Not missing at random (NMAR)

Observations are not missing at random if the above assumption for

MAR is not fulfilled.

(2)

Missing Completely at Random (MCAR)

Example: Bernoulli selection

◦ Complete data: Y 1 , . . . , Y n

◦ Response indicator: R 1 , . . . , R n

◦ Missingness mechanism: Each unit is observed with probability ξ f R (r |ξ) =

n

Q

i=1

ξ ^r ⁱ (1 − ξ) ¹ ^−r ⁱ

Example: Simple random sample (SRS)

◦ Suppose that only m observations are available, n − m values are miss- ing.

◦ Responding units are simple random sample of all units:

f R (r |ξ) = (¡ _n

m

¢ −1 if P n i=1 r i = m

0 otherwise

◦ Note that ξ = m is a parameter of the distribution of R

Maximum Likelihood with Missing Data, Mar 30, 2004 - 5 -

Missing at Random (MAR)

Example: Hypertension trials

Data: Serial measurements of blood pressure during long-term trials of drugs which reduce blood pressure

Problem: Patients are withdrawn during the course of the study because of inadequate blood pressure control.

Example: from: Murray and Findlay (1988), Correcting for the bias caused by drop- outs in hypertension trials, Statistics in Medicine 7, 941-946.

◦ International double-blind randomized study with 429 patients

◦ 218 receive β-blocker metoprolol

◦ 211 receive serotonergic S 2 -receptor blocking agent ketanserin

◦ Treatment phase lasted 12 weeks

◦ Scheduled clinic visits for weeks 0, 2, 4, 8, and 12

◦ Patients with diastolic blood pressure exceeding 110 mmHg in week 4 or week 8 “jump” to an open follow-up phase.

◦ Missing data pattern for the two groups:

Metoprolol group 0 2 4 8 12 Number of patients

1 1 1 1 1 162

1 1 1 1 0 14 (11 jumps)

1 1 1 0 1 2

1 1 1 0 0 30 (28 jumps)

1 1 0 1 1 1

1 1 0 0 0 2

1 0 1 1 1 2

1 0 1 0 1 1

1 0 1 0 0 1

1 0 0 1 1 1

1 0 0 0 1 1

1 0 0 0 0 1

Ketanserin group 0 2 4 8 12 Number of patients 1 1 1 1 1 136 1 1 1 1 0 19 (18 jumps)

1 1 1 0 1 1

1 1 1 0 0 41 (37 jumps)

1 1 0 1 1 2

1 1 0 0 0 4

1 0 1 1 1 2

1 0 1 1 0 1

1 0 1 0 1 1

1 0 1 0 0 1

1 0 0 1 1 2

1 0 0 1 0 1

1 0 0 0 0 1

◦ “Jumps” (majority of missing observations) lead to missing data which are missing at random

Maximum Likelihood with Missing Data, Mar 30, 2004 - 6 -

Missing at Random (MAR) Mean diastolic blood pressure based

a. on all available data at each time point b. on patients with complete data

c. on all available data, assuming missing data to be missing at random

Weeks on Treatment

Mean DBP (mmHg)

0 4 8 12

90 95 100 105 110

Ketanserin Metoprolol

(a)

Weeks on Treatment

Mean DBP (mmHg)

0 4 8 12

90 95 100 105 110

Ketanserin Metoprolol

(b)

Weeks on Treatment

Mean DBP (mmHg)

0 4 8 12

90 95 100 105 110

Ketanserin Metoprolol

(c)

Missing at Random (MAR) Examples for data missing at random:

◦ Double sampling: sample survey:

¦ observe variables Y i1 , . . . , Y ik for all individuals i = 1, . . . , n

¦ observe variables Y i,k+1 , . . . , Y ip only for subsample

◦ Sampling with nonresponse followup: sample survey with nonresponses

¦ intensive followup effort for all units not feasible

¦ take random sample of all nonresponding units

◦ Randomized experiment with unequal numbers of cases per treatment group:

¦ suppose original design was balanced

¦ missingness mechanism is deterministic, thus data are MAR

◦ Matrix sampling for questionnaire or test items

¦ divide test or questionnaire into sections

¦ groups of sections are administered to subjects in a randomized

fashion

(3)

Not Missing at Random (NMAR) Example: Medical treatment of blood pressure

n individuals are treated for high blood pressure

◦ Observations: binary responses

Y ij = blood pressure of ith individual on jth day

◦ Missing observations: (not every individual shows up every day)

R ij = ½ 1 ith individual appears for measurement on jth day 0 otherwise

◦ Statistical model for Y and R:

Y ij

iid ∼ N (µ ⁱ , σ ² ) R ij

iid ∼ Bin ³

1, 1 − _{1 + exp(Y} ^exp(Y ^ij ⁾

ij )

´

Ã that is, individuals with high blood pressure are less likely to turn up for a measurement

Maximum Likelihood with Missing Data, Mar 30, 2004 - 9 -

Not Missing at Random (NMAR)

Example: Randomly censored data

◦ Objective: Analysis of data (e.g. survival times) T 1 , . . . , T n

iid ∼ f(·|θ)

◦ Survivor function S(t): Probability of surviving beyond time t S(t) = (T 1 > t)

◦ Censoring times associated with T i ’s:

C 1 , . . . , C n iid ∼ g(·|ξ)

(e.g. due to loss to follow up, drop out, or termination of the study)

◦ Observations: (Y ¹ , R 1 ), . . . , (Y n , R n ) where Y i = min(T i , C i )

and

R i = ½ 1 if T i ≤ C i

0 otherwise

Maximum Likelihood with Missing Data, Mar 30, 2004 - 10 -

Observed-Data Likelihood

Likelihood of complete observations (Y obs , R) L n (θ, ξ |Y ^obs , R) = f Y obs ,R (Y obs , R |θ, ξ)

= Z

f Y,R (Y obs , y mis , R |θ, ξ) dy ^mis

= Z

f _R|Y (R |Y ^obs , y mis |ξ)f ^Y (Y obs , y mis |θ) dy ^mis If the missing observations are missing at random (MAR), the first factor does not depend on y mis and thus is not affected by integration over y mis :

L n (θ, ξ |Y ^obs , R) = Z

f _R|Y (R |Y ^obs |ξ)f Y (Y obs , y mis |θ) dy ^mis

= f _R|Y (R |Y ^obs |ξ) Z

f Y (Y obs , y mis |θ) dy ^mis

= f R |Y (R |Y ^obs |ξ)f ^Y ^obs (Y obs |θ)

= f R |Y (R |Y ^obs |ξ)L ⁿ (θ |Y ^obs )

If the parameters ξ do not depend on θ, we get θ = argmax ˆ

θ ∈Θ

L n (θ, ξ |Y ^obs , R) ⇔ ˆθ = argmax

θ ∈Θ

L n (θ |Y ^obs ),

that is, we can obtain the maximum likelihood estimator by minimizing L n (θ |Y ^obs ) with respect to θ.

Definition Observed-data likelihood The likelihood function

L n (θ |Y ^obs ) = f Y obs (Y obs |θ)

is called the likelihood ignoring the missing-data mechanism or short observed-data likelihood.

Ignorability

Definition

A missing-data mechanism is ignorable for likelihood inference if

◦ observations are missing at random (MAR) and

◦ the parameters ξ (missingness-mechanism) and θ θ (data model) are distinct, in the sense that the joint parameter space of (θ, ξ) is the product of the parameter spaces Ξ and Θ.

Example: Non-distinct parameters Suppose that

◦ Y ⁱ ^iid ∼ Bin(1, θ)

◦ R ⁱ ^iid ∼ Bin(1, θ)

◦ Y ^obs = (Y 1 , . . . , Y m ) ^T (after reordering) The joint likelihood of Y obs and R is

L n (θ |Y ^obs , R) = Q ⁿ

i=1

θ ^r ⁱ (1 − θ) ¹ ^−r ⁱ Q ^m

i=1

θ ^y ⁱ (1 − θ) ¹ ^−y ⁱ

= θ ^m+Y (1 − θ) ⁿ ^−Y

This is the likelihood of a Bernoulli experiment with n + m observations.

Consequently θ ˆ ML = m + Y

m + n .

Ignoring the missing-data mechanism: The observed-data likelihood is L n (θ|Y obs ) =

m

Q

i=1

θ ^y ⁱ (1 − θ) ¹ ^−y ⁱ which leads to the ML estimator

θ ˆ ML = Y

m .

(4)

Ignorability

The first condition (MAR) is typically regarded as the more important condition:

◦ If MAR does not hold, then the maximum likelihood estimator based on the observed-data likelihood can be seriously biased.

◦ If the data are MAR but distinctness does not hold, inference based on the observed-data likelihood L n (θ |Y obs ) is still valid from the frequentist perspective, but not fully efficient.

Example:

Suppose

◦ Y i

iid ∼ N (µ, σ ² )

◦ R i

iid ∼ Bin ¡1, e ^µ /(1 + e ^µ ) ¢ Ã parameter are not distinct

Likelihood

µ

−0.4 −0.2 0.0 0.2 0.4

Ln

(

µ,1|Yobs

)

Ln

(

µ,1|R

)

L_n

(

µ,1|R,Yobs

)

MLE based on L

n

( θ|Y

obs

)

µ

^

obs

Frequency

−0.4 −0.2 0.0 0.2 0.4

0 30 60 90 120 150 180

MLE based on L

n

( θ|Y

obs

,R )

µ^

_compl

Frequency

−0.4 −0.2 0.0 0.2 0.4

0 30 60 90 120 150 180

Frequentist interpretation:

◦ Parameter θ 0 = (µ 0 , σ ² 0 ) is fixed.

◦ If R i

iid ∼ Bin(1, ¹ 2 ) then:

missingness mechanism ignorable ˆ

µ obs MLE for θ

◦ If µ ⁰ = 0 then R i

iid ∼ Bin(1, ¹ 2 ) ÃProperties of ˆ µ obs unchanged.

◦ However, ˆµ ^compl is more efficient.

Maximum Likelihood with Missing Data, Mar 30, 2004 - 13 -

Univariate Data with Missing Observations Example: Incomplete univariate data

Unit

Y R

1 2 ...

m ...

n 1 1 ...

1 0 ...

0 Suppose that

◦ Y ¹ , . . . , Y n is an iid random sample

◦ only Y ^obs = (Y 1 , . . . , Y m ) ^T have been observed

◦ Y ^mis = (Y m+1 , . . . , Y n ) ^T are missing at random The observed-data likelihood is

L n (θ|Y obs ) = Z

f Y (Y obs , y mis |θ) dy mis

= Z

· · · Z _m

Q

i=1

f Y i (Y i |θ) Q ⁿ

i=m+1

f Y i (y i |θ) dy m+1 · · · dy n

= Q ^m

i=1

f Y i (Y i |θ) Z

· · ·

Z _n

Q

i=m+1

f Y i (y i |θ) dy ^m+1 · · · dy n

=

m

Q

i=1

f Y i (Y i |θ)

Thus the observed-data likelihood is a complete-data likelihood based on the reduced sample (Y 1 , . . . , Y m ).

Maximum Likelihood with Missing Data, Mar 30, 2004 - 14 -

Multivariate Normal Distribution Let Y = (Y 1 , . . . , Y p ) ^T ∼ N (µ, Σ).

Properties

◦ If p = 2

Σ = Ã σ ² ₁ σ 12

σ 12 σ ² 2

!

=

Ã σ ² ₁ ρσ 1 σ 2

ρσ 1 σ 2 σ ² 2

!

where ρ = σ 12

σ 1 σ 2

is the correlation between Y 1 and Y 2 .

◦ Density of Y

f Y (y |µ, Σ) = (2π) ⁻ ^p ² (det(Σ)) ⁻ ¹ ² exp ¡ − ¹ 2 (y − µ) ^T Σ ⁻¹ (y − µ) ¢,

◦ Linear transformations

AY + b ∼ N (Aµ + b, AΣA ^T ).

◦ Marginal distribution of Y ¹ Y 1 ∼ N (µ ¹ , σ ² 1 )

◦ Conditional distribution of Y 2 given Y 1

Y 2 |Y ¹ ∼ N ¡β 0 + β 1 Y 1 , σ ² ₂ _|1 ¢, with parameters

β 0 = µ 2 − β ¹ µ 1

β 1 = σ 12

σ 1 ²

σ ² ₂ _|1 = σ ² ₂ − σ 12 ²

σ ² 1

Bivariate Normal Data

Example: Bivariate data with one variable subject to nonresponse

1 2 ...

m ...

n

1 1 ...

1 0 ...

0 Y 1 Y 2 R 2

Units

Suppose that

Y i = (Y i1 , Y i2 ) ^{T iid} ∼ N (µ, Σ)

with mean µ = (µ 1 , µ 2 ) ^T and covariance matrix

Σ = Ã σ ² 1 σ 12

σ 12 σ 2 ²

! .

The observed-data likelihood is given by L n (θ |Y ^obs ) = Q ^m

i=1

f Y i (Y i |µ, Σ) Q ⁿ

i=m+1

f Y i1 (Y i1 |µ ¹ , σ ² ₁ )

= Q ^m

i=1

f Y i2 |Y i1 (Y i2 |Y ⁱ¹ , β 0 , β 1 , σ ² 2 |1 ) Q ⁿ

i=1

f Y i1 (Y i1 |µ ¹ , σ 1 ² ).

(5)

Bivariate Normal Data

Example: Bivariate data with one variable subject to nonresponse Data: Y i = (Y i1 , Y i2 ) ^{T iid} ∼ N (µ, Σ)

◦ Parameter: µ ¹ = µ 2 = 0, σ 1 ² = σ ² 2 = 1, ρ

◦ Y ⁱ¹ completely observed

◦ Y ⁱ² has missing values

Missing data mechanism: Proportion of missing data 1 − p = 20%

◦ Missing completely at random (MCAR):

R i2

iid ∼ Bin(1, p)

◦ Missing at random (MAR):

R i2 = 0 if Y i1 > z p

◦ Not missing at random (NMAR):

R i2 = 0 if Y i2 > z p

Simulation study:

◦ 1000 repetitions for ρ = 0, 0.5, 0.9

◦ Simulated means (standard variations) for ˆµ ^CC and ˆ µ M L

MCAR MAR NMAR

ˆ

µ CC µ ˆ M L µ ˆ CC µ ˆ M L µ ˆ CC µ ˆ M L

ρ = 0 0.00046 0.00048 -0.00095 -0.00143 -0.34585 -0.34587 (0.10762) (0.10797) (0.10876) (0.12250) (0.08760) (0.08776) ρ = 0.5 0.00557 0.00534 -0.17073 0.00459 -0.34919 -0.29226

(0.11294) (0.11229) (0.10585) (0.11850) (0.08406) (0.08171) ρ = 0.9 -0.00218 -0.00208 -0.31229 -0.00130 -0.34964 -0.10023

(0.11012) (0.10093) (0.09034) (0.10361) (0.08460) (0.08877)

Maximum Likelihood with Missing Data, Mar 30, 2004 - 17 -

Bivariate Normal Data

Missing Values in Y 2 (MCAR)

Y 1 Y 2

−3 −2 −1 0 1 2 3

−3

−2

−1 0 1 2 3

observed missing

Missing Values in Y 2 (MAR)

Y 1 Y 2

−3 −2 −1 0 1 2 3

−3

−2

−1 0 1 2 3

observed missing

Missing Values in Y 2 (NMAR)

Y 1 Y 2

−3 −2 −1 0 1 2 3

−3

−2

−1 0 1 2 3

observed missing

Maximum Likelihood with Missing Data, Mar 30, 2004 - 18 -

Bivariate Normal Data

Complete−case (MCAR)

µ ^ _CC

Frequency

−0.8 0 −0.4 0.0 0.4

50 100 150 200

250 ρ = 0

Complete−case (MAR)

µ ^ CC

Frequency

−0.8 0 −0.4 0.0 0.4

50 100 150 200

250 ρ = 0

Complete−case (NMAR)

µ ^ _CC

Frequency

−0.8 0 −0.4 0.0 0.4

50 100 150 200

250 ρ = 0

MLE (MCAR)

µ

^ _ML

Frequency

−0.8 0 −0.4 0.0 0.4

50 100 150 200

250 ρ = 0

MLE (MAR)

µ

^ ML

Frequency

−0.8 0 −0.4 0.0 0.4

50 100 150 200

250 ρ = 0

MLE (NMAR)

µ

^ _ML

Frequency

−0.8 0 −0.4 0.0 0.4

50 100 150 200

250 ρ = 0

Bivariate Normal Data

Complete−case (MCAR)

µ ^ _CC

Frequency

−0.8 0 −0.4 0.0 0.4

50 100 150 200

250 ρ = 0.5

Complete−case (MAR)

µ ^ CC

Frequency

−0.8 0 −0.4 0.0 0.4

50 100 150 200

250 ρ = 0.5

Complete−case (NMAR)

µ ^ _CC

Frequency

−0.8 0 −0.4 0.0 0.4

50 100 150 200

250 ρ = 0.5

MLE (MCAR)

µ

^ _ML

Frequency

−0.8 0 −0.4 0.0 0.4

50 100 150 200

250 ρ = 0.5

MLE (MAR)

µ

^ ML

Frequency

−0.8 0 −0.4 0.0 0.4

50 100 150 200

250 ρ = 0.5

MLE (NMAR)

µ

^ _ML

Frequency

−0.8 0 −0.4 0.0 0.4

50 100 150 200

250 ρ = 0.5

(6)

Bivariate Normal Data

Complete−case (MCAR)

µ ^ _CC

Frequency

−0.8 0 −0.4 0.0 0.4

50 100 150 200

250 ρ = 0.9

Complete−case (MAR)

µ ^ CC

Frequency

−0.8 0 −0.4 0.0 0.4

50 100 150 200

250 ρ = 0.9

Complete−case (NMAR)

µ ^ _CC

Frequency

−0.8 0 −0.4 0.0 0.4

50 100 150 200

250 ρ = 0.9

MLE (MCAR)

µ

^ _ML

Frequency

−0.8 0 −0.4 0.0 0.4

50 100 150 200

250 ρ = 0.9

MLE (MAR)

µ

^ ML

Frequency

−0.8 0 −0.4 0.0 0.4

50 100 150 200

250 ρ = 0.9

MLE (NMAR)

µ

^ _ML

Frequency

−0.8 0 −0.4 0.0 0.4

50 100 150 200

250 ρ = 0.9

Maximum Likelihood with Missing Data, Mar 30, 2004 - 21 -

Bivariate Normal Data

Example: Bivariate data with both variables subject to nonresponse

1 ...

l l + 1

...

m m + 1

...

n

1 ...

1 0 ...

0 1 ...

1 1 ...

1 0 ...

0 Y 1 Y 2 R 1 R 2

Units

Data: Y i = (Y i1 , Y i2 ) ^{T iid} ∼ N (µ, Σ)

◦ Both variables subject to nonresponse The observed-data likelihood is given by

L n (θ |Y ^obs ) = Q ^l

i=1

f Y i (Y i |µ, Σ) Q ^m

i=l+1

f Y i1 (Y i1 |µ ¹ , σ 1 ² ) Q ⁿ

i=m+1

f Y i2 (Y i2 |µ ² , σ ² 2 )

In this case, no appropriate parametrization which leads to a factorization into likelihoods with distinct parameters can be found.

Maximum Likelihood with Missing Data, Mar 30, 2004 - 22 -

Censored Variables

Example: Randomly censored data

◦ Objective: Analysis of data Y 1 , . . . , Y n

iid ∼ N (µ, σ ² )

◦ Fixed censoring at c:

R i = ½ 1 Y i ≤ c 0 otherwise

The likelihood of the complete observations is L n (θ |Y ^obs , R) = Q ^m

i=1

f Y i (Y i |µ, σ ² ) Q ⁿ

i=m+1

Z _∞

c

f Y i (y i |µ, σ ² ) dy i

=

m

Q

i=1

f Y i (Y i |µ, σ ² )

n

Q

i=m+1

S Y i (c)

=

m

Q

i=1

1 σ ϕ µ Y i − µ

σ

¶ _n

Q

i=m+1

· 1 − Φ µ c − µ σ

¶¸

where

S Y i (y) = 1 − Φ µ y − µ σ

¶

is the survivor function of Y i . Notation:

◦ ϕ(y) is the density of the standard normal distribution, ϕ(y) = 1

√ 2π exp ³

− 1 2 y ² ´

.

◦ Φ(y) is the corresponding cumulative distribution function (CDF),

Φ(y) = Z y

−∞

ϕ(z) dz.

Censored Variables

Example: Randomly censored data Derivation of the likelihood:

We have for the missing data mechanism

(R i = r i |Y ) = (R i = r i |Y i ) = 1 ( −∞,c] (Y i ) ^r ⁱ (1 − 1 ( −∞,c] (Y i )) ^1−r ⁱ Hence the likelihood of the complete data (including R) is

L n (θ |Y, R) = Q ⁿ

i=1

f Y i (Y i |θ) Q ⁿ

i=1

1 _(−∞,c] (Y i ) ^R ⁱ (1 − 1 (−∞,c] (Y i )) ¹ ^−R ⁱ

Since R i = 1 and thus 1 ( −∞,c] (Y i ) ^R ⁱ = 1 for uncensored observations Y i , we have

L n (θ |Y ^obs , R)

= Z _m

Q

i=1

f Y i (Y i |θ) Q ⁿ

i=m+1

f Y i (y i |θ)1 ⁽ −∞,c] (y i ) ^R ⁱ 1 (c, ∞) (y i ) ¹ ^−R ⁱ dy m+1 · · · dy ⁿ

Modelling Missing Data Complete-data model:

Modelling Missing Data Complete-data model:

◦ independent and identically distributed (iid) draws Y 1 , . . . , Y n

◦ from multivariate distribution P θ , Y i = (Y i1 , . . . , Y ip ) T ∼ P θ .

◦ Data matrix:

Y = (Y 1 , . . . , Y n ) T = (Y ij ) i=1,...,n;j=1,...,p

1 2 ...

n

Y i1 Y i2 · · ·

· · · Y ip

Units

Inference: estimate θ

◦ likelihood approach Ã maximize likelihood function

◦ Bayesian approach Ã consider posterior distribution Likelihood function of the complete data:

f Y (y|θ) = Q n

i=1 f Y i (y i |θ) where f Y i ( ·|θ) is the density of P θ

Maximum Likelihood with Missing Data, Mar 30, 2004 - 1 -

Modelling Missing Data

Problem: Some data Y ij may be missing

Example: Cholesterol levels of heart-attack patients Data:

◦ Serum-cholesterol levels for n = 28 patients treated for heart attacks.

◦ Cholesterol levels were measured for all patients 2 and 4 days after the attack.

◦ For 19 of the 28 patients, an additional measurement was taken 14 days after the attack.

◦ See also Schafer, sections 5.3.6 and 5.4.3.

Id Y 1 Y 2 Y 3 Id Y 1 Y 2 Y 3

Maximum Likelihood with Missing Data, Mar 30, 2004 - 2 -

Modelling Missing Data Idea: describe missingness of Y ij by indicator variable R ij

R ij = ½ 1 Y ij has been observed 0 Y ij is missing Multivariate dataset with missing values:

Y i1 Y i2 · · · Y ip R i1 R i2 · · · R ip

Units 1, . . . , n 1 1 1 1

. . . , n 2 0 1 1

. . . , n 3 1 0 1

. . . , n 4 1 1 0

. . . , n 5 0 0 1

. . . , n 6 0 1 0

. . . , n 7 0 1 0

. . . , n 8 0 0 0

Observed-data model:

◦ The observed information consists of

¦ observed values y ij

¦ the values r ij indicating which values are missing

◦ A statistical model for the observations should make use of all available information

Notation:

Y = (Y obs , Y mis ) where

◦ Y obs are the observed variables

◦ Y mis are the missing variables

Missing Data Mechanism

Statistical model for missing data:

(R = r |Y = y) = f R|Y (r |y, ξ) with parameters ξ ∈ Ξ.

Definition Missing completely at random (MCAR) The observations are missing completely at random if

(R = r |Y = y) = (R = r) or equivalently

f R|Y (r |y, ξ) = f R (r, ξ), that is, R and Y are independent.

Definition Missing at random (MAR) The observations are missing at random if

(R = r |Y = y) = (R = r|Y obs ) or equivalently

f R |Y (r|y, ξ) = f R |Y obs (r|y obs , ξ),

that is, knowledge about Y mis does not provide any additional infor- mation about R if Y obs is already known (R and Y mis are conditionally independent given Y obs ).

Definition Not missing at random (NMAR)

Observations are not missing at random if the above assumption for

MAR is not fulfilled.

Missing Completely at Random (MCAR)

Example: Bernoulli selection

◦ Complete data: Y 1 , . . . , Y n

◦ Response indicator: R 1 , . . . , R n

◦ Missingness mechanism: Each unit is observed with probability ξ f R (r |ξ) =

n

Q

i=1

ξ r i (1 − ξ) 1 −r i

Example: Simple random sample (SRS)

◦ Suppose that only m observations are available, n − m values are miss- ing.

◦ Responding units are simple random sample of all units:

f R (r |ξ) = (¡ n

m

¢ −1 if P n i=1 r i = m

0 otherwise

◦ Note that ξ = m is a parameter of the distribution of R

Maximum Likelihood with Missing Data, Mar 30, 2004 - 5 -

Missing at Random (MAR)

Example: Hypertension trials

Data: Serial measurements of blood pressure during long-term trials of drugs which reduce blood pressure

◦ independent and identically distributed (iid) draws Y ¹ , . . . , Y n

◦ from multivariate distribution P θ , Y i = (Y i1 , . . . , Y ip ) ^T ∼ P ^θ .

Y = (Y 1 , . . . , Y n ) ^T = (Y ij ) i=1,...,n;j=1,...,p

f Y (y|θ) = Q ⁿ

i=1 f Y i (y i |θ) where f Y i ( ·|θ) is the density of P ^θ

¦ observed values y ^ij

¦ the values r ^ij indicating which values are missing

◦ Y ^obs are the observed variables

◦ Y ^mis are the missing variables

f _R|Y (r |y, ξ) = f R (r, ξ), that is, R and Y are independent.

(R = r |Y = y) = (R = r|Y ^obs ) or equivalently

f R |Y (r|y, ξ) = f R |Y ^obs (r|y obs , ξ),

ξ ^r ⁱ (1 − ξ) ¹ ^−r ⁱ

f R (r |ξ) = (¡ _n