Chapter 4 Modelling Missingness using Chain Event Graphs
4.2 CEGs for Informed Missingness
4.2.2 Application to the UKCP
In this section I will find the MAP CEG structure for the running example, used in the previous section, given the available data from the UKCP study. The resulting model can then be used to draw inference on the effect of birth weight and visual impairment on survival, and gives an understanding of the missingness structures beyond the three established mechanisms. As I have ensured that the missingness
indicator appears before the variable with missing values within the ordering of the variables in the tree, the model selection techniques of Chapter 3.2 can be directly applied to this example and the scores of the CEGs in the model space can be calculated in closed form as before.
As in Chapter 3.2 I assume a uniform prior on the root-to-leaf paths and an equivalent sample size of 3, equal to the number of categories the birth weight variable takes. Possible informative priors for the examples of the UKCP study are discussed in Chapter 6.2. Running the AHC algorithm finds the MAP CEG to be the CEG given in Figure 4.8 with the CPVs given in Table 4.1. The predictive probabilities of survival up to or above the age of 5 are attached to the final positions in the CEG. Together with the 95% credible intervals of the posterior distribution of survival, they are 98.7% (98.4%, 99.0%) for position w5, 89.5% (87.5%, 91.3%) for w6 and 84.6% (81.8%, 87.2%) for w7. The CEG is again drawn as an Ordinal CEG, such that the positions describing the same succeeding event are vertically aligned in descending order with respect to the predictive probability of survival. The predictive probabilities of survival for positions w1, w2 and w3 can be calculated from Table 4.1, giving 96.5% for w1, 96.2% for w2 and 95.3% for w3. So, from the topology of the Ordinal CEG, a low birth weight is predicted to give the highest probability of survival, with a slightly lower probability for a very low and a normal birth weight.
[Figure 4.8 (diagram not reproduced): Ordinal CEG with root w0, positions w1 to w7 and sink w∞. Edge labels: bw very low, bw low, bw normal; missing, not missing; not severe, severe; Survival, No survival. Final positions: w5 (98.7%), w6 (89.5%), w7 (84.6%).]
Figure 4.8: Ordinal MAP CEG structure for the UKCP example describing the effect of birth weight, visual impairment and missingness on survival
Stage/Position | Conditional Probability Vector | Values
--- | --- | ---
u0 = w0 | (P(X1=Low), P(X1=Very low), P(X1=Normal)) | (0.244, 0.177, 0.579)
u1 = w1 | (P(X2=Not missing | u1), P(X2=Missing | u1)) | (0.850, 0.150)
u2 = {w2, w3} | (P(X2=Not missing | u2), P(X2=Missing | u2)) | (0.814, 0.186)
u3 = w4 | (P(X3=Not severe | u3), P(X3=Severe | u3)) | (0.894, 0.106)
u4 = w5 | (P(X4=Survival | u4), P(X4=No survival | u4)) | (0.987, 0.013)
u5 = w6 | (P(X4=Survival | u5), P(X4=No survival | u5)) | (0.895, 0.105)
u6 = w7 | (P(X4=Survival | u6), P(X4=No survival | u6)) | (0.846, 0.154)
Table 4.1: Table of CPVs associated with the MAP CEG for the UKCP example on birth weight, visual impairment and survival given in Figure 4.8
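The survival probabilities quoted for positions w1, w2 and w3 can be reproduced from Table 4.1 by propagating the CPVs along the paths of the CEG. The following is a minimal sketch of that calculation rather than the code used for the analysis. It assumes the topology described in the text: w1, w2 and w3 correspond to a low, very low and normal birth weight respectively, an observed impairment leads to w4, and a missing impairment leads to w6 from w1 and w2 and to w7 from w3. All variable and function names are illustrative.

```python
# CPVs transcribed from Table 4.1 (as probabilities, not percentages).
p_not_missing = {"w1": 0.850, "w2": 0.814, "w3": 0.814}
p_not_severe = 0.894
survival = {"w5": 0.987, "w6": 0.895, "w7": 0.846}

# Assumed topology, read from the text rather than from the figure:
# position reached when the visual impairment is missing.
missing_target = {"w1": "w6", "w2": "w6", "w3": "w7"}


def survival_at_w4():
    """Survival probability given that visual impairment is observed."""
    return p_not_severe * survival["w5"] + (1 - p_not_severe) * survival["w6"]


def survival_at(position):
    """Survival probability at w1, w2 or w3, mixing the observed and missing branches."""
    q = p_not_missing[position]
    return q * survival_at_w4() + (1 - q) * survival[missing_target[position]]


for w in ("w1", "w2", "w3"):
    print(f"{w}: {100 * survival_at(w):.1f}%")
```

Under these assumptions the sketch returns the 96.5%, 96.2% and 95.3% stated above.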
As illustrated in Chapter 2.3.2 on the CHDS example, a number of conclusions can be drawn from the CEG about the likely dependence structure of the three variables considered. The distribution of the missingness is indistinguishable for a very low and a normal birth weight, as w2 and w3 are in the same stage. Further, recall the conditional independence statement of Chapter 2.3, Equation 2.8, which states that Y(w) ⊥⊥ X(w) | E(w), where Y(w) is a variable identified with the set of paths from w0 to w, X(w) is the variable associated with the edges emanating from w, and E(w) represents the event that the individual passes through the position w. This can be used as before to read off conditional independencies between the variables in the graph by looking at the cut-sets of the graph associated with the edges emanating from the vertex subsets VR2, VX2 and VX3.
The first cut-set in the graph consists of the edges emanating from positions w1, w2 and w3. Each of these positions is reached by a unique path, and hence Y(w) ⊥⊥ X(w) | E(w) applied to w1, w2 and w3 gives only the trivial statement that the birth weight affects the missingness process. Moving further along the graph we can deduce the conditional independence Y(w4) ⊥⊥ Z(w4) | E(w4), where by Equation 2.7 Z(w4) is the variable associated with the paths from w4 to w∞. Here the event E(w4), i.e. passing through w4, corresponds to visual impairment being observed. Z(w4) describes visual impairment and survival, while Y(w4) represents the birth weight.
Hence, given that visual impairment is observed, visual impairment and survival are independent of birth weight: the joint distribution of visual impairment and survival is the same whichever of the edges w1 → w4, w2 → w4 and w3 → w4 has been taken. Finally, consider the three final positions, w5, w6 and w7, which can be interpreted as describing the ‘health state’ of the individual. X(w5), X(w6) and X(w7) describe the variable survival, and so from Y(w) ⊥⊥ X(w) | E(w) applied to w5, w6 and w7 we conclude that survival depends only on these three positions and not on the paths through which they have been reached.
As expected, the highest probability of survival is obtained when visual impairment is observed to be non-severe. In this case survival up to or above the age of 5 is predicted to be 98.7%. When visual impairment is observed to be severe, the individual is forced into the final position w6 with survival of 89.5%, which is significantly lower than survival with a non-severe disability. The poorest survival is found for individuals whose visual impairment is not observed. Here a very low or low birth weight leads to a survival probability equal to the predictive survival probability for severe impairment, while for a normal birth weight survival is predicted to be only 84.6%. This is significantly lower than survival when visual disability is observed.
We hence deduce directly from the Ordinal CEG structure that the data are unlikely to be MAR. As explained in the previous section, the expected survival probabilities under MAR for individuals for whom visual disability is missing can be calculated from the right-hand side of Equation 4.4. There, the survival probability conditional on a particular birth weight is expected to be the weighted average of the survival probabilities for individuals of that birth weight with a severe and a non-severe disability, weighted according to the probabilities of observing a severe or non-severe visual impairment. In Figure 4.8 all individuals whose visual impairment is observed go through position w4, and therefore the calculated expected probability of survival under MAR will be the same for all individuals. This is given by
98.7 × 0.894 + 89.5 × 0.106 = 97.7%,
with 95% credible interval (96.8%, 98.5%). This is compared to the predictive survival probabilities when visual impairment is missing, which correspond to the predictive survival probabilities associated with the positions w6 and w7, of 89.5% and 84.6%. We see that the predictive survival for a missing impairment is much lower and, in either case, does not lie within the calculated 95% credible interval. The conclusion is therefore that the data are unlikely to be MAR. In the situation where the individual has a normal birth weight this can be read off directly from the Ordinal CEG. For a very low or low birth weight, the missing edge leads to the same position as severe visual disability, with survival probability 89.5%. Figure 4.8 suggests that the data are not MAR; however, this needed to be calculated explicitly to draw reliable conclusions (compare Figure 4.6).
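The MAR comparison can be sketched in a few lines. The snippet below is illustrative only: it takes the CPV of w4 and the survival probabilities of w5, w6 and w7 from Table 4.1, takes the 95% credible interval (96.8%, 98.5%) as quoted rather than recomputing it from the posterior, and uses variable names of my own choosing.

```python
# Values from Table 4.1, with survival in percent.
p_not_severe, p_severe = 0.894, 0.106
surv_not_severe, surv_severe = 98.7, 89.5  # positions w5 and w6

# Expected survival under MAR: weighted average over the observed impairment.
expected_mar = surv_not_severe * p_not_severe + surv_severe * p_severe

# 95% credible interval for the expected value, as quoted in the text
# (taken as given here, not derived from the posterior).
ci_low, ci_high = 96.8, 98.5

# Predictive survival when the impairment is missing (positions w6 and w7).
survival_when_missing = {"w6": 89.5, "w7": 84.6}
outside_interval = {
    position: not (ci_low <= value <= ci_high)
    for position, value in survival_when_missing.items()
}

print(f"expected survival under MAR: {expected_mar:.1f}%")
for position, is_outside in outside_interval.items():
    print(f"{position} lies outside the MAR interval: {is_outside}")
```

Both predictive probabilities for a missing impairment fall below the lower end of the interval, which is the basis for the conclusion drawn above.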
Having found the MAP CEG structure for the tree given in Figure 4.1, the hypothesis that the data are MCAR can also be examined. The first requirement for this is that w1, w2 and w3 are in the same stage, which would suggest that there is no evidence that missingness depends on the birth weight of the individual, such that R2 ⊥⊥ X1. However, this is only the case for w2 and w3 and not for w1. The second requirement, that missingness is independent of visual disability (R2 ⊥⊥ X2), has also been shown to be implausible by the above.
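The first MCAR requirement is a statement about the stage partition and can be checked mechanically. The sketch below encodes the stages of the missingness positions as listed in Table 4.1; the dictionary layout and the helper `in_same_stage` are hypothetical and serve only to illustrate the check.

```python
# Stage partition of the missingness positions, as listed in Table 4.1.
stages = {"u1": {"w1"}, "u2": {"w2", "w3"}}


def in_same_stage(positions, stages):
    """Return True if all given positions belong to one and the same stage."""
    return any(set(positions) <= members for members in stages.values())


# The first MCAR requirement needs w1, w2 and w3 to share a stage.
print(in_same_stage({"w1", "w2", "w3"}, stages))  # False
print(in_same_stage({"w2", "w3"}, stages))        # True
```

The second requirement cannot be read from the stage partition alone and relies on the comparison of survival probabilities made earlier.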
Note that again the equivalent sample size can be varied to check whether the selected model is sensitive to the strength of the uniform prior. Doing so shows that the only observed change is that an individual with a very low birth weight and severe visual impairment also moves into position w7 as the equivalent sample size increases. This happens because the uniform prior implies a survival probability of 50% a priori, which causes a reduction in the predictive probability of survival for the relatively small number of individuals with a very low birth weight and severe impairment. The Ordinal CEG then proposes that the data may be MAR for a very low birth weight, as now the position reached when the individual has a very low birth weight and a missing impairment lies between the other two positions. However, calculations such as those performed above, and the close predictive probabilities of w7 and w6 in comparison to w5, show that this is not the case.
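The shrinkage effect described here can be illustrated with a simple Beta-Binomial calculation. This is a sketch under stated assumptions and not a reproduction of the analysis: the counts are hypothetical (the UKCP counts are not given in this section), and `prior_weight` stands for the prior mass that the uniform prior places on each of the two survival edges, which in the CEG depends on the equivalent sample size and on the number of root-to-leaf paths through the position.

```python
def posterior_mean_survival(survived, total, prior_weight):
    """Posterior mean of survival under a symmetric Beta(prior_weight, prior_weight) prior."""
    return (prior_weight + survived) / (2 * prior_weight + total)


# Hypothetical counts chosen for illustration only: both groups have an
# observed survival rate of 90%, but very different sizes.
small_group = (36, 40)
large_group = (3600, 4000)

for prior_weight in (0.25, 1.0, 5.0, 20.0):
    small = posterior_mean_survival(*small_group, prior_weight)
    large = posterior_mean_survival(*large_group, prior_weight)
    print(f"prior weight {prior_weight:>5}: small group {small:.3f}, large group {large:.3f}")
```

As the prior weight grows, the posterior mean for the small group is pulled noticeably towards 50%, while the large group is almost unaffected. This is the mechanism by which a small group can change position as the equivalent sample size increases.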