Comparison of Four Methods for RANDHIE Data (Beta Differences)

CHAPTER 5 RESULTS

5.4 Comparison of Four Methods for RANDHIE Data (Beta Differences)

In this section, the results based on differences in beta estimates for RANDHIE data are presented for (i) CC analyses; (ii) Rubin’s MI; (iii) Galimard et al’s MI; and (iv) Ogundimu & Collins’ MI are presented in Tables 5.9 - 5.12.

Table 5.9 Beta difference between CC analysis method for RANDHIE datasets with missing data and the complete dataset

15% 30% 50%

MCAR MAR MNAR MCAR MAR MNAR MCAR MAR MNAR

β diff β diff β diff β diff β diff β diff β diff β diff β diff Intercept -0.0367 -0.1182 0.01452 0.00152 -0.02 0.13779 -0.0367 0.38958 0.37639 logc 0.00303 -0.0006 0.00994 -0.0044 -0.0101 0.00148 0.00303 0.00455 0.02757 linc 0.01644 0.03865 -0.0168 -0.0079 0.0001 -0.0086 0.01644 -0.0187 -0.0166 lfam -0.0097 -0.0274 0.03388 0.01678 0.02619 0.02134 -0.0097 0.04654 0.0761 xage -0.0022 -0.0032 0.00068 0.00152 0.0009 -0.0002 -0.0022 -0.0021 -0.0013 female -0.0218 -0.0821 0.04526 -0.0012 -0.0252 -0.126 -0.0218 0.03588 -0.0733 child -0.0965 -0.1551 0.08459 0.01193 -0.0349 -0.106 -0.0965 -0.0461 -0.2198 Green  lowest value among its own missing percentage

Yellow  medium value among its own missing percentage Red  highest value among its own missing percentage

Table 5.10 Beta difference between Rubin’s MI method for RANDHIE datasets with missing data and the complete dataset

15% 30% 50%

MCAR MAR MNAR MCAR MAR MNAR MCAR MAR MNAR

β diff β diff β diff β diff β diff β diff β diff β diff β diff Intercept -0.0331 -0.1275 -0.02 0.03252 -0.0419 0.15165 -0.0565 0.33058 0.32677 logc 0.00036 -0.0022 0.00823 -0.0054 -0.0076 -0.0026 0.00365 0.00541 0.02933 linc 0.01818 0.03863 -0.014 -0.0093 0.00582 -0.0111 0.01739 -0.0178 -0.0154 lfam -0.0203 -0.0398 0.0316 0.01722 0.00969 0.03667 -0.0167 0.05186 0.08573 xage -0.0022 -0.0024 0.00079 0.00107 0.00025 8.56E-06 -0.0019 -0.0016 -0.0018 female -0.0139 -0.0708 0.04662 0.00012 -0.0285 -0.1514 -0.0133 0.0502 -0.0373 child -0.0913 -0.1149 0.11042 -0.0003 -0.0319 -0.0878 -0.0831 -0.0264 -0.1998 Green  lowest value among its own missing percentage

Yellow  medium value among its own missing percentage Red  highest value among its own missing percentage

Table 5.11 Beta difference between Galimard et. al’s MI method of RANDHIE datasets with missing data and the complete dataset

15% 30% 50%

MCAR MAR MNAR MCAR MAR MNAR MCAR MAR MNAR

Yellow  medium value among its own missing percentage Red  highest value among its own missing percentage

Table 5.12 Beta difference between Ogundimu & Collins’ MI method of RANDHIE datasets with missing data and the complete dataset

15% 30% 50%

MCAR MAR MNAR MCAR MAR MNAR MCAR MAR MNAR

β diff β diff β diff β diff β diff β diff β diff β diff β diff Intercept -0.302 -0.2537 -0.0886 -0.5646 -0.745 -0.2253 -1.166 -1.2862 -0.9738 logc 0.00045 -0.012 -0.0126 -0.0011 -0.0144 -0.0151 -0.0058 -0.0067 -0.0082 linc 0.02112 0.00749 -0.006 0.03791 0.01364 -0.0123 0.0106 0.02361 -0.0034 lfam -0.0252 0.02149 0.01039 -0.0645 0.02158 0.00526 0.04152 0.04566 0.05388 xage -0.0034 0.00431 0.00138 -0.0037 0.01207 0.00052 0.00238 0.0128 0.00384 female -0.0017 -0.0082 0.01353 -0.0667 -0.059 0.06897 0.0539 -0.1671 0.06869 child -0.0907 -0.1763 -0.0438 -0.1938 -0.2952 -0.0802 0.22471 -0.0758 -0.0953 Green  lowest value among its own missing percentage

Yellow  medium value among its own missing percentage Red  highest value among its own missing percentage

Interpretation of Results Based on Table 5.9 (Beta difference between CC analysis method for RANDHIE datasets with missing data and the complete dataset) For the beta difference of CC analysis and the complete dataset in the RANDHIE dataset, MCAR datasets have the most variables with the least amount of difference from the complete data for all 3 missing percentages. When the missing data percentage was low (15%), the MAR data had the most coefficients with highest difference from the complete data. But as the missing data increased (30% and 50%), MNAR data had the more coefficients that are most different from the complete data. The results make sense in the way that CC analysis performed best on MCAR data, with MAR coming in second while MNAR had the worst performance for each missing data percentage; it is also valid that the method’s performance on MAR and MNAR data changes as the percentage of missing data increases because MNAR (as a non-ignorable missing mechanism) is more sensitive to biases in CC analysis.

Interpretations of Results Based on Table 5.10 (Beta difference between Rubin’s MI method for RANDHIE datasets with missing data and the complete dataset) For Rubin’s MI on RANDHIE, when missing data was 15%, MCAR data had the most variables (5) with the smallest beta differences while MAR had the most variables (6) with the largest beta differences. For 30% missing data, MAR and MCAR data had the most variables (2) with the smallest beta differences while MNAR had the most variables (4) with the largest beta differences. For 50% missing data, Rubin’s MI performed best for MCAR data with 3 and 1 coefficients being the least and most different from complete data respectively. For MAR with 50% missing, 2 and 3 coefficients were the least and most different from complete data respectively. For MNAR with 50% missing, Rubin’s MI performed the worst with 2 and 1 coefficients being the most and least different from the complete data. This result for 15% missing data was unexpected because Rubin’s MI is expected to deliver results that are least different from the complete data for the MAR data and most different from complete data for MNAR data. The results for MI on 30% and 50% missing data show that Rubin’s MI deliver results closer to the actual results for MCAR and MAR data than MNAR data.

Interpretation of Results Based on Table 5.11 (Beta difference between Galimard et. al’s MI method for RANDHIE datasets with missing data and the complete dataset) The beta difference of the variables when Galimard et. al’s method was applied to RANDHIE is identical

results least different from the actual results for MCAR data, MAR data second, and MNAR with results most different from the actual results.

Interpretation of Results Based on Table 5.12 (Beta difference between Ogundimu & Collins’ MI method for RANDHIE datasets with missing data and the complete dataset) For Ogundimu & Collins’ MI in 15% and 30% missing percentages, the MNAR datasets had the most variables with coefficients least different from the complete dataset. For 15% missing, the MNAR had 4 coefficients that were least different from the complete data while for 30% missing, the MNAR had 4. For 50% missing, Ogundimu & Collins’ MI method delivered the most variables with coefficients that were most similar to the complete data in MCAR data while MNAR data had the most variables that were most different from the complete data. This suggests that Ogundimu & Collins’ method works well on MNAR data when the missing data percentage is 30% or lower while for anything above that it works better on MCAR and MAR data.

In document Comparison of Two Newly Developed Multiple Imputation Methods for MNAR Cross-Sectional Data (Page 60-66)