
CHAPTER 3: DOUBLY ROBUST SUPPORT VECTOR


3.3.2 Simulation results

Let the oracle error be the out-of-sample prediction error achieved when the training set is fully observed. We define the above oracle prediction error (AOPE) as the out-of-sample prediction error minus the oracle error. The AOPE represents the loss in accuracy due to missing data in the covariates. Tables B.1–B.4 in Section B.4 (page 117) of the appendix report the AOPE for each competing method in each of the simulation settings.
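For a single simulated dataset, the AOPE can be computed directly from test-set predictions. The following is a minimal Python sketch, assuming ±1 class labels and the misclassification rate as the prediction error; the function names `misclassification_rate` and `aope` are hypothetical and are not taken from the dissertation's code.

```python
import numpy as np


def misclassification_rate(y_true, y_pred):
    """Fraction of test observations whose predicted label differs from the truth."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true != y_pred))


def aope(y_test, pred_method, pred_oracle):
    """Above oracle prediction error, in percentage points.

    pred_method: test-set predictions from a classifier trained on the
        training set with missing covariates.
    pred_oracle: test-set predictions from a classifier trained on the
        fully observed training set (the oracle).
    """
    err_method = misclassification_rate(y_test, pred_method)
    err_oracle = misclassification_rate(y_test, pred_oracle)
    return 100.0 * (err_method - err_oracle)


# Toy example: the method misclassifies 2 of 4 test points, the oracle 1 of 4.
y_test = [1, 1, -1, -1]
print(aope(y_test, pred_method=[1, -1, -1, 1], pred_oracle=[1, 1, -1, 1]))  # 25.0
```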

The number of predictors is an important factor in the performance of each missing data method. With more covariates, the stability of the resulting classifier improves. Consider the interquartile range (IQR) of the AOPE within each simulation setting. With two covariates, mean imputation has the smallest median IQR across all settings, at 2.4 percentage points, and DRSVM has the largest, at 3.5 percentage points. Thus, all methods exhibit comparable variability in AOPE. With ten covariates, the median IQR drops for each method: the smallest is 1 percentage point (mean imputation) and the largest is 1.9 percentage points (DRSVM). Roughly speaking, each method's median IQR decreases by about 50% between two and ten predictors. The complete case method is the exception: its median IQR remains essentially unchanged between two and ten predictors. For all but the complete case method, the improved stability as the number of covariates increases is expected. In these simulations, the number of observations and the number of observations with missing data remain constant while the number of covariates increases, so a single missing covariate represents a much greater information loss in the two-covariate setting than in the ten-covariate setting. Following the same heuristic, the stability of the complete case method does not improve because that method removes the entire observation: a single missing covariate costs the same fraction of information in the two-covariate setting as in the ten-covariate setting.
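The summary statistics in this paragraph, and the information-loss heuristic behind them, can be sketched in Python as follows. This is an illustrative sketch only: the helper names are hypothetical, quartiles are assumed to be computed with `np.percentile` and its default linear interpolation, and the design of 200 observations, 50 of them incomplete with exactly one missing covariate each, is an assumption rather than the dissertation's simulation design.

```python
import numpy as np


def iqr(values):
    """Interquartile range (75th minus 25th percentile) of a sample."""
    q75, q25 = np.percentile(values, [75, 25])
    return float(q75 - q25)


def median_iqr(aope_by_setting):
    """Median, across simulation settings, of the within-setting IQR of AOPE.

    aope_by_setting maps a setting label to the AOPE values of its replicates.
    """
    return float(np.median([iqr(v) for v in aope_by_setting.values()]))


def percent_decrease(before, after):
    """Relative decrease, in percent, from `before` to `after`."""
    return 100.0 * (before - after) / before


def fraction_unavailable(n_obs, n_incomplete, p, complete_case):
    """Fraction of covariate values the classifier cannot use, assuming each
    incomplete observation is missing exactly one of its p covariates.

    Imputation-type methods lose only the missing value itself; the complete
    case method drops all p values of every incomplete observation.
    """
    lost_per_incomplete = p if complete_case else 1
    return n_incomplete * lost_per_incomplete / (n_obs * p)


# Median IQRs reported in the text (percentage points), two vs. ten covariates.
print(round(percent_decrease(2.4, 1.0), 1))  # mean imputation: 58.3
print(round(percent_decrease(3.5, 1.9), 1))  # DRSVM: 45.7

# Hypothetical design: 200 observations, 50 of them incomplete.
for p in (2, 10):
    print(p,
          fraction_unavailable(200, 50, p, complete_case=False),
          fraction_unavailable(200, 50, p, complete_case=True))
```

Under these assumptions the unavailable fraction falls from 0.125 to 0.025 for imputation-type methods as p grows from 2 to 10, but stays at 0.25 for the complete case method, which mirrors the heuristic explanation for why only the complete case method fails to stabilize.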

Along with stability, the number of predictors affects the overall accuracy of each method. Averaging over all simulation settings, the improvement in median prediction error when increasing the number of predictors from two to ten ranged from 3.6 percentage points (KNN imputation) to 0.6 percentage points (mean imputation). The complete case method is the only method that did not improve with more covariates.

Focusing on specific classifiers, the clearest (and expected) result from the simulation study is that the complete case classifier performs well only in MCAR situations (β = 0). When missingness depends on both the covariates and the outcome (β ≠ 0), the prediction error of the complete case classifier is, on average, 5 percentage points higher than when missingness depends on the covariates but not the outcome. On average, the other classification methods did not perform differently under one missing data model than under the other.
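A short calculation shows why outcome-dependent missingness hurts the complete case classifier specifically: a classifier fit only to complete cases estimates P(Y | X, complete) rather than P(Y | X). The sketch below assumes, purely for illustration, a logistic model for the probability that an observation is complete, with coefficient β on the outcome y ∈ {−1, +1}; this model, its coefficients `a0` and `a1`, and the function `prob_pos_among_complete` are hypothetical and do not reproduce the dissertation's exact missingness mechanism.

```python
import math


def expit(t):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-t))


def prob_pos_among_complete(p_pos, a0, a1, beta, x):
    """P(Y = +1 | X = x, observation complete), by Bayes' rule.

    Hypothetical missingness model (an assumption, not the dissertation's):
    P(complete | X = x, Y = y) = expit(a0 + a1 * x + beta * y), y in {-1, +1}.
    p_pos is P(Y = +1 | X = x) in the full population.
    """
    r_pos = expit(a0 + a1 * x + beta)  # completion probability when y = +1
    r_neg = expit(a0 + a1 * x - beta)  # completion probability when y = -1
    num = p_pos * r_pos
    return num / (num + (1.0 - p_pos) * r_neg)


# Missingness depends on x only (beta = 0): the conditional class
# probability among complete cases equals the population value.
print(prob_pos_among_complete(0.5, a0=0.0, a1=1.0, beta=0.0, x=0.8))

# Missingness also depends on y (beta = -6): positives are rarely complete,
# so the complete cases badly misrepresent P(Y = +1 | X = x).
print(prob_pos_among_complete(0.5, a0=0.0, a1=1.0, beta=-6.0, x=0.8))
```

With β = 0 the completion probability cancels from numerator and denominator, which is why, under this toy model, dropping incomplete observations leaves the conditional class probabilities unchanged even though missingness depends on the covariates.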

Considering the settings where the kernel choice matches the boundary type, most missing data methods performed better in the linear boundary settings, and the difference was most pronounced with fewer predictors. The exception is the DRSVM classifier, which performed about 1 percentage point worse in the nonlinear settings even with 10 predictors.

Averaging over the other factors, the performance of some classifiers did not vary when the covariate distribution changed from normal to χ². However, the complete case classifier performed 2.5 percentage points worse in the χ² setting. Similarly, the accuracy of DRSVM decreased by 1.5 percentage points when the covariates were χ²-distributed, while the performance of the other competitors, including DWSVM, stayed within 0.5 percentage points.

The simulation results highlight situations in which the DRSVM performs well and situations in which it does not. In the first scenario of Table 3.1, the DRSVM performs best at all levels of β except the extreme case β = −6. The DRSVM is most beneficial in situations with few covariates and considerable missing data. The method appears to work much better with the linear kernel than with the Gaussian kernel. The DRSVM struggles with the Gaussian kernel largely because of issues related to the approximate objective function, over-fitting, and tuning parameter selection.

Recall that, in order to maintain convexity, the objective function is approximated as in equation (3.3), in which the data have been reorganized into an augmented set of observations of types (a), (b), and (c) as described above. Operationally, the approximation re-labels observations of type (b). Thus each complete case observation contributes to the augmented data a single observation of type (c) and K observations of type (b). The weighted average loss of those contributed observations equals the loss contribution of the complete case observation to the standard (not augmented) objective function (3.2), provided the rule classifies all of the contributed observations to the same class. The linear kernel will classify each contributed observation to the same class except for contributed observations close to the classifier boundary. The Gaussian kernel, because it can generate complex boundaries, can produce boundaries under which the contributed observations from a single complete case are classified into different classes. In such situations, the loss contribution from the complete cases is poorly approximated by equation (3.3). This poor approximation is a form of over-fitting and can be remedied by selecting appropriate tuning parameters. On a dataset-by-dataset basis, the user can inspect the outcomes to ensure that over-fitting does not occur. This simulation study highlights the need for an automated solution to this issue in situations where dataset-by-dataset review is not feasible. Potential ideas for an automated solution are discussed in the conclusions.
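One way to carry out the dataset-by-dataset inspection described above is to check, for each complete case, whether the fitted rule assigns all of its contributed augmented observations to the same class. The sketch below is a hypothetical diagnostic, not a procedure from the dissertation: `mixed_label_fraction`, the toy points, and the two decision functions (a linear rule, and a sine-based rule standing in for an overly flexible Gaussian-kernel boundary) are all illustrative assumptions.

```python
import numpy as np


def mixed_label_fraction(decision_fn, groups):
    """Fraction of complete cases whose contributed augmented observations
    are not all assigned to the same class by the fitted rule.

    decision_fn: maps an (n, d) array to n decision values; a value >= 0
        is read as the +1 class.
    groups: list of (n_i, d) arrays, one per complete case, holding the
        feature vectors that case contributes to the augmented data.
    """
    n_mixed = 0
    for g in groups:
        positive = np.asarray(decision_fn(np.asarray(g, dtype=float))) >= 0
        if positive.any() and not positive.all():
            n_mixed += 1
    return n_mixed / len(groups)


def linear_rule(X):
    """Smooth rule with a linear boundary x1 + x2 = 0."""
    return X[:, 0] + X[:, 1]


def wiggly_rule(X):
    """Highly flexible rule whose boundary oscillates in x1."""
    return np.sin(10.0 * X[:, 0])


# Three toy complete cases, each contributing a few nearby points.
groups = [
    np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]),
    np.array([[-1.0, -1.0], [-0.9, -1.2]]),
    np.array([[0.1, -0.05], [-0.1, 0.05], [0.05, 0.0]]),
]

print(mixed_label_fraction(linear_rule, groups))  # only the case near the boundary is split
print(mixed_label_fraction(wiggly_rule, groups))  # every case is split
```

A large fraction would flag a fit for which the approximation in equation (3.3) may be poor; no particular threshold is implied, and choosing one would itself be part of the automated solution the text calls for.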

In document Stewart_unc_0153D_15589.pdf (Page 82-85)