Improving respondent representativeness through adaptive design is associated with reduced nonresponse bias (Särndal, 2014; Schouten et al., 2016). Current methods for adaptive designs focus on data collection. Adaptive data collection designs are primarily a nonresponse follow-up strategy
which applies different data collection protocols to different subgroups of nonrespondents. Although intricate nonresponse follow-up strategies improve respondent representativeness, they not only complicate the inferential process but also incur additional cost (i.e., the cost of nonresponse follow-up). In contrast, this dissertation develops adaptive sampling designs that obtain representative respondents by assigning differential sampling probabilities to over- and under-represented subjects while maintaining a single, coherent data collection protocol. Two adaptive sampling designs are proposed: BSS-Z and BSS-X. Guided by the benchmark, a propensity-score-based sampling probability is used to tailor the sampling decision sequentially, attenuating the impact of undesirable nonresponse mechanisms. In a multi-replicate survey setting, the BSS-Z method sequentially conforms the frame variables of the focal survey to those of the benchmark, improving the representativeness of the respondent data. Employing a similar mechanism, the BSS-X method conforms not only the frame variables but also the survey covariates of the focal survey to those of the benchmark. Both sampling designs are evaluated in simulation experiments that mimic adaptive designs under various nonresponse mechanisms, including two types of not missing at random (NMAR) mechanisms, NMARX and NMARY. The results show that respondent representativeness improves at each successive sampling phase. The greatest benefit of this sampling paradigm is that the representative respondent pool maintains a variance-covariance structure similar to that of the benchmark, thereby producing less biased descriptive statistics.
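To make the sequential sampling step more concrete, the following is a minimal Python sketch of how a propensity-score-based sampling probability could be computed for the next phase. It assumes a single categorical frame variable and a saturated (cell-level) propensity model for benchmark membership versus the cumulative respondent pool. The function name `next_phase_probs`, the 0.5 smoothing offset, and the proportional-to-odds allocation are hypothetical choices for illustration, not the exact BSS-Z specification.

```python
from collections import Counter


def next_phase_probs(benchmark_cells, respondent_cells, frame_cells, n_target):
    """Hypothetical next-phase inclusion probabilities (not the exact BSS-Z rule).

    Each unit carries one categorical frame variable, its "cell". The odds of
    belonging to the benchmark rather than the cumulative respondent pool are
    estimated per cell (a saturated propensity model), and frame units are
    sampled in proportion to those odds, so under-represented cells are
    oversampled in the next phase.
    """
    bench = Counter(benchmark_cells)
    resp = Counter(respondent_cells)
    # Odds P(benchmark | cell) / P(respondent | cell); the 0.5 offsets keep
    # the odds finite for cells that are empty in either source.
    odds = {c: (bench[c] + 0.5) / (resp[c] + 0.5) for c in set(frame_cells)}
    raw = [odds[c] for c in frame_cells]
    scale = n_target / sum(raw)
    # Cap at 1; capping can push the expected sample size below n_target.
    return [min(1.0, scale * r) for r in raw]


if __name__ == "__main__":
    benchmark = ["A"] * 50 + ["B"] * 50
    respondents = ["A"] * 80 + ["B"] * 20  # cell B is under-represented
    frame = ["A"] * 100 + ["B"] * 100
    probs = next_phase_probs(benchmark, respondents, frame, n_target=40)
    print(f"A: {probs[0]:.4f}, B: {probs[-1]:.4f}")
```

With 80 of 100 respondents in cell A but an even split in the benchmark, units in cell B receive roughly four times the inclusion probability of units in cell A in this example.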
Prior studies on unit nonresponse imputation obtained mixed results, perhaps in part because the imputation models were built from the respondent data alone. Focal survey respondent data often carry unknown nonresponse patterns, which leads to mixed inferential results. In this dissertation, the imputation of unit nonresponse is guided by the benchmark. Benchmarked multiple imputation (B-MI), implemented after obtaining representative respondent data, is better able to recover the population structure and eliminates bias not only under ignorable nonresponse (MAR) but also under one type of nonignorable nonresponse (NMARX). We implemented Multi-
The imputation models are iteratively fit to the benchmark data and the cumulative respondent data, predicting missing information for unit nonrespondents to construct a completed dataset, i.e., respondent data plus imputed nonrespondent data. The completed data preserve a high level of population structure with respect to both marginal and joint distributions, although the biases of the estimates are not completely eliminated under NMARY missingness. The point estimates of the sample mean are unbiased under both MAR and NMARX. The greatest benefit of the proposed approach is nonresponse bias reduction under NMARX and NMARY missingness.
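A minimal sketch of the benchmarked imputation idea follows, assuming one continuous frame variable `z` observed for everyone, a linear imputation model for the survey variable `y`, and bootstrap resampling to carry parameter uncertainty across the imputations. The helper names `fit_ols` and `benchmarked_mi` and the bootstrap device are hypothetical illustrations; the B-MI models in the dissertation may be specified differently.

```python
import random
import statistics


def fit_ols(z, y):
    """Simple linear regression y = a + b*z; returns (a, b, residual sd)."""
    zbar, ybar = statistics.fmean(z), statistics.fmean(y)
    sxx = sum((zi - zbar) ** 2 for zi in z)
    b = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y)) / sxx
    a = ybar - b * zbar
    resid = [yi - a - b * zi for zi, yi in zip(z, y)]
    sd = (sum(r * r for r in resid) / (len(z) - 2)) ** 0.5
    return a, b, sd


def benchmarked_mi(bench, resp, nonresp_z, m=20, seed=1):
    """Hypothetical B-MI sketch: m completed-data estimates of the mean of y.

    bench, resp: lists of (z, y) pairs from the benchmark and the cumulative
    respondents; nonresp_z: frame values z of the unit nonrespondents.
    """
    rng = random.Random(seed)
    pooled = bench + resp  # the imputation model sees benchmark + respondents
    estimates = []
    for _ in range(m):
        # A bootstrap resample carries parameter uncertainty into each imputation.
        boot = [rng.choice(pooled) for _ in pooled]
        a, b, sd = fit_ols([p[0] for p in boot], [p[1] for p in boot])
        # Stochastic regression imputation: prediction plus a residual draw.
        imputed = [a + b * z + rng.gauss(0.0, sd) for z in nonresp_z]
        completed = [p[1] for p in resp] + imputed
        estimates.append(statistics.fmean(completed))
    return estimates
```

Each element of the returned list is the mean of one completed dataset (observed respondents plus imputed nonrespondents).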
Conventional wisdom suggests that reduced error often comes at the price of increased cost. We use simulation to examine the cost and error implications under a single-stage, four-phase simple random sample design. We developed a cost model and evaluated the cost-effectiveness of the proposed paradigm. The subject-level cost model is nonlinear, stochastic, and inversely proportional to the subject's response propensity. For error estimation we present two variance estimators: one for the current practice of post-survey adjustment (the GREG estimate) and one for an alternative adjustment (the benchmarked MI estimate). The benchmarked MI estimate suggests additional gains over GREG in bias reduction in situations where micro-level data and survey covariates are available.
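One simple way to obtain a subject-level cost that is nonlinear, stochastic, and inversely related to response propensity is to treat the number of contact attempts as geometric with success probability equal to the propensity `p`, so the expected number of attempts is `1/p` and the variable part of the expected cost is inversely proportional to `p`. The sketch below assumes that device, hypothetical unit costs (`fixed`, `per_attempt`), and no cap on attempts; it is not the cost model used in the simulations.

```python
import random


def subject_cost(p, rng, fixed=10.0, per_attempt=25.0):
    """Hypothetical cost of one sampled subject with response propensity p.

    Contact attempts are repeated until the subject responds; each attempt
    succeeds with probability p, so attempts ~ Geometric(p) with mean 1/p.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("response propensity must be in (0, 1]")
    attempts = 1
    while rng.random() >= p:  # this attempt failed; try again
        attempts += 1
    return fixed + per_attempt * attempts


def expected_cost(p, fixed=10.0, per_attempt=25.0):
    """Closed-form mean of subject_cost: fixed + per_attempt / p."""
    return fixed + per_attempt / p


if __name__ == "__main__":
    rng = random.Random(2024)
    for p in (0.8, 0.4, 0.1):
        draws = [subject_cost(p, rng) for _ in range(10_000)]
        print(p, round(sum(draws) / len(draws), 1), expected_cost(p))
```

A production cost model would typically cap the number of attempts, which turns persistent low-propensity subjects into nonrespondents rather than letting their cost grow without bound.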
Overall, while we found that traditional fixed sampling with weighting adjustments outperformed the proposed strategy on cost-effectiveness when missingness is MAR, the proposed strategy outperformed the traditional strategy when missingness is NMAR. Although bias persists under NMAR, the adaptive designs outperformed the fixed design in both bias reduction and cost-effectiveness. In practice, it is not possible to assess whether unit nonresponse is MAR or NMAR without obtaining additional data on nonrespondents (Brick, 2013). Within these real-world limits, the advantage of the proposed strategy was even more pronounced when auxiliary variables are weak and/or the survey variable (Y) is strongly correlated with the survey covariates (X). The differences in bias reduction and cost-effectiveness among the sample designs support the claim that, when the MAR assumption is violated, adaptive designs are generally superior.
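For reference, in the simplest setting (one auxiliary variable `x` with known population mean, equal base weights, and an intercept in the model) the GREG estimator of a mean reduces to the regression estimator `ybar + b * (pop_mean_x - xbar)`. The sketch below computes it together with the equivalent calibration weights; the single-auxiliary, equal-weight setting is a simplifying assumption, and in a nonresponse weighting adjustment the inputs would be the respondent data.

```python
import statistics


def greg_mean(x, y, pop_mean_x):
    """GREG estimate of the mean of y: one auxiliary x, equal weights, intercept.

    In this special case GREG reduces to the regression estimator
    ybar + b * (pop_mean_x - xbar), with b the OLS slope of y on x.
    """
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar + b * (pop_mean_x - xbar)


def greg_weights(x, pop_mean_x):
    """Calibration weights w_i with sum(w) == 1 and sum(w * x) == pop_mean_x."""
    n = len(x)
    xbar = statistics.fmean(x)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1.0 / n + (pop_mean_x - xbar) * (xi - xbar) / sxx for xi in x]
```

The weighted mean `sum(w_i * y_i)` reproduces `greg_mean`, which is why GREG can be viewed as a weighting adjustment: the weights are calibrated so that the weighted sample reproduces the known auxiliary mean.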
When the MAR assumption is violated, adaptive designs consistently have more favorable bias properties and thereby better coverage rates for confidence intervals. When the point estimates are seriously biased, benchmarked MI estimates have better coverage rates under NMARY. The consistent differences in coverage rates between the two estimators support the claim that, when the MAR assumption is violated, benchmarked MI estimators are generally superior.
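Coverage comparisons of this kind rest on forming an interval from the multiply imputed estimates and counting how often it contains the true value across simulation replicates. A minimal sketch using Rubin's combining rules follows; it assumes a normal reference distribution instead of Rubin's t with estimated degrees of freedom, which is reasonable only when those degrees of freedom are large. The function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean, variance


def rubin_interval(estimates, within_vars, level=0.95):
    """Combine m point estimates and their within-imputation variances."""
    m = len(estimates)
    q_bar = fmean(estimates)        # MI point estimate
    w = fmean(within_vars)          # average within-imputation variance
    b = variance(estimates)         # between-imputation variance (m - 1 denominator)
    total = w + (1 + 1 / m) * b     # Rubin's total variance
    # Normal approximation to the reference t distribution (assumption).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * math.sqrt(total)
    return q_bar - half, q_bar + half


def coverage_rate(intervals, truth):
    """Share of simulation intervals (lo, hi) that contain the true value."""
    return fmean([1.0 if lo <= truth <= hi else 0.0 for lo, hi in intervals])
```

In a simulation study, `rubin_interval` would be called once per replicate and `coverage_rate` applied to the resulting list of intervals, to be compared against the nominal level.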
In summary, the proposed adaptive designs are more cost-effective than the fixed sampling design when the MAR assumption is violated and weighting variables are weak predictors of survey outcome variables. Our research has important implications in an era of increasing survey cost and increasing availability of digital data, e.g., administrative data, medical records, and paradata. Combining data from various sources to produce information not available from a single source is not only inevitable but also sensible with respect to time and cost. The time is ripe for a new path forward. The proposed strategy is simple, straightforward, and readily applicable with current statistical software. Most importantly, the proposed inferential paradigm would serve as an alternative, cost-effective survey design strategy for improving the quality of survey inference.