Simulation Background - Evaluation and Comparison of the Estimators

5. Evaluation and Comparison of the Estimators

5.2. Simulation Background

The estimation of treatment effects for count data under the EOM and EPOF can be executed with many parametric count specifications. The following section characterizes the performance of the CMP, RGP, NB, and Poisson specifications under several distributions of data. To the degree possible, our simulations are guided by the health economics literature. The literature was adapted into a representative simulation scheme in an organized but unscientific manner. Studies from the health economics literature ranging as far back as 20 years were collected. Those that did not report a mean of the count variable, or that utilized a method other than parametric maximum likelihood (e.g., nonlinear least squares with an exponential mean), were not considered, leaving a total of 44 papers.

In general, count models are utilized within the field to estimate healthcare demand (e.g., physician visits) or substance abuse (e.g., alcoholic drinks per week). A majority of the count data literature is focused on the former, with physician/general practitioner (GP) visits being the primary variable of interest, and specialist visits, ER visits, prescription drug use, and inpatient hospital nights/weeks generally accounting for the rest of the healthcare demand literature.

Means of GP visits typically fell in a range between 1 and 6, while non-GP visits generally had a mean less than 1. Means from the substance abuse literature ranged from less than 1 to 99 (excepting a single outlier on each end, all other values were between 4 and 17). (See Appendix C for details.) Since GP visits comprise the bulk of the literature, the values drawn from these studies were selected to guide our simulation design.⁷ The data considered were drawn from myriad datasets and were often separated by gender; some studies pooled multiple years, and others reported only an annual mean across multiple years. Rather than attempting to approximate an unweighted “mean of the means,” we simply chose a mean that we felt was representative of the data in general. The mean selected is 3.

7 Although the substance abuse literature has higher means, we believe it is reasonable to assume that the behavior of the count models considered will not differ much between two means that are both relatively small. Of more interest are the extremely small values of the healthcare demand studies. Such small values are almost certainly created by dual data generating processes: a binary variable of requiring treatment, and a count of demand conditional on seeking treatment. In general, studies did not report conditional means, but we feel safe assuming the mean conditional on requiring specialized treatment is roughly in line with that of the unconditional mean of GP visits.

In order to keep the study focused, we limit the policy effect estimated to the average treatment effect (ATE) rather than the average marginal (AME) or incremental effects (AIE) discussed earlier. Current parametric methods of estimating nonlinear endogenous policy effects are intended to fit binary, rather than continuous (or level discrete) variables. Although the AIE and AME can be computed in the endogenous sample selection case (under the assumption of exogeneity for the policy variables), estimating the ATE in the sample selection case is congruent with the binary policy effect estimated in the endogenous treatment case.

The literature provides much less guidance regarding estimated policy effects. With few exceptions, results were reported as coefficient values rather than marginal or treatment effects. We thus selected three “true” treatment levels for our simulations.

Section 5.3 considers performance of the models estimating a “small” treatment effect (10% of the mean) and a “large” treatment effect (100% of the mean), generated using a standard Poisson distribution with log-normally distributed heterogeneity and no dependence. Ten percent is likely nearing the lower bound of what is economically significant in the case of a binary variable. Although effect sizes greater than 100% of the mean are possible, it is likely that the magnitude of the 100% effect is sufficient to serve as a “large” treatment effect, and the ability of the various estimators to estimate the ATE would likely be similar for all values greater than 100%. Section 5.4 considers data generated with both heterogeneity and dependence. These data sets have a “moderate” treatment effect of approximately 25% of the mean.

The four models are compared according to four criteria. The relative accuracy of the models is determined by computing an absolute percent bias for the coefficients, as well as the expected value of Y and the ATE, where

$$\mathrm{ABP}(\beta) = \frac{1}{k}\sum_{i=1}^{k}\left|\frac{\hat{\beta}_i - \beta}{\beta}\right| \qquad \text{(Eq. 31)}$$

and k is the number of repetitions.

Relative efficiency of estimated treatment effects is compared using the Mean Squared Error (MSE)

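$$\mathrm{MSE} = \mathrm{Var}\left(\widehat{\mathrm{ATE}}\right) + \left(\widehat{\mathrm{ATE}} - \mathrm{ATE}\right)^{2}. \qquad \text{(Eq. 32)}$$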
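The two accuracy measures above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, the variance is assumed to be taken across the k replications with a 1/k divisor, and the bias term is assumed to use the mean of the replicated ATE estimates.

```python
from statistics import fmean, pvariance


def absolute_percent_bias(estimates, true_value):
    """Average of |(estimate - true) / true| over the k replications."""
    return fmean([abs((b - true_value) / true_value) for b in estimates])


def mean_squared_error(ate_estimates, true_ate):
    """Variance of the estimates plus squared bias of their mean.

    Assumption: population variance (1/k divisor) across replications.
    """
    bias = fmean(ate_estimates) - true_ate
    return pvariance(ate_estimates) + bias ** 2


# Hypothetical replication results, for illustration only.
draws = [1.1, 0.9, 1.0, 1.2]
abp = absolute_percent_bias(draws, 1.0)
mse = mean_squared_error(draws, 1.0)
```

With the 1/k divisor, the variance-plus-squared-bias form equals the average squared deviation of the estimates from the true value, which gives a convenient consistency check.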

Goodness of fit is determined using the Akaike Information Criterion (AIC), a measure that penalizes additional parameters and commonly appears in the literature when comparing count models.⁸

AIC = 2j – 2lnL, (Eq. 33)

where j is the number of parameters in the model and lnL is the optimized value of the log-likelihood function. Although there is no test statistic to determine what a “good” fit is, the AIC provides a measure of relative fit, where smaller values indicate a superior fit to the data.⁹
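As a small illustration of how Eq. 33 ranks competing specifications, the sketch below computes the AIC for two models and picks the smaller value. The parameter counts and log-likelihoods are hypothetical placeholders, not results from the simulations.

```python
def aic(num_params, log_likelihood):
    """Akaike Information Criterion: 2j - 2 lnL."""
    return 2 * num_params - 2 * log_likelihood


# Hypothetical (j, lnL) pairs, for illustration only.
fits = {"Poisson": (3, -10250.0), "NB": (4, -10100.0)}
scores = {name: aic(j, lnl) for name, (j, lnl) in fits.items()}

# The model with the smallest AIC has the best relative fit.
best = min(scores, key=scores.get)
```

Here the extra parameter costs the second model 2 points, which is outweighed by its higher log-likelihood.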

8 For some examples of AIC in count model selection in health economics, see Deb and Trivedi (1997, 2002), Gerdtham and Trivedi (2001), Liu and Gupta (2011), and Schmitz (2012).

9 We calculated BIC for each model as well; however, the relative AIC and BIC values between models were virtually identical, and we do not report the BIC values.

All of the simulations have several components in common. Recall that X_s is the binary selection/treatment variable, X_p represents the policy variable of interest (which may be endogenous), X_o is the remaining observed data, X_u is an unobserved (scalar) confounder that enters the equations for both Y and X_s, and W⁺ is an instrumental variable for the binary selection/treatment equation. The data are generated according to the following:

X_o ~ U(0.5,1), W⁺ ~ U[0,1], X_u ~ N(0,1)

X_p = 1(u > 0.45) if exogenous (where u is standard uniformly distributed),

X_p = 1(α_o X_o + α_w W⁺ + α_c + X_u > 0) if endogenous, and

X_s = 1(α_p X_p + α_o X_o + α_w W⁺ + α_c + X_u > 0) for sample selection.

In the case of endogenous selection, [α_p α_o α_c α_w] = [-0.5 1 1 0.5], resulting in a roughly 64% probability of selection. Endogenous treatment has participation coefficients α = [-0.47 1 0.5], resulting in a 55% probability of treatment.
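The draws described above can be sketched as follows for the sample selection design with an exogenous policy variable. This is a minimal illustration under stated assumptions: the function name and dictionary keys are hypothetical, the default coefficients are read in the order printed for the selection case, and the sketch makes no attempt to reproduce the reported selection probability.

```python
import random


def simulate_design(n=5000, alpha_p=-0.5, alpha_o=1.0, alpha_c=1.0,
                    alpha_w=0.5, seed=0):
    """Draw regressors and the binary selection indicator for n observations."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x_o = rng.uniform(0.5, 1.0)   # observed regressor
        w = rng.random()              # instrument on [0, 1]
        x_u = rng.gauss(0.0, 1.0)     # unobserved confounder
        # Exogenous policy variable: 1(u > 0.45) with u standard uniform.
        x_p = 1 if rng.random() > 0.45 else 0
        # Selection index; coefficient ordering is an assumption (see lead-in).
        index = alpha_p * x_p + alpha_o * x_o + alpha_w * w + alpha_c + x_u
        x_s = 1 if index > 0 else 0
        rows.append({"x_o": x_o, "w": w, "x_u": x_u, "x_p": x_p, "x_s": x_s})
    return rows


sample = simulate_design()
share_treated = sum(r["x_p"] for r in sample) / len(sample)
```

Because u is standard uniform, the exogenous rule 1(u > 0.45) treats 55% of observations in expectation, so `share_treated` should sit close to 0.55 for n = 5000.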

As demonstrated in 1.19, the influence of β_u on the expected value of Y has a closed-form solution when X_u is standard normally distributed. β_u was selected so that unobserved heterogeneity served as a “multiplier” of roughly 10%, i.e., exp(β_u²/2) = 1.1.
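The choice of β_u can be checked with a short worked calculation. The sketch assumes that the heterogeneity enters the conditional mean multiplicatively as exp(β_u X_u), which is an assumption about the functional form rather than something restated in this section. The moment-generating function of a standard normal variable then gives

```latex
\mathrm{E}\left[\exp(\beta_u X_u)\right] = \exp\!\left(\frac{\beta_u^{2}}{2}\right) = 1.1
\quad\Longrightarrow\quad
\beta_u = \sqrt{2\ln 1.1} \approx 0.437 .
```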

For all simulations, β_u = 0.437 in the Poisson, RGP, and NB cases, and β_u = 0.437ν in the CMP scenario. Where possible, the constant β_c was selected to account for roughly 1/3 of the effect of observables on the expected value of Y: β_c = 0.334 in the Poisson, RGP, and NB cases (except for the case of the 100% treatment effect, where β_c is adjusted to hold Y constant despite the larger β_p value).¹⁰

10 Data generated according to a CMP process do not strictly follow this outline, since the model does not have a reliable closed-form solution to calculate the coefficient values. The values assigned correspond as closely as possible to the plan of assignment discussed above.

All CMP data are generated based on Eq. 4, with the true Z calculated to within a truncation error of 1e-5. Estimation using the CMP model also computes Z to within 1e-5 of the “true” value, and post-estimation computations of the expectation of Y and the ATE use Eqs. 10 and 28, respectively. (For comparison, estimations of Y and the ATE computed from Eq. 8 are reported in Appendix B.)
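One way to truncate the normalizing constant Z to a given tolerance is sketched below. It assumes that CMP denotes the Conway-Maxwell-Poisson model with its standard constant Z(λ, ν) = Σ_{j≥0} λ^j / (j!)^ν; the paper's own expression is not reproduced in this section, and the function name, the stopping rule, and the `max_terms` guard are illustrative choices rather than the authors' implementation.

```python
import math


def cmp_normalizing_constant(lam, nu, tol=1e-5, max_terms=10_000):
    """Approximate Z(lam, nu) = sum_{j>=0} lam**j / (j!)**nu by truncation."""
    if lam <= 0 or nu < 0:
        raise ValueError("lam must be positive and nu non-negative")
    total = 0.0
    for j in range(max_terms):
        # Work on the log scale so large factorials do not overflow.
        log_term = j * math.log(lam) - nu * math.lgamma(j + 1)
        term = math.exp(log_term)
        total += term
        # Ratio of the next term to the current one; it never increases in j.
        ratio = lam / (j + 1) ** nu
        # Once the ratio is below 1, a geometric series bounds the omitted tail.
        if ratio < 1.0 and term * ratio / (1.0 - ratio) < tol:
            return total
    raise RuntimeError("series did not converge within max_terms")


z_poisson = cmp_normalizing_constant(2.0, 1.0)    # nu = 1 is the Poisson case
z_geometric = cmp_normalizing_constant(0.5, 0.0)  # nu = 0 is a geometric series
```

The two calls serve as sanity checks: with ν = 1 the series sums to exp(λ), and with ν = 0 and λ < 1 it sums to 1/(1 − λ).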

Each simulation is run 500 times with n = 5000. All simulations are executed using Gauss-Legendre quadrature with ten points of support. Since the focus of the analysis is on the performance of the count specification with regard to the β coefficients and ATE, we do not report the FIML estimations of the α parameters or the predicted probability of selection/treatment.

5.3. Simulation 1—Estimating “Small” and “Large” Treatment Effects
