
V. Calibration with Grouped Data


where J_n = 1_n 1_n′ is an n × n matrix of all ones.

In practice, σ₀², V, and x₀ are usually unknown and must be estimated from the data. In such cases, an approximate pivot (essentially a Wald statistic), denoted Q, can be obtained by replacing σ₀², V, and x₀ in the above equation with their respective estimates σ̂₀², V̂, and x̂₀. This suggests an approximate 100(1 − α)% confidence interval for x₀ of

$$\hat{\mathcal{J}}_{\mathrm{cal}}(x) = \left\{ x : z_{\alpha/2} < Q < z_{1-\alpha/2} \right\}, \tag{5.10}$$

where z_{α/2} and z_{1−α/2} denote the α/2 and 1 − α/2 quantiles of a standard normal distribution, respectively. As with the approximate predictive pivot (Equation (3.14)), Equation (5.10) is unlikely to yield a closed-form solution, so the interval must be obtained numerically. For the simulated data example, a 95% inversion interval based on Equation (5.10), corresponding to y₀ = 0.75, is (0.5859, 0.9779), which is very similar to the Wald-based intervals obtained earlier.
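Because Equation (5.10) generally has no closed form, each endpoint can be found with a one-dimensional root finder. The sketch below rests on several assumptions that are not in the text:

- The calibration curve μ(x; β̂) is a monotone Python function.
- The denominator of Q is held fixed at a hypothetical value `se`, for example its value at x̂₀.
- The search range `[lower, upper]` brackets both endpoints.

Under these assumptions the code solves Q(x) = z_{α/2} and Q(x) = z_{1−α/2} by bisection. The curve 2(1 − e^{−x}) and se = 0.1 are made-up values, not the simulated data from the text.

```python
import math
from statistics import NormalDist


def bisect(f, lo, hi, tol=1e-10, max_iter=200):
    """Root of f on [lo, hi] by bisection; f(lo) and f(hi) must differ in sign."""
    flo = f(lo)
    if flo * f(hi) > 0:
        raise ValueError("root is not bracketed by [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if abs(fmid) < tol or hi - lo < tol:
            return mid
        if flo * fmid < 0:
            hi = mid             # sign change lies in [lo, mid]
        else:
            lo, flo = mid, fmid  # sign change lies in [mid, hi]
    return 0.5 * (lo + hi)


def inversion_interval(y0, mu, se, lower, upper, alpha=0.05):
    """Solve {x : z_{alpha/2} < Q(x) < z_{1-alpha/2}} numerically.

    Q(x) = (y0 - mu(x)) / se, with se held fixed (hypothetically, its value
    at x0-hat). mu must be monotone on [lower, upper], and [lower, upper]
    must bracket both endpoints.
    """
    std_normal = NormalDist()

    def q(x):
        return (y0 - mu(x)) / se

    ends = [bisect(lambda x, c=c: q(x) - c, lower, upper)
            for c in (std_normal.inv_cdf(alpha / 2), std_normal.inv_cdf(1 - alpha / 2))]
    return min(ends), max(ends)


# Hypothetical nonlinear calibration curve and standard error (not the text's data).
lo, hi = inversion_interval(0.75, lambda x: 2.0 * (1.0 - math.exp(-x)), se=0.1,
                            lower=0.0, upper=5.0)
```

The curve here is nonlinear, so the resulting interval is not symmetric about x̂₀ = ln 1.6. This illustrates why inversion intervals need not match the Wald form.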

5.5 Parametric bootstrap

In Section 3.3.3, we discussed how to calculate calibration intervals based on the nonparametric bootstrap. A crucial assumption of the ordinary nonparametric bootstrap, however, is that the data are independent; for reasons discussed earlier, this assumption is typically not valid for grouped data. Nonetheless, a different kind of bootstrap, the parametric bootstrap, has shown promise as a serious inferential tool; for examples, see McCulloch et al. (2008, p. 342) and Efron (2011). When applicable, the parametric bootstrap typically gives answers similar to a Bayesian analysis with uninformative priors, but it is much faster than MCMC simulation. Bootstrap simulations usually require only a few thousand iterations, whereas traditional MCMC simulations may require tens or even hundreds of thousands of iterations. The parametric bootstrap essentially entails sampling from the fitted model itself, rather than sampling (with replacement) from the data. For controlled calibration in a mixed model setting, we propose the following algorithm (essentially a parametric version of Algorithm 1 based on the LMM instead of the ordinary LM):

Algorithm 2: Parametric bootstrap for controlled calibration in LMMs.

for r = 1 to R do

(1) generate q new values of the random effects, denoted α*_r, from a N(0, Ĝ) distribution;

(2) generate N new errors, denoted ε*_r, from a N(0, σ̂²I) distribution;

(3) set y*_r = Xβ̂ + Zα*_r + ε*_r;

(4) refit the original model using y*_r as the response vector to obtain β̂*_r;

(5) generate y*_0r from a N(y₀, σ̂₀²) distribution;

(6) compute x̂*_0r = μ⁻¹(y*_0r; β̂*_r).

end
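The steps of Algorithm 2 can be sketched in Python under assumptions that the text does not make:

- The model is a balanced random-intercept model, y_ij = β₀ + β₁x_j + α_i + ε_ij, with the same n calibration levels in each of q groups, so Ĝ = σ̂_α² I_q.
- The calibration curve is a straight line, so μ⁻¹(y; β) = (y − β₀)/β₁.
- The "estimated" parameters are passed in by hand.

For this balanced design the GLS estimate of β does not depend on the variance components and coincides with OLS. Step (4) therefore refits only β, by OLS. A general LMM would need a full refit (e.g., by REML). All function and argument names are hypothetical.

```python
import random
import statistics


def ols_line(x, y):
    """OLS intercept and slope (equal to GLS for a balanced random-intercept design)."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1


def pb_calibration(x_levels, q, beta_hat, sigma_alpha_hat, sigma_hat,
                   y0, sigma0_hat, R=9999, seed=1):
    """Return R parametric bootstrap replicates of x0-hat (a sketch of Algorithm 2)."""
    rng = random.Random(seed)
    x = [xj for _ in range(q) for xj in x_levels]    # N = q * n design points
    group = [i for i in range(q) for _ in x_levels]  # group index of each row
    b0, b1 = beta_hat
    reps = []
    for _ in range(R):
        alpha = [rng.gauss(0.0, sigma_alpha_hat) for _ in range(q)]  # (1) alpha* ~ N(0, G-hat)
        eps = [rng.gauss(0.0, sigma_hat) for _ in x]                 # (2) eps* ~ N(0, sigma-hat^2 I)
        y = [b0 + b1 * xi + alpha[g] + e                             # (3) y* = X beta-hat + Z alpha* + eps*
             for xi, g, e in zip(x, group, eps)]
        b0_r, b1_r = ols_line(x, y)                                  # (4) refit to get beta-hat*
        y0_r = rng.gauss(y0, sigma0_hat)                             # (5) y0* ~ N(y0, sigma0-hat^2)
        reps.append((y0_r - b0_r) / b1_r)                            # (6) x0* = mu^{-1}(y0*; beta-hat*)
    return reps


# Hypothetical inputs (not the simulated data from the text); R kept small for speed.
reps = pb_calibration([0.0, 0.25, 0.5, 0.75, 1.0], q=10, beta_hat=(0.0, 1.0),
                      sigma_alpha_hat=0.1, sigma_hat=0.05, y0=0.75,
                      sigma0_hat=0.1, R=2000)
```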

Note that only steps (5) and (6) are specific to calibration. Similar parametric bootstrap schemes have also been proposed for mixed models in general. For example, we can condition on the current values of the random effects by skipping step (1) of Algorithm 2 and using the current EBLUP, α̃, in place of α*_r. Semiparametric variants of Algorithm 2 that involve sampling directly from the EBLUPs and residuals have also been proposed. Morris (2002), however, considers this to be bad practice because it consistently underestimates the true variation in the data.
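The conditional variant replaces step (1) with fixed EBLUPs. Assume, as an illustration only, a balanced random-intercept model with n observations per group. In that case the EBLUP of α_i has the closed shrinkage form

α̃_i = nσ̂_α² / (nσ̂_α² + σ̂²) × (mean residual in group i).

The sketch computes these EBLUPs and then draws one bootstrap response with the random effects held at α̃. Only new errors are generated. Function names and inputs are hypothetical.

```python
import random
import statistics


def eblup_intercepts(y_by_group, x_levels, beta_hat, sigma_alpha_hat, sigma_hat):
    """EBLUPs alpha-tilde_i = n s_a^2 / (n s_a^2 + s^2) * (mean residual in group i)."""
    b0, b1 = beta_hat
    n = len(x_levels)
    shrink = n * sigma_alpha_hat ** 2 / (n * sigma_alpha_hat ** 2 + sigma_hat ** 2)
    return [shrink * statistics.fmean([yij - b0 - b1 * xj for yij, xj in zip(yi, x_levels)])
            for yi in y_by_group]


def conditional_response(x_levels, beta_hat, alpha_tilde, sigma_hat, rng):
    """One bootstrap response with step (1) skipped: y* = X beta-hat + Z alpha-tilde + eps*."""
    b0, b1 = beta_hat
    return [[b0 + b1 * xj + a + rng.gauss(0.0, sigma_hat) for xj in x_levels]
            for a in alpha_tilde]


# Hypothetical toy data: two groups, two calibration levels each.
alpha_tilde = eblup_intercepts([[0.5, 1.5], [-0.5, 0.5]], [0.0, 1.0], (0.0, 1.0), 1.0, 1.0)
y_star = conditional_response([0.0, 1.0], (0.0, 1.0), alpha_tilde, 0.05, random.Random(1))
```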

We applied Algorithm 2 to the simulated data from Figure 5.2. A histogram of the R = 9,999 bootstrap replicates of x̂₀ is shown in Figure 5.3. Not surprisingly, the distribution is reasonably symmetric and approximately normal; the normal Q-Q plot confirms this. These bootstrap replicates were used to produce the last two confidence intervals in Table 5.1. For comparison, we have also included the calibration intervals computed in the previous sections. The results are all very similar, and there is little reason here to prefer one interval over another.

The Wald-based intervals are symmetric by construction. As Figure 5.3 shows, symmetry is not unrealistic for this example (this is not the case for the example given in the next section). The inversion and bootstrap intervals, by contrast, need not be symmetric about x̂₀. The bootstrap approach has the additional advantage of providing an estimate of the entire sampling distribution of x̂₀.

It should be noted, though, that the parametric bootstrap assumes that the model specified for the data is correct! If the data were not normal, for instance, all of these intervals would likely produce misleading results. In the next section, we discuss a potential remedy for non-Gaussian LMMs, that is, LMMs that do not assume a specific distribution for the random effects or the errors.
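Table 5.1 labels its two (PB) rows as a normal interval and a percentile interval. The sketch below computes both from a list of bootstrap replicates of x̂₀, using two assumed conventions:

- **Normal interval:** the bias-corrected form (x̂₀ − bias) ± z_{1−α/2} × bootstrap SD, where bias = mean of the replicates − x̂₀.
- **Percentile interval:** takes the (R + 1)p-th order statistics.

The text does not say which variants produced the table. Function names are hypothetical.

```python
import statistics
from statistics import NormalDist


def order_statistic(values, p):
    """(R + 1)p-th smallest value, clamped to [1, R] (one common convention)."""
    s = sorted(values)
    k = min(max(int(round((len(s) + 1) * p)), 1), len(s))
    return s[k - 1]


def pb_intervals(x0_hat, reps, alpha=0.05):
    """Normal (bias-corrected) and percentile intervals from bootstrap replicates."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se_boot = statistics.stdev(reps)        # bootstrap SE of x0-hat
    bias = statistics.fmean(reps) - x0_hat  # bootstrap estimate of bias
    normal = (x0_hat - bias - z * se_boot, x0_hat - bias + z * se_boot)
    percentile = (order_statistic(reps, alpha / 2),
                  order_statistic(reps, 1 - alpha / 2))
    return normal, percentile, se_boot
```

With R = 9,999 replicates, as in the text, and α = 0.05, the (R + 1)p rule selects exactly the 250th and 9,750th order statistics. This is one reason replicate counts such as 9,999 are popular.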

Table 5.1: Approximate 95% calibration intervals for the simulated balanced random intercept example. The intervals based on the parametric bootstrap are labeled (PB).

Interval          Estimate   Lower 2.5%   Upper 97.5%   Length   SE
Wald              0.7819     0.5859       0.9778        0.3920   0.0999
Crude interval    0.7819     0.5915       0.9722        0.3807   0.0971
Inversion         0.7819     0.5859       0.9779        0.3920   NA
Normal (PB)       0.7819     0.5873       0.9742        0.3870   0.0987
Percentile (PB)   0.7819     0.5895       0.9789        0.3893   0.0987

[Figure 5.3 panels: histogram of the bootstrap values (density scale) and normal Q-Q plot (sample quantile vs. theoretical quantile).]

Figure 5.3: Bootstrap distribution of x̂₀ obtained using Algorithm 2. The dotted red curve represents a normal distribution with mean x̂₀ and standard deviation estimated from the bootstrap replicates x̂*_0r. The vertical black line indicates the position of x̂₀.

5.5.1 Parametric bootstrap adjusted inversion interval.

Although we favor the bootstrap confidence intervals obtained directly from the R bootstrap replicates of x̂₀, researchers are likely more familiar with the inversion and Wald-based intervals discussed in the previous two sections. These intervals, however, use quantiles from a standard normal distribution (i.e., they rely on normal approximations). The parametric bootstrap can improve upon these intervals by replacing the standard normal quantiles with more accurate ones. For instance, for the inversion interval, at each iteration of Algorithm 2 we compute

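$$Q^\star_r = \frac{y^\star_{0r} - \mu\left(\hat{x}_0; \hat{\beta}^\star_r\right)}{\sqrt{\hat{\sigma}_0^{2\star} + X_0^\top \left(X^\top \hat{V}^{\star -1} X\right)^{-1} X_0}},$$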

where the denominator is evaluated at x₀ = x̂₀. This yields R bootstrap values Q*_1, …, Q*_R. Let γ*_{α/2} and γ*_{1−α/2} denote the sample α/2 and 1 − α/2 quantiles of the Q*_r, respectively. A bootstrap-adjusted inversion interval for x₀ is then given by

$$\hat{\mathcal{J}}^\star_{\mathrm{cal}}(x) = \left\{ x : \gamma^\star_{\alpha/2} < Q < \gamma^\star_{1-\alpha/2} \right\}. \tag{5.11}$$
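Equation (5.11) has a closed form in one special case, assumed here purely for illustration: a straight-line calibration curve μ(x; β) = β₀ + β₁x, with the denominator of Q held fixed at its value at x̂₀. Then Q is linear in x, and each endpoint solves Q(x) = γ*.

The sketch takes the bootstrap values Q*_r as input. Computing them requires refitting the LMM at each iteration of Algorithm 2, which is not shown. Names are hypothetical. If the γ* quantiles were replaced by the normal quantiles z_{α/2} and z_{1−α/2}, the same formula would give the unadjusted interval (5.10) for this linear case.

```python
def order_statistic(values, p):
    """(R + 1)p-th smallest value, clamped to [1, R] (one common convention)."""
    s = sorted(values)
    k = min(max(int(round((len(s) + 1) * p)), 1), len(s))
    return s[k - 1]


def adjusted_inversion_linear(y0, b0, b1, se, q_star, alpha=0.05):
    """{x : gamma_lo < Q(x) < gamma_hi} for Q(x) = (y0 - b0 - b1*x) / se, b1 != 0."""
    g_lo = order_statistic(q_star, alpha / 2)      # gamma*_{alpha/2}
    g_hi = order_statistic(q_star, 1 - alpha / 2)  # gamma*_{1-alpha/2}
    # Q(x) = g  <=>  x = (y0 - b0 - g * se) / b1
    ends = [(y0 - b0 - g * se) / b1 for g in (g_lo, g_hi)]
    return min(ends), max(ends)
```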

We illustrate this on the bladder volume example in Section 5.8. A similar adjustment can also be made for the Wald-based interval; this is very similar to the studentized bootstrap procedure outlined in steps (6)-(7) of Algorithm 1.
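For the Wald-based interval, the usual studentized (bootstrap-t) construction works as follows:

1. Form t*_r = (x̂*_0r − x̂₀)/SE*_r, where SE*_r is the Wald standard error recomputed in replicate r.
2. Take the interval [x̂₀ − t*_{1−α/2} SE, x̂₀ − t*_{α/2} SE].

This sketch follows that textbook form. Whether it matches Algorithm 1 step for step is an assumption, and the inputs are hypothetical.

```python
def order_statistic(values, p):
    """(R + 1)p-th smallest value, clamped to [1, R] (one common convention)."""
    s = sorted(values)
    k = min(max(int(round((len(s) + 1) * p)), 1), len(s))
    return s[k - 1]


def bootstrap_t_interval(x0_hat, se_hat, reps, se_reps, alpha=0.05):
    """Studentized (bootstrap-t) interval for x0.

    reps[r] and se_reps[r] are x0-hat* and its Wald SE recomputed in replicate r.
    """
    t = [(xr - x0_hat) / sr for xr, sr in zip(reps, se_reps)]
    t_lo = order_statistic(t, alpha / 2)
    t_hi = order_statistic(t, 1 - alpha / 2)
    # The t quantiles enter in reverse order, as in the usual bootstrap-t interval.
    return x0_hat - t_hi * se_hat, x0_hat - t_lo * se_hat
```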

In document Topics in Statistical Calibration (Page 98-102)