The General Idea of NPI Bootstrap - Nonparametric Predictive Methods for Bootstrap and Test Rep

In this section we present the main idea of three types of bootstrap (standard, Banks’ and NPI-B) and explain the differences between them. For the standard bootstrap method, the observations are drawn only from the n original sample points, whereas for the other two methods they are drawn from the points of the original sample and from the intervals between them. NPI-B creates n + 1 intervals from the n observations, draws one value from these intervals, adds this value to the data set, and continues in the same way until m values have been drawn; these m values form an NPI-B sample. Banks’ bootstrap uses the same process but without adding the new value to the data set. Because NPI-B samples values from the data points and the intervals between them and then adds these values to the data set, an NPI-B sample has more variance than samples from the other bootstrap methods. We will show this property in detail in simulation studies in this chapter and in the next example. In NPI-B, all possible orderings of the new observations among the past observations are equally likely. In contrast, the orderings follow a multinomial distribution over the n + 1 intervals for Banks’ bootstrap and over the n data observations for the standard bootstrap, where in both cases all intervals or observations are equally likely for each new observation.

The NPI-B algorithm for one-dimensional real-valued data on a finite (bounded) interval is as follows:

1. Take the data set of n observations which are real-valued, 1-dimensional on a finite closed interval.

2. These n observations partition the interval into n + 1 intervals.

3. Randomly select one of the n + 1 intervals, each with equal probability.

4. Sample one future value uniformly from this selected interval.

5. Add that value to the data: increase n to n + 1.

6. Repeat steps 2-4, now with n + 1 data, to get a further future value.

7. Do this m times to get an NPI bootstrap sample Y₁, Y₂, ..., Y_m of size m.


8. Repeat all these steps B times, where B is a chosen integer value, to get a total of B NPI bootstrap samples of size m.
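The steps above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names `npi_b_sample` and `npi_b_bootstrap`, the arguments `lower` and `upper` for the end points of the bounded interval, and the optional `seed` are all assumptions introduced here.

```python
import bisect
import random


def npi_b_sample(data, lower, upper, m, rng=None):
    """Draw one NPI-B sample of size m (steps 1-7); names are illustrative."""
    if rng is None:
        rng = random.Random()
    points = sorted(data)  # working copy of the data; grows by one per draw
    sample = []
    for _ in range(m):
        # Step 2: the current points and the end points define len(points) + 1 intervals.
        grid = [lower] + points + [upper]
        # Step 3: select one interval, each with equal probability.
        j = rng.randrange(len(grid) - 1)
        # Step 4: sample one value uniformly from the selected interval.
        y = rng.uniform(grid[j], grid[j + 1])
        # Step 5: add the value to the data, keeping the points sorted.
        bisect.insort(points, y)
        sample.append(y)
    return sample


def npi_b_bootstrap(data, lower, upper, m, B, seed=None):
    """Step 8: repeat the whole procedure B times, always starting from the original data."""
    rng = random.Random(seed)
    return [npi_b_sample(data, lower, upper, m, rng) for _ in range(B)]


# Example call with assumed inputs: data (2, 4, 6) on [0, 8], m = 3, B = 200.
samples = npi_b_bootstrap([2, 4, 6], 0, 8, m=3, B=200, seed=1)
```

Each of the B samples starts again from the original n observations; only within a single sample does the data set grow from n to n + m points.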

In this algorithm we assume that the distribution within each interval between data points is uniform. This does not follow from Hill’s assumption; we make this assumption because the NPI-B method is an improvement on NPI-Banks’ bootstrap method, which puts uniformly distributed probability 1/(n + 1) over each interval between data points. This assumption is convenient for computation and intuitively reasonable. We do not consider further underlying principles according to which such an assumption would be optimal; it is just one possible assumption among many.
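As a worked illustration of the uniform assumption, the density from which a single value is drawn can be written out explicitly. The notation is an assumption made here for illustration: the ordered observations are written x_(1) < ... < x_(n), and the end points of the bounded interval are written x_(0) = a and x_(n+1) = b.

```latex
% Probability 1/(n+1) per interval, spread uniformly over that interval
f(y) \;=\; \sum_{j=1}^{n+1} \frac{1}{n+1}\,
           \frac{1}{x_{(j)} - x_{(j-1)}}\,
           \mathbf{1}\{\, x_{(j-1)} < y < x_{(j)} \,\},
\qquad x_{(0)} = a,\; x_{(n+1)} = b .

% Illustration: data (2, 4, 6) on [0, 8], so n = 3 and every interval has length 2
f(y) \;=\; \frac{1}{4} \cdot \frac{1}{2} \;=\; \frac{1}{8}
\quad \text{for } y \in (0,2) \cup (2,4) \cup (4,6) \cup (6,8).
```

For Banks’ bootstrap every value in a sample is drawn from this same density. For NPI-B it applies only to the first value, since afterwards the set of points, and hence the intervals, changes with each draw.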

Example 2.1

In this example we illustrate the main arguments of the three bootstrap methods and the differences between them. We use (2, 4, 6) as the original sample and treat it as a sample drawn from an unknown distribution with support [0, 8]. First, to draw an NPI-B sample of size m = n = 3, note that the data values together with the end points 0 and 8 define n + 1 = 4 intervals: I₁ = (0, 2), I₂ = (2, 4), I₃ = (4, 6) and I₄ = (6, 8). Choose one interval and then sample a new value from this interval as the first value of the NPI-B sample. Then add this value to the data set, so that n = 4 and there are 5 intervals. Continue with this procedure to derive an NPI-B sample of size m = 3. There are $\binom{n+m}{m} = \binom{6}{3} = 20$ orderings of the 3 future observations among the 3 data observations, which are shown in Table 2.1. For example, (1, 0, 2, 0) means that there is 1 future observation in I₁, 0 in I₂, 2 in I₃ and 0 in I₄. All orderings have equal probability 1/20 = 0.05.

We drew 200 NPI-B samples, recorded the frequency of each ordering, and put the results of the simulation in Table 2.1. The table shows that the observed proportion of each ordering is close to 0.05 in most cases.

In Banks’ bootstrap we use the same method but do not add the new value to the data set. Table 2.2 shows the orderings, the probability of each one under the multinomial distribution, and the observed proportion of each ordering in a simulation with 200 Banks’ bootstrap samples.

Banks-B

Orderings    Theoretical Probabilities    Frequency    Observed Proportions
(3,0,0,0)    0.02                         7            0.04

For a standard bootstrap sample, the values are drawn only from the data values. Such a sample can be, for example, (2, 4, 6), (2, 2, 6) or (4, 2, 4). There are 10 orderings that can appear here, as shown in Table 2.3. This table contains the probability of each ordering under a multinomial distribution over the n data observations, and the observed proportions from 200 standard bootstrap samples.
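The theoretical probabilities described in this example can be reproduced with a short calculation. The sketch below assumes the setting of the example (n = 3, m = 3); the helper names `orderings` and `multinomial_prob` are introduced here for illustration and are not from the source.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial


def orderings(cells, m):
    """All tuples of non-negative counts over `cells` cells that sum to m."""
    return [c for c in product(range(m + 1), repeat=cells) if sum(c) == m]


def multinomial_prob(counts):
    """Probability of `counts` when each of the m draws picks one of the
    len(counts) cells independently and with equal probability."""
    m = sum(counts)
    coef = factorial(m)
    for c in counts:
        coef //= factorial(c)  # builds the multinomial coefficient m! / (c1! c2! ...)
    return Fraction(coef, len(counts) ** m)


n, m = 3, 3

# NPI-B: every ordering over the n + 1 intervals has probability 1 / C(n + m, m).
npi_b = {c: Fraction(1, comb(n + m, m)) for c in orderings(n + 1, m)}

# Banks' bootstrap: multinomial over the n + 1 intervals.
banks = {c: multinomial_prob(c) for c in orderings(n + 1, m)}

# Standard bootstrap: multinomial over the n data observations.
standard = {c: multinomial_prob(c) for c in orderings(n, m)}
```

Here `banks[(3, 0, 0, 0)]` evaluates to 1/64 (about 0.016) and `standard[(1, 1, 1)]` to 2/9 (about 0.22), which agree with the tabulated theoretical values 0.02 and 0.22 after rounding to two decimals.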

standard-B

Orderings    Theoretical Probabilities    Frequency    Observed Proportions
(3,0,0)      0.04                         10           0.05
(2,1,0)      0.11                         19           0.10
(2,0,1)      0.11                         17           0.09
(1,2,0)      0.11                         24           0.12
(1,1,1)      0.22                         45           0.23
(1,0,2)      0.11                         30           0.15
(0,3,0)      0.04                         2            0.01
(0,2,1)      0.11                         27           0.14
(0,1,2)      0.11                         20           0.10
(0,0,3)      0.04                         6            0.03

Table 2.3: Orderings of Standard-B

The theoretical probabilities and those from the simulation study are similar in most cases for all three bootstrap methods. Figure 2.1 plots the variances of the NPI-B samples, standard-B samples and Banks’ bootstrap samples from the simulation experiment in this example, as a measure of how far the observations are spread out. It gives a general insight that NPI-B samples tend to have a large variance, which is due to the method of sampling discussed earlier. Some NPI-B samples nevertheless have small variances, and some of these are close to 0, as shown in Figure 2.1. This is possible with NPI-B samples but happens rarely, and it can occur here because the sample size is small.
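A comparison of this kind can be sketched with a small simulation. The code below is an illustration under stated assumptions, not the experiment behind Figure 2.1: it assumes the within-sample variance is the usual sample variance (`statistics.variance`), uses an arbitrary seed, and all function names are hypothetical. It makes no claim about the numbers reported in the source.

```python
import random
import statistics


def draw_from_intervals(points, lower, upper, rng):
    """Pick one of the len(points) + 1 intervals at random, then draw uniformly from it."""
    grid = [lower] + sorted(points) + [upper]
    j = rng.randrange(len(grid) - 1)
    return rng.uniform(grid[j], grid[j + 1])


def standard_sample(data, m, rng):
    """Standard bootstrap: draw m values from the data points with replacement."""
    return [rng.choice(data) for _ in range(m)]


def banks_sample(data, lower, upper, m, rng):
    """Banks' bootstrap: draw from the intervals, never adding values to the data."""
    return [draw_from_intervals(data, lower, upper, rng) for _ in range(m)]


def npi_b_sample(data, lower, upper, m, rng):
    """NPI-B: draw from the intervals and add each drawn value to the data."""
    points, sample = list(data), []
    for _ in range(m):
        y = draw_from_intervals(points, lower, upper, rng)
        points.append(y)
        sample.append(y)
    return sample


def sample_variances(sampler, B):
    """Sample variance of each of B bootstrap samples produced by `sampler`."""
    return [statistics.variance(sampler()) for _ in range(B)]


rng = random.Random(2024)  # arbitrary seed, for repeatability only
data, lower, upper, m, B = [2, 4, 6], 0, 8, 3, 200

variances = {
    "standard-B": sample_variances(lambda: standard_sample(data, m, rng), B),
    "Banks-B": sample_variances(lambda: banks_sample(data, lower, upper, m, rng), B),
    "NPI-B": sample_variances(lambda: npi_b_sample(data, lower, upper, m, rng), B),
}

for name, values in variances.items():
    print(name, "mean variance:", round(statistics.mean(values), 2),
          "min variance:", round(min(values), 2))
```

Plotting the three lists in `variances` side by side, for example as box plots, would give a picture comparable in kind to the figure described above.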

In document Nonparametric Predictive Methods for Bootstrap and Test Reproducibility (Page 33-36)