NPI Bootstrap for Finite Intervals - Nonparametric Predictive Methods for Bootstrap and Test Re

The proposed nonparametric predictive inference bootstrap method (NPI-B) is based

on the repeated application of assumptionA(n). First it is done with then observed

data which createn+1 intervals, leading to one further observation. This is followed

by adding a further observation to the data and applying A(n+1) in order to draw

the second further observation, and so on. This is continued until there are m fur-

ther observations, which then together (and without the original n observed data)

form one NPI-B sample. In this section we restrict attention to NPI-B applied to observations on a finite interval, because it simplifies the approach in the intervals

I1 and In+1. If NPI-B is applied on the full real-line, these two intervals require

different procedures for sampling a value within them. This will be presented in the following sections. The NPI-B algorithm for one-dimensional real-valued data on a finite interval is shown in Section 2.2.

2.3. NPI Bootstrap for Finite Intervals 29 ● ● ● ● ● ●

NPI standard Banks

The crucial difference from the standard bootstrap (’standard-B’) is that an NPI- B sample does not consist of the observations from the original sample but of points from the whole possible data range, because the sampling here is from the interval in between the data values and also outside the data range. This procedure leads to greater variation in the NPI-B samples than in the standard-B samples, as discussed in Section 2.2. Furthermore with NPI-B it is possible to estimateP(Y > c) for dif-

ferent values of c, especially if c is greater than the maximum value of the original

sample. This provides a method for estimating this probability, but the uniformity assumption within the intervals will affect the estimate and hence it would be dif- ficult to arrive at the exact statistical properties for such an estimate, hence the results presented mainly serve as an illustration of our method. It should be men- tioned that the NPI-B procedure is close in nature to Banks’ proposal of a smoothed bootstrap [8], where sampling also takes place uniformly in intervals between the

actual data, but Banks’ only uses the actual n+ 1 intervals from step 1 above for

the sampling of all values in the bootstrap sample, so a sampled value is not added to the data. This leads to a smaller variation in Banks’ approach than in NPI-B.

To study the NPI bootstrap performance, we have carried out a simulation experiment using R code. The considered methods in this part are: standard bootstrap, Banks’ bootstrap (smoothed Efron’s bootstrap) and NPI bootstrap. We have per- formed simulation studies, using the software R, for the performance of the NPI-B

as the estimation approach. For each method, we generated B = 1000 bootstrap

samples and calculated the variance which is the square of equation (1.6), bias from equation (1.9), absolute error |T_n∗−To

n|. Tno here is the observed value of statis-

tic which is calculated from the original sample. We found the absolute error for every value of statistics in bootstrap samples and then took the average of these values, and the mean square error (MSE) of the statistics from equation (1.12). It is important to mention the reason for choosing these measures. The variance of statistics is used to show the difference between the three methods of bootstrap or to show which method has a close variance to the original observations. We will see that the NPI-B has the largest variance of statistics but it is the closest one to the

2.3. NPI Bootstrap for Finite Intervals 31

n Uniform (0,1) Beta (1,2) Beta (0.5,1) 20 0.0800 0.0551 0.0729 50 0.0815 0.0444 0.0738 100 0.0857 0.0523 0.0799 200 0.0853 0.0492 0.0825 500 0.0842 0.0523 0.0859 1000 0.0841 0.0521 0.0834

Table 2.4: Variances of original samples from specific distributions with a variety of sample sizes

method measures n= 20 n= 50 n= 100 n= 200 n= 500 n= 1000 standard variance of mean 0.004 0.002 0.001 0.0005 0.0002 0.0001

bias 0.003 -0.0001 -0.001 -0.001 0.0002 0.0001 absolute error 0.049 0.032 0.024 0.017 0.010 0.007

MSE 0.004 0.002 0.001 0.0005 0.0001 0.0001 Banks variance of mean 0.004 0.002 0.001 0.0004 0.0001 0.001

bias -0.007 0.001 0.0003 0.001 -0.0001 -0.0001 absolute error 0.052 0.032 0.023 0.017 0.010 0.007

MSE 0.004 0.002 0.001 0.0004 0.0002 0.0001 NPI variance of mean 0.008 0.003 0.002 0.001 0.0003 0.0002 bias -0.009 0.002 0.001 0.001 -0.0002 0.00001 absolute error 0.070 0.046 0.033 0.024 0.015 0.010

MSE 0.008 0.003 0.002 0.001 0.0003 0.0002

Table 2.5: The sample mean when the original sample was from U (0,1)

variance of the original sample. Regarding bias, MSE and absolute error, they are the most commonly used measures of statistical accuracy of estimators as discussed in Section 1.3.1. We repeated this experiment forn= 20,50,100,200,500,1000 and for statistics: mean, variance and upper quartile (q75), to present location, variation

and position parameters, respectively, and for different distributions such as Uni- form (0,1), Beta (α, β), with (α = 0.5, β = 1) and (α = 1, β = 2), as examples of symmetric and skewed distributions. Note that, in these cases, the size of bootstrap

samples ism = n. Table 2.4 shows the observed variance of these original samples

with various sample sizes.

Tables 2.5, 2.6, 2.7, of Uniform (0,1) show that when comparing the absolute value of bias, the NPI bootstrap has the smallest bias of variance parameter in all cases, but for the mean andq75it has the smallest value only ifnis very large (500 or

1000). Mostly, for three parameters, the MSE and absolute error of Banks’ bootstrap and standard bootstrap are very close but they are less than NPI bootstrap’s values of MSE and absolute error. The variance of three parameters using NPI-B is larger than the others methods. This property of NPI-B is considered a good point because

method measures n= 20 n= 50 n= 100 n= 200 n= 500 n= 1000 standard variance of variance 0.0003 0.0001 0.0001 0.00003 0.00001 0.00001

bias -0.005 -0.002 -0.001 -0.0005 -0.0001 -0.0001 absolute error 0.015 0.009 0.006 0.004 0.003 0.002

MSE 0.0003 0.0001 0.0001 0.00003 0.00001 0.00001 Banks variance of variance 0.0004 0.0001 0.0001 0.00003 0.00001 0.00001 bias 0.005 0.002 0.001 0.0004 0.0002 0.00004 absolute error 0.016 0.009 0.007 0.004 0.003 0.002

MSE 0.0004 0.0001 0.0001 0.00003 0.00001 0.00001 NPI variance of variance 0.001 0.0002 0.0001 0.0001 0.00003 0.00001 bias 0.001 0.0002 -0.0001 -0.00002 0.00002 -0.0001 absolute error 0.021 0.012 0.009 0.006 0.004 0.003

MSE 0.001 0.0002 0.0001 0.0001 0.00003 0.00001

Table 2.6: The sample variance when the original sample was from U (0,1)

method measures n= 20 n= 50 n= 100 n= 200 n= 500 n= 1000 standard variance ofq75 0.004 0.003 0.001 0.001 0.0003 0.0002 bias -0.006 -0.014 -0.006 -0.004 0.0005 0.001 absolute error 0.039 0.040 0.020 0.024 0.014 0.009 MSE 0.004 0.003 0.001 0.001 0.0003 0.0002 Banks variance ofq75 0.004 0.002 0.001 0.001 0.0003 0.0002 bias 0.001 -0.007 -0.003 -0.000 0.001 0.001 absolute error 0.044 0.037 0.018 0.023 0.015 0.009 MSE 0.004 0.002 0.001 0.001 0.0003 0.0002 NPI variance ofq75 0.008 0.005 0.001 0.001 0.001 0.0004 bias -0.009 -0.012 -0.004 -0.002 0.0002 0.001 absolute error 0.064 0.053 0.028 0.030 0.020 0.014 MSE 0.008 0.005 0.001 0.001 0.001 0.0004

2.3. NPI Bootstrap for Finite Intervals 33

method measures n= 20 n= 50 n= 100 n= 200 n= 500 n= 1000 standard variance of mean 0.003 0.001 0.001 0.0003 0.0001 0.0001

bias 0.001 -0.0005 -0.002 0.0002 0.0002 0.0003 absolute error 0.041 0.024 0.019 0.013 0.008 0.006

MSE 0.003 0.001 0.001 0.0003 0.0001 0.0001 Banks variance of mean 0.006 0.002 0.001 0.0003 0.0001 0.0001 bias 0.078 0.035 0.018 0.009 0.003 0.002 absolute error 0.089 0.041 0.024 0.015 0.009 0.006 MSE 0.012 0.003 0.001 0.0004 0.0001 0.0001 NPI variance of mean 0.011 0.003 0.001 0.001 0.0002 0.0001 bias 0.076 0.036 0.018 0.009 0.003 0.002 absolute error 0.099 0.050 0.031 0.020 0.012 0.008 MSE 0.017 0.004 0.002 0.001 0.0002 0.0001

Table 2.8: The sample mean when the original sample was from Beta (1,2)

method measures n= 20 n= 50 n= 100 n= 200 n= 500 n= 1000 standard variance of variance 0.0002 0.0001 0.0001 0.00002 0.00001 0.000004 bias -0.003 -0.001 -0.001 -0.0003 -0.0002 0.0001 absolute error 0.013 0.008 0.005 0.004 0.002 0.002

MSE 0.0003 0.0001 0.0001 0.00002 0.00001 0.000004 Banks variance of variance 0.005 0.001 0.0003 0.0001 0.00002 0.00001

bias 0.068 0.033 0.016 0.008 0.003 0.001 absolute error 0.073 0.035 0.017 0.009 0.004 0.002 MSE 0.010 0.002 0.001 0.0002 0.00003 0.00001 NPI variance of variance 0.009 0.002 0.001 0.0002 0.00004 0.00001 bias 0.062 0.032 0.015 0.008 0.003 0.001 absolute error 0.072 0.036 0.018 0.010 0.005 0.003 MSE 0.013 0.003 0.001 0.0002 0.0001 0.00002

Table 2.9: The sample variance when the original sample was from Beta (1,2)

it makes the variance of parameters using NPI-B the closest one to the variance of the original sample, and supports the discussion in Section 2.2. In the tables of results in this chapter the values were approximated to three decimal digits to make them easy to read, but with some values we use additional digits to be informative and to avoid putting zero’s ”0.000” in the results. It is apparent from results of Beta (1,2) in Tables 2.8, 2.9, 2.10 that for all parameters here the bias of standard-B is the smallest one, unlike the other two methods, but sometimes there is no large difference between the bias of NPI-B and the bias of Banks’ bootstrap method.

Here, for the mean andq75, note that the absolute error and MSE have the same

status of Uniform (0,1), but for variance, the absolute error of Banks’ bootstrap and NPI bootstrap are similar and greater than the standard bootstrap.

The results of Beta (0.5,1) are presented in Tables 2.11,2.12,2.13 and show that when we estimate the variance, the NPI bootstrap method has the smallest bias

method measures n= 20 n= 50 n= 100 n= 200 n= 500 n= 1000 standard variance ofq75 0.007 0.003 0.001 0.001 0.0004 0.0002 bias 0.013 -0.010 -0.004 0.002 -0.0003 0.002 absolute error 0.067 0.047 0.029 0.021 0.017 0.012 MSE 0.007 0.003 0.001 0.001 0.0004 0.0002 Banks variance ofq75 0.016 0.003 0.001 0.001 0.0005 0.0002 bias 0.082 0.015 0.010 0.009 0.002 0.002 absolute error 0.109 0.048 0.029 0.023 0.018 0.012 MSE 0.022 0.003 0.001 0.001 0.0005 0.0002 NPI variance ofq75 0.028 0.007 0.003 0.002 0.001 0.0004 bias 0.088 0.015 0.009 0.009 0.001 0.002 absolute error 0.132 0.065 0.040 0.032 0.024 0.016 MSE 0.036 0.007 0.003 0.002 0.001 0.0004

Table 2.10: The sample upper quartileq75 when the original sample was from Beta

(1,2)

method measures n= 20 n= 50 n= 100 n= 200 n= 500 n= 1000 standard variance of mean 0.003 0.001 0.001 0.0004 0.0002 0.0001

bias -0.003 -0.001 0.0005 -0.00003 -0.0002 0.0001 absolute error 0.047 0.029 0.022 0.002 0.011 0.007

MSE 0.003 0.001 0.001 0.0004 0.0002 0.0001 Banks variance of mean 0.004 0.002 0.001 0.0004 0.0002 0.0001 bias 0.030 0.012 0.006 0.003 0.001 0.001 absolute error 0.054 0.033 0.023 0.017 0.010 0.007 MSE 0.005 0.002 0.001 0.0004 0.0002 0.0001 NPI variance of mean 0.007 0.003 0.002 0.001 0.0003 0.0002 bias 0.030 0.011 0.006 0.003 0.001 0.001 absolute error 0.071 0.044 0.032 0.023 0.015 0.010 MSE 0.008 0.003 0.002 0.001 0.0003 0.0002

Table 2.11: The sample mean when the original sample was from Beta (0.5,1)

method measures n= 20 n= 50 n= 100 n= 200 n= 500 n= 1000 standard variance of variance 0.0004 0.0002 0.0001 0.0001 0.00002 0.00001

bias -0.004 -0.002 -0.001 -0.001 -0.0002 -0.0001 absolute error 0.016 0.011 0.009 0.006 0.004 0.002

MSE 0.0004 0.0002 0.0001 0.0001 0.00002 0.00001 Banks variance of variance 0.0004 0.0002 0.0001 0.00004 0.00002 0.00001 bias -0.002 0.001 0.001 0.001 0.0002 0.00003 absolute error 0.016 0.011 0.009 0.005 0.003 0.002

MSE 0.0004 0.0002 0.0001 0.00004 0.00002 0.00001 NPI variance of variance 0.001 0.0004 0.0002 0.0001 0.00004 0.00002 bias -0.005 -0.0003 -0.0002 0.0002 0.00002 -0.0001 absolute error 0.021 0.015 0.012 0.007 0.005 0.003

MSE 0.001 0.0004 0.0002 0.0001 0.00004 0.00002

2.4. NPI Bootstrap on Real Line 35

In document Nonparametric Predictive Methods for Bootstrap and Test Reproducibility (Page 38-45)