Conclusions - Simulation Results - Estimating the variance of the Horvitz Thompson estimator

3.5 Simulation Results

3.5.7 Conclusions

This section summarises the main findings from the comparison of the variance estimators. As discussed in the methodology, the choice of the best estimator is subjective: it depends on whether one prefers a low MSE or a low absolute RB. As a result, it is difficult to identify any one of the nine approximate variance estimators as uniformly superior to the rest. Table 3.27 shows the preferred estimator for RANSYS and for CPS, and the preferred estimator within each family across both sampling algorithms. The RB and MSE are considered separately for each situation, as it is difficult to find a single estimator that consistently has both a low absolute RB and a low MSE. For some situations two estimators were chosen.

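                       RB                                                MSE
RANSYS                 $\hat{\tilde{V}}_{BR4}$                           $\hat{\tilde{V}}_{BR1}$
CPS                    $\hat{\tilde{V}}_{BR4}$, $\hat{\tilde{V}}_{Ber}$  $\hat{\tilde{V}}_{BR1}$
Brewer Family          $\hat{\tilde{V}}_{BR4}$, $\hat{\tilde{V}}_{BR3}$  $\hat{\tilde{V}}_{BR1}$
Hájek-Deville Family   $\hat{\tilde{V}}_{Dev2}$, $\hat{\tilde{V}}_{Ber}$ $\hat{\tilde{V}}_{Dev1}$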

Table 3.27: The preferred approximate estimators
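
The two criteria reported in Table 3.27 can be illustrated with a minimal sketch. It assumes the RB is expressed as a percentage of the true variance and the MSE is the unscaled mean squared deviation of the variance estimates from the true variance; the exact definitions used in the methodology may differ in scaling. The function names and the replicate values are hypothetical and are not taken from the simulation study.

```python
from statistics import fmean


def relative_bias_pct(estimates, true_variance):
    """Relative bias (in %) of simulated variance estimates.

    Assumed definition: 100 * (mean of estimates - true variance) / true variance.
    """
    return 100.0 * (fmean(estimates) - true_variance) / true_variance


def mse(estimates, true_variance):
    """Mean squared error of the variance estimates about the true variance."""
    return fmean([(v - true_variance) ** 2 for v in estimates])


# Hypothetical replicate values of one variance estimator (illustration only).
estimates = [9.0, 11.0, 10.5, 10.5]
true_variance = 10.0

rb = relative_bias_pct(estimates, true_variance)
err = mse(estimates, true_variance)
print(f"RB = {rb:.4f}%, MSE = {err:.4f}")
```

An estimator can have a small RB and still a large MSE, since over- and under-estimates cancel in the RB but not in the MSE; this is why the two criteria are reported separately.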

Among the Brewer Family estimators, $\hat{\tilde{V}}_{BR1}$ obtained the lowest absolute RB for some populations; however, it performed poorly in populations (d), (e) and (g). On the other hand, $\hat{\tilde{V}}_{BR3}$ and $\hat{\tilde{V}}_{BR4}$ rarely had high absolute RBs compared with the other Brewer Family estimators, and $\hat{\tilde{V}}_{BR4}$ frequently had a low absolute RB. In the Hájek-Deville Family, $\hat{\tilde{V}}_{Dev1}$ achieved the lowest absolute RB in some situations; however, it performed poorly for populations (d) and (e). $\hat{\tilde{V}}_{Dev2}$ and $\hat{\tilde{V}}_{Ber}$ performed best across all populations in the Hájek-Deville Family with regard to the RB; however, the latter consistently had a high MSE. $\hat{\tilde{V}}_{Dev1}$ always achieved the lowest MSE among the Hájek-Deville Family.

The MSE affects every single sample estimate, so it is important to consider this property in more detail. The approximate variance estimators had much the same MSEs for any given situation, so they should all be about equally liable to produce a poor estimate from time to time. This is consistent with the empirical results, as each estimator has a high RB on at least one occasion. As $\hat{\tilde{V}}_{BR1}$ has a low MSE for both RANSYS and CPS, it should be slightly less likely than the others to produce a poor estimate.

The properties of the variance estimators within each family are generally very similar. Under RANSYS the Brewer Family estimators usually performed better across all sample sizes than the Hájek-Deville Family estimators. This was expected, as the Brewer Family estimators were designed using Hartley and Rao’s approximation of the joint inclusion probabilities realised under this sampling algorithm. Under CPS the Hájek-Deville Family estimators performed just as well as or better than the Brewer Family, except for populations (d) and (e), where the Hájek-Deville Family performed rather poorly. Excluding $\hat{\tilde{V}}_{Ber}$, the Hájek-Deville Family estimators tend to be more stable, as shown by their MSE, than the Brewer Family estimators (excluding $\hat{\tilde{V}}_{BR1}$). Therefore the Hájek-Deville Family estimators (excluding $\hat{\tilde{V}}_{Ber}$) should be less likely to produce a poor estimate than the Brewer Family estimators (excluding $\hat{\tilde{V}}_{BR1}$).

Finally, comparing $\hat{V}_{SYG}$ with the approximate variance estimators, this estimator nearly always had the largest MSE. The RB of this estimator is generally not noticeably greater in magnitude than the lowest absolute RB, and occasionally is itself the lowest. For RANSYS, using Hartley and Rao’s full approximations of the joint inclusion probabilities has not improved this estimator enough to justify the extra computational effort required. The main interest in $\hat{V}_{SYG}$, however, is in its performance under CPS, as the joint inclusion probabilities are known exactly. Under this sampling algorithm $\hat{V}_{SYG}$, unlike any of the approximate variance estimators, had an absolute RB greater than 1% on only three occasions, the largest in magnitude being -3.7727%. All the approximate variance estimators had an absolute RB greater than 1% on numerous occasions. Thus, although knowledge of the joint inclusion probabilities does not imply that $\hat{V}_{SYG}$ is the best estimator for minimising the absolute RB, it does appear to guarantee a consistently low absolute RB, which is itself desirable. However, it is difficult to justify using this estimator because of its high MSE.
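
For reference, the role of the joint inclusion probabilities in $\hat{V}_{SYG}$ can be sketched as follows. This is a minimal illustration of the standard Sen-Yates-Grundy form for a fixed-size design, written as a sum over distinct pairs of sampled units; it is not the code used in the simulation study, and the function name, argument layout and example values are hypothetical.

```python
def syg_variance(y, pi, pi_joint):
    """Sen-Yates-Grundy variance estimate for a fixed-size sample.

    y        : values of the sampled units
    pi       : first-order inclusion probabilities of the sampled units
    pi_joint : pi_joint[i][j] is the joint inclusion probability of
               sampled units i and j (only entries with i < j are read)
    """
    n = len(y)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # Summing over pairs i < j equals half the sum over i != j.
            weight = (pi[i] * pi[j] - pi_joint[i][j]) / pi_joint[i][j]
            diff = y[i] / pi[i] - y[j] / pi[j]
            total += weight * diff ** 2
    return total


# Hypothetical check: simple random sampling of n = 2 from N = 4, where
# pi_i = 1/2 and pi_ij = n(n-1) / (N(N-1)) = 1/6 for every pair.
y_sample = [1.0, 3.0]
pi_sample = [0.5, 0.5]
pi_joint_sample = [[0.5, 1 / 6], [1 / 6, 0.5]]
print(syg_variance(y_sample, pi_sample, pi_joint_sample))
```

Every pair of sampled units contributes a term that depends on its joint inclusion probability, which is why the estimator needs either exact values (as under CPS) or approximations (as under RANSYS).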

4 Further Results

4.1 Introduction

During the simulation study two interesting discoveries were made that do not seem to have appeared in the literature before. The first concerns the relationship between some of the approximate variance estimators under consideration. The second concerns the relationship between the entropy of a sampling design and the true variance of the HTE under that design. Entropy is a measure of the “randomness” of a sampling design, so it was thought that a sampling design with greater entropy would also have a higher variance. Empirically, however, this has not been found to be true. This chapter discusses both discoveries in detail, together with their relevance to the estimation of variance.
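
To make the two quantities concrete, the following minimal sketch computes the entropy of a design, taken here as the Shannon entropy of the sample probabilities, and the true variance of the HTE by enumerating every possible sample. The population values and the three designs are hypothetical toy examples chosen for illustration; they are not taken from the simulation study.

```python
from math import log


def design_entropy(design):
    """Shannon entropy -sum p(s) log p(s) of a design {sample: probability}."""
    return -sum(p * log(p) for p in design.values() if p > 0)


def inclusion_probs(design, N):
    """First-order inclusion probabilities implied by the design."""
    pi = [0.0] * N
    for sample, p in design.items():
        for i in sample:
            pi[i] += p
    return pi


def ht_true_variance(design, y):
    """True variance of the HT estimator of the total, by enumeration.

    Assumes every unit has a positive inclusion probability, so the
    estimator is unbiased and its variance is E[(estimate - total)^2].
    """
    pi = inclusion_probs(design, len(y))
    total = sum(y)
    return sum(
        p * (sum(y[i] / pi[i] for i in sample) - total) ** 2
        for sample, p in design.items()
    )


y = [1.0, 2.0, 3.0, 4.0]  # hypothetical population, N = 4, n = 2
pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
srs = {s: 1 / 6 for s in pairs}           # all six samples equally likely
spread = {(0, 2): 0.5, (1, 3): 0.5}       # two interleaved samples
blocked = {(0, 1): 0.5, (2, 3): 0.5}      # two adjacent-unit samples

for name, design in [("srs", srs), ("spread", spread), ("blocked", blocked)]:
    print(name, design_entropy(design), ht_true_variance(design, y))
```

In this toy case all three designs have the same first-order inclusion probabilities, and the two lower-entropy designs have the same entropy but true variances that lie on either side of the higher-entropy design, so entropy alone does not order the variances.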
