
Allocating samples among strata

In document PDFbigbook SAS (Page 181-184)

Precision essentially depends only on the absolute sample size, not the relative fraction of the population sampled

3.7 Stratified simple random sampling

3.7.7 Allocating samples among strata

There are a number of ways of allocating a sample of size n among the various strata. For example,

1. Equal allocation. Under an equal allocation scheme, all strata get the same sample size, i.e. $n_h = n/H$. This allocation is best if the variances of the strata are roughly equal, equally precise estimates are required for each stratum, and you wish to test for differences in means among strata (i.e. an analytical survey, as discussed in previous sections).

2. Proportional allocation. Under proportional allocation, sample sizes are allocated in proportion to the number of sampling units in the strata, i.e.

$$n_i = n \times \frac{N_i}{N} = n \times \frac{N_i}{\sum_h N_h} = n \times \frac{N_i}{N_1 + N_2 + \cdots + N_H} = n \times W_i$$

This allocation is simple to plan and intuitively appealing. However, it is not the best design.

This design may waste effort because large strata get large sample sizes but precision is determined by sample size not the ratio of sample size to population size. For example, if one stratum is 10 times larger than any other stratum, it is not necessary to allocate 10 times the sampling effort to get the same precision in that stratum.

3. Neyman allocation. In Neyman allocation (named after the statistician Neyman), the sample is allocated to minimize the overall standard error for a given total sample size. Tedious algebra shows that the sample should be allocated in proportion to the product of the stratum size and the stratum standard deviation, i.e.

$$n_i = n \times \frac{W_i S_i}{\sum_h W_h S_h} = n \times \frac{N_i S_i}{\sum_h N_h S_h} = n \times \frac{N_i S_i}{N_1 S_1 + N_2 S_2 + \cdots + N_H S_H}$$

Intuitively, strata that contain more sampling units should receive more of the sample, and strata with larger standard deviations must have more samples allocated to them to get the se of the sample mean within the stratum down to a reasonable level. A key assumption of this allocation is that the cost to sample a unit is the same in all strata.

4. Optimal allocation when costs are involved. In some cases, the costs of sampling differ among the strata. Suppose that it costs $C_i$ to sample each unit in stratum $i$. Then the total cost of the survey is $C = \sum_h n_h C_h$. The allocation rule is that sample sizes should be proportional to the product of the stratum sizes, the stratum standard deviations, and the inverse of the square root of the cost of sampling, i.e.

$$n_i = n \times \frac{W_i S_i / \sqrt{C_i}}{\sum_h W_h S_h / \sqrt{C_h}}$$

Consequently, larger samples are found in strata that are larger, more variable, or cheaper to sample.
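The four allocation rules listed above differ only in the weight given to each stratum, so they can be sketched with one helper. This is a minimal illustration, not code from the notes: the function names and the three-stratum example values (including the per-unit costs) are hypothetical.

```python
from math import sqrt

def allocate(n, weights):
    """Split a total sample size n in proportion to the given stratum weights."""
    total = sum(weights)
    return [n * w / total for w in weights]

def equal_allocation(n, N):
    # Every stratum gets the same weight, so n_h = n / H.
    return allocate(n, [1.0] * len(N))

def proportional_allocation(n, N):
    # Weight is the stratum size N_h.
    return allocate(n, N)

def neyman_allocation(n, N, S):
    # Weight is N_h * S_h (stratum size times stratum standard deviation).
    return allocate(n, [Nh * Sh for Nh, Sh in zip(N, S)])

def cost_allocation(n, N, S, C):
    # Weight is N_h * S_h / sqrt(C_h), where C_h is the cost per sampled unit.
    return allocate(n, [Nh * Sh / sqrt(Ch) for Nh, Sh, Ch in zip(N, S, C)])

# Hypothetical strata, for illustration only.
N = [1000, 500, 100]   # stratum sizes
S = [10.0, 20.0, 50.0] # stratum standard deviations
C = [1.0, 4.0, 9.0]    # cost of sampling one unit in each stratum

for name, n_h in [
    ("equal", equal_allocation(100, N)),
    ("proportional", proportional_allocation(100, N)),
    ("Neyman", neyman_allocation(100, N, S)),
    ("cost-based", cost_allocation(100, N, S, C)),
]:
    print(name, [round(x, 1) for x in n_h])
```

The allocations are left fractional, as is usual at the planning stage; they would be rounded before fieldwork.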

In practice, most of the gain in precision occurs from moving from equal to proportional allocation, while often only small improvements in precision are gained from moving from proportional allocation to Neyman allocation. Similarly, unless cost differences are enormous, there isn’t much of an improvement in precision from moving to an allocation based on costs.

Example - estimating the size of a caribou herd

This section is based on the paper:

Siniff, D.B. and Skoog, R.O. (1964). Aerial Censusing of Caribou Using Stratified Random Sampling. The Journal of Wildlife Management, 28, 391-401. http://dx.doi.org/10.2307/3798104

Some of the values have been modified slightly for illustration purposes.

The authors wished to estimate the size of a caribou herd. The density of caribou differs dramatically based on the habitat type. The survey area was divided into six strata based on habitat type. The survey design is to divide each stratum into 4 km² quadrats, from which a random sample will be selected. The number of caribou in each selected quadrat will be counted from an aerial photograph.

The computations are available in the caribou tab in the Excel workbook ALLofData.xls available in Sample Program Library at http://www.stat.sfu.ca/~cschwarz/Stat-650/Notes/MyPrograms.

The key to examining different allocations is to make a single cell hold the total sample size and then make each stratum sample size a formula that is a function of that total.

The total sample size can be found by varying the sample total until the desired precision is found.
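The same search can be sketched outside a spreadsheet. The snippet below is an illustration under stated assumptions, not the workbook's method: it uses the stratum sizes and standard deviations from the previous year's survey in this example, allocates in proportion to $N_h S_h$, assumes the usual stratified simple random sampling formula for the se of a total (with finite population correction), and uses hypothetical function names. The target of 9,172 is the se of that earlier survey.

```python
from math import sqrt

# Stratum sizes and standard deviations from the previous year's caribou survey.
N = [400, 40, 100, 40, 70, 120]
S = [74.7, 63.7, 589.5, 151.0, 351.5, 99.0]

def neyman(n):
    """Allocate n in proportion to N_h * S_h (fractional sizes allowed)."""
    weights = [Nh * Sh for Nh, Sh in zip(N, S)]
    total = sum(weights)
    return [n * w / total for w in weights]

def se_total(n_h):
    """se of the estimated total, assuming stratified SRS with fpc."""
    variance = 0.0
    for Nh, Sh, nh in zip(N, S, n_h):
        fpc = max(0.0, 1.0 - nh / Nh)  # no variance left once a stratum is fully sampled
        variance += (Nh * Sh) ** 2 / nh * fpc
    return sqrt(variance)

def smallest_n(target_se):
    """Increase the total sample size one unit at a time until the target is met."""
    n = len(N)  # start with roughly one unit per stratum
    while n < sum(N) and se_total(neyman(n)) > target_se:
        n += 1
    return n

n_needed = smallest_n(9172)
print(n_needed, round(se_total(neyman(n_needed))))
```

A step-by-step search mirrors what is done by hand in the spreadsheet; a bisection would be faster but is unnecessary at this scale.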

Results from previous year’s survey

Here are the summary statistics from the survey in a previous year:

Map-squares sampled

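| Stratum | Nh | nh | ȳ | s | Est total | se(total) |
|---|---|---|---|---|---|---|
| 1 | 400 | 98 | 24.1 | 74.7 | 9640 | 2621 |
| 2 | 40 | 10 | 25.6 | 63.7 | 1024 | 698 |
| 3 | 100 | 37 | 267.6 | 589.5 | 26760 | 7693 |
| 4 | 40 | 6 | 179 | 151.0 | 7160 | 2273 |
| 5 | 70 | 39 | 293.7 | 351.5 | 20559 | 2622 |
| 6 | 120 | 21 | 33.2 | 99.0 | 3984 | 2354 |
| Total | 770 | 211 | | | 69127 | 9172 |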

The estimated size of the herd is 69,127 animals with an estimated se of 9,172 animals.
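The table entries can be checked with a few lines of code. This sketch assumes the standard stratified simple random sampling formulas, which are not restated in this section: the stratum total is $N_h \bar{y}_h$, its se is $N_h (s_h/\sqrt{n_h}) \sqrt{1 - n_h/N_h}$, and the overall se is the square root of the sum of squared stratum se values. Variable names are illustrative.

```python
from math import sqrt

# (N_h, n_h, mean count per quadrat, standard deviation) for each stratum,
# taken from the previous year's survey table.
strata = [
    (400, 98, 24.1, 74.7),
    (40, 10, 25.6, 63.7),
    (100, 37, 267.6, 589.5),
    (40, 6, 179.0, 151.0),
    (70, 39, 293.7, 351.5),
    (120, 21, 33.2, 99.0),
]

def stratum_total_and_se(N_h, n_h, ybar, s):
    """Estimated stratum total and its se, with finite population correction."""
    total = N_h * ybar
    se = N_h * s / sqrt(n_h) * sqrt(1 - n_h / N_h)
    return total, se

results = [stratum_total_and_se(*row) for row in strata]
est_total = sum(total for total, _ in results)
se_total = sqrt(sum(se ** 2 for _, se in results))

for i, (total, se) in enumerate(results, start=1):
    print(f"stratum {i}: total = {total:8.0f}, se = {se:6.0f}")
print(f"overall:   total = {est_total:8.0f}, se = {se_total:6.0f}")
```

Because the displayed standard deviations are rounded, individual se values may differ from the table in the last digit.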

Equal allocation

What would happen if an equal allocation were used? We now split the 211 total sample size equally among the 6 strata. In this case, the sample sizes are ‘fractional’, but this is OK as we are interested only in planning to see what would have happened. Notice that the estimate of the overall population would NOT change, but the se changes.

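| Stratum | Nh | nh | ȳ | s | Est total | se(total) |
|---|---|---|---|---|---|---|
| 1 | 400 | 35.2 | 24.1 | 74.7 | 9640 | 4810 |
| 2 | 40 | 35.2 | 25.6 | 63.7 | 1024 | 149 |
| 3 | 100 | 35.2 | 267.6 | 589.5 | 26760 | 8005 |
| 4 | 40 | 35.2 | 179 | 151.0 | 7160 | 354 |
| 5 | 70 | 35.2 | 293.7 | 351.5 | 20559 | 2927 |
| 6 | 120 | 35.2 | 33.2 | 99.0 | 3984 | 1684 |
| Total | 770 | 211 | | | 69127 | 9938 |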

An equal allocation gives rise to worse precision than the original survey. Examining the table in more detail, you see that far too many samples are allocated in an equal allocation to strata 2 and 4 and not enough to strata 1 and 3.
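A worked comparison of two strata shows where the effort is misplaced. The formula used is the standard se of a stratum total under simple random sampling with finite population correction, which is assumed here rather than quoted from this section; it reproduces the table entries.

$$se(\hat{\tau}_h) = N_h \frac{s_h}{\sqrt{n_h}} \sqrt{1 - \frac{n_h}{N_h}}$$

$$\text{Stratum 2: } 40 \times \frac{63.7}{\sqrt{35.2}} \times \sqrt{1 - \frac{35.2}{40}} \approx 149 \qquad \text{Stratum 1: } 400 \times \frac{74.7}{\sqrt{35.2}} \times \sqrt{1 - \frac{35.2}{400}} \approx 4810$$

Stratum 2 is sampled almost completely (35.2 of 40 quadrats), so its se falls from 698 to 149, but that stratum contributed little to the overall se in the first place. Stratum 1 drops from 98 to 35.2 quadrats and its se nearly doubles, from 2621 to 4810.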

Proportional allocation

What about proportional allocation? Now the sample size is proportional to the stratum population sizes.

For example, the sample size for stratum 1 is found as 211 × 400/770. The following results are obtained:

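| Stratum | Nh | nh | ȳ | s | Est total | se(total) |
|---|---|---|---|---|---|---|
| 1 | 400 | 109.6 | 24.1 | 74.7 | 9640 | 2431 |
| 2 | 40 | 11.0 | 25.6 | 63.7 | 1024 | 656 |
| 3 | 100 | 27.4 | 267.6 | 589.5 | 26760 | 9596 |
| 4 | 40 | 11.0 | 179 | 151.0 | 7160 | 1554 |
| 5 | 70 | 19.2 | 293.7 | 351.5 | 20559 | 4787 |
| 6 | 120 | 32.9 | 33.2 | 99.0 | 3984 | 1765 |
| Total | 770 | 211 | | | 69127 | 11263 |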

This has an even worse standard error! It looks like not enough samples are placed in stratum 3 or 5.

Optimal allocation

What if both the stratum sizes and the stratum variances are used in allocating the sample? This is the Neyman allocation described earlier, since sampling costs are taken to be the same in every stratum. We create a new column (at the extreme right) which is equal to $N_h S_h$. Now the sample sizes are proportional to these values, i.e. the sample size for the first stratum is now found as 211 × 29866.4/133893.8. Again the estimate of the total doesn’t change but the se is reduced.

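| Stratum | Nh | nh | ȳ | s | Est total | se(total) | NhSh |
|---|---|---|---|---|---|---|---|
| 1 | 400 | 47.1 | 24.1 | 74.7 | 9640 | 4089 | 29866.4 |
| 2 | 40 | 4.0 | 25.6 | 63.7 | 1024 | 1206 | 2550.0 |
| 3 | 100 | 92.9 | 267.6 | 589.5 | 26760 | 1629 | 58953.9 |
| 4 | 40 | 9.5 | 179 | 151.0 | 7160 | 1709 | 6039.6 |
| 5 | 70 | 38.8 | 293.7 | 351.5 | 20559 | 2639 | 24607.6 |
| 6 | 120 | 18.7 | 33.2 | 99.0 | 3984 | 2522 | 11876.4 |
| Total | 770 | 211 | | | 69127 | 6089 | 133893.8 |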
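The three planning scenarios can be compared side by side in code. The sketch below is illustrative only: it reuses the rounded standard deviations shown in the tables (so results can differ from the spreadsheet by a few units), assumes the standard stratified simple random sampling formula for the se of a total, and uses hypothetical names.

```python
from math import sqrt

# Stratum sizes and (rounded) standard deviations from the caribou example.
N = [400, 40, 100, 40, 70, 120]
S = [74.7, 63.7, 589.5, 151.0, 351.5, 99.0]
n = 211  # total sample size of the previous year's survey

def allocate(n, weights):
    """Split n in proportion to the weights (fractional sizes allowed)."""
    total = sum(weights)
    return [n * w / total for w in weights]

def se_total(n_h):
    """se of the estimated total, with finite population correction."""
    return sqrt(sum(
        (Nh * Sh) ** 2 / nh * (1 - nh / Nh)
        for Nh, Sh, nh in zip(N, S, n_h)
    ))

schemes = {
    "equal": allocate(n, [1.0] * len(N)),
    "proportional": allocate(n, N),
    "Neyman": allocate(n, [Nh * Sh for Nh, Sh in zip(N, S)]),
}
se = {name: se_total(n_h) for name, n_h in schemes.items()}

for name, value in se.items():
    sizes = ", ".join(f"{x:.1f}" for x in schemes[name])
    print(f"{name:>12}: n_h = [{sizes}]  se(total) = {value:,.0f}")
```

Keeping the allocation rule as a list of weights makes it easy to try other rules, such as a cost-based one, by changing a single line.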
