4. Sampling Procedures
4.4.2 Stratified Random Sampling
The sample of ten sampling units identified in Figure 4.3 may well be a good representation of the 100 sampling units in the population. However, prior information about the population may show that this is not the case. For example, assume that the sampling frame depicted in Figure 4.2 comes from a list of employees in a company and that, for some unrelated reason, the employees are listed by gender such that the first 40 employees are female and the remaining 60 are male, as shown in Figure 4.4.
[Figure 4.4: sampling frame of units 00 to 99; units 00 to 39 are labelled Female and units 40 to 99 Male. The ten units of the simple random sample comprise five females and five males.]
Figure 4.4 A Simple Random Sample from a Stratified Population
It is clear from Figure 4.4 that we have inadvertently over-sampled females (selecting 5 out of 40) and under-sampled males (selecting 5 out of 60). As a result, any inferences drawn from this sample will be biased towards the behaviour or attitudes of females because they are over-represented in the sample compared to their representation in the population.
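The size of this bias can be made explicit with a short calculation. The notation is introduced here for illustration only: W_F = 0.4 and W_M = 0.6 are the population shares of females and males, Ȳ_F and Ȳ_M are the (unknown) population means of some survey variable in each group, and ȳ is the unweighted sample mean. Given the 5/5 split in Figure 4.4, and simple random selection within each group, ȳ is expected to weight the two groups equally:

```latex
\bar{Y} = W_F\,\bar{Y}_F + W_M\,\bar{Y}_M = 0.4\,\bar{Y}_F + 0.6\,\bar{Y}_M \\
E\left[\bar{y} \mid n_F = n_M = 5\right] = 0.5\,\bar{Y}_F + 0.5\,\bar{Y}_M \\
E\left[\bar{y} \mid n_F = n_M = 5\right] - \bar{Y} = 0.1\left(\bar{Y}_F - \bar{Y}_M\right)
```

The estimate is therefore pulled towards the female mean by one-tenth of the difference between the group means; the bias vanishes only if the two groups happen to have the same mean on that variable.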
To overcome this problem, stratified random sampling makes use of prior information to subdivide the population into strata of sampling units such that the units within each stratum are as homogeneous as possible with respect to the stratifying variable. Each stratum is then sampled at random using the same sampling fraction, a variant sometimes called proportionate stratified sampling. The resulting sample then contains each stratum in its correct population proportion, and one source of error is eliminated. For example, if the females and males in Figure 4.4 were each sampled at a rate of 10% (giving 4 females and 6 males), then the sample shown in Figure 4.5, in which sampling unit 55 is substituted for sampling unit 33, would be more representative than that shown in Figure 4.4.
[Figure 4.5: the same sampling frame of units 00 to 99 (Female 00 to 39, Male 40 to 99), with four females and six males selected.]
Figure 4.5 A Stratified Random Sample from a Stratified Population
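A proportionate stratified draw like the one in Figure 4.5 can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure used to produce the figure: the frame layout (units 00 to 39 female, 40 to 99 male) follows Figure 4.4, while the function name `proportionate_stratified_sample` and the random seed are hypothetical. Stratum sample sizes are rounded to whole units, which is exact here (4 and 6) but introduces round-off in general.

```python
import random

def proportionate_stratified_sample(strata, fraction, rng):
    """Hypothetical helper: draw a simple random sample of round(fraction * N_h)
    units, without replacement, from each stratum separately."""
    sample = {}
    for name, units in strata.items():
        n_h = round(fraction * len(units))   # same sampling fraction in every stratum
        sample[name] = sorted(rng.sample(units, n_h))
    return sample

# Assumed frame mirroring Figure 4.4: units 00-39 female, 40-99 male.
frame = {
    "female": list(range(0, 40)),
    "male": list(range(40, 100)),
}

rng = random.Random(42)  # fixed seed so the draw is reproducible
sample = proportionate_stratified_sample(frame, 0.10, rng)
for name, units in sample.items():
    print(name, len(units), units)
```

Whatever the seed, the sample always contains exactly 4 females and 6 males; only which units are chosen varies.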
While the stratified sample in Figure 4.5 is obviously more representative than that in Figure 4.4 (in that males and females are represented in their correct proportions), the question remains as to whether the stratified sample is still a random sample of the population. This question can be answered by reference to the two criteria for random sampling noted earlier in this chapter. That is, a sample is random if:
• each unit is sampled independently; and
• each unit in the population has an equal probability of being selected in the sample (at the start of the sampling process).
With respect to stratified random sampling, the second condition clearly holds, in that every male and every female has the same chance of selection (i.e. 10%) at the start of the process. With respect to the first condition, each unit within a stratum is selected independently because simple random sampling is employed within each stratum. Therefore, given that each stratum is sampled independently of the other, combining the two independent random samples produces a further random sample, of the population as a whole.
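The equal-probability criterion can also be checked empirically. The sketch below (hypothetical helper `stratified_draw`, same assumed frame as Figure 4.4) repeats a 10% proportionate stratified draw many times and records how often each unit is selected; every unit's selection frequency should settle close to 0.10. This is a simulation check of the second criterion only, not a proof; independence within each stratum holds by construction, since each stratum is drawn by its own simple random sample.

```python
import random
from collections import Counter

def stratified_draw(strata, fraction, rng):
    """One proportionate stratified draw: an independent simple random
    sample within each stratum, then the stratum samples are combined."""
    chosen = []
    for units in strata.values():
        chosen.extend(rng.sample(units, round(fraction * len(units))))
    return chosen

# Assumed frame mirroring Figure 4.4.
frame = {"female": list(range(0, 40)), "male": list(range(40, 100))}
rng = random.Random(1)
trials = 20_000
counts = Counter()
for _ in range(trials):
    counts.update(stratified_draw(frame, 0.10, rng))

# Empirical inclusion probability of each unit; all should be close to 0.10.
inclusion = {unit: counts[unit] / trials for unit in range(100)}
print(min(inclusion.values()), max(inclusion.values()))
```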
To use stratified sampling, some prior information about the population must be available before sampling takes place. This information should also relate to the variables to be measured in the survey. For example, if one were attempting to measure trip generation rates in a survey, stratification on the basis of car ownership would be more useful than stratification on the basis of the day of the week on which the respondent was born (assuming both data sets were available prior to sampling). Whilst the latter stratification would ensure that the sample contained the correct number of people born on each day, we would not expect this to improve our estimate of trip generation rates. On the other hand, if the sample has the correct average car ownership, rather than (by chance) too high or too low a level of car ownership, we would expect a better estimate of trip generation rates. Therefore stratified sampling requires prior information about each unit in the population which is relevant to the objectives of the survey.
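Why the choice of stratifying variable matters can be seen from the standard approximate variances of a sample mean under proportionate stratification and under simple random sampling (for large strata, ignoring finite population corrections). The notation is introduced here for illustration only: W_h is the population share of stratum h, S_h² its within-stratum variance, Ȳ_h its mean, Ȳ the overall mean, S² the overall population variance and n the total sample size.

```latex
\operatorname{Var}(\bar{y}_{\mathrm{st}}) \approx \frac{1}{n}\sum_h W_h S_h^2 \\
\operatorname{Var}(\bar{y}_{\mathrm{srs}}) \approx \frac{S^2}{n}, \qquad
S^2 \approx \sum_h W_h S_h^2 + \sum_h W_h\left(\bar{Y}_h - \bar{Y}\right)^2 \\
\operatorname{Var}(\bar{y}_{\mathrm{srs}}) - \operatorname{Var}(\bar{y}_{\mathrm{st}}) \approx \frac{1}{n}\sum_h W_h\left(\bar{Y}_h - \bar{Y}\right)^2
```

The gain depends only on how far the stratum means differ from the overall mean. Strata defined by day of birth would be expected to have almost identical mean trip rates, so the gain is close to zero; car-ownership strata are expected to differ in mean trip rates, so the gain is positive.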
Whilst stratified sampling is generally useful for ensuring that the correct proportion of each stratum is obtained in the sample, it becomes doubly important when the population contains relatively small sub-groups. With simple random sampling, it is possible to miss members of small sub-groups entirely. Stratified random sampling at least ensures that some members of these rare sub-groups are sampled (assuming that these sub-groups are used as strata and that the product of the sub-group population size and the sampling rate is greater than one).
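The risk of missing a small sub-group under simple random sampling follows from the hypergeometric distribution: the probability that a sample of n units drawn without replacement from N contains none of the M sub-group members is C(N-M, n)/C(N, n). The sketch below evaluates this for hypothetical numbers (a frame of 100, a sub-group of 10, a sample of 10) and compares it with a 10% proportionate stratified sample, which always includes round(0.10 × 10) = 1 sub-group member. The function name `prob_subgroup_missed` is illustrative.

```python
from math import comb

def prob_subgroup_missed(N, M, n):
    """Probability that a simple random sample of n units (without replacement)
    from a frame of N contains none of the M members of a sub-group."""
    return comb(N - M, n) / comb(N, n)

# Hypothetical illustration: frame of 100, sub-group of 10, sample of 10.
N, M, n = 100, 10, 10
p_miss = prob_subgroup_missed(N, M, n)
print(f"SRS misses the sub-group with probability {p_miss:.3f}")

# Proportionate stratification at the same 10% rate fixes the sub-group's share.
stratified_count = round(n / N * M)
print(f"Stratified sample always contains {stratified_count} sub-group member(s)")
```

With these numbers a simple random sample misses the sub-group entirely roughly one time in three.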
A final advantage claimed for stratified sampling is that it allows different survey methods to be used for different strata. An example given by Stopher and Meyburg (1979), concerning stratification on the basis of access distance to a rail station, suggests that while the strata with shorter access distances may be surveyed by postal questionnaire, the stratum with the longest access distance should be surveyed by personal interview, on the basis that its members are less likely to be transit-oriented travellers. Whilst such variation in survey method is possible, care should be taken when comparing or combining the results for the different strata, because each survey method has its own built-in biases.
A variation on stratified sampling is the use of multiple stratifications. Instead of stratifying with respect to only one variable, the stratification can be performed with respect to several variables, thereby creating an n-dimensional matrix of stratification cells. In selecting the number of dimensions for stratification and the number of strata within each dimension, attention should be paid to the total number of stratification cells produced. Since the number of cells is the product of the number of strata in each dimension, it grows geometrically as dimensions are added, and a large number of cells can easily be produced inadvertently. In such a case, the average number of sampled units within each cell could be quite small (perhaps fractional). Under these conditions, the round-off errors that are unavoidable in drawing the sample could defeat the purpose of stratification unless carefully controlled.
The method by which stratification is conducted will depend largely on the structure of the sampling frame. In some sampling frames, the stratification may already have been performed when the list was compiled; for example, students at a University may already be categorised by Faculty. In such a case, an unrestricted random sample is drawn separately from each of the stratified lists. In other sampling frames, the list ordering may be completely random, but it may be known how many sampling units belong to each stratum and, therefore, how many units in the sample should come from each stratum. In this case, a random sample may be drawn from the entire list and, upon selection, each unit is placed in its correct stratum. When the required quota for a stratum has been filled, further selections from that stratum are rejected and another selection is made.
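Two points in the discussion above lend themselves to a sketch. First, the number of cells is the product of the strata counts, so, for instance, three variables with four strata each give 4 × 4 × 4 = 64 cells, and a sample of 200 would average only about 3.1 units per cell. Second, the quota procedure for a randomly ordered frame can be written as successive random draws with rejection. In this minimal Python illustration the frame, the `gender` lookup, the function names and the quotas (4 females, 6 males) are hypothetical, chosen to mirror Figure 4.5; shuffling the list and taking units in turn is equivalent to drawing at random without replacement.

```python
import random
from math import prod

def cell_count(strata_per_dimension):
    """Total stratification cells: the product of the strata in each dimension."""
    return prod(strata_per_dimension)

def quota_stratified_sample(frame, stratum_of, quotas, rng):
    """Select units at random from the whole frame, placing each in its stratum.
    Once a stratum's quota is filled, further selections from it are rejected."""
    remaining = dict(quotas)              # places still to fill in each stratum
    sample = {s: [] for s in quotas}
    order = list(frame)
    rng.shuffle(order)                    # taking units in this order = random draws without replacement
    for unit in order:
        s = stratum_of(unit)
        if remaining[s] > 0:              # accept: quota for this stratum not yet met
            sample[s].append(unit)
            remaining[s] -= 1
        # otherwise the selection is rejected and the next draw is taken
        if not any(remaining.values()):   # every quota filled
            break
    return sample

def gender(unit):
    """Hypothetical lookup: units 0-39 female, 40-99 male."""
    return "female" if unit < 40 else "male"

rng = random.Random(7)
employees = list(range(100))
rng.shuffle(employees)                    # list order carries no information about gender

print(cell_count([4, 4, 4]), 200 / cell_count([4, 4, 4]))
sample = quota_stratified_sample(employees, gender, {"female": 4, "male": 6}, rng)
print({s: sorted(units) for s, units in sample.items()})
```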
Finally, the concept of stratification can also be applied after the data have been collected by means of a simple random sample survey. In such a case, the survey results can be adjusted so that each stratum is represented in its correct proportion. Such weighted "expansion" (see Section 9.1) is frequently performed when the required stratification information does not become available until after the survey has been performed (e.g. when travel surveys are performed in Census years). However, such a procedure is strictly valid only when the sample size within each stratum is large enough for reasonable confidence to be placed in the result for that stratum.
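The adjustment can be sketched as follows. This is a minimal Python illustration of weighting stratum means by known population shares (often called post-stratification), which is equivalent to applying an expansion factor of N_h/n_h to each respondent in stratum h. The population counts follow Figure 4.4, the trip counts are invented purely for illustration, and the `min_n` threshold is a hypothetical stand-in for "a sufficiently large sample size" in each stratum.

```python
def expansion_factors(pop_counts, sample_counts):
    """Population units represented by each sampled unit, per stratum (N_h / n_h)."""
    return {s: pop_counts[s] / sample_counts[s] for s in pop_counts}

def poststratified_mean(values_by_stratum, pop_counts, min_n=5):
    """Mean of the survey variable with each stratum weighted by its population share.
    min_n is a hypothetical minimum stratum sample size below which the result is refused."""
    N = sum(pop_counts.values())
    estimate = 0.0
    for s, values in values_by_stratum.items():
        if len(values) < min_n:
            raise ValueError(f"stratum {s!r} has only {len(values)} responses")
        estimate += (pop_counts[s] / N) * (sum(values) / len(values))
    return estimate

# Population counts as in Figure 4.4; trip counts per respondent are invented.
pop_counts = {"female": 40, "male": 60}
trips = {"female": [2, 3, 2, 3, 2], "male": [4, 4, 5, 3, 4]}

unweighted = sum(sum(v) for v in trips.values()) / sum(len(v) for v in trips.values())
weighted = poststratified_mean(trips, pop_counts)
factors = expansion_factors(pop_counts, {s: len(v) for s, v in trips.items()})
print(unweighted, weighted, factors)
```

With these invented values the unweighted mean (3.2) understates the population-weighted mean (about 3.36), because females, who have the lower invented trip rate, are over-represented in the raw sample.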