7.2. PRINCIPLES OF MULTIPLE FRAME SAMPLING

The list and area frames can be developed independently, and samples can be selected separately from each frame, in single or in multiple stages. The final sample of areas of land or points from the area frame and a sample of farms from the list frame must be selected independently.

Two main assumptions must be made when using multiple frame sampling:

• Completeness. Every farm in the population belongs to at least one frame. While the area frame sampling unit is an area of land, the associated name may be required to report all crops and livestock “regardless of ownership” on the land in their holding. While the sampling unit for the list frame is a name, the reporting unit is the holding, and the holder must report all activities occurring on the land in the holding “regardless of ownership”. This connects the entire target population to a unit of land. In this case, the frame is complete. However, it may not be practical to request operators to report livestock numbers that do not belong to them. If the choice of reporting unit requires farm operators with no land but having cattle to report those cattle, the list will contain names with no land. In this case, the area frame will be complete only if the sample design includes residential areas with samples of households selected and screened for agricultural activities.

• Identifiability. For any sample unit from any frame, it is possible to determine whether the reporting unit belongs to any other frame. The use of an area frame means, by definition, that every list frame reporting unit overlaps with the area frame. The requirement of identifiability is met by determining which area frame reporting units can also be selected from the list frame.

The sampling unit for the area frame is a segment of land or a point. Rules of association are used to link the land in the segment or point to a farm that is also found on the list frame, usually using the name of the farm operator. The sampling unit for the list frame is the name of a farm operator, while the reporting unit is the holding operated by the named person. A final assumption is that the overlap between the two frames can be determined by matching names. When an area frame is used, it is by definition complete, and thus overlaps completely with the list frame.

The basic theory of multiple frame sampling (Hartley, 1962; Kott and Vogel, 1995) begins with dividing the population into mutually exclusive domains. Figure 7.1 shows two sampling frames that cover the same target population and form three domains:

• a, a non-overlapping domain containing units belonging only to Frame A;

• b, a non-overlapping domain containing units belonging only to Frame B; and

• ab, an overlapping domain containing units belonging to both Frames A and B.

FIGURE 7.1

The population total Y can be written as

Y = Ya + Yab + Yb

If Frame A is an area frame, the population total for Ya is based on land in farms having farm operators whose names do not appear on the list frame. Yab is the population total for farms that could be selected from either Frame A or Frame B. If Frame B is a list frame, Yb is the population total for farms on the list frame that have operators whose names cannot be associated with land in the area frame. This may happen when the name represents a person who owns livestock but does not operate any land. Whether or not the area frame is complete for these types of reporting units depends on how the rules of association are defined and the screening of households for farm operators is performed. It is usual practice for the area frame design and rules of association to be such that Yb is zero.
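The domain decomposition can be illustrated with a minimal Python sketch. The toy population, the farm identifiers and the helper names (`domain_of`, `domain_totals`) are hypothetical and serve only to show how frame membership determines the domains a, ab and b, and that the domain totals add up to the population total.

```python
# Hypothetical toy population: farm id -> (value y, in area frame?, in list frame?)
population = {
    "farm_1": (120.0, True, False),   # only reachable through the area frame
    "farm_2": (300.0, True, True),    # reachable through both frames
    "farm_3": (80.0, True, True),     # reachable through both frames
    "farm_4": (45.0, False, True),    # e.g. livestock owner operating no land
}


def domain_of(in_area: bool, in_list: bool) -> str:
    """Return the domain label (a, ab or b) implied by frame membership."""
    if in_area and in_list:
        return "ab"
    if in_area:
        return "a"
    if in_list:
        return "b"
    # A unit in neither frame violates the completeness assumption.
    raise ValueError("unit belongs to no frame")


def domain_totals(pop):
    """Sum y within each of the three mutually exclusive domains."""
    totals = {"a": 0.0, "ab": 0.0, "b": 0.0}
    for y, in_area, in_list in pop.values():
        totals[domain_of(in_area, in_list)] += y
    return totals


totals = domain_totals(population)
population_total = sum(totals.values())  # Y = Ya + Yab + Yb
```

If the rules of association linked `farm_4` to a residential segment in the area frame, it would move to domain ab and the total for domain b would be zero, which is the usual practice described above.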

This corresponds to a simplified scenario, in which the list frame is embedded into the area frame in a dual-frame design with only two domains:

• a, a non-overlapping domain containing units belonging only to the area frame (Frame A); and

• ab, an overlapping domain containing units belonging to both area and list frames (Frames A and B). In this case, ab is indeed the complete list frame.

To emphasize the nature of the frames, from now on the term “Frame A” refers to an area frame and “Frame L” refers to a list frame. Figure 7.2 illustrates the idea of a full coverage area frame and how the two domains, formed by the area and list frame together, can be viewed as either the union of the domain a (from Frame A) and the list frame L, or as the union of domains a and ab, both from Frame A.

FIGURE 7.2

Using the dual-frame design shown in Figure 7.2 above, the population total Y can be written in a simpler form:

Y = Ya + Yab = Ya + YL

This expression shows that estimates for the population total Y can be produced by adding estimates for the total of each domain:

Ŷ = Ŷa + Ŷab

Although the overlapping domain can be represented by either ab or L, the estimators Ŷab and ŶL use data and sample design information from different frames. Therefore, Hartley (1962) proposed using both in the same expression to define a general class of estimators, as follows:

ŶH = Ŷa + p Ŷab + (1 − p) ŶL

where

• ŶH is Hartley’s general estimator, also referred to as the full population total estimator, making use of data from both frames;

• Ŷa is the domain a estimator, based only on the area frame sample data;

• Ŷab is the domain ab estimator, based only on the area frame sample data;

• ŶL is the estimator for the list frame total, based only on the list frame sample data; and

• p is an arbitrary constant, such that 0 ≤ p ≤ 1.
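As a minimal sketch, the class of estimators can be written in Python, assuming the form ŶH = Ŷa + p Ŷab + (1 − p) ŶL, which is consistent with p = 0 yielding the screening estimator. The function names, the sample values and the inclusion probabilities are hypothetical. Each domain estimate is computed here as a simple expansion total (the sum of y divided by its inclusion probability), which is only one of several design-based estimators that could be used.

```python
def expansion_total(sample):
    """Expansion estimate of a total from (y, inclusion_probability) pairs."""
    return sum(y / pi for y, pi in sample)


def hartley_estimate(y_a_hat, y_ab_hat, y_l_hat, p):
    """Combine the domain estimates as Y_a + p * Y_ab + (1 - p) * Y_L."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return y_a_hat + p * y_ab_hat + (1.0 - p) * y_l_hat


# Hypothetical sample data: (reported value, inclusion probability)
area_sample_a = [(10.0, 0.25), (20.0, 0.25)]   # area units not on the list
area_sample_ab = [(50.0, 0.25)]                # area units also on the list
list_sample = [(60.0, 0.5), (45.0, 0.5)]       # units sampled from the list

y_a_hat = expansion_total(area_sample_a)    # 120.0
y_ab_hat = expansion_total(area_sample_ab)  # 200.0
y_l_hat = expansion_total(list_sample)      # 210.0

estimate_mixed = hartley_estimate(y_a_hat, y_ab_hat, y_l_hat, p=0.5)      # 325.0
estimate_list_only = hartley_estimate(y_a_hat, y_ab_hat, y_l_hat, p=0.0)  # 330.0
estimate_area_only = hartley_estimate(y_a_hat, y_ab_hat, y_l_hat, p=1.0)  # 320.0
```

With p = 1 the overlap domain is estimated from the area sample alone, and with p = 0 from the list sample alone. Intermediate values blend the two estimates of the same overlap total.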

Indeed, each choice of p defines an estimator. The so-called screening estimator, which corresponds to choosing p = 0, is very convenient. In this case,

ŶH = Ŷa + ŶL

To apply this estimator, the sampled units from the area frame that are listed in the list frame are eliminated. In practice, a screening procedure is applied to the area frame sampled units to identify area reporting units that are also present in the list frame; hence the name “screening estimator”. In some cases, the domains can be determined from a large-scale area frame survey, and then only the non-overlap domain is sampled for subsequent surveys.

Since the area and list frames were developed and sampled independently, the design-based estimators for Ya and Yab are used to estimate the corresponding population values. However, the population total for farms found in both frames Yab can be estimated using design-based estimators from either Frame A or B or from a combination of both frames. A large proportion of the literature on multiple frame sampling concerns estimation when the sampling units are represented by both frames.
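The screening step can be sketched in Python as follows. This is an illustration under simplifying assumptions, not a prescribed procedure: the overlap is detected by exact matching of operator names after basic normalization, whereas real name matching usually needs more careful record linkage. The record layout (`operator`, `y`, `weight`), the names and the function names are all hypothetical.

```python
def normalize(name: str) -> str:
    """Lower-case a name and collapse whitespace so trivial variants match."""
    return " ".join(name.casefold().split())


def screen_area_sample(area_sample, list_frame_names):
    """Keep only the area sample records whose operator is not on the list frame."""
    listed = {normalize(n) for n in list_frame_names}
    return [rec for rec in area_sample if normalize(rec["operator"]) not in listed]


def screening_estimate(area_sample, list_sample, list_frame_names):
    """Screening estimator: weighted non-overlap area total plus weighted list total."""
    non_overlap = screen_area_sample(area_sample, list_frame_names)
    y_a_hat = sum(rec["y"] * rec["weight"] for rec in non_overlap)
    y_l_hat = sum(rec["y"] * rec["weight"] for rec in list_sample)
    return y_a_hat + y_l_hat


# Hypothetical data; each weight is the inverse of the inclusion probability.
list_frame_names = ["Ana Silva", "John Doe", "Mary Okoye"]
area_sample = [
    {"operator": "ana  silva", "y": 50.0, "weight": 4.0},    # on the list: screened out
    {"operator": "Peter Mwangi", "y": 10.0, "weight": 4.0},  # domain a
    {"operator": "Li Wei", "y": 20.0, "weight": 4.0},        # domain a
]
list_sample = [
    {"operator": "John Doe", "y": 60.0, "weight": 2.0},
    {"operator": "Mary Okoye", "y": 45.0, "weight": 2.0},
]

kept = screen_area_sample(area_sample, list_frame_names)
estimate = screening_estimate(area_sample, list_sample, list_frame_names)  # 330.0
```

In this sketch the area sample contributes only through the two records that fall in domain a, while the overlap domain is estimated entirely from the list sample.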

Another feature of multiple frame sample designs is that each frame can be designed independently of the other. Multi-stage sampling, stratification, probability proportional to size (PPS) sampling, and calibration estimators can each be applied and optimized separately for each frame. It remains necessary to determine, for each area frame sampling or reporting unit, whether or not it could also have been selected from the list frame. This requirement complicates the use of multiple frames. Section 7.3 below outlines problems in the application of multiple frame surveys.

7.3. PROBLEMS IN THE APPLICATION OF MULTIPLE FRAME SURVEYS