Multiple Frame Sampling
7.3. PROBLEMS IN THE APPLICATION OF MULTIPLE FRAME SURVEYS
Multiple frame surveys include all the complexities of single frame surveys, as well as the additional requirement that the overlap between frames be determined (Vogel, 1975).
All farms in the list frame must be completely identified by name, address, and any other name forms that can identify the farming unit. The data collection for the area frame must also obtain detailed information on names and addresses, because the overlap is based on a name-matching exercise. A list of farms from a census identified only by an address will be difficult to use in a multiple frame context.
Another issue is that, when developing a list of farms, more than one source of names may be available. For example, one list of names may derive from the agricultural census and another from an administrative source. The choice is between using the two lists separately in the multiple frame design or combining them. If the two lists are used in a multiple frame design together with an area frame, the population can be divided into four mutually exclusive domains. The need to identify all domains when there are two or more list frames greatly complicates the survey and estimation process. For this reason, it is more practical to combine all lists and remove duplicates prior to sampling. While record linkage methodologies can assist in removing duplicates, the linkage process is itself subject to error.
The need to match names from the area frame sample to names on the list frame complicates the survey process and is subject to non-sampling errors. The area frame sampling unit must be mapped onto a reporting unit, as must the list frame name.
A further complexity arises from the fact that the list frame represents farms or households at a previous point in time. Some may no longer exist when the survey is conducted, possibly having been replaced by other farms that are included elsewhere in the list or that are new to the country’s agriculture. The following example illustrates the situations that may be encountered.
Suppose that when the census was conducted and used to develop the list frame, the name associated with Farm/Household 1 was “Mr. A”. Mr. A was thus selected for the list sample. However, when the enumerator visited the selected household, it was learned that Mr. A no longer lived there and that the land was now operated by a “Mr. B”. The enumerator collects data for the land operated by Mr. B. Rules of association are used to determine whether the statistical office can use the data reported by Mr. B, according to the following steps:
• Determine whether Mr. B may also be found elsewhere on the list. If yes, Mr. B already had a chance of being selected, such that either a multiplicity estimator can be used or the data for Farm 1 can be set to zero.
• If Mr. B does not appear elsewhere on the list, should the statistical office use Mr. B’s data for Farm 1? Suppose that the area frame sample contains land operated by Mr. B. Data for Mr. B will then be collected through the area frame. The name-matching exercise will show that Mr. B is not on the list; therefore, Farm 1 falls within the domain belonging only to the area frame. The data for Mr. B should not be substituted for Mr. A in the list sample, because this would result in an upward bias.
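The two steps above can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name, the returned labels, and the first branch (the selected name still operates the unit) are hypothetical and are not part of the rules as given in the text.

```python
def rule_of_association(selected_name, operator_found, list_frame_names):
    """Decide how to treat a sampled list unit after the field visit.

    The returned labels are hypothetical names used for illustration.
    """
    # Assumed baseline case: the name drawn from the list still operates the unit.
    if operator_found == selected_name:
        return "use_reported_data"
    # Step 1: the new operator appears elsewhere on the list and therefore
    # already had a chance of selection.
    if operator_found in list_frame_names:
        return "multiplicity_estimator_or_zero"
    # Step 2: the new operator is not on the list, so the unit belongs to the
    # area-frame-only domain and the data must not replace the selected name's.
    return "do_not_substitute"


# Hypothetical list frame for the Mr. A / Mr. B example.
list_frame = {"Mr. A", "Mr. C", "Mr. D"}
decision = rule_of_association("Mr. A", "Mr. B", list_frame)
print(decision)
```

In practice the membership test would rely on the name-matching exercise rather than on exact string equality.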
The need to match area frame reporting units with the list frame requires taking extreme care when recording names and addresses, to avoid errors. A misspelled name may result in an area reporting unit being assigned to the wrong name, or require a follow-up interview.
Non-response is especially problematic in a multiple frame design, particularly for the area frame. Because the area frame tract, or the area surrounding a data point, can be observed and measured directly, area frame results are robust to non-response. However, in a multiple frame design, it is essential that a name be associated with the area tract. The difficulty of assigning names where there is non-response is a major source of non-sampling error in multiple frame sample surveys.
A list and area multiple frame sampling design should yield more efficient and robust estimators than the use of an area frame alone. However, if the list frame is not constructed carefully or is not updated, outliers can occur when rare or large farms are missing from the list and appear in the area non-overlap domain. Had such a farm been on the list frame, its design-based expansion factor would have been very small; in the area frame’s non-overlap domain, it instead receives a large expansion factor.
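The effect of the expansion factor can be illustrated with simple arithmetic. The sketch below assumes that the expansion factor is the inverse of the sampling fraction; the farm size and both sampling fractions are invented for illustration and do not come from any survey.

```python
def expansion_factor(units_sampled, units_in_frame):
    """Design-based expansion factor, taken here as the inverse sampling fraction."""
    return units_in_frame / units_sampled


# Hypothetical large farm and hypothetical sampling fractions.
farm_area_ha = 5_000
list_factor = expansion_factor(1, 2)     # large-farm list stratum sampled 1 in 2
area_factor = expansion_factor(1, 100)   # area frame segments sampled 1 in 100

# Contribution of the same farm to the estimated total under each frame.
list_contribution = farm_area_ha * list_factor
area_contribution = farm_area_ha * area_factor
print(list_contribution, area_contribution)
```

With these assumed figures, the same farm contributes fifty times more to the estimate when it is picked up in the area non-overlap domain than when it is sampled from the list, which is how a single missing large farm can become an outlier.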
A common problem is the temptation to make the list as large as possible to avoid the occurrence of outliers. However, the larger the list, the more subject it is to duplication. The smaller the list, the easier it is to avoid duplication and to determine the non-overlap domain.
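Removing duplicates when several name sources are combined into one list can be sketched as follows. This is a minimal illustration, assuming that two records are duplicates when their normalized name and address agree exactly; real record linkage is considerably more involved. The record fields, the helper names, and the sample records are all hypothetical.

```python
import re


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(cleaned.split())


def combine_lists(*lists):
    """Merge name lists, keeping the first record seen for each (name, address) key."""
    combined = {}
    for records in lists:
        for record in records:
            key = (normalize(record["name"]), normalize(record["address"]))
            combined.setdefault(key, record)
    return list(combined.values())


# Hypothetical records from a census list and an administrative list.
census = [
    {"name": "A. Banda", "address": "Plot 4, North Rd."},
    {"name": "C. Phiri", "address": "Plot 9, Hill Rd."},
]
admin = [
    {"name": "a banda", "address": "plot 4 north rd"},
    {"name": "D. Zulu", "address": "Plot 2, Lake Rd."},
]

list_frame = combine_lists(census, admin)
print(len(list_frame))
```

Exact matching on normalized keys misses spelling variants, so it understates duplication; the larger the list, the more such undetected duplicates are likely to remain.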
Another common problem is the temptation to add names found in the area frame survey to the list frame. These additions introduce a downward bias, because the probabilities used in estimation are changed (reduced) for the units that are added to the list.
To determine the domains, it must first be assumed that every farm or household included in the list frame has a chance of being selected in the area frame sample. The area frame sample then includes two domains: units that are not on the list (non-overlap) and units that are on the list (overlap). An essential point is that the identification of the two area frame domains must be based on the area frame sampled units, and not on the entire area frame.
As stated before, the area frame sampling unit is land-based. The survey process must associate names of farm operators and/or households with each sampled unit of land. This provides a listing of names associated with the area frame sample that is to be matched with the entire list frame (not the list sample). The accuracy of this matching process depends on the quality of the data collection effort on the area frame side, and on the quality of the list frame development process.
If the name associated with an area frame reporting unit is also on the list frame, then the area frame unit is in the overlap domain. This requires assuming that the name on the list would report for the same unit of land selected in the area frame sample. If the name associated with the area frame reporting unit cannot be found on the list frame, then that area frame reporting unit is in the non-overlap domain. If a name from the area frame does not match a list frame name exactly, but is close, then it may be possible to compare addresses or secondary names associated with the reporting units. In some cases, it may be necessary to return to the area frame reporting unit to obtain additional information to determine the overlap status. This is an exacting process, in which errors in classification add error and/or bias to the estimators.
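The classification of area frame reporting units into domains can be sketched as a name-matching routine run against the entire list frame. The sketch below is an assumption-laden illustration, not a prescribed method: it uses Python's standard `difflib.SequenceMatcher` as a stand-in for whatever similarity measure a statistical office adopts, and the 0.85 threshold, the `review` label, and the sample names are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, drop full stops, and collapse whitespace."""
    return " ".join(name.lower().replace(".", " ").split())


def classify(area_name, list_frame_names, review_threshold=0.85):
    """Assign an area frame reporting unit to a domain by name matching.

    Returns "overlap" for an exact match, "review" for a close match that
    needs addresses or secondary names checked, and "non-overlap" otherwise.
    """
    target = normalize(area_name)
    candidates = [normalize(n) for n in list_frame_names]
    if target in candidates:
        return "overlap"
    best = max(
        (SequenceMatcher(None, target, c).ratio() for c in candidates),
        default=0.0,
    )
    return "review" if best >= review_threshold else "non-overlap"


# Hypothetical entire list frame and names collected from the area sample.
list_frame = ["John Mwale", "Grace Tembo"]
for name in ["john mwale", "Jon Mwale", "Peter Daka"]:
    print(name, "->", classify(name, list_frame))
```

Units flagged for review correspond to the cases in which addresses or secondary names are compared, or the reporting unit is revisited, before the overlap status is settled.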