Multiple Frame Sampling

7.5. DUAL FRAME ESTIMATOR

Dual frame estimators combine estimates for the domains generated by the two frames covering the target population. Since independent probability sampling schemes are applied to each frame, to introduce the estimators, it is necessary to distinguish the inclusion probabilities related to each frame.

Let SA denote the sample selected from the area frame and SL be the sample selected from the list frame.

Define π_k^A as the probability that unit k is included in the area frame sample, and π_kl^A as the probability that both k and l are in SA.

Similarly, define π_k^L as the probability that k is included in the list frame sample, and π_kl^L as the probability that both k and l are in SL.
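To make the role of the frame-specific inclusion probabilities concrete, the following is a minimal sketch of how domain totals could be estimated from each sample with Horvitz-Thompson weights (the inverse of the inclusion probability). The sample records, the `in_list` flag and the function name `ht_total` are hypothetical and serve only as an illustration; the sketch assumes that each area-sample unit can be classified as belonging to the list frame or not.

```python
def ht_total(sample):
    """Horvitz-Thompson total: sum of y / pi over (y, pi) pairs."""
    return sum(y / pi for y, pi in sample)

# Hypothetical area-frame sample S_A: (y, inclusion probability, on list frame?)
area_sample = [
    (10.0, 0.1, False),
    (4.0, 0.2, True),
    (6.0, 0.1, False),
    (8.0, 0.2, True),
]
# Hypothetical list-frame sample S_L: (y, inclusion probability)
list_sample = [(5.0, 0.5), (7.0, 0.25)]

# Area-sample estimate for domain a (units not on the list frame)
y_a = ht_total((y, pi) for y, pi, in_list in area_sample if not in_list)
# Area-sample estimate for the overlapping domain ab
y_ab = ht_total((y, pi) for y, pi, in_list in area_sample if in_list)
# List-sample estimate for the overlapping domain
y_L = ht_total(list_sample)

print(y_a, y_ab, y_L)
```

The three quantities play the roles of Ŷa, Ŷab and ŶL in the estimators discussed in the following subsections.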

In the following subsections, the main dual frame estimators proposed in the literature are presented.

7.5.1. Hartley and the screening estimator

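As mentioned previously, Hartley’s general class of dual frame estimators is given by:

ŶH = Ŷa + p Ŷab + (1 − p) ŶL

where Ŷa and Ŷab are the area-sample estimators of the totals for the non-overlapping domain a and the overlapping domain ab, ŶL is the list-sample estimator of the overlapping-domain total, and p is a constant.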

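Although independent samples are taken from each frame, the general variance form must account for the covariance between the two area-frame estimators, Ŷa for the non-overlapping domain a and Ŷab for the overlapping domain ab, which results in the expression below:

Var(ŶH) = Var(Ŷa) + p² Var(Ŷab) + (1 − p)² Var(ŶL) + 2p Cov(Ŷa, Ŷab)   (1)

Any value of p such that 0 ≤ p ≤ 1 can be used. One of these is optimal, in the sense that it minimizes the variance Var(ŶH). Note that Expression (1) above can also be written as

Var(ŶH) = Var(Ŷa) + Var(ŶL) + p² [Var(Ŷab) + Var(ŶL)] − 2p [Var(ŶL) − Cov(Ŷa, Ŷab)]

Therefore, setting the derivative with respect to p equal to zero gives

po = [Var(ŶL) − Cov(Ŷa, Ŷab)] / [Var(Ŷab) + Var(ŶL)]

Thus, the best choice for p (in the sense that it estimates the optimum value po) is obtained by replacing the variances and the covariance with consistent estimators v(·) and c(·,·):

p̂o = [v(ŶL) − c(Ŷa, Ŷab)] / [v(Ŷab) + v(ŶL)]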
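The choice of p can be illustrated numerically. The sketch below assumes that Hartley's estimator has the form Ŷa + p Ŷab + (1 − p) ŶL, with a variance made up of the three variance terms plus the area-frame covariance term 2p Cov(Ŷa, Ŷab). The function names and the input values are hypothetical, and the clipping of p to the interval [0, 1] is an added assumption that reflects the admissible range stated above.

```python
def hartley_estimate(p, y_a, y_ab, y_L):
    """Hartley-type combination of the three domain estimates."""
    return y_a + p * y_ab + (1 - p) * y_L

def hartley_variance(p, var_a, var_ab, var_L, cov_a_ab):
    """Variance of the combination, including the area-frame covariance."""
    return var_a + p ** 2 * var_ab + (1 - p) ** 2 * var_L + 2 * p * cov_a_ab

def optimal_p(var_ab, var_L, cov_a_ab):
    """Minimizer of the quadratic in p, clipped to [0, 1] (assumption)."""
    p = (var_L - cov_a_ab) / (var_ab + var_L)
    return min(1.0, max(0.0, p))

# Hypothetical variance and covariance estimates
var_a, var_ab, var_L, cov_a_ab = 400.0, 900.0, 100.0, 20.0

p_hat = optimal_p(var_ab, var_L, cov_a_ab)
v_opt = hartley_variance(p_hat, var_a, var_ab, var_L, cov_a_ab)
v_screen = hartley_variance(0.0, var_a, var_ab, var_L, cov_a_ab)
print(p_hat, v_opt, v_screen)
```

Because the variance is a convex quadratic in p, clipping the unconstrained minimizer to [0, 1] yields the best admissible value. Setting p = 0 in `hartley_variance` gives the variance of the screening estimator.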

In practice, the estimated optimum value of p can be very close to zero. In such cases, adopting the screening estimator (choosing p = 0) is convenient and advantageous. Recall the simplicity of the screening estimator expression:

ŶS = Ŷa + ŶL

Given that the domains are mutually exclusive and the estimators use information from different frames, the variance of the screening estimator is simply:

Var(ŶS) = Var(Ŷa) + Var(ŶL)

This same variance formula would be obtained if a stratified sample design were applied, with strata corresponding to the domains. Consistent estimators for the variance of Hartley’s estimator and the screening estimator can be derived using the expressions shown in Section 4.

7.5.2. The Fuller-Burmeister estimator

Fuller and Burmeister (1972) also proposed a general class of estimators. Their estimation approach incorporates area sample information on the size of the overlapping domain (Nab). The expression below emphasizes this feature as a term to be added to the Hartley estimator:

ŶFB = Ŷa + p1 Ŷab + (1 − p1) ŶL + p2 (N̂ab − N̂L)

where N̂ab and N̂L are the area-sample and list-sample estimators of the size of the overlapping domain.

Rearranging the terms, the constants p1 and p2 can also be emphasized as the coefficients of a regression-type estimator:

ŶFB = (Ŷa + ŶL) + p1 (Ŷab − ŶL) + p2 (N̂ab − N̂L)

If the partial correlation between (Ŷa + ŶL) and (N̂ab − N̂L), given (Ŷab − ŶL), is not zero, the Fuller-Burmeister estimator is expected to be statistically more efficient than the Hartley estimator.

Optimum choices for p1 and p2, in the sense that they estimate the values that minimize the variance Var(ŶFB), are given by:
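As an illustration of how such coefficients can be computed, the sketch below assumes the regression-type form (Ŷa + ŶL) + p1 (Ŷab − ŶL) + p2 (N̂ab − N̂L). Under that assumption, minimizing the variance leads to a pair of linear equations in p1 and p2, which are solved here directly. The function name, the argument names and the numerical inputs are hypothetical; in practice the inputs would be estimated variances and covariances.

```python
def fb_coefficients(v11, v22, v12, c1, c2):
    """Solve the 2x2 system that minimizes
    Var(Z0 + p1*Z1 + p2*Z2), where (assumed notation)
      Z0 = Ya + YL, Z1 = Yab - YL, Z2 = Nab - NL,
      v11 = Var(Z1), v22 = Var(Z2), v12 = Cov(Z1, Z2),
      c1 = Cov(Z0, Z1), c2 = Cov(Z0, Z2).
    """
    det = v11 * v22 - v12 ** 2
    if det == 0:
        raise ValueError("covariance matrix of (Z1, Z2) is singular")
    p1 = -(v22 * c1 - v12 * c2) / det
    p2 = -(v11 * c2 - v12 * c1) / det
    return p1, p2

# Hypothetical variance and covariance estimates
v11, v22, v12, c1, c2 = 1000.0, 50.0, 100.0, -80.0, -10.0
p1, p2 = fb_coefficients(v11, v22, v12, c1, c2)
print(p1, p2)
```

Setting p2 = 0 and solving only the first equation recovers a Hartley-type choice of p, which is one way to see the Fuller-Burmeister class as an extension of Hartley's.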

A consistent estimator for the variance of the Fuller-Burmeister estimator is given below:

]

7.5.3. The Skinner-Rao estimator

Skinner and Rao (1996) noted that the Fuller-Burmeister estimator, based on the optimum values for the coefficients, is not a simple linear combination of the y values.

This is because the estimated values of p1 and p2 are chosen to minimize Var(ŶFB), and therefore depend on the variable of interest. To develop an estimator that is a simple weighted combination of the variables of interest, Skinner and Rao proposed a pseudo-maximum likelihood estimator that can be expressed as follows:

.

In this expression,

;

; and is the smallest root of the quadratic equation

, (2) where

;

and

To achieve the desired linear simplicity for their estimator, Skinner and Rao (1996) suggest choosing the value for

p that minimizes not the variance of their estimator ŶSR, but rather the asymptotic variance of . The linearized

Let and be consistent estimators for the asymptotic variances of the respective estimators. Then, the value for p that minimizes the asymptotic variance of is

Therefore, the best choice for the constant is

The approximated variance of the Skinner-Rao estimator can be expressed as the sum of two components, one related to each frame:

The first component can be written as a function of Frame A sampling weights:

A consistent estimator for the above expression is provided by

where

. Similarly, write the second component as a function of frame L sampling weights:

Then, a consistent estimator for the variance approximated above is given by

.

Therefore,

7.5.4. Single frame-type estimator

The Skinner-Rao estimator is one example of an estimator that can be expressed as a single frame-type estimator. Previously, Bankier (1986), Kalton and Anderson (1986) and Skinner (1991) had proposed estimators that fit into a class whose defining feature is that estimates are produced as the sum of estimates from each frame. Let ŸA be the estimator applied to the area frame data and ŸL be the estimator applied to the list frame data. Then, the general form of this class of estimators is given by:

where

; and

Since the samples are taken independently from each frame, the variance and the variance estimator of each estimator in this class can be derived separately for each frame, and the two components summed to provide the respective dual-frame measure.
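One common way to obtain a single frame-type estimator, in the spirit of the proposals cited above, is to weight each sampled unit by the inverse of its overall probability of selection across both frames. The sketch below is an illustration under that assumption only. It requires both inclusion probabilities to be known for every sampled unit, and the record layout and function name are hypothetical.

```python
def single_frame_estimate(area_sample, list_sample):
    """Each record is (y, pi_A, pi_L); pi_L = 0.0 marks a unit that is
    not on the list frame. Overlap units receive the weight
    1 / (pi_A + pi_L); other units receive 1 / pi_A."""
    def weight(pi_a, pi_l):
        return 1.0 / (pi_a + pi_l)

    y_A = sum(y * weight(pa, pl) for y, pa, pl in area_sample)
    y_L = sum(y * weight(pa, pl) for y, pa, pl in list_sample)
    # The dual-frame estimate is the sum of the two frame components
    return y_A, y_L, y_A + y_L

# Hypothetical samples
area_sample = [(10.0, 0.1, 0.0), (4.0, 0.2, 0.3)]
list_sample = [(6.0, 0.1, 0.4)]

y_A, y_L, total = single_frame_estimate(area_sample, list_sample)
print(y_A, y_L, total)
```

Since the two components come from independent samples, their variances can be estimated separately and added.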

7.5.5. Choosing among dual-frame estimators

The estimators introduced above display differing levels of complexity, depending on how they provide estimates for the overlapping domain Yab. All estimators can be extended to more than two frames, and they are also able to accommodate complex sample designs (Ferraz, 2015). Considering the level of complexity involved in the data collection and in matching sampled units between the frames (Vogel, 1975), it is recommended that the estimator be chosen on the basis of simplicity. Screening estimators are the simplest to understand and apply in practice, leaving the more complex aspects to the data collection and matching process. Therefore, it is recommended that these be used as a valid starting point. At a later stage, the precision of estimates can be improved by using simulation studies to investigate the feasibility and efficiency of other estimators. Studies that compare the statistical performance of dual-frame estimators should take into account the peculiarities of each country and the specific probability sample designs.
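A simulation study of the kind suggested above can be sketched in a few lines. The example below is purely illustrative. It assumes a hypothetical population of 1 000 holdings, of which the 200 largest are on the list frame, and simple random sampling without replacement in both frames. It compares the screening estimator (p = 0) with a Hartley-type estimator using p = 0.5. All names, sizes and distributions are assumptions made for the illustration, not features of any real survey.

```python
import random
import statistics

def simulate(p, reps=2000, seed=1):
    """Monte Carlo relative bias and variance of Ya + p*Yab + (1-p)*YL."""
    rng = random.Random(seed)
    N, N_L, n_A, n_L = 1000, 200, 100, 50
    # Units 0..N_L-1 are on the list frame (large holdings); the rest are not
    y = [rng.gauss(100, 30) for _ in range(N_L)]
    y += [rng.uniform(0, 20) for _ in range(N - N_L)]
    true_total = sum(y)
    estimates = []
    for _ in range(reps):
        s_A = rng.sample(range(N), n_A)    # area-frame sample
        s_L = rng.sample(range(N_L), n_L)  # list-frame sample
        y_a = N / n_A * sum(y[k] for k in s_A if k >= N_L)
        y_ab = N / n_A * sum(y[k] for k in s_A if k < N_L)
        y_L = N_L / n_L * sum(y[k] for k in s_L)
        estimates.append(y_a + p * y_ab + (1 - p) * y_L)
    rel_bias = (statistics.mean(estimates) - true_total) / true_total
    return rel_bias, statistics.variance(estimates)

bias_screen, var_screen = simulate(p=0.0)
bias_half, var_half = simulate(p=0.5)
print(bias_screen, var_screen)
print(bias_half, var_half)
```

In this particular setup the list sample estimates the overlapping domain much more precisely than the area sample, so the screening estimator is expected to show the smaller variance. A different population or sample design could reverse the ranking, which is why such comparisons need to be tailored to each country and design.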

To improve precision, one should not only search for competitive dual-frame estimators, but also ascertain how to incorporate auxiliary variables – that may be available through at least one of the frames – into the inference process.