CHAPTER 1
2.3 COMPARISON BETWEEN FE AND PD ESTIMATORS
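$$\hat{\boldsymbol\beta}_{FE} = \Big[\sum_g \tilde{\boldsymbol X}_g'\tilde{\boldsymbol X}_g\Big]^{-1}\sum_g \tilde{\boldsymbol X}_g'\tilde{\boldsymbol y}_g;$$

$$\hat{\boldsymbol\beta}_{PD} = \Big[\sum_g N_g\,\tilde{\boldsymbol X}_g'\tilde{\boldsymbol X}_g\Big]^{-1}\sum_g N_g\,\tilde{\boldsymbol X}_g'\tilde{\boldsymbol y}_g.$$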
A comparison of the two formulae shows that 𝜷̂𝑃𝐷 imposes the density weights 𝑁𝑔 on both 𝑿̃𝑔′𝑿̃𝑔 (the matrix "denominator") and 𝑿̃𝑔′𝒚̃𝑔 (the matrix "numerator") of 𝜷̂𝐹𝐸 for each cluster 𝑔. Because of the matrix inverse, however, it is hard to see how this weighting makes 𝜷̂𝑃𝐷 deviate from 𝜷̂𝐹𝐸.
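To make the two formulae concrete, the sketch below computes both estimators on simulated clustered data. It assumes (this section does not say so) that the tilde denotes within-cluster demeaning; the data-generating process, the seed, and the helper names `beta_fe` and `beta_pd` are hypothetical. It also checks one way to read the density weights: after stacking the transformed data, FE is the unweighted pooled regression, while PD coincides with weighted least squares in which every observation of cluster 𝑔 receives weight 𝑁𝑔.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data-generating process (not from the text): G clusters of unequal
# size N_g, K regressors, a cluster fixed effect, and a common slope vector beta.
G, K = 5, 2
beta = np.array([1.0, -0.5])
sizes = rng.integers(5, 50, size=G)                # N_g

Xt, yt = [], []
for n in sizes:
    X = rng.normal(size=(n, K))
    y = rng.normal() + X @ beta + rng.normal(size=n)
    Xt.append(X - X.mean(axis=0))                  # X~_g (assumed: within-cluster demeaning)
    yt.append(y - y.mean())                        # y~_g

def beta_fe(Xt, yt):
    """[sum_g X~_g'X~_g]^{-1} sum_g X~_g'y~_g"""
    B = sum(X.T @ X for X in Xt)
    A = sum(X.T @ y for X, y in zip(Xt, yt))
    return np.linalg.solve(B, A)

def beta_pd(Xt, yt, sizes):
    """[sum_g N_g X~_g'X~_g]^{-1} sum_g N_g X~_g'y~_g"""
    B = sum(n * (X.T @ X) for n, X in zip(sizes, Xt))
    A = sum(n * (X.T @ y) for n, X, y in zip(sizes, Xt, yt))
    return np.linalg.solve(B, A)

# Stacked view: FE is OLS on the pooled transformed data; PD is the same
# regression with every observation in cluster g weighted by N_g.
Xs, ys = np.vstack(Xt), np.concatenate(yt)
w_obs = np.repeat(sizes, sizes)                    # weight N_g repeated N_g times
b_ols = np.linalg.lstsq(Xs, ys, rcond=None)[0]
b_wls = np.linalg.solve(Xs.T @ (w_obs[:, None] * Xs), Xs.T @ (w_obs * ys))
```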
This section discusses two conditions under which FE and PD are equivalent, and how PD deviates from FE when these conditions are relaxed. The discussion is inspired by a pair of concepts from number theory, the mediant and the weighted mediant, which are analogous to FE and PD. These concepts are introduced in the following subsection.
2.3.1 Mediant and Weighted Mediant
A mediant 𝑚 for a sequence of 𝑛 fractions 𝑎1/𝑏1, 𝑎2/𝑏2, … , 𝑎𝑛/𝑏𝑛 is defined as

$$m = \frac{a_1 + a_2 + \dots + a_n}{b_1 + b_2 + \dots + b_n} = \Big(\sum_{i=1}^{n} b_i\Big)^{-1}\sum_{i=1}^{n} a_i,$$
where each 𝑎𝑖 is a nonnegative real number and each 𝑏𝑖 is a positive real number. If 𝑤1, 𝑤2, … , 𝑤𝑛 are 𝑛 positive real numbers, then a weighted mediant is defined as
$$m_w = \frac{w_1a_1 + w_2a_2 + \dots + w_na_n}{w_1b_1 + w_2b_2 + \dots + w_nb_n} = \Big(\sum_{i=1}^{n} w_ib_i\Big)^{-1}\sum_{i=1}^{n} w_ia_i.$$
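The two scalar definitions can be sketched with exact rational arithmetic using Python's `fractions.Fraction`; the function names are illustrative. For 1/2 and 3/4 the mediant is 4/6 = 2/3, and putting weight 3 on the larger fraction raises the weighted mediant to (1 + 9)/(2 + 12) = 5/7.

```python
from fractions import Fraction

def mediant(a, b):
    """Mediant (a_1 + ... + a_n) / (b_1 + ... + b_n) of the fractions a_i / b_i."""
    return Fraction(sum(a)) / Fraction(sum(b))

def weighted_mediant(a, b, w):
    """Weighted mediant (sum_i w_i a_i) / (sum_i w_i b_i)."""
    num = sum(Fraction(wi) * ai for wi, ai in zip(w, a))
    den = sum(Fraction(wi) * bi for wi, bi in zip(w, b))
    return num / den

# Fractions 1/2 and 3/4
a, b = [1, 3], [2, 4]
print(mediant(a, b))                   # 2/3
print(weighted_mediant(a, b, [1, 3]))  # 5/7
```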
Notably, 𝜷̂𝐹𝐸 and 𝜷̂𝑃𝐷 can be viewed, in matrix form, as the mediant and the weighted mediant, respectively, of the sequence of cluster-level fractions [𝑿̃𝑔′𝑿̃𝑔]⁻¹𝑿̃𝑔′𝒚̃𝑔. Unfortunately, the literature on the mediant deals with scalars, not vectors or matrices. To avoid applying mediant theory to the comparison of FE and PD without justification, this paper selects the relevant results and then conducts Monte Carlo simulations to verify their applicability to the matrix setting.
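The mediant reading of FE and PD can be sketched on simulated data. The data-generating process, the seed, and the helper `weighted_matrix_mediant` are hypothetical, and the tilde is again assumed to denote within-cluster demeaning. With 𝐵𝑔 = 𝑿̃𝑔′𝑿̃𝑔 and 𝐴𝑔 = 𝑿̃𝑔′𝒚̃𝑔, the fraction 𝐵𝑔⁻¹𝐴𝑔 is the local OLS estimate, so equal weights give 𝜷̂𝐹𝐸 and weights 𝑁𝑔 give 𝜷̂𝑃𝐷. Because 𝐴𝑔 equals 𝐵𝑔 times the local estimate, each estimator is also a matrix-weighted average of the local estimates.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical clustered data (not from the text); N_g >= 5 so each demeaned
# X~_g'X~_g is invertible and the local OLS "fraction" exists.
G, K = 6, 2
beta = np.array([1.0, -0.5])
sizes = rng.integers(5, 60, size=G)

B, A, f = [], [], []
for n in sizes:
    X = rng.normal(size=(n, K))
    y = rng.normal() + X @ beta + rng.normal(size=n)
    Xd, yd = X - X.mean(axis=0), y - y.mean()
    B.append(Xd.T @ Xd)                        # B_g = X~_g'X~_g
    A.append(Xd.T @ yd)                        # A_g = X~_g'y~_g
    f.append(np.linalg.solve(B[-1], A[-1]))    # local OLS estimate B_g^{-1} A_g

def weighted_matrix_mediant(B, A, w):
    """(sum_g w_g B_g)^{-1} sum_g w_g A_g"""
    return np.linalg.solve(sum(wg * Bg for wg, Bg in zip(w, B)),
                           sum(wg * Ag for wg, Ag in zip(w, A)))

b_fe = weighted_matrix_mediant(B, A, np.ones(G))   # mediant: FE
b_pd = weighted_matrix_mediant(B, A, sizes)        # weighted mediant, w_g = N_g: PD

# Since A_g = B_g f_g, both are matrix-weighted averages of the local estimates.
fe_from_local = np.linalg.solve(sum(B), sum(Bg @ fg for Bg, fg in zip(B, f)))
pd_from_local = np.linalg.solve(sum(n * Bg for n, Bg in zip(sizes, B)),
                                sum(n * (Bg @ fg) for n, Bg, fg in zip(sizes, B, f)))
```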
2.3.2 Relevant Theories from Mediant Theory
Two results from mediant theory are relevant here. The first (equivalence conditions) gives two sufficient conditions under which the mediant and the weighted mediant are equal; the second (deviation condition) describes how the difference between them depends on the association between the weights and the values of the fractions.
Equivalence Conditions: the mediant equals the weighted mediant if either of the following conditions holds (see proofs in Appendix C):
a. (Equal-weight Condition) all weights are equal: 𝑤𝑖 = 𝑤 for all 𝑖 = 1 … 𝑛, with 𝑤 ∈ ℝ>0;
b. (Equal-fraction Condition) all fractions are equal: 𝑎𝑖/𝑏𝑖 = 𝑎/𝑏 for all 𝑖 = 1 … 𝑛, with 𝑎 ∈ ℝ≥0 and 𝑏 ∈ ℝ>0.
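Both equivalence conditions are easy to confirm on small exact examples. The sketch below uses illustrative values, and `mediant` with an optional weight list is a hypothetical helper: a common weight of 5 for condition (a), and three fractions all equal to 2/3 with very unequal weights for condition (b).

```python
from fractions import Fraction

def mediant(a, b, w=None):
    """(Weighted) mediant sum_i w_i a_i / sum_i w_i b_i; w=None means equal weights."""
    w = [1] * len(a) if w is None else w
    num = sum(Fraction(wi) * ai for wi, ai in zip(w, a))
    den = sum(Fraction(wi) * bi for wi, bi in zip(w, b))
    return num / den

a, b = [1, 3, 2], [2, 4, 7]
# (a) Equal-weight condition: a common weight w cancels.
assert mediant(a, b, [5, 5, 5]) == mediant(a, b)

# (b) Equal-fraction condition: every a_i / b_i equals 2/3, weights arbitrary.
a_eq, b_eq = [2, 4, 10], [3, 6, 15]
assert mediant(a_eq, b_eq, [1, 9, 100]) == mediant(a_eq, b_eq) == Fraction(2, 3)
```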
Deviation Condition: if a relatively larger fraction 𝑎𝑖/𝑏𝑖 is associated with a relatively larger weight 𝑤𝑖, then 𝑚𝑤 > 𝑚. In other words, if the covariance between 𝑎𝑖/𝑏𝑖 and 𝑤𝑖 (computed with the denominators 𝑏𝑖 as weights) is positive, then 𝑚𝑤 > 𝑚; if it is negative, then 𝑚𝑤 < 𝑚. A simple case with 𝑛 = 2 and a simulation with 𝑛 > 2 are shown in Appendix C.
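The sign statement can be made exact with a short identity (our own derivation, offered as a sketch rather than as the Appendix C argument). Writing 𝑓𝑖 = 𝑎𝑖/𝑏𝑖 and 𝑝𝑖 = 𝑏𝑖/∑𝑗 𝑏𝑗, the mediant is the convex combination 𝑚 = ∑𝑖 𝑝𝑖𝑓𝑖, and

$$m_w - m = \frac{\operatorname{Cov}_p(f, w)}{\sum_i p_i w_i}, \qquad \operatorname{Cov}_p(f, w) = \sum_i p_i f_i w_i - \Big(\sum_i p_i f_i\Big)\Big(\sum_i p_i w_i\Big),$$

so the sign of 𝑚𝑤 − 𝑚 is the sign of this 𝑝-weighted covariance. For 𝑛 = 2 it reduces to 𝑝1𝑝2(𝑓1 − 𝑓2)(𝑤1 − 𝑤2). The snippet checks the identity on hypothetical random inputs.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 8
a = rng.uniform(0.0, 5.0, n)      # nonnegative numerators (hypothetical values)
b = rng.uniform(0.5, 3.0, n)      # positive denominators
w = rng.uniform(0.1, 4.0, n)      # positive weights
f = a / b                         # fractions a_i / b_i

m = a.sum() / b.sum()                    # mediant
m_w = (w * a).sum() / (w * b).sum()      # weighted mediant

p = b / b.sum()                          # denominators used as probability weights
cov_p = np.sum(p * f * w) - np.sum(p * f) * np.sum(p * w)
gap = cov_p / np.sum(p * w)              # predicted value of m_w - m
```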
2.3.3 Extension to Vector and Matrix
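For the matrix setting, the mediant and the weighted mediant can be defined as

$$M = (B_1 + B_2 + \dots + B_n)^{-1}(A_1 + A_2 + \dots + A_n) = \Big(\sum_i B_i\Big)^{-1}\sum_i A_i,$$

$$M_w = (w_1B_1 + w_2B_2 + \dots + w_nB_n)^{-1}(w_1A_1 + w_2A_2 + \dots + w_nA_n) = \Big(\sum_i w_iB_i\Big)^{-1}\sum_i w_iA_i,$$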
in which the "fraction" is 𝒇𝑖 = 𝐵𝑖⁻¹𝐴𝑖. Since FE and PD are vectors, we consider only the case in which 𝑀, 𝑀𝑤, and 𝒇𝑖 are vectors: if each 𝐵𝑖 is a 𝐾 × 𝐾 invertible matrix (with ∑𝑖 𝑤𝑖𝐵𝑖 also invertible), then 𝐴𝑖 is a 𝐾 × 1 vector, and the resulting 𝑀, 𝑀𝑤, and 𝒇𝑖 are all 𝐾 × 1 vectors.
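The matrix definitions, together with the two equivalence conditions for this setting, can be checked numerically. This is a sketch under stated assumptions: each 𝐵𝑖 is drawn as a random symmetric positive definite matrix (which keeps every weighted sum ∑𝑖 𝑤𝑖𝐵𝑖 invertible for positive weights), and `matrix_mediant` and `random_spd` are hypothetical helpers.

```python
import numpy as np

rng = np.random.default_rng(7)
n, K = 4, 3

def random_spd(k):
    """A random symmetric positive definite (hence invertible) k x k matrix."""
    Z = rng.normal(size=(k, k))
    return Z @ Z.T + k * np.eye(k)

def matrix_mediant(B, A, w=None):
    """M_w = (sum_i w_i B_i)^{-1} (sum_i w_i A_i); w=None gives the mediant M."""
    w = np.ones(len(B)) if w is None else w
    return np.linalg.solve(sum(wi * Bi for wi, Bi in zip(w, B)),
                           sum(wi * Ai for wi, Ai in zip(w, A)))

B = [random_spd(K) for _ in range(n)]
A = [rng.normal(size=K) for _ in range(n)]                  # K x 1 "numerators"
f = [np.linalg.solve(Bi, Ai) for Bi, Ai in zip(B, A)]       # fractions f_i = B_i^{-1} A_i
w = rng.uniform(0.5, 5.0, n)

M, M_w = matrix_mediant(B, A), matrix_mediant(B, A, w)     # both K x 1 vectors

# Equal-weight condition: a common weight cancels.
assert np.allclose(matrix_mediant(B, A, np.full(n, 3.0)), M)

# Equal-fraction condition: if A_i = B_i c for a common vector c, then M_w = M = c.
c = rng.normal(size=K)
A_eq = [Bi @ c for Bi in B]
assert np.allclose(matrix_mediant(B, A_eq, w), c)
assert np.allclose(matrix_mediant(B, A_eq), c)
```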
It can be shown that both equivalence conditions carry over: in the matrix setting, the mediant equals the weighted mediant when either all weights are equal or all matrix fractions are equal. For the equal-weight condition, if 𝑤𝑖 = 𝑤 for all 𝑖 = 1 … 𝑛, then

$$M_w = \Big(\sum_i w_iB_i\Big)^{-1}\sum_i w_iA_i = \Big(\sum_i wB_i\Big)^{-1}\sum_i wA_i = \Big(\sum_i B_i\Big)^{-1}\sum_i A_i = M.$$

This condition generalizes the case (in Section 2.2) in which FE and PD are equivalent when cluster sizes are equal. For the equal-fraction condition, if 𝐵𝑖⁻¹𝐴𝑖 = 𝐵⁻¹𝐴 for all 𝑖, then 𝐴𝑖 = 𝐵𝑖𝐵⁻¹𝐴, and

$$M_w = \Big(\sum_i w_iB_i\Big)^{-1}\sum_i w_iA_i = \Big(\sum_i w_iB_i\Big)^{-1}\Big(\sum_i w_iB_i\Big)B^{-1}A = B^{-1}A = \Big(\sum_i B_i\Big)^{-1}\Big(\sum_i B_iB^{-1}A\Big) = \Big(\sum_i B_i\Big)^{-1}\Big(\sum_i A_i\Big) = M.$$

This condition shows that if 𝒇𝑖 = 𝐵𝑖⁻¹𝐴𝑖 is the same for all 𝑖, the weighted mediant does not deviate from the mediant, however different the weights are. In the case of 𝜷̂𝐹𝐸 and 𝜷̂𝑃𝐷, the fraction is 𝜷̂𝑔 = [𝑿̃𝑔′𝑿̃𝑔]⁻¹𝑿̃𝑔′𝒚̃𝑔, the local OLS estimate of 𝜷 using only the observations in cluster 𝑔.
The equal-fraction condition therefore implies that if the local estimates are all the same, 𝜷̂𝐹𝐸 equals 𝜷̂𝑃𝐷 no matter how different the cluster sizes are.
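The equal-fraction condition can be checked on simulated data. In the hypothetical sketch below (the data-generating process and `fe_pd` are our assumptions, with the tilde again taken to mean within-cluster demeaning), removing the error term makes every local OLS estimate equal to 𝜷, so FE and PD coincide even though the cluster sizes are 5, 20, and 200. With errors, the local estimates differ across clusters and, in general, so do FE and PD.

```python
import numpy as np

rng = np.random.default_rng(3)
K = 2
beta = np.array([1.0, -0.5])
sizes = [5, 20, 200]                 # deliberately very unequal N_g

def fe_pd(noise_sd):
    """Return (beta_FE, beta_PD) for one simulated panel with error s.d. noise_sd."""
    B_fe, A_fe = np.zeros((K, K)), np.zeros(K)
    B_pd, A_pd = np.zeros((K, K)), np.zeros(K)
    for n in sizes:
        X = rng.normal(size=(n, K))
        y = rng.normal() + X @ beta + noise_sd * rng.normal(size=n)
        Xd, yd = X - X.mean(axis=0), y - y.mean()   # within-cluster demeaning
        B_fe += Xd.T @ Xd
        A_fe += Xd.T @ yd
        B_pd += n * (Xd.T @ Xd)                      # density weight N_g
        A_pd += n * (Xd.T @ yd)
    return np.linalg.solve(B_fe, A_fe), np.linalg.solve(B_pd, A_pd)

# No error term: each local OLS estimate equals beta, so FE = PD despite unequal N_g.
b_fe0, b_pd0 = fe_pd(noise_sd=0.0)
# With errors: local estimates differ across clusters, and FE and PD generally differ.
b_fe1, b_pd1 = fe_pd(noise_sd=1.0)
```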
Unfortunately, the deviation condition does not carry over to the matrix setting. Whether the deviations between vectors are measured element-wise or by the Euclidean distance, no correlation is found between assigning larger weights to "larger" matrix fractions and obtaining "larger" changes from the mediant vector to the weighted mediant vector.²⁰
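A hand-picked example (our own construction, not the simulation behind footnote 20) hints at why an element-wise reading of the deviation condition cannot carry over. In the scalar case the mediant is a convex combination of the fractions, so it always lies between the smallest and the largest fraction. Below, both 2 × 1 fractions have second element 0, yet the matrix mediant and weighted mediant have positive second elements, and reweighting moves that element even though the fractions do not differ in it. The helper `matrix_mediant` is hypothetical.

```python
import numpy as np

def matrix_mediant(B, A, w):
    """(sum_i w_i B_i)^{-1} (sum_i w_i A_i)"""
    return np.linalg.solve(sum(wi * Bi for wi, Bi in zip(w, B)),
                           sum(wi * Ai for wi, Ai in zip(w, A)))

B = [np.eye(2), np.array([[1.0, 0.9], [0.9, 1.0]])]
f = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]   # second elements are both 0
A = [Bi @ fi for Bi, fi in zip(B, f)]               # A_i = B_i f_i

M = matrix_mediant(B, A, [1.0, 1.0])     # mediant
M_w = matrix_mediant(B, A, [1.0, 3.0])   # more weight on the second fraction
print(M)    # approx [0.373 0.282]
print(M_w)  # approx [0.541 0.310]
```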
In regression analysis, even under the assumption that 𝜷 is stationary (common to all clusters), the local estimates 𝜷̂𝑔 can still deviate from the true 𝜷 and vary across clusters because of differences in error structures and cluster sizes.
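A small simulation can illustrate the cluster-size part of this statement. The sketch assumes a hypothetical single-regressor model with a common slope 𝛽 = 1, a cluster effect, and i.i.d. standard normal errors, so it does not illustrate differences in error structure; `local_slope` is a hypothetical helper. It compares the spread of local OLS slopes from clusters of 5 and of 200 observations.

```python
import numpy as np

rng = np.random.default_rng(11)
beta = 1.0                                   # common (stationary) slope

def local_slope(n):
    """Local within-cluster OLS slope from one simulated cluster of size n."""
    x = rng.normal(size=n)
    y = rng.normal() + beta * x + rng.normal(size=n)   # cluster effect + beta*x + error
    xd, yd = x - x.mean(), y - y.mean()
    return (xd @ yd) / (xd @ xd)

small = np.array([local_slope(5) for _ in range(2000)])
large = np.array([local_slope(200) for _ in range(2000)])
print(small.std(), large.std())   # small-cluster estimates are much more dispersed
```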
The variance-covariance matrices in (2.8) and (2.9), by contrast, do not take the form of a matrix mediant and weighted mediant. The variance estimators, together with 𝜷̂𝐹𝐸 and 𝜷̂𝑃𝐷, are therefore compared by a Monte Carlo simulation in the next section.