Variable-Coefficient Models
RANDOM-COEFFICIENTS MODEL 6.6.1 Model Formulation
6.6.2 A Bayes Solution
The formulation of (6.6.5) subject to (6.6.3) can be viewed from a Bayesian perspective, as there exists an informative prior on (6.6.3), but not on ¯␥. In the classical sampling approach, inferences are made typically by assuming that the probability law f (y, ) generating the observations y is known, but not the vector of constant parameters. Estimators ˆ(y) of the parameters are chosen as functions of y so that their sampling distributions, in repeated experiments, are, in some sense, concentrated as closely as possible about the true values of. In the Bayesian approach, a different line is taken. First, all quantities, including the parameters, are considered random variables. Second, all proba-bility statements are conditional, so that in making a probaproba-bility statement it is necessary to refer to the conditioning event as well as the event whose prob-ability is being discussed. Therefore, as part of the model, a prior distribution p() of the parameter is introduced. The prior distribution is supposed to ex-press a state of knowledge (or ignorance) about before the data are obtained.
Given the probability model f (y;), the prior distribution, and the data y, the probability distribution of is revised to p( | y), which is called the posterior distribution of , according to Bayes’ theorem (e.g., Kaufman (1977), Intriligator, Bodkin, and Hsiao (1996)):15
P( | y) ∝ p() f (y | ), (6.6.10)
where the sign “∝” denotes “is proportional to,” with the factor of proportion-ality being a normalizing constant.
Under the assumption that A.6.6.3. u∼ N(0, ),
we may write the model (6.6.5) as follows:
A.1. Conditional on X, ˜W, , and ¯␥,
y∼ N(X + ˜W ¯␥, ). (6.6.11)
A.2. The prior distributions of and ¯␥ are independent:
P(, ¯␥) = P() · P( ¯␥). (6.6.12)
A.3. P() ∼ N(A1, C).¯
6.6 A Mixed Fixed- and Random-Coefficients Model 169
A.4. There is no information about ¯ and ¯␥; therefore P( ¯) and P( ¯␥) are independent and
P( ¯) ∝ constant, P( ¯␥) ∝ constant.
Conditional on and C, repeatedly applying the formulas in Appendix 6A yields (Hsiao, Appelbe, and Dineen (1993)):
i. The posterior distribution of ¯ and ¯␥ given y is N
β¯∗
␥¯∗
, D1
, (6.6.13)
where D1=
A1X W˜
( + XC X)−1(X A1, ˜W)
−1
, (6.6.14)
and¯∗
␥¯∗
= D1
A1X W˜
( + XC X)−1y. (6.6.15) ii. The posterior distribution of given ¯ and y is N(∗, D2), where
D2 = {X[−1− −1W ( ˜˜ W−1W )˜ −1W˜−1]X+ C−1}−1, (6.6.16)
∗ = D2{X[−1− −1W ( ˜˜ W−1W )˜ −1W˜−1]y+ C−1A1}.¯ (6.6.17) iii. The (unconditional) posterior distribution of is N(∗∗, D3), where
D3 = {X[−1− −1W ( ˜˜ W−1W )˜ −1W˜−1]X+ C−1
− C−1A1( A1C−1A1)−1A1C−1}−1, (6.6.18)
∗∗= D3{X[−1− −1W ( ˜˜ W−1W )˜ −1W˜−1]y}
= D2{X[−1− −1W ( ˜˜ W−1W )˜ −1W˜−1]X ˆ + C−1A1¯∗}, (6.6.19) where ˆ is the GLS estimate of (6.6.5),
 = {Xˆ [−1− −1W ( ˜˜ W−1W )˜ −1W˜−1]X}−1
× {X[−1− −1W ( ˜˜ W−1W )˜ −1W˜−1]y}. (6.6.20) Given a quadratic loss function of the error of the estimation, a Bayes point estimate is the posterior mean. The posterior mean of ¯ and ¯␥ (6.6.15) is the GLS estimator of the model (6.6.5) after substituting the restriction (6.6.3),
y= X A1 + ˜¯ W ¯␥ + v, (6.6.21)
where v= X␣ + u. However, the posterior mean of  is not the GLS estimator
of (6.6.5). It is the weighted average between the GLS estimator of and the overall mean ¯ (6.6.17) or ¯∗(6.6.19) with the weights proportional to the in-verse of the precision of respective estimates. The reason is that although both (6.2.2) and (6.6.5) allow the coefficients to be different across cross-sectional units, (6.6.3) has imposed the additional prior information that are randomly distributed with mean A1. For (6.2.2), the best linear predictor for an individ-¯ ual outcome is to substitute the best linear unbiased estimator of the individual coefficients into the individual equation. For model of (6.6.5) and (6.6.3), be-cause the expectedi is the same across i and the actual difference can be attributed to a chance outcome, additional information abouti may be ob-tained by examining the behavior of others, hence (6.6.17) or (6.6.19).
In the special case of the error-components model (6.6.9), X = IN⊗ eT. Under the assumption that wi t contains an intercept term (i.e., ¯β = 0) and ui t
is i.i.d., the Bayes estimator ((6.6.15)) of ¯␥ is simply the GLS estimator of (6.6.21) ¯␥∗. The Bayes estimator ofαi (6.6.17) is
αi∗∗=
Tσα2 Tσα2+ σu2
ˆ¯vi, (6.6.22)
where ˆ¯vi = T1 T
τ=1ˆvi tand ˆvi t= yi t− ¯␥∗wi t. Substituting ¯␥∗, andαi∗∗for the unknown ¯␥, and αiin (6.6.9), Wansbeek and Kapteyn (1978) and Taub (1979) show that
ˆyi,T +S = ¯␥∗wi,t+s+ αi∗∗ (6.6.23)
is the best linear unbiased predictor (BLUP) for the i th individual s periods ahead.16
6.6.3 An Example
In a classical framework, it makes no sense to predict the independently drawn random variablei (or␣i). However, in panel data, we actually operate with two dimensions – a cross-sectional dimension and a time-series dimension.
Even thoughiis an independently distributed random variable across i , once a particulariis drawn, it stays constant over time. Therefore, it makes sense to predict i. The classical predictor ofi is the generalized least-squares estimator of the model (6.6.5). The Bayes predictor (6.6.19) is the weighted average between the generalized least-squares estimator of for the model (6.6.5) and the overall mean A1 if ¯ is known, or A¯ 1¯∗ if ¯ is unknown, with the weights proportional to the inverse of the precisions of respective estimates. The Bayes estimator of the individual coefficients,i, “shrinks” the GLS estimator ofi toward the grand mean ¯ or ¯∗. The reason for doing so stems from DeFinetti’s (1964) exchangeability assumption. When there are not enough time-series observations to allow for precise estimation of individual
i (namely, T is small), additional information abouti may be obtained by
6.6 A Mixed Fixed- and Random-Coefficients Model 171
Table 6.1. Long-haul regression coefficientsa Price coefficient,
Route unconstrained Mixed coefficients
1 −0.0712(−0.15) −0.2875(N/A)
2 0.1694(0.44) −0.0220(N/A)
3 −1.0142(−5.22) −0.7743(N/A)
4 −0.4874(−2.29) −0.1686(N/A)
5 −0.3190(−2.71) −0.2925(N/A)
6 0.0365(0.20) −0.0568(N/A)
7 −0.3996(−3.92) −0.3881(N/A)
8 −0.1033(−0.95) −0.2504(N/A)
9 −0.3965(−4.22) −0.2821(N/A)
10 −0.6187(−4.82) −0.5934(N/A)
Average N/A −0.3116
Route Income coefficient
1 1.4301(3.07) 0.4740(N/A)
2 −0.348(−0.09) 0.2679(N/A)
3 0.3698(1.95) 0.3394(N/A)
4 0.2497(0.70) 0.3145(N/A)
5 0.5556(2.71) 0.3501(N/A)
6 0.1119(0.95) 0.1344(N/A)
7 0.9197(8.10) 0.5342(N/A)
8 0.3886(3.88) 0.5255(N/A)
9 0.6688(6.16) 0.5648(N/A)
10 0.1928(2.39) 0.2574(N/A)
Average N/A 0.3762
at-statistics in parentheses.
Source: Hsiao, Appelbe, and Dineen (1993, Table 3).
examining the behavior of others because the expected response is assumed the same and the actual differences in response among individuals are the work of a chance mechanism.
Table 6.1 presents Canadian route-specific estimates of the demand for customer-dialed long-haul long-distance service (>920 miles) based on quar-terly data from 1980 (I) to 1989 (IV) (Hsiao, Appelbe, and Dineen (1993)).
Some of the point-to-point individual route estimates (unconstrained model) of the price and income coefficients have the wrong signs (Table 6.1, column 2) per-haps because of multicollinearity. However, if one invokes the representative-consumer argument by assuming that representative-consumers respond in more or less the same way to price and income changes (thus considering the coefficients of these variables across routes as random draws from a common population with constant mean and variance–covariance matrix), but also allows route-specific effects to exist by assuming that the coefficients of the intercept and seasonal
dummies are fixed and different for different routes, then all the estimated route-specific price and income coefficients have the correct signs (Table 6.1, column 3).
6.6.4 Random or Fixed Parameters