• No results found

A Bayes Solution

Variable-Coefficient Models

RANDOM-COEFFICIENTS MODEL 6.6.1 Model Formulation

6.6.2 A Bayes Solution

The formulation of (6.6.5) subject to (6.6.3) can be viewed from a Bayesian perspective, as there exists an informative prior on␤ (6.6.3), but not on ¯␥. In the classical sampling approach, inferences are made typically by assuming that the probability law f (y, ␪) generating the observations y is known, but not the vector of constant parameters␪. Estimators ˆ␪(y) of the parameters ␪ are chosen as functions of y so that their sampling distributions, in repeated experiments, are, in some sense, concentrated as closely as possible about the true values of␪. In the Bayesian approach, a different line is taken. First, all quantities, including the parameters, are considered random variables. Second, all proba-bility statements are conditional, so that in making a probaproba-bility statement it is necessary to refer to the conditioning event as well as the event whose prob-ability is being discussed. Therefore, as part of the model, a prior distribution p(␪) of the parameter ␪ is introduced. The prior distribution is supposed to ex-press a state of knowledge (or ignorance) about␪ before the data are obtained.

Given the probability model f (y;␪), the prior distribution, and the data y, the probability distribution of␪ is revised to p(␪ | y), which is called the posterior distribution of ␪, according to Bayes’ theorem (e.g., Kaufman (1977), Intriligator, Bodkin, and Hsiao (1996)):15

P(␪ | y) ∝ p(␪) f (y | ␪), (6.6.10)

where the sign “∝” denotes “is proportional to,” with the factor of proportion-ality being a normalizing constant.

Under the assumption that A.6.6.3. u∼ N(0, ),

we may write the model (6.6.5) as follows:

A.1. Conditional on X, ˜W, ␤, and ¯␥,

y∼ N(X␤ + ˜W ¯␥, ). (6.6.11)

A.2. The prior distributions of␤ and ¯␥ are independent:

P(␤, ¯␥) = P(␤) · P( ¯␥). (6.6.12)

A.3. P(␤) ∼ N(A1␤, C).¯

6.6 A Mixed Fixed- and Random-Coefficients Model 169

A.4. There is no information about ¯␤ and ¯␥; therefore P( ¯␤) and P( ¯␥) are independent and

P( ¯␤) ∝ constant, P( ¯␥) ∝ constant.

Conditional on and C, repeatedly applying the formulas in Appendix 6A yields (Hsiao, Appelbe, and Dineen (1993)):

i. The posterior distribution of ¯␤ and ¯␥ given y is N

β¯

␥¯

 , D1



, (6.6.13)

where D1=

A1X W˜



( + XC X)−1(X A1, ˜W)

−1

, (6.6.14)

and␤¯

␥¯



= D1

A1X W˜



( + XC X)−1y. (6.6.15) ii. The posterior distribution of␤ given ¯␤ and y is N(␤, D2), where

D2 = {X[−1− −1W ( ˜˜ W−1W )˜ −1W˜−1]X+ C−1}−1, (6.6.16)

= D2{X[−1− −1W ( ˜˜ W−1W )˜ −1W˜−1]y+ C−1A1␤}.¯ (6.6.17) iii. The (unconditional) posterior distribution of␤ is N(␤∗∗, D3), where

D3 = {X[−1− −1W ( ˜˜ W−1W )˜ −1W˜−1]X+ C−1

− C−1A1( A1C−1A1)−1A1C−1}−1, (6.6.18)

∗∗= D3{X[−1− −1W ( ˜˜ W−1W )˜ −1W˜−1]y}

= D2{X[−1− −1W ( ˜˜ W−1W )˜ −1W˜−1]X ˆ␤ + C−1A1␤¯}, (6.6.19) where ˆ␤ is the GLS estimate of (6.6.5),

␤ = {Xˆ [−1− −1W ( ˜˜ W−1W )˜ −1W˜−1]X}−1

× {X[−1− −1W ( ˜˜ W−1W )˜ −1W˜−1]y}. (6.6.20) Given a quadratic loss function of the error of the estimation, a Bayes point estimate is the posterior mean. The posterior mean of ¯␤ and ¯␥ (6.6.15) is the GLS estimator of the model (6.6.5) after substituting the restriction (6.6.3),

y= X A1␤ + ˜¯ W ¯␥ + v, (6.6.21)

where v= X␣ + u. However, the posterior mean of ␤ is not the GLS estimator

of (6.6.5). It is the weighted average between the GLS estimator of␤ and the overall mean ¯␤ (6.6.17) or ¯␤(6.6.19) with the weights proportional to the in-verse of the precision of respective estimates. The reason is that although both (6.2.2) and (6.6.5) allow the coefficients to be different across cross-sectional units, (6.6.3) has imposed the additional prior information that␤ are randomly distributed with mean A1␤. For (6.2.2), the best linear predictor for an individ-¯ ual outcome is to substitute the best linear unbiased estimator of the individual coefficients into the individual equation. For model of (6.6.5) and (6.6.3), be-cause the expected␤i is the same across i and the actual difference can be attributed to a chance outcome, additional information about␤i may be ob-tained by examining the behavior of others, hence (6.6.17) or (6.6.19).

In the special case of the error-components model (6.6.9), X = IN⊗ eT. Under the assumption that wi t contains an intercept term (i.e., ¯β = 0) and ui t

is i.i.d., the Bayes estimator ((6.6.15)) of ¯␥ is simply the GLS estimator of (6.6.21) ¯␥. The Bayes estimator ofαi (6.6.17) is

αi∗∗=

 α2 α2+ σu2



ˆ¯vi, (6.6.22)

where ˆ¯vi = T1 T

τ=1ˆvi tand ˆvi t= yi t− ¯␥wi t. Substituting ¯␥, andαi∗∗for the unknown ¯␥, and αiin (6.6.9), Wansbeek and Kapteyn (1978) and Taub (1979) show that

ˆyi,T +S = ¯␥wi,t+s+ αi∗∗ (6.6.23)

is the best linear unbiased predictor (BLUP) for the i th individual s periods ahead.16

6.6.3 An Example

In a classical framework, it makes no sense to predict the independently drawn random variable␤i (or␣i). However, in panel data, we actually operate with two dimensions – a cross-sectional dimension and a time-series dimension.

Even though␤iis an independently distributed random variable across i , once a particular␤iis drawn, it stays constant over time. Therefore, it makes sense to predict ␤i. The classical predictor of␤i is the generalized least-squares estimator of the model (6.6.5). The Bayes predictor (6.6.19) is the weighted average between the generalized least-squares estimator of␤ for the model (6.6.5) and the overall mean A1␤ if ¯␤ is known, or A¯ 1␤¯ if ¯␤ is unknown, with the weights proportional to the inverse of the precisions of respective estimates. The Bayes estimator of the individual coefficients,␤i, “shrinks” the GLS estimator of␤i toward the grand mean ¯␤ or ¯␤. The reason for doing so stems from DeFinetti’s (1964) exchangeability assumption. When there are not enough time-series observations to allow for precise estimation of individual

i (namely, T is small), additional information abouti may be obtained by

6.6 A Mixed Fixed- and Random-Coefficients Model 171

Table 6.1. Long-haul regression coefficientsa Price coefficient,

Route unconstrained Mixed coefficients

1 −0.0712(−0.15) −0.2875(N/A)

2 0.1694(0.44) −0.0220(N/A)

3 −1.0142(−5.22) −0.7743(N/A)

4 −0.4874(−2.29) −0.1686(N/A)

5 −0.3190(−2.71) −0.2925(N/A)

6 0.0365(0.20) −0.0568(N/A)

7 −0.3996(−3.92) −0.3881(N/A)

8 −0.1033(−0.95) −0.2504(N/A)

9 −0.3965(−4.22) −0.2821(N/A)

10 −0.6187(−4.82) −0.5934(N/A)

Average N/A −0.3116

Route Income coefficient

1 1.4301(3.07) 0.4740(N/A)

2 −0.348(−0.09) 0.2679(N/A)

3 0.3698(1.95) 0.3394(N/A)

4 0.2497(0.70) 0.3145(N/A)

5 0.5556(2.71) 0.3501(N/A)

6 0.1119(0.95) 0.1344(N/A)

7 0.9197(8.10) 0.5342(N/A)

8 0.3886(3.88) 0.5255(N/A)

9 0.6688(6.16) 0.5648(N/A)

10 0.1928(2.39) 0.2574(N/A)

Average N/A 0.3762

at-statistics in parentheses.

Source: Hsiao, Appelbe, and Dineen (1993, Table 3).

examining the behavior of others because the expected response is assumed the same and the actual differences in response among individuals are the work of a chance mechanism.

Table 6.1 presents Canadian route-specific estimates of the demand for customer-dialed long-haul long-distance service (>920 miles) based on quar-terly data from 1980 (I) to 1989 (IV) (Hsiao, Appelbe, and Dineen (1993)).

Some of the point-to-point individual route estimates (unconstrained model) of the price and income coefficients have the wrong signs (Table 6.1, column 2) per-haps because of multicollinearity. However, if one invokes the representative-consumer argument by assuming that representative-consumers respond in more or less the same way to price and income changes (thus considering the coefficients of these variables across routes as random draws from a common population with constant mean and variance–covariance matrix), but also allows route-specific effects to exist by assuming that the coefficients of the intercept and seasonal

dummies are fixed and different for different routes, then all the estimated route-specific price and income coefficients have the correct signs (Table 6.1, column 3).

6.6.4 Random or Fixed Parameters