A Forecasting and Averaging in a Bayesian Framework

This Appendix provides details on the methods used to forecast with the dynamic linear model de…ned by equations (1) and (2) in the main text, as well as the averaging approach. We follow closely Dangl and Halling (2012) and West and Harrison (1997).

A.1 Bayesian Estimation of the Parameters of the Predictive Re-gression

For convenience we begin by transcribing the system of equations from Section 2.1 in the main text:

et+h= X_{t t}⁰ + vt+h; vt+h N (0; V ), (observation equation); (A.1)

t= _{t 1}+ $_t; $_t N (0; W_t), (transition equation); (A.2) The essential components of the Bayesian approach we employ are the priors for V and t, along with a method to estimate Wt; the joint or conditional posterior dis-tribution of V and _t; and in the context of our predictive regression, the predictive density. Finally, we also require an updating scheme for the priors after observing the data.

The approach involves a full conjugate Bayesian analysis. The starting point is the natural conjugate g prior speci…cation set at t = 0:

V jD0 IG 1 2;1

2S₀ , (A.3)

0jD0; V N [0; gS₀(X⁰X) ¹], (A.4)

where,

S₀ = 1

N 1 e⁰(I X(X⁰X) ¹X⁰) e, (A.5)

and D0 indicates the conditioning information at t = 0. In general, at any arbi-trary subsequent period, D_t = [ e_t; e_{t h}; :::; X_t; X_{t h}; :::; P riors_t=0]. That is, D_t contains the exchange rate variations, the predictors, and the prior parameters. At this arbitrary period we can form a posterior belief about the unobservable coe¢ -cient _{t 1}jDt, and the variance of the observation equation error term (observational variance V jDt). The use of a natural conjugate prior implies that the posterior dis-tributions are from the same family as the priors. Speci…cally, the posteriors are also jointly normally-inverse gamma distributed:

V jD^t IG nt

2 ;ntSt

2 , (A.6)

t 1jDt; V N (m_t; V C_t), (A.7) where, S_tis the mean of the estimate of the observational variance V; at time t, with nt as the corresponding number of degrees-of-freedom; mt is the estimate of coe¢ -cient vector _{t 1} conditional on D_t and V ; and C_t corresponds to the conditional variance matrix of _{t 1}, normalised by the by the observational variance. Inte-grating out the distribution given by (A.7) with respect to V; yields a multivariate t distribution for the coe¢ cients’posterior:

t 1jD^t Tnt(mt; StC_t). (A.8)

The form of the transition equation given by (A.2) suggests that, when updating the coe¢ cients vector, the posterior distribution of _{t 1}jDt represented by (A.8) does not necessarily become the prior for _tjDt. The equation indicates that the transition process is exposed to normally distributed random shocks, which widens the variance but maintains the mean:

tjD^t Tnt(mt; StC_t + Wt). (A.9)

The predictive density of the h-step-ahead change in the exchange rate e_t+h; is obtained by integrating the conditional density of e_t+hover the space spanned by and V: To derive it, let '(x; ; ²) be the density of a normal distribution evaluated at x and ig(V ; a; b) be the density of an IG[a; b] distributed variable, evaluated at V: Then, the predictive density is:

f ( et+hjD^t) = Z ₁

'( e; X_t⁰ ; V )'( ; mt; V C_t + Wt)d ig nt

2 ;ntSt

2 dV

= Z ₁

'( e; X_t⁰m_t; X_t⁰(V C_t + W_t)X_t+ V ) ig nt

2 ;ntSt

2 dV

= tnt( et+h; cet+h; Qt+h), (A.10) where, t( e_t+h; ce_t+h; Q_t+h) denotes the density of a t distribution with n_t degrees-of-freedom, mean ce_t+h, variance Q_t+h, evaluated at e_t+h. The mean of the predictive distribution is computed as:

ce_t+h= X_t⁰m_t, (A.11)

and the total unconditional variance of the same distribution is given by:

Qt+1= X_t⁰RtXt+ St, (A.12)

with

R_t= S_tC_t + W_t, (A.13)

where, R_t is the unconditional variance of the coe¢ cient vector _t at time t. The

…rst term in equation (A.12) captures the variance arising from uncertainty in the estimation of the coe¢ cient vector t. The last term St denotes the estimate of the variance of the disturbance term of the observation equation.

After observing the exchange rate change at t + h, the priors on _t and V are updated as described in equations (A.14)-(A.19). The …rst element, is the prediction error:

"t+h= et+h ce_t+h, (prediction error). (A.14) Insofar as the prediction error equals zero, no updating occurs in the coe¢ cient vector, since the forecast matches the value observed in the data.

nt+1= nt+ 1, (degrees-of-freedom), (A.15)

S_t+h= S_t+S_t n_t

"²_t+h Q_t+h 1

, (estimator of observational variance). (A.16) Given that the total variance of the forecast is expressed by Q_t+h, then E("²_t+h) = Q_t+h. When the prediction error matches its expectation, i.e., "²_t+h = Q_t+h, the estimate of the observational variance remains unchanged (S_t+h = S_t). When the prediction error is lower (higher) than the expected error, the estimated observa-tional variance reduces (increases).

An additional element that induces changes in the coe¢ cients is the adaptive vector:

At+h= RtXt

Q_t+h, (adaptive vector). (A.17)

It characterises the degree to which the posterior of the coe¢ cient vector _tchanges to new observation. The numerator of equation (A.17) can be recognised as the information content of the current observation, and the denominator as the measure of the precision of the estimated coe¢ cients. With the above elements, we are now in position to update the coe¢ cients’point estimate m_t and the covariance matrix C_t:

m_t+h= m_t+ A_t+h"_t+h, (expected coe¤. vector estimator), (A.18) C_t+h= 1

S_t R_t A_t+hA⁰_t+hQ_t+h , (variance of the coe¤. vector estimator). (A.19) The exposition so far does not include a method to estimate W_t. However, as we noted in Section 2.1, to capture the relationship between the coe¢ cients’estimation

error and the variance, we let W_t be proportional to the estimation variance S_tC_t of the coe¢ cients tjD^t at time t: That is:

W_t= 1

S_tC_t, 2 f 1; ₂; :::; _dg, 0 < j 1. (A.20) Therefore, the variance of the predicted coe¢ cient vector expressed in equation (A.13) simpli…es to:

Rt= StC_t + 1

StC_t = 1

StC_t. (A.21)

This completes the requisites for forecasting with one model. However, the approach we pursue allows for k candidate predictors and d possible support points for time-variation in coe¢ cients, and therefore, k:d models. Hence, part A.2 of this appendix extends the analysis to deal with these possibilities in a Bayesian model selection and averaging approach.

A.2 Bayesian Dynamic Averaging Over Models and Forgetting Fac-tors

Let M_i constitute a speci…c selection of a predictor from a set of k candidates, and _j a speci…c choice of degree of time-variation in coe¢ cients from the space f ¹; 2; :::; dg. Clearly, the mean of the predictive distribution of the forecasts we computed above (see equation (A.11)) is in‡uenced by these speci…c choices. Hence, the point estimate of e_t+h is now also conditional on M_i and _j:

ce^j_t+h;i= E( et+hjMⁱ; j; Dt) = X_t⁰mtjMⁱ; j; Dt. (A.22) The starting point in examining which model setting turns out to be important empirically, is to assign prior weights to each individual predictor Mi and each support point _j. We begin with a prior that allows each predictor and each support point to have the same chance of becoming probable. That is, for each M_i and _j we set uninformative priors:

P (M_ij j; D₀) = 1=k, (A.23)

P ( _jjD0) = 1=d. (A.24)

At time t, the posterior probabilities are updated using Bayes’s rule. We …rst update the posterior probability of a certain model, given a value of _j:

P (M_ij j; D_t) = f ( e_tjMi; _j; D_{t h})P (M_ij j; D_{t h})

f ( etj ^j; Dt h) , (A.25)

where,

f ( etj ^j; D_{t h}) =X

f ( etjMⁱ; j; D_{t h})P (Mij ^j; D_{t h}). (A.26)

The key ingredient is the conditional density:

f ( e_tjMi; _j; D_{t h}) 1 q

Q^j_t;i t_n_t ₁

@ et e^j_t;i q

Q^j_t;i 1

A , (A.27)

where, t_n_t ₁ is the density of a Student t distribution and e^j_t;i and Q^j_t;i are the corresponding point estimates and variance of the predictive distribution of model M_i, given = _j (refer to equation (A.10)). The prediction of the average model for each of the speci…c value of = _j is given by:

ce^j_t+h= Xk i=1

P (M_ij j; D_t) ce^j_t+h;i. (A.28)

Essentially, for each speci…c , it is the sum of the exchange rate prediction of each of the k models weighted by their posterior probability. If there was only one support point for time-variation in coe¢ cients, such that d = 1, then equation (A.28) would complete the averaging approach. However, we are considering several possibilities for , hence we also perform Bayesian averaging over these values.

Starting with the prior probability in equation (A.24), the posterior probability of a speci…c is:

P ( _jjDt) = f ( e_tj j; D_{t h})P ( _jjDt h)

P f ( etj ; D^{t h})P ( jD^{t h}): (A.29) We note that using this probability, we can infer the degree of time-variation in coe¢ cients supported by the data (see also equation (14) in the main text).

We are now in position to …nd the total posterior probability of a model de-termined by a speci…c selection of predictor Mi and degree of coe¢ cient variation

P (Mi; jjD^t) = P (Mij ^j; Dt)P ( jjD^t), (A.30) and the unconditional average prediction of the average model,

ce_t+h= Xd j=1

P ( _jjDt) ce^j_t+h. (A.31)

Thus, ce_t+h is obtained by averaging over the average models’ prediction, over degrees of time-variation in coe¢ cients.

In document On the Sources of Uncertainty in Exchange Rate Predictability (Page 40-45)