Bayesian Inference in Cointegrated Systems

(6.0) An Overview o f the C hapter.

In this chapter I develop a Bayesian procedure to conduct inference on the cointegrating rank o f a system of 1(1) variables, and to verify the validity of over identifying restrictions on the cointegration parameters. The model is specified in terms of a parameterisation which seems to suit well this inferential problem Exact finite sample distributions for the parameters and the statistics o f interest are obtained by means o f Monte Carlo integration o f the corresponding conditional posterior distributions. A simulation analysis, an application on Danish and Finnish money demand data, and an application on the UK exchange rate data are presented The chapter is organized as follows Recalling the content of chapter [5], Section [6.1] summarises the main motivations behind a Bayesian approach to the analysis o f cointegrated systems. Section [6.2] is devoted to presenting the model Section [6 3] describes the prior distribution and Section [6.4] copes with the resulting joint distribution Section [6.5] considers inference on the cointegration rank, and Section [6 6] deals with testing the validity o f the over identifying restrictions imposed on the cointegrating vectors. Section [6.7] contains the results o f a set o f applications The applications are the Danish and Finnish money demand examples studied by Johansen and Juselius (1990), and the PPP /UIP UK data o f Johansen and Juselius (1992). Section [6 8] concludes

[6.1] Motivations

As I have detailed in the previous chapter, Johansen's ML approach to the analysis

of cointegrated systems presents some technical problems which can be summarised in the following few points:

1) The cointegrating rank test statistics have non standard distributions, and only asymptotic results are available for any kind of inference in the model These distributions have to be tabulated, by simulating the vector Brownian motion process a sufficiently high number o f times Once the rank has been determined on the basis of the relevant statistics, since the parameters of the cointegrating vectors are not identified, the researcher has to provide the model with some (linear) constraints intended to achieve identification Insofar as these constraints generate over-identification of the cointegrating parameters, their validity can be checked by means of a likelihood ratio test statistic (see Johansen, 1995b), whose distribution is standard (asymptotically x2) All the inferential questions in the model, conditioned upon a given cointegrating rank, can be answered on the basis of available standard distributional results The identified cointegrating vectors themselves can be given a distribution (Johansen 1991, 1995a, 1995b): the asymptotic distribution is normal, with a variance-covariance matrix which is itself a random variable Therefore the identified cointegrating vectors have an asymptotic distribution which is in the form of a mixture of normal variables There is also the necessity of resorting to asymptotic results when the interest o f the researcher focuses on non-linear functions o f the parameters of the model, e g. the impulse response coefficients (see Lutkepohl 1991) or the shock persistence profiles (see Pesaran and Shin, 1995) As for the finite sample distributions o f the parameters, little is known, beyond Phillips' (1994) finding that identified cointegrating vectors have finite sample Cauchy-like tailed distributions Cappuccio

and Lubian (1995) provide disturbing Monte Carlo evidence on the poorness o f finite sample properties of the LR tests of the over-identifying restrictions.

2) An awkward role is played by deterministic components As pointed out in Section [5 6], different deterministic components generate different asymptotic distributions for the cointegration rank test statistics. Exactly as happens in conventional unit root testing, there is the usual necessity to decide what kind o f deterministic behaviour to allow for the model being used to conduct inference on the stochastic features of the series under study Then the inference results are somehow conditional on the deterministic component being allowed for. In order to cope with the problem, Johansen (1992) has specified a procedure whose finite sample properties are unknown.

3) Inefficient parameterisation The test-bed parameterisation for the presence o f cointegrating long-run relationships is the vector autoregressive (VAR) framework

In Sims' (1980) own words, this is a "profligate" parameterisation, which precludes the possibility o f analysing models with more than 4 or 5 series Imposing constraints on the parameter space is not easily feasible in Johansen's approach Restricting the parameter space would preclude using simple OLS partial regression to concentrate the likelihood function, as is done in Johansen's setting These issues justify an alternative approach to the problem The Bayesian approach seems to be the obvious candidate, if one considers all the points mentioned above As for point one, Bayesian inferential techniques are very much more straightforward in their applied interpretation, and they easily lend themselves to become a convenient support to decision making in modelling Moreover they are based on exact finite sample distributions o f the relevant parameters and statistics This might prove of crucial importance, as an interesting study by Bauwens and Lubrano (1994) has recently shown: if one were to accept the asymptotic standard error associated with the identified cointegration coefficient vectors as a true

measure o f the uncertainty associated with the results being obtained, one would actually underestimate that uncertainty. The exact finite sample posterior distribution is much more dispersed. Using a Bayesian framework of analysis, one could compute the moments and the exact finite sample posterior distribution of any desired function of the parameters

As for the second point, instead o f conditioning upon the deterministic component, as is implicitly done in the MLA, using Bayesian techniques one could think of

marginalising the joint posterior distribution with respect to the parameters in the deterministic part This would really render the analysis more coherent

As for the problem of parameterisation inefficiency, once again the use o f Bayesian techniques might render it possible to resort to some useful and sensible ways to restrict the parameter space For example, one might consider the BVAR-type of

approach (see Doan, Litterman and Sims, 1984), where one trades off unbiasedness with efficiency, or alternative specifications o f distributed lags Of course many computational difficulties are expected to arise, and indeed do arise Nevertheless, many different numerical simulation techniques are nowadays available, as shown in Chapter 3. In the present study I use Monte Carlo integration techniques via Gibbs sampling in order to conduct posterior inference in a cointegrated VAR setting The inference being conducted refers to the number of cointegrating vectors and to their structural interpretation, and this constitutes the main novelty o f this approach with respect to previous Bayesian studies: for example Kleibergen and van Dijk (1994) work with a given cointegrating rank

|6.2] T he Model

A(£)y, = 6 ' D l + e ( , A(L) = I„ - £ A ,£ ', y i or r(L)Ay, = a P 'y ,.1+ ô ’D( + e (, r ( £ ) = I „ - Z r / ^ (1) ;=i p ( a ) = p ( P ) = r < « , e, ~ AWW(0,E).

The vector Dt contains the deterministic components of the model The parameters in 6' are convolutions o f the autoregressive parameters and the expectation value o f

Model (1) is exactly the one specified by Johansen and his co-authors in his papers The model is subject to the usual non-identification issue o f the cointegrating parameters As seen in Section [5 4], some restrictions are needed in order to achieve identification 1 decided to start from the condition usually imposed, i.e. the normalisation p = [Ir q>']' , which corresponds to a situation of exact identification o fp

In order to write the likelihood function o f the process, I write the model in a matrix form

AY=D6 + Y *r + Y., Pa' + E, vec(E) -N fO ,!® ^),

r= [r, |r 2

\...\rkj

, {y*}, = [yf.1,|yf.2*l I W ] P

The likelihood function o f the model is then:

pidata |o^P ,r,8,I) oc |Z| ™ exp{ - 5 trace (E* E £-')}. (2)

Notice that the model is bilinear in the parameters a , T, 5 on one side, and P on the other Therefore, the conditional distribution o f each group of the parameters, given all the others, are o f known analytical forms This will be used in the

posterior distribution analysis, after having described the prior distribution being implemented.

[6.3] The Prior Distribution.

The main aim behind the specification of the prior distribution is to ensure enough flexibility to represent the extra sample information being available to the researcher, with the desired strength, as measured by prior precision. The prior distribution is denoted as:

p(a, cp, T, 8, E) = p ( a )p ( P) p( F)p{ 8) p(E)

Note the assumption o f prior independence among the set o f parameters a , <p, T, 8, E This assumption is by no means necessary and can be relaxed only at the cost of making computations slightly more burdensome

The prior pdfs for a and <p read:

p(vec(a)) x exp{- 5[(vec(a)-tiJEa-'(vec(a)-pJ}, p(vecUp>) <x expt-.SfCvecitpJ-n^'E^'ivecitp)-^},

(3)

As in Geweke (1993), these are shrink-to-mean prior distributions: p a and i y are the prior means for vec(a) and vec(<p), respectively Note that in many macroeconomic applications, economic theory suggests prior beliefs on the long- run equilibrium relationships These beliefs can easily be reflected with an adequate choice of the hyperparameter vector j y The intensity o f these prior beliefs is

proportional to Za ‘ and 'Z ^ 1. For convenience, I choose the prior variance matrices to be:

= a>J„r, V = “V1™ • where 5 = n'r (4)

The single hyperparameters coa and eo^ control the strength o f prior beliefs on a

and <p respectively. The necessity of specifying a proper prior distribution for the parameters in <p will be justified when describing the joint posterior pdf

The prior distribution of vec(T) is specified as

p(vec(T))oc exp{- 5 vecÇT) 'Zp-1 vec(T)), (5)

where Zj--1 is a diagonal matrix whose elements are determined as follows:

[var {r„}ÿ]-‘ = when i * j , [var { T J J '1 = c o ^ when i = j (6)

£tX¡h >0, ÛXj >0.

As in Doan, Litterman and Sims (1984) a shrink-to-mean prior is specified: each autoregressive parameter has a prior mean equal to zero, with precision increasing with the lag order and, ceteris paribus, when the coefficient is off the diagonal of the autoregressive matrices: this amounts to believe that own lags are more important in each of the VAR equations, and that coefficients on relatively recent

lags are expected to drifi from zero more than those on relatively distant lags This seems to be a viable and sensible solution to the overparameterisation problem related to the VAR setting. O f course, when the overall tightness hyperparameter

Also for the deterministic parameters in 5, a normal shrink-to mean-prior distribution is specified:

p(vec(S))oc exp{-.5(vec(8)-Hs) ' £ g-' (vecCSj-pj) }, (7)

where a prior precision £ g_1 equal to zero reflects prior ignorance. The prior for £ is:

p(L) cc \Z\<"+W , (8)

and it is a customary choice in the Bayesian treatment of multi-equational models, in the absence of prior information. In case prior information about E is available, it is possible to specify an inverted Wishart (Zellner, 1971, Section B 3) specification for p(E), of which expression (8) constitutes a special case

In synthesis, I choose independent shrink-to-mean priors for the parameters describing the conditional mean of the process y„ i.e a , q>, T, 5 For E I specify a customary ignorance prior.

Note that for each of these five groups of parameters, and conditionally on the other ones, the prior pdf is conditionally conjugate with respect to the corresponding conditional likelihood In this way, the conditional posterior distributions are all tractable

(6.4] The Joint Posterior Distribution.

Combining the likelihood function with the prior distribution, it is possible to write the joint posterior distribution:

/?(a,cp,r,5,E|cijta) ac |E|'<7'+’,+1 >2exp {-. 5 [/race (E 'E Z 1 +

(v e c (a )-p j'£ a 1(vec(a)-p.a)+(vec((p)-|i9)'E((>-‘(v e cirp )-^

v ec (0 ' V ' vec(0+(5 -n*)]}. (9)

Notice that, given the informative prior pdf, in the joint posterior pdf all the parameters are identified. If the prior distribution for <p were improper, i.e. if co^=0,

a rank deficiency in a would induce singularity in the variance covariance matrix in the conditional posterior pdf o f vec(q>). This is the point made in Kleibergen and van Dijk (1994).

Due to the already mentioned presence of non linearities between parameters, the joint posterior distribution is not easily amenable to analytical integration. Therefore, in order to obtain posterior marginal distributions and posterior moments, it is necessary to resort Monte Carlo Integration. This technique has been explained in Section [3.1],

When it is not possible to provide i.i.d draws from the joint posterior distribution, as in our case where it is of no known analytical form, then some other methods have to be adopted Following the suggestions o f Hammersley and Handscomb (1964) one could choose an "importance function" to sample from. In any case that choice is not easy, and it might yield very poor estimates. In fact, as is stressed in Koop (1994), it is necessary that the tails of the importance distribution be fatter than those o f the posterior distribution, otherwise the draws from the tails of the importance function dominate the behaviour of the Monte Carlo estimate For this reason, one should know exactly the shape of the posterior distribution, in order to choose correctly the importance function We do not know the form o f the joint posterior in our context, and therefore we adopt a Gibbs Sampling Algorithm

(GSA). This algorithm has been described in Section [3.3c]. Application of the

posterior distributions in the present context The parameter vector is divided into three sets, the first one 0, = [vec(<p)j, the second one 02= [vec(a')'|vec(r)'|vec(5)']', and the third one 03 = vech(Z). The following three lemmas describe the conditional posterior distributions of these three subsets of parameters.

Lemma 4.1 The conditional posterior distribution o f 0,= vec(cp) is rx.y-variate

In document Bayesian inference on non stationary data (Page 155-164)