Constrained HMMs - Hidden Markov Models based methods

Chapter 2 Literature Review

2.12 Hidden Markov Models based methods

2.12.3 Constrained HMMs

The HMM framework and methods presented thus far have not placed any restrictions on the behaviour of the underlying MC, in that the MC is permitted to visit any of the states freely. Such a HMM is referred to as an unconstrained HMM. Chib (1998) and Luong et al. (2012) consider a constrained HMM in which the underlying MC is restricted to move in a particular way, and construct CP methods around this framework.

Under the constrained HMM framework, the underlying MC cannot return to previously visited states. In a CP context, this results in the underlying states corresponding to the segments between two consecutive CPs. Thus if there are $M$ CPs, then the data is partitioned into $M + 1$ segments and the assumed constrained HMM has $H = M + 1$ underlying states. As the number of underlying states is assumed known a priori for a HMM, whether constrained or unconstrained, this means the number of CPs is known a priori under the constrained HMM framework.

The behaviour of the underlying MC is more formally constrained to move in the following manner. Firstly, $X_0 = X_1 = 1$ and $X_n = H = M + 1$. That is, the latent MC and observation process must start in the first segment, and end in the last segment. Secondly, the underlying MC is constructed such that it is unable to return to previously visited segments and thus states. There are consequently only two possible moves for the underlying chain at each time $t$. Explicitly, if $X_t = i$, $i = 1, \ldots, M$, then the chain either

(i) remains in the current state and segment, thus $X_{t+1} = X_t = i$, or

(ii) moves to the next segment and state in the state space, thus $X_{t+1} = i + 1 \neq X_t = i$.

$P$, the corresponding transition matrix, is a matrix with non-zero entries on the diagonal and immediate super-diagonal, and zeroes elsewhere. That is, $p_{ij} > 0$ if $j \in \{i, i+1\}$, else $p_{ij} = 0$. Under this setup, each row of the transition matrix has only one unknown transition probability, as $p_{i,i+1} = 1 - p_{i,i}$. Such a restriction on the transition matrix needs to be accounted for in parameter estimation methods in order to maintain the constrained HMM framework.
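The banded structure described above can be illustrated with a minimal Python sketch. The function name `constrained_transition_matrix` and its input `stay_probs` are hypothetical, introduced only for illustration. The sketch also assumes that the final state $H$ is absorbing ($p_{H,H} = 1$), since no state $H + 1$ exists to move to.

```python
def constrained_transition_matrix(stay_probs):
    """Build the H x H transition matrix of a constrained (left-to-right) chain.

    stay_probs holds the M self-transition probabilities p_{i,i} for the
    states 1..M (hypothetical input). The last state H = M + 1 is assumed
    to be absorbing.
    """
    M = len(stay_probs)
    H = M + 1
    P = [[0.0] * H for _ in range(H)]
    for i, p_stay in enumerate(stay_probs):
        if not 0.0 < p_stay < 1.0:
            raise ValueError("each stay probability must lie strictly in (0, 1)")
        P[i][i] = p_stay            # remain in the current segment
        P[i][i + 1] = 1.0 - p_stay  # move to the next segment
    P[H - 1][H - 1] = 1.0           # assumption: final state is absorbing
    return P


# Two changepoints, hence three states; the stay probabilities are illustrative.
P = constrained_transition_matrix([0.9, 0.8])
for row in P:
    print(row)
```

Each of the first $M$ rows is determined by a single free parameter, which is why an estimation routine only needs to update the $M$ stay probabilities rather than a full $H \times H$ matrix.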

Luong et al. (2012) provide a method in which the posterior CP probability and confidence intervals for CP location estimates can be computed via a constrained HMM framework. The method takes pre-determined CP location estimates as input; these could be provided by CP estimates computed under the Viterbi or Posterior Decoding algorithms discussed earlier, or by alternative means. Via the Forward-Backward Equations, it is shown that the probability of a CP occurring at a specified time can be computed in addition to the usual smoothed probabilities under a constrained HMM framework. That is, for $i = 1, \ldots, M$,

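\[
P(i\text{th CP at time } t+1) = P(X_{t+1} = i+1, X_t = i \mid y_{1:n}) \tag{2.21}
\]

\[
= \frac{\alpha_t(i)\, \beta_{t+1}(i+1)\, p_{i,i+1}\, f(y_{t+1} \mid X_{t+1} = i+1)}{\alpha_1(1)\, \beta_1(1)}. \tag{2.22}
\]

Such probabilities can thus be used to determine the CP probability (CPP, the probability of any CP occurring at a specified time). The $\alpha$ confidence intervals for the $i$th CP location, $(L_i^{\alpha}, U_i^{\alpha})$, can also be provided by:

\[
L_i^{\alpha} = \inf \left\{ L \in \{1, \ldots, n\} \;\middle|\; \sum_{t=1}^{L} P(i\text{th CP at time } t+1) \geq \frac{1-\alpha}{2} \right\},
\]

\[
U_i^{\alpha} = \inf \left\{ U \in \{1, \ldots, n\} \;\middle|\; \sum_{t=1}^{U} P(i\text{th CP at time } t+1) \geq \frac{\alpha+1}{2} \right\}.
\]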
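The changepoint probabilities and interval endpoints above can be sketched in Python as follows. This is an illustrative sketch, not the postCP implementation: the function names, the toy data and the parameter values are all hypothetical. It assumes Gaussian emissions with known state means and a common standard deviation, and uses unscaled forward and backward quantities, which is adequate only for short series. Indices are 0-based, so `cp[i][t]` is the probability that observation `y[t]` is the last one in segment `i`.

```python
import math


def normal_pdf(y, mean, sd):
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def cp_probabilities(y, means, sd, P):
    """Return cp[i][t] = P(X_t = i, X_{t+1} = i + 1 | y) with 0-based i and t."""
    n, H = len(y), len(means)
    f = [[normal_pdf(y[t], means[i], sd) for i in range(H)] for t in range(n)]

    # Forward pass: alpha[t][j] = p(y_0..y_t, X_t = j); the chain starts in state 0.
    alpha = [[0.0] * H for _ in range(n)]
    alpha[0][0] = f[0][0]
    for t in range(1, n):
        for j in range(H):
            total = alpha[t - 1][j] * P[j][j]
            if j > 0:
                total += alpha[t - 1][j - 1] * P[j - 1][j]
            alpha[t][j] = total * f[t][j]

    # Backward pass: beta[t][i] = p(y_{t+1}..y_{n-1}, X_{n-1} = H - 1 | X_t = i),
    # which builds in the constraint that the chain ends in the last state.
    beta = [[0.0] * H for _ in range(n)]
    beta[n - 1][H - 1] = 1.0
    for t in range(n - 2, -1, -1):
        for i in range(H):
            total = P[i][i] * f[t + 1][i] * beta[t + 1][i]
            if i < H - 1:
                total += P[i][i + 1] * f[t + 1][i + 1] * beta[t + 1][i + 1]
            beta[t][i] = total

    evidence = alpha[0][0] * beta[0][0]  # the normaliser in (2.22)
    return [
        [alpha[t][i] * beta[t + 1][i + 1] * P[i][i + 1] * f[t + 1][i + 1] / evidence
         for t in range(n - 1)]
        for i in range(H - 1)
    ]


def cp_interval(probs, level=0.95):
    """Smallest indices at which the cumulative CP probability reaches each tail level."""
    lower = upper = None
    cumulative = 0.0
    for t, p in enumerate(probs):
        cumulative += p
        if lower is None and cumulative >= (1.0 - level) / 2.0:
            lower = t
        if cumulative >= (1.0 + level) / 2.0:
            upper = t
            break
    return lower, upper


# Toy series with an obvious mean shift after the fourth observation (made-up data).
y = [0.1, -0.2, 0.0, 0.2, 5.1, 4.9, 5.0, 5.2]
P = [[0.9, 0.1], [0.0, 1.0]]
cp = cp_probabilities(y, means=[0.0, 5.0], sd=1.0, P=P)
lower, upper = cp_interval(cp[0], level=0.95)
print(cp[0], lower, upper)
```

On this toy series the shift is unambiguous, so the probability mass concentrates on the fourth observation being the last in the first segment and the interval collapses to a single index. For realistic series a scaled or log-domain recursion would be needed to avoid underflow.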

Such quantities quantify the uncertainty regarding the CP location. An implementation of the methodology is provided in the R package postCP (Nuel and Luong, 2012), and its application to the GNP dataset is displayed in Figure 2.9. We consider the 95% confidence intervals and CPP plot for the Viterbi and NBER CP estimates, assuming a 2-state Gaussian Markov Mixture model. We observe that the confidence intervals are a mixture of narrow and wide (the initial CPs and the middle CPs respectively), highlighting that some of the CP estimates provided are more certain than others and that other CP configurations are possible. The CPP plots provide further insight into the shape and behaviour of the confidence intervals, with narrow intervals associated with centred and peaked CPPs, and wide intervals associated with more diffuse CPPs around the CP estimates. Such CPP behaviour reflects how the GNP data behaves and whether the CPs are obvious or not. Quantifying the uncertainty of CPs, for example via the CPP plot, thus provides a better understanding of the data and the CP estimates.

Whilst the uncertainty of CP locations has now been addressed, there are several disadvantages to such an approach, namely that CP location estimates need to be provided beforehand, and that this in turn depends on the number of CPs being known a priori. Luong et al. (2012) remark that the accuracy of the CP posterior probabilities reported is highly dependent on the estimates of the CP locations and number provided, due to their influence on the estimation of $\theta$. This is demonstrated in the GNP implementation (see Figure 2.9, around 1980), where the CPP plots are noticeably different for the two sets of CP estimates initially provided. Such sensitivity is not particularly desirable or sensible if CP characteristics are generally unknown.

[Figure 2.9 panels: (a) Confidence Intervals for Viterbi CP estimates; (b) CPP for Viterbi CP estimates; (c) Confidence Intervals for NBER CP estimates; (d) CPP for NBER CP estimates. Horizontal axis: Time (1950 to 1985). Vertical axis: GNP data in (a) and (c), CPP in (b) and (d).]

Figure 2.9: Confidence Intervals (grey bars) and Changepoint Probability (CPP) plots for the Viterbi and NBER estimates on the GNP dataset. These quantities are computed via a constrained HMM framework as proposed in Luong et al. (2012).

Chib (1998) proposes a framework in which the uncertainty of CP locations is quantified more explicitly by considering the uncertainty of the underlying state sequence. This is performed by sampling from the posterior of the underlying state sequence, $p(x_{1:n} \mid y_{1:n})$, and thus sampling the locations of CPs as the times at which the sampled state sequence changes state, that is, $X_t = i \neq X_{t+1} = i + 1$ for $i = 1, \ldots, M$.

Sampling the underlying state sequence is achieved by sampling from the joint posterior distribution of the model parameters and underlying state sequence, $p(x_{1:n}, \theta \mid y_{1:n}, H)$. This is typically not a standard distribution and thus a MCMC sampling scheme is employed. In particular, Chib (1998) iteratively samples from the following two full conditionals,

• $\theta \mid y_{1:n}, X_{1:n} = x_{1:n}$

• $X_{1:n} \mid y_{1:n}, \theta$.

It is thus possible to obtain a posterior of the state sequence by marginalising out the model parameters from the joint posterior,

\[
p(x_{1:n} \mid y_{1:n}, H) = \int p(x_{1:n}, \theta \mid y_{1:n}, H) \, d\theta.
\]

Consequently a posterior of the CP locations can be obtained by determining when there is a change in state in the sampled state sequence from its posterior.
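The state-sequence step of such a scheme can be sketched in Python as below. This is a hedged illustration of one common way to draw $X_{1:n} \mid y_{1:n}, \theta$, namely forward filtering followed by backward sampling, and is not taken from Chib's code. The function names, toy data and parameter values are hypothetical, Gaussian emissions with a common standard deviation are assumed, and $\theta$ is held fixed throughout, whereas the full sampler would redraw $\theta$ from its full conditional between state draws.

```python
import math
import random
from collections import Counter


def normal_pdf(y, mean, sd):
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def sample_states(y, means, sd, P, rng):
    """Draw one state path from p(x | y, theta) for a constrained chain (0-based states)."""
    n, H = len(y), len(means)

    # Forward filtering: filt[t][j] = P(X_t = j | y_0..y_t), starting in state 0.
    filt = []
    prev = [1.0] + [0.0] * (H - 1)
    for t in range(n):
        if t == 0:
            pred = prev
        else:
            pred = [sum(prev[i] * P[i][j] for i in range(H)) for j in range(H)]
        weights = [pred[j] * normal_pdf(y[t], means[j], sd) for j in range(H)]
        total = sum(weights)
        prev = [w / total for w in weights]
        filt.append(prev)

    # Backward sampling: the path must end in the last state, then each X_t is
    # drawn with probability proportional to filt[t][i] * p_{i, x_{t+1}}.
    x = [0] * n
    x[n - 1] = H - 1
    for t in range(n - 2, -1, -1):
        nxt = x[t + 1]
        weights = [filt[t][i] * P[i][nxt] for i in range(H)]
        x[t] = rng.choices(range(H), weights=weights)[0]
    return x


# Toy series with one obvious mean shift (made-up data and parameters).
rng = random.Random(0)
y = [0.1, -0.2, 0.0, 0.2, 5.1, 4.9, 5.0, 5.2]
P = [[0.9, 0.1], [0.0, 1.0]]
draws = [sample_states(y, [0.0, 5.0], 1.0, P, rng) for _ in range(200)]

# The CP location in each draw is the first index spent in the second state.
cp_locations = Counter(path.index(1) for path in draws)
print(cp_locations.most_common(3))
```

Tabulating the change times across draws, as in the last lines, gives a Monte Carlo approximation to the posterior of the CP location. Because the zero entries of the transition matrix carry zero weight, every sampled path respects the constraint automatically.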

Chib (1998) also provides an ad hoc solution for determining the number of underlying states and thus the number of CPs. This is achieved by framing the unknown number of CPs problem as a Bayesian model selection problem, similar to that explored in Section 2.9. Each model assumes a different number of states and thus a different number of CPs. The marginal likelihood can then be approximated for each model, and Bayesian model selection methods such as the Bayes’ Factor can be employed to determine which model is suitable, and thus how many CPs to assume.

Chib (1998) remarks that the marginal likelihood $p(y_{1:n} \mid H = h)$, which assesses the likelihood of the data arising from a model assuming $H = h$ states, can be approximated as a by-product of the MCMC sampling algorithm for the joint posterior distribution of the underlying state sequence and parameters.

Having obtained the marginal likelihood, the model posterior distribution can also be approximated in combination with a model prior. Chib (1998) uses the Bayes’ Factor to determine which model, and thus how many CPs, to assume. The Bayes’ Factor assesses the relative evidence of one model over another. Thus, suppose one wants to assess whether to assume $m_1$ or $m_2$ CPs, and consequently whether to assume $m_1 + 1$ or $m_2 + 1$ underlying states in a constrained HMM framework. Then the Bayes’ Factor between these two models is defined as

\[
B_{m_1, m_2} = \frac{p(y_{1:n} \mid H = m_1 + 1)}{p(y_{1:n} \mid H = m_2 + 1)}. \tag{2.23}
\]

Larger values of $B_{m_1, m_2}$ indicate that the data supports a model assuming $m_1$ CPs over $m_2$ CPs.
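As a small numerical illustration of (2.23), the following Python sketch turns approximate log marginal likelihoods into a Bayes’ Factor and into posterior probabilities over the number of CPs. The function names are hypothetical and the log marginal likelihood values are made up purely for illustration; they are not results from the GNP analysis. A Uniform prior over the number of CPs is assumed unless log priors are supplied.

```python
import math


def bayes_factor(log_marginals, m1, m2):
    """B_{m1,m2} computed from log marginal likelihoods keyed by number of CPs."""
    return math.exp(log_marginals[m1] - log_marginals[m2])


def posterior_num_cps(log_marginals, log_priors=None):
    """Posterior probability of each number of CPs; Uniform prior if none is given."""
    ks = sorted(log_marginals)
    if log_priors is None:
        log_priors = {k: 0.0 for k in ks}  # Uniform prior, up to a constant
    scores = {k: log_marginals[k] + log_priors[k] for k in ks}
    top = max(scores.values())  # subtract the maximum for numerical stability
    norm = sum(math.exp(s - top) for s in scores.values())
    return {k: math.exp(scores[k] - top) / norm for k in ks}


# Made-up log marginal likelihoods for models with 0, 1 and 2 CPs.
log_marginals = {0: -100.0, 1: -102.0, 2: -105.0}
post = posterior_num_cps(log_marginals)
b01 = bayes_factor(log_marginals, 0, 1)
print(post, b01)
```

Working on the log scale matters in practice, since marginal likelihoods of long series are far too small to represent directly as floating-point numbers.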

Figure 2.10 displays the results of Chib’s implementation on the GNP example. In particular, we assume the GNP data arises from a Gaussian Markov Mixture model such that the mean and the variance are state dependent. As the number of CPs is unknown a priori, it needs to be estimated first. We consider models with zero to ten CPs and approximate their respective posterior probabilities, assuming a Uniform prior over the number of CPs (Figure 2.10(a)). As the posterior highlights, zero CPs is the most probable, with some probability associated with one recession potentially occurring. The use of the Bayes’ Factor leads to the same conclusion. Up to 14 potential CPs were also considered in concordance with the 14 detected by NBER; identical results were achieved, with nearly all probability mass on zero CPs occurring.

[Figure 2.10 panels: (a) Posterior Distribution of Number of CPs, with horizontal axis No. CPs (0 to 10) and vertical axis Posterior Probability; (b) CP probability assuming one CP occurs, plot title Posterior Density of Regime Change Probabilities, with horizontal axis Time (0 to 140) and vertical axis Probability.]

Figure 2.10: Posterior Distribution of Number of CPs and location of first CP under the constrained HMM framework of Chib (1998). Zero CPs are most probable, but if a single CP is assumed to have occurred, then this is most likely to have occurred towards the beginning of the data.

We could thus conclude that no CPs have occurred in the data if we take the maximum a posteriori estimate of the number of CPs. However, if we condition on one CP having occurred, this CP appears to occur towards the beginning of the data.

The constrained HMM approach proposed by Chib (1998) provides a state-of-the-art framework for tackling CP problems and quantifying the uncertainty of CP characteristics. The uncertainty in CP locations is captured by sampling the underlying state sequence via a MCMC algorithm, and model parameter uncertainty is accounted for by marginalising out the parameters. However, the state sequence is typically a high-dimensional correlated vector and thus care is required in designing good moves such that the sampling MC mixes well. In addition, the uncertainty of the number and the location of CPs is not considered simultaneously, which may be desired.

In document The uncertainty of changepoints in time series (Page 56-60)