5.2 Adaptations of the Hidden Markov Model
6.1.1 Introducing the Double-Chain Markov Model
Markov chains and HMMs were reviewed in the previous chapters of this dissertation. Recall that a Markov chain is a stochastic process in which transitions between successive outputs of a discrete-time random variable are governed by the Markov property. This process is entirely observable, as each observed output is identified with exactly one state of the process. This was depicted in Figure 1.1 of Section 1.1. While Markov chains are widely used, there are applications for which the model is not appropriate. For example, in speech recognition there is no perfect identification between the state at a given point and the signal output. Instead, at each time point the state of the chain is unknown, but the output of another variable (the distribution of which depends entirely on the state of the model at that time point) is observed. This process is of course the HMM, which has been discussed in detail. Importantly, the output signal sequence of the HMM is governed by the state process (which in turn is governed, as in the Markov chain, by the Markov property). It should however be noted that if the state process is not known or not assumed, then the probability of observing a given signal at some arbitrary time point k depends on the previously output signals.¹ However, given the state at time k, the signal observed at time k is conditionally independent of all previous output signals. This was summarised in Figure 1.2 and is also made clear by the two equations below:
\[ P(S_k = s_k \mid S_{k-1} = s_{k-1}) \neq P(S_k = s_k) \]
\[ P(S_k = s_k \mid X_k = i,\ S_{k-1} = s_{k-1}) = P(S_k = s_k \mid X_k = i). \tag{6.1} \]
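The contrast between the two equations can be checked numerically on a small example. The sketch below uses a hypothetical two-state, two-signal HMM whose matrices A and E are invented for illustration (they do not come from the source), and it assumes the hidden chain starts in its stationary distribution. It computes the marginal probability of a signal and the probability of the same signal given the previous signal, with the hidden states summed out.

```python
# Toy two-state, two-signal HMM; every number here is an illustrative assumption.
A = [[0.9, 0.1],   # A[i][j] = P(X_k = j | X_{k-1} = i)
     [0.2, 0.8]]
E = [[0.8, 0.2],   # E[i][s] = P(S_k = s | X_k = i)
     [0.3, 0.7]]

# Stationary distribution of a two-state chain, in closed form.
total_leave = A[0][1] + A[1][0]
pi = [A[1][0] / total_leave, A[0][1] / total_leave]


def marginal(s):
    """P(S_k = s) when the hidden chain is stationary."""
    return sum(pi[i] * E[i][s] for i in range(2))


def conditional(s, prev):
    """P(S_k = s | S_{k-1} = prev), with the hidden states summed out."""
    joint = sum(pi[i] * E[i][prev] * A[i][j] * E[j][s]
                for i in range(2) for j in range(2))
    return joint / marginal(prev)


# The two values differ: without the state, the previous signal is informative.
print(marginal(0), conditional(0, 0))
```

Given the state, the second equation holds by construction in this toy model, since E[i][s] has no argument for the previous signal.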
This conditional independence between the outputs of the HMM (equation (6.1)) may not always be justified. In fact, the literature contains numerous examples of processes governed by the HMM structure for which the assumption of conditional independence between outputs is deemed inappropriate (see for example [10], [11], [22], and [46]). In particular, if it is assumed that successive outputs are related through the Markov property, then the resulting model is the double-chain Markov model (DCMM) presented in this chapter. That is, the DCMM has a similar stochastic framework to the HMM, but it is now assumed that at a given time point the signal emitted depends not only on the current hidden state but also (through the Markov property) on the previous observed signal (see Figure 1.3).
¹ The reason for this is that the observed signal sequence holds valuable information for predicting the state sequence, which in turn drives the signal process. Hence, if the state process is not known or not assumed, the previous output signals hold valuable information for calculating the probability of observing a given signal.
The name double-chain Markov model is now clear, as the model is a combination of two inter-linked Markov chains: the hidden chain governing the relation between the states, and the observable chain governing (together with the hidden state process) the relation between the observed outputs or signals.
The dependence of a signal on both the current state and the previous emitted signal can be explained as follows. The signal process can be considered as a Markov chain whose transition probability matrix depends on the state currently occupied. That is, a signal transition probability matrix is associated with each state in the state space, and each time the DCMM enters a new state, the signal transition probability matrix for that state is used to determine which signal (given the previous signal) will be emitted at that time point. The output of the DCMM can thus be viewed as a time-inhomogeneous Markov chain, where the transition probability matrix used for the outputs is driven by the state process of the DCMM.
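The mechanism described above can be sketched as a short simulation. This is a minimal illustration under stated assumptions rather than a definitive implementation: the function name simulate_dcmm, the toy matrices P_toy and B_toy, and the choice to fix the initial state and initial signal are all hypothetical and not taken from the source.

```python
import random


def simulate_dcmm(P, B, n_steps, state0=0, signal0=0, seed=0):
    """Simulate a DCMM path.

    P[i][j] is the probability of moving from hidden state i to hidden state j.
    B[i][u][v] is the probability of emitting signal v in hidden state i,
    given that the previous signal was u.
    The initial state and signal are fixed (an assumption of this sketch).
    """
    rng = random.Random(seed)
    states, signals = [state0], [signal0]
    for _ in range(n_steps):
        # Hidden chain: the next state depends only on the current state.
        state = rng.choices(range(len(P)), weights=P[states[-1]])[0]
        # Observable chain: the matrix is chosen by the new state,
        # the row within it by the previous signal.
        row = B[state][signals[-1]]
        signal = rng.choices(range(len(row)), weights=row)[0]
        states.append(state)
        signals.append(signal)
    return states, signals


# Toy parameters, invented for illustration only.
P_toy = [[0.9, 0.1],
         [0.2, 0.8]]
B_toy = [
    [[0.7, 0.3], [0.4, 0.6]],  # signal transition matrix used in state 0
    [[0.2, 0.8], [0.1, 0.9]],  # signal transition matrix used in state 1
]
states, signals = simulate_dcmm(P_toy, B_toy, n_steps=10)
```

Only the signals list would be observed in practice; the states list is the hidden path that selects which signal transition matrix is in force at each step.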
A benefit of the DCMM is that the advantages of both the Markov chain and the HMM are conserved: the system is driven by an unobserved latent process, while successive outputs are dependent through the Markov property.
As an example of the DCMM, consider the following application adapted from [10]. In this application it is desired to model a time series of daily average wind speeds at a specific location over a 17-year period. These wind speeds are of interest for assessing the possible use of wind power in the area. More specifically, two extreme conditions that can prevent good exploitation of wind power need to be considered: days with exceptionally low wind speed and days with exceptionally high wind speed. Accordingly, the data are classified into three categories: ‘low wind speed’, ‘normal wind speed’ and ‘high wind speed’. Let these categories be denoted by Cl, Cn and Ch respectively.
Several models were fitted to these data, including Markov chains, HMMs and DCMMs. For each of these models, let {Cl, Cn, Ch} represent the set of possible observations.
For the Markov chain, the output of the process is its state (that is, the state process is also the signal process), and hence {Cl, Cn, Ch} represents the state/signal space of the Markov chain. A single transition matrix is then used throughout to model transitions between these outputs. That is, dependence between the wind speeds on successive days is modelled, but no underlying latent factor is considered.

The HMM considers an underlying latent factor by including the hidden state process. This could, for example, be some seasonal factor: at certain times of the year higher wind speeds could be expected, while at other times lower wind speeds could be the norm. Since the output of the HMM is a signal determined by its hidden state process, {Cl, Cn, Ch} now represents the signal space of the HMM. The current state occupied then influences the probability of observing each of the signals in {Cl, Cn, Ch}. While the HMM does incorporate this latent factor driving the observed wind speed, it assumes that there is no direct dependence between the wind speeds on successive days.
The DCMM combines these two models and conserves the advantages of each. It once again incorporates the hidden state process (e.g. the process of the seasonal factor), which influences the signal observed from the signal space {Cl, Cn, Ch}.² However, direct dependence (through the Markov property) between the wind speeds on successive days is now also modelled. This is done by estimating a separate signal transition probability matrix for each hidden state (seasonal factor) in the state space. In the application given in [10] it was found, using BIC as the model selection criterion, that of the models considered the DCMM with two states was the most appropriate. The state transition matrix was estimated to be
\[ P = \begin{pmatrix} 0.9875 & 0.0125 \\ 0.0148 & 0.9852 \end{pmatrix}. \]
² For the DCMM, the output is a signal (influenced by its hidden state process), and so
Thus, as intuitively expected, the seasonal factor is estimated to be quite stable through time (as the underlying data represents daily intervals).
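The stability of the seasonal factor can be quantified with a short calculation. The sketch below takes the estimated state transition matrix from the text and applies the standard result that the length of a stay in a state of a Markov chain is geometrically distributed, so that its mean is 1 / (1 − p_ii). The variable names are illustrative, and the stationary distribution is included only as a supplementary summary.

```python
# State transition matrix as estimated in the text.
P = [[0.9875, 0.0125],
     [0.0148, 0.9852]]

# Mean number of consecutive days spent in state i before leaving it.
expected_stay = [1.0 / (1.0 - P[i][i]) for i in range(2)]

# Long-run share of days spent in each state (closed form for two states).
leave_1, leave_2 = P[0][1], P[1][0]
stationary = [leave_2 / (leave_1 + leave_2),
              leave_1 / (leave_1 + leave_2)]

print(expected_stay, stationary)
```

Under these estimates a stay in state 1 lasts about 80 days on average and a stay in state 2 about 68 days, which is consistent with a slowly varying seasonal factor.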
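The signal transition probability matrices for the two states were estimated as

\[
B(1) = \begin{pmatrix} 0.3550 & 0.6450 & 0 \\ 0.0805 & 0.8874 & 0.0321 \\ 0.0228 & 0.7721 & 0.2051 \end{pmatrix},
\qquad
B(2) = \begin{pmatrix} 0.1973 & 0.7846 & 0.0181 \\ 0.0361 & 0.8137 & 0.1502 \\ 0 & 0.6826 & 0.3174 \end{pmatrix},
\]

where B(1) represents the signal transition probability matrix when the DCMM is in state 1, and B(2) represents the signal transition probability matrix when the DCMM is in state 2. Rows correspond to the previous signal and columns to the current signal, both in the order Cl, Cn, Ch.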
It can be seen that transitions into Cl (column 1) are more likely under B(1) than under B(2). Conversely, transitions into Ch (column 3) are more likely under B(2) than under B(1). This suggests that state 1 corresponds to seasons or time periods when lower wind speeds would be expected, while state 2 corresponds to seasons or time periods when higher wind speeds would be expected. For the DCMM, this highlights the dependence of the output signal on both the previous signal (through the Markov property) and the current state (which is governed by the hidden Markov chain).

The DCMM may thus prove particularly useful when the transition probability matrix of a Markov chain is expected to change through time in line with changes in some underlying latent process.
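The column comparison between B(1) and B(2) can be verified entry by entry. The sketch below copies the two estimated matrices from the text, reads rows as the previous signal and columns as the current signal in the order (Cl, Cn, Ch), and checks that each row is a probability distribution. The helper name column and the flag names are illustrative.

```python
# Estimated signal transition matrices, copied from the text.
# Rows: previous signal; columns: current signal; order (Cl, Cn, Ch).
B1 = [[0.3550, 0.6450, 0.0000],
      [0.0805, 0.8874, 0.0321],
      [0.0228, 0.7721, 0.2051]]
B2 = [[0.1973, 0.7846, 0.0181],
      [0.0361, 0.8137, 0.1502],
      [0.0000, 0.6826, 0.3174]]


def column(matrix, j):
    """Return column j of a matrix stored as a list of rows."""
    return [row[j] for row in matrix]


# Every row should sum to one (up to rounding in the reported estimates).
rows_ok = all(abs(sum(row) - 1.0) < 1e-9 for row in B1 + B2)

# Transitions into Cl are more likely in state 1, whatever the previous signal.
low_favoured_in_state_1 = all(
    a > b for a, b in zip(column(B1, 0), column(B2, 0)))

# Transitions into Ch are more likely in state 2, whatever the previous signal.
high_favoured_in_state_2 = all(
    a < b for a, b in zip(column(B1, 2), column(B2, 2)))

print(rows_ok, low_favoured_in_state_1, high_favoured_in_state_2)
# → True True True
```

The comparison holds for every previous signal, not just on average, which supports reading state 1 as the low-wind regime and state 2 as the high-wind regime.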
Based on the above discussion it may well be expected that both the time-homogeneous Markov chain and HMM are special cases of the DCMM. This desirable property does indeed hold true and is formally proven in Appendix A.
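As a hedged sketch of why this is plausible (not necessarily the construction used in Appendix A), the two special cases can be written as restrictions on the signal transition probabilities. The symbol b with a state superscript and two signal subscripts is notation introduced here for illustration only and is not taken from the source.

```latex
% Illustrative notation: probability of emitting signal v at time k,
% given hidden state i at time k and signal u at time k-1.
b^{(i)}_{uv} = P(S_k = v \mid X_k = i,\ S_{k-1} = u)

% HMM: the previous signal is irrelevant, so all rows of B(i) are identical.
b^{(i)}_{uv} = b^{(i)}_{v} \quad \text{for all } u

% Time-homogeneous Markov chain on the signals: the hidden state is
% irrelevant, so all matrices B(i) coincide (for instance when M = 1).
b^{(i)}_{uv} = b_{uv} \quad \text{for all } i
```

Under the first restriction the conditional independence in equation (6.1) is recovered; under the second, the outputs form a single Markov chain regardless of the hidden path.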
While the advantages of the DCMM have been mentioned, one notable disadvantage is that the DCMM contains more parameters than either the Markov chain or the HMM. For a given application, these parameters will typically have to be estimated from the observed data (parameter estimation for the DCMM is discussed later in this chapter). Thus, one of the considerations when assessing the suitability of the DCMM over the time-homogeneous Markov chain and the HMM is the amount of data available. The number of parameters to be estimated for each model is given below (where M represents the number of states in the state space and K represents the number of signals in the signal space):
• For the Markov chain, the number of parameters which need to be estimated is (M − 1) + M (M − 1).
• For the HMM, the number of parameters which need to be estimated is (M − 1) + M (M − 1) + M (K − 1).
• For the DCMM, the number of parameters which need to be estimated is (M − 1) + M (M − 1) + M K(K − 1).
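The three counts above translate directly into code. The sketch below implements the formulas exactly as listed; the function names are illustrative. For the Markov chain, M is the number of states of the chain itself, which in the wind speed example coincides with the three observed categories.

```python
def n_params_markov_chain(M):
    """Initial distribution (M - 1) plus transition matrix M(M - 1)."""
    return (M - 1) + M * (M - 1)


def n_params_hmm(M, K):
    """Markov chain on M states plus one signal distribution per state."""
    return (M - 1) + M * (M - 1) + M * (K - 1)


def n_params_dcmm(M, K):
    """Markov chain on M states plus one K x K signal transition matrix per state."""
    return (M - 1) + M * (M - 1) + M * K * (K - 1)


# Wind speed setting: K = 3 signal categories, M = 2 hidden states.
print(n_params_markov_chain(3))  # → 8
print(n_params_hmm(2, 3))        # → 7
print(n_params_dcmm(2, 3))       # → 15
```

With two hidden states and three signals, the DCMM therefore needs roughly twice as many parameters as the HMM, which illustrates why the amount of available data matters when choosing between them.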
Finally, it should be noted that this dissertation focuses on the discrete-time, discrete-state and discrete-signal DCMM in which the state transition probability matrix and the signal transition probability matrix for each state are assumed to be time homogeneous.