HMMs and three basic problems - Bayesian extreme quantile regression for hidden Markov models

For hidden Markov models to be useful in real-world applications, three basic problems need to be solved.

• 1st problem - evaluation problem :

Supposing we have an observation sequence and a specific model, how do we efficiently compute the probability of the observation sequence, given that model? (In other words, what is the probability that this observation sequence was produced by that model?)

• 2nd problem - decoding problem :

Supposing, again, we have an observation sequence and a specific model, how do we deduce from the observation sequence the most likely state sequence in a meaningful manner? (For instance, how do we find a corresponding state sequence that best "explains" the observations?)

• 3rd problem - estimation problem :

How do we adjust the parameters of our model in order to maximize the probability of the observation sequence, given the model?

3.3.1 Solutions to the three basic problems of HMMs

In this section our aim is to briefly describe the solutions to the above problems and make the connections with some algorithms (Viterbi, Baum-Welch, Forward-Backward), rather than present the solutions in great detail.

The first problem (evaluation problem) can also be viewed in a different but extremely useful way: as a measure of how well a given model matches a given observation sequence. Therefore, if we want to choose, among several competing models, the one that best matches the observations, all we have to do is solve the evaluation problem for each of them. The most straightforward way of solving this problem is to enumerate every possible state sequence of length equal to the number of observations. For $N$ states and $T$ observations there are $N^T$ such sequences, so this calculation is computationally infeasible for a large number of observations or states; even for small values the number of calculations is very large. However, there is a more efficient procedure for solving this problem, called the Forward-Backward algorithm (Baum 1972).
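As a minimal sketch of the evaluation problem, the code below computes the probability of an observation sequence both by brute-force enumeration of all state sequences and by the forward recursion, and the two agree. The two-state, two-symbol model (`init`, `trans`, `emit`) and the sequence `obs` are hypothetical values chosen only for illustration; they do not come from this thesis.

```python
from itertools import product

def forward_likelihood(init, trans, emit, obs):
    """Pr(obs | model) via the forward recursion (cost grows like T * N^2)."""
    n = len(init)
    # alpha[i] = Pr(y_1..y_t, x_t = i)
    alpha = [init[i] * emit[i][obs[0]] for i in range(n)]
    for y in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][y]
                 for j in range(n)]
    return sum(alpha)

def brute_force_likelihood(init, trans, emit, obs):
    """Same quantity by summing over all N^T state sequences."""
    n = len(init)
    total = 0.0
    for path in product(range(n), repeat=len(obs)):
        p = init[path[0]] * emit[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
        total += p
    return total

# Hypothetical model: 2 hidden states, 2 observable symbols (assumed values).
init = [0.6, 0.4]                    # initial state probabilities
trans = [[0.7, 0.3], [0.4, 0.6]]     # trans[i][j] = Pr(x_t = j | x_{t-1} = i)
emit = [[0.9, 0.1], [0.2, 0.8]]      # emit[i][k] = Pr(y_t = k | x_t = i)
obs = [0, 1, 0]

fast = forward_likelihood(init, trans, emit, obs)
slow = brute_force_likelihood(init, trans, emit, obs)
print(fast, slow)
```

To compare competing models, one would evaluate `forward_likelihood` under each model's parameters and prefer the model with the largest value. For long sequences the unscaled recursion underflows, so a practical version rescales `alpha` at every step and accumulates the log-likelihood.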

The second problem (decoding problem) is a way of uncovering the hidden part of the model. Finding the most likely state sequence is not always needed, because the probability measure of a hidden Markov model does not explicitly involve the state sequence. However, in many applications it is important and useful to uncover that sequence. This problem can be solved in several possible ways. One way is to maximize, separately for each time $t$, the probability of being in state $i$ at time $t$, given the observed sequence $y^T$ and the model parameters $\theta$,

$\Pr(x_t = i \mid y^T, \theta)$.

Alternatively, to find the single most likely state sequence as a whole, one can use dynamic programming methods such as the Viterbi algorithm (Forney 1973).

The third problem (estimation problem) concerns methods of optimizing the model parameters and it is the most difficult one, as there is no known way of solving it analytically. Usually, the maximum likelihood method is followed, in order to find parameters that maximize the probability of the observation sequence $y^T$, given the state sequence $x^T$,

$\Pr(y^T \mid x^T, \theta)$.

This maximization can be accomplished via the Baum-Welch algorithm (Baum, Petrie, Soules and Weiss 1970).

Alternatively, one may use Bayesian inference to estimate the parameters of the hidden Markov model via Markov chain Monte Carlo methods.

Forward-Backward algorithm

The Forward-Backward algorithm (FB; Baum et al. 1970) is a set of filtering recursions that are used to calculate the likelihood and to simulate realizations of the underlying process of a hidden Markov model given the values of the model parameters. Usually it is used within more general recursive schemes, where the parameters need to be estimated. In particular, the Forward-Backward algorithm can be used to evaluate the likelihood within the steps of an expectation-maximization (EM) algorithm for ML estimation, or to simulate realizations of the hidden chain in an MCMC algorithm for Bayesian estimation. This algorithm will be used in applications in this dissertation and, therefore, we present it in detail in the following section.
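The two uses described above, evaluating the likelihood and simulating the hidden chain, can be sketched as a forward filtering pass followed by a backward sampling pass. This is one common way of organizing the recursions and is not necessarily the exact form presented later in this dissertation. The function names and the two-state model are hypothetical.

```python
import math
import random

def forward_filter(init, trans, emit, obs):
    """Return filtered probabilities Pr(x_t = i | y_1..y_t) and the log-likelihood."""
    n = len(init)
    filt, loglik = [], 0.0
    pred = init                       # Pr(x_t = i | y_1..y_{t-1})
    for y in obs:
        unnorm = [pred[i] * emit[i][y] for i in range(n)]
        c = sum(unnorm)               # Pr(y_t | y_1..y_{t-1})
        loglik += math.log(c)
        cur = [u / c for u in unnorm]
        filt.append(cur)
        pred = [sum(cur[i] * trans[i][j] for i in range(n)) for j in range(n)]
    return filt, loglik

def backward_sample(filt, trans, rng):
    """Draw one state sequence from Pr(x_1..x_T | y_1..y_T, parameters)."""
    n, T = len(filt[0]), len(filt)
    states = [0] * T
    states[-1] = rng.choices(range(n), weights=filt[-1])[0]
    for t in range(T - 2, -1, -1):
        # Pr(x_t = i | x_{t+1}, y_1..y_t) is proportional to filt[t][i] * trans[i][x_{t+1}]
        w = [filt[t][i] * trans[i][states[t + 1]] for i in range(n)]
        states[t] = rng.choices(range(n), weights=w)[0]
    return states

# Hypothetical model and data (assumed values, for illustration only).
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 0]

filt, loglik = forward_filter(init, trans, emit, obs)
draw = backward_sample(filt, trans, random.Random(1))
print(math.exp(loglik), draw)
```

Normalizing at every step keeps the recursion numerically stable, and the normalizing constants multiply to the likelihood, which is why the log-likelihood comes out of the forward pass at no extra cost.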

Viterbi algorithm

This algorithm was developed by Andrew Viterbi as an error-correction scheme for noisy digital communication links. Nowadays, it is commonly used in speech recognition, keyword spotting, computational linguistics and bioinformatics. It is a dynamic programming algorithm that finds the most likely sequence of states to have generated a sequence of observations. For example, the observed sequence could be an acoustic signal and the hidden state sequence that caused the observations could be a string of text. The algorithm is based on several assumptions. Both observed and hidden events must be in a sequence, which often corresponds to time. These two sequences need to be aligned, so that each observation corresponds to exactly one hidden event. Moreover, computing the most likely hidden state sequence up to a certain point $t$ must depend only on the observed event at point $t$ and the most likely sequence at point $t-1$. A transition from a previous state to a new one is marked by an incremental metric (a number), which depends on the transition probability from the old to the new state. The algorithm keeps such a number for each state, so that when an event occurs it examines the possible new states and chooses the best one using these metrics.
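The per-state metric described above can be sketched as a running log-probability for each state, together with back-pointers that record which previous state produced it. The sketch below assumes a discrete-emission model; the function name `viterbi` and the numerical values are hypothetical.

```python
import math

def viterbi(init, trans, emit, obs):
    """Return the most likely state sequence and its joint log-probability."""
    n = len(init)

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # score[i] = best log-probability of any path that ends in state i
    score = [log(init[i]) + log(emit[i][obs[0]]) for i in range(n)]
    back = []                                  # back[t][j] = best predecessor of j
    for y in obs[1:]:
        prev, new_score = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + log(trans[i][j]))
            prev.append(best_i)
            new_score.append(score[best_i] + log(trans[best_i][j]) + log(emit[j][y]))
        back.append(prev)
        score = new_score
    # Trace the back-pointers from the best final state.
    last = max(range(n), key=lambda i: score[i])
    path = [last]
    for prev in reversed(back):
        path.append(prev[path[-1]])
    path.reverse()
    return path, score[last]

# Hypothetical model and data (assumed values, for illustration only).
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 0]

path, logp = viterbi(init, trans, emit, obs)
print(path, math.exp(logp))
```

Working with log-probabilities turns the products into sums, which avoids underflow on long sequences without changing which path is selected.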

Baum-Welch algorithm

The Baum-Welch algorithm was developed by Leonard E. Baum and his co-workers in a series of papers published between 1966 and 1972 (Baum and Petrie 1966; Baum and Eagon 1967; Baum and Sell 1968; Baum, Petrie, Soules and Weiss 1970; Baum 1972). The name of Welch appears only as joint author, with Baum, of a paper listed by Baum, Petrie, Soules and Weiss (1970) as submitted for publication. It is an example of an algorithm of the Expectation-Maximization (EM) type. The Baum-Welch algorithm updates the model parameters iteratively until convergence, using the Forward-Backward algorithm at each iteration to compute the quantities needed for the update; it can be interpreted as an extension of the forward induction procedure used for the evaluation problem.
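A single Baum-Welch update for a discrete-emission model can be sketched as follows: scaled forward and backward passes give the expected state occupancies and transitions (E-step), which are then normalized into new parameter values (M-step). The function name `baum_welch_step`, the starting values and the data are hypothetical, and this is a sketch under those assumptions rather than a production implementation.

```python
import math

def baum_welch_step(init, trans, emit, obs):
    """One EM update. Returns new (init, trans, emit) and the
    log-likelihood of the INPUT parameters."""
    n, m, T = len(init), len(emit[0]), len(obs)
    # Scaled forward pass: alpha[t][i] = Pr(x_t = i | y_1..y_t)
    alpha, scale, pred = [], [], init
    for y in obs:
        u = [pred[i] * emit[i][y] for i in range(n)]
        c = sum(u)
        alpha.append([v / c for v in u])
        scale.append(c)
        pred = [sum(alpha[-1][i] * trans[i][j] for i in range(n)) for j in range(n)]
    # Backward pass, scaled with the same constants
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n)) / scale[t + 1]
    # E-step: gamma[t][i] = Pr(x_t = i | y), xi_sum[i][j] = expected i -> j transitions
    gamma = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
    xi_sum = [[sum(alpha[t][i] * trans[i][j] * emit[j][obs[t + 1]]
                   * beta[t + 1][j] / scale[t + 1] for t in range(T - 1))
               for j in range(n)] for i in range(n)]
    # M-step: normalize the expected counts
    new_init = gamma[0][:]
    new_trans = [[xi_sum[i][j] / sum(xi_sum[i]) for j in range(n)] for i in range(n)]
    occ = [sum(gamma[t][i] for t in range(T)) for i in range(n)]
    new_emit = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) / occ[i]
                 for k in range(m)] for i in range(n)]
    loglik = sum(math.log(c) for c in scale)
    return new_init, new_trans, new_emit, loglik

# Hypothetical starting values and data (assumed, for illustration only).
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0]

params, logliks = [init, trans, emit], []
for _ in range(5):
    *params, ll = baum_welch_step(*params, obs)
    logliks.append(ll)
print(logliks)
```

Each iteration cannot decrease the likelihood, so the recorded log-likelihoods form a non-decreasing sequence; in practice one stops when the increase falls below a chosen tolerance. EM converges to a local maximum only, so several starting values are usually tried.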

In this thesis we implement an MCMC algorithm for inference about discrete-time finite state-space hidden Markov models using the Forward-Backward algorithm. The algorithm first updates the hidden sequence of states given the model parameters, then updates the values of the parameters from their conditional distributions, and repeats these two steps until convergence.
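In symbols, one sweep of such a two-block sampler can be written as below. The iteration index $m$ is notation introduced here for illustration; the specific conditional distributions depend on the model and priors, which this section does not specify.

```latex
\begin{align*}
x^{T,(m)}    &\sim p\bigl(x^T \mid y^T, \theta^{(m-1)}\bigr)
  && \text{(hidden states, simulated via the Forward-Backward recursions)} \\
\theta^{(m)} &\sim p\bigl(\theta \mid x^{T,(m)}, y^T\bigr)
  && \text{(parameters, from their conditional distributions)}
\end{align*}
```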

The reason why we use the Forward-Backward algorithm and not either of the other two (Viterbi or Baum-Welch) is that the Forward-Backward algorithm is more appropriate given our model construction (we need to calculate the likelihood and simulate realizations of the latent variables given the model parameters).
