The main aim of this thesis is to develop Bayesian diagnostic tools for model selection in the context of hidden Markov models (HMMs). From a Bayesian perspective, we develop likelihood-based criteria for HMMs derived from the AIC, BIC and DIC. We extend the original definition of the DIC by taking into account the concept of focus and the availability of a closed form of the HMM likelihood. We also develop modified Bayesian versions of the AIC and BIC, which are approximated using the posterior distribution of the model parameters. In addition, we examine the WAIC (Watanabe, 2010), which is based on the pointwise predictive density. We further develop a Poisson hidden Markov model (PHMM) to model traffic crash data spatially. Our methodology is illustrated by an application involving crashes that occurred on several motorways in the UK. We are interested in identifying highway segments that have distinct crash rates (distinct states) of the relative safety process. Selecting an optimal number of states is an important part of the interpretation.
This thesis is organized as follows:
In Chapter 2 we briefly present the concept of mixture models to give a better understanding of hidden Markov models.
In Chapter 3 we give the fundamental definitions and notation for HMMs. In addition, we introduce the idea of presenting the HMM as a generative model, and we develop an algorithm that explains the mechanism of generating data from a parametric HMM. This chapter also presents the forward-backward algorithm, as well as the estimation of the model parameters using the EM approach.
Chapter 4 discusses inference for the unknown parameters of hidden Markov models within the Bayesian framework. We set out a theory of hidden state models and develop the necessary MCMC algorithm. We discuss the problem of estimating the hidden state sequence of an HMM. In addition, this chapter discusses the problem of label switching, reviewing the relevant literature and the solutions proposed there.
In Chapter 5 we consider model selection for HMMs. We derive several forms of the likelihood function of an HMM, namely the observed, complete and conditional likelihoods. We develop several versions of the deviance information criterion (DIC; Spiegelhalter et al., 2002) based on the conditional and observed likelihoods. In addition, we propose several modified versions of the Akaike information criterion (AIC; Akaike, 1973) and the Bayesian information criterion (BIC; Schwarz, 1978), approximated from a Bayesian perspective. This chapter also introduces a criterion based on assessing the predictive ability of an HMM, the widely applicable information criterion (WAIC; Watanabe, 2009).
In Chapter 6 we present simulation studies, based on synthetic data and a real data application, to assess the model selection criteria proposed in Chapter 5.
In Chapter 7 we present an application involving traffic crash data. In this chapter we model the spatial dependency, rather than the temporal dependency, along a highway using a Poisson hidden Markov model (PHMM). We apply our methodology to identify the highway segments that have distinct crash rates (distinct states) of the relative safety process. This chapter also covers estimation and model selection, including a sensitivity analysis of the priors chosen for the state-specific crash rates.
Finally, Chapter 8 summarizes the work of this thesis and proposes some ideas for future research on HMMs.
Finite Mixture Models
2.1 Introduction
Hidden Markov models can be considered an extension of mixture models in which each observation is generated, independently given the states, from a distribution that depends on a state (or component) following an unobserved Markov process (Cappé et al., 2005). In order to understand the theoretical structure of hidden Markov models, we devote this chapter to a brief review of some fundamentals of mixture models. This thesis concentrates mainly on HMMs with a discrete and finite state space.
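The relationship between the two model classes can be illustrated with a minimal sketch, which is not the generating algorithm developed later in this thesis. It assumes Poisson state-dependent distributions; the function names and all numerical values (weights, rates, transition matrix) are hypothetical and chosen only for illustration. The only difference between the two samplers is how the hidden labels are drawn: independently in the mixture model, and through a Markov chain in the HMM.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's multiplication method (adequate for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_labels_mixture(weights, n, rng):
    """Mixture model: every label is drawn independently from the same weights."""
    return rng.choices(range(len(weights)), weights=weights, k=n)


def sample_labels_hmm(initial, transition, n, rng):
    """HMM: each label depends on the previous label through a transition matrix."""
    states = range(len(initial))
    labels = [rng.choices(states, weights=initial)[0]]
    for _ in range(n - 1):
        previous = labels[-1]
        labels.append(rng.choices(states, weights=transition[previous])[0])
    return labels


def emit(labels, rates, rng):
    """Given the labels, observations are drawn independently from state-specific Poissons."""
    return [sample_poisson(rates[z], rng) for z in labels]


# Hypothetical illustrative values (assumptions, not estimates from any data set).
rng = random.Random(0)
rates = [1.0, 6.0]
mixture_labels = sample_labels_mixture([0.7, 0.3], 20, rng)
hmm_labels = sample_labels_hmm([0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]], 20, rng)
mixture_data = emit(mixture_labels, rates, rng)
hmm_data = emit(hmm_labels, rates, rng)
```

If every row of the transition matrix equals the vector of mixture weights, the HMM sampler reduces to the mixture sampler, which is the sense in which the HMM extends the mixture model.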
Mixture models have been developed as a flexible tool for modelling data with unobserved heterogeneity, for example when different types of data form clusters or groups. A finite mixture model (FMM) is generally used when an observation belongs to one of K groups (components) that have distinct features and can be described by different probability distributions (Marin and Robert, 2014). In other words, the density of such a model is a weighted average of a finite number of distributions (mixing components). In practice, an FMM may be a finite mixture of distributions such as Gaussian or Poisson distributions (McLachlan and Peel, 2000; Frühwirth-Schnatter, 2006).
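As a small illustration of the weighted-average form, the following sketch evaluates the probability mass function of a finite mixture of Poisson distributions. The function names, the weights and the rates are hypothetical and serve only as an example; the weights are assumed to be non-negative and to sum to one.

```python
import math


def poisson_pmf(y, rate):
    """Poisson probability mass at count y, computed on the log scale for stability."""
    return math.exp(-rate + y * math.log(rate) - math.lgamma(y + 1))


def mixture_pmf(y, weights, rates):
    """Finite mixture: weighted average of the component probability mass functions."""
    return sum(w * poisson_pmf(y, r) for w, r in zip(weights, rates))


# Hypothetical two-component example (assumed values, not fitted to data).
weights = [0.7, 0.3]
rates = [1.0, 6.0]
probabilities = [mixture_pmf(y, weights, rates) for y in range(10)]
```

With a single component of weight one, `mixture_pmf` coincides with `poisson_pmf`, so an ordinary Poisson model is the special case K = 1.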
Interest in FMMs has increased over recent decades. They can be used for cluster analysis, latent class analysis, discriminant analysis, image analysis, survival analysis, disease mapping and meta-analysis. Several textbooks treat finite mixture models in detail, such as McLachlan and Peel (2000), Frühwirth-Schnatter (2006), Schlattmann (2009) and Marin and Robert (2014).
Bayesian methods have been widely used for inference in these mixtures of distributions. The extensive use of these distributions led to the rapid development of posterior simulation techniques such as MCMC methods (McLachlan and Peel, 2000, p. 5). MCMC procedures have therefore been used to handle difficulties in estimating the parameters of an FMM, such as the unknown number of components K (Richardson and Green, 1997) and the effect of label switching (Stephens, 2000; Jasra et al., 2005). Moreover, the Bayesian framework has been employed to simplify these complicated structures by decomposing them into a set of simpler structures using hidden or latent variables (Marin and Robert, 2014).