Modification to AIC and BIC - Model Fit Diagnostics for Hidden Markov Models

In conventional HMMs, the AIC and BIC are typically based on the log-likelihood function, $\ell(\hat{\Theta})$, or equivalently the deviance $D = -2\ell(\hat{\Theta})$. Both are evaluated at a point estimate $\hat{\Theta}$ obtained by maximizing the likelihood function with the EM algorithm (Dempster et al., 1977). In this section, we introduce several versions of the AIC and BIC based on the observed and conditional log-likelihoods approximated from a Bayesian perspective, which is a new idea in the HMM context. These expressions were inspired by Brooks (2002, p. 617), who pointed out, in his comments on the article by Spiegelhalter et al. (2002), that approximate estimates of the AIC and BIC can be obtained from a deviance evaluated at the posterior draws. Brooks (2002) proposed that the model fit term in both criteria can be the expected deviance, $E_{\Theta\mid y}[D(\Theta)]$, approximated over the posterior draws.

Building on this proposal, we also develop other versions of these criteria for HMMs by evaluating their log-likelihoods at the posterior draws summarized from MCMC sampling.

As mentioned earlier, these criteria require the number of free parameters to be specified, because it determines the penalty term. To apply the criteria to HMMs of different complexity, we therefore need the number of free parameters, $h$, of the model. For an HMM with parameters $\Theta = (\pi, A, \theta)$, $h$ is given by (Zucchini and MacDonald, 2009)

$$h = K^2 + sK - 1, \tag{5.17}$$

where $K$ is the number of states and $s$ is the number of parameters of the underlying distribution of the observation process. For example, $s = 2$ for the Normal distribution ($\mu$ and $\sigma^2$) and $s = 1$ for the Poisson distribution ($\lambda$) (Zucchini and MacDonald, 2009).
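One way to read (5.17) is as a count of free parameters. There are $K(K-1)$ free transition probabilities, because each row of $A$ sums to one. There are $K-1$ free initial probabilities, because $\pi$ sums to one. Finally, there are $sK$ state-dependent parameters. These terms sum to $K^2 + sK - 1$. The sketch below encodes that reading. The function name `num_free_params` is hypothetical, and the breakdown into three terms is our own interpretation of the formula rather than something stated in the source.

```python
def num_free_params(K: int, s: int) -> int:
    """Number of free parameters h of a K-state HMM, following Eq. (5.17).

    Assumed breakdown (our reading of the formula):
    transition rows, initial distribution, state-dependent parameters.
    """
    n_transition = K * (K - 1)  # each row of A sums to one
    n_initial = K - 1           # pi sums to one
    n_state_dep = s * K         # s parameters for each of the K state distributions
    h = n_transition + n_initial + n_state_dep
    assert h == K**2 + s * K - 1  # agrees with h = K^2 + sK - 1
    return h


print(num_free_params(2, 2))  # 2-state Normal HMM: 2 + 1 + 4 = 7
print(num_free_params(3, 1))  # 3-state Poisson HMM: 6 + 2 + 3 = 11
```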

5.4.1 Recursive observed likelihood-based AIC and BIC

In this section, we provide modified versions of the AIC and BIC for HMMs based on the recursive (observed) likelihood approximated from a Bayesian perspective. Substituting the closed-form recursive log-likelihood into the general definitions of the AIC and BIC given in Section 5.2.1 yields three cases of modified AIC and BIC. We refer to these as $\mathrm{AIC}_{\mathrm{rec}}$ and $\mathrm{BIC}_{\mathrm{rec}}$, respectively, as follows:

Case I

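$$
\begin{aligned}
\mathrm{AIC}_{\mathrm{rec1}} &= E_{\Theta\mid y}\left[D_{\mathrm{rec}}(\Theta\mid y)\right] + 2h \\
&= -2E_{\Theta\mid y}\left[\log \Pr(y\mid \pi, A, \theta)\right] + 2h \\
&= -2\int_{\pi}\int_{A}\int_{\theta} \left[\log \Pr(y\mid \pi, A, \theta)\right] \Pr(\pi, A, \theta\mid y, z)\,d\pi\,dA\,d\theta + 2h,
\end{aligned}
\tag{5.18}
$$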

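$$
\begin{aligned}
\mathrm{BIC}_{\mathrm{rec1}} &= E_{\Theta\mid y}\left[D_{\mathrm{rec}}(\Theta\mid y)\right] + h\log(T) \\
&= -2E_{\Theta\mid y}\left[\log \Pr(y\mid \pi, A, \theta)\right] + h\log(T) \\
&= -2\int_{\pi}\int_{A}\int_{\theta} \left[\log \Pr(y\mid \pi, A, \theta)\right] \Pr(\pi, A, \theta\mid y, z)\,d\pi\,dA\,d\theta + h\log(T),
\end{aligned}
\tag{5.19}
$$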
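In Case I, the recursive log-likelihood $\log \Pr(y\mid\pi, A, \theta)$ is evaluated at every retained posterior draw, and the deviances are averaged. The sketch below computes that log-likelihood with the standard scaled forward recursion, then forms the Monte Carlo estimate of $E_{\Theta\mid y}[D_{\mathrm{rec}}]$. Several things here are illustrative assumptions. The state-dependent distributions are assumed to be Poisson ($s = 1$). The draws are assumed to arrive as a list of `(pi, A, lam)` tuples. The function names are hypothetical.

```python
import math
import numpy as np

# Assumption: Poisson state-dependent distributions (s = 1); helper names are hypothetical.
def poisson_pmf(y_t, lam):
    """Pr(y_t | state j) for every state j."""
    lam = np.asarray(lam, dtype=float)
    return np.exp(y_t * np.log(lam) - lam - math.lgamma(y_t + 1))


def recursive_loglik(y, pi, A, lam):
    """log Pr(y | pi, A, theta) via the scaled forward recursion."""
    A = np.asarray(A, dtype=float)
    alpha = np.asarray(pi, dtype=float) * poisson_pmf(y[0], lam)
    loglik = 0.0
    for t in range(len(y)):
        if t > 0:
            alpha = (alpha @ A) * poisson_pmf(y[t], lam)  # propagate one step, then weight
        c = alpha.sum()        # scale factor; the product of the c_t equals Pr(y | ...)
        loglik += math.log(c)
        alpha = alpha / c      # rescale to avoid numerical underflow
    return loglik


def rec_case1(y, draws, h):
    """AIC_rec1 and BIC_rec1 from posterior draws [(pi, A, lam), ...]."""
    dev = np.array([-2.0 * recursive_loglik(y, pi, A, lam) for pi, A, lam in draws])
    mean_dev = dev.mean()      # Monte Carlo estimate of E_{Theta|y}[D_rec]
    T = len(y)
    return mean_dev + 2 * h, mean_dev + h * math.log(T)
```

Scaling keeps the forward variables normalized at each step. The log-likelihood is then accumulated as the sum of the logs of the scale factors, which avoids the underflow that unscaled forward probabilities would hit for long series.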

Case II

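$$
\begin{aligned}
\mathrm{AIC}_{\mathrm{rec2}} &= D_{\mathrm{rec}}\left(E_{\Theta\mid y}(\Theta\mid y)\right) + 2h \\
&= -2\log \Pr\left(y\mid E_{\pi,A,\theta}[\pi, A, \theta\mid y, z]\right) + 2h \\
&= -2\int_{\pi}\int_{A}\int_{\theta} \log \Pr(y\mid \bar{\pi}, \bar{A}, \bar{\theta})\,\Pr(\pi, A, \theta\mid y, z)\,d\pi\,dA\,d\theta + 2h,
\end{aligned}
\tag{5.20}
$$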

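$$
\begin{aligned}
\mathrm{BIC}_{\mathrm{rec2}} &= D_{\mathrm{rec}}\left(E_{\Theta\mid y}(\Theta\mid y)\right) + h\log(T) \\
&= -2\log \Pr\left(y\mid E_{\pi,A,\theta}[\pi, A, \theta\mid y, z]\right) + h\log(T) \\
&= -2\int_{\pi}\int_{A}\int_{\theta} \log \Pr(y\mid \bar{\pi}, \bar{A}, \bar{\theta})\,\Pr(\pi, A, \theta\mid y, z)\,d\pi\,dA\,d\theta + h\log(T),
\end{aligned}
\tag{5.21}
$$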

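In these two cases, the fit terms are defined as follows. In Case I, $E_{\Theta\mid y}[D_{\mathrm{rec}}(\Theta\mid y)] = -2E_{\Theta\mid y}[\log \Pr(y\mid \pi, A, \theta)]$ is the expected recursive deviance under the posterior distribution $\Pr(\pi, A, \theta\mid y)$. In Case II, $D_{\mathrm{rec}}(E_{\Theta\mid y}(\Theta\mid y))$ is the recursive deviance evaluated at the posterior means of all model parameters, summarized from the same posterior distribution. These posterior means are approximated marginally from the $M$ draws of the Gibbs sampler as follows:

$$
\bar{\pi}_j \approx \frac{1}{M}\sum_{m=1}^{M} \pi_j^{(m)}, \qquad
\bar{A} = (\bar{a}_{jk}), \quad \bar{a}_{jk} \approx \frac{1}{M}\sum_{m=1}^{M} a_{jk}^{(m)}, \qquad
\bar{\theta}_j \approx \frac{1}{M}\sum_{m=1}^{M} \theta_j^{(m)}, \qquad j, k = 1, 2, \ldots, K.
$$

Case III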

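$$
\begin{aligned}
\mathrm{AIC}_{\mathrm{rec3}} &= E_{\hat{D}_{\mathrm{rec}}(\cdot)}\left[D_{\mathrm{rec}}(\Theta)\right] + 2h \\
&= -2E_{\widehat{\log \Pr}(\cdot)}\left[\log \Pr(y\mid \pi, A, \theta)\right] + 2h \\
&= -2\int_{\pi}\int_{A}\int_{\theta} \left[\widehat{\log \Pr(y\mid \pi, A, \theta)}\right] \Pr(\pi, A, \theta\mid y, z)\,d\pi\,dA\,d\theta + 2h,
\end{aligned}
\tag{5.22}
$$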

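$$
\begin{aligned}
\mathrm{BIC}_{\mathrm{rec3}} &= E_{\hat{D}_{\mathrm{rec}}(\cdot)}\left[D_{\mathrm{rec}}(\Theta)\right] + h\log(T) \\
&= -2E_{\widehat{\log \Pr}(\cdot)}\left[\log \Pr(y\mid \pi, A, \theta)\right] + h\log(T) \\
&= -2\int_{\pi}\int_{A}\int_{\theta} \left[\widehat{\log \Pr(y\mid \pi, A, \theta)}\right] \Pr(\pi, A, \theta\mid y, z)\,d\pi\,dA\,d\theta + h\log(T),
\end{aligned}
\tag{5.23}
$$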

where $E_{\hat{D}_{\mathrm{rec}}(\cdot)}[D_{\mathrm{rec}}(\Theta)] = -2E_{\widehat{\log \Pr}(\cdot)}[\log \Pr(y\mid \pi, A, \theta)]$ in both criteria is a minimum expected recursive deviance, $\hat{D}_{\mathrm{rec}}(\cdot)$. It is evaluated at draws from the posterior distribution of all model parameters, $\Pr(\pi, A, \theta\mid y)$, observed over an MCMC run.
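The Case III fit term replaces the posterior average of the deviance with its smallest value over the MCMC run. The sketch below assumes the recursive log-likelihood has already been evaluated at each retained draw, for example with a forward recursion. Those values are passed in as a plain sequence. The function name `rec_case3` is hypothetical.

```python
import numpy as np

def rec_case3(loglik_draws, h, T):
    """AIC_rec3 and BIC_rec3 from per-draw recursive log-likelihoods (hypothetical helper)."""
    dev = -2.0 * np.asarray(loglik_draws, dtype=float)  # D_rec at each posterior draw
    min_dev = dev.min()  # smallest recursive deviance observed over the MCMC run
    return min_dev + 2 * h, min_dev + h * np.log(T)


aic, bic = rec_case3([-100.0, -102.0, -101.0], h=5, T=50)
print(aic)  # minimum deviance 200 plus penalty 10
```

A minimum can never exceed a mean. For the same set of draws, therefore, the Case III fit term is never larger than the Case I fit term.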

The three cases differ only in their model fit term. In Case I, the model fit term in both $\mathrm{AIC}_{\mathrm{rec1}}$ and $\mathrm{BIC}_{\mathrm{rec1}}$ is the posterior mean of the recursive deviance. This case is the same as the one proposed by Brooks (2002) for autoregressive models. Our contribution is the development of the last two versions, defined in Cases II and III. In Case II, the model fit term of both $\mathrm{AIC}_{\mathrm{rec2}}$ and $\mathrm{BIC}_{\mathrm{rec2}}$ is the recursive deviance evaluated at plug-in estimates from the posterior distribution, namely the posterior means. In Case III, the model fit term is inspired by the observed $\mathrm{DIC}_3$ of Celeux et al. (2006). They proposed taking the model fit term to be a functional estimator (a minimum deviance, or equivalently a maximum log-likelihood), and pointed out that such an estimator provides more stable evaluations.

Furthermore, its density is easily approximated by MCMC evaluation. Richardson (2002) also proposed this estimator, i.e. a minimum deviance, in her discussion of Spiegelhalter et al. (2002). Accordingly, we define the model fit term in both $\mathrm{AIC}_{\mathrm{rec3}}$ and $\mathrm{BIC}_{\mathrm{rec3}}$ as the minimum recursive deviance observed over an MCMC run.
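Case II needs only one likelihood evaluation, at the posterior means. The sketch below averages the Gibbs draws component-wise, as in the posterior-mean approximations for $\bar{\pi}$, $\bar{A}$ and $\bar{\theta}$, and then calls a user-supplied log-likelihood. Several choices are illustrative assumptions. The function name `rec_case2` and the callable `loglik_fn(pi, A, theta)` are hypothetical, and `loglik_fn` could be any forward-algorithm implementation. The draws are also assumed to be free of label switching, for example after relabelling, because averaging is only meaningful when state labels agree across draws.

```python
import numpy as np

def rec_case2(pi_draws, A_draws, theta_draws, loglik_fn, h, T):
    """AIC_rec2 and BIC_rec2: recursive deviance at the posterior means.

    pi_draws: (M, K), A_draws: (M, K, K), theta_draws: (M, K) Gibbs output,
    assumed already free of label switching. loglik_fn(pi, A, theta) is a
    hypothetical callable returning log Pr(y | pi, A, theta).
    """
    pi_bar = np.mean(pi_draws, axis=0)        # (1/M) sum_m pi_j^(m)
    A_bar = np.mean(A_draws, axis=0)          # (1/M) sum_m a_jk^(m); rows still sum to one
    theta_bar = np.mean(theta_draws, axis=0)  # (1/M) sum_m theta_j^(m)
    dev = -2.0 * loglik_fn(pi_bar, A_bar, theta_bar)
    return dev + 2 * h, dev + h * np.log(T)
```

An average of row-stochastic matrices is itself row-stochastic, so $\bar{A}$ is always a valid transition matrix and can be plugged straight into the likelihood.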

Further details of the Monte Carlo (MC) approximations for all these versions are provided in the appendix to this chapter.

5.4.2 Conditional likelihood-based AIC and BIC

Given a conditional log-likelihood, several versions of the fit term of the AIC and BIC can be derived. We refer to the conditional likelihood-based AIC and BIC as $\mathrm{AIC}_{\mathrm{con}}$ and $\mathrm{BIC}_{\mathrm{con}}$, respectively. As with the criteria based on the recursive observed likelihood, i.e. $\mathrm{AIC}_{\mathrm{rec}}$ and $\mathrm{BIC}_{\mathrm{rec}}$, we introduce three versions of these criteria as follows:

Case I

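$$
\begin{aligned}
\mathrm{AIC}_{\mathrm{con1}} &= E_{\theta,z}\left[D_{\mathrm{con}}(\theta, z)\right] + 2h \\
&= -2E_{\theta,z}\left[\log \Pr(y\mid \theta, z)\right] + 2h \\
&= -2\int_{z}\int_{\theta} \left[\log \Pr(y\mid \theta, z)\right] \Pr(\theta, z\mid y)\,dz\,d\theta + 2h,
\end{aligned}
\tag{5.24}
$$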

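$$
\begin{aligned}
\mathrm{BIC}_{\mathrm{con1}} &= E_{\theta,z}\left[D_{\mathrm{con}}(\theta, z)\right] + h\log(T) \\
&= -2E_{\theta,z}\left[\log \Pr(y\mid \theta, z)\right] + h\log(T) \\
&= -2\int_{z}\int_{\theta} \left[\log \Pr(y\mid \theta, z)\right] \Pr(\theta, z\mid y)\,dz\,d\theta + h\log(T),
\end{aligned}
\tag{5.25}
$$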

where $E_{\theta,z}[D_{\mathrm{con}}(\theta, z)] = -2E_{\theta,z}[\log \Pr(y\mid \theta, z)]$ in both criteria is the expected conditional deviance. It is evaluated over draws from the posterior distribution, $\Pr(\theta, z\mid y)$, of the state-dependent parameters $\theta$ and hidden states $z$.
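Given a hidden-state path, the conditional log-likelihood factorizes over time: $\log \Pr(y\mid\theta, z) = \sum_{t=1}^{T} \log f(y_t\mid \theta_{z_t})$. No forward recursion is needed. The sketch below averages the resulting conditional deviance over paired draws $(z^{(m)}, \theta^{(m)})$. Several details are illustrative assumptions. The state-dependent distributions are taken to be Poisson. States are coded $0, \ldots, K-1$ in the code rather than $1, \ldots, K$. The function names are hypothetical.

```python
import math
import numpy as np

# Assumptions: Poisson state-dependent distributions; states coded 0..K-1; hypothetical names.
def cond_loglik(y, z, lam):
    """log Pr(y | theta, z) = sum_t log f(y_t | lambda_{z_t})."""
    y = np.asarray(y, dtype=float)
    rates = np.asarray(lam, dtype=float)[np.asarray(z)]  # lambda_{z_t} for each t
    log_fact = np.array([math.lgamma(v + 1.0) for v in y])
    return float(np.sum(y * np.log(rates) - rates - log_fact))


def con_case1(y, z_draws, lam_draws, h):
    """AIC_con1 and BIC_con1: posterior mean of the conditional deviance."""
    dev = np.array([-2.0 * cond_loglik(y, z, lam) for z, lam in zip(z_draws, lam_draws)])
    mean_dev = float(dev.mean())  # Monte Carlo estimate of E_{theta,z}[D_con]
    return mean_dev + 2 * h, mean_dev + h * math.log(len(y))
```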

Case II

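$$
\begin{aligned}
\mathrm{AIC}_{\mathrm{con2}} &= E_{\theta,z}\left[D_{\mathrm{con}}(\hat{\theta}, \hat{z})\right] + 2h \\
&= -2E_{\theta,z}\left[\log \Pr(y\mid \hat{z}, \hat{\theta})\right] + 2h \\
&= -2\int_{z}\int_{\theta} \log \Pr(y\mid \hat{z}, \hat{\theta})\,\Pr(\theta, z\mid y)\,dz\,d\theta + 2h,
\end{aligned}
\tag{5.26}
$$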

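$$
\begin{aligned}
\mathrm{BIC}_{\mathrm{con2}} &= E_{\theta,z}\left[D_{\mathrm{con}}(\hat{\theta}, \hat{z})\right] + h\log(T) \\
&= -2E_{\theta,z}\left[\log \Pr(y\mid \hat{z}, \hat{\theta})\right] + h\log(T) \\
&= -2\int_{z}\int_{\theta} \log \Pr(y\mid \hat{z}, \hat{\theta})\,\Pr(\theta, z\mid y)\,dz\,d\theta + h\log(T),
\end{aligned}
\tag{5.27}
$$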

where $E_{\theta,z}[D_{\mathrm{con}}(\hat{\theta}, \hat{z})] = -2E_{\theta,z}[\log \Pr(y\mid \hat{z}, \hat{\theta})]$ in both criteria is the expected conditional deviance given a joint maximum a posteriori (MAP) estimator $(\hat{z}, \hat{\theta})$. This estimator is summarized from the posterior distribution, $\Pr(\theta, z\mid y)$, of the state-dependent parameters $\theta$ and hidden states $z$. It can be approximated by the best pair among the posterior draws, i.e. the pair with the highest value of

$$
(\hat{z}, \hat{\theta}) = \arg\max_{z,\,\Theta}\; \Pr(y\mid z, \theta)\,\Pr(z\mid \pi, A)\,\Pr(\theta).
$$
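The "best pair among the posterior draws" can be found by scoring each draw. The score is $\log \Pr(y\mid z, \theta) + \log \Pr(z\mid\pi, A) + \log \Pr(\theta)$, and the conditional deviance is then evaluated at the winning $(\hat{z}, \hat{\theta})$. The sketch below makes several assumptions. The state-dependent distributions are Poisson. The rates have a hypothetical independent Gamma$(a, b)$ prior, which stands in for whatever prior the actual model uses. States are coded $0, \ldots, K-1$. Draws are `(pi, A, z, lam)` tuples. All function names are hypothetical.

```python
import math
import numpy as np

def poisson_cond_loglik(y, z, lam):
    """log Pr(y | theta, z) = sum_t log f(y_t | lambda_{z_t}) (Poisson assumption)."""
    y = np.asarray(y, dtype=float)
    rates = np.asarray(lam, dtype=float)[np.asarray(z)]
    log_fact = sum(math.lgamma(v + 1.0) for v in y)
    return float(np.sum(y * np.log(rates) - rates) - log_fact)


def log_state_path(z, pi, A):
    """log Pr(z | pi, A) = log pi_{z_1} + sum_{t>=2} log a_{z_{t-1} z_t}."""
    lp = math.log(pi[z[0]])
    for t in range(1, len(z)):
        lp += math.log(A[z[t - 1]][z[t]])
    return lp


def log_gamma_prior(lam, a=1.0, b=1.0):
    """Hypothetical independent Gamma(a, b) prior (b = rate) on each Poisson mean."""
    lam = np.asarray(lam, dtype=float)
    return float(np.sum(a * math.log(b) - math.lgamma(a) + (a - 1.0) * np.log(lam) - b * lam))


def con_case2(y, draws, h):
    """AIC_con2 and BIC_con2 at the best (z, theta) pair among draws [(pi, A, z, lam), ...]."""
    best, best_score = None, -math.inf
    for pi, A, z, lam in draws:
        # unnormalized log posterior: log Pr(y|z,theta) + log Pr(z|pi,A) + log Pr(theta)
        score = poisson_cond_loglik(y, z, lam) + log_state_path(z, pi, A) + log_gamma_prior(lam)
        if score > best_score:
            best, best_score = (z, lam), score
    z_hat, lam_hat = best
    dev = -2.0 * poisson_cond_loglik(y, z_hat, lam_hat)  # conditional deviance at the MAP pair
    return dev + 2 * h, dev + h * math.log(len(y)), (z_hat, lam_hat)
```

Restricting the search to the sampled draws gives only an approximation to the joint MAP. Its quality depends on how well the MCMC run explores the high-posterior region.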

Case III

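$$
\begin{aligned}
\mathrm{AIC}_{\mathrm{con3}} &= E_{\hat{D}_{\mathrm{con}}(\cdot)}\left[D_{\mathrm{con}}(\theta, z)\right] + 2h \\
&= -2E_{\widehat{\log \Pr}(\cdot)}\left[\log \Pr(y\mid \theta, z)\right] + 2h \\
&= -2\int_{z}\int_{\theta} \left[\widehat{\log \Pr(y\mid \theta, z)}\right] \Pr(\theta, z\mid y)\,dz\,d\theta + 2h,
\end{aligned}
\tag{5.28}
$$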

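$$
\begin{aligned}
\mathrm{BIC}_{\mathrm{con3}} &= E_{\hat{D}_{\mathrm{con}}(\cdot)}\left[D_{\mathrm{con}}(\theta, z)\right] + h\log(T) \\
&= -2E_{\widehat{\log \Pr}(\cdot)}\left[\log \Pr(y\mid \theta, z)\right] + h\log(T) \\
&= -2\int_{z}\int_{\theta} \left[\widehat{\log \Pr(y\mid \theta, z)}\right] \Pr(\theta, z\mid y)\,dz\,d\theta + h\log(T),
\end{aligned}
\tag{5.29}
$$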

where $E_{\hat{D}_{\mathrm{con}}(\cdot)}[D_{\mathrm{con}}(\theta, z)] = -2E_{\widehat{\log \Pr}(\cdot)}[\log \Pr(y\mid \theta, z)]$ is a minimum expected conditional deviance. It is evaluated at draws from the posterior distribution $\Pr(\theta, z\mid y)$ observed over an MCMC run.
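The conditional Case III fit term is the smallest conditional deviance among the paired draws $(z^{(m)}, \theta^{(m)})$. The sketch below makes the same illustrative assumptions as a Poisson conditional-likelihood helper would: Poisson state-dependent distributions, states coded $0, \ldots, K-1$, and hypothetical function names. It differs from a Case I computation only in taking `min` instead of `mean`.

```python
import math
import numpy as np

# Assumptions: Poisson state-dependent distributions; states coded 0..K-1; hypothetical names.
def cond_loglik(y, z, lam):
    """log Pr(y | theta, z) = sum_t log f(y_t | lambda_{z_t})."""
    y = np.asarray(y, dtype=float)
    rates = np.asarray(lam, dtype=float)[np.asarray(z)]
    log_fact = np.array([math.lgamma(v + 1.0) for v in y])
    return float(np.sum(y * np.log(rates) - rates - log_fact))


def con_case3(y, z_draws, lam_draws, h):
    """AIC_con3 and BIC_con3: minimum conditional deviance over the MCMC run."""
    dev = np.array([-2.0 * cond_loglik(y, z, lam) for z, lam in zip(z_draws, lam_draws)])
    min_dev = float(dev.min())  # smallest conditional deviance among the draws
    return min_dev + 2 * h, min_dev + h * math.log(len(y))
```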

Note that Case I uses the expected conditional deviance, evaluated over the state-dependent parameters $\theta$ and hidden states $z$, as the model fit term of $\mathrm{AIC}_{\mathrm{con1}}$ and $\mathrm{BIC}_{\mathrm{con1}}$. Case II uses the conditional deviance evaluated at a plug-in joint Bayesian estimator $(\hat{z}, \hat{\theta})$, which can be the joint MAP estimator of $(z, \theta)$. In Case III, the model fit term in both $\mathrm{AIC}_{\mathrm{con3}}$ and $\mathrm{BIC}_{\mathrm{con3}}$ is a functional estimator. It is the minimum conditional deviance obtained over an MCMC run, given posterior draws of the state-dependent parameters $\theta$ and hidden states $z$. This last case follows the same idea as Case III of the AIC and BIC based on the recursive deviance (Subsection 5.4.1).

At the end of this chapter, we provide all the MC approximations of these versions.
