3.2 Model framework
A more general form of the hierarchical model could involve probe-specific variances, but since we observed little evidence of variation among the probe variances, we used the simpler form of the model to avoid an excessive computational burden. It is not possible to analyze the variances of different probes in different states directly, since the states themselves are unknown before estimation. However, we found that the within-probe variances were uniform throughout the data set, i.e., the variance of these probe-specific variances was negligible (less than 0.01). We therefore settled on a less complex hierarchical model that accounts for the diversity between probes only in terms of the mean structure.
3.2.3 Adding covariate effects to the model
Underlying characteristics of the DNA sequence that affect chromatin rigidity may affect the positioning of nucleosomes. In previous studies, the prevalence of certain polynucleotide sequences, such as poly-A (repetitions of the “A” nucleotide), poly-T and some others, has been seen to affect nucleosome positioning. Hence such sequence characteristics may be assumed to affect the correct prediction of the nucleosomal state. To model the association between the covariates and the states, we used three levels of models, relating the covariates to the transition rates, or the emission probabilities, via link functions.
• Model M0: the original model (Eqns. 3.2.4 and 3.2.3) assuming that the covariates do not affect either the state transitions or emissions.
• Model M1 (“transition model”): Here we use a multiplicative intensity model to associate the covariates with the transition rates, by assuming that the transition rate during the interval between two points depends on the value of the covariates at the end of the interval, i.e., at the last probe. Let $X_{ij}$ denote the value of covariate $j$ ($j = 1, \ldots, d$) for probe $i$. Then the transition rates over the interval $[t_{i-1}, t_i]$ become:
$$\log \lambda_i(X_i) = \theta_{l0} + \sum_{j=1}^{d} \theta_{lj} X_{ij}, \qquad \log \mu_i(X_i) = \theta_{m0} + \sum_{j=1}^{d} \theta_{mj} X_{ij}. \tag{3.2.5}$$
In the later sections, to simplify notation, we shall denote $\lambda_i(X_i)$ and $\mu_i(X_i)$ simply as $\lambda_i$ and $\mu_i$.
• Model M2 (“emission model”): Here we assume that the probability distribution of the observations conditional on the hidden state can also be affected by the covariate measurements (i.e., probe sequence features). In this case, we assume that the state-specific probe measurement mean can be modeled as a linear function of the covariates, i.e.,
$$\nu_i \mid Z(t_i) = k \;\sim\; N\Big(\nu_{k0} + \sum_{j=1}^{d} \beta_{kj} X_{ij},\; \tau_0 \sigma_k^2\Big). \tag{3.2.6}$$
Note that the emission model (M2) has probe-specific means already built in. So instead of modeling the $\nu_{ij}$ of the hierarchical setup, we model $\nu_i$ directly. (We implemented the hierarchical setup for the base and transition models.)
• Model M3 (“full model”): Here we assume both the transition intensities and state emission probabilities are affected by the covariates, that is, both expressions (3.2.5) and (3.2.6) hold.
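The link functions in (3.2.5) and (3.2.6) can be illustrated with a minimal Python sketch. The function names, the covariate vector and all parameter values below are hypothetical and chosen only for illustration; they are not estimates from the data.

```python
import math

def log_linear_rate(intercept, coefs, x):
    """Rate exp(intercept + sum_j coefs[j] * x[j]): the log link of Eq. (3.2.5)."""
    return math.exp(intercept + sum(c * v for c, v in zip(coefs, x)))

def emission_mean(nu_k0, beta_k, x):
    """State-specific mean nu_k0 + sum_j beta_k[j] * x[j]: the mean in Eq. (3.2.6)."""
    return nu_k0 + sum(b * v for b, v in zip(beta_k, x))

# Hypothetical covariate values for one probe i (d = 2).
x_i = [0.5, -1.0]

# Hypothetical parameter values (assumptions, for illustration only).
theta_l0, theta_l = -1.0, [0.4, 0.2]   # intercept and slopes for lambda_i
theta_m0, theta_m = -0.5, [0.0, 0.3]   # intercept and slopes for mu_i
nu_k0, beta_k = 1.0, [0.8, -0.1]       # emission intercept and slopes for state k

lambda_i = log_linear_rate(theta_l0, theta_l, x_i)  # used by M1 and M3
mu_i = log_linear_rate(theta_m0, theta_m, x_i)      # used by M1 and M3
nu_mean = emission_mean(nu_k0, beta_k, x_i)         # used by M2 and M3

# Setting all slopes to zero recovers the covariate-free rate of model M0.
lambda_m0 = log_linear_rate(theta_l0, [0.0, 0.0], x_i)
```

The log link keeps both transition rates positive for any covariate values, whereas the emission mean is left unconstrained on the real line.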
None of the models described above leads to a closed-form analytical expression for the likelihood, due to the hidden state variable $Z$, although the likelihood can be computed numerically through recursive techniques, as we discuss in Section 3.3. Before describing the model estimation procedure, we discuss the issue of identifiability in our proposed models, as it is highly relevant to the validity of our inference.
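To give a sense of what such a recursive computation can look like, the following is a minimal sketch of the standard scaled forward recursion for a generic hidden Markov model with normal emissions. It is not the procedure of Section 3.3: it assumes that the transition probability matrices between consecutive probes have already been obtained from the transition rates, and the function names and numerical values are hypothetical.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def forward_loglik(ys, init, trans, means, sds):
    """Log-likelihood of ys under a K-state HMM via the scaled forward recursion.

    init[k]        : P(Z_1 = k)
    trans[i][j][k] : P(Z_{i+2} = k | Z_{i+1} = j), one matrix per gap between
                     consecutive observations (matrices may differ across gaps)
    means, sds     : state-specific normal emission parameters
    """
    K = len(init)
    # alpha[k] is proportional to P(Z_i = k, y_1, ..., y_i).
    alpha = [init[k] * normal_pdf(ys[0], means[k], sds[k]) for k in range(K)]
    loglik = 0.0
    for i in range(1, len(ys)):
        # Rescale to avoid underflow and accumulate the log scaling constant.
        c = sum(alpha)
        loglik += math.log(c)
        alpha = [a / c for a in alpha]
        P = trans[i - 1]
        # Propagate through the transition matrix, then weight by the emission density.
        alpha = [
            sum(alpha[j] * P[j][k] for j in range(K))
            * normal_pdf(ys[i], means[k], sds[k])
            for k in range(K)
        ]
    return loglik + math.log(sum(alpha))

# Hypothetical two-state example with three observations.
ys = [0.2, 1.1, -0.3]
init = [0.6, 0.4]
trans = [[[0.9, 0.1], [0.2, 0.8]], [[0.7, 0.3], [0.4, 0.6]]]
means, sds = [0.0, 1.0], [1.0, 0.5]
ll = forward_loglik(ys, init, trans, means, sds)
```

The recursion costs a number of operations proportional to $nK^2$ for $n$ observations and $K$ states, compared with $K^n$ terms for a direct sum over all hidden state paths.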
3.2.4 Identifiability
Here, we examine closely the identifiability conditions of the models described in Sections 3.2.2 and 3.2.3. Non-identifiability of a model implies that there are two distinct parameter settings that give rise to the same likelihood (and the same posterior distribution, under a suitably non-informative prior). In a Bayesian framework, imposing an informative prior may ensure identifiability. But since we want to impose minimal prior effects on inference and accordingly set flat priors, non-identifiability can lead to serious convergence problems in the Markov chain Monte Carlo (MCMC) sampling procedure and can bias inference.

There are two parts to the proof. Our approach is very similar to that of Leroux, in the sense that both results rest on the fundamental theorem stated by Teicher in his 1967 paper “Identifiability of mixtures of product measures”. This theorem has been the basis for proving identifiability in several settings that involve the joint modeling of dependent data; hidden Markov models (HMMs) form one such category. The first part of Leroux’s proof of identifiability for HMMs, the identifiability of the emission densities (which follows from the fact that the collections of emission densities must be the same and that the emission parameters are in one-to-one correspondence with the densities), is the same as ours. For the second part, on the equivalence of the mixture weights, we diverge. From this part it can be concluded for both models that the laws of the hidden processes are equal, which implies that the transition probabilities are equal. In Leroux’s model the proof is complete at that point, since the parameters are the transition probabilities themselves. In our model the parameters are transition rates, which are related to the transition probabilities by means of the transition equations, and some further work is needed at this stage to translate the equivalence of transition probabilities into equivalence of transition rates.
To establish identifiability conditions for the proposed models, we extend the results of Teicher (1967). First, let us state the following definition and results.
Definition. Let $f_\phi(y)$ be a parametric family of densities of $Y$ with respect to a common dominating measure $\mu$ and parameter $\phi$ in some set $\Phi$. If $\pi$ is a probability measure on $\Phi$, then the density
$$f_\pi(y) = \int_{\Phi} f_\phi(y)\, \pi(d\phi)$$
is called a mixture density.
We say that the class of (all) mixtures of $f_\phi$ is identifiable if $f_\pi(y) = f_{\pi'}(y)$ for $\mu$-almost every $y$ if and only if $\pi = \pi'$.
Further, we say that the class of finite mixtures of $f_\phi$ is identifiable if, for all measures $\pi$ and $\pi'$ with finite support, $f_\pi = f_{\pi'}$ $\mu$-a.e. if and only if $\pi = \pi'$.
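As a worked instance of the definition, consider a mixing measure with finite support. The symbols $K$, $w_k$, $\phi_k$ and $\delta_{\phi_k}$ (a point mass at $\phi_k$) are notation assumed here for illustration. The integral then reduces to a finite weighted sum of component densities:

```latex
\pi = \sum_{k=1}^{K} w_k\, \delta_{\phi_k},
\qquad w_k > 0, \qquad \sum_{k=1}^{K} w_k = 1
\quad\Longrightarrow\quad
f_\pi(y) = \int_{\Phi} f_\phi(y)\, \pi(d\phi) = \sum_{k=1}^{K} w_k\, f_{\phi_k}(y).
```

Identifiability of the class of finite mixtures then says that the support points $\phi_k$ and their weights $w_k$ are determined by the density $f_\pi$.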
Proposition 1. (Teicher, 1967). The class of joint finite mixtures of the normal family is identifiable.
Proposition 2. (Teicher, 1967). Assume that the class of finite mixtures of the family $f_\phi$ of densities of $Y$ with parameter $\phi \in \Phi$ is identifiable. Then the class of finite mixtures of $n$-fold product densities $f^{(n)}_{\phi}(y) = f_{\phi_1}(y_1) \cdots f_{\phi_n}(y_n)$ with parameter $\phi = (\phi_1, \ldots, \phi_n) \in \Phi^n$ is identifiable.
Proposition 2 was proved by induction on $n$ (Teicher, 1967).
Now, note that any hidden Markov model is a finite mixture of $n$-fold product densities, where the weights of the mixture are functions of the transition probabilities.
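This representation can be written out explicitly. In the sketch below, $K$ denotes the number of hidden states, $a_k$ the initial state probabilities, $p_i(j, k)$ the probability of moving from state $j$ to state $k$ between observations $i-1$ and $i$, and $\phi_k$ the emission parameter of state $k$; all of these symbols are assumed here for illustration.

```latex
p(y_1, \ldots, y_n)
  = \sum_{(z_1, \ldots, z_n) \in \{1, \ldots, K\}^n}
    w(z_1, \ldots, z_n) \prod_{i=1}^{n} f_{\phi_{z_i}}(y_i),
\qquad
w(z_1, \ldots, z_n) = a_{z_1} \prod_{i=2}^{n} p_i(z_{i-1}, z_i).
```

The weights $w(z_1, \ldots, z_n)$ are non-negative and sum to one over the $K^n$ state paths, so the joint density is a finite mixture of $n$-fold product densities whose mixing measure places mass $w(z_1, \ldots, z_n)$ on the point $(\phi_{z_1}, \ldots, \phi_{z_n}) \in \Phi^n$.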
Keeping this in mind, we applied the above results to prove the identifiability of models M0-M3, given in the following result.
Theorem 1. Models M0, M1, M2 and M3 are identifiable. In other words, if $\eta$ denotes the total set of all parameters in any of the four models, and $L(\eta; y)$ denotes the
The proof of Theorem 1 is given in the Appendix.