
5.3 Case Study 2: Scherberger Data Set

5.4.2 HSMM with Model Selection

In this section a supervisory decoder based on the HSMM with GLM dynamics (Def. 3.5) is used to model the neural activity of the Musallam data set. The variational learning algorithm for HSMM developed in Section 3.5.3 is used in conjunction with the Bayesian model class selection and ASD priors defined in Chapter 4.

The suboptimal substate insertion method (described in Section 5.4.1) was used to enumerate a set of model classes. The model class with the highest posterior probability was a 7-state HSMM with Mc = [1,1,3,2] (see Figure 5.11). This model is remarkably similar to that identified with the GLHMM (see Section 5.4.1). The prior distributions (Section 5.4.2.1) used to model knowledge of the duration in each state were minimally informative, but were biased against rapid switching between discrete states. The identified HSMM retained the characteristic switching between several substates of the memory period that was found when identifying the GLHMM. By increasing the amount of information contained in the prior distributions, the identified models could be constrained to avoid this behavior; however, this type of model class has a significantly smaller posterior probability than its less informative counterparts.
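The selection step above can be illustrated with a minimal sketch. It assumes that each candidate model class comes with an estimate of its log evidence (for example, the variational lower bound) and that all classes are equally probable a priori. The function name, the candidate list, and the log-evidence values are hypothetical and chosen only for illustration; they are not results from the thesis.

```python
import numpy as np

def posterior_model_probs(log_evidence, log_prior=None):
    """Posterior probability of each model class, P(M | D) ∝ p(D | M) P(M).

    log_evidence: one log-evidence estimate per candidate class.
    log_prior: optional log prior per class; uniform if omitted (assumption).
    """
    log_evidence = np.asarray(log_evidence, dtype=float)
    if log_prior is None:
        log_prior = np.zeros_like(log_evidence)
    log_post = log_evidence + np.asarray(log_prior, dtype=float)
    log_post = log_post - log_post.max()  # subtract the max for numerical stability
    weights = np.exp(log_post)
    return weights / weights.sum()

# Hypothetical candidates: number of substates per original mode
# (baseline, cue, memory, reach), in the Mc notation of the text.
candidates = [[1, 1, 1, 1], [1, 1, 2, 1], [1, 1, 3, 1], [1, 1, 3, 2]]
# Made-up log-evidence values, for illustration only.
log_evidence = [-5210.0, -5150.0, -5120.0, -5100.0]

probs = posterior_model_probs(log_evidence)
best = candidates[int(np.argmax(probs))]
n_states = sum(best)  # total number of discrete states in the chosen class
```

With these illustrative numbers the last candidate is selected, and its substate counts sum to 7, matching the form of the model class reported above.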

Supervisory decoding using the 7-state HSMM requires the use of a fixed lag smoother (Def. 3.13) instead of the forward filter used for GLHMMs. This fixed lag smoother delays estimation of the discrete cognitive state by a fixed amount of time (in this section a 0.1 second lag was used). The fixed lag smoother is required because the HSMM explicitly models the duration spent in each state, which the forward filter does not effectively take into account. Indicative example decodes using the forward filter, the fixed lag smoother, and the smoother (Def. 3.12), which uses all data from the trial, are compared in Figure 5.12.
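The difference between the forward filter and a fixed lag smoother can be sketched on a much simpler model. The example below uses a plain two-state HMM with precomputed observation likelihoods, not the HSMM with GLM dynamics of Def. 3.5, and the function names, transition matrix, and likelihood values are assumptions made for illustration. The lag is expressed in time steps; how many steps correspond to 0.1 seconds depends on the bin width, which is not stated in this section.

```python
import numpy as np

def forward_filter(pi, A, lik):
    """Filtered posteriors p(s_t | y_1..t) for a plain HMM.

    pi: (K,) initial distribution; A[i, j] = p(s_{t+1}=j | s_t=i);
    lik[t, k] = p(y_t | s_t=k).
    """
    T, K = lik.shape
    alpha = np.zeros((T, K))
    a = pi * lik[0]
    alpha[0] = a / a.sum()
    for t in range(1, T):
        a = (alpha[t - 1] @ A) * lik[t]  # predict, then weight by the likelihood
        alpha[t] = a / a.sum()
    return alpha

def fixed_lag_smoother(pi, A, lik, lag):
    """Row j holds p(s_j | y_1..j+lag), i.e. the estimate available `lag` steps late."""
    alpha = forward_filter(pi, A, lik)
    T, K = lik.shape
    out = np.zeros((T - lag, K))
    for t in range(lag, T):
        beta = np.ones(K)
        for s in range(t, t - lag, -1):  # backward pass over the last `lag` steps
            beta = A @ (lik[s] * beta)
            beta = beta / beta.sum()
        g = alpha[t - lag] * beta
        out[t - lag] = g / g.sum()
    return out

# Illustrative data: a "sticky" chain that stays in state 0, with one
# outlying observation at t = 5 that strongly favours state 1.
pi = np.array([0.5, 0.5])
A = np.array([[0.95, 0.05], [0.05, 0.95]])
lik = np.tile([0.8, 0.2], (10, 1))
lik[5] = [0.02, 0.98]

filt = forward_filter(pi, A, lik)
lagged = fixed_lag_smoother(pi, A, lik, lag=2)
filter_state_at_5 = int(np.argmax(filt[5]))    # filter follows the outlier
lagged_state_at_5 = int(np.argmax(lagged[5]))  # two later samples overrule it
```

In this toy case the filter switches state on the single outlying sample, while the lagged estimate does not, which is the kind of short false positive discussed in this section. An HSMM decoder would additionally use the duration distributions, which this sketch omits.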

[Figure 5.11 image not reproduced. Panel labels: Baseline, Cue, Memory, Reach (states S1 to S4); substates S1.1, S2.1, S3.1, S3.2, S3.3, S4.1, S4.2; axes of the duration panel: duration d, P(d).]

Figure 5.11: The optimal 7-state HSMM identified using a combination of Bayesian model class selection and ASD priors. (a) Original left-to-right state sequence defined by experimental cues of the first trial of the training data set; the states represent baseline, cue, memory, and reach, respectively. (b) The corresponding 7-state HSMM state sequence found during the identification process. (c) The recorded arm movement during the trial. (d) The original left-to-right transition sequence of the 4-mode model. (e) The 7 sub-states of the optimal HSMM model, with allowed transitions between states depicted with an arrow. (f) The posterior distribution of the duration spent in each discrete state. The maximum allowed duration D for any state was defined as 300, but there is no significant probability mass for durations longer than 100.

Figure 5.12: Example decode with the 7-mode HSMM. (a) The original left-to-right state sequence defined by experimental cues. (b) The forward filter. (c) The smoother, which utilizes all of the observed data. (d) The fixed lag smoother with a lag of 0.1 seconds; the decode has been shifted by 0.1 seconds to align with the smoothing results and arm movement. (e) The recorded arm movement during the trial. This trial was chosen to demonstrate the improved decoding ability of the smoother over the fixed lag smoother. In the majority of trials the smoother and fixed lag smoother produce nearly identical results. In all trials the forward filter produced many low-duration false positives, typically indistinguishable from the decoding of the actual “go” signal.

The fixed lag smoother (with a 0.1 second lag) used in conjunction with the identified 7-state HSMM correctly decoded 87.5% of trials. The fixed lag smoother decodes the correct “go” signal in 96.53% of trials, but the total performance of the decode algorithm is degraded by the presence of false positives. The smoother, using all data from the trial, correctly decodes 91.67% of trials. The forward filter proved to be practically useless for decoding: 0% of trials were correctly decoded, due to the excessive number of false positives occurring in the decode process. Note that the forward filter did decode the actual “go” signal in 100% of the trials, but these “correct” decodes are practically indistinguishable from the false positives. The forward filter is therefore a poor choice for decoding with an HSMM, as it does not effectively take into account the duration spent in each mode. In the speech processing community the non-causal Viterbi algorithm is typically used in conjunction with HSMM (or HMM) for decoding [10]. We found that by accepting a small lag, the fixed lag smoother could recover most of the performance of the smoother using all of the data. A lag of 0.1 seconds is negligible for a neurological supervisory decoder.
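As a rough consistency check on the percentages reported in this section, the sketch below converts them to trial counts. It assumes that the decoders were evaluated on 144 trials, a figure inferred only from the "trial 55/144" label in the caption of Figure 5.13; the function name and dictionary keys are hypothetical.

```python
N_TRIALS = 144  # assumption: inferred from the "trial 55/144" label of Figure 5.13

def nearest_count(percent, n_trials=N_TRIALS):
    """Integer number of trials closest to `percent` percent of `n_trials`."""
    return round(percent * n_trials / 100)

# Percentages as reported in the text.
reported = {
    "fixed lag smoother, trial correctly decoded": 87.5,
    "fixed lag smoother, go signal decoded": 96.53,
    "full smoother, trial correctly decoded": 91.67,
}

counts = {label: nearest_count(pct) for label, pct in reported.items()}
for label, k in counts.items():
    # Print the count and the percentage it implies, to compare with the text.
    print(f"{label}: {k}/{N_TRIALS} = {100 * k / N_TRIALS:.2f}%")

# Trials where the go signal was decoded but the trial was still counted as wrong.
spoiled = (counts["fixed lag smoother, go signal decoded"]
           - counts["fixed lag smoother, trial correctly decoded"])
```

Under this assumption the three percentages correspond to whole numbers of trials (126, 139, and 132 of 144), and 13 trials with a correctly decoded “go” signal were not counted as correct, which the text attributes to false positives.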

5.4.2.1 Prior Data for HSMM Supervisory Decoder

This section gives specific details about the prior distributions used in identifying HSMM supervisory decoders from the Musallam data set. The prior distributions on the GLM dynamics are the same as those used in (5.3.1). The priors on the HSMM transition matrix were biased to give an overall left-to-right model, but transitions between substates were not constrained and used an ASD prior to determine connectivity:

a^0_{ij} = \begin{cases} 1 & \text{if } j = i + 1 \\ 0.1 & \text{if } S_i \text{ and } S_j \text{ are sub-states} \\ 0.001 & \text{else} \end{cases}

where the HSMM is additionally constrained so that self-transitions are not allowed. The model identification process was found to be invariant to a range of hyperparameters used in the prior definition.

Figure 5.13: 7-state HSMM decode of double reach (trial 55/144): In this trial the monkey first reaches for a target and then after a delay period conducts a second corrective reach to a second target. (a) Original left-to-right state sequence defined by experimental cues. (b) Decoded state sequence using forward filter. (c) The recorded arm movement during the trial. The grey section highlights the period where arm movement occurs.

The prior distribution on the duration spent in each mode is generated by a Gamma distribution:

p^0_i(d) = c \, \frac{d^{\alpha-1} \beta^{\alpha} \exp(-\beta d)}{\Gamma(\alpha)}, \qquad (5.14)

where α = 10, β = 1/5. The value of c was set to 20, and represents the effective number of data points that the prior represents (see Remark 4.4 in Section 4.3). It was found that careful consideration of the prior distribution of the duration was required. Because of the nature of HSMM, there can be a very small number of transitions between discrete states, and if the constant c is chosen to be too high (>100), then the prior information “swamps” the posterior distribution. If the value of c is chosen to be very small (e.g., 1), then the prior resembles an ASD prior and few posterior durations have any significant probability mass. This difficulty may be avoidable by choosing to use parameterized (e.g., Poisson or Gamma) distributions for the posterior duration model.
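The duration prior of (5.14) can be evaluated numerically with a short sketch. It assumes that β is a rate parameter (so the exponent is −βd), that durations are integers from 1 up to the maximum duration D = 300 quoted in the caption of Figure 5.11, and that the scaled values act as prior pseudo-counts for each duration. The function and variable names are hypothetical.

```python
import numpy as np
from math import lgamma

def gamma_duration_prior(alpha=10.0, beta=0.2, c=20.0, d_max=300):
    """Evaluate c * Gamma(d; alpha, rate=beta) at integer durations 1..d_max."""
    d = np.arange(1, d_max + 1, dtype=float)
    # Work in log space: (alpha-1) log d + alpha log beta - beta d - log Gamma(alpha)
    log_pdf = ((alpha - 1.0) * np.log(d) + alpha * np.log(beta)
               - beta * d - lgamma(alpha))
    return d, c * np.exp(log_pdf)

d, prior = gamma_duration_prior()

total_weight = prior.sum()            # close to c, the effective number of data points
shape = prior / total_weight          # normalised shape of the prior
mean_duration = (d * shape).sum()     # close to alpha / beta = 50
tail_mass = shape[d > 100].sum()      # prior mass on durations longer than 100
```

Under these assumptions the prior is centred on durations of about 50 with less than 1% of its mass beyond 100, and its total weight is approximately c = 20. Increasing c scales every pseudo-count, which is consistent with the remark above that a large c lets the prior dominate when only a few state transitions are observed.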

Chapter 6

Conclusions

6.1 Summary of Thesis Contributions

The primary contribution of this thesis is the definition of a series of hybrid system models, and the development of Bayesian inference algorithms for identification of these models from observed data. By associating continuous dynamics with both stationary and nonstationary Markov chains, a series of hybrid models capable of modeling a range of biological and engineering systems were developed. Motivating the development process is the application of supervisory decoding for neural prosthetics (Chapter 5). Here, neural activity is modeled as a hybrid system which represents both the continuous dynamics of observed extracellular neural activity, and the discrete transitions between different cognitive or planning states.

The developed models and identification methods of Chapter 3 provide novel contributions in both the fields of hybrid systems and machine learning. A series of hybrid system models based on the hidden Markov model (HMM), the hidden semi-Markov model (HSMM), and the variable transition hidden Markov model (VTHMM), were created by the addition of generalized linear model (GLM) dynamics. The resulting hybrid systems, including the generalized linear hidden Markov model (GLHMM), and its HSMM and VTHMM counterparts, were used to model both biological and mechanical systems. A key contribution in the thesis is the extension of the variational Bayesian (VB) framework to identification of the developed hybrid models; even without the addition of GLM dynamics, applying VB to HSMM and VTHMM models is a significant contribution to the machine learning literature. These models are typically used in speech processing technology, and the developed VB approach has several inherent advantages over the standard EM implementation. An additional Bayesian inference algorithm, the Gibbs sampler, is also adapted for use in GLHMM models. The GLHMM framework is applied to the identification of piecewise autoregressive exogenous (PWARX) models, a class of models that define discrete transitions based on the autoregressive state. Apart from providing a novel method of PWARX identification, this analysis motivates the development of a new class of hybrid system: the hidden regressor-dependent Markov model (HRDMM).

When creating hybrid models of many systems, the prior intuition about the system’s structure may be incomplete, and Bayesian model class selection can be used to infer the number of discrete modes, transition structures, and orders of continuous dynamics of the model. Chapter 4 shows the importance of the model evidence for Bayesian model class selection in information theoretic terms, and applies two methods to evaluate the evidence. First, the developed VB approach in Chapter 3 inherently provides an estimate of the model evidence. Second, the Stationarity method for estimating the model evidence from posterior samples is refined for use in hybrid systems and systems with latent variables. This Stationarity method allows the Gibbs sampler to be effectively used for model class selection in hybrid systems. In addition to Bayesian model class selection, automatic structure determination (ASD) priors are defined which represent a body of work that allows subsequently applied inference algorithms to “prune” out unneeded model structure. ASD priors are developed for the HSMM model, and are then demonstrated by automatically identifying the number of movement primitives in a bee dance data set.

The developed Gibbs sampler and VB inference algorithms (Chapter 3) and associated model selection tools (Chapter 4) are used to build a supervisory decoder for neural prosthetics in Chapter 5. The design of a supervisory decoder, whose job is to classify, in real time, the discrete cognitive or behavioral state of the brain region from which the neural signals are recorded, consists of two parts: (1) the identification of the hybrid model which represents the neural activity in each discrete state, as well as the transitions between states; (2) the design of an estimator which uses the hybrid model to classify activity into the discrete cognitive or behavioral states. Three important contributions are made in the new framework over existing supervisory decoders: new models are developed which are capable of both explicitly modeling the duration spent in each cognitive state and incorporating disjoint types of recorded neural signals; the developed identification process is automatic, in that it does not require recorded neural data to be pre-segmented; and, if incomplete information about either the number of cognitive modes or the underlying neural process exists, then model class selection methods can be deployed to automatically infer the optimal model structure. All of these contributions were shown to improve the performance of the supervisory decoder on recorded neural data sets.
