Chapter 7 Prediction for Models with Cubic Drift and Linear Diffusion

7.3 Using a Stability Matrix as Prior

The stability matrix was derived, and efficient ways of sampling it developed, so that it could serve as prior information for the parameters of the cubic models studied in Chapter 5. Here we demonstrate its effect on the posterior distribution of parameters estimated for an example problem of the form Eq. (5.1), with randomly generated parameters. As in Figure 5.16, we compute the full posterior for the parameters but also estimate the density for just those parameters that give stable solutions to the resultant SDE. The latter was calculated numerically by simulating the SDE for each parameter vector and recording whether the solution remained bounded. Using data from an arbitrary two dimensional model of the form Eq. (5.1) with N = 100 and ∆ = 0.1, we estimated all 20 of the parameters entering the drift term. The parameters in the diffusion function were fixed. The Component-wise algorithm was used to sample the 8 cubic parameters using a stability matrix of the form Eq. (7.7), while the others were sampled using the standard Gibbs sampler of Section 5.4.1. We estimated the posterior distributions using 3 × 10^6 MCMC samples taken from 3 chains after checking each had converged to the same distribution.

Figure 7.7 shows the full posteriors, the posteriors restricted to stable parameters and the posteriors which use a stability matrix as prior. In this case the subset of stable parameters is very similar to the full posterior: the stable parameters account for 80% of the whole distribution. The posterior which includes the stability matrix prior is close to the full posterior but has some different features, particularly for those parameters that enter the diagonal components of the stability matrix. The stability matrix restricts them to be negative, which is evidently much too strong a constraint in this case, and work would be needed to remove it. In general, the prior information in its current format is too restrictive, but there are several possibilities for relaxing the constraints while ensuring stable SDEs are inferred.
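The numerical stability check described above (simulate the SDE for each parameter vector and record whether the solution stays bounded) can be sketched as follows. This is a minimal illustration and not the code used for the results reported here. The monomial ordering in `monomials`, the placeholder diffusion `sigma * (1 + |x|)`, the Euler-Maruyama step size, the run length and the blow-up threshold `bound` are all assumptions, since Eq. (5.1) is not restated in this section.

```python
import numpy as np

def monomials(x):
    """Ten monomials of degree <= 3 in (x1, x2). The ordering is an assumption."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1**2, x1 * x2, x2**2,
                     x1**3, x1**2 * x2, x1 * x2**2, x2**3])

def stays_bounded(A, sigma=0.5, dt=1e-3, n_steps=20_000, bound=1e3, seed=0):
    """Euler-Maruyama run of dX = A phi(X) dt + g(X) dW for a 2 x 10 matrix A.

    Returns False as soon as the path leaves the ball of radius `bound`,
    and True if it stays inside for the whole run.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(2)
    for _ in range(n_steps):
        drift = A @ monomials(x)
        # Placeholder linear diffusion (hypothetical, not the form in Eq. (5.1)).
        diffusion = sigma * (1.0 + np.abs(x))
        x = x + drift * dt + diffusion * np.sqrt(dt) * rng.normal(size=2)
        if not np.all(np.isfinite(x)) or np.linalg.norm(x) > bound:
            return False
    return True

def stable_fraction(samples, **kwargs):
    """Fraction of parameter matrices in `samples` whose simulated path stays bounded."""
    return np.mean([stays_bounded(A, **kwargs) for A in samples])
```

Applied to a collection of posterior draws, `stable_fraction` gives an estimate of the kind of proportion quoted above. A finite simulation can only suggest stability; it does not prove it, so the run length and threshold should be chosen conservatively.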

The form of matrix given in Eq. (7.6) is not the only possible way of deriving a matrix that satisfies Eq. (7.5). One could think of a method of entering components into M such that they are all off diagonal. It would also be useful to make the matrix larger so that no two parameters enter the same component. Of course, with a larger matrix there will be some redundant components that are equal to 0. This may cause a problem for the sampling strategy, particularly for those algorithms based upon the Wishart distribution. The probability of proposing a matrix with one component set at a definite value is 0, so these algorithms might not be applicable. However, it would still be possible to use the Component-wise algorithm without much alteration.
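The non-uniqueness of the matrix can be illustrated for a two dimensional model. The sketch below is a hypothetical construction, not Eq. (7.6): it assumes the cubic part of the drift has coefficients `a` and `b` on the monomials (x1^3, x1^2 x2, x1 x2^2, x2^3), and writes the energy rate x . F(x) as a quadratic form m^T M m in m = (x1^2, x1 x2, x2^2). The coefficient of x1^2 x2^2 can be split between the diagonal entry M[1, 1] and the off-diagonal entry M[0, 2]; the free parameter `t` controls that split.

```python
import numpy as np

def energy_matrix(a, b, t=0.0):
    """Symmetric M with x . F(x) = m^T M m, where m = (x1^2, x1*x2, x2^2).

    a, b: cubic coefficients of the two drift components (assumed ordering
    x1^3, x1^2*x2, x1*x2^2, x2^3). t: free split of the x1^2*x2^2 coefficient.
    """
    c12 = (a[1] + b[0]) / 2.0
    c23 = (a[3] + b[2]) / 2.0
    c22 = a[2] + b[1]
    return np.array([[a[0], c12, t],
                     [c12, c22 - 2.0 * t, c23],
                     [t, c23, b[3]]])

def is_negative_definite(M):
    """True if every eigenvalue of the symmetric matrix M is strictly negative."""
    return bool(np.all(np.linalg.eigvalsh(M) < 0))

def energy_rate(a, b, x):
    """x . F(x) for the cubic part of the drift, evaluated directly."""
    x1, x2 = x
    cubic = np.array([x1**3, x1**2 * x2, x1 * x2**2, x2**3])
    return x1 * (np.asarray(a) @ cubic) + x2 * (np.asarray(b) @ cubic)
```

For a = (-1, 0, 0, 0) and b = (0, 0, 0, -1) the energy rate is -x1^4 - x2^4, which is negative away from the origin. With t = 0 the middle diagonal entry of M is zero, so M is not negative definite, whereas t = 0.5 gives a negative definite matrix. Under these assumptions, the same drift can pass or fail the matrix condition depending on how components are entered into M.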

Further study would lead to a greater understanding of the minimally restrictive conditions that can be derived to enforce stochastic stability. This could either be developed using the same Lyapunov function used here, namely the simple squared Euclidean norm, or could involve research into other Lyapunov functions. Still using the same Lyapunov function we could learn how to implement the stability bound in Theorem 1 that includes other parameters besides the cubic terms. In particular this would include the parameters that enter the stochastic terms. This would be a departure from the approach of Majda et al. [2009] and may lead to a more general approach of using prior information to infer non-linear SDEs.

7.4 Summary and Conclusions

In this chapter we have addressed the problem of stochastic stability for SDEs of the form Eq. (5.1) inferred from data. We have proposed a solution, motivated by the work of Majda et al. [2009], which implements an energy constraint on the system. In Section 7.1.2 we derived a means of casting this energy constraint into the requirement that a certain matrix be negative definite. This matrix's components are the parameters entering into the cubic terms of the SDE. Requiring this Stability Matrix to be negative definite places bounds on the domain of these parameters. This is included in the Bayesian framework as prior information.

Figure 7.7: Estimated posterior distributions for parameters from a two dimensional model of the form Eq. (5.1) with N = 100 and ∆ = 0.1. The parameters, which are randomly generated, are written in the matrix notation introduced in Section 5.4.1. Panels (a) to (h) show the densities of A_{1,7}, A_{1,8}, A_{1,9}, A_{1,10}, A_{2,7}, A_{2,8}, A_{2,9} and A_{2,10}. The histograms are the posterior distributions with uninformative prior, in red are the posterior distributions for parameters with stable SDEs and in black are the posterior distributions which include the stability matrix prior information.

This novel use of prior information has consequences for the MCMC algorithm used for inference; the Gibbs sampler of Section 5.4.1 is no longer applicable. In Section 7.2 we considered five different algorithms to sample the Stability Matrix. These included basic rejection and random walk sampling, which were found to be inefficient compared to a component-wise algorithm. This Component-wise algorithm is complicated to implement, as it involves solving a quadratic equation to compute the upper and lower bounds of each parameter and then uses a rejection algorithm to sample truncated Normal distributions. However, it was found to be more efficient than algorithms based upon the Central and Non-Central Wishart distributions. As far as we are aware, these distributions have not previously been used as proposals in a Metropolis-Hastings algorithm and the work here is new. We studied how to select the parameters of the Central Wishart distribution in Section 7.2.3 and found that the optimal efficiency corresponded to an acceptance rate close to 0.234, the rate known to be optimal for a broad class of Metropolis-Hastings algorithms [Roberts et al., 1997]. We derived the Non-Central Wishart algorithm in Section 7.2.4. This is a novel use of this distribution. However, it is not clear how to tune the parameters, and the algorithm is very slow computationally due to the need to calculate matrix Hypergeometric functions for the proposal density. Further work needs to be done to understand how to optimise this algorithm.
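The two ingredients of a component-wise update, bounds obtained from a quadratic equation and rejection sampling of a truncated Normal, can be sketched as below. This is an illustration under simplifying assumptions and not the algorithm of Section 7.2.2: it updates a single off-diagonal entry of a symmetric matrix directly, whereas in the thesis parameters enter the matrix components, and `mean` and `sd` are hypothetical stand-ins for the moments of the Gaussian full conditional. The bounds use the fact that, with P = -M, det P is a concave quadratic in the entry being varied, and P is positive definite exactly between its two roots when the remaining principal submatrices are already positive definite.

```python
import numpy as np

def offdiag_bounds(M, i, j):
    """Interval of t such that setting M[i, j] = M[j, i] = t keeps M negative definite.

    Assumes i != j and that M is negative definite at its current value.
    """
    def det_at(t):
        P = -M.copy()
        P[i, j] = P[j, i] = -t
        return np.linalg.det(P)

    # Recover the quadratic alpha*t^2 + beta*t + gamma from three evaluations.
    gamma, d_plus, d_minus = det_at(0.0), det_at(1.0), det_at(-1.0)
    alpha = 0.5 * (d_plus + d_minus) - gamma
    beta = 0.5 * (d_plus - d_minus)
    disc = beta**2 - 4.0 * alpha * gamma
    if alpha >= 0 or disc <= 0:
        raise ValueError("no admissible interval; check the assumptions on M")
    roots = ((-beta - np.sqrt(disc)) / (2 * alpha),
             (-beta + np.sqrt(disc)) / (2 * alpha))
    return min(roots), max(roots)

def sample_truncated_normal(mean, sd, lo, hi, rng, max_tries=100_000):
    """Plain rejection sampler: draw N(mean, sd^2) until the draw falls in (lo, hi)."""
    for _ in range(max_tries):
        t = rng.normal(mean, sd)
        if lo < t < hi:
            return t
    raise RuntimeError("rejection sampler failed; interval has very low mass")

def update_entry(M, i, j, mean, sd, rng):
    """One component-wise update of the symmetric pair (i, j)."""
    lo, hi = offdiag_bounds(M, i, j)
    M_new = M.copy()
    M_new[i, j] = M_new[j, i] = sample_truncated_normal(mean, sd, lo, hi, rng)
    return M_new
```

Plain rejection is simple but becomes slow when the admissible interval lies far in the tail of the Normal; a dedicated truncated Normal sampler would be a natural replacement in that case.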

Based on the theory of stochastic stability discussed in Section 2.6, we know that negative definiteness of the Stability Matrix is a sufficient condition to ensure the inferred parameters lead to SDEs whose solutions remain bounded. However, as discussed in Section 7.3, in its present form it is likely to be overly restrictive on the space of parameters. There are many ways in which the constraints could be relaxed while ensuring stochastic stability. The matrix could be enlarged until any further increase in dimension adds no further restriction on the parameter space. This limiting matrix, if it exists, would then be minimally restrictive. The Component-wise algorithm, derived in Section 7.2.2, would still be able to sample this matrix. Further work, involving more detailed study of matrix spaces, could be pursued in this direction.

The methods in this chapter could be developed into a very general framework for including stability as prior information for SDE inference. Different Lyapunov functions could be tested to see whether they lead to practical algorithms. For the inference problems in this thesis we find it useful to implement the Component-wise algorithm to sample the Stability Matrix, as we find that the advantage of being guaranteed a stable SDE outweighs the fact that the prior has an unquantifiable influence on the posterior and in some cases may affect the estimates. In the next chapter we apply the methods developed here and in the previous two chapters to fit SDEs of the form Eq. (5.1) to the dynamical systems discussed in Chapter 3.
