Chapter 2 Eulerian Data Assimilation
2.13 Conclusions and Future Directions
By careful analysis of the forward problem, we have been able to formulate a well-posed Bayesian inverse problem regarding Eulerian data of the Stokes' flow dynamical system. We have shown that the likelihood function is continuous with respect to a space that has full measure with respect to a specified choice of Gaussian prior measure. Using this, we have shown how to draw samples from well-defined posterior distributions on function space using the RWMH MCMC sampler. We have then implemented this algorithm in C, and gained insight into what kind of information is available in Eulerian data. We have also shown how the standard RWMH method loses efficiency as the grid is refined, and that the method framed on function space does not. Finally, we have verified the algorithm by checking against explicit posterior distributions that are calculable in certain situations, and have shown that we take care to avoid committing inverse crimes.
This chapter demonstrates a method of tackling inverse problems that we will be using in several different scenarios throughout the thesis. The core of the philosophy underlying these methods is the belief that formulating numerical methods on infinite-dimensional spaces, and only discretising once we choose to implement such a method, gives us better algorithms that are robust under different discretisations and refinements.
This subject could be extended in many directions, some of which will be addressed in later chapters. We could look to implement this problem for the full Navier-Stokes equations, which would then capture the nonlinear behaviour that is hugely important in real-life applications of data assimilation in fluid mechanics. The necessary theoretical results concerning the framing of data assimilation for the Navier-Stokes equations have already been addressed in [24]. Implementing this problem for real-life applications would also involve formulating the problem on approximations of real domains, as opposed to the torus domain that we have picked for ease in this simple example.
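The loss of efficiency under refinement can be illustrated with a minimal sketch, written here in Python rather than the C used for the thesis implementation. It assumes a toy problem that is not taken from this chapter: a Gaussian prior whose covariance has eigenvalues k^-2, and a single noisy observation of the first coefficient. The function-space proposal is taken to be of preconditioned Crank-Nicolson (pCN) form, which is one common way of framing RWMH on function space; this section does not name the variant used, so that choice, the function name acceptance_rate and all parameter values are assumptions.

```python
import math
import random


def acceptance_rate(n_dim, proposal, beta=0.2, n_steps=1000, seed=0):
    """Run a short chain and return the fraction of accepted proposals."""
    rng = random.Random(seed)
    # Toy prior N(0, C) with C = diag(k^-2); prior_sd holds the square roots.
    prior_sd = [1.0 / k for k in range(1, n_dim + 1)]
    y, sigma = 1.0, 0.5  # hypothetical datum and noise level

    def phi(u):
        # Negative log-likelihood: one noisy observation of the first coefficient.
        return (y - u[0]) ** 2 / (2.0 * sigma ** 2)

    def prior_energy(u):
        # 0.5 * |C^{-1/2} u|^2, which is of order n_dim for a prior draw.
        return 0.5 * sum((ui / s) ** 2 for ui, s in zip(u, prior_sd))

    u = [s * rng.gauss(0.0, 1.0) for s in prior_sd]  # start from a prior draw
    accepted = 0
    for _ in range(n_steps):
        xi = [s * rng.gauss(0.0, 1.0) for s in prior_sd]  # xi ~ N(0, C)
        if proposal == "rwmh":
            # Standard random walk: the prior enters the acceptance ratio.
            v = [ui + beta * x for ui, x in zip(u, xi)]
            log_alpha = phi(u) + prior_energy(u) - phi(v) - prior_energy(v)
        elif proposal == "pcn":
            # pCN: the proposal preserves the prior, so only phi appears.
            shrink = math.sqrt(1.0 - beta ** 2)
            v = [shrink * ui + beta * x for ui, x in zip(u, xi)]
            log_alpha = phi(u) - phi(v)
        else:
            raise ValueError("proposal must be 'rwmh' or 'pcn'")
        if math.log(1.0 - rng.random()) < log_alpha:
            u = v
            accepted += 1
    return accepted / n_steps


if __name__ == "__main__":
    for n_dim in (25, 400):
        print(n_dim, acceptance_rate(n_dim, "rwmh"), acceptance_rate(n_dim, "pcn"))
```

With the step size beta held fixed, the acceptance rate of the standard proposal should fall sharply as n_dim grows, whereas the pCN rate should stay roughly constant, since its acceptance ratio does not involve the prior term.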
The sampling method can also be improved. If an adjoint to the forward model were to be implemented, we would be able to calculate the gradient of the observation operator. This would allow us to include gradient information in the proposal distribution, as in, for example, the Metropolis-adjusted Langevin algorithm (MALA) presented in section 1.12.2. Moreover, we could also replace the burn-in with a deterministic method to find an approximation to the state of highest probability density, via an equivalent Tikhonov regularisation to the smoothing that the prior exerts (see section 1.11). Many of the deterministic methods used for these variational problems also use the gradient to find the solution, for example the gradient descent or conjugate gradient methods.
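As a minimal illustration of the variational alternative, consider a small linear-Gaussian toy problem rather than the Stokes' flow model. For a prior N(0, C) with diagonal C and observational noise N(0, sigma^2 I), the state of highest posterior density minimises the Tikhonov-regularised functional J(u) = |y - Gu|^2 / (2 sigma^2) + |C^{-1/2} u|^2 / 2, and plain gradient descent is enough to find it. The matrix G, the data y, the prior variances and the function name map_estimate are all hypothetical.

```python
def map_estimate(G, y, prior_var, sigma, n_iter=1000):
    """Gradient descent on J(u) = |y - Gu|^2 / (2 sigma^2) + 0.5 * sum(u_j^2 / prior_var_j)."""
    n_obs, n_state = len(G), len(G[0])

    def gradient(u):
        # Residual r = y - G u, then grad J = -G^T r / sigma^2 + C^{-1} u.
        r = [y[i] - sum(G[i][j] * u[j] for j in range(n_state)) for i in range(n_obs)]
        return [
            -sum(G[i][j] * r[i] for i in range(n_obs)) / sigma ** 2 + u[j] / prior_var[j]
            for j in range(n_state)
        ]

    # Upper bound on the largest Hessian eigenvalue gives a safe fixed step size.
    lipschitz = sum(g * g for row in G for g in row) / sigma ** 2 + max(
        1.0 / v for v in prior_var
    )
    step = 1.0 / lipschitz

    u = [0.0] * n_state
    for _ in range(n_iter):
        g = gradient(u)
        u = [uj - step * gj for uj, gj in zip(u, g)]
    return u, gradient(u)


# Hypothetical 2 x 3 observation operator, data and prior variances k^-2.
G_toy = [[1.0, 0.5, 0.25], [0.0, 1.0, 0.5]]
y_toy = [1.0, -0.5]
u_map, grad_at_map = map_estimate(G_toy, y_toy, [1.0, 0.25, 1.0 / 9.0], sigma=0.5)
```

A chain started at u_map rather than at an arbitrary state would begin in a region of high posterior density. For a nonlinear forward model the gradient would have to come from an adjoint, as described above.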
Another alternative, which could reduce the cost of the problem, would be to replace the forward model with a gPC approximation, as discussed in section 1.1. This would in all probability work quite well for the size of problem we have considered in the majority of the numerics in this chapter, where the state space has 100 degrees of freedom. However, if we were to increase this dimension for practical applications, the costs would soon become infeasible since, unlike the MCMC method we have presented here, gPC approximations suffer from the curse of dimensionality.
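The growth in cost can be made concrete by counting basis functions. The sketch below assumes two standard truncations of a polynomial basis in d variables up to degree p, neither of which is specified in this chapter: a total-degree truncation, with C(d + p, p) terms, and a full tensor-product truncation, with (p + 1)^d terms. The function names are hypothetical.

```python
import math


def total_degree_terms(d, p):
    """Number of multivariate polynomials in d variables with total degree <= p."""
    return math.comb(d + p, p)


def tensor_product_terms(d, p):
    """Number of basis functions when each variable separately has degree <= p."""
    return (p + 1) ** d


if __name__ == "__main__":
    for d in (10, 100, 1000):
        print(d, total_degree_terms(d, 3), tensor_product_terms(d, 3))
```

For d = 100 and p = 3 the total-degree count is 176851, and it grows like d^p / p! for fixed p. The tensor-product count grows exponentially in d.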
In the next chapter, we will consider a very similar data assimilation scenario, again based on observations of a Stokes' flow system. This time, however, the observations will be Lagrangian in nature.
Chapter 3 Lagrangian Data Assimilation
3.1 Motivation
We now consider a similar problem with a different data type informing us about the state of the velocity field. Hundreds of GPS floaters (which float on the surface of the ocean) and drifters (which submerge to a given depth at which they are transported by the flow of the water) are currently distributed throughout the planet's oceans in an attempt to better understand these complex dynamical systems. Periodically transmitting their position, along with other data such as the salinity and temperature of the water, these Lagrangian tracers create huge banks of data.
Figure 3.1 shows an example of such a data set, in the form of a spaghetti diagram. We wish to infer, from this data, the entire state of the vector field in question, just as we did in the Eulerian case. The main difference between these two forms of data is that in the Eulerian case the observation operator is linear, as we are making direct observations of a linear system. In the case of Lagrangian dynamics, the observation operator is highly nonlinear. Moreover, the dynamics of the tracers themselves can display chaotic behaviour [7]. Herein we will explore the differences between these two observation operators, and then, using the same approach as in the previous chapter, we will describe Monte Carlo methods with which we can sample from well-defined posterior distributions which give us information about the flow of the dynamical system.
Figure 3.1: Spaghetti diagram of 20-day drifter trajectory segments. Colours give the mean drift direction (legend in upper-right corner). Taken from [53].
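The contrast between the two observation operators can be seen in a deliberately simple sketch. It assumes a steady one-dimensional periodic velocity field v(x) = a_sin sin(x) + a_cos cos(x), which is not the Stokes' flow model of the thesis. The Eulerian observation is the velocity at a fixed point; the Lagrangian observation is the displacement of a tracer advected by the field, computed here with a classical fourth-order Runge-Kutta scheme. All names and parameter values are hypothetical.

```python
import math


def velocity(x, coeffs):
    """Velocity field that depends linearly on the coefficients (a_sin, a_cos)."""
    a_sin, a_cos = coeffs
    return a_sin * math.sin(x) + a_cos * math.cos(x)


def eulerian_obs(coeffs, x_obs=1.0):
    """Observe the velocity at a fixed location: linear in coeffs."""
    return velocity(x_obs, coeffs)


def lagrangian_obs(coeffs, x0=1.0, t_end=1.0, n_steps=1000):
    """Observe the displacement of a tracer solving dx/dt = v(x), via RK4."""
    h = t_end / n_steps
    x = x0
    for _ in range(n_steps):
        k1 = velocity(x, coeffs)
        k2 = velocity(x + 0.5 * h * k1, coeffs)
        k3 = velocity(x + 0.5 * h * k2, coeffs)
        k4 = velocity(x + h * k3, coeffs)
        x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return x - x0


def superposition_gap(obs, a, b):
    """obs(a + b) - obs(a) - obs(b); zero (up to rounding) for a linear operator."""
    a_plus_b = tuple(ai + bi for ai, bi in zip(a, b))
    return obs(a_plus_b) - obs(a) - obs(b)


if __name__ == "__main__":
    a, b = (1.0, 0.0), (0.0, 1.0)
    print("Eulerian gap:  ", superposition_gap(eulerian_obs, a, b))
    print("Lagrangian gap:", superposition_gap(lagrangian_obs, a, b))
```

The Eulerian gap should vanish up to rounding error, while the Lagrangian gap should not, even though the velocity field itself depends linearly on the coefficients. The nonlinearity enters through the tracer's own trajectory.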