Inverse Crimes - Eulerian Data Assimilation

Chapter 2 Eulerian Data Assimilation

2.11 Inverse Crimes

When testing algorithms for inverse problems, we must be certain that we test them in a fair and objective way. Namely, we must not assume that the model in our algorithm is a perfect match for the dynamics of the system that we are observing. It is potentially problematic to study only experiments where the data and the assimilation algorithm use the same model and parameters. This is sometimes referred to as an inverse crime [45]. Here we further test the algorithms by using data created with different parameters, resolutions, observation noise models, and time step sizes, to ensure that we are not committing such a crime.
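The effect of an inverse crime can be illustrated on a toy problem. The sketch below is not the fluid model used in this chapter: it assumes a scalar decay equation du/dt = -a*u solved with explicit Euler, and the names `forward` and `best_fit`, the time steps and the noise level are all hypothetical choices made for illustration. Data are created once with the same time step as the fitting model (the crime) and once with a much finer time step.

```python
import numpy as np

def forward(a, dt, t_end=1.0, u0=1.0):
    """Explicit Euler approximation of u(t_end) for du/dt = -a*u, u(0) = u0."""
    n_steps = int(round(t_end / dt))
    u = u0
    for _ in range(n_steps):
        u += dt * (-a * u)
    return u

def best_fit(y, dt_model, a_grid):
    """Grid search for the a that minimises the data misfit (y - G(a))^2."""
    misfit = np.array([(y - forward(a, dt_model)) ** 2 for a in a_grid])
    return a_grid[np.argmin(misfit)]

rng = np.random.default_rng(0)
a_true, sigma, dt_model = 2.0, 1e-3, 0.1      # assumed toy values
a_grid = np.linspace(1.0, 3.0, 2001)
noise = sigma * rng.standard_normal()

# Inverse crime: data and fitting model share the same coarse time step.
y_crime = forward(a_true, dt=dt_model) + noise
# Fairer test: data come from a much finer time step than the model uses.
y_fair = forward(a_true, dt=1e-4) + noise

a_crime = best_fit(y_crime, dt_model, a_grid)
a_fair = best_fit(y_fair, dt_model, a_grid)
print(f"same model for data and fit: a = {a_crime:.3f}")
print(f"finer model for data:        a = {a_fair:.3f}")
```

In the first case the discretisation error cancels and the fit looks deceptively accurate. In the second the coarse model must absorb its own discretisation error into the estimate of a, which is the kind of model error a fair test should expose.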

It is important to consider the effect of differing resolutions in our data creation algorithm and in the sampling algorithm. In reality, of course, the true dynamical system is infinite dimensional, so we consider the case where our data is created using a much higher resolution than we use for the sampling method. In all of the following figures (apart from those that specify a lower resolution) the data was created using a grid for the vector field of 1000×1000 points (or 5×10^5 complex Fourier modes). The data assimilation algorithm was run with varying numbers of grid points for the velocity field approximation, running through 16, 100, 196 and finally 400 points. Figure 2.27 shows that as the number of grid points used in the velocity field approximation for the algorithm is increased, the marginal distribution for this particular Fourier mode appears to converge to a limit. The noise levels and quantity of data are such that the posterior is not sharply peaked at the true value of this Fourier mode in the initial condition that created the data. For more details on the convergence of the posterior as the mesh used in the forward problem is refined, see [25].

[Figure 2.27 plot: probability density against Re(u_{0,1}(0)), with curves for 16, 100, 196 and 400 Fourier modes and the actual value marked.]

Figure 2.27: Re(u_{0,1}(t)): Increasing resolution in model, high resolution data, Eulerian data.
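The apparent convergence of the marginal under refinement can be mimicked in a one-parameter toy problem. The following is a minimal sketch under stated assumptions, not the experiment behind Figure 2.27: the forward model is explicit Euler for du/dt = -a*u, the "infinite resolution" datum is the exact solution exp(-a_true), and the Gaussian prior, noise level and function names are hypothetical.

```python
import numpy as np

def forward(a, n_steps, t_end=1.0, u0=1.0):
    """Explicit Euler value of u(t_end) for du/dt = -a*u, vectorised over a."""
    dt = t_end / n_steps
    return u0 * (1.0 - a * dt) ** n_steps

def grid_posterior(y, n_steps, sigma, a_grid):
    """Normalised posterior density on a_grid: Gaussian likelihood times N(1.5, 1) prior."""
    log_like = -0.5 * ((y - forward(a_grid, n_steps)) / sigma) ** 2
    log_prior = -0.5 * (a_grid - 1.5) ** 2
    w = np.exp(log_like + log_prior - np.max(log_like + log_prior))
    da = a_grid[1] - a_grid[0]
    return w / (w.sum() * da)

a_true, sigma = 2.0, 0.05                 # assumed toy values
y = np.exp(-a_true)                       # datum from the exact solution
a_grid = np.linspace(0.0, 4.0, 4001)
da = a_grid[1] - a_grid[0]

resolutions = [4, 16, 64, 256]
posteriors = [grid_posterior(y, n, sigma, a_grid) for n in resolutions]

# L1 distance between posteriors at successive resolutions.
dists = [np.sum(np.abs(p - q)) * da for p, q in zip(posteriors, posteriors[1:])]
for (n, m), d in zip(zip(resolutions, resolutions[1:]), dists):
    print(f"{n:4d} -> {m:4d} steps: L1 distance = {d:.4f}")
```

Shrinking distances between successive posteriors are consistent with convergence to a limit, although a numerical check of this kind is only suggestive.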

An in-depth discussion of what can happen when the forcing functions for the dynamical system differ between the data environment and the algorithm, for both Eulerian and Lagrangian data, can be found in Chapter 4, so we do not pursue this here.

We now consider several different cases of a mismatch between the actual noise in our observations and the noise model that we use in the likelihood function in the algorithm. First we consider the case of low variance noise in our observations, with Σ = 0.001I, where we use a larger variance of varying size in the likelihood function.

Figure 2.28 shows that as we increase the variance in Σ, the influence of the observations decreases, until we are sampling from a distribution that is very close to the marginal prior distribution for this Fourier mode. This is to be expected: if we assume more uncertainty in our data, then we can draw less information from it and must rely more on our prior beliefs.

[Figure 2.28 plot: probability density against Re(u_{0,1}(0)), with curves labelled 0.01, 0.1, 1 and 10 and the actual value marked.]

Figure 2.28: Re(u_{0,1}(t)): Increasing variance in the noise model of the algorithm, low actual noise, Eulerian case
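The drift of the posterior towards the prior as the assumed noise variance grows can be seen exactly in the scalar conjugate Gaussian case. The sketch below assumes a prior N(prior_mean, prior_var) and a single direct observation y = u + noise; the function name and the numerical values are illustrative and are not taken from the experiments above.

```python
def gaussian_posterior(y, prior_mean, prior_var, noise_var):
    """Posterior mean and variance for a Gaussian prior and Gaussian likelihood."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)
    post_mean = post_var * (prior_mean / prior_var + y / noise_var)
    return post_mean, post_var

prior_mean, prior_var = 0.0, 1.0   # assumed prior
y = 0.8                            # assumed observation

results = {}
for noise_var in (0.01, 0.1, 1.0, 10.0, 1e6):
    results[noise_var] = gaussian_posterior(y, prior_mean, prior_var, noise_var)
    m, v = results[noise_var]
    print(f"assumed noise variance {noise_var:>9}: mean = {m:.4f}, variance = {v:.4f}")
```

For small assumed variance the posterior sits near the observation; as the assumed variance increases, the precision contributed by the data, 1/noise_var, vanishes and the posterior mean and variance return to those of the prior.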

We may also consider the case where the observational noise in our data has much larger variance than the variance we use in our likelihood. Figure 2.29 shows what happens in this case. This scenario can cause serious problems, as we are trying to infer accurately from poor data with a low signal-to-noise ratio (SNR) while treating that data as if it were accurate. Essentially, this shows that the algorithm still runs in this case, but the results may be poor.
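One way to quantify "poor results" in this setting is to check how often a nominal 95% credible interval contains the truth. The sketch below again uses a scalar conjugate Gaussian toy problem rather than the model of this chapter; the true noise variance, the number of observations and the `coverage` helper are assumptions made for illustration.

```python
import numpy as np

def coverage(assumed_var, ys, u_true, prior_mean=0.0, prior_var=1.0):
    """Fraction of trials whose 95% credible interval contains u_true."""
    n_obs = ys.shape[1]
    post_var = 1.0 / (1.0 / prior_var + n_obs / assumed_var)
    post_mean = post_var * (prior_mean / prior_var + ys.sum(axis=1) / assumed_var)
    half_width = 1.96 * np.sqrt(post_var)
    return np.mean(np.abs(post_mean - u_true) <= half_width)

rng = np.random.default_rng(1)
u_true, true_var = 0.7, 1e-2        # assumed truth and actual noise variance
n_trials, n_obs = 2000, 20

# Each row is one independent experiment with n_obs noisy observations.
ys = u_true + np.sqrt(true_var) * rng.standard_normal((n_trials, n_obs))

cov_matched = coverage(true_var, ys, u_true)     # likelihood uses the true variance
cov_mismatch = coverage(1e-4, ys, u_true)        # likelihood is 100 times too confident
print(f"coverage, matched noise model:       {cov_matched:.2f}")
print(f"coverage, underestimated noise var.: {cov_mismatch:.2f}")
```

With the noise variance underestimated, the posterior is far narrower than the scatter of its own mean, so the interval usually misses the true value even though the computation itself succeeds.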

We may also consider the case in which we use a very low resolution model for creating our data. This is not realistic in terms of data that is collected in the field, but gives us another opportunity to show how the algorithm copes with a mismatch between model and data.

[Figure 2.29 plot: probability density against Re(u_{0,1}(0)), with curves labelled 0.0001, 0.001, 0.01 and 0.1 and the actual value marked.]

Figure 2.29: Re(u_{0,1}(t)): Decreasing variance in the noise model of the algorithm, high variance actual noise, Eulerian case

Figure 2.30 shows the marginal distributions of one Fourier mode with Eulerian data created with a 4×4 grid, with a varying number of grid points used in the assimilation model. The resolution in the algorithm which gets closest to the correct answer is actually the lowest resolution, with 16 grid points on which the velocity field is approximated. This is not surprising given that this is the resolution at which the data was created.

In conclusion, we have demonstrated numerically that the posterior converges in distribution as we increase the resolution of the approximation of the velocity field in our algorithm. We have also shown how various mismatches between the data creation and the model used in the algorithm can affect the results.

In document Applications of MCMC methods on function spaces (Page 104-107)