4.3 Models and methods

4.4.2 Model validation

So far, the third-order linear model M3 has been chosen as the best model. An empirical validation method was applied to check that M3 did not overfit the data compared with M2 by including one extra degree of freedom in the parameter space. If a model overfits the data, its predictive ability deteriorates. The validation method, known as leave-one-out cross-validation, was applied to compare the second-order model M2 with the third-order model M3 [200].

Figure 4.6: Fitting results for the two nonlinear models NM1 and NM2 and the linear model M3 for the time series shown in Fig. 4.3(c). The measured values are indicated by circles.

For each time series in the cohort, we performed the following procedure for both models M2 and M3:

1) Leave data point i out from the measurement time series, fit the model M (M = M2 or M3) based on the rest of the data points using the VB method, then after the parameters are inferred, obtain an estimate of data point i by substituting the inferred deterministic parameters, and compute the estimation error term (e_i = y_i − ŷ_i, where y_i is the measurement and ŷ_i is the estimate);

2) Repeat step 1 for i = 1, . . . , n;

3) Compute the RMSE from e_1, . . . , e_n for both models, and choose the model with the smallest error.
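The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in this work: the VB fits of M2 and M3 are replaced by two hypothetical stand-in models (a constant-mean model and a least-squares straight line), and the data values, `loo_rmse`, `mean_model` and `line_model` are invented names and numbers for the example.

```python
import math

def loo_rmse(t, y, fit_predict):
    """Leave-one-out RMSE for a model given as fit_predict(t_train, y_train, t_new)."""
    errors = []
    for i in range(len(y)):
        # Step 1: leave point i out, fit on the rest, predict the held-out point.
        t_train = t[:i] + t[i + 1:]
        y_train = y[:i] + y[i + 1:]
        y_hat = fit_predict(t_train, y_train, t[i])
        errors.append(y[i] - y_hat)  # e_i = y_i - y_hat_i
    # Step 3: RMSE over e_1, ..., e_n (step 2 is the loop over i).
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def mean_model(t_train, y_train, t_new):
    """Hypothetical stand-in for the simpler model: predicts the training mean."""
    return sum(y_train) / len(y_train)

def line_model(t_train, y_train, t_new):
    """Hypothetical stand-in for the richer model: least-squares straight line."""
    n = len(t_train)
    t_bar = sum(t_train) / n
    y_bar = sum(y_train) / n
    sxx = sum((a - t_bar) ** 2 for a in t_train)
    sxy = sum((a - t_bar) * (b - y_bar) for a, b in zip(t_train, y_train))
    return y_bar + (sxy / sxx) * (t_new - t_bar)

# Made-up example series (not data from this study).
t = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.1, 1.9, 4.2, 5.8, 8.1, 9.9]

scores = {"mean": loo_rmse(t, y, mean_model), "line": loo_rmse(t, y, line_model)}
best = min(scores, key=scores.get)  # model with the smallest leave-one-out RMSE
print(scores, best)
```

Passing the model as a callable keeps the cross-validation loop independent of how the fit is done, so a VB-based fit could be substituted for the stand-ins without changing `loo_rmse`.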

The differences in the RMSE between M2 and M3 for each time series have been calculated and are presented in the boxplot in Fig. 4.7. In most cases, M3 has smaller errors between the observations and the estimations than M2. The differences in the errors estimated by M2 and M3 were tested with a one-sample t-test at a significance level of 0.05. The test rejected the hypothesis that the mean of the differences between M2 and M3 is zero, indicating that M3 is a better model compared with M2.

It is worth noting that there are cases where M2 is the better model; however, the aim is to identify a model that is capable of describing all of the time series, and M2 is a special case of M3. Therefore, the choice of M3 over M2 was validated.
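The one-sample t-test on the per-series RMSE differences can be illustrated as follows. This is a sketch only: the `differences` values are hypothetical placeholders rather than results from this study, and the two-sided p-value is obtained by numerically integrating the Student t density with Simpson's rule so that the example needs nothing beyond the Python standard library.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def one_sample_t_test(sample, mu0=0.0, steps=20000):
    """Return (t statistic, two-sided p-value) for H0: mean(sample) == mu0."""
    n = len(sample)
    df = n - 1
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    t_stat = (statistics.mean(sample) - mu0) / standard_error
    # Simpson's rule for the integral of the density from 0 to |t| (steps is even).
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central_mass = total * h / 3
    p_value = max(0.0, 1.0 - 2.0 * central_mass)
    return t_stat, p_value

# Hypothetical RMSE differences (M2 minus M3), one value per time series.
differences = [0.8, 1.5, -0.2, 2.1, 0.9, 1.2, 0.4, 1.7]
t_stat, p_value = one_sample_t_test(differences)
reject_zero_mean = p_value < 0.05  # significance level used in the text
print(t_stat, p_value, reject_zero_mean)
```

A positive mean difference with a p-value below 0.05 would correspond to the conclusion drawn above, while individual negative entries correspond to the series for which M2 fits better.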

Figure 4.7: NRMSE value of the errors between the observations and the values estimated by M2 and M3

4.4.3 Structural identifiability and parameter sensitivity

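A structural identifiability analysis, as introduced in Section 2.6, has been performed for M3. The analysis considers the third-order linear system

$$\dddot{x}_t + \theta_3 \ddot{x}_t + \theta_2 \dot{x}_t + \theta_1 x_t + \theta_0 = 0 \tag{4.9}$$

with observations

$$y_t = x_t \tag{4.10}$$

and initial conditions

$$\mathbf{x}_0 = \begin{pmatrix} x_0 \\ \dot{x}_0 \\ \ddot{x}_0 \end{pmatrix} \tag{4.11}$$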

The Laplace transform of (4.9) is as follows:

Rearranging (4.12), the following form can be obtained:

$$X = \frac{(x_0 + \dot{x}_0 + \ddot{x}_0)s^3 + (\theta_3 x_0 + \dot{x}_0)s^2 + (\theta_2 x_0 + \theta_3 \dot{x}_0 + \ddot{x}_0)s - \theta_0}{s^4 + \theta_3 s^3 + \theta_2 s^2 + \theta_1 s} \tag{4.13}$$

where θ3, θ2, θ1, x0 + ẋ0 + ẍ0, θ3x0 + ẋ0, θ2x0 + θ3ẋ0 + ẍ0 and θ0 are assumed to be known [93]. Therefore, θ3, θ2, θ1 and θ0 are uniquely identifiable.

To check parameter sensitivity, the one-at-a-time parameter sensitivity analysis explained in Section 2.7 was performed for model M3. To check how sensitive the output is to a small change in each deterministic parameter, the following steps were performed:

1) Simulate the time series with no measurement or system noise and obtain the root mean square error (RMSE) between the noise free time series and the measurement time series.

2) Take 1000 random samples θ_i(j) (j = 1, 2, . . . , 1000) from a uniform distribution between 0.99 θ̂_i and 1.01 θ̂_i, where θ̂_i is the posterior mean of θ_i.

3) Calculate the RMSE values between the measurement values and each of the 1000 generated time series, denoted as RMSE(j), using the sampled parameter θ_i(j) and the posterior means of the rest of the parameters.

4) Using (2.83), obtain the sensitivity index SI for parameter θ_i.
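Steps 1 to 3 of this procedure can be sketched as below. The sketch makes several assumptions that should be read as placeholders: `toy_simulate` is a hypothetical two-parameter decay curve standing in for the noise-free simulation of M3, the "measurements" are synthetic, and the function and variable names are invented for the example. Step 4 is deliberately left out because it depends on (2.83), which is defined in Section 2.7 and not reproduced here.

```python
import math
import random

def rmse(y, y_hat):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def oat_rmse(simulate, theta_hat, index, y, n_samples=1000, rel=0.01, seed=0):
    """One-at-a-time perturbation of theta_hat[index] within +/- rel.

    Returns the baseline RMSE (step 1) and the smallest and largest RMSE(j)
    over the sampled parameter values (steps 2 and 3).
    """
    rng = random.Random(seed)
    baseline = rmse(y, simulate(theta_hat))
    centre = theta_hat[index]
    # sorted() keeps the bounds ordered when the posterior mean is negative.
    low, high = sorted((centre * (1 - rel), centre * (1 + rel)))
    sampled = []
    for _ in range(n_samples):
        theta = list(theta_hat)      # other parameters stay at their posterior means
        theta[index] = rng.uniform(low, high)
        sampled.append(rmse(y, simulate(theta)))
    return baseline, min(sampled), max(sampled)

# Hypothetical stand-in for the noise-free model simulation (not M3 itself).
T = [0.5 * k for k in range(20)]

def toy_simulate(theta):
    amplitude, rate = theta
    return [amplitude * math.exp(-rate * t) for t in T]

theta_hat = [10.0, 0.3]  # placeholder "posterior means"
# Synthetic measurements: model output plus a small deterministic disturbance.
y = [v + 0.2 * math.sin(3.0 * t) for v, t in zip(toy_simulate(theta_hat), T)]

for i in range(len(theta_hat)):
    baseline, lowest, highest = oat_rmse(toy_simulate, theta_hat, i, y)
    print(f"theta[{i}]: baseline {baseline:.4f}, RMSE(j) in [{lowest:.4f}, {highest:.4f}]")
```

The returned baseline and RMSE(j) range correspond to the quantities reported per parameter in a summary such as Table 4.2.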

The same procedure can be applied to the inferred parameters of all the time series. Since the parameter values for the different time series fitted by M3 are in the same neighbourhood, the sensitivity indices SI of θ1, θ2, θ3 and θ0 in M3 for one typical time series, as shown in Fig. 4.6, are presented here as an example. As shown in Table 4.2, a 1% change in θ1 in M3 can cause a 0.17% change in RMSE. The RMSE between the measurements and the output generated with the posterior means of the parameters is 239.0 AU. With a 1% perturbation in θ̂1, RMSE(j) is between 239.3 AU and 240.2 AU. According to Table 4.2, the RMSE values remain within a small range when each of the four parameters is perturbed within 1%.

Table 4.2: Summary of the parameter sensitivities for M3

Parameter   SI      RMSE(j) range (AU)
θ1           0.17   239.3 – 240.2
θ2           0.11   239.4 – 240.1
θ3           0.30   239.4 – 241.5
θ0          -0.19   239.2 – 240.1

In document Variational Bayesian data driven modelling for biomedical systems (Page 144-148)