Apart from the circumstances discussed in Chapter 4 under which predictive model selection (via Pseudo-BMS/BMA) or combination (via Bayesian Stacking) is preferable in multi-model usage, a fundamental motivation in science and engineering remains to understand the DGP and, sooner or later, to identify the true model. If the truth is a single distinct model, BMS/BMA serves this purpose. Often, this “search for the truth” is also what modellers intend when they speak of combining fully specified models, and they may wrongly assume that BMA provides such a combination.
An illustrative example from Höge et al. (2019) specifies exactly what is meant: “For an observed decline in concentration of a substance, two experts might provide two plausible hypotheses. The first expert hypothesizes the concentration decrease results from only microbial consumption (M1) and the second expert claims that solely abiotic reactions cause the decline (M2). Each expert comes up with a model that contains the mathematical formulation of their respective process. If BMA was applied to both models, it would prefer one over the other, and would do so increasingly clearly with more included data from the decline - BMA assumes only one of the two models can be true and tries to identify it. BMA would not settle with the weights of the two processes like 75 % biotic and 25 % abiotic under growing data size, even if this ratio represented what actually happened in reality.”
One might think that this ratio can be found by Bayesian Stacking. However, Bayesian Stacking does not search for the correct ratio on the process level but for the optimal shares with which the individual predictive distributions of the two distinct hypotheses are superposed. Hence, the desired kind of combination of fully developed models acts on the process level and is therefore different from Bayesian Stacking.
What we want is to identify the representation of the process as “stacked” models (Minka, 2002) on the process level, i.e. as a superposition of the individual models’ outputs and not of their pdfs (see Section 5.1). A combined model in this sense is a weighted average of the mathematical model equations, and its prediction is the correspondingly weighted average of the individual model outputs. The right combination of models is supposed to represent the DGP, and our goal is therefore to identify it, like the superposition of 75 % biotic model M1 and 25 % abiotic model M2 in the above example.
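The difference between combining outputs and combining pdfs can be made concrete with a small numerical sketch. The two decay laws below (first-order decay standing in for the microbial model M1, zero-order decay for the abiotic model M2), their parameter values, and the Gaussian error model with standard deviation 0.5 are purely illustrative assumptions, not the models of Höge et al. (2019); all function names are hypothetical. At one time point, the sketch compares the predictive density of a 75:25 combined model with that of a 75:25 stacking-style mixture of the two individual predictive densities.

```python
import numpy as np

# Hypothetical stand-ins for the two hypotheses; forms and values are assumptions.
def m1_microbial(t, c0=10.0, k1=0.3):
    """M1: first-order (exponential) decline of concentration."""
    return c0 * np.exp(-k1 * np.asarray(t, dtype=float))

def m2_abiotic(t, c0=10.0, k2=0.8):
    """M2: zero-order (linear) decline, floored at zero concentration."""
    return np.maximum(c0 - k2 * np.asarray(t, dtype=float), 0.0)

def normal_pdf(y, mean, sd):
    """Gaussian density used as a simple predictive error model."""
    return np.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def combined_output(t, w1):
    """Process-level combination: weighted average of the model outputs."""
    return w1 * m1_microbial(t) + (1.0 - w1) * m2_abiotic(t)

def combined_pdf(y, t, w1, sd=0.5):
    """Predictive pdf of the combined model: one Gaussian around the averaged output."""
    return normal_pdf(y, combined_output(t, w1), sd)

def stacked_pdf(y, t, w1, sd=0.5):
    """Stacking-style combination: weighted mixture of the two predictive pdfs."""
    return (w1 * normal_pdf(y, m1_microbial(t), sd)
            + (1.0 - w1) * normal_pdf(y, m2_abiotic(t), sd))

if __name__ == "__main__":
    t, w1 = 5.0, 0.75
    y_comb = combined_output(t, w1)
    print(f"M1: {m1_microbial(t):.2f}, M2: {m2_abiotic(t):.2f}, combined: {y_comb:.2f}")
    print(f"density at combined output: combined {combined_pdf(y_comb, t, w1):.3f}, "
          f"stacked {stacked_pdf(y_comb, t, w1):.3f}")
```

At t = 5, M1 predicts about 2.23 and M2 predicts 6.00, so the combined model predicts about 3.17. The stacking mixture instead puts its probability mass near 2.23 and 6.00, so its density at 3.17 is much lower than that of the combined model: stacking superposes the predictions of the two hypotheses, whereas the combined model describes a single process in which both reactions act together.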
Bayesian Combined Model Selection and Averaging
Such an approach was originally proposed by Monteith et al. (2011) as Bayesian model combination. It rates combinations of models in line with consistent Bayesian model selection to fulfil the identification purpose. The term Bayesian model combination is concise but may cause confusion, although it simply refers to the methodology of BMS/BMA applied to combined models $\mathrm{CM}_k$ instead of individual models $M_m$. The combined models are defined as:
$$\mathrm{CM}_k = w_1 M_1 + w_2 M_2 + \dots + w_{N_M} M_{N_M} \qquad (44)$$
The models $M_1, \dots, M_{N_M}$ are the fully specified models in the model set $\mathcal{M}$. The weights $w_m$ are not found by the method but proposed by the modeller or provided otherwise; in Monteith et al. (2011), the weights stem from an assigned distribution. Then, the same equations as for BMS/BMA (see Section 2.3.1) are applied to the defined $\mathrm{CM}_k$. The computational effort is only slightly larger than that of applying BMS/BMA to the individual models, because both frameworks require full marginalization over each individual model’s prior parameter distribution. The underlying theory, including its consistency in identifying the true model, then holds for the rated combined models. Accordingly, this method is called Bayesian Combined Model Selection/Averaging (BCMS/BCMA) (Höge et al., 2019). Note that the weights in Equation 44 do not express conceptual uncertainty between the individual models as in BMS/BMA. In BCMS/BCMA, conceptual uncertainty refers to selecting one combined model as (quasi-)true. Therefore, it is the model weights (probabilities) of all combined models $\mathrm{CM}_k$, obtained from the equations in Section 2.3.1, that express conceptual uncertainty in this context.
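Since BCMS/BCMA only changes which candidates enter the usual BMS/BMA equations, it can be sketched in a few lines. The sketch assumes hypothetical first-order and zero-order decay laws with uncertain rate constants, uniform priors on these rates, a Gaussian likelihood with known standard deviation, uniform prior model probabilities, and brute-force Monte Carlo estimation of the BME $p(y \mid \mathrm{CM}_k) = \int p(y \mid \theta, \mathrm{CM}_k)\, p(\theta \mid \mathrm{CM}_k)\, d\theta$. The grid `w1_grid` plays the role of the a-priori defined weights in Equation 44 for two models ($w_2 = 1 - w_1$); all names, priors and numbers are illustrative, not taken from the cited works.

```python
import numpy as np

def m1_microbial(t, k1):
    """Hypothetical M1: first-order decay; one row per prior sample of k1."""
    return 10.0 * np.exp(-np.outer(k1, t))

def m2_abiotic(t, k2):
    """Hypothetical M2: zero-order decay floored at zero; one row per sample of k2."""
    return np.maximum(10.0 - np.outer(k2, t), 0.0)

def log_likelihood(y, pred, sd):
    """Gaussian log-likelihood of the data y for every row (prior sample) of pred."""
    resid = (y - pred) / sd
    return np.sum(-0.5 * resid**2 - np.log(sd * np.sqrt(2.0 * np.pi)), axis=1)

def bcms_weights(t, y, w1_grid, sd=0.5, n_samples=20_000, seed=0):
    """Posterior probabilities of the combined models CM_k = w1*M1 + (1 - w1)*M2."""
    rng = np.random.default_rng(seed)
    k1 = rng.uniform(0.1, 0.5, n_samples)   # assumed prior for M1's rate
    k2 = rng.uniform(0.4, 1.2, n_samples)   # assumed prior for M2's rate
    out1, out2 = m1_microbial(t, k1), m2_abiotic(t, k2)  # each model is run once
    log_bme = np.empty(len(w1_grid))
    for k, w1 in enumerate(w1_grid):
        ll = log_likelihood(y, w1 * out1 + (1.0 - w1) * out2, sd)
        m = ll.max()
        log_bme[k] = m + np.log(np.mean(np.exp(ll - m)))  # log of the MC mean likelihood
    post = np.exp(log_bme - log_bme.max())  # uniform prior model probabilities cancel
    return post / post.sum()

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = np.linspace(0.0, 10.0, 30)
    truth = 0.75 * 10.0 * np.exp(-0.3 * t) + 0.25 * np.maximum(10.0 - 0.8 * t, 0.0)
    y = truth + rng.normal(0.0, 0.5, t.size)  # synthetic data, for illustration only
    grid = [1.0, 0.75, 0.5, 0.25, 0.0]
    for w1, p in zip(grid, bcms_weights(t, y, grid)):
        print(f"M1:M2 = {w1:.0%}:{1 - w1:.0%}  P(CM_k | y) = {p:.3f}")
```

In this sketch, each individual model is evaluated only once per prior sample, and every combined model reuses these runs because its output is a weighted sum of them; for expensive simulators, where model runs dominate the cost, this is one way to see why the effort stays close to that of BMS/BMA on the individual models. Brute-force Monte Carlo is used only for transparency and can become inefficient for informative data.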
While BMS/BMA converges to the individual model whose prior predictive distribution is closest to the true data distribution $q(y \mid M_{\mathrm{true}})$ (Minka, 2002; Monteith et al., 2011), BCMS/BCMA converges to the optimal combined model $\mathrm{CM}_{\mathrm{opt}}$ whose $p(y \mid \mathrm{CM}_{\mathrm{opt}})$ is closest to it. For an application of BCMS/BCMA to classification problems, refer to Kim and Ghahramani (2012). In hydrosystem modelling, a similar approach can be found in Ajami et al. (2007).
[Figure 21 panels, left to right: No Data | Little Data | Much Data]
Figure 21: Identification of the model closest to the truth in the set with BMA/BMS (upper half) vs. identification of the most plausible combined model with BCMA/BCMS (lower half) under growing data size (from left to right). The true model (vertical dashed line) is situated between the two model candidates, M1 and M2. In BMA/BMS, weights are assigned to the two distinct models; in BCMA/BCMS, weights are assigned to combinations of both models (M1:M2, ratios in percent). BMA/BMS converges towards one model candidate, BCMA/BCMS converges towards a specific model combination (from Höge et al., 2019).
As discussed by Höge et al. (2019), the following holds for the illustrative example above: “The difference between BMA/BMS and BCMA/BCMS becomes apparent when looking at the change in model weightings under growing data size. This is illustrated for a simple two-model-setup in Figure 21, like the ‘biotic vs. abiotic decay’ example from above. Without any data, indifferent uniform prior model weights are assigned to the two individual models M1 (microbial) and M2 (abiotic) in BMA/BMS, i.e. 50% each, and for all a-priori specified combinations of the two models in BCMA/BCMS, i.e. exemplary 5 combinations with 20% each. Referring to the conceptual example above, each combined model consists of both the biotic and abiotic reaction terms but by different fractions which resembles that the concentration decrease is caused by both, e.g., 25% microbial and 75% pure chemical decay. Once the individual or combined models face a small amount of data, the model set member closest to the data gains strongest in weight and others gain less or lose model weight. These weights represent the uncertainty in BMA or BCMA of an individual or combined model, respectively, to represent the truth given the current data. Under more and more additional informative data, the weighting converges fully to the one most plausible member in the set: BMA turns into BMS for an individual model and BCMA turns into BCMS for a combined model. In a situation as visualized in Figure 21, where the truth lays somewhere between M1 and M2, BMA/BMS will tend towards the one single model in the set that appears to be most likely to have generated the data - either the biotic or abiotic model but not a mixture. Identifying a truth consisting of combined models will only be pursued by BCMA/BCMS, where the combinations are a-priori defined by the modeler and offered as candidates.”
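The contrasting behaviour described for Figure 21 can be reproduced qualitatively with a toy simulation. The sketch below assumes two parameter-free (fully specified) hypothetical decay models, so that the BME of each candidate reduces to its likelihood; synthetic data generated from a 75:25 superposition of their outputs plus Gaussian noise; uniform prior model probabilities; and five hypothetical a-priori combinations (100:0, 75:25, 50:50, 25:75, 0:100). It tracks the BMA weights of M1 and M2 and the BCMA weights of the combined models on nested data sets of growing size. It is an illustration under these assumptions, not the setup used to produce Figure 21.

```python
import numpy as np

# Hypothetical, fully specified models (no free parameters), so BME = likelihood.
def m1_microbial(t):
    return 10.0 * np.exp(-0.3 * t)

def m2_abiotic(t):
    return np.maximum(10.0 - 0.8 * t, 0.0)

def posterior_weights(t, y, w1_grid, sd=0.5):
    """Posterior probabilities of candidates w1*M1 + (1 - w1)*M2 (uniform priors)."""
    loglik = np.array([
        -0.5 * np.sum(((y - (w1 * m1_microbial(t) + (1.0 - w1) * m2_abiotic(t))) / sd) ** 2)
        for w1 in w1_grid
    ])  # Gaussian normalizing constants are identical for all candidates and cancel
    post = np.exp(loglik - loglik.max())
    return post / post.sum()

def weight_trajectories(n_values, true_w1=0.75, sd=0.5, seed=0):
    """BMA and BCMA weights for nested synthetic data sets of the given sizes."""
    rng = np.random.default_rng(seed)
    n_max = max(n_values)
    t_all = rng.uniform(0.0, 10.0, n_max)
    y_all = (true_w1 * m1_microbial(t_all) + (1.0 - true_w1) * m2_abiotic(t_all)
             + rng.normal(0.0, sd, n_max))
    bma_set = [1.0, 0.0]                    # the individual models M1 and M2
    bcma_set = [1.0, 0.75, 0.5, 0.25, 0.0]  # hypothetical a-priori combinations
    result = {}
    for n in n_values:
        t, y = t_all[:n], y_all[:n]
        result[n] = (posterior_weights(t, y, bma_set, sd),
                     posterior_weights(t, y, bcma_set, sd))
    return result

if __name__ == "__main__":
    for n, (bma, bcma) in weight_trajectories([0, 5, 50, 200]).items():
        print(f"n={n:3d}  BMA (M1, M2): {np.round(bma, 3)}  "
              f"BCMA (100:0 ... 0:100): {np.round(bcma, 3)}")
```

Without data, the weights equal the uniform priors (50 % each for BMA, 20 % each for BCMA). As data accumulate, the BMA weights concentrate on M1, the individual model closer to the 75:25 truth, while the BCMA weights concentrate on the 75:25 combination. Here the true process is, by construction, one of the offered combinations; otherwise, BCMA would converge to the offered combination closest to the truth.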