Bayesian Model Selection methods

Chapter 2 Literature Review

2.9 Bayesian Model Selection methods

Bayesian model selection approaches can also be employed within a CP problem. Bayesian statistics concerns deriving the posterior distribution for the unknown quantities of interest and performing inference on this posterior distribution. Within the CP context, Bayesian approaches focus on obtaining or approximating the posterior distribution of CP characteristics. Namely, the joint posterior $p(M, \tau_{1:M} \mid y_{1:n}) = p(M \mid y_{1:n})\, p(\tau_{1:M} \mid y_{1:n}, M)$ is the quantity of interest. Such posterior probabilities can be obtained via applications of Bayes' Theorem and marginalisation. More explicitly,

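$$p(M, \tau_{1:M} \mid y_{1:n}) \propto p(M)\, p(\tau_{1:M} \mid M)\, p(y_{1:n} \mid \tau_{1:M}),$$

where $p(M)$ and $p(\tau_{1:M} \mid M)$ denote the priors on the number of CPs and their locations, and $p(y_{1:n} \mid \tau_{1:M})$ denotes the marginal likelihood with respect to the CP configuration $\tau_{1:M}$, such that the model parameters $\theta$ have been marginalised out.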

The ease with which the posterior is computed depends heavily on the ease of computing the marginal likelihood $p(y_{1:n} \mid \tau_{1:M})$ and on the assumptions placed on the data and the model. For example, if segment independence is assumed, the marginal likelihood is convenient to compute because it is the product of segment marginal likelihoods (Eckley et al., 2011). In general, however, numerical approximation of $p(y_{1:n} \mid M)$ is often required to perform the marginalisation. The choice of prior on the number of CPs and their locations is also an important aspect of calculating the posterior, which we discuss later in this section.
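The product form under segment independence can be sketched in a few lines. The example below is only an illustration under assumptions that are not in the text: each segment is Normal with a known standard deviation `sigma`, the segment mean has a conjugate Normal prior with hypothetical hyperparameters `m0` and `s0`, and a CP $\tau$ is taken to be the 1-indexed time at which a new segment starts.

```python
import math

def log_marginal_segment(seg, m0=0.0, s0=1.0, sigma=1.0):
    """Log marginal likelihood of one segment with its mean integrated out.

    Assumed model (illustrative only): y_t ~ N(mu, sigma^2) within the
    segment, mu ~ N(m0, s0^2), sigma known.
    """
    L = len(seg)
    resid = [y - m0 for y in seg]
    sum_r = sum(resid)
    sum_r2 = sum(r * r for r in resid)
    var_tot = sigma**2 + L * s0**2
    # Marginally the segment is N(m0 * 1, sigma^2 I + s0^2 11^T); the
    # determinant and quadratic form below are its closed forms.
    log_det = (L - 1) * math.log(sigma**2) + math.log(var_tot)
    quad = (sum_r2 - s0**2 * sum_r**2 / var_tot) / sigma**2
    return -0.5 * (L * math.log(2 * math.pi) + log_det + quad)

def log_marginal_config(y, taus, **prior):
    """Log p(y | taus): sum of segment log marginal likelihoods.

    `taus` holds 1-indexed times at which a new segment starts (assumption).
    """
    bounds = [0] + [t - 1 for t in taus] + [len(y)]
    return sum(log_marginal_segment(y[a:b], **prior)
               for a, b in zip(bounds, bounds[1:]))

# Toy data with an apparent shift in mean at the fifth observation.
y = [0.1, -0.3, 0.2, 0.0, 3.1, 2.8, 3.3, 2.9]
print("no CP    :", log_marginal_config(y, []))
print("CP at t=5:", log_marginal_config(y, [5]))
```

On the log scale the product over segments becomes a sum, which is why `log_marginal_config` simply adds the segment terms. Without conjugacy the per-segment integral would typically have no closed form, and this is where numerical approximation enters.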

An advantage of such Bayesian approaches is that they provide a more explicit quantification of the uncertainty regarding CP characteristics. In addition, the quantities presented above are not conditional on the model parameters $\theta$, and thus the uncertainty associated with the unknown $\theta$ has been accounted for. Having obtained the posterior, a variety of inference approaches can be applied. These include the Bayes' Factor and Posterior Odds, the ratios between marginal likelihoods and between posteriors respectively, which assess the evidence for one CP configuration over another. For example, $p(y_{1:n} \mid M=1) / p(y_{1:n} \mid M=2)$ is the Bayes' Factor of one CP being present over two. Larger values of this factor indicate stronger evidence of one CP being present. Bayes' Factor and Posterior Odds can also be used with respect to the posterior of CP configurations. CP estimates can also be obtained by minimising the expected posterior loss for a suitable loss function. Such Bayesian approaches appear in Smith (1975), Carlin et al. (1992) and Stephens (1994), in both single and multiple CP contexts.
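For a very short series, the Bayes' Factor and the posterior on the number of CPs can be computed exactly by enumerating every CP configuration. The sketch below does this under illustrative assumptions that are not taken from the text: a Normal segment model with known variance and a conjugate Normal prior on each segment mean, a uniform prior over all configurations with $M$ CPs, and a uniform prior on $M$. All function names are hypothetical.

```python
import math
from itertools import combinations

def log_ml_segment(seg, m0=0.0, s0=1.0, sigma=1.0):
    """Log marginal likelihood of a segment (assumed Normal-Normal model)."""
    L = len(seg)
    resid = [v - m0 for v in seg]
    var_tot = sigma**2 + L * s0**2
    log_det = (L - 1) * math.log(sigma**2) + math.log(var_tot)
    quad = (sum(r * r for r in resid) - s0**2 * sum(resid)**2 / var_tot) / sigma**2
    return -0.5 * (L * math.log(2 * math.pi) + log_det + quad)

def log_ml_config(y, taus):
    """Log p(y | taus), with taus the 1-indexed starts of new segments."""
    bounds = [0] + [t - 1 for t in taus] + [len(y)]
    return sum(log_ml_segment(y[a:b]) for a, b in zip(bounds, bounds[1:]))

def logsumexp(vals):
    """Numerically stable log of a sum of exponentials."""
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))

def log_evidence_given_M(y, M):
    """Log p(y | M), averaging over a uniform prior on configurations."""
    configs = list(combinations(range(2, len(y) + 1), M))
    terms = [log_ml_config(y, c) for c in configs]
    return logsumexp(terms) - math.log(len(configs))

y = [0.1, -0.3, 0.2, 0.0, 3.1, 2.8, 3.3, 2.9]
M_max = 2
log_ev = [log_evidence_given_M(y, M) for M in range(M_max + 1)]

# Bayes' Factor of one CP over two: ratio of the marginal likelihoods.
bayes_factor_1_vs_2 = math.exp(log_ev[1] - log_ev[2])

# With a uniform prior on M, the posterior is the normalised evidence.
log_norm = logsumexp(log_ev)
post_M = [math.exp(v - log_norm) for v in log_ev]

print("Bayes' Factor (M=1 vs M=2):", bayes_factor_1_vs_2)
print("p(M | y):", post_M)
```

Enumeration grows combinatorially with the series length and the number of CPs, so this is feasible only for toy problems; it is intended to make the definitions concrete rather than to suggest a practical algorithm.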

An area of ongoing discussion in the Bayesian community is the choice of prior, that is, our initial belief about the unknown quantity of interest. This is known to have an effect on the posterior on which inference is performed. The CP context is no different: priors are specified on both the number and location of CPs, $p(M)$ and $p(\tau_{1:M} \mid M)$. There are a variety of ways in which this can be done, depending on one's belief. Uninformative priors are often chosen in the Bayesian community if little is known about the unknown quantities. In a CP context this means one does not favour certain CP configurations. As a result, the likelihood rather than the prior has the most influence on the posterior. A naive, misguided attempt at achieving this uninformativeness is to assume the following Uniform distribution on both the number and location of CPs, as in Bayesian CP analysis,

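$$
\begin{aligned}
M &\sim \mathrm{Unif}(\{0, 1, \ldots, M_{\max}\}), \\
p(\tau_{1:M} \mid M) &= p(\tau_1) \prod_{i=2}^{M} p(\tau_i \mid \tau_{i-1}), \\
p(\tau_1) &= \frac{1}{n - M}, & \tau_1 &= 2, \ldots, n - M, \\
p(\tau_j \mid \tau_{j-1}) &= \frac{1}{n - \tau_{j-1} - 1}, & \tau_j &= \tau_{j-1} + 1, \ldots, n - M + j - 1, \quad j = 2, \ldots, M.
\end{aligned}
$$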

Whilst such a prior setup seems to be uninformative through its use of the Uniform distribution, Koop and Potter (2009) show that this is not the case for the location of the CPs: there is an undesirable clustering of CPs towards the end of the data. This effect may not be a true representation of one's uninformative belief and should therefore be avoided where possible.
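The clustering effect can be illustrated numerically by computing the marginal prior of each CP location under the sequential uniform construction. The sketch below is not taken from Koop and Potter (2009). It assumes that each conditional distribution is uniform over its stated support and normalised over that support, which may differ slightly from the constants printed above, and the values `n = 100` and `M = 3` are arbitrary.

```python
def cp_marginals(n, M):
    """Marginal pmf of each tau_j under the sequential uniform prior.

    Assumption: tau_1 is uniform on {2, ..., n - M} and tau_j given
    tau_{j-1} is uniform on {tau_{j-1} + 1, ..., n - M + j - 1}.
    Returns a list of dicts mapping location -> probability.
    """
    def upper(j):
        return n - M + j - 1  # largest admissible value of tau_j

    pmf = {t: 1.0 / (upper(1) - 1) for t in range(2, upper(1) + 1)}
    marginals = [pmf]
    for j in range(2, M + 1):
        nxt = {}
        for prev, p in pmf.items():
            support = range(prev + 1, upper(j) + 1)
            for t in support:
                # Spread the mass of tau_{j-1} = prev evenly over its support.
                nxt[t] = nxt.get(t, 0.0) + p / len(support)
        pmf = nxt
        marginals.append(pmf)
    return marginals

n, M = 100, 3
marginals = cp_marginals(n, M)
means = [sum(t * p for t, p in pmf.items()) for pmf in marginals]
last_quarter = sum(p for t, p in marginals[-1].items() if t > 0.75 * n)

for j, m in enumerate(means, start=1):
    print(f"E[tau_{j}] = {m:.3f}")
print("P(tau_M in last quarter of the data) =", round(last_quarter, 3))
```

Under this construction each later CP is drawn uniformly from whatever remains of the series, so the expected remaining gap roughly halves at every step and the later CPs are pushed towards the end of the data, even though every individual conditional is uniform.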

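[...] restricted uniform priors for the CP location,

$$p(\tau_1) = \frac{1}{\lceil c \cdot n \rceil}, \qquad \tau_1 = 2, \ldots, \lceil c \cdot n \rceil, \tag{2.12}$$

$$p(\tau_j \mid \tau_{j-1}) = \frac{1}{\lceil c \cdot n \rceil}, \qquad \tau_j = \tau_{j-1} + 1, \ldots, \tau_{j-1} + \lceil c \cdot n \rceil, \quad j = 2, \ldots, M, \tag{2.13}$$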

where $c$ is a tuning parameter controlling the maximum duration of each segment; larger values correspond to longer segments of data. $\lceil x \rceil$ denotes the ceiling function, $\lceil x \rceil = \inf\{z \in \mathbb{Z} \mid x \le z\}$.
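To see how priors of the form (2.12) and (2.13) behave, one can draw CP locations from them and count how many fall within the observed data. The sketch below is a simulation under stated assumptions: each draw is uniform over the support given in the equations, and the values of `n`, `M`, `c` and the seed are arbitrary choices for illustration.

```python
import math
import random

def sample_restricted_prior(n, M, c, rng):
    """Draw M potential CP locations from the restricted uniform prior.

    Assumption: tau_1 is uniform on {2, ..., ceil(c*n)} and each later gap
    tau_j - tau_{j-1} is uniform on {1, ..., ceil(c*n)}.
    """
    width = math.ceil(c * n)
    taus = [rng.randint(2, width)]  # randint includes both endpoints
    for _ in range(M - 1):
        taus.append(taus[-1] + rng.randint(1, width))
    return taus

rng = random.Random(0)  # fixed seed so the run is reproducible
n, M, c = 100, 10, 0.25
draws = [sample_restricted_prior(n, M, c, rng) for _ in range(5000)]

# Number of CPs per draw that land within the scope of the data (tau <= n).
in_scope = [sum(t <= n for t in taus) for taus in draws]
freq = {k: in_scope.count(k) / len(draws) for k in sorted(set(in_scope))}
print("Implied prior on the number of CPs within the data:", freq)
```

Because the gap between consecutive CPs no longer depends on how much of the series remains, later CPs can land beyond time $n$. The number of CPs within the data is therefore random even though the number of potential CPs `M` is fixed in advance.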

The form of the proposed priors looks very similar to that of the "uninformative" uniform priors, although there are subtle differences. Namely, CPs can occur beyond the scope of the data. Extending the potential scope of CP instances removes the undesirable clustering of CPs towards the end of the data, and thus provides a truly uninformative prior for the CP location. In addition, the proposed prior also treats the number of CPs as an unknown, with inference now focusing on the number of CPs occurring within the scope of the data. This is despite the number of potential CPs being pre-specified. Nevertheless, the proposed prior represents a truly uninformative belief and should thus be used if an uninformative prior is desired.

An alternative manner to specify a prior on both the number and location of CPs is to consider a prior on the segment length. This prior is introduced with respect to the methodology reviewed in Section 2.13 and we will consider it there in greater detail.

The Bayesian approaches reviewed in this section provide explicit quantification of CP uncertainty in the form of the posterior and are an attractive approach for the problem presented in this thesis. In addition, a Bayesian approach accounts for the uncertainty associated with the unknown model parameters $\theta$ by integrating them out of the joint posteriors obtained. This results in CP estimates which are not conditional on specified model parameters. Implementations of these Bayesian methods are scarce and often tailored to a specific problem and application because of the priors and models assumed. Specifying appropriate priors on the number and location of CPs is a difficult task, particularly if the posterior of interest is sensitive to them. This is a potential disadvantage of the Bayesian approaches outlined in this section. A Bayesian approach that obtains the posterior of the CP characteristics without having to specify such influential priors on the CP characteristics themselves would therefore be particularly advantageous. One alternative approach is to specify a prior on the segment durations, which is outlined in Section 2.13.

In document The uncertainty of changepoints in time series (Page 41-44)