
Chapter 2 Theory

2.4 A framework for input estimation

2.4.1 Choice of prior

In this section, priors for scalar input functions are discussed. For vector-valued input functions, priors can be assigned independently to each component, as long as the components are assumed to be a priori independent. Assigning priors jointly over all components will not be considered further in this work. In the input-estimation literature, the most common type of prior for a scalar input function $u(t)$ is given by

$$p(u(t)) \propto \exp\left(-\frac{\tau}{2} \int_{t_i}^{t_f} \left(\frac{d^j u}{dt^j}\right)^2 dt\right) \tag{2.16}$$

where $\tau$ is the regularisation constant, and $j$ is the order of the derivative, which is typically 1 or 2. The logarithm of this prior is proportional to the squared $L^2$ norm of the input function's $j$th derivative. Seen as a function of $u(t)$, this expression is not a norm but a seminorm, since any polynomial of degree $j-1$ can be added to the function without changing the value of the expression. This makes the prior improper, since it cannot be integrated to give a finite value. This is not necessarily a problem, as long as the corresponding posterior is proper. If desired, additional factors can be added to the prior to make it proper.

These priors assign lower probabilities to functions that have large higher-order derivatives, making highly oscillatory functions less probable. Penalising the first derivative corresponds to the intuitive notion that rapid changes in the function value are a priori improbable. Similarly, penalising the second derivative corresponds to the notion that rapid changes in the slope of the function are improbable. Penalising a higher-order derivative results in a higher degree of smoothness. This can be seen by noting that differentiation is a high-pass operation. Informally, this means that the derivative will usually be "noisier" than the function itself. Hence, by forcing the second derivative to be smooth, the function itself is forced to be even smoother.
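The seminorm property can be checked numerically. The following is a minimal sketch, assuming a uniform time grid and a finite-difference approximation of the derivative in Eq. (2.16); the function name `derivative_log_prior` and the test signals are illustrative choices, not part of the original work.

```python
import numpy as np

def derivative_log_prior(u, dt, j=1, tau=1.0):
    """Unnormalised log-prior of Eq. (2.16) on a uniform grid (illustrative sketch)."""
    # The j-th finite difference divided by dt**j approximates d^j u / dt^j.
    dju = np.diff(u, n=j) / dt**j
    # Riemann-sum approximation of the integral of the squared derivative.
    return -0.5 * tau * np.sum(dju**2) * dt

t = np.linspace(0.0, 1.0, 101)
dt = t[1] - t[0]
u = np.exp(-3.0 * t)  # hypothetical smooth input function

# j = 1: adding a constant (polynomial of degree 0) leaves the log-prior unchanged.
lp1 = derivative_log_prior(u, dt, j=1)
lp1_shifted = derivative_log_prior(u + 5.0, dt, j=1)

# j = 2: adding a straight line (polynomial of degree 1) leaves it unchanged.
lp2 = derivative_log_prior(u, dt, j=2)
lp2_tilted = derivative_log_prior(u + 5.0 - 2.0 * t, dt, j=2)

# A superimposed oscillation is penalised heavily.
lp2_wiggly = derivative_log_prior(u + 0.1 * np.sin(40.0 * np.pi * t), dt, j=2)
```

Under these assumptions, `lp1` and `lp1_shifted` agree up to rounding, as do `lp2` and `lp2_tilted`, while `lp2_wiggly` is far more negative than `lp2`.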

Maximum entropy-based methods define a prior by discretising the input function in time into $N_B$ points, and letting

$$p(u(t)) \propto \exp\left(-\tau \sum_{k=0}^{N_B-1} u_k \log\frac{u_k}{m_k}\right) \tag{2.17}$$

where $u_k$ is the input function value at time $t_k$. The value in the denominator, $m_k$, is considered to be a "baseline" value, which is the best guess for that value unless the data suggest otherwise. Hattersley et al. (2008) define $m_k$ to be the mean of the adjacent function values, $(u_{k-1} + u_{k+1})/2$. This prior discourages large deviations from the baseline. For a straight line, where $u_k = m_k$ for all $k$, the unnormalised log-prior evaluates to zero. Any deviation from this line will result in a smaller probability density (Fig. 2.2).
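The experiment described for Fig. 2.2 can be sketched in a few lines. This is an illustration only: the text does not say how $m_k$ is defined at the two end points, so the sketch assumes the sum runs over interior points only, and it requires all $u_k > 0$. The function names are hypothetical.

```python
import numpy as np

def entropic_log_prior(u, tau=1.0):
    """Unnormalised log-prior of Eq. (2.17) with m_k = (u_{k-1} + u_{k+1}) / 2.

    Assumption: only interior points are summed, because the baseline at the
    two end points is not specified in the text. Requires u > 0.
    """
    u = np.asarray(u, dtype=float)
    m = 0.5 * (u[:-2] + u[2:])   # baseline for each interior point
    inner = u[1:-1]
    return -tau * np.sum(inner * np.log(inner / m))

def log_prior_one_point_varied(uk, n=7, k=3):
    """All points fixed at 1 except point k, which takes the value uk."""
    u = np.ones(n)
    u[k] = uk
    return entropic_log_prior(u)

values = {uk: log_prior_one_point_varied(uk) for uk in (0.5, 1.0, 1.5, 2.0)}
```

Note that varying $u_k$ changes three terms of the sum: its own term and, through the baselines $m_{k-1}$ and $m_{k+1}$, the terms of its two neighbours. With this setup the log-prior is zero at $u_k = 1$ and negative elsewhere.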

A large class of potentially useful priors can be constructed by modelling the input function as a Gaussian process. A Gaussian process is defined as a stochastic process for which all finite-dimensional distributions are Gaussian. This means that if $n$ time points $t_1, \ldots, t_n$ are selected, the vector of function values $[u(t_1), \ldots, u(t_n)]^\top$ is multivariate Gaussian (Rasmussen and Williams 2006). The statistics of the process can be uniquely defined by a mean function $m(t)$, which represents the mean of $u(t)$, and a covariance function $K(s, t)$, which represents the covariance between $u(s)$ and $u(t)$. This idea is illustrated in Fig. 2.3. Using the squared $L^2$ norm of the $j$th derivative as a log-prior is a special case of modelling the input function as a Gaussian process. In particular, the choice of $j = 1$ corresponds to $m(t) = 0$ and $K(s, t) = \frac{1}{\tau} \min\{s, t\}$, and the choice of $j = 2$ corresponds to $m(t) = 0$ and $K(s, t) = \frac{1}{\tau} \frac{\min\{s, t\}^2}{6} \left(3 \max\{s, t\} - \min\{s, t\}\right)$ (Bell and Pillonetto 2004). Here, the regularisation parameter $\tau$ can be interpreted as the process noise precision, which is the inverse of the process noise variance.
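As an illustration, the two covariance functions can be evaluated on a grid to obtain the covariance matrix of the finite-dimensional distribution. This is a sketch under stated assumptions: the function names, the grid and the random seed are hypothetical, and the time points are taken strictly positive so that every point has non-zero variance.

```python
import numpy as np

def k_first_order(s, t, tau=1.0):
    """Covariance for j = 1: K(s, t) = min(s, t) / tau."""
    return np.minimum(s, t) / tau

def k_second_order(s, t, tau=1.0):
    """Covariance for j = 2: K(s, t) = min^2 * (3 * max - min) / (6 * tau)."""
    lo, hi = np.minimum(s, t), np.maximum(s, t)
    return lo**2 * (3.0 * hi - lo) / (6.0 * tau)

# Hypothetical grid of n = 10 strictly positive time points.
times = np.linspace(0.1, 1.0, 10)
S, T = np.meshgrid(times, times, indexing="ij")
K1 = k_first_order(S, T)
K2 = k_second_order(S, T)

# Draw one realisation of [u(t_1), ..., u(t_n)] from each zero-mean prior.
rng = np.random.default_rng(0)
zero_mean = np.zeros(len(times))
sample_j1 = rng.multivariate_normal(zero_mean, K1)
sample_j2 = rng.multivariate_normal(zero_mean, K2)
```

On the diagonal these expressions reduce to $K(t, t) = t/\tau$ for $j = 1$ and $K(t, t) = t^3/(3\tau)$ for $j = 2$, so the prior variance grows with time in both cases.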

[Figure 2.2: plot titled "Entropic log-prior"; horizontal axis $u_k$, vertical axis unnormalised log-prior density.]

Figure 2.2: Value of the entropy-based log-prior of Eq. (2.17), where the input function at one time point, $u_k$, is varied, while the function at all other time points is kept fixed at 1. The log-prior attains its largest value when $u_k$ is equal to its neighbours. Here $\tau = 1$.

In pharmacokinetic models, it is often desirable to impose nonnegativity constraints, to ensure that the input function does not attain unphysical negative values. This requirement can be added to the problem by assigning zero probability to functions that drop below 0 at any point. An alternative is to model the input function in the logarithmic domain, such that the prior is placed on $\log u(t)$ rather than on $u(t)$ directly. The latter approach can also make for more plausible models, since it captures the notion that if a function value is large, large changes in the value are more probable. As an example of why this may be more realistic, consider a typical pharmacokinetic experiment, where the input rate is large in the initial stages of the experiment, and close to zero in the later stages (see Fig. 2.1). When using an input model where the first derivative of the input function is penalised, a change in the input function by a certain amount will be penalised equally regardless of where the change occurs. In contrast, a logarithmic model penalises changes relative to the current function value. Hence, a large change in the region where the input value is close to zero would be considered less probable than an equally large change during the initial stage, when the input value is large.
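The difference between the two domains can be illustrated with a single step. This is a minimal sketch, assuming a first-derivative penalty discretised over one time interval; the function name, step size and levels are hypothetical.

```python
import numpy as np

def step_log_prior(a, delta, dt=0.1, tau=1.0, log_domain=False):
    """Log-prior contribution of one step from a to a + delta over an interval dt.

    With log_domain=True the first-derivative penalty is applied to log u
    instead of u. Requires a > 0 and a + delta > 0 in that case.
    """
    if log_domain:
        start, end = np.log(a), np.log(a + delta)
    else:
        start, end = a, a + delta
    slope = (end - start) / dt
    return -0.5 * tau * slope**2 * dt

delta = 1.0
# Early stage: input value is large. Late stage: input value is close to zero.
lin_early = step_log_prior(10.0, delta)
lin_late = step_log_prior(0.1, delta)
log_early = step_log_prior(10.0, delta, log_domain=True)
log_late = step_log_prior(0.1, delta, log_domain=True)

# Any real-valued log-domain function maps back to a strictly positive input.
v = np.random.default_rng(0).normal(size=5)
u_from_log = np.exp(v)
```

In this sketch the linear-domain penalty is the same at both levels, whereas in the logarithmic domain the step at the low level is penalised far more strongly than the same step at the high level.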

[Figure 2.3: plot titled "Example Gaussian process"; horizontal axis $t$ with the points $t_1$ and $t_2$ marked, vertical axis $u(t)$; inset with axes $u(t_1)$ and $u(t_2)$.]

Figure 2.3: Example: Gaussian process $u(t)$ with mean function $m(t) = \sin(2\pi t)$ and covariance function $K(s, t) = 0.1 \exp\left(-\frac{1}{2} \frac{(s-t)^2}{10^{-3}}\right)$. This is an example of a squared exponential covariance function. The thick line shows the mean function, and the shaded area covers $\pm 1.96$ standard deviations around the mean at each point. The thin lines are example realisations of the stochastic process. The function values at the points $t_1 = 0.1$ and $t_2 = 0.2$ form a bivariate Gaussian distribution, whose mean vector and covariance matrix can be computed from $m(t)$ and $K(s, t)$ (shown in the inset). The covariance function assigns larger correlations between points that are close in time.
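The bivariate distribution mentioned in the caption can be computed directly from the mean and covariance functions. The sketch below assumes the covariance function reads $K(s, t) = 0.1 \exp\left(-\frac{1}{2}(s-t)^2 / 10^{-3}\right)$; the function names, parameter names and random seed are illustrative.

```python
import numpy as np

def mean_fn(t):
    """Mean function m(t) = sin(2 * pi * t)."""
    return np.sin(2.0 * np.pi * t)

def cov_fn(s, t, variance=0.1, length_sq=1e-3):
    """Squared exponential covariance; parameter values assumed from the caption."""
    return variance * np.exp(-0.5 * (s - t) ** 2 / length_sq)

pts = np.array([0.1, 0.2])                      # t_1 and t_2
mean_vec = mean_fn(pts)                         # mean vector of [u(t_1), u(t_2)]
cov_mat = cov_fn(pts[:, None], pts[None, :])    # 2 x 2 covariance matrix

# Half-width of the +/- 1.96 standard deviation band at each point.
half_width = 1.96 * np.sqrt(np.diag(cov_mat))

# Example draws from the bivariate Gaussian.
rng = np.random.default_rng(0)
draws = rng.multivariate_normal(mean_vec, cov_mat, size=1000)
```

The off-diagonal entry of `cov_mat` depends only on the distance between the two points, so moving $t_2$ closer to $t_1$ increases their covariance.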
