Estimation across multiple models with
application to Bayesian computing and
software development
Richard J Stevens (Diabetes Trials Unit, University of Oxford)
Trevor J Sweeting (Department of Statistical Science, University College London)
Abstract
Statistical models are sometimes incorporated into computer software for making predictions about future observations. When the computer model consists of a single statistical model this corresponds to estimation of a function of the model parameters. This paper is concerned with the case that the computer model implements multiple, individually estimated statistical sub-models. This case frequently arises, for example, in models for medical decision making that derive parameter information from multiple clinical studies. We develop a method for calculating the posterior mean of a function of the parameter vectors of multiple statistical models that is easy to implement in computer software, has high asymptotic accuracy, and has a computational cost linear in the total number of model parameters. The formula is then used to derive a general result about posterior estimation across multiple models. The utility of the results is illustrated by application to clinical software that estimates the risk of fatal coronary disease in people with diabetes.

Key words: Bayesian inference; asymptotic theory; computer models.
Research Report No.274, Department of Statistical Science, University College London. Date: January 2007
1 Introduction
It frequently occurs that a computer model estimates a single output from multiple, individually estimated statistical models. For example, a computer model for risk of complications of diabetes (Eastman et al. 1997) combines a survival model for heart disease estimated from the Framingham Heart Study with a Markov model for diseases of the eye estimated from the Wisconsin Study of Diabetic Retinopathy, and several more models. This approach is typical of many models developed since, in diabetes and in other fields (Brown 2000). A typical output to be estimated might be life expectancy or quality-adjusted life expectancy. Calculations are often computationally demanding, and a common practice at present is to restrict attention to the maximum likelihood estimate (MLE). When uncertainty across multiple models is addressed, it is usually by Monte Carlo methods, which increase the computational burden substantially; so much so that many published software models take many hours to estimate uncertainty (e.g. Clarke et al. 2004), or decline to address uncertainty at all (e.g. Eddy and Schlessinger 2003).
In this paper we provide a framework for calculations across multiple statistical models, in which the computational burden at run-time increases only linearly with the total number of model parameters. Our approach uses an asymptotic approximation formula for posterior moments obtained by Sweeting (1996) which we will refer to here as the summation formula. This formula appears to have no equivalent outside the Bayesian paradigm and for that reason the remainder of this paper is restricted to Bayesian estimation via posterior expectations. That the summation formula can be useful in practice has been shown previously (Stevens 2003). Although other asymptotic methods exist for the single-model situation (see, for example, Hoeffding 1982, Tierney and Kadane 1986), we show here that the summation formula has properties relevant to the situation in which a computer model uses two or more statistical models to obtain output. The summation formula also has properties which allow most of the computational burden to be absorbed at development time, with rapid calculations at run-time.
The summation formula is given, without proof, in Section 2. In Section 3 we prove our main result by considering the case of a single model whose likelihood, for a parameter vector θ, factorizes into p likelihood functions for subvectors θ_1, …, θ_p. This, and the more general corollary derived in Section 4, can be applied to the case of a single computer model whose output depends on p independently estimated
statistical models. In Section 5 we give as an example computer software that combines three statistical models to estimate risk of fatal coronary heart disease. Section 6 gives a general discussion.
2 Formulae for posterior expectations based on signed-root transformations
In this section we review the summation formula in Sweeting (1996), which provides asymptotic approximations to posterior expectations.
Suppose that a model is represented by a likelihood function L(θ) ≡ L_n(θ; x) on a k-dimensional parameter vector θ = (θ_1, …, θ_k), based on a data matrix x with n rows, and a prior density π(θ). Let l(θ) = log L(θ) be the log-likelihood and θ̂ = (θ̂_1, …, θ̂_k) be the MLE of θ. Also let J be the observed information matrix; that is, J is the matrix of second-order partial derivatives of −l(θ) evaluated at θ = θ̂. For each i = 1, …, k, let L_i(θ_i) denote the profile likelihood of (θ̂_1, …, θ̂_{i−1}, θ_i). That is, L_i(θ_i) is the maximum value of L achievable, for a given θ_i, by fixing θ_1, …, θ_{i−1} at their MLEs and maximising over all possible values of θ_{i+1}, …, θ_k. Define θ̂_{i+1}(θ_i), θ̂_{i+2}(θ_i), …, θ̂_k(θ_i) to be the values of θ_{i+1}, θ_{i+2}, …, θ_k at which this conditional maximum is achieved. Finally, define the profile log-likelihoods l_i(θ_i) = log L_i(θ_i). We can now define a transformation R_i of each θ_i, i = 1, …, k, by

\[
R_i(\theta_i) = \operatorname{sign}(\theta_i - \hat\theta_i)\,\bigl[2\{l(\hat\theta) - l_i(\theta_i)\}\bigr]^{1/2}.
\]

Each R_i is a signed-root transformation of θ_i; in full, it is a signed-root profile log-likelihood ratio. Note that \(\sum_i \{R_i(\theta_i)\}^2 = 2\{l(\hat\theta) - l(\theta)\}\), the usual log-likelihood ratio statistic.
For each i, define θ_i^+ and θ_i^− to be the solutions to the equations R_i(θ_i) = 1 and R_i(θ_i) = −1 respectively. Sweeting (1996) uses these values as the basis for an approximation to posterior expectations. For i = 1, …, k − 1, define j_{i+1}(θ_i) to be the matrix

\[
j_{i+1}(\theta_i) = \begin{pmatrix}
-\partial^2 l/\partial\theta_{i+1}^2 & \cdots & -\partial^2 l/\partial\theta_{i+1}\partial\theta_k \\
\vdots & \ddots & \vdots \\
-\partial^2 l/\partial\theta_k\partial\theta_{i+1} & \cdots & -\partial^2 l/\partial\theta_k^2
\end{pmatrix}
\]

evaluated at (θ̂_1, …, θ̂_{i−1}, θ_i, θ̂_{i+1}(θ_i), …, θ̂_k(θ_i)), with j_{k+1}(θ_k) defined to be 1 and π_i(θ_i) denoting the prior density π evaluated at the point (θ̂_1, …, θ̂_{i−1}, θ_i, θ̂_{i+1}(θ_i), …, θ̂_k(θ_i)).

Now define

\[
\tau_i^{\pm} = \mp\,\pi_i(\theta_i^{\pm})\,\bigl|j_{i+1}(\theta_i^{\pm})\bigr|^{1/2}
\left\{\frac{\partial}{\partial\theta_i}\,
l\bigl(\hat\theta_1,\ldots,\hat\theta_{i-1},\theta_i^{\pm},\hat\theta_{i+1}(\theta_i^{\pm}),\ldots,\hat\theta_k(\theta_i^{\pm})\bigr)\right\}^{-1} \tag{1}
\]

(the sign ∓ makes both quantities positive, since along the profile path the log-likelihood is decreasing at θ_i^+ and increasing at θ_i^−), and τ_i = τ_i^− + τ_i^+. Finally, let α_i^+ = τ_i^+/τ_i and α_i^− = τ_i^−/τ_i. Then Sweeting's (1996) approximation to the posterior expectation E{g(θ)} of a 'smooth' function g(θ) is

\[
E\{g(\theta)\} = g(\hat\theta) + \sum_{i=1}^{k}\bigl\{\alpha_i^+ g_i^+ + \alpha_i^- g_i^- - g(\hat\theta)\bigr\} \tag{2}
\]

in which g_i^+ = g(θ̂_1, …, θ̂_{i−1}, θ_i^+, θ̂_{i+1}(θ_i^+), …, θ̂_k(θ_i^+)) and g_i^− = g(θ̂_1, …, θ̂_{i−1}, θ_i^−, θ̂_{i+1}(θ_i^−), …, θ̂_k(θ_i^−)). The error term in (2) turns out to be O(n^{−2}) (Sweeting, 1996). This may be compared with the error of the plug-in approximation g(θ̂), which is O(n^{−1}). The above result holds whenever the prior density π(θ) is O(1) with respect to n.
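For a concrete, checkable instance of formula (2), here is a sketch in the one-parameter case (k = 1, so j_{k+1} = 1), under our own assumptions: an exponential likelihood with a flat prior, for which the posterior is exactly Gamma(n + 1, S), giving a direct check on the accuracy claim. The sign convention for τ^± follows our reading of (1), with both τ taken positive.

```python
import math

# Summation formula (2) for a one-parameter exponential model with a flat
# prior (our own illustration).  With k = 1, j_{k+1} = 1 and pi = const,
# so tau^+ = -1/l'(theta^+) and tau^- = 1/l'(theta^-), both positive.
n, S = 10, 10.0
theta_hat = n / S

def loglik(t): return n * math.log(t) - t * S
def dloglik(t): return n / t - S

def signed_root(t):
    return math.copysign(
        math.sqrt(2.0 * (loglik(theta_hat) - loglik(t))), t - theta_hat)

def invert(target, lo, hi):
    for _ in range(200):  # bisection; signed_root is increasing in theta
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if signed_root(mid) < target else (lo, mid)
    return 0.5 * (lo + hi)

theta_plus = invert(1.0, theta_hat, 10.0)
theta_minus = invert(-1.0, 1e-6, theta_hat)

tau_plus = -1.0 / dloglik(theta_plus)
tau_minus = 1.0 / dloglik(theta_minus)
alpha_plus = tau_plus / (tau_plus + tau_minus)
alpha_minus = tau_minus / (tau_plus + tau_minus)

def g(t): return t  # posterior mean of theta itself

# Formula (2) with k = 1.
approx = g(theta_hat) + (alpha_plus * g(theta_plus)
                         + alpha_minus * g(theta_minus) - g(theta_hat))

exact = (n + 1) / S  # Gamma(n + 1, S) posterior mean = 1.1
```

Here the plug-in estimate g(θ̂) = 1 has error 0.1, while the summation formula gives about 1.0994, an error of roughly 5 × 10⁻⁴ at n = 10, consistent with the stated orders of accuracy.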
We emphasise that, despite the complexity of the derivations, the implementation of these methods is relatively straightforward. The presentation here is restricted to signed-root log-likelihood ratios. Sweeting (1996) embeds these in the more general framework of signed-root log-density ratios. Sweeting and Kharroubi (2003) obtain alternative versions of formula (2) in which the abscissae θ_i^+, θ_i^− are placed symmetrically either side of θ̂_i at a distance k_i^{−1/2}, where k_i is the reciprocal of the first entry in {j_i(θ̂_{i−1})}^{−1}. It is shown that these formulae are also correct to O(n^{−2}). A disadvantage of these versions is that they are not invariant to reparameterisation, so some care needs to be taken when choosing the parameterisation. On the other hand, they may be easier to implement as no inversion of the signed-root log-likelihood ratio is required for the computation of θ_i^+ and θ_i^−.
It is possible to obtain an approximation for the posterior variance of g(θ) by applying formula (2) to both g(θ) and {g(θ)}². However, in terms of asymptotic error of approximation, no gain is made over the simpler approximation that uses the observed information matrix along with the delta method,

\[
\operatorname{Var}\{g(\theta)\} = \{g'(\hat\theta)\}^{T} J^{-1} g'(\hat\theta), \tag{3}
\]

since this is already an O(n^{−2}) approximation. Here g′(θ) is the column vector of first-order partial derivatives of g(θ). If further a normal approximation to the posterior distribution of g(θ) is appropriate, then clearly interval estimates for g(θ) can be obtained from (2) and (3). In general such (two-sided) intervals will only be accurate to O(n^{−1}). Higher-order accuracy may be achieved by using the distributional approximation formulae in Sweeting (1996), for example.
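As a sanity check on (3), the same toy exponential model as in our earlier sketches (our own assumptions: flat prior, g(θ) = θ so that g′ = 1) gives:

```python
# Delta-method variance (3) for a one-parameter exponential model with a
# flat prior (our own illustration): g(theta) = theta, so g' = 1 and the
# approximation is just the inverse observed information.
n, S = 10, 10.0
theta_hat = n / S
J = n / theta_hat**2        # observed information: -l''(theta) = n/theta^2 at the MLE

var_approx = 1.0 * (1.0 / J) * 1.0   # {g'}^T J^{-1} g' with g' = 1
var_exact = (n + 1) / S**2           # Gamma(n + 1, S) posterior variance
```

The approximation gives 0.1 against the exact 0.11; as the text notes, the absolute error is higher-order small while the variance itself is O(n⁻¹).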
3 Application to multi-component models
In this section we consider the case of a single computer model that incorporates p statistical sub-models, possibly derived from independent data sets, to calculate an output. The parameter vector of the combined model is θ = (θ_1, …, θ_p), where θ_m = (θ_{m1}, …, θ_{mk_m}) is the vector of parameters associated with the mth sub-model, m = 1, …, p. The dimension of the overall parameter space is therefore k = k_1 + ··· + k_p. Let the data matrix associated with sub-model m have n_m rows. We assume that the likelihood factorizes as

\[
L(\theta) = L_1(\theta_1) L_2(\theta_2) \cdots L_p(\theta_p), \tag{4}
\]

where now L_m(θ_m) denotes the likelihood function associated with the mth sub-model. Independent data matrices for the sub-models will be sufficient (but not necessary) for this factorisation. The profile likelihood of (θ̂_{m1}, …, θ̂_{m,i−1}, θ_{mi}) for the mth sub-model is denoted by L_{mi}(θ_{mi}). We further suppose that the component parameter vectors θ_1, …, θ_p are a priori independent so that the prior density also factorizes as
\[
\pi(\theta) = \pi_1(\theta_1)\,\pi_2(\theta_2) \cdots \pi_p(\theta_p). \tag{5}
\]

It will be convenient to re-express formula (2) in a new notation. For each sub-model m = 1, …, p define a set of abscissae θ_m[j], j = 0, …, 2k_m, as follows:

\[
\begin{aligned}
\theta_m[0] &= \hat\theta_m \\
\theta_m[2i-1] &= (\hat\theta_{m1},\ldots,\hat\theta_{m,i-1},\theta_{mi}^{+},\hat\theta_{m,i+1}(\theta_{mi}^{+}),\ldots,\hat\theta_{m,k_m}(\theta_{mi}^{+})) \\
\theta_m[2i] &= (\hat\theta_{m1},\ldots,\hat\theta_{m,i-1},\theta_{mi}^{-},\hat\theta_{m,i+1}(\theta_{mi}^{-}),\ldots,\hat\theta_{m,k_m}(\theta_{mi}^{-}))
\end{aligned}
\]

for i = 1, …, k_m, and define the corresponding set of weights α_m[j] by

\[
\alpha_m[0] = 1 - k_m, \qquad \alpha_m[2i-1] = \alpha_{mi}^{+}, \qquad \alpha_m[2i] = \alpha_{mi}^{-}.
\]
Now suppose that some quantity g = g(θ) is the output of the computer model given the parameter vector θ, possibly dependent on additional data supplied by the computer user. We derive formulae for E{g(θ)} as follows. Let E_m{g(θ)} denote the posterior expectation of g(θ) obtained from sub-model m alone, with the parameter vectors θ_s, s ≠ m, held fixed. Then, from equation (2),

\[
E_m\{g(\theta)\} = \sum_{j=0}^{2k_m} \alpha_m[j]\, g(\theta_1,\ldots,\theta_{m-1},\theta_m[j],\theta_{m+1},\ldots,\theta_p) \tag{6}
\]

to O(n_m^{−2}).

Given the likelihood and prior factorisations (4) and (5), formula (6) gives the conditional posterior expectation of g(θ) given θ_s, s ≠ m. Application of the iterated expectation formula therefore gives the posterior expectation of g to be

\[
E\{g(\theta)\} = \sum_{j_1=0}^{2k_1} \cdots \sum_{j_p=0}^{2k_p} \alpha_1[j_1] \cdots \alpha_p[j_p]\, g(\theta_1[j_1],\ldots,\theta_p[j_p]). \tag{7}
\]

Notice that this formula requires (2k_1 + 1) × (2k_2 + 1) × ··· × (2k_p + 1) evaluations of the function g.
The results derived in the Appendix provide an alternative to (7). Consider instead the overall k-parameter model. Given the factorisation of the component likelihood functions, the fitted parameter vector is θ̂ = (θ̂_1, …, θ̂_p). From the results in the Appendix, the abscissae for the signed-root approximations are θ[0] = θ̂ and, for j = 1, …, 2k_m, m = 1, …, p,

\[
\theta[2(k_1 + \cdots + k_{m-1}) + j] = (\hat\theta_1,\ldots,\hat\theta_{m-1},\theta_m[j],\hat\theta_{m+1},\ldots,\hat\theta_p),
\]

where θ_m[j] is obtained from the mth likelihood alone. The corresponding weights, from the Appendix, are α[0] = 1 − k and

\[
\alpha[2(k_1 + \cdots + k_{m-1}) + j] = \alpha_m[j],
\]
where again α_m[j] is obtained from the mth likelihood alone. Invoking formula (2), we obtain the alternative approximation

\[
E\{g(\theta)\} = \sum_{r=0}^{2k} \alpha[r]\, g(\theta[r]), \tag{8}
\]

which requires only 2k + 1 evaluations of g compared with the ∏_m (2k_m + 1) evaluations for formula (7). Furthermore, although both formulae (7) and (8) are correct to O(n_min^{−2}), where n_min = min(n_m, m = 1, …, p), formula (8) is likely to be more accurate in practice, since the errors will be compounded in (7). We note that the quantities θ_m[j] and α_m[j] depend only on the likelihood and prior from the mth sub-model and, for each sub-model m, the components θ_s, s ≠ m, are fixed at their MLE values in the evaluation of g(θ[r]).
The developers of a computer model, or its component statistical models, would determine the values of θ[r] and α[r], r = 0, …, 2k, during the development cycle. In software to calculate estimates of some function g(θ, X), with values of X provided by the user, the chief computational burden at run-time is the 2k + 1 evaluations of g(θ[r], X).
4 A general posterior expectation decomposition
In this section we note that formula (8) can be decomposed into p approximate posterior expectations from each of the sub-models. This leads to an interesting general approximation formula for the posterior expectation of g in terms of the component posterior expectations.

Given a function g(θ) in the multi-component situation of Section 3, define the p functions

\[
g_m(\theta_m) = g(\hat\theta_1,\ldots,\hat\theta_{m-1},\theta_m,\hat\theta_{m+1},\ldots,\hat\theta_p).
\]

Then, from formula (8),

\[
\begin{aligned}
E\{g(\theta)\} &= (1-k)\,g(\hat\theta) + \sum_{r=1}^{2k} \alpha[r]\, g(\theta[r]) \\
&= (1-k)\,g(\hat\theta) + \sum_{m=1}^{p} \sum_{j=1}^{2k_m} \alpha_m[j]\, g_m(\theta_m[j]) \\
&= (1-k)\,g(\hat\theta) + \sum_{m=1}^{p} \bigl[E\{g_m(\theta_m)\} - (1-k_m)\,g(\hat\theta)\bigr] \\
&= g(\hat\theta) + \sum_{m=1}^{p} \bigl[E\{g_m(\theta_m)\} - g(\hat\theta)\bigr],
\end{aligned} \tag{9}
\]

which expresses the overall posterior expectation of g(θ) as the first-order approximation g(θ̂) plus a sum of correction terms calculated from each sub-model. It is not hard to see that (9) holds to O(n_min^{−1}). What is new here is that it holds to the higher asymptotic order O(n_min^{−2}).
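The decomposition can be checked directly when the component expectations are available in closed form. A sketch under our own assumptions (two independent exponential sub-models, flat priors, g = θ_1 θ_2, each posterior exactly Gamma(n + 1, S)):

```python
# Decomposition (9) with exactly known component expectations (our own
# illustration): p = 2 exponential sub-models, flat priors, n = 10, S = 10,
# and g(theta_1, theta_2) = theta_1 * theta_2.
n, S = 10, 10.0
theta_hat = n / S               # MLE of each component: 1.0
g_hat = theta_hat * theta_hat   # plug-in estimate g(theta_hat)

post_mean = (n + 1) / S         # each Gamma(n + 1, S) posterior mean: 1.1

# g_m fixes the other component at its MLE, so E{g_m} = 1.1 * 1.0.
E_g1 = post_mean * theta_hat
E_g2 = theta_hat * post_mean

# Formula (9): plug-in estimate plus one correction term per sub-model.
e9 = g_hat + (E_g1 - g_hat) + (E_g2 - g_hat)

exact = post_mean ** 2          # exact E{g} = 1.21 by independence
```

Here (9) gives 1.2 against the exact 1.21, an error of 0.01 at n = 10, whereas the plug-in estimate 1.0 has error 0.21.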
This relationship between posterior expectations and maximum likelihood estimates may find application in calculating expectations across multiple models, even when the asymptotic methods of Sections 2 and 3 are not employed. For example, if the sub-models are low dimensional, then exact, or numerically computed, component expectations could be used in (9). In the case of numerical integration, formula (9) gives a clear computational benefit over p-dimensional numerical integration. We further note that one or more of the MLEs of θ_m in (9) could be replaced by their posterior means if these were more readily available (e.g. analytically). This can be shown using the fact that, in general, the posterior mean and the MLE differ by O(n^{−1}).
Sweeting and Kharroubi (2003) derive an alternative version of the summation formula (2) that is of the form

\[
E\{g(\theta)\} = \sum_{i=1}^{k} w_i\,(\alpha_i^+ g_i^+ + \alpha_i^- g_i^-),
\]

where the w_i are weights satisfying Σ_i w_i = 1 and the g_i^± values are evaluations of g at points defined in an alternative way to the θ_i^± in Section 2. It turns out that this approximation is also O(n^{−2}). A discussion of the advantages of this version of the formula is given in that paper. We simply note here that this alternative version could be used to approximate the component posterior expectations of g_m(θ_m) and then (9) used to approximate the overall posterior expectation of g(θ).
The posterior uncertainty in g(θ) may be measured by its posterior variance, which again can be obtained from the sub-model variance formulae. Specifically, from (3) we have, to O(n^{−2}),

\[
\operatorname{Var}\{g(\theta)\}
= \{g'(\hat\theta)\}^{T} J^{-1} g'(\hat\theta)
= \sum_{m=1}^{p} \{g_m'(\hat\theta_m)\}^{T} J_m^{-1}\, g_m'(\hat\theta_m)
= \sum_{m=1}^{p} \operatorname{Var}\{g_m(\theta_m)\},
\]

where g_m′(θ_m) is the column vector of first-order partial derivatives of g_m(θ_m) and J_m is the observed information associated with sub-model m. The above additive formula follows since J is block diagonal. Again, any exact or approximate values for Var{g_m(θ_m)} could be used in this formula.
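A sketch of the additive variance formula in the same two-exponential toy setting (our own assumptions; with g = θ_1 θ_2, the gradient g_m′ evaluated at the MLE is just the other component's MLE):

```python
# Sub-model variance decomposition (our own illustration): p = 2
# exponential sub-models, flat priors, n = 10, S = 10, g = theta_1 * theta_2.
n, S = 10, 10.0
theta_hat = n / S
J_m_inv = theta_hat ** 2 / n    # inverse observed information per sub-model

# g_m'(theta_hat_m) equals the other component's MLE (here 1.0), so the
# block-diagonal (delta-method) variance is a sum of two terms.
var_approx = theta_hat ** 2 * J_m_inv + theta_hat ** 2 * J_m_inv

# Exact posterior variance under independent Gamma(n + 1, S) posteriors.
m1 = (n + 1) / S                     # E(theta_m)
m2 = (n + 1) * (n + 2) / S ** 2      # E(theta_m^2)
var_exact = m2 * m2 - (m1 * m1) ** 2
```

The approximation gives 0.2 against the exact 0.2783; at a sample size of only 10 per sub-model the agreement is rough, but both quantities shrink at the O(n⁻¹) rate.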
5 Example
We illustrate the use of (8) and (9) with an application in which three statistical models are used to estimate risk of fatal coronary heart disease.
The UK Prospective Diabetes Study (UKPDS) Group have published an equation for coronary case fatality in type 2 diabetes (Stevens et al. 2004). Clinically, case fatality is the proportion of cases of a disease that are fatal; for the purposes of the model, case fatality is defined as the probability that coronary disease is fatal, conditional on coronary disease occurring. Given an individual with observed value x = (x_1, …, x_5)^T of a vector of covariates X = (X_1, …, X_5)^T, with the convention that X_1 = 1, the UKPDS model for case fatality p(x), given a parameter vector μ = (μ_1, …, μ_5), is

\[
p(x) = \bigl[1 + \exp\{-x^{T}\mu\}\bigr]^{-1}. \tag{10}
\]

The MLE of μ is μ̂ = (0.713, 0.048, 0.178, 0.141, 0.104) (with corresponding standard errors 0.194, 0.0236, 0.0119, 0.0612 and 0.0417), derived by fitting a logistic regression model to data from 597 people with coronary disease, as described in detail previously (Stevens et al. 2004).
The UKPDS Group have also published an equation for risk of coronary disease in type 2 diabetes (UKPDS Group 2001). Given an individual with observed value y = (y_1, …, y_8)^T of a vector of covariates Y = (Y_1, …, Y_8)^T, with the convention that Y_1 = 1, the modelled risk of coronary disease over t years depends on a parameter vector ν = (ν_1, …, ν_9) as follows:

\[
R(y, t) = 1 - \exp\{-q(1 - \nu_9^{\,t})/(1 - \nu_9)\}, \tag{11}
\]

where q = exp{(ν_1, …, ν_8) y}.

The MLE of ν is ν̂ = (0.0112, 1.059, 0.525, 0.390, 1.350, 1.183, 1.088, 3.845, 1.078) (with corresponding standard errors 0.00154, 0.0155, 0.00666, 0.0511, 0.103, 0.124, 0.0365, 0.0250 and 0.638), derived by fitting the model to 4,540 people with a median 10.7 years of follow-up, as described previously (UKPDS Group 2001). There were 597 incident cases of coronary disease in this cohort, and these are the same 597 that were used to fit the case fatality equation (10).
Variable X_3 in the case fatality model, and variable Y_5 in the coronary risk model, take the same value. Glycated haemoglobin (abbreviated HbA1c) is a widely-used indicator of prevailing blood glucose levels over recent months. Measurements of HbA1c are subject to variation between laboratories. The models define X_3 and Y_5 to be HbA1c as measured on an assay aligned to the reference laboratories of the National Glycosylation Standardisation Program (NGSP). The NGSP publishes linear regression models relating X_3, HbA1c measured in an NGSP reference laboratory, to h, the HbA1c measured in another laboratory. Let λ = (λ_1, λ_2, λ_3). For a particular local laboratory the MLE of λ was λ̂ = (0.229, 0.982, 0.124) (with corresponding standard errors 0.079, 0.0096 and 0.014), so that an HbA1c of 4.5% in the local laboratory corresponds to a MLE of 4.65% at the reference laboratory.
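The conversion step can be written out explicitly, assuming (our reading, consistent with the 4.5% → 4.65% example in the text) that the calibration has the intercept-and-slope form X_3 = λ_1 + λ_2 h, with λ_3 a residual-scale parameter:

```python
# Local-to-reference HbA1c calibration, assuming the regression form
# X3 = lambda_1 + lambda_2 * h (an assumption on our part; the text gives
# only the fitted values and one worked example).
lam_hat = (0.229, 0.982, 0.124)  # MLE lambda_hat quoted in the text
h_local = 4.5                    # HbA1c (%) in the local laboratory

x3 = lam_hat[0] + lam_hat[1] * h_local  # reference-laboratory HbA1c (%)
```

This reproduces the quoted figure: 0.229 + 0.982 × 4.5 = 4.648, i.e. 4.65 to three significant figures.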
Although the risk model and case fatality model were estimated from data from the same study, they still satisfy the factorisation criterion (4). Let F denote the set of individuals with fatal coronary disease, N denote the set of individuals with nonfatal coronary disease, and C denote the set of individuals censored for coronary disease within the UKPDS. Further, write

\[
\begin{aligned}
r_i(\nu) &= P(\text{disease in individual } i), \\
p_i(\mu) &= P(\text{fatal disease in individual } i \mid \text{disease in individual } i).
\end{aligned}
\]

Then the likelihood for the case fatality model is

\[
L_1(\mu) = \prod_{i \in F} p_i(\mu) \prod_{i \in N} \{1 - p_i(\mu)\}
\]

and the likelihood for the risk model is

\[
L_2(\nu) = \prod_{i \in C} \{1 - r_i(\nu)\} \prod_{i \in F \cup N} r_i(\nu).
\]

The combined likelihood function that would be required to estimate μ and ν jointly is

\[
L(\mu, \nu) = \prod_{i \in C} \{1 - r_i(\nu)\} \prod_{i \in F} p_i(\mu)\, r_i(\nu) \prod_{i \in N} \{1 - p_i(\mu)\}\, r_i(\nu),
\]

which equals L_1(μ) L_2(ν) as required in (4). The extension of this to include the independent data on which λ is estimated is trivial.
Consider a hypothetical patient with HbA1c of 4.5% in the local laboratory, X_i = 0 for all i > 1, i ≠ 3, and Y_i = 0 for all i > 1, i ≠ 5. For simplicity, we consider here the case t = 1. Using λ̂, μ̂ and ν̂, the MLE of one-year coronary risk is 0.0078707 and the MLE of one-year case fatality is 0.23752, giving an estimated risk of fatal coronary disease of 0.0078707 × 0.23752 = 0.001869. We wish to calculate the posterior expectation of the risk of fatal coronary disease over λ, μ, ν under the independent prior specifications π_λ(λ) ∝ 1/λ_2 and π_μ(μ) = π_ν(ν) ∝ 1. Taking p = 3, with θ_1 = λ, θ_2 = μ and θ_3 = ν, we can apply the approximations of Section 3. To three significant figures, the approximate posterior expectation by the full method (7) is 0.001903, requiring 1,463 evaluations. The approximation (8) to the posterior expectation is also 0.001903 and requires only 35 evaluations. For comparison, the exact answer by simulation is 0.001907.
To six significant figures the MLE of fatal coronary risk is 0.00186948. Fixing λ = λ̂ and μ = μ̂, the expectation over ν by (8) is 0.00189733; fixing λ = λ̂ and ν = ν̂, the expectation over μ is 0.00187545; and fixing μ = μ̂ and ν = ν̂, the expectation over λ by numerical integration is 0.00186960. Hence, using the decomposition formula (9), the expectation over λ, μ, ν is approximately −2 × 0.00186948 + 0.00189733 + 0.00187545 + 0.00186960 = 0.001903.
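The arithmetic of this application of (9) can be replayed directly (the labels below are ours: by elimination, the remaining term 0.00187545 in the sum is read as the expectation over μ):

```python
# Re-computation (ours) of the worked decomposition (9) in the text.
g_hat = 0.00186948                  # MLE of fatal coronary risk, 6 s.f.
mle_product = 0.0078707 * 0.23752   # risk MLE x case-fatality MLE

E_nu = 0.00189733    # expectation over nu (coronary risk parameters)
E_mu = 0.00187545    # expectation over mu (case fatality parameters)
E_lam = 0.00186960   # expectation over lambda (HbA1c calibration)

# Formula (9): g(theta_hat) plus one correction term per sub-model,
# i.e. -2 * g_hat + E_nu + E_mu + E_lam.
e9 = g_hat + (E_nu - g_hat) + (E_mu - g_hat) + (E_lam - g_hat)
```

This returns 0.0019034, i.e. 0.001903 to three significant figures, matching the values obtained from (7) and (8).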
Table 1 compares the MLE and approximations (8) and (9) to the posterior expectation, calculated by simulation, for 25 individuals selected at random from the UKPDS. The higher-order approximations to the expectation show a smaller mean absolute error than the MLE. Compared to the variation between patients, however, the error in the MLE is small, due to the large sample size on which the risk and case fatality models were estimated. To illustrate the performance of the approximations in models derived from a smaller sample, we refit the risk and case fatality models using a 15% random subsample of the UKPDS. Table 2 shows simulation results and asymptotic approximations for this analysis.
6 Discussion
We have presented formulae for fast estimates of posterior quantities with high asymptotic accuracy. In our example the good asymptotic properties of the for-mulae resulted in substantially reduced error relative to the posterior standard deviation. Although the formulae are for Bayesian estimation, Stevens (2003) showed that they can also be useful in other contexts.
The example in Section 5 is one of many situations in which one model is used to estimate the input to another. Another is a class of methods for correcting regression dilution that consist of estimating a scaling factor from a repeated measures model, which is then used to modify the effect parameter, or equivalently the input variable, preparatory to calculating a prediction from a regression model (Frost and Thompson 2000). The methods of Sections 3 and 4 could be used to calculate expectations across both the repeated measures model and the regression model.
Our examples have emphasised the use of the results of Section 3 in combining multi-component models. They may also have a use in reducing the computational burden in fitting a single model with very many parameters. Calculation of maximum likelihood estimates and profile likelihood functions becomes increasingly difficult with the square of the dimension of the parameter vector. If a problem with many parameters can be factorized in the manner of Section 3 then the weights and abscissae for these approximations can be calculated on p problems of lower dimension.
The factorisation criteria (4) and (5) will be met whenever the p models are estimated on p independent data sets, but the example shows that factorisation can also arise from conditionality as well as independence. It is possible to relax the factorisation (5) of the prior density (although not the factorisation (4) of the likelihood). Let π_1(θ_1), …, π_p(θ_p) be the marginal prior densities, or some other densities chosen for convenience, of θ_1, …, θ_p, and define φ(θ) = π(θ)/{π_1(θ_1) ··· π_p(θ_p)}. Then φ(θ) may be absorbed into the function g(θ) before computing the expectation (8). Since φ(θ) is O(1), the resulting approximation will continue to be of the same order of accuracy; details are not given here.
This paper was motivated by the "UKPDS Risk Engine" software project, which provides clinicians with estimates of coronary and stroke risk in people with diabetes. The methods of this paper made a substantial contribution to the successful drive to make the Risk Engine run acceptably fast on a Palm computer. They should also find application in the many other models, in diabetes and elsewhere, that combine multiple sub-models.
Appendix: Derivation of weights and abscissae
We obtain the abscissae and weights used in formula (8) for the overall multi-component model when the likelihood function and prior density factorise as in (4) and (5).
Defining l_m(θ_m) = log L_m(θ_m) to be the log-likelihood for the mth model, we observe that

\[
l(\theta) = l_1(\theta_1) + \cdots + l_p(\theta_p),
\]

so that the maximiser of l is θ̂ = (θ̂_1, …, θ̂_p), where θ̂_m is the maximiser of l_m(θ_m). For a given m and 1 ≤ i ≤ k_m, as in Section 2 we let θ̂_{mj}(θ_{mi}), j = i + 1, …, k_m, and θ̂_{rj}(θ_{mi}), j = 1, …, k_r, r = m + 1, …, p, denote the values of θ_{m,i+1}, …, θ_{pk_p} that maximise l(θ) conditional on (θ̂_1, …, θ̂_{m−1}, θ̂_{m1}, …, θ̂_{m,i−1}, θ_{mi}). But it follows from the factorisation (4) of L that θ̂_{mj}(θ_{mi}), j > i, does not depend on (θ̂_1, …, θ̂_{m−1}), and that θ̂_{rj}(θ_{mi}) = θ̂_{rj} for r > m. Therefore the profile log-likelihood associated with the (k_1 + ··· + k_{m−1} + i)th component θ_{mi} of θ, i = 1, …, k_m, m = 1, …, p, can be written as

\[
l_1(\hat\theta_1) + \cdots + l_{m-1}(\hat\theta_{m-1}) + l_{mi}(\theta_{mi}) + l_{m+1}(\hat\theta_{m+1}) + \cdots + l_p(\hat\theta_p),
\]

in which l_{mi}(θ_{mi}) is the logarithm of the profile likelihood L_{mi}(θ_{mi}) defined in Section 3. It follows that the (m, i)th signed-root transformation is simply calculated from the mth log-likelihood function as

\[
R_{mi}(\theta_{mi}) = \operatorname{sign}(\theta_{mi} - \hat\theta_{mi})\,\bigl[2\{l_m(\hat\theta_m) - l_{mi}(\theta_{mi})\}\bigr]^{1/2},
\]

and hence that θ_{mi}^+ and θ_{mi}^− are obtained from the mth likelihood alone.
Next consider the weights in the overall model. In view of the factorisation of the likelihood, it is readily seen that the log-likelihood derivative in the definition of τ_{mi}^+ associated with the (m, i)th component of θ is simply

\[
\frac{\partial}{\partial\theta_{mi}}\,
l_m\bigl(\hat\theta_{m1},\ldots,\hat\theta_{m,i-1},\theta_{mi}^{+},\hat\theta_{m,i+1}(\theta_{mi}^{+}),\ldots,\hat\theta_{mk_m}(\theta_{mi}^{+})\bigr).
\]

Similarly, the matrix j_{q+1} in the combined model, where q = k_1 + ··· + k_{m−1} + i is the position of θ_{mi} within θ, is block diagonal, so that its determinant is

\[
\bigl|j_{m,i+1}(\theta_{mi})\bigr|\,|J_{m+1}| \cdots |J_p|,
\]

where J_m is the observed information matrix associated with model m. It follows that the (m, i)th component of τ^+ for the combined model is

\[
\prod_{s \neq m} \pi_s(\hat\theta_s)\;\prod_{s > m} |J_s|^{1/2}\;\tau_{mi}^{+},
\]

where τ_{mi}^+ is given by equation (1) applied to model m. Since we get a similar expression for the (m, i)th component of τ^−, the common factors in these components of τ^+ and τ^− cancel when we form the (m, i)th components of α^+ and α^−. It follows that the (m, i)th components of α^+ and α^− are just α_{mi}^+ and α_{mi}^− obtained from the mth model.
Acknowledgements
This paper arises from work carried out when RJS was a Wellcome Trust research fellow at the Diabetes Trials Unit, Oxford. We are grateful to the UK Prospective Diabetes Study group, and to the laboratory of the Diabetes Trials Unit, for permission to use the example.
References
Brown J.B. 2000. Computer models of diabetes: almost ready for prime time, Diabetes Research and Clinical Practice 50(Supplement 3):S1–3.
Clarke P.M., Gray A.M., Briggs A., Farmer A.J., Fenn P. et al. 2004. A model to estimate the lifetime health outcomes of patients with Type 2 diabetes: the United Kingdom Prospective Diabetes Study (UKPDS) Outcomes Model (UKPDS no. 68). Diabetologia 47: 1747–59.
Eastman R.C., Javitt J.C., Herman W.H., Dasbach E.J., Zbrozek A.S. et al. 1997. Model of complications of NIDDM. I. Model construction and assumptions, Diabetes Care 20: 725–34.
Eddy D.M. and Schlessinger L. 2003. Archimedes: A trial-validated model of diabetes. Diabetes Care 26: 3093–3101.
Frost C. and Thompson S. 2000. Correcting for regression dilution bias: comparison of methods for a single predictor variable, Journal of the Royal Statistical Society series A 163: 173–190.
Hoeffding W. 1982. Asymptotic normality. In: Kotz S., Johnson N.L. and Read C.B. (Eds.), Encyclopedia of Statistical Sciences. Wiley.
Stevens R.J. 2003. Evaluation of methods for interval estimation of model outputs, with application to survival models. Journal of Applied Statistics 30: 967–981.
Stevens R.J., Coleman R.L., et al. 2004. Risk factors for myocardial infarction case fatality and stroke case fatality in type 2 diabetes (UKPDS 66). Diabetes Care 27: 201–207.
Sweeting T.J. 1996. Approximate Bayesian computation based on signed roots of log-density ratios (with Discussion). In: Bernardo J.M., Berger J.O., Dawid A.P., and Smith A.F.M. (Eds.), Bayesian Statistics 5. Oxford University Press.
Sweeting T. J. and Kharroubi S. A. 2003. Some new formulae for posterior expectations and Bartlett corrections. Test 12: 497–521.
Tierney L. and Kadane J.B. 1986. Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association 81: 82–86.
UKPDS Group. 2001. The UKPDS Risk Engine: a model for the risk of coronary heart disease in type 2 diabetes (UKPDS 56). Clinical Science 101: 671–679.
Table 1. Modelled one-year risk of fatal coronary disease in 25 randomly selected individuals: exact posterior expectation calculated by simulation, compared to the maximum likelihood estimate (MLE) and the expectation estimated by formulae (8) and (9) as described in the text.
Patient   Posterior expectation (MC s.e.)   Posterior s.d.   MLE        Formula (8)   Formula (9)
a         0.000913 (2.0 × 10^-7)            0.000329         0.000836   0.000912      0.000912
b         0.000272 (5.6 × 10^-8)            0.000098         0.000248   0.000271      0.000271
c         0.000483 (1.0 × 10^-7)            0.000163         0.000461   0.000483      0.000483
d         0.004536 (6.1 × 10^-7)            0.000963         0.004405   0.004534      0.004534
e         0.002254 (4.1 × 10^-7)            0.000646         0.002171   0.002253      0.002253
f         0.001034 (2.1 × 10^-7)            0.000341         0.000990   0.001033      0.001033
g         0.000064 (1.6 × 10^-8)            0.000028         0.000061   0.000064      0.000064
h         0.005467 (6.7 × 10^-7)            0.001047         0.005384   0.005468      0.005468
i         0.002598 (3.5 × 10^-7)            0.000553         0.002550   0.002598      0.002598
j         0.001944 (2.7 × 10^-7)            0.000432         0.001923   0.001944      0.001944
k         0.002100 (2.8 × 10^-7)            0.000443         0.002087   0.002101      0.002101
l         0.002884 (4.3 × 10^-7)            0.000694         0.002821   0.002884      0.002884
m         0.002335 (3.1 × 10^-7)            0.000481         0.002289   0.002335      0.002335
n         0.004851 (6.2 × 10^-7)            0.000977         0.004802   0.004852      0.004852
o         0.009124 (1.2 × 10^-6)            0.001811         0.008968   0.009125      0.009125
p         0.003509 (5.7 × 10^-7)            0.000868         0.003380   0.003507      0.003507
q         0.003081 (4.1 × 10^-7)            0.000633         0.003032   0.003081      0.003081
r         0.004077 (6.6 × 10^-7)            0.001037         0.003922   0.004074      0.004074
s         0.004236 (5.7 × 10^-7)            0.000886         0.004148   0.004236      0.004236
t         0.001626 (3.2 × 10^-7)            0.000516         0.001583   0.001626      0.001626
u         0.003069 (4.1 × 10^-7)            0.000655         0.003063   0.003070      0.003070
v         0.002285 (3.2 × 10^-7)            0.000513         0.002245   0.002286      0.002286
w         0.011519 (1.4 × 10^-6)            0.002226         0.011322   0.011520      0.011520
x         0.004730 (6.5 × 10^-7)            0.001031         0.004621   0.004729      0.004729
y         0.007197 (9.3 × 10^-7)            0.001404         0.007024   0.007196      0.007196
Mean error¹                                                  11%        0.15%         0.23%
¹ For each row, we calculated the absolute value of the difference between each estimate and the exact posterior expectation, expressed as a percentage of the posterior standard deviation; the mean of these percentages over the 25 individuals is shown.
Table 2. One-year risk of fatal coronary disease according to models derived from a subsample of the UKPDS, as described in the text. MLE and mean error defined as in Table 1.
Patient   Posterior expectation (MC s.e.)   Posterior s.d.   MLE        Formula (8)   Formula (9)
a         0.002967 (1.8 × 10^-5)            0.002094         0.002033   0.002875      0.002875
b         0.000380 (3.0 × 10^-6)            0.000297         0.000243   0.000358      0.000358
c         0.000381 (3.1 × 10^-6)            0.000286         0.000282   0.000367      0.000367
d         0.006667 (3.0 × 10^-5)            0.003207         0.005474   0.006545      0.006545
e         0.002049 (1.5 × 10^-5)            0.001261         0.001586   0.001987      0.001987
f         0.000761 (7.2 × 10^-6)            0.000537         0.000574   0.000735      0.000735
g         0.000043 (4.1 × 10^-7)            0.000044         0.000033   0.000043      0.000043
h         0.005730 (3.2 × 10^-5)            0.002462         0.005183   0.005610      0.005610
i         0.003703 (1.7 × 10^-5)            0.001864         0.003267   0.003632      0.003632
j         0.001688 (1.1 × 10^-5)            0.000842         0.001563   0.001641      0.001641
k         0.002185 (1.2 × 10^-5)            0.001008         0.002094   0.002143      0.002143
l         0.001783 (1.6 × 10^-5)            0.001019         0.001500   0.001730      0.001730
m         0.002487 (1.2 × 10^-5)            0.001178         0.002194   0.002432      0.002432
n         0.005609 (3.1 × 10^-5)            0.002614         0.005278   0.005504      0.005504
o         0.012677 (6.6 × 10^-5)            0.005601         0.011483   0.012523      0.012523
p         0.005380 (2.6 × 10^-5)            0.002828         0.004255   0.005274      0.005274
q         0.003956 (1.9 × 10^-5)            0.001912         0.003599   0.003889      0.003889
r         0.006368 (3.2 × 10^-5)            0.003499         0.004985   0.006244      0.006244
s         0.005246 (2.7 × 10^-5)            0.002557         0.004608   0.005153      0.005153
t         0.002094 (1.4 × 10^-5)            0.001366         0.001839   0.002100      0.002100
u         0.004599 (2.1 × 10^-5)            0.002120         0.004624   0.004565      0.004565
v         0.001714 (1.2 × 10^-5)            0.000864         0.001503   0.001662      0.001662
w         0.016274 (7.9 × 10^-5)            0.006945         0.014721   0.016133      0.016133
x         0.004516 (2.8 × 10^-5)            0.002237         0.003794   0.004382      0.004382
y         0.012669 (4.9 × 10^-5)            0.005190         0.011112   0.012589      0.012589
Mean error                                                   27%        3.9%          3.9%