Incorporating cost in Bayesian Variable Selection, with application to cost-effective measurement of quality of health care.


Dimitris Fouskakis,

Department of Mathematics, School of Applied Mathematical and Physical Sciences, National Technical University of Athens, Athens, Greece; e-mail: [email protected].

Joint work with:

Ioannis Ntzoufras, Department of Statistics, Athens University of Economics and Business, Athens, Greece; e-mail: [email protected]

David Draper, Department of Applied Mathematics and Statistics, University of California, Santa Cruz, USA; e-mail: [email protected]

Presentation is available at: www.math.ntua.gr/~fouskakis/Conferences/BMS/bms.pdf.


Synopsis

1. Motivation - Indirect Measurement of Quality of Health Care.

2. Model Specification.

3. Cost - Benefit Analysis.

4. Cost - Restriction - Benefit Analysis.

5. Discussion.

University of Florida 10th Annual Winter Workshop: Bayesian Model Selection and Objective Methods


1 Motivation - Indirect Measurement of Quality of Health Care

How to measure hospital quality of care?

• Indirect method: input-output approach, in which hospital outcomes (e.g., mortality within 30 days of admission) are compared after adjusting for differences in inputs (sickness at admission).

• Patient sickness at admission is traditionally assessed by using logistic regression of mortality within 30 days of admission on a fairly large number of sickness indicators (on the order of 100) to construct a sickness scale.

• Benefit - Only Analysis: Classical variable selection techniques can be employed to find an "optimal" subset of 10-20 indicators. In a major U.S. study conducted by the RAND Corporation, such an approach was used to reduce the initial list of $p = 83$ sickness indicators gathered on $n = 2{,}532$ pneumonia patients to a core of 14 predictors (Keeler et al., 1990).
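To make the idea of a sickness scale concrete, here is a minimal sketch in Python (NumPy) on simulated data. It is not the RAND model. It fits a logistic regression of 30-day mortality on a few hypothetical indicators by Newton-Raphson and uses each patient's fitted linear predictor as their sickness score. The number of indicators, the coefficients and the helper name `fit_logistic` are illustrative assumptions.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton-Raphson (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # fitted death probabilities
        W = p * (1.0 - p)                         # Bernoulli variances
        grad = X.T @ (y - p)                      # score vector
        hess = X.T @ (X * W[:, None])             # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(0)
n, k = 2000, 4                                    # hypothetical: 4 sickness indicators
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
true_beta = np.array([-2.0, 0.8, 0.5, 0.0, -0.3])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))   # simulated 30-day deaths

beta_hat = fit_logistic(X, y)
sickness_score = X @ beta_hat                     # linear predictor = sickness scale
predicted_mortality = 1.0 / (1.0 + np.exp(-sickness_score))
```

Patients can then be ranked by `sickness_score`. Comparing hospital outcomes after adjusting for this score is the input-output comparison described above.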


The 14-Variable RAND Pneumonia Scale

The RAND admission sickness scale for pneumonia ($p = 14$ variables), with the marginal data collection costs per patient for each variable (in minutes of abstraction time).

Variable | Cost (Minutes)
Blood Urea Nitrogen | 1.50
Systolic Blood Pressure Score (2-point scale) | 0.50
Total APACHE II Score (36-point scale) | 10.00
Serum Albumin (3-point scale) | 1.50
Respiratory Distress (yes, no) | 1.00
Prior Respiratory Failure (yes, no) | 2.00
Ambulatory Score (3-point scale) | 2.50
Age | 0.50
Chest X-ray Congestive Heart Failure Score (3-point scale) | 2.50
APACHE II Coma Score (3-point scale) | 2.50
Shortness of Breath Day 1 (yes, no) | 1.00
Septic Complications (yes, no) | 3.00
Recently Hospitalized (yes, no) | 2.00
Initial Temperature | 0.50
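As a quick arithmetic check on the RAND table, the sketch below (Python) sums the per-patient abstraction times of the 14 variables. The dictionary keys are just shortened labels. The total APACHE II score alone accounts for 10 of the 31 minutes, and a benefit-only selection ignores cost differences of exactly this kind.

```python
# Marginal data-collection cost per patient (minutes) for the 14 RAND variables.
rand_costs = {
    "Blood Urea Nitrogen": 1.50,
    "Systolic Blood Pressure Score": 0.50,
    "Total APACHE II Score": 10.00,
    "Serum Albumin": 1.50,
    "Respiratory Distress": 1.00,
    "Prior Respiratory Failure": 2.00,
    "Ambulatory Score": 2.50,
    "Age": 0.50,
    "Chest X-ray Congestive Heart Failure Score": 2.50,
    "APACHE II Coma Score": 2.50,
    "Shortness of Breath Day 1": 1.00,
    "Septic Complications": 3.00,
    "Recently Hospitalized": 2.00,
    "Initial Temperature": 0.50,
}

total_minutes = sum(rand_costs.values())
apache_share = rand_costs["Total APACHE II Score"] / total_minutes
print(f"{len(rand_costs)} variables, {total_minutes:.1f} minutes per patient")
```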


2 Model Specification

• Logistic regression model with $Y_i = 1$ if patient $i$ dies within 30 days of admission.

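• $X_{ij}$: the $j$th sickness predictor variable for the $i$th patient.

• Each model $m$ is identified by $\gamma = (\gamma_1, \ldots, \gamma_p)^T$.

• $\gamma_j$: binary indicator of the inclusion of variable $X_j$ in the model.

• Model space $\mathcal{M} = \{0,1\}^p$, where $p$ is the total number of variables considered. Hence the model formulation can be summarized as

$$(Y_i \mid \gamma) \overset{\text{indep}}{\sim} \text{Bernoulli}\big(p_i(\gamma)\big), \qquad \eta_i(\gamma) = \log\frac{p_i(\gamma)}{1 - p_i(\gamma)} = \sum_{j=0}^{p} \beta_j \gamma_j X_{ij}, \qquad \eta(\gamma) = X \operatorname{diag}(\gamma)\, \beta = X_\gamma \beta_\gamma.$$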
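The two forms of the linear predictor, $X \operatorname{diag}(\gamma)\beta$ and $X_\gamma \beta_\gamma$, can be checked numerically. The minimal NumPy sketch below does this with random illustrative data. It assumes that column 0 of $X$ is the intercept ($X_{i0} = 1$, with $\gamma_0 = 1$ always). The helper names are hypothetical.

```python
import numpy as np

def eta_diag(X, gamma, beta):
    """eta(gamma) = X diag(gamma) beta: excluded columns are zeroed out."""
    return X @ (np.asarray(gamma) * beta)

def eta_sub(X, gamma, beta):
    """eta(gamma) = X_gamma beta_gamma: keep only the included columns."""
    idx = np.flatnonzero(gamma)
    return X[:, idx] @ beta[idx]

def log_likelihood(y, eta):
    """Bernoulli log-likelihood sum_i [y_i eta_i - log(1 + exp(eta_i))], computed stably."""
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))

rng = np.random.default_rng(1)
n, p = 8, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept + p predictors
gamma = np.array([1, 1, 0, 1])           # gamma_0 = 1 (intercept); X_2 excluded
beta = np.array([-1.0, 0.5, 2.0, -0.7])
y = rng.binomial(1, 0.3, size=n)

eta = eta_sub(X, gamma, beta)
```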

Two different approaches

• The RAND Benefit - Only approach is suboptimal: it does not consider differences in the cost of data collection among the available predictors. We propose a Cost - Benefit Analysis, in which variables are chosen only when they predict well enough given how much they cost to collect.

• In problems such as this, in which two desirable criteria compete and a joint optimization over both must be achieved, there are two main ways to proceed:

– Both criteria can be placed on a common scale, and optimization can occur on that scale (strategy (a)).

– One criterion can be optimized, subject to a bound on the other (strategy (b)).
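The following toy illustration of the two strategies is written in Python. It assumes four hypothetical candidate models, each described only by a deviance and a data-collection cost; the numbers are invented for illustration. Strategy (a) puts cost on the deviance scale through an exchange rate `lam`. Strategy (b) minimises deviance subject to a budget.

```python
# Hypothetical candidate models: (name, deviance, cost in minutes).
candidates = [
    ("A", 1550.0, 22.5),
    ("B", 1600.0, 10.0),
    ("C", 1640.0, 7.5),
    ("D", 1700.0, 3.0),
]

def strategy_a(models, lam):
    """Common scale: minimise deviance + lam * cost (lam converts minutes to deviance units)."""
    return min(models, key=lambda m: m[1] + lam * m[2])

def strategy_b(models, budget):
    """Constrained: minimise deviance among models whose cost does not exceed the budget."""
    feasible = [m for m in models if m[2] <= budget]
    return min(feasible, key=lambda m: m[1])

print(strategy_a(candidates, lam=10.0)[0], strategy_b(candidates, budget=8.0)[0])
```

As `lam` grows, strategy (a) moves towards cheaper models. Strategy (b) instead changes its answer only when the budget changes. The cost-adjusted BIC later in these slides is an instance of (a), with $\log n / c_0$ per minute playing the role of `lam`. The Cost - Restriction - Benefit Analysis is an instance of (b).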


Three methods for solving this problem

(1) (strategy (a)) Draper and Fouskakis (2000) and Fouskakis and Draper (2002, 2008) proposed an approach to this problem based on Bayesian Decision Theory. They used stochastic optimization methods to find (near-)optimal subsets of predictor variables that maximize an expected utility function which trades off data collection cost against predictive accuracy.

(2) (strategy (a)) In this work, as an alternative to (1), we propose a prior distribution that accounts for the cost of each variable and results in a set of posterior model probabilities which correspond to a Generalized Cost-Adjusted version of the Bayesian Information Criterion (Fouskakis, Ntzoufras and Draper, 2007a).

(3) (strategy (b)) We also implement a Cost - Restriction - Benefit Analysis, where the search is conducted only among models whose cost does not exceed a budgetary restriction (Fouskakis, Ntzoufras and Draper, 2007b), using a Population-Based Trans-Dimensional RJMCMC method.

Here we present results from methods (2) (Cost - Benefit Analysis) and (3) (Cost - Restriction - Benefit Analysis).


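3 Cost-Benefit Analysis

The aim is to identify well-fitting models after taking into account the cost of each variable.

Therefore we need to estimate the posterior model probability

$$f(\gamma \mid y) = \frac{f(\gamma) \int f(y \mid \beta_\gamma, \gamma)\, f(\beta_\gamma \mid \gamma)\, d\beta_\gamma}{\sum_{\gamma' \in \{0,1\}^p} f(\gamma') \int f(y \mid \beta_{\gamma'}, \gamma')\, f(\beta_{\gamma'} \mid \gamma')\, d\beta_{\gamma'}}$$

after introducing a prior $f(\gamma)$ on the model space that depends on the cost.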
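Suppose each model's log marginal likelihood $\log \int f(y \mid \beta_\gamma, \gamma) f(\beta_\gamma \mid \gamma)\, d\beta_\gamma$ and log prior are already available. The denominator of the formula is then a log-sum-exp over models. The minimal sketch below uses hypothetical values. Summing over all $2^p$ models is feasible only for small $p$, which is why RJMCMC is used later in these slides.

```python
import numpy as np

def posterior_model_probs(log_marglik, log_prior):
    """Normalise f(gamma) * f(y | gamma) over the listed models, working on the log scale."""
    log_post = np.asarray(log_marglik, float) + np.asarray(log_prior, float)
    log_post -= log_post.max()          # guard against underflow before exponentiating
    weights = np.exp(log_post)
    return weights / weights.sum()

# Hypothetical log marginal likelihoods for three models, with a uniform prior on models.
probs = posterior_model_probs([-800.0, -801.0, -805.0], [0.0, 0.0, 0.0])
posterior_odds_12 = probs[0] / probs[1]   # with a uniform prior this is the Bayes factor
```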

Prior on Model Parameters

$$f(\beta_\gamma \mid \gamma) = \text{Normal}\left(0,\; 4n\,(X_\gamma^T X_\gamma)^{-1}\right)$$

• Low-information prior: it gives the prior a weight equal to that of one data point (see Ntzoufras, Dellaportas and Forster, 2003).
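The factor $4n$ can be explained as follows. At $\beta_\gamma = 0$ every $p_i = 1/2$, so $p_i(1 - p_i) = 1/4$, and the logistic-regression Fisher information is $X_\gamma^T X_\gamma / 4$. Multiplying its inverse by $n$ gives the prior covariance above, i.e. a prior carrying the information of one observation. The small NumPy check below uses simulated covariates and is an illustration only.

```python
import numpy as np

def prior_covariance(X_gamma):
    """Prior covariance 4n (X_gamma^T X_gamma)^{-1} of beta_gamma."""
    n = X_gamma.shape[0]
    return 4.0 * n * np.linalg.inv(X_gamma.T @ X_gamma)

rng = np.random.default_rng(2)
n = 500
X_gamma = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])

Sigma = prior_covariance(X_gamma)
# Fisher information at beta = 0: X^T W X with W = diag(p_i (1 - p_i)) = diag(1/4).
fisher_at_zero = X_gamma.T @ X_gamma * 0.25
unit_info_cov = n * np.linalg.inv(fisher_at_zero)          # n x (information)^{-1}
beta_draw = rng.multivariate_normal(np.zeros(3), Sigma)   # one draw from the prior
```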


A Cost-penalized Prior on Model Space

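$$f(\gamma_j) \propto \exp\left\{\frac{\gamma_j}{2}\,\frac{c_0 - c_j}{c_0}\,\log n\right\}, \qquad j = 1, \ldots, p.$$

When comparing models $\gamma^{(k)}$ and $\gamma^{(\ell)}$, the penalty imposed on the log-likelihood ratio is given by

$$-2 \log\frac{f(\gamma^{(k)})}{f(\gamma^{(\ell)})} = \sum_{j=1}^{p} \left(\gamma_j^{(k)} - \gamma_j^{(\ell)}\right) \frac{c_j}{c_0} \log n - \left(d_{\gamma^{(k)}} - d_{\gamma^{(\ell)}}\right) \log n.$$

• $c_j$: cost per observation for variable $X_j$.

• $c_0$: baseline cost (default choice: $c_0 = \min_j \{c_j\}$, $j = 1, \ldots, p$).

• Indifference concerning cost ⇒ $c_j = c_0$ for $j = 1, \ldots, p$ ⇒ uniform prior on model space ($f(\gamma) \propto 1$) ⇒ posterior model odds = Bayes factor.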
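The minimal sketch below implements the cost-penalized prior and checks the pairwise penalty $-2\log[f(\gamma^{(k)})/f(\gamma^{(\ell)})]$ against its closed form. The costs are hypothetical. The sample size $n = 2532$ is that of the RAND study. The function names are illustrative.

```python
import numpy as np

def log_prior(gamma, costs, n, c0=None):
    """Unnormalised log f(gamma) = sum_j (gamma_j / 2) * ((c0 - c_j) / c0) * log n."""
    costs = np.asarray(costs, float)
    c0 = costs.min() if c0 is None else c0      # default baseline: cheapest variable
    return float(np.sum(np.asarray(gamma) / 2.0 * (c0 - costs) / c0 * np.log(n)))

def prior_penalty(gamma_k, gamma_l, costs, n, c0=None):
    """-2 log f(gamma_k) / f(gamma_l)."""
    return -2.0 * (log_prior(gamma_k, costs, n, c0) - log_prior(gamma_l, costs, n, c0))

costs = [0.5, 1.5, 10.0]      # hypothetical: cheap, moderate and expensive variable
n = 2532
gamma_k = [1, 1, 0]
gamma_l = [1, 0, 0]
# Closed form: sum_j (gk_j - gl_j) c_j / c0 log n - (d_k - d_l) log n = 3 log n - log n.
closed_form = (1.5 / 0.5) * np.log(n) - 1 * np.log(n)
```

Adding the cheapest variable costs nothing in the prior. Each extra variable is charged $(c_j/c_0 - 1)\log n$ beyond the $\log n$ that the Laplace term below already contributes.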


Approximations of the Posterior Model Odds

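Using the Laplace approximation in our model formulation, we obtain (up to an additive constant that does not depend on $\gamma$)

$$-2 \log f(\gamma \mid y) = -2 \log f(y \mid \tilde\beta_\gamma, \gamma) + \underbrace{\phi(\gamma) \underbrace{- 2 \log f(\gamma)}_{\text{prior model prob.}}}_{\text{penalty term}} + O(n^{-1}),$$

with

$$\phi(\gamma) = \frac{1}{4n}\, \tilde\beta_\gamma^T X_\gamma^T X_\gamma \tilde\beta_\gamma + d_\gamma \log(4n) + \log\frac{|\Psi_\gamma^{-1}|}{|X_\gamma^T X_\gamma|}.$$

$\phi(\gamma)$ can be thought of as a measure of discrepancy between the data and the prior information on the model parameters.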

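• $\tilde\beta_\gamma$: posterior mode of $f(\beta_\gamma \mid y, \gamma)$,

• $d_\gamma = \sum_{j=1}^{p} \gamma_j$ is the dimension of model $\gamma$,

• $\Psi_\gamma$ is minus the inverse of the Hessian matrix of $h(\beta_\gamma) = \log f(y \mid \beta_\gamma, \gamma) + \log f(\beta_\gamma \mid \gamma)$ evaluated at the posterior mode $\tilde\beta_\gamma$.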
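The following is a minimal NumPy sketch of the Laplace approximation for a single model $\gamma$, assuming the $N(0, 4n(X_\gamma^T X_\gamma)^{-1})$ prior above. Newton's method finds $\tilde\beta_\gamma$, and $\Psi_\gamma^{-1}$ is the negative Hessian of $h$ at the mode. The approximation is $-2 \log f(y \mid \gamma) \approx -2 \log f(y \mid \tilde\beta_\gamma, \gamma) + \phi(\gamma)$. The function name and the intercept-only example (hypothetical counts) are illustrative. In that one-parameter case the approximation can be compared with brute-force numerical integration.

```python
import numpy as np

def laplace_m2_log_marglik(X, y, n_iter=50):
    """Return (-2 log f(y | gamma) via Laplace, posterior mode, phi(gamma))."""
    n, d = X.shape
    XtX = X.T @ X
    prior_prec = XtX / (4.0 * n)          # inverse of the prior covariance 4n (X'X)^{-1}
    beta = np.zeros(d)
    for _ in range(n_iter):               # Newton-Raphson on h(beta) = log lik + log prior
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p) - prior_prec @ beta
        neg_hess = X.T @ (X * (p * (1.0 - p))[:, None]) + prior_prec
        beta = beta + np.linalg.solve(neg_hess, grad)
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    psi_inv = X.T @ (X * (p * (1.0 - p))[:, None]) + prior_prec   # minus Hessian of h at mode
    eta = X @ beta
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    phi = (beta @ XtX @ beta / (4.0 * n) + d * np.log(4.0 * n)
           + np.linalg.slogdet(psi_inv)[1] - np.linalg.slogdet(XtX)[1])
    return -2.0 * loglik + phi, beta, phi

# Intercept-only model: n = 200 patients, 60 deaths (hypothetical numbers).
n, deaths = 200, 60
X = np.ones((n, 1))
y = np.zeros(n)
y[:deaths] = 1.0
approx, beta_mode, phi = laplace_m2_log_marglik(X, y)

# Brute-force check: integrate likelihood x N(0, 4) prior (4n / n = 4) over a fine grid.
grid = np.linspace(-5.0, 3.0, 80001)
log_integrand = (deaths * grid - n * np.logaddexp(0.0, grid)
                 - 0.5 * np.log(2 * np.pi * 4.0) - grid**2 / 8.0)
m = log_integrand.max()
exact = -2.0 * (m + np.log(np.sum(np.exp(log_integrand - m)) * (grid[1] - grid[0])))
```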


Penalty Interpretation: A generalized cost-adjusted BIC

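$$-2 \log f(\gamma \mid y) = -2 \log f(y \mid \hat\beta_\gamma) + \sum_{j=1}^{p} \gamma_j \frac{c_j}{c_0} \log n + O(1) = -2 \log f(y \mid \hat\beta_\gamma) + \frac{C_\gamma}{c_0} \log n + O(1).$$

• $C_\gamma = \sum_{j=1}^{p} \gamma_j c_j$, the cost of model $\gamma$.

• $\hat\beta_\gamma$ = MLE of the parameters $\beta_\gamma$ of model $\gamma$.

• If $c_j = c_0$ for all $j$ ⇒ $\text{BIC} = -2 \log f(y \mid \hat\beta_\gamma) + d_\gamma \log n$.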
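The cost-adjusted BIC is straightforward to compute once the maximised log-likelihood is known. The sketch below uses hypothetical log-likelihood values and costs taken from the RAND table. With $c_0 = 0.50$ minutes, the 10-minute total APACHE II score is penalised like $10/0.5 = 20$ ordinary BIC parameters. As a result, the two criteria can prefer different models.

```python
import numpy as np

def cost_adjusted_bic(max_loglik, gamma, costs, n, c0=None):
    """-2 log f(y | beta_hat) + (C_gamma / c0) log n, with C_gamma = sum_j gamma_j c_j."""
    costs = np.asarray(costs, float)
    c0 = costs.min() if c0 is None else c0
    model_cost = float(np.dot(gamma, costs))
    return -2.0 * max_loglik + model_cost / c0 * np.log(n)

def bic(max_loglik, gamma, n):
    """Standard BIC: -2 log f(y | beta_hat) + d_gamma log n."""
    return -2.0 * max_loglik + np.sum(gamma) * np.log(n)

n = 2532
costs = [0.50, 0.50, 10.00]     # e.g. Age, Initial Temperature, Total APACHE II Score
# Hypothetical maximised log-likelihoods for two nested models.
cheap = cost_adjusted_bic(-800.0, [1, 1, 0], costs, n)
full = cost_adjusted_bic(-780.0, [1, 1, 1], costs, n)
```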


Implementation and Results

• Run RJMCMC (Green, 1995) for 100K iterations in the full model space.

• Eliminate unimportant variables (marginal posterior inclusion probabilities < 0.30), forming a new reduced model space.

• Run RJMCMC for 100K iterations in the reduced model space to estimate posterior model odds and best models.

• Two setups:

1. Benefit-only analysis (uniform prior on model space).

2. Cost-benefit analysis (cost-penalized prior on model space).
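The screening step between the two RJMCMC runs amounts to two operations. First, estimate each marginal inclusion probability $f(\gamma_j = 1 \mid y)$ as the fraction of sampled models that contain $X_j$. Second, drop the variables whose estimate falls below 0.30. Below is a minimal sketch with a handful of hypothetical draws. The helper names are illustrative.

```python
import numpy as np

def marginal_inclusion(draws):
    """Fraction of sampled models gamma^(t) that include each variable."""
    return np.asarray(draws, float).mean(axis=0)

def screen_variables(draws, names, threshold=0.30):
    """Keep the variables whose estimated inclusion probability is at least `threshold`."""
    probs = marginal_inclusion(draws)
    return [name for name, prob in zip(names, probs) if prob >= threshold]

# Hypothetical RJMCMC output: each row is a sampled gamma over (X1, X2, X3).
draws = [[1, 0, 1], [1, 0, 0], [1, 1, 1], [1, 0, 1]]
kept = screen_variables(draws, ["X1", "X2", "X3"])
```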


Preliminary Results: Marginal Probabilities $f(\gamma_j = 1 \mid y)$

Index | Variable Name | Cost | Benefit Analysis | Cost-Benefit Analysis
1 | Systolic Blood Pressure (SBP) Score | 0.50 | 0.99 | 0.99
2 | Age | 0.50 | 0.99 | 0.99
3 | Blood Urea Nitrogen | 1.50 | 1.00 | 0.99
4 | Apache II Coma Score | 2.50 | 1.00 | -
5 | Shortness of Breath Day 1 | 1.00 | 0.97 | 0.79
8 | Septic Complications | 3.00 | 0.88 | -
12 | Initial Temperature | 0.50 | 0.98 | 0.96
13 | Heart Rate Day 1 | 0.50 | - | 0.34
14 | Chest Pain Day 1 | 0.50 | - | 0.39
15 | Cardiomegaly Score | 1.50 | 0.71 | -
27 | Hematologic History Score | 1.50 | 0.45 | -
37 | Apache Respiratory Rate Score | 1.00 | 0.95 | 0.32
46 | Admission SBP | 0.50 | 0.68 | 0.90
49 | Respiratory Rate Day 1 | 0.50 | - | 0.81
51 | Confusion Day 1 | 0.50 | - | 0.95
70 | Apache pH Score | 1.00 | 0.98 | 0.98
73 | Morbid + Comorbid Score | 7.50 | 0.96 | -
78 | Musculoskeletal Score | 1.00 | - | 0.54


Reduced Model Space: Posterior Model Probabilities/Odds

Common variables in both analyses: $X_1 + X_2 + X_3 + X_5 + X_{12} + X_{70}$

Benefit-Only Analysis (common variables within this analysis: $X_4 + X_{15} + X_{37} + X_{73}$)

k | Additional Variables | Model Cost | Posterior Probability* | $PO_{1k}$**
1 | $+X_8 + X_{27} + X_{46}$ | 22.5 | 0.3066 | 1.00
2 | $+X_8 + X_{27}$ | 22.0 | 0.1969 | 1.56
3 | $+X_8$ | 20.5 | 0.1833 | 1.67
4 | $+X_{27} + X_{46}$ | 19.5 | 0.0763 | 4.02
5 | (none) | 17.5 | 0.0383 | 8.00

Cost-Benefit Analysis (common variables within this analysis: $X_{46} + X_{51}$)

k | Additional Variables | Model Cost | Posterior Probability* | $PO_{1k}$**
1 | $+X_{49} + X_{78}$ | 7.5 | 0.1460 | 1.00
2 | $+X_{14} + X_{49} + X_{78}$ | 7.5 | 0.1168 | 1.27
3 | $+X_{13} + X_{49} + X_{78}$ | 7.5 | 0.0866 | 1.69
4 | $+X_{13} + X_{14} + X_{49} + X_{78}$ | 8.0 | 0.0665 | 2.20
5 | $+X_{14} + X_{49}$ | 7.0 | 0.0461 | 3.17
6 | $+X_{49}$ | 6.5 | 0.0409 | 3.57
7 | $+X_{37} + X_{78}$ | 7.5 | 0.0382 | 3.82
8 | $+X_{13} + X_{14} + X_{49}$ | 7.5 | 0.0369 | 3.96
9 | $+X_{13}$ | 6.5 | 0.0344 | 4.25

* Models with posterior probability above 3%. ** Posterior odds of the best model within each analysis versus the current model $k$.
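The $PO_{1k}$ column is the ratio of posterior probabilities $f(\gamma^{(1)} \mid y) / f(\gamma^{(k)} \mid y)$, because the normalising constant cancels. The Python check below recomputes the odds from the rounded benefit-only probabilities. It reproduces the reported odds to within about 0.01; the small discrepancies come from the rounding of the reported probabilities.

```python
import numpy as np

# Posterior probabilities of the five benefit-only models (reduced model space).
post_probs = np.array([0.3066, 0.1969, 0.1833, 0.0763, 0.0383])
reported_po = np.array([1.00, 1.56, 1.67, 4.02, 8.00])

po_1k = post_probs[0] / post_probs      # posterior odds of the best model vs model k
print(np.round(po_1k, 2))
```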


Reduced Model Space: Comparisons

Comparison of measures of fit, cost and dimensionality between the best models in the reduced model space of the benefit-only and cost-benefit analyses; percentage differences are relative to the benefit-only analysis.

Measure | Benefit-Only | Cost-Benefit | Difference (%)
Minimum Deviance | 1553.2 | 1635.8 | +5.3
Median Deviance | 1564.5 | 1644.8 | +5.1
Cost | 22.5 | 7.5 | -66.7
Dimension | 13 | 10 | -23.1
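The percentage differences are computed as $100 \times (\text{cost-benefit} - \text{benefit-only}) / \text{benefit-only}$. The Python snippet below reproduces the last column from the other two.

```python
benefit_only = {"Minimum Deviance": 1553.2, "Median Deviance": 1564.5, "Cost": 22.5, "Dimension": 13}
cost_benefit = {"Minimum Deviance": 1635.8, "Median Deviance": 1644.8, "Cost": 7.5, "Dimension": 10}

# Percentage difference relative to the benefit-only analysis, rounded as in the table.
pct_diff = {
    key: round(100.0 * (cost_benefit[key] - benefit_only[key]) / benefit_only[key], 1)
    for key in benefit_only
}
print(pct_diff)
```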


4 Cost - Restriction - Benefit Analysis

• Implement a Cost - Restriction - Benefit Analysis, in which the practical relevance of the selected variable subsets is ensured by enforcing an overall limit on the total data collection cost of each subset: the search is conducted only among models whose cost does not exceed this budgetary restriction $C$.

• Therefore, we exclude a priori all models $\gamma$ with total cost larger than $C$, resulting in a significantly reduced model space

$$\mathcal{M} = \left\{\gamma \in \{0,1\}^p : \sum_{j=1}^{p} c_j \gamma_j \le C\right\}.$$
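The cost-restricted model space, and the uniform prior on it that is used in this analysis, can be enumerated by brute force when $p$ is small. With $p = 83$ there would be $2^{83}$ candidates, which is why MCMC is needed. The sketch below uses four hypothetical variable costs and a hypothetical budget. The helper names are illustrative.

```python
import math
from itertools import product

def cost_restricted_space(costs, budget):
    """All gamma in {0,1}^p with sum_j c_j gamma_j <= budget (brute force, small p only)."""
    return [g for g in product((0, 1), repeat=len(costs))
            if sum(c * gj for c, gj in zip(costs, g)) <= budget]

def log_uniform_prior(gamma, space):
    """Uniform prior on the restricted space: log(1/|M|) inside, -inf outside."""
    return -math.log(len(space)) if tuple(gamma) in space else -math.inf

costs = [0.5, 1.5, 2.5, 10.0]      # hypothetical per-variable costs (minutes)
M = cost_restricted_space(costs, budget=3.0)
print(len(M), "of", 2 ** len(costs), "models satisfy the budget")
```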

• AIM: Estimate posterior model probabilities in the cost-restricted model space.


• PROBLEM: Because of the cost limit, the model space contains areas of local maxima. Thus, we need to change the definition of the neighborhood structure of the proposed models and construct more advanced proposed jumps, possibly between models of the same cost, in order to avoid getting trapped in local maxima.

• SOLUTION: Intelligent trans-dimensional MCMC methods that can move across areas of local maxima, even when these are well separated.

Proposed Algorithm

• We have developed a Population-Based Trans-Dimensional Reversible-Jump Markov Chain Monte Carlo algorithm (Population RJMCMC), combining ideas from the Population-Based MCMC (Jasra, Stephens and Holmes, 2007) and Simulated Tempering (Geyer and Thompson, 1995) algorithms.


Population RJMCMC

• Use 3 chains: the actual one, plus two auxiliary ones.

– In the auxiliary chains the posterior distributions are raised to a power $t_k$ (temperature), $k = 1, 2$.

– 1st auxiliary chain: $t_1 > 1$ → increases the differences between the posterior probabilities (makes the distribution steeper, allowing the MCMC to move closer to locally best models).

– 2nd auxiliary chain: $0 < t_2 < 1$ → reduces the differences between the posterior probabilities (makes the distribution flatter, allowing the MCMC to move easily across different models).

• Temperatures $t_k$ change stochastically.

• In this way, the need for a large number of chains is avoided.
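The effect of a temperature on a discrete posterior over models can be seen directly. The sketch below raises a hypothetical set of model probabilities to the power $t$ and renormalises. With $t > 1$ the best model gains mass (steeper). With $0 < t < 1$ the probabilities move towards uniform (flatter), which is what lets the second auxiliary chain cross between separated modes.

```python
import numpy as np

def temper(probs, t):
    """Raise a discrete distribution to the power t and renormalise."""
    w = np.asarray(probs, float) ** t
    return w / w.sum()

probs = np.array([0.6, 0.3, 0.1])     # hypothetical posterior model probabilities
steeper = temper(probs, 2.0)          # like the 1st auxiliary chain (t1 > 1)
flatter = temper(probs, 0.5)          # like the 2nd auxiliary chain (0 < t2 < 1)
print(np.round(steeper, 3), np.round(flatter, 3))
```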


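The incorporation of stochastic temperatures can be done using pseudo-priors $g_k(t_k)$. In this case the posterior distribution is expanded to

$$f(\beta, \gamma, \beta^{(1)}, \gamma^{(1)}, \beta^{(2)}, \gamma^{(2)}, t_1, t_2 \mid y) \propto f(y \mid \beta, \gamma)\, f(\beta \mid \gamma)\, f(\gamma) \prod_{k=1}^{2} \left[f(y \mid \beta^{(k)}, \gamma^{(k)})\, f(\beta^{(k)} \mid \gamma^{(k)})\, f(\gamma^{(k)})\right]^{t_k} g_k(t_k),$$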

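where $\gamma^{(k)}$ and $\beta^{(k)}$ are the model indicator and parameter vector of chain $k$. Model indicators and parameters can be updated using RJMCMC steps, while the temperature $t_k$ can be generated from the conditional posterior distribution

$$f(t_k \mid \beta, \gamma, \beta^{(k)}, \gamma^{(k)}, t_{\setminus k}, y) \propto \left[f(y \mid \beta^{(k)}, \gamma^{(k)})\, f(\beta^{(k)} \mid \gamma^{(k)})\, f(\gamma^{(k)})\right]^{t_k} g_k(t_k).$$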

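The desired posterior marginal distribution for the temperatures $t_k$ is given by

$$f(t_k \mid y) \propto \sum_{\gamma^{(k)} \in \mathcal{M}} \int \left[f(y \mid \beta^{(k)}, \gamma^{(k)})\, f(\beta^{(k)} \mid \gamma^{(k)})\, f(\gamma^{(k)})\right]^{t_k} g_k(t_k)\, d\beta^{(k)} \propto Z_k(y, t_k)\, g_k(t_k),$$

where $Z_k(y, t_k)$ is the marginal likelihood over all possible models for chain $k$.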


Since the $g_k(t_k)$ are pseudo-priors, we can set

$$g_k(t_k) \propto \frac{h_k(t_k)}{Z_k(y, t_k)},$$

where the $h_k(t_k)$ are convenient density functions that are easy to simulate from, resulting in $f(t_k \mid y) = h_k(t_k)$.

For the selection of $h_k(t_k)$ we propose to use

$$h_1(t_1) = \text{Gamma}(t_1 - 1;\, a_2, b_2) \quad\text{and}\quad h_2(t_2) = \text{Beta}(t_2;\, a_1, b_1).$$
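These choices of $h_k$ guarantee $t_1 > 1$ and $0 < t_2 < 1$, matching the roles of the two auxiliary chains. The sketch below simply draws from $h_1$ and $h_2$ to show the supports. In the algorithm itself, $t_k$ is updated from its conditional posterior. The sketch makes two assumptions that the slides do not specify. First, the Gamma is parameterised by shape $a_2$ and rate $b_2$; NumPy's `Generator.gamma` takes a scale, so the code passes `1 / b2`. Second, the hyperparameter values are hypothetical.

```python
import numpy as np

def draw_temperatures(rng, a1, b1, a2, b2, size=None):
    """Draw (t1, t2) from h1 and h2: t1 - 1 ~ Gamma(shape a2, rate b2), t2 ~ Beta(a1, b1)."""
    t1 = 1.0 + rng.gamma(shape=a2, scale=1.0 / b2, size=size)   # NumPy uses scale = 1 / rate
    t2 = rng.beta(a1, b1, size=size)
    return t1, t2

rng = np.random.default_rng(3)
t1, t2 = draw_temperatures(rng, a1=2.0, b1=2.0, a2=2.0, b2=4.0, size=5000)
print(t1.min() > 1.0, t2.max() < 1.0)
```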

Prior Distributions

Same prior on model parameters as in the Cost - Benefit Analysis, and a uniform prior on the cost-restricted model space, i.e.

$$f(\gamma) \propto I\left(\gamma \in \mathcal{M} : c(\gamma) = \sum_{j=1}^{p} \gamma_j c_j \le C\right),$$

where $c_j$ is the differential cost per observation for variable $X_j$ and $C$ is the budgetary restriction.


Implementation and Results

• COST LIMIT: $C = 10$ minutes of abstraction time.

• Run Population RJMCMC for 100K iterations in the full model space, twice, starting each time from a different model.

• Eliminate unimportant variables (marginal probabilities < 0.30 in both runs), forming a new reduced model space.

• Run population RJMCMC in the reduced space, twice.

• Compare results and performance of population RJMCMC with simple RJMCMC.


Preliminary Results: Marginal Probabilities $f(\gamma_j = 1 \mid y)$

Variables with marginal posterior probabilities $f(\gamma_j = 1 \mid y)$ above 0.30 in at least one run.

Marginal Posterior Probabilities

Index | Variable Name | Cost | First Run | Second Run
1 | Systolic Blood Pressure (SBP) Score | 0.50 | 0.98 | 0.99
2 | Age | 0.50 | 0.97 | 0.95
3 | Blood Urea Nitrogen | 1.50 | 0.99 | 0.91
4 | Apache II Coma Score | 2.50 | 0.55 | 1.00
5 | Shortness of Breath Day 1 | 1.00 | 0.92 | 0.80
6 | Serum Albumin | 1.50 | 0.40 | 0.55
12 | Initial Temperature | 0.50 | 0.91 | 0.93
37 | Apache Respiratory Rate Score | 1.00 | 0.72 | 0.79
46 | Admission SBP | 0.50 | 0.45 | 0.25
49 | Respiratory Rate Day 1 | 0.50 | 0.35 | 0.25
51 | Confusion Day 1 | 0.50 | 0.44 | 0.01
62 | Body System Count | 2.50 | 0.55 | 0.33
70 | Apache pH Score | 1.00 | 0.81 | 0.73


Reduced Model Space: Posterior Model Probabilities/Odds

Common variables in both analyses: $X_2 + X_4$

Population RJMCMC - 500K iterations (common variables within this analysis: $X_1 + X_{12} + X_{37}$)

k | Model | Additional Variables | 1st Run Posterior Prob. | 1st Run $PO_{1k}$* | 2nd Run Posterior Prob. | 2nd Run $PO_{1k}$*
1 | $m_1$ | $+X_3 + X_5 + X_{62}$ | 0.4872 | 1.00 | 0.4879 | 1.00
2 | $m_2$ | $+X_5 + X_{46} + X_{62} + X_{70}$ | 0.1202 | 4.05 | 0.1052 | 4.63
3 | $m_3$ | $+X_3 + X_{62} + X_{70}$ | 0.0894 | 5.45 | 0.0982 | 4.97
4 | $m_4$ | $+X_3 + X_5 + X_6 + X_{70}$ | 0.0344 | 14.16 | 0.0498 | 9.80

Simple RJMCMC - 500K iterations (common variables within this analysis: $X_{62}$)

k | Model | Additional Variables | 1st Run Posterior Prob. | 1st Run $PO_{1k}$* | 2nd Run Posterior Prob. | 2nd Run $PO_{1k}$*
1 | $m_1$ | $+X_1 + X_3 + X_5 + X_{12} + X_{37}$ | 0.6129 | 1.00 | 0.5952 | 1.00
2 | $m_3$ | $+X_1 + X_3 + X_{12} + X_{37} + X_{70}$ | 0.0828 | 7.40 | 0.1214 | 4.90
3 | $m_2$ | $+X_1 + X_5 + X_{12} + X_{37} + X_{46} + X_{70}$ | 0.0762 | 8.04 | 0.1074 | 5.54
4 | $m_5$ | $+X_3 + X_5 + X_{46} + X_{49} + X_{70}$ | 0.0457 | 13.41 | <0.03 | >19.9
5 | $m_6$ | $+X_1 + X_3 + X_5 + X_{49} + X_{70}$ | 0.0337 | 18.19 | <0.03 | >19.9

* Posterior odds of the best model within each analysis versus the current model $k$.

All models appearing in the tables have a total cost of 10 minutes (the cost limit).


Reduced Model Space: Monte Carlo Errors

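Monte Carlo Errors (%)

RJMCMC Type | Run | Iterations | $m_1$ | $m_2$ | $m_3$ | $m_4$
POP. | 1 | 500K | 1.2 | 0.5 | 0.9 | 0.7
POP. | 2 | 500K | 1.5 | 0.4 | 1.0 | 0.7
POP. | 1 | 200K | 1.9 | 0.8 | 1.1 | 1.2
POP. | 2 | 200K | 1.6 | 1.0 | 1.1 | 0.9
POP. | 1 | 100K | 2.5 | 1.2 | 1.7 | 1.5
POP. | 2 | 100K | 2.7 | 0.9 | 1.6 | 1.2
SIMPLE | 1 | 500K | 4.2 | 1.3 | 3.2 | 0.0
SIMPLE | 2 | 500K | 4.2 | 1.7 | 3.6 | 0.0

Relative Comparisons

Comparison | Iterations | $m_1$ | $m_2$ | $m_3$ | $m_4$
SIMPLE vs. POP. (First Run) | 500K | 3.5 | 2.8 | 3.6 | 0.0
SIMPLE vs. POP. (First Run) | 200K | 2.2 | 1.8 | 2.9 | 0.0
SIMPLE vs. POP. (First Run) | 100K | 1.7 | 1.2 | 1.9 | 0.0
SIMPLE vs. POP. (Second Run) | 500K | 2.8 | 3.4 | 3.6 | 0.0
SIMPLE vs. POP. (Second Run) | 200K | 2.6 | 1.7 | 3.3 | 0.0
SIMPLE vs. POP. (Second Run) | 100K | 1.6 | 1.9 | 2.3 | 0.0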
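The slides do not state how the Monte Carlo errors were computed. One standard option is the batch-means estimator, shown here only as an assumption. It is applied to the indicator that the chain is at model $m$. The draws are split into batches, and the standard deviation of the batch means is divided by the square root of the number of batches; multiplying by 100 gives a percentage. The chains below are hypothetical.

```python
import numpy as np

def batch_means_mc_error(draws, n_batches=50):
    """Batch-means Monte Carlo standard error of the mean of `draws`."""
    x = np.asarray(draws, float)
    size = len(x) // n_batches                       # draws per batch (remainder discarded)
    means = x[: size * n_batches].reshape(n_batches, size).mean(axis=1)
    return float(means.std(ddof=1) / np.sqrt(n_batches))

# Hypothetical chain output: indicator that the sampled model equals m_1.
well_mixed = np.tile([1.0, 0.0], 500)                # every batch mean is exactly 0.5
sticky = np.repeat([0.0, 1.0], 500)                  # stuck in one region, then the other
print(100 * batch_means_mc_error(well_mixed), 100 * batch_means_mc_error(sticky))
```

Both chains estimate the same probability, 0.5. The sticky chain, however, has a much larger error. This is the sense in which lower Monte Carlo errors at the same number of iterations indicate better mixing.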


5 Discussion

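• Cost - Benefit Analysis:

The resulting models achieve dramatic gains in cost and a noticeable improvement in model simplicity at the price of a small loss in predictive accuracy, when compared with a more traditional benefit-only analysis. In the reduced model space, the best cost-benefit model costs 7.5 instead of 22.5 minutes per patient (-66.7%) and has dimension 10 instead of 13, while its minimum deviance is only 5.3% higher.

• Cost - Restriction - Benefit Analysis:

The Population RJMCMC algorithm explores the model space efficiently and converges faster than simple RJMCMC, as reflected in its lower Monte Carlo errors (e.g. 1.2% and 1.5% for $m_1$ after 500K iterations, versus 4.2% for simple RJMCMC).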


References

Draper D, Fouskakis D (2000). A case study of stochastic optimization in health policy: problem formulation and preliminary results. Journal of Global Optimization, 18, 399–416.

Fouskakis D, Draper D (2002). Stochastic optimization: a review. International Statistical Review, 70, 315–349.

Fouskakis D, Draper D (2008). Comparing stochastic optimization methods for variable selection in binary outcome prediction, with application to health policy. Journal of the American Statistical Association, 103, forthcoming.

Fouskakis D, Ntzoufras I, Draper D (2007a). Bayesian variable selection using cost-adjusted BIC, with application to cost-effective measurement of quality of health care. (submitted).

Fouskakis D, Ntzoufras I, Draper D (2007b). Population Based Reversible Jump MCMC for Bayesian Variable Selection and Evaluation Under Cost Limit Restrictions. (submitted).

Geyer CJ, Thompson EA (1995). Annealing Markov chain Monte Carlo with applications to ancestral inference. Journal of the American Statistical Association, 90, 909–920.

Green P (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82, 711–732.

Jasra A, Stephens DA, Holmes CC (2007). Population-based reversible jump MCMC. Biometrika, forthcoming.

Keeler E, Kahn K, Draper D, Sherwood M, Rubenstein L, Reinisch E, Kosecoff J, Brook R (1990). Changes in sickness at admission following the introduction of the Prospective Payment System. Journal of the American Medical Association, 264, 1962–1968.

Ntzoufras I, Dellaportas P, Forster JJ (2003). Bayesian variable and link determination for generalized linear models. Journal of Statistical Planning and Inference, 111, 165–180.