MCMC in Bayesian Variable Selection/Model Averaging

(1)


Readings: See website

11/18/2019

(2)

Example with Diabetes data

First, let's load the data (from Hoff) and coerce them to data frames.

set.seed(8675309)

source("yX.diabetes.train.txt")

diabetes.train = as.data.frame(diabetes.train)
dim(diabetes.train)

## [1] 342 65

source("yX.diabetes.test.txt")

diabetes.test = as.data.frame(diabetes.test)
colnames(diabetes.test)[1] = "y"

Too many models to enumerate: 2^64 ≈ 1.8 × 10¹⁹ with 64 candidate predictors!

(3)

BMA using BAS

library(BAS)

diabetes.bas = bas.lm(y ~ ., data=diabetes.train,
                      method="MCMC", prior="hyper-g-n", n.models=10000,
                      MCMC.iterations=500000,
                      thin=10, initprobs="eplogp", force.heredity=FALSE)

- Runs an MCMC sampler that combines a random walk Metropolis algorithm with a random swap step.
- The algorithm stops when the number of iterations exceeds MCMC.iterations or n.models models have been visited.
- thin = 10 saves every 10th model.
- The initprobs = "eplogp" argument uses the p-values from OLS to compute approximate Bayes factors. These are used internally to sort the variables, which improves memory usage.
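The kind of sampler described above can be illustrated with a toy version in base R. This is a minimal sketch and not the BAS implementation: it assumes simulated data, a uniform prior over models, and exp(-BIC/2) as a stand-in for the marginal likelihood. The helper names log_marg and mcmc_models are hypothetical.

```r
set.seed(1)
n <- 100; p <- 6
X <- matrix(rnorm(n * p), n, p)
y <- 2 * X[, 1] - 1.5 * X[, 3] + rnorm(n)   # only variables 1 and 3 matter

# Assumption: approximate the log marginal likelihood by -BIC/2
log_marg <- function(gamma) {
  fit <- if (any(gamma)) lm(y ~ X[, gamma, drop = FALSE]) else lm(y ~ 1)
  -BIC(fit) / 2
}

mcmc_models <- function(n_iter, p, prob_swap = 0.5) {
  gamma <- rep(FALSE, p)                     # start at the null model
  cur <- log_marg(gamma)
  draws <- matrix(FALSE, n_iter, p)
  for (t in seq_len(n_iter)) {
    prop <- gamma
    n_in <- sum(gamma)
    if (runif(1) < prob_swap) {
      # swap step: exchange one included variable for one excluded variable;
      # if no swap is possible the proposal is the current model
      if (n_in > 0 && n_in < p) {
        i_in  <- which(gamma)[sample.int(n_in, 1)]
        i_out <- which(!gamma)[sample.int(p - n_in, 1)]
        prop[i_in] <- FALSE
        prop[i_out] <- TRUE
      }
    } else {
      # random walk step: flip a single inclusion indicator
      j <- sample.int(p, 1)
      prop[j] <- !prop[j]
    }
    new <- log_marg(prop)
    # both moves are symmetric, so the acceptance ratio is the posterior ratio
    if (log(runif(1)) < new - cur) {
      gamma <- prop
      cur <- new
    }
    draws[t, ] <- gamma
  }
  draws
}

draws <- mcmc_models(2000, p)
pip_mc <- colMeans(draws)   # Monte Carlo estimates of inclusion probabilities
round(pip_mc, 2)
```

Choosing the move type before checking whether a swap is feasible keeps both proposals symmetric, so no Hastings correction is needed in this sketch.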


(7)

Checking Convergence

The function diagnostics() compares
- estimates of posterior probabilities from Monte Carlo frequencies (asymptotically unbiased) to
- estimates obtained by renormalizing marginal likelihoods times prior probabilities over just the sampled models (Fisher consistent; recovers the exact probabilities under enumeration).
- The two should be equal if the chain has converged.
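In symbols, the two estimates being compared can be written roughly as follows. The notation is introduced here for illustration and is not taken from the slides: T is the number of saved draws, M^{(t)} is the model at draw t, and S is the set of distinct sampled models.

```latex
\hat{p}_{\mathrm{MCMC}}(M_\gamma \mid Y)
  = \frac{1}{T} \sum_{t=1}^{T} \mathbf{1}\{M^{(t)} = M_\gamma\},
\qquad
\hat{p}_{\mathrm{RN}}(M_\gamma \mid Y)
  = \frac{p(Y \mid M_\gamma)\, p(M_\gamma)}
         {\sum_{M \in S} p(Y \mid M)\, p(M)},
  \quad M_\gamma \in S.
```

When S contains every model, the denominator of the renormalized estimate is the full normalizing constant, which is why it recovers the exact posterior probabilities under enumeration.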


(10)

Convergence of Posterior Inclusion Probabilities

diagnostics(diabetes.bas, type="pip")

[Figure: Convergence Plot: Posterior Inclusion Probabilities, plotting pip (renormalized) against pip (MCMC), both axes from 0 to 1.]

Close, but could run longer!

(11)

Convergence of model probabilities

diagnostics(diabetes.bas, type="model")

[Figure: Convergence Plot: Posterior Model Probabilities, plotting p(M | Y) (renormalized) against p(M | Y) (MCMC), both axes from 0 to 0.12.]

Clearly, not long enough.

(12)

Rerun longer

MCMC.iterations=10^7,
thin=64,  # keep every p-th draw
initprobs="eplogp",
force.heredity=FALSE)

(13)

Diagnostics

diagnostics(diabetes.bas, type="model")

[Figure: Convergence Plot: Posterior Model Probabilities, plotting p(M | Y) (renormalized) against p(M | Y) (MCMC), both axes from 0 to 0.10.]

(14)

Convergence of Posterior Inclusion Probabilities

diagnostics(diabetes.bas, type="pip")

[Figure: Convergence Plot: Posterior Inclusion Probabilities, plotting pip (renormalized) against pip (MCMC), both axes from 0 to 1.]

OK?

(15)

Highest Probability Model

One problem with MCMC and model selection is that the "best" model may be visited very few times.

summary(diabetes.bas$freq)

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
##    1.000    2.000    4.000    7.825    8.000 4127.000

(16)

Comments

- I routinely use much larger numbers of MCMC iterations than I see in papers when looking at BMA or model selection (i.e., millions rather than, say, 10,000).
- Explore different numbers of models: does it make a difference for estimating marginal inclusion probabilities or posterior inclusion probabilities? What about predictions under BMA?
- Check memory! Objects can be large in R memory.


(19)

What variables are included?

image(diabetes.bas)

[Figure: image(diabetes.bas) output. x-axis: Model Rank (1 to 20) and Log Posterior Odds (0 to 2.586); labelled rows: Intercept, sex, map, ldl, tch, glu, bmi^2, tc^2, hdl^2, ltg^2, age:sex, age:map, age:ldl, age:tch, age:glu, sex:map, sex:ldl, sex:tch, sex:glu, bmi:tc, bmi:hdl, bmi:ltg, map:tc, map:hdl, map:ltg, tc:ldl, tc:tch, tc:glu, ldl:tch, ldl:glu, hdl:ltg, tch:ltg, ltg:glu.]

(20)

Prediction

Use BMA for prediction and compute the root mean squared error (RMSE) between the predicted and actual responses for the test data:

pred.bas = predict(diabetes.bas,
                   newdata=diabetes.test, estimator="BMA", se.fit=TRUE)
cv.summary.bas(pred.bas$fit, diabetes.test$y)

## [1] 0.6739172

(21)

Scoring

BAS::cv.summary.bas

## function (pred, ytrue, score = "squared-error")
## {
##     if (length(pred) != length(ytrue)) {
##         stop("predicted values and observed values are not the same length")
##         return()
##     }
##     if (!(score %in% c("squared-error", "miss-class"))) {
##         stop(paste("score ", score, "not implemented; please check spelling"))
##     }
##     if (score == "miss-class") {
##         pred.class <- ifelse(pred < 0.5, 0, 1)
##         confusion <- table(pred.class, ytrue)
##         error <- (sum(confusion) - sum(diag(confusion)))/sum(confusion)
##     }
##     else {
##         error <- sqrt(sum((pred - ytrue)^2)/length(ytrue))
##     }
##     return(error)
## }
## <bytecode: 0x7fb255d10130>
## <environment: namespace:BAS>
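Note that the "squared-error" branch in the listing above takes a square root, so the reported score is a root mean squared error. A minimal base-R equivalent makes this explicit; the name rmse and the toy vectors are hypothetical and not part of BAS.

```r
# Root mean squared error, mirroring the "squared-error" branch above
rmse <- function(pred, ytrue) {
  stopifnot(length(pred) == length(ytrue))
  sqrt(mean((pred - ytrue)^2))
}

pred  <- c(1.0, 2.0, 3.0)
ytrue <- c(1.0, 2.0, 5.0)
rmse(pred, ytrue)      # sqrt(4/3)
rmse(pred, ytrue)^2    # the mean squared error, 4/3
```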

(22)

Fit the Lasso

suppressMessages(library(lars))

diabetes.lasso = lars(as.matrix(diabetes.train[, -1]), diabetes.train[,1],

type="lasso")

(23)

Plot of Lasso

plot(diabetes.lasso)

[Figure: LASSO coefficient paths. x-axis: |beta|/max|beta| (0 to 1); y-axis: Standardized Coefficients (-100 to 100).]

(24)

Best Cp Model for Lasso

best = which.min(diabetes.lasso$Cp)
best

## 26
## 27

out.lasso = predict(diabetes.lasso, newx=as.matrix(diabetes.test[,-1]), s=best[1])
cv.summary.bas(out.lasso$fit, diabetes.test$y)

## [1] 0.6890253

Close to the BMA result.

(25)

Choice of estimator

- BMA: the Bayes-optimal estimator under squared error loss
- Highest Posterior Probability Model (HPM): not optimal for prediction, but suited to selection
- Median Probability Model (MPM): includes the variables with PIP > 0.5; optimal under certain conditions (nested models)
- Best Predictive Model (BPM): the model closest to BMA under squared error loss
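The four estimators can be compared on a toy problem small enough to enumerate. This is a sketch under stated assumptions rather than what BAS does internally: the data are simulated, the model prior is uniform, exp(-BIC/2) stands in for the marginal likelihood, and the BPM is judged at a single new point. All object names are hypothetical.

```r
set.seed(2)
n <- 80; p <- 4
X <- matrix(rnorm(n * p), n, p)
y <- 1.5 * X[, 1] + 0.4 * X[, 2] + rnorm(n)
dat <- data.frame(y = y, X)                   # columns y, X1, ..., X4

# All 2^p inclusion patterns, one row per model
models <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), p)))

fit_model <- function(gamma) {
  rhs <- if (any(gamma)) paste(names(dat)[-1][gamma], collapse = " + ") else "1"
  lm(as.formula(paste("y ~", rhs)), data = dat)
}
fits <- lapply(seq_len(nrow(models)), function(i) fit_model(models[i, ]))

# Posterior model probabilities (assumption: uniform prior, BIC approximation)
log_w <- -sapply(fits, BIC) / 2
post <- exp(log_w - max(log_w))
post <- post / sum(post)

pip <- colSums(models * post)                 # posterior inclusion probabilities

newx <- data.frame(X1 = 0.5, X2 = -1, X3 = 0, X4 = 2)
preds <- sapply(fits, predict, newdata = newx)

pred_bma <- sum(post * preds)                 # BMA: probability-weighted average
hpm <- which.max(post)                        # highest probability model
pred_hpm <- preds[hpm]
mpm_gamma <- pip > 0.5                        # median probability model
pred_mpm <- predict(fit_model(mpm_gamma), newdata = newx)
bpm <- which.min((preds - pred_bma)^2)        # model closest to BMA at newx
pred_bpm <- preds[bpm]
```

With enumeration the weights are exact for this approximation, so differences among the four predictions reflect the choice of estimator rather than Monte Carlo error.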


(29)

HPM

pred.HPM = predict(diabetes.bas,
                   newdata=diabetes.test,
                   estimator="HPM", se.fit=TRUE)
cv.summary.bas(pred.HPM$fit, diabetes.test$y)

## [1] 0.6910106

(30)

MPM

pred.MPM = predict(diabetes.bas,

newdata=diabetes.test, estimator="MPM")

cv.summary.bas(pred.MPM$fit, diabetes.test$y)

# broken in 1.5.3 :-(

(31)

BPM

pred.BPM = predict(diabetes.bas,

newdata=diabetes.test, estimator="BPM")

cv.summary.bas(pred.BPM$fit, diabetes.test$y)

## [1] 0.6881558

(32)

Coverage

CI.bas = confint(pred.bas)

mean(CI.bas[,1] < diabetes.test$y &

diabetes.test$y <= CI.bas[,2])

## [1] 0.99
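The coverage calculation above is the fraction of test responses that fall inside their intervals. A self-contained sketch with simulated intervals shows the same computation; the data and the helper name coverage are illustrative assumptions, not the diabetes results.

```r
# Fraction of observations falling in (lower, upper]
coverage <- function(lower, upper, y) mean(lower < y & y <= upper)

set.seed(3)
y <- rnorm(1000)                       # simulated "test responses"
lower <- rep(qnorm(0.025), 1000)       # exact 95% interval for a N(0, 1) draw
upper <- rep(qnorm(0.975), 1000)
cov_hat <- coverage(lower, upper, y)
cov_hat                                # should be near the nominal 0.95
```

An empirical coverage well above the nominal level, as with the 0.99 reported above, suggests the intervals are conservative on this test set.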

(33)

Credible Intervals

plot(CI.bas)

## NULL

points(1:nrow(diabetes.test), diabetes.test$y, col="red", pch=20)

[Figure: BMA prediction intervals by test case, with the observed responses overlaid as red points. x-axis: case; y-axis: predicted values.]

(34)

Coverage with HPM

CI.HPM = confint(pred.HPM)

mean(CI.HPM[,1] < diabetes.test$y &

diabetes.test$y <= CI.HPM[,2])

## [1] 0.99

(35)

Credible Intervals HPM

plot(CI.HPM)

## NULL

points(1:nrow(diabetes.test), diabetes.test$y, col="red", pch=20)

[Figure: HPM prediction intervals by test case, with the observed responses overlaid as red points. x-axis: case; y-axis: predicted values.]

(36)

Calculation of CI

(37)

Summary

- prediction intervals account for model uncertainty
- multiple shrinkage with BMA
- avoids problem of "winner's curse" or post-selection inference
- choice of estimator and loss


(40)