MCMC in Bayesian Variable Selection/Model Averaging
Readings: See website
11/18/2019
Example with Diabetes data
First, let's load the data (from Hoff) and coerce them to data frames.
set.seed(8675309)
source("yX.diabetes.train.txt")
diabetes.train = as.data.frame(diabetes.train)
dim(diabetes.train)
## [1] 342 65
source("yX.diabetes.test.txt")
diabetes.test = as.data.frame(diabetes.test)
colnames(diabetes.test)[1] = "y"
Too many models (3.6893488 × 10^19) to enumerate!
BMA using BAS
library(BAS)
diabetes.bas = bas.lm(y ~ ., data=diabetes.train,
method="MCMC", prior='hyper-g-n', n.models = 10000,
MCMC.iterations=500000,
thin = 10, initprobs="eplogp", force.heredity=FALSE)
- The call runs an MCMC sampler that combines a random-walk Metropolis algorithm with a random swap step.
- The algorithm stops when the number of iterations exceeds MCMC.iterations or when n.models models have been visited.
- thin = 10 saves every 10th model.
- The initprobs = "eplogp" argument uses the p-values from OLS to compute approximate Bayes factors. These are used internally to sort the variables, which improves memory usage.
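To make the sampler described above concrete, here is a minimal toy sketch in base R. It is not the BAS implementation: the simulated data, the BIC-based approximation to the marginal likelihood, the uniform prior over models, and the names log_ml and propose are all assumptions made for illustration.

```r
# Toy sketch, NOT the BAS implementation: Metropolis sampling over
# inclusion vectors gamma in {0,1}^p with a flip move and a swap move.
set.seed(1)
n <- 100; p <- 6
X <- matrix(rnorm(n * p), n, p)
y <- 2 * X[, 1] - 1.5 * X[, 3] + rnorm(n)  # only x1 and x3 are active

# Assumption: approximate the log marginal likelihood by -BIC/2
# (up to an additive constant); the prior over models is uniform.
log_ml <- function(gamma) {
  Xg <- cbind(1, X[, gamma == 1, drop = FALSE])
  rss <- sum(lm.fit(Xg, y)$residuals^2)
  -0.5 * (n * log(rss / n) + ncol(Xg) * log(n))
}

# Symmetric proposal: with probability 1/2 flip one coordinate (random walk),
# otherwise swap one included variable with one excluded variable.
propose <- function(gamma) {
  inc <- which(gamma == 1); exc <- which(gamma == 0)
  if (runif(1) < 0.5) {
    j <- sample.int(length(gamma), 1)
    gamma[j] <- 1 - gamma[j]
  } else if (length(inc) > 0 && length(exc) > 0) {
    gamma[inc[sample.int(length(inc), 1)]] <- 0
    gamma[exc[sample.int(length(exc), 1)]] <- 1
  }
  gamma  # unchanged if a swap was chosen but is impossible
}

n_iter <- 5000; thin <- 10
gamma <- rep(0, p); cur <- log_ml(gamma)
draws <- matrix(0, n_iter, p)
for (t in seq_len(n_iter)) {
  cand <- propose(gamma)
  cand_ml <- log_ml(cand)
  # Metropolis acceptance step for a symmetric proposal
  if (log(runif(1)) < cand_ml - cur) { gamma <- cand; cur <- cand_ml }
  draws[t, ] <- gamma
}

# Thinning: keep every 10th sampled model
kept <- draws[seq(thin, n_iter, by = thin), , drop = FALSE]
pip_mcmc <- colMeans(kept)  # Monte Carlo inclusion frequencies
print(round(pip_mcmc, 2))
```

Because both moves are symmetric, the acceptance ratio reduces to the ratio of (approximate) marginal likelihoods; a non-uniform model prior would add a prior ratio to that step.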
Checking Convergence
The function diagnostics compares
- estimates of posterior probabilities computed from Monte Carlo frequencies (asymptotically unbiased) to
- estimates based on normalizing the marginal likelihoods times prior probabilities over just the sampled models (Fisher consistent; recovers the exact values under enumeration).
- The two sets of estimates should agree if the chain has converged.
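The comparison above can be illustrated with a small toy sketch in base R. The weights w are hypothetical unnormalized posterior weights (marginal likelihood times prior), and independent draws stand in for MCMC output; neither comes from the diabetes analysis.

```r
# Toy sketch: Monte Carlo frequencies vs. renormalized estimates.
set.seed(2)

# Hypothetical unnormalized posterior weights p(Y | M) p(M) for 8 models
w <- c(50, 20, 10, 8, 5, 4, 2, 1)
post <- w / sum(w)  # exact posterior model probabilities

# Stand-in for MCMC output: independent draws of model indices
draws <- sample(seq_along(w), size = 2000, replace = TRUE, prob = post)
visited <- sort(unique(draws))

# (1) Monte Carlo estimate: relative visit frequencies
p_mcmc <- as.numeric(table(factor(draws, levels = visited))) / length(draws)

# (2) Renormalized estimate: weights normalized over the sampled models only
p_rn <- w[visited] / sum(w[visited])

print(round(cbind(model = visited, p_mcmc, p_rn), 3))
max_gap <- max(abs(p_mcmc - p_rn))
print(max_gap)
```

A small gap between the two columns is what agreement looks like; a large gap for the high-probability models would suggest the sampler has not run long enough.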
Convergence of Posterior Inclusion Probabilities
diagnostics(diabetes.bas, type="pip")
[Figure: Convergence Plot: Posterior Inclusion Probabilities. Axes: pip (renormalized) vs. pip (MCMC), both from 0 to 1.]
Close, but could run longer!
Convergence of model probabilities
diagnostics(diabetes.bas, type="model")
[Figure: Convergence Plot: Posterior Model Probabilities. Axes: p(M | Y) (renormalized) vs. p(M | Y) (MCMC), both from 0 to about 0.12.]
Clearly, not long enough.
Rerun longer
diabetes.bas = bas.lm(y ~ ., data=diabetes.train,
method="MCMC", prior='hyper-g-n', n.models = 15000,
MCMC.iterations=10^7,
thin = 64, # keep every p-th model
initprobs="eplogp",
force.heredity=FALSE)
Diagnostics
diagnostics(diabetes.bas, type="model")
[Figure: Convergence Plot: Posterior Model Probabilities. Axes: p(M | Y) (renormalized) vs. p(M | Y) (MCMC), both from 0 to about 0.10.]
Convergence of Posterior Inclusion Probabilities
diagnostics(diabetes.bas, type="pip")
[Figure: Convergence Plot: Posterior Inclusion Probabilities. Axes: pip (renormalized) vs. pip (MCMC), both from 0 to 1.]
OK?
Highest Probability Model
One problem with MCMC and model selection is that the “best”
model may be visited very few times.
summary(diabetes.bas$freq)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 2.000 4.000 7.825 8.000 4127.000
Comments
- I routinely use a much larger number of MCMC iterations than I see in papers when looking at BMA or model selection (i.e. millions rather than, say, 10,000).
- Explore different numbers of models: does it make a difference for estimating marginal inclusion probabilities or posterior inclusion probabilities? What about predictions under BMA?
- Check memory! Objects can be large in R's memory.
What variables are included?
image(diabetes.bas)
[Figure: image plot of the top-ranked models. x-axis: Model Rank (1 to 20) with Log Posterior Odds; y-axis labels: Intercept, sex, map, ldl, tch, glu, `bmi^2`, `tc^2`, `hdl^2`, `ltg^2`, `age:sex`, `age:map`, `age:ldl`, `age:tch`, `age:glu`, `sex:map`, `sex:ldl`, `sex:tch`, `sex:glu`, `bmi:tc`, `bmi:hdl`, `bmi:ltg`, `map:tc`, `map:hdl`, `map:ltg`, `tc:ldl`, `tc:tch`, `tc:glu`, `ldl:tch`, `ldl:glu`, `hdl:ltg`, `tch:ltg`, `ltg:glu`.]
Prediction
Use BMA for prediction and compute the root mean squared error (RMSE) between the predicted and actual responses for the test data:
pred.bas = predict(diabetes.bas,
                   newdata=diabetes.test, estimator="BMA", se.fit=TRUE)
cv.summary.bas(pred.bas$fit, diabetes.test$y)
## [1] 0.6739172
Scoring
BAS::cv.summary.bas
## function (pred, ytrue, score = "squared-error")
## {
## if (length(pred) != length(ytrue)) {
## stop("predicted values and observed values are not the same length")
## return()
## }
## if (!(score %in% c("squared-error", "miss-class"))) {
## stop(paste("score ", score, "not implemented; please check spelling"))
## }
## if (score == "miss-class") {
## pred.class <- ifelse(pred < 0.5, 0, 1)
## confusion <- table(pred.class, ytrue)
## error <- (sum(confusion) - sum(diag(confusion)))/sum(confusion)
## }
## else {
## error <- sqrt(sum((pred - ytrue)^2)/length(ytrue))
## }
## return(error)
## }
## <bytecode: 0x7fb255d10130>
## <environment: namespace:BAS>
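The default "squared-error" branch printed above takes a square root, so the score is a root mean squared error rather than a plain MSE. A minimal base-R sketch of that branch on made-up numbers (the helper name rmse and the vectors are hypothetical, for illustration only):

```r
# Sketch of the "squared-error" branch shown above: root mean squared error.
rmse <- function(pred, ytrue) {
  stopifnot(length(pred) == length(ytrue))
  sqrt(sum((pred - ytrue)^2) / length(ytrue))
}

pred  <- c(1, 2, 3)   # hypothetical predictions
ytrue <- c(1, 2, 5)   # hypothetical observed responses
score <- rmse(pred, ytrue)
print(score)          # equals sqrt(4/3)
```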
Fit the Lasso
suppressMessages(library(lars))
diabetes.lasso = lars(as.matrix(diabetes.train[, -1]), diabetes.train[,1],
type="lasso")
Plot of Lasso
plot(diabetes.lasso)
[Figure: LASSO coefficient paths. x-axis: |beta|/max|beta| (0 to 1); y-axis: Standardized Coefficients.]
Best Cp Model for Lasso
best = which.min(diabetes.lasso$Cp)
best
## 26
## 27
out.lasso = predict(diabetes.lasso, s=best[1], as.matrix(diabetes.test[,-1]))
cv.summary.bas(out.lasso$fit, diabetes.test$y)
## [1] 0.6890253
Close
Choice of estimator
- BMA: optimal under squared error loss (Bayes).
- Highest Posterior Probability Model (HPM): not optimal for prediction, but used for selection.
- Median Probability Model (MPM): select the variables with PIP > 0.5; optimal under certain conditions (nested models).
- Best Predictive Model (BPM): the model whose predictions are closest to BMA under squared error loss.
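The four estimators can be contrasted on a tiny, fully enumerated model space. The sketch below uses base R only; the three models, their posterior probabilities, and the per-model predictions for a single new case are hypothetical numbers chosen so that the estimators disagree.

```r
# Toy sketch: BMA, HPM, MPM and BPM on three hypothetical models.
models <- rbind(M1 = c(1, 0),
                M2 = c(0, 1),
                M3 = c(1, 1))          # rows = models, columns = variables
colnames(models) <- c("x1", "x2")
postprob <- c(0.4, 0.3, 0.3)           # hypothetical p(M | Y)
pred_m   <- c(1.0, 2.0, 3.0)           # hypothetical prediction from each model

# Posterior inclusion probabilities: sum p(M | Y) over models containing x_j
pip <- colSums(models * postprob)

# BMA: posterior-weighted average of the model-specific predictions
pred_bma <- sum(postprob * pred_m)

# HPM: the single model with the highest posterior probability
hpm <- rownames(models)[which.max(postprob)]

# MPM: include every variable whose PIP exceeds 0.5
mpm_gamma <- as.numeric(pip > 0.5)
mpm <- rownames(models)[apply(models, 1, function(g) all(g == mpm_gamma))]

# BPM: the model whose prediction is closest to BMA in squared error
bpm <- rownames(models)[which.min((pred_m - pred_bma)^2)]

print(pip)
print(c(BMA = pred_bma))
print(c(HPM = hpm, MPM = mpm, BPM = bpm))
```

In this made-up example the three selection rules each pick a different model, which is why the choice of estimator should follow from the loss function and the goal (prediction versus selection).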
HPM
pred.HPM = predict(diabetes.bas,
                   newdata=diabetes.test,
                   estimator="HPM", se.fit=TRUE)
cv.summary.bas(pred.HPM$fit, diabetes.test$y)
## [1] 0.6910106
MPM
pred.MPM = predict(diabetes.bas,
newdata=diabetes.test, estimator="MPM")
cv.summary.bas(pred.MPM$fit, diabetes.test$y)
# broken in 1.5.3 :-(
BPM
pred.BPM = predict(diabetes.bas,
newdata=diabetes.test, estimator="BPM")
cv.summary.bas(pred.BPM$fit, diabetes.test$y)
## [1] 0.6881558
Coverage
CI.bas = confint(pred.bas)
mean(CI.bas[,1] < diabetes.test$y &
diabetes.test$y <= CI.bas[,2])
## [1] 0.99
Credible Intervals
plot(CI.bas)
## NULL
points(1:nrow(diabetes.test), diabetes.test$y, col="red", pch=20)
[Figure: BMA prediction intervals by test case, with observed responses overlaid as red points. x-axis: case; y-axis: predicted values.]
Coverage with HPM
CI.HPM = confint(pred.HPM)
mean(CI.HPM[,1] < diabetes.test$y &
diabetes.test$y <= CI.HPM[,2])
## [1] 0.99
Credible Intervals HPM
plot(CI.HPM)
## NULL
points(1:nrow(diabetes.test), diabetes.test$y, col="red", pch=20)
[Figure: HPM prediction intervals by test case, with observed responses overlaid as red points. x-axis: case; y-axis: predicted values.]
Calculation of CI
Summary
- Prediction intervals account for model uncertainty.
- Multiple shrinkage with BMA.
- Avoids the problem of the “winner’s curse” or post-selection inference.
- Choice of estimator and loss.