• No results found

MCMC in Bayesian Variable Selection/Model Averaging

N/A
N/A
Protected

Academic year: 2021

Share "MCMC in Bayesian Variable Selection/Model Averaging"

Copied!
40
0
0

Loading.... (view fulltext now)

Full text

(1)

MCMC in Bayesian Variable Selection/Model

Averaging

Readings: See website

11/18/2019

(2)

Example with Diabetes data

First lets load the data (from Hoff) and coerce them to be dataframes.

set.seed(8675309)

source("yX.diabetes.train.txt")

diabetes.train = as.data.frame(diabetes.train) dim(diabetes.train)

## [1] 342 65

source("yX.diabetes.test.txt")

diabetes.test = as.data.frame(diabetes.test) colnames(diabetes.test)[1] = "y"

Too many models 3.6893488 × 1019 to enumerate!

(3)

BMA using BAS

library(BAS)

diabetes.bas = bas.lm(y ~ ., data=diabetes.train,

method="MCMC", prior='hyper-g-n', n.models = 10000,

MCMC.iteration=500000,

thin = 10, initprobs="eplogp", force.heredity=FALSE)

I run a MCMC sampler that combines a Random Walk Metropolis algorithm with a random swap step.

I The algorithm stops when the number of iterations exceeds MCMC.iterations or n.models have been visited.

I thin save every 10th model.

I initprobs argument uses the p-values from OLS to compute an approximate Bayes Factor. These are used internally to sort the variables which provides improved memory usage.

(4)

BMA using BAS

library(BAS)

diabetes.bas = bas.lm(y ~ ., data=diabetes.train,

method="MCMC", prior='hyper-g-n', n.models = 10000,

MCMC.iteration=500000,

thin = 10, initprobs="eplogp", force.heredity=FALSE)

I run a MCMC sampler that combines a Random Walk Metropolis algorithm with a random swap step.

I The algorithm stops when the number of iterations exceeds MCMC.iterations or n.models have been visited.

I thin save every 10th model.

I initprobs argument uses the p-values from OLS to compute an approximate Bayes Factor. These are used internally to sort the variables which provides improved memory usage.

(5)

BMA using BAS

library(BAS)

diabetes.bas = bas.lm(y ~ ., data=diabetes.train,

method="MCMC", prior='hyper-g-n', n.models = 10000,

MCMC.iteration=500000,

thin = 10, initprobs="eplogp", force.heredity=FALSE)

I run a MCMC sampler that combines a Random Walk Metropolis algorithm with a random swap step.

I The algorithm stops when the number of iterations exceeds MCMC.iterations or n.models have been visited.

I thin save every 10th model.

I initprobs argument uses the p-values from OLS to compute an approximate Bayes Factor. These are used internally to sort the variables which provides improved memory usage.

(6)

BMA using BAS

library(BAS)

diabetes.bas = bas.lm(y ~ ., data=diabetes.train,

method="MCMC", prior='hyper-g-n', n.models = 10000,

MCMC.iteration=500000,

thin = 10, initprobs="eplogp", force.heredity=FALSE)

I run a MCMC sampler that combines a Random Walk Metropolis algorithm with a random swap step.

I The algorithm stops when the number of iterations exceeds MCMC.iterations or n.models have been visited.

I thin save every 10th model.

I initprobs argument uses the p-values from OLS to compute an approximate Bayes Factor. These are used internally to sort the variables which provides improved memory usage.

(7)

Checking Convergence

The functiondiagnostics compares

I estimates of posterior probabilities estimated from Monte Carlo frequencies (asymptotically unbiased) to

I estimates based on normalizing marginal likelihoods times prior probabilities of just the sampled models. (Fisher consistent; recovers exact under enumeration)

I should be equal if the chain has converged.

(8)

Checking Convergence

The functiondiagnostics compares

I estimates of posterior probabilities estimated from Monte Carlo frequencies (asymptotically unbiased) to

I estimates based on normalizing marginal likelihoods times prior probabilities of just the sampled models. (Fisher consistent;

recovers exact under enumeration)

I should be equal if the chain has converged.

(9)

Checking Convergence

The functiondiagnostics compares

I estimates of posterior probabilities estimated from Monte Carlo frequencies (asymptotically unbiased) to

I estimates based on normalizing marginal likelihoods times prior probabilities of just the sampled models. (Fisher consistent;

recovers exact under enumeration)

I should be equal if the chain has converged.

(10)

Convergence of Posterior Inclusion Probabilities

diagnostics(diabetes.bas, type="pip")

0.0 0.2 0.4 0.6 0.8 1.0

0.00.20.40.60.81.0

Convergence Plot: Posterior Inclusion Probabilities

pip (renormalized)

pip (MCMC)

Close, but could run longer!

(11)

Convergence of model probabilities

diagnostics(diabetes.bas, type="model")

0.00 0.02 0.04 0.06 0.08 0.10 0.12

0.000.020.040.060.080.100.12

Convergence Plot: Posterior Model Probabilities

p(M | Y) (renormalized)

p(M | Y) (MCMC)

Clearly, not long enough.

(12)

rerun longer

diabetes.bas = bas.lm(y ~ ., data=diabetes.train,

method="MCMC", prior='hyper-g-n', n.models = 15000,

MCMC.iterations=10^7, thin = 64, # keep every p initprobs="eplogp",

force.heredity=FALSE)

(13)

Diagnostics

diagnostics(diabetes.bas, type="model")

0.00 0.02 0.04 0.06 0.08 0.10

0.000.020.040.060.080.10

Convergence Plot: Posterior Model Probabilities

p(M | Y) (renormalized)

p(M | Y) (MCMC)

(14)

Convergence of Posterior Inclusion Probabilities

diagnostics(diabetes.bas, type="pip")

0.0 0.2 0.4 0.6 0.8 1.0

0.00.20.40.60.81.0

Convergence Plot: Posterior Inclusion Probabilities

pip (renormalized)

pip (MCMC)

OK?

(15)

Highest Probability Model

One problem with MCMC and model selection is that the “best”

model may be visited very few times.

summary(diabetes.bas$freq)

## Min. 1st Qu. Median Mean 3rd Qu. Max.

## 1.000 2.000 4.000 7.825 8.000 4127.000

(16)

Comments

I I routinely use much larger numbers of MCMC iterations than I see in papers if looking at BMA or model selection (i.e. millions rather than say 10,000)

I Explore different numbers of models - does it make a difference for estimating marginal inclusion probabilities or posterior inclusion probabilities? What about predictions under BMA? I Check memory ! items can be large in R memory

(17)

Comments

I I routinely use much larger numbers of MCMC iterations than I see in papers if looking at BMA or model selection (i.e. millions rather than say 10,000)

I Explore different numbers of models - does it make a difference for estimating marginal inclusion probabilities or posterior inclusion probabilities? What about predictions under BMA?

I Check memory ! items can be large in R memory

(18)

Comments

I I routinely use much larger numbers of MCMC iterations than I see in papers if looking at BMA or model selection (i.e. millions rather than say 10,000)

I Explore different numbers of models - does it make a difference for estimating marginal inclusion probabilities or posterior inclusion probabilities? What about predictions under BMA?

I Check memory ! items can be large in R memory

(19)

What variables are included?

image(diabetes.bas)

00.3780.6130.8441.0741.281.492.586 201311987654321 Model Rank

Log Posterior Odds Intercept sex map ldl tch glu ‘bmi^2‘ ‘tc^2‘ ‘hdl^2‘ ‘ltg^2‘ ‘age:sex‘ ‘age:map‘ ‘age:ldl‘ ‘age:tch‘ ‘age:glu‘ ‘sex:map‘ ‘sex:ldl‘ ‘sex:tch‘ ‘sex:glu‘ ‘bmi:tc‘ ‘bmi:hdl‘ ‘bmi:ltg‘ ‘map:tc‘ ‘map:hdl‘ ‘map:ltg‘ ‘tc:ldl‘ ‘tc:tch‘ ‘tc:glu‘ ‘ldl:tch‘ ‘ldl:glu‘ ‘hdl:ltg‘ ‘tch:ltg‘ ‘ltg:glu‘

(20)

Prediction

BMA for prediction and compute the MSE between the predicted and actual responses for the test data using BMA:

pred.bas = predict(diabetes.bas,

newdata=diabetes.test, estimator="BMA", se.fit=T) cv.summary.bas(pred.bas$fit, diabetes.test$y)

## [1] 0.6739172

(21)

Scoring

BAS::cv.summary.bas

## function (pred, ytrue, score = "squared-error")

## {## if (length(pred) != length(ytrue)) {

## stop("predicted values and observed values are not the same length")

## return()

## }

## if (!(score %in% c("squared-error", "miss-class"))) {

## stop(paste("score ", score, "not implemented; please check spelling"))

## }

## if (score == "miss-class") {

## pred.class <- ifelse(pred < 0.5, 0, 1)

## confusion <- table(pred.class, ytrue)

## error <- (sum(confusion) - sum(diag(confusion)))/sum(confusion)

## }

## else {

## error <- sqrt(sum((pred - ytrue)^2)/length(ytrue))

## }

## return(error)

## }## <bytecode: 0x7fb255d10130>

## <environment: namespace:BAS>

(22)

Fit the Lasso

suppressMessages(library(lars))

diabetes.lasso = lars(as.matrix(diabetes.train[, -1]), diabetes.train[,1],

type="lasso")

(23)

Plot of Lasso

plot(diabetes.lasso)

**************************************************************************** ******** *** * **** ** * ** * * ** * *

0.0 0.2 0.4 0.6 0.8 1.0

−100−50050100

|beta|/max|beta|

Standardized Coefficients

**************************************************************************** ******** *** * **** *** ** * * ** * *

**************************************************************************** ******** *** * **** ** * ** * * ** * *

**************************************************************************** ******** *** * **** ** * ** * * ** * *

**************************************************************************** ******** *** * **** *** ** * * ** *

*

**************************************************************************** ******** *** * **** ** * ** * * ** *

*

**************************************************************************** ******** *** * **** *** ** * * ** *

*

**************************************************************************** ******** *** * **** ** * ** * * ** * *

**************************************************************************** ******** *** * **** ** * ** * * ** *

*

**************************************************************************** ******** *** * **** ** * ** * * ** * *

**************************************************************************** ******** *** * **** ** * ** * * ** * *

**************************************************************************** ******** *** * **** ** * ** * * ** * *

**************************************************************************** ******** *** * **** ** * ** * * ** * *

**************************************************************************** *********** * ****** ***

*

*

** * *

**************************************************************************** ******** *** * ****** * **

*

*

** * *

**************************************************************************** ******** *** * **** *** ** * *

** * *

************************************************************************************ *** * **** ** * ** * * ** * *

************************************************************************************ *** * **** ** *** * * ** * *

**************************************************************************** ******** *** * **** ** * ** * * ** * *

**************************************************************************** ******** *** * **** ** * ** * * ** * *

**************************************************************************** ******** *** * **** ** * ** * * ** * *

**************************************************************************** ******** *** * **** ** * ** * * ** * *

**************************************************************************** ******** *** * **** *** ** * * ** * *

**************************************************************************** ******** *** * **** *** ** * * ** * *

**************************************************************************** ******** *** * ****** * ** * * ** * *

**************************************************************************** ******** *** * **** ** * ** * * ** * *

**************************************************************************** ******** *** * **** ** * ** * * ** * *

**************************************************************************** ******** *** * **** ** * ** * * ** * *

**************************************************************************** ******** *** * **** ** * ** * * ** * *

**************************************************************************** ******** *** * **** ** * ** * * ** * *

**************************************************************************** ******** *** * **** ** * ** * * ** * *

**************************************************************************** ******** *** ***** *** ** * * ** * *

**************************************************************************** ******** *** * **** *** ** * * ** * *

**************************************************************************** ******** *** * **** ** * ** * * ** * *

**************************************************************************** ******** *** * **** *** ** * * ** * *

**************************************************************************** ******** *** * **** ** * ** * * ** * *

**************************************************************************** ******** *** * **** ** * ** * * ** * *

************************************************************************************ *** * **** *** ** * * ** * *

**************************************************************************** ******** *** * **** ** * ** * * ** * *

**************************************************************************** ******** *** * **** ** * ** * * ** * *

**************************************************************************** ******** *** * **** ** * ** * * ** * *

**************************************************************************** ******** *** * **** ** * ** * * ** * *

**************************************************************************** ******** *** * **** ** * ** * * ** * *

**************************************************************************** ******** *** ***** ** * ** * * ** * *

**************************************************************************** ******** *** * **** *** ** * * ** * *

**************************************************************************** ******** *** * **** *** ** * * ** * *

**************************************************************************** ******** *** * **** ** * ** * * ** * *

**************************************************************************** ******** *** * **** ** * ** * * ** * *

**************************************************************************** ******** *** * **** *** ** * * ** * *

**************************************************************************** ********

*** * ****

** ***

*

*

** *

*

**************************************************************************** ******** *** * ****** * **

*

*

** * *

**************************************************************************** ******** *** ***** ** * **

* *

** * *

**************************************************************************** ********

*** * ****** * **

*

*

** *

*

**************************************************************************** ******** *** * **** ** * ** * * ** * *

**************************************************************************** ******** *** ***** *** **

* *

** * *

**************************************************************************** ******** *** * **** ** * ** * *

** * *

**************************************************************************** *********** * ****** * **

* *

** *

*

**************************************************************************** ******** *** * **** *** ** * * ** * *

**************************************************************************** ******** *** * **** ** * ** * * ** * *

**************************************************************************** ******** *** * ****** * ** * * ** *

*

**************************************************************************** ******** *** * **** *** ** * * ** * *

**************************************************************************** ******** *** * **** *** ** * * ** * *

**************************************************************************** ******** *** * **** ** * ** * * ** * *

**************************************************************************** ******** *** * **** ** * ** * * ** * *

LASSO

55051523831557146

0 36 65 76 83 88 94 97 98 99 102

(24)

Best Cp Model for Lasso

best = which.min(diabetes.lasso$Cp) best

## 26

## 27

out.lasso = predict(diabetes.lasso, s=best[1], as.matrix(diabetes.test[,-1])) cv.summary.bas(out.lasso$fit, diabetes.test$y)

## [1] 0.6890253 Close

(25)

Choice of estimator

I BMA - optimal for squared error loss Bayes

I Highest Posterior Probability model (not optimal for prediction) but for selection

I Median Probabilty model (select model where PIP > 0.5) (optimal under certain conditions; nested models)

I Best Probability Model - Model closest to BMA under loss

(26)

Choice of estimator

I BMA - optimal for squared error loss Bayes

I Highest Posterior Probability model (not optimal for prediction) but for selection

I Median Probabilty model (select model where PIP > 0.5) (optimal under certain conditions; nested models)

I Best Probability Model - Model closest to BMA under loss

(27)

Choice of estimator

I BMA - optimal for squared error loss Bayes

I Highest Posterior Probability model (not optimal for prediction) but for selection

I Median Probabilty model (select model where PIP > 0.5) (optimal under certain conditions; nested models)

I Best Probability Model - Model closest to BMA under loss

(28)

Choice of estimator

I BMA - optimal for squared error loss Bayes

I Highest Posterior Probability model (not optimal for prediction) but for selection

I Median Probabilty model (select model where PIP > 0.5) (optimal under certain conditions; nested models)

I Best Probability Model - Model closest to BMA under loss

(29)

HPM

pred.HPM = predict(diabetes.bas,

newdata=diabetes.test,

estimator="HPM", se.fit=TRUE) cv.summary.bas(pred.HPM$fit, diabetes.test$y)

## [1] 0.6910106

(30)

MPM

pred.MPM = predict(diabetes.bas,

newdata=diabetes.test, estimator="MPM")

cv.summary.bas(pred.MPM$fit, diabetes.test$y)

# broken in 1.5.3 :-(

(31)

BPM

pred.BPM = predict(diabetes.bas,

newdata=diabetes.test, estimator="BPM")

cv.summary.bas(pred.BPM$fit, diabetes.test$y)

## [1] 0.6881558

(32)

Coverage

CI.bas = confint(pred.bas)

mean(CI.bas[,1] < diabetes.test$y &

diabetes.test$y <= CI.bas[,2])

## [1] 0.99

(33)

Credible Intervals

plot(CI.bas)

## NULL

points(1:nrow(diabetes.test), diabetes.test$y, col="red", pch=20)

−3

−2

−1 0 1 2 3 4

case

predicted values

1 4 7 11 15 19 23 27 31 35 39 43 47 51 55 59 63 67 71 75 79 83 87 91 95 99

(34)

Coverage with HPM

CI.HPM = confint(pred.HPM)

mean(CI.HPM[,1] < diabetes.test$y &

diabetes.test$y <= CI.HPM[,2])

## [1] 0.99

(35)

Credible Intervals HPM

plot(CI.HPM)

## NULL

points(1:nrow(diabetes.test), diabetes.test$y, col="red", pch=20)

−4

−2 0 2 4

case

predicted values

1 4 7 11 15 19 23 27 31 35 39 43 47 51 55 59 63 67 71 75 79 83 87 91 95 99

(36)

Calculation of CI

(37)

Summary

I prediction intervals account for model uncertainty

I multiple shrinkage with BMA

I avoids problem of “winner’s curse” or post-selection inference I choice of estimator and loss

(38)

Summary

I prediction intervals account for model uncertainty I multiple shrinkage with BMA

I avoids problem of “winner’s curse” or post-selection inference I choice of estimator and loss

(39)

Summary

I prediction intervals account for model uncertainty I multiple shrinkage with BMA

I avoids problem of “winner’s curse” or post-selection inference

I choice of estimator and loss

(40)

Summary

I prediction intervals account for model uncertainty I multiple shrinkage with BMA

I avoids problem of “winner’s curse” or post-selection inference I choice of estimator and loss

References

Related documents

An '-' entry in the estimate column indicates that either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot

No regions-of-interest showed signi fi cantly different metabolism in the HR group versus the LR group in the placebo condition, however compared with the LR group, the HR

Over the entire period from 1954 to 2001, the quality- adjusted measures show that the role of investment- specific technological change has been considerable, accounting for 60 to

For information regarding Special Research Interest Groups--Adult Community and Continuing Education, Affective Response, Creativity, Early Childhood, General Research,

In our scheme we solve above two problems at the same time, and develop the certificateless public cryptosys- tem by combining the RSA signature using smart cards and

Appendix A – Operational Definitions of Good Practice/Liberal Arts Experience Scales Good teaching and high quality interactions with faculty is a 23-item scale with an alpha

Performance and power analysis of rcce message passing on the intel single chip cloud computer. In Proceedings of the 4th Many-core Applications Research Community (MARC) Symposium

To the authors’ knowledge, this is the first study to establish expert consensus at a national level about which behaviours should be targeted in relation to antimicrobial