
Bayesian Penalized Methods for High Dimensional Data


Academic year: 2021


(1)

Bayesian Penalized Methods for High Dimensional Data

Joseph G. Ibrahim

Joint with Hongtu Zhu

 

(2)

Motivation

GLRR: Bayesian Generalized Low Rank Regression

L2R2: Bayesian Longitudinal Low Rank Regression

ADNI data analysis

(3)
(4)

Alzheimer's Disease

Alzheimer's disease (AD) is an escalating national epidemic and a genetically complex, progressive, and fatal neurodegenerative disease.

The incidence of AD doubles every five years after the age of 65, and the number of AD patients has recently increased dramatically, which has caused a heavy socioeconomic burden.

AD is the sixth-leading cause of death in the United States, and

(5)

ADNI Database

The Alzheimer's Disease Neuroimaging Initiative (ADNI) is the first "Big Data" project for AD and is collecting imaging, genetic, clinical, and cognitive data for measuring the progress of AD or the effects of treatment.

ADNI began in 2004 and has three phases: ADNI 1, ADNI GO, and ADNI 2.

Efficiently integrating big ADNI data may lead to

(AD1) detecting AD at the earliest stage possible and marking its progress through biomarkers;

(AD2) developing new diagnostic methods for AD intervention, prevention and

(6)

ADNI Database

ADNI 1. Integrating imaging and genetic data to identify genetic and environmental contributions to brain baseline data and brain development trajectories.

Model: Brain volume = f(SNP, age, gender, …)

Data:

Genotype: SNPs (X) (≈600,000+)

MRI ROI (region-of-interest volumes = Y) (93)

Prognostic factors: age, gender, education, etc.

(7)

Magnetic Resonance Imaging (MRI)

A voxel is the 3-D version of a pixel

The MRI machine reads the signal on a voxel and stores it in a 3-D array

sMRI = structure of brain

fMRI = brain activity from blood flow

Voxel: n subjects will yield an n × 6 million matrix

ROIs reduce dimension to 93 ROIs

(8)

Single Nucleotide Polymorphism (SNP)

Normal (not rare) different nucleotides in the same location

SNPs may affect gene function

ADNI: 600,000 SNPs

n = 750 << 600,000 SNPs

Select SNPs only on top 40 genes reported by

(9)

Bayesian Shrinkage and Selection

Prior: -log(prior) = penalty function =

Posterior:

Frequentist penalized estimation

Maximum a posteriori (MAP) estimation

MLE sets penalty to 0 (MAP with non
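The link between the prior and the penalty can be written out as a short derivation. The symbols below (likelihood L, prior π, penalty pen) are generic notation assumed here for illustration, since the slide's own formulas did not survive extraction.

```latex
\begin{align*}
\pi(\theta \mid y) &\propto L(\theta; y)\,\pi(\theta), \\
-\log \pi(\theta \mid y) &= -\log L(\theta; y)
    + \underbrace{\{-\log \pi(\theta)\}}_{\mathrm{pen}(\theta)} + \text{const}, \\
\hat{\theta}_{\mathrm{MAP}} &= \arg\min_{\theta}
    \bigl\{ -\log L(\theta; y) + \mathrm{pen}(\theta) \bigr\}.
\end{align*}
```

When the prior is flat, pen(θ) is constant and the MAP estimate coincides with the MLE.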

(10)

Bayesian Shrinkage and Selection

Popular choice

α ≤ 1 → shrinkage and selection: creates singularity at 0 and a "black hole" to pull smaller elements to 0

Bridge regression: α < 1

L1 priors (lasso, adaptive lasso): α = 1

α > 1 → No selection, shrinkage only
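A minimal numerical sketch of the contrast between α = 1 and α > 1 follows. It assumes the "popular choice" is the bridge-type penalty λ|β|^α (the slide's formula was not recovered) and looks at a single coefficient with unpenalized estimate z, so the MAP problem is to minimize 0.5(β - z)² + λ|β|^α. The function names are illustrative only.

```python
import numpy as np

def map_alpha1(z, lam):
    """Minimizer of 0.5*(b - z)**2 + lam*|b| (soft thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def map_alpha2(z, lam):
    """Minimizer of 0.5*(b - z)**2 + lam*b**2 (ridge-type shrinkage)."""
    return z / (1.0 + 2.0 * lam)

z = np.array([-3.0, -0.4, 0.2, 0.9, 2.5])   # unpenalized estimates
lam = 0.5
b1 = map_alpha1(z, lam)   # small entries are set exactly to 0
b2 = map_alpha2(z, lam)   # every entry is shrunk, none is set to 0
print("alpha = 1:", b1)
print("alpha = 2:", b2)
```

With α = 1, any |z| ≤ λ is mapped to exactly zero (selection); with α = 2, each estimate is only rescaled by 1/(1 + 2λ).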

(11)

Black Hole Priors: α ≤ 1

Prior creates a singularity at the origin.

MAP estimation allows selection and shrinkage.

Unstable around the boundary.

(12)

Want huge spike (gravity) at the origin; gravity should pull the smaller coefficients to 0

Huge spike/gravity implies smaller coefficients shrink more

Smaller spike/gravity implies smaller coefficients shrink less

Distributional Perspective

Singularity/Discontinuity at the origin

(13)

Want heavy tails / minimum gravity / flat density far from the origin; gravity should not affect the larger coefficients

Steeper slope/stronger gravity implies larger coefficients shrink more

Flatter tail/weaker gravity implies larger coefficients shrink less

(14)

Commonly Used Priors

Larger spike at the origin and heavier tails

(15)
(16)

Do SNPs act alone or work together?

Do the ROIs also act together?

Do ROIs and SNPs acting together support some underlying structure in the regression coefficients?

We try to exploit this structure to reduce dimension

(17)

GLRR: Low Rank Regression

p* = r(p + d) << pd; for example, 5 × (1K + 1K) = 10K << 1K × 1K = 1 million
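The parameter count above can be checked with a small sketch. It assumes the coefficient matrix is factored as B = U Δ Vᵀ with U of size p × r, V of size d × r, and Δ diagonal, following the U, V, and Δ named elsewhere in the talk; the random entries are placeholders, not the authors' model.

```python
import numpy as np

rng = np.random.default_rng(0)
p, d, r = 1000, 1000, 5           # covariates, responses, assumed rank

U = rng.standard_normal((p, r))   # p x r left factor
delta = rng.standard_normal(r)    # r layer weights (diagonal of Delta)
V = rng.standard_normal((d, r))   # d x r right factor

B = (U * delta) @ V.T             # p x d coefficient matrix of rank at most r

n_full = p * d                    # free entries in an unrestricted B
n_low_rank = r * (p + d)          # entries in U and V, as counted on the slide
print(n_full, n_low_rank, np.linalg.matrix_rank(B))
```

The count r(p + d) follows the slide and ignores the r diagonal entries of Δ, which are negligible at this scale.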

 

(18)

U and V need not be unitary (orthonormal); otherwise need matrix VMF and Metropolis

No ordering restriction on elements of Δ; otherwise need truncated normal and Metropolis

Many Bayesian applications do not require identifiability

Allows closed form full conditionals to apply Gibbs sampler

scale to larger dimensions

computational efficiency
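A small sketch of why dropping these constraints is harmless for B itself, assuming the factorization B = U Δ Vᵀ with diagonal Δ (an assumption made here for illustration): rescaling or reordering the layers changes U, V, and Δ but leaves B unchanged, so B remains well defined even though the factors are not identified.

```python
import numpy as np

rng = np.random.default_rng(1)
p, d, r = 6, 4, 2
U = rng.standard_normal((p, r))
V = rng.standard_normal((d, r))
delta = np.array([0.5, 2.0])      # deliberately not sorted

B = U @ np.diag(delta) @ V.T

# Rescale the columns of U and V in opposite directions.
c = np.array([3.0, -0.25])
B_rescaled = (U * c) @ np.diag(delta) @ (V / c).T

# Permute the layers (columns of U and V and entries of delta together).
perm = [1, 0]
B_permuted = U[:, perm] @ np.diag(delta[perm]) @ V[:, perm].T

print(np.allclose(B, B_rescaled), np.allclose(B, B_permuted))
```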

(19)
(20)

GLRR: Model and Priors

Cov(Yi) =

(21)

GLRR: Why L2 Priors

If covariates are correlated:

L2 tends to push them towards each other

→ more correlated estimates (ridge); reason for our choice

L1 tends to pick one and force the rest to 0

least absolute shrinkage and selection operator (lasso)

True β   1     1     1     1     1     1     1     1     1     1
OLS      2.95  1.09  1.11  1.24  0.98  0.98  1.57  1.14  1.33  0.66
Ridge    1.13  1.02  0.75  1.19  0.86  0.99  1.46  1.03  1.21  0.62
Lasso    0     0     0     2.95  0     0.07  0.97  0     0.23  0
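The behavior in the table can be reproduced in miniature. The sketch below is a toy example, not the simulation behind the table: it uses two identical covariates, a closed-form ridge solution, and a plain coordinate-descent lasso started at zero. With exact duplicates the lasso solution is not unique; coordinate descent from zero happens to return the one that puts all the weight on the first covariate.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form minimizer of ||y - Xb||^2 + lam * ||b||^2."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

def lasso_cd(X, y, lam, n_sweeps=100):
    """Coordinate descent for 0.5 * ||y - Xb||^2 + lam * ||b||_1, started at 0."""
    b = np.zeros(X.shape[1])
    for _ in range(n_sweeps):
        for j in range(X.shape[1]):
            r_j = y - X @ b + X[:, j] * b[j]      # residual excluding covariate j
            rho = X[:, j] @ r_j
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / (X[:, j] @ X[:, j])
    return b

x = np.array([1.0, -1.0, 1.0, -1.0])
X = np.column_stack([x, x])       # two perfectly correlated covariates
y = 2.0 * x                       # generated with true beta = (1, 1)

b_ridge = ridge(X, y, lam=1.0)    # splits the effect equally
b_lasso = lasso_cd(X, y, lam=1.0) # keeps one covariate, zeroes the other
print("ridge:", b_ridge)
print("lasso:", b_lasso)
```

Ridge assigns both covariates the same coefficient, while the lasso keeps one and sets the other to zero, mirroring the pattern in the Ridge and Lasso rows.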

(22)

GLRR: Comparison Criteria for Determining the Rank of B

(23)
(24)
(25)

GLRR: Simulated ROC

Blue: GLRR5; Red: GLRR3; Black: LASSO; dashed (---): BLASSO; dotted (…): G-SMuRFS

(26)

GLRR: Simulated Image Recovery

Rows: True, LASSO, BLASSO, G-SMuRFS, GLRR3, GLRR5, respectively. Columns: Cases 1-5.

n = 1,000

GLRR is better for low rank; lasso and GLRR are similar for high rank

(27)

GLRR: ADNI Application

ADNI Database: n = 749 subjects, d = 93 ROI volumes, p = 1,072 SNPs on top 40 genes from AlzGene database.

Standardized ROI volumes and SNPs

Smallest BIC was at r = 3 (checked r = 1 to 10)

Compute binary B (say, Bbin) using p-value < 0.001 thresholding

Columns of U correspond to SNPs and columns of V correspond to ROIs

Compute Bbin^T Bbin (ROI), Bbin Bbin^T (SNP)

(28)

GLRR: Using Bbin^T Bbin

Largest Diagonals: Top ROI has the highest # of significant SNPs

Largest Column Sum: Top ROI has the highest # of significant SNPs and the highest # of significant SNPs that also affect other ROIs

7.1 g protein/ounce 0.81 g protein/ounce 0.10 g protein/calorie 0.12 g protein/calorie
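The two ROI rankings can be illustrated with a toy computation. The p-values below are made up for illustration, and the sketch assumes B (and hence Bbin) has SNPs as rows and ROIs as columns, so that Bbin^T Bbin is ROI by ROI.

```python
import numpy as np

# Hypothetical p-values for p = 4 SNPs (rows) and d = 3 ROIs (columns).
pvals = np.array([
    [1e-5, 0.20, 1e-4],
    [2e-4, 0.50, 0.30],
    [0.04, 0.70, 5e-4],
    [1e-6, 0.90, 0.60],
])

B_bin = (pvals < 0.001).astype(int)     # 1 where a SNP-ROI effect is significant

roi_gram = B_bin.T @ B_bin              # d x d (ROI by ROI)
snp_gram = B_bin @ B_bin.T              # p x p (SNP by SNP)

diag = np.diag(roi_gram)                # number of significant SNPs per ROI
col_sum = roi_gram.sum(axis=0)          # diagonal plus SNPs shared with other ROIs
print(diag, col_sum)
```

The diagonal counts the significant SNPs for each ROI, while the column sum additionally rewards ROIs whose significant SNPs are also significant for other ROIs.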

(29)

GLRR: ADNI Results

Panels: -log10(p) of B; -log10(p) of U; -log10(p) of V; B; Bbin^T Bbin

(30)

GLRR: ADNI ROI Network

Top 20 ROIs based on Bbin^T Bbin and 3 layers of V

ROIs most highly correlated with rs10792821 (PICALM), rs9791189 (NEDD9), rs9376660 (LOC651924), rs17310467 (PRNP), respectively.

(31)
(32)
(33)

q* = number of random effects

Covariance estimation same as GLRR

Can apply Gibbs sampler

(34)
(35)

L2R2: Simulated ROC

L2R2 and G-SMuRFS same for prognostic factors

L2R2 better than G-SMuRFS for

(36)

L2R2: Simulated Image Recovery

True, G-SMuRFS, L2R2; Mod. Sparse, Ext. Sparse

(37)

GLRR outperforms LASSO, BLASSO, and G-SMuRFS in a great many settings.

Gibbs: scales to larger dimensions; only feasible choice for HD data

Metropolis: doesn't scale

Single try: works on small dimensions

Multiple try: only on tiny dimensions

Selection with p >> n is unstable

(38)

Computer code written in MATLAB

For r = 3 in GLRR, 30 minutes for 10K samples (1500 parameters).

For r = 5 in GLRR, 40 minutes for 10K samples (2500 parameters).

BLASSO takes 3 hours (40K parameters).
