Bayesian Penalized Methods for High Dimensional Data

(1)

Joseph G. Ibrahim

Joint with Hongtu Zhu

(2)

• Motivation

• GLRR: Bayesian Generalized Low Rank Regression

• L2R2: Bayesian Longitudinal Low Rank Regression

• ADNI data analysis

(3)

(4)

Alzheimer's Disease

• Alzheimer's disease (AD) is an escalating national epidemic and a genetically complex, progressive, and fatal neurodegenerative disease.

• The incidence of AD doubles every five years after the age of 65, and the number of AD patients has increased dramatically in recent years, causing a heavy socioeconomic burden.

• AD is the sixth‐leading cause of death in the United States, and

(5)

ADNI Database

• The Alzheimer's Disease Neuroimaging Initiative (ADNI) is the first "Big Data" project for AD and is collecting imaging, genetic, clinical, and cognitive data for measuring the progress of AD or the effects of treatment.

• ADNI began in 2004 and has three phases: ADNI 1, ADNI GO, and ADNI 2.

• Efficiently integrating big ADNI data may lead to

(AD1) detecting AD at the earliest stage possible and marking its progress through biomarkers;

(AD2) developing new diagnostic methods for AD intervention, prevention and

(6)

ADNI Database

• ADNI

1. Integrating imaging and genetic data to identify genetic and environmental contributions to brain baseline data and brain development trajectories.

• Model: Brain volume = f(SNP, age, gender, …)

• Data:

  • Genotype: SNPs (X) (≈600,000+)

  • MRI ROI (region‐of‐interest volumes = Y) (93)

  • Prognostic factors: age, gender, education, etc.

(7)

Magnetic Resonance Imaging (MRI)

• A voxel is the 3‐D version of a pixel

• The MRI machine reads the signal at each voxel and stores it in a 3‐D array

• sMRI = structure of the brain

• fMRI = brain activity from blood flow

• Voxel: n subjects will yield an n × 6 million matrix

• ROIs reduce the dimension to 93 ROIs

(8)

Single‐Nucleotide Polymorphism (SNP)

• Normal (not rare) variation: different nucleotides at the same location

• SNPs may affect gene function

• ADNI: 600,000 SNPs → n = 750 << 600,000 SNPs

Select SNPs only on top 40 genes reported by

(9)

Bayesian Shrinkage and Selection

• Prior:

• −log(prior) = penalty function =

• Posterior:

• Frequentist penalized estimation ≡ maximum a posteriori (MAP) estimation

• MLE sets penalty to 0 (MAP with non‐
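A standard way to write the prior/penalty correspondence is sketched below. It assumes a coefficient vector β with likelihood L(β | data) and a bridge-type penalty with tuning parameter λ > 0 and exponent α. The symbol λ and the summation form are assumptions for illustration; only α appears in the surrounding slides.

```latex
\begin{align*}
\pi(\beta) &\propto \exp\Big\{-\lambda \sum_{j} |\beta_j|^{\alpha}\Big\}
  && \text{(prior)} \\
-\log \pi(\beta) &= \lambda \sum_{j} |\beta_j|^{\alpha} + \text{const}
  && \text{(penalty function)} \\
\pi(\beta \mid \text{data}) &\propto L(\beta \mid \text{data})\, \pi(\beta)
  && \text{(posterior)} \\
\hat{\beta}_{\text{MAP}} &= \arg\min_{\beta}
  \Big\{ -\log L(\beta \mid \text{data}) + \lambda \sum_{j} |\beta_j|^{\alpha} \Big\}
  && \text{(penalized estimation)}
\end{align*}
```

Under this form, setting λ = 0 (a flat prior) removes the penalty, and the MAP estimate coincides with the MLE.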

(10)

Bayesian Shrinkage and Selection

• Choice of α:

• α ≤ 1 → shrinkage and selection: creates a singularity at 0 and a "black hole" to pull smaller elements to 0

• Bridge regression: α < 1

• L₁ priors (lasso, adaptive lasso): α = 1

• α > 1 → no selection, shrinkage only
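The contrast between α = 1 and α > 1 can be illustrated with a minimal sketch. It assumes an orthonormal design, so that MAP estimation separates into one problem per coefficient: minimize ½(z − b)² + λ|b|^α, where z is the unpenalized estimate. The function names, the values of z, and λ = 0.5 are made up for illustration.

```python
import numpy as np

def ridge_map(z, lam):
    """MAP under the penalty lam * b**2 (alpha = 2): shrinkage only."""
    # Setting the derivative -(z - b) + 2*lam*b to zero gives b = z / (1 + 2*lam).
    return z / (1.0 + 2.0 * lam)

def lasso_map(z, lam):
    """MAP under the penalty lam * |b| (alpha = 1): soft-thresholding."""
    # Coefficients with |z| <= lam are set exactly to zero (selection).
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

z = np.array([-3.0, -0.4, 0.1, 0.6, 2.5])  # hypothetical unpenalized estimates
lam = 0.5

ridge = ridge_map(z, lam)   # every entry shrinks, none becomes zero
lasso = lasso_map(z, lam)   # small entries become exactly zero

print("ridge:", ridge)
print("lasso:", lasso)
```

With α = 2 every coefficient is scaled by the same factor, so none reaches zero. With α = 1 the two coefficients smaller than λ in absolute value are set exactly to zero.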

(11)

Black Hole Priors: α ≤ 1

Prior creates a singularity at the origin. MAP estimation allows selection and shrinkage. Unstable around the boundary.

(12)

Distributional Perspective

Want a huge spike (gravity) at the origin; gravity should pull the smaller coefficients to 0.

Huge spike/gravity implies smaller coefficients shrink more.

Smaller spike/gravity implies smaller coefficients shrink less.

Singularity/discontinuity at the origin

(13)

Want heavy tails / minimum gravity / flat density far from the origin; gravity should not affect the larger coefficients.

Steeper slope/stronger gravity implies larger coefficients shrink more.

Flatter tail/weaker gravity implies larger coefficients shrink less.

(14)

Commonly Used Priors

Larger spike at the origin and heavier tails

(15)

(16)

• Do SNPs act alone or work together?

• Do the ROIs also act together?

• Do ROIs and SNPs acting together support some underlying structure in the regression coefficients?

• We try to exploit this structure to reduce dimension

(17)

GLRR: Low Rank Regression

p* = r(p + d) << pd; e.g., 5 × (1K + 1K) = 10K << 1K × 1K = 1 million
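The parameter count can be checked with a short sketch. It assumes the low-rank form B = U Δ Vᵀ, with U of size p × r, V of size d × r, and Δ diagonal. The count r(p + d) follows the slide and ignores the r diagonal elements of Δ. The function names and the toy dimensions used to build B are illustrative.

```python
import numpy as np

def n_params_full(p, d):
    """Number of free entries in an unrestricted p x d coefficient matrix."""
    return p * d

def n_params_low_rank(p, d, r):
    """Entries of U (p x r) plus V (d x r), as counted on the slide."""
    return r * (p + d)

# The numbers from the slide: p = d = 1K, r = 5.
full = n_params_full(1000, 1000)
low = n_params_low_rank(1000, 1000, 5)
print(full, low)

# A toy rank-r matrix built as U * diag(delta) * V^T (smaller sizes for speed).
rng = np.random.default_rng(0)
p, d, r = 200, 150, 5
U = rng.normal(size=(p, r))
V = rng.normal(size=(d, r))
delta = rng.normal(size=r)
B = (U * delta) @ V.T          # same as U @ np.diag(delta) @ V.T
rank_B = np.linalg.matrix_rank(B)
print(B.shape, rank_B)
```

The full matrix B still has p × d entries, but they are all determined by the much smaller factors U, Δ, and V.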

(18)

• U and V need not be unitary (orthonormal); otherwise we would need the matrix von Mises–Fisher (VMF) distribution and Metropolis

• No ordering restriction on the elements of Δ; otherwise we would need truncated normals and Metropolis

• Many Bayesian applications do not require identifiability

• Allows closed-form full conditionals, so the Gibbs sampler applies

  • scales to larger dimensions

  • computational efficiency
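A small numerical check shows why dropping these restrictions is harmless for B itself. It assumes the decomposition B = U Δ Vᵀ with Δ diagonal. The dimensions, the scaling constants, and the variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
p, d, r = 6, 4, 2
U = rng.normal(size=(p, r))
V = rng.normal(size=(d, r))
delta = np.array([2.0, -0.5])

B = U @ np.diag(delta) @ V.T

# Rescale each column of U and compensate in delta: B is unchanged.
c = np.array([3.0, 0.25])
U_rescaled = U * c
delta_rescaled = delta / c
B_rescaled = U_rescaled @ np.diag(delta_rescaled) @ V.T

# Reorder the layers (no ordering restriction on delta): B is unchanged.
perm = [1, 0]
B_permuted = U[:, perm] @ np.diag(delta[perm]) @ V[:, perm].T

print(np.allclose(B, B_rescaled), np.allclose(B, B_permuted))
```

U, Δ, and V are not individually identifiable without constraints, but every such version gives the same B, which is the quantity of interest.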

(19)

(20)

GLRR: Model and Priors

Cov(Y_i) =

(21)

GLRR: Why L₂ Priors

If covariates are correlated:

• L₂ tends to push them towards each other

  → more correlated estimates (ridge), the reason for our choice

• L₁ tends to pick one and force the rest to 0

  → least absolute shrinkage and selection operator (lasso)

True β   1      1      1      1      1      1      1      1      1      1
OLS      2.95   1.09   -1.11  1.24   0.98   0.98   1.57   1.14   1.33   0.66
Ridge    1.13   1.02   0.75   1.19   0.86   0.99   1.46   1.03   1.21   0.62
Lasso    0      0      0      2.95   0      0.07   0.97   0      0.23   0
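The two behaviors can be reproduced in an extreme toy case: two identical covariates, each with true coefficient 1. This is a sketch under stated assumptions, not the simulation behind the table above. The data, the penalty values, and the helper names `ridge` and `lasso_cd` are made up. The lasso is fitted by plain coordinate descent started at zero.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def soft_threshold(z, t):
    """Soft-thresholding operator used in the lasso coordinate update."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1, started at b = 0."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ b + X[:, j] * b[j]   # partial residual excluding x_j
            rho = X[:, j] @ r_j / n
            b[j] = soft_threshold(rho, lam) / (X[:, j] @ X[:, j] / n)
    return b

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
X = np.column_stack([x, x])                    # two perfectly correlated covariates
y = X @ np.array([1.0, 1.0]) + 0.1 * rng.normal(size=n)

b_ridge = ridge(X, y, lam=1.0)      # splits the effect equally between the two
b_lasso = lasso_cd(X, y, lam=0.1)   # loads the effect on the first, zeros the second

print("ridge:", b_ridge)
print("lasso:", b_lasso)
```

With identical columns the lasso solution is not unique. Which covariate is kept depends on the update order, which is one way to see why L₁ selection among correlated covariates is unstable.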

(22)

GLRR: Comparison Criteria for Determining the Rank of B

(23)

(24)

(25)

GLRR: Simulated ROC

Legend: blue = GLRR5; red = GLRR3; black solid = LASSO; dashed (---) = BLASSO; dotted (…) = G-SMuRFS

(26)

GLRR: Simulated Image Recovery

Rows: True, LASSO, BLASSO, G-SMuRFS, GLRR3, GLRR5, respectively. Columns: Cases 1-5. n = 1,000.

GLRR is better for low rank; lasso and GLRR are similar for high rank.

(27)

GLRR: ADNI Application

• ADNI database: n = 749 subjects, d = 93 ROI volumes, p = 1,072 SNPs on top 40 genes from AlzGene database.

• Standardized ROI volumes and SNPs

• Smallest BIC was at r = 3 (checked r = 1 to 10)

• Compute binary B (say, B_bin) by thresholding at p‐value < 0.001

• Columns of U correspond to SNPs and columns of V correspond to ROIs

• Compute B_bin^T B_bin (ROI) and B_bin B_bin^T (SNP)
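The thresholding and the two cross-products can be sketched as follows. The p-value matrix is hypothetical (4 SNPs by 3 ROIs rather than 1,072 by 93), and the orientation with rows as SNPs and columns as ROIs is an assumption.

```python
import numpy as np

# Hypothetical p-values for the entries of B (rows = SNPs, columns = ROIs).
pvals = np.array([
    [0.0001, 0.2000, 0.0005],
    [0.0002, 0.0003, 0.5000],
    [0.3000, 0.0400, 0.0009],
    [0.0008, 0.9000, 0.0600],
])

B_bin = (pvals < 0.001).astype(int)   # 1 = significant, 0 = not

roi_mat = B_bin.T @ B_bin   # ROI x ROI: entry (j, k) = number of SNPs significant for both
snp_mat = B_bin @ B_bin.T   # SNP x SNP: entry (i, l) = number of ROIs shared by both SNPs

sig_snps_per_roi = np.diag(roi_mat)   # diagonal = number of significant SNPs per ROI
roi_col_sums = roi_mat.sum(axis=0)    # also counts SNPs shared with other ROIs

print(roi_mat)
print(sig_snps_per_roi, roi_col_sums)
```

The diagonal of B_bin^T B_bin counts the significant SNPs for each ROI. The column sums add the off-diagonal overlaps, so they also reward ROIs whose significant SNPs affect other ROIs.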

(28)

Largest diagonals. Top ROI: highest number of significant SNPs.

Largest column sum. Top ROI: highest number of significant SNPs and highest number of significant SNPs that also affect other ROIs.

7.1 g protein/ounce 0.81 g protein/ounce 0.10 g protein/calorie 0.12 g protein/calorie

GLRR: Using B_bin^T B_bin

(29)

GLRR: ADNI Results

[Figure panels: −log₁₀(p) of B, −log₁₀(p) of U, −log₁₀(p) of V; B and B_bin^T B_bin]

(30)

GLRR: ADNI ROI Network

Top 20 ROIs based on B_bin^T B_bin and 3 layers of V

ROIs most highly correlated with rs10792821 (PICALM), rs9791189 (NEDD9), rs9376660 (LOC651924), rs17310467 (PRNP), respectively.

(31)

(32)

(33)

• q* = number of random effects

• Covariance estimation same as GLRR

• Can apply Gibbs sampler
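To illustrate what closed-form full conditionals buy, here is a minimal Gibbs sampler for a much simpler model than L2R2: linear regression with an L₂ (ridge-type) prior. It is not the L2R2 or GLRR sampler. The model, the hyperparameters `lam`, `a`, `b`, and the function name `gibbs_ridge` are assumptions chosen for illustration.

```python
import numpy as np

def gibbs_ridge(X, y, lam=1.0, a=1.0, b=1.0, n_draws=2000, seed=0):
    """Gibbs sampler for y ~ N(X beta, s2*I), beta | s2 ~ N(0, (s2/lam)*I),
    s2 ~ Inverse-Gamma(a, b). Both full conditionals are available in closed form."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    A_inv = np.linalg.inv(X.T @ X + lam * np.eye(p))
    mean = A_inv @ X.T @ y               # ridge estimate = conditional mean of beta
    chol = np.linalg.cholesky(A_inv)
    s2 = 1.0
    draws = np.empty((n_draws, p))
    for t in range(n_draws):
        # beta | s2, y ~ N(mean, s2 * A_inv)
        beta = mean + np.sqrt(s2) * (chol @ rng.normal(size=p))
        # s2 | beta, y ~ Inverse-Gamma(a + (n + p)/2, b + (RSS + lam*||beta||^2)/2)
        resid = y - X @ beta
        shape = a + 0.5 * (n + p)
        rate = b + 0.5 * (resid @ resid + lam * (beta @ beta))
        s2 = rate / rng.gamma(shape)     # inverse-gamma draw as rate / Gamma(shape, 1)
        draws[t] = beta
    return draws, mean

# Simulated data with made-up coefficients.
rng = np.random.default_rng(1)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.5 * rng.normal(size=n)

draws, ridge_mean = gibbs_ridge(X, y)
post_mean = draws[200:].mean(axis=0)     # discard the first 200 draws as burn-in
print(post_mean, ridge_mean)
```

Each sweep needs only draws from standard distributions, with no proposal tuning or accept/reject step, which is why samplers of this kind remain practical as the dimension grows.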

(34)

(35)

L2R2: Simulated ROC

L2R2 and G‐SMuRFS same for prognostic factors

L2R2 better than G‐SMuRFS for

(36)

L2R2: Simulated Image Recovery

[Figure panels: True, G‐SMuRFS, L2R2; settings: Mod. Sparse, Ext. Sparse]

(37)

• GLRR outperforms LASSO, BLASSO, and G‐SMuRFS in a great many settings.

• Gibbs: scales to larger dimensions

  • only feasible choice for HD data

• Metropolis: does not scale

  • Single try: works on small dimensions

  • Multiple try: only on tiny dimensions

• Selection with p >> n is unstable

(38)