Bayesian Penalized Methods for High Dimensional Data
Joseph G. Ibrahim
Joint with Hongtu Zhu
• Motivation
• GLRR: Bayesian Generalized Low Rank Regression
• L2R2: Bayesian Longitudinal Low Rank Regression
• ADNI data analysis
Alzheimer’s Disease
• Alzheimer's disease (AD) is an escalating national epidemic and a genetically complex, progressive, and fatal neurodegenerative disease.
• The incidence of AD doubles every five years after the age of 65, and the number of AD patients has recently increased dramatically, causing a heavy socioeconomic burden.
• AD is the sixth‐leading cause of death in the United States, and
ADNI Database
• The Alzheimer's Disease Neuroimaging Initiative (ADNI) is the first
"Big Data" project for AD and is collecting imaging, genetic,
clinical, and cognitive data for measuring the progress of AD or
the effects of treatment.
• ADNI began in 2004 and has three phases: ADNI 1, ADNI GO, and ADNI 2.
• Efficiently integrating big ADNI data may lead to
(AD1) detecting AD at the earliest stage possible and marking its progress
through biomarkers;
(AD2) developing new diagnostic methods for AD intervention, prevention and
ADNI Database
• ADNI 1. Integrating imaging and genetic data to identify genetic and environmental contributions to brain baseline data and brain development trajectories.
• Model: Brain volume = f(SNP, age, gender, …)
• Data:
  • Genotype: SNPs (X) (≈600,000+)
  • MRI ROI (region‐of‐interest) volumes (Y) (93)
  • Prognostic factors: age, gender, education, etc.
Magnetic Resonance Imaging (MRI)
• A voxel is the 3‐D version of a pixel
• The MRI machine reads the signal on a voxel and stores it in a 3‐D array
• sMRI = structure of the brain
• fMRI = brain activity from blood flow
• Voxel: n subjects will yield an n × 6 million matrix
• ROIs reduce the dimension to 93 ROIs
Single‐Nucleotide Polymorphism (SNP)
• Normal (not rare) different nucleotides in the same location
• SNPs may affect gene function
• ADNI: 600,000 SNPs → n = 750 << 600,000 SNPs
Select SNPs only on top 40 genes reported by
Bayesian Shrinkage and Selection
• Prior:
• −log(prior) = penalty function =
• Posterior:
• Frequentist penalized estimation ≡ Maximum a posteriori (MAP) estimation
• MLE sets penalty to 0 (MAP with non‐
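The equivalence between penalized estimation and MAP estimation can be written out in two lines. This is only a sketch in generic notation: the symbols L(θ | y) for the likelihood, π(θ) for the prior, and pen(θ) for the penalty are assumptions introduced here, since the slide's own formulas are not available.

```latex
\pi(\theta \mid y) \;\propto\; L(\theta \mid y)\,\pi(\theta)

\hat{\theta}_{\mathrm{MAP}}
  = \arg\max_{\theta}\,\bigl\{\log L(\theta \mid y) + \log \pi(\theta)\bigr\}
  = \arg\min_{\theta}\,\bigl\{-\log L(\theta \mid y) + \mathrm{pen}(\theta)\bigr\},
\qquad \mathrm{pen}(\theta) = -\log \pi(\theta)
```

When pen(θ) is constant in θ, the second term drops out of the minimization and the MAP estimate coincides with the MLE.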
Bayesian Shrinkage and Selection
• Popular choice
• α ≤ 1 → shrinkage and selection: creates a singularity at 0 and a black hole that pulls smaller elements to 0
  • Bridge regression: α < 1
  • L1 priors (lasso, adaptive lasso): α = 1
• α > 1 → no selection, shrinkage only
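The contrast between α = 1 (selection and shrinkage) and α > 1 (shrinkage only) can be seen in a scalar toy problem. The sketch below assumes the objective 0.5·(z − b)² + λ·|b|^α for a single observed value z; this objective, the function names `map_l1` and `map_l2`, and the values of `z` and `lam` are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def map_l1(z, lam):
    """Minimizer of 0.5*(z - b)**2 + lam*|b| (alpha = 1): soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def map_l2(z, lam):
    """Minimizer of 0.5*(z - b)**2 + lam*b**2 (alpha = 2): proportional shrinkage."""
    return z / (1.0 + 2.0 * lam)

# Hypothetical unpenalized estimates, some small and some large.
z = np.array([-3.0, -0.4, 0.2, 0.9, 2.5])
lam = 0.5

b1 = map_l1(z, lam)   # small entries are set exactly to 0
b2 = map_l2(z, lam)   # every entry is halved, none becomes exactly 0

print("alpha = 1:", b1)
print("alpha = 2:", b2)
```

With α = 1 any value with |z| ≤ λ is pulled all the way to zero, which is the selection effect; with α = 2 every coefficient is scaled by the same factor and stays nonzero.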
Black Hole Priors: α ≤ 1
The prior creates a singularity at the origin. MAP estimation allows selection and shrinkage. Unstable around the boundary.
Want a huge spike (gravity) at the origin; gravity should pull the smaller coefficients to 0.
A huge spike (stronger gravity) implies smaller coefficients shrink more.
A smaller spike (weaker gravity) implies smaller coefficients shrink less.
Distributional Perspective
Singularity/discontinuity at the origin.
Want heavy tails (minimum gravity, flat density) far from the origin; gravity should not affect the larger coefficients.
A steeper slope (stronger gravity) implies larger coefficients shrink more.
A flatter tail (weaker gravity) implies larger coefficients shrink less.
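The "gravity" picture can be made concrete through the slope of the penalty. The worked equation below assumes a bridge-type penalty pen(β) = λ|β|^α for a single coefficient; that functional form and the symbol λ are assumptions for illustration only.

```latex
\frac{d}{d\beta}\,\lambda\,|\beta|^{\alpha}
  = \lambda\,\alpha\,|\beta|^{\alpha-1}\,\operatorname{sign}(\beta),
  \qquad \beta \neq 0

\alpha < 1:\quad \lambda\alpha|\beta|^{\alpha-1} \to \infty \ \text{as } \beta \to 0,
  \qquad \lambda\alpha|\beta|^{\alpha-1} \to 0 \ \text{as } |\beta| \to \infty

\alpha = 1:\quad \text{slope} = \lambda \ \text{for all } \beta \neq 0

\alpha = 2:\quad \text{slope} = 2\lambda|\beta| \ \text{grows with } |\beta|
```

Under this assumed form, α < 1 gives the strongest pull near the origin and a nearly flat penalty far from it, while α = 2 pulls hardest on the largest coefficients.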
Commonly Used Priors
Larger spike at the origin and heavier tails
• Do SNPs act alone or work together?
• Do the ROIs also act together?
• Do ROIs and SNPs acting together support some underlying structure in the regression coefficients?
• We try to exploit this structure to reduce dimension
GLRR: Low Rank Regression
p* = r × (p + d) << p × d, e.g. 5 × (1K + 1K) = 10K << 1K × 1K = 1 million
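The parameter count above can be checked numerically. The sketch below assumes the coefficient matrix is factored as B = U Δ Vᵀ with U of size p × r, V of size d × r, and Δ diagonal; that factorization, the random factors, and all variable names are assumptions for illustration, with sizes taken from the slide's example.

```python
import numpy as np

rng = np.random.default_rng(0)
p, d, r = 1000, 1000, 5            # sizes from the slide's example

# Random factors; no orthonormality or ordering constraints are imposed.
U = rng.normal(size=(p, r))        # p x r
delta = rng.normal(size=r)         # diagonal elements of Delta
V = rng.normal(size=(d, r))        # d x r

B = (U * delta) @ V.T              # B = U diag(delta) V^T, shape (p, d)

full_params = p * d                # unrestricted coefficient matrix
low_rank_params = r * (p + d)      # count used on the slide

print(B.shape, np.linalg.matrix_rank(B))
print(low_rank_params, "<<", full_params)
```

The full p × d matrix is still available for prediction, but only the entries of U and V (plus the r diagonal elements) need to be estimated.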
• U and V need not be unitary (orthonormal); otherwise we would need the matrix VMF and Metropolis
• No ordering restriction on the elements of Δ; otherwise we would need the truncated normal and Metropolis
• Many Bayesian applications do not require identifiability
• Allows closed‐form full conditionals, so the Gibbs sampler applies
  • scales to larger dimensions
  • computational efficiency
GLRR: Model and Priors
Cov(Yi) =
GLRR: Why L2 Priors
If covariates are correlated:
• L2 tends to push them towards each other
  ⇒ more correlated estimates (ridge), the reason for our choice
• L1 tends to pick one and force the rest to 0
  least absolute shrinkage and selection operator (lasso)
True β   1      1      1      1      1      1      1      1      1      1
OLS      2.95   1.09  -1.11   1.24   0.98   0.98   1.57   1.14   1.33   0.66
Ridge    1.13   1.02   0.75   1.19   0.86   0.99   1.46   1.03   1.21   0.62
Lasso    0      0      0      2.95   0      0.07   0.97   0      0.23   0
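A comparison of this kind can be reproduced in spirit with a small simulation. The sketch below is not the setup behind the table: the shared-factor design, sample size, penalty values, and function names are all assumptions, and the lasso is fitted with a plain cyclic coordinate descent written here for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 10

# Assumed design: one shared factor plus small noise gives highly correlated columns.
shared = rng.normal(size=(n, 1))
X = shared + 0.1 * rng.normal(size=(n, p))
beta_true = np.ones(p)
y = X @ beta_true + rng.normal(size=n)

def ols(X, y):
    """Ordinary least squares via a least-squares solver."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def ridge(X, y, lam):
    """Ridge estimate: solve (X'X + lam*I) b = X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def lasso_objective(X, y, b, lam):
    """0.5 * ||y - Xb||^2 + lam * ||b||_1."""
    return 0.5 * np.sum((y - X @ b) ** 2) + lam * np.sum(np.abs(b))

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent, starting from b = 0."""
    b = np.zeros(X.shape[1])
    col_sq = np.sum(X ** 2, axis=0)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            r_j = y - X @ b + X[:, j] * b[j]   # residual with coordinate j removed
            rho = X[:, j] @ r_j
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return b

b_ols = ols(X, y)
b_ridge = ridge(X, y, lam=5.0)
b_lasso = lasso_cd(X, y, lam=5.0)

np.set_printoptions(precision=2, suppress=True)
print("OLS  ", b_ols)
print("Ridge", b_ridge)
print("Lasso", b_lasso)
```

Ridge always has a smaller Euclidean norm than OLS for a positive penalty; how many coefficients the lasso zeroes out depends on the penalty and the simulated data.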
GLRR: Comparison Criteria for Determining the Rank of B
GLRR: Simulated ROC
Blue: GLRR5; Red: GLRR3; Black: LASSO (solid), BLASSO (---), G-SMuRFS (…)
GLRR: Simulated Image Recovery
Rows: True, LASSO, BLASSO, G-SMuRFS, GLRR3, GLRR5, respectively. Columns: Cases 1-5. n = 1,000
GLRR is better for low rank; lasso and GLRR are similar for high rank.
• ADNI Database: n = 749 subjects, d = 93 ROI volumes, p = 1,072 SNPs on top 40 genes from the AlzGene database.
• Standardized ROI volumes and SNPs
• Smallest BIC was at r = 3 (checked r = 1 to 10)
• Compute binary B (say, Bbin) using p‐value < 0.001 thresholding
• Columns of U correspond to SNPs and columns of V correspond to ROIs
• Compute BbinT Bbin (ROI), Bbin BbinT (SNP)
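The thresholding and the two cross-product matrices in the last bullets can be sketched on a toy example. The p-values below are made up for illustration (6 SNPs by 4 ROIs rather than 1,072 by 93), and the variable names are assumptions.

```python
import numpy as np

p, d = 6, 4                                # toy sizes: SNPs x ROIs

# Hypothetical p-values for the entries of B (rows = SNPs, columns = ROIs).
pvals = np.full((p, d), 0.5)
pvals[0, :] = 1e-5                         # SNP 0 is significant for every ROI
pvals[1, 2] = 1e-4                         # SNPs 1 and 2 are significant for ROI 2 only
pvals[2, 2] = 1e-6

B_bin = (pvals < 0.001).astype(int)        # binary p x d matrix

roi_mat = B_bin.T @ B_bin                  # d x d: significant SNPs shared by ROI pairs
snp_mat = B_bin @ B_bin.T                  # p x p: ROIs shared by SNP pairs

roi_diag = np.diag(roi_mat)                # number of significant SNPs per ROI
roi_colsum = roi_mat.sum(axis=0)           # also counts SNPs shared with other ROIs

print(roi_diag)
print(roi_colsum)
```

In this toy case ROI 2 has both the largest diagonal entry and the largest column sum, because it has the most significant SNPs and one of them also affects every other ROI.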
GLRR: ADNI Application
Largest diagonals: top ROI has the highest number of significant SNPs.
Largest column sum: top ROI has the highest number of significant SNPs and the highest number of significant SNPs that also affect other ROIs.
7.1 g protein/ounce 0.81 g protein/ounce 0.10 g protein/calorie 0.12 g protein/calorie
GLRR: Using BbinT Bbin
GLRR: ADNI Results
Panels: -log10(p) of B; -log10(p) of U; -log10(p) of V; B; BbinT Bbin
GLRR: ADNI ROI Network
Top 20 ROIs based on BbinT Bbin and 3 layers of V
ROIs most highly correlated with rs10792821 (PICALM), rs9791189 (NEDD9), rs9376660 (LOC651924), rs17310467 (PRNP), respectively.
• q* = number of random effects
• Covariance estimation same as GLRR
• Can apply Gibbs sampler
L2R2: Simulated ROC
L2R2 and G‐SMuRFS perform the same for prognostic factors
L2R2 better than G‐SMuRFS for