Supplementary Materials for


advances.sciencemag.org/cgi/content/full/7/2/eabb5398/DC1


Molecular subtyping of Alzheimer’s disease using RNA sequencing data reveals novel mechanisms and targets

Ryan A. Neff, Minghui Wang, Sezen Vatansever, Lei Guo, Chen Ming, Qian Wang, Erming Wang, Emrin Horgusluoglu-Moloch, Won-min Song, Aiqun Li, Emilie L. Castranio, Julia TCW, Lap Ho, Alison Goate, Valentina Fossati, Scott Noggle, Sam Gandy, Michelle E. Ehrlich, Pavel Katsel, Eric Schadt, Dongming Cai, Kristen J. Brennand, Vahram Haroutunian, Bin Zhang*

*Corresponding author. Email: [email protected]

Published 6 January 2021, Sci. Adv. 7, eabb5398 (2021)

DOI: 10.1126/sciadv.abb5398

The PDF file includes:

Supplementary Results

Figs. S1 to S11

Tables S1 to S5

Legends for data files S1 to S3

Other Supplementary Material for this manuscript includes the following:

(available at advances.sciencemag.org/cgi/content/full/7/2/eabb5398/DC1)

Data files S1 to S3


Supplementary Results

Selection of Clustering Algorithm for AD Subtyping Analysis

As shown in Fig. S3, all four methods identify clusters of related samples in each of the four brain regions. To determine whether these sample clusters may represent molecular subtypes of AD and whether the sample grouping is consistent, we performed 50 rounds of bootstrapped reclustering with each clustering algorithm, withholding 20% of the samples and 20% of the genes per round. We then empirically calculated the likelihood that samples in the observed clusters are consistently grouped together, compared with a distribution of 100,000 possible random groupings (Methods). A specific subtype grouping is considered a putative subtype if its empirically adjusted p-value is less than 0.05. Using this method, we detect putative AD subtypes with all four clustering methods (emp. p-value < 0.05) from molecular data in the PHG region alone. Among the four algorithms evaluated, our new network-based clustering approach, WSCNA, shows the highest likelihood ratio of stable subtypes relative to random clustering, 7.08:1 (emp. p-value < 1 × 10⁻⁵), in the PHG region.

The other clustering methods also identify subtypes in the PHG, but with smaller likelihood ratios. Furthermore, among the four brain regions in the MSBB-AD cohort, the PHG shows the most robust AD subtype signal, likely due to its large DEG signature.
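The resampling procedure described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a small k-means routine stands in for the clustering algorithm under evaluation, the function names (`kmeans_labels`, `consensus_matrix`, `empirical_p`) are hypothetical, and the null distribution uses far fewer random groupings than the 100,000 reported above.

```python
import numpy as np

def kmeans_labels(X, k, rng, n_iter=20):
    """Minimal k-means; a stand-in for whichever clustering method is evaluated."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def consensus_matrix(X, k, n_rounds=50, holdout=0.2, seed=0):
    """Fraction of rounds in which each sample pair shares a cluster,
    counting only the rounds in which both samples were drawn."""
    rng = np.random.default_rng(seed)
    n, g = X.shape
    together = np.zeros((n, n))
    drawn = np.zeros((n, n))
    for _ in range(n_rounds):
        # Withhold a fraction of samples (rows) and genes (columns) each round.
        s = rng.choice(n, size=round(n * (1 - holdout)), replace=False)
        f = rng.choice(g, size=round(g * (1 - holdout)), replace=False)
        labels = kmeans_labels(X[np.ix_(s, f)], k, rng)
        together[np.ix_(s, s)] += labels[:, None] == labels[None, :]
        drawn[np.ix_(s, s)] += 1
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(drawn > 0, together / drawn, np.nan)

def empirical_p(cons, labels, n_perm=1000, seed=0):
    """Mean co-clustering rate of same-label pairs, with a label-swap p-value."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    iu = np.triu_indices(len(labels), k=1)
    pair_rate = cons[iu]

    def within_mean(lab):
        same = (lab[:, None] == lab[None, :])[iu]
        return np.nanmean(pair_rate[same])

    observed = within_mean(labels)
    null = np.array([within_mean(rng.permutation(labels)) for _ in range(n_perm)])
    p = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p

if __name__ == "__main__":
    # Synthetic data: 3 groups of 20 samples, 40 features (illustrative only).
    rng = np.random.default_rng(1)
    truth = np.repeat([0, 1, 2], 20)
    X = rng.normal(size=(60, 40)) + 5.0 * truth[:, None]
    cons = consensus_matrix(X, k=3)
    print(empirical_p(cons, truth, n_perm=200))
```

With `n_perm` random groupings the smallest attainable p-value is 1/(n_perm + 1), which is why a bound such as < 1 × 10⁻⁵ requires on the order of 100,000 random groupings.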

Ranking of Predictive AD Subtype Specific Genes and Modules

To understand which genes are most important for distinguishing the three AD subtype classes, we rank the Gini importance of each feature in the RF model across all 271 features validated in both MSBB and ROSMAP. Fig. S11a shows the expression levels of the top 30 features in the model, which together explain 28.6% of the AD subtype classification variance. Interestingly, many of these top features, such as VPS35, SV2B, KCNB1, OPCML, and DOCK3, have been implicated as key modulators of AD (40–44).
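As a small illustration of the ranking step, the sketch below sorts features by importance and reports the share of total importance held by the top features. It assumes, without confirmation from the source, that a figure such as 28.6% is the summed normalized importance of the top 30 features. The function name `rank_features` and the toy values are hypothetical; in practice the importances might come from a fitted model (for example, scikit-learn's `RandomForestClassifier` exposes them as `feature_importances_`).

```python
def rank_features(importances, top_n=30):
    """Rank features by importance and return the top_n names
    together with their share of the total importance."""
    total = sum(importances.values())
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    top = ranked[:top_n]
    share = sum(value for _, value in top) / total
    return [name for name, _ in top], share

# Hypothetical importances for five placeholder features (not the paper's values).
toy = {"gene_a": 0.40, "gene_b": 0.25, "gene_c": 0.20, "gene_d": 0.10, "gene_e": 0.05}
names, share = rank_features(toy, top_n=2)
print(names, round(share, 2))
```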

To further quantify how gene expression changes explain the subtypes predicted by the model, we estimate which differential expression changes lead to the greatest change in RF model accuracy using a model explainer algorithm (45). This analysis identifies the expression quartiles most important for classifying each AD subtype. As shown in Fig. S11b, decreased expression of many neuronal genes, including SCAMP5, SV2B, KCNB1, and DOCK3, best classifies the class C subtypes, while increased expression of a mix of neuronal and astrocytic genes (CAND1, DNAJA2, VPS35, and GHITM) best characterizes the class A subtype. The class B subtypes are best predicted by increased expression of CAMSAP3 and SPEG (astrocytic gene) and decreased expression of KIAA1033 (microglial gene) and LMAN1 (OPC gene). After batch normalization across the datasets to remove technical differences, the quartiles shown in Fig. S11b for each gene are consistent across the MSBB-AD PHG and ROSMAP datasets.
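The expression quartiles mentioned above can be illustrated with a short sketch of quartile discretization. This is only the binning step, under the assumption that each gene's expression is cut at its 25th, 50th, and 75th percentiles. It is not the explainer algorithm itself, and the function name `quartile_bins` and the toy values are hypothetical.

```python
import numpy as np

def quartile_bins(values):
    """Return the three quartile cut-offs and a 0-3 quartile index per value."""
    cuts = np.quantile(values, [0.25, 0.5, 0.75])
    # right=True assigns a value equal to a cut-off to the lower bin.
    idx = np.digitize(values, cuts, right=True)
    return cuts, idx

# Toy expression values for one gene across eight samples (illustrative only).
expr = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
cuts, idx = quartile_bins(expr)
print(cuts, idx)
```

Comparing the cut-offs computed separately in two cohorts is one simple way to check that quartile boundaries agree after batch normalization.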


Fig. S1. Overview of cohorts, methods, and analyses performed in this study. Two human cohorts of AD, MCI, and non-demented controls (MSBB-AD and ROSMAP) along with a library of AD mouse model signatures compiled by the authors were used in this study. Confounding diseases and conditions were excluded prior to RNAseq and downstream analysis as denoted in the figure. Analyses inside the yellow highlighted regions were performed on both MSBB-AD and ROSMAP cohorts through a standardized pipeline to minimize any technical variation, while other analyses were performed on the cohort which matches its highlighted color. Colored arrows denote key inputs for further post-hoc analyses.

[Figure S1 flowchart. Legend: ROSMAP, MSBB-AD, AD mouse models; raw data, normalized data, post-hoc analyses; whole cohort molecular analyses, AD subtype-specific analyses. Recoverable box labels: cohort design (~1900 individuals with late-onset dementia, mild cognitive impairment, and non-demented controls; 3,322 individuals with various clinical phenotypes; canonical AD, mild cognitive impairment, and non-demented controls; exclusion of significant cerebrovascular disease, hippocampal sclerosis, DLBD, and other comorbidities; exclusion of those without AD and of other cognitive dementias); RNA-seq data normalization (mixed model correction for covariates: age, gender, PMI, RIN, rRNA, exonic rate, batch; linear effect model correction for AD stages); eQTL data generation and normalization; brain cell type proportion analysis; WSCNA clustering and AD subtype refinement (Dynamic Tree Cut); DEG analysis (AD vs. control; AD subtype vs. control; experiment vs. control); MEGENA network construction and key network regulator (KNR) gene analysis; Bayesian causal network analysis (gene expression, eQTL) and BN KNR gene analysis; KNR association with brain cell types; random forest classification model training and cross-validation within MSBB-AD; random forest prediction in ROSMAP cohort; classification overlap between predicted and actual ROSMAP subtypes; curation of high-quality AD mouse models with RNAseq expression available (Gene Expression Omnibus (GEO) / Synapse.org / Mount Sinai); AD mouse model molecular signatures library; gene set enrichment analysis (GSEA) between mouse and human molecular signatures; candidate mouse model selection; candidate target gene selection.]


Fig. S2. Evaluation of brain cell-type proportion signatures identified by RNAseq before and after normalization across Alzheimer's brains in the MSBB-AD cohort. A-C) Cell type surrogate proportion values (SPVs) of sequenced cell types in the bulk transcriptomic data for MSBB-AD, inferred for six brain cell types (astrocytes, endothelial cells, microglia, neurons, oligodendrocytes, oligodendrocyte precursor cells) by marker gene expression (see Methods). A) Before normalization for dementia severity (CDR score). B) After normalization for dementia severity (CDR score). C) After normalization for both dementia severity and inferred cell type proportion.


Fig. S3. Evaluation of sample clustering methods and brain regions for subtyping. A) Bootstrapped (n=50, 80:20 train-test split) consensus matrices from four different clustering methods using the gene expression data (not normalized for disease severity) from the PHG in the MSBB-AD cohort. The probability that pairs of samples are grouped into the same cluster is then calculated for every subject pair and then plotted on a heatmap. B) Clustering stability across different brain regions using WSCNA, measured by proportion of samples within each cluster and each brain region. Up to 7 clusters are detected in the PHG (BM36) and IFG (BM44) while only 4 clusters with lower stability are detected in the FP and STG. C) Bootstrapped (n=50, 80:20 train-test split) cluster stability for each brain region and each method. D) Bootstrapped label swap (n=1000) is used to assess whether sample clusters detected in other brain regions by WSCNA have a bias for the same subtype as detected in the PHG. The same subtype could be identified in at least 4 of the 6 subtypes.

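[Figure S3 panel A: clustering consensus matrix for each method (bootstrapped n=50); color scale: probability that sample pairs are clustered together. Recoverable panel titles: Agglomerative, K-medoids, MEGENA.]

[Figure S3 panel B: x-axis, WSCNA cluster # (1 to 7); y-axis, proportion of samples always clustering together (0 to 1); series, WSCNA brain region (FP, STG, PHG, IFG).]

Figure S3 panel C values.

Probability sample pairs within clusters clustered together

| Brain region | Hierarchical | K-medoids | MEGENA | WINA |
|---|---|---|---|---|
| FP | 55.0% | 64.1% | 54.5% | 48.6% |
| STG | 57.7% | 48.8% | 66.5% | 54.3% |
| PHG | 50.5% | 50.5% | 79.6% | 77.7% |
| IFG | 51.6% | 57.0% | 79.6% | 71.9% |

Probability sample pairs not in clusters clustered together

| Brain region | Hierarchical | K-medoids | MEGENA | WINA |
|---|---|---|---|---|
| FP | 30.4% | 45.3% | 32.4% | 15.7% |
| STG | 33.0% | 45.7% | 32.0% | 19.6% |
| PHG | 27.2% | 52.0% | 30.6% | 24.6% |
| IFG | 28.2% | 47.0% | 37.4% | 39.5% |

Odds ratio of stable clustering

| Brain region | Hierarchical | K-medoids | MEGENA | WINA |
|---|---|---|---|---|
| FP | 1.81 | 1.42 | 1.68 | 3.09 |
| STG | 1.75 | 1.07 | 2.08 | 2.77 |
| PHG | 1.86 | 0.97 | 2.60 | 3.16 |
| IFG | 1.83 | 1.21 | 2.13 | 1.82 |

Figure S3 panel D values. Probability that sample pairs are clustered together (Rand index); bootstrapped p-value, 1000 times (label swap)

| | blue | brown | green | untyped | red | turquoise | yellow |
|---|---|---|---|---|---|---|---|
| # samples | 23 | 18 | 13 | 7 | 5 | 21 | 8 |
| FP | 29.0% | 37.0% | 29.9% | 30.0% | 38.2% | 43.0% | 48.2% |
| STG | 47.4% | 38.1% | 43.5% | 42.8% | 45.0% | 46.4% | 39.9% |
| PHG | 78.8% | 70.4% | 68.8% | 61.9% | 79.5% | 89.2% | 80.3% |
| IFG | 58.0% | 52.3% | 46.8% | 50.5% | 56.8% | 45.4% | 64.2% |
| FP p-value | 0.882 | 0.259 | 0.878 | 0.946 | 0.724 | 0.019 | 0.099 |
| STG p-value | 0 | 0.243 | 0.063 | 0.386 | 0.485 | 0.001 | 0.514 |
| PHG p-value | 0 | 0 | 0 | 0.006 | 0.002 | 0 | 0 |
| IFG p-value | 0.022 | 0.345 | 0.896 | 0.76 | 0.538 | 0.935 | 0.082 |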
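The "odds ratio of stable clustering" values in Fig. S3 can be checked against the two probability tables from the same figure. The sketch below assumes, because the source does not state the formula, that each reported value is the within-cluster co-clustering probability divided by the not-in-cluster probability; the numbers are copied from the figure and the variable names are hypothetical.

```python
# Column order: Hierarchical, K-medoids, MEGENA, WINA (values from Fig. S3).
within = {
    "FP": [55.0, 64.1, 54.5, 48.6],
    "STG": [57.7, 48.8, 66.5, 54.3],
    "PHG": [50.5, 50.5, 79.6, 77.7],
    "IFG": [51.6, 57.0, 79.6, 71.9],
}
between = {
    "FP": [30.4, 45.3, 32.4, 15.7],
    "STG": [33.0, 45.7, 32.0, 19.6],
    "PHG": [27.2, 52.0, 30.6, 24.6],
    "IFG": [28.2, 47.0, 37.4, 39.5],
}
reported = {
    "FP": [1.81, 1.42, 1.68, 3.09],
    "STG": [1.75, 1.07, 2.08, 2.77],
    "PHG": [1.86, 0.97, 2.60, 3.16],
    "IFG": [1.83, 1.21, 2.13, 1.82],
}

# Ratio of within-cluster to not-in-cluster co-clustering probability.
ratios = {
    region: [w / b for w, b in zip(within[region], between[region])]
    for region in within
}

# Largest gap between the recomputed ratio and the reported value.
max_gap = max(
    abs(r - rep)
    for region in ratios
    for r, rep in zip(ratios[region], reported[region])
)
print(max_gap)
```

Every recomputed ratio agrees with the reported value to within 0.01, which is consistent with the reported figures having been computed from unrounded probabilities.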


Fig. S4. High clustering stability of the CDR-normalized WSCNA subtypes. A) Heatmap of pairwise clustering rate (the rate at which two samples end up in the same WSCNA cluster) across 50 bootstrapped runs (80-20 split of genes and samples) of WSCNA on the CDR-normalized gene expression data in the PHG from the MSBB-AD cohort, with row and column colors representing true AD subtypes. B-C) Bar chart and table of mean pairwise clustering rate for samples by AD cluster, where the clustering rate is defined as the rate at which pairs of samples within a subtype are grouped together, compared with the rate at which they are grouped with samples from other subtypes. P-values indicate the significance level of a mean clustering rate as high as or higher than the true clustering rate per AD subtype, calculated empirically by 100,000 trials of random label swap.


Fig. S5. Gene module expression across brain regions. Mean gene expression heatmap of the WINA modules in the four brain regions in the MSBB-AD cohort.

[Figure S5 heatmap: mean gene module differential gene expression (vs. non-AD); axes: gene module by brain region (IFG, FP, STG, PHG); subtypes: A (yellow), B1 (red), B2 (blue), C1 (turquoise), C2 (orange).]


Fig. S6. Investigation of Additional Pathologic Variables of Amyloid-beta and Tau from


Fig. S7. Polygenic risk scores per AD subtype class in the MSBB-AD cohort. Polygenic risk scores were calculated using the PRSice R package and the Kunkle et al. 2019 dataset. Non-European subjects were excluded from the MSBB-AD cohort before fitting the PRS model. Significant Welch’s t-test p-values for pairwise comparisons are annotated in the figure (α < 0.05).

[Figure S7 plot: x-axis, MSBB-AD subtype class; y-axis, polygenic risk score (PRS).]
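For reference, the pairwise comparison named in the Fig. S7 caption (Welch's unequal-variance t-test) can be sketched as below. The function name `welch_t` and the toy scores are hypothetical, not values from the study. The sketch computes only the test statistic and the Welch-Satterthwaite degrees of freedom; a p-value additionally requires the t distribution.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples with possibly unequal variances."""
    nx, ny = len(x), len(y)
    vx = variance(x) / nx  # squared standard error of mean(x)
    vy = variance(y) / ny  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df

# Toy scores for two hypothetical subtype classes (illustrative only).
class_a = [1.0, 2.0, 3.0, 4.0, 5.0]
class_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(class_a, class_b)
print(t_stat, dof)
```

If SciPy is available, `scipy.stats.ttest_ind(class_a, class_b, equal_var=False)` performs the same test and also returns the two-sided p-value.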


Fig. S8. Subtype signatures may predate AD diagnosis in MSBB-AD. WSCNA clustering dendrogram of all the PHG samples in the MSBB cohort, demonstrating the control and AD samples are mixed in all the sample clusters.

[Figure S8 dendrogram: "WINA Clustering Dendrogram Using Both AD and Non-AD Samples"; x-axis, MSBB-AD PHG samples (individual sample IDs as leaf labels); y-axis, dendrogram height (0.65 to 1.00).]


Fig. S9. Subtype signatures are found within MCI subjects in MSBB-AD. WSCNA clustering dendrogram of AD and MCI samples in the MSBB cohort, demonstrating the MCI samples are mixed in all the AD subtypes. MCI samples are shown with green labels.

[Figure S9 dendrogram: "WINA Clustering MSBB PHG (original clusters from WINA, MCI=green)"; x-axis, MSBB PHG samples (individual sample IDs as leaf labels); y-axis, WINA clustering height (0.65 to 1.00).]


Fig. S10. Supplementary Figures for ROSMAP AD Subtyping. A-B) Cell type surrogate proportion values (SPVs) of sequenced cell types in the bulk transcriptomic data for ROSMAP, inferred for six brain cell types (astrocytes, endothelial cells, microglia, neurons, oligodendrocytes, oligodendrocyte precursor cells) by marker gene expression (see Methods). A) Before normalization for dementia severity (MMSE score). B) After normalization for dementia severity (MMSE score). C) Polygenic risk score by ROSMAP AD subtype class, with significant pairwise differences as determined by Welch’s t-test highlighted, using the Kunkle et al. dataset and after removal of non-European samples.


Fig. S11. Prediction of AD Molecular Subtype. A) Gini importance of top features used in the predictive RF model trained on the MSBB-AD PHG data and tested on the ROSMAP data. B) Local interpretable model-agnostic explanations (LIME) algorithm prediction weights for the top subtype-specific predictive features (including gene expression level quartile cut-offs) for the three classes of MSBB-AD subtypes in the RF model.


Supplementary Tables

Table S1. Summary of Clinical and Pathologic Phenotypes for MSBB-AD Samples with PHG Transcriptomic Data

| Metric (mean ± SD) | Control | AD | P-value (t-test) |
|---|---|---|---|
| Number of subjects | 32 | 151 | - |
| % Female | 68.8% | 66.9% | 0.846 |
| % European ancestry | 71.9% | 81.9% | 0.209 |
| % African ancestry | 18.8% | 11.8% | 0.046 |
| % Hispanic ancestry | 6.2% | 3.9% | 0.572 |
| % Other ancestry | 0% | 2.4% | 0.38 |
| Age of death (years) | 82.8 ± 10.1 | 86.3 ± 8.9 | 0.0479 |
| Clinical dementia rating (CDR) | 0 ± 0 | 3.4 ± 1.1 | 5.6 × 10⁻³⁴ |
| Mean Aβ plaque number (per mm²) | 2.1 ± 3.1 | 11.6 ± 9.8 | 2.2 × 10⁻⁷ |
| Pathology + CDR composite ("Brain Bank score") | 2.16 ± 1.27 | 4.16 ± 2.08 | 6.22 × 10⁻⁷ |


Table S2. Subtype key network regulators identified in MEGENA and BN networks

| Subtype | # MEGENA key reg. genes upregulated | # MEGENA key reg. genes downregulated | # BN key reg. genes upregulated | # BN key reg. genes downregulated | # Overlap key reg. genes upregulated | # Overlap key reg. genes downregulated |
|---|---|---|---|---|---|---|
| A | 121 | 82 | 73 | 24 | 43 | 11 |
| B1 | 94 | 66 | 81 | 26 | 26 | 17 |
| B2 | 388 | 165 | 165 | 76 | 101 | 34 |
| C1 | 287 | 225 | 107 | 189 | 65 | 78 |
| C2 | 336 | 308 | 95 | 124 | 55 | 87 |


Table S3. Intersection of IGAP Consortium Significant AD Genes (53) and Predicted AD Subtype Key Network Regulator Genes using MEGENA

| Gene | Class | Subtype | Direction | MEGENA KNR supporting genes | IGAP gene-level p-value | IGAP variants in gene |
|---|---|---|---|---|---|---|
| PICALM | B | blue | down | 48 | 2.7127E-05 | 163 |
| PSMC6 | B | blue | down | 59 | 4.5074E-05 | 5 |
| TRAM1 | B | blue | down | 53 | 0.007827625 | 8 |
| CAMTA2 | B | blue | up | 115 | 0.00008935 | 2 |
| CTIF | B | blue | up | 87 | 0.0083064 | 15 |
| ELL | B | blue | up | 85 | 0.002916457 | 74 |
| FAM193B | B | blue | up | 92 | 0.000886004 | 24 |
| GAK | B | blue | up | 133 | 0.001115133 | 3 |
| HMHA1 | B | blue | up | 50 | 3.44587E-05 | 10 |
| L3MBTL1 | B | blue | up | 74 | 0.004199 | 1 |
| MAML1 | B | blue | up | 108 | 0.000249565 | 31 |
| MARK2 | B | blue | up | 80 | 0.00199503 | 33 |
| PIP5K1C | B | blue | up | 142 | 0.0004533 | 8 |
| RAP1GAP2 | B | blue | up | 67 | 0.00047861 | 3 |
| RASGEF1C | B | blue | up | 71 | 0.008884 | 1 |
| SHANK2 | B | blue | up | 71 | 0.00379017 | 10 |
| VAC14 | B | blue | up | 78 | 0.002427807 | 24 |
| ACP1 | C | orange | down | 87 | 0.001185017 | 6 |
| AMPH | C | orange | down | 81 | 0.0007451 | 1 |
| CHRM3 | C | orange | down | 123 | 0.00029988 | 3 |
| COX7A2L | C | orange | down | 87 | 0.000147388 | 8 |
| EPDR1 | C | orange | down | 77 | 2.1267E-05 | 58 |
| FIG4 | C | orange | down | 71 | 0.000555367 | 3 |
| GUCY1B3 | C | orange | down | 151 | 4.70056E-05 | 5 |
| MEF2C | C | orange | down | 78 | 0.008687003 | 20 |
| PPFIA2 | C | orange | down | 73 | 0.001150956 | 41 |
| PSMC6 | C | orange | down | 152 | 4.5074E-05 | 5 |
| RGS4 | C | orange | down | 74 | 0.0062325 | 6 |
| RTN1 | C | orange | down | 121 | 0.003425 | 2 |
| SLC2A13 | C | orange | down | 85 | 0.00509558 | 5 |
| XRCC5 | C | orange | down | 104 | 0.0015272 | 2 |
| ANTXR1 | C | orange | up | 52 | 0.0021666 | 5 |
| MAML1 | C | orange | up | 110 | 0.000249565 | 31 |
| MAPKAPK2 | C | orange | up | 89 | 0.000764538 | 16 |
| MSI2 | C | orange | up | 67 | 0.0095515 | 2 |
| MTSS1L | C | orange | up | 57 | 0.000141744 | 22 |
| MVB12B | C | orange | up | 81 | 0.001322135 | 26 |
| PARD3B | C | orange | up | 80 | 0.009855 | 2 |
| XKR8 | C | orange | up | 79 | 0.00216636 | 15 |
| PDE4B | B | red | down | 33 | 0.008373333 | 3 |
| PIP4K2A | B | red | down | 42 | 0.008063513 | 40 |
| TRAM1 | B | red | down | 31 | 0.007827625 | 8 |
| CUX2 | B | red | up | 37 | 0.002295725 | 4 |
| PIP5K1C | B | red | up | 36 | 0.0004533 | 8 |
| RAP1GAP2 | B | red | up | 42 | 0.00047861 | 3 |
| RBFOX3 | B | red | up | 34 | 0.0039801 | 10 |
| SHANK2 | B | red | up | 36 | 0.00379017 | 10 |
| AMPH | C | turquoise | down | 81 | 0.0007451 | 1 |
| CADPS | C | turquoise | down | 65 | 0.008799167 | 6 |
| CHRM3 | C | turquoise | down | 129 | 0.00029988 | 3 |
| CSMD1 | C | turquoise | down | 63 | 0.004882021 | 77 |
| CUX2 | C | turquoise | down | 84 | 0.002295725 | 4 |
| EPDR1 | C | turquoise | down | 61 | 2.1267E-05 | 58 |
| GUCY1B3 | C | turquoise | down | 107 | 4.70056E-05 | 5 |
| MEF2C | C | turquoise | down | 80 | 0.008687003 | 20 |
| NGEF | C | turquoise | down | 67 | 0.000173692 | 11 |
| RGS4 | C | turquoise | down | 72 | 0.0062325 | 6 |
| RTN1 | C | turquoise | down | 116 | 0.003425 | 2 |
| SLC2A13 | C | turquoise | down | 81 | 0.00509558 | 5 |
| ANTXR1 | C | turquoise | up | 64 | 0.0021666 | 5 |
| ARHGDIB | C | turquoise | up | 81 | 0.001651 | 1 |
| CMTM7 | C | turquoise | up | 60 | 0.004248222 | 9 |
| DOCK8 | C | turquoise | up | 60 | 0.001083467 | 3 |
| MAPKAPK2 | C | turquoise | up | 157 | 0.000764538 | 16 |
| MSI2 | C | turquoise | up | 51 | 0.0095515 | 2 |
| PARD3B | C | turquoise | up | 91 | 0.009855 | 2 |
| PGF | C | turquoise | up | 81 | 0.001583667 | 3 |
| TRAM1 | C | turquoise | up | 61 | 0.007827625 | 8 |
| ANTXR1 | A | yellow | down | 36 | 0.0021666 | 5 |
| HMHA1 | A | yellow | down | 32 | 3.44587E-05 | 10 |
| PARD3B | A | yellow | down | 48 | 0.009855 | 2 |
| XKR8 | A | yellow | down | 51 | 0.00216636 | 15 |
| AMPH | A | yellow | up | 62 | 0.0007451 | 1 |
| CHRM3 | A | yellow | up | 93 | 0.00029988 | 3 |
| EPDR1 | A | yellow | up | 63 | 2.1267E-05 | 58 |
| GUCY1B3 | A | yellow | up | 111 | 4.70056E-05 | 5 |
| MEF2C | A | yellow | up | 66 | 0.008687003 | 20 |
| RGS4 | A | yellow | up | 62 | 0.0062325 | 6 |
| RTN1 | A | yellow | up | 102 | 0.003425 | 2 |


Table S4. Sub-classification of MCI samples in MSBB-AD

| Subtype | # MCI samples | Percent of MCI samples | # AD+MCI samples | Percent of AD+MCI samples |
|---|---|---|---|---|
| A | 14 | 43.8% | 61 | 33.3% |
| B1 | 11 | 34.4% | 39 | 21.3% |
| B2 | 5 | 15.6% | 34 | 18.6% |
| C1 | 1 | 3.1% | 34 | 18.6% |
| C2 | 1 | 3.1% | 15 | 8.2% |


Table S5. Relative sample proportion per AD subtype across MSBB-AD and ROSMAP

| Subtype | MSBB-AD samples | Percent of MSBB-AD | ROSMAP samples | Percent of ROSMAP |
|---|---|---|---|---|
| C | 50 | 33.1% | 179 | 45.8% |
| B | 54 | 35.7% | 81 | 20.7% |
| A | 47 | 31.1% | 96 | 24.5% |
| Others (unknown) | 0 | 0% | 35 | 8.9% |


Data file S1. Supplementary Tables of MEGENA and BN key drivers identified in MSBB-AD.

Data file S2. Features used in Molecular Subtype Predictive RF Model. Average gene expression (log(CPM)) for features (n=271) used in random forest model for MSBB-AD Alzheimer's Disease Subtypes, after CDR normalization and cohort effect correction between MSBB-AD and ROSMAP.