Methods - Structured clustering representations and methods

4.4.1 Differential expression analysis

TRAP-purified mRNAs from either Drd1a- or Drd2-expressing SPNs were reverse-transcribed, amplified, and used to interrogate Affymetrix 430 2.0 GeneChip microarrays. Affymetrix CEL files were processed and normalized using the RMA algorithm from the Bioconductor “affy” package [40]. For each (Dose, Cell Type) group, the log2 fold-change for each probe-set was computed as the difference in mean log2 expression relative to the matched saline-treated group. Significance of differences between groups was assessed by Welch’s t test using scipy.stats or R [98]. To report counts for comparisons between groups, we defined significantly differentially expressed genes as those having any probe-set with a change greater than 1.5-fold and a Benjamini–Hochberg adjusted P value from Welch’s t test < 0.10. Source code and data files to replicate all statistical analyses are provided on the Web site http://pd.sciencespace.org and at http://github.com/aheilbut/PDmouse. Dataset S20 contains all statistical results for all probe-sets, and Table 4.26 provides links to complete files with all data tables discussed.
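A minimal sketch of the per-probe-set statistics and the reporting rule described above. It assumes the RMA output is on the log2 scale (as RMA produces) and that the 1.5-fold cut-off applies to increases and decreases alike. The helper names (`probe_set_stats`, `is_significant`, `gene_is_de`) are hypothetical, and the Benjamini–Hochberg adjusted P values are taken as given here, since they are computed across all probe-sets rather than within one.

```python
import numpy as np
from scipy import stats

# Reporting thresholds from the text: >1.5-fold change and BH-adjusted P < 0.10
LOG2_FC_CUTOFF = np.log2(1.5)
FDR_CUTOFF = 0.10


def probe_set_stats(treated, saline):
    """Return (log2 fold-change, nominal Welch P value) for one probe-set.

    `treated` and `saline` hold RMA-normalized (log2) expression values for one
    (Dose, Cell Type) group and its matched saline-treated group.
    """
    treated = np.asarray(treated, dtype=float)
    saline = np.asarray(saline, dtype=float)
    # On the log2 scale, the difference in group means is the log2 fold-change
    log2_fc = float(treated.mean() - saline.mean())
    # equal_var=False gives Welch's t test (no equal-variance assumption)
    _, p_value = stats.ttest_ind(treated, saline, equal_var=False)
    return log2_fc, float(p_value)


def is_significant(log2_fc, adj_p):
    """Apply both cut-offs; the fold-change cut-off is applied in either direction (assumption)."""
    return abs(log2_fc) > LOG2_FC_CUTOFF and adj_p < FDR_CUTOFF


def gene_is_de(probe_results):
    """A gene counts as differentially expressed if ANY of its probe-sets passes.

    `probe_results` is an iterable of (log2_fc, bh_adjusted_p) pairs.
    """
    return any(is_significant(fc, p) for fc, p in probe_results)
```

For example, `probe_set_stats([8.1, 8.4, 8.3, 8.6], [7.2, 7.0, 7.3, 7.1])` yields a log2 fold-change of 1.2, well above log2(1.5) ≈ 0.585; whether the probe-set is reported then depends on its adjusted P value.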

4.4.2 Linear modeling of AIM scores from L-DOPA dose and expression

Since there is variability in the timing and severity of dyskinesias both in the clinic and in these mouse models, one of the initial questions was whether some genes are associated specifically with the emergence of dyskinesia, as distinct from other expression changes that accompany L-DOPA treatment but may not be directly related to dyskinesia. To test the hypothesis that differences in gene expression may be correlated with variation in AIM severity, we considered two sets of nested linear models relating the expression of each probe-set, L-DOPA dose, and AIM score:

AIM ∼ Expression + C(Dose), AIM ∼ C(Dose), and AIM ∼ Expression;
Expression ∼ AIM + C(Dose), Expression ∼ C(Dose), and Expression ∼ AIM.

C(Dose) refers to the factor variable representing high- or low-dose levodopa treatment. Models were fit using the “ols” procedure in the Python statsmodels module [108]. Comparing these models allowed assessment of whether expression was correlated with AIM score, and whether that correlation exceeded what would be expected from the common dependence of both dyskinesia and expression on levodopa dose. This process distinguishes three possible sets of genes: (i) dose-dependent genes with the expected correlation with dyskinesia severity (i.e., significant differential expression across dose and significant association of AIM score with dose, but no significant association of AIM score with expression after adjusting for dose); (ii) dose-dependent genes with excess correlation with dyskinesia (i.e., as in (i), but with a significant association of AIM score with expression after adjusting for dose); and (iii) genes whose expression is independent of dose yet correlated with dyskinesia (i.e., as in (ii), but without significant differential expression between doses). Figure 4.4 shows theoretical examples of each of these types of probe-sets. Dataset S16 reports statistics for all model fits and comparisons, enabling comparisons among models and sorting of probe-sets by correlation with AIMs, statistical significance, or magnitude of expression change. Probe-sets are sorted by the significance of the multiple correlation for the model Expression ∼ C(Dose) + AIM, after filtering for significant changes of 1.5-fold or greater between the high- and low-dose groups.
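A minimal sketch of the nested-model comparison for a single probe-set, assuming the comparisons are standard partial F-tests between nested OLS fits (the text does not name the test statistic). It re-implements the fits with NumPy and SciPy rather than the statsmodels formula interface used in the original analysis; with statsmodels, `smf.ols("AIM ~ Expression + C(Dose)", data=df).fit().compare_f_test(reduced_fit)` performs the equivalent comparison. The function names and the 0/1 `high_dose` coding are hypothetical.

```python
import numpy as np
from scipy import stats


def _ols_rss(X, y):
    """Fit y ~ X by ordinary least squares; return (residual sum of squares, residual df)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), X.shape[0] - X.shape[1]


def nested_f_test(X_reduced, X_full, y):
    """Partial F-test: do the extra columns of X_full improve on X_reduced?"""
    rss_r, df_r = _ols_rss(X_reduced, y)
    rss_f, df_f = _ols_rss(X_full, y)
    df_num = df_r - df_f
    f_stat = ((rss_r - rss_f) / df_num) / (rss_f / df_f)
    return f_stat, float(stats.f.sf(f_stat, df_num, df_f))


def compare_aim_models(aim, expression, high_dose):
    """Compare AIM ~ C(Dose), AIM ~ Expression and AIM ~ Expression + C(Dose)."""
    y = np.asarray(aim, dtype=float)
    expr = np.asarray(expression, dtype=float)
    dose = np.asarray(high_dose, dtype=float)     # 0/1 dummy coding of C(Dose)
    ones = np.ones_like(y)
    X_null = ones[:, None]                        # AIM ~ 1 (intercept only)
    X_dose = np.column_stack([ones, dose])        # AIM ~ C(Dose)
    X_expr = np.column_stack([ones, expr])        # AIM ~ Expression
    X_full = np.column_stack([ones, dose, expr])  # AIM ~ Expression + C(Dose)
    return {
        "dose": nested_f_test(X_null, X_dose, y),
        "expression": nested_f_test(X_null, X_expr, y),
        "expression_given_dose": nested_f_test(X_dose, X_full, y),
    }
```

The mirror-image comparisons (Expression ∼ AIM + C(Dose) against Expression ∼ C(Dose), and so on) follow by calling `nested_f_test` with expression as the response and the AIM score as the added column.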

4.4.3 Pathways Overlap Analysis

For each treatment group, the set of statistically significant differentially expressed genes (Benjamini–Hochberg FDR < 0.10), regardless of the magnitude of change, was compared against the WikiPathways gene sets to compute overlaps. Statistical significance of each gene-set overlap was assessed by a hypergeometric test.
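A minimal sketch of a one-sided hypergeometric overlap test for one treatment group and one pathway, written with the standard library only. It assumes the gene universe is the set of genes measured on the array and that significance refers to over-representation (the upper tail); neither detail is stated above, and the function names are hypothetical. `scipy.stats.hypergeom.sf(k - 1, M, n, N)` gives the same upper-tail probability.

```python
from math import comb


def overlap_p_value(n_universe, n_pathway, n_de, n_overlap):
    """P(X >= n_overlap) when drawing n_de genes from n_universe, n_pathway of which are in the pathway."""
    upper = min(n_pathway, n_de)
    total = comb(n_universe, n_de)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def pathway_overlap(de_genes, pathway_genes, universe):
    """Return (sorted overlapping genes, upper-tail hypergeometric P value)."""
    universe = set(universe)
    de = set(de_genes) & universe          # restrict both sets to measured genes
    pathway = set(pathway_genes) & universe
    hits = de & pathway
    p = overlap_p_value(len(universe), len(pathway), len(de), len(hits))
    return sorted(hits), p
```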

4.4.4 Multiple Hypothesis Testing Adjustment

P values from all statistical tests were adjusted using the Benjamini–Hochberg procedure with “multicomp.multipletests” in the Python statsmodels module [108] to control the false discovery rate over all probe-sets. Bonferroni-adjusted and nominal P values are also reported.
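For reference, a standard-library sketch of the Benjamini–Hochberg step-up adjustment, plus Bonferroni, so that the adjusted values reported alongside nominal P values can be reproduced without statsmodels. It is intended to match `multipletests(pvals, method="fdr_bh")` (located in `statsmodels.stats.multitest` in current statsmodels releases), but it is an illustrative re-implementation, not the code used for the reported results.

```python
def benjamini_hochberg(p_values):
    """BH-adjusted P values (step-up), returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1 and kept monotone
    # Walk from the largest P value down, enforcing monotonicity with a running minimum
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bonferroni(p_values):
    """Bonferroni-adjusted P values, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

A probe-set is then called significant at FDR 0.10 when its BH-adjusted value is below 0.10.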

Full details on experimental methods and biological reagents are provided in the main text and supplement of Heiman, 2014 [51].

[Figure 4.4, panels A–C: integrated AIMs (vertical axis) vs. log2 expression (horizontal axis) for chronic, low-dose and chronic, high-dose levodopa, with fitted models AIM ∼ Expression, AIM ∼ Dose, and AIM ∼ Dose + Expression. Panel titles: Group 1: gene expression is dose-dependent; expected correlation between expression and AIMs score (given knowledge of dose, no statistically significant improvement in model by including expression). Group 2: gene expression is dose-dependent; statistically significant excess correlation between expression and AIMs score, even given knowledge of dose. Group 3: gene expression is not related to dose, but AIMs score is correlated with expression.]

Figure 4.4: Hypothetical examples of genes with different relationships to AIM scores that are distinguished by comparisons between alternative linear models. Each panel shows a scatter plot of AIM score vs. log2 expression. Each point represents one gene in one mouse; its expression is encoded by horizontal position, and the integrated AIM score of that mouse is encoded by vertical position. Blue points represent measurements from mice treated with low-dose levodopa; red points are from mice treated with the high dose of the drug. (A) An example of a gene in group 1. The expression of this gene is dependent on dose. Fine dotted lines show the predicted AIM score for a model of AIM score as a function of dose (blue for low dose and red for high dose). If the AIM score is modeled as a function of gene expression alone (AIM ∼ Expression), there is a significant correlation between AIM score and expression (green dotted line). Red and blue large dashed lines depict AIM score predictions from a model using both dose and expression (AIM ∼ Dose + Expression). For group 1, there is no significant difference between the dashed and dotted lines; conditional on knowledge of dose, AIM score is not correlated with expression. (B) A hypothetical gene in group 2. For this gene, there is a significant difference between the dashed and dotted lines; the model (AIM ∼ Dose + Expression) fits significantly better than (AIM ∼ Dose). (C) A hypothetical gene in group 3, for which AIM score is not well modeled by gene expression alone (AIM ∼ Expression), because expression measurements overlap between the dose groups. However, there is again a significant difference between the (AIM ∼ Dose + Expression) and (AIM ∼ Dose) models.
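The three groups illustrated in Figure 4.4 and defined in Section 4.4.2 reduce to a decision rule on three significance calls. The sketch below is hypothetical: it assumes each call has already been made (for example, from BH-adjusted P values of the corresponding model comparisons) and simply encodes the definitions as stated. Probe-sets matching none of the definitions return `None`.

```python
from typing import Optional


def classify_probe_set(
    de_across_dose: bool,
    aim_depends_on_dose: bool,
    expression_adds_to_dose: bool,
) -> Optional[int]:
    """Assign a probe-set to group 1, 2 or 3 from three significance calls.

    de_across_dose:          significant differential expression, high vs. low dose
    aim_depends_on_dose:     AIM ~ C(Dose) is significant
    expression_adds_to_dose: AIM ~ Expression + C(Dose) fits significantly better than AIM ~ C(Dose)
    """
    if not aim_depends_on_dose:
        return None  # all three groups require a dose effect on AIM score
    if de_across_dose:
        # Dose-dependent genes: group 1 if expression adds nothing beyond dose, else group 2
        return 2 if expression_adds_to_dose else 1
    # Dose-independent expression that still explains AIM score beyond dose
    return 3 if expression_adds_to_dose else None
```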
