PRS x SNP interaction analysis - Testing for evidence of PRS-environmental factor interactions

Chapter 5 Testing for evidence of PRS-environmental factor interactions and PRS-SNP

5.4 PRS x SNP interaction analysis

To examine whether individual SNPs modify the combined effect of SNPs on breast cancer risk, PRS x SNP interactions were tested. All possible pairwise interactions between individual SNPs and the PRS were tested, with BBCS subjects forming the training sample and UK2 study cases forming the replication sample. To maintain a large number of SNPs in the PRS, only the two GWAS were analysed. Imputed SNPs were also included to further increase the number of SNPs in the analysis. SNP effects were estimated in the BBCS subjects using a logistic regression model, with four principal components included as covariates.
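
The construction of a score that leaves out the tested SNP can be sketched as follows. This is a minimal illustration only: it assumes the PRS is a sum of risk-allele dosages weighted by the per-allele log odds ratios from the training sample, and the function name `leave_one_out_prs`, the SNP labels and all numeric values are hypothetical.

```python
from typing import Dict, List


def leave_one_out_prs(
    dosages: Dict[str, List[float]],
    log_odds: Dict[str, float],
    tested_snp: str,
) -> List[float]:
    """Score each subject on every SNP except `tested_snp`.

    dosages  : SNP id -> per-subject risk-allele dosage (0 to 2; imputed
               SNPs may have fractional values)
    log_odds : SNP id -> per-allele log odds ratio from the training sample
    """
    n_subjects = len(dosages[tested_snp])
    scores = [0.0] * n_subjects
    for snp, snp_dosages in dosages.items():
        if snp == tested_snp:
            continue  # the tested SNP must not contribute to its own score
        weight = log_odds[snp]
        for i, dose in enumerate(snp_dosages):
            scores[i] += weight * dose
    return scores


# Toy data: three SNPs, four subjects (values invented for illustration).
toy_dosages = {
    "snpA": [0, 1, 2, 1],
    "snpB": [1, 1, 0, 2],
    "snpC": [2, 0, 1, 1],
}
toy_log_odds = {"snpA": 0.10, "snpB": -0.05, "snpC": 0.20}

prs_without_a = leave_one_out_prs(toy_dosages, toy_log_odds, "snpA")
```

In a case-only setting, the resulting score for the cases would then be tested for association with the dosage of the SNP that was left out.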

Before correcting for multiple comparisons, 3,539 significant interactions were observed between individual SNPs and the PRS constructed using all independent GWAS SNPs minus the tested SNP. After correcting for multiple comparisons using an FDR of 5%, no significant interactions were observed when all SNPs were used in the score. For the p ≤ 0.001 interval, 217 of the 220 tested PRS x SNP interactions were significant at the 5% significance level. Approximately 99% of the tested interactions were therefore significant, far more than the 5% that would be expected by chance. All 217 interactions remained significant after correcting for multiple comparisons using an FDR of 5%. For the p ≤ 0.01 interval, 432 significant interactions were observed, 89 of which remained significant after adjusting for multiple comparisons. For the scores constructed using SNPs with p-value thresholds from p ≤ 0.05 to p ≤ 1, no SNP was significantly associated with the polygenic score constructed using the remaining SNPs within the same bin once the association p-values had been adjusted by the FDR. For these bins, therefore, no evidence was found to suggest that individual SNPs interact with the constructed polygenic scores.
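
The difference between nominal significance and significance after FDR control can be illustrated with a short sketch. The text does not state which FDR procedure was used; the Benjamini-Hochberg step-up rule is assumed here, and the p-values are invented for illustration.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], fdr: float = 0.05) -> List[bool]:
    """Return one reject flag per test under the Benjamini-Hochberg rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k whose ordered p-value is <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Toy p-values (invented): five are below 0.05 before correction.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
flags = benjamini_hochberg(toy_p, fdr=0.05)
n_nominal = sum(p < 0.05 for p in toy_p)  # nominally significant tests
n_fdr = sum(flags)                        # tests surviving FDR control
```

In this toy set five tests are nominally significant but only two survive, which mirrors the pattern seen for the less stringent bins, where many nominal hits disappear after correction.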

Focusing on the intervals where significant interactions were still observed after correcting for multiple testing (p ≤ 0.01 and p ≤ 0.001), 17 SNPs were significant after FDR correction in both the p ≤ 0.01 and p ≤ 0.001 bins. None of these 17 SNPs has previously been shown to be associated with breast cancer or other traits at genome-wide significance. In fact, none of the 89 single SNPs in the p ≤ 0.01 interval has so far been found to reach genome-wide significance for any trait. Seven of the 217 SNPs observed to significantly interact with the PRS constructed using SNPs with p ≤ 0.001 are published genome-wide significant breast cancer SNPs (Table 5-4).

With approximately 99% of the SNP x PRS interactions in the p ≤ 0.001 bin remaining significant after adjusting by an FDR of 5%, one possible explanation is that the score contains SNPs which are highly correlated with the individual SNP being tested. However, none of the individual SNPs were found to be highly correlated with the remaining SNPs used to construct the PRS (all correlations were $r^2 < 0.2$), meaning that it was unlikely that the linear association was driven by correlation between the individual SNPs and those used in the score. This was also the case for the p ≤ 0.01 analysis, as none of the 89 SNPs were found to be highly correlated with the remaining SNPs used to construct the PRS (all correlations were $r^2 < 0.2$).
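
A correlation check of this kind can be sketched as below. It is an illustration under assumptions: r² is taken to be the squared Pearson correlation between allele dosages, the 0.2 cut-off follows the text, and the function names and toy genotypes are hypothetical.

```python
from typing import Dict, List


def r_squared(x: List[float], y: List[float]) -> float:
    """Squared Pearson correlation between two dosage vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treated as 0 here
    return (sxy * sxy) / (sxx * syy)


def snps_in_ld(
    tested: List[float],
    score_snps: Dict[str, List[float]],
    threshold: float = 0.2,
) -> List[str]:
    """List score SNPs whose r² with the tested SNP reaches the threshold."""
    return [
        snp for snp, dosages in score_snps.items()
        if r_squared(tested, dosages) >= threshold
    ]


# Toy genotypes for six subjects (invented for illustration).
tested = [0, 1, 2, 1, 0, 2]
score_snps = {
    "linked": [0, 1, 2, 1, 0, 2],    # identical to the tested SNP
    "unlinked": [1, 2, 1, 0, 1, 1],  # uncorrelated with the tested SNP
}
flagged = snps_in_ld(tested, score_snps)
```

An empty list for every tested SNP would correspond to the situation described above, where no score SNP reached r² of 0.2 with the SNP under test.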

The results indicate that individual SNPs could modify the combined effect of SNPs on breast cancer risk, with some of the individual SNPs having previously been observed to be associated with breast cancer risk. With each polygenic score in chapter 2 having been shown to be significantly associated with breast cancer risk in an independent sample, the results from the PRS x SNP interaction analysis suggest that the presence of certain SNPs in either the p ≤ 0.01 or p ≤ 0.001 score could modify the effect the score has on breast cancer risk. However, this is a case-only analysis, so it should be checked whether the interactions replicate in a case-control setting.

PRS          No. SNPs*   Sig. interactions**   FDR**
p ≤ 1        66,339      3,539                 0
p ≤ 0.7      55,786      2,959                 0
p ≤ 0.4      39,217      2,139                 0
p ≤ 0.1      13,442      887                   0
p ≤ 0.05     7,474       622                   0
p ≤ 0.01     1,813       432                   89
p ≤ 0.001    220         217                   217

* The number of SNPs with a p-value less than or equal to the given PRS threshold
** p-value < 0.05

No. SNPs-1 = the number of SNPs used to construct the PRS

For the FDR, the total no. SNPs in the PRS were used as the number of tests

Table 5-3: Testing for sig. interactions between SNPs and $\widehat{PRS}_{m-1}$ (case-only)

SNP          Chromosome   Position
rs11249433   1            10566215
rs13387042   2            217905832
rs12655019   5            56195790
rs865686     9            110888478
rs1219648    10           123346190
rs10995190   10           64278682
rs3803662    16           52586341

Table 5-4: Published genome-wide significant breast cancer SNPs found to significantly interact with PRS

5.5 Discussion

In chapter 2, it was observed that polygenic scores constructed using breast cancer GWAS SNPs from one GWAS were associated with breast cancer status in an independent GWAS. To investigate this further, I tested whether there was evidence that the effect of a breast cancer polygenic score on breast cancer risk could be modified by either BMI or age at menarche. For other complex diseases with a polygenic basis, evidence of PRS-environmental factor interactions has been established (139-142). Individual breast cancer susceptibility variants have previously been shown to interact with BMI and age at menarche, but this was the first time that an en-masse breast cancer PRS had been tested for interaction with either risk factor.

From the outset, the two breast cancer GWAS would have been considered small in sample size for an interaction analysis. The size of the replication GWAS was reduced further as only a limited number of BBCS cases had either BMI or age at menarche information. As only cases had BMI or age at menarche data, and to improve the power to detect significant interactions, a case-only approach was implemented. A case-only interaction analysis assumes that the disease being studied is rare and that the gene and environment factors being tested are independent in the population. The problem with assuming independence is that, typically, there is uncertainty as to whether the assumption holds (149). Great care should therefore be taken when drawing conclusions from the results of a case-only interaction analysis. Even though there was some uncertainty as to whether the assumption of independence holds between the breast cancer polygenic scores and BMI, age at menarche and the genotyped SNPs, a case-only analysis was conducted because information on the environmental factors was only available for a small proportion of the cases genotyped in the studies used in this thesis. The interactions should also be tested using a case-control interaction analysis in a much larger number of individuals.
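
The role of the two assumptions can be made explicit with a short derivation. It is a simplified sketch for a binary genetic factor $G$ and a binary environmental factor $E$ under a logistic disease model; the symbols are introduced here for illustration, and the chapter itself works with continuous scores and linear regression.

```latex
% Logistic disease model with an interaction term
\operatorname{logit} P(D = 1 \mid G, E)
  = \beta_0 + \beta_G G + \beta_E E + \beta_{GE}\, G E

% G-E odds ratio within disease stratum d (d = 1 cases, d = 0 controls)
OR_{GE \mid D = d}
  = \frac{P(G=1, E=1 \mid D=d)\; P(G=0, E=0 \mid D=d)}
         {P(G=1, E=0 \mid D=d)\; P(G=0, E=1 \mid D=d)}

% The ratio of the two stratum-specific odds ratios is the interaction effect
\frac{OR_{GE \mid D = 1}}{OR_{GE \mid D = 0}} = e^{\beta_{GE}}

% Rare disease: controls resemble the population.
% G-E independence in the population: OR_{GE | D=0} is approximately 1, hence
OR_{GE \mid D = 1} \approx e^{\beta_{GE}}
```

If $G$ and $E$ are in fact correlated in the population, the odds ratio among controls differs from 1 and the association seen among cases no longer isolates the interaction, which is why the independence assumption matters.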

For the case-only interaction analysis conducted in this chapter, multiple polygenic scores were constructed at different p-value thresholds for the BBCS cases who had either age at menarche or BMI information. A linear regression model was then used to relate each PRS to an environmental factor, with a significant association taken as evidence of an interaction. As none of the UK2/BBCS derived polygenic scores had a significant linear relationship with age at menarche, there was no evidence to suggest that a polygenic score constructed using GWAS SNPs interacts with age at menarche. A number of the scores constructed using SNPs genotyped on the iCOGS custom array were, however, significantly associated with age at menarche, suggesting that an interaction exists. The scores derived using COGS SNPs with a p-value ≤ 1, p-value ≤ 0.7 and p-value ≤ 0.4 were significantly associated with age at menarche (p < 0.05). When the choice of SNPs used to construct the polygenic score was made more stringent, the associations became non-significant. The results suggest that breast cancer scores constructed using a large number of independent genotyped SNPs could interact with age at menarche to affect breast cancer risk. However, no significant associations were observed between BMI and any of the polygenic scores constructed using either UK2/BBCS SNPs or COGS SNPs. The PRS x environmental factor analyses conducted in this chapter would have had at most 25% power to detect a PRS association with BMI. The analyses should therefore be replicated in a larger sample, preferably one with a greater number of individuals with BMI and age at menarche information.
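
The case-only test described above amounts to a slope test in a simple linear regression. The sketch below is an illustration under assumptions: the PRS is regressed on the environmental factor (the text does not state which variable was the outcome), no covariates are included, the p-value uses a normal approximation that is only reasonable for large samples, and the data are invented.

```python
from math import sqrt
from statistics import NormalDist
from typing import List, Tuple


def slope_test(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Ordinary least squares slope of y on x, its standard error and a
    two-sided p-value based on a normal approximation."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    rss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    se = sqrt(rss / (n - 2) / sxx)
    z = slope / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return slope, se, p_value


# Toy case-only data (invented): age at menarche and PRS for five cases.
age_at_menarche = [11, 12, 13, 14, 15]
prs = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, se, p_value = slope_test(age_at_menarche, prs)
```

A slope that differs significantly from zero among cases would be read, under the rare-disease and independence assumptions, as evidence of an interaction between the score and the environmental factor.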

BMI and age at menarche are not the only environmental factors that have been identified as breast cancer risk factors. Further analyses should therefore be conducted to examine whether other breast cancer risk factors, such as percent mammographic density, interact with breast cancer polygenic scores. Unfortunately, at the time of performing the analyses in this chapter, I did not have access to data that would have enabled me to investigate whether interactions between PRS and other breast cancer risk factors exist. Data on other breast cancer risk factors existed, but applying for access would have been too time consuming and would have delayed my analyses.

In this chapter, it was also examined whether any of the genotyped GWAS SNPs interacted with a polygenic score to affect breast cancer risk. Significant associations were found and, surprisingly, for the p ≤ 0.001 interval approximately 99% of the tested interactions were significant, far more than the 5% that would be expected by chance. Even after adjusting for multiple testing using an FDR < 5%, a number of significant associations were observed for the scores constructed using SNPs with p ≤ 0.01 and p ≤ 0.001. For the other intervals, no significant interactions were observed after adjusting for multiple testing. Only 17 SNPs significantly interacted with both the p ≤ 0.01 and p ≤ 0.001 scores based on the remaining SNPs, and none of these SNPs significantly interacted with any of the other scores (p ≤ 1, p ≤ 0.7, p ≤ 0.4, p ≤ 0.1 and p ≤ 0.05) after correcting for multiple testing. There was therefore no evidence to suggest that these SNPs interacted with other PRS, only with those based on SNPs with p ≤ 0.01 and/or p ≤ 0.001. None of the individual SNPs were highly correlated with any of the remaining SNPs used to construct the p ≤ 0.01 and p ≤ 0.001 scores, suggesting that correlation between SNPs is not driving the significant linear association and that these SNPs may be interacting with the scores. As this was a case-only analysis, it should be tested whether the same can be shown in a case-control interaction analysis.

In document Exploring the genetic architecture and the chromatin organisation of breast cancer (Page 179-185)