Chapter 5 Testing for evidence of PRS-environmental factor interactions and PRS-SNP
5.3 PRS x risk factor interaction analysis
The PRS x risk factor analyses required two independent samples, and one of them, the replication sample, had to contain individuals with BMI or age at menarche data. The main aim of these interaction analyses was to establish whether there was evidence of an interaction between a breast cancer polygenic score, representing SNPs across the genome, and either BMI or age at menarche. In the data I had access to, BMI and age at menarche were only available for BBCS subjects; I did not have this information for the UK2 or COGS subjects. Only a small number of individuals could therefore be used to test for an interaction, so a case-only approach was implemented to improve the statistical power to detect one. The case-only design relies on the assumption that the polygenic score and the risk factor are independent in the general population. Under this assumption, an association between the two among cases indicates an interaction on the multiplicative scale. For each analysis, the BBCS cases with the relevant risk factor data (BMI or age at menarche) were assigned to the replication sample, and the remaining BBCS cases, together with the BBCS controls, were assigned to the training set. The UK2 GWAS was combined with the BBCS training sample to increase the size of the training sample and so improve the precision of the SNP effect estimates used to construct the polygenic score. For the BMI interaction analysis, the combined GWAS training sample therefore consisted of 4,316 cases and 5,190 controls, with the replication set containing 921 BBCS cases. For the age at menarche interaction analysis, the combined GWAS training set contained 4,312 cases and 5,190 controls, with the replication set containing 925 BBCS cases.
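The case-only argument can be made explicit with a short derivation. It is an illustrative sketch rather than the exact model fitted here. It assumes a logistic model for breast cancer risk with a multiplicative interaction term, a rare disease, and independence of the polygenic score G and the risk factor E in the population. The symbols G, E, D and the β coefficients are introduced here for illustration only.

```latex
\begin{align}
\operatorname{logit} P(D = 1 \mid G, E) &= \beta_0 + \beta_G G + \beta_E E + \beta_{GE}\, G E \\
P(E \mid G, D = 1) &= \frac{P(D = 1 \mid G, E)\, P(E \mid G)}{P(D = 1 \mid G)} \\
&\approx \frac{\exp(\beta_0 + \beta_G G + \beta_E E + \beta_{GE}\, G E)\, P(E)}{P(D = 1 \mid G)} \\
&\propto \exp\{(\beta_E + \beta_{GE}\, G)\, E\}\, P(E)
\end{align}
```

The approximation uses the rare-disease assumption (logit ≈ log) and P(E | G) = P(E). Factors that depend on G alone are absorbed into the normalising constant. The distribution of E among cases therefore depends on G only through β_GE. If β_GE = 0, E and G are independent among cases, so a PRS-risk factor association in cases points to an interaction.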
To increase the training sample size further, and to test whether a score enriched for breast cancer-associated SNPs interacts with either risk factor, the COGS was used as the training sample in a separate interaction analysis. For the BMI analysis, the COGS training sample consisted of all European COGS subjects (48,064 cases and 43,486 controls), and the replication sample contained 921 BBCS cases. For the age at menarche analysis, the COGS training sample again consisted of 48,064 cases and 43,486 controls, and the replication sample contained 925 BBCS cases. Although the COGS training data were external to the BBCS GWAS, the replication sample could not be enlarged, because BMI and age at menarche information was only available for those BBCS subjects.

When women considered to be outliers were included, BMI in the replication sample ranged from 16.57 to 47.22 kg/m² ("underweight" to "obese") (Figure 5-1). The mean BMI was 26.71 kg/m², which lies within the "overweight" interval (25 ≤ BMI < 30). Age at menarche (the age at which a woman has her first menstrual period) ranged from 9 to 20 years in the replication sample when outliers were included (Figure 5-2), with a mean of 13 years.
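As a quick check on the BMI categories quoted above, the standard WHO adult cut-offs can be applied to the reported minimum, mean and maximum. These cut-offs are underweight < 18.5, normal 18.5 to < 25, overweight 25 to < 30 and obese ≥ 30 kg/m², and they are assumed here to be the convention used in the text. The function name `bmi_category` is hypothetical.

```python
def bmi_category(bmi: float) -> str:
    """Map a BMI value (kg/m^2) to a WHO adult category (assumed cut-offs)."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    return "obese"


# Minimum, mean and maximum BMI reported for the replication sample.
for value in (16.57, 26.71, 47.22):
    print(value, bmi_category(value))
```

Run on those three values, the function returns "underweight", "overweight" and "obese", consistent with the description above.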
Figure 5-1: Boxplot of BMI distribution for replication sample
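Boxplots conventionally flag as outliers the points lying more than 1.5 interquartile ranges below the first quartile or above the third quartile (Tukey's rule). The sketch below assumes this convention defines the "outliers" referred to for Figure 5-1. Software packages compute quartiles in slightly different ways, so borderline points may be classified differently. The function name `tukey_outliers` is hypothetical.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR], the usual boxplot whisker rule."""
    # "inclusive" treats the data as the whole population when interpolating quartiles.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]
```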
The SNPs used to construct the polygenic scores were those retained after QC and LD-clumping (r² > 0.1). Imputed SNPs were used to increase the number of SNPs available for the BBCS cases' polygenic scores. UK2 SNPs that were retained after QC and LD-clumping but had not been genotyped in the BBCS were extracted from the BBCS imputed data for all BBCS individuals. As a result, up to 82,823 SNPs were used to construct the polygenic scores for the BBCS cases. Imputed SNPs were also used when the COGS was the training sample. COGS SNPs that were retained after QC and LD-clumping (r² > 0.1) but had not been genotyped in the BBCS were extracted from the BBCS imputed data for the BBCS cases in the replication sample. Up to 41,651 SNPs were therefore used to construct the COGS-based polygenic scores for the BBCS cases with BMI or age at menarche data. After the scores had been computed, a linear regression model was used to test whether the breast cancer polygenic score was linearly associated with either BMI or age at menarche, with four principal components included as covariates.
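The LD-clumping step can be illustrated with a simplified greedy procedure. SNPs are visited in order of increasing training p-value. Each SNP that has not yet been removed is kept as an index SNP, and every remaining SNP with r² > 0.1 to it is removed. This sketch assumes pairwise r² values are supplied in a dictionary, which is a hypothetical input format. It also ignores the physical-distance window that dedicated tools such as PLINK apply when clumping.

```python
from typing import Dict, FrozenSet, List


def greedy_clump(pvalues: Dict[str, float],
                 r2: Dict[FrozenSet[str], float],
                 r2_threshold: float = 0.1) -> List[str]:
    """Return the index SNPs kept after greedy LD-clumping (simplified sketch).

    pvalues maps SNP id -> training p-value; r2 maps an unordered SNP pair
    (as a frozenset) -> r^2. Pairs missing from r2 are treated as r^2 = 0.
    """
    order = sorted(pvalues, key=pvalues.get)  # most significant first
    removed = set()
    index_snps = []
    for snp in order:
        if snp in removed:
            continue
        index_snps.append(snp)
        removed.add(snp)
        # Remove every remaining SNP in LD (r^2 above threshold) with this index SNP.
        for other in order:
            if other not in removed and r2.get(frozenset((snp, other)), 0.0) > r2_threshold:
                removed.add(other)
    return index_snps
```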
There was no significant linear association between age at menarche and the breast cancer polygenic score derived from 82,823 UK2/BBCS SNPs (p ≤ 1; association p-value = 0.602) (Table 5-1). The same held at every p-value threshold: none of the UK2/BBCS-derived polygenic scores was significantly associated with age at menarche in the BBCS cases. However, a significant linear association was observed between age at menarche and the breast cancer polygenic score derived from 41,651 COGS SNPs (p ≤ 1; association p-value = 0.020), suggesting that an interaction between the two exists (Table 5-1). Significant associations between age at menarche and the PRS were also observed for the scores constructed from SNPs with p ≤ 0.7 and p ≤ 0.4 (association p-values = 0.016 and 0.042, respectively). These associations were, however, only modestly significant. Nonetheless, the results suggested a PRS x age at menarche interaction when fewer restrictions were placed on the SNPs included in the polygenic score and when the score was based on SNP effects estimated in the COGS.
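Each p-value threshold in Tables 5-1 and 5-2 defines a different score. Only SNPs whose training-sample p-value is at or below the threshold are included. Each individual's score is then the sum of their effect-allele dosages weighted by the training effect estimates. The sketch below assumes this standard additive score with log odds ratios as weights, effect alleles already aligned between the training and replication data, and hypothetical input dictionaries. It also reports the number of SNPs selected at each threshold, which corresponds to the "No. SNPs" column.

```python
from typing import Dict, Iterable, Tuple

# Thresholds used in Tables 5-1 and 5-2.
THRESHOLDS = (1.0, 0.7, 0.4, 0.1, 0.05, 0.01, 0.001)


def polygenic_scores(
    weights: Dict[str, Tuple[float, float]],   # SNP -> (log-OR beta, training p-value)
    dosages: Dict[str, Dict[str, float]],      # individual -> {SNP: effect-allele dosage}
    thresholds: Iterable[float] = THRESHOLDS,
) -> Dict[float, Tuple[int, Dict[str, float]]]:
    """Return {threshold: (number of SNPs selected, {individual: score})}."""
    results = {}
    for t in thresholds:
        # Keep only SNPs whose training p-value passes this threshold.
        selected = {snp: beta for snp, (beta, p) in weights.items() if p <= t}
        # Additive score; a SNP missing for an individual contributes nothing (simplification).
        scores = {
            person: sum(beta * dose.get(snp, 0.0) for snp, beta in selected.items())
            for person, dose in dosages.items()
        }
        results[t] = (len(selected), scores)
    return results
```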
The linear association between BMI and the breast cancer polygenic score derived from 82,823 UK2/BBCS SNPs (p ≤ 1; association p-value = 0.153) was non-significant (Table 5-2), as it was for most of the scores constructed using UK2/BBCS SNP estimates. A significant association was, however, observed between BMI and the breast cancer polygenic score derived from 377 GWAS SNPs (p ≤ 0.001; association p-value = 0.040). This result suggested a PRS x BMI interaction when the UK2/BBCS SNPs included in the polygenic score were restricted to those with p ≤ 0.001. No such association was observed for any of the COGS-derived polygenic scores, so there was no evidence of an interaction between BMI and any of the breast cancer polygenic scores when COGS SNP effects were used.
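The association p-values quoted in the two preceding paragraphs come from the linear model described earlier. A dependency-free sketch of that test follows. It assumes the risk factor (BMI or age at menarche) is the outcome and that the PRS and four principal components are the predictors. The function names and simulated data are hypothetical. The p-value uses a normal approximation to the t distribution, which is reasonable for roughly 900 cases. In practice a library routine such as statsmodels' OLS would normally be used instead.

```python
import math
import random
from typing import List, Sequence, Tuple


def _invert(matrix: List[List[float]]) -> List[List[float]]:
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot_row = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def prs_association(phenotype: Sequence[float], prs: Sequence[float],
                    pcs: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """Fit phenotype ~ 1 + PRS + PCs by OLS; return (beta_prs, SE, two-sided p)."""
    X = [[1.0, g, *pc] for g, pc in zip(prs, pcs)]
    n, p = len(X), len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, phenotype)) for i in range(p)]
    xtx_inv = _invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    rss = sum((y - sum(b * x for b, x in zip(beta, row))) ** 2
              for row, y in zip(X, phenotype))
    sigma2 = rss / (n - p)                     # residual variance
    se = math.sqrt(sigma2 * xtx_inv[1][1])     # index 1 = PRS coefficient
    z = beta[1] / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # normal approximation, two-sided
    return beta[1], se, p_value


def simulate(n: int, effect: float, seed: int = 1):
    """Hypothetical toy data: four PCs, a standardised PRS and a phenotype."""
    rng = random.Random(seed)
    pcs = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(n)]
    prs = [rng.gauss(0, 1) for _ in range(n)]
    y = [13.0 + effect * g + 0.3 * pc[0] + rng.gauss(0, 1) for g, pc in zip(prs, pcs)]
    return y, prs, pcs
```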
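Table 5-1: Age at menarche
Training set   Replication set   SNP p-value threshold   No. SNPs   Association p-value
UK2/BBCS       BBCS cases        p ≤ 1                   82,823     0.602
UK2/BBCS       BBCS cases        p ≤ 0.7                 70,783     0.658
UK2/BBCS       BBCS cases        p ≤ 0.4                 50,954     0.799
UK2/BBCS       BBCS cases        p ≤ 0.1                 18,675     0.822
UK2/BBCS       BBCS cases        p ≤ 0.05                10,751     0.967
UK2/BBCS       BBCS cases        p ≤ 0.01                2,853      0.361
UK2/BBCS       BBCS cases        p ≤ 0.001               377        0.208
COGS           BBCS cases        p ≤ 1                   41,651     0.020
COGS           BBCS cases        p ≤ 0.7                 34,575     0.016
COGS           BBCS cases        p ≤ 0.4                 24,590     0.042
COGS           BBCS cases        p ≤ 0.1                 9,597      0.427
COGS           BBCS cases        p ≤ 0.05                5,833      0.527
COGS           BBCS cases        p ≤ 0.01                1,962      0.959
COGS           BBCS cases        p ≤ 0.001               529        0.131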
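Table 5-2: BMI
Training set   Replication set   SNP p-value threshold   No. SNPs   Association p-value
UK2/BBCS       BBCS cases        p ≤ 1                   82,823     0.153
UK2/BBCS       BBCS cases        p ≤ 0.7                 70,783     0.190
UK2/BBCS       BBCS cases        p ≤ 0.4                 50,954     0.211
UK2/BBCS       BBCS cases        p ≤ 0.1                 18,675     0.200
UK2/BBCS       BBCS cases        p ≤ 0.05                10,751     0.494
UK2/BBCS       BBCS cases        p ≤ 0.01                2,853      0.774
UK2/BBCS       BBCS cases        p ≤ 0.001               377        0.040
COGS           BBCS cases        p ≤ 1                   41,651     0.838
COGS           BBCS cases        p ≤ 0.7                 34,575     0.828
COGS           BBCS cases        p ≤ 0.4                 24,590     0.795
COGS           BBCS cases        p ≤ 0.1                 9,597      0.130
COGS           BBCS cases        p ≤ 0.05                5,833      0.169
COGS           BBCS cases        p ≤ 0.01                1,962      0.287
COGS           BBCS cases        p ≤ 0.001               529        0.309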