around 10 mutations per site. (B) Distribution of mutation frequency classified by the directionality of the CTCF motif shows the highest mutation rates for those involved in convergent interactions. The density plots were created by
4. Studying the mechanisms of histone mark propagation along DNA-loops
4.2.3 Proposing possible mechanisms in which 3D-structure conveys functionality
We next studied the mechanism by which the effect of a distal-QTL could propagate to the other end of a loop and produce a change in a molecular phenotype, such as an hQTL. We propose two models, previously described in our review (Ruiz-Velasco and Zaugg, 2017), that could begin to explain this propagation of signal (Figure 4.3A):
(i) “touch-and-act model”: the variant would exert its effect only through the physical contact at the loop anchors, without altering the genomic factors inside the DNA-loop.
(ii) “spreading model”: the complete loop would show an altered function, and therefore a coordinated activation or repression of the whole local neighbourhood.
We tested these models by examining the allelic fraction of intermediate molecular phenotypes along the loops. If the allelic fraction showed a preference for one allele at both boundaries, but not in the inside region, this would support the touch-and-act model; if the allelic fraction showed a constant preference along the loop, this would support the spreading model, which predicts a coordinated activation/repression through the entire domain (Figure 4.3B). Interactions in which one anchor showed the opposite allelic preference to the other end cannot be used to test the models, as the allele-specific events are not linked in the same direction.
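The decision rule described above can be sketched in a few lines of Python. This is an illustration only and not the code used in this study: the function name `classify_loop`, the definition of an anchor as the outer 10% of the loop, and the bias threshold of 0.1 around an allelic fraction of 0.5 are all assumptions made for the example.

```python
from statistics import mean

def classify_loop(events, anchor_width=10.0, min_bias=0.1):
    """Assign one loop to 'spreading', 'touch-and-act' or 'not-linked'.

    events: list of (position_percent, allelic_fraction) pairs, with the
    position running from 0 to 100 along the loop and the fraction in [0, 1].
    anchor_width and min_bias are hypothetical thresholds for illustration.
    """
    # Express every allelic fraction as a signed deviation from balance (0.5).
    left = [af - 0.5 for pos, af in events if pos <= anchor_width]
    right = [af - 0.5 for pos, af in events if pos >= 100.0 - anchor_width]
    inside = [af - 0.5 for pos, af in events
              if anchor_width < pos < 100.0 - anchor_width]

    # Both anchors need at least one event with a clear allelic bias.
    if not left or not right:
        return "untestable"
    left_bias, right_bias = mean(left), mean(right)
    if abs(left_bias) < min_bias or abs(right_bias) < min_bias:
        return "untestable"

    # Anchors favouring opposite alleles: the events are not linked.
    if left_bias * right_bias < 0:
        return "not-linked"

    # Anchors agree; check whether the inside region shares the same bias.
    inside_bias = mean(inside) if inside else 0.0
    if inside_bias * left_bias > 0 and abs(inside_bias) >= min_bias:
        return "spreading"
    return "touch-and-act"
```

For example, a loop with fractions of 0.8 at both anchors and about 0.5 in between would be called touch-and-act, whereas the same anchors with inside fractions around 0.7 would be called spreading.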
Briefly, we followed the same approach as for defining asCTCF: we first identified all heterozygous SNPs per individual, overlapped them with the ChIP-seq peaks (SA1, H3K4me1, H3K4me3, and H3K27ac), and extracted the read counts surrounding each SNP (+/- 250 bp) using SNPhood to calculate the allelic fraction and adjusted p-value (Methods: 4.4.3). We retrieved on average 240,000 unique peaks per individual containing a heterozygous SNP, all of which were pooled and classified according to the loop regions (Figure S4.3A,B). By keeping those loops with allele-specific factors at both anchors and at least ten allele-specific events in total, we obtained 487, 284, and 1909 loops for HiC, H3K4me3, and Rad21 ChIA-PET, respectively. We then normalised the loop length to percentages to visualise all the interactions together and interpolated the data to fill in the blanks (Figure 4.3C).
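The normalisation and interpolation step can be illustrated with a short Python sketch. It is not the code used in this study: the helper names `to_percent` and `interpolate_profile` are hypothetical, and linear interpolation with flat extension beyond the outermost events is an assumption, since the interpolation scheme is not specified here.

```python
from bisect import bisect_right

def to_percent(position, loop_start, loop_end):
    """Convert a genomic coordinate into a percentage of the loop length."""
    return 100.0 * (position - loop_start) / (loop_end - loop_start)

def interpolate_profile(events, n_bins=101):
    """Linearly interpolate allelic fractions onto an even 0..100% grid.

    events: non-empty list of (position_percent, allelic_fraction) pairs.
    Grid points outside the observed range take the nearest observed value.
    """
    points = sorted(events)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    profile = []
    for i in range(n_bins):
        g = 100.0 * i / (n_bins - 1)
        if g <= xs[0]:
            profile.append(ys[0])
        elif g >= xs[-1]:
            profile.append(ys[-1])
        else:
            # xs[j - 1] <= g < xs[j], so the segment has non-zero width.
            j = bisect_right(xs, g)
            x0, x1, y0, y1 = xs[j - 1], xs[j], ys[j - 1], ys[j]
            profile.append(y0 + (y1 - y0) * (g - x0) / (x1 - x0))
    return profile

# Toy loop spanning coordinates 1000-2000 with three allele-specific events.
toy_events = [(to_percent(p, 1000, 2000), af)
              for p, af in [(1000, 0.8), (1500, 0.5), (2000, 0.8)]]
toy_profile = interpolate_profile(toy_events, n_bins=5)
```

Placing every loop on the same percentage grid gives profiles of equal length, which is what allows loops of very different sizes to be stacked as rows of one heatmap.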
We visualised the proportion of loops that fell into each model with a heatmap, using k-means to cluster each dataset into 6 groups. Strikingly, we observed evidence for both models in all three datasets at similar proportions. For the HiC data, 35% of loops (clusters 5 & 6) followed the spreading model, 33% (clusters 1 & 4) the touch-and-act model, and 32% (clusters 2 & 3) were not-linked. The H3K4me3 data had 34% (clusters 1 & 2) for spreading, 34% (clusters 3 & 6) for touch-and-act, and 32% (clusters 4 & 5) for not-linked, while the Rad21 dataset had 33% (clusters 3 & 6), 33% (clusters 1 & 5), and 34% (clusters 2 & 4), respectively. We acknowledge that a major limitation at this point is that the analyses are mostly qualitative rather than quantitative, and that a larger dataset should be used to gain further insights into the prevalence of each model.
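To make the clustering step concrete, the following Python sketch runs a plain Lloyd-style k-means on interpolated loop profiles. It is an illustration under stated assumptions and not the implementation used in this study: the function names are hypothetical, the toy profiles are invented, and the deterministic farthest-point initialisation was chosen only so that the example is reproducible (standard implementations typically use random or k-means++ starts).

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_init(profiles, k):
    """Start from the first profile, then repeatedly add the profile
    that lies farthest from all centroids chosen so far."""
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        nxt = max(profiles, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def kmeans(profiles, k, max_iter=100):
    """Alternate assignment and centroid update until labels stop changing."""
    centroids = farthest_point_init(profiles, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in profiles]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Invented toy profiles (allelic fraction at 0, 25, 50, 75 and 100% of the loop).
toy_profiles = [
    [0.80, 0.80, 0.80, 0.80, 0.80],  # constant bias: spreading-like
    [0.80, 0.50, 0.50, 0.50, 0.80],  # bias at anchors only: touch-and-act-like
    [0.80, 0.65, 0.50, 0.35, 0.20],  # opposite anchors: not-linked-like
    [0.78, 0.80, 0.82, 0.80, 0.78],
    [0.82, 0.52, 0.50, 0.48, 0.80],
    [0.78, 0.62, 0.50, 0.38, 0.22],
]
toy_labels, toy_centroids = kmeans(toy_profiles, k=3)
```

The toy example uses three clusters for brevity; the analysis described above used six clusters per dataset, which were then grouped in pairs into the three categories by inspecting the heatmaps.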
Figure 4.3. Models proposing how functionality is conveyed along DNA-loops
(A) Schematic of the two proposed models for the transmission of signal from a SNP to a distal site. (B) Examples of what the two models look like for HiC interactions of one individual when the allelic fraction estimates (y-axis) are visualised along the loop-length [%] for heterozygous (grey) and allele-specific (red) factors. (C) Heatmaps grouped with k-means (6 clusters) provide evidence for the presence of both models (i=spreading; ii=touch-and-act; iii=not-linked) at different proportions for the 487, 284, and 1909 loops for HiC, H3K4me3, and Rad21 ChIA-PET, respectively. Colour key shows allelic fraction, where red indicates reads mapping preferentially to one allele, blue to the other allele, and yellow an equal (50%) contribution of reads from each allele. Panel A was taken from (Ruiz-Velasco and Zaugg, 2017), where we briefly mention the models; panels B and C were done by myself and are unpublished.
4.3 Discussion
Multiple studies in recent years have provided an extensive collection of chromatin conformation data and greatly increased our knowledge of chromatin 3D-organisation. However, our understanding of how such architecture influences downstream processes, and how structure translates into function, is still scarce and only beginning to emerge (Ruiz-Velasco and Zaugg, 2017). One open question is how genetic variants can affect distant sites through physical contacts, and whether such transmission of signal also produces changes in the surrounding intermediate molecular phenotypes.
In this study we took advantage of the prevalent inter-individual variation of intermediate molecular phenotypes in LCLs and tried to exploit it to gain mechanistic insights that explain distal-QTLs in the context of 3D-organisation. We defined loop-QTLs as those distal-QTL-genomic factor pairs with a validated physical interaction in either HiC or ChIA-PET. The finding that the outer loop boundaries, which most likely represent topological domain boundaries, were less likely to be loop-QTLs could be explained by the stronger binding of CTCF at these sites, whose lower SD implies that it is more stable and less variable. While we speculated that CTCF-dimer disruption could begin to explain the way in which distal-QTLs exert their function, we did not observe this to occur frequently, which is in line with the idea that these interactions should be the most robust in order to safeguard the regulatory domains along the genome.
On the other hand, we tested whether a loop-QTL could be explained by the disruption of a combinatorial binding of two TFs in the 3D-context. These analyses provided us with previously described and novel TF pairs, which were mostly involved in immune response and lymphocyte-leukocyte activation. We speculate that, even when the physical loop is not affected by the distal-QTL, a functional interaction could still be disrupted. However, further analyses, including extensive characterisation of the inside regions, will be needed to draw any additional conclusions in this regard.
We decided to test whether a distal-QTL would affect only those loci in close physical interaction or whether its effect would be propagated along the entire loop, which led us to propose two models: the “touch-and-act” and the “spreading” model for the cases just described, respectively. We used allelic biases in genomic factors distributed along loops to test both models and to estimate the proportions in which they occur. Interestingly, there is evidence for both of these mechanisms, which could either suggest that there are multiple mechanisms by which 3D structure conveys its function, depending on the combination of loci and regulatory factors (Ruiz-Velasco and Zaugg, 2017), or represent different maturation states during loop extrusion.
While this study is still at an early stage and further improvements and analyses can be done, we are excited about the prevalence of both models and the possibility of gaining functional insights into the mechanisms by which genetic variants affect the genome. Although a couple of studies with varying approaches have been published since we started our study, an advantage of our analyses is that we also use ChIA-PET data to test our models, which should capture more functional interactions given that it filters for specific proteins known to be present at loop anchors. We also plan to extend our study to the existing cohorts of LCLs to strengthen our results. Finally, we would also need to validate that such models apply only when the loop is present and not for peaks outside chromatin interactions.
4.4 Methods