1.4.2 DNA methylation
1.4.2.3 DNA methylation function
As proposed in 1975 [312, 155], DNAme has generally been thought of as linked to repression, silencing, and general inactivation of the genome. Indeed, many lines of evidence support this notion. DNAme is essential to genome stability and is found around centromeres, microsatellites, transposable elements, and other repetitive elements in mammalian genomes (reviewed in [305]). DNAme is necessary for X-chromosome inactivation in females, where one of the two copies of the X chromosome is silenced so that transcription occurs on only one copy for the majority of X-chromosome genes [212]. DNAme also serves in imprinting, where one parental copy of a gene is transcriptionally silenced (reviewed in [20]). Finally, there has long been incontrovertible evidence that DNAme in promoters silences gene expression (reviewed in [212]).
However, despite all of these observations, there is no established, comprehensive model for how DNAme mechanistically functions in mammalian genomes. One model envisages DNAme as a “locking” mechanism that helps maintain chromatin states rather than initiating chromatin remodelling [173]. For instance, in mouse, methylation of the Hprt gene occurs after X-chromosome inactivation and Hprt silencing, suggesting that DNAme is not the primary mechanism in X-chromosome transcriptional silencing [227]. However, DNAme can also attract TFs (reviewed in [445]), including methyl-CpG binding domain proteins, like MeCP2, which associate with repressor complexes that alter the surrounding chromatin structure, pointing to a role as a chromatin remodelling pioneer [212]. Still other data suggest that nucleosome histone modifications may make DNA differentially susceptible to methylation [174]. Together, these lines of evidence paint a picture of a complex, intertwined, and highly context-dependent relationship between DNAme, histone modifications, and other regulatory mechanisms.
Enabled by high-throughput technologies that assay methylomes genome-wide, the past decade of epigenetics research has helped establish and contextualize the diverse roles of DNAme. In somatic mammalian tissues, the majority (~70-90%) of CpG sites are methylated [212, 445]. CpG dinucleotides are globally depleted (~5x) from the human genome [34], likely because 5mC is prone to spontaneous deamination, which converts the methylated cytosine to a thymine residue [212].
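CpG depletion of the kind described above is conventionally expressed as an observed/expected (obs/exp) CpG ratio, so a ~5x depletion corresponds to a ratio of roughly 0.2. The following is a minimal sketch of that calculation, assuming the commonly used normalisation (CpG count × sequence length) / (C count × G count). The function name `cpg_obs_exp` and the toy sequences are illustrative assumptions and are not taken from the cited studies.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio for a single DNA strand.

    Assumed normalisation: (number of CpG * length) / (number of C * number of G).
    A ratio near 1 means CpG occurs as often as base composition predicts;
    a ratio well below 1 indicates CpG depletion.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")

    n_c = seq.count("C")
    n_g = seq.count("G")
    if n_c == 0 or n_g == 0:
        # No CpG is possible without both C and G.
        return 0.0

    # "CG" cannot overlap itself, so str.count gives the exact dinucleotide count.
    observed = seq.count("CG")
    expected = n_c * n_g / n
    return observed / expected


if __name__ == "__main__":
    # Toy sequences only; real analyses would scan genomic windows.
    print(cpg_obs_exp("CGCGCGCG"))  # CpG-dense toy sequence -> 2.0
    print(cpg_obs_exp("CCGG"))      # CpG at the composition-expected rate -> 1.0
    print(cpg_obs_exp("GGCCAATT"))  # C and G present but no CpG -> 0.0
```

Applied in sliding windows along a genome, the same ratio is one ingredient typically used to contrast CpG-poor bulk sequence with CpG-dense regions.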
An important exception to this global CpG methylation pattern is the presence of specific ~1 kb stretches of mostly unmethylated CG-dense regions called CpG islands (CGIs; reviewed in [77]). CGIs generally mark TSSs. In humans and in mice, ~50% occur in canonical promoters. The remaining “orphan CGIs” are split, with ~25% occurring in intragenic regions and ~25% occurring in intergenic regions. Despite the various genomic contexts of CGIs, nearly all show evidence of transcriptional initiation, perhaps because CGIs mark open, nucleosome-deficient chromatin and therefore do not require additional ATP-dependent chromatin remodelling complexes for nucleosome displacement [77]. Many of the orphan CGIs show transcription of ncRNA and exhibit tissue-specific activity. Of all CGIs, the intragenic CGIs show the greatest number of differences in DNAme across somatic tissues, which may be linked to alternative splicing [77].
In addition, the high CpG content of CGIs can recruit proteins that promote H3K4me3, an activating chromatin mark [385]. Conversely, the G+C richness of CGIs also attracts proteins associated with H3K27me3, a repressive mark [250]. Therefore, in specific cellular contexts like embryonic stem (ES) cells, many CGIs lie in a bivalent chromatin state. As differentiation occurs, these states flip, like a switch, into active or repressed [77, 205]. Because of the peculiar fact that ~70% of all promoters have a CGI [77], CGI promoters have been intensely studied. In this specific context, it is clear that DNAme of CGI promoters blocks TF binding and inhibits gene transcription [173, 77]. Because many early studies focused on this specific context (promoter CGIs), this observation has shaped the general perception that DNAme decreases gene expression [173].
However, as technologies have enabled the study of DNAme in other contexts, the intuition that DNAme always decreases gene expression has been clearly refuted [173]. For instance, DNAme in the context of gene bodies can be associated with increased levels of transcription, possibly even stimulating the elongation phase of transcription [173]. Building on this observation, there is a growing body of evidence that supports a regulatory role of gene body DNAme in transcript splicing (reviewed in [210]). Compared to introns, DNAme is more abundant in exons [210], and has been shown to be capable of directly causing alternative splicing [432]. However, the effects of DNAme appear to be context specific, sometimes promoting exon inclusion and other times promoting exclusion. In cases of strong splicing programs for constitutive exons, perhaps due to a strong splice motif, weaker DNAme effects on splicing are often suppressed. These observations have led to a model where DNAme functions as a “fine-tuning” mechanism for alternative splicing. Such a model is consistent with the fact that DNAme cannot be required for splicing, since other organisms that lack DNAme, like Drosophila melanogaster and Saccharomyces cerevisiae, have spliced genes [210]. Mechanistically, DNAme has been shown to affect splicing by altering the kinetics of Pol II elongation, for instance by creating “roadblocks” via CTCF or MeCP2 recruitment, as well as by attracting proteins which associate with splicing factors, such as HP1 [210]. Despite these clear cases, given the growing catalogue of TFs that read and write DNAme (reviewed in [445]), there are likely many more key TFs involved in DNAme-linked splicing that have yet to be discovered.
Even less characterised are intergenic regions, although initial evidence suggests that the methylation patterns in these regions are very important. In mouse, Stadler et al. [364] describe intergenic lowly methylated regions (LMRs) that are cell-type specific and overlap DHSs and enhancers. These regions were also strongly correlated with increased expression of nearby genes (i.e., methylation of LMRs reduced expression). In human cell lines, Charlet et al. [58] found that some H3K27ac peaks, a hallmark of active enhancers, coexist with DNAme in enhancers, but not in promoters. In cases where TCF4, a TF associated with enhancers and H3K27ac peaks, was bound within an H3K27ac peak, an abrupt decrease in DNAme was observed. Furthermore, genetic or pharmacological reduction of DNAme decreased H3K27ac, suggesting that DNAme is important to broader enhancer integrity, but that lack of DNAme is linked to TF binding within enhancers. It should be noted, however, that TF binding does not always coincide with decreased DNAme, and indeed some TFs may prefer DNAme when binding (reviewed in [445]). The important message from these studies is that DNAme shows complex patterns of functional importance in intergenic regions such as enhancers. Given the importance of enhancers in regulating tissue-specific gene expression and the general enrichment of GWAS loci in disease-relevant, tissue-specific enhancer states [287], such results may have important implications for disease and motivate the further study of DNAme in this context.
Collectively, these observations demonstrate that the role of DNAme is far more complex and nuanced than previously appreciated. Understanding the functional importance of DNAme will require the integration of multiple molecular traits, including gene expression, DNAme, and histone marks, across multiple tissues.