2.3.4 A subset of BL6-specific CTCF binding is tissue-shared

Although CTCF is known to be bound across multiple tissues, the number and location of its binding sites can vary among tissues[577]. We therefore investigated the association between subspecies-specificity and tissue-specificity by profiling CTCF binding in tissues other than the liver using publicly available data.

We used ENCODE CTCF ChIP-seq data for adult (8-week-old) male BL6 mice from 13 tissues: liver, lung, bone marrow, bone marrow macrophages, cortical plate, cerebellum, heart, kidney, thymus, spleen, olfactory bulb, small intestine and testis[451]. Since ENCODE did not perform ChIP-seq analysis on CAST samples, all tissue analyses were limited to BL6.

Figure 2.4: Almost 1000 BL6 subspecies-specific CTCF binding sites are shared among five tissues.

a, b and c UpSet plots of the liver-derived CTCF binding sites found across the 12 mouse ENCODE tissues for all sites (a), conserved sites (b) and BL6-specific sites (c). The number of sites bound in each combination of tissues is indicated on the y-axis of the top bar chart. The original plot was reduced to these 26 combinations, which represent only the highly tissue-shared and tissue-specific combinations. The rightmost bar on each UpSet plot (boxed) indicates the number of CTCF binding sites that were not found to be bound in any other ENCODE tissue library.

d Density plots of the association between CTCF binding strength in musculus-common/BL6-specific sites and occupancy conservation across 12 tissues. The plots display the frequency of shared CTCF binding across tissues. Diversity values on the x-axis were calculated using the Shannon Diversity Index from the p-value estimates of the peak calls of each category (see Methods). The red line shows the proportion of conserved CTCF occupancy within each bin of the Shannon index, calculated from the number of CTCF sites bound across tissues for each category separately.

e Bar plot of the proportion of CTCF binding sites bound in an ascending number of tissues, for conserved versus BL6-specific sites. The y-axis represents the cumulative percentage of binding sites found in at least the number of tissues given on the x-axis. The dashed grey line denotes the minimum number of tissues at which 50% of sites are shared.

f Overlap of CTCF binding from 13 ENCODE Project-derived data sets with our liver-specific BL6 data. The four tissues with the highest overlap are enclosed and used for further analysis. The number of peaks shared with each tissue is inset.

g Number of BL6-specific CTCF binding sites shared among subsets of the four selected tissues, plus the ENCODE liver as an added technical replicate.

Analysis of the ENCODE CTCF tissue libraries showed that at least 10.5% of all binding sites (almost 4000 sites) are bound in all ENCODE tissues, and over 1800 more have their occupancy conserved in a minimum of 11 tissues (Figure 2.4a). On the other hand, 17% of all CTCF sites appear to be liver-specific, with no shared occupancy in any other ENCODE tissue. Of all ENCODE tissues analysed, kidney appears to have the highest degree of tissue-shared binding with the liver, at more than 78% of sites, of which over 2000 are bound exclusively in these two tissues (Figure 2.4a). When we stratified these sites by their evolutionary origin, the patterns above were mirrored in the musculus-common set of CTCF binding sites; 98% (> 3900) of all CTCF sites bound in all 12 tissues were musculus-common (Figure 2.4b). The BL6-specific set of CTCF sites, by contrast, showed the opposite pattern. Slightly over 1% of all BL6-specific CTCF sites were bound in all 12 tissues, and 41% of BL6-specific sites (> 2800) were found only in the liver (Figure 2.4c). The kidney again appeared to be the tissue with which most occupancy is shared, although the proportion dropped from 85% in musculus-common sites to about 50% in BL6-specific sites (Figure 2.4c).
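The tallies behind an UpSet plot like the ones described above can be sketched in a few lines of Python. This is only an illustration of the counting logic, not the pipeline used in this work: the function name, site identifiers and tissue calls are hypothetical, and each liver site is assumed to come with the set of other tissues in which it was also called as bound.

```python
from collections import Counter

def combination_counts(site_tissues):
    """Count sites per exact set of tissues in which they are bound.

    site_tissues maps a site ID to the set of tissues (other than liver)
    in which that liver-derived site is also bound.
    """
    return Counter(frozenset(tissues) for tissues in site_tissues.values())

# Hypothetical toy input; site IDs and tissue calls are made up.
toy_sites = {
    "site1": {"kidney", "lung", "heart"},
    "site2": {"kidney"},
    "site3": set(),   # liver-only: bound in no other tissue
    "site4": {"kidney"},
    "site5": set(),
}

counts = combination_counts(toy_sites)

# Fraction of sites with no shared occupancy (the boxed bar in an UpSet plot).
liver_only_fraction = counts[frozenset()] / len(toy_sites)

# Fraction of sites shared with kidney, in any combination of tissues.
kidney_shared_fraction = sum(
    n for combo, n in counts.items() if "kidney" in combo
) / len(toy_sites)
```

Each key of `counts` corresponds to one column of the UpSet plot, and the empty set corresponds to the liver-only category.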

In addition to looking at shared CTCF occupancy in other tissues, we evaluated the strength of CTCF binding sites in all 12 mouse ENCODE tissues, using the Shannon Index[578] as a measure of the diversity of evolutionarily variable CTCF binding on the basis of its abundance and conservation. The expectation is that tissue-shared sites are more likely to be involved in regulatory functions and are therefore under increased selective pressure that keeps their levels of shared binding high across tissues[399].

The results confirmed the findings above: high Shannon index values across tissues correlated with a high degree of CTCF occupancy conservation (Figure 2.4d, rightmost panel). This was more pronounced in musculus-common sites, which showed higher density at higher values of the diversity index (Figure 2.4d, leftmost panel).

The high Shannon index values are distributed smoothly in a bimodal trend, with the bottom five tissues from Figure 2.4a and b clustering towards the lower range of the diversity index and the top five tissues occupying the cluster at the higher end of the curve. The diversity curve for the BL6-specific CTCF sites was, however, markedly different. The calculated index values were much lower and spread over a wider range, flattening the distribution. This is a further sign that subspecies-specific CTCF binding is predominantly tissue-specific and that its occupancy in other tissues is far more restricted.
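For reference, the Shannon index of a single site across tissues can be sketched as below. The exact inputs and normalisation are described in the Methods and are not reproduced here; this sketch assumes, purely for illustration, one non-negative signal value per tissue (for example a transformed peak-call p-value) that is normalised to proportions before computing H = -Σ p_i ln p_i. The function name and values are hypothetical.

```python
import math

def shannon_index(signals):
    """Shannon diversity H = -sum(p_i * ln p_i) over tissues.

    signals: non-negative per-tissue values for one binding site.
    Tissues with zero signal contribute nothing to the sum.
    """
    total = sum(signals)
    if total <= 0:
        return 0.0
    h = 0.0
    for s in signals:
        if s > 0:
            p = s / total
            h -= p * math.log(p)
    return h

# Hypothetical per-tissue signals for two sites across 12 tissues.
shared_site = [10.0] * 12             # equally strong in every tissue
specific_site = [10.0] + [0.0] * 11   # signal in a single tissue only

h_shared = shannon_index(shared_site)      # maximal for 12 tissues: ln(12)
h_specific = shannon_index(specific_site)  # minimal: 0
```

Under these assumptions a site bound evenly in all tissues reaches the maximum value ln(12), while a site bound in one tissue scores 0, which is why a flatter, lower distribution is consistent with more tissue-restricted binding.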

In light of the analyses performed above, we explored the possibility of finding a subset of BL6-specific CTCF binding sites with elevated levels of tissue-sharedness.

We theorised that the spread of subspecies-specific CTCF binding into more tissues could be a precursor to these sites adopting functional roles. We first looked at how many of these sites are found in progressively more tissues (Figure 2.4e). Analysis of ENCODE tissue data showed that whilst a minimum of 50% of all musculus-common CTCF sites are found in at least 6 tissues, the same proportion of BL6-specific sites can be found in only one other tissue. The analysis suggested, however, that 16% of subspecies-specific sites are shared in a minimum of 5 tissues (Figure 2.4e).
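The cumulative comparison in Figure 2.4e can be illustrated with a short sketch that, for each minimum number of tissues k, computes the fraction of sites bound in at least k tissues, and then finds the largest k still covering half of the sites. The function names and the toy counts are hypothetical and are not taken from the data analysed here.

```python
def fraction_in_at_least(tissue_counts, n_tissues):
    """Return {k: fraction of sites bound in >= k tissues} for k = 1..n_tissues."""
    if not tissue_counts:
        raise ValueError("tissue_counts must not be empty")
    total = len(tissue_counts)
    return {
        k: sum(1 for c in tissue_counts if c >= k) / total
        for k in range(1, n_tissues + 1)
    }

def max_tissues_for_share(fractions, share=0.5):
    """Largest k at which at least `share` of sites are bound in >= k tissues."""
    qualifying = [k for k, f in fractions.items() if f >= share]
    return max(qualifying, default=0)

# Hypothetical number of additional tissues in which each site is bound.
common_like = [12, 11, 8, 6, 6, 3]
specific_like = [0, 0, 1, 1, 2, 5]

common_k = max_tissues_for_share(fraction_in_at_least(common_like, 12))
specific_k = max_tissues_for_share(fraction_in_at_least(specific_like, 12))
```

In this toy example half of the "common-like" sites are bound in at least 8 tissues, whereas half of the "specific-like" sites reach only 1 tissue, mirroring the kind of contrast marked by the dashed line in the figure.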

We identified the top five ENCODE tissues by the number of CTCF binding sites that co-occur with our liver BL6 ChIP-seq datasets for further analysis. As expected, ENCODE liver and kidney have the most overlap with our liver datasets (Figure 2.4f). Of the BL6 binding sites we identified as musculus-common, 67-85% are shared in these five tissues, and 26-49% of the BL6-specific sites we identified in liver are also bound in these five ENCODE tissues. The analysis only used CTCF binding sites that were retrieved from at least two ENCODE biological replicates, making the number of estimated binding sites, especially those that are BL6-specific,

Focusing only on the five ENCODE tissues most similar to liver in CTCF binding profile (Figure 2.4f, enclosed), we were able to identify a subset of subspecies-specific CTCF sites that are bound in all five tissues. There were 912 CTCF sites found in our data and shared with ENCODE liver, kidney, heart, lung and cortical plate (Figure 2.4g), spread over all mouse chromosomes. These sites constitute only 13% of all of our BL6-specific CTCF sites, but we hypothesise that shared binding in these different tissues may indicate an increased involvement in genomic functions compared with their tissue-variable counterparts.
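The selection of sites bound in every one of a set of tissues can be sketched as an interval-overlap test. The tool and the overlap criterion used for the analysis above are not stated in this section, so the following is only a minimal illustration that assumes half-open (chromosome, start, end) coordinates and a 1 bp minimum overlap; the function names and coordinates are hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def shared_in_all(query_sites, tissue_peaks):
    """Return the query sites that overlap at least one peak in every tissue."""
    return [
        site
        for site in query_sites
        if all(
            any(overlaps(site, peak) for peak in peaks)
            for peaks in tissue_peaks.values()
        )
    ]

# Hypothetical liver sites and peak calls for two other tissues.
liver_sites = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
tissue_peaks = {
    "kidney": [("chr1", 150, 250), ("chr1", 550, 650)],
    "heart": [("chr1", 190, 300), ("chr2", 100, 200)],
}

shared = shared_in_all(liver_sites, tissue_peaks)
```

The nested loops are quadratic and only suitable for toy inputs; genome-scale peak sets would normally be intersected with a dedicated interval tool or a sorted/indexed data structure.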

2.3.5 Tandem duplication event of BL6-specific CTCF
