We had analyzed different data sets of yeast nucleosome positions. These data sets were from yeast cells. Our next analysis was to examine in vitro nucleosomes for a yeast gene within a vector. Our goal was to determine if the nucleosomes would be arranged around the gene similar to the in vivo results we had determined. The results determined are for the gene region alone and the entire vector region including the gene.
The results obtained show the WW/SS distribution of the 4 nucleosome types is similar to the distribution for the Clark nucleosome series from section 4.1. This result tells us that the distribution of the canonical and anti-pattern nucleosomes is conserved for both in vitro and in vivo and not an artifact of the in vivo data.
Overall, we have shown that the nucleosome patterns found in yeast cells are neither a factor of the technique used to determine the sequences nor a factor of the nature of the cellular material (in vitro/in vivo). The final step in our analysis was to examine another cell type. We looked to human nucleosomes to determine if the yeast was a good model for the human data.
4.4 Gaffney Human Nucleosome Data Series
We selected a human nucleosome data set for lymphoblastoid cell lines. The cell lines are from seven Yoruba individuals, the data was obtained using MNase digestion. The
sequences were aligned to the HG18 genome. We obtained the sequences based on the sequence information from the data set. Figure 4.34 has the WW/SS frequencies for the human nucleosome sequences and Table 4.4 has the percent distribution of the
Figure 4.34: Gaffney Nucleosome Series Nucleotide Frequency (top line – WW frequency; bottom line – SS frequency)
Table 4.4: Gaffney Series Nucleosome Type Distribution
Nucleosome Type Type 1 Type 2 Type 3 Type 4 Frequency (%) 40.88% 21.71% 12.60% 24.80%
Nucleosome Count 1430131373 759450462 440719484 867643686
The total counts of nucleosome shows an increase in types 2 and 4 compared to the yeast data. Most likely this is due to the increased [C+G] concentration of the human genome compared to yeast.
Figure 4.35: Frequency Distribution of [C+G] Concentration within Gaffney Series
Figure 4.36: Gaffney Series Nucleosome Type Distribution for [C+G] Frequency Levels Figures 4.35 and 4.36 show the type distribution for each [C+G] content amount and the distribution of each individual nucleosome type. These figures agree with the previous results in that they show a shift towards the increased [C+G] concentration of the human genome. The crossover from a higher amount of type 1 to a higher amount of type 4 happens in the 60-70% [C+G] range rather than the 40-50% range seen in yeast. The individual nucleosomes also have an increase in peak concentration to 40-50% [C+G]
from 30-40% in yeast.
Figure 4.37: Average Gaffney Series Nucleosome Occurrence for Human Gene TSS
The comparison of nucleosome occupancy around gene transcription start sites is shown in Figure 4.37. All nucleosome types are relatively close to the results obtained for yeast.
The graphs are much smoother because the total number of genes analyzed for the human data was 21,000 compared to 4300 in yeast. The type 1 and type 3 normalized occurrence levels are slightly lower than the type 2 and type 4 levels. The type 2 and type 4
nucleosomes correspond to a higher amount of strong dinucleotides in the minor groove compared to the major groove. This could be the result of increased [C+G] concentration overall compared to the yeast genome. The interesting aspect of this figure is that the different nucleosome types have very similar curve shapes with a shift in the relative position. In the yeast data, the curves differed in the magnitude of the valley directly upstream of the transcription start site and the peak following, but there was some overlap in the rise and fall of the valley/peak. We see a complete separation of lines for the different nucleosome types. Overall, the general shape of the nucleosome occupancy plot is conserved for the human data set.
Figure 4.38: Nucleosome Type Distribution for Gaffney Series Nsm+ and Nsm- Values The adjacent nucleosomes to the transcription start sites showed a change in the human data from the yeast data. Figure 4.38 gives the nucleosome type distribution for
nucleosomes adjacent to the gene transcription start sites. The upstream nucleosomes (nsms-1 to nsms-6) are fairly consistent for nucleosome type frequency and complement the yeast data results. The nucleosomes within the region of the gene (nsms+1 to nsms+6) show a distinct trend different from the yeast data. The type 4 nucleosome frequency increases almost 10% from nsms-1 to nsms+1 while the type 1 nucleosomes decrease in frequency. These patterns continue for the other downstream nucleosomes although the magnitude of the frequency change from the upstream nucleosome frequencies is not as high. The genes were grouped with expression data obtained from [Valouev et al.
2011]23. We created four groups of genes, Table 4.5, based on the expression levels.
Table 4.5: Human Gene Groups by Expression Level
Human Gene Group # Frequency Range Number of Genes
1 0.001-‐0.1 4893
2 0.1-‐1.0 3451
3 1.0-‐8.0 6277
4 8+ 6381
The nucleosome occupancy data and adjacent nucleosome data was split into the appropriate gene groups similar to the grouping done with the yeast data. Figures 4.39 and 4.40 show the nucleosome occupancy data comparison between types 1 and 4 and
Figure 4.39: Average Gaffney Series Nucleosome Occurrence for Type 1 and Type 4 Nucleosomes around Human Gene TSS Grouped By Expression Level
between nucleosome types X and Y for the Gaffney data set. The data shows a similar shift in baseline similar to Figure 4.37 between the two graphs for all groups. The
difference between the individual groups shows an interesting trend. As the groups increase in gene expression level, group 1 being the lowest expression and group 4 being the highest expression level, we see an increase in the normal features surrounding the transcription start site. The normal valley and peak are not pronounced for any of the low expression level groups. In humans, the particular genes that are expressed in a cell are dependent on the cell type. It makes sense that low expression genes would have a more uniform nucleosome occupancy. Yeast does not have differentiated cell types so all genes can potentially be expressed requiring a specific nucleosome pattern. This pattern is observed in the more expressed genes in human, specifically groups 3 and 4.
Figure 4.40: Average Gaffney Series Nucleosome Occurrence for Type X and Type Y Nucleosomes around Human Gene TSS Grouped By Expression Level
Figures 4.41 – 4.43 give the adjacent nucleosome data for nucleosome type 1, type 4 and type Y. Each figure shows the data for a different nucleosome type and gives the change in nucleosome type frequency between the four gene groups. The χ2 analysis results are shown in Table 4.6. The p-values are not as straight forward for the human data as compared to the yeast data. The type 1 nucleosomes within the gene region show significance, which correlates to the drop in type 1 frequency seen in Figure 4.40. The other nucleosome types are not significant, but the resulting p-values are much lower than their yeast counterparts. Looking at the type 1 nucleosome frequency in Figure 4.41 we can see that the nsms+1 frequency is consistently lower than in the other downstream
Table 4.6: X2 Results for Gaffney Human Series Nucleosome Adjacent to Transcription Start Sites Grouped by Gene Expression Levels
nsms type p-‐value type 1 nsms+ 0.008 type 1 nsms-‐ 0.63 type 4 nsms+ 0.42 type 4 nsms -‐ 0.33 type Y nsms+ 0.46 type Y nsms-‐ 0.88
Figure 4.41: Nucleosome Type Distribution for Gaffney Series for Type 1 Nucleosomes Adjacent to Gene TSS
Figure 4.42: Nucleosome Type Distribution for Gaffney Series for Type 4 Nucleosomes Adjacent to Gene TSS
Figure 4.43: Nucleosome Type Distribution for Gaffney Series for Type Y Nucleosomes
nucleosomes. This trend is also shown in nsms+2, although there is not as extreme a difference from the nucleosomes more downstream. The remaining nucleosomes
(nsms+3 – nsms+6) show a difference in frequency for group 1, but that difference is not present for the remaining groups with higher expression levels. There seems to be a distinction in nucleosome type frequency for the low expression level genes that is not shown for the higher expression level genes. This agrees with the occupancy data where we saw a more uniform occupancy of the nucleosomes with low expression levels compared to the groups with genes having higher expression levels.
The final step in analyzing the Gaffney human data set was to determine the auto-correlation values for the different nucleosome types shown in Figure 4.44.
Figure 4.44: Nucleosome Distance Auto-Correlation Function for Gaffney Series Nucleosomes
The results of the auto-correlation analysis show an interesting trend in the data. The four graphs retain the basic trends from the yeast data. The bottom 5% of genes have a higher number of nucleosomes surrounding their transcription start sites than the top 5% of genes and the peaks in the data correspond to the nucleosome length separated by 20-bp
of linker DNA. The difference between the data sets is that the human baseline starts much higher than yeast’s baseline. This means that there is an overall increase in the number of nucleosomes surrounding the gene’s transcription start site. The Gaffney series contains almost 3.5 billion nucleosomes so the relative number of nucleosomes to the size of the genome is larger than the yeast data. The Clark series from section 4.1 contained fewer than 800,000 nucleosomes for a genome with 12 million nucleotides. The Gaffney set contains a greater ratio of nucleosomes to genome size so the overall auto-correlation distance function would have higher results. The important feature of Figure 4.44 is that the graphs retain the peaks corresponding to the lengths of the nucleosomes and the linker DNA separating them.
5.0 Conclusion
Five series of nucleosomes from yeast and human cells have been analyzed with reasonable results that statistically represent the collection of nucleosomes. The spread of nucleosome types is consistent for each yeast data set and the nucleosome occurrence results, as they relate to transcription sites, support previous work. We have shown that the nucleosome distribution is consistent for different sequence acquisition techniques, MNase and chemical cleavage, as well as for in vivo and in vitro data sets. We observed a shift in nucleosome distribution when we compared the yeast results to a human
nucleosome data set that corresponds to the increase [C+G] concentration of the human genome. The human data shows an increased distinction in the nucleosome occupancy surrounding gene transcription start sites for genes of low and high expression level.
We have shown that nucleosome position within the genome and surrounding gene transcription start sites is conserved for the different sets analyzed. The human data showed some interesting trends for the occupancy of low expression level genes and for the downstream nucleosome type frequency adjacent to the transcription start site.
6.0 Future Work
The planned future work of this project is to expand our analysis of yeast genes and interesting human nucleosome results. First, we want to analyze RNAseq data for yeast genes. This will ensure our transcription frequency data for yeast genes is comparable to the RNAseq data used to analyze the human genes. Other future work involves investigating the distribution of nucleosome occupancy for human nucleosomes surrounding the gene transcription start sites. Finally, we will investigate the shift in neighboring nucleosomes within genes the human data exhibited compared to the yeast data. Overall, we look to further examine the differences between the yeast and human nucleosome results obtained in this study.
References
1McDonald D, Milestone 9, (1973-1974) The nucleosome hypothesis: An alternative string theory, Nature Milestones: Gene Expression. Dec 2005
http://www.nature.com/milestones/geneexpression/milestones/articles/milegene09.html
2Hewish DR, Burgoyne LA. Chromatin sub-structure. The digestion of chromatin DNA at regularly spaced sites by a nuclear deoxyribonuclease. Biochem. Biophys. Res.
Commun. 52:504-10, 1973.
3Olins AL, Olins DE. Spheroid Chromatin Units (v bodies). Science, 1974 Jan 25;
183(4122):330-2.
4Luger K, Davey, Mader AW, Richmond RK, Sargent DF, Richmond TJ; Crystal Structure of the Nucleosome Core Particle at 2.8 A Resolution, Nature, 1997 Sept 18;
389(6648): 251-60.
5Kornberg, RD Chromatin structure: a repeating unit of histones and DNA.Science 184;
1974, 868-71.
6JM Harp, Hanson BL, DE Timm, GJ Bunick; X-RAY Structure of the Nucleosome Core Particle at 2.5 A Resolution; www.rcsb.org/ PDB 1EQZ
7UCSF Chimera – a visulaization system for exploratory research and analysis Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE. J Comput Chem. 2004 Oct;25(13):1605-12. http://www.cgl.ucsf.edu/chimera
8Jansen A, Verstrepen KJ; Nucleosome Positioning in Saccharomyces cerevisiae.
Microbiol. Mol.Biol. Rev. 2011, 75(2):301.
9Talbert PB, Henikoff S; Histone variants – ancient wrap artists of the epigenome. Nat.
Rev. Mol. Biol. 11: 2010, 264-75.
10Bartova E, Krejci J, Harnicarova A, Galiova G, Kozubek S; Histone Modifications and Nuclear Architecture: A Review. J Histochem Cytochem. 2008 August; 56(8): 711-21.
11http://www.mun.ca/biology/desmid/brian/BIOL2060/BIOL2060-23/23_11.jpg
12Tolkunov D, Morozov AV; Genomic Studies and Computational Predictions of Nucleosome Positions and Formation Energies. Advances in Protein Chemistry and Structural Biology. 79. 2010 1-57
13Cui F, Zhurkin V; Structure-based Analysis of DNA Sequence Patterns Guiding
Nucleosome Positioning in vitro. Journal of Biomolecular Structure and Dynamics 27(6) 2010, 821-41
14Satchwell SC, Drew HR, Travers AA; Sequence periodicities in chicken nucleosome core DNA. Journal of Molecular Biology 191(4) 1986, 659-75
15http://hgdownload.cse.ucsc.edu/downloads.html#yeast
16Cole HA, Howard BH, Clark DJ; Activation-induced disruption of nucleosome position clusters on the coding regions of Gcn4-dependant genes extends into neighboring genes.
Nucleic Acids Res., 39, 9521-35.
17Brogaard K, Xi L, Wang JP, Widom J; A map of the nucleosome positions in yeast at base-pair resolution. Nature 486, 28 June 2012, 496-501
18Mavrich TN, Ioshikhes IP, Venters BJ, Jiang C, Tomsho LP, Qi J, Schuster SC, Albert I, Pugh BF; A barrier nucleosome model for statistical positioning of nucleosomes throughout the yeast genome. Genome Res. 18, 1073-83.
19Holstege FCP, Jennings EG, Wyrich JJ, Lee TI, Hengartner CJ, Green MR, Golub TR, Lander ES, Young RA; Dissecting the Regulatory Circuitry of a Eukaryotic Genome.
Cell 95 1998 pp. 717-28.
20 Clark et al. personal communication. Yeast ARG1 data unpublished
21http://hgdownload.cse.ucsc.edu/goldenPath/hg18/chromosomes/
22Gaffney DJ, mcVicker G, Pai AA, Fondufe-Mittendorf YN, Lewellen N, Michelini K, Widom J, Gilad Y, Pritchard JK; Controls of Nucleosome Positioning in the Human Genome. PLOS Genetics 8(11) 2012
23http://eqtl.uchicago.edu
24Valouev A, Johnson SM, Boyd SD, Smith CL, Fire AZ, Sidow A; Determinants of nucleosome organization in primary human cells. Nature 474 2011 pp. 516-22.
25Cui F, Cole HA, Clark D, Zhurkin VB; Transcriptional activation of yeast genes disrupts intragenic nucleosome phasing. Nucleaic Acids Research 40(21) 2012 10753-64.