Based on these observations, the objectives of this research were to fully characterize the unique 5’ regions of the PXR 1 and PXR 2 transcripts and to identify their distinct promoter regions. The majority of available sequence information was produced via high-throughput methodologies and cDNA library screenings. We sought to characterize the PXR transcripts directly from human liver tissue. In this study, we identified novel transcription start sites (TSSs) for PXR 1 and PXR 2 by classical molecular methods and characterized each sample by its PXR isoform profile. Additionally, reporter assays using a CYP3A4 promoter construct demonstrated that PXR 2 has transcriptional activity comparable to that of PXR 1. Together, these data illustrate a need for better understanding of PXR regulation, isoform expression and the cumulative effect on the regulation of xenobiotic metabolism.
Tri-methylation of histone H3 lysine 4 (H3K4me3) is a major chromatin mark regulating gene transcription. It is mostly found around transcription start sites (TSSs) and is strongly associated with active transcription [2, 3]. Active chromatin marks such as H3K4me3 are typically restricted to narrow regions over specific functional genomic motifs, while repressive marks, such as H3K27me3, are often deposited over broader genomic regions. However, very broad peaks of H3K4me3 were recently identified in many cell types as marks that predicted cell identity. These broad peaks spanned up to 60 kb and differed markedly between cell types [6, 7]. It was further shown that the breadth of H3K4me3 regions was positively correlated with transcriptional consistency. The relationship between RNA polymerase pausing and transcriptional consistency may explain the regulatory role of these broad regions of H3K4me3. However, a mechanism linking them definitively has not yet been identified. Control of transcriptional noise may be permissive for cell fate decisions, and therefore, tight regulation of transcriptional consistency may be required for full commitment to a phenotype. The goal of this study was to identify distinctive patterns of H3K4me3 peak breadth within a narrower region around TSSs and determine if H3K4me3 breadth specifically at TSSs represented an independent variable of transcription regulation, a topic not previously investigated in diseases.
ABSTRACT Transcription start sites (TSSs) lying inside annotated genes, on the same or opposite strand, have been observed in diverse bacteria, but the function of these unexpected transcripts is unclear. Here, we use the metal-reducing bacterium Shewanella oneidensis MR-1 and its relatives to study the evolutionary conservation of unexpected TSSs. Using high-resolution tiling microarrays and 5′-end RNA sequencing, we identified 2,531 TSSs in S. oneidensis MR-1, of which 18% were located inside coding sequences (CDSs). Comparative transcriptome analysis with seven additional Shewanella species revealed that the majority (76%) of the TSSs within the upstream regions of annotated genes (gTSSs) were conserved. Thirty percent of the TSSs that were inside genes and on the sense strand (iTSSs) were also conserved. Sequence analysis around these iTSSs showed conserved promoter motifs, suggesting that many iTSSs are under purifying selection. Furthermore, conserved iTSSs are enriched for regulatory motifs, suggesting that they are regulated, and they tend to eliminate polar effects, which confirms that they are functional. In contrast, the transcription of antisense TSSs located inside CDSs (aTSSs) was significantly less likely to be conserved (22%). However, aTSSs whose transcription was conserved often have conserved promoter motifs and drive the expression of nearby genes. Overall, our findings demonstrate that some internal TSSs are conserved and drive protein expression despite their unusual locations, but the majority are not conserved and may reflect noisy initiation of transcription rather than a biological function.
The nucleosome containing histone H3.5 is unstable
We found that the H3.5 nucleosome is quite unstable, as compared to the H3.3 nucleosome in vitro (Fig. 1). Consistently, the mobility of H3.5 is remarkably faster than that of H3.3 in living cells (Fig. 5). In humans, the expression level of another histone H3 variant, H3T, is also high in the testis, but low in somatic cells [6, 8, 9]. The nucleosome containing H3T is quite unstable in vitro and in living cells. Nucleosome instability was also reported with a mouse testis-specific H2A variant, H2AL2. Therefore, instability may be a common characteristic of the testis-specific nucleosomes. The unstable nature of the H3.5 nucleosome may be suitable for further replacement with transition proteins and protamines. H3.5 incorporation may also regulate the transcription of the genes required during the early stages of spermatogenesis. However, H3.3 appears to be more relevant for regulating transcription during spermatogenesis, as its incorporation is correlated with the gene expression level. In contrast, H3.5 incorporation may function to transiently mark TSSs to assist in the replacement with H3.3, depending on gene expression. Histone acetylation may also play an important role in global and/or local
Other deletions narrowed the region required for PinifC3 transcription to an 86-nt interval. Transformants employing promoter fragments with 5′ endpoints 254 and 151 nt upstream of the transcription start site, but not one with the promoter fragment truncated to −65, produced GUS enzyme in cleaving sporangia (Fig. 2A and B). These transformants were also examined using RNA blotting, since enzyme levels may understate transcript accumulation during the rapid process of cleavage (45). In transformants utilizing either the −254 or −151 promoter deletions, strong mRNA induction was observed,
FIG. 1. Promoters of Phytophthora NIF genes. Shown (top to bottom) are alignments of portions of the promoters of the PinifC1, PinifC2, and PinifC3 genes of P. infestans, three putative homologues from P. sojae, and two from P. ramorum. The last five are extracted from sequences upstream of gene models estExt_fgenesh1_pg.C_260137, estExt_fgenesh1_pg.C_260134, estExt_fgenesh1_pg.C_260135, fgenesh1_pg.C_scaffold_35000061, and fgenesh1_pg.C_scaffold_35000062 from assembly 1.0 of their respective genome projects. Numbers in the left margin indicate distances from transcriptional and/or translational start points; only the latter are predicted for the P. sojae and P. ramorum genes. Black shading represents >70% identity among the eight genes, and gray shading indicates identity in P. infestans only. Indicated at the top of each panel for the P. infestans genes are approximate areas of predicted transcription start sites (TSP1 and TSP2), deletion endpoints used in functional studies of PinifC3, and the 7-nt cold box required for gene induction during zoosporogenesis. For PinifC1 and PinifC2, a majority of 5′ transcript termini appeared to be within TSP1, while for PinifC3 most were at TSP2.
In prostate tumor samples, analysis of XMRV integration sites also showed a preference for transcription start sites, CpG islands, and DNase-hypersensitive sites. Significantly, XMRV integration sites in tumors are commonly found within cancer breakpoints, within common fragile sites, and near miRNA genes, features that are frequently linked with human cancers. Cancer cytogenetics has been a powerful means to pinpoint the locations of cancer-initiating genes, and acquired chromosomal changes have now been reported to occur in more than 50,000 cases across all main cancer types (http://cgap.nci.nih.gov/Chromosomes/Mitelman) (50). Balanced chromosome rearrangements, particularly translocations, are strongly associated with distinct tumor entities and may represent an initial event in oncogenesis (68). The common fragile site is another cancer-associated genomic feature that is frequently altered in non-virus-associated tumors (35). Both cancer breakpoints and common fragile sites are preferential integration targets for vector DNA, hepatitis B virus, and various DNA viruses, including human papillomavirus, Epstein-Barr virus, simian virus 40, and adeno-associated virus. These integration events may contribute significantly to the development of various types of cancers by disrupting the normal activity of tumor suppressor genes or proto-oncogenes in the vicinity (35, 46, 66, 76, 77, 105). In the SCID-X1 gene-therapy trial wherein two patients received an MLV-derived vector and subsequently developed leukemia via activation of the LMO2 oncogene (37), the two integration sites targeted by the MLV-based vector reside within FRA11E, a common fragile site known to correlate with chromosomal breakpoints in tumors (7).
Since XMRV integration in DU145 cells does not display a bias for cancer breakpoints and common fragile sites, the high XMRV integration preference seen in tumor samples for genomic regions with the highest frequencies of cancer breakpoints and common fragile sites is striking and likely represents a selection process. The key question of whether these integrated proviruses are an indirect consequence of genomic instability initiated by other genetic lesions or perhaps have a direct role in prostate carcinogenesis awaits further investigations.
this developmentally-linked gene expression suggest that this “hard-wiring” is likely the result of globally-acting regulatory mechanisms, specifically, stage-specific variations in nucleosome positioning, processivity of the RNA polymerase II complex and stage-specific variations in the stability of the mRNA transcript [12-19]. Hypotheses that considered regulation of stage-specific gene expression exerted at the level of individual promoters, through specific transcription factor binding to cis-regulatory DNA motifs, fell out of favour in the early 2000s due to the apparent absence of transcription factors in the P. falciparum genome [1,20,21]. In 2008, however, a restricted number of specific transcription factors, sharing the apetala 2 (AP2) DNA binding motif, were found in P. falciparum, with homologues quickly identified throughout all apicomplexans, leading to their designation as ApiAP2 transcription factors [22-25]. ApiAP2 have subsequently been shown to be critical regulators of gene expression throughout the Plasmodium spp. life cycle as well as potentially playing a role in the monoallelic expression of the PfEMP1 virulence protein family through modulation of the local chromatin environment [26-31]. In 2010, using protein binding arrays, the cognate cis-acting DNA motifs for 24 of the 27 P. falciparum ApiAP2 were determined. Interestingly, these DNA motifs are widely distributed within intergenic regions, with many intergenic regions sharing multiple ApiAP2 binding sites. Whilst this multiplicity of ApiAP2 binding sites may represent the means for a model of multifactorial control (a point that will be picked up later), whether all predicted DNA binding sites actually act as cis-regulatory sites remains to be addressed. In the absence of well-defined transcription start sites for P.
falciparum, our inability to relate the position of a predicted ApiAP2 binding site to this key transcriptional landmark hampers our efforts to design functional studies to explore their role in the control of transcription initiation.
study may draw further attention to the risk of rAAV2 vector integration. First, large host chromosomal deletions at integration sites, which had been considered rare in our previous study (22), were found to be relatively common. In addition, translocations were occasionally found. These chromosomal changes may result in the disruption of genes; therefore, functional loss of cellular genes should be more carefully considered. Second, rAAV2 vectors preferentially integrated near gene regulatory sequences. This nature of rAAV2 integration may not disrupt genes but in some cases may allow proviruses to drive flanking cellular genes, similar to retroviral long terminal repeats. This is because unpredictable complex structures of rAAV2 proviruses with various deletions and rearrangements (22) may delete poly(A) sequences or place functional enhancer/promoter sequences next to open reading frames of flanking cellular genes. Third, cancer-related genes were found to be hit by rAAV2 integration at a frequency of 3.5%. A majority of these cancer-related genes have been identified in hematologic malignancies, and the significance of these in liver remains to be addressed. However, we need to keep in mind that one third of integrations occurred within ±1 kb of the transcription start sites and upstream of the coding sequences of cancer-related genes; therefore, not only loss of function but also gain of function of these genes may be possible.
By counting X-chromosomes in nuclear cycle 12, embryos engage their dosage compensation machinery in time to avoid an imbalance in X-linked gene products that would otherwise arise between the sexes during cycle 14 when genome-wide transcription begins (Gergen, 1987; Tracey et al., 2000). But such early chromosome counting demands that expression of the genes that communicate X-chromosome dose to Sxl must begin even before cycle 12. As Fig. 1 illustrates, the two strongest XSEs, sisA and sc, are among the earliest expressed genes (Erickson and Cline, 1993). The question of whether these sex-determination genes might have something in common that allows them to be expressed so early led us to the heptanucleotide motif, CAGGTAG. Three of the four XSEs in D. melanogaster (all but run) possess multiple copies of this sequence or its reverse complement within 500 bp of their transcription start sites (Erickson and Cline, 1998; Sefton et al.,
Many studies of specific loci have described differential enrichment of particular histone modifications between the two parental chromosomes at some ICRs and imprinted gene promoters or transcription start sites (TSSs). These marks include histone H3 acetylation, H4 acetylation, H3 dimethylation at lysine 4 (H3K4me2) and H3K4me3, which are found preferentially enriched on the unmethylated chromosome or normally active allele in comparison with its homologous counterpart, and H3K27me2, H3K27me3, H3K9me2 and H3K9me3 preferentially enriched on the methylated chromosome or inactive allele [5-42]. H4K20me3 has previously been shown to be preferentially enriched on the methylated chromosome of eight ICRs [6,20,28,35,36,38,40]. At non-ICR regions, limited experimental analysis has been undertaken to test for enrichment of this mark; no preferential enrichment was found for one imprinted cluster and data conflicted when different experimental approaches were used for one other gene. Two non-allele-specific studies have identified a higher proportion of coenrichment of active (H3K4me2/3) and repressive (DNA methylation and H3K9me3) marks at imprinted loci compared to other loci, verifying the ability to detect histone modifications preferentially enriched on one of the two chromosomes using microarray platforms [43,44]. Importantly, individual genes and individual ICRs show different combinations of enriched histone modifications, with cell-type specificity also apparent. Despite many studies having previously assessed
Circadian rhythms modulate the biology of many human tissues, including brain tissues, and are driven by a near 24-hour transcriptional feedback loop. These rhythms are paralleled by 24-hour rhythms of large portions of the transcriptome. The role of dynamic DNA methylation in influencing these rhythms is uncertain. While recent work in Neurospora suggests that dynamic site-specific circadian rhythms of DNA methylation may play a role in modulating the fungal molecular clock, such rhythms and their relationship to RNA expression have not, to our knowledge, been elucidated in mammalian tissues, including human brain tissues. We hypothesized that 24-hour rhythms of DNA methylation exist in the human brain, and play a role in driving 24-hour rhythms of RNA expression. We analyzed DNA methylation levels in post-mortem human dorsolateral prefrontal cortex samples from 738 subjects. We assessed for 24-hour rhythmicity of 420,132 DNA methylation sites throughout the genome by considering methylation levels as a function of clock time of death and parameterizing these data using cosine functions. We determined global statistical significance by permutation. We then related rhythms of DNA methylation with rhythms of RNA expression determined by RNA sequencing. We found evidence of significant 24-hour rhythmicity of DNA methylation. Regions near transcription start sites were enriched for high-amplitude rhythmic DNA methylation sites, which were in turn time locked to 24-hour rhythms of RNA expression of nearby genes, with the nadir of methylation preceding peak transcript expression by 1–3 hours. Weak ante-mortem rest-activity rhythms were associated with lower amplitude DNA methylation rhythms as were older age and the presence of Alzheimer’s disease.
These findings support the hypothesis that 24-hour rhythms of DNA methylation, particularly near transcription start sites, may play a role in driving 24-hour rhythms of gene expression in the human dorsolateral prefrontal cortex, and may be affected by age and Alzheimer’s disease.
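As an illustration of what parameterizing methylation levels with cosine functions and testing by permutation can look like for a single site, the following minimal sketch fits a fixed 24-hour cosine by ordinary least squares and estimates a permutation p-value for its amplitude. It is an assumption-laden example, not the authors' pipeline: the function names (`solve3`, `fit_cosinor`, `permutation_p_value`), the amplitude test statistic, and the per-site (rather than genome-wide) permutation scheme are all hypothetical.

```python
import math
import random


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        residual = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = residual / m[i][i]
    return x


def fit_cosinor(times_h, values, period_h=24.0):
    """Least-squares fit of y = mesor + b*cos(w*t) + c*sin(w*t).

    Returns (mesor, amplitude, acrophase_h), where acrophase_h is the
    clock time (in hours) at which the fitted curve peaks.
    """
    w = 2.0 * math.pi / period_h
    rows = [(1.0, math.cos(w * t), math.sin(w * t)) for t in times_h]
    # Normal equations: (A^T A) beta = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    mesor, b, c = solve3(ata, aty)
    amplitude = math.hypot(b, c)
    acrophase_h = (math.atan2(c, b) / w) % period_h
    return mesor, amplitude, acrophase_h


def permutation_p_value(times_h, values, n_perm=1000, seed=0):
    """Share of shuffled datasets whose fitted amplitude reaches the observed one.

    Assumption: amplitude is used as the test statistic for one site only.
    """
    rng = random.Random(seed)
    observed = fit_cosinor(times_h, values)[1]
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if fit_cosinor(times_h, shuffled)[1] >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Fitting the cosine and sine terms linearly avoids a nonlinear search over phase; amplitude and peak time are then recovered from the two coefficients. The nadir of the fitted curve lies half a period (12 hours) from the acrophase.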
The available information regarding the transcription of betapapillomavirus genes is limited due to the lack of suitable in vitro cell lines enabling the transcription of viral genes and the replication of the viral genome. Haller et al. provided the only report addressing HPV5 differentiation-dependent transcription and alternative splicing, identifying multiple HPV5 transcripts from EV patients via in situ hybridization (16). Each of the characterized transcripts was spliced at two major splice donor sites: one site was situated in the E6-proximal portion of the LCR region at nucleotide (nt) 4, and the other site was located downstream of the first ATG codon of E1 (nt 982). Furthermore, two major conserved splice acceptor sites were identified: one site was located in the first part of the E4 ORF, at nt 3322, and the other site was located upstream of the E2 ORF, at nt 2676. The early and late promoters have been mapped to transcription start sites (TSSs) at nt P175 and P7535 for HPV8 (17, 18). Two promoter regions have been implicated in the LCR of HPV5; however, the exact positions of these promoters are unknown and presumed to be similar to those observed in the closely related virus type HPV8 (16). The HPV5 LCR is only 478 bp long, compared with the >800-bp LCRs of previously identified alphaHPVs. In addition, the HPV5 LCR does not contain putative binding sites recognized by the transcription enhancer factor TEF-1 and the transcription factor SP-1, which are present in all alphaHPVs (10, 19). Mucosal HPVs exhibit stronger enhancers in the LCR than HPV5 and HPV8 (10), suggesting that different HPV LCRs also display different capacities to cooperate with cellular factors depending on the tissue types infected. The HPV5 early region is located in the 5′ region of the genome and encodes oncoproteins E6 and E7. HPV5 also encodes the viral replication factor and helicase E1 and the viral replication and transcription factor E2.
The viral oncoprotein E4 is expressed as a fusion protein with E1. The region between the early and late genes in alphaHPVs is long (300 to 500 bp) and contains the E5 gene; however, in betaHPVs, this region is less than 100 bp (80 bp in HPV5), and the E5 ORF is missing (20). The late region is located downstream of the early region and encodes the viral major and minor capsid proteins, L1 and L2, respectively (21).
Histone modifications, such as H3K4me and H3K36me, are major regulators of chromatin structure at the FLC locus and are essential for maintaining high levels of FLC transcription (He et al., 2004; Kim et al., 2005; Pien et al., 2008; Shafiq et al., 2014; Tamada et al., 2009; Xu et al., 2008; Zhao et al., 2005). Thus, we further investigated whether TOP1α-mediated changes in DNA topological structure affect H3K4 and H3K36 methylations at the FLC locus. By performing chromatin immunoprecipitation (ChIP) assays, we observed that H3K4me3 enrichment in the top1α mutant was not significantly different compared with that of the wild-type plant at the FLC locus (Fig. 1E). However, levels of H3K36me2 and H3K36me3 were dramatically reduced in top1α mutants, especially in the 500-1500 bp region downstream of the transcription start sites (TSSs), rather than evenly reduced in the entire gene body (Fig. 1F,G), suggesting that TOP1α is involved in regulating di- and tri-methylation levels of H3K36 at the FLC locus in the early phase of transcription elongation.
FIG. 1. KSHV ORF56 is a split gene that produces transcripts of low abundance after lytic induction. (A) Schematic diagrams of the KSHV ORF56 and ORF57 ORFs and their transcripts. The numbers above each ORF are the nucleotide positions of the start and termination codons in the KSHV genome (GenBank accession number U75698) (33). The heavy line with dashes on both ends represents the genomic region encompassing ORF56 and ORF57, with promoters (arrows, designated by their transcription start sites) and 3′-end processing signals [a poly(A) signal and a cleavage site (CS)] indicated. Below the heavy line are bicistronic ORF56/57 and monocistronic ORF57 pre-mRNAs that each contain two exons (boxes 1 and 2) and one intron (dashes between boxes). Primers (heavy arrows) used to characterize KSHV ORF56 and ORF57 transcripts are shown below the pre-mRNAs and are named by the locations of their 5′ ends. Antisense RNA probes a and b for RPA assays and the resulting RNA products protected from each probe are illustrated in the bottom of the diagram, with the sizes (nt) in parentheses. (B) The majority of KSHV ORF56 transcripts are bicistronic ORF56/57 transcripts that are spliced to remove the ORF57 intron. The primers Pr81856 and Pr82296 were used in an RT-PCR on DNase I-treated total RNA (8 μg) isolated from uninduced (0 h) or n-butyrate (NB)-induced (24 and 48 h) JSC-1 cells. Parallel reactions without reverse transcriptase were used as a control. Twenty nanograms of Bac36 DNA (48) was used as a KSHV DNA control. The lower panel shows the splicing junction identified by sequencing of the 333-bp PCR product. (C) RPA analysis of ORF56 and ORF57 transcripts. Twenty micrograms of total RNA from uninduced or butyrate (NB)-induced JSC-1 cells was hybridized with 4 ng of antisense 32P-labeled
Although data on the repression of MVSG expression in bloodstream and procyclic-form trypanosomes and on the stochastic nature of differential activation in the metacyclic form are available (11, 29, 35), we do not know how individual promoters function to recruit RNA polymerase, nor do we know anything about the molecular processes that mediate stage-specific regulation. The obvious difficulties in working with a nondividing cell type that cannot be obtained through existing in vitro culture techniques and the need to transmit trypanosomes through tsetse flies have meant that only indirect approaches have been possible. After repeated serial passaging of trypanosomes through laboratory mice, Donelson and coworkers were able to identify low numbers of bloodstream form cells that had activated MVSGs de novo. The transcription of two of the MVSGs was monocistronic (1, 20), and one population had transcriptionally activated a MES without detectable genomic rearrangements (1). Transcription start sites for these MVAT4 and MVAT7 genes were mapped to elements referred to as putative MVAT promoters that varied between 67 and 73 bp in length and that were able to drive reporter gene expression from plasmids in transient-transfection experiments with procyclic culture form parasites (1, 20). A similar element was isolated in a promoter trap conducted in procyclic trypanosomes (25). In a different approach, using 5- to 7-day-old trypanosome populations that were expanded in mice from a single metacyclic trypanosome and therefore that were still transcribing a MES within the natural context of the life cycle, Graham et al. (11, 13) reported transcription start sites for the 1.22 and 1.61 MVSG genes. Although partly similar to each other, these sites differed from those for the MVAT4 and MVAT7 genes.
Nevertheless, an upstream segment of the 1.22 MVSG element resembled the MVAT promoter-like element and was able to drive reporter gene expression in bloodstream form trypanosomes but was inactive when tested in procyclic cells (11, 13–15). The conclusions from the papers by Graham et al. therefore indicated a promoter structure that was an alternative to that of the putative MVAT promoters studied by Donelson and coworkers and a complex pattern of transcriptional regulation that involved chromosomal context and the presence or absence of trans-acting factors in different life cycle stages (14, 15). In the present work, we resolve the discrepancies between the previously published work from our and other laboratories through direct examination of transcription of MVSGs in trypanosomes isolated from the salivary glands of infected tsetse flies. We also demonstrate for the first time accurate in vitro initiation of transcription of these putative core promoters by using procyclic extracts, validating a detailed mutational analysis of the core promoter in a life cycle stage that is more amenable to experimental study.
Recent studies using global run-on sequencing (GRO-Seq) characterized nascent transcription in response to E2 treatment in MCF7 cells and showed that many of the ERα-bound enhancers bind RNA pol II and transcribe enhancer RNAs (eRNAs). Importantly, eRNA transcription and/or eRNA transcripts per se are required for activation of adjacent target genes. To determine whether TDG plays a role in eRNA transcription, we first looked to see whether sites of TDG binding coincide with sites of E2-mediated eRNA transcription in MCF7 cells by overlaying sites of E2-dependent TDG localization with publicly available GRO-Seq data. We find that, on average, sites of E2-dependent TDG localization also undergo a concomitant increase in transcription in response to E2 (Fig. 5a, b). Furthermore, sites of TDG binding at the enhancers of target genes we examined previously overlap precisely with locations that undergo transcription at those targets (Fig. 5c). Transcription of noncoding RNA from ER-targeted enhancers is readily induced by 100 nM E2 treatment for 1 h. Remarkably, depleting TDG protein using siRNA prior to treatment abrogates the ability of E2 to induce eRNA from TDG-targeted enhancers (Fig. 5d and Additional file 10). These findings reveal for the first time a potential mechanism by which TDG regulates ER-signaling.
ClustalW2 release 2.0.10 was used to realign the sequences used with TRANSFAC, version 2008.3, to determine the PWMs. The gap opening penalty was reduced to 7 from the default of 15, the gap extension penalty to 3 from 6.66 and the transitions weighting to 0 from 0.5. Minor manual adjustments were made to the results to reduce the number of locations with optional gaps. The 10 PWMs where TRANSFAC had introduced gaps in order to produce their published PWMs were I$DL_01, V$MYOGNF1_01, IRF-1, V$IRF2_01, V$BRN2_01, V$ARP1_01, P$EMBP1_Q2, V$RSRFC4_Q2, V$LUN1_01, V$DEAF1_02. In all, 510 PWMs were processed through ClustalW2. Of these there were 70 PWMs where ClustalW2 introduced gaps in order to obtain alignment of one or two base-pairs on the edge. These added no significant information and were ignored. There were 58 cases where ClustalW2 introduced a gap for one site (or all but one site) in the centre of the binding region. These cases could be significant but given the small sample size, these were also ignored. There were 26 examples similar to the above where more than one gap was introduced, but there was still only a single instance of each type, so these were ignored as well. There were 159 examples where ClustalW2 introduced one or more gaps involving more than one site. The significance of these examples varies in a continuous spectrum from many probably being of no significance through to the examples given in the Results, which were the two best. Logos and information content were calculated for the core of the sequence where base types were available for more than 50% of the binding sites. Information content for PWMs resulting from the gapped alignments and the standard alignments was calculated as described above.
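The per-column calculation restricted to the core (columns where a base is present in more than 50% of the binding sites) can be sketched as follows. This is a hedged illustration only: it assumes the common Shannon definition of information content (2 bits minus the column entropy for DNA) with no small-sample correction, which may differ from the method the text refers to as "described above", and the function names `column_information` and `core_information` are hypothetical.

```python
import math


def column_information(column, alphabet="ACGT"):
    """Information content (bits) of one alignment column, ignoring gaps.

    Assumption: IC = log2(|alphabet|) - Shannon entropy, uncorrected.
    """
    bases = [b for b in column.upper() if b in alphabet]
    if not bases:
        return 0.0
    n = len(bases)
    entropy = 0.0
    for base in alphabet:
        p = bases.count(base) / n
        if p > 0:
            entropy -= p * math.log2(p)
    return math.log2(len(alphabet)) - entropy


def core_information(aligned_sites, min_occupancy=0.5, alphabet="ACGT"):
    """Return (column_index, information) for columns in the core.

    A column belongs to the core when more than `min_occupancy` of the
    aligned binding sites carry a base (not a gap) at that position.
    """
    length = len(aligned_sites[0])
    result = []
    for i in range(length):
        column = "".join(site[i] for site in aligned_sites)
        occupied = sum(1 for b in column.upper() if b in alphabet)
        if occupied / len(column) > min_occupancy:
            result.append((i, column_information(column, alphabet)))
    return result
```

Summing the second element of each returned pair gives a total information content for the core, which allows a gapped alignment and a standard alignment of the same sites to be compared on the same scale.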