We also estimate the selection strength among the four focal groups using the selection intensity parameter K and the group average dN/dS value. The null hypothesis is that there should be no significant difference among the four groups if genome architecture does not impact patterns of gene family evolution. To our surprise, the Karyorelictea and Heterotrichea have more gene families under intensified selection compared with the NEF reference group (Table 2). Similarly, there is a trend of selection strength (measured by group average dN/dS values) among the four focal groups (Karyorelictea > NEF > Heterotrichea > EF; Fig. S1). The intensification and lowest dN/dS values suggest that Karyorelictea is under the most selective constraint, whereas the EF group is under the most relaxed selection, which could be either relaxed purifying selection or weak positive selection. Our analyses are at odds with the null hypothesis (i.e., that genome architecture and patterns of sequence evolution are not correlated) and further emphasize the impact of different genome architectures, including programmed genome rearrangements, on gene family evolution. We also tested for episodic diversifying selection between paralogs and orthologs for each group (Fig. 2). Here, a significantly higher proportion of paralogous branches under episodic selection is detected in the Heterotrichea, EF, and NEF groups, which indicates
FIG 4 Summary of features of the four focal ciliate groups, including the ability of somatic division, somatic ploidy, the structure of the somatic genome, the average transcript diversity, patterns of selection estimated by average dN/dS ratio, and, based on their nuclear/genome features, how likely it is that compensatory mutations would occur in each group when mildly deleterious mutations are present. Unknown features are indicated by a question mark (“?”). Diagrams of representative members of each group are drawn with somatic nuclei in empty circles and germline nuclei in filled circles (black). Oral structures are shown in light gray.
The common ancestry of archaea and eukaryotes is evident in their genome architecture. All eukaryotic and several archaeal genomes consist of multiple chromosomes, each replicated from multiple origins. Three scenarios have been proposed for the evolution of this genome architecture: 1) mutational diversification of a multi-copy chromosome; 2) capture of a new chromosome by horizontal transfer; 3) acquisition of new origins and splitting into two replication-competent chromosomes. We report an example of the third scenario: the multi-origin chromosome of the archaeon Haloferax volcanii has split into two elements via homologous recombination. The newly generated elements are bona fide chromosomes, because each bears “chromosomal” replication origins, rRNA loci, and essential genes. The new chromosomes were stable during routine growth but additional genetic manipulation, which involves selective bottlenecks, provoked further rearrangements. To the best of our knowledge, rearrangement of a naturally evolved prokaryotic genome to generate two new chromosomes has not been described previously.
Genome architecture is well diversified among eukaryotes in terms of size and content, with many being radically shaped by ancient and ongoing genome conflicts with transposable elements (e.g., the large transposon-rich genomes common among plants). In ciliates, a group of microbial eukaryotes with distinct somatic and germ-line genomes present in a single cell, the consequences of these genome conflicts are most apparent in their developmentally programmed genome rearrangements. This complicated developmental phenomenon has largely overshadowed and outpaced our understanding of how germ-line and somatic genome architectures have influenced the evolutionary dynamism and potential in these taxa. In our review, we highlight three central concepts: how the evolution of atypical ciliate germ-line genome architectures is linked to ancient genome conflicts; how the complex, epigenetically guided transformation of germline to soma during development can generate widespread genetic variation; and how these features, coupled with their unusual life cycle, have increased the rate of molecular evolution linked to genome architecture in these taxa.
Although the GAM method inherently performs single-cell chromatin co-segregation sampling, GAM data have not so far been used for single-cell studies of chromatin architecture. Thus, in this paper we present SluiceBox, a novel pipeline for studying the genome architecture of selected loci. We chose the name SluiceBox (the water-and-trough system used by miners to pan for gold) because it serves as an accurate metaphor for mining GAM data for nuggets of insight into chromatin conformation. We employ SluiceBox to analyze single-cell GAM data in order to study the Hist1 region. We find that the Hist1 region does show states with distinctly different chromatin structures. We use GAM and SluiceBox to explore the hypothesis that the various states we observed originate from differences in cell cycle phase.
ABSTRACT The Collaborative Cross Consortium reports here on the development of a unique genetic resource population. The Collaborative Cross (CC) is a multiparental recombinant inbred panel derived from eight laboratory mouse inbred strains. Breeding of the CC lines was initiated at multiple international sites using mice from The Jackson Laboratory. Currently, this innovative project is breeding independent CC lines at the University of North Carolina (UNC), at Tel Aviv University (TAU), and at Geniad in Western Australia (GND). These institutions aim to make publicly available the completed CC lines and their genotypes and sequence information. We genotyped, and report here, results from 458 extant lines from UNC, TAU, and GND using a custom genotyping array with 7500 SNPs designed to be maximally informative in the CC and used a novel algorithm to infer inherited haplotypes directly from hybridization intensity patterns. We identified lines with breeding errors and cousin lines generated by splitting incipient lines into two or more cousin lines at early generations of inbreeding. We then characterized the genome architecture of 350 genetically independent CC lines. Results showed that founder haplotypes are inherited at the expected frequency, although we also consistently observed highly significant transmission ratio distortion at specific loci across all three populations. On chromosome 2, there is significant overrepresentation of WSB/EiJ alleles, and on chromosome X, there is a large deficit of CC lines with CAST/EiJ alleles. Linkage disequilibrium decays as expected and we saw no evidence of gametic disequilibrium in the CC population as a whole or in random subsets of the population. Gametic equilibrium in the CC population is in marked contrast to the gametic disequilibrium present in a large panel of classical inbred strains.
Finally, we discuss access to the CC population and to the associated raw data describing the genetic structure of individual lines. Integration of rich phenotypic and genomic data over time and across a wide variety of fields will be vital to delivering on one of the key attributes of the CC, a common genetic reference platform for identifying causative variants and genetic networks determining traits in mammals.
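The idea of testing whether founder haplotypes are "inherited at the expected frequency" can be illustrated with a chi-square goodness-of-fit statistic at a single locus. This is a minimal sketch, not the consortium's actual analysis: it assumes each of the eight founders is expected to contribute 1/8 of lines at an autosomal locus, the function name and the per-founder counts are hypothetical, and the critical value 14.067 is the standard 5% threshold for 7 degrees of freedom.

```python
# Hypothetical illustration of a transmission-ratio-distortion check at one locus.
# Assumption: eight founders, each expected in 1/8 of lines at an autosomal locus.

CHI2_CRITICAL_DF7_P05 = 14.067  # standard chi-square critical value, df = 7, alpha = 0.05


def chi_square_uniform(counts):
    """Return the chi-square statistic for counts against equal expected frequencies."""
    total = sum(counts)
    if total == 0 or len(counts) < 2:
        raise ValueError("need at least two categories and a non-zero total")
    expected = total / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)


def is_distorted(counts, critical=CHI2_CRITICAL_DF7_P05):
    """Flag a locus whose founder counts deviate from equal representation."""
    return chi_square_uniform(counts) > critical


if __name__ == "__main__":
    # Made-up counts of lines carrying each founder haplotype (sum = 350 lines).
    balanced_locus = [44, 43, 44, 44, 43, 44, 44, 44]
    skewed_locus = [40, 45, 42, 44, 43, 46, 20, 70]
    print("balanced distorted?", is_distorted(balanced_locus))
    print("skewed distorted?", is_distorted(skewed_locus))
```

A genome-wide scan would need a multiple-testing correction, and the X chromosome would need different expected frequencies; neither is handled in this sketch.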
modifiers. For example, HiChIP provides a protein-centric view of genome architecture by coupling ChIP-seq with Hi-C, and has identified genome-wide cohesin-mediated looping interactions. A similar method, PLAC-seq, targets H3K4me3 histone marks to generate improved maps of promoter–enhancer interactions in mESCs defined by this chromatin modification. Although designed for the same goals as chromatin interaction analysis by paired-end tag sequencing (ChIA-PET), HiChIP and PLAC-seq require less starting material, which improves library complexity and signal-to-noise ratios. These methods also work independently of multiple probes, unlike CHi-C methods, and thus can be less costly, and ideally prevent probe-binding biases. HiChIP and PLAC-seq should provide useful insights for diseases arising from mutations affecting epigenetic modifiers, TFs, TF-binding loci, and architectural proteins. Below we highlight examples focused on architectural proteins, but future applications could include applying HiChIP or PLAC-seq to numerous chromatin modifiers that are the targets of epigenetic therapies, as mutations in several of these modifiers likely alter the 3D genome structure in addition to chromatin structure.
Eukaryotic cells contain multiple linear chromosomes that are replicated from multiple origins. For this type of genome architecture to arise, three steps are required (but not necessarily in this order): multiplication of origins, multiplication of chromosomes, and linearisation of chromosomes. Given the shared evolutionary history of eukaryotes and archaea, it is not surprising that two of these three features are found in archaeal genomes as well. Up to four replication origins can be present on some archaeal chromosomes, and multiple chromosomes that use an Orc-type replication initiation mechanism co-exist in haloarchaeal species; however, no archaeon with linear chromosomes has been found to date. Here we show that an increase in the number of circular chromosomes is easily achievable through natural evolution. To the best of our knowledge, rearrangement of a naturally-evolved prokaryotic genome that generates two new chromosomes, each with pre-existing multiple origins that depend on the same type of replication initiation, has not been described previously. Interestingly, the H. volcanii genome might already contain an imprint of a similar event, where the ancestral
Red algae comprise ~7100 species found primarily in marine environments, although some also occur in freshwater habitats. Beyond their important ecological roles, red algae are crucial to the evolution of marine phytoplankton. This is because a single or, potentially, multiple ancient red algal lineages donated their plastid to a myriad of chlorophyll c-containing forms such as haptophytes, cryptophytes, stramenopiles, dinoflagellates, and apicomplexans through secondary (or additional rounds of) endosymbiosis [12–14]. Due to the importance of these chlorophyll c-containing groups as primary producers and grazers, a large number (currently 75) of species with red algal-derived plastids have been sequenced. However, the donor lineage of these plastids remains relatively poorly studied, with only 29 plastid genomes reported, and these primarily from a single red algal class, the sexually reproducing (with one exception, see below) Florideophyceae, with no genomes available from three other classes that rely primarily on asexual reproduction (Stylonematophyceae, Compsopogonophyceae, and Rhodellophyceae) [6, 15–27]. This imbalance in available data is readily apparent when compared to Viridiplantae, for which hundreds of complete plastid genomes have been determined. These “green” plastid genomes have been used to resolve basal group relationships in Viridiplantae and to document the high genome architecture variability in most green algae when compared to the extreme conservation found in flowering plants (about 800 genomes in GenBank) [28–30].
and species susceptibility of ADRV and RGV. Another highly variable virulence-related gene is vIF2α, in which two functional domains (S1 and HD) are completely absent in RGV and FV3. There are only 69 and 76 amino acids remaining in part of the C-terminal domain. Recently, the role of vIF2α in blocking the antiviral effects of cellular PKR has been confirmed in some ranaviruses [49,50], and the truncated vIF2α in FV3 was also demonstrated to be involved in viral pathogenesis. Thus, the variable vIF2αs in ranaviruses might be associated with viral pathogenesis and host susceptibility. The third major gene variation is a novel gene. In ADRV, the novel 75L gene contains both NLS and NES motifs, whereas its homologues in RGV, FV3 and CMTV lack a 53 amino acid N-terminal sequence and the NLS motif, and have only a NES motif. NLS motifs have been suggested to be important for quick import and export of viral nucleoprotein [52,53] and for efficient viral protein synthesis [54,55], and the nuclear localization role has been demonstrated in the RGV 50L gene with an NLS motif. Perhaps the novel 75L gene with NLS and NES motifs might be involved in viral protein synthesis and transport during ADRV infection, and might be a strongly virulent gene. Indeed, genome architecture changes and major gene variations are the raw material basis for evolutionary emergence, and major gene variations have been shown to determine the pathogenicity of viruses [57,58]. Therefore, our current findings suggest that ADRV might emerge from a common ancestor of amphibian-subgroup ranaviruses, in which the corresponding genetic change routes through genomic changes include segment inversion, fragment insertion and deletion, and some major virulence-related gene variations.
This fundamental genomic understanding is likely to be valuable for crop improvement. Oliver et al. tabulated 65 examples of TE insertions in regulatory or coding sequences that affect a wide range of phenotypic traits, such as skin color in grape and anthocyanin accumulation in blood orange. The most famous example involving a TE insertion and crop productivity is perhaps the insertion of the Hopskotch TE in the far-upstream regulatory region of tb1 in maize, which enhanced tb1 expression and promoted the typical architecture of the maize plant relative to that of its progenitor, teosinte. Gene and genome doubling have also been shown to be important in agriculture, as summarized by Olsen and Wendel. Examples of this importance are seen in major grains such as wheat and rice, as well as in other crop plants such as tomato and sunflower. In addition to cases in which known TE insertions or duplicated genes have been shown to affect crop plant traits, the more general importance of these events has been appreciated, even when the specific lesions are not understood. For example, in the most important species of cotton (G. hirsutum), which is allopolyploid, the two co-resident genomes have intermingled and contribute unequally to fiber quality and yield [94–98]. In maize, large genotype–phenotype association studies have shown that modern paralogs descended from the most recent WGD are ~50 % more likely to be associated with functional and phenotypic variation than singleton genes, which highlights the importance of genome-wide neofunctionalization in generating new variation. As is the case for TEs and WGD events, diversification, evolution and selection of small RNAs are potentially important processes in crop plants, including rice [49, 64] and cotton.
In cotton, only one of two homoeologs of an mRNA that encodes a MYB transcription factor underwent preferential degradation during cotton fiber development, which makes this case particularly illustrative of a direct link between a recent WGD event and miRNA behavior. Further work is needed to understand the interplay between TE proliferation, insertion/retention bias in polyploid plants and small RNA biology, and how to harness this biology to enhance traits of agronomic importance.
Results: The PS128 genome contains 11 contigs (3,325,806 bp; 44.42% GC content) after hybrid assembly of sequences derived with Illumina MiSeq and PacBio RSII systems. The most abundant functions of the protein-coding genes are carbohydrate, amino acid, and protein metabolism. The 16S rDNA sequences of PS128 are closest to the sequences of L. plantarum WCFS1 and B21; these three strains form a distinct clade based on 16S rDNA sequences. PS128 shares core genes encoding the metabolism, transport, and modification of TAs with other sequenced L. plantarum strains. Compared with other completely sequenced L. plantarum strains, PS128 contains more lipoteichoic acid exporter genes among its TA-related genes.
that experienced divergent evolutionary pressures and reduced recombination after the Y chromosome acquired a sex-determination gene (Graves 1995b; Charlesworth 1996; Lahn and Page 1999). As a result, the X and Y chromosomes lack homology across most of their length. Nonetheless, the heterogametic sex chromosomes must function like a homologous chromosome pair and segregate reductionally at meiosis. To meet this challenge, the X and Y of most mammals retain a small, 1–5 Mb telomere-adjacent segment of well-preserved sequence homology known as the pseudoautosomal region (PAR) (Mangs and Morris 2007). The critical meiotic activities of pairing, synapsis, and crossing over are concentrated to this narrow interval (Burgoyne 1982), rendering the PAR the most recombinogenic locus in the mammalian genome (Rouyer et al. 1986; Page et al. 1987; Hinch et al. 2014). In most mammals, disruption of sequence homology between X- and Y-linked PAR sequences can trigger meiotic metaphase I arrest and apoptosis (Gabriel-Robez et al. 1990; Burgoyne et al. 1992; Mohandas et al. 1992; Dumont 2017). PAR-spanning mutations may even provide a barrier to gene flow between incipient species (White et al. 2012a,b). Importantly, deletions and rearrangements in the PAR have been directly linked to infertility in humans and mice (Burgoyne et al. 1992; Jorgez et al. 2011).
Extensive resources have been developed for sorghum genomics (Sorghum Genomics Planning Workshop Participants 2005). Several linkage maps have been constructed on the basis of interspecific (e.g., Bowers et al. 2003) and intraspecific crosses (e.g., Menz et al. 2002). The sorghum genome map has been aligned to the genome maps of other cereals, revealing extensive macrocolinearity, especially between sorghum, rice, and maize (Peng et al. 1999; Wilson et al. 1999; Klein et al. 2003; Paterson et al. 2004; Devos 2005). Approximately 200,000 sorghum ESTs have been collected, revealing 22,000 unique transcript clusters (L. H. Pratt, personal communication; http://www.fungen.org/). Microarrays and qRT-PCR assays based on these sequences have been used to collect information on sorghum gene expression modulated by plant hormones involved in plant protection (Salzman et al. 2005) and osmotic stress (Buchanan et al. 2005). In addition, the collection of 500,000 methyl-filtered sorghum sequences tagged >90% of the sorghum genes (Bedell et al. 2005). The architecture of sorghum chromosomes has also been characterized in several studies. A molecular karyotype of the sorghum genome was developed on the basis of fluorescence in situ hybridization (FISH) of BACs derived from each sorghum chromosome (Kim et al. 2002). Karyotype-aided analysis of sorghum chromosome size and DNA content recently allowed the establishment of a unified sorghum chromosome numbering system (Kim et al. 2005a). In addition, the molecular cytology of three sorghum chromosomes has been analyzed in detail using genetically mapped BAC clones and FISH (Islam-Faridi et al. 2002; Kim et al. 2005b). These sorghum chromosomes were found to contain distal regions of euchromatin and pericentromeric regions of heterochromatin (Islam-Faridi et al. 2002; Kim et al. 2005b).
Despite the success of paired-end mapping, there are still challenges to overcome. One important feature of the paired-end mapping approach is that it relies on the reference assembly. It is well established that the reference assembly represents very rare or unique alleles at some loci in the genome. In rare instances, it is also possible that these unique alleles represent cloning artifacts or are a result of misassembly of the reference sequence. For example, this has been suggested for an inversion overlapping an exon of the DOCK3 gene on chromosome 3, for which there is an inversion in the reference assembly as compared to available mRNA
Figure 1. Overview of inversion discovery by paired-end mapping. The top part of the figure shows the alignment between the reference assembly and an individual carrying an inversion. When paired-end mapping is performed, the donor DNA is first sheared into several similarly sized DNA fragments. The ends of these fragments are then sequenced (fragments are depicted in blue and red, with the boxes at the ends showing the parts that are sequenced). The pairs of end-sequences are then mapped to the reference genome. The majority of these pairs will map in a plus(+)/minus(-) orientation, separated by the approximate distance expected from the fragment size (labeled A and D). End-pairs labeled B and C indicate mapping of fragment ends in a region containing an inversion compared to the reference assembly. Instead of the expected +/- orientation of the two end-sequences, the pairs spanning the inversion breakpoints map as +/+ and -/-, respectively. Clusters of such read pairs are indicative of an inversion. Only fragments spanning the inversion breakpoint will exhibit this pattern of alignment. Better clone coverage will yield better resolution and more accurate mapping of the breakpoints.
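The orientation logic described in the Figure 1 caption can be sketched in a few lines of Python. This is an illustrative sketch only, not the method of any particular paired-end mapping tool: the `EndPair` record, the function names, and the fragment size and tolerance values are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class EndPair:
    """One mapped pair of end-sequences (hypothetical minimal record)."""
    left_pos: int       # reference coordinate of the leftmost end
    left_strand: str    # '+' or '-'
    right_pos: int      # reference coordinate of the rightmost end
    right_strand: str   # '+' or '-'


def classify_pair(pair, expected_size, tolerance):
    """Classify a pair by orientation and, for +/- pairs, by mapped distance."""
    orientation = f"{pair.left_strand}/{pair.right_strand}"
    if orientation in ("+/+", "-/-"):
        # Same-strand pairs are the signature of a fragment spanning
        # an inversion breakpoint.
        return "inversion_candidate"
    if orientation == "+/-":
        span = pair.right_pos - pair.left_pos
        if abs(span - expected_size) <= tolerance:
            return "concordant"
        return "size_discordant"
    return "other"  # e.g. -/+ pairs, not covered by the caption


def summarize(pairs, expected_size, tolerance):
    """Count pairs per class; a cluster of candidates suggests an inversion."""
    return Counter(classify_pair(p, expected_size, tolerance) for p in pairs)


if __name__ == "__main__":
    # Made-up pairs; 3000 bp fragments with 500 bp tolerance are assumptions.
    pairs = [
        EndPair(1000, "+", 4000, "-"),    # like pairs A and D in the caption
        EndPair(9000, "+", 12100, "+"),   # like pair B
        EndPair(9500, "-", 12600, "-"),   # like pair C
    ]
    print(summarize(pairs, expected_size=3000, tolerance=500))
```

In practice a caller would require several independent candidate pairs near the same coordinates before reporting an inversion, since single same-strand pairs can arise from mapping errors.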
The profile of H3K27me3 marks confirmed these observations. In distal cells where the 3′-TAD is normally inactive, the amount of H3K27me3 was significantly reduced in mutant versus control cells, as if the “proximal regulation” had not been entirely switched off in distal cells. In parallel with both the decrease of Island3 eRNAs level and the decrease in H3K27ac, the distribution of H3K27me3 marks appeared increased in the former 5′-TAD. Altogether, these results suggest that when mixed into a single fused-TAD, the proximal regulation tends to take the lead over the distal regulation, with proximal enhancers being active for too long, even in distal cells where distal limb enhancers seem to be somewhat under-active. A potential mechanism may involve the reported effect of HOX13 proteins in the termination of 3′-TAD regulation, combined with the novel chromatin architecture of the fused-TAD. In the absence of HOXD13 proteins, deleted from the fused-TAD, the dose of HOXA13 should be sufficient to secure the repression of 3′-TAD and thus to implement
The architecture of the A. dohrni mitogenome, including genome content, gene order, and genome asymmetry, is consistent with that of two other reduviid species, T. dimidiata and V. hoffmanni. The mitogenomes of the three assassin bugs share the same genome content (37 genes and 1 control region) and gene order, and have exactly the same genome asymmetry, with a gene strand asymmetry (GSA) rate of 0.24 [GSA = (Number of genes on the major-strand – Number of genes on the minor-strand)/Number of total genes]. The size differences among reduviid mitogenomes (A. dohrni, 16,470 bp; T. dimidiata, 17,019 bp; and V. hoffmanni, 15,625 bp) are mainly due to the variable number of repeats in the control regions. It is also worth noting that the evolutionarily conserved genome architecture shared among assassin bugs is also considered ancestral to both insects and crustaceans [7, 33].
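The GSA formula quoted above can be checked with a short calculation. This is a minimal sketch: the text gives only the total of 37 genes and the resulting rate of 0.24, so the split of 23 genes on the major strand and 14 on the minor strand is an assumption (it is the split usually reported for the ancestral insect arrangement), and the function name is hypothetical.

```python
def gene_strand_asymmetry(major_strand_genes, minor_strand_genes):
    """GSA = (genes on major strand - genes on minor strand) / total genes."""
    total = major_strand_genes + minor_strand_genes
    if total == 0:
        raise ValueError("total number of genes must be positive")
    return (major_strand_genes - minor_strand_genes) / total


if __name__ == "__main__":
    # Assumed split of the 37 mitochondrial genes: 23 major-strand, 14 minor-strand.
    gsa = gene_strand_asymmetry(23, 14)
    print(f"GSA = {gsa:.2f}")
```

With the assumed counts the ratio is 9/37, which rounds to the 0.24 reported for the three assassin bug mitogenomes.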
The butterfly Bicyclus anynana is emerging as a model organism in the field of evo-devo, life history evolution and ageing. However, there are limitations to the detail in which these traits can be unraveled without basic knowledge about genome architecture. A genetic linkage map allows separate components of traits to be identified and linked to distinct chromosomal regions, and allows their effects to be studied in more detail in a context of functionality and interaction. Therefore, a linkage map was constructed for B. anynana to serve as a resource to facilitate future investigations. Linkage mapping had to take account of the absence of crossing-over in female Lepidoptera, and of our use of a full-sib crossing design. We developed a new method to determine and exclude the non-recombinant, uninformative female-inherited component in offspring. The linkage map was constructed using a novel approach that uses exclusively JOINMAP software for Lepidoptera linkage mapping. This approach simplifies the mapping procedure, avoids over-estimation of mapping distance and increases the reliability of relative marker positions. A total of 347 AFLP markers, 9 microsatellites and one single-copy nuclear gene covered all 28 chromosomes, with a mapping distance of 1354 cM. Conserved synteny of Tpi on the Z-chromosome in Lepidoptera was confirmed for B. anynana. The results are discussed in relation to other mapping studies in Lepidoptera. This study contributes to the knowledge of chromosome structure and evolution of an intensively studied organism. On a broader scale, it provides insight into Lepidoptera sex chromosome evolution and proposes a simpler and more reliable method of linkage mapping than has been used for Lepidoptera to date.
Hi-C data. For instance, very few tools perform comparative analysis, visually or statistically, of two Hi-C contact maps [59, 62, 69, 122], and none of these tools allow joint analysis of more than two datasets that come from multiple time points, conditions, or cell types. Also, many of the existing methods, specifically the three-dimensional reconstruction algorithms, do not scale to high-resolution Hi-C data from large genomes such as human and mouse. Deconvolution of Hi-C data from a large number of cells into subpopulations with similar chromatin organizations and estimation of the density of each subpopulation is still largely unexplored [123, 124]. Similarly, integration of two-dimensional Hi-C data or three-dimensional chromatin models with the vast quantity of available one-dimensional datasets, such as replication timing, histone modifications, protein binding and gene expression, is also understudied. One study that integrates Hi-C data with many types of genomics and epigenomics data tracks uses a technique called graph-based regularization (GBR) to perform semi-automated genome annotation. This study encouragingly shows that the integration of Hi-C data improves the annotation quality and allows identification of novel domain types. However, GBR assumes that regions that are close in three dimensions should be assigned the same annotation label, which may only make sense for large-scale domain annotations (greater than approximately 100 kb). Another method integrates low-resolution Hi-C data (1 Mb) with transcription-factor binding, histone modification and DNase hypersensitivity information and identifies 12 different clusters of interacting loci that fall into two distinct chromatin linkages (co-active and co-repressive). Most recently, Chen et al. present a unified four-dimensional analysis framework (three space plus one time dimension) that uses adaptive resolution contact maps to perform gene-level analysis.
They use this framework to interrogate the dynamic relationship between genome architecture and gene expression of primary human fibroblasts over a 56-hour time course. Concurrent advances in such computational integration efforts and in experimental data generation have the potential to transform our understanding of the structure-function relationship and help translational biomedical research. Several intriguing studies suggest that alterations in chromatin conformation and in gene regulation are tightly linked in cancer [22, 23, 126, 127], cellular differentiation, and development.
At the point when organizations shifted to looking at optimizing their whole organizations—to streamline operations and cut costs—the information system began to be seen as the nervous system of the organization, rather than a collection of data-processing applications to increase local efficiency. The technology for this step lagged the need by several years—but the interest in aligning and integrating information systems endured. Currently this is the focus of most enterprise architecture efforts. Both cost-efficiency and organizational transformation (or at least fundamental business improvement—and electronic support of customers—e-commerce) are the typical foci of these efforts.
More recently, a strong qualitative and dominant monogenic resistance known as CMD2 was discovered in a Nigerian landrace (TMEB3) in the 1980s (Akano et al., 2002). Multiple biparental QTL analyses have been conducted, initially using simple-sequence repeat (SSR) markers (Akano et al., 2002; Lokko et al., 2005; Okogbenin et al., 2007, 2012; Mohan et al., 2013) but more recently genome-wide SNPs (Rabbi et al., 2014a,b), to understand the genetic basis of this type of qualitative resistance to CMD. Although some studies hint at additional resistance loci (Okogbenin et al., 2012; Mohan et al., 2013), most evidence points solely to the CMD2 locus (Rabbi et al., 2014a,b). However, these biparental mapping efforts relied on a handful of unique parental genotypes from West Africa and therefore only examined a narrow slice of African cassava germplasm diversity (Rabbi et al., 2014b).