Empirically derived models of amino acid replacement are employed to study the association between various physical features of proteins and evolution. The strengths of these associations are statistically evaluated by applying the models of protein evolution to 11 diverse sets of protein sequences. Parametric bootstrap tests indicate that the solvent accessibility status of a site has a particularly strong association with the process of amino acid replacement that it experiences. Significant association between secondary structure environment and the amino acid replacement process is also observed. Careful description of the length distribution of secondary structure elements and of the organization of secondary structure and solvent accessibility along a protein did not always significantly improve the fit of the evolutionary models to the data sets that were analyzed. As indicated by the strength of the association of both solvent accessibility and secondary structure with amino acid replacement, the process of protein evolution—both above and below the species level—will not be well understood until the physical constraints that affect protein evolution are identified and characterized.
In chapter 3, some recently developed computational and multivariate statistical techniques were used to explore the molecular structure of the serpin protein family. The serpins are structurally more complex than bHLH proteins and consequently are used here to evaluate the relative efficacy of new procedures recently developed in the William R. Atchley lab. Serpins (serine protease inhibitors) are a superfamily of proteins whose membership is based on the presence of a single common core domain consisting of three β-sheets and 8-9 α-helices (Gettins 2002). They are widely distributed among eukaryotes and in some viruses that infect them (Irving, Pike et al. 2000). They are absent from fungi and chlorophytes, despite being found in higher plants and recently discovered in prokaryotes (Irving, Pike et al. 2000; Irving, Steenbakkers et al. 2002). Despite the presence of a common fold of approximately 350 residues in all serpins, pairwise identity of primary structures can be as low as 25% (Gettins 2002). The majority of serpins are functionally characterized as inhibitors of chymotrypsin-like serine proteinases, but some inhibit other types of proteinases. Serpins regulate numerous separate cellular and extracellular processes including blood coagulation,
Recently described stochastic models of protein evolution have demonstrated that the inclusion of structural information in addition to amino acid sequences leads to a more reliable estimation of evolutionary parameters. We present a generative, evolutionary model of protein structure and sequence that is valid on a local length scale. The model concerns the local dependencies between sequence and structure evolution in a pair of homologous proteins. The evolutionary trajectory between the two structures in the protein pair is treated as a random walk in dihedral angle space, which is modeled using a novel angular diffusion process on the two-dimensional torus. Coupling sequence and structure evolution in our model allows for modeling both “smooth” conformational changes and “catastrophic” conformational jumps, conditioned on the amino acid changes. The model has interpretable parameters and is comparatively more realistic than previous stochastic models, providing new insights into the relationship between sequence and structure evolution. For example, using the trained model we were able to identify an apparent sequence–structure evolutionary motif present in a large number of homologous protein pairs. The generative nature of our model enables us to evaluate its validity and its ability to simulate aspects of protein evolution conditioned on an amino acid sequence, a related amino acid sequence, a related structure or any combination thereof.
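To make the idea of a random walk on the two-dimensional torus concrete, the sketch below moves a single (phi, psi) dihedral pair by independent Gaussian steps and wraps each angle back onto [-pi, pi). This is a plain wrapped Gaussian walk chosen for illustration, not the angular diffusion process the abstract describes; the function names, the step size `sigma`, and the starting angles are all assumptions.

```python
import math
import random


def wrap_angle(theta):
    """Map an angle in radians onto the interval [-pi, pi)."""
    return (theta + math.pi) % (2 * math.pi) - math.pi


def torus_random_walk(phi, psi, steps, sigma, rng):
    """Illustrative walk: add Gaussian noise to each dihedral angle, then wrap.

    Wrapping both coordinates keeps the walk on the torus, so a step past
    +pi re-enters near -pi instead of leaving the angle range.
    """
    path = [(phi, psi)]
    for _ in range(steps):
        phi = wrap_angle(phi + rng.gauss(0.0, sigma))
        psi = wrap_angle(psi + rng.gauss(0.0, sigma))
        path.append((phi, psi))
    return path


# Hypothetical starting point in radians; sigma = 0.2 is an arbitrary choice.
path = torus_random_walk(-1.0, 2.4, steps=100, sigma=0.2, rng=random.Random(1))
```

A small `sigma` gives the kind of gradual drift the abstract calls a "smooth" change; modeling "catastrophic" jumps conditioned on amino acid changes would need additional machinery that this sketch does not attempt.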
The fitness function dictated by our biological assumptions (Eq. 6.1) allowed us to efficiently parallelize genome evolution. Assuming no genetic linkage and a low mutation rate, an overall evolutionary competition between N organisms having n genes expressed at different (but fixed) levels is equivalent to parallel competitions within n populations of N one-gene, one-expression-level individuals. We carried out n=650 such sub-simulations of N=1,000 genes, with 50 distinct genes at each of 13 expression levels evenly spaced on a log scale from 10 to 100,000. Initial genes were chosen at random by choosing a random sequence encoding a lattice protein (see below) that adopted a target structure with free energy of unfolding (stability) of at least 0 kcal/mol, and hill-climbing until the stability exceeded 5 kcal/mol. During evolution, the initial sequences equilibrated for 50,000 generations, and recording of evolutionary data (see below) then proceeded until the most-recent common ancestor of the final population had a birth time at least 100,000 generations after the end of equilibration. Fitness was converted into reproductive success by Wright-Fisher sampling using non-overlapping generations. Fitness costs (see below) were derived from absolute numbers of misfolded proteins (Eq. 6.1) and so had the same meaning within and between sub-simulations, making evolutionary rates between sub-simulations directly comparable. A mutation rate of 0.00001 changes per nucleotide (μ=0.00075 per gene) per generation (Nμ=0.75) was held constant across all sub-simulations. A full simulation run required approximately one month of computing time on a 2.0 GHz Pentium 4 PC with 0.5 GB of RAM. Simulation code was written in C++.
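The bookkeeping in this setup can be checked with a short sketch. It is written in Python for brevity and is not the authors' C++ code; the function names are hypothetical, and the gene length of 75 nucleotides is inferred here from the ratio of the two quoted mutation rates rather than stated in the text. The Wright-Fisher step simply draws each offspring's parent with probability proportional to fitness, which is one standard reading of "Wright-Fisher sampling using non-overlapping generations".

```python
import math
import random


def log_spaced_levels(low=10.0, high=100_000.0, count=13):
    """Return `count` expression levels evenly spaced on a log10 scale."""
    step = (math.log10(high) - math.log10(low)) / (count - 1)
    return [10 ** (math.log10(low) + i * step) for i in range(count)]


def wright_fisher_generation(fitnesses, rng):
    """Pick a parent index for each offspring, weighted by fitness.

    The population size stays constant and the parental generation is
    fully replaced (non-overlapping generations).
    """
    n = len(fitnesses)
    return rng.choices(range(n), weights=fitnesses, k=n)


levels = log_spaced_levels()
n_subsims = 50 * len(levels)              # 50 genes at each of 13 levels

per_nucleotide_mu = 0.00001
gene_length = round(0.00075 / per_nucleotide_mu)   # inferred, not stated
per_gene_mu = per_nucleotide_mu * gene_length
N_mu = 1000 * per_gene_mu                 # population size times gene rate

parents = wright_fisher_generation([1.0, 0.0, 1.0], random.Random(0))
```

With these numbers the sketch reproduces the quoted n=650 and Nμ=0.75, and an individual with zero fitness is never chosen as a parent.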
ABSTRACT The “nearly neutral” theory of molecular evolution proposes that many features of genomes arise from the interaction of three weak evolutionary forces: mutation, genetic drift, and natural selection acting at its limit of efficacy. Such forces generally have little impact on allele frequencies within populations from generation to generation but can have substantial effects on long-term evolution. The evolutionary dynamics of weakly selected mutations are highly sensitive to population size, and near neutrality was initially proposed as an adjustment to the neutral theory to account for general patterns in available protein and DNA variation data. Here, we review the motivation for the nearly neutral theory, discuss the structure of the model and its predictions, and evaluate current empirical support for interactions among weak evolutionary forces in protein evolution. Near neutrality may be a prevalent mode of evolution across a range of functional categories of mutations and taxa. However, multiple evolutionary mechanisms (including adaptive evolution, linked selection, changes in fitness-effect distributions, and weak selection) can often explain the same patterns of genome variation. Strong parameter sensitivity remains a limitation of the nearly neutral model, and we discuss concave fitness functions as a plausible underlying basis for weak selection.
The plasmoid observed in the Earth’s magnetotail shows a wide variety of complicated plasma structures that are not simply described by the standard Petschek reconnection model. The interaction of the plasmoid propagating tailward with the surrounding plasmas of the plasma sheet at rest may be important for understanding the plasma sheet structure and the plasma heating observed in the magnetotail. The nonlinear time evolution of the plasmoid is studied by using a large-scale, high-resolution, two-dimensional MHD simulation code. Several discontinuities/shocks are formed in association with magnetic reconnection: 1) a pair of the standard Petschek-type slow shock waves emanating from the X-type neutral point, 2) the tangential discontinuity inside the plasmoid that separates the accelerated plasmas from the original plasma sheet plasmas, 3) the slow shock with a “crab-hand” structure surrounding the front side of the plasmoid, 4) the intermediate shocks at the edge of the plasma sheet inside the plasmoid, and 5) the contact discontinuity inside the plasma sheet that separates the shock-heated plasmas from the plasmas Joule-heated by magnetic diffusion at the X-type neutral point. We also discuss how these discontinuity/shock structures are affected by the lobe/mantle plasma condition.
protein. Despite being highly conserved and present across eukaryotic lineages except Archaeplastida (Fig. 4), its function remains entirely unknown. The protein consists of four transmembrane domains homologous to other Tim17 proteins, with an additional C-terminal transmembrane domain that is specific to the peroxisomal orthologues (Figs. 2 and 3a). Tmem135, a 52 kDa protein also referred to as PMP52, was originally identified by mass spectrometry of purified peroxisomes [26, 27]. Based on expression profiling, the protein was suggested to take part in fatty acid metabolism [28, 29]. Tmem135/PMP52 is predicted to carry eight TMDs, which correspond to two Tim17 protein family domains. The protein is present in all eukaryotic supergroups; however, the ambiguous relationship between the N- and C-terminal Tim17 domains indicates that these have been swapped during evolution (Fig. 1). That a member of the originally described mitochondrial protein family was found in the peroxisomes has a precedent in the case of the peroxisomal ADP/ATP carrier PMP34, a member of the mitochondrial carrier protein family. Considering that a Tim17 protein (Tim22) assembles the mitochondrial carriers into the inner mitochondrial membrane, it is tempting to speculate that PMP24 mediates the insertion
Figure 8 shows the BDT depth for the North Pole and Solis Planum as a function of time under both dry and wet conditions. Although the BDT depth is not identical to the lithospheric thickness, the two are closely linked, and the convection style and lithosphere thickness are coupled. On present-day Mars, the BDT depth for a dry rheology is significantly deeper than that for a wet rheology in both the North Pole and Solis Planum regions. This occurred because the existence of water reduces the frictional strength in regions of brittle deformation owing to pore pressure and also reduces rock strength in regions of ductile deformation. A comparison of the present-day North Pole and Solis Planum revealed that the BDT depth at the North Pole is markedly deeper under both wet and dry conditions, indicating that lithospheric thickness in the North Pole region is greater than that of the Solis Planum. This is likely attributable to differences in the present-day thermal structure and in crustal thickness (e.g., Zuber et al. 2000). The BDT depth in Solis Planum tends to be relatively shallow owing to the thicker crust (Figs. 7, 8b).
MRI provides an advantage over histology in that, as imaging is a 3D process that does not require brain sectioning, the resulting image can be viewed in any plane (coronal, horizontal, or sagittal), and all planes can be viewed at once. This allows for brain regions and fiber tracts to be viewed throughout their rostral-caudal extent, facilitating visualization of the true structure of the brain. Traditional histological atlases of reptiles, which consist of coronal sections, have much poorer longitudinal resolution than they do in-plane resolution. Each image in an atlas is typically 100 µm or more caudal to the previous one. Our 3D atlas has the same resolution – 23 µm³ – in all three dimensions. Moving through the brain at such a fine resolution provides a much better understanding of how brain regions change shape and size, and particularly how they are connected and related to each other. This can be clearly demonstrated by scrolling in a rostro-caudal direction through our online atlas and comparing the result with flipping through the figures in this paper.
Streptococcal fibronectin-binding protein is an important virulence factor involved in colonization and invasion of epithelial cells and tissues by Streptococcus pyogenes. In order to investigate the mechanisms involved in the evolution of sfbI, the sfbI genes from 54 strains were sequenced. Thirty-four distinct alleles were identified. Three principal mechanisms appear to have been involved in the evolution of sfbI. The amino-terminal aromatic amino acid-rich domain is the most variable region and is apparently generated by intergenic recombination of horizontally acquired DNA cassettes, resulting in a genetic mosaic in this region. Two distinct and divergent sequence types that shared only 61 to 70% identity were identified in the central proline-rich region, while variation at the 3′ end of the gene is due to deletion or duplication of defined repeat units. Potential antigenic and functional variabilities in SfbI imply significant selective pressure in vivo with direct implications for the microbial pathogenesis of S. pyogenes.
To align the whole genome assemblies of D. melanogaster, D. simulans, D. yakuba and D. erecta, we used the whole genome multiple alignment algorithm implemented in the VISTA Genome Pipeline (Brudno et al. in prep.). This algorithm consists of two major modules: Pairwise Alignment of Sister Taxa and Progressive Multiple Alignment. The first module uses a glocal (hybrid global/local) approach based on a reimplementation of the original Shuffle-LAGAN (S-LAGAN) chaining algorithm [37,38] combined with a post-processing stage called SuperMap. The S-LAGAN chaining takes as input a set of local alignments between the two sequences and returns the maximal scoring subset of these under certain gap criteria. In order to allow our alignments to incorporate duplications in both genomes, the SuperMap algorithm takes two S-LAGAN outputs, one with each sequence as the base. We then classified all local alignments as belonging to both chains, and consequently orthologous (best bidirectional hits), or being in only one chain, and hence a duplication. After the two pairs of sister taxa (melanogaster/simulans and yakuba/erecta) were aligned, we used a progressive generalization of the pairwise SuperMap algorithm to align the two alignments to each other and obtain a 4-way alignment. Our algorithm is based on finding a maximum weighted matching in a graph, with the weights specified by the outgroup genomes, to order the individual alignment blocks in the likely order of the ancestors of melanogaster/simulans and yakuba/erecta. After that we align the resulting ancestrally-ordered alignments to each other using LAGAN. To restrict our analysis to one-to-one orthologs, all cases in which an ORF in one species was aligned to more than one ORF in another species were excluded from the analysis. Since a substantial fraction of Drosophila coding regions had stretches of ambiguities and internal stop codons, we assumed that such stretches could have erroneous length due to sequencing errors.
Therefore, if it allowed us to reduce the numbers of internal stop codons,
Rate of evolution and frequency of the inferior allele as func-
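The two filtering rules described in this excerpt, chain membership and the one-to-one ortholog restriction, reduce to simple set and counting logic. The sketch below illustrates that logic only; it is not the VISTA or SuperMap implementation, and the function names and alignment identifiers are hypothetical.

```python
from collections import Counter


def classify_local_alignments(chain_base_a, chain_base_b):
    """Split local alignment IDs by how many of the two chains contain them.

    IDs present in both chains are treated as orthologous (best
    bidirectional hits); IDs present in only one chain are treated as
    duplications.
    """
    a, b = set(chain_base_a), set(chain_base_b)
    orthologous = a & b
    duplications = a ^ b
    return orthologous, duplications


def one_to_one_only(orf_pairs):
    """Keep only ORF pairs where each ORF aligns to exactly one partner."""
    left = Counter(x for x, _ in orf_pairs)
    right = Counter(y for _, y in orf_pairs)
    return [(x, y) for x, y in orf_pairs if left[x] == 1 and right[y] == 1]


# Hypothetical IDs: the two lists stand for the two S-LAGAN chain outputs.
orth, dup = classify_local_alignments(["a1", "a2", "a3"], ["a2", "a3", "a4"])
kept = one_to_one_only([("m1", "s1"), ("m2", "s2"), ("m2", "s3")])
```

In this toy input, `a2` and `a3` come out as orthologous, `a1` and `a4` as duplications, and only the pair `("m1", "s1")` survives the one-to-one filter because `m2` aligns to two ORFs.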
For an unknown protein apparently containing multiple domains, it is usual to identify database homologues of each constituent domain. Figure 3.2(b) includes pairs of all homologous protein domains, belonging to both single- and multi-domain proteins. Surprisingly, the effect of including multi-domain proteins on the ‘40% threshold’ is not that dramatic, in that conservation of function above 40% is still high. Even within the 30-40% sequence identity region, almost 90% of pairs share a minimum of three EC digits. Below 30%, the pairing of enzymes with non-enzymes becomes quite common. Given that variation of domain construction within a superfamily is a useful route to new functions, in that new modules can provide alternative substrate-binding sites, catalytic residues or protein-protein interfaces, it is reasonable to assume that a sizeable fraction of those pairs having conserved function share the same domain organisation. (Of course, there are many exceptions.) It follows that changes in modular construction may be far more common below the 40% threshold; the sequences, and structures, of two homologous domains which are fused to otherwise non-homologous modules must vary to some extent owing to different domain interface requirements. It is well to remember that many of these domain pairs probably correspond to substrate-binding or regulatory domains, for example. Nevertheless, the histogram suggests that a pairwise sequence identity above 40% strongly indicates a similarity in catalytic function.
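The phrase "share a minimum of three EC digits" can be made precise with a small helper that counts how many leading levels of two EC numbers agree. This is a sketch under assumed conventions: EC numbers are given as dot-separated strings, an unassigned level is written `-` and never counts as a match, and the function names and example pairs are chosen only for illustration.

```python
def shared_ec_digits(ec_a, ec_b):
    """Count the leading EC levels that two EC numbers have in common.

    Comparison stops at the first level that differs or is unassigned ('-').
    """
    count = 0
    for x, y in zip(ec_a.split("."), ec_b.split(".")):
        if x != y or x == "-":
            break
        count += 1
    return count


def fraction_sharing(pairs, min_digits=3):
    """Fraction of EC-number pairs sharing at least `min_digits` levels."""
    if not pairs:
        return 0.0
    hits = sum(1 for a, b in pairs if shared_ec_digits(a, b) >= min_digits)
    return hits / len(pairs)


# Illustrative pairs only; these are not data from the analysis above.
example_pairs = [("1.1.1.1", "1.1.1.2"), ("3.4.21.1", "3.4.22.1")]
share = fraction_sharing(example_pairs)
```

Applied to the pairs in one sequence-identity bin, `fraction_sharing` gives the kind of percentage quoted for the 30-40% region.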
In physical research, material structure and structural energy are the basis of the evolution of the universe, and physical theory corresponds to the evolution of material structure. The "epochal character", "stages", and "hierarchy" of the evolution of material structure determine the "epochal character", "stages", and "hierarchy" of physical theory, and research results obtained from the evolution of material structure under the "extreme conditions" of a given "epochal character" are significant for that "epochal character". Different "epochal characters" have different physical laws, which are relatively independent and are connected, transitioned, and transformed at the interfaces between them. At the same time, it must be noted that some physical experiments under "extreme conditions" do not necessarily reproduce the past, because we cannot confirm that a given physical environment under "extreme conditions" actually existed in the history of the evolution of material structure. For example, whether super electric field or super magnetic field environments existed in the history of the evolution of the universe is open to discussion. Therefore, artificially set "extreme conditions" cannot be regarded as completely equivalent to conditions that actually existed in the evolutionary history of the universe. Moreover, the high temperatures, high pressures, electric fields, and magnetic fields inside the sun, the earth, and other celestial bodies are real environments that are difficult for us to simulate in the "present world". In a scientific sense, with artificial high energy, high pressure, high temperature, super low temperature, strong electric fields, strong magnetic fields, and so on, and with the creation of new species through genetic changes in living things, artificial methods have gone beyond the natural and created nature.
The relatively high probabilities of a small number of pathways mean that evolution on low-roughness pathways has a degree of predictability and reproducibility that would not necessarily be expected (27). Experimental studies of fitness landscapes may therefore be informative for processes that have happened, or are happening, in nature. For example, the high-probability evolutionary pathways identified for the evolution of pyrimethamine resistance of the malaria dihydrofolate reductase studied in E. coli coincide exactly with the inferred stepwise acquisition of pyrimethamine
Plant viruses threaten food security by inciting many damaging plant disease epidemics and the damage is predicted to increase, partly because climate change is expected to accelerate rates of evolution, emergence, and adaptability of viral and subviral parasites (21). This is especially critical because most plant viruses have RNA genomes, whose intrinsic properties of high replication speeds, lack of error correction during their RNA replication, short generation times, and vast within-plant population sizes dictate extremely high genetic variability of their populations (22). The inherently high evolutionary potential of plant viruses makes their population genetic structures more “plastic”, which challenges the measures implemented for virus disease control (23). Therefore, it is very important, both fundamentally and practically, to understand the forces that drive the molecular evolution and determine the population genetic structures of plant viruses (20).
Alternative splicing is known to be an important source of protein sequence variation, but its evolutionary impact has not been explored in detail. Studying alternative splicing requires extensive sampling of the transcriptome, but new data sets based on expressed sequence tags aligned to chromosomes make it possible to study alternative splicing on a genome-wide scale. Although genes showing alternative splicing by exon skipping are conserved as compared to the genome as a whole, we find that genes where structural differences between human and mouse result in genome-specific alternatively spliced exons in one species show almost 60% greater nonsynonymous divergence in constitutive exons than genes where exon skipping is conserved. This effect is also seen for genes showing species-specific patterns of alternative splicing where gene structure is conserved. Our observations are not attributable to an inherent difference in rate of evolution between these two sets of proteins or to differences with respect to predictors of evolutionary rate such as expression level, tissue specificity, or genetic redundancy. Where genome-specific alternatively spliced exons are seen in mammals, the vast majority of skipped exons appear to be recent additions to gene structures. Furthermore, among genes with genome-specific alternatively spliced exons, the degree of nonsynonymous divergence in constitutive sequence is a function of the frequency of incorporation of these alternative exons into transcripts. These results suggest that alterations in alternative splicing pattern can have knock-on effects in terms of accelerated sequence evolution in constant regions of the protein.
that tandem exon duplication is one source of alternatively spliced exons (Kondrashov and Koonin, 2001; Letunic et al., 2002), but none of the probable recent exon gains that we identified showed evidence of this. If genome-specific exons are created by tandem duplication then the lack of detectable sequence homology in the orthologous gene must be due to rapid sequence change following duplication. By restricting our comparison to genes undergoing alternative splicing by exon-skipping, and subdividing these into those cases where alternative splicing occurred in the ancestor of human and mouse, and those where alternative splicing emerged in the human or mouse branch only, we have been able to focus on the impact of alternative splicing on recent mammalian sequence evolution. This approach was designed to eliminate the influence of functional differences between genes, unlike the comparison of sequence constraint in alternatively spliced genes to genes in the genome as a whole. Thus, although genes showing alternative splicing by exon-skipping are a slow-evolving subset of the human genome, there is an increased rate of sequence evolution in the immediate aftermath of the appearance of alternative splicing. This result is reminiscent of observations about the evolution of duplicated genes. A number of studies have reported relaxation of sequence constraint in duplicated genes compared to singletons (Lynch and Conery, 2000; Van de Peer et al., 2001; Nembaware et al., 2002; Seoighe et al., 2003), but it has recently been shown that genes that tend to remain duplicated are generally more slowly evolving than genes that are found in single copy (Davis and Petrov, 2004; Jordan et al., 2004). It therefore appears that conserved genes are more likely than faster evolving genes to undergo diversification by either gene duplication or alternative splicing, and that both processes result in an increased rate of sequence change.
A striking feature of the CCHFV N structure is that the arm domain displays a DEVD caspase cleavage motif at its apex, in the most accessible position on the entire molecule (Fig. 1A). This exposed position, along with its strict conservation in all CCHFV strains, suggests that the virus has evolved to present the cleavage motif to the cellular environment, which is somehow beneficial to the virus life cycle. If possession of the exposed cleavage site were not beneficial to the virus or if caspase cleavage were a host defense mechanism, a fast-mutating RNA virus such as CCHFV would be predicted to quickly eliminate the motif. The functional significance of this caspase cleavage site is therefore an intriguing feature of the CCHFV N protein. To investigate the fate of the N protein fragments following cleavage, we performed caspase cleavage of the N protein in vitro and showed that the cleavage products remained associated as a single unit. This raises the interesting possibility that N protein functions may remain unaltered following cleavage. The cleaved N protein could of course have an altered tertiary or quaternary structure, which may influence function in a variety of ways, including interaction with different protein partners, the adoption of different oligomeric states, or alteration of RNA binding affinity. In order to best understand the functional relevance of this DEVD motif in the CCHFV life cycle, we need to manipulate the CCHFV genome with a view to studying the consequence of such a change in the context of a live virus infection of intact organisms. Unfortunately, such a system currently does not exist.
SNPs are the largest resource of genetic variation (Sherry et al. 2001) and there are growing protein structure databases (Berman et al. 2000). The concurrent outgrowth of genetic variation and protein structural data has motivated the integration of coding SNPs into protein structure (Chasman and Adams 2001; Sunyaev, Lathe, and Bork 2001a; Sunyaev et al. 2001b; Wang and Moult 2001; Ramensky et al. 2002; Hughes et al. 2003; Krishnan and Westhead 2003; Yue, Li, and Moult 2005; Yue and Moult 2006). These techniques attempt to associate genetic variation of SNPs with phenotypic consequences of abnormal protein structural changes and diseases in order to predict which nsSNPs are likely to affect protein function. Although these studies connect genotypic changes with phenotype, they typically do not provide a population genetic interpretation about differences in relative fitnesses among alleles. Over the last half century, phylogenetic inference with interspecific sequence data has experienced steady methodological advance. At the cornerstone of the progress are models of molecular evolution. Recently, protein tertiary structure information has become much more abundant. This benefits our knowledge about protein-coding evolution (e.g., Marsden et al. 2006). Despite the surge of molecular phenotype information (e.g., protein tertiary structure) and the advance of genotype-phenotype mapping techniques (e.g., protein structure prediction), many models of molecular evolution treat sequence changes without considering their phenotypic consequences. Therefore, many models treat natural selection superficially.
Key questions concerning pertussis are the origin of the disease, the forces that have driven the shifts in B. pertussis populations, and the role of these shifts in the resurgence of pertussis. To address these questions, we have determined the global population structure of B. pertussis by whole-genome sequencing of 343 strains from 19 countries isolated between 1920 and 2010. Phylogenetic analysis revealed a deep divergence between two lineages of B. pertussis, possibly suggesting two independent introductions of the organism from an unknown reservoir. Bayesian methods showed that the date of the common ancestor of one of these lineages correlates with the first descriptions of pertussis in Europe and that this lineage has increased in diversity subsequent to the introduction of vaccination. Our analyses revealed that many (putative) adaptive mutations occurred in the period in which the WCV was used, suggesting that vaccination was the major force driving changes in B. pertussis populations. Furthermore, we extend our previous observation that the mutation leading to the ptxP3 allele occurred once and that the ptxP3 strains have spread and diversified worldwide (29). Finally, we identified novel putative adaptive loci, the analysis of which may cast new light on the persistence and resurgence of pertussis and point to ways to increase the effectiveness of vaccination.