phylogenetic inference

Top PDFs on phylogenetic inference:

An Automated Framework for Fast Cognate Detection and Bayesian Phylogenetic Inference in Computational Historical Linguistics

All the data for training and testing was taken from publicly available sources and has further been submitted along with the supplementary material accompanying this paper. For training of the parameters of our BipSkip approach for fast cognate detection, the data by List (2014) was used in the form provided by List et al. (2017). This dataset consists of six subsets, each covering a subgroup of a language family of moderate size and time depth (see SI 2). To test the BipSkip method, we used both the test set of List et al. (2017), consisting of six distinct datasets of moderate size, and five large datasets from five different language families (Austronesian, Austro-Asiatic, Indo-European, Pama-Nyungan, and Sino-Tibetan) used for the study by Rama et al. (2018) on the potential of automatic cognate detection methods for the purpose of phylogenetic reconstruction. The latter dataset was also used to test the MAPLE approach for phylogenetic inference. The other two datasets could not be used for the phylogenetic inference task, since they contain a large number of largely unresolved dialect varieties for which no expert classifications are currently available. More information on all datasets is given in Table 2.

Phylogenetic inference from homologous sequence data: minimum topological assumption, strict mutational compatibility consensus tree as the ultimate solution

…that the apparent homoplasy on the tree is minimal, and b) recognition of incompatible characters and their exclusion from the phylogenetic analysis [7]. Both approaches regard homoplasy as a disturbing, adverse phenomenon. Proponents of the first approach argue that excluding homoplastic characters discards information that remains informative despite its imperfection. Such a statement is much like claiming that listening to noise mixed with a pure melody contributes to the artistic value of the melody. As stated by Page and Holmes [8], "homoplasy is a poor indicator of evolutionary relationships, because similarity does not reflect shared ancestry." Since incompatible characters are irrelevant and add no apparent benefit while imposing a substantial obstacle to phylogenetic inference, their exclusion from at least the primary genealogical analysis is fully justified. Once a regular tree has been formed, irregular markers can be used to elucidate the natural history of reversions, parallelisms and also recombinations. Thus, reverse mutations form paraphyletic groups rooted by the marker that has subsequently reverted, whereas parallel mutations form polyphyletic groups, each monophyletic subgroup of which is rooted by the same marker. Recombinations usually cause multiple parallelisms or multiple reversions within a single element, whereby markers placed between these are always younger, having formed after the moment of recombination. Hence, most recombinations can also be detected on the resulting tree. A detailed…
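
The recognition of incompatible characters can be illustrated with the classic four-gamete test for binary characters: two characters fit a common tree if and only if at most three of the four state combinations 00, 01, 10 and 11 occur among the taxa. The following is a minimal sketch under that assumption; the greedy exclusion rule and the function names are hypothetical illustrations, not the procedure of the cited paper.

```python
from itertools import combinations


def compatible(a, b):
    """Four-gamete test. a and b are equal-length sequences of 0/1 states,
    one entry per taxon; they are incompatible only if all four pairs occur."""
    return len(set(zip(a, b))) < 4


def drop_incompatible(chars):
    """Hypothetical greedy filter: repeatedly drop the character involved in
    the most pairwise conflicts until the remaining ones are all pairwise
    compatible. Returns the indices of the retained characters."""
    kept = list(range(len(chars)))
    while True:
        conflicts = {i: 0 for i in kept}
        for i, j in combinations(kept, 2):
            if not compatible(chars[i], chars[j]):
                conflicts[i] += 1
                conflicts[j] += 1
        worst = max(kept, key=lambda i: conflicts[i], default=None)
        if worst is None or conflicts[worst] == 0:
            return kept
        kept.remove(worst)


# Taxa A, B, C, D; the third character (grouping A with C) conflicts with the others.
characters = [(1, 1, 0, 0), (1, 1, 1, 0), (1, 0, 1, 0), (1, 1, 0, 0)]
print(drop_incompatible(characters))  # -> [0, 1, 3]
```

For binary characters, pairwise compatibility guarantees that the whole retained set fits a single tree, but the greedy rule does not guarantee the largest such set; finding that is a maximum-clique problem.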

Continuous Space Representations of Linguistic Typology and their Application to Phylogenetic Inference

Figure 2 shows the case of the Austroasiatic languages. In the original, categorical representations, the mixtures of two languages form a deep valley (i.e., typologically unnatural intermediate states). By contrast, the continuous space representations allow a language to change into another without harming typological naturalness. This indicates that in the continuous space, we can easily reconstruct typologically natural ancestors. The major feature changes include “postpositional” to “prepositional” (0.46–0.47), “strongly suffixing” to “little affixation” (0.53–0.54) and “SOV” to “SVO” (0.60–0.61).

Current Methods for Automated Filtering of Multiple Sequence Alignments Frequently Worsen Single-Gene Phylogenetic Inference

Abstract.—Phylogenetic inference is generally performed on the basis of multiple sequence alignments (MSA). Because errors in an alignment can lead to errors in tree estimation, there is a strong interest in identifying and removing unreliable parts of the alignment. In recent years several automated filtering approaches have been proposed, but despite their popularity, a systematic and comprehensive comparison of different alignment filtering methods on real data has been lacking. Here, we extend and apply recently introduced phylogenetic tests of alignment accuracy on a large number of gene families and contrast the performance of unfiltered versus filtered alignments in the context of single-gene phylogeny reconstruction. Based on multiple genome-wide empirical and simulated data sets, we show that the trees obtained from filtered MSAs are on average worse than those obtained from unfiltered MSAs. Furthermore, alignment filtering often leads to an increase in the proportion of well-supported branches that are actually wrong. We confirm that our findings hold for a wide range of parameters and methods. Although our results suggest that light filtering (up to 20% of alignment positions) has little impact on tree accuracy and may save some computation time, contrary to widespread practice, we do not generally recommend the use of current alignment filtering methods for phylogenetic inference. By providing a way to rigorously and systematically measure the impact of filtering on alignments, the methodology set forth here will guide the development of better filtering algorithms. [alignment filtering; alignment trimming; molecular phylogeny; multiple sequence alignment; phylogeny; phylogenetic inference; phylogenetics]
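
To make "filtering" concrete, the following is a minimal sketch of one naive criterion: drop alignment columns whose gap fraction exceeds a threshold. It is an illustrative assumption, not any of the filtering methods evaluated in the study, and the function name and `max_gap_fraction` parameter are hypothetical.

```python
def filter_gappy_columns(alignment, max_gap_fraction=0.5, gap="-"):
    """Hypothetical gap-based filter. alignment maps taxon name -> aligned
    sequence (all the same length, at least one taxon). Columns whose gap
    fraction exceeds max_gap_fraction are removed.
    Returns (filtered alignment, fraction of columns removed)."""
    names = list(alignment)
    seqs = [alignment[n] for n in names]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must all have the same aligned length")
    keep = [
        col for col in range(length)
        if sum(s[col] == gap for s in seqs) / len(seqs) <= max_gap_fraction
    ]
    filtered = {n: "".join(s[c] for c in keep) for n, s in zip(names, seqs)}
    removed = 1 - len(keep) / length if length else 0.0
    return filtered, removed


aln = {"a": "AC-GT", "b": "AC-GT", "c": "A--GT", "d": "ACTG-"}
filtered, removed = filter_gappy_columns(aln)
print(filtered)  # the third column (75% gaps) is dropped
```

The returned fraction makes it easy to check whether a setting stays within the light-filtering range (up to 20% of alignment positions) that the abstract reports as having little impact on tree accuracy.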

Trial by phylogenetics - Evaluating the Multi-Species Coalescent for phylogenetic inference on taxa with high levels of paralogy (Gonyaulacales, Dinophyceae)

Figure 1. Maximum likelihood phylogenetic inference of ribosomal DNA genes. Concatenation of small subunit rDNA and D1-D3 region large subunit rDNA. Accession numbers for concatenated genes in Table S3. Gonyaulacales (n=16) in purple, outgroups (n=3) in light blue and taxa incertae sedis (n=1) in teal. The topology was rerooted on the branch separating the outgroup taxa from the Gonyaulacales. The scale represents the expected number of substitutions per site.

Fast phylogenetic inference from typing data

Background: Microbial typing methods are commonly used to study the relatedness of bacterial strains. Sequence-based typing methods are a gold standard for epidemiological surveillance due to the inherent portability of sequence and allelic profile data, fast analysis times and their capacity to create common nomenclatures for strains or clones. This has led to the development of several novel methods and to databases being made available for many microbial species. With the mainstream use of High Throughput Sequencing, these databases now accumulate huge amounts of data, storing thousands of distinct profiles. On the other hand, computing genetic evolutionary distances among a set of typing profiles or taxa dominates the running time of many phylogenetic inference methods. Note also that most definitions of genetic evolutionary distance rely, even if indirectly, on computing the pairwise Hamming distance between sequences or profiles.
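
As an illustration of that last point, the following is a minimal sketch of a pairwise Hamming distance matrix over allelic profiles, assuming each profile is a tuple of allele numbers at the same ordered loci (MLST-style; the example profiles are made up). The n(n-1)/2 comparisons are what make this step dominate running time for thousands of profiles.

```python
from itertools import combinations


def hamming(p, q):
    """Number of loci at which two equal-length allelic profiles differ."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same loci")
    return sum(a != b for a, b in zip(p, q))


def distance_matrix(profiles):
    """Symmetric matrix of pairwise Hamming distances (n*(n-1)/2 comparisons)."""
    n = len(profiles)
    d = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = hamming(profiles[i], profiles[j])
    return d


# Hypothetical seven-locus profiles.
profiles = [(1, 3, 1, 1, 1, 1, 3), (1, 3, 1, 1, 1, 1, 4), (2, 3, 4, 1, 1, 1, 4)]
print(distance_matrix(profiles))  # -> [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
```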

Sporogony of four Haemoproteus species (Haemosporida: Haemoproteidae), with report of in vitro ookinetes of Haemoproteus hirundinis: phylogenetic inference indicates patterns of haemosporidian parasite ookinete development

Background: Haemoproteus (Parahaemoproteus) species (Haemoproteidae) are widespread blood parasites that can cause disease in birds, but information about their vector species, sporogonic development and transmission remains fragmentary. This study aimed to investigate the complete sporogonic development of four Haemoproteus species in Culicoides nubeculosus and to test whether phylogenies based on the cytochrome b gene (cytb) reflect patterns of ookinete development in haemosporidian parasites. Additionally, one cytb lineage of Haemoproteus was identified to the species level, and the in vitro gametogenesis and ookinete development of Haemoproteus hirundinis were characterised. Methods: Laboratory-reared C. nubeculosus were exposed by allowing them to take blood meals on naturally infected birds harbouring single infections of Haemoproteus belopolskyi (cytb lineage hHIICT1), Haemoproteus hirundinis (hDELURB2), Haemoproteus nucleocondensus (hGRW01) and Haemoproteus lanii (hRB1). Infected insects were dissected at intervals in order to detect sporogonic stages. In vitro exflagellation, gametogenesis and ookinete development of H. hirundinis were also investigated. Microscopic examination and PCR-based methods were used to confirm species identity. Bayesian phylogenetic inference was applied to study the relationships among Haemoproteus lineages.

Use of Endogenous Retroviral Sequences (ERVs) and structural markers for retroviral phylogenetic inference and taxonomy

The origin of retroviruses is lost in a prebiotic mist. Assuming a neutral substitution rate of 0.2% per million years [6] and a 50% divergence limit for nucleotide sequence recognition, retroviral sequences more than 250 million years old cannot be found in current genomes. If any of their genes are selected for, they may stay recognizable longer. Thus, although the ERV record has limitations, the reconstruction of retrovirus evolution differs fundamentally from that of other viruses, owing to the ERVs in the ever richer archive of genomic assemblies. According to the VIIth ICTV report [7], Retroviridae borders on Pararetroviridae (e.g. Hepatitis B), Metaviridae (Gypsy-like) and Pseudoviridae (Copia-like). Together with the even more distant relatives Mal-R [8], DIRS [9] retrotransposons and chromoviruses [10], not included here, they show that retroviruses are part of a vast retrotransposon sequence universe. In this work, we concentrated on retroviruses. An ancestral retrovirus likely had structural traits which at present are common denominators of the diverse related sequences. Although some structural traits may be absent in individual viruses, readily identifiable common denominators are 5'LTR, PBS, Gag (MA, CA and NC), Pro, Pol, Env, PPT and 3'LTR [11]. The most universal trait is the pol gene, with its reverse transcriptase (RT), RNAse H and integrase (IN). Other conserved but distinguishing traits whose use in phylogenetic inference and retroviral classification is discussed here are: nucleotide bias, number of zinc fingers, translational strategy, C-terminal Pro and Pol motifs, presence of dUTPase and accessory genes, and LTR length. Env is an unreliable evolutionary marker, as exemplified by the hybrid betaretrovirus MPMV [11], but can be useful in narrow phylogenies to demarcate a specific group.
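
The 250-million-year limit follows from dividing the assumed recognition limit by the assumed neutral rate; like the text, this simple calculation ignores multiple substitutions at the same site:

```latex
t_{\max} = \frac{0.50}{0.002\ \text{Myr}^{-1}} = 250\ \text{Myr}
```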

Phylogenetic inference of calyptrates, with the first mitogenomes for Gasterophilinae (Diptera: Oestridae) and Paramacronychiinae (Diptera: Sarcophagidae)

We are grateful to Drs. Jie Liu (Institute of Zoology, Chinese Academy of Sciences) and Eliana Buenaventura (Natural History Museum of Denmark, University of Copenhagen), Prof. Aibing Zhang and Miss Jie Qin (Capital Normal University), who gave us invaluable help during this study. We wish to extend our sincerest thanks to Mr. Zhe Zhao (Institute of Zoology, Chinese Academy of Sciences), Mr. Yuan Wang (Institute of Zoology, Chinese Academy of Sciences), and Dr. Xingyi Li (Third Military Medical University of Chinese P.L.A) for useful suggestions on phylogenetic analyses, and to Drs Kelly A. Meiklejohn (University of Florida, Gainesville, USA), Thomas J. Simonsen (Natural History Museum, London, UK) and Mark Eglar (University of Melbourne, Australia) for kindly providing valuable comments on the manuscript. We would also like to express our gratitude to the online information providers Diptera info (http://mail.diptera.info/news.php) and Discover Life (http://www.discoverlife.org) for specimen pictures in our graphical abstract. This study was supported by the Fundamental Research Funds for the Central Universities (No. JC2015-04), the Program for New Century Excellent Talents in University (No. NCET-12-0783), Beijing Higher Education Young Elite Teacher Project (No. YETP0771), and the National Science Foundation of China (No. 31201741), all to DZ, and the Carlsberg Foundation (No. 2012_01_0433) to TP.

Reconciling taxonomy and phylogenetic inference: formalism and algorithms for describing discord and inferring taxonomic roots

The original algorithm described in Theorem 1 and the branch and bound algorithm in Algorithm 1 have been implemented in the rppr binary of the pplacer suite of programs (http://matsen.fhcrc.org/pplacer). The code is written in OCaml [12], an appropriate choice as it has O(log n) immutable set operations in the standard library. The input can either be a “reference package” containing both taxonomic and phylogenetic information, or simply a phylogenetic tree along with a comma-separated value file specifying the color assignments. Our implementation has been validated using an independent “brute-force” implementation in Python; the two codes return identical results on a testing corpus consisting of all colorings on all trees of three to eight leaves with up to six colors. These trees and results can be downloaded at http://matsen.fhcrc.org/pplacer/data/convexify-validation.tar.gz. The algorithm is invoked via a single command-line call, which outputs a list of uncolored taxa for every nonconvex taxonomic rank and displays them on a taxonomically labeled tree by highlighting them in red.
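
The notion of a convex coloring can be made concrete with a small brute-force check. The following is our own sketch, not the authors' Python validation code or the rppr implementation. It assumes the standard characterization that a leaf coloring is convex if and only if the minimal subtrees spanning the leaves of each color are pairwise vertex-disjoint, and it represents an unrooted tree as a hypothetical adjacency dictionary.

```python
from collections import deque
from itertools import combinations


def path(tree, u, v):
    """Set of vertices on the unique u-v path; tree maps vertex -> neighbours."""
    parent = {u: None}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            break
        for y in tree[x]:
            if y not in parent:
                parent[y] = x
                queue.append(y)
    nodes = set()
    while v is not None:  # walk back from v to u
        nodes.add(v)
        v = parent[v]
    return nodes


def carrier(tree, leaves):
    """Minimal subtree spanning `leaves`: union of paths from one leaf to the rest."""
    first, *rest = leaves
    span = {first}
    for leaf in rest:
        span |= path(tree, first, leaf)
    return span


def is_convex(tree, coloring):
    """coloring maps leaf -> color; convex iff color carriers are pairwise disjoint."""
    by_color = {}
    for leaf, color in coloring.items():
        by_color.setdefault(color, []).append(leaf)
    carriers = [carrier(tree, leaves) for leaves in by_color.values()]
    return all(a.isdisjoint(b) for a, b in combinations(carriers, 2))


# Quartet ((A,B),(C,D)) with internal vertices x and y.
quartet = {"A": ["x"], "B": ["x"], "x": ["A", "B", "y"],
           "y": ["x", "C", "D"], "C": ["y"], "D": ["y"]}
print(is_convex(quartet, {"A": "red", "B": "red", "C": "blue", "D": "blue"}))  # -> True
print(is_convex(quartet, {"A": "red", "C": "red", "B": "blue", "D": "blue"}))  # -> False
```

Exhaustive checks like this are only practical for small trees, which matches the three-to-eight-leaf validation corpus described above.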

Analysis of kinetoplast cytochrome b gene of 16 Leishmania isolates from different foci of China: different species of Leishmania in China and their phylogenetic inference

Phylogenetic hypotheses of Leishmania were generated from cyt b kDNA segments using two commonly applied phylogenetic techniques: heuristic searches using maximum parsimony (MP) analyses performed with the program PAUP*, and Bayesian inference (BI) using the program MrBayes v.3.2 [46]. In both MP and BI analyses, gaps were treated as missing data. For heuristic searches under parsimony, invariant characters were removed from the dataset. Each search involved 10 random-addition replicates, with one tree held at each step, tree bisection and reconnection (TBR) branch swapping, steepest descent on, and a maximum of 10,000 saved trees. Nonparametric bootstrapping was used to generate phylogeny confidence values [47], with 1,000 pseudoreplicates using a heuristic tree search for each pseudoreplicate. Trypanosoma brucei (M94286) was used to root the trees.
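
The score that a parsimony search tries to minimize can be illustrated with Fitch's small-parsimony algorithm, which counts the minimum number of state changes each alignment column requires on a fixed rooted binary tree. This is a minimal sketch for illustration only: it does not reproduce PAUP*'s heuristic search (random addition, TBR swapping), and the nested-tuple tree format, function names and example data are hypothetical. Following the settings above, a gap is treated as missing data, i.e. compatible with any nucleotide.

```python
ALPHABET = frozenset("ACGT")


def fitch(tree, states):
    """Return (candidate state set, number of changes) for one character.
    tree is a taxon name (leaf) or a 2-tuple of subtrees; states maps
    taxon -> nucleotide, with '-' or '?' treated as missing (any state)."""
    if not isinstance(tree, tuple):
        s = states[tree]
        return (ALPHABET if s in "-?" else frozenset(s)), 0
    left, right = tree
    ls, lc = fitch(left, states)
    rs, rc = fitch(right, states)
    common = ls & rs
    if common:
        return common, lc + rc
    return ls | rs, lc + rc + 1  # disjoint sets cost one extra change


def parsimony_length(tree, alignment):
    """Tree length: Fitch change counts summed over all alignment columns."""
    taxa = list(alignment)
    ncols = len(alignment[taxa[0]])
    return sum(
        fitch(tree, {t: alignment[t][col] for t in taxa})[1]
        for col in range(ncols)
    )


aln = {"L1": "ACGT", "L2": "ACGA", "L3": "GCTA", "T": "GC-A"}
print(parsimony_length((("L1", "L2"), ("L3", "T")), aln))  # -> 3
print(parsimony_length((("L1", "L3"), ("L2", "T")), aln))  # -> 4
```

Invariant columns always contribute zero changes, which is why removing them, as described above, does not change which tree is most parsimonious.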

From alignment of etymological data to phylogenetic inference via population genetics

Recently, the mathematical theory of statistical physics has been shown to unite stochastic models of evolution in seemingly diverse fields, such as population genetics, ecology and linguistics (Blythe and McKane, 2007; Blythe, 2009; Baxter et al., 2009; Vázquez et al., 2010). However, statistical inference about language evolution under such models is complicated by the practically intractable form of the likelihoods for even a moderate set of languages. This calls for novel approaches to the probabilistic evaluation of any particular phylogenetic model and to learning the most plausible genealogies from data. In the context of population genetics, such an approach is introduced in (Sirén et al., 2011; Sirén et al.,…

A MULTI-GENE MOLECULAR SYSTEMATIC STUDY OF THE KICKXELLOMYCOTINA, INCLUDING THE EXAMINATION OF TWO NEW GENES (MCM7 AND TSR1) FOR PHYLOGENETIC INFERENCE

Although Aguileta et al. (2008) demonstrated the utility and power of these two genes for phylogenetic analysis, neither primer sequences nor PCR protocols were provided. Schmitt et al. (2009) aligned amino acid sequences (from GenBank) to design new degenerate primers to amplify regions of both MCM7 and TSR1. With these primers, they were able to sequence MCM7 and TSR1 for 42 species of lichenized ascomycetes. The resulting phylogeny was well-resolved and demonstrated the potential use of these genes for other taxa. Raja et al. (2011) performed additional testing of MCM7 among the Ascomycota and found that it resolved relationships more strongly than the ribosomal large subunit (LSU), one of the most commonly used genes within the ascomycetes. Morgenstern et al. (2012) generated a phylogeny using MCM7 sequences from genome-sequenced fungi, which included some early-diverging taxa. Hermet et al. (2012) utilized both MCM7 and TSR1 in a study of Mucor, demonstrating the potential utility of the MCM7 and TSR1 genes outside of the Dikarya. Despite the apparent phylogenetic potential beyond the Mucorales (Hermet et al. 2012), these genes have not yet been

EVOLUTIONARY CHANGE OF RESTRICTION CLEAVAGE SITES AND PHYLOGENETIC INFERENCE

…developed for the probability of having a particular pattern of site changes among evolutionary lineages, such as parallel gains or losses of sites, and for inferring th…

Robust Entity Clustering via Phylogenetic Inference

Our primary contribution consists of new modeling ideas, and associated inference techniques, for the problem of cross-document coreference resolution. We have described how writers systematically plunder (φ) and then systematically modify (θ) the work of past writers. Inference under such models could also play a role in tracking evolving memes and social influence, not merely in establishing strict coreference. Our model also provides an alternative to the distance-dependent CRP.

Protein structure, distribution of homoplasy and phylogenetic inference

A close look at the genetic code structure shows an asymmetry of nucleotide substitutions when there is a change of residue hydrophobicity and volume, and current molecular model of evolu…

Algebraic Geometry of Phylogenetic Models.

The goal of phylogenetic inference is to find a tree that captures the evolutionary relationships between species. However, as referenced in Section 1.1, various biological phenomena confound this effort. Individual genes may actually conform to different phylogenetic trees, telling conflicting stories about the species in which they reside. The result is that a model-based approach on a single tree may be doomed to fail. Suppose for example that one has aligned DNA sequences and that a certain portion of the sequences evolved according to a model on one tree and a different portion independently according to a model on another. Then the observed distribution on the n-tuples of DNA bases is unlikely to belong to either model. Instead, the observed distribution would be a weighted sum of two distributions, one from each model, where the weighting is according to the proportion of DNA that evolved according to each. Geometrically, the observed distribution would lie on a line between two probability distributions, one from each model.
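
In symbols (notation ours, not the source's): write p^(1) and p^(2) for the distributions of site patterns under the models on the two trees, and w for the proportion of sites that evolved on the first tree. The observed distribution described above is then the mixture

```latex
p(x) = w\,p^{(1)}(x) + (1 - w)\,p^{(2)}(x),
\qquad x \in \{\mathrm{A},\mathrm{C},\mathrm{G},\mathrm{T}\}^{n},
\qquad 0 \le w \le 1 .
```

As w varies from 0 to 1, p moves along the line segment joining p^(2) and p^(1) in the probability simplex over the 4^n possible site patterns.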

The importance of phylogenetic model assessment for macroevolutionary inference

Chapters 2 to 5 in this thesis address two assumptions that could be made when using phylogenetic timescale estimates to study macroevolution. The first is that speciation and extinction are the primary drivers of species richness. While phylogenetic estimates can provide some insight about macroevolution, it is becoming increasingly apparent that dispersal can be a primary driver of local species richness (e.g. Wiens & Donoghue 2004). Chapter 2 uses phylogenetic inferences to assess the importance of dispersal in driving species richness across latitudes. The second assumption that is common in studies of macroevolution is that the methods for phylogenetic inference provide reliable estimates of divergence times. Critically, research in macroevolution is becoming increasingly reliant on the accurate estimation of evolutionary divergence times using phylogenetics. Meanwhile, there are several potential sources of bias to these estimates that have not been investigated in depth. Some sources of biased estimates of divergence times might arise from macroevolutionary processes themselves, which is the subject of chapters 3, 4, and 5.

Bayesian inference of ancestral dates on bacterial phylogenetic trees

Here, we present a new methodology called BactDating for analyzing dated genetic data in order to estimate evolutionary rates and dated phylogenies in bacterial populations. We use a Bayesian framework for inference as in BEAST, but consider that phylogenetic relationships have been assessed in a previous step, as in the optimization and maximum likelihood methods described above. This way we enjoy the benefits of Bayesian inference in ancestor dating (22), such as assessment of uncertainties and flexibility of model choice and comparison, but with computational scalability and speed comparable to the optimization methods described above. Furthermore, we explore the specific problems posed by application in bacterial genomics, and in particular the disruptive effect that homologous recombination can have on estimates of the temporal signal (23,24). Recombination is well known to disrupt phylogenetic inference, and especially to distort branch-length estimates so that trees look star-like with abnormally long terminal branches (23,25,26). To account for this, sites detected as recombinant are sometimes removed prior to running BEAST, but this approach is inefficient and can even exacerbate the problem (23). A more principled method is implemented in the Bacter package (27), which incorporates the ClonalOrigin model of bacterial recombination (28) within BEAST2 (5), but this approach is too computationally intensive to be applicable to large genomic datasets. Instead, we show how the effect of recombination can be accounted for in the dating of ancestral nodes by exploiting a scalable phylogenetic method that accounts for bacterial recombination, such as ClonalFrameML (29) or Gubbins (30).
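
BactDating's Bayesian model is not reproduced here. As a simpler, swapped-in illustration of how dated tips carry a temporal signal, the following sketch fits an ordinary least-squares root-to-tip regression: the slope estimates the substitution rate, and the x-intercept estimates the date of the root. The function name and the example numbers are hypothetical.

```python
def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance against sampling date.
    Returns (rate, root_date, r_squared): rate is the slope (substitutions
    per site per unit time); root_date is where the fitted distance is zero."""
    n = len(dates)
    if n != len(distances) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    syy = sum((d - mean_d) ** 2 for d in distances)
    if sxx == 0:
        raise ValueError("sampling dates must vary")
    rate = sxy / sxx
    intercept = mean_d - rate * mean_t
    root_date = -intercept / rate if rate != 0 else float("nan")
    r_squared = sxy ** 2 / (sxx * syy) if syy else float("nan")
    return rate, root_date, r_squared


# Hypothetical tips sampled 2000-2015 on a tree with a strict clock.
rate, root_date, r2 = root_to_tip_regression(
    [2000, 2005, 2010, 2015], [0.010, 0.015, 0.020, 0.025])
print(rate, root_date, r2)  # approximately 0.001, 1990 and 1.0
```

Recombination inflates terminal branches, as noted above, and so adds scatter to such a regression; this is one reason the text recommends accounting for recombination before dating.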

Refuting phylogenetic relationships

The sequences of 34 genes identified as core genes in [12] were retrieved using the program "retrieve sequences" in all analyses of the Neurogadgets website [38] (option "Reciprocal best match in other genomes" using a GI number). The 34 genes were: argS, infB, pheS, rplN, secY, dnaG, ksgA, proS, rpsB, serS, dnaX, leuS, rplA, rpsC, thrS, fusA, lysS, rplC, rpsD, trpS, gcp, metG, rplE, rpsG, valS, gltX, nusA, rplF, rpsH, ychF, hisS, nusG, rplK, rpsM. For all these markers, we produced a careful alignment and preliminary phylogenetic analyses (NJ) to check the sequence orthology. We subsequently excluded from the files all the instances of species harboring multiple copies of each gene and obtained a set of 34 files with 135 shared species. Maximum Likelihood analyses, using Phyml [13], were conducted on these data to ensure that the monophyly of the main groups under study was supported. As it appeared that the homology of archaeal and bacterial sequences in lysS was doubtful, and that pheS and proS presented either hidden paralogy problems or, more likely, ancient LGTs between the two prokaryotic domains, these 3 markers were removed for the rest of our study. We used a selection of 43 common species representative of 8 major prokaryotic groups in the 31 remaining markers, for further in-depth phylogenetic analyses to be presented here and elsewhere. The groups tested here were: the Archaea (Halobacterium sp., Pyrococcus abyssi, Archaeoglobus fulgidus, Methanosarcina acetivorans, Thermoplasma volcanium, Pyrobaculum aerophilum, Aeropyrum pernix, Sulfolobus solfataricus), the Spirochaetes (Borrelia burgdorferi,
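
The "Reciprocal best match" option refers to a common orthology heuristic: gene a from genome X and gene b from genome Y are paired only if b is a's best-scoring hit in Y and a is b's best-scoring hit in X. The following is a minimal sketch assuming similarity scores (for example, BLAST bit scores) have already been computed. It is not the Neurogadgets implementation, and the gene names and scores are made up.

```python
def best_hits(scores):
    """scores maps query -> {subject: similarity}; higher is better.
    Returns query -> best-scoring subject (ties go to the first listed);
    queries without any hits are skipped."""
    return {q: max(hits, key=hits.get) for q, hits in scores.items() if hits}


def reciprocal_best_hits(x_vs_y, y_vs_x):
    """Sorted pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_xy = best_hits(x_vs_y)
    best_yx = best_hits(y_vs_x)
    return sorted((a, b) for a, b in best_xy.items() if best_yx.get(b) == a)


x_vs_y = {"rplA_x": {"rplA_y": 250.0, "rplC_y": 40.0},
          "lysS_x": {"lysS_y": 90.0, "lysU_y": 120.0},
          "lysU_x": {"lysU_y": 210.0}}
y_vs_x = {"rplA_y": {"rplA_x": 248.0},
          "lysS_y": {"lysS_x": 91.0},
          "lysU_y": {"lysS_x": 118.0, "lysU_x": 200.0}}
print(reciprocal_best_hits(x_vs_y, y_vs_x))
# -> [('lysU_x', 'lysU_y'), ('rplA_x', 'rplA_y')]
```

In this toy example, lysS_x receives no ortholog call because its best hit is a paralog that prefers another gene. This is the kind of case the preliminary NJ analyses described above are meant to catch.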