6 PART IV: EVOLUTIONARY ANALYSIS
6.3 Constructing Phylogenetic Trees
6.3.5 Statistical Tests of a Tree Obtained
General Comments on Statistical Tests
There are three different types of methods for testing the reliability of an obtained tree. The first tests the topological difference between the tree and a closely related tree by using a certain quantity, for example, the sum of all branch lengths in the minimum evolution method. This type of test examines the reliability of every interior branch of the tree, and is generally conservative compared with the other tests included in MEGA.
The second type of test examines whether the length of each interior branch is significantly different from 0. If a particular interior branch is not significantly different from 0, we cannot exclude the possibility of a trifurcation of the associated branches, or that other bifurcating trees can be generated by changing the splitting order of the three branches involved. For this test, MEGA implements the bootstrap procedure for estimating the standard error of the interior branch length and tests the deviation of the branch length from 0 (Dopazo 1994).
The third type of test is the bootstrap test, in which the reliability of a given branch pattern is ascertained by examining the frequency of its occurrence in a large number of trees, each based on the re-sampled dataset.
Details of these procedures are given in Nei and Kumar (2000, chapter 9).
Condensed Trees
When several interior branches of a phylogenetic tree have low statistical support (PC or PB) values, it is often useful to produce a multi-furcating tree by assuming that all such interior branches have a length equal to 0. We call this multi-furcating tree a condensed tree. In MEGA, condensed trees can be produced for any level of PC or PB value. For example, if there are several branches with PC or PB values of less than 50%, a condensed tree at the 50% PC or PB level will be a multi-furcating tree with the lengths of all those branches reduced to 0.
Since branches of low significance are eliminated to form a condensed tree, this tree emphasizes the reliable portions of the branching pattern. However, it has one drawback: because some branches are reduced to 0, it is difficult to draw the remaining portion of the tree with proper branch lengths. Therefore, MEGA gives attention only to the topology, and the branch lengths of a condensed tree in MEGA are not proportional to the number of nucleotide or amino acid substitutions.
Note that, although they may look similar, condensed trees are different from the consensus trees mentioned earlier. A consensus tree is produced from many equally parsimonious trees, whereas a condensed tree is merely a simplified version of a tree. A condensed tree can be produced for any type of tree (NJ, ME, UPGMA, MP, or maximum-likelihood tree).
See also Nei and Kumar (2000) page 175.
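The collapsing step described above can be sketched in a few lines. This is only an illustration, not MEGA's implementation: the tree representation (a leaf is a string, an interior node is a `(support, children)` tuple, and the root carries `None` as support) and the function name `condense` are assumptions made for this example.

```python
def condense(node, cutoff):
    """Return a copy of the tree in which every interior branch whose
    support (PC or PB, in percent) is below `cutoff` has been collapsed.

    Hypothetical representation used only in this sketch:
      leaf          -> str (taxon name)
      interior node -> (support, [children]); the root uses support None
    """
    if isinstance(node, str):
        return node
    support, children = node
    new_children = []
    for child in children:
        child = condense(child, cutoff)
        is_interior = not isinstance(child, str)
        if is_interior and child[0] is not None and child[0] < cutoff:
            # Branch length treated as 0: attach the grandchildren directly
            # to this node, which creates a multi-furcation.
            new_children.extend(child[1])
        else:
            new_children.append(child)
    return (support, new_children)


tree = (None, [(95, ["A", "B"]), (40, ["C", (80, ["D", "E"])]), "F"])
print(condense(tree, 50))
# (None, [(95, ['A', 'B']), 'C', (80, ['D', 'E']), 'F'])
```

Only the topology is kept in the result, which mirrors the point made above that a condensed tree carries no meaningful branch lengths.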
Interior Branch Tests
Interior Branch Test of Phylogeny
Phylogeny | Interior Branch Test of Phylogeny
A t-test, computed using the bootstrap procedure, is constructed from the interior branch length and its standard error; it is available only for NJ and Minimum Evolution trees. MEGA shows the confidence probability in the Tree Explorer; if this value is greater than 95% for a given branch, then the inferred length for that branch is considered significantly positive.
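As a rough illustration of how a confidence probability can be derived from a branch length and its bootstrap standard error, the sketch below uses a two-sided normal approximation. That choice of reference distribution, and the function names `cp_from_z` and `interior_branch_cp`, are assumptions made for this example; the exact statistic MEGA reports may differ.

```python
import math
import statistics


def cp_from_z(z):
    """Confidence probability for a standardized branch length.

    Assumption for this sketch: two-sided normal approximation,
    CP = 1 - 2 * (1 - Phi(|z|)) = erf(|z| / sqrt(2)).
    """
    return math.erf(abs(z) / math.sqrt(2))


def interior_branch_cp(length, replicate_lengths):
    """Test an interior branch length against 0.

    `length` is the estimate from the original data; `replicate_lengths`
    are the estimates of the same branch from the bootstrap replicates.
    """
    # Bootstrap standard error: standard deviation of the replicate estimates.
    se = statistics.stdev(replicate_lengths)
    if se == 0:
        raise ValueError("standard error is 0; the test statistic is undefined")
    return cp_from_z(length / se)


replicates = [0.04, 0.05, 0.06, 0.05, 0.05]
print(interior_branch_cp(0.05, replicates) > 0.95)   # long branch, small SE
print(interior_branch_cp(0.005, replicates) > 0.95)  # short branch, same SE
```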
See Nei and Kumar (2000) (chapter 9) for further details.
Neighbor Joining (Construct Phylogeny)
Phylogeny | Construct Phylogeny | Neighbor-Joining…
This command is used to construct a neighbor-joining (NJ) tree (Saitou & Nei 1987). The NJ method is a simplified version of the minimum evolution (ME) method. The ME method uses distance measures that correct for multiple hits at the same sites, and chooses the topology showing the smallest value of the sum of all branch lengths (S) as an estimate of the correct tree. However, the construction of an ME tree is time-consuming because, in principle, the S values for all topologies have to be evaluated, and the number of possible topologies (un-rooted trees) rapidly increases with the number of taxa.
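To give a sense of how quickly the number of topologies grows, the short sketch below evaluates the standard count of un-rooted bifurcating trees for m taxa, (2m - 5)!! = 3 x 5 x ... x (2m - 5). The function name is chosen for this example only.

```python
def num_unrooted_topologies(m):
    """Number of distinct un-rooted bifurcating topologies for m taxa,
    given by the double factorial (2m - 5)!! for m >= 3."""
    if m < 3:
        raise ValueError("at least 3 taxa are required")
    count = 1
    for k in range(3, 2 * m - 4, 2):  # odd factors 3, 5, ..., 2m - 5
        count *= k
    return count


for m in (4, 5, 10, 20):
    print(m, num_unrooted_topologies(m))
```

Already at 10 taxa there are 2,027,025 topologies, which is why an exhaustive evaluation of S is impractical for larger data sets.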
With the NJ method, the S value is not computed for all or many topologies. The examination of different topologies is embedded in the algorithm, so that only one final tree is produced. This method does not require the assumption of a constant rate of evolution, so it produces an un-rooted tree. However, for ease of inspection, MEGA displays NJ trees in a manner similar to rooted trees. The algorithm of the NJ method is somewhat complicated and is explained in detail in Nei and Kumar (2000).
For constructing the NJ tree, MEGA may request that you specify the distance estimation method, subset of sites to include, and whether to conduct a test of the inferred tree through an Analysis Preferences dialog box.
Bootstrap Tests
Bootstrap Test of Phylogeny
Phylogeny | Bootstrap Test of Phylogeny
One of the most commonly used tests of the reliability of an inferred tree is Felsenstein's (1985) bootstrap test, which is evaluated using Efron's (1982) bootstrap re-sampling technique. If there are m sequences, each with n nucleotides (or codons or amino acids), a phylogenetic tree can be reconstructed using some tree building method. Next, n sites are randomly chosen with replacement from the original sequences, giving rise to m rows of n columns each. These now constitute a new set of sequences. A tree is then reconstructed with these new sequences using the same tree building method as before, and its topology is compared to that of the original tree. Each interior branch of the original tree that differs from the bootstrap tree in the sequences it partitions is given a score of 0; all other interior branches are given the value 1. This procedure of re-sampling the sites and the subsequent tree reconstruction is repeated several hundred times, and the percentage of times each interior branch is given a value of 1 is noted. This is known as the bootstrap value. As a general rule, if the bootstrap value for a given interior branch is 95% or higher, then the topology at that branch is considered "correct". See Nei and Kumar (2000) (chapter 9) for further details. This test is available for four different methods: Neighbor Joining, Minimum Evolution, Maximum Parsimony, and UPGMA.
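The procedure above can be outlined as follows. This is a minimal sketch, not MEGA's code: `build_splits` is a hypothetical callable standing in for whichever tree building method is used, and it is assumed to return the interior branches of the inferred tree as a set of frozensets, each holding the indices of the sequences on one (consistently chosen) side of the branch.

```python
import random


def resample_columns(alignment, rng):
    """Draw n site columns with replacement. The same drawn columns are
    applied to every sequence, so each site stays intact across rows."""
    n = len(alignment[0])
    cols = [rng.randrange(n) for _ in range(n)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


def bootstrap_values(alignment, build_splits, replicates=500, seed=1):
    """Percentage of bootstrap trees that contain each interior branch
    (partition of the sequences) of the original tree.

    `build_splits` is a hypothetical tree builder supplied by the caller.
    """
    rng = random.Random(seed)
    original = build_splits(alignment)
    hits = {split: 0 for split in original}
    for _ in range(replicates):
        replicate_splits = build_splits(resample_columns(alignment, rng))
        for split in original:
            # Score 1 if the same partition is present in the bootstrap tree,
            # 0 otherwise.
            if split in replicate_splits:
                hits[split] += 1
    return {split: 100.0 * h / replicates for split, h in hits.items()}
```

Passing the tree builder in as a parameter keeps the same method in use for the original data and for every replicate, as the test requires.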
Bootstrap method to compute standard error of distance estimates
When you choose the bootstrap method for estimating the standard error, you must specify the number of replicates and the seed for the pseudorandom number generator. In each bootstrap replicate, the desired quantity is estimated, and the standard deviation of these estimates is then computed (see Nei and Kumar [2000], page 25 for details). It is possible that in some bootstrap replicates the quantity you desire is not calculable for statistical or technical reasons. In these cases, MEGA will discard the results of those bootstrap replicates, and its final estimate will be based on the results of all valid replicates. This means that the number of bootstrap replicates used can be smaller than the number specified by the user. However, if the number of valid bootstrap replicates is < 25, then MEGA will report that the standard error cannot be computed (an "n/c" will appear in the result window).
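The handling of invalid replicates can be illustrated with a small sketch. It is not MEGA's implementation: the Jukes-Cantor distance is chosen here only because it has a well-known case in which it is not calculable (a proportion of differing sites of 0.75 or more), and the function names are assumptions made for this example. The threshold of 25 valid replicates and the "n/c" marker come from the text above.

```python
import math
import random
import statistics

MIN_VALID_REPLICATES = 25  # below this, the standard error is reported as "n/c"


def jc_distance(a, b):
    """Jukes-Cantor distance between two aligned sequences.
    Returns None when the distance is not calculable (p >= 0.75)."""
    p = sum(x != y for x, y in zip(a, b)) / len(a)
    if p >= 0.75:
        return None
    return -0.75 * math.log(1 - 4 * p / 3)


def bootstrap_se(a, b, replicates=500, seed=1):
    """Bootstrap standard error of the distance between sequences a and b."""
    rng = random.Random(seed)
    n = len(a)
    estimates = []
    for _ in range(replicates):
        cols = [rng.randrange(n) for _ in range(n)]  # sites drawn with replacement
        d = jc_distance([a[c] for c in cols], [b[c] for c in cols])
        if d is not None:  # discard replicates where the distance is undefined
            estimates.append(d)
    if len(estimates) < MIN_VALID_REPLICATES:
        return "n/c"
    # Standard error = standard deviation of the valid replicate estimates.
    return statistics.stdev(estimates)


print(bootstrap_se("ACGT" * 10, "ACGA" + "ACGT" * 9))  # a small positive number
print(bootstrap_se("AAAA", "CCCC"))                    # n/c
```

Fixing the seed makes the result reproducible, which is the reason the dialog asks for it alongside the number of replicates.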