6 PART IV: EVOLUTIONARY ANALYSIS
6.3 Constructing Phylogenetic Trees
6.3.4 Maximum Parsimony (MP) Method
Branch-and-Bound Algorithm
The branch-and-bound algorithm is used to find all the MP trees. It is guaranteed to find all of them without conducting an exhaustive search. In MEGA, this search uses the Max-mini branch-and-bound algorithm, which is described in detail in Kumar et al. (1993) and Nei and Kumar (2000, page 123).
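The general idea can be illustrated with a minimal Python sketch. It shows the plain branch-and-bound strategy, not MEGA's optimized Max-mini variant: taxa are added one at a time, and a partial tree is abandoned as soon as its length exceeds the best complete tree found so far. The function names (fitch_length, insertions, branch_and_bound) and the nested-tuple tree representation are assumptions made for this example only, and all sites are assumed to be free of gaps.

```python
from math import inf

def fitch_length(tree, seqs):
    """Parsimony length of a rooted binary tree given as nested 2-tuples of taxon names."""
    def site(node, i):
        if isinstance(node, str):             # leaf: its observed state
            return {seqs[node][i]}, 0
        (a, la), (b, lb) = site(node[0], i), site(node[1], i)
        common = a & b
        # An empty intersection means one extra substitution is required.
        return (common, la + lb) if common else (a | b, la + lb + 1)
    n_sites = len(next(iter(seqs.values())))
    return sum(site(tree, i)[1] for i in range(n_sites))

def insertions(tree, leaf):
    """Yield every tree formed by attaching `leaf` to one branch of `tree`."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for t in insertions(left, leaf):
            yield (t, right)
        for t in insertions(right, leaf):
            yield (left, t)

def branch_and_bound(seqs):
    """Return (minimum length, list of all trees with that length)."""
    names = sorted(seqs)
    best = {"length": inf, "trees": []}

    def extend(subtree, k):
        tree = (names[0], subtree)            # first taxon is fixed on the root branch
        length = fitch_length(tree, seqs)
        if length > best["length"]:
            return                            # bound: adding taxa cannot shorten a tree
        if k == len(names):                   # all taxa have been placed
            if length < best["length"]:
                best["length"], best["trees"] = length, []
            best["trees"].append(tree)
            return
        for bigger in insertions(subtree, names[k]):
            extend(bigger, k + 1)

    extend(names[1], 2)
    return best["length"], best["trees"]
```

For example, branch_and_bound({"A": "AAG", "B": "AAG", "C": "CCG", "D": "CCG"}) returns a minimum length of 2 and the single tree that groups C with D. The pruning step is safe because the length of a partial tree is a lower bound on the length of any complete tree built from it.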
Alignment Gaps and Sites with Missing Information
In MEGA, gaps are ignored in the MP analysis, but there are two different ways to treat sites that contain them. One is to delete all of these sites from the data analysis. This option, called the Complete-Deletion option, is generally desirable because different regions of DNA or amino acid sequences often evolve under different evolutionary forces. However, if the number of nucleotides (or amino acids) involved in a gap is small and gaps are distributed more or less randomly, you may include all such sites and treat the gaps as missing data. In either case, gaps and missing data are never used in computing tree lengths in MEGA 4.
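The effect of the Complete-Deletion option can be sketched in a few lines of Python. This is only an illustration: the function name complete_deletion, the dictionary representation of the alignment, and the use of "-" and "?" as the gap and missing-data symbols are assumptions of this example.

```python
GAP_OR_MISSING = set("-?")   # assumed symbols for alignment gaps and missing data

def complete_deletion(alignment):
    """Drop every column in which at least one sequence has a gap or missing symbol."""
    names = list(alignment)
    columns = zip(*(alignment[n] for n in names))
    kept = [col for col in columns if not GAP_OR_MISSING & set(col)]
    # Rebuild each sequence from the columns that survived.
    return {n: "".join(col[i] for col in kept) for i, n in enumerate(names)}
```

With the alignment {"s1": "AC-GT", "s2": "ACTG?", "s3": "GCTGA"}, the third and fifth columns are removed, leaving three sites per sequence.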
Consensus Tree
The MP method often produces many equally parsimonious trees. Choosing this command produces a composite tree that is a consensus among all such trees. This can be either a strict consensus, in which all conflicting branching patterns among the trees are resolved by making those nodes multifurcating, or a Majority-Rule consensus, in which conflicting branching patterns are resolved by selecting the pattern seen in more than 50% of the trees. (Details are given in Nei and Kumar [2000], page 130.)
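The difference between the two consensus rules can be sketched in Python as follows. Each tree is represented here simply as a set of clades (groups of taxa), which is a simplification chosen for this example; the function names are hypothetical and do not correspond to anything in MEGA.

```python
from collections import Counter

def clade_counts(trees):
    """Count in how many trees each clade (a frozenset of taxa) appears."""
    return Counter(clade for tree in trees for clade in tree)

def strict_consensus(trees):
    """Keep only the clades present in every tree."""
    return {c for c, n in clade_counts(trees).items() if n == len(trees)}

def majority_rule_consensus(trees):
    """Keep the clades present in more than 50% of the trees."""
    return {c for c, n in clade_counts(trees).items() if n > len(trees) / 2}
```

If two of three trees contain the clades {A, B} and {A, B, C} and the third contains {A, B} and {C, D}, the strict consensus keeps only {A, B}, while the Majority-Rule consensus keeps both {A, B} and {A, B, C}. Clades that are dropped correspond to the multifurcating nodes of the consensus tree.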
Analysis Preferences (Maximum Parsimony)
This dialog box contains four overlapping pages, with each page marked by Tabs running across the top. You can go to any page by simply clicking on the Tab. Each tab page organizes a set of logically related options. Information from all the pages is used in the requested analysis, so it is important that you examine the options selected in each tab before pressing OK to proceed with analysis.
Phylogeny Test and Options
To assess the reliability of the MP trees, MEGA provides the bootstrap test. You need to enter the number of replicates and a starting random seed for this test.
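The role of the two inputs can be seen in a minimal Python sketch of bootstrap resampling, in which sites (alignment columns) are drawn with replacement. The function name bootstrap_replicates and the dictionary representation of the alignment are assumptions of this example, and the random number generator is Python's, not MEGA's.

```python
import random

def bootstrap_replicates(alignment, n_replicates, seed):
    """Yield resampled alignments; the same seed always gives the same replicates."""
    rng = random.Random(seed)
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    for _ in range(n_replicates):
        # Draw as many columns as the original alignment has, with replacement.
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        yield {n: "".join(alignment[n][c] for c in cols) for n in names}
```

An MP tree would then be inferred from each replicate, and the proportion of replicates that recover a given grouping serves as its bootstrap support. Fixing the seed makes a run reproducible.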
Search Options
Use this to select between the branch-and-bound and the heuristic (Close-Neighbor-Interchange) searches. For the branch-and-bound search, an optimized Max-mini branch-and-bound algorithm is used. While this algorithm is guaranteed to find all the MP trees, a branch-and-bound search is often too time consuming for more than 15 sequences, although this number varies from data set to data set. Alternatively, you may use the heuristic Close-Neighbor-Interchange search, a branch-swapping method that begins with a given initial tree. You may obtain a set of initial trees automatically by using the Min-mini algorithm with a given search factor, or you can use the random addition option to produce the initial trees.
Include Sites
This provides options for handling gaps and missing data in the analysis, specifying inclusion and exclusion of codon positions, and restricting the analysis to only some types of labeled sites (if applicable).
Gaps and Missing Data
Using the Complete-Deletion option, you may choose to remove all sites containing alignment gaps and missing information before the parsimony analysis begins. Alternatively, you may choose to retain all such sites. In this case, all missing-information and alignment-gap sites are treated as missing data in the calculation of tree length.
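One common way to treat a gap as missing data when counting substitutions is to let it match any state, so that it never adds to the tree length. The Python sketch below applies this convention in a Fitch-style count on a fixed tree. It is an illustration under stated assumptions, not a description of MEGA's internal code: the function name tree_length, the nested-tuple tree, the four-letter nucleotide alphabet, and the symbols "-" and "?" are all choices made for this example.

```python
MISSING = set("-?")   # assumed symbols for alignment gaps and missing data

def tree_length(tree, seqs, states="ACGT"):
    """Fitch-style parsimony length; gaps and missing data match any state."""
    all_states = set(states)

    def site(node, i):
        if isinstance(node, str):                      # leaf
            s = seqs[node][i]
            return (all_states if s in MISSING else {s}), 0
        (a, la), (b, lb) = site(node[0], i), site(node[1], i)
        common = a & b
        # A change is counted only when the two child sets share no state.
        return (common, la + lb) if common else (a | b, la + lb + 1)

    n_sites = len(next(iter(seqs.values())))
    return sum(site(tree, i)[1] for i in range(n_sites))
```

For the tree (("A", "B"), ("C", "D")) and the sequences A = "A-", B = "AC", C = "C?", D = "CC", the length is 1: the first site requires one change and the second site, where the gap and the missing symbol match C, requires none.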
Codon Positions
By clicking on the ellipses (or the lime square), you may select any combination of 1st, 2nd, 3rd, and non-coding positions for analysis. This option is available only if the nucleotide sequences contain protein-coding regions. If they do, you can choose between the analysis of nucleotide sequences or translated protein sequences. If you choose the latter, MEGA will translate all protein-coding regions into amino acid sequences and conduct the protein sequence parsimony analysis.
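Selecting codon positions amounts to keeping every third site, as in the following Python sketch. The function name select_codon_positions is hypothetical, and the example assumes that the whole sequence is protein coding and in frame from its first nucleotide; non-coding positions are not handled.

```python
def select_codon_positions(seq, positions):
    """Keep only the requested codon positions (1, 2 and/or 3) of an in-frame sequence."""
    wanted = {p - 1 for p in positions}      # convert to 0-based offsets within a codon
    return "".join(base for i, base in enumerate(seq) if i % 3 in wanted)
```

For the sequence "ATGGCCTAA", selecting position 3 alone gives "GCA", and selecting positions 1 and 2 gives "ATGCTA".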
Labeled Sites
This option is available only if there are labels associated with some or all of the sites in the data. By clicking on the ellipses, you will have the option of including sites with selected labels. If you choose to include only labeled sites, these sites are extracted from the data first, and all other options mentioned above are then enforced on them. Note that labels associated with all three positions in the codon must be included for a full codon to be incorporated in the analysis.
Heuristic Search
Min-mini algorithm
This is a heuristic search algorithm for finding the MP tree, and it is somewhat similar to the branch-and-bound search method. However, in this algorithm, many trees that are unlikely to have a small local tree length are eliminated from the computation of their L values. Thus, while the algorithm speeds up the search for the MP tree compared with the branch-and-bound search, the final tree or trees may not be the true MP tree(s). The user can specify a search factor to control the extensiveness of the search; MEGA adds this search factor to the current local upper bound. The larger the search factor, the slower the search, since many more trees will be examined.
(See also Nei & Kumar (2000), pages 122, 125)
Close-Neighbor-Interchange (CNI)
In any method, examining all possible topologies is very time consuming. This algorithm reduces the time spent searching by first producing a temporary tree (e.g., an NJ tree when an ME tree is being sought) and then examining all of the topologies that differ from this temporary tree by a topological distance of dT = 2 or 4. If this is repeated many times, and all the topologies previously examined are avoided, one can usually obtain the tree being sought. For the MP method, the CNI search can start with a tree generated by the random addition of sequences. This process can be repeated multiple times to find the MP tree.
See Nei & Kumar (2000) for details.