
6 PART IV: EVOLUTIONARY ANALYSIS

6.2 Computing Evolutionary Distances

6.2.2 Specifying Distance Estimation Options

Analysis Preferences (Distance Computation)

In this dialog box you can select and view the desired options in the Options Summary. Options are organized in logical sections. A lime square in the right-most cell of a row indicates that you have a choice regarding the attribute in that row. The three primary sets of options available in this dialog box are:

Analysis

Compute

Use this to specify whether to compute Distances only or Distances and Standard Errors. If you select the latter, then you are given a choice as to how to compute them in the Standard Error Computation box.

Standard Error Computation By

This row is visible only if you have chosen Distances and Std. Err in the Compute row. You may choose to use analytical formulas or the bootstrap method to calculate standard errors depending on the type of distance computed.

Whenever the standard errors are estimated by the bootstrap method, you will be prompted for the number of bootstrap replicates and a random number seed. When you compute average distance or diversity, only the bootstrap method is available for computing standard errors.

Include Sites

These are options for handling gaps or missing data, including or excluding codon positions, and restricting the analysis to labeled sites, if applicable.

Gaps and Missing Data

You may choose to remove all sites containing alignment gaps and missing information before the calculation begins (Complete-deletion option).

Alternatively, you may choose to retain all such sites initially, excluding them as necessary in the pair-wise distance estimation (Pair-wise-deletion option).

Codon Positions

Click on the ellipses or the lime square for the option of selecting any combination of 1st, 2nd, 3rd, and non-coding positions for analysis. This option is available only if the nucleotide sequences contain protein-coding regions and you have selected a nucleotide-by-nucleotide analysis.

Labeled Sites

This option is available only if some or all of the sites have associated labels. By clicking on the ellipses, you will be provided with the option of including sites with selected labels. If you choose to include only labeled sites, then these sites will be the first extracted from the data. Then all other options mentioned above will be enforced. Note that labels associated with all three positions in the codon must be included for a full codon to be incorporated in the analysis.
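The difference between the two choices under Gaps and Missing Data can be illustrated with a minimal Python sketch. It is not MEGA's implementation: the gap symbols in GAP_CHARS, the function names, and the use of a simple proportion of differing sites as the distance are all assumptions made for illustration.

```python
# Hypothetical illustration of complete deletion vs. pair-wise deletion.
GAP_CHARS = {"-", "?"}  # assumed symbols for alignment gaps and missing data


def complete_deletion(seqs):
    """Remove every site at which any sequence has a gap or missing symbol."""
    keep = [i for i in range(len(seqs[0]))
            if all(s[i] not in GAP_CHARS for s in seqs)]
    return ["".join(s[i] for i in keep) for s in seqs]


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap are skipped, which is the
    pair-wise deletion behaviour. Returns None if no site is comparable.
    """
    pairs = [(x, y) for x, y in zip(a, b)
             if x not in GAP_CHARS and y not in GAP_CHARS]
    if not pairs:
        return None
    return sum(x != y for x, y in pairs) / len(pairs)


seqs = ["ACGTAC", "AC-TAT", "ATGT?C"]

# Complete deletion: gapped columns are removed for all sequences first.
cleaned = complete_deletion(seqs)
d_complete = p_distance(cleaned[0], cleaned[2])

# Pair-wise deletion: only sites gapped in this particular pair are skipped.
d_pairwise = p_distance(seqs[0], seqs[2])

print(cleaned, d_complete, d_pairwise)
```

In this toy alignment the first and third sequences are compared over four sites under complete deletion and over five sites under pair-wise deletion, so the two options can give different distances for the same pair.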

Substitution Model

In this set of options, you choose the various attributes of the substitution models.

Model

Here you select a stochastic model for estimating evolutionary distance by clicking on the ellipses to the right of the currently selected model (click on the lime square to select this row first). This will reveal a menu containing many different distance methods and models.

Substitutions to Include

Depending on the distance model or method selected, the evolutionary distance can be separated into two or more components. By clicking on the drop-down button (first click on the lime square to select this row), you will be provided with a list of components relevant to the chosen model.
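As an illustration of splitting a distance into components, the sketch below separates the proportion of differing nucleotide sites into a transitional part (purine to purine or pyrimidine to pyrimidine) and a transversional part. The function name and the choice of a simple proportion as the distance are assumptions for illustration; the components actually offered depend on the model selected in the dialog box.

```python
# Hypothetical sketch: split nucleotide differences into two components.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_components(a, b):
    """Return (transition proportion, transversion proportion).

    Sites where either sequence has a symbol other than A, C, G, or T
    are skipped.
    """
    n = ts = tv = 0
    for x, y in zip(a, b):
        if x not in "ACGT" or y not in "ACGT":
            continue
        n += 1
        if x == y:
            continue
        # A difference within the same chemical class is a transition.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            ts += 1
        else:
            tv += 1
    if n == 0:
        raise ValueError("no comparable sites")
    return ts / n, tv / n


P, Q = substitution_components("ACGTACGT", "GCGTACTA")
print(P, Q, P + Q)  # transitions only, transversions only, both combined
```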

Transition/Transversion Ratio

This option will be visible if the chosen model requires you to provide a value for the Transition/Transversion ratio (R).

Pattern among Lineages

This option becomes available if the selected model has formulas that allow the relaxation of the assumption of homogeneity of substitution patterns among lineages.

Rates among Sites

This option becomes available if the selected distance model has formulas that allow rate variation among sites. If you choose gamma-distributed rates, then the Gamma parameter option becomes visible.
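To show what the Gamma parameter changes, here is a minimal sketch that uses the Jukes-Cantor model as an example. The model choice and the function names are assumptions for illustration; p is the proportion of differing sites and a is the gamma shape parameter. With uniform rates the distance is d = -(3/4) ln(1 - 4p/3), and with gamma-distributed rates it becomes d = (3/4) a [(1 - 4p/3)^(-1/a) - 1].

```python
import math


def jc_distance(p):
    """Jukes-Cantor distance assuming uniform rates among sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the distance to be defined")
    return -0.75 * math.log(1 - 4 * p / 3)


def jc_gamma_distance(p, a):
    """Jukes-Cantor distance with gamma-distributed rates (shape parameter a)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the distance to be defined")
    if a <= 0:
        raise ValueError("gamma parameter must be positive")
    return 0.75 * a * ((1 - 4 * p / 3) ** (-1 / a) - 1)


p = 0.3
for a in (0.5, 1.0, 5.0):
    print(a, jc_gamma_distance(p, a))
print("uniform rates:", jc_distance(p))
```

Smaller values of a correspond to stronger rate variation among sites and give a larger estimated distance for the same p; as a grows, the gamma distance approaches the uniform-rate value.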

Distance Model Options

With this option, you can choose the general attributes of the substitution models for DNA and protein sequence evolution.

Model

You can select a stochastic model for estimating evolutionary distances by clicking on the ellipses to the right of the currently selected model (click on the lime square to select this row first). This will reveal a menu containing many different distance methods/models.

Transition/Transversion Ratio

This option will be visible if the chosen model requires you to provide a value for the Transition/Transversion ratio (R).

Pattern among Lineages

This option becomes available if the distance model you have selected has formulas that allow the relaxation of the assumption of homogeneity of substitution patterns among lineages.

Rates among Sites

This option becomes available if the distance model you have selected has formulas that allow rate variation among sites. If you choose gamma-distributed rates, then the Gamma parameter option becomes visible.

Bootstrap method to compute standard error of distance estimates

When you choose the bootstrap method for estimating the standard error, you must specify the number of replicates and the seed for the pseudorandom number generator. In each bootstrap replicate, the desired quantity is estimated and the standard deviation of the original values is computed (see Nei and Kumar [2000], page 25 for details). It is possible that in some bootstrap replicates the quantity you desire is not calculable for statistical or technical reasons. In these cases, MEGA will discard the results of those bootstrap replicates, and its final estimate will be based on the results of all valid replicates. This means that the number of bootstrap replicates used can be smaller than the number specified by the user. However, if the number of valid bootstrap replicates is less than 25, then MEGA will report that the standard error cannot be computed (an "n/c" will appear in the result window).
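The procedure described above can be sketched in Python as follows. This is a rough illustration, not MEGA's code. It assumes that sites are resampled with replacement, that a replicate signals "not calculable" by returning None, and that the standard error is the sample standard deviation of the valid replicate estimates. The function names and the simple proportion-of-differences estimator are hypothetical.

```python
import random
import statistics

MIN_VALID_REPLICATES = 25  # threshold stated in the text


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def bootstrap_se(a, b, estimator, replicates, seed):
    """Return (standard error or None, number of valid replicates)."""
    rng = random.Random(seed)  # seeding makes the run reproducible
    n = len(a)
    values = []
    for _ in range(replicates):
        # Resample alignment columns with replacement.
        cols = [rng.randrange(n) for _ in range(n)]
        value = estimator("".join(a[i] for i in cols),
                          "".join(b[i] for i in cols))
        if value is not None:  # discard replicates that are not calculable
            values.append(value)
    if len(values) < MIN_VALID_REPLICATES:
        return None, len(values)  # corresponds to the "n/c" case
    return statistics.stdev(values), len(values)


a = "ACGTACGTACGTACGTACGT"
b = "ACGTTCGTACGAACGTACCT"
se, used = bootstrap_se(a, b, p_distance, replicates=500, seed=42)
print(se, used)
```

Running the function twice with the same seed gives the same result, which is why a seed is requested along with the number of replicates.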