5.4 Experimental Evaluation
5.4.1 Experimental Setup
Data. Simulations were conducted on 30- and 50-taxon phylogenies. For the 30-taxon data sets, 10 random trees were generated under a birth-death model with the PHYL-O-GEN tool [91] and used as species trees, and 5 horizontal gene transfer (HGT) events were simulated between pairs of branches on each species tree using Galtier's tool [92]. The HGT simulation was run 10 independent times on each species tree, yielding 100 networks in total. Since Galtier's tool does not report the details of the simulated transfer events, we modified it to output the transfers that it added. From the set of 32 gene trees contained in each network, 4, 8, 12, 16, 24, and 32 gene trees were randomly sampled and used as input to the methods.
For the 50-taxon data sets, the same procedure was applied with two differences: (1) 10 horizontal gene transfer events were simulated, so the sampling was made over 1024 gene trees; and (2) the sampling process was repeated 30 times for each sample size, so that statistically significant results could be obtained.
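The sampling design described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each gene tree is modelled as an on/off choice for every simulated transfer (which is consistent with the 32 and 1024 gene trees reported for 5 and 10 transfers), and the names `gene_tree_choices` and `sample_inputs`, the `repeats` parameter, and the fixed seed are hypothetical, not part of the tools used in the study.

```python
import random
from itertools import product


def gene_tree_choices(num_hgts):
    """Enumerate all 2**num_hgts on/off combinations of the HGT events.

    Assumption: each combination stands for one gene tree of the network.
    """
    return list(product((0, 1), repeat=num_hgts))


def sample_inputs(num_hgts, sample_sizes, repeats, seed=0):
    """Draw `repeats` random samples (without replacement) per sample size."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    pool = gene_tree_choices(num_hgts)
    return {
        size: [rng.sample(pool, size) for _ in range(repeats)]
        for size in sample_sizes
    }


# 30-taxon style setting: 5 transfers, six sample sizes.
# A single draw per size is an assumption made for this illustration.
samples_30 = sample_inputs(5, (4, 8, 12, 16, 24, 32), repeats=1)
```

For the 50-taxon setting, the same call would use `num_hgts=10` and `repeats=30`; the sample sizes for that setting are not restated in the text, so they are left to the caller.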
The second evaluation runs M2 [3], MURPAR [132], and PIRN [4] on both the simulated data sets and a biological data set. For the biological data, we used the Poaceae data set, which was originally sequenced by the Grass Phylogeny Group and was used to test both CASS [5] and PIRN [4]. Binary trees were constructed for six loci: ITS, ndhF, phyB, rbcL, rpoC2, and waxy [139]. Since the gene trees had different sets of leaves, we selected the gene trees for ndhF, phyB, rbcL, rpoC2, and ITS, and restricted them to the 14 leaves that they have in common.
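Restricting a set of gene trees to their common leaves can be illustrated as follows. This is a minimal sketch, assuming trees are written as nested Python tuples with string leaf labels; the functions `leaves` and `restrict` and the two toy trees are hypothetical and are not the Poaceae gene trees.

```python
def leaves(tree):
    """Return the set of leaf labels of a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return {tree}
    found = set()
    for child in tree:
        found |= leaves(child)
    return found


def restrict(tree, keep):
    """Restrict `tree` to the leaves in `keep`.

    Subtrees without any kept leaf are dropped, and internal nodes left
    with a single child are suppressed so the result stays binary.
    """
    if not isinstance(tree, tuple):
        return tree if tree in keep else None
    kids = [r for r in (restrict(child, keep) for child in tree) if r is not None]
    if not kids:
        return None
    if len(kids) == 1:
        return kids[0]
    return tuple(kids)


# Two toy gene trees with different leaf sets (illustrative only).
t1 = (("a", "b"), ("c", ("d", "e")))
t2 = ((("a", "c"), "b"), "d")
common = leaves(t1) & leaves(t2)
restricted = [restrict(t, common) for t in (t1, t2)]
```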
Methods and Accuracy Measures. All of our methods are based on solving the Pairwise HGT Inference problem. Since M1 requires only the rSPR distance, it runs SPRDist, the exact method of Wu [35], which returns the exact rSPR distance. M2 and MURPAR, however, cannot use this method directly, because they require the placement of the HGT estimates. We therefore employ the RIATA-HGT method [39, 40], as implemented in PhyloNet [136], to solve the Pairwise HGT Inference problem and obtain the locations of the multiple optimal HGT events. Other pairwise inference tools, including SPRIT [140], were tested as well, but the results were almost identical; therefore, all results are reported based on the pairwise solutions obtained by either SPRDist or RIATA-HGT. We used the GLPK ILP solver to solve the ILP formulation of MURPAR.
Although we introduce CASS [5] as an algorithm for the problem, we do not use it for comparison. As indicated by the authors in [5], while CASS computes a minimal network N from an input set C of clusters of a set of gene trees T, it is not guaranteed that T ⊆ T(N). More formally, if C is the set of all clusters of taxa displayed by the trees in T, the network N computed by CASS is the minimal network that displays all clusters in C. It is important to note that if a network N displays all clusters of a set of trees, N does not necessarily display all the trees themselves. It is easy to see that if N is minimal for C and N′ is minimal for T (where C is the set of all clusters of trees in T), then the number of reticulation nodes in N is smaller than or equal to that in N′, because the problem with C is less restrictive than that with T. An illustration of this issue is given in Fig. 5.3.
[Figure: gene trees T1, T2, T3 and networks N1, N2 on the taxa a, b, c, d]
Figure 5.3: The difference between the formulation used by MURPAR, M2 [3], and PIRN [4], and the one used by CASS [5]. For the input set of gene trees T = {T1, T2, T3}, CASS computes a network with a single reticulation node (N1), since this network displays all clusters of the gene trees. However, MURPAR, M2, and PIRN compute minimal networks with two reticulation nodes, such as N2, since 2 is the minimum number of reticulation nodes required in a network that displays all three gene trees.
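To make the cluster set C concrete, the following sketch collects the clusters of a set of trees. It assumes trees are written as nested Python tuples and that only non-trivial clusters (leaf sets below internal nodes, the root included) are collected; the function names and the toy trees are hypothetical and do not reproduce the trees of Fig. 5.3.

```python
def clusters(tree):
    """Return the clusters of a nested-tuple tree.

    A cluster is the set of leaves below an internal node; single-leaf
    clusters are omitted in this sketch.
    """
    found = set()

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.add(below)
        return below

    walk(tree)
    return found


def cluster_set(trees):
    """Union of the clusters of all trees: the input that a cluster-based
    method works on instead of the trees themselves."""
    combined = set()
    for tree in trees:
        combined |= clusters(tree)
    return combined


# Toy input (illustrative only).
toy_trees = [(("a", "b"), ("c", "d")), ((("a", "b"), "c"), "d")]
C = cluster_set(toy_trees)
```

Once the trees are merged into C, the information about which clusters occurred together in the same tree is no longer part of the input, which is why displaying C is a weaker requirement than displaying every tree in T.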
The first part of the evaluation compares two quantities: the number of reticulations detected by the methods and the number detectable from the input tree set. When M1 or M2 is run on a collection T = {T1, . . . , Tk} ⊆ T(N) induced by a network N, we record the number of reticulations that the method computed; we call this number the detected number of reticulations. If network N was generated with 5 or 10 HGTs, this does not necessarily mean that the collection T contains the trees needed to detect all 5 or 10 HGTs, respectively. For example, consider a collection T that contains only trees whose pairwise SPR distance is 1. In this case, the number of detectable HGTs is 1, and not 5 (or 10). Therefore, for each such collection T, we compute (exhaustively) the smallest subset of HGTs in N that can reconcile all trees in T; we call the size of this subset the detectable number of reticulations (notice that this is not necessarily the smallest number of reticulations needed to reconcile all trees in T; computing that number would be prohibitive). A method is considered more accurate as the difference between the detectable and detected numbers of reticulations becomes smaller.
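The exhaustive computation of the detectable number of reticulations can be sketched as a search over subsets of the simulated HGTs in order of increasing size. The sketch below is an assumption-laden illustration: `displayed_trees` is a hypothetical callable that returns the set of tree topologies displayed by the species tree augmented with a given HGT subset, and `toy_displayed` is a stand-in in which every tree is identified by the set of transfers it uses. Taking the absolute difference in `accuracy_gap` is likewise an assumption.

```python
from itertools import combinations


def detectable_reticulations(hgts, sampled_trees, displayed_trees):
    """Size of the smallest subset of `hgts` that reconciles all sampled trees.

    Subsets are tried by increasing size, so the first hit is minimal.
    Returns None if even the full HGT set does not display every tree.
    """
    targets = set(sampled_trees)
    for size in range(len(hgts) + 1):
        for subset in combinations(hgts, size):
            if targets <= displayed_trees(frozenset(subset)):
                return size
    return None


def accuracy_gap(detectable, detected):
    """Difference between detectable and detected numbers (smaller is better)."""
    return abs(detectable - detected)


def toy_displayed(subset):
    """Hypothetical stand-in: a tree is labelled by the HGTs it uses, so a
    subset of HGTs displays every tree labelled by one of its sub-subsets."""
    items = sorted(subset)
    return {
        frozenset(combo)
        for r in range(len(items) + 1)
        for combo in combinations(items, r)
    }


hgts = ["h1", "h2", "h3", "h4", "h5"]
sample = [frozenset({"h1"}), frozenset({"h3", "h4"})]
needed = detectable_reticulations(hgts, sample, toy_displayed)
```

In a real setting, two different HGT subsets can induce the same topology, so `displayed_trees` would need to compare canonical tree topologies rather than labels; the toy stand-in ignores this.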
The second part evaluates the methods by comparing their return values. Since they all return a lower bound on the number of network nodes required to reconcile the input trees, the values can be compared directly; we refer to this measure as accuracy. Under parsimony, the smaller the value, the better the corresponding approach. M1 is excluded from this comparison, because its return value is not on the same measure. Besides accuracy, we also assess the running time of the methods, which is important when they are used as a preprocessing step for subsequent analysis. We checked the results on both the 30-taxon and 50-taxon data sets for the second part and confirmed that they show the same trend, so we visualize only the results for the 30-taxon data sets.