3.3 Results
3.3.1 Simulations
To evaluate CTPsingle in a controlled environment and compare its performance to AncesTree, LICHeE and PyClone, we performed two sets of simulations: (1) with low coverage (∼32×), as routinely observed in whole genome sequencing experiments; and (2) with ultra-high coverage (∼2,800×), as typically obtained from deep sequencing experiments. As the other methods are primarily intended for multi-sample datasets, for each coverage depth we also generated two sets of simulations: (a) with a single sample per tumor; and (b) with two samples per tumor. Although CTPsingle cannot exploit the additional information that can be obtained from the second sample, our results demonstrate that it achieves better results in all four experiment settings. For each experiment setting, we simulate 50 instances. The rest of the simulation details are given in Supplementary Section A.1.
[Figure 3.1 panels: Low coverage single-sample; Low coverage multi-sample; Deep coverage single-sample; Deep coverage multi-sample. Methods shown: CTPsingle, AncesTree, LICHeE, PyClone. Labels: True purity, Predicted purity.]
Figure 3.1: Comparison of true versus predicted tumor purity across the simulation experiments. Each dot represents a distinct sample and is colored based on its real tumor purity. The red lines illustrate the y = x line in each plot.
We compare the performance of the tools using three measures: (i) estimated tumor purity; (ii) number of subclones predicted; and (iii) root mean square error (RMSE) of cancer cell fractions.
The RMSE measure is calculated from the cancer cell fractions of the mutations reported by each method, as follows. Let RLF(M) denote the cancer cell fraction of the lineage to which mutation M is assigned in the reported solution, and let TLF(M) denote the cancer cell fraction of the lineage from which M originates in the ground truth. Then RMSE is calculated as:
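$$\mathrm{RMSE} = \sqrt{\frac{1}{|S|} \sum_{M \in S} \bigl(\mathrm{RLF}(M) - \mathrm{TLF}(M)\bigr)^{2}} \tag{3.13}$$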
where S represents the set of mutations reported by the tool. Above, the cancer cell fractions for each tool are computed using the subclonal or lineage frequencies reported by the tool, adjusted by the tumor purity, which is estimated in the same way as in CTPsingle.
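The RMSE measure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the evaluation code used in this work: the function name `rmse` and the dictionary layout (mutation identifier mapped to cancer cell fraction) are assumptions made for the example. Only the mutations reported by the tool (the set S) enter the sum, so mutations discarded by a tool are not penalized.

```python
import math


def rmse(reported_ccf, true_ccf):
    """Root mean square error over the mutations reported by a tool.

    reported_ccf: dict, mutation id -> RLF(M), the CCF of the lineage the
                  tool assigned M to (hypothetical data layout).
    true_ccf:     dict, mutation id -> TLF(M), the CCF of the lineage M
                  originates from in the ground truth.
    """
    if not reported_ccf:
        raise ValueError("the tool reported no mutations")
    # Sum squared differences only over S, the set of reported mutations.
    squared = [(reported_ccf[m] - true_ccf[m]) ** 2 for m in reported_ccf]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy instance: the tool discards mutation "m4".
truth = {"m1": 1.0, "m2": 0.6, "m3": 0.4, "m4": 0.4}
reported = {"m1": 1.0, "m2": 0.5, "m3": 0.5}
print(round(rmse(reported, truth), 4))
```

In this toy instance the discarded mutation `m4` does not contribute to the error, which is the source of the possible advantage for tools that drop mutations.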
We note that both AncesTree and LICHeE discard a small number of mutations in some cases, possibly giving them an unfair advantage in the calculation of RMSE. CTPsingle and PyClone report all mutations. The calculation of tumor purity, number of subclones and lineage frequencies, as well as the running parameters for the tools are all described in Supplementary Section A.2.
[Figure 3.2 panels: Low coverage single-sample; Low coverage multi-sample; Deep coverage single-sample; Deep coverage multi-sample. Methods shown: CTPsingle, AncesTree, LICHeE, PyClone. Axes: #subclones error, count.]
Figure 3.2: Comparison of the absolute difference between the true and predicted number of subclones across the simulation experiments. The single-sample experiments contain 50 samples and the multi-sample experiments contain a total of 100 (50 × 2) samples.
Figure 3.1 shows the comparison of the true versus predicted tumor purities for each experiment. As can be seen from the plots, CTPsingle outperforms the other methods in all experiments. For the low coverage datasets, both AncesTree and LICHeE tend to underestimate the purity, although AncesTree also calls near 1.0 purity in some cases. While the purity estimates of AncesTree improve significantly with deep coverage, the LICHeE estimates show little difference. This is probably because LICHeE works directly with variant allele frequencies and hence cannot distinguish between high and low coverage data. In contrast, CTPsingle estimates purity with almost 100% accuracy in the deep coverage samples. The two outliers in the multi-sample deep coverage experiment belong to the same tumor instance, which contains a subclone consisting of only 2 mutations. We also note that PyClone appears to report a purity over 1.0 for some samples. This is because PyClone reports allelic frequencies rather than cellular frequencies; we therefore multiply the frequencies reported by PyClone by a factor of 2 to calculate the purity estimated by this tool. For some samples, however, the original frequencies seem to be closer to the true purity. Nevertheless, we keep this practice as it improves the overall correlation between the predicted and true purities for this tool, especially for the low coverage samples. In addition, this problem does not affect the RMSE evaluation, as cancer cell fractions are already normalised (i.e., the highest CCF will always be 1.0).
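The reason the factor of 2 matters for purity but not for RMSE can be illustrated with a short sketch. It rests on assumptions that are not taken from this work: the function names are hypothetical, purity is taken as the scaled frequency of the largest cluster, and cancer cell fractions are normalised by dividing by the largest frequency. The calculation actually used is described in Supplementary Section A.2.

```python
def purity_from_allelic(cluster_freqs, factor=2.0):
    """Assumed purity estimate: scale the largest allelic frequency by `factor`."""
    return factor * max(cluster_freqs)


def normalised_ccf(cluster_freqs):
    """Divide by the largest frequency so that the top cluster has CCF 1.0."""
    top = max(cluster_freqs)
    if top <= 0:
        raise ValueError("need at least one positive frequency")
    return [f / top for f in cluster_freqs]


allelic = [0.40, 0.20, 0.10]           # hypothetical PyClone cluster frequencies
cellular = [2.0 * f for f in allelic]  # the same clusters after the factor of 2

purity = purity_from_allelic(allelic)
# The constant factor cancels in the normalisation, so the CCFs are unchanged.
same = all(
    abs(a - b) < 1e-12
    for a, b in zip(normalised_ccf(allelic), normalised_ccf(cellular))
)
print(purity, same)
```

Because any constant scaling cancels once the frequencies are divided by their maximum, the choice between the original and the doubled frequencies changes the purity estimate only.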
Figure 3.2 shows the distribution of #subclones error for each method across the experiments. #subclones error is calculated as the absolute difference between the predicted and the true number of subclones. Once again, the plots show that CTPsingle outperforms the other methods in all experiment settings. In deep coverage datasets, CTPsingle correctly estimates the number of subclones in all but a few cases.
[Figure 3.3 panels: Low coverage single-sample; Low coverage multi-sample; Deep coverage single-sample; Deep coverage multi-sample. Methods shown: CTPsingle, AncesTree, LICHeE, PyClone. Axes: RMSE, count.]
Figure 3.3: The histogram of root mean square error (RMSE) values across the simulation experiments. For each sample, RMSE is calculated using the formula given in Equation 3.13. The bin size for the histogram is set to 0.1 for each plot. The single-sample experiments contain 50 samples and the multi-sample experiments contain a total of 100 (50 × 2) samples.
The histogram of the RMSE values for each sample is shown in Figure 3.3. In the low coverage datasets, RMSE values are typically below 0.3 for CTPsingle, while this measure can range up to 0.8 for AncesTree and LICHeE. Note that the samples with low RMSE often represent the cases where the number of subclones is correctly identified, although RMSE can also be low if two clusters of mutations with very similar true lineage frequencies are merged into one cluster. Hence, this measure penalizes methods that tend to merge mutations with very different lineage frequencies. For the deep coverage datasets, CTPsingle has less than 0.1 RMSE.
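The effect of merging clusters on RMSE can be made concrete with a small worked example. All numbers are hypothetical, and the example assumes that a tool which merges two clusters reports the midpoint of their true frequencies for every merged mutation; real tools may place the merged cluster elsewhere.

```python
import math


def rmse(reported, truth):
    """RMSE between per-mutation reported and true cancer cell fractions."""
    if len(reported) != len(truth) or not truth:
        raise ValueError("need two non-empty lists of equal length")
    total = sum((r - t) ** 2 for r, t in zip(reported, truth))
    return math.sqrt(total / len(truth))


# Two true lineages with five mutations each (hypothetical values).
similar_truth = [0.50] * 5 + [0.56] * 5
similar_merged = [0.53] * 10   # both clusters merged at the midpoint

distinct_truth = [0.20] * 5 + [0.80] * 5
distinct_merged = [0.50] * 10  # both clusters merged at the midpoint

low = rmse(similar_merged, similar_truth)     # about 0.03
high = rmse(distinct_merged, distinct_truth)  # about 0.30
print(round(low, 3), round(high, 3))
```

Both solutions miss one subclone, yet the merge of two similar lineages costs roughly a tenth of the error incurred by merging two very different ones, which is why RMSE and #subclones error are reported side by side.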
We also compare CTPsingle with CITUP on the same simulation datasets. To save computation time, we run CITUP only on trees with the correct number of nodes. Despite this advantage, we observe that CTPsingle performs better than CITUP on the low coverage datasets, although the performance of the two methods is very similar on the deep coverage datasets (Supplementary Section A.1 and Supplementary Figure A.2).
In terms of running time, we observe that LICHeE was the fastest tool, typically completing within a couple of minutes on our simulation datasets. While CTPsingle was slower than LICHeE, it completed all but a few samples within 30 minutes and did not exceed one hour on any sample. In contrast, AncesTree and PyClone took several hours on many samples.
Effect of false positive SNVs and copy number aberrations
Real datasets may contain a number of false positive mutation calls due to various reasons, such as sequencing artefacts or mapping ambiguity. To evaluate the performance of CTPsingle in such cases, we performed an additional experiment with 50 simulations where a fraction of the called mutations represent false positives. For this experiment, we use the same parameters as in (1a), except that we simulate 1% of the total mutations as false positives. For each such mutation, we randomly select a frequency from the interval [0, 1] and simulate its variant read count based on this frequency.
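A minimal sketch of this false positive simulation is given below. Several details are assumptions made for illustration and are not taken from this work: the function name and its parameters are hypothetical, every false positive is given the same fixed read depth, the number of false positives is 1% of `n_mutations` rounded to the nearest integer, and the variant read count is drawn as a binomial sample (one Bernoulli trial per read).

```python
import random


def simulate_false_positives(n_mutations, depth, fraction=0.01, seed=None):
    """Generate false positive calls with uniformly random frequencies.

    n_mutations: total number of mutations in the instance.
    depth:       read depth assigned to every false positive (assumed fixed).
    fraction:    share of mutations simulated as false positives.
    """
    rng = random.Random(seed)
    n_false = round(fraction * n_mutations)
    calls = []
    for _ in range(n_false):
        freq = rng.uniform(0.0, 1.0)  # frequency drawn from [0, 1]
        # Binomial draw: each read carries the variant with probability freq.
        variant_reads = sum(rng.random() < freq for _ in range(depth))
        calls.append(
            {"frequency": freq, "variant_reads": variant_reads, "total_reads": depth}
        )
    return calls


calls = simulate_false_positives(n_mutations=1000, depth=32, seed=0)
print(len(calls))
```

Because the frequencies are uniform on [0, 1], such calls are spread across the whole frequency range rather than concentrating around any true lineage frequency.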
To show that CTPsingle is able to infer clonality in tumors with moderate levels of copy number changes, we also performed 50 simulations on tumors with 30% genome aberrations. Single nucleotide mutations that fall under these aberrant regions are discarded, losing about 30% of the data points on average. The other parameters were kept the same as in experiment (1a).
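The filtering step described above can be sketched as follows. The representation is an assumption made for the example: mutations are `(chromosome, position)` pairs, aberrant regions are `(chromosome, start, end)` triples with inclusive ends, and the function name is hypothetical.

```python
def discard_aberrant_snvs(snvs, aberrant_regions):
    """Keep only the SNVs that lie outside every copy number aberrant region.

    snvs:             list of (chrom, pos) tuples.
    aberrant_regions: list of (chrom, start, end) tuples, ends inclusive.
    """
    kept = []
    for chrom, pos in snvs:
        in_aberration = any(
            region_chrom == chrom and start <= pos <= end
            for region_chrom, start, end in aberrant_regions
        )
        if not in_aberration:
            kept.append((chrom, pos))
    return kept


# Hypothetical toy input: one aberrant region on chr1.
snvs = [("chr1", 150), ("chr1", 900), ("chr2", 150)]
regions = [("chr1", 100, 200)]
kept = discard_aberrant_snvs(snvs, regions)
print(kept)
```

The linear scan over regions is sufficient for a handful of segments; a sorted interval lookup would be the natural replacement for genome-scale inputs.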
Figure 3.4 plots the results of CTPsingle in these datasets as compared to its performance on (1a). These plots suggest that CTPsingle’s performance does not deteriorate significantly in these datasets. Note that the slight improvement in the RMSE measure in the bottom row is probably due to the decreased number of total mutations usable by CTPsingle in this dataset.
Power to detect subclones on low and high coverage datasets
To investigate whether CTPsingle is effective for tumors that are highly heterogeneous (i.e., contain many subclones), we also performed additional simulations with 6 to 7 subclones. For these experiments, we again generated 50 low coverage and 50 high coverage single-sample simulations. All other simulation parameters were kept the same as described in Supplementary Section A.1. Figure 3.5 illustrates the performance of CTPsingle on these additional datasets. As can be seen from the figure, CTPsingle could not estimate the correct number of subclones when applied to low coverage data. On the other hand, it can still estimate the tumor purity with good accuracy and has reasonably low RMSE. On deep coverage data, the performance is only mildly affected by additional subclones, suggesting that CTPsingle is suitable for highly heterogeneous tumors when deep sequencing data is available.