2 Filtering gene fusions in 1,011 cancer cell lines
2.2 Filtering approach
2.2.6 Summary and overview of a framework for fusion filtering
In the preceding sections, I examined the effectiveness of a set of filters in removing technical artefacts from our list of putative fusion transcripts. Overall, I found that requiring at least 4 splitting reads per transcript increased the projected true positive rate from 51% to 63% (Figure 2.3). In addition, using only fusion calls supported by two or more algorithms substantially increased biological replicability. Finally, I also removed a small number of fusions that were recurrently found in a set of non-neoplastic GTEx samples.
During the course of my PhD, I also examined the efficacy of a further four filters, based on 1) regions of high repeats, 2) the ratio of splitting reads to spanning reads, 3) the additional deFuse filters and 4) highly recurrent fusions. None of these filters showed efficacy (keeping in mind the limitations of my benchmarking approach) and, for the sake of brevity, the data are not shown.
Thus, for our final filtering pipeline summarised in Figure 2.7, I decided to implement only the splitting read and multi-algorithm filters, as they showed the greatest efficacy in increasing the proportion of true positives.
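To illustrate, the two retained filters can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the code used in the pipeline: the `FusionCall` record, its field names and the caller labels (`caller_1`, `caller_2`, ...) are hypothetical, and it assumes that each transcript has already been collapsed to a single record carrying its splitting read count and the set of algorithms that reported it.

```python
from dataclasses import dataclass

MIN_SPLITTING_READS = 4  # threshold stated in the text
MIN_ALGORITHMS = 2       # supported by two or more algorithms


@dataclass(frozen=True)
class FusionCall:
    """Hypothetical record for one putative fusion transcript."""
    cell_line: str
    gene5: str
    gene3: str
    splitting_reads: int
    algorithms: frozenset  # labels of the callers reporting this transcript


def passes_filters(call):
    """Return True if the call meets both the read and the caller thresholds."""
    return (call.splitting_reads >= MIN_SPLITTING_READS
            and len(call.algorithms) >= MIN_ALGORITHMS)


def filter_calls(calls):
    """Keep only the calls that pass both filters, preserving input order."""
    return [call for call in calls if passes_filters(call)]


# Toy input, invented purely for illustration.
example_calls = [
    FusionCall("CL1", "GENE_A", "GENE_B", 5, frozenset({"caller_1", "caller_2"})),
    FusionCall("CL1", "GENE_C", "GENE_D", 3, frozenset({"caller_1", "caller_2"})),
    FusionCall("CL2", "GENE_E", "GENE_F", 10, frozenset({"caller_1"})),
    FusionCall("CL2", "GENE_G", "GENE_H", 4,
               frozenset({"caller_1", "caller_2", "caller_3"})),
]
kept_calls = filter_calls(example_calls)
```

In this toy input, the second call fails the splitting read threshold and the third is reported by a single caller only, so two of the four calls are retained.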
Across our 1,011 cell lines, 10,514 (0.5%) of the 1,934,711 putative fusion transcripts output by cgpRna passed my filtering process (Supplementary Table 3).
Figure 2.7: Overview of the filtering and annotation pipeline to identify our final catalogue of gene fusions in 1,011 cancer cell lines.
The 10,514 filtered fusion transcripts represent 8,354 fusion events, i.e. unique cell line/fusion combinations. PCR validations performed on a subset of these fusion transcripts (n = 474) yield a projected 69.2% pass rate. For the 23 cell lines that were sequenced by RNA-Seq at two independent sources, applying the two filters substantially improved the proportion of shared fusions from 6.7% to 70.4%.
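One way such a proportion of shared fusions could be computed for a single cell line is sketched below. The exact definition used for the figures above is not restated in this summary, so the sketch assumes the proportion is the number of fusions found in both sources divided by the number found in either source. The function name and the toy gene pairs are hypothetical.

```python
def shared_fraction(source_a, source_b):
    """Fraction of fusions seen in either source that are seen in both.

    Each argument is a set of (5' gene, 3' gene) tuples for one cell line.
    Assumed definition: |A intersect B| / |A union B|.
    """
    union = source_a | source_b
    if not union:
        return 0.0  # no fusions called in either source
    return len(source_a & source_b) / len(union)


# Toy input, invented purely for illustration.
fusions_source_1 = {("GENE_A", "GENE_B"), ("GENE_C", "GENE_D"), ("GENE_E", "GENE_F")}
fusions_source_2 = {("GENE_A", "GENE_B"), ("GENE_C", "GENE_D"), ("GENE_G", "GENE_H")}
example_fraction = shared_fraction(fusions_source_1, fusions_source_2)
```

Here two of the four distinct fusions appear in both sources, giving a shared fraction of 0.5. Calls from a single algorithm that do not replicate enlarge the union without adding to the intersection, which lowers this value.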
Finally, I compared our filtered list of fusions with the list of fusion events described in the literature (see section 2.2.1.3). Our raw output contained 84.13% of ‘validated’ and 52.6% of all remaining fusion events described in the literature. After filtering, we still matched 74.21% of validated and 39.74% of the remaining fusion events. Although we lost a small number of fusion events that had previously been validated, considering that 99.95% of putative fusion events were removed, our filtering strongly enriches our set for previously observed fusion events, i.e. likely true positives.
Although my set of filters provides a vast improvement in the validation rate of our set of fusion transcripts, it is clear that separating true positive from false positive fusions with high reliability remains a challenge. A high validation rate suggests a high specificity of a calling and filtering approach in excluding false positives. However, in many instances implementing our filtering pipeline also reduces the sensitivity, as fusion transcripts that validate by PCR are filtered out.
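The trade-off described here can be made concrete with a small sketch. It assumes that each PCR-tested transcript is recorded as a pair of flags (validated by PCR, kept by the filters). The function name and the toy records are hypothetical, and the sketch does not reproduce the projection step behind the reported validation rates. The fraction of kept calls that validate is labelled precision here, and the fraction of validating calls that are kept is the sensitivity.

```python
def precision_and_sensitivity(records):
    """Compute (precision, sensitivity) from PCR benchmarking records.

    records: iterable of (pcr_validated, kept_by_filter) boolean pairs.
    """
    records = list(records)  # allow generators; we iterate several times
    tp = sum(1 for validated, kept in records if validated and kept)
    fp = sum(1 for validated, kept in records if not validated and kept)
    fn = sum(1 for validated, kept in records if validated and not kept)
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, sensitivity


# Toy input, invented purely for illustration.
example_records = (
    [(True, True)] * 3      # validated and kept
    + [(False, True)]       # failed PCR but kept
    + [(True, False)] * 2   # validated but filtered out
    + [(False, False)]      # failed PCR and filtered out
)
example_precision, example_sensitivity = precision_and_sensitivity(example_records)
```

In this toy input, three of the four kept calls validate (precision 0.75), while only three of the five validating calls survive filtering (sensitivity 0.6), showing how a stricter filter can raise the first value while lowering the second.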
At the same time, a limitation of my approach is the set of benchmarking tools available to test filters against. While performing 945 PCR validations was without a doubt a big effort, these only represented a fraction of the fusion transcripts called. They also under-represent certain types of fusion calls, which may have skewed the analysis. For instance, the vast majority of PCR validations were performed on fusion transcripts that were called with more than 4 splitting reads, and there was an overrepresentation of fusion transcripts called by multiple algorithms.
Similarly, the analysis of shared fusions across the same 23 samples from different RNA-Seq sources was extremely valuable. It highlighted how single fusion-calling algorithms often called fusion transcripts that were not replicated in another sequencing source. Some variability observed between different cell lines is currently unexplained but may be related to data quality from either RNA-Seq data source.
The difficulty in obtaining a list of fusions that is comprehensive and free from false positives is not unique to our data-set. As mentioned in the introduction, multiple publications that examined RNA-Seq fusion calling algorithms agreed that there is room for improvement in the sensitivity and specificity of the algorithms (Carrara et al., 2013; Kumar et al., 2016). It is possible that a certain degree of noise in the data is unavoidable with the current tools available, especially as fusion-calling relies on the detection of non-standard read alignments that may be particularly error-prone. Notably, very few papers that conduct fusion-calling from RNA-Seq data report any validation rates beyond the validation of a selected few high-confidence hits. The recent TCGA analysis reports a validation rate of 63.3%, based on a subset of samples with available whole-genome sequencing data (Gao et al., 2018). Their method of validation differs from my method of benchmarking fusion transcripts based on 945 PCR validations and the overlap of shared fusions in 23 different samples. Nonetheless, the ~70% validation rate from our methods suggests that our fusion-calling and filtering approach performs relatively well. At the same time, this analysis shows that data produced by a single fusion-calling algorithm can be extremely unreliable, and caution is warranted when multiple algorithms are not used.
Future research may yield more insight into the nature and causes of sequencing and fusion-calling artefacts and could lead to advances in the sensitivity and specificity of computational fusion-calling algorithms and associated filters. Thus, in the future, it may be useful to reanalyse the RNA-Seq data for fusion calls, as any improvements in sensitivity and specificity can lead to substantial noise reduction in my downstream analyses.