Thesis Scope and Objectives - Interpretation of Mutations, Expression, Copy Number in Somatic B

In order to improve patient care through tailoring therapies based on disease characteristics, the field will require advancements in our ability to effectively interpret and analyze large genomic data. We hypothesize that improvements in genome-wide analyses - diagnostic tools and reagents for better detection of genetic abnormalities, mutation interpretation, and genomic signatures for chemotherapy response - can result in a more accurate understanding of tumour biology. This thesis introduces improvements in

both the design and analysis of experiments, and then applies these techniques to breast cancer. We describe the generation of new data, as well as leveraging existing data sets, with the same overall goal of validating the proposed methods and interpreting the results obtained.

Specifically, the main objectives of this thesis are to:

1) Improve the design and analysis of nucleic acid hybridization studies (specifically FISH and aCGH). We sought to develop a novel method to identify single copy regions in the genome that contain highly divergent repetitive elements, which we predicted to act as single copy sequences in optimized experimental conditions. We aimed to generate small, single copy FISH probes that contained divergent repetitive elements, to confirm their predicted behavior in metaphase FISH. In addition, we proposed that oligonucleotide placement throughout the genome (ie. distance to highly conserved repetitive elements) within these single copy regions can affect the variation observed in microarray signal intensities. Accordingly, we sought to develop an aCGH microarray to test this theory and compare the platform’s reproducibility to a commonly used commercial platform. Both FISH and aCGH methods were validated on normal samples, as well as samples with known genomic alterations.

2) The large number of predicted DNA variants arising from genome-wide studies creates new challenges to validate the effect of any given variant on the mRNA transcript and protein product. Splicing mutations represent a unique set of variants that can be validated using mRNA sequences. We aimed to develop a software tool that can conduct genome-wide, statistically robust validation of predicted splicing mutations using sample-matched RNA sequencing data.

3) Splicing mutations are currently underpresented in large genomic studies, given that the majority of experiments only assess the canonical nucleotides at an intron/exon boundary. We aimed to carry out indepth splicing mutation analyses on a large set of breast cancer tumours using previously published data from the TCGA. We hoped to identify new underlying processes of tumour biology not

previously described by protein-coding dominated studies. Further, we planned to validate these mutations using the software described in objective 2.

4) Current selection of the specific cytotoxic agents to be used for breast cancer patient care does not consider analyzing the tumour for genes involved in drug disposition. In addition to the classical pathological features, these genes may be informative in identifying which tumours are the most likely to respond to certain therapies. This thesis aimed to use machine learning to develop predictive models of breast cancer tumour sensitivity to paclitaxel and gemcitabine. Rather than completing a genome-wide study, we sought to start with a much smaller set of biologically-relevant genes based on what is known about paclitaxel and gemcitabine drug mechanisms of action. In addition, a set of FFPE tumour samples was obtained from patients that were treated with paclitaxel and gemcitabine, and whose response to these drugs is known. In addition to previously published data sets, we planned to validate the predictive gene signatures on these FFPE samples through nucleic acid extraction and analysis. In turn, we wanted to determine whether high quality data could be obtained from FFPE samples, and how suitable their use would be in studies involving chemosensitivity predictions.

References

1. Jemal, A. et al. Global cancer statistics. CA. Cancer J. Clin. 61, 69–90 (2011).

2. Canadian Cancer Society’s Advisory Committee on Cancer Statistics. Canadian Cancer Statistics 2014. Toronto, ON: Canadian Cancer Society. (2014).

3. Perou, C. M. et al. Molecular portraits of human breast tumours. Nature 406, 747– 752 (2000).

4. Sørlie, T. et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc. Natl. Acad. Sci. U. S. A. 98, 10869– 10874 (2001).

5. Herschkowitz, J. I. et al. Identification of conserved gene expression features between murine mammary carcinoma models and human breast tumors. Genome Biol. 8, R76 (2007).

6. Goldhirsch, A. et al. Strategies for subtypes-dealing with the diversity of breast cancer: Highlights of the St Gallen international expert consensus on the primary therapy of early breast cancer 2011. Ann. Oncol. 22, 1736–1747 (2011).

7. Van ’t Veer, L. J. et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530–536 (2002).

8. Prat, A. & Perou, C. M. Deconstructing the molecular portraits of breast cancer. Mol. Oncol. 5, 5–23 (2011).

9. Slamon, D. J. et al. Use of Chemotherapy plus a Monoclonal Antibody against HER2 for Metastatic Breast Cancer That Overexpresses HER2. N. Engl. J. Med.

344, 783–792 (2001).

10. Hanahan, D. & Weinberg, R. A. Hallmarks of cancer: the next generation. Cell 144,

646–674 (2011).

11. Hanahan, D. & Weinberg, R. A. The hallmarks of cancer. Cell 100, 57–70 (2000).

12. Banerji, S. et al. Sequence analysis of mutations and translocations across breast cancer subtypes. Nature 486, 405–409 (2012).

13. Ellis, M. J. et al. Whole-genome analysis informs breast cancer response to aromatase inhibition. Nature 486, 353–360 (2012).

14. Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).

15. Shah, S. P. et al. The clonal and mutational evolution spectrum of primary triple- negative breast cancers. Nature 486, 395–399 (2012).

16. Stephens, P. J. et al. The landscape of cancer genes and mutational processes in breast cancer. Nature 486, 400–404 (2012).

17. Levine, A. J., Momand, J. & Finlay, C. A. The p53 tumour suppressor gene. Nature

351, 453–456 (1991).

18. Bachman, K. E. et al. The PIK3CA gene is mutated with high frequency in human breast cancers. Cancer Biol. Ther. 3, 772–775 (2004).

19. Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer genes. Nature 499, 214–218 (2013).

20. Dees, N. D. et al. MuSiC: Identifying mutational significance in cancer genomes. Genome Res. 22, 1589–1598 (2012).

21. Youn, A. & Simon, R. Identifying cancer driver genes in tumor genome sequencing studies. Bioinforma. Oxf. Engl. 27, 175–181 (2011).

22. Greenman, C., Wooster, R., Futreal, P. A., Stratton, M. R. & Easton, D. F. Statistical Analysis of Pathogenicity of Somatic Mutations in Cancer. Genetics 173,

2187–2198 (2006).

23. Varela, I. et al. Exome sequencing identifies frequent mutation of the SWI/SNF complex gene PBRM1 in renal carcinoma. Nature 469, 539–542 (2011).

24. Pritchard, A. L. & Hayward, N. K. Molecular pathways: mitogen-activated protein kinase pathway mutations and drug resistance. Clin. Cancer Res. Off. J. Am. Assoc. Cancer Res. 19, 2301–2309 (2013).

25. Curtis, C. et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 486, 346–352 (2012).

26. Bauman, J. G., Wiegant, J., Borst, P. & van Duijn, P. A new method for fluorescence microscopical localization of specific DNA sequences by in situ hybridization of fluorochromelabelled RNA. Exp. Cell Res. 128, 485–490 (1980).

27. Levsky, J. M. & Singer, R. H. Fluorescence in situ hybridization: past, present and future. J. Cell Sci. 116, 2833–2838 (2003).

28. Cheung, V. G. et al. Integration of cytogenetic landmarks into the draft sequence of the human genome. Nature 409, 953–958 (2001).

29. Roohi, J., Cammer, M., Montagna, C. & Hatchwell, E. An improved method for generating BAC DNA suitable for FISH. Cytogenet. Genome Res. 121, 7–9 (2008).

30. Goorha, S., Glenn, M. J., Drozd-Borysiuk, E. & Chen, Z. A set of commercially available fluorescent in-situ hybridization probes efficiently detects cytogenetic abnormalities in patients with chronic lymphocytic leukemia. Genet. Med. Off. J. Am. Coll. Med. Genet. 6, 48–53 (2004).

31. Mallo, M. et al. Fluorescence in situ hybridization improves the detection of 5q31 deletion in myelodysplastic syndromes without cytogenetic evidence of 5q-. Haematologica 93, 1001–1008 (2008).

32. Kato, A., Albert, P. S., Vega, J. M. & Birchler, J. A. Sensitive fluorescence in situ hybridization signal detection in maize using directly labeled probes produced by high concentration DNA polymerase nick translation. Biotech. Histochem. Off. Publ. Biol. Stain Comm. 81, 71–78 (2006).

33. Rogan, P. K., Cazcarro, P. M. & Knoll, J. H. M. Sequence-Based Design of Single- Copy Genomic DNA Probes for Fluorescence In Situ Hybridization. Genome Res.

34. Alpár, D. et al. Automated FISH analysis using dual-fusion and break-apart probes on paraffin-embedded tissue sections. Cytom. Part J. Int. Soc. Anal. Cytol. 73, 651– 657 (2008).

35. Test and Technology Transfer Committee, American College of Medical Genetics, 9650 Rockville Pike, Bethesda, MD 20814-3998, United States. Technical and clinical assessment of fluorescence in situ hybridization: an ACMG/ASHG position statement. I. Technical considerations. Test and Technology Transfer Committee. Genet. Med. Off. J. Am. Coll. Med. Genet. 2, 356–361 (2000).

36. Wan, T. S. K. Cancer Cytogenetics: Methodology Revisited. Ann. Lab. Med. 34,

413–425 (2014).

37. Shaw, A. T. et al. Clinical features and outcome of patients with non-small-cell lung cancer who harbor EML4-ALK. J. Clin. Oncol. 27, 4247–4253 (2009).

38. Landstrom, A. P. & Tefferi, A. Fluorescent in situ hybridization in the diagnosis, prognosis, and treatment monitoring of chronic myeloid leukemia. Leuk. Lymphoma

47, 397–402 (2006).

39. Wolff, A. C. et al. American Society of Clinical Oncology/College of American Pathologists guideline recommendations for human epidermal growth factor receptor 2 testing in breast cancer. J. Clin. Oncol. 25, 118–145 (2007).

40. Perez, E. A. et al. Immunohistochemistry and fluorescence in situ hybridization assessment of HER2 in clinical trials of adjuvant therapy for breast cancer (NCCTG

N9831, BCIRG 006, and BCIRG 005). Breast Cancer Res. Treat. 138, 99–108 (2013).

41. Kallioniemi, A. et al. Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors. Science 258, 818–821 (1992).

42. Pinkel, D. et al. High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat. Genet. 20, 207–211 (1998).

43. Pollack, J. R. et al. Genome-wide analysis of DNA copy-number changes using cDNA microarrays. Nat. Genet. 23, 41–46 (1999).

44. Barrett, M. T. et al. Comparative genomic hybridization using oligonucleotide microarrays and total genomic DNA. Proc. Natl. Acad. Sci. U. S. A. 101, 17765– 17770 (2004).

45. Miller, D. T. et al. Consensus statement: chromosomal microarray is a first-tier clinical diagnostic test for individuals with developmental disabilities or congenital anomalies. Am. J. Hum. Genet. 86, 749–764 (2010).

46. Manning, M., Hudgins, L. & Professional Practice and Guidelines Committee. Array-based technology and recommendations for utilization in medical genetics practice for detection of chromosomal abnormalities. Genet. Med. Off. J. Am. Coll. Med. Genet. 12, 742–745 (2010).

47. Vissers, L. E. L. M., Veltman, J. A., van Kessel, A. G. & Brunner, H. G. Identification of disease genes by whole genome CGH arrays. Hum. Mol. Genet. 14 Spec No. 2, R215–223 (2005).

48. Andre, F. et al. Molecular Characterization of Breast Cancer with High-Resolution Oligonucleotide Comparative Genomic Hybridization Array. Clin. Cancer Res. 15,

441–451 (2009).

49. Schena, M., Shalon, D., Davis, R. W. & Brown, P. O. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270, 467– 470 (1995).

50. Schena, M. et al. Parallel human genome analysis: microarray-based expression monitoring of 1000 genes. Proc. Natl. Acad. Sci. U. S. A. 93, 10614–10619 (1996).

51. Hughes, T. R. et al. Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer. Nat. Biotechnol. 19, 342–347 (2001).

52. Eisen, M. B., Spellman, P. T., Brown, P. O. & Botstein, D. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. U. S. A. 95,

14863–14868 (1998).

53. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U. S. A.

54. Sanger, F., Nicklen, S. & Coulson, A. R. DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. U. S. A. 74, 5463–5467 (1977).

55. Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature

409, 860–921 (2001).

56. McPherson, J. D. et al. A physical map of the human genome. Nature 409, 934–941 (2001).

57. Mardis, E. R. The impact of next-generation sequencing technology on genetics. Trends Genet. TIG 24, 133–141 (2008).

58. Metzker, M. L. Sequencing technologies - the next generation. Nat. Rev. Genet. 11,

31–46 (2010).

59. Ng, S. B. et al. Targeted capture and massively parallel sequencing of 12 human exomes. Nature 461, 272–276 (2009).

60. Gnirke, A. et al. Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nat. Biotechnol. 27, 182–189 (2009).

61. Bentley, D. R. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–59 (2008).

62. Hatem, A., Bozdağ, D., Toland, A. E. & Çatalyürek, Ü. V. Benchmarking short sequence mapping tools. BMC Bioinformatics 14, 184 (2013).

63. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory- efficient alignment of short DNA sequences to the human genome. Genome Biol.

10, R25 (2009).

64. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).

65. Li, R. et al. SOAP2: an improved ultrafast tool for short read alignment. Bioinforma. Oxf. Engl. 25, 1966–1967 (2009).

66. Li, H., Ruan, J. & Durbin, R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 18, 1851–1858 (2008).

67. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinforma. Oxf. Engl. 25, 1754–1760 (2009).

68. Smith, A. D., Xuan, Z. & Zhang, M. Q. Using quality scores and longer reads improves accuracy of Solexa read mapping. BMC Bioinformatics 9, 128 (2008).

69. Burrows, M. & Wheeler, DJ. A block-sorting lossless data compression algorithm. Digit. Equip. Corp. (1994).

70. Ruffalo, M., LaFramboise, T. & Koyutürk, M. Comparative analysis of algorithms for next-generation sequencing read alignment. Bioinforma. Oxf. Engl. 27, 2790– 2796 (2011).

71. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).

72. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinforma. Oxf. Engl. 25, 2078–2079 (2009).

73. Pirooznia, M. et al. Validation and assessment of variant calling pipelines for next- generation sequencing. Hum. Genomics 8, 14 (2014).

74. Liu, X., Han, S., Wang, Z., Gelernter, J. & Yang, B.-Z. Variant Callers for Next- Generation Sequencing Data: A Comparison Study. PLoS ONE 8, (2013).

75. Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013).

76. Larson, D. E. et al. SomaticSniper: identification of somatic point mutations in whole genome sequencing data. Bioinforma. Oxf. Engl. 28, 311–317 (2012).

77. Saunders, C. T. et al. Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinforma. Oxf. Engl. 28, 1811–1817 (2012).

78. Ding, J. et al. Feature-based classifiers for somatic mutation detection in tumour– normal paired sequencing data. Bioinformatics 28, 167–175 (2012).

79. Sherry, S. T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311 (2001).

80. Smigielski, E. M., Sirotkin, K., Ward, M. & Sherry, S. T. dbSNP: a database of single nucleotide polymorphisms. Nucleic Acids Res. 28, 352–355 (2000).

81. Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).

82. Kang, H. J., Choi, K. O., Kim, B.-D., Kim, S. & Kim, Y. J. FESD: a Functional Element SNPs Database in human. Nucleic Acids Res. 33, D518–522 (2005).

83. Kumar, P., Henikoff, S. & Ng, P. C. Predicting the effects of coding non- synonymous variants on protein function using the SIFT algorithm. Nat. Protoc. 4,

1073–1081 (2009).

84. Adzhubei, I., Jordan, D. M. & Sunyaev, S. R. Predicting functional effect of human missense mutations using PolyPhen-2. Curr. Protoc. Hum. Genet. Editor. Board Jonathan Haines Al Chapter 7, Unit7.20 (2013).

85. Calabrese, R., Capriotti, E., Fariselli, P., Martelli, P. L. & Casadio, R. Functional annotations improve the predictive score of human disease-related mutations in proteins. Hum. Mutat. 30, 1237–1244 (2009).

86. Shirley, B. C. et al. Interpretation, stratification and evidence for sequence variants affecting mRNA splicing in complete human genome sequences. Genomics Proteomics Bioinformatics 11, 77–85 (2013).

87. Desmet, F.-O. et al. Human Splicing Finder: an online bioinformatics tool to predict splicing signals. Nucleic Acids Res. 37, e67 (2009).

88. Pabinger, S. et al. A survey of tools for variant analysis of next-generation genome sequencing data. Brief. Bioinform. 15, 256–278 (2014).

89. Rehm, H. L. et al. ACMG clinical laboratory standards for next-generation sequencing. Genet. Med. Off. J. Am. Coll. Med. Genet. 15, 733–747 (2013).

90. Anderson, M. W. & Schrijver, I. Next Generation DNA Sequencing and the Future of Genomic Medicine. Genes 1, 38–69 (2010).

91. Easton, D. F., Ford, D. & Bishop, D. T. Breast and ovarian cancer incidence in BRCA1-mutation carriers. Breast Cancer Linkage Consortium. Am. J. Hum. Genet.

56, 265–271 (1995).

92. Howlader N, Noone AM, Krapcho M, Garshell J, Miller D, Altekruse SF, Kosary CL, Yu M, Ruhl J, Tatalovich Z,Mariotto A, Lewis DR, Chen HS, Feuer EJ, Cronin KA (eds). SEER Cancer Statistics Review, 1975-2011, National Cancer Institute. Bethesda, MD, http://seer.cancer.gov/csr/1975_2011/, based on November 2013 SEER data submission, posted to the SEER web site, April 2014.

93. Thusberg, J. & Vihinen, M. Pathogenic or not? And if so, then how? Studying the effects of missense mutations using bioinformatics methods. Hum. Mutat. 30, 703– 714 (2009).

94. Jurica, M. S. & Moore, M. J. Pre-mRNA Splicing: Awash in a Sea of Proteins. Mol. Cell 12, 5–14 (2003).

95. Burset, M., Seledtsov, I. A. & Solovyev, V. V. Analysis of canonical and non- canonical splice sites in mammalian genomes. Nucleic Acids Res. 28, 4364–4375 (2000).

96. Pagani, F. & Baralle, F. E. Opinion: Genomic variants in exons and introns: identifying the splicing spoilers. Nat. Rev. Genet. 5, 389–396 (2004).

97. Caminsky, N., Mucaki, E. J. & Rogan, P. K. Interpretation of mRNA splicing mutations in genetic disease: review of the literature and guidelines for information- theoretical analysis. F1000Research 3, (2014).

98. Rogan, P. K., Svojanovsky, S. & Leeder, J. S. Information theory-based analysis of CYP2C19, CYP2D6 and CYP3A5 splicing mutations. Pharmacogenetics 13, 207– 218 (2003).

99. Modrek, B. & Lee, C. A genomic view of alternative splicing. Nat. Genet. 30, 13– 19 (2002).

100. Brudno, M. et al. Computational analysis of candidate intron regulatory elements for tissue-specific alternative pre-mRNA splicing. Nucleic Acids Res. 29, 2338– 2348 (2001).

101. Wessagowit, V., Nalla, V. K., Rogan, P. K. & McGrath, J. A. Normal and abnormal mechanisms of gene splicing and relevance to inherited skin diseases. J. Dermatol. Sci. 40, 73–84 (2005).

102. Krawczak, M., Reiss, J. & Cooper, D. N. The mutational spectrum of single base- pair substitutions in mRNA splice junctions of human genes: causes and consequences. Hum. Genet. 90, 41–54 (1992).

103. Ars, E. et al. Mutations affecting mRNA splicing are the most common molecular defects in patients with neurofibromatosis type 1. Hum. Mol. Genet. 9, 237–247 (2000).

104. McLaren, W. et al. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics 26, 2069–2070 (2010).

105. Mucaki, E. J., Shirley, B. C. & Rogan, P. K. Prediction of Mutant mRNA Splice Isoforms by Information Theory-Based Exon Definition. Hum. Mutat. 34, 557–565 (2013).

106. Schneider, T. D. Information content of individual genetic sequences. J. Theor. Biol.

189, 427–441 (1997).

107. Pertea, M., Lin, X. & Salzberg, S. L. GeneSplicer: a new computational method for splice site prediction. Nucleic Acids Res. 29, 1185–1190 (2001).

108. Cartegni, L., Wang, J., Zhu, Z., Zhang, M. Q. & Krainer, A. R. ESEfinder: a web resource to identify exonic splicing enhancers. Nucleic Acids Res. 31, 3568–3571 (2003).

109. Yeo, G. & Burge, C. B. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J. Comput. Biol. J. Comput. Mol. Cell Biol.

11, 377–394 (2004).

110. Reese, M. G., Eeckman, F. H., Kulp, D. & Haussler, D. Improved splice site detection in Genie. J. Comput. Biol. J. Comput. Mol. Cell Biol. 4, 311–323 (1997).

In document Interpretation of Mutations, Expression, Copy Number in Somatic Breast Cancer: Implications for Metastasis and Chemotherapy (Page 56-78)