CHAPTER 4 DOWN SYNDROME PREDICTION AND SCREENING MODEL
4.7 VISUALIZATION OF FEATURE MAPS AND TRAINED FILTERS OF BI-
In this section, we visualized the trained filters and feature maps from intermediate convolutional hidden layers of our trained bi-stream CNN model. The bi-stream CNN model had a few advantages when compared with traditional machine learning algorithms.
First of all, we used chromosome maps to represent the genotyping array information, which converted one-dimensional genome data to images. Secondly, we used 16 convolutional 3x3 size kernels to capture local genomic features and detect patterns from adjacent genes and SNP sites from two chromosome SNP maps. Thirdly, two-branch CNN model could capture the genomic features from two chromosomes at the same time. Figure 4.4 A and B showed the output feature maps and their corresponding trained filters from convolutional layer C1 of each branch CNN models. Some trained filters could highlight the most important and informative SNP sites from the chromosome SNP maps, and neglect less informative ones (marked as yellow squares). The green rectangles showed that our trained filters could sharpen input images and capture local motifs, which represented the correlated variations patterns in genome regions. The bi-stream model could also detect continuous gene and SNP intensity variations by capturing adjacent variation patterns in line (marked as white rectangles).
Our bi-stream CNN model could detect the simultaneous or causal SNPs variations in human genomes. These genome characterizations and extracted genomic patterns provided signals to classify DS and normal samples. However, traditional machine learning algorithms tended to build models with a global view from all available features and treated each feature independently. Therefore, they were hard to extract signals from regional genomic patterns and correlations between adjacent genes and SNPs sites.
Figure 4. 4 Visualization of feature maps and trained filter weights from convolutional layer C1 of Figure 4.3. Figure a, b, c and d in figure (A) represent four feature maps from convolutional layer C1 of lower branch CNN model (shown in Figure1). Figure e, f, g, and h in figure (A) are the corresponding 3x3 kernels weights of Figure a, b, c, and d. Figure a, b, c and d in Figure (B) represent four feature maps from convolutional layer C1 of the upper branch CNN model. Figure e, f, g, and h in figure (B) are the corresponding 3x3 kernels weights for Figure a, b, c, and d.
4.8 DISCUSSION
Previous studies illustrated that gene expressions and SNP variations were highly correlated within local genome regions (Bulik-Sullivan et al., 2015; Duerr et al., 2006; Farh et al., 2015). Genome-wide association studies also demonstrated that human DS was usually associated with many gene copy number and SNPs variations, and many unidentified genomic abnormalities (Jain et al., 2016; Petry et al., 2017; Sailani et al., 2013). In this study, our bi-stream CNN model could learn the genomic features and associated
variations among adjacent genes and SNP sites from chromosome SNP maps. Currently, human DS treatments only have limited effects. They cannot cure DS fundamentally either. At the same time, there isn’t any clear effect or benefit on human DS treatments by using traditional drugs (K. J. Gardiner, 2015; Kishnani et al., 2010; Lott et al., 2011; Mohan, Bennett, & Carpenter, 2009). The feature maps and extracted genome features could identify DS related markers and pathway components. These genome features explained the genomic characteristics and pathological mechanisms of human DS, which could be further applied in gene therapy and genetic medicine developments.
An accurate non-invasive DS screening method offers a low-risk way to screen human DS. It helps low-risk patients avoid taking further invasive diagnostic procedures, which could result in fetal loss. Nowadays, genotyping array analyses on fetal genomes could be performed on the trophoblast cells with non-invasive procedures after the fifth week gestation (Jain et al., 2016; Petry et al., 2017). In this study, we developed a novel method to construct accurate DS screening model by using bi-stream CNN and genotyping array data. The results showed that our bi-stream CNN model had the best performance in every evaluation metric when compared with two single-stream CNN models and three traditional machine learning models. The CNN model achieved over 99.3% accuracies, as well as very low false positive and false negative rates. It was very important to disease prediction and medical practice. Even though traditional machine learning algorithms obtained over 96% accuracies. Their high false-negative rates are not suitable for clinical screening tests. Traditional machine learning algorithms treated each SNP sites as single feature independently. They were hard to extract signals from regional genomic patterns and variation correlations between adjacent genes and SNPs sites. Although the single-
stream models could extract features and patterns from local genome features and adjacent SNP sites, they could only learn these features from one single chromosome, which completely neglected the genomic patterns of the other one. In deep learning studies, large datasets were great obstacles in the model construction and optimization. We used each pixel to represent the intensities of SNP site, and used chromosome SNP maps to represent the genome information, which significantly reduced data and model complexity. Furthermore, our bi-stream CNN architecture could learn local genomic patterns and extracted regional features, which could also be applied to build prediction models from genotyping array data for more diseases.
REFERENCES
Alekseyev, M., & Pevzner, P. A. (2009). Breakpoint graphs and ancestral genome reconstructions. Genome research, gr–082784.
Anbarasi, M., Anupriya, E., & Iyengar, N. (2010). Enhanced prediction of heart disease with feature subset selection using genetic algorithm. International Journal of Engineering Science and Technology, 2(10), 5370–5376.
Antonarakis, S. E. (2016). Down syndrome and the complexity of genome dosage imbalance. Nature Reviews Genetics.
Applegate, D., Bixby, R., Chvatal, V., & Cook, W. (2006). Concorde tsp solver, 2006. URL http://www. tsp. gatech. edu/concorde.
Avdeyev, P., Jiang, S., Aganezov, S., Hu, F., & Alekseyev, M. A. (2016). Reconstruction of ancestral genomes in presence of gene gain and loss. Journal of Computational Biology. Bagshaw, S. M., Berthiaume, L. R., Delaney, A., & Bellomo, R. (2008). Continuous versus intermittent renal replacement therapy for critically ill patients with acute kidney injury: a meta-analysis. Critical care medicine, 36(2), 610–617.
Bellomo, R., Kellum, J. A., & Ronco, C. (2012). Acute kidney injury. The Lancet, 380(9843), 756–766.
Bevc, S., Hojs, N., Hojs, R., Ekart, R., Gorenjak, M., & Puklavec, L. (2017). Estimation of glomerular filtration rate in elderly chronic kidney disease patients: Comparison of three
novel sophisticated equations and simple cystatin c equation. Therapeutic Apheresis and Dialysis.
Bhargava, P. (2010). Epigenetics to proteomics: from yeast to brain. Proteomics, 10(4), 749–770.
Bhutkar, A., Schaeffer, S. W., Russo, S. M., Xu, M., Smith, T. F., & Gelbart, W. M. (2008). Chromosomal rearrangement inferred from comparisons of 12 drosophila genomes. Genetics, 179(3), 1657–1680.
Blanchette, M., Bourque, G., & Sankoff, D. (1997). Breakpoint phylogenies. Genome informatics, 8, 25–34.
Boore, J. L. (2006). The use of genome-level characters for phylogenetic reconstruction. Trends in Ecology & Evolution, 21(8), 439–446.
Brock, D. J., & Sutcliffe, R. G. (1972). Alpha-fetoprotein in the antenatal diagnosis of anencephaly and spina bifida. The Lancet, 300(7770), 197–199.
Bulik-Sullivan, B., Finucane, H. K., Anttila, V., Gusev, A., Day, F. R., Loh, P.-R., . . . others (2015). An atlas of genetic correlations across human diseases and traits. Nature genetics, 47(11), 1236–1241.
Byrne, K. P., & Wolfe, K. H. (2005). The yeast gene order browser: combining curated homology and syntenic context reveals gene fate in polyploid species. Genome research, 15(10), 1456–1461.
Canales, M. T., Blackwell, T., Ishani, A., Taylor, B. C., Hart, A., Beyth, R. J., & Ensrud, K. E. (2017). Renal function and death in older women: Which egfr formula should we use? International journal of nephrology, 2017.
Celik, E., Atalay, M., & Kondiloglu, A. (2016). The diagnosis and estimate of chronic kidney disease using the machine learning methods. International Journal of Intelligent Systems and Applications in Engineering.
Chauve, C., Gavranovic, H., Ouangraoua, A., & Tannier, E. (2010). Yeast ancestral genome reconstructions: the possibilities of computational methods ii. Journal of Computational Biology, 17(9), 1097–1112.
Chavez, M. C. (2016). Hippotherapy versus aquatic therapy use in early intervention physical therapy in children with down syndrome (Unpublished doctoral dissertation). Division of Physical Therapy, School of Medicine, University of New Mexico.
Chawla, L. S., Eggers, P. W., Star, R. A., & Kimmel, P. L. (2014). Acute kidney injury and chronic kidney disease as interconnected syndromes. New England Journal of Medicine, 371(1), 58–66.
Chen, Z., Zhang, X., & Zhang, Z. (2016). Clinical risk assessment of patients with chronic kidney disease by using clinical data and multivariate models. International urology and nephrology, 48(12), 2069–2075.
Coresh, J., Selvin, E., Stevens, L. A., Manzi, J., Kusek, J. W., Eggers, P., . . . Levey, A. S. (2007). Prevalence of chronic kidney disease in the united states. Jama, 298(17), 2038– 2047.
Delsuc, F., Brinkmann, H., & Philippe, H. (2005). Phylogenomics and the reconstruction of the tree of life. Nature Reviews Genetics, 6(5), 361–375.
Dierssen, M., & de la Torre, R. (2012). Pathways to cognitive deficits in down syndrome. Down Syndrome: From Understanding the Neurobiology to Therapy, 197, 73.
Drillon, G., Carbone, A., & Fischer, G. (2011). Combinatorics of chromosomal rearrangements based on synteny blocks and synteny packs. Journal of Logic and Computation, 23(4), 815–838.
Drillon, G., Carbone, A., & Fischer, G. (2014). Synchro: a fast and easy tool to reconstruct and visualize synteny blocks along eukaryotic chromosomes. PLoS One, 9(3), e92621. Driscoll, D. A., & Gross, S. (2009). Prenatal screening for aneuploidy. New England Journal of Medicine, 360(24), 2556–2562.
Duerr, R. H., Taylor, K. D., Brant, S. R., Rioux, J. D., Silverberg, M. S., Daly, M. J., . . . others (2006). A genome-wide association study identifies il23r as an inflammatory bowel disease gene. science, 314(5804), 1461–1463.
Ehrich, M., Deciu, C., Zwiefelhofer, T., Tynan, J. A., Cagasan, L., Tim, R., . . . others (2011). Noninvasive detection of fetal trisomy 21 by sequencing of dna in maternal blood: a study in a clinical setting. American journal of obstetrics and gynecology, 204(3), 205– e1.
Esteva, A., Kuprel, B., Novoa, R. A., Ko, J., Swetter, S. M., Blau, H. M., & Thrun, S. (2017). Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542(7639), 115–118.
Farh, K. K.-H., Marson, A., Zhu, J., Kleinewietfeld, M., Housley, W. J., Beik, S., others (2015). Genetic and epigenetic fine-mapping of causal autoimmune disease variants. Nature, 518(7539), 337.
Faust, O., Acharya, U. R., Sudarshan, V. K., San Tan, R., Yeong, C. H., Molinari, F., & Ng, K. H. (2016). Computer aided diagnosis of coronary artery disease, myocardial infarction and carotid atherosclerosis using ultrasound images: A review. Physica Medica.
Feijão, P., & Meidanis, J. (2009). Scj: a variant of breakpoint distance for which sorting, genome median and genome halving problems are easy. In International workshop on algorithms in bioinformatics (pp. 85–96).
Feng, B., Zhou, l., & Tang, J. (2017). Ancestral genome reconstruction on whole genome level. Current Genomics, accepted(0), 0.
Fertin, G. (2009). Combinatorics of genome rearrangements. MIT press. Figueroa, D. F., & Baco, A. R. (2015). Octocoral mitochondrial genomes provide insights into the phylogenetic history of gene order rearrangements, order reversals, and cnidarian phylogenetics. Genome biology and evolution, 7(1), 391–409.
Frickey, T., & Lupas, A. N. (2004). Phylogenie: automated phylome generation and analysis. Nucleic acids research, 32(17), 5231–5238.
Friedberg, R., Darling, A. E., & Yancopoulos, S. (2008). Genome rearrangement by the double cut and join operation. Bioinformatics: Data, Sequence Analysis and Evolution, 385–416.
Gagnon, Y., Blanchette, M., & El-Mabrouk, N. (2012). A flexible ancestral genome reconstruction method based on gapped adjacencies. BMC bioinformatics, 13(Suppl 19), S4.
Gao, N., Zhang, Y., Feng, B., & Tang, J. (2015). A cooperative co-evolutionary genetic algorithm for tree scoring and ancestral genome inference. Computational Biology and Bioinformatics, IEEE/ACM Transactions on, 12(6), 1248–1254.
Gardiner, K., Herault, Y., Lott, I. T., Antonarakis, S. E., Reeves, R. H., & Dierssen, M. (2010). Down syndrome: from understanding the neurobiology to therapy. Journal of Neuroscience, 30(45), 14943–14945.
Gardiner, K. J. (2015). Pharmacological approaches to improving cognitive function in down syndrome: current status and considerations. Drug Des Devel Ther, 9,103–125. Ghiurcuta, C. G., & Moret, B. M. (2014). Evaluating synteny for improved comparative studies. Bioinformatics, 30(12), i9–i18.
Gordon, J. L., Armisén, D., Proux-Wéra, E., ÓhÉigeartaigh, S. S., Byrne, K. P., &
Wolfe, K. H. (2011). Evolutionary erosion of yeast sex chromosomes by mating-type switching accidents. Proceedings of the National Academy of Sciences, 108(50), 20024– 20029.
Gordon, J. L., Byrne, K. P., & Wolfe, K. H. (2009). Additions, losses, and rearrangements on the evolutionary route from a reconstructed ancestor to the modern saccharomyces cerevisiae genome. PLoS genetics, 5(5), e1000485.
Gray, K. R., Aljabar, P., Heckemann, R. A., Hammers, A., Rueckert, D., Initiative,
A. D. N., et al. (2013). Random forest-based similarity measures for multi-modal classification of alzheimer’s disease. NeuroImage, 65, 167–175.
Greenspan, S. I., Wieder, S., & Simons, R. (1998). The child with special needs: Encouraging intellectual and emotional growth. Addison-Wesley/Addison Wesley Longman.
Guralnick, M. J. (2010). Early intervention approaches to enhance the peer-related social competence of young children with developmental delays: A historical perspective. Infants and young children, 23(2), 73.
Hedtke, S. M., Townsend, T. M., & Hillis, D. M. (2006). Resolution of phylogenetic conflict in large data sets by increased taxon sampling. Systematic Biology, 55(3), 522– 529.
Higuera, C., Gardiner, K. J., & Cios, K. J. (2015). Self-organizing feature maps identify proteins critical to learning in a mouse model of down syndrome. PloS one, 10(6), e0129126.
Höhl, M., & Ragan, M. A. (2007). Is multiple-sequence alignment required for accurate inference of phylogeny? Systematic Biology, 56(2), 206–221.
Hu, F., Zhou, J., Zhou, L., & Tang, J. (2014). Probabilistic reconstruction of ancestral gene orders with insertions and deletions. Computational Biology and Bioinformatics, IEEE/ACM Transactions on, 11(4), 667–672.
Hu, F., Zhou, L., & Tang, J. (2013). Reconstructing ancestral genomic orders using binary encoding and probabilistic models. In Bioinformatics research and applications (pp. 17– 27). Springer.
Jain, C. V., Kadam, L., van Dijk, M., Kohan-Ghadr, H.-R., Kilburn, B. A., Hartman,C., others (2016). Fetal genome profiling at 5 weeks of gestation after noninvasive isolation of trophoblast cells from the endocervical canal. Science translational medicine, 8(363), 363re4–363re4.
Jean, G., Sherman, D. J., & Nikolski, M. (2009). Mining the semantics of genome super- blocks to infer ancestral architectures. Journal of Computational Biology, 16(9), 1267– 1284.
Jha, V., Garcia-Garcia, G., Iseki, K., Li, Z., Naicker, S., Plattner, B., . . . Yang, C.-W. (2013). Chronic kidney disease: global dimension and perspectives. The Lancet, 382(9888), 260–272.
Jones, B. R., Rajaraman, A., Tannier, E., & Chauve, C. (2012). Anges: reconstructing ancestral genomes maps. Bioinformatics, 28(18), 2388–2390.
Kishnani, P. S., Heller, J. H., Spiridigliozzi, G. A., Lott, I., Escobar, L., Richardson, S.. McRae, T. (2010). Donepezil for treatment of cognitive dysfunction in children with down syndrome aged 10–17. American Journal of Medical Genetics Part A, 152(12), 3028–3035. Kleschevnikov, A. M., Yu, J., Kim, J., Lysenko, L. V., Zeng, Z., Yu, Y. E., & Mobley, W. C. (2017). Evidence that increased kcnj6 gene dose is necessary for deficits in behavior and dentate gyrus synaptic plasticity in the ts65dn mouse model of down syndrome. Neurobiology of Disease, 103, 1–10.
Kuehn, B. M. (2016). Treating trisomies: Prenatal down’s syndrome therapies explored in mice. Nature Publishing Group. Kurtzman, C., Fell, J. W., & Boekhout, T. (2011). The yeasts: a taxonomic study (Vol. 1). Elsevier.
Lameire, N., & Van Biesen, W. (2010). The initiation of renal-replacement therapy: Just- in-time delivery. New England Journal of Medicine, 363(7), 678–680.
Levey, A. S., & Coresh, J. (2012). Chronic kidney disease. The Lancet, 379(9811), 165– 180.
Levey, A. S., Coresh, J., Balk, E., Kausz, A. T., Levin, A., Steffes, M. W., . . . Eknoyan, G. (2003). National kidney foundation practice guidelines for chronic kidney disease: evaluation, classification, and stratification. Annals of internal medicine, 139(2), 137–147. Levey, A. S., Eckardt, K.-U., Tsukamoto, Y., Levin, A., Coresh, J., Rossert, J., . . . Eknoyan, G. (2005). Definition and classification of chronic kidney disease: a position statement from kidney disease: Improving global outcomes (kdigo). Kidney international, 67(6), 2089–2100.
Levey, A. S., Stevens, L. A., Schmid, C. H., Zhang, Y. L., Castro, A. F., Feldman, H. I.. others (2009). A new equation to estimate glomerular filtration rate. Annals of internal medicine, 150(9), 604–612.
Levey, A. S., Coresh, J., Bolton, K., Culleton, B., Harvey, K. S., Ikizler, T. A., . . . others (2002). K/doqi clinical practice guidelines for chronic kidney disease: evaluation, classification, and stratification. American Journal of Kidney Diseases, 39 (2 SUPPL. 1). Levin, A., Stevens, P. E., Bilous, R. W., Coresh, J., De Francisco, A. L., De Jong, P. E.. others (2013). Kidney disease: Improving global outcomes (kdigo) ckd work group. kdigo 2012 clinical practice guideline for the evaluation and management of chronic kidney disease. Kidney International Supplements, 3(1), 1–150.
Lin, Y., Hu, F., Tang, J., & Moret, B. (2013). Maximum likelihood phylogenetic reconstruction from high-resolution whole-genome data and a tree of 68 eukaryotes. In Pacific symposium on biocomputing (pp. 357–366).
Lott, I. T., Doran, E., Nguyen, V. Q., Tournay, A., Head, E., & Gillen, D. L. (2011). Down syndrome and dementia: a randomized, controlled trial of antioxidant supplementation. American journal of medical genetics Part A, 155 (8), 1939–1948.
Luo, H., Arndt, W., Zhang, Y., Shi, G., Alekseyev, M. A., Tang, J., . . . Friedman, R. (2012). Phylogenetic analysis of genome rearrangements among five mammalian orders. Molecular phylogenetics and evolution, 65(3), 871–882.
Ma, J. (2010). A probabilistic framework for inferring ancestral genomic orders. In Bioinformatics and biomedicine (bibm), 2010 ieee international conference on (pp. 179– 184).
Marcet-Houben, M., & Gabaldón, T. (2015). Beyond the whole-genome duplication: Phylogenetic evidence for an ancient interspecies hybridization in the baker’s yeast lineage. PLoS biology, 13 (8).
McKinney, W. (2012). Python for data analysis: Data wrangling with pandas, numpy, and ipython. " O’Reilly Media, Inc.".
McMahon, G. M., Hwang, S.-J., Clish, C. B., Tin, A., Yang, Q., Larson, M. G., . . . others (2017). Urinary metabolites along with common and rare genetic variations are associated with incident chronic kidney disease. Kidney International, 91(6), 1426–1435.
Mohan, M., Bennett, C., & Carpenter, P. K. (2009). Memantine for dementia in people with down syndrome. The Cochrane Library.
Morton, R., Tong, A., Howard, K., Snelling, P., & Webster, A. (2010). The views of patients and carers in treatment decision making for chronic kidney disease: systematic review and thematic synthesis of qualitative studies. BmJ, 340, c112.
Neves, J., Martins, M. R., Vilhena, J., Neves, J., Gomes, S., Abelha, A., . . . Vicente, H. (2015). A soft computing approach to kidney diseases evaluation. Journal of medical systems, 39(10), 131.
Nguyen, C. D., Costa, A. C., Cios, K. J., & Gardiner, K. J. (2011). Machine learning methods predict locomotor response to mk-801 in mouse models of down syndrome. Journal of neurogenetics, 25(1-2), 40–51.
Palomaki, G. E., Kloza, E. M., Lambert-Messerlian, G. M., Haddow, J. E., Neveux, L. M., Ehrich, M.(2011). Dna sequencing of maternal plasma to detect down syndrome: an international clinical validation study. Genetics in medicine, 13(11), 913–920.
Park, Y., & Kellis, M. (2015). Deep learning for regulatory genomics. Nature biotechnology, 33(8), 825–826.
Parker, S. E., Mai, C. T., Canfield, M. A., Rickard, R., Wang, Y., Meyer, R. E., others (2010). Updated national birth prevalence estimates for selected birth defects in the united states, 2004–2006. Birth Defects Research Part A: Clinical and Molecular Teratology, 88(12), 1008–1016.
Patterson, D. (2009). Molecular genetic analysis of down syndrome. Human genetics, 126(1), 195–214.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., . . . others (2011). Scikit-learn: Machine learning in python. Journal of Machine Learning Research, 12(Oct), 2825–2830.
Perrin, A., Varré, J.-S., Blanquart, S., & Ouangraoua, A. (2015). Procars: Progressive reconstruction of ancestral gene orders. BMC genomics, 16(5), 1.
Petry, C., Mooslehner, K., Prentice, P., Hayes, M., Nodzenski, M., Scholtens, D., . . . others (2017). Associations between a fetal imprinted gene allele score and late pregnancy