
3.2 Background

3.2.1 Sequence-based comparative analysis

Over the past 40 years, genome sequencing and comparative analysis have improved immensely, and the ability to sequence genomes inexpensively has made sequence data available for species beyond those of agricultural or medical interest (Damas et al., 2017). Today, high-quality genome sequence information is available for many mammalian and other vertebrate species, enabling genome assembly and comparison of evolutionarily divergent species. The ultimate goal of any genome assembly effort is to produce a sequence contig that spans the entire length of a chromosome, from the p terminus to the q terminus (a chromosome-level assembly) (Damas et al., 2017). However, it has become apparent that draft, and even some so-called ‘chromosome-level’, genome assemblies fail to span the entire length of a given chromosome, resulting in sub-chromosome-sized scaffolds (Damas et al., 2017). Moreover, such assemblies fail to provide sufficient comparative information on chromosome structure and organisation (Lewin et al., 2009). Contiguity of de novo genome assemblies ensures completeness and is essential for structural variation and linkage analysis (Jiao et al., 2017).

Newer technologies, such as PacBio long-read sequencing (LRS; reads >10 kb) and Dovetail long-range scaffolding, were expected to overcome the limitations of assembling NGS short reads (150-300 bp) to chromosome level (Mantere, Kersten and Hoischen, 2019). However, unforeseen restrictions are emerging, particularly in library preparation: LRS technologies require fresh material or intact cells, and protocols for isolating and handling ultra-long, high-molecular-weight DNA still require improvement. Additionally, contigs do not span chromosomal centromeres or heterochromatin blocks (Damas et al., 2017). De novo genome assemblies are therefore often highly fragmented, and additional algorithms are required to order NGS or LRS scaffolds into larger, chromosome-scale structures. RACA (reference-assisted chromosome assembly) is one such algorithm: using a fully assembled reference genome from the same taxonomic order as the target species, it orders and orients NGS sequence scaffolds, producing sub-chromosome-sized predicted chromosome fragments (PCFs) (Kim et al., 2013). Computational tools such as the Evolution Highway chromosome browser can then be used to visualise the assembled data and compare the genomes of multiple species (Larkin and Farre-Belmonte, 2014). For example, Larkin and Farre-Belmonte (2014) compared the genomes of 11 mammalian species, including pig, using the Satsuma synteny program; the resulting synteny blocks were visualised in Evolution Highway and comparative chromosome locations were established.
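
The core idea behind reference-guided scaffold ordering can be illustrated with a minimal sketch. It assumes each target scaffold has already been aligned to a reference genome and reduced to a single best hit (reference chromosome, start, end, strand); the `ScaffoldHit` record, the `order_scaffolds` function and the 5 Mb `max_gap` break threshold are hypothetical illustrations. RACA itself combines several additional lines of evidence (Kim et al., 2013) and is not reproduced here; the sketch only shows how scaffolds can be grouped, ordered and oriented along a reference to yield fragment-like units analogous to PCFs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ScaffoldHit:
    """Best alignment of one target scaffold to the reference (hypothetical input)."""
    scaffold: str   # target scaffold ID
    ref_chrom: str  # reference chromosome the scaffold aligns to
    ref_start: int  # alignment start on the reference (bp)
    ref_end: int    # alignment end on the reference (bp)
    strand: str     # "+" = same orientation as reference, "-" = reverse complement


def order_scaffolds(hits, max_gap=5_000_000):
    """Simplified reference-guided ordering (NOT the RACA algorithm).

    Groups scaffolds by reference chromosome, sorts them by reference
    position, records each scaffold's orientation, and starts a new
    fragment whenever the gap to the previous hits exceeds max_gap.
    Returns a list of fragments, each a list of (scaffold, strand) tuples.
    """
    by_chrom = defaultdict(list)
    for hit in hits:
        by_chrom[hit.ref_chrom].append(hit)

    fragments = []
    for chrom in sorted(by_chrom):
        current, last_end = [], None
        for hit in sorted(by_chrom[chrom], key=lambda h: h.ref_start):
            # Break the fragment if this scaffold lies too far from the previous ones.
            if last_end is not None and hit.ref_start - last_end > max_gap:
                fragments.append(current)
                current = []
            current.append((hit.scaffold, hit.strand))
            last_end = hit.ref_end if last_end is None else max(last_end, hit.ref_end)
        fragments.append(current)
    return fragments
```

For example, hits for scaf1 (chr1, 0 to 1.5 Mb, "+"), scaf3 (chr1, 2 to 3 Mb, "-") and scaf7 (chr1, 20 to 21 Mb, "+") yield two fragments on chr1, because the 17 Mb gap between scaf3 and scaf7 exceeds the assumed threshold.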

In 2017, a novel approach was developed to upgrade fragmented, de novo sequenced NGS genome assemblies to chromosome level. The technique used a combination of computational algorithms, including RACA, to order scaffolds into PCFs. PCR and computational verification were then applied to validate correct placement. Finally, PCFs were physically mapped onto metaphase chromosomes of the target species using a universal set of avian BAC probes (Damas et al., 2017). This approach successfully upgraded the fragmented NGS genome assemblies of five avian species (pigeon, peregrine falcon, budgerigar, saker falcon and ostrich). The resulting chromosome-level assemblies contained >80% of the genome and were comparable to assemblies produced by similar sequencing and mapping techniques (O’Connor et al., 2018). To date, this combined approach has been limited to avian species, with the exception of Larkin et al. (2006b), who reported a similar in silico and cytogenetic technique in mammals, whereby BACs assigned to cattle chromosome 19 (BTA19) were mapped to mink chromosome 8 by FISH. In an earlier study, Larkin et al. (2006a) used BLASTn similarity searches to anchor selected cattle BACs to human chromosome 17 (HSA17) and mouse chromosome 11 (MMU11) sequences, observing five blocks of synteny in the comparative map of BTA19 and HSA17. Larkin et al. (2006b) then expanded on this work by hybridising seven of the BACs selected in that study to metaphase chromosomes of the mink. Successful hybridisations were observed throughout, demonstrating that BACs selected through in silico genome conservation analysis hybridise well to distantly related species.
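
The final, physical-mapping step of this combined approach can be sketched as a simple consistency check. Assuming each PCF carries one or more BAC probes and each successfully hybridised probe yields a single chromosome assignment from FISH, a PCF whose probes all hit the same chromosome can be assigned to it, whereas probes landing on different chromosomes flag a possible mis-joined (chimeric) PCF. The `assign_pcfs` function, the probe identifiers and the data layout are hypothetical; the sketch only assigns fragments to chromosomes and does not order or orient them along the chromosome.

```python
def assign_pcfs(pcf_probes, fish_signals):
    """Assign PCFs to chromosomes from FISH results (hypothetical data layout).

    pcf_probes:   dict mapping PCF ID -> list of BAC probe IDs anchored in that PCF
    fish_signals: dict mapping BAC probe ID -> chromosome where FISH gave a signal
                  (probes that failed to hybridise are simply absent)

    Returns (assigned, conflicting, unplaced):
      assigned    PCF -> single chromosome supported by all of its probes
      conflicting PCF -> sorted list of chromosomes (possible chimeric PCF)
      unplaced    list of PCFs with no successfully hybridised probe
    """
    assigned, conflicting, unplaced = {}, {}, []
    for pcf, probes in pcf_probes.items():
        # Chromosomes hit by those probes of this PCF that produced a FISH signal.
        chroms = {fish_signals[p] for p in probes if p in fish_signals}
        if not chroms:
            unplaced.append(pcf)
        elif len(chroms) == 1:
            assigned[pcf] = next(iter(chroms))
        else:
            conflicting[pcf] = sorted(chroms)
    return assigned, conflicting, unplaced
```

With hypothetical inputs where PCF1's two probes both hybridise to chr1, PCF2's probes hybridise to chr2 and chr5, and PCF3's only probe fails, PCF1 is assigned to chr1, PCF2 is flagged for checking, and PCF3 remains unplaced.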

With the success of this combined approach in mind, the purpose of this study was to generate preliminary data using non-selected (positional) cattle BACs and sequence-based selected cattle BACs, taken from previous studies (Larkin et al., 2006a, 2006b), to examine whether sequence-based selection increases hybridisation success rates in mammals. A further aim was to use these data to refine selection criteria in preparation for creating a universal set of human BAC probes that hybridise across distantly related species, with the intention of mapping de novo sequenced genomes to chromosome level. To achieve this, human BACs were selected by colleagues at the RVC based on genomic properties defined through previous avian work and on the preliminary data reported in this study. The selection criteria included low repeat percentage, high mean all score and high GC content (indicative of gene-rich regions).
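
The selection criteria listed above can be expressed as a filter-and-rank step over candidate BACs. In this minimal sketch the `BacCandidate` record, the `select_bacs` function, the threshold values and the ranking order are hypothetical placeholders rather than the criteria actually applied at the RVC; `mean_all_score` simply carries the ‘mean all score’ metric named in the text, whose scale is assumed here to be 0 to 1.

```python
from dataclasses import dataclass


@dataclass
class BacCandidate:
    """One candidate BAC with the three properties used for selection (hypothetical fields)."""
    name: str
    repeat_pct: float      # percentage of the BAC insert that is repetitive sequence
    mean_all_score: float  # the 'mean all score' metric named in the text (scale assumed 0-1)
    gc_pct: float          # GC content (%), used as a proxy for gene-rich regions


def select_bacs(candidates, max_repeat_pct=30.0, min_score=0.5, min_gc_pct=42.0):
    """Keep BACs that meet all three (placeholder) thresholds, best candidates first."""
    passing = [
        bac for bac in candidates
        if bac.repeat_pct <= max_repeat_pct
        and bac.mean_all_score >= min_score
        and bac.gc_pct >= min_gc_pct
    ]
    # Rank: lowest repeat content first, then highest score, then highest GC content.
    return sorted(passing, key=lambda b: (b.repeat_pct, -b.mean_all_score, -b.gc_pct))
```

Filtering before ranking keeps each criterion as an explicit cut-off that can be tuned as hybridisation results accumulate, while the ranking only decides priority among BACs that already pass.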

As noted above, the ultimate goal of any genome assembly effort is to create a contiguous sequence from the p terminus to the q terminus of each individual chromosome (Damas et al., 2017). Emerging technologies, e.g. PacBio, BioNano and Dovetail, still fail to achieve this, as demonstrated by the most recent assembly of the western lowland gorilla (Kamilah, August 2019). In this work, PacBio RS II was used to produce an assembly consisting of 5,705 scaffolds with 220 gaps between scaffolds, meaning that multiple scaffolds are needed to span the length of each chromosome. To overcome this problem in birds, a novel approach was developed to upgrade de novo avian genome assemblies to chromosome level, combining computational algorithms with physical mapping of scaffolds to chromosomes and thereby creating a universal panel of avian BACs that could be used to generate chromosome-level assemblies (Damas et al., 2017). The purpose of this study was to establish whether a similar feat can be achieved in mammals using BAC probes previously isolated in artiodactyls (cattle), rodents (mice) and primates (humans).