

1.6. From barcodes to metabarcodes

Recent technical advances in massively parallel DNA sequencing technologies (e.g. next-generation sequencing platforms, NGS; Shendure and Ji, 2008; Glenn, 2011) have revolutionised many areas of scientific inquiry, taxonomy included. By producing millions of sequence reads in a single experiment, NGS platforms have extended identification from the classical, one-specimen-at-a-time Sanger sequencing approach to the community level (Taberlet et al., 2012). This approach, called “metabarcoding”, is a multispecies identification method that uses massively parallel sequencing of a particular marker in environmental DNA or RNA samples (Cristescu, 2014). The substantial decrease in the cost of massively parallel sequencing, and the ease of sampling and analysing multiple specimens instead of individual ones, have led to an increase in metabarcoding studies of aquatic, microbial and soil communities (Schmidt et al., 2013; Valentini et al., 2016; Abdelfattah et al., 2018), as well as to the application of metabarcoding to biodiversity surveillance and monitoring (Bohmann et al., 2014; Deiner et al., 2017). However, because metabarcoding is “blind”, i.e. taxa are identified only by comparing the sequences obtained with those of known organisms rather than by examining the specimens, it needs a comprehensive taxonomic reference database, which is generated with the traditional barcoding approach on morphologically verified and curated specimens (Cristescu, 2014).
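The dependence of metabarcoding on a reference database can be illustrated with a minimal sketch of reference-based taxonomic assignment. It assumes a simple k-mer (Jaccard) similarity between each read and each reference barcode; this is only one of several possible assignment methods, and the reference sequences, taxon names, threshold and function names below are hypothetical. The point it shows is that a read from a taxon missing from the reference library cannot be named, however good the sequencing is.

```python
"""Sketch: assigning metabarcoding reads to taxa by k-mer similarity.

All sequences, taxon names and the similarity threshold are hypothetical
toy values; k-mer Jaccard similarity is only one possible assignment method.
"""


def kmers(seq: str, k: int = 4) -> set[str]:
    """Return the set of overlapping k-mers in a DNA sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity between two k-mer sets (0 = disjoint, 1 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def assign(read: str, reference: dict[str, str], min_sim: float = 0.5, k: int = 4) -> str:
    """Return the most similar reference taxon, or 'unassigned' if no
    reference sequence reaches the similarity threshold."""
    read_kmers = kmers(read, k)
    best_taxon, best_sim = "unassigned", 0.0
    for taxon, ref_seq in reference.items():
        sim = jaccard(read_kmers, kmers(ref_seq, k))
        if sim > best_sim:
            best_taxon, best_sim = taxon, sim
    return best_taxon if best_sim >= min_sim else "unassigned"


# Hypothetical reference library built from morphologically verified specimens.
REFERENCE = {
    "Taxon_A": "ACGTACGGTTAGCCATGCAA",
    "Taxon_B": "TTGACCGTAGGCATTACGGA",
}

reads = [
    "ACGTACGGTTAGCCATGCAT",  # one mismatch from Taxon_A
    "GGGCCCAAATTTGGGCCCAA",  # a taxon absent from the reference library
]
for r in reads:
    print(r, "->", assign(r, REFERENCE))
```

The first read is assigned to Taxon_A, while the second stays unassigned because nothing similar exists in the library.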

Furthermore, metabarcoding is also blind to the number and identity of the species to be identified in the community. This requires the PCR primers to be highly versatile (i.e. to amplify different target molecules with the same efficiency), so as not to miss species whose target sequences match the primers poorly (Taberlet et al., 2012).
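Primer versatility is often checked in silico by counting how many positions of a candidate binding site are incompatible with the primer. The following is a minimal sketch under stated assumptions: the primer and target sequences are hypothetical, degenerate positions follow the standard IUPAC nucleotide codes, and every mismatch is weighted equally, whereas real primer-evaluation tools usually also consider mismatch position (mismatches near the primer’s 3’ end are the most damaging to amplification).

```python
"""Sketch: counting mismatches between a (possibly degenerate) primer and
candidate binding sites. Primer and target sequences are hypothetical."""

# IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatches(primer: str, site: str) -> int:
    """Count positions where the target base is not allowed by the primer code."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return sum(base not in IUPAC[code] for code, base in zip(primer, site))


def best_site(primer: str, target: str) -> tuple[int, int]:
    """Slide the primer along the target; return (position, mismatches) of the best site."""
    n = len(primer)
    scores = [(i, mismatches(primer, target[i:i + n])) for i in range(len(target) - n + 1)]
    return min(scores, key=lambda s: s[1])


PRIMER = "GCRGTAATTCC"  # hypothetical degenerate forward primer (R = A or G)
targets = {
    "species_1": "TTAGCAGTAATTCCAG",  # matches via R = A
    "species_2": "TTAGCGGTAATTCCAG",  # matches via R = G
    "species_3": "TTAGCTGTTATTGCAG",  # poorly matching site
}
for name, seq in targets.items():
    pos, mm = best_site(PRIMER, seq)
    print(f"{name}: best site at {pos}, {mm} mismatch(es)")
```

The degenerate position lets one primer match both species_1 and species_2 perfectly, while species_3 retains three mismatches and would risk being under-amplified or missed.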

Despite these and many other issues shared with the classical DNA-barcoding approach (use of a single target gene to identify taxa, PCR errors, etc.), DNA-metabarcoding has potential that goes beyond biodiversity assessment and monitoring. It has proven to be an effective tool for diet assessment (Leray et al., 2013; De Barba et al., 2014; Kartzinel et al., 2015), for studying species diversity and distribution (Nanjappa et al., 2014; Malviya et al., 2016; dos Santos et al., 2017; Tragin and Vaulot, 2019) and for product authentication (Mishra et al., 2016; Raclariu et al., 2017; 2018). These studies suggest that we are still at an early stage in exploiting the potential of DNA-metabarcoding, and that it is likely to remain a powerful technique for many years to come.

1.6.1. DNA barcoding and metabarcoding in diatoms

The application of DNA barcoding to diatoms is, in principle, no different from that in other organisms, i.e. to provide unambiguous identification of a specimen using a short sequence of coding or non-coding DNA (Mann et al., 2010). Some characteristics of diatoms, such as cryptic speciation and morphological variation across the life cycle and under different culture conditions (Mann, 1999), make barcoding particularly advantageous over classical morphological examination in these organisms (Mann et al., 2010).

To date, no universal barcode region for diatoms exists, but several markers within the nuclear, mitochondrial and chloroplast genomes have been considered and proposed (Moniz and Kaczmarska, 2009; Fig. 1.4).

Fig. 1.4. Main target genes utilised for DNA barcoding in diatoms. Orange = mitochondrion; green = chloroplast; blue = nucleus.

The classical barcode genes used for animals (COI) and plants (matK, rbcL) do not appear to work well for diatoms and other protists. For COI, the main problems are the lack of sufficiently conserved primer target regions across taxa (Evans et al., 2007; Moniz and Kaczmarska, 2009) and the occurrence of introns (Ehara et al., 2000; Armbrust et al., 2004; Ravin et al., 2010). Plastid markers have been considered problematic for DNA barcoding because plastids may be inherited either uniparentally or biparentally (Round et al., 1990; Jensen et al., 2003; Levialdi Ghiron et al., 2008). Nonetheless, rbcL has been evaluated both over its entire length (~1400 bp) and as a fragment at the 3’ end (rbcL-3P; ~750 bp, Hamsher et al., 2011; ~540 bp, MacGillivary and Kaczmarska, 2011). Preliminary results suggested that the 3’ region is more variable than the 5’ region, which argued against using the whole gene (Hamsher et al., 2011). Although the ease of amplification, sequencing and alignment, together with the absence of indels and introns, make rbcL a promising marker (MacGillivary and Kaczmarska, 2011), its low resolution in discriminating closely related species in some groups and its uncertain inheritance led to the conclusion that the rbcL-3P region is best used as a complementary barcode together with the 5.8S-ITS2 rDNA region in a dual-locus DNA barcoding system (MacGillivary and Kaczmarska, 2011). The 5.8S-ITS2 region had been proposed as a candidate barcode by Moniz and Kaczmarska (2009, 2010) based on its use in identifying protist, fungal and plant species (Wayne Litaker et al., 2007; Seifert, 2009; Chen et al., 2010). However, the ITS region is known to be difficult to align even among closely related species (Desdevises et al., 2000; Poisot et al., 2011) and to show infraspecific polymorphism due to non-concerted evolution (Harpke et al., 2006; Zheng et al., 2008), factors that limit its application across heterogeneous taxa.
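A statement such as “the 3’ region is more variable than the 5’ region” can be made concrete by counting variable (polymorphic) sites in each region of an alignment. The sketch below is a minimal illustration of that comparison, not the procedure used by Hamsher et al. (2011): the aligned sequences, the region boundaries and the function names are hypothetical, and gap characters are simply ignored.

```python
"""Sketch: comparing the proportion of variable sites between two regions of an
alignment. The aligned sequences are hypothetical toy data."""


def variable_sites(alignment: list[str], start: int, end: int) -> int:
    """Count alignment columns in [start, end) with more than one base (gaps ignored)."""
    count = 0
    for col in range(start, end):
        bases = {seq[col] for seq in alignment if seq[col] != "-"}
        if len(bases) > 1:
            count += 1
    return count


def proportion_variable(alignment: list[str], start: int, end: int) -> float:
    """Fraction of columns in [start, end) that are variable."""
    return variable_sites(alignment, start, end) / (end - start)


# Hypothetical 20-column alignment: columns 0-9 stand in for a 5' region,
# columns 10-19 for a 3' region.
ALIGNMENT = [
    "ATGCCATGCA" "TTAGCGTACA",
    "ATGCCATGCA" "TCAGCATACG",
    "ATGTCATGCA" "TTGGCGTTCA",
    "ATGCCATGCA" "CTAGTGTACA",
]
print("5' region:", proportion_variable(ALIGNMENT, 0, 10))
print("3' region:", proportion_variable(ALIGNMENT, 10, 20))
```

In this toy alignment, 1 of 10 columns is variable in the “5’” region and 7 of 10 in the “3’” region; a shorter but more variable fragment can therefore carry most of the discriminating signal of the whole gene.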

Among nuclear DNA markers, and still within the rDNA cistron, most attention has focused on the genes coding for the small and large subunit (SSU and LSU) RNAs of nuclear ribosomes (a.k.a. 18S and 28S rDNA, respectively). Because the LSU gene is long overall (generally around 3,000 bp), barcoding has focused on its D1-D3 (~800 bp) and D2-D3 (~613 bp) regions (Hamsher et al., 2011). These fragments are considered as variable as rbcL-3P (Hamsher et al., 2011) and are therefore expected to resolve species-level and sometimes population-level relationships (Alverson, 2008). However, these markers are unsuitable for the NGS platforms currently used in metabarcoding because they are too long for the read lengths of these platforms. Another drawback is that LSU reference sequences are available only for selected groups of organisms, and not yet across the entire eukaryotic tree of life or even across the full diversity of diatoms. By contrast, the SSU gene has been used extensively in diatom phylogenies (Medlin et al., 1993; Kooistra and Medlin, 1996; Medlin et al., 1996; Medlin and Kaczmarska, 2004; Sarno et al., 2005; Sorhannus, 2007), and the large number of reference sequences stored in public databases (e.g. PR², Guillou et al., 2012) essentially covers the diversity of diatoms. The suitability of its various variable regions as barcoding targets has been evaluated, in particular the V4 and V9 regions (Nelles et al., 1984).
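Why long LSU fragments are impractical for metabarcoding can be shown with simple arithmetic on paired-end reads: an amplicon is only recovered as one continuous sequence if the forward and reverse reads overlap enough to be merged. The sketch below assumes 2 × 300 bp paired-end reads (as produced, for example, by Illumina MiSeq v3 chemistry) and a hypothetical minimum overlap of 20 bp; the amplicon lengths are the approximate values given in the text and ignore primer and adapter lengths.

```python
"""Sketch: can an amplicon be reconstructed by merging overlapping paired-end reads?

Assumptions (illustrative only): 2 x 300 bp paired-end reads and a minimum
overlap of 20 bp for merging; amplicon lengths are approximate values from
the text and ignore primers and adapters.
"""


def overlap(amplicon_len: int, read_len: int = 300) -> int:
    """Bases covered by both reads (negative = unsequenced gap in the middle).

    If the amplicon is shorter than one read, both reads span it entirely,
    so the overlap is capped at the amplicon length.
    """
    return min(2 * read_len - amplicon_len, amplicon_len)


def mergeable(amplicon_len: int, read_len: int = 300, min_overlap: int = 20) -> bool:
    """True if the two reads overlap enough to be merged into one sequence."""
    return overlap(amplicon_len, read_len) >= min_overlap


REGIONS = {"LSU D1-D3": 800, "LSU D2-D3": 613, "SSU V4": 400, "SSU V9": 105}
for name, length in REGIONS.items():
    print(f"{name:10s} {length:4d} bp  overlap={overlap(length):5d}  mergeable={mergeable(length)}")
```

Under these assumptions both LSU fragments leave an unsequenced gap between the two reads (about 200 bp for D1-D3 and 13 bp for D2-D3), whereas the V4 and V9 regions are fully covered.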

Recent results indicate that the V4 region (~380-400 bp) is the most promising candidate marker for DNA barcoding in diatoms, given its ease of amplification, extensive reference library, variability, and the universality of its primer target sites (Zimmermann et al., 2011; Luddington et al., 2012). It outperforms the V9 region in separating closely related species because of its greater length (~380 bp vs. ~105 bp) and because the V9 region is located at the very 3’ end of the 18S gene, a part of the gene that is often sequenced incompletely or poorly (Gaonkar, 2017; Gaonkar et al., 2018). Nevertheless, several V4 (BioMarKs, Massana et al., 2015; the Ocean Sampling Day, Kopf et al., 2015) and V9 (e.g. Tara Oceans, de Vargas et al., 2015) metabarcoding datasets are currently available to explore the diversity and distribution of organisms, diatoms included, in the world’s oceans and to test how effectively each region discriminates specific taxa. In this thesis, I will use two global metabarcoding datasets, OSD (V4) and Tara Oceans (V9), to explore the diversity of Chaetoceros in the world’s oceans.
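The argument that a longer region separates closely related species better can be illustrated by measuring the fraction of species whose sequences are not shared with any other species, first on full-length sequences and then on a truncated version of the same sequences. This minimal sketch uses strict sequence identity as the criterion and hypothetical species and sequences; real barcoding studies typically rely on distance thresholds or tree-based criteria instead, and the sketch does not model the second reason given above (incomplete sequencing at the 3’ end of the gene).

```python
"""Sketch: fraction of species resolved by a marker region.

A species counts as resolved when none of its sequences is identical to a
sequence of another species. Species names and sequences are hypothetical.
"""
from collections import defaultdict


def resolved_fraction(seqs_by_species: dict[str, list[str]]) -> float:
    """Return the fraction of species whose sequences are not shared with another species."""
    owners: dict[str, set[str]] = defaultdict(set)
    for species, seqs in seqs_by_species.items():
        for s in seqs:
            owners[s].add(species)
    resolved = [
        sp for sp, seqs in seqs_by_species.items()
        if all(len(owners[s]) == 1 for s in seqs)
    ]
    return len(resolved) / len(seqs_by_species)


def truncate(seqs_by_species: dict[str, list[str]], length: int) -> dict[str, list[str]]:
    """Keep only the first `length` bases, mimicking a shorter marker."""
    return {sp: [s[:length] for s in seqs] for sp, seqs in seqs_by_species.items()}


# Hypothetical sequences: sp_B and sp_C share their first 8 bases.
DATA = {
    "sp_A": ["ACGTTGCAACGTAGGC"],
    "sp_B": ["ACGTTGCTACGTTGGC", "ACGTTGCTACGTTGGA"],
    "sp_C": ["ACGTTGCTACCTAGGC"],
}
print("long region :", resolved_fraction(DATA))
print("short region:", resolved_fraction(truncate(DATA, 8)))
```

All three species are resolved with the full 16 bp, but only sp_A remains resolved when the sequences are cut to 8 bp, because the differences between sp_B and sp_C lie outside the shorter fragment.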

1.7. Case study: the planktonic diatom family Chaetocerotaceae, with

In document Phylogenetics and Phylogeography in the Planktonic Diatom Genus Chaetoceros (Page 43-47)