1 General Discussion
1.9 Methodologies to assess microbial communities
For around 350 years, microscopes have been used to evaluate microbial communities. When van Leeuwenhoek first visualized bacteria in water samples in the late 1670s, he was able to characterize them by relative size and morphology (Bardell, 1982). Van Leeuwenhoek was later the first to describe human-associated microbiota in his microscopic surveys of human saliva (Bardell, 1982). Since then, immense technical advancement has drastically improved the visualization of microbes. Optical microscopy remains a very useful tool for visualizing the diversity and spatial organization of bacteria in different environments, especially when combined with bacterial staining or labelling strategies (Tropini et al., 2018).
Historically, when studying the human microbiota, culture techniques were utilized to grow bacterial isolates in predetermined media. However, upwards of 80% of the bacteria that reside within us are fastidious, with very specific and complex growth requirements, making them “unculturable” under standard lab conditions. Thus, the diverse microbial composition of the human gut was drastically underestimated (Wilson et al., 1996). For this reason, the development of molecular, culture-independent techniques has been paramount in understanding the human microbiome. Early studies of the microbiota involved the generation of clone libraries of the small subunit ribosomal RNA (16S rRNA) genes followed by Sanger sequencing of short inserts (Wilson et al., 1996). Denaturing gradient gel electrophoresis (DGGE) was also used, but it requires significant user skill and lacks sensitivity (Burton and Reid, 2002). Fluorescence in situ hybridization (FISH) can now be used to enumerate bacteria with flow cytometry based on the binding and subsequent fluorescence of complementary 16S rRNA sequence probes, but this approach is probe-dependent and cannot identify unknown species (Fraher et al., 2012; Namsolleck et al., 2004). PCR-electrospray ionization mass spectrometry (PCR-ESI-MS) is yet another molecular technique capable of characterizing microbial communities; it involves mass spectrometry of PCR amplicons such that the nucleotide composition is deduced and compared against a database (Ecker et al., 2008; Nickel et al., 2015). This technique appears to perform comparably to 16S rRNA gene sequencing with shorter workflow times, but it has not been widely implemented in the microbiome field and has instead found favour in clinical diagnostics (Peeters et al., 2016; Zhang et al., 2019). Now, the most commonly used method in the microbiota research field is 16S rRNA gene sequencing; it has become a powerful tool, requiring PCR with carefully designed barcoded primers and sequencing adapters that facilitate massively parallel sequencing output (Figure 4) (Gloor et al., 2010).
Classical techniques such as Sanger sequencing are capable of sequencing the entire length of the 16S rRNA gene but lack multiplexing capability; when utilizing next generation sequencing (NGS, for example with the Illumina platform as was used in Chapters 2, 3, 4, and 5 of this thesis), sequencing read length is limited. The bacterial 16S rRNA gene has regions of high conservation flanking nine regions of hypervariability (V1-V9, Figure 4). Because NGS read lengths are shorter, the variable region to be sequenced should be carefully chosen to optimize the resolution of the microbiota profiling (Soergel et al., 2012). Depending on the GC content of the variable regions within different bacterial genera, PCR amplification may not work optimally, biasing the data (Alcon-Giner et al., 2017). Alternatively, the variable regions of different bacterial groups may not in fact be that “variable”, limiting differentiation at lower taxonomic levels (Alcon-Giner et al., 2017). Thus, V-region primer selection should take into account the bacterial populations expected within the sampled environment. Primers 515F-806R targeting the V4 region are utilized in the Earth Microbiome Project and throughout this thesis, and are capable of differentiating common genera within the gut upon paired-end sequencing (Thompson et al., 2017). In comparison, primers targeting the V6 region can provide resolution between species of Lactobacillus, and thus would be utilized in sequencing of vaginal samples or yogurt, for example (Thompson et al., 2017).
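The two primer-selection considerations above (GC content and how well a primer matches its target site) can be illustrated with a minimal sketch. It assumes the commonly published 515F sequence, which may differ from the exact primer shown in Figure 4; the two target sites and the function names are hypothetical and serve only as an illustration.

```python
# IUPAC nucleotide codes mapped to the bases each code can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def gc_fraction(seq):
    """Return the fraction of G and C bases in an unambiguous DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def primer_mismatches(primer, site):
    """Count positions where the site base is not allowed by the primer code."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return sum(
        base not in IUPAC[code]
        for code, base in zip(primer.upper(), site.upper())
    )


# Commonly published 515F sequence (assumed here; M stands for A or C).
PRIMER_515F = "GTGCCAGCMGCCGCGGTAA"

# Hypothetical target sites from two imaginary taxa.
site_a = "GTGCCAGCAGCCGCGGTAA"  # A at the degenerate position
site_b = "GTGCCAGCTGCCGCGGTAA"  # T at the degenerate position

for name, site in (("site_a", site_a), ("site_b", site_b)):
    print(name, primer_mismatches(PRIMER_515F, site), round(gc_fraction(site), 2))
```

A taxon whose priming site carries mismatches, or whose target region has extreme GC content, would be expected to amplify less efficiently and so be under-represented in the resulting profile.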
Through the early years of microbiota analysis, the Roche 454 method of pyrosequencing was favoured due to its ability to generate reads upwards of 500 bases in length, spanning multiple variable regions of the 16S rRNA gene (Caporaso et al., 2010). When the Illumina platform was first utilized in the human microbiota field in 2010, the sequence length was 75 bases, but modern incarnations of the MiSeq and HiSeq systems can generate lengths > 600 bp and have superseded 454 as the favoured contemporary platform (DiBella et al., 2013; Hummelen et al., 2010; Salipante et al., 2014). Nascent long-read sequencing technologies (capable of generating read lengths of tens to hundreds of kilobases), including those developed by PacBio and Oxford Nanopore, may soon replace 16S variable region amplicon sequencing altogether (Callahan et al., 2019; Dohm et al., 2020).
Whereas 16S rRNA gene amplicon sequencing is the classic approach to microbiota sequencing, shotgun metagenomic sequencing surveys the genetic material of all organisms present in a sample, as opposed to only the 16S rRNA gene in bacteria. Shotgun metagenomic sequencing was utilized in Chapter 3 of this thesis. Metagenomic sequencing is less susceptible to the biases inherent in amplicon sequencing, can provide higher taxonomic resolution, and can capture information from host, bacterial, viral, and fungal DNA, as well as the functional pathways present in a sample (Hillmann et al., 2018; Jovel et al., 2016). However, it is significantly more expensive than 16S rRNA gene sequencing in both sequencing and computational costs, and has fewer computational tools and databases available for analysis (Gevers et al., 2012).
The bioinformatic analysis of NGS data is complex and time-consuming, and there is no standard methodology across the field. Datasets generated by high-throughput sequencing are compositional in nature, as there is an arbitrary “total” imposed by the sequencing instrument; although the reads are discrete counts, they represent just a sampling of the original genetic material present in the sample (Gloor et al., 2017). When microbiome data are not treated in the appropriate compositional manner, incorrect assumptions and conclusions can be drawn (for example, when a change in relative abundance is interpreted as a change in absolute abundance). Quantitative Insights Into Microbial Ecology (QIIME) is a commonly used analysis toolkit originally developed for pyrosequencing datasets, which facilitates data visualization, diversity analysis, and simple statistics (Caporaso et al., 2010). Several other analysis tools have been developed to overcome challenges in microbiome data analysis, including ALDEx2 and compositional data analysis packages within R software (van den Boogaart and Tolosana-Delgado, 2008; Fernandes et al., 2013; McMurdie and Holmes, 2013; R Core Team, 2013). These sequencing and bioinformatic tools have been utilized throughout the chapters of this thesis in various patient and sample populations, with the aim of determining how the microbiota influences stone disease.
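The compositional problem can be made concrete with a small sketch. It uses invented counts for three hypothetical taxa and a plain centred log-ratio (CLR) transform, the transform on which ALDEx2 is based; it is not ALDEx2's implementation, which additionally models sampling uncertainty, and the pseudocount value is an assumption chosen for illustration.

```python
import math


def closure(counts):
    """Convert counts to proportions that sum to 1 (relative abundance)."""
    total = sum(counts)
    return [c / total for c in counts]


def clr(counts, pseudocount=0.5):
    """Centred log-ratio: log of each part minus the mean log of all parts.

    The pseudocount (an assumed value) avoids taking log(0) for zero counts.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Invented absolute abundances for taxa A, B, and C in two conditions.
# Only taxon A changes; B and C are identical in absolute terms.
before = [100, 100, 100]
after = [1000, 100, 100]

print("relative abundance before:", closure(before))
print("relative abundance after: ", closure(after))
print("CLR before:", clr(before))
print("CLR after: ", clr(after))
```

Taxa B and C are unchanged in absolute terms, yet their relative abundances fall once A increases, which is the kind of misleading signal described above. Log-ratio values describe each taxon relative to the others in the same sample and, without a pseudocount, do not depend on the arbitrary total imposed by sequencing depth.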
Figure 4. A) The 16S rRNA gene has highly conserved (light blue) regions surrounding nine variable (dark blue) regions. B) The primers utilized in this thesis amplified base pairs 515-806 encompassing the fourth variable region, and contained an Illumina adapter (sequences shown), followed by four random nucleotides, one of sixteen unique 12-mer barcodes (not shown), and the forward and reverse primers (sequences shown).
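The primer layout described in panel B suggests how reads could be assigned back to samples. The following is a minimal sketch that assumes each read begins at the four random nucleotides (that is, the adapter itself is not part of the read); the barcodes, sample names, and example read are hypothetical, since the actual barcode sequences are not shown.

```python
SPACER_LEN = 4    # four random nucleotides at the start of the read
BARCODE_LEN = 12  # one 12-mer barcode per sample

# Hypothetical barcode-to-sample table (the real barcodes are not shown).
BARCODES = {
    "ACGTACGTACGT": "sample_1",
    "TTGGCCAATTGG": "sample_2",
}


def demultiplex(read, barcode_to_sample):
    """Return (sample, remainder) for a read, or None if it cannot be assigned.

    The remainder starts at the amplification primer, after the random
    nucleotides and the barcode have been removed.
    """
    prefix_len = SPACER_LEN + BARCODE_LEN
    if len(read) < prefix_len:
        return None
    barcode = read[SPACER_LEN:prefix_len]
    sample = barcode_to_sample.get(barcode)
    if sample is None:
        return None
    return sample, read[prefix_len:]


# Hypothetical read: random nucleotides + barcode + start of the amplicon.
example_read = "GATC" + "ACGTACGTACGT" + "GTGCCAGCAGCCGCGGTAATACG"
print(demultiplex(example_read, BARCODES))
```

This sketch requires an exact barcode match; a practical pipeline would typically also tolerate sequencing errors in the barcode and check both the forward and reverse reads.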