MATERIALS AND METHODS - The rate, spectrum and effects of spontaneous mutation in bacteria with multiple chromosomes

Bacterial strains and culture conditions. The two wild-type MA experiments were each founded from a single clone, derived from V. fischeri ES114 and V. cholerae 2740-80, respectively. All MA experiments with V. fischeri were carried out on tryptic soy agar plates supplemented with NaCl (TSAN) (30 g/liter tryptic soy broth powder, 20 g/liter NaCl, 15 g/liter agar) and were incubated at 28°. Frozen stocks of each MA lineage were prepared at the end of the experiment by growing a single colony overnight in 5 ml of tryptic soy broth supplemented with NaCl (TSBN) (30 g/liter tryptic soy broth powder, 20 g/liter NaCl) at 28° and freezing in 8% DMSO at -80°. For V. cholerae, all MA experiments were carried out on tryptic soy agar plates (TSA) (30 g/liter tryptic soy broth powder, 15 g/liter agar) and were incubated at 37°. Similarly, frozen stocks were prepared by growing a single colony from each lineage overnight in 5 ml of tryptic soy broth (TSB) (30 g/liter tryptic soy broth powder) at 37° and were stored in 8% DMSO at -80°.

Mutator strains of V. fischeri ES114 and V. cholerae 2740-80 were generated by replacing the mutS gene in each genome with an erythromycin resistance cassette, as described previously (Datsenko and Wanner 2000; Heckman and Pease 2007; Val et al. 2012). Briefly, I used splicing by overlap extension (PCR-SOE) to generate two erythromycin resistance cassettes, one of which was flanked by ≈750 bp of the upstream and downstream regions of the mutS gene in V. fischeri ES114, while the second was flanked by ≈750 bp of the upstream and downstream regions of the mutS gene in V. cholerae 2740-80 (Heckman and Pease 2007). Both the V. fischeri ES114 and V. cholerae 2740-80 ΔmutS fragments were then cloned into the R6K γ-ori-based suicide vector pSW7848, which contains a ccdB toxin gene that is arabinose-inducible and glucose-repressible (PBAD) (Val et al. 2012). Both of these pSW7848 plasmids, henceforth referred to as pSW7848-VfΔmutS and pSW7848-VcΔmutS, were transformed into Escherichia coli pi3813 chemically competent cells and stored at -80° (Datsenko and Wanner 2000).

Conjugal transfer of the pSW7848-VfΔmutS and pSW7848-VcΔmutS plasmids was performed using a tri-parental mating with the E. coli pi3813 cells as the donors (Val et al. 2012), E. coli DH5α-pEVS104 as the helper (Stabb and Ruby 2002), and V. fischeri ES114 and V. cholerae 2740-80 as the respective recipients. For V. fischeri ES114, the chromosomally inserted pSW7848-VfΔmutS plasmid resulting from a single crossover at the ΔmutS gene was selected on LBS plates (Graf et al. 1994) containing 1% glucose and 1 µg/ml chloramphenicol at 28°. Selection for loss of the plasmid backbone from a second recombination step was then performed on LBS plates containing 0.2% arabinose at 28°, which induces the PBAD promoter of the ccdB gene and ensures that all cells that have not lost the integrated plasmid will die (Val et al. 2012). For V. cholerae, the chromosomally inserted pSW7848-VcΔmutS plasmid was selected on LB plates (Sambrook et al. 1989) containing 1% glucose and 5 µg/ml chloramphenicol at 30°. Selection for loss of the plasmid backbone was performed on LB plates with 0.2% arabinose at 30°. Replacement of the mutS gene in V. fischeri ES114 and V. cholerae 2740-80 was verified by conventional sequencing, and V. fischeri ES114 ΔmutS and V. cholerae 2740-80 ΔmutS were used to found the two mutator MA experiments, under conditions identical to those described above for the wild-type MA experiments.

Ancestral reference genomes. Prior to this study, the genome of V. fischeri ES114 was already complete and annotated, consisting of three contigs representing chr1, chr2, and the 45.85 kb plasmid (Ruby et al. 2005). Further, the location of the oriC on both chromosomes was available in dOriC 5.0, a database for the predicted oriC regions in bacterial and archaeal genomes (Gao et al. 2013). Fortunately, the oriC region on both chromosomes had been placed at coordinate zero, allowing me to proceed with this V. fischeri ES114 reference genome for all subsequent V. fischeri analyses. In contrast, when I initiated these MA experiments, the V. cholerae 2740-80 genome was still in draft form, consisting of 257 scaffolds with unknown chromosome

genome location on bpsm and indel rates, I used single molecule, real-time (SMRT) sequencing to generate a complete assembly separated into the two contigs of V. cholerae 2740-80.

The Pacific Biosciences RSII sequencer facilitates the completion of microbial genomes by producing reads of multiple kilobases that extend across repetitive regions and allow whole genomes to be assembled at a relatively limited cost (Koren and Phillippy 2015). Genomic DNA (gDNA) was prepared using the Qiagen Genomic-Tip Kit (20/G) from overnight cultures of V. cholerae 2740-80 grown in LB at 37°, according to the manufacturer's instructions. Importantly, this kit uses gravity filtration to purify gDNA, which limits shearing and increases the average fragment size of the resulting gDNA sample. Long insert library preparation and SMRT sequencing were performed on this V. cholerae 2740-80 gDNA at the Icahn School of Medicine at Mount Sinai according to the manufacturer's instructions, as described previously (Beaulaurier et al. 2015). Briefly, libraries were size selected using Sage Science Blue Pippin 0.75% agarose cassettes to enrich for long reads, and were assessed for quantity and insert size using an Agilent DNA 12,000 gel chip. Primers, polymerases, and magnetic beads were loaded to generate a completed SMRTbell library, which was run in a single SMRT cell of a Pacific Biosciences RSII sequencer at a concentration of 75 pM for 180 minutes.

As expected, the long insert SMRT sequencing library generated mostly long reads, with an average sub-read length of 8,401 bp and an N50 of 11,480 bp. I used the hierarchical genome-assembly process workflow (HGAP3) to generate a completed assembly of V. cholerae 2740-80 and polished the assembly using the Quiver algorithm (Chin et al. 2013). The resultant assembly consisted of two contigs representing chr1 and chr2, with an average coverage of 128x. I annotated this assembly using prokka (v1.11), specifying Vibrio as the genus (Seemann 2014). I then identified the location of the oriC on both contigs using Ori-finder, which applies methods analogous to those used by dOriC 5.0 to identify oriC regions in bacterial genomes (Gao and Zhang 2008; Gao et al. 2013). These oriC regions were not located at coordinate zero of the V. cholerae 2740-80 reference genome, so I re-formatted the reference genome to place each oriC region at the beginning of the chr1 and chr2 contigs, then stitched the contigs back together and re-polished the genome using Quiver. Prokka was then run a second time to update the location of all genes, and this re-formatted V. cholerae 2740-80 genome was used as the ancestral reference genome for all subsequent V. cholerae analyses.
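The re-formatting step, which places each oriC at coordinate zero, amounts to rotating a circular sequence. The following is a minimal sketch of that idea only; the function names, the 0-based indexing, and the toy sequence are illustrative assumptions and do not reproduce the tools actually used to re-format the genome.

```python
def rotate_to_origin(sequence: str, ori_start: int) -> str:
    """Rotate a circular sequence so that index ori_start becomes index 0.

    Assumption: ori_start is a 0-based index of the first oriC base.
    """
    if not sequence:
        raise ValueError("sequence is empty")
    if not 0 <= ori_start < len(sequence):
        raise ValueError("ori_start must be a valid 0-based index")
    return sequence[ori_start:] + sequence[:ori_start]


def remap_coordinate(position: int, ori_start: int, length: int) -> int:
    """Translate an old 0-based coordinate into the rotated coordinate system."""
    return (position - ori_start) % length


# Hypothetical toy contig with the origin starting at index 6.
toy_contig = "GGGTTTAAACCC"
rotated = rotate_to_origin(toy_contig, 6)
print(rotated)                       # AAACCCGGGTTT
print(remap_coordinate(0, 6, 12))    # 6
```

Because gene coordinates shift under this rotation, any existing annotation has to be remapped or regenerated, which is consistent with running the annotation a second time after re-formatting.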

MA-WGS Process. For the two wild-type MA experiments, seventy-five independent lineages were founded by single cells derived from a single colony of V. fischeri ES114 and V. cholerae 2740-80, respectively. Each of these lineages was then independently propagated every 24 hours onto fresh TSAN for V. fischeri and fresh TSA for V. cholerae, and this cycle was repeated for a total of 217 days. For the two mutator MA experiments, forty-eight independent lineages were founded and propagated as described above from a single colony each of V. fischeri ES114 ΔmutS and V. cholerae 2740-80 ΔmutS, respectively. However, because of their higher mutation rates, these lineages were only propagated for a total of 43 days. At the conclusion of the four MA experiments, each lineage was grown overnight in the appropriate liquid broth at the appropriate temperature (see above), and stored at -80° in 8% DMSO.

Daily generations were estimated monthly for the wild-type lineages and bi-monthly for the mutator lineages by calculating the number of viable cells in a representative colony from 10 lineages per MA experiment following 24 hours of growth. During each measurement, the representative colonies were placed in 2 ml of phosphate-buffered saline (80 g/liter NaCl, 2 g/liter KCl, 14.4 g/liter Na2HPO4·2H2O, 2.4 g/liter KH2PO4), serially diluted, and spread plated on TSAN or TSA for V. fischeri and V. cholerae, respectively. These plates were then incubated for 24 hours at 28° or 37°, and the daily generations per colony were calculated from the number of viable cells in each representative colony. The average daily generations were then calculated for each time-point using the average of the ten representative colonies, and the total generations elapsed between each measurement were calculated as the product of the average daily generations and the number of days before the next measurement. The total number of generations elapsed per lineage over the duration of the MA experiment was then calculated as the sum of these totals over the course of each MA study (Figure B.1).
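The generation bookkeeping described above can be sketched as follows. This is a minimal illustration under one stated assumption: daily generations are taken as log2 of the viable cell count, since each colony is founded by a single cell. The text does not spell out that conversion, and the function names and cell counts below are hypothetical.

```python
import math


def daily_generations(viable_cells: float) -> float:
    """Generations needed to reach viable_cells from one founding cell.

    Assumption: growth by binary fission, so generations = log2(cells).
    """
    if viable_cells < 1:
        raise ValueError("a colony must contain at least one viable cell")
    return math.log2(viable_cells)


def total_generations(measurements: list[tuple[list[float], int]]) -> float:
    """Sum (mean daily generations) x (days until the next measurement).

    Each measurement is (viable counts of the representative colonies,
    number of days covered by that measurement).
    """
    total = 0.0
    for colony_counts, days in measurements:
        mean_daily = sum(daily_generations(c) for c in colony_counts) / len(colony_counts)
        total += mean_daily * days
    return total


# Hypothetical counts: ten colonies per time-point, two 30-day periods.
example = [([2 ** 24] * 10, 30), ([2 ** 22] * 10, 30)]
print(total_generations(example))  # 1380.0
```

In this toy example the daily generations fall from 24 to 22 between time-points, mirroring the kind of decline that would reduce the total generations accumulated per lineage.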

At the conclusion of each of the four MA experiments, gDNA was extracted using the Wizard Genomic DNA Purification Kit (Promega) from 1 ml of overnight culture (TSBN at 28° for V. fischeri; TSB at 37° for V. cholerae) inoculated from 50 representative stored lineages for the Vf-wt and Vc-wt experiments, and from all 48 stored lineages for the Vf-mut and Vc-mut experiments. For the wild-type MA experiments, gDNA from the ancestral V. fischeri ES114 and V. cholerae 2740-80 strains was also extracted. All libraries were prepared using a modified Illumina Nextera protocol designed for inexpensive library preparation of microbial genomes (Baym et al. 2015).

Sequencing of the Vf-wt and Vc-wt lineages and their respective ancestors was performed using the 101-bp paired-end Illumina HiSeq platform at the Beijing Genomics Institute (BGI), while sequencing of the Vf-mut and Vc-mut lineages was performed using the 151-bp paired-end Illumina HiSeq platform at the University of New Hampshire Hubbard Center for Genome Studies.

The raw FASTQ reads were analyzed using FastQC, which revealed that 48 Vf-wt lineages, 49 Vc-wt lineages, 19 Vf-mut lineages, and 22 Vc-mut lineages were sequenced at sufficient depth to accurately identify bpsm and indel mutations. The failure to successfully sequence a high proportion of the Vf-mut and Vc-mut lineages was mostly caused by a poorly normalized library, which yielded limited sequence data for several of the mutator lineages. For the successfully sequenced lineages, all reads were mapped to their respective reference genomes with both the Burrows-Wheeler Aligner (BWA) (Li and Durbin 2009) and Novoalign (www.novocraft.com). The average depth of coverage across the successfully sequenced lineages of each MA experiment was 100x for Vf-wt, 96x for Vc-wt, 124x for Vf-mut, and 92x for Vc-mut.

Base-substitution mutation identification. For all four MA experiments, bpsms were identified as described in the methods of Chapter 1. Briefly, a three-step process was used to identify bpsms. First, I identified an ancestral consensus base at each site in the reference genome using pooled reads across all lines from each MA experiment. Second, I identified a lineage-specific consensus base at each site in the reference genome for each individual lineage using only the reads from that MA line. Here, I required a minimum of two forward and two reverse reads, and 80% consensus among those reads. Third, lineage-specific consensus bases for each lineage were compared to the overall ancestral consensus of the MA experiment to identify putative bpsms. Putative bpsms were considered genuine if they were independently identified by both the BWA and Novoalign alignments, and they were only identified in a single lineage. Any sites at which I did not identify an ancestral and lineage-specific consensus base were not analyzed for mutations. As was the case in Chapter 1, I generated a supplementary dataset for all genuine bpsms identified in this study (Table B.1), demonstrating that the vast majority of bpsms in all four MA experiments from this study were covered by more than 50 reads and were supported by more than 95% of the reads that covered the site. Further, none of the bpsms identified in this study were present in any of the MA ancestral strains, which were also sequenced and analyzed. Thus, I am confident that all bpsms identified in this study represent true spontaneous bpsms that arose during the MA experiments.
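The filtering logic in this three-step process can be sketched in a few lines. This is an illustrative reading of the criteria, not the pipeline used in this study: the function names and data layout are hypothetical, and the sketch assumes that "80% consensus" means at least 80% of all reads covering the site agree on one base.

```python
from collections import Counter


def consensus_base(forward_bases, reverse_bases,
                   min_per_strand=2, min_fraction=0.8):
    """Return the consensus base at one site, or None if the site fails filters.

    forward_bases / reverse_bases: base calls from reads on each strand.
    """
    if len(forward_bases) < min_per_strand or len(reverse_bases) < min_per_strand:
        return None
    counts = Counter(forward_bases) + Counter(reverse_bases)
    base, support = counts.most_common(1)[0]
    total = len(forward_bases) + len(reverse_bases)
    return base if support / total >= min_fraction else None


def genuine_bpsms(calls_bwa, calls_novoalign):
    """Keep calls made by both aligners at sites mutated in only one lineage.

    calls_*: dict mapping lineage name -> set of (site, mutant_base).
    """
    shared = {lineage: calls & calls_novoalign.get(lineage, set())
              for lineage, calls in calls_bwa.items()}
    site_hits = Counter(site for calls in shared.values() for site, _ in calls)
    return {lineage: {(site, base) for site, base in calls if site_hits[site] == 1}
            for lineage, calls in shared.items()}


# Hypothetical toy data: site 200 is called in two lineages and is discarded.
bwa = {"L1": {(100, "T"), (200, "G")}, "L2": {(200, "G")}}
novo = {"L1": {(100, "T"), (200, "G")}, "L2": {(200, "G")}}
print(genuine_bpsms(bwa, novo))  # {'L1': {(100, 'T')}, 'L2': set()}
```

Requiring agreement between two aligners and uniqueness to one lineage trades a small loss of sensitivity for protection against mapping artifacts and variants already present in the ancestor.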

Insertion-deletion mutation identification. All indels identified in this study were also identified as described in the methods of Chapter 1. Briefly, I started by extracting putative indels using lenient filters so that I would not rule out genuine indels in long SSRs. These putative indels were extracted from both the BWA and Novoalign alignments as long as they were covered by at least two forward and two reverse reads, and 30% of those reads identified the exact same indel (size and motif). All putative indels that were called by more than 80% of the reads that covered the site and were independently identified by BWA and Novoalign were considered genuine indels. For putative indels that were supported by 30-80% of the reads that covered the site in both the BWA and Novoalign alignments, I parsed out only reads that had bases on both the upstream and downstream regions of the SSR (if the indel was in an SSR), and on both the upstream and downstream region of the indel (if the indel was not in an SSR). If the indel was called by more than 80% of these sub-reads in both the BWA and Novoalign alignments, it was also considered genuine. Lastly, as described in Chapter 1, I applied PINDEL to all MA lineages to identify large genuine indels that went undetected with the short-read aligners (Ye et al. 2009), requiring a total of 20 reads (6 forward and 6 reverse) and 80% consensus (size and motif). A supplementary dataset for all genuine indels identified in the four MA experiments described in this study is provided in Table B.2, which highlights that nearly all indels were covered by more than 50 reads, with at least 80% consensus. Further, as with the bpsms, none of the indels that were identified in this study were present in the ancestral strains, so I am confident that nearly all of these indels represent genuine indels that arose during the MA experiments.

Mutation-rate analyses. Overall bpsm and indel rates were calculated for each lineage using the equation:

$\mu = m / (nT)$,

where $\mu$ represents the mutation rate, $m$ represents the number of mutations observed, $n$ represents the number of ancestral sites analyzed, and $T$ represents the total number of generations elapsed per lineage. Conditional bpsm rates for each lineage were calculated using the same equation, but with $m$ representing the number of bpsms of the focal bpsm type, and $n$ representing the number of analyzed ancestral sites that could generate the focal bpsm type. All summative bpsm and indel rates presented for each MA experiment were calculated as the average mutation rate across all analyzed lineages, while summative standard errors were calculated as the standard deviation of the mutation rate across all lines ($s$), divided by the square root of the total number of lines in the corresponding MA experiment ($N$):

$SE_{pooled} = s / \sqrt{N}$.

For my interval analysis of bpsm and indel rates within chromosomes, I divided each chromosome into 100 kb intervals, starting at the origin of replication and extending bi-directionally to the replication terminus. Bpsm and indel rates in each interval were measured by dividing the total number of bpsms or indels observed in that interval by the product of the total number of analyzed sites in the interval across all lines and the number of generations per line, using the same formula described above for genome-wide mutation rates:

$\mu = m / (nT)$.

Because none of the chromosomes were exactly divisible by 100 kb, the terminal intervals on each replichore were always less than 100 kb, but their mutation rates were calibrated to the number of bases analyzed in those intervals.
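These calculations are simple enough to express directly. The sketch below is illustrative only: the function names and input numbers are hypothetical, the sample (n - 1) standard deviation is an assumption, and the interval helper further assumes that the oriC sits at coordinate zero and that the terminus lies at the chromosome midpoint, which need not hold for a real replichore.

```python
import math
import statistics


def mutation_rate(m: int, n: int, T: float) -> float:
    """Mutations per site per generation: m / (n * T)."""
    if n <= 0 or T <= 0:
        raise ValueError("n and T must be positive")
    return m / (n * T)


def pooled_standard_error(rates: list[float]) -> float:
    """Standard deviation of per-line rates divided by sqrt(number of lines)."""
    # Assumption: the sample (n - 1) standard deviation is intended.
    return statistics.stdev(rates) / math.sqrt(len(rates))


def replichore_interval(position: int, chrom_length: int,
                        interval_size: int = 100_000) -> tuple[str, int]:
    """Assign a 0-based position to a (replichore, interval index) pair.

    Intervals are counted outward from the origin on each replichore.
    """
    if not 0 <= position < chrom_length:
        raise ValueError("position is outside the chromosome")
    if position < chrom_length / 2:
        return ("right", position // interval_size)
    return ("left", (chrom_length - position) // interval_size)


# Hypothetical lineages: 2, 4, and 6 mutations over 4 Mb and 5,000 generations.
lineage_rates = [mutation_rate(m, 4_000_000, 5_000) for m in (2, 4, 6)]
print(statistics.mean(lineage_rates))        # mean rate across lineages
print(pooled_standard_error(lineage_rates))  # its standard error
print(replichore_interval(250_000, 1_000_000))  # ('right', 2)
```

Dividing by the number of sites actually analyzed in each interval, rather than a nominal 100 kb, is what keeps the shorter terminal intervals comparable to the rest.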

Statistical analyses. All statistical analyses were performed in R Studio Version

RESULTS

Four MA experiments were carried out in this study using daily single-cell bottlenecks that limit the efficiency of natural selection to purge deleterious and enrich beneficial mutations. For the two wild-type (wt) experiments, V. fischeri ES114 (Vf-wt) and V. cholerae 2740-80 (Vc-wt) colonies were used to found 75 MA lineages, each of which was propagated for 217 days. For the two mutator (mut) experiments, V. fischeri ES114 (Vf-mut) and V. cholerae 2740-80 (Vc-mut) strains lacking a mutS gene were used to found 48 MA lineages, each of which was propagated for 43 days. The parameters of each MA experiment and the mutations that were identified from all of the final isolates are summarized in Table 1. In all four experiments, generations of growth per day declined over the course of the MA experiment, particularly in the mutator lineages, as a result of the fitness cost of bearing the acquired mutations (Figure B.1).

Table 1. Parameters and observed mutations in the four Vibrio fischeri and Vibrio cholerae mutation accumulation experiments.

| MA lines | Sequenced lines | Gen. per line | Gen. total | No. of bpsm | No. of indels | Bpsm rate per nucleotide (a) | Bpsm rate per genome (b) | Indel rate per nucleotide (a) | Indel rate per genome (b) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Vf_wt | 48 | 5187 | 248976 | 219 | 60 | 2.07 × 10^-10 | 8.85 × 10^-4 | 5.68 × 10^-11 | 2.43 × 10^-4 |
| Vc_wt | 49 | 6453 | 316197 | 138 | 22 | 1.07 × 10^-10 | 4.38 × 10^-4 | 1.71 × 10^-11 | 6.98 × 10^-5 |
| Vf_mut | 19 | 810 | 15390 | 4313 | 382 | 6.57 × 10^-8 | 2.81 × 10^-1 | 5.82 × 10^-9 | 2.49 × 10^-2 |
| Vc_mut | 22 | 1254 | 27588 | 1022 | 273 | 9.09 × 10^-9 | 3.72 × 10^-2 | 2.43 × 10^-9 | 9.93 × 10^-3 |

(a) Bpsm and indel mutation rates/nucleotide/generation are calculated as the number of observed mutations, divided by the product of sites analyzed and number of generations per lineage. The above estimates represent the average rate across all sequenced lineages.

(b) Bpsm and indel mutation rates/genome/generation are calculated by multiplying the mutation rate/nucleotide/generation in each lineage by the genome size. The above estimates represent the average rate across all sequenced lineages.

The properties of my MA experiments allowed me to assume that few mutations were subject to the biases of natural selection. The threshold selective coefficient ($s$) below which genetic drift will overpower natural selection is determined by:

$s = 1/N_e$,

where $N_e$ is the effective population size, estimated here as the harmonic mean of the population size ($N$) (Hall et al. 2008). By calculating $N_e$ for each MA experiment, I estimate that only mutations conferring an adaptive or deleterious effect ($s$) greater than 0.083, 0.067, 0.106, and 0.069 for Vf-wt, Vc-wt, Vf-mut, and Vc-mut, respectively, were subject to the biases of natural selection, which is expected to be a very small fraction of mutations (Kimura 1983; Hall et al. 2008). Furthermore, if I exclude indels that were identified at the same site or SSR, only four genes were hit more than once and no genes were hit more than twice across all wild-type lineages, suggesting that positive selection acting on common traits was minimal in these experiments.
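To make the harmonic-mean estimate concrete, the sketch below computes an effective population size for a colony that grows from a single cell and converts it to a threshold selective coefficient. It rests on labeled assumptions that may differ from the exact procedure used here: the population doubles synchronously each generation, its size is sampled once per generation (including the founding cell), and the threshold is taken as the reciprocal of the effective population size, the standard result for haploids. Function names are hypothetical.

```python
def harmonic_mean_population(generations_per_day: int) -> float:
    """Harmonic mean of colony size over one bottleneck-to-bottleneck cycle.

    Assumption: sizes are 1, 2, 4, ..., 2**g for g generations of doubling.
    """
    sizes = [2 ** i for i in range(generations_per_day + 1)]
    return len(sizes) / sum(1 / n for n in sizes)


def selection_threshold(effective_size: float) -> float:
    """Selective coefficient below which drift dominates (assumed 1 / N_e)."""
    return 1 / effective_size


# Hypothetical cycle of 24 generations between single-cell bottlenecks.
n_e = harmonic_mean_population(24)
print(round(n_e, 2))                       # 12.5
print(round(selection_threshold(n_e), 3))  # 0.08
```

Because the harmonic mean is dominated by the smallest population sizes, the effective size stays near half the number of daily generations even though each colony ends the day with millions of cells, which is why the threshold coefficients are so large.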

Other metrics that have been used to test that the efficiency of purifying selection is minimized in MA experiments include the ratio of coding to non-coding mutations and the ratio of synonymous to nonsynonymous bpsms. However, both of these tests are problematic as preferential mismatch repair in coding regions (Lee et al. 2012), context-dependent mutation biases (Sung et al. 2015), and a non-uniform distribution of mutation rates and spectra across the genome (Foster et al. 2013; Dillon et al. 2015; Dettman et al. 2016) can generate artificial signatures of natural selection. These issues were evident in my MA experiments, where chi-square tests comparing my observed mutations with the expected ratios of coding to non-coding DNA and synonymous to nonsynonymous sites in each genome were at times inconsistent.
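A chi-square comparison of this kind, with two categories and one degree of freedom, can be sketched as follows. The counts and expected fraction in the example are hypothetical and are not data from this study; the sketch uses the identity that the chi-square tail probability with one degree of freedom equals erfc(sqrt(x / 2)), which avoids any external statistics library.

```python
import math


def chi_square_two_categories(observed_a: int, observed_b: int,
                              expected_fraction_a: float) -> tuple[float, float]:
    """Goodness-of-fit test for two categories (d.f. = 1).

    Returns (chi-square statistic, p-value).
    """
    if not 0 < expected_fraction_a < 1:
        raise ValueError("expected_fraction_a must be between 0 and 1")
    total = observed_a + observed_b
    expected_a = total * expected_fraction_a
    expected_b = total - expected_a
    chi2 = ((observed_a - expected_a) ** 2 / expected_a
            + (observed_b - expected_b) ** 2 / expected_b)
    # Upper-tail probability of a chi-square variable with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 60 coding and 40 non-coding mutations, with half of
# the genome assumed to be coding.
chi2, p = chi_square_two_categories(60, 40, 0.5)
print(chi2)         # 4.0
print(round(p, 4))  # 0.0455
```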

For each MA experiment, the expected ratio of coding to non-coding mutations was determined directly from each ancestral reference genome, and the expected ratio of synonymous to nonsynonymous bpsms was calculated from each ancestral reference genome, after accounting for codon usage and %GC content at synonymous and nonsynonymous sites. In the Vf-wt lines, I observed an excess of non-coding indels and bpsms (bpsms: χ² = 4.01, d.f. = 1, p = 0.0451; indels: χ² = 61.43, d.f. = 1, p < 0.0001), while the ratio of nonsynonymous to synonymous bpsms did not differ significantly from the null expectation (χ² = 0.91, d.f. = 1, p = 0.3410). In the Vc-wt lines, non-coding bpsms were again in excess (χ² = 8.74, d.f. = 1, p = 0.0028), while the ratio of coding to non-coding indels (χ² = 1.48, d.f. = 1, p = 0.2240) and nonsynonymous to synonymous bpsms (χ² = 1.47, d.f. = 1, p = 0.2262) did not differ from the null expectation. The excess of non-coding indels and bpsms could imply that selection played a small role in eradicating coding mutations or that mismatch repair is more
