3.3 Summary
4.2.2 Genome-wide distribution of mutations
4.2.2.1 Genome organisation
4.2.2.1.2 First replichore versus second replichore
By using the GC-skew plot for E. coli REL4536, the two replichores were distinguished
such that both replichores terminated around sites TerB or TerC (Figure 4.11). Based
on this distinction, the first replichore in the REL4536 genome has a length of roughly 1.9 Mb and encodes 1,789 genes while the second replichore has a length of roughly 2.6 Mb and encodes 2,431 genes. Given the different distributions of the different mutation types (Figure 4.11), it was possible that the two replichores accumulated mutations at
different rates. To investigate this further, the rates at which mutations occurred within each replichore were calculated. In lineages with large-scale GCRs involving inversions around the terminus region, the inversions changed the size of each replichore, and the new replichore sizes were taken into account. Furthermore, inversions around the terminus region were themselves excluded from the analysis because they were not specific to a single replichore. As the two replichores are of different sizes, the mutation rates presented have been normalised and are thus average mutation rates per nucleotide per unit time.
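The normalisation described above can be illustrated with a minimal sketch. It assumes the rate is simply the mutation count divided by the product of replichore length and elapsed time; the function name, the mutation counts and the number of generations are hypothetical, and only the approximate replichore lengths come from the text.

```python
def per_nucleotide_rate(mutations, replichore_bp, time_units):
    """Mutations per nucleotide per unit time (generations or days)."""
    if replichore_bp <= 0 or time_units <= 0:
        raise ValueError("replichore_bp and time_units must be positive")
    return mutations / (replichore_bp * time_units)


# Approximate replichore lengths quoted in the text.
FIRST_REPLICHORE_BP = 1_900_000
SECOND_REPLICHORE_BP = 2_600_000

# Hypothetical lineage: the counts and the 1000 generations are made up.
first = per_nucleotide_rate(3, FIRST_REPLICHORE_BP, 1000)
second = per_nucleotide_rate(3, SECOND_REPLICHORE_BP, 1000)

# Equal raw counts give a higher normalised rate in the shorter replichore.
print(f"first: {first:.2e}, second: {second:.2e}")
```

For a lineage in which an inversion around the terminus has shifted the replichore boundary, the adjusted length would be passed as `replichore_bp` instead of the ancestral value.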
The per generation mutation rates for the first and second replichores were 1.5-fold (Mann-Whitney U = 248.0, p = 0.206) and 1.8-fold (Mann-Whitney U = 175.0, p = 0.009) greater, respectively, in anaerobically grown cells than in aerobically grown cells, although only the difference for the second replichore was significant. Meanwhile, the per day mutation rates for the first and second replichores were 2.1-fold (Mann-Whitney U = 177.0, p = 0.010) and 1.7-fold (Mann-Whitney U = 159.0, p = 0.003) greater, respectively, in aerobically grown cells than in anaerobically grown cells. These observations are consistent with the previously reported per generation and per day mutation rates (Figure 4.1).
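As an illustration of the test used for these comparisons, the following is a minimal pure-Python sketch of the Mann-Whitney U statistic with midranks for ties and a two-sided p-value from the normal approximation with continuity correction. The text does not state which software or p-value method was used, so this is an assumption, and the input values are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Return (U_x, U_y, two-sided p) using midranks and a normal approximation."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t**3 - t
        i = j
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u_y = n1 * n2 - u_x
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, u_y, 1.0
    z = max((abs(u_x - mean_u) - 0.5) / math.sqrt(var_u), 0.0)
    p = math.erfc(z / math.sqrt(2))  # two-sided tail of the standard normal
    return u_x, u_y, p


# Hypothetical per-lineage rates (arbitrary units), not data from this study.
aerobic = [1, 2, 3]
anaerobic = [4, 5, 6]
u_aer, u_ana, p_value = mann_whitney_u(aerobic, anaerobic)
print(u_aer, u_ana, round(p_value, 3))
```

Software packages differ in which of the two U values they report and in whether they use an exact or an approximate p-value, so a sketch like this would not necessarily reproduce the reported figures.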
To determine if specific mutation types were responsible for the difference in mutation rates observed between the two replichores, the rates at which BPSs, indels and GCRs occurred in the two replichores under each environment were calculated (Figure 4.12).
Per generation rates for BPSs (Figure 4.12a) did not appear to be biased towards a particular replichore in either environment (p > 0.05, Mann-Whitney U-test). However, rates of BPSs per day (Figure 4.12b) were 2.7-fold (Mann-Whitney U = 251.0, p = 0.007) and 2.2-fold (Mann-Whitney U = 191.5, p = 0.022) greater in the first and second replichore, respectively, of aerobically grown cells as compared to anaerobically grown cells. Rates of indels per generation and per day (Figure 4.12) did not appear to be biased towards a particular replichore in either environment (p > 0.05, Mann-Whitney U-test).
Figure 4.12. Mutation rates of different mutation types in the two replichores in aerobically and anaerobically grown E. coli. Shown are a) mean mutation rates per nucleotide per generation and b) mean mutation rates per nucleotide per day of growth. Error bars represent standard error of the mean. Asterisk denotes a significant difference between the aerobic and anaerobic mutation rates (p < 0.05).
On the other hand, rates of GCRs per generation (Figure 4.12a) were significantly greater, by 2.4-fold (Mann-Whitney U = 201.0, p = 0.025) and three-fold (Mann-Whitney U = 155.0, p = 0.002) in the first and second replichore, respectively, in anaerobically grown cells as compared to aerobically grown cells. Meanwhile, per day rates of GCRs (Figure 4.12b) did not appear to be biased towards a particular replichore in either environment (p > 0.05, Mann-Whitney U-test). As GCRs per generation were shown to be more prevalent in anaerobically grown cells (section 4.2.1.3), it was interesting to see that they occurred at relatively similar rates across the two differently sized replichores and that no single region of the genome was responsible for the high anaerobic GCR per generation rate.
To investigate the G → T versus C → A mutation rate asymmetry further (section 4.2.1.1.1), mutation rates for the 12 different types of BPS, normalized to account for any differences in nucleotide content per replichore, were calculated (Figure A.1 and Figure A.2). For G → T transversions, mutation rates per generation were significantly greater in the first replichore under aerobic conditions (Mann-Whitney U = 228.0, p = 0.025). On the other hand, C → A mutation rates per generation were greater in the second replichore under anaerobic conditions, although this difference fell just short of significance (Mann-Whitney U = 250.0, p = 0.055). Likewise, per generation mutation rates for A → C transversions and T → G transversions were significantly greater in the second replichore under anaerobic conditions (Mann-Whitney U = 234.0, p = 0.027). These results suggested the presence of a strand bias in the types of BPSs that arose under aerobic and anaerobic conditions, though the cause of this is unknown.
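One way to normalise the 12 BPS types for nucleotide content is to divide each count by the number of sites carrying the reference base in that replichore. The sketch below assumes this definition, which the text does not spell out; the function name, the toy sequence and the substitution list are hypothetical.

```python
from collections import Counter


def bps_rates_by_type(substitutions, replichore_seq, generations):
    """Rate of each ref>alt substitution per reference-base site per generation."""
    base_counts = Counter(replichore_seq.upper())
    observed = Counter(substitutions)
    rates = {}
    for ref in "ACGT":
        for alt in "ACGT":
            if ref == alt:
                continue
            sites = base_counts[ref]
            count = observed[(ref, alt)]
            # With no sites carrying the reference base, the rate is undefined.
            rates[f"{ref}>{alt}"] = (
                count / (sites * generations) if sites else float("nan")
            )
    return rates


# Toy replichore with twice as many G sites as C sites (hypothetical).
toy_seq = "GGGGCCAATT"
toy_subs = [("G", "T"), ("G", "T"), ("C", "A")]
rates = bps_rates_by_type(toy_subs, toy_seq, generations=10)
print(rates["G>T"], rates["C>A"])
```

In this toy case the raw G → T count is double the C → A count, yet the normalised rates are equal, which is the kind of composition effect the normalisation is meant to remove.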
The factors behind any asymmetric mutation pressures on the aerobically and anaerobically grown cells are not immediately apparent. It is possible that the observed BPS spectrum is a result of replication strand bias (259), where the chromosome can be clearly differentiated by GC-skew between the two replichores (258). During replication, the template leading strand is discontinuously single-stranded while the complementary Okazaki fragments are being synthesized, whereas the template lagging strand is maintained in a double-stranded structure during continuous leading strand synthesis (Figure 4.13). As ssDNA is more susceptible to DNA damage and so more prone to mutation (246, 258), the rates at which mutations arise in the template leading strand are likely to be higher than those in the template lagging strand. Moreover, as replication is slower during anaerobic growth, mutation rates in the template leading strand may be higher in anaerobically grown cells than in aerobically grown cells. Therefore, it is possible that this replication bias has led to the mutation rate asymmetry seen in Figure 4.2. In addition, the mutational strand bias is potentially the result of transcription bias, where the un-transcribed strand is repaired more efficiently than the transcribed strand (258, 259, 272). Alternatively, it is also possible that the observed BPS spectra of the aerobically and anaerobically grown cells are the result of the different physiological conditions generated by the cells during growth under their respective environmental conditions. Thus, further work will be required to determine whether the observed BPS spectra of the aerobically and anaerobically grown cells are a result of replication bias, transcriptional bias, physiology, or a combination of the three.
Figure 4.13. Genome organisation of E. coli. Due to the bi-directional nature of E. coli replication, each replichore has both leading (template leading strand shown in green) and lagging (template lagging strand shown in red) strands. Figure modified from (263).
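For reference, GC-skew of the kind used to separate the two replichores is conventionally computed as (G - C)/(G + C) in windows along the chromosome, with the extrema of the cumulative curve read as the approximate origin and terminus. The following is a minimal sketch on a toy sequence; the window size and the sequence are hypothetical and it is not the procedure used to produce Figure 4.11.

```python
def gc_skew(seq, window=1000):
    """GC-skew, (G - C) / (G + C), for consecutive non-overlapping windows."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative(values):
    """Running sum of the windowed skew values."""
    total, out = 0.0, []
    for v in values:
        total += v
        out.append(total)
    return out


# Toy chromosome: a G-rich half followed by a C-rich half (hypothetical).
toy = "GGGC" * 5 + "CCCG" * 5
skews = gc_skew(toy, window=4)
curve = cumulative(skews)

# The skew changes sign where the cumulative curve peaks.
switch_window = curve.index(max(curve))
print(skews, switch_window)
```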