Bacteria and Archaea.
Let us now turn to prokaryotic genomes. This section presents the results of an analysis of the nucleotide sequences of all 19 prokaryotic genomes, from different groups of both Bacteria and Archaea, that are listed in the article on Chargaff's second rule [Rapoport, Trifonov, 2012, p. 2]: “Nucleotide disparities for prokaryotic coding sequences were taken from bacterial genomes of different groups both from Bacteria and Archea. All together 19 genomes were used: Aquifex aeolicus, Acidobacteria bacterium, Bradyrhizobium japonicum, Bacillus subtilis, Chlamydia trachomatis, Chromobacterium violaceum, Dehalococcoides ethenogenes, Escherichia coli, Flavobacterium psychrophilum, Gloeobacter violaceus, Helicobacter pilory, Methanosarcina acetivorans, Nanoarchaeum equitans, Syntrophus aciditrophicus, Streptomyces coelicolor, Sulfolobus solfataricus, Treponema denticola, Thermotoga maritima and Thermus thermophiles”.
Fig. 9.1 shows the results of analyzing these prokaryotic genomes by the oligomer sums method. The results demonstrate that hyperbolic rule No. 1 holds for all of the listed prokaryotic genomes: the model hyperbolic progressions HA,1(n) = SA/n, HT,1(n) = ST/n, HC,1(n) = SC/n, and HG,1(n) = SG/n from expression (2.2) practically coincide with the OS-sequences of the real total amounts of n-plets from the classes of A1-, T1-, C1-, and G1-oligomers at n = 1, 2, 3, …, 20. Because of this coincidence, the model hyperbolic progressions, shown as red lines in the graphs of Fig. 9.1, almost completely cover the sequences of real values; the blue lines in the lower graphs show, in percent, the slight alternating deviations of the real values from the model values.
1   SA = 440779   ST = 436095   SC = 336361   SG = 338100
2   SA = 1076577  ST = 1084801  SC = 1426653  SG = 1408353
3   [sums not recovered from the figure]
4   SA = 1129118  ST = 1129396  SC = 882500   SG = 884272
5   SA = 301793   ST = 300618   SC = 212019   SG = 211409
6   SA = 21274    ST = 23172    SC = 38842    SG = 44089
7   SA = 405227   ST = 402383   SC = 355663   SG = 358014
8   SA = 1297551  ST = 1293044  SC = 1321325  SG = 1319228
9   SA = 945771   ST = 975318   SC = 468718   SG = 458213
10  SA = 887941   ST = 882586   SC = 1444547  SG = 1443945
11  SA = 498514   ST = 501121   SC = 323770   SG = 320426
12  SA = 1638004  ST = 1658700  SC = 1228410  SG = 1226378
13  SA = 167981   ST = 167983   SC = 77361    SG = 77560
14  SA = 772747   ST = 770484   SC = 812772   SG = 823297
15  SA = 1203558  ST = 1213059  SC = 3121252  SG = 3129638
16  SA = 867639   ST = 881683   SC = 490453   SG = 487562
17  SA = 570544   ST = 572448   SC = 346447   SG = 353748
18  SA = 501112   ST = 498004   SC = 424115   SG = 436351
19  SA = 327251   ST = 330338   SC = 734285   SG = 729652
Fig. 9.1. Graphical representations of the results of the analysis, by the oligomer sums method, of the 19 genomes of Bacteria and Archaea mentioned in [Rapoport, Trifonov, 2012, p. 2]. For each genome, two rows of resulting data are shown, with n = 1, 2, …, 20 plotted along the abscissa axes: the top rows demonstrate that the model hyperbolic progressions SA/n, ST/n, SC/n, SG/n (red lines) almost completely cover the OS-sequences of real values (the ordinate axes show the corresponding values); the blue lines in the bottom rows show, in percent, the slight alternating deviations of the real values from the model values. The numbers in the left column denote the genomes, as explained in the text.
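The comparison summarized in Fig. 9.1 can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the procedure actually used for the figure: it assumes that the sequence is cut into consecutive non-overlapping n-plets and that, for example, an A1-oligomer is an n-plet whose first letter is A (the exact definition is the one given with expression (2.2)). The function names and the random test sequence are hypothetical.

```python
import random

BASES = "ATCG"

def oligomer_sums(seq, n_max=20):
    """For n = 1..n_max, count the non-overlapping n-plets that start
    with each base (assumed meaning of the A1-, T1-, C1-, G1-classes)."""
    sums = {b: [] for b in BASES}
    for n in range(1, n_max + 1):
        counts = dict.fromkeys(BASES, 0)
        # i runs over the start positions of complete n-plets only
        for i in range(0, len(seq) - n + 1, n):
            if seq[i] in counts:
                counts[seq[i]] += 1
        for b in BASES:
            sums[b].append(counts[b])
    return sums

def deviations_percent(seq, n_max=20):
    """Deviation, in percent, of each real count from the model S_b / n."""
    sums = oligomer_sums(seq, n_max)
    dev = {}
    for b in BASES:
        s_b = sums[b][0]  # at n = 1 the count is simply S_b
        if s_b == 0:
            continue      # base absent: the model value is undefined
        dev[b] = [100.0 * (real - s_b / n) / (s_b / n)
                  for n, real in enumerate(sums[b], start=1)]
    return dev

if __name__ == "__main__":
    # Hypothetical input: a random sequence, NOT one of the 19 genomes.
    random.seed(0)
    demo = "".join(random.choice(BASES) for _ in range(100000))
    for base, values in deviations_percent(demo).items():
        print(base, [round(v, 2) for v in values])
```

To apply the sketch to a real genome, the sequence string would be read from a downloaded FASTA file instead of being generated at random.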
The genomes are enumerated in Fig. 9.1 by numbers 1-19:
1) Aquifex aeolicus VF5, complete genome, 1551335 bp, accession AE000657, version AE000657.1,
https://www.ncbi.nlm.nih.gov/nuccore/AE000657.1?report=genbank ;
2) Acidobacteria bacterium KBS 146
M015DRAFT_scf7180000000004_quiver.1_C, whole genome shotgun sequence, 4996384 bp, accession JHVA01000001,
https://www.ncbi.nlm.nih.gov/nuccore/JHVA01000001.1?report=genbank;
3) Bradyrhizobium japonicum strain E109, complete genome, 9224208 bp, accession CP010313,
https://www.ncbi.nlm.nih.gov/nuccore/CP010313.1?report=genbank;
4) Bacillus subtilis strain UD1022, complete genome, 4025326 bp, accession CP011534,
https://www.ncbi.nlm.nih.gov/nuccore/CP011534.1?report=genbank;
5) Chlamydia trachomatis strain QH111L, complete genome, 1025839 bp, accession CP018052,
https://www.ncbi.nlm.nih.gov/nuccore/CP018052.1?report=genbank;
6) Chromobacterium violaceum strain LK30 1, whole genome shotgun sequence, 127377 bp, accession LDUX01000001 version LDUX01000001.1,
https://www.ncbi.nlm.nih.gov/nuccore/LDUX01000001.1?report=genbank;
7) Dehalococcoides mccartyi strain CG3, complete genome, NCBI Reference Sequence: NZ_CP013074.1, 1521287 bp,
https://www.ncbi.nlm.nih.gov/nuccore/NZ_CP013074.1?report=genbank;
8) Escherichia coli CFT073, complete genome, GenBank: AE014075.1, 5231428 bp, https://www.ncbi.nlm.nih.gov/nuccore/AE014075.1?report=genbank;
9) Flavobacterium psychrophilum JIP02/86, complete genome, 2860382 bp, accession NC_009613,
https://www.ncbi.nlm.nih.gov/nuccore/NC_009613.3;
10) Gloeobacter violaceus PCC 7421 DNA, complete genome, GenBank: BA000045.2, 4659019 bp, accession BA000045 AP006568-AP006583 version BA000045.2,
https://www.ncbi.nlm.nih.gov/nuccore/BA000045.2?report=genbank;
11) Helicobacter pylori, NCBI Reference Sequence: NC_000921.1, complete genome, 1643831 bp, accession NC_000921 NZ_AE001440-NZ_AE001571 version NC_000921.1,
https://www.ncbi.nlm.nih.gov/nuccore/NC_000921.1;
12) Methanosarcina acetivorans str. C2A, complete genome, 5751492 bp, accession AE010299 AE010656-AE011189 version AE010299.1,
https://www.ncbi.nlm.nih.gov/nuccore/AE01029;
13) Nanoarchaeum equitans Kin4-M, complete genome, 490885 bp, accession AE017199 AACL01000000 AACL01000001 version AE017199.1,
https://www.ncbi.nlm.nih.gov/nuccore/AE017199.1?report=genbank;
14) Syntrophus aciditrophicus SB, complete genome, 3179300 bp, accession CP000252,
https://www.ncbi.nlm.nih.gov/nuccore/CP000252.1?report=genbank;
15) Streptomyces coelicolor A3(2) complete genome, 8667507 bp, accession AL645882,
https://www.ncbi.nlm.nih.gov/nuccore/AL645882.2?report=genbank;
16) Sulfolobus solfataricus strain SULA, complete genome, 2727337 bp, accession CP011057,
https://www.ncbi.nlm.nih.gov/nuccore/CP011057.1?report=genbank;
17) Treponema denticola SP33 supercont1.1, whole genome shotgun sequence, NCBI Reference Sequence: NZ_KB442453.1, 1850823 bp, accession NZ_KB442453 NZ_AGDZ01000000 version NZ_KB442453.1,
https://www.ncbi.nlm.nih.gov/nuccore/NZ_KB442453.1?report=genbank;
18) Thermotoga maritima strain Tma200, complete genome, 1859582 bp, accession CP010967,
https://www.ncbi.nlm.nih.gov/nuccore/CP010967.1?report=genbank;
19) Thermus thermophilus DNA, complete genome, strain: TMY, 2121526 bp, accession AP017920,
https://www.ncbi.nlm.nih.gov/nuccore/AP017920.1?report=genbank
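As a simple consistency check on the data above, the sums SA, ST, SC, SG reported with Fig. 9.1 can be compared with the genome lengths in this list and with Chargaff's second rule (SA ≈ ST, SC ≈ SG). The sketch below does this for three of the genomes (Nos. 1, 10, and 13); the numbers are copied from the text, while the function names and the choice of a relative-difference measure are assumptions made only for illustration.

```python
# (number, name, length in bp, SA, ST, SC, SG), copied from the text above
GENOMES = [
    (1, "Aquifex aeolicus VF5", 1551335, 440779, 436095, 336361, 338100),
    (10, "Gloeobacter violaceus PCC 7421", 4659019, 887941, 882586, 1444547, 1443945),
    (13, "Nanoarchaeum equitans Kin4-M", 490885, 167981, 167983, 77361, 77560),
]

def rel_diff(x, y):
    """Relative difference |x - y| / (x + y); an illustrative measure only."""
    return abs(x - y) / (x + y)

def check(entry):
    """Compare the four sums with the genome length and with each other."""
    number, name, length, sa, st, sc, sg = entry
    return {
        "number": number,
        "name": name,
        "sum_matches_length": sa + st + sc + sg == length,
        "at_diff": rel_diff(sa, st),  # Chargaff's second rule: SA close to ST
        "cg_diff": rel_diff(sc, sg),  # Chargaff's second rule: SC close to SG
    }

if __name__ == "__main__":
    for entry in GENOMES:
        r = check(entry)
        print(r["number"], r["name"], r["sum_matches_length"],
              f"A/T: {r['at_diff']:.4f}", f"C/G: {r['cg_diff']:.4f}")
```

For these three entries the four sums add up to the listed genome length, and both relative differences stay below one percent.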
One can see from Fig. 9.1 that in some prokaryotic genomes (for example, in Nos. 3, 7, 9, and 15) the small alternating deviations of the real values from the model values are systematic and related to 3m-plets; this seems to be analogous to the much stronger triplet-deviations described above for human genes in Figs. 8.1-8.7. Can the presence of such triplet-deviations in the genomes of some bacteria serve as a criterion for selecting bacterial species for genetic engineering problems? This is one of many new questions raised by the discovery of the hyperbolic rules presented here and by the applications of the oligomer sums method.
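Why deviations of this kind concentrate at 3m-plets can be illustrated with a toy sequence. The sketch below is not taken from the analysis of the genomes; it assumes, purely for illustration, a sequence in which A is favoured at every third position (a crude stand-in for codon structure) and counts, as an assumed reading of the A1-class, the non-overlapping n-plets that start with A. When n is a multiple of 3, every n-plet starts at a favoured position, so the real count should exceed the model value SA/n; for other n the start positions cycle through all three phases and the deviation should stay near zero.

```python
import random

def a_start_counts(seq, n_max=20):
    """Counts of non-overlapping n-plets beginning with A, for n = 1..n_max."""
    return [sum(1 for i in range(0, len(seq) - n + 1, n) if seq[i] == "A")
            for n in range(1, n_max + 1)]

def deviations_percent(counts):
    """Deviation, in percent, of each count from the model value S_A / n."""
    s_a = counts[0]  # the count at n = 1 is S_A
    return [100.0 * (c - s_a / n) / (s_a / n)
            for n, c in enumerate(counts, start=1)]

random.seed(1)
# Toy sequence (hypothetical): A is 4 times likelier than T, C, or G
# at positions 0, 3, 6, ...; all other positions are uniform.
toy = "".join(
    random.choice("AAAATCG") if i % 3 == 0 else random.choice("ATCG")
    for i in range(300000)
)

dev = deviations_percent(a_start_counts(toy))
at_3m = [abs(d) for n, d in enumerate(dev, start=1) if n % 3 == 0]
others = [abs(d) for n, d in enumerate(dev, start=1) if n % 3 != 0]

if __name__ == "__main__":
    print("mean |deviation| at n = 3m:  ", sum(at_3m) / len(at_3m))
    print("mean |deviation| at other n:", sum(others) / len(others))
```

The bias in this toy sequence is deliberately strong so that the effect is obvious; in the bacterial genomes of Fig. 9.1 the corresponding deviations are small.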
Fig. 9.2 shows examples of the sequences of harmonic mean values for two of these bacterial genomes. Triplet-deviations are visible in these sequences at the points corresponding to 3m-plets.
Fig. 9.2. The sequences of harmonic mean values of agreed deviations of all four OS-sequences from their model harmonic progressions in the genomes of Bradyrhizobium japonicum strain E109 (left) and Escherichia coli CFT073 (right). The values n = 1, 2, …, 20 are plotted along the abscissa axes. The ordinate axes show the harmonic mean values.
10. Analysis of genomes of microorganisms living in extreme environments