3.3.3 Eocyte hypothesis
Beyond validating the structural phylogeny method through the satisfactory recovery of known evolutionary signals, this work can be used to generate insight into the Eocyte hypothesis. Competing arguments exist regarding the structure of the tree of life. Two of these are the three monophyletic domains of the Woese system [85] and an alternative topology proposed by Lake [86], i.e. the Eocyte hypothesis. Lake argued that, within domain archaea, only the crenarchaeota (previously known as eocytes) are monophyletic with eukaryotes, in contrast to the canonical Woese classification, in which the complete domain archaea (comprising Euryarchaeota and Crenarchaeota) is monophyletic; see Figure 3.6. The literature contains considerable evidence both for [87–90] and against [91, 92] this proposal, with some studies unable to support one topology over the other [93, 94].
The class II aaRSs presented an opportunity to probe support for one of the two topologies in Figure 3.6. The six structures in the class IIb cluster of Figure 3.4 (i.e. AspRS and AsnRS) originate from E. coli (PDB ID: 1C0A), H. sapiens mitochondria (PDB ID: 4AH6), H. sapiens cytoplasm (PDB ID: 4J15), S. cerevisiae cytoplasm (PDB ID: 1EOV), S. tokodaii (Crenarchaeota, PDB ID: 1WYD) and P. horikoshii (Euryarchaeota, PDB ID: 1X54). In this case AsnRS from P. horikoshii (euryarchaeota) and AspRS from S. tokodaii (crenarchaeota) are considered equivalent. The primary reason for this equivalence is their structural similarity, as reflected in the small distance between them. Moreover, S. tokodaii does not have an AsnRS and instead relies on mischarging by AspRS to produce Asn-tRNA [95, 96]. It is also worth noting that AsnRS is absent in a majority of bacteria and archaea, where the same function is performed by a non-discriminating AspRS [3, 4]. The results shown in Figure 3.4 are consistent with this observation.
As discussed earlier, mitochondrial aaRSs show higher similarity to bacterial synthetases than to their cytoplasmic counterparts. This is also seen here, where E. coli AspRS and H. sapiens mitochondrial AspRS are monophyletic (Figure 3.7). This grouping is distant from the remaining four structures, in which the archaeal AspRS and AsnRS (1WYD, chain A from crenarchaeota and 1X54, chain A from euryarchaeota) form a group and share a clade with the eukaryotic aaRSs (4J15, chain A from H. sapiens and 1EOV, chain A from S. cerevisiae). This topology of bacterial, archaeal and eukaryotic aaRSs, illustrated in Figure 3.7, lends support to the Woese classification of the three domains, in which all archaea (euryarchaeota and crenarchaeota) are monophyletic and share a higher degree of similarity with eukarya than with bacteria. In contrast, the presence of the P. horikoshii (euryarchaeal) structure in the bacterial-mitochondrial group would have presented evidence in favour of the eocyte tree, which is not the case.
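The grouping argument above can be phrased as a simple clade test. The following is a minimal sketch, not part of the original analysis: it encodes the topology described for Figure 3.7 as nested tuples labelled by PDB ID and checks which sets of structures form a clade. It assumes the tree is rooted on the bacterial-mitochondrial group; the function names (leaf_set, clade_sets, is_monophyletic) are hypothetical helpers introduced only for illustration.

```python
def leaf_set(node):
    """Return the set of leaf labels below a node (a str leaf or a tuple)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def clade_sets(tree):
    """Collect the leaf set of every internal node of a rooted tree."""
    found = set()

    def walk(node):
        if not isinstance(node, str):
            found.add(leaf_set(node))
            for child in node:
                walk(child)

    walk(tree)
    return found


def is_monophyletic(tree, group):
    """True if the given labels make up exactly one clade of the tree."""
    return frozenset(group) in clade_sets(tree)


# Topology as described in the text for Figure 3.7.
# ASSUMPTION: rooted on the bacterial-mitochondrial pair (1C0A, 4AH6).
figure_3_7 = (("1C0A", "4AH6"), (("1WYD", "1X54"), ("4J15", "1EOV")))

# Woese: crenarchaeal (1WYD) and euryarchaeal (1X54) structures form a clade.
archaea = {"1WYD", "1X54"}
# Eocyte: the crenarchaeal structure groups with the eukaryotic ones,
# to the exclusion of the euryarchaeal structure.
eocyte_clade = {"1WYD", "4J15", "1EOV"}

woese_supported = is_monophyletic(figure_3_7, archaea)
eocyte_supported = is_monophyletic(figure_3_7, eocyte_clade)
print(woese_supported, eocyte_supported)
```

With this encoding the archaeal pair forms a clade and the eocyte grouping does not, which mirrors the reading of Figure 3.7 given in the text. The outcome depends on the assumed rooting.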
Figure 3.7: A selection of structures from class IIb in Figure 3.4 to illustrate support for the Woese classification of the three domains of life.
3.4 Discussion
In this work, the structural method [21] for probing evolutionary relationships using protein structures was used to recover well-studied relationships between aaRSs. The synthetases are one of the most ancient protein families, given the role they play in living organisms. Because this role is conserved, they have been evolutionarily conserved across the three domains of life and provide an opportunity to probe deep evolutionary relationships. The primary aim of the method is to assist in the recovery of deep evolutionary relationships from datasets in which the evolutionary signal is too weak to be probed with conventional sequence-based methods. Because the aaRSs are substantially conserved at the sequence level, the method is used here to recover well-established relationships derived from sequence-based analysis. This dataset therefore acts as a control, and the successful recovery of evolutionary signals lends confidence to the predictions made by this method in cases where sequence-based inferences are not possible.
As discussed in the previous section, the method recovers tree-like networks reasonably well. The structural analysis performed here used the SSM-based Qscore as the primary metric to quantify the distance between protein structures, coupled to the NJ algorithm to recover phylogenies. The method is successful in terms of (a) recovering the substructure in the aaRS classification and (b) recovering the known relationships between cytoplasmic, mitochondrial and bacterial aaRSs. Point (a) is of considerable significance because previously used methods of quantifying structural distance for phylogenetic inference disagreed with sequence-based methods [19], whereas the choice of Qscore agrees well with classifications determined by sequence analysis.
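The coupling of a structural score to NJ can be illustrated with a small sketch. The code below is an illustration under stated assumptions rather than the pipeline used in this work: the Qscore follows the published SSM definition, N_align^2 / ((1 + (RMSD/R0)^2) N1 N2) with R0 = 3 Å; the conversion to a distance as 1 - Q is an assumption, since the text does not state how distances were derived; the labels s1 to s5 and their distances are invented toy values, not results; and all function names are hypothetical.

```python
from itertools import combinations


def q_score(n_align, rmsd, n1, n2, r0=3.0):
    """SSM-style Qscore: N_align^2 / ((1 + (RMSD/R0)^2) * N1 * N2)."""
    return n_align ** 2 / ((1.0 + (rmsd / r0) ** 2) * n1 * n2)


def q_distance(n_align, rmsd, n1, n2):
    """ASSUMPTION: structural distance taken as 1 - Q."""
    return 1.0 - q_score(n_align, rmsd, n1, n2)


def neighbor_joining(names, dist):
    """Topology-only neighbour joining; returns an unrooted tree as a
    3-way tuple of nested tuples. dist maps (a, b) pairs to distances."""
    d = {frozenset(pair): value for pair, value in dist.items()}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # Row sums of the current distance matrix.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a)
             for a in nodes}
        # NJ criterion (unrelated to the Qscore): minimise
        # (n - 2) * d(a, b) - r(a) - r(b).
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        new = (a, b)
        for c in nodes:
            if c != a and c != b:
                d[frozenset((new, c))] = 0.5 * (
                    d[frozenset((a, c))] + d[frozenset((b, c))]
                    - d[frozenset((a, b))])
        nodes = [c for c in nodes if c != a and c != b] + [new]
    return tuple(nodes)


def leaves(node):
    """Leaf labels below a node (a str leaf or a tuple of children)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def clades(tree):
    """Leaf sets of all internal nodes below the top-level trifurcation."""
    found = set()

    def walk(node):
        if not isinstance(node, str):
            found.add(leaves(node))
            for child in node:
                walk(child)

    for child in tree:
        walk(child)
    return found


# Invented toy distances for five hypothetical structures (not real data).
toy = {
    ("s1", "s2"): 0.20, ("s1", "s3"): 0.65, ("s1", "s4"): 0.65,
    ("s1", "s5"): 0.70, ("s2", "s3"): 0.65, ("s2", "s4"): 0.65,
    ("s2", "s5"): 0.70, ("s3", "s4"): 0.30, ("s3", "s5"): 0.55,
    ("s4", "s5"): 0.55,
}
tree = neighbor_joining(["s1", "s2", "s3", "s4", "s5"], toy)
print(tree)
print(frozenset({"s1", "s2"}) in clades(tree))
```

In this toy matrix s1 and s2 are the closest pair and are joined first, so they appear together as a clade in the resulting tree. Branch lengths are omitted for brevity; a production analysis would use an established implementation rather than this sketch.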
Furthermore, it is well established that mitochondrial aaRSs group closely with bacterial aaRSs, an observation that is also recovered here. Moreover, each functional cluster is observed, i.e. aaRSs responsible for charging tRNAs with the same amino acid group together across species. This functional clustering leads to the recovery of the well-established substructure of the aaRSs. For class I, the presence of the Rossmann fold reveals the canonical relationships derived from sequence analysis, whereas for class II, near-canonical relationships are recovered for two of the three subclasses, with the deviation in the third explained by the sequence and quaternary structure variation discussed in the previous section.
The success of this method opens up a new area of exploring deep evolutionary relationships that could previously not be analysed at the sequence level due to extreme sequence divergence. An example of this was presented, namely using the structural approach and the class II aaRSs to explore two competing descriptions of the organisation of the domains of life, i.e. the three-domain Woese tree and the eocyte hypothesis. In this case only a single species from each of euryarchaeota and crenarchaeota was present in the data, due to a lack of structures. Choosing one classification over the other purely on the basis of a single data point would be incorrect. However, given the relatively high success rate in recovering established relationships, it is anticipated that once more aaRS structures are available from both euryarchaeota and crenarchaeota, a clearer picture will emerge, with conclusive evidence either for or against one of the classifications.