Article (Discoveries)
Title:
From β- to α-proteobacteria: the origin and evolution of rhizobial nodulation genes nodIJ
Authors:
Seishiro Aoki*1
, Motomi Ito*, and Wataru Iwasaki†
Author affiliation:
*Department of General Systems Studies, Graduate School of Arts and Sciences, University of Tokyo, 3-8-1 Komaba, Meguro-ku, Tokyo 153-8902, Japan
†Center for Earth Surface System Dynamics, Atmosphere and Ocean Research Institute, the University of Tokyo, 5-1-5 Kashiwa-no-ha, Kashiwa, Chiba 277-8564, Japan
1
Corresponding author: Seishiro Aoki
E-mail: saoki@bio.c.u-tokyo.ac.jp
Keywords:
symbiosis, rhizobia, nodulation genes, horizontal gene transfer, gene duplication
MBE Advance Access published September 11, 2013
by guest on July 13, 2016
http://mbe.oxfordjournals.org/
Abstract
Although many α- and some β-proteobacterial species are symbiotic with legumes, the
evolutionary origin of nitrogen-fixing nodulation remains unclear. We examined α- and β
-proteobacteria whose genomes were sequenced using large-scale phylogenetic profiling and
revealed the evolutionary origin of two nodulation genes. These genes, nodI and nodJ (nodIJ),
play key roles in the secretion of Nod factors, which are recognized by legumes during
nodulation. We found that only the nodulating β-proteobacteria, including the novel strains
isolated in this study, possess both nodIJ and their paralogous genes (DRA-ATPase/permease
genes). Contrary to the widely accepted scenario of the α-proteobacterial origin of rhizobia, our
exhaustive phylogenetic analysis showed that the entire nodIJ clade is included in the clade of
Burkholderiaceae DRA-ATPase/permease genes, i.e., the nodIJ genes originated from gene
duplication in a lineage of the β-proteobacterial family. After duplication, the evolutionary rates
of nodIJ were significantly accelerated relative to those of homologous genes, which is consistent with their novel function in nodulation. The likelihood analyses suggest that this accelerated evolution is not associated with changes in either nonsynonymous/synonymous substitution rates or transition/transversion rates, but rather, in the GC content. Although the low GC content of the nodulation genes has been assumed to reflect past horizontal transfer events from donor rhizobial genomes with low GC content, no rhizobial genome with such low GC content has yet been found. Our results encourage a reconsideration of the origin of nodulation and suggest new perspectives on the role of the GC content of bacterial genes in functional adaptation.
by guest on July 13, 2016
http://mbe.oxfordjournals.org/
Introduction
The symbiosis of prokaryotes with eukaryotes is a widespread and important phenomenon in ecosystems. Indeed, the symbiosis between nitrogen-fixing rhizobia and host legumes is one of the most eagerly studied subjects, and its origin has been of interest since its discovery more than
100 years ago (Quispel 1988). However, the origin and evolution of this symbiosis are still
unclear. Prior to the isolation of legume-nodulating bacterial species of Burkholderia and
Cupriavidus (Burkholderiaceae) in β-proteobacteria in 2001 (Chen et al. 2001; Moulin et al.
2001), it had been assumed that rhizobia were restricted to the α-proteobacteria class. Whereas
most legumes are symbiotic with species of α-proteobacteria (α-rhizobia), several legumes,
particularly those of the genus Mimosa, are nodulated primarily by β-proteobacteria (β-rhizobia)
(Chen et al. 2005; Andam et al. 2007; Bontemps et al. 2010; Gyaneshwar et al. 2011; Mishra et al. 2012). Phylogenetic studies of 16S ribosomal DNA (rDNA) and housekeeping genes have revealed that legume-nodulating bacteria are polyphyletic, and the horizontal transfer of symbiotic genes has allowed nodulation abilities to diffuse between different bacterial clades, making it difficult to trace the evolutionary history of rhizobial symbiosis to its origin (Young and Haukka 1996; Doyle 1998; Wernegreen and Riley 1999).
The symbiotic genes include nodulation genes (nod genes) and nitrogen-fixation genes (nif,
fix, and fdx genes), which are usually located in symbiotic regions within symbiotic plasmids
(pSym) or genomic regions called symbiosis islands. The nodulation genes synthesize host-specific lipochito-oligosaccharides (Nod factors), the structures of which are important in the recognition of rhizobia by legumes. Although symbiotic regions are very heterogeneous in size
and gene content among rhizobia, several nod genes occur in every rhizobial species (with one
known exception, the Aeschynomene-nodulating Bradyrhizobium strains; Giraud et al. 2007).
These common nod genes were first described as the nodA, nodB, and nodC genes and are
involved in the biosynthesis of the core of the Nod factors. Later, the nodI and nodJ genes, whose
sequences are similar to the ATPase and permease components of an ABC-type transport system,
were identified and included among the common nod genes (van Rhijn and Vanderleyden 1995;
Suominen et al. 2001). The common nod genes form an operon in most α-rhizobial strains and
by guest on July 13, 2016
http://mbe.oxfordjournals.org/
bacterial genomes, the nodI and nodJ genes belong to the DRA (drug and antibiotic resistance) family and were assumed to be involved in Nod factor secretion (Davidson et al. 2008). The symbiotic regions often have a specific GC content lower than that of the entire bacterial genome, which is thought to result from recent horizontal gene transfer events from host genomes of low GC content (Lawrence and Ochman 1997; Amadou et al. 2008; Juhas et al. 2009; Azad and Lawrence 2011).
Several studies have hypothesized a unique origin of common nod genes and their horizontal
transfer from α- to β-proteobacteria; i.e., the α-rhizobial origin of nodulation genes (Moulin et al.
2001; Chen et al. 2003; Kock et al. 2004; Balachander et al. 2007; Amadou et al. 2008; Sprent 2009; Angus and Hirsch 2010; Bontemps et al. 2010; Martinez-Romero et al. 2010; Rogel et al.
2011). The first genomic study of a β-rhizobium, Cupriavidus taiwanensis, found that this
bacterium had the most minimal symbiotic genome structure among rhizobia, suggesting thatβ
-rhizobia evolved relatively recently (Amadou et al. 2008). Subsequent phylogenetic studies have
revised this view, suggesting a much earlier β-rhizobial origin: the monophyly and large
divergences of the symbiotic genes of β-rhizobia indicate horizontal gene transfers between α-
and β-rhizobia many millions of years ago (Chen et al. 2005; Bontemps et al. 2010; Angus and
Hirsh 2010; Gyaneshwar et al. 2011; Mishra et al. 2012). Bontemps et al. (2010) postulated that
the symbiotic nodulation of Burkholderia is old and stable but that the horizontal gene transfer of
nodulation genes likely occurred from α- to β-proteobacteria. Martinez-Romero et al. (2010)
observed that the nodA sequences in Bradyrhizobium had larger diversity scores than those of
other taxa, suggesting that the bradyrhizobial (i.e., α-proteobacterial) nod genes are ancestral.
In this work, the phylogenetic profiling of abundant bacterial genomic data revealed that the
nodI and nodJ genes provide novel insights into the origin and evolution of rhizobia. We isolated
novel strains of β-rhizobia from the legume Mimosa pudica and conducted the first exhaustive
phylogenetic analysis using the homologous sequences of the nodIJ genes of these strains and
strains whose genomes were sequenced. Surprisingly, the results strongly suggest a β
-proteobacterial origin of these common nodulation genes. Furthermore, the duplication origin of nodulation genes suggests that the low GC content of symbiotic regions has a functional role.
by guest on July 13, 2016
http://mbe.oxfordjournals.org/
Results
Phylogenetic profiling of α- and β-proteobacterial genomes lead to evolutionary analysis of nodIJ genes.
We searched for genes that are preferentially associated with rhizobia and that possibly originated
from α- or β-proteobacteria. For the search, we examined α- and β-proteobacteria whose
genomes were sequenced using a large-scale phylogenetic profiling analysis. Assuming that such gene clusters would have been present in the last rhizobial ancestor with nodulation functions, we focused on the gene clusters (or gene families) that are specifically and broadly distributed across
the genomes of α- and β-rhizobial species. We searched for these gene clusters by phylogenetic
profiling using the microbial comparative genomics systems MBGD and IMG (supplementary tables S1 and S2, Supplementary Material online; Uchiyama 2005, Markowitz et al. 2009). Gene
clusters containing each of the following genes were found: fixA, fixB, fixC, nifB, nifD, nifE, nifH,
nifK, nifN, and nodA, as genes preferentially associated with rhizobia only, and fdxN, metQ, nodI,
and nodJ, as associated with rhizobia and β-proteobacteria (including non-nodulating species).
fixA, fixC, nifB, nifD, nifE, nifK, nifN, and nodA had previously been identified as genes overrepresented in rhizobia (Amadou et al. 2008).
We then investigated their phylogenetic trees in detail, focusing on genes that (i) displayed
signs of horizontal gene transfer among α- and β- rhizobia and (ii) had paralogous genes in the α-
or β- proteobacterial genomes (supplementary fig. S1 and tables S1 and S2, Supplementary
Material online). The first criterion was based on the widely accepted hypothesis of the unique
origin and horizontal transfer of rhizobial symbioticgenes (Moulin et al. 2001; Chen et al. 2003,
2005; Kock et al. 2004; Balachander et al. 2007; Amadou et al. 2008; Bontemps et al. 2010; Martinez-Romero et al. 2010; Rogel et al. 2011; Tian et al. 2012) and ensured that the genes both existed in the last rhizobial ancestor and also had functions related to nodulation at that time. The second criterion was adopted because only the use of paralogs can allow rooting the trees and the deduction of the direction of transfer events of focal genes (Barton et al. 2007). In fact, previous studies have postulated that the direction of the transfer and the origin of genes cannot be reliably proven using the phylogeny of symbiotic genes without paralogs (Chen et al. 2005; Young 2005).
by guest on July 13, 2016
http://mbe.oxfordjournals.org/
In addition, gene duplication is an important mechanism for the origin of new gene functions (Hughes 1994, 2002; Barton et al. 2007).
Among the genes examined, nodI and nodJ met both criteria (fig. 1 and supplementary figs.
S2-S4, Supplementary Material online), whereas nodA met only the first criterion. All the nodA
genes identified by the phylogenetic profiling analysis were isolated from rhizobia, and the
paralogous genes of nodA were not clearly identified in the α- and β-proteobacterial genomes
(supplementary fig. S1A, Supplementary Material online). fdxN, metQ, nifD, nifE, nifH, nifK, and
nifN met only the second criterion, and we could not find any correlation between the duplication
and nodulation events in the phylogeny of these genes (supplementary fig. S1, Supplementary
Material online). The evolutionary histories of fdxN, nifD, nifE, nifH, nifK, nifN, and other
nitrogen fixation genes (fixA, fixB, fixC, and nifB) were very complex, spanning non-nodulating
but nitrogen-fixing bacteria (supplementary figs. S1B-S1H, Supplementary Material online), as described in previous studies (Chen et al. 2003; Verma et al. 2004; Young 2005; Hartmann and
Barnum 2010; Gyaneshwar et al. 2011). The phylogeny of the metQ gene was largely
characterized by vertical transmission (supplementary fig. S1I, Supplementary Material online).
In contrast to the other genes, nodI and nodJ showed signs of horizontal transfer among rhizobial
species, and, more importantly, both genes contained paralogous genes (DRA-ATPase for nodI
and DRA-permease for nodJ; fig. 1 and supplementary figs. S2-S4, Supplementary Material
online). Therefore, nodI and nodJ (nodIJ in the nod operon) were adopted for the present study;
here, we designate the nodIJ homologous genes that may not have a function in nodulation as
DRA-ATPase/permease genes.
nodIJ genes have a duplicational origin in an ancient β-, not α-, rhizobial genome.
We estimated the evolutionary histories of nodIJ and DRA-ATPase/permease (fig. 1). We
searched against 1,373 fully sequenced genomes in the KEGG Sequence Similarity Database
(SSDB) to obtain any sequence similar to the nodIJ genes as described in Materials and Methods
(supplementary table S3, Supplementary Material online; Kanehisa et al. 2012). We excluded
nodIJ of Azorhizobium caulinodans ORS571 in the analyses, because it exhibited exceptional sequence divergence (see also Materials and Methods and supplementary fig. S5, Supplementary
Material online). Because nodIJ (and DRA-ATPase/permease) always constituted an operon in
by guest on July 13, 2016
http://mbe.oxfordjournals.org/
each genome analyzed (supplementary table S4, Supplementary Material online), we concatenated the two genes for our phylogenetic analysis. Because the independent phylogenetic
trees of nodI and nodJ homologs had approximately the same topology as the nodIJ tree, we
assumed that both genes had the same evolutionary history and could be analyzed as one unit
(supplementary figs. S2 and S3, Supplementary Material online). As we discuss later, the nodIJ
sequences showed divergence in nucleotide composition from the DRA-ATPase/permease sequences. Thus, we used the following methods for the phylogenetic analysis: nonhomogeneous models, amino acid recording, and LogDet transformation (Loomis and Smith 1990; Lockhart et al. 1994; Galtier and Gouy 1998; Sheffield et al. 2009). All the methods provided similar tree topologies (fig. 1 and supplementary figs. S2-S4, Supplementary Material online, see also Materials and Methods). The trees in supplementary figure S6 (Supplementary Material online) and on the left side in figure 2 are reference bacterial species trees constructed using 25 "core" gene sequences, as proposed by Battistuzzi and Hedges (2009).
The phylogenetic congruence between gene phylogenies was analyzed for the following two pairs: DRA-ATPase/permease genes vs. core genes of species of the Burkholderiaceae family, and nodulation genes vs. core genes of rhizobia. The topology- and distance-based methods of cophylogenetic analysis using the Jane2 (Conow et al. 2010) and CopyCat programs
(Meier-Kolthoff et al. 2007) rejected the null hypotheses of random relationships for both pairs (p<0.01
for every test). On the other hand, the data-based method implemented in the CONSEL program did not reject the hypothesis of topology congruence for the former pair, but rejected one for the
latter pair (p<0.001, the upper half of fig. 2; Shimodaira and Hasegawa 2001). These results
indicate that the DRA-ATPase/permease genes were vertically transmitted within the Burkholderiaceae family (supplementary table S5, Supplementary Material online) while the nodulation genes were horizontally transferred (Wernegreen and Riley 1999; Moulin et al. 2004).
The most important finding was that Burkholderia sp. CCGE1002, Burkholderia phymatum,
and C. taiwanensis all contained both sets of nodIJ and DRA-ATPase/permease in their genomes
(triangles, squares, and stars, respectively, in fig. 1 and figs. S2-S4, Supplementary Material
online). DRA-ATPase/permeases do not exist in the α-rhizobial clades, strongly suggesting that gene duplications occurred in an ancestral genome of the Burkholderiaceae family. To confirm
this, we created a larger dataset of nodIJ homologs using KEGG Similarity Database (146
by guest on July 13, 2016
http://mbe.oxfordjournals.org/
homologous sequences of bacteria whose genomes were sequenced, as described in Materials and
Methods. This dataset also included sequences of novel β-rhizobial strains isolated in this study
from the root nodules of M. pudica in Japan. The newly isolated strains, with 16S rDNA
sequences highly similar to those of C. taiwanensis, contained both nodIJ and
DRA-ATPase/permease genes, as did other β-rhizobia (fig. 1 and supplementary table S4 and figs.
S2-S4, Supplementary Material online). A distance trees created from this larger dataset (supplementary fig. S4, Supplementary Material online) strongly supported a similar phylogenetic topology as indicated in figure 3. We also conducted maximum likelihood (ML) and Bayesian inference (BI) analyses on a smaller dataset (69 sequences; fig. 1 and supplementary figs. S2 and S3, Supplementary Material online) and confirmed that every
phylogeny supports the topology in figure 3 and the scenario of nodIJ gene duplication in the
clade of the family Burkholderiaceae.
It has been suggested that more than one independent transfer of the nodA gene occurred
between the α- and β-rhizobia, as the sequences of β-rhizobia fall into two separate clades:
Cupriavidus taiwanensis and Burkholderia phymatum in one clade and B. tuberum STM678 in
the other (Chen et al. 2003, 2005; Garau et al. 2009). Therefore, we examined whether the nodIJ
genes of B. tuberum STM678 show an evolutionary history that is distinct from the histories of
other β-rhizobial nodIJ genes. In addition to a reported partial nodI sequence
(DDBJ/EMBL/GenBank accession number AJ306730, Moulin et al. 2001), we retrieved the
full-length nodIJ sequence from the draft genomic data in the Sequence Read Archive (SRA;
accession number SRP000062) and reconstructed the phylogenetic trees. Both phylogenies
showed that the B. tuberum STM678 genes were included in the clade of β-rhizobial nodIJ
(supplementary fig. S7), supporting the single β-rhizobial origin of the nodIJ genes. The
discrepancies between the nodA and nodIJ phylogenies in B. tuberum STM678 suggest a
complex evolutionary history for this rhizobial species.
Although our analysis showed the β-rhizobial origin of the nodIJ genes, the origin of the
nodulation itself requires further investigation. Thus, we also analyzed the phylogenies of all the
nod genes in the symbiotic region in the plasmid of C. taiwanensis (nodDBCHASUQ; Amadou et
al. 2008), but we could not find their paralogous genes that may clarify the duplicational origins.
Then, we investigated if either of the α-rhizobial and β-rhizobial clade is basal to the other in
their trees, because in such cases they may still provide information on the origins of the nod
by guest on July 13, 2016
http://mbe.oxfordjournals.org/
genes. In all the trees that we reconstructed, however, the α-rhizobial and β-rhizobial clades
constituted sister groups to each other (supplementary fig. S8, Supplementary Material online).
Therefore, the phylogenies of these other nod genes neither ruled out nor supported the β
-rhizobial origin of the nodulation.
Evolutionary rates of nodIJ genes increased after gene duplication.
The tree of the nodIJ and DRA-ATPase/permease genes showed an elongated branch of the
ancestor of the rhizobial nodIJ clade (fig. 2 and supplementary figs. S2-S4, Supplementary
Material online). A likelihood ratio test (LRT) rejected the hypothesis of a global clock model
against an alternative hypothesis of a 2-local-clocks model (i.e., the nodIJ lineages had a different
rate from the other DRA-ATPase/permease lineages) using the nucleotide substitution model (2
lnL=349.25, P<10-3
). Figure 4 gives the estimates of the evolutionary rates, assuming a relaxed clock based on the Bayesian method implemented in the MCMCTREE program (Yang 2007).
The evolutionary rates of the nodIJ and DRA-ATPase/permease genes toward B. phymatum
indicate an increase in the rates following gene duplication in the nodIJ lineage (blue and red
branches in fig. 4). Other branches for the nodIJ genes also showed increases in the substitution
rates (supplementary fig. S9, Supplementary Material online). Under the 2-local-clocks model in
the BASEML program (Yang 2007), the estimated rate for the branch group of the nodIJ lineage
(5.30 x 10-9 substitution/site/year) was 6.7 times the estimated rate for the branch group of the
DRA-ATPase/permease lineages (0.785 x 10-9
substitution/site/year).
We tested whether this elevation of the substitution rates reflects changes in the
nonsynonymous/synonymous rates (ω= dn/ds), as studied previously (Arbiza et al. 2006; Neiman
et al. 2010; Zhong et al. 2009). We applied a branch model in the CODEML program (Yang
2007), allowing different ratios of ω among the lineages in a tree. An LRT did not show
significant likelihood differences between the two-ratio (different rates between the nodIJ and
DRA-ATPase/permease lineages) and one-ratio (a single rate) models (table 1). The estimated ω
of the nodIJ lineage (0.10313) was similar to that of the DRA-ATPase/permease lineages
(0.09245). The results of the analysis using the clade model and the branch-site model did not
differ (table 1). In summary, the evolutionary rates of nodIJ were accelerated relative to the
by guest on July 13, 2016
http://mbe.oxfordjournals.org/
evolutionary rates of the DRA-ATPase/permease genes. This elevation is consistent with their
novel function in nodulation, but was not attributed to the change in ω.
GC content of the nodIJ genes is related to an accelerated evolutionary rate.
We assembled the nodIJ and DRA-ATPase/permease sequences from the sequenced genomes
and analyzed their nucleotide compositions. Whereas the DRA-ATPase/permease sequences had
GC contents approaching the genomic average, the GC contents of the nodIJ sequences were
reduced (figs. 1 and 5A). Accordingly, the synonymous codon usages were almost the same
between the DRA-ATPase/permease and core genes; in contrast, those of the nodIJ genes and the
genes in the symbiotic regions were similar (fig. 5B). A disparity index test confirmed that the
nodIJ genes have a pattern of base composition that is distinct from those of the
DRA-ATPase/permease genes at their 3rd
codon positions and for the total nucleotides (supplementary table S6, Supplementary Material online).
To examine the effects of GC content on the changes in the evolutionary rate of nodIJ, we
applied a nonhomogeneous model implemented in BppML (Dutheil and Boussau 2008) to
compare the GC content (θ) and transitions/transversions ratio (κ). Table 2 shows that the
homogeneous model (a single θ and κ) was rejected in favor of the nonhomogeneous models
(separate θs and/or κs between the nodIJ and DRA-ATPase/permease lineages). Both the NH4
model (one θ and two κs) and NH2 model (two θs and one κ) fit the data significantly better than
the homogeneous model (one θ and one κ) using LRTs. The NH2 model was not strongly
rejected when compared with the NH3 model (two θs and two κs) by LRT, and the Bayesian
information criterion (BIC) favored the NH2 model over the NH3 model (table 2). Therefore, the
differences in κs between the nodIJ and DRA-ATPase/permease lineages may be explained by
the change in θs. These results suggest that the elevation of the substitution rate in the nodIJ
lineage is unrelated to differences in either the ω or κ ratios but may reflect a change in θ.
by guest on July 13, 2016
http://mbe.oxfordjournals.org/
Discussion
We investigated the origin and evolution of the common nodulation genes nodIJ by studying
gene duplication and horizontal transfer. In general, it is difficult to discern the order and direction of horizontal gene transfer events and the origin of a gene, even with a resolved gene
phylogeny (Chen et al. 2005; Kunin et al. 2005; Young 2005). However, the nodIJ genes are
exceptional in this respect: our exhaustive phylogenetic analysis provided clear evidence of duplication in the lineage of the family Burkholderiaceae following the divergence of
Burkholderia and Cupriavidus. Gene duplication and the ensuing modification of genes are thought to play an important role in the origin of the functional novelty of genes (Hughes 1994, 2002; Barton et al. 2007). The significantly elevated evolutionary rate (fig. 4) and horizontal gene
transfers in the nodIJ lineage of nodulating α- and β- proteobacteria (figs. 1 and 2) suggest that
sequence evolution following the duplication led to their novel functions in nodulation. The
nodIJ sequence would have been originated from gene duplication in the lineage of β
-proteobacteria, followed by transfer to α-proteobacteria.
Previous studies have discussed the origin of rhizobia (Doyle 1998; Young and Haukka 1996;
Angus and Hirsh 2010; Bontemps et al. 2010; Martinez-Romero et al. 2010). Doyle (1998)
proposed that a phylogenetic analysis of nodulation genes rather than of housekeeping genes is appropriate for addressing the origin of nodulation. In contrast, other studies have argued the importance of the diversity and distribution of rhizobial species. Bontemps et al. (2010)
emphasized that the majority of rhizobia characterized to date are α-proteobacteria and that α
-rhizobia exhibit a greater taxonomic diversity, greater nodulation gene diversity, and far greater range of legume host taxa. In addition, these authors noted that the diversity of nodulation gene
sequences within α-rhizobia is greater than that between α- and β-rhizobia and suggested that
horizontal transfer is more likely to have occurred from α- to β-proteobacteria (Bontemps et al.
2010). Several studies have also analyzed the diversity of nod genes sequences, proposing the
hypothesis that nod genes evolved in Bradyrhizobium and then transferred to other genera
(Martinez-Romero et al. 2010; Ormeno-Orrillo et al. 2013). A study of the genome sequence of C.
taiwanensis suggested that this β-rhizobium has evolved recently because it exhibits the tightest
by guest on July 13, 2016
http://mbe.oxfordjournals.org/
clustering of symbiotic genes, a few nod-box sequences, and very few different Nod factors (Amadou et al. 2008).
A few studies have discussed the possibility of transfer from β- to α-proteobacteria, though
the ranges of organisms and genes in these studies are limited (Young 2005; Gyaneshwar et al.
2011). Kock (2004) isolated many Burkholderia strains and a few strains of Bradyrhizobium and
Rhizobium from Cyclopia spp. Because these have very similar nodA sequences, she suggested
that Burkholderia isolates acquired the gene through horizontal transfer from Bradyrhizobium or
Rhizobium (Kock 2004). However, a review by Gyaneshwar et al. (2011) proposed that
Burkhorderia was the donor because of its preponderance as a Cyclopia symbiont. Young (2005)
postulated that Bradyrhizobium acquired nif genes from Burkholderia (Chen et al. 2005,
Gyaneshwar et al. 2011), yet a recent study showed that the nif genes of Bradyrhizobium and
Burkholderia belong to the α-proteobacteria lineage (Hartmann and Barnum 2010).
The results of our phylogenetic analyses are consistent with those of some other studies of
rhizobial evolution. Andam et al. (2007) suggested that Cupriavidus strains acquired symbiotic
genes from Burkholderia rather than from α-proteobacteria. In the present study, the Cupriavidus
nodIJ genes were most similar to the Burkholderia genes (fig. 1). In addition, the nodIJ tree was similar but not congruent with the species tree of rhizobia (fig. 2, Supplementary Material Online), consistent with previous studies suggesting that the spread and maintenance of symbiotic genes occurred thorough vertical transmission but that lateral transfer also played a significant role (Wernegreen and Riley 1999; Moulin et al. 2004; Chen et al. 2008; Han et al. 2010; Gerding et al. 2012). As the present study uncovered the unexpected complexity of the evolution of nodulation, we argue that further studies should be designed to clarify its origin.
It is well known that duplicated genes can have relaxed functional constraints, allowing one
of the genes to acquire novel functions under positive selection (Barton et al. 2007). Such
positive selections are usually observed as long phylogenetic branches that are characterized by accelerated nonsynonymous substitution rates (Arbiza et al. 2006; Neiman et al. 2010; Zhong et
al. 2009). In the present analysis, the nodIJ lineages actually showed long branches, but they
were characterized by lower GC contents than the other DRA-ATPase/permease lineages (tables
by guest on July 13, 2016
http://mbe.oxfordjournals.org/
1 and 2). It is often supposed that genomic regions of specific GC content are due to recent horizontal gene transfer events, reflecting the specific base composition of their donor genomes (Lawrence and Ochman 1997; Azad and Lawrence 2011). However, several studies challenge this view. For example, Raghavan et al. (2012) indicated that selection operates on the base composition of individual coding regions of bacterial genes. These authors demonstrated that the GC content of the synonymous sites of highly expressed genes are essentially higher than those of noncoding regions and that the overall base composition of these genes affects bacterial fitness: strains expressing equivalent genes of higher GC content have significantly faster doubling times. Such selection may operate on housekeeping genes that are typically transcribed
constitutively and highly but not on the nod genes in a symbiotic region that are expressed only
in response to biochemical signals from host legumes. In fact, there is high divergence in GC content and synonymous codon usage between the genes in symbiotic regions and other regions (fig. 5). Juhas et al. (2009) suggested that histone-like nucleotide-structuring proteins prefer to bind to regions of low GC content in bacterial genomes and may regulate gene expression in
genomic islands. Notably, the nodABCIJ operon has a special element (nod box) in its promoter
and is regulated in a characteristic manner. Bohlin et al. (2010) showed that intra-genomic GC content heterogeneity is linked with bacterial phenotypes, particularly the oxygen requirement.
The oxygen supply is suggested to act as a mechanism for sanctions against cheating α-rhizobia
by host legumes (Kiers et al. 2003), and the oxygen concentration influences the nitrogenase
activity of rhizobia in nodules (Doyle 1998). Moreover, according to Young et al. (2006), the nod
genes consistently have a lower GC content than any rhizobia, and no nodulating bacteria with a low genomic GC content have yet been found. Because our results suggest a duplicational origin of nodIJ in the β-proteobacterial genome, followed by accelerated evolution, we propose the following scenario for the origin and evolution of these common nodulation genes: after the
duplication of the DRA-ATPase/permease genes in β-proteobacteria, ancestral nodIJ genes were
translocated near other nodulation genes, thus creating a symbiosis region; their GC content then decreased with their functional activity as nodulation genes (fig. 1). Our findings revise the current views on the origin of rhizobial symbiosis.
by guest on July 13, 2016
http://mbe.oxfordjournals.org/
Materials and Methods
Phylogenetic profiling analysis.
We searched for genes that are preferentially associated with rhizobia and that possibly originated
from α- or β-proteobacteria using two tools for phylogenetic profiling analysis. We first
employed the “Phylogenetic Pattern Analysis” tool, which is available as part of the MBGD interface (Uchiyama 2005). Three occurrence patterns preferentially associated with (i) rhizobia,
(ii) rhizobia and α-proteobacteria, and (iii) rhizobia and β-proteobacteria were analyzed with the
species in all 6 families of α- and β-proteobacteria that include at least one species of nodulating
bacteria. When we designated the 6 families, the MBGD system chose 55 representative species automatically. The gene clusters with correlation coefficients less than 0.15 and including at least
one of each α- and β-rhizobia species were selected (supplementary table S1, Supplementary
Material online). The genomes of the same 55 species were also analyzed with the “Phylogenetic Profiler for Single Genes” tool in IMG system version 3.5 (Markowitz et al. 2009) using the following options: max. E-value, 1e-5; min. percent identity, 50; algorithm, by taxon percent with homologs; minimum taxon percent with homologs, 80; and minimum taxon percent without homologs, 70 (supplementary table S2, Supplementary Material online). For the analysis of the phylogenetic relationships of the sequences in the gene clusters, we used the clustering tree option in MBGD, the blast tree view option in NCBI, and the KEGG orthology.
Isolation of β-rhizobial strains, DNA extraction, and sequencing.
New β-rhizobial strains were isolated from the mature nodules of Mimosa pudica from Okinawa
Island, Japan. A total of 5 strains were obtained from 5 independent nodules of plants in the field at the University of Ryukyus. Isolation was conducted on yeast-mannitol (YM) or tryptone-yeast extract (TY) media, as described previously (Aoki et al. 2010). The reference strains and new
isolates are listed in supplementary table S4 (Supplementary Material online). DNA extraction
and sequencing were performed using specific primers (supplementary table S7, Supplementary Material online), as described previously (Clark et al. 2002; Aoki et al. 2010). The 3-step PCR
protocol, a common method, was effective for isolating the nodIJ and 16S rDNA genes, but the
by guest on July 13, 2016
http://mbe.oxfordjournals.org/
two-step method was better for isolating the DRA-ATPase/permease sequences. PCR conditions
for the nodIJ and 16S rDNA were as follows: preheating at 94 C for 3 min, 40 cycles of
denaturing at 98 C for 10 s, annealing at 52 C-67 C for 30 s, and extension at 72 C, and a final
extension at 72 C for 10 min. The complete ORF sequence of the nodIJ genes was amplified
using the primers nodCpRALTA1154 and nodHpRALTA66c (supplementary table S7, Supplementary Material online). The partial sequence of the 16S rDNA was isolated using the primers 27f and 1525r. The two-step PCR conditions were as follows: preheating at 94 C for 3 min, 40 cycles of denaturing at 98 C for 10 s, annealing and extension at 68 C, and a final extension at 68 C for 10 min. The complete sequence of the DRA-ATPase/permease operon was amplified using the primer pairs 108/RALTA2104-964,
RALTA2100-133/RALTA2104-964, RALTA2100-108/RALTA2104-1018 or RALTA2100-
108/RALTA2104-1018. Each reaction was performed using 0.5 μl DNA extract with 0.1 μl
LA-Taq, 1 μl LA-Taq buffer, 2.5 mM MgCl2, 200 μM each dNTP, and 0.5 μM each primer in a 10 μl
final reaction volume. The amplified DNA fragments were cloned into the pGEM-T (Easy) vector (Promega, USA) or purified with ExoSAP-IT (USB, USA). The Genome Lab Dye Terminator Cycle Sequencing with Quick Start Kit and CEQ8000 Genetic Analysis System (Beckman Coulter, Inc., USA) were used for sequencing. Although the 5 strains isolated from Japan showed different RAPD patterns, the sequence analysis grouped them into two categories.
Data assembly and sequence alignment.
In addition to the sequences of the new strains, sequences similar to the nodIJ genes were
collected with similarity searches against 1,373 fully sequenced genomes using the KEGG Sequence Similarity Database (SSDB) on January 7, 2011 (Kanehisa et al. 2012). The amino acid
sequences similar to the nodI gene of Mesorhizobium loti MAFF303099 (mlr6164) were selected
using the options of bidirectional best (best-best), forward-best, and reverse-best hits; sequences with Smith-Waterman (SW) scores greater than 100 were collected (supplementary table S3, Supplementary Material online). Those containing large indels or that were highly diverged were
omitted from the dataset. The dataset contained nine archaeal sequences of the nodI homolog
(DRA-ATPase). One of the archaeal genes, Mbur1717 of Methanococcoides burtonii DSM6242,
by guest on July 13, 2016
http://mbe.oxfordjournals.org/
sequences for the dataset. The sequences of eight α-rhizobial strains for which the sequence of
nodIJ genes had been studied (but not their entire genomes) were added to the dataset. Lastly,
147 genes were selected to compose the dataset of the nodI homologous genes; this dataset
contained all the sequences of K09695 (130 genes) in the KEGG Orthology Database, though
there is no sequence of non-nodulating α-proteobacteria in the larger dataset and K09695
(supplementary table S3, Supplementary Material online). We also referenced other genome databases and confirmed that our dataset is largely consistent with the gene clusters of these databases (COG1131 in COG and eggNOG3.0, HBG758042 in HOGENOM, and ortholog
cluster 162712 in IMG3.4). Furthermore, we collected sequences similar to nodI using a BLAST
search of the NCBI database on April 1, 2013, and confirmed that the phylogenies in figure 1 and supplementary figures S2-S4 (Supplementary Material online) contained appropriate OTUs for the analysis (data not shown). The dataset contained the genes of 18 nodulating bacteria but
excluded the nodI sequence of A. caulinodans ORS571 (AZC_3813) because of its high
divergence. The SSDB search did not select AZC_3813 as a sequence with an SW score larger than 100. This result was confirmed by ortholog databases. The KEGG Orthology Database did not include AZC_3813 in any orthologous groups. The OMA database divided AZC_3813
(OMA group 160863) from the group of 13 members of rhizobial nodI genes (OMA group
74164). The HOGENOM Database omitted AZC_3813 from the gene family HBG758042 that
contains all the other nodI sequences of rhizobia. AZC_3813 is included in HBG334846.
eggNOG3.0 included it in COG0842 but not in COG1131. The nodJ gene of A. caulinodans
(AZC_3812) also has a divergent sequence: previous studies reported that the sequences of the
nod genes of A. caulinodans ORS571 show high divergence and can cause long branch attraction
artifacts (Chen et al. 2003; Moulin et al. 2004; Tian et al. 2012). The addition of the A.
caulinodans sequence supported the results of previous studies (supplementary figs. S5 and S8,
Supplementary Material online). As the outgroup sequence, we used the four sequences of the genes in the ortholog group K09687 of KEGG Orthology Database that is the phylogenetic sister group of K09695 (supplementary table S3, Supplementary Material online). Both searches using
mlr6164 and Mbur1717 as queries selected the four genes as highly similar to nodI genes.
Sequences similar to nodJ of M. loti (mlr6166) were also collected. A search using the
archaeal sequence of Mbur1716 (DRA-permease) of M. burtonii as the query did not update the
information in the dataset resulting from mlr6166. The dataset of orthologous genes to nodJ is
by guest on July 13, 2016
http://mbe.oxfordjournals.org/
similar to K09694 in KEGG, COG0842 in COG and eggNOG 2.0, and ortholog cluster 5816 in
IMG3.4. Acidovorax ebreus TPSY and Bordetella parapertussis 12822 lack the nodI and nodJ
homologous sequence, respectively (supplementary table S4, Supplementary Material online).
Seventeen species of actinobacteria have two nodJ-similar sequences tandemly. As a result, 141
strains harbored both nodI and nodJ homologous sequences. Five strains of β-proteobacteria have
both nodIJ and DRA-ATPase/permease sequences paralogously. The dataset of 146 sequences,
designated in this study as the “larger dataset,” was used for the phylogenetic analysis with the distance method. For the phylogenetic analysis of the genes identified by the phylogenetic
profiling analysis (fdxN, fixABC, metQ, nifBDEHKN, and nodA), we used the amino acid
sequences of all the genes in each orthologous group identified by the MBGD system (supplementary table S1, Supplementary Material online). The phylogenies of all the nodulation
genes of C. taiwanensis (nodCBCHASUQ) were essentially analyzed using the genes in the
orthologous groups of KEGG Orthology Database because the larger nodIJ gene datasets were
appreciably similar to K09695 and K09694. For the analysis of the nodA, nodB, nodC, and nodD
genes, we used all the sequences in orthologous groups K14658, K14659, K14666, and K14657, respectively. The sequences that belonged to K00860 and showed SW scores greater than 640
were used for the analysis of nodQ-like genes; the sequences that belong to K00612 or K04128
and show SW scores greater than 700 were used for the analysis of nodU. The sequences similar
to nodH and nodS of C. taiwanensis were collected using an SSDB search with an SW score. The
protein-coding sequences of 25 core genes from the 62 species were collected for the analysis of the phylogeny of bacterial strains following Battistuzzi and Hedges (2009).
The amino-acid sequences were aligned with the MAFFT program (Katoh et al. 2005) using
the strategies L-INS-i (nod and 25 core genes) and Q-INS-i (16S rDNA), followed by manual
adjustment. The nucleotide sequences were then forced to fit the amino acid alignments using the PAL2NAL program (Suyama et al. 2006). The sequence alignments are deposited at the Dryad Digital repository (http://datadryad.org/). The protein-coding sequences for the analysis of the phylogeny of the 25 core genes were concatenated following the method of Battistuzzi and Hedges (2009). To delete the poorly aligned sites of 25 core genes, we used Gblocks 0.91b (Castresana 2009) and Slowfaster (Kostka et al. 2008). GBlocks 0.91b was applied to the concatenated dataset, with the minimum length of a block parameter set to two. SlowFaster was
by guest on July 13, 2016
http://mbe.oxfordjournals.org/
phylogenies were constructed with MEGA5 software (ML tree using JTT+uniform rate and NJ tree using JTT+Gamma with calculated alpha value), and comparisons of the bootstrap support for the monophyly of the classes of bacteria were conducted following Battistuzzi and Hedges (2009). The 97.5% SlowFaster stringency was finally selected. A final data set of 7,911 amino acid sites was used for the phylogenetic analysis of the 25 core genes.
Analysis of base composition and codon usage.
We employed three analyses to determine whether the nodIJ genes had a different pattern of
nucleotide sequences from the DRA-ATPase/permease genes. The GC content of each sequence was estimated using MEGA5 (Tamura et al. 2011). The whole-genome GC content was obtained from the NCBI or JGI homepages. MEGA5 was also used to analyze the departure from base-compositional stationarity using the disparity index test (ID-test; Kumar and Gadagkar 2001). The homogeneity of the substitution pattern was tested using the Monte Carlo method with 1,000 replications. We calculated the probability of rejecting the null hypothesis that sequences have evolved with the same pattern of substitution at p < 0.01 (Chiari et al. 2009). Supplementary table S6 (Supplementary Material online) shows the proportion of genes for which the homogeneity assumption of disparity index test was rejected. An analysis of the variation of codon usage was undertaken using the program GCUA 1.1 (McInerney 1998). A correspondence analysis of the relative synonymous codon usage values was performed to determine the major source of codon usage variation.
Phylogenetic analysis.
We explored the following previously used methods to analyze the dataset of nucleotide sequences with the base compositional heterogeneity: LogDet transformation (Lockhart et al. 1994), nonhomogeneous model (Galtier and Gouy 1998), and amino acid recording (Loomis and Smith 1990; Sheffield et al. 2009). The phylogenies are deposited at the Dryad Digital Repository (http://datadryad.org/). For the primary analysis, we examined the larger data set of 146
sequences of the nodIJ and DRA-ATPase/permease genes using the distance method. A
neighbor-joining (NJ) analysis after LogDet transformation was performed with the PAUP4b10
by guest on July 13, 2016
http://mbe.oxfordjournals.org/
program (Swofford 1998). The proportion of invariable sites was estimated using PAUP and Modeltest (Posada and Crandall 1998) under the GTR model. The larger dataset was also analyzed with the NJ method after amino acid recording with the JTT model using MEGA5. Our
phylogenic analysis divided the nodIJ and DRA-ATPase/permease genes into 17 groups
(supplementary fig. S4 and table S6, Supplementary Material online). Both trees of the
independent nodI and nodJ genes supported the results of the grouping (data not shown). As the
genes in groups VI, VIII, X, and XIV have large deletions and/or diverged sequences (supplementary table S6, Supplementary Material online), we did not use them for the ensuing analysis. For further phylogenetic analysis with the maximum likelihood (ML) and Bayesian inference (BI) methods, 69 sequences of 64 strains (two novel isolates and 62 organisms whose genomes were sequenced) were selected as a “smaller dataset.” The ML analysis of the nucleotide sequences was conducted with the nonhomogeneous model implemented in BppML in the Bio++ Program Suite package version 0.3.1 (Dutheil and Boussau 2008), which allows non-stationary and non-homogeneity in a dataset. We used the T92 model of sequence evolution (Tamura 1992) with a gamma (4 classes) distribution for rate evolution and the following options: optimization.topology=yes and bootstrap.verbose=yes. To obtain a reasonable starting tree for the ML search, we reconstructed an NJ tree with the LogDet model using PHYLO_WIN (Galtier et al. 1996). BI analysis after amino acid recording was performed with MrBayes 3.1.2 using a WAG+I+G model, as described previously (Aoki et al. 2010). The analysis was replicated twice and sampled every 100th generation for one million generations. We confirmed that all the values of effective sample size are greater than 100 with Tracer v1.5 (Drummond AJ,
Rambaut 2007; Lukoschek et al. 2012). An ML analysis with the amino acid sequences of nodIJ
homologs was conducted using RAxML version 7.2.8 with the PROTGAMMAIWAG model
(Stamatakis et al. 2005). Because the independent phylogenies of the nodI and nodJ genes in the
clade of the Burkholderiaceae family had the same topology as the nodIJ tree, we used the
concatenated sequence of this family for the following cophylogenetic analysis.
For the phylogenetic analysis of the 25 core genes, a dataset of 7,911 amino acid sites of 62 species was analyzed using RAxML with the PROTGAMMAJTT model (Battistuzzi and Hedges
2009). The sequence of B. tuberum STM678 was analyzed with a partial sequence of the nodI
(556 bp) (DDBJ/EMBL/GenBank accession number AJ306730, Moulin et al. 2001) and nodIJ
by guest on July 13, 2016
http://mbe.oxfordjournals.org/
using CLC genomic workbench 6.0.2. De novo assembly was performed with the following options: mismatch cost, 3; insertion cost, 3; deletion cost, 3; length fraction, 1; and similarity, 1. The sequences were added to the smaller dataset and analyzed with ML method using the nonhomogeneous model (supplementary fig. S7, Supplementary Material online). To obtain a reasonable starting tree, we used an NJ tree with a LogDet model and an ML tree after amino acid recording. Analysis with the ML and BI methods was also examined after amino acid recording (data not shown). We used the NJ method with MEGA5 (JTT model) for the analysis
of the fdxN, fixABC, metQ, nifBDEHKN, and nodA genes in supplementary figure S1
(Supplementary Material online). For the phylogenies of the C. taiwanensis (nodDBCHASUQ)
genes, the 1st and 2nd codon positions were analyzed by the ML method using MEGA5 with the GTR+G model (supplementary fig. S8, Supplementary Material online).
Phylogenetic congruence test.
To test for cophylogeny, we employed three methods: a topology-based maximum co-divergence approach, a distance-based permutation procedure, and a data-based method. The program Jane2 (Conow et al. 2010) was used for the topology-based method, as used in previous studies (Jeong et al. 1999; Wernegreen and Riley 1999; Suominen et al. 2001). Using this method, one can test the null hypothesis that two phylogenies are randomly related by comparing the scores of optimal reconstructions with those of randomly obtained phylogenies though permutational procedures (Refregier et al. 2008). Tip mapping was conducted with random permutations and a sample size of 500, and the number of observed co-diversification events was tested against that obtained under randomization. The default setting of costs with a node-based cost model was used. The distance-based analysis of cophylogeny was performed with CopyCat (Meier-Kolthoff et al. 2007), a wrapper for ParaFit (Legendre et al. 2002), to assess the null hypothesis of a random association between the host and symbiont. Distance matrices of the host and symbiont (parasite) genes were created using MEGA5 with a gamma distribution and JTT model. The matrices and an association file representing the host-symbiont pairs were input into CopyCat. The program implements a global test and tests of individual links between the host and symbiont phylogenies. A total of 999 permutations were performed. For the data-based method, tests of alternative phylogenetic hypotheses between the gene datasets were conducted in an ML framework using
by guest on July 13, 2016
http://mbe.oxfordjournals.org/
the CONSEL program (Shimodaira and Hasegawa 2001; Steenkamp et al 2008; Moszczynski et al. 2012). RAxML was used to produce a log file for the site-wise log-likelihoods of alternative trees given the datasets. The generated log file was run in CONSEL to calculate the p value for each alternative topology using the approximately unbiased (AU; Shimodaira 2002), Shimodaira-Hasegawa (SH; Shimodaira and Shimodaira-Hasegawa 1999) and Kishino–Shimodaira-Hasegawa (KH; Kishino and Hasegawa 1989) tests.
Estimation of molecular evolutionary rate.
The molecular evolutionary rate was analyzed using a Bayesian approach implemented in MCMCTREE in the PAML4 program package (Yang 2007). We used correlated- and independent-rate models under a GTR (nucleotide) or WAG (amino acid) model with a discrete gamma distribution (5 rate categories) and calibrations of hard bounds. The shape and scale
parameters in the gamma prior for the overall rate parameter and σ2
were estimated following Inoue et al. (http://abacus.gene.ucl.ac.uk/software/MCMCtreeStepByStepManual.pdf). First, 25
core genes of 62 species were examined to estimate the divergence times between Cupriavidus
and Burkholderia, the basal divergence of Cupriavidus species, and the basal divergence of
Burkholderia species used in this study. The calibration points of prokaryotes were according to
Battistuzzi and Hedges (2009): a minimum time for the divergence of σ- and β-proteobacteria of
1,640 Mya (million years ago), a minimum time for the divergence of Cyanobacteria and Dehalococcoidetes of 2,300 Mya, a maximum of 4,000 Mya for the earliest land-dwelling taxa, and a maximum boundary for the ingroup root node of 4,200 Mya. The estimated divergence
times were then used for the analysis of the molecular evolutionary rate of the nodIJ and
DRA-ATPase/permease genes, together with a calibration point of the rhizobia-legume symbiosis. Previous studies have suggested that legumes were developing a capacity for symbiotic nitrogen fixation with rhizobia by approximately 60 Mya (Cannon et al 2010; Held et al. 2010). Wikstrom
et al. (2001) estimated the basal divergence of Leguminosae (Bauhinia and Pisum) at 56-68 Mya.
Lavin et al. (2005) fixed the legume stem clade at 60 Mya and estimated the divergence of Mimosoideae and Faboideae at 58.1-59.0 Mya. Sprent (2007) stated “Legume evolved about 60 Mya, and nodulation 58 Mya.” Kistner and Parniske (2002) described that the oldest known fossil
by guest on July 13, 2016
http://mbe.oxfordjournals.org/
The host legumes in this study range from Mimosoideae to Faboideae. We then set the maximum time for the basal divergence of the genes of rhizobia interacting with these legumes at 60 Mya.
Molecular evolutionary analysis.
To examine the pattern of ω (dn/ds: nonsynonymous/synonymous substitution rate ratio), κ
(transitions/transversions ratio), and θ (GC content) acting on the nodIJ and
DRA-ATPase/permease genes at global and lineage-specific levels, maximum likelihood tests implemented in the PAML and Bio++ Suite software packages were applied to the sequences of rhizobia and Burkholderiaceae species (Yang 2007; Dutheil and Boussau2008). The branch-specific model of PAML includes the two-ratio and one-ratio models to represent the different
presumed hypotheses. The two-ratio model assigns the ratio ωnodIJ to the branch of the rhizobia
clade and the ratio ωDRA-ATPase/permease to all other branches. The one-ratio model assumes a constant
ratio ω across all branches. The two models can be compared using the likelihood ratio test
(LRT) with the CODEML program in PAML. To perform independent tests of the overall
influence of θ and κ, we applied the nonhomogeneous model of nucleotide sequence evolution
using the BppML program in the Bio++ Suite package (Galtier and Gouy 1998; Dutheil and
Boussau 2008). This program allows variation in θ or κ among the branches by relaxing the
assumption of the homogeneous base composition. In this model, the LRT compares the likelihoods of two nested models of evolution: the “nonhomogeneous model,” in which separate
rates (κnodIJ, θnodIJ, κDRA-ATPase/permease, and θDRA-ATPase/permease) are estimated from sets of branches on a
phylogenetic tree, and the “homogeneous model,” in which a single rate of κ and θ is estimated
from the same tree. We estimated the ML of several alternative hypotheses using the branch-specific nonhomogeneous models (general models NH2, NH3, and NH4 in Duthei and Boussau (2008)) with the T92+Gamma4 substitution model. The NH2 model, for example, assigns the
ratio θnodIJ to the branches of the rhizobia clade and the ratio θDRA-ATPase/permease to all other branches,
whereas κ is constant over the tree.
by guest on July 13, 2016
http://mbe.oxfordjournals.org/
Acknowledgements. We are grateful to Dr. Hirohisa Kishino and Dr. Julien Dutheil for their help. We also thank the editor and three anonymous reviewers for insightful comments that greatly improved the manuscript. This work was supported by a Grand-in-Aid for Scientific Research (C) and Scientific Research on Priority Areas ‘Comparative Genomics’ from the Ministry of Education, Culture, Sports, Science, and Technology of Japan. All sequences described in this study were deposited at DDBJ/EMBL/GenBank under accession numbers AB749314-AB749323.
References
Amadou C, Pascal G, Mangenot S, et al. (20 co-authors). 2008. Genome sequence of the
beta-rhizobium Cupriavidus taiwanensis and comparative genomics of rhizobia. Genome Res.
18:1472-1483.
Andam CP, Mondo SJ, Parker MA. 2007. Monophyly of nodA and nifH Genes across Texan and
Costa Rican Populations of Cupriavidus Nodule Symbionts. Appl Environ Microbiol.
73:4686-4690.
Angus AA, Hirsch AM 2010. Insights into the history of the legume-betaproteobacterial symbiosis. Mol Ecol. 19:28-30.
Aoki S, Kondo T, Prevost D, Nakata S, Kajita T, Ito M. 2010. Genotypic and phenotypic
diversity of rhizobia isolated from Lathyrus japonicus indigenous to Japan. Syst Appl
Microbiol.33:383-397.
Azad RK, Lawrence JG. 2011. Towards more robust methods of alien gene detection. Nucleic Acids Res. 39:e56.
Balachander D, Raja P, Kumar K, Sundaram SP. 2007. Non-rhibozial nodulation in legumes. Biotech Mol Biol Rev. 2 49-57.
Barton NH, Briggs DEG, Eisen JA, Goldstein DB, Patel NH. 2007. Evolution. New York: Cold Spring Harbor Laboratory Press.
by guest on July 13, 2016
http://mbe.oxfordjournals.org/
on land. Mol Biol Evol. 26:335-343.
Bohlin J, Snipen L, Hardy SP, Kristoffersen AB, Lagesen K, Donsvik T, Skjerve E, Ussery DW. 2010. Analysis of intra-genomic GC content homogeneity within prokaryotes. BMC Genomics 11:464.
Bontemps C, Elliott GN, Simo MF, et al. (12 co-authors). 2010. Burkholderia species are ancient
symbionts of legumes. Mol Ecol.19:44-52.
Cannon SB, Ilut D, Farmer AD, Maki SL, May GD, Singer SR, Doyle JJ. 2010. Polyploidy did not predate the evolution of nodulation in all legumes. PLoS ONE 5:e11630.
Castresana J. 2000. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol. 17:540-552.
Chen WM, de Faria SM, Straliotto R, et al. (13 co-authors). 2005. Proof that Burkholderia forms
effective symbioses with legumes: a study of novel Mimosa-nodulating strains from South
America. Appl Environ Microbiol. 71:7461-7471.
Chen WF, Guan SH, Zhao CT, Yan XR, Man CX, Wang ET, Chen WX. 2008. Different
Mesorhizobium species associated with Caragana carry similar symbiotic genes and have common host ranges. FEMS Microbiol Lett. 283:203-209.
Chen WM, Laevens S, Lee TM, Coenye T, de Vos P, Mergeay M, Vandamme P. 2001. Ralstonia
taiwanensis sp. nov., isolated from root nodules of Mimosa species and sputum of a cystic fibrosis patient. Int J Syst Evol Microbiol. 51:1729-1735.
Chen WM, Moulin L, Bontemps C, Vandamme P, Bena G, Boivin-Masson C. 2003. Legume
symbiotic nitrogen fixation by beta-proteobacteria is widespread in Nature. J Bacteriol.
185:7266-7272.
Chiari Y, Meijden A, Madsen O, Vences M, Meyer A. 2009. Base composition, selection, and phylogenetic significance of indels in the recombination activating gene-1 in vertebrates. Front Zool. 6:32.
Clark IM, Mendum TA, Hirsch PR. 2002. The influence of the symbiotic plasmid pRL1JI on the distribution of GM rhizobia in soil and crop rhizospheres, and implications for gene flow.
Antonie Van Leeuwenhoek 81:607-616.
Conow C, Fielder D, Ovadia Y, Libeskind-Hadas R. 2010. Jane: a new tool for the cophylogeny reconstruction problem. Algorithms Mol Biol. 5:16.
Davidson AL, Dassa E, Orelle C, Chen J. 2008. Structure, Function, and Evolution of Bacterial
by guest on July 13, 2016
http://mbe.oxfordjournals.org/
ATP-Binding Cassette Systems. Microbiol Mol Biol Rev. 72:317-364.
Doyle JJ. 1998. Phylogenetic perspectives on nodulation: evolving views of plants and symbiotic
bacteria. Trends Plant Sci.3:473-478.
Drummond AJ, Rambaut A. 2007. BEAST: Bayesian evolutionary analysis by sampling trees.
BMC Evol Biol.7:214.
Dutheil J, Boussau B. 2008. Non-homogeneous models of sequence evolution in the Bio++ suite
of libraries and programs. BMC Evol Biol.8:255.
Galtier N, Gouy M. 1998. Inferring pattern and process: maximum likelihood implementation of a non-homogeneous model of DNA sequence evolution for phylogenetic analysis. Mol Biol Evol. 15:871-879.
Galtier N, Gouy M, Gautier C. 1996. SEAVIEW and PHYLO_WIN: two graphic tools for sequence alignment and molecular phylogeny. Comput Appl Biosci. 12:543-548.
Garau G, Yates RJ, Deiana P, Howieson JG. 2009. Novel strains of nodulating Burkholderia
have a role in nitrogen fixation with papilionoid herbaceous legumes adapted to acid, infertile soils. Soil Biol Biochem. 41:125-134.
Gerding M, O’Hara GW, Brau L, Nandasena K, Howieson JG. 2012. Diverse Mesorhizobium spp. with unique nodA nodulating the South African legume species of the genus Lessertia. Plant Soil 358:385-401.
Giraud E, Moulin L, Vallenet D, et al. (34 co-authors). 2007 Legumes symbioses: absence of nod genes in photosynthetic bradyrhizobia. Science 316:1307-1312.
Groussin M, Pawlowski J, Yang Z. 2011. Bayesian relaxed clock estimation of divergence times in foraminifera. Mol Phylogenet Evol. 61:157-166.
Gyaneshwar P, Hirsch AM, Moulin L, et al. (12 co-authors). 2011. Legume-nodulating betaproteobacteria: diversity, host range and future prospects. Mol Plant-Microb Interact. 11:1276-1288.
Han TX, Tian CF, Wang ET, Chen WX. 2010. Associations Among Rhizobial Chromosomal Background, nod Genes, and Host Plants Based on the Analysisof Symbiosis of Indigenous Rhizobia and Wild Legumes Native to Xinjiang. Microb Ecol. 59:311-323.
Hartmann LS, Barnum SR. 2010. Inferring the evolutionary history of Mo-dependent nitrogen
fixation from phylogenetic studies of nifK and nifDK. J Mol Evol. 71:70-85.
by guest on July 13, 2016
http://mbe.oxfordjournals.org/
not so common symbiotic entry. Trends Plant Sci.15:540-545.
Hughes AL. 1994. The evolution of functionally novel proteins after gene duplication. Proc R
Soc London Ser B.256:119–124.
Hughes AL. 2002. Adaptive evolution after gene duplication. Trends Genet. 18:433-434.
Jeong S, Ritchie N, Myrold D. 1999. Molecular phylogenies of plants and Frankia support
multiple origins of actinorhizal symbioses. Mol Phylogenet Evol. 13:493-503.
Juhas M, Van Der Meer JR, Gaillard M, Harding RM, Hood DW, Crook DW. 2009. Genomic islands: tools of bacterial horizontal gene transfer and evolution. FEMS Microbiol Rev. 33:376-393.
Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M. 2012. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 40:D109-114.
Katoh K, Kuma K, Toh H, Miyata T. 2005. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33:511-518.
Kiers ET, Rousseau RA, West SA, Denison RF. 2003. Host sanctions and the legume-rhizobium
mutualism. Nature425:78-81.
Kishino H, Hasegawa M. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data and the branching order Hominidae. J Mol Evol. 29:170-179.
Kistner C, Parniske M. 2002. Evolution of signal transduction in intracellular symbiosis. Trends
Plant Sci. 7:511-518.
Kock M. 2004. Diversity of root nodule bacteria associated with Cyclopia species. PhD thesis,
South Africa: University of Pretoria.
Kostka M, Uzlikova M, Cepicka I, Flegr J. 2008. SlowFaster, a user-friendly program for
slow-fast analysis and its application on phylogeny of Blastocystis. BMC Bioinformatics 9:341.
Kumar S, Gadagkar SR. 2001. Disparity Index: a simple statistic to measure and test the homogeneity of substitution patterns between molecular sequences. Genetics 158:1321-1327. Kunin V, Goldovsky L, Darzentas N, Ouzounis CA. 2005. The net of life: reconstructing the
microbial phylogenetic network. Genome Res. 15:954-959.
Lane DJ. 1991. 16S/23S rRNA sequencing. In: Stackebrandt E, Goodfellow M, editors. Nucleic
acid techniques in bacterial systematics. New York: John Wiley and Sons. p. 115-175.
Lavin M, Herendeen PS, Wojciechowski MF. 2005. Evolutionary rates analysis of Leguminosae
by guest on July 13, 2016
http://mbe.oxfordjournals.org/
implicates a rapid diversification of lineages during the Tertiary. Syst Biol.54:575-594.
Lawrence JG, Ochman H 1997. Amelioration of bacterial genomes: rates of change and exchange. J Mol Evol. 44:383-397.
Legendre P, Desdevises Y, Banzin E. 2002. A statistical test for host-parasite coevolution. Syst Biol. 51:217-234.
Lockhart PJ, Steel MA, Hendy MD, Penny D. 1994. Recovering evolutionary trees under a more realistic model of sequence evolution. Mol Biol Evol. 11:605-612.
Loomis WF, Smith DW. 1990. Molecular phylogeny of Dictyostelium discodeum by protein
sequence comparison. Proc Natl Acad Sci U S A. 87:9093-9097.
Lukoschek V, Keogh JS, Avise JC. 2012. Evaluating Fossil Calibrations for Dating Phylogenies in Light of Rates of Molecular Evolution: A Comparison of Three Approaches. Syst Biol. 61:22-43.
Markowitz VM, Mavromatis K, Ivanova NN, Chen IA, Chu K, Kyrpides NC. 2009. IMG ER: a system for microbial genome annotation expert review and curation. Bioinformatics 25:2271-2278.
Martinez-Romero JC, Ormeno-Orrillo E, Rogel MA, Lopez-Lopez A, Martinez-Romero E. 2010.
Trends in Rhizobial Evolution and Some Taxonomic Remarks. In: Pontarotti P, editor.
Evolutionary biology-concepts, molecular and morphological evolution. Berlin: Springer. p. 301-316.
McInerney JO. 1998. GCUA: general codon usage analysis. Bioinformatics 14:372-373.
Meier-Kolthoff JP, Auch AF, Huson DH, Goker M. 2007. COPYCAT: co-phylogenetic analysis tool. Bioinformatics 23:898-900.
Mishra RPN, Tisseyre P, Melkonian R, Chaintreuil C, Miche L, Klonowska A, Gonzalez S, Bena G, Laguerre G, Moulin L. 2012. Genetic diversity of Mimosa pudica rhizobial symbionts in
soils of French Guiana: investigating the origin and diversity of Burkholderia phymatum and
other beta-rhizobia. FEMS Microbiol Ecol. 79:487-503.
Moszczynski K, Mackiewicz P, Bodyl A. 2012. Evidence for Horizontal Gene Transfer from Bacteroidetes Bacteria to Dinoflagellate Minicircles. Mol Biol Evol. 29:887-892.
Moulin L, Bena G, Boivin-Masson C, Stepkowski T. 2004. Phylogenetic analyses of symbiotic
nodulation genes support vertical and lateral gene co-transfer within the Bradyrhizobium genus.
by guest on July 13, 2016
http://mbe.oxfordjournals.org/