Gene gain and loss events in Rickettsia and Orientia species
Biology Direct volume 6, Article number: 6 (2011)
Genome degradation is an ongoing process in all members of the Rickettsiales order, which makes these bacterial species an excellent model for studying reductive evolution through interspecies variation in genome size and gene content. In this study, we evaluated the degree to which gene loss shaped the content of some Rickettsiales genomes. We shed light on the role played by horizontal gene transfers in the genome evolution of Rickettsiales.
Our phylogenomic tree, based on whole-genome content, presented a topology distinct from that of the whole core gene concatenated phylogenetic tree, suggesting that the gene repertoires involved have different evolutionary histories. Indeed, we present evidence for 3 possible horizontal gene transfer events from various organisms to Orientia and 6 to Rickettsia spp., while we also identified 3 possible horizontal gene transfer events from Rickettsia and Orientia to other bacteria. We found 17 putative genes in Rickettsia spp. that are probably the result of de novo gene creation; 2 of these genes appear to be functional. On the basis of these results, we were able to reconstruct the gene repertoires of "proto-Rickettsiales" and "proto-Rickettsiaceae", which correspond to the ancestors of Rickettsiales and Rickettsiaceae, respectively. Finally, we found that 2,135 genes were lost during the evolution of the Rickettsiaceae to an intracellular lifestyle.
Our phylogenetic analysis allowed us to track the gene gain and loss events occurring in bacterial genomes during their evolution from a free-living to an intracellular lifestyle. We have shown that the primary mechanism of evolution and specialization in strictly intracellular bacteria is gene loss. Despite the intracellular habitat, we found several horizontal gene transfers between Rickettsiales species and various prokaryotic, viral and eukaryotic species.
Open peer review
Reviewed by Arcady Mushegian, Eugene V. Koonin and Patrick Forterre. For the full reviews please go to the Reviewers' comments section.
Rickettsia species are best known as the causative agents of vector-borne diseases, with rickettsial diseases representing an important cause of illness and death worldwide. Rickettsia spp. have been isolated not only from hematophagous arthropod vectors, such as lice, ticks, fleas and mites, but also from several freshwater leeches [2–4], annelids, amoebae and plants. The rickettsial species associated with arthropods are split into 2 major groups: the spotted fever group (SFG), which is associated with ticks, fleas and mites, and the typhus group (TG), which is smaller and is associated with human body lice (Rickettsia prowazekii) and fleas (Rickettsia typhi) [2, 5]. Rickettsia bellii and Rickettsia canadensis exhibit important genomic divergences and branch outside both groups. Rickettsia spp. have reduced genomes that vary in size from 1.1 Mb for the TG and 1.2-1.4 Mb for the SFG to 1.5 Mb for R. bellii. Gene loss is thought to be a feature of the evolution of intracellular pathogenic bacteria [6–11]. In general, Rickettsiales genomes are still undergoing reduction, as supported by the R. prowazekii paradigm [12, 13], which makes them an excellent model in which to study this process by observing interspecies variations in genome size and gene content. The absence of a given gene in a genome (when compared to a related species) is often considered to be the result of its loss in that genome. Consequently, the evolutionary process of gene loss in bacterial species has already been investigated in other studies [9, 12, 14, 15].
Previous studies also showed that gene gain events occur in Rickettsiales by duplication [12, 16]. Given the genetic isolation of Rickettsiales, alternative mechanisms for gene gain seemed impossible until the discovery of horizontal gene transfer (HGT) events in rickettsial genomes [9, 17–20]. These findings made it acceptable to regard microbial genomes as dynamic entities that evolve by both losing and acquiring genes. Therefore, when comparing closely related genomes, the relative absence of a given gene in one genome could potentially reflect gene gain in another genome.
Our objective was to study the evolution of Rickettsia and Orientia species from a free-living lifestyle to an intracellular one by considering gene loss and HGT events. For the first time, we compared 3 members of the intracellular Rickettsiales order (Rickettsia spp., Orientia spp., Anaplasma spp.) to a free-living, non-pathogenic alpha-proteobacterium (Caulobacter spp.). Orientia tsutsugamushi is an obligate intracellular bacterium that lives in mites and is the causative agent of scrub typhus, while Anaplasma spp. are pathogenic parasites that cause anaplasmosis in humans through tick bites. As previously mentioned, gene loss in Rickettsiales has already been investigated, but here, we distinguished truly lost genes from genes gained by other species. Therefore, to identify HGT, we constructed systematic phylogenies for every gene that appeared to be missing. We were able to verify HGT events reported by previous studies, and we were able to describe previously unidentified events as well as to define the origin of each transferred gene. After identifying true gene loss and HGT, we reconstructed the ancestral gene sets of alpha-proteobacteria, proto-Rickettsiales, proto-Rickettsiaceae and proto-Rickettsia, which helped us to predict the number of genes lost during the passage from a free-living to an intracellular lifestyle (See Additional File 1: Figure S1). Finally, we attempted the first reconstruction of a Rickettsia genealogy of all gained genes.
We found that 701 riCOGs (COGs found only in Rickettsia spp.) were common to the 11 rickettsial species in our study, 995 riCOGs were absent from at least 1 of the 11 rickettsial species and 1,954 open reading frames (ORFs) were specific to 1 of the 11 species (See Additional File 1: Table S1). Functional characterization of the riCOGs is described in Figure S2 (See Additional File 1: Figure S2). We observed that 929 rioriCOGs (COGs obtained comparing Rickettsia spp. and Orientia spp.) were found in both Rickettsia and Orientia. In total, 173 rioriCOGs were found only in Rickettsia, and 119 were identified only in Orientia. Finally, cross-comparisons of Rickettsia spp., Orientia spp., Anaplasma spp. and Caulobacter detected 191 ricauCOGs (COGs obtained comparing all 4 genera) in all 4 genera, 102 in Rickettsia, Orientia and Anaplasma only, 281 in Anaplasma only and 1,982 in Caulobacter only.
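The partition of COGs described above (shared by all species, absent from at least one species, or specific to a single species) can be illustrated with a minimal sketch. The function name `classify_cogs` and the toy data are hypothetical and are not taken from the study; the actual counts were derived from the COGsoft output and from species-specific ORFs.

```python
from collections import Counter

def classify_cogs(presence, n_species):
    """Tally COGs by phyletic pattern.

    presence maps a COG identifier to the set of species that carry it.
    'core' = present in every species, 'species_specific' = present in
    exactly one species, 'variable' = everything in between.
    """
    counts = Counter()
    for species in presence.values():
        if len(species) == n_species:
            counts["core"] += 1
        elif len(species) == 1:
            counts["species_specific"] += 1
        else:
            counts["variable"] += 1
    return counts

# Toy data (hypothetical identifiers, three species).
toy = {
    "cogA": {"sp1", "sp2", "sp3"},
    "cogB": {"sp1", "sp3"},
    "cogC": {"sp2"},
}
result = classify_cogs(toy, n_species=3)
print(dict(result))
```

In this sketch, only the "variable" class would be passed on to the single-gene phylogenies, since a COG present in every species carries no information about differential loss or gain within the group.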
A phylogenomic tree constructed based on gene content (i.e., the presence or absence of COGs) showed a genus organization for Rickettsiae that was different from a whole core gene concatenated tree (Figure 1) or even a phylogenetic tree based on 16S rRNA sequences. On our phylogenomic tree, R. felis and R. akari are not placed within the SFG, in contrast to what would normally be expected. Rather, R. felis clusters with R. bellii, while R. akari clusters with the TG and R. canadensis. However, the branch leading to R. akari has a low bootstrap, probably due to phylogenetic uncertainty. This phylogenomic analysis shows that the rickettsial gene repertoires have different evolutionary histories. The phylogenomic tree suggests that differential gene loss and lateral gene acquisition may have played important roles in the evolution of some Rickettsia spp. Indeed, genes acquired from distant sources are more likely to supply novel traits that would set the recipient apart from its relatives. We also analyzed the species organization for the different functional COG categories (See Additional File 1: Figure S3). Likewise, using the tree based on whole-genome content, phylogenomic analysis for the different functions revealed many topologies that differ from that of the whole core gene tree or the 16S rRNA-based tree. These results suggest that gene loss and HGT are relevant for all gene functions.
HGT and phylogenetics
We used single-gene phylogenies to investigate the evolution of rickettsial gene content. When comparing 2 species, genes that are absent in one species could have either been lost in that species or gained in the other through HGT. To detect HGT, phylogenetic trees were constructed for Rickettsiales genes that were not common to all Rickettsiales spp. and compared to the Tree of Life (See Material and Methods). Among the 995 riCOGs that were missing from at least 1 rickettsial species, we found 6 genes that were likely obtained by the other Rickettsia species via HGT (Table 1, See Additional File 2: Phylogenetic trees). These genes were acquired from several organisms, ranging from gamma-proteobacteria to eukaryotes. The functions of these transferred genes are diverse; the genes code for enzymes with synthase or proteolytic activities. Special interest should be given to the metK gene (riCOG00983), which codes for S-adenosylmethionine synthetase; phylogenies suggest that this gene was transferred from gamma-proteobacteria to Rickettsia species. Sequence analysis showed that the metK gene is degrading at different rates in the Rickettsia species that do not appear on the phylogenetic tree, while it is conserved in R. akari and R. felis. This result agrees with the studies of Andersson on the metK gene. This degradation confirms our hypothesis regarding gene loss in species where the gene is absent. Phylogenetic analysis showed that among the 129 rioriCOGs missing from at least one Rickettsia spp., 3 have likely been subject to HGT. These genes were obtained from different organisms, including gamma-proteobacteria, protists and viruses (Table 1). They code for enzymes with hydrolase activity and for ankyrin-repeat containing proteins.
Rickettsiae as gene donors
We identified 3 examples where Rickettsia and Orientia were probably gene donors in HGT events. Rickettsia spp. from the SFG contributed a gene coding for a nucleotidyltransferase substrate binding protein to the Firmicutes spp. and Bacteroidetes spp., and the rickettsial ancestor contributed a gene coding for a putative permease to gamma-proteobacteria spp. Orientia species contributed a Na+/proline symporter histidine kinase gene to Bacteroidetes spp. These genes have various functions, including transposase and transferase activities (Table 2). We did not identify any HGT events in the genomes of Anaplasma spp., which is consistent with the absence of such events in previous studies.
Gene gain during rickettsial evolution
Gene gains occurred at different times during rickettsial evolution, and the rate of HGT varies among species (Figure 2). Genes were gained before the separation of the SFG and TG (2 genes) and, subsequently, during speciation events within the SFG (4 genes). We identified one HGT event in Orientia tsutsugamushi Ikeda but none in Orientia tsutsugamushi Boryong. R. canadensis, with 3 horizontally transferred genes, exhibited the highest rate of HGT among the species studied. Most of the horizontally transferred genes in SFG were gained ancestrally; however, R. akari and R. felis gained 2 genes even after specialization. R. prowazekii and R. typhi are the only Rickettsia species in which no gene gains were found to have occurred after their separation from the SFG. In some species, HGT events were followed by gene loss (Table 1).
De novo appearance
Genes present exclusively in Rickettsia spp. may correspond to genes that are the result of de novo gene formation. New genes often arise through the duplication of existing genes or through fusion/fission events. Genes originating from non-coding DNA are extremely rare. However, the short lengths of our 17 analyzed sequences indicate that they are likely to be newly arisen genes or, possibly, pseudogenes (Table 3). We only evaluated the 17 longest sequences because the probability that a sequence will be non-coding increases as sequence length decreases. Based on pair-wise ω ratio tests (ω = Ka/Ks = non-synonymous/synonymous substitutions) on our 17 analyzed sequences, only 2 appear to be functional, given that their ω ratio is significantly less than 1. Of the remaining genes, 3 seem to be evolving toward becoming coding regions in some lineages, while 12 others are apparently either non-functional or pseudogenes (their ω ratio is not significantly less than 1). The functions of the protein products of the true genes are unknown, and none have any annotated protein domains. Further studies are needed to investigate the functions of these novel genes.
Gene loss during rickettsial evolution
After eliminating the genes gained by HGT, we were able to deduce the gene sets of "proto-Rickettsia" and "proto-Rickettsiaceae", which correspond to the ancestors of current rickettsial spp. and Rickettsiaceae, respectively, using the PARS algorithm (See Material and Methods). A comparison of the gene content between species showed that Rickettsia spp. share 1,027 genes that were probably vertically inherited from "proto-Rickettsia" (Figure 3). R. prowazekii and R. typhi have lost the largest number of genes (284 genes), whereas R. bellii has maintained all of the 1,027 acquired genes. We found no gene loss in R. bellii, and indeed, this species has more genes than the rickettsial ancestor due to HGT. We only identified one such transfer because our study was restricted to the uncommon COGs. The group containing R. conorii, R. massiliae, R. rickettsii and R. africae lost many genes (128 genes) following its association with ticks and its separation from the R. felis/R. akari cluster (79 genes). Gene loss that occurred as the SFG separated from the TG mainly concerned genes involved in metabolic functions and information storage and processing. Losses at the species level equally affect genes of all functions (Figure 3). The "proto-Rickettsiaceae" contained 1,944 genes. Thus, the Rickettsiaceae have lost 2,135 genes during their evolution to an intracellular lifestyle. Following their speciation and distinction from the Orientia genus, the Rickettsiae lost 1,015 genes (See Additional File 1: Figure S4).
Several studies have searched for HGT events in various species [28–35], including rickettsial species [18, 19, 36], but their results have rarely been used for genome reconstruction. In the instances where reconstructions have been performed, they have been based on assumptions and estimations by changing "gain and loss penalties" [31, 32] that consider gene loss more likely to occur than gene gain. Furthermore, the origin of the transferred genes is not always clear. Our phylogenetic analysis-based strategy allowed us to reconstruct, for the first time, the gene sets of "proto-Rickettsiales", "proto-Rickettsiaceae" and "proto-alpha-proteobacteria" by accounting for gene loss and gain events.
The content of the alpha-proteobacterial ancestor was estimated at 3,000-4,000 genes, which is also the result suggested in a previous study. Looking further, we were able to determine that Rickettsiaceae lost 2,135 genes during their evolution to an intracellular lifestyle, and we described the rate of gene loss in 11 Rickettsia spp. We found that R. bellii has maintained all "proto-rickettsial" genes, while R. prowazekii and R. typhi have lost the largest number of ancestral genes and have the smallest genomes in Rickettsia. These 2 species present a restricted host range and exhibit extensive gene loss, together with reduced gene gain [9, 36]. The differences in gene repertoires between current Rickettsia spp. are mainly the result of differential gene losses from the ancestor. Gene loss at the species level is relevant for genes from all functional categories. The different functional repertoires seem to play important roles in the adaptation of Rickettsia spp. to their various hosts. Finally, the analysis of gene repertoires seems to be crucial for species definition; differential gene losses help in the creation of new rickettsial species.
We found 9 genes that seem to have been gained by Rickettsia and Orientia spp. through HGT. Most gene gains took place before the separation of the SFG and the TG. However, horizontally acquired genes were then degraded, or completely lost, especially in the species belonging to the TG. Some genes seem to have been gained independently by different rickettsial species after their speciation. Transferred genes include transposases and ankyrin repeat-containing proteins and have various origins including different bacteria, animals and plants. Thus, we found genes encoding ankyrin repeat-containing proteins that appear to have been transferred from viruses and protists to Orientia species. These results suggest that genetic exchanges could have occurred between protists and their symbionts, as was found to be the case for R. bellii and amoebae. Using a BLAST search against the "non-redundant" database (NR), we demonstrated that some of the horizontally transferred riCOGs that we identified show similarity to protist sequences, even when the probable donor species are not protists (See Additional File 1: Table S2). Protists, especially amoebae, appear to have played a significant role as a melting pot for genetic exchange [17, 39]. The genome of R. bellii includes many genes related to those of amoebal symbionts, perhaps due to ancient gene exchanges between an ancestor of R. bellii and other amoebal hosts. It is plausible that the first host cell of Rickettsia was a protist. Similarly, the intracellular lifestyles of Rickettsia and Orientia spp. allowed them to donate genes to Firmicutes spp., Bacteroidetes spp. and gamma-proteobacteria spp. The transferred genes encode proteins with transposase, proteolysis and hydrolase activities. However, the biological significance of these HGT events cannot currently be inferred, and the biological impact of gene transfers remains to be investigated.
Phylogenies of Rickettsia spp. revealed that some gene sequences do not have any significant homologues in the NR database and may therefore have appeared de novo. The origin and function of such genes remain a mystery. There are at least two steps involved in the evolution of new protein-coding genes from ancestral non-coding DNA. First, the DNA must be transcribed, and the locus that did not originally encode a protein has to acquire an ORF. Second, the new ORF must be transcribed through the use of a nearby existing gene promoter. In Rickettsia, 15 of the 17 sequences analyzed are either nonfunctional or are in the process of adopting functionality. These sequences may also correspond to inactivated genes that are being degraded by mutation (pseudogenes). Of course, we should not neglect the fact that for closely related bacterial species, the Ka/Ks ratio might change over time because the selective consequences of non-synonymous change are not always effectively instantaneous. It is also possible that these sequences may have been the result of HGT from lineages that have yet to be sequenced. It was recently proposed that such "ORFans" could also represent genes of viral or plasmidic origin [42, 43]. Indeed, our candidates are short and AT-rich, as has been proposed for ORFans of viral origin (Table 3). However, given that they do not currently match any homologous sequences in any databases, we consider them to be probable de novo genes. It will therefore be interesting to determine the origin of these sequences and whether they are genuinely functional and, even more importantly, to determine their function.
Previous studies have suggested that gene loss is a major evolutionary force that acts during the reductive evolution of intracellular bacteria [15, 44, 45]. Our phylogenetic analysis-based strategy of examining each missing gene allowed us to confirm that Rickettsiales are mainly shaped by gene loss. Genome size differences observed between obligate intracellular and free-living bacteria indeed result from genome reduction. However, we also detected HGT events that had not been previously identified and constructed the "proto-Rickettsia" genealogy based on quantitative genetic data. Therefore, our approach can be applied to better define bacterial evolutionary histories, and further cross-comparisons with other Rickettsiales, such as Wolbachia or Pelagibacter ubique, could help pinpoint specific HGT events leading to the intracellular lifestyle of Rickettsia. Finally, substantial horizontal gene transfers could help to accurately define bacterial spp. by allowing the phylogenetic history of genes that occur sporadically among multiple taxa to be traced.
Determination of COGs
Protein sets for 11 sequenced rickettsial species, 2 Orientia tsutsugamushi strains, 3 Caulobacter species and 3 Anaplasma spp. were downloaded from NCBI (ftp://ftp.ncbi.nih.gov/genomes/Bacteria/). The following organisms were used in this study: Rickettsia conorii Malish 7 [NCBI: NC_003103]; Rickettsia africae ESF-5 [NCBI: NC_012633]; Rickettsia massiliae MTU5 [NCBI: NC_009900]; Rickettsia rickettsii Sheila Smith [NCBI: NC_009882]; Rickettsia felis URRWXCal2 [NCBI: NC_007109]; Rickettsia akari Hartford [NCBI: NC_009881]; Rickettsia prowazekii Madrid E [NCBI: NC_000963]; Rickettsia typhi Wilmington [NCBI: NC_006142]; Rickettsia canadensis McKiel [NCBI: NC_009879]; Rickettsia bellii RML369-C [NCBI: NC_007940]; Orientia tsutsugamushi Boryong [NCBI: NC_009488]; Orientia tsutsugamushi Ikeda [NCBI: NC_010793]; Caulobacter sp. K31 [NCBI: NC_010338]; Caulobacter crescentus CB15 [NCBI: NC_002696]; Caulobacter crescentus NA1000 [NCBI: NC_011916]; Anaplasma marginale St. Maries [NCBI: NC_004842]; Anaplasma marginale Florida [NCBI: NC_012026]; and Anaplasma phagocytophilum [NCBI: NC_007797]. We also included the Rickettsia slovaca 13-B proteome [project ID: 15712]. Orthologous genes were identified using the COGsoft program (E-value = 10^-20 and coverage ≥ 70%).
Gene content phylogeny
The COG data were used to construct a whole-genome phylogenetic tree based on gene content. We generated a matrix of binary discrete characters ("0" and "1" for absence and presence, respectively). Using this matrix, we constructed a phylogenetic tree implementing the neighbor-joining (NJ) method within PHYLIP (Phylogeny Inference Package). The intergenomic distance used was that defined by Snel (Table 4).
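As an illustration of how a gene-content distance can be derived from the binary matrix, the sketch below computes, for each pair of genomes, the number of shared COGs divided by the number of COGs in the smaller genome, which is the normalization described in our response to Reviewer 1. Converting this similarity into a distance as 1 minus the similarity is an assumption made here for illustration, and the function name `gene_content_distance` and the toy matrix are hypothetical.

```python
def gene_content_distance(matrix):
    """Pairwise gene-content distances from a presence/absence matrix.

    matrix maps a species name to a list of 0/1 flags, one per COG,
    with all lists in the same COG order.
    similarity = shared COGs / COGs in the smaller genome
    distance   = 1 - similarity   (assumed conversion)
    """
    names = sorted(matrix)
    dist = {}
    for a in names:
        for b in names:
            shared = sum(x & y for x, y in zip(matrix[a], matrix[b]))
            smallest = min(sum(matrix[a]), sum(matrix[b]))
            similarity = shared / smallest if smallest else 0.0
            dist[(a, b)] = 1.0 - similarity
    return dist

# Toy presence/absence matrix (hypothetical, four COGs).
toy = {
    "sp1": [1, 1, 1, 0],
    "sp2": [1, 1, 0, 0],
    "sp3": [0, 1, 1, 1],
}
dist = gene_content_distance(toy)
for pair in [("sp1", "sp2"), ("sp1", "sp3"), ("sp2", "sp3")]:
    print(pair, round(dist[pair], 3))
```

Normalizing by the smaller genome means that a reduced genome that is a strict subset of a larger one has distance 0 to it (sp1 and sp2 in the toy data), which keeps differences in genome size alone from dominating the tree. The resulting matrix could then be passed to any neighbor-joining implementation.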
Construction of phylogenies
We also conducted phylogenetic analyses of COGs, excluding those shared by all Rickettsia spp. Therefore, we analyzed the following COGs: 1) those found only in Rickettsia (riCOGs) and absent in at least one rickettsial species, 2) those obtained after comparisons of Rickettsia spp. and Orientia spp. (rioriCOGs) and 3) those obtained after comparisons of Rickettsia spp., Orientia spp., Anaplasma spp. and Caulobacter spp. (ricauCOGs). Homologous sequences were queried within an NR database, and multiple alignments of homologous sequences were made using MUSCLE. Data producing a bias (such as skewing toward short sequences) or noise were automatically eliminated. Based on the multiple alignments, phylogenetic trees were constructed using three different methods: neighbor joining (NJ), maximum parsimony (MP) and maximum likelihood (ML). Paralogous sequences were detected by comparing gene trees against a reference species tree (Tree of Life), and functionality was verified using web databases including GeneOntology or NCBI's dbEST. These paralogous sequences were deleted from our study.
Analysis of phylogenies
We used phylogenies to distinguish between gene loss and horizontal gene transfer. When the tree topology obtained from the phylogeny of a riCOG corresponded to the topology of the Tree of Life, the riCOG was considered to be lost by the species absent in the tree. In contrast, when the phylogeny presented Rickettsiales spp. anchored in a non-alpha-proteobacteria clade, we concluded that the riCOG was gained through HGT from the organism with which the rickettsial species was grouped. Due to the presence of many phylogenetic artifacts, we only considered the HGT hypothesis for trees with high bootstrap values (> 60) for at least one of the 3 methods. A TBLASTN search was run for every possible HGT to eliminate the potential problem of missing data. Some phylogenies contained species only from Rickettsiales, possibly because the corresponding genes are short sequences that are inadequate for phylogenetic analysis. These genes might be the result of putative de novo appearances. Thus, we estimated the pair-wise ω = Ka/Ks values for these genes and their BLASTN matches using the codeml software. Sequences with ω < 1 likely represent a protein-coding exon. A χ2 test was used to determine whether ω ratios were significantly less than one: when the P value was less than 0.05, ω was considered significantly less than one and the sequence was considered functional; otherwise, the sequence was considered non-functional.
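The two decision rules used in this section can be sketched as follows. The first function applies the bootstrap criterion (support above 60 in at least one of the three tree-building methods). The second assumes that the χ2 test takes the common form of a likelihood-ratio test with 1 degree of freedom, comparing a model in which ω is estimated with a model in which ω is fixed to 1; the text does not specify the exact form of the test, so this is an assumption, and the function names and example log-likelihoods are hypothetical.

```python
import math

def hgt_supported(bootstraps, threshold=60):
    """True when at least one method gives bootstrap support above the threshold."""
    return any(value > threshold for value in bootstraps.values())

def chi2_sf_df1(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

def omega_below_one(lnl_free, lnl_fixed, omega_hat, alpha=0.05):
    """Likelihood-ratio test of omega = 1 (assumed test form).

    lnl_free  : log-likelihood with omega estimated from the data
    lnl_fixed : log-likelihood with omega fixed to 1
    Returns (p_value, functional); 'functional' requires omega_hat < 1
    and a P value below alpha.
    """
    stat = max(0.0, 2.0 * (lnl_free - lnl_fixed))
    p_value = chi2_sf_df1(stat)
    return p_value, (omega_hat < 1.0 and p_value < alpha)

# Hypothetical values for illustration only.
print(hgt_supported({"NJ": 45, "MP": 72, "ML": 58}))
p_strong, functional_strong = omega_below_one(-1000.0, -1005.0, omega_hat=0.2)
p_weak, functional_weak = omega_below_one(-1000.0, -1000.5, omega_hat=0.8)
print(functional_strong, functional_weak)
```

A sequence is flagged as functional only when ω is both estimated below 1 and significantly different from 1, which corresponds to evidence of purifying selection on the encoded protein.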
Reconstruction of the ancestral gene set
After eliminating the possible gained genes, we reconstructed the gene set of "proto-Rickettsiales" and "proto-Rickettsiaceae" using a parsimony approach implemented in a two-pass algorithm. The rickettsial species tree based on the whole gene core was coupled with the algorithm. The phyletic pattern of all COGs (i.e., the matrix indicating presence or absence) in each species analyzed was mapped onto this tree. The numbers of lost COGs were determined at ancestral nodes and at each leaf. We used the maximum possible gain penalty (i.e., 5) so that only gene losses appeared on the tree. Caulobacter spp. were used as outgroup for the reconstruction of the ancestral gene set.
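To illustrate how a high gain penalty forces a reconstruction made only of losses, the sketch below applies a generic two-pass weighted (Sankoff-style) parsimony to a single presence/absence character. It is not the PARS program itself; the tree encoding (nested tuples with leaf names as strings), the function names and the toy phyletic pattern are assumptions made for illustration. Only the gain penalty of 5 is taken from the text.

```python
INF = float("inf")

def transition_cost(parent, child, gain_penalty, loss_penalty=1.0):
    """Cost of a state change along one branch (0 = absent, 1 = present)."""
    if parent == child:
        return 0.0
    return gain_penalty if (parent, child) == (0, 1) else loss_penalty

def down_pass(node, leaf_states, gain_penalty, costs):
    """First pass (leaves to root): minimal subtree cost for each state."""
    if isinstance(node, str):
        costs[node] = [0.0 if leaf_states[node] == s else INF for s in (0, 1)]
        return
    for child in node:
        down_pass(child, leaf_states, gain_penalty, costs)
    costs[node] = [
        sum(min(costs[c][t] + transition_cost(s, t, gain_penalty) for t in (0, 1))
            for c in node)
        for s in (0, 1)
    ]

def up_pass(node, state, gain_penalty, costs, events):
    """Second pass (root to leaves): assign states and record events."""
    if isinstance(node, str):
        return
    for child in node:
        best = min((0, 1), key=lambda t: costs[child][t]
                   + transition_cost(state, t, gain_penalty))
        if best != state:
            events.append(("gain" if best == 1 else "loss", child))
        up_pass(child, best, gain_penalty, costs, events)

def reconstruct(tree, leaf_states, gain_penalty=5.0):
    """Return the inferred root state and the list of events on branches."""
    costs = {}
    down_pass(tree, leaf_states, gain_penalty, costs)
    root_state = min((0, 1), key=lambda s: costs[tree][s])
    events = []
    up_pass(tree, root_state, gain_penalty, costs, events)
    return root_state, events

# Toy tree and phyletic pattern (hypothetical): the gene is present in A only.
tree = (("A", "B"), ("C", "D"))
pattern = {"A": 1, "B": 0, "C": 0, "D": 0}
root_state, events = reconstruct(tree, pattern, gain_penalty=5.0)
print(root_state, events)
```

With equal penalties, a single gain on the branch leading to A would be the cheapest explanation of this pattern; with a gain penalty of 5, the gene is instead inferred to be ancestral and lost twice (once in B and once on the branch leading to C and D). This is why genes gained by HGT need to be removed before the reconstruction, as described above.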
Reviewer's report 1
Arcady Mushegian, Department of Bioinformatics, Stowers Institute for Medical Research, Kansas City, Missouri, USA.
The manuscript by Georgiades et al. is concerned with evolution of gene content in Rickettsiales, a group of alpha-proteobacteria whose genomes must have experienced considerable gene losses in the process of becoming parasites. The authors used several existing algorithms to estimate specific gains and losses and to reconstruct a putative common ancestor of the group. The conclusions of the paper seem quite reasonable, but I have several concerns about methods and about specific examples shown in the Supplement 1. I suspect that a bit more detailed explanation would set this all right. More specifically, in order: Lines 123-124: "Genes that are absent in one species could either have been lost by this species or gained by another species via HGT" - This does not have to be an 'either-or' proposition, it could be both - has this been considered?
We agree with the reviewer on this point. An absent gene could have been gained and then lost, especially if the gain took place at an ancestral level. This idea has been considered; as shown in Table 1, we have mentioned the species that have gained genes by HGT and the species that have lost these genes after speciation (see also lines 166-167).
Line 177: "After eliminating the genes gained by HGT" - after eliminating genes gained by HGT, the authors could have rebuilt the gene content tree; has this been done? Is this "purged" gene-content tree closer to the 16S RNA tree than the initial gene-content tree? And throughout the rest of the paper, which 'species tree' of the three possible trees was used (it is sometimes indicated, but not always)?
A gene content tree without the genes gained by HGT has not been rebuilt. We obtained strong evidence for only 12 HGT events. Therefore, the elimination of 12 genes from a phylogeny containing more than a thousand genes will not give a topology different from that of the initial tree.
Throughout the rest of the paper, the tree used for the reconstruction of the ancestral gene set was the whole core gene concatenated tree, while the single-gene phylogenies used for HGT identification were compared to the "Tree of Life", as defined by the 16S RNA sequences. This is now clarified in the paper (lines: 111-112, 132, 312-313, 318-319, 335-336).
Methods: line 284: In building neighbour-joining trees from the binary matrix of gene content, which intergenomic distance was used - the importance of the appropriate normalization has been emphasized in literature.
The intergenomic distance was that defined by Snel et al. The matrix of gene content was determined as follows: we calculated the percentage of genes shared between genomes, i.e., the number of genes shared between genomes divided by the number of genes in the smallest genome (See Table 4).
"Caulobacter was used as outgroup" - in which cases? How is it compatible with the cases, discussed in the same paragraph, when Rickettsiae were attached to the non-alpha-proteobacteria?
Caulobacter was used as outgroup for the reconstruction of the ancestral gene set. The phrase was mistakenly placed in the wrong paragraph; this is now corrected (see Methods, lines 340-341).
Lines 305-307: "When the tree topology obtained from a riCOG was similar to the phylogeny of the tree from 16S rRNA sequences..." - does 'similar' mean 'same' here (i.e., the same when branches corresponding to the species lacking this COG are pruned in the species tree), or 'similar but not the same'?
These lines have been rephrased in the paper (see Methods, lines 318-320). "Similar" here does not mean "identical"; it means "as close to the 'Tree of Life' topology as possible".
Lines 307-309: "When, on the contrary, the phylogeny presented Rickettsiales spp. anchored in a non-alpha proteobacteria clade, we concluded that the riCOG was gained through HGT" - do the authors mean the whole riCOG, or only some genes in it? Table 1 seems to indicate the latter? What if there are no alpha-proteobacteria (other than Rickettsia) in the tree (see examples)?
We consider that a riCOG is a gene.
If there are no alpha-proteobacteria in a single-gene phylogeny (other than Rickettsia), then this is strong evidence of gene gain by Rickettsia only.
Lines 324-325: " The whole genome sequence-based rickettsial species tree was coupled with the algorithm." - Which one (see above), and why the choice?
It is the whole core gene concatenated tree that was coupled to the algorithm. This point has been clarified in the Methods (lines 336-337). This tree gives a more robust and accurate phylogeny. Furthermore, this phylogeny corresponds to the phenotypic classification dividing Rickettsia spp. into the Typhus Group, Spotted Fever Group and a "group" of divergent species, R. bellii and R. canadensis [5, 24].
Sup. file 1: are trees in this file rooted or not? They seem to be unrooted but arbitrarily shown as rooted, which may cause visual artefacts, e.g.: riCOG01685: if the tree is rearranged properly (i.e., all Metazoans, including rotifer Philodina, shown as one clade which is well-supported), we see the Rickettsia clade as a deep branch between plants and animals; where is the HGT? rioriCOG00831: Canarypox virus and Orientia spp. sequences are intermingling branches without modest support - maybe both of them are basal, again no HGT? rioriCOG00862 may be a similar situation: looks like there is a Trichomonas vaginalis clade and an Orientia spp. clade - where is the HGT evidence?
According to the reviewer's comment, the trees were changed and are now rooted (Additional File 2).
riCOG01685: The position of Rickettsia on a branch between plants and animals indicates an HGT from plants and animals to Rickettsia.
rioriCOG00831: Canarypox virus seems to surround Orientia spp.; it is more likely an HGT from Canarypox virus to Orientia spp.
rioriCOG00862: In this phylogeny, the only species present are Trichomonas vaginalis and Orientia species. This evidence supports a probable HGT toward Orientia spp.
Reviewer's report 2
Eugene V. Koonin, NCBI, NLM, NIH, Bethesda, MD 20894, USA.
Generally, a fine study, describing the evolutionary reconstruction of the evolution of a large group of bacterial endosymbionts. I think the study would benefit from employing a state of the art Maximum Likelihood method for the reconstruction of gene repertoires such as Count by Miklos Csuros (Csurös M. Count: evolutionary analysis of phylogenetic profiles with parsimony and likelihood. Bioinformatics. 2010 Aug 1; 26(15):1910-2). This is unlikely to change the conclusions dramatically but would yield more robust results.
We would like to thank the reviewer for the comment on the use of a maximum likelihood method. However, we maintain our choice of the maximum parsimony method because models of gene content evolution are still unclear. Furthermore, using the PARS algorithm, we were able to eliminate gained genes from the reconstruction and to assign high gain penalties so that gene losses appear on the tree. We are, however, preparing a future approach using parsimony and branch lengths to reconstruct ancestral genomes (Royer-Carenzi M, Didier G, personal communication).
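To illustrate how a gain penalty pushes a parsimony reconstruction toward gene loss, the following is a minimal sketch of generic weighted (Sankoff-style) parsimony applied to a single presence/absence profile. It is not the PARS implementation. The four-leaf tree, the leaf names A to D, the unit loss cost and all function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple, Union

# A tree is either a leaf name or a pair of subtrees (hypothetical toy encoding).
Tree = Union[str, Tuple["Tree", "Tree"]]


def transition_cost(parent: int, child: int, gain_penalty: float) -> float:
    """Cost of a parent-to-child state change (1 = gene present, 0 = absent)."""
    if parent == child:
        return 0.0
    # 0 -> 1 is a gain (penalised); 1 -> 0 is a loss (unit cost, an assumption).
    return gain_penalty if child == 1 else 1.0


def sankoff(tree: Tree, profile: Dict[str, int], gain_penalty: float) -> Tuple[float, float]:
    """Return (cost if this node lacks the gene, cost if this node has it)."""
    if isinstance(tree, str):
        inf = float("inf")
        return (0.0, inf) if profile[tree] == 0 else (inf, 0.0)
    child_costs = [sankoff(sub, profile, gain_penalty) for sub in tree]
    totals = []
    for state in (0, 1):
        total = 0.0
        for costs in child_costs:
            # Best child state given the state assumed at this node.
            total += min(costs[s] + transition_cost(state, s, gain_penalty) for s in (0, 1))
        totals.append(total)
    return (totals[0], totals[1])


def root_state(tree: Tree, profile: Dict[str, int], gain_penalty: float) -> int:
    """Most parsimonious state at the root (ties resolved as absent)."""
    absent, present = sankoff(tree, profile, gain_penalty)
    return 1 if present < absent else 0


# Toy example: the gene is present in leaf A only.
tree = (("A", "B"), ("C", "D"))
profile = {"A": 1, "B": 0, "C": 0, "D": 0}
for penalty in (1.0, 5.0):
    print(penalty, sankoff(tree, profile, penalty), root_state(tree, profile, penalty))
```

In this toy case, a gain penalty of 1 favors a single gain on the branch leading to A, whereas a gain penalty of 5 makes it cheaper to place the gene at the root and explain the profile by two losses.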
I am concerned about one conclusion of the paper, namely, that 17 genes in Rickettsia probably evolved de novo. The authors themselves consider alternatives such as rapid evolution and horizontal gene transfer from an unknown source, and I think these are actually much more plausible than the de novo explanation.
Rapid evolution and HGT from unknown sources are indeed plausible alternatives, as is a possible viral origin of these genes. We do not neglect these scenarios. However, given the substantial proportion of ORFan genes in bacterial genomes, it is not unreasonable to consider de novo creation. Finally, because these genes do not currently match any homologs in any database, we consider them to be probable de novo genes. This point has been reinforced in our Discussion (lines 256-261).
Reviewer's report 3
Patrick Forterre, Département de Microbiologie, Unité de Biologie Moléculaire du Gène chez les Extrêmophiles, Paris, France.
Georgiades and co-workers report a phylogenomic analysis of the order Rickettsiales, in order to identify the mechanisms of reductive evolution that shaped the history of this order. Rickettsiales are members of the phylum Proteobacteria, subdivision alpha. They are intracellular parasites of eukaryotic cells and "closely" related to the ancestor of mitochondria. Rickettsiales are therefore interesting from both a medical and an evolutionary perspective. Several genomes of Rickettsia have now been sequenced and already subjected to extensive comparative genomic analyses because of their medical and evolutionary interest. The analysis of Georgiades and co-workers updates previous analyses by including 11 genomes of Rickettsiales and a Caulobacter as outgroup. They have obtained from their phylogenomic analysis a more precise estimation of the gene content of the last common ancestors of Rickettsia, Rickettsiales and Rickettsiae. Their results confirm that, as expected, intracellular Rickettsia have mainly evolved by gene loss.
The authors have focused on the distinction between gene loss and horizontal gene transfer, by analyzing individual phylogenies of genes absent in one species. They identify several cases of HGT and propose the direction of the transfer and potential sources for the donor (in the case of transfer to Rickettsiae) or the recipient (in the case of transfer from Rickettsiae). The number of transfers detected is very limited (12, versus more than 2000 gene losses), but raises interesting biological questions. In the case of transfer to Rickettsiae, the authors suggest that it involved either eukaryotes infected by a Rickettsia (possibly a protist) or another parasite (a virus or a bacterium) infecting the Rickettsia-infected cell. They do not really discuss the case of transfer from Rickettsiae. Such transfers are very rare but quite interesting, since they suggest that descendants of the parasitic bacterium that received the rickettsial gene within the infected eukaryotic cell were later able to transfer this gene to a free-living bacterium. This confirms that intracellular pathogens are not completely closed systems but can participate in the network of gene transfer in nature. For example, this suggests that a eukaryotic gene (or the gene of a eukaryotic virus) could be transferred to free-living bacteria through an intracellular pathogenic bacterium acting as an intermediate.
HGT events both from and toward Rickettsiae are the result of their intracellular lifestyle. However, concerning the transfers from Rickettsiae, in two out of three cases the donation of the gene took place at an ancestral level. It is the Spotted Fever Group and the proto-Rickettsiales that contributed genes and not the current Rickettsia spp. (see Table 2). This result suggests that the more specialized a species becomes, the more it becomes a closed system. This remark is even more convincing when looking at R. prowazekii, whose evolution is characterized exclusively by gene loss [9, 12] (see lines 213-217). R. prowazekii is a true obligate intracellular human-specialized pathogen that does not participate in the network of gene transfer in nature.
In their analysis, the authors compare the individual gene trees obtained to the 16S rRNA tree to distinguish between gene loss and horizontal gene transfer. It is unclear if this is the best strategy, since 16S rRNA should be quite similar between Rickettsiae, with a possible lack of resolution in the tree. They mention the whole core gene concatenated tree. Is this tree congruent with the 16S rRNA tree? The authors could have updated this tree and used it as the reference species tree in their analysis. In any case, it would be interesting to have on the same figure the whole genome tree compared to the species tree (preferentially based on core gene phylogeny) (Figure 1).
The individual gene trees are compared to the "Tree of Life" phylogeny, which corresponds to the classification of all species according to their 16S rRNA sequences. We believe that this point is now clarified in the paper (lines 313-314, 318-320). The whole core gene concatenated tree is similar to the 16S rRNA tree of rickettsial species but is not identical (lines 112-113). Following the reviewer's comment on Figure 1, we now present both the whole genome phylogenomic tree and the whole core gene concatenated tree in the same figure (Figure 1A, B). We would like to thank the reviewer for this suggestion.
The authors also identify a low number of new genes without homologues in databases (ORFans) in Rickettsia. They indicate that the origin and function of such genes remain a mystery. In fact, it has recently been shown that many ORFans in archaeal and bacterial genomes have a viral or plasmidic origin (Cortez et al., 2009). These genes are usually short, AT-rich, and located in genome regions with traces of active or ancient integrated extrachromosomal elements. What about the genes detected here? Is it possible to imagine a viral origin? What is known about plasmids and/or viruses in Rickettsiae? Cortez et al. have described several putative proviruses (proplasmids) in several Rickettsiae (Additional data file 3 in Cortez et al., 2009). It could be interesting to use their analysis to see if some ORFan genes or genes involved in loss and HGT described here belong to these putative integrated elements.
(Cortez D, Forterre P, Gribaldo S: A hidden reservoir of integrative elements is the major source of recently acquired foreign genes and ORFans in Archaeal and bacterial genomes. Genome Biol. 2009 10(6):R65).
Following the reviewer's comment, in addition to the short length of our ORFan genes, which had already been pointed out, we verified their AT content. Indeed, these genes are AT-rich (see revised Table 3), so the possibility of a viral origin is now discussed in the paper (lines 256-261).
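As an illustration of the kind of AT-content check mentioned in this response, a minimal sketch is given below. The function name and the example sequences are hypothetical, ambiguous bases are simply ignored (an assumption), and this is not necessarily how the values in Table 3 were computed.

```python
def at_content(sequence: str) -> float:
    """Return the percentage of A and T among the unambiguous bases A, C, G, T."""
    seq = sequence.upper()
    # Count only unambiguous bases so that N or gap characters do not bias the result.
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (seq.count("A") + seq.count("T")) / counted


# Hypothetical toy sequences, not taken from the study.
for name, seq in {"toy_orf_1": "ATGAAATTTTAA", "toy_orf_2": "ATGGCCGGCTGA"}.items():
    print(name, round(at_content(seq), 1))
```

A per-gene value computed this way can then be compared with the AT content of the whole genome to judge whether a given ORFan is unusually AT-rich.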
SFG: spotted fever group
HGT: horizontal gene transfer
COG: cluster of orthologous genes
ORF: open reading frame
- NR database:
riCOG: COG found only in Rickettsia
rioriCOG: COG obtained after comparison of Rickettsia spp. and Orientia spp.
COG obtained after comparison of Rickettsia spp., Orientia spp., Anaplasma spp. and Caulobacter spp.
Raoult D, Ndihokubwayo JB, Tissot-Dupont H, Roux V, Faugere B, Abegbinni R, Birtles RJ: Outbreak of epidemic typhus associated with trench fever in Burundi. Lancet. 1998, 352: 353-358. 10.1016/S0140-6736(97)12433-3.
Merhej V, Raoult D: Rickettsial evolution in the light of comparative genomics. Biol Rev. 2010
Kikuchi Y, Sameshima S, Kitade O, Kojima J, Fukatsu T: Novel clade of Rickettsia spp. from leeches. Appl Environ Microbiol. 2002, 68: 999-1004. 10.1128/AEM.68.2.999-1004.2002.
Kuchler SM, Kehl S, Dettner K: Characterization and localization of Rickettsia sp. in water beetles of genus Deronectes (Coleoptera: Dytiscidae). FEMS Microbiol Ecol. 2009, 68: 201-211. 10.1111/j.1574-6941.2009.00665.x.
Fournier PE, Raoult D: Current Knowledge on Phylogeny and Taxonomy of Rickettsia spp. Ann N Y Acad Sci. 2009, 1166: 1-11. 10.1111/j.1749-6632.2009.04528.x.
Moran NA: Accelerated evolution and Muller's rachet in endosymbiotic bacteria. Proc Natl Acad Sci USA. 1996, 93: 2873-2878. 10.1073/pnas.93.7.2873.
Andersson SG, Kurland CG: Reductive evolution of resident genomes. Trends Microbiol. 1998, 6: 263-268. 10.1016/S0966-842X(98)01312-2.
Andersson JO, Andersson SG: Genome degradation is an ongoing process in Rickettsia. Mol Biol Evol. 1999, 16 (9): 1178-1191.
Blanc G, Ogata H, Robert C, Audic S, Suhre K, Vestris G, Claverie JM, Raoult D: Reductive genome evolution from the mother of Rickettsia. Plos Genetics. 2007, 3: 10.1371/journal.pgen.0030014.
Darby AC, Cho NH, Fuxelius HH, Westberg J, Andersson SG: Intracellular pathogens go extreme: genome evolution in the Rickettsiales. Trends Genet. 2007, 23: 511-520. 10.1016/j.tig.2007.08.002.
Merhej V, Royer-Carenzi M, Pontarotti P, Raoult D: Massive comparative genomic analysis reveals convergent evolution of specialized bacteria. Biol Direct. 2009, 4: 13-10.1186/1745-6150-4-13.
Ogata H, Audic S, Renesto-Audiffren P, Fournier PE, Barbe V, Samson D, Roux V, Cossart P, Weissenbach J, Claverie JM, Raoult D: Mechanisms of evolution in Rickettsia conorii and R. prowazekii. Science. 2001, 293: 2093-2098. 10.1126/science.1061471.
Bechah Y, Karkouri EK, Mediannikov O, Quentin L, Pelletier N, Robert C, Médigue C, Mege JL, Raoult D: Genomic, proteomic and transcriptomic analysis of virulent and avirulent Rickettsia prowazekii reveals its adaptive mutation capabilities. Genome Res. 2010, 20: 655-663. 10.1101/gr.103564.109.
Fournier PE, Raoult D: Identification of rickettsial isolates at the species level using multi-spacer typing. BMC Microbiology. 2007, 7: 72-10.1186/1471-2180-7-72.
Boussau B, Karlberg OE, Frank CA, Legault BA, Andersson SGE: Computational inference of scenarios for α-proteobacterial genome evolution. Proc Natl Acad Sci USA. 2004, 101: 9722-9727. 10.1073/pnas.0400975101.
Ogata H, Renesto P, Audic S, Robert C, Blanc G, Fournier PE, Parinello H, Claverie JM, Raoult D: The genome sequence of Rickettsia felis identifies the first putative conjugative plasmid in an obligate intracellular parasite. Plos Biol. 2005, 3: e248-10.1371/journal.pbio.0030248.
Ogata H, La SB, Audic S, Renesto P, Blanc G, Robert C, Fournier PE, Claverie JM, Raoult D: Genome sequence of Rickettsia bellii illuminates the role of amoebae in gene exchanges between intracellular pathogens. Plos Genet. 2006, 2: e76-10.1371/journal.pgen.0020076.
Blanc G, Ogata H, Robert C, Audic S, Claverie JM, Raoult D: Lateral gene transfer between obligate intracellular bacteria: evidence from the Rickettsia massiliae genome. Genome Res. 2007, 17: 1657-1664. 10.1101/gr.6742107.
Wolf YI, Aravind L, Koonin EV: Rickettsiae and Chlamydiae - evidence of horizontal gene transfer and gene exchange. Trends Genet. 1999, 15: 173-175. 10.1016/S0168-9525(99)01704-7.
Cho NH, Kim HR, Lee JH, Kim SY, Kim J, Cha S, Kim SY, Darby AC, Fuxelius HH, Yin J, Kim JH, Kim J, Lee SJ, Koh YS, Jang WJ, Park KH, Andersson SG, Choi MS, Kim IS: The Orientia tsutsugamushi genome reveals massive proliferation of conjugative type IV secretion system and host-cell interaction genes. Proc Natl Acad Sci USA. 2007, 104: 7981-7986. 10.1073/pnas.0611553104.
Kuo CH, Ochman H: The fate of new bacterial genes. FEMS Microbiol Rev. 2009, 33: 38-43. 10.1111/j.1574-6976.2008.00140.x.
Seong SY, Choi MS, Kim IS: Orientia tsutsugamushi infection: overview and immune responses. Microbes Infect. 2001, 3: 11-21. 10.1016/S1286-4579(00)01352-6.
Dumler J, Barbet A, Bekker C, Dasch G, Palmer G, Ray S, Rikihisa Y, Rurangirwa F: Reorganization of genera in the families Rickettsiaceae and Anaplasmataceae in the order Rickettsiales: unification of some species of Ehrlichia with Anaplasma, Cowdria with Ehrlichia and Ehrlichia with Neorickettsia, descriptions of six new species combinations and designation of Ehrlichia equi and HGE agent as subjective synonyms of Ehrlichia phagocytophila. Int J Syst Evol Microbiol. 2001, 51: 2145-2165.
Merhej V, Khalid EK, Raoult D: Whole genome-based phylogenetic analysis of Rickettsiae. Clin Microbiol Infect. 2009, 15: 336-337. 10.1111/j.1469-0691.2008.02265.x.
Ochman H, Lerat E, Daubin V: Examining bacterial species under the specter of gene transfer and exchange. Proc Natl Acad Sci USA. 2005, 102: 6595-6599. 10.1073/pnas.0502035102.
Knowles DG, McLysaght A: Recent de novo origin of human protein-coding genes. Genome Res. 2009, 19: 1752-1759. 10.1101/gr.095026.109.
Mirkin BG, Fenner TI, Galperin MY, Koonin EV: Algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes. BMC Evol Biol. 2003, 3: 2-10.1186/1471-2148-3-2.
Pal C, Papp B, Lercher MJ: Adaptive evolution of bacterial metabolic networks by horizontal gene transfer. Nat Genet. 2005, 37: 1372-1375. 10.1038/ng1686.
Zhaxybayeva O, Gogarten JP, Charlebois RL, Doolittle WF, Papke RT: Phylogenetic analyses of cyanobacterial genomes: Quantification of horizontal gene transfer events. Genome Res. 2006, 16: 1099-1108. 10.1101/gr.5322306.
Whitaker JW, McConkey GA, Westhead DR: The transferome of metabolic genes explored: analysis of the horizontal transfer of enzyme encoding genes in unicellular eukaryotes. Genome Biol. 2009, 10: R36-10.1186/gb-2009-10-4-r36.
Iwasaki W, Takagi T: Rapid pathway evolution facilitated by horizontal gene transfers across prokaryotic lineages. Plos Genet. 2009, 5 (Suppl 3): e1000402-10.1371/journal.pgen.1000402.
Snel B, Bork P, Huynen MA: Genomes in flux: The evolution of archaeal and proteobacterial gene content. Genome Res. 2002, 12: 17-25. 10.1101/gr.176501.
Marri RP, Hao W, Golding GB: Gene gain and gene loss in Streptococcus: Is it driven by habitat?. Mol Biol Evol. 2006, 23 (12): 2379-2391. 10.1093/molbev/msl115.
Kettler GC, Martiny AC, Huang K, Zucker J, Coleman ML, Rodrigue S, Chen F, Lapidus A, Ferriera S, Johnson J, Steglich C, Church GM, Richardson P, Chisholm SW: Patterns and implications of gene gain and loss in the evolution of Prochlorococcus. Plos Genet. 2007, 3 (12): 2515-2517. 10.1371/journal.pgen.0030231.
Cordero OX, Hogeweg P: The impact of long-distance horizontal gene transfer on prokaryotic genome size. Proc Natl Acad Sci USA. 2009, 106: 21748-21753. 10.1073/pnas.0907584106.
Fuxelius HH, Darby AC, Cho NH, Andersson SGE: Visualisation of pseudogenes in intracellular bacteria reveals the different tracks to gene destruction. Genome Biol. 2008, 9: R42-10.1186/gb-2008-9-2-r42.
Medini D, Donati C, Tettelin H, Masignani V, Rappuoli R: The microbial pan-genome. Curr Opin Genet Dev. 2005, 15: 589-594. 10.1016/j.gde.2005.09.006.
Thomas V, Greub G: Amoebae/amoebal symbionts genetic transfers: lessons from giant viruses neighbors. Intervirology. 2010, 53: 254-267. 10.1159/000312910.
Moliner C, Fournier PE, Raoult D: Genome analysis of microorganisms living in Amoebae reveals a melting pot of evolution. FEMS Microbiol Rev. 2010, 34: 281-294. 10.1111/j.1574-6976.2009.00209.x.
Merkeev IV, Mironov AA: Orphan genes: Function, evolution, and composition. Mol Biol. 2008, 42: 127-132. 10.1134/S0026893308010196.
Rocha MTG, Cooper JE, Smith NH, Feil EJ: Comparisons of dN/dS are time dependent for closely related bacterial genomes. J Theor Biol. 2006, 239: 226-235. 10.1016/j.jtbi.2005.08.037.
Boyer M, Gimenez G, Suzan-Monti M, Raoult D: Classification and determination of possible origins of ORFans through analysis of nucleocytoplasmic large DNA viruses. Intervirology. 2010, 53: 310-320. 10.1159/000312916.
Cortez D, Forterre P, Gribaldo S: A hidden reservoir of integrative elements is the major source of recently acquired foreign genes and ORFans in the Archaeal and bacterial genomes. Genome Biol. 2009, 10: R65-10.1186/gb-2009-10-6-r65.
Moran NA, Wernegreen JJ: Lifestyle evolution in symbiotic bacteria: insights from genomics. Trends Ecol Evol. 2000, 15: 321-326. 10.1016/S0169-5347(00)01902-9.
Moran NA, Mira A: The process of genome shrinkage in the obligate symbiont Buchnera aphidicola. Genome Biol. 2001, 2 (12): research0054.1-0054.12. 10.1186/gb-2001-2-12-research0054.
The NCBI site. [ftp://ftp.ncbi.nih.gov/genomes/Bacteria/]
Tatusov RL, Koonin EV, Lipman DJ: A genomic perspective on protein families. Science. 1997, 278: 631-637. 10.1126/science.278.5338.631.
Felsenstein J: PHYLIP (Phylogeny Inference Package) version 3.5c. Distributed by the author. 1993, Department of Genetics. University of Washington, Seattle
Snel B, Bork P, Huynen MA: Genome phylogeny based on gene content. Nat Genet. 1999, 21: 108-110. 10.1038/5052.
Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32: 1792-1797. 10.1093/nar/gkh340.
Goldman N, Yang ZH: A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol. 1994, 11: 725-736.
We would like to thank Oliver Chabrol and Ghislain Fournous for technical support, Boris Mirkin and Petar Konovski for their help with the PARS algorithm, Gilles Didier and Manuela Royer-Carenzi for their help with the statistical analysis and Anthony Levasseur for helpful remarks and corrections.
The authors declare that they have no competing interests.
PP and DR designed the research project. EK provided the COG data. KG and VM performed the genomic analysis. KG, VM, PP and DR analyzed the data. KG and VM wrote the paper. KG and VM contributed equally to the work. DR and PP revised the paper. All authors read and approved the final version.
Electronic supplementary material
Additional file 1: Four supplementary figures and two supplementary tables. Figure S1. Phylogenetic analysis-based strategy. Figure S2. Functional characterization of the rickettsial COGs. The functional categories were determined by the COGs database http://0-www-ncbi-nlm-nih-gov.brum.beds.ac.uk/COG/grace/fiew.cgi. Figure S3. Phylogenomic clustering of Rickettsia spp. in the different COG functional categories. Topologies are different for each functional category. Figure S4. Gene loss. Tentative scenario of gene loss in Rickettsiales from "proto-alpha-proteobacteria" (a) to current Rickettsia spp. Gene sets of "proto-alpha-proteobacteria", "proto-Rickettsiales" (b) and "proto-Rickettsiaceae" (c) were estimated using the PARS algorithm with a gain penalty of five. Table S1. Determination of COGs. The total number of ORFs in the studied species, number of COGs and specific genes are indicated. Table S2. Horizontally transferred genes with sequences that produced a BLAST hit with protist sequences. E-value < e-3 and identity >25%. (PDF 90 KB)
Additional file 2: Phylogenetic trees showing HGT events as generated by the ML method. For genes gained by Rickettsiales the donors are colored in green and for the cases for which the Rickettsiales gave genes the receivers are colored in blue. (PDF 49 KB)
Cite this article
Georgiades, K., Merhej, V., El Karkouri, K. et al. Gene gain and loss events in Rickettsia and Orientia species. Biol Direct 6, 6 (2011). https://0-doi-org.brum.beds.ac.uk/10.1186/1745-6150-6-6