Introduction

Nitrogen fixation (or diazotrophy) is the process of converting atmospheric nitrogen (N2) to reduced forms such as ammonia (NH3) (Postgate 1982). A small amount of atmospheric dinitrogen is reduced by lightning; however, the majority is reduced by prokaryotes (Postgate 1982; Sprent and Sprent 1990). The enzyme complex nitrogenase is responsible for nitrogen fixation. The subunits of nitrogenase are encoded by the nifHDK operon. Nitrogenase is composed of two components, dinitrogenase reductase (iron protein) and dinitrogenase (molybdenum–iron protein). Dinitrogenase reductase is composed of two identical subunits that are encoded by nifH (Mevarech et al. 1980), while dinitrogenase is a tetramer composed of two subunits encoded by nifD (Lammers and Haselkorn 1984) and two subunits encoded by nifK (Mazur and Chui 1982). Dinitrogenase reductase contains an iron–sulfur (4Fe–4S) cofactor, which binds the subunits of nitrogenase, and is responsible for mediating the ATP-dependent transfer of electrons to the dinitrogenase tetramer. Dinitrogenase binds atmospheric dinitrogen (N2) and is responsible for the transfer of electrons to it (Postgate 1982).

Other nitrogen fixation operons, in addition to nifHDK, have been identified in the heterocystous cyanobacterium Nostoc sp. strain PCC 7120 (Mazur and Chui 1982; Rice et al. 1982; Lammers and Haselkorn 1984). These include nifBSU, fdxN (Mulligan et al. 1988; Mulligan and Haselkorn 1989), and nifENXW (Borthakur et al. 1990; Haselkorn and Buikema 1992). NifB, nifN, and nifE are involved in molybdenum–iron cofactor synthesis. FdxN encodes a bacterial-type ferredoxin of unknown function, and the functions of nifS, nifU, nifX, and nifW remain unknown. Recent studies have suggested that some nif genes may have arisen via paralogous gene duplication (Fani et al. 2000). In addition to the nif gene family listed above, several alternative nitrogenases have been found, and these include the Mo-dependent nif2, V-dependent vnf, and Fe-dependent anf operons (Bishop and Premakumar 1992).

The ability to fix nitrogen is widely distributed in distantly related members of the Eubacteria and the Archaea (Young 1992); however, it is not ubiquitous in occurrence. The evolutionary history of nitrogen fixation has been debated for some time. The debate has focused on how to explain the seemingly random distribution of nitrogen fixation in distantly related lineages of prokaryotes. Several alternative hypotheses have been proposed to explain this haphazard distribution. The first hypothesis is that nitrogen fixation arose once in evolutionary history but has been transferred laterally (i.e., horizontally) to various lineages (Normand and Bousquet 1989). A second hypothesis predicts that it arose early in the evolutionary history of prokaryotes and was at one time ubiquitous among them, but has since been lost by many lineages and retained by a few distantly related ones (i.e., vertical descent, accompanied by multiple losses) (Young 1992; Normand et al. 1992; Hirsch et al. 1995). A third view is that it arose multiple times through convergent evolution (Postgate and Eady 1988).

To evaluate the evolution of nitrogen fixation, we and others (Normand and Bousquet 1989; Normand et al. 1992; Hirsch et al. 1995) have examined the phylogenies of nif genes. We utilized published ribosomal RNA phylogenies (Woese 1987; Olsen and Woese 1993; Olsen et al. 1994) as an independent framework for comparison because rRNA is universally present, highly conserved, believed to be representative of the organismal phylogeny, and not believed to be laterally transferred (Woese 1987; Olsen and Woese 1993; Olsen et al. 1994). Thus, rRNA phylogenies, in all likelihood, represent true organismal phylogenies and, therefore, are the most appropriate comparison for exploring alternative hypotheses involving lateral gene transfer. If the nif and the rRNA phylogenies have generally congruent topologies, then vertical descent may be supported. If, on the other hand, the nif and rRNA phylogenies have incongruent topologies, then lateral transfer of nif may have occurred. Convergence, or multiple origins of nif genes, could also be indicated by incongruence, although we would then expect significant divergence in nucleotide and amino acid sequences. Such divergence would be in sharp contrast to the high sequence similarity and the conserved organization of nif genes found in nitrogen-fixing organisms. Another possible explanation for incongruence is the inadvertent analysis of paralogous gene copies that arose through duplication. Therefore, our study focuses on distinguishing horizontal transfer from vertical descent of nif genes within the Eubacteria. In this paper, we reexamine the evolutionary history of nitrogen fixation by analyzing nifD from an increased number of cyanobacteria, proteobacteria, and Gram-positive bacteria. We analyzed only molybdenum-containing nifD genes and excluded all alternative nitrogenase genes, i.e., anfD and vnfD. Our objective was to compare the topology of our nifD phylogeny to 16S rRNA phylogenies to assess the evolutionary history of nifD and nitrogen fixation within the Eubacteria.

Materials and Methods

Fifty-seven nifD nucleic acid and inferred amino acid sequences were used in this study (Table 1). Inferred amino acid sequences were initially aligned using Clustal W (Thompson et al. 1994) with gaps inserted for optimal alignment. The amino acid alignment was then adjusted manually using MacClade 4.0 (Maddison and Maddison 2000). The nucleotide alignment was made using Codon Align 1.0 (Barry Hall, University of Rochester), which constructs the alignment based on the amino acid alignment, with gaps inserted between codons and not within them.
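The idea of deriving a nucleotide alignment from an amino acid alignment can be illustrated with a minimal sketch. It is not the Codon Align implementation, which we have not reproduced; the function name `thread_codons` and the toy sequences are hypothetical, and the sketch assumes an in-frame coding sequence with one codon per aligned residue.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Project one row of an amino acid alignment onto its coding sequence.

    Each residue consumes one codon from `cds`; each protein gap becomes a
    three-character gap, so gaps fall between codons and never inside them.
    """
    n_residues = sum(1 for ch in aligned_protein if ch != gap)
    if len(cds) < 3 * n_residues:
        raise ValueError("coding sequence is shorter than the aligned protein")
    out = []
    pos = 0  # index of the next unused nucleotide in cds
    for ch in aligned_protein:
        if ch == gap:
            out.append(gap * 3)           # one protein gap = one codon-sized gap
        else:
            out.append(cds[pos:pos + 3])  # copy the codon for this residue
            pos += 3
    return "".join(out)


# Toy example (hypothetical sequences): the gap after M becomes "---".
aligned_row = thread_codons("M-KT", "ATGAAAACC")
```

Because gaps are only ever inserted in multiples of three, codon positions stay in register across all rows, which is what later allows third codon positions to be excluded as a block.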

Table 1 Eubacterial and archaean nifD sequences used in this study with GenBank accession numbers listed

The amino acid data matrix was analyzed using parsimony with PAUP* 4.0b10 (Swofford 2002). All trees were rooted with outgroup analysis using six archaean representatives (Table 1). Analyses were performed with the user-defined stepmatrix “PROTPARS,” which is equivalent to Felsenstein’s PROTPARS from PHYLIP (Felsenstein 1981). Parsimony analysis was conducted using the heuristic search option, with gaps treated as missing data, trees obtained using stepwise addition, and the tree-bisection-reconnection (TBR) branch swapping with random sequence addition for 100 replicates, steepest descent not in effect, maxtrees set at 50,000, branches collapsed if the maximum length was zero, multrees option in effect, and no topological constraints enforced. Three parsimony analyses were performed: (1) all characters included, (2) constant and uninformative characters excluded, and (3) constant and uninformative characters excluded, together with 51 characters that correspond to an insertion found in nifD genes of Clostridium, Methanosarcina, and Chlorobium tepidum. Bootstrap values were calculated for 500 replicates to evaluate branch support using the parsimony options listed above (Bremer 1994; Felsenstein 1985; Huelsenbeck et al. 1995).
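The distinction between constant, uninformative, and parsimony-informative characters can be made concrete with a short sketch. It uses the standard definition for unordered characters (a character is informative when at least two states are each shared by at least two taxa) and treats gaps as missing data, as in the analyses above. How PAUP* classifies characters under a stepmatrix may differ, and the function names, the set of missing-data symbols, and the toy alignment are assumptions made for illustration.

```python
from collections import Counter

# Assumption: these symbols are treated as missing data and ignored.
MISSING = {"-", "?", "X"}


def classify_column(column):
    """Classify one alignment column as constant, uninformative, or informative."""
    counts = Counter(state for state in column if state not in MISSING)
    if len(counts) <= 1:
        return "constant"
    # States that are present in at least two taxa.
    shared_states = sum(1 for n in counts.values() if n >= 2)
    return "informative" if shared_states >= 2 else "uninformative"


def informative_columns(alignment):
    """Return the zero-based indices of parsimony-informative columns."""
    return [
        i for i, column in enumerate(zip(*alignment))
        if classify_column(column) == "informative"
    ]


# Toy alignment (hypothetical): four taxa, four characters.
toy_alignment = ["MKTA", "MKSA", "MRTG", "MRSA"]
kept = informative_columns(toy_alignment)
```

In the toy alignment, the first column is constant, the second and third are informative, and the fourth is variable but uninformative because the state G occurs in only one taxon.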

Analysis of the nucleic acid data matrix was performed using maximum likelihood (ML) criteria with PAUP* 4.0b10 (Swofford 2002). We used Modeltest 3.06 (Posada and Crandall 1998) to determine the evolutionary model that best described our data, which was the F81 model (Felsenstein 1981) with a likelihood score of 22106.18947. Trees were rooted with outgroup analysis using nifD sequences from six archaean representatives (Table 1). ML analysis was conducted using the heuristic search option, with the likelihood settings corresponding to the F81 model (Felsenstein 1981). ML settings were as follows: no molecular clock enforced, starting branch lengths determined via the Rogers–Swofford approximation method, trees with likelihoods that were 5% or further from the target score rejected, branch-length optimization by the one-dimensional Newton–Raphson method with a pass limit of 20 and δ = 1e−06, starting trees obtained via stepwise addition, sequence addition as-is, one tree held at each step during stepwise addition, TBR branch swapping, steepest descent in effect, maxtrees set at 50,000, branches collapsed if the branch length was equal to or less than 1e−08, multrees option in effect, and topological constraints not enforced. ML analysis was performed with the third codon position excluded, to remove noise resulting from saturation at that position, and both with and without the ~154-bp insert. Bootstrap values were calculated for 500 replicates to evaluate branch support using parsimony criteria (Bremer 1994; Felsenstein 1985; Huelsenbeck et al. 1995).
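For readers unfamiliar with the F81 model, the sketch below computes its substitution probabilities along a branch. Under F81, all substitutions occur at the same rate but base frequencies may be unequal, so the probability of ending in base j depends only on the equilibrium frequency of j and on whether any substitution has occurred. The function name and the example frequencies are hypothetical, and branch length is assumed to be measured in expected substitutions per site; this is an illustration of the model, not of the PAUP* implementation.

```python
import math


def f81_transition_matrix(pi, t):
    """Return the 4x4 matrix P(t) of the F81 model.

    pi : equilibrium base frequencies (must sum to 1)
    t  : branch length in expected substitutions per site
    """
    if len(pi) != 4 or abs(sum(pi) - 1.0) > 1e-9:
        raise ValueError("pi must hold four frequencies that sum to 1")
    # Scaling constant so that t is in expected substitutions per site.
    beta = 1.0 / (1.0 - sum(p * p for p in pi))
    decay = math.exp(-beta * t)  # probability that no substitution event occurred
    return [
        [(decay if i == j else 0.0) + pi[j] * (1.0 - decay) for j in range(4)]
        for i in range(4)
    ]


# Hypothetical base frequencies (A, C, G, T) and branch length.
example = f81_transition_matrix([0.3, 0.2, 0.2, 0.3], 0.1)
```

With equal base frequencies the matrix reduces to the Jukes–Cantor model, and as the branch length grows every row approaches the equilibrium frequencies, which is the saturation effect that motivates excluding third codon positions.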

Distance analysis of the nucleic acid data matrix was performed using the neighbor-joining method (Saitou and Nei 1987), with all characters included and the DNA distance measure set to LogDet/paralinear to correct for base compositional bias (Lake 1994; Lockhart et al. 1994).
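The LogDet/paralinear correction can be sketched as follows. The distance is computed from the determinant of the 4 × 4 matrix of joint base frequencies for a pair of sequences, normalized by the marginal base frequencies of each sequence, which is what makes it robust to differences in base composition. Scaling conventions differ among implementations, so the factor of 1/4 and the normalization used here are assumptions and need not match the values reported by PAUP*; the function names and test sequences are hypothetical.

```python
import math

BASES = "ACGT"


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) < 1e-15:
            return 0.0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det  # a row swap flips the sign
        det *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= factor * m[c][k]
    return det


def logdet_distance(seq_x, seq_y):
    """Paralinear/LogDet distance between two aligned DNA sequences."""
    # Sites with gaps or ambiguity codes in either sequence are skipped.
    pairs = [(a, b) for a, b in zip(seq_x.upper(), seq_y.upper())
             if a in BASES and b in BASES]
    if not pairs:
        raise ValueError("no comparable sites")
    weight = 1.0 / len(pairs)
    joint = [[0.0] * 4 for _ in range(4)]
    for a, b in pairs:
        joint[BASES.index(a)][BASES.index(b)] += weight
    freq_x = [sum(row) for row in joint]                               # marginals of x
    freq_y = [sum(joint[i][j] for i in range(4)) for j in range(4)]    # marginals of y
    det_joint = determinant(joint)
    norm = math.sqrt(math.prod(freq_x) * math.prod(freq_y))
    if det_joint <= 0.0 or norm == 0.0:
        raise ValueError("distance undefined for this pair")
    return -0.25 * math.log(det_joint / norm)


d_example = logdet_distance("ACGTACGTACGTACGT", "ACGTACGTACGTACGA")
```

The distance is zero for identical sequences and is undefined when a base is absent from either sequence or when the sequences are too divergent for the determinant to remain positive, which is why the sketch raises an error in those cases rather than returning a number.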

Results

NifD Amino Acid Sequences

The aligned nifD amino acid data matrix was 599 residues in length and consisted of 349 variable characters. Parsimony analysis with all characters included generated 38 equally most parsimonious trees that were 4389 steps long, with a consistency index (CI) of 0.54, a rescaled consistency index (RC) of 0.39, and a retention index (RI) of 0.71. The 38 most parsimonious trees (MPTs) represented six islands of similar topology (Fig. 1). The major topological difference between the islands is the placement of Acidithiobacillus ferrooxidans and a clade containing Alcaligenes faecalis, Azotobacter vinelandii, Azoarcus sp. BH72, and Klebsiella pneumoniae. The first island is composed of four trees (Fig. 1A) with a branch containing A. faecalis, A. vinelandii, Azoarcus sp. BH72, and K. pneumoniae sister to the cyanobacteria, and Acidithiobacillus ferrooxidans sister to the cyanobacteria and proteobacteria. The second island is composed of 19 trees (Fig. 1B), with the cyanobacteria and proteobacteria as sister, and with A. ferrooxidans as sister to the larger clade. Islands 3 through 6, composed of 12, 1, 1, and 1 trees, respectively, place the cyanobacteria and proteobacteria as sister, with A. ferrooxidans embedded within the proteobacteria, and A. faecalis, A. vinelandii, Azoarcus sp. BH72, and K. pneumoniae as sister (Fig. 1C). The islands vary slightly at the terminal branches, but the overall topology of all MPTs is similar. A strict consensus tree of the 38 MPTs from the six islands is presented in Fig. 1D. This tree consists of seven major branches. The cyanobacteria and proteobacteria occur in a single clade, with the cyanobacteria supported as monophyletic and the proteobacteria occurring on several unresolved clades. None of the α, β, or γ proteobacterial subgroups is supported as monophyletic.
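The three indices reported above are related, as the following sketch shows. It uses the standard ensemble definitions, in which m is the minimum possible number of steps summed over characters, s is the observed tree length, and g is the maximum possible number of steps; the rescaled consistency index is the product of the other two. The function name and the input values are hypothetical and are not taken from our data matrix.

```python
def homoplasy_indices(min_steps, tree_length, max_steps):
    """Return (CI, RI, RC) from ensemble step counts.

    min_steps   : minimum possible steps summed over characters (m)
    tree_length : observed steps on the tree (s)
    max_steps   : maximum possible steps summed over characters (g)
    """
    if not (0 < min_steps <= tree_length <= max_steps) or max_steps == min_steps:
        raise ValueError("expected 0 < m <= s <= g and g > m")
    ci = min_steps / tree_length
    ri = (max_steps - tree_length) / (max_steps - min_steps)
    return ci, ri, ci * ri


# Hypothetical step counts chosen only to illustrate the arithmetic.
ci, ri, rc = homoplasy_indices(min_steps=10, tree_length=20, max_steps=40)
```

Because the published values are rounded to two decimal places, multiplying the rounded CI and RI (0.54 × 0.71 ≈ 0.38) need not reproduce the reported RC of 0.39 exactly.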

Figure 1

Trees generated from parsimony analysis of nifD amino acid sequences. The taxonomic groups are indicated; in addition, the proteobacterial subgroups are in brackets. Branches in boldface are those taxa that occur on different islands (discussed in the text). A Strict consensus tree of the four equally parsimonious trees in island 1. B Strict consensus tree of the 19 equally parsimonious trees in island 2. C Strict consensus tree generated from the 15 equally parsimonious trees generated from islands 3 through 6. D Strict consensus tree of all 38 equally parsimonious trees. Numbers above branches indicate bootstrap values.

In all MPTs of amino acid sequences (Fig. 1), the Gram-positive bacteria (firmicutes and actinobacteria) are not supported as monophyletic. The high G+C Gram-positive bacteria and Frankia strains (actinobacteria) are monophyletic and sister to the proteobacteria and cyanobacteria. The low G+C Gram-positive bacteria, Clostridia species (firmicutes), form a clade sister to the green sulfur bacterium, Chlorobium tepidum. All additional parsimony analyses of the amino acid data matrix (i.e., with and without constant and uninformative characters and the 51 amino acid residue insertion) resulted in trees that are virtually identical in topology to those trees presented in Fig. 1.

NifD Nucleic Acid Analysis

The aligned nifD nucleic acid data matrix was 1794 bp in length, with the 598 third codon positions excluded from the analysis. ML analysis converged on a single tree with a −lnL score of 20239.65577 (Fig. 2) and resolved four major clades. The cyanobacteria are supported as monophyletic and are sister to the proteobacteria. However, ML analysis failed to support the monophyly of the α, β, or γ subgroups of the proteobacteria. Sister to the cyanobacteria and proteobacteria is Acidithiobacillus ferrooxidans. The Gram-positive bacteria (actinobacteria and firmicutes) are not resolved as monophyletic, while the actinobacteria are monophyletic and sister to the cyanobacteria and proteobacteria. Clostridium species occur on a branch with Chlorobium tepidum and Methanosarcina.

Figure 2

Maximum likelihood tree of nifD nucleic acid sequences that had a −lnL score of 20239.65577. The different groups of bacterial taxa are bracketed. The proteobacterial subgroups are in brackets. Numbers in boldface above the branches are bootstrap values and numbers below the branches indicate branch lengths.

The neighbor-joining tree (Fig. 3), constructed using LogDet/paralinear distances, resulted in a tree with a topology similar to that of the ML analysis of DNA sequences (Fig. 2) and parsimony analysis of amino acid sequences (Fig. 1); however, Frankia is embedded within the proteobacteria. Distance analysis also differed from the other analyses (Figs. 1 and 2) in that Chlorobium tepidum, Clostridium species, and Methanosarcina species are not placed together.

Figure 3

Distance tree of nifD nucleic acid sequences created with the neighbor-joining method and LogDet/paralinear DNA distances. The different groups of bacterial taxa are bracketed. The proteobacterial subgroups are in brackets. Numbers above the branches indicate branch lengths.

Discussion

The relationships among the cyanobacteria, proteobacteria, and Gram-positive bacteria (actinobacteria and firmicutes) have not been fully resolved from analyses of ribosomal RNAs (Hirsch et al. 1995). Phylogenies based on 16S rRNA differ in the relationships between these eubacterial lineages, with Woese (1987) placing the cyanobacteria and Gram-positive bacteria as sister to each other with the proteobacteria at the base of the tree (Fig. 4A), whereas Olsen et al. (1994) place the Gram-positive bacteria and the proteobacteria as sister to each other with the cyanobacteria at the base of the tree (Fig. 4B). Although these topologies differ, they are consistent in the resolution of these eubacterial lineages as monophyletic and distinct. Therefore, it has been suggested that the relationship among these eubacterial lineages may best be described as an unresolved trichotomy (Hirsch et al. 1995). However, these phylogenies are still useful for comparison because they resolve the three lineages as monophyletic. Thus, incongruence between the nif phylogenies and the 16S rRNA phylogenies may indicate lateral gene transfer. For example, if lateral transfer of nif had occurred, we would not expect to resolve these major lineages as monophyletic. Rather, we would expect that members of one lineage would be placed among members of another lineage (Fig. 4C), which has been reported by others (Normand and Bousquet 1989; Normand et al. 1992; Hirsch et al. 1995). NifH, -D, and -K have all been examined to some extent and have produced conflicting results supporting both lateral transfer and vertical descent.
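The congruence criterion described above amounts to asking whether each major lineage forms a clade in the gene tree. A minimal sketch of that test on rooted trees is given below; trees are written as nested tuples, and the function names, taxon labels, and both toy topologies are hypothetical simplifications rather than the trees in Fig. 4.

```python
def collect_clades(tree):
    """Return the leaf set of every subtree of a rooted nested-tuple tree."""
    clades = set()

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        clades.add(leaves)
        return leaves

    walk(tree)
    return clades


def is_monophyletic(tree, taxa):
    """True if `taxa` is exactly the leaf set of some subtree of `tree`."""
    return frozenset(taxa) in collect_clades(tree)


# Hypothetical topology expected under vertical descent: each lineage is a clade.
vertical = ((("cyano1", "cyano2"), ("proteo1", "proteo2")), ("actino1", "actino2"))
# Hypothetical topology expected under lateral transfer: the cyanobacteria
# are nested within the proteobacteria, which are therefore not a clade.
lateral = (((("cyano1", "cyano2"), "proteo1"), "proteo2"), ("actino1", "actino2"))

proteobacteria = {"proteo1", "proteo2"}
vertical_ok = is_monophyletic(vertical, proteobacteria)
lateral_ok = is_monophyletic(lateral, proteobacteria)
```

In the second toy tree the cyanobacteria still form a clade, but the proteobacteria do not, which is the signature of lateral transfer discussed in the text. The test applies to rooted trees only; on an unrooted tree the corresponding question is whether the group is separated from all other taxa by a single branch.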

Figure 4

Published trees redrawn. A 16S rRNA tree modified from Woese (1987). B 16S rRNA tree modified from Olsen et al. (1994). C Phylogeny depicting lateral transfer from the proteobacteria to the cyanobacteria and actinobacteria. Adapted from Hirsch et al. (1995).

Analyses of partial nifH sequences that support lateral transfer have topologies that were incongruent with a 16S rRNA-based phylogeny and placed the cyanobacteria and actinobacteria within the proteobacteria (Normand and Bousquet 1989; Normand et al. 1992; Hirsch et al. 1995; Ueda et al. 1995; Kessler et al. 1997). This suggests that nitrogen fixation was laterally transferred from an ancestral proteobacterium to the most recent common ancestors of cyanobacteria and actinobacteria (Normand and Bousquet 1989; Normand et al. 1992; Hirsch et al. 1995). However, other nifH analyses resolved the cyanobacteria and proteobacteria as sister to each other with the actinobacteria at the base of the tree (Hennecke et al. 1985; Hirsch et al. 1995; Zehr et al. 1998, 2000), consistent with the topology of the 16S rRNA phylogeny supporting vertical descent of nifH.

Previous nifD phylogenies also supported lateral transfer because the cyanobacteria were placed within the proteobacteria (Normand and Bousquet 1989; Hirsch et al. 1995; Kessler et al. 1997). The nifD phylogeny differs from nifH in that the actinobacteria and cyanobacteria do not occur together, and actinobacteria are placed as sister to the proteobacteria and cyanobacteria clade (Normand et al. 1992; Kessler et al. 1997). The failure of nifD to resolve the evolution of nitrogen fixation led some investigators to speculate that it simply does not provide sufficient resolution (Normand et al. 1992; Hirsch et al. 1995).

NifK amino acid sequences have also been analyzed with parsimony and distance methods to resolve the issue (Hirsch et al. 1995). Parsimony analysis resulted in a phylogeny that predicted vertical descent, with the actinobacteria and cyanobacteria as sister to the proteobacteria (Hirsch et al. 1995). In contrast, distance analysis resulted in a phylogeny that supported lateral transfer (Fig. 4C), with the actinobacteria and cyanobacteria sister to each other and embedded within the proteobacteria (Hirsch et al. 1995). Thus, conflicting results may be a combination of three different nif genes being analyzed with different methods of phylogenetic analysis, none of which are model-based methods such as ML.

Our analyses of nifD (Figs. 1–3) consistently produced topologies that are congruent with 16S rRNA phylogenies, with the cyanobacteria, proteobacteria, and actinobacteria fully resolved as reciprocally monophyletic, and thus support vertical descent. Our analyses did not place members of one lineage within or among members of another lineage, which is consistent with 16S rRNA phylogenies and supports vertical descent.

Parsimony analysis of amino acid sequences (Fig. 1) and ML analysis of DNA sequences (Fig. 2) place the firmicutes (Clostridium), Methanosarcina, and Chlorobium tepidum together at the base of the tree; however, distance analysis of DNA sequences (Fig. 3) places them in two clades at the base of the tree. The placement of these taxa together is unexpected, since the firmicutes and Chlorobium tepidum are Eubacteria, and Methanosarcina is a member of the Archaea. Although the placement of these taxa together may be unexpected, analyses of nifH also place Clostridium with Methanosarcina (Zehr et al. 1998, 2000). While this might be attributed to long-branch attraction, Clostridium, Methanosarcina, and Chlorobium tepidum all possess an insertion encoding approximately 50 amino acids not found in any of the other taxa examined. However, this 50-amino acid insert is not conserved in these taxa. Distance analysis using the LogDet/paralinear distances, which compensate for base compositional bias (Lake 1994; Lockhart et al. 1994), did not place these taxa together (Fig. 3). This may suggest that the nifD genes of these taxa share a common evolutionary history due to lateral gene transfer. It is also possible that the placement of these taxa together is due to their nifD sequences being ancient paralogous copies that arose prior to the separation of the Eubacteria and the Archaea (Young 1990, 1992, 2000; Leigh 2001).

Another discrepancy in our results is the placement of Frankia. The parsimony analysis of amino acid sequences (Fig. 1) and the ML analysis of DNA sequences (Fig. 2) place it as sister to the larger clade of cyanobacteria and proteobacteria, whereas our distance analysis of DNA sequences (Fig. 3) places it within the proteobacteria. The placement of Frankia within the proteobacteria (Fig. 3) is consistent with lateral transfer, but its placement as sister to the cyanobacteria and proteobacteria (Fig. 2) is consistent with vertical descent. Although the distance tree (Fig. 3) conflicts with the other analyses regarding the placement of Frankia, its placement as sister to the cyanobacteria and proteobacteria (Figs. 1 and 2), which supports vertical descent, is more likely to be correct, both because it is congruent with 16S rRNA phylogenies (Woese 1987; Olsen and Woese 1993; Olsen et al. 1994) and because parsimony and likelihood methods are generally expected to resolve evolutionary relationships more accurately than distance methods.

Our results are consistent with Zehr et al. (1998, 2000) in the placement of the proteobacteria and cyanobacteria as sister, with the actinobacteria at the base. A major difference between Zehr’s parsimony and distance analyses of nifH (Zehr et al. 1998, 2000) and the nifD phylogenies presented here is the placement of the proteobacterial subgroups. Their analyses support the monophyly of the proteobacteria as a whole, as well as the monophyly of the α and β subgroups (Zehr et al. 1998, 2000). Our parsimony analysis of amino acid sequences does not support the monophyly of the proteobacteria. However, with the exception of Acidithiobacillus ferrooxidans, the monophyly of the proteobacteria is supported by ML analysis of DNA sequences (Fig. 2). None of the analyses presented here resolve the α, β, and γ proteobacterial subgroups as monophyletic. Recent studies suggest that both nifH (Ueda et al. 1995; Hurek et al. 1997, 1998; Laguerre et al. 2001) and nifD (Parker et al. 2002; Qian et al. 2002, 2003) may have been transferred horizontally within the proteobacteria, which would result in failure to differentiate these subgroups. Additionally, it has been suggested that some of the proteobacterial subgroups may not be monophyletic (Garrity and Holt 2001). Although our analyses of nifD vary slightly from those of nifH (Zehr et al. 1998, 2000), the two are similar in overall topology and support vertical descent. Furthermore, the agreement of nifH and nifD can be taken as evidence that nif genes and nitrogen fixation have evolved by vertical descent.

The analyses of nifD presented here are consistent with nifH (Zehr et al. 1998, 2000) and resolve the Gram-positive bacteria (firmicutes and actinobacteria), proteobacteria, and cyanobacteria, which is congruent with 16S rRNA phylogenies (Woese 1987; Olsen and Woese 1993; Olsen et al. 1994), thus supporting the vertical descent of nifD within these lineages. The discrepancies in earlier nif phylogenies may have been due to the number of species that were examined. Previous studies examined a limited number of taxa, whereas we examined 57 taxa. These discrepancies could also be due to the methods of analysis used. Previous studies employed only parsimony and/or distance methods. We utilized parsimony, distance, and the model-based maximum likelihood method. It has been suggested that divergent sequences should be analyzed with model-based criteria to more accurately predict phylogeny, since noise due to saturation is higher (Swofford et al. 1996; Huelsenbeck and Rannala 1997; Buckley and Cunningham 2002).