
1 Introduction

Nitrogen is an essential nutrient for all living organisms. It is an important component of many organic molecules such as DNA, RNA, and proteins, the building blocks of life. Molecular nitrogen or dinitrogen (N2) makes up nearly four-fifths of the atmosphere but is metabolically unavailable directly to higher plants or animals. It is available to some species of microorganism through biological nitrogen fixation (BNF) in which atmospheric nitrogen is converted to ammonia by the enzyme nitrogenase (Postgate 1998). The ammonia is then transferred to the higher plant to meet its nutritional needs for the synthesis of proteins, enzymes, nucleic acids, chlorophyll, etc., and subsequently it enters the food chain. Thus, all eukaryotes (including higher plants and animals) naturally depend on the BNF activity of the N-fixing microbes for their N supply. Microorganisms that fix nitrogen are called diazotrophs. According to current knowledge, only prokaryotes (members of the domains Archaea and Bacteria) are capable of performing BNF (Klipp 2004).

The ability to fix nitrogen is widely, though paraphyletically, distributed across both the bacterial and archaeal domains (Raymond et al. 2004). There are two types of diazotrophic prokaryotes: those that are free-living (e.g., Azotobacter, Clostridium, and Klebsiella) and those that form symbiotic relationships (e.g., Rhizobium, Bradyrhizobium, and Frankia). Nonphotosynthetic free-living diazotrophs require a chemical energy source, whereas photosynthetic diazotrophs, such as the cyanobacteria, utilize light energy (Leigh 2002). Some diazotrophs, called rhizobia, enter into symbiotic relationships with legumes such as clover and soybean. The symbiosis between legumes and the nitrogen-fixing rhizobia occurs within nodules, mainly on the root and in a few cases on the stem (Burns and Hardy 1975). A similar symbiosis occurs between a number of woody plant species and the diazotrophic actinomycete Frankia (Pedrosa et al. 2000). These symbiotic associations are the greatest contributors of fixed nitrogen to agricultural systems.

The exploitation of biological nitrogen fixation for agricultural benefit has long been sought. Biological nitrogen fixation provides a means to meet the needs of a growing population with a nutritious, environmentally friendly, and sustainable food supply, which makes BNF research very compelling at present. The last two decades have seen many exciting developments in nitrogen fixation: genomes have been sequenced, the “omics” approaches have been applied to both symbionts, and new genetically modified crops are becoming commonplace in agriculture. Biochemical research into the workings of nitrogen fixation is generally focused on the enzyme complex called nitrogenase. Beyond its usual function, this system has emerged as a model for more general biochemical processes, such as signal transduction, protein–protein interaction, inter- and intramolecular electron transfer, and the involvement of complex metal clusters in enzymatic catalysis (Peters et al. 1995). It is from this perspective that we review the present trends in nitrogen fixation research in the context of agriculture, with special emphasis on genomics, proteomics, and bioinformatics.

2 Various Aspects of Biological Nitrogen Fixation

2.1 Biological Nitrogen Fixation and Sustainable Agriculture

Natural reserves of soil nitrogen are normally low, so commercially prepared N fertilizers must be added to increase plant growth and vigor. Chemical fertilizers have had a substantial impact on food production in the recent past and are today an indispensable part of modern agricultural practice. But for the farmers of developing countries, N fertilizers are neither affordable nor widely available. Moreover, the harmful environmental effects of heavy N fertilizer use are becoming more evident day by day. Further, the fossil fuels used in the production of N fertilizer are becoming scarcer and more expensive. At the same time, the demand for food is rising as populations increase. There is therefore a great need to explore alternative sources of nitrogen. Biological nitrogen fixation offers an economically attractive and ecologically sound means of reducing external nitrogen input and improving the quality and quantity of internal resources. Biological nitrogen fixation is the reduction of atmospheric N2 gas to biologically available ammonium, mediated by prokaryotic organisms in symbiotic relationships, in associative relationships, and under free-living conditions (Postgate 1998). The fixed nitrogen provided by biological nitrogen fixation is less prone to leaching and volatilization, and therefore the biological process contributes an important and sustainable input into agriculture. Nitrogen input through BNF can help maintain soil N reserves as well as substitute for N fertilizer to attain large crop yields (Peoples and Craswell 1992). An understanding of the factors controlling BNF systems in the field is vital for supporting their successful large-scale adoption in agriculture.

Wani et al. (1995) highlighted the importance of biological nitrogen fixation by legumes for sustainable agriculture in the semiarid tropics. Legumes, one of the most important plant families in agriculture, are often involved in a remarkable symbiosis with nitrogen-fixing rhizobia. Legumes are often considered to be the major nitrogen-fixing systems, as they may derive up to 90 % of their nitrogen from N2. The quantity of atmospheric N fixed through forage legume biological N fixation can range as high as 200 kg/ha per year (Peoples et al. 1995). The symbiotic association of actinorhizal species helps in improving soil fertility in disturbed sites such as eroded areas, sand dunes, and moraines. Actinorhizal plant nitrogen fixation rates are comparable to those found in legumes (Torrey and Tjepkema 1979; Dawson 1983). The nitrogen-fixing Azolla–cyanobacteria symbiosis has been widely used to enrich rice paddies with organic nitrogen in Asian countries such as China and Vietnam and elsewhere in Southeast Asia (Watanabe and Liu 1992). The rice paddies of Asia, which feed over half of the world’s population, depend upon cyanobacterial N2 fixation (Irisarri et al. 2001).

2.2 Physiological and Phylogenetic Diversity of Diazotrophs

Farmers have known, probably since the time of the Egyptians, that legumes such as pea, lentil, and clover are important for soil fertility. The practices of crop rotation, intercropping, and green manuring were extensively described by the Romans, but it was not until the nineteenth century that an explanation for the success of the legumes in restoring soil fecundity was uncovered. The discovery of nitrogen fixation is attributed to the German scientists Hellriegel and Wilfarth, who in 1886 reported that legumes bearing root nodules could use gaseous nitrogen. Shortly afterward, in 1888, Beijerinck, a Dutch microbiologist, succeeded in isolating a bacterial strain from root nodules. This isolate happened to be a Rhizobium leguminosarum strain (Franche et al. 2009). Beijerinck (in 1901) and Lipman (in 1903) were responsible for the isolation of Azotobacter spp., while Winogradsky (in 1901) isolated the first strain of Clostridium pasteurianum (Stewart 1969). Nitrogen fixation in blue-green algae was established much later (Stewart 1969). The identification of a nitrogen-fixing microbe from root nodules of nonleguminous plants such as alder generated considerable controversy for a while. It was Brunchorst who named the microbe Frankia subtilis (Pawlowski 2009). Hiltner (1898) recognized the nodule inhabitant as an actinomycete, a Gram-positive bacterium closely related to Streptomyces. Pommer (1959) was probably the first person to obtain an isolate, but it did not reinfect its host plant. For a long time, diazotrophy in the actinomycetes was thought to be limited to the genus Frankia, but through the years several other actinomycetes have been shown to have nif genes (Gtari et al. 2012). Over the years there have been continual discoveries of new diazotrophs, revealing that this function is performed by a very diverse group of prokaryotes. 
In the last decades, the use of molecular technologies for the direct detection of the genes of biological nitrogen fixation has shown that the capacity for diazotrophy is even more widespread than previously expected.

Although nitrogen fixation is not found in eukaryotes, it is widely distributed among the Bacteria and the Archaea, revealing considerable biodiversity among diazotrophic organisms. The ability to fix nitrogen is found in most bacterial phylogenetic groups, including green sulfur bacteria, Firmibacteria, actinomycetes, cyanobacteria, and all subdivisions of the Proteobacteria. In Archaea, nitrogen fixation is mainly restricted to methanogens. The ability to fix nitrogen is compatible with a wide range of physiologies including aerobic (e.g., Azotobacter), facultatively anaerobic (e.g., Klebsiella), or anaerobic (e.g., Clostridium) heterotrophs; anoxygenic (e.g., Rhodobacter) or oxygenic (e.g., Anabaena) phototrophs; and chemolithotrophs (e.g., Alcaligenes, Thiobacillus, Methanosarcina) (Young 1992). Diazotrophs also show considerable diversity in terms of habitat. They are found free-living in soils and water, in associative symbioses with grasses, in actinorhizal associations with woody plants, and in cyanobacterial symbioses with various plants. The most widely known and discussed diazotrophs are those that form symbiotic associations with leguminous plants; these bacteria are collectively referred to as rhizobia. The rhizobia are Gram negative, belong to the large and important Proteobacteria division, and include genera such as Agrobacterium, Allorhizobium, Azorhizobium, Bradyrhizobium, Mesorhizobium, Rhizobium, Sinorhizobium, Devosia, Methylobacterium, and Ochrobactrum (Franche et al. 2009). These soil bacteria are able to invade legume roots in nitrogen-limiting environments, leading to the formation of a highly specialized organ, the root nodule. These specialized root structures offer an ecological niche in which the microbe can fix nitrogen (Mylona et al. 1995). Symbiotic association is not limited to the legumes but extends to a number of nonlegumes. The most significant among these is the actinorhizal plant–Frankia association. 
The genus Frankia consists of filamentous actinomycetes that form symbiotic associations with a number of woody dicot plants, such as Casuarina, Hippophae, Alnus, and Myrica, belonging to different families (Benson and Silvester 1993). Frankia compartmentalizes nitrogenase within vesicle structures, which are surrounded by an envelope with a high content of bacteriohopane lipids and which function to protect the enzyme from oxygen inactivation (Berry et al. 1993; Huss-Danell 1997). Over the years diazotrophy has been reported from other actinomycetes as well, such as Mycobacterium flavum, Corynebacterium autotrophicum, Arthrobacter sp., and Agromyces (Gtari et al. 2012). The findings of several authors (Von Bulow and Dobereiner 1975; Dobereiner 1976; Baldani and Baldani 2005) revealed associations of tropical grasses with nitrogen-fixing bacteria which, under favorable conditions, may contribute significantly to the N economy of these plants. The bacteria belong to the genus Azospirillum and are the most promising microorganisms that colonize roots of economically important grasses and cereals (Leigh 2002).

Cyanobacteria have long been known to fix nitrogen. Both heterocystous (e.g., Anabaena and Nostoc) and nonheterocystous cyanobacteria (e.g., Trichodesmium and Plectonema) are capable of diazotrophy (Schlegel and Zaborosch 2003). They are the only organisms that are capable of both O2-evolving photosynthesis and nitrogen fixation (Klipp 2004). They therefore face the unique problem of balancing two essential, but incompatible, cellular processes: oxygenic photosynthesis and O2-sensitive N2 fixation. In some filamentous cyanobacteria, nitrogen fixation occurs in specialized, terminally differentiated cells called heterocysts that protect the nitrogenase complex from O2 damage by increasing respiration, terminating photosystem II activity, and forming multilayered cellular membranes that reduce oxygen diffusion, thus creating a microaerobic environment (Adams 2000). However, in members such as Lyngbya and Plectonema, where heterocysts are absent, nitrogen fixation occurs in internally organized cells (Schlegel and Zaborosch 2003). Another important aspect of cyanobacteria is their association with higher plants. The Anabaena–Azolla association (Bohlool et al. 1992) and the Nostoc–Gunnera association (Mylona et al. 1995) can fix a substantial amount of nitrogen. Cycads in association with cyanobacterial species can also fix nitrogen (Rai et al. 2002).

2.3 Nitrogenase Complex: Enzymatic Machinery

The biochemical machinery required for biological nitrogen fixation is provided by the nitrogenase enzyme system (Eady and Postgate 1974; Hoffman et al. 2009). Nitrogenase is a two-component protein system that catalyzes the reduction of dinitrogen to ammonia coupled to the hydrolysis of ATP (Rees and Howard 2000). The most extensively studied form of nitrogenase is the molybdenum-containing system that consists of two component metalloproteins, the molybdenum–iron (MoFe) protein and the iron (Fe) protein. The smaller component of nitrogenase is the Fe protein, which acts as a redox-active agent, accepting electrons from the available electron donor in the system and transferring them to the MoFe protein for the reduction of substrates (Rees et al. 2005). It has two identical subunits. The Fe protein contains one iron–sulfur cluster [4Fe-4S], which bridges the two subunits. Each subunit has one MgATP-binding site, so the Fe protein binds two MgATP molecules. Binding of MgATP to the Fe protein induces conformational changes followed by hydrolysis of MgATP, which facilitates electron transfer from the Fe protein to the MoFe protein (Rees et al. 2005). Although this transfer of electrons is the main function of the Fe protein, it has other functions as well. The Fe protein is needed for the initial biosynthesis of the MoFe cofactor, and the subsequent insertion of the preformed MoFe cofactor into the MoFe protein also requires the Fe protein (Burgess and Lowe 1996). The larger component of nitrogenase is the MoFe protein, an α2β2 tetramer made up of two αβ dimers. Each dimer contains one MoFe cofactor and one P-cluster [8Fe-7S]. The MoFe cofactor is located in the active site of the protein, where the reduction of substrates occurs. The main role of the P-cluster is electron transfer: it accepts an electron from the Fe protein and donates it to the MoFe cofactor. 
Each cluster contains eight metals and associated sulfurs that are arranged distinctively. The αβ-dimeric units communicate and contact each other through their subunits (Burgess and Lowe 1996). The P-cluster bridges each α- and β-subunit pair, while the MoFe cofactor is located within the α-subunit. In addition to this molybdenum-containing nitrogenase, alternative nitrogenases also exist that are homologous to this system, but with the molybdenum almost certainly substituted by vanadium or iron (Eady 1996). The vanadium nitrogenase system has two components: an Fe protein, as in the other nitrogenase systems, and a vanadium–iron (VFe) protein that differs from the corresponding component of the other two systems. This type of nitrogenase has been detected in A. vinelandii and A. chroococcum (Robson et al. 1986). The third type of nitrogenase, the iron-only nitrogenase, contains an iron (Fe) protein and a second protein that is very similar to the MoFe and VFe proteins but has only Fe in its cofactor. This type of nitrogenase has also been detected in A. vinelandii (Eady 1996).

Studies by various authors (Thorneley and Lowe 1985; Burgess and Lowe 1996) revealed that the basic mechanism of nitrogenase involves the following: (1) complex formation between the reduced Fe protein with two bound ATP and the MoFe protein, (2) electron transfer between the two proteins coupled to the hydrolysis of ATP, (3) dissociation of the Fe protein accompanied by re-reduction (via ferredoxins or flavodoxins) and exchange of ATP for ADP, and (4) repetition of this cycle until sufficient numbers of electrons and protons have been accumulated so that available substrates can be reduced. In addition to dinitrogen reduction, nitrogenase has been found to catalyze the reduction of protons to dihydrogen, as well as nonphysiological substrates such as acetylene.
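The cycle described above can be summarized by the overall reaction usually quoted for the Mo-dependent enzyme. The equation below is the commonly cited limiting stoichiometry, given here as a summary under optimal conditions and not as a result of this review; it assumes that two MgATP are hydrolyzed per electron transferred and that one H2 is evolved per N2 reduced.

```latex
\mathrm{N_2} + 8\,\mathrm{H^+} + 8\,e^- + 16\,\mathrm{MgATP}
\;\longrightarrow\;
2\,\mathrm{NH_3} + \mathrm{H_2} + 16\,\mathrm{MgADP} + 16\,\mathrm{P_i}
```

Under these assumptions, eight turns of the Fe protein cycle are needed per N2, and the obligatory H2 evolution corresponds to the proton reduction noted above.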

2.4 Genetics and Genomics of Biological Nitrogen Fixation

The biochemical complexity of nitrogen fixation is reflected in the genetic organization and in the regulation of expression of the components required for catalytic activity. Techniques such as mutagenesis, deletion mapping, and the use of cloning vectors have facilitated the identification of genes associated with nitrogen fixation. The organization and regulation of the genes were revealed in the early 1980s. The organism that appears to have the simplest organization of nitrogen fixation-specific (nif) genes, and the one best studied at the molecular genetic level, is the facultative anaerobe Klebsiella pneumoniae. Arnold et al. (1988) reported the first detailed organization of nif genes from this organism. A 24 kb DNA region contains the entire K. pneumoniae nif cluster, which includes 20 genes. The three structural genes nifH, nifD, and nifK encode the three subunits of Mo nitrogenase. In most nitrogen-fixing prokaryotes, these three genes form one transcriptional unit, with a promoter in front of the nifH gene. A number of studies (Dixon et al. 1980; Paul and Merrick 1989; Rubio and Ludden 2005, 2008) have established that the maturation of apo-Fe protein (NifH) requires the products of nifH, nifM, nifU, and nifS, while that of apo-MoFe protein requires at least six genes (nifE, nifN, nifV, nifH, nifQ, and nifB), which are needed for the biosynthesis of FeMoco. There is considerable homology between nifDK and nifEN, and it has been speculated that the nifEN products might form a scaffold for FeMoco biosynthesis that later passes FeMoco to the nifDK complex (Brigle et al. 1987). Imperial and coworkers (1984) established that the nifQ gene product might be involved in the formation of a molybdenum–sulfur precursor to FeMoco. Mutations in nifB result in the formation of an immature MoFe protein that lacks FeMo cofactor. It can be activated in vitro by adding FeMo cofactor that has been isolated from wild-type MoFe protein (Roberts et al. 
1978). Mutations in the nifV gene result in the formation of a nitrogenase with bound citrate rather than homocitrate. The nifV product is homocitrate synthase (Zheng et al. 1997). Thus, on the basis of mutational studies, the functions of various other nif genes have been confirmed. In contrast to Klebsiella, the nif organization is somewhat more complex in Azotobacter vinelandii. In Azotobacter the genes coding for the Mo-dependent nitrogenase components (nifHDK) and their regulatory and assembly systems are located in two discrete regions (O’Carroll and Dos Santos 2011). The organization of nitrogen fixation genes and their genetic regulation in different rhizobia was extensively reviewed by Fischer (1994). According to him, rhizobial nif genes are structurally homologous to the 20 K. pneumoniae nif genes, from which it is inferred that conserved nif genes play similar roles in rhizobia and in K. pneumoniae.

Besides the nif genes, the “fix”- and “nod”-type genes are associated with biological nitrogen fixation and nodule formation in rhizobial species, and many of them have no homologues in free-living diazotrophs such as K. pneumoniae. The fix genes represent a very heterogeneous class that includes genes involved in the development and metabolism of bacteroids. Studies by Anthamatten and Hennecke (1991) and Batut et al. (1991) established that the fixL, fixJ, and fixK genes encode regulatory proteins. The fixABCX genes code for an electron transport chain to nitrogenase (Fischer 1994). Mutations in any one of the fixABCX genes of S. meliloti, B. japonicum, and A. caulinodans completely abolish nitrogen fixation. All four fixGHIS gene products are predicted to be transmembrane proteins, but further biochemical analysis is required to define their function in rhizobial nitrogen fixation (Fischer 1994). The fixNOQP genes encode the membrane-bound cytochrome oxidase that is required for the respiration of the rhizobia in low-oxygen environments (Delgado et al. 1998). Johnston and coworkers discovered nodulation genes on a plasmid of Rhizobium leguminosarum and showed that mutation of those genes rendered them nonfunctional. Later studies (Schultze and Kondorosi 1998; Perret et al. 2000) ascertained that the nod, nol, and noe genes produce nodulation signals. The interplay of different nod genes, the triggering of root nodule formation, the signaling cascades, and the development of the nodule meristem have been reported by a number of researchers (Yang et al. 1999; Long 2001; Geurts and Bisseling 2002). In most species, the nodABC genes are part of a single operon. Inactivation of these genes abolishes the ability to elicit any symbiotic reaction in the plant (Long 1989). Over the years other nod genes such as nodD, nodEF, nodS, nodL, and nodHPQ have been characterized in many rhizobia. Like the rhizobia, Azospirillum carries a megaplasmid and sequences similar to nod genes (Elmerich 1984). 
Frankia, on the other hand, houses a number of nif genes, but researchers have been unable to detect nod genes in it (Ceremonie et al. 1998).

Understanding of the genetic machinery behind biological nitrogen fixation attained new heights with the arrival of complete genome sequences of various diazotrophs. Recent advances in genome sequencing have opened exciting new perspectives in the field of genomics by providing the complete gene inventory of rhizobial microsymbionts. Genomics has enabled thorough analysis of the gene organization of nitrogen-fixing species, the identification of new genes involved in nitrogen fixation, and the identification of new diazotrophic species. The genome of Mesorhizobium loti strain MAFF303099 (Kaneko et al. 2000) was the first to be sequenced from a symbiotic bacterium, and it was followed by that of Sinorhizobium meliloti (Puhler et al. 2004). The completion of the genomes of Rhizobium leguminosarum bv viciae (Young et al. 2006), Rhizobium etli (Gonzalez et al. 2006), Bradyrhizobium strains, and Frankia strains (Normand et al. 2007), together with sequences for a number of free-living diazotrophs spanning different habitats and ecological niches, bolstered nitrogen fixation research. The genome information from all these nitrogen-fixing organisms allows researchers to rapidly apply sequence data to the developing area of functional genomics, which will provide new insights into the complex molecular relationships that support both symbiotic and nonsymbiotic nitrogen fixation. DNA array technologies are now being used to monitor the expression of a whole genome in a single experiment. The first large-scale transcriptional analysis of an entire symbiotic replicon was a high-resolution analysis of the symbiotic plasmid of Rhizobium sp. NGR234 (Perret et al. 1999) at the Université de Genève, where methods were developed to study the regulation of bacterial genes during symbiosis. The transcriptome of S. meliloti has been examined under a variety of conditions, including in planta (Ampe et al. 2003; Berges et al. 2003). 
Functional gene arrays, or GeoChips, are also being utilized for high-throughput analysis of microbial communities involved in nitrogen fixation. Xie et al. (2011) used GeoChip-based analysis to screen for functional genes associated with the N cycle in an extreme environment, acid mine drainage.

3 The Application of Bioinformatics in BNF Research

As we enter the post-genomics era, bioinformatics tools have emerged as an important means of research into biological nitrogen fixation. Large-scale genome projects have made a tremendous amount of biological data available. These data include genome sequences, which in turn provide information about proteins, codon usage, and related features. With the current deluge of data, computational methods have become indispensable to biological investigations. The development of bioinformatics and statistical genetics has produced a number of tools that are used to annotate genomes and extract useful information from them (Hogeweg 2011). Originally developed for the analysis of biological sequences, bioinformatics now encompasses a wide range of subject areas including structural biology, genomics, and gene expression studies.

One of the primary applications of bioinformatics is the organization of biological data in databases that allow researchers to access existing information with ease. Open-access databases like GenBank, EMBL, and DDBJ now house thousands of nifH and nifD sequences. The number of fully sequenced and assembled diazotrophic genomes deposited in the databases has also gone up in the last few years. At the same time, new databases devoted exclusively to various aspects of biological nitrogen fixation, such as NodMutDB (Nodulation Mutant Database) (Mao et al. 2005), RhizoGATE (Becker et al. 2009), and RhizoBase (http://genome.kazusa.or.jp/rhizobase/), have also surfaced in recent years. EST programs conducted in the model legume M. truncatula have led to the development of databases that allow data mining to identify genes relevant for nitrogen-fixing symbioses, for example, the TIGR M. truncatula Gene Index (http://www.tigr.org/tdb/mtgi) (Quackenbush et al. 2000), the M. truncatula database MtDB2 (http://www.medicago.org), and the database of the Medicago Genome Initiative (Bell et al. 2001). The data present in the various databases can be analyzed and interpreted in a biologically meaningful manner with the aid of computational tools.

The rapid increase in the number of prokaryotic species with sequenced genomes now enables the development of in silico search tools to identify complex biochemical pathways such as nitrogen fixation. Such predictions, although often accurate, yield putative results and do not obviate the need for genetic and biochemical confirmation of gene function. Computational prediction tools like BLAST (Basic Local Alignment Search Tool) are being used by researchers to examine the occurrence and distribution of nitrogen fixation genes. The genomes present in the databases are scanned using NifHDK as query sequences (O’Carroll and Dos Santos 2011). Phylogenies for the major nif operon genes have been inferred by distance matrix-based methods such as neighbor-joining and UPGMA, or by maximum likelihood-based methods, in an attempt to understand the timing and complex genetic events that have marked the history of nitrogen fixation (Raymond et al. 2004). Computational tools are also now routinely employed by researchers (Amadou et al. 2008; Carvalho et al. 2010; Peralta et al. 2011; Black et al. 2012) to compare the entire genomes of diazotrophs, which permits the study of more complex evolutionary events, such as gene duplication and horizontal gene transfer, and the prediction of factors important in bacterial speciation. Comparative genomics of Frankia yielded vital information regarding their evolutionary history and linked the variation in genome size with the biogeographic history of the host plants harboring the microbial strains (Normand et al. 2007). Systems biology is another area where computer-based simulation has been used extensively to analyze and visualize the complex connections and circuits of cellular pathways such as nitrogen fixation. Zhao and colleagues (2012) used several in silico tools for the reconstruction of the metabolic network involved in symbiotic nitrogen fixation in S. meliloti 1021. 
This reconstruction provided a knowledge-based framework for better understanding the symbiotic relationship between rhizobia and legumes. The nifH gene is the most widely sequenced marker gene used to identify nitrogen-fixing Bacteria and Archaea, and many PCR primers have been developed to amplify it. Programs such as Primer designer, PrimerSelect, and Primer3 are now available to assist in designing these primers, and candidate primers can be evaluated through e-PCR (Schuler 1997). Recently Gaby and Buckley (2012) made a thorough in silico evaluation of the various nifH primers.
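The in silico evaluation of primers mentioned above can be illustrated with a minimal sketch. The code below is not e-PCR or any published tool; it is a simplified Python illustration that assumes exact IUPAC matching (with an optional mismatch allowance) on a single template and ignores melting temperature, primer dimers, and 3'-end stability. The function names, the toy template, and both primers are hypothetical and are not real nifH sequences.

```python
# IUPAC nucleotide codes mapped to the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement table that also handles degenerate codes (e.g., R <-> Y).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a (possibly degenerate) DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(primer, template, max_mismatches=0):
    """Start positions on the template where the degenerate primer matches."""
    hits = []
    for start in range(len(template) - len(primer) + 1):
        window = template[start:start + len(primer)]
        mismatches = sum(base not in IUPAC[code]
                         for code, base in zip(primer, window))
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


def in_silico_pcr(forward, reverse, template, max_mismatches=0):
    """Predicted amplicons as (start, end, length) on the top strand.

    The reverse primer anneals to the bottom strand, so its reverse
    complement is searched for on the top strand.
    """
    forward_hits = find_sites(forward, template, max_mismatches)
    reverse_hits = find_sites(reverse_complement(reverse), template,
                              max_mismatches)
    amplicons = []
    for f in forward_hits:
        for r in reverse_hits:
            # Keep only reverse sites lying downstream of the forward primer.
            if r >= f + len(forward):
                end = r + len(reverse)
                amplicons.append((f, end, end - f))
    return amplicons


# Hypothetical toy template and primers, for illustration only.
template = "CCTGCGATCCGAAAGCCGACTCTACCGTCTGATCC"
print(in_silico_pcr("TGYGAYCC", "CAGACNGT", template))
```

A real evaluation would run such a check against every nifH sequence in a reference database and tabulate the fraction of sequences matched, which is the kind of coverage estimate a primer survey reports.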

Bioinformatics is also indispensable for the examination of the data obtained in proteome analysis. An excellent resource of Internet-accessible proteome databases is the Expert Protein Analysis System (ExPASy), available online at http://www.expasy.ch/ (Gasteiger et al. 2003). Furthermore, software packages have been developed that can take multiple protein-expression profiles and automatically identify quantitative changes of interest. Two-dimensional electrophoresis databases are accessible on the Internet and can be browsed with interactive software and integrated with in-house results. The Clusters of Orthologous Groups of proteins (COG) database represents an attempt at a phylogenetic classification of proteins from complete genomes (http://www.ncbi.nlm.nih.gov/COG) (Tatusov et al. 2000). It serves as a platform for functional annotation of newly sequenced genomes and for studies on genome evolution. In addition, the identification of domains as subsets of proteins has been a very promising approach, implemented by databases such as InterPro (http://www.ebi.ac.uk/interpro/). Proteomic analysis has directly revealed genome functionality in a number of diazotrophs (MacLean et al. 2007). Smit and coworkers (2012) used various proteomics approaches along with bioinformatics tools for proteomic phenotyping of Novosphingobium nitrogenifigens, a free-living diazotroph. Rapid developments in proteomics technology, coupled with the improvement of in silico tools, have resulted in a deluge of structural information that promises to accelerate nitrogen fixation research.

As we move further into the new millennium, the practical application of computational tools to extract meaningful information from the available data is inevitable. Some of the important in silico tools used for research into various aspects of biological nitrogen fixation are listed in Table 1. Bioinformatics has the potential to take research on nitrogen-fixing bacteria and their protein machinery to the next level. The availability of bioinformatics tools has provided an opportunity to focus on comparative genomics and the molecular evolution of genomes, along with the conformational and structural details of the proteins involved. Structural studies of proteins will provide a better understanding of the functional evolution of diazotrophy.

Table 1 Some bioinformatics tools used for research in biological nitrogen fixation

3.1 Research Trends in Codon Usage Analysis and Comparative Genomics

In the post-genomics era, the application of bioinformatics tools in comparative genomics has led to the belief that every genome has its own story. In particular, the genetic code and its usage preferences are among the most interesting aspects of biological science. Early on, the majority of work on codon usage patterns focused on E. coli (Peden 1999). Gradually, bioinformatics analysis of codon usage was extended to mammalian, bacterial, bacteriophage, viral, and mitochondrial genes. Sharp and Li (1987) pioneered the Codon Adaptation Index (CAI), which measures the similarity between the synonymous codon usage of a gene and that of a reference set. Besides CAI, several indices such as GC content, GC3 content, effective number of codons (Nc) (Wright 1990), relative synonymous codon usage (RSCU) (Sharp et al. 1986), Codon Bias Index (CBI), and Fop (frequency of optimal codons) (Ikemura 1985) are widely used in studies of codon usage patterns. Preliminary work on codon usage of nitrogen-fixing diazotrophs was initiated by Mathur and Tuli (1991). Ramseier and Gottfert (1991) reported differences in codon usage and GC content in Bradyrhizobium genes. Moderate codon bias was attributed to translational selection in nitrogen-fixing genes of Bradyrhizobium japonicum USDA 110 (Sur et al. 2005). The analysis of synonymous codon usage patterns of three Frankia genomes (strains CcI3, ACN14a, and EAN1pec) revealed that codon usage was highly biased, but variations were noticed among the three strains (Sen et al. 2008). Highly expressed genes in Frankia were predicted using the CAI. Synonymous codon usage analysis in Azotobacter vinelandii revealed a considerable amount of heterogeneity (Sur et al. 2008). About 503 potentially highly expressed genes were identified, most of them linked to metabolic functions, of which 10 were associated with the core nitrogen-fixing mechanism. Sen et al. (2012) explored the role of the rare TTA codon in the genome of the diazotrophic actinomycete Frankia.
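As an illustration of one of the indices listed above, RSCU can be sketched in a few lines of Python. This is a minimal sketch, not the code used in any of the cited studies: it assumes the standard genetic code, an in-frame coding sequence, and the usual definition of RSCU as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The function name `rscu` is chosen here for illustration only.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def rscu(cds):
    """Return {codon: RSCU} for all 61 sense codons of an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    # Count sense codons only; stop codons and ambiguous triplets are ignored.
    counts = Counter(c for c in codons if CODON_TABLE.get(c, "*") != "*")

    # Group synonymous codons by the amino acid they encode.
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        for c in family:
            # Observed count divided by the count expected under equal usage.
            values[c] = counts[c] * len(family) / total if total else 0.0
    return values
```

For the toy sequence CTGCTGCTGTTA, which contains three CTG and one TTA codon for leucine, `rscu` gives 4.5 for CTG and 1.5 for TTA. A value well below 1 for TTA across a whole genome is the kind of signal that marks it as a rare codon.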

Besides codon usage, the molecular evolution of genes also needs to be investigated. A reliable index of genetic drift over evolutionary time is the ratio of Ka (nonsynonymous substitutions per site) to Ks (synonymous substitutions per site) for a large set of genes, based on comparisons of related species. The Ka to Ks ratio, which is almost always less than one, is widely used as an indicator of the extent of purifying selection acting to conserve coding sequences. This parameter has been widely applied in the analysis of adaptive molecular evolution and is regarded as a general method of measuring the rate of sequence evolution in biology. Program packages like PAML (Yang 1997) have been extensively used for the estimation of nucleotide substitution rates based on phylogenetic analysis by maximum likelihood (ML). Ka/Ks parameters have been used to assess the molecular evolution of plant hemoglobin genes (Guldner et al. 2004), secretory protein genes in Streptomyces and yeast (Li et al. 2009b), and various disease-causing genes. Among diazotrophs, Crossman et al. (2008) measured the rates of synonymous (Ks) and nonsynonymous (Ka) substitutions in orthologous genes of R. etli and R. leguminosarum. More recently, synonymous and nonsynonymous substitution rates of orthologs shared by five species of Rhizobiales (three plant symbionts, one plant pathogen, and one animal pathogen) have been calculated by Peralta et al. (2011). Apart from the whole genome, the molecular evolution of genes responsible for symbiotic association and nodulation, such as nodule-specific genes (Yi 2009) and, recently, SymRK (Mahe et al. 2011), has been specifically analyzed. However, many symbiotic genes from a wide range of diazotrophs remain to be analyzed before a complete picture of their evolutionary rates in terms of sequence features can be assembled.

The accumulation of bacterial whole-genome sequences also gives biologists more opportunities to explore and compare genomes on a larger scale. Comparative genomics has given rise to a new concept highlighting the great diversity between closely related strains. A species can be described by its pan-genome, i.e., the sum of a core genome containing genes present in all strains and a dispensable genome, with genes absent from one or more strains and genes unique to each strain (Medini et al. 2005). Studying the diversity within pan-genomes is of interest for the characterization of a species or genus. Low pan-genome diversity could reflect a stable environment, while bacterial species with substantial abilities to adapt to various environments would be expected to have high pan-genome diversity (Snipen and Ussery 2010). Tettelin and colleagues introduced the concept of the “pan-genome” in Streptococcus agalactiae (Tettelin et al. 2005). Soon afterward, the pan-genome was widely used to provide insight into the evolution of S. pneumoniae (Hiller et al. 2007), H. influenzae (Hogg et al. 2007), E. coli (Rasko et al. 2008), and so on. Besides evolution, the pan-genome has been widely used to detect strain-specific virulence factors in pathogens such as L. pneumophila (D’Auria et al. 2010). Recently, the symbiotic pan-genome of the nitrogen-fixing bacterium Sinorhizobium meliloti has been explored using computational methods, and a set of accessory genetic factors related to the symbiotic process has been defined (Galardini et al. 2011). As complete nucleotide sequences of more chromosomes and symbiotic plasmids of nitrogen-fixing organisms become available, we have entered the phase of comparative genomics. Comparative genomics also enables a much deeper understanding of the origin and evolution of free-living and symbiotic nitrogen fixation. A comparative genomics approach was used by Carvalho et al. (2010) to delineate the evolutionary characterization of diazotrophic and pathogenic bacteria of the order Rhizobiales. Black et al. (2012) worked on 14 strains of Rhizobiales to investigate the feasibility of defining a core “symbiome.” The authors’ group is currently engaged in comparative genomics of the nitrogen-fixing actinomycete Frankia and members of Rhizobiales using CMG-Biotools, a platform for comparative genomics. The proteomes are compared with BLASTP using the “50/50” rule, i.e., a BLASTP hit is considered significant if the alignment produces at least 50 % identity over at least 50 % of the length of the longest gene (either query or subject). The BLAST results are visualized in a BLAST matrix, which summarizes the results of genomic pairwise comparisons. One such BLAST matrix produced for five Frankia strains is presented in Fig. 1. The comparison of these whole genomes has revealed valuable information, such as several events of lateral gene transfer, particularly in the symbiotic plasmids and genomic islands, that has contributed to a better understanding of the evolution of contrasting symbioses.
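The “50/50” rule described above can be expressed as a simple filter on tabular BLASTP results. The following is a minimal sketch under stated assumptions, not the CMG-Biotools implementation: the function names and the dictionary layout are hypothetical, and the keys `pident`, `length`, `qlen`, and `slen` are borrowed from the column names of BLAST+ tabular output for convenience.

```python
def passes_50_50(percent_identity, alignment_length, query_length, subject_length,
                 min_identity=50.0, min_coverage=0.5):
    """Return True if a hit satisfies the 50/50 rule.

    The alignment must show at least 50 % identity and span at least 50 %
    of the longer of the two sequences (query or subject).
    """
    longest = max(query_length, subject_length)
    if longest <= 0:
        return False
    coverage = alignment_length / longest
    return percent_identity >= min_identity and coverage >= min_coverage


def significant_hits(hits):
    """Keep only the hits (dicts with pident, length, qlen, slen) that pass the rule."""
    return [h for h in hits
            if passes_50_50(h["pident"], h["length"], h["qlen"], h["slen"])]
```

For example, a hit with 62 % identity over 210 aligned residues between proteins of 300 and 400 residues passes (coverage 210/400 = 0.525), whereas the same identity over only 150 residues does not (coverage 0.375). Measuring coverage against the longer sequence prevents a short protein that matches one domain of a long protein from being counted as a shared gene.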

Fig. 1 BLAST matrix for the Frankia genomes created by the CMG-Biotools platform for comparative genomics. The darkest green indicates the highest fraction of genes found similar between two genomes

3.2 Bioinformatics Approaches for the Characterization of Proteins Related to BNF

Apart from sequence-based analysis and comparative genomics, structural biology is a field that has benefited hugely from bioinformatics tools. Structural analyses include protein and nucleic acid structure prediction, comparison, classification, and assessment of structure–function relationships. Structural analysis often depends in turn on the results of sequence analysis. For example, protein structure prediction depends on sequence alignment data. Thus the two aspects of bioinformatics analysis are not isolated but often interact to produce integrated results.

Developments in the field of proteomics have made a large amount of biological data available in the public domain. These data include amino acid sequences of nitrogenase proteins from a wide range of microbes. However, very little is known about the structure and role of all these proteins. X-ray crystallography and NMR spectroscopy are by far the two most common means of determining protein structure experimentally. Kim and Rees (1992) provided a detailed crystallographic structure of the molybdenum–iron protein of the Azotobacter vinelandii nitrogenase. The crystal structure of the nitrogenase molybdenum–iron protein has also been described from Clostridium pasteurianum (Kim et al. 1993). The X-ray crystal structure of Klebsiella pneumoniae nitrogenase component 1 (Kp1) has also been determined and refined to a resolution of 1.6 Å (Mayer et al. 1999). The 2.9 Å crystal structure of the NifH protein from Azotobacter vinelandii was obtained by Georgiadis et al. (1992). However, the tertiary structures of a large number of nitrogenase proteins from different diazotrophs, particularly symbiotic ones, have not yet been resolved. The exact mechanism of these proteins is also relatively unknown because of the difficulty of obtaining crystals of nitrogen bound to nitrogenase: the resting state of the MoFe protein does not bind nitrogen. Moreover, in recent years, a number of discrepancies have emerged in protein structures resolved by X-ray crystallography, leading to the retraction of papers (Chang et al. 2006). In this regard, a viable alternative approach is to predict the 3D structure of proteins by homology modeling and validate it properly. Homology modeling is a reliable technique that can consistently predict the 3D structure of a protein with precision akin to that obtained at low resolution by experimental means (Marti-Renom et al. 2000). This technique depends on the alignment of a protein sequence of unknown structure (target) with that of a homologue of known structure (template). It is particularly important for organisms with slow growth rates, which pose difficulties for the subsequent purification of proteins. Browne et al. (1969) published the first report on homology modeling: a model of α-lactalbumin was constructed by taking the coordinates of hen egg-white lysozyme and modifying, by hand, those amino acids that did not match the structure. Since the mid-1980s, a large number of homology models of proteins with different folds and functions have been reported in the literature (Johnson et al. 1994; Sali 1995). Homology modeling approaches were first applied for structural analysis of the nitrogenase iron protein from Trichodesmium sp., a marine filamentous nitrogen-fixing cyanobacterium (Zehr et al. 1997a). Standard homology modeling approaches have also been used to generate reliable models of the nitrogenase Fe protein from thermophilic Methanobacter thermoautotrophicus based on the structure of the Azotobacter vinelandii nitrogenase Fe protein (Sen and Peters 2006). The authors’ group has been involved in determining the 3D structure of the NifH protein from various diazotrophs such as Frankia (Sen et al. 2010) and Bradyrhizobium ORS 278 (Thakur et al. 2012) using homology modeling. The model of NifH of Frankia (Fig. 2) was based on a template nitrogenase iron protein from Azotobacter vinelandii. The structure is reliable and offers insights into the 3D structural framework as well as the structure–function relationship of the NifH protein. Homology-based models are quite useful in revealing the conformational properties and structure–function relationships of these proteins.

Fig. 2 Three-dimensional model of NifH protein from Frankia sp. CcI3 created by homology modeling technique (Sen et al. 2010)

A number of aspects of nitrogenase, particularly structure–function relationships, are interesting areas of fundamental research. The three-dimensional structure of a protein such as nitrogenase is often considered an ideal model system for the study of complex metal cluster-mediated catalysis, electron transfer, complex metal cluster assembly, protein–protein interactions, and nucleotide-dependent signal transduction. Molecular dynamics (MD) simulations offer details about molecular motions as a function of time and are widely used to study protein motions at the atomic level. The first protein simulation, lasting 9.2 ps, was carried out for bovine pancreatic trypsin inhibitor (BPTI) (McCammon et al. 1977). Case and Karplus’s 1979 work on the dynamics of ligand binding to a heme protein is arguably the first simulation of a ligand moving through a protein (Case and Karplus 1979). The first application of normal modes to identify low-frequency oscillations, using energy minimization of the molecular mechanics force field of a protein, was described by Brooks and Karplus (1983). This is the basic technique for identifying domain-level motions in a protein. The first simulation of a protein in explicit water was performed by Levitt and Sharon (1988).

Metalloproteins such as nitrogenase are a vast class of biological molecules that are responsible for many vital functions. Despite the intrinsic difficulties of these systems, particularly those related to parameterization of the metal cofactors, they have been the object of several MD simulations. These studies focus mainly on structural aspects, since the cluster either has a storage role or is involved in an electron-transfer process in these proteins. Among metalloproteins, including those with FeS cluster cofactors, molecular dynamics simulations have been carried out for proteins such as heme-containing cytochrome P450 (Kuhn et al. 2001), rubredoxins (Grottesi et al. 2002), the 3Fe–4S cluster-containing protein ferredoxin I (Meuwly and Karplus 2004), adenosine phosphosulfate reductase (dos Santos et al. 2009), and hydA1 hydrogenase (Sundaram et al. 2010). More recently, molecular modeling, dynamics, and docking studies on both A. vinelandii and G. diazotrophicus FeSII proteins and nitrogenases were carried out by Lery et al. (2010), elucidating molecular aspects of protein–protein interaction. In the MD simulation of metalloproteins, the force field parameters of the metal ion and its ligands need to be defined beforehand, taking into account the nature of the metal ion; its coordination number, geometry, oxidation state, and spin state; and the nature of its ligands. Several sets of parameters have been reported in the literature for the active sites of the most widely studied metalloproteins, including the coordination geometries of the metal ligand (Banci and Comba 1997; Norrby and Brandt 2001; Comba and Remenyi 2002). One set of parameters that significantly affects the overall protein structure is the partial charges of the atoms of the metal–ligand moiety. In the bonded model, partial charges are commonly calculated through the RESP (restrained electrostatic potential) methodology (Fox and Kollman 1998) applied to semiempirical or ab initio calculations. The ab initio calculations are mostly performed with density functional theory (DFT), using the B3LYP functional, or with Hartree–Fock calculations (Banci 2003). Thus, the development of proper parameters for the metal cofactors requires combining quantum calculations with classical molecular mechanics calculations. This will enable the description not only of structural features but also of the reactivity of the metalloproteins.

3.3 Tracing the Evolution of BNF Through Bioinformatics

3.3.1 Classical Approach

Researchers have long sought to answer the question of when nitrogen fixation began and what evolutionary pressures affected it (Postgate and Eady 1988; Berman-Frank et al. 2003). The emergence and evolution of nitrogen fixation ability (diazotrophy) among prokaryotes is complex and has not yet been fully elucidated. The incomplete distribution pattern of the highly conserved nitrogenase enzyme among Bacteria and Archaea has led to the development of conflicting hypotheses on the origin of BNF. The first hypothesis holds that nitrogen fixation is an ancient function of the last common ancestor of Bacteria and Archaea that was vertically transmitted but has undergone widespread gene loss among descendants, with horizontal transfer in some isolated instances (Hennecke et al. 1985; Normand and Bousquet 1989; Fani et al. 2000; Berman-Frank et al. 2003). During this postulated time period, reduced nitrogen may have been very abundant, and the initial function of nitrogenase was probably very different. One proposed initial function of ancient nitrogenase is the detoxification of cyanides and other chemicals (Silver and Postgate 1973; Fani et al. 2000). This idea is based on the observation that nitrogenase reduces a number of alternative substrates in addition to N2, several of which are toxins (e.g., cyanides). The second hypothesis proposes that nitrogen fixation was an anaerobic ability that appeared after the emergence of oxygenic photosynthesis and subsequently spread among lineages through horizontal transfer (Postgate 1982; Postgate and Eady 1988). Recently, Hartmann and Barnum (2010) examined Mo-nitrogenase phylogeny and reached a conclusion combining both theories of diazotrophic evolution.

Nitrogenase genes are highly conserved at both the chemical and genetic levels across wide phylogenetic ranges and among closely related organisms. This conservation lends the nitrogenase genes to use as genetic markers for phylogenetic analysis to help answer questions about the evolution of nitrogen fixation and its genes. Raymond et al. (2004) reported that nitrogenase evolved in multiple lineages, and there is evidence of loss, duplication, and horizontal and vertical transfer of the nitrogenase genes and operons during the course of evolution. nifD and nifK are thought to be the result of an in-tandem gene duplication (Fani et al. 2000; Postgate and Eady 1988), giving the functional components of the enzyme. A second duplication event is thought to have occurred for the nifEN genes. To date, most studies concerning the evolution of nitrogen fixation have focused on the nif genes, primarily the highly conserved nifH gene but also the larger but less conserved nifD, nifK, nifE, and nifN genes (Normand and Bousquet 1989; Normand et al. 1992; Hirsch et al. 1995; Fani et al. 2000). Sequence alignment-based methods are widely used to study the evolution of relevant nif genes. Young (2005) discussed the phylogeny and evolution of nitrogenases in detail. According to Young, true NifH proteins can be divided into three types: Type B (“bacterial”) is the best represented and includes enzymes from the proteobacteria, cyanobacteria, and firmicutes; Type C (“clostridial”) is found in the firmicute bacterium Clostridium, the green sulfur bacterium Chlorobium, and also in the archaeon Methanosarcina; and Type A is associated with the “alternative” nitrogenases that do not contain molybdenum and is found in both archaea and proteobacteria. There are also a large number of more distant relatives, notable among them the light-independent protochlorophyllide (Pchlide) reductase. The similarity between these proteins and NifH was analyzed and discussed by Burke et al. (1993), who argued that nitrogen fixation probably originated before photosynthesis, so the photosynthesis enzymes would have been derived from NifH rather than the other way round. The phylogenies of the NifDKEN family have also been the topic of many studies. Dedysh et al. (2004) utilized the NifD phylogeny to assess the nitrogen fixation capabilities of methanotrophic bacteria. Henson et al. (2004b) reexamined the phylogeny of nitrogen fixation by analyzing only the molybdenum-containing nifD gene from cyanobacteria, proteobacteria, and Gram-positive bacteria. The strict requirement for NifH in biological nitrogen fixation and its universal presence in diazotrophs have resulted in this protein serving as a sequence tag or barcode for the identification of nitrogen fixers. Genomic analysis using the sequence of NifH as a query results in BLAST hits that include the NifH, VnfH, and AnfH components of the Mo-, V-, and Fe-only nitrogenases, respectively (Raymond et al. 2004). Recently, Dos Santos and colleagues proposed a new criterion for computational prediction of nitrogen fixation: the presence of a minimum set of six genes coding for structural and biosynthetic components, namely, NifHDK and NifENB (Dos Santos et al. 2012). Latysheva et al. (2012) used the various nif orthologs in empirical Bayesian ancestral state reconstructions to investigate the evolution of nitrogen fixation in cyanobacteria.

Over the years, researchers have debated whether horizontal gene transfer (HGT) or vertical descent is the dominant force in the evolution and distribution of N fixation. In the case of an early origin and subsequent vertical descent of the nif genes, a comparison of SSU ribosomal phylogeny and the phylogeny of nif genes should reveal roughly the same features, assuming that the mutation rates in both genes were similar. In the case of a late development and a mainly horizontal distribution of the genes, the phylogeny of the nif genes should deviate significantly from the rRNA-based standard tree. A number of researchers have presented strong evidence that SSU rRNA phylogeny and phylogeny based on the nif genes are in general agreement, suggesting that they have evolved in a similar fashion (Hennecke et al. 1985; Young 1992; Zehr et al. 1997b). However, numerous studies have highlighted instances of possible horizontal gene transfer in nifD (Parker et al. 2002; Qian et al. 2003; Henson et al. 2004a, b), nifH (Normand and Bousquet 1989; Hurek et al. 1997; Cantera et al. 2004; Dedysh et al. 2004), and nifK (Kessler et al. 1997) based on incongruence with 16S rRNA trees. Other studies have found support for both vertical descent and horizontal transfer (Hirsch et al. 1995). Haukka et al. (1998) proposed that horizontal gene transfer may have played an increasing role at genus and lower taxonomic levels. This may be especially important in organisms that have nif genes located on plasmids (Normand and Bousquet 1989).
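The comparison of a nif gene tree with an rRNA tree can be made concrete by counting the clades on which the two trees disagree. The sketch below uses a rooted form of the Robinson–Foulds distance; this measure is chosen here for illustration and is not named in the studies cited above. Trees are written as nested tuples with strings as leaves, and the taxon labels A to E are hypothetical.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted tree as a set of frozensets of leaf names.

    A tree is a nested tuple; a leaf is a string.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree on these taxa
    return found


def clade_distance(tree_a, tree_b):
    """Number of clades present in one tree but not the other (rooted Robinson-Foulds)."""
    return len(clades(tree_a) ^ clades(tree_b))


# Hypothetical example: the nif tree groups A with D, unlike the rRNA tree.
rrna_tree = ((("A", "B"), "C"), ("D", "E"))
nif_tree = ((("A", "D"), "C"), ("B", "E"))
```

A distance of zero means the two topologies are identical, as expected under purely vertical descent. A non-zero distance indicates incongruence, which is consistent with horizontal transfer but can also arise from unequal substitution rates or poorly resolved branches, so it should be read as a prompt for closer inspection rather than as proof.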

3.3.2 Alternative Approaches

To trace the evolution of proteins within a set of divergently evolved proteins, it is useful to construct phylogenetic trees based on similarities in the amino acid sequences and the base sequences of the genes. However, previous studies suggest that the origin and extant distribution of nitrogen fixation are perplexing from a phylogenetic perspective, largely because of factors that confound molecular phylogeny, such as sequence divergence, paralogy, and horizontal gene transfer (Raymond et al. 2004). This suggests that sequence-based phylogeny alone is not enough to reveal the complex evolutionary path of BNF. Moreover, many workers (Nadler 1995; Qi et al. 2004; Sims et al. 2009) have pointed out weaknesses in sequence alignment-based methods. Alternative phylogenetic approaches are therefore being sought. The alignment-free condensed matrix method, which relies on nucleotide triplets, is one such approach. The condensed matrix method of studying molecular phylogeny takes into account a set of invariants in a DNA sequence and uses them to determine the extent of resemblance among DNA sequences (Randic et al. 2001). In this method, all possible triplets of the nif genes were counted, and matrices were formed from them. The leading eigenvalues of these matrices were then calculated and used to construct distance matrices and, from those, trees. This approach has been used in the phylogenetic analysis of aminoacyl-tRNA synthetase (Mondal et al. 2008), swine flu genomes (Sur et al. 2010), bacterial zeta toxin (Mondal et al. 2011), and nitrogenase proteins (Sur et al. 2010). A phylogenetic tree showing the evolution of the nifH gene in various diazotrophs, constructed by the condensed matrix method, is presented in Fig. 3. In the tree, the placement of Frankia ACN14a away from the other actinobacteria and the isolation of Synechococcus sp. JA-3-3Ab from the rest of the cyanobacterial strains are noteworthy. Members of various classes of Proteobacteria (alpha, beta, gamma, and delta) are clustered together in the triplet-based phylogenetic tree. The scattered distribution of cyanobacteria is an indication of their polyphyletic origin. Thus, condensed matrix method-based phylogeny appears to be a suitable method for explaining the complex events marking the evolution of nitrogen fixation.
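The steps of the triplet-based approach (count triplets, form matrices, take leading eigenvalues, compute distances) can be sketched as follows. This is an illustrative arrangement only and should not be read as the exact construction of Randic et al. (2001) or of the authors’ group: it assumes overlapping triplets, relative frequencies, one 4 × 4 matrix per first base of the triplet, and a Euclidean distance between the resulting eigenvalue vectors. All function names are hypothetical.

```python
from math import sqrt

BASES = "ACGT"
INDEX = {b: i for i, b in enumerate(BASES)}


def triplet_slices(seq):
    """Return four 4x4 matrices; slice x holds the relative frequency of triplets x-y-z."""
    seq = seq.upper()
    slices = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
    triplets = [seq[i:i + 3] for i in range(len(seq) - 2)]
    triplets = [t for t in triplets if all(b in INDEX for b in t)]  # skip ambiguous bases
    for x, y, z in triplets:
        slices[INDEX[x]][INDEX[y]][INDEX[z]] += 1.0 / len(triplets)
    return slices


def leading_eigenvalue(matrix, iterations=500, shift=1.0):
    """Estimate the largest eigenvalue of a non-negative square matrix.

    Power iteration is run on matrix + shift * I so that the largest
    eigenvalue is strictly dominant; the shift is not included in the result.
    """
    n = len(matrix)
    v = [1.0 / sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(v, mv))  # Rayleigh quotient; v has unit length


def descriptor(seq):
    """Four leading eigenvalues, one per first base, characterizing the sequence."""
    return [leading_eigenvalue(m) for m in triplet_slices(seq)]


def distance(seq_a, seq_b):
    """Euclidean distance between the eigenvalue descriptors of two sequences."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(descriptor(seq_a), descriptor(seq_b))))
```

Computing `distance` for every pair of nifH sequences yields a distance matrix that can be passed to any standard distance-based tree-building method. Because no alignment is required, sequences of different lengths can be compared directly, although the descriptor compresses each gene into only a few numbers.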

Fig. 3 Phylogenetic tree of nifH gene based on condensed matrix method developed by the authors’ group. Colored fonts are used to indicate different classes of diazotrophs. Purple is used for proteobacterial strains; black is used for cyanobacterial strains; blue for green sulfur; orange for Actinobacteria; green for Firmicutes; red for methanogenic; and gray for Aquificae

Another suitable alternative to protein sequence alignment is structure-based phylogeny. It is well known that the 3D structures and structural features of homologous proteins are conserved better than their amino acid sequences (Chothia et al. 1986; Hubbard and Blundell 1987). It has been demonstrated several times that homologous proteins can diverge beyond recognition at the level of their amino acid sequences yet maintain similar structure and function. In several cases of low sequence similarity, proteins retain the fold as well as broad biochemical features and/or functional properties, suggesting an evolutionary connection (Murzin et al. 1995; Russell and Sternberg 1996). Previous studies (Balaji and Srinivasan 2001) have shown that in cases of poor sequence identity, structure-based phylogenies generate better models of protein evolution than traditional sequence-based methods. Hence, it is more appropriate to use similarities in 3D structure when modeling the evolution of distantly related proteins. The construction of phylogenetic trees using 3D structures has been applied to a variety of protein families, such as short-chain alcohol dehydrogenases (Breitling et al. 2001) and metallo-β-lactamases (Garau et al. 2005). More recently, a 3D structure-based phylogenetic approach has been used for the functional characterization of proteins with cupin folds (Agarwal et al. 2009). It was revealed that structure-based clustering of members of the cupin superfamily reflects a function-based clustering. Moreover, the comparison of distance matrices used in phylogenetic tree construction methods has been considered equivalent to the comparison of phylogenetic trees based on protein structures (Balaji and Srinivasan 2001; Pazos and Valencia 2001). Therefore, such structure-based approaches can be used to assess the phylogenetic relationships of proteins involved in BNF, which share low sequence similarity but high structural resemblance with many proteins of diverse biological function.

Along with the trajectory of the evolution of diazotrophy in various organisms, another feature that needs attention is the functional divergence of the proteins involved in this biological process. Previous workers (Gu 1999; Dermitzakis and Clark 2001; Raes and Van de Peer 2003) have shown that gene duplication events often lead to a shift in protein function away from an ancestral role. The resulting divergence subjects some residues to altered functional constraints, which implies that evolutionary rates at these sites will vary among the homologous genes of a gene family. Site-specific altered functional constraints (or shifted evolutionary rates) can be detected by comparing the rate correlation between gene clusters when the phylogeny is given (Gu 1999). This approach has been used to trace functional divergence in vertebrate hemoglobin (Gribaldo et al. 2003), G-protein alpha subunits (Zheng et al. 2007), the OPR gene family in plants (Li et al. 2009a), and the anoctamin family of membrane proteins (Milenkovic et al. 2010). However, a broad picture of functional divergence in the NifH/BchL protein family is still unavailable.

4 Challenges and Future Prospects

Considerable progress has been made in understanding the machinery of biological nitrogen fixation in the last few decades. Most of the research has focused on the structure of nitrogenase and on elucidating the compositions and functions of all of the nif-gene products. In the past, the major roadblocks in BNF research were the difficulty of detecting nif genes in environmental samples and of subsequently crystallizing the nitrogenase enzymes. In the post-genomics era, these hurdles have largely been removed by the advent of metagenomic research and in silico protein modeling techniques. The challenge now is to put all the known information together and, with the combined application of biochemical, genetic, and bioinformatics techniques, to determine how nitrogenase functions at the molecular level. With the rapid increase in the number of complete genomes of varied diazotrophs, along with their nitrogen-fixing genes, in the public domain, bioinformatics tools have emerged as a potent weapon for tackling the unsolved mysteries of symbiotic and asymbiotic nitrogen fixation. They can be used to extract meaningful interpretations from sequence data. With the advent of new algorithms and computational tools for measuring structural divergence, the problems associated with the functional evolution of the nitrogenase system can also be tackled more effectively, and new insights can be gained. Genomic studies aided by bioinformatics tools offer a global view of the expression, regulation, dynamics, and evolution of the genomes of nitrogen-fixing microbes and can offer new opportunities to preserve and improve biotic resources.