Introduction

Membrane-localized NADPH oxidases, known in plants as respiratory burst oxidase homologs (Rbohs), are key players in the generation of superoxide radicals (O2•−), a type of reactive oxygen species (ROS). Their roles in the defense response and in vital morphogenetic processes such as cell expansion, pollen tube growth, seed germination and after-ripening, root hair elongation and Ca2+-dependent stomatal closure have been documented (Kaur et al. 2014). The first plant NADPH oxidase, OsRbohA, was identified in Oryza sativa (Groom et al. 1996), and more Rbohs were subsequently identified in other species. The Rboh protein family is diverse and comprises 127 members from 26 plant species, with isoforms designated A to J.

Rboh genes encode homologs of gp91phox, the catalytic subunit of the mammalian NADPH oxidase (Nox). The domain architecture comprises an extended N-terminal region containing two Ca2+-binding EF-hand motifs, six α-helical transmembrane domains (TMD-I to TMD-VI) connected by five loops (loops A–E), and a C-terminal FNR (ferredoxin-NADP+ reductase) domain containing FAD (flavin adenine dinucleotide)- and NADPH-binding moieties (Kaur et al. 2014). Experimental evidence suggests that the N-terminal region plays a key role in the regulation of Rbohs via Ca2+, Rac GTPase, protein kinases (CDPK, calcium-dependent protein kinase; CCaMK, Ca2+/CaM-dependent protein kinase; MAPK, mitogen-activated protein kinase; OST1, open stomata 1; the receptor-like cytoplasmic kinase BIK1; and histidine kinase), GIRAFFE heme oxygenase, extracellular ATP, phospholipase Dα1 and phosphatidic acid (Kaur et al. 2014; Baxter et al. 2014). The N-terminal region of O. sativa RbohB (OsRbohB138−313) forms a homodimer in which each monomer contains two EF-hands and two EF-hand-like motifs with Ca2+-binding sites (Oda et al. 2010). Ca2+ binding to the EF-hands and Ca2+-dependent phosphorylation act together to control ROS production in OsRbohB (Wong et al. 2007) and in Arabidopsis thaliana AtRbohC and AtRbohD (Kobayashi et al. 2007; Takeda et al. 2008). However, a Ca2+-independent intramolecular interaction between the N and C termini, which requires the N-terminal region upstream of the EF-hands, was also observed in OsRbohB (Oda et al. 2010). The lack of comprehensive phylogenetic information, three-dimensional structures, folding mechanisms and physiological functions for many Rbohs leaves a gap in our understanding of Rboh function. To bridge this gap, various bioinformatics approaches were employed (Lee et al. 2007; Kaur et al. 2015) to analyze the Rboh proteins from Arabidopsis and rice. Several gene duplications were identified, some of them dicot-specific and others monocot-specific.
Further, sequence and disorder analyses indicated that the N-terminal region is highly variable compared with the transmembrane and C-terminal domains. Three-dimensional models of the N-terminal domain were generated for Rbohs from A. thaliana and O. sativa, and amino acid residues likely to affect function were identified. The role of Ca2+ in the folding of Rboh proteins was also studied using discrete molecular dynamics. The present study provides a rationale for evaluating the effects of specific amino acid residues on the structure and function of plant NADPH oxidases in crop improvement programmes, for instance to develop crops with better stress tolerance.

Materials and methods

Phylogenetic analysis

A total of 127 Rboh protein sequences from 26 plant species were retrieved (Kaur et al. 2014). Multiple sequence alignments were generated using Clustal Omega (Sievers et al. 2011). Neighbor-joining and maximum likelihood trees were generated using ClustalX (Thompson et al. 1997) and PhyML 3.0 (Dereeper et al. 2008), respectively. Tree topologies were drawn with MEGA6 (Tamura et al. 2013), and evolutionary trace analysis (ETA) was carried out using the Evolutionary Trace server (Innis et al. 2000).
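The neighbor-joining step performed by ClustalX can be illustrated with a minimal Python sketch. It assumes that pairwise distances have already been computed from the alignment (distance correction, bootstrapping and tree drawing are omitted), and the five-taxon matrix in the example is a hypothetical toy input, not Rboh data. The function name `neighbor_joining` is illustrative and not part of ClustalX.

```python
from itertools import combinations


def neighbor_joining(names, matrix):
    """Build an unrooted neighbor-joining tree.

    names  -- taxon labels
    matrix -- symmetric distance matrix (list of lists, zero diagonal)
    Returns a list of (node, node, branch_length) edges.
    """
    # Distances stored as a dict of dicts so new internal nodes can be added.
    d = {a: {b: matrix[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums r_i over the currently active nodes.
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Join the pair minimizing the Q-criterion (n - 2) d(i, j) - r_i - r_j.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        # Branch lengths from the new internal node u to i and j.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        counter += 1
        u = f"node{counter}"
        edges += [(u, i, li), (u, j, lj)]
        # Distances from u to every remaining node.
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
                d[k][u] = d[u][k]
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Connect the last two nodes.
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges


# Hypothetical five-taxon distance matrix (textbook-style toy example).
taxa = ["a", "b", "c", "d", "e"]
toy = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
for edge in neighbor_joining(taxa, toy):
    print(edge)
```

For this toy matrix, taxa a and b have the smallest Q value and are joined first, with branch lengths 2 and 3; the resulting tree has 2n − 3 = 7 edges.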

Ancestral sequence reconstruction

The sequence alignment was divided into five monophyletic groups, each containing monocot and dicot sequences: (1) AtRbohD, AtRbohA, AtRbohG, AtRbohC, OsRbohI; (2) AtRbohB, OsRbohB, OsRbohH; (3) AtRbohF, OsRbohC, OsRbohA; (4) AtRbohE, OsRbohG, OsRbohF; and (5) AtRbohH, AtRbohJ, OsRbohD, OsRbohE. The four lower plant sequences CmRboh1, CmRboh2, PyRboh and CcNoxD were used as outgroups for ancestral reconstruction. The alignment of each monophyletic group together with the outgroup sequences was submitted to the FastML server (http://fastml.tau.ac.il/), and ancestral sequences were inferred under the JTT substitution model with gamma-distributed rate variation among sites.
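FastML infers ancestral states by maximum likelihood under the JTT + Gamma model, which is beyond a short example. As a simpler and clearly different stand-in, the sketch below uses Fitch parsimony to show the basic idea of assigning candidate ancestral residues to the root of a rooted binary tree, column by column. The tree shape, sequence names and residues are hypothetical placeholders, and the helper names are illustrative; ambiguous columns are resolved alphabetically purely for determinism.

```python
def fitch(tree, states):
    """Bottom-up Fitch pass for one alignment column.

    tree   -- a leaf name (str) or a (left, right) tuple of subtrees
    states -- dict mapping leaf name -> residue in this column
    Returns (candidate residues at this node, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # Children share at least one residue: keep the intersection.
        return common, left_cost + right_cost
    # Otherwise take the union and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1


def root_sequence(tree, alignment):
    """Parsimony reconstruction of the root sequence, column by column."""
    length = len(next(iter(alignment.values())))
    residues = []
    for col in range(length):
        column = {name: seq[col] for name, seq in alignment.items()}
        candidates, _ = fitch(tree, column)
        residues.append(min(candidates))  # alphabetical tie-break (arbitrary)
    return "".join(residues)


# Hypothetical rooted tree and three-column alignment.
tree = (("seq1", "seq2"), ("seq3", "seq4"))
alignment = {"seq1": "RDN", "seq2": "RDN", "seq3": "KDD", "seq4": "RED"}
print(root_sequence(tree, alignment))  # RDD
```

The third column shows why parsimony alone can leave the ancestral state ambiguous (D or N); likelihood methods such as FastML resolve such cases using branch lengths and a substitution model.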

Multiple sequence alignment, identification of orthologs and structure analysis

Protein sequences of A. thaliana (AtRbohA-J) and O. sativa (OsRbohA-I) were aligned using Clustal Omega (Sievers et al. 2011). Secondary structures were predicted using PSIPRED (Jones 1999) through the Ali2D (Alva et al. 2016) server (http://toolkit.tuebingen.mpg.de/ali2d) and JPred3 (Cole et al. 2008) (http://www.compbio.dundee.ac.uk/jpred3/refs.html). Orthologous sequences were identified using InParanoid8 (Sonnhammer and Östlund 2015) (http://inparanoid.sbc.su.se/cgi-bin/faq.cgi), OMA Browser (Schneider et al. 2007) and OrthologID (Chiu et al. 2006).

Templates were searched with PSI-BLAST (Altschul et al. 1997) and HHPred (Soding et al. 2005), and homology models of the N-terminal EF-hand and C-terminal NADPH-binding regions were built with Modeller9v8 (Sali and Blundell 1993). Ab initio modeling of the N-terminal region upstream of the EF-hands was carried out using I-TASSER (Roy et al. 2010) and Phyre (Kelley and Sternberg 2009), and the I-TASSER models were used for subsequent studies. Models were optimized with Chiron (Ramachandran et al. 2011), Gaia (Kota et al. 2011) and KoBaMIN (Rodrigues et al. 2012). The stereochemical quality of the models was assessed with the Structural Analysis and Verification Server (SAVES; http://nihserver.mbi.ucla.edu/SAVES/) and the QMEAN server (Benkert et al. 2009) (http://swissmodel.expasy.org/qmean/cgi/index.cgi).

Simulations were conducted using replica exchange discrete molecular dynamics (RX-DMD) with the πDMD software package (http://www.moleculesinaction.com/pdmd.html) (Ding and Dokholyan 2006; Ding et al. 2008; Dokholyan et al. 1998; Proctor et al. 2011; Shirvanyants et al. 2012). RX-DMD simulations of the three OsRbohB forms were performed with 18 replicas at temperatures (in DMD reduced units) of 0.480, 0.520, 0.540, 0.560, 0.580, 0.600, 0.620, 0.640, 0.650, 0.660, 0.670, 0.680, 0.690, 0.700, 0.710, 0.720, 0.740 and 0.760. The weighted histogram analysis method (WHAM) was applied to the replica exchange trajectories with the rexwham program to analyze the thermodynamics of protein folding (Kumar et al. 1992). Electrostatic surface potentials were computed by Poisson–Boltzmann calculations, using PDB2PQR to prepare APBS input (Dolinsky et al. 2004). Docking simulations were performed with ClusPro (Comeau et al. 2004) and ZDOCK (Chen et al. 2003). Structures were visualized and figures generated using PyMOL (DeLano 2002).
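The text does not describe πDMD's replica exchange internals. The sketch below only illustrates the standard Metropolis criterion commonly used to exchange configurations between neighbouring temperatures, min(1, exp[(β_i − β_j)(E_i − E_j)]) with β = 1/(k_B T). It assumes reduced units with k_B = 1 and reuses the 18 temperatures listed above; the energies are hypothetical, and `swap_probability` and `attempt_swaps` are illustrative names, not πDMD functions.

```python
import math
import random

# Replica temperatures listed in the text (assumed DMD reduced units, k_B = 1).
TEMPERATURES = [0.480, 0.520, 0.540, 0.560, 0.580, 0.600, 0.620, 0.640, 0.650,
                0.660, 0.670, 0.680, 0.690, 0.700, 0.710, 0.720, 0.740, 0.760]


def swap_probability(beta_i, beta_j, energy_i, energy_j):
    """Metropolis probability of exchanging the configurations held at
    inverse temperatures beta_i and beta_j."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_swaps(temperatures, energies, rng, offset=0):
    """One sweep of exchange attempts between neighbouring temperatures.

    energies[c] is the potential energy of configuration c, which initially
    sits at temperature index c. offset (0 or 1) alternates the pairing.
    Returns order, where order[k] is the configuration now at temperature k.
    """
    order = list(range(len(temperatures)))
    for k in range(offset, len(temperatures) - 1, 2):
        a, b = order[k], order[k + 1]
        p = swap_probability(1 / temperatures[k], 1 / temperatures[k + 1],
                             energies[a], energies[b])
        if rng.random() < p:
            order[k], order[k + 1] = b, a
    return order


rng = random.Random(42)
energies = [-120.0 + 5.0 * k for k in range(len(TEMPERATURES))]  # hypothetical
print(attempt_swaps(TEMPERATURES, energies, rng))
```

A swap that moves a lower-energy configuration to the colder replica is always accepted; otherwise it is accepted with a probability that decays exponentially, which is why closely spaced temperatures are needed for frequent exchanges.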

Disordered regions were predicted using metaPrDOS (Ishida and Kinoshita 2008) and GSmetaDisorder (Kozlowski and Bujnicki 2012). PrDOS (Ishida and Kinoshita 2007) was used for OsRbohF and OsRbohG (> 1000 aa) and the results were evaluated using DynaMine (Cilia et al. 2013).
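metaPrDOS and MetaDisorder combine their component predictors with their own weighting schemes. As an assumed simplification only, the sketch below takes a plain per-residue majority vote over binary disorder calls and reports the resulting disordered segments. The input strings are hypothetical, and the helper names are illustrative rather than part of any of these servers.

```python
def consensus_disorder(predictions, min_votes=None):
    """Per-residue majority vote over binary disorder predictions.

    predictions -- equal-length strings, 'D' = disordered, '-' = ordered
    min_votes   -- votes needed to call a residue disordered
                   (default: a strict majority of predictors)
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must cover the same residues")
    if min_votes is None:
        min_votes = len(predictions) // 2 + 1
    return "".join(
        "D" if sum(p[i] == "D" for p in predictions) >= min_votes else "-"
        for i in range(len(predictions[0]))
    )


def disordered_segments(mask, min_length=1):
    """1-based inclusive (start, end) ranges of consecutive 'D' residues."""
    segments, start = [], None
    for i, state in enumerate(mask + "-", start=1):  # sentinel closes a final run
        if state == "D" and start is None:
            start = i
        elif state != "D" and start is not None:
            if i - start >= min_length:
                segments.append((start, i - 1))
            start = None
    return segments


# Hypothetical calls from three predictors for an 8-residue stretch.
calls = ["DDD---DD", "DD----D-", "-DD---DD"]
mask = consensus_disorder(calls)
print(mask, disordered_segments(mask))  # DDD---DD [(1, 3), (7, 8)]
```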

Results

Phylogenetic analysis of Rboh family within plant kingdom

The phylogeny of closely and distantly related plant NADPH oxidases is presented in Fig. 1. It is based on 127 full-length protein sequences of experimentally characterized Rbohs from 26 plant species, including dicots, monocots and lower plants (S1 File). We reconstructed the phylogeny using the neighbor-joining method (Fig. 1); the maximum likelihood method produced a very similar tree. As expected from the evolutionary separation between lower and higher plants, Rbohs from red algae formed a separate sub-cluster, 1A, with bootstrap support as high as 98.2% (Fig. 1). In higher plants, we observed a more extensive expansion in the number of Rboh proteins due to gene duplications (Fig. 2). The Rboh proteins from higher plants were grouped into nine main clusters (2–10), with clusters 2–4 and 6 each subdivided into two sub-clusters, A and B. Clusters 2–6 contained both dicot and monocot Rbohs, whereas cluster 7 was monocot-specific and clusters 8, 9 and 10 were dicot-specific. In cluster 1, the red algal Rbohs were phylogenetically closer to Rbohs from Fabaceae (Glycine max and Phaseolus vulgaris). Among red algae, one Rboh each was present in Chondrus crispus and Porphyra yezoensis, and two in Cyanidioschyzon merolae. No introgression of other dicot or monocot species was observed in this cluster. Cluster 2 contained representative members of both dicots and monocots. Legume species predominated among the dicots, although members from A. thaliana and Populus trichocarpa were also included (sub-cluster 2A). Sub-cluster 2A contained A. thaliana AtRbohH and AtRbohJ together with two P. trichocarpa Rbohs (PtRbohH and PtRbohJ) that showed an orthologous relationship with these AtRbohs. Another group consisted of five Rbohs from four legumes.
A member of the Vitaceae family (Vitis vinifera VvRbohH) with unknown function was also included in this sub-cluster. The other sub-cluster (2B) was dominated by monocot cereal crops: it contained one protein from Hordeum vulgare and two proteins each from O. sativa and Zea mays. Cluster 3 included monocot and dicot Rbohs in sub-clusters 3A and 3B, respectively, belonging to the same plant families as cluster 2 but to different Rboh subfamilies. Cluster 4 appears complex, with A. thaliana AtRbohI placed closer to monocot than to dicot Rbohs. Monocot sub-cluster 4A contained Rbohs from Z. mays (ZmRbohB-α and ZmRbohB-β), O. sativa (OsRbohA) and H. vulgare (HvRbohA and HvRbohF1). This sub-cluster was closer to the part of dicot sub-cluster 4B that contained Solanaceae Rbohs involved in biotic stress, namely Lycopersicon esculentum LeRboh1, Solanum tuberosum StRbohA and Nicotiana benthamiana NbRbohA, as well as Nicotiana tabacum NtRbohF. Members of the Vitaceae and Salicaceae families were closer to Solanaceae than to legumes. Cluster 5 contained members of Brassicaceae (Lepidium sativum LesaRbohA, Brassica oleracea BoRbohD and BoRbohF), Cucurbitaceae (Cucumis sativus CsRboh) and a monocot. No members of Fabaceae or Solanaceae were observed in this cluster. Cluster 6 contained both monocots and dicots, with members of Poaceae, Brassicaceae, Fabaceae, Salicaceae, Vitaceae and Solanaceae. This cluster can be divided into two major sub-clusters, one of monocots (6A) and the other of dicots (6B). Cluster 7 was exclusively monocot-specific, with members from rice and maize. Clusters 8–10, on the other hand, were dicot-specific, with the majority of members from Brassicaceae. Cluster 9 was dominated by members of Fabaceae.
Besides the Fabaceae members, a single Vitaceae Rboh of unknown function (V. vinifera VvRbohA) was observed in this cluster. Cluster 10 was dominated by Solanaceae but also included two P. trichocarpa Rbohs (PtRbohA and PtRbohC).

Fig. 1

Phylogenetic tree for 127 Rbohs from 26 plant species. The unrooted tree was inferred by the neighbor-joining method with 1000 bootstraps after alignment of the Rboh amino acid sequences listed in Supplementary File S1. Rbohs are labeled with their name and accession number. Green, pink and orange circles indicate dicot, monocot and lower plant Rbohs, respectively. Filled and empty circles denote experimentally validated and unknown functions, respectively. At, Arabidopsis thaliana; Bo, Brassica oleracea; Cc, Citrullus colocynthis; Cs, Cucumis sativus; Glyma, Glycine max; Lesa, Lepidium sativum; Lj, Lotus japonicus; Le, Lycopersicon esculentum; Mt, Medicago truncatula; Na, Nicotiana attenuata; Nb, Nicotiana benthamiana; L × S, Nicotiana langsdorffii × Nicotiana sanderae; Nt, Nicotiana tabacum; Pv, Phaseolus vulgaris; Pt, Populus trichocarpa; St, Solanum tuberosum; Vf, Vicia faba; Vv, Vitis vinifera; Hv, Hordeum vulgare; Os, Oryza sativa; Pc, Potamogeton crispus; Ta, Triticum aestivum; Zm, Zea mays; Cc, Chondrus crispus; Cm, Cyanidioschyzon merolae; Py, Porphyra yezoensis (color figure online)

Fig. 2

Gene duplications in Rbohs. Model drawn with reference to Fig. 1, indicating the evolutionary separation and duplication of Rbohs between lower (algae) and higher plants, and the further extensive expansion in the number of Rbohs due to gene duplications in legumes, other dicots and monocots

The ancestral sequence reconstruction, based on the multiple sequence alignment of 127 Rbohs and the phylogenetic tree, was used to identify amino acid substitutions that may account for the functional versatility of Rbohs within the plant kingdom. Red algal Rbohs (CmRboh1, CmRboh2, PyRboh and CcNoxD) were used as outgroups. First, ancestors were predicted for the nodes closest to the monocot and dicot sub-clusters 2A, 2B, 3A, 3B, 4A, 4B, 6A and 6B, and for the combined clusters 2A/2B, 3A/3B, 4A/4B and 6A/6B. For these combined clusters, ancestral sequences (ANC), sites predicted with high confidence (INV) and residues discriminating among the clusters (DISCRIM) were then determined from the alignments. The final output indicating ANC, INV and DISCRIM residues among AtRbohs and OsRbohs from the four clusters is shown in S2 File.

In the evolutionary trace analysis (ETA), partitions P01–P20 divide the phylogenetic tree into classes (S3 File), and the ET classes recovered all clusters obtained from the phylogenetic analysis. Each partition contains a different number of classes, each class being a cluster of similar sequences descending from a given node within that partition. In partition P01, all 127 sequences fall into one class; in partition P02, they divide into two classes, 1 and 2. In the present work, we focused on the ten Arabidopsis and nine rice Rbohs.
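The core idea behind these partitions can be sketched in Python, under stated assumptions: a position is treated as a trace position in a partition when it is invariant within every class, and its rank is the first partition (P01 = 1, P02 = 2, ...) at which that happens. The ET server implements its own ranking and scoring, which this sketch does not reproduce; the sequences, partitions and helper names are hypothetical.

```python
def trace_positions(classes):
    """Positions invariant within every class of one partition.

    classes -- list of classes, each a list of aligned sequences
    Returns {position: 'conserved' | 'class-specific'} (0-based positions).
    """
    length = len(classes[0][0])
    result = {}
    for pos in range(length):
        consensus = []
        for members in classes:
            residues = {seq[pos] for seq in members}
            if len(residues) != 1:
                break  # variable inside a class: not a trace position here
            consensus.append(residues.pop())
        else:
            result[pos] = "conserved" if len(set(consensus)) == 1 else "class-specific"
    return result


def trace_ranks(partitions):
    """Rank = first partition at which a position becomes invariant
    within every class."""
    ranks = {}
    for n, classes in enumerate(partitions, start=1):
        for pos in trace_positions(classes):
            ranks.setdefault(pos, n)
    return ranks


# Hypothetical four-sequence alignment and nested partitions P01-P03.
s1, s2, s3, s4 = "ARK", "ARK", "AKK", "AKR"
partitions = [
    [[s1, s2, s3, s4]],          # P01: one class
    [[s1, s2], [s3, s4]],        # P02: two classes
    [[s1, s2], [s3], [s4]],      # P03: three classes
]
print(trace_ranks(partitions))  # {0: 1, 1: 2, 2: 3}
```

A low rank marks a position that is invariant across broad groups of sequences (here position 0), whereas a position that only becomes invariant in finer partitions (position 1 at P02) distinguishes subfamilies.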

Sequence analysis and structural studies

The multiple sequence alignment of the 19 Rboh proteins (10 from Arabidopsis and 9 from rice) showed regions of conserved and non-conserved residues (S4 File). OsRbohI and OsRbohC were identified as putative orthologs of AtRbohD and AtRbohF, respectively (S5 File).

N-terminal EF-hands

The crystal structure of the N-terminal domain of OsRbohB (PDB: 3A8R) is a homodimer with two EF-hands and two EF-hand-like motifs. The Ca2+-binding site of EF-hand I comprises residues D242, N244, D246, R248 and E253 (Oda et al. 2010). Sequence identities of the 18 other Rbohs with the 3A8R sequence over the N-terminal EF-hand region ranged from 42 to 67%. The three-dimensional models had ~89 to ~95% of residues in the 'most favoured regions' of the Ramachandran plot, and deletions and substitutions were found at the Ca2+-binding sites (S6 File). Except for five Rbohs (OsRbohA, OsRbohG, OsRbohI, AtRbohB and AtRbohF), all carried substitutions at positions equivalent to the Ca2+-binding residues (S4 File). OsRbohF had a deletion of five amino acid residues corresponding to the Ca2+-binding ligands in EF-hand I, whereas AtRbohI had a deletion of D242. AtRbohA, AtRbohC, AtRbohD, AtRbohG and AtRbohJ carried the N244D substitution, and AtRbohE, AtRbohH and AtRbohJ carried R248K. OsRbohD and OsRbohH carried R248Q, whereas OsRbohC and OsRbohE carried R248H and R248M, respectively. D246 was conserved in all 19 Rbohs.
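Substitutions such as N244D and R248K are named with OsRbohB residue numbers, so each query residue has to be mapped through the alignment back to the reference numbering. A minimal Python sketch of that bookkeeping follows. The reference fragment is a hypothetical placeholder numbered from 242: only D242, N244, D246 and R248 match the text, the residues between them are invented, and the "del" label and the function name are illustrative choices.

```python
def call_substitutions(ref_aln, query_aln, ref_start, sites):
    """Report query residues at reference-numbered sites of a pairwise alignment.

    ref_aln, query_aln -- aligned strings of equal length ('-' = gap)
    ref_start          -- residue number of the first reference residue
    sites              -- reference residue numbers to report
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    calls = {}
    ref_pos = ref_start - 1
    for r, q in zip(ref_aln, query_aln):
        if r == "-":
            continue  # insertion in the query: no reference number
        ref_pos += 1
        if ref_pos in sites:
            if q == "-":
                calls[ref_pos] = f"{r}{ref_pos}del"
            elif q != r:
                calls[ref_pos] = f"{r}{ref_pos}{q}"
            else:
                calls[ref_pos] = "conserved"
    return calls


# Hypothetical placeholder fragment; residues between the named sites are invented.
SITES = {242, 244, 246, 248}
print(call_substitutions("DKNGDGR", "DKDGDGK", 242, SITES))
# {242: 'conserved', 244: 'N244D', 246: 'conserved', 248: 'R248K'}
```

Gaps in the reference are skipped without advancing the residue number, so insertions in a query do not shift the labels of later sites.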

Rac GTPase is another important binding partner of Rbohs; R273 and Y277 are important for Rac binding in OsRbohB (Oda et al. 2010). Nine Rbohs (OsRbohA, OsRbohC, OsRbohF, OsRbohG, OsRbohI, AtRbohA, AtRbohD, AtRbohE and AtRbohF) carried the R273Q substitution. OsRbohD and OsRbohE carried R273H, AtRbohB, AtRbohH and AtRbohJ carried R273N, and AtRbohG carried R273K. R273 was conserved in AtRbohC, AtRbohI and OsRbohH, and Y277 was conserved in all 19 Rbohs.

Two pairs of putative orthologous Rbohs, AtRbohD–OsRbohI and AtRbohF–OsRbohC, were selected for further studies. The sequence identities of AtRbohD, AtRbohF, OsRbohC and OsRbohI with 3A8R were 53, 55, 54 and 62%, respectively (S2 File). Electrostatic surfaces of 3A8R and the four Rboh models are shown in Fig. 3a. Among these four Rbohs, the N244D and R248H substitutions were observed at the Ca2+-binding sites of AtRbohD and OsRbohC, respectively, and R273Q was present in all four (Fig. 3b). OsRac1 was then modeled (Fig. 4) and docked with the Rboh models. The docking studies showed that the monomer models of two Rbohs (OsRbohC and OsRbohI) interact with OsRac1 in the region corresponding to R273 and Y277, whereas no interactions were observed in the dimer state (Fig. 5).
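Percent identity depends on the chosen denominator, and different tools use different conventions, so a reimplementation may not reproduce the values reported above exactly. The Python sketch below shows two common choices on hypothetical aligned toy sequences: identity over gap-free alignment columns, and identity relative to the shorter ungapped sequence. The function name and the `denominator` parameter are illustrative.

```python
def percent_identity(seq_a, seq_b, denominator="aligned"):
    """Percent identity of two aligned sequences ('-' = gap).

    denominator -- "aligned": columns where neither sequence has a gap
                   "shorter": length of the shorter ungapped sequence
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    matches = sum(a == b for a, b in pairs)
    if denominator == "aligned":
        total = len(pairs)
    elif denominator == "shorter":
        total = min(len(seq_a.replace("-", "")), len(seq_b.replace("-", "")))
    else:
        raise ValueError(f"unknown denominator: {denominator}")
    return 100.0 * matches / total if total else 0.0


print(percent_identity("AC-DE", "ACGDF"))             # 75.0
print(percent_identity("AC-DE", "AGGD-", "shorter"))  # 50.0
```

For the second pair, the gap-free columns give 2 matches out of 3 (about 66.7%), whereas dividing by the shorter ungapped length gives 2 out of 4 (50%).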

Fig. 3

a Electrostatic surfaces showing the N-terminal EF-hands of 3A8R and the four Rboh models. Ca2+- and Rac-binding residues in 3A8R are indicated in green and cyan, respectively. Conserved residues are shown in green and cyan, whereas the respective substitutions are shown in magenta and orange. The gray sphere denotes a calcium ion. The electrostatic potential is displayed between −5 (red) and +5 (blue) kT/e. b Ca2+-binding residues and respective substitutions shown as sticks in the four Rboh models with reference to 3A8R. Conserved residues are shown in green and substitutions in magenta. The gray sphere denotes a calcium ion (color figure online)

Fig. 4

Model of OsRac1 showing structural alignment with template. Magenta and green color indicate Rboh and template (PDB code: 3A8R), respectively (color figure online)

Fig. 5

Docked monomer Rboh models of a OsRbohC and b OsRbohI with OsRac1. Magenta and blue colors indicate Rboh and OsRac1, respectively. Cyan color indicates Rac-binding residues in Rbohs (color figure online)

To study the role of Ca2+ in the folding of Rboh proteins, we performed ab initio folding simulations of the EF-hands from AtRbohD and OsRbohB, and studied OsRbohB in more detail. The weighted histogram analysis method (WHAM) was used to calculate the folding thermodynamics from replica exchange simulations of three OsRbohB forms: the apo form and the forms containing one and two Ca2+ ions. Thermodynamic quantities such as the specific heat were computed to identify unfolding transition temperatures. The specific heat of the wild-type apoOsRbohB dimer showed a major peak at T = 320 K (Fig. 6a), which may correspond to the global unfolding, or melting, transition of the protein. Below this temperature, the protein appeared to be folded or at least partially folded. A minor peak emerged at a lower temperature, T = 300 K, corresponding to a partial unfolding transition. Above T = 340 K, the protein appeared to be in a random coil state. The specific heat curves of the two Ca2+-bound forms (Fig. 6a) each showed two peaks, at T = 320 and 342 K for the one-Ca2+ form and at T = 340 and ~352 K for the two-Ca2+ form. The presence of two peaks may indicate intermediate states in the folding pathway. Compared with the wild-type apoOsRbohB, the single-Ca2+ OsRbohB has a higher melting temperature (342 K) and hence greater thermal stability, suggesting that Ca2+ enhances the folding of apoOsRbohB. Furthermore, the specific-heat peaks coincide with transitions between energy states, where the potential energy of the complex rises steeply with temperature (Fig. 6b). The potential energy distributions of all three forms also showed three low-energy states as Gaussian-like peaks (Fig. 6c).
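The rexwham program applies WHAM to combine energy histograms from all replicas. As a simpler alternative, not WHAM itself, the Python sketch below estimates the specific heat at a single temperature from energy fluctuations, C_v = (⟨E²⟩ − ⟨E⟩²)/(k_B T²), and then locates peaks in a tabulated specific-heat curve, with the highest peak taken as the melting temperature. It assumes reduced units with k_B = 1; the curve values are hypothetical, not the Fig. 6 data, and the function names are illustrative.

```python
import statistics


def heat_capacity(energies, temperature, k_b=1.0):
    """Fluctuation estimate C_v = (<E^2> - <E>^2) / (k_B T^2) from energies
    sampled at a single temperature."""
    return statistics.pvariance(energies) / (k_b * temperature ** 2)


def peak_temperatures(temps, cv):
    """Temperatures of local maxima of a specific-heat curve."""
    return [temps[i] for i in range(1, len(cv) - 1)
            if cv[i] > cv[i - 1] and cv[i] > cv[i + 1]]


def melting_temperature(temps, cv):
    """Temperature of the highest specific-heat value."""
    return max(zip(temps, cv), key=lambda pair: pair[1])[0]


# Hypothetical specific-heat curve (reduced units) with a minor and a major peak.
temps = [0.50, 0.55, 0.60, 0.65, 0.70, 0.75]
cv = [1.0, 2.0, 1.5, 6.0, 2.5, 1.0]
print(peak_temperatures(temps, cv), melting_temperature(temps, cv))  # [0.55, 0.65] 0.65
```

A curve with two local maxima, like this toy example, is the pattern read in the text as evidence of an intermediate state.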

Fig. 6

Comparison of wild-type apoOsRbohB form with one and two Ca2+ forms a specific heat and b potential energy as a function of temperature, c potential energy distributions

N-terminal region upstream of EF-hands

As observed from the multiple sequence alignment, the N-terminal region upstream of the EF-hands was highly diverse among Rbohs, and ETA confirmed this result. Disorder prediction (Fig. 7) and DynaMine analysis (S7 File) also indicated disordered regions in this N-terminal region. Unlike the EF-hands, the ab initio structural models of this region for the two pairs of putative orthologs and OsRbohB appear more basic and adopt different conformations (Fig. 8).

Fig. 7

Disorder prediction for a AtRbohs and b OsRbohs using GeneSilico MetaDisorder, metaPrDOS and PrDOS

Fig. 8

Electrostatic surfaces showing the N-terminal region upstream of the EF-hand-like motifs for 3A8R and the ab initio models of four Rbohs. The electrostatic potential is displayed between −5 (red) and +5 (blue) kT/e (color figure online)

Transmembrane region

The TMDs (I–VI) were conserved, including a pair of His residues in TMD-III and TMD-V (S4 File). The arrangement of the six TMDs connected by five loops (A–E) in the 19 Rbohs was similar to that in human Nox2. A large insertion of ~57 amino acid residues was observed in TMD-III of OsRbohF compared with the other Rbohs. Among the six TMDs, TMD-II was the most conserved.

C-terminal NADPH-binding regions

Our results indicated two FAD-binding and four NADPH-binding domains (NADPH-I to NADPH-IV) in the 19 Rbohs (S4 File). Three-dimensional models of the C-terminal NADPH-binding domains were constructed using the crystal structure of the NADPH-binding domain of gp91phox (PDB: 3A1F) as template (S8 File). Sequence identities of the individual Rbohs with 3A1F ranged from 36 to 43%. Large insertions were observed in OsRbohH, and large insertions between NADPH-I and NADPH-II precluded the construction of reliable models for some Rbohs. The FAD-binding regions were excluded from the models owing to the lack of suitable templates. The NADPH-I domain contains the glycine-rich motif GXGXG, which is conserved in all 19 Rbohs (S4 File). Other residues conserved among the Rbohs correspond to Pro-415 and Asp-500 of human Nox2 and Cys890 of AtRbohD.
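A scan for the glycine-rich GXGXG motif (X = any residue) can be sketched with Python's standard `re` module; the zero-width lookahead lets overlapping occurrences be reported. The test sequence is hypothetical, and the function name is illustrative. A pattern match alone does not establish an NADPH-binding site; in the text, the motif is located from the multiple sequence alignment.

```python
import re

# GXGXG with X = any residue; the lookahead reports overlapping matches.
GXGXG = re.compile(r"(?=(G.G.G))")


def find_gxgxg(sequence):
    """Return (1-based start, matched text) for every GXGXG occurrence."""
    return [(m.start() + 1, m.group(1)) for m in GXGXG.finditer(sequence.upper())]


print(find_gxgxg("AGAGAGAGK"))  # [(2, 'GAGAG'), (4, 'GAGAG')]
```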

Discussion

In the present study, ten Rbohs from A. thaliana and nine from O. sativa were analyzed using phylogenetic, evolutionary trace and sequence-structure approaches.

Phylogenetic analysis of Rboh family within plant kingdom

In the present study, an extensive phylogenetic analysis was conducted of 127 Rboh proteins from 26 plant species, including dicots, monocots and lower plants. An earlier phylogenetic study reported 50 ferric reduction oxidases (FROs) and 77 Noxes from plants (Chang et al. 2016). That study also indicated that FROs, although closely related to plant Noxes, are distinct from them, which underlines the value of the comprehensive phylogenetic analysis of Noxes presented here. In the present study, representative members from red algae, Fabaceae, Brassicaceae, Salicaceae, Vitaceae, Poaceae, Solanaceae, Cucurbitaceae and Potamogetonaceae were distributed in ten phylogenetic clusters. The phylogeny, together with the functional data available for the few Rboh proteins indicated by filled circles in Fig. 1, was used for functional analysis of Rbohs. Cluster 1 indicated a close relationship between legume and red algal Rbohs, suggesting that legume Rbohs may have evolved from those of lower plants (red algae). In an evolutionary context, this also suggests that Rbohs of land plants arose from aquatic lower plants. Among the four red algal Rbohs, only C. crispus CcRbohD has an established role, in pathogen-induced ROS production (Herve et al. 2006). The analysis of clusters 2 and 6 indicated duplications of M. truncatula Rboh proteins relative to AtRbohs, consistent with the genome duplications in legumes relative to A. thaliana (Yan et al. 2003; Cannon et al. 2006). Arabidopsis thaliana AtRbohH and AtRbohJ are implicated in pollen-specific functions (Potocký et al. 2007), whereas among the M. truncatula Rbohs in this cluster only MtRbohF has a reported role, in root hair development (Marino et al. 2011). In sub-cluster 2B, only O. sativa OsRbohE is known to be involved in biotic stress (Yoshie et al. 2005). Interestingly, cluster 2 thus contains both development- and biotic stress-related Rbohs.
Further, the majority of the Rboh isoforms in this cluster, from both dicots and monocots, belong to the H and J subfamilies. In cluster 3, the functions of the proteins are still unknown, except for OsRbohF, which is known to play a role in defense (Lin et al. 2009b). Subfamily E appears to be dominant among both monocots and dicots in this cluster. In cluster 4, we noticed the emergence of abiotic stress-related functions alongside biotic stress (pathogen/wound)- and development-related Rbohs. For example, Z. mays ZmRbohB-α and ZmRbohB-β are involved in abiotic stress (Lin et al. 2009a, b), O. sativa OsRbohA in drought and growth regulation (Wang et al. 2016), OsRbohC in drought (Kaur and Pati 2016), and H. vulgare HvRbohA and HvRbohF1 in biotic stress, in sub-cluster 4A (Yoshie et al. 2005; Trujillo et al. 2006; Lightfoot et al. 2008). Sub-cluster 4B contains Solanaceae Rbohs involved in biotic stress, L. esculentum LeRboh1, S. tuberosum StRbohA and N. benthamiana NbRbohA (Sagi et al. 2004; Kumar et al. 2007; Yoshioka et al. 2003), whereas N. tabacum NtRbohF plays a role in pollen tube growth (Potocký et al. 2007). Two Rbohs known to be involved in abiotic stress, A. thaliana AtRbohF and Citrullus colocynthis CcRbohF, form a separate group within sub-cluster 4B. AtRbohF is the best example and the most studied Rboh, with all three kinds of function: abiotic stress, biotic stress and development (Marino et al. 2012). CcRbohF from Cucurbitaceae has been reported to function in drought tolerance (Si et al. 2010). The weak phylogenetic support at the base of cluster 4 is consistent with this observation and suggests that this group is close to abiotic stress-related monocot Rbohs. In cluster 5, the role of LesaRbohA is still unknown, whereas roles of BoRbohD and BoRbohF in ethylene signaling and heavy metal stress have been reported (Jakubowicz et al. 2010). The critical role of CsRboh in brassinosteroid-induced stress tolerance has also been demonstrated (Xia et al. 2009).
However, the role of the only monocot Rboh in this cluster (Potamogeton crispus PcRboh1) has not yet been identified. In sub-cluster 6A, only two of the six proteins (OsRbohB and OsRbohH) have been reported to be drought-induced (Wang et al. 2013; Kaur and Pati 2016). We hypothesize that the other monocot Rbohs (ZmRbohF, ZmRbohI, HvRbohB1 and HvRbohB2) may also function in abiotic stress. Sub-cluster 6B, which predominantly contains Rbohs from the B subfamily, has a wide range of functions. Recently reported roles include S. tuberosum StRbohB in biotic stress (Kobayashi et al. 2006), A. thaliana AtRbohB in seed germination and after-ripening (Müller et al. 2009), L. sativum LesaRbohB in root development and auxin signaling (Müller et al. 2012), M. truncatula MtRbohA in nodule nitrogen fixation (Marino et al. 2011) and P. vulgaris PvRbohB in root growth and nodule nitrogen fixation (Montiel et al. 2012). Further, we observed that cluster 7 was represented by monocots, cluster 8 contained dicots, and cluster 9 was represented exclusively by legumes. An earlier study that considered only a limited number of plant species placed these Rbohs in a single cluster instead of the three observed here (Marino et al. 2011). The functions of most Rbohs from cluster 7 are still unknown, although a recent analysis from our group has indicated a potential role of OsRbohI in abiotic stresses (Kaur and Pati 2016). Members of cluster 8, on the other hand, are implicated in an array of biological activities; most are from Brassicaceae and belong to the D subfamily. Arabidopsis thaliana AtRbohD is reported to play versatile roles in biotic and abiotic stress (Marino et al. 2012), and C. colocynthis CcRbohD in drought tolerance (Si et al. 2010). Arabidopsis thaliana AtRbohC plays a role in root growth and cell wall integrity (Marino et al. 2012). The function of the members of cluster 9 is still unknown.
The divergence into three clusters (clusters 7, 8 and 9) suggests gene duplications among their members. Mechanisms and models of gene duplication, and their role in the evolution of novel gene functions, have been discussed previously (Innan and Kondrashov 2010; Lawton-Rauh 2003). After duplication, genes may undergo nonfunctionalization (loss of function), neofunctionalization (evolution of novel functions) or subfunctionalization (partitioning of the ancestral functions) (Moore and Purugganan 2005; Rensing 2014). In our study, for example, AtRbohH and AtRbohJ of cluster 2 have pollen-specific functions (Potocký et al. 2007), whereas MtRbohF of the same cluster is linked to root hair development (Marino et al. 2011). Such functional divergence among members of the same cluster also suggests that proteins encoded by duplicated Rboh genes may interact with different substrates, leading to versatility in their functions (Huberts and van der Klei 2010). In addition, gene duplications have implications for the evolution of regulatory networks (Babu et al. 2004): gain or loss of function after duplication may create new connections with, or remove existing connections to, partners in the network. Finally, the ancestral, invariant and discriminative residues identified among the four dicot–monocot clusters by ancestral sequence reconstruction also point to a number of gene duplication events among Rbohs.

Sequence and structure analyses

In the present work, we selected two model species: Arabidopsis, representing dicots, and rice, representing monocots. We identified conserved and variable sequences among their Rbohs and linked them to structural features. The multiple sequence alignment (MSA) revealed many conserved and variable regions as well as secondary structural elements among the 19 Rbohs, which may hint at their functions. The MSA shows the N-terminal region to be highly variable and lacking regular structure, which may contribute to the diverse functions of the Rbohs. Because the identification of orthologous genes is considered important for understanding the function of uncharacterized genes (Kristensen et al. 2011), orthologs were also explored in the present work: dedicated ortholog resources were searched for orthologous sequences of each of the ten AtRbohs and nine OsRbohs. This study provides information not only on the orthologous proteins but also on the diverse roles of Rbohs among dicots and monocots. Experimental validation of these putative ortholog pairs would further substantiate their relationship during plant evolution.

Structural characterization based on available crystal structures is an accepted means of predicting protein function (Ramachandran and Dokholyan 2012). Rbohs are important transmembrane proteins involved in various biological functions and need proper characterization. We therefore analyzed residue variation in 18 Rbohs and predicted their structures using homology modeling, with emphasis on the N-terminal region containing the Ca2+- and Rac-binding sites critical for Rboh activation and function.

Relative to OsRbohB, we observed the substitutions N244D and R248K/Q/H/M in the Ca2+-binding sites. Earlier studies showed that the identity of the amino acids at six positions in the loop of the helix-loop-helix Ca2+-binding motif affects Ca2+ affinity (Procyshyn and Reid 1994). An N → D substitution in this loop increased Ca2+ affinity in synthetic peptides (Procyshyn and Reid 1994) and led to conformational changes in troponin C-based calcium biosensors (Mank et al. 2006, 2008). In addition, a switch between monodentate and bidentate binding modes of the Asp carboxylate also affects Ca2+-binding affinity as well as the catalytic activity and function of proteins (Dudev and Lim 2007). Beyond Ca2+-binding proteins, N → D substitutions are known to affect substrate binding by aspartate aminotransferase (Oue et al. 1999) and to cause conformational changes in phenylalanine hydroxylase and NAD-dependent d-lactate dehydrogenase (Carvalho et al. 2003; Shinoda et al. 2005). Similarly, R → Q/H substitutions can affect helix stability in centrin, an EF-hand-containing centrosome-associated protein (Taillon and Jarvik 1995), and a recent study showed that an R → Q substitution in an EF-hand influences structural flexibility and salt bridges in neuronal calcium sensor-1 (Zhu et al. 2014). R → M substitutions, which replace a charged residue with an uncharged one, are known to inhibit the ability of eukaryotic translation initiation factor 5 (eIF5) to stimulate GTP hydrolysis (Paulin et al. 2001) and to affect substrate binding by zinc-endopeptidases (Marie-Claire et al. 1997) and lactate monooxygenase (Sanders et al. 1999), as well as the Rho-GAP activity of myosin myr 5 (Müller et al. 1997). Thermodynamic studies assessing free-energy changes will help clarify how these substitutions affect Ca2+ binding and the stability of the corresponding Rbohs.
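
Substitutions such as N244D and R248K are named by residue numbers of the reference protein (here OsRbohB), so each numbered position must first be mapped to its alignment column before the residue in another Rboh can be read off. The sketch below shows one way to do this; the helper names, the `ref_start` offset for an aligned fragment and the toy sequences are hypothetical and do not reproduce the real OsRbohB sequence.

```python
def ref_to_column(ref_aligned, ref_start=1):
    """Map reference residue numbers to 0-based alignment columns."""
    mapping = {}
    number = ref_start
    for col, residue in enumerate(ref_aligned):
        if residue != "-":  # gap columns carry no reference number
            mapping[number] = col
            number += 1
    return mapping


def call_substitution(ref_aligned, query_aligned, ref_number, ref_start=1):
    """Name the change at reference position ref_number, e.g. 'N244D'.

    Returns None if the query carries the reference residue and
    '<ref><number>del' if the query has a gap at that column.
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("sequences must come from the same alignment")
    col = ref_to_column(ref_aligned, ref_start)[ref_number]
    ref_res, query_res = ref_aligned[col], query_aligned[col]
    if query_res == ref_res:
        return None
    if query_res == "-":
        return f"{ref_res}{ref_number}del"
    return f"{ref_res}{ref_number}{query_res}"
```

With `ref = "AN-GTIR"` numbered from 243 and `query = "ADSGTIK"`, `call_substitution(ref, query, 244, 243)` returns `'N244D'` and `call_substitution(ref, query, 248, 243)` returns `'R248K'`; the inserted `S` in the query is skipped because it has no reference number.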

The other key component of the N terminus is the Rac-binding region, which plays an important role in Rboh regulation (Oda et al. 2010; Wong et al. 2007). The present study revealed R273N/Q/K/H substitutions in the Rac-binding sites of the 18 Rbohs. Three Rbohs (AtRbohB, AtRbohH and AtRbohJ) share the same substitution (R273N), although they are implicated in different functions (Kaur et al. 2014). An R → Q substitution is known to prevent Rac binding to the N-terminal tetratricopeptide repeat motifs of p67phox (a component of human Nox), whereas an R → K substitution leaves only a weak interaction (Koga et al. 1999). In a separate analysis, models of two rice Rbohs, OsRbohI and OsRbohC, which are putative orthologs of the stress-related AtRbohD and AtRbohF, respectively, were docked with the predicted structure of OsRac1. The monomer models of both Rbohs interacted with OsRac1 in the region corresponding to R273 and Y277, whereas no interactions were observed for the dimers. This may reflect the lower stability of the monomer, which, unlike the dimer, may require a binding partner.
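
Whether a replacement keeps or removes the positive charge of an Arg is a crude first-pass annotation for sites such as R273, not a prediction of Rac binding. The sketch below expands substitution notation such as R273N/Q/K/H and reports the change in side-chain charge for each replacement. Charges are simplified to +1 for Arg and Lys, -1 for Asp and Glu and 0 otherwise; treating His as neutral near physiological pH is an assumption, and the function names are hypothetical.

```python
import re

# Simplified side-chain charges near physiological pH (assumption:
# His, pKa about 6, is treated as neutral; all other residues are 0).
CHARGE = {"R": 1, "K": 1, "D": -1, "E": -1}


def parse_substitutions(notation):
    """Expand notation such as 'R273N/Q/K/H' into (ref, position, alt) tuples."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z](?:/[A-Z])*)", notation)
    if match is None:
        raise ValueError(f"unrecognized substitution: {notation!r}")
    ref, position, alts = match.group(1), int(match.group(2)), match.group(3)
    return [(ref, position, alt) for alt in alts.split("/")]


def charge_change(ref, alt):
    """Side-chain charge of the new residue minus that of the original."""
    return CHARGE.get(alt, 0) - CHARGE.get(ref, 0)
```

Evaluating `[(f"{r}{p}{a}", charge_change(r, a)) for r, p, a in parse_substitutions("R273N/Q/K/H")]` gives `[('R273N', -1), ('R273Q', -1), ('R273K', 0), ('R273H', -1)]`: under this simplified model only Lys preserves the positive charge of Arg.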

Among the Rbohs, AtRbohD is involved in a wide range of biological functions, including plant development and responses to biotic and abiotic stress (Kaur et al. 2014). We therefore studied the ab initio folding of its EF-hands, and thereby the role of Ca2+, using replica exchange discrete molecular dynamics. The simulations indicated a critical role for Ca2+ in EF-hand folding: in the absence of Ca2+, the EF-hands formed three helices instead of four, implicating Ca2+ in their specific fold. Weighted histogram analysis method (WHAM) analysis of the replica exchange trajectories of OsRbohB in three forms (apo, and bound to one or two Ca2+ ions) suggests that it may behave as a two-state protein. This further supports a strong link between the number of bound Ca2+ ions and the folding event, and a probable role of Ca2+ in Rboh regulation.
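
In replica exchange simulations, copies of the system run in parallel at different temperatures and periodically attempt to swap conformations; the standard Metropolis criterion accepts a swap between replicas i and j with probability min{1, exp[(β_i − β_j)(E_i − E_j)]}, where β = 1/(k_B T). The Python sketch below implements only this exchange step, assuming energies in kcal/mol and temperatures in kelvin (discrete molecular dynamics codes often use reduced units instead). It is not the simulation protocol of this study, the WHAM reweighting that combines replicas is not reproduced, and the function names are hypothetical.

```python
import math
import random

K_B = 0.0019872  # Boltzmann constant in kcal/(mol*K); units are an assumption


def swap_probability(e_i, t_i, e_j, t_j):
    """Metropolis probability of exchanging the conformations of replicas i and j.

    e_i, e_j: potential energies of the conformations currently simulated
    at temperatures t_i and t_j. Returns min(1, exp[(beta_i - beta_j)(e_i - e_j)]).
    """
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_swap(e_i, t_i, e_j, t_j, rng=random):
    """Accept or reject one exchange attempt using a uniform random draw."""
    return rng.random() < swap_probability(e_i, t_i, e_j, t_j)
```

A swap that hands the lower-energy conformation to the colder replica, for example `swap_probability(-50.0, 300.0, -60.0, 320.0)`, is always accepted (probability 1.0); the reverse case is accepted only with a probability below 1.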

We observed disordered regions upstream of the EF-hand-like motifs in the N termini of the 19 Rbohs, which suggests that these regions may engage different binding partners. To gain insight into their structure, we performed ab initio structure prediction and electrostatic analyses for two pairs of putative orthologous Rbohs and for OsRbohB. The N-terminal region was rich in basic amino acids, which may mediate electrostatic interactions with acidic phospholipids and thereby influence subcellular localization and function.
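
The enrichment of basic residues can be checked at the sequence level before any structure-based electrostatics. The sketch below assumes a crude charge model (+1 for Lys and Arg, -1 for Asp and Glu, His and termini ignored) rather than a pKa-based calculation; the function names and the toy peptide are hypothetical.

```python
def net_charge(segment):
    """Crude net charge: (Lys + Arg) - (Asp + Glu); His and termini ignored."""
    positive = sum(segment.count(aa) for aa in "KR")
    negative = sum(segment.count(aa) for aa in "DE")
    return positive - negative


def charge_profile(seq, window=10):
    """Net charge of every run of `window` consecutive residues."""
    return [net_charge(seq[i:i + window]) for i in range(len(seq) - window + 1)]


def basic_fraction(seq):
    """Fraction of residues that are Lys or Arg."""
    return sum(seq.count(aa) for aa in "KR") / len(seq)
```

For the toy peptide `MKRRKSLEKK`, `basic_fraction` returns 0.6 and `charge_profile(..., window=10)` returns `[5]`, a strongly positive segment of the kind that could interact with acidic phospholipids.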

Sequence analysis identified six TMDs downstream of the N-terminal region, in agreement with previous studies on Rbohs (Lin et al. 2009b; Trujillo et al. 2006). However, their structure could not be predicted because no suitable modeling templates were available. A pair of heme-binding His residues in TMD-III and TMD-V was conserved among the 19 Rbohs. The six TMDs are connected by five loops: two intracellular (B and D) and three extracellular (A, C and E). The B-loop is an important structural element for human Nox2 function and plays a critical role in ROS production; residues Arg73, Arg80, Arg91, Arg92, Leu94 and Asp95 are involved in electron flow from NADPH across the membrane in Nox2 (von Löhneysen et al. 2010). Among the 19 Rbohs, Arg73 and Arg80 are conserved. Leu94, which is conserved in human Noxes, is replaced by the bulkier phenylalanine in all Rbohs; an L → F substitution has previously been shown to affect the binding of platelet glycoprotein Ibα to von Willebrand factor (Miller et al. 1992). The equivalent of Asp95 is conserved in all Rbohs except OsRbohB and OsRbohH, where it is replaced by Asn. A D → N substitution in the loop region between TMDs increased the signaling activity of the Kaposi’s sarcoma-associated herpesvirus G protein-coupled receptor (Ho et al. 2001). Finally, the cysteine residues that form disulphide bridges to stabilize the E-loop of Nox4 (Takac et al. 2011) are absent in Rbohs.
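
Hydropathy analysis is one classical way to locate candidate TMDs from sequence alone. The sketch below uses the Kyte–Doolittle hydropathy scale with a 19-residue window and a mean-hydropathy cutoff of 1.6, the values suggested in the original Kyte–Doolittle work for membrane-spanning segments. It is an illustration, not the predictor used in this study, and the function names and toy sequence are hypothetical.

```python
# Kyte–Doolittle hydropathy values for the 20 standard amino acids.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def hydropathy(seq, window=19):
    """Mean hydropathy of every window of `window` consecutive residues."""
    return [sum(KD[aa] for aa in seq[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def tm_candidates(seq, window=19, threshold=1.6):
    """Half-open (start, end) runs of 0-based window-centre positions
    whose mean hydropathy exceeds `threshold`."""
    half = window // 2
    runs = []
    for i, score in enumerate(hydropathy(seq, window)):
        centre = i + half
        if score > threshold:
            if runs and runs[-1][1] == centre:
                runs[-1] = (runs[-1][0], centre + 1)  # extend current run
            else:
                runs.append((centre, centre + 1))  # start a new run
    return runs
```

For a toy sequence of 10 Asp, 20 Leu and 10 Asp residues, `tm_candidates` returns `[(14, 26)]`, the window centres whose mean hydropathy exceeds the cutoff. Dedicated TMD predictors add topology models and are considerably more accurate than this single-scale scan.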

Apart from the N-terminal region and the TMDs, Rbohs contain a C-terminal region, which interacts with the N-terminal region (Oda et al. 2010). The C-terminal region contains FAD- and NADPH-binding regions that mediate electron transfer from cytosolic NADPH (the electron donor) via FAD and then across the membrane via the hemes in the TMDs to molecular oxygen (the electron acceptor), producing superoxide (Kaur et al. 2014). We identified two FAD-binding and four NADPH-binding regions in the C-terminal region of the 19 Rbohs, in agreement with earlier reports on Rbohs (Lin et al. 2009b; Trujillo et al. 2006). The predicted C-terminal models contain a Rossmann fold for NADPH binding, consisting of alternating β-strands and α-helices in which the strands form a central β-sheet. The glycine-rich GXGXG motif in the NADPH-I region was conserved among the 19 Rbohs; the only difference from human Noxes lies at the first X position, where the Leu that is highly conserved in Rbohs is replaced by Ala or Gly in human Noxes. Glycine-rich motifs have been proposed to bind substrates such as ATP in bacterial sensor histidine kinases and their eukaryotic counterparts (Egger et al. 1997) and S-adenosyl-l-methionine (SAM) in SAM-dependent methyltransferases (Kozbial and Mushegian 2005). A different glycine-rich motif, GXXXG, occurs in transmembrane α-helices, where it is known to stabilize the oligomerization of several membrane proteins (Kleiger and Eisenberg 2002). Residues required for human Nox2 function (Pro-415 and Asp-500) were conserved in the C-terminal region of Rbohs. A recent study proposed a regulatory role for Cys890 of AtRbohD during the plant defense response, in which S-nitrosylation of this residue inhibits enzyme activity (Yun et al. 2011). The equivalent cysteine in the NADPH-IV region was conserved in the other 18 Rbohs and in human Nox2. These data suggest that redox-based modifications such as S-nitrosylation of this conserved cysteine might regulate the activity of the other 18 Rbohs and mediate a feedback loop in the synthesis of reactive oxygen species.
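
Motifs such as GXGXG and GXXXG translate directly into regular expressions (`G.G.G` and `G...G`). A plain `re.finditer` scan skips overlapping hits, which matter for glycine-rich repeats, so the sketch below wraps the pattern in a zero-width lookahead. The function name and test sequences are hypothetical, and a real search would normally be run on the aligned NADPH-binding region rather than on raw sequences.

```python
import re


def find_motif(seq, pattern):
    """Return 0-based start positions of every (possibly overlapping) match.

    `pattern` uses regex syntax, e.g. 'G.G.G' for GXGXG or 'G...G' for GXXXG.
    The zero-width lookahead lets overlapping occurrences be reported.
    """
    return [m.start() for m in re.finditer(f"(?=({pattern}))", seq)]
```

`find_motif("GAGAGAG", "G.G.G")` returns `[0, 2]`, whereas a non-overlapping scan would report only position 0. The residue at the first X position of each hit, such as the Leu conserved in Rbohs, is then `seq[start + 1]`.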

Conclusion

Phylogenetic analysis of 127 Rbohs from 26 plant species suggests both ancient and recent gene duplication events that may contribute to the functional versatility of plant NADPH oxidases. Comprehensive sequence analysis, together with evolutionary trace, ortholog identification and disorder studies, identified conserved and variable residues among Rbohs; these residues are important candidates for mutational and functional studies. The N-terminal domain is more variable than the transmembrane and C-terminal domains. The cysteine equivalent to Cys890 of AtRbohD is conserved in the C-terminal domain and may regulate the activity of other Rbohs through redox-based modifications such as S-nitrosylation. Three-dimensional models of the N-terminal region revealed deletions and substitutions in the Ca2+- and Rac-binding sites. The role of Ca2+ in the folding of Rboh EF-hands was also explored using discrete molecular dynamics. The present study provides insights for designing functional genomics experiments to validate the functions of plant NADPH oxidases.