Introduction

Powdery mildew is one of the most ubiquitous plant diseases; it infects a wide variety of plant species, including cereals, millets, pulses, horticultural crops and forest trees. In soybean, the powdery mildew fungus Microsphaera diffusa Cke. & Pk. is reported to cause yield losses of 0–26 % (Dunleavy 1980). The powdery mildew locus O (Mlo) gene was first identified in barley as a modulator of the defense response against the powdery mildew pathogen, Blumeria graminis f.sp. hordei (Jørgensen 1992; Büschges et al. 1997). It was later observed that the mutant allele of this gene, mlo, confers complete resistance to this fungal pathogen when present in the homozygous state. The resistance conferred by the mlo allele is broad-spectrum, covering all strains of the pathogen, and durable (Jørgensen 1992; Piffanelli et al. 2002). A similar broad-spectrum resistance to powdery mildew due to loss of Mlo function has been reported in tomato (Bai et al. 2008). Thus, Mlo function appears to be necessary for successful infection by the powdery mildew pathogen; in fact, Mlo proteins are an absolute requirement for the pathogen to penetrate the host cell wall (Piffanelli et al. 2002). This conclusion is further supported by the observation that overexpression of the wild-type allele, Mlo, leads to super-susceptibility to the powdery mildew pathogen (Wolter et al. 1993; Kim et al. 2002b). There is some evidence that powdery mildew resistance in common bean (Phaseolus vulgaris) is a quantitative trait (Trabanco et al. 2012), indicating that other gene products are also involved in modulating the host response to the powdery mildew pathogen.

Although the Mlo gene was originally described for its role in the defense response, there were indications that it might also be involved in other biological processes. Homozygous mlo plants were reported to show spontaneous death of mesophyll cells associated with increased leaf senescence (Wolter et al. 1993; Kumar et al. 2001; Piffanelli et al. 2002). In addition, the Mlo gene family seems to be involved in modulating responses to other biotic and abiotic stresses, including wounding, the carbohydrate elicitor produced by wheat powdery mildew, salt stress and mannitol treatment (Piffanelli et al. 2002; Konishi et al. 2010). In relation to the defense response, Mlo proteins are commonly expressed during invasion by fungal haustoria/appressoria (Kumar et al. 2001). However, these proteins are also expressed constitutively in association with developmental functions, and are involved in response mechanisms regulated by biotic and abiotic stimuli (Chen et al. 2006). Mlo proteins characteristically have seven transmembrane (TM) domains (Devoto et al. 1999) and are located in the plasma membrane. They also share a 20-amino-acid calmodulin (CaM)-binding domain (CaMBD; Kim et al. 2002a, b), which mediates the defense signaling mechanism (Reddy et al. 2003). Mlo proteins invariably contain cysteine and proline residues either in their extracellular loops or in their TM domains; these residues have remained unchanged for over 400 million years and are essential for Mlo protein function and/or stability (Elliott et al. 2005). The second and third cytoplasmic loops of Mlo proteins play a critical role in powdery mildew susceptibility (Reinstädler et al. 2010).

The Mlo genes are postulated to date back to the early stages of land plant evolution, and they are reported to be restricted to higher plants and certain mosses (Devoto et al. 1999, 2003). A large family of RAC/ROP (rho of plants) GTPases acts in defense responses against pathogens and in cytosolic signal transduction involved in several developmental and stress response phenomena (Chen et al. 2010). Mlo proteins are believed to participate in signal transduction by altering cell polarity in conjunction with RAC G-proteins, leading to resistance to the powdery mildew fungus in barley (Schultheiss et al. 2002; Opalski et al. 2005). Nibau et al. (2006) have reviewed the role of RAC G-proteins in several other cellular metabolic pathways and in downstream signaling. The RAC/ROP proteins act as constitutive signaling molecules that regulate fungal invasion and the defense response (Hoefle et al. 2011; Huesmann et al. 2012). Mlo proteins are rich in leucine, which comprises 9–13 % of their amino acid residues (Singh et al. 2012). Proteins rich in leucine residues have been reported to contribute to defense response mechanisms (Buchanan and Gay 1996; Kędzierski et al. 2004; Torii 2004; Jung et al. 2004). Leucine-rich repeat receptor kinases play a vital role in signal transduction pathways through various protein–protein interactions (Li and Chory 1997; Forsthoefel et al. 2005; Kemmerling et al. 2007), and are involved in downstream defense-related pathways and development-related mechanisms (Osakabe et al. 2005).

The availability of the complete genome sequence of Arabidopsis thaliana facilitated the identification of 15 A. thaliana Mlo (AtMlo) genes distributed on all five chromosomes (Devoto et al. 2003). Subsequently, putative Mlo genes were described in important cereal crops such as rice (11 genes; Liu and Zhu 2008), sorghum (13 genes; Singh et al. 2012) and wheat (7 genes; Konishi et al. 2010) on the basis of bioinformatic analysis of genome sequences. Shen et al. (2012) surveyed the soybean genome sequence database (http://www.phytozome.net/soybean.php, Phytozome Release v4.0; Schmutz et al. 2010) for the MLO domain and identified 20 Glycine max L. Mlo (GmMlo) genes distributed on 13 of the 20 chromosomes. In the Phytozome database, the G. max genome sequence comprises 975 Mb distributed in 20 chromosomes, whereas the soybean genome sequence in the NCBI database contains 1,115 Mb organized in 20 chromosomes (http://www.ncbi.nlm.nih.gov/genome/?term=glycine+max, accessed 30/10/13). This difference suggested that the soybean genome might contain more Mlo family members than those described by Shen et al. (2012). We therefore searched the NCBI database for Mlo gene family members and detected 19 novel GmMlo members in addition to the 20 GmMlo genes already reported by Shen et al. (2012). These 39 GmMlo genes were distributed on 15 of the 20 chromosomes of G. max. The GmMlo members were analyzed for chromosomal location, gene structure and organization. In addition, a comparative phylogenetic analysis of AtMlo and GmMlo members was carried out to deduce their orthologous clustering. We also characterized the promoter regions of the 39 GmMlo genes to infer their putative functions, which may be helpful in designing experiments for their further characterization, identification and isolation.

Materials and methods

Identification of Mlo gene family members in G. max genome and their functional characterization

Because the 15 AtMlo genes have been structurally and functionally characterized, the sequences of the proteins they encode were used to identify Mlo gene family members in the soybean genome. The amino acid sequences of the AtMlo proteins were extracted from TAIR (The Arabidopsis Information Resource; http://www.arabidopsis.org/) and used as queries in tBLASTn (Altschul et al. 1990) to search for GmMlo homologs in the NCBI soybean genome database (http://www.ncbi.nlm.nih.gov/sites/entrez). Hits were retained if they showed 50 % or more query coverage and greater than 35 % identity. Redundant hits were then removed as follows. First, the NCBI accession numbers and chromosomal locations of the putative GmMlos were compared. When two or more putative GmMlos had the same NCBI accession number and/or chromosomal location, their amino acid sequences were compared, and if these also matched, only one of them was retained. Pfam (pfam.sanger.ac.uk/; Punta et al. 2012), an HMMER-based database, and InterProScan (Quevillon et al. 2005), which searches against secondary databases, were then used to confirm the presence of the signature MLO domain in the putative GmMlo sequences.
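The retention and de-duplication rules above can be expressed as a short filter. The following is a minimal Python sketch, not the authors' pipeline: the `Hit` record, its field names and the helper functions are hypothetical, and it assumes the tBLASTn output has already been parsed into such records. Treating two hits as the same locus only when their coordinates match exactly is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One parsed tBLASTn hit (hypothetical record layout)."""
    query: str             # AtMlo query protein
    accession: str         # NCBI accession of the soybean subject sequence
    chromosome: str
    start: int
    end: int
    query_coverage: float  # percent of the query covered by the alignment
    identity: float        # percent identity
    protein: str           # translated amino acid sequence of the hit


def passes_thresholds(hit, min_coverage=50.0, min_identity=35.0):
    # ">= 50 % query coverage" and "> 35 % identity", as stated in the Methods.
    return hit.query_coverage >= min_coverage and hit.identity > min_identity


def same_locus(a, b):
    # Assumption: "same chromosomal location" means identical coordinates.
    return (a.chromosome, a.start, a.end) == (b.chromosome, b.start, b.end)


def remove_redundant(hits):
    """Keep one hit when accession and/or locus match AND the proteins are identical."""
    kept = []
    for hit in hits:
        if not passes_thresholds(hit):
            continue
        is_duplicate = any(
            (hit.accession == k.accession or same_locus(hit, k))
            and hit.protein == k.protein
            for k in kept
        )
        if not is_duplicate:
            kept.append(hit)
    return kept
```

Hits from different AtMlo queries that fall on the same soybean sequence collapse to a single candidate, which would then be checked for the MLO domain with Pfam or InterProScan.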

To investigate the probable roles of GmMlo members in cellular functions, the 1,000-bp sequence upstream of each gene, representing its putative promoter region, was analyzed for cis-acting regulatory elements. These upstream regions were analyzed using PlantCARE (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/; Lescot et al. 2002) and PLACE (http://www.dna.affrc.go.jp/PLACE/; Higo et al. 1999), which detect and identify cis-regulatory sequences that act as binding sites for regulatory proteins involved in the regulation of gene expression.
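The promoter step can be illustrated with a small Python sketch that extracts the 1,000-bp upstream region on the correct strand and scans it for exact motif matches on both strands. It is a stand-in for PlantCARE and PLACE under stated assumptions, not a reimplementation: coordinates are taken as 0-based and half-open, degenerate IUPAC codes are not handled, and the two consensus strings in the hypothetical `ILLUSTRATIVE_MOTIFS` table are examples only; exact element definitions should be taken from the databases themselves.

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.upper().translate(_COMPLEMENT)[::-1]


def upstream_region(chrom_seq, start, end, strand, length=1000):
    """Return up to `length` bp 5' of a gene, oriented 5'->3' on the gene's strand.

    `start`/`end` are 0-based, half-open forward-strand gene coordinates (assumption).
    """
    if strand == "+":
        return chrom_seq[max(0, start - length):start].upper()
    return reverse_complement(chrom_seq[end:end + length])


def find_sites(region, motif):
    """Exact-match sites of an A/C/G/T motif on both strands, as sorted (position, strand)."""
    sites = set()
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        # The zero-width lookahead lets overlapping matches be reported.
        for m in re.finditer(f"(?={pattern})", region):
            sites.add((m.start(), strand))
    return sorted(sites)


# Hypothetical, illustrative consensus strings; use PlantCARE/PLACE definitions in practice.
ILLUSTRATIVE_MOTIFS = {"Skn-1 motif": "GTCAT", "TGACG element": "TGACG"}


def scan_promoter(region, motifs=ILLUSTRATIVE_MOTIFS):
    """Map each motif name to its sites, omitting motifs with no match."""
    result = {}
    for name, motif in motifs.items():
        sites = find_sites(region, motif)
        if sites:
            result[name] = sites
    return result
```

Scanning both strands matters because cis-elements are generally orientation-independent, so a reverse-strand hit is reported with a "-" label rather than missed.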

Topological configuration of GmMlo proteins and their gene organization

TM helices of the putative GmMlo proteins were predicted using the HMMTOP server 2.0 (http://www.enzim.hu/hmmtop/; Tusnády and Simon 1998, 2001), which provides both the localization of each helix and the topological arrangement of the TM domains. Sub-cellular localization of all the GmMlo proteins was predicted with the support vector machine-based CELLO v.2.5 (http://cello.life.nctu.edu.tw/; Yu et al. 2006), and their physico-chemical properties were calculated using the ProtParam server (http://web.expasy.org/protparam/; Gasteiger et al. 2005). Gene structure was studied using two HMM-based gene prediction tools, Fgenesh (http://linux1.softberry.com/; Solovyev et al. 2006) and GENSCAN (http://genes.mit.edu/GENSCAN.html; Burge and Karlin 1997). The number of exons was determined from the GNOMON annotations used in the NCBI soybean database. The exon–intron organization of the 39 GmMlo genes was displayed using the Gene Structure Display Server (GSDS; http://gsds.cbi.pku.edu.cn/; Guo et al. 2007).
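HMMTOP predicts TM helices with a hidden Markov model. As a simpler point of comparison, the classic Kyte–Doolittle sliding-window hydropathy heuristic (19-residue window, mean hydropathy above 1.6) is sketched below in Python, together with the GRAVY value that ProtParam reports. This is a different and cruder technique than HMMTOP, its segment counts should not be expected to match the HMMTOP results, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean Kyte-Doolittle value (standard residues only)."""
    return sum(KD[aa] for aa in seq) / len(seq)


def hydrophobic_segments(seq, window=19, threshold=1.6):
    """Candidate TM segments as 0-based, half-open (start, end) spans.

    Runs of consecutive windows whose mean hydropathy exceeds `threshold`
    are merged into one segment.
    """
    segments = []
    previous_passed = False
    for i in range(len(seq) - window + 1):
        passed = sum(KD[aa] for aa in seq[i:i + window]) / window > threshold
        if passed and previous_passed:
            segments[-1] = (segments[-1][0], i + window)  # extend the current run
        elif passed:
            segments.append((i, i + window))               # start a new run
        previous_passed = passed
    return segments
```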

Construction of phylogenetic trees and motif study

Multiple sequence alignment of the amino acid sequences of the 39 GmMlo proteins was performed using ClustalW (Thompson et al. 1994) with default parameters, and the alignment was visualized with CLC Sequence Viewer 6.8.1 (http://www.clcbio.com), which displays each amino acid in a different color. Phylogenetic trees were constructed in MEGA 5.0 using the maximum likelihood estimation (MLE) and neighbor-joining (NJ) methods with 1,000 bootstrap replications. The MEME package (http://meme.nbcr.net/; Bailey and Elkan 1994; Bailey et al. 2009) was used to identify conserved motifs in the GmMlo proteins. Geneious Pro 3.5.6 (http://www.geneious.com/; Drummond et al. 2008) was used to correlate the phylogenetic tree with the alignment, so that inferences could be drawn simultaneously from the tree and the conserved motifs and general conclusions reached about motif variation and member diversification. The phylogenetic tree in Geneious Pro was constructed by the NJ method with 1,000 bootstrap replications, based on the Jukes–Cantor genetic distance model (the only model available in Geneious Pro). Synteny between selected pairs of chromosomal regions containing groups of GmMlo genes was analyzed using the multiple-genome comparison and alignment tool (MGCAT; http://alggen.lsi.upc.es/recerca/align/mgcat/; Treangen and Messeguer 2006).
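The distance-based part of this workflow (Jukes–Cantor distances followed by neighbor-joining) can be sketched in Python as follows. This is a minimal illustration, not the MEGA or Geneious Pro implementation: gaps are handled by pairwise deletion (an assumption), the Jukes–Cantor correction is generalized to 20 amino acid states, and bootstrap resampling is omitted. The functions assume at least three sequences and finite (unsaturated) distances.

```python
import math
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites; columns with a gap in either sequence are skipped."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p, states=20):
    """Jukes-Cantor correction generalized to `states` character states (20 for proteins)."""
    b = (states - 1) / states
    if p >= b:
        return math.inf  # saturated: the correction is undefined
    return -b * math.log(1 - p / b)


def distance_matrix(aligned):
    """aligned: dict mapping name -> aligned sequence (all of equal length)."""
    D = {name: {} for name in aligned}
    for a, b in combinations(aligned, 2):
        D[a][b] = D[b][a] = jukes_cantor(p_distance(aligned[a], aligned[b]))
    return D


def neighbor_joining(D):
    """Neighbor-joining on a symmetric dict-of-dicts matrix; returns unrooted Newick."""
    D = {a: dict(row) for a, row in D.items()}  # work on a copy
    nodes = list(D)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(D[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair that minimizes the NJ Q criterion.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D[p[0]][p[1]] - r[p[0]] - r[p[1]])
        li = 0.5 * D[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = D[i][j] - li
        u = f"({i}:{li:.3f},{j}:{lj:.3f})"  # new internal node, named by its subtree
        D[u] = {}
        for k in nodes:
            if k not in (i, j):
                D[u][k] = D[k][u] = 0.5 * (D[i][k] + D[j][k] - D[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b, c = nodes  # join the last three nodes at a central node
    la = 0.5 * (D[a][b] + D[a][c] - D[b][c])
    lb = 0.5 * (D[a][b] + D[b][c] - D[a][c])
    lc = 0.5 * (D[a][c] + D[b][c] - D[a][b])
    return f"({a}:{la:.3f},{b}:{lb:.3f},{c}:{lc:.3f});"
```

Bootstrap support would be obtained by rebuilding the distance matrix and tree from alignments whose columns are resampled with replacement, then counting how often each grouping recurs.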

Results

Mlo gene identification in G. max genome and analysis of their cis-acting elements

The amino acid sequences of the 15 AtMlo proteins were used independently as BLAST queries to capture every possible GmMlo homolog in the soybean genome sequence available in the NCBI database. A total of 39 GmMlo members were identified by this genome-wide scan using BLAST Entrez queries. The nucleotide sequences corresponding to the predicted GmMlo proteins were obtained using tBLASTn. The predicted GmMlo genes were numbered 1–20 according to their correspondence with the GmMlo family members reported by Shen et al. (2012); the remaining 19 genes were numbered sequentially from 21 to 39 according to their chromosomal location (Table 1; Supplementary Table 1). The 39 GmMlo genes were present on 15 of the 20 G. max chromosomes; chromosomes 5, 7, 14, 17 and 18 carried no GmMlo members. Chromosome 12 had the largest number of Mlo genes (six), followed by chromosome 6 (five); chromosomes 8, 9, 10, 19 and 20 had one gene each, while the remaining eight chromosomes carried two to four genes each (Fig. 1). GmMlo genes located on the same chromosome were generally scattered, but six gene pairs and two groups of three genes were located relatively close to each other and appeared to form clusters. The distances between the presumed transcribed regions of these gene pairs ranged from 11,725 bp (GmMlo8 and 27) to 709,271 bp (GmMlo11 and 29) (Supplementary Table 1).
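The chromosome tallies and inter-gene distances reported above can be derived from a table of gene coordinates. In the Python sketch below, the function names are hypothetical, chromosomes are assumed to be numbered with integers, and the gap is taken as the number of bases between the end of one gene and the start of the next (1-based, inclusive coordinates); the definition used in Supplementary Table 1 may differ.

```python
from collections import Counter, defaultdict


def genes_per_chromosome(genes):
    """genes: iterable of (name, chromosome, start, end) tuples."""
    return Counter(chrom for _, chrom, _, _ in genes)


def empty_chromosomes(genes, n_chromosomes=20):
    """Chromosomes (numbered 1..n) that carry none of the genes."""
    occupied = set(genes_per_chromosome(genes))
    return [c for c in range(1, n_chromosomes + 1) if c not in occupied]


def adjacent_gaps(genes):
    """(upstream gene, downstream gene, gap in bp) for neighboring genes on each chromosome.

    Coordinates are assumed 1-based and inclusive; overlapping genes give a negative gap.
    """
    by_chrom = defaultdict(list)
    for name, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, name))
    gaps = []
    for chrom in sorted(by_chrom):
        ordered = sorted(by_chrom[chrom])  # sort by start coordinate
        for (_, end1, name1), (start2, _, name2) in zip(ordered, ordered[1:]):
            gaps.append((name1, name2, start2 - end1 - 1))
    return gaps
```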

Table 1 List of identified GmMlo members in Glycine max
Fig. 1

Physical map of the 39 GmMlo members on 15 chromosomes of G. max. The length of each chromosome bar is approximately proportional to the size of that chromosome. Each horizontal line on a chromosome marks the location of a GmMlo gene, labeled with its serial number. Chromosomes 5, 7, 14, 17 and 18 do not carry any GmMlo gene

Analysis of the 1,000-bp upstream regions of the GmMlo genes revealed elements involved in hormonal response (ABRE, AuxRR-core, ERE, GARE-motif, P-box, TATC-box, TGA-element, TCA element, GCTCA element and TGACG element), tissue-specific gene expression (shoot-specific: as-2-box; seed-specific: RY-element, GCN4_motif and Skn-1 motif), developmental regulation (circadian, HD-Zip, CCAAT-box and Nodule-site 2 elements), elicitor response (AT-rich region and Box-W1 elements), and defense and abiotic stress responses (TC-rich repeats) (Supplementary Tables 2a–2d). Elements involved in wound response (WUN motif) and in response to abiotic factors such as light (MRE element), anoxia (ARE and GC motif elements), heat shock (HSE element), low temperature (LTR element) and drought (MBS element) were also present. Overall, 32 of the 39 GmMlo genes had at least one hormone response element, and two genes (GmMlo15 and GmMlo24) had four different hormone response elements each. Similarly, 22 genes had an element for seed-specific expression, and GmMlo21 had all three seed-specific elements. Thirteen genes had pathogen elicitor response elements, and a similar number of GmMlos had elements associated with developmental roles. Finally, 33 of the GmMlos had at least one abiotic stress response element; most of these genes had more than one type of response element, and GmMlo8 and GmMlo39 had five different response elements each. Among individual motifs, the seed-specific Skn-1 motif was present in 19 genes and the AT-rich repeat motif in 17 genes, while HD-Zip, Nodule-site 2, AuxRR-core, P-box and GC motif were each detected in only one gene.

Mlo protein domain characterization and gene organization

The TM helices, their locations in the polypeptide, and the topological arrangement of the GmMlo proteins were studied using HMMTOP server 2.0. The number of TM domains varied from four in GmMlo37 to ten in GmMlo1, GmMlo11, GmMlo17 and GmMlo28. However, 18 of the 39 GmMlo proteins had seven TM domains, the characteristic number for these proteins (Table 1). All the GmMlo proteins were predicted to be localized in the plasma membrane (Kim and Hwang 2012), with the N-terminus projecting outward. The ProtParam analysis of physico-chemical properties indicated that 35 of the GmMlo proteins were rich in leucine, which comprised from 8.7 % (GmMlo8) to 13.1 % (GmMlo26) of the total amino acid residues; in 25 of these proteins the leucine content was 10 % or higher. By contrast, GmMlo9, GmMlo25, GmMlo29, GmMlo32 and GmMlo33 were serine-rich, with a serine content of 9.2 % (GmMlo33) to 11.9 % (GmMlo9). The molecular weights of the GmMlo proteins ranged from 33,325.5 to 78,850.5 Da, their theoretical pI from 6.87 to 9.66, their aliphatic indices from 84.14 to 102.16 and their grand average of hydropathicity (GRAVY) from −0.187 to 0.185 (Supplementary Table 3).
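The composition-based descriptors above follow standard formulas. The Python sketch below computes mole-percent composition and the Ikai aliphatic index used by ProtParam; treating a protein as "X-rich" when X is its most abundant residue is an assumption made for this sketch, since the criterion is not stated here, and the function names are hypothetical. Biopython's `Bio.SeqUtils.ProtParam.ProteinAnalysis` class offers comparable calculations, including molecular weight and pI.

```python
from collections import Counter


def composition(seq):
    """Mole percent of each residue in a protein sequence."""
    counts = Counter(seq.upper())
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def aliphatic_index(seq):
    """Ikai aliphatic index: X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), X in mole percent."""
    x = composition(seq)
    return x.get("A", 0.0) + 2.9 * x.get("V", 0.0) + 3.9 * (x.get("I", 0.0) + x.get("L", 0.0))


def most_abundant(seq):
    """(residue, mole percent) of the most frequent residue; ties go to the first seen."""
    return max(composition(seq).items(), key=lambda kv: kv[1])
```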

GmMlo gene size ranged from 3,485 bp (GmMlo27) to 18,992 bp (GmMlo28), and 28 of the GmMlo genes were over 5,000 bp long (Table 1). The organization of the GmMlo genes as displayed by GSDS is depicted in Supplementary Fig. 1. The number of exons varied from 8 (GmMlo36) to 18 (GmMlo34); most GmMlo genes (24) had 15 exons, which seems to be a common feature of the Mlo gene family (Devoto et al. 2003; Singh et al. 2012). Seven GmMlo members had 13 exons and four had 14 exons, while 18, 16, 10 and 8 exons were detected in one gene each. The length of the GmMlo proteins predicted using Fgenesh and GENSCAN ranged from 288 amino acids (GmMlo36) to 676 amino acids (GmMlo34), while the Mlo domain predicted through Pfam ranged from 256 (GmMlo36) to 619 (GmMlo34) amino acid residues (Table 1).
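Gene length, exon number and intron sizes of the kind shown in GSDS diagrams can be summarized from exon coordinates. The Python sketch below assumes 1-based, inclusive coordinates and defines gene length as the span from the first exon start to the last exon end; the function names are hypothetical.

```python
from collections import Counter


def gene_structure(exons):
    """Summarize one gene from its exon (start, end) coordinates (1-based, inclusive).

    Returns the span from first exon start to last exon end, the exon count
    and the intron lengths, i.e. the quantities a GSDS-style diagram depicts.
    """
    ordered = sorted(exons)
    introns = [start2 - end1 - 1 for (_, end1), (start2, _) in zip(ordered, ordered[1:])]
    return {
        "gene_length": ordered[-1][1] - ordered[0][0] + 1,
        "exon_count": len(ordered),
        "intron_lengths": introns,
    }


def exon_count_distribution(genes):
    """genes: dict mapping gene name -> list of exon (start, end) tuples."""
    return Counter(len(exons) for exons in genes.values())
```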

Sequence alignment and phylogenetic analysis

The multiple sequence alignment generated with ClustalW and visualized with CLC Sequence Viewer revealed seven TM regions (Fig. 2). Only TM3 was present in all the GmMlo members; each of the other six TM domains was missing or disrupted in one (TM4), two (TM2, TM5), three (TM1, TM7) or four (TM6) of the 39 GmMlos. Phylogenetic trees were constructed separately for the Mlo genes from Arabidopsis (Fig. 3a) and soybean (Fig. 3b), and a combined tree was constructed for the Arabidopsis and soybean Mlo genes together (Fig. 3c). The MLE method generated dendrograms similar to those produced by the NJ method (NJ trees not shown), with only minor variations in the clustering pattern. Such variation is expected because the two methods group sequences on different bases: NJ relies on a distance matrix, whereas MLE is based on an explicit evolutionary model. The MLE trees were therefore preferred for inferring evolutionary relationships among the Mlo genes (Fig. 3a–c). The trees for AtMlos (Fig. 3a), GmMlos (Fig. 3b), and the combined AtMlos and GmMlos (Fig. 3c) each had three major clusters (clusters I, II and III). Cluster I was the largest and was subdivided into two sub-clusters (Ia and Ib), so that each tree had a total of four sub-clusters. In A. thaliana, cluster Ia had five genes (AtMlo9, 10, 5, 7, 8), cluster Ib had four genes (AtMlo3, 12, 2, 6), and clusters II and III had three genes each (AtMlo1, 15, 13, and AtMlo4, 11, 14, respectively). Similarly, in G. max, cluster Ia had 17 genes, cluster Ib had 10 genes, and clusters II and III had five and seven genes, respectively. Cluster Ia of the combined tree comprised cluster Ib of the AtMlos together with cluster Ia and two members (GmMlo6, 8) of cluster Ib of the GmMlos. Cluster Ib of the combined tree consisted of cluster Ia of the AtMlos and the remaining eight members of cluster Ib of the GmMlos. By contrast, clusters II and III of the combined tree simply combined the members of the corresponding AtMlo and GmMlo clusters. Thus, the combined analysis changed the clustering pattern only slightly: GmMlo6 and GmMlo8 grouped with cluster Ia in the combined analysis, whereas they were placed in cluster Ib when the GmMlos were analyzed separately.

Fig. 2

Multiple sequence alignment of the 39 predicted GmMlo proteins. The alignment generated with ClustalW was visualized in CLC Sequence Viewer. Red line blocks indicate the seven transmembrane regions of the GmMlo proteins. The last block shows the calmodulin-binding domain (CaMBD), an essential characteristic of the Mlo gene family. Note that GmMlo22 and 36 lack TM5, 6 and 7 as well as the CaMBD. The CaMBD region shows a strongly conserved lysine-rich stretch (color figure online)

Fig. 3

Phylogenetic classification of the AtMlo and GmMlo proteins generated using the maximum likelihood estimation (MLE) method with 1,000 bootstrap replicates. Arabidopsis Mlos carry the ‘At’ prefix; GmMlo stands for Glycine max Mlo. a AtMlo members, forming three major clusters. b GmMlo members, forming three major clusters. c Combined AtMlo and GmMlo classification. All the classifications show three major clusters (marked with bold colored lines); cluster I is divided into two sub-clusters, shown in the same color as the main cluster. Cluster I shows a higher degree of evolutionary expansion than the other clusters in all the linearized trees. The GmMlo members show duplication events that might have occurred after segmental deletion of an ancestral gene, as also evident from the Geneious Pro visualization (color figure online)

MEME analysis identified 27 conserved motifs with widths between 40 and 70 residues (Fig. 4a). Motif 2 was present in all the GmMlo proteins and contained three virtually invariant residues (serine, lysine and tryptophan) (Fig. 4b), while motif 27 was present only in GmMlo11. Figure 4c gives the amino acid sequences of the motifs. The alignment and the phylogenetic tree were correlated using Geneious Pro (Supplementary Fig. 2). Supplementary Fig. 2 shows that GmMlo34 had three additional motifs, viz., motifs 19, 21 and 22 (as shown in the MEME block diagram). In contrast, GmMlo22 and GmMlo36 lacked some motifs (motifs 1, 3 and 6) present in the other GmMlos. Motif 13 was highly conserved between GmMlo5 and 27, differing by a single amino acid substitution (motif not shown). Similarly, motifs 16 and 18 were highly conserved between GmMlo21 and 29, and between GmMlo10 and 20, respectively.
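Statements such as "a single amino acid substitution" between motif instances, or "virtually invariant" residues within a motif, reduce to simple position-wise checks on aligned motif sequences. The Python sketch below is illustrative only; the function names are hypothetical, and in practice the inputs would be the MEME motif instances themselves.

```python
def substitutions(a, b):
    """Positions where two equal-length motif instances differ: (index, residue_a, residue_b)."""
    if len(a) != len(b):
        raise ValueError("motif instances must have the same length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def invariant_positions(instances):
    """Indices at which every aligned motif instance carries the same residue."""
    return [i for i, column in enumerate(zip(*instances)) if len(set(column)) == 1]
```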

Fig. 4

a Combined block diagram of the conserved motifs of the 39 predicted GmMlo proteins. b Motif 2 is conserved in all the GmMlo proteins. c Conserved motif sequences generated using MEME

Discussion

Features of the Mlo gene family

The Mlo gene family is one of the largest families of genes encoding seven-TM-domain proteins in plants. Mlo genes are known to show constitutive as well as tissue-specific and development-related expression. Mlo proteins function primarily in the defense response to the powdery mildew pathogen and in programmed cell-death pathways (Peterhänsel et al. 1997), but they are also implicated in various other functions, including responses to abiotic stresses, growth regulators and wounding. This study was therefore initiated to identify and characterize the Mlo genes present in the soybean genome. We used the 15 AtMlo proteins as queries for a homology search of the soybean genome sequence in the NCBI database and identified 39 members of the Mlo gene family. The GmMlo proteins showed structural features and cellular localization similar to those of the AtMlos, and the GmMlo genes had exon–intron boundaries similar to those of the AtMlo genes. Further, physico-chemical properties of the GmMlo proteins, such as molecular weight, theoretical pI and amino acid composition, were broadly comparable to those of the AtMlos. Most of the GmMlo proteins were leucine-rich, in agreement with the findings for SbMlo (Sorghum bicolor Mlo) proteins (Singh et al. 2012), but GmMlo9, 25, 29, 32 and 33 were serine-rich rather than leucine-rich.

In this study, 19 GmMlo genes were identified in addition to the 20 genes that Shen et al. (2012) reported from an in silico analysis of the soybean genome sequence in the Phytozome database (http://www.phytozome.net/soybean.php). When we searched the soybean genome sequence available in the Phytozome database (as of 30/10/2013) using the search method of Shen et al. (2012), we identified a total of 35 GmMlo members. A similar result was obtained when the Phytozome database was searched using AtMlo8, the longest AtMlo protein (593 amino acid residues), as the query sequence. However, when AtMlo3 (556 amino acid residues) was used as the query, 39 GmMlo members were detected, although four of them had only partial gene sequences (incomplete proteins with as few as 80 amino acid residues). Further, a search of the soybean genome sequence in the NCBI database using the method of Shen et al. (2012) identified 39 GmMlo members. Thus, the discovery of 19 additional GmMlo members in the present study appears to be due to the larger and more complete G. max genome sequence available in the NCBI database (1,115 Mb organized in 20 chromosomes) compared with that in the Phytozome database (975 Mb distributed in 20 chromosomes).

The 39 GmMlo genes covered 0.0897 % of the soybean genome and were distributed on 15 of the 20 G. max chromosomes. Most GmMlo members occurred as singletons, and only some were present in groups of two or three; the members of most gene groups were separated by 20 kb or less. The physical map suggested similarities between some pairs of gene groups: for example, the distance between the members of the gene pairs GmMlo1-GmMlo21 and GmMlo11-GmMlo29 was virtually identical (0.7 Mb). This suggested that each such pair of GmMlo gene groups might have originated by duplication of a single parental genomic region, coupled with translocation of one gene group to a different chromosome. We therefore used MGCAT to investigate synteny between four pairs of such genomic regions: the regions carrying gene groups GmMlo1-GmMlo21 and GmMlo11-GmMlo29, GmMlo2-GmMlo23 and GmMlo18-GmMlo19, GmMlo30-GmMlo31 and GmMlo15-GmMlo34, and GmMlo4-GmMlo5-GmMlo6 and GmMlo7-GmMlo8-GmMlo27 (Fig. 5). In three of the four comparisons (chromosome 1 with 11, 4 with 6, and 12 with 13), both the genic and the intergenic regions showed extensive synteny at the DNA sequence level, whereas between chromosomes 2 and 16 a large (~14 kb) insertion/deletion was detected. These findings support the suggestion that these four pairs of genomic regions originated from duplication of four parental genomic segments. G. max is a paleopolyploid species that has undergone two genome duplication events, about 59 and 13 million years ago, each followed by gene diversification and loss and by extensive chromosomal rearrangements (Schmutz et al. 2010). The GmMlo gene duplications described above might have occurred about 13 million years ago, during the second genome duplication event, since the intergenic sequences concerned have not greatly diverged. Schmutz et al. (2010) detected homologous genomic regions between all the pairs of chromosomes investigated in this study. Shen et al. (2012) proposed that segmental duplication is the chief mechanism by which the number of Mlo genes has increased, and the present findings are consistent with this suggestion.
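MGCAT aligns the candidate regions directly. As a rough, alignment-free proxy for sequence-level similarity between two putatively duplicated regions, one can compute the fraction of shared k-mers on either strand, as in the Python sketch below. This is a different and much cruder technique than MGCAT: it ignores collinearity, the function names are hypothetical, and the default k = 11 is an arbitrary choice.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of distinct k-mers in a DNA sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_fraction(region_a, region_b, k=11):
    """Fraction of distinct k-mers of region_a also found in region_b on either strand."""
    a = kmers(region_a.upper(), k)
    b_seq = region_b.upper()
    b = kmers(b_seq, k) | kmers(reverse_complement(b_seq), k)
    return len(a & b) / len(a) if a else 0.0
```

A value near 1 for a pair of regions, together with conserved gene order, would be consistent with a recent segmental duplication; a large indel such as the one between chromosomes 2 and 16 would lower the score without ruling out common ancestry.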

Fig. 5

Synteny between GmMlo-containing genomic regions of pairs of G. max chromosomes analyzed by MGCAT. a Chromosome 1 (GmMlo1, 21) and chromosome 11 (GmMlo11, 29). b Chromosome 2 (GmMlo2, 23) and chromosome 16 (GmMlo18, 19). c Chromosome 6 (GmMlo7, 8, 27) and chromosome 4 (GmMlo4, 5, 6). d Chromosome 13 (GmMlo15, 34) and chromosome 12 (GmMlo30, 31). The syntenic gene sets show collinear gene order between pairs of chromosomes, suggesting that these syntenic arrangements on different chromosomes resulted from common ancestry rather than random shuffling

Mlo proteins are reported to have 6 or 7 hydrophobic domains with the potential to form TM helices, a feature shared by the Mlo proteins of all plant species investigated so far. The N-terminus of Mlo proteins is extracellular, while the C-terminus is exposed in the cytoplasm. The amino acid sequence variation, topology and plasma membrane localization of Mlo proteins are reminiscent of the seven-TM-domain G-protein-coupled receptors of metazoa (Devoto et al. 1999). The GmMlo proteins contained 4–10 TM domains: 8 proteins had fewer than seven TM domains and 13 had more than seven, suggesting that exactly seven TM domains may not be essential for the function of these genes. Multiple sequence alignment of the GmMlo sequences showed that every TM domain except TM3 was disrupted in one or more of the proteins. However, TM domains 1, 2, 4, 5, 6 and 7 were disrupted or absent in one or more of the five GmMlo proteins shorter than 450 amino acids, which are presumably truncated versions of Mlo proteins. Among the longer proteins, only TM6 was disrupted, in one GmMlo protein 496 amino acids long. Thus, all the TM domains except TM6 seem to be conserved in the GmMlo proteins and appear to be needed for their function. The basic biochemical function of Mlo proteins is not well understood. The disease-susceptibility response mediated by the wild-type Mlo protein does not involve heterotrimeric G-protein signaling. It is possible, however, that these proteins function as G-protein-coupled receptors in processes other than the pathogen response. Alternatively, Mlo proteins may function as cell-surface receptors in a signaling cascade that does not involve the G-protein complex (Lorek et al. 2010).

The CaMBD was completely missing from two of the truncated proteins (GmMlo22, 36) but was present in the remaining 37 proteins. Shen et al. (2012) reported that GmMlo6 did not have a detectable CaMBD, and suggested that it may be either a special type of Mlo gene or a pseudogene. In this study, GmMlo6 had a CaMBD similar to those of the other family members, and the GmMlo6 protein identified from the soybean genome sequence currently available in the Phytozome database also had a regular CaMBD. Shen et al. (2012) therefore most likely failed to detect a CaMBD in GmMlo6 because the gene sequence in the Phytozome database was incomplete at the time of their search. The C-terminal region of the CaMBD was much less conserved than its N-terminal portion, suggesting that the latter might be necessary for Mlo protein function, especially in the disease response. The CaMBD mediates binding of the Mlo protein to calmodulin, which seems to be necessary for Mlo function in powdery mildew susceptibility (Kim et al. 2002a, b; Lecourieux et al. 2006); when Mlo proteins are unable to bind calmodulin, a partially resistant response to powdery mildew is observed (Kim et al. 2002a, b). Because GmMlo22 and GmMlo36 lack a CaM-binding domain, they are unlikely to participate in the host response to powdery mildew or in regulated cell death (Kim et al. 2002a, b; Bai et al. 2008). However, GmMlo36 does carry an AT-rich element for fungal elicitor response in its promoter region. It is possible that GmMlo36 was originally involved in responses to biotic stress caused by fungal pathogens, and that loss of the CaMBD has since impaired this function.
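The comparison of N-terminal and C-terminal conservation within the CaMBD can be quantified with a simple per-column score. The Python sketch below scores each alignment column by the fraction of non-gap residues matching the most common residue and averages the scores over each half of the aligned block; this scoring scheme and the function names are assumptions for illustration, not necessarily how conservation was judged in this study. Note that a column with a single non-gap residue scores 1.0, so gap-rich blocks should be interpreted with care.

```python
from collections import Counter


def column_conservation(column):
    """Fraction of non-gap residues in an alignment column equal to the commonest residue."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    return Counter(residues).most_common(1)[0][1] / len(residues)


def half_conservation(block):
    """Mean conservation of the N-terminal and C-terminal halves of an aligned block.

    block: equal-length aligned sequences spanning the region of interest
    (at least two columns).
    """
    scores = [column_conservation(col) for col in zip(*block)]
    mid = len(scores) // 2
    return sum(scores[:mid]) / mid, sum(scores[mid:]) / (len(scores) - mid)
```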

The putative promoter regions of the GmMlo genes had a variety of response elements, including seed-specific, nodule-site and circadian-rhythm elements, suggesting that these genes are involved in several developmental and stress response pathways. Mlo genes are reported to participate in programmed cell death and in responses to the powdery mildew pathogen (Kim et al. 2002b; Bai et al. 2008), wounding, auxins (Piffanelli et al. 2002; Feechan et al. 2009) and abiotic stresses (Shen et al. 2012). In general, paralogous members of the family, as indicated by their clustering pattern (Fig. 3b), did not share similar 5′ cis-acting response elements. For example, the paralogs GmMlo17, 22 and 28 did not share response elements (Supplementary Table 2). Thus, paralogous members of the GmMlo gene family share the ‘signature motif’ of the family, but their functions seem to have diverged during evolution.
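Whether paralogs share promoter elements can be expressed as a set-overlap score. The Python sketch below uses the Jaccard index on per-gene sets of element names; the scoring choice and the function names are assumptions, and any element profiles passed to it in an example are hypothetical rather than values from Supplementary Table 2.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two element sets (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_sharing(elements_by_gene, genes):
    """Jaccard similarity of promoter element sets for every pair of the given genes."""
    return {
        (g1, g2): jaccard(elements_by_gene[g1], elements_by_gene[g2])
        for g1, g2 in combinations(genes, 2)
    }
```

A score of 0.0 for every pair within a paralog group would correspond to the "did not share response elements" observation above.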

Motif and comparative phylogenetic study

Geneious Pro offered a convenient way to correlate the phylogenetic classification with the conserved motif study. Comparative phylogenetic analysis grouped the 39 GmMlo members into four sub-clusters, with a tree topology similar to that of the AtMlo members. The AtMlo and GmMlo members thus appear to have diverged along similar evolutionary lines, and their sequence homology and conservation suggest that they might have similar functions. The clustering pattern of the 20 GmMlos studied by Shen et al. (2012), together with the Mlo family members of eight other plant species including A. thaliana, was essentially comparable with the grouping of GmMlos obtained in this study (Fig. 3b). Further, the tree topology of the GmMlos (Fig. 3b) suggests that several rounds of duplication, leading to the present 39 GmMlos, occurred at different times during the evolution of G. max. There appear to have been four different gene duplication events in clusters II and III, and seven and eight duplication events in sub-clusters Ib and Ia, respectively.

Different GmMlos seem to have evolved by different mechanisms. For example, GmMlo17 and GmMlo22 are very closely related, but GmMlo22 is truncated while GmMlo17 is a full-length protein; these two genes probably originated from a recent duplication event followed by a large deletion in GmMlo22. Similarly, GmMlo35 and GmMlo37 appear to have originated from a duplication event followed by a deletion in GmMlo37. The additional motifs at the C-terminus of GmMlo34 suggest that this gene has lengthened through duplication of the corresponding motif sequences. Overall, both gain and loss of motifs were observed among the GmMlo members.

This in silico study identified and characterized members of the Mlo gene family in the G. max genome. Mlo proteins act as modulators of infection by the powdery mildew pathogen, which is reported to attack a large number of crop species. Homology search using AtMlos as queries identified 39 GmMlo genes located on 15 of the 20 G. max chromosomes. The analysis also suggested that these proteins participate in several other functions, including responses to abiotic stresses and developmental pathways. The GmMlo members seem to be involved in various cellular processes and to exhibit tissue- and development-specific expression, and many of them are likely to be involved in responses to fungal pathogens. These findings may be helpful in developing strategies for breeding powdery mildew resistance in soybean, and for the functional validation and isolation of individual GmMlo genes.