Introduction

Soybean (Glycine max) is one of the most important agricultural crops in the USA and around the world. Because soybean seeds contain a high percentage of protein (40%) and oil (20%), soybean is considered the most nutritious crop, and its seeds are processed into a variety of food products, such as soybean milk and tofu. Recently, soybean has also been adopted as a potential source of biofuels. The widespread agricultural use of soybeans and the demand for increased production will require the development of cultivars with higher yields and improved resistance to environmental stressors. Although progress has been made, several critical problems remain, including disease resistance and the need for increased yield. Newly discovered microRNAs (miRNAs) may play important roles in soybean development, nitrogen fixation, and the response to abiotic and biotic stresses.

The miRNAs are a class of small regulatory RNAs that negatively regulate gene expression at the post-transcriptional level by binding target mRNAs, leading to mRNA cleavage or inhibition of mRNA translation (Zhang et al. 2006c). Many investigations have shown that miRNAs play an important role in a variety of biological and metabolic processes in plants and animals (Carrington and Ambros 2003; Ambros and Chen 2007; Zhang et al. 2007a). In plants, miRNAs function to control tissue (leaf, root, stem, and flower) differentiation and development, phase switching from vegetative growth to reproductive growth, signal transduction, and the response to biotic and abiotic stress (e.g., salinity, drought, and pathogens) (Chen 2005; Zhang et al. 2006c). Since the first miRNAs were discovered in plants in 2002 (Park et al. 2002; Reinhart et al. 2002), several hundred miRNAs have been identified in plants by computational and experimental approaches (Zhang et al. 2006e). A catalog of plant miRNAs includes 184 from Arabidopsis thaliana, 269 from rice, 234 from Populus trichocarpa (Griffiths-Jones 2004; Griffiths-Jones et al. 2006), and 188 from maize (Zhang et al. 2006a). However, very little is known about miRNAs in soybean despite its agricultural and economic significance (Zhang et al. 2005, 2006b; Subramanian et al. 2008; Sunkar and Jagadeeswaran 2008).

Comparative genomics across vastly divergent taxa has shown that many miRNAs are highly evolutionarily conserved from species to species, ranging from moss to higher flowering eudicot species in the plant kingdom (Floyd and Bowman 2004; Zhang et al. 2006b) and from worms to humans in the animal kingdom (Pasquinelli et al. 2000, 2003; Altuvia et al. 2005). The extensive evolutionary conservation of miRNAs provides a powerful approach to their identification using comparative genomics. Using this strategy, we recently developed an expressed sequence tag (EST) and a genome survey sequence (GSS) approach to identify miRNAs (Zhang et al. 2005; Pan et al. 2007). By using this approach, we have successfully identified more than 600 miRNAs in 71 plant species, including several important crops, such as wheat, tomato, tobacco, cotton and maize (Zhang et al. 2005, 2006a, b, 2007b). This approach has also been employed by other scientists to identify miRNAs in other plant species (Guo et al. 2007; Xie et al. 2007; Gleave et al. 2008). There are several significant advantages of using EST analysis for identifying miRNAs: (1) EST analysis can be employed to identify conserved miRNAs not only in model species, whose genomes have been published, but also in species for which only EST sequences have been determined; (2) EST analysis provides direct evidence for miRNA expression that cannot be inferred from genomic sequence surveys, since ESTs are derived from transcribed sequences (mRNA) (Adams et al. 1991; Matukumalli et al. 2004); (3) miRNA identification using EST analysis can be conducted without specialized software using the BLASTn search algorithm and so is readily available for widespread use (Altschul et al. 1997). Although several computational programs have been developed for predicting miRNAs, all of these programs are based on genome sequences and must be installed and run locally; moreover, they provide no evidence that the predicted miRNAs are actually expressed (Zhang et al. 2006e). Thus, the difficulties related to genome-based miRNA prediction are remedied by EST-based analyses. Based on these three advantages, EST analysis will significantly enhance our ability to identify miRNAs and to investigate miRNA structure, function and evolution. In addition to identifying more than 600 miRNAs in 71 plant species, we employed EST analysis to analyze the evolutionary relationships of miRNAs. Our results demonstrated that many miRNAs are highly conserved evolutionarily across all major lineages of plants, including mosses, gymnosperms, monocots and eudicots. We further concluded that regulation of gene expression by miRNAs appears to have existed during the earliest stages of plant evolution and has been constrained (functionally) for more than 425 million years (Zhang et al. 2006b).

Currently, 394,370 soybean ESTs have been deposited in the National Center for Biotechnology Information (NCBI) GenBank EST database (based on data collected on 8 February 2008). This provides a valuable resource for the identification of potential miRNAs in soybean. The goal of this study is to identify soybean miRNAs and their potential targets. To achieve this goal, we first compared all of these EST sequences with the 184 known Arabidopsis thaliana miRNAs to identify potential miRNA homologs in soybean; then, selected putative soybean miRNAs were validated by quantitative real-time PCR (qRT-PCR) using miRNA-specific primers and probes (Chen et al. 2005). Based on these newly identified soybean miRNAs, we also predicted the potential miRNA targets in soybean by BLASTn search.

Materials and methods

Reference set of miRNAs

To identify potential soybean miRNAs, a total of 184 known A. thaliana miRNAs were defined as a reference set of miRNA sequences. A. thaliana miRNAs were used as reference miRNAs because A. thaliana and soybeans are eudicots and a large number of A. thaliana miRNAs have been deposited in the publicly available miRNA database. The 184 A. thaliana mature miRNAs and their precursor sequences were downloaded from the miRNA database (miRBase Sequence Database, http://microrna.sanger.ac.uk/sequences/; release 10.1, December 2007). Although some of these A. thaliana miRNAs were initially identified by computational approaches, a majority of them have been validated by experimental approaches including direct cloning, PCR, and/or Northern blotting (Griffiths-Jones 2004; Griffiths-Jones et al. 2006).

Soybean ESTs, cDNAs, and mRNAs

Soybean EST, mRNA, and cDNA sequences were obtained from the GenBank nucleotide databases at NCBI and the soybean nucleotide databases from The Institute for Genomic Research (TIGR) at http://www.tigr.org. A total of 394,370 soybean ESTs had been deposited in the NCBI EST database, and all of these ESTs were screened against the 184 known A. thaliana miRNAs.

Identifying potential soybean miRNAs using EST-based comparative genomics

Soybean miRNAs were identified according to our previously published method (Zhang et al. 2005, 2006a, b, 2007b). There are two important parameters in EST analysis: one is the conservation of sequences, and the other is the hairpin stem-loop secondary structure of the potential pre-miRNAs. Figure S1 summarizes the general procedure for identifying conserved miRNAs in soybeans using EST-based comparative genomics. Briefly, the mature sequences of known A. thaliana miRNAs were subjected to a BLASTn search against all of the publicly available EST databases in NCBI using BLASTn 2.2.9 (1 May 2004) (Altschul et al. 1997). To improve the BLASTn search, the BLAST parameter settings were adjusted as follows: the expect value was set at 1,000 to increase the number of potential hits; the word-match size between the query and database sequences was set at seven; and the number of descriptions and alignments was raised to 1,000. If a search revealed partial sequence similarity to an A. thaliana mature miRNA sequence, the non-aligned regions were manually inspected and compared to determine the number of matching nucleotides and to assess their potential as miRNA candidates. Those EST sequences that closely matched (no more than 4 mismatches, including insertion and/or deletion nucleotides) the previously known A. thaliana mature miRNAs were included in the set of miRNA candidates used for additional characterization based on the following criteria. The entire EST sequence (containing the conserved miRNA sequence) was selected to predict the secondary structures and to screen for miRNA precursor sequences. Although we also did BLASTn searches using miRNA precursor sequences, the mature sequences were the primary source of sequences for BLASTn searches against the EST databases of NCBI GenBank because only mature miRNAs, rather than miRNA precursors, are highly conserved in plants (Zhang et al. 2006b).
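
The "no more than 4 mismatches, including insertion and/or deletion nucleotides" rule can be read as an edit-distance threshold. The following is a minimal sketch under that assumption, not the procedure actually used in the study, where the inspection was done manually on BLASTn alignments. The function names and the example sequences are hypothetical and serve only as an illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: each substitution, insertion or deletion costs 1."""
    previous = list(range(len(b) + 1))
    for i, base_a in enumerate(a, start=1):
        current = [i]
        for j, base_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (base_a != base_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def to_rna(seq: str) -> str:
    """ESTs are stored as DNA; convert to upper-case RNA letters before comparing."""
    return seq.upper().replace("T", "U")


def is_close_match(est_segment: str, reference_mirna: str, max_differences: int = 4) -> bool:
    """True if the aligned EST segment differs from the reference mature miRNA
    by at most `max_differences` substitutions and/or indels (threshold from the text)."""
    return edit_distance(to_rna(est_segment), to_rna(reference_mirna)) <= max_differences


if __name__ == "__main__":
    reference = "UGACAGAAGAGAGUGAGCAC"    # illustrative 20-nt mature sequence
    est_segment = "TGACAGAAGAGAGTGAGCAT"  # hypothetical EST segment, one substitution
    print(edit_distance(to_rna(est_segment), to_rna(reference)))
    print(is_close_match(est_segment, reference))
```

The function expects the EST segment that aligns with the mature miRNA, not the whole EST; extracting that segment from the BLASTn output is outside the scope of this sketch.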

Expressed sequence tag sequences with four or fewer mismatches and/or indels (deletion/insertion nucleotides) compared to the previously identified A. thaliana miRNAs were further compared with each other to eliminate redundancies. Then, the secondary structures of the non-redundant sequences were generated using the Zuker folding algorithm, as implemented through the web-based computational software MFOLD 3.2 (Mathews et al. 1999; Zuker 2003). MFOLD 3.2 is publicly available at http://www.bioinfo.rpi.edu/applications/mfold/rna/form1.cgi. The software default parameters were used to predict the secondary structures of the selected sequences. All MFOLD outputs, including free energy (ΔG, kcal/mol), the number of nucleotides (A, G, C and U), location of the matching regions, and the number of arms per structure, were recorded. The minimal folding free energy index (MFEI) for each sequence was calculated as previously described (Zhang et al. 2006d). In previous studies, we found that miRNA precursor sequences have significantly higher MFEI than other non-coding or coding RNAs, and that candidate RNA sequences are more likely to be miRNAs when the MFEI is greater than 0.85 (Zhang et al. 2006d). To avoid mistakenly designating other types of RNAs as miRNA candidates, MFEI was also considered when predicting secondary structures.

In this study, an RNA sequence was considered a miRNA candidate only if it fit all of the following criteria: (1) predicted mature miRNAs had no more than four nucleotide substitutions compared with A. thaliana mature miRNAs; (2) the RNA sequence can fold into an appropriate stem-loop hairpin secondary structure; (3) the mature miRNA could be localized in one arm of the hairpin structure; (4) no more than 6 mismatches between the predicted mature miRNA sequence and its opposite miRNA* sequence in the secondary structure; (5) no loop or break in the miRNA or miRNA* sequences; (6) predicted secondary structure had high MFEI and negative MFE. Overall, the application of these criteria for inclusion of RNAs as miRNAs reduced the number of RNAs analyzed, minimized the likelihood that non-miRNAs would be included in subsequent analyses, and significantly reduced the total number of predicted false miRNAs.
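
The six criteria above amount to a conjunction of simple checks. The sketch below shows one way to express them, assuming the structural features have already been extracted from the MFOLD output by hand or by another tool. The `Candidate` record and its field names are hypothetical. The text asks only for a "high" MFEI without fixing a cutoff for this step, so the 0.85 default is an assumption borrowed from the value quoted earlier.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Features of one putative pre-miRNA (all field names are hypothetical)."""
    substitutions_vs_reference: int  # criterion 1: differences from the A. thaliana mature miRNA
    folds_into_hairpin: bool         # criterion 2: appropriate stem-loop structure predicted
    mature_in_one_arm: bool          # criterion 3: mature miRNA lies within a single arm
    mismatches_with_star: int        # criterion 4: mismatches between miRNA and miRNA*
    loop_or_break_in_duplex: bool    # criterion 5: loop/break inside miRNA or miRNA*
    mfe: float                       # criterion 6: minimal folding free energy, kcal/mol
    mfei: float                      # criterion 6: minimal folding free energy index


def passes_all_criteria(c: Candidate, mfei_cutoff: float = 0.85) -> bool:
    """Apply criteria (1)-(6); `mfei_cutoff` is an assumed, adjustable threshold."""
    return (
        c.substitutions_vs_reference <= 4
        and c.folds_into_hairpin
        and c.mature_in_one_arm
        and c.mismatches_with_star <= 6
        and not c.loop_or_break_in_duplex
        and c.mfe < 0
        and c.mfei > mfei_cutoff
    )


if __name__ == "__main__":
    example = Candidate(1, True, True, 3, False, -45.2, 0.92)  # made-up values
    print(passes_all_criteria(example))
```

Keeping the cutoff as a parameter makes it easy to see how many candidates are lost or gained when the MFEI requirement is relaxed or tightened.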

Expression of soybean miRNAs

Two approaches, qRT-PCR analysis and EST analysis, were employed to confirm and establish the expression of miRNAs in soybean. For the EST analysis, all identified soybean miRNA precursor sequences were used to perform BLASTn searches of the NCBI EST database, and the BLASTn results were recorded and analyzed. The tissues in which each miRNA candidate is potentially expressed were determined from the tissue sources reported for each EST in the NCBI database.

Isolation of total RNA

Soybean (Glycine max cv. NC Raleigh) seeds were kindly provided by Dr. Joseph Burton (ARS, USDA, Raleigh, NC, USA). The soybean seeds were cultivated in the Greenhouse Facility of East Carolina University. Total RNA was isolated from 4-week-old soybean seedlings using the mirVana™ miRNA Isolation Kit (Ambion, Austin, TX, USA) according to the manufacturer’s protocol. Briefly, 0.1 g seedling tissues were harvested, weighed and immediately immersed into 1 mL Lysis/Binding Buffer in a microcentrifuge tube on ice. Collected samples were immediately homogenized using the Fisher Scientific PowerGen Model 125 Homogenizer (Pittsburgh, PA, USA). Then, 300 μL of homogenized tissue was transferred into a centrifuge tube and 30 μL miRNA homogenate additive was added, mixed well by vortexing for 20 s and left on ice for 10 min. Following the addition of 300 μL acid-phenol:chloroform (5:1, v/v; pH 4.5) and thorough vortexing for 60 s, the mixture was centrifuged for 5 min at 10,000 × g at room temperature. The upper aqueous phase was removed and 1.25 volumes of 100% ethanol (at room temperature) was added, mixed and allowed to incubate for 1 min on ice. The ethanol-treated samples were passed through the filter cartridge, which was washed three times with 500–700 μL of wash solution, and the purified RNA was eluted from the filter cartridge using 100 μL of elution buffer. The eluted RNA was stored at −20°C prior to analysis. The quality and the quantity of the total RNAs were measured using a NanoDrop ND-1000 (NanoDrop Technologies, Wilmington, DE, USA).

To confirm soybean miRNAs and analyze the expression of miRNAs in soybean tissues, qRT-PCR was employed by using the Applied Biosystems TaqMan® microRNA Assays Protocol (Foster City, CA, USA). A two-step assay was performed in TaqMan-based real-time quantification of miRNAs. The first step involved a reverse-transcription reaction in which a stem-loop RT primer was used to reverse-transcribe mature miRNAs to cDNAs. The second step involved real-time PCR, in which the expression level of miRNAs was monitored and quantified using qRT-PCR that includes miRNA-specific forward primer, reverse primer and FAM dye-labeled TaqMan probes (Chen et al. 2005).

Reverse-transcription reaction

The miRNA reverse-transcription reactions contained 150 ng of total RNAs, 0.25 mM each of dNTPs, 3.33 U/μL MultiScribe reverse transcriptase, 1× reverse-transcription buffer, and 0.25 U/μL RNase inhibitor. The total volume of reverse-transcription reactions was adjusted to 15 μL using nuclease-free water. The miRNA reverse-transcription reactions were performed using an Eppendorf Mastercycler (Eppendorf North America, Westbury, NY, USA). The temperature program was 30 min at 16°C, 30 min at 42°C, 5 min at 85°C and then held at 4°C. All reverse-transcription reactions, including no-template controls and RT minus controls, were performed in duplicate.

Real-time reaction

Real-time PCR reactions were performed using the TaqMan® microRNA Assays kit (Applied Biosystems) on an Applied Biosystems 7300 Sequence Detection System. Twenty μL PCR reaction mixtures were prepared and each contained 1 μL 20× TaqMan MicroRNA Assay primers and probes, 10 μL 2× TaqMan Universal PCR Master Mix, 1 μL of product from the reverse-transcription reaction (after fivefold dilution), and 8 μL nuclease-free water. The reactions were incubated in a 96-well plate at 95°C for 10 min, followed by 45 cycles of 95°C for 15 s and 60°C for 60 s. After the completion of the real-time reactions, the threshold was manually set and the threshold cycle (CT) was recorded. The CT is defined as the fractional cycle number at which the fluorescence passes the fixed threshold (Chen et al. 2005). All reactions were conducted in triplicate.

Potential miRNA targets

There is ample documentation that the mechanism of miRNA-mediated gene regulation requires perfect or near-perfect complementarity between the miRNAs and their targeted mRNAs for directly cleaving mRNAs or repressing protein translation (Rhoades et al. 2002; Zhang et al. 2006c). A BLASTn search based on the complementarity between miRNAs and their targets has become a powerful approach for identifying plant miRNA targets. To date, a majority of miRNA targets in plants have been predicted based on BLASTn searches of databases followed by confirmation using one or several experimental approaches, including Northern blotting, qRT-PCR and 5′ rapid amplification of cDNA ends (5′ RACE). In this study, we also used the BLASTn search to predict miRNA targets in soybean. The procedure was similar to that described above for predicting soybean miRNA homologs, except that we used the identified soybean miRNAs to search the protein-coding gene databases instead of the EST database. We searched for potential miRNA targets using the identified soybean miRNAs against the GenBank protein-coding nucleotide databases using BLASTn searches and against the soybean nucleotide databases from TIGR using miRU (Zhang 2005). The parameters used in BLAST searches for miRNA targets included the total number of mismatched nucleotides between miRNAs and the potential targets and the alignment structures. The conservation of a target site in other plant species was also considered for identifying miRNA targets and eliminating false positives. The total number of allowed mismatches at complementary sites between miRNA sequences and potential mRNA targets was limited to no more than four (no mismatch between positions 10 and 11, no more than 1 mismatch between positions 1 and 9, and no more than 2 mismatches at other positions), and no gaps were allowed at the complementary sites. Because the proteome of soybean has not been fully annotated, we performed BLASTn searches using the predicted miRNAs against the protein-coding nucleotide database as well as EST databases; in the latter case, after identifying potential ESTs, we used the identified EST sequences to do a homology search against the protein-coding mRNA databases of other plant species and assigned the potential targeted genes based on the degree of similarity of protein-coding mRNAs among plant species.
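
The position-specific mismatch rules for target sites can be written down compactly. The sketch below is an illustration rather than the search actually performed, which relied on BLASTn and miRU. It assumes that positions are counted from the 5′ end of the miRNA, that the candidate site is given 5′→3′ on the mRNA with the same length as the miRNA (so no gaps), and that only Watson-Crick pairs count as matches (G:U pairs are treated as mismatches, which the text does not specify). All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq: str) -> str:
    return seq.upper().replace("T", "U")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def mismatch_positions(mirna: str, target_site: str) -> list:
    """1-based miRNA positions (from the 5' end) that do not pair with the site."""
    mirna, site = to_rna(mirna), to_rna(target_site)
    if len(mirna) != len(site):
        raise ValueError("gaps are not allowed: miRNA and site must have equal length")
    perfect_partner = reverse_complement(site)  # miRNA that would pair perfectly
    return [i for i, (m, p) in enumerate(zip(mirna, perfect_partner), start=1) if m != p]


def is_plausible_target(mirna: str, target_site: str) -> bool:
    """Apply the mismatch limits stated in the text."""
    positions = mismatch_positions(mirna, target_site)
    in_1_to_9 = sum(1 for p in positions if 1 <= p <= 9)
    in_10_to_11 = sum(1 for p in positions if p in (10, 11))
    elsewhere = len(positions) - in_1_to_9 - in_10_to_11
    return len(positions) <= 4 and in_10_to_11 == 0 and in_1_to_9 <= 1 and elsewhere <= 2


if __name__ == "__main__":
    mirna = "UGACAGAAGAGAGUGAGCAC"    # illustrative sequence
    site = reverse_complement(mirna)  # a perfectly complementary site
    print(mismatch_positions(mirna, site), is_plausible_target(mirna, site))
```

Note that the three position-specific limits already cap the total at three mismatches, so the overall limit of four is never the binding constraint in this reading of the rules.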

Results

Identification of soybean miRNAs

After BLASTn searches of the NCBI EST databases using the 184 miRNAs from A. thaliana as probes and further screening based on analysis of the secondary structures of putative miRNAs from the MFOLD 3.2 results, a total of 69 miRNAs were identified from a total of 394,370 soybean EST sequences (Table 1, Fig. 1 and Suppl. Fig. S2). These results provide evidence that about 0.0175% of soybean ESTs contain potential miRNAs in the total pool of transcribed RNAs. This percentage is higher than the previously reported 0.010% for other plant species (Zhang et al. 2006b). There are two reasons for this difference: first, several new miRNA families have been identified and the total number of plant miRNAs has increased since the previous study; second, we modified the BLASTn search to include indels in addition to nucleotide substitutions as a measure of miRNA variation. In this study, we also identified five miRNAs from ESTs deposited in the NCBI EST database for G. soja (3 miRNAs from 18,511 ESTs) and G. clandestina (2 miRNAs from 911 ESTs), two important wild relatives of soybean.

Table 1 Soybean miRNAs identified by comparative genomics and secondary structures
Fig. 1 Predicted hairpin secondary structures of the selected soybean miRNAs identified in this study. Mature miRNA sequences are shaded. miRNA precursors may be slightly longer than the sequences shown in this figure

Mature miRNAs can be located within either arm of the secondary hairpin stem-loop structures. Of the 74 soybean miRNAs identified, 38 (51.35%) are located in the 3′ arm of the stem-loop hairpin structures while 36 (48.65%) are in the 5′ arm. This property of soybean miRNAs is similar to that of other plant species, in which mature miRNAs are typically confined to the stem-loop hairpin region.

The 69 miRNAs identified in soybean were classified into 33 miRNA families. The large number of miRNAs identified in soybean suggests that miRNAs are common in soybean and that miRNAs are highly conserved from A. thaliana to soybean. However, the abundance of miRNAs in each miRNA family differs (Fig. 2). Of the 69 miRNAs, 7 belong to the miR-166 family; 6 each belong to the miR-157 and miR-169 families; the miR-172, miR-396 and miR-171 families each contain 4 members; and only one miRNA was identified from among each of the other miRNA families. The distribution of miRNAs among the various miRNA families in soybean is similar to that in other plant species, such as A. thaliana, rice, maize and cotton (Sunkar et al. 2005; Zhang et al. 2006a, b, 2007b). This uneven distribution of miRNAs among families indicates that different miRNAs may have different evolutionary histories and play different roles in plant development and growth.

Fig. 2 Size of miRNA families in soybean

The diversity of soybean miRNAs is also observed in the length of pre-miRNA sequences (Fig. 3). The length of soybean pre-miRNAs varies from 44 to 259 nt, with an average of 105.7 ± 45.4 nt and a median of 92 nt. More than 50% of pre-miRNAs are between 79 and 112 nt in length. This distribution of pre-miRNA lengths is similar to those reported for Arabidopsis, rice, cotton and maize (Sunkar et al. 2005; Zhang et al. 2006a, b, 2007b).

Fig. 3 Size distribution of pre-miRNAs in soybean

Characteristics of soybean miRNAs

Although there are few obvious similarities in the nucleotides at specific positions along the mature miRNA sequences identified in this study, uracil constitutes about 70% of the bases at the 5′ end of the mature miRNAs while cytosine represents more than 60% of the bases at position 19 (Fig. 4). These results are comparable to those obtained for cotton miRNAs, in which 53 (71.62%) and 49 (66.22%) of 74 cotton miRNAs have U and C at the 1st and the 19th positions from the 5′ end, respectively. We previously reported that uracil is the predominant nucleotide at the 5′ end of mature miRNA sequences, and proposed that uracil may play an important role in miRNA biogenesis through recognition of targeted miRNA precursors by the RNA-induced silencing complex (RISC) (Zhang et al. 2006b). However, a further comparison of the nucleotide distribution at each position in all the mature plant miRNAs reported to date showed that cytosine is, like uracil at position 1, the predominant nucleotide at position 19 (61% of cases). Why is cytosine the dominant nucleotide at position 19 in mature miRNAs? One possible reason is that cytosine at this location may be important for targeting RISC or Dicer cleavage to specific sites on pre-miRNAs.
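
Position-specific nucleotide preferences of this kind are simple column frequencies over the set of mature sequences. The sketch below shows the calculation on a toy input; the function name and the example sequences are hypothetical and are not data from this study.

```python
from collections import Counter


def position_frequencies(mature_seqs, position):
    """Percentage of A, C, G and U at a 1-based position (counted from the 5' end).

    Sequences shorter than `position` are skipped.
    """
    bases = [
        seq[position - 1].upper().replace("T", "U")
        for seq in mature_seqs
        if len(seq) >= position
    ]
    if not bases:
        return {}
    counts = Counter(bases)
    return {base: 100.0 * counts[base] / len(bases) for base in "ACGU"}


if __name__ == "__main__":
    toy = ["UGACA", "UGGCA", "AGACA", "UCACA"]  # made-up sequences
    print(position_frequencies(toy, 1))
    # The cotton percentages quoted in the text follow from the same arithmetic:
    print(round(100 * 53 / 74, 2), round(100 * 49 / 74, 2))
```

Applied to positions 1 through 21 in turn, the same function would give the per-position profile summarized in Fig. 4.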

Fig. 4 Position-specific nucleotide preferences in soybean mature miRNAs. The percentage distributions of individual nucleotides at each position (numbered 1–21) are designated as A, checkered boxes; C, cross woven lines; G, black bars with white stippling; and U, white bars with black stippling

The four nucleotides (A, C, G and U) are not evenly represented in soybean pre-miRNAs, nor are the G/C or U/A ratios equal to 1 (Table 2). Uracil is dominant in soybean pre-miRNA sequences and comprises 29.94 ± 5.33% of the total nucleotide composition, followed by adenine, guanine and cytosine (19.93 ± 4.24%). While the expected ratio of G/C and U/A bases in fully double-stranded RNA would be 1, we found ratios of 1.27 ± 0.41 and 1.22 ± 0.43, respectively. This suggests that soybean pre-miRNAs contain about 20% more U and G than A and C, respectively, which could be a simple manifestation of the unequal distribution of nucleotides within single-stranded regions of the pre-miRNAs.
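
The composition statistics discussed above can be derived from any pre-miRNA sequence with a few counts. The following minimal sketch computes the percentage of each nucleotide together with the G/C and U/A ratios; the function name and the example sequence are hypothetical.

```python
def composition(seq):
    """Return (percent per nucleotide, G/C ratio, U/A ratio) for one sequence.

    A ratio is None when its denominator is zero.
    """
    rna = seq.upper().replace("T", "U")
    counts = {base: rna.count(base) for base in "ACGU"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or U")
    percent = {base: 100.0 * n / total for base, n in counts.items()}
    gc_ratio = counts["G"] / counts["C"] if counts["C"] else None
    ua_ratio = counts["U"] / counts["A"] if counts["A"] else None
    return percent, gc_ratio, ua_ratio


if __name__ == "__main__":
    percent, gc_ratio, ua_ratio = composition("GGGCCAUUUA")  # made-up 10-nt example
    print(percent, gc_ratio, ua_ratio)
```

Averaging these per-sequence values over all pre-miRNAs would give summary figures of the kind reported in Table 2.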

Table 2 Major characteristics of the identified soybean pre-miRNAs

Minimal folding free energy (MFE) is one important characteristic that determines RNA and DNA secondary structure. The lower the MFE, the higher the thermodynamic stability of the corresponding sequence, so sequences with lower MFEs can form stable secondary stem-loop structures. Several studies have demonstrated that pre-miRNAs have high negative MFEs (Bonnet et al. 2004b; Zhang et al. 2006d). In this study, we observed that soybean pre-miRNAs have high negative MFEs ranging from −6.4 to −82.8 kcal/mol with an average of −39.60 ± 17.26 kcal/mol. However, the magnitude of the MFE is strongly and positively correlated with the length of the RNA/DNA sequence. The longer the RNA sequence, the more freedom it has to form stable secondary structures and the lower its MFE. To normalize the potential effect of nucleotide length on MFE calculations, we developed a modification of the MFE calculation that we refer to as the adjusted MFE (AMFE). The AMFE represents the MFE of an RNA sequence scaled to a length of 100 nt. Here, AMFEs of soybean pre-miRNAs ranged from −11.21 to −56.56 kcal/mol with an average of −37.84 ± 11.54 kcal/mol.

Although pre-miRNAs have high negative MFEs and AMFEs, several studies demonstrated that there is no significant difference in these values between pre-miRNAs and other non-coding/coding RNAs (Zhang et al. 2006d). To better distinguish miRNAs from other RNAs, we developed a new criterion for miRNAs, called the minimal folding free energy index (MFEI), which combines three important parameters of RNAs: MFE, sequence length and G and C nucleotide content. A recent study demonstrated that the MFEI of miRNA precursor sequences was significantly higher than that of other RNAs, including other small RNAs and mRNAs; a candidate RNA sequence is more likely to be an miRNA when the MFEI is greater than 0.85 (Zhang et al. 2006d). MFEI is now being adopted as an important criterion for distinguishing miRNAs from other RNAs. In this study, the average MFEI of the 74 soybean pre-miRNAs was 0.86; a majority of the identified soybean miRNAs have an MFEI value higher than 0.85.
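
For readers who want to reproduce these indices, the sketch below shows how AMFE and MFEI can be computed from an MFE value and a sequence. The text states only that AMFE is the MFE scaled to 100 nt and that MFEI combines MFE, length and G+C content. The specific formula used here, MFEI = |AMFE| / (G+C)%, is our reading of the cited definition (Zhang et al. 2006d) and should be treated as an assumption. The function names and the numbers in the example are hypothetical.

```python
def amfe(mfe: float, length: int) -> float:
    """Adjusted MFE: the MFE (kcal/mol) scaled to a 100-nt sequence."""
    if length <= 0:
        raise ValueError("length must be positive")
    return mfe / length * 100


def gc_percent(seq: str) -> float:
    """G+C content of a sequence, in percent."""
    s = seq.upper()
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


def mfei(mfe: float, seq: str) -> float:
    """Minimal folding free energy index, assumed here to be -AMFE / (G+C)%.

    The sign is flipped so that stable (negative-MFE) hairpins get a positive index.
    """
    gc = gc_percent(seq)
    if gc == 0:
        raise ValueError("MFEI is undefined for a sequence without G or C")
    return -amfe(mfe, len(seq)) / gc


if __name__ == "__main__":
    toy_precursor = "GC" * 25 + "AU" * 25  # made-up 100-nt sequence, 50% G+C
    print(amfe(-50.0, len(toy_precursor)), mfei(-50.0, toy_precursor))
```

Under this reading, a 100-nt precursor with 50% G+C and an MFE of −50 kcal/mol has an MFEI of 1.0, above the 0.85 level mentioned in the text.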

The miRNA clusters in soybean

Five miRNA clusters were identified in soybean. These miRNA clusters are the gma-miR166a-miR-166b cluster, the gma-miR166f-miR-166g cluster, the gma-miR169c-miR-169g* cluster, the gma-miR169b-miR-169d cluster, and the gma-miR171a-miR-171d cluster, which contains 10 different miRNAs. Each of these miRNA clusters was found in at least one EST sequence. More interestingly, we also found that the miR169c-miR169g* cluster exists in the wild soybean species G. soja and that this cluster is highly conserved between G. max and G. soja. The clustered miRNAs represent 16% of the total of the identified miRNAs in soybean, suggesting that miRNA clusters are relatively common in soybean species. To the best of our knowledge, this is the first report of miR-166 and miR-171 family clusters in plants. In one of our previous studies, we discovered a small miR-169 cluster in cotton that consisted of two miR-169 precursors spaced 155 nt apart and oriented in the same direction (Zhang et al. 2007b). In this study, the same miR-169 cluster was also identified in soybean (Fig. 5). This suggests that the miR-169 cluster is conserved even among distantly related angiosperms. Of the five soybean miRNA clusters identified in this study, one was encoded within a single precursor sequence whereas the others were located on discrete precursors separated by genomic DNA. For example, miR169c and miR-169g* are located within different arms of the secondary hairpin structure of the same miR-169 precursor. This intra-pre-miRNA cluster was also observed in the wild species G. soja. In the remaining 4 miRNA clusters, the miRNAs were encoded within different precursors, separated by as few as several nucleotides to as many as 168 nt. For the miRNAs that were encoded by different precursors, the mature miRNA-encoding region was always located within the same arm of the secondary hairpin structure.

Fig. 5 MiRNA-166a-166b cluster in soybean EST EV280596. a Schematic diagram of the organization of the cluster. b EST sequence containing the miRNAs encoded within the cluster. Shadowed sequences represent pre-miRNAs; underlined sequences represent the mature miRNAs. c Predicted secondary structure of miR-166a. d Predicted secondary structure of miR-166b

Sense- and antisense-strand miRNAs

Recent studies have shown that miRNAs are transcribed and processed from sense and antisense transcripts derived from the same genomic loci in both invertebrates and vertebrates, including fruit fly and human (Bender 2008; Stark et al. 2008; Tyler et al. 2008). However, to the best of our knowledge, no report has yet demonstrated that antisense miRNAs are also transcribed and processed from the same genomic loci in plant species. In this study, we observed that five pairs of soybean miRNAs are bidirectionally transcribed and processed from the same soybean genomic loci, generating both sense and antisense miRNAs (Fig. 6). The five pairs of soybean miRNAs are miR-157b and miR-157c, miR-157d and miR-157e, miR-162a and miR-162b, miR-396a and miR-396b, and miR-396c and miR-396d. These five pairs of transcripts belong to three distinct miRNA families. Although the EST database for wild soybean species is more limited than that of G. max, we also documented one pair (miR-396a and miR-396b) of these sense-antisense miRNAs in the wild soybean species G. clandestina. We also observed that the G. clandestina pair has several nucleotide substitutions relative to the G. max pair, indicating that these miRNAs have evolved distinctly in these two lineages and may be widely distributed among soybean and other plant species (Fig. 7).

Fig. 6 Sense and antisense miRNAs and their corresponding secondary structures

Fig. 7 Alignment of soybean miR-396a with its corresponding antisense miR-396a* obtained from the cultivated soybean species Glycine max and the wild species Glycine clandestina. miR-396a and miR-396 are derived from the same genetic locus; see details in Fig. 6. This figure shows that miR-396 is highly conserved in soybean and its wild relative, suggesting that antisense miRNAs are also conserved in the plant kingdom

In animals, antisense miRNAs have at least one nucleotide difference in their seed regions compared to their partners (Bender 2008; Stark et al. 2008; Tyler et al. 2008), suggesting that antisense miRNAs may play a different function in the regulation of target genes. In our study, we observed the same phenomenon in plants. Although the sense and antisense miRNAs are transcribed from the same miRNA locus at the same genomic location, the mature products of sense and antisense miRNAs are not identical. All identified sense miRNAs differ from their antisense partners by 1–3 nucleotides (Table 3). Further, a majority of the nucleotide changes occurred within the miRNA seed region. For example, sense and antisense miR-157 and miR-162 have one nucleotide difference at position 10 or 11, which is required for recognition of a specific mRNA target (Schwab et al. 2005). This suggests that antisense miRNAs may target different genes or function through a different mechanism in the plant kingdom. These antisense miRNAs may contribute to the functional diversification of miRNA genes in plant growth and development.

Table 3 Sense miRNAs and their antisense partners

Expression of soybean miRNAs in public EST database

ESTs are partial sequences derived from transcripts. ESTs in available databases are not representative of transcribed DNA in all plant tissues or culture conditions, and this bias could influence conclusions concerning the expression pattern of a specific gene, such as a miRNA, found in the EST database. Nevertheless, mining EST databases in a systematic way can provide evidence that a miRNA found in a specific EST is expressed in the tissue from which that EST was derived. After mining 394,370 soybean ESTs in NCBI databases, we found that at least one EST contains the identified miRNA precursor sequences in different soybean tissues, including leaf, flower, root, stem, hypocotyl, cotyledon, seedling and somatic embryo (Table 4). This suggests that these miRNAs are expressed in the corresponding soybean tissues.

Table 4 Number of ESTs containing a specific miRNA precursor

RT-PCR assay of putative soybean miRNAs

Stem-loop RT-PCR is a reliable method for detecting the expression of mature miRNAs, and it can distinguish miRNAs that differ in sequence by as little as a single nucleotide (Chen et al. 2005). In this study, we used unique primers, designed by Applied Biosystems, to detect specific soybean mature miRNAs identified by in silico EST analysis. The miRNAs used in expression studies included miR-156, miR-157, miR-159, miR-166, miR-169, miR-172, and miR-396. The qRT-PCR analyses demonstrated that all of these miRNAs were expressed in soybean seedlings. Based on the threshold cycle (C_T), we also estimated the expression level of each miRNA. A difference of one C_T unit represents a two-fold difference in the amount of expression. The qRT-PCR results showed that the expression levels of the miRNAs differ from each other in soybean seedlings. For example, miR-156 is expressed at a much higher level than miR-172. The C_T for miR-156 is 17.92 ± 0.11 while that for miR-172 is 22.88 ± 0.11, suggesting that the expression level of miR-156 is about 32-fold higher than that of miR-172 (Fig. 8). The expression patterns observed for miR-156 and miR-172 appear to be related to their different functions. In Arabidopsis and rice, miR-156 regulates leaf development by targeting Squamosa-promoter binding protein-like (SBP) transcription factors (Rhoades et al. 2002; Schwab et al. 2005). Overexpression of miR-156 resulted in enhanced leaf initiation and produced bushier plants in Arabidopsis (Schwab et al. 2005). By comparison, miR-172 controls flower development and the phase change from vegetative growth to reproductive growth by inhibiting the translation of an A class gene, the apetala 2 (ap2) transcription factor (Aukerman and Sakai 2003; Chen 2004), which controls the timing of flower development and morphology (Lohmann and Weigel 2002). 
During plant vegetative growth, the expression level of miR-172 is very low but increases significantly immediately prior to flowering and reaches peak expression levels during the flowering period (Aukerman and Sakai 2003; Chen 2004). Overexpression of miR-172 inhibits the translation of the ap2 and ap2-like genes, which results in early flowering and disruption of floral organ identity (Aukerman and Sakai 2003; Chen 2004). Thus, it is not surprising that miR-172 is expressed at a lower level than miR-156 in soybean seedlings. This also suggests that miR-172 may regulate soybean flower development and the phase change from vegetative growth to reproductive growth, and it points to the possibility of using miRNAs to modulate phase change to influence soybean yield.
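
The fold difference between miR-156 and miR-172 follows directly from the difference in their threshold cycles. The sketch below assumes, as stated in the text, that one threshold-cycle unit corresponds to a two-fold difference and that equal amounts of cDNA were loaded. It further assumes that the two assays amplify with the same efficiency. The function name `fold_difference` and the `efficiency` parameter are illustrative and are not part of the original analysis.

```python
def fold_difference(ct_a: float, ct_b: float, efficiency: float = 2.0) -> float:
    """Relative abundance of transcript A over transcript B.

    Assumes equal input and the same per-cycle amplification factor
    (`efficiency`, 2.0 meaning perfect doubling) for both assays.
    """
    return efficiency ** (ct_b - ct_a)


# Mean threshold cycles reported in the text for soybean seedlings.
ct_mir156 = 17.92
ct_mir172 = 22.88

ratio = fold_difference(ct_mir156, ct_mir172)
print(f"miR-156 / miR-172 = {ratio:.1f}")
```

With the reported means, the cycle difference is 4.96, which gives a ratio of roughly 31. This agrees with the estimate of about 32-fold that results when the difference is rounded to five cycles.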

Fig. 8 Amplification plot of soybean miR-156 (left) and miR-172 (right). The same amount of cDNA was added to each qRT-PCR reaction. Each miRNA-specific reaction was repeated three times

Soybean miRNA targets

miRNAs regulate gene expression posttranscriptionally. They bind to their target mRNAs within the 3′ untranslated region (3′ UTR) or the coding region and promote mRNA cleavage or translational repression. In plants, there are usually no more than four mismatches between a miRNA and its target mRNA (Rhoades et al. 2002; Schwab et al. 2005). This sequence relationship has been used successfully to identify miRNA targets in plants by performing BLASTn searches, with putative miRNAs as query sequences, against NCBI mRNA databases or other mRNA databases.
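
The mismatch criterion described above can be illustrated with a simple scan. The sketch below is a brute-force stand-in for a BLASTn search, not the search used in this study: it slides the reverse complement of a miRNA along an mRNA and reports every window with no more than four mismatches. G:U wobble pairs are counted as mismatches here, which is a simplifying assumption. The function names and both sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (T is read as U)."""
    return rna.upper().replace("T", "U").translate(COMPLEMENT)[::-1]


def find_target_sites(mirna: str, mrna: str, max_mismatches: int = 4):
    """Return (start, mismatches) for each mRNA window that pairs with the
    miRNA with at most `max_mismatches` mismatches. `start` is 0-based."""
    site = reverse_complement(mirna)  # what a perfect target looks like
    mrna = mrna.upper().replace("T", "U")
    hits = []
    for start in range(len(mrna) - len(site) + 1):
        window = mrna[start:start + len(site)]
        mismatches = sum(1 for a, b in zip(window, site) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Hypothetical miRNA and mRNA fragment (NOT real soybean sequences).
toy_mirna = "ACGGUUCAAGCUUAGGCAUCA"
toy_mrna = "GGGAAA" + reverse_complement(toy_mirna) + "CCCUUU"

hits = find_target_sites(toy_mirna, toy_mrna)
print(hits)
```

A real search would also need to handle alignment gaps and would typically weight mismatches by position, which this scan does not attempt.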

BLASTn searches conducted as described in Methods revealed a total of 152 potential miRNA targets in soybean protein-coding sequence databases. At least one targeted mRNA was identified for each of the soybean miRNA families except miR-394, miR-399, miR-426, miR-862 and miR-865 (Table 5, Fig. 9). These 152 potential miRNA targets belong to several gene families with diverse biological functions. Among the pool of mRNA targets, the majority are transcription factors, whereas others are associated with plant metabolism and response to environmental stress. These results are similar to those reported previously for other plant species, such as Arabidopsis, rice and corn (Rhoades et al. 2002; Bonnet et al. 2004a; Zhang et al. 2006a).

Table 5 Potential targets of the identified miRNAs in soybean
Fig. 9 Predicted miRNA targets and their complementary sites within defined mRNAs. Each bottom strand shows the miRNA sequence, and each top strand shows the corresponding complementary site within specific mRNA targets. Watson–Crick pairing (vertical dashes) and G:U wobble pairing (circles) are indicated

Transcription factors are an important component of the transcriptional process. They usually bind to a specific DNA sequence and control the transfer of genetic information from DNA to RNA. In this study, among the pool of mRNA targets, we found that nearly 40% of the miRNA families appear to target transcription factor-encoding mRNAs. Functional studies conducted in Arabidopsis and rice have demonstrated that a number of the miRNAs we identified in soybean act on transcription factors that regulate plant development (Rhoades et al. 2002; Zhang et al. 2006c). These plant development regulators include miR-171, miR-172, miR-156/157, and miR-166, which control diverse developmental functions ranging from floral morphology and flowering time (miR-172) to leaf and root development (miR-156/157 and miR-171) (Zhang et al. 2006c). AP2 is a class A transcription factor that plays an important role in floral morphology and flowering time. Previous results show that miR-172 targets AP2 in Arabidopsis (Aukerman and Sakai 2003; Chen 2004) and the AP2-like gene glossy15 (gl15) in maize (Lauter et al. 2005; Zhang et al. 2006a). Similarly, in this study, we found that miR-172 targets the ap2 gene in soybean, which suggests that the function of miR-172 is highly conserved among plants. The scarecrow-like (SCL) (GRAS domain) family is a class of plant-specific transcription factors that control a wide range of plant developmental processes. Our study indicated that one member of the SCL family, SCL6, is a potential target of miR-171. Consistent with our interpretation, functional studies in Arabidopsis have shown that miR-171 targets SCL6 predominantly in inflorescence and floral tissues (Llave et al. 2002; Reinhart et al. 2002). This suggests that miR-171 may play a role in soybean flower development. 
Both HD-ZIP III and SQUAMOSA promoter binding protein-like (SBP) transcription factors are important for leaf development. Functional studies have shown that miR-166 and miR-156/157 control leaf development by targeting these two classes of transcription factors in Arabidopsis, rice and maize (Juarez et al. 2004; Mallory et al. 2004). Here, we also identified these two classes of transcription factors as potential targets of the soybean miR-156/157 and miR-165/166 families. Several other soybean miRNAs also target specific transcription factors: miR-169 targets the CCAAT-binding (CBF) transcription factor, and miR-447 targets a TCP transcription factor, an important transcription factor controlling leaf development. Auxin response factors (ARFs) are a class of transcription factors that play a role in plant signal transduction and root development. In this study, we found that miR-160 is perfectly complementary to ARF10 mRNA and that this binding site is highly conserved evolutionarily in several plant species, including Arabidopsis, rice, maize and soybean.

Soybean miRNAs may also target transcriptional activators. We found that miR-168 and miR-169 target GRF1-interacting factor 1 (GIF1) and HSP2, respectively. In Arabidopsis, GIF1 is a transcriptional coactivator that regulates the growth and shape of leaves and petals (Kim and Kende 2004). To the best of our knowledge, no previous report has indicated that miRNAs also target transcriptional activators. This finding will enhance our knowledge of miRNA-mediated gene regulation.

Another major class of soybean miRNA targets consists of genes that control biological processes and that respond to environmental conditions. Nucleosomes are the fundamental units of eukaryotic chromatin. Nucleosome assembly is a complicated biological process that requires nucleosome assembly protein (NAP). This study showed that miR-414 is perfectly complementary to soybean NAP-1 mRNA and that the binding site is conserved across species in the plant kingdom, including Arabidopsis. This suggests that miR-414 may play an important role in regulating nucleosome assembly.

Several studies have demonstrated that a number of miRNAs are involved in the response to various environmental stresses, including cold, salinity, drought and nutritional deficiency (Jones-Rhoades and Bartel 2004; Sunkar and Zhu 2004; Lu et al. 2005; Zhang et al. 2005; Chiou 2007). Here, we found that miR-159, miR-398, and miR-414 potentially target GST, SOD and cytochrome C reductase mRNAs, which all play an important role in plant responses to different biotic and abiotic environmental stresses.

Discussion

Although miRNAs have been extensively studied in the past several years, no systematic study has been performed on soybean, one of the most important crops around the world. Recently, two groups identified several miRNAs from soybean by computational and direct cloning approaches (Zhang et al. 2005, 2006b; Subramanian et al. 2008; Sunkar and Jagadeeswaran 2008), but soybean miRNAs still remain largely unknown. This study not only systematically identified 69 miRNAs from 33 families in the domestic soybean and five miRNAs in the wild soybean species G. soja and G. clandestina, but also revealed several novel miRNA features through comparative genome analysis. These results demonstrate that miRNAs are common in soybean. The potential miRNA-targeted genes are involved in development, metabolic processes, and stress response. This is similar to other plant species (Rhoades et al. 2002; Bonnet et al. 2004a; Zhang et al. 2006a).

More importantly, the occurrence of antisense miRNAs in plants is reported for the first time in this study. Several recent studies observed that miRNAs can be transcribed from both sense and antisense strands of DNA in animals (Bender 2008; Stark et al. 2008; Tyler et al. 2008). However, plant antisense miRNAs have not been reported. The discovery of antisense miRNAs provides new insight into miRNA biogenesis and function. Five pairs of antisense miRNAs and their corresponding sense miRNAs have been identified in this study. These antisense miRNAs are also conserved in wild species of soybean. This suggests that antisense miRNAs may exist widely in plants. A majority of miRNA genes are transcribed by RNA polymerase II (Pol II) following a common processing pathway (Lee et al. 2004). The miRNA gene is transcribed into a long primary miRNA sequence (pri-miRNA), which undergoes a series of subsequent processing events to generate a miRNA precursor (pre-miRNA) and finally a mature miRNA. Given that sense and antisense miRNAs are complementary strands within the same genomic loci, several questions regarding the transcription of antisense miRNAs arise. How are these sense miRNAs and their partner antisense miRNAs transcribed? Are both sense and antisense miRNAs transcribed simultaneously or at different times? Are sense and antisense miRNAs transcribed by the same or independent mechanisms? If both sense and antisense miRNAs are transcribed simultaneously, then what happens when the two transcriptional complexes encounter one another, i.e. how would the RNA Pol II complexes bypass each other when they meet?

The second major novel finding is that the occurrence of miRNA clusters is much higher in soybean than in other plant species. Animal miRNAs are commonly clustered, although the significance of miRNA clustering is uncertain (Tanzer and Stadler 2004; Altuvia et al. 2005). However, miRNA clustering appears to be less common among plants. Only a few miRNA clusters have been found in plants (Jones-Rhoades and Bartel 2004; Talmor-Neiman et al. 2006; Zhang et al. 2006b, 2007b). For example, Talmor-Neiman et al. (2006) observed that two miRNAs (miR-1219a and miR-1219b) were located within approximately 200 bp of each other. Surprisingly, we identified five miRNA clusters in soybean. This frequency is much higher than that in other plant species. Of the five miRNA clusters, the miR-166 and miR-171 family clusters have not been reported earlier. It should be noted that these miRNA clusters were not observed in model plant species, such as A. thaliana and rice, suggesting that selected clustered plant miRNAs may evolve in a lineage-specific manner. In animals, many miRNA clusters are formed via miRNA duplication (Tanzer and Stadler 2004). Plant miRNA clusters may have been generated by a similar mechanism. In support of this, some miRNA clusters in Arabidopsis appear to have evolved via miRNA duplication (Allen et al. 2004; Maher et al. 2006). However, the mechanisms that drive miRNA clustering and miRNA cluster evolution are unclear. The study of clustered animal miRNAs has shown that clustered miRNAs have similar gene expression patterns and are transcribed together in a polycistronic manner. This indicates that common regulatory control may be a significant force in the maintenance of miRNA clustering (Tanzer and Stadler 2004; Altuvia et al. 2005). Additionally, as many as 40 miRNAs are clustered together in animals, whereas plant miRNA clusters contain only a small number of miRNAs. 
This difference suggests that the mechanisms driving miRNA evolution in animals and plants may be different.

Another interesting finding is the predominance of U and C bases at the first and 19th positions, respectively, from the 5′ ends of the mature miRNAs. Although it has been reported that U is dominant at the first position (Zhang et al. 2006b), no study has reported that C dominates the 19th position. It is unclear why U and C dominate at these two positions. Further study of this phenomenon may allow a better understanding of the mechanism of miRNA biogenesis and function.
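
The positional bias described above rests on counting which base occurs at each position across a set of mature miRNAs. The sketch below shows one way such a tally could be made. It is an illustration under stated assumptions, not the analysis performed in this study: the function names are hypothetical, and the four 21-nt sequences are artificial placeholders built so that U is most frequent at position 1 and C at position 19.

```python
from collections import Counter


def base_frequencies(sequences, position):
    """Fraction of each base at a 1-based position counted from the 5' end.
    Sequences shorter than `position` are skipped."""
    bases = [s[position - 1].upper() for s in sequences if len(s) >= position]
    total = len(bases)
    if total == 0:
        return {}
    return {base: count / total for base, count in Counter(bases).items()}


def dominant_base(sequences, position):
    """Most frequent base at the given position, or None if no data."""
    freqs = base_frequencies(sequences, position)
    return max(freqs, key=freqs.get) if freqs else None


# Artificial 21-nt placeholders (NOT real miRNAs).
toy_mirnas = [
    "U" + "A" * 17 + "CGG",
    "U" + "G" * 17 + "CAA",
    "U" + "C" * 17 + "GAA",
    "A" + "U" * 17 + "CUU",
]

for pos in (1, 19):
    print(pos, dominant_base(toy_mirnas, pos), base_frequencies(toy_mirnas, pos))
```

Running the same tally over every position of the identified soybean miRNAs would yield the full base distribution from which the bias at positions 1 and 19 is read.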

Conclusions

Our results from comparative genome-based in silico screening of soybean EST databases using Arabidopsis miRNAs and quantitative real time PCR (qRT-PCR) provide evidence for 69 miRNAs belonging to 33 families, with an additional five miRNAs identified in two wild soybean species. Based on sequence comparisons among the soybean miRNAs, we conclude that their precursors (pre-miRNAs) vary in length from 44 to 259 nt with an average of 106 ± 45 nt and that these precursors include previously unidentified clustered miRNAs as well as sense and antisense miRNAs, which have not previously been observed in plants. Further, comparative sequence analyses of the distribution of individual bases at each nucleotide position within mature soybean miRNAs revealed that uracil is the dominant nucleotide at position one (5′ most) while cytosine is the dominant nucleotide at position 19, which suggests that these two nucleotides may play an important role in miRNA biogenesis and/or miRNA-mediated gene regulation. miRNA-specific qRT-PCR analyses of RNA samples prepared from soybean seedlings revealed that miRNAs are differentially expressed both quantitatively and qualitatively in soybean tissues. Identification of putative miRNA-targeted mRNAs indicates that a large percentage (nearly 40%) of the targeted mRNAs encode transcription factors, while others encode proteins that play a role in signal transduction and stress response. Overall, our results show extensive evolutionary conservation of miRNAs among plant species, with an apparent significant increase in the number of clustered and antisense miRNAs in soybean compared to previously studied plants.