Introduction

Polyploidy, the state of having more than two sets of chromosomes, is very common in plants and there is extensive evidence for whole-genome duplication events in basal angiosperm lineages (Soltis et al. 2009). Species experiencing relatively recent genome duplications are defined as polyploids per se, whereas species with more ancient duplications are typically defined as paleopolyploids or diploids because their chromosome sets have differentiated such that they no longer pair and/or resemble one another.

Polyploid studies frequently focus on gene redundancies and the divergence of duplicated gene copies. The fate of duplicated genes over evolutionary time is typically divided into three categories: non-functionalization, neo-functionalization, and sub-functionalization (Force et al. 1999; Prince and Pickett 2002). Increasingly, studies of gene duplication focus on gene expression data (Adams 2007; Jackson and Chen 2010) as divergence in gene expression profiles may indicate a divergence in duplicate gene function. A recent study in Arabidopsis thaliana supports this idea, reporting that highly co-expressed duplicate gene pairs shared more similar protein–protein interaction profiles than less co-expressed pairs (Arabidopsis Interactome Mapping Consortium 2011). Studies have focused on duplicate gene transcript partitioning as a consequence of plant developmental differentiation (Buggs et al. 2011; Chaudhary et al. 2009; Hovav et al. 2008; Nomura et al. 2005) or stress conditions (Dong and Adams 2011; Liu and Adams 2007; Stamati et al. 2009) in a wide range of natural and/or synthetic plant polyploids. Recent investigations have incorporated structural aspects into the analysis by focusing on transcript partitioning between genes located within duplicated linkage blocks (Flagel et al. 2009; Lin et al. 2010).

Soybean is a sequenced paleopolyploid genome that maintains at least one gene duplicate for ~75 % of its genes (Schlueter et al. 2007; Schmutz et al. 2010). The most recent genome doubling event occurred approximately 9–14 million years ago (Blanc and Wolfe 2004; Schlueter et al. 2004; Schmutz et al. 2010). A high proportion of the duplicated soybean genes resulted from the most recent genome duplication event. These gene pairs are located within syntenic chromosomal regions and are termed homoeologous gene pairs. A smaller proportion of soybean gene duplicates are arranged in tandem or are located within non-syntenic regions; these would be considered non-homoeologous paralogs.

RNA interference (RNAi) and DNA cytosine methylation are epigenetic processes that regulate gene expression and silencing. RNAi processes are governed by the activity of paralogous RNase III Dicer-like (DCL) genes that encode endonuclease proteins that process double-stranded RNA (dsRNA) into small RNAs (sRNAs). Each DCL has a specialized function that has been well characterized in Arabidopsis and other model organisms (Bouche et al. 2006; Eamens et al. 2008b; Margis et al. 2006). The DCL family of proteins in Arabidopsis has four canonical DCLs (AtDcl1, AtDcl2, AtDcl3, and AtDcl4) which control the expression of developmentally regulated genes, repression of mobile DNA elements, and defense against viral infection by generating a variety of sRNAs, including micro (miRNAs), natural-anti-sense (nat-siRNAs), repeat-associated (rasiRNAs), trans-acting (tasiRNAs), and viral small (vsRNAs) (Margis et al. 2006). DCL1 is the enzyme responsible for the processing and maturation of miRNAs in Arabidopsis. miRNAs are single-stranded 21-nt RNA molecules derived from partially complementary stem loop precursor structures transcribed from host genes that control gene expression (Eamens et al. 2008b). DCL2 is required for the processing of nat-siRNAs generated from two overlapping RNA transcripts in cis-antisense orientation (Katiyar-Agarwal et al. 2006) and the transitive silencing of transgenes (Mlotshwa et al. 2008). DCL3 is one of several components of the RNA-directed DNA methylation pathway (RdDM) and is responsible for processing 24-nt rasiRNAs from endogenous repeat sequences and transposons. DNA methylation of repetitive and transposon sequences suppresses their aberrant expression, thereby maintaining genome stability (Chan et al. 2005). DCL4 sequentially processes tasiRNAs from specific miRNA-targeted transcripts that are converted into double-stranded RNA by RNA-directed RNA polymerase 6 (RDR6). These tasiRNAs negatively regulate various transcripts involved in organ development and vegetative phase changes in the plant (Allen et al. 2005).

RNAi and DNA methylation have a well-established association and have been reported to be influenced and altered by stress conditions in Arabidopsis (Ben Amor et al. 2009; Borsani et al. 2005; Boyko et al. 2010; Navarro et al. 2008), rice (Yan et al. 2011), and Medicago truncatula (Capitao et al. 2011). Specific roles in stress response have been identified for DCL2 and DCL3 in Arabidopsis (Borsani et al. 2005; Boyko et al. 2010; Brosnan et al. 2007; Eamens et al. 2008a; Yan et al. 2011). In the paleopolyploid soybean, widespread gene duplication adds an additional layer of complexity to defining the roles of the genes that govern these processes.

In this study, we profiled the transcriptional responses to stress of eight soybean duplicated gene pairs and one nonduplicated gene (DCL3) known to be involved in epigenetic processes, particularly RNAi and DNA methylation. The RNAi genes include seven canonical soybean DCL genes and homoeologous pairs of ARGONAUTE1 (AGO1), RDR6, and the double-stranded RNA binding protein DRB1 (Eamens et al. 2009; Margis et al. 2006; Vaucheret 2008; Wassenegger and Krczal 2006). The cytosine–DNA–methyltransferase genes include a homoeologous gene pair with homology to DNA METHYLTRANSFERASE1 (MET1) (Finnegan and Dennis 1993) and a paralogous pair with homology to DOMAINS REARRANGED METHYLASE1 (DRM1) (Cao and Jacobsen 2002). The transcript analysis was conducted across eight different stress treatments and three different soybean genotypes in an attempt to define co-expression patterns between duplicate genes and identify unique transcriptional responses to stress.

Materials and methods

Identification of predicted RNAi pathway and methylation genes in soybean

The predicted soybean homologs for several gene families, such as the DCL, AGO, and methyltransferase families, were obtained from The Arabidopsis Information Resource (http://www.arabidopsis.org/) and the Rice Genome Annotation Project (http://rice.plantbiology.msu.edu/). The resulting amino acid sequences were used to query the soybean sequenced genome databases (http://www.phytozome.org). Most genes of interest had more than one copy. We chose to focus on a subset of genes that clearly showed two duplicate, intact copies: GmDCL1, GmDCL2, GmDCL4, GmAGO1, GmDRB1, GmDRM1, GmMET1, and GmRDR6. The soybean homolog gene models are shown in Table 1, all renamed with “Gm” prefixes to denote the species Glycine max and generic “a/b” annotations to specify the two duplicate copies. GmDCL3, which displayed one intact copy, was also included in the downstream transcriptional analysis. Thus, all seven complete soybean DCL genes were included in this part of the study.

Table 1 Location and conservation of soybean gene duplicates in this study

The positions of duplicated blocks for soybean were taken from published data (Schmutz et al. 2010) and the locations of individual genes of interest within those blocks were visualized using Circos (Krzywinski et al. 2009). Evolutionary distances between duplicate genes were determined using the gene model nucleotide sequences. Duplicate gene coding regions were aligned using the Smith–Waterman pairwise alignment algorithm. Synonymous (Ks) and nonsynonymous (Ka) changes between the duplicated sequences were determined using PAML (Yang 1997; Yang 2007). To determine the age of the duplications, a molecular clock was assumed and dating was determined as previously described (Schlueter et al. 2004).
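The dating step can be illustrated with a minimal sketch. It assumes the common strict-clock relation T = Ks / (2r), where r is the synonymous substitution rate per site per year. The default rate used below (6.1 × 10^-9) and the function name are assumptions made for illustration; this section does not state the rate, and the procedure actually followed is the one described in Schlueter et al. (2004).

```python
def duplication_age_mya(ks: float, rate_per_site_per_year: float = 6.1e-9) -> float:
    """Estimate the age of a duplication (million years ago) from Ks.

    Assumes a strict molecular clock: both copies accumulate synonymous
    substitutions at the same rate, so T = Ks / (2 * rate).
    The default rate is an illustrative assumption, not a value from the text.
    """
    if ks < 0:
        raise ValueError("Ks must be non-negative")
    if rate_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    years = ks / (2.0 * rate_per_site_per_year)
    return years / 1e6


# Hypothetical Ks value, chosen only to show the arithmetic.
example_age = duplication_age_mya(0.122)
print(f"Estimated age: {example_age:.1f} mya")
```

Because the estimate scales inversely with the assumed rate, a different clock calibration shifts every age by the same factor and leaves the relative ordering of the duplications unchanged.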

Plant materials, growth, and nucleic acid extraction

Seeds of the soybean cultivars Williams 82, Archer, and Noir 1 were obtained from Dr. James Orf at the University of Minnesota. The plants were grown in a 50:50 soil and vermiculite mix and maintained under standard growth chamber conditions (22–25 °C, 16-h photoperiod at 150–200 μmol m−2 s−1).

Abiotic stress experiments were performed after 14 days of growth. Seedlings were gently uprooted, cleaned of soil material, and incubated in their respective treatments for 3 h, then flash-frozen for RNA extraction. Salt-stressed seedlings were incubated in 200 mM sodium chloride (NaCl). Cold-stressed plants were incubated in distilled water at 4 °C in a well-lit walk-in cold room. Drought stress was carried out by incubation of the plant root system between two pieces of 3-mm Whatman filter paper. Pathogen response stress was simulated by incubation in 1 mM salicylic acid (SA) solution. Mock control seedlings were incubated in dH2O. All treatments were carried out in growth chamber conditions with the exception of the cold treatment (described above). Triplicate samples of mock and treated 14-day-old tissue encompassing uni-foliate and trifoliate leaves, stems, and roots were each harvested separately at approximately 6 h after light onset and immediately flash-frozen in liquid nitrogen.

An isolate of the oomycete soybean pathogen Phytophthora sojae Race28 (Ps28) (supplied by Dr. Dean Malvick, University of Minnesota) was maintained by weekly subculture on V8 agar. A 1-cm vertical incision was made on the hypocotyl 2–3 mm below the cotyledons on soybean cultivars Williams 82 (susceptible), Archer (resistant), and Noir 1 (susceptible). Agar infected with P. sojae was inserted into the incision and the wound site wrapped in parafilm to protect against desiccation and unrelated infection. Mock controls were carried out using sterile V8 agar (Kachroo et al. 2008). Mock- and P. sojae-inoculated plants were harvested at 3- and 24-h time points. The soybean viral pathogen, soybean mosaic potyvirus (SMV), was obtained from Prof. Ben Lockhart, University of Minnesota. The cotyledons, stems, uni-foliate, and tri-foliate leaves of 14-day-old Williams 82 (susceptible), Archer (susceptible), and Noir 1 (susceptible) were lightly dusted with carborundum and mechanically inoculated with ground SMV-infected plant material in 100 mM PO4 (pH 7.5) 0.5 % mercapto-ethanol inoculation buffer. Mock treatments were carried out using the same buffer minus infected material. Mock and virus-inoculated plants were incubated in the growth chamber and harvested at 10 and 30 days post-inoculation (dpi). All plants were harvested approximately 6 h after light onset, immediately flash-frozen in liquid nitrogen, and stored at −80 °C until required. Virus infection was confirmed by electron microscopy and PCR using virus-specific primers.

For all experiments, three biological replicate plants were sampled for each treatment × tissue type × genotype sample. Total RNA was extracted using TRIzol (Invitrogen), chloroform-treated twice to remove unwanted protein, and precipitated with an equal volume of iso-propanol. After centrifugation, the pellet was re-suspended in nuclease-free water and immediately DNase-treated, followed by purification using an RNA cleanup kit as per the manufacturer’s instructions (Qiagen). The RNA concentration and purity were measured with a Nanodrop spectrophotometer (NanoDrop Technologies) and RNA integrity was validated by agarose gel electrophoresis. DNA samples were collected from Archer, Noir 1, and Williams 82 leaf samples using the Qiagen DNeasy kit (Qiagen).

Transcription analysis with quantitative real-time PCR

A 3-μg aliquot of DNase-treated RNA from each RNA sample was reverse-transcribed using Superscript III for first-strand synthesis according to the manufacturer’s instructions (Invitrogen). Quantitative real-time PCR was performed to estimate the transcriptional responses of the Dicer-like genes GmDCL1a, GmDCL1b, GmDCL2a, GmDCL2b, GmDCL3a, GmDCL4a, and GmDCL4b (gene model names are shown in Table 1). Primers were designed using primer3 (http://frodo.wi.mit.edu/). Manual adjustments were made for some primers to assure paralog-specific amplification. Real-time PCR data were collected from amplification plots and measured relative to the calculated ΔCt value of the endogenous actin gene ACT2/7 (Glyma19g32990) (Jian et al. 2008) using the 2^ΔΔCt method (Pfaffl 2001). The primer sequences for the real-time PCR experiments are shown in Table S1 of the “Electronic supplementary material”.
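As a rough illustration of the relative quantification described above, the sketch below computes a fold change from threshold cycle (Ct) values of a target gene and a reference gene (here standing in for ACT2/7). It assumes perfect doubling per cycle (amplification efficiency of 2) and takes each difference as control minus treated, so that a positive ΔΔCt means up-regulation. Both choices, along with the function names and Ct values, are assumptions made for illustration; primer-specific efficiencies are not modeled.

```python
from math import log2


def fold_change_ddct(ct_target_treated: float, ct_ref_treated: float,
                     ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of a target gene in treated vs. control samples.

    Each Ct difference is control minus treated; the reference-gene shift is
    subtracted from the target-gene shift to normalize for input amount.
    Assumes the template doubles every cycle (efficiency = 2).
    """
    d_target = ct_target_control - ct_target_treated
    d_ref = ct_ref_control - ct_ref_treated
    ddct = d_target - d_ref
    return 2.0 ** ddct


# Hypothetical Ct values: the target crosses threshold two cycles earlier
# under stress while the reference gene is unchanged.
fc = fold_change_ddct(ct_target_treated=22.0, ct_ref_treated=18.0,
                      ct_target_control=24.0, ct_ref_control=18.0)
print(fc, log2(fc))
```

Taking log2 of the fold change places up- and down-regulation symmetrically around zero, which is the scale used when plotting the responses of two genes against each other.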

A linear model was created using the qPCR expression data. The model was a nested ANOVA with abiotic stress nested within genotype nested within developmental tissue. The transcriptional responses to stress of the GmDCL genes were compared with the unstressed control samples and significance was determined by Fisher's least significant difference test. Principal component analysis (PCA) was conducted by using every genotype, tissue, and abiotic stress combination as explanatory variables to describe the relationship between the GmDCL genes. ANOVA and PCA were conducted using the statistical software package R. PCA was visualized using the R biplot Gui (la Grange et al. 2009).

Relative transcription of gene duplicates

We used the Sequenom MassARRAY technology to quantify the transcript ratios of duplicated soybean genes encoding GmDCL1, GmDCL2, GmDCL4, GmAGO1, GmDRB1, GmDRM1, GmMET1, and GmRDR6 (gene model names are shown in Table 1). The procedure was nearly identical to the method used in a previous study to determine the transcript ratios of 29 homoeologous genes on soybean chromosomes 8 and 15 (Lin et al. 2010). Briefly, SNPs were identified between the coding regions of the gene duplicate pairs using Phytozome (http://www.phytozome.net/) and Align Sequences Nucleotide BLAST.

DNA from leaf tissues of Archer, Noir 1, and Williams 82 was used as a control for MassARRAY assay quality. The cDNA samples from the abiotic and biotic stress experiments on the three soybean genotypes were assayed with MassARRAY for the three biological replicates for each stress × tissue × genotype sample. To quantify the duplicate transcript ratios, PCR and extension PCR reactions on the cDNA and DNA control templates were performed according to the manufacturer’s specifications (Sequenom). To increase the reliability of the measurements, four technical replications were performed for each sample. In downstream analyses, the value used for each biological replicate was the mean of the four technical replicates. Mass spectrometry quantification of duplicate transcript ratios was performed at the University of Minnesota Genotyping Facility. The resulting data were run through a quality control pipeline to remove unusable data and bad assays as described (Lin et al. 2010). Transcript ratios were standardized based on the DNA control data for each assay as described (Lin et al. 2010).

Each of the gene pairs was represented by multiple SNP assays (GmDCL1 = six assays, GmDCL2 = three assays, GmDCL4 = seven assays, GmAGO1 = three assays, GmDRB1 = three assays, GmDRM1 = seven assays, GmMET1 = four assays, GmRDR6 = ten assays). For graphing purposes, transcript data were averaged among assays for each gene pair. Microsoft Excel, PowerPoint, and Spotfire DecisionSite 9.1.1 software were used to generate figures and tables of the duplicate gene transcription data.
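The two averaging steps stated in this section (the mean of four technical replicates per biological replicate, then the mean across SNP assays for a gene pair) can be sketched as follows. The data layout, function name, assay labels, and numbers are hypothetical, and the averaging over biological replicates in the middle is an added assumption. Quality control and the standardization against the DNA controls are deliberately left out, since their details are given in Lin et al. (2010) rather than here.

```python
from statistics import mean


def gene_pair_ratio(assays: dict[str, list[list[float]]]) -> float:
    """Average the 'a'-copy transcript proportion for one gene pair.

    `assays` maps an assay label to its biological replicates; each
    biological replicate is a list of technical-replicate proportions.
    Technical replicates are averaged first, then biological replicates,
    and finally the per-assay means are averaged across assays.
    """
    per_assay_means = []
    for biological_reps in assays.values():
        bio_values = [mean(technical_reps) for technical_reps in biological_reps]
        per_assay_means.append(mean(bio_values))
    return mean(per_assay_means)


# Hypothetical proportions of the 'a' transcript from two SNP assays,
# one biological replicate each, four technical replicates per sample.
example = {
    "snp_assay_1": [[0.62, 0.58, 0.60, 0.60]],
    "snp_assay_2": [[0.41, 0.39, 0.40, 0.40]],
}
print(round(gene_pair_ratio(example), 3))
```

Averaging within each level before moving to the next keeps an assay with more usable replicates from dominating the gene-pair value.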

Results

Identification and structure of duplicated soybean genes involved in epigenetic processes

Amino acid sequences from RNAi pathway genes previously characterized in Arabidopsis were used to query the soybean genome sequence database (www.phytozome.net) to identify soybean homologs. Searches were performed for genes known to be involved in the RNAi pathway, including the four canonical DCL genes, AGO1, RDR6, and DRB1. Additionally, searches were performed to identify soybean homologs to the cytosine–DNA–methyltransferase genes, MET1 and DRM1. Two duplicate soybean gene homologs were found for all but one of these genes. A duplicate gene was not identified in soybean for the homolog of DCL3.

The soybean GmRDR6, GmDRB1, and GmMET1 gene duplicates all reside within duplicated homoeologous blocks between chromosomes 4 and 6 with an average of ~60 % duplicate gene conservation between blocks (Fig. 1; Table 1). Similarly, the two GmAGO1 genes reside in a small but highly conserved block between chromosomes 9 and 16. The two GmDRM1 copies reside on chromosomes 5 and 19, which do not appear to belong to any conserved homoeologous block.

Fig. 1

Chromosomal positions of six homoeologous gene pairs in this study. The position of the GmDCL2 tandem duplication on chromosome 9 is also shown (indicated with asterisks)

The chromosomal locations of GmDCL1a and GmDCL1b were clearly defined and located in a large homoeologous block on the distal ends of chromosomes 3 and 19, respectively (Fig. 1). GmDCL2a and GmDCL2b both reside on chromosome 9 as a uni-directional tandem repeat separated by 5 kb. GmDCL2a and GmDCL2b share homology with two additional copies respectively located on chromosomes 8 and 15 (tentatively named GmDCL2c and GmDCL2d). However, analysis of predicted amino acid sequences of both GmDCL2c and GmDCL2d revealed several in-frame stop codons throughout the coding region, and both gene models have incomplete structural domains, leading to the conclusion that these “c” and “d” copies are likely pseudogenes. GmDCL4a and GmDCL4b are located on chromosomes 17 and 13, respectively, and are components of a small homoeologous block spanning several megabases between the two chromosomes. Both chromosomes 17 and 13 are highly rearranged, with many small homoeologous blocks matching several different chromosomes. The GmDCL3a locus situated on chromosome 4 resides within a robust homoeologous block between chromosomes 4 and 6, similar to the GmRDR6, GmDRB1, and GmMET1 gene duplicates. However, the GmDCL3b homoeologous candidate appears to be a pseudogene; GmDCL3b shares high DNA sequence homology (92–98 %) with GmDCL3a across several small regions of the locus but has an in-frame stop codon 42 amino acid residues down-stream of the start codon and lacks several critical domains.

Collectively, the genes analyzed in this study include six homoeologous pairs embedded within homoeologous blocks, one unlinked paralogous pair (GmDRM1), one tandem repeat paralogous pair (GmDCL2), and a single gene copy with no intact duplicate (GmDCL3). The synonymous substitution rate (Ks values; Table 1) was calculated between each duplicate pair to estimate the age of the duplications. The age of duplication for the six homoeologous pairs ranged from 3 to 11.7 mya (data not shown). This finding, along with the estimated age of duplication for other gene pairs within these blocks, suggests that these duplications were potentially derived from the whole-genome duplication event 9–14 mya. The age of duplication for the GmDCL2 tandem repeat was 19.4 mya, indicating that this duplication predated the most recent whole-genome duplication event of soybean.

We calculated Ka/Ks ratios for each duplicate pair to examine whether any of the gene copies show evidence for current positive selection (Ka/Ks > 1; Table 1). While none of these gene pairs show significant evidence of positive selection, GmDCL1a and GmDCL1b have a very high ratio relative to the other duplicates. This suggests that some positive selective pressure may have acted on one or both of these gene copies following duplication, allowing for divergence in function, and that over time those changes became fixed and have since been maintained under negative selection.

Transcriptional responses of DCL genes to abiotic stresses

Quantitative real-time PCR was used to measure the transcriptional responses of the seven GmDCL genes to abiotic stresses. Three genotypes (Williams 82, Archer, and Noir 1) were tested for four different stresses (cold, drought, SA, and high NaCl) along with an unstressed control. Each genotype × treatment was performed on three biological replications. PCA was used to group genes based on expression profile similarity among the stress treatments, tissue types, and genotypes. The first principal component explained 52.1 % of the variance and the first two principal components combined explained 74.3 % of the variance. The data can be interpreted in a PCA biplot (Chapman et al. 2002; Park et al. 2008) of these two principal components (Fig. 2). GmDCL2a and GmDCL3a exhibited a strong co-upregulation in response to abiotic stress, as seen by the cluster on the right side of the biplot (Fig. 2). This co-upregulation of GmDCL2a and GmDCL3a is observed in nearly all stresses and tissue types, particularly roots (Fig. S1 of the “Electronic supplementary material”). Furthermore, GmDCL1b and GmDCL2b exhibited a strong co-downregulation in response to abiotic stress as seen by the cluster on the left (Fig. 2). This relationship between GmDCL1b and GmDCL2b was driven largely by the stress-induced co-downregulation in stem tissues (particularly the SA treatment; Fig. S1 of the “Electronic supplementary material”). GmDCL1a and GmDCL4b both plotted near the center of the biplot, indicating that they exhibited limited responses to the treatments. GmDCL4a did not cluster with any other gene, perhaps due to greater expression variation among genotypes.

Fig. 2

Principal component analysis of the seven soybean Dicer-like genes in response to abiotic stresses. Real-time qPCR was performed on Williams 82, Archer, and Noir 1 cDNA in three different tissues under four abiotic stresses (and unstressed controls). This biplot shows clustering of soybean Dicer-like genes based on the first two principal components

Pairwise Pearson’s R correlations among the seven GmDCL genes were examined to further analyze co-expression patterns in response to stress (Table 2). The two major PCA clusters described above also exhibited the highest pairwise R values (GmDCL2a–GmDCL3a, 0.800; GmDCL1b–GmDCL2b, 0.821). The next highest R values were displayed by GmDCL2a–GmDCL2b (R = 0.656) and GmDCL3a–GmDCL2b (R = 0.656). Figure 3 shows scatterplots of the transcriptional responses to abiotic stresses for these four gene pair comparisons. Some trends were clearly observable for a given stress across the three genotypes. The most obvious was the effect of SA treatments on the GmDCL1b and GmDCL2b transcripts in stems; all three genotypes displayed a strong transcriptional down-regulation of these two gene copies (the yellow triangles in Fig. 3a). This treatment did not show down-regulation in the other five gene copies (note the positions of the yellow triangles in Fig. 3c, d). Figure S2 of the “Electronic supplementary material” shows all of the pairwise comparisons between the seven genes.
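The pairwise correlation analysis can be sketched as below. Each gene is represented by a vector of log2 fold changes, one value per treatment × tissue × genotype sample, and Pearson's R is computed for every pair of genes. The function names and the three short profiles are hypothetical and serve only to show the calculation; they are not data from this study.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / sqrt(sxx * syy)


def pairwise_correlations(profiles: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Pearson's R for every unordered pair of genes in `profiles`."""
    return {(g1, g2): pearson_r(profiles[g1], profiles[g2])
            for g1, g2 in combinations(sorted(profiles), 2)}


# Hypothetical log2 fold-change profiles across five samples.
profiles = {
    "geneA": [0.1, 0.8, 1.2, -0.2, 0.5],
    "geneB": [0.0, 0.9, 1.0, -0.1, 0.6],
    "geneC": [0.4, -0.3, -0.9, 0.2, -0.1],
}
for pair, r in pairwise_correlations(profiles).items():
    print(pair, round(r, 3))
```

With seven genes this yields 21 unordered pairs, which is the size of the lower triangle of a 7 × 7 correlation matrix.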

Table 2 Real-time qPCR correlation matrix of transcript response to abiotic stress for the Dicer-like gene family members
Fig. 3

a–d Gene × gene pairwise plots of transcriptional changes for the Dicer-like gene family members in response to abiotic stresses. Real-time qPCR was performed on Williams 82, Archer, and Noir 1 cDNA (genotypes are not distinguished in the plots) for 15 treatment × tissue type groups (color and shape coded according to the key; SA salicylic acid). Data are represented as log2-transformed values of the fold change relative to the unstressed control; each data point is the mean of three biological replications. The controls (shown as black squares) all plot at position (0, 0) and the range for each plot is −2.4 to 2.4 for both the X and Y axes. Data points that plot in the upper right quadrant represent samples in which both genes in the plot were transcriptionally upregulated; data points in the lower left quadrant represent samples in which both genes in the plot were transcriptionally downregulated. The comparisons shown in this figure represent the four highest pairwise correlations in this data set. The complete set of pairwise comparisons is shown in Fig. S2 of the “Electronic supplementary material”

GmDCL2 was the only duplicate gene pair to display a high co-expression value (R = 0.656; R values for the GmDCL1 and GmDCL4 paralogs were 0.195 and 0.268, respectively). This is surprising considering that these tandem-arranged copies are estimated to have a divergence time that is much more ancient (19.4 mya) than the 9–14 mya divergence estimate of the homoeologs (i.e., the presumed divergence time of the GmDCL1 and GmDCL4 duplicates). Therefore, GmDCL2a and GmDCL2b were further analyzed for their transcriptional responses to stresses relative to one another. Figure 4 shows the relative transcriptional responses of GmDCL2a versus GmDCL2b as measured by both qRT-PCR and Sequenom MassARRAY quantitative SNP assays. These data indicate that, despite their co-expression, GmDCL2a is more transcriptionally responsive to stresses than GmDCL2b across nearly all genotypes, tissue types, and stresses tested. The qRT-PCR data in Fig. 3c support this trend as the distribution of data points is relatively flat but extends much further along the right side of the GmDCL2a axis, indicating that GmDCL2a transcripts are frequently up-regulated under stress, while GmDCL2b transcript levels show less response.

Fig. 4

GmDCL2a shows a transcriptional increase relative to GmDCL2b in response to abiotic stress. Sequenom MassARRAY (X axis) and real-time PCR (Y axis) assays were used to estimate the relative transcriptional responses of GmDCL2a and GmDCL2b to four different abiotic stresses (see key) among three genotypes and three tissue types (each data point is the mean of three biological replications). The real-time PCR data were computed as the fold-change (FC) of GmDCL2a and GmDCL2b relative to the unstressed control. The GmDCL2a and GmDCL2b control FC values were set to 1.0, thus all of the control data plot along value 0.5 on the Y axis (shown as black squares). Nearly all of the stress × genotype × tissue data points plot to the upper right of the control, indicating that the stresses elicited a transcriptional increase of GmDCL2a relative to GmDCL2b (cross-validated by the real-time PCR and Sequenom MassARRAY platforms)

Another important finding in the qRT-PCR versus MassARRAY comparison was the relative cross-validation of the two platforms (Fig. 4). Sequenom MassARRAY is a multiplex PCR assay that allows for the automated quantification of several different SNPs in a single reaction. The SNPs can be quantified between paralogous genes for 384 templates for approximately 30 SNPs per sample. Therefore, this technology has much higher throughput than standard quantitative PCR. We chose to use the MassARRAY technology to screen relative transcriptional responses for a larger set of paralogous genes and a larger panel of biotic and abiotic stresses.

Assessing transcription of eight paralogous gene pairs in response to biotic and abiotic stresses

Along with the three GmDCL gene pairs, MassARRAY SNP assays were designed for five additional paralogous gene pairs involved in RNAi pathways and other epigenetic processes (GmAGO1, GmDRB1, GmDRM1, GmMET1, and GmRDR6). The MassARRAY system allowed us to measure the relative transcriptional changes between paralogous pairs across the four abiotic stresses described in the previous section and four biotic stress stages (P. sojae post-inoculation at 3 h, P. sojae post-inoculation at 24 h, SMV at 10 days post-inoculation (dpi), and SMV at 30 dpi).

Figure 5 shows a heat map of the relative expression levels of the duplicate gene copies across the control and treated samples. The variation in relative response among the gene pairs is notable. The GmDCL1, GmAGO1, GmDRB1, and GmRDR6 paralogous pairs displayed only subtle relative responses to any of the treatments. The other four paralogous pairs each displayed unique patterns. GmMET1 showed a strong up-regulation of the “a” copy in some specific stresses. GmDCL4 and GmDRM1 showed tissue-specific changes: the GmDCL4a copy was up-regulated in stem tissues and the GmDRM1a copy was strongly up-regulated in leaf tissues.

Fig. 5

Heat map of the ratio of transcript abundance between duplicate gene pairs as measured by Sequenom MassARRAY. Red indicates that transcription of the ‘a’ copy is favored, black indicates that the duplicate pairs have approximately equal transcription levels, while blue indicates that transcription of the ‘b’ copy is favored. The scale on the right indicates coloration used to depict the relative transcript proportions between duplicate copies. The heat map is organized by gene pair and tissue type, with each column representing a genotype (A Archer, W Williams 82, N Noir 1)

GmDCL2 displayed the most chaotic patterns. First, the down-regulation of GmDCL2b in SA-treated stems is clearly evident as a red row in an otherwise blue set of tiles (Fig. 5). This result confirms the same trend observed in the quantitative PCR data (Fig. 3). Second, the range of relative expression changes was extreme, favoring the “b” transcript in root and stem tissues but favoring the “a” transcript in some leaf tissue treatments. Furthermore, the transcriptional response to abiotic stresses almost always showed a relative up-regulation of the “a” copy (also see Fig. 4), but this response was not as strong or universal across the biotic stress treatments. Interestingly, GmDCL2 was the only paralogous gene pair that displayed a consistent difference among the genotypes. In both control and treated samples, Noir 1 showed a favoring of the “b” transcript compared to Williams 82 and Archer.

A more detailed examination of the duplicate gene transcript ratios among the genotypes is shown in Fig. S3 of the “Electronic supplementary material”. The genotype × genotype comparisons indicate that Williams 82 and Archer display similar duplicate transcript ratios for the set of eight gene pairs relative to the Williams 82–Noir 1 and Archer–Noir 1 comparisons. A gene expression heat map grouped by genotype also illustrates this point (Fig. S4 of the “Electronic supplementary material”). Additionally, these data indicate that there were generally fewer transcriptional differences between genotypes in response to the abiotic stresses than were observed in response to the biotic stresses (Fig. S3 of the “Electronic supplementary material”).

The relative expression ratios from the MassARRAY data were compared among the eight gene duplicates to identify possible interactions among sets of duplicate pairs. Pairwise correlations among the eight gene pairs were calculated from the entire set of MassARRAY ratios (Table S2 of the “Electronic supplementary material”). Nearly half (13 of 28) of the pairwise comparisons were significant. We also analyzed the positive co-expression trends to stress response within each treatment × tissue combination (Table S3 of the “Electronic supplementary material”). The data for the three genotypes were combined for this analysis, leaving a total of 24 interaction tests for each pair of genes. There were clearly far more co-expression interactions in response to abiotic stresses (36 significant interactions) than biotic stresses (five significant interactions). In fact, no significant interactions were observed in response to P. sojae inoculation. The total number of positive co-expression interactions for each gene pair is shown in Table S4 of the “Electronic supplementary material”.
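The counts quoted above follow from simple enumeration, sketched here. Eight gene pairs give 28 unordered pairwise comparisons, and eight stress treatments crossed with three tissue types give 24 treatment × tissue groups. The sketch assumes that the same three tissue types were sampled for every treatment; the labels and variable names are illustrative only.

```python
from itertools import combinations, product

gene_pairs = ["GmDCL1", "GmDCL2", "GmDCL4", "GmAGO1",
              "GmDRB1", "GmDRM1", "GmMET1", "GmRDR6"]
abiotic = ["cold", "drought", "SA", "NaCl"]
biotic = ["P. sojae 3 h", "P. sojae 24 h", "SMV 10 dpi", "SMV 30 dpi"]
tissues = ["leaf", "stem", "root"]  # assumed identical for all treatments

# Every unordered pair of duplicated gene pairs: C(8, 2) comparisons.
pairwise_comparisons = list(combinations(gene_pairs, 2))

# Every treatment x tissue group in which a pair of genes can be tested.
treatment_tissue_groups = list(product(abiotic + biotic, tissues))

print(len(pairwise_comparisons))     # 28
print(len(treatment_tissue_groups))  # 24
```

Half of the 24 groups are abiotic and half are biotic, so the two stress classes are compared over the same number of tests.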

Discussion

Profiling for duplicate gene co-expression

Stress conditions are known to trigger responses in gene expression which are regulated at both the transcriptional and post-transcriptional levels. In this study, PCR-based assays allowed us to screen the relative gene expression levels of eight gene pairs over three soybean genotypes, three tissue types, and eight stress conditions. These data revealed a wide range of patterns among the different gene pairs, including stress- and tissue-specific transcriptional responses.

The strongest co-expression patterns among the seven GmDCL genes were observed among non-paralogous copies, particularly GmDCL1b–GmDCL2b and GmDCL2a–GmDCL3a. The GmDCL1b–GmDCL2b co-expression pattern was most striking as a down-regulation in stems exposed to SA. The GmDCL2a–GmDCL3a co-expression pattern was driven mainly by up-regulation of both genes in response to various stress × tissue treatments and was most pronounced in stressed roots. We did not find any mechanistic rationale, such as gene promoter similarities, that might explain the observed co-expression between the non-paralogous GmDCL genes. Furthermore, at this point, it is unclear whether the GmDCL1b–GmDCL2b or GmDCL2a–GmDCL3a co-expression patterns are associated with shared or coordinated functions between the pathways assigned to each gene class (e.g., the miRNA, nat-siRNA, and/or rasiRNA pathways).

The co-expression analysis of homoeologous or paralogous GmDCL copies revealed a surprising and perhaps counter-intuitive relationship between the age of duplication and co-expression in response to stress. GmDCL2a and GmDCL2b are a tandem-arranged paralogous pair estimated to have duplicated 19.4 mya. Evidence suggests that the GmDCL1 and GmDCL4 gene pairs are more recent homoeologous duplications (Fig. 1), potentially resulting from the whole-genome duplication event 9–14 mya. One would expect more recent duplicates to display stronger co-expression patterns than more ancient duplicates; however, our data revealed that the GmDCL2 paralogs exhibited much stronger patterns of co-expression than either the GmDCL1 or GmDCL4 homoeologous pairs. Based on our analysis, there is no clear explanation for this finding, as there are no obvious differences in selective pressure (Ka/Ks) or promoter sequence conservation between the GmDCL2 paralogs and the GmDCL1 and GmDCL4 duplicates. However, the GmDCL2 paralogs are the only tandem-arranged duplicates in this study and may thereby be exposed to similar chromatin and/or epigenetic states in response to stress and development. Furthermore, a recent study of soybean small RNAs identified a 22-nucleotide miRNA (miR1515) that specifically targets the GmDCL2b locus (Zhai et al. 2011). This class of miRNA has been shown to trigger the production of secondary small RNAs (e.g., tasiRNAs) (Chen et al. 2010) that, in turn, may target the GmDCL2a copy and/or other GmDCL copies. This regulatory cascade may explain the co-expression of the GmDCL2 duplicates across treatments, and similar mechanisms may also influence co-expression patterns among non-paralogous copies.

Regulatory pathways and stress response

Based on homology to characterized Arabidopsis genes, we can broadly divide the genes investigated in this study into three distinct processes involved in transcriptional and post-transcriptional regulation: (1) DNA methylation (GmMET1 and GmDRM1), (2) RNAi processing (GmDCL2, GmDCL4, and GmRDR6), and (3) miRNA processing (GmDCL1, GmDRB1, and GmAGO1). All three of these processes have been implicated in plant stress responses. DNA methylation changes (Chinnusamy and Zhu 2009) and alterations of specific siRNAs and miRNAs (Kulcheski et al. 2011; Silva et al. 2011; Sunkar et al. 2007), as well as their targets (Borsani et al. 2005), have been reported across a wide range of plant species and stresses.

Our data allow us to compare the transcriptional response to stress for the genes in these three categories. At first glance, our results suggest that the DNA methylation and RNAi pathway genes are more responsive to stress than miRNA pathway genes in soybean. GmDCL3a and the GmDCL2 paralogs, particularly GmDCL2a, exhibited a wide range of transcriptional changes in response to stress (Fig. 3; Fig. S2 of the “Electronic supplementary material”). This finding suggests that these genes may play an important role in stress response. Though the two copies were co-expressed, GmDCL2a was consistently up-regulated in response to stress relative to GmDCL2b (Fig. 4). GmDCL2a may function as a component of the nat-siRNA pathway and/or as a surrogate for DCL4 in antiviral defense (Dunoyer et al. 2010). Furthermore, the relative transcript analysis of the DNA methyltransferase duplicate genes for GmDRM1 and GmMET1 showed evidence of transcriptional responses to the stress treatments (Fig. 5).

The GmDCL1 duplicates showed some transcriptional changes in response to stress; however, the range was narrower than that of the other GmDCL genes. GmDCL1b showed strong co-expression with GmDCL2b, including a conspicuous down-regulation in response to salicylic acid treatment in stems. Taking into account the ability of many DCL and DRB proteins to compete with and antagonize one another, it would be premature to dismiss the influence of the miRNA pathway in soybean stress response. In fact, recent reports in soybean have identified over 200 miRNAs, including several that exhibited differential expression under abiotic and biotic stress (Kulcheski et al. 2011; Li et al. 2011). However, the involvement of the different genes regulating and processing transcripts will remain unresolved until functional analysis can be carried out with appropriate soybean mutants.

The data set presented here may be particularly useful for designing targeted experiments that focus on functional divergence between duplicated genes. The development of new soybean mutant resources (Bolon et al. 2011; Hancock et al. 2011; Mathieu et al. 2009; Pham et al. 2010) and new methods capable of producing single and double mutants (Curtin et al. 2011) will be crucial for the advancement of soybean functional genomics and for studies of functional divergence between soybean duplicate genes. Mutant phenotypes for duplicated genes are frequently difficult to identify, largely due to the genetic buffering provided by the duplicate copy or copies (Bouche and Bouchez 2001; Jander and Barth 2007). Phenotypes for loss-of-function mutants may be more attainable and informative when screened under conditions known to trigger transcriptional differentiation between the duplicate copies.