Introduction

Mutation breeding is an important technique for creating genetic variation that provides new sources of useful agronomic traits for crop improvement. During the past seventy years, more than 3000 mutant varieties have been released, most of which were induced by gamma-ray irradiation (Ahloowalia et al. 2004). Unlike chemical mutagens such as ethyl methanesulfonate (EMS), which mainly induce point mutations, gamma-rays induce various types of DNA damage, including single- and double-strand breaks and base substitutions (Wallace 2002; Wu et al. 2006). This broad spectrum of DNA damage gives rise to a wide range of mutant phenotypes. In soybean, gamma-irradiation has produced earlier maturity, seed coat color changes, triple null lipoxygenase, altered seed storage proteins, and high yield (Lee et al. 2011, 2014; Ha et al. 2014).

The causal mutations responsible for target agronomic traits induced by gamma-irradiation should be identified, either for direct use or for further cross breeding with marker-assisted selection or marker-assisted backcrossing (Ahloowalia et al. 2004). Map-based cloning has been the most commonly used strategy for identifying mutated genes. However, the multistep process of population development and genetic map construction is laborious and time consuming (Lukowitz et al. 2000; Salvi and Tuberosa 2005). Recently, next-generation sequencing (NGS) technology has enabled de novo sequencing of crop species and direct detection of genome-wide DNA polymorphisms within species by resequencing (Varshney et al. 2009). NGS can also be used to identify differences between mutant and wild-type genomes, which, combined with comparative functional candidate gene approaches, allows rapid identification of causal mutations (Zhu and Zhao 2007; Schneeberger 2014; Varshney et al. 2014). In rice, Hwang et al. (2015) detected genome-wide DNA polymorphisms between an early-maturing mutant and wild-type rice by whole genome resequencing. A few hundred mutations that caused structural alterations of genes were then prioritized on the basis of molecular functions related to flower development, and one putative causal mutation was identified in a gene encoding a leucine-rich repeat receptor-like kinase.

Flowering time is an important agronomic trait influencing crop yield. Plants use different mechanisms to control the timing of flowering in response to photoperiod, temperature, and other environmental signals (Kim et al. 2009; Xia et al. 2012b). Soybean is classified as a short-day plant because it flowers in response to short photoperiods (Kim et al. 2012). At present, nine major loci have been reported to control time to flowering and maturity in soybean: E1 to E8 and J (Xia et al. 2012b). Of these, the E1, E2, E3, and E4 loci are related to photoperiod sensitivity under various light conditions. The E1 gene encodes a protein with a putative nuclear localization signal and a B3 domain; it positively regulates the flowering repressor GmFT4, a homolog of FLOWERING LOCUS T, and negatively regulates the flowering promoters GmFT2a and GmFT5a (Xia et al. 2012a; Zhai et al. 2014). The gene responsible for E2 is homologous to the circadian clock-controlled GIGANTEA (GI) gene in Arabidopsis (Watanabe et al. 2011). The E3 and E4 genes encode the PHYTOCHROME A (PHYA) homologs GmPHYA3 and GmPHYA2, respectively (Liu et al. 2008; Watanabe et al. 2009). In addition, the recent release of the soybean genome sequence has enabled comparative genomic analysis to identify 118 soybean genes homologous to Arabidopsis flowering time-related genes (Kim et al. 2012).

In a previous study, we selected an early-flowering soybean mutant line (Song et al. 2010). Here, to identify a causal mutation for the early flowering, we screened genome-wide DNA polymorphisms between the early-flowering mutant and wild-type soybean by whole genome resequencing and compared the expression of mutated genes involved in flowering between the two lines.

Materials and methods

Plant materials

The early-flowering mutant cultivar Glycine max ‘Josaengseori’ (JS) was developed from the landrace Seoritae (SR) by gamma-ray treatment (250 Gy) at the Korea Atomic Energy Research Institute (KAERI) (Song et al. 2010). SR has a black seed coat and green cotyledons and is suitable for cooking mixed with rice, but it flowers late (66 days to 50 % flowering) and matures late (164 days after sowing). In the mutant cultivar JS, time to 50 % flowering was 10 days earlier, and time to pod maturity was 34 days earlier, than in wild-type SR (Song et al. 2010). Seeds of JS and SR were planted in 15-cm-diameter pots in a greenhouse, and leaves of a single plant per cultivar were used to isolate total DNA for whole genome sequencing. For analysis of flowering gene expression, leaves of JS and SR were harvested at the time of 50 % flowering (R1–R2).

DNA library construction and massively parallel sequencing

Genomic DNA was extracted from fresh leaves according to the procedure described by Kim et al. (2010). The purified genomic DNA was randomly sheared with a Covaris S2 (Covaris, Woburn, MA, USA) to yield fragments in the target range of 400–500 bp, and the average fragment size was assessed using an Agilent Bioanalyzer 2100 (Agilent Technologies, Palo Alto, CA, USA). After fragmentation, the resulting overhangs were converted to blunt ends using TruSeq DNA Sample Preparation Kits v2 (Illumina, CA, USA), followed by a clean-up step using AMPure XP beads (Beckman Coulter Genomics, Danvers, MA, USA). To increase the efficiency of ligation between the fragmented DNA and index adapters and to reduce self-ligation of the blunt fragments, the 3′ ends were adenylated. Immediately after adenylation, the index adapters were ligated to the adenylated DNA fragments, and the products were purified using AMPure XP beads. Ligation products were size-selected on a 2 % agarose gel, followed by gel extraction and column purification. Successfully ligated DNA fragments containing adapter sequences were enriched by PCR using adapter-specific primers. The DNA was re-purified using AMPure XP beads, and the average size of each library was assessed with the Agilent Bioanalyzer 2100 to confirm a sharp peak in the expected 500–600-bp range. Each library was loaded on the Illumina HiSeq 2000 platform (Illumina, CA, USA), and high-throughput sequencing was performed to ensure that each sample reached the desired average sequencing depth. Image analysis and base calling were performed using the Illumina pipeline with default settings.

Preprocessing

After sequencing, the samples were demultiplexed and the indexed adapter sequences were trimmed using the SolexaQA package v.1.13 (Cox et al. 2010). Because base quality commonly drops toward either end of Illumina reads, we trimmed both ends of each read where the Phred quality score fell below Q = 20 (a 0.01 probability of error). We also trimmed all 5′ and 3′ stretches of ambiguous “N” nucleotides. Reads shorter than 25 bp after trimming were discarded; the trimmed reads had a mean length of 73.6 bp across all samples.
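The trimming rules above can be illustrated with a short Python sketch. It does not reproduce SolexaQA's own trimming algorithm; it assumes Phred+33-encoded quality strings and simply applies the thresholds stated in the text (Q = 20 at the read ends, removal of terminal N stretches, a 25-bp minimum length). The function names are hypothetical. The Phred relation P = 10^(−Q/10) also shows that Q = 20 corresponds to an error probability of 0.01, whereas P = 0.05 corresponds to Q ≈ 13.

```python
from typing import List, Optional, Tuple

def phred_to_error(q: float) -> float:
    """Error probability for a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def decode_qualities(qual_string: str, offset: int = 33) -> List[int]:
    """Decode an ASCII quality string (Phred+33 encoding is assumed)."""
    return [ord(ch) - offset for ch in qual_string]

def trim_read(seq: str, quals: List[int], min_q: int = 20,
              min_len: int = 25) -> Optional[Tuple[str, List[int]]]:
    """Trim low-quality or 'N' bases from both read ends.

    Returns the trimmed (sequence, qualities) pair, or None if fewer than
    min_len bases remain.
    """
    start, end = 0, len(seq)
    # Walk inward from the 5' end while bases are low quality or ambiguous.
    while start < end and (quals[start] < min_q or seq[start] == "N"):
        start += 1
    # Walk inward from the 3' end under the same rule.
    while end > start and (quals[end - 1] < min_q or seq[end - 1] == "N"):
        end -= 1
    if end - start < min_len:
        return None
    return seq[start:end], quals[start:end]

if __name__ == "__main__":
    # Hypothetical 35-bp read: a leading N and two low-quality 3' bases.
    read = "N" + "ACGT" * 8 + "GG"
    quals = [2] + [35] * 32 + [12, 8]
    print(phred_to_error(20), trim_read(read, quals))
```

Trimming only from the ends keeps interior bases even if one of them falls below the threshold, matching the end-trimming described in the text rather than a sliding-window filter.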

Alignment and analysis of variants including SNPs

The Burrows–Wheeler Aligner (BWA) program (Li and Durbin 2009) was used to align the reads to the reference genome. BWA default mapping parameters were used, except for seed length = 32, maximum differences in the seed = 1, number of threads = 10, mismatch penalty = 6, gap open penalty = 15, and gap extension penalty = 8. Mapped reads were extracted from the resulting BAM file for further analyses using SAMtools (Li et al. 2009). High mapping quality ensures reliable (unique) mapping of the reads, which is important for variant calling.
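The parameter names listed above match the options of the `bwa aln` (backtrack) algorithm, so the following sketch assumes that `aln` was the mode used; the text does not say so explicitly. It only builds the command line (it does not run BWA), and the reference and read file names are hypothetical.

```python
import shlex
from typing import List

# Non-default BWA settings from the text, mapped onto `bwa aln` options
# (assumption: the backtrack `aln` algorithm was the one used).
BWA_ALN_OPTIONS = [
    ("-l", 32),  # seed length
    ("-k", 1),   # maximum differences in the seed
    ("-t", 10),  # number of threads
    ("-M", 6),   # mismatch penalty
    ("-O", 15),  # gap open penalty
    ("-E", 8),   # gap extension penalty
]

def bwa_aln_command(reference: str, reads: str) -> List[str]:
    """Build an argument list for `bwa aln` (it writes .sai alignments to stdout)."""
    cmd = ["bwa", "aln"]
    for flag, value in BWA_ALN_OPTIONS:
        cmd += [flag, str(value)]
    cmd += [reference, reads]
    return cmd

if __name__ == "__main__":
    # Hypothetical file names; pass the list to subprocess.run() to execute it.
    print(shlex.join(bwa_aln_command("Gmax_v1.1.fa", "SR_trimmed.fastq")))
```

In the `aln` workflow, the resulting .sai file would then be converted to SAM with `bwa samse` (single-end) or `bwa sampe` (paired-end) before conversion to BAM with SAMtools.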

Only reliably mapped BWA reads were used for single nucleotide polymorphism (SNP) calling. SNP positions in the aligned reads relative to the reference were called using SAMtools (Li et al. 2009). With the varFilter command, SNPs were called only at variable positions with a minimum mapping quality of 30. The minimum SNP quality was set to 100, and the minimum and maximum read depths were set to 5 and 1000, respectively. A custom Perl script was used to select significant sites among the called SNP positions.
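The filtering thresholds above can be expressed as a simple predicate. This Python sketch is not the authors' Perl script or the SAMtools varFilter code; the `SnpCall` record and its fields are hypothetical, and it only shows how the stated cut-offs (mapping quality ≥ 30, SNP quality ≥ 100, depth between 5 and 1000) combine.

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class SnpCall:
    """Minimal, hypothetical representation of one called SNP site."""
    chrom: str
    pos: int
    ref: str
    alt: str
    snp_quality: float      # variant (SNP) quality
    mapping_quality: float  # mapping quality of reads covering the site
    depth: int              # read depth at the site

def passes_filters(call: SnpCall, min_mapq: float = 30, min_qual: float = 100,
                   min_depth: int = 5, max_depth: int = 1000) -> bool:
    """Apply the thresholds stated in the text to a single call (bounds inclusive)."""
    return (call.mapping_quality >= min_mapq
            and call.snp_quality >= min_qual
            and min_depth <= call.depth <= max_depth)

def filter_calls(calls: Iterable[SnpCall]) -> List[SnpCall]:
    """Keep only calls that pass every threshold."""
    return [c for c in calls if passes_filters(c)]

if __name__ == "__main__":
    # Hypothetical calls for illustration only.
    calls = [
        SnpCall("Gm10", 1000, "A", "T", 180.0, 45.0, 12),    # kept
        SnpCall("Gm10", 2000, "C", "T", 60.0, 50.0, 15),     # SNP quality too low
        SnpCall("Gm18", 3000, "G", "A", 150.0, 20.0, 9),     # mapping quality too low
        SnpCall("Gm18", 4000, "T", "C", 220.0, 60.0, 1500),  # depth above 1000
    ]
    for call in filter_calls(calls):
        print(call.chrom, call.pos, call.ref, call.alt)
```

The upper depth limit is useful because unusually deep sites often reflect collapsed repeats or paralogs, where apparent SNPs may be mapping artifacts.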

RNA extraction and expression analysis of flowering genes

Total RNA was isolated from freshly sampled leaves with TRIzol Reagent (Invitrogen, CA, USA) according to the manufacturer's protocol. The RNA concentration was determined using a NanoDrop system (NanoDrop, DE). The RNA solution was then diluted to the working concentration with distilled water and stored at −20 °C until use.

Relative expression levels in JS and SR were determined by quantitative RT-PCR using the SYBR Green II Master Mix Kit (Takara, Japan). Specific primers for the nine flowering genes carrying non-synonymous substitutions (Supplemental Table 1) were designed using PRIMER3 (http://frodo.wi.mit.edu/primer3/). An F-box family protein gene (CD397253) and an actin gene (ACT2/7, TC204150) were selected as housekeeping genes for comparison of gene expression levels (Gutierrez-Gonzalez et al. 2010). Total RNA (100 ng) was treated with DNase (Promega), and reverse transcription was performed according to the PrimeScript RT Reagent Kit instructions. For qPCR, 100 ng of total RNA was used per reaction in the Eco™ Real-Time PCR System (Illumina), and expression levels were analyzed with Eco Software v3.0.16.0 and normalized to α-tubulin. Three replicate reactions were performed for each condition, and data are presented as mean ± SD (n = 3).
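The text does not state the formula used by the Eco software, so the following Python sketch assumes the widely used comparative Ct (2^−ΔΔCt) method with roughly 100 % amplification efficiency. It accepts any list of reference-gene Ct values, and all Ct numbers in the example are hypothetical. Under this assumption, a relative expression value below 1 means lower expression in JS than in SR.

```python
import statistics
from typing import Sequence, Tuple

def delta_ct(target_ct: float, reference_cts: Sequence[float]) -> float:
    """Target Ct minus the mean Ct of the reference genes.

    Averaging Ct values corresponds to taking the geometric mean of the
    reference genes' relative quantities.
    """
    return target_ct - statistics.mean(reference_cts)

def relative_expression(target_mut: float, refs_mut: Sequence[float],
                        target_wt: float, refs_wt: Sequence[float]) -> float:
    """2^(-ddCt) fold change of the mutant (JS) relative to the wild type (SR),
    assuming ~100 % amplification efficiency for all primer pairs."""
    ddct = delta_ct(target_mut, refs_mut) - delta_ct(target_wt, refs_wt)
    return 2 ** (-ddct)

def mean_sd(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of replicate values."""
    return statistics.mean(values), statistics.stdev(values)

if __name__ == "__main__":
    # Hypothetical Ct values: target gene, then reference genes.
    fold = relative_expression(24.0, [20.0, 22.0], 22.0, [20.0, 22.0])
    print(fold)  # target amplifies two cycles later in the mutant -> 0.25-fold
    print(mean_sd([0.25, 0.30, 0.20]))
```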

Results

Whole genome resequencing

Genome-wide variation between SR and JS was screened by whole genome resequencing on the Illumina HiSeq 2000 platform. Illumina fragment libraries were constructed from genomic DNA isolated from a single homozygous plant of each line. For SR, a total of 167 million reads produced approximately 15 Gb of sequence, giving 15.5× genome coverage; for JS, 131 million reads produced 11 Gb of sequence, giving 11.9× coverage (Table 1). In the wild-type SR genome, 158.4 million (94.5 %) reads mapped to the reference, producing a 931.9-Mb consensus sequence that covered 95.7 % of the reference genome. In the mutant JS genome, 124.8 million (94.8 %) reads mapped to the reference, producing a 926-Mb consensus sequence that covered 95.2 % of the reference genome.
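The coverage figures above follow from two simple ratios: average depth is total sequenced bases divided by the reference size, and breadth is consensus length divided by the reference size. The Python sketch below uses the 973-Mb Gmax v1.1 size from Table 1. The base totals quoted in the text are rounded, so depths recomputed from them will not exactly match the table values.

```python
def mean_depth(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: total sequenced bases / reference genome size."""
    return total_bases / genome_size

def breadth_of_coverage(covered_bases: float, genome_size: float) -> float:
    """Fraction of the reference covered by the consensus sequence."""
    return covered_bases / genome_size

if __name__ == "__main__":
    GENOME = 973e6  # Gmax v1.1 reference size (Table 1)
    for name, bases, consensus in [("SR", 15e9, 931.9e6), ("JS", 11e9, 926e6)]:
        print(f"{name}: depth ~{mean_depth(bases, GENOME):.1f}x, "
              f"breadth {breadth_of_coverage(consensus, GENOME):.1%}")
```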

Table 1 Results of genome resequencing between wild-type landrace Seoritae (SR) and early-flowering mutant Josaengseori (JS) by comparison with the reference genome (Gmax v1.1, 973 Mb)

Identification of SNPs on individual chromosomes between SR and JS

The frequencies of SNPs and insertions–deletions (indels) on individual chromosomes of SR and JS were surveyed by comparison with the reference sequence (Table 2). A total of 332,821 SNPs were identified across all soybean chromosomes. Chromosome 18 had the highest number of SNPs (37,463), and chromosome 20 had the fewest (2591). SNP frequency per chromosome ranged from 1 SNP per 1482 bp (chr. 16) to 1 per 18,051 bp (chr. 20), with an average of 1 per 2925 bp; nucleotide diversity π (here, the average number of SNPs per nucleotide) ranged from 5.54 × 10⁻⁵ (chr. 20) to 6.75 × 10⁻⁴ (chr. 16), with an average of π = 3.42 × 10⁻⁴.
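With π defined as in the text (SNPs per nucleotide), it is simply the reciprocal of the "1 SNP per N bp" frequency. The short Python sketch below (function names are hypothetical) reproduces the reported values 6.75 × 10⁻⁴, 5.54 × 10⁻⁵, and 3.42 × 10⁻⁴ from the intervals 1482, 18,051, and 2925 bp.

```python
from typing import Tuple

def snp_density(length_bp: float, snp_count: int) -> Tuple[float, float]:
    """Return (bp per SNP, SNPs per bp) for a chromosome or genome.

    The second value is the quantity the text calls nucleotide diversity pi.
    """
    return length_bp / snp_count, snp_count / length_bp

def pi_from_interval(bp_per_snp: float) -> float:
    """Convert a '1 SNP per N bp' frequency to SNPs per nucleotide."""
    return 1.0 / bp_per_snp

if __name__ == "__main__":
    for label, interval in [("chr. 16", 1482), ("chr. 20", 18051), ("genome", 2925)]:
        print(f"{label}: 1/{interval} bp -> pi = {pi_from_interval(interval):.2e}")
```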

Table 2 Identification of polymorphisms on individual chromosomes between wild-type soybean landrace Seoritae (SR) and early-flowering mutant Josaengseori (JS) by comparison with the reference genome sequence

Of the 332,821 polymorphic SNPs, 68,662 (20.6 %) were in genic regions, and 264,159 (79.4 %) were in intergenic regions. Among the polymorphic SNPs in genic regions, 21,084 were in coding sequence (CDS) regions, 13,755 were in untranslated regions (UTRs), and 69,140 were in introns. Of the 21,084 SNPs in CDS regions, 46.6 % were synonymous and 53.4 % were non-synonymous. The SNPs that differed between JS and SR were located in 12,519 genes.
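Whether a coding SNP is synonymous or non-synonymous depends only on the codons before and after the change. The following Python sketch assumes the standard nuclear genetic code and considers one SNP per codon; it is an illustration, not the annotation pipeline used in this study. Nonsense changes (to a stop codon) are reported separately here but would be counted as non-synonymous in the text.

```python
from typing import Dict

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE: Dict[str, str] = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Label a coding SNP as synonymous, missense or nonsense ('*' = stop)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"  # counted as non-synonymous in the text
    return "missense"

if __name__ == "__main__":
    print(classify_codon_change("AAA", "TAA"))  # Lys -> stop
    print(classify_codon_change("ACT", "TCT"))  # Thr -> Ser
    print(classify_codon_change("CTT", "CTC"))  # Leu -> Leu
```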

The detected SNPs were classified on the basis of nucleotide substitution into transitions (A/G and C/T) and transversions (A/C, A/T, C/G, and G/T) (Table 3). Of the 332,821 SNPs, 224,570 (67.5 %) were transitions and 108,251 (32.5 %) were transversions. Among transversions, C/G substitutions were the least frequent (6.9 % of all SNPs).
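The transition/transversion split depends only on whether the two alleles belong to the same chemical class (purines A, G; pyrimidines C, T). The Python sketch below labels each SNP with the unordered allele pair used in the text (e.g., A/G) and its class; the function names are hypothetical. With the counts given above, the transition/transversion ratio is about 2.07 (224,570/108,251).

```python
from collections import Counter
from typing import Iterable, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def substitution_class(ref: str, alt: str) -> str:
    """Transition = purine<->purine or pyrimidine<->pyrimidine; otherwise transversion."""
    pair = {ref.upper(), alt.upper()}
    if len(pair) != 2 or not pair <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"

def substitution_type(ref: str, alt: str) -> str:
    """Unordered allele pair label such as 'A/G' or 'C/T' (alleles sorted)."""
    return "/".join(sorted((ref.upper(), alt.upper())))

def summarize(snps: Iterable[Tuple[str, str]]) -> Counter:
    """Count transitions and transversions among (ref, alt) pairs."""
    return Counter(substitution_class(r, a) for r, a in snps)

if __name__ == "__main__":
    example = [("A", "G"), ("T", "C"), ("G", "T"), ("C", "G")]
    print(summarize(example), [substitution_type(r, a) for r, a in example])
    print(round(224570 / 108251, 2))  # Ts/Tv ratio from the counts in the text
```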

Table 3 Identification of substitutions in SNPs detected in early-flowering mutant Josaengseori (JS) by comparison with wild-type landrace Seoritae (SR) genome sequence

Identification of indels on individual chromosomes between SR and JS

A total of 65,178 indels were identified across all soybean chromosomes, ranging from 875 (chr. 20) to 6793 (chr. 18), with an average frequency of 1 indel per 14,934 bp (Table 2). Chromosome 18, which had the most SNPs (37,463), also had the most indels. Of the 65,178 polymorphic indels, 13,441 (20.6 %) were in genic regions, and 51,737 (79.4 %) were in intergenic regions. Among the polymorphic indels in genic regions, 741 were in CDS regions, 3884 were in UTRs, and 15,675 were in introns. The indels that differed between SR and JS were located in 7211 genes.

The length and frequency of indels in SR and JS were calculated (Fig. 1). Of the 65,178 indels, 32,571 were insertions and 32,607 were deletions, each 1–5 bp long. Indel frequency decreased as indel length increased. Mononucleotide indels (49,614, 76.1 %) were the most frequent in JS and SR, followed by di- (9821, 15.1 %) and trinucleotide indels (3390, 5.2 %).
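An indel length spectrum like the one in Fig. 1 can be tallied from allele pairs. This Python sketch assumes VCF-style alleles in which REF and ALT share a leading anchor base, so the indel length is the difference in allele lengths; the example alleles are hypothetical. As a check on the text, 49,614 of 65,178 indels corresponds to 76.1 %.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

def indel_type_and_length(ref: str, alt: str) -> Tuple[str, int]:
    """Classify a VCF-style indel (REF and ALT share a leading anchor base)."""
    diff = len(alt) - len(ref)
    if diff == 0:
        raise ValueError("alleles have equal length; not an indel")
    return ("insertion" if diff > 0 else "deletion"), abs(diff)

def length_spectrum(indels: Iterable[Tuple[str, str]]) -> Counter:
    """Count indels by length (1 = mononucleotide, 2 = dinucleotide, ...)."""
    return Counter(indel_type_and_length(r, a)[1] for r, a in indels)

def percentages(counts: Counter) -> Dict[int, float]:
    """Share of each length class, in percent."""
    total = sum(counts.values())
    return {length: 100.0 * n / total for length, n in sorted(counts.items())}

if __name__ == "__main__":
    example = [("A", "AT"), ("ACG", "A"), ("T", "TAAA"), ("GC", "G")]
    spectrum = length_spectrum(example)
    print(spectrum, percentages(spectrum))
```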

Fig. 1

Frequency and size of deletions and insertions in early-flowering soybean mutant Josaengseori (JS) compared with the wild-type landrace Seoritae (SR) genome sequence

Functional analysis of genes carrying non-synonymous SNPs and indels

To identify significantly affected biological processes, gene ontology (GO) analysis was performed on genes carrying SNPs or indels between JS and SR. GO terms with a false discovery rate (FDR) below 0.01 were retained. The biological processes involving these SNPs and indels included reproduction (GO:0000003, GO:0022414), cellular processes (GO:0009987), cellular component organization (GO:0016043), multicellular organismal processes (GO:0032501), developmental processes (GO:0032502), response to stimulus (GO:0050896), localization (GO:0051179), establishment of localization (GO:0051234), and biological regulation (GO:0065007). Among these, cellular processes formed the largest category for both SNPs (42 %) and indels (44.1 %) (Fig. 2).
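The text does not specify the enrichment test or how the FDR was controlled. One common choice is the Benjamini–Hochberg procedure, sketched below in Python as an assumption: it converts per-term p-values into adjusted values and keeps terms below the 0.01 threshold. The p-values in the example are hypothetical and are not results from this study.

```python
from typing import Dict, List, Sequence

def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def significant_terms(term_pvalues: Dict[str, float], fdr: float = 0.01) -> List[str]:
    """GO terms whose BH-adjusted p-value is below the FDR threshold."""
    terms = list(term_pvalues)
    adjusted = benjamini_hochberg([term_pvalues[t] for t in terms])
    return [t for t, q in zip(terms, adjusted) if q < fdr]

if __name__ == "__main__":
    # Hypothetical enrichment p-values, for illustration only.
    pvals = {"GO:0009987": 1e-6, "GO:0032502": 0.002,
             "GO:0050896": 0.004, "GO:0008150": 0.5}
    print(significant_terms(pvals))
```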

Fig. 2

GO term representation (%) of SNPs or indels between JS and SR

Screening of genes related to flowering time in JS

The agronomic traits in which mutant cultivar JS showed the greatest improvement over wild-type SR were earlier time to 50 % flowering (10 days) and to pod maturity (34 days) (Song et al. 2010). To identify a causal mutation for early flowering, we screened for variants in 118 flowering pathway genes distributed throughout the soybean genome (Kim et al. 2012). Of these, 30 flowering genes carried SNPs and 25 carried indels (Supplemental Tables 2 and 3). In particular, nine flowering time-related genes carried non-synonymous SNPs in coding regions (Table 4). Relative to the reference genome sequence, five genes (Glyma02g33040, Glyma10g36600, Glyma13g01290, Glyma14g10530, and Glyma17g11040) showed non-synonymous SNPs in JS, whereas the non-synonymous SNPs in Glyma06g22650, Glyma16g01980, Glyma18g53690, and Glyma20g29300 were in SR. In Glyma02g33040, an A (SR) to C (JS) change replaced glutamic acid with aspartic acid. In Glyma06g22650, a G–A substitution changed valine in SR to methionine in JS. Glyma10g36600 and Glyma13g01290 had A–T substitutions, which changed lysine to a stop codon and threonine to serine, respectively. Glyma14g10530 had a C–T substitution that changed serine to leucine. Glyma16g01980 had an A–T substitution that replaced cysteine with serine, and Glyma17g11040 had a T–C substitution that changed isoleucine to threonine. Glyma18g53690 had a C–A change that replaced alanine with aspartic acid. Glyma20g29300 carried two SNPs, T–C and A–G substitutions, which changed glutamine to arginine and valine to alanine, respectively.
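The screening step above amounts to intersecting annotated variants with a candidate gene list and keeping only non-synonymous coding changes. The following Python sketch assumes a hypothetical `Variant` record with region and effect labels; it is not the authors' pipeline. In the example, Glyma10g36600 (nonsense) and Glyma17g11040 (missense) follow the text, while the other gene names are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set

@dataclass
class Variant:
    """Hypothetical annotated variant record."""
    gene_id: str
    region: str   # e.g. "CDS", "UTR", "intron", "intergenic"
    effect: str   # e.g. "synonymous", "missense", "nonsense", "indel"

def prioritize(variants: Iterable[Variant], flowering_genes: Set[str]) -> List[str]:
    """Flowering genes that carry at least one non-synonymous coding SNP."""
    hits = {
        v.gene_id
        for v in variants
        if v.gene_id in flowering_genes
        and v.region == "CDS"
        and v.effect in {"missense", "nonsense"}
    }
    return sorted(hits)

if __name__ == "__main__":
    flowering = {"Glyma10g36600", "Glyma17g11040", "hypothetical_gene_A"}
    variants = [
        Variant("Glyma10g36600", "CDS", "nonsense"),
        Variant("Glyma17g11040", "CDS", "missense"),
        Variant("Glyma17g11040", "UTR", "indel"),
        Variant("hypothetical_gene_A", "intron", "indel"),   # not coding
        Variant("hypothetical_gene_B", "CDS", "missense"),   # not a flowering gene
    ]
    print(prioritize(variants, flowering))
```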

Table 4 Identification of functional mutations in soybean genes that are homologous with flowering time-related genes of Arabidopsis

Differential expression of nine flowering genes carrying SNPs between SR and JS

We performed real-time quantitative PCR to determine whether the non-synonymous SNPs might affect expression of the nine flowering time-related genes (Fig. 3). Among these genes, ELF3 expression was higher in JS than in SR, whereas six genes (AGL18, GI, LHY, TOC1, TSF, and AGL42) were expressed at lower levels in JS. AGL8 expression was similar in JS and SR.

Fig. 3

Relative expression levels of flowering genes between wild-type landrace Seoritae (SR) and early-flowering mutant Josaengseori (JS)

Discussion

Next-generation sequencing generates massive numbers of reads at a low cost per Mb, and mapping mutant reads to the wild-type genome sequence provides high sequencing depth for predicting mutation sites (Hwang et al. 2015). Ionizing radiation (e.g., gamma-rays) produces both DNA strand breaks and base substitutions (Morita et al. 2009). Induction of nucleotide substitutions and small deletions (2–16 bp) by gamma-ray treatment has been demonstrated in different plant species, but the frequency of these changes at the DNA level has been evaluated in only a few studies, in rice (Sato et al. 2006; Morita et al. 2009) and Arabidopsis (Yoshihara et al. 2010).

In the present study, we detected numerous nucleotide changes in a soybean mutant using NGS. Gamma-ray mutagenesis produced 332,821 polymorphic SNPs and 65,178 indels in the early-flowering mutant JS compared with the wild-type cultivar SR. Among the detected SNPs, A/G and C/T transitions predominated, accounting for 33.6 and 33.9 %, respectively. Similarly, in whole genome analysis of rice mutants generated by gamma-ray irradiation, the frequency of base transitions (70 %) was higher than that of base transversions (Hwang et al. 2014). In Arabidopsis dry seeds, gamma-ray exposure also generated a high frequency of G/C to A/T transitions (Yoshihara et al. 2010). Ionizing radiation induces the formation of free radicals (·OH and H·), which cause oxidative DNA damage (Breen and Murphy 1995; Wallace 1998). The predominant form of oxidative DNA damage in animal and bacterial cells is 8-OH-dG (Fuciarelli et al. 1990; Hirano et al. 2001; Wang et al. 1998). Because 8-OH-dG can mispair with adenine, it mainly induces G/C to T/A transversions (Shibutani et al. 1991). In our study, however, transitions were more frequent than transversions, and each transversion type was relatively rare (8.6 % A/C, 8.3 % A/T, 6.9 % C/G, and 8.2 % G/T). Yoshihara et al. (2010) suggested that mutations induced by guanine oxidation might be rare in irradiated plant seeds and that mutation spectra vary with irradiation conditions and cell types. Conditions in dry seeds, such as low water content and cell proliferation activity, might likewise have shaped the mutation spectrum in our soybean mutant, as in rice and Arabidopsis.

The mutation rate in the early-flowering mutant JS generated by 250-Gy gamma-irradiation was calculated on a genome basis using whole genome resequencing. The average SNP frequency in the soybean mutant genome was 1 per 2925 bp, and nucleotide diversity π was 3.42 × 10⁻⁴. This is similar to the frequency of 1 per 2736 bp reported for a rice mutant genome after 200–300 Gy of gamma-irradiation (Hwang et al. 2014), suggesting that the degree of genetic alteration created by gamma-irradiation may be similar among species.

The genetic alterations (SNPs and indels) were predominantly (79 %) located in intergenic regions. Among the polymorphic SNPs in CDS regions, non-synonymous substitutions were more frequent than synonymous substitutions (53.4 vs. 46.6 %). A similarly high frequency of non-synonymous changes was observed in rice mutants (Hwang et al. 2014). Non-synonymous changes, which alter amino acid sequences, may contribute to phenotypic differences. The high frequency of non-synonymous changes could therefore help explain why gamma-irradiation produces high mutation rates and a broad spectrum of variation in agronomic traits.

The early-flowering mutant JS carried SNPs and indels in approximately 19,700 genes relative to the wild-type cultivar SR. To select potential candidate mutations, these variants can be prioritized according to the putative functions of genes thought to be connected to the mutant phenotype. The availability of the reference genome and comparative genomic analysis enabled the identification of a set of 118 key genes involved in the flowering pathway (Kim et al. 2012). Among these, only mutated genes containing non-synonymous SNPs in coding regions, which produce amino acid changes, were further examined as potential candidates for the early-flowering phenotype of JS. Using this approach, we identified nine flowering genes containing non-synonymous SNPs in their CDS regions (Table 4). Among these, Glyma10g36600 (GmGIa, E2) of JS contained a premature stop codon (AAA → TAA) in the 10th exon, the same mutation described in a previous report (Watanabe et al. 2011). The GI gene plays an important role in flowering by controlling the mRNA expression levels of CO and FT under inductive conditions in a wide range of plant species, including monocots and dicots such as rice and Arabidopsis (Koornneef et al. 1998; Fowler et al. 1999; Hayama et al. 2003; Mizoguchi et al. 2005). Watanabe et al. (2011) suggested that early flowering in soybean caused by loss of function of GmGIa was related to the expression level of GmFT2a. GI regulates FT expression via microRNAs that cooperate with other transcription factors (Jung et al. 2007). Overexpression of OsGI in transgenic rice suppressed the expression of FT orthologs and resulted in a late-flowering phenotype (Hayama et al. 2003). In our study, GmFT2a expression was higher in JS than in SR (relative expression 3.197; Supplemental Fig. 1). It therefore seems likely that the premature stop codon in Glyma10g36600 of JS affected GmFT2a expression and resulted in the early-flowering phenotype of JS.

TOC1 (TIMING OF CAB EXPRESSION 1) and CCA1 (CIRCADIAN CLOCK ASSOCIATED 1)/LHY (LATE ELONGATED HYPOCOTYL) together form the proposed central circadian loop (Ding et al. 2007). This positive–negative feedback loop between evening and morning factors led to the first genetic model of the plant clock (Alabadi et al. 2001). Both toc1 and cca1/lhy mutants have defects in flowering time and photomorphogenesis that correlate with their respective circadian phenotypes (Somers et al. 1998; Strayer et al. 2000). toc1 mutant plants flower early when grown under a short-day photoperiod. This phenotype results from clock-based misinterpretation of photoperiodic information rather than from a direct effect of toc1 on floral induction pathways (Somers et al. 1998; Strayer et al. 2000). cca1 and lhy mutants also flower early under short-day conditions, especially the cca1/lhy double mutant, which is nearly insensitive to photoperiod (Mizoguchi et al. 2002). In our results, TOC1 (Glyma17g11040) and LHY (Glyma16g01980) showed non-synonymous substitutions in JS and SR, respectively, and both were expressed at lower levels in JS than in SR. Ding et al. (2007) reported that in cca1/lhy/toc1, cca1/toc1, and lhy/toc1 mutants, the phase of GI expression was shifted earlier, resulting in a correlated increase in FT expression. We suggest that the early-flowering phenotype of JS results from low expression of TOC1 and LHY, which in turn leads to a phase shift of GI and an increase in FT.

Here, a non-synonymous substitution was detected in AGL18 (Glyma02g33040) of JS, whereas AGL15 was unchanged. In qPCR, AGL18 expression was lower in JS than in SR (0.159; Fig. 3). The MADS-domain factors AGL15 and AGL18 contribute to regulation of the floral transition (Fernandez et al. 2014). Although single mutants have no phenotype, agl15/agl18 double mutants flower earlier than the wild type (Adamczyk et al. 2007). AGL15 and AGL18 therefore appear to act redundantly as floral repressors in seedlings. The earlier flowering of agl15/agl18 mutants under short-day conditions is associated with upregulation of FT, and both AGL15 and AGL18 are expressed in the vascular system and shoot apex of young seedlings (Adamczyk et al. 2007), suggesting that AGL15 and AGL18 act directly on FT in leaves and on other targets in the meristem (Fernandez et al. 2014).

ELF3 is a clock-associated gene that plays a pivotal role in the circadian gating pathway (Hicks et al. 1996; McWatters et al. 2000). ELF3 has been shown to interact directly with both COP1 and GI in vivo. In the present study, ELF3 expression was higher in JS than in SR (3.004), whereas GI expression was lower (0.282). Yu et al. (2008) reported that ELF3-mediated interaction of COP1 with GI may result in degradation not only of the target protein GI but also of the substrate adaptor ELF3. They suggested that ELF3 is degraded upon interaction with COP1, creating a negative feedback mechanism that limits ELF3 activity. In our study, COP1 (Glyma14g05430) expression was lower in JS than in SR (0.535; Supplemental Fig. 1), which may account for the higher ELF3 level in JS. In addition, ELF3 overexpression promoted constitutive degradation of GI (Yu et al. 2008), so the higher ELF3 expression in JS may have contributed to its lower GI expression.

In this study, we used NGS to analyze an early-flowering soybean mutant generated by gamma-ray irradiation. The mutant JS showed numerous SNPs and indels compared with the original landrace SR. JS contained changes in flowering genes related to photoreceptor-mediated signaling pathways. We suggest that the early-flowering phenotype of JS was caused by changes in flowering genes generated by gamma-ray mutagenesis. Our results provide insights into the regulatory pathways associated with soybean flowering and improve our understanding of mutation breeding.