Introduction

The soybean (Glycine max (L.) Merr.) is one of the most economically significant crops, providing almost half of the world’s source of protein, livestock feed, and oil, thus rendering it an essential component of the oil and protein industries (Graham and Vance 2003). Soybean can be grown across a wide range of latitudes (from 50°N to 35°S) (McBlain and Bernard 1987), which may be attributed to natural variations in major genes and quantitative trait loci (QTLs) associated with flowering time and maturity (Cao et al. 2016). Using classical methods, 11 major genes related to flowering and maturity have been reported: E1 and E2 (Bernard 1971), E3 (Buzzell 1971), E4 (Buzzell and Voldeng 1980; Saindon et al. 1989a, 1989b), E5 (McBlain and Bernard 1987), E6 (Bonato 1999), E7 (Cober and Voldeng 2001), E8 (Cober et al. 2010), E9 (Kong et al. 2014; Zhao et al. 2016), E10 (Samanfar et al. 2017), and J (Ray et al. 1995). Among these genes, E3 (Watanabe et al. 2009) and E4 (Liu et al. 2008) encode phyA homologs that delay flowering and maturity. E1, which largely influences flowering time in soybean (McBlain and Bernard 1987; Upadhyay et al. 1994), is a legume-specific transcription factor that downregulates two soybean FLOWERING LOCUS T genes, namely, FT2a and FT5a (Kong et al. 2010; Xia et al. 2012; Cao et al. 2015). E2 has been identified as a soybean ortholog of the Arabidopsis GIGANTEA gene (Watanabe et al. 2011). Allelic variations and combinations of the E1, E3, and E4 genes greatly contribute to the observed wide range of adaptability of soybean (Xu et al. 2013; Jiang et al. 2014). E9 and E10 (Zhao et al. 2016; Samanfar et al. 2017) correspond to the soybean FT homologs FT2a and FT4, respectively. To date, most studies have focused on the genetic basis of flowering time under long-day (LD) conditions, whereas current knowledge of the mechanism of flowering time under short-day (SD) conditions is limited.

Cultivated soybeans were domesticated from their wild relative (Glycine soja Sieb. & Zucc.) approximately 5000 years ago in the temperate regions (between 32°N and 40°N) of China (Hymowitz 1970). In the following years up to the last century, soybean gradually spread to other production areas, except for those in the low latitudes of South America. This gap in expansion resulted from photoperiod sensitivity in soybean. As a facultative short-day (SD) plant, soybean cultivars from temperate regions flower early when grown in lower-latitude areas under SD conditions. Because later flowering and maturity are often positively correlated with seed yield (Cober et al. 2010), this early flowering limits the growing area of soybean. The identification of the long-juvenile (LJ) trait (delayed flowering time under SD conditions) was a key breakthrough in overcoming this barrier (Hartwig and Kiihl 1979; Neumaier and James 1993). Two loci, J and E6, play an important role in controlling the LJ trait (Ray et al. 1995; Bonato 1999). J, a co-ortholog of the Arabidopsis flowering-time gene EARLY FLOWERING 3 (ELF3), acts as a direct transcriptional repressor that binds to the promoter of the legume-specific flowering repressor E1 to suppress its transcription, relieving the E1-dependent repression of two important FT genes that are targets of E1 and promoting flowering under SD conditions. Multiple loss-of-function J alleles are distributed across the global soybean germplasm and greatly prolong soybean maturity and enhance grain yield in the tropics (Lu et al. 2017). The E6 locus has been mapped quite close to J, and E1 has been shown to have an epistatic effect on E6 (Li et al. 2017). The results of Lu et al. (2017) and Li et al. (2017) indicate that E6 and J are quite likely to be the same gene. Homozygous recessive alleles of E6 and J can delay soybean flowering time and improve yield in tropical regions.
Several studies have suggested that at least three loci control the LJ trait (Carpentieri-Pípolo et al. 2000, 2002). Based on the quantitative nature of the LJ trait, the identification of novel loci controlling the LJ trait is considered a major task by soybean researchers to elucidate the molecular mechanisms underlying this important trait.

QTL mapping is a common approach to the identification of new loci (Mohan et al. 1997; Daverdin et al. 2017; Kong et al. 2018), but traditional QTL mapping should be conducted in at least three independent test environments. Phenotyping across three locations and/or years in the field, or in three repetitions of a controlled-environment experiment, usually takes a long time. In this study, we used a new strategy to rapidly identify consistent novel QTLs associated with the LJ trait using multiple genetic populations and genotyping-by-sequencing (GBS). QTLs identified in this study will provide fundamental resources for fine-mapping candidate genes associated with the LJ trait and facilitate molecular breeding in the tropics.

Methods

Field experiments

To rapidly identify novel QTLs of the LJ trait, eight LJ lines from low-latitude areas and one conventional juvenile (CJ) line were used as parents to generate six F2 populations (Supplementary Table 1). The F2 populations and their respective parental lines were sown in greenhouses at the Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Harbin, China (45°43′N, 126°45′E) in May 2017. These greenhouses were fitted with blackout cloth to provide SD conditions (12 h light and 12 h dark). Days to flowering were recorded at the R1 stage (appearance of the first open flower) for each plant (Fehr et al. 1971).

DNA isolation

Young and fully-developed trifoliate leaves from the parents and the F2 individuals were collected, frozen in liquid nitrogen, and then stored in a −80 °C freezer. Total genomic DNA was extracted from each parent and individual using a plant genomic DNA kit (CWBIO, Beijing, China). The integrity and quality of the extracted DNA were evaluated by 1% agarose gel electrophoresis. The DNA concentration of each sample was determined using a Qubit® 2.0 fluorometer (Invitrogen, Carlsbad, CA, USA) and NanoDrop 2000 (Thermo Scientific, Wilmington, DE, USA).

Genotyping by high-throughput sequencing

Approximately 1.5 μg of DNA from each parent was prepared for whole-genome resequencing. Sequencing libraries were generated as described by Cheng et al. (2015). These parental libraries were sequenced on an Illumina HiSeq 2500 platform (Illumina, Inc., San Diego, CA, USA), and 150-bp paired-end reads with insert sizes of around 300 bp were generated. Based on the polymorphic loci identified between the parents, genotypes of the six F2 populations were determined by low-coverage sequencing. Genomic DNA was incubated at 37 °C with the restriction endonuclease MseI (New England Biolabs, NEB, Ipswich, MA, USA), T4 DNA ligase (NEB), ATP (NEB), and a barcode-containing MseI Y-adapter N. Restriction-ligation reactions were heat inactivated at 65 °C and then digested with the additional restriction enzymes MseI and NlaIII at 37 °C. The restriction-ligation samples were purified with Agencourt AMPure (Beckman Coulter, Brea, CA, USA). Polymerase chain reaction (PCR) was performed using diluted restriction-ligation DNA samples, dNTPs, Phusion Master Mix (NEB), universal primer, and index primer. The PCR products were purified using Agencourt AMPure XP (Beckman Coulter, Brea, CA, USA) and pooled, then separated by 2% agarose gel electrophoresis. Fragments ranging from 375 to 400 bp (with indexes and adaptors) were isolated using a gel extraction kit (Qiagen). Purified fragment products were diluted for sequencing. Then, paired-end sequencing (150 bp per end) was performed on an Illumina HiSeq 2500 system (Illumina, Inc., San Diego, CA, USA) according to the manufacturer’s recommendations.

Sequence data grouping and SNP identification

Raw data (raw reads) were first processed through a series of quality control (QC) procedures using in-house C programs to remove low-quality data. The QC standards removed the following: (1) reads in which > 50% of bases had a Phred quality < 5; (2) reads with ≥ 10% unidentified nucleotides (N); (3) reads that contained MseI and/or NlaIII cut-site remnant sequences; and (4) reads with > 10 nt aligned to the adapter, allowing ≤ 10% mismatches. The clean reads of each sample were aligned against the reference genome using the Burrows-Wheeler Aligner (BWA v0.7.10; settings: mem -t 4 -M -R) (Li and Durbin 2009), and SAMtools (v1.7.6) (Li et al. 2009) was then used to convert the alignment files to BAM files. GATK (v3.8.0) was used for variant calling across all samples (Wang et al. 2010). Markers with > 30% missing genotype data, markers showing segregation distortion (p < 0.01), and markers containing abnormal bases were filtered out before map construction.
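
As an illustration only, QC criteria (1) and (2) above can be sketched in a few lines of Python. This is not the in-house C program used in the study; the function names (decode_phred33, passes_qc) and the assumption of Phred+33 quality encoding are ours, and criteria (3) and (4), which require cut-site and adapter matching, are omitted.

```python
def decode_phred33(qual_string):
    """Convert a FASTQ quality string to Phred scores (assumes Phred+33 encoding)."""
    return [ord(c) - 33 for c in qual_string]


def passes_qc(seq, quals, max_lowq_frac=0.5, min_phred=5, max_n_frac=0.10):
    """Return True if a read survives criteria (1) and (2).

    (1) discard reads in which > 50% of bases have Phred quality < 5
    (2) discard reads with >= 10% unidentified nucleotides (N)
    """
    if not seq or len(seq) != len(quals):
        return False  # malformed record
    low_quality = sum(1 for q in quals if q < min_phred)
    if low_quality / len(seq) > max_lowq_frac:
        return False
    if seq.upper().count("N") / len(seq) >= max_n_frac:
        return False
    return True
```

A read would be checked with passes_qc(seq, decode_phred33(qual)), and both mates of a pair would normally be required to pass.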

Map construction

Chi-square (χ2) tests were conducted for all of the SNPs to detect segregation distortion. For bin mapping, markers with the same genotype were divided into bin markers using a Python script. Based on physical position, the markers were divided into 20 linkage groups (or chromosomes), and the genetic distance between the markers was estimated in cM using the QTL IciMapping software (http://www.isbreeding.net) (Meng et al. 2015).
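
The two steps above can be sketched as follows. This is a minimal illustration, not the script used in the study: it assumes codominant SNP markers with an expected 1:2:1 genotype ratio in an F2 population, and the function names and the marker tuple layout are hypothetical.

```python
import math


def f2_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic and p-value for 1:2:1 segregation (2 degrees of freedom)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotype calls for this marker")
    observed = (n_aa, n_ab, n_bb)
    expected = (n / 4, n / 2, n / 4)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    p_value = math.exp(-chi2 / 2)
    return chi2, p_value


def bin_markers(markers):
    """Merge consecutive markers with identical genotype vectors into bins.

    markers: list of (chrom, pos, genotypes) sorted by chromosome and position,
             where genotypes is a tuple with one call per F2 individual.
    Returns a list of (chrom, start, end, genotypes).
    """
    bins = []
    for chrom, pos, geno in markers:
        if bins and bins[-1][0] == chrom and bins[-1][3] == geno:
            # Same chromosome, no recombination observed: extend the current bin.
            bins[-1] = (chrom, bins[-1][1], pos, geno)
        else:
            bins.append((chrom, pos, pos, geno))
    return bins
```

A marker would be discarded as distorted when the returned p-value falls below the chosen cutoff (p < 0.01 in this study).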

QTL analysis using high-density genetic maps

QTLs for flowering time in the different populations were detected using the ICIM-ADD mapping method in the QTL IciMapping software with default parameters (PIN, the probability in stepwise regression, = 0.001; step = 1.0 cM). The LOD thresholds for declaring the existence of a QTL with an additive and/or dominance effect were determined by performing 1000 permutations on the data at a significance level of p = 0.05. The multiple-QTL model (MQM) implemented in the QTL analysis software MapQTL5 was also employed to identify QTLs. The LOD threshold for QTLs in the MQM method was calculated using a permutation test (PT; n = 1000) at an error rate of 0.05.
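
The idea behind a permutation-based LOD threshold can be sketched as below. This is a simplified illustration under our own assumptions, not the ICIM-ADD or MQM implementation: it uses a single-marker additive regression LOD, genotypes coded 0/1/2, and hypothetical function names. The phenotype is shuffled repeatedly, the genome-wide maximum LOD is recorded each time, and the (1 − α) quantile of those maxima is taken as the threshold.

```python
import math
import random


def marker_lod(x, y):
    """LOD score from a simple linear regression of phenotype y on genotype codes x."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic marker or constant phenotype
    rss_full = syy - sxy ** 2 / sxx  # residual sum of squares with the marker
    if rss_full <= 0:
        return float("inf")  # perfect fit
    return (n / 2) * math.log10(syy / rss_full)


def permutation_threshold(genotypes, phenotype, n_perm=1000, alpha=0.05, seed=0):
    """Genome-wide LOD threshold from n_perm phenotype permutations.

    genotypes: list of per-marker genotype vectors (one value per individual).
    """
    rng = random.Random(seed)
    shuffled = list(phenotype)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype association
        maxima.append(max(marker_lod(x, shuffled) for x in genotypes))
    maxima.sort()
    index = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return maxima[index]
```

With n_perm = 1000 and alpha = 0.05, the threshold is the 950th smallest of the 1000 permuted genome-wide maxima.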

Results

Phenotypic analysis

We chose eight LJ lines whose flowering time ranged from 43 to 54 days and one CJ line as parents (Table 1, Supplementary Fig. 1, and Supplementary Table 1). By crossing LJ lines with the CJ line or with other LJ lines, we produced six F2 populations: A, B, C, D, E, and F. Flowering time (R1) in the populations ranged from 26 to 53 days under SD conditions (12 h light and 12 h dark). Populations B, D, and E, which were generated by crossing LJ lines, showed transgressive segregation, indicating that the LJ trait in these populations is regulated in a quantitative manner.

Table 1 Statistical analysis of the flowering times of F2 populations in short-day environments

Analysis of sequencing data and construction of a genetic linkage map

All of the parental cultivars PI 591429, PI 628930, PI 240664, BR121, PI 285096, PI 159925, H3, PI 628951, and PI 285096, were resequenced at an average sequencing depth of 26.63×, 24.47×, 26.81×, 11.27×, 24.64×, 24.56×, 23.96×, 32.62×, and 24.64×, respectively. Compared with the reference genome Gmax_275_Wm82.a2.v1, the coverage of each parental cultivar ranged from 94.6 to 95.5% (Supplementary Table 2). The number of polymorphic loci of filtered SNPs used in each population ranged from 4408 to 6704. Adjacent intervals with the same genotype were considered a single recombination bin locus (Huang et al. 2009). Finally, 2958, 2272, 4016, 3329, 2260, and 2702 bin markers were used. Based on these bin markers, six highly dense genetic linkage maps were constructed, with an average marker interval of each population ranging from 0.73 to 1.19 cM (Supplementary Fig. 2 and Supplementary Table 3).

QTL mapping of LJ traits

The LOD score thresholds for evaluating the statistical significance of the QTL effects are shown in Supplementary Table 4. Loci located in similar regions in multiple populations were considered consistent QTLs. A total of seven QTLs associated with the LJ trait were identified in the six populations using the ICIM-ADD method (Table 2 and Fig. 1). Among these, three QTLs, qLJ4, qLJ6.1, and qLJ16.1, were detected in at least two populations and were therefore considered consistent QTLs. The first consistent QTL, qLJ4, was detected on chromosome 4 in populations B, D, and E, with the largest LOD scores of 4.21, 13.73, and 24.60, and explained 14.29%, 13.73%, and 24.60% of the observed phenotypic variation, respectively. The second consistent QTL, qLJ6.1, was located in the interval 15,645,576–17,359,296 on chromosome 6 and explained 33.74% and 48.15% of the observed phenotypic variation in populations A and F, respectively. The third consistent QTL, qLJ16.1, was detected in populations A, B, E, and F. It accounted for 8.65–18.21% of the observed phenotypic variation in the respective populations and could delay flowering time by 3–5 days. In addition to the three consistent QTLs, four QTLs, qLJ6.2, qLJ10, qLJ11, and qLJ16.2, were detected only in population C and explained approximately 49.46%, 3.60%, 3.70%, and 5.96% of the phenotypic variance, respectively.
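
The consistency criterion can be sketched in code. The text defines consistent QTLs as loci located in similar regions in multiple populations; the sketch below makes the stronger assumption, ours rather than the authors', that "similar" means physically overlapping intervals on the same chromosome. The function name and input layout are hypothetical.

```python
from collections import defaultdict


def find_consistent_qtls(qtls, min_populations=2):
    """Group overlapping QTL intervals and keep groups seen in several populations.

    qtls: list of (population, chrom, start, end) tuples.
    Returns a list of (chrom, start, end, populations) for each group of
    overlapping intervals detected in at least min_populations populations.
    """
    by_chrom = defaultdict(list)
    for population, chrom, start, end in qtls:
        by_chrom[chrom].append((start, end, population))

    consistent = []
    for chrom in sorted(by_chrom):
        group_start = group_end = None
        populations = set()
        for start, end, population in sorted(by_chrom[chrom]):
            if group_end is not None and start <= group_end:
                # Overlaps the current group: extend it.
                group_end = max(group_end, end)
                populations.add(population)
            else:
                # Close the previous group and start a new one.
                if len(populations) >= min_populations:
                    consistent.append((chrom, group_start, group_end, sorted(populations)))
                group_start, group_end, populations = start, end, {population}
        if len(populations) >= min_populations:
            consistent.append((chrom, group_start, group_end, sorted(populations)))
    return consistent
```

A QTL detected in only one population, such as those found only in population C, would not be returned by this function.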

Table 2 Details of the QTLs detected by ICIM and MQM methods in six F2 populations

Fig. 1 Quantitative trait locus (QTL) mapping by ICIM-ADD mapping, implemented in IciMapping 4.0. Whole-chromosome scan of QTLs in six F2 populations: a PI 591429 × PI 628930, b PI 240664 × BR121, c PI 285096 × PI 591429, d PI 159925 × PI 285096, e H3 × PI 628951, and f PI 628805 × PI 591429. The dotted lines indicate the threshold of each population

The MQM method was also employed to identify QTLs in this study (Table 2 and Supplementary Fig. 3). We detected seven QTLs in the six populations using the MQM method. These were distributed across five linkage groups and explained 3.58–54.20% of the phenotypic variation in their respective populations. Most of these QTLs were also detected by the ICIM-ADD method. Among these, qLJ4 was detected in the D and E populations, and qLJ6.1, qLJ6.2, qLJ10, qLJ16.1, and qLJ16.2 were identified in the same populations as with the ICIM-ADD method. This agreement supports the conclusion that these six QTLs control the LJ trait. Another QTL, qLJ11, was detected in the D population. Compared with the ICIM-ADD method, MQM gave similar detection results for the major QTLs but had smaller LOD scores and much larger QTL intervals.

Candidate gene prediction

In this study, QTL intervals detected by the ICIM-ADD method were used to predict candidate genes.

In populations B, D, and E, which were developed from crosses between LJ cultivars with the functional J allele and LJ cultivars with loss-of-function J alleles (BR121, PI 159925, and H3), one consistent QTL, qLJ4, was detected in almost the same position on chromosome 4, with a physical interval encompassing 1,931,339–3,010,079, which coincides with the J gene. qLJ4 therefore most likely corresponds to J.

In the A and F populations, we mapped another consistent QTL, qLJ6.1. We combined the intervals of qLJ6.1 in the two populations into a single physical region spanning 15,645,626–17,359,346 to avoid omissions during the identification of candidate genes. In the combined region, there are 122 predicted genes in the Williams 82 (W82) reference sequence (http://www.phytozome.net/soybean). Sixty genes could be functionally annotated using WEGO2.0 (Supplementary Fig. 4). Of these, 39, 24, and 48 genes were functionally annotated to the categories of biological processes, cellular components, and molecular function, respectively. Three genes were related to transcription regulation activity, namely, Glyma.06G186800, Glyma.06G190200, and Glyma.06G190800. Meanwhile, we found that the A and F populations shared the same 18 genes (Supplementary Table 5) with nonsynonymous/synonymous mutations between parents based on the resequencing data. All of the genes mentioned above were considered candidate genes of qLJ6.1 (Table 3).

Table 3 Information of the candidate genes of qLJ6.1 in A and F populations

In the A, B, E, and F populations, we detected the third consistent QTL, qLJ16.1. The QTL qLJ16.1 could delay flowering time by about 5 days according to Table 2. We merged the physical positions from the four populations into an interval between 48,317 and 4,404,659. This region harbors 458 genes according to the W82 reference genome, of which 246 could be functionally annotated using WEGO2.0 (Supplementary Fig. 5). Of these, 153, 55, and 209 genes were functionally annotated to the categories of biological processes, cellular components, and molecular function, respectively. One gene, Glyma.16G034900, was related to reproductive process. We also analyzed sequence polymorphisms of the 458 genes, of which 69, 74, 110, and 82 genes had nonsynonymous/stopgain SNV/stoploss SNV/nonframeshift insertion/nonframeshift deletion/frameshift deletion variants between their parents in the A, B, E, and F populations, respectively (Supplementary Tables 6, 7, 8, and 9). These four populations shared 21 mutated genes. The 22 genes mentioned above might be the candidate genes of qLJ16.1 (Table 4).

Table 4 Information of the candidate genes of qLJ16.1 in A, B, E, and F populations
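
The candidate-gene filtering used for qLJ6.1 and qLJ16.1 (merging the per-population intervals, intersecting the sets of mutated genes, and adding functionally annotated genes) can be sketched as follows. The function names are hypothetical, and the per-population intervals and gene sets in any call would come from Table 2 and the Supplementary Tables; none are reproduced here.

```python
def merge_interval(intervals):
    """Smallest single interval covering all per-population QTL intervals."""
    starts, ends = zip(*intervals)
    return min(starts), max(ends)


def shared_genes(gene_sets):
    """Genes carrying mutations between parents in every population."""
    sets = [set(genes) for genes in gene_sets]
    return sorted(set.intersection(*sets))


def candidate_genes(shared, annotated):
    """Union of shared mutated genes and genes selected by functional annotation."""
    return sorted(set(shared) | set(annotated))
```

For qLJ16.1, for example, combining the 21 shared mutated genes with the one gene annotated to reproductive process (Glyma.16G034900) in this way gives the 22 candidates reported above, on the assumption that the annotated gene is not itself among the 21.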

The QTL qLJ6.2 in population C on chromosome 6, with a physical position encompassing 19,730,954 to 20,245,291, was extremely close to E1. Li et al. (2017) suggested that the E1 gene plays a role in controlling flowering time under SD conditions. Lu et al. (2017) confirmed that E1, which is regulated by J, plays a role in the LJ trait. Therefore, we analyzed the resequencing data of the two parents of the C population, including the upstream, CDS, and downstream regions of the E1 genomic sequence. No sequence differences within the CDS, the 2-kb upstream fragment, or the 2-kb downstream fragment were detected between the two parents. This suggested that another gene within this interval is the candidate. Around this region lie three other genes related to flowering time, SPL (Glyma.06G205800), AGL8 (Glyma.06G205800), and RGA1 (Glyma.213100). According to the resequencing data, only one gene, AGL8 (Glyma.06G205800), differed in sequence between PI 285096 and PI 591429. PI 285096 harbored an SNP at nucleotide 187 (A to G) that resulted in an amino acid change (M to V) in Glyma.06G205800 relative to PI 591429. This soybean gene is an ortholog of the Arabidopsis flowering-time gene AGL8, which is negatively regulated by APETALA1 (AP1) (Mandel and Yanofsky 1995). This indicated that AGL8 might be the candidate gene of qLJ6.2.
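
The reported substitution can be checked with simple codon arithmetic. The sketch below assumes that nucleotide 187 is counted from the first base of the coding sequence (1-based); the helper names are ours. Because methionine is encoded only by ATG, an A-to-G change that yields valine must affect the first base of that codon (ATG to GTG), which matches position 187 being the first base of codon 63.

```python
# Only the two codons needed for this example (standard genetic code).
CODON_TABLE_SUBSET = {"ATG": "M", "GTG": "V"}


def codon_position(nt_pos):
    """Map a 1-based CDS nucleotide position to (codon number, position in codon)."""
    if nt_pos < 1:
        raise ValueError("CDS positions are 1-based")
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1


def substitute(codon, pos_in_codon, new_base):
    """Return the codon with the base at the 1-based position replaced."""
    return codon[:pos_in_codon - 1] + new_base + codon[pos_in_codon:]


codon_number, pos_in_codon = codon_position(187)      # codon 63, first base
mutant_codon = substitute("ATG", pos_in_codon, "G")   # ATG -> GTG
amino_acid_change = (CODON_TABLE_SUBSET["ATG"], CODON_TABLE_SUBSET[mutant_codon])
```

Under these assumptions the variant would be written M63V at the protein level.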

The QTL qLJ16.2 in the C population spanned markers 16-Bin161 to 16-Bin162, with a physical position encompassing 30,009,486–30,149,533. Around this region lie four orthologs of Arabidopsis flowering genes, FT2a (Glyma.16G150700), FT2b (Glyma.16G151000), GI (Glyma.16G163200), and AGL6 (Glyma.16G200700). According to the resequencing data, FT2a, FT2b, GI, and AGL6 all had exonic polymorphisms between the parents. Among these, the SNP in FT2b results in a synonymous mutation. Thus, FT2a (Glyma.16G150700), GI (Glyma.16G163200), and AGL6 (Glyma.16G200700) might be candidate genes of qLJ16.2. FT2a, a soybean ortholog of FLOWERING LOCUS T (FT), is involved in the transition to flowering and is highly upregulated under SD conditions (Kong et al. 2010), which indicates that FT2a is the more likely candidate gene of the QTL qLJ16.2.

The interval of QTL qLJ10 in the C population on chromosome 10 encompasses positions 43,116,197–43,252,908, where it coincides with the E2 gene (Watanabe et al. 2011). The parent PI 285096 possesses a dominant E2 allele, whereas PI 591429 possesses a recessive E2 allele. Thus, we assumed that QTL qLJ10 might be controlled by E2.

Besides the major QTLs mentioned above, another QTL, qLJ11, was mapped on chromosome 11 in population C. There were four genes related to flowering around this QTL, namely, VRN1 (Glyma.11g124100), FPA (Glyma.11g126400), FTIP1 (Glyma.11g130400), and LUX (Glyma.11g136600). Among these four genes, only LUX (Glyma.11g136600) has nonsynonymous mutations between the parents of the C population. LUX encodes a MYB domain protein that is essential to circadian rhythms (Hazen et al. 2005). In Arabidopsis and the long-day legume pea, mutants for ELF3 and LUX exhibit similar early-flowering phenotypes (Liew et al. 2014). Thus, LUX (Glyma.11g136600) might be the candidate gene of qLJ11.

Discussion

The LJ trait is characterized by delayed flowering time under SD conditions (Neumaier and James 1993; Spehar 1995) and plays a key role in the expansion of soybean farming areas from high to low latitudes. To date, only two loci, E6 and J, have been determined to control LJ (Ray et al. 1995; Bonato 1999). J was first cloned in 2017 and was described as a major gene that controls flowering time under SD conditions in low-latitude regions (Lu et al. 2017). J acts as a direct transcriptional repressor upstream of the legume-specific flowering repressor E1, which is the core soybean flowering suppressor that downregulates two soybean FLOWERING LOCUS T genes, namely, FT2a and FT5a (Xia et al. 2012). At least eight loss-of-function J alleles exist in nature. Examination of the geographical origins of J haplotypes suggests that J is not essential to soybean domestication, but supports the idea that loss of J function confers advantages at lower latitudes and has arisen independently several times as an important means of adaptation during expansion to these regions. Some late-flowering cultivars from low-latitude regions carry an apparently functional J allele, which suggests that other genes must also control the LJ trait. This is consistent with previous findings that the LJ trait is regulated by at least three independent loci (Carpentieri-Pípolo et al. 2000, 2002). Identification of novel QTLs related to the LJ trait may help improve final grain yield in low-latitude regions.

QTL mapping is a common method for identifying novel loci related to quantitative traits, but its two main parts, genetic map construction and the collection of phenotype data from at least three independent test environments, usually take a long time. Taking advantage of high-throughput sequencing technology, a high-density genetic map for QTL mapping can be rapidly constructed with single-nucleotide polymorphism (SNP) markers developed using the genotyping-by-sequencing (GBS) approach. However, the collection of phenotype data still delays the mapping of QTLs. Here, we designed a new strategy to rapidly identify consistent novel QTLs underlying the long-juvenile trait in soybean using multiple genetic populations and GBS. To test the effectiveness and reliability of this method, we used the strategy to detect a known locus controlling the LJ trait. Thus, we developed three populations, B, D, and E, in each of which one parent harbors a loss-of-function J allele. As expected, we successfully mapped J in these populations. These results support the accuracy of the data of each population. Furthermore, the mutual validation among the B, D, and E populations supports the effectiveness and reliability of this strategy. Compared with the traditional strategy, this strategy takes much less time to detect consistent novel QTLs.

In this study, we consistently mapped the known locus J and two novel QTLs related to the LJ trait, qLJ6.1 and qLJ16.1, and predicted candidate genes by GO annotation and by searching for polymorphisms between parents. Besides qLJ6.1 and qLJ16.1, we detected two major QTLs very close to E2 and FT2a, indicating that E2 and FT2a might affect the LJ trait. This finding agrees with the result of a previous study that E2 and FT2a might control the LJ trait in soybean (Lu et al. 2015). Map-based cloning of qLJ6.1 and qLJ16.1 and the functional characterization of these candidate genes are currently underway in our laboratory. Our findings may improve our understanding of the genetic and molecular mechanisms underlying the LJ trait, and the new strategy could offer new insights into QTL mapping.