Introduction

Soybean [Glycine max (L.) Merrill] is particularly important to world agriculture as a major economic oil and protein crop, providing more than 25% of the world’s protein for food and animal feed (Lu et al. 2020). Flowering time is one of the most important factors affecting soybean yield (Zhang et al. 2004), and it is controlled by several major gene loci and quantitative trait loci (QTLs). To date, twelve major genetic loci and genes associated with flowering time have been identified: E1 and E2 (Bernard 1971; Xia et al. 2012; Watanabe et al. 2011), E3 (Buzzell 1971; Watanabe et al. 2009), E4 (Buzzell and Voldeng 1980; Saindon et al. 1989a, b; Liu et al. 2008), E5 (McBlain and Bernard 1987), E6 (Bonato and Vello 1999; Li et al. 2017; Fang et al. 2020), E7 (Cober and Voldeng 2001), E8 (Cober et al. 2010), E9 (Kong et al. 2014; Zhao et al. 2016), E10 (Samanfar et al. 2017), E11 (Wang et al. 2019), and J (Ray et al. 1995; Lu et al. 2017). Among these, E1 is a legume-specific transcription factor that contributes the most to flowering time and maturity and delays flowering by suppressing the expression of FLOWERING LOCUS T (FT) orthologs in soybean (Xia et al. 2012). E2 is an ortholog of Arabidopsis GIGANTEA, which plays conserved roles in flowering in many crops (Hecht et al. 2007; Itoh et al. 2010; Izawa et al. 2011). E2 regulates flowering time in soybean by controlling the expression of GmFT2a but not GmFT5a (Watanabe et al. 2011). The effect of E2 on delaying flowering has been stably detected in different environments (Watanabe et al. 2011; Xu et al. 2013). The diversification of E2 haplotypes contributes to the adaptation of soybean to different latitudes by affecting flowering time. The E2 allele appears more frequently in southern soybean varieties than in northern ones (Wang et al. 2016). However, the mechanism by which E2 regulates flowering time in soybean is not clearly understood.
E3 and E4 are reported to be orthologs of Arabidopsis Phytochrome A (phyA) and regulate flowering under different red-to-far-red (R:FR) quantum ratios (Cober et al. 1996; Liu et al. 2008; Watanabe et al. 2009; Xu et al. 2013). Under long-day conditions, either E3 or E4 can induce the expression of E1 (Xia et al. 2012; Xu et al. 2013). E5 is considered to have resulted from an unexpected outcrossing with pollen carrying an E2-dl allele, and no unique E5 gene exists (Dissanayaka et al. 2016). The E7 locus is located on Chr06 and is closely linked to E1 (Cober and Voldeng 2001), while the E8 locus is located on Chr04 near the E1L gene (Cober et al. 2010). Both E7E7 and E8E8 delay flowering or maturity under low R:FR light quality or natural daylight conditions (Cober and Voldeng 2001; Cober et al. 2010). The candidate genes of E9 and E10 are GmFT2a and GmFT4, respectively, both of which are members of the GmFT family (Kong et al. 2014; Zhao et al. 2016; Samanfar et al. 2017). E11 is a recently identified locus located on Chr07, and E11E11 promotes flowering under long-day conditions (Wang et al. 2019). E6 and J have been identified as promoting flowering under short-day conditions (Lu et al. 2017; Li et al. 2017). A recent study indicated that E6, an ortholog of Arabidopsis EARLY FLOWERING 3, is a novel allele of J (Fang et al. 2020). Among these loci, various combinations of E1 to E4 greatly contribute to the adaptation and yield of soybean (Tasma et al. 2001; Liu et al. 2011; Harada et al. 2011; Watanabe et al. 2012; Xu et al. 2013; Jia et al. 2014; Jiang et al. 2014; Li et al. 2017). Together, these genes explain as much as 62–66% of the variation in flowering time (Liu et al. 2008; Tsubokura et al. 2014). In addition, Tof11 and Tof12 have recently been reported to sequentially contribute to soybean adaptation to higher latitudes (Lu et al. 2020). Both have undergone stepwise selection, and their loss-of-function alleles were enriched in landraces, particularly in the northeast region of China (Lu et al. 2020).

Although much work has been done to elucidate the control of flowering time in soybean, the mechanism is not yet fully understood. In the present work, a recombinant inbred line (RIL) population called S4W was developed to identify new QTLs related to flowering time under long-day conditions. A total of twelve QTLs were detected, four of which are stable. Moreover, we found that the roles of qR1-2 and qR1-6.1 in regulating flowering depend on E2 and that the causal gene of qR1-2 acts upstream of E2. Additionally, we predicted the most likely candidate genes according to resequencing data and briefly analyzed the geographic distributions of the candidate genes.

Materials and methods

Plant materials and growth conditions

For QTL mapping, an F6:8 RIL population named S4W was developed via the single-seed descent method (Fehr 1987). The population (n = 116) was generated from a cross between Suinong4 (e1ase2e3E4) and Williams 82 (e1asE2E3E4). The RIL population and parents were grown under natural long-day conditions in 2018 and 2019 at the Experimental Station of the Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Harbin, China (45.75°N, 126.63°E). The seeds were sown with a row length of 2 m, row spacing of 60 cm, and distance between plants of approximately 10 cm. Approximately 20 seeds were sown per line. Standard cultivation practices were applied equally to each trial to control insects and weeds (Lu et al. 2017). The day of emergence was defined as the day when 50% of plants in a line reached emergence. The day when 50% of plants in a row began to flower was recorded as R1 (Fehr et al. 1971). The 424-accession panel obtained from Lu et al. (2020) was also used for analysis in this study. The panel was grown under natural day-length conditions in Zhengzhou (34°44′N, 113°42′E) and Hefei, China (31°51′N, 117°15′E) in 2018, and in Guangzhou (23°16′N, 113°23′E) (short-day conditions), Wuhan (30°52′N, 114°31′E), Zhengzhou, and Harbin, China (45.75°N, 126.63°E) (long-day conditions) in 2019, and was used to evaluate flowering time (Lu et al. 2020). Plants used for qRT-PCR were grown under long-day conditions (16 h light/8 h dark) in a plant growth cabinet.

DNA extraction

Young, fully developed trifoliate leaves were collected, frozen in liquid nitrogen, and transferred to a −80 °C freezer. Total genomic DNA was extracted from the samples with a Genomic DNA kit (CWBIO, Beijing, China). The integrity and quality of the extracted DNA were tested by 1% agarose gel electrophoresis. The DNA concentration of each sample was measured with a Qubit 2.0 Fluorometer (Invitrogen, Carlsbad, CA, United States) and a NanoDrop 2000 (Thermo Scientific, Wilmington, DE, United States).

Genotyping by high-throughput sequencing

For whole-genome sequencing, about 1.5 μg of DNA per sample was used. The sequencing data used in this study were generated as described by Cheng et al. (2015). Both parental libraries were sequenced on the Illumina HiSeq 2500 platform (Illumina, Inc., San Diego, CA, USA), generating 150-bp paired-end reads with insert sizes of about 300 bp. Genotyping-by-sequencing (GBS) technology was used to genotype the S4W population. Low-coverage sequencing was used to identify the genotypes of single-nucleotide polymorphisms (SNPs) in the RIL population based on the reference polymorphic loci of the parents (Huang et al. 2009; Davey et al. 2013). Genomic DNA was incubated at 37 °C with the restriction endonuclease MseI (New England Biolabs, NEB, Ipswich, MA, USA), T4 DNA ligase (NEB), ATP (NEB), and MseI Y adapter N containing a barcode. The reaction was then terminated by raising the temperature to 65 °C. The enzymes MseI and NlaIII were then added to digest the products at 37 °C. Following digestion, the products were purified using Agencourt AMPure (Beckman Coulter, Brea, CA, USA) and amplified by PCR. The PCR system included diluted restriction-ligation DNA samples, dNTPs, Phusion Master Mix (NEB), universal primer, and index primer. The PCR products were purified by the same method as the digested products and separated by 2% agarose gel electrophoresis. Fragments ranging from 375 to 400 bp were isolated and extracted from the gels with a Gel Extraction kit (Qiagen). The purified fragments were diluted prior to sequencing. Paired-end sequencing with a read length of 150 bp was then performed on the Illumina HiSeq 2500 system (Illumina, Inc., San Diego, CA, USA) according to the manufacturer’s recommendations.

Sequence data grouping and SNP identification

To obtain reliable data without artificial bias, the raw reads were subjected to a series of quality control (QC) procedures using in-house C programs. Reads were removed in the first round of filtering if they met any of the following criteria: (1) > 50% of bases with a Phred quality < 5; (2) ≥ 10% unidentified nucleotides; (3) MseI and/or NlaIII remnants; or (4) > 10 nt aligned to the adapter, allowing ≤ 10% mismatches. The clean reads from each sample were aligned against the reference genome using the Burrows-Wheeler Aligner (BWA v0.7.10; settings: mem -t 4 -M -R) (Li and Durbin 2009), and SAMtools (v1.7.6) (Li et al. 2009) was then used to convert the alignment files to BAM files. GATK (v3.8.0) was used for variant calling in all samples (Wang et al. 2010). Markers with > 30% missing genotype data, markers with segregation distortion (P < 0.01), and markers containing abnormal bases were filtered out before map construction.
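The marker filters on missing data and segregation distortion can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the genotype coding ("A" for the Suinong4 allele, "B" for the Williams 82 allele, "-" for a missing call), the function names, and the 1:1 expected ratio for homozygous RIL genotypes are assumptions made for the example.

```python
import math

def missing_rate(genotypes):
    """Fraction of missing calls ('-') at one marker."""
    return sum(g == "-" for g in genotypes) / len(genotypes)

def distortion_p_value(genotypes):
    """Chi-square test (1 d.f.) of observed A:B counts against an assumed 1:1 ratio."""
    a = genotypes.count("A")
    b = genotypes.count("B")
    n = a + b
    if n == 0:
        return 0.0  # no informative calls: treat as failing the test
    expected = n / 2
    chi2 = (a - expected) ** 2 / expected + (b - expected) ** 2 / expected
    # Upper tail of a chi-square distribution with 1 d.f.: erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2))

def keep_marker(genotypes, max_missing=0.30, min_p=0.01):
    """Keep a marker unless it has > 30% missing data or P < 0.01 for distortion."""
    return (missing_rate(genotypes) <= max_missing
            and distortion_p_value(genotypes) >= min_p)

# Hypothetical markers scored on 100 lines
balanced = ["A"] * 50 + ["B"] * 50
distorted = ["A"] * 80 + ["B"] * 20
sparse = ["A"] * 30 + ["B"] * 30 + ["-"] * 40
print(keep_marker(balanced), keep_marker(distorted), keep_marker(sparse))
```

The thresholds are the ones stated in the text; the filter on abnormal bases is not modeled here.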

Map construction

All SNPs were subjected to Chi-square tests to detect segregation distortion. During map construction, a Python script was used to merge markers sharing the same genotype into a single SNP marker. According to their physical positions, all markers were assigned to 20 linkage groups (chromosomes). In this study, 3285 high-quality markers were detected in the S4W population. The genetic distances between markers were calculated using QTL IciMapping software (Meng et al. 2015).
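The original script is not available, so the following is only a sketch of one plausible way to group markers with the same genotype: adjacent markers on a chromosome whose genotype calls are identical across all lines are collapsed into a single bin. The function name, the tuple layout, and the restriction to adjacent markers are assumptions.

```python
def bin_markers(markers):
    """Collapse runs of adjacent markers that share an identical genotype vector.

    markers: list of (chromosome, position, genotypes), sorted by chromosome
             and position; genotypes is a string with one call per line.
    Returns a list of (chromosome, start, end, genotypes), one entry per bin.
    """
    bins = []
    for chrom, pos, geno in markers:
        if bins and bins[-1][0] == chrom and bins[-1][3] == geno:
            # Same chromosome and same genotype pattern: extend the current bin
            last_chrom, start, _, _ = bins[-1]
            bins[-1] = (last_chrom, start, pos, geno)
        else:
            bins.append((chrom, pos, pos, geno))
    return bins

# Hypothetical markers scored on three lines
example = [
    ("Chr02", 100, "AAB"),
    ("Chr02", 200, "AAB"),
    ("Chr02", 300, "ABB"),
    ("Chr04", 50, "ABB"),
]
for b in bin_markers(example):
    print(b)
```

Bins never span two chromosomes, even when the genotype pattern happens to match.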

Classification of S4W population based on the genotypes of E2 and qR1-2

The interval of qR1-10, which had a high logarithm of odds (LOD) score and a high percentage of phenotypic variance explained (PVE), is consistent with the E2 locus, indicating that E2 has an extremely strong effect on flowering in the S4W population. To exclude the effect of E2 on other QTLs, the S4W population was divided into two sub-populations using an allele-specific DNA marker of E2. One group included 58 individuals with E2E2, and the other included 51 individuals with e2e2. QTL IciMapping software was used to detect QTLs in each sub-population.

To examine the genetic relationship between qR1-2 and E2, the S4W population was classified into four groups based on the homozygous allelic combinations of E2 and qR1-2. The SNP markers with the highest LOD scores for qR1-2 on the genetic map (2:15078759 and 2:12259445) and the functional marker of E2 were used for this classification.
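The four-group classification can be sketched as below. The record layout, the genotype labels, and the flowering times in the example are hypothetical and serve only to show the grouping step; lines that are heterozygous or missing at either marker are skipped.

```python
from collections import defaultdict
from statistics import mean

def classify_lines(lines):
    """Group RILs by homozygous genotype at E2 and at the qR1-2 peak marker.

    lines: iterable of dicts with keys
      'e2'    : 'E2E2' or 'e2e2' (any other value is skipped)
      'qr1_2' : 'SN4' or 'W82'   (any other value is skipped)
      'r1'    : days to flowering
    Returns {(e2 genotype, qR1-2 genotype): [days to flowering, ...]}.
    """
    groups = defaultdict(list)
    for line in lines:
        if line["e2"] in ("E2E2", "e2e2") and line["qr1_2"] in ("SN4", "W82"):
            groups[(line["e2"], line["qr1_2"])].append(line["r1"])
    return dict(groups)

# Hypothetical records, not data from this study
records = [
    {"e2": "E2E2", "qr1_2": "SN4", "r1": 50},
    {"e2": "E2E2", "qr1_2": "SN4", "r1": 48},
    {"e2": "E2E2", "qr1_2": "W82", "r1": 44},
    {"e2": "e2e2", "qr1_2": "SN4", "r1": 35},
    {"e2": "e2e2", "qr1_2": "W82", "r1": 35},
    {"e2": "E2e2", "qr1_2": "W82", "r1": 40},  # heterozygous at E2: skipped
]
for key, days in sorted(classify_lines(records).items()):
    print(key, len(days), mean(days))
```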

QTL analysis using high-density genetic map

The inclusive composite interval mapping (ICIM) method implemented in QTL IciMapping software was used to detect QTLs for flowering time, with default software parameters (probability in stepwise regression = 0.001, step = 1.0 cM). The criteria and the IciMapping protocol used in this study were described by Li et al. (2007). The LOD threshold for declaring significant QTLs was calculated using a permutation test (PT) with 1000 permutations at a significance level of P < 0.05.
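The logic of a permutation-based LOD threshold can be illustrated with the sketch below. It is not the ICIM procedure: a simple single-marker comparison of genotype means is swapped in as the scan statistic to keep the example short, and all function names and data are hypothetical. Phenotypes are shuffled relative to genotypes, the genome-wide maximum LOD is recorded for each shuffle, and the (1 − alpha) quantile of those maxima is taken as the threshold.

```python
import math
import random

def marker_lod(phenotypes, genotypes):
    """LOD for one marker: one mean per genotype class versus a single overall mean.

    Missing calls are not handled in this sketch.
    """
    n = len(phenotypes)
    grand = sum(phenotypes) / n
    rss0 = sum((y - grand) ** 2 for y in phenotypes)
    rss1 = 0.0
    for allele in set(genotypes):
        ys = [y for y, g in zip(phenotypes, genotypes) if g == allele]
        m = sum(ys) / len(ys)
        rss1 += sum((y - m) ** 2 for y in ys)
    if rss0 == 0:
        return 0.0       # no phenotypic variation at all
    if rss1 == 0:
        return math.inf  # genotype explains the phenotype perfectly
    return (n / 2) * math.log10(rss0 / rss1)

def permutation_threshold(phenotypes, marker_genotypes, n_perm=1000, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the maximum LOD under shuffled phenotypes."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_lods = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        max_lods.append(max(marker_lod(shuffled, g) for g in marker_genotypes))
    max_lods.sort()
    index = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return max_lods[index]

# Hypothetical data: 40 lines, two markers, the first one linked to the trait
linked = ["A"] * 20 + ["B"] * 20
unlinked = ["A", "B"] * 20
days = [30 + i % 3 for i in range(20)] + [40 + i % 3 for i in range(20)]
threshold = permutation_threshold(days, [linked, unlinked], n_perm=200)
print(marker_lod(days, linked) > threshold)
```

Because each permutation records the maximum over all markers, the resulting threshold controls the genome-wide rather than the per-marker error rate.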

RNA extraction and reverse transcription

Fresh leaves were sampled from plants at ZT8 at 20 days after emergence (20 DAE), frozen in liquid nitrogen, and stored in a −80 °C freezer. Total RNA was extracted from the samples using an Ultrapure RNA Kit (CWBIO). The integrity and quality of the RNA were tested by 0.8% agarose gel electrophoresis. A NanoDrop 2000 (Thermo Scientific, Wilmington, DE, United States) was used to measure the RNA concentration of each sample. Reverse transcription was performed from 500 ng of RNA using the Super Script First-strand cDNA Synthesis System (Takara, China).

Quantitative PCR

Quantitative reverse transcription polymerase chain reaction (qRT-PCR) was performed using a Roche LightCycler 480 system (Roche, Germany) and a SYBR Green II Master Mix Kit (Takara, Japan). cDNA was diluted tenfold prior to qRT-PCR. The PCR cycling conditions were 95 °C for 30 s followed by 40 cycles of 95 °C for 10 s and 60 °C for 60 s. The expression levels of E2 were calculated relative to the expression level of Tubulin (Glyma.05G157300). The primers used for real-time PCR were obtained from Li et al. (2013). Three biological replicates were used in the assays, and the results are presented as mean ± s.e.m. (n = 8 plants).
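The exact formula used for normalization is not stated. A common choice, shown here only as an assumed sketch, is the 2^(−ΔCt) method, which expresses the target gene relative to the reference gene under the assumption of 100% amplification efficiency. The function names and Ct values are hypothetical.

```python
import math
from statistics import mean, stdev

def relative_expression(ct_target, ct_reference):
    """2^-(Ct_target - Ct_reference), assuming 100% PCR efficiency for both genes."""
    return 2.0 ** (ct_reference - ct_target)

def mean_sem(values):
    """Mean and standard error of the mean (sample s.d. divided by sqrt(n))."""
    return mean(values), stdev(values) / math.sqrt(len(values))

# Hypothetical (E2 Ct, Tubulin Ct) pairs for three plants
ct_pairs = [(24.0, 22.0), (24.5, 22.1), (23.8, 22.0)]
levels = [relative_expression(t, r) for t, r in ct_pairs]
avg, sem = mean_sem(levels)
print(round(avg, 3), round(sem, 3))
```

A target that crosses the threshold two cycles later than the reference is thus scored at one quarter of the reference level.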

Results

Phenotypic analysis

To detect new QTLs related to flowering time, we constructed a RIL population of 116 individuals from a cross between Suinong4 and Williams 82. Suinong4 is one of the major high-yielding soybean varieties planted in Heilongjiang Province, China, and belongs to maturity group 0 (MG 0). Williams 82 is a soybean variety from North America that belongs to MG III. Suinong4 flowered approximately 10 days earlier than Williams 82 in 2018 and 2019 (Table 1 and Supplementary Fig. 1). In Harbin, the days to flowering ranged from 27 to 53 in the F6:7 population in 2018 and from 31 to 57 in the F7:8 population in 2019 (Table 1). Transgressive segregation of flowering time was observed in the S4W population, indicating that at least two loci associated with flowering time may be segregating.

Table 1 Statistical analysis of the parents and S4W population

Analysis of sequencing data and construction of a genetic linkage map

In this study, totals of 15,797,212,800 and 13,223,878,500 bases were obtained for the two parents, with a Q30 ratio and GC content of 35.87% and 35.84%, respectively (Supplementary Table 1). The average sequencing depth was 13.18× for Suinong4 and 11.20× for Williams 82. A total of 3285 SNP markers falling within 20 linkage groups (LGs) were used to construct a high-density genetic map. The genetic length of the map was 2720.53 cM, with an average marker interval of 0.828 cM (Supplementary Fig. 2).

QTL mapping for flowering time in S4W population

QTLs were identified from the flowering time data and the genetic map using the ICIM method. Loci located in similar regions in both years were considered consistent QTLs. A LOD value of 2.5 was used as the threshold for declaring QTLs contributing to flowering time.

We initially detected seven QTLs related to flowering time in the S4W population (Table 2 and Fig. 1). Among these QTLs, qR1-2, qR1-4, and qR1-10 were stably detected in 2018 and 2019. qR1-10, which had the highest LOD scores, was located in the interval 39,966,171–45,852,401 on chromosome 10, consistent with the position of E2, and accounted for approximately 23–43% of the observed phenotypic variation (PV). qR1-2 was detected on chromosome 2 at the physical position 12,296,249–15,179,120 and explained 4.31–5.68% of the observed PV. Another QTL, qR1-4, had a LOD score of 3.94–4.51, explained 4.77–7.39% of the observed PV, and was located on chromosome 4. qR1-2 and qR1-4 showed positive additive effects, suggesting that the alleles delaying flowering at these two QTLs were derived from Suinong4. qR1-19 was another QTL with a high LOD score. Located in the interval 47,664,574–47,774,155 on chromosome 19, qR1-19 covered the E3 locus, which encodes a homolog of the Arabidopsis photoreceptor phyA (Watanabe et al. 2009). qR1-19 explained 9.33% of the observed PV in 2019. Based on the analysis of additive effects, the allele extending flowering time at qR1-19 was derived from Williams 82. In addition to these QTLs, qR1-6, qR1-11, and qR1-20, located on chromosomes 6, 11, and 20, respectively, were each detected in a single year. They explained 5.15%, 4.91%, and 5.48% of the observed PV, respectively.

Table 2 Details of the QTLs detected by IciMapping 4.0 in S4W population
Fig. 1 Quantitative trait locus (QTL) mapping in S4W population by IciMapping 4.0. a QTLs of flowering time in 2018. b QTLs of flowering time in 2019. The dashed lines indicate the LOD score of 2.5

QTL mapping in two sub-populations classified by the genotype E2

As mentioned above, we detected a major QTL called qR1-10. The position of qR1-10 coincides with E2, a major soybean maturity locus (Watanabe et al. 2011). The additive effect of qR1-10 is negative, suggesting that the allele delaying flowering is derived from Williams 82. This corresponds with the genotypes of E2 in the parents: the genotype of Suinong4 is e2e2, while that of Williams 82 is E2E2. Based on these results, E2 is probably the candidate gene of qR1-10. Previous studies showed that the maturity locus E2 is a GIGANTEA ortholog with a significant effect on flowering time in soybean (Watanabe et al. 2011). The high LOD scores and PVE of qR1-10 indicate that it has an extremely strong impact on flowering time, which might mask the effects of some other QTLs in the S4W population. Thus, we divided the S4W population into two sub-populations: 58 individuals with E2E2, called E2-S4W, and 51 individuals with e2e2, called e2-S4W. Interestingly, qR1-2, the QTL described above, was detected in E2-S4W but not in e2-S4W in both 2018 and 2019 (Table 3, Supplementary Table 2, Supplementary Fig. 3, and Supplementary Fig. 4). The interval of qR1-2 was 15,078,759–15,263,588 on chromosome 2. The LOD scores and PVE of qR1-2 were higher in E2-S4W than in S4W in both 2018 and 2019. Another QTL, qR1-6.1, was first detected in E2-S4W in 2018 and 2019 (Table 3 and Supplementary Fig. 3). Like qR1-2, qR1-6.1 was not detected in e2-S4W (Supplementary Table 2 and Supplementary Fig. 4). With a physical interval of 8,542,345–13,420,505, qR1-6.1 accounted for 15.97–17.38% of the observed PV. Considering its low LOD scores, the genetic effect of qR1-6.1 was not explored further. These results suggest that the effects of qR1-2 and qR1-6.1 might rely on the presence of E2. Several other QTLs were detected in only 1 year: qR1-7 and qR1-18 were detected in E2-S4W (Table 3), and qR1-13 and qR1-19.1 were detected in e2-S4W (Supplementary Table 2). These QTLs accounted for 15.66%, 9.30%, 36.66%, and 23.30% of the observed PV, respectively.

Table 3 Details of the QTLs detected by IciMapping 4.0 in E2-S4W

Genetic effects between E2 and qR1-2

As an ortholog of Arabidopsis GI (GIGANTEA), E2 plays major roles in adaptation and delays flowering under long-day conditions in soybean (Watanabe et al. 2011; Xu et al. 2015; Wang et al. 2016). The expression of E2 is suppressed in the e3e4 background (Cao et al. 2015). QTL mapping indicated that qR1-2 regulates flowering time in an E2-dependent manner. To further explore the genetic relationship between qR1-2 and E2, we classified the S4W population into four groups based on the homozygous allelic combinations at E2 and qR1-2, using the functional marker of E2 and the SNP markers with the highest LOD scores for qR1-2 (2:15,078,759 in 2018 and 2:12,259,445 in 2019). Individuals with the same qR1-2 genotype as Suinong4 (SN4) were labeled qR1-2-SN4, while individuals with the same qR1-2 genotype as Williams 82 (W82) were labeled qR1-2-W82. We then examined flowering time in the four groups. In the E2E2 background, qR1-2-SN4 lines flowered significantly later than qR1-2-W82 lines in both years (Fig. 2a and b). However, qR1-2-SN4 lines and qR1-2-W82 lines showed no difference in flowering time in the e2e2 background (Fig. 2a and b). These results indicate that qR1-2 depends on E2 for its activity in regulating flowering and that E2 has an epistatic effect on qR1-2. An epistatic effect usually implies that the two genes act in an upstream and downstream relationship. Therefore, the expression levels of E2 in the four groups were also measured. Eight individuals were randomly selected from each of the four groups, and samples were taken at ZT8 at 20 DAE and used for qRT-PCR (Li et al. 2013). As expected, E2 had higher expression levels in qR1-2-SN4 than in qR1-2-W82 in both the E2 and e2 backgrounds (Fig. 2c and d). These results suggest that the causal gene in the qR1-2-SN4 segment promotes the transcription of E2 and that the causal gene of qR1-2 acts upstream of E2.

Fig. 2 Genetic interaction evaluation and qRT-PCR analysis of E2 and qR1-2. a Flowering time of different combinations of qR1-2 and E2 in 2018. b Flowering time of different combinations of qR1-2 and E2 in 2019. c Expression levels of E2 in E2-qR1-2-SN4 and E2-qR1-2-W82. d Expression levels of E2 in e2-qR1-2-SN4 and e2-qR1-2-W82. All data are given as mean ± s.e.m. (n = 8 plants). Plants were grown under LD (16L/8D) and sampled at ZT8, 20 DAE. qR1-2-SN4, individuals with the same genotype of qR1-2 as SN4; qR1-2-W82, individuals with the same genotype of qR1-2 as W82; DAE, days after emergence. A Student’s t-test was used to generate the P values

Candidate gene prediction

QTLs with similar positions and the same direction of additive effect in different years were considered the same QTL. To analyze the candidate genes of each QTL, we combined the intervals from the different detections and took the longest resulting span as the candidate interval. Since we detected four stable QTLs (qR1-2, qR1-4, qR1-6.1, and qR1-10), as well as qR1-19 with a high LOD score and PVE, we focused on predicting the candidate genes of these QTLs.
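Combining intervals in this way amounts to taking the smallest start and the largest end among the detections of one QTL, as in the sketch below. The function name is hypothetical, and the example uses the two qR1-2 intervals reported above purely for illustration; which detections were actually pooled for each QTL is not stated here.

```python
def candidate_interval(intervals):
    """Span covering every detection of one QTL on a chromosome.

    intervals: list of (start, end) physical positions in bp.
    Returns (smallest start, largest end).
    """
    starts, ends = zip(*intervals)
    return min(starts), max(ends)

# qR1-2 intervals reported for S4W and for E2-S4W (chromosome 2)
qr1_2 = [(12_296_249, 15_179_120), (15_078_759, 15_263_588)]
start, end = candidate_interval(qr1_2)
print(start, end, end - start)
```

Taking the union is the conservative choice: it avoids excluding a causal gene that lies in only one of the detected intervals, at the cost of a longer list of genes to examine.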

As mentioned above, the interval of qR1-10 includes the region containing E2 (Watanabe et al. 2011). Since the parents carry different alleles of E2, we presumed that E2 is the candidate gene of qR1-10. The interval of qR1-19 contains E3, a maturity locus in soybean (Watanabe et al. 2009). As the parents also possess different alleles of E3, E3 is likely the candidate gene of qR1-19.

The interval of qR1-2 contained 57 predicted genes with nonsynonymous/frameshift deletion/nonframeshift deletion/stopgain/frameshift insertion/nonframeshift insertion variants based on the resequencing data (Supplementary Table 3). Among these genes, we noticed that Glyma.02G139200 is an ortholog of Arabidopsis PHYTOCHROME AND FLOWERING TIME 1 (PFT1). PFT1 has been reported to act as a co-activator of TCPs and FBHs to activate the transcription of CONSTANS, thereby regulating flowering (Liu et al. 2017). The short tandem repeat (STR) of PFT1 is essential for its role in regulating flowering (Rival et al. 2014). We used PCR to confirm the sequence differences in Glyma.02G139200 between Suinong4 and Williams 82. An indel of an 84-bp STR was found between the two parents (Supplementary Fig. 5a). Furthermore, using InterPro (https://www.ebi.ac.uk/interpro/search/sequence/), we predicted the protein domain of Glyma.02G139200 (Supplementary Fig. 6b). The 84-bp STR was located in the predicted domain of Glyma.02G139200. Subsequently, we examined the variations of Glyma.02G139200 associated with flowering time in the 424-accession soybean panel at five field sites located at different latitudes in China (Lu et al. 2020). Glyma.02G139200 has fifteen haplotypes in the 424-accession panel (Supplementary Fig. 6a). Unfortunately, the allele of Glyma.02G139200 from SN4 (H11) is rare and was therefore not included in the statistical analysis. We instead selected two haplotypes (H14 and H15) that each cover a large group of individuals and analyzed their association with flowering time. In all locations, accessions harboring H14 flowered significantly later than those harboring H15, suggesting that Glyma.02G139200 might influence flowering in soybean (Supplementary Fig. 7). We further examined the geographic distributions of H14 and H15 of Glyma.02G139200 in subsets of Chinese landraces and cultivars (Supplementary Fig. 6c). The proportion of H14 was very low in Huanghuai and the northern region of China, whereas a higher proportion of H14 was detected in the southern region of China. These results imply that the later-flowering allele of Glyma.02G139200 is already being used in the southern region of China. Accordingly, Glyma.02G139200 might be the candidate gene of qR1-2.

The region of qR1-4 harbored 76 predicted genes with nonsynonymous/frameshift deletion/nonframeshift deletion/stopgain/frameshift insertion/stoploss variants according to the resequencing data (Supplementary Table 4). Among these genes, Glyma.04G042700 is an ortholog of Arabidopsis MRG. In Arabidopsis, MRG2 physically interacts with CONSTANS to promote flowering by activating the expression of FT (Bu et al. 2014). According to the resequencing data and PCR sequencing results, Suinong4 and Williams 82 differ by a nonsynonymous mutation in codon 346, which is not located in the domain of Glyma.04G042700 predicted by InterPro (Supplementary Table 4, Supplementary Fig. 5b, and Supplementary Fig. 8b). In the 424-accession soybean panel, four haplotypes of Glyma.04G042700 were detected. The allele of Glyma.04G042700 from Suinong4 belongs to haplotype 3 (H3), while the allele from Williams 82 belongs to haplotype 4 (H4) (Supplementary Fig. 8a). In all locations, accessions harboring H3 flowered significantly later than accessions harboring H4 (Supplementary Fig. 9). These results indicate that the mutation in codon 346 of Glyma.04G042700 could alter flowering time in soybean. Consequently, we presume that the candidate gene of qR1-4 might be Glyma.04G042700. The geographic distributions of H3 and H4 of Glyma.04G042700 were also examined in the Chinese accessions (Supplementary Fig. 8c). A relatively high proportion of H4 was detected in the northern region of China, whereas this haplotype was nearly absent from the southern region of China. Considering that accessions with H4 flowered earlier than those with H3 and that the proportion of H4 was higher in the north of China, we presume that H4 has already been used during the breeding process in northern China.

qR1-6.1 was first detected in E2-S4W. The interval of qR1-6.1 contained 50 predicted genes with nonsynonymous/frameshift deletion/stopgain/nonframeshift insertion variants according to the resequencing data (Supplementary Table 5). Twenty-four of these genes were functionally annotated and categorized into three classes (biological processes, cellular components, and molecular function) using WEGO2.0 (http://wego.genomics.org.cn/) (Supplementary Table 6 and Supplementary Fig. 10). No genes obviously associated with the regulation of flowering were found; thus, all of the genes mentioned above are considered candidate genes of qR1-6.1.

Discussion

In summary, we developed a RIL population from a cross between Suinong4 and Williams 82. By combining a high-density map (constructed by the GBS approach) with phenotypic data for flowering time, twelve QTLs were identified in total. Four of these QTLs (qR1-2, qR1-4, qR1-6.1, and qR1-10) are stable, having been detected in both 2018 and 2019. The others (qR1-6, qR1-7, qR1-11, qR1-13, qR1-18, qR1-19, qR1-19.1, and qR1-20) were detected in only 1 year. qR1-2 and qR1-10 were reported previously (Bernard 1971; Pooprompan et al. 2006; Khan et al. 2008). In contrast, no QTL related to flowering time has been reported at the position of qR1-4 or qR1-6.1 according to SoyBase (https://www.soybase.org/search/index.php?searchterm=First+flower&search=true).

E1 to E4 play important roles in soybean at high latitudes (Liu et al. 2008; Watanabe et al. 2009; Xu et al. 2013; Lin et al. 2021). Under long-day conditions, E3 and E4 promote the expression of E1 and E2 to control flowering time (Kong et al. 2010; Watanabe et al. 2011). The effect of E3 is enhanced under a high R:FR ratio under long-day conditions (Watanabe et al. 2009). In the current study, E3 was identified as the candidate gene of qR1-19. However, qR1-19 was detected only in 2019, perhaps because both parents carry e1as, which might weaken the effect of E3 (Upadhyay et al. 1994; Watanabe et al. 2004; Yamanaka et al. 2001). In addition, weather conditions in 2018 and 2019 were markedly different (Supplementary Table 7). There were more cloudy and rainy days in 2018 than in 2019, leading to lower R:FR ratios, which suggests that E3 presumably had less of an effect in 2018 (Supplementary Table 7) (Mao et al. 2017). Furthermore, the average high temperature and the difference between the high and low temperatures were lower in 2019 than in 2018 (Supplementary Table 7). This difference in temperature conditions might also have influenced the effect of E3. Therefore, the phenotypes are influenced not only by genotype but also by the environment.

Based on the results described above, E2 (qR1-10) had a crucial effect on flowering time in the S4W population. E2 participates in the regulation of the circadian clock and flowering, and its effect on flowering time is relatively stable under different light conditions (Watanabe et al. 2011; Lee et al. 2016). The combination of different E2 alleles with other E loci partially determines the maturity groups of soybean cultivars (Li and Lam 2020). Furthermore, selection of E2 haplotypes has contributed to the changes in flowering time during the adaptation of soybean to different latitudes (Watanabe et al. 2011; Wang et al. 2016). Nonetheless, the actual molecular mechanism by which E2 regulates flowering has not been fully clarified. Another stable QTL, qR1-2, completely lost its effect on flowering time in the e2e2 background, suggesting that qR1-2 is dependent on E2 function (Fig. 2). In addition, Glyma.02G139200, an ortholog of Arabidopsis PFT1, might be the candidate gene of qR1-2. PFT1 regulates the signal intensity downstream of phyA and phyB in Arabidopsis (Cerdan and Chory 2003). GI acts as a positive regulator of the very-low-fluence response (VLFR), which is mediated by phyA (Huq et al. 2000; Kim et al. 2007; Oliverio et al. 2007; Sawa et al. 2007; Song et al. 2014). Thus, GI and PFT1 might function in the same pathway to regulate flowering. Similarly, we presume that E2 and qR1-2 might participate in the same pathway to regulate flowering in soybean. The genetic interaction analysis of qR1-2 and E2 and the qRT-PCR results suggest that E2 has an epistatic effect on qR1-2 and that the causal gene of qR1-2 acts upstream of E2 (Fig. 2). Molecular cloning of qR1-2 will further facilitate the understanding of the regulatory relationship between qR1-2 and E2 and the mechanisms by which they control flowering. Meanwhile, qR1-2 could be utilized in molecular breeding to improve soybean varieties that possess E2 by fine-tuning flowering time.

A panel of 424 soybean accessions was used to analyze the associations between candidate genes and flowering time in this study. We attempted to analyze the genetic relationship between E2 and Glyma.02G139200 in the 424-accession panel. However, after classification, H14 contained so few individuals in the E2 or e2 background that statistical analysis could not be carried out. In addition, the different haplotypes of Glyma.02G139200 and Glyma.04G042700 were associated with changes in flowering time at five sites, suggesting that the two genes may play important roles in controlling flowering time in regions ranging from high latitudes (Harbin: 45.75°N, 126.63°E) to low latitudes (Guangzhou: 23°16′N, 113°23′E). For Glyma.02G139200, H14, which leads to delayed flowering compared to H15, was present at relatively low frequency in wild soybean (n = 3) and might therefore be a mutant allele of Glyma.02G139200 (Supplementary Fig. 6). Considering that soybean varieties at low latitudes have a relatively high frequency of the E2 allele, mutants of Glyma.02G139200 might also have potential in breeding varieties with the long-juvenile (LJ) trait, which extends the vegetative phase and improves yield under short-day conditions. This is consistent with the geographic distribution of Glyma.02G139200 haplotypes in our study (Supplementary Fig. 6c) (Wang et al. 2016). For Glyma.04G042700, analysis of the geographic distributions of H3 and H4 implied that H4, which leads to early flowering, has been used for breeding in the north of China (Supplementary Fig. 8c).

Identifying QTLs related to flowering time is an important initial step in investigating the molecular mechanisms underlying flowering time and latitudinal adaptation. In the present study, we detected several QTLs associated with flowering time, including qR1-2, whose role in regulating flowering is dependent on E2. These findings contribute to understanding the flowering regulatory network in soybean and provide genetic resources for molecular breeding.