Introduction

Soybean is the major crop cultivated in Brazil, which is the second largest producer worldwide. In the 2018/2019 harvest, Brazil reached a production of 115 million tons of grain, corresponding to about 47.6% of the national grain production (CONAB 2019).

The principal compounds in soybean grain are protein (about 40%) and oil (about 20%). Soybean oil is the second most produced vegetable oil in the world, corresponding to about 28% of vegetable oil consumption (USDA 2019). In Brazil, soybean has historically been used for food purposes, but in recent years the demand for soybean oil for biodiesel production has increased. This was mainly due to government efforts to reduce the dependence on fossil fuels, which increased biodiesel production by about 228% in the last 10 years (Brazil 2016; ANP 2019).

Soybean oil is composed, on average, of 13% palmitic acid (16:0), 4% stearic acid (18:0), 20% oleic acid (18:1), 55% linoleic acid (18:2) and 8% linolenic acid (18:3) (Goettel et al. 2014; Silva et al. 2018a; Zhang et al. 2018). The content of these fatty acids determines the quality of soybean oil, influencing its physicochemical, functional and nutritional properties (Silva et al. 2018a; Kim et al. 2019). Saturated fatty acids, such as palmitic and stearic acids, are more stable to oxidative degradation, but their consumption is associated with increased low-density lipoprotein (LDL) cholesterol, coronary disease and some types of cancer (Wang et al. 2012a, b; Kim et al. 2019). However, recent results indicate that stearic acid does not have this cholesterolemic effect in humans (Hunter et al. 2009; Heim and Gillman 2017). Oils with high levels of monounsaturated fatty acids such as oleic acid are oxidatively stable and useful for food purposes and for industrial applications such as the production of biodiesel, paints, lubricants and cosmetics (Maheshwari and Kovalchuk 2016; Woyann et al. 2019). Linoleic and linolenic acids are polyunsaturated fatty acids (PUFA) that are considered essential for humans because the body cannot synthesize them (Patterson et al. 2012). However, a high concentration of PUFA in soybean oil is the main cause of low oxidative stability and rancidification, altering the flavor and reducing the storage time of food products (Warner and Fehr 2008).

Fatty acid desaturases (FAD) are the main enzymes that determine the content of oleic, linoleic and linolenic acids in soybean seed. The omega-6-desaturases GmFAD2-1A (Glyma.10g278000) and GmFAD2-1B (Glyma.20g111000) convert oleic acid into linoleic acid, and the omega-3-desaturases GmFAD3A (Glyma.14g194300), GmFAD3B (Glyma.02g227200) and GmFAD3C (Glyma.18g062000) convert linoleic acid into linolenic acid (Baud and Lepiniec 2010). Other enzymes associated with fatty acid content in soybean seed are delta-9-stearoyl-acyl carrier protein desaturases (GmSACPD-A, Glyma.07G207200; GmSACPD-B, Glyma.02G138100; GmSACPD-C, Glyma.14G121400), which convert stearic acid into oleic acid (Gillman et al. 2014; Kim et al. 2019), and fatty acyl-ACP thioesterases (GmFATB1a, Glyma.05g012300; GmFATB1b, Glyma.17g120400; GmFATB2a, Glyma.04g151600; GmFATB2b, Glyma.06g211300), which hydrolyze acyl-ACPs and produce free fatty acids (Weselake et al. 2009; Vogel et al. 2019).

Increasing oil content and improving its composition are two of the main goals of soybean breeding programs (Warner and Fehr 2008; Wang et al. 2012a, b; Pham et al. 2014; Silva et al. 2018a). In this context, molecular markers have been used successfully to increase the content of oleic acid and reduce the content of PUFA to produce oil with increased oxidative stability (Pham et al. 2010, 2012). This is a healthier alternative to hydrogenation, a process that converts PUFA into oleic acid but generates trans isomers of fatty acids, molecules that are associated with some heart diseases, high cholesterol levels and the development of type 2 diabetes (Hu et al. 1997; Pham et al. 2014).

The oil content in soybean seed and its composition are considered quantitative traits (Burton et al. 1983; Bachlava et al. 2009), and are thus controlled by quantitative trait loci (QTLs). Therefore, QTL mapping is widely used to identify markers associated with the content and composition of soybean oil (Wang et al. 2012a, b; Li et al. 2015; Heim and Gillman 2017; Zhang et al. 2018; Priolli et al. 2019). Although SSR and SCAR markers continue to be used for QTL mapping in soybean (Shu et al. 2013; Leite et al. 2016; Yu et al. 2018), most recent QTL mapping studies in soybean have used SNP markers (Heim and Gillman 2017; Akond et al. 2018; Zhang et al. 2018; Priolli et al. 2019; Seo et al. 2019). The results of QTL mapping can be used to identify genes controlling important traits, to identify molecular markers associated with these traits and to estimate their effects on the traits. Heim and Gillman (2017) evaluated a RIL population in three environments and found that a large genomic deletion in the A6 soybean line is associated with a dramatic increase in stearic acid levels (about 24–28%), and suggested that GmSACPD-C and at least one other gene in this region are responsible for the variation of the trait. Zhang et al. (2018) used a genome-wide scan to find many candidate genes for the variation of various soybean seed composition traits. Priolli et al. (2019) found 19 SNPs associated with palmitic, oleic and oil contents, and suggested various candidate genes for controlling oil and palmitic acid deposition in soybean seed.

Previously, we developed the soybean line CS303TNKCA, which has low linolenic acid content (3.5–4%) and lacks lipoxygenases and the Kunitz protease inhibitor, and characterized a mutation in its GmFAD3A gene that is responsible for its reduced linolenic acid content. Here, we used 534 SNP markers to detect QTLs associated with oil and fatty acid content in a segregating population derived from CS303TNKCA and evaluated in four environments. We found 20 QTLs associated with these traits, including five QTLs that were not associated with previously identified QTLs or genes. The results of the present work can help to understand the genetic basis of these traits and help breeders to modify oil content and composition in soybean seed, especially in tropical conditions.

Materials and methods

Plant material

The population used in this study was generated by crossing two contrasting lines, CS303TNKCA and FA22. The CS303TNKCA variety was developed by the Programa de Melhoramento da Qualidade da Soja of Universidade Federal de Viçosa (PMQS/BIOAGRO/UFV). It has intermediate oil levels in the seed (20–23%) and low linolenic acid content (about 3.5–4%), and lacks lipoxygenases (lox1, lox2 and lox3) and the Kunitz protease inhibitor. CS303TNKCA was developed by backcrossing BARC-12 (Leffel 1994) into the recurrent variety Monarca (COOPADAP, Brazil). The FA22 line has intermediate oleic acid content (about 50%) and about 19% oil content, and was developed at Iowa State University by chemical mutagenesis (Alt et al. 2005).

To generate the segregating population, the CS303TNKCA and FA22 soybean lines were crossed in a greenhouse at Universidade Federal de Viçosa (Viçosa, Minas Gerais (MG), Brazil; 20º45′14″S, 42º52′55″W). Two hundred and eleven F2:3 families were cultivated in the experimental field at Visconde do Rio Branco-MG, Brazil (VRB; 21º00′37″S, 42º50′26″W), from February to June 2010. The experiment was performed in a completely randomized design with two replications and an experimental plot of 25 plants per 1.5-m row, with a row spacing of 0.5 m. F2:4 families were evaluated in three experimental fields, one in Viçosa-MG, Brazil (VIC), from December 2010 to April 2011, and two in São Gotardo-MG, Brazil (SG1 and SG2; 19º18′39″S, 46º02′56″W), from November 2010 to March 2011. These experiments were performed in a randomized complete block design with four replications and an experimental plot of 25 plants per 1.5-m row, with a row spacing of 0.5 m.

Phenotypic analysis

The oil content of CS303TNKCA, FA22 and each F2:3 and F2:4 family was determined by nuclear magnetic resonance spectrometry (NM, Resonance Instruments, Witney, Oxfordshire, UK), on a dry basis and expressed in g kg−1. To evaluate the fatty acid composition, lipids from soybean seeds were extracted using the following method: 15 mg of powdered soybean seed were mixed with 1 mL of hexane using N2 and stored at 4 °C for 16 h. The hexane solution was collected into another tube and evaporated using N2. Then, 0.4 mL of 1 M sodium methoxide was added to the tube, which was shaken and incubated at 30 °C for 1 h. Next, 1 mL of milliQ water and 1 mL of hexane were added in this order, and the tube was shaken and incubated at room temperature for 1 h. Finally, 0.75 mL of the organic phase was collected, added to anhydrous sodium sulfate to remove moisture and transferred to a vial. The fatty acid composition was determined by gas chromatography in a GC-2010 Plus chromatograph (Shimadzu), as reported by Burkey et al. (2007). The content of each fatty acid was expressed as a proportion of total fatty acids in g kg−1.

Genotypic analysis

Genomic DNA from CS303TNKCA, FA22, and each F2 plant was extracted by using the Wizard Genomic DNA Purification Kit (Promega, Madison, WI). The concentration of extracted DNA was determined by using a NanoDrop spectrophotometer (NanoDrop Technologies, Wilmington, DE), and the quality was checked by the 260/280 and 260/230 ratios, as well as by 0.8% agarose gel electrophoresis. Genomic DNA was diluted to 100 ng/µL.

Genotyping of the F2 population was performed with the Illumina GoldenGate assay at the USDA/ARS Soybean Genomics and Improvement Laboratory (Beltsville, MD), using 1536 SNPs from the Universal Soy Linkage Panel (USLP 1.0), as described by Hyten et al. (2010). The analysis of alleles at each locus was performed using GenomeStudio software (Illumina Inc., San Diego, CA).

Segregation test and map construction

The goodness-of-fit of markers to the expected segregation ratio of 1:2:1 in the F2 population was checked by the chi-square test at 5% probability. The linkage map was constructed using a minimum log-of-odds (LOD) threshold of 3.0 and a maximum recombination frequency of 30%. Recombination frequencies were converted into genetic distances in centiMorgans (cM) by using the Kosambi function (Kosambi 1944). When more than one linkage group per chromosome was generated, the linkage groups were analyzed together. The segregation test and map construction were performed in Genes software (Cruz 2013). The average distance between markers was calculated for each soybean linkage group by using only non-fully linked markers.
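The two calculations described above can be illustrated with a minimal sketch. This is not the Genes implementation; the function names, the example genotype counts and the hard-coded critical value (5.991, the chi-square value for 2 degrees of freedom at 5% probability) are assumptions made only for illustration.

```python
import math

# Critical chi-square value for 2 degrees of freedom at alpha = 0.05.
CHI2_CRIT_DF2_5PCT = 5.991


def chi_square_f2(n_aa, n_ab, n_bb):
    """Chi-square statistic for the fit of F2 genotype counts to 1:2:1."""
    total = n_aa + n_ab + n_bb
    expected = (total / 4, total / 2, total / 4)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def kosambi_cm(r):
    """Kosambi map distance (cM) for a recombination frequency 0 <= r < 0.5."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination frequency must be in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


# Hypothetical counts for one SNP genotyped in 211 F2 plants.
stat = chi_square_f2(80, 100, 31)
distorted = stat > CHI2_CRIT_DF2_5PCT  # marker would be flagged as distorted
max_distance = kosambi_cm(0.30)        # distance at the 30% recombination limit
```

Under these assumptions, a marker is flagged when its statistic exceeds the critical value, and the 30% recombination limit corresponds to roughly 34.7 cM on the Kosambi scale.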

QTL detection

Descriptive statistics were calculated for each environment and the average of all experiments (joint analysis). Frequency distribution graphs were generated for the average content of oil and fatty acids, and the normality of the data was checked by the Lilliefors Test (Lilliefors 1967). Correlations between traits in each environment and for the same trait in all environments were calculated by the Pearson coefficient.

Analysis of deviance was performed by the restricted maximum likelihood (REML) method to evaluate the significance of the genotype, environment and genotype by environment interaction effects, using the software Selegen (Resende 2016) and R (R Core Team 2017). Heritabilities based on means were calculated using the formulas:

$$ \begin{aligned} h_{g}^{2} & = \sigma_{g}^{2} / \left( \sigma_{g}^{2} + \sigma_{e}^{2} / r \right),\quad \text{for individual analysis;} \\ h_{g}^{2} & = \sigma_{g}^{2} / \left( \sigma_{g}^{2} + \sigma_{e}^{2} / r + \sigma_{ge}^{2} / re \right),\quad \text{for joint analysis,} \end{aligned} $$

where \( h_{g}^{2} \) is the heritability coefficient, \( \sigma_{g}^{2} \) is the genotypic variance, \( \sigma_{e}^{2} \) is the residual variance, \( \sigma_{ge}^{2} \) is the genotype by environment interaction variance, e is the number of environments and r is the number of replications. Because the number of replications varied within and among environments, we used the harmonic mean of the number of replications to calculate the heritabilities.
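As a hedged illustration of how these formulas can be evaluated, the sketch below computes mean-based heritabilities from variance components. The function names and the example variance components are hypothetical, and the term written as "re" in the joint formula is assumed to be the product of r and e.

```python
def harmonic_mean(values):
    """Harmonic mean, used here for unequal numbers of replications."""
    return len(values) / sum(1.0 / v for v in values)


def h2_individual(var_g, var_e, r):
    """Mean-based heritability for a single environment."""
    return var_g / (var_g + var_e / r)


def h2_joint(var_g, var_e, var_ge, r, e):
    """Mean-based heritability for the joint analysis.

    Follows the formula as written in the text, assuming 're' = r * e.
    """
    return var_g / (var_g + var_e / r + var_ge / (r * e))


# Hypothetical variance components and replication numbers.
r_h = harmonic_mean([2, 4, 4, 4])             # replications in four environments
h2_env = h2_individual(4.0, 4.0, 2)           # 4 / (4 + 2)
h2_all = h2_joint(4.0, 4.0, 2.0, r=2, e=4)    # 4 / (4 + 2 + 0.25)
```

The harmonic mean gives more weight to the environment with fewer replications than an arithmetic mean would, which keeps the residual term from being understated.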

QTL detection was performed in the Genes software (Cruz 2013) for each environment and for the average values of the four environments, using only markers that segregated as expected for the F2 population. Simple interval mapping (Lander and Botstein 1989) was performed using a linear regression model, at distance intervals of 0.1 cM. QTL positions were determined using a LOD threshold of 3.0. The coefficient of determination (R2) was used to evaluate the proportion of phenotypic variation that is explained by each marker. Dominant and additive effects were estimated. Only QTLs that were identified in the joint analysis (LOD > 3.0) were considered in this study. QTLs were named using an abbreviation of the trait followed by the number of the chromosome and a sequential number when more than one QTL was identified on the same chromosome for the same trait.
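The regression quantities mentioned above (LOD, R2, additive and dominance effects) can be sketched for the simplest case, a test at a marker position with fully known genotypes. This is a simplification and not the Genes implementation: interval mapping also evaluates positions between markers using genotype probabilities conditional on the flanking markers. The function name, the genotype labels and the toy data are assumptions. With three genotype classes, the additive plus dominance regression model is saturated, so its fitted values are the genotype class means.

```python
import math
from statistics import mean


def marker_regression(genotypes, phenotypes):
    """Regress phenotype on additive and dominance effects at one marker.

    genotypes: list of 'AA', 'AB' or 'BB' labels (hypothetical coding).
    Returns additive effect, dominance effect, R2 and LOD, where
    LOD = (n / 2) * log10(RSS_null / RSS_model).
    """
    groups = {"AA": [], "AB": [], "BB": []}
    for g, y in zip(genotypes, phenotypes):
        groups[g].append(y)
    if any(len(v) == 0 for v in groups.values()):
        raise ValueError("all three genotype classes must be present")

    class_mean = {g: mean(v) for g, v in groups.items()}
    grand_mean = mean(phenotypes)
    n = len(phenotypes)

    # Residual sums of squares without and with the marker in the model.
    rss_null = sum((y - grand_mean) ** 2 for y in phenotypes)
    rss_model = sum((y - class_mean[g]) ** 2
                    for g, y in zip(genotypes, phenotypes))

    additive = (class_mean["BB"] - class_mean["AA"]) / 2
    dominance = class_mean["AB"] - (class_mean["AA"] + class_mean["BB"]) / 2
    r2 = 1 - rss_model / rss_null
    lod = math.inf if rss_model == 0 else (n / 2) * math.log10(rss_null / rss_model)
    return {"additive": additive, "dominance": dominance, "r2": r2, "lod": lod}


# Toy data: six plants, purely additive difference between genotype classes.
result = marker_regression(
    ["AA", "AA", "AB", "AB", "BB", "BB"],
    [9, 11, 14, 16, 19, 21],
)
significant = result["lod"] > 3.0  # LOD threshold used in this study
```

In a full interval scan, the same comparison of residual sums of squares would be repeated at each tested position, and the position with the highest LOD above the threshold would be reported as the QTL position.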

The positions of the SNPs flanking the QTLs were identified in Soybase (Grant et al. 2009) and used to position each QTL in the soybean genome. Using this information, we searched the Soybase and Phytozome (Goodstein et al. 2011) databases and the previous literature for nearby genes and QTLs that could be associated with the QTLs found in this study.

Results and discussion

Phenotypic analysis

Descriptive statistics (minimum, maximum, average and standard deviation) and values of kurtosis and skewness for each environment and the joint analysis are shown in Table 1. There was a wide range of values for each trait in each environment, varying from 86.71 to 121.12 g kg−1 for palmitic acid, 22.21–51.39 g kg−1 for stearic acid, 184.01–577.42 g kg−1 for oleic acid, 272.46–618.07 g kg−1 for linoleic acid, 27.06–84.05 g kg−1 for linolenic acid and 180.06–236.31 g kg−1 for oil content. Transgressive segregation was observed for all traits in all environments except for palmitic acid in the joint analysis. Transgressive segregation has been reported in soybean segregating populations for oil and fatty acid contents, which is consistent with the concept that these traits are controlled by multiple genes with small effects (Cao et al. 2017; Heim and Gillman 2017; Smallwood et al. 2017; Bueno et al. 2018; Silva et al. 2019).

Table 1 Descriptive statistics, kurtosis and skewness for oil and fatty acid contents of F2-derived populations from CS303TNKCA × FA22 crossing, evaluated in four environments in Brazil

Kurtosis and skewness indicated normal distribution for all traits in the joint analysis, except linolenic content (positive skewness at 5% probability). However, significant kurtosis and skewness were observed in the individual analyses. Linolenic and oleic contents did not show normal distribution in the joint analysis at 1% and 5% probability, respectively (Fig. 1). For linolenic acid, this could be explained by a gene (GmFAD3A, Glyma.14g194300) that was previously described as having a major effect on the trait (Bilyeu et al. 2005; Pinto et al. 2013). Indeed, we recently described a mutation in GmFAD3A of the CS303TNKCA variety, and individuals carrying this mutation had 41.4 g kg−1 of linolenic acid on average (Silva et al. 2018a), a value that matches the second peak in the linolenic acid distribution (Fig. 1). The presence of a QTL with a major effect can distort the normal distribution curve, since the values will tend to cluster around the average of each QTL genotype and not only around the mean of the trait.

Fig. 1 Distribution of average oil and fatty acid contents in F2-derived populations from CS303TNKCA × FA22 crossing. The P value indicates the probability of following a normal distribution

The traits showed a significant genotypic effect at 1% probability in all environments and in the joint analysis, except oil content in the Viçosa (VIC) environment (Table 2), indicating the possibility of gains with selection for all traits. There was also a significant effect of environment for all traits in the joint analysis. A significant genotype by environment effect was observed for all fatty acid contents at 1% probability but not for oil content. Fatty acids in soybean are typical quantitative traits and are affected by environment and genotype by environment interaction effects (Matei et al. 2018; Zhao et al. 2019). Indeed, Brazil has marked environmental variability, which can change the behavior of the soybean crop, affecting the quantity and composition of seed oil. For example, temperature variations can alter the balance of saturated and unsaturated fatty acids and the balance of protein and oil content. Generally, as the average temperature increases, the quantity of oil and saturated fatty acids increases, while at lower average temperatures there is an increase in protein and unsaturated fatty acid content (Baud and Lepiniec 2010; Matei et al. 2018).

Table 2 Analysis of deviance for oil and fatty acid contents of F2-derived populations from CS303TNKCA × FA22 crossing, evaluated in four environments in Brazil

Heritability values for fatty acid content varied from 53.86% (oleic content/VIC) to 81.46% (linolenic content/VRB) in the individual analyses, and from 77.82% (stearic content) to 89.62% (linolenic content) in the joint analysis. These are considered good values for quantitative traits and point to good prospects for selection gains. Heritability values for fatty acid content have varied in many studies, usually ranging from 40 to 90% (Hyten et al. 2004; Leamy et al. 2017; Li et al. 2017; Priolli et al. 2019; Woyann et al. 2019). For oil content, heritability values in each environment varied from 16.51% (VIC) to 56.70% (VRB), and the joint analysis showed a higher value (66.60%). Oil content is also a quantitative trait, and different heritability values are reported in QTL mapping studies, such as values below 50% (Leamy et al. 2017) and above 80% (Priolli et al. 2019; Woyann et al. 2019). These results show that the evaluation of oil content and composition in multiple environments can lead to higher heritability values, increasing the selection gains.

Pearson coefficients of correlation between traits in each environment and between environments are shown in the supplementary information (Table S1). Oil content was the trait that varied most among environments, with an average coefficient of correlation of 0.3388 for the trait in different locations. In contrast, linolenic acid content varied least among environments, with an average coefficient of correlation of 0.7205 for the trait in the four locations. Regarding the correlation between traits in each environment and the average values of the same traits (joint analysis), the coefficients of correlation varied from 0.5780 (oil content in the VIC location) to 0.9150 (linolenic content in the SG1 location).

In all environments and in the joint analysis, we observed a strong negative correlation between oleic and linoleic contents, ranging from −0.9572 (VRB) to −0.9928 (VIC). Negative correlations between these traits have been widely reported, likely due to genes that have pleiotropic effects on both traits (Bachlava et al. 2008; Leamy et al. 2017; Li et al. 2017; Smallwood et al. 2017; Silva et al. 2019). We also observed negative correlations between palmitic and oleic acids (−0.4828 to −0.7687), between oleic and linolenic acids (−0.2901 to −0.5157), and between oleic acid and oil content (−0.1739 to −0.5903). Positive correlations were found between palmitic and linoleic acids (0.3770 to 0.7272), and between linoleic and oil contents (0.2024 to 0.5789).

Correlations between fatty acids are expected since they are in the same metabolic pathway. Palmitic acid is converted to stearic acid by the addition of two carbons, a process that involves the enzymes very-long-chain 3-oxoacyl-CoA reductase 1, very-long-chain (3R)-3-hydroxyacyl-CoA dehydratase and very-long-chain enoyl-CoA reductase (Kanehisa and Goto 2000). The production of unsaturated fatty acids is mediated by desaturases, enzymes that add double bonds at specific positions of the carbon chain. Stearoyl-ACP desaturases convert stearic into oleic acid, omega-6-fatty acid desaturases convert oleic into linoleic acid, and omega-3-fatty acid desaturases convert linoleic into linolenic acid, by the addition of a double bond after carbons 9, 12 and 15, respectively (Kanehisa and Goto 2000). Since the production of these fatty acids depends on the amount of the fatty acids upstream in the metabolic pathway, they will be correlated, as was observed in our results and by many authors (Leamy et al. 2017; Li et al. 2017; Smallwood et al. 2017; Silva et al. 2019).

SNP selection and map construction

We genotyped 211 soybean F2 segregating plants using 1536 SNPs from the Illumina GoldenGate Assay and found 595 polymorphic SNPs. The Illumina GoldenGate Assay was created with SNPs evenly distributed in each of the 20 consensus linkage groups of soybean and with a wide range of allele frequencies in different germplasm, and its SNPs were mapped in the integrated molecular linkage map (Gutierrez-Gonzalez et al. 2011; Hyten et al. 2010). The percentage of polymorphic SNPs (38.74%) was higher than that found by Vuong et al. (2016; 31.5%), but lower than that found by Gutierrez-Gonzalez et al. (2011; 55.33%); both studies used the same set of SNPs.

The chi-square test revealed that 58 polymorphic markers (9.7%) did not segregate as expected for an F2 population, and these were removed from the subsequent analysis. Of these markers, 42 (72.4%) had predominant alleles from parent CS303TNKCA while 16 (27.6%) had predominant alleles from parent FA22. Segregation distortion is a common biological phenomenon that can be caused by small population size as well as by genetic factors affecting the inheritance of the markers (Nodar et al. 1993; Wang et al. 2012a, b; Silva et al. 2018b). According to Wang et al. (2012a, b), the skewed segregation ratio is higher in RIL populations than in F2 or backcross populations.
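The marker counts reported in this section can be cross-checked with simple arithmetic. The sketch below is only an illustrative consistency check; the helper name `pct` and the variable names are hypothetical, and all numbers are taken from the text.

```python
def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


total_snps = 1536        # SNPs in the genotyping panel
polymorphic = 595        # SNPs polymorphic between the parents
distorted = 58           # polymorphic SNPs deviating from 1:2:1
from_cs303tnkca = 42     # distorted SNPs skewed toward CS303TNKCA
from_fa22 = 16           # distorted SNPs skewed toward FA22

polymorphic_pct = pct(polymorphic, total_snps)   # share of polymorphic SNPs
distorted_pct = pct(distorted, polymorphic)      # share of distorted SNPs
retained = polymorphic - distorted               # markers left for mapping
```

The number of retained markers obtained this way (537) is the number of markers used for map construction.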

In the mapping analysis, 537 markers were used to construct the linkage map. A total of 534 markers were distributed in 25 linkage groups (Supplementary Fig. S1), which were positioned on the 20 soybean chromosomes, while the remaining three markers were not included in any linkage group (BARC-035199-07136, BARC-029937-06757, and BARC-042281-08231). The lack of polymorphic markers closely flanking these three markers may explain why they were not included in linkage groups, because it could have led to recombination frequencies higher than 30% between them and their nearest flanking markers. The BARC-035199-07136 marker is located at position 1,030,857 of chromosome Gm01, but the first marker included in the D1a linkage group (BARC-060833-16926) is located at position 4,831,739 of Gm01 (Grant et al. 2009). Similarly, the BARC-042281-08231 marker is located at position 343,168 of Gm20, but the first marker included in the I linkage group (BARC-042281-08231) is located at position 16,731,974 of Gm20 (Grant et al. 2009). Gm06 was formed by two linkage groups in this study (5 and 25). The BARC-029937-06757 marker (position 10,900,876, C2 linkage group) is placed in Gm06 between the last marker of linkage group 5 (BARC-065139-19125, position 4,422,517) and the first marker of linkage group 25 (BARC-021735-04194, position 15,980,506) (Grant et al. 2009).

Table 3 shows information about the distribution of polymorphic SNPs on the 20 chromosomes of soybean. Five soybean chromosomes were each represented by two linkage groups in this study (C2/Gm06, A2/Gm08, B1/Gm11, F/Gm13, and G/Gm18), and the other fifteen chromosomes by a single linkage group. The number of markers per chromosome varied from 12 (Gm12) to 45 (Gm02), covering from 52.69 cM (Gm11) to 154.93 cM (Gm02) and totaling 2067.17 cM, an average of 103.36 cM per chromosome and 5.10 cM between markers. Recent linkage map studies in soybean have found similar overall lengths of 2054.50 cM (Cao et al. 2017), 2201.4 cM (Heim and Gillman 2017) and 2534.42 cM (Li et al. 2017), but overall lengths greater than 3000 cM have also been found for the soybean genome (Fan et al. 2015; Seo et al. 2019). Gm06 had the shortest average distance between markers (3.05 cM) and Gm12 had the longest (12.12 cM). Gm01 and Gm17 were the only chromosomes that did not present fully linked markers (only bins containing a single marker), and the percentage of bins with two or more markers varied from zero (Gm01 and Gm17) to 47.22% (Gm13).

Table 3 Coverage of the twenty chromosomes of soybean from a linkage map derived from 534 SNP markers

QTL mapping

A total of 20 QTLs controlling oil content or fatty acid content were found in the joint analysis (Table 4), varying from one (linolenic content) to six (palmitic content) per trait. The number of QTLs per chromosome was one (Gm05, Gm06 and Gm15), two (Gm14 and Gm17), three (Gm04, Gm13 and Gm19) and four (Gm02). These QTLs explained from 7.02% (qSte-13) to 70.37% (qLin-14) of the phenotypic variation in the joint analysis. The coefficients of determination for individual environments are also shown in Table 4. All QTLs for fatty acid and oil content had predominantly additive effects, in agreement with other studies (Monteros et al. 2008; Reinprecht et al. 2009; Wang et al. 2012a, b; Pinto et al. 2013; Heim and Gillman 2017; Bueno et al. 2018; Silva et al. 2018b).

Table 4 QTLs associated with oil and fatty acid average contents in F2-derived populations from CS303TNKCA × FA22 crossing, evaluated in four environments in Brazil

The interval between flanking markers ranged from 0.20 cM (qSte-06 and qPal-17) to 18.73 cM (qLin-14). The occurrence of monomorphic markers was observed within the intervals of 15 QTLs, and markers that showed segregation distortion were observed in three QTLs. These could be the cause of the wide intervals observed for some flanking markers, notably for qPal-05 (13.51 cM) and qLin-14 (18.73 cM). Despite presenting the largest interval, qLin-14 showed the highest R2 (70.37%).

Six QTLs associated with palmitic acid content were found on five chromosomes, explaining 7.03–12.56% of the trait variation each and totaling 58.26% of phenotypic variation. These QTLs were also identified in four (qPal-02), three (qPal-05 and qPal-19), two (qPal-13.1 and qPal-17) and one (qPal-13.2) environments. All the alleles that reduce palmitic content were from the FA22 line.

The qPal-05 QTL presented the highest coefficient of determination (12.83%) and had a significant effect in three environments. This QTL was placed in a wide region (432,530–7,943,521 bp) where QTLs and genes controlling palmitic content have been found. Li et al. (2015) evaluated 421 soybean accessions and mapped a QTL controlling palmitic acid content between positions 1,127,438 and 1,131,632 bp of Gm05 (R2 = 4.0%), a region that contains GmFATB1a (Glyma.05g016500), a gene that codes for strictosidine synthase and has been associated with palmitic content (Cardinal et al. 2007; Thapa et al. 2016). Previously, Cardinal et al. (2007) found that a deletion in GmFATB1a is responsible for the reduced palmitic content in soybean lines N97-3681-11 and N97-3708-13. Bachleda et al. (2016) developed a TaqMan marker to target a mutation in GmFATB1a which could reduce palmitic content from 110 g kg−1 to 66 g kg−1 (R2 = 57.3%) in an F2 population derived from the low palmitic acid soybean line N87-2122-4 (Burton et al. 1994). Close to GmFATB1a (1,048,878 bp), Zhang et al. (2018) found a QTL controlling palmitic content associated with the candidate gene Glyma.05g012300, another GmFATB gene. In addition, Heim and Gillman (2017) mapped a QTL associated with palmitic acid at position 1,420,686 bp (R2 = 13.06%). Another region of the qPal-05 interval contains Glyma.05g015400 (7,703,169–7,710,844 bp), which codes for a phospholipid-translocating ATPase. This gene was identified by sequence comparison with known gene families in Arabidopsis as associated with palmitic acid content, with R2 = 7.7% (Li et al. 2015). All these findings point to the importance of this region of Gm05 in the control of palmitic acid.

We could not find the position of one of the flanking markers of qPal-13.1 in Soybase (BARC-041141-07916), but the other flanking marker (BARC-041671-08065) is placed at 30,268,680 bp. This QTL was significant in two environments and explained 7.31% of the palmitic content variation. Close to it, Priolli et al. (2015) associated marker Satt586 (29,609,521 bp) with palmitic acid content by evaluating 95 soybean accessions. This marker could reduce palmitic content from 10.74% to 10.12%. In the interval 8,477,484–8,585,017 bp of Gm17, qPal-17 was significant in two environments (R2 = 7.03%). Reinprecht et al. (2006) found a QTL associated with the Sat_092 marker (9,687,613 bp) that explained 11% of the palmitic content variation of a RIL population in one of three environments. Similarly, Smallwood et al. (2017) found a QTL close to this region (7,525,398 bp) that explained 3% of the palmitic content variation in a RIL population. The same authors found another QTL located at position 44,515,446 bp of Gm19 (R2 = 4.0%), which falls within the interval of qPal-19 (42,048,082–44,658,492 bp), a QTL significant in three environments that explained 10.86% of palmitic content variation. Besides these QTLs, we could not associate qPal-02 (4,032,973–4,189,847 bp; R2 = 12.56%) and qPal-13.2 (1,456,486–2,776,574 bp; R2 = 7.67%) with QTLs or genes found in previous studies, so these could be new QTLs controlling palmitic acid.

We identified five QTLs associated with stearic acid content that explained 7.02–33.91% of phenotypic variation each, totaling 73.24% of phenotypic variation. Alleles from the CS303TNKCA cultivar were responsible for reducing stearic content, except for qSte-14. The major QTL found was qSte-14 (R2 = 33.91%), which was significant in all environments and was located in the interval 42,946,740–44,292,942 bp. In this interval, there are two serine-threonine protein kinase genes, Glyma14g34560 (43,135,438–43,135,773 bp) and Glyma14g34570 (43,138,158–43,139,368 bp). Li et al. (2015) considered another serine-threonine protein kinase gene located in Gm15 as a candidate gene for controlling stearic acid content in soybean seed. Thus, Glyma14g34560 and Glyma14g34570 are possible candidates to explain the effect of this QTL on stearic content. In addition, Heim and Gillman (2017) found a QTL at position 42,206,409 bp that explained 56.76% of stearic acid content in a RIL population, which strengthens the hypothesis that this region of Gm14 is important in controlling stearic acid content.

In Gm06, we found qSte-06 (R2 = 16.50%), located in the interval 21,977,088–27,940,205 bp and significant in all environments. In Soybase, the BARC-023517-05442 marker (27,940,205 bp) is placed at 112.87 cM, close to the estimated position of a QTL found by Hyten et al. (2004; 117.8–125.3 cM), which explained 13.1% of the stearic content variation in a RIL population. In Gm13, we found a QTL controlling stearic content (qSte-13; R2 = 7.02%) that is located in the interval 30,771,524–32,171,107 bp. This region is close to another QTL found controlling palmitic content (qPal-13.1), indicating the possibility that the same region controls both traits. Two other QTLs were found controlling stearic acid content, qSte-15 (8,262,792–9,125,708 bp; R2 = 7.95%) and qSte-17 (38,811,019–39,399,316 bp; R2 = 7.86%), both significant in two environments, but no QTLs or genes associated with stearic content were found in these regions in the literature. As discussed above, Li et al. (2015) also found a QTL controlling stearic acid content in Gm15, but in another region of the chromosome. Therefore, qSte-15 and qSte-17 may be new QTLs associated with stearic content.

Three QTLs controlling oleic acid content were identified, explaining 8.93–12.78% of trait variation, and all alleles that increased oleic content came from the FA22 line. Identified in two environments, qOle-19 (R2 = 12.13%) was located between positions 42,048,082 bp and 44,658,492 bp. Close to this interval, Kim et al. (2010) found a QTL associated with the sct_010 marker (41,380,304 bp) with R2 = 7.48% by evaluating a RIL population in various environments. In the same region, Monteros et al. (2008) found an association between the Satt561 marker (43,022,543 bp) and oleic content, explaining 25.0% of trait variation. In Gm04, qOle-04 had the highest coefficient of determination among the oleic QTLs (12.78%), was placed between 4,054,945 bp and 6,322,237 bp, and was significant in all environments. In Gm02, qOle-02 (6,976,528–7,473,425 bp; R2 = 8.93%) was significant in two environments. We could not identify other QTLs or genes associated with oleic content in these two regions in previous studies, so they may be new regions controlling oleic content.

Despite the wide range of oleic acid content in all environments (184.01–577.42 g kg−1), we did not find a major QTL controlling this trait (the QTLs found together explained 33.84% of phenotypic variation). Thus, we could not completely elucidate the genetic basis of the mid-oleic content of the FA22 line (about 50%). In addition, we sequenced the entire region of the GmFAD2-1A and GmFAD2-1B genes from the FA22 line but found no mutations in them (data not shown). These two genes code for the main omega-6-desaturases expressed in soybean seed, and mutations in these genes are generally the cause of increased oleic content in soybean lines (Alt et al. 2005; Pham et al. 2010; Pham et al. 2011). We also evaluated the expression of GmFAD2-1A and GmFAD2-1B in the FA22 line by real-time PCR, but their expression pattern was similar to that of other soybean lines with normal oleic acid values (data not shown). These results indicate that, besides the QTLs discussed above, the mid-oleic content of the FA22 line could be explained by other mechanisms, such as post-translational mechanisms.

Regarding linoleic acid content, we found qLol-04 in all environments, in the interval 23,716,593–36,725,586 bp (R2 = 13.15%); qLol-19, significant in two environments and placed between positions 42,048,082 bp and 44,658,492 bp (R2 = 12.72%); and qLol-02 (R2 = 7.95%), found in one environment. The QTLs qLol-02 and qLol-19 were located in the same intervals as qOle-02 and qOle-19, respectively, and had similar coefficients of determination, suggesting that the same locus controls both traits. In fact, highly negative correlations between oleic and linoleic acid content were observed in all environments, as well as in other studies, suggesting a pleiotropic effect for these loci, as discussed before (Leamy et al. 2017; Li et al. 2017; Smallwood et al. 2017). Heim and Gillman (2017) and Kim et al. (2010) also found QTLs controlling linoleic content in the same regions, as discussed above. Moreover, the interval of qOle-19 and qLol-19 was also associated with palmitic acid (qPal-19). Pleiotropic effects on oleic, linolenic and palmitic contents were also reported by Smallwood et al. (2017). As expected, because FA22 alleles increased oleic content, the same alleles reduced linoleic content.
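
The co-localization argument above rests on comparing physical intervals on the same chromosome. A minimal Python sketch of such a comparison is given below, using only the intervals quoted in the text for the oleic and linoleic QTLs on Gm04 and Gm19. The `Interval` type and the `overlap_bp` helper are hypothetical names introduced for illustration and are not part of the analysis pipeline used in this study.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Physical QTL interval (hypothetical helper type for this sketch)."""
    chrom: str
    start: int  # bp, inclusive
    end: int    # bp, inclusive


def overlap_bp(a: Interval, b: Interval) -> int:
    """Return the number of base pairs shared by two intervals (0 if none)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


# Intervals as quoted in the text.
qtls = {
    "qOle-19": Interval("Gm19", 42_048_082, 44_658_492),
    "qLol-19": Interval("Gm19", 42_048_082, 44_658_492),
    "qOle-04": Interval("Gm04", 4_054_945, 6_322_237),
    "qLol-04": Interval("Gm04", 23_716_593, 36_725_586),
}

shared_19 = overlap_bp(qtls["qOle-19"], qtls["qLol-19"])
shared_04 = overlap_bp(qtls["qOle-04"], qtls["qLol-04"])
print(f"Gm19 overlap: {shared_19} bp; Gm04 overlap: {shared_04} bp")
```

Under this check, qOle-19 and qLol-19 share their whole interval, whereas qOle-04 and qLol-04 lie in separate regions of Gm04. Overlapping intervals are compatible with a shared locus but do not by themselves distinguish pleiotropy from tight linkage.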

Only one QTL was associated with linolenic content, qLin-14, located in the interval 44,292,942–46,713,845 bp and explaining 70.37% of phenotypic variation. This region houses Glyma.14g194300 (GmFAD3A), a major-effect gene that codes for an omega-3-fatty-acid desaturase, an enzyme that converts linoleic to linolenic acid (Bilyeu et al. 2005; Pinto et al. 2013). We previously reported a mutation in GmFAD3A of the CS303TNKCA variety that causes a premature stop codon, reduces linolenic content to 33–44 g kg−1 and explained 50.83–73.70% of phenotypic variation (Silva et al. 2018a). Mutations in GmFAD3A associated with low linolenic acid have also been reported by many authors (Bilyeu et al. 2005; Reinprecht et al. 2009; Pham et al. 2014).

For oil content, we found two QTLs, each significant in one environment. The QTL qOil-02 (10,233,770–14,106,445 bp) explained 7.98% of oil variation. We could not find genes or QTLs associated with oil content close to this interval, but a QTL associated with protein content in a RIL population was found in the interval 14,764,422–15,058,707 bp (Seo et al. 2019). Considering that the negative correlation between oil and protein in soybean seed due to pleiotropy has been well reported (Reinprecht et al. 2006; Leite et al. 2016), a QTL controlling protein content is also a potential target for controlling oil content. In Gm04, we found qOil-04 (R2 = 7.32%) in the interval 1,891,503–2,507,409 bp. Close to this region, the Satt274 marker (5,286,777 bp) was associated with oil content in two RIL populations (R2 = 10.1–13.1%) by Wang et al. (2014).

Other QTLs controlling oil content were found in the individual analyses, but in at most two locations (data not shown). These results are in accordance with the low correlations between oil content in different environments (0.3397, on average). In fact, oil content is a complex trait controlled by both genetic and environmental factors (Priolli et al. 2015; Seo et al. 2019), and genotype x environment interactions for oil content have been reported by many authors, including under Brazilian conditions (Bueno et al. 2013; Cao et al. 2017; Matei et al. 2018). Some studies have reported that oil content increases as the average temperature increases (Naeve and Huerd 2008; Matei et al. 2018). Rocha et al. (2009) evaluated the effect of many factors on soybean oil content and found a pronounced effect of the interaction between genotype and year, exceeding the effects of planting location and sowing date. It is possible that different genes controlling oil content are activated under different environmental conditions, but more studies are necessary.

Oil and fatty acid contents are important soybean traits for both industrial and food uses. In this study, we evaluated 211 segregating F2-derived families in four environments and identified QTLs controlling oil and fatty acid contents. This population was derived from CS303TNKCA, a soybean cultivar developed by PMQS/BIOAGRO/UFV that has low linolenic acid and intermediate oil content and is used in our breeding program as an allele donor for producing high oleic, low linolenic acid lines. Of the 20 QTLs found in this study, eight could not be associated with previously reported QTLs or genes (qPal-02, qOle-02, qLol-02, qOle-04, qLol-04, qPal-13.2, qSte-15 and qSte-17) and may represent new QTLs. The results indicate the FA22 line as an allele donor for reducing palmitic and linoleic acids and increasing oleic acid, while the CS303TNKCA cultivar was identified as a source of alleles for reducing linolenic acid, as previously published (Silva et al. 2018a), and for reducing stearic content.

Further studies, such as fine QTL mapping and DNA sequencing, are needed to confirm the effects of these QTLs and narrow their intervals. This would make it possible to find candidate genes related to fatty acid contents, improving knowledge of their genetic basis, and to identify new markers for marker-assisted selection. The results generated in this study can help to understand the genetic basis of these traits and help breeders modify oil content and composition in soybean seed, especially under tropical conditions.