Introduction

Oil palm (Elaeis guineensis Jacq.) is the most productive oil crop in the world, and is currently grown on some 19 million hectares (ha) of land. This is only about 0.4% of the total world agricultural land but accounts for almost 40.0% of the global oils and fats (Kushairi et al. 2018). Comparatively, soybean (Glycine max) utilizes 40.1% of the total agricultural land, followed by cottonseed (13.8%), rapeseed (13.0%) and sunflower (10.0%) (Pirker et al. 2016).

In traditional oil palm breeding, the parental lines are continuously crossed to generate superior progenies, similar to producing hybrids in other crops. The progenies from crosses, however, are not automatically acceptable just because they come from good parents. Thus, each cross is progeny tested, and only the confirmed combinations with superior yield are used to produce commercial seeds (Soh et al. 2003). It takes on average 10–12 years to develop a new variety, and sometimes up to 20 years to reach commercial application (Rajanaidu et al. 2000). The obvious question is whether this time can be shortened. The main challenge is the collection of phenotypic data, which is time-consuming and labour-intensive, requiring years for reliable data compilation. Yield is recorded for at least five years, from six to 10 years after planting in the field, and vegetative measurements have to be taken several times (Corley and Tinker 2003; Swaray et al. 2020).

In introgressing good trait(s) from Palm A into Palm B, the whole gamut of genes from A, both good and bad, is first combined with those from B, and the undesirable genes are then weeded out by repeated self-pollination and selection. It would be faster if only the favourable alleles could be introgressed, but the question has always been how to do so. In recent years, enabling technologies have emerged, such as marker-assisted selection (MAS) and genomic selection (GS). In MAS, markers are used to predict the phenotype, saving time and money in gathering the phenotypic data, as selection can be made even on seedlings when the adult features are yet to show (Collard et al. 2005; Nadeem et al. 2018). More recently, GS, which uses genome-wide markers to estimate the effects of all loci, has made it possible to compute a genomic estimated breeding value for specific traits (Wang et al. 2018), and this approach is gaining prominence for crop improvement. Both MAS and GS increase the rate of genetic gain by reducing the selection time needed for the desired traits. MAS- and GS-based programmes have been applied to improve yield in soybean (Concibido et al. 1997; Sebastian et al. 2010; Jarquín et al. 2014; Fallen et al. 2015; Stewart-Brown et al. 2019) and maize (Yousef and Juvik 2001; Liu et al. 2015; Pace et al. 2015; Beyene et al. 2015; Wang et al. 2020) and have enhanced disease resistance, yield, plant height and flowering time in wheat and rice (Gupta et al. 2010; Poland et al. 2012; Ragimekula et al. 2013; Spindel et al. 2015; Thavamanikumar et al. 2015; Borrenpohl et al. 2020). These molecular strategies are also applicable to oil palm.

In oil palm, the required tools and techniques for MAS and GS have been developed over the last two decades. For example, DNA-based markers and identification of genomic loci associated with monogenic as well as polygenic traits have been reported (Jack and Mayes 1993; Singh and Cheah 2005). The causal genes regulating the two most important monogenic traits, shell and fruit colour, have been identified and the discoveries translated into commercial diagnostic assays (Singh et al. 2013a, 2014; Ooi et al. 2016). For yield, the QTLs associated with oil yield (OY) and various other yield components (YCs) have been reported by Rance et al. (2001), Billotte et al. (2010), Jeennor and Volkaert (2014), Pootakham et al. (2015), Seng et al. (2016), Teh et al. (2016, 2020) and Bhagya et al. (2020). Many QTLs and markers have been associated with OY and various YCs across different genetic backgrounds, suggesting a complex genetic mechanism determining oil palm yield. The QTLs were uncovered using different marker systems, starting with restriction fragment length polymorphism (RFLP), which was largely replaced by amplified fragment length polymorphism (AFLP), simple sequence repeat (SSR) and, more recently, single nucleotide polymorphism (SNP) based markers. RFLP-based markers are codominant, but not popular at present as the technique for generating and identifying informative RFLP markers is expensive and laborious. To overcome these shortfalls, AFLP markers were used instead (Singh and Cheah 1999; Kularatne et al. 2001; Seng et al. 2007), although their dominant nature posed some limitations in application. Subsequently, SSR markers (also codominant but requiring less DNA and with high reproducibility across laboratories) have become popular in oil palm research (Ting et al. 2010; Zaki et al. 2012; Ting et al. 2013). More recently, SNP markers have gained importance and are preferred due to their wide distribution in the genome, codominant nature and amenability to high throughput analysis (Mishra et al. 2014; Nadeem et al. 2018).

This study constructed a genetic linkage map for a Deli dura × AVROS pisifera family, a commercial planting material, and updated the Deli dura × Yangambi pisifera genetic map constructed previously by Ting et al. (2014). Both maps were constructed using the same oil palm customised array containing 4451 SNP markers and over 600 SSR markers, making the comparison possible. The genetic maps were then used to identify QTLs associated with OY and YCs, and the results were compared to the QTLs published previously for oil palm. Linking and cataloguing the QTLs identified in different studies and by different marker systems is challenging, but has been made easier by the publication of the oil palm genome build (EG5) (Singh et al. 2013b). It is now possible to compare QTLs from different crosses and publications to determine if they fall within the same chromosomal regions. The ability to identify overlapping QTLs linked to a trait in a similar chromosomal region adds confidence to the postulation that the genomic region strongly influences the trait concerned. Inclusion of QTL-linked markers consistently associated with a trait in a panel has increased the prediction accuracy of GS models in cattle improvement (Brøndum et al. 2015). More importantly, candidate genes within or near the QTL regions can now be identified for subsequent analysis to determine the actual causative genes for the yield trait(s).

Materials and methods

Mapping families

The first mapping family, P2 (05 Trial 1), is an advanced breeding cross between an Ulu Remis Deli dura (ENL48) and a Yangambi pisifera (ML161). The P2 population consisted of 87 F1 tenera palms currently grown at FGV R&D Sdn. Bhd., Kota Gelanggi, Pahang, Malaysia. The second family, KULIM DxP, consisted of 135 F1 tenera palms planted at the Tereh Utara plantation of Kulim Plantation Bhd., Johor, Malaysia. The KULIM DxP palms were generated from a cross between an ex-Ulu Remis Deli dura (KT 910512/0804) and an AVROS pisifera (KT 911101/1203). The maternal dura and the paternal pisifera palms are known to have contrasting yield parameters, as pisifera is female sterile and rarely produces fruit bunches to maturity (Wonkyi-Appiah 1987; Kushairi et al. 1999; Kushairi and Rajanaidu 2000; Swaray et al. 2020). The maternal Deli dura palms are known to have higher bunch weight and lower bunch number than the paternal pisifera, and the resulting intraspecific progenies of these two parental palms show hybrid vigour for yield (Gascon and de Berchoux 1964; Durand-Gasselin et al. 2000; Jin et al. 2017; Singh et al. 2020). Leaf materials from all the palms, including the parental ones, were sampled for DNA extraction and marker analysis.

Yield-related phenotypic data

Ripe bunches from both families were analysed for their YCs over a 5-year period according to the standard protocol used by oil palm breeders (Blaak et al. 1963; Rao et al. 1983; Isa et al. 2011). The standard protocol for determining YCs is also cited in the National standards (SIRIM standard MS157), as the recommended methodology to determine the suitable parental palms for commercial seed production. A minimum of three bunches per palm were analysed for 16 YC parameters: mean bunch number (MBN, no/palm/year), mean fresh fruit bunch weight (MFFB, kg/palm/year), mean fruit weight (MFW, g/fruit), total mesocarp and kernel oils (TOT, ton/ha/year), mesocarp oil yield (OY, ton/ha/year), oil/bunch (OTB, %), oil/wet mesocarp (OTWP, %), oil/dry mesocarp (OTDP, %), mean mesocarp weight (MPW, g/fruit), mesocarp/fruit (MTF, %), kernel yield (KY, ton/ha/year), mean kernel weight (MKW, g/fruit), kernel/fruit (KTF, %), kernel/bunch (KTB, %), mean shell weight (MSW, g/fruit) and shell/fruit (STF, %). The distribution and correlations between the parameters were evaluated using the Kolmogorov–Smirnov normality and Pearson correlation tests in SPSS 16.0.
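
The correlation step can be illustrated with a minimal sketch of the Pearson coefficient in plain Python. This is not the SPSS procedure used in the study; the function name pearson_r and the sample values are hypothetical and serve only to show the calculation.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (n >= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-palm values for illustration only (not study data).
bunch_number = [10.0, 12.0, 13.0, 15.0, 16.0]
oil_yield = [4.1, 5.0, 5.2, 6.3, 6.5]
r = pearson_r(bunch_number, oil_yield)
```

A coefficient close to 1 indicates a strong positive linear relationship and a coefficient close to −1 a strong negative one; the significance test reported by SPSS is not reproduced here.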

Genomic DNA extraction

Extraction of genomic DNA from frozen leaves stored at − 80 °C was done using the modified CTAB method (Doyle and Doyle 1990). DNA quality was checked by digestion with EcoRI and HaeIII and electrophoresed on 0.8% agarose gel (Rahimah et al. 2006). The acceptable purity values were 1.8–2.0, as measured by the NanoDrop spectrophotometer (NanoDrop Technologies Inc., Wilmington, DE).

SNP and SSR analyses

SNP genotyping was performed by a service provider using the oil palm customized OPSNP3 Illumina Infinium II Bead-Chip array (Illumina Inc., San Diego, CA) containing 4451 SNPs. For SSR genotyping, fragment analysis was carried out using the ABI PRISM® 3100 Genetic Analyzer (Applied Biosystems, Foster City, CA). The SNP and SSR genotyping analyses were as described by Ting et al. (2013, 2014).

Construction of genetic linkage maps

An integrated genetic map of P2 was constructed previously (Ting et al. 2014). Additional SSR markers (sMo, sMh, sMg, _oSSR, sTE, sEg, sOleiSc, p5sc322 and sPSc) from the MPOB SSR database (http://opsri.mpob.gov.my/opsri/welcome.php) and Billotte et al. (2010) (mEgCIR) were genotyped and added to the P2 map. The KULIM DxP genetic map was constructed using JoinMap® 4.1 (van Ooijen 2006) as described by Ting et al. (2014). In brief, the independent parental and integrated KULIM DxP genetic maps were constructed simultaneously using the maximum likelihood (ML) mapping algorithm, where each linkage group (LG) was formed from marker pairs with recombination frequency ≤ 0.2. The Haldane mapping function was used to determine the map distance in centimorgan (cM) and markers with nearest neighbor stress (N.N. Stress) value > 4 cM were excluded from the individual parental and integrated maps. Finally, a consistent marker-order was determined by four iterations of map calculation. The integrated genetic linkage maps for P2 and KULIM DxP were labeled as DP and DPK, respectively.
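
For readers unfamiliar with the Haldane mapping function, the sketch below converts a recombination frequency r into a map distance d in cM using d = −50 ln(1 − 2r), and back again. It is a stand-alone illustration, not JoinMap code; the function names are hypothetical.

```python
import math


def haldane_cm(r):
    """Map distance in cM for recombination frequency r (0 <= r < 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination frequency must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def haldane_r(d_cm):
    """Recombination frequency for a map distance in cM (inverse function)."""
    if d_cm < 0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-d_cm / 50.0))


# The grouping threshold quoted above (r = 0.2) expressed as a Haldane distance.
threshold_cm = haldane_cm(0.2)
```

Under this function, the recombination frequency threshold of 0.2 corresponds to roughly 25.5 cM.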

QTL analysis

QTL analysis was carried out separately for DP and DPK as described by Ting et al. (2016). The default parameters in Interval Mapping (IM), the Multiple-QTL Model (MQM) and Kruskal–Wallis non-parametric ranking tests (KW) were used in MapQTL®6 (van Ooijen 2009). The 95.0% genome-wide (GW) and chromosome-wide (CW) LOD significance thresholds for each YC were determined by 1000 permutations. In addition, the G model (GM) (Bernardo 2013) was used to estimate the individual marker effect for the QTLs linked to each YC.
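
The idea behind a permutation-derived threshold can be sketched as follows. This is a simplified single-marker analogue, not the MapQTL implementation: the LOD is computed from a one-way analysis of variance as (n/2) log10(RSS0/RSS1), phenotypes are shuffled relative to genotypes, and the 95th percentile of the maximum LOD over all markers is taken as the threshold. All names and the simulated data are hypothetical.

```python
import math
import random


def marker_lod(phenotypes, genotypes):
    """Single-marker LOD: (n/2) * log10(RSS0 / RSS1) from a one-way ANOVA."""
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss0 = sum((y - grand_mean) ** 2 for y in phenotypes)  # no-QTL model
    groups = {}
    for y, g in zip(phenotypes, genotypes):
        groups.setdefault(g, []).append(y)
    rss1 = 0.0  # model with one mean per genotype class
    for values in groups.values():
        class_mean = sum(values) / len(values)
        rss1 += sum((y - class_mean) ** 2 for y in values)
    if rss0 == 0:
        return 0.0
    if rss1 == 0:
        return math.inf
    return (n / 2.0) * math.log10(rss0 / rss1)


def permutation_threshold(phenotypes, markers, n_perm=1000, alpha=0.05, seed=0):
    """(1 - alpha) quantile of the maximum LOD over markers under permutation."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_lods = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype association
        max_lods.append(max(marker_lod(shuffled, m) for m in markers))
    max_lods.sort()
    index = min(n_perm - 1, math.ceil((1.0 - alpha) * n_perm) - 1)
    return max_lods[index]


# Simulated data for illustration only: 50 palms, 20 two-class markers.
rng = random.Random(42)
phenotypes = [rng.gauss(13.0, 1.5) for _ in range(50)]
markers = [[rng.choice(("aa", "ab")) for _ in range(50)] for _ in range(20)]
threshold = permutation_threshold(phenotypes, markers, n_perm=1000)
```

Restricting the maximum to the markers of one linkage group would give a chromosome-wide threshold in the same way, while using all markers gives the genome-wide one.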

Mapping of QTLs to the oil palm genome build

Markers from the QTL regions were aligned to the oil palm reference genome (EG5) (Singh et al. 2013b) to identify their positions on the corresponding pseudo-chromosome using the Exonerate program (Slater and Birney 2005) with its default parameters. Markers with low scores (< 90.0% matched) or that were not uniquely mapped were removed. The genomic regions corresponding to the QTLs were searched against the predicted oil palm gene model database (Chan et al. 2017) in PalmXplore (http://palmxplore.mpob.gov.my, Sanusi et al. 2018) to identify putative genes and their functions.
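
The filtering rule can be sketched as below. The input is assumed to be a list of already parsed alignment hits in a hypothetical (marker, chromosome, position, percent matched) form; the sketch does not read Exonerate output, and it assumes that uniqueness is judged among the hits that pass the 90.0% cut-off.

```python
from collections import defaultdict


def filter_unique_hits(hits, min_identity=90.0):
    """Keep markers with exactly one hit at or above the identity cut-off."""
    by_marker = defaultdict(list)
    for marker, chrom, position, identity in hits:
        if identity >= min_identity:
            by_marker[marker].append((chrom, position, identity))
    # A marker is retained only when a single qualifying hit remains.
    return {m: found[0] for m, found in by_marker.items() if len(found) == 1}


# Hypothetical hits for illustration only (names and positions are invented).
hits = [
    ("marker_A", "CHR01", 1000, 98.5),  # unique, high identity: kept
    ("marker_B", "CHR02", 2000, 85.0),  # below the cut-off: removed
    ("marker_C", "CHR03", 3000, 95.0),  # two qualifying hits: removed
    ("marker_C", "CHR07", 4000, 93.0),
]
kept = filter_unique_hits(hits)
```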

Results and discussion

Comparison of DP and DPK genetic maps

A DP (P2) genetic linkage map was constructed previously using AFLP, RFLP, SSR and SNP markers by Ting et al. (2014). A further 240 SSR markers, 151 from MPOB and 89 from Billotte et al. (2010), were added to the current DP map. The updated DP map now contains 1595 markers across 16 LGs, spanning 1714.3 cM. Interestingly, a small number of SNP markers (23 SNPM) that failed to map previously are now in DP although the same mapping parameters were used. They helped bridge some gaps in the original map and further saturate some regions linked to QTLs, e.g. OTB on LGDP2 and MSW and MKW on LGDP3. The DPK genetic map (KULIM DxP) had slightly fewer markers, only 57 SSRs and 1449 SNPs in 16 LGs, covering a total map length of 1902.3 cM. The average map distance per marker in DPK was 1.3 cM, which, as expected, was close to the 1.1 cM observed in DP. In DP, the LGs were 66.2–193.2 cM, and in DPK, the range observed was 60.7–192.4 cM. In both populations, LGDP/DPK5 was the shortest and LGDP/DPK4 the longest. There were in total 746 common markers across the 16 LGs, a comparison of which revealed relatively high collinearity of the markers in both maps (Supplementary Figure 1). This is likely due to both populations having female parents of the Deli dura pedigree. This suggests that major chromosomal rearrangements have not yet occurred in domestication of the closely related parental lines, as also observed for watermelon (Ren et al. 2014).
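
The average map distance per marker quoted above can be checked with a short calculation, assuming it is simply the total map length divided by the number of mapped markers (other conventions, such as dividing by the number of marker intervals, would give slightly different values).

```python
# Marker counts and map lengths as reported in the text.
maps = {
    "DP": {"markers": 1595, "length_cm": 1714.3},
    "DPK": {"markers": 57 + 1449, "length_cm": 1902.3},  # SSRs + SNPs
}

# Average spacing in cM per marker, rounded to one decimal place.
density = {
    name: round(m["length_cm"] / m["markers"], 1) for name, m in maps.items()
}
```

With these inputs, the calculation returns 1.1 cM for DP and 1.3 cM for DPK, in agreement with the reported figures.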

Fig. 1 Significant (p ≤ 0.01, 2-tailed) positive (solid lines) and negative (dotted lines) correlations between YCs in P2 and KULIM DxP families

Yield components (YCs) and correlations between them

Of the 16 YCs evaluated, 11 were common to both the P2 and KULIM DxP families: MBN, MFFB, TOT, OY, KY, OTB, KTF, KTB, MKW, MSW and OTWP. The data for MFW, MPW, STF, OTDP and MTF were only available for KULIM DxP. Almost all the YCs (except MSW) were continuous and did not deviate significantly from a normal distribution (p > 0.05) in both populations. Normality of YC data was also observed in other oil palm mapping families analysed by Billotte et al. (2010), Seng et al. (2016) and Teh et al. (2020). For P2, YC data were available for 75 of its 87 palms, of which three outliers were removed for MBN based on a boxplot analysis comparing the observed and expected mean values (5.0% trimmed mean, SPSS 16.0). For KULIM DxP, the data were available for all of its 135 palms. However, for MSW, MPW and MKW, one, two and four outliers were removed, respectively, following boxplot analysis.
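
A generic version of boxplot-based outlier screening is sketched below. It assumes the common 1.5 × IQR fence and a trimmed mean that drops whole observations from each tail; SPSS computes its hinges and its 5% trimmed mean slightly differently, so this is an approximation of the idea rather than a reproduction of the software. The function names and values are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def trimmed_mean(values, proportion=0.05):
    """Mean after dropping the given proportion of values from each tail."""
    ordered = sorted(values)
    cut = int(len(ordered) * proportion)  # whole observations only
    kept = ordered[cut:len(ordered) - cut] if cut else ordered
    return sum(kept) / len(kept)


# Hypothetical bunch numbers for illustration only (not study data).
bunch_numbers = [12, 13, 13, 14, 12, 13, 14, 13, 12, 14, 30]
outliers = iqr_outliers(bunch_numbers)
```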

MBN was determined for an average of 13 bunches/palm for both families, where the range of observations made for individual palms of P2 and KULIM DxP was 6–16 and 6–19, respectively. As MFFB is influenced by MBN, variation was also observed for it, 72.04–210.53 kg/palm/year in the two populations, while OY was 2.53–7.92 ton/ha/year. The variations for the different YCs are summarized in Supplementary Table S1. Wide distribution was also observed for fruit components, such as mesocarp measurements and their derivatives (MPW, OTWP, OTDP and MTF) as well as the kernel (KY, MKW, KTF and KTB) and shell-related traits (MSW and STF), suggesting that both populations are suitable for QTL analysis for all their YCs measured in this study.

The correlations between the various YCs were consistent in both P2 and KULIM DxP families, with three levels of positive relationships (Fig. 1). Strong correlations were observed among MBN, MFFB, TOT and OY with r = 0.63–0.99. The second level of positive correlations was among the mesocarp and endocarp components. The mesocarp components (OTB, OTDP and MTF) and MPW had moderate correlation with r = 0.20–0.28 for KULIM DxP. Moderate to strong correlations (r = 0.30–0.77) were recorded among the endocarp components where KTF, STF, KY and KTB were correlated with MKW and MSW. Finally, the mesocarp and endocarp components contributing to MFW showed strong correlations with MPW (r = 0.87) and moderate correlations with MKW (r = 0.49). A graphical view of the correlations between the YCs is shown in Fig. 1, while Supplementary Table S2 demonstrates the relationships of both the direct (those categorized in the same group) and contributory effects (those at different levels) of the YCs to the overall yield in oil palm.

Pearson correlation was negative between some YCs, mainly between the mesocarp (OTB, OTWP, OTDP, MPW and MTF) and endocarp (KTF, STF, KY, KTB, MKW and MSW) components. Among them, negative correlations with r = − 0.29 to − 0.95 occurred between MTF and the endocarp components in KULIM DxP. This indicates that increasing mesocarp reduces kernel and shell, and vice versa, suggesting competition among the sinks for assimilates. Strong correlations among the YCs were also reported by Kushairi et al. (1999), Okwuagwu et al. (2008), Okoye et al. (2009), Seng et al. (2016), Osorio-Guarín et al. (2019) and Teh et al. (2020).

P2: QTLs linked to YCs

In the DP genetic map, 10 QTLs, significant at GW, were associated with various YCs. The traits for the QTLs and their LGs were MBN (LGDP13A), OTB (LGs DP2 and DP12), OTWP (LGDP12), KY (LGDP15), MKW (LGs DP3 and DP10) and MSW (LGs DP2, DP3 and DP16) (Table 1). A QTL associated with MBN was identified at map interval 0.0–5.0 cM on LGDP13A. An AFLP marker, EAAG/MCTC-125, was closest to the QTL peak detected at LOD 3.9 for MBN. Both the IM and MQM methods revealed that the QTL explained ~ 20.5% of the phenotypic variation for MBN, and a negative (paternal) effect (− 0.59) was estimated using GM. When the MBN phenotype was associated with the observed genotype profiles, palms without the AFLP locus from the paternal palm (denoted as the aa genotype) had a higher MBN of 13.30 ± 1.53 bunches, compared with 12.11 ± 1.53 bunches for palms carrying the locus (Fig. 2A). The limitation of an AFLP marker here was its dominant nature, and it was not clear if the marker concerned, EAAG/MCTC-125, amplified a homozygous or heterozygous DNA segment. Therefore, other flanking markers (LOD 3.6), namely sMo00166, sMo00196, SNPM04999 and SNPM03169, located ~ 2.6 cM away (Figure S1), were used as proxies, although the phenotypic variation explained was slightly reduced to 18.6%.

Table 1 Genome-wide (GW) significant QTLs detected for YCs in P2

Fig. 2 Boxplot distribution of YCs by genotype of closest markers to QTL peaks in P2

QTLs associated with OTB were found in the 48.0–52.0 cM (4.0 cM confidence interval) and 34.3–42.8 cM (8.5 cM confidence interval) regions of LGs DP2 and DP12, respectively. Markers from the two intervals showed negative effects from 0.9 to 1.2% (p = 0.007). The closest markers flanking the QTLs were SNPM02314 (LGDP2) and SNPM04433 (LGDP12). Palms categorized in the genotypes ab and aa had significant differences in OTB (p ≤ 0.05, t-test, SPSS 16.0). For the maternal marker SNPM02314, the homozygous genotype aa showed increased OTB (31.4 ± 2.6%), ~ 1.9% higher than the ab genotype (29.6 ± 2.9%). The genotype of the paternal marker SNPM04433, meanwhile, had the opposite effect on OTB: the aa genotype (28.7 ± 2.8%) had 2.6% lower OTB than ab (31.3 ± 2.6%) (Fig. 2B).
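
The genotype comparison can be illustrated by grouping palms by marker genotype and computing a two-sample t statistic. The sketch uses the Welch (unequal variance) form as an assumption, since the text does not state which variant was run in SPSS, and it stops at the statistic and its degrees of freedom because the p-value requires the t distribution. Names and values are hypothetical.

```python
import math
import statistics


def split_by_genotype(phenotypes, genotypes):
    """Group phenotype values by the marker genotype of each palm."""
    groups = {}
    for value, genotype in zip(phenotypes, genotypes):
        groups.setdefault(genotype, []).append(value)
    return groups


def welch_t(group_a, group_b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    term_a = statistics.variance(group_a) / len(group_a)
    term_b = statistics.variance(group_b) / len(group_b)
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(term_a + term_b)
    df = (term_a + term_b) ** 2 / (
        term_a ** 2 / (len(group_a) - 1) + term_b ** 2 / (len(group_b) - 1)
    )
    return t, df


# Hypothetical oil/bunch values (%) for illustration only (not study data).
otb = [31.0, 29.0, 32.0, 28.0, 30.0, 30.0, 33.0, 27.0]
calls = ["aa", "ab", "aa", "ab", "aa", "ab", "aa", "ab"]
groups = split_by_genotype(otb, calls)
t_stat, dof = welch_t(groups["aa"], groups["ab"])
```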

In addition to the QTL for OTB, LGDP12 also hosted another GW-significant QTL, for OTWP, whose interval overlapped that for OTB, with the same marker, SNPM04433, located closest to the QTL peaks for both traits. This may explain why the two YCs were strongly correlated (r = 0.81). However, SNPM04433 had a stronger effect of − 2.14 (p = 0.000263) for OTWP than for OTB (only − 1.20, p = 0.000160). This was likely due to the larger variation for OTWP (3.2%) in the two genotypes ab (54.0 ± 3.5%) and aa (50.9 ± 3.2%) (Fig. 2C).

QTLs associated with kernel and shell components, such as KY, MSW and MKW, were also identified on DP. The markers linked to them explained less of the phenotypic variation than those linked to the QTLs for fruit bunch, whole fruit and mesocarp components (Table 1). This is demonstrated for KY, where marker SNPM01951 from the QTL interval 75.0–82.1 cM in LGDP15 showed an effect of only 0.07 (p = 0.013897). The average KY values for the two genotypes ab and aa were 0.57 and 0.66 ton/ha/year, respectively, a difference of only 0.09 ton/ha/year (Fig. 2D). Similar observations were made for MSW and MKW, where the genotypes ab and aa of SNPM02999 (LGDP2) and EAGC/MCAA-302 (LGDP10) showed only a small difference of not more than 0.18 g (Fig. 2E, F). Additional QTLs for MSW and MKW were observed in LGs DP3 and DP16, where markers showing clear codominant segregating profiles were detected close to their QTL peaks. The SSR marker mEgCIR3301 had three alleles <abxac>, which segregated into four genotype classes: ab, aa, bc and ac. Interestingly, ab and aa showed lower phenotypic values than bc and ac (Fig. 2E, F). Another interesting marker was SNPM02704 at the QTL interval associated with MSW on LGDP16. The two parental palms showed the same genotype <abxab> and therefore their parental effects and contribution to the trait could not be determined via GM. However, among the three observed genotypes, bb had the lowest MSW (0.79 ± 0.3 g) compared to aa (0.96 ± 0.2 g) and ab (1.10 ± 0.2 g) (Fig. 2F).
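
The genotype classes expected from a segregation code such as <abxac> or <abxab> can be derived by combining one allele from each parent. The sketch below assumes regular Mendelian segregation without distortion and reads the code as maternal genotype, "x", paternal genotype; the function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction


def expected_classes(cross):
    """Expected progeny genotype classes and ratios for a code like '<abxac>'."""
    maternal, paternal = cross.strip("<>").split("x")
    # Combine each maternal allele with each paternal allele; sorting the
    # pair makes 'ba' and 'ab' count as the same genotype class.
    counts = Counter("".join(sorted(m + p)) for m in maternal for p in paternal)
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


three_alleles = expected_classes("<abxac>")   # four classes, 1:1:1:1
both_hetero = expected_classes("<abxab>")     # three classes, 1:2:1
```

This reproduces the four classes (aa, ab, ac and bc) described for a three-allele marker and the three classes (aa, ab and bb) for a marker that is heterozygous in both parents.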

In this study, QTL analysis also revealed a number of putative QTLs for YCs (Table 2). In the permutation tests across all 16 LGs, these QTLs had LOD scores lower than their GW significance thresholds but higher than their 95.0% significance thresholds at the chromosome level. In this respect, three CW-significant QTLs, termed putative, were identified for MBN, TOT and OY in LGDP2. Interestingly, these three production components are strongly related to each other (r = 0.79–0.99). In oil palm, a common QTL interval on the genetic map for related YCs, such as OTB, OTF, STF, KTF and DMWM, was also reported by Jeennor and Volkaert (2014). Similarly, in other crops, clustering of QTLs was reported for fiber quality and various yield traits in cotton (Keerio et al. 2018), weight, length, diameter and peduncle length in tomato (Portis et al. 2014), grain yield, harvesting index and grain weight in rice (Zhu et al. 2017) as well as maturity date, fruit development, fruit structure and the soluble solids content in sweet cherry (Calle and Wünsch 2020). The co-localization of multiple QTLs suggests the presence of closely linked loci or pleiotropic genes (Billotte et al. 2010; Lemmon and Doebley 2014).
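
The distinction between GW-significant and putative (CW-significant) QTLs amounts to comparing a LOD score against two thresholds. A minimal sketch follows; the function name and the threshold values in the example are hypothetical and are not taken from the study.

```python
def classify_qtl(lod, gw_threshold, cw_threshold):
    """Label a QTL by comparing its peak LOD with the two thresholds."""
    if cw_threshold > gw_threshold:
        raise ValueError("the CW threshold should not exceed the GW threshold")
    if lod >= gw_threshold:
        return "GW-significant"
    if lod >= cw_threshold:
        return "putative (CW-significant)"
    return "not significant"


# Hypothetical thresholds for illustration only.
labels = [classify_qtl(lod, gw_threshold=3.8, cw_threshold=2.6)
          for lod in (4.2, 3.1, 1.9)]
```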

Table 2 Chromosome-wide (CW) QTLs detected for YCs in P2

KULIM DxP: QTLs linked to YCs

In this population, GW-significant QTLs were identified for nine YCs (Table 3). The YCs with their associated QTLs and LGs were MBN and MFFB (LGDPK1), OTB (LGDPK8), OY and TOT (LGDPK1 and DPK8), KTB, KTF and MTF (LGDPK14) and STF (LGDPK4). A QTL was associated with MBN at interval 0–7.2 cM on LGDPK1, explaining ~ 15.9% of the phenotypic variation for the trait. The QTL peak had LOD 5.1 and the closest marker was an SSR, mEgCIR3803, with four genotype classes among the progenies, namely ac, ad, bc and bd. Palms with the ac and bc genotypes had lower MBN of 12.61 ± 0.39 and 12.76 ± 0.38, respectively, than those with the bd (13.90 ± 0.33) and ad (14.85 ± 0.36) genotypes (Fig. 3A). Within the same QTL interval, a smaller region (0.75–7.58 cM) was associated with MFFB, where the SNP marker SNPM01086 was located closest to the QTL peak. MFFB is one of the most important indicators of oil palm productivity. This co-segregating <abxab> marker demonstrated that both the aa (157.92 ± 3.30 kg) and ab (156.56 ± 2.52 kg) genotypes contributed to significantly higher MFFB production than palms with the bb genotype (143.02 ± 4.28 kg) (Fig. 3B). On LGDPK1, the slightly extended interval from 0.00 to 7.60 cM also hosted QTLs for OY and TOT, where the co-segregating marker SNPM01086 was closest to the QTL peak. Higher OY (6.1 ± 0.2 ton/ha/year) and TOT (6.60 ± 0.1 ton/ha/year) were observed for the aa genotype than for the ab (5.8 ± 0.1 ton/ha/year OY and TOT) and bb (5.24 ± 0.18 ton/ha/year OY and 5.76 ± 0.19 ton/ha/year TOT) genotypes.

Table 3 Genome-wide (GW) significant QTLs detected for YCs in KULIM DxP

Fig. 3 Boxplot distribution of YCs by genotype of closest markers to QTL peaks in KULIM DxP

The QTLs associated with OY and TOT were also identified on LGDPK8 (92.3–105.2 cM), with two SNP markers, SNPM02425 and SNPM02400, located closest to the respective QTL peaks. The OY-linked SNPM02425 showed a co-segregating profile <abxab>, where palms with the bb genotype had higher OY (6.18 ± 0.13 ton/ha/year) than those with aa (5.26 ± 0.2 ton/ha/year) and ab (5.76 ± 0.1 ton/ha/year). For the QTL associated with TOT, the maternally inherited marker SNPM02400 revealed significantly higher TOT (6.6 ± 0.1 ton/ha/year) for the homozygous genotype (aa) than ab (5.83 ± 0.1 ton/ha/year). Interestingly, SNPM02400 also pointed to another QTL associated with OTB located at the 101.1–103.4 cM interval. The aa genotype of this marker was also responsible for higher OTB (28.2 ± 0.2%) than ab (26.8 ± 0.2%) (Fig. 3C). The three YCs discussed above (OTB, OY and TOT) were significantly related with one another. Therefore, selection for higher OTB will also increase OY and TOT, although these three YC traits are highly influenced by the environment (Soh et al. 2017). The heritability of the three YCs is low, so their breeding improvement will be highly dependent on the environment and general operational management of the trials. If the environment is unfavourable and operational management is poor, the gains from MAS will be tentative.

On LGDPK4, the QTL interval associated with STF was 3.5–16.2 cM. It explained 18.6% of the phenotypic variation in STF and the closest marker to the QTL peak was SNPM00151, which revealed a marker effect of − 0.73% (heterozygous in the paternal palm). The heterozygous (ab) group showed a significantly lower STF (10.60 ± 0.19%) than aa (12.06 ± 0.19%) (Fig. 3H). On DPK14, the QTLs for three highly correlated traits (KTF, KTB and MTF) were found within the same map interval (46.9–64.8 cM). For KTF and KTB, the markers closest to the QTL peak (54.0 cM) were SNPM04522 and SNPM04938, which mapped on the same locus, indicating they had similar segregation profiles in the mapping family. The phenotypic variation explained by the QTL for KTF (18.8%) was lower than that for KTB (21.1%). Based on the genotypes of both markers, higher KTF and KTB were observed for the ab (7.69 ± 0.13% KTF and 5.20 ± 0.09% KTB) than the homozygous aa genotype (6.70 ± 0.13% KTF and 4.46 ± 0.09% KTB) (Fig. 3E, F). Within the same map interval, SNPM01100, located closest to the QTL peak (57.4 cM), accounted for 15.6% of the MTF phenotypic variation. In contrast with KTF and KTB, the aa genotype of SNPM01100 showed significantly higher MTF (82.45 ± 0.31%) than ab (80.4 ± 0.28%) (Fig. 3G). Interestingly, marker SNPM01100 was also significantly associated with KTF and KTB, although it was not closest to their QTL peaks. This indicates that within the QTL interval, this marker influences multiple traits differently depending on its genotype, which is supported by the significant correlations of KTF and KTB with MTF. This suggests that the genes that contribute to increased kernel size (larger KTF and KTB) will reduce mesocarp (MTF). Selection for MTF will therefore reduce KTF, boosting the mesocarp oil yield (Kushairi et al. 1999).

This study also identified a number of putative QTLs for various YCs on LGs DPK2 (OTDP), DPK4 (MFW, MPW, MSW and KY), DPK5 (MPW, MFW, OTB, OTWP and OTDP), DPK7 (OTWP), DPK8 (MBN), DPK13 (KTF) and DPK14 (MKW, STF and KY). Information on the putative QTLs is summarized in Table 4.

Table 4 Chromosome-wide (CW) QTLs detected for YCs in KULIM DxP

Comparison of common QTLs between P2 and KULIM DxP

This study identified 42 QTLs (21 putative) in P2 and KULIM DxP, distributed across 12 LGs (except 06, 09 and 11). Within each family, a number of the QTLs were co-localized in the same regions, such as on LGs DP1 (MFFB, TOT and OY), DP2 (MBN, OY and TOT) and DP12 (OTB and OTWP) in P2. In KULIM DxP, common QTLs were found on LGs DPK5 (MFW, MPW, OTB, OTDP and OTWP), DPK8 (OTB and TOT) and DPK14 (MTF and STF; KTB and KTF). However, comparing P2 and KULIM DxP, only a few QTLs were detected in the same LGs for both. The QTLs on the same LGs were those associated with OTB, MBN, OY, TOT and MSW with OTDP in LG2, and MBN with KTF in LG13. However, the QTLs in the same LGs in P2 and KULIM DxP did not overlap, either in the genetic or physical map.

The lack of common QTLs in both families is likely due to differences in their genetic backgrounds, especially as their pisifera parents were different. The pisifera of P2 was Yangambi and that of KULIM DxP was AVROS, of quite separate origins. The pisifera of KULIM DxP contributed most of the alleles that revealed the GW QTLs for OTB (LGDPK8), KTB, KTF, MTF (LGDPK14) and TOT (LGDPK1). The maternal dura, as expected, contributed the alleles for the STF-related QTLs, as the shell trait is maternally inherited. However, in P2, the GW QTLs detected were contributed in equal numbers by both the paternal and maternal parents. Its paternally inherited QTLs were those associated with MBN (LGDP13A), OTB, OTWP (LGDP12) and KY (LGDP15).

QTLs from different studies

The QTLs identified in this study were compared with 144 previously reported for several oil palm crosses (Billotte et al. 2010; Jeennor and Volkaert 2014; Pootakham et al. 2015; Seng et al. 2016; Teh et al. 2016; Bai et al. 2017; Ithnin et al. 2017). Comparison was also made to the QTLs already detected for MFW, MPW, STF, MTF and OTDP in P2 (Ting et al. 2018). The sequences of all the published QTL-linked markers were first mapped to the EG5 genome build to locate them in their pseudo-chromosomes. The results showed that most of the QTLs identified in our study were unique to P2 or KULIM DxP, and have not been reported in other oil palm crosses. Nevertheless, genomic regions on CHR09 and 14 that hosted QTLs in LGs DP7 and DP3 were common to those reported in different genetic backgrounds (discussed below). In addition, another five QTLs detected in our study are located as close as 2792 bp to the QTLs reported previously in CHR02, 06 and 15 (Fig. 4).
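
Once markers from different studies are placed on the same genome build, deciding whether two QTL regions coincide reduces to comparing physical intervals. The sketch below returns the gap in bp between two intervals on the same pseudo-chromosome (0 if they overlap). The function name and the coordinates in the example are hypothetical; they were chosen only so that the gap equals the smallest distance mentioned above.

```python
def gap_bp(interval_a, interval_b):
    """Gap in bp between two (chromosome, start, end) intervals.

    Returns 0 when the intervals overlap and None when they lie on
    different chromosomes.
    """
    chrom_a, start_a, end_a = interval_a
    chrom_b, start_b, end_b = interval_b
    if start_a > end_a or start_b > end_b:
        raise ValueError("interval start must not exceed its end")
    if chrom_a != chrom_b:
        return None
    return max(0, start_a - end_b, start_b - end_a)


# Hypothetical coordinates for illustration only.
this_study = ("CHR02", 1_000_000, 1_200_000)
published = ("CHR02", 1_202_792, 1_400_000)
distance = gap_bp(this_study, published)
```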

Fig. 4 Comparison of QTLs from different studies by mapping relevant information to the oil palm EG5 genome build. Only the closely linked markers that define the QTL regions for each trait on the chromosomes are shown

In CHR02, marker SNPM00151, linked to the QTLs for STF and MSW, was located only ~ 236.4 kb away from the SSR marker sMg00022 that was reported to be associated with KB and KF by Seng et al. (2016). Interestingly, STF is positively related with both KB and KF, which may explain why the same genomic region influences these traits. In the window (2,092,554–2,328,938 bp) which encompasses both the QTL intervals, we identified two genes, acyl-acyl carrier protein thioesterase (Acyl-ACP TE) and UDP-glycosyltransferase (UGT), involved in fatty acid (FA) biosynthesis and glycosylation modification, respectively, during fruit development and ripening (Pulsifer et al. 2014; Jing et al. 2011; Sun et al. 2017; Wu et al. 2017; Peng et al. 2020). In the oil palm fruit, the Acyl-ACP TE genes such as FATA and FATB encode proteins that hydrolyse the FA acyl chains from ACPs. FATA is quite specific for unsaturated acyl-ACPs, e.g. C18:1-ACP for release of C18:1, and FATB for saturated acyl-ACPs, e.g. C16:0-ACP and C14:0-ACP for release of C16:0 and C14:0, respectively, thus playing essential roles in determining the FA composition of palm oil (Sambanthamurthi et al. 2000; Othman et al. 2001). UGT is involved in anthocyanin glycosylation, the process of accumulating phenolic compounds which are responsible for the customary deep orange-to-red colour of oil palm exocarp. Based on their biological activities, the two genes have a direct impact on the composition of palm oil produced. However, their impact on the shell (and kernel) components, if any, requires further investigation.

In CHR06, the marker EAGC/MCAA-302, closest to the QTL peak for MKW, was in the same QTL interval (37,012–38,280 kb) associated with PF and aBWT in a multi-parental DxP cross (Billotte et al. 2010). In the interval, a valine-glutamine motif-containing protein (VQ) was identified at chromosomal position 37,411,925 bp. In many plants, VQ has been reported to be responsive to biotic and abiotic stress, including pathogen infection, when interacting with the WRKY transcription factor (TF) (Chen et al. 2012; Pecher et al. 2014; Liu et al. 2020). The specific interaction between the VQ motif FXhVQChTG (pfam05678) containing the gene IKU1 and a WRKY, MINI3, reportedly controls endosperm growth and seed size in Arabidopsis (Wang et al. 2010). Therefore, VQ is a good candidate gene to investigate for its regulatory effect on kernel and seed in oil palm. Additional analysis of the MKW-QTL region revealed that VQ was flanked by gibberellin 2-beta-dioxygenase (GA2OX) and a GATA TF (GATA), the putative functions of which are summarized in Table 5. Interestingly, these genes are significantly differentially expressed in low- and high-yielding oil palm (Wong et al. 2017). Furthermore, GATA is known to regulate biological functions in various plant organs, including the flower and seed.

Table 5 Putative biological functions for the candidate genes, proteins and transcription factors identified within the QTL region associated with yield components in the P2 and KULIM DxP mapping populations

In CHR09, the genomic region corresponding to 74.8–84.5 cM on LGDP7 of P2 was previously reported to be associated with MTF and STF (Ting et al. 2018). The same genomic region was also associated with QTLs for Bwt and Fwt which were identified in populations derived from Deli, La Me and Yangambi genetic backgrounds (Billotte et al. 2010). Although the correlations between MTF, Bwt and Fwt are not known, it is postulated that increased MTF (or decreased STF) will increase Fwt. A search for genes of interest was performed in the genomic region 8,208,977–9,198,501 bp, and two genes, a C3HC4-type zinc finger TF (RING finger) and a membrane-bound O-acyltransferase (MBOAT), were shortlisted. In Nicotiana benthamiana, RING finger is located in the chloroplasts, and silencing it stops fruit growth (Wu et al. 2014). MBOATs, such as diacylglycerol acyltransferase (DGAT) and lysophospholipid acyltransferase (LPLAT), are involved in catalysing the synthesis and accumulation of lipids in developing seeds, including in the mesocarp of oil palm (Tranbarger et al. 2011; Li et al. 2013; Wang et al. 2012; Jin et al. 2017; Rosli et al. 2018).

The SSR marker mEgCIR3301, mapped to 6,491,270 bp in CHR14, was found to be associated with MKW in P2 and in a DxP mapping family by Seng et al. (2016), as both families shared the same paternal parent (coded ML161). Interestingly, mEgCIR3301 was flanked by a lipid acylation-related gene, glycerol-3-phosphate acyltransferase (GPAT), at 6,480,850 bp and WRI1, at 6,510,932 bp. In many plants, including oil palm, WRI1 has been reported to regulate genes encoding a number of key enzymes along the FA and triacylglycerol synthesis pathways (Maeo et al. 2009; Bourgis et al. 2011; Tranbarger et al. 2011; Chapman and Ohlrogge 2012; Qu et al. 2012; To et al. 2012; Vanhercke et al. 2013; Tajima et al. 2013; Grimberg et al. 2020; Kong et al. 2020). In fact, a wider group of genes, such as the sugar- and carbohydrate-responsive genes, are also reported to be regulated by WRI1 (Masaki et al. 2005; Cernac et al. 2006). The storage compounds regulated by these genes eventually will affect development of the seed, embryo and even seedling, suggesting a possible role for WRI1 in regulating MKW of oil palm.
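
To make the flanking relationship on CHR14 concrete, the sketch below looks up the nearest annotated gene on each side of a marker in a coordinate-sorted list and reports the distances in bp. It is an illustration only, not the procedure used in the study; the function name `flanking` and the two-entry gene list are hypothetical simplifications built from the positions quoted above.

```python
from bisect import bisect_left

# Gene positions (bp) on CHR14 as quoted in the text, sorted by coordinate.
# A real annotation would contain many more entries; this list is illustrative.
GENES = [(6_480_850, "GPAT"), (6_510_932, "WRI1")]
MARKER_BP = 6_491_270  # position of mEgCIR3301 as quoted in the text


def flanking(features, pos):
    """Return (left, right) neighbours of pos, each as (name, distance_bp) or None."""
    positions = [p for p, _ in features]
    i = bisect_left(positions, pos)  # index of first feature at or after pos
    left = (features[i - 1][1], pos - features[i - 1][0]) if i > 0 else None
    right = (features[i][1], features[i][0] - pos) if i < len(features) else None
    return left, right


left, right = flanking(GENES, MARKER_BP)
print(left, right)  # ('GPAT', 10420) ('WRI1', 19662)
```

With the quoted coordinates, the marker lies about 10.4 kb from GPAT and about 19.7 kb from WRI1.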

Another common genomic region is the 19,804–20,124 kb interval on CHR15, which was associated with MTF and STF in KULIM DxP. The region was also reportedly linked to other important YCs, such as FFB, Fwt, Bwt and PO (Billotte et al. 2010). We identified a pectinesterase (PME) and a small auxin-up RNA-like auxin-responsive protein (SAUR) at 19,788,553 bp (to 19,805,976 bp) and 20,058,133 bp (to 20,059,096 bp), respectively. Both are related to cell metabolism: PME degrades pectin and modifies the cell wall in preparation for fruit ripening and softening, while SAUR is involved in cell division, expansion and differentiation (Markakis et al. 2013; Abu-Sarra and Abu-Goukh 1992; Li et al. 2015; Wen et al. 2020). The presence of these genes in QTL regions influencing various bunch components suggests the importance of genes regulating cell wall development, cell division, expansion and differentiation for the appropriate development of all components in the fruit bunch. Extending the search beyond the common QTL regions (in CHR02, 06, 14 and 15), we also identified a number of genes and TFs involved in the regulation of sugar levels, FA/oil biosynthesis, growth and development of flower, seed and fruit (Table 5), all of which potentially impact development of the bunch components.
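
The positional relationship between the CHR15 QTL interval and the two genes can be sketched as a simple interval-overlap test. The Python example below is a minimal illustration, not the authors' pipeline: the names `overlap_bp` and `classify` are hypothetical, closed (inclusive) coordinates are assumed, and the interval bounds are taken from the kb-rounded values in the text, so the exact overlap lengths are approximate.

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Length in bp of the overlap between two closed intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def classify(qtl, gene):
    """Label a gene as 'inside', 'partial' or 'outside' relative to a QTL interval."""
    if qtl[0] <= gene[0] and gene[1] <= qtl[1]:
        return "inside"
    return "partial" if overlap_bp(*qtl, *gene) > 0 else "outside"


# 19,804-20,124 kb on CHR15, converted to bp (kb-rounded bounds from the text).
QTL = (19_804_000, 20_124_000)
GENES = {"PME": (19_788_553, 19_805_976), "SAUR": (20_058_133, 20_059_096)}

status = {name: classify(QTL, span) for name, span in GENES.items()}
for name, label in status.items():
    print(name, label)
```

Under these assumptions, SAUR falls entirely within the interval, whereas PME starts upstream of it and overlaps only its first ~ 2 kb.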

Conclusion

This study describes the QTLs associated with yield components in two advanced dura × pisifera populations. Several common QTLs were identified in both populations. The QTLs linked to MTF and OTWP in P2 and KULIM DxP that influence mesocarp formation, respectively, were located ~ 22,000 kb apart in CHR09 (LGDP/DPK7). In addition, another similar genomic region (~ 11,000 kb apart) in CHR08 (LGDP/DPK2) regulates OTB and OTDP in P2 and KULIM DxP, respectively, both directly contributing to oil yield. The QTLs associated with similar yield traits have been published previously in mapping populations of different genetic backgrounds. We collated all the information to identify the QTL regions influencing the related traits reported by the different studies in CHR02, 06, 09, 14 and 15. A search within and near the QTL regions in the different chromosomes revealed 29 candidate genes and transcription factors related to glycosylation, plant growth, development and architecture, glucose and hormone signalling, lipid metabolism, photosynthesis, flowering and fruit ripening. UGT, PG, MYB, NAC2, AUX/IAA, RING finger and PME are examples of genes potentially regulating oil palm fruit formation, thus directly impacting yield. The current genome-based candidate gene approach is useful in identifying interesting genes that can assist in further understanding the genetic control of oil palm yield. In fact, the GATA gene located within the QTL interval was shown previously to be differentially expressed in high- and low-yielding palms. Further validation of the association of the other candidate genes with the traits concerned can help develop useful tools for marker-assisted selection in oil palm breeding. The markers linked to the QTLs could also be candidates for developing an appropriate marker panel for genomic selection in oil palm.