Introduction

The oil palm (Elaeis guineensis Jacq.) originated in West Africa (Corley and Tinker 2015). Oil palm fruits have been used for edible oil in Africa for almost 5000 years. Nowadays, palm oil is also an important source of biofuels (Corley and Tinker 2015; Zabaruddin et al. 2019). Commercial oil palm planting started in Indonesia around 1911 (Corley and Tinker 2015). After that, commercial plantations in Indonesia and Malaysia expanded rapidly (Corley and Tinker 2015). The oil palm is now cultivated in over 43 countries globally (Soh et al. 2018). Indonesia and Malaysia contribute around 85% of palm oil production worldwide. In the past 100 years, due to selective breeding, introduction of the African weevil for improving pollination, hybridization between different genetic types of the oil palm and improvement of field management, the average crude palm oil (CPO) yield increased from 2 to 3.8 tons/ha/year (Corley and Tinker 2015; Soh et al. 2018). The oil palm is the most productive oil crop and has a long productive life-span (≥ 25 years). Producing an equal amount of oil from other oilseed crops requires much more land. For example, to yield the same volume of oil generated from 1 ha of oil palm, it would require 4.8 ha of rapeseed, 5.4 ha of sunflower, or 7.6 ha of soybean (Soh et al. 2018). According to the European Palm Oil Alliance (EPOA 2020), among the major oil-producing plants, the oil palm used the smallest percentage (7.0%) of land for oils, but produced the biggest percentage (38.0%) of total oil. Global CPO production rose from 15.2 million metric tons (MMT) in 1995 to 72.7 MMT in 2019/2020. However, the palm oil sector is often blamed for causing deforestation and the loss of biodiversity, which turns popular opinion against palm oil in some countries (Gatti et al. 2019).
It is increasingly challenging to expand oil palm cultivation areas as new land is limited; thus, one way to increase palm oil production is to increase oil yield (Nyouma et al. 2019; Hoffmann et al. 2017; Yue et al. 2020) in the existing plantations. According to a previous theoretical estimation, the potential maximum oil yield of the oil palm is 18.2 tons/ha/year (Corley and Tinker 2015) while the current average oil yield is only 3.8 tons/ha/year. The current low oil yield of oil palm is mainly due to only a few (3–5) cycles of genetic improvement (Soh et al. 2018) and also due to several other yield-limiting factors, including soil conditions, rainfall, water supply, amount and types of fertilizers used and pests (Woittiez et al. 2017). Therefore, there is room for the improvement of palm oil yields through breeding and field management (Yue et al. 2020). In addition, the fatty acid profile of palm oil is not optimal for human consumption due to its high percentage (~50%) of saturated fatty acids (Edem 2002). Currently, the demand for high-oleic plant oil is increasing as people want plant oils that are beneficial for health. Therefore, improving fatty acid profiles for a higher degree of unsaturated fatty acids is also an important goal in oil palm breeding (Montoya et al. 2014), although it is a challenging task (Soh et al. 2018).

Many molecular methods have been used to accelerate the improvement of oil yield in oil palm. These approaches include tissue culture (Tisserat 1991), haploid breeding (Dunwell et al. 2010), mutation breeding (Rohani et al. 2012), marker-assisted selection (MAS) and genomic selection (GS) (Babu and Mathur 2016), transgenic breeding (Parveez et al. 2000), and genome editing (GE) (Aprilyanto et al. 2019). Several important issues concerning factors influencing palm oil yield (Woittiez et al. 2017) and selective breeding for improving oil yield have been reviewed in recent papers (Nyouma et al. 2019; Cros et al. 2019; Hoffmann et al. 2017). The present review article aims at synthesizing novel information on the improvement of oil palm production, including the development of molecular technologies, and their usage in accelerating genetic improvement of oil yield and quality of oil palm. We also propose the characteristics of an ideal type of palms and outline a roadmap toward the breeding of the ideal type of palms.

Cloning of elite oil palms

In the 1960-1970s, tissue culture of elite oil palms was rigorously attempted (Jones 1974). The success of tissue culture in the 1970s (Jones 1974) inspired people to try this technique. Protocols for tissue culture in the oil palm have been described in several papers (Staritsky 1970; Jones 1974; Corley et al. 1977; Marbun et al. 2015; Pannetier et al. 1985; Noiret et al. 1985) and reviews (Saleh and Scherwinski-Pereira 2016; Hashim et al. 2018). Basically, oil palm tissue culture consists of the following steps: (1) sampling of spear from elite trees and growing healthy tissues; (2) removing contaminating microorganisms using disinfection; (3) growing tissues on culture medium; (4) multiplication of cultures; (5) regeneration of plantlets; (6) hardening off for transfer to soil; and (7) planting in the prenursery and transferring to normal nursery, followed by checking uniformity, planting in field, selecting the best clones for various environments, and multiplying the selected clones. In every step, protocols may differ. Readers may obtain detailed protocols from published papers (Corley et al. 1977; Jones 1974; Marbun et al. 2015; Staritsky 1970) and reviews (Hashim et al. 2018; Saleh and Scherwinski-Pereira 2016; Weckx et al. 2019).

In the early eighties, plantations of clonal palms were started in Southeast Asia, and thereafter in other countries (Noiret et al. 1985), but the outcomes were discouraging (Khaw and Ng 1998). Because of the abnormal fruiting, high cost, long time needed and unexpectedly low yield (Corley et al. 1986), many companies lost confidence in cloning oil palms in the 1980s–90s. Since the 2000s, with the improvement of tissue culture protocols (Matthes et al. 2001; Eeuwens et al. 2002), the mantled flowering rate has been reduced to less than 5%, thus making this technology economically viable (Soh et al. 2017; Ong-Abdullah et al. 2015; Abdullah et al. 2018). More recently, a more refined understanding of the molecular mechanism governing the mantled abnormality has been achieved (Ong-Abdullah et al. 2015; Rival 2020; Etienne et al. 2016), which should help to further reduce the rate of mantled flowering in clonal oil palm. In 2018, there were over 20 oil palm tissue culture laboratories producing 5–8 million cloned palms globally. The yield of the clonal palms is usually 15–30% higher than the seed-based Tenera (Nambiappan et al. 2018). Oil yield of clonal palms could be estimated at almost 10 tons/ha/year. However, in past years, plantations of 100% clonal palms have encountered a serious problem: low fruit setting (our own observation). It is not yet known whether the lower fruit setting rate in clonal palms is due to too many female flowers, not enough pollen or other factors. One method to address the lower fruit setting of clonal palms is to plant a certain proportion (20–40%) of seed-based Tenera palms in the clonal palm plantation, which has proved to be quite effective in improving fruit setting. However, this is not the optimal way to obtain the maximal production from clonal elite palms.
To improve fruit setting in plantations consisting entirely of clonal palms, it is essential to determine the reasons for lower fruit setting through field observations of the number and viability of male flowers and of the activity of the pollinating weevil (Elaeidobius kamerunicus). It should be noted that genetic improvement within a clone of selected palms is impossible, as no genetic variation is available within the clone. To further improve clonal palms, crossing different clones originating from different sources may generate elite Dura and Pisifera palms, which can be selected by incorporating marker-assisted selection (MAS) and genomic selection (GS).

Haploid identification and production in the oil palm

In plant breeding, haploids are important. Significant improvements of economically important traits can be achieved through doubled haploid production (Maluszynski et al. 2013). With doubled haploid production systems, it is possible to generate homozygosity in one generation. In conventional breeding, complete homozygosity across the whole genome is practically unachievable because it takes many generations. Therefore, in a perennial crop like the oil palm, conventional inbreeding to homozygosity is not applicable in practice (Maluszynski et al. 2013). In the oil palm, screening for natural haploid individuals was conducted using over 100 simple sequence repeats (SSRs), with only one doubled haploid found in 21,900,000 seedlings from different sources (Dunwell et al. 2010), suggesting a very low frequency (i.e., 1 out of 21,900,000) of natural doubled haploids. Therefore, to produce a sufficient number of doubled haploids, it is essential to develop technologies for in vitro production of haploid oil palms (Sparjanbabu 2013).
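Why conventional inbreeding is so slow compared with doubled haploidy can be illustrated with the textbook expectation that each generation of self-fertilisation halves the expected heterozygosity. The sketch below is only an illustration under simplifying assumptions: strict selfing, no selection, and a starting heterozygosity of 1.0. The function names and the 6-year generation interval are hypothetical choices made for this example, not values taken from the cited studies.

```python
def heterozygosity_after_selfing(generations, h0=1.0):
    """Expected fraction of heterozygous loci after n generations of selfing."""
    return h0 * 0.5 ** generations


def generations_to_reach(target_h, h0=1.0):
    """Smallest number of selfing generations giving heterozygosity <= target_h."""
    n, h = 0, h0
    while h > target_h:
        h *= 0.5
        n += 1
    return n


# Hypothetical generation interval for a perennial crop (assumption for illustration).
YEARS_PER_GENERATION = 6

for n in (1, 2, 4, 6, 8):
    h = heterozygosity_after_selfing(n)
    print(f"{n} generations (~{n * YEARS_PER_GENERATION} years): "
          f"expected heterozygosity {h:.4f}")

n_needed = generations_to_reach(0.01)
print(f"Generations needed to fall below 1% heterozygosity: {n_needed}")
```

Under these assumptions, residual heterozygosity never reaches exactly zero by selfing, whereas a doubled haploid is expected to be fully homozygous after a single step.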

Mutation breeding in the oil palm

In the oil palm, gamma rays were first used to induce mutation of seeds and pollen in 1977 (Nur et al. 2018). Since then, different doses (ranging from 10 to 200 Gray) of gamma irradiation have been applied and have generated some mutants (Rohani et al. 2012; Mohamad 2016). Whether these doses of gamma rays are optimal for inducing useful mutations remains to be validated. Other mutagenesis methods, including those utilising chemical mutagens such as EMS (ethyl methane sulphonate) (Omar and Novak 1990), may also be used to generate preferred phenotypes. Although several mutagenesis methods have been tried to induce mutations in the oil palm, the final performances of the mutated palms have yet to be reported due to the long generation interval. Since induced mutation may generate novel and useful mutations in the oil palm genome, which may improve the performance of important traits, further attempts at mutation breeding are essential in oil palm breeding. An alternative way to increase the genetic variation of breeding populations is to obtain new germplasm from natural populations. This requires characterising the genetic variation in natural populations worldwide by genotyping them with DNA markers, including microsatellites and SNPs, and with next generation sequencing technologies. Certainly, this would be a huge workload, which needs proper coordination and financial support.

Molecular breeding in the oil palm

Molecular breeding refers to the use of DNA markers that are tightly linked to traits to rapidly improve these traits (Xu 2010). In general, there are two important approaches in molecular breeding: marker-assisted selection (MAS) and genomic selection (GS) (Xu 2010). Molecular breeding generally involves the following steps (Fig. 1): (a) planting the breeding populations, where traits of interest segregate and DNA markers show polymorphism and informativeness; (b) collecting plant leaf samples; (c) isolating DNA from leaf samples, and preparing DNA samples for genotyping DNA markers using PCR (polymerase chain reaction) or sequencing; (d) developing sufficient genomic resources, including whole genome sequences, DNA markers, linkage maps, transcriptomes and whole genome resequencing; (e) genotyping DNA markers covering the whole genome in the populations for which the traits of interest of individuals were recorded; (f) identifying DNA markers associated with traits through quantitative trait loci (QTL) mapping or genome wide association studies (GWAS) and validating the identified DNA markers in different populations; (g) verifying the results of QTL mapping or GWAS in different populations to select reliable DNA markers for selecting elite individuals at the seedling stage; (h) developing MAS and GS models using selected DNA markers; (i) identifying/selecting the best individuals with preferred marker alleles for traits of interest; (j) repeating the above steps for a few generations to obtain stable superior varieties/lines; and (k) releasing the improved varieties/lines to commercial plantations. In this section, we summarise the current status of the availability of DNA markers, linkage maps, whole genome sequencing, GWAS, MAS, and GS, and provide some suggestions for tackling the challenges in them.

Fig. 1 Ways toward MAS and GS for accelerating genetic improvement of the oil palm (see detailed explanation in the text)

Availability of DNA markers

DNA markers are the variable parts of a given genome among individuals within a species (Phillips and Vasil 2013). For example, in the 1980s–90s, RFLP (Botstein et al. 1980), RAPD (Williams et al. 1990), AFLP (Vos et al. 1995), and simple sequence repeats (SSRs) (Litt and Luty 1989; Edwards et al. 1991) were developed. RFLP, RAPD and AFLP are older types of DNA markers. Although they facilitated plant breeding in the 1980s (Xu 2010), they are less used now due to low polymorphism and difficulties and instability in genotyping (White and Cantsilieris 2018). Nowadays, SSRs and single nucleotide polymorphisms (SNPs) have become the most preferred DNA markers for molecular breeding (White and Cantsilieris 2018). In this subsection, these two types of DNA markers are reviewed and commented on.

SSR markers are short tandem repeats of 2–6-bp motifs (Litt and Luty 1989). The polymorphism of SSRs is caused by changes in the number of repeats of a certain motif. To genotype SSRs, they are amplified by PCR, and the PCR products are separated by electrophoresis on automated DNA sequencers. Alleles can be easily called by different methods with commercially available software (Guichoux et al. 2011). Alleles at an SSR locus show codominant inheritance; thus, both alleles in an individual can be detected. The first set of 21 SSRs in the oil palm was reported in 2001 (Billotte et al. 2001). Thereafter, more and more SSRs have been isolated, characterized and used in different genetic research on the oil palm (Lopez et al. 2004; Ting et al. 2014; Myint et al. 2019). Since the genomes of Pisifera, Dura, and E. oleifera were sequenced (Jin et al. 2016; Singh et al. 2013b), it is now easy to obtain SSR sequences using bioinformatics tools. A first microsatellite database for oil palm was established recently (Babu et al. 2019b). SSRs from the oil palm have been used in the analysis of genetic diversity, population structure, linkage mapping and mapping of QTL for important traits. However, due to some difficulties in genotyping SSRs, including null alleles, stuttering bands, and preferential amplification of shorter alleles (Yue and Xia 2014), more and more labs are shifting to the use of SNPs for genetic studies.
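As an illustration of how SSRs can be located in a genome sequence with simple bioinformatics tools, the following minimal sketch scans a DNA string for perfect tandem repeats of 2–6-bp motifs using a regular expression. It is not the pipeline used in any of the cited studies; the function name `find_ssrs`, the minimum of four repeats, and the example sequence are assumptions made for this example, and dedicated SSR-mining software handles imperfect and compound repeats that this sketch ignores.

```python
import re


def find_ssrs(sequence, min_motif=2, max_motif=6, min_repeats=4):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats.

    Positions are 0-based. Motifs that are themselves made of a shorter
    repeated unit (e.g. "ATAT") are skipped to avoid redundant hits.
    """
    seq = sequence.upper()
    hits = []
    for motif_len in range(min_motif, max_motif + 1):
        # A motif of fixed length followed by at least (min_repeats - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            is_redundant = any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            )
            if is_redundant:
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


# Made-up example sequence containing an (AT)6 and a (GAA)5 repeat.
example = "GGC" + "AT" * 6 + "CCGT" + "GAA" * 5 + "TTC"
for start, motif, count in find_ssrs(example):
    print(f"position {start}: ({motif}){count}")
```

Because SSR alleles differ in repeat number, the repeat count reported here is the quantity that would vary among individuals at a polymorphic locus.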

A SNP is a difference in a single nucleotide base between two DNA sequences. SNPs are the most abundant DNA polymorphism (Wang et al. 1998). SNPs can be genotyped using over 40 methods (Nielsen et al. 2011), including genotyping with microarrays (Kwong et al. 2016). At present, due to the rapid advance of NGS (next generation sequencing) technologies and the substantial reduction of sequencing cost, SNPs are also widely detected and genotyped through genotyping by sequencing (GBS) (Davey et al. 2011; Bai et al. 2017). Most SNPs are biallelic codominant markers. The first set of SNPs in the oil palm was identified in expressed sequence tags in 2007 (Riju et al. 2007). Recently, over a million SNPs have been identified for oil palm (Jin et al. 2016; Kwong et al. 2016).
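The definition of a SNP and the codominant, two-allele nature of most SNP markers can be shown with a toy example. The sketch below compares two already-aligned sequences of equal length and then codes a diploid genotype as an allele dosage (0, 1 or 2), a coding commonly used in downstream association analyses. The function names and sequences are hypothetical; real SNP calling from sequencing reads involves alignment, quality filtering and statistical genotype calling that are not shown here.

```python
def call_snps(ref, alt):
    """List (position, ref_base, alt_base) where two aligned sequences differ.

    Positions are 0-based; both sequences must have the same length.
    """
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(ref.upper(), alt.upper()))
        if a != b and a in "ACGT" and b in "ACGT"
    ]


def allele_dosage(genotype, counted_allele):
    """Count copies (0, 1 or 2) of one allele in a diploid genotype such as 'AG'."""
    if len(genotype) != 2:
        raise ValueError("expected a diploid genotype of two bases")
    return genotype.upper().count(counted_allele.upper())


print(call_snps("ACGTAC", "ACGAAC"))
# Codominance: the heterozygote (dosage 1) is distinguishable from both homozygotes.
for g in ("AA", "AG", "GG"):
    print(g, allele_dosage(g, "G"))
```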

Linkage maps

A linkage map is a map of DNA markers that are ordered and spaced on chromosomes based on linkage analysis of genotyped DNA markers in a reference population/family or populations/families (Botstein et al. 1980). To construct a linkage map, it is essential to have one or several segregating populations in which DNA markers segregate. Usually, for linkage mapping, the population size varies from 96 to 192 individuals. For constructing high-density and high-resolution maps, many individuals (> 500 individuals) are required, which could be costly.
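The link between observed recombination and map distance, and the reason larger populations give better resolution, can be sketched with standard formulas. The example below converts a recombination fraction into centimorgans with the Haldane and Kosambi mapping functions and gives the binomial standard error of a recombination estimate for different population sizes. It assumes fully informative meioses scored directly as recombinant or non-recombinant; the function names and the counts used are hypothetical, and real mapping software uses multipoint likelihood methods instead.

```python
import math


def recombination_fraction(n_recombinant, n_total):
    """Estimate r as the proportion of recombinant individuals."""
    return n_recombinant / n_total


def haldane_cm(r):
    """Haldane map distance in cM (assumes no crossover interference)."""
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r):
    """Kosambi map distance in cM (allows for some interference)."""
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def standard_error(r, n_total):
    """Binomial standard error of the recombination fraction estimate."""
    return math.sqrt(r * (1.0 - r) / n_total)


r = recombination_fraction(10, 100)  # hypothetical counts
print(f"r = {r:.2f}: Haldane {haldane_cm(r):.2f} cM, Kosambi {kosambi_cm(r):.2f} cM")
for n in (96, 192, 500):
    print(f"n = {n}: standard error of r = {standard_error(r, n):.4f}")
```

The shrinking standard error with increasing population size is one way to see why ordering closely linked markers needs many more individuals than a typical mapping family provides.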

The first linkage map of the oil palm was reported in 1997, consisting of 97 RFLP markers on 24 linkage groups (LGs) (Mayes et al. 1997). The first microsatellite-based linkage map with 16 LGs, covered with over 800 DNA markers, was published by a French group (Billotte et al. 2005). So far, over 20 linkage maps have been constructed using different types and numbers of DNA markers (Table 1). Recently, ultrahigh-density linkage maps with over 10,000 codominant DNA markers were reported (Bai et al. 2018a; Ong et al. 2019) (Table 1). These especially dense linkage maps are useful in identifying DNA markers associated with important traits. However, it is notable that most linkage maps were constructed using a few hundred DNA markers in a population with fewer than 200 palms, which limits their power to order closely linked DNA markers on a linkage group.

Table 1 Summary of linkage maps constructed in the oil palm

Whole genome sequencing

NGS technologies have advanced very rapidly and revolutionized biological studies (Edwards and Batley 2010), including the genetic improvement of the oil palm. The genomes of a Pisifera individual of E. guineensis and an individual of E. oleifera were sequenced and reported in 2013 (Singh et al. 2013b). The assembled genome of the Pisifera was ~1.5 Gb in length and contained ~34,800 genes. The genome of one elite Dura palm was also sequenced (Jin et al. 2016). The genome of the Dura tree was ~1.7 Gb long and contained ~36,100 genes, figures similar to those of the Pisifera palm. Currently, our group is sequencing 200 palms from different continents including Africa, South America and Southeast Asia. The Nuzhdin Lab in the USA, in collaboration with MPOB, has sequenced over 200 E. oleifera and E. guineensis palms (GenBank accession PRJNA434010) sampled from Central and South America (Ithnin et al. 2020).

Genome sequences of oil palms provide important resources for rapid genetic improvement of complex traits through MAS and GS, as they facilitate the development of high-density genome/haplotype maps, identification of QTLs and discovery of new genes for important traits (Low et al. 2017; Ithnin and Kushairi 2020; Ong et al. 2020). Therefore, these genomes will help the achievement of sustainability for palm oil. However, the challenge remains to convert this huge information in the genome sequences into knowledge and technologies, which can be applied in accelerating genetic improvement of the oil palm. Better reference genomes are required for downstream applications. Thus, it is essential to fill in the gaps by sequencing long reads using the third-generation sequencing technologies, and to annotate the genome using advanced bioinformatics tools (Ong et al. 2019). Therefore, there is still a lot of work to do to improve the assembly of the oil palm genomes. International collaborations should be enhanced.

QTL mapping for important traits

In the oil palm, many important traits are complex and quantitative in nature. Most of these traits are controlled by many genes with small effects, environmental factors and their interactions (Soh et al. 2018). QTL refers to chromosomal regions or gene clusters influencing the expression of quantitative traits (Geldermann 1975). QTL mapping is the procedure to find the associations between traits and DNA markers covering the whole genome of a species of importance using quantitative genetic approaches. Studies on mapping of QTL for traits have been conducted in the oil palm since 2001 (Rance et al. 2001; Nyouma et al. 2019).

For mapping QTL for oil yield related traits and fatty acid components, several studies (Rance et al. 2001; Billotte et al. 2010; Pootakham et al. 2015; Seng et al. 2016; Bai et al. 2017; Ting et al. 2018; Xia et al. 2019; Singh et al. 2009) were conducted. These studies revealed that oil yield and fatty acid components were determined by many QTL located in different chromosomes and their effects on the phenotypic variation (PV) ranged from 5 to 45%. One study (Bai et al. 2017) showed that DNA markers associated with oil to bunch (O/B) and oil to dry mesocarp (O/DM) ratios were mapped on three linkage groups (i.e., LG 1, LG 8, and LG 10). These QTL explained from 7.65 to 13.3% of the PV. The average O/B of palms with better genotypes at two QTL for O/B was ~31% while the average O/B in trees without any beneficial QTL genotypes was ~28.2%, suggesting that pyramiding of QTL with beneficial genotypes using DNA markers can accelerate genetic improvement. Using a pedigree-based approach, 18 QTL controlling traits, including fresh bunch weight and bunch number, were detected among a large genetically diversified sample from breeding program (Tisne et al. 2015). The authors found that QTL patterns were dependent on the genetic origin. Only one QTL was shared between heterotic groups. Studies on QTL mapping for fatty acid profiles revealed that fatty acids in palm oil were determined by many QTL with relatively small effects, environmental factors and their interactions (Singh et al. 2009; Montoya et al. 2013; Ting et al. 2016).
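Two quantities used above can be made concrete with a short calculation: the proportion of phenotypic variation (PV) explained by a marker, and the relative gain from carrying beneficial QTL genotypes. The sketch below computes PV explained as the R² of a simple regression of phenotype on allele dosage, and the relative difference between the two mean O/B values quoted above (~31% versus ~28.2%). The function names and the small genotype/phenotype data set are hypothetical, and single-marker regression is used here only as a simple stand-in for the interval mapping methods applied in the cited studies.

```python
def variance_explained(dosages, phenotypes):
    """R^2 of a simple linear regression of phenotype on allele dosage (0/1/2)."""
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_p = sum(phenotypes) / n
    sxy = sum((g - mean_g) * (p - mean_p) for g, p in zip(dosages, phenotypes))
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    syy = sum((p - mean_p) ** 2 for p in phenotypes)
    return (sxy * sxy) / (sxx * syy)


def relative_gain_percent(improved, baseline):
    """Relative difference between two trait means, in percent of the baseline."""
    return (improved - baseline) / baseline * 100.0


# Hypothetical marker dosages and O/B values (%) for eight palms.
dosages = [0, 0, 1, 1, 1, 2, 2, 2]
ob_values = [27.5, 29.0, 28.0, 30.5, 29.5, 29.0, 31.0, 30.5]
print(f"PV explained by this marker: {variance_explained(dosages, ob_values):.1%}")

# Mean O/B values quoted in the text for palms with and without beneficial genotypes.
print(f"Relative O/B gain: {relative_gain_percent(31.0, 28.2):.1f}%")
```

With the quoted means, the relative gain in O/B is just under 10%.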

Studies on mapping QTL for trunk height (i.e., vertical growth) were conducted (Lee et al. 2015; Pootakham et al. 2015; Ong et al. 2018; Teh et al. 2020). These studies showed that QTL for this trait were located on different LGs and their effects were also quite different among different studies, suggesting that tree height is a complex trait and is determined by many genes, environment factors and their interactions. It is also possible that different genes or pathways in palms derived from different origins may affect the height in different planting materials. Therefore, further studies on mapping QTL for this trait should use diverse oil palm genetic materials to get a comprehensive picture of genetic basis of trunk height in oil palm.

Four QTL for Ganoderma resistance were mapped (Tisné et al. 2017). Among the four QTL, two QTL were detected for the first appearance of the Ganoderma disease, and the other two were for the death of palms. This study, for the first time, provided valuable information for selecting oil palm varieties resistant to Ganoderma disease. However, whether the DNA markers in the four QTL can reliably predict the resistance against the Ganoderma disease in other breeding materials from different genetic origins remains to be evaluated.

Although many studies on QTL mapping have been performed for several important traits, including oil yield related traits, fatty acid profiles, trunk height/vertical growth and resistance against Ganoderma in the oil palm, it seems that most QTL detected (especially those with small effects on phenotypes) are family-specific and could not be confirmed in other families with different genetic backgrounds. Therefore, most markers associated with significant QTL can only be used for MAS in the families where the QTL were found, which limits their use for MAS. To find universal DNA markers that may be used in selecting preferred palms in any population, it is essential to conduct GWAS using large populations representing all materials available in the world. This would be a huge task. Thus, international collaborations are essential.

GWAS for important traits

Another method to detect DNA markers associated with traits is association mapping (Oraguzie et al. 2007). In this case, unstructured populations are used, which means many more recombination events exist than in QTL mapping families. Thus, association mapping using genome-wide sets of DNA markers offers a higher mapping resolution (Oraguzie et al. 2007) than QTL mapping. With the rapid advances in NGS (Mardis 2008), it is now fast and cost-effective to generate millions of SNPs, thus providing an increasing ability to generate large amounts of genotyping data. Therefore, GWAS is favored over QTL mapping (Heffner et al. 2009).

In the oil palm, since 2016, several GWAS (Kwong et al. 2016; Teh et al. 2016; Ithnin et al. 2017; KalyanaBabu et al. 2020; Babu et al. 2019a; Osorio-Guarín et al. 2019) for oil yield related traits have been published. These studies revealed that besides the paramount effect of significant QTL for the shell thickness on chromosome 2, effects of other QTL for oil yield traits and trunk height/vertical growth were low, suggesting these traits were determined by many genetic factors, environments and their interactions. Further GWAS should be conducted to verify the DNA markers associated with traits. With identified DNA markers, which are significantly associated with economic traits, it is now possible to conduct GS in the oil palm. However, in order to optimize the GS prediction models, it is essential to know the effect of each marker. Thus, sufficient phenotypic and genotypic variations within training populations are essential.

MAS and GS

MAS refers to a breeding procedure where DNA markers associated with traits are used in selecting elites to accelerate the genetic improvement (Xu 2010). The use of DNA markers for selection in the oil palm can greatly shorten the breeding cycle and reduce the number of breeding cycles needed before selected elite varieties can be released (Soh et al. 2018; Xu 2010). Using DNA markers, selection can be performed at the seedling/nursery stage. GS refers to the DNA marker-based selection by simultaneously using many DNA markers covering the entire genome (Heffner et al. 2009), which is more powerful than MAS. In this section, we present an overview of the status of MAS and GS in the oil palm and solutions to tackle the challenges in such approaches.

Only a few traits in the oil palm, such as shell thickness (Singh et al. 2013a; Singh et al. 2014) and external fruit color (Singh et al. 2014), are known to be determined by single genes. For single gene traits, once the causative polymorphisms in the gene determining it are identified, it is easy to use DNA markers to identify different genotypes at the seedling stage, which could lead to fast production of improved planting materials. The Shell gene, responsible for the shell thickness, was identified in 2013 (Singh et al. 2013a). Ten independent mutations in the Shell gene cause the change of the shell thickness (Ooi et al. 2016; Singh et al. 2020). Methods for detecting the mutations in the gene, including PCR-RFLP (Babu et al. 2017), bi-directional allele-specific PCR (Reyes et al. 2015) and other PCR-based methods (Ritter et al. 2016), have been optimized and used to identify the shell type. The virescens (VIR) gene, controlling fruit skin color, was identified in 2014 (Singh et al. 2014). Each of the five SNPs in VIR causes the dominant-negative virescens phenotype. Each SNP leads to the termination of the carboxy-terminal domain of VIR. The SNPs in VIR allow for selection of the trait at the seedling/nursery stage, which is three to six years before fruiting. Thus, genotyping VIR can greatly speed up the introgression of the gene into elite breeding materials.

Generally, the more DNA markers covering all significant QTL for a trait are applied in selecting candidates, the greater the chance of selecting the right elite palms (Xu 2010). However, in practice, the cost of genotyping the DNA markers is also an important issue in the oil palm industry, especially when resources are limited. It is essential to use at least two markers flanking each significant QTL. To ensure that the right elites are selected, the markers should be close enough (< 5 cM) to the QTL to avoid recombinants (Xu 2010). For example, if two DNA markers flanking the QTL are each located 5 cM from the QTL and applied in genotyping selection candidates, there is a higher probability (99.95% vs 95%) of identifying the target QTL than when only one of the markers is used. However, it is noteworthy that the application of two flanking markers for a QTL also doubles the cost of genotyping. In the oil palm, although studies on QTL mapping have identified many QTL for important traits, no detailed reports on the results of MAS for multigenic traits have been released. This is mainly because most detected QTL for multigenic traits are family-specific and can only be used in the selection of elite palms in the family where the QTL were detected. Significant success has so far been seen only in the selection of single-gene traits, including shell thickness and fruit color. For multigenic traits, the application of MAS to oil palm genetic improvement has not achieved the expected results yet, while some field verifications of MAS are still ongoing. Many more years may be needed to see the real effects of MAS.
Many factors, including the precision of genotyping and phenotyping, statistical methods, number and spacing of DNA markers, reference population size and genetic background of selection candidates, may affect the success of MAS in the oil palm. Improvement of complex traits, including yield, quality and disease resistance, is still a great challenge in MAS. The application of MAS using a few significant DNA markers flanking significant QTL still faces several challenges: (1) Many DNA markers associated with small effects are usually not detected in QTL mapping; thus, prediction using a few markers with significant effects may not reflect the real genetic makeup of the traits of interest. (2) Many QTL detected are family-specific; thus, not all markers are applicable in other populations with different genetic backgrounds. Therefore, it is critically important to validate positions and effects of QTL in different populations. Only after validation can MAS be used in the validated populations. (3) Since the distance between the markers flanking a significant QTL is often large (> 5 cM) in QTL mapping, false selection may occur because of recombination between the markers and the QTL. Using two or more markers flanking the target QTL increases the accuracy of MAS, but also increases its cost. (4) The locations and effects of QTL may be wrongly estimated due to errors in genotyping or phenotyping. To precisely estimate the locations and effects of QTL, it is essential to have a high marker density, a large population with precise phenotypes, and multiple QTL mapping methods. (5) Many farms and plantation companies conducting conventional breeding programs may not have the essential facilities and tools to conduct MAS by themselves. Therefore, it might be a good idea to collaborate with research institutions or breeding service providers to conduct MAS. (6) High start-up expenses may prohibit the large-scale use of MAS.

A method that is expected to give better results for oil palm genetic improvement is genomic selection (GS). GS refers to DNA marker-based selection by simultaneously using many DNA markers covering the entire genome (Meuwissen et al. 2001). The genotypes of these markers, usually SNPs, are used to estimate the breeding values of complex traits (known as genomic estimated breeding values, or GEBVs) for selecting a few desirable individuals from many (> 1000) selection candidates (Hayes et al. 2009). For sufficient accuracy of the GEBV estimates, the set of markers used should be genome-wide and ideally have high marker density, such that all QTL are in linkage disequilibrium with at least one marker. However, it is notable that in GS, using more markers does not necessarily improve the precision of the selection. Studies have been conducted on optimising GS in terms of marker density, among other parameters. In most agronomic plant species, a subset of significant SNPs associated with traits is selected to genotype the selection candidates and estimate their genome-wide breeding values. GS usually consists of two phases: a training phase and a breeding phase. In the training phase, phenotypes of the training population are recorded and all individuals in the training population are genotyped with genome-wide SNPs. Then the phenotypes and genotypes are statistically analyzed to detect significant associations between trait values and genotypes using statistical software such as GWASpro (Kim et al. 2019). Subsequently, in the breeding phase, GEBVs are calculated and applied in choosing desirable individuals from many selection candidates. GS is different from MAS. In GS, all markers associated with traits are used to estimate the breeding value of an individual, while in MAS, only markers with large detected effects are used in calculating the breeding value of an individual.
Therefore, in terms of the power/precision of the selection, GS is higher than MAS. On the other hand, GS needs more DNA markers and is more expensive compared to MAS. These difficulties are now being overcome due to the decrease in the cost of SNP genotyping and high-performance computing (Cros et al. 2017; Cros et al. 2015b; Kwong et al. 2017). Here, the status and the challenges of GS in oil palm breeding are summarized and discussed.
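The two phases described above can be illustrated with a minimal ridge-regression (RR-BLUP-style) sketch. It is not the pipeline of any cited study: the genotypes, phenotypes, shrinkage value and function names are all hypothetical, and a real analysis would use thousands of markers, a dedicated numerical library and a shrinkage parameter estimated from variance components.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Model = Tuple[float, List[float], List[float]]


def solve_linear_system(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def train_rr_blup(genotypes: Matrix, phenotypes: List[float], shrinkage: float) -> Model:
    """Training phase: estimate all marker effects jointly by ridge regression."""
    n, p = len(genotypes), len(genotypes[0])
    marker_means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    trait_mean = sum(phenotypes) / n
    z = [[row[j] - marker_means[j] for j in range(p)] for row in genotypes]
    y = [value - trait_mean for value in phenotypes]
    # Ridge normal equations: (Z'Z + shrinkage * I) u = Z'y
    lhs = [
        [sum(z[i][j] * z[i][k] for i in range(n)) + (shrinkage if j == k else 0.0)
         for k in range(p)]
        for j in range(p)
    ]
    rhs = [sum(z[i][j] * y[i] for i in range(n)) for j in range(p)]
    return trait_mean, marker_means, solve_linear_system(lhs, rhs)


def predict_gebv(model: Model, candidates: Matrix) -> List[float]:
    """Breeding phase: GEBV = trait mean + sum of (centered genotype x marker effect)."""
    trait_mean, marker_means, effects = model
    return [
        trait_mean + sum((row[j] - marker_means[j]) * effects[j] for j in range(len(effects)))
        for row in candidates
    ]


# Hypothetical toy data: SNP genotypes coded 0/1/2; only the first marker affects the trait.
training_genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [2, 2, 2]]
training_phenotypes = [10.0, 12.0, 14.0, 14.0]
candidate_genotypes = [[0, 0, 0], [2, 1, 1]]

model = train_rr_blup(training_genotypes, training_phenotypes, shrinkage=1.0)
gebvs = predict_gebv(model, candidate_genotypes)

if __name__ == "__main__":
    print("marker effects:", [round(u, 3) for u in model[2]])
    print("candidate GEBVs:", [round(g, 2) for g in gebvs])
```

In this toy case the second candidate, which is homozygous for the favorable allele at the first marker, receives the higher GEBV without being phenotyped, which is the essence of the breeding phase.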

In the oil palm, a simulation study showed that in a realistic, small population of 50 individuals, GS was better than marker-assisted recurrent selection (MARS) and phenotypic selection (Wong and Bernardo 2008). Previous studies (Cros 2013; Cros et al. 2015a) assessed the potential of GS in predicting general combining ability (GCA) in parental populations including Deli and Group B palms. They showed that the accuracy of GS was sufficient for preselection in Group B for some yield components when the candidates (sibs, progenies) were related to the training population. In addition, the authors simulated data for four generations and found that the accuracy of several GS approaches was sufficient for selecting non-progeny-tested individuals based on their genotypes. They concluded that GS could lead to an increase in annual genetic gain of over 50% compared with the traditional breeding approach. In 2014, a report on GS using a few hundred DNA markers was published (Cros et al. 2015b). The study evaluated the accuracy of GS in breeding populations using different GS approaches. Two parental populations (Deli and Group B) involved in reciprocal recurrent selection (RRS) were genotyped using 265 SSR markers. The results showed that the GS accuracy of the different approaches ranged from −0.41 to 0.94. The statistical methods did not affect the accuracy of GS, whereas the relationship between training and selection populations was positively associated with the accuracy of GS, suggesting the importance of the training population. In a more recent GS study (Kwong et al. 2017), a production population of 1218 oil palm individuals was genotyped with a SNP array containing 200,000 SNPs. All GS methods assessed, including RR-BLUP, Bayes Cπ (BC), Bayesian Lasso (BL), Bayes A (BA), and Bayesian Ridge Regression (BRR), achieved intermediate prediction accuracies ranging from 0.40 to 0.70.
By selecting the markers most significantly associated with the traits of interest, RR-BLUP B seemed to outperform the other methods. A comparison of the time required for two cycles of selection in the oil palm under conventional breeding and under reciprocal recurrent GS is shown in Fig. 2. One cycle of conventional breeding (i.e. RRS) in the oil palm needs at least 20 years because Dura palms are preselected before progeny tests for GCA and specific combining ability (SCA) of parents are conducted for important traits. For GS with DNA markers, it takes 24 years to complete two cycles of selection (Fig. 2). The first cycle of GS, in which it is essential to optimize the GS model while preselection for traits is not required, takes 18 years. The second cycle of GS needs only six years as the selection is based on the genotypes at marker loci (Cros et al. 2018). For GS in the oil palm, theoretically, selection based on genotypes can be carried out in any individual. In practice, however, this may not hold, as the effects of DNA markers estimated in the training population may differ from the actual effects in the selection or validation populations. To solve this problem, the training population should be big enough and the prediction models must be optimized for each generation based on the existing genotypes and phenotypes. Recently, Sime Darby, a Malaysia-listed oil palm plantation company, claimed that the CPO yield of its Tenera palms, which were selected using GS, was 9.9 metric tons/ha (Personal communication: Mohamad Helmy Othman Basha). However, the company did not release details about how the palms were selected and how many SNPs were used in the selection. It seems that the claimed CPO yield was based only on the yield record for the first year of fruit harvesting. Further records of CPO yield over at least 4 years are essential to confirm these GS results.
It should also be noted that the company only reported the selection of Tenera. It is still not known whether the company selected elite Pisifera and Dura to generate high-yielding hybrid Tenera. Nevertheless, GS is a promising approach to accelerate genetic improvement of the oil palm.
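The time saving quoted above can be checked with simple arithmetic. In the sketch below, the cycle lengths are taken from the text; the assumption that every GS cycle after the first also takes six years is ours (the text only describes two cycles), and the conventional figure is a lower bound because one RRS cycle needs "at least" 20 years.

```python
CONVENTIONAL_CYCLE_YEARS = 20  # lower bound for one RRS cycle (from the text)
GS_FIRST_CYCLE_YEARS = 18      # first GS cycle, includes model optimization (from the text)
GS_LATER_CYCLE_YEARS = 6       # second GS cycle (from the text); assumed for later cycles


def conventional_years(cycles: int) -> int:
    """Minimum years needed for a number of conventional RRS cycles."""
    return CONVENTIONAL_CYCLE_YEARS * cycles


def genomic_selection_years(cycles: int) -> int:
    """Years needed for a number of reciprocal recurrent GS cycles."""
    if cycles < 1:
        return 0
    return GS_FIRST_CYCLE_YEARS + GS_LATER_CYCLE_YEARS * (cycles - 1)


if __name__ == "__main__":
    for cycles in (1, 2, 3):
        print(
            f"{cycles} cycle(s): conventional >= {conventional_years(cycles)} years, "
            f"GS = {genomic_selection_years(cycles)} years"
        )
```

For two cycles this gives at least 40 years for conventional RRS against 24 years for GS, and the advantage of GS widens with each additional cycle if the six-year assumption holds.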

Fig. 2. The time required in RRS (reciprocal recurrent selection, left) vs. reciprocal recurrent GS (genomic selection, right) in two selection cycles of the oil palm. Green arrows: genotyping with DNA markers.

Today, due to the rapid advance of NGS, genotyping costs have been dramatically reduced, making it possible to genotype many individuals in breeding populations. GS is expected to further accelerate genetic improvement. However, to make GS accurate and economically viable for the oil palm, it is essential to tackle a few challenges. To obtain high precision in SNP genotyping, the reference genome must be further improved by filling the gaps in the draft genome and by annotating genes, including protein-coding genes and regulatory genes. To improve the accuracy of genomic selection, the training populations should be big enough (> 2000 trees) and traits must be precisely measured for at least 5 years. Reducing the genotyping cost is another critical issue in GS (Lu et al. 2020; Guo et al. 2019). Ideally, the genotyping cost should be lower than USD 10/tree. At the current speed of advancement of sequencing technologies, this price could be reached within a few years. With the fast advance of artificial intelligence (AI), including machine learning, and with the existing large data sets of phenotypes, genotypes, plantation locations and conditions in the oil palm, it is highly likely that better GS prediction models can be obtained for precisely selecting elite oil palms.

Transgenic oil palm

Researchers have developed genetic engineering techniques to modify some important genes controlling fatty acid synthesis in order to generate palms with high oil yield and better fatty acid profiles (Parveez et al. 2000; Parveez 1998; Cheah et al. 1995; Abdullah et al. 2005; Yunus and Kadir 2008; Hani et al. 2020). Although transgenic approaches in the oil palm have been studied and tried for more than thirty years with some success, the technology still faces several technical difficulties. The protocols developed so far allow for the rapid generation of transgenic oil palms for rapid genetic improvement. However, there is no report on the performance of transgenic palms. Furthermore, even if their oil yield and quality are improved, the remaining question is whether people will accept palm oil produced by transgenic palms.

Genome editing (GE) in oil palm

GE allows us to precisely manipulate DNA sequences and thus accelerate genetic improvement. Several GE approaches are currently available, such as TALENs, ZFNs, and CRISPR/Cas-9. Among these, CRISPR/Cas-9 is believed to be more specific, more effective and easier to operate than the other technologies (Yarra et al. 2020). Therefore, CRISPR/Cas-9 is widely used for knockout or knock-in of genes to characterize gene functions and improve traits (Yin et al. 2017). GE using the CRISPR/Cas-9 system is expected to speed up crop breeding by enabling precise and predictable modifications of DNA sequences directly in elite individuals, as well as simultaneous modification of multiple genes for multiple traits. Here, the status of the application of genome editing technologies in oil palm breeding is briefly summarized. Detailed protocols of genome editing can be found in the review by Wayne et al. (2019). Readers may use and modify the existing protocols for gene or genome editing in the oil palm for further improvement.

A Malaysian group tried to establish CRISPR/Cas-9 for GE to improve the oil palm's tolerance to Ganoderma in 2018 (Budiani et al. 2018). The authors tried to edit the isoflavone reductase gene and the metallothionein-like protein gene. The Crispr-IFR and Crispr-MT modules were transformed into calli. The transformed calli were propagated on DF medium (Dworkin and Foster 1958) and then sub-cultured on DF medium to induce somatic embryos. PCR confirmed that the constructs for both genes were introduced into the cells. Analysis of DNA sequences showed deletions of nucleotides in the targeted regions of both genes. After 3–4 weeks of culture on DF medium, somatic embryos developed (Budiani et al. 2018). In a more recent study, a group constructed a CRISPR/Cas-9 plasmid containing four sgRNAs to edit four genes simultaneously (Aprilyanto et al. 2019). However, it is not known whether the construct was transformed into oil palm. A review on CRISPR/Cas-9 technology in the oil palm was recently published (Yarra et al. 2020). Readers may find more detailed information about this technology in that review.
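As a simple illustration of the first step in designing sgRNAs such as those mentioned above, the following sketch lists candidate target sites in a DNA sequence. It assumes the commonly used Streptococcus pyogenes Cas9, which recognizes a 20-nt protospacer followed by an NGG PAM; the cited studies may have used different design rules or tools, the function names are hypothetical, and off-target screening is not covered.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_ngg_targets(seq: str, protospacer_len: int = 20) -> List[Tuple[str, int, str]]:
    """List (strand, start, protospacer) for every protospacer followed by an NGG PAM.

    The start position is counted from the 5' end of the strand being scanned,
    so minus-strand positions refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    hits = []
    for strand, scanned in (("+", seq), ("-", reverse_complement(seq))):
        for start in range(len(scanned) - protospacer_len - 2):
            pam = scanned[start + protospacer_len : start + protospacer_len + 3]
            if pam[1:] == "GG":  # N can be any base, followed by GG
                hits.append((strand, start, scanned[start : start + protospacer_len]))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from any oil palm gene.
    toy_sequence = "ATGCTGACCTTGAAGCTAGCTAGGCATCGATCGGTACCA"
    for strand, start, protospacer in find_ngg_targets(toy_sequence):
        print(strand, start, protospacer)
```

In practice, candidate sites found this way would still need to be filtered for specificity against the oil palm reference genome before a multi-sgRNA construct is assembled.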

Although GE is a promising technology to generate selected genotypes, it will take many years to see the final results due to the very long production cycle of the oil palm. Another issue for GE in the oil palm is deciding which genes to edit. Genes underlying economically important traits must first be identified through QTL mapping, positional cloning and GWAS.

Conclusions and future directions of breeding ideal palms

In the past 100 years, conventional breeding and improvement of field management have substantially improved oil palm yield. The global demand for plant oils is growing at 4–6% annually (Teh et al. 2017), while genetic improvement through conventional breeding contributes only around 1–2% annually. Therefore, this huge gap must be filled by either increasing the use of land or improving the productivity of existing plantations. Increasing the use of land for oil palm plantations, especially the conversion of forested land, is no longer an option. Therefore, to make the palm industry more sustainable, it is better to breed ideal palms to continuously improve oil yield without enlarging current plantation areas (Padfield et al. 2019), thus minimizing the destruction of forests for oil palm plantations. We believe that the ideal palms should have at least the following characteristics: (1) CPO yield of 8–9 tons/ha; (2) slow increment of trunk height/vertical growth (< 35 cm/year); (3) resistance to the major pests and diseases; (4) amenability to mechanized harvest; (5) efficient use and conversion of fertilizers; (6) better fatty acid profile for cooking; (7) longer usable time (e.g., > 30 years); (8) tolerance to abiotic (e.g., drought) and biotic stress, and (9) ideal fatty acid profile for human consumption. Though some palms with several of these good traits are available, a palm with all these characteristics is unfortunately not available yet. The question is whether such an ideal palm is possible and how to integrate these good traits into good palms. Advanced genetics, quantitative genetics, and genomics tools available for the oil palm will enable us to obtain novel knowledge on the genetic makeup of important traits. New knowledge and technologies (e.g., GS and GE) will increase the productivity and quality of the oil palm. The resulting sets of breeding approaches and biotechnologies are being applied in the genetic improvement of the oil palm.
With the current technologies summarized in this paper and emerging novel technologies including GE, it is now highly possible to generate such ideal palms in a relatively short time frame (e.g., 40–50 years). A combination of conventional and molecular breeding approaches is useful in accelerating genetic improvement of the oil palm and improving the sustainability of its use. Here, a roadmap to breed ideal palms is suggested (Fig. 3): (1) conducting a worldwide survey of palms bearing favorable characteristics, (2) identifying DNA markers associated with desired traits using QTL mapping or GWAS, (3) maintaining the trees with good traits by using tissue culture, (4) pyramiding good traits into a few elite trees using DNA markers, (5) further improving good traits of elite palms with genome editing, (6) cloning elite trees with many improved traits using tissue culture, and (7) establishing commercial plantations using elite palms. Certainly, alternative approaches may exist that are as effective as, or even better than, our proposed roadmap. It is hoped that oil palm scientists and breeders can work together to find innovative ways to accelerate genetic improvement. With so many novel and advanced technologies in hand and new ones emerging, we expect that the productivity of the oil palm will be substantially improved, which in turn will make palm oil a sustainable commodity.
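The gap between demand growth and genetic gain mentioned in this section compounds over time. The sketch below is a simplified illustration under our own assumptions: supply and demand start equal, demand grows at a constant annual rate, and supply grows only through genetic gain on a fixed plantation area. Neither the model nor the function name comes from the cited sources.

```python
def unmet_demand_fraction(demand_growth: float, genetic_gain: float, years: int) -> float:
    """Fraction of demand left unmet after a number of years.

    Assumes supply equals demand in year 0, constant annual growth rates,
    and no change in plantation area or field management.
    """
    supply_over_demand = ((1.0 + genetic_gain) / (1.0 + demand_growth)) ** years
    return 1.0 - supply_over_demand


if __name__ == "__main__":
    # Rates from the text: demand 4-6% per year, genetic gain 1-2% per year.
    for demand_growth, genetic_gain in ((0.04, 0.02), (0.06, 0.01)):
        for years in (10, 25):
            gap = unmet_demand_fraction(demand_growth, genetic_gain, years)
            print(
                f"demand +{demand_growth:.0%}/yr, gain +{genetic_gain:.0%}/yr, "
                f"{years} years: unmet demand = {gap:.1%}"
            )
```

Even in the most favorable combination of the quoted rates (4% demand growth against 2% genetic gain), more than one sixth of demand would be unmet after ten years under these assumptions, which illustrates why faster genetic improvement is needed if plantation area is to remain fixed.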

Fig. 3. A roadmap to breed ideal palms, which should have at least nine characteristics.