Introduction

In the past few decades, substantial increases in yield have resulted from genetic improvement in many crops including maize (Duvick 1984) and apple (Igarashi et al. 2016). However, genetic improvement is still in its infancy in many tree species due to their long generation times and the cost of screening new cultivars (Khan and Korban 2012; Kumar et al. 2013a; Isik 2014; van Nocker and Gardiner 2014; Cros et al. 2015). High yield is often the focus in crop breeding programmes, yet selection gains can be hindered because yield is a complex trait that is difficult to select for directly. Genetic gain for yield in fruit tree crops can be accelerated in a number of ways.

One method of improving yield is by mining for yield component traits. Component traits that are correlated with yield, and are more heritable and easier to measure, may be used to indirectly select for high yield (Fraser and Eaton 1983; Sparnaaij and Bos 1993; Piepho 1995). This indirect selection may increase breeding gains by reducing cycle times if the component traits are measured earlier in the process than yield.

Other methods to increase yield in crop breeding programmes employ DNA-based technologies. These include combining genome-wide association studies (GWAS) with marker-assisted selection (MAS), and using genomic selection (GS) (Lande and Thompson 1990; Varshney et al. 2005; Endresen 2010; Khan and Korban 2012; van Nocker and Gardiner 2014; Isik et al. 2015). GWAS can help identify genetic markers associated with key yield component traits, which can then be screened for in a population and elite candidates selected using MAS. GS can be used to select for the more complex trait of yield by modelling the effects of genetic markers across the genome to predict the yield of each candidate.

Luby and Shaw (2001) proposed that fruit crops have more to gain from MAS than annual crops due to their large tree size and long generation times, and the time and cost involved in maintaining the trees. However, they recognised that this may be true only if the trait in question is simply inherited, is economically important, and is conventionally very expensive to measure (Luby and Shaw 2001). Since that time, the technology of molecular markers has dramatically expanded and advanced. Genomics-based methods for improving the efficiency of breeding programmes such as GWAS and GS are now particularly pertinent for fruit trees (Wong and Bernardo 2008; Kumar et al. 2012b; Iwata et al. 2016; Yamamoto and Terakami 2016; Peace 2017). These methods have advanced from the more fundamental marker-assisted breeding and trait mapping, have higher accuracies and wider applications (Iwata et al. 2016), and have potential use in breeding for increased yield in a crop such as macadamia.

This review investigates genomic improvement in crop breeding, with specific reference to fruit and nut tree crops including macadamia. The potential use of yield component traits, GWAS, and GS in improving yield in macadamia will be explored.

Macadamia: a native Australian nut

Macadamia (Proteaceae) is a subtropical rainforest tree, native to the east coast of Australia between Mount Bauple, Queensland, and Lismore, New South Wales (Gross 1995; Hardner et al. 2009). The genus contains four species: M. integrifolia, M. tetraphylla, M. ternifolia, and M. jansenii (Peace et al. 2008; Hardner et al. 2009). Like many other rainforest species, macadamia trees arise predominantly from outcrossing, are large, and have a long juvenile period (Fig. 1; Sedgley et al. 1990; Trueman and Turnbull 1994). M. integrifolia, M. tetraphylla, and their hybrids are cultivated around the world for their edible nuts (Fig. 2; Hardner et al. 2009).

Fig. 1

Timeline of the Australian macadamia breeding programme’s first generation, showing evaluation steps indicative of traditional breeding practices

Fig. 2

Macadamia nuts—the edible kernel is enclosed in a hard, woody shell and the outer husk. Left to right: nut in husk, split husk, nut in shell, cracked shell, and kernel. Illustration by Todd Fox

Macadamias are diploid (2n = 28), highly heterozygous, with genome size estimates ranging from 652 Mb (Nock et al. 2016) to 780 Mb (Chagné 2015). A draft genome assembly of short-read Illumina sequences from cultivar ‘HAES 741’ covers 79% of the total estimated genome, at 518 Mb in length (Nock et al. 2016). Nock et al. (2016) discussed ongoing work to improve genome coverage by incorporating deeper, long-read PacBio sequence data and develop a high-density linkage map, which will be advantageous for future genomics studies. The Australian National Macadamia Germplasm Collection contains accessions across all four Macadamia species, which is also available as a genomics resource for sequencing (Hardner et al. 2004).

Domestication and cultivation

Domestication of macadamia is only relatively recent, with cultivation beginning in the late 1800s (Peace et al. 2008; Hardner et al. 2009; Hardner 2015). Two early importations of M. integrifolia nuts from Australia to Hawaii occurred in the 1880s and 1890s (Hamilton and Fukunaga 1959). It has been suggested that the first exports originated from near Mount Bauple (Hardner 2015). Planting of seedlings began around 1920 by the Hawaii Agricultural Experiment Station, whilst evaluation and selection of new cultivars commenced in the mid-1930s (Hamilton and Fukunaga 1959; Hardner 2015). This programme provided the majority of commercial cultivars currently grown around the world (Hardner 2015). In Australia, the first orchards were established near Lismore, NSW, in the 1880s and in Queensland in 1910 (Hardner et al. 2009). As such, cultivated varieties are only a few generations removed from their wild relatives. Macadamias are mainly produced in Australia, South Africa, the USA (Hawaii), and Kenya (Australian Macadamia Society 2012).

Trees begin to bear nuts after 4 to 5 years, are fully mature after 10 to 15 years, and can be commercially productive for up to 60 years (Hardner et al. 2009). In 1997, the Commonwealth Scientific and Industrial Research Organisation (CSIRO) launched an Australian macadamia breeding programme from which subsequent selections have been made for parental crossing and regional variety trials (RVTs) to evaluate elite selections (Fig. 1; Hardner et al. 2002). Large areas are required over various environments to evaluate new cultivars. RVTs in macadamias are usually maintained for 8 years from planting date, with many traits measured each year (Hardner et al. 2002). Several years of data are required in order to select high-yielding cultivars (Hardner et al. 2001). Yield and growth data up to year 8 are available for ~ 2000 trees in Australia from these trials.

High yield is the primary trait used to select new cultivars (Stephenson et al. 1986; Hardner et al. 2009; Howlett et al. 2015). Other important traits selected in the Australian breeding programme are high kernel recovery, small tree size, and high proportion of intact kernels (Topp et al. 2012). Nut-in-shell (NIS) yield refers to the weight of de-husked nuts at 1% moisture content with the shell intact (Hardner et al. 2002). Kernel recovery (KR) is the ratio of kernel to nut mass (Kester and Asay 1975; Hardner et al. 2002); high KR is desired as this indicates that the kernel is relatively large compared with the weight of the shell. However, cultivars with high KR have thin shells, which are susceptible to pests and diseases (Hardner et al. 2009). Depending on the use of the product, whole unbroken kernels may be desirable, so this is also an important trait (O'Hare et al. 2004; Hardner et al. 2009). Industry standards for these traits are as follows: 5 t per ha NIS of > 18 mm diameter, > 36% KR, 2–3 g kernels, and > 50% whole kernels (O'Hare et al. 2004); however, production can fall short of these standards.

Gain from selection

Gain from selection efforts in breeding can be predicted using the following formula:

$$ R=\frac{h^2S}{Y} $$

where R is the response per year, or genetic gain; h2 is the narrow-sense heritability, and also a function of trait measurement accuracy; S is the selection differential, the amount by which the average parents’ performance exceeds the average breeding population performance; and Y is the selection cycle length in years (Hansche 1983). In macadamia, genetic gain is impeded by a long breeding cycle with the current breeding programme requiring 8 years to evaluate NIS yield (Fig. 1). Reducing the selection cycle is thus an important aim for macadamia breeding. Selection intensity and accuracy are costly to improve due to the large plant size which adds to the cost of increased population size and replication. The following sections address these factors with reference to improvements in macadamia.
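As a rough illustration of the formula above, the following sketch computes the predicted response per year for an 8-year cycle and for a hypothetical shortened cycle. The function name and all input values (a heritability of 0.2, a selection differential of 1.0 in arbitrary trait units, and a 4-year cycle) are assumptions chosen for illustration, not estimates from the macadamia programme; only the 8-year evaluation period comes from the text.

```python
def response_per_year(h2, s, years):
    """Predicted genetic gain per year, R = h^2 * S / Y."""
    if years <= 0:
        raise ValueError("cycle length must be positive")
    return h2 * s / years


# Hypothetical inputs (assumptions, not programme estimates)
H2_NARROW = 0.2               # narrow-sense heritability
SELECTION_DIFFERENTIAL = 1.0  # in arbitrary trait units

gain_8yr = response_per_year(H2_NARROW, SELECTION_DIFFERENTIAL, 8)
gain_4yr = response_per_year(H2_NARROW, SELECTION_DIFFERENTIAL, 4)

print(f"8-year cycle: {gain_8yr:.3f} units per year")
print(f"4-year cycle: {gain_4yr:.3f} units per year")
```

With heritability and selection differential held constant, halving the cycle length doubles the predicted gain per year, which is why a shorter selection cycle is such an attractive target.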

Yield and its component traits

Yield is the highest priority in macadamia breeding; yield was consistently ranked by the industry as the most important future cultivar characteristic (O'Hare and . 2010) and was economically weighted the highest of all traits in the selection index used in Australian macadamia breeding (Hardner et al. 2006). Macadamia yield can be quantified as wet nut-in-husk, wet nut-in-shell, nut-in-shell dried to 1% kernel moisture, and expressed on a per tree size or per hectare basis (Hardner et al. 2009). However, selecting for yield in the breeding programme is difficult as it is a complex trait (variation is the result of small effects at many loci). Hardner et al. (2002) found that broad-sense heritability for annual NIS yield ranged from 0.06 to 0.18, whilst cumulative NIS ranged from 0.11 to 0.20 (Table 1).

Table 1 Heritability and correlations between various flower and fruit characteristics in macadamia and other nut crops. rg, genetic correlation; rp, phenotypic correlation; H2, broad-sense heritability; h2, narrow-sense heritability

Yield or other complex traits may be indirectly selected through correlated component traits that are more heritable (Fraser and Eaton 1983; Sparnaaij and Bos 1993; Piepho 1995). It is best to initially explore simple component traits related to yield that may be easier and/or cheaper to measure (Sparnaaij and Bos 1993). The investigation of component traits can reduce cycle times if they can be measured earlier in the tree’s life, and selection intensity can be increased if the traits are efficiently measured, allowing evaluation of a larger number of plants. This is particularly true when the trees are young, as less land and fewer resources are required. However, Fraser and Eaton (1983) noted that in broad acre and horticultural crops, it may be ineffective to rely on component traits correlated with the complex target trait as many components are often linked. Other sequential and path analyses have been proposed to overcome this difficulty (e.g. Li 1975; Thomas and Grafius 1976; Eaton and Kyte 1978; Sparnaaij and Bos 1993; Piepho 1995).

It is important to recognise the relationship between different traits and how they affect yield (Samonte et al. 1998). Understanding genetic parameters such as heritability and correlations between various traits can help select parents in breeding programmes (Falconer 1989; Bodzon 2004). In Prunus, traits affecting fruit quality such as flavour, colour, and shape are often related (de Souza et al. 1998; Cantín et al. 2010). Correlations of component traits and yield in macadamia and other nut crops are presented in Table 1.

There are many components of yield in macadamia, some of which have been evaluated and others that need further exploration (Hardner et al. 2009). This review focuses on those factors that affect flower and nut development, and hence resource utilisation for the nuts. Further research is needed to understand the different components of yield in macadamia and to identify the important related traits that can easily be measured.

Flowering and growth traits

Flowering plays a critical role in fruit production, and so it is necessary to understand the factors affecting flower development (Westwood 1993). A review of flowering and fruiting in macadamia was conducted by Trueman (2013). The flowers are initiated on inflorescences called pendant racemes, varying from 6 to 30 cm in length (Huett 2004; Fig. 3). A mature tree can produce about 2500 racemes, with 100–300 flowers (florets) on each raceme. Macadamia flowers are pollinated predominantly by native stingless bees and European honeybees (Trueman 2013; Howlett et al. 2015).

Fig. 3

Stages of flower and nut development on a raceme in macadamia. a Developing florets, with looping stage shown near base of raceme. b Anthesis. c Initial nut set, with fewer nutlets than florets. d Developing nuts, fewer than previous stage. e Nuts in husk at full size. f Nuts dehisce from husk and fall to ground. Illustration by Todd Fox

Macadamia is generally self-incompatible through mechanisms including protandry, though there is evidence of self-compatibility in some cultivars (Urata 1954; Sedgley et al. 1985; Sedgley et al. 1990). Self-incompatibility in plants can be controlled by several multi-allelic genes acting at different stages of flower development (de Nettancourt 1977; Seavey and Bawa 1986). For example, self-fertility in almond (Prunus dulcis) is controlled by a major gene, operating in a quantitative manner (Kester and Asay 1975). Sedgley et al. (1990) found that several macadamia cultivars (predominantly M. integrifolia) presented inferior pollen tube growth from self-pollen compared with outcrossed pollen, as well as lower fruit set. Macadamia tetraphylla also showed some self-compatibility, though, again, cross-pollen produced higher seed set per raceme (Pisanu et al. 2009). Self-compatibility should be investigated in various genotypes to identify if it is a heritable trait in macadamia, as this may be a target for breeding and selection to increase pollination success. Furthermore, the relationship between self-fertility and nut yield could be a focus of research in macadamia genotypes to determine if inbreeding level affects seed set.

In Australia, flower development occurs from May to October. Bud initiation begins in May, followed by bud dormancy (50–96 days), raceme and floret elongation, style elongation and looping, and anthesis (Moncur et al. 1985). Fertilisation occurs 1 week after anthesis; however, most flowers abscise in the following 2 weeks. Fruits develop and some premature fruit drop occurs; nuts are mature about 28 weeks after anthesis (Nagao and Sakai 1990). Research is required to investigate the heritability of raceme and floret characteristics, and their correlation with yield.

Several studies have investigated the relationship between yield and flowering with variable results. Since the racemes have many florets, there are many opportunities for nuts to be set; floret number does not appear to be a limiting factor. Ito (1980) stated that about 0.3% of the flowers develop into mature, saleable nuts. However, this analysis was based on estimates of the numbers of racemes and flowers and of yield, and not replicated measurements. Further, Urata (1954) argued that it was unreasonable to count the number of flowers per length of raceme due to the low percentage of flowers setting nuts and the low correlation between the two characters.

Trueman and Turnbull (1994) found that number of flowers per raceme affected initial fruit set in different pairs of cross-pollinated cultivars. Both cv. ‘H2’ and cv. ‘333’ racemes bearing 200 flowers when cross-pollinated with cv. ‘246’ had higher initial fruit set (22.3 and 40.3%, respectively), than control racemes (9.1 and 24.0%). For cv. ‘660’, racemes with 50 flowers set more fruits (21.6%) than those with 200 flowers (15.9%) when cross-pollinated with cv. ‘344’. In comparison, the number of fruits per raceme at the final nut set increased with number of flowers per raceme for cv. ‘660’. Trueman and Turnbull (1994) also found that for cv. ‘660’, nut and kernel fresh weights as well as KR were higher in cross-pollinated (with cv. ‘333’ and ‘246’) fruits than control racemes. These results demonstrate significant variation in pollination success between macadamia cultivars. Further research is required across many genotypes to determine if raceme and flower production and fruits per raceme limit yield in macadamia.

Xylem and phloem transport water, nutrients and photosynthates throughout plants (Campbell and Reece 2002), and thus the size of these vessels may influence the growth of limbs and fruit. Hardner et al. (2002) found that cumulative NIS yield up to year 10 was positively phenotypically correlated with girth of the trunk stem (0.59) in 40 cultivars (Table 1). The rachis (raceme stem) in macadamia enlarges after anthesis and is wider in inflorescences with high numbers of nuts than in inflorescences with low numbers of nuts (Urata 1954). Fruiting wood with larger diameters produced larger fruits in two out of six peach (Prunus persica) cultivars (Porter et al. 2002). The inheritance of the diameter of tree trunk, raceme stems, and fruit pedicels should be investigated, along with the relationship between yield and these characteristics.

Fruit characteristics

Nut development and abscission affect yield and profitability in macadamia (Boyton and Hardner 2002). After pollination, the pollen tube grows down to the ovary and fertilises one of the two ovules. Occasionally both ovules are fertilised, resulting in twin nuts (Sedgley 1981). The immature nut expands and oil accumulates from 80 to 165 days after anthesis (McConchie et al. 1996; Trueman et al. 2000). Nuts can drop from just after fertilisation to when they are mature (Boyton and Hardner 2002). Nuts are mature when they reach their maximum oil content, about 135 to 165 days after anthesis, or from February to March in Queensland and New South Wales, Australia (McConchie et al. 1996); commercial nut drop can continue through to September, depending on the cultivar (McConchie et al. 1997; Boyton and Hardner 2002). The husk can dehisce (split open) on the tree and the nut-in-shell falls to the ground, or the husk may fall with the shell (Hardner et al. 2009). It is economically advantageous to have cultivars in which the nut-in-husk abscises from the tree. This results in higher yield recovery due to the improved mechanical harvesting efficiency and reduced carry-over due to disease-harbouring stick-tight husks.

Saleable kernel yield in macadamia is related to several component traits. Parents and their progeny are usually selected for NIS yield and KR. The nuts consist of an edible kernel enclosed by the shell, a woody testa, and husk (an outer pericarp) (Hardner et al. 2009; Fig. 2). Nut size is also an important yield component trait in almond: larger nuts correspond with higher yields per acre (Kester and Asay 1975). Topp et al. (2012) suggested selecting for high KR after 4 or 5 years as an indirect indication of future precocity. Precocity is a desirable trait in macadamia (Hardner et al. 2009), though as found in the cultivar ‘Ikaika’ precociousness may mean lower yields at later ages compared with other cultivars (Hamilton and Ito 1984).

It may be useful to investigate the partitioning of the tree’s resources in husk, shell, and kernel. Harvest index was initially described by Donald (1962) regarding the ratio of economic yield to total biomass in grain crops. Cannell (1985) proposed that harvest index in perennial fruit trees should relate to the ratio of harvested fruit to total above-ground dry biomass. However, as only a portion of the macadamia nut is edible, an index based on the kernel rather than the nut-in-shell and husk may be more appropriate. As much as 30% of the moisture in a macadamia nut may be in the husk (Rosengarten 2004). Husk hardness, which influences the level of pest damage (Hardner et al. 2009), differs between cultivars (Campbell et al. 2005). It is not known whether the size of the husk affects yield. It would be of interest to investigate whether the energy the tree uses to produce husk comes at the expense of kernel production.

Bazzaz et al. (1987) suggested that perennial plants may not invest as much energy into reproduction as annuals as they have more opportunities to reproduce and can allocate resources to other activities such as defence. Flowering intensity has been inconsistently correlated with reserves of carbohydrates in macadamia trees (McFadyen et al. 2012). Carbohydrate resources may be depleted during flowering and fruit development, meaning that fruit set is negatively affected (Stephenson et al. 1989; Wilkie 2009).

Previous studies have reported correlations between different components of yield. Nut and kernel weight were strongly correlated in macadamia (rg = 0.79, rp = 0.68; Table 1) (Hardner et al. 2001; Peace 2005), and KR decreased significantly with increased shell thickness (rp = − 0.70) (Leverington 1962). Kernel recovery and kernel mass were moderately correlated (rg = 0.48, rp = 0.49, p < 0.005) in different cultivars (Hardner et al. 2001). Hansche et al. (1972) found that walnut crop decreased with increased nut weight (rp = − 0.20), after adjusting for year effect. In other species like cashew nut (Anacardium occidentale), the yield per tree was highly correlated with both number of nuts per panicle (rp = 0.844, p < 0.01) and number of hermaphrodite flowers per panicle (rp = 0.863, p < 0.01) (Aliyu 2006). In pecan (Carya illinoinensis), Thompson and Baker (1993) found a moderately low phenotypic correlation (rp = 0.394, p < 0.002) between KR and kernel weight, whilst Kumar et al. (2013a) found a high correlation (r = 0.569). The differences between these pecan studies may be due to alternate fruit bearing in the crop, differences in the study populations, or year of data collection (Thompson and Baker 1993; Kumar et al. 2013a).

No information is available on the link between macadamia yield and raceme length or number of nuts per cluster. However, fruit set per raceme varied in different cultivars (McConchie et al. 1997; Boyton and Hardner 2002). As previously mentioned, the heritability of NIS yield per tree in macadamia is low. Broad-sense heritability based on individual trees ranged from 0.06 to 0.18 for annual NIS, 0.11 to 0.20 for cumulative NIS, and 0.11 to 0.21 for cumulative kernel yield between 4 and 10 years after planting (Table 1) (Hardner et al. 2002). Quantification of these traits in a wider group of genotypes will be beneficial for the breeding programme.

Nut size and ratio of edible nut to shell are important breeding factors in nut tree crops. Nut weight, kernel weight, and KR in macadamia were all found to have the same broad-sense heritability of 0.63 by Hardner et al. (2001; Table 1). These characteristics were also measured in 152 pecan families (Thompson and Baker 1993). Pecans are selected for thin shells and high KR, similar to macadamia. Estimates of narrow-sense heritability for nut and kernel weight in pecan were 0.35 and 0.38, respectively. In contrast, in hazelnut (Corylus avellana), kernel weight, KR, and relative husk length (husk:nut length) were highly heritable (Yao and Mehlenbacher 2000; Table 1). Walnut nut weight was also very highly heritable, though crop heritability was extremely low (Hansche et al. 1972; Table 1). Kumar et al. (2013a) selected nuts from 34 pecan selections and three standard cultivars and found that broad-sense heritability for nut yield and KR (> 0.85; Table 1) was extremely high compared with macadamias (0.14). However, this may have been due to favourable environmental conditions during the study, rather than genetic influence in pecan (Kumar et al. 2013a).

Genetic gain for yield may be hastened by selecting for yield component traits instead of selection for yield per se. However, indirectly selecting for high yield through component traits depends on a number of factors. Firstly, success depends on the genetic variance of the component trait and its heritability, which also encompasses the accuracy of measuring the trait. Traits with high heritability will be more easily bred and selected for than traits with low heritability. Component traits should be highly correlated with yield and more easily and cheaply measured than yield (Sparnaaij and Bos 1993).

The relative efficiency of indirect selection on a trait X via direct selection for trait Y depends on the ratio of the correlated response (CR_X) to the direct response (R_X) (Falconer 1989):

$$ \frac{CR_X}{R_X}=\frac{i_Y{h}_Y{r}_A{\sigma}_{AX}}{i_X{h}_X{\sigma}_{AX}} $$

Or, if the selection intensities are the same, more simply h_Y r_A / h_X.

Hardner et al. (2001, 2002) estimated heritability for kernel mass (H2 = 0.66; therefore, h_Y = 0.81) and cumulative kernel yield to 10 years (H2 = 0.14; h_X = 0.37) with a genetic correlation (r_A) between these traits of 0.30. Thus, using the above equation, for this population, the ratio of the two responses was 0.65, indicating that indirect selection using kernel mass was only 65% as efficient as direct selection for yield. A genetic correlation of > 0.46 would be needed for indirect selection of kernel mass to be more efficient than direct selection for yield. These estimates were from a population of clonally propagated elite selections and cultivars. Genetic estimates are required from segregating progeny populations before conclusions can be drawn about the use of indirect selection in stage one breeding. More combinations of yield and component traits should be investigated to determine if correlated response to selection is promising in macadamia populations.
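The calculation above can be reproduced with a short sketch. The heritabilities and genetic correlation are the values quoted in the text; the function and variable names are hypothetical, and, as in the text, the square root of the broad-sense heritability is used in place of the square root of narrow-sense heritability, which is an approximation.

```python
import math


def relative_efficiency(h_selected, h_target, r_a, i_selected=1.0, i_target=1.0):
    """Ratio of correlated to direct response: (i_Y * h_Y * r_A) / (i_X * h_X)."""
    return (i_selected * h_selected * r_a) / (i_target * h_target)


h_y = math.sqrt(0.66)  # kernel mass, the trait selected on (Y)
h_x = math.sqrt(0.14)  # cumulative kernel yield to 10 years, the target (X)
r_a = 0.30             # genetic correlation between the two traits

efficiency = relative_efficiency(h_y, h_x, r_a)

# Indirect selection matches direct selection when h_Y * r_A / h_X = 1,
# so the break-even genetic correlation is h_X / h_Y.
break_even_r_a = h_x / h_y

print(f"relative efficiency: {efficiency:.2f}")
print(f"break-even genetic correlation: {break_even_r_a:.2f}")
```

The default selection intensities of 1.0 correspond to the simplified case in which both traits are selected with equal intensity.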

Genetic gain will also be affected by the stage at which the component trait can be measured and assessed in the tree: if it can be measured when the trees are juvenile, then costs may be reduced by elimination of inferior individuals prior to expensive field evaluations. Trees must flower before crosses can be made to produce the next generation of seedlings. Therefore, selection for component traits should be coupled with selecting for early flowering (Huett 2004).

Using genomic information to accelerate genetic gains in tree crops

Identifying target genes through genome-wide association studies followed by marker-assisted selection

Economically important traits in fruit trees such as yield and quality are likely to be controlled by several multi-allelic genes or a very large number of genes (Khan and Korban 2012; Iwata et al. 2016). Genome regions identified through linkage (family based) mapping as being associated, or in linkage disequilibrium (LD), with the target or component traits are called quantitative trait loci (QTLs) (Iwata et al. 2016). These QTLs can be used to predict individuals with high breeding values which can be used in marker-assisted selection (MAS) (Lynch and Walsh 1998; Myles et al. 2009; Hayes and Goddard 2010; Muranty et al. 2014; Iwata et al. 2016). Breeding values (BVs) are the sum of the mean additive effects of all alleles in an individual (Heffner et al. 2009).

Since QTLs can be thousands of kilobases in length, multiple genes may be closely linked with the target gene (Khan and Korban 2012). Linkage drag may occur with adverse results for the breeding programme as a result of undesirable traits positioned in proximity to desired ones (Khan and Korban 2012). As such, a more directed approach is desirable for capturing significant genes using genomic methods (Savolainen and Pyhäjärvi 2007).

Genome-wide association studies (GWAS) use the allelic state of unrelated individuals, drawn from the broader germplasm pool, to detect markers linked with target traits, rather than relying on family-based methods such as biparental controlled crosses, which can be impractical, laborious, and expensive (Myles et al. 2009; Iwata et al. 2016). Association mapping in the form of GWAS offers a finer-scale approach than QTL mapping, identifying individual markers in LD with target traits. Because the marker intervals are shorter, this can overcome the detrimental effects of linkage drag, and the analysis can also account for population structure (Rikkerink et al. 2007; Myles et al. 2009; Brachi et al. 2011; Khan and Korban 2012; Isik 2014). Incorporating population structure and kinship information reduces false associations between markers and phenotypes, a major problem in GWAS (Brachi et al. 2011; Khan and Korban 2012; Iwata et al. 2016).

In GWAS, each marker is tested individually for an association with the trait (Hayes and Goddard 2010; Khan and Korban 2012; Huang and Han 2014). This process relies on markers being in LD with the genes controlling the trait (Balding 2006), and very few markers are located within that causative locus itself (Hayes and Goddard 2010). Single nucleotide polymorphisms (SNPs) are now a commonly used genetic marker for these studies, where there is a variation in the base at a location in the genome which can be compared among individuals (Hayes and Goddard 2010; Huang and Han 2014).
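The idea of testing each marker individually can be sketched on simulated data. Everything here is an assumption for illustration: the genotypes are random 0/1/2 allele counts, a single hypothetical causal marker is planted, and the statistic is simply the squared correlation between each marker and the phenotype. The sketch deliberately omits the population structure and kinship corrections discussed above, as well as significance testing and multiple-testing control, all of which a real GWAS would need.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trees, n_markers, causal = 500, 100, 42  # hypothetical sizes and causal index

# Simulated genotypes: 0, 1, or 2 copies of the minor allele at each SNP
genotypes = rng.binomial(2, 0.3, size=(n_trees, n_markers)).astype(float)

# Simulated phenotype: one causal marker plus environmental noise
phenotype = 1.0 * genotypes[:, causal] + rng.normal(0.0, 1.0, n_trees)


def single_marker_r2(genotypes, phenotype):
    """Squared correlation of each marker, taken one at a time, with the trait."""
    g = genotypes - genotypes.mean(axis=0)
    y = phenotype - phenotype.mean()
    cov = g.T @ y
    denom = np.sqrt((g ** 2).sum(axis=0) * (y ** 2).sum())
    return (cov / denom) ** 2


r2 = single_marker_r2(genotypes, phenotype)
top_marker = int(np.argmax(r2))
print(f"strongest association at marker {top_marker}")
```

The per-marker value is the proportion of phenotypic variance that marker explains on its own, which is the quantity reported in the fruit tree studies cited in this section.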

Recent advancements have led to high-throughput and high-density genotyping at lowered costs per marker point for use in genomic analysis (Iwata et al. 2016). Next-generation sequencing technologies such as genotyping by sequencing can detect molecular markers for use in genomics studies like GWAS (He et al. 2014). Marker order along the genome is not strictly required, so association studies and some other genomics studies can be performed without a reference genome. This is particularly useful in novel species (Iwata et al. 2016).

GWAS for quantitative traits such as yield have shown that often there are many markers that influence the corresponding phenotype, and these can have a minor or major effect (Lee et al. 2008; Khan and Korban 2012). The gain from MAS is proportional to the variance of the trait captured by the markers and the significance of the association (Collard et al. 2005). As such, MAS has little value for traits that are complex (that is, affected by a large number of mutations all of small effect); MAS is more effective for monogenic or oligogenic traits (Luby and Shaw 2001; Hayes and Goddard 2010; Huang and Han 2014). Screening for target markers using GWAS and MAS can occur before the plants flower as only DNA is needed for the selection, and thus can substantially reduce the selection cycle and increase genetic gain by eliminating undesirable genotypes (van Nocker and Gardiner 2014).

‘DNA-informed breeding’, a term coined by Peace (2017), is becoming the convention driving breeding direction in Rosaceae crops in the USA. Previously, however, Ru et al. (2015) reviewed the opportunities and constraints of using MAS in Rosaceae breeding. They found that MAS was not yet widely applied in fruit trees, but that affordable and programme-specific testing of DNA for major trait loci at the seedling stage could be effective for the adoption of the technique. There are relatively few published studies employing GWAS in fruit trees. Recently, a study by Minamikawa et al. (2017) investigating fruit quality traits in 676 citrus individuals using 1841 SNPs found that correlated traits were controlled by several common SNPs. In apple, Kumar et al. (2013b) found significant associations in six fruit quality traits using 2500 SNPs across 1200 seedlings. SNP markers with the largest effect across linkage groups individually explained only 2% of the phenotypic variation for fruit firmness and 17% for red flesh, which was reasonably low, yet substantially more than that explained by pedigree-based analysis in many other traits (Kumar et al. 2013b). Kumar et al. (2013b) also found two genomic regions that were linked with two pairs of fruit quality traits, suggesting a pleiotropic effect.

Further studies have employed GWAS utilising genetic markers other than SNPs. For example, Iwata et al. (2013) and Cao et al. (2012) investigated fruit quality in Japanese pear (Pyrus pyrifolia) and peach, respectively, using simple sequence repeat (SSR) markers. Iwata et al. (2013) detected significant associations between markers and resistance to black spot disease, spur number, and harvest time, which indicated links to major QTLs, despite the small scale of the study in terms of number of markers (n = 162) and cultivars (n = 76). Using 53 SSR markers distributed across linkage groups, Cao et al. (2012) found that the significantly associated markers detected for peach fruit quality were located nearby previously known QTLs. Between 8.1 and 14.5% of the variation in red flesh pigment was explained by four SSRs. Similar to the findings of Kumar et al. (2013b) in apple, two of the pigment markers were associated with two other sets of traits: ripening time and fruit development period, and fruit weight and flowering time (Cao et al. 2012).

A review of breeding progress in tree nut crops by Mehlenbacher (2002), including efforts of trait mapping and MAS, concluded that genetic improvement is limited by small breeding programme size. Hardner et al. (2005) evaluated the potential for MAS to improve macadamia specifically, stating that SSR markers and pedigree will be useful in detecting marker-trait associations. Since then, however, genomic technology has advanced considerably; many more markers can now be screened at a lower cost. GWAS and MAS using SNP markers for important yield component traits, such as nut weight, kernel weight, and KR, appear feasible for improving macadamia if the traits are controlled by few genes of moderate to large effect. Determining the number of markers associated with these traits and the size of their effects should be the focus of future genomics studies, as this is currently unreported.

Yield prediction using genomic selection

Genomic selection (GS) uses genome-wide markers to capture the effects of loci that affect the target trait (Meuwissen et al. 2001). GS performs best when markers such as SNPs are in high LD with genes of large effect, hence capturing a large proportion of genetic variance (Goddard 1991; Druet et al. 2014; Viana et al. 2016). A two-step process is involved. First, in a reference (or training) population, where individuals have both genome-wide marker genotypes and target trait phenotypes available, the effects of all markers on the trait are estimated simultaneously, and these effects are used to establish a prediction equation. Second, the equation is used to predict genomic estimated breeding values (GEBVs) for genotyped selection candidates, likely to be seedlings or young trees. The accuracy of GS is assessed by cross validation of the predicted GEBVs against known and accurate phenotypes in a validation (or testing) population (Meuwissen et al. 2001).
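The two-step process can be sketched in a few lines of code. The example below is a minimal illustration, not the method of any study cited here: it assumes ridge regression as the prediction model (one of several options), solves it with simple one-marker-at-a-time updates, and uses simulated genotypes and phenotypes. All function names, the penalty value, and the simulated data are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def fit_marker_effects(genotypes, phenotypes, lam, n_iter=30):
    """Step 1: estimate all marker effects jointly in the training set.

    Ridge regression with penalty `lam`, solved by cycling over markers
    and updating each effect while the others are held fixed.
    """
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    cols = [[row[j] - means[j] for row in genotypes] for j in range(m)]
    mu = sum(phenotypes) / n
    resid = [y - mu for y in phenotypes]
    beta = [0.0] * m
    for _ in range(n_iter):
        for j, z in enumerate(cols):
            zz = sum(v * v for v in z)
            if zz + lam == 0:
                continue  # monomorphic marker with no penalty: leave at zero
            # Add marker j's current contribution back, then re-estimate it.
            rhs = sum(v * r for v, r in zip(z, resid)) + zz * beta[j]
            new = rhs / (zz + lam)
            step = new - beta[j]
            resid = [r - v * step for r, v in zip(resid, z)]
            beta[j] = new
    return mu, means, beta


def predict_gebv(genotypes, model):
    """Step 2: apply the prediction equation (mean plus summed marker
    effects) to candidates that have genotypes only."""
    mu, means, beta = model
    return [mu + sum((x - c) * b for x, c, b in zip(row, means, beta))
            for row in genotypes]


if __name__ == "__main__":
    rng = random.Random(1)
    n_train, n_valid, n_markers = 150, 50, 60
    # Simulated 0/1/2 genotypes; every sixth marker has a true effect.
    geno = [[rng.choice((0, 1, 2)) for _ in range(n_markers)]
            for _ in range(n_train + n_valid)]
    effects = [rng.gauss(0, 1) if j % 6 == 0 else 0.0
               for j in range(n_markers)]
    pheno = [sum(x * b for x, b in zip(row, effects)) + rng.gauss(0, 3)
             for row in geno]
    model = fit_marker_effects(geno[:n_train], pheno[:n_train], lam=20.0)
    gebv = predict_gebv(geno[n_train:], model)
    # Accuracy: correlation of predictions with validation phenotypes.
    print("validation accuracy:", round(pearson(gebv, pheno[n_train:]), 2))
```

The validation individuals play no part in estimating marker effects, which is what makes the final correlation an honest estimate of how well the equation would rank unphenotyped seedlings.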

Many simulations suggest that GS is superior to MAS and traditional phenotypic selection for complex traits (Bernardo and Yu 2007; Heffner et al. 2009; Grattapaglia and Resende 2011; Iwata et al. 2011). This is because MAS only uses markers significantly associated with a target trait, yet many yield and quality traits are often controlled by numerous minor-effect genes (Jannink et al. 2010; Kumar et al. 2012a; Iwata et al. 2016). In comparison, GS utilises all available genetic markers with no significance threshold and can therefore explain more of the genetic variability than MAS (Meuwissen et al. 2001; Viana et al. 2016). Thus, GS avoids marker effect biases and produces more highly correlated measured and predicted BVs (Meuwissen et al. 2001; Heffner et al. 2009).

GS can increase genetic gain in horticulture crops by accelerating breeding cycles (Meuwissen et al. 2001; Heffner et al. 2010; Jannink et al. 2010; Desta and Ortiz 2014). By selecting potential elite juvenile individuals, filtering candidates and only proceeding to the field with potentially high-performing trees, time, cost, and labour can be reduced. However, yield and other traits still need to be assessed over several locations before cultivars are recommended (Acquaah 2012). Selection of potential superior cultivars in the juvenile stage can also drastically reduce capital and maintenance costs (Luby and Shaw 2001; Rikkerink et al. 2007). This would be useful in macadamia where the trees do not reach full nut production until they are 8 years old (Hardner et al. 2009). Iwata et al. (2016) and Namkoong et al. (2005) recognised that a combination of traditional and marker selection strategies should be employed.

Denis and Bouvet (2013) concluded that perennial crops may have more to gain from GS than annual crops since genetic gain per unit time in perennial crops is critical for improved cultivars. There is a paucity of published studies for tree nut crops, though there has been some work conducted in citrus (Minamikawa et al. 2017), apple (Kumar et al. 2012b), oil palm (Wong and Bernardo 2008; Kwong et al. 2017), and pear (Iwata et al. 2013).
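The emphasis on genetic gain per unit time can be made concrete with the standard breeder's equation, in which expected gain per year is the product of selection intensity, selection accuracy, and additive genetic standard deviation, divided by cycle length. The sketch below is illustrative only. The 22-year cycle is the figure reported for macadamia (Topp et al. 2012); the function name and every other input value are hypothetical.

```python
def gain_per_year(intensity, accuracy, sigma_a, cycle_years):
    """Expected genetic gain per year from the breeder's equation.

    intensity   : standardised selection intensity (i)
    accuracy    : correlation between predicted and true breeding value (r)
    sigma_a     : additive genetic standard deviation of the trait
    cycle_years : length of one selection cycle in years (L)
    """
    if cycle_years <= 0:
        raise ValueError("cycle_years must be positive")
    return intensity * accuracy * sigma_a / cycle_years


if __name__ == "__main__":
    # Hypothetical comparison: a shorter cycle can outweigh lower accuracy.
    long_cycle = gain_per_year(1.4, 0.6, 1.0, 22)
    short_cycle = gain_per_year(1.4, 0.5, 1.0, 12)
    print("gain per year, 22-year cycle:", long_cycle)
    print("gain per year, 12-year cycle:", short_cycle)
```

Because cycle length sits in the denominator, a selection method with somewhat lower accuracy can still deliver more gain per year if it shortens the cycle enough. This is the usual argument for early selection in long-lived perennials.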

In their recent study of fruit quality traits in citrus, Minamikawa et al. (2017) obtained high (r > 0.7) prediction accuracies for six of the 17 traits including fruit weight. They also found that model accuracy was trait dependent for some models, but the genomic best linear unbiased prediction (GBLUP) model was the most accurate for most traits and outperformed MAS based on significant SNPs. Kumar et al. (2012b) investigated the use of GS in improving fruit quality traits in apple. They used 2500 high quality SNPs for 1120 seedlings, and prediction accuracies for fruit quality were high, at 0.70 to 0.90 (Kumar et al. 2012b).

Wong and Bernardo (2008) demonstrated that gain per unit cost and time can be increased in oil palm (Elaeis guineensis) through GS. Costs for GS ranged from USD$75,000 to $194,000 per unit gain, depending on cost per marker data point, population size, QTL number, and heritability, compared with USD$116,000 to $333,000 per unit gain for 19 years of phenotypic selection per cycle. Also in oil palm, Kwong et al. (2017) found that for 1218 individuals genotyped using a 200K array, GS model accuracy increased with trait heritability, ranging from 0.40 to 0.70. The results of these studies may be applicable to other tree species with long generation intervals and large planting areas.

Pear has a long juvenile period and needs to be evaluated over many years; Iwata et al. (2013) therefore found that it could benefit from GS. They investigated nine disease resistance and fruit set traits in 76 Japanese pear cultivars using 162 markers, mostly SSRs. Prediction of GEBVs was moderately high for flesh firmness and fruit weight (0.60 and 0.53, respectively). They found that using all markers, rather than just those with significant associations as identified using GWAS, was more accurate (Iwata et al. 2013). In comparison, predictions of BVs in citrus for fruit weight and other fruit quality traits were high (> 0.7) across 106 cultivars (Iwata 2016). A corresponding GWAS detected the influence of major QTLs in all citrus fruit traits; the use of all markers was more accurate than using only significant SNPs (Iwata 2016).

Given the large amount of phenotypic data available for macadamia, and documented parentage since the first domesticated cultivars, this species is a strong candidate for GS. Macadamia has a selection cycle of 22 years and typical planting densities of 312 trees/ha (Topp et al. 2012) and would benefit greatly from this technology. Potentially, the first stage of progeny testing (Fig. 1) could be substantially reduced by genotyping seedlings, applying GS models, and only continuing further evaluations with those individuals predicted to be high yielding.

The reference population’s size and structure, relationship between training and testing populations, choice of model, marker number and density, heritability of key traits, and LD span need to be assessed when employing GS in breeding. These have been considered in reviews conducted by Grattapaglia (2014) and Lin et al. (2014) on forestry and annual species. The accuracy of models can decline over subsequent generations, so it is necessary to recalibrate every few generations with new phenotypes and allelic frequencies (Goddard 2009; Viana et al. 2016).

The training population needs to be sufficiently large to enable accurate estimation of small effects across many loci and to capture all the genetic variation present in the breeding programme (Meuwissen et al. 2001). Ideally, the training and testing populations should be related or part of the same breeding programme for best results (Habier et al. 2007). Kumar et al. (2012b) divided 1120 apple seedlings into two groups for their GS evaluations: 90% of individuals for the training population and 10% for validation. In their simulations of GS in oil palm, Wong and Bernardo (2008) used small population sizes of 30 to 70 individuals. Macadamia has a relatively small breeding population available for genomic studies; the mean number of seedlings per family in the Australian macadamia breeding programme’s ‘B1.2’ population is 14 (n = 1961) (Topp et al. 2016). However, almost half of these trees have been removed from the field and are no longer available for genomic analysis. A larger second-generation progeny population (n ≈ 4000) is currently being phenotyped and will be available for further genomic selection evaluation (B. Topp, pers. comm.).
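A 90%/10% division of individuals, as used by Kumar et al. (2012b), corresponds to one fold of a ten-fold cross validation. The sketch below shows one way such partitions could be generated; it is not a description of how that study assigned individuals, and the function name and random seed are hypothetical. Individuals are assigned at random, which ignores family structure; splitting by family would be a stricter test when training and validation sets are closely related.

```python
import random


def k_fold_splits(n_individuals, k, seed=0):
    """Yield (training, validation) index lists for k-fold cross validation.

    Each individual appears in exactly one validation fold.
    """
    indices = list(range(n_individuals))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i
                    for idx in fold]
        yield training, validation


if __name__ == "__main__":
    # 1120 seedlings split ten ways gives 1008 training and 112 validation.
    for training, validation in k_fold_splits(1120, 10):
        print(len(training), len(validation))
```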

A number of prediction models have been developed for GS: BLUP (best linear unbiased prediction), GBLUP, ridge regression, Lasso, reproducing kernel Hilbert space, and various Bayesian regressions (Jannink et al. 2010; Heslot et al. 2012). Heslot et al. (2012) suggested using a reduced set of models for implementing GS in breeding. These include a faster version of BayesB called weighted Bayesian shrinkage regression, Bayesian Lasso, and random forest. It is possible to combine different models to improve predictions; however, Heslot et al. (2012) found that combining different models did not always improve the accuracy of predictions. Cross validation is an essential step in GS to identify the accuracy of the model or to compare different models (Crossa et al. 2010; Heslot et al. 2012). Models such as GBLUP and BayesR (Erbe et al. 2012) are sound candidates for use in macadamia, assuming, respectively, a normal distribution of SNP effects using a genomic relationship matrix among candidates (GBLUP) and allowing for a small number of moderate to large effect QTL (BayesR). Testing both strategies is advisable given the paucity of data regarding the genetic nature of macadamia yield traits.
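GBLUP replaces the pedigree relationship matrix of BLUP with a genomic relationship matrix computed from markers. The sketch below builds such a matrix using one widely used construction: dosages centred by twice the allele frequency, with the cross-products scaled by 2Σp(1 − p). The choice of this construction, the 0/1/2 coding, and the function name are assumptions made for illustration; the sources cited above do not specify them.

```python
def genomic_relationship_matrix(genotypes):
    """Genomic relationship matrix from 0/1/2 allele dosages.

    genotypes : list of rows, one per individual, one column per marker.
    Returns an n x n list of lists.
    """
    n, m = len(genotypes), len(genotypes[0])
    # Allele frequency of the counted allele at each marker.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    # Centre each dosage by its expected value, 2p.
    centred = [[row[j] - 2 * freqs[j] for j in range(m)]
               for row in genotypes]
    return [[sum(a * b for a, b in zip(zi, zk)) / scale for zk in centred]
            for zi in centred]


if __name__ == "__main__":
    # Hypothetical toy genotypes: three individuals, four markers.
    toy = [[0, 1, 2, 1],
           [1, 1, 0, 2],
           [2, 0, 1, 1]]
    for row in genomic_relationship_matrix(toy):
        print([round(v, 2) for v in row])
```

Off-diagonal values estimate the realised genomic similarity between pairs of individuals. This is why GBLUP can separate full sibs that a pedigree-based matrix would treat as equally related.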

Models are affected by effective population size, heritability of the trait, and the size of the reference population (Daetwyler et al. 2008; Goddard 2009; Hayes et al. 2009). Effective population size is calculated using marker information and population heterozygosity; genetic gains are greater in species with smaller effective populations (Goddard 2009). Thus, the genetic diversity of the crop must be determined before genomics studies can begin. GEBVs are more accurately predicted when the trait is highly heritable (Hayes et al. 2009). Therefore, it is important to understand the heritability of the target characteristic and of its component traits. The accuracy of predicting BVs increases with the size of the reference population (Hayes et al. 2009), showing the importance of the training set in GS.
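The joint influence of reference population size and heritability can be illustrated with a commonly quoted deterministic approximation associated with Daetwyler et al. (2008): expected accuracy r = sqrt(N h² / (N h² + M_e)), where N is the number of phenotyped and genotyped reference individuals, h² is the heritability, and M_e is the number of independent chromosome segments, which grows with effective population size. The sketch below evaluates that expression; the function name and all input values are hypothetical, and no value of M_e has been reported for macadamia.

```python
import math


def expected_accuracy(n_reference, heritability, n_segments):
    """Approximate expected accuracy of genomic prediction.

    n_reference  : individuals with both genotypes and phenotypes (N)
    heritability : narrow-sense heritability of the trait (h^2)
    n_segments   : independent chromosome segments (M_e), an assumed input
    """
    signal = n_reference * heritability
    return math.sqrt(signal / (signal + n_segments))


if __name__ == "__main__":
    # Hypothetical scenario with M_e fixed at 500.
    for h2 in (0.1, 0.3, 0.6):
        for n in (250, 1000, 4000):
            print(h2, n, round(expected_accuracy(n, h2, 500), 2))
```

Under this approximation a low-heritability trait needs a proportionally larger reference population to reach the same accuracy. A smaller effective population size, and hence a smaller M_e, lowers the number of individuals required.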

Accurate phenotyping is critical for GS; if the accuracy of phenotyping is poor, many more individuals will be needed in the reference population (Desta and Ortiz 2014). Accurate phenotyping requires multiple well-characterised environments, stringent selection criteria, and large training populations (Rikkerink et al. 2007; Xu and Crouch 2008; Resende Jr et al. 2012; Desta and Ortiz 2014). For many traits such as yield, data need to be collected across multiple years and sites, which is costly and time consuming (Bernardo 2008; Stephens et al. 2009; Resende Jr et al. 2012; Xu et al. 2012; Isik 2014). This has been the case in macadamia where 2000 progeny from 47 families have been evaluated for 8 years at nine locations (Topp et al. 2016).

Implementing GS in macadamia would involve growing progeny to their first leaf for DNA extraction and subsequent genotyping, hence reducing the labour and maintenance costs of growing trees to maturity. Topp et al. (2012) compared capital, maintenance, and evaluation costs, standardised to year of release of five cultivars, for four breeding strategies. They found that full traditional assessment, involving the evaluation of 1200 hybrid seedlings for 9 years followed by regional variety trial for a total cycle length of 22 years, was much more expensive ($1,545,922 net present value), with a lower ratio of gain to breeding cost ($570,000), than tandem selection, where seedlings are evaluated to age 7 only ($795,508 and $1,080,000). Their cloned seedling strategy, which evaluates 200 hybrid seedlings in a regional trial after only 2 years of initial measurements, also reduced the cycle length to 15 years, with lower breeding costs and a higher gain to cost ratio than traditional assessment ($986,075 and $680,000) (Topp et al. 2012). Rapid phenotyping would be extremely useful in large tree crops with long juvenile periods like macadamia.

Genotyping costs continue to decline, with more data points becoming available, and therefore more markers likely to be in proximity to causal genes (Heffner et al. 2009; Khan and Korban 2012; Iwata et al. 2016). The cost of genotyping analysis varies with the volume of sequencing applied per sample, with the most popular services applying between one million and five million reads and the price per sample usually varying between US$25 and US$55 (A. Killian pers. comm.). Thus, with advancing technology, the accessibility of large numbers of molecular markers, and declining costs, the opportunity to use GS in breeding is increasing (Heffner et al. 2009; Iwata et al. 2016). Future macadamia breeding efforts should compare the costs and benefits of traditional breeding with selection strategies involving GWAS and GS.

Conclusions

The complex nature of yield in macadamia and its low heritability, as well as long cycle times, currently hinder cultivar development. Genetic improvement of yield by indirect selection for its component traits may improve breeding efficiency. Characteristics such as nut, kernel, and husk weight, and raceme length and width, may be more simply and accurately measured in breeding populations, especially in the years before yield per tree estimates are stable. Yield component traits can be investigated with GWAS to determine if any major markers are associated with each trait. If so, this information could be used in MAS. GS models are suitable for predicting complex traits like yield in macadamia seedlings, as well as important related traits. It is essential to compare the genetic gains and the costs of these different breeding strategies to determine which method or combination of methods is most efficient.