Introduction

For a long time, genetic breeding programs selected the best individuals using morphological markers. These markers are influenced by the environment and have low selection gain (Toppa and Jadoski 2013). Technology has improved the use of molecular markers, allowing significant progress in the selection of superior individuals by detecting DNA polymorphism (Sousa et al. 2017; Alkimim et al. 2017). In the early 1990s, marker-assisted selection (MAS) was proposed, based on the existence of linkage disequilibrium between DNA markers and genes of interest (Lande and Thompson 1990). When comparing MAS for oligogenic traits with phenotypic selection, the selective efficiency increases and the time required to carry out selection shortens, among other benefits (Noir et al. 2003; Lopez et al. 2013; Romero et al. 2014; Alkimim et al. 2017). Also, this technique enables preventive breeding by allowing the selection of genotypes carrying genes of interest in regions where the pathogen is absent (Alkimim et al. 2017). However, the MAS technique has proven to be more efficient for monogenic or oligogenic traits with high heritability (Asins et al. 2012; Kemper and Goddard 2012). Agronomic traits, in general, are governed by several genes, compromising the efficiency of MAS. Therefore, a new selection method, known as genome-wide selection (GWS), was developed (Meuwissen et al. 2001). GWS emphasizes the simultaneous prediction of the genetic effects of hundreds or thousands of markers that densely cover the genome. Thus, all quantitative trait loci (QTL) of a quantitative trait are expected to be in linkage disequilibrium with at least some of the markers (Grattapaglia and Resende 2011; Valente et al. 2016). GWS stands out for promoting high selective accuracy and for not requiring the knowledge of the prior location (maps) of the QTL in the chromosomes (Meuwissen 2007; Jannink et al. 2010; de Almeida et al. 2016).

In the GWS approach, the genomic estimated breeding value (GEBV) can be predicted by different statistical methodologies, including GBLUP (genomic best linear unbiased prediction) (VanRaden 2008). In this method, the GEBVs are predicted using a kinship matrix estimated from molecular marker information, known as the genomic kinship matrix (G). GBLUP uses much more information on relatedness than phenotypic selection, which is based on pedigree (through the parentage matrix A). As a result, genomic heritability and the accuracy of genomic selection can sometimes be higher than the corresponding parameters from phenotypic selection. This is explained by the many more genetic relationships captured in G (the genomic relationship matrix) than in A (the relationship matrix based on genealogy). The additional information in G can, in some cases, lead to better and more precise estimates and predictions. For two populations and their hybrid population, the genetic variance and heritability are defined at the interpopulation level (Bernardo 2010; Resende 2015).

GBLUP is advantageous for its simplicity and for the shorter computational time required (Heslot et al. 2015). This method is mostly recommended for polygenic traits, which are governed by several genes of minor effect (VanRaden 2008). GBLUP is suitable for the analysis of continuous traits or outcomes. For non-normally distributed traits, such as those evaluated on a score scale, GBLUP can be combined with a generalized linear model. The results may not differ much from those obtained with the standard linear mixed model. This is in line with theory, which holds that the higher the number of score scale classes, the smaller the benefit from using the generalized linear model (Sousa et al. 2019).

SNP (single nucleotide polymorphism) molecular markers, used in GWS studies, stand out for being the most common type of polymorphism in genomes, for being amenable to automation, and for being codominant and biallelic (Resende et al. 2008; Liao and Lee 2010). Recently, the development of next-generation sequencing (NGS) platforms has facilitated the discovery of SNPs and decreased the cost per data point. With the identification of SNP markers widely distributed in the species genome, GWS has become a reality, allowing significant gains for several breeding programs (Goddard and Hayes 2007; Meuwissen 2007; Carvalho and Silva 2010; Resende et al. 2012; Fritsche-Neto et al. 2012; Van Eenennaam et al. 2014; Zhao et al. 2015; Sousa et al. 2019).

Although the existing reports are significant, the number of GWS studies in the genus Coffea, even for species of commercial importance such as Coffea canephora and Coffea arabica, is still low (Ferrão et al. 2017, 2019; Sousa et al. 2019). In a recent study with two recurrent selection populations of C. canephora, genotyping by sequencing (GBS) showed good potential for use in coffee breeding programs (Ferrão et al. 2017). C. canephora is characterized as allogamous and diploid (2n=2x=22), with gametophytic self-incompatibility (Leroy et al. 2005). The species stands out for its rusticity, high yield potential, higher soluble solids content, and genetic resistance to coffee leaf rust, caused by the fungus Hemileia vastatrix (Zambolim 2016).

This study aimed to apply the GWS principle to a C. canephora population and to evaluate its efficiency in predicting genomic genetic values and in shortening the selection cycle, using SNP markers obtained by the sequencing company RAPiD Genomics from specific probes built in coding and non-coding regions.

Material and methods

Genetic material

The population consisted of clones of the Conilon and Robusta varietal groups and intervarietal hybrids originated from crosses between these groups. The Conilon genetic material was obtained from the Instituto Capixaba de Pesquisa, Assistência Técnica e Extensão Rural (Incaper), and the Robusta material was obtained from the Centro Agronómico Tropical de Investigación y Enseñanza (CATIE). This population composes the breeding program of the Empresa de Pesquisa Agropecuária de Minas Gerais (Epamig), in partnership with the Universidade Federal de Viçosa (UFV) and the Empresa Brasileira de Pesquisa Agropecuária—Café (Embrapa Café), located in Oratórios/MG and Viçosa/MG.

The Conilon and Robusta varietal groups consisted of 51 and 32 genotypes (Table 1), respectively. Also, 82 intervarietal hybrids were obtained by artificial crosses between five genotypes of the Conilon group (male parents) and five genotypes of the Robusta group (female parents), evaluated in the interpopulational partial diallel (Table 2).

Table 1 Coffea canephora genotypes used in the genome-wide selection study
Table 2 Intervarietal hybrids and description of their crosses, used in the genome-wide selection study

For the non-crossed genotypes, information came only from the parents of Conilon and Robusta in the experiment and from their parentage with the crossed parents.

Phenotypic evaluations

The experiment was established in an incomplete block design with up to 35 replicates and single tree plots. It included hybrids and parents. Phenotypic evaluations were carried out for eight traits during three consecutive years (2014–2016). Five categorical traits and three continuous traits were evaluated. Evaluations were performed at the time of physiological maturity of the coffee fruits.

The categorical traits evaluated were as follows: vegetative vigor (Vig), field evaluation of rust incidence (Rus) and cercosporiosis incidence (Cer), fruit maturation time (Mat), and fruit size (FS). Vegetative vigor was evaluated by the general appearance of the plant, by observing leaf development, leaf color, nutritional status, and health of the coffee plants. A score scale ranging from 1 to 10 was used, where 1 was attributed to totally depleted plants and 10 was assigned to highly vigorous plants. Rust incidence and cercosporiosis incidence were evaluated by a score scale ranging from 1 to 5, where 1 was attributed to genotypes with no symptoms of the pathogen and 5 was assigned to highly susceptible genotypes. Fruit maturation time was classified as early, intermediate, and late, with scores ranging from 1 to 3, respectively. Fruit size was classified as small, medium, and large, with scores from 1 to 3, respectively.

The continuous traits evaluated were as follows: plant height (PH), diameter of the canopy projection (DC), and yield in liters per plant (Y). Plant height (cm) was determined by measuring the most developed orthotropic branch, from the ground to the last apical point of the coffee plant, using a measurement tape fixed to a wooden rod. The diameter of the canopy projection was determined in centimeters (cm), using a ruler perpendicular to the planting row. The yield per coffee plant was evaluated by harvesting all the fruits in a genotype and measuring the total volume in liters of freshly harvested coffee.

Analysis of phenotypic data

The phenotypes were corrected for the environmental effects of years and blocks using the Selegen REML/BLUP software (de Resende 2016). The model used was as follows: y=Xu+Za+Wc+Qs+Sb+e, where y is the data vector; u is the vector of year-mean effects (assumed as fixed) added to the overall mean; c is the vector of specific combining ability effects between the Conilon and Robusta parents (assumed as random and distributed as N~I\( {\sigma}_c^2 \)); a is the vector of additive genetic effects of individuals (assumed as random and distributed as N~A\( {\sigma}_a^2 \)); s is the vector of permanent effects of individuals (assumed as random and distributed as N~I\( {\sigma}_s^2 \)); b is the vector of permanent environment effects of blocks (assumed as random and distributed as N~I\( {\sigma}_b^2 \)); and e is the residual vector (assumed as random and distributed as N~I\( {\sigma}_e^2 \)). All the effects were assumed to be uncorrelated. Uppercase letters represent the incidence matrices for these effects. The corrected phenotypes were given by \( {y}^{\ast }=y-X\hat{u}-S\hat{b} \) and are called deregressed phenotypes, which enter the genomic analyses (Garrick et al. 2009; de Andrade et al. 2019).

The selective accuracy was obtained by the equation \( {r}_{yy}={\left(1-\mathrm{PEV}/{\sigma}_a^2\right)}^{1/2} \), where \( {\sigma}_a^2 \) is the additive genetic variance among the individuals under evaluation and PEV is the prediction error variance, given by PEV = \( {C}_i^{22}{\sigma}_e^2 \), where \( {C}_i^{22} \) is the ith diagonal element of the inverse of the coefficient matrix of the mixed model equations, and \( {\sigma}_e^2 \) is the residual variance.
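As an illustration only, the accuracy expression above can be sketched in Python (the original analysis used Selegen, not this code). The function name and the numeric inputs are hypothetical and are not taken from the study.

```python
import math


def selective_accuracy(pev: float, sigma_a2: float) -> float:
    """Return r_yy = sqrt(1 - PEV / sigma_a^2).

    pev      : prediction error variance of one individual
    sigma_a2 : additive genetic variance (must be positive)
    """
    if sigma_a2 <= 0:
        raise ValueError("sigma_a2 must be positive")
    ratio = pev / sigma_a2
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("PEV is expected to lie between 0 and sigma_a2")
    return math.sqrt(1.0 - ratio)


# Hypothetical values, for illustration only.
print(round(selective_accuracy(pev=0.64, sigma_a2=1.0), 2))
```

The closer PEV is to the additive variance, the less the data add to the prior information, and the accuracy approaches zero.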

According to the model y=Xu+Za+Wc+Qs+Sb+e, the individual heritability was estimated by the following: \( {h}^2={\sigma}_a^2/\left({\sigma}_a^2+{\sigma}_c^2+{\sigma}_s^2+{\sigma}_b^2+{\sigma}_e^2\right) \), where \( {\sigma}_j^2 \) is the variance component associated to the j effect.
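A minimal sketch of this ratio in Python is given below, assuming the five variance components have already been estimated by REML. The function name and the example values are hypothetical.

```python
def individual_heritability(sigma_a2, sigma_c2, sigma_s2, sigma_b2, sigma_e2):
    """h^2 = sigma_a^2 / (sigma_a^2 + sigma_c^2 + sigma_s^2 + sigma_b^2 + sigma_e^2)."""
    components = (sigma_a2, sigma_c2, sigma_s2, sigma_b2, sigma_e2)
    if any(v < 0 for v in components):
        raise ValueError("variance components cannot be negative")
    total = sum(components)
    if total == 0:
        raise ValueError("total phenotypic variance is zero")
    return sigma_a2 / total


# Hypothetical variance components, for illustration only.
print(individual_heritability(2.0, 1.0, 1.0, 1.0, 5.0))
```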

Genomic DNA extraction, identification, and quality analysis of SNP markers

Young and fully expanded leaves of the 165 coffee trees under study were collected, and the genomic DNA was extracted using the methodology described by Diniz et al. (2005). The DNA concentration was verified in a NanoDrop 2000, and its quality was evaluated in 1% agarose gel. The samples were standardized for DNA concentration and sent to RAPiD Genomics, located in Florida, USA, for the construction of probes, sequencing, and identification of SNP molecular markers.

To identify SNP markers and coffee genotyping, 10,000 probes were selected from 40,000 polymorphic probes (Resende et al. 2016), and 18,111 SNP markers were identified. The probes were constructed from reference sequences. One of the databases was the Brazilian Genome Coffee Project, which contains over 200,000 ESTs (expressed sequence tags), corresponding to about 33,000 transcribed genes, known as Unigenes (Vieira et al. 2006). Another one was the reference genome of the C. canephora species, containing a total of 25,574 genes (Denoeud et al. 2014). Using these reference sequences, specific probes were obtained so that the whole genome was covered, considering both coding and non-coding regions. With these probes, the coffee genotypes were sequenced using the Illumina platform, and the SNP markers were identified using the methodology developed by the company RAPiD Genomics (Resende et al. 2016), developed for humans (Gnirke et al. 2009), and adapted to plants (Neves et al. 2013, 2014). This technology uses a method of genotyping-by-sequencing of specific regions of the genome. Details of the construction of the probes and identification of the SNP markers can be obtained in the study carried out by Alkimim et al. (2018). The SNPs set was subject to analysis of quality implemented in the Rbio software (Bhering 2017). Quality control of SNPs was carried out by the MAF (minor allele frequency—higher than or equal to 5%) and/or call rate (CR—higher than or equal to 90%). The critical level for the MAF parameter was obtained by the equation \( \mathrm{MAF}=\frac{1}{\sqrt{2N}} \), where N refers to the total number of genotypes evaluated (de Resende et al. 2017).
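The quality filter described above can be sketched as follows. This is not the Rbio implementation: the function names are hypothetical, genotypes are assumed to be coded as 0, 1, or 2 copies of one allele with None for missing calls, and a marker is assumed to be kept only when it satisfies both the MAF and the call rate thresholds.

```python
import math


def critical_maf(n_genotypes: int) -> float:
    """Critical MAF level from the text: 1 / sqrt(2N)."""
    return 1.0 / math.sqrt(2.0 * n_genotypes)


def snp_passes_qc(calls, maf_min=0.05, call_rate_min=0.90):
    """Check one SNP against MAF and call rate thresholds.

    calls: allele counts (0, 1, 2) per genotype, None for a missing call.
    Assumption: both criteria must hold for the SNP to be kept.
    """
    observed = [c for c in calls if c is not None]
    if not observed:
        return False
    call_rate = len(observed) / len(calls)
    allele_freq = sum(observed) / (2.0 * len(observed))
    maf = min(allele_freq, 1.0 - allele_freq)
    return maf >= maf_min and call_rate >= call_rate_min


# With the 165 genotypes of this study the critical level is close to 0.055.
print(round(critical_maf(165), 4))
```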

Prediction using the GBLUP model

Analyses were carried out using the GBLUP method via RKHS (Reproducing Kernel Hilbert Spaces) (Gianola 2006), with a Bayesian algorithm, in the R environment, using the BGLR package (Resende 2008; Perez and De Los Campos 2014). RKHS accounts for the genetic effects using the Gaussian kernel matrix (K), given by \( K=\exp \left(- hD/\mathrm{median}(D)\right) \), where h is the reduction coefficient applied to the K values (here set to 1) and D is the matrix of Euclidean distances computed from the coded marker matrix. A total of 100,000 Markov Chain Monte Carlo (MCMC) iterations were used, with a burn-in of the first 2000 MCMC iterations and a sampling interval (thinning) of 10.
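To make the kernel definition concrete, a small standard-library Python sketch is shown below. It is not the BGLR code used in the study. The function names and the toy marker matrix are hypothetical, and the median is assumed to be taken over the off-diagonal distances, a detail the text does not specify.

```python
import math
from statistics import median


def euclidean_distance_matrix(markers):
    """Pairwise Euclidean distances between rows (individuals) of a coded marker matrix."""
    n = len(markers)
    return [[math.dist(markers[i], markers[j]) for j in range(n)] for i in range(n)]


def gaussian_kernel(markers, h=1.0):
    """K = exp(-h * D / median(D)), with D the Euclidean distance matrix."""
    d = euclidean_distance_matrix(markers)
    n = len(d)
    # Assumption: the median is computed over off-diagonal entries only.
    scale = median(d[i][j] for i in range(n) for j in range(i + 1, n))
    if scale == 0:
        raise ValueError("all individuals have identical marker profiles")
    return [[math.exp(-h * d[i][j] / scale) for j in range(n)] for i in range(n)]


# Toy example: three individuals, four SNPs coded 0/1/2 (hypothetical data).
toy_markers = [[0, 1, 2, 0], [1, 1, 2, 0], [2, 0, 0, 1]]
for row in gaussian_kernel(toy_markers):
    print([round(v, 3) for v in row])
```

The diagonal of K is always 1, and the kernel value decays toward 0 as two individuals become more distant in marker space.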

The general mixed linear model (de Resende 2007, 2015; VanRaden 2008) was adjusted to estimate the additive genetic effects of the individuals: y* = Xm + Zg + e, where y* is the vector of corrected phenotypic observations, m is the vector of fixed effects (general mean), g is the vector of random effects of the additive genomic effects of the individuals (assumed distributed as N~G\( {\sigma}_g^2 \)), and e refers to the vector of random residuals. Uppercase letters represent the incidence matrices for these effects. The genomic mixed model equations for the prediction of g using the GBLUP method are given by the following:

$$ \left[\begin{array}{cc}X^{\prime }X& X^{\prime }Z\\ {}Z^{\prime }X& {Z}^{\prime }Z+{G}^{-1}\frac{\sigma_e^2}{\sigma_g^2}\end{array}\right]\left[\begin{array}{c}\hat{m}\\ {}\hat{g}\end{array}\right]=\left[\begin{array}{c}X^{\prime }{y}^{\ast}\\ {}Z^{\prime }{y}^{\ast}\end{array}\right] $$

The genomic relationship matrix G is derived from an incidence matrix M, which contains the values 0, 1, and 2 for the number of alleles of the marker (or the so-called QTL) in a diploid individual.

The component Mij refers to element i of row j of the matrix M, that is, the allele count at marker i for individual j. G is a function of MM′ (VanRaden 2008). The genomic heritability was computed as \( {h}_a^2=\frac{\sigma_g^2}{\left({\sigma}_g^2+{\sigma}_e^2\right)} \), where \( {\sigma}_g^2 \) is the additive genomic variance and \( {\sigma}_e^2 \) is the residual variance.
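The text states only that G is a function of MM′. As a hedged illustration, the sketch below uses one common choice, the first method of VanRaden (2008), in which M is centered by twice the allele frequencies and scaled by 2Σp(1−p). Whether the study used exactly this scaling is an assumption. The function names and the toy data are hypothetical.

```python
def vanraden_g(m):
    """Genomic relationship matrix from a 0/1/2 marker matrix.

    m: list of rows, one row per individual, one column per marker.
    Assumption: VanRaden (2008) method 1, G = WW' / (2 * sum(p * (1 - p))),
    with W = M - 2p and p the observed allele frequencies.
    """
    n_ind, n_mk = len(m), len(m[0])
    p = [sum(row[k] for row in m) / (2.0 * n_ind) for k in range(n_mk)]
    denom = 2.0 * sum(pk * (1.0 - pk) for pk in p)
    if denom == 0:
        raise ValueError("all markers are monomorphic")
    w = [[row[k] - 2.0 * p[k] for k in range(n_mk)] for row in m]
    return [
        [sum(a * b for a, b in zip(w[i], w[j])) / denom for j in range(n_ind)]
        for i in range(n_ind)
    ]


def genomic_heritability(sigma_g2, sigma_e2):
    """h_a^2 = sigma_g^2 / (sigma_g^2 + sigma_e^2)."""
    return sigma_g2 / (sigma_g2 + sigma_e2)


# Toy example: three individuals, three SNPs (hypothetical data).
m_example = [[0, 1, 2], [1, 1, 0], [2, 0, 1]]
g_example = vanraden_g(m_example)
for row in g_example:
    print([round(v, 3) for v in row])
```

Because the centering uses the allele frequencies of the same sample, each row of this G sums to zero, which is a quick sanity check on the computation.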

Cross-validation

K-fold cross-validation was used, with k=11 folds. The 165 genotypes were randomly divided into 11 groups of 15 genotypes each. In each round of the analysis, 150 genotypes were used as the training population, and the remaining group of 15 genotypes was used as the validation population. This procedure was repeated 11 times (k=11) so that every group of excluded genotypes was used once for validation.
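The partition described above can be sketched as follows. The function name and the random seed are hypothetical, and the model fitting step is left as a comment because it depends on the software used.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Randomly split the indices 0..n-1 into k folds of (near) equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[fold::k] for fold in range(k)]


folds = k_fold_indices(n=165, k=11)
for validation in folds:
    held_out = set(validation)
    training = [i for i in range(165) if i not in held_out]
    # Fit the prediction model on `training` (150 genotypes) and
    # predict the genomic values of `validation` (15 genotypes) here.
    assert len(training) == 150 and len(validation) == 15
```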

Predictive capacity, prediction bias, and accuracy of GWS

The predictive capacity and the prediction bias are practical measures of the capacity of a method in predicting with accuracy and not with bias. The predictive capacity (rgy) is determined by the correlation between the predicted genomic values and the observed phenotypic values, which are equivalent to the GWS predictive capacity to estimate the phenotypes. The prediction bias (b) is determined by the coefficient of regression of the predicted genomic values on the phenotypic values (de Resende et al. 2012; Pértile et al. 2016). The accuracy was determined by the estimator rgg=\( {r}_{\mathrm{gy}}/\sqrt{h^2} \), where rgy is the prediction ability of GWS, and h2 is the individual heritability (Borém and Fritsche-Neto 2013).
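These three quantities can be sketched in a few lines of Python. The function names and the small data vectors are hypothetical. The regression slope is written generically, and the call below follows the wording of the text (predicted values regressed on phenotypes); swapping the two arguments gives the alternative convention of regressing phenotypes on predictions.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def regression_slope(dependent, independent):
    """Slope of the simple linear regression of `dependent` on `independent`."""
    n = len(dependent)
    md, mi = sum(dependent) / n, sum(independent) / n
    cov = sum((d - md) * (i - mi) for d, i in zip(dependent, independent))
    var = sum((i - mi) ** 2 for i in independent)
    return cov / var


def gws_accuracy(r_gy, h2):
    """r_gg = r_gy / sqrt(h^2)."""
    return r_gy / math.sqrt(h2)


# Hypothetical validation data: corrected phenotypes and predicted genomic values.
phenotypes = [1.2, 0.4, -0.3, 0.9, -1.1, 0.1]
predicted = [0.8, 0.1, 0.0, 0.5, -0.7, -0.2]
r_gy = pearson(predicted, phenotypes)
b = regression_slope(dependent=predicted, independent=phenotypes)
print(round(r_gy, 3), round(b, 3))

# Values reported for plant height in this study: r_gy = 0.41, heritability = 0.36.
print(round(gws_accuracy(0.41, 0.36), 2))
```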

Estimate of the number of QTL (n QTL) and number of individuals (Ni) to obtain desired accuracy

The estimate of the number of QTLs that control each trait was calculated by the expression \( {n}_{\mathrm{QTL}}=\frac{\left(1-{r}_{\mathrm{gg}}^2\right)N{h}^2}{r_{\mathrm{gg}}^2} \), where rgg is equivalent to the GWS accuracy, N refers to the number of individuals in the population, and h2 is the individual heritability (de Resende et al. 2014).

The estimate of the number of individuals (Ni) that should be evaluated in order to obtain a desired accuracy was calculated by the expression \( \mathrm{Ni}=\frac{r_{\mathrm{gg}}^2{n}_{\mathrm{QTL}}}{\left(1-{r}_{\mathrm{gg}}^2\right){h}^2} \), where rgg is here the desired accuracy of GWS, nQTL is the number of QTLs that control each trait, and h2 is the individual heritability (de Resende et al. 2014).
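The two expressions are inverses of each other, which the short Python sketch below makes explicit. The function names are hypothetical. The example uses the values reported in this study for plant height (accuracy of 0.68, genomic heritability of 0.36, N = 165), on the assumption that the genomic heritability is the h2 entering the calculation, as stated in the Results.

```python
def n_qtl(r_gg, n_individuals, h2):
    """n_QTL = (1 - r_gg^2) * N * h^2 / r_gg^2."""
    r2 = r_gg ** 2
    return (1.0 - r2) * n_individuals * h2 / r2


def individuals_needed(r_target, nqtl, h2):
    """Ni = r_target^2 * n_QTL / ((1 - r_target^2) * h^2)."""
    r2 = r_target ** 2
    return r2 * nqtl / ((1.0 - r2) * h2)


# Plant height: estimated number of QTLs from the achieved accuracy.
nqtl_ph = n_qtl(r_gg=0.68, n_individuals=165, h2=0.36)
print(round(nqtl_ph))

# Individuals required for each desired accuracy, with n_QTL rounded to an integer.
for target in (0.50, 0.60, 0.70, 0.80, 0.90):
    print(target, round(individuals_needed(target, round(nqtl_ph), 0.36)))
```

Plugging the estimated number of QTLs back into the second expression at the achieved accuracy returns the original population size, so Ni grows only because the desired accuracy is set higher than the one obtained.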

Efficiency of GWS

The selective efficiency of GWS compared with the selection based only on 6-year phenotypes was calculated using the expression \( \mathrm{Ef}=\frac{r_{\mathrm{gy}}{L}_{\mathrm{f}}}{r_{\mathrm{yy}}{L}_{\mathrm{GWS}}} \), where rgy is the predictive capacity of GWS, ryy is the accuracy of the selection based on phenotypes, Lf is the mean time required for the selection cycle based on phenotypes, and LGWS is the mean time required for the selection cycle based on GWS (de Resende et al. 2012).
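As a worked illustration, the efficiency ratio can be evaluated with the values reported in this study for plant height (rgy = 0.41, ryy = 0.67) and rust incidence (rgy = 0.48, ryy = 0.39), with the cycle shortened from 6 to 3 years. The function name is hypothetical.

```python
def gws_efficiency(r_gy, r_yy, cycle_phenotypic, cycle_gws):
    """Ef = (r_gy * L_f) / (r_yy * L_GWS)."""
    return (r_gy * cycle_phenotypic) / (r_yy * cycle_gws)


for trait, r_gy, r_yy in (("PH", 0.41, 0.67), ("Rus", 0.48, 0.39)):
    ef = gws_efficiency(r_gy, r_yy, cycle_phenotypic=6, cycle_gws=3)
    gain_percent = (ef - 1.0) * 100.0
    print(trait, round(ef, 2), round(gain_percent))
```

The resulting gains of about 22% for plant height and 146% for rust incidence bracket the range reported in the Results for a cycle reduced from 6 to 3 years.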

Results

Analysis of phenotypic data

Phenotype data were corrected for environmental effects of years and blocks. The values of selective accuracy (ryy) were estimated from the phenotypic evaluations (Table 3).

Table 3 Genome-wide selection (GWS) and estimates of selective accuracy based on phenotypic data obtained by mixed model analysis (REML/BLUP) for eight morphoagronomic traits in a breeding population of Coffea canephora

No satisfactory predictive capacity was verified for the traits fruit maturation time, fruit size, and yield per plant. Therefore, the accuracy values of these traits were not estimated. In addition, for the trait cercosporiosis incidence, selective accuracy was not estimated since the value of broad-sense heritability was 0, based on the phenotypic data.

In general, the evaluated traits showed ryy of high magnitude. Values ranged from 39% for rust incidence to 67% for the plant height.

Analysis of quality of SNP markers

Sequencing the 165 genotypes with 10,000 probes (previously selected and distributed throughout the genome) identified 18,111 SNPs. After the quality analyses, carried out in the Rbio software (Bhering 2017), 14,429 SNP markers were retained, a reduction of 20.33% relative to the initial set (Fig. 1). The number of SNPs per chromosome, after the quality analyses, ranged from 4 to 2163. The highest numbers of SNPs were observed on chromosomes 0 and 2 (Fig. 1). A file with the 14,429 SNPs used in the genetic analyses and their respective positions in the genome is available (Online Resource 1).

Fig. 1

Number of SNP markers per chromosome in the original SNP set and in the SNP set after filtering. SNP markers are distributed throughout the UNIGENES from the EST sequences of Coffea arabica and throughout the 11 chromosomes and chromosome 0 of Coffea canephora

Genomic heritability, predictive capacity, prediction bias, and accuracy of GWS

Estimates of genomic heritability values, predictive capacity of GWS, prediction bias, and accuracy based on the phenotype data are shown in Table 3.

The estimated genomic heritability values (\( {h}_a^2 \)) ranged from 0.15 for the trait yield per plant (Y) to 0.53 for the trait diameter of the canopy projection (DC). Despite the considerable \( {h}_a^2 \) values obtained for fruit maturation time (0.21), fruit size (0.21), and yield per plant (0.15), their predictive capacity was low.

Regarding predictive capacity of GWS (rgy), the traits Vig (0.44), Rus (0.48), Cer (0.54), PH (0.41), and DC (0.58) stood out for their high estimate. This confirms that, in general, the rgy values were higher for traits that had the highest \( {h}_a^2 \) values.

The prediction bias (b) was close to 1.0 for the traits vegetative vigor, rust incidence, cercosporiosis incidence, plant height, diameter of the canopy projection, fruit maturation time, and fruit size. For the trait yield per plant, the prediction bias was not close to 1.0. In addition, this trait had the lowest estimate of genomic heritability.

The predictive capacity for the traits fruit maturation time, fruit size, and yield per plant was not satisfactory, and therefore, their GWS accuracy (rgg) was not estimated.

The estimates of the accuracy values were obtained for the other traits, ranging from 67% (Vig) to 82% (Rus). The rgg values were moderate (68%) to high (79%), even for plant height and rust incidence, which had low \( {h}_a^2 \) values (0.36 and 0.37, respectively).

Estimate of the number of QTL (n QTL) and number of individuals (Ni) to obtain desired accuracy

The estimated number of QTLs controlling each trait ranged from 35 (Cer) to 87 (Vig). In addition, the lowest values of accuracy of GWS, 67% (Vig) and 68% (PH), were obtained for the traits that had the highest number of QTLs (Table 4).

Table 4 Number of QTLs that control the trait (nQTL) and number of individuals (Ni) required to achieve desired accuracy of GWS (rggd) in a breeding population of Coffea canephora for Vig, Rus, Cer, PH, and DC traits

Table 4 shows the estimated number of individuals (Ni) that should be evaluated to achieve desired accuracy (rggd). The values of desired accuracy used were of 0.50, 0.60, 0.70, 0.80, and 0.90 to estimate the number of individuals. This calculation considered estimates of the genomic heritability values (\( {h}_a^2 \)), shown in Table 3, and the estimated number of QTLs (nQTL) controlling each trait, shown in Table 4. To obtain desired accuracy of 70%, which is considered of high magnitude (de Resende and Duarte 2007), 194 individuals need to be evaluated for the trait vegetative vigor, 96 for coffee rust, 78 for cercosporiosis incidence, 184 for plant height, and 89 for diameter of the canopy projection. For all traits evaluated, the higher the accuracy desired, the larger was the number of individuals to be analyzed.

Efficiency of GWS

Figure 2 shows the efficiency of the GWS with the decrease of the selective cycle in relation to the selection based only on 6 years of phenotypic data, for all traits that had good predictive capacity, except for cercosporiosis incidence (Cer). Thus, the GWS efficiency was estimated for the traits vegetative vigor, rust incidence, plant height, and diameter of the canopy projection.

Fig. 2

Efficiency of GWS in relation to selection based only on 6 years of phenotypic data, in a breeding population of Coffea canephora, for the variables vegetative vigor (Vig), rust incidence (Rus), plant height (PH), and diameter of the canopy projection (DC)

Figure 2 shows an increase in selective efficiency by using GWS, even for vegetative vigor and plant height, which had high estimates of accuracy from selection based on phenotypic data (60 and 67%, respectively) (Table 3). Even the traits with low \( {h}_a^2 \) values, plant height (0.36) and rust incidence (0.37), showed efficiency gains with GWS. With the decrease of the selective cycle from 6 to 3 years, the efficiency of GWS was higher (by 22 to 146%) for all traits.

Discussion

Analysis of phenotypic data

Selective accuracy (ryy) was estimated by the REML/BLUP method (de Resende 2016). Selective accuracy reflects the quality of the information and procedures used to predict the genetic values of the individuals (Sousa et al. 2019).

In general, the evaluated traits had high magnitude ryy, ranging from 39% (Rus) to 67% (PH). The higher the value of the selective accuracy, the higher is the confidence in the evaluation and in the predicted genetic value of an individual (Sousa et al. 2019).

Quality analysis of the SNP markers

After the quality analyses, 14,429 SNP molecular markers were retained. Quality evaluations allow the identification of markers that meet ideal quality criteria (Sant’Ana et al. 2018). In addition, these evaluations are advantageous because they remove poor quality markers prior to the statistical analyses, consequently decreasing the occurrence of false positives (type I error) and false negatives (type II error) (Anderson et al. 2010). High marker density is essential for capturing genes of both small and large effect and, consequently, increasing the probability of explaining most of the genetic variation of the trait under study (Resende et al. 2008; Resende et al. 2016). Valente et al. (2016) found that higher marker densities are required to obtain prediction accuracy of high magnitude (>70%, according to de Resende and Duarte 2007).

A greater number of SNPs was identified on chromosomes 0 and 2. However, chromosome 0 is not a true chromosome, but a set of unsorted sequence scaffolds of C. canephora. The high number of SNPs identified on chromosome 2 may be due to the length of this chromosome in the genome of C. canephora (Denoeud et al. 2014).

Genomic heritability, predictive capacity, prediction bias, and GWS accuracy

The estimated values of genomic heritability (from 0.15 for yield per plant to 0.53 for diameter of the canopy projection) indicate the heritable fraction of the variation in each trait. Despite the considerable \( {h}_a^2 \) values obtained for fruit maturation time (0.21), fruit size (0.21), and yield per plant (0.15), given the genetic complexity of these traits, their predictive capacity was low. Estimates of predictive capacity are expected to be lower for traits with low heritability (Legarra et al. 2008). The fact that these three traits had the lowest \( {h}_a^2 \) values justifies their low predictive capacity values.

Good predictive capacity (rgy) was recorded for the traits Vig (0.44), Rus (0.48), Cer (0.54), PH (0.41), and DC (0.58), indicating the capacity to anticipate phenotypes for these traits. None of these values exceeds 58%. Although 58% is not of high magnitude (above 70%), genetic gain from genomic selection is still possible, and it can be higher per unit of time than that from phenotypic selection. These data show that, in general, the rgy values were higher for the traits that had the highest \( {h}_a^2 \) values. A GWS study with cashew tree (Anacardium occidentale) also revealed a response of the predictive capacity as a function of the heritability (Cavalcanti et al. 2012).

The prediction bias (b) was close to 1.0 for vegetative vigor, rust incidence, cercosporiosis incidence, plant height, diameter of the canopy projection, fruit maturation time, and fruit size. A prediction bias close to 1.0 indicates that the prediction was unbiased and, therefore, is effective in predicting the real magnitudes of the differences between the individuals evaluated (Resende et al. 2012). The yield per plant trait had a biased prediction. This trait also had the lowest estimate of genomic heritability, which may justify the observed bias. In addition, traits governed by larger numbers of genes require populations with larger sample sizes.

The accuracy can be classified as very high (>90%), high (70–90%), moderate (50–70%), and low (<50%) (de Resende and Duarte 2007; Rabier et al. 2016). The traits fruit maturation time, fruit size, and yield per plant showed unsatisfactory predictive capacity. Therefore, their accuracy values were not estimated. The estimates of the accuracy values were obtained for the other traits, ranging from 67% (Vig) to 82% (Rus). The rgg values were moderate (68%) to high (79%), even for plant height and rust incidence, which had low \( {h}_a^2 \) values (0.36 and 0.37, respectively). These results confirm the efficiency of GWS in the selection of traits with low heritability and agree with other studies (Legarra et al. 2008; Zhang et al. 2010).

Estimate of the number of QTLs (n QTL) and number of individuals (Ni) to obtain desired accuracy

The estimated number of QTLs controlling each trait ranged from 35 (Cer) to 87 (Vig). These results show the quantitative nature of the traits evaluated in this study. In addition, the traits with the highest number of QTLs were those with the lowest values of GWS accuracy, 67% (Vig) and 68% (PH). A study carried out with oil palm (Elaeis guineensis Jacq.) revealed that the accuracy of GWS is inversely related to the number of QTLs that control the traits (Wong and Bernardo 2008). This is expected because traits governed by a larger number of QTLs are more complex. Also, in polygenic traits, each gene generally makes only a small contribution to the expression of the trait. However, even under low heritability and a higher number of QTLs, the results of our GWS work were promising. This outcome was probably due to the set of SNPs selected by the quality criteria and to the fact that the SNPs are widely distributed along the genome of this species. These results lead us to highlight the importance of implementing the GWS technique in the Brazilian coffee breeding program. It can be incorporated gradually, according to the pace and conditions of each program, as an auxiliary tool in the practical conduct of breeding programs, in order to obtain quick and accurate gains.

The estimate of the number of individuals required to achieve desired accuracy was obtained considering values of desired accuracy of 0.50, 0.60, 0.70, 0.80, and 0.90. To obtain desired accuracy (rggd) of 70% (high magnitude) (de Resende and Duarte 2007), 194 individuals would have to be evaluated for vegetative vigor, 96 for coffee rust incidence, 78 for cercosporiosis incidence, 184 for plant height, and 89 for diameter of the canopy projection. Thus, this study evaluated more individuals than necessary to achieve accuracy of 70% for most of the traits (Rus, Cer, and DC). In addition, this study revealed that the higher the desired accuracy, the larger is the number of individuals to be analyzed.

Efficiency of GWS

The y-axis values shown in Fig. 2 represent the ratio of gain per unit of time between genomic selection and phenotypic selection. Values higher than 1 indicate that genomic selection will provide superior gain. For example, a value of 1.25 corresponds to a gain 25% higher. Genomic prediction provides additional information on the genetic value of the individual and, as such, can allow earlier selection with precision, enabling a reduction of the number of harvests to fewer than four (it is well known that coffee trials require, on average, four harvests for efficient selection).

In perennial species, such as C. canephora, one of the advantages of genomic selection is the shortening of the selection cycle, which allows early selection (Castro et al. 2016). In our work, the selective efficiency increased when using GWS, even for vegetative vigor and plant height, which showed high estimates of accuracy based only on phenotype data (60 and 67%, respectively). This increase in efficiency is due to the reduction in the time required to complete a selective cycle using GWS. Thus, reducing the cycle from 6 to 3 years increased the selective efficiency of GWS for all the traits. Therefore, even when the accuracy of genomic selection has the same magnitude as that obtained with selection based on phenotypic data, GWS will provide higher genetic gains due to the shorter selection cycle (Gois et al. 2016).

With the decrease of the selective cycle from 6 to 3 years, GWS was more efficient (ranging from 22 to 146%) for all traits. Thus, the reduction in the time required to complete a selective cycle is significant when using GWS. Genomic prediction and selection can be performed at the seedling stage, and therefore, GWS has a higher efficiency per unit of time (Resende et al. 2012; Gois et al. 2016). Similar results were observed in other studies. In a study evaluating the selective efficiency of GWS with a 50% decrease in the selective cycle of citrus, GWS was superior for all the traits evaluated (ranging from 31 to 160%) (Gois et al. 2016). A study with oil palm (Elaeis guineensis Jacq.) revealed a reduction in the selection cycle from 19 to 6 years when using GWS (Wong and Bernardo 2008). In a study with maize (Zea mays L.), the use of GWS significantly increased the selective accuracy and the genetic gains per unit of time (Fritsche-Neto et al. 2012). In other studies, genomic selection also showed great potential in increasing breeding efficiency by simulated results (Resende et al. 2008; Valente et al. 2016).

GWS provided gains in efficiency even for the traits with low \( {h}_a^2 \), namely plant height (0.36) and rust incidence (0.37). These data confirm the importance of GWS in the selection of low-heritability traits, a fact that is also observed in other studies (Resende et al. 2012; Gois et al. 2016).

Conclusion

Results reveal that genome-wide selection is useful for C. canephora breeding since it accurately predicts the phenotypes of individuals. This fact leads to a significant reduction in the time required to complete the selection cycle, providing gains in selective efficiency per unit of time. In addition, these results can be used as the basis for further studies on the genus Coffea and on perennial species with genetic similarity.