
1 Introduction

Socio-economic and environmental factors point to the pivotal role of legume-based crops in future crop-livestock systems of southern Europe (Annicchiarico 2017). Alfalfa (alias lucerne, Medicago sativa L.) is the most widely grown perennial forage in this region. Its genetic variation includes germplasm with outstanding drought tolerance (Annicchiarico et al. 2011), which can be exploited to improve crop adaptation to the predicted adverse effects of climate change. Regional production of hay or silage may also rely on annual legumes, particularly in severely drought-prone environments where perennials may lack sufficient persistence. Recent findings have highlighted the greater interest of field pea (Pisum sativum L.) relative to vetch species (Vicia spp.) in this respect, both as a pure-stand crop and for intercropping with cereals (Annicchiarico et al. 2017b). Maximizing the aerial biomass of semi-dwarf pea germplasm is crucial not only for pure stands but also for pea-cereal intercropping, where it ensures sufficient legume content and competitive ability against cereals (Annicchiarico et al. 2013).

Genomic selection (GS) pools phenotyping and genotyping data of a genotype sample representing a target genetic base (the reference population) into a model that estimates breeding values for future plant selection (Heffner et al. 2009). GS has gained momentum from the development of genotyping-by-sequencing (GBS) (Elshire et al. 2011), which can produce thousands of genome-wide markers at a lower cost than SNP array platforms (albeit with large amounts of missing data). While predicting pure-line performance is the obvious aim of GS in inbred species (such as pea), the breeding value of candidate parent genotypes for synthetic varieties of outbred species (such as alfalfa) can be predicted by genotyping a set of parent genotypes and phenotyping their half-sib progenies (Annicchiarico et al. 2015a). Early results for GS prediction of alfalfa forage yield and pea grain yield were promising (Annicchiarico et al. 2015b, 2017a; Li et al. 2015). Positive results also emerged for the prediction of some pea grain yield components (Burstin et al. 2015).

This study pooled results for different materials and/or cropping conditions with the aim of assessing the predictive ability of GS for biomass yield of alfalfa and pea. An additional aim was to briefly outline how GS could be incorporated into the breeding schemes of these crops.

2 Material and Methods

Alfalfa parent genotypes were phenotyped for dry biomass yield through their half-sib progenies grown under dense-stand conditions in three experiments, hereafter termed data sets. Data set 1 comprised 154 genotypes from a broadly based population of Mediterranean germplasm, phenotyped under water-favourable conditions in a managed environment (750 mm of water over March–October) over four harvests in one year. Data set 2 included the same material, phenotyped under moderate drought stress in a managed environment (on average, 455 mm of water over March–October) over seven harvests across two years and the following spring. Data set 3 included 124 parent genotypes from a broadly based population of landrace and variety germplasm from the Po Valley, phenotyped in Lodi (northern Italy) under field conditions and moderate drought stress (on average, 454 mm of rainfall plus irrigation water over March–October) over 12 harvests across two years and the following spring. Annicchiarico et al. (2015b) described the GBS and SNP data calling procedures for these data sets, as well as the phenotyping procedures and the results generated by seven GS models for the first and third data sets. This study adds original results for the second data set, whose experiment used the same procedures (plot size, experimental design, etc.) as the first data set but a different drought stress level and experiment duration. For this data set, we used the SNP data from Annicchiarico et al. (2015b) and the two GS models that proved most predictive for yield in the other two data sets, namely Ridge Regression BLUP (rrBLUP) and Support Vector Regression with a linear kernel (SVR-lin).
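rrBLUP estimates all SNP effects jointly and shrinks them towards zero with a common variance, which makes it equivalent to ridge regression of phenotypes on the marker matrix. The sketch below shows that core computation under stated assumptions: SNPs coded 0/1/2 with no missing values, and a fixed shrinkage parameter `lam` (in practice the ratio of residual to marker-effect variance, usually estimated by REML, e.g. in the R package rrBLUP). The function names and the simulated data are hypothetical, and SVR-lin is not shown.

```python
import numpy as np


def fit_rrblup(X, y, lam):
    """Ridge-regression BLUP of SNP effects for a fixed shrinkage parameter lam.

    X   : (n_genotypes, n_markers) SNP matrix coded 0/1/2, already imputed.
    y   : (n_genotypes,) phenotypes, e.g. half-sib progeny biomass yield.
    lam : residual variance / marker-effect variance (assumed known here).
    """
    mu = y.mean()
    means = X.mean(axis=0)
    Z = X - means  # centre each marker column
    n, p = Z.shape
    if p > n:
        # Dual form, cheaper when markers outnumber genotypes (typical for GBS):
        # beta = Z' (Z Z' + lam I_n)^-1 (y - mu)
        alpha = np.linalg.solve(Z @ Z.T + lam * np.eye(n), y - mu)
        beta = Z.T @ alpha
    else:
        # Primal form: beta = (Z'Z + lam I_p)^-1 Z' (y - mu)
        beta = np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ (y - mu))
    return mu, means, beta


def predict_rrblup(model, X_new):
    """Genomic estimated breeding values (GEBVs) of genotyped-only individuals."""
    mu, means, beta = model
    return mu + (X_new - means) @ beta


# Toy usage with simulated data (hypothetical numbers, not from this study)
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(100, 500)).astype(float)
y = X @ rng.normal(0, 0.1, 500) + rng.normal(0, 1, 100)
model = fit_rrblup(X[:80], y[:80], lam=50.0)
gebv = predict_rrblup(model, X[80:])
print(np.corrcoef(gebv, y[80:])[0, 1])  # predictive ability on 20 held-out genotypes
```

The dual branch matters for GBS data, where markers (here up to about 11,000) far outnumber the 100–150 phenotyped genotypes: it solves an n × n rather than a p × p linear system and gives the same marker effects.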

For pea, an earlier study (Annicchiarico et al. 2017a) reported the predictive ability of four GS models for grain yield of three recombinant inbred line (RIL) populations, each including 105 lines, under severe managed drought stress (120 mm of water over March–May). Here, we added information on GS predictive ability for dry biomass and straw yield assessed in the same experiment. The RILs derived from connected crosses between three semi-dwarf cultivars (Attika, Isard, Kaspa) that exhibited high and stable grain yield across climatically contrasting Italian sites (Annicchiarico 2005; Annicchiarico and Iannucci 2008). Attika and Kaspa also displayed high biomass yield and proved suitable for forage production in mixed cropping with cereals (Annicchiarico et al. 2013, 2017b). We used the GBS-based SNP data from Annicchiarico et al. (2017a) obtained with a conservative minimum read depth of six for SNP genotype calling (given the residual heterozygosity expected in the genotyped F6 generation), and adopted the two GS models that were most predictive for grain yield in that study, i.e., Bayesian Lasso (BL) and rrBLUP.
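A minimum read depth means that a GBS genotype call is accepted only when enough reads cover the SNP; below that depth the call is set to missing, which is one source of the missing data typical of GBS. The sketch below illustrates the rule with the depth of six used here. The allele-fraction cut-offs `het_low` and `het_high` that separate homozygotes from heterozygotes are hypothetical placeholders, not the calling rules of the cited pipeline.

```python
from typing import Optional


def call_genotype(ref_reads: int, alt_reads: int, min_depth: int = 6,
                  het_low: float = 0.1, het_high: float = 0.9) -> Optional[int]:
    """Return 0 (ref homozygote), 1 (heterozygote), 2 (alt homozygote) or None (missing).

    min_depth : minimum total read count required to make a call (six in this study).
    het_low / het_high : hypothetical alt-allele fraction thresholds.
    """
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return None  # too few reads: treated as missing data
    alt_fraction = alt_reads / depth
    if alt_fraction <= het_low:
        return 0
    if alt_fraction >= het_high:
        return 2
    return 1


for reads in [(3, 2), (6, 0), (3, 3), (0, 8)]:
    print(reads, call_genotype(*reads))
```

The reason a cautious depth matters when some heterozygosity is expected: assuming both alleles of a heterozygote are sampled with equal probability, the chance that all d reads come from the same allele (so that the heterozygote is miscalled as a homozygote) is 2 × 0.5^d, i.e. 25% at d = 3 but about 3% at d = 6.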

GS predictive ability (i.e., the correlation between GS-predicted values and observed values) was assessed across thresholds of genotype missing data (10–50%) for SNP marker retention, using the missing-data imputation and cross-validation procedures described earlier (Annicchiarico et al. 2015b, 2017a). Pea GS models were trained on the three RIL populations pooled, without including genetic structure information in the model, and their predictive ability was assessed on the single populations.
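The assessment loop described above can be sketched as follows: markers are retained only if their proportion of missing calls does not exceed the threshold, remaining gaps are filled, and predictive ability is the Pearson correlation between cross-validated predictions and observed values, computed within each population for a model trained on the pooled populations. This is a simplified sketch: mean imputation and ridge regression with a fixed `lam` stand in for the imputation method and GS models of the cited studies, and the function names, fold count, population labels and simulated data are hypothetical.

```python
import numpy as np


def filter_markers(X, max_missing):
    """Keep SNP columns whose fraction of missing calls (np.nan) is <= max_missing."""
    missing_rate = np.isnan(X).mean(axis=0)
    return X[:, missing_rate <= max_missing]


def mean_impute(X):
    """Fill missing calls with the marker mean (a simple stand-in for other imputation methods)."""
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)


def ridge_predict(X_train, y_train, X_test, lam):
    """Ridge regression on centred markers (dual form), returning predictions for X_test."""
    mu = y_train.mean()
    means = X_train.mean(axis=0)
    Z = X_train - means
    alpha = np.linalg.solve(Z @ Z.T + lam * np.eye(len(y_train)), y_train - mu)
    return mu + (X_test - means) @ (Z.T @ alpha)


def predictive_ability(X, y, groups, max_missing, lam=1.0, n_folds=5, seed=0):
    """k-fold cross-validated r within each group (e.g. RIL population),
    for a model trained on all groups pooled."""
    Xc = mean_impute(filter_markers(X, max_missing))
    folds = np.random.default_rng(seed).permutation(len(y)) % n_folds
    pred = np.empty(len(y))
    for k in range(n_folds):
        test = folds == k
        pred[test] = ridge_predict(Xc[~test], y[~test], Xc[test], lam)
    return {g: np.corrcoef(pred[groups == g], y[groups == g])[0, 1]
            for g in np.unique(groups)}


# Simulated example: three hypothetical populations, ~5% missing calls
rng = np.random.default_rng(1)
X_full = rng.integers(0, 3, size=(300, 100)).astype(float)
y = X_full @ rng.normal(0, 0.3, 100) + rng.normal(0, 1, 300)
X = np.where(rng.random(X_full.shape) < 0.05, np.nan, X_full)
groups = np.repeat(["RIL1", "RIL2", "RIL3"], 100)
for t in (0.1, 0.2, 0.3, 0.4, 0.5):
    r = predictive_ability(X, y, groups, max_missing=t)
    print(t, round(float(np.mean(list(r.values()))), 2))  # average r over populations
```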

3 Results and Discussion

For the previously unpublished alfalfa results (second data set), the best GS configuration for predicting biomass yield was the SVR-lin model with a genotype missing-data threshold of 40%, whose predictive ability reached r = 0.18 using 10,911 polymorphic SNP markers. The rrBLUP model performed nearly as well (r = 0.17). Best predictions for this data set were distinctly lower than those observed for biomass yield of the same material under favourable cropping conditions (which featured a distinctly lower experimental error CV, i.e., 14.1 vs 19.8%), or for yield of a different reference population under moderate drought stress (which featured a distinctly higher genetic variance CV, i.e., 22.8 vs 14.0%) (Table 1). Even r = 0.18, although fairly unsatisfactory, could still give GS a sizable advantage over half-sib progeny-based phenotypic selection in terms of predicted yield gain per unit time (Annicchiarico et al. 2015a, b). This follows from assuming one year for each GS selection cycle and five years for each progeny-based phenotypic selection cycle, with narrow-sense heritability in the range 0.15–0.30 for biomass yield as indicated by various studies (Annicchiarico 2015).
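The gain-per-unit-time argument can be made explicit with the breeder's equation expressed per year, ΔG/year = i · ρ · σA / L, where i is selection intensity, ρ selection accuracy, σA the additive genetic standard deviation and L the cycle length in years. The sketch below is a deliberately simplified comparison, not the calculation of Annicchiarico et al. (2015a, b): it assumes equal i and σA for both methods, uses r = 0.18 directly as GS accuracy, and approximates the accuracy of progeny-based phenotypic selection by the square root of heritability over the 0.15–0.30 range.

```python
import math


def gain_per_year(accuracy, cycle_years, intensity=1.0, sigma_a=1.0):
    """Breeder's equation per year: i * accuracy * sigma_A / L (i and sigma_A assumed equal)."""
    return intensity * accuracy * sigma_a / cycle_years


r_gs = 0.18  # best predictive ability for data set 2 (SVR-lin)
for h2 in (0.15, 0.30):
    ps = gain_per_year(math.sqrt(h2), cycle_years=5)  # progeny-based phenotypic selection
    gs = gain_per_year(r_gs, cycle_years=1)           # one-year GS cycle
    print(f"h2 = {h2:.2f}: GS/PS gain-per-year ratio = {gs / ps:.2f}")
```

Under these assumptions the ratio is about 2.3 for h² = 0.15 and 1.6 for h² = 0.30. Using r directly as GS accuracy is, if anything, conservative, since accuracy is often estimated as predictive ability divided by the square root of heritability.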

Table 1. Predictive ability (PA; correlation between genome-based predicted values and observed values) of the best genomic selection models for genotype breeding value of production traits, for two reference populations of alfalfa and three recombinant inbred line (RIL) populations of pea.

According to pea GS results averaged across the three RIL populations, the best predictions of aerial biomass were provided by the BL model with a genotype missing-data threshold of 30%. This configuration achieved moderately high predictive ability (r = 0.45) using 1,537 polymorphic SNP markers over the three populations. However, predictive ability values were nearly identical across the 20–50% range of genotype missing data for both GS models (data not shown). Best predictions for the single RIL populations ranged from r = 0.29 to 0.60. For straw yield, the best predictions were provided by the BL model with a 20% missing-data threshold, which displayed an average predictive ability of r = 0.57. Predictions for grain yield were the most accurate, averaging r = 0.71 (BL with 20% missing genotype data).

We expected worse GS predictions for alfalfa than for pea RILs, owing to the much faster decay of linkage disequilibrium in alfalfa and the impossibility of exploiting non-additive genetic variation in half-sib progeny-based selection of alfalfa parents. However, GS offers greater opportunity to shorten selection cycles in a perennial such as alfalfa, which justifies our interest even in low predictive ability values in this species.

Our results suggest that GS may already be advantageous for alfalfa and pea breeding programs. However, its incorporation would require important modifications of their selection schemes. These are summarized in Fig. 1, which identifies five basic selection stages whose implementation depends on the reproductive system of the target species (outbred or inbred). Including GS implies (i) the construction of one or more reference populations, (ii) the definition of one GS model for each target trait in each population using a genotype sample, (iii) the application of the model(s) to a wide set of genotypes from each population, and (iv) the final field test of a reduced set of GS-selected lines (inbreds) or of the GS-selected synthetic variety (outbreds). For inbreds, the reference population may conveniently include a set of RILs with partly common ancestors (as here) to facilitate the definition of a common GS model, or one MAGIC population. Key scientific questions that remain open include, inter alia, the ability of GS models built on one population to predict phenotypes of other populations, and the verification of actual yield gains achieved through GS.
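Stages (iii) and (iv) amount to applying the trained model to genotyped-only candidates and retaining the top-ranked ones, either for field testing (pea lines) or as parents of the synthetic variety (alfalfa). A minimal sketch, assuming a linear marker-effect model such as rrBLUP; the function name and the toy numbers are hypothetical.

```python
import numpy as np


def select_by_gebv(X_candidates, marker_means, marker_effects, n_keep, intercept=0.0):
    """Rank genotyped-only candidates by GEBV; return (indices of the best n_keep, all GEBVs).

    marker_means and marker_effects come from a GS model trained on the reference
    sample (e.g. an rrBLUP-type model); X_candidates uses the same 0/1/2 coding.
    """
    gebv = intercept + (X_candidates - marker_means) @ marker_effects
    best_first = np.argsort(gebv)[::-1]  # highest predicted breeding value first
    return best_first[:n_keep], gebv


# Toy example: three candidates, two markers (hypothetical values)
X_cand = np.array([[0, 1], [2, 2], [1, 0]], dtype=float)
keep, gebv = select_by_gebv(X_cand, np.array([1.0, 1.0]), np.array([1.0, 0.5]), n_keep=2)
print(keep)  # [1 2]
```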

Fig. 1. Possible selection schemes integrating genomic selection for pea and alfalfa