8.1 Introduction

Flax (Linum usitatissimum L.) is a valuable source of linseed and stem fiber. Linseed, also known as flaxseed, is rich in omega(ω)-3 essential fatty acids (α-linolenic acid or ALA), lignans, and soluble and insoluble fibers, making it one of the healthiest plant foods (Fofana et al. 2010; Touré and Xueming 2010; Kim and Ilich 2011; Leyva et al. 2011). Linseed oil also has various industrial uses, including in soap, vehicle paints, linoleum, printing inks, oil clothing, textiles, patent leather and shoe polish (Juita et al. 2012). Flax fiber, extracted from the skin of the flax stem, is mainly used for linen, for the manufacture of twine and rope, and as raw material for some high-quality paper products (Deyholos 2006).

Flax is grown worldwide, primarily in temperate and subtropical regions, such as Canada (linseed), China (fiber and linseed), USA (linseed), India (linseed), Russia and Europe (fiber and linseed) and Kazakhstan (linseed) (Foulk et al. 2004; Liu et al. 2011; Worku et al. 2015; You et al. 2016b). In these growing regions, the biotic stresses primarily involve diseases caused by fungi, viruses and mycoplasma-like organisms. Fungal diseases are predominant and include rust (Melampsora lini), anthracnose (Colletotrichum lini), pasmo or spharella linorium (Septoria linicola, Mycosphaerella linicola), wilt (soil-borne fungus Fusarium oxysporum f.sp. lini), seedling blight and root rot (seed-borne and soil-borne fungi Rhizoctonia solani, Fusarium spp., or Pythium spp.), and stem break and browning (Aureobasidium pullulans var. lini or Polyspora lini) (https://flaxcouncil.ca/growing-flax/chapters/diseases/). These diseases damage flax plants, affect plant growth and development, and ultimately reduce seed and fiber yield and quality. To control these biotic stresses, rotation with other crops such as cereals (spring and winter wheats, barley and oat), oilseeds (canola and mustard) and pulses (peas, lentils and soybean) is an effective agronomic practice in Canada. Seed treatment with suitable fungicides is another useful practice to kill seed-borne pathogens (Bradley et al. 2007).

Exploiting genetic variation to improve agronomic traits and to build durable disease resistance in flax has traditionally relied on conventional breeding methods (You et al. 2016b). A successful example is the genetic improvement of resistance to flax rust, which has the potential to be the most destructive disease affecting flax. The rapidity with which rust races can evolve represents a challenge in breeding new resistant varieties. Over the last 70 years, more than 500 flax rust races have been recorded. Fortunately, flax rust resistance to different races is controlled by several major genes (Lawrence et al. 1995; Anderson et al. 1997; Ellis et al. 1999; Dodds et al. 2001a, b; Lawrence et al. 2010) that have been successfully pyramided in elite varieties by conventional breeding in Canada. Currently, all modern Canadian cultivars are immune to the locally existing rust races.

However, resistance to other major diseases such as wilt, pasmo, and powdery mildew is quantitative and controlled mostly by minor-effect polygenes (You et al. 2017a; He et al. 2019b), which poses a challenge to the widely used conventional breeding methods. To date, all flax cultivars registered in Canada are moderately resistant to powdery mildew, wilt, and pasmo (You et al. 2016b). The development of advanced genomic tools, such as quantitative trait locus (QTL) mapping, genomewide association study (GWAS) and genomic selection (GS) allows the rapid identification of QTLs that control complex quantitative traits, contributing to more efficient offspring selection and assisting candidate gene isolation whose validation can now be accurately performed via gene editing (GE), all of which contribute to accelerating crop genetic improvement.

This chapter briefly introduces genomic design strategies for the genetic improvement of resistance to biotic stresses, with special emphasis on pasmo as an example to describe methodology, outcomes and potential applications in breeding.

8.2 Genomic Design for Genetic Improvement of Biotic Stress Traits

With the development of QTL markers associated with biotic stress resistance, including functional markers, conventional breeding techniques are being revolutionized. Marker-assisted selection (MAS) has been used for traits controlled by major genes such as rust (Kumar et al. 2011; Miedaner and Korzun 2012). GS has been used for complex quantitative traits controlled by numerous polygenes such as resistance to Fusarium wilt, powdery mildew and pasmo (He et al. 2019a), and precision breeding using GE has been used for improving traits controlled by known genes (Nekrasov et al. 2017). Therefore, the identification and characterization of QTLs and causal genes are now an integral part of modern flax breeding programs.

8.2.1 Identification of QTLs

While classical quantitative or statistical genetics is capable of estimating genetic variances of polygenes for quantitative traits at the phenotypic level (Falconer and Mackay 1996), combining suitable genomic design with molecular markers provides a precise way to identify individual polygenic loci or QTLs on chromosomes, estimate their effects and predict co-located candidate genes related to the traits.

Two types of QTL mapping strategies have been developed and successfully used for QTL identification: linkage mapping (LM) and GWAS (Sehgal et al. 2016). LM uses segregating biparental populations, such as F2, backcross (BC), recombinant inbred line (RIL), and doubled haploid (DH) populations, to create a recombination-based genetic map using molecular markers that is suitable for finding QTLs responsible for the characteristics that segregate in the population (Price 2006). The statistical methods and software tools for QTL mapping in biparental populations have been well developed (Kulwal 2018). The major statistical methods to detect additive, dominant and epistatic QTLs include simple interval mapping (SIM), composite or inclusive composite interval mapping (CIM/ICIM), multiple interval mapping (MIM), Bayesian interval mapping (BIM), and multiple trait mapping (MTM) (Kulwal 2018). These methods are implemented in many software tools, such as R/qtl (Arends et al. 2010), MAPMAKER/QTL (Lander et al. 1987), and QGene (Joehanes and Nelson 2008). QTLIciMapping is particularly recommended because it provides functions for both genetic map construction and QTL mapping of additive, dominant, and digenic epistatic effects, as well as QTL × environment interactions, for various biparental and nested association mapping (NAM) populations (Meng et al. 2015). Traditional statistical methods primarily detect large-effect QTLs and have limited power to identify small-effect and linked QTLs. Recently, Zhang et al. (2020c) proposed genomewide composite interval mapping (GCIM) for segregating biparental populations and developed a corresponding R package with a command line version called QTL.gCIMapping (v3.2) and a graphical user interface version named QTL.gCIMapping.GUI (v2.0). This method has been effective in identifying small-effect and linked QTLs in biparental populations (Wang et al. 2016b; Wen et al. 2019, 2020).

GWAS is based on linkage disequilibrium (LD) between molecular markers and QTLs in a diverse genetic panel, as opposed to a biparental population, in order to overcome the limitations of the latter. Many population types can be used for GWAS, including natural germplasm collections, diversity panels of both genetic germplasm and breeding lines, and multi-parent breeding populations such as nested association mapping (NAM) (Yu et al. 2008; Monir and Zhu 2018; Ren et al. 2018) and multi-parent advanced generation intercross (MAGIC) populations (Mackay and Powell 2007; Cavanagh et al. 2008; Camargo et al. 2018; Ongom and Ejeta 2018).

The advantages of GWAS over linkage-based QTL mapping include high genetic variation among individuals, high-density molecular markers, and high resolution of QTLs and causal genes on chromosomes (Goutam et al. 2015; Ogura and Busch 2015). Many statistical models have been developed to identify large- and small-effect QTLs; they can be grouped into two categories: single- and multi-locus models. General Linear Model (GLM) (Price et al. 2006) and Mixed Linear Model (MLM) (Yu et al. 2006) are two traditional single-locus statistical models implemented in many software tools such as TASSEL (Bradbury et al. 2007). Single-locus approaches scan the genome in one dimension and test marker-trait associations one by one. To control for false positives, the stringent Bonferroni correction for multiple tests (significance level divided by the number of markers tested) is frequently used, usually resulting in many false negatives, i.e., the exclusion of many true loci. This drawback can be particularly acute in crop genetics for traits measured from field experiments that are often plagued by large inherent experimental errors (Zhang et al. 2019). Thus, these types of methods have a restricted capability to detect polygenes with small effects that control the bulk of quantitative traits.
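The Bonferroni correction described above is a single division. The short Python sketch below illustrates it with the 258,873 SNPs and α = 0.05 used for the pasmo GWAS later in this chapter; the function name is hypothetical and is not part of any GWAS package.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-marker significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# Values taken from the pasmo GWAS in this chapter: alpha = 0.05, 258,873 SNPs.
threshold = bonferroni_threshold(0.05, 258_873)
print(f"{threshold:.2e}")  # 1.93e-07
```

Because the threshold shrinks in proportion to the number of markers, dense SNP sets make it very hard for small-effect loci to reach significance, which is the drawback noted in the text.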

Multi-locus statistical methods that simultaneously test multiple markers include early proposed models such as Multi-Locus Mixed-Model (MLMM) (Segura et al. 2012), and more recent powerful methods to identify quantitative trait nucleotides (QTNs) with small effects. The latter include mrMLM (Wang et al. 2016a; Li et al. 2017), FASTmrMLM (Zhang and Tamba 2018), FASTmrEMMA (Wen et al. 2018), pLARmEB (Zhang et al. 2017a), ISIS EM-BLASSO (Tamba et al. 2017), and pKWmEB, which have been implemented in the R package “mrMLM”, thus called “mrMLM models” (Table 8.1). These multi-locus models use LOD score (≥3), rather than the stringent Bonferroni correction to identify significant QTNs, which substantially increases the statistical power to detect small effect QTNs and reduces Type 1 errors and running time (Wang et al. 2016a; Li et al. 2017; Ren et al. 2017; Tamba et al. 2017; Zhang et al. 2017a; Wen et al. 2018). FarmCPU, a multi-locus model implemented in the MVP R package, is an exception because it still relies on the Bonferroni correction to declare significance of association (Liu et al. 2016).
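To see how much more permissive a LOD threshold of 3 is than a Bonferroni threshold, the LOD score can be converted to an approximate P value. The Python sketch below assumes that the LOD score comes from a likelihood-ratio test with one degree of freedom, so that the test statistic 2·ln(10)·LOD follows a chi-square distribution; this assumption and the function name are illustrative only and may not match how each mrMLM model computes its scores.

```python
import math


def lod_to_p_value(lod: float) -> float:
    """Approximate P value for a LOD score, assuming a 1-df likelihood-ratio test."""
    if lod < 0:
        raise ValueError("LOD scores are non-negative")
    lrt_statistic = 2.0 * math.log(10.0) * lod
    # Survival function of a chi-square distribution with 1 df:
    # P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(lrt_statistic / 2.0))


p_at_lod3 = lod_to_p_value(3.0)
print(f"{p_at_lod3:.1e}")  # roughly 2e-04
```

Under this assumption, LOD = 3 corresponds to a P value of about 0.0002, roughly three orders of magnitude less stringent than a Bonferroni threshold computed over several hundred thousand markers.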

Table 8.1 Some single- and multi-locus statistical methods for genomewide association study (GWAS)

The haplotype block-based multi-locus GWAS method RTM-GWAS (He et al. 2017) is implemented as standalone software (https://github.com/njau-sri/rtm-gwas). This method first groups highly correlated SNPs into LD blocks (called SNPLDBs) to define bi- or multi-allelic haplotypes. This is followed by a two-stage association analysis to identify QTNs: (1) pre-screening haplotype markers using a single-locus model, and (2) identifying significant QTNs using a multi-locus and multi-allele model with stepwise regression (He et al. 2017).

We have evaluated these single- and multi-locus models in several studies of agronomic, abiotic stress and biotic stress traits in flax and wheat (He et al. 2019b; Fatima et al. 2020; Lan et al. 2020; Sertse et al. 2020). Our results demonstrate that the single-locus models detected mostly large-effect QTNs, while the multi-locus models were capable of identifying QTNs with smaller effects. Some QTNs were identified by multiple models but, generally speaking, the models identified different subsets of QTNs, indicative of the uniqueness and complementarity of these algorithms (He et al. 2019b). Therefore, the combined use of single- and multi-locus models, which yields a more comprehensive set of QTNs that has been shown to increase the prediction ability of GS, is recommended (Lan et al. 2020).

In flax, several biparental populations have been developed to identify QTLs for biotic stress resistance. For Fusarium wilt resistance, a DH population of 143 lines was developed from a cross between the resistant variety Linola and the susceptible Australian flax variety Glenelg, from which two independent and additive genes were identified under greenhouse and field conditions (Spielmeyer et al. 1998). Using a RIL population of 160 lines derived from the resistant cultivar Aurore and the susceptible cultivar Oliver, two independent and recessive genes were also identified for wilt resistance (Edirisinghe 2016). For powdery mildew resistance, three QTLs were detected from F3 and F4 families derived from an F2 population of a cross between the susceptible cultivar NorMan and the resistant cultivar Linda (Asgarinia et al. 2013). Additional biparental populations have also been developed for QTL mapping of flax biotic stress resistance, for example, a Bison/Novelty population of 704 RILs segregating for Fusarium wilt and a Linda/Norman (LNm) population of 160 RILs segregating for powdery mildew (unpublished). These populations have been evaluated for field resistance in multiple years and locations and also re-sequenced using a genotyping-by-sequencing method.

GWAS have been successful in identifying QTLs for agronomic and seed quality traits in flax (Soto-Cerda et al. 2014a, 2014b; Xie et al. 2017; You et al. 2018b). The strength and effectiveness of GWAS using the flax core collection (You et al. 2017a) to detect QTNs for biotic stress traits have been shown for pasmo (He et al. 2019b), powdery mildew (unpublished) and Fusarium wilt (You et al. 2017b).

8.2.2 Candidate Gene Prediction

QTL mapping and GWAS are used to find causal genes underlying traits of interest. Prediction of candidate genes linked to QTNs first requires genomewide gene scans along chromosomes to pinpoint the co-located genes. Although QTNs can be located within coding regions, QTL mapping and GWAS do not provide sufficient resolution to pin the QTLs to accurate intragenic locations or genetic features responsible for controlling the traits. Most QTNs are located in intergenic regions. To infer causal genes linked to a QTN, a logical approach is to check whether the LD correlation (r² or D′) between the QTN and the markers on neighboring genes is sufficiently high (e.g., >0.8) or, alternatively, to partition the whole genome into haplotype/LD blocks based on the genomewide markers of the diversity panel (Purcell et al. 2007; He et al. 2017; Kim et al. 2019) and then perform candidate gene searches within haplotype blocks harboring significant QTNs. An obvious limitation of this method is that LD blocks or correlations depend on the genetic diversity and the structure of a population. For example, the LD blocks in a diversity panel used for GWAS are much smaller than those of a biparental population because the panel has accumulated a greater number of historical recombination events. Thus, GWAS may locate a candidate gene at a higher resolution.
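As an illustration of the LD check described above, the following Python sketch computes r² between a QTN and a neighboring SNP as the squared Pearson correlation of their genotype codes (0, 1 or 2 copies of the alternate allele; None for missing calls). This genotype-based r² is an assumed simplification that does not require phased haplotypes; the function name and the toy genotypes are hypothetical.

```python
from typing import Optional, Sequence


def genotype_r2(a: Sequence[Optional[int]], b: Sequence[Optional[int]]) -> Optional[float]:
    """Squared Pearson correlation between two SNP genotype vectors.

    Returns None when fewer than two lines are called at both SNPs or when
    either SNP is monomorphic among the shared calls.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_a = sum(x for x, _ in pairs) / n
    mean_b = sum(y for _, y in pairs) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in pairs)
    var_a = sum((x - mean_a) ** 2 for x, _ in pairs)
    var_b = sum((y - mean_b) ** 2 for _, y in pairs)
    if var_a == 0 or var_b == 0:
        return None
    return cov * cov / (var_a * var_b)


# Toy genotypes for eight inbred lines (hypothetical data).
qtn = [0, 0, 2, 2, 0, 2, 0, 2]
snp_in_gene_1 = [0, 0, 2, 2, 0, 2, 0, 2]  # identical pattern to the QTN
snp_in_gene_2 = [0, 2, 0, 2, 0, 2, 0, 2]  # only partly correlated

r2_gene_1 = genotype_r2(qtn, snp_in_gene_1)  # 1.0
r2_gene_2 = genotype_r2(qtn, snp_in_gene_2)  # 0.25
```

With the example threshold of 0.8 mentioned in the text, only the first gene would be retained as a candidate in this toy case.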

A straightforward approach for the prediction of candidate genes is to search for related genes in fixed-size regions flanking a QTL, such as a window of 100–200 kb downstream and upstream of the QTL (Kumar et al. 2015; He et al. 2019b; Sertse et al. 2019; You and Cloutier 2019). The fixed window size may be estimated through analysis of the LD decay curve (You et al. 2018b). However, this method has the disadvantage that a fixed window size does not reflect the differential recombination rates across the genome. Therefore, whichever method is used to identify candidate genes, all candidates must be validated through functional genomics.
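The fixed-window search can be sketched in a few lines of Python. The gene identifiers, chromosome names and coordinates below are hypothetical placeholders, and the 200-kb flank is simply the upper end of the range quoted above; a real analysis would read gene coordinates from the genome annotation.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    gene_id: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def genes_in_window(genes: List[Gene], chrom: str, qtn_pos: int,
                    flank_bp: int = 200_000) -> List[str]:
    """Return IDs of genes overlapping [qtn_pos - flank_bp, qtn_pos + flank_bp]."""
    lo = max(1, qtn_pos - flank_bp)
    hi = qtn_pos + flank_bp
    return [g.gene_id for g in genes
            if g.chrom == chrom and g.start <= hi and g.end >= lo]


# Hypothetical annotation records and QTN position.
annotation = [
    Gene("geneA", "Lu1", 790_000, 795_000),      # ends before the window
    Gene("geneB", "Lu1", 799_000, 803_000),      # overlaps the left edge
    Gene("geneC", "Lu1", 1_150_000, 1_160_000),  # inside the window
    Gene("geneD", "Lu1", 1_200_001, 1_210_000),  # starts after the window
    Gene("geneE", "Lu2", 1_000_000, 1_005_000),  # different chromosome
]

candidates = genes_in_window(annotation, "Lu1", 1_000_000)
print(candidates)  # ['geneB', 'geneC']
```

The resulting list could then be intersected with a predefined gene set, such as annotated resistance gene analogs, to narrow the candidates further.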

Resistance gene analogs (RGAs) are candidates of resistance genes in plants. They can be identified based on known structural features. RGAs can be clustered as either nucleotide-binding site leucine-rich repeat (NBS-LRR) or transmembrane leucine-rich repeat (TM-LRR) (Hammond-Kosack and Jones 1997). NBS-LRR can be further divided into toll/interleukin receptor (TIR)-NBS-LRR (TNL) or non-TNL/coiled coil-NBS-LRR (CNL) (Hammond-Kosack and Jones 1997). Similarly, TM-LRRs could be classified into two classes: receptor-like kinases (RLKs) and receptor-like proteins (RLPs) (Hammond-Kosack and Jones 1997). Genome-wide RGAs can be identified through software tools (Li et al. 2016) or manually using basic local alignment search tool (BLAST) against annotated gene sequences (You et al. 2018a). Using these approaches, we identified 1327 RGAs in the flax genome which constitute a useful subset to investigate co-localized QTLs associated with biotic stresses (You et al. 2018a).

8.2.3 Genomic Selection

Genomic selection (GS) is a promising breeding selection method that employs prediction models constructed using a training population that is both genotyped with genomewide markers and phenotyped, to predict genomic estimated breeding values (GEBVs) of genotyped but unphenotyped breeding lines. GS promises to increase selection accuracy, shorten breeding cycles, and reduce breeding cost. To date, GS has been implemented in many breeding programs to improve yield, quality, abiotic and biotic stresses, in a wide-range of crop plants such as wheat (Rutkoski et al. 2012, 2014, 2015; Daetwyler et al. 2014), rice (Spindel et al. 2015), flax (You et al. 2016a; He et al. 2019a; Lan et al. 2020), and others. GS is most often used for progeny selection in a breeding program but it can also be applied to evaluation of germplasm and parents, and to predict general combining ability (GCA) and specific combining ability (SCA) of crosses (Bernardo 2015; Lado et al. 2017; Yao et al. 2018). However, the performance of GS depends on (1) choosing a proper statistical model to construct a prediction model; (2) choosing a proper marker set to construct the prediction model; and (3) choosing a proper training population closely related to the test populations.

To evaluate the prediction accuracy or ability of GS models, cross-validation schemes are frequently used, in which the whole population is randomly split into several subsets (or folds); with five subsets, the scheme is called five-fold cross-validation. For a given random split, each fold is used in turn as the test data set, and the remaining four folds are merged to form the training data set. This process is repeated multiple times, e.g., 100 times, with a new random split each time. In this case, a total of 500 training and test data set pairs are generated; a GS model is constructed on each training data set and then used to predict the GEBVs of the corresponding test data set. The prediction accuracy or ability is defined as the Pearson’s correlation between the GEBVs and the observed phenotypes (You et al. 2016a).
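The repeated five-fold scheme can be sketched as follows in Python. The predictor used here (a simple regression on the first marker) is only a hypothetical stand-in so that the example is self-contained; in practice a GS model would be passed in through the same `fit_predict` interface. The simulated genotypes and phenotypes are likewise invented for illustration.

```python
import random
from statistics import mean
from typing import Callable, List

Predictor = Callable[[List[List[int]], List[float], List[List[int]]], List[float]]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; NaN when either vector has no variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5


def repeated_kfold_cv(X: List[List[int]], y: List[float], fit_predict: Predictor,
                      k: int = 5, repeats: int = 100, seed: int = 1) -> List[float]:
    """Return one prediction accuracy per fold per repeat (k * repeats values)."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repeats):
        order = list(range(len(y)))
        rng.shuffle(order)                       # new random split for each repeat
        folds = [order[i::k] for i in range(k)]  # k disjoint folds
        for test_idx in folds:
            test_set = set(test_idx)
            train_idx = [i for i in order if i not in test_set]
            predicted = fit_predict([X[i] for i in train_idx],
                                    [y[i] for i in train_idx],
                                    [X[i] for i in test_idx])
            accuracies.append(pearson(predicted, [y[i] for i in test_idx]))
    return accuracies


def first_marker_regression(train_X, train_y, test_X):
    """Placeholder model: least-squares regression on the first marker only."""
    x = [row[0] for row in train_X]
    mx, my = mean(x), mean(train_y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = 0.0 if sxx == 0 else sum((a - mx) * (b - my) for a, b in zip(x, train_y)) / sxx
    return [my + slope * (row[0] - mx) for row in test_X]


# Simulated data: 50 inbred lines, 3 markers, phenotype driven by marker 1.
sim = random.Random(0)
genotypes = [[sim.choice([0, 2]) for _ in range(3)] for _ in range(50)]
phenotypes = [1.5 * row[0] + sim.gauss(0, 1) for row in genotypes]

accuracies = repeated_kfold_cv(genotypes, phenotypes, first_marker_regression)
valid = [a for a in accuracies if a == a]  # drop folds with undefined correlation
print(len(accuracies), round(mean(valid), 2))
```

With five folds and 100 repeats the function returns 500 accuracies, matching the count given in the text; the mean over these values is the figure usually reported.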

Various genomic models have been developed to optimize prediction for numerous complex traits. These models include classical parametric statistics-based models such as ridge regression best linear unbiased prediction (RR-BLUP) (Henderson 1975) and genomic BLUP (GBLUP) (Daetwyler et al. 2014); Bayesian statistics-based parametric methods such as Bayesian LASSO (BL) (Park and Casella 2008), Bayesian ridge regression (BRR) (Campos et al. 2009), BayesA, BayesB and BayesC; and machine learning-based non-parametric methods such as support vector machine (SVM), random forest (RF), radial basis function neural network (RBFNN) and some deep learning methods (Gonzalez-Camacho et al. 2018; Montesinos-Lopez et al. 2018; Fukuoka 2019; Lo-Ciganic et al. 2019; Grinberg et al. 2020; Gupta et al. 2020). These models have been implemented in some popular software tools, especially in several R packages (Table 8.2).

Table 8.2 Some popular R packages for modeling of genomic selection

Parametric GS models are usually built on additive genetic models, and their prediction abilities differ depending on the genetic architecture of the traits examined. However, because non-additive effects such as dominance and epistatic interactions are common in quantitative traits, these effects are also considered in some GS models (Varona et al. 2018). Besides genomic prediction for individual traits, multi-trait models in GS have been evaluated (Covarrubias-Pazaran et al. 2018; Fernandes et al. 2018; Montesinos-Lopez et al. 2019b). Provided that the target traits are significantly genetically correlated, multi-trait GS models outperform those for individual traits. Nevertheless, construction of multi-trait models is computation-intensive, especially for large molecular marker and phenotypic data sets. Recently, some computation-efficient GS models and R packages have been developed for modeling of multiple traits (Montesinos-Lopez et al. 2019a).

Although many GS models have been implemented and evaluated in a variety of crops and traits, RR-BLUP is the most widely used because of its consistently strong predictive performance (Arruda et al. 2015; Rutkoski et al. 2015; Poland and Rutkoski 2016; Dong et al. 2018; Liabeuf et al. 2018). For example, RR-BLUP effectively captured complex patterns of additive effects and delivered effective genomic prediction of wheat disease resistance (Ornella et al. 2012). RR-BLUP is also computationally more efficient than most of the alternative statistical models (Piepho 2009; Endelman 2011; Arruda et al. 2015; Liabeuf et al. 2018).
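At its core, RR-BLUP is a ridge regression on marker effects in which the penalty λ equals the ratio of residual variance to marker-effect variance. The Python sketch below solves the ridge equations (Z'Z + λI)u = Z'y for a small number of markers. It is a teaching sketch only: λ is supplied by hand here, which is an assumption, whereas dedicated GS packages estimate the variance components from the data; the function names and toy data are hypothetical.

```python
from typing import List, Tuple


def ridge_marker_effects(X: List[List[float]], y: List[float],
                         lam: float) -> Tuple[float, List[float], List[float]]:
    """Solve (Z'Z + lam*I) u = Z'(y - mean) for centered markers Z.

    Returns the phenotype mean, the marker column means and the marker effects.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Z = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Augmented matrix [Z'Z + lam*I | Z'yc]
    A = [[sum(Z[i][a] * Z[i][b] for i in range(n)) + (lam if a == b else 0.0)
          for b in range(p)] + [sum(Z[i][a] * yc[i] for i in range(n))]
         for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        piv = A[col][col]
        A[col] = [v / piv for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return y_mean, col_means, [A[j][p] for j in range(p)]


def predict_gebv(x: List[float], y_mean: float, col_means: List[float],
                 effects: List[float]) -> float:
    """Predicted value of one line from its marker genotypes."""
    return y_mean + sum(e * (g - m) for e, g, m in zip(effects, x, col_means))


# Toy data: six lines, two markers, phenotype = 1 + 2*m1 - 1*m2 (no noise).
markers = [[0, 0], [0, 2], [2, 0], [2, 2], [0, 2], [2, 0]]
phenos = [1 + 2 * a - b for a, b in markers]

_, _, weak_penalty = ridge_marker_effects(markers, phenos, lam=1e-9)
mu, means, strong_penalty = ridge_marker_effects(markers, phenos, lam=100.0)
```

With a negligible penalty the estimated effects recover the simulated values (about 2 and −1), while a large penalty shrinks all effects toward zero; this shrinkage is what allows the model to handle many more markers than lines.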

GS was initially suggested by Meuwissen et al. (2001). The main idea behind GS is the use of genomewide markers to train statistical models without prior knowledge of genes or QTLs associated with the traits. With the development of high-throughput genotyping technology, high-density genomewide molecular markers can be readily obtained and breeding populations can be genotyped at low costs. Several popular genotyping methods are available, such as genotyping by sequencing (GBS), array-based genotyping (e.g., iSelect 90 K array for wheat), and target sequence based genotyping (Bekele et al. 2020; Zhang et al. 2020a). To date, most GS models are constructed based on genomewide random markers. Though some studies have discussed the use of QTLs as markers, only major QTLs were used and the outcome was only a minor improvement in prediction accuracy. Our recent studies revealed that combining single and multi-locus GWAS methods can effectively detect both large and minor effect QTLs that can be used to build GS models, thereby significantly improving genomic prediction accuracy (He et al. 2019a, b; Lan et al. 2020).

8.2.4 Genome Editing (GE) and Precision Breeding

GE is a genome-engineering technology that facilitates precise and efficient targeted modification of genomes to characterize the functions of genes and create novel genetic resources for the genetic improvement of plants (Langner et al. 2018; Chen et al. 2019). GE starts with the creation of site-specific double-strand breaks (DSBs) at the target loci by sequence-specific nucleases. The DSBs are then repaired by the plant’s endogenous DNA repair mechanisms, either error-prone non-homologous end joining (NHEJ) or homology-directed repair (HDR). NHEJ generates small random insertions, deletions and substitutions, typically causing a gene knockout, whereas HDR is able to generate accurate point mutations, deletions, or gene knock-ins that are especially useful for plant precision breeding, but with low editing frequencies (Langner et al. 2018). Broad-sense genome editing techniques include reverse genetic tools such as induced mutagenesis (Rowland 1991; Chantreau et al. 2013; Fofana et al. 2017), oligonucleotide-directed mutagenesis (Sauer et al. 2016), epigenome editing (Miglani et al. 2020), transposons, RNA interference (RNAi), and typical genome editing tools such as zinc-finger nucleases (Bibikova et al. 2002; Shukla et al. 2009; Osakabe et al. 2010), transcription activator-like effector nucleases (TALENs) (Malzahn et al. 2017), and clustered regularly interspaced short palindromic repeat (CRISPR)/Cas9 systems (Langner et al. 2018; Chen et al. 2019). In particular, the CRISPR/Cas9 system with CRISPR-associated protein 9 (Cas9) is presently the most commonly used approach for plant genome editing due to its ease and robustness.

GE has been successfully applied to improve disease resistance against various plant pathogens in crops such as rice (Li et al. 2012), wheat (Wang et al. 2014; Zhang et al. 2017b), tomato (Nekrasov et al. 2017), citrus (Peng et al. 2017) and watermelon (Zhang et al. 2020b), as well as against viruses (Chandrasekaran et al. 2016).

Fusarium wilt (Fusarium oxysporum) and powdery mildew are destructive diseases in many crops, including flax. Examples of GE applications for these two diseases are listed (Table 8.3). F. oxysporum is a soil-borne fungus that exists as pathogenic and non-pathogenic strains (Leslie and Summerell 2006). Three Fusarium mitogen-activated protein kinase (MAPK) signaling genes (FMK1, HOG1 and PBS2) are associated with plant surface hydrophobicity (sensing) and pathogenesis (Di Pietro et al. 2001). The RNAi-mediated silencing of these three genes in F. oxysporum resulted in decreased mycelial growth on tomato fruits, leading to reduced pathogenicity compared to the unsilenced fungus (Pareek and Rajam 2017). The F. oxysporum species complex (FOSC) is an economically important group of pathogenic filamentous fungi that are able to infect both animals and plants. Wang et al. (2018) developed an F. oxysporum-optimized Cas9 ribonucleoprotein (RNP) and a protoplast transformation method to generate a mutant bik1 of BIK1, a gene in a secondary metabolite biosynthetic cluster, confirming that this polyketide synthase was involved in the synthesis of the red pigment bikaverin.

Table 8.3 Some applications of genome editing in improving biotic stress resistance

Mildew resistance locus O (Mlo) harbors a gene associated with powdery mildew resistance. Its wild-type alleles confer susceptibility to the fungi that cause powdery mildew (Acevedo-Garcia et al. 2014), while its homozygous knockout mutations (mlo) lead to resistance to powdery mildew. Nekrasov et al. (2017) reported a non-transgenic tomato variety resistant to powdery mildew (Oidium neolycopersici), generated by editing the Mlo gene (SlMlo1) with the CRISPR/Cas9 technology, which is based on the Cas9 DNA nuclease guided to a specific DNA target by a single guide RNA (sgRNA). PMR4 encodes a callose synthase and its loss-of-function mutants are resistant to powdery mildew in Arabidopsis and tomato. The CRISPR/Cas9-mediated knockout mutants of the PMR4 ortholog (SlPMR4) in tomato showed partial resistance against the powdery mildew pathogen O. neolycopersici (Koseoglou 2017). RNA silencing of SlPMR4 also enhanced the resistance to powdery mildew in tomato (Huibers et al. 2013).

The new technology represented by CRISPR/Cas-based GE opens a new era in plant precision breeding and is expected to drive the second green revolution (Chen et al. 2019). This technology is considered a novel plant breeding technique that could provide an alternative to the strict regulations applied to ‘genetically modified organisms’ (GMOs). Technically, GE can be employed in precision breeding in many ways (Chen et al. 2019): (1) knocking out genes that confer undesirable traits; (2) knock-in and replacement to introduce new favorable alleles without linkage drag or to generate allelic variants that do not exist naturally; (3) nucleotide editing to alter SNPs in either coding or noncoding regions; (4) fine-tuning gene regulation by altering gene expression, mRNA processing, and mRNA translation; and (5) development of high-throughput mutant libraries for functional genomics and genetic improvement.

In flax, the first application of GE aimed to develop an herbicide tolerant version of CDC Bethune, the most popular flax variety in Western Canada, by precisely editing the ENOLPYRUVYLSHIKIMATE-3-PHOSPHATE SYNTHASE (EPSPS) genes using single-stranded oligonucleotides (ssODNs) and CRISPR/Cas9 (Sauer et al. 2016). Attempts to create a new flax variety tolerant to the herbicide glyphosate are being made by CIBUS (https://www.cibus.com/), a precision gene-editing company located in San Diego, using their proprietary GE method.

8.3 QTL Identification and Genetic Improvement for Pasmo Resistance in Flax

Pasmo disease affects flax production worldwide. This fungal disease, caused by Septoria linicola (Speg.) Garassini, is widespread throughout all flax growing regions and infects flax plants during the entire growing season (Halley et al. 2004). Rainfall accumulation from June to August increases the incidence and severity of the disease (Halley et al. 2004). High humidity and high temperature during ripening are the conditions that most promote disease incidence. The major symptoms are brown circular lesions on leaves and brown or black banding patterns interspersed with green healthy tissues on stems. Pasmo negatively impacts both seed yield and fiber quality (Hall et al. 2016).

Pasmo resistance is a quantitatively heritable trait. The genetic improvement of pasmo resistance is hindered by the scarcity of highly resistant germplasm and a poor understanding of its complex genetic architecture. To date, no flax cultivars are truly highly resistant to pasmo (Diederichsen et al. 2008). Current flax cultivars developed in Canada are only moderately resistant and show a narrow genetic base (You et al. 2016b). To broaden the genetic base of flax cultivars, a core collection of 407 flax accessions has been assembled from a world collection of approximately 3,500 accessions of cultivated flax maintained by Plant Gene Resources of Canada (PGRC) (Diederichsen et al. 2012; Soto-Cerda et al. 2013). We previously evaluated pasmo resistance of the flax core collection and found significant variation associated with the geographical origin (You et al. 2017a). The most pasmo-susceptible accessions originate from India and Pakistan, whereas the accessions from Europe possessed the highest levels of resistance. Of the accessions from North America, most were moderately susceptible and susceptible. Even though CN101536 was evaluated as the most resistant Canadian linseed breeding line in the flax core collection, it was just moderately resistant to pasmo with a rating of 4.4 (You et al. 2017a). Therefore, pyramiding additional favorable alleles into current elite varieties is considered an efficient first step to develop highly resistant varieties. The a priori identification of QTLs associated with pasmo resistance is not only a prerequisite to perform such marker-assisted backcrossing but could also be applied to screen advanced flax breeding germplasm.

8.3.1 Genetic Panel and SNP Set for GWAS

The flax core collection of 407 accessions is a diverse genetic panel. The entire collection was re-sequenced using GBS methodology, generating 100-bp Illumina paired-end reads at an average of 17× genome coverage on the Illumina HiSeq 2000 platform (Illumina Inc., San Diego, USA). The reads were mapped to the CDC Bethune reference (Wang et al. 2012) using BWA v0.6.1. The mapped reads were analyzed as described (He et al. 2019b) and 1.7 M SNPs were obtained. These SNPs were remapped to the chromosome-scale reference (You et al. 2018a; You and Cloutier 2019). From this unfiltered SNP data set, 258,873 SNPs were extracted using the following filtering criteria: minor allele frequency (MAF) ≥ 0.05, genotyping rate ≥ 60% and pairwise correlation coefficients (r²) among neighboring SNPs > 0.8 (International HapMap Consortium et al. 2005; Huang et al. 2010). Imputation was performed using Beagle v.4.2 with default parameters (Browning and Browning 2007) to predict some of the 14.13% missing SNPs.
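The first two filtering criteria (MAF ≥ 0.05 and genotyping rate ≥ 60%) can be illustrated with a short Python sketch. Genotypes are assumed to be coded as 0, 1 or 2 copies of the alternate allele, with None for missing calls; the function names and toy genotypes are hypothetical, and the LD-based criterion is deliberately left out of this sketch.

```python
from typing import List, Optional, Tuple


def maf_and_call_rate(genotypes: List[Optional[int]]) -> Tuple[float, float]:
    """Minor allele frequency and genotyping rate for one SNP."""
    called = [g for g in genotypes if g is not None]
    call_rate = len(called) / len(genotypes)
    if not called:
        return 0.0, call_rate
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq), call_rate


def passes_filter(genotypes: List[Optional[int]], min_maf: float = 0.05,
                  min_call_rate: float = 0.60) -> bool:
    """Apply the MAF and genotyping-rate thresholds quoted in the text."""
    maf, call_rate = maf_and_call_rate(genotypes)
    return maf >= min_maf and call_rate >= min_call_rate


# Toy SNPs (hypothetical data).
common_snp = [0, 0, 0, 0, 0, 0, 0, 0, 2, None]  # MAF ~0.11, call rate 0.9
rare_snp = [0] * 39 + [2]                       # MAF 0.025, call rate 1.0
sparse_snp = [0, 2, None, None, None]           # MAF 0.5, call rate 0.4

kept = [name for name, snp in [("common", common_snp),
                               ("rare", rare_snp),
                               ("sparse", sparse_snp)]
        if passes_filter(snp)]
print(kept)  # ['common']
```

In practice these filters are applied by dedicated variant-processing tools on the full SNP matrix; the sketch only makes the two thresholds concrete.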

8.3.2 Pasmo Field Resistance of the Core Collection

Flax accessions were evaluated for pasmo resistance in a pasmo nursery that was established in the 1960s. To ensure sufficient pasmo infection in the nursery, pasmo-infested chopped straw from previous growing seasons was spread between rows as additional inoculum when plants were roughly 30 cm tall. In addition, a misting system was operated for five minutes every half hour for four weeks, except on rainy days, to ensure conidia dispersal and disease infection and development. A total of 391 accessions were rated for pasmo resistance in the same nursery for five consecutive years from 2012 to 2016 at the farm of Agriculture and Agri-Food Canada, Morden Research and Development Centre, Morden, Manitoba, Canada. The field trial data were adjusted using a type-2 modified augmented design (MAD2) (Lin and Poushinsky 1985).

Pasmo severity, rated on a 0–9 scale, was evaluated based on symptoms on leaves and stems of all plants in a single row plot. Evaluation was conducted at four growth stages, i.e., the early (P1) and late flowering stages (P2), the green boll stage (P3), and the early brown boll stage (P4). To group the resistance of accessions, a rating of 0–2 is categorized as resistant (R), 3–4 as moderately resistant (MR), 5–6 as moderately susceptible (MS), and 7–9 as susceptible (S) (Table 8.4). Statistical analyses for pasmo ratings were previously described in You et al. (2013).

Table 8.4 Field evaluation criteria for pasmo severity on a scale of 0–9
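The grouping of ratings into resistance classes can be written as a short Python function. Because multi-year averages such as 4.4 fall between the integer classes, this sketch rounds to the nearest integer score first; that rounding rule and the function name resistance_class are assumptions made for illustration.

```python
import math


def resistance_class(rating):
    """Map a 0-9 pasmo severity rating to R, MR, MS or S."""
    if not 0 <= rating <= 9:
        raise ValueError("pasmo ratings are defined on a 0-9 scale")
    # Assumption: non-integer (averaged) ratings are rounded half up.
    score = math.floor(rating + 0.5)
    if score <= 2:
        return "R"    # resistant
    if score <= 4:
        return "MR"   # moderately resistant
    if score <= 6:
        return "MS"   # moderately susceptible
    return "S"        # susceptible


print(resistance_class(4.4), resistance_class(1.8), resistance_class(8))
```

Under this rule a rating of 4.4 is classified as MR, which agrees with the description of CN101536 as moderately resistant.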

We observed that pasmo infection increased with growth stages and peaked at the final evaluation stage every year, which followed a nearly normal distribution (Fig. 8.1) (You et al. 2017a); thus, only the data observed at the final growth stages (P3 or P4) was used for GWAS. Although significant correlation existed among years, significant differences between years and significant genotype × year interactions were also observed, indicating that the individual year data sets could be used for GWAS to identify environment-specific QTLs.

Fig. 8.1

Source He et al. (2019a)

Pearson correlations (upper triangle), scatter plots (lower triangle), and histograms (diagonal) between six pasmo severity datasets. Fitted curves are displayed in scatter plots and histograms. *** represents significance at the <0.001 probability level.

8.3.3 QTL Identification

A total of 370 of the 391 pasmo-evaluated accessions, which had both quality SNP and phenotype data, were used for GWAS. We employed three single-locus models (GLM, MLM and GEMMA) and seven multi-locus models (six implemented in mrMLM and one in FarmCPU) (Table 8.1) to identify QTNs from the 370 accessions with 258,873 SNPs. Six pasmo rating datasets were independently analyzed for GWAS: the five individual-year datasets and the 5-year average dataset. Significant QTNs associated with the traits were detected at α = 0.05 after Bonferroni correction (1.93 × 10⁻⁷ = 0.05/258,873 SNPs) for GLM, MLM and FarmCPU, and at a log of odds (LOD) score threshold of 3.0 for the remaining models. The pipeline for QTL identification and annotation is described in Fig. 8.2.
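The two significance thresholds can be compared on a common scale with a few lines of Python. The Bonferroni value follows directly from the numbers given above. The conversion of a LOD score to a P value assumes that the likelihood-ratio statistic (2 ln 10 × LOD) follows a χ2 distribution with one degree of freedom; this conversion is an assumption of the sketch rather than a step stated in the text.

```python
import math

N_SNPS = 258_873
ALPHA = 0.05

# Bonferroni-corrected P-value threshold used for GLM, MLM and FarmCPU.
bonferroni_p = ALPHA / N_SNPS


def lod_to_p(lod):
    """Approximate P value of a LOD score (assumes chi-square, 1 df)."""
    lrt = 2.0 * math.log(10.0) * lod
    # Survival function of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(lrt / 2.0))


lod3_p = lod_to_p(3.0)
print(f"Bonferroni threshold: {bonferroni_p:.2e}")
print(f"P value at LOD = 3:   {lod3_p:.1e}")
```

Under this assumption, the LOD threshold of 3.0 is about three orders of magnitude less stringent than the Bonferroni threshold.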

Fig. 8.2

Source Modified from He et al. (2019b)

Pipeline of quantitative trait loci (QTLs) identification using genomewide association study (GWAS) and annotation for flax pasmo resistance.

A total of 719 QTNs were detected using the ten statistical models for the six pasmo rating datasets. These QTNs were further filtered by removing those for which the allele effect was not significant, and then grouped into 500 QTN clusters or QTLs based on LDs of contiguous markers as shown in Fig. 8.3. When a cluster contained more than one QTN, the tag QTN with the largest effect among all QTNs in the cluster was chosen to represent the QTL. Hereafter, QTN and QTL are used interchangeably.
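The choice of a tag QTN per cluster can be sketched in Python as follows. The sketch assumes that each QTN has already been assigned to an LD cluster (the LD-based grouping itself is not reproduced) and uses R2 as the measure of effect; the QTN names, cluster labels and R2 values are hypothetical.

```python
def tag_qtns(qtns):
    """Pick, for each cluster, the QTN with the largest R2.

    qtns: iterable of (name, cluster_id, r2) tuples.
    Returns a dict mapping cluster_id to the name of its tag QTN.
    """
    best = {}
    for name, cluster, r2 in qtns:
        if cluster not in best or r2 > best[cluster][1]:
            best[cluster] = (name, r2)
    return {cluster: name for cluster, (name, _) in best.items()}


# Hypothetical QTNs, cluster labels and R2 values (%), for illustration only.
example = [
    ("qtn_a", 1, 2.1),
    ("qtn_b", 1, 7.4),
    ("qtn_c", 2, 0.9),
]
tags = tag_qtns(example)
print(tags)
```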

Fig. 8.3

Circos map of 500 QTNs associated with pasmo resistance measured in the field for five consecutive years and identified using ten single- and multi-locus models. Track A: flax genome chromosomes; B: 1599 resistance gene analogs (RGAs); C: 372 putative candidate RGAs for pasmo resistance; D: 8 RGAs co-located with identified QTNs; E: 209 QTNs identified by GLM; F: 22 QTNs identified by FarmCPU; G: 281 QTNs identified by all six “mrMLM models” (from H to M); H: 60 QTNs identified by FASTmrEMMA; I: 125 QTNs identified by FASTmrMLM; J: 97 QTNs identified by ISIS-EM-BLASSO; K: 97 QTNs identified by mrMLM; L: 95 QTNs identified by pKWmEB; M: 118 QTNs identified by pLARmEB

Of these 500 QTNs, 14.4% (72) had large QTN effects (R2 > 10%), i.e., QTNs explaining a major portion of the phenotypic variance, while 24% (120) had minor effects (R2 < 1%). Several notably large-effect QTNs were identified, including Lu1-9232234 (R2 = 16.17%), Lu8-23104696 (R2 = 16.53%), Lu9-1896658 (R2 = 17.12%), and Lu9-4333365 (R2 = 23.39%).

QTN detection power varies with the statistical model used. Single-locus models mostly identified large-effect QTLs. Of the three single-locus models, MLM identified only one large-effect QTN with R2 = 15.02%, GEMMA identified six with an average R2 of 11.13%, whereas GLM detected 209 QTNs that had an average R2 of 5.57% and a range from 0.48 to 15.02%. Multi-locus models identified more small-effect QTNs than single-locus models. In addition, the six mrMLM models detected more QTNs with smaller effects (average R2 of 2.80%) than FarmCPU (average R2 of 5.09%), because the stringent Bonferroni correction was applied to FarmCPU.

The stability and reliability of the identified QTNs were related to the number of statistical models (NSMs) that detected them and the number of pasmo phenotype datasets (NPDs) in which they displayed significant allele effects (Fig. 8.4). A total of 127 QTNs were identified by two or more statistical models, but most of them (373) were detected by a single model. However, the effect size of QTNs was not necessarily associated with the NSMs (Fig. 8.4a), though the large-effect QTNs Lu4-14738243, Lu9-4333365 and Lu8-14317356 were all detected by more than five or all six models (Fig. 8.4a).

Fig. 8.4

Relationship between R2 (phenotypic variance explained by a QTL, %) with the number of statistical models that detected the QTLs (a) and the number of pasmo phenotypic datasets that showed significant allele effects for the QTLs (b)

Nevertheless, the effect size of QTNs was significantly correlated with NPDs (Fig. 8.4b), indicating that QTNs detected in a greater number of datasets were more reliable and associated with larger effects than QTNs identified in fewer datasets. Conversely, small-effect QTNs were usually identified in only one or two phenotypic datasets (or environments), indicative of their environment-specific associations.

Based on the QTN effect size and the number of pasmo phenotypic datasets that showed significant QTN effects, two QTN subsets were generated from the 500 QTN set associated with pasmo resistance in flax. The first subset comprised 134 stable QTNs that had significant QTN effects in all six phenotypic datasets and explained 27.4–60.9% of the total variation. The second subset of 67 QTNs represented the non-redundant and stable QTN subset; these QTNs were identified by constructing forward stepwise multiple regression models and were retained in at least three models. This subset contributed 31.5–64.2% of the total variation in the six phenotypic datasets, a range comparable to or moderately larger than that of the 134 QTL subset, indicating that the latter retained redundant markers.

The 500 QTN set appeared to be primarily additive for pasmo resistance. A significant negative correlation between the number of favorable alleles (NFAs) and pasmo ratings was observed (R2 = 0.73) (Fig. 8.5), signifying that NFA is a good indicator or criterion to evaluate pasmo resistance of accessions.
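Counting favorable alleles and relating the count to pasmo ratings can be sketched in Python as follows. The sketch assumes homozygous accessions with one allele call per QTN and a known favorable allele at each QTN; the accession names, genotypes and ratings are synthetic and serve only to show the calculation.

```python
from statistics import mean


def count_favorable(genotype, favorable):
    """Number of QTNs at which an accession carries the favorable allele."""
    return sum(1 for g, f in zip(genotype, favorable) if g == f)


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = (sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v)) ** 0.5
    return num / den


# Synthetic data: favorable allele at five QTNs, then (genotype, rating).
favorable = "ACGTA"
accessions = {
    "acc1": ("ACGTA", 2.0),
    "acc2": ("ACGTT", 3.5),
    "acc3": ("TCGAT", 5.0),
    "acc4": ("TGCAT", 8.0),
}

nfa = [count_favorable(g, favorable) for g, _ in accessions.values()]
ratings = [r for _, r in accessions.values()]
r_nfa = pearson(nfa, ratings)
print(nfa, round(r_nfa, 2))
```

In this toy example a higher NFA goes with a lower (better) rating, which is the direction of the relationship reported for the core collection.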

Fig. 8.5

Relationship between the number of favorable alleles and the average pasmo ratings of 370 flax accessions evaluated in the field for five consecutive years

8.3.4 Candidate Genes

To find candidate resistance genes that are co-localized with the detected QTNs, we first identified 1599 RGAs on the 15 chromosomes (Fig. 8.3, Track B), including the 1327 initially detected in the flax pseudomolecule (You et al. 2018a). We then performed genomewide scans along chromosomes to locate all the RGAs within a 200-kb window of the QTN’s flanking regions. A total of 372 RGAs co-locating with 314 QTNs were thus detected. Among them, Lu1-3420323, Lu2-23730537, Lu8-22525597, Lu9-1067536, Lu10-16054459, Lu12-1874446, Lu13-2227366 and Lu15-14719354 were located in the following RGAs per se: Lus10042324 (RLK), Lus10030634 (RLK), Lus10015350 (TNL), Lus10028975 (TM-CC), Lus10022900 (CNL), Lus10023329 (TN), Lus10026988 (RLK), and Lus10014810 (RLK), respectively (Table 8.5, Fig. 8.3).
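The window scan can be sketched in Python as a simple interval-overlap test. The sketch reads the chromosome and position from the QTN name (e.g., Lu8-22525597 on Chr 8) and assumes that the window extends 200 kb on each side of the QTN, which is one possible reading of the description above. The RGA names and coordinates are hypothetical.

```python
def parse_qtn(name):
    """Split a QTN name such as 'Lu8-22525597' into (chromosome, position)."""
    chrom, pos = name.split("-")
    return int(chrom[2:]), int(pos)


def rgas_near(qtn_name, rgas, flank=200_000):
    """Names of RGAs overlapping the window around a QTN.

    rgas: iterable of (gene, chromosome, start, end) tuples.
    flank: assumed window half-width in bp on each side of the QTN.
    """
    chrom, pos = parse_qtn(qtn_name)
    lo, hi = pos - flank, pos + flank
    return [gene for gene, g_chrom, start, end in rgas
            if g_chrom == chrom and start <= hi and end >= lo]


def rgas_containing(qtn_name, rgas):
    """Names of RGAs whose coordinates contain the QTN itself."""
    return rgas_near(qtn_name, rgas, flank=0)


# Hypothetical RGA coordinates, for illustration only.
toy_rgas = [
    ("rga_inside", 8, 22_524_000, 22_527_000),
    ("rga_near", 8, 22_700_000, 22_703_000),
    ("rga_far", 8, 23_000_000, 23_004_000),
    ("rga_other_chr", 9, 22_525_000, 22_526_000),
]
near = rgas_near("Lu8-22525597", toy_rgas)
inside = rgas_containing("Lu8-22525597", toy_rgas)
print(near, inside)
```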

Table 8.5 Quantitative trait nucleotides (QTNs) and putative candidate genes associated with pasmo resistance

We further analyzed the 67 stable and large-effect QTN subset and found that 45 QTNs co-localized with 85 RGAs (Table 8.5), representing all four types, i.e., RLP, RLK, NBS coding genes, and those encoding transmembrane coiled-coil proteins (TM-CC) (Sekhwal et al. 2015). RLKs accounted for 36.47% of RGAs, while TNLs for 22.35% (He et al. 2019b).

Of note, Chr 8 contains an important genomic region associated with pasmo resistance. A total of 49 QTNs were identified on Chr 8, and nine of them were classified as stable and major QTNs with nearby candidate genes (Table 8.5). QTNs Lu8-18251174 (R2 = 10.38%) and Lu8-18447612 (R2 = 11.66%) both co-located with TNL gene clusters. Lu8-18251174 had high LD correlations with both Lus10007830 (NL) and Lus10007831 (TNL), while Lu8-18447612 was significantly correlated with Lus10007790 (TNL) (Fig. 8.6a). In addition, QTN Lu8-22525597 (R2 = 2.74%) is located within the TNL gene Lus10015350 (Table 8.5 and Fig. 8.6b). Besides TNL genes in this genomic region, the RLK gene Lus10016620 was also found to be significantly correlated with QTN Lu8-14317356 (R2 = 14.32%) (Fig. 8.6c).

Fig. 8.6

Linkage disequilibrium plots for three QTNs associated with pasmo resistance. (a) QTN Lu8-18447612 (R2 = 11.66%) co-located with the gene Lus10007790 (TNL); (b) QTN Lu8-22525597 (R2 = 2.74%) located within the gene Lus10015350 (TNL); (c) QTN Lu8-14317356 (R2 = 14.32%) co-located with the gene Lus10016620 (RLK). The values in parentheses after QTN names are R2 values

Lus10031043 (RLK) and Lus10020016 (CNL) are two candidate genes that co-locate with the major QTNs Lu9-6270375 and Lu12-474480, respectively. These two genes are orthologous to the Arabidopsis resistance genes AT5G20480.1 and AT3G07040.1 (RPM1), respectively (Xiang et al. 2008; Saijo et al. 2009). AT5G20480.1 encodes a leucine-rich repeat receptor kinase (LRR-RLK) that acts as the receptor (EFR) for the bacterial pathogen-associated molecular pattern (PAMP) EF-Tu. The LRR-RLK EFR recognizes the bacterial epitope elf18, which is derived from elongation factor Tu, and then activates the plant immune response (Saijo et al. 2009). The Pseudomonas syringae effector AvrPto has been shown to bind receptor kinases, including Arabidopsis LRR-RLK EFR, inhibit plant PAMP-triggered immunity and elicit strong immune responses (Xiang et al. 2008). RPM1 has a tripartite nucleotide binding site at the N-terminus and a tandem array of leucine-rich repeats at the C-terminus, and it confers resistance to P. syringae strains that carry the avirulence genes avrB and avrRpm1. The RPM1 gene thus confers dual specificity to pathogens expressing either of the two unrelated P. syringae avirulence genes (Grant et al. 1995). Therefore, Lus10031043 and Lus10020016 are two additional candidate genes deserving further functional analyses.

8.3.5 Genomic Evaluation of the Resistance Germplasm

Flax has two morphotypes: seed and fiber. Pasmo resistance correlates with these morphotypes. Significant correlations between morphotype and pasmo ratings (r = 0.49, p < 0.00001) as well as between morphotype and NFAs (r = −0.65, p < 0.00001) were observed in the diversity panel, which comprised 80 fiber and 290 linseed accessions (Fig. 8.7). Fiber accessions generally appeared to be more resistant to pasmo than linseed accessions. This likely indicates that fiber flax breeders have invested greater effort in breeding for pasmo resistance than linseed breeders because fiber flax quality can be greatly affected by high pasmo incidence. Aside from artificial selection by breeders, long-term natural selection and probably independent domestication of fiber flax may also account for the difference in pasmo resistance between the morphotypes (Fu et al. 2012).

Fig. 8.7

Source Modified from He et al. (2019b)

Boxplots of flax morphotypes in terms of flax pasmo ratings and number of favorable alleles in the accessions.

A wide range of pasmo resistance was observed in the core collection (You et al. 2017a), allowing further investigations on a genomic scale. Making use of the QTN information of the genotypes, we identified 14 accessions with resistant phenotypes and high numbers of favorable alleles (Table 8.6). For instance, the fiber accession CN19001 from the Netherlands and the linseed accession CN101367 from Georgia have average pasmo ratings of 2.0 and 1.8 and 354 and 351 favorable alleles, respectively. The accessions CN40081 and CN33390 from the Netherlands had the most favorable alleles but slightly higher pasmo ratings than the previous two. It is also notable that ten of the 14 resistant accessions are fiber types. These fiber and linseed accessions are good parents to further improve flax resistance to pasmo through cross breeding and the pyramiding of favorable alleles into elite varieties.

Table 8.6 Genetic resources resistant to pasmo disease identified by genomic and phenotypic evaluation

8.3.6 Evaluation of Genomic Selection (GS)

For complex quantitatively inherited traits, the major purpose of genomewide QTL identification is to provide molecular markers for breeding selection. Some large-effect QTLs, such as Lu9-4333365, Lu4-14213405, Lu5-14838893, Lu4-13813266 and Lu9-1896658, have R2 values exceeding 17% and could be useful for MAS, but most of the QTNs identified have small allele effects, which would not be considered for MAS but could be valuable for GS. To explore the value of these QTNs in GS, we first assessed the efficiency of various GS models to identify the best model for GS of pasmo resistance. The GS models RR-BLUP, GBLUP, BL, BRR, BayesA, BayesB, BayesC, RFR, RKHS and SVR were evaluated using the 500 QTN set as marker input and the five-year average pasmo rating dataset as the phenotype. Five-fold cross-validation revealed the same prediction ability (r) of 0.92 for nine of the ten models, the exception being RFR, which had a prediction ability of 0.79 (Fig. 8.8).
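The evaluation scheme can be illustrated with a minimal, standard-library Python sketch of ridge regression (the estimator underlying RR-BLUP) combined with random five-fold cross-validation. Several simplifications are assumptions of this sketch: the shrinkage parameter lam is fixed rather than estimated from variance components, prediction ability is taken as the mean Pearson correlation between predicted and observed values in the held-out folds, and the marker data and phenotypes are simulated.

```python
import random
from statistics import mean


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_fit(X, y, lam):
    """Return (intercept, marker effects) of a ridge regression."""
    p = len(X[0])
    cm = [mean(row[j] for row in X) for j in range(p)]
    ym = mean(y)
    Xc = [[row[j] - cm[j] for j in range(p)] for row in X]
    # Normal equations on centered data: (Xc'Xc + lam I) b = Xc'(y - ym)
    A = [[sum(r[j] * r[k] for r in Xc) + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    rhs = [sum(r[j] * (v - ym) for r, v in zip(Xc, y)) for j in range(p)]
    b = solve(A, rhs)
    return ym - sum(c * bj for c, bj in zip(cm, b)), b


def pearson(u, v):
    mu, mv = mean(u), mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = (sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v)) ** 0.5
    return num / den


def cv_prediction_ability(X, y, lam=1.0, k=5, seed=0):
    """Mean correlation of predicted and observed values over k folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    rs = []
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        mu, b = ridge_fit([X[i] for i in train], [y[i] for i in train], lam)
        pred = [mu + sum(x * bj for x, bj in zip(X[i], b)) for i in fold]
        rs.append(pearson(pred, [y[i] for i in fold]))
    return mean(rs)


# Simulated data: 60 homozygous lines, 10 additive markers coded 0/2.
rng = random.Random(1)
effects = [(-1) ** j * (0.2 + 0.1 * j) for j in range(10)]
X = [[rng.choice((0, 2)) for _ in range(10)] for _ in range(60)]
y = [5.0 + sum(x * e for x, e in zip(row, effects)) + rng.gauss(0, 0.3)
     for row in X]
r = cv_prediction_ability(X, y)
print(round(r, 2))
```

Because the simulated trait is purely additive with little noise, the sketch returns a high prediction ability; the value has no bearing on the results reported for pasmo resistance.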

Fig. 8.8

Comparison of prediction ability (r) of ten genomic selection (GS) models. The GWAS-derived 500 QTN subset (QTL-500) with the five-year average pasmo rating dataset were used for GS model construction

We further evaluated GS models with different marker sets to determine the best marker set for developing a GS model for pasmo resistance. Six different marker sets were tested with the six pasmo phenotype datasets using the random five-fold cross-validation scheme. The marker sets were three SNP data sets (SNP-66723, SNP-9415 and SNP-3057) and three QTL data sets (QTL-500, QTL-134 and QTL-67). SNP-66723 was selected from the 258,873 SNP data set by a Pearson’s χ2 test with Yates’ continuity correction to identify all SNPs related to pasmo ratings. SNP-9415 and SNP-3057 are two subsets of SNP-66723 that were selected with probability value thresholds of 0.01 and 0.001, respectively. QTL-500, QTL-134 and QTL-67 represent the 500 GWAS-derived unique QTLs, the 134 statistically stable QTLs and the 67 non-redundant and stable QTLs, respectively. QTL-67 is contained in QTL-134, which is in turn contained in QTL-500. RR-BLUP was used to construct the GS models. The GS models with QTL markers consistently outperformed those with SNP markers for all pasmo phenotypic datasets (Fig. 8.9), similar to our previous results on seven breeding target traits (Lan et al. 2020).
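The χ2 statistic with Yates’ continuity correction can be written out for the simplest case of a 2 × 2 table. How the contingency tables were constructed for the SNP selection is not described here, so the layout used in this Python sketch (two alleles by two phenotype classes) and the counts are assumptions made only to show the calculation.

```python
import math


def yates_chi2(a, b, c, d):
    """Chi-square test with Yates' correction for the table [[a, b], [c, d]].

    Returns (chi2, p), where p is the upper-tail probability of a
    chi-square distribution with one degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0
    # The correction subtracts n/2 from |ad - bc|, floored at zero.
    diff = max(0.0, abs(a * d - b * c) - n / 2.0)
    chi2 = n * diff ** 2 / denom
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical counts: rows are alleles, columns are phenotype classes.
chi2_assoc, p_assoc = yates_chi2(30, 10, 10, 30)
chi2_null, p_null = yates_chi2(20, 20, 20, 20)
print(round(chi2_assoc, 2), p_assoc < 0.001, p_null)
```

In this toy example the first table would pass both the 0.01 and 0.001 thresholds, whereas the second shows no association.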

Fig. 8.9

Comparison of prediction ability (r) of RR-BLUP prediction models constructed using six different marker sets and the five-year average pasmo rating dataset using a random five-fold cross-validation scheme. SNP-66723 is a SNP subset selected from 258,873 SNPs by a Pearson’s χ2 test with Yates’ continuity correction to identify all SNPs statistically correlated with pasmo ratings. SNP-9415 and SNP-3057 are two subsets of SNP-66723 that were selected at different probability thresholds. QTL-500, QTL-134 and QTL-67 represent the 500 unique QTL, the 134 stable QTL and the 67 non-redundant QTL subsets identified by GWAS, respectively. QTL-67 is contained in QTL-134, which is in turn contained in QTL-500

Among the three QTL-marker-based GS models, those built from QTL-500 significantly outperformed those from QTL-134 and QTL-67, indicating that at least a portion of the minor-effect QTNs contribute positively to the development of the GS models. The similar prediction ability of the two smaller marker sets was anticipated since QTL-67 is fundamentally a non-redundant subset of QTL-134. These GS prediction results indirectly serve as a validation of the QTLs identified via GWAS. In addition, a prediction ability as high as 0.92 in the GS models clearly illustrates the effectiveness of genomic prediction for pasmo resistance when a comprehensive range of stable and environment-specific QTLs of both large and small effect is employed.

8.4 Future Perspectives

Resistance to diseases such as pasmo, Fusarium wilt and powdery mildew is a complex quantitative trait in flax. The conventional approach to flax genetic improvement still involves cross breeding through hybridization of two parents followed by offspring segregation and phenotypic selection. In this conventional approach, the quantitative inheritance of these disease resistances impedes the rapid pyramiding of desirable or resistant alleles/genes from donor parents into a single plant, resulting in slow progress in resistance breeding for these biotic stresses in flax. To date, the majority of registered flax varieties are moderately resistant to pasmo, Fusarium wilt and powdery mildew. However, large-scale QTL identification through linkage-based QTL mapping and GWAS has already identified a large number of QTLs associated with biotic stresses in flax, including large- and minor-effect QTLs. QTL markers identified from the flax core collection offer the potential to enhance the selection accuracy and efficiency of cross breeding through GS. In addition, QTL markers of parents can be combined with genetic simulation to generate virtual crosses and their offspring populations (Khan et al. 2022). GS can then be applied to predict the general combining ability (GCA) of parents and the specific combining ability (SCA) of the virtual crosses, which facilitates parent selection and the design of the best crosses.

The concept of “breeding by design” was proposed by Peleman and Voort (2003), aiming to gather favorable alleles or QTLs associated with breeding target traits from potential genetic resources to develop superior varieties. We have identified an array of QTNs related to the traits of interest, including biotic stresses, and deciphered the distribution of the favorable alleles among the genetic resources. We also found that the identified QTNs were primarily additive. This offers a genomic approach to evaluate all genetic resources based on their genomewide QTN content. Furthermore, based on the complementarity of favorable alleles among parents, suitable parents can be selected to “design” potential superior varieties. Such varieties may combine all favorable alleles and can be developed through conventional breeding, MAS and GS.
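Parent selection by complementarity of favorable alleles can be sketched in Python as follows. The sketch assumes homozygous parents and purely additive QTNs, so that the favorable alleles attainable in the progeny of a cross are the union of those carried by the two parents. The parent names and QTN identifiers are hypothetical.

```python
from itertools import combinations


def best_pair(favorable):
    """Return the pair of parents with the most complementary alleles.

    favorable: dict mapping each parent to the set of QTNs at which
    it carries the favorable allele.
    """
    def attainable(pair):
        first, second = pair
        return len(favorable[first] | favorable[second])

    return max(combinations(sorted(favorable), 2), key=attainable)


# Hypothetical parents and QTN identifiers, for illustration only.
parents = {
    "P1": {1, 2, 3, 4},
    "P2": {1, 2, 3, 5},
    "P3": {5, 6, 7},
}
pair = best_pair(parents)
print(pair, len(parents[pair[0]] | parents[pair[1]]))
```

Here P1 and P2 each carry the most favorable alleles, yet the cross P1 × P3 is preferred because the two parents complement each other at more QTNs.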

Candidate genes have been predicted for some of the significant QTNs, but validation and characterization of these candidate genes via functional genomic approaches remain challenging. Once their functions are validated and functional markers are developed, precision breeding through gene editing (GE) technologies is expected to be a revolutionary strategy for the rapid and accurate pyramiding of multiple resistance genes into elite flax varieties. The impending first successful application of GE in flax has the potential to accelerate the deployment of precision breeding technologies in flax genetic improvement.