1 Introduction

1.1 General Introduction

Alleles are different forms of a gene and affect a particular process in different ways. Different combinations of alleles may result in different phenotypes. Plant breeders try to improve varieties by introducing new alleles, resulting in higher yields and better quality or resistance characteristics. Identifying new, promising alleles is not an easy task. In the post-genomics era, mining of a crop’s (wild) gene pool for novel and superior alleles for agronomically important traits is becoming more and more feasible. Genebanks all over the world contain huge untapped resources of distinct alleles that may have potential application in crop breeding programs. This hidden diversity, which can consist of naturally occurring sequence variation in coding or regulatory regions of genes, can be explored by allele mining (Ramkumar et al. 2010; Varshney et al. 2005, 2009). The variation includes single nucleotide polymorphisms (SNPs) as well as insertions and deletions (InDels), which can change the resulting phenotype by altering the amount of protein or its structure and/or function (Ramkumar et al. 2010). Recent rapid advances in the field of genomics have led to the accumulation of enormous amounts of sequence information and to fast-evolving bioinformatic tools, which pave the way for identifying, characterizing, isolating, and deploying previously unknown or under-utilized sources of genetic variation.

In this chapter we consider allele mining as the research field that aims at unlocking the genetic diversity existing in genetic resource collections (genebanks) and artificially created mutant populations by identifying allelic variants of genes and loci. Since resistance genes occur in clusters, where allelic relationships are often not clear (Sanchez et al. 2006; Millett et al. 2007), and because paralogs in a cluster can have different functions, the scope of this chapter is broader than allele mining alone. To deal with this we introduce the concept of paralog mining. Paralog mining is the identification of a gene within a cluster of highly homologous genes with different, often unknown, functions. Paralog mining can be used as a tool to generate molecular markers, and in combination with functional screens it can be used to identify new genes conferring resistance to a particular pathogen. In this review we discuss how allele and paralog mining can help to improve disease resistance in Solanum crops.

1.2 Solanaceae Resources

The family of Solanaceae is of high economic importance and is composed of more than 3,000 species, which include important crop and model plants such as potato (Solanum tuberosum), tomato (Solanum lycopersicum) and eggplant (Solanum melongena) (Knapp 2002), but also wild species occurring in very different habitats (Spooner and Hijmans 2001; Spooner et al. 2004). About 15,000 wild potato accessions are being maintained in large collections worldwide, and the establishment of core and mini collections that enable an effective use of the existing variation in gene banks while maintaining the variability has been proposed before (Frankel and Brown 1984; Hoekstra 2009). Allele mining requires the assembly of a reasonably sized core germplasm collection, usually comprising ~1,000 accessions representative of the genetic diversity existing in the global population (Hofinger et al. 2009). Such collections can effectively be constructed using the Focussed Identification of Germplasm Strategy (FIGS) approach (Mackay et al. 2004; Bhullar et al. 2009).

The genome sequences of potato (Potato Genome Sequencing Consortium et al. 2011) and tomato (The Tomato Genome Consortium et al. 2012) will facilitate mining for novel alleles or paralogs of resistance (R) genes. These may be found in the largely untapped resources of crossable species within the genus Solanum, allowing their exploitation in breeding programs. Also, insight into sequence diversity at the R gene loci in wild Solanum species with different resistance responses to economically important diseases will not only result in a better understanding of the mechanism of R gene functionality and evolution, but can also help to identify new alleles or paralogs with different race specificities and to develop allele-specific diagnostic markers for marker assisted breeding.

1.3 Resistance Genes

If a gene is responsible for the resistance of a particular plant to a particular pathogen, this gene is called a resistance (R) gene. To date, more than 100 R genes which confer resistance to a diversity of pathogens including bacteria, fungi, oomycetes, viruses, insects and nematodes have been identified and/or cloned from various plants, by a wide variety of methods including map-based cloning, transposon tagging, and similarity based DNA library screening (Sanchez et al. 2006; Ingvardsen et al. 2008; Vleeshouwers et al. 2011a). An overview of mapped and cloned R genes from Solanaceae is given in Fig. 2.1.

Fig. 2.1

Genetic locations of disease resistance traits in Solanaceae. Twelve linkage groups are shown and the position of R genes is indicated. The R genes for potato are underlined and those for other species, mainly tomato, are not underlined. Map segments having QTL for resistance to Phytophthora infestans in potato are shown in black color

R genes often encode receptors for pathogen derived ligands and they are classified based on the combination of different domains (e.g. CC = coiled coil, TIR = Toll interleukin receptor, protein kinase, NBS = nucleotide binding site, Lec = lectin, and LRRs = leucine rich repeats). Five classes can be identified: transmembrane proteins with extracellular LRRs (receptor like proteins, RLPs), transmembrane proteins with extracellular LRRs and an intracellular protein kinase (receptor like kinases, RLKs), transmembrane proteins with an extracellular “lectin like” domain and an intracellular protein kinase (lectin receptor kinases, LecRKs), and intracellular NBS-LRR proteins, which can be divided into CC-NBS-LRR and TIR-NBS-LRR proteins (Dubery et al. 2012). The NBS-LRR class is the most abundant and has been extensively studied (Hulbert et al. 2001). Although NBS-LRR genes are assumed to cause pathogen race specific (or vertical) resistance, it has also been suggested that members of the NBS-LRR gene family are candidates for quantitative trait loci (QTL) that are responsible for horizontal resistance (Rietman et al. 2012; Sanz et al. 2012; Gebhardt and Valkonen 2001). Most characterized plant NBS-LRR genes are physically clustered in the plant genome. The homologous sequences in such a cluster are referred to as paralogs (Gebhardt and Valkonen 2001), and paralogs can confer resistance to different isolates of the same pathogen (Dodds et al. 2001; Li et al. 2011; Lokossou 2010) or to different pathogens (Dodds et al. 2001; van der Vossen et al. 2000). Some paralogs may also be considered molecular fossils of evolution, whose activity is unclear or even absent; for example, many pseudogenes have been found. In most R gene clusters the number of paralogs is very high and an allelic relationship is often hard to determine (Kuang et al. 2004). However, as the genome structure between species in the Solanaceae family is highly conserved, positional conservation of R gene clusters (synteny) is observed across Solanaceous species (Grube et al. 2000; Park et al. 2009; Fig. 2.1). Therefore, even when relatively unknown genetic sources are used, it is likely that the genes conferring resistance are linked to syntenic clusters of R genes known from well-studied species like potato and tomato.

Not only the identification of new alleles is important; the functional characterisation of the identified alleles is equally important to assess the added value of a new allele over alleles that are already present in crop plants. Many approaches have already been used, and especially the currently booming research field of effector genomics, through which the identification of Avr genes is greatly accelerated, offers fast functional assays to distinguish the activity of newly identified R gene alleles and paralogs. Thus, allele mining approaches coupled with effector profiling enable the discovery of novel R genes at an unprecedented rate (Vleeshouwers et al. 2008, 2011a).

2 Functional Resistance Screens

2.1 Screening for Disease Resistant Accessions in Gene Bank Material

Several methods are available to carry out phenotypic screens for disease resistance in gene bank collections. Here we use the evaluation of potato germplasm for late blight resistance as an example. Inoculation of entire in vitro plantlets or inoculation of detached leaves can be used as high throughput screening methods (Vleeshouwers 2011b). In the case of race specific resistance, the selection of the pathogen isolates is an important issue in the identification of major R genes. If the selected isolate happens to be compatible with the R gene(s) in a particular accession, these R genes may be overlooked. Multiple isolates can be used to distinguish the different R genes in a particular resistant accession (Huang et al. 2005; Verzaux 2010). Complementary to working with the entire pathogen, effector responsiveness can be used to identify and classify R gene alleles in a germplasm core collection (Rietman et al. 2010). In such an effectoromics approach, effectors or potential Avr genes from the pathogen are expressed in the plant using agro-infiltration or through inoculation with recombinant Potato Virus X, referred to as agro-infection. Upon recognition of the effector by the R gene expressed by the plant, a defence reaction, referred to as the hypersensitive response (HR), is initiated, which is visible as necrotic lesions in the infiltrated leaf (Fig. 2.2). The agro-infiltration test appears to be applicable and reliable for many genotypes, and the variation in the HR in different genetic backgrounds is limited. The use of specific pathogen isolates and the use of specific effectors can also be employed to identify functional groups of R genes in germplasm of crop wild relatives. Within groups of functionally similar R genes, a true allele mining approach can be pursued in order to identify (sequence) variation. The functional grouping of R genes can also be employed to reduce the redundancy that is inevitably present in germplasm collections. Another virtue of the effectoromics approach was shown recently. Potato plants which have shown durable resistance to late blight contained stacks of different R genes (Verzaux 2010; Kim et al. 2012). The polygenic nature of the resistances could easily be characterised using the segregation patterns of the different effector responses. Effectors which elicited an HR in germplasm screens are potential Avr gene(s) recognized by the cognate R gene. These potential R-Avr interactions should be validated by additional genetic studies, ideally by showing cosegregation of responses to the effector with resistance to P. infestans isolates in segregating populations.

Fig. 2.2

Effector induced hypersensitive response (HR) in Solanum tuberosum genotype MaR8. Available effectors from Phytophthora infestans were applied using agroinfiltration in leaves of the resistant plant MaR8. The dotted circles surround the infiltrated leaf area. The red dotted circles surround the effectors that elicited an HR. These effectors are selected for validation of R-Avr interactions by additional genetic analysis

2.2 QTL Mapping/LD Mapping

Plant pathogen resistance, at the phenotypic level, often does not behave as a trait controlled by a single R gene but as a quantitative trait that is controlled by multiple genetic and environmental factors (Trognitz et al. 2002; Bai 2003). Understanding the molecular basis of quantitative traits will facilitate diagnosis and the combination of superior alleles in crop improvement programs. The possible approaches to mapping genes that underlie quantitative traits fall broadly into two categories: candidate gene studies, which use either association or resequencing approaches, and genome-wide studies, which include both linkage-based QTL mapping and genome-wide association studies (GWAS). In this review, we do not discuss GWAS further because of the extensive review by Hirschhorn and Daly (2005). Linkage disequilibrium (LD) mapping, or association analysis based on candidate genes, is also considered an allele mining approach (Malosetti et al. 2007).

Some cases of close linkage between an R gene and quantitative trait loci (QTL) for pathogen resistance support the hypothesis that qualitative and quantitative resistance have a similar molecular basis (Leonards-Schippers et al. 1994), thereby suggesting that genes showing sequence similarity to R genes are candidates for being factors underlying quantitative resistance (Rickert et al. 2003; Rietman et al. 2010). Candidate genes participating in the control of quantitative resistance to pathogens are those involved in the disease response network: (i) R genes, which recognize the pathogen and trigger the resistance response, (ii) genes which are involved in signal transduction pathways, and (iii) the large group of pathogenesis-related (PR) genes, which are expressed in response to pathogen attack and are involved in the execution phase of the defence response (reviewed by Gebhardt and Valkonen 2001).

The genetic dissection of complex plant traits into QTLs first became possible with the advent of DNA-based markers (Osborn et al. 1987). The first genes and their allelic variants underlying plant QTLs have been identified by positional cloning (reviewed in Salvi and Tuberosa 2005). Positional QTL cloning is a labor- and time-consuming process which requires the generation and analysis of large experimental mapping populations. An alternative to positional cloning of QTLs may be the allele mining approach, which is based on knowledge of a gene’s function in controlling a characteristic of interest on the one hand, and genetic co-localization of a functional candidate gene with a QTL of interest on the other (Pflieger et al. 2001; Faino et al. 2011). However, this approach requires substantial a priori knowledge. DNA variation for genes fulfilling these criteria has been examined in natural populations of accessions related by descent for associations with positive or negative characteristic values (Li et al. 2005; González-Martínez et al. 2007). Finding such associations indicates that DNA variation either at the candidate locus itself or at a physically linked locus is causal for the phenotypic variation, but the evidence for involvement of the gene itself remains circumstantial and falls short of definitive proof.

3 Techniques for Allele Mining

Depending on the research question, but also on the genetic, genomic and financial resources available, several techniques can be used for allele mining, ranging from a rapid and inexpensive polymerase chain reaction (PCR) to next generation sequencing and everything in between. For some applications (partial) sequence information or only molecular polymorphism of the alleles is sufficient. For other applications actual cloning of the entire allele is required. Generally, all DNA based tools require the careful selection of target genes. The target gene model might require verification, and subsequently a careful design of selective primers will allow the identification of novel alleles at candidate loci in the entire or core germplasm collection. In Fig. 2.3 a pipeline for novel allele discovery from germplasm collections is presented, including a combination of different approaches.

Fig. 2.3

Pipeline of allele mining of R genes. a Allele mining when effector tools are not available for the pathogen under study, b joint effector profiling and allele mining approach, and c novel R gene discovery through the combination of NBS profiling and allele mining

3.1 Molecular Tools for Allele Tagging

All molecular marker techniques include a PCR amplification of one or multiple alleles or paralogs. In order to identify polymorphisms between amplified alleles, single-strand specific nucleases can be applied. With this technique, which is often used in TILLING approaches, nicking of heteroduplexes of PCR products can be easily detected. A recent development is the use of high resolution melting point analysis in order to screen for mismatches between amplified alleles in a high throughput fashion. Especially suitable for the highly polymorphic and duplicated R genes is the NBS profiling technique (van der Linden et al. 2004). It is a powerful tool to identify specific fragments of candidate R genes or R gene homologs throughout the genome by using degenerate primers that anneal to conserved sequences in the NBS domain of the NBS-LRR class of R genes. A high throughput application of this technique is to study fragment length polymorphisms as molecular markers. Also, PCR amplification of specific R genes is possible if primers are located in unique regions in order to target the specific paralog. The results may be visible as a DNA fragment of a specific size on an agarose gel. However, when gene specific markers are used in different germplasm material, nonspecific annealing of the primers can often occur, and therefore it will always be necessary to sequence the resulting PCR fragments to confirm their identity and homogeneity.

3.1.1 NBS Profiling

Many plant R genes are members of a multigene cluster composed of multiple copies with high sequence similarity (Song et al. 2003). The NBS regions of (NBS-LRR) R genes and their analogs (RGAs) contain highly conserved common motifs like the P-loop, the kinase-2 motif and the GLPL motif (Meyers et al. 2003; Monosi et al. 2004). These conserved motifs within the NBS-LRR genes have been used successfully to sequence (parts of) NBS regions from various plant species (Collins et al. 1998; Pflieger et al. 1999; Zhang and Gassmann 2007). NBS profiling uses the conserved motifs for efficient tagging of NBS-LRR type R genes and their analogs (van der Linden et al. 2004, 2005). The technique involves three steps: (1) digestion of genomic DNA with a restriction enzyme and ligation of adaptors to compatible restriction ends; (2) PCR amplification of NBS containing fragments using an NBS primer and an adaptor primer; (3) separation of amplified fragments by polyacrylamide gel electrophoresis. The technique produces a multilocus profile of the genome.
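The three steps of NBS profiling can be mimicked in silico to see why a sequence change at a restriction site shows up as a band of different length. The following is a minimal sketch under simplifying assumptions: only one strand is considered, adaptor length is ignored, and the restriction site, cut offset, degenerate primer pattern and toy sequences are all hypothetical and were chosen for illustration only; they are not taken from the cited protocols.

```python
import re

def digest(seq, site, cut_offset):
    """Step 1: cut seq at every occurrence of the restriction site.

    cut_offset is the cut position counted from the start of the site.
    """
    pattern = "(?=" + re.escape(site) + ")"  # lookahead finds overlapping sites too
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]

def nbs_profile(seq, site, cut_offset, primer_regex):
    """Steps 2 and 3: keep fragments carrying the primer site and report
    the length from the primer start to the adaptor-ligated fragment end."""
    lengths = []
    for fragment in digest(seq, site, cut_offset):
        hit = re.search(primer_regex, fragment)
        if hit:
            lengths.append(len(fragment) - hit.start())
    return sorted(lengths)

# Hypothetical toy data: accession_b has lost the second restriction site.
TOY_PRIMER = "GG[ACGT]ATGGG"  # illustrative degenerate pattern, not a published primer
accession_a = "ACGTTTAACCGGAATGGGCACACATTAAGGGGCC"
accession_b = "ACGTTTAACCGGAATGGGCACACATTCAGGGGCC"

profile_a = nbs_profile(accession_a, "TTAA", 1, TOY_PRIMER)
profile_b = nbs_profile(accession_b, "TTAA", 1, TOY_PRIMER)
```

In this toy case the two accessions yield bands of different length from the same NBS primer site, which is the kind of fragment length polymorphism that is scored on the gel.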

NBS profiling can easily be adapted to target other conserved gene families, which is referred to as motif-directed profiling (van der Linden et al. 2004, 2005). NBS profiling can also be adapted to target particular R gene clusters. R genes from the same cluster usually share sequence similarities that are absent from other R genes (McDowell and Simon 2006; Meyers et al. 2005), allowing the design of specific primers for a particular R gene cluster. NBS profiling can therefore also be adapted to reach high fragment saturation in an R gene cluster of interest (Verzaux et al. 2011, 2012; Jo et al. 2011). This technique is referred to as cluster directed profiling.

3.1.2 (Eco)TILLING

EcoTILLING is a molecular method to screen germplasm core and mini collections. This technique is distinct from the TILLING approach, since TILLING screens identify novel alleles that are induced by mutagenesis (Till et al. 2003; Barkley and Wang 2008), whereas EcoTILLING identifies naturally occurring alleles in germplasm (Barone et al. 2009). Both approaches employ a similar screening method to identify variation in alleles. Polymorphisms in PCR amplified DNA fragments are detected in heteroduplexes of the amplicons using single-strand specific nucleases, high resolution melting point analysis, or deep sequencing on next generation sequencing platforms.
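The logic of nuclease-based heteroduplex screening can be illustrated with a small sketch. It assumes two amplicons that are already aligned and of equal length, treats every mismatch as a cut site, and approximates the two resulting fragment sizes by the position of the mismatch; real assays involve both strands, incomplete digestion and InDels, none of which are modelled here. The function name and the toy sequences are hypothetical.

```python
def heteroduplex_fragments(reference, test):
    """Return (left_length, right_length) for every mismatch position.

    Each pair approximates the two fragments expected when a single-strand
    specific nuclease cuts a reference/test heteroduplex at that mismatch.
    """
    if len(reference) != len(test):
        raise ValueError("amplicons must be aligned to equal length")
    mismatches = [i for i, (a, b) in enumerate(zip(reference, test)) if a != b]
    return [(i, len(reference) - i) for i in mismatches]

# Hypothetical 10 bp amplicons differing by one SNP at index 4.
fragments = heteroduplex_fragments("ACGTACGTAC", "ACGTTCGTAC")
```

The two fragment lengths of each pair add up to the amplicon length, which is how a cut position is inferred from a gel image; an accession identical to the reference gives an empty list.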

3.1.3 Amplification of Specific Allelic Variants

Family members with very similar sequences may have dispersed around the genome into non-syntenic loci, or may have remained within a genetic locus but multiplied, resulting in tandem or inverted repeats. In general, coding regions will be more conserved than flanking sequences. Depending on the downstream application (sequence comparison, in planta expression), primers are chosen inside or outside the coding sequence to amplify the entire gene, part of the gene, or only the open reading frame. Because even single nucleotide polymorphisms can be relevant differences between alleles, the DNA polymerase should preferably have proofreading activity. Also, because long stretches of the target gene are often amplified, a long range polymerase chain reaction (LR-PCR) polymerase is preferred. Examples of enzymes that harbour both characteristics are Pfu-Turbo from Invitrogen or Phusion from Fermentas. One approach is to amplify the entire coding sequence of the R gene of interest using primers annealing to the start and stop codon regions. Subsequently, the amplicon is sequenced, and for expression studies it can be cloned in a vector that harbours heterologous regulatory sequences. For some accessions a lack of amplification can be expected due to absence of a coding gene or to low sequence homology at the primer annealing sites. A drawback of this approach is that the promoter and terminator regions of the novel alleles are missing, so variation in these regulatory regions is neglected. For ‘true’ allele mining, the use of primers matching the promoter and terminator regions is feasible when sequence conservation is sufficient. Accessions may also first be screened for the presence of the known R gene with a diagnostic molecular marker obtained from haplotype studies at the R gene locus, and next for the presence of new alleles of the known R gene (Bhullar et al. 2009) to identify stronger alleles. Song et al. (2003) showed that allele mining could be used to clone the functional RB allele from a cluster with two highly similar paralogs. Also, Wang et al. (2008) and Lokossou et al. (2010) could specifically amplify the target allele rather than paralogous genes in an Rpi-blb1 allele mining study. Latha et al. (2004) exploited allele mining to identify stress tolerance genes in Oryza species and related germplasm. A common feature of the three genes investigated was that they were members of multigene families. Primers based on the 5′ and 3′ untranslated regions of genes were found to be sufficiently conserved over the entire range of germplasm in rice to which the concept of allelism is applicable, while the primers based on the start and stop codons amplified sequences from additional loci (Latha et al. 2004).
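Whether a primer pair targets a single paralog or several members of a cluster can be checked in silico before any wet-lab work. The sketch below is a deliberately simple illustration: it requires perfect primer matches, looks only at the given strand, and uses hypothetical toy sequences and primer names. It contrasts a primer placed in a paralog-specific region with a short primer at a conserved start codon region.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def in_silico_pcr(template, forward, reverse):
    """Return the amplicon length if both primers anneal perfectly, else None."""
    start = template.find(forward)
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so its reverse
    # complement must occur downstream of the forward primer.
    end = template.find(revcomp(reverse), start + len(forward))
    if end == -1:
        return None
    return end + len(reverse) - start

def targeted_paralogs(paralogs, forward, reverse):
    """Names of all paralogs from which the primer pair would amplify."""
    return [name for name, seq in paralogs.items()
            if in_silico_pcr(seq, forward, reverse) is not None]

# Hypothetical toy cluster: the two paralogs differ at a single base (index 4).
cluster = {
    "paralog_A": "ATGCCGTTAGGCTAACCGGTTTGA",
    "paralog_B": "ATGCTGTTAGGCTAACCGGTTTGA",
}
specific = targeted_paralogs(cluster, "ATGCCG", "TCAAACC")   # spans the variable base
conserved = targeted_paralogs(cluster, "ATG", "TCAAACC")     # start codon only
```

Here the primer spanning the variable base amplifies only `paralog_A`, while the conserved primer amplifies both, which mirrors the observation that start and stop codon primers can pick up additional loci in multigene families.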

3.2 Next Generation Sequencing in Allele Mining

In most genome and transcriptome sequencing projects, the Sanger sequencing methodology used in the past is currently being replaced by next generation sequencing (NGS) technologies. These NGS technologies are able to generate data inexpensively and at a rate that is several orders of magnitude faster than that of traditional technologies (reviewed in Ercolano et al. 2012). At present there are several next generation sequencers on the market (Voelkerding et al. 2009). Most of these systems have different underlying biochemistries, but all of them sequence populations of PCR-amplified DNA molecules. The Heliscope and the PacBio, which sequence single molecules, are the exceptions. The amount of sequence data and the length of the reads are increasing with the continued development of the technology. Resequencing and de novo sequencing of transcriptomes and genomes are now becoming more and more accessible for individual labs (Varshney et al. 2009). This will lead to the discovery of novel useful variation, the lack of which limited the application of sequence-based selection in plants in the pre-NGS era (Henry 2011). The availability of large numbers of genetic markers can facilitate linkage mapping and whole genome scanning (WGS)-based association genetics, which are of practical use for marker-assisted selection (MAS) in marker-deficient crops (Varshney et al. 2009). Resequencing of several genomes (Cao et al. 2011) followed by the comparison of all candidate R genes is now feasible in Arabidopsis (Guo et al. 2011). Soon this type of analysis will also be applied to crops and their wild relatives. Resequencing of parts of the genome with duplicated sequences, like R gene clusters, will remain a challenge, especially in heterozygous species like potato (Potato Genome Sequencing Consortium 2011). Single molecule sequencing will offer great opportunities for this research field (Koren et al. 2012).

3.3 Functional Analysis of Newly Identified Alleles

Once candidate genes have been identified in allele mining studies, their functionality needs to be confirmed. Both transient and stable transformation can be used for that purpose. Agroinfiltration, an Agrobacterium tumefaciens-based method, is currently the best developed and most reliable method for transient expression in plants (Vleeshouwers et al. 2011). Using this method, R gene alleles and Avr genes can be coexpressed in N. benthamiana (Bos et al. 2006) or other plant species (Rietman et al. 2010). When the R gene allele recognizes the Avr gene product, a hypersensitive response (HR) occurs in the infiltrated leaf area. This approach is only applicable in cases where the cognate Avr gene is available. Agroinfiltration can also be carried out by expressing only the R gene in a host plant, followed by pathogen challenge inoculations (Lokossou et al. 2009; Pel et al. 2009). Another type of transient expression, which allows high throughput screening, is agroinfection. A gene of interest is cloned into a viral genome. Subsequently, the viral genome is introduced into plant cells through A. tumefaciens. Only a few cells need to be infected, after which viral particles are formed that spread through the plant. Along with the virus, the gene of interest is expressed in the plant. There is, however, a limitation to the size of the gene to be expressed. Fragments over 500 bp in size will not express sufficiently.

Stable transformation of plants is still considered the functional analysis that provides the clearest and most definitive evidence. Transgenic plants can be tested for resistance at different developmental stages. For example, an in vitro inoculation assay was developed for routine high-throughput disease testing of Phytophthora infestans in potato (Huang et al. 2005).

4 Examples of Allele Mining in Solanum

As described in the previous section the technique of choice is highly dependent on the research question and application. Applications can be very diverse, ranging from very practical, like R gene mapping and cloning, to the identification of novel genetic resources, to more scientific applications like R gene geographic distribution and evolution. In this section examples of applications using the different techniques are presented.

4.1 Genetic Mapping

Mapping of R genes is strongly facilitated by allele mining through NBS profiling. Typical examples of this R gene mapping approach are provided by Pel et al. (2009), Jacobs et al. (2010), Jo et al. (2011) and Verzaux et al. (2011, 2012). The first step in the approach consists of producing small (n = 20–100) populations segregating for P. infestans resistance, phenotyping the populations for resistance, and composing bulks of resistant and susceptible individuals. Then, the bulks are genotyped using NBS profiling to obtain markers that co-segregate with resistance. Next, the co-segregating NBS fragments are sequenced and identified by BLAST analysis. Combining this information with literature and genome sequence data on mapping of resistance genes will suggest a putative map position. Finally, the map positions are confirmed using known flanking markers.
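The marker selection step in this workflow amounts to a simple set comparison, sketched below. The sketch assumes that each plant's NBS profile has been scored as a set of band sizes and that phenotypes are coded as "R" or "S"; the data structure, plant names and band sizes are hypothetical. It applies a strict criterion (band present in every resistant plant and absent from every susceptible plant), whereas real data with scoring errors or recombinants would need a more tolerant rule.

```python
def cosegregating_markers(profiles, phenotypes):
    """Return band sizes present in all resistant and no susceptible plants.

    profiles:   dict mapping plant name -> set of scored band sizes
    phenotypes: dict mapping plant name -> "R" (resistant) or "S" (susceptible)
    """
    resistant = [profiles[p] for p, ph in phenotypes.items() if ph == "R"]
    susceptible = [profiles[p] for p, ph in phenotypes.items() if ph == "S"]
    if not resistant or not susceptible:
        raise ValueError("need at least one resistant and one susceptible plant")
    shared_by_resistant = set.intersection(*resistant)
    seen_in_susceptible = set.union(*susceptible)
    return sorted(shared_by_resistant - seen_in_susceptible)

# Hypothetical scores for four plants of a small segregating population.
toy_profiles = {
    "plant1": {150, 210, 330},
    "plant2": {150, 330},
    "plant3": {150, 210},
    "plant4": {150},
}
toy_phenotypes = {"plant1": "R", "plant2": "R", "plant3": "S", "plant4": "S"}

candidates = cosegregating_markers(toy_profiles, toy_phenotypes)
```

In the toy data only the 330 band meets the criterion; such a band would be the one excised, sequenced and compared to the genome sequence in the subsequent steps.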

Jo et al. (2011) used NBS profiling and subsequent comparison of marker sequences to the potato and tomato draft genome sequences to identify the genetic position of the late blight resistance gene R8. According to this work, R8 is located on the long arm of chromosome IX and not on the short arm of chromosome XI, as was suggested previously by Huang et al. (2005). This is the first example where NBS markers could be directly landed in the sequenced (draft) genomes of potato and tomato. Through comparison of known markers in the tomato genetic map to the draft sequence, scaffolds were anchored to the tomato genetic map (anchored scaffold approach). Very recently, the R9-mediated late blight resistance was also mapped near the R8 locus on chromosome IX using R gene cluster directed profiling approaches (Jo et al. in preparation).

4.2 Cloning Functional Alleles

Several late blight R genes have been cloned from potato wild relatives using allele and paralog mining (for reviews see Vleeshouwers et al. 2011; Rietman et al. 2010). Sometimes there is no clear distinction between allele and paralog mining because of the high similarity among genes. An example of true allele mining was shown by Vleeshouwers et al. (2008), who isolated Rpi-sto1 and Rpi-pta1, the functional alleles of Rpi-blb1 present in S. stoloniferum. The entire genes were isolated by long range PCR using primers upstream and downstream of the coding regions. Specificity of the cloned genes was shown with different P. infestans isolates and with the effectors IpiO-1 and IpiO-2, which are recognized by Rpi-blb1, Rpi-sto1 and Rpi-pta1. An allelic relationship between the three genes was also shown using marker (CT88) segregation studies (Wang et al. 2008). Sequence analyses showed that the putative functional homologs Rpi-sto1 and Rpi-pta1 are nearly identical to Rpi-blb1, with only 3 and 5 non-synonymous nucleotide substitutions inside the coding sequence, respectively.

A slightly different example of allele mining was provided by Lokossou et al. (2009), who described the map based cloning and functional characterization of Rpi-blb3 and Rpi-abpt, which are allelic variants of R2 and R2-like. An allele mining strategy was employed using a start-stop codon approach. In this study a major technological improvement was made. The Gateway™ technology was used to clone the entire amplified coding sequences in a destination vector under the control of the Rpi-blb3 promoter and terminator. Efficient cloning of candidate alleles was combined with transient complementation assays in Nicotiana benthamiana, which allowed for the rapid cloning and identification of R2 and R2-like alleles.

Champouret (2010) used a similar technical approach to mine for R3a and R2 alleles. A start-stop codon approach was pursued, and the R3a screen revealed alleles with identical activity in distantly related species. This is considered true allele mining. The R2 screen also revealed many genes with identical activity; however, a few genes were identified which had slightly different recognition specificities, suggesting that not only alleles but also paralogs were mined. This is an example where allele mining and paralog mining overlap (Champouret 2010).

Paralog mining strategies can be pursued to facilitate map-based cloning of novel R gene variants. An example of successful paralog mining became available when an R2 mining approach was applied to S. microdontum. This resulted in the isolation of Rpi-mcd1, which is functionally distinct from R2 since the Avr2 gene was not recognised (Lokossou 2010). Another example is the cloning of the Rpi-vnt1.1 gene (Pel et al. 2009). NBS profiling revealed a fragment that co-segregated with resistance in an F1 population. The sequence of this NBS profiling band was similar to a known R gene (Tm-2²). PCR amplification of Tm-2² homologs identified the functional Rpi-vnt1.1 gene. The mined gene had a different genetic position from Tm-2², although on the same chromosome. Its biological activity was also different, and therefore this study followed a typical paralog mining approach. This study also illustrates a risk associated with paralog mining in multigene families (Pel et al. 2009). With the start and stop codon primer pair derived from Tm-2², only part of the coding sequence was identified, and an N-terminal extension specific for the Rpi-vnt1 alleles was overlooked. The entire coding sequence of the Rpi-vnt1.1 allele was found after sequence analysis of a BAC clone derived from the genomic locus (Foster et al. 2009).
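The risk of overlooking an N-terminal extension can be illustrated with a simple in silico check. The sketch below is not the method used in the cited studies, where the full coding sequence was recovered from a BAC clone. It assumes that genomic sequence upstream of the presumed start codon is available and that this region contains no introns; the function name `find_upstream_start` and the toy sequence are hypothetical. The function walks upstream in frame from the presumed start codon and reports the most distal ATG that can be reached without crossing a stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_upstream_start(genomic, presumed_start):
    """Return the index of the most upstream in-frame ATG.

    Scans codon by codon upstream of `presumed_start` (0-based index of the
    A in the presumed ATG) and stops at the first in-frame stop codon.
    Returns `presumed_start` itself if no upstream ATG is found.
    """
    seq = genomic.upper()
    best = presumed_start
    pos = presumed_start - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break  # the open reading frame cannot extend past a stop codon
        if codon == "ATG":
            best = pos  # a more distal candidate start codon
        pos -= 3
    return best


# Hypothetical toy locus: TAA ATG GCT GCT ATG AAA TGA
toy_locus = "TAAATGGCTGCTATGAAATGA"
presumed = 12  # start codon suggested by primers from a homologous gene
actual = find_upstream_start(toy_locus, presumed)
print((presumed - actual) // 3, "additional codons upstream")
```

A candidate found in this way is only a hint that primers based on a homolog may have truncated the gene; the true start of the coding sequence still has to be confirmed experimentally.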

4.3 Uncovering Allelic Variation for Specific Genes

Allele mining can also be used to uncover genetic variation for a particular R gene and to identify germplasm containing functional alleles from the same or different species. Nunziata et al. (2007) studied the variability of one cluster of genes at the Gro1 locus, responsible for resistance to Globodera rostochiensis race Ro1, in several potato species. The cluster is known to comprise 10 different paralogs, among which only the Gro1-4 gene has been demonstrated to confer resistance against G. rostochiensis race Ro1. Using available sequence information, three primer pairs were designed that target different regions of the Gro1 sequence. The first was designed in a highly conserved region and showed that at least one member of the gene cluster was present in each of the 16 wild species analysed. The second primer pair was designed on a Gro1-4-specific region, and its use demonstrated that no gene identical to Gro1-4 was present in any of the wild potato species analysed. Finally, the major part of the LRR coding sequence of the Gro1 gene was amplified and sequenced in the 16 wild species. In total, 409 SNPs were identified, with the number per species ranging from 12 in S. demissum to 35 in S. stoloniferum. These data could be used to infer evolutionary selection pressure, since the ratio of non-synonymous to synonymous substitution rates (Ka/Ks) in most species was different from 1.
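The Ka/Ks ratio is conventionally read as follows: values below 1 point to purifying selection, values above 1 to positive (diversifying) selection, and values close to 1 are consistent with neutral evolution. The sketch below only illustrates this reading and is not the analysis performed by Nunziata et al. (2007). The function name `interpret_ka_ks`, the `tolerance` parameter and the numerical values are hypothetical; in practice a statistical test, rather than a fixed tolerance, decides whether a ratio differs from 1.

```python
def interpret_ka_ks(ka, ks, tolerance=0.1):
    """Return (ratio, label) for a pair of substitution rates.

    ka: non-synonymous substitutions per non-synonymous site
    ks: synonymous substitutions per synonymous site
    tolerance: arbitrary band around 1 treated as neutral (an assumption)
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form a ratio")
    ratio = ka / ks
    if ratio > 1 + tolerance:
        label = "positive (diversifying) selection"
    elif ratio < 1 - tolerance:
        label = "purifying selection"
    else:
        label = "consistent with neutral evolution"
    return ratio, label


# Hypothetical rates, for illustration only.
for ka, ks in [(0.5, 0.25), (0.125, 0.5), (0.25, 0.25)]:
    print(ka, ks, interpret_ka_ks(ka, ks))
```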

A similar type of screen was performed by Wang et al. (2008) and Lokossou et al. (2010). They analyzed the presence and allelic diversity of the late blight R genes Rpi-blb1, Rpi-blb2 and Rpi-blb3 in 196 different taxa of tuber-bearing Solanum species. The Rpi-blb1 gene is part of a resistance gene analog (RGA) cluster of four members on chromosome VIII, Rpi-blb2 resides in a locus harbouring at least 15 tomato Mi gene homologs on chromosome VI, and Rpi-blb3 originates from a cluster on chromosome IV. For each gene, primers were designed that would allow amplification of a specific fragment of the gene. The genes were only present in some Mexican diploid and polyploid species closely related to S. bulbocastanum, although the distribution differed among the three genes. The Rpi-blb1 gene was only found in S. bulbocastanum, S. cardiophyllum subsp. cardiophyllum, and S. stoloniferum; Rpi-blb2 only in S. bulbocastanum; and Rpi-blb3 in S. pinnatisectum, S. bulbocastanum (including some subspecies), S. hjertingii, S. nayaritense, S. brachistotrichum, and S. stoloniferum. Sequence analysis of part of the Rpi-blb1 and Rpi-blb3 genes suggests an evolution through recombination and point mutations. For Rpi-blb2 only sequences identical to the cloned gene were found, suggesting that it has emerged recently. The three R genes occurred in different combinations and frequencies in S. bulbocastanum accessions and their spread is confined to Central America (Lokossou et al. 2010). A practical outcome of the allele mining study by Wang et al. (2008) was the discovery of conserved homologues of Rpi-blb1 in an EBN 2 tetraploid potato species, e.g. S. stoloniferum. Rpi-blb1 is present in the diploid tuber-bearing S. bulbocastanum, which is not directly crossable with the tetraploid S. tuberosum. Solanum stoloniferum can be directly crossed to cultivated potato, thus facilitating an easy transfer of a gene with exactly the same specificity and functionality as Rpi-blb1.
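The practical use of such a survey, finding a donor species that both carries the gene and can be crossed directly with cultivated potato, can be sketched as a simple lookup. The gene distribution below is taken from the text above. The set of directly crossable species contains only S. stoloniferum because that is the only species the text describes as such; the crossability of the other species is not addressed here. The names `GENE_DISTRIBUTION`, `DIRECTLY_CROSSABLE` and `crossable_donors` are hypothetical.

```python
# Species in which a gene-specific fragment was detected (from the text).
GENE_DISTRIBUTION = {
    "Rpi-blb1": [
        "S. bulbocastanum",
        "S. cardiophyllum subsp. cardiophyllum",
        "S. stoloniferum",
    ],
    "Rpi-blb2": ["S. bulbocastanum"],
    "Rpi-blb3": [
        "S. pinnatisectum",
        "S. bulbocastanum",
        "S. hjertingii",
        "S. nayaritense",
        "S. brachistotrichum",
        "S. stoloniferum",
    ],
}

# Assumption: only the species explicitly described as directly crossable.
DIRECTLY_CROSSABLE = {"S. stoloniferum"}


def crossable_donors(distribution, crossable):
    """Map each gene to the carrier species that are directly crossable."""
    return {
        gene: [species for species in carriers if species in crossable]
        for gene, carriers in distribution.items()
    }


for gene, donors in crossable_donors(GENE_DISTRIBUTION, DIRECTLY_CROSSABLE).items():
    print(gene, "->", donors if donors else "no directly crossable donor listed")
```

Detection of a fragment does not by itself prove that a functional allele is present, so candidate donors identified in this way would still need functional testing.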

An allele mining approach to identify variation in the Avr9-recognizing Cf-9 alleles provided evidence for the presumed evolutionary mechanism driving R gene diversification. Successive intra- and intergenic unequal recombination events were held responsible for the sequence diversification of Cf-9 alleles. However, this sequence diversification was not accompanied by functional diversification, since the Avr9 effector could still be recognized (Van der Hoorn et al. 2001; Kruijt et al. 2004).

4.4 Alleles in Natural Populations of Solanum

Knowledge of the evolution and distribution of disease resistance genes is important for a better understanding of the dynamics of these genes in nature. Caicedo (2008) studied the geographic cline in diversity of R gene homologs in natural populations of Solanum pimpinellifolium L., a wild relative of cultivated tomato, to determine the possible roles of demography and selection in R gene evolution. The study investigated patterns of diversity in the multigenic Cf-2 gene family, which consists of 26 closely related homologs referred to as the Hcr2-p family (Caicedo and Schaal 2004). The 26 Hcr2-p homologs display length variation, due primarily to variation in the number of LRR-coding units within each gene, and can be classified into nine different size classes according to length; within size classes, homologs differ from each other by one or a few single nucleotide polymorphisms (SNPs). Solanum pimpinellifolium individuals vary extensively in the number of Hcr2-p homologs they carry, with Southern blot results suggesting 1–5 genes per individual (Caicedo and Schaal 2004). Species-wide analyses of Hcr2-p sequence diversity suggest that selection has played a role in the evolution of the gene family. Patterns of amino acid substitution are consistent with purifying selection in the 5′ LRR-coding portion of the genes and positive selection on some amino acid residues in the 3′ region. Evolutionary relationships among homologs also suggest that balancing selection has shaped species-wide patterns of diversity. In S. pimpinellifolium populations along the northern coast of Peru, population diversity levels of Cf-2 homologs follow a latitudinal cline, consistent with the species’ history of gradual colonization of the Peruvian coast and with population variation in outcrossing.

In another approach, wild tomato germplasm was screened for responsiveness to the Avr4 and Avr9 effectors from C. fulvum, which are recognized by the Cf-4 and Cf-9 R proteins, respectively. Recognition and the presence of the matching R genes were ubiquitous throughout the screened germplasm. This allele mining approach showed that C. fulvum is an ancient pathogen of the genus Lycopersicon (Kruijt et al. 2005).

Several studies have now clearly shown the potential of allele mining in Solanum for the improvement of disease resistance. A large number of allelic variants of known disease resistance genes have been discovered, and in several cases the functionality of the variants was also shown. Allele mining can strongly facilitate the cloning of R genes by using comparative genomics approaches. It has also been shown to be useful for the identification of orthologous sequences in species that are more easily crossable with the cultivated material than the species in which the gene was originally discovered, thus facilitating a more rapid deployment of genes in breeding programs. Furthermore, allele mining was useful for the identification of novel, as yet unknown R genes and shed light on evolutionary processes related to these genes. As more and more R genes are identified and cloned, the chances increase that new R genes reside at known and well-characterized loci, enabling the use of comparative genomics and, thus, the development of efficient allele mining strategies.

The availability of the potato and tomato genome sequences, together with a constant drop in sequencing costs, will boost allele mining even more. The fast (r)evolution in high-throughput sequencing technologies, especially the increase in read lengths expected from the third generation of single-molecule sequencing platforms, will provide a complete survey of the distribution of R gene clusters in the Solanaceae family, enabling a dramatic acceleration in the identification of agronomically important genes such as novel R genes.

We envisage that novel and efficient ‘mining’ strategies can give direct access to disease resistance genes of interest by using next generation sequencing in combination with effector genomics. However, as not all effectors are known yet, more effort should be made in that area. Another interesting research area relates to the durability of the R genes. At present it is unknown whether all allelic variants discovered for a particular gene are equally easily overcome by the pathogen. If they are not, allele mining may be a way to identify more durable genes.