
1 Introduction

Domestication and breeding of many crops have resulted in relevant improvements in yield and quality, but at the same time they have been coupled to a depletion of the genetic variation present in elite germplasm, causing the loss of valuable alleles originally present in the wild relatives of many crops (Simmonds 1976; Tanksley and McCouch 1997). This problem is particularly severe in self-pollinated crops such as tomato and rice (Miller and Tanksley 1990; Wang et al. 1992). The narrow genetic base of modern crop varieties not only makes them more susceptible to diseases, but it also raises concerns about the prospects for continued genetic gains necessary to face the challenges of feeding 9 billion people by the year 2050, ensuring sustainable and global food security in an age of climate change (Godfray et al. 2010; Tester and Langridge 2010; Fridman and Zamir 2012). The above scenario, combined with restrictions on the commercial use of genetically modified plants, has renewed the interest in exploring and exploiting natural biodiversity as a source of novel alleles to improve the productivity, adaptation, quality and nutritional value of crops (Tanksley and McCouch 1997; Zamir 2001; McCouch 2004, 2007; Grandillo et al. 2008; Johal et al. 2008).

Genetic variability is the foundation for any crop breeding program. Nature offers a tremendous wealth of genetic variants of both basic and practical interest, which have been created and selected over millions of years of evolution, and the wild ancestors of most crop plants can still be found in their natural habitats. The value of exotic germplasm, including landraces and wild relatives, as a source of new and useful alleles that could compensate for the loss caused by modern breeding was already recognized at the beginning of the past century (Bessey 1906; McCouch 2004). Since then, considerable effort and resources have been invested worldwide in the collection and preservation of plant material, with a particular emphasis given to “exotics”, which has resulted in more than 1,400 gene banks with about 6 million accessions representing most of the common crop species (Glaszmann et al. 2010). However, these genetic resources have been only marginally explored and exploited, leaving most of their genetic potential still untapped (Tanksley and McCouch 1997; Glaszmann et al. 2010).

A wider use of exotic germplasm in breeding programs has been hindered by several inherent problems, which are often associated with crosses involving wild and domesticated species, and in part by the lack of adequate techniques that would enable a more efficient discovery and utilization of the valuable alleles present in exotic species. Pre- and post-zygotic barriers, infertility of the segregating generations, suppressed recombination between the chromosomes of the two species, ‘linkage drag’, as well as the long time and effort necessary to recover the elite parent genetic background, are some of the problems often observed in wide crosses. In addition, much of the unadapted germplasm is phenotypically inferior to elite germplasm for many of the traits that breeders would like to improve. As a result, most plant breeding programs have relied, and still rely, on reshuffling the same set of genes/alleles already available in the elite lines, reducing the overall genetic variation available for future sustained crop improvements. In general, the use of exotic germplasm has mostly focused on major genes for disease and insect resistance (Plucknett et al. 1987), as shown by the high number of resistance genes derived from wild species that can be found in elite lines (Zamir 2001; Hajjar and Hodgkin 2007). In contrast, its use as a source of valuable alleles for the improvement of other traits relevant to agriculture, such as yield, stress tolerance and quality, has been more limited, with differences depending on the crop (Hajjar and Hodgkin 2007). Such traits, in fact, are often quantitatively inherited, displaying continuous variation and resulting from the segregation of numerous interacting quantitative trait loci (QTL), with varying magnitude of effect, whose expression is modified by the genetic background and the environment (Mackay 2001).

Over the past decades, improved interspecific hybridization techniques along with advances in quantitative genetics and genomic technologies have provided the necessary tools to overcome some of the difficulties associated with the use of exotic germplasm for the improvement of complex traits. High-density molecular genetic maps have allowed for the identification and characterization of single QTL contributing to complex traits while their fine-mapping allows us to distinguish pleiotropy from close linkage and, importantly, to reduce the negative effects of linkage drag (Tanksley 1993; Eshed and Zamir 1996; Frary et al. 2003). Furthermore, QTL mapping studies have also provided stronger evidence that low-performing wild and unadapted species can contribute agronomically favorable QTL alleles associated with transgressive segregation observed in several interspecific crosses that have the potential to improve yield, as well as other important traits (de Vicente and Tanksley 1993; Eshed and Zamir 1995; Tanksley et al. 1996; Xiao et al. 1996, 1998; McCouch et al. 2007; Grandillo et al. 2008). These results indicate that the phenotype of wild species is a poor predictor of their breeding value, and that the domestication process has “left behind” many favorable alleles, which could now be more efficiently “recovered” using innovative genomic-assisted breeding strategies (Tanksley and McCouch 1997; Zamir 2001; McCouch 2004; Cavanagh et al. 2008; Johal et al. 2008).

However, despite the numerous QTL-mapping studies conducted and reported for many crops, the contribution of QTL analysis to breeding new varieties has been limited. In order to bridge the gap between QTL mapping and variety development based on the use of unadapted germplasm, two related molecular breeding strategies, the “Advanced Backcross QTL analysis (AB-QTL)” (Tanksley and Nelson 1996) and “exotic libraries” or introgression line (IL) libraries (Eshed and Zamir 1995; Zamir 2001), have been developed and tested in several crops (Table 4.1) (Lippman et al. 2007; McCouch et al. 2007; Grandillo et al. 2008, 2013; Swamy and Sarla 2008; Tan et al. 2008; Ali et al. 2010; Buerstmayr et al. 2011; Blair and Izquierdo 2012; Sayed et al. 2012; Varshney et al. 2013). These two approaches were proposed to more efficiently harness the genetic potential stored in seed banks and in exotic germplasm for the improvement of elite germplasm, thereby expanding the genetic base of crop species (Tanksley and McCouch 1997; Zamir 2001). Both methods have allowed the identification of favorable wild QTL alleles for numerous traits of agronomic interest and the development of introgression lines (prebred lines) that can be used in marker-assisted breeding programs.

Table 4.1 Introgression libraries of crops derived from interspecific crosses

Both strategies have been covered in other reviews (Lippman et al. 2007; McCouch et al. 2007; Grandillo et al. 2008; Ali et al. 2010). This paper will focus on the IL-approach, providing an overview of the results achieved over the last 20 years in tomato as well as in other crops. Considering that the principles of the IL approach were first outlined and successfully applied in tomato, a particular emphasis will be given to the efforts and accomplishments achieved within the tomato clade.

2 IL-Based Analyses of Complex Traits

Most traits of biological and economic interest are of a quantitative nature, making the elucidation of their genetic and molecular bases a notoriously challenging task. Over the past decades, numerous different types of segregating populations have been used for QTL mapping in plants (Cavanagh et al. 2008). Many QTL have been identified either using biparental populations, which exploit recent recombination, or using association analysis, which exploits historical recombination. Initially, early biparental segregating generations (F2, F3 and BC1) or recombinant inbred lines (RILs) were widely used. However, these populations have several limitations caused by the high proportion of donor parent alleles that still segregate, including the overshadowing effect of major QTL on the effects of independently segregating minor QTL, and the relatively high level of epistatic interactions that occur between donor QTL alleles and other donor genes. As a consequence, favorable donor QTL alleles identified in these mapping populations often lose their effects once they are introgressed into the genetic background of elite lines. In the case of interspecific crosses involving exotic germplasm, partial or complete sterility problems further complicate QTL analyses, since a few genes for sterility may hamper population development and/or the analysis of agronomically important traits (such as fruit characters).

In order to circumvent these limitations, and to gain an insight into the genetic factors underlying differences between the cultivated tomato (Solanum lycopersicum L.) and its wild relatives, Zamir and colleagues used RFLP (restriction fragment length polymorphism) markers to construct the first complete set of substitution lines in tomato (referred to as introgression lines—ILs), consisting of 50 near isogenic lines (NILs) carrying single marker-defined homozygous chromosomal segments of the wild green-fruited species S. pennellii (acc. LA0716) in an otherwise homogeneous genetic background of the processing inbred cv. M82 (Eshed and Zamir 1994, 1995). The whole donor genome is represented by the complete panel of overlapping homozygous chromosomal segments, and it is a permanent mapping population since it can be maintained by self-pollination. One of the earliest examples of this kind of genetic resources was reported by Kuspira and Unrau (1957), who analyzed quantitative traits in common wheat using whole-chromosome substitution lines (CSLs), in which the introgressions span complete chromosomes. Subsequently, to define the position of genes on substitution chromosomes, recombinant inbred chromosome substitution lines (RICSLs) have been developed (Cavanagh et al. 2008).

Since the pioneering studies conducted by Kuspira and Unrau (1957) and by Eshed and Zamir (1995, 1996) and the theoretical groundwork laid by Tanksley and Nelson (1996), sets of introgression lines representing different fractions of the exotic (wild species or landrace varieties) parent genome have been developed for various crops including barley, cotton, Indian mustard, lettuce, peanut, rice, rye, and common wheat (Table 4.1). In other cases, such as cabbage (Ramsay et al. 1996), tomato (Causse et al. 2007), rice (Li et al. 2005; Ashikari and Matsuoka 2006; Mei et al. 2006; Zhao et al. 2009; Xu et al. 2010; Gu et al. 2012), melon (Eduardo et al. 2005, 2007; Fernandez-Silva et al. 2010) and maize (Szalma et al. 2007; Pea et al. 2009; Salvi et al. 2011), ILs have been obtained using intraspecific crosses. Sets of introgression lines have also been constructed for the model species Arabidopsis thaliana using the three accessions Columbia, Landsberg and Niederzenz (Koumproglou et al. 2002; Torjék et al. 2008).

In the case of crosses involving cultivated and exotic germplasm, these congenic populations have been referred to as “exotic libraries” (Zamir 2001). However, since populations of ILs have been developed also using adapted germplasm as donor parents and from intraspecific crosses, in more general terms they can be referred to as “IL populations” or “IL libraries”. Furthermore, while ideally an IL library should be made up of lines each containing a single chromosomal segment deriving from the donor parent, in practice, in many cases several lines in the population may still carry multiple donor introgressions (hereafter referred to as pre-ILs) and the whole set of ILs might cover variable portions of the donor genome (Table 4.1).

Although these populations are very similar in essence, different names have been used, including Introgression Lines (ILs), Backcross Recombinant Inbred Lines (BCRILs), Near Isogenic Lines (NILs) or QTL-NILs, Chromosome Segment Substitution Lines (CSSLs), Backcross Inbred Lines (BILs) and Recombinant Chromosome Substitution Lines (RCSLs) (see references in Table 4.1), as well as ‘Stepped Aligned Inbred Recombinant Strains’ (STAIRS) (Koumproglou et al. 2002), NILs (Keurentjes et al. 2007) and ILs (Torjék et al. 2008) in Arabidopsis. As mentioned before, chromosome substitution lines, such as those developed in Arabidopsis (Koumproglou et al. 2002) and cotton (Saha et al. 2006), are a special case of IL populations.

Similar population structures have also been produced for model animal species such as “Chromosome Substitution Strains (CSSs)” in mice (Singer et al. 2004), ILs in Caenorhabditis elegans (Doroszuk et al. 2009) and in Drosophila (Fang et al. 2012), and “Segmental Introgression Lines (SILs)” in parasitic wasp (Desjardins et al. 2013).

The process of IL production involves some backcrossing scheme aided by marker analysis during or after the backcross, followed by one or more generations of self-fertilization to fix the lines (Zamir 2001). The main factors influencing the efficiency of foreground and background selection are the breeding scheme, the selection strategy and the population sizes (Falke et al. 2009b; Falke and Frisch 2011). The production of such congenic and permanent resources is quite a laborious and time-consuming task which can take several years. However, the advent of high-throughput marker technologies has provided the necessary tools to make IL development a much more efficient and precise process (Severin et al. 2010; Xu et al. 2010; Schmalenbach et al. 2011).
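As a rough guide to why several backcross generations are usually needed, the textbook expectation for the recurrent-parent share of the genome can be sketched as below. This is only an illustration: it assumes no selection and no linkage, and the function name is hypothetical rather than taken from the cited studies. Marker-assisted background selection is intended to recover the recurrent genome faster than this expectation.

```python
def expected_recurrent_fraction(n_backcrosses: int) -> float:
    """Expected recurrent-parent genome fraction after n backcrosses.

    Assumption (textbook expectation, not from the chapter): no selection
    and no linkage, so the donor share halves with every backcross,
    starting from 1/2 in the F1 (n = 0).
    """
    if n_backcrosses < 0:
        raise ValueError("n_backcrosses must be >= 0")
    return 1.0 - 0.5 ** (n_backcrosses + 1)


if __name__ == "__main__":
    for n in range(7):
        label = "F1" if n == 0 else f"BC{n}"
        print(f"{label}: {expected_recurrent_fraction(n):.4f}")
```

Under these assumptions, an unselected BC1 plant is expected to carry 75 % recurrent-parent genome and a BC6 plant a little over 99 %; individual plants vary around these averages, which is why marker data are used to choose the best candidates.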

In many instances, ILs have been used to confirm, stabilize and fine-map QTL identified in other population structures and therefore only a relatively small proportion of the donor parent genome was represented among the developed ILs (Paterson et al. 1990; Szalma et al. 2007). On the other hand, the availability of whole-genome IL populations allows screening for QTL of the entire genome (Eshed and Zamir 1995).

Several properties of these libraries of introgression lines contribute to their power in identifying and stabilizing QTL, and they have been thoroughly discussed elsewhere (Zamir 2001; Keurentjes et al. 2007; Lippman et al. 2007; Grandillo et al. 2008). In summary, in the ideal case of IL libraries made up of lines each containing a single donor parent introgression, all the phenotypic differences between an IL and the recurrent parent should be due to the allelic differences at one or more genes within the introgressed chromosomal segment. This should reduce much of the genetic background “noise”, thus increasing the ability to statistically identify small phenotypic effects using a simple statistical procedure. Another important aspect of these congenic mapping populations is their “immortal nature” with a characterized genotype, which not only eliminates the need for making crosses and for genotyping, but also allows replicated measurements of the same line, reducing the effect of the environment and increasing the power of QTL detection. The permanent nature of these lines not only facilitates more accurate estimates of the mean phenotypic values, but also allows replicated trials of the same line to be analyzed in different years and/or environments, so that the extent of QTL × environment interactions can be estimated (Monforte et al. 2001; Liu et al. 2003; Gur and Zamir 2004). Data can be collected in different laboratories on the same lines for multiple traits, including invasive and destructive ones, thereby creating a comprehensive phenotypic database for general access (Zamir 2001; Gur et al. 2004).
Since the lines in the library differ from the recurrent parent by only a single chromosomal segment derived from the donor parent, their phenotypes generally resemble that of the recipient parent, which, in the case of crosses between cultivated and exotic germplasm, reduces the sterility problems that occur in other mapping population structures characterized by a higher frequency of the exotic parent genome, and also allows the lines to be evaluated for yield-associated traits. However, the advantage of single-introgressed segment ILs in resolving individual QTL is also a drawback, as epistatic interactions between unlinked loci, which are a major component of the phenotypic variation, cannot be directly estimated.
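The comparison of a single IL with the recurrent parent described above can be illustrated with a small sketch. The chapter does not name the statistical procedure; the exact two-sided permutation test below is a swapped-in stand-in chosen because it needs only the standard library. The function name and the replicate values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p(il_values, parent_values):
    """Exact two-sided permutation p-value for a difference in means.

    Illustrative only: enumerates every way of relabelling the pooled
    replicates, so it is practical only for small numbers of replicates.
    """
    observed = abs(mean(il_values) - mean(parent_values))
    pooled = list(il_values) + list(parent_values)
    n_il = len(il_values)
    indices = range(len(pooled))
    total = 0
    as_extreme = 0
    for chosen in combinations(indices, n_il):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / total


# Hypothetical replicated trait values (e.g. Brix) for one IL and the
# recurrent parent grown in the same trial.
il_reps = [5.9, 6.1, 6.0, 6.2]
parent_reps = [5.0, 5.1, 4.9, 5.2]
p_value = exact_permutation_p(il_reps, parent_reps)
print(f"p = {p_value:.4f}")
```

Because every line in the library is compared with the same recurrent parent, a real analysis would also need to account for multiple comparisons; that step is left out of this sketch.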

The map resolution of a population of ILs is defined by the overlap between contiguous donor introgressions (bins) to which genes or QTL can be assigned by comparing lines (Pan et al. 2000; Liu et al. 2003; Paran and Zamir 2003). The number, length and overlap of adjacent segments define bin lengths, which vary across the genome. One drawback of IL libraries is their initial relatively low level of map resolution, which in the extreme case of whole-chromosome substitution lines corresponds to the entire chromosome. However, each IL represents the starting point by which the phenotypic effects of QTL can be fine-mapped to smaller intervals (Paterson et al. 1990).
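The way overlapping introgressions define bins, and how a QTL is assigned to a bin by comparing lines, can be sketched as follows. The line names and segment coordinates are hypothetical, and the sketch assumes a single chromosome, a single QTL and error-free phenotype calls.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Segment = Tuple[float, float]            # (start_cM, end_cM) of a donor segment
Bin = Tuple[float, float, FrozenSet[str]]  # (start_cM, end_cM, lines covering it)


def build_bins(ils: Dict[str, Segment]) -> List[Bin]:
    """Split one chromosome into bins at every introgression breakpoint.

    Each bin records which lines carry donor DNA across it; intervals not
    covered by any line are dropped.
    """
    breakpoints = sorted({pos for seg in ils.values() for pos in seg})
    bins: List[Bin] = []
    for left, right in zip(breakpoints, breakpoints[1:]):
        covering = frozenset(
            name for name, (start, end) in ils.items()
            if start <= left and right <= end
        )
        if covering:
            bins.append((left, right, covering))
    return bins


def candidate_bins(bins: List[Bin], lines_with_effect: Iterable[str]) -> List[Bin]:
    """Bins carried by exactly the lines that show the phenotypic effect."""
    target = frozenset(lines_with_effect)
    return [b for b in bins if b[2] == target]


# Hypothetical overlapping introgressions on one chromosome.
example_ils = {
    "IL1-1": (0.0, 40.0),
    "IL1-2": (25.0, 70.0),
    "IL1-3": (60.0, 100.0),
}
example_bins = build_bins(example_ils)
for start, end, lines in example_bins:
    print(f"{start:5.1f}-{end:5.1f} cM: {sorted(lines)}")

# An effect seen in IL1-1 and IL1-2 but not in IL1-3 maps to their overlap.
print(candidate_bins(example_bins, ["IL1-1", "IL1-2"]))
```

In this toy case three lines define five bins, and the shared 25-40 cM overlap is the only bin consistent with an effect seen in the first two lines but not the third.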

Higher resolution mapping of QTL allows us to assess whether the effect on the phenotype is due to a single QTL or to several tightly linked QTL affecting the same trait, as well as to verify whether possible undesirable effects are caused by linkage drag of other genes or by pleiotropic effects of the selected QTL (Eshed and Zamir 1996; Monforte and Tanksley 2000b; Monforte et al. 2001; Fridman et al. 2002; Frary et al. 2003; Chen and Tanksley 2004; Yates et al. 2004; Gur et al. 2010). For instance, high-resolution mapping of the Brix9-2-5 QTL (affecting total soluble solids of tomato fruit) in two divergent genetic backgrounds (indeterminate glasshouse tomatoes and determinate open-field varieties) enabled the mapping of a new pleiotropic QTL for the same trait that interacts with the genetic background (Fridman et al. 2002). Another example is provided by fine mapping of the major QTL stigma exsertion 2.1 (se2.1), which revealed a complex locus composed of at least five closely linked genes: three controlling stamen length, one controlling style length and one conditioning anther dehiscence (Chen and Tanksley 2004). Of these five loci, the locus controlling style length (Style 2.1) accounted for the greatest change in stigma exsertion and was subsequently cloned (Chen et al. 2007).

Besides reducing linkage drag, the development of lines with smaller introgressions (sub-ILs) allows the identification of molecular markers that are more tightly linked to the QTL of interest and can be used for marker-assisted breeding. Desirable donor QTL alleles identified in IL populations can be combined in multiple-ILs by means of marker-aided QTL pyramiding approaches to improve the performance of elite lines (Gur and Zamir 2004; Ashikari and Matsuoka 2006; Zong et al. 2012; Sacco et al. 2013) (also see Sect. 4.3.1). Once introgressed chromosome segments have been subdivided and targeted QTL-containing lines have been created, crosses between the lines can be used to study the phenotypic effects of QTL interactions to better understand the nature of epistasis (Tanksley 1993; Eshed and Zamir 1996; Causse et al. 2007). ILs can also be used to obtain more precise estimates of the magnitude of QTL × genetic background interaction (Eshed and Zamir 1995, 1996; Monforte et al. 2001; Gur and Zamir 2004).

Introgression lines are also a powerful tool for studying the genetic basis of heterosis, since homozygous lines in a library can be crossed to different tester lines, allowing the effects of heterozygosity on the phenotype to be investigated (Semel et al. 2006); for the positional cloning of key genes underlying quantitative traits (Frary et al. 2000; Fridman et al. 2000, 2004; Salvi and Tuberosa 2005; Uauy et al. 2006; Cong et al. 2008); and for systems-based analyses aimed at identifying genes controlling complex developmental networks (Lippman et al. 2007; Lee et al. 2012; Toubiana et al. 2012) (see Sects. 4.3.1 and 4.3.2).

3 The IL Approach in the Tomato Clade

3.1 The S. pennellii LA0716 Exotic Library

Members of Solanum sect. Lycopersicon—the clade containing the cultivated tomato (Solanum lycopersicum L.) and its 12 wild relatives—along with the four allied species in the immediate outgroups Solanum sects. Juglandifolia and Lycopersicoides, are adapted to a wide variety of environmental conditions, which correspond to a wide range of variation in terms of morphological, physiological, mating system and biochemical characteristics (Peralta et al. 2008). Due to the low genetic variation of cultivated germplasm (Miller and Tanksley 1990), tomato wild species have played an important role as sources of useful genes, and for the development of mapping populations (Rick 1982; Grandillo et al. 2011, 2013).

Solanum pennellii LA0716 is a small green-fruited desert species characterized by unique phenotypes. It is distantly related to cultivated tomato, yet the two species are sexually compatible and produce fertile hybrids. In 1969, Rick reported the development of tomato introgression lines using three chromosome segments from S. pennellii and recessive mutant chromosome stocks from S. lycopersicum. Subsequently, the development of DNA marker technology allowed the use of S. pennellii LA0716 as the founding donor parent of the first whole-genome exotic library in tomato (Eshed and Zamir 1994, 1995).

This population, initially consisting of 50 ILs in the genetic background of the elite inbred variety M82, allowed the identification of yield-associated QTL, and the analysis of their epistatic and environmental interactions (Eshed and Zamir 1995, 1996). These first studies also demonstrated the higher efficiency of IL populations in detecting QTL compared with conventional segregating populations such as F2, BC1 or RILs (Zamir and Eshed 1998). To increase the mapping resolution of the S. pennellii LA0716 ‘exotic library’, 26 additional sub-ILs were added, and the resulting 76 lines partition the entire genetic map into 107 bins, which are defined by singular or overlapping segments (Fig. 4.1), each with an average length of 12 cM (Liu and Zamir 1999; Pan et al. 2000; Liu et al. 2003; http://solgenomics.net/). More recently, as part of an EU project (EU-SOL, http://www.eu-sol.net/), the S. pennellii IL library was expanded through the addition of > 400 sub-ILs (Lippman et al. 2007; D. Zamir, personal communication). Furthermore, in order to allow the estimation of the relative contributions of epistasis to the phenotypic diversity, a new S. pennellii LA0716-based population of several hundred BILs was constructed in the M82 background (D. Zamir, personal communication). Each BIL genotype carries multiple wild species introgressions, permitting phenotypes to be associated with specific epistatically interacting QTL. Individual ILs and sub-ILs can then be used to reconstruct any epistasis detected in the BILs and to study the genetic and developmental components underlying the specific interactions.

Fig. 4.1

The Solanum pennellii IL population. a Genome introgressions on the 12 tomato chromosomes of the 76 S. pennellii ILs, which are nearly isogenic to each other and differ only for the marked introgressed chromosome segments. b Heterosis for plant biomass in the F1 hybrid of S. pennellii × S. lycopersicum (the middle plant) compared to the recurrent parent, M82 (far left and right plants). S. pennellii, while self-compatible in its native arid environment, does not set fruit in agricultural field conditions; however, it contributes QTL that significantly improve yield and other traits. The homozygous ILs show primarily lower yield than both parents owing to sterility, whereas certain IL hybrids show heterosis and increased yield. Interestingly, in many instances of crossing two ILs with similar QTL effects, double IL heterozygotes show lower magnitude than the sum of the effects of single heterozygotes, reflecting non-additivity of canalized phenotypes (Eshed and Zamir 1996). (Reproduced with permission from Lippman et al. (2007) Curr Opin Genet Dev 17:545, Fig. 1)

Over the years, the S. pennellii LA0716 ILs have been evaluated for hundreds of traits, allowing the identification of more than 2,800 QTL (Table 4.1) (Lippman et al. 2007; Grandillo et al. 2011, 2013). Repeated measurements have been conducted by multiple labs for numerous yield-associated, fruit morphology and biochemical traits, and the resulting raw data have been deposited in the phenotype warehouse of Phenome Networks <http://phnserver.phenome-networks.com/>.

An important aspect of IL biology, especially in the context of interspecific crosses, is the exposure of new transgressive phenotypes, not present in the parental lines. This phenomenon is caused by novel epistatic relationships between the donor parent alleles, and the independently evolved molecular networks of the recipient parent (Lippman et al. 2007; McCouch et al. 2007; L’Hôte et al. 2010).

In the S. pennellii ILs, transgressive phenotypes have been observed for both qualitative and quantitative traits (Lippman et al. 2007). One clear example is fruit color. While mature fruits of most cultivated tomato varieties are red and those of S. pennellii are green, some ILs show novel fruit color variation, such as the dark orange fruits of the two lines IL6-3 and IL12-2, which are regulated by the dominant genes Beta and Delta, respectively. The map-based cloning of both genes and their analysis indicated that the primary mechanism underlying aberrant carotenoid accumulation, and likely other transgressive phenotypes, is novel epistatic transcriptional regulation of S. pennellii genes (Ronen et al. 1999, 2000).

Recently, Shivaprasad et al. (2012) investigated the possibility that stable transgressive phenotypes observed in the S. pennellii LA0716 IL library are associated with micro or small interfering (si)RNAs. The rationale for their study was based on the observation that primary sRNAs from one parent could initiate secondary siRNA on a target RNA from the other parent through an RNA-based mechanism. Such interactions would establish patterns of gene expression at either the transcriptional or posttranscriptional level that would be specific to the hybrids, and the effect would persist in lines that inherit both interacting loci. To verify their hypothesis, the authors used high-throughput sequencing to characterize sRNAs in young seedlings of four S. pennellii ILs, as well as of the two parental lines and the F1 and F2 hybrids. They identified loci from which these sRNAs were more abundant in hybrids than in either parent, and they showed that accumulation of such transgressive sRNAs correlated with suppression of the corresponding target genes. In one case this effect was associated with hypermethylation of the corresponding genomic DNA. The results suggest that different sRNA-based mechanisms could be involved in transgressive segregation, and that the transgressive accumulation of miRNA and siRNAs is a manifestation of the hidden potential of parents that is released when hybrids are made.

The S. pennellii ILs have also been used to explore the underlying genetic mechanisms of heterosis, or hybrid vigor—the phenotypic superiority of a hybrid over its parents with respect to traits such as growth rate, reproductive success and yield. The genetic basis of this major genetic force that contributes to world food production is not clear yet. Possible genetic explanations include non-mutually exclusive mechanisms: dominance, true overdominance (ODO), pseudo-ODO (i.e. nearby loci at which alleles having dominant or partially dominant advantageous effects are in repulsion linkage phase) and certain types of epistasis (Lippman and Zamir 2007). For the genetic dissection of heterosis, exotic libraries have the double advantage of allowing the assessment of the contribution of ODO effects to heterosis while excluding epistasis, and to provide maximal genotypic and phenotypic diversity, which facilitates the evaluation for a broad range of phenotypes. In this respect, a phenomics study conducted on the S. pennellii LA0716 exotic library provided indirect support for true-ODO QTL, since ODO QTL were identified almost exclusively for the reproductive traits, while dominant and recessive QTL were detected for all analyzed traits (Semel et al. 2006). Other attempts to map ODO loci have been conducted in Arabidopsis (Meyer et al. 2010) and maize (Tang et al. 2010; Pea et al. 2009). These studies, along with the identification by Krieger et al. (2010) of a mutation in tomato with an ODO effect on yield, support the contribution of intragenic interactions to heterosis (Fridman and Zamir 2012).
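The distinction between additive, dominant, recessive and overdominant behaviour can be made concrete by comparing the recurrent parent, the homozygous IL and the IL hybrid (IL × recurrent parent). The sketch below uses the standard additive effect (a) and dominance deviation (d); the function name, the tolerance and the trait values are hypothetical. The studies cited above relied on statistical tests rather than a fixed threshold, and a classification of this kind cannot by itself separate true ODO from pseudo-ODO.

```python
def classify_mode(parent_mean: float, il_mean: float, hybrid_mean: float,
                  tol: float = 0.2) -> str:
    """Classify the mode of inheritance of one introgression.

    a = half the difference between the two homozygotes;
    d = deviation of the heterozygote from the mid-parent value.
    'tol' is an arbitrary illustrative cut-off on |d/a| for calling an
    effect additive; it is not taken from the chapter.
    """
    a = (il_mean - parent_mean) / 2.0
    d = hybrid_mean - (il_mean + parent_mean) / 2.0
    if a == 0 and d == 0:
        return "no effect"
    if abs(d) > abs(a):
        # Heterozygote lies outside the range spanned by the homozygotes.
        return "overdominant"
    ratio = d / a
    if abs(ratio) <= tol:
        return "additive"
    # Positive ratio: heterozygote resembles the IL (donor allele dominant).
    return "dominant" if ratio > 0 else "recessive"


# Hypothetical trait means: (recurrent parent, homozygous IL, IL hybrid).
examples = {
    "additive-like": (100.0, 120.0, 110.0),
    "dominant-like": (100.0, 120.0, 119.0),
    "recessive-like": (100.0, 120.0, 101.0),
    "ODO-like": (100.0, 100.0, 130.0),
}
for label, (p, il, hyb) in examples.items():
    print(f"{label}: {classify_mode(p, il, hyb)}")
```

Here "dominant" and "recessive" refer to the donor (wild-species) allele relative to the recurrent-parent allele.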

Many ODO effects were confirmed over several years and environments, and a pyramiding approach was used to develop a multiple-introgression line carrying three independent S. pennellii yield-promoting genomic regions that had shown reproducible heterotic effects on fruit yield under irrigated and drought conditions (Gur and Zamir 2004). The pyramiding of these heterotic introgressions further increased yield beyond the individual QTL, although in a less-than-additive manner. The resulting hybrids had yields 50 % higher than leading commercial varieties when tested in multiple environments and irrigation regimes. The introduction of the S. pennellii introgressions into processing tomato lines resulted in the development of a leading hybrid variety, AB2 (Lippman et al. 2007).
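The phrase "less-than-additive" can be stated with the usual two-locus interaction contrast. The notation is introduced here for illustration and is not taken from the cited studies: \(y_0\) is the mean of the recurrent-parent genotype, \(y_A\) and \(y_B\) are the means of lines carrying one introgression each, and \(y_{AB}\) is the mean of the line carrying both.

```latex
\[
  \hat{y}_{AB} = y_0 + (y_A - y_0) + (y_B - y_0),
  \qquad
  \varepsilon_{AB} = y_{AB} - \hat{y}_{AB} = y_{AB} - y_A - y_B + y_0 .
\]
```

When both single introgressions increase the trait, a negative \(\varepsilon_{AB}\) corresponds to a less-than-additive combination: the pyramided line still exceeds each single line, but by less than the sum of the individual gains.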

The S. pennellii ILs have also been a very effective tool for the map-based cloning of the genes underlying QTL. The first QTL cloned were fw2.2 (fruit weight) (Frary et al. 2000; Cong et al. 2002) and Brix9-2-5 (sugar yield, or Brix) (Fridman et al. 2000, 2004). While subtle changes in transcript quantity and in the timing of gene expression were correlated with natural variation at fw2.2, altered enzyme activity as a result of amino-acid substitutions in the gene was the cause of the variation between the cultivated and wild-species alleles at Brix9-2-5. These studies demonstrated that IL-based Mendelian segregation is a very efficient way to partition continuous variation for complex traits into discrete molecular components. Furthermore, these QTL were the first among many showing that, similarly to the variation found for numerous genes that control quality traits, variation in QTL alleles in plants can be identified in both coding and regulatory regions of single genes (Salvi and Tuberosa 2005; Lippman et al. 2007). Besides fw2.2 and Brix9-2-5, other tomato QTL have been cloned using segregating populations derived from S. pennellii ILs, such as ovate, style2.1 and fas (Liu et al. 2002; Chen et al. 2007; Cong et al. 2008). An attempt was also made to clone sun using the ILs. However, sun mapped inside a paracentric inversion within the S. pennellii genome; this prevented map-based cloning using that resource (van der Knaap et al. 2004).

Although an extremely powerful and unbiased approach, delimiting a QTL to a single gene using genetic approaches is a time-consuming and technically demanding process (Fridman et al. 2000, 2004; Chen et al. 2007). As a consequence, while much progress has been made in mapping QTL, the elucidation of the underlying molecular mechanisms has lagged behind. Over the years, to try to accelerate the rate of QTL discovery, alternative strategies aimed at identifying candidate genes have been proposed and tested. The complexity of the approaches has evolved along with the availability of more advanced ‘omics’ tools. In this respect, the ILs represent a very efficient genetic resource to increase the efficiency in candidate gene identification and cloning of target QTL based on convergence of evidence deriving from QTL position, expression profiling data, functional and molecular diversity analyses of candidate genes (Li et al. 2005).

In tomato, the S. pennellii IL population has been used to explore the potential of the ‘candidate gene approach’ to identify candidate genes for QTL influencing the intensity of tomato fruit color (Liu et al. 2003), tomato fruit size and composition (Causse et al. 2004), fruit ascorbic acid (AsA) content (Stevens et al. 2008), and vitamin E content (Almeida et al. 2011). The approach attempts to link, through mapping analysis, sequences that have a known functional role in the measured phenotype with QTL that are responsible for the studied variation. While no co-locations were initially found between candidate genes and fruit color QTL (Liu et al. 2003), several apparent links were observed in the other three studies. More integrated strategies have also been tested in the S. pennellii ILs to find associations between transcriptomic changes and phenotypes of interest, including fruit composition (Baxter et al. 2005; Di Matteo et al. 2010, 2013) and drought tolerance (Gong et al. 2010). A systems-based approach was used by Lee et al. (2012) to identify key genes regulating tomato fruit ripening (see Sect. 4.3.2).
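The co-location test at the heart of the candidate gene approach can be illustrated with a minimal sketch. It assumes that both candidate genes and QTL have already been placed on the same map as intervals (IL bins); the class `Interval`, the function `colocated`, the units and all data values are hypothetical and are not taken from any of the cited studies.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    name: str
    chrom: str
    start: float  # map position in arbitrary units (hypothetical)
    end: float


def colocated(genes, qtl_bins):
    """Return (gene, bin) name pairs where a candidate gene overlaps a QTL bin."""
    hits = []
    for g in genes:
        for q in qtl_bins:
            same_chrom = g.chrom == q.chrom
            overlaps = g.start <= q.end and q.start <= g.end
            if same_chrom and overlaps:
                hits.append((g.name, q.name))
    return hits


# Hypothetical toy data for illustration only.
genes = [
    Interval("geneA", "chr2", 41.0, 41.0),
    Interval("geneB", "chr9", 10.0, 10.0),
]
qtl_bins = [
    Interval("color_QTL_bin", "chr2", 35.0, 48.0),
    Interval("brix_QTL_bin", "chr9", 60.0, 75.0),
]

print(colocated(genes, qtl_bins))
```

A hit of this kind only flags a positional candidate. Because an IL bin can span many genes, co-location has to be followed by expression, diversity or functional analyses before a causal role can be claimed.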

Recently, Morgan et al. (2013) demonstrated that individual ILs can provide useful information to guide metabolic engineering strategies. In fact, in spite of the relatively large regions of introgressed DNA from the genetically distinct donor parent contained in each IL, a detailed biochemical analysis makes it possible to pinpoint the main factor of metabolic disturbance and to identify potential candidate proteins that can subsequently be tested in a targeted manner in transgenic plants. In this specific case, one IL (IL2-5), known to have increased levels of fruit citrate and malate at the breaker stage, allowed the authors to focus specifically on aconitase amongst a myriad of possible targets for manipulating the accumulation of carboxylic acids in tomato fruit (Morgan et al. 2013).

3.2 IL-Based System Analyses of Integrated Developmental Networks

Natural genetic variation stored in IL populations also facilitates the integration of multiple ‘omics’ techniques allowing multifaceted systems-level analysis of integrated developmental networks, and the identification of candidate genes underlying complex traits (Li et al. 2005; Schauer et al. 2006, 2008; Lippman et al. 2007; Hansen et al. 2008). These approaches can help identify uncharacterized networks or pathways, in addition to candidate regulators of such pathways (Saito and Matsuda 2010). The availability of a full-genome sequence can further facilitate filtering through genes in the QTL interval, since the examination of the annotation can often suggest a more likely candidate.

In tomato, numerous studies have already demonstrated the effectiveness of these approaches. With the aim of deciphering the genetic basis of compositional quality in tomato fruit, the high-diversity S. pennellii IL population was phenotyped for a wide range of plant morphology traits and for fruit pericarp ‘primary’ metabolites (Schauer et al. 2006). An integrated cartographical network based on correlation analysis of these diverse phenotypes allowed the identification of morphology-dependent and morphology-independent links among a large number of QTL for fruit metabolism and yield. Moreover, the analysis revealed that harvest index (Fig. 4.2), which measures the efficiency of partitioning assimilated photosynthate to harvestable product (source-sink partitioning), was the chief pleiotropic hub in the combined network of metabolic and whole-plant phenotypic traits. These results suggest that plant structure has an important role in the final metabolite composition of the fruit. However, the strong negative association between metabolite content and yield was not found in lines heterozygous for the S. pennellii introgressions (ILHs) (Schauer et al. 2008). The uncoupling of the metabolic and morphological traits observed in the ILHs was attributed to the reduced fertility problems and the reduced range of fruit sizes displayed by the heterozygous lines compared to their homozygous counterparts.

Fig. 4.2

A system view of IL-born morphology and metabolism interplay. Cartographic representation of the combined metabolic and morphological network of the tomato ILs (Schauer et al. 2006). Each trait (node) is represented by a shape (metabolites by circles and the phenotypes by triangles). The metabolites are color-coded according to type: brown, amino acids; pink, sugars; green, organic acids; yellow, phosphates; grey, miscellaneous, and module names are defined according to the most prevalent trait type. A line connecting two traits represents a significant correlation between them. Correlation of all trait pairs was calculated using IL means (total of 76 lines); gray lines represent positive correlations, blue lines represent negative correlations (significance threshold of p < 0.0001). Harvest index (HI), the ratio of fruit yield to total plant mass (plant weight + fruit yield), is the central pleiotropic hub of the network. (Reproduced with permission from Lippman et al. (2007) Curr Opin Genet Dev 17:545, Fig. 3)
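The construction of such a network can be sketched in a few lines, under stated assumptions. The caption specifies the harvest index formula and the significance threshold (p < 0.0001) but not the correlation statistic, so the sketch below assumes Pearson correlation on IL means and approximates the p-value with the Fisher z-transform; the function names and the toy data are hypothetical.

```python
import math
from itertools import combinations
from statistics import NormalDist, mean


def harvest_index(fruit_yield, plant_weight):
    """HI as defined in the caption: fruit yield / (plant weight + fruit yield)."""
    return fruit_yield / (plant_weight + fruit_yield)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value from the Fisher z-transform (normal approximation)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def correlation_network(traits, alpha=1e-4):
    """Return edges (trait_a, trait_b, sign) for significantly correlated pairs."""
    edges = []
    for a, b in combinations(sorted(traits), 2):
        r = pearson_r(traits[a], traits[b])
        if approx_p_value(r, len(traits[a])) < alpha:
            edges.append((a, b, "positive" if r > 0 else "negative"))
    return edges


# Hypothetical means for 20 lines (the real analysis used 76 ILs).
fruit_yield = [1.0 + 0.1 * i for i in range(20)]
traits = {
    "HI": [harvest_index(y, 2.0) for y in fruit_yield],
    "sugar": [10.0 - 0.3 * i for i in range(20)],
    "acid": [5.0 + ((i * 7) % 5 - 2) * 0.5 for i in range(20)],
}
print(correlation_network(traits))
```

Nodes with many edges in the resulting list play the role of hubs; in the published network that position is occupied by harvest index. A real analysis with many trait pairs would also need to consider multiple testing, which the stringent threshold in the caption partly addresses.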

More recently, the S. pennellii IL library was used to gain insights into the genetic basis regulating natural variability in seed ‘primary’ metabolism and to unfold inter-organ correlations (Toubiana et al. 2012). The seed metabolite profiles were integrated with data from previous metabolic profiling studies on fruit pericarp, together with plant morphological traits and yield-related parameters (Schauer et al. 2006; Lippman et al. 2007). Metabolite QTL mapping and correlation-based metabolic network analysis of the integrated heterogeneous data matrices allowed a comparison of the seed and the fruit metabolic networks. The graphic outcome and network parameters showed that the seed metabolite network displayed stronger interdependence of metabolic processes than the fruit network, emphasizing the centrality of a tightly inter-regulated amino acid module in the seed metabolic network. Unlike the seed network, the fruit network was characterized by a rigid sugar module and by the absence of a fatty acid module. In addition, the analysis allowed the identification of a number of candidate genes that may be useful to improve the nutritional value of seeds.

Besides ‘primary’ metabolism, existing genetic variation stored in exotic libraries also represents a very powerful tool for the analysis of specialized (traditionally called ‘secondary’) metabolism (Schilmiller et al. 2012). For instance, glandular trichomes of cultivated tomato and wild tomato relatives produce a variety of structurally diverse volatile and non-volatile specialized metabolites, including terpenes, flavonoids and acyl sugars (Schilmiller et al. 2012). A genetic screen of leaf trichome and surface metabolite extracts of the S. pennellii LA0716 IL population allowed the identification of genomic regions of the wild parent influencing mono- and sesquiterpenes or only sesquiterpenes, and the quality or quantity of acyl sugar metabolites (Schilmiller et al. 2010). In addition, the S. pennellii ILs have also been profiled for accumulation of volatile fruit compounds, allowing the identification of 25 genetic regions from S. pennellii LA0716 that increased emissions of at least one of the 23 volatiles measured (Tieman et al. 2006; Mathieu et al. 2009). The ability to measure the influence of many regions of the genome on multiple metabolites provided important insights into the metabolic networks. Discovery of loci that influence emissions of multiple volatile compounds led to the hypothesis that these metabolites are biosynthetically related or regulated by a common regulatory network.

Finally, Lee et al. (2012) applied ripe fruit transcriptional and metabolic profiling to the S. pennellii LA0716 exotic library. Candidate gene mining based on correlation analyses allowed the identification of the ethylene response factor SlERF6. RNAi analysis showed that SlERF6 plays a central role in tomato ripening, integrating the ethylene and carotenoid synthesis pathways.

Together, these examples illustrate that with the continued development of genetic and “omics” tools, more detailed systems-level analyses will be possible, increasing the efficiency in discovery, candidate gene identification and cloning of target QTL.

3.3 Other Tomato Library Resources

In order to enhance the rate of progress of introgression breeding, Zamir (2001) proposed to invest in the development of a genetic infrastructure of “exotic libraries”. Along this line, for tomato, besides the S. pennellii LA0716 exotic library, populations of ILs and/or pre-ILs have been developed and/or further refined from other wild relatives including S. habrochaites (acc. LA1777) (Monforte and Tanksley 2000a; Tripodi et al. 2010; S. Grandillo, personal communication), S. habrochaites (acc. LA0407) (Finkers et al. 2007), S. chmielewskii (acc. LA1840) (Peleman and Van der Voort 2003; Prudent et al. 2009), S. neorickii (acc. LA2133) (Fulton et al. 2000; D. Zamir, personal communication), S. pimpinellifolium (acc. LA1589) (Doganlar et al. 2002; D. Zamir, personal communication), S. pimpinellifolium (acc. TO-937) (W. Barrantes and A.J. Monforte, personal communication) and the wild tomato-like nightshade S. lycopersicoides LA2951 (Chetelat and Meglic 2000; Canady et al. 2005) (Table 4.1).

The first set of S. habrochaites LA1777 ILs and pre-ILs was developed by Monforte and Tanksley (2000a) from the AB-QTL population (Bernacchi et al. 1998), and consisted of 99 ILs and BCRILs, in the cv. E6203 genetic background, providing an estimated coverage of approximately 85 % of the wild donor genome. The lines are highly variable for numerous traits including yield, leaf morphology and trichome density, cold tolerance, as well as fruit traits such as shape, size, color, biochemical composition and flavor volatiles. Favorable wild QTL alleles have been identified for several of the evaluated traits (Table 4.1) (Monforte and Tanksley 2000b; Van der Hoeven et al. 2000; Monforte et al. 2001; Yates et al. 2004; Dal Cin et al. 2009; Mathieu et al. 2009; Liu et al. 2012; Grandillo et al. 2011, 2013). Nevertheless, several lines of this initial population still contain multiple wild species chromosome segments. Therefore, within the framework of the EU project (EU-SOL; http://www.eu-sol.net/), an improved collection of S. habrochaites LA1777 ILs was developed and anchored to a shared framework of ~ 120 conserved ortholog set II (COSII) markers (Wu et al. 2006). This new population of LA1777 ILs allows a better genome coverage based on single-introgression lines (Tripodi et al. 2010; S. Grandillo, personal communication). Furthermore, leaf and fruit pericarp RNA-seq SNP data collected on this new panel of LA1777 ILs provided a better definition of the introgression boundaries and their anchoring to the tomato genome sequence (S. Grandillo and J. Giovannoni, personal communication; The Tomato Genome Consortium 2012).
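Donor genome coverage figures such as the one quoted above can be estimated, in principle, as the union of all introgressed segments divided by the genome length. The following is a minimal sketch of that calculation, not the procedure used by the cited authors; the function names, units and interval values are hypothetical.

```python
def merged_length(intervals):
    """Total length covered by possibly overlapping (start, end) intervals."""
    total = 0.0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged block and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent segment: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def donor_coverage(introgressions, chrom_lengths):
    """Fraction of the genome covered by at least one introgression.

    introgressions: chromosome -> list of (start, end) segments pooled
    over all lines of the library.
    """
    covered = sum(merged_length(iv) for iv in introgressions.values())
    return covered / sum(chrom_lengths.values())


# Hypothetical two-chromosome genome, positions in cM.
chrom_lengths = {"chr1": 100.0, "chr2": 80.0}
introgressions = {
    "chr1": [(0.0, 40.0), (30.0, 70.0)],  # two overlapping segments
    "chr2": [(10.0, 50.0)],
}
print(round(donor_coverage(introgressions, chrom_lengths), 3))
```

Overlapping segments are merged before summing so that regions carried by several lines are counted once. The estimate is only as good as the marker density used to define the segment boundaries.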

From the tomato AB-QTL populations, MAS has also been used to develop populations of ILs and pre-ILs in the genetic background of the processing cv. E6203 for S. pimpinellifolium LA1589 (196 BILs) (Grandillo and Tanksley 1996; Tanksley et al. 1996; Bernacchi et al. 1998; Doganlar et al. 2002; D. Zamir, personal communication) and S. neorickii LA2133 (142 BILs) (Fulton et al. 2000; D. Zamir, personal communication). Within the framework of the EU-SOL project, the 142 S. neorickii BILs have been evaluated for agronomic traits, including yield, Brix and fruit weight, and several favorable wild alleles were identified that could be targeted for further marker-assisted introgression into cultivated tomato (D. Zamir and S. Grandillo, personal communication).

Another population of 55 S. chmielewskii LA1840 ILs in the genetic background of the cv. Moneyberg was developed by KeyGene N.V. (Peleman and van der Voort 2003). A subset of these lines was used to study the effect of fruit load, and therefore of carbon availability, on the detection of QTL underlying fruit weight and composition (Prudent et al. 2009; Do et al. 2010), and on age- and genotype-dependent gene expression (Prudent et al. 2010). A model-based approach followed by genetic analysis allowed genetic relationships among processes to be uncoupled from physiological ones, and thus provided new insights towards understanding tomato fruit sugar assimilation (Prudent et al. 2011). Furthermore, phenotypic analysis of the S. chmielewskii LA1840 IL population revealed three overlapping ILs on chromosome 1 with a pink fruit color, a trait known to be regulated by the Y locus (Ballester et al. 2010). Biochemical and molecular data, along with gene mapping, segregation analysis and virus-induced gene silencing experiments, allowed the identification of SlMYB12 as a likely candidate for the Y locus (Ballester et al. 2010).

In order to facilitate marker-assisted breeding based on these wild species resources, and to facilitate comparisons between function maps of tomato and potato, some of the IL libraries described above have been anchored to the potato genome using a common set of ~ 120 COSII markers (Tripodi et al. 2010; S. Grandillo, personal communication). The multi-species IL platform includes ILs and BILs derived from interspecific crosses of tomato and the five wild accessions S. pennellii LA0716, S. habrochaites LA1777, S. neorickii LA2133, S. chmielewskii LA1840, and S. pimpinellifolium LA1589 (Tripodi et al. 2010; Brog et al. 2011). Multi-species IL platforms are highly divergent in phenotypes, providing abundant segregation for whole-genome naturally selected variation affecting yield, morphological and biochemical traits, and allow multiallelic effects to be captured. A draft sequence of S. pimpinellifolium LA1589 is already available (Tomato Genome Consortium 2012), and within the SOL-100 sequencing project (http://solgenomics.net/organism/sol100/view), sequences are becoming available for most of the parents of the tomato IL libraries described above, which will further enhance the value of these genetic resources.

4 Integrative Approaches to Genomic Introgression Mapping

Genetically well-characterized IL populations, anchored to highly saturated genetic maps, are key tools for rapid and precise localization of QTL and subsequent identification of the causal genes. Hitherto, the mapping of IL introgression boundaries has relied on a wide range of electrophoresis-based molecular tools, which have rarely ensured sufficiently high marker saturation (Severin et al. 2010). The development of new high-throughput molecular platforms that allow automated genotyping is accelerating the process of introgression mapping and IL library development and making it more precise. Dense genetic maps, in fact, allow the introgressed segments to be localized with high resolution, which is crucial for the selection of ILs carrying small marker-defined segments for genome-wide coverage of the donor parent genome.

A few studies have compared the efficiency of different genotyping platforms for genome introgression mapping. For instance, in rice, an IL population consisting of 128 ILs and pre-ILs derived from a cross between two sequenced rice cultivars, was genotyped with 254 PCR-based markers and then subjected to whole-genome re-sequencing (Xu et al. 2010). The high-quality physical map of ultrahigh-density SNPs identified 117 new segments (almost all shorter than 3 Mb) that had not been detected in the molecular marker map. The new method improved the resolution of recombination breakpoints 236-fold, and almost eliminated the likelihood of missing double-crossovers in the mapping population. Furthermore, the sequencing-based physical map allowed QTL bin mapping with higher accuracy, thus being of great potential value for gene discovery and genetic mapping.
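The reason dense SNP maps reveal segments that sparse marker maps miss can be shown with a small sketch. It assumes one genotype call per marker, "D" for donor and "R" for recurrent parent, ordered along a chromosome; the function `donor_segments` and the toy data are hypothetical and greatly simplified relative to a sequencing-based pipeline.

```python
def donor_segments(positions, calls, donor="D"):
    """Group consecutive donor calls into (first_pos, last_pos) segments.

    positions: marker positions along one chromosome, sorted ascending (Mb).
    calls: one genotype call per marker, "R" (recurrent) or "D" (donor).
    """
    segments = []
    start = prev = None
    for pos, call in zip(positions, calls):
        if call == donor:
            if start is None:
                start = pos  # a new donor run begins here
            prev = pos
        elif start is not None:
            segments.append((start, prev))  # the run ended at the previous marker
            start = None
    if start is not None:
        segments.append((start, prev))
    return segments


# Hypothetical chromosome with one long and one short donor segment.
dense_pos = list(range(30))  # one SNP per Mb
dense_calls = ["D" if 5 <= p <= 12 or 20 <= p <= 21 else "R" for p in dense_pos]

# The same line scored with a sparse marker set (every 8th position).
sparse_pos = dense_pos[::8]
sparse_calls = dense_calls[::8]

print("dense :", donor_segments(dense_pos, dense_calls))
print("sparse:", donor_segments(sparse_pos, sparse_calls))
```

In this toy case the sparse set sees the long segment through a single marker, so its boundaries are poorly defined, and it misses the short segment entirely. This mirrors, in miniature, the gain in breakpoint resolution and the newly detected short segments reported for the re-sequenced rice lines.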

Another study was recently conducted to compare some of the existing (Affymetrix SFP and Illumina GoldenGate) and emerging (Illumina NGS) technologies for soybean introgression mapping (Severin et al. 2010). The results show that SFP, Illumina GoldenGate, and RNA-Seq are complementary methods for identifying genetic introgressions in NILs. RNA-Seq methodologies clearly identified a much greater number of polymorphic loci within the known introgression sites, and the increased marker coverage allowed the introgression boundaries to be identified at a higher resolution. Comparative NGS analyses of NILs with their respective parental lines offer the additional advantage of identifying SNP polymorphisms that are specific to the genetic material of interest. The SNPs identified de novo by RNA-Seq can be directly used for fine-mapping in subsequent generations by means of custom SNP genotyping platforms. Furthermore, the RNA-Seq data may be mined for transcriptional differences or genetic alterations that may identify candidate genes that drive the differential phenotypes observed between the lines. In this respect, compared to the Affymetrix platform, the RNA-Seq data provide a larger sampling of transcripts and also permit the possible identification of frame-shift or nonsense mutations within introgressed loci. However, at the moment, besides the higher costs, the RNA-Seq approach also has the disadvantage of a marker depth necessarily biased towards gene-rich regions; therefore, even when bootstrapping methods are applied to correct for gene densities, severely gene-poor regions might not be represented in the analyses (Severin et al. 2010).

High-throughput genotyping platforms have also been used on interspecific IL populations of crops including barley (Sato and Takeda 2009; Schmalenbach et al. 2011), tomato (Sim et al. 2012; Van Schalkwyk et al. 2012) and rice (Ali et al. 2010). For example, in barley, an Illumina 1536-SNP array was used for high-resolution genotyping of a set of 73 ILs (S42ILs) originating from a cross between the spring barley cv. Scarlett (Hordeum vulgare ssp. vulgare) and the wild barley accession ISR42-8 (H. v. ssp. spontaneum) (Schmalenbach et al. 2011). The array enabled a precise localization of the wild barley introgressions in the elite barley background. In addition, to further develop this IL library into a resource for rapid identification, fine-mapping and positional cloning of QTL, segregating high-resolution mapping populations (S42IL-HRs) were developed for most ILs.

In tomato, the high-density “SolCAP” SNP array was used to genotype a large collection of tomato accessions, as well as the S. pennellii LA0716 ILs (Sim et al. 2012). In addition, Van Schalkwyk et al. (2012) reported the development of a diversity arrays technology (DArT) platform consisting of 6,912 clones from domesticated tomato and 12 wild tomato/Solanaceous species. The platform was validated by bin-mapping 990 polymorphic DArT markers together with 108 RFLP markers across the S. pennellii LA0716 IL library, resulting, on average, in a ten-fold increase of the number of markers available for each IL. A subset of DArT markers from ILs previously associated with increased levels of lycopene and carotene were sequenced, and 44 % matched protein coding genes. The conversion of the DArT markers to CAPS or SNP markers should facilitate fine mapping of QTLs in S. pennellii ILs.

In rice, about two dozen IL/BIL libraries have been developed representing different O. sativa backgrounds and wild donors, and most of the donors and recipient parents have been sequenced using second-generation sequencing technology and/or genotyped using the 44,100 SNP array (Table 4.1) (Ali et al. 2010). In addition, physical maps of 17 Oryza species (representing the 10 genome types) have been developed by the Oryza Map Alignment project (Ali et al. 2010; http://www.omap.org).

It is clear that high-throughput SNP assays and the availability of custom-designed medium- and low-density SNP arrays will greatly enhance the efficiency of whole-genome IL library development, allowing the selection of small marker-defined segments introgressed from the unadapted germplasm. Furthermore, the availability of SNP markers across the introgressed donor regions will facilitate fine-mapping and cloning of genes underlying target QTL.

5 Conclusions

Many crops have a very narrow genetic base that threatens future genetic gains. In contrast, wild species represent a rich, although mostly untapped, reservoir of valuable alleles that could be used to address present and future breeding challenges. For a more efficient exploitation of exotic germplasm, we need to capitalize on the acquired knowledge and on the ever-growing genetic and “omics” resources that are becoming available and that take advantage of many recently released crop genome sequences to investigate gene function (Hamilton and Buell 2012, http://genomevolution.org/wiki/index.php/Sequenced_plant_genomes). Among all model systems, the wild and domesticated species of the tomato clade have pioneered novel population development, such as IL populations or “exotic libraries” (Zamir 2001; Lippman et al. 2007). The last 20 years of research conducted on the S. pennellii LA0716 ILs (the founding population) have clearly demonstrated the power of these congenic and permanent resources for the genetic and molecular analyses of QTL, for dissecting heterosis, and hence for the development of a leading hybrid variety. Over the years, the IL approach has been integrated with various state-of-the-art ‘omics’ platforms, thus evolving beyond standard QTL identification towards a multifaceted systems-level analysis. These achievements have encouraged the research community to invest in the development of IL library resources, or related prebreds, such as BILs, representing different fractions of the exotic parent genomes, for a number of other tomato wild species, as well as for a wide range of crops. The results indicate that exotic germplasm stores a tremendous wealth of potentially valuable alleles, many of which would not have been predicted from the phenotypes of the wild plants.
However, only a small fraction of the naturally occurring genetic diversity available in the world’s genebanks has been explored to date, and made permanently accessible through IL population development. The advent of new cost-effective, high-throughput genotyping and sequencing technologies is expected to change this trend. Strategies based on phylogenetic approaches can be pursued to select the right parents that would maximize the probability of creating new useful transgressive segregation from which to select superior phenotypes (McCouch et al. 2012). In addition, the new high-throughput molecular platforms are accelerating and making more precise the process of introgression mapping and IL library development, and the availability of SNP markers across the introgressed donor regions will facilitate fine mapping and cloning of genes underlying target QTL.

In this context of fast-evolving technological advances, the availability of exotic libraries further increases the value of the numerous unadapted genetic resources stored worldwide in our in situ and ex situ germplasm collections.