Keywords

Introduction

The phenotypic variation of many traits of agricultural and evolutionary importance is quantitative in nature and results from the combined action of multiple segregating loci that may interact with each other as well as with the environment; this makes dissecting the genetic architecture and molecular basis of these traits a notoriously challenging endeavor (Falconer 1989). Before the advent of molecular markers, the genetics of complex traits was studied in general terms by “quantitative genetics” (Mather 1949), and no information was available about the number and location of the underlying genes, which Mather (1941) termed polygenes.

The theoretical foundations for mapping polygenes were laid as early as 1923, when Sax reported the association of seed size in bean (a quantitatively inherited trait) with seed-coat pigmentation (a discrete monogenic trait). Subsequently, Thoday (1961) elaborated the basic approach of using marker genes in segregating populations to systematically map and characterize individual polygenes, and Geldermann (1975) introduced the term quantitative trait locus (QTL) to describe a genetic locus where functionally different alleles segregate and cause significant effects on a polygenic trait. However, the application of Thoday’s idea had to wait until the 1980s, when isozyme markers began to be used as a general tool for QTL analyses in tomato (Solanum lycopersicum) (Tanksley et al. 1982; Vallejos and Tanksley 1983; Weller et al. 1988) and in maize (Edwards et al. 1987).

Numerous factors influence the power of QTL detection, including the heritability of the trait, gene action, the type of mapping population, marker coverage, the number and individual effects of QTL, and the distance between marker loci and the QTL affecting the trait (Tanksley 1993; Mackay et al. 2009). Early tomato QTL mapping studies mainly used morphological and isozyme markers in F2 and backcross (e.g., BC1) populations. Although several quantitative plant and fruit characteristics were analyzed, the number of informative isozyme markers was not sufficient to scan the entire tomato genome adequately for QTL, which made it difficult to estimate QTL positions precisely (Tanksley et al. 1982; Vallejos and Tanksley 1983; Weller et al. 1988). The constraint of limited marker availability was subsequently overcome by the development of DNA-based genetic markers, the first of which were restriction fragment length polymorphisms (RFLPs) (Botstein et al. 1980; Bernatzky and Tanksley 1986). In 1988, Paterson and collaborators reported a pioneering study in which a complete RFLP linkage map including 63 RFLPs, together with appropriate statistical procedures, was used in an interspecific tomato BC1 population to map and characterize QTL, demonstrating that complex traits could be dissected into single Mendelian factors. Thereafter, the number of RFLP markers available for tomato genetics increased to approximately 1000 (Tanksley et al. 1992). Meanwhile, QTL mapping in tomato has flourished and has been applied to hundreds of traits of agronomic and biological interest (Tables 4.1, 4.2, 4.3, and 4.4; reviewed by Foolad 2007; Labate et al. 2007; Grandillo et al. 2011, 2013; Grandillo 2013), using a variety of segregating populations and mapping strategies.
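
The “appropriate statistical procedures” used in QTL mapping range from simple comparisons of marker classes to interval mapping. As a minimal illustration of the simplest case (not necessarily the procedure used by Paterson et al. 1988), the sketch below tests a single marker in a hypothetical BC1 population: it compares the trait means of the two marker genotype classes and converts the resulting reduction in residual variance into a LOD score, assuming normally distributed residuals with a common variance. The function name and the data are hypothetical.

```python
import math
from statistics import mean


def single_marker_lod(genotypes, phenotypes):
    """Single-marker QTL test for a BC1 population (hypothetical helper).

    genotypes  -- marker class per plant: 0 = homozygous recurrent, 1 = heterozygous
    phenotypes -- trait value per plant, in the same order
    Returns (difference between class means, LOD score), where
    LOD = (n / 2) * log10(RSS_null / RSS_marker) under normal residuals.
    """
    if len(genotypes) != len(phenotypes):
        raise ValueError("genotypes and phenotypes must have the same length")
    n = len(phenotypes)

    # Null model: a single mean for all plants.
    grand_mean = mean(phenotypes)
    rss_null = sum((y - grand_mean) ** 2 for y in phenotypes)

    # Marker model: a separate mean for each marker genotype class.
    class_means = {}
    for g in (0, 1):
        values = [y for x, y in zip(genotypes, phenotypes) if x == g]
        if not values:
            raise ValueError(f"genotype class {g} has no plants")
        class_means[g] = mean(values)
    rss_marker = sum((y - class_means[x]) ** 2 for x, y in zip(genotypes, phenotypes))

    difference = class_means[1] - class_means[0]
    lod = (n / 2) * math.log10(rss_null / rss_marker)
    return difference, lod


# Hypothetical BC1 data: 8 plants scored for one marker and one trait.
marker = [0, 0, 0, 0, 1, 1, 1, 1]
trait = [10.0, 11.0, 9.0, 10.0, 14.0, 15.0, 13.0, 14.0]
effect, lod = single_marker_lod(marker, trait)
print(f"class difference = {effect:.2f}, LOD = {lod:.2f}")  # class difference = 4.00, LOD = 3.82
```

Interval mapping extends the same likelihood-ratio idea to putative QTL positions between flanking markers, which is why dense, complete linkage maps improved the precision of QTL position estimates.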

Table 4.1 Summary of QTL mapping studies for disease (viral, bacterial, fungal) resistance in tomato
Table 4.2 Summary of QTL mapping studies for pest resistance-related traits in tomato
Table 4.3 Summary of QTL mapping studies for abiotic stress tolerance in tomato
Table 4.4 Summary of QTL mapping studies for plant, flower, fruit and yield traits in tomato

An essential requirement for QTL mapping populations is sufficient polymorphism at marker loci and in the genes underlying the trait(s) of interest. Because of several genetic bottlenecks that occurred during tomato domestication and breeding, the genotypic diversity within cultivated germplasm is very narrow, as in other self-pollinated crops (Miller and Tanksley 1990; Blanca et al. 2012). This limitation has led tomato geneticists and breeders to also harness the rich genetic variation stored in unadapted germplasm, both for the development of mapping populations and for breeding (Rick 1982; Bai and Lindhout 2007). As a result, most tomato QTL mapping experiments conducted thus far have used distant crosses between cultivated germplasm and related wild species, although several successful intraspecific S. lycopersicum QTL studies have also been reported (Tables 4.1, 4.2, 4.3, and 4.4; Causse et al. 2001, 2007; Saliba-Colombani et al. 2001; reviewed by Foolad 2007; Labate et al. 2007; Grandillo et al. 2011, 2013).

As in other autogamous species, primary segregating populations such as F2 or early backcross (BC) progenies have been widely used for tomato QTL mapping. Over time, however, a more diverse repertoire of population structures has been employed, including recombinant inbred (RI) populations, advanced backcross (AB) populations, backcross inbred lines (BILs), and introgression lines (ILs) (Tables 4.1, 4.2, 4.3, and 4.4). As for marker technology, RFLP markers were initially used widely, after which PCR-based markers gained ground, and in many cases RFLP maps have been integrated with several types of PCR markers (reviewed by Grandillo et al. 2011, 2013). Although most known marker systems have found applications in tomato, the majority of them are too laborious and low-throughput to meet the requirements of the genomics era (Víquez-Zamora et al. 2013). These drawbacks are now being circumvented by next-generation sequencing (NGS) projects, which offer new possibilities to increase genotyping throughput significantly, and by high-throughput single nucleotide polymorphism (SNP) arrays, which have allowed massively parallel genome-wide screening of genotypes (Sim et al. 2012; Víquez-Zamora et al. 2013). In addition, thanks to the recently published tomato whole genome sequences (Tomato Genome Consortium 2012), next-generation resequencing approaches can also be applied to related germplasm (Causse et al. 2013; Aflitos et al. 2014).

The numerous QTL mapping studies conducted in tomato over the past three decades have provided information about the genetic architecture of complex traits, i.e., the estimated number of QTL and the magnitude of their additive, dominance, and epistatic effects in multiple environments. These efforts have resulted in the detection of thousands of QTL, many of which are of potential interest for tomato breeding and whose molecular bases remain to be elucidated (Tables 4.1, 4.2, 4.3, and 4.4) (reviewed by Foolad 2007; Labate et al. 2007; Grandillo et al. 2011, 2013; Grandillo 2013; Alseekh et al. 2013).

During these years, the tomato clade (Solanum sect. Lycopersicon), which encompasses the cultivated tomato (S. lycopersicum) and its 12 wild relatives (Peralta et al. 2008), has proven to be a model system not only for the identification (Paterson et al. 1988) and positional cloning of QTL (Frary et al. 2000; Fridman et al. 2000, 2004), but also for the development of new molecular breeding approaches aimed at a more efficient use of the wealth of genetic variation held in wild germplasm (Tanksley and Nelson 1996; Tanksley et al. 1996; Tanksley and McCouch 1997; Zamir 2001).

Although QTL mapping has proven to be a powerful method for dissecting the genetic architecture of complex traits and for breeding, it suffers from several drawbacks, including restricted allelic variation, low mapping resolution, and the time needed to develop mapping populations (Korte and Farlow 2013). To overcome these limitations and to facilitate the association of phenotypes with genotypes, alternative approaches have been suggested, including linkage disequilibrium (LD)-based association analysis, also referred to as association mapping (AM) (Flint-Garcia et al. 2003; Gupta et al. 2005), and next-generation genetic mapping populations such as Multi-parent Advanced Generation Inter-Cross (MAGIC) populations (Cavanagh et al. 2008). In recent years, the availability of the tomato genome sequences (Tomato Genome Consortium 2012), the related new high-throughput genotyping tools, and the development of new methodological approaches have allowed both strategies to be applied successfully in tomato as well (Sauvage et al. 2014; Pascual et al. 2015). These advances are paving the way for a more efficient exploitation of S. lycopersicum germplasm in breeding programs.

The status of QTL mapping in tomato has been the subject of several reviews (Foolad 2007; Labate et al. 2007; Grandillo et al. 2011, 2013; Grandillo 2013), and most of the studies have been summarized and updated in Tables 4.1, 4.2, 4.3, and 4.4. Therefore, and also because of space limitations, in this review we do not attempt a comprehensive discussion of the subject; rather, we focus on a few aspects, highlighting the new opportunities that the tomato genome sequences and related genomic tools are providing for the genetic and molecular dissection of complex traits and for accelerating the improvement of this important crop.

IL-Based Analysis of Complex Traits and Breeding

Since the first QTL mapping studies conducted in interspecific crosses of tomato, it became evident that the approach allowed a more efficient detection of “cryptic” genetic variants (Tanksley et al. 1982; Weller et al. 1988; de Vicente and Tanksley 1993). This suggested that, despite its overall inferior phenotype, unadapted germplasm is likely to be a rich source of agronomically favorable QTL alleles (Tanksley and McCouch 1997). However, to increase the efficiency with which natural biodiversity could be mined to improve the yield, adaptation, and quality of elite germplasm, and thus to bridge the gap between QTL mapping and QTL-based breeding, new concepts and strategies needed to be developed. These new methods also had to circumvent some of the constraints of QTL mapping in early biparental segregating generations (F2, F3, and BC1) or in RILs. The high proportion of donor parent alleles that still segregate in these populations can cause major QTL to overshadow the effects of independently segregating minor QTL, and can also lead to relatively high levels of epistatic interaction between donor QTL alleles and other donor genes. As a result, favorable donor QTL alleles detected in these mapping populations often lose their effects once they are introgressed into the genetic background of elite lines. In addition, in interspecific crosses involving exotic germplasm, QTL analyses can be further complicated by partial or complete sterility, since a few sterility genes may impede population development and/or prevent meaningful measurements of agronomically important traits (such as fruit characters).
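
To put the “high proportion of donor parent alleles” of early generations into numbers, the short sketch below computes the textbook expectation for the donor genome share in successive backcross generations. It assumes unselected progeny and ignores linkage (in practice, linkage drag around selected loci keeps the real share higher). It illustrates standard backcross expectations rather than a procedure from the works cited, and the function names are hypothetical.

```python
from fractions import Fraction


def expected_donor_share(n_backcrosses: int) -> Fraction:
    """Expected share of donor-parent alleles in a BC_n plant (hypothetical helper).

    n_backcrosses = 0 denotes the F1. Assumes unselected progeny and unlinked
    loci: the F1 carries 1/2 donor alleles, and every backcross to the
    recurrent parent halves that share. (An unselected F2 still averages 1/2.)
    """
    if n_backcrosses < 0:
        raise ValueError("n_backcrosses must be >= 0")
    return Fraction(1, 2) ** (n_backcrosses + 1)


def expected_heterozygous_share(n_backcrosses: int) -> Fraction:
    """Expected share of loci still carrying a donor allele (heterozygous) in BC_n."""
    if n_backcrosses < 0:
        raise ValueError("n_backcrosses must be >= 0")
    return Fraction(1, 2) ** n_backcrosses


for n in range(4):
    label = "F1" if n == 0 else f"BC{n}"
    print(label, expected_donor_share(n), expected_heterozygous_share(n))
# F1 1/2 1
# BC1 1/4 1/2
# BC2 1/8 1/4
# BC3 1/16 1/8
```

Each additional backcross halves the expected donor contribution, which is the rationale for postponing QTL analysis to advanced backcross generations in the AB-QTL strategy.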

To address these issues, two related molecular breeding strategies, “advanced backcross (AB) QTL analysis” (Tanksley and Nelson 1996; Tanksley et al. 1996) and “introgression line (IL) populations” or “exotic libraries” (Eshed and Zamir 1994, 1995; Zamir 2001), were implemented first in tomato and then in several other crops (Grandillo et al. 2008, 2013; Grandillo 2013). These methods were proposed to unlock more efficiently the genetic potential stored in seed banks and in exotic germplasm for the development of improved varieties, thereby expanding the genetic base of crop species (Tanksley and McCouch 1997; Zamir 2001). Both approaches have allowed the detection of favorable wild QTL alleles for numerous traits of agronomic and biological interest, along with the development of ILs or QTL-NILs that can be used in marker-assisted breeding programs (Grandillo et al. 2008; Grandillo 2013). Sets of ILs or QTL-NILs have also been developed from intraspecific crosses (Lecomte et al. 2004a; Chaïb et al. 2006). In some instances, these lines were developed to verify, stabilize, and fine-map QTL, in the same or in different genetic backgrounds, and therefore only a relatively small proportion of the donor parent genome was represented in them (Paterson et al. 1990; Tanksley et al. 1996; Bernacchi et al. 1998b; Monforte and Tanksley 2000b; Monforte et al. 2001; Lecomte et al. 2004b; Chaïb et al. 2006).

In tomato, the AB-QTL method has been applied to six interspecific crosses involving the same S. lycopersicum parent (cv. E6203) and six wild species selected to represent a broad spectrum of the phylogenetic tree: S. pimpinellifolium LA1589 (Tanksley et al. 1996), S. arcanum LA1708 (Fulton et al. 1997), S. habrochaites LA1777 (Bernacchi et al. 1998a, b), S. neorickii LA2133 (Fulton et al. 2000), S. pennellii LA1657 (Frary et al. 2004a), and S. chilense LA1932 (Termolino et al. 2010) (Table 4.4). These populations have been analyzed for numerous horticultural traits important for the tomato processing industry, using replicated field trials in several locations worldwide (Table 4.4). Overall, wild QTL alleles with favorable effects were detected for more than 45 % of the traits evaluated across the first five AB populations (reviewed by Grandillo et al. 2008). In addition, the first four AB-QTL populations have also been analyzed for biochemical traits possibly contributing to flavor (Fulton et al. 2002).

Concomitantly, the IL approach was proposed in D. Zamir’s laboratory, where the first whole-genome tomato IL population was developed; it comprised a core set of 50 lines, each carrying a single RFLP-defined homozygous chromosomal segment of the distantly related, green-fruited wild desert species S. pennellii LA0716 in the background of the processing inbred cv. M82 (Eshed and Zamir 1994, 1995). Several properties of IL populations contribute to their power in detecting and stabilizing QTL, and these have been widely discussed elsewhere (Zamir 2001; Lippman et al. 2007; Grandillo et al. 2008; Grandillo 2013). Collectively, the S. pennellii LA0716 ILs cover the whole genome of the wild parent in overlapping segments, which define unique “bins” to which genes and QTL can be mapped, albeit initially at a coarse average resolution. Another important feature of this IL library is its permanent nature: because it can be maintained by self-pollination, replicated measurements can be taken across different environments, years, and laboratories (Eshed and Zamir 1995).
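
The way overlapping introgressions partition the genome into bins can be made concrete with a small sketch. It assumes that each IL carries one segment on the chromosome of interest, described by hypothetical start and end positions (e.g., in cM); the IL names, coordinates, and helper functions are illustrative only and do not correspond to the actual LA0716 lines. Under the simplifying assumption of a single QTL, a QTL detected in a given set of ILs is assigned to the bin covered by exactly those ILs.

```python
def define_bins(ils):
    """Partition a chromosome into bins defined by overlapping IL segments.

    ils -- dict mapping IL name -> (start, end) of its single introgression.
    Returns a list of (start, end, frozenset_of_ILs) for each stretch covered by
    at least one IL; adjacent stretches covered by the same ILs are merged.
    """
    breakpoints = sorted({pos for segment in ils.values() for pos in segment})
    bins = []
    for left, right in zip(breakpoints, breakpoints[1:]):
        covering = frozenset(
            name for name, (start, end) in ils.items() if start <= left and right <= end
        )
        if not covering:
            continue  # gap not covered by any IL
        if bins and bins[-1][2] == covering and bins[-1][1] == left:
            bins[-1] = (bins[-1][0], right, covering)  # extend the previous bin
        else:
            bins.append((left, right, covering))
    return bins


def candidate_bins(bins, ils_with_effect):
    """Bins covered by exactly the ILs that showed a significant effect."""
    target = frozenset(ils_with_effect)
    return [(start, end) for start, end, covering in bins if covering == target]


# Hypothetical ILs on one chromosome (positions in cM).
ils = {"IL-A": (0, 40), "IL-B": (30, 70), "IL-C": (60, 100)}
bins = define_bins(ils)
for start, end, covering in bins:
    print(start, end, sorted(covering))
print(candidate_bins(bins, {"IL-A", "IL-B"}))  # [(30, 40)]
```

Three overlapping ILs yield five bins here, which shows how overlaps (and, later, subILs) refine resolution beyond the size of any single introgression.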

The numerous advantages of IL populations for the analysis of complex traits became evident from the first experiments conducted with the S. pennellii IL library (and, in some cases, also with the corresponding heterozygous lines, HILs) to map and fine-map QTL underlying horticultural yield and fruit quality traits (Eshed and Zamir 1995, 1996; Eshed et al. 1996). Since then, the S. pennellii IL population, and subsequently its second generation of 76 ILs and subILs (Pan et al. 2000; http://solgenomics.net/), have been publicly available and have been used to analyze a plethora of biologically and agronomically relevant traits, including whole-plant morphology and yield (including heterosis), primary and secondary metabolic composition, fruit color, enzyme activities, leaf, fruit, and root morphology, cellular development, biotic and abiotic stress tolerance, hybrid incompatibility, and gene expression (Tables 4.1, 4.2, 4.3, and 4.4) (Grandillo et al. 2011, 2013; Grandillo 2013), resulting in more than 3069 QTL identified in this population to date (reviewed in Alseekh et al. 2013).

To aid in the discovery of the genes underlying the many QTL described to date, the mapping resolution of the S. pennellii LA0716 IL library was improved through the addition of 285 marker-defined subILs, which break up the 37 largest ILs of the initial population (corresponding to approximately 75 % of the genome); work is ongoing to generate sublines for the remaining 25 % of the genome. Seeds for the subILs, as well as F2 seeds for each IL, are publicly available (Alseekh et al. 2013).

Panels of ILs derived from both interspecific and intraspecific crosses also represent a very valuable resource for obtaining more precise estimates of epistatic interactions (Eshed and Zamir 1996; Causse et al. 2007) and of QTL × genotype interactions (Eshed and Zamir 1995; Eshed et al. 1996; Monforte et al. 2001; Gur and Zamir 2004; Lecomte et al. 2004a; Chaïb et al. 2006; Causse et al. 2007). Because IL populations are immortal, phenotypic measurements can be taken on multiple replicates, which reduces environmental effects and increases statistical power. By replicating trials in more than one location and over time, it becomes possible to estimate QTL × environment interactions (Paterson et al. 1991; Eshed et al. 1996; Monforte et al. 2001; Liu et al. 2003b; Gur and Zamir 2004; Rousseaux et al. 2005). In this respect, a unique characteristic of the S. pennellii library is that phenotypic data from 45 IL experiments, in which 355 traits were scored in replicated measurements by multiple laboratories, have been deposited in the phenotype warehouse of Phenome Networks (http://phnserver.phenome-networks.com/) (Zamir 2013). The data can be browsed and statistically analyzed online, or downloaded from the site for analysis with other statistical software. This tool allows new data collected from the S. pennellii ILs to be compared with the results already available on the site.
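
The gain in statistical power from replication can be illustrated with a short sketch. Assuming a per-plot standard deviation σ that is the same for an IL and for the recurrent parent (cv. M82 in the S. pennellii library), the standard error of the difference between their means with n replicates each is σ·√(2/n), so quadrupling the number of replicates halves it. The second function computes Welch’s t statistic for one IL against the recurrent parent; the data are hypothetical, and a p-value would additionally require the t distribution with Welch–Satterthwaite degrees of freedom (for example via SciPy’s `scipy.stats.ttest_ind` with `equal_var=False`). This is a generic illustration, not the analysis used by Phenome Networks or by the cited studies.

```python
import math
from statistics import mean, variance


def se_of_difference(sigma: float, n_reps: int) -> float:
    """Standard error of (IL mean - control mean) with n_reps plots of each."""
    return sigma * math.sqrt(2 / n_reps)


def welch_t(il_values, control_values) -> float:
    """Welch's t statistic for an IL versus the recurrent parent."""
    se = math.sqrt(
        variance(il_values) / len(il_values)
        + variance(control_values) / len(control_values)
    )
    return (mean(il_values) - mean(control_values)) / se


# More replicates shrink the standard error, so smaller IL effects become detectable.
for n in (2, 4, 8, 16):
    print(n, round(se_of_difference(2.0, n), 3))

# Hypothetical trait values (arbitrary units) from three replicate plots per genotype.
il_plots = [12.0, 13.0, 14.0]
control_plots = [9.0, 10.0, 11.0]
print(round(welch_t(il_plots, control_plots), 2))  # 3.67
```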

Another relevant feature of IL biology, especially in interspecific crosses, is the exposure of new transgressive phenotypes not present in the parental lines. This phenomenon is caused by novel epistatic relationships arising between the donor parent alleles and the independently evolved molecular networks of the recipient parent (Lippman et al. 2007). A recent example is provided by Chitwood et al. (2013), who characterized the S. pennellii IL library for a suite of vegetative traits, ranging from leaf shape, size, complexity, and serration to cellular traits such as stomatal density and epidermal cell phenotypes. This analysis led to the identification of 1035 QTL: 826 in the direction of S. pennellii and 209 transgressive, beyond the phenotype of the domesticated parent. Additionally, Shivaprasad et al. (2012) explored the possible involvement of epigenetics and small silencing RNAs (sRNAs) in the stable transgressive phenotypes observed in the S. pennellii LA0716 IL library. Their results indicate that different sRNA-based mechanisms could be implicated in transgressive segregation, and that the transgressive accumulation of miRNAs and siRNAs is an indication of the hidden potential of the parents that becomes manifest in the hybrids.

The IL approach has also facilitated the exploration of the genetic basis of heterosis (Semel et al. 2006), along with its application for IL-based crop improvement, as shown by the development of a new leading hybrid of processing tomato through marker-assisted pyramiding of three S. pennellii introgressions carrying heterotic QTL (Gur and Zamir 2004; Lippman et al. 2007).

One shortcoming of most IL populations is their relatively low map resolution; nevertheless, each IL can be used as the starting point for high-resolution mapping. In this way, tight linkage of multiple QTL affecting one or more traits can be distinguished from pleiotropy (Alpert and Tanksley 1996; Eshed and Zamir 1996; Monforte and Tanksley 2000b; Monforte et al. 2001; Fridman et al. 2002; Frary et al. 2003; Chen and Tanksley 2004; Lecomte et al. 2004b; Stevens et al. 2008; Chapman et al. 2012; Haggard et al. 2013). Moreover, the identification of molecular markers more closely linked to the QTL of interest is the basis for the marker-assisted selection (MAS) of elite breeding lines carrying individual QTL or combinations of QTL.

Thanks to these properties, the S. pennellii ILs soon proved to be an efficient tool for the positional cloning of QTL (Frary et al. 2000; Fridman et al. 2000, 2004). However, despite the successes achieved so far, delimiting a QTL to a single gene or to a quantitative trait nucleotide (QTN) using genetic approaches is still an arduous and labor-intensive task. Therefore, over the years, alternative strategies have been tested to shortlist candidate genes for target QTL. For example, the S. pennellii IL population has been used to explore the potential of the “candidate gene approach” to identify candidate genes for QTL affecting tomato fruit color (Liu et al. 2003b), fruit size and composition (Causse et al. 2004), fruit ascorbic acid (AsA) content (Stevens et al. 2008), and vitamin E content (Almeida et al. 2011). While no colocalization was initially found between candidate genes and fruit color QTL (Liu et al. 2003b), several putative associations were observed in the other three studies.

Natural genetic variation stored in IL populations can also facilitate the integration of multiple cutting-edge “omic” platforms (genomic, transcriptomic, proteomic, and/or metabolomic) and large physiological data sets with statistical network analysis, allowing multifaceted systems-level analysis of integrated developmental networks and the identification of candidate genes underlying complex traits (Schauer et al. 2006, 2008; Lippman et al. 2007). These approaches can help identify previously uncharacterized networks or pathways, as well as candidate regulators of such pathways (Saito and Matsuda 2010). The availability of a full genome sequence can further help to narrow the list of genes in a QTL interval, since the annotation may point to a more likely candidate. In tomato, numerous studies have already demonstrated the power of these approaches to gain insights into the genetic basis of compositional quality in tomato fruit (Schauer et al. 2006, 2008), of seed “primary” metabolism (Toubiana et al. 2012), and of “secondary” metabolism (Schilmiller et al. 2010, 2012), as well as to uncover interorgan correlations (Toubiana et al. 2012). Furthermore, Morgan et al. (2013) showed that detailed biochemical characterization of the S. pennellii IL library can provide useful information to guide metabolic engineering strategies aimed at increasing health-related compounds in tomato fruit. Recently, Lee et al. (2012) used a systems-based approach combining transcriptomic analysis (based on the TOM2 oligonucleotide array) and metabolic data to identify key genes regulating tomato fruit ripening and carotenoid accumulation. Altogether, these examples suggest that, with the continued development of genetic and “omic” tools, more detailed systems-level analyses will become possible, increasing the efficiency of QTL discovery, candidate gene identification, and cloning of target QTL.

Considering the numerous successful applications of the S. pennellii LA0716 IL library, and to accelerate progress in introgression breeding, Zamir (2001) proposed investing in a genetic infrastructure of “exotic libraries.” Along these lines, besides the S. pennellii LA0716 library, additional IL and BIL populations covering different fractions of the wild species genomes have been developed and/or further refined in tomato for other wild relatives, including S. habrochaites LA1777 (Monforte and Tanksley 2000a; Tripodi et al. 2010; Grandillo et al. 2014; S. Grandillo et al., unpublished results), S. habrochaites LA0407 (Finkers et al. 2007b), S. chmielewskii LA1840 (Prudent et al. 2009), S. neorickii LA2133 (Fulton et al. 2000; D. Zamir personal communication), S. pimpinellifolium LA1589 (Doganlar et al. 2002; D. Zamir personal communication), S. pimpinellifolium TO-937 (Barrantes et al. 2014), and the wild tomato-like nightshade S. lycopersicoides LA2951 (Chetelat and Meglic 2000; Canady et al. 2005). Some of these populations have already been used to identify QTL for several traits (Tables 4.1, 4.2, 4.3, and 4.4). For instance, the S. chmielewskii LA1840 ILs have been used to explore the effect of different fruit loads on QTL detection (Prudent et al. 2009, 2010, 2011; Do et al. 2010; Kromdijk et al. 2014).

To facilitate marker-assisted breeding based on these wild species resources, and to facilitate comparisons between the functional maps of tomato and potato, some of the above-mentioned IL/BIL populations have been anchored to the potato genome using a common set of ~120 COSII markers (Wu et al. 2006; Tripodi et al. 2010; S. Grandillo et al. unpublished results). The multispecies IL platform includes ILs and BILs derived from the S. neorickii LA2133 AB population (Fulton et al. 2000; D. Zamir personal communication), a new set of S. habrochaites LA1777 ILs (Grandillo et al. 2014), the S. chmielewskii LA1840 IL population, and the S. pennellii LA0716 ILs and subILs (Alseekh et al. 2013). These genetic resources display highly divergent phenotypes and provide rich, genome-wide, naturally selected genetic variation affecting yield, morphological, and biochemical traits, thus allowing multiallelic effects to be captured.

The production of such congenic and permanent resources, however, is an arduous and time-consuming task that can take several years. The development of new high-throughput molecular platforms that allow automated genotyping is making IL development a much more efficient and precise process (Severin et al. 2010; Xu et al. 2010; Schmalenbach et al. 2011). Dense genetic maps allow high-resolution localization of the introgressed segments, which is essential for selecting ILs that carry single, small, marker-defined segments providing genome-wide coverage of the donor parent genome. Furthermore, IL populations genotyped at very high resolution should facilitate rapid and precise localization of QTL and subsequent identification of the underlying genes. In this respect, the S. pennellii LA0716 IL library has been genotyped using the high-density “SolCAP” SNP array (Sim et al. 2012), as well as a diversity arrays technology (DArT) platform, which resulted, on average, in a tenfold increase in the number of markers available for each IL (Van Schalkwyk et al. 2012). Additionally, Chitwood et al. (2013) genotyped the S. pennellii library at ultra-high density using two complementary approaches, RNA-Seq and RESCAN, which resulted in a precise definition of the boundaries of each IL at both the genomic and transcriptomic levels. Combining these data with the recently completed tomato genome has also allowed the exact gene content of each IL to be determined, which should aid both the molecular characterization of QTL and breeding efforts.
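
How dense marker data translate into introgression boundaries can be sketched as follows. The sketch assumes calls from markers ordered along one chromosome and coded "D" (homozygous donor) or "R" (homozygous recurrent), with any other code (e.g., "N" for missing data) skipped; heterozygous calls and genotyping errors, which real pipelines must handle, are ignored. For each run of donor calls, it reports the first and last donor markers (the minimal segment) and the flanking recurrent markers (which bound the maximal segment), so denser markers directly narrow the uncertainty about each boundary. The helper name and the data are hypothetical.

```python
def introgression_segments(calls):
    """Locate donor segments from marker calls ordered along one chromosome.

    calls -- list of (position, code) sorted by position; code "D" = homozygous
             donor, "R" = homozygous recurrent, anything else is skipped.
    Returns one tuple per segment:
        (first_donor, last_donor, left_flank, right_flank)
    where the flanks are the nearest recurrent markers (None at a chromosome end).
    The true breakpoints lie between each flank and the adjacent donor marker.
    """
    informative = [(pos, code) for pos, code in calls if code in ("D", "R")]
    segments = []
    i = 0
    while i < len(informative):
        if informative[i][1] != "D":
            i += 1
            continue
        j = i
        while j + 1 < len(informative) and informative[j + 1][1] == "D":
            j += 1  # extend the current run of donor calls
        left_flank = informative[i - 1][0] if i > 0 else None
        right_flank = informative[j + 1][0] if j + 1 < len(informative) else None
        segments.append((informative[i][0], informative[j][0], left_flank, right_flank))
        i = j + 1
    return segments


# Hypothetical calls (position in Mb, genotype code) for one IL chromosome.
calls = [(1.0, "R"), (5.0, "R"), (8.0, "D"), (12.0, "N"),
         (15.0, "D"), (20.0, "R"), (30.0, "D")]
print(introgression_segments(calls))
# [(8.0, 15.0, 5.0, 20.0), (30.0, 30.0, 20.0, None)]
```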

The recent availability of genome sequences for the parents of some of the IL populations described above is further enhancing the potential of these congenic and permanent genetic resources. To support QTL analyses in the S. pennellii IL library, following the release of the genome sequence of tomato (S. lycopersicum cv. Heinz) and of a draft sequence of S. pimpinellifolium (Tomato Genome Consortium 2012), Bolger et al. (2014) released the genome sequences of cv. M82 and S. pennellii LA0716. Anchoring the S. pennellii genome to the genetic map allowed the identification of candidate genes for stress tolerance traits; in addition, the study provided evidence for the role of transposable elements in the evolution of these traits (Bolger et al. 2014). These results demonstrate the power of sequencing the parental lines of extensively phenotyped permanent genetic populations. It is worth noting that, within the SOL-100 sequencing project (http://solgenomics.net/organism/sol100/view), sequences are becoming available for most of the parents of the tomato IL libraries described above, which will further enhance the value of these genetic resources.

Association Mapping and Next-generation Populations

QTL analysis conducted in biparental mapping populations using the linkage mapping approach has proven to be an effective tool for identifying the genetic basis of complex traits in plants, including tomato. The approach has several advantages, such as the lack of structure in the mapping population, the presence of alleles segregating at balanced frequencies, and the possibility of detecting rare alleles and epistasis. However, the method is limited by the restricted allelic variation in biparental mapping populations (only two alleles at a given locus can be studied simultaneously), by low mapping resolution (generally limited to 10–20 cM) because the few generations of recombination leave extended linkage blocks, and by the time-consuming crosses necessary for QTL mapping (Zhu et al. 2008).

Linkage disequilibrium (LD)-based association analysis, also known as association mapping (AM), has been proposed as an alternative approach that can overcome these drawbacks. The approach was pioneered in human genetics, where it has been exploited broadly to analyze human diseases (Kerem et al. 1989; Corder et al. 1994; reviewed by Visscher et al. 2012). Thanks to rapid advances in genomic tools and the consequent reduction in the costs of genomic technologies, AM is now becoming a popular and powerful strategy in crop genetics and crop improvement as well (for review, see Rafalski 2010; Flint-Garcia et al. 2003; Gupta et al. 2005; Zhu et al. 2008; Larsson et al. 2009; Korte and Farlow 2013). Two AM methodologies are in use: candidate gene association and whole genome scans, also called genome-wide association studies (GWAS) (Rafalski 2010).

AM approaches rely on natural patterns of LD (the nonrandom association of alleles at different loci in a population), as they use panels of theoretically unrelated individuals. For crops, the method capitalizes on the wide range of phenotypic variation and the historical recombination events accumulated in natural populations and in collections of landraces, breeding materials, and varieties to infer marker-phenotype associations (reviewed by Flint-Garcia et al. 2003; Rafalski 2010; Korte and Farlow 2013). This reduces research time, allows a broader genetic diversity to be sampled, and provides much greater genetic resolution owing to the larger number of recombination events. On the other hand, AM requires a thorough understanding of both the genetic structure and the extent of LD in the collection studied (Flint-Garcia et al. 2003; Myles et al. 2009). The decay of LD has been shown to differ dramatically between species; generally, LD is higher in selfing species such as cultivated tomato and rice than in outcrossing species. However, it can vary significantly even within a species, and among loci within a population, sometimes as a result of positive selection (Flint-Garcia et al. 2003; Myles et al. 2009; Robbins et al. 2011). The rate of LD decay influences the resolution with which a QTL can be mapped, the number and density of markers required, and the experimental design needed to perform an association analysis (Myles et al. 2009). AM can provide higher resolution than linkage mapping populations, provided enough markers are used; in an ideal scenario, it can lead to the identification of the causative polymorphism(s) underlying a QTL.
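
The LD measures behind these considerations can be computed directly from two-locus haplotype counts. The sketch below uses the standard definitions: the coefficient D = p_AB − p_A·p_B, its normalized form D′ (D divided by its maximum possible magnitude given the allele frequencies), and r² = D² / [p_A(1 − p_A)p_B(1 − p_B)]. It also includes the textbook expectation D_t = D_0(1 − c)^t for LD decay over t generations of random mating with recombination fraction c, which shows why many generations of historical recombination yield finer resolution than the single meiosis of an F2 or BC1. Note that this decay formula assumes random mating; under selfing, as in cultivated tomato, effective recombination is lower and LD decays more slowly. The counts and function names are hypothetical.

```python
def ld_measures(n_AB: int, n_Ab: int, n_aB: int, n_ab: int):
    """Return (D, signed D', r^2) from counts of the four two-locus haplotypes."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB, p_Ab, p_aB = n_AB / total, n_Ab / total, n_aB / total
    p_A = p_AB + p_Ab  # frequency of allele A at locus 1
    p_B = p_AB + p_aB  # frequency of allele B at locus 2
    d = p_AB - p_A * p_B
    # D' scales D by its maximum attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


def expected_ld(d0: float, c: float, generations: int) -> float:
    """Expected D after t generations of random mating: D_t = D_0 (1 - c)^t."""
    return d0 * (1 - c) ** generations


d, d_prime, r2 = ld_measures(40, 10, 10, 40)
print(round(d, 3), round(d_prime, 3), round(r2, 3))  # 0.15 0.6 0.36

# Two loci about 1 cM apart (c ~ 0.01): LD barely decays in one meiosis,
# but falls to about 37% of its initial value after 100 generations.
for t in (1, 10, 100):
    print(t, round(expected_ld(0.25, 0.01, t), 4))
```
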
Because of domestication, crops are prone not only to higher levels of LD but also to population structure (the presence of subgroups with unequal allele distributions in the population studied) and cryptic relatedness (the presence of close relatives in a sample of presumably unrelated individuals), all of which need to be taken into account in statistical analyses (Ranc et al. 2012; Korte and Farlow 2013). To handle the confounding effect of background loci that may be present throughout the genome due to LD, and thus to address the problem of high LD in GWA scans, Segura et al. (2012) proposed a multilocus mixed model (MLMM). In addition, several statistical methods have been suggested to reduce the risk of spurious (false-positive) or missed (false-negative) associations in GWA studies due to population structure and cryptic relatedness (Flint-Garcia et al. 2003; Mitchell-Olds 2010).
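
Why population structure must be modeled can be shown with a deliberately simple, noise-free toy example. In the hypothetical panel below, the trait differs only between two subgroups, while the marker allele happens to be more frequent in the higher-scoring subgroup. A naive regression of phenotype on marker then reports a spurious marker effect, which disappears once subgroup membership is included as a fixed covariate. This covariate adjustment is only the simplest correction; it is not the MLMM of Segura et al. (2012), and practical GWAS usually fit mixed models that also include a kinship matrix to capture cryptic relatedness. The sketch assumes NumPy is available; all data and names are illustrative.

```python
import numpy as np


def marker_effect(y, marker, covariates=()):
    """Least-squares estimate of the marker effect, optionally adjusted for covariates."""
    y = np.asarray(y, dtype=float)
    columns = [np.ones_like(y), np.asarray(marker, dtype=float)]
    columns += [np.asarray(c, dtype=float) for c in covariates]
    design = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]  # coefficient of the marker column


# Hypothetical panel: 12 accessions in two subgroups (0 and 1).
subgroup = np.array([0] * 6 + [1] * 6)
# The marker allele (coded 1) is rare in subgroup 0 and common in subgroup 1.
marker = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0])
# The trait depends only on subgroup membership, not on the marker.
trait = 10.0 + 10.0 * subgroup

naive = marker_effect(trait, marker)
adjusted = marker_effect(trait, marker, covariates=[subgroup])
print(round(naive, 2), round(abs(adjusted), 2))  # 3.33 0.0
```
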

Despite the advantages of AM in terms of resolution, allelic richness, and speed, pitfalls do exist, and linkage mapping is therefore considered a valuable complementary approach (Larsson et al. 2013). For this reason, the two strategies are often applied together to mitigate each other's flaws, for example by using linkage mapping to validate the associations identified by AM, thus reducing spurious associations (Flint-Garcia et al. 2003; Larsson et al. 2013).

In tomato, a few association studies have been conducted to dissect morphophysiological and fruit traits. Nesbitt and Tanksley (2002) used a collection of 39 cherry tomato accessions to test for associations between fruit size and the genomic sequence of the fw2.2 region, which controls fruit weight (Frary et al. 2000). However, the small size of the collection prevented the detection of any significant association. Subsequently, Mazzucato et al. (2008) investigated associations between 29 simple sequence repeat (SSR) markers and 15 morphophysiological traits in a collection of 50 tomato landraces. Recent association studies that included cherry tomato accessions (S. lycopersicum “cerasiforme”) have shown the potential of this genetic material for identifying QTL by GWAS in tomato (Ranc et al. 2012; Xu et al. 2013). In particular, Ranc et al. (2012) carried out a pilot study to define the optimal conditions, including the required marker density, for performing GWAS in tomato. They used an association panel of 90 accessions (63 cherry-type S. lycopersicum “cerasiforme,” 17 large-fruited S. lycopersicum, and 10 S. pimpinellifolium) and focused on chromosome 2, on which several clusters of QTL for fruit morphology and quality traits had previously been mapped (Causse et al. 2002). In another recent study, Xu et al. (2013) used low-density, genome-wide SNP markers (a SNPlex™ assay of 192 SNPs) on a large collection of 188 tomato accessions phenotyped for ten fruit quality traits. The collection comprised 44 heirloom and vintage cultivars of S. lycopersicum, 127 S. lycopersicum “cerasiforme” (cherry tomato) accessions, and 17 S. pimpinellifolium accessions. The results indicated that GWAS in tomato should be easier with the S. lycopersicum “cerasiforme” group, which has an admixture structure (its genomes being mosaics of S. lycopersicum and the closely related wild species S. pimpinellifolium). Compared with the cultivated group, these accessions showed on average higher minor allele frequencies (MAF), lower LD, and a less structured pattern. Despite the high level of LD found in the collection at the whole-genome level, a mixed linear model allowed the identification of several associations between SNP markers and fruit traits. However, the SNP density was still too low to identify SNPs in candidate genes.
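Minor allele frequency (MAF), cited above as one property favoring the cerasiforme group, is the frequency of the less common allele at a marker. SNPs with very low MAF offer little statistical power in an association scan. The minimal sketch below assumes diploid genotypes coded as the number of copies (0, 1, or 2) of an arbitrarily chosen counted allele, with None for missing calls. This coding is a hypothetical convention and does not reflect the format of any particular array or study.

```python
from typing import Optional, Sequence


def minor_allele_frequency(genotypes: Sequence[Optional[int]]) -> float:
    """MAF at one biallelic SNP from diploid allele dosages (0, 1, 2); None = missing."""
    called = [g for g in genotypes if g is not None]  # drop missing calls
    if not called:
        raise ValueError("no called genotypes")
    freq = sum(called) / (2 * len(called))  # frequency of the counted allele
    return min(freq, 1 - freq)              # fold to the less common allele


print(minor_allele_frequency([0, 0, 1, 2, None]))  # 0.375
```

Because the result is folded to the less common allele, it does not matter which allele is counted. This is convenient when comparing groups, such as cultivated and cerasiforme accessions, that may differ in which allele is the major one.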

In recent years, the release of the tomato genome sequences (Tomato Genome Consortium 2012) and of derived genomic tools, such as a high-density SNP genotyping array (Sim et al. 2012), has offered new opportunities for GWAS in this crop. Shirasawa et al. (2013) analyzed a large collection of 663 tomato accessions with approximately 1300 SNPs obtained from resequencing analysis. Although GWAS identified SNPs significantly associated with the measured agronomic traits, the study investigated only a limited number of traits (eight), with low precision in the association collection. More recently, Sauvage et al. (2014) successfully applied high-resolution GWA to decipher the genetic architecture of tomato fruit composition traits, using an MLMM as a general method for mapping complex traits in structured populations. For this purpose, a core collection of 163 tomato accessions composed of S. lycopersicum, S. lycopersicum “cerasiforme,” and S. pimpinellifolium was genotyped with 5995 SNP markers spread over the whole genome. GWAS was conducted on a large set of metabolic traits that were stable over 2 years, and the analysis identified promising candidate loci underlying traits such as fruit malate and citrate levels.

Although AM has rarely been used to identify the molecular bases of QTL in tomato, it has recently been applied successfully to identify the QTNs responsible for differences in locule number between S. lycopersicum “cerasiforme” and S. lycopersicum (Muños et al. 2011). Furthermore, Chakrabarti et al. (2013) pursued a combined approach to clone the tomato fruit mass QTL fw3.2. In this case, association mapping followed by segregation analysis made it possible to circumvent the low rate of LD decay found around the fw3.2 locus and to identify a SNP in the promoter of the SlKLUH gene.

In order to overcome many of the shortcomings of both traditional biparental QTL mapping and AM approaches, a new generation of genetic mapping populations, including Multi-parent Advanced Generation Inter-Cross (MAGIC) populations, has been proposed (Cavanagh et al. 2008). These next-generation populations combine the controlled crosses of QTL mapping with multiple parents and several generations of intermating. This increases recombination and mapping resolution and expands (albeit only up to a certain point) allelic richness within the mapping population. The first tomato MAGIC population was recently developed by Pascual et al. (2015) by intercrossing eight resequenced S. lycopersicum founder lines, which had been selected to cover a wide range of genetic diversity. The study showed the potential of this tomato MAGIC population for better exploitation of intraspecific genetic variation, for QTL mapping, and for the identification of causal polymorphisms.

From QTL to QTN and Epialleles

A fundamental question in modern biology is the identification of the causative genes and genetic changes underlying complex traits. Although much progress has been made in detecting QTL, the molecular cloning of the underlying genes is lagging behind.

In tomato, map-based strategies using higher-resolution near-isogenic lines derived from the S. pennellii LA0716 ILs were successfully applied to clone the first QTL ever isolated: fw2.2 (fruit weight) (Frary et al. 2000; Cong et al. 2002) and Brix9-2-5 (sugar yield, or Brix) (Fridman et al. 2000, 2004). Both are major QTL. Natural genetic variation at fw2.2 alone can change fruit size by up to 30 % (Frary et al. 2000), while Brix9-2-5 can increase sugars by as much as 25 % (Fridman et al. 2000, 2004). The gene underlying fw2.2 encodes a negative regulator of cell division belonging to the Cell Number Regulator (CNR) family and controls tomato fruit mass. Members of this family also control organ size in other species, e.g., maize (Guo et al. 2010; Guo and Simmons 2011), as well as nitrogen-fixing nodule number (Libault et al. 2010). Natural variation at fw2.2 was correlated with modest changes in transcript quantity and in the timing of gene expression. By contrast, the difference between the cultivated and wild species alleles at Brix9-2-5 was found to be caused by altered enzyme activity. A single nucleotide change in the cell wall invertase gene LIN5 leads to a single amino acid change in the corresponding protein, very close to the substrate-binding site of the enzyme (Fridman et al. 2004). A comparative association study between nucleotide polymorphism and LIN5 activity, conducted in a set of ILs derived from additional tomato species, led to the identification of the causative quantitative trait nucleotide (QTN) (Fridman et al. 2004). These first two studies demonstrated that IL-based Mendelian segregation is a very efficient way to partition continuous variation for complex traits into discrete molecular components.
Furthermore, these QTL were the first of many to show that, as with numerous genes controlling monogenic traits, the variation underlying QTL alleles in plants can reside in both coding and regulatory regions of single genes (Paran and Zamir 2003; Salvi and Tuberosa 2005).

Because of domestication and selection, tomato cultivars show wide variation in fruit morphology (size and shape), which is under the control of a large number of QTL (Grandillo et al. 1999; Tanksley 2004; van der Knaap et al. 2014). Wild and semi-wild forms of tomato bear small fruit that may weigh only a few grams and are usually round and bilocular. By contrast, fruit of modern tomato varieties may contain many locules (10 or more) and weigh up to 1 kg. It also comes in a wide variety of shapes, which have recently been classified into eight categories (flat, ellipsoid, rectangular, oxheart, heart, long, obovoid, and round) using the software program Tomato Analyzer (Brewer et al. 2006, 2007; Rodriguez et al. 2010, 2011). Among the numerous fruit mass QTL identified in tomato, six loci [fruit weight1.1 (fw1.1), fw2.2, fw2.3, fw3.1/fw3.2, fw4.1, and fw9.1] are postulated to be major QTL. Major fruit shape QTL include ovate, locule number (lc), sun, fs8.1, and fasciated (f or fas) (Grandillo et al. 1999; Tanksley 2004; Chakrabarti et al. 2013; van der Knaap et al. 2014).

Following the positional cloning of fw2.2, significant efforts have been invested in deciphering the molecular basis of tomato fruit morphology. The map-based cloning of six tomato fruit shape and weight genes has demonstrated that inversions, duplications, and SNPs in promoters and coding regions all control the phenotypic diversity of the tomato fruit (reviewed by Monforte et al. 2014; van der Knaap et al. 2014). The cloning of fw2.2 revealed that one of the earliest steps in the evolution of larger tomato fruit was caused by a heterochronic regulatory mutation in a cell cycle–control gene. This is consistent with the observation that large fruits contain more cells than small fruits (Frary et al. 2000; Cong et al. 2002). More recently, Chakrabarti et al. (2013) reported the fine mapping and cloning of a second major tomato fruit mass QTL, fw3.2, which encodes SlKLUH, the ortholog of KLUH and a P450 enzyme of the CYP78A subfamily. A combination of association mapping, segregation analysis, and transgenic studies identified a likely regulatory SNP in the promoter of the gene that was highly associated with fruit mass. The increase in fruit mass resulted from extra cell layers produced in the pericarp after fertilization, which implies that SlKLUH affects cell division.

Changes in fw2.2 and other cell cycle–related genes, however, cannot explain the extreme fruit size observed in modern tomato cultivars. Rather, the development of extreme fruit size has been associated with several QTL affecting locule number, which can influence both fruit size and shape. Two of these QTL, fas (chromosome 11) and lc (chromosome 2), and their epistatic interactions explain most of the phenotypic variation (Lippman and Tanksley 2001; Barrero and Tanksley 2004). Both QTL affect organ (carpel) number rather than size, but fas exerts the larger effect. In addition, both QTL influence flat fruit shape (Lippman and Tanksley 2001; Barrero and Tanksley 2004; Barrero et al. 2006; Rodriguez et al. 2011). Besides fas and lc, two other major fruit shape QTL whose molecular bases have been deciphered are ovate (chromosome 2) and sun (chromosome 7), and both influence fruit elongation (Tanksley 2004; Rodriguez et al. 2011).

Positional cloning of ovate was achieved using segregating populations derived from S. pennellii ILs (Liu et al. 2002). The gene encodes a member of the Ovate Family Proteins (OFP), which are thought to negatively regulate transcription of target genes (Liu et al. 2002; van der Knaap et al. 2014). A premature stop codon in OVATE is responsible for fruit elongation. The OVATE gene affects fruit shape well before anthesis, and the increase in fruit elongation is caused by cell proliferation in the proximal region of the developing ovary (van der Knaap and Tanksley 2001; Monforte et al. 2014; van der Knaap et al. 2014).

The same S. pennellii IL-based strategy was adopted to clone the gene underlying the fas QTL, which was found to encode a YABBY-like transcription factor. A mutation in FAS leads to an increase in locule number, which affects both fruit shape (flattened fruit) and fruit mass (larger fruit) (Lippman and Tanksley 2001; Cong et al. 2008). The mutation was initially attributed to a large insertion in the first intron of YABBY (Cong et al. 2008). However, a reexamination of the genome rearrangement at the fas locus demonstrated that the mutation is due to a 294-kb inversion disrupting the YABBY gene (Huang and van der Knaap 2011).

For the cloning of the other two major fruit shape QTL, sun and lc, the S. pennellii IL resource could not be used. For sun, the obstacle was its map position, as this locus lies within a paracentric inversion in the S. pennellii genome (van der Knaap et al. 2004). For lc, the limitation arose from its weaker effect on locule number compared with that of fas. It was therefore necessary to eliminate all confounding genetic background effects.

Positional cloning of sun revealed that the gene underlying this QTL encodes a member of the IQ domain family (Xiao et al. 2008). The elongated fruit phenotype is caused by an unusual interchromosomal 24.7-kb gene duplication event mediated by the long-terminal repeat retrotransposon Rider. This duplication results in much higher expression of SUN throughout floral and fruit development and in an extremely elongated fruit (Xiao et al. 2008; Jiang et al. 2009; Wu et al. 2011). Although fruit shape patterning mediated by SUN is most likely established before anthesis, the most significant fruit shape changes take place after fertilization, during the cell division stage of fruit development (van der Knaap and Tanksley 2001; Xiao et al. 2009).

More recently, the lc QTL was cloned by combining map-based cloning with association mapping (Muños et al. 2011). Map-based cloning delimited the locus to a 1600-bp region between a putative ortholog of WUSCHEL (WUS), which encodes a homeodomain protein that regulates stem cell fate in plants, and a gene encoding a WD40 motif-containing protein. Association mapping then refined the molecular characterization of lc to two SNPs located approximately 1080 bp downstream of the WUS stop codon (Muños et al. 2011). Subtle changes in the expression of SlWUS are likely the cause of the increased number of locules determined by lc (van der Knaap et al. 2014). It has also been suggested that the lc mutation might cause loss of function of a regulatory element. This would allow higher expression of SlWUS, resulting in the maintenance of a larger stem cell population and hence in increased locule numbers (van der Knaap et al. 2014).

Map-based cloning approaches have also been used to decipher the molecular basis of two other major QTL in tomato. The first is style length 2.1 (Style2.1) (Chen et al. 2007), which controls a key floral attribute associated with the evolution of autogamy in cultivated tomatoes. The second is seed weight 4.1 (Sw4.1) (Orsi and Tanksley 2009). Mapping studies had demonstrated that most of the structural changes that accompanied the evolutionary transition from cross-pollinating to self-pollinating flowers could be explained by a single major QTL on chromosome 2, designated stigma exsertion 2.1 (se2.1) (Bernacchi and Tanksley 1997; Fulton et al. 1997). Fine mapping showed that se2.1 is a complex locus composed of at least five closely linked genes: three controlling stamen length, one conditioning anther dehiscence, and a fifth controlling style length (Style2.1), which accounted for the greatest change in stigma exsertion (Chen and Tanksley 2004). Positional cloning of Style2.1 revealed that this gene encodes a putative transcription factor that regulates cell elongation in developing styles. The transition from allogamy to autogamy was caused by a mutation in the Style2.1 promoter that leads to downregulation of Style2.1 expression during flower development (Chen et al. 2007).

The numerous QTL mapping studies conducted for tomato seed size in several interspecific crosses have revealed over 20 QTL accounting for most seed size variation. Among these, the major QTL Sw4.1, mapping on chromosome 4, consistently explained a large fraction (up to 25 %) of the total phenotypic variation in segregating populations (Table 4.4) (reviewed by Doganlar et al. 2000b). For this reason, Sw4.1 was selected for map-based cloning. Using a combination of genetic, developmental, molecular, and transgenic techniques, Orsi and Tanksley (2009) identified a gene encoding an ABC transporter as the cause of the Sw4.1 QTL. This gene exerts its control on seed size via its expression in the developing zygote.

Despite the successes achieved so far, delimiting a QTL to a single gene using genetic approaches is still a technically demanding and daunting undertaking, largely limited to loci exerting large effects upon quantitative variation. In order to enhance the rate of QTL cloning, integrated strategies, which combine near-isogenic line mapping with “omic” analyses (transcriptome or genomic resequencing, metabolome and/or proteome) can be pursued (Wayne and McIntyre 2002). These approaches represent efficient tools for exploring the functional relationship between genotype and phenotype, as they facilitate filtering through candidate genes in a QTL interval. In line with this, Lee et al. (2012) applied ripe fruit transcriptional and metabolic profiling to the S. pennellii LA0716 exotic library. Correlation analyses allowed mining for candidate genes, and the ethylene response factor SlERF6 was identified as a valuable target for RNAi analysis, which showed that SlERF6 plays a central role in tomato ripening integrating the ethylene and carotenoid synthesis pathways. This study demonstrated the utility of systems-based analysis to identify genes controlling complex biochemical traits in tomato.

More recently, Quadrana et al. (2014) identified the gene underlying a major tomato vitamin E (VTE) QTL (mQTL9-2-6), which encodes a 2-methyl-6-phytylquinol methyltransferase (VTE3(1)). The authors used a combination of reverse genetic approaches, expression analyses, siRNA profiling, and DNA methylation assays. They demonstrated that mQTL9-2-6 is an expression QTL associated with differential methylation of a SINE retrotransposon located in the promoter region of VTE3(1). In addition, spontaneous reversions of promoter DNA methylation gave rise to different epialleles affecting VTE3(1) expression and, consequently, VTE content in fruits. These findings demonstrate that epigenetics can affect quantitative phenotypes of agronomic interest.

Conclusions and Perspectives

We have reviewed more than three decades of research conducted in tomato to dissect the genetic and molecular bases of quantitative traits. Over these years, the tomato clade (Solanum sect. Lycopersicon) has been at the forefront not only of the localization, characterization, and positional cloning of QTL, but also of the development of new molecular breeding strategies. These include the “AB-QTL” and the “IL libraries,” both aimed at a more efficient exploitation of the wealth of genetic variation stored in unadapted germplasm. The last 20 years of research conducted on the founder S. pennellii LA0716 IL library have demonstrated the power of these congenic and permanent resources for the genetic and molecular analyses of QTL and for exploring the genetic bases of heterosis. They have also demonstrated the value of these resources for related practical outcomes, which have resulted in the development of a leading hybrid variety.

The numerous QTL mapping studies conducted in tomato so far have allowed the identification of thousands of QTL, many of which are of potential interest for the improvement of this crop. However, despite this wealth of genetic information, only a few major QTL have been isolated to date. In order to reverse this trend, the tomato research community is capitalizing on the ever-growing set of genetic and “omic” tools, which, in turn, build on the recently released tomato genome sequences (Tomato Genome Consortium 2012). In this respect, integrated approaches are allowing more detailed systems-level analyses that hold the promise of enhancing our understanding of the functional relationship between genotypes and complex phenotypes (Schauer et al. 2006, 2008; Lee et al. 2012; Chitwood et al. 2013; Pascual et al. 2013).

In addition, the availability of the tomato genome sequences (Tomato Genome Consortium 2012), along with the advent of new cost-effective, high-throughput genotyping and sequencing technologies, is opening new avenues for reexamining the variation and inheritance of quantitative traits at the intraspecific level (Pascual et al. 2015; Sauvage et al. 2014). AM approaches can be viewed as complementary to AB-QTL and IL populations. They represent an additional tool for exploring and exploiting the extant functional diversity available for each crop species on a much larger scale (Zhu et al. 2008). Furthermore, within the SOL-100 sequencing project (http://solgenomics.net/organism/sol100/view), sequences are becoming available for most of the parents of the tomato IL/BIL populations developed so far. In principle, this should allow traits to be mapped to known sequence variation and should therefore provide a major advance in the identification of valuable alleles, further increasing the value of these genetic resources (Bolger et al. 2014). In view of the rapid developments in sequencing technology, methods based on whole-genome sequencing, such as QTL-seq, are also expected to accelerate crop improvement in a cost-effective way (Takagi et al. 2013).
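As a rough illustration of how QTL-seq works, the approach of Takagi et al. (2013) sequences two bulks of progeny with extreme phenotypes. At each SNP it computes a SNP-index, the fraction of reads carrying the allele that differs from the reference parent. Genomic regions where the difference between bulks, Δ(SNP-index), departs strongly from zero point to a QTL. The sketch below assumes that per-SNP read counts are already available for both bulks and are aligned by position. The toy counts and window size are hypothetical, and the confidence thresholds used in practice, which are obtained by simulation, are omitted.

```python
from statistics import mean


def snp_index(alt_reads: int, ref_reads: int) -> float:
    """Fraction of reads at one SNP carrying the non-reference allele."""
    depth = alt_reads + ref_reads
    if depth == 0:
        raise ValueError("no read coverage at this SNP")
    return alt_reads / depth


def delta_snp_index(high_bulk, low_bulk):
    """Per-SNP Δ(SNP-index) = SNP-index(high bulk) - SNP-index(low bulk).

    Each argument is a list of (alt_reads, ref_reads) tuples in the same SNP order.
    """
    if len(high_bulk) != len(low_bulk):
        raise ValueError("bulks must cover the same SNPs")
    return [snp_index(*h) - snp_index(*l) for h, l in zip(high_bulk, low_bulk)]


def sliding_window_mean(values, window):
    """Average over consecutive windows of `window` SNPs, stepping one SNP at a time."""
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]


# Hypothetical read counts at three neighbouring SNPs.
high = [(10, 10), (18, 2), (20, 0)]
low = [(10, 10), (2, 18), (0, 20)]
deltas = delta_snp_index(high, low)          # approximately [0.0, 0.8, 1.0]
print(sliding_window_mean(deltas, window=2))  # approximately [0.4, 0.9]
```

An unlinked SNP is expected to have a Δ(SNP-index) near zero, because both bulks carry the two parental alleles in similar proportions. Near a QTL, the bulks diverge, which is why smoothing over windows of adjacent SNPs is used to reveal the peak.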

In order to facilitate the identification of candidate genes and thus help elucidate the molecular basis of quantitative phenotypes, several bioinformatic tools are being developed (Tecle et al. 2010; Chibon et al. 2012). Notably, the Sol Genomics Network (SGN, http://solgenomics.net) has implemented a new QTL module, solQTL. Through a user-friendly web interface, it allows researchers to upload their raw genotype and phenotype QTL data to SGN, perform QTL analysis, and dynamically cross-link to relevant genetic, expression, and genome annotations.

The constant improvement of molecular platforms and the development of new types of genetic resources are expected to rapidly enhance our ability to unveil the molecular basis of QTL, including those without a major effect. Progress in bioinformatics and in tools for functionally testing candidate genes should contribute in the same way.

In spite of all these technological advances, QTL mapping in biparental populations will probably remain the method of choice for the analysis of epistatic interactions and when rare alleles are involved, especially those with moderate effects (Rafalski 2010). Regardless of the mapping approach used, independent validation of the associations and evaluation of their effects in different genetic backgrounds remain essential aspects of QTL analyses. Furthermore, the role of epigenetics in determining variation in quantitative traits and in phenotypic plasticity needs to be further addressed (Cobb et al. 2013; Quadrana et al. 2014).

Given the wealth of low-cost genomic information that is rapidly becoming available for most important crop species, phenotyping is emerging as the major bottleneck and funding constraint limiting the power of quantitative trait analyses (Cobb et al. 2013). There is a clear need for precision phenotyping systems that can provide high-quality phenotypic information on the entire collection of genetic factors underlying quantitative phenotypic variation. Such information is needed at all levels of biological organization (cells, tissues, organs, and developmental stages), as well as across years, environments, species, and research programs (Chitwood and Sinha 2013; Cobb et al. 2013). With the development of high-throughput platforms and image analysis software packages, next-generation phenotyping will require novel data management, access, and storage systems (Cobb et al. 2013). In this framework, public phenotype “warehousing” databases are foreseen as an additional necessary tool to deepen our understanding of the genetic and molecular architecture of complex traits (Zamir 2013). Such databases should thus help ensure continued advances in crop improvement aimed at sustainably meeting the demands of a growing human population under changing climates (Godfray et al. 2010).