Introduction

Rapid progress in high-throughput sequencing technology has led to the completion of more than one hundred plant genome sequencing projects to date, including tobacco (Bombarely et al. 2012; Sierro et al. 2013, 2014) and five other Solanaceae members (Potato Genome Consortium 2011; Tomato Genome Consortium 2012; Bolger et al. 2014; Hirakawa et al. 2014; Kim et al. 2014; Bombarely et al. 2016). To decipher the functions of predicted genes, identify novel genes, and understand the molecular basis of important agronomic traits in these plants, large libraries of mutant lines have been generated by random mutagenesis with chemical [e.g., ethyl methane sulfonate (EMS)], physical (e.g., fast neutrons and X-rays), or biological (e.g., T-DNA and transposon) agents to meet the needs of functional genomics (Alonso and Ecker 2006).

The most direct and effective way to investigate gene function is to characterize the phenotypic variation of mutants. This variation is the raw material for many fields of plant biology, including gene function research, mutation breeding for desirable characteristics, and studies of mutant plant tolerance to other organisms (e.g., viruses, bacteria, and fungi) (Oellrich et al. 2015). Ultimately, large-scale phenotyping of saturated mutant libraries can reveal the relationship between genotypes and phenotypes and build a highly informative bridge between genomes and phenomes.

T-DNA and transposon insertions have been the most successful approaches for linking genotype to phenotype in Arabidopsis, where they have produced sequence-indexed mutant collections suitable for systematic reverse genetics (Kuromori et al. 2009; O’Malley and Ecker 2010; Bolle et al. 2011; Wilson-Sánchez et al. 2014). A large number of insertion mutant libraries have also been generated in rice and are widely used for phenotypic analyses and gene function studies (Kuromori et al. 2009; Chang et al. 2012; Wang et al. 2013). In these diploid species, loss-of-function and gain-of-function insertional mutagenesis have complementary merits. However, 12.2% of Arabidopsis genes still lack insertions even though the number of available insertion mutants has grown to 385,000 (O’Malley and Ecker 2010); one probable reason is that T-DNA integration is strongly biased, for example toward intergenic regions (Sessions et al. 2002; Alonso et al. 2003; Rosso et al. 2003). Insertional mutagenesis also has a much lower mutagenic efficiency than EMS mutagenesis (Alonso and Ecker 2006; Warnasooriya and Montgomery 2014). EMS predominantly induces single base-pair substitutions, although it can also generate other DNA alterations such as small insertions and deletions, as demonstrated in a study of more than 1900 EMS-induced mutations in Arabidopsis (Greene et al. 2003; Till et al. 2003). More importantly, EMS treatment can generate approximately 400 mutations per genome, compared with an average of 1.5 insertions per line in T-DNA insertional mutagenesis (Alonso et al. 2003; Till et al. 2003).
Owing to its high efficiency and unbiased mutagenic distribution, EMS mutagenesis is the preferred choice in exploratory studies aimed at identifying specific mutant phenotypes (Alonso and Ecker 2006), especially morphological phenotypes of allopolyploids such as tobacco (Nicotiana tabacum, allotetraploid, 2n = 48).

Tobacco is an economically important crop and a model plant for studying fundamental biological processes and plant disease susceptibility (Sierro et al. 2014). Moreover, tobacco can serve as a bioreactor system for manufacturing patient-specific vaccines (Arntzen 2008), and its large biomass, ease of transformation, and capacity to produce biodiesel make it a potential bioenergy crop (Andrianov et al. 2010; Fuchs et al. 2013; Vanhercke et al. 2014). To accelerate tobacco research, the genomes of four Nicotiana species, N. benthamiana (Bombarely et al. 2012), N. sylvestris, N. tomentosiformis (Sierro et al. 2013), and N. tabacum (Sierro et al. 2014), have been sequenced. N. sylvestris (2n = 24) and N. tomentosiformis (2n = 24) are considered the two ancestors of N. tabacum (Leitch et al. 2008; Sierro et al. 2013). Many tobacco databases useful for reverse genetics have been developed: the genome and transcriptome sequences are stored in the Sol Genomics Network (SGN), and the expressed sequence tags (ESTs) are stored at the National Center for Biotechnology Information (NCBI). One large-scale phenotyping study was also performed on RNA-interference transgenic tobacco plants to identify phenotypic changes (Lein et al. 2008). However, apart from one recently generated activation-tagging library of approximately 100,000 transgenic tobacco plants, collections of tobacco mutants, which are powerful tools for genetic research, have not been reported. In that library, 311 of 4105 T1 family lines displayed abnormal phenotypes in leaf and flower morphology, plant height, flowering time, branching, and fertility (Liu et al. 2015).

To compensate for the shortcomings of activation-tagging mutagenesis and take the first step toward linking phenotypes to genotypes in tobacco, we performed EMS mutagenesis on N. sylvestris, N. glutinosa, N. suaveolens, and the N. tabacum cultivars Zhongyan100 (ZY100), HonghuaDajinyuan (HD), and CuiBi1 (CB1) to build our mutant library. Here, we focus on the ZY100 mutant collections because of the high diversity and large proportion of its populations; seeds of a total of 94,997 ZY100 M2 families have been collected. To date, 1607 mutant family lines (carrying 2196 abnormal phenotypes) and 8610 mutant plants have been isolated from 5513 M2 family lines and 69,531 M2 plants, respectively. All mutants with visibly altered morphology were collected and classified into phenotypic categories. The pleiotropy, heritability, and saturability of the mutant populations are analyzed and discussed. This novel mutant library can serve as a crucial and fundamental resource for functional genomics and applied research in tobacco.

Materials and methods

Plant materials and mutagenic treatment

Tobacco seeds were obtained from the National Infrastructure for Crop Germplasm Resources (Tobacco, Qingdao). Five N. tabacum cultivars, ZY100, Yuyan97 (Y97), Yunyan87 (Y87), HD, and K326, were evaluated as candidate donor parents. EMS (Sigma, M0880-5G; CAS 62-50-0) was used as the chemical mutagen. The seeds were soaked in distilled water for 4 h and then treated with one of three EMS concentrations (0.4, 0.6, or 0.8%) for 16 h at room temperature with gentle shaking. After the EMS solution was removed, the treated seeds were washed three times with distilled water over 2 h with gentle shaking. The resulting M1 seeds were then sown in the greenhouse or in the field.

Procedure for isolating putative mutants

The detailed workflow for constructing mutagenized populations and isolating and characterizing putative mutants is shown in Fig. 1. The EMS-treated M1 seeds were sown in the greenhouse or in the field, and M2 seeds were harvested from each M1 plant separately. Next, 50–100 M2 seeds were sown in a pot, and 15 seedlings were randomly selected for transplantation as an M2 family line. The M2 plants were screened for alterations in visible morphological phenotypes throughout the field growth period. M3 seeds from some or all of the M2 plants in each line were harvested and bulked, and the M3 seeds of each individual putative mutant were also harvested separately. Images of individual mutants, their phenotypic categories, and all M3 seeds were registered in the tobacco mutant database (http://www.tobaccomdb.com/english/). Subsequently, each M3 mutant family line was grown as 30 plants to characterize and verify the putative mutant phenotypes. M3 homozygous mutants were crossed with wild-type cultivars to produce F1 and F2 populations for genetic analysis and genome mapping. If new mutant phenotypes were identified in an individual M3 or M4 mutant plant, the M4 or M5 seeds of that plant were added to the seed library and sown as a mutant family line to verify the new phenotypes. M4 and M5 seeds from confirmed homozygous mutants were also stored in the seed library.

Fig. 1

Flow chart detailing the steps for seed mutagenesis, construction of M2 populations, screening for mutants with visible abnormal phenotypes, genetic analysis of mutagenized populations, harvesting and storage of homozygous seeds, and registration in the mutant library. M0 seeds of ZY100 were mutagenized with EMS, and the resulting M1 seeds were sown to collect M2 seeds from single M1 plants. Then, 15 plants of each M2 family line were screened for mutants with visible morphological phenotypes throughout the field growth period. M3 seeds from some or all of the M2 plants in each line were bulked, and the M3 seeds of each putative mutant were also harvested. Each M3 mutant family line was sown to verify the mutant phenotypes. M4 seeds from confirmed homozygous mutants were entered into the database. M4 mutant family lines with new mutant phenotypes were sown to verify the new mutant phenotypes. Homozygous mutants were crossed with wild-type plants to produce F1 and F2 populations for genetic analysis and genome mapping. Individual mutant images, their phenotypic categories, and all seeds from the M2 to M5 generations were registered in the mutant library.

Plant phenotyping

Phenotyping was performed from seed germination to seed harvest at different developmental stages, including the nursery stage, rosette stage, vigorous growing stage, flower budding stage, flowering period, and fruiting period (Fig. S1). Large-scale but limited phenotyping was conducted for the M2 generation, in which plants with altered morphological phenotypes were regarded as ‘putative M2 mutants’. Mutant phenotypes were evaluated systematically in the M3 to M5 generations. Distinguishable phenotypes were recorded and photographed using a Canon EOS 5D Mark II digital camera.

Each putative mutant was assigned to a phenotypic classification comprising four groups, 17 major categories, and 51 subcategories (Table 1 and Table S1). These categories describe the morphological changes observed at all developmental stages (Table S1).

Table 1 Assignment of putative M2 mutants to phenotypic groups and major categories with the number and percentage (%) of phenotypes

Evaluation criterion of the heritability of mutant phenotypes

To test the reproducibility and heritability of the mutant phenotypes, six types of family lines were defined based on whether the original mutant phenotypes (OMPs) were reproduced and whether novel mutant phenotypes (NMPs) emerged. Taking an M3 family line as an example, the phenotypes of its parent mutant plant in the M2 generation were termed ‘OMPs’, and phenotypes in the M3 line that differed from the OMPs were termed ‘NMPs’. When all M3 plants displayed phenotypes indistinguishable from the OMPs, the line was recorded as a ‘stable line with OMPs’. When only some of the M3 plants showed the OMPs and no NMPs appeared, the line was recorded as a ‘segregating line with OMPs’. When some or all of the M3 plants reproduced the OMPs and NMPs also appeared, the line was recorded as a ‘segregating line with OMPs and NMPs’. When no M3 plant showed the OMPs and only some M3 plants showed NMPs, the line was recorded as a ‘segregating line with NMPs’. When all M3 plants exhibited NMPs and none showed the OMPs, the line was recorded as a ‘stable line with NMPs’. When all M3 plants were indistinguishable from the wild type, showing neither OMPs nor NMPs, the line was recorded as a ‘line without abnormal phenotypes’ (Table 2).

Table 2 Heritability of the mutant phenotypes in the M3 to M5 mutants of EMS-mutagenized populations
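The six heritability classes defined above are mutually exclusive, so they can be expressed as a small decision function. The sketch below is our illustration, not software used in the study; it assumes each line is summarized by three hypothetical counts: plants scored, plants showing OMPs, and plants showing NMPs.

```python
def classify_line(n_plants: int, n_omp: int, n_nmp: int) -> str:
    """Assign one family line to one of the six heritability classes.

    n_plants: plants scored in the line (e.g., 30 per M3 line)
    n_omp:    plants that reproduce the original mutant phenotypes (OMPs)
    n_nmp:    plants that show novel mutant phenotypes (NMPs)
    A plant may show both OMPs and NMPs, so n_omp + n_nmp may exceed n_plants.
    """
    if not (0 <= n_omp <= n_plants and 0 <= n_nmp <= n_plants):
        raise ValueError("plant counts must lie between 0 and n_plants")
    if n_omp > 0 and n_nmp > 0:
        # Some or all plants reproduce the OMPs while NMPs also appear
        return "segregating line with OMPs and NMPs"
    if n_omp > 0:
        return "stable line with OMPs" if n_omp == n_plants else "segregating line with OMPs"
    if n_nmp > 0:
        return "stable line with NMPs" if n_nmp == n_plants else "segregating line with NMPs"
    # Neither OMPs nor NMPs: indistinguishable from the wild type
    return "line without abnormal phenotypes"


print(classify_line(30, 7, 0))   # segregating line with OMPs
print(classify_line(30, 0, 30))  # stable line with NMPs
```

Checking the OMP-plus-NMP case first mirrors the criterion: any line in which both kinds of phenotype appear is segregating, regardless of how many plants reproduce the OMPs.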

Results

Selection of donor parents and EMS concentrations for treatment

Common tobacco is a large plant (more than 1.5 m tall after flowering) with a long life cycle (160–180 days from seed germination to leaf maturity) comprising several developmental stages (Fig. S1). A large number of cultivated tobacco varieties are available worldwide, and these genetic backgrounds are polymorphic. To optimize the mutagenesis conditions and identify suitable donor cultivars, seeds of five cultivars with different ecotypes were used to test the concentration and duration of EMS treatment (Table S2) (Wang et al. 2011). The donor cultivar ZY100 is widely cultivated in the northern tobacco-growing region of China, whereas Y97, Y87, and HD are cultivated in the southwest, and K326 is adapted to most growing regions. In preliminary tests, 16 h appeared to be an effective duration for EMS treatment (data not shown). Seven EMS concentrations (0.1, 0.3, 0.5, 0.7, 0.9, 1.1, and 1.3%) and a water control without EMS were each applied for 16 h. The lethal dose (LD) for all five varieties was 1.3%, and the appropriate EMS concentrations for treating tobacco seeds ranged from 0.35 to 0.52%. Sensitivity to EMS differed significantly among the varieties and decreased in the order Y97/Y87 > HD > ZY100 > K326. Ultimately, 0.4% (LD30, the dose causing a 30% reduction in seed germination), 0.6% (LD50), and 0.8% (LD65) were selected as the treatment concentrations, and HD and ZY100 were chosen as the donor parents for EMS mutagenesis. The rate of observable phenotypic changes in HD (19.26% of 862 M2 lines) was far lower than that in ZY100 (39.83% of 5513 M2 lines); therefore, mutants from ZY100 were prioritized for phenotypic analysis.
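LD values such as LD30 are defined here as the EMS concentration causing a given percentage reduction in germination relative to the water control. The text does not state how they were derived from the concentration gradient; one simple approach, sketched below as an assumption, is linear interpolation between tested doses (a probit or logistic fit would be the more formal alternative). The germination percentages in the example are invented for illustration and are not the Table S2 data.

```python
def germination_reduction(germ_treated: float, germ_control: float) -> float:
    """Percent reduction in germination relative to the water (no-EMS) control."""
    return 100.0 * (1.0 - germ_treated / germ_control)


def dose_for_reduction(doses: list[float], reductions: list[float], target: float) -> float:
    """Linearly interpolate the EMS concentration expected to give `target` % reduction.

    Assumes doses are sorted in ascending order and reductions rise with dose.
    """
    points = list(zip(doses, reductions))
    for (c0, r0), (c1, r1) in zip(points, points[1:]):
        if r0 <= target <= r1:
            if r1 == r0:
                return c0
            return c0 + (target - r0) * (c1 - c0) / (r1 - r0)
    raise ValueError("target reduction lies outside the tested range")


# HYPOTHETICAL germination percentages for illustration only (not Table S2 data)
doses = [0.1, 0.3, 0.5, 0.7, 0.9]            # % EMS, a subset of the tested gradient
germinated = [90.0, 76.0, 52.0, 38.0, 25.0]  # % germination at each dose
control = 95.0                               # % germination of the water control

reductions = [germination_reduction(g, control) for g in germinated]
for label, target in [("LD30", 30), ("LD50", 50)]:
    print(label, round(dose_for_reduction(doses, reductions, target), 3))
```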

Phenotypic classification of putative mutants

The EMS mutant libraries were constructed from 2008 to 2015, a period comprising a 3-year preliminary experiment and a 5-year large-scale screening and verification experiment. In total, seeds of 94,997 M2 families were collected. Of the 101,512 mutagenized M1 plants planted in the field, 84,531 survived, a survival rate of 83.27% (Table S3). The strategy for constructing mutagenized populations and isolating and characterizing putative mutants is summarized in Fig. 1.

From 2012 to 2013, a total of 5513 M2 family lines (69,531 M2 plants) were sown and screened (Table 1, Table S1); the sterility rate was 2.85% for family lines and 0.48% for plants. The screen for visibly altered morphological phenotypes yielded 1607 mutant family lines and 8610 mutant plants, a mutation frequency of 29.15% at the family-line level. The 1607 mutant lines carried 2196 abnormal phenotypes (Table 1, Table S1), which were classified into four groups, 17 major categories, and 51 subcategories, with one to six subcategories per major category. The major categories and subcategories are defined in detail in Table S1.
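The frequencies reported in this section are simple ratios of the counts given in the text. The snippet below reproduces them, which also makes explicit that the 29.15% figure counts mutant lines per M2 line, whereas the 39.83% figure used for ZY100 counts abnormal phenotypes per M2 line. The helper name `pct` is our own.

```python
def pct(part: int, whole: int) -> float:
    """Percentage rounded to two decimals, the precision used in the text."""
    return round(100.0 * part / whole, 2)


m2_lines = 5513               # M2 family lines sown and screened, 2012-2013
mutant_lines = 1607           # lines with at least one visible abnormal phenotype
abnormal_phenotypes = 2196    # abnormal phenotypes recorded in those lines
m1_planted, m1_survived = 101_512, 84_531

line_frequency = pct(mutant_lines, m2_lines)              # line-level mutation frequency
phenotype_frequency = pct(abnormal_phenotypes, m2_lines)  # phenotypes per 100 M2 lines
m1_survival = pct(m1_survived, m1_planted)                # M1 field survival rate
print(line_frequency, phenotype_frequency, m1_survival)   # 29.15 39.83 83.27
```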

To evaluate how the mutant phenotypes were distributed across the tissues and organs of the tobacco plant, the number and proportion of phenotypes in each group, major category, and subcategory were summarized. The most abundant group was leaf morphology (38.89% of the abnormal phenotypes), followed by plant habit (25.68%), flower (19.44%), and leaf color and senescence (15.98%), with mutation rates (the percentage of putative M2 mutant lines in all M2 family lines) of 15.49, 10.23, 7.75, and 6.37%, respectively (Fig. S2; Table 1). The five most abundant major categories, plant height, leaf shape, leaf surface, leaf color, and flowering time, accounted for more than 60% of the abnormal phenotypes observed. Leaf shape and leaf surface were the two most prominent categories in the leaf morphology group (16.53 and 15.03% of the abnormal phenotypes), with mutation rates of 6.58 and 5.99%, respectively. In the plant habit group, plant height was the most prominent category (14.71%), with a mutation rate of 5.86%. Leaf color and flowering time were the most abundant categories in the leaf color and senescence group and the flower group, with mutation rates of 5.21 and 3.50%, respectively (Fig. 2; Table 1, Table S1).
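Two different percentages appear in this paragraph: a group's share of the 2196 abnormal phenotypes and its mutation rate relative to the 5513 M2 lines. Assuming, as the Table 1 caption indicates, that the shares are percentages of phenotypes, the two are linked through the phenotype count per group. The sketch below back-calculates approximate counts from the rounded shares (these counts are our reconstruction, not figures reported in the text) and recovers the reported mutation rates.

```python
TOTAL_PHENOTYPES = 2196  # abnormal phenotypes in the 1607 M2 mutant lines
M2_LINES = 5513          # M2 family lines screened

# Share (%) of all abnormal phenotypes in each group, as reported in the text
GROUP_SHARE = {
    "leaf morphology": 38.89,
    "plant habit": 25.68,
    "flower": 19.44,
    "leaf color and senescence": 15.98,
}


def group_rates(shares: dict[str, float]) -> dict[str, tuple[int, float]]:
    """Back-calculate an approximate phenotype count per group from its share,
    then express that count as a percentage of all screened M2 lines."""
    result = {}
    for group, share in shares.items():
        count = round(share / 100 * TOTAL_PHENOTYPES)
        result[group] = (count, round(100 * count / M2_LINES, 2))
    return result


for group, (count, rate) in group_rates(GROUP_SHARE).items():
    print(f"{group}: ~{count} phenotypes, mutation rate {rate}%")
```

The back-calculated counts sum to 2196, which supports reading the group shares as percentages of phenotypes rather than of lines.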

Fig. 2

Classification of visible mutant phenotypes into 17 major categories for M2 mutant family lines

Rough and rugose leaf, dark green leaf, dwarf plant, high plant, and narrow leaf were the five most abundant subcategories and together accounted for more than 40% of all abnormal phenotypes. Rough and rugose leaf was the largest subcategory, with a mutation rate of 4.84% (Fig. 3e-1, e-2; Table S1). Mutations in this subcategory take several forms and are usually accompanied by mutations in other categories, such as leaf shape (Fig. 3d4-1, d5), leaf color (Fig. 3f3), and leaf vein (Fig. S3e1-2, e2). Many other subcategories also included multiple types of mutant phenotypes. For example, mutants with spirally twisted leaves included types with all leaves spiraling clockwise (Fig. 3d4-1), all leaves spiraling anticlockwise (Fig. 3d4-2), and some leaves spiraling clockwise with others spiraling anticlockwise (Table S1). White (albino) phenotypes appeared as spots (Fig. S3i1-1), small sheets, large sheets (Fig. 3f1), and mixed spots and sheets (Fig. S3i1-2). Mutants with yellow (etiolated) leaves also comprised several types, such as entirely yellow leaves (Fig. S3i2-1, i2-3) and mostly yellow leaves (Fig. 3f2, Fig. S3i2-2).

Fig. 3

Representative mutants with high mutation rates from several phenotypic categories and subcategories. a1, a2 Plant height mutants: a1 Dwarf plant with lanceolate leaves. a2 Tall plant with long internodes. b Mutant with short internodes. c Mutant with more lateral branches. d1–d5 Leaf shape mutants: d1 Lanceolate leaves with dwarf plant and early flowering. d2 and d3 Narrow and large, wide leaves, respectively. d4-1 and d4-2 Clockwise and anticlockwise spirally twisted leaves, respectively. d5 Deformed, dark green, rugose, and sharply curled leaves. e-1, e-2 Mutants with rough and rugose leaves that are also lanceolate and dark green. f1–f3 Leaf color mutants: f1 Albino leaves with large white sheets. f2 Etiolated leaves with half of the leaf yellow. f3 Dark green leaves with revolute margins. g1, g2 Flowering time mutants: g1 Early flowering with fewer leaves. g2 Late flowering. Bars 10 cm

Pleiotropy between mutants and phenotypes

Because some mutants present multiple phenotypes, the number of abnormal phenotypes exceeded the number of mutant lines, a phenomenon we refer to as ‘pleiotropy’. The 1607 mutant family lines and 2196 abnormal phenotypes screened from the M2 populations correspond to mutation frequencies of 29.15% at the family-line level and 39.83% at the phenotype level (Table 1, Table S1). Of the 1607 M2 mutant lines, 1142 showed a single phenotypic category and the remainder showed multiple categories: 357 lines showed two categories, 87 showed three, 16 showed four, three showed five, and two showed six to seven. In summary, pleiotropic mutant lines accounted for 28.94% of all mutant family lines, and non-pleiotropic lines accounted for the remainder (Fig. 4a).
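The line-level pleiotropy rate follows directly from the distribution of lines by number of phenotypic categories. The snippet checks that the reported counts sum to 1607 lines and that lines with two or more categories make up 28.94% of them; the dictionary layout is our own.

```python
# M2 mutant lines by number of phenotypic categories per line, as reported.
# The two lines with six or seven categories are grouped together in the text.
LINES_BY_CATEGORY_COUNT = {"1": 1142, "2": 357, "3": 87, "4": 16, "5": 3, "6-7": 2}

total_lines = sum(LINES_BY_CATEGORY_COUNT.values())
non_pleiotropic = LINES_BY_CATEGORY_COUNT["1"]
pleiotropic = total_lines - non_pleiotropic
pleiotropy_rate = round(100 * pleiotropic / total_lines, 2)

print(total_lines, pleiotropic, pleiotropy_rate)  # 1607 465 28.94
```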

Fig. 4

Rate of pleiotropy in mutant populations and distribution of pleiotropic and non-pleiotropic mutants in the major phenotypic categories. a Rate of pleiotropy in M2 families of EMS-mutagenized populations. The bars for one to seven phenotypic categories indicate the number of mutants showing a single phenotypic category (non-pleiotropic mutants) or multiple (two to seven) phenotypic categories (pleiotropic mutants). b Number of mutants with pleiotropic and non-pleiotropic phenotypes in the major categories. Each bar shows the number of mutants in the relevant category. Pleiotropic mutants have more than one phenotype, and non-pleiotropic mutants have one.

Of the 2196 abnormal phenotypes, 1054 (48%) were pleiotropic (Fig. 4b; Table S4). The rates of pleiotropy in the 17 major categories ranged from 17.19% (leaf senescence) to 78.57% (leaf number), with most categories near 50%. In the top five major categories, the rates of pleiotropy ranged from 45.82 to 51.79%, and together these categories accounted for 68.88% of the pleiotropic phenotypes. The category with the fewest pleiotropic phenotypes was stem color, with only one mutant line (0.09%). Leaf shape contained the largest number of pleiotropic phenotypes, with 188 mutant lines (17.84%), followed by leaf surface (165; 15.65%), plant height (148; 14.04%), and leaf color (135; 12.81%).

Reproducibility and heritability of the mutant phenotypes

To test the reproducibility and heritability of the mutant phenotypes, we randomly selected and examined the family lines with mutant phenotypes in the M3, M4, and M5 plants. Six types of family lines were defined as the evaluation criterion of the heritability of mutant phenotypes based on the reproducibility of OMPs and the emergence of NMPs (Table 2).

The results showed that, as the generations advanced, the proportion of lines inheriting OMPs increased and the proportion of lines with emerging NMPs decreased. For each of the 143 M3 family lines, 30 selfed seeds were sown to evaluate the reproducibility and heritability of the OMPs of the M2 putative mutants. In 101 (70.63%) M3 lines, the OMPs of the corresponding M2 mutant were reproduced; these comprised 31 (21.68%) stable lines with OMPs, 51 (35.66%) segregating lines with OMPs, and 19 (13.29%) segregating lines with OMPs and NMPs. Eighty-two (57.34%) lines transmitted the OMPs without NMPs and showed stable inheritance. However, 41 (28.67%) lines produced NMPs, including 19 (13.29%) segregating lines with OMPs and NMPs, 16 (11.19%) segregating lines with NMPs, and six (4.20%) stable lines with NMPs. Twenty (13.99%) lines showed none of the recorded phenotypes. Next, 152 M4 family lines derived from a subset of the M3 mutant plants were selected and planted. Of these, 129 (84.87%) transmitted the M3 OMPs, comprising 44 (28.95%) stable lines with OMPs, 46 (30.26%) segregating lines with OMPs, and 39 (25.66%) segregating lines with OMPs and NMPs. Ninety (59.21%) lines showed stable inheritance of OMPs without NMPs. Twelve (7.89%) lines showed the same morphology as the wild-type donor. NMPs appeared in 50 (32.89%) M4 lines: 39 (25.66%) segregating lines with OMPs and NMPs, five (3.29%) segregating lines with NMPs, and six (3.95%) stable lines with NMPs. Of 47 M5 family lines, 45 (95.75%) fully reproduced the OMPs of their corresponding M4 plants, comprising 40 (85.11%) stable lines with OMPs and five (10.64%) segregating lines with OMPs. Of the two remaining lines (2.13% each), one showed NMPs and the other showed no mutant phenotype.
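Each summary percentage in this paragraph is a sum over the six line classes of Table 2 divided by the number of lines tested. The sketch below recomputes the M3 and M4 figures from the class counts given in the text; the short class keys and helper names are our own labels.

```python
def pct(part: int, whole: int) -> float:
    """Percentage rounded to two decimals, the precision used in the text."""
    return round(100.0 * part / whole, 2)


def summarize(lines: dict[str, int]) -> dict[str, float]:
    """Aggregate the six line classes into the summary percentages used in the text."""
    total = sum(lines.values())
    omp_without_nmp = lines["stable OMPs"] + lines["segregating OMPs"]
    with_nmp = lines["segregating OMPs+NMPs"] + lines["segregating NMPs"] + lines["stable NMPs"]
    return {
        "reproduced OMPs": pct(omp_without_nmp + lines["segregating OMPs+NMPs"], total),
        "OMPs without NMPs": pct(omp_without_nmp, total),
        "produced NMPs": pct(with_nmp, total),
        "no abnormal phenotype": pct(lines["none"], total),
    }


M3 = {"stable OMPs": 31, "segregating OMPs": 51, "segregating OMPs+NMPs": 19,
      "segregating NMPs": 16, "stable NMPs": 6, "none": 20}
M4 = {"stable OMPs": 44, "segregating OMPs": 46, "segregating OMPs+NMPs": 39,
      "segregating NMPs": 5, "stable NMPs": 6, "none": 12}

print(summarize(M3))
print(summarize(M4))
```

The recomputed values (70.63, 57.34, 28.67, and 13.99% for M3; 84.87, 59.21, 32.89, and 7.89% for M4) match those stated in the text.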

Determination of the inheritance patterns of stably heritable mutant phenotypes

To determine the inheritance patterns of stably heritable mutants, 18 individual plants from homozygous mutant lines in the M3 and M4 generations were crossed with wild-type or mutant plants, and segregation in the F1, F2, and BC1F1 progenies was examined (Table 3). This analysis revealed that two mutants were double recessive and 16 mutants were monogenic. Two white stem mutants (white stem 1 [ws1] and white stem 2) showed digenic recessive inheritance, with segregation ratios of 15:1 (WT:others) in the F2 generation and 3:1 in the BC1F1 generation produced by test-crossing F1 plants with the mutants. The ws1 mutant phenotype has been confirmed to be controlled by two recessive nuclear genes, ws1a and ws1b, which were preliminarily mapped using BC1F2 populations. An allelism test revealed that the burley character of burley tobacco is controlled by the same two genes (Wu et al. 2014).

Table 3 Inheritance patterns and phenotypic segregations of stably heritable mutants based on the genetic analysis of the progenies of F1, F2, and BC1F1

Five mutants appeared to carry monogenic recessive mutations, namely spirally twisted and rough leaf 1, revolute margins 1, white (albino) leaf 1, and two white flower mutants (white flower 1 and 2); all showed segregation ratios of 3:1 in the F2 generation and 1:1 in the BC1F1 generation produced by test-crossing F1 plants with the mutants. Eight mutants showed monogenic dominant inheritance, including four plant height mutants (high plant 1–4) and four leaf petiole mutants (short petiole 1–2 and extremely long petiole 1–2). Monogenic semi-dominant inheritance was validated in three mutants: wide and large leaf 1 and deep red flower 1 and 2. In both the monogenic dominant and semi-dominant mutants, the segregation ratios were 1:3 (WT:others) in the F2 generation and 1:1 in the BC1F1 generation produced by crossing F1 plants with the wild type.
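Segregation ratios such as 15:1, 3:1, and 1:1 are conventionally tested with a chi-square goodness-of-fit test. The text does not state which test was applied, so the following is a hedged sketch of the standard Pearson chi-square test in plain Python; the F2 counts in the example are invented for illustration and are not taken from Table 3. For real data, `scipy.stats.chisquare` computes the same statistic together with a p-value.

```python
CHI2_CRITICAL_DF1 = 3.841  # chi-square critical value for df = 1 at alpha = 0.05


def chi_square(observed: list[int], ratio: list[int]) -> float:
    """Pearson goodness-of-fit statistic for observed class counts
    against an expected Mendelian ratio (e.g., [15, 1] or [3, 1])."""
    if len(observed) != len(ratio):
        raise ValueError("observed counts and ratio must have the same length")
    total, parts = sum(observed), sum(ratio)
    statistic = 0.0
    for obs, r in zip(observed, ratio):
        expected = total * r / parts
        statistic += (obs - expected) ** 2 / expected
    return statistic


def fits_two_class_ratio(observed: list[int], ratio: list[int]) -> bool:
    """Accept the ratio if chi-square is below the df = 1 critical value.
    Only valid for two phenotypic classes (df = classes - 1 = 1)."""
    if len(observed) != 2:
        raise ValueError("this helper handles two-class segregation only")
    return chi_square(observed, ratio) < CHI2_CRITICAL_DF1


# HYPOTHETICAL F2 counts (wild type : white stem), not data from Table 3
f2 = [290, 22]
print(round(chi_square(f2, [15, 1]), 3), fits_two_class_ratio(f2, [15, 1]))  # 0.342 True
```

If intermediate plants of a semi-dominant mutant are scored as a separate class (1:2:1), the same statistic applies but with df = 2 (critical value 5.991 at alpha = 0.05).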

Discussion

Optimization of EMS mutagenesis on multiple donor cultivars

It is important to assess the parameters that determine how efficiently mutant collections are generated. In previous studies, EMS treatments of seeds have used multiple concentrations and incubation times of 12, 16, 17, 18, 24, or 48 h, depending on the species, e.g., Arabidopsis (Berná et al. 1999; Till et al. 2003, 2007), rice (Wu et al. 2005), barley (Caldwell et al. 2004), sorghum (Xin et al. 2008), and tomato (Menda et al. 2004; Saito et al. 2011). Saito et al. (2011) also suggested that incubation duration is not a critical parameter. Therefore, the commonly used incubation time of 16 h was selected after a preliminary test. A concentration gradient was used to examine the effects of EMS concentration on tobacco seeds (Table S2), and LD50 and LD65 were selected to obtain a relatively high mutation efficiency. Of the two donor cultivars, which represent different ecotypes, ZY100 is cultivated in the northern growing region and HD in the southwest. In 2012, CB1, a cultivar grown over a large area in the southeastern growing region, was also mutagenized. Although the rates of visible morphological mutation in HD and CB1 were far lower than that in ZY100, mutagenesis across multiple ecotypes can further increase the number of polymorphisms in the mutant populations.

Pleiotropy: compound mutant phenotypes based on one gene, multiple genes, or the causality of different phenotypes

Pleiotropy was common among the tobacco mutants, with a single line often listed under two or more phenotypic categories. This phenomenon occurs extensively in almost all EMS-mutagenized mutant libraries. In this study, we regarded these cases as compound mutant phenotypes, which took three forms. The first form was true pleiotropy, in which two or more abnormal phenotypes always appeared together in the same mutant plant, likely indicating that they were controlled by one gene. For example, rough leaf and dark green leaf were always present simultaneously, as were spirally twisted leaf and dwarf plant. The second form was a compound mutant phenotype arising from the random distribution of multiple phenotypes within one mutant line. Most pleiotropic events were of this type, which also accounts for the appearance of NMPs in advanced generations such as M3 and M4. The third form was one phenotype causing another. For example, disordered veins produced a rugose leaf surface, short internodes could cause dwarfism, and many early flowering mutants had fewer leaves. The actual causes and mechanisms underlying these events remain to be elucidated through multifaceted genetic analyses.

Genetic analysis of the heritability and inheritance patterns of the mutant phenotypes

The reproducibility and heritability of the mutant phenotypes were confirmed by visual phenotyping of the M3, M4, and M5 mutant plants. We found that 70.63% of the M3 lines, 84.87% of the M4 lines, and 95.75% of the M5 lines reproduced the OMPs of the corresponding M2, M3, and M4 mutant plants. Therefore, selfing in advanced generations may be a highly efficient way to achieve rapid homozygosity and mutant stabilization. Moreover, NMPs continued to emerge in the M3 and M4 generations at high frequencies (28.67 and 32.89% of lines, respectively), indicating that M3 and M4 are important generations for screening phenotypic mutants. Three factors may explain the appearance of NMPs: NMPs may be inherited recessively, OMPs may mask NMPs, and OMPs and NMPs may interact, for example through epistasis. The emergence of NMPs suggests that the proportion of pleiotropic mutants may be underestimated in the M2 generation.
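Two textbook Mendelian expectations help explain these observations. Under an idealized single-locus model (no selection, no M1 chimerism, no linkage; these are assumptions of this sketch, not of the study), heterozygosity halves with each round of selfing, and a recessive mutation carried by a heterozygous parent appears in at least one of its n selfed progeny with probability 1 - (3/4)^n. The function names are our own.

```python
def heterozygous_fraction(rounds_of_selfing: int) -> float:
    """Expected fraction of progeny still heterozygous at a locus after n rounds
    of selfing, starting from a plant heterozygous at that locus (halves each round)."""
    return 0.5 ** rounds_of_selfing


def p_recessive_revealed(n_progeny: int) -> float:
    """Probability that at least one of n selfed progeny of a heterozygote is
    homozygous recessive (each progeny independently has a 1/4 chance)."""
    return 1.0 - 0.75 ** n_progeny


for n in range(1, 4):
    print(n, heterozygous_fraction(n))       # 0.5, then 0.25, then 0.125
print(round(p_recessive_revealed(30), 4))    # 0.9998 for a 30-plant line
```

Under this model, a recessive mutation hidden in a heterozygous M2 plant is almost certain to surface as an NMP in a 30-plant M3 line, which is consistent with the high NMP frequencies observed in M3 and M4.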

Inheritance patterns of 18 mutants were determined from their F1, F2, and BC1F1 progenies. Seven mutants showed monogenic or digenic recessive inheritance, eight showed monogenic dominant inheritance, and three showed semi-dominant inheritance. These data confirmed that the phenotypes of the 18 mutants were reproducible in subsequent progenies and stable enough for further functional analyses. However, the proportions of the inheritance patterns among these 18 mutants do not necessarily represent those of the whole population.

Preliminary evaluation of the saturability of the mutant populations

The population size required for saturation mutagenesis in tobacco was estimated by reference to the genome sizes and saturation estimates of other plants. Tomato, which is closely related to tobacco, has a genome of approximately 900 Mb; a population of 13,000 M2 families was considered to represent a saturated collection (Menda et al. 2004), and a later population of 15,020 M2 mutagenized lines was considered nearly saturated (Saito et al. 2011). Barley, a Triticeae crop, has a genome of approximately 5 Gb with >80% repetitive sequences, features similar to those of the tobacco genome. Caldwell et al. (2004) inferred that a saturated EMS mutant population in barley would require more than 100,000 plants for comprehensive coverage, because the mutation rate of EMS mutagenesis appears to be relatively independent of genome size. Tobacco (N. tabacum) has a large genome of 4.5 Gb with over 70% repetitive sequences. By these estimates, a population of 100,000 M2 mutant lines would be considered saturated with mutations, although the presence of multiple alleles per locus has not been determined. Therefore, allelism tests and other functional genetic approaches will be used to evaluate the saturation of the ZY100 EMS-mutagenized populations.
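The saturation argument can be framed with a simple Poisson coverage model: if EMS mutations fall uniformly at a fixed density per line, the expected number of independent hits in a gene-sized target grows with population size rather than genome size, consistent with the reasoning attributed to Caldwell et al. (2004) above. The sketch below is our illustration, not the authors' method; the mutation density and target size are hypothetical placeholders, not measurements for tobacco, and the model ignores repeats, homeologous gene copies in the allotetraploid genome, and lethality. The `k` parameter addresses the multiple-alleles question.

```python
import math


def p_at_least_k_hits(n_lines: int, target_bp: float, bp_per_mutation: float, k: int = 1) -> float:
    """Poisson probability that a target region carries >= k independent mutations
    across n_lines, assuming mutations fall uniformly at one per bp_per_mutation bp."""
    lam = n_lines * target_bp / bp_per_mutation  # expected number of hits in the population
    p_fewer = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - p_fewer


# HYPOTHETICAL parameters: a 1-kb target and one mutation per 1 Mb per line
for n in (1_000, 5_000, 13_000):
    p1 = p_at_least_k_hits(n, 1_000, 1_000_000)       # at least one allele
    p2 = p_at_least_k_hits(n, 1_000, 1_000_000, k=2)  # at least two alleles
    print(n, round(p1, 4), round(p2, 4))
```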

Improvement of the mutant library

EMS mutagenesis is an efficient way to identify morphological mutant phenotypes in tobacco, owing to its high efficiency and unbiased mutagenic distribution. All the data and images can be searched online (http://www.tobaccomdb.com/english/). Combined with the whole-genome sequence of tobacco, these mutant resources provide preliminary data for associating the genome with the phenome. However, collecting and storing phenotypic mutants is not in itself sufficient for identifying and characterizing gene functions, and the mutant library must be further improved to link specific phenotypes with the associated genomic sequence information.

From phenotype to phenome

Phenotyping in classical genetics relies largely on large-scale screening of mutant collections, using visual observation and scoring systems or simple instrumental measurements to characterize functional plant traits. However, manual evaluation of tobacco mutant phenotypes is subjective, inefficient, and error prone. Additionally, visible phenotypes are only part of the total phenome. Moving from phenotype to phenome requires multifunctional, high-throughput, and high-resolution phenotyping platforms; the lack of such platforms has become a new bottleneck in plant biology, functional genomics, and crop breeding, fields that still rely on traditional methods of phenotyping large mutant collections (Finkel 2009; Furbank and Tester 2011; Yang et al. 2013). Certain shortcomings still limit the wider adoption of high-throughput phenomic platforms, such as their high cost, the need for sophisticated data analysis infrastructure, and the difficulty of high-throughput phenotyping in the field (Yang et al. 2013; Grosskinsky et al. 2015). Nevertheless, phenomics based on high-throughput phenotyping has been used to assess abiotic stress tolerance in Arabidopsis (Jansen et al. 2009; Berger et al. 2010), cereals (Sirault et al. 2009; Munns et al. 2010), tobacco, and cotton (Furbank and Tester 2011).
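To illustrate how image-based phenotyping replaces a subjective visual score with a reproducible measurement, the following minimal sketch estimates projected leaf area from an RGB image using the excess green (ExG) colour index, a commonly used way to separate green plant tissue from background. It is not part of the tobacco mutant pipeline described here; the threshold and the pixel-to-area calibration are hypothetical values, and the image is represented as plain nested lists so the example needs no imaging library.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B) values in 0-255


def excess_green(pixel: Pixel) -> float:
    """Excess green index ExG = 2g - r - b on chromatic (sum-normalized) coordinates."""
    r, g, b = pixel
    total = r + g + b
    if total == 0:
        return 0.0  # pure black pixels are treated as background
    return (2 * g - r - b) / total


def projected_leaf_area(image: List[List[Pixel]], mm2_per_pixel: float,
                        threshold: float = 0.1) -> float:
    """Classify pixels with ExG above `threshold` as plant and convert the count to mm^2.

    `threshold` and `mm2_per_pixel` are hypothetical calibration values that
    would have to be set for a real camera and lighting setup.
    """
    plant_pixels = sum(1 for row in image for px in row if excess_green(px) > threshold)
    return plant_pixels * mm2_per_pixel


if __name__ == "__main__":
    tiny_image = [
        [(30, 120, 40), (200, 200, 200)],  # leaf pixel, grey background
        [(100, 90, 80), (20, 150, 30)],    # soil-like pixel, leaf pixel
    ]
    print(projected_leaf_area(tiny_image, mm2_per_pixel=0.25))
```

In a real platform the same per-image computation would run over thousands of images, which is where the data analysis infrastructure mentioned above becomes the limiting factor.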

Genetic technologies in genome-wide phenomics

Functional genomics investigations of the mutant library were performed using forward and reverse genetic approaches. Forward genetic approaches mainly depend on positional cloning to map and isolate the gene responsible for a mutant phenotype. Large numbers of genetic markers for map-based cloning have been developed, such as microsatellite (simple sequence repeat, SSR) markers (Bindler et al. 2007, 2011; Tong et al. 2012b) and amplified fragment length polymorphisms (AFLPs) (Marché et al. 2001; Dadras et al. 2014). SSR markers have been widely used for gene mapping (Wu et al. 2014), linkage disequilibrium analyses (Fricano et al. 2012), quantitative trait locus (QTL) mapping (Tong et al. 2012a), and evolutionary analyses (Wu et al. 2010). However, because of the low genetic diversity among tobacco cultivars, the density of available SSR markers is ultimately insufficient for positional cloning. Single nucleotide polymorphism (SNP) markers may therefore be the most appropriate markers for mapping and should be exploited to construct genetic linkage maps (Xiao et al. 2015). The availability of the tobacco genome sequence also makes mapping-by-sequencing approaches feasible in this crop, which could considerably accelerate forward genetic screens and the identification of causal mutations.
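The core of many mapping-by-sequencing approaches is an allele-frequency scan: in a bulk of segregating plants that all show a recessive mutant phenotype, the causal mutation and tightly linked EMS-induced SNPs approach an allele frequency of 1, whereas unlinked mutations stay near 0.5. The sketch below shows that idea under stated assumptions; it is not an existing pipeline, the variant tuples are hypothetical, the windows are simple non-overlapping bins, and it ignores complications such as mismapping of reads between homeologous copies in allotetraploid N. tabacum.

```python
from typing import Dict, List, Optional, Tuple

# (position in bp, reference base, alternative base, alt read count, total read depth)
Variant = Tuple[int, str, str, int, int]

# EMS predominantly induces G/C-to-A/T transitions, seen as G>A or C>T on the reference strand.
EMS_TRANSITIONS = {("G", "A"), ("C", "T")}


def ems_allele_frequencies(variants: List[Variant], min_depth: int = 10) -> List[Tuple[int, float]]:
    """Keep adequately covered EMS-type SNPs and return (position, alt allele frequency)."""
    return [
        (pos, alt_count / depth)
        for pos, ref, alt, alt_count, depth in variants
        if (ref, alt) in EMS_TRANSITIONS and depth >= min_depth
    ]


def peak_window(freqs: List[Tuple[int, float]], window_bp: int) -> Optional[Tuple[int, float]]:
    """Return (window start, mean allele frequency) for the highest-scoring window."""
    if not freqs:
        return None
    windows: Dict[int, List[float]] = {}
    for pos, af in freqs:
        windows.setdefault(pos // window_bp, []).append(af)  # non-overlapping bins
    best_bin, best_afs = max(windows.items(), key=lambda item: sum(item[1]) / len(item[1]))
    return best_bin * window_bp, sum(best_afs) / len(best_afs)
```

Filtering for EMS-type transitions before the scan reduces noise from background polymorphisms between the mapping parents; in practice the candidate interval returned by such a scan would still be narrowed by examining which variants alter coding sequences.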

TILLING (Targeting Induced Local Lesions IN Genomes) is a reverse genetic approach that was developed to identify mutations in EMS-mutagenized populations. Through multiple methodological improvements, this strategy has been successfully applied in many plants (Gilchrist and Haughn 2005; Minoia et al. 2010; Tsai et al. 2011, 2013). Building a high-quality tobacco mutant collection also required the support of a systematic TILLING platform. Therefore, pooled DNA samples from the mutant populations were used to develop a high-efficiency TILLING platform and to screen for large numbers of mutants in the LS, RAX, ANS, and QPT genes. In the future, this platform will facilitate functional genomics applications in tobacco.
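DNA pooling reduces the number of assays needed to screen a large population, but a positive pool must then be traced back to an individual line. The text states only that mixed DNA pools were used; the two-dimensional row/column layout below is an assumption chosen for illustration, the per-pool mutation detection step (for example, enzymatic mismatch cleavage or sequencing) is not modeled, and the sample IDs are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def build_pools(sample_ids: List[str], n_cols: int) -> Tuple[Dict[int, List[str]], Dict[int, List[str]], Dict[str, Tuple[int, int]]]:
    """Lay samples out row by row on a grid; each row and each column becomes one DNA pool."""
    rows: Dict[int, List[str]] = {}
    cols: Dict[int, List[str]] = {}
    coords: Dict[str, Tuple[int, int]] = {}
    for i, sid in enumerate(sample_ids):
        r, c = divmod(i, n_cols)
        rows.setdefault(r, []).append(sid)
        cols.setdefault(c, []).append(sid)
        coords[sid] = (r, c)
    return rows, cols, coords


def deconvolve(pos_rows: Set[int], pos_cols: Set[int], coords: Dict[str, Tuple[int, int]]) -> List[str]:
    """Candidate mutants sit at intersections of positive row and column pools."""
    return sorted(sid for sid, (r, c) in coords.items() if r in pos_rows and c in pos_cols)


if __name__ == "__main__":
    samples = [f"M{i:04d}" for i in range(1, 13)]  # hypothetical M2 line IDs M0001..M0012
    rows, cols, coords = build_pools(samples, n_cols=4)
    print(len(rows) + len(cols), "pools instead of", len(samples), "individual assays")
    print(deconvolve({1}, {2}, coords))
```

Here 12 lines are covered by 7 pools, and a single positive row and column identify line M0007. When two mutants fall in different rows and columns, however, two positive rows and two positive columns yield four candidates of which only two are real, which is why candidates are usually confirmed by sequencing or resolved with higher-dimensional pooling.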

Author contribution statement

LG and WY conceived and designed the research. WD, WS, CJ, WX, SY, LF, LJ, GX, and LG performed technical work for investigating mutants. WD, WS, CJ, WX, and LG analyzed and summarized the data. WD wrote the manuscript. All authors read and approved the manuscript.