Introduction

Cotton (Gossypium spp.) is one of the major sources of natural textile fiber, livestock feed and human foodstuff. Commercial cotton production comes mainly from two allotetraploid species, G. hirsutum (90%) and G. barbadense (8%), with the remaining 2% from two diploid species, G. herbaceum and G. arboreum (Kumar et al. 2021). The diploids are known for long maturity time, poor yield and poor fiber quality, but have a natural ability to withstand biotic and abiotic stresses (Kulkarni et al. 2009; Wang et al. 2012). Despite their better adaptability to harsh conditions, farmers do not prefer to grow diploid cotton because its poor productivity and fiber quality fetch a low price. Moreover, only limited efforts have been made to exploit its genetic potential for improvement (Parekh et al. 2018). At present, the demand for diploid cotton, mainly G. herbaceum, is increasing for use as surgical cotton and in technical textiles, which calls for improvement of its genetic potential towards higher lint production. Satisfactory genetic variability in the gene pool provides a platform to devise suitable breeding strategies for genetic improvement; however, limited diversity has been reported in G. herbaceum (Jena et al. 2012; Parekh et al. 2016). Thus, alternative strategies need to be deployed to increase the genetic variation within the germplasm for further exploitation of the genetic potential of this crop. Previously, gene expression studies carried out under mannitol-imposed drought stress using the G. herbaceum cultivar “WAGAD” identified several novel genes (Ranjan et al. 2012). Some of the identified genes have been characterized and their role in drought stress validated (Trivedi et al. 2012; Gunapati et al. 2016). However, most of them are of unknown function and have no homologs reported in public databases.
Thus, it is essential to explore the whole genome of WAGAD and characterize the newly identified genes to better understand the drought tolerance mechanism at the molecular level. According to Khan et al. (2018), ambient temperature has been increasing over recent decades, which has aggravated drought conditions. Thus, WAGAD can be used in varietal development programs of cotton breeding for drought stress tolerance.

Induced mutagenesis has a significant role in both crop improvement and functional genomics studies. Among various mutagens, ethyl methanesulfonate (EMS) is a widely accepted and utilized chemical mutagen (Aslam et al. 2016). At the genetic level, it alkylates guanine (G) nucleotides and causes G/C to A/T transitions, which results in point mutations throughout the genome. Mutation densities of about 1/24 kb to 1/2500 kb have been reported in several crops (Kurowska et al. 2011). Among molecular markers, RAPD (Random Amplified Polymorphic DNA) markers are amplified with short arbitrary primers, which amplify the genomic DNA of the target species without prior sequence information and produce polymorphic bands that are used to detect genetic diversity (Raj et al. 2014). In mutated lines, DNA alterations relative to the control change the binding sites of these primers (either loss or gain), and these changes are detected by gel electrophoresis (Danylchenko and Sorochinsky, 2005). In previous reports, numerous mutant platforms of model and crop plant species have been developed and screened for phenotypic diversity, followed by detection of induced mutations through sequencing and RAPD approaches, for example in wheat (Chen et al. 2012; Slade et al. 2012), tomato (Saito et al. 2011), Petunia (Krupa-Małkiewicz and Bienias 2018), alfalfa (Danylchenko and Sorochinsky, 2005), melon (Dahmani-Mardas et al. 2010) and Jatropha curcas (Dhakshanamoorthy et al. 2013; Maghuly et al. 2018). In cotton, cultivated allotetraploid species have been used to develop mutant populations and to identify genome-wide mutations (Abid et al. 2020; Lian et al. 2020) as well as novel agronomic traits such as herbicide tolerance (Bechere et al. 2010) and good fiber quality (Brown et al. 2015). However, research on G. herbaceum for functional genomics studies and the development of new varieties/lines has progressed slowly (Kumar et al. 2021).
Considering the importance of G. herbaceum, efforts were made to create new variability, which may be used either in functional genomics studies or in crop improvement programs. Therefore, the present work was conducted for phenotypic and molecular characterization of the mutant population and identification of potential variants for further exploitation in genetic improvement.

Materials and methods

Plant material and optimization of EMS concentration

The seeds of G. herbaceum (cv. WAGAD) were obtained from CSIR-NBRI, Lucknow, India (Ranjan et al. 2012). To determine the optimum lethal concentration (LC50), two rounds of seed treatment tests were performed. A wide range of EMS doses (50, 100, 150, 200 and 250 mM) was tested in the first round. Based on the results of the first round, lower EMS doses (40, 50, 60, 70, 80 and 90 mM) were tested in the second round. Both rounds of tests were repeated three times. The EMS treatment of seeds was carried out in flasks kept on a shaker for 16 h at 24 °C and 80 rpm. One hundred seeds were used for each EMS concentration, along with a control (0.0 mM, untreated). The treated seeds were rinsed under running tap water for 3 h and sown in a cotton growth house (Saveer Biotech Limited, New Delhi, India). Seed germination was recorded on day 15 and expressed as a percentage rounded to the nearest whole number.

Development of large mutant population

To develop the M1 population, 5500 seeds (M0 seeds) were treated with 70 mM EMS (LC50), and self-pollinated M2 seeds were harvested plant-wise from 2597 fertile M1 plants. Three M2 seeds from each M1 plant (3 × 2597 = 7791 M2 seeds) were sown in the next cropping season to raise the M2 population, and self-pollinated M3 seeds were collected from 6600 M2 plants. A single plant per M2-derived M3 line was used, which is equivalent to 3 plants from each M1 plant, and a total of 6,500 M3 seeds were sown (Table 1). The M1, M2 and M3 generations were grown in the research field of CSIR-NBRI at Mahatma Gandhi Mission (MGM), Aurangabad (MS), India (19° 15' N; 75° 23' E). Plant-to-plant and line-to-line spacing was maintained at 75 cm and 90 cm, respectively.

Table 1 Seed germination, survival and fertility details of the M3 population of G. herbaceum

Phenotyping of M3 population and RAPD analysis

Visual observations were made at regular intervals from germination to harvest, and mutant phenotypes were noted. In addition, quantitative data for 11 agronomic traits were recorded to analyze induced diversity relative to the control. The traits and the methods used to record them are described in Supplementary Table 1. For RAPD analysis, genomic DNA was isolated from a random set of 150 M3 plants using the DNeasy Plant Mini Kit (Qiagen, Valencia, California) as per the manufacturer’s instructions. The quality of DNA was assessed on a 0.8% agarose gel and the DNA was quantified using a Qubit 2.0 fluorometer (Life Technologies, USA). A set of 32 RAPD primers was used for preliminary screening. Finally, 14 RAPD primers with clear and reproducible amplification were selected for genotyping (Table 2). PCR amplification was carried out in a 15 μL reaction volume with 50 ng genomic DNA, 10 pmol primer and 1X TopTaq PCR master mix (Qiagen Inc, USA). The PCR conditions included pre-denaturation at 94 °C for 3 min, followed by 43 cycles of denaturation at 94 °C for 1 min, annealing at 35 °C for 1 min and extension at 72 °C for 1.30 min, and a final extension at 72 °C for 10 min. The PCR products were separated on a 1.2% agarose gel in 0.5X TBE buffer (Tris–Borate-EDTA). After electrophoresis, the gel was stained with ethidium bromide and visualized on a Gel Doc XR (BioRad Laboratories, Inc). A mutation was inferred from a change in the position of RAPD amplified bands relative to the control. RAPD bands were scored as a 0–1 matrix, where ‘0’ indicates absence and ‘1’ indicates presence of a band for each primer. Unclear (faint) bands were not scored.
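
The 0–1 scoring described above can be illustrated with a minimal sketch. The band scores below are hypothetical and are not data from this study; the function names are likewise illustrative. A band is treated as polymorphic when it is present in some plants and absent in others.

```python
# Hypothetical 0/1 band scores for one primer (NOT data from the study).
# Rows are band positions; columns are plants, with the control first.
scores = [
    [1, 1, 1, 1],  # present in every plant: monomorphic
    [1, 0, 1, 1],  # lost in one mutant: polymorphic
    [0, 0, 1, 0],  # gained in one mutant: polymorphic
    [1, 1, 1, 1],  # present in every plant: monomorphic
]


def is_polymorphic(band_row):
    """A band is polymorphic if it is present in some plants and absent in others."""
    return 0 < sum(band_row) < len(band_row)


def percent_polymorphism(matrix):
    """Percentage of scored bands that are polymorphic across all plants."""
    polymorphic = sum(1 for row in matrix if is_polymorphic(row))
    return 100.0 * polymorphic / len(matrix)


print(percent_polymorphism(scores))  # 50.0
```

In this toy matrix two of the four bands differ among plants, so the primer would be reported with 50% polymorphism.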

Table 2 RAPD analysis among selected M3 plants and related statistical parameters

Statistical analysis

Microsoft Excel 2016 was used for the statistical analysis of the treatment, viz. range of dispersion, mean, standard error and coefficient of variation (CV%). The mean values of the M3 population and the control were compared by Student's t-test (p ≤ 0.05). The frequency distribution graphs were plotted using PAST-3 software (Hammer et al. 2001). The 0–1 binary RAPD data were used to compute pair-wise genetic dissimilarities among the selected plants based on Jaccard’s similarity coefficient. The neighbor-joining (NJ) tree was constructed, along with principal component analysis (PCA), using DARwin 5.0.157 software (Perrier et al. 2003).
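
As a rough illustration of the pair-wise dissimilarity step, the sketch below computes a Jaccard dissimilarity (one minus the Jaccard similarity) from 0–1 band profiles. It is not the DARwin implementation; the profiles and plant names are hypothetical, and treating two profiles with no bands at all as identical is a convention assumed here.

```python
def jaccard_dissimilarity(a, b):
    """1 - (bands shared by both plants) / (bands present in at least one plant)."""
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    if either == 0:
        return 0.0  # assumed convention: two empty profiles count as identical
    return 1.0 - shared / either


def pairwise_dissimilarity(profiles):
    """Dissimilarity for every ordered pair of distinct plants."""
    return {
        (p, q): jaccard_dissimilarity(profiles[p], profiles[q])
        for p in profiles
        for q in profiles
        if p != q
    }


# Hypothetical band profiles (NOT data from the study).
profiles = {
    "control": [1, 1, 0, 1, 1],
    "mut_a": [1, 0, 0, 1, 1],
    "mut_b": [0, 0, 1, 0, 0],
}

distances = pairwise_dissimilarity(profiles)
print(distances[("control", "mut_a")])  # 0.25
print(distances[("control", "mut_b")])  # 1.0
```

A value of 0 means two plants share every scored band, and a value of 1 means they share none.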

Results and discussion

Optimization of lethal concentration of EMS

The rate of induced mutation depends on the effectiveness and efficiency of the mutagen, the duration of treatment and the plant material being treated (e.g., seed, pollen and vegetative parts). EMS absorbed by the treated samples has a genotoxic effect: it can induce DNA damage or inhibit physiological processes by blocking or reducing enzyme activity and RNA synthesis (Kumar et al. 2013). Higher doses increase the frequency of mutation but dramatically decrease the recovery of viable seeds, while too low a concentration yields fewer mutations (Chawade et al. 2010). In the present investigation, seed germination decreased as the EMS concentration increased (Fig. 1). A similar pattern was reported in previous studies on mutagenesis in sunflower (Sabetta et al. 2011) and wheat (Rawat et al. 2018). According to Abid et al. (2020), the optimum lethal concentration of a mutagen provides maximum mutation density without compromising plant viability and fertility. Therefore, different concentrations of EMS were tested to find the LC50 before treating the bulk material (Fig. 1). In the first round of treatment, about 30–70% seed germination relative to the control was observed between 50 and 100 mM (Fig. 1). The highest doses of EMS (150 to 250 mM) were very effective but caused severe seed mortality and were discarded (Fig. 1). In the second round, with lower doses, about 51% seed germination was recorded at 70 mM (Fig. 1), while 83, 74, 61, 46 and 41% seed germination was recorded at 40, 50, 60, 80 and 90 mM of EMS, respectively (Fig. 1). Based on these results, 70 mM was taken as the LC50 for the subsequent experiments. In previous studies, higher concentrations of EMS ranging from 1.5 to 2% have been tested for allotetraploid cotton (Abid et al. 2020; Lian et al. 2020). This may be due to genome buffering, as reported for wheat mutagenesis (Rawat et al. 2018).
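
To show how the second-round germination percentages relate to a 50% threshold, the sketch below linearly interpolates between the two tested doses that bracket 50% germination. Interpolation is an assumption made here for illustration; the study selected 70 mM, the tested dose closest to 50% germination, and the values are used as reported, without normalization to the control.

```python
# Second-round EMS doses (mM) and germination percentages reported in the text.
doses_mM = [40, 50, 60, 70, 80, 90]
germination_pct = [83, 74, 61, 51, 46, 41]


def interpolate_lc50(doses, responses, target=50.0):
    """Linearly interpolate the dose at which germination falls to `target` percent.

    Returns None if no adjacent pair of doses brackets the target.
    """
    points = list(zip(doses, responses))
    for (d0, r0), (d1, r1) in zip(points, points[1:]):
        if r0 >= target >= r1 and r0 != r1:
            return d0 + (r0 - target) * (d1 - d0) / (r0 - r1)
    return None


lc50 = interpolate_lc50(doses_mM, germination_pct)
print(lc50)  # 72.0
```

The interpolated value of about 72 mM lies between the 70 and 80 mM treatments and is consistent with choosing 70 mM as the working dose.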

Fig. 1 Kill-curve plot of G. herbaceum (cv. WAGAD) during the optimization of lethal concentration (LC50)

Phenotyping of M3 population

Out of 6500 M3 seeds sown, 5527 survived until maturity. Finally, M4 seeds were harvested from 5473 fertile M3 plants (Table 1). The control grew well and was 100% fertile. The higher mortality of germinated seedlings might be due to damage to the cotyledon, root and shoot. Such mortality in the initial phase of mutagenesis may have occurred due to increased homozygosity of lethal mutations (Schreiber et al. 2019). Espina et al. (2018) and Chantreau et al. (2013) reported similar results for soybean and linseed, respectively. According to Saito et al. (2011), a sufficiently large population is necessary to develop a saturated mutant resource in which every locus of the genome is mutated, and an EMS mutagenesis platform requires thousands of individuals, suggesting that the current population was almost saturated. Furthermore, perceptible phenotypic variation was noticed in 276 plants (~ 5%), which were categorized into 6 groups and 17 subgroups (Table 3). According to Dahmani-Mardas et al. (2010), M1 mutations become homozygous in subsequent generations, which suggests that the observed phenotypes might be due to the homozygous condition of mutated gene(s) controlling defined pathway(s). The major mutant phenotypes showed changes in leaf color, curly leaves, bushy plant growth (shorter internodes and more branches) and round bolls (Fig. 2a–e); the other identified mutations are listed in Table 3. Leaf color and plant growth-related mutations are reported as primary signals of induced mutagenesis. These have also been observed in other experiments that demonstrated sufficient mutagenesis and reported a high rate of point mutations (Chantreau et al. 2013; Kumar et al. 2013). Validation of these mutant phenotypes can be useful in identifying the genes responsible for photosynthetic efficiency, plant growth and development (Abid et al. 2020).
The most striking phenotypic variations observed in the field were 47 completely sterile and 7 partially sterile plants (Table 3).
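
The counts reported for the M3 generation can be turned into percentages with a short calculation. The sketch below uses only numbers stated in the text; using the surviving plants as the denominator for the visible-mutant frequency is an assumption, since the text reports only "~ 5%".

```python
# Counts reported in the text for the M3 generation.
m3_seeds_sown = 6500
m3_survived = 5527
m3_fertile = 5473
visible_mutants = 276


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


survival_pct = pct(m3_survived, m3_seeds_sown)   # survival to maturity
fertility_pct = pct(m3_fertile, m3_survived)     # fertile among survivors
visible_pct = pct(visible_mutants, m3_survived)  # assumed denominator: survivors
non_fertile = m3_survived - m3_fertile           # surviving plants without M4 seed

print(survival_pct, fertility_pct, visible_pct, non_fertile)  # 85.03 99.02 4.99 54
```

The 54 surviving plants from which no M4 seed was harvested equal the sum of the 47 completely sterile and 7 partially sterile plants reported above.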

Table 3 Classification of visible phenotype and number of mutant plants in the M3 population
Fig. 2 Visual screening of mutations in M3 population. (a) Chimeric white leaf (b) Chimeric yellow leaf (c) Curly leaf (d) Bushy plant (e) Round shape boll

The quantitative data were recorded from 4453 M3 plants. The ranges of the measured traits were wider than those of the control (Table 4), and their frequency distributions formed bell-shaped curves (Fig. 3a–k). This result suggests that EMS mutagenesis introduced additional variation in which intermediate phenotypes are more common than the extremes. Moreover, the CV varied from 11.42 to 29.29%, which was higher than in the control (Table 4). The maximum CV was recorded for leaf size (29.29%); high CVs were also recorded for number of branches/plant (29.27%), seed weight/plant (27.57%), biological yield/plant (27.25%), stem circumference (27.18%), lint weight/plant (28.17%), number of bolls/plant (26.54%) and number of nodes/plant (23%). The CVs of the other traits were below 20% (Table 4). The increased CV indicates a broad genetic base and the heterogeneous nature of the population. The mean values revealed that most of the traits differed significantly (p ≤ 0.05) from the control, except internodal distance and test weight (Table 4). Thus, the M3 population showed significant differences from the control for the recorded traits, which suggests induced quantitative variation, as previously reported in mutant populations of chili (Hasan et al. 2020), chickpea (Tiliouine et al. 2018) and lentil (Laskar and Khan 2017). This quantitative variation of the population over the control would be of direct interest to plant breeders for the selection of superior line(s).
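
For readers unfamiliar with the summary statistics used here, the sketch below computes the range, mean, standard error and CV% for one trait. The trait values are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which form was used.

```python
import math
import statistics


def summarize(values):
    """Range, mean, standard error and coefficient of variation (CV%) of one trait."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1); assumed
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "se": sd / math.sqrt(len(values)),
        "cv_pct": 100.0 * sd / mean,
    }


# Hypothetical boll counts for five plants (NOT data from the study).
bolls_per_plant = [10, 12, 14, 16, 18]

stats = summarize(bolls_per_plant)
print(round(stats["cv_pct"], 2))  # 22.59
```

Because the CV scales the standard deviation by the mean, it allows traits measured in different units, such as plant height and boll number, to be compared on a common footing.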

Table 4 Performance of 4,453 M3 plants for 11 different agro-morphological traits with the control
Fig. 3 Frequency distribution of 11 agro-morphological traits for 4,453 M3 plants of G. herbaceum (cv. WAGAD). (a) Number of branches/plant, (b) Plant height, (c) Stem circumference, (d) Leaf size, (e) Number of nodes/plant, (f) Internodal length, (g) Number of bolls/plant, (h) Test weight, (i) Lint weight/plant, (j) Seed weight/plant, (k) Biological yield/plant. The red and blue arrows indicate the lower and upper limits of the range of dispersion of the control, respectively

RAPD analysis

Sometimes, morphological variations cannot be observed due to gene redundancy and silent mutations. In such cases, RAPD profiling of a small set of mutants alongside the control helps to detect induced mutations at the DNA level at an early stage. In previous reports, such hidden variation has been explored by RAPD fingerprinting in wheat (Chen et al. 2012) and torch ginger (Yunus et al. 2013). In this study, 14 RAPD primers were selected that gave 92 clear scorable bands (Table 2). The number of bands per primer ranged from 4 (OPZ-13) to 11 (OPG-02), with an average of 6.57 bands (Table 2). Of these, 68 (75.1%) bands were found to be polymorphic between the control and the selected M3 plants (Table 2). The percentage of polymorphism varied from 33.3% (OPG-01) to 87.5% (OPE-20 and OPA-13) (Table 2). Representative gel images for primers OPW-05 and OPE-20 are presented in Fig. 4a, b; they show a clear loss of amplified DNA fragments (bands) in M3 plants compared with the control. These polymorphic bands suggest induced mutations, which may be either point mutations or small indels (insertions or deletions). In other studies, similar efforts have been made for sorghum (Taryonoand and Cahyaningrum, 2011) and J. curcas (Dhakshanamoorthy et al. 2013) using six and twenty-five RAPD primers, respectively, and comparable percentages of polymorphic bands were reported. However, Aslam et al. (2017) reported a lower percentage of polymorphism among Capsicum annuum mutants. According to Raj et al. (2014), cluster analysis is a productive approach in diversity analysis that uses the numerical fingerprint data to calculate distances between every pair of entities. In the present work, pair-wise genetic distance calculated from the binary data ranged from 0.0 to 0.99, with an average value of 0.50 ± 0.07 (data not shown).
The average genetic dissimilarity of each plant with the other plants varied from 0.29 ± 0.29 (Control) to 1.00 ± 0.00 (Gh_mut_139). The highest degree of genetic dissimilarity was observed in Gh_mut_139, followed by Gh_mut_77 and Gh_mut_87, which were considered the most divergent. An NJ dendrogram was constructed to better understand the similarity and dissimilarity among the 150 M3 plants, which grouped into three major clusters, namely I, II and III (Fig. 5). Clusters I and II were each further divided into two sub-clusters, Ia, Ib and IIa, IIb, respectively. Sub-cluster IIa was the largest with 67 plants, while sub-cluster IIb was the smallest with 4 plants. The control plant clustered in IIa. Sub-clusters Ia and Ib comprised 66 and 5 plants, respectively. Cluster III was represented by 9 plants. In addition, PCA also separated all 150 M3 plants into three major clusters, almost identical to those of the NJ dendrogram (Fig. 6). Thus, both the NJ dendrogram and PCA showed distinct clusters and sub-clusters due to the accumulation of induced mutations among M3 plants. As a result, the current study provides information about the quality of the developed mutant resource, which can be used in both functional genomics studies and cotton breeding programs.
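
The per-plant average dissimilarity used above to rank divergent plants can be sketched as the mean of one plant's pair-wise dissimilarities to all other plants. The small matrix below is hypothetical and only illustrates the calculation; it is not the output of DARwin or data from this study.

```python
# Hypothetical pair-wise dissimilarities (NOT data from the study).
# Each plant maps to its dissimilarity with every other plant.
dissimilarity = {
    "control": {"mut_a": 0.25, "mut_b": 1.0},
    "mut_a": {"control": 0.25, "mut_b": 1.0},
    "mut_b": {"control": 1.0, "mut_a": 1.0},
}


def mean_dissimilarity(matrix):
    """Average dissimilarity of each plant to all the other plants."""
    return {plant: sum(row.values()) / len(row) for plant, row in matrix.items()}


means = mean_dissimilarity(dissimilarity)
most_divergent = max(means, key=means.get)

print(means["control"])  # 0.625
print(most_divergent)    # mut_b
```

A plant whose mean approaches 1.0, as reported for Gh_mut_139, shares almost no scored bands with any other plant in the set.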

Fig. 4 RAPD profile of selected mutants along with the control. (a) OPW-05 primer, (b) OPE-20 primer. Absence of a RAPD band is indicated by a red arrow (M = DNA ladder, W = Control)

Fig. 5 Dendrogram of 150 M3 plants with the control, built from 14 RAPD markers based on Jaccard’s coefficient using the NJ clustering method. The control plant is highlighted in yellow. The abbreviation Gh_mut = G. herbaceum_mutant, and the digits indicate the ID of the M3 plant used in RAPD analysis

Fig. 6 Two-dimensional plot of principal component analysis (PCA) for genetic relationship among 150 M3 plants of G. herbaceum along with the control (Cluster I in blue, Cluster II in black and Cluster III in red)

Conclusion

The aim of the current study was to utilize EMS mutagenesis and establish a saturated mutant population. A set of 5473 fertile M3 plants has been established and shown to possess a considerable degree of phenotypic as well as genotypic polymorphism. These plants will serve as a potential resource for forward and reverse genetic studies. Further, the phenotypically diverse plants will allow exploration of causal point mutations for functional validation of genes responsible for growth and development, as well as for marker-assisted selection.