8.1 Introduction

Genes, being constituted by the nucleic acids and carried on the chromosomes, are the units of heredity. They undergo mutation to change from one form to another. Mutations involve sudden heritable changes as termed by Hugo de Vries (1901). These heritable changes can be identified at the chromosomal level and nucleic acid level (see Auerbach 2013). The changes at chromosome level include changes in chromosome number, changes in the entire set of genes, and changes in individual genes, while the molecular changes at nucleic acid level include base substitutions, additions, and deletions. Apart from occurring spontaneously, the mutations can be induced. Mutations are the chief source of genetic variability leading to phenotypic alterations profoundly contributing to biological evolution (Kutschera and Niklas 2004; Schoen and Schultz 2019; Jay et al. 2021) (Fig. 8.1).

Fig. 8.1 Number of PubMed publications reporting the use of genetic markers in mutant analysis to date. The bar graph covers morphological markers, isozymes/allozymes, RFLP, RAPD, AFLP, SSR, TE markers, TILLING, SNP, and MutMap; isozymes/allozymes have the highest count (396) and TE markers the lowest (8).

Apart from being extensively used in genetics, mutations have been applied for crop improvement. To date, several mutants have been developed in plants, animals, and microbes, and many have been characterized. With the advancements in genetics and genomics, the characterization of mutants and mutations is now employed as an approach for gene discovery. Mutant characterization generally refers to identifying the type and location of genetic changes, determining the mode of inheritance, and recognizing the phenotype due to the genetic changes. Nowadays, mutants are characterized at the level of the phenome, genome, epigenome, transcriptome, proteome, metabolome, ionome, etc. With the advances in DNA sequencing, the genome-wide approach of characterization is becoming the method of choice. However, considering its huge demand for resources, molecular markers are still widely used for mutant characterization. Reciprocally, the mutant resources offer an opportunity to develop new markers and marker systems that can be employed for mutant characterization. In this chapter, an effort is made to review the molecular markers that have been successfully employed for identifying and characterizing mutations in plants.

8.2 Types of Mutations

8.2.1 Classical Mutations

Mutation refers to a change in the genetic material and to the process by which the change occurs. Thus, mutations involve sudden and heritable changes that cannot be explained by recombination of pre-existing genetic variability. Such genotypic changes include changes in the chromosome number (euploidy and aneuploidy), gross changes in the structure of chromosomes (deletions/deficiencies, duplications, inversions, and translocations), and changes in individual genes (excluding changes in chromosome number or structure) (Gardener et al. 1991). Changes in the structure of chromosomes require breaks (one or more) in the chromosome (single or a set of chromosomes). Based on the first cytological observations of maize chromosome rearrangements by McClintock, four structural changes were identified: (1) deletions/deficiencies (parts of chromosomes lost or deleted), (2) duplications (parts added or duplicated), (3) inversions (sections detached and reunited in reverse order), and (4) translocations (parts of chromosomes detached and joined to nonhomologous chromosomes).

Mutations involving single base-pairs are referred to as point mutations, which can arise from substitutions (change of one base-pair to another), duplications, and deletions. Mutations resulting from tautomeric shifts in the DNA involve the replacement of a purine in one strand of DNA with the other purine and/or the replacement of a pyrimidine in the complementary strand with the other pyrimidine; collectively, these are called transitions. Base-pair substitutions involving the replacement of a purine with a pyrimidine and vice versa are called transversions. Four different transitions and eight different transversions are possible in DNA. Additions and deletions of one or a few base-pairs (other than multiples of three) are collectively called frameshift mutations since they alter the reading frame of a gene distal to the site of mutation.
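The counts of four transitions and eight transversions can be checked by enumerating all ordered single-base changes. The following is a minimal sketch; the function names are illustrative, and the frameshift check assumes the usual rule that an indel shifts the reading frame only when its length is not a multiple of three.

```python
from itertools import permutations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError("expected two different DNA bases (A, C, G, T)")
    # Purine <-> purine or pyrimidine <-> pyrimidine is a transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def shifts_reading_frame(indel_length):
    """An insertion or deletion shifts the frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


# Enumerate all 12 ordered base changes (e.g. A>G and G>A are counted separately).
counts = {"transition": 0, "transversion": 0}
for ref, alt in permutations("ACGT", 2):
    counts[classify_substitution(ref, alt)] += 1

print(counts)
```

The same classifier can be applied to a list of observed substitutions to summarize the mutation spectrum of a mutant line.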

Based on their occurrence, mutations can be classified as spontaneous (those occurring without a known cause) or induced (resulting from exposure to mutagenic agents). Operationally, it is difficult to discern spontaneous mutations from induced mutations. The effect of a mutation on the phenotype might range from minor alterations, which can be detected only through special genetic or biochemical techniques, to gross modifications of morphology, to lethality.

Any mutation occurring within a given gene produces a new allele. Because of the degeneracy of the genetic code, a mutant allele may still code for an unaltered protein and phenotype; such an allele is called an isoallele. Mutant alleles coding for an altered gene product (protein) may show a modified phenotype, and a mutation resulting in the loss of the product of an essential gene may be lethal.

Mutations may be either recessive or dominant. In diploids (or polyploids), the recessive mutations can be recognized only in their homozygous state, whereas the dominant mutations can be identified both in the heterozygous state and their homozygous state. Most of the mutations that have been identified and studied by geneticists are recessive in nature. It is also important to note that a mutation can continue to exist in heterozygous condition (Branch et al. 2020). Mutations can occur in any cell cycle state of somatic and germinal cells (Schoen and Schultz 2019). Thus, the effect of mutation and the subsequent phenotypic change depends on its nature of dominance, the cell type, and state of the cell cycle.

8.2.2 Epimutations

Classical mutations (explained above) involve changes in DNA base sequences. In contrast, the changes in gene function that are mitotically and/or meiotically heritable and that do not involve any change in DNA sequence were named “epimutation” by Holliday (1984). These changes generally entail DNA methylation and histone/chromatin modifications (Noshay and Springer 2021; Shah 2021). Two types of epimutations can be identified: primary and secondary (Horsthemke 2006). Primary epimutations occur in the absence of any DNA sequence change, while secondary epimutations occur secondary to a DNA mutation in a cis- or trans-acting factor (Oey and Whitelaw 2014). Epimutations can also be classified as germ line or somatic. Germ line epimutations, being derived from the germ line, are present in all tissues of an individual (constitutive), while somatic epimutations arise in the somatic cells (Hitchins and Ward 2009). These possibilities explain the differences in zygosity of epimutations. Spontaneous epimutations leading to phenotypic changes in plants are well documented (Johannes and Schmitz 2019). Despite regulation, DNA methylation is not always faithfully maintained in somatic and germ line cells. As a result, cytosine methylation is sometimes gained or lost in a stochastic fashion due to spontaneous epimutations (Johannes and Schmitz 2019), not only in the somatic cells but also in the germ line cells, thereby passing through the gametes to subsequent generations and giving rise to heritable epigenetic variation. Epimutations can also be induced by biotic (Dowen et al. 2012; Bhat et al. 2019a) and abiotic (Verhoeven et al. 2010) stresses in plants. Profiling the DNA methylomes of Arabidopsis plants exposed to a bacterial pathogen and the hormone salicylic acid (SA) revealed numerous differentially methylated regions, many of which were transposon-associated regions (Dowen et al. 2012).
Multi-generational drought-induced random epimutations were found at a high proportion among the cluster of drought-responsive genes in rice. Many of these epimutations were maintained in advanced generations, and they improved the drought adaptability of the offspring (Zheng et al. 2017).

8.2.3 Gene-Tagged Mutants

A large number of mutants in various crops have been developed using gene-tagging and trapping methods. These methods employ T-DNA or transposons to generate insertional inactivation mutants (loss of gene function), activation tagged mutants (gain of gene function), and trans-activation tagged mutants for functional genomics (see Upadhyaya 2007). To date, a large number of such mutants have been characterized and the genes have been identified (Upadhyaya et al. 2003; Lo et al. 2016). Also, Ds insertion mutagenesis has been used as an efficient tool to produce diverse variations for rice breeding (Jiang et al. 2007). However, this method of characterizing mutants largely does not involve any DNA markers, since the genes are discovered by isolating the flanking sequences and further validated by overexpression, gene silencing, reversion, etc. (Pereira 2011).

8.2.4 Gene Silencing Mutants

Gene silencing is a straightforward approach to reduce or knock out the expression of a gene with the hope of seeing a phenotype that is suggestive of its function (see Ghosh et al. 2020). Transcriptional gene silencing (TGS) and post-transcriptional gene silencing (PTGS) methods have been demonstrated in several plant species. Their clear advantages over insertional mutagenesis include not being limited by gene redundancy, freedom from lethal knockouts, the absence of non-tagged mutants, and the ability to target the inserted element to a specific gene. Characterization of such mutants requires gene expression analysis and phenotypic evaluation (see Curtin et al. 2007).

8.2.5 Gene-Edited Mutants

Targeted mutagenesis through genome-editing technologies (Przybyla and Gilbert 2021; Singh and Shekhawat 2021) introduces direct and irreversible mutations through nonhomologous end joining of the double-stranded breaks generated by CRISPR–Cas9, resulting in an altered phenotype. Recently developed Cas9 variants, novel RNA-guided nucleases and base-editing systems, and DNA-free CRISPR–Cas9 delivery methods now provide great opportunities for inducing mutations (see Yin et al. 2017; Atkins and Voytas 2020; Porto et al. 2020). DNA sequencing and DNA markers are useful in detecting such gene-edited mutations.

Targeted mutagenesis in the conserved coding regions of AhFAD2 genes was undertaken using TALENs (Wen et al. 2018). Mutation frequencies among AhFAD2 mutant lines were significantly correlated with oleic acid accumulation. Using CRISPR/Cas9, three mutations were generated: G448A in AhFAD2A, and 441_442insA and G451T in AhFAD2B, leading to high oleic acid content in peanut (Yuan et al. 2019). Applications of gene-editing technologies are being demonstrated in various systems (Sattar et al. 2021; Singh and Shekhawat 2021).
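Mutation labels such as G448A and 441_442insA follow a compact notation that is easy to parse when many edited lines have to be tabulated. The sketch below is illustrative only: the regular expressions cover just the two label shapes quoted above, the field names are hypothetical, and treating the numbers as 1-based positions in the coding sequence is an assumption.

```python
import re

# Single-base substitution, e.g. "G448A": reference base, position, new base.
SUBSTITUTION = re.compile(r"^([ACGT])(\d+)([ACGT])$")
# Insertion between two adjacent positions, e.g. "441_442insA".
INSERTION = re.compile(r"^(\d+)_(\d+)ins([ACGT]+)$")


def parse_mutation(label):
    """Parse a mutation label into a dictionary describing the change."""
    match = SUBSTITUTION.match(label)
    if match:
        ref, pos, alt = match.groups()
        return {"type": "substitution", "position": int(pos), "ref": ref, "alt": alt}
    match = INSERTION.match(label)
    if match:
        left, right, bases = match.groups()
        return {
            "type": "insertion",
            "after": int(left),
            "before": int(right),
            "inserted": bases,
            # Insertions whose length is not a multiple of 3 shift the reading frame.
            "frameshift": len(bases) % 3 != 0,
        }
    raise ValueError(f"unrecognized mutation label: {label!r}")


for label in ("G448A", "441_442insA", "G451T"):
    print(label, parse_mutation(label))
```

Under these assumptions, the single-base insertion is flagged as a frameshift, whereas the two substitutions change one base each.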

8.2.6 Deletion Mutants

Gene isolation for genomic research (in the absence of a genome sequence) relies on generating random DNA fragments and sequencing them. Unfortunately, this strategy requires labor-intensive assembly of contiguous sequences. In addition, the redundant DNA segments that are quite frequent in plant genomes make this strategy inefficient in terms of time, effort, and cost. The effective methods typically involve the production of a series of nonrandom nested DNA deletions using DNA sonication, DNase I digestion, exonuclease III digestion, and restriction endonuclease digestion. Several nested deletion strategies utilizing PCR have also been reported (see Dennis and Zylstra 2002). In addition, a number of in vivo or in vitro transposon-mediated methods have been developed that utilize random transposon insertions as binding sites for sequencing primers. An alternative method, which rapidly produces deletions in a cloned DNA fragment using a rare-cutting restriction enzyme and a frequent-cutting restriction enzyme, was also demonstrated to generate contiguous and colinear cloned DNA fragments (Dennis and Zylstra 2002).

Like gene-tagged mutants and gene silencing mutants, the characterization of these deletion mutants largely does not involve any DNA markers but depends on expression analysis. Recently, a CRISPR/Cas9 genome-editing system based on processing by the RNA endoribonuclease Csy4 was reported in Arabidopsis to induce high-efficiency, heritable targeted deletion of transcription factors involved in floral development (Liu et al. 2019).

8.3 Characterization of Mutants

Mutations in both somatic and germ lines are retained at a tolerable level in spite of DNA repair mechanisms (photoreactivation, excision repair, and postreplication recombination repair), which are probably universal. These mutations are invaluable to the process of evolution, and they provide the new alleles required for various types of genetic analysis (from Mendel’s two-factor crosses to chromosome mapping to studies on the genetic structure of populations). Mutations have been used extensively for elucidating metabolic pathways. These applications of mutations are now being increasingly realized with the use of several advanced molecular tools and techniques. Among them, genetic markers (Amom and Nongdam 2017; Nadeem et al. 2018; Barman and Kundu 2019) are being used for various purposes such as mutant characterization, mapping of mutations, fine-mapping, and candidate gene discovery. A genetic marker is a gene or DNA sequence that directly controls a trait or shows linkage/association with a trait. Genetic markers can be classical markers (morphological and biochemical) or DNA/molecular markers. Since the 1970s, genetic markers have played a major role in detecting mutations (Wu et al. 2012) and evaluating mutant and non-mutant populations (Gupta et al. 1999; Jehan and Lakhanpaul 2006).

8.3.1 Morphological Markers

Morphological markers visually distinguish phenotypes for traits such as seed structure, flower color, growth habit, and other important agronomic traits. They are easy to use and require no specific instruments or specialized biochemical and molecular techniques. Wagner et al. (1992) used hypocotyl color, monogerm character, pollen fertility, and stem fasciation as morphological markers to construct linkage groups using mutants in sugar beet. Morphological markers have been successfully employed for linkage map construction and diversity analysis among the mutants of alfalfa (Kiss et al. 1993), rye (Voylokov et al. 1998), banana (Miri et al. 2009), etc. However, the main disadvantages of morphological markers are that they are limited in number and are influenced by plant growth stage and various environmental factors.

8.3.2 Biochemical Markers

Biochemical techniques, in particular gel electrophoresis, have made it possible to identify variation in the physical and chemical properties of proteins. Two or more forms of an enzyme that are coded by different alleles of the same gene, and that show different mobility in gel electrophoresis due to differences in amino acid sequence and electric charge, are called allozymes. Enzymes that catalyze the same reaction but are coded by different genes are called isozymes. Since allozymes are coded by different alleles, allozymic variation is a direct indication of genetic variation. Though allozyme analysis has the advantage of being relatively rapid, cost-effective, and efficient, with sampling spread over a variety of presumably independent gene loci, its chief disadvantages are relatively low abundance and a low level of polymorphism. In many species, the maximum allozymic variation is 20–30%. Another drawback of allozyme electrophoresis is that bands (alleles) that have the same electric charge and migrate to the same pole in the gel may not be truly allozymic. In an initial effort on biochemical markers, Scandalios and Espiritu (1969) reported two forms of aminopeptidase (AmP) using conventional zone electrophoresis in Pisum sativum. Isozyme analysis was used for characterizing mutants in maize (Schwartz 1971), tomato (Caruso and Glier 1973; Mattoo and Vickery 1977), rice (Alvarez et al. 2000), etc.

With the advances in proteomics (Ingole et al. 2021), metabolomics (Francisco et al. 2021; Murphy et al. 2021; Shen et al. 2021; Zheng et al. 2021), and ionomics (Murgia and Vigani 2015; Sevanthi et al. 2018), the mutant characterization and gene identification turned out to be more robust. The role of gibberellins (GAs) in germination of Arabidopsis seeds was examined by a proteomic approach, and changes in 46 proteins were detected during germination using two-dimensional (2D) electrophoresis (Gallardo et al. 2002). 2D electrophoresis was used to analyze the proteome of the salt-tolerant mutant (RH8706-49) and the salt-sensitive mutant (H8706-34) of wheat (Huo et al. 2004). With MALDI-TOF-MS analysis, the qualitative and quantitative differences were identified between the two mutants for five chloroplast candidate proteins: H+-transporting two-sector ATPase, glutamine synthetase 2 precursor, putative 33 kDa oxygen evolving protein of photosystem II and ribulose-1,5-bisphosphate carboxylase/oxygenase small subunit.

8.3.3 DNA Markers

DNA/molecular markers detect differences in DNA sequence arising due to base-pair substitutions, deletions, and additions, and are thus less ambiguous than the morphological and biochemical markers, which depend upon gene expression (Williams et al. 1993). Wessler and Varagona (1985) attempted the molecular analysis of more than 40 mutant alleles of the waxy (Wx) locus of maize which were phenotypically unstable due to the insertion of the maize transposable activator (Ac) and dissociation (Ds) elements. The Wx transcription unit had insertions ranging in size from 150 base-pairs to 6.1 kilobases and deletions resulting in the mutant phenotype. Later, Spell et al. (1988) sequenced the upstream region of Wx transcription unit and found restriction fragment length polymorphism (RFLP). Okagaki and Wessler (1988) cloned the waxy gene from maize and rice and found that many restriction sites within the translated exons were conserved. Helentjaris et al. (1986) constructed the first genetic map based on the RFLP markers in maize and tomato. RFLP markers were also employed to construct genetic map using the Arabidopsis mutants (Chang et al. 1988). Petunia hybrida line V30 harbors three dihydroflavonol-4-reductase (DFR) genes (A, B, C), and they were mapped by RFLP analysis on three different chromosomes (IV, II, and VI, respectively) (Beld et al. 1989).

Williams et al. (1993) demonstrated mapping of mutant genes using RAPD markers which employed a set of primers corresponding to mapped RAPDs distributed throughout the genome of Arabidopsis on the pools of F2 plants obtained by crossing homozygous wild type genotype with a mutant. Subsequent advances in marker technology witnessed the use of AFLP (Castiglioni et al. 1998; Qu et al. 1998; Li et al. 2000), SSR (Tang et al. 2001), ISSR (Schwarz-Sommer et al. 2003), DArT (Vipin et al. 2013; Shasidhar et al. 2017), and EST (Kuraparthy et al. 2008) markers for mutant characterization and mapping of mutations. da Silva et al. (2016) employed TRAP and SRAP markers on guarana plant accessions which showed more polymorphism than the RAPD markers. SRAP markers were also useful in capturing the variation among morphologically similar accessions. Utility of TRAP markers to determine indel mutation frequencies induced by gamma ray irradiation was also demonstrated in faba bean (Vicia faba L.) by Lee et al. (2019). The TRAP markers distinguished mutant lines and showed association between mutation frequency and gamma doses.

By far the most common type of molecular variation exists at the single nucleotide level in terms of substitutions, insertions, and deletions. These polymorphisms are collectively referred to as single nucleotide polymorphisms (SNPs). SNP genotyping can be done either at the whole-genome level or at a reduced genome representation level (partial genome) to differentiate mutants from their parents. To date, a large number of SNPs have been identified and their structural and functional features have been studied (Bhat et al. 2022).

An initial study used SNP-derived cleaved amplified polymorphic sequence (CAPS) analysis to identify structural variation between the Dee-Geo-Woo-Gen-type sd-1 mutant and the normal-type variety for the rice semi-dwarfing gene sd-1, the rice “green revolution gene” encoding a mutant enzyme involved in gibberellin synthesis (Monna et al. 2002). Further, PCR-RF-SSCP (PRS), which combines CAPS and single-strand conformation polymorphism (SSCP), was used to detect SNPs between the wild type Waxy gene and its mutant types (Sato and Nishio 2003). SNP detection reactions based on a multiplexed primer extension assay (the multiplexed SNaPshot assay) and a matrix-assisted laser desorption/ionization time-of-flight (MALDI-ToF) assay were established for high-throughput SNP genotyping of mutants in Arabidopsis (Torjek et al. 2003). SNPs derived via diversity arrays technology (DArT)seq were used to map the liguleless mutant (LM) of Aegilops tauschii (Dresvyannikova et al. 2019).
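The principle behind a CAPS marker is that a SNP creates or abolishes a restriction site, so the digested PCR product of the mutant yields a different fragment pattern from that of the wild type. The following is a minimal sketch with a made-up amplicon; the sequences are hypothetical, and EcoRI (recognition site GAATTC, cutting after the first G) is used only as a familiar example, not as the enzyme used in the studies cited above.

```python
def digest(sequence, site, cut_offset):
    """Return fragment lengths after cutting a linear amplicon at every `site`.

    `cut_offset` is the number of bases from the start of the recognition
    site to the cut position on the top strand (1 for EcoRI, G^AATTC).
    """
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Hypothetical 27-bp amplicon carrying one EcoRI site.
wild_type = "ATGCCGTA" + "GAATTC" + "GGCTAACGTTAGC"
# Hypothetical mutant allele: an A>G transition destroys the site.
mutant = wild_type.replace("GAATTC", "GAGTTC")

print("wild type fragments:", digest(wild_type, "GAATTC", 1))
print("mutant fragments:   ", digest(mutant, "GAATTC", 1))
```

On a gel, the wild type allele would appear as two bands and the mutant allele as a single uncut band, so heterozygotes would show all three.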

Genotyping by sequencing (GBS)-derived SNPs were used to map the homologous transformation sterility gene (hts) in wheat (Yang et al. 2018), the Anthocyanin Acyltransferase1 (AAT1) gene in maize (Paulsmeyer et al. 2018), the three-pistil gene (Pis1) in wheat (Yang et al. 2017), the Breviaristatum-e (ari-e) locus in cultivated barley (Liu et al. 2014), etc., where the polymorphic alleles were derived from mutations.

Markers linked to a stem rust resistance locus in wheat were identified by genetic mapping with a 9 K SNP genotyping assay and restriction site-associated DNA sequencing (RAD-Seq) on bulked segregants. The bulks were derived from a cross between the susceptible cultivar Columbus, thought to possess the suppressor, and Columbus-NS766, a resistant near-isogenic line believed to contain a mutant non-suppressor allele introgressed from Canthatch (Pujol et al. 2015).

To expedite the discovery of mutant gene for assessing gene function, the SHOREmap (SHOrtREad map) (Schneeberger et al. 2009), NGM (Next-Generation Mapping) (Austin et al. 2011), MutMap (Abe et al. 2012), MutMap+ (Fekih et al. 2013), MutMap-Gap (Takagi et al. 2013b), and QTL-Seq (Takagi et al. 2013a) methods were developed. In all of these methods, except QTL-Seq, mutation is generated using mutagens and the mutant population is screened for the desired phenotype.
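Mapping-by-sequencing methods of this kind are commonly described in terms of a SNP index, i.e., the fraction of reads in the bulked mutant pool that carry the mutant allele at a given site. Sites linked to the causal mutation are expected to approach an index of 1, while unlinked sites stay near 0.5. The sketch below illustrates only this idea; the read counts, positions, and window size are hypothetical, and the published pipelines apply additional filtering and statistics that are not shown here.

```python
def snp_index(alt_reads, total_reads):
    """Fraction of reads supporting the mutant (alternate) allele at one site."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def sliding_mean(values, window):
    """Mean of each run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical sites on one chromosome: (position, mutant-allele reads, depth).
sites = [
    (100_000, 10, 20),
    (250_000, 12, 20),
    (400_000, 19, 20),
    (550_000, 20, 20),
    (700_000, 20, 20),
    (850_000, 11, 20),
]

indices = [snp_index(alt, depth) for _, alt, depth in sites]
smoothed = sliding_mean(indices, window=3)

# The window with the highest mean SNP index marks the candidate interval.
peak = max(range(len(smoothed)), key=smoothed.__getitem__)
start, end = sites[peak][0], sites[peak + 2][0]
print(f"candidate interval: {start}-{end}, mean SNP index {smoothed[peak]:.2f}")
```

In this toy data set the interval between the third and fifth sites stands out, which is where candidate causal SNPs would then be inspected.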

Later, a few other variants of SHOREmap like synteny-based mapping-by-sequencing (Galvao et al. 2012) and SHOREmap (version 3.0) (Sun and Schneeberger 2015) were introduced. SHOREmap, the GATK pipeline, and the samtools pipeline were used to identify the mutations in HASTY locus governing leaf hyponasty in Arabidopsis (Allen et al. 2013). NGM was used to identify three genes involved in cell-wall biology in Arabidopsis thaliana (Austin et al. 2011). MutMap-Gap was applied to isolate the blast-resistant gene Pii from the rice cv. Hitomebore using mutant lines that have lost Pii function (Takagi et al. 2013b). These related techniques were used to identify the mutation in ent-kaurene synthase, a key enzyme involved in gibberellin biosynthesis conferring a non-heading phenotype in Chinese cabbage (Brassica rapa L. ssp. pekinensis) (Gao et al. 2020), GWC1 for high grain quality in rice (Guo et al. 2020b), PAMP-triggered immunity in Arabidopsis (Kato et al. 2020), single nucleotide substitution at the 3′-end of SBPase gene involved in Calvin cycle affecting plant growth and grain yield in rice (Li et al. 2020), ZmCLE7 underlying fasciation in maize (Tran et al. 2020), Brnym1, a magnesium-dechelatase protein, causing a stay-green phenotype in an EMS-mutagenized Chinese cabbage (Wang et al. 2020), PINOID regulating floral organ development by modulating auxin transport and interacting with MADS16 in rice (Wu et al. 2020), etc.

Genes underlying mutant phenotypes can be isolated and characterized by combining marker discovery, genetic mapping, and resequencing. However, direct comparison of mutant and wild type genomes could be a more straightforward method to identify the mutant loci. NIKS (needle in the k-stack), a reference-free algorithm based on comparing k-mers in whole-genome sequencing data, was proposed for precise discovery of homozygous mutations (Nordstrom et al. 2013). NIKS was applied to eight mutants induced in nonreference rice cultivars and to two mutants of the nonmodel species Arabis alpina. In both species, comparing pooled F2 plants selected for mutant phenotypes revealed small sets of causal mutations. Thus, NIKS enables forward genetics without requiring segregating populations, genetic maps, or reference sequences, possibly in any species. Later, NIKS along with high-resolution melting (HRM) was used to identify regulators of postharvest senescence in EMS-derived mutants of Arabidopsis (Hunter et al. 2018).
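The idea of comparing k-mers between two samples without a reference can be illustrated with a toy example. This is a simplified sketch and not the NIKS implementation: the reads are hypothetical and error-free, and real data additionally require handling of sequencing errors, coverage, and reverse-complement strands.

```python
def kmers(reads, k):
    """Collect the set of all k-mers present in a list of reads."""
    found = set()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            found.add(read[i:i + k])
    return found


def sample_specific_kmers(mutant_reads, wild_reads, k):
    """Return (k-mers only in the mutant, k-mers only in the wild type)."""
    mutant = kmers(mutant_reads, k)
    wild = kmers(wild_reads, k)
    return mutant - wild, wild - mutant


# Hypothetical reads differing by a single C>T substitution.
wild_reads = ["ACGTTGCAAGGCT"]
mutant_reads = ["ACGTTGTAAGGCT"]

mutant_only, wild_only = sample_specific_kmers(mutant_reads, wild_reads, k=5)
print(sorted(mutant_only))
print(sorted(wild_only))
```

A single substitution covered by enough flanking sequence gives up to k sample-specific k-mers on each side of the comparison; overlapping these k-mers recovers the local sequence around the mutation in each sample.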

A targeting-induced local lesions in genomes (TILLING) population was developed by irradiating rice seeds with X-rays, and the wild type and the early-maturing mutant were subjected to whole-genome resequencing (WGRS) to identify the SNPs. The expression of at least 202 structurally altered genes was changed in the mutant, and functional enrichment analysis of these genes revealed that their molecular functions were related to flower development (Hwang et al. 2014).

Genome-wide survey of artificial mutations induced by ethyl methanesulfonate and gamma rays in tomato indicated that C/G to T/A transitions were predominant in the EMS mutants, while C/G to T/A transitions, A/T to T/A transversions, A/T to G/C transitions, and deletion mutations were equally common in the gamma ray mutants. More than 90% of the mutations were located in intergenic regions, and only 0.2% were deleterious (Shirasawa et al. 2016).

8.3.4 Transcriptome and miRNA Profiling

RNA-Seq (RNA-sequencing) is a technique employed to examine the quantity and sequences of RNA in a sample using next-generation sequencing (NGS). It has become an indispensable tool for transcriptome-wide analysis of differential gene expression, differential splicing of mRNAs and RNA biology (single-cell gene expression, translation, and RNA structure) (Stark et al. 2019). RNA-Seq has been successfully used for characterizing the transcriptome of mutants in comparison to the wild types, thus identifying the underlying genes.

RNA-Seq was used to characterize a late leaf spot (LLS) disease susceptible EMS-derived mutant (M14) from a resistant genotype (Yuanza 9102) of peanut, and to identify the candidate genes under diseased conditions. A total of 2219 differentially expressed genes (DEGs), including 1317 up-regulated genes and 902 down-regulated genes, were detected. Pathogenesis-related (PR) protein genes were significantly up-regulated, while photosynthesis genes were down-regulated in M14. Moreover, up-regulated WRKY transcription factors and down-regulated plant hormones related to plant growth were detected in M14 (Han et al. 2017). RNA-Seq was also used to understand the molecular and genetic basis of plant height using a semi-dwarf peanut mutant and its wild line Fenghua 1 (FH1) at the mature stage. The DEGs were involved in hormone biosynthesis and signaling pathways, and in cell-wall synthetic and metabolic pathways (Guo et al. 2020a).
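Calling a gene up- or down-regulated in a mutant relative to its wild type usually starts from a log2 fold change of normalized expression values. The sketch below shows only that step; the gene names, expression values, pseudocount, and threshold are hypothetical, and actual studies rely on replicated samples and statistical testing that are not shown here.

```python
import math


def log2_fold_change(mutant, wild, pseudocount=1.0):
    """log2 ratio of mutant to wild type expression; the pseudocount avoids division by zero."""
    return math.log2((mutant + pseudocount) / (wild + pseudocount))


def classify(expression, threshold=1.0):
    """Label each gene 'up', 'down', or 'unchanged' in the mutant."""
    calls = {}
    for gene, (wild, mutant) in expression.items():
        lfc = log2_fold_change(mutant, wild)
        if lfc >= threshold:
            calls[gene] = "up"
        elif lfc <= -threshold:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls


# Hypothetical normalized expression values: gene -> (wild type, mutant).
expression = {
    "geneA": (7.0, 31.0),
    "geneB": (63.0, 15.0),
    "geneC": (20.0, 22.0),
}

calls = classify(expression)
print(calls)
```

Counting the "up" and "down" labels over all genes gives summary figures of the kind reported above.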

Transcriptome analysis and miRNA profile sequencing of seeds from the hydroxyproline (HYP)-tolerant mutant and its parent (Huayu 20) was conducted to elucidate the molecular basis of higher grain size and oil content. Major transcription factors linked to seed development and/or oil biosynthesis were differentially expressed between the genotypes. Moreover, differentially expressed genes related to seed development or oil biosynthesis were also identified. Differentially expressed miRNAs (116) and their target genes playing important role in seed development were identified (Sui et al. 2019).

8.3.5 Transposable Element Markers

Various types of transposable elements (TEs) together make up a large portion (50–70%) of the plant genome. Broadly, two types of TEs have been identified based on their transposition intermediates. Class I TEs transpose through RNA, while class II TEs do so through DNA. This difference results in an increase in the copy number of class I TEs within the genome. Structurally, class I TEs (specifically called retrotransposons) can either have direct long terminal repeats (LTRs) (as in LTR retrotransposons) or be free from LTRs (as in non-LTR retrotransposons). Class II transposons generally have short terminal inverted repeats (TIRs). Transposons have allowed the development of TE markers (Izsvak et al. 1999; Casa et al. 2000; Kumar and Hirochika 2001) because of their ubiquitous and wide distribution in genomes and the high polymorphism at their insertion sites. Also, transposons have particular genomic and transcriptional patterns, thereby differentially influencing the genome and transcriptome (Marcon et al. 2015) and shaping the organization, stability (Chen and Ni 2006; Lu et al. 2012), and evolution of the genome (Casacuberta and Santiago 2003). Since a large number of mutations and genetic instabilities involve TE polymorphisms, the development of TE markers (Flavell et al. 1998; Bhat et al. 2019b; Venkatesh and Nandini 2020) to detect such polymorphisms is very important.

8.3.6 Retrotransposon Markers

Several types of retrotransposon markers have been developed. They are sequence-specific amplification polymorphism (S-SAP) (Waugh et al. 1997), inter-retrotransposon amplified polymorphism (IRAP), retrotransposon-microsatellite amplified polymorphism (REMAP) (Kalendar et al. 1999), and inter-primer binding site (iPBS) markers (Kalendar et al. 2010). Recently, a high-throughput sequencing platform was developed for efficient screening of LTR retrotransposon families that show high levels of insertion polymorphism among closely related cultivars (Monden et al. 2014a, d). This approach was tested in the strawberry genome to identify 24 LTR retrotransposon families. Among them, several families were experimentally confirmed to have high levels of insertion polymorphism among closely related cultivars. Additionally, a large number of insertion sites for retrotransposon families that showed diverse insertion patterns were identified using high-throughput sequencing (Yamane et al. 2012; Monden et al. 2014b, c, e). A marker system based on TaqMan quantitative PCR (qPCR) combined with retrotransposon-based insertion polymorphism (RBIP) was developed to estimate the dosage of an LTR retrotransposon (scIvana) in sugarcane (Metcalfe et al. 2015). Apart from LTR retrotransposons, short interspersed nuclear elements (SINEs), a type of non-LTR retrotransposon, were used to develop the Inter-SINE Amplified Polymorphism (ISAP) marker, which was employed for genotype-specific high-resolution fingerprinting in potato (Wenke et al. 2015). Suppression PCR was used for the whole-genome experimental identification of insertion/deletion polymorphisms of interspersed repeats (Mamedov et al. 2005). The same technique was applied for identifying genome-wide polymorphic insertion sites of PHARE1, a retrotransposon from the red bean (Vigna angularis (Willd.) Ohwi & H. Ohashi) (unpublished).
Since many of the spontaneous and induced mutants involve TE activities, the aforesaid markers can be employed for mutant characterization. In fact, IRAPs were developed to differentiate mutants from non-mutants in apple (Antonius-Klemola et al. 2006) and citrus (Du et al. 2018). In addition, IRAPs were also used in wheat (Belyayev et al. 2010; Nasri et al. 2013), flax (Smýkal et al. 2011), and Lallemantia iberica (Cheraghi et al. 2018), and REMAPs were generated in apple (Antonius-Klemola et al. 2006), barley (Kalendar et al. 1999, 2000), wheat (Nasri et al. 2013), Lallemantia iberica (Cheraghi et al. 2018), almond (Sorkheh et al. 2017), etc. S-SAP, which was developed through modification of the AFLP method, has been widely used in several plant species (Waugh et al. 1997; Syed et al. 2005; Lou and Chen 2007; Konovalov et al. 2010; Petit et al. 2010; Melnikova et al. 2012; Du et al. 2018).

iPBS was used in Tetradium ruticarpum (Xu et al. 2018) and many Fagaceae species (Coutinho et al. 2018). To screen diverse LTR retrotransposon families at a genome-wide scale, the iPBS method was employed together with sequencing on the Illumina HiSeq 2000 platform to obtain LTR sequences in citrus, apple, and soybean (unpublished data).
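
Because IRAP, REMAP, S-SAP, and iPBS profiles are usually scored as the presence (1) or absence (0) of bands, differentiating a mutant from its parent reduces to comparing two binary profiles. The following Python sketch illustrates this comparison under stated assumptions: the band names and scores are hypothetical, and `differing_bands` is an illustrative helper rather than a function from any published pipeline.

```python
from typing import Dict, List


def differing_bands(parent: Dict[str, int], mutant: Dict[str, int]) -> Dict[str, List[str]]:
    """Compare 0/1 band scores of a parent and its mutant.

    A band missing from one profile is treated as absent (0).
    Returns the bands gained and lost in the mutant, sorted by name.
    """
    gained = sorted(b for b, score in mutant.items()
                    if score == 1 and parent.get(b, 0) == 0)
    lost = sorted(b for b, score in parent.items()
                  if score == 1 and mutant.get(b, 0) == 0)
    return {"gained": gained, "lost": lost}


# Hypothetical band scores (marker system and fragment size in bp)
parent = {"IRAP_350": 1, "IRAP_520": 1, "IRAP_800": 0, "REMAP_410": 1}
mutant = {"IRAP_350": 1, "IRAP_520": 0, "IRAP_800": 1, "REMAP_410": 1}

result = differing_bands(parent, mutant)
print(result)  # {'gained': ['IRAP_800'], 'lost': ['IRAP_520']}
```

A gained band is consistent with a new insertion near a primer site and a lost band with the disruption of one, but either can also arise from other sequence changes, so such differences would normally be confirmed by sequencing the fragment.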

8.3.7 DNA Transposon Markers

Among the DNA transposons, miniature inverted-repeat transposable elements (MITEs) are the most abundant, with 1000–15,000 copies per haploid genome. Compared with the non-autonomous elements of class II TEs, MITEs are relatively small (100–500 bp) and have a preference for insertion into 2–3 bp targets that are rich in A and T residues (Casa et al. 2000). Various families of MITEs have been described (Zhang et al. 2000). The Heartbreaker (Hbr) family was the first to be considered for marker development (Casa et al. 2000) after it was identified (Johal and Briggs 1992) and isolated from maize (Zhang et al. 2000). Subsequently, the diversity and dynamics of DcMaster-like elements of the PIF/Harbinger superfamily were studied in carrot (Grzebelus and Simon 2009) and Medicago truncatula (Grzebelus et al. 2007b). Vulmar/VulMITE elements were identified in the Chenopodiaceae subfamily Betoideae (Grzebelus et al. 2011). The genomic distribution of MITEs in barley was determined by MITE-AFLP mapping (Takahashi et al. 2006), and novel MITEs were isolated and analyzed for marker utility (Lyons et al. 2008). TEs were also identified using whole-genome sequencing in banana (Hřibová et al. 2010), peanut (Bertioli et al. 2016), wheat (Wanjugi et al. 2009), foxtail millet (Yadav et al. 2015), etc.

Extensive efforts have been made in peanut to identify (Patel et al. 2004) and develop Arachis hypogaea miniature inverted-repeat transposable element 1 (AhMITE1) markers (Bhat et al. 2008; Gowda et al. 2010, 2011; Shirasawa et al. 2012a, b; Gayathri et al. 2018). Before the genome sequence of peanut (or its progenitors) became available, Shirasawa et al. (2012a) followed a modified method of Nunome et al. (2006), which is generally used to develop SSR markers, to find the polymorphic sites of AhMITE1.

Transposon display (TD) was also attempted for TEs other than MITEs. TD has been reported for dTph1 (Ac/Ds family) in Petunia hybrida (Van den Broeck et al. 1998), DcMaster (PIF/Harbinger-like) in carrot (Grzebelus et al. 2007a), Rim2/Hipa (CACTA family) in Oryza spp. (Kwon et al. 2005), nDart (hAT family) in rice (Takagi et al. 2007), etc.

Alternatively, transposon sites in genomes can be identified from the whole-genome sequence reads obtained by next-generation sequencing technology. Many tools are available for TE discovery and annotation (Goerner-Potvin and Bourque 2018). In peanut, with the availability of the genome sequences of the diploid progenitors (Bertioli et al. 2016), efforts were made to identify the genome-wide distribution of AhMITE1 (Gayathri et al. 2018). For this, a set of 33 diverse genotypes, including the genetically unstable peanut mutants that show hyperactivity of AhMITE1 (Hake et al. 2018), was used to discover the AhMITE1 insertion polymorphic sites (AIPs). Whole-genome re-sequencing (WGRS) reads from these diverse genotypes were analyzed using the computational method polymorphic TEs and their movement detection (PTEMD) (Kang et al. 2016) for the de novo discovery of AIPs. This tool was applied to the peanut lines (Gayathri et al. 2018) to find hundreds of AhMITE1 copies across the diploid peanut genomes and to develop AhMITE1 markers. These AhTE markers showed up to 20% polymorphism. This effort demonstrated the utility of mutants as a source of TE polymorphism for developing TE markers. The AhTE markers were used to find the differences in AhMITE1 insertion polymorphisms between TMV 2 and its EMS-derived mutant TMV 2-NLM and to map important taxonomic and productivity traits in peanut (Hake et al. 2017b; Jadhav et al. 2021). AhTE markers were also employed to identify marker-trait associations for disease resistance (Kolekar et al. 2016; Shirasawa et al. 2018), quality traits (Hake et al. 2017a), and nutritional traits (Nayak et al. 2020) in peanut. Checking background genome recovery in marker-assisted backcross breeding is an important activity to ensure that the promising backcross line resembles the recurrent parent at all genomic regions except the region being transferred. AhTE markers have been successfully used for assessing the background genome recovery in peanut (Yeri and Bhat 2016; Kolekar et al. 2017).
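
Background genome recovery is typically summarized as the percentage of background marker loci carrying the recurrent parent allele. The sketch below uses one commonly applied convention, in which a heterozygous locus counts as half recovered, so that recovery (%) = (A + 0.5 × H) / N × 100. This formula, the genotype codes ('A', 'H', 'B', '-'), and the function name are assumptions made for illustration and are not necessarily the procedure followed in the studies cited above.

```python
from typing import Iterable


def recurrent_parent_genome_recovery(calls: Iterable[str]) -> float:
    """Percentage recovery of the recurrent parent genome.

    Assumed genotype codes per polymorphic background marker:
      'A' = homozygous for the recurrent parent allele
      'H' = heterozygous
      'B' = homozygous for the donor parent allele
    Any other code (e.g. '-') is treated as missing and ignored.
    """
    counts = {"A": 0, "H": 0, "B": 0}
    for call in calls:
        if call in counts:
            counts[call] += 1
    scored = sum(counts.values())
    if scored == 0:
        raise ValueError("no scorable marker calls")
    # Heterozygous loci carry one recurrent parent allele, hence the 0.5 weight
    return 100.0 * (counts["A"] + 0.5 * counts["H"]) / scored


# Hypothetical backcross line scored with 21 background markers (one missing)
calls = ["A"] * 16 + ["H"] * 3 + ["B"] * 1 + ["-"]
rpg = recurrent_parent_genome_recovery(calls)
print(f"{rpg:.1f}%")  # 87.5%
```

Only markers that are polymorphic between the recurrent and donor parents are informative here, which is why a marker system with a reasonable level of polymorphism is useful for this purpose.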

8.3.8 Markers to Detect Epimutations

With the growing understanding of epimutations, techniques to detect DNA methylation and histone modifications are being developed (Albertini and Marconi 2013; Ghosh et al. 2021). The methods that rely on comprehensive knowledge of an organism’s genome sequence are bisulfite modification and chromatin immunoprecipitation (ChIP). Methylation-sensitive amplification polymorphism (MSAP) is a modified amplified fragment length polymorphism (AFLP) technique that does not require genome sequencing. High-performance separation techniques such as high-performance capillary electrophoresis (HPCE) and high-performance liquid chromatography (HPLC) can also be used for detecting cytosine methylation (Ghosh et al. 2021). The metAFLP approach, which allows complex variation to be partitioned into sequence changes, de novo methylation, and demethylation, was proposed and employed among regenerants derived from tissue culture in triticale (Machczyńska et al. 2014). A computational method, AlphaBeta, was demonstrated for estimating the rates of stochastic epimutation events (methylation gains and losses) using pedigree-based DNA methylation data as input. It can be used to study transgenerationally heritable epimutations in clonal or sexually derived mutation accumulation lines, as well as somatic epimutations in long-lived perennials (Shahryary et al. 2020).
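
MSAP is generally performed with the isoschizomers HpaII and MspI, which both recognize CCGG but differ in their sensitivity to cytosine methylation, so the pair of band scores at a locus indicates its methylation status. The sketch below encodes one commonly used scoring convention and flags loci whose status differs between a parent and a mutant. The locus names, band scores, and function names are hypothetical, and the interpretation table is a simplification that individual studies may refine.

```python
from typing import Dict, List, Tuple

# Key: (band in EcoRI/HpaII digest, band in EcoRI/MspI digest), 1 = present
MSAP_PATTERNS: Dict[Tuple[int, int], str] = {
    (1, 1): "unmethylated",
    (1, 0): "hemimethylated external cytosine",
    (0, 1): "methylated internal cytosine",
    (0, 0): "uninformative (hypermethylation or site absent)",
}


def classify_msap(hpaii: int, mspi: int) -> str:
    """Return the methylation class for one locus from its two band scores."""
    return MSAP_PATTERNS[(hpaii, mspi)]


def changed_epiloci(parent: Dict[str, Tuple[int, int]],
                    mutant: Dict[str, Tuple[int, int]]) -> List[str]:
    """List loci, scored in both genotypes, whose methylation class differs."""
    shared = sorted(set(parent) & set(mutant))
    return [locus for locus in shared
            if classify_msap(*parent[locus]) != classify_msap(*mutant[locus])]


# Hypothetical MSAP scores as (HpaII, MspI) per locus
parent = {"L1": (1, 1), "L2": (0, 1), "L3": (1, 0)}
mutant = {"L1": (0, 1), "L2": (0, 1), "L3": (1, 0)}

changed = changed_epiloci(parent, mutant)
print(changed)                       # ['L1']
print(classify_msap(*mutant["L1"]))  # methylated internal cytosine
```

The (0, 0) class cannot distinguish heavy methylation from a sequence change that removes the restriction site, which is the ambiguity that approaches such as metAFLP are intended to resolve.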

The initial method for the identification and quantification of histone post-translational modifications depended on mass spectrometry (Freitas et al. 2004). Later, techniques such as DNase-seq, which is based on the nuclease DNase I, and ATAC-seq, which is based on the transposase Tn5, came to be widely used to identify genomic regions associated with open chromatin. These techniques have played a key role in dissecting the regulatory networks of gene expression in both animal and plant species. Zhao et al. (2020) developed a technique named MNase hypersensitivity sequencing (MH-seq) to identify genomic regions associated with open chromatin in Arabidopsis thaliana. Genomic regions enriched with MH-seq reads were referred to as MNase hypersensitive sites (MHSs), and these MHSs overlapped with the majority (~90%) of the open chromatin identified previously by DNase-seq and ATAC-seq. Further, 22% of the MHSs were not covered by DNase-seq or ATAC-seq reads, and these were referred to as “specific MHSs” (sMHSs).
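
Comparisons of this kind rest on interval overlap between sets of genomic regions. The following sketch shows, on toy data, how the fraction of known open-chromatin regions overlapped by MHSs and the fraction of MHSs specific to MH-seq might be computed. The coordinates are invented, the function names are illustrative, and working at the level of called regions rather than read coverage is a simplifying assumption that does not reproduce the analysis of Zhao et al. (2020).

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals on the same chromosome share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_overlapping(query: List[Interval], reference: List[Interval]) -> float:
    """Fraction of query intervals that overlap any reference interval."""
    if not query:
        raise ValueError("query interval list is empty")
    hits = sum(1 for q in query if any(overlaps(q, r) for r in reference))
    return hits / len(query)


# Toy regions (hypothetical coordinates)
mhs = [("Chr1", 100, 200), ("Chr1", 500, 600), ("Chr2", 50, 150), ("Chr2", 900, 1000)]
dnase = [("Chr1", 150, 250), ("Chr2", 100, 120)]
atac = [("Chr1", 550, 700)]
known_open = dnase + atac

covered = fraction_overlapping(known_open, mhs)        # known sites found by MH-seq
specific = 1.0 - fraction_overlapping(mhs, known_open)  # MHSs unique to MH-seq
print(covered, specific)  # 1.0 0.25
```

The nested comparison is quadratic in the number of regions, which is acceptable for a sketch. Genome-scale data would normally be handled with sorted intervals or a dedicated interval tool.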

8.4 Opportunities to Develop New Marker Systems

Various marker systems are in demand for their applications in crop improvement. The development of new marker systems requires the identification of genome-wide DNA features. Like SNPs and TE insertion polymorphisms, DNA copy number variation (CNV) is an important source of genetic variation, although it was recognized only relatively recently (Sebat et al. 2004). Nearly 12% of the human genome is covered by 1447 CNV regions (Redon et al. 2006). A method called CNV-Seq was proposed to detect copy number variation using high-throughput sequencing (Xie and Tammi 2009). Genome-wide analysis of mutant and wild-type genotypes identified CNVs in wheat (Diaz et al. 2012; Nilsen et al. 2020), soybean (Bolon et al. 2014; Lemay et al. 2019), maize (Jamann et al. 2014), banana (Datta et al. 2018), etc. Yin et al. (2020) observed structural variations (SVs, such as CNVs and presence/absence variations, PAVs) among the resistance gene analogs (RGAs) in peanut. Differential expression between the resistant and the susceptible genotypes was more pronounced among the RGAs with SVs than among those without SVs. Thus, the availability of SV-based markers in the future may help identify candidate genes in addition to characterizing mutants.
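
Read-depth methods for CNV detection compare the number of reads mapped to each genomic window in a test genome with that in a reference genome and express the difference as a log2 ratio. The sketch below illustrates the idea for a mutant and its wild type. It is a simplified illustration and not the CNV-Seq implementation: the read counts are invented, and the pseudocount and the ±0.5 calling threshold are arbitrary assumptions that stand in for the statistical testing a real tool would perform.

```python
import math
from typing import List


def log2_ratios(mutant_counts: List[int], wildtype_counts: List[int],
                pseudocount: float = 1.0) -> List[float]:
    """Per-window log2(mutant / wild type) after library-size normalization."""
    if len(mutant_counts) != len(wildtype_counts):
        raise ValueError("both genomes must use the same windows")
    m_total, w_total = sum(mutant_counts), sum(wildtype_counts)
    if m_total == 0 or w_total == 0:
        raise ValueError("each genome needs at least one mapped read")
    ratios = []
    for m, w in zip(mutant_counts, wildtype_counts):
        # The pseudocount avoids division by zero in windows without reads
        m_norm = (m + pseudocount) / m_total
        w_norm = (w + pseudocount) / w_total
        ratios.append(math.log2(m_norm / w_norm))
    return ratios


def call_cnv(ratios: List[float], threshold: float = 0.5) -> List[str]:
    """Label each window as a gain, a loss, or neutral (assumed threshold)."""
    return ["gain" if r >= threshold else "loss" if r <= -threshold else "neutral"
            for r in ratios]


# Hypothetical read counts in five consecutive windows of one chromosome
mutant = [100, 100, 200, 50, 100]
wildtype = [100, 100, 100, 100, 100]

ratios = log2_ratios(mutant, wildtype)
calls = call_cnv(ratios)
print(calls)  # ['neutral', 'neutral', 'gain', 'loss', 'neutral']
```

Plotting such log2 ratios against chromosome position gives the kind of genome-wide view commonly used to display CNVs between a mutant and its wild type.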

In peanut, we analyzed the publicly available WGRS data of 231 genotypes, including a few mutants, to identify SNPs, AhMITE1 insertion sites, copy number variations, and DNA methylation sites among the mutant and non-mutant genotypes and to assess their scope for developing new marker systems (Table 8.1; unpublished data). CNV analysis with a mean window size of 7672 bases identified a large number of CNVs in the mutants, although the non-mutant genotypes showed more CNVs across all the chromosomes. Presence and absence variations (PAVs) were also observed at a few loci. Efforts are currently in progress to use CNVs and PAVs for marker development by designing primers flanking the DNA sequences that show CNV/PAV. The peanut mutants also showed considerably large numbers of SNPs, AhMITE1 insertion sites, and DNA methylation sites, although the numbers were higher in the non-mutant genotypes (Table 8.1), indicating the possibility of developing markers to detect SNPs, AhMITE1 insertions, and DNA methylation for various applications (Fig. 8.2).

Table 8.1 Single nucleotide polymorphism, copy number variation, AhMITE1 insertion polymorphism, and DNA methylation sites among the peanut genotypes (Hake et al. 2018; Bhat et al. 2019a)
Fig. 8.2
A colored scatter plot shows the logarithm of ratio base 2 versus chromosome, where the different colors are plotted with numbers 0, 1, 2, 3, 4, and 5.

Copy number variation between the mutant and the wild types of peanuts (Hake et al. 2018)

8.5 Conclusions and Prospects

Markers ranging from morphological and biochemical markers to DNA markers have been successfully used for mutant characterization and for the mapping and identification of mutant loci in various plant systems. A considerably large number of genes have been identified from these mutants for various uses. Other categories of mutants obtained by DNA tagging, gene editing, gene silencing, etc. have also contributed extensively to gene discovery. Building on this progress, efforts are underway to use other genomic features (CNVs, PAVs) for marker development. DNA, RNA, and methylome sequencing are being applied in various crops to detect mutations and epimutations. In the future, mutant characterization at the global levels of the genome, transcriptome, and methylome could greatly enhance the efficiency of gene identification for better crop productivity.