Introduction

The analysis of genetic variation has been an integral part of plant genetics, breeding, and ecology. Detecting genetic variation is a critical step in the analysis of polymorphisms, which were described as early as the Biblical record of Jacob’s lamb polymorphisms. Gregor Mendel drew inferences of genetic inheritance by analyzing seven polymorphic morphological traits in pea breeding experiments. However, the number of discernable morphological traits is not extensive, and environmental effects on morphological variation can restrict the utilization of morphological traits in modern genetic analyses, which require a large number of unbiased genetic markers for various purposes in crop and animal breeding, germplasm management, conservation ecology, and population genetics. Genetic markers are associated non-randomly with variation in natural or segregating populations. With the advancement of molecular genetic methodologies, various protein and DNA markers associated with genetic variation have been developed. Mutations are the primary sources of diversity and, thus, evolution. Transposable elements (TEs), common denizens of eukaryotic genomes, account for a large portion of the genetic variation in plants and animals. This review summarizes the molecular markers, TEs, and TE-derived molecular markers used in genetic analyses.

The development of molecular markers

The detection of variation by markers based on protein polymorphisms was adopted in the 1960s (Hubby and Lewontin 1966). The easy scoring of allozyme variation by electrophoresis supplemented the use of morphological markers during the 1970s and 1980s (Ganapathy and Scandalios 1973; Tanksley 1983). However, the limited number of detectable isozymes, tissue specificity, and environmental effects on isozyme expression were drawbacks to the use of isozymes for large-scale genetic analyses (Tanksley 1983). A method of detecting DNA polymorphism in lieu of using isozyme variation was developed in 1980 (Botstein et al. 1980). Restriction fragment length polymorphism (RFLP) assays can detect restriction-site variation in genome sequences by hybridizing molecular probes in a Southern blot analysis (Botstein et al. 1980). Because molecular probes can be developed from different sources of cDNAs and genomic clones, RFLP promised to detect a large amount of molecular variation (Botstein et al. 1980; Tanksley et al. 1989; Prince and Tanksley 1992). RFLP markers were popular during the 1980s and 1990s for genetic mapping, tagging agronomic traits, and demographic analyses of plant populations (reviews available in Tanksley et al. 1989; Bachmann 1994, references therein). Regardless of the advantages, the technical difficulty and requirement for large amounts of relatively pure genomic DNAs limited the routine utilization of RFLPs in practical applications and fieldwork.

The advent of polymerase chain reaction (PCR) made diverse PCR-based marker systems possible. The first PCR-based marker system was randomly amplified polymorphic DNA (RAPD) (Williams et al. 1990). The RAPD technique is very simple, employing a short arbitrary primer (usually a decamer) in a PCR, but it has the major drawback of low reproducibility due to the low annealing temperatures used in the PCR. In the mid-1990s, amplified fragment length polymorphism (AFLP) was developed to detect restriction-site variation by PCR (Vos et al. 1995). It employs restriction digestion with two different restriction enzymes and the ligation of adaptors to the restriction sites for PCR amplification, which allows complex mixtures of fragments to be amplified. AFLP combines the reproducibility of RFLP with the simple PCR of RAPD and can detect many restriction-site variations with simple PCRs. Both RAPD and AFLP are dominant markers and do not require sequence information for designing the primers. The simple sequence repeat polymorphism (SSRP) (or microsatellite polymorphism) system was developed in the mid-1990s. It employs flanking primers to amplify simple sequence repeats (SSRs), which are highly abundant in eukaryotic genomes. It is the preferred molecular marker system in plant genetics and ecology research due to the high abundance, high reproducibility, and high allelic variation of the SSRPs as well as the simplicity of the technique (Ellegren 2004, references therein). However, SSRPs require sequence information to design the flanking primers that amplify the SSR motifs (Tautz and Renz 1984). Thus, although the SSRP system has many merits over other marker systems, its applicability to minor or neglected crops or plants is limited due to the paucity of DNA sequence information (Park et al. 2009).

Advances in genome sequencing techniques have enabled many innovative techniques for genetic marker development. The large amount of accurate sequences has allowed the identification of single nucleotide polymorphisms (SNPs) in many plant species (Ganal et al. 2009; Varshney et al. 2009). Array technologies have allowed the massive, genome-wide detection of SNPs and the analysis of genetic diversity in plants (Borevitz et al. 2003; Singer et al. 2006; Kumar et al. 2007). Diversity array technique (DArT) was a similar technique using arrays, but offered a low-cost high throughput analysis without sequence information in genome-wide genotyping (Jaccoud et al. 2001). Because plant genomes carry abundant repetitive sequences, and they carry scattered regions that were duplicated in ancient tetraploids, the SNP mining of genome sequences in plants is challenging (Mammadov et al. 2012). Transcriptome sequencing can identify SNPs in an inexpensive way with a low frequency of spurious false-positive or false-negative SNPs (Chagné et al. 2008). Sequencing the gene-enriched library derived from methylation-sensitive restriction fragments also minimizes repetitive DNA sequencing to improve SNP identification (Gore et al. 2009).

Transposable elements

TEs are mobile genetic elements that are found with high copy numbers in almost all eukaryotes (Wicker et al. 2007; Schulman and Wicker 2013). The transposition of TEs can generate genome plasticity by inducing various chromosomal mutations and allelic diversity (Fedoroff 2013; Oliver and Greene 2009; Oliver et al. 2013). Based on the transposition mechanisms, TEs are conventionally classified into two classes: class I TEs and class II TEs (Finnegan 1989). Class I TEs are retrotransposons that move in a semi-conservative “copy-and-paste” manner via RNA intermediates. Class II TEs are DNA transposons that transpose via DNA intermediates.

The original copy of a class I retrotransposon is transcribed into mRNA that is then reverse transcribed by the retrotransposon-encoded reverse transcriptase. The resulting DNA copy is then inserted into a new location while the original copy is retained at the original position, which results in genome expansion (Fedoroff and Bennetzen 2013; Lee and Kim 2014). The class I retrotransposons are subdivided into two subfamilies: long terminal repeat (LTR) retrotransposons and non-LTR retrotransposons. Upon integration, the LTR retrotransposons often produce target site duplications (TSDs) of 4–6 bp. The LTR retrotransposons are more prevalent in plant genomes and carry two open reading frames (ORFs), GAG and POL (Voytas and Boeke 2002; Levin 2002). GAG encodes a protein for replication, and POL encodes a polyprotein comprising protease (PR), integrase (INT), reverse transcriptase (RT), and RNase H (RH). Depending on the order of the genes encoded, the LTR retrotransposons are further classified into Ty1-copia and Ty3-gypsy retrotransposons (Fig. 1). The gene order of Ty1-copia retrotransposons is PR-INT-RT-RH, whereas that of Ty3-gypsy retrotransposons is PR-RT-RH-INT. Although several types of non-LTR retrotransposons have been reported, long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs) are the typical non-LTR retrotransposons. LINE-1, a typical LINE in mammals, carries two ORFs: a short ORF encodes a nucleic acid binding protein, and a long ORF encodes a protein with endonuclease and reverse transcriptase activities (Moran and Gilbert 2002). LINEs are usually 4–6 kb in size and are mostly found in mammals; an exception is Del-2, which is highly abundant in the monocotyledonous Lilium species (Leeton and Smyth 1993). SINEs are small in size (80–150 bp) and highly abundant in animal genomes. The Alu element, the best-known SINE, is present in up to 500,000 copies in the human genome (Rowold and Herrera 2000). The SINEs are elusive in their origin but are presumed to be derived from various polymerase III transcripts, including 7SL or any of several tRNA genes (Sakamoto and Okada 1985).
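The gene-order criterion for the two LTR superfamilies can be illustrated with a short Python sketch. It assumes that the POL domains have already been identified and labelled in 5′ to 3′ order; the function name and the labels are hypothetical, and the rule used here (integrase upstream of reverse transcriptase in Ty1-copia, downstream in Ty3-gypsy) is only the simplest reading of the gene orders given above, not a full annotation procedure.

```python
def classify_ltr_superfamily(domains):
    """Classify an LTR retrotransposon from its POL domain order (5' to 3').

    `domains` is a list of labels such as ["PR", "INT", "RT", "RH"].
    Assumption: INT upstream of RT indicates Ty1-copia; INT downstream
    of RT indicates Ty3-gypsy. Anything else is left unclassified.
    """
    labels = [d.upper() for d in domains]
    if "INT" not in labels or "RT" not in labels:
        return "unclassified"
    if labels.index("INT") < labels.index("RT"):
        return "Ty1-copia"
    return "Ty3-gypsy"


if __name__ == "__main__":
    print(classify_ltr_superfamily(["PR", "INT", "RT", "RH"]))  # Ty1-copia
    print(classify_ltr_superfamily(["PR", "RT", "RH", "INT"]))  # Ty3-gypsy
```

In practice the domain labels would come from protein-domain searches against the element sequence, which this sketch does not attempt.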

Fig. 1

Structure of the Ty1-copia and Ty3-gypsy retrotransposons. Both LTR retrotransposons have long terminal repeats at both ends. The gene order in the POL region differs between the Ty1-copia and Ty3-gypsy elements. PBS primer binding site, PPT polypurine tract, RH RNase H, INT integrase, RT reverse transcriptase

Class II TEs are DNA transposons that transpose by single-stranded or double-stranded DNA intermediates (Craig et al. 2002). The class II TEs are subdivided into subclass 1, the terminal inverted repeat (TIR) transposons, and subclass 2, the non-TIR transposons (Wicker et al. 2007; Schulman and Wicker 2013). TIR transposons transpose by a conservative “cut-and-paste” mechanism. The TIRs form a fold-back structure, which is excised by a transposase encoded by the transposon itself. The excised double-stranded DNA reinserts elsewhere in the genome. Based on the TIR sequences and the TSD sizes, the TIR transposons are further classified into nine superfamilies. Six superfamilies (Tc1-Mariner, hAT, Mutator, P, PIF-Harbinger, and CACTA) are found in plants. The TIR transposons increase their copy number by transposition from replicated chromatids to unreplicated chromosomes during DNA replication. Alternatively, gap repair of the donor site after excision can also increase the copy number of TIR transposons. However, the TIR transposons do not attain copy numbers as high as those of the LTR retrotransposons. Miniature inverted-repeat transposable elements (MITEs) are TIR transposons (Bureau and Wessler 1992). MITEs are small in size, do not carry an ORF, and are present in very high copy numbers in plant genomes. A careful comparison between the MITEs and the other TIR transposons revealed that Stowaway MITEs are derived from the Tc1-Mariner superfamily, and Tourist MITEs are derived from the PIF-Harbinger superfamily (Feschotte et al. 2002a, b). The Helitron superfamily of subclass 2 non-TIR DNA transposons is present in plants (Wicker et al. 2007). Unlike the double-stranded cleavage used for integration by the subclass 1 DNA transposons, the subclass 2 non-TIR DNA transposons undergo transposition without double-stranded breakage. The Helitron elements appear to replicate via a rolling-circle mechanism involving the displacement of only one DNA strand during transposition, and they do not produce TSDs (Kapitonov and Jurka 2001). Helitrons are present in high copy numbers and capture pseudogene fragments in maize, which contributes to the high genomic diversity of maize (Du et al. 2009).
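The defining feature of a TIR transposon, namely that one terminus is the reverse complement of the other, can be checked with a few lines of Python. This is a minimal sketch under the simplifying assumption of perfect TIRs of a known length; real elements often carry imperfect repeats, and the function names are illustrative rather than taken from any published tool.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_terminal_inverted_repeats(element, tir_length):
    """True if the last `tir_length` bases are the reverse complement of the first.

    Assumption: perfect TIRs; mismatches are not tolerated in this sketch.
    """
    element = element.upper()
    if tir_length <= 0 or len(element) < 2 * tir_length:
        return False
    left = element[:tir_length]
    right = element[-tir_length:]
    return right == reverse_complement(left)


if __name__ == "__main__":
    left_tir = "CACTACAAGAAAA"  # made-up 13-bp terminus for illustration
    element = left_tir + "GGATCCGGATCC" + reverse_complement(left_tir)
    print(has_terminal_inverted_repeats(element, 13))  # True
```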

Molecular markers derived from transposable elements

TEs have been utilized as genetic markers because of their genome-wide distributions (Kalendar et al. 2011; Poczai et al. 2013; Bonchev and Parisod 2013). TE integration into functional genes often results in null alleles, and the excision of TEs from such sites could restore allelic function. Variable phenotypes derived from TE mobilization have provided efficient genetic and molecular tools for gene discovery and isolation through forward and reverse genetic strategies (Bensen et al. 1995; Das and Martienssen 1995; Kumar and Hirochika 2001). The ubiquity, high copy numbers, and genome-wide distributions of TEs have made both the class I and the class II TEs useful as genome-wide genetic markers (Feschotte et al. 2002a, b; Huang et al. 2008; Zhuang et al. 2014). Transposon display (TD) is a modified AFLP technique used to detect TE insertion polymorphisms (Korswagen et al. 1996) (Fig. 2). While the AFLP protocol detects restriction-site variation, TD detects the presence/absence of TEs, an approach that has been utilized in various fields of genetics, plant breeding, and ecology (Le and Bureau 2004; Grzebelus 2006; Lockton et al. 2008; Kalendar et al. 2011). Comparisons between AFLP-derived and retrotransposon-derived markers showed that the retrotransposon-derived markers were more informative in many studies of genetic diversity and demography among plant species (Tam et al. 2005; Kalendar et al. 2011). In our unpublished data from 23 different ecotypes of Arabidopsis thaliana, the observed heterozygosity and genetic diversity measured by class II transposon CACTA markers were higher than those measured by AFLP (Fig. 3). Table 1 compares the isozyme, mutation-based, array-based, and TE-based molecular marker systems.

Fig. 2

Transposon display (TD). Genomic DNA is restriction-digested with MseI, which recognizes TTAA and leaves a TA overhang at the cutting site. An adaptor for the restriction site is then ligated. Pre-amplification is carried out with a primer (MseI + 0; no selective nucleotides) complementary to the adaptor and a primer (P1) complementary to an internal site of the transposon. Then, selective amplification is carried out with an MseI adaptor primer with selective nucleotides (MseI + 3N) and a primer (P2) complementary to an internal sequence between the terminal inverted repeat and the P1 site. In the diagram, the filled arrowheads at each end are the terminal inverted repeats (TIRs). The logic of the sequence-specific amplified polymorphism (SSAP) is the same as that of the TD except that SSAP utilizes LTR retrotransposons instead of DNA transposons
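The first step of the TD protocol, the MseI digestion, can be simulated in silico. The sketch below assumes the standard MseI cut between the first and second bases of TTAA (T^TAA), which is what produces the TA overhang; it works on the top strand only, and the function names are hypothetical.

```python
def msei_cut_positions(seq):
    """Return top-strand cut positions for MseI (T^TAA) in `seq`."""
    seq = seq.upper()
    positions = []
    start = seq.find("TTAA")
    while start != -1:
        positions.append(start + 1)  # the cut falls after the first T
        start = seq.find("TTAA", start + 1)
    return positions


def msei_fragments(seq):
    """Split `seq` into the top-strand fragments produced by MseI digestion."""
    seq = seq.upper()
    cuts = [0] + msei_cut_positions(seq) + [len(seq)]
    return [seq[a:b] for a, b in zip(cuts, cuts[1:])]


if __name__ == "__main__":
    print(msei_fragments("GGTTAACCCTTAAGG"))
    # ['GGT', 'TAACCCT', 'TAAGG']
```

Only fragments that also contain a transposon terminus would be amplified in the subsequent TD steps; selecting those fragments is outside the scope of this sketch.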

Fig. 3

Fingerprinting profiles of ten ecotypes of Arabidopsis thaliana by AFLP (a) and CACTA-TD (b). The profiles reveal that transposition-based variation is higher than restriction-site variation

Table 1 Comparison of different marker systems

Class I retrotransposon-derived molecular markers

Class I retrotransposons are highly abundant and widely distributed in eukaryotes, particularly in plant genomes. The retrotransposons transpose to new sites after transcription so that the original copy remains in its site in the genome (Finnegan 1989). This unidirectional transposition of the class I retrotransposons can provide highly informative utilities in phylogenetic analyses (Vitte et al. 2004; Grzebelus 2006; Jing et al. 2010). LTR retrotransposons are more prevalent than non-LTR retrotransposons in plants. Their chromosomal locations are either clustered in pericentric or intercalary heterochromatic regions or dispersed throughout the genome, which makes them suitable for developing PCR-based markers. Primers are usually designed from the LTRs near the insertion sites. The LTR subdomain sequences are conserved within retrotransposon families but are different between families, so many insertion polymorphisms within a retrotransposon family can be retrieved by a single PCR amplification using a primer that is complementary to the LTR. The LTR retrotransposon-derived markers are all dominant markers, so they cannot distinguish between insertion homozygotes and insertion heterozygotes. Although numerous retrotransposon-based marker systems were contrived, only the four most frequently used systems are covered in this review (Fig. 4).
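Because dominant markers score only band presence or absence, allele frequencies have to be inferred rather than counted. One common workaround, shown here as a hedged Python sketch, is to assume Hardy-Weinberg equilibrium in a diploid population, so that the frequency of band-absent individuals estimates the square of the empty-site allele frequency. This assumption is ours, not a method stated in the reviewed studies, and it is questionable for predominantly selfing species; the function name is illustrative.

```python
import math


def dominant_marker_diversity(band_scores):
    """Estimate allele frequencies and expected heterozygosity at one dominant locus.

    `band_scores` holds 1 (band present) or 0 (band absent) per individual.
    Assumption: diploid population in Hardy-Weinberg equilibrium, so the
    fraction of band-absent individuals equals q**2.
    Returns (p, q, expected_heterozygosity).
    """
    scores = list(band_scores)
    if not scores:
        raise ValueError("no individuals scored")
    absent_fraction = scores.count(0) / len(scores)
    q = math.sqrt(absent_fraction)  # empty-site allele frequency
    p = 1.0 - q                     # insertion allele frequency
    return p, q, 2.0 * p * q


if __name__ == "__main__":
    # 75 individuals with the band, 25 without
    print(dominant_marker_diversity([1] * 75 + [0] * 25))  # (0.5, 0.5, 0.5)
```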

Fig. 4

LTR retrotransposon-based marker systems. a IRAP (inter-retrotransposon amplified polymorphism). IRAP amplifies the region between LTR retrotransposons using outward-facing primers that are complementary to the 3′ end of the LTRs. LTRs are either head-to-tail (top) or head-to-head (bottom). If the adjacent LTR retrotransposons are the same element, a single primer can amplify the spacer. If they are different elements, two different primers are needed for amplification. b REMAP (retrotransposon-microsatellite amplified polymorphism). REMAP amplifies the region between an LTR retrotransposon and an adjacent microsatellite. PCR is carried out with a primer complementary to the 5′ or 3′ end of the LTR and a primer with a simple sequence motif plus selective nucleotides [e.g., (CGG)5G, (ATT)5C]. For multicopy elements in complex genomes, selective nucleotides can be added to the LTR primer to reduce the number of amplified fragments. c RBIP (retrotransposon-based insertion polymorphism). Unlike other retrotransposon-based markers, RBIP detects the presence/absence of a retrotransposon at a single locus. It utilizes three primers for each locus. Two primers (P1 and P3) are flanking primers facing inward toward the locus. Another primer (P2) is an outward-facing primer from the transposon. If an LTR retrotransposon is present at the locus, the P1 + P2 primer set can amplify the target site, but the P1 + P3 primer set does not yield a product because the distance between the two primers is beyond the range of PCR amplification. If the locus is empty, P1 + P2 do not yield amplification, but P1 + P3 do

Inter-retrotransposon amplified polymorphism (IRAP)

The IRAP system amplifies the sequences between two adjacent LTR retrotransposons by utilizing primers that are complementary to the 3′ end of the LTR sequence (Kalendar et al. 1999). It can amplify the spacer sequences either between LTR retrotransposons of the same lineage by a single primer or between LTR retrotransposons of different lineages by a set of primers derived from each lineage of LTR sequences. Because LTRs are direct repeats, the primers facing outward from the 3′ end of the right LTR can also prime the 3′ end of the left LTR, which amplifies the interior sequences of the LTR retrotransposon. The interior amplification can be avoided by designing the primers to overlap bases at the 3′ end that do not match the LTR-interior junction (Kalendar et al. 2011). The LTR sequences between adjacent retrotransposons are arranged (a) head-to-head, (b) tail-to-tail, or (c) head-to-tail (Poczai et al. 2013). If the arrangement between two identical tandem duplicate LTR retrotransposons is either head-to-head or tail-to-tail, a single primer can amplify the spacer. If the adjacent retrotransposons are from different lineages (which is usually the case), two different primers derived from each LTR sequence are needed to amplify the IRAP. Each IRAP reaction produces multiple amplicons ranging in size from 300 to 3,000 bp (Branco et al. 2007; Fan et al. 2014). A major advantage of the IRAP technique is its experimental simplicity, because all that is needed is a simple PCR and subsequent electrophoresis. Vukich et al. (2009) analyzed the genetic variability among Helianthus species using IRAP and detected a species-specific insertion of a Copia-like element, Helicopia, and distinct fingerprints distinguishing the annual and perennial Helianthus species. More recently, Fan et al. (2014) utilized the IRAP technique to dissect the genetic diversity of the masson pine (Pinus massoniana), which revealed very high genetic diversity in that gymnosperm species. They detected the expression of reverse transcriptases from both the Copia-type and the Gypsy-type retrotransposons after exposing the plants to various hormones and environmental stresses, but no changes were detected in the IRAP fingerprinting among the masson pine specimens.

Retrotransposon-microsatellite amplified polymorphism (REMAP)

The REMAP protocol is similar to that of IRAP. REMAP utilizes microsatellites (or SSRs) in conjunction with LTR-specific primers in PCRs (Kalendar et al. 1999; Kalendar and Schulman 2006). Microsatellites are short tandem repeats that are highly abundant and polymorphic in eukaryotic genomes (Temnykh et al. 2001; Park et al. 2009). The REMAP PCR uses primers for microsatellite loci containing the repeat motif plus an additional anchoring nucleotide at the 3′ end [e.g., (CA)nA, (GC)nC] to avoid slippage of the primer between the individual SSR motifs. REMAP detects multiple loci whose amplicons are similar in number and size distribution to those detected by IRAP. Both IRAP and REMAP reveal very broad polymorphic profiles among different genotypes within species as well as between species within a genus. Kalendar et al. (1999) analyzed the genomic distributions of the BARE-1 retrotransposon in 15 barley varieties (Hordeum vulgare L.) using the IRAP and REMAP protocols, which allowed a clear distinction of the varieties. Moreover, the same primer combinations were easily transferable for the fingerprinting of related species within the genus. Similarly, Branco et al. (2007) demonstrated the amenability of both IRAP and REMAP to the assessment of genetic similarity among rice varieties and used them to differentiate Brazilian and Japanese rice varieties.
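The anchored microsatellite primers described above, such as (CA)nA, are simple to generate programmatically. The following Python sketch builds such a primer and rejects anchors that would merely continue the repeat, since those would not prevent slippage; the function name and the validation rule are illustrative assumptions, not part of any published REMAP protocol.

```python
def anchored_ssr_primer(motif, repeats, anchor):
    """Build an anchored SSR primer, e.g. ("CA", 8, "A") for (CA)8A.

    Raises ValueError if the anchor would simply extend the repeat,
    because such an anchor cannot fix the primer at the repeat boundary.
    """
    motif, anchor = motif.upper(), anchor.upper()
    if repeats < 1 or not motif or not anchor:
        raise ValueError("motif, anchor, and a positive repeat count are required")
    if set(motif + anchor) - set("ACGT"):
        raise ValueError("only A, C, G, and T are allowed")
    if (motif * len(anchor)).startswith(anchor):
        raise ValueError("anchor continues the repeat and cannot prevent slippage")
    return motif * repeats + anchor


if __name__ == "__main__":
    print(anchored_ssr_primer("CA", 3, "A"))   # CACACAA
    print(anchored_ssr_primer("CGG", 5, "G"))  # CGGCGGCGGCGGCGGG
```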

Sequence-specific amplified polymorphism (SSAP)

SSAP is a modified AFLP protocol. While AFLP requires no a priori sequence information and employs two restriction enzymes and adaptor systems to amplify the restriction-digested fragments (Vos et al. 1995), SSAP requires a priori transposon sequence information (Syed and Flavell 2007). In SSAP, genomic DNA is digested with a single restriction enzyme, and adaptors are ligated to the restricted ends. Then, a primer complementary to the adaptor and a primer complementary to the 3′ end of the LTR sequence are employed in the PCR. If the elements are low in copy number, the first amplification may yield discernible bands in denaturing polyacrylamide gel electrophoresis. If the first amplification produces too many bands to read, a second amplification with selective nucleotides at the ends of both primers can reduce the number of bands (Poczai et al. 2013). The SSAP protocol is basically the same as the TD employed for the amplification of the class II DNA transposons (Fig. 2). In the case of the class II transposons, the TD protocols were named for the particular transposon class that was incorporated (e.g., MITE-TD, CACTA-TD). SSAP usually produces a large number of amplicons that represent the TE insertion sites, but mutations at the restriction sites may also yield polymorphisms in SSAP assays (Petit et al. 2010). The retrotransposon insertion sites identified by SSAP are highly reproducible and produce multiple fragments that cover the whole genome, which are suitable for examining the LTR retrotransposon insertions at a specific level. Moisy et al. (2008) surveyed 10 Ty1-copia-like retrotransposon families in the Vitis genus and showed that most of the scorable bands were polymorphic and that only a few insertion sites were fixed in the accessions surveyed. Genomic shocks, including biotic or abiotic stresses, and polyploidization can induce the mobilization of both class I and class II TEs (McClintock 1984; reference). Woodrow et al. (2010) demonstrated that the retrotransposition of Ty1-copia-like elements plays important roles in defense responses to environmental stresses in tetraploid wheat. Allopolyploidy is a major driving force in plant evolution, inducing rapid structural changes in hybrid genomes; TEs are major components of those changes (Liu and Wendel 2000; Kashkush et al. 2002; Josefsson et al. 2006; Parisod and Senerchia 2012). Petit et al. (2010) analyzed retrotransposon mobilization in synthetic allotetraploid tobacco by SSAP analysis of a Copia-type retrotransposon, Tnt1. The maternal Tnt1 was transmitted to progenies, whereas the paternal Tnt1 was lost completely. Because TE mobilization can induce genome instability and gene disruption, the activities of TEs are under the control of host epigenetic mechanisms (Martienssen and Chandler 2013). Parisod et al. (2009) used a methylation-sensitive SSAP technique to show rapid epigenetic reorganization near retrotransposons in hybrid and allopolyploid Spartina genomes. Moreover, their study also revealed that genome alteration appeared preferentially in the maternal subgenome, and the environment of TEs was specifically affected by large maternal-specific methylation changes.

Retrotransposon-based insertional polymorphism (RBIP)

RBIP utilizes a primer designed on the basis of the LTR sequence and another primer from the genomic sequence near the LTR sequence (Paux et al. 2010). It detects polymorphism for the integration of an element at a particular locus to supply accurate DNA profiles. Retrotransposon insertions usually span several kilobase pairs. PCR amplification with a primer pair flanking the insertion sequences will produce an amplicon if the site is empty. However, if insertion occurred, the PCR amplification may yield a very long amplicon or not yield any amplicon because the insertion is too long to amplify. The incorporation of a primer within the inserted sequence will ensure amplification in the latter case. This three-primer system produces co-dominant RBIP markers, which provide extremely useful phylogenetic information because the retrotransposon insertions are irreversible. Using this three-primer system, Vitte et al. (2004) demonstrated that two distinct types of Asian rice varieties, Indica and Japonica, originated from two independent domestication events in Asia. RBIP is a very valuable resource for the protection of breeders’ rights, because it provides accurate, cultivar-specific DNA markers of insertion events that occur during cultivar development. In the hopes of developing markers for marker-assisted breeding in pears (Pyrus pyrifolia Nakai), Kim et al. (2012) generated 22 RBIP markers that are able to distinguish 61 of the 64 Japanese pear cultivars.
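The co-dominant scoring logic of the three-primer RBIP system can be summarized in a short Python sketch. It takes two observations per individual: whether the occupied-site product (one flanking primer plus the element-internal primer) was obtained, and whether the empty-site product (the two flanking primers) was obtained. The function name and the genotype labels are hypothetical.

```python
def score_rbip(occupied_product, empty_product):
    """Call a diploid RBIP genotype from the two possible PCR products.

    occupied_product: True if flanking + element primer gave an amplicon.
    empty_product:    True if the two flanking primers gave an amplicon.
    """
    if occupied_product and empty_product:
        return "heterozygote"
    if occupied_product:
        return "insertion homozygote"
    if empty_product:
        return "empty-site homozygote"
    return "no call"  # failed PCR or a rearranged locus


if __name__ == "__main__":
    print(score_rbip(True, True))   # heterozygote
    print(score_rbip(False, True))  # empty-site homozygote
```

A "no call" result is kept separate from either homozygote, since the absence of both products more likely reflects a failed reaction than a genotype.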

Class II DNA transposon-derived markers

Cut-and-paste transposition does not leave the original copy behind after transposition, so those transposons do not usually reach very high copy numbers in the genome, unlike the class I retrotransposons. Nevertheless, some class II DNA transposons (e.g., MITE, CACTA) are present in the genome with copy numbers from several thousands to hundreds of thousands (Bureau and Wessler 1992, 1994; Kunze and Weil 2002; Park and Kim 2012). This review introduces MITE and CACTA transposons as molecular markers for phylogenetic and genetic analyses using TD. TD is a technique that uses sequence-tagged site (STS) markers (Thomas and Scott 1993; Talbert et al. 1994; Sanchez et al. 1999). Korswagen et al. (1996) developed a TD protocol by modifying the STS technique to utilize a primer that is complementary to an internal sequence of the Tc1 transposon and another primer that flanks the transposon. Van den Broeck et al. (1998) further refined the TD protocol for use with multicopy elements. Figure 3 illustrates the CACTA-TD, which was employed in an analysis of an Arabidopsis thaliana population (Park et al. 2014).

The MITEs are an exceptional class II DNA transposon family. MITEs are small (< 500 bp) and carry no obvious ORFs that would allow them to mobilize themselves (Bureau and Wessler 1992, 1994; Feschotte et al. 2002a, b). However, they are present in several hundreds of thousands of copies in their host genomes, which distinguishes them among the class II DNA transposons. The extremely abundant copies make the MITEs useful as molecular markers. Park et al. (2003a) isolated an MITE subfamily, Pangrangja, from some Gramineae species. MITE-TD using the terminal inverted repeat (TIR) of the Pangrangja element revealed that the Pangrangja elements are common in species across the genus Oryza. Moreover, the Pangrangja MITE-TD profiles matched the geographical distribution of Oryza species in Asia, Africa, and Australia (Park et al. 2003b, c). Subsequent analysis of segregating populations showed that the Pangrangja MITEs are distributed evenly along all the chromosomes in rice (Kwon et al. 2006b) and maize (Lee et al. 2004). Because MITE sequences are short, PCR amplification of MITE insertion polymorphisms was also developed in rice. Monden et al. (2009) analyzed segregating lines of rice derived from a japonica × japonica cross by MITE insertion polymorphisms. They mined the mPing MITE from the rice genome database and designed primer pairs flanking the MITE loci. Of the 183 MITE loci analyzed, 150 loci showed polymorphisms between two japonica rice lines and were used to construct a recombinant genetic map for the analysis of quantitative trait loci. MITE-TD is also a good resource for detecting the transpositional activity of transposons. Jiang et al. (2003) used MITE-TD analysis to demonstrate that the transposition of mPing MITEs could be induced in tissue culture. mPing is a non-autonomous element and can be mobilized in the presence of the autonomous element Pong by genomic shocks including anther culture (Kikuchi et al. 2003). The new insertion bands were isolated from a gel and sequenced after TD gel display, which revealed that the mPing insertions mostly occurred in low-copy regions of the rice genome.

CACTA is a prototype class II DNA transposon superfamily that was discovered in maize by classical genetics as the Enhancer (En) (Peterson 1953) and Suppressor-mutator (Spm) (McClintock 1954) elements. The TIRs of CACTA are short, ranging from 13 to 20 bp, and they have distinctive 5′-CACTA-3′ sequences in their termini (Kunze and Weil 2002). The CACTA superfamily is present in high copy numbers in plants and can be classified into several subfamilies. Each subfamily has sequence conservation of about 20–30 bp at its termini (Wicker et al. 2003). Kwon et al. (2005) utilized the consensus sequence in the TIR of the Rim2/Hippa CACTA subfamily for a CACTA-TD analysis in Oryza species. The phylogenetic dendrogram derived from the Rim2/Hippa CACTA-TD was congruent with the geographic distribution of the A genome diploid Oryza species (Kwon et al. 2005, 2006a). The Rim2/Hippa CACTA-TD profiles were highly reproducible and applicable to the construction of a high-density genetic map of Oryza sativa L. (Kwon et al. 2006b). CACTA elements are also highly abundant in Arabidopsis and Brassica species (Zhang and Wessler 2004). Park and Kim (2012) isolated all copies of the CACTA-like elements from the genome of A. thaliana and located them near the centromeric regions of all five chromosomes. Because the CACTA-TD profiles were revealed to be highly variable among different ecotypes of A. thaliana, the relationship between CACTA transposition and ecotype differentiation was analyzed by joint analyses of CACTA-TD and conventional AFLP among ecotypes. Although the transposition might not directly cause ecotype differentiation, the authors proposed that the insertion or excision of TEs into or out of critically functional genes could still be directly or indirectly involved in ecotype differentiation. In crop breeding, cultivar fingerprinting is an important technique for protecting breeders’ rights. CACTA-TD was also successfully applied in rapeseed. Lee et al. (2012) mined CACTA elements from the Brassica sequence database and designed a CACTA-TD experiment with 10 commercial rapeseed cultivars. The polymorphic CACTA-TD fragments were isolated from gels, and the sequences were determined to generate sequence-characterized amplified region (SCAR) markers, which led to the development of six CACTA-TD-derived transposon insertion SCAR (Ti-SCAR) markers for rapeseed.
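TD fingerprints such as those used for ecotype or cultivar comparison are typically scored as presence/absence vectors, one entry per band position. A simple way to compare two such profiles is the Jaccard coefficient, sketched below in Python. The choice of coefficient is our assumption for illustration; the studies cited above are not stated here to have used it, and the function name is hypothetical.

```python
def jaccard_similarity(profile_a, profile_b):
    """Jaccard similarity between two band profiles scored as 0/1 lists.

    Shared absences are ignored: only band positions present in at
    least one profile contribute to the comparison.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must be scored at the same band positions")
    pairs = list(zip(profile_a, profile_b))
    shared = sum(1 for a, b in pairs if a and b)
    either = sum(1 for a, b in pairs if a or b)
    return shared / either if either else 0.0  # no bands to compare


if __name__ == "__main__":
    cultivar_1 = [1, 1, 0, 1, 0]
    cultivar_2 = [1, 0, 0, 1, 1]
    print(jaccard_similarity(cultivar_1, cultivar_2))  # 0.5
```

A matrix of such pairwise similarities could then be passed to a clustering method to draw a dendrogram, which this sketch leaves out.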

Conclusion

Molecular markers are indispensable tools in modern and agricultural genetics (Schulman 2007; Henry 2013; Jiang 2013). Although several marker systems have been developed for plant genotyping, and abundantly available genome sequences promise to provide powerful molecular markers, none of the marker systems is a “jack-of-all-trades” for all concerns in plant genome analyses. RFLP, RAPD, and AFLP are based on sequence variation, and SSRP detects variation in the repeat numbers of simple sequence motifs. Transposon-based marker systems are based on the presence/absence of transposons at genetic loci. Genome projects for many eukaryotic organisms have shown that transposons are at the center of the dynamic evolution of many genomes. Next-generation sequencing techniques are generating sequence information at a speed never before seen (Varshney et al. 2009; Kelly and Leitch 2011; Edwards 2013). Moreover, computational biology allows us to mine genome-wide transposons and analyze their phylogenetic relationships (Xing et al. 2013). Thus, the marker systems that harness transposons are providing a new avenue, in addition to the other marker systems using sequence variations or simple sequence motif variations, for the dissection of genomes in the genomic era.