Introduction

The concept of genetic markers is not a new one; Gregor Mendel used phenotype-based genetic markers in his experiment in the nineteenth century. Later, phenotype-based genetic markers for Drosophila led to the establishment of the theory of genetic linkage. The limitations of phenotype based genetic markers led to the development of more general and useful direct DNA based markers that became known as molecular markers. A molecular marker is defined as a particular segment of DNA that is representative of the differences at the genome level. Molecular markers may or may not correlate with phenotypic expression of a trait. Molecular markers offer numerous advantages over conventional phenotype based alternatives as they are stable and detectable in all tissues regardless of growth, differentiation, development, or defense status of the cell are not confounded by the environment, pleiotropic and epistatic effects.

An ideal molecular marker technique should have the following criteria: (1) be polymorphic and evenly distributed throughout the genome, (2) provide adequate resolution of genetic differences (3) generate multiple, independent and reliable markers (4) simple, quick and inexpensive (5) need small amounts of tissue and DNA samples; (6) have linkage to distinct phenotypes and (7) require no prior information about the genome of an organism. Unfortunately no molecular marker technique is ideal for every situation. Techniques differ from each other with respect to important features such as genomic abundance, level of polymorphism detected, locus specificity, reproducibility, technical requirements and cost (Table 1). Depending on the need, modifications in the techniques have been made, leading to a second generation of advanced molecular markers. In this review we have evaluated the recent advances made in molecular marker techniques and their application in plant sciences.

Table 1 Comparison of various aspects of frequently used molecular marker techniques

Molecular marker techniques

The publication of Botstein et al. (1980) about the construction of genetic maps using restriction fragment length polymorphism (RFLP) was the first reported molecular marker technique in the detection of DNA polymorphism.

Basic molecular marker techniques

Basic marker techniques can be classified into two categories: (1) non-PCR-based techniques or hybridization based techniques and, (2) PCR-based techniques.

Non-PCR-based techniques

Restriction fragment length polymorphism (RFLP)

In RFLP, DNA polymorphism is detected by hybridizing a chemically labelled DNA probe to a Southern blot of DNA digested by restriction endonucleases, resulting in differential DNA fragment profile. This differential profile is generated due to nucleotide substitutions or DNA rearrangements like insertion or deletion or single nucleotide polymorphisms. The RFLP markers are relatively highly polymorphic, codominantly inherited and highly reproducible. Because of their presence throughout the plant genome, high heritability and locus specificity the RFLP markers are considered superior. The method also provides opportunity to simultaneously screen numerous samples. DNA blots can be analyzed repeatedly by stripping and reprobing (usually eight to ten times) with different RFLP probes. The technique is not very widely used because it is time consuming, involves expensive and radioactive/toxic reagents and requires large quantity of high quality genomic DNA. The requirement of prior sequence information for probe generation increases the complexity of the methodology. These limitations led to the conceptualization of a new set of less technically complex methods known as PCR-based techniques.

PCR-based techniques

After the invention of polymerase chain reaction (PCR) technology (Mullis and Faloona 1987), a large number of approaches for generation of molecular markers based on PCR were detailed, primarily due to its apparent simplicity and high probability of success. Usage of random primers overcame the limitation of prior sequence knowledge for PCR analysis and facilitated the development of genetic markers for a variety of purposes. PCR-based techniques can further be subdivided into two subcategories: (1) arbitrarily primed PCR-based techniques or sequence non-specific techniques and (2) sequence targeted PCR-based techniques.

Arbitrarily primed PCR-based markers

Random amplified polymorphic DNA (RAPD)

The basis of RAPD technique is differential PCR amplification of genomic DNA. It deduces DNA polymorphisms produced by “rearrangements or deletions at or between oligonucleotide primer binding sites in the genome” using short random oligonucleotide sequences (mostly ten bases long) (Williams et al. 1991). As the approach requires no prior knowledge of the genome that is being analyzed, it can be employed across species using universal primers. The major drawback of the method is that the profiling is dependent on the reaction conditions so may vary within two different laboratories and as several discrete loci in the genome are amplified by each primer, profiles are not able to distinguish heterozygous from homozygous individuals (Bardakci 2001). Due to the speed and efficiency of RAPD analysis, high-density genetic mapping in many plant species such as alfalfa (Kiss et al. 1993), faba bean (Torress et al. 1993) and apple (Hemmat et al. 1994) was developed in a relatively short time. The RAPD analysis of NILs (non-isogenic lines) has been successful in identifying markers linked to disease resistance genes in tomato (Lycopersicon sp.) (Martin et al. 1991), lettuce (Lactuca sp.) (Paran et al. 1991) and common bean (Phaseolus vulgaris) (Adam-Blondon et al. 1994).

Arbitrarily primed polymerase chain reaction (AP-PCR) and DNA amplification fingerprinting (DAF) techniques are independently developed methodologies, which are variants of RAPD. For AP-PCR (Welsh and McClelland 1990) a single primer (about 10–15 nucleotides long) is used. The technique involves amplification for initial two PCR cycles at low stringency. Thereafter the remaining cycles are carried out at higher stringency by increasing the annealing temperature). This variant of RAPD was not very popular as it involved autoradiography but it has been simplified as fragments can now be fractionated using agarose gel electrophoresis. The DAF technique involves usage of single arbitrary primers shorter than ten nucleotides for amplification (Caetano-Anolles and Bassam 1993) and the amplicons are analysed using polyacrylamide gel along with silver staining.

Amplified fragment length polymorphism (AFLP)

To overcome the limitation of reproducibility associated with RAPD, AFLP technology (Vos et al. 1995) was developed. It combines the power of RFLP with the flexibility of PCR-based technology by ligating primer-recognition sequences (adaptors) to the restricted DNA and selective PCR amplification of restriction fragments using a limited set of primers (Fig. 1a). The primer pairs used for AFLP usually produce 50–100 bands per assay. Number of amplicons per AFLP assay is a function of the number selective nucleotides in the AFLP primer combination, the selective nucleotide motif, GC content, and physical genome size and complexity. The AFLP technique generates fingerprints of any DNA regardless of its source, and without any prior knowledge of DNA sequence. Most AFLP fragments correspond to unique positions on the genome and hence can be exploited as landmarks in genetic and physical mapping. The technique can be used to distinguish closely related individuals at the sub-species level (Althoff et al. 2007) and can also map genes. Applications for AFLP in plant mapping include establishing linkage groups in crosses, saturating regions with markers for gene landing efforts (Yin et al. 1999) and assessing the degree of relatedness or variability among cultivars (Mian et al. 2002). For high-throughput screening approach, fluorescence tagged primers are also used for AFLP analysis. The amplified fragments are detected on denaturing polyacrylamide gels using an automated ALF DNA sequencer with the fragment option (Huang and Sun 1999).

Fig. 1
figure 1

a Schematic representation of AFLP technique [Genomic DNA was restricted simultaneously by rare cutting and frequent cutting enzymes which results in fragments with overhanging ends. The restricted fragments were ligated to end specific adaptors and using primers with random ends (2–3 nucleotides) a set of fragments was amplified and analysed by either agarose and polyacrylamide electrophoresis].b Schematic description of microsatellite-based marker technique (Microsatellites comprise of tandem repeats of 1–5 nucleotide length. The number of repeats is highly variable among individuals. Primers specific to the regions flanking the microsatellites are designed and used to amplify the microsatellite region. Due to the hypervariability of the repeats the amplicons will differ in length and can be analysed by polyacrylamide gel electrophoresis)

Sequence specific PCR based markers

With the advent of high-throughput sequencing technology, abundant information on DNA sequences for the genomes of many plant species has been generated (Goff et al. 2002; The Arabidopsis Genome Initiative 2000; Yu et al. 2002). ESTs of many crop species have been generated and thousands of sequences have been annotated as putative functional genes using powerful bioinformatics tools. In order to correlate DNA sequence information with particular phenotypes, sequence-specific molecular marker techniques have been designed.

Microsatellite-based marker technique

Microsatellite or short tandem repeats or simple sequences repeats are monotonous repetitions of very short (one to five) nucleotide motifs, which occur as interspersed repetitive elements in all eukaryotic genomes (Tautz and Renz 1984). Variation in the number of tandemly repeated units is mainly due to strand slippage during DNA replication where the repeats allow matching via excision or addition of repeats (Schlotterer and Tautz 1992). As slippage in replication is more likely than point mutations, microsatellite loci tend to be hypervariable. Microsatellite assays show extensive inter-individual length polymorphisms during PCR analysis of unique loci using discriminatory primers sets (Fig. 1b).

The PCR amplification protocols used for microsatellites employ loci-specific either unlabelled primer pairs or primer pairs with one radiolabelled or fluorolabelled primer. Analysis of unlabelled PCR products is carried out using polyacrylamide or agarose gels. The employment of fluorescent labelled microsatellite primers and laser detection (e.g., automated sequencer) in genotyping procedures has significantly improved the throughput and automatisation (Wenz et al. 1998). However, due to the high price of the fluorescent label, which must be carried by one of the primers in the primer pair, the assay becomes costly. Schuelke (2000) introduced a novel procedure in which three primers are used for the amplification of a defined microsatellite locus: a sequence-specific forward primer with M13(–21) tail at its 5′ end, a sequence-specific reverse primer and the universal fluorescent-labelled M13(–21) primer which proved simple and less expensive. Microsatellites are highly popular genetic markers because of their codominant inheritance, high abundance, enormous extent of allelic diversity, and the ease of assessing SSR size variation by PCR with pairs of flanking primers. The reproducibility of microsatellites is such that, they can be used efficiently by different research laboratories to produce consistent data (Saghai Maroof et al. 1994). Locus-specific microsatellite-based markers have been reported from many plant species such as lettuce (Lactuca sativa L.) (van de Wiel et al. 1999), barley (Hordeum vulgare L.) (Saghai Maroof et al. 1994) and rice (Oryza sativa L.) (Wu and Tanksley 1993).

Single nucleotide polymorphism (SNPs)

Single nucleotide variations in genome sequence of individuals of a population are known as SNPs. They constitute the most abundant molecular markers in the genome and are widely distributed throughout genomes although their occurrence and distribution varies among species. Maize has 1 SNP per 60–120 bp (Ching et al. 2002), while humans have an estimated 1 SNP per 1,000 bp (Sachidanandam et al. 2001). The SNPs are usually more prevalent in the non-coding regions of the genome. Within the coding regions, an SNP is either non-synonymous and results in an amino acid sequence change (Sunyaev et al. 1999), or it is synonymous and does not alter the amino acid sequence. Synonymous changes can modify mRNA splicing, resulting in phenotypic differences (Richard and Beckman 1995). Improvements in sequencing technology and availability of an increasing number of EST sequences have made direct analysis of genetic variation at the DNA sequence level possible (Buetow et al. 1999; Soleimani et al. 2003). Majority of SNP genotyping assays are based on one or two of the following molecular mechanisms: allele specific hybridization, primer extension, oligonucleotide ligation and invasive cleavage (Sobrino et al. 2005). High throughput genotyping methods, including DNA chips, allele-specific PCR and primer extension approaches make single nucleotide polymorphisms (SNPs) especially attractive as genetic markers. They are suitable for automation and are used for a range of purposes, including rapid identification of crop cultivars and construction of ultra high-density genetic maps.

Advances in molecular marker techniques

The technical advancements and genome based discoveries has lead to the enhancement of molecular marker techniques. These advanced molecular marker techniques are an amalgamation of the advantageous characteristics of several basic techniques as well as incorporation of modifications in the methodology to increase the sensitivity and resolution to detect genetic discontinuity and distinctiveness.

Organelle microsatellites

Plant organelle genomes such as chloroplast DNA and mitochondrial DNA have been increasingly applied to study population genetic structure and phylogenetic relationships in plants. Due to their uniparental mode of transmission, chloroplast and mitochondrial genomes exhibit different patterns of genetic differentiation compared to nuclear alleles (Provan et al. 1999a, b). Thus, for a comprehensive understanding of plant population differentiation and evolution, three interrelated genomes must be considered, so in addition to nuclear microsatellites, marker techniques based on the chloroplast and mitochondrial microsatellites have also been developed.

Chloroplast microsatellites

The analysis of the chloroplast organelle provides information on the population dynamics of plants that is complementary to those obtained from the nuclear genome. Numerous studies have shown that chloroplast microsatellites consisting of relatively short and several mononucleotide stretches such as (dA)n × (dT)n (Powell et al. 1995) are ubiquitous and polymorphic components of chloroplast DNA. Chloroplast genome based markers uncover genetic discontinuities and distinctiveness among or between taxa with slight morphological differentiation, which sometimes cannot be revealed by nuclear DNA markers as interbreeding and genetic exchange has obscured the evidence of past demographic patterns (Wolfe et al. 1987). The conservation and homology of sequence in chloroplast genome makes it possible to compare genes across the plant kingdom and examine phylogenetic relationships in taxa that have diverged for hundreds of thousands to millions of years (Provan et al. 1999a, b). Chloroplast microsatellites are now becoming firmly established as a high-resolution tool for examining patterns of cytoplasmic variation in a wide range of plant species (Provan et al. 2001). Chloroplast microsatellites are particularly effective markers for studying mating systems, gene flow via both pollen and seeds, and uniparental lineage. Chloroplast microsatellite-based markers have been used for the detection of hybridization and introgression (Bucci et al. 1998), and the analysis of the genetic diversity (Clark et al. 2000) and phylogeography of plant populations (Parducci et al. 2001; Shaw et al. 2005). One limitation of the approach is the need of sequence data for primer construction. Primer sequences flanking chloroplast microsatellites are usually inferred from fully or partially sequenced chloroplast genomes. In general, these primer pairs produce polymorphic PCR fragments from the species of origin and their close relatives, but transportability to more distant taxa is limited. Attempts to design universal primers to amplify chloroplast microsatellites have resulted in a set of consensus chloroplast microsatellite primers (ccmp1–ccmp10) that aims at amplifying cpSSR regions in the chloroplast genome of dicotyledonous angiosperms (Weising and Gardner 1999). Most of the primer pairs derived from A or T mononucleotide repeats (n = 10) identified in the tobacco chloroplast genome, were functional as genetic markers in the Actinidiaceae, Brassicaceae and Solanaceae (Chung and Staub 2003). Universal primers for the amplification of chloroplast microsatellites in grasses (Poaceae) have also been developed (Provan et al. 2004).

Mitochondrial microsatellites

In contrast to animal mtDNA, which typically has a size of 10 MDa (MDa) per mitochondrial genome, plant mtDNA is far more complex, e.g., the maize mitochondrial genome has been estimated to be 320 MDa (Sederoff et al. 1981). In addition to larger size, plant mtDNA is characterized by molecular heterogeneity observed as classes of circular chromosomes that vary in size and relative abundance. In plants, mitochondrial genomes are not usually used for phylogenetic analysis due to a high rate of sequence reorganization (Sederoff et al. 1981). However, mitochondrial haplotype diversity related to sequence rearrangement proved useful in population differentiation of pine and fir taxa (Soranzo et al. 1999; Sperisen et al. 2001). Mitochondrial repeats have been used for trait-based segregation of population (Rajendrakumar et al. 2007).

Sequence characterized amplified regions (SCAR)

In order to utilize markers identified by arbitrary marker analysis (RAPD, AFLP, etc.) for map-based cloning, a single locus must be identified unequivocally. In addition, the arbitrary marker techniques are sensitive to changes in the reaction conditions. In order to bridge the gap between the ability to obtain linked markers to a gene of interest in a short time and the use of these markers for map-based cloning approaches and for routine screening procedures, SCAR marker technique was developed and applied. The SCARs are PCR-based markers that represent genomic DNA fragments at genetically defined loci that are identified by PCR amplification using sequence specific oligonucleotide primers (Paran and Michelmore 1993; McDermott et al. 1994). Derivation of SCARs involves cloning the amplified products of arbitrary marker techniques and then sequencing the two ends of the cloned products. The sequence is thereafter used to design specific primer pairs of 15–30 bp which amplify single major bands of the size similar to that of cloned fragment. Polymorphism is either retained as the presence or absence of amplification of the band or can appear as length polymorphisms convert dominant arbitrary primed marker loci into codominant SCAR markers. As SCARs are primarily defined genetically, they can be used both as physical landmarks in the genome and as genetic markers. Codominant SCARs are more informative for genetic mapping than dominant arbitrary-primed molecular markers, as they can be used to screen pooled genomic libraries by PCR and for physical mapping (Chelkowski and Stephen 2001), defining locus specificity (Paran and Michelmore 1993) as well as comparative mapping (Guo et al. 2003) and homology studies among related plant species.

Cleaved amplified polymorphic sequences (CAPS)

The CAPS marker technique provides a way to utilize the DNA sequences of mapped RFLP markers to develop PCR based markers thereby eliminating the tedious DNA blotting (Konori and Nitta 2005). Therefore CAPS are also known as PCR-RFLP markers (Konieczny and Ausubel 1993). The CAPS deciphers the restriction fragment length polymorphisms caused by single base changes like SNPs, insertions/deletions, which modify restriction endonuclease recognition sites in PCR amplicons (Chelkowski and Stephen 2001; Konieczny and Ausubel 1993). The CAPS assays are performed, by digesting locus-specific PCR amplicons with one or more restriction enzyme, followed by separation of the digested DNA on agarose or polyacrylamide gels (Fig. 2a). The primers are synthesized based on the sequence information available in databank of genomic or cDNA sequences or cloned RAPD bands. The CAPS analysis is versatile and can be combined with single strand conformational polymorphism (SSCP), SCAR, AFLP or RAPD analysis to increase the possibility of finding DNA polymorphisms. The CAPS markers are co-dominant and locus specific and have been used to distinguish between plants that are homozygous or heterozygous for alleles (Konieczny and Ausubel 1993). Thus, CAPS proves useful for genotyping, positional or map based cloning and molecular identification studies (Spaniolas et al. 2006; Weiland and Yu 2003) where sequence-based identification is not feasible. The technique is limited by mutations, which create or disrupt a restriction enzyme recognition site. To overcome this limitation, Michaels and Amasino (1998) proposed a variant of the CAPS method called dCAPS (derived cleaved amplified polymorphic sequence). In dCAPS analysis, a restriction enzyme recognition site, which includes the SNP is introduced into the PCR product by a primer containing one or more mismatches to template DNA (Neff et al. 1998). The modified PCR product is then subjected to restriction enzyme digestion and the presence or absence of the SNP is determined by the resulting restriction pattern. The method is simple, relatively inexpensive, and utilizes the ubiquitous technologies of PCR, restriction digestion and agarose gel analysis. This technique proved useful for following known mutations in segregating populations and positional based cloning of new genes in plants (Haliassos et al. 1989).

Fig. 2
figure 2

a Schematic description of CAPS technique (To generate CAPS marker, the requisite DNA sequence is first amplified and then the amplicon is restricted by the enzyme. Based on the gel analysis the presence or absence of the restriction enzyme is elucidated. The presence or absence of the restriction helps to differentiate allelic differences). b Diagrammatic representation of SRAP analysis involves the amplification of ORFs by using two primers of arbitrary sequences (The annealing temperature in the initial five cycles is set at 35°C. The following 35 cycles are run at 50°C. The amplified DNA fragments are separated by denaturing acrylamide gel electrophoresis)

Randomly amplified microsatellite polymorphisms (RAMP)

Microsatellite-based markers show a high degree of allelic polymorphism but they are labor-intensive. On the other hand RAPD markers are inexpensive but exhibit a low degree of polymorphism. To compensate for the weaknesses of these two approaches, a technique termed as random amplified microsatellite polymorphisms (RAMP) was developed (Wu et al. 1994). The technique involves a radiolabeled primer consisting of a 5′ anchor and 3′ repeats which is used to amplify genomic DNA in the presence or absence of RAPD primers. The resulting products are resolved using denaturing polyacrylamide gels and as the repeat primer is labeled, the amplification products derived from the anchored primer are only detected. The melting temperatures of the anchored primers are usually 10–15°C higher than those of the RAPD primers thus at higher annealing temperature only the anchored primer would anneal efficiently, whereas in PCR cycles at low annealing temperature both anchored microsatellite and RAPD primers would anneal. So the PCR program was modified such that there is switching between high and low annealing temperatures during the reaction. Most fragments obtained with RAMP primers alone disappear when RAPD primers are included, and different patterns are obtained with the same RAMP primer and different RAPDs, indicating that RAPD primers compete with RAMP primer during the low annealing temperature cycle. RAMP has been employed in genetic diversity studies of the cultivars of barley (Wu et al. 1994; Sanchez de la Hoz et al. 1996) and peach (Cheng et al. 2001).

Sequence-related amplified polymorphism (SRAP)

The aim of SRAP technique (Li and Quiros 2001) is the amplification of open reading frames (ORFs). It is based on two-primer amplification. The technique uses primers of arbitrary sequence, which are 17–21 nucleotides in length. It uses pairs of primers with AT- or GC- rich cores to amplify intragenic fragments for polymorphism detection. The primers consist of the following elements:

  1. 1.

    Core sequences, which are 13–14 bases long, where the first 10 or 11 bases starting at the 5′-end, are sequences of no specific constitution (“filler” sequences), followed by the sequence CCGG in the forward primer and AATT in the reverse primer.

  2. 2.

    The core is followed by three selective nucleotides at the 3′-end. The filler sequences of the forward and reverse primers must be different from each other and can be 10 or 11 bases long.

For the first five cycles the annealing temperature is set at 35°C. The following 35 cycles are run at 50°C.The amplified DNA fragments are fractionated by denaturing acrylamide gels and detected by autoradiography (Fig. 2b).

Sequence-related amplified polymorphism (SRAP) combines simplicity, reliability, moderate throughput ratio and facile sequencing of selected bands (Li and Quiros 2001). SRAP targets coding sequences in the genome and results in a moderate number of co-dominant markers. Sequencing demonstrated that SRAP polymorphism results from two events, fragment size changes due to insertions and deletions, which could lead to codominant markers, and nucleotide changes leading to dominant markers. The SRAP marker system has been adapted for a variety of purposes in different crops, including map construction, gene tagging and genetic diversity studies (Gulsen et al. 2006).

Target region amplification polymorphism (TRAP)

The TRAP technique (Hu and Vick 2003) is a rapid and efficient PCR-based technique, which utilizes bioinformatics tools and expressed sequence tag (EST) database information to generate polymorphic markers, around targeted candidate gene sequences. The technique uses two primers (18 nucleotides in length) to generate markers. One of the primers, the fixed primer, is designed from the targeted EST sequence in the database; the second primer is an arbitrary primer with either an AT- or GC-rich core to anneal with an intron or exon. As the TRAP technique can be used to generate markers for specific gene sequences, it is useful for genotyping germplasm and generating markers associated with desirable agronomic traits in crop plants for marker-assisted breeding (Hu et al. 2005). The technique has been effectively used in fingerprinting lettuce (Lactuca sativa L.) cultivars (Hu et al. 2005), in estimating genetic diversity (Alwala 2006) and mapping QTL in a wheat (Triticum aestivum L.) intervarietal recombinant inbred population (Liu et al. 2005).

Single strand conformation polymorphism (SSCP)

Single strand conformation polymorphism is the mobility shift analysis of single-stranded DNA sequences on neutral polyacrylamide gel electrophoresis, to detect polymorphisms produced by differential folding of single-stranded DNA due to subtle differences in sequence (often a single base pair) (Orita et al. 1989). In the absence of a complementary strand, the single strand experiences intra strand base pairing, resulting in loops and folds, that gives it a unique 3D structure which can be considerably altered due to single base change resulting in differential mobility (Orita et al. 1989). The SSCP analysis proves to be a powerful tool for assessing the complexity of PCR products as the two DNA strands from the same PCR product (Hayashi 1992) often run separately on SSCP gels, thereby providing two opportunities to score a polymorphism and secondly, resolving internal sequence polymorphisms in some PCR products from identical places in the two parental genomes (Fig. 3a). The PCR-based SSCP analysis is a rapid, simple and sensitive technique for detection of various mutations, including single nucleotide substitutions, insertions and deletions, in PCR-amplified DNA fragments (Hayashi 1993). Thus, it is a powerful technique for gene analysis particularly for detection of point mutations (Fukuoka et al. 1994). The technique shares similarity to RFLPs as it can also decipher the allelic variants of inherited and genetic traits. However, unlike RFLP analysis, SSCP analysis can detect DNA polymorphisms and mutations at multiple places in DNA fragments. The SSCP gels have been used to increase throughput and reliability of scoring during mapping by PCR fingerprinting in plants (Li et al. 2005). Fluorescence-based PCR-SSCP (F-SSCP) is an adapted version of SSCP analysis involving amplification of the target sequence using fluorescent primers (Makino et al. 1992). The major disadvantage of the technique is that the development of SSCP markers is labor intensive and costly and cannot be automated.

Fig. 3
figure 3

a SSCP analysis is based on the mobility shift of the single stranded DNA which is due to nucleotide changes (For SSCP analysis, the amplified target sequence is denatured and analysed on a native polyacrylamide gel). b IRAP markers are generated by the proximity of two LTRs using outward-facing primers annealing to LTR target sequences. c REMAP technique relies on amplification using one outward-facing LTR primer and a second primer from a microsatellite

Transposable elements-based molecular markers

Transposons are mobile genetic elements capable of changing their location in the genome. They were discovered almost 60 years ago in maize. There are two broad classes of transposable elements, each with characteristic properties (Finnegan 1989). For all Class I or retroelements, such as retrotransposons, short interspersed nuclear elements, and long interspersed nuclear elements, it is the element-encoded mRNA, and not the element itself, that forms the transposition intermediate. This means that each transposition event creates a new copy of the transposon while the original copy remains intact at the donor site. In contrast, Class II consists of DNA transposons, which change their location in the genome by a ‘cut and paste’ mechanism (Grzebelus 2006). This means that they excise themselves from the donor site and reintegrate themselves at the acceptor site. Based on structural characteristics, transposons can be further subdivided into subclasses, superfamilies, families, and subfamilies based on the type and orientation of open reading frames; the presence, orientation, length, and sequence of their terminal repeats and the length and sequence of target site duplications created upon insertion (Grzebelus 2006).

Retrotransposon-based molecular markers

In plants with large genomes, retrotransposons are the major class of repetitive DNA (Kumar and Bennetzen 1999) comprising 40–60% of the entire genome. Based on their structural organization and amino acid similarities among their encoded reverse transcriptases, retrotransposons can be divided into three categories. Long terminal direct repeats (LTRs) flank two of these categories and they encode proteins similar to the retroviruses. These LTR-retrotransposons are referred to as the gypsy-like and copia-like retrotransposons. The third class of retrotransposons, the LINE1-like or non-LTR retrotransposons, lack terminal repeats and encode proteins with significantly less similarity to those of the retroviruses. Retrotransposons replicate by successive transcription, reverse transcription, and insertion of the new cDNA copies back into the genome.

Copia-like (Voytas et al. 1992; Kumar et al. 1996) and gypsy-like retrotransposons (Suoniemi et al. 1998) are present throughout the plant kingdom. Retrotransposons provide an excellent opportunity to develop molecular marker system (Kalendar et al. 1999) due to their long, defined, conserved sequences and new insertional polymorphisms produced by replicationally active members. The new insertions help organizing insertion events temporally in a lineage (Shimamura et al. 1997) and thus can be used to determine pedigrees and phylogenies (Hafez et al. 2006). Retrotransposon-based molecular analysis relies on amplification using a primer corresponding to the retrotransposon and a primer matching a section of the neighboring genome. SSAP (sequence-specific amplified polymorphism) relies on amplification of DNA between a retrotransposon integration site and a restriction site with a ligated adapter (Waugh et al. 1997). In IRAP (inter-retrotransposon amplified polymorphism), DNA between two nearby retrotransposons or LTRs is amplified. Retrotransposon-microsatellite amplified polymorphism (REMAP) involves amplification of fragments which lie between a retrotransposon insertion site and a microsatellite site. RBIP (retrotransposon-based amplified polymorphism) detects loci occupied by or empty of a retrotransposon.

(a) Inter-retrotransposon amplified polymorphism (IRAP) and REtrotransposon-microsatellite amplified polymorphism (REMAP)

IRAP and REMAP are two amplification-based marker methods which have been developed based on the position of given LTRs within the genome (Kalendar et al. 1999). These two markers have been developed originally for BARE-I retrotransposon of Hordeum genus, which is present in the barley genome in numerous copies. The IRAP markers are generated by the proximity of two LTRs using outward-facing primers annealing to LTR target sequences (Fig. 3b). In REMAP, amplification between LTRs proximal to simple sequence repeats such as constitutive microsatellites produces markers (Fig. 3c). Both IRAP and REMAP examine polymorphism in retrotransposon insertion sites, IRAP between retrotransposons and REMAP between retrotransposons and microsatellites (SSRs). Retrotransposons can integrate in either orientation into the genome. For head-to-head and tail-to-tail orientations, PCR products can be generated using a single primer from elements sufficiently close to one another. Intervening genomic DNA for elements in head-to-tail orientation is amplified using both 5′ and 3′ LTR primers. The REMAP method relies on one outward-facing LTR primer and a second primer from a microsatellite. Primers were designed to the (GA)/, (CT)/, (CA)/, (CAC)/, (GTG)/, and (CAC)/ microsatellites and were anchored (all but one) to the microsatellite 3′ terminus by the addition of a single selective base at the 3′ end. In both techniques, polymorphism is detected by the presence or absence of the PCR product. Lack of amplification indicates the absence of the retrotransposon at the particular locus. About 30 bands were visualized following a single PCR reaction (Kalendar et al. 1999). As these markers were extremely polymorphic, they can prove useful for evaluating intraspecific relationships. Copia-SSR marker assay, a variant of REMAP, utilizes a Ty-1 copia-specific primer along with anchored SSR primers (Zietkewicz et al. 1994). IRAP technique has been used in genome classification of banana cultivars (Nair et al. 2005). The IRAP and REMAP techniques have been used to detect similarity between 51 rice cultivars (Branco et al. 2007).

(b) Sequence-specific amplification polymorphism (S-SAP)

The technique was first used to investigate the location of BARE-1 retrotransposons in the barley genome (Waugh et al. 1997). In principle, it is a simple modification of the standard AFLP (amplified fragment length polymorphism) protocol (Vos et al. 1995). The final amplification is performed with retrotransposon-specific and MseI-adaptor-specific primers. S-SAP has been extensively used to generate markers to study genetic diversity and to prepare linkage maps in several plants, including the pea, Medicago, wheat, and the cashew (Pearce et al. 2000; Porceddu et al. 2002; Queen et al. 2004; Syed et al. 2005).

(c) Retrotransposon-based insertion polymorphism (RBIP)

The technique was developed using the PDR1 retrotransposon in the pea (Flavell et al. 1998). It requires the sequence information of the 5′ and 3′ regions flanking the transposon. When a primer specific to the transposon is used together with a primer designed to anneal to the flanking region, they generate a product from template DNA containing the insertion. On the other hand, primers specific to both flanking regions amplify a product if the insertion is absent. Polymorphisms can be identified using standard agarose gel electrophoresis, or by hybridization with a reference PCR fragment. Hybridization is more useful for automated, high throughput analysis. It is much more costly and technically complicated than other methods for detecting transposon insertions.

Transposable display (TD)

Transposable display (TD) is a modification of the AFLP technique that permits the simultaneous detection of many TEs from high copy number lines. The technique is a modification of the AFLP procedure where PCR products are derived from primers anchored in a restriction site (i.e., BfaI or MseI) and a transposable element rather than in two restriction sites (van den Broeck et al. 1998). Individual transposons are identified by a ligation-mediated PCR that starts from within the transposon, and amplifies part of the flanking sequence up to a specific restriction site. Resulting PCR products can be analysed in a high resolution polyacrylamide gel system. Transposable Display was first used to reveal the copy number of the dTph1 transposon (TIRs) family in petunia and related insertion events (van den Broeck et al. 1998). It also allows detection of an insertion that can be correlated with a particular phenotype. Casa et al. (2000) exploited the unique properties of a group of TEs (transposable elements) called miniature inverted repeat transposable elements (MITEs) using TD technique to develop a new class of molecular marker for analysis Hbr transposon family in maize.

Inter-MITE polymorphism (IMP)

The technique is in principle very similar to IRAP, except that it uses MITE-like transposons rather than retrotransposons. Miniature inverted-repeat transposable elements (MITEs) are short, non-autonomous DNA elements (class-II transposons) that are widespread and abundant in plant genomes and exhibit high copy number and intrafamily homogeneity in size and sequence. Most of the hundreds of thousands of MITEs identified to date have been divided into two major groups on the basis of shared structural and sequence characteristics: Tourist-like and Stowaway-like.

The IMP technique was first used to identify two groups of MITEs in barley, one belonging to the Stowaway family, and the other to the recently identified Barfly family (Chang et al. 2001).

RNA-based molecular markers

Biological responses and developmental programming are regulated by the precise control of genetic expression. Obtaining in depth information about these processes necessitates the study of differential patterns of gene expression. PCR-based marker techniques, such as, cDNA-AFLP and RAP-PCR are used for differential RNA study by selective amplification of cDNAs.

cDNA-SSCP

The SSCP analysis of RT-PCR products can be used to evaluate the expression status (presence and relative quantity) of highly similar homologous gene pairs from a polyploid genome. Replicated tests show that cDNA-SSCP reliably separates duplicated transcripts with 99% sequence identity (Cronn and Adams 2003). This technique has been used to gain remarkable insight into the global frequency of silencing in synthetic and natural polyploids.

RNA fingerprinting by arbitrarily primed PCR (RAP-PCR)

The RAP-PCR technique (Welsh et al. 1992) involves fingerprinting of RNA populations using arbitrarily selected primer at low stringency for first and second strand cDNA synthesis followed by PCR amplification of cDNA population. The method requires nanograms of total RNA and is unaffected by low levels of genomic DNA contamination. Differential PCR fingerprints are detected for RNAs from the same tissue isolated from different individuals and for RNAs from different tissues from the same individual. The individual-specific differences revealed are due to sequence polymorphisms and are useful for genetic mapping of genes. The tissue-specific differences revealed are useful for studying differential gene expression.

cDNA-AFLP

The cDNA-AFLP is a novel RNA fingerprinting technique to display differentially expressed genes (Bachem et al. 1996). The methodology includes digestion of cDNAs by two restriction enzymes followed by ligation of oligonucleotide adapters and PCR amplification using primers complementary to the adapter sequences with additional selective nucleotides at the 3′ end (Bachem et al. 1998). The cDNA-AFLP technique is a more stringent and reproducible than RAP-PCR (Liang and Pardee 1992). In contrast to hybridization-based techniques, such as cDNA microarrays, cDNA-AFLP can distinguish between highly homologous genes from individual gene families. There is no requirement of any preexisting sequence information in cDNA-AFLP, thus it is valuable as a tool for the identification of novel process-related genes (Akihiro et al. 2006; van der Hoveven et al. 1996; Yaa et al. 2007). Identification of stress-regulated genes has been a major application of cDNA-AFLP (Mao et al. 2004).

Impact of molecular marker techniques

With the advent of molecular markers, it is now possible to make direct inferences about genetic diversity and interrelationships among organisms at the DNA level without the confounding effects of the environment and/or faulty pedigree records. Genetic analyses of plant and animal populations and species for taxonomic, evolutionary and ecological studies tremendously benefited from the development of various molecular marker techniques. Each molecular marker technique is based on different principles but their application is to bring out the genome-wide variability. An obvious problem that usually arises is, how to choose the most appropriate DNA marker among the myriad of different marker technologies. In general, the choice of a molecular marker technique has to be a compromise between reliability and ease of analysis, statistical power and confidence of revealing polymorphisms (Table 1).

The first marker technology that was employed for physical mapping of plant genomes was RFLP. The technique requires prior sequence information and is expensive. With the invention of PCR technology, marker techniques such as RAPD, AFLP, AP-PCR were developed. These techniques are fast, inexpensive and do not require prior sequence information. Techniques such as RAPD and AFLP have been routinely used for population genetics (Althoff et al. 2007) and breeding purposes. They are also used for tagging a phenotypic trait to a genetic component (Agarwal et al. unpublished). In order to convert arbitrarily primed PCR products in genomic physical landmarks, SCAR technique was designed. Microsatellite marker techniques utilize the intra as well as inter individual variation in microsatellites or simple sequence repeat region for fingerprinting analyses. Chloroplast and mitochondrial microsatellite-based techniques elucidate the parental lineage and thus are of immense help in breeding and evolutionary genetics. With the development and progress in technology, ESTs of plant genomes became available. This led to development of sequence-specific markers such as CAPS. The CAPS markers are employed to study the allelic variations of a gene in the population (Spaniolas et al. 2006) usually in form of SNPs. With the sequencing of plant genomes, new classes of genomic elements such as retrotransposons were identified. Retrotransposon-based markers such as IRAP, REMAP and S-SAP, help in analyzing genome-wide discontinuities among closely related individuals (Hafez et al. 2006). Molecular markers have also been used for differential cDNA display in order explain the basis of various biological phenomena and help to reveal novel genes responsible for the differential expression profile (Yaa et al. 2007). Thus, technological advances have contributed to advancements in every aspect of molecular marker techniques making them technically simpler as well as efficient and cost-effective (Fig. 4).

Fig. 4
figure 4

Schematic representation describing the development of molecular marker techniques and their advancements over last two decades

Most molecular marker techniques are applied to evaluate the genetic diversity and to construct a physical map of the genome being studied. Physical mapping of the linked markers helps in association of the genetic distance to the physical distance between them. Correlation of the pattern of inheritance of trait in a meiotic mapping population with that of individually mapped genetic markers, has led to construction of genetic linkage maps by locating many monogenic and polygenic traits to specific regions of the plant genome. The genome is restricted by rare cutter restriction enzymes to give fragments of several hundred kilobases to several megabases. Thereafter the fragments are separated by pulse-field gel electrophoresis and analysed by southern hybridization using probes comprising of closely linked molecular markers. If probes representing two molecular markers hybridize to the same fragment, the fragment size is taken to be the maximum distance between the two markers. Thus, the physical distance between the two markers is mapped. The mapped markers then are used for genome mapping by using ‘contigs’ of overlapping cloned fragments and assembling them to span the entire genome in question, or the particular region of interest. Physical mapping leads to enhancement the molecular genetics of the particular organism, since it serves as an archive of genomic information (Fig. 5).

Fig. 5
figure 5

Schematic representation of physical mapping of molecular markers