Introduction

Retrotransposons are mobile genetic elements that transpose via an RNA intermediate. They are classified as LTR or non-LTR retrotransposons depending on the presence or absence of long terminal repeats (LTRs), which are direct repeats flanking the internal coding regions (Staton et al. 2009). LTR retrotransposons can be further divided into two groups, the Ty1-copia group and the Ty3-gypsy group, based on the order of their coding domains and on their sequence similarity to the prototype elements in Drosophila and yeast (Boeke and Corces 1989; Zhao et al. 2010). Both Ty1-copia and Ty3-gypsy retrotransposons exist in plant genomes, usually as high-copy-number dispersed sequences, especially in gymnosperms (Nystedt et al. 2013). Because retrotransposition is replicative, the parental copy is preserved while transpositional activity generates new insertions of the element at other sites in the genome, resulting in a high degree of heterogeneity and insertional polymorphism (Kumar and Bennetzen 1999; Carrier et al. 2012). In the cases examined, the increase in retrotransposon copy number appears to have been a major factor in the growth of plant genome size (Park et al. 2011; Zedek et al. 2010). Conversely, with a few exceptions, loss of retrotransposons through the recombinational production of solo LTRs and the accumulation of deletions helps to keep genome size stable (Wicher and Keller 2007; Fedoroff 2013). Transcriptional and transpositional activation of transposable elements is mainly induced by abiotic and biotic stresses (Grandbastien 1998; Huang et al. 2012), and this activation has been regarded as a mechanism of genotypic remodeling. These characteristics provide an excellent basis for the development of molecular marker systems (Schulman et al. 2012; Kalendar et al. 2011).

Several different retrotransposon-based molecular marker systems have been developed to visualize genetic diversity (Kalendar et al. 2011; Smýkal et al. 2011; Jing et al. 2012; Abdollahi Mandoulakani et al. 2012). Most retrotransposon marker systems take advantage of two basic properties: the large insertions generated by transpositional activity, and the conserved domains from which polymerase chain reaction (PCR) primers can be designed. The inter-retrotransposon amplified polymorphism (IRAP) method displays insertional polymorphisms by amplifying the segments of DNA between two retrotransposons. One significant virtue of IRAP is its experimental simplicity: it requires only a simple PCR followed by electrophoresis to resolve the PCR products. IRAP can be carried out with a single primer matching the conserved motifs, or with two primers. Adjacent TEs may be found in different orientations in the genome (head-to-head, tail-to-tail or head-to-tail), so the polymorphisms that can be detected depend on the method and the primer combination (Kalendar et al. 2011). To date, IRAP markers have been widely used to elucidate genetic diversity in many species (Kalendar and Schulman 2006; Vukich et al. 2009; Campbell et al. 2011; Abdollahi Mandoulakani et al. 2012).

Masson pine (Pinus massoniana), a conifer native to southern China, is one of the most economically important forest trees because it is widely used for timber, pulp and resin production. Synecological studies revealed that masson pine germplasm tends to become scarce (Ding and Song 1998), chiefly due to the loss of genetic diversity. Molecular markers are extremely useful for assessing genetic diversity and identifying potentially novel genotypes among the masson pine germplasm. Several types of marker systems, including random amplified polymorphic DNA (RAPD), inter-simple sequence repeat (ISSR), and simple sequence repeat (SSR), have been used to analyze masson pine germplasm (Peng et al. 2003; Li et al. 2009; Cai and Ji 2009). Taken together, these studies show that masson pine has low genetic diversity. This may result from a domestication bottleneck, or it may reflect the fact that the aforementioned markers are less evenly distributed across the genome than retrotransposon-based markers (Abdollahi Mandoulakani et al. 2011; Biswas et al. 2010; Yuan et al. 2012).

Previously, we demonstrated that Ty1-copia and Ty3-gypsy group retrotransposons occur in masson pine with high heterogeneity, and estimated approximately 89,577 copies of Ty1-copia group retrotransposons and about 29,310 copies of Ty3-gypsy group retrotransposons per genome (Fan et al. 2013). The objectives of the present study were (1) to detect the activation of masson pine retrotransposons under several abiotic stresses; (2) to isolate 3′-LTR segments of Ty1-copia and Ty3-gypsy group retrotransposons; (3) to develop IRAP markers based on the retrotransposon sequences; and (4) to assess the genetic diversity of masson pine germplasm in China.

Materials and methods

Plant materials

Thirty-four masson pine cultivated lines were obtained from the National Masson Pine Germplasm Collection of Duyun Forestry Station (Guizhou Province, China) for genetic diversity analysis.

Seeds of masson pine from a single line obtained from the Duyun Forestry Station were germinated on moist filter paper in the dark and transferred to pots in the Plant Chamber Facility of Guizhou University. Seedlings were irrigated regularly with Hoagland and Arnon nutrient solution with trace elements and grown for about 2 months. Some of the seedlings were used for genome walking, and the others were subjected for 24 h to one of several stresses: heat (42°C), cold (−2°C), salicylic acid (2 mM), gibberellic acid (2 mM), 2,4-D (50 mM) or UV (wavelength: 315–400 nm, irradiance: 0.68–0.8 W/m²). Needles from these seedlings were stored at −80°C until RNA isolation.

DNA extraction

Total DNA was extracted from fresh needles using Tiangen DNAsecure Plant Kit (Tiangen, Beijing, China) according to the manual.

RNA isolation and RT-PCR

Total RNA of normal and stressed materials was isolated using Invitrogen Plant RNA Purification Reagent (USA) according to the manual. All RNA samples were treated with DNase I (TaKaRa, Dalian, China) at 37°C for 2 h before RNA precipitation. First-strand cDNA synthesis was carried out using RNA LA PCR™ KIT (TaKaRa), following the manufacturer's instructions, and choosing the oligo dT-adaptor primer. Control reactions without RT were routinely included in the PCR amplification of reverse-transcribed products. Relative RT-PCR was performed to detect transcriptional activation of LTR-retrotransposons. The degenerate primers Rtp1 5′-ACNGCNTTYYTNCAYGG-3′ and Rtp2 5′-ARCATRTCRTCNACRTA-3′ were used to amplify RT domains of Ty1-copia group retrotransposons. The degenerate primers Gyrt1 5′-AGMGRTATGTGYGTSGAYTAT-3′ and Gyrt2 5′-CAMCCMRAAMWCACAMTT-3′ were employed to amplify RT domains of Ty3-gypsy group retrotransposons.
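As an illustration of what the degenerate primers above encode, the following minimal sketch counts how many distinct oligonucleotides each primer represents, using the standard IUPAC nucleotide ambiguity codes. The function name `degeneracy` and the dictionary layout are illustrative choices and are not part of the original protocol.

```python
from math import prod

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


# Primer sequences as given in the text (5' to 3').
primers = {
    "Rtp1": "ACNGCNTTYYTNCAYGG",
    "Rtp2": "ARCATRTCRTCNACRTA",
    "Gyrt1": "AGMGRTATGTGYGTSGAYTAT",
    "Gyrt2": "CAMCCMRAAMWCACAMTT",
}

for name, seq in primers.items():
    print(f"{name}: length {len(seq)} nt, degeneracy {degeneracy(seq)}")
```

A higher degeneracy means a broader range of RT variants can be primed, at the cost of a lower effective concentration of any single primer species.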

Determination of LTR sequences by genome walking

The principal protocol of the GenomeWalker™ Universal Kit (Clontech, USA) was adopted to isolate 3′-LTR sequences of retrotransposons. Four independent pools of high-quality masson pine DNA (2.5 μg) were digested with the blunt-end restriction enzymes DraI, EcoRV, StuI, and PvuII. Each batch of digested genomic DNA fragments was then ligated separately to the GenomeWalker adaptor, which consists of a 48-mer (5′-GTAATACGACTCACTATAGGGCACGCGTGGTCGACGGCCCGGGCTGGT-3′) and an 8-mer with the 3′ end capped by an amine group (5′-PO4-ACCAGCCC-NH2-3′). Each walking step consisted of a primary and a secondary PCR amplification. The product of the primary amplification, carried out with the outer adaptor primer (5′-GTAATACGACTCACTATAGGGC-3′) and a primary gene-specific primer, was used as the template for a secondary amplification with a nested gene-specific primer and the inner adaptor primer (5′-ACTATAGGGCACGCGTGGT-3′), which overlaps the outer adaptor primer. The gene-specific primers (Table 1) were designed from known sequences using the Primer Premier 5.0 software and were synthesized by Generay Biotechnology (Shanghai, China). Each 25-μl PCR reaction mixture included approximately 20 ng of template DNA, 2.5 μl of 10× Advantage 2 PCR buffer (Clontech, USA), 0.5 μl of dNTP (10 mM each), 0.5 μl of each primer (10 μM) and 0.5 μl of 50× Advantage 2 Polymerase Mix (Clontech, USA). The primary PCR program used the following two-step cycle parameters: seven cycles of 94°C for 25 s and 72°C for 3 min, followed by 32 cycles of 94°C for 25 s and 67°C for 3 min, then 67°C for an additional 7 min after the final cycle. The secondary PCR program was the same, except that 5 cycles were used instead of 7 and 20 cycles instead of 32. Aliquots of the secondary PCR products were separated, recovered, cloned and sequenced as described previously (Fan et al. 2013). Three or four independent subclones were sequenced to correct errors generated by amplification and sequencing.
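The relationship between the adaptor and the two adaptor primers can be checked directly from the sequences quoted above. The sketch below is only a consistency check on those sequences. The variable names are illustrative, and the code does not model the ligation or the PCR itself.

```python
def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


# Sequences as quoted in the text (5' to 3').
ADAPTOR_LONG = "GTAATACGACTCACTATAGGGCACGCGTGGTCGACGGCCCGGGCTGGT"
ADAPTOR_SHORT = "ACCAGCCC"
OUTER_PRIMER = "GTAATACGACTCACTATAGGGC"
INNER_PRIMER = "ACTATAGGGCACGCGTGGT"

# Positions of both adaptor primers within the long adaptor strand.
outer_start = ADAPTOR_LONG.find(OUTER_PRIMER)
inner_start = ADAPTOR_LONG.find(INNER_PRIMER)

# Number of bases shared by the outer and the nested (inner) primer.
outer_end = outer_start + len(OUTER_PRIMER)
overlap = max(0, outer_end - inner_start)

# The short strand should base-pair with the 3' end of the long strand.
short_pairs_with_3prime_end = ADAPTOR_LONG.endswith(
    reverse_complement(ADAPTOR_SHORT)
)

print("adaptor length:", len(ADAPTOR_LONG))
print("outer primer starts at:", outer_start)
print("inner primer starts at:", inner_start)
print("primer overlap (nt):", overlap)
print("8-mer pairs with 3' end:", short_pairs_with_3prime_end)
```

Because the inner primer lies downstream of the start of the outer primer on the same adaptor strand, the secondary amplification is nested on the adaptor side as well as on the gene-specific side.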

Table 1 The gene-specific primers employed in the genome walk

Sequence analysis

Sequence assembly was done using DNAMAN software. Homology queries were carried out by submitting our sequences to BLASTn and BLASTx searches against the non-redundant databases at the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/). Multiple sequence alignments were made with ClustalX (version 2.1) and annotated with GeneDoc. The sequences used for comparison were obtained from the GenBank database: Tnt1 (CAA32025) of Nicotiana tabacum, Copia (CAA26444) of Drosophila melanogaster, CIRE1 (CAJ09951) of Citrus sinensis and TLC1.1 (AAK29467) of Solanum chilense for the Ty1-copia group comparison, and AFK13856 of Beta vulgaris, AAG51046 of Arabidopsis thaliana and ABA97923 of Oryza sativa Japonica for the Ty3-gypsy group comparison. Both masson pine retrotransposon sequences obtained here have been submitted to GenBank under the accession numbers KC355438 and KC355439. Conceptual translations were checked against the conserved motifs expected of LTR retrotransposons.

IRAP amplification and data analysis

The IRAP primers (Table 2) were designed to match the LTRs and RT conserved regions with the aid of Primer Premier 5.0 software. PCR amplification was performed in 10 μl reaction mixtures containing 10–50 ng genomic DNA, 1 μl of primer (10 μM), 5 μl of PCR mix (Tiangen, China; containing 0.1 U μl−1 Taq polymerase, 500 μM dNTP, 20 mM Tris–HCl (pH 8.3), 100 mM KCl, 3 mM MgCl2 and other stabilizers and enhancers) and ddH2O to the final volume. The cycling program consisted of initial denaturation at 94°C for 5 min, followed by 35 cycles of 94°C for 45 s, primer annealing at 55°C for 45 s and extension at 72°C for 1 min, with a final elongation step at 72°C for 7 min. The PCR products were separated either by electrophoresis on 1.8 % agarose gels in 1× TBE buffer and visualized under UV light after staining with ethidium bromide, or on 8 % PAGE gels followed by silver staining for better resolution.
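The final concentrations in the 10 μl IRAP reaction follow from the usual dilution relation C1·V1 = C2·V2. The sketch below applies it under the assumption that the concentrations listed for the PCR mix describe the undiluted stock, so that 5 μl of mix in 10 μl gives a twofold dilution. That reading is an assumption and is not stated explicitly in the protocol. The names used are illustrative.

```python
def final_concentration(stock, stock_volume_ul, total_volume_ul):
    """Dilution relation C2 = C1 * V1 / V2."""
    return stock * stock_volume_ul / total_volume_ul


TOTAL_UL = 10.0   # reaction volume
MIX_UL = 5.0      # volume of PCR mix added
PRIMER_UL = 1.0   # volume of 10 uM primer added

# Assumed stock concentrations of the PCR mix components.
mix_stock = {
    "Taq polymerase (U/ul)": 0.1,
    "dNTP (uM)": 500.0,
    "Tris-HCl (mM)": 20.0,
    "KCl (mM)": 100.0,
    "MgCl2 (mM)": 3.0,
}

final = {
    name: final_concentration(conc, MIX_UL, TOTAL_UL)
    for name, conc in mix_stock.items()
}
final["primer (uM)"] = final_concentration(10.0, PRIMER_UL, TOTAL_UL)

for name, conc in final.items():
    print(f"{name}: {conc:g}")
```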

Table 2 IRAP primers with sequence and polymorphism level

Each IRAP band was treated as a single locus. The presence or absence of a fragment of a given length in each sample was recorded manually in binary code. Marker data were processed with NTSYS-pc version 2.10e using the SIMQUAL module with the Jaccard genetic similarity coefficient (GSj), and the similarity data were used to perform an unweighted pair group method with arithmetic mean (UPGMA) cluster analysis using the SHAN module.
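For readers without access to NTSYS-pc, the analysis can be sketched in a few lines. The Jaccard coefficient between two band profiles is the number of bands shared by both divided by the number of bands present in at least one. UPGMA then repeatedly merges the two clusters with the highest mean pairwise similarity. The code below is a minimal illustration on a small made-up band matrix. It is not the NTSYS-pc implementation, and the sample names and band scores are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two binary band profiles (1 = band present)."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return shared / either if either else 0.0


def upgma(names, profiles):
    """Return the merge history as (cluster_a, cluster_b, mean_similarity)."""
    # Each cluster maps its label to the indices of the samples it contains.
    clusters = {name: [i] for i, name in enumerate(names)}
    sim = {
        (i, j): jaccard(profiles[i], profiles[j])
        for i, j in combinations(range(len(names)), 2)
    }

    def mean_sim(members_a, members_b):
        # Unweighted average over all pairs of original samples.
        pairs = [(min(i, j), max(i, j)) for i in members_a for j in members_b]
        return sum(sim[p] for p in pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        a, b = max(
            combinations(clusters, 2),
            key=lambda ab: mean_sim(clusters[ab[0]], clusters[ab[1]]),
        )
        merges.append((a, b, mean_sim(clusters[a], clusters[b])))
        clusters[f"({a},{b})"] = clusters.pop(a) + clusters.pop(b)
    return merges


# Hypothetical band matrix: four lines scored for six bands.
names = ["A", "B", "C", "D"]
profiles = [
    [1, 1, 0, 1, 0, 1],
    [1, 1, 0, 1, 1, 1],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 1, 0, 1, 1],
]

merges = upgma(names, profiles)
for a, b, s in merges:
    print(f"merge {a} + {b} at mean similarity {s:.3f}")
```

Cutting the resulting merge history at a chosen similarity threshold yields the groups; any cluster formed at a similarity at or above the threshold stays together.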

Results

Detection of transcriptional activation

To determine whether retrotransposons are transcriptionally activated after exposure to abiotic stresses, reverse transcription PCR (RT-PCR) was carried out on masson pine seedlings subjected to heat, cold, salicylic acid, gibberellic acid, 2,4-D and UV stress. Genomic DNA was used as a positive control and water as a negative control. The results revealed that salicylic acid, gibberellic acid, 2,4-D and UV light induced transcriptional activation of both Ty1- and Ty3-type retrotransposons, with 2,4-D the strongest inducer of both. However, heat and cold treatment did not lead to transcription of LTR retrotransposons (Fig. 1).

Fig. 1 RT-PCR detection of LTR retrotransposon transcripts under abiotic treatments. NC negative control, PC positive control. Heat, cold, SA, GA, 2,4-D and UV stand for heat, cold, salicylic acid, gibberellic acid, 2,4-D and UV treatment, respectively

Isolation and characterization of retrotransposon 3′-LTRs

To isolate the 3′-LTRs of retrotransposons from the masson pine genome, two known sequences were used as starting points for genome walking. One RT sequence of 266 bp, named PMRT16 (GenBank accession no. JQ975194), was isolated with degenerate primers corresponding to the conserved reverse transcriptase domains of Ty1-copia retrotransposons. The other, a Ty3-gypsy RT sequence of 432 bp named REPM6 (GenBank accession no. JQ975242), was obtained by a similar method. The unknown fragments adjacent to both known sequences were isolated with the GenomeWalker™ Universal Kit (Clontech, USA) according to the manufacturer's instructions. The largest distinguishable PCR fragments from the secondary amplification were selected for sequence analysis. Sequence identity in the overlap between each newly obtained fragment and the previous sequence implied that both fragments derived from a single retrotransposon. Three successive steps downstream from PMRT16 and four successive steps downstream from REPM6 generated fragments of different sizes. Fragment assembly revealed that both contig sequences contained almost exact LTRs that start with TG and end with CA. Therefore, an RT–RNaseH–LTR sequence of a Ty1-copia retrotransposon, named PmRT (GenBank accession no. KC355438), and an RT–RNaseH–INT–LTR sequence of a Ty3-gypsy retrotransposon, named REPm (GenBank accession no. KC355439), were characterized from the masson pine genome. The lengths are 1,766 and 2,533 bp for PmRT and REPm, respectively.

The sequences of the different domains of the PmRT and REPm retrotransposons were compared with LTR retrotransposons reported from other organisms (tobacco, fruit fly, sweet orange, tomato, sugar beet, Arabidopsis, japonica rice). The results indicated that both retrotransposons contain conserved motifs (Fig. 2), such as a TG/CA inverted repeat in the LTRs. In addition, PmRT had three conserved amino acid motifs (TAF-HG, YGLKQ, and YVDDML) in the RT sequence and two characteristic motifs (KHID and DMLTK) in the RNaseH region, followed by a polypurine tract at the RNaseH–LTR junction (Pearce et al. 1999). Besides the conserved coding region motifs of both the RT and RNaseH, REPm contained a zinc-finger domain (H-6aa–H-29aa–C-2aa–C) in the integrase, followed by a polypurine tract (Suoniemi et al. 1998).

Fig. 2 The structure and conserved motifs of Ty1-copia group and Ty3-gypsy group retrotransposons. Sequences used for the alignment were obtained from GenBank. The black vertical bars in the given regions represent sites of characteristic motifs. Completely conserved or nearly invariant amino acids among retrotransposons are indicated by black shading, and partially conserved signatures among retrotransposons are marked by gray shading

Development of IRAP markers for masson pine germplasm genotyping

The IRAP primers were designed to match the LTRs of PmRT and REPm, as well as the RT conserved regions of other reverse transcriptase sequences (Table 2). To test the suitability of the IRAP markers for genotyping, a set of six genotypes randomly selected from cultivated masson pine lines was used for PCR amplification (Fig. 3). The scoring criteria consisted of the sharpness, number, and evenness in intensity of the PCR products after electrophoresis, as well as the degree of polymorphism among the genotypes. Primers that yielded poor amplification, few products, or primarily monomorphic products were discarded. A set of nine primers that yielded polymorphic and evenly distributed fragments, each generating more than 40 % polymorphism, was retained for further work (Table 2; Fig. 3).

Fig. 3 IRAP analysis for a set of six masson pine lines. The primer codes used are shown above each set of lanes

Retrotransposon stability in the genome detected by IRAP markers

To see whether retrotransposon mobilization could be detected, we selected seedlings stressed with 2,4-D (50 mM) for 24 h and tested the stability of their IRAP fingerprints at three sampling times. None of the seedlings analyzed over the course of the 2-month trials displayed fingerprint changes (Fig. 4).

Fig. 4 Detection of IRAP fingerprints. "1", "2" and "3" stand for the day before stress treatment, the fifth day and the 20th day after stress treatment, respectively. The primer codes used are shown above each set of lanes

Genetic diversity within masson pine germplasm

A total of 34 masson pine lines were scored by IRAP with the nine most informative primers, yielding 153 discernible, reproducible fragments, of which 82 were polymorphic, giving a polymorphism rate of 53.6 %. The number of scorable bands per primer ranged from 13 (R17; 41.7 % polymorphic) to 28 (R16; 67.9 % polymorphic). The genetic relationships of the tested genotypes were resolved by the UPGMA method based on Jaccard similarity coefficients computed from the IRAP markers (range 0.680 to 0.863; overall mean 0.77), and seven groups were obtained from the dendrogram at a threshold of 0.73 (Fig. 5). In the dendrogram, group I included the largest number of accessions (15 genotypes), followed by group VII with nine masson pine lines. Notably, only one genotype was placed in group V, which indicates that this genotype is genetically more distant from the other masson pine lines.
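The overall polymorphism rate reported above is simply the fraction of scored fragments that are polymorphic. A short sketch reproduces the figure from the two counts given in the text. The helper name `polymorphism_rate` is illustrative.

```python
def polymorphism_rate(polymorphic_bands, total_bands):
    """Percentage of scored bands that are polymorphic."""
    return 100.0 * polymorphic_bands / total_bands


# Counts reported for the nine IRAP primers across 34 lines.
overall = polymorphism_rate(82, 153)
print(f"overall polymorphism: {overall:.1f} %")
```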

Fig. 5 Dendrogram of 34 masson pine genotypes obtained from UPGMA cluster analysis using IRAP loci based on Jaccard similarity coefficients

Discussion

Previous evidence showed that RT sequences of LTR retrotransposons in the masson pine genome are highly heterogeneous and present in high copy number (Fan et al. 2013). In the present study, we took advantage of the ubiquity and abundance of LTR retrotransposons in plant genomes, and of their role in genomic diversification, to develop IRAP markers. These retrotransposon-based markers were then used to genotype masson pine lines and to elucidate their genetic relationships. Two retrotransposon segments were isolated through genome walking based on adaptor-mediated nested PCR. Both contig sequences contained almost exact LTRs that start with TG and end with CA (Jin and Bennetzen 1989), which facilitates the design of LTR-specific primers. Of the primers tested, nine yielded discernible and polymorphic banding patterns and were applied to detect transpositional activation of retrotransposons and to assess the genetic diversity among 34 masson pine genotypes. Several studies have demonstrated that primers designed from LTR sequences of retrotransposon families can be readily used across species lines (Lou and Chen 2007; Kalendar et al. 2011; Abdollahi Mandoulakani et al. 2012). In our study, some single IRAP primers based on masson pine retrotransposons produced polymorphic banding patterns in Hylocereus undatus and Prunus pseudocerasus (data not shown), indicating the presence of homologous retroelements in these species. This could be expected, since horizontal transmission has occurred during the evolution of retrotransposons (Stuart-Rogers and Flavell 2001).

Retrotransposon transcription and transposition activity

It is common for retrotransposons to be transcriptionally activated by various biotic and abiotic stress factors (Grandbastien 1998; Ungerer and Kawakami 2013). We investigated the transcriptional activation of LTR retrotransposons in masson pine by RT-PCR. The results revealed that salicylic acid, gibberellic acid, 2,4-D and UV light induced transcriptional activation, whereas heat and cold treatment did not, which is consistent with previous reports in oat (Kimura et al. 2001) and in Cucumis (Jiang et al. 2010). The expression of retrotransposons and their transposition frequency in the host genome are regulated (Hirochika et al. 1996). In general, plant genomes remain stable in the face of retrotransposon replication through loss of the inserted copies over time (International Brachypodium Initiative 2010). In practice, recombinational loss of LTR retrotransposons has less effect on the display pattern than new insertions do (International Brachypodium Initiative 2010). Consequently, retrotransposon markers are sensitive enough to detect rapid genome changes (Belyayev et al. 2010). Genome instability has been reported in most of subgenus Pinus (Grotkopp et al. 2004), where many life-history patterns were indirectly but consistently associated with genome size. Previous reports demonstrated that the insertional dynamics of transposable elements could promote morphological and karyotypic changes, some of which might be important for microevolution, allowing species with plastic genomes to survive as new forms or even as new species in times of rapid climatic change (Belyayev et al. 2010). Morse et al. (2009) concluded that most of the enormous genome complexity of pines could be explained by divergence of retrotransposons. Therefore, the stability of IRAP fingerprints of stressed materials was examined to see whether retrotransposon mobilization associated with genotrophs could be detected. None of the materials analyzed over the course of the 2-month trials displayed fingerprint changes. This suggests that LTR retrotransposons in conifers might be less frequently removed by unequal recombination than in other plant genomes, which is consistent with the conclusions on conifer genome evolution documented by Nystedt et al. (2013). Some small deletions may also have occurred through unequal recombination, as in Arabidopsis (Devos et al. 2002), but could not be identified here by agarose gel electrophoresis. It is also likely that the IRAP experiments did not display all the retrotransposons in the genome and would not detect nested insertions located within the retrotransposon "behind" (5′ to) the PCR priming sites (Smýkal et al. 2011). Hence, we conclude that stress can activate retrotransposon transcription, although subsequent transposition in masson pine cannot be excluded.

IRAP markers for genetic diversity in masson pine

Genetic diversity is commonly thought to be narrowed as a consequence of plant breeding (Tanksley and McCouch 1997). For masson pine, the elucidation of genetic diversity is highly important for genetic improvement; however, genetic uniformity has been observed throughout South China, probably caused by large-scale artificial afforestation during variety testing (Li and Peng 2001; Peng et al. 2003; Li et al. 2009). To date, this is the first report of an IRAP-based assessment of genetic diversity in masson pine. The dendrogram resolved seven groups at a similarity threshold of 0.73 (Fig. 5). Lines from different groups could be used as parents with sufficient genetic distance to produce masson pine hybrids. The IRAP marker data presented here revealed no significant differentiation among the tested masson pine germplasm, which is consistent with previous studies (Peng et al. 2003; Li et al. 2009). Numerous lines of evidence indicate that genetic diversity across cultivars is lower than in wild lines (Mandel et al. 2011; Aranzana et al. 2012; Liu et al. 2012). Therefore, it is essential to enforce the protection of natural populations of masson pine and to expand the scope of that protection. In our study, a set of retrotransposon primers was developed, but only nine single polymorphic primers were used to test the genetic diversity of masson pine. Further primer combinations could be derived in the future by pairing these primers with each other in IRAP analysis, with various anchored SSR primers for REMAP analysis, or with different restriction endonuclease adapter primers for SSAP and MSAP analysis (Du et al. 2009).

In conclusion, we isolated the 3′-LTR sequences of a Ty1-copia and a Ty3-gypsy retrotransposon from the masson pine genome. This study demonstrated the utility of high-resolution IRAP markers based on the LTRs and RTs of retrotransposons for distinguishing masson pine cultivated lines, and confirmed that those markers are highly informative for genetic diversity and phylogenetic studies. It also showed that primers designed from masson pine retrotransposon families can be readily used across species lines. We did not aim here to show primer combinations that are more informative than single IRAP primers; this task awaits further large-scale genetic diversity analysis of masson pine.