Introduction

Retrotransposons are mobile genetic elements that transpose via the reverse transcription of a transcribed, intermediate RNA (McDonald 1993; Feschotte et al. 2002). Retrotransposons are abundant and widespread components of eukaryotic genomes. They are usually present in plant genomes as populations of elements in high copy number, which together can account for more than 50% of the genome (Kumar and Bennetzen 1999; Vicient et al. 1999). The sequencing of 417.5 kb of the barley (Hordeum vulgare L.) genome revealed at least 40% to be composed of retrotransposons (Rostoks et al. 2002); in another 103-kb region, 75% was retrotransposons (Park et al. 2004). On a local scale, retrotransposons in the grasses are frequently present as extensive nests of elements inserted into each other that surround islands of genes (SanMiguel et al. 1996; Shirasu et al. 2000), although elements specifically or preferentially found in centromeres (Kumekawa et al. 2000; Nonomura and Kurata 2001; Zhong et al. 2002) or telomeres (Casacuberta and Pardue 2003) or as genic insertions (Yamazaki et al. 2001) have been described in the grasses and elsewhere.

Retrotransposons are subdivided into LTR and non-LTR retrotransposons, with the former being bounded by long terminal repeats (LTRs) oriented in the same direction (Boeke and Corces 1989). The life cycle of the LTR retrotransposons resembles that of retroviruses; it comprises transcription, reverse transcription, packaging into virus-like particles, and integration of the cDNA copy back into the genome. The polyprotein encoded by the retrotransposon itself provides the reverse transcriptase, Gag structural protein, and integrase for the latter steps (Frankel and Young 1998; Kumar and Bennetzen 1999), whereas cellular RNA polymerase II is responsible for transcription. The LTRs both contain the promoter necessary for transcription and specify the terminator and polyadenylation signals needed for RNA processing. The ends of the LTRs are recognized by the integrase. In addition, LTRs also contain the R (for “repeat”) region, lying between the transcription start and termination. Because the promoter functions in the 5′ LTR and the terminator in the 3′ LTR, the R region is found at both ends of the transcript. It enables the nascent (-)-strand cDNA to jump from a 5′ LTR to a 3′ LTR, a necessary step during reverse transcription.

Despite the functional importance of the LTR sequences, several reports indicate that the LTRs are one of the most rapidly evolving retrotransposon regions (Lankenau et al. 1990; Lyubomirskaya et al. 1990; Mizrokhi and Mazo 1990; Danilevskaya et al. 1997; Kalmykova et al. 2004). However, relatively few studies have focused on sequence variability in the LTR region of plant retrotransposons. A detailed analysis of regulatory regions (Casacuberta and Grandbastien 1993) and variability (Casacuberta et al. 1995) has been carried out for the tobacco (Nicotiana tabacum L.) retrotransposon Tnt1 and for the maize retrotransposon Grande1 (Garcia-Martinez and Martinez-Izquierdo 2003). The Tnt1 element is present in only a few hundred copies in the tobacco genome (Grandbastien et al. 1989; Hirochika 1993). This retrotransposon is generally silent within the plant, although it is strongly stress-inducible (Beguiristain et al. 2001).

In contrast to Tnt1, retrotransposon BARE-1 is present in more than 1.5 × 10⁴ copies in barley and is similarly abundant in other Hordeum species (Vicient et al. 1999). Elements closely related to BARE-1 have been found in other grasses (Matsuoka and Tsunewaki 1996; Gribbon et al. 1999; Vicient et al. 2001). The BARE-1 element is transcriptionally active in the plant (Suoniemi et al. 1996), and processed BARE-1 translation products can be detected in barley tissues and in other cereals as well (Vicient et al. 2001). The BARE-1 LTRs are especially long, about 1.9 kb, and contain conserved regions (Suoniemi et al. 1997; Vicient et al. 1999). Sequence examination revealed that BARE-1 LTRs contain two canonical TATA boxes (Manninen et al. 1993), both of them being able to direct RNA transcription but under different conditions (Suoniemi et al. 1996). The BARE-1 LTR can drive expression of reporter genes in transiently transformed barley protoplasts in a manner dependent on the presence of a TATA box functional in planta as well. Deletion analysis of the promoter allowed identification of regions important for expression in protoplasts (Suoniemi et al. 1996).

Due to the great number of BARE-1 elements, intrachromosomal ectopic recombination between LTRs of the same or different elements can occur, leaving behind solo LTRs (Vicient et al. 1999; Shirasu et al. 2000; Kalendar et al. 2004). Solo LTRs comprise 85% of all retroelements in the yeast genome (Kim et al. 1998). In Hordeum, BARE-1 solo LTRs are 7- to 42-fold more abundant than full-length elements (Vicient et al. 1999) and LARD solo LTRs 9-fold more abundant than the full-length elements (Kalendar et al. 2004); they are also abundant for many retrotransposons in rice (Vicient and Schulman 2002). If recombination takes place between the LTRs of different individual elements, then chromosome rearrangements could occur as has been proposed for yeast (Kim et al. 1998). In other cases, recombination between related but not identical elements could generate new variants (McClure 1991; Lerat et al. 1999; Kalmykova et al. 2004; Mugnier et al. 2005). This phenomenon has also been reported for various viruses (Lai 1995).

The genus Hordeum, with some 50 species, is the second largest genus in the tribe Triticeae of the family Poaceae. It includes barley and is widely distributed in both hemispheres (von Bothmer et al. 1995). The species of the genus Hordeum can be divided into four genomic groups, designated H, I, X, and Y, based upon analyses of chromosomal pairing during meiosis in interspecific hybrids (Jacobsen and von Bothmer 1992).

In this study, we investigated the heterogeneity found in LTR sequences of retrotransposons similar to BARE-1 both within and between species of Hordeum. Our results demonstrate the existence of three subfamilies of BARE LTR sequences, each one characteristic of one of the three meiotic recombination groups of the Hordeum genomes analyzed. The data indicate that recombination between BARE retroelements generates several distinct classes of products, and may be important in BARE-1 evolution. We also provide evidence that BARE-1 elements can have an influence on the evolution of the host genome, not only by increasing genome size, but also by serving as substrates for rearrangements.

Materials and Methods

Plant Material

The accession numbers and provenances of all Hordeum materials are as previously described (Kankaanpää et al. 1996). Seeds were germinated, and the seedlings grown for 10 days before DNA was extracted from the leaves.

DNA Extraction, PCR Amplification, Cloning, and Sequencing

DNA was extracted as previously described (Vicient et al. 1999). For amplification of genomic LTRs, primers at the ends of the LTRs were used (N referring to equal amounts of A, T, G, and C in the primer preparation at that position): “LTR-full-forward”, 5′-NNTGTTGGAATTATGCCCTAGAGGCAA-3′; “LTR-full-reverse”, 5′-NNTGTGGGGAACGTCGCATGGGAAAC-3′. The PCR reactions were performed using 10 ng genomic DNA, 0.2 mM each dNTP, and 1 pmol μl⁻¹ each primer in a final volume of 50 μl. The mix was overlaid with paraffin oil. The reaction mixtures were heated to 95°C for 5 min, followed by 21 cycles of 94°C for 30 s, 40°C for 2 min, and 72°C for 2.5 min. Reactions were completed by incubation at 72°C for 10 min. Reactions were performed in a Minicycler (MJ Research, Waltham, MA) thermal cycler. The PCR products were purified from agarose gels (QIAEX II; Qiagen, Hilden, Germany) and cloned (pGEM-T vector system; Promega). For the amplification of the tandem arrays near the 3′ end of the LTRs, the primers used were: “tan-forward”, 5′-GCTGTACGTGTGCTGAACGCGGAGGTG-3′, and “tan-reverse”, 5′-AACGCGGTTGATGTAGT(G/C)GAACGTC-3′. The PCR conditions were as above, but with a 1-min extension at 72°C. For the amplification of tandemly arrayed BARE-1 copies, primers 1 and 2 were the “full-forward” and “full-reverse,” described above, primer 3 was 5′-CGGATCTGAATGTAGCAACCCGCTG-3′, and primer 4 was 5′-CTACGCATGAACCTAGCTCATGATGCC-3′. PCR conditions were as described previously, using a 2-min extension at 72°C and 40 cycles. The primers used for specific BARE-1 and BARE-2 amplifications were, respectively, “LTR-full-forward” and “reverse,” 5′-CTGGTTGGCCCACG(T/C)GAGCCATT(G/A)ATCTACAACA(C/T)A-3′ and 5′-CTGGTTGGCCCACAGTAGAGCTATAG(T/C)GCAAGCTAC-3′. PCR conditions were as above, but using a 3-min extension at 72°C and 30 cycles.
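Several of the primers above are degenerate, written with N (any base) or with alternatives such as (G/C). As a minimal sketch of what that notation implies for the primer pool, the following Python function enumerates every concrete sequence a degenerate primer stands for. The function name `expand_primer` is hypothetical and is not part of the original protocol.

```python
import itertools
import re

def expand_primer(primer: str) -> list[str]:
    """List every concrete sequence encoded by a primer written with
    N (any base) and (X/Y) alternatives. Other characters, such as
    spaces, are ignored."""
    tokens = re.findall(r"\(([ACGT](?:/[ACGT])+)\)|([ACGTN])", primer.upper())
    choices = []
    for group, base in tokens:
        if group:                      # e.g. "G/C" from "(G/C)"
            choices.append(group.split("/"))
        elif base == "N":              # N = equal mix of the four bases
            choices.append(list("ACGT"))
        else:
            choices.append([base])
    return ["".join(combo) for combo in itertools.product(*choices)]

# "tan-reverse" contains a single two-fold degenerate position.
variants = expand_primer("AACGCGGTTGATGTAGT(G/C)GAACGTC")
print(len(variants))
```

Under this reading, each NN prefix corresponds to 16 variants and each two-fold position doubles the number of sequences in the pool.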

Quantitative PCR for LTRs was carried out using conserved primers amplifying both BARE-1 and BARE-2 LTRs, forward 5′-TGTTGGAAATATGCCCTAGAGGCA-3′ (primer R20045, nt 1–24 at the 5′ end of the LTR) and reverse 5′-GACGGCACCTCCGCGTTCAGCACA-3′ (primer R20046, nt 1568–1591 of the LTR). The diagnostic internal domain (LTR) structure of tandem elements was quantified using a forward PPT (polypurine tract) primer 5′-CGGATCTGAATGTAGCAACCCGCTG-3′ (primer 82574) and a reverse PBS (primer binding site) primer 5′-CTACGCATGAACCTAGCTCATGATGCC-3′ (primer 8378). The 20-μl reactions contained: 20 ng barley DNA (cv Sultan), 0.2 mM dNTPs, 0.2 μM each primer, 1 U BioTools Thermus thermophilus polymerase (Catalog number 10.001, Biotools, B & M Labs S.A., Madrid), 1 × BioTools buffer. The reaction mixture was heated to 94°C for 2 min, then subjected to cycles of 94°C for 30 s, 55°C for 30 s, and 72°C for 2 min in an Eppendorf Master Cycler thermocycler using tubes of 0.2 ml. Reactions for the two sets of primers were compared cycle-by-cycle for up to 21 cycles; on agarose gels, the LTRs were detectable by ethidium bromide staining after 4 cycles and the tandem structure after 9 cycles. Product was quantified by scanning the amplified bands in gels stained with ethidium bromide on a Fuji imaging system (FLA-5000) using a resolution of 50 μm. Calculations were made for the logarithmic portion of the amplification reaction, assuming a doubling of the product with each cycle.
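The doubling assumption translates a cycle offset between two targets into a fold difference in template abundance. The sketch below illustrates only that arithmetic; the function name `fold_difference` is hypothetical. The first-detection cycles quoted above serve as a rough illustration, since the actual estimate was made from scanned band intensities.

```python
def fold_difference(cycle_a: float, cycle_b: float, efficiency: float = 2.0) -> float:
    """Fold excess of template A over template B, given the cycle at which
    each product reaches the same intensity. `efficiency` is the assumed
    per-cycle amplification factor (2.0 = perfect doubling)."""
    return efficiency ** (cycle_b - cycle_a)

# Illustration only: LTR product first visible at cycle 4,
# tandem-structure product at cycle 9.
print(fold_difference(4, 9))  # 32.0
```

A five-cycle offset therefore corresponds to roughly a 32-fold difference under perfect doubling; a lower per-cycle efficiency would give a smaller ratio.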

The RNA for RT-PCR was isolated with the RNAqueous kit (Ambion 9690) and then treated with RNAse-free DNAse I (Roche). The RT-PCR reactions were performed using the OneStep RT-PCR kit (Qiagen) according to the manufacturer’s instructions, using 1 μg of total RNA and an amplification program comprising 30 cycles of 45 s at 94°C, 45 s at 50°C, and 1 min at 72°C. Controls for DNA contamination consisted of reactions in which dNTPs were omitted from the reverse transcription step and added instead at the beginning of the PCR step. The primers used for the promoter amplification were RT-forward, 5′-CCCGCTATTGGATATTGACCGAGGAGTCCCTCGG-3′, and RT-reverse, 5′-CTGGTTCGGCCCAGGACG(G/A)CACCTCCGCGTTCAGCACACG-3′.

Plasmid minipreps served as the templates for sequencing reactions. The reactions were catalyzed with Sequenase v2.0 (Amersham Pharmacia Biotech, Uppsala, Sweden) and resolved under standard conditions on an automated sequencing system (ALF; Amersham Pharmacia Biotech). The full-length LTR sequences and their corresponding database accession numbers are listed in Table 1. The accession numbers for the sequenced promoter regions, derived from PCR amplifications, are AJ582402–AJ582411 for genomic sequences, AJ582527–AJ582543 for sequences derived from leaf RNA, and AJ582544–AJ582561 for sequences derived from callus RNA. These accession numbers will be available in the EMBL and GenBank databases.

Table 1 Origin of the full-length LTR sequences used in this study

Sequence Analysis

Sequence alignments were performed using ClustalW software (Higgins et al. 1994) with manual editing. Display and shading of the alignment were performed with GeneDoc vers. 2.6.02 (Nicholas and Nicholas 1997). Phylogeny construction was done using the Treecon program (van de Peer and de Wachter 1997). Distances were calculated according to the Kimura 2-parameter model (Kimura 1980), and the trees were statistically evaluated using 1000 bootstrap samples. The analyses of sequence divergences were done using the DnaSP program (Rozas and Rozas 1999).
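For readers unfamiliar with the Kimura 2-parameter model, the distance between two aligned sequences is d = −½ ln[(1 − 2P − Q)·√(1 − 2Q)], where P and Q are the proportions of transitions and transversions. The following is a minimal illustrative sketch of that formula, not the Treecon implementation; the function name `k2p_distance` and the choice to skip gapped or ambiguous columns are assumptions.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences.
    Columns with a gap or ambiguous base in either sequence are skipped
    (an assumption of this sketch)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1       # A<->G or C<->T
        else:
            transversions += 1     # purine<->pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    # math.log raises ValueError if the sequences are too divergent
    # for the model (argument <= 0).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

A matrix of such pairwise distances is the kind of input a neighbor-joining tree is built from.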

Results

BARE-1 LTR Sequences in Different Hordeum Species

We have investigated sequence variability in the LTRs of the BARE-1 retrotransposon in the genomes of five different Hordeum species chosen as representatives of three meiotic recombinational types within the genus: H. vulgare, or cultivated barley (diploid, I genome); H. euclaston, H. roshevitzii, and H. pusillum (diploids, H genome); and H. marinum (diploid, X genome). These three representatives of the H genome type were chosen because H. euclaston and H. pusillum represent the smallest genomes (Kankaanpää et al. 1996; Jakob et al. 2004), the lowest number of full-length BARE-1 elements, and the highest proportion of BARE-1 solo LTRs of the Hordeum species previously investigated (Vicient et al. 1999). The species H. roshevitzii has a relatively high BARE-1 copy number and the second highest number of solo LTRs for any Hordeum species examined. The X genome, which comprises only H. marinum, has the highest number of solo LTRs and highest ratio of LTRs to full-length elements of any Hordeum spp. Cultivated barley was included because it is the best-explored experimental system for BARE-1. Representatives of the Y genome group were excluded because their BARE-1 population sizes, solo LTR levels, and haploid genome sizes are not distinctive.

Genomic DNAs from accessions of the five species were used for PCR amplification with two primers located at the extremes of the LTR, using low stringency conditions for primer annealing. Following PCR amplification, DNA fragments only of the expected size were detectable and corresponded to the LTRs of the same size as the original BARE-1a element (1.9 kb). A total of 24 fragments were cloned and sequenced, as listed in Table 1. Two of the sequences contained major deletions, heucc (≈ 734 bp deletion) and hmare (≈ 296 bp deletion), and the sizes of the others ranged from 1719 bp (hvuld) to 1971 bp (hmarb), with an average size of 1789 bp.

The sequences were aligned, together with the 13 full-length BARE-1 LTRs from H. vulgare that were present in the database. Neighbor-joining and parsimony methods were used to generate phylogenetic trees from the alignment and yielded similar results. The 37 sequences cluster into three major families (Fig. 1), correlating very well with the three meiotic types of Hordeum genomes investigated (I, H, and X). There was only one exception, hrosd, which will be discussed later.

Fig. 1.

Phylogenetic tree of BARE-1 LTR sequences. The tree was constructed by the neighbor-joining method. Sequence distances were calculated according to the Kimura two-parameter method, not taking into account insertions or deletions. The three groups of LTR sequences are indicated (I, H, X). The sequence names are as in Table 1. Bootstrap values, based on 250 replicates, of greater than 50% are shown. Horizontal lines are proportional in length to the sequence divergence; scaling of the vertical lines is only for clarity.

Species Distribution of the Different LTR Groups

Taking the BARE-1 groups as defined by Hordeum genome type into account, we then focused on the wealth of partial LTR sequences available for cultivated barley, H. vulgare. We compared these, presented in Table 2, with the consensus sequences of each of the three groups of LTRs that correspond to genome types, in order to determine to which of the groups they belong. Of the 59 barley sequences, 48 are more similar to I (81%), which corresponds to the genome type of barley itself, 5 more similar to H (8%), 2 more similar to X (3%), and 4 more similar to H and X than to I (7%). Of the 18 wheat sequences, only one was more similar to I (6%), 7 more similar to H (39%), 2 more similar to X (11%), and 8 more similar to H and X than to I (44%). These data show that LTRs typical of the non-barley groups are also present in cultivated barley, indicating that these groups predate the divergence of the Hordeum genome types.

Table 2 Database accessions of BARE-1-like LTR sequences

Although no LTR sequences from other Hordeum species are present in the database, 18 sequences from bread wheat (Triticum aestivum L.) representing retrotransposon Wis-2, which is very similar to BARE-1, were found. Of these, shown in Table 2, we could place only one into LTR group I and one into group X. Eight fell into group H and eight displayed equal similarity to H and X. Hence, the Wis-2 retrotransposons of wheat appear most closely related to the BARE retrotransposons of the H genome species, which include Asiatic, North American, and South American diploids.

Localization of the Nucleotide Heterogeneity

In our survey of LTR sequence variation, we quantified levels of nucleotide polymorphism within each retrotransposon family using Nei and Jin’s (1989) measure of nucleotide diversity, π. Only the full-length sequences of Table 1 were analyzed. For all the sequences, the variability was 0.12193 ± 0.00669. The nucleotide variability is lower for I sequences (0.08519 ± 0.00902) and higher for the other two groups: 0.10579 ± 0.09350 for the H group, and 0.11478 ± 0.01620 for the X group. The variability is not uniformly distributed throughout the sequences. The nucleotide diversity (π) was studied using a sliding window of 50 bp and a step of 5 bp (Fig. 2). Three regions showed higher sequence diversity than the others: a short stretch next to the 5′ end; a tandem array close to the 3′ end; and the region between the two putative TATA boxes. The first hypervariable region is very short, about 23 bp, located approximately 250 bp from the 5′ end of the LTR. The variability in this segment is higher than that in the surrounding regions when considering all the sequences together, for the I and H groups, and when comparing the I-group sequences with those of the X or H groups. The data indicate that the short region has two forms, one for I and one for H and X together, which is uniform inside the families.
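The sliding-window analysis can be sketched as follows. This is an illustrative approximation rather than the DnaSP algorithm: π is taken here as the mean proportion of differing sites over all sequence pairs within each window, and columns with a gap in either sequence of a pair are skipped for that pair (an assumption; DnaSP handles gaps differently). The function names are hypothetical, and sequences are assumed to be upper-case and pre-aligned.

```python
from itertools import combinations

def window_pi(seqs, start, size):
    """Mean pairwise proportion of differing sites in one alignment window."""
    values = []
    for a, b in combinations(seqs, 2):
        pairs = [(x, y)
                 for x, y in zip(a[start:start + size], b[start:start + size])
                 if x in "ACGT" and y in "ACGT"]
        if pairs:
            values.append(sum(x != y for x, y in pairs) / len(pairs))
    return sum(values) / len(values) if values else 0.0

def sliding_pi(seqs, size=50, step=5):
    """Return (window start, pi) for every full window along the alignment.
    Defaults follow the 50-bp window and 5-bp step used in the text."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    return [(start, window_pi(seqs, start, size))
            for start in range(0, length - size + 1, step)]
```

Plotting the second element of each tuple against the first gives a diversity profile in which hypervariable stretches appear as peaks.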

Fig. 2.

LTR sequence divergence. The values on the x-axis correspond to the nucleotide position in the LTR alignment. The values on the y-axis are the nucleotide diversity (π) measure, the percentage of divergent nucleotides relative to the number of informative bases, calculated using a sliding window of 50 bp and a step of 5 bp, by the DnaSP program (Rozas and Rozas 1999). Insertions and deletions were considered as single mutational events. The position of the two TATA boxes and the tandem array are indicated. Data are shown for all sequences described in Table 1 together, for each LTR group, and for pairwise comparisons between groups.

Tandem Repeats in the 3′ Region of the BARE-1 LTR

A region about 165 bp from the 3′ end of the LTR is composed of an array of tandemly repeated short sequences. In LTRs of group I, the tandem array is composed of a 12-bp unit. In groups H and X, the basic unit is 23 bp long and the first 12 bp are 100% similar to the repeat in LTRs of group I. The last 11 bp are 62% similar to the first 12 bp. The number of repeats is variable. PCR amplification with primers surrounding the tandem array, using genomic DNAs extracted from various Hordeum species, shows a ladder of bands corresponding to different numbers of copies of the tandem repeats (Fig. 3). Two different band patterns were observed. In the species with Y or I genomes (Y, H. murinum; I, H. vulgare and H. bulbosum), three main bands were detected. Based on the position of the primers and a tandem unit of 12 bp, these may correspond to tandem arrays with three, five, and six repeats, respectively. This is consistent with the sequenced LTRs of H. vulgare in which five repeats on average were found. In the species with H or X genomes, at least nine bands were detected. Assuming a tandem unit of 23 bp, the most intense bands must correspond to tandems containing 1, 2, 3, 4, and 7 repeats.
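The inference from band size to repeat number is simple arithmetic: a band equals the amplified flanking sequence plus a whole number of repeat units. The sketch below shows that relationship only. The flank length is not stated in the text, so `flank_bp` and the example value of 100 bp are placeholders chosen purely for illustration, and both function names are hypothetical.

```python
def expected_band(n_repeats: int, flank_bp: int, unit_bp: int) -> int:
    """Expected PCR product size for an array of n_repeats units.
    flank_bp is the amplified sequence outside the array, primers included."""
    return flank_bp + n_repeats * unit_bp

def repeat_count(band_bp: float, flank_bp: int, unit_bp: int) -> float:
    """Number of repeat units implied by an observed band size."""
    return (band_bp - flank_bp) / unit_bp

# Placeholder flank of 100 bp (hypothetical), 12-bp unit as in group I LTRs.
ladder = [expected_band(n, flank_bp=100, unit_bp=12) for n in (3, 5, 6)]
print(ladder)  # [136, 160, 172]
```

With a 23-bp unit the rungs of the ladder are almost twice as far apart, which is one way the two band patterns can be told apart on a gel.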

Fig. 3.

Variation in the copy number of the tandem arrays in the LTR. PCR amplification of the BARE-1 LTR region containing the tandem array was carried out and the products separated by agarose gel electrophoresis. An ethidium bromide–stained gel is shown. Multiple bands indicate the presence of multiple classes of LTRs differing in their number of repeats. The templates were from: 1, H. vulgare; 2, H. bulbosum; 3, H. murinum; 4, H. euclaston; 5, H. pusillum; 6, H. brachyantherum; 7, H. roshevitzii; 8, H. depressum; 9, H. marinum; 10, H. patagonicum. Size markers in bp are indicated on the right.

Two of the species have a slightly different pattern. In H. roshevitzii, the band corresponding to two repeats is more intense than it is in the other species. This species is the one with the lowest average number of repeats in the sequences. In contrast, in H. marinum the average size of the bands seems to be higher and bands corresponding to 10 repeats or more are detectable. Accordingly, the sequences in this species have a higher average number of repeats. These data confirm the overall sequence-based placement of LTRs into the I, H, and X groups and suggest that Y genomes contain LTRs more similar to those of the I group.

Variability in the Promoter Region and Differential Expression of BARE Subfamilies

Upstream from the tandem array, the region containing the two BARE promoters was the most variable of the LTR. For the LTRs examined, especially for the within-group or pairwise comparisons, more sequence divergence was found in the region surrounding the second TATA box than surrounding the first (Fig. 2). The amount of variability surrounding the second TATA box is lower within the LTR groups than between them, but nevertheless remains higher than that for the LTR as a whole. The region upstream of the first TATA box shows little variability in the I-group LTRs; this box was earlier shown to contribute little to BARE-1 expression in barley (Suoniemi et al. 1996).

To better understand the variability in this region, we PCR-amplified, cloned, and sequenced more genomic sequences from it. We designed primers in the conserved flanks surrounding this region and cloned, from H. vulgare, ten genomic sequences, five from each of two independent amplification reactions. A total of about 470 bp were compared, located between the two TATA boxes. When a phylogenetic tree was generated from alignments of these sequences together with Wis-2 (ALIGN_000706, available via the SRS tool from the EMBLALIGN database at http://srs.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-page+query+-libList+EMBLALIGN+-newId), the results were similar but not identical to those obtained previously (representatives are presented below in Fig. 5). All the sequences not from H. vulgare clustered together except hrosd and Wis-2, as we show above. The others form two clusters. One contains Wis-2 and H. vulgare hvBARE-2-5 and hvBARE-2-3 (the two LTRs of the same element, in accession AJ279072).

A low level of transcription in unstressed plant tissues has been demonstrated for several retrotransposons (Vicient et al. 2001) including BARE-1 (Suoniemi et al. 1996). Because the variable region includes a fragment previously shown to be necessary for the promoter activity of the BARE-1a LTR (Suoniemi et al. 1996), we decided to check the sequence conservation of the region in transcripts. We generated cDNAs from H. vulgare leaves and calli, PCR-amplified the variable region, and cloned and sequenced the products. A total of 17 sequences from leaves and 18 from calli were produced and aligned with the genomic DNA sequences described above (ALIGN_000706). The variable region of the alignment (Fig. 4) divides the sequences into three groups. The upper two in the alignment show good conservation across the alignment, whereas the third, which includes Wis-2, is considerably more variable. These three groups differ much more in the promoter than in other parts of the sequence.

Fig. 4.

Alignment of BARE LTR promoter sequences. Ten sequences from the complete alignment (ALIGN_000706) were selected from each of the two major clusters and five from the diverse group that is from the H and X genomes. The variable region between the TATA boxes is presented. Three groups of LTR sequences are indicated by boxes over the alignment, with BARE-1 and BARE-2 together comprising the group I LTRs. The sequence accession names are as in Table 1; HvCal sequences are derived from callus cDNA, HvLeaf sequences from leaf cDNA. Accessions for these are listed in ALIGN_000706 and in Materials and Methods.

A phylogenetic tree generated from the genomic and cDNA sequences (Fig. 5) defines three major groups. The non-barley Hordeum sequences, except hrosd, are separated from a cluster containing Wis-2, the hvBARE-2-5 and hvBARE-2-3 sequences, and 15 cDNAs, 10 from leaves and 5 from calli, with a bootstrap strength of 70%. The other clade, distinct by a bootstrap value of 99%, contains all the other barley genomic sequences, hrosd, 13 calli cDNAs and 7 from leaves. The tree topology was unaffected by gap score. The RNA sequences were interspersed among the H. vulgare DNA sequences, although their relative abundance in the sequence groups was different from the DNA clones. This may reflect differences between the groups in their present transcriptional activity and their historic integrational success.

Fig. 5.

Phylogenetic tree of the BARE LTR promoter sequences. The tree was constructed with the neighbor-joining method. Sequence distances were calculated according to the Kimura two-parameter method, not taking into account insertions or deletions. Accessions in the alignment are listed in ALIGN_000706 and in Materials and Methods. Sequences labeled Calli are derived from callus cDNA and those labeled Leaf are from leaf cDNA. Sequences included in Figure 4 are boxed. Bootstrap values greater than 35%, produced from 500 replicates, are shown. Horizontal line lengths are proportional to evolutionary divergence; those of the vertical lines are only for clarity.

The two strongly supported I-group clades or families were named BARE-1 and BARE-2. The RNA clones predominated in the BARE-2 clade, which contains only two genomic sequences but 15 of the RNA sequences. The BARE-1 family, in contrast, contains the other 20 RNA sequences together with 23 genomic sequences. Of the RNA sequences, a higher proportion from callus is found in the BARE-1 clade, whereas the BARE-2 cluster has relatively more leaf RNA clones. The third clade comprises a diverse group of sequences from the H and X LTR groups.

BARE-2 Is a Chimeric and Defective Retrotransposon

In order to clarify the nature of the BARE-2 elements, a full-length BARE-2 retrotransposon was sequenced from a barley BAC clone (AJ279072). This particular BARE-2 element is inserted into a Bagy2 retrotransposon. The sequences hvBARE-2-5 and hvBARE-2-3 correspond to the LTRs of this BARE-2 element. The full element is 8615 bp long and contains LTRs of 1811 and 1810 bp. A perfect target site duplication, GGTAC, was found at the insertion site, indicating that the integration event is relatively recent. When translated, however, the deduced BARE-2 polyprotein is interrupted by stop codons and frameshifts, as are many BARE-1 copies (Suoniemi et al. 1998), especially within the gag region. Furthermore, the putative ATG start codon of BARE-1a is deleted in BARE-2.

When the sequence of BARE-2 was compared with the related BARE-1 and Wis-2a elements (Fig. 6), a sharp dichotomy in sequence relatedness was observed. The untranslated leader region and all of the LTR of BARE-2, except the first 240 bases, were more similar to BARE-1 than to Wis-2a. However, the other parts of the element including the gag and pol (proteinase, integrase, and reverse transcriptase) coding regions were more similar to Wis-2a. The abrupt shift in similarity suggests that BARE-2 is chimeric, the product of a recombinational event.

Fig. 6.

The chimeric nature of BARE-2. The x-axis corresponds to the position in the BARE-2 retrotransposon. The value on the y-axis was calculated as the percentage of divergence from Wis-2 minus percentage of divergence from BARE-1. Insertions and deletions were considered as single mutational events. Divergence was calculated in a window of 60 nucleotides, advanced in steps of 6 nucleotides.
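The differential-divergence profile used to detect chimeras can be sketched as below. This is a simplified illustration under stated assumptions: the three sequences are already aligned to equal length, and columns with a gap in either sequence of a comparison are skipped rather than scored as single mutational events as in the analysis described here. The function names are hypothetical.

```python
def percent_divergence(a: str, b: str) -> float:
    """Percentage of differing positions between two aligned segments,
    ignoring columns where either sequence has a gap."""
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    return 100.0 * sum(x != y for x, y in compared) / len(compared)

def differential_profile(query, ref_a, ref_b, size=60, step=6):
    """Return (window start, divergence from ref_a minus divergence from ref_b).
    Positive values mean the query is closer to ref_b in that window.
    Defaults follow the 60-nt window and 6-nt step used in the text."""
    if not (len(query) == len(ref_a) == len(ref_b)):
        raise ValueError("sequences must be aligned to equal length")
    profile = []
    for start in range(0, len(query) - size + 1, step):
        end = start + size
        d_a = percent_divergence(query[start:end], ref_a[start:end])
        d_b = percent_divergence(query[start:end], ref_b[start:end])
        profile.append((start, d_a - d_b))
    return profile
```

With Wis-2 as `ref_a` and BARE-1 as `ref_b`, a sustained change of sign along the profile marks a candidate recombination breakpoint.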

The deletion in BARE-2 of the segment containing the start codon of the polyprotein provided a means of designing primers specific for BARE-1 and BARE-2 elements. These were used with genomic DNAs of other Triticeae members, including Elymus repens and various Hordeum and Triticum species (Fig. 7). By this standard, both BARE-1 and BARE-2 appear to be present in all species tested except in Elymus repens, which failed to amplify a BARE-2 product.

Fig. 7.

Presence of BARE-2 elements in species of the genera Hordeum, Triticum, and Elymus. PCR amplifications were carried out using primers that specifically amplify sequences of the gag region of BARE-1 (1) and BARE-2 (2) elements. The figure shows a negative image of an ethidium bromide–stained agarose gel following electrophoresis. Wat., water control; H.vu., Hordeum vulgare; H.sp., H. spontaneum; H.mr., H. murinum; H.eu., H. euclaston; H.pu., H. pusillum; H.br., H. brachyantherum; H.er., H. erectifolium; H.bo., H. bogdanii; H.mt., H. muticum; H.st., H. stenostachys; H.pa., H. patagonicum; H.ro., H. roshevitzii; H.de., H. depressum; H.ma., H. marinum; T.du., T. durum; T.ae., T. aestivum; E.re., Elymus repens.

Chimeric LTRs in Hordeum

Comparisons between BARE-2 and the related BARE-1 and Wis-2a sequences suggested that BARE-2 elements might generally be chimeras. This was tested further. We examined the hrosd sequence because, although it is from H. roshevitzii of the H genome set of species, it belongs to group I, which is mainly composed of H. vulgare sequences. The sequence was compared with the consensus sequences for the I, H, and X groups, using a sliding window. The hrosd sequence is more similar to H LTRs in the first 520 bases and in the last 200, but is more similar to those of group I in the more variable middle region (Fig. 8A). This suggests that the hrosd LTR is a chimera between the I and the H LTR types. In order to confirm this, we performed a phylogenetic analysis using different parts of the alignment between the hrosd and consensus sequences (not shown). The trees generated are consistent with the previous results; in trees based on the central part of the alignment, hrosd fits into the I clade, whereas in trees using the LTR extremes, it clusters with the H clade.

Fig. 8.

Evidence for recombination between two LTR sequences belonging to different LTR groups. The values on the x-axis correspond to the position in the LTR alignment. The values on the y-axis were calculated as the percentage of divergence from the consensus sequences of groups I (dotted line) and H (bold line). Insertions and deletions were considered as single mutational events. Divergence was calculated for a window of 60 nucleotides, advanced in steps of 6 nucleotides. The LTR sequence is indicated in each graph.

Similar analyses were performed with all other LTR sequences in the study, looking for new cases of chimeric sequences. Five more were detected, with one shown in Fig. 8B. In hvbaca2, approximately the first 250 bp are more similar to group H members, and the rest to group I. The same situation was found in hvbacb2-5, hvbacb2-3, hvBARE-2-5, and hvBARE-2-3. These results suggest the existence of recombination between LTRs of the different groups to create chimeric LTRs.

Tandem Multimeric BARE-1 Insertions

Chimeric retroelements and LTRs of the kind described above may arise through strand switching during reverse transcription, by crossovers or gene conversion, or via intrachromosomal recombination. In the last case, recombination between LTRs of a single element generates solo LTRs. However, if recombination were to take place between the right LTR of one element and the left LTR of another element downstream from the first, a tandem multimeric element could be generated. This would consist of two internal regions and three LTRs, one of which abuts both coding regions. We tested and confirmed the presence of tandem copies of BARE-1, sharing one LTR, in the barley genome by PCR.

Outward-facing primers matching the internal regions (respectively, PBS and RNase H) of BARE-1 were designed (Fig. 9). From these primers, two adjacent or nearby full-length elements would yield a product containing two LTRs and possibly intervening genomic DNA of varying size. A nested insertion of one full-length BARE-1 into the LTR of another would generate products distinguishable by their sequence, organization, and size. However, the PCR amplification did not reveal such events but instead produced a single band of the size expected for a tandem multimeric structure produced by LTR-LTR recombination. Sequencing of one of these fragments confirmed that it corresponds to a shared LTR located between two internal domains.

Fig. 9.

Tandem BARE-1 elements. The positions of the primers, assigned numbers, are shown, respectively, on an intact retrotransposon (A), labeled for the LTR and the internal (“ORF”) domains, and on a tandem recombinant structure (B). The presence of BARE-1 internal domains flanking single LTRs in the genome of Hordeum vulgare was confirmed (C) by PCR using primers 3 and 4, matching the internal region of BARE-1. The PCR amplification produced a single band of expected size (lane 3–4), which is less abundant than those corresponding to full-length LTRs independent of whether they are from solo LTRs, full-length elements, or tandem elements (lanes 2–4). Lanes 1–2 and 1–3 are, respectively, from all LTRs and the left LTRs of full-length elements. Lane 3–4 was loaded with fivefold more PCR product than the others. The cloning and sequencing of products from lane 3–4 confirmed their origin from tandem structures (data not shown). Products of quantitative PCR are shown in (D). The left side of the gel displays lanes with 10–21 cycles of amplification for the tandem recombinant structure. The right side displays 4–15 cycles of amplification for a BARE LTR region. The markers visible on the left correspond to 1200 and 1300 bp.

The prevalence of the tandem structures was estimated by quantitative PCR. The presence of internal domains flanking an LTR was detected with PBS and PPT primers. The abundance of this structure was compared to that of BARE LTRs using conserved LTR primers. Using the two reaction products from genomic DNA as templates, we established that the two primer pairs were matched in their efficiency of amplification. The primers were also tested for amplification efficiency over a range of annealing temperatures (55°C to 65°C) and were found to be specific, comparable in efficiency, and robust over that range. The reactions were analyzed only over their logarithmic amplification range (Fig. 9D; cycles 6 to 14 for the LTR, 11 to 19 for the tandem repeat). The BARE LTRs were on average 28.4 times more prevalent in the target DNA (barley cv. Sultan) than were LTR-internal domain tandem structures. The BARE LTRs were previously estimated at about 1.3 × 10⁵ copies per haploid genome equivalent in barley. Based on this, about 4.6 × 10³ such tandem structures are present in the genome.
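The arithmetic behind the copy-number estimate can be reproduced with a short Python sketch. The two input values are taken from the text; the function and constant names are hypothetical, and the calculation assumes only that the tandem copy number equals the LTR copy number divided by the measured fold difference.

```python
# Values stated in the text.
LTR_COPIES_PER_HAPLOID_GENOME = 1.3e5   # earlier estimate for BARE LTRs
LTR_TO_TANDEM_FOLD = 28.4               # mean fold difference from quantitative PCR


def tandem_copy_estimate(ltr_copies, fold_difference):
    """Copies of the tandem structure implied by an LTR copy number
    and the fold excess of LTRs over tandem structures."""
    if fold_difference <= 0:
        raise ValueError("fold difference must be positive")
    return ltr_copies / fold_difference


estimate = tandem_copy_estimate(LTR_COPIES_PER_HAPLOID_GENOME,
                                LTR_TO_TANDEM_FOLD)
print(f"{estimate:.1e}")  # 4.6e+03
```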

Discussion

Sequence Heterogeneity in BARE-1 LTR Sequences

Replication of retrotransposons is very error-prone, due to the lack of proofreading repair activity by RNA polymerase and reverse transcriptase. As a consequence, the replication of a single retrotransposon can generate a population of closely related, but not identical, sequences resembling the “quasi-species” populations described for RNA viruses (Domingo et al. 1985; Casacuberta et al. 1995). Furthermore, individual copies of retrotransposons are not expected to be under strong selection to maintain function and would accumulate mutations at the neutral rate following their integration. However, those retrotransposon copies with appropriate expression patterns, efficient mechanisms of replication and integration, and nondeleterious integration preferences will tend to predominate in the population over time, leading to a feedback loop of purifying selection for functionality (Suoniemi et al. 1998).

If replication errors and mutational drift were the only factors responsible for the sequence variation found in the BARE-1 LTR, it might be expected that the variation would be randomly distributed throughout. Our results indicate clearly that this is not the case. The localization of the nucleotide variation within the BARE-1 LTR indicates that selective pressure is directed specifically. This can be understood in terms of the LTR function and suggests that there is sufficient transcriptional and integration activity to provide a selective feedback loop for these retrotransposons.

A retrotransposon under selection will display conservation in those regions that are essential for replication. Although LTRs are the most rapidly evolving region of the LTR retrotransposons (Kulguskin et al. 1981; Lankenau et al. 1990; Lyubomirskaya et al. 1990; Mizrokhi and Mazo 1990), they also contain some functionally important regions: the terminal segments, recognized by integrase, the promoter and enhancer elements, and the RNA processing signals. The data here show that the termini of BARE LTRs are conserved. Two functional TATA boxes have been detected in BARE-1 LTRs and the regions important for TATA2-promoter activity have been determined (Suoniemi et al. 1996). The region upstream of the TATA1 promoter, which is relatively inactive in transient assays (Suoniemi et al. 1996), is well conserved between the BARE-1 and BARE-2 families. However, the region of TATA2 necessary for its activity is the least conserved region of the LTR, not only between the families, but also within them. Similar results were reported for Tnt1 (Casacuberta et al. 1995; Vernhettes et al. 1998), copia (Matyunina et al. 1996), and retroviruses (Carpenter et al. 1991; Maury et al. 1997; Montano et al. 1997), although these are single-promoter systems.

High variability in the promoter regions opens the possibility of different transcriptional profiles, a form of niche differentiation, for retrotransposon subfamilies. The sequences reported here for the promoter region show that although BARE-1 comprises 92% of the genomic BARE copies, it contributes only 72% to the total number of cDNAs derived from callus tissue and just 41% of the total cDNAs sequenced from leaves. Taken together with the distinct promoter sequences of BARE-1 and BARE-2, these data indicate that these two retrotransposon families are differentially regulated, and that the pools of factors regulating BARE transcription vary from tissue to tissue.

The predominance of particular LTR groups in each Hordeum genome type suggests that certain forms were favored during the amplification of the BARE families, which happened, at least to a great degree, after speciation. In cultivated barley, which we have analyzed more extensively than the other species, all three groups (I, H, X) were found. Therefore, the groups appear to predate the divergence of the Hordeum genome types, but in each genome type one group has come to predominate. The higher proportion of full-length BARE-1 elements (Vicient et al. 1999) and a lower LTR sequence diversity among the I-group elements in cultivated barley suggest that growth in BARE copy number in barley may have been more recent and more pervasive than in the other species investigated. A conceptually similar phenomenon has been reported for the 1731 retrotransposon family of Drosophila (Kalmykova et al. 2004). Variants with both altered transcriptional profiles due to changes in the LTR sequence and altered translational strategy due to loss of frameshifting have supplanted the more ancient forms.

Chimeric Retrotransposons Generated Through Template Switching

In addition to sequence variation derived from replicational errors, retrotransposons are subject to recombinational mutagenesis. Four forms of recombination can be distinguished: template-switching during reverse transcription to generate a chimeric cDNA; integrase-catalyzed integration of one element into another; LTR-LTR recombination, generating either solo LTRs or tandem arrays of LTRs and internal domains; and allelic recombination and gene conversion between homologous chromosomes. Ectopic, interchromosomal recombination between retrotransposons is likely suppressed, due to the disruption of chromosomal integrity and the consequent lethality it would cause. The analyses of BARE retrotransposons reported here provide evidence for template switching and for the generation of tandem arrays through LTR-LTR recombination.

The full-length BARE-2 retrotransposon that is described here displays abrupt switches in sequence similarity between two related families of elements, BARE-1 and Wis-2. Hence, it appears that this BARE-2 is a mosaic or chimeric element generated by strand switching during replication. Retrotransposons (Gabus et al. 1998; Feng et al. 2000) and retroviruses (Hu and Temin 1990) are known to package two RNA templates. In yeast, 14 of the 32 elements previously identified as Ty1 are actually Ty1/Ty2 hybrid elements (Jordan and McDonald 1998, 1999); template switching between packaged Ty1 RNAs occurs with a high frequency (Wilhelm et al. 1999). Recombination within the LTR that affects the regulatory region of the Drosophila retrotransposon 412 has recently been reported (Mugnier et al. 2005). In plants, the sole example of a retrotransposon chimera is the presence of segments of the nonautonomous Dasheng element in retrotransposon RIRE2 (Jiang et al. 2002a).

Template switching is better explored in retroviral than in retrotransposon replication and is a well-known phenomenon (reviewed by Mikkelsen and Pedersen 2000). Although template switching takes place as a normal part of reverse transcription, during the transfer of (−)-strand and (+)-strand strong-stop DNAs (Hu and Temin 1990; Marr and Telesnitsky 2003), it can occur by the jumping of the growing DNA strand to the co-packaged alternative template at many places across the retroviral genome (Jetzt et al. 2000; Moumen et al. 2001). The process can take place despite mismatched nucleotides (Yu et al. 1998; Marr and Telesnitsky 2003), but template secondary structure is important in determining its likelihood (Mikkelsen and Pedersen 2000; Moumen et al. 2001). The process is thought to increase retroviral fitness through the creation of diversity and altered tropisms and by repair of nonfunctional regions (Mikkelsen and Pedersen 2000). The BARE-2 and other chimeric elements reported here support the observation that template switching during replication also may generate diversity for plant retrotransposons, conferring advantages similar to those gained by retroviruses.

In both hrosd and hvbaca2, sharp increases in sequence divergence relative to the consensus occur near the start of transcription mapped for TATA1 (Suoniemi et al. 1996). The position of the shift away from the consensus sequence suggests it could have arisen through strand switching to a heterologous RNA during reverse transcription. An alternative is gene conversion. Although there is no evidence that gene conversion favors LTRs, at least in yeast it appears to play an important role in retrotransposon evolution (Jordan and McDonald 1999). The size of two conversion tracts in maize was recently estimated at 0.9–1.5 kb (Dooner and Martínez-Ferez 1997), which is similar to the central, divergent region of the hrosd sequence. However, although gene conversion may be important in the highly recombinogenic yeast genome, it appears to play a role in only a minority of gene families in the genomes of barley and other cereals (Zhang et al. 2001).

Two lines of evidence suggest that BARE-2 is an active retrotransposon despite its nonfunctional protein-coding domain. First, the terminal direct repeats (TDRs) of the BARE-2 insertion into Bagy-2 are identical and its LTRs are >99% similar, implying a recent integration event. Second, it is more transcriptionally active, especially in leaves, than BARE-1, which is an active element. The 5-bp TDRs, furthermore, indicate that the BARE-2 copy that was analyzed is not a consequence of post-insertional recombination. The prevalence and conservation in plants of nonautonomous retrotransposons such as TRIM (Witte et al. 2001) and Dasheng elements (Jiang et al. 2002a, b), the latter of which are members of the insertionally polymorphic LARD class (Kalendar et al. 2004), show that possession of an open reading frame is not required for the evolutionary success of a retrotransposon.

Recombination Between BARE-1 Elements

The means by which retrotransposons are reverse transcribed produces identical LTRs in the cDNA copy that is integrated. Direct repeats were earlier shown experimentally to recombine with an additive frequency, dependent on their length (Puchta and Hohn 1991). Short TDRs normally flank full-length retroelements and are produced as a consequence of integration by the staggered cuts made at the target site by integrase. The TDRs found on the flanks of solo LTRs in the barley genome have been interpreted as resulting from recombination between LTRs of a single retrotransposon, deleting everything in between (Shirasu et al. 2000). The very high ratio of solo LTRs to full-length BARE-1 elements in barley implies that LTR-LTR recombination is frequent for this retrotransposon (Vicient et al. 1999).

Recombination between LTRs of two different individual elements can generate a range of products, depending on which two LTRs, distal or proximal in relationship to each other, recombine. Recombination between the right LTR of a nested element with the left LTR of the surrounding element was reported for a single instance in barley (Shirasu et al. 2000). Here, we have demonstrated that LTR-LTR recombination in barley generates repeat units consisting of two internal domains flanking a single, recombinant LTR. This kind of structure has been found in yeast for Ty1 and Ty5 (Ke and Voytas 1997; Kim et al. 1998). Our estimates by quantitative PCR show that these structures are not rare, but are present in about 4.6 × 10³ copies per haploid genome. Furthermore, the tandem structure represents only one of four possible outcomes of the recombination of an LTR from one element with that from another (the others being a solo LTR and in two cases a single intact element). In silico analyses showed that, of the solo LTRs of 11 low- and middle-copy-number families in the rice genome, 11% appear to be the product of interelement recombination (Ma et al. 2004), though these are less frequent in Arabidopsis (Bennetzen et al. 2005). In the orthologous regions of the rice subspecies that have been examined, Ma et al. (2004) estimate that half of the LTR retrotransposons inserted over the last 5 million years have undergone LTR-LTR recombination.
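The four outcomes of inter-element LTR-LTR recombination can be enumerated with a toy model in Python. This is an illustrative sketch under stated assumptions, not an analysis from the study: the two elements are assumed to lie in the same orientation on one chromosome, recombination is modeled as deletion of everything between the two participating LTRs with one recombinant LTR retained, and the segment labels and function names are hypothetical.

```python
LTR = "LTR"

# Upstream element A and downstream element B, same orientation,
# separated by a stretch of genomic DNA.
LOCUS = [LTR, "internal_A", LTR, "genomic", LTR, "internal_B", LTR]
A_LTRS = {"A left": 0, "A right": 2}
B_LTRS = {"B left": 4, "B right": 6}


def recombine(locus, i, j):
    """Delete the DNA between the LTRs at indices i < j, keeping one LTR."""
    if not (0 <= i < j < len(locus)) or locus[i] != LTR or locus[j] != LTR:
        raise ValueError("i and j must index two distinct LTRs with i < j")
    return locus[:i + 1] + locus[j + 1:]


def classify(product):
    """Name the product by the number of internal domains that remain."""
    internal = sum(1 for seg in product if seg.startswith("internal"))
    return {0: "solo LTR",
            1: "single intact element",
            2: "tandem element"}[internal]


outcomes = {
    (name_a, name_b): classify(recombine(LOCUS, i, j))
    for name_a, i in A_LTRS.items()
    for name_b, j in B_LTRS.items()
}

for pair, result in outcomes.items():
    print(pair, "->", result)
```

Under these assumptions only the right LTR of the upstream element paired with the left LTR of the downstream element yields the tandem structure; the left-with-right pairing leaves a solo LTR, and the remaining two pairings each leave one intact element.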

Two alternatives for the production of tandem retrotransposon structures have been proposed for yeast. One involves the demonstrated recombination between cDNAs and integrated copies (Ke and Voytas 1999) by a single-strand annealing mechanism in silent DNA. The other possibility is recombination between a one-LTR circle, such as generated by LTR-LTR recombination within a single element, and an integrated copy (Kim et al. 1998). However, such structures appear rare in the yeast genome (Kim et al. 1998); experimental frequencies are only high when mutations in the integrase or LTR termini block normal integration (Sharon et al. 1994). Hence, the significance for plants of these alternative mechanisms for the generation of tandem elements remains to be clarified.

The recombination between LTRs of a single element to generate solo LTRs has been suggested as a mechanism to counteract constant genome expansion due to retrotransposon propagation (Vicient et al. 1999; Shirasu et al. 2000; Bennetzen 2000; Kalendar et al. 2004). However, this process is not fully efficient because it removes only a single internal domain and LTR, together ∼7.2 kb for BARE-1, and leaves behind a potentially long LTR (1.8 kb for BARE-1). Recombination between LTRs of different elements can remove large segments of intervening DNA in one step to generate either solo LTRs with dissimilar flanking TDRs (terminal direct repeats) or tandem elements of the sort demonstrated here.

Recombination between LTRs of different elements to generate a tandem structure, however, can be very deleterious if it removes intervening genes as well. The “gene island,” “repeat sea” organization typical of large cereal genomes (Panstruga et al. 1998; Feuillet and Keller 1999; Shirasu et al. 2000; Rostoks et al. 2002; Wei et al. 2002; Park et al. 2004) limits the number of genes interspersed between retrotransposon pairs that could be subject to such a mechanism. Although tandem structures are almost 30-fold less prevalent than solo LTRs, of which there are about 6 × 10⁵ in the barley genome, their number is about 5% that of the estimated 3 × 10⁴ genes in the genome. Furthermore, we would not have detected tandem structures that have been disrupted by further recombination events or by nested insertions. Dispersion of retrotransposons and genes would have subjected at least 1600 genes to loss through recombination. Hence, genome organization where LTR-LTR recombination is frequent may be driven by simultaneous pressures to limit genome expansion and to retain cellular genes.