Introduction

Class 1 (RNA) transposable elements, or retrotransposons, propagate through a replicative mechanism called retrotransposition, in which an RNA intermediate is reverse-transcribed into a cDNA that can integrate in the genome (for review see Capy et al. 1998). There are two types of retrotransposons: those bordered by long terminal direct repeats (LTRs) are called LTR-retrotransposons while those without LTRs but with poly(A) (or another form of simple nucleotide repeat) at their 3′ end are called non-LTR retrotransposons (Doolittle et al. 1989; Xiong and Eickbush 1990). Non-LTR retrotransposons include long interspersed elements (LINEs) and short interspersed elements (SINEs). LINEs encode a protein(s) that catalyzes their retrotransposition through a mechanism termed target-primed reverse transcription (TPRT) (Eickbush 1992; Luan et al. 1993). SINEs are short (<500-bp), nonautonomous elements with no coding capacity and their retrotransposition relies on the protein machinery supplied in trans by LINEs (Dewannieux et al. 2003; Kajikawa and Okada 2002).

A typical SINE consists of three regions: a 5′ head containing conserved promoter motifs that are recognized by RNA polymerase III (Pol III), a 3′ tail usually consisting of poly(A) and an internal region (Okada 1991a, b). Head regions of most SINEs are ancestrally derived from tRNA genes. Accordingly, transcripts from this region can potentially fold into cloverleaf-like secondary structures reminiscent of their tRNA progenitors. Two exceptions are the primate Alu family and the rodent B1 family, which have head regions derived from 7SL RNA genes (Ullu and Tschudi 1984; Weiner 1980). The internal regions of different SINE families are highly variable in size as well as sequence. In several cases, sequence similarity between the internal region of a SINE and the 3′ untranslated region of a LINE has been noted (Ogiwara et al. 1999; Ohshima et al. 1996; Okada and Hamada 1997; Okada et al. 1997; Terai et al. 1998). This feature led to the hypothesis that ancestral SINEs were generated when the 3′ end of a LINE was fortuitously translocated (by retrotransposition or recombination) immediately downstream of a tRNA gene, and the resulting chimeric sequence was mobilized and amplified by LINE-encoded protein(s) (Ohshima et al. 1993; Okada and Hamada 1997).

Several SINE families consist of subfamilies, or groups of elements within a family that share a set of diagnostic nucleotides. Examples include the Alu family in primates (Shen et al. 1991), the AFC family in cichlid fish (Takahashi and Okada 2002), and the S1 family in Brassica (Lenoir et al. 1997). In several cases subfamilies appear to have exchanged their head and internal regions, resulting in a new “mosaic” subfamily (Kass et al. 1995; Lenoir et al. 1997; Takahashi and Okada 2002; Zietkiewicz and Labuda 1996).

Integration of SINEs into chromosomes occurs via a TPRT process, where a LINE-encoded protein cleaves the bottom strand of target DNA (called “5′ nicking”) and utilizes the exposed 3′ DNA end to prime the reverse transcription of SINE RNA into the first cDNA strand. Another cleavage is then made on the top strand of the target DNA (called “3′ nicking”) and the synthesis of second strand cDNA proceeds (Luan et al. 1993). Following the integration of SINEs and the repair of single-stranded gaps, a direct repeat of target site sequence (called the target site duplication; TSD) is generated. Previous analyses of the sequence context near several plant and animal SINE families revealed sequence motifs at the 5′ and/or 3′ nicking sites (Jurka 1997; Lenoir et al. 2001; Tatout et al. 1998).

SINEs are widespread in all three eukaryotic kingdoms but appear to be more diverse and abundant in animals than in plants or fungi. Numerous families of SINEs have been described in animals and some have accumulated to extremely high copy numbers. For example, the human genome harbors ∼1.5 million Alu SINEs, accounting for ∼10% of the genome (Lander et al. 2001; Schmid 1996). In contrast, only a few families of SINEs have been reported in plants, where their copy number is generally much lower. Plant SINEs include p-SINE1 in rice (∼6500 copies), TS in tobacco (∼50,000 copies), S1 in Brassica (∼500 copies in B. napus), and AtSN1/RAthE3, AtSN2/RAthE1, and RAthE2 in Arabidopsis thaliana (∼70, ∼130–150, and ∼60 copies, respectively) (Deragon et al. 1994; Lenoir et al. 2001; Myouga et al. 2001; Umeda et al. 1991; Yoshioka et al. 1993).

Here we describe a new family of SINEs, named BoS, that is widespread in the Brassicaceae family and present at ∼2000 copies in B. oleracea. Over 400 full-length elements were identified from a partial B. oleracea genomic database (generated by The Institute for Genomic Research; TIGR) and grouped into 15 distinct subfamilies. BoS elements were characterized with regard to their modular primary structure, secondary structure, and target site preference as well as their distribution in the Brassicaceae family. Overall, the BoS family exhibits several interesting features, including frequent formation of mosaic subfamilies through sequence exchange between existing subfamilies and an ability of the head regions of their RNAs to form distinctive hairpin-like secondary structures. BoS is the most abundant SINE family identified in the Brassicaceae and is one of the most diverse families from any organism identified to date with regard to the number and sequence divergence of its subfamilies.

Materials and Methods

Database Searches and Sequence Analyses

Computer-assisted searches to identify BoS elements in B. oleracea were performed against the TIGR B. oleracea genome database (brassica prelim sequences; http://tigrblast.tigr.org/euk-blast/index.cgi?project=bog1) with no filter but otherwise using default parameters. Preliminary B. oleracea sequence data were obtained from the TIGR Web site at www.tigr.org. Blastn searches to identify BoS elements in other Brassicaceae species were performed at the National Center for Biotechnology Information Web site (http://www.ncbi.nlm.nih.gov; databases NR and EST; no filter but otherwise using default parameters). A. thaliana tRNA sequences were downloaded from the Genomic tRNA Database (http://rna.wustl.edu/GtRDB/At/At-align.html) (Lowe and Eddy 1997). Multiple sequence alignments were performed with the CLUSTALW server available at European Bioinformatics Institute (http://www.ebi.ac.uk/clustalw) using default parameters. Consensus sequences were derived based on simple majority rule. RNA secondary structures were predicted with the Vienna RNA Secondary Structure Prediction server (http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi) (Hofacker 2003). Phylogenetic trees were generated based on the neighbor-joining method, using PAUP* version 4.0b8 with default parameters (Swofford 1999). Bootstrap values were calculated from 1000 replicates.
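The majority-rule step used to derive consensus sequences can be illustrated with a minimal sketch. The tie and gap handling shown here (ties reported as N, gap-majority columns dropped) are assumptions made for illustration and are not specified in the text; the function name and the toy alignment are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned, gap="-"):
    """Return a simple-majority consensus of equal-length aligned sequences.

    Assumptions (not taken from the paper): a column whose two most
    frequent symbols tie is reported as 'N', and a column in which the
    gap symbol is the most frequent is dropped from the consensus.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    consensus = []
    for column in zip(*aligned):
        ranked = Counter(base.upper() for base in column).most_common()
        top_base, top_count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_count
        if tied:
            consensus.append("N")
        elif top_base != gap:
            consensus.append(top_base)
    return "".join(consensus)


# Hypothetical toy alignment, for illustration only.
toy_alignment = ["AACCGAG", "AACCAAG", "AACCGGG", "AAC-GAG"]
toy_consensus = majority_consensus(toy_alignment)
```
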

Determination of BoS Target Site Preference

Sequence context of BoS insertion sites was analyzed as previously described (Jurka 1997). Briefly, the 5′ flanking sequences of BoS elements, including TSDs and 15 additional bases immediately upstream of TSDs, were adjusted so that all TSDs started at the same position, and the 3′ flanking sequences including TSDs and 15 additional bases immediately downstream of TSDs were adjusted so that all TSDs ended at the same position. The nucleotide occurrences at each position were then determined. Chi-square analysis of BoS target site preference was performed as described (Tatout et al. 1998) using a significance level of p < 0.001 (at 5′ nicking site) or p < 0.01 (at 3′ nicking site) for 3 degrees of freedom (df).
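The per-position counting and chi-square test described above can be sketched as follows. This is an illustrative sketch only: it assumes equal expected frequencies for the four bases unless a background composition is supplied, which may differ from the expectation used by Tatout et al. (1998). The function names and toy windows are hypothetical; the critical values are the standard upper-tail values of the chi-square distribution for 3 df.

```python
from collections import Counter

BASES = "ACGT"

# Standard upper-tail critical values of the chi-square distribution, 3 df.
CRITICAL_3DF = {0.01: 11.345, 0.001: 16.266}


def position_counts(windows):
    """Count bases in each column of equal-length, pre-aligned windows."""
    return [Counter(column) for column in zip(*windows)]


def chi_square(counts, expected_freq=None):
    """Chi-square statistic for one position (4 bases, hence 3 df).

    Assumption: uniform expected frequencies (0.25 each) unless a
    background composition is passed in as expected_freq.
    """
    if expected_freq is None:
        expected_freq = {base: 0.25 for base in BASES}
    total = sum(counts[base] for base in BASES)
    return sum(
        (counts[base] - total * expected_freq[base]) ** 2
        / (total * expected_freq[base])
        for base in BASES
    )


# Hypothetical windows already adjusted so the nicking site is aligned.
toy_windows = ["TAAA", "CAAA", "GAAA", "TAAA"]
per_position = [chi_square(c) for c in position_counts(toy_windows)]
significant_at_001 = [x > CRITICAL_3DF[0.01] for x in per_position]
```

With real data, each position would be compared against the threshold appropriate to the site (p < 0.001 at the 5′ nicking site, p < 0.01 at the 3′ nicking site).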

Results

Identification of a New Family of SINEs in B. oleracea

In the process of analyzing TEs in the TIGR B. oleracea genomic sequence database, a 194-bp segment (clone JBOGL76RB; position 165–358) was identified as an insertion into a LINE element. This segment has several structural features of SINEs, including a 3′ poly(A) tract (22 bp) and a flanking direct repeat (12 bp with one mismatch), but does not share significant similarity with previously described SINEs. The sequence of this segment was used as query in a Blastn search to determine if related elements exist in the B. oleracea genome. Fifty hits were identified (E value <10⁻³) that displayed ∼75–100% identity to the query over their entire length, ended at the 3′ end with a poly(A) tract, and were mostly flanked by short direct repeats (see below). In addition, 120 hits displayed less sequence identity (60–75% identical) to either the entire query sequence or its 5′ end (∼80–90 bp), suggesting the existence of distantly related elements. These hits were resolved into groups based on sequence identity (not shown) and sequences from each group were used as queries to identify additional elements by reiterated Blastn searches. In this way, 377 additional full-length hits were identified. None of the 428 elements (including the original element, the 50 elements identified in the initial Blastn search, and 377 elements identified in subsequent searches) shared significant sequence similarity with previously described SINEs outside of the Pol III promoter motifs (see below). Therefore, these elements represent a new SINE family that was named BoS for B. oleracea SINE. In addition to recovering full-length elements, the Blastn searches also detected a large number (∼350–400) of elements that were truncated during cloning (located on clone ends) and were not analyzed further. Because each truncated element, statistically, represents half an element, the available database (covering ∼30% of the B. oleracea genome) contains ∼600–650 BoS elements. Therefore, the copy number of BoS elements was estimated to be ∼2000 in the entire genome.
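The arithmetic behind this estimate can be reproduced with a short sketch. The counts (428 full-length elements, ∼350–400 truncated elements, ∼30% genome coverage) are taken from the text; the function name is hypothetical, and the sketch assumes the database is a uniform sample of the genome.

```python
def estimate_copy_number(full_length, truncated, genome_fraction):
    """Extrapolate a genome-wide copy number from a partial database.

    Each truncated element is counted as half an element, as in the text.
    Assumes the database samples the genome uniformly.
    """
    in_database = full_length + 0.5 * truncated
    return in_database, in_database / genome_fraction


# Lower and upper bounds on the number of truncated elements.
low_db, low_genome = estimate_copy_number(428, 350, 0.30)
high_db, high_genome = estimate_copy_number(428, 400, 0.30)
```

Under these inputs the database total falls between 603 and 628 elements and the genome-wide figure between roughly 2010 and 2090, consistent with the rounded estimate of ∼2000 copies.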

The nucleotide sequences of the 428 full-length BoS elements were compared by CLUSTALW multiple alignment and used to generate an unrooted phylogenetic tree (available upon request). In this way, 15 subgroups were resolved that were well supported by bootstrap values. These subgroups were defined as subfamilies and designated BoS_a through BoS_l (subfamily features summarized in Table 1). Intrasubfamily sequence identity (defined as average nucleotide sequence identity of subfamily members to their respective consensus sequences) ranged from ∼80% (BoS_e) to ∼98% (BoS_ai2) and copy numbers ranged from a few (∼3, BoS_l) to ∼270 (BoS_ab) (extrapolated to the whole genome). The consensus sequences of BoS subfamilies are provided in supplemental data.

Table 1 Summary of BoS elements in B. oleracea

Primary Structure of BoS Elements

BoS elements exhibit a characteristic SINE structure, including a 5′ head region (∼80 bp; containing putative promoter motifs recognized by RNA polymerase III) (Fig. 1; A and B boxes), an internal region (length varies from 50 bp in BoS_h to 140 bp in BoS_f), and a poly(A) tract (average length, ∼19 bp). Comparison of the consensus sequences of BoS subfamilies revealed significant similarity in their 5′ head regions (on average 66% identity), with the highest level of sequence conservation in blocks designated 1–5 (Fig. 1). Block 1 is located at the 5′ end, where nearly all subfamilies begin with the sequence 5′-AACCRRG-3′. Blocks 2 and 4 contain the A and B boxes of Pol III promoter motifs that are highly similar to those found in several previously described plant SINEs as well as A. thaliana tRNA genes. Comparison of the consensus sequences of BoS head regions to 629 A. thaliana tRNA genes showed that BoS elements are distantly related to tRNAGln genes (anticodon CTG and TTG) and tRNAPro genes (anticodon AGG, CGG and TGG) (∼50–60% identity; Fig. 1). While the highest level of sequence similarity was found in blocks 2 and 4, weaker but significant sequence similarity was also detected in other regions. However, neither sequence similarity (Fig. 1) nor phylogenetic analysis (not shown) could unambiguously determine which tRNA gene is more closely related to BoS elements, and thus the direct ancestor of BoS remains to be identified.

Figure 1

Primary structure of BoS subfamilies. A schematic representation of element structure is shown above the alignment. Alignment of regions conserved in all subfamilies is shown and is discussed in the text. Regions that are highly variable among subfamilies are not shown. Blocks of highly conserved residues (numbered 1–6) are noted by horizontal bars above the alignment, while the A and B boxes of RNA pol III promoter motifs are indicated by horizontal bars below the alignment. Arabidopsis tRNAGln (CTG, TTG) and tRNAPro (AGG, CGG, TGG) genes are included in the alignment of the head region. The A and B boxes and 3′ termini of SINEs previously described in Brassicaceae (including AtSN1/RAthE3, AtSN2/RAthE1, RAthE2, and S1) are included for comparison.

The only significant similarity shared by all BoS subfamilies in their internal region was a 5-bp motif (TTATC) located at the extreme 3′ end, immediately upstream of the poly(A) tract (block 6; Fig. 1). Furthermore, the two nucleotide residues upstream of this motif are usually purines (Fig. 1). A similar motif is also found in all known SINEs from the Brassicaceae family, including AtSN1/RAthE3 (GATATTC), AtSN2/RAthE1 (GGTATTC), RAthE2 (GGTTATC) from A. thaliana and S1 from Brassica (GTTAATC) (Deragon et al. 1994; Lenoir et al. 2001; Myouga et al. 2001), but is absent from the tobacco TS and the rice p-SINE1 families (Umeda et al. 1991; Yoshioka et al. 1993). Therefore, it appears that all poly(A) SINEs from Brassicaceae described to date share a 5′-RRTWWTC-3′ motif at the 3′ termini of their internal regions.

Mosaic BoS Subfamilies

In several instances the head region of one BoS subfamily was found to be highly similar to the head region of a second subfamily, while its internal region is highly similar to a third subfamily. For example, BoS_a, BoS_ab, and BoS_ai2 have nearly identical head regions but distinct internal regions (Fig. 2A). Significantly, the internal region of BoS_ab is nearly identical to BoS_b, whereas BoS_ai2 shares its internal region with BoS_i2 (Fig. 2A). Note that the head regions of BoS_b and BoS_i2 are only weakly similar to each other and to the head regions of BoS_a/ab/ai2 (42–72% identity). The relationships among these subfamilies determined from their head and internal regions are clearly inconsistent with each other (Fig. 2B). These results suggest that formation of the ancestral BoS_ab and BoS_ai2 subfamilies involved shuffling of the head and internal regions between existing elements (see Discussion). A similar process may also be involved in the formation of BoS_c and BoS_f subfamilies, as they share 70% identity in the head region but are 96% identical over ∼50 bp of their internal region (not shown).

Figure 2

Mosaic BoS subfamilies. A Comparison of the nucleotide sequences of BoS_a, ab, b, ai2 and i2. Head regions of the subfamilies are aligned (shaded black), while alignment of the internal regions is only shown for comparisons between BoS_ab and BoS_b (shaded light gray) and between BoS_ai2 and BoS_i2 (shaded dark gray). B Phylogenetic relationships of BoS_a, ab, b, ai2, and i2 determined based on their head regions (left) and internal regions (right). These unrooted trees were generated based on the neighbor-joining method. Bootstrap values were calculated from 1000 replicates.

Secondary Structure of BoS Elements

Because SINEs transpose via an RNA intermediate, it was of interest to determine the secondary structure of their encoded RNAs. The head regions of all BoS subfamily RNAs are predicted to form a relatively stable hairpin structure with free energies ranging from ∼−30 to ∼−45 kcal/mol (on average, −35.3 kcal/mol) (Fig. 3). This structure is highly conserved despite intersubfamily variation in primary sequence because nucleotide substitutions on one strand of the hairpin are often accompanied by compensatory changes on the other strand. For example, the predicted secondary structures of the head regions of BoS_ab and BoS_b RNAs are nearly identical despite their sharing only 73% sequence identity. In contrast to the head regions, the internal regions from BoS subfamily RNAs adopt a variety of secondary structures (not shown). Folding of the entire BoS RNA from different subfamilies, however, reveals that differences in internal region secondary structure do not appear to interfere with the hairpin formed by the head region.

Figure 3

Predicted secondary structures of the head regions of BoS transcripts. The conserved A and B boxes of Pol III promoter motifs are in bold face. Free energies (kcal/mol) of the secondary structures are as follows: BoS_a, −36.30; BoS_ab, −45.25; BoS_ai2, −37.84; BoS_b, −38.42; BoS_c, −30.59; BoS_d, −30.28; BoS_e, −33.55; BoS_f, −32.42; BoS_g, −33.28; BoS_h, −33.50; BoS_i1, −40.61; BoS_i2, −34.53; BoS_j, −39.47; BoS_k, −33.98; BoS_l, −29.18.
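The range and average of these free energies can be checked directly from the values listed in the caption. The sketch below simply tabulates those 15 values; the variable names are hypothetical.

```python
from statistics import mean

# Predicted free energies (kcal/mol) of the head-region hairpins,
# copied from the Figure 3 caption.
HEAD_REGION_DG = {
    "BoS_a": -36.30, "BoS_ab": -45.25, "BoS_ai2": -37.84,
    "BoS_b": -38.42, "BoS_c": -30.59, "BoS_d": -30.28,
    "BoS_e": -33.55, "BoS_f": -32.42, "BoS_g": -33.28,
    "BoS_h": -33.50, "BoS_i1": -40.61, "BoS_i2": -34.53,
    "BoS_j": -39.47, "BoS_k": -33.98, "BoS_l": -29.18,
}

mean_dg = mean(HEAD_REGION_DG.values())
most_stable = min(HEAD_REGION_DG, key=HEAD_REGION_DG.get)
least_stable = max(HEAD_REGION_DG, key=HEAD_REGION_DG.get)
```

The mean is about −35.3 kcal/mol, with BoS_ab the most stable and BoS_l the least stable hairpin, in agreement with the range quoted in the text.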

Target Site Specificity of BoS Elements

Integration of SINEs into chromosomes involves two cleavage events (referred to as 5′ and 3′ nicking) of the target site DNA (Luan et al. 1993). Several animal and plant SINE families were found to insert preferentially into A/T rich regions of the genome, and for some elements, conserved sequence motifs have been discerned at both the 5′ and the 3′ nicking sites (Jurka 1997; Lenoir et al. 2001; Tatout et al. 1998). To determine whether BoS elements also have a target sequence preference, direct repeats flanking BoS elements (that could represent target site duplications [TSDs]) were first identified by comparing the sequences flanking each BoS element. TSDs were identified for 319 of the 390 elements with sufficient flanking sequence to make this determination. Ten additional elements flanked by TSDs of 1–5 bp were excluded from this analysis as short TSDs might represent coincidental matches. The length of TSDs ranged from 6 to 19 bp (on average, ∼13 bp), but most (∼70%) were 11–17 bp. Analysis of the sequence context of BoS insertion sites indicated that BoS elements insert preferentially into A/T-rich regions of the B. oleracea genome (Fig. 4). In addition, the 5′ nicking site exhibits a strong preference for 5′-B/AAA-3′ (where B represents C, G, T) (p < 0.001). A weaker but significant preference (5′-T-3′; p < 0.01) was also detected at the 3′ nicking site.
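The identification of candidate TSDs by comparing flanking sequences can be sketched as below. This is a simplified illustration, not the procedure actually used: it requires an exact match between the end of the upstream flank and the start of the downstream flank, whereas the repeats reported here can contain mismatches. The function name and toy sequences are hypothetical; the 6-bp minimum mirrors the exclusion of 1–5 bp repeats described in the text.

```python
def find_tsd(upstream, downstream, min_len=6):
    """Return the longest exact direct repeat flanking an insertion.

    upstream:   sequence ending immediately 5' of the element
    downstream: sequence starting immediately 3' of the poly(A) tract
    Returns None when no repeat of at least min_len bases is found.
    Assumption: exact matching only; mismatches are not tolerated.
    """
    longest = min(len(upstream), len(downstream))
    for k in range(longest, min_len - 1, -1):
        candidate = upstream[-k:].upper()
        if candidate == downstream[:k].upper():
            return candidate
    return None


# Hypothetical flanking sequences, for illustration only.
toy_tsd = find_tsd("GGCCTAAAGATTCTTGA", "AAAGATTCTTGACCGT")
no_tsd = find_tsd("ACGTA", "TACGG")
```
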

Figure 4

Insertion site preference of BoS elements. A, B Base occurrences at the 5′ and 3′ nicking sites, respectively. The designations “−1” and “1” are the positions immediately upstream and downstream of the nicking sites, respectively. Adenosine residue data points are represented by black boxes; cytosine, by open boxes; guanosine, by gray boxes; and thymidine, by open circles. C, D Chi-square tests of the 5′ and 3′ nicking sites, respectively. Horizontal dashed lines correspond to the significant chi-square values (p < 0.001 for 5′ nicking site and p < 0.01 for 3′ nicking site).

BoS Elements Are Widespread in the Brassicaceae Family

The relatively high level of sequence divergence among different BoS subfamilies indicates that BoS is an old family and, as such, related elements should exist in other plants. In order to identify these elements, the consensus sequence of each BoS subfamily was used as query in Blastn searches against the GenBank NR and EST databases. Seven BoS elements were identified from the genomic or EST sequences of three other Brassica species: B. napus, B. rapa, and B. nigra (Table 2). In each case, the BoS element from another Brassica species is closely related, over its entire length, to a particular BoS subfamily from B. oleracea, indicating that each species harbors multiple BoS subfamilies and that the divergence of these subfamilies predated the separation of the species. In addition, four BoS elements were found in A. thaliana (Table 2) that are highly similar to each other over their entire length (∼91% identical) but only share sequence similarity with the Brassica BoS elements in the head region (∼55–60% identity). No BoS element was found from species outside the Brassicaceae family.

Table 2 BoS elements in other Brassica species and A. thaliana

Discussion

In this study a new family of SINEs, named BoS, was identified and found to be widespread in Brassicaceae, where it is present at ∼2000 copies in B. oleracea. BoS elements have typical features of SINEs, including (i) a modular primary structure, (ii) a head region distantly related to tRNA genes (Fig. 1), and (iii) a strong target site preference similar to previously described plant and animal SINE families (Fig. 4). However, with 15 distinct subfamilies, BoS is, to our knowledge, one of the most diverse SINE families described to date. Comparison of related but distinct subfamilies has allowed the identification of conserved structural features as well as the origin of new subfamilies from parts of other subfamilies. For example, the head regions of all BoS subfamilies can fold into an unusual hairpin-like secondary structure, and existing subfamilies frequently exchanged their head and internal regions, resulting in the formation of new, more active subfamilies. While conserved secondary structure may be necessary for retrotransposition, the origin of new subfamilies from existing subfamilies may have been a major driving force in the evolution of the BoS family.

Unusual Hairpin-like Secondary Structure of BoS Head Regions

The potential of the head regions of BoS elements to form a conserved hairpin-like secondary structure is unusual for SINEs, as previously characterized SINEs derived from cellular RNAs usually adopt the secondary structures of their cognate cellular RNA. For example, the head regions of tRNA-derived SINEs usually fold into a cloverleaf-like secondary structure (Kachroo et al. 1995; Takahashi and Okada 2002; Yoshioka et al. 1993), whereas the head regions of 7SL RNA-derived SINEs fold into so-called “hammer head” (e.g., the B1 family) or “double hammer head” (e.g., the Alu family) structures (Labuda et al. 1991; Labuda and Zietkiewicz 1994; Sinnett et al. 1991). However, the head region of the rodent ID SINE RNA (ancestrally derived from tRNAAla) was shown to form a stable hairpin-like secondary structure that is strikingly similar to that of BoS elements (Rozhdestvensky et al. 2001). In addition, although the secondary structure has not been described for several tRNA-derived SINEs from Brassicaceae (AtSN1/RAthE3, AtSN2/RAthE1, RAthE2, and S1), folding of their consensus sequences indicated that the RNA of their head regions can also form a relatively stable hairpin-like secondary structure (not shown). Taken together, these results suggest that the potential ability to form a hairpin-like structure in the head region may be shared by many tRNA-derived SINEs.

Comparison of distinct BoS subfamilies with divergent primary nucleotide sequences provided several lines of evidence suggesting that such a structure has been conserved during the evolution of BoS elements and may be functionally important. First, it is the most stable secondary structure formed for all 15 BoS subfamilies (∼−30 to ∼−45 kcal/mol, with an average of ∼−35 kcal/mol) (Fig. 3). Second, conservation of this structure does not simply result from conservation of primary nucleotide sequence, as BoS subfamilies share on average only ∼66% identity in this region. Rather, the base pairing necessary to form the hairpin structure appears to have been maintained by a strong selective pressure despite primary sequence divergence. For example, the head regions of BoS_ab and BoS_b RNAs differ by a 1-bp indel and 20 substitutions, of which 10 are compensatory changes that occur at five positions (e.g., A-to-G changes on one strand accompanied by T-to-C changes at corresponding positions on the opposite strand). The remaining 10 substitutions and 1-bp indel do not cause any disruption in the hairpin, as they either are in unpaired regions or are A/G transitions when the corresponding nucleotide on the opposite strand is U (Fig. 3). As a result, the secondary structures of the head regions of BoS_ab and BoS_b RNAs are nearly identical.
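The distinction drawn above between compensatory changes, substitutions tolerated through G-U wobble pairing, and disruptive changes can be made concrete with a small sketch. The classification labels and function names are hypothetical and the pairing rules are the standard Watson-Crick and G-U wobble pairs; this is an illustration of the reasoning, not the analysis performed on the BoS alignments.

```python
# Standard RNA base pairs: Watson-Crick plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(x, y):
    """True when the two RNA bases can pair in a stem."""
    return (x.upper(), y.upper()) in PAIRS


def classify_change(old_pair, new_pair):
    """Classify how a stem position differs between two RNAs.

    old_pair / new_pair: (base on one strand, base on the opposite strand).
    Labels are illustrative assumptions, not terms defined in the paper.
    """
    changed = sum(a.upper() != b.upper() for a, b in zip(old_pair, new_pair))
    if changed == 0:
        return "unchanged"
    if not can_pair(*new_pair):
        return "disruptive"
    if changed == 2:
        return "compensatory"
    return "tolerated"


# A-to-G on one strand accompanied by U-to-C on the other keeps the pair.
both_strands = classify_change(("A", "U"), ("G", "C"))
# An A-to-G transition opposite U still pairs through G-U wobble.
one_strand = classify_change(("A", "U"), ("G", "U"))
```
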

What could be the functional significance of this structure? Mimicry of tRNA or 7SL RNA structure by SINEs is thought to be involved in the localization of SINE RNA in the close proximity of the translation apparatus where LINE-encoded proteins are being synthesized (Labuda and Zietkiewicz 1994). This reflects the necessity for SINE RNA to compete with LINE RNA, as LINE-encoded proteins exhibit a marked cis preference for the mRNA from which they are translated. The RNA of ID elements has been shown to interact with several proteins to form an RNA–protein (RNP) complex (West et al. 2002). Interestingly, one component of this RNP complex is poly(A) binding protein (PABP), a regulator of translation initiation, suggesting that the RNP complex may be involved in the targeting of ID RNA to the translation apparatus (Muddashetty et al. 2002; West et al. 2002). Furthermore, incorporation of ID RNA into the RNP complex appears to be dependent on the secondary structure, rather than the primary sequence, of the RNA head region (West et al. 2002). Considering the striking similarity between the secondary structure of BoS and ID head regions, it is possible that the conserved hairpin structure in BoS RNA (as well as in AtSN1/RAthE3, AtSN2/RAthE1, RAthE2, and S1) is also involved in their targeting to the translation apparatus by a similar mechanism.

BoS Elements Emerged and Were Active Over a Long Period of Time

The BoS family must be relatively old because family and subfamily members display a high level of sequence divergence. In addition, the presence of related elements in other species in the Brassicaceae family indicates that BoS elements predate species divergence. The age of a family or subfamily of retro-elements can be estimated by assuming that nucleotide substitutions accumulate at random after retrotransposition and that 1% sequence divergence corresponds to one million years. Using these criteria, the oldest subfamily, BoS_e, may have emerged ∼20 Mya, as its members share ∼80% identity, whereas members of the youngest subfamily, BoS_ai2 (97.5% sequence identity), originated ∼2–3 Mya (Table 1). It should be noted that the age of BoS subfamilies determined this way remains a rough estimate because, for example, the nucleotide substitution rate in transposable elements is often different from that in nuclear genes, and different genomes or even different regions in the same genome have different substitution rates. Nevertheless, it is apparent that different BoS subfamilies emerged and were most active during various evolutionary time periods. Interestingly, of the 15 BoS subfamilies, 5 have ∼80% to ∼85% intrasubfamily sequence identity, whereas members of 8 other subfamilies are ∼93% to ∼98% identical (Table 1). Thus, the activity of most BoS subfamilies appears to be restricted to two periods, one corresponding to the emergence and early diversification of the Brassicaceae family (∼15–20 Mya) and the other much more recent (∼2–7 Mya).
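The age calculation amounts to a linear conversion of intrasubfamily divergence into time. The sketch below applies the stated assumption of 1% divergence per million years to the two identities quoted in the text; the function and parameter names are hypothetical, and the caveats about rate variation noted above apply equally here.

```python
def subfamily_age_my(identity_percent, rate_percent_per_my=1.0):
    """Estimate subfamily age in millions of years.

    identity_percent: average identity of members to their consensus.
    rate_percent_per_my: assumed divergence per million years (1% in the text).
    """
    divergence = 100.0 - identity_percent
    return divergence / rate_percent_per_my


age_bos_e = subfamily_age_my(80.0)     # oldest subfamily, ~80% identity
age_bos_ai2 = subfamily_age_my(97.5)   # youngest subfamily, 97.5% identity
```
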

It is not known whether any BoS elements are active in B. oleracea. For an element to be active it must be transcribed because the RNA is the transposition intermediate. To date, Blast searches have not turned up any BoS ESTs from this species. However, several ESTs that appear to be bona fide BoS transcripts were found in B. napus (Table 2). Taken together, these results indicate that different BoS subfamilies have emerged and were active during various evolutionary time periods and that active BoS elements may exist in some Brassica species.

Mosaic BoS Subfamilies

Formation of mosaic SINEs through sequence exchange between existing elements appears to be a common phenomenon in plants and animals, and such a process has been proposed as a way to generate new and more active elements (Kass et al. 1995; Lenoir et al. 1997; Takahashi and Okada 2002; Zietkiewicz and Labuda 1996). In this study, sequence comparison among BoS subfamilies indicates that formation of mosaic elements has occurred frequently during the evolution of the BoS family, with 7 of the 15 subfamilies either participating in or formed by sequence exchange. As described above, the age of a particular BoS subfamily can be estimated based on intrasubfamily sequence divergence. In several instances, the progenitor–progeny relationship inferred by intersubfamily sequence comparisons of BoS subfamilies is supported by the age of the subfamilies involved. For example, the mosaic subfamily BoS_ab is younger than both of its progenitors, BoS_a and BoS_b. Similarly, the mosaic subfamily BoS_ai2 is younger than both BoS_a and BoS_i2. The situation of BoS_c and BoS_f is less clear, as a third subfamily involved in the sequence exchange was not identified. However, since BoS_c is younger than BoS_f, it is likely that BoS_c acquired its internal region from BoS_f. Finally, two of the three mosaic subfamilies (BoS_ab and BoS_c) have the highest copy numbers among all subfamilies, providing support for the notion that some mosaic elements may be more successful than the elements they derive from.

Two mechanisms, involving either gene conversion or template switching, have been proposed to explain the formation of mosaic SINEs. The gene conversion model posits that mosaic SINEs form through recombination between the genomic copies of two different SINEs (Kass et al. 1995; Lenoir et al. 1997; Zietkiewicz and Labuda 1996). In contrast, the template switching model postulates that during reverse transcription of the RNA of one SINE into cDNA, the reverse transcriptase switches its template to the RNA of a second SINE (Takahashi and Okada 2002). A key difference distinguishing the two models is the requirement for SINE transcription and reverse transcription for template switching to occur but not for gene conversion, which occurs between genomic copies. The results from this study indicate that, in the case of BoS, the template switching model is more likely, because all donor BoS subfamilies (BoS_a, BoS_b, BoS_i2, and BoS_f; see above) have been active relatively recently. In contrast, of the six older subfamilies (BoS_e, g, h, i1, j, and k, with less than 90% sequence identity; see Table 1) where recent activity is unlikely, none were found to be involved in mosaic SINE formation, either with another old subfamily or with one that was recently active. Furthermore, in each case where a mosaic subfamily was formed, the ages of the two donor subfamilies are very similar (Table 1), suggesting that they may have been simultaneously active.

In summary, this study identified BoS as the most abundant SINE family in Brassicaceae and one of the most diverse families from any organism. In addition to the typical modular primary structure and target site preference of SINEs, BoS elements have several unusual features, including the distinct hairpin-like secondary structure formed by the head regions of their RNAs and the frequent formation of new, more successful mosaic subfamilies through sequence exchange between existing subfamilies. Finally, with 15 distinct subfamilies that were active during various time periods and the availability of the sequence information at a large number of insertion sites, BoS elements should provide useful phylogenetic markers for future analyses of the evolutionary relationships among species in the Brassicaceae family.