Introduction

Transposable elements (TEs) are fundamental components of most, if not all, plant genomes, which play important roles in the structure, variation and adaptive evolution of genomes (McClintock 1984; McDonald 1995; Kidwell and Lisch 1997). TEs are divided into two classes. Class 1 comprises the RNA elements (retrotransposons), which transpose via an RNA intermediate by a mechanism that is dependent on reverse transcription (Kumar and Bennetzen 1999). Class 2 consists of the DNA elements, which usually contain terminal inverted repeats (TIRs) and transpose via a DNA intermediate (Kunze et al. 1997). Class 2 elements can be further divided into two groups. Autonomous elements, such as the CACTA elements (also called the En/Spm family) which have the core sequence CACTA in their TIRs, possess two ORFs encoding a TNPA/TNP1 DNA-binding protein and a TNPD/TNP2 (TNP2-like) transposase (TPases), both of which are essential for their transposition. This group includes the elements En/Spm from maize (Gierl 1996) and Tam1 from snapdragon ( Antirrhinum majus) (Nacken et al. 1991). The non-autonomous elements, such as the maize dSpm (deletion derivatives of Spm), on the other hand, require the corresponding autonomous element (in this case, Spm) for transposition (Gierl 1996).

It is known that TEs contribute significantly to genomic plasticity in the face of diverse environmental conditions. During the past decade, a number of TEs have been found to be activated when plants or cells were subjected to stresses (Hirochika 1993; Arnault and Dufournel 1994; Wessler 1996). For example, transcription of the tobacco retrotransposon Tnt1 could be induced by pathogens and microbial elicitors, as well as by abiotic factors, indicating that activation of Tnt1 might be a local and early plant response to microbial stress (Pouteau et al. 1994; Mhiri et al. 1997; Vernhettes et al. 1997). Moreover, mobility of Tnt1 correlated with its transcriptional activation by fungal elicitation (Melayah et al. 2001), and Tnt1 insertion could change host gene splicing (Leprinc et al. 2001). Similarly, we isolated a cDNA ( Rim2) derived from a transcript that was strongly induced by the rice fungal pathogen Magnaporthe grisea and the corresponding elicitor. Rim2 harbored an ORF encoding a putative protein that shows weak similarity to the TNP2-like TPases encoded by CACTA TEs (He et al. 2000). This report was the starting point for the study of class 2 elements involved in stress responses in the rice genome. Subsequently Panaud et al. (2002) isolated four genomic clones of the Rim2 family using representational difference analysis (RDA), and named these Hipa elements. These authors suggested that Hipa is an ancient component of the Oryza genome. However, we still lack information on the organization of the entire Rim2/Hipa family in the rice genome (for consistency, we will refer to this family as Rim2 in the following).

DNA sequencing and data mining have demonstrated that the rice genome is rich in TEs (Bureau et al. 1996; Hirochika et al. 1996; Song et al. 1998; Ohtsubo et al. 1999; Mao et al. 2000; Jiang and Wessler 2001; Turcotte et al. 2001; Jiang et al. 2002; Panaud et al. 2002). Most recently, the rice class 2 element mPing was shown to transpose actively in rice cell cultures using a data mining and transposon display approach (Jiang et al. 2003). Similarly, the availability of genomic Rim2 clones and vast amounts of rice genomic sequence (http://www.ncbi.nlm.nih.gov) now allow us to address the issue of the structure and genomic organization of the Rim2 family. In this paper, we report that rice Rim2 elements constitute a CACTA-like superfamily with at least 600 members per haploid genome. Of particular interest is the finding that members of the Rim2 family carry a unique CACTG core sequence in their TIRs.

Materials and methods

Rim2 DNA cloning and sequencing

A 786-bp cDNA which codes for the conserved Rim2 TPase-was used as probe to screen a rice bacterial artificial chromosome (BAC) library containing 1.07×104 clones (He et al. 2000). From positive BACs, three Rim2 subclones, Rim2-228 , Rim2-248 and Rim2-553, were sequenced on an ABI 3700 sequencer (Perkin-Elmer, Foster City, Calif.). The sequences have been deposited in the GenBank database under the Accession Nos. AY090462, AY090463 and AY090464, respectively.

Sequence mining and identification of Rim2 elements

Publicly available genomic sequence data from rice (the japonica cultivar Nipponbare) obtained by the International Rice Genome Program were accessed through the GenBank database (http://www.ncbi.nlm.nih.gov) and subjected to Blast searches (Altschul et al. 1990), using the Rim2 cDNA and the conserved sequences of Rim2-228, Rim2-248 , Rim2-553 as queries. The conserved TIR sequence 5′-CACTGGTGGAGAAACC-3′ and target site duplications (TSDs) were excluded from the query sequences. More than 340 elements were identified among the rice sequences available as of 5 September, 2002. The Rim2 content of the genome was calculated by dividing the total length of the Rim2 sequences detected by the total length of sequence searched. Copy numbers were estimated assuming that the haploid genome size is 420–466 Mb (Goff et al. 2002; Yu et al. 2002).

DNA and protein sequence analysis

Initial sequence analysis was performed using GCG (Genetics Computer Group, Madison, Wis.) programs and the EMBOSS program (European Molecular Biology Open Software Suite) accessed through the Shanghai Institutes for Biological Sciences (SIBS). Sub-terminal repeat sequences were analyzed with Lasergene GeneQuest (DNAstar, Madison, Wis.). Multiple alignments were generated with ClustalX (Thompson et al. 1997), then refined and displayed using GeneDoc (http://www.psc.edu/biomed/genedoc). Sequence similarity between proteins was estimated based on the BLOSUM62 matrix (Henikoff and Henikoff 1993). AT/GC contents of regions around insertion sites were calculated and compared to the values derived from 7.07 Mb of randomly chosen sequences as a control. Insertions of annotated elements into intergenic and genic regions were predicted based on GenBank searches. To document element mobility in the past, sequences immediately flanking Rim2 inserts were used as queries to search for target sequences without inserts, an approach also called 'related to empty sites' (RESites) (Le et al. 2000). High levels of shared nucleotide sequence similarity between members were taken to indicate recent activity of the Rim2 family, as has been documented for other TEs.

Phylogenetic analysis

A phylogenetic tree was constructed using the distance-based neighbor-joining method MEGA2.1 (Kumar et al. 2001), based on amino acid sequences of TPases encoded by representative CACTA-like elements, including Rim2-228, Rim2-248 and Rim2-553 (this study), the Rim2 -cDNA (He et al. 2000), the maize TNPD (Pereira et al. 1986), DOPD (Bercury et al. 2001) and Sho sequences (Panavas et al. 1999), the snapdragon TNP2 (Nacken et al. 1991), and the Arabidopsis ATENSPM6 and ATENSPM9 sequences (Jiang and Wessler, http://www.girinst.org), the Tdc1-ORF from carrot (Ozeki et al. 1997), the soybean Tgm5-ORF (Rhodes and Vodkin 1988), and the Tnp2L from sorghum (GenBank AAD27566). Bootstrap analysis was performed with 1000 replicates to test the significance of nodes.

Fluorescence in situ hybridization (FISH) on pachytene chromosomes

The FISH procedure essentially followed the published protocol using young panicles of a japonica rice variety, Nipponbare (Jiang et al. 1995). A 2.8-kb Eco RI cDNA fragment (Accession No. AF121139) and a 12-kb Rim2-248 fragment were labeled with biotin-16-dUTP by nick translation. Hybridized slides were examined under an Olympus BX60 fluorescence microscope. Chromosomes and FISH signal images were captured using a SenSys charge-coupled device camera (Photometrics, Tucson, Ariz.).

Distribution of Rim2-coding sequences in monocotyledonous and dicotyledonous plants

Genomic DNAs of members of the family Poaceae, including cultivated rice ( Oryza sativa L.), wild rice ( O. rufipogon), wheat, barley, maize, sorghum, millet and Arabidopsis were digested with Eco RI, and Southern analysis was performed as described by He et al. (2000), using the conserved 786-bp Rim2 fragment as probe, and washing at 65°C. The blots were autoradiographed for only 3 h to avoid overexposure of the lanes containing rice DNA.

Results

Molecular structure of the Rim2 elements

Even though the Rim2 genomic sequences could be deduced by bioinformatic analysis of the sequenced rice genome, we nevertheless isolated and sequenced some Rim2 clones for direct determination of their sequences and for use as FISH probes. Three Rim2 subclones, Rim2-228 , Rim2-248 , Rim2-533 , were isolated from BACs, and subjected to detailed sequence analysis. These elements ( Rim2-228,10,917 bp; Rim2-248, 14,873 bp; and Rim2-533, 14,293 bp) consist of highly structured termini made up of three motifs (TSD, TIR and sub-terminal direct/inverted repeats), coding regions, and large internal variable regions (Fig. 1A and B). All three elements have perfect 16-bp CACTGGTG/AGAGAAACC TIRs. A unique feature is the presence of the 5-bp core sequence CACTG in the TIRs, instead of the core sequence CACTA found in the TIRs of other CACTA elements. Upon insertion, 3-bp TSDs, GAA ( Rim2-228), TAA ( Rim2-248) and TTA ( Rim2-553), respectively, were generated. In the sub-terminal regions, near identical 16- to 17-bp direct and inverted repeats with the 16-bp consensus sequence ATCTTTAGTCCCGGTT were identified (Fig. 1A and B); these repeats were also present in the central region (Panaud et al. 2002). Rim2 shares this feature with other CACTA elements, which contain sub-terminal repeats of 9–12 bp that differ in their consensus sequences. Interestingly, there are 105-bp tetra-repeats immediately downstream of the RIM2 coding region in Rim2-228 (Fig. 1A and B), which are almost the same as the repeats found in the Rim2 cDNA clone, suggesting this element might be the source of the Rim2 transcript previously isolated (He et al. 2000). Similar tandem repeats, albeit of variable length, are also found in Rim2-248 and Rim2-533. The role of these repeats remains unknown.

Fig. 1A,–C.
figure 1

Schematic representation of the structures of Rim2-228, Rim2-248 and Rim2-553. A The elements are drawn to scale. The black triangles represent TIRs with the trinucleotide duplications. The conserved 16-bp sub-terminal repeats are indicated by the numbered open triangles. The tandem repeats of 41-bp are indicated by gray triangles, and four 105-bp tandem repeats in Rim2-228 are depicted as vertical black triangles. RIM2 and HRGP- coding regions are also indicated. B Sequences of TIRs, sub-terminal repeats, 41-bp and 105-bp tandem repeats are shown. C Structural model for Rim2 elements. Highly conserved regions containing sub-terminal repeats and the RIM2-protein coding regions is shown as gray and black rectangles, respectively. The open rectangles represent variable internal sequences

ORF prediction revealed that Rim2-228, Rim2-248 and Rim2-553 all contain two genes. Gene 1 encodes RIM2 proteins (which show more than 90% sequence identity to the product of the ORF in the Rim2 cDNA and to each other) that show fragmentary homology to TNP2/TNPD-like proteins encoded by the CACTA family (see below). The predicted product of Gene 2 is a putative hydroxyproline-rich glycoprotein (HRGP; Accession No. AU090537), and shows no similarity to the TNP1/TNPA DNA-binding protein encoded by other elements of the CACTA family or other putative proteins coded for by Cs1, another CACTA-like element from sorghum (Chopra et al. 1999). The significance of the HRGP ORF in this TE family is not known. No ORF for a TNP1/TNPA protein could be detected in the Rim2 elements in spite of detailed screening. Based on these features, a model structure for Rim2 elements can be derived (Fig. 1C).

Genome-wide mining and sub-grouping of the Rim2 elements

Based on the structural features of the Rim2 elements described above, we conducted a genome-wide search for Rim2 elements using the publicly available rice genome sequence ( japonica accession). As a result, 344 individual Rim2 elements were retrieved from a total sequence of about 230 Mb, including the full-length sequences of chromosomes 1, 4 and 10, that were accessible by 5 September, 2002 (see Electronic Supplementary Material). Their average size is 5.8 kb (their TIRs were delimited on the basis of the structured motifs described in Fig. 1). Extrapolating from this result, it is estimated that about 600–700 Rim2 elements are present in the haploid rice genome of 420–466 Mb (Goff et al. 2002; Yu et al. 2002). Rim2 elements therefore account for about 0.88% of the total rice genome.

Most of the mined elements have conserved 16-bp TIRs and sub-terminal repeats, internal variable regions, as well as 3-bp TSDs (see Electronic Supplementary Material). Intriguingly, 49 members contain imperfect TIRs with CACTA and CACTG sequences at their 5′ and 3′ ends (CACTA-CACTG), respectively. The 347 Rim2 elements can be divided into three subgroups depending on their coding capacity: RIM2-coding (84), RIM2-pseudogene (50) and non-coding (213) subgroups (Table 1). The members of the RIM2-coding subgroup have ORFs near their 5′ ends for putative RIM2 proteins with 416–1110 amino acids. Most of the RIM2-coding elements also harbor ORFs near the 3′ ends for putative HRGPs with 521–2635 amino acids. Eight elements ( Rim2-M90, Rim2-M93 , Rim2-M272 , Rim2-M328 , Rim2-M333 , Rim2-M341 , Rim2-M342 and Rim2-M344) have ORFs that encode novel putative proteins which show only weak similarity to the RIM2 proteins, and all of these, except Rim2-M333, harbor ORFs for putative HRGPs. There are 50 Rim2 members containing pseudogenes; i.e., their ORFs are disrupted by several stop codons or frameshifts. These data support the postulate that high-copy-number TPase genes usually evolve into pseudogenes due to rapid accumulation of substitutions, insertions and deletions (Feschotte et al. 2002). These results also demonstrate that the Rim2 family, although structurally conserved, has been subject to dramatic evolutionary variation, indicative of long-term evolution of this TE family in the rice genome. Similar patterns of divergence are also observed in CACTA elements (Kunze and Weil 2002).

Table 1. Subgroups of sequenced/mined Rim2 element

Insertion preference of the Rim2 elements

Since the whole japonica genome has not been completely annotated with sequenced BACs, we could not make a complete inventory of Rim2 elements on all chromosomes. Instead, we performed FISH analysis on pachytene chromosomes using the Rim2 cDNA and Rim2-248 DNA as probes. The FISH images obtained with the cDNA and genomic DNA are quite similar, and provide direct evidence that Rim2 elements are distributed widely across the whole genome (Fig. 2). They also indicate that Rim2 elements are not evenly dispersed on chromosomes; some regions show higher element densities than others.

Fig. 2A, B.
figure 2

FISH images of rice pachytene chromosomes from young panicles. A A 2.8-kb Eco RI fragment of the Rim2 cDNA (He et al. 2000) labeled with biotin-16-dUTP was hybridized to rice pachytene chromosomes. B FISH with a 12-kb Rim2-248 fragment. FISH signals are green against the red background of chromosomes. N, nucleolus

The observation that Rim2 elements are unequally distributed along chromosomes raises the question whether Rim2 inserts preferentially at certain types of sites. We conducted a survey of the AT/GC contents of the regions adjacent to all Rim2 inserts (Fig. 3). The result indicates that Rim2 elements appear to show a preference for AT-rich regions: the 5′ and 3′ flanking regions lying within 100–500 bp from TSDs show average AT contents of 59.8–60.2% and 60.7–61.6%, respectively, compared to the average 56.3% AT content of a large sample of randomly chosen rice sequences. This preference for AT-rich regions has also been observed for some other transposons (Le et al. 2000).

Fig. 3.
figure 3

Rim2 preferentially inserts into AT-rich sequences. AT contents were calculated in flanking regions within different distances (up to 500 bp) from TSDs of all mined Rim2 members. The AT content of a randomly picked 7.07-Mb sequences was calculated as the control ( black bar). Note that both 5′ ( gray bars) and 3′ ( open bars) flanking regions have higher AT contents than the control. AT contents are indicated above the bars

In contrast to the observation that TEs insert randomly into rice genes (Mao et al. 2000), a probable preference for insertion into genic regions including exons, introns and the immediate 5′ and 3′ non-coding termini of predicted genes, was observed for this TE family. Of the 344 elements mined, 204 have been annotated in sequences. We surveyed the distribution patterns of these 204 elements (Fig. 4A). There are 44 elements (21.6%) inserted in coding regions of predicted genes, while 101 (49.5%) have inserted into non-coding regions including introns and 5′ and 3′ non-coding termini of predicted genes. Together, these two sets account for 71.1% of all the annotated elements; only 59 (28.9%) are found in predicted intergenic sequences. For example, Rim2-M6 is integrated in the promoter of a gene encoding a hypothetical protein, Rim2-M8 is inserted in a putative gene to constitute its intron 1, exon 2 and intron 2, and a putative peroxidase gene contains Rim2-M25 in intron 3 (Fig. 4B). It has previously been reported that the element is present in the Xa21 gene cluster in rice (He et al. 2000). These results suggest that the Rim2 elements, like many other TEs, can affect gene structure and expression by transposing into transcriptionally active sections of the chromosomes.

Fig. 4A, B.
figure 4

Distribution of Rim2 element in genic and intergenic regions. A In all, 204 annotated Rim2 elements were classified according to whether they were inserted in intergenic (28.9%), genic coding (21.6%) or genic non-coding regions (49.5%) of predicted genes. B Examples of Rim2 inserts in genic regions. Rim2-M6 was integrated into the promoter and the first exon of a gene encoding a hypothetical protein (BAB18329); Rim2-M8 spanned exon 2 of a putative gene (BAA90493); Rim2-M25 inserted into the intron 3 of a putative peroxidase gene (AAK72285). The black boxes and the open boxes represent exons and introns of putative genes, respectively. The open boxes with arrowheads (TIRs) indicate the inserted Rim2 elements

The Rim2 elements belong to a distinct CACTA-like superfamily

The previously isolated Rim2 cDNA was initially considered to be derived from a possible member of the CACTA family that includes the En / Spm element from maize and the Tam1 element from snapdragon, based on the fact that the putative RIM2 protein showed weak similarity to the TNP2/TNPD proteins encoded by the CACTA transposons (He et al. 2000). The genomic organization of the Rim2 elements also shows similarity to that of the CACTA elements. However, the Rim2 elements differ in several important respects from to the known CACTA elements: they exhibit neither the TNP1/TNPA protein-coding region nor the consensus CACTA motifs in the TIRs, and harbor a unique HRGP-encoding region. Moreover, the Rim2 elements are much more abundant than the CACTA elements (Nacken et al. 1991; Gierl 1996; Snowden and Napoli 1998). In order to gain insight into the phylogenetic relationship between Rim2 and other CACTA elements, we constructed a multiple alignment based on the sequences of representative RIM2 TPases and other CACTA-encoded TPases, and derived a phylogenetic tree for these proteins. This clearly shows that there is only fragmentary homology between the different kinds of proteins even in their most highly conserved regions (Fig. 5A). The phylogenetic tree indicates that the proteins encoded by the CACTA-like elements can be divided into two subgroups, the TNP2 and the RIM2 subgroups (Fig. 5B). Our results provide evidence that Rim2 elements constitute a CACTA-like (CACTA-related) superfamily that is distinct from other CACTA elements, and has several unique features: CACTG TIRs, conserved TPase- and HRGP-coding regions, and a high degree of amplification in the rice genome.

Fig. 5A, B.
figure 5

Multiple alignment and phylogenetic tree of transposases encoded by Rim2 and other CACTA elements. A Amino acid alignments of the regions most conserved between predicted protein products of the Rim2 elements and other TPases of the CACTA elements. Amino acid sequence similarity as determined by the conservation mode of GeneDoc is indicated by shading ( black for 100% conservation, gray for ≥55% conservation. Dashes represent gaps. The numbers indicate positions of amino acid residues in the corresponding proteins. B Unrooted neighbor-joining tree derived from the entire TPase sequence alignment indicated in A. Bootstrap values (>50%) for 1000 bootstrap iterations are shown below the corresponding nodes

Rim2 elements were recently active

Despite variation in size and coding region, the high level of shared nucleotide sequence similarity (>90% identity) over the entire sequences of Rim2 members suggests that at least some of the elements were recently active, as has been deduced for other TEs (Le et al. 2000). We also tried to reconstruct the evolutionary history of element mobilization by identifying repeat target sequences with and without Rim2 insertions (also called the RESite approach; Le et al. 2000). Several sequences were found to correspond to certain mined members such as Rim2-M3, Rim2-M39 and Rim2-M45 (Fig. 6), clearly attesting to the past transposition/insertion of the Rim2 elements. Thus active transposition is undoubtedly responsible for the high copy number of these TEs in the rice genome. Circumstantial evidence from studies of MITEs and retrotransposons has strongly suggested the involvement of high-copy-number TEs in the recent restructuring of plant genomes (Witte et al. 2001; Feschotte et al. 2002). Our observations on the transposition history of Rim2 elements implies a similar function for the Rim2 family in restructuring the rice genome, although no current transposition activity of the family has been detected because it is difficult to identify new inserts against the background of numerous older copies.

Fig. 6.
figure 6

Rim2 elements have been active recently. Empty repeat sites corresponding to three Rim2 members, Rim2-M3 , Rim2-M45 and Rim2-M39, were identified as described in Materials and methods. Note that at least 18 sites correspond to Rim2-M39, indicating its insertion into a highly repeated sequence. The open boxes with black triangles represent individual Rim2 elements. GenBank Accession Nos. and nucleotide positions for these sites are indicated. Target sites are underlined and TSDs are depicted in bold

The RIM2-coding sequence is specific for the Oryza genome

Southern hybridization was performed with genomic DNA from several monocot and dicot species (Fig. 7). The result showed that the RIM2-coding sequence only hybridized to the genomes of cultivated and wild rice, and did not cross-hybridize to the genomes of other cereal crops or Arabidopsis. In agreement with this observation, no homologous sequence from other plants was discovered in the databases by Blast searches with the Rim2 sequence queries. We also detected the Rim2 sequence in other Oryza species (data not shown). Thus, the RIM2-coding sequence is most probably specific to Oryza genomes.

Fig. 7.
figure 7

Southern hybridization of different plant DNAs with the RIM2-coding probe at high stringency (65°C). The lanes were loaded with genomic DNA from O. sativa (R), O. rufipogon (Ruf), Ma, maize; Mi, millet; B, barley; W, wheat; S, sorghum; Ab, Arabidopsis. Note that only rice DNA fragments hybridized specifically to the probe; the molecular weight markers (MW) also show some non-specific hybridization with the Rim2 DNA

Discussion

Rice is certainly a TE-rich species and, as the amount of sequence information for rice continues to increase, a complete picture of the TEs in the genome will ultimately emerge (Mao et al. 2000; Turcotte et al. 2001; Jiang and Wessler 2001). Here, we report that the previously isolated, pathogen-inducible, rice Rim2 elements belong to a distinct CACTA-like superfamily in the rice genome, which shows some novel features. Our study suggests that Rim2 elements might have originated from an autonomous CACTA ancestor that has yet to be identified. Indeed, rice elements with typical CACTA TIRs have been found, such as Tnr3 (Motohashi et al. 1996), Tnr12 (Han et al. 2000), and CACTA-E, F, G (Jiang and Wessler; www.girinst.org), which supports this hypothesis. However, all of these CACTA elements are small and harbor neither a TPase-encoding region nor sequence homology to Rim2 elements. We also could not find CACTA elements with both protein-coding regions and the conserved CACTA TIRs among the available rice sequences. The CACTA elements in rice probably represent the remnants of the CACTA ancestor, forming the residue of its evolution in the rice genome. Furthermore, some rice CACTA-like elements, such as those reported by Tarchini et al. (2000) and Turcotte et al. (2001), are in fact Rim2 elements with their CACTG TIRs. With respect to these observations, we suggest that the CACTA ancestor could have developed into two versions, CACTA in other species and its divergent counterpart, Rim2 , in rice. These results also suggest that the CACTA-related superfamily including Rim2 and the other CACTA ( En/Spm) groups is highly diverse in the plant kingdom.

The ORFs of the Rim2 elements have coding capacity for putative RIM2 proteins that structurally resemble the TNP2/TNPD proteins encoded by autonomous CACTA elements such as En / Spm and Tam1 (Fig. 5). TNPD, and probably TNP2 as well, is known to possess TPase activity (Kunze and Weil 2002). It is postulated that the TNP2/TNPD TPases function together with the TNP1/TNPA DNA-binding proteins as trans-acting factors, which bind to the TIRs and the 9- and 12-bp sub-terminal repeats that serve as cis-acting determinants, in order to mediate transposon excision. Furthermore, activation of the autonomous CACTA elements may be regulated by the sub-terminal repeats that are subject to methylation (Miura et al. 2001). The presence of TPase-like RIM2 proteins, 16-bp sub-terminal repeats and highly structured TIRs in Rim2 elements (Fig. 1) implies that a similar mechanism of trans / cis interaction-dependent transposition might have functioned in Rim2 amplification/transposition.

Interestingly, the other gene in the Rim2 elements encodes a putative HRGP instead of a DNA-binding protein, a unique feature not found in other class 2 TEs. It will be interesting to investigate the biological significance of this protein for TE activity.

It has long been known that defective elements can lose the ability to transpose actively during evolution and come to depend on the functions of autonomous elements, as proposed in the cases of the Sorghum element Cs1 which lacks a gene for the TNP2-like TPase (Chopra et al. 1999), the maize Dop which cannot synthesize functional products due to several mutations (Bercury et al. 2001), and the Arabidopsis element Atenspm which has suffered large internal deletions (Kapitonov and Jurka 1999). Whether members of the Rim2 family, which lack a coding sequence for the TNP1/TNPA DNA-binding protein, are defective or non-autonomous is still an open question. If so, the mobility of the Rim2 element(s) would depend on functions of an autonomous member of the Rim2 family that has not yet been identified. For genome stabilization, non-autonomous elements could derive an evolutionary advantage by preventing deleterious mutations from accumulating in the host organism, which is consistent with the observation that high-copy-number TEs seem to be transpositionally inactive (Feschotte et al. 2002). However, since Rim2 transcription is strongly induced under biotic stress (He et al. 2000), the pattern of transposition induced by stress-dependent activation will be worth investigating.

Sequence information for rice has increased dramatically as a result of the genome project (Goff et al. 2002; Yu et al. 2002). On the other hand, computer-aided data mining has proven to be the only viable approach for identifying vast groups of TEs (Le et al. 2000; Mao et al. 2000; Turcotte et al. 2001; Jiang et al. 2002). Certainly, more Rim2 members are expected as rice genome projects progress, making this family one of the biggest class 2 families in the rice genome, considering both element size and copy number. To the best of our knowledge, the Rim2 data presented here makes this the first class 2 family reported to be so highly amplified in the rice genome. So far, only class 1 elements have been shown to attain such copy numbers, as a consequence of their copy-paste mode of transposition. It is interesting to speculate that a similar copy-paste mechanism might function in Rim2 transposition. Moreover, Rim2 elements appear to preferentially insert into genic regions (Fig. 4), indicating the importance of the Rim2 family for rice gene structure and expression. Furthermore, because of their ubiquitous distribution and insertion polymorphism in different rice lines (Figs. 2 and 3, and unpublished data), the Rim2 elements can be exploited as novel genetic markers.