Introduction

Short Interspersed Nuclear Elements (SINEs) are transposable elements (TEs) of eukaryotic genomes, i.e. sequences of DNA capable of moving from place to place in the genome and of replicating independently of the systematic replication of genomic DNA (Wicker et al. 2007). In plants, SINE identification has been achieved only in a limited number of species, and especially in model plants, whose genome was completely sequenced. The aim of this work was to produce a specific bioinformatics pipeline for analysing second generation sequence reads and identifying SINEs in a non-model species, Olea europaea.

TEs represent a large proportion of eukaryotic genomes and are subdivided into two major classes: DNA transposons and retrotransposons. DNA transposons encode a transposase enzyme catalysing the excision of the element and its integration into a new genomic location. Retrotransposons transpose through an RNA intermediate that is transcribed from a genomic copy of the element, then reverse-transcribed into DNA by a TE-encoded reverse transcriptase and reintegrated elsewhere in the genome. Consequently, at each replication cycle, retrotransposons increase their copy number in the host genome, resulting in a host genome expansion (Wicker et al. 2007).

Retrotransposons can be distinguished into LTR retrotransposons, which are flanked by long terminal repeats (LTRs), and non-LTR elements. The first are represented by two main superfamilies, Copia and Gypsy (Wicker et al. 2007), and account for the majority of plant nuclear genomes, but are less abundant in animals. Non-LTR retrotransposons include long interspersed nuclear elements (LINEs) and SINEs. LINEs and SINEs vary in prevalence and diversity in eukaryotes and are generally poorly represented in plants, but predominate over the LTR retrotransposons in many animals. For instance, the LINE L1 superfamily constitutes about 20 % of the human genome; and the Alu element, a SINE, has at least 500,000 copies (Rowold and Herrara 2000).

Intact LTR retrotransposons and LINEs encode the enzymes necessary for the reverse transcription and the integration of the new copy into the genome (autonomous elements); in nonautonomous elements, reverse transcription and integration rely on the enzyme machinery produced by other elements.

SINEs are short nonautonomous retroelements ranging in size up to 500 nucleotides, and their retrotransposition depends on proteins encoded by a partner LINE (Singer 1982; Boeke 1997; Dewannieux et al. 2003). The so-called ‘stringent’ SINEs have a unique, obligatory partner (Dewannieux et al. 2003), whereas others have more partners (Kajikawa and Okada 2002).

A SINE consists of two or more modules. The 5′-terminal head originates from one tRNA, 7SL RNA, or 5S rRNA, synthesised by the RNA polymerase III, and carries an internal promoter (as the corresponding RNAs). The central domain, the body, has an unknown origin or descends from a partner LINE and can contain a core domain that is shared with distant SINE families. The 3′-terminal tail is a sequence of variable length that consists of a poly(A) stretch, a poly(T) stretch, or of simple sequence motifs. In addition, two SINEs can combine into a dimeric SINE (Vassetzky and Kramerov 2013).

SINEs are divided into three types according to the RNA of origin. The majority of SINEs are ancestrally derived from tRNAs (Galli et al. 1981; Okada 1991; Sun et al. 2007). By contrast, some mammalian SINEs, such as B1 and Alu, originated from 7SL RNAs, while the zebrafish SINE3 is derived from 5S rRNA genes (Ullu and Tschudi 1984; Kapitonov and Jurka 2003). SINEs originated from 5S rRNA have been found also in a few mammals such as fruit bats (Gogolevsky et al. 2009) and springhare (Gogolevsky et al. 2008).

SINEs are transcribed by the cellular RNA polymerase III from the internal promoter allowing them to be expressed (Galli et al. 1981), although active external RNA polymerase II-related promoters have been found in some cases (Ferrigno et al. 2001). The SINEs promoters originated from tRNA or from 7SL RNA consist of two boxes (A and B) of about 11 nt, spaced by 30–35 nt, while those originated from 5S rRNA genes have three boxes: A, IE and C (Schramm and Hernandez 2002). The presence of the promoter within the transcribed sequence is critical for SINE amplification, as the promoter is preserved in new SINE copies.

The notion of ‘SINE family’ is widely used, but not clearly defined. Vassetzky and Kramerov (2013) consider a SINE family as a set of SINEs of a common origin and consisting of the same modules (except the tail, which can vary even within the same family). Thus, similar SINEs with different LINE-derived regions (e.g. mammalian Ther-2 and Mar-1) belong to different families. Long insertions are considered as modules. At the same time, internal deletions or duplications within modules do not give birth to a new family; although a combination of complete or almost complete SINEs (complex SINEs) is considered as a new family. Finally, there are a few SINEs with similar structure but of independent origin (e.g. simple SINEs: ID in rodents, vic-1 in camels and DAS-I in armadillos) that are consequently considered as different families (Vassetzky and Kramerov 2013).

SINEs have frequently been used to explore phylogenetic relationships, based on insertional polymorphism, in humans (Perna et al. 1992), rodents (Churakov et al. 2010) and others animals. In plants, several SINEs have been identified, such as in Brassica napus (Deragon et al. 1994), Oryza sativa (Umeda et al. 1991), Nicotiana tabacum (Yoshioka et al. 1993), Myotis daubentoni (Borodulina and Kramerov 1999), Solanum tuberosum (Wenke et al. 2011), rice (Tsuchimoto et al. 2008), sugar beet (Schwichtenberg et al. 2015). However, plant SINEs remains largely unidentified due to their low level of retrotransposition. Although SINEs may have an impact on gene expression, their insertions in the genome seem to be more tolerated compared to other retroelements, even in gene-containing regions, probably because of SINE small size (Lenoir et al. 2001). Moreover, SINEs have no conserved coding sequence, making their research very difficult.

The first plant SINEs identified were p-SINE1 of rice and TS of tobacco (Umeda et al. 1991; Yoshioka et al. 1993). The current distribution of known SINE families suggests that SINEs are ubiquitous in plants and that the number of characterised plant SINE families will increase with the number of sequenced plant genomes. For example, SINEs are involved in the evolution of genes in Solanaceae species: they can produce start and stop codons, splice sites, enlarged introns and UTRs to genes (Seibt et al. 2016). Despite SINEs multiple presumed roles, their identification has been achieved only in a limited number of plant species and it does not allow to establish general evolutionary patterns for this kind of repeats. Such a lack of information is apparent especially in non-model plants whose genome has not been sequenced yet.

Olea europaea L. has a medium sized haploid genome of 1.4 Gbp (Loureiro et al. 2007) whose structure is still largely uncharacterised, despite the economic, cultural and ecological importance of olive trees in the Mediterranean area, now extending to other regions. Concerning the repetitive component of the olive genome, tandemly repeated sequences and putative retrotransposon fragments were isolated and characterised (Katsiotis et al. 1998; Bitonti et al. 1999; Minelli et al. 2000; Lorite et al. 2001; Contento et al. 2002; Stergiou et al. 2002; Natali et al. 2007). Recently, next generation sequencing technologies and different computational procedures have been used to gain a general insight into the composition of the olive genome and its repetitive fraction (Barghini et al. 2014, 2015a). Illumina and 454 reads from genomic DNA were assembled following different procedures, obtaining more than 200,000 differently abundant contigs, with mean lengths higher than 1000 nt. By combining identification and mapping of repeated sequences, it was established that tandem repeats represent a very large portion of the olive genome (more than a quarter of the whole genome), consisting of six main families of different length. Such a large proportion of tandem repeats is quite rare in plant genomes. The other large abundant class in the olive genome is represented by transposable elements (especially LTR-REs). Such analyses showed that non-LTR retrotransposons are poorly represented, as frequently observed in plant genomes (Barghini et al. 2014).

Recently, a draft assembly of the genome of O. europaea has been released (Cruz et al. 2016), providing a valuable resource for the study of key phenotypic traits of this important tree. However, this genome draft is quite fragmented (11,038 scaffolds with N50 = 443 Kb) and the work was especially focused on assembly procedure and olive gene prediction, while analysis of repetitive DNA was largely incomplete.

With the aim of obtaining a complete characterisation of the repetitive portion of the olive genome that will be useful for a precise annotation of the genome sequence, we analysed the presence of SINEs. We searched for putative full-length SINEs, with different structures, in a set of Roche-454 reads of relevant length, using a bioinformatics tool for the modules detection and a pipeline for manual validation of sequences. This analysis allowed us to perform the first targeted identification and characterisation of tRNA-derived SINEs of O. europaea. The pipeline designed for SINE identification can be used for every plant species, even in the absence of a reference genome and will allow wider analyses on this class of repetitive DNA in plants.

Materials and methods

SINEs characterisation

The identification of full-length SINEs in O. europaea subsp. europaea var. sativa, cv. Leccino, was performed using the SINEFinder tool based on searching structural features of SINEs (Wenke et al. 2011) on olive 454 sequences available on the NCBI-SRA website (http://www.ncbi.nlm.nih.gov/sra, SRX474079). We trimmed 454 reads for quality with default setting and checked for adapters using CLC-BIO Genomic Workbench 7.0.4 (CLC-BIO, Aarhus, Denmark). We obtained 8079,610 single reads, with mean read lengths of 407 nt, corresponding to 2.3 genome equivalents. A subset of 1718,887 reads was prepared selecting reads longer than 400 nt, using an internally developed Perl script.

Structural features of plant SINEs include a tRNA origin, an internal polymerase III promoter (made of A and B boxes) in their 5′ tRNA-related region, a tRNA-unrelated region of variable length, a short stretch of T or A at their 3′-end and the presence of flanking direct repeats (Target Site Duplication, TSD). These key structural features were used by a search algorithm (SINEFinder) to identify the putative SINEs, applying default parameters. After isolation, the SINEs were manually curated to verify the occurrence of the TSD and submitted to BLASTN at the NCBI website (http://www.ncbi.nlm.nih.gov/), with E <10–10, and to RepeatMasker (Jurka 2000, version open -4.0.5) against Repbase (http://www.girinst.org/repbase/index.html), and against databases of DNA repeats of olive (Barghini et al. 2014), rice (Copetti et al. 2015), sunflower (Natali et al. 2013), and Posidonia oceanica (Barghini et al. 2015b), to exclude those sequences sharing similarity with other repeat classes, as retrotransposons and miniature inverted-repeat transposable elements. Only elements with verified TSDs and showing similarity with SINEs of other species or no similarity with other repeats were retained.

Target site duplications were then manually removed from each sequence. Finally, all isolated SINEs were subjected to BLASTClust analysis performed using stringent parameters (-p F -L 1 -b T -S 100 -a 4), to produce a nonredundant sequence set.

Estimation of sequence abundance and conservation

The abundance of SINEs in the olive genome sequence was estimated by mapping a large Illumina sequence read set available from the NCBI-SRA website (SRX465835) to the dataset of SINEs. The Illumina reads were sequenced from DNA of the very same plant of cv. Leccino, used for Roche 454 sequencing. Illumina reads were preprocessed to remove Illumina adapters using the CLC-BIO Genomic Workbench 7.0.4 as well. The tool was also used for quality trimming with default setting to remove organellar reads from the set (using an in-house olive organellar sequence database, Barghini et al. 2014) and to define the length of the reads at 50 nt. The nuclear read set included 150,000,000 reads, corresponding to 7500,000,000 nt and 5.4 genome equivalents.

For these analysis, only the 5′-ends of the putative SINEs were collected (ranging from 52 to 98 nt in length) using a script that cut the sequences 10 bases downstream the B box, to exclude low complexity regions from aligning and mapping. Mapping was performed using the CLC-BIO Genomic Workbench 7.0.4 software with the following mapping parameters: mismatch cost 1, deletion cost 1, insertion cost 1, length fraction 0.7, and similarity 0.7 (Barghini et al. 2014). Since CLC-BIO mapping algorithm distributes multi-reads randomly, the number of mapped reads to a single sequence is simply an indication and not a precise quantification of its abundance.

In other analyses, mapping onto all isolated SINEs (deprived of the low complexity region at 3′-end, as above) and to a sample of 254 full-length LTR retrotransposons (Barghini et al. 2015a) was performed using CLC-BIO Genomic Workbench 7.0.4, with parameters set at different stringencies to compare the sequence conservation of different repeat classes (Natali et al. 2015). Mismatch cost, deletion cost and insertion cost were fixed at 1, and similarity and length fraction were both fixed at 0.9, 0.7, or 0.5 to obtain high, medium or low stringencies, respectively.

Analysis of SINEs positions in the genome

RepeatMasker (Smit et al. 2004) was used to evaluate the proximity of putative SINEs to genes or transposable elements in the olive genome. A library made up by 210,068 contigs composing a draft assembly of the olive genome (Barghini et al. 2014) was masked with the set of putative SINEs using the following parameters: –s –x –no_is –nolow. SINE-containing contigs were then scored for the concurrent presence of genic or repeated sequences.

Phylogenetic analyses

The 5′-ends of the SINEs deprived of low complexity regions as above were aligned using CLUSTALX (http://www.ebi.ac.uk/Tools/msa/clustalo/), and a phylogenetic tree was generated and visualised using FigTree (http://tree.bio.ed.ac.uk/software/figtree/).

Identification of modules

The SINESearch tool was used for the identification of SINE families and modules. SINESearch is based on using FASTA files and simple parameters such as overlap length and sequence identity that are biologically sensible for detecting SINE modules (Vassetzky and Kramerov 2013).

SINESearch offers four banks for searching SINE similarities. We used minimum sequence identity of 65 %, minimum overlap length of 90 % of the query sequence length as a starting point against SINEBank and minimum sequence length of 60 nt, as recommended by the authors, for the search against the Plant tRNABank and, subsequently, against the RNABase bank. The first identified tRNA was used for classification.

Results

We searched putative SINEs in 1718,887 454 reads (longer than 400 nt) of olive genomic DNA and collected 2996 sequences. These sequences were manually validated and subjected to BLASTN analysis against NCBI databases and Repbase to remove false positive and to BLASTClust analysis to produce a nonredundant dataset of SINEs. At the end of the process, we retained a total of 227 unique putative SINEs constituting the OLEASINE dataset. Their sequences are available at the Department of Agriculture, Food, and Environment of Pisa University repository website (http://www.agr.unipi.it/ricerca/plantgenetics-and-genomics-lab/sequence-repository).

The isolated full-length elements covered 54,555 nt on a total of 952,122,442 sequenced nucleotides. The length of the putative SINEs identified in this work ranges from 140 to 362 bp, with a mean length of 240.0 bp. Figure 1 shows the SINE length distribution in OLEASINE. Two peaks can be observed in the length distribution, the first around 212 bp and the second around 262 bp.

Fig. 1
figure 1

Length distribution of the 227 putative SINEs identified in this work

The sequence similarity of the putative SINEs to other elements already described in other species was determined using the online tool SINESearch. The composition of OLEASINE is reported in Fig. 2. The vast majority of these SINEs, 222 out of 227 elements, show similarity to only four types, of which two belong to Solanaceae (SolS-IIIa and SolS-IIIb, identified in Solanum tuberosum and Nicotiana benthamiana, respectively), one to Scrophulariaceae (ScroS-I) and one to Populus trichocarpa (PTr-2).

Fig. 2
figure 2

Distribution of 227 olive putative SINEs according to their similarity to SINEs identified in other species. For each group, the code reported by SINESearch (http://sines.eimb.ru/) is indicated; the codes refer to the plant family or species in which the SINE was identified: SolS Solanaceae, ScroS Scrophulariaceae, PTr Populus trichocarpa; the number of elements is reported in parentheses

A schematic representation of the structure of the major SINE types identified in our study is reported in Fig. 3. One of the most distinctive features of the SINE structure is the presence of a sequence with high similarity to tRNA sequence, which is necessary for SINE transcription because of the presence of an internal promoter, recognised by RNA polymerase III. A search for tRNA sequences was positive in 209 out of 227 putative SINEs. In 209 tRNA-carrying SINEs, tRNAs related to 16 amino acids were found (Fig. 4), the most common tRNAs being those of tyrosine, cysteine and asparagine.

Fig. 3
figure 3

Schematic representation of the major SINE types identified in this study. The head, the body, and the tail are indicated. Numbers indicate the average coordinates of each region (A, B A- and B- box, respectively, as indicated by SINESearch tool; TSD target site duplication)

Fig. 4
figure 4

Distribution of 227 olive putative SINEs based on the similarity to tRNAs related to different amino acids within their sequence

In another analysis, the abundance of each putative SINE in the olive genome was measured by mapping a large set of Illumina reads to the putative SINE sequences. Assuming that Illumina reads in our experiments are sampled without bias for particular sequence types, mapping the reads onto a set of sequences provides a method of estimating the abundance of any genomic sequence of the set (Swaminathan et al. 2007; Tenaillon et al. 2011; Natali et al. 2013).

In our analyses, putative SINEs were previously cut 10 nt downstream the B module to exclude the low complexity region that can be subjected to uncorrected alignment. A total of 5.4 genome equivalents of Illumina reads (150,000,000 nuclear reads, 50 nt in length) were mapped onto the OLEASINE. We used such relatively short reads because the mean length of sequences to be mapped was 72.3 nt. Overall, only 46,143 reads (0.03 %) matched the OLEASINE sequences, with a mean average coverage of 116.28. Although CLC-BIO mapping algorithm distributes multi-reads randomly, the number of multi-reads resulted lower than 1 % (data not shown); hence it did not affect significantly abundance values of each element.

The coverage distribution of the whole set of putative SINEs is reported in Fig. 5. Having mapped 5.4 genome equivalents, the expected average coverage for a single copy sequence was 5.4. Consequently, SINEs with average coverage ranging from 0 to 21.6 should correspond to single or low copy number (1–4) elements; sequences with average coverage between 21.6 and 540 should be medium repeated in the genome (4–100 copies), and those with average coverage higher than 540 should be highly repeated.

Fig. 5
figure 5

Distribution of mapped reads in the OLEASINE database. Sequences are subdivided into single copy or lowly repeated (LR, average coverage between 0 and 21.6), medium repeated (MR, average coverage between 21.6 and 540) and highly repeated (HR, average coverage higher than 540)

Overall, the majority of identified SINEs (211/227) were present as lowly or medium repeated elements. Only five putative SINEs were highly repeated in the olive genome; the most repeated one showed an average coverage of 1211.8, grossly corresponding to 224 copies per genome.

Plant SINEs are mainly dispersed randomly in genomes, although they are rarely present in heterochromatic, pericentromeric regions and have a preference for gene-rich regions (Wenke et al. 2011). To evaluate this aspect in the olive genome, we searched sequences with similarity to putative olive SINEs in the 210,068 contigs which compose a draft assembly of the olive genome (Barghini et al. 2014) and found 920 SINE-containing contigs; then, we scored each contigs for similarity to genic or non-genic sequences. The occurrence of putative gene and transposon-related sequences in SINE-flanking regions is summarised in Table 1. In many cases (184 plus 65 contigs, 27.1 % in total), the putative SINE was adjacent to gene sequences. Two-hundred-fifty plus 65 contigs (34.3 %) were close to retrotransposon fragments. Only 14 contigs (1.5 %) were flanked by DNA transposon fragments. It is to be considered that transposons account for 45.7 % of the olive genome and genes for less than 10 % (Barghini et al. 2014).

Table 1 Occurrence of sequences with similarity to genes and transposable elements (retrotransposons and DNA transposons) in the 920 SINE-containing contigs of the Whole Genome Set of Assembled Sequences obtained by Barghini et al. (2014)

To investigate the relationship among the olive SINEs, the 5′ termini of the elements as above (including A and B boxes, and ranging from 52 to 98 nt in length) were aligned (Supplementary Material # 1) to produce a distance tree (Fig. 6). We used the 5′ terminus instead of full-length elements, because the 3′-ends of SINEs show low complexity and this can produce uncorrect alignments; moreover, the 5′ tRNA-related region of SINEs is the most conserved, carrying the RNA polymerase III promoter needed for element replication. The phylogenetic tree (Fig. 6) shows the occurrence of six significantly distinct groups, of which four are composed by only one element and two include 135 (family A) and 88 (family B) elements, respectively.

Fig. 6
figure 6

Distance tree of olive putative SINEs (bar represents the distance scale). Coloured codes indicate the SINE families that can be distinguished in the database. Asterisks indicate bootstrap values >60 %, calculated for 1000 replicates. A, B Indicate the two main groups of elements according to sequence similarity

To gain insight of SINE dynamics during olive genome evolution lineages, we analysed SINE sequence conservation in comparison to that of LTR-retrotransposon lineages. In Fig. 7, we report the relative number of Illumina reads matching to the different repeat families applying different stringency parameters. Relaxing stringency parameters implied an increase in the number of mapped reads. The ratio between the abundance calculated at medium and high stringency (see “Materials and methods”) for a given lineage should indicate the degree of sequence conservation of the elements belonging to that lineage. Comparing two lineages, such that the lower the ratio, the higher the sequence conservation. SINE sequences are by far less conserved than retrotransposon, with the SINE family B (including poplar-like elements) less conserved than family A (whose elements show similarity to Solanaceae and Scrophulariaceae).

Fig. 7
figure 7

Relative number of mapped Illumina reads on olive SINE and LTR retrotransposons belonging to different families/lineages at different stringency parameters (see “Materials and methods”). For comparison, the number of matched reads at high stringency was fixed at 100 for each lineage

With regard to their abundance in the olive genome, the mean average coverage of A family was 84.8, whereas that of B family was 149.83, i.e. near twofold than that of the A family.

Discussion

SINEs have been well studied in animal genomes, in which they form abundant sequence families. However, they have been characterised only in a limited number of plant species (Umeda et al. 1991; Yoshioka et al. 1993; Deragon et al. 1994; Borodulina and Kramerov 1999; Yasui et al. 2001; Gadzalski and Sakowicz 2011; Wenke et al. 2011; Seibt et al. 2016).

In this work, we have identified and described 227 putative SINEs of olive tree. Such sequences form a small portion of the olive genome. The size of the putative SINEs identified in this work are similar to those typically found in plant SINEs (Deragon and Zhang 2006; Kramerov and Vassetzky 2011). Length distribution reflects the occurrence of two distinct major groups of SINEs.

Analysis of sequence similarity showed the occurrence of only four types, already characterised in Solanaceae (two types), in Scrophulariaceae and in Salicaceae. Almost all presented a sequence with high similarity to tRNA sequences, especially those of tyrosine, cysteine and asparagine. Tyrosine- and cysteine-tRNA related sequences have already been found in SINEs of Populus trichocarpa (Wenke et al. 2011).

Mapping Illumina reads to the putative SINEs was performed to estimate the abundance of SINEs in the olive genome. The resulting low average coverage (0.03 % of the genome) is comparable to that already reported by Barghini et al. (2014) indicating that SINEs plus tRNA genes account for only 0.046 % of the olive genome, a value in the range of that reported in numerous plant species (Wenke et al. 2011). No SINE showed very high abundance in the olive genome, as already observed in other plant species (Wenke et al. 2011).

Olive SINEs resulted located near or within protein-encoding genes at high frequency, compared to other retrotransposons or DNA transposons. These data confirm that SINEs are preferentially associated to genes, similar to that reported also for other plant species, for example, Triticum aestivum (Ben-David et al. 2013), in which approximately 38 % of the Au SINEs are inserted into or adjacent to transcribed regions.

Even if the occurrence of other SINEs in the genome cannot be ruled out, phylogenetic analyses of the 5′ termini of olive SINEs showed the occurrence of two major groups (beside to other four including only one element), that we called families A (related to Solanaceae and Scrophulariaceae SINEs) and B (related to Salicaceae SINEs). These two families showed different sequence conservation, elements of family A being more conserved than those of family B. Moreover, SINE sequences resulted by far less conserved than those of LTR retrotransposons.

Although sequence similarity-based dating can be subject to reservation when inferring over long evolutionary periods, these data suggest that SINE amplification in olive was more ancient than that of LTR retrotransposons, allowing to accumulate more mutations in SINEs than in LTR retrotransposons. Interestingly, Solanaceae and Scrophulariaceae are phylogenetically closer than Salicaceae (the poplar family) to Oleaceae (Judd and Olmstead 2004), in agreement with higher conservation of SINEs of family A than family B. Alternatively, higher sequence diversity in SINEs compared to LTR retrotransposons might be related to the different retrotranscriptases generally involved in reverse transcription of SINEs and LTR retrotransposons. It is known, for example, that reverse transcriptases from oncoretroviruses, lentiviruses, non-LTR and LTR retrotransposons are differently prone to errors during reverse transcription (Jamburuthugoda and Eickbush 2011), probably resulting in a different sequence conservation of these repeat classes.

The difference in abundance between elements of the A and B family (the mean average coverage of B family being near twofold than that of the A family), coupled with the lower sequence conservation of members of the B family, should indicate that most B SINEs amplification has occurred earlier than that of A SINEs. Given their similarity to elements of Salicaceae, it might be argued that in olive tree progenitors, SINE expansion has occurred especially before separation between rosids (to which Solanaceae, Scrophulariaceae and Oleaceae belong) and asterids (to which Salicaceae belong), i.e. the two olive SINE families should have originated at two different times during evolution. Once a precise and detailed olive genome sequence will be available, analysis of SINE loci will allow inferring on both mechanism and function of SINE amplification in this species.

In conclusion, although manual validation of the identified elements was necessary (only 227 out of 2996 elements were maintained), the results show that our strategy is appropriate for SINE identification and will favour further analyses on these relatively unknown elements to be performed in other plant species, even in the absence of a sequenced genome. Functional analyses of putative SINEs will contribute further clarification on the specific mechanisms of SINE replication in plants and the possible reason for their relative paucity in plant genomes.